Metabolic engineering concepts and applications : Volume 13a 9783527823468, 3527823468



English Pages [929] Year 2021


Metabolic Engineering

Related Titles

Wittmann, Ch., Liao, J.C. (eds.) Industrial Biotechnology: Products and Processes. 2017 Print ISBN: 978-3-527-34181-8

Komives, C., Zhou, W. (eds.) Bioprocessing Technology for Production of Biopharmaceuticals and Bioproducts. 2019 Print ISBN: 978-1-118-36198-6 (Also available in a range of electronic products)

Hu, W. Engineering Principles in Biotechnology. 2018 Print ISBN: 978-1-119-15902-5 (Also available in a range of electronic products)

Nielsen, J., Hohmann, S. (eds.) Systems Biology. 2017 Print ISBN: 978-3-527-33558-9

Smolke, C. Synthetic Biology: Parts, Devices, and Applications. 2018 Print ISBN: 978-3-527-33075-1

La Barre, S., Bates, S.S. (eds.) Blue Biotechnology: Production and Use of Marine Molecules. 2018 Print ISBN: 978-3-527-34138-2 (Also available in a range of electronic products)

Chang, H.N. Emerging Areas in Bioengineering. 2018 Print ISBN: 978-3-527-34088-0

Lee, M.L., Kildegaard, H.F. Cell Culture Engineering: Recombinant Protein Production. 2019 Print ISBN: 978-3-527-81140-3

Further Volumes of the “Advanced Biotechnology” Series

Published:

Villadsen, J. (ed.) Fundamental Bioengineering. 2016 Print ISBN: 978-3-527-33674-6

Love, J.Ch. (ed.) Micro- and Nanosystems for Biotechnology. 2016 Print ISBN: 978-3-527-33281-6

Wittmann, Ch., Liao, J.C. (eds.) Industrial Biotechnology: Microorganisms (2 Volumes). 2017 Print ISBN: 978-3-527-34179-5

Planned:

Rehm, B.H.A., Moradali, M.F. Biopolymers for Biomedical and Biotechnological Applications. 2020 Print ISBN: 978-3-527-34530-4

Zhao, H. Protein Engineering: Tools and Applications. 2021 Print ISBN: 978-3-527-34470-3

Hudson, P. Cyanobacteria Biotechnology. 2021 Print ISBN: 978-3-527-34

Metabolic Engineering: Concepts and Applications

Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos

Volume 13a

Metabolic Engineering: Concepts and Applications

Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos

Volume 13b

Volume and Series Editors

Prof. Dr. Sang Yup Lee
KAIST
373-1 Guseong-Dong; 291 Daehak-ro, Yuseong-gu
305-701 Daejeon
South Korea

Prof. Dr. Jens Nielsen
Chalmers University
Department of Biology and Biological Engineering
Kemivägen 10
412 96 Göteborg
Sweden

Prof. Dr. Gregory Stephanopoulos
Massachusetts Institute of Technology
Department of Chemical Engineering
77 Massachusetts Ave
Cambridge, MA 02139
USA

Cover
Gettyimages 1134101608/imaginima

All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.

Library of Congress Card No.: applied for

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at .

© 2021 WILEY-VCH GmbH, Boschstr. 12, 69469 Weinheim, Germany

All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.

Print ISBN: 978-3-527-34662-2
ePDF ISBN: 978-3-527-82344-4
ePub ISBN: 978-3-527-82345-1
oBook ISBN: 978-3-527-82346-8

Cover Design Adam-Design, Weinheim, Germany
Typesetting Straive, Chennai, India
Printing and Binding

Printed on acid-free paper

10 9 8 7 6 5 4 3 2 1

To the memory of Maria Flytzani Stephanopoulos, a brilliant engineer-scientist who believed in metabolic engineering as the catalysis of the future, and to Hye Jean Hwang and Dina Petranovic Nielsen, for their inspiration and unwavering support.

Contents

Volume 13a

Preface

Part I

Concepts

1 Metabolic Engineering Perspectives
Nian Liu and Gregory Stephanopoulos

1.1 History and Overview of Metabolic Engineering
1.2 Understanding Cellular Metabolism and Physiology
1.2.1 Computational Methods in Understanding Metabolism
1.2.2 Experimental Methods in Understanding Metabolism
1.3 General Approaches to Metabolic Engineering
1.3.1 Rational Metabolic Engineering
1.3.2 Combinatorial Metabolic Engineering
1.3.3 Systems Metabolic Engineering
1.4 Host Organism Selection
1.5 Substrate Considerations
1.6 Metabolic Engineering and Synthetic Biology
1.7 The Future of Metabolic Engineering
References

2 Genome-Scale Models: Two Decades of Progress and a 2020 Vision
Bernhard O. Palsson

2.1 Introduction
2.2 Flux Balance Analysis
2.2.1 Dynamic Mass Balances
2.2.2 Analogy to Deriving Enzymatic Rate Equations
2.2.3 Formulating Flux Balances at the Genome-Scale
2.2.4 Constrained Optimization
2.2.5 Principles
2.2.6 Additional Constraints
2.2.7 Flux–Concentration Duality
2.2.8 Recap
2.3 Network Reconstruction
2.3.1 Assembling the Reactome
2.3.2 Basic Principles of Network Reconstruction
2.3.3 Curation
2.3.4 GEMs Have a Genomic Basis
2.3.5 Computational Queries
2.3.6 Scope Expansion
2.3.7 Knowledge Bases
2.3.8 Availability of GEMs
2.3.9 Recap
2.4 Brief History of the GEM for E. coli
2.4.1 Origin
2.4.2 Model Organism
2.4.3 Key Predictions
2.4.4 Design Algorithms
2.4.5 Scope Expansions
2.4.6 Recap
2.5 From Metabolism to the Proteome
2.5.1 ME Models
2.5.2 Capabilities of ME Models
2.5.2.1 Growth-Coupled Metabolic Designs Can Be Reproduced in GEMs
2.5.2.2 ME Models Can Reflect Properties of the Metalloproteome
2.5.2.3 ME Models Can Compute the Biomass Objective Function
2.5.2.4 Computing Stresses
2.5.3 Recapitulation
2.6 Current Developments
2.6.1 Kinetics
2.6.2 Transcriptional Regulation
2.6.2.1 iModulons
2.6.2.2 Activities
2.6.3 Protein Structures
2.7 Broader Perspectives
2.7.1 Distal Causation
2.7.2 Contextualization of GEMs Within Workflows
2.8 What Does the Future Look Like for GEMs?
Disclaimer
Acknowledgments
References

3 Quantitative Metabolic Flux Analysis Based on Isotope Labeling
Wolfgang Wiechert and Katharina Nöh

3.1 Introduction
3.1.1 What Metabolic Flux Analysis Is About
3.1.2 The Variants of ¹³C-MFA
3.2 A Toy Example Illustrates the Basic Principles
3.2.1 Fluxomics: More Than Just a Branch of Metabolomics
3.2.2 Isotope Labeling: The Key to Metabolic Fluxes
3.2.3 From the Data to the Intracellular Fluxes
3.2.4 INST-¹³C-MFA: Metabolic Stationary, but Isotopically Nonstationary
3.2.5 From Measurements to Flux Estimates: Parameter Fitting
3.2.6 Flux Estimates Have Confidence Bounds: Statistical Analysis
3.2.7 The Classical Approach at Metabolic and Isotopic Stationary State
3.2.8 An Additional Source of Information: Carbon Atom Transitions
3.2.9 Input Labeling Design: How Informative Can an Experiment Be Made?
3.2.10 The Isotopomers of a Single Metabolite Can Be a Rich Source of Information
3.2.11 Bidirectional Reaction Steps: More Than Just Nuisance Factors
3.2.12 Isotopomer Fractions Cannot Be Measured Comprehensively
3.3 Lessons Learned from the Example
3.3.1 Definition of ¹³C-MFA Revisited
3.3.2 Statistical Evaluation and Optimal Experimental Design
3.4 How to Configure an Isotope Labeling Experiment
3.4.1 Modeling and Simulation of Isotope Labeling Experiments
3.4.2 Metabolic Network Specification
3.4.3 Atom Transition Network Specification
3.4.4 Input Labeling Composition
3.4.5 Measurement Specification
3.4.6 Flux Constraints
3.4.7 In Silico Experimental ILE Design
3.5 Putting Theory into Practice
3.5.1 A Recipe How to Start
3.5.2 Metabolic and Isotopic Stationarity
3.5.3 Measuring Extracellular Fluxes
3.5.4 Administering Labeled Substrate(s)
3.5.5 Metabolomics: Sampling, Sample Preparation, and Analytical Procedures
3.5.6 Adjusting Labeling Enrichments for Isotopic Steady State Approximation
3.5.7 Correcting Labeling Enrichments for Natural Isotope Abundance
3.5.8 Simulation of Labeling Data and Flux Estimation
3.5.9 Delicacies of INST-¹³C-MFA
3.6 Future Challenges of ¹³C-MFA
Acknowledgments
Abbreviations
References

4 Proteome Constraints in Genome-Scale Models
Yu Chen, Jens Nielsen, and Eduard J. Kerkhoven

4.1 Introduction
4.2 Cellular Constraints
4.3 Formulation of Proteome Constraints
4.3.1 Coarse-Grained Integration of Proteome Constraints
4.3.2 Fine-Tuned Integration of Proteome Constraints
4.4 Perspectives
References

5 Kinetic Models of Metabolism
Hongzhong Lu, Yu Chen, Jens Nielsen, and Eduard J. Kerkhoven

5.1 Introduction
5.2 Definition of Enzyme Kinetics
5.2.1 Michaelis–Menten Formula
5.3 Factors Affecting Intracellular Enzyme Kinetics
5.4 Kinetic Model: Definition and Scope
5.4.1 What Is a Kinetic Model?
5.4.2 Scope of Kinetic Models
5.4.3 How to Build a Functional Kinetic Model?
5.5 Main Mathematical Expressions in Description of Reaction Rates
5.5.1 Mechanistic Rate Expressions
5.6 Approximative Rate Expressions
5.7 Approaches to Assign Parameters in the Rate Expressions
5.7.1 Direct Measurements of Kinetic Parameters in Enzyme Assays
5.7.2 Querying Databases
5.7.3 Inferring from Measured Fluxes
5.7.4 Parameters Inference Using the Statistical Analysis
5.8 Applications
5.9 Perspectives
References

6 Metabolic Control Analysis
David A. Fell

6.1 The Metabolic Engineering Context of Metabolic Control Analysis
6.2 MCA Theory
6.2.1 Metabolic Steady State
6.2.2 Flux Control Coefficients
6.2.3 Examples of the Flux–Enzyme Relationship
6.2.4 Flux Summation Theorem
6.2.5 Concentration Control Coefficients
6.2.6 Linking Control Coefficients to Enzyme Properties
6.2.6.1 Enzyme Rate Equations and Elasticity Coefficients
6.2.6.2 Elasticities and Control Coefficients
6.2.6.3 Block Coefficients and Top-Down Analysis
6.2.7 Feedback Inhibition
6.2.8 Large Alterations of Enzyme Activity
6.3 Implications of MCA for Metabolic Engineering Strategies
6.3.1 Abolishing Feedback Inhibition
6.3.2 Increasing Demand for Product
6.3.3 Inhibition of Competing Pathways
6.3.4 Designing Large Changes in Metabolic Flux
6.3.4.1 Yeast Tryptophan Synthesis
6.3.4.2 The Universal Method
6.3.4.3 Bacterial Production of Aromatic Amino Acids
6.3.4.4 Penicillin and Other Instances
6.3.5 Impacts on Yield from a Growing System
6.4 Conclusion
Appendix 6.A: Feedback Inhibition Simulation
References

7 Thermodynamics of Metabolic Pathways
Daniel Robert Weilandt, Maria Masid, and Vassily Hatzimanikatis

7.1 Bioenergetics in Life and in Metabolic Engineering
7.2 Thermodynamics-Based Flux Analysis Workflow
7.2.1 Thermodynamic Model Curation
7.2.1.1 Estimation of the Standard Free Energies of Formation
7.2.1.2 Compensating for Compartment-Specific Ionic Strength and pH
7.2.1.3 Compensating the Free Energy of Formation for Isomer Distributions
7.2.1.4 Computing the Transformed Free Energies of Reaction
7.2.2 Mathematical Formulation
7.3 Thermodynamics-Based Flux Analysis Applications
7.3.1 Constraining the Flux Space with Metabolomics Data
7.3.2 Characterizing the Feasible Concentration Space
7.4 Conclusion and Future Perspectives
References

8 Pathway Design
Jasmin Hafner, Homa Mohammadi-Peyhani, and Vassily Hatzimanikatis

8.1 Definition
8.1.1 De Novo Design of Metabolic Pathways
Manual Versus Computational Design
8.2 Pathway Design Workflow
8.2.1 Biochemical Search Space
8.2.1.1 Reaction Prediction
8.2.1.2 Retrobiosynthesis
8.2.1.3 Network Data Representation
8.2.2 Pathway Search
8.2.2.1 Stoichiometric Matrix-Based Search
8.2.2.2 Graph-Based Search
8.2.2.3 Pathway Ranking
8.2.3 Enzyme Assignment
8.2.3.1 Enzyme Prediction for Orphan and Novel Reactions
8.2.3.2 Choice of Protein Sequence
8.2.4 Pathway Feasibility
8.2.4.1 Chassis Metabolic Model
8.2.4.2 Stoichiometric Feasibility
8.2.4.3 Thermodynamic Feasibility
8.2.4.4 Kinetic Feasibility
8.2.4.5 Toxicity of Intermediates
8.3 Applications
8.3.1 Available Tools for Pathway Design
8.3.2 Successful Applications of Pathway Design Tools
8.3.3 Practical Example of Pathway Design
8.3.3.1 Creating a Biochemical Network Around BDO
8.3.3.2 Search for Biosynthetic Pathways
8.3.3.3 Finding Enzymes for Novel Reactions
8.3.3.4 Stoichiometric and Thermodynamic Pathway Evaluation
8.3.3.5 Overall Ranking of Pathways
8.4 Conclusions and Future Perspectives
References

9 Metabolomics
Tomek Diederen, Alexis Delabrière, Alaa Othman, Michelle E. Reid, and Nicola Zamboni

9.1 Introduction
9.2 Fundamentals
9.2.1 Experimental Design
9.2.2 Targeted and Untargeted Metabolomics
9.2.3 Sequences and Standards
9.3 Analytical Techniques
9.3.1 Sample Preparation
9.3.2 Separation Techniques
9.3.2.1 Liquid Chromatography
9.3.2.2 Gas Chromatography
9.3.2.3 Alternative Separation Techniques
9.3.3 Mass Spectrometry
9.3.3.1 Ionization Techniques
9.3.3.2 Low-Resolution MS
9.3.3.3 High-Resolution MS
9.3.3.4 Acquisition Modes for Targeted MS
9.3.3.5 Acquisition Modes for Untargeted Metabolomics
9.4 Data Analysis
9.4.1 Data Processing in Untargeted Metabolomics
9.4.1.1 Preprocessing of Individual MS Runs
9.4.1.2 Peak Picking
9.4.1.3 Peak Alignment and Retention Time Correction
9.4.1.4 Peak Grouping
9.4.1.5 Missing Values
9.4.1.6 Normalization
9.4.1.7 Annotation
9.4.2 Data Analysis and Interpretation
9.4.2.1 Univariate Statistics
9.4.2.2 Multivariate Statistics
9.4.2.3 Pathway Analysis
9.5 Emerging Trends for Cellular Analyses
9.5.1 High-Throughput Metabolomics for Large Scale Screening
9.5.2 Single Cell Metabolomics
9.5.3 Dynamic Analysis
9.6 Applications of Metabolomics in Metabolic Engineering
9.6.1 Pathway Design by Thermodynamic Analysis
9.6.2 Alleviating Pathway Bottlenecks
9.6.3 Reduction of Side Products and Metabolite Damage
9.6.4 Improving Stress Tolerance
9.6.5 Engineer Medium Composition
9.7 Final Remarks
References

10 Genome Editing of Eukarya
Jonathan A. Arnesen, Jakob Blæsbjerg Hoof, Helene Faustrup Kildegaard, and Irina Borodina

10.1 Basic Principles of Genome Editing
10.2 Endonucleases
10.2.1 Zinc-Finger Nucleases
10.2.2 Transcription Activator-Like Effectors Nucleases
10.2.3 CRISPR/Cas
10.3 Genome Editing of Industrially Relevant Eukaryotes
10.3.1 Yeast
10.3.2 Filamentous Fungi
10.3.3 Chinese Hamster Ovary Cells
10.4 Outlook
References

Volume 13b

Preface



Part II

Applications

11 Metabolic Engineering of Escherichia coli
Zi Wei Luo, Jung Ho Ahn, Tong Un Chae, So Young Choi, Seon Young Park, Yoojin Choi, Jiyong Kim, Cindy Pricilia Surya Prabowo, Jong An Lee, Dongsoo Yang, Taehee Han, Hanwen Xu, and Sang Yup Lee

11.1 Introduction
11.2 Metabolic Engineering of E. coli for the Production of Fuels
11.2.1 Fermentative Pathway
11.2.2 Keto Acid Pathway
11.2.3 Isoprenoid Pathway
11.2.4 Fatty Acid Pathway
11.3 Metabolic Engineering of E. coli for the Production of Chemicals
11.3.1 Bulk Chemicals
11.3.1.1 ω-Amino Acids
11.3.1.2 Hydroxy Acids
11.3.1.3 Diamines
11.3.1.4 Dicarboxylic Acids
11.3.1.5 Diols
11.3.1.6 Lactams
11.3.2 Specialty Chemicals
11.3.2.1 L-Amino Acids
11.3.2.2 Specialty Aromatics
11.3.3 Natural Products
11.3.3.1 Terpenoids
11.3.3.2 Phenylpropanoids
11.3.3.3 Alkaloids
11.3.3.4 Polyketides
11.4 Metabolic Engineering of E. coli for the Production of Materials
11.4.1 Recombinant Proteins
11.4.1.1 Therapeutic Proteins
11.4.1.2 Membrane Proteins
11.4.1.3 Protein-Based Materials
11.4.2 Biopolymers
11.4.2.1 PHAs
11.4.2.2 Polysaccharides
11.4.2.3 Nonprotein Poly(Amino Acids)
11.4.3 Nanomaterials
11.5 Conclusions and Perspectives
Acknowledgment
References

12 Metabolic Engineering of Corynebacterium glutamicum
Judith Becker and Christoph Wittmann

12.1 Introduction
12.2 Systems Metabolic Engineering Strategies
12.2.1 Experimental and Computational Systems Biology
12.2.2 Genome Editing Approaches and Technologies
12.3 Metabolic Engineering of the Substrate Spectrum
12.3.1 Industrial Raw Materials
12.3.2 Lignocellulosic Sugars
12.3.2.1 Xylose
12.3.2.2 Arabinose
12.3.2.3 Mannose
12.3.2.4 Oligosaccharides
12.3.3 Aquatic Sugars
12.3.3.1 Mannitol
12.3.4 Valorization of Lignin Aromatics
12.3.4.1 Catechol, Phenol, and Benzoate
12.3.4.2 Ferulate, Caffeate, Cinnamate, and p-Coumarate
12.4 Industrial Products
12.4.1 Amino Acids
12.4.1.1 L-Glutamate
12.4.1.2 L-Lysine
12.4.1.3 Aminovalerate
12.4.1.4 Shinorine
12.4.1.5 Ectoine
12.4.1.6 L-Pipecolic Acid
12.4.1.7 Trans-4-hydroxyproline
12.4.1.8 L-Theanine
12.4.1.9 4-Hydroxyisoleucine
12.4.2 Organic Acids and Alcohols
12.4.2.1 cis,cis-Muconate
12.4.2.2 Glutarate
12.4.2.3 Itaconate
12.4.2.4 3-Hydroxypropionate
12.4.2.5 Short-Chain Alcohols
12.4.3 Natural Products and Active Ingredients
12.4.3.1 Pyrazine
12.4.3.2 Violacein
12.4.3.3 Terpenoids
12.4.4 Biopolymers
12.4.4.1 Hyaluronic Acid
12.4.4.2 Polyglutamate
12.4.5 Recombinant Proteins
12.4.5.1 Endoxylanase
12.4.5.2 β-Glucosidase
12.4.6 Recombinant RNA
12.5 Conclusions and Perspectives
References

13 Metabolic Engineering of Bacillus – New Tools, Strains, and Concepts
Mathis Appelbaum and Thomas Schweder

13.1 Introduction
13.2 The Determination of Essential Physiological Traits and Circuits
13.3 The Minimal Cell Concept
13.3.1 Why Minimal Genomes
13.3.2 Overview of Genome Reduction Projects in B. subtilis
13.3.3 Productivity of Genome-Reduced Strains
13.4 Tools for Genome Editing
13.4.1 Counter-Selection and Markerless Genome Editing
13.4.2 CRISPR/Cas in B. subtilis and Related Strains – Basic Principles
13.4.3 Expanding the Scope of Application – CRISPR/Cas9 in Metabolic Engineering and Synthetic Biology of Bacillus
13.4.3.1 Multiplex Genome Editing
13.4.3.2 Modulating Gene Expression Levels: CRISPRi and CRISPRa
13.4.3.3 CRISPR-dCas9 Mediated Base Editing
13.4.3.4 Boosting Strain Development of Alternative Bacillus Strains Using CRISPR/Cas9
13.5 Optimization, Standardization, and Modularity in Gene Expression
13.6 Activity-Independent Screening of Target Molecule Synthesis
13.7 The Biotechnological Application of Metabolic Engineering Strategies
13.8 Concluding Remarks and Future Perspectives
References

14 Metabolic Engineering of Pseudomonas
Pablo I. Nikel and Víctor de Lorenzo

14.1 Introduction
14.2 Bacteria from the Genus Pseudomonas as Platforms for Metabolic Engineering
14.2.1 General Characteristics of P. putida
14.2.2 Substrate Utilization and the Unique Core Metabolism of P. putida
14.2.3 Synthetic Biology Tools for Metabolic Engineering of P. putida
14.3 Examples of Metabolic Engineering of P. putida and Other Pseudomonas Species
14.3.1 Toward a Reference Chassis: Genome-Reduced Variants of Pseudomonas
14.3.2 Expansion of the Carbon Substrate Range
14.3.3 Engineering the Oxygen-Dependent Lifestyle of P. putida
14.3.4 Production of Aromatic Molecules and Organic Acids
14.3.5 Other Bioproducts
14.4 Conclusions and Future Prospects
Acknowledgments
References

15 Metabolic Engineering of Lactic Acid Bacteria
Robin Dorau, Jianming Liu, Christian Solem, and Peter Ruhdal Jensen

15.1 Introduction
15.1.1 General Features and Phylogeny of LAB
15.1.2 Metabolism of LAB
15.1.3 Overview of This Chapter
15.2 Genetic Engineering Strategies for LAB
15.2.1 Transformation of LAB and Shuttle Vectors
15.2.2 Gene Expression in LAB
15.2.2.1 Nisin-controlled Gene Expression
15.2.2.2 Synthetic Promoter Libraries
15.2.2.3 Other Systems for Controlling Gene Expression
15.2.3 Genetic Engineering of LAB
15.2.3.1 Plasmid Integration via Homologous Recombination
15.2.3.2 Integration Using Phage Attachment Sites
15.2.3.3 Recombineering
15.2.3.4 CRISPR-Cas-Technology for Genome Editing in LAB
15.3 Traditional Applications of LAB and Optimizing Performance
15.3.1 Starter Cultures for Food Fermentations
15.3.2 Stress Tolerance
15.3.2.1 Heat Tolerance
15.3.2.2 Acid Tolerance
15.3.2.3 Oxygen Tolerance
15.3.3 Substrate Utilization
15.4 Metabolic Engineering of LAB for Production of Chemicals or Proteins
15.4.1 Food Ingredients
15.4.1.1 Lactic Acid
15.4.1.2 Diacetyl
15.4.1.3 Alanine
15.4.1.4 Acetaldehyde
15.4.2 Bulk Chemicals
15.4.2.1 Ethanol
15.4.2.2 Acetoin
15.4.2.3 Butanediol-Isomers
15.4.3 Vitamins and Polyols
15.4.3.1 Riboflavin – Vitamin B2
15.4.3.2 Folate – Vitamin B9
15.4.3.3 Cobalamin and Other Vitamins
15.4.3.4 Polyols as Natural Sweeteners
15.4.4 Therapeutic Proteins and Bacteriocins
15.4.4.1 Therapeutic Proteins
15.4.4.2 Bacteriocins – Lantibiotics
15.4.5 LAB as Biocatalysts
15.4.5.1 Bioreduction of Ketones to Alcohols by LAB Cells
15.4.5.2 Conversion of Glycerol to PDO or 3HP
15.4.5.3 More Biotransformations Using LAB
15.4.5.4 Plant Metabolites
15.5 Conclusion and Prospects
References

16 Metabolic Engineering and the Synthetic Biology Toolbox for Clostridium
Rochelle C. Joseph, Susan Q. Kelley, Nancy M. Kim, and Nicholas R. Sandoval

16.1 Introduction
16.2 Aims of Metabolic Engineering in Clostridium
16.3 Genomic Editing in Clostridium
16.3.1 ClosTron
16.3.2 Transposon-Based Random Mutagenesis
16.3.3 Counterselection Markers
16.3.4 CRISPR-Based Editing in Clostridium
16.4 Genetic Parts in Clostridium
16.4.1 Promoters
16.4.2 Reporters
16.4.2.1 Enzyme-Based Reporters
16.4.2.2 Bioluminescent Reporters
16.4.2.3 Fluorescent Reporters
16.4.2.4 FbFP-Based Fluorescent Reporters
16.4.2.5 FAST, HaloTag, and SNAP-Tag Fluorescent Reporters
16.4.3 Terminators
16.4.4 5′-UTRs and Riboswitches
Author Contributions
References

17 Metabolic Engineering of Filamentous Actinomycetes
Charlotte Beck, Kai Blin, Tetiana Gren, Xinglin Jiang, Omkar Satyavan Mohite, Emilia Palazzotto, Yaojun Tong, Pep Charusanti, and Tilmann Weber

17.1 Definition, Subject, and Importance
17.2 Recently Developed Tools and Strategies to Find Novel Bioactive Natural Products
17.3 Natural Product Biosynthetic Pathways
17.4 Genome Mining for Biosynthetic Gene Clusters
17.4.1 Current Software for Secondary Metabolite Genome Mining
17.4.2 How Genome Mining Works
17.4.3 Caveats
17.5 Engineering Secondary Metabolite Biosynthesis
17.5.1 Selection of Host Strains for Heterologous Expression
17.5.2 Cloning of BGCs for Heterologous Expression
17.5.3 Genome Editing of Actinomycetes
17.5.3.1 CRISPR Genome Editing Applications for Metabolic Engineering of Actinomycetes
17.5.4 Synthetic Biology Approaches to Engineer Actinomycetal Natural Product Biosynthesis Pathways
17.5.4.1 Biological Parts for Synthetic Biology
17.5.4.2 Biosensors
17.5.4.3 Other Regulatory Elements
17.5.4.4 Full Pathway Refactoring
17.6 Systems Metabolic Engineering of Filamentous Actinomycetes
17.6.1 Multiomics Studies as a Basis to Optimize the Production of Natural Products
17.6.2 Genome-Scale Metabolic Modeling of Filamentous Actinomycetes
17.7 Outlook and Perspectives
Acknowledgments
References

18 Metabolic Engineering of Yeast
Rui Pereira, Olena P. Ishchuk, Xiaowei Li, Quanli Liu, Yi Liu, Maximilian Otto, Yun Chen, Verena Siewers, and Jens Nielsen

18.1 Introduction
18.2 Production of Biofuels
18.2.1 First-Generation Bioethanol Production
18.2.2 Second-Generation Bioethanol
18.2.3 Higher Alcohols
18.2.3.1 Fatty Acids, Fatty Alcohols, and Hydrocarbons
18.2.4 Production of Commodity Chemicals
18.2.4.1 Organic Acids
18.2.4.2 Diols
18.2.5 Production of Fine Chemicals
18.2.5.1 Phenylpropanoids
18.2.5.2 Alkaloids
18.2.5.3 Sesquiterpenes
18.2.5.4 Monoterpenes, Triterpenes, and Other Isoprenoids
18.2.6 Production of Recombinant Proteins
18.2.6.1 Secreted Proteins
18.2.6.2 Virus-like Particles
References

19 Harness Yarrowia lipolytica to Make Small Molecule Products
Kang Zhou and Gregory Stephanopoulos

19.1 Introduction
19.2 Genetic Tools for Engineering Y. lipolytica
19.3 Production of Short-Chain Organic Acids
19.3.1 Production of Citrate
19.3.2 Production of Pyruvate and α-Ketoglutarate
19.3.3 Production of Succinate
19.4 Production of Triacylglycerol
19.4.1 De novo Triacylglycerol Biosynthesis
19.4.2 The Push-and-Pull Strategy to Increase Flux Toward Triacylglycerol Synthesis
19.4.3 Desaturation of Fatty Acyl Chains Improved Triacylglycerol Production
19.4.4 Improve the Pathway Yield Through Balancing Redox Cofactors
19.4.5 Lower Substrate Cost by Utilizing Waste Streams
19.4.6 Improve Availability of Cytosolic Acetyl-CoA
19.5 Production of New Products
19.5.1 Production of Eicosapentaenoic Acid
19.5.2 Production of Triacetic Acid Lactone
19.5.3 Production of β-Carotene
19.6 Opportunities and Challenges
References

20 Metabolic Engineering of Filamentous Fungi
Vera Meyer

20.1 Introduction
20.2 Development and Implementation of Genetic and Genome Tools
20.3 Metabolic and Regulatory Models
20.4 Engineering Strategies for Improved Substrate Utilization
20.5 Engineering Strategies for Enhanced Product Formation
20.5.1 Aspergillus niger
20.5.2 Aspergillus oryzae
20.5.3 Aspergillus terreus
20.5.4 Penicillium chrysogenum
20.5.5 Trichoderma reesei
20.5.6 Thermothelomyces thermophilus
20.6 Engineering Strategies for the Production of New-to-Nature Compounds
20.7 Engineering Strategies for Controlled Macromorphologies
20.8 Future of the Field
References

21 Metabolic Engineering of Photosynthetic Cells – in Collaboration with Nature
Mette Sørensen and Birger Lindberg Møller

21.1 Plants for the Future
21.1.1 Plants as the World’s Champions of Complex Chemistry
21.1.2 Solar Radiation as a Renewable Energy Source
21.1.3 The Important Move from Model Plants to Engineered Crop Plants and Biosustainable Industrial Production Platforms
21.2 Photosynthetic Organisms
21.2.1 Metabolic Engineering Approaches
21.2.1.1 Selected Examples: Increased Bioproduction
21.2.1.2 Other Examples
21.3 Plant Cell Wall
21.3.1 Metabolic Engineering Approaches
21.3.1.1 Selected Examples: Reduced Xylan Production
21.3.1.2 Other Examples
21.4 Plant Bioactive Natural Products
21.4.1 Synthesis and Regulation of Bioactive Natural Products
21.4.2 Metabolic Engineering Approaches
21.4.2.1 Selected Examples: Upregulated Anthocyanin Synthesis
21.4.2.2 Other Examples
21.5 Chloroplasts as the Site of Production
21.5.1 Metabolic Engineering Approaches
21.5.1.1 Selected Examples: A Light-Driven Power-House
21.5.1.2 Other Examples
21.6 Metabolic Engineered Production in Microalgae
21.6.1 Metabolic Engineering Approaches
21.6.1.1 Selected Examples: Future Production Organisms
21.6.1.2 Other Examples
21.7 Metabolons – Advantages Using Plants
21.7.1 Metabolic Engineering Approaches
21.7.1.1 Selected Examples: Metabolic Highways
21.7.1.2 Other Examples
21.8 Biocondensates
21.8.1 Metabolic Engineering Approaches
21.8.1.1 Selected Examples: In Cell Storage Capacity
21.9 Conclusion: Metabolic Engineering of Plants in the Transition Toward a Biobased Society
Acknowledgments
References

22 Metabolic Engineering for Large-Scale Environmental Bioremediation
Pablo I. Nikel and Víctor de Lorenzo

22.1 Introduction
22.2 Metabolic Engineering for Bioremediation: From 2.0 to 3.0
22.3 Dealing with Global Environmental Waste
22.4 Beyond Bioremediation 3.0: From the Test Tube to Planet Earth
22.5 Bottlenecks in the Development of Environmental Biocatalysts
22.6 Chassis for Delivery of Activities Beneficial for the Environment
22.7 Manufacturing Catalytic Consortia
22.8 Environmental Galenics
22.9 Toward HGT-Based, Large-Scale Bioremediation
22.10 Conclusions and Future Prospects: Towards Bioremediation 4.0
Acknowledgments
References
Index

Preface

We are facing unprecedented challenges: climate change, an increasing and aging population that places growing pressure on limited resources, environmental problems including plastic waste, and, more recently, the COVID-19 pandemic. Metabolic engineering will play an increasingly important role in addressing many of these challenges. The central theme is sustainability, and the ability of metabolic engineering to create sustainable solutions by efficiently utilizing renewable resources. This goal has motivated the many contributors to this volume. Even under the extremely difficult conditions of the COVID-19 pandemic, experts worldwide happily agreed to contribute to a book on metabolic engineering and completed their chapters on time. We are most grateful to the authors for their commitment, dedication, and the quality of their work. This book comprises two parts, concepts and applications. The concepts part starts with Chapter 1 on the history and perspectives of metabolic engineering, which also outlines directions for future metabolic engineering studies. This introductory chapter is followed by an insightful Chapter 2 on genome-scale modeling and simulation, which over the last couple of decades have become essential tools for understanding metabolism and designing metabolic engineering strategies. Metabolic flux analysis provides quantitative information on how fluxes are distributed in a metabolic network and is thus essential in metabolic engineering. Chapter 3 describes how metabolic fluxes are determined from data collected following labeling with stable isotopes. Genome-scale metabolic simulation can be performed much more realistically when proper constraints are applied; Chapter 4 describes how constraints from proteome data can be implemented to achieve this goal. Chapter 5 covers kinetic models that allow analysis of pathway fluxes based on enzyme and substrate concentrations and general enzyme properties.
This chapter is followed by Chapter 6 on metabolic control analysis, a theoretical framework for understanding how changes in the activity of one or multiple enzymes affect the fluxes and metabolite concentrations of metabolic networks. Chapter 7 describes the thermodynamics of metabolic pathways, focusing on the thermodynamic feasibility of pathway reactions, which is essential in pathway design. Chapter 8 then naturally follows, describing how to design metabolic pathways through a four-step strategy: defining the biochemical search space, searching for pathways, assigning enzymes to each reaction step, and evaluating pathway performance. Metabolic engineering cannot exist without a thorough understanding of metabolites.


Chapter 9 describes how to perform metabolome analysis and data processing, together with emerging trends in metabolomics. As the technologies for genome engineering of eukaryotes are far behind those for prokaryotes, we decided to include a dedicated Chapter 10 on genome editing of eukaryotes as the last chapter of Part 1. The chapter also describes general tools that can be employed in any cell type. Part 2 is devoted to applications of metabolic engineering in different organisms. Besides traditional workhorse strains such as Escherichia coli (Chapter 11), Corynebacteria (Chapter 12), Bacillus (Chapter 13), and yeasts (Chapter 18), emerging host strains including Pseudomonas (Chapter 14), lactic acid bacteria (Chapter 15), Clostridia (Chapter 16), actinomycetes (Chapter 17), Yarrowia lipolytica (Chapter 19), filamentous fungi (Chapter 20), and photosynthetic organisms (Chapter 21) are covered. These chapters showcase how metabolic engineering is performed in the respective organisms for the production of a vast array of products, including chemicals, fuels, materials, drugs, and natural functional compounds. The metabolic engineering strategies employed in these organisms are often universal, yet some have been uniquely developed for, and applied to, the specific host described in each chapter. We believe that these concrete metabolic engineering examples will be helpful to those working on these and similar topics. The final chapter, Chapter 22, covers bioremediation, reflecting the importance of developing strategies to deal with increasing stresses on the environment. This chapter emphasizes that metabolic engineering is not only important for “producing something useful for humans,” but also essential for “improving our environment.” We anticipate that this book will serve as a textbook for senior undergraduate and junior graduate metabolic engineering classes. The book will also be a useful resource for researchers working in the field of metabolic engineering.
We want to thank again all the authors who contributed their expertise to this volume. Last but not least, we want to thank the Wiley team, who worked tirelessly in communicating with authors, copy editing, and finalizing the book in such a nice manner. We hope that you will enjoy reading this book as much as we enjoyed editing it.

November 2020

Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos

Part 1 Concepts

1 Metabolic Engineering Perspectives
Nian Liu and Gregory Stephanopoulos
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA

1.1 History and Overview of Metabolic Engineering

Metabolic engineering emerged in the late 1980s, primarily to capitalize on the advent of recombinant DNA technologies that allowed, for the first time, the direct genetic modification of microbial cells. Its formal debut came with the first Metabolic Engineering conference in 1996, the renamed successor of the conference “Recombinant DNA Biotechnology III: The Integration of Biological and Engineering Science.” The peer-reviewed scientific journal Metabolic Engineering soon followed, and the book Metabolic Engineering: Principles and Methodologies completed the essentials of a new discipline. Even prior to its formal establishment, the ideas of metabolic engineering had already emerged. In a sense, the field was preceded by the study of mixed cultures: if a particular conversion from substrate A to a desired final product B could not be accomplished by a single organism, which could only convert A to an intermediate I, then it was logical to complement that organism with another species that completes the route from I to B. Recombinant technology essentially allowed the isolation and transfer of the genes comprising the pathway from I to B, so that the complete conversion could be accomplished in a single organism. Despite the subsequent decline of interest in mixed cultures, many concepts of coexistence and stability, as well as related mathematical methods of nonlinear dynamics and bifurcation theory, found their way into the analysis of recombinant cultures in various configurations. Of course, metabolic engineering would not be in its current position without molecular biology, which lies at the heart of modern biotechnology and has numerous applications.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

In plant sciences, molecular biology tools enable the introduction of new, useful traits in crops, such as drought and salinity resistance [1]; in the medical area, they facilitate the identification of genes underlying a disease and the development of gene therapy as a cure [2]; in environmental applications, they are used for the degradation of recalcitrant compounds [3]. In the microbial world, the central objective of metabolic engineering and associated industrial biotechnology applications is the overproduction of chemical and fuel products, either native to an organism or newly synthesized through the introduction of a heterologous
pathway. A microorganism is thus converted into a “chemical factory” that executes new biochemistry through its numerous native and non-native enzymes. Many parallels can be drawn between this approach and conventional chemical processes. For instance, just as chemical conversions are determined by the stoichiometry, kinetics, and thermodynamics of reactions, a microbial pathway is defined by the same physical parameters of its constituent enzymes. Similar to the necessity of identifying rate-limiting steps in chemical processes, a central goal of metabolic engineering is the analysis of bottlenecking steps in a biochemical reaction network. A key difference between the two is that, while the means of overcoming limiting steps are restricted in chemistry, molecular biology tools, including gene deletion and overexpression, can specifically target bottlenecking enzymes to enhance overall cell productivity. At this point, a question commonly asked by scientists and engineers is “Why should one use microbes instead of chemistry to carry out these reactions?” The answer lies in the unique ability of enzymes to conduct complex chemistry with high specificity. Thus, cell-catalyzed processes will be the preferred methods for making more intricate molecules such as pharmaceuticals, vitamins, proteins, probiotics, and similar compounds. While biotechnology can make most of these products in a few steps, chemical methods would require a much longer synthesis route, including a series of unavoidable protection–deprotection steps, to accomplish the same goal. The second class of applications where biotechnology is likely to be superior is sustainable production, which requires the use of renewable feedstocks. Sugars, as a prime example, tend to be highly reactive, and attempts to modify them using organic chemistry techniques commonly generate many byproducts.
On the other hand, these renewable compounds are ideal substrates for most microorganisms. With the power of molecular biology and metabolic engineering, sugars can be converted to the target product (organic acids, alcohols, biopolymers, solvents, and many other chemical products) with high yield and specificity. Correspondingly, these applications have fueled much interest in metabolic and microbial cell engineering to achieve diverse goals. Despite this focus on product biosynthesis, it should be noted that the methodology of metabolic engineering is applicable to nearly all areas of biotechnological activity. For example, a judicious choice of isotopic tracers and analysis of labeled metabolites identified the function of a reverse TCA cycle in cancer cells under hypoxia [4]. This discovery had profound implications for our understanding of cancer metabolism and its treatment. In plant sciences, transferring genes of unknown function into yeast cells and characterizing the resulting metabolic steps has led to the elucidation of a new pathway responsible for the synthesis of cucurbitacins, which plants use for defense against pests [5]. A similar strategy has also been used to identify highly effective, naturally synthesized herbicides [6]. These are just a few examples illustrating the broad application of metabolic engineering tools developed for understanding and manipulating cell physiology, and there is no doubt that these tools will find further applications in the future. Nevertheless, to maintain a tighter focus, in this book we restrict our discussion to processes that use microbial biocatalysts as the enabling element.

Hence, this volume is mostly dedicated to microbial systems and reviews the issues and methods related to improving the capability of host cells to produce useful products. Host selection, pathway design and expression, assessment of pathway function, elimination of stoichiometric limitations and kinetic bottlenecks, and evaluation of cell performance in bioreactor environments are all core topics underlying the various chapters. Experimental and mathematical tools that help achieve strain optimization are also discussed. In the remainder of this chapter, we briefly touch upon the central ideas of metabolic engineering. First and foremost, metabolic engineering is closely tied to microbial metabolism. This relationship is addressed further in the next section, where computational and experimental methods that dissect cellular physiology are presented. Particular attention is given to methods probing cell-wide and genome-wide properties, as they provide a holistic rather than a local view of cellular metabolism, which is a hallmark of metabolic engineering. In Section 1.3, we examine the two general approaches to engineering a better cell catalyst, rational and combinatorial, along with systems metabolic engineering, which combines the two. In the final few sections, we examine other important topics of metabolic engineering, such as host cell selection, substrate considerations, and synthetic biology, before closing with an assessment of the state of the field and its future directions.

1.2 Understanding Cellular Metabolism and Physiology

In metabolic engineering, cellular metabolism is viewed as a network of biochemical reactions that can be exploited to convert a starting substrate to the final product through a sequence of steps. The traditional approach to metabolic engineering relies on first developing a systematic understanding of the metabolic network, with an eye on kinetic bottlenecks and stoichiometric limitations, and then applying this knowledge to engineer pathways that funnel flux toward the desired product. Earlier efforts using this methodology often took a “localized” view of metabolic pathways, in that only the steps directly connecting the substrate to the product were considered. This paradigm, despite being a considerable simplification of biological complexity, has seen great success in improving the titer, productivity, and yield of several bioproducts, such as amino acids [7]. Since the number of reactions that need to be considered is relatively small, it is possible to interrogate each enzyme manually to determine its kinetic and thermodynamic limitations, shedding light on how the properties of individual steps affect flux through the entire pathway. Once this is known, an overall engineering strategy can be formulated and carried out. However, this “localized” view succeeds only in situations where a few enzymatic steps are relevant, which restricts its range of applications. As the field progressed, the biosynthesis of complex molecules with greater structural and functional diversity quickly became the focus of many researchers. Correspondingly, new tools and
methods have been developed to better understand cellular metabolism and guide engineering on a “global” scale. Regardless of the approach a metabolic engineer takes in designing pathways, a basic understanding of cellular metabolism is essential, and it is thus the focus of this section.

1.2.1 Computational Methods in Understanding Metabolism

As mentioned above, the first step in a strain engineering project is to understand how cells behave metabolically, and many computational methods have been developed to accomplish this. Metabolic control analysis (MCA), originally developed in the 1970s, represents one of the earliest efforts [8]. This theoretical framework examines the effect of individual enzymes on a pathway flux and defines the metrics that describe this relationship. Of particular relevance to metabolic engineering is its focus on the rigidity or flexibility of enzymes in a metabolic network. The central idea is that bottlenecking (“rigid”) steps respond minimally to changes in other steps and therefore kinetically limit the pathway as a whole. Quantitatively, this is captured by the change in flux through the entire pathway resulting from the perturbation of an individual enzyme, expressed as a flux control coefficient: the fractional change in pathway flux divided by the fractional change in that enzyme's activity. Such an analysis tells the researcher which individual targets in the network have the greatest global impact on a cell's overall metabolism; these deserve the most attention in pathway optimization. Similar analyses can be applied to investigate the sensitivity of the flux distribution at branch points of a metabolic network, with the idea that rigid branch points (or nodes) control the selectivity of a product by limiting the fraction of flux from a central pathway that can be diverted to it. Besides identifying key control points, other computational methods focus on determining and visualizing flux distributions across a metabolic network: metabolic flux analysis (MFA) [9] and flux balance analysis (FBA) [10]. The goal of both methods is to calculate the pathway flux values in a preconstructed model subject to several physical and biological constraints. Each method has its unique strengths and weaknesses. MFA uses experimentally determined extracellular fluxes (i.e.
substrate consumption and product formation rates) and metabolite isotopic enrichment patterns (see Section 1.2.2) to determine the set of flux values that best fits these measurements while satisfying mass balance constraints. The results are therefore experimentally supported and can be quite accurate. A critical aspect of MFA is the degree of redundancy achieved with the selected isotopic tracers and measured metabolite enrichments: more redundancy leads to more precisely determined flux values. In other words, MFA is a nonlinear regression problem with fewer parameters (i.e. pathway flux values) than experimental observations (i.e. extracellular fluxes and metabolite enrichments), and hence it can at times be challenging to obtain a good fit. This in turn highlights the importance of constructing the metabolic model to reflect the pathways within a cell as accurately as possible. As such, MFA typically considers only a small subset of all biochemical reactions, most notably well-characterized pathways such as central carbon metabolism, amino acid synthesis, and other secondary pathways. On the other hand, FBA is primarily based on computation
and can be used on much larger models, even genome-scale models (GSMs), in which nearly all biochemical reactions inferred from genome annotation are included. This results in greatly underdetermined systems, and to obtain a solution an optimization criterion is imposed, such as the maximization of growth yield. The optimal solution must also satisfy mass balance and thermodynamic constraints. Since no experimental input is required, FBA can rapidly screen a vast number of scenarios in silico, which is one of its major advantages. Nevertheless, while FBA operates on models that capture a more holistic view of the cell, the underconstrained nature of the system calls for suitable objective functions whose predictions agree with experimental data, and such functions can be challenging to find. Whether MFA or FBA is used, both can be powerful tools for analyzing complex metabolic networks when correctly implemented. It is also worth noting that, when carrying out these exercises, the most valuable information often resides not in the absolute values of fluxes but in their variations from a base case. In other words, flux variations resulting from genetic modifications or from changes in organism type and culture conditions can reveal information that is masked when each condition is studied in isolation. For instance, plots of flux against enzyme level can be illuminating: a linear relationship suggests that the enzyme is likely a limiting step. A key limitation of MFA and FBA is that they operate on stoichiometric models with no kinetic inputs. Since enzyme kinetics are key determinants of pathway dynamics, genome-scale kinetic models have also been developed in recent years [11]. As the name suggests, these models are parameterized with unknown rate constants associated with each enzyme, which can be determined by fitting experimental flux data.
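As a rough illustration of how a kinetic model supports both metabolic control analysis and in silico target testing, the sketch below builds a small hypothetical pathway rather than a genome-scale model. Enzyme 1 converts a fixed external substrate S into an intermediate X1 with Michaelis–Menten kinetics and is feedback-inhibited by X2 (assumed inhibition constant KI); enzyme 2 converts X1 into X2; a first-order step standing in for enzyme 3 drains X2. All parameter values and the names `PARAMS`, `rates`, `steady_state_flux`, and `flux_control_coefficients` are assumptions made for this example. The mass balances are integrated to steady state with SciPy's `solve_ivp`, and each flux control coefficient is estimated by a central finite difference of ln J with respect to ln E_i.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameters, chosen only for illustration.
PARAMS = {
    "S": 5.0,                              # external substrate, held constant
    "kcat1": 10.0, "Km1": 1.0, "KI": 2.0,  # enzyme 1, feedback-inhibited by X2
    "kcat2": 10.0, "Km2": 0.5,             # enzyme 2
    "k3": 1.0,                             # first-order drain of X2 per unit of enzyme 3
}


def rates(x, e, p=PARAMS):
    """Return the rates (v1, v2, v3) for metabolite levels x = (X1, X2) and enzyme levels e."""
    x1, x2 = x
    v1 = p["kcat1"] * e[0] * p["S"] / (p["Km1"] + p["S"]) / (1.0 + x2 / p["KI"])
    v2 = p["kcat2"] * e[1] * x1 / (p["Km2"] + x1)
    v3 = p["k3"] * e[2] * x2
    return v1, v2, v3


def steady_state_flux(e, p=PARAMS, t_end=200.0):
    """Integrate dX1/dt = v1 - v2 and dX2/dt = v2 - v3 to steady state; return the pathway flux."""
    def rhs(t, x):
        v1, v2, v3 = rates(x, e, p)
        return [v1 - v2, v2 - v3]

    sol = solve_ivp(rhs, (0.0, t_end), [0.0, 0.0], rtol=1e-10, atol=1e-12)
    if not sol.success:
        raise RuntimeError(sol.message)
    return rates(sol.y[:, -1], e, p)[2]  # at steady state v1 = v2 = v3 = J


def flux_control_coefficients(e, h=1e-3):
    """Estimate C_i = dlnJ/dlnE_i for every enzyme by central finite differences."""
    coeffs = []
    for i in range(len(e)):
        up, down = list(e), list(e)
        up[i] *= 1.0 + h
        down[i] *= 1.0 - h
        dlnJ = np.log(steady_state_flux(up)) - np.log(steady_state_flux(down))
        coeffs.append(dlnJ / (np.log(1.0 + h) - np.log(1.0 - h)))
    return coeffs


if __name__ == "__main__":
    base = [1.0, 1.0, 1.0]
    J0 = steady_state_flux(base)
    print(f"baseline flux: {J0:.3f}")
    print("flux control coefficients:", np.round(flux_control_coefficients(base), 3))
    # In silico target test: double each enzyme in turn and compare steady-state fluxes.
    for i in range(3):
        e = list(base)
        e[i] *= 2.0
        print(f"2x enzyme {i + 1}: flux ratio {steady_state_flux(e) / J0:.2f}")
```

With these made-up numbers, enzyme 2 should come out with a control coefficient near zero (it is far from saturation, so the steady-state flux does not depend on it), while enzymes 1 and 3 share the control and the coefficients sum to about 1, as the MCA summation theorem requires. Because of the feedback inhibition, doubling enzyme 1 raises the flux by clearly less than twofold, which is the kind of insight such a model can provide before any wet-lab work.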
If Michaelis–Menten kinetics are assumed, for example, the unknowns will be the kcat and Km values for each enzymatic reaction, along with enzyme-level regulatory interactions and their associated KI values. Enzyme and metabolite concentrations are also commonly included as partially determined parameters. Because of how these models are constructed, they tend to capture the intricacies of metabolism in vivo, and the ability to incorporate regulatory effects is a major advantage. Furthermore, once the rate parameters have been estimated, the model can be used to generate additional flux distributions in response to changing conditions. This generally allows better prediction of fluxes where no experimental data exist. Using this information, one can test different engineering targets in silico and observe how the fluxes within a cell change, in order to determine the best engineering strategy before performing any wet-lab experiments.

1.2.2 Experimental Methods in Understanding Metabolism

A key question raised in the early days of metabolic engineering was how it differed from genetic engineering. The simple answer was that metabolic engineering concerned itself with the properties of metabolic networks viewed as a system rather than as a collection of individual genes and enzymes. This introduced a new mindset in research that was strongly reinforced by the emergence of “omics” technologies. “Omics” refers to the
profiling of individual cellular components enabled by state-of-the-art analytical techniques and biological assays. These measurements can be applied to nearly all constituents of the cell, including DNA (genomics), RNA (transcriptomics), proteins (proteomics), metabolites (metabolomics), and lipids (lipidomics), and are invaluable to metabolic engineering as they provide holistic information about the cell state. Transcriptomics, for example, surveys the transcription level of nearly every gene and allows one to evaluate all physiological effects of a particular genetic modification or environmental change, rather than only those within the confines of a particular metabolic section [12]. In addition, “omics” data can guide engineering efforts. With genomics data, one can design and assemble DNA fragments to upregulate or downregulate genes, redirecting metabolic flux into desired pathways. Similarly, transcriptomics and proteomics reveal specific elements that can be engineered either to take advantage of biological regulation (e.g. inducible promoters) [13] or to evade it (e.g. mutation of allosteric sites) [13, 14]. Metabolomics is of particular interest to metabolic engineers as it provides a snapshot of the numerous extracellular and intracellular metabolites. Despite its complexity, this information is indispensable in assessing changes in the metabolic phenotype of cells in response to genetic or enzymatic changes, as pathway activity is ultimately reflected in the profiles of the underlying metabolites. The ability to detect and quantify key metabolites, enabled by chromatographic separation techniques and mass spectrometry (MS), has developed greatly over the past several decades, allowing one to probe compounds within cells at concentrations as low as several nmol per gram of cell dry weight (gCDW). How these metabolites regulate pathway flux can then be assessed.
For instance, changes in the reduced-to-oxidized ratio of common electron carriers (e.g. NAD(P)H/NAD(P)+) globally affect the thermodynamics of many redox reactions [15]. As another example, high levels of a pathway intermediate can, in some cases, accelerate downstream enzymes through mass action or, in other cases, cause feedback inhibition of upstream enzymes. Closely related to these developments are methods that measure metabolite concentrations in intracellular compartments, such as the mitochondria and the endoplasmic reticulum (ER) [16]. Many important reactions that take place specifically in such compartments cannot presently be accessed with existing metabolite extraction methods. Therefore, compartment-specific molecular sensors for metabolites have been constructed, which can provide valuable information for understanding the control of flux and for improving the production of compounds synthesized in subcellular locations [17]. With better accuracy and precision in metabolite quantification, the resulting data help researchers appreciate the importance of metabolite pool sizes in the thermodynamic and kinetic control of the associated enzymes. It is therefore common to see metabolomics integrated into many strain engineering studies today. Another concept closely related to metabolomics is isotopic tracing, which provides information by tracking the flow of constituent atoms or groups of atoms as they pass from metabolite to metabolite. This is done by first providing the cells with an isotopically labeled tracer (replacing media components
with isotopically labeled versions) and then analyzing the isotopic enrichment of the metabolome. Isotopic tracing requires detailed knowledge of the atom transitions of all participating reactions, as well as of nearly all known metabolic pathways. Nevertheless, the outcomes can be very rewarding, as such data have been instrumental in the discovery of new pathways [18], determination of flux ratios at key branch points [19], measurement of enzyme reversibility [20], quantification of pathway flux in conjunction with MFA (Section 1.2.1) [9], and analysis of the contributions of individual pathways to the final product [21]. At this point it is worth stressing that all “omics” and isotopic tracing measurements are intimately related to one another. Metabolite pool sizes are determined by upstream and downstream pathway strengths, which in turn depend on transcriptional and translational profiles. Isotopic labeling patterns depend on the reaction fluxes of the associated enzymes, which are affected by the expression of the associated genes. Because of these intricacies, interpreting specific “omics” data can be challenging, and care should be taken to avoid drawing misleading conclusions from their changes, especially conclusions about causality. Unfortunately, there are no generalized tools for the automated mechanistic interpretation of the plethora of “omics” data, which limits how much value can be extracted from these measurements. Another point worth mentioning is that “omics” profiles are dynamic and constantly changing with time, albeit on different time scales. Hence, taking “omics” measurements over time with frequent sampling, along with developing sensors for continuous monitoring, will help shed more light on the dynamics of cellular processes.
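To make the determination of flux ratios at branch points more concrete, a common simplification treats the measured mass isotopomer distribution (MID) of a product as a mixture of the MIDs that each converging route would produce on its own. The sketch below fits the mixing fraction by least squares; the function name `branch_fraction`, the two route MIDs, and the “measured” MID are all invented for illustration, and a real analysis would also correct for natural isotope abundance and weight the fit by measurement error, which this sketch ignores.

```python
import numpy as np


def branch_fraction(mid_obs, mid_a, mid_b):
    """Least-squares estimate of f in  mid_obs ≈ f * mid_a + (1 - f) * mid_b.

    Each argument is the mass isotopomer distribution (fractions of M+0, M+1, ...)
    of the same metabolite. The result is clipped to the physically meaningful range [0, 1].
    """
    y = np.asarray(mid_obs, dtype=float)
    a = np.asarray(mid_a, dtype=float)
    b = np.asarray(mid_b, dtype=float)
    d = a - b
    # Minimizing ||(y - b) - f * d||^2 over f gives f = d.(y - b) / d.d
    f = float(np.dot(d, y - b) / np.dot(d, d))
    return min(max(f, 0.0), 1.0)


# Hypothetical MIDs (M+0, M+1, M+2) of a product that can be formed by two routes
mid_route_a = [0.10, 0.80, 0.10]   # route A mostly yields singly labeled product
mid_route_b = [0.20, 0.10, 0.70]   # route B mostly yields doubly labeled product
mid_measured = [0.17, 0.31, 0.52]

f_a = branch_fraction(mid_measured, mid_route_a, mid_route_b)
print(f"estimated fraction of flux through route A: {f_a:.2f}")
```

Here the “measured” MID was constructed as 30% route A plus 70% route B, so the fit recovers a route A fraction of 0.30. With real data, each route's MID depends on the tracer chosen, which is why tracer selection determines how well such ratios can be resolved.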

1.3 General Approaches to Metabolic Engineering

When cellular metabolism is sufficiently understood in terms of kinetics and regulation, one can formulate strategies for rewiring the native metabolic network of an organism to achieve a certain objective. This paradigm of engineering cells based on a priori knowledge is referred to as rational metabolic engineering and has been broadly applied, especially in the early days of metabolic engineering. In recent years, with the advent of fast and automated DNA synthesis methods, robotics, and high-throughput screening methods, a different approach has emerged that attempts to generate improved production phenotypes by broadly searching a space comprising numerous random genetic variants of a base strain. This approach, termed combinatorial metabolic engineering, relies less on a basic understanding of fundamental biochemistry and physiology and is more of a systematic search process. Although each approach has its own pros and cons, both have demonstrated considerable success in various applications. In fact, the most successful examples of metabolic engineering integrate ideas from both approaches as well as from other related disciplines. This has led to the emergence of systems metabolic engineering as a more recent framework tailored toward strain design in industrial biotechnology.
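The search space in combinatorial metabolic engineering grows multiplicatively with every design choice, which is why high-throughput construction and screening are central to the approach. As a back-of-the-envelope sketch, the snippet below counts the variants of a pathway in which each gene can be paired with any promoter and ribosome binding site (RBS) from a small part collection, and shows how little of that space a fixed screening budget covers. The gene, promoter, and RBS names and counts and the screening budget are hypothetical values chosen for illustration, not numbers from this chapter.

```python
import random
from itertools import product

# Hypothetical design: each of 4 pathway genes gets one of 5 promoters and one of 3 RBSs.
genes = ["geneA", "geneB", "geneC", "geneD"]
promoters = ["P1", "P2", "P3", "P4", "P5"]
rbs_set = ["R1", "R2", "R3"]

# Expression options for a single gene: every promoter-RBS pairing (5 * 3 = 15).
parts_per_gene = list(product(promoters, rbs_set))
# Independent choice per gene gives 15 ** 4 combinations in total.
library_size = len(parts_per_gene) ** len(genes)

screening_budget = 1000  # assumed number of clones that can be screened
print(f"design space: {library_size} variants")
print(f"fraction covered by screening {screening_budget}: {screening_budget / library_size:.1%}")

# Draw a few random designs, the way a randomly assembled library samples the space.
rng = random.Random(0)
sample = [
    dict(zip(genes, (rng.choice(parts_per_gene) for _ in genes)))
    for _ in range(3)
]
for design in sample:
    print(design)
```

Even this modest part collection yields tens of thousands of designs, so a screen of a thousand clones samples only a few percent of the space. This is one reason the most successful efforts combine combinatorial search with rational knowledge that narrows the space first.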

1.3.1 Rational Metabolic Engineering

If we understand the kinetics, thermodynamics, and regulation of all reactions participating in a bioreaction network, it is possible to determine the optimal enzymatic profile that will maximize formation of the desired product. In an extreme scenario where we have access to reliable metabolic models that capture all characteristics and interactions within a cell, the optimal genetic modification to an organism that best achieves our goal can be determined mathematically without any experimentation. This represents the ideal case of rational metabolic engineering. Since such complete understanding of enzymatic networks is rarely available, practical rational engineering relies on experiments that provide much of the missing information in sufficient detail to allow targeted modification of the genome. Typically, the workflow can be generalized to the following steps: (i) design biochemical pathways that can convert the starting substrate to the final product; (ii) source the necessary genes that encode the enzymes of the pathway; (iii) introduce the pathway into the host organism using molecular biology tools and optimize gene expression as well as enzyme kinetics. There are a number of considerations in executing each of these steps. For instance, pathway design requires finding the reaction paths that work most efficiently; it is typically better to have fewer enzymatic steps, minimal cofactor requirements, and few decarboxylation steps (which lose material in the form of CO2). Hence, properly scoring candidate pathways using quantifiable metrics, such as the ones mentioned above, is of critical importance. When choosing the source of the required enzymes, there is usually a trade-off between the ease of expression and stability of the host's native enzymes and the superior catalytic properties of heterologous enzymes.
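One simple way to score candidate pathways with the metrics just listed is a weighted penalty. The sketch below is purely illustrative: the candidate routes, their step, cofactor, and decarboxylation counts, the weights, and the names `CandidatePathway`, `WEIGHTS`, and `penalty` are all hypothetical, and a practical scheme would also weigh thermodynamic feasibility and enzyme availability. Carbon loss is computed as the fraction of substrate carbon released as CO2.

```python
from dataclasses import dataclass


@dataclass
class CandidatePathway:
    name: str
    steps: int              # number of enzymatic steps
    cofactors: int          # cofactor molecules (e.g. NAD(P)H, ATP) consumed per product molecule
    decarboxylations: int   # CO2 molecules released per substrate molecule
    substrate_carbons: int  # carbon atoms in the starting substrate

    def carbon_loss(self) -> float:
        """Fraction of substrate carbon lost as CO2."""
        return self.decarboxylations / self.substrate_carbons


# Hypothetical weights: a larger value means a stronger penalty for that metric.
WEIGHTS = {"steps": 1.0, "cofactors": 0.5, "carbon_loss": 10.0}


def penalty(p: CandidatePathway) -> float:
    """Weighted penalty; a lower value marks a more attractive pathway."""
    return (WEIGHTS["steps"] * p.steps
            + WEIGHTS["cofactors"] * p.cofactors
            + WEIGHTS["carbon_loss"] * p.carbon_loss())


candidates = [
    CandidatePathway("route_1", steps=5, cofactors=2, decarboxylations=2, substrate_carbons=6),
    CandidatePathway("route_2", steps=7, cofactors=1, decarboxylations=0, substrate_carbons=6),
    CandidatePathway("route_3", steps=4, cofactors=4, decarboxylations=1, substrate_carbons=6),
]

for p in sorted(candidates, key=penalty):
    print(f"{p.name}: penalty {penalty(p):.2f}, carbon loss {p.carbon_loss():.0%}")
```

With these weights, the longest route ranks first because it releases no CO2, which shows how strongly the ranking depends on the relative weights. In practice the weights encode project priorities (yield versus pathway length, for example) and are worth varying to check that the preferred route is robust.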
As for pathway implementation and optimization, in addition to fine-tuning the parameters of directly participating enzymes, we also need to identify potential branching pathways, balance the needs of cell growth and production, analyze how the product is sequestered, and so on. Throughout the rational engineering process, tools for assessing metabolism are invaluable in addressing the many challenges that arise. In addition, new insights into how the pathways function can be gained after they are implemented and tested. It is therefore necessary to repeat these steps multiple times, each time using the new information obtained from the previous iteration, to fully realize the potential of the designed pathway. Although typical rational metabolic engineering efforts follow this roadmap, detailed strategies vary from case to case. Nevertheless, over the 30 or so years of the field's development, several rules have emerged and proven widely applicable. The most common way to increase the amount of a product is to overexpress bottlenecking enzymes within the production pathway and knock out enzymes in competing pathways [22]. Here, overexpression of a bottlenecking enzyme, usually achieved by increasing promoter strength or gene copy number, elevates the amount of the protein that catalyzes the rate-limiting step, thereby increasing overall flux toward product formation. On the other hand, knocking out enzymes through

1.3 General Approaches to Metabolic Engineering

loss of function mutations or blocking transcription does the exact opposite, diminishing the undesired competing reactions that siphon flux away from the desired direction. Combining the two alters the native metabolism of the cell and results in a funneling effect that conserves the starting substrate’s mass and energy into the final product as much as possible. However, this is not always fail-safe. In some cases, distant metabolic pathways that seemingly have no relation to the product may play a key role and hence need to be considered as well. Consequently, when multiple relevant pathways are present in parallel, their respective pathway strengths should be evaluated and balanced in a way that precisely satisfies the product requirements [23]. Even in cases of linear pathways, care must be given to balancing upstream and downstream pathway flux with respect to a key metabolic intermediate in order to avoid its unwanted accumulation [24]. These examples illustrate the central idea of current metabolic engineering research, which is to precisely control flux through metabolic pathways as needed without it being too weak or too strong. To this end, using promoters with varied expression levels, manipulating gene copy numbers, tuning the kinetic properties and stability of enzymes, changing the substrate–enzyme binding strength, and introducing genetic circuits are all effective methods to achieve such degree of control [25]. Furthermore, from a systems biology point of view, one can also alter and control the flux distribution of the cell simply through the introduction of multiple judiciously chosen substrates at proper feed rates, so long as they are co-utilized [21]. This can help circumvent challenges related to genetic engineering (i.e. some organisms are not transformable) while still being able to markedly improve product yield and productivity. It is also important to note that the control of flux can also be achieved spatially and temporally. 
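The point about balancing upstream and downstream flux around an intermediate can be made concrete with a minimal model. Assume a constant upstream supply flux v_in into an intermediate I, drained by a single downstream enzyme with Michaelis-Menten kinetics, so that dI/dt = v_in - Vmax*I/(KM + I). A steady state exists only if v_in < Vmax, at I* = KM*v_in/(Vmax - v_in); otherwise the intermediate accumulates without bound. The parameter values below are hypothetical, and the model ignores growth dilution, feedback inhibition, and toxicity.

```python
def steady_state_intermediate(v_in: float, vmax: float, km: float):
    """Analytical steady state of dI/dt = v_in - vmax*I/(km + I).

    Returns None when v_in >= vmax: the downstream enzyme cannot keep up,
    so no steady state exists and the intermediate accumulates without bound.
    """
    if v_in >= vmax:
        return None
    return km * v_in / (vmax - v_in)


def simulate_intermediate(v_in: float, vmax: float, km: float,
                          t_end: float = 100.0, dt: float = 0.01,
                          i0: float = 0.0) -> float:
    """Forward-Euler integration of the same balance; returns I at t_end."""
    i = i0
    for _ in range(int(round(t_end / dt))):
        v_out = vmax * i / (km + i)  # Michaelis-Menten downstream flux
        i += dt * (v_in - v_out)     # constant upstream supply minus consumption
    return i


if __name__ == "__main__":
    km, v_in = 0.5, 1.0           # hypothetical units: mM and mM/min
    for vmax in (2.0, 1.2, 0.8):  # downstream capacity, mM/min
        print(vmax, steady_state_intermediate(v_in, vmax, km),
              round(simulate_intermediate(v_in, vmax, km), 3))
```

As the downstream capacity approaches the upstream supply, the steady-state pool grows sharply (0.5 mM at Vmax = 2.0, 2.5 mM at Vmax = 1.2), and once supply exceeds capacity the pool keeps rising, which is the unwanted accumulation the text warns against.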
“Spatial control” refers to the ability to direct the biochemical reactions of a pathway to a subcellular location or organelle through enzyme targeting [26]. It can also be used to cluster several enzymes together through protein fusions [27] or synthetic scaffolds [28]. In either case, these strategies primarily serve to bring enzymes into close proximity with one another in order to minimize limitations associated with mass transfer and cross-membrane transport. “Temporal control,” on the other hand, commonly achieved through inducible systems, refers to changing the enzyme profile of the cell at a desired time. With modern tools that enable precise control of induction timing and level, this method has become increasingly powerful for redistributing limited cellular resources to resolve conflicts between a cell's native and engineered systems. One common application, for instance, is switching from a growth phase to a production phase [29]. Since cell growth generally relies on different pathways than product biosynthesis, inducible systems can delay expression of the production pathway so that the cells first replicate rapidly. Later, once sufficient biomass is present, the inducible system is activated, shifting the flux distribution to favor the production pathway. An added benefit of this strategy is that it mitigates product toxicity: because growth is purposefully kept at slower rates while the product accumulates, cells are less susceptible to its adverse effects.
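The growth-to-production switch can be illustrated with a toy batch simulation that compares inducing the production pathway at time zero with inducing it after a growth phase. This is a sketch under stated assumptions, not a validated process model: growth follows Monod kinetics on a single substrate, induction lowers the growth rate (a stand-in for metabolic burden), and induced cells make product at a fixed specific rate. Every parameter value and the function name `batch_titer` are hypothetical, and product toxicity is not modeled.

```python
def batch_titer(induce_at: float, t_end: float = 24.0, dt: float = 0.001,
                x0: float = 0.05, s0: float = 20.0,
                mu_max: float = 0.5, mu_induced: float = 0.05, q_p: float = 0.2,
                y_xs: float = 0.5, y_ps: float = 0.4, k_s: float = 0.1) -> float:
    """Final product titer (g/L) of a toy batch culture induced at `induce_at` hours.

    x = biomass, s = substrate, p = product (all g/L). Before induction cells
    grow at mu_max; afterwards growth slows to mu_induced and product forms at q_p.
    """
    x, s, p = x0, s0, 0.0
    for k in range(int(round(t_end / dt))):
        induced = k * dt >= induce_at
        monod = s / (k_s + s)                        # substrate limitation, 0..1
        mu = (mu_induced if induced else mu_max) * monod
        qp = q_p * monod if induced else 0.0
        growth, production = mu * x, qp * x          # g/L/h
        uptake = growth / y_xs + production / y_ps   # substrate consumption, g/L/h
        x += dt * growth
        p += dt * production
        s = max(s - dt * uptake, 0.0)                # clamp Euler overshoot at depletion
    return p


if __name__ == "__main__":
    for label, t_ind in [("one-stage (induced at 0 h)", 0.0),
                         ("two-stage (induced at 8 h)", 8.0),
                         ("never induced", float("inf"))]:
        print(f"{label}: {batch_titer(t_ind):.2f} g/L")
```

Under these assumptions, the two-stage run first builds biomass at the unburdened growth rate and then converts the remaining substrate with far more producing cells, so it ends with a much higher titer than the run induced from the start. The induction time is the design knob here: induce too late and substrate is spent on biomass; too early and too few cells are making product.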


1 Metabolic Engineering Perspectives

1.3.2 Combinatorial Metabolic Engineering

It is clear that the requirements of rational approaches, which entail detailed knowledge of pathway enzymes, are rarely satisfied in full. Given the limits of our understanding of actual enzyme behavior in vivo, researchers have turned to combinatorial searches of a space spanned by various biological parameters. This gave rise to combinatorial metabolic engineering, which seeks to systematically search through many, ideally all, permutations of a cell's relevant genetic material in order to find the combination that gives optimal results. The major advantage of this approach is that it reduces, to a certain degree, the need for a detailed understanding of the biochemical factors affecting a pathway or cell physiology. Its drawback, however, is that the genetic search space is truly vast, demanding an effective high-throughput screening procedure to complete the task within a reasonable time. In practice, combinatorial metabolic engineering can be applied to an organism as a whole or to a specific enzyme of choice. It typically begins with the rapid generation of a large number of variants of the targeted genetic parts, forming a library. Libraries can be constructed by random or semirandom mutagenesis techniques, such as UV irradiation, chemical mutagens, and error-prone polymerase chain reaction (PCR) [30]. The mutants within the library span a wide distribution of characteristics, and the basic premise of combinatorial engineering is that some of them will outperform the original wild-type variant. One major challenge in library construction, however, is achieving sufficient coverage of the design space so as not to miss any potential “hits.” This is especially important when engineering whole organisms, as certain parts of the genome may not be accessible to mutagenesis (e.g. DNA sequences tightly wrapped around histones), skewing the library toward particular types of mutants.
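The coverage problem can be quantified with a standard sampling argument. If a library contains V equally represented variants and N clones are picked at random (with replacement), the expected fraction of distinct variants observed is 1 - (1 - 1/V)^N, approximately 1 - exp(-N/V). The sketch below computes the size of a combinatorial design space and the number of clones needed for a target coverage; the design (3 genes, each with 5 promoters and 3 ribosome binding sites) is hypothetical.

```python
import math


def design_space_size(options_per_part):
    """Combinations when each genetic part (promoter, RBS, gene variant, ...) is varied independently."""
    return math.prod(options_per_part)


def expected_coverage(library_size: int, clones_screened: int) -> float:
    """Expected fraction of distinct variants seen after screening N clones.

    Assumes every variant is equally represented and clones are picked at
    random with replacement: coverage = 1 - (1 - 1/V)**N, roughly 1 - exp(-N/V).
    """
    return 1.0 - (1.0 - 1.0 / library_size) ** clones_screened


def clones_for_coverage(library_size: int, target: float) -> int:
    """Smallest N whose expected coverage reaches `target` (0 < target < 1)."""
    # Solve 1 - (1 - 1/V)**N >= target for N.
    n = math.log(1.0 - target) / math.log1p(-1.0 / library_size)
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical design: 3 genes, each with 5 promoters x 3 ribosome binding sites.
    v = design_space_size([5 * 3] * 3)
    for target in (0.63, 0.95, 0.99):
        print(f"{target:.0%} coverage of {v} variants: screen ~{clones_for_coverage(v, target)} clones")
```

The familiar rule of thumb falls out of this formula: roughly 3V clones give about 95% expected coverage. Uneven representation, such as regions that are hard to mutate, raises the number needed, so these figures are a lower bound.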
Another challenge specific to libraries generated by random mutagenesis is that they are often too large, with only a very small subpopulation exhibiting meaningful changes. An alternative approach is therefore to work only within a predetermined genetic space, which requires some basic understanding of the system at the outset. For instance, if one knows or determines that a particular enzyme is rate limiting, it might be sufficient to generate strain libraries comprising variants of this enzyme only. A systematic way to shrink the library while still permitting changes across the entire cellular network is to mutate global transcription factors. The global transcription machinery engineering (gTME) method [31], which targeted the cell's major sigma factor to maximize transcriptional diversity, is a successful demonstration of this idea. With the rapid development of DNA synthesis and assembly techniques, designing and constructing such specialized libraries is no longer tedious, which can greatly increase the chances of finding an improved mutant. Once the variants within a library have been generated, they can be tested individually, a process called screening. Choosing the right characteristic to benchmark each variant against is the most important factor, since it must not only reflect the desired phenotype but also enable rapid and accurate screening. As such, spectrophotometric or fluorescence techniques, including multiwell plate assays and flow cytometry, are generally employed because of their short run times per sample [32]. Microfluidic techniques are also useful here, reducing the experimental footprint of each test while facilitating simultaneous measurement of multiple samples [33]. Furthermore, microfluidics has been used to encapsulate individual cells in small droplets of the surrounding medium in order to screen for overproducers of secreted products, which was previously impossible because the product diffuses away from the producing cell [34]. Nevertheless, all of these techniques rely on the ability to benchmark a mutant through optical measurements, which is a major limitation because the vast majority of desirable phenotypes have no optical signature. Cleverly designed biosensors or fused fluorescent proteins can help overcome this barrier [32]. In recent years, efforts have also been made to shorten the turnaround time of analytical methods such as chromatography and mass spectrometry (MS), which broadens the array of measurable characteristics and ultimately leads to better assessment of performance [35]. With a chosen benchmarking trait and an appropriate screening technique, all variants within the generated library can be rapidly tested to identify the best-performing member. Additional rounds of mutation and selection are also possible using the promising new mutant as the base variant, although caution is warranted, as repeating this cycle does not guarantee a path toward the global maximum.

Closely related to combinatorial metabolic engineering is the use of evolution methods. Borrowing from natural evolution, these strategies streamline library construction and screening by applying selection pressures that gradually enrich a desirable subpopulation of the library.
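How quickly a selection pressure enriches a desirable subpopulation can be estimated with the simplest deterministic selection model. Assume two subpopulations, where the improved one has a constant relative fitness of 1 + s per generation, with no new mutations and no genetic drift. Under these assumptions, the odds p/(1 - p) of the improved subpopulation are multiplied by (1 + s) each generation. The starting frequency and growth advantages below are hypothetical.

```python
import math


def fitter_fraction(p0: float, s: float, generations: float) -> float:
    """Fraction of a fitter subpopulation after selection.

    Assumes two subpopulations with constant relative fitness 1 + s per
    generation, no new mutations, and no drift (deterministic model).
    """
    w = (1.0 + s) ** generations
    return p0 * w / (p0 * w + (1.0 - p0))


def generations_to_reach(p0: float, s: float, target: float) -> float:
    """Generations for the fitter subpopulation to go from p0 to target."""
    # Under this model the odds p/(1 - p) are multiplied by (1 + s) each generation.
    odds_ratio = (target / (1.0 - target)) / (p0 / (1.0 - p0))
    return math.log(odds_ratio) / math.log1p(s)


if __name__ == "__main__":
    p0 = 1e-6  # hypothetical: one improved variant per million cells
    for s in (0.01, 0.1, 0.3):  # hypothetical growth advantages
        g = generations_to_reach(p0, s, 0.5)
        print(f"s = {s}: ~{g:.0f} generations to reach 50% of the population")
```

A one-in-a-million variant with a 10% growth advantage needs about 145 generations to make up half the population, while a 1% advantage needs about 1400. The model also shows a caveat: it enriches any variant with a growth advantage, whether or not that variant carries the trait of interest.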
In a sense, a properly designed evolution experiment makes the library screen itself over time, retaining only the variants that exhibit the characteristics of interest. Directed evolution is a very successful application of this technique, aiming to generate new enzymes with orders of magnitude higher catalytic efficiency or the ability to catalyze new chemical reactions not found in nature [36]. The same concept can be applied to strain optimization, where the selection pressure is designed so that variants with a certain characteristic outgrow those without it [37]. Just as choosing the correct screening assay is important in combinatorial engineering, applying the appropriate selection pressure is crucial to the success of evolution studies. This is because evolution yields the phenotype most directly tied to the applied pressure, so it is paramount to ensure that this phenotype is closely correlated with the desired trait. Phage-assisted continuous evolution (PACE) is an excellent illustration of this idea: the applied selection pressure primarily selects for rapidly propagating bacteriophage [38]. In PACE, however, the ability of the phage to infect and hence propagate has been engineered to depend on the catalytic activity of the enzyme being evolved. Thus, despite the seemingly unrelated selection pressure, enzymes with higher activity emerge during the evolution process as a result of this link. Regardless of the approach used, once a superior strain or enzyme has been obtained, it is important to determine the genetic changes responsible for the improved phenotype. This can be a demanding task, especially when sequencing identifies several genetic alterations in the mutated strain, requiring the evaluation of all combinations of such changes. Furthermore, owing to the random nature of combinatorial engineering, many of the changes might be unrelated to the identified phenotype altogether, prompting further experimentation to determine the root cause of the improved performance. Recapitulating the improved phenotype in a clean genetic background is therefore critical for establishing the causal genetic factors of improvement, constructing robust production strains, and securing the associated intellectual property. A mechanistic understanding at the biochemical level should also follow, to better understand the genotype–phenotype correlation.

1.3.3 Systems Metabolic Engineering

Despite their differences in methodology, rational and combinatorial metabolic engineering complement each other in the overall strain engineering process. Rational engineering, based on established biological knowledge and previous experience in manipulating metabolic pathways, allows one to obtain an improved strain quickly and with a high probability of success. Combinatorial engineering, on the other hand, can explore uncharted regions of the genotype–phenotype landscape to push the capabilities of a rationally engineered strain further. Consequently, systems metabolic engineering has emerged as a more recent approach that integrates both facets of metabolic engineering while drawing on ideas from fields such as systems biology, synthetic biology, and “omics” studies [39, 40]. Although many of its strategies overlap with those discussed above, this integrated approach brings several noteworthy concepts to the field. In particular, the modular view of metabolic pathways in strain engineering and optimization deserves emphasis. Despite the tremendous diversity of biomolecules that can be synthesized with metabolic engineering, they can commonly be traced back to only a handful of precursors in central carbon metabolism. For instance, acyl molecules, isoprenoids, and polyketides can all be derived from acetyl-CoA, and nearly all aromatic compounds are derived from erythrose-4-phosphate (E4P). Because of this, many products with similar carbon backbones share common biosynthetic pathways [41]. Grouping these pathways into specific modules facilitates the lateral transfer of engineering knowledge across structurally similar compounds, accelerating strain construction.
The biosynthesis of isoprenoids is an excellent example. The pathway can be broken down into four major modules: glycolysis, to generate acetyl-CoA from glucose; the mevalonate (MVA) pathway, to generate isopentenyl pyrophosphate (IPP) from acetyl-CoA; the prenyl phosphate pathway, to build the carbon backbone; and specific cyclases and P450s, to generate the particular isoprenoid. Engineering strategies for the first three modules are largely similar for all isoprenoids across most organisms. Hence, the metabolic engineering of isoprenoids is greatly simplified: pathways need not be redesigned for each compound, and strains for new target products can be rapidly engineered by applying previously successful methods.
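The modular view can be expressed as a simple data structure in which the upstream modules are shared and only the prenyl chain length and the terminal module change from product to product. The sketch below is illustrative only. The module labels are descriptive strings rather than a real library API, and the function name `assemble_pathway` is hypothetical. The three example products use their standard prenyl pyrophosphate precursors (limonene from GPP, amorphadiene from FPP, taxadiene from GGPP), while downstream P450 oxidations are omitted.

```python
from typing import Dict, List, Tuple

# Shared upstream modules (descriptive labels only, not a library API).
SHARED_MODULES: List[str] = [
    "glycolysis: glucose -> acetyl-CoA",
    "MVA pathway: acetyl-CoA -> IPP (and DMAPP via IPP isomerase)",
    "prenyl phosphate synthesis: IPP + DMAPP -> {precursor}",
]

# Product-specific terminal modules: (prenyl pyrophosphate precursor, terminal enzymes).
TERMINAL_MODULES: Dict[str, Tuple[str, List[str]]] = {
    "limonene": ("GPP (C10)", ["limonene synthase"]),
    "amorphadiene": ("FPP (C15)", ["amorphadiene synthase"]),
    "taxadiene": ("GGPP (C20)", ["taxadiene synthase"]),
}


def assemble_pathway(product: str) -> List[str]:
    """Reuse the shared modules; only the chain length and terminal module change."""
    precursor, enzymes = TERMINAL_MODULES[product]
    steps = [module.format(precursor=precursor) for module in SHARED_MODULES]
    steps.append(f"terminal module: {precursor} -> {product} via {', '.join(enzymes)}")
    return steps


if __name__ == "__main__":
    for product in TERMINAL_MODULES:
        print(product)
        for step in assemble_pathway(product):
            print("  ", step)
```

Adding a new isoprenoid target amounts to adding one entry to `TERMINAL_MODULES`. This mirrors how engineering effort on the shared modules can be reused across targets.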


1.4 Host Organism Selection

Choosing a suitable organism for the synthesis of a product usually occurs in conjunction with pathway design and is one of the very first steps in metabolic engineering, regardless of the approach used. In the early years, only model microbes, such as Escherichia coli among prokaryotes and Saccharomyces cerevisiae among eukaryotes, were sensible choices because of the extensive biological knowledge, strain collections, and genetic tools available for them. However, in nearly all cases, metabolic engineering alters only a very small subset of an organism's genetic information, leaving most of the genes that dictate the cell's basic biology unaltered. This suggests that model organisms may not be well suited to every application. Fortunately, nature offers a wide variety of host organisms, each with unique metabolic pathways, distinct promoter systems and enzymes, and particular substrate preferences and toxicity tolerances. Consequently, identifying each microorganism's strengths and exploiting them under suitable conditions can lead to extraordinary performance. For instance, certain cyanobacteria [42] and Clostridium [43] species have recently attracted attention for photosynthetic and nonphotosynthetic CO2 fixation, respectively, and Yarrowia lipolytica has been described as an ideal organism for producing acetyl-CoA-derived biomolecules [44]. Furthermore, by comparing the metabolism of these organisms, we can understand why they excel in their respective niches, deepening our fundamental understanding of biology. The major disadvantage of these organisms, however, is that there is far less experience with, and fewer tools for, their genetic modification than for E. coli and S. cerevisiae.
Hence, there is generally a tradeoff between model and nonmodel hosts. The former bring a wealth of information, culture collections, and molecular biology methods but may perform poorly in adverse environments, including high salinity, extreme pH, high temperature, or the presence of solvents and other inhibitors. The latter can offer specialized pathways and superior growth under particular conditions, although it may take a long time before genetic tools become available for introducing additional functionality into the cells. We should emphasize that many of the advantages of nonmodel organisms cannot be attributed to a single gene or enzyme but instead depend on complex interactions among the cells' genome, transcriptome, proteome, and metabolome. As such, it can be particularly difficult to replicate an organism's growth characteristics, inhibitor tolerance, or superior pathway flux in model organisms. For these reasons, systematizing the development of genetic tools for a few representative nonmodel organisms would markedly facilitate the progress of metabolic engineering by minimizing concerns about experimental feasibility during host selection.

1.5 Substrate Considerations

Although constructing a productive strain with metabolic engineering is paramount to the success of industrial biotechnology, the importance of the culture medium cannot be overemphasized. Accordingly, engineering cells to expand the range of utilizable carbon sources beyond glucose (the most common carbon source) has been an active area of investigation. Specifically, modulating substrate transporters, introducing substrate assimilation pathways, and enhancing tolerance to substrate toxicity have all been used to enable the utilization of numerous, and sometimes unconventional, substrates [45]. Some studies have even created artificial pathways to support growth on challenging yet important non-native carbon sources such as CO2 [46] and methanol [47]. Recently, emphasis has been placed on inexpensive and sustainable feedstocks, which commonly contain complex mixtures of nutrients (lignocellulosic biomass and municipal waste streams, for instance). In these cases, diauxic growth, which prevents the simultaneous consumption of multiple carbon and energy sources, poses particular limitations, and methods to abolish catabolite repression without hindering growth should receive priority. Regardless of the feedstock used, however, we emphasize that tailoring pathways and enzymes to the relevant set of substrates should always be an integral part of strain and process design. This is because each substrate has a unique carbon backbone structure, energy density, and entry point into metabolism, calling for appropriate pathway engineering and optimization to maximize the efficiency of its conversion into the product.
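One quantifiable aspect of a substrate's energy density is its degree of reduction: the number of electrons available per molecule relative to CO2 and H2O. For a compound with formula CcHhOo this is 4c + h - 2o, and dividing by c gives a per-carbon value. The sketch below tabulates this for a few carbon sources. The substrate list is an illustrative choice, and the metric captures neither a substrate's entry point into metabolism nor its ATP yield.

```python
from fractions import Fraction
from typing import Dict, Tuple

# Atoms per molecule (C, H, O) for a few illustrative carbon sources.
SUBSTRATES: Dict[str, Tuple[int, int, int]] = {
    "glucose": (6, 12, 6),
    "xylose": (5, 10, 5),
    "glycerol": (3, 8, 3),
    "acetic acid": (2, 4, 2),
    "methanol": (1, 4, 1),
    "formic acid": (1, 2, 2),
    "CO2": (1, 0, 2),
}


def degree_of_reduction(c: int, h: int, o: int) -> int:
    """Available electrons per molecule of a C/H/O compound (reference state CO2, H2O)."""
    return 4 * c + h - 2 * o


def reduction_per_carbon(c: int, h: int, o: int) -> Fraction:
    """Degree of reduction normalized per carbon atom (exact arithmetic)."""
    return Fraction(degree_of_reduction(c, h, o), c)


if __name__ == "__main__":
    ranked = sorted(SUBSTRATES.items(),
                    key=lambda item: reduction_per_carbon(*item[1]), reverse=True)
    for name, formula in ranked:
        print(f"{name:12s} {float(reduction_per_carbon(*formula)):.2f} electrons per carbon")
```

Methanol (6 electrons per carbon) is more reduced than glucose or xylose (4), and formate (2) and CO2 (0) are more oxidized. This helps explain why growth on CO2 or formate must draw energy and reducing power from elsewhere, while a reduced substrate can favor reduced products.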

1.6 Metabolic Engineering and Synthetic Biology

As mentioned, metabolic engineering emerged between 1985 and 1990 as a field that modulates metabolic pathways for the purpose of overproducing fuels, chemicals, and pharmaceuticals. Synthetic biology took shape as a field in the early 2000s, with an initial focus on constructing synthetic genetic circuits [48, 49]. Building on core technologies such as the assembly of natural or artificial genetic components, the field soon expanded to encompass biosensors, targeted gene editing, and artificial proteins and subcellular compartments [50]. There is significant overlap between metabolic engineering and synthetic biology, as both study cells and rely heavily on DNA synthesis, molecular biology methods, and engineering principles. Their focus, however, differs: metabolic engineering aims at optimizing microbial product synthesis and advancing industrial biotechnology, while synthetic biology concentrates on exploring and developing new biological concepts. Nevertheless, many synergies exist between the two fields, allowing each to benefit from the other. In particular, metabolic engineering gains from the synthetic DNA, protein engineering tools, and control circuits created by synthetic biologists to advance the synthesis and control of metabolic pathways. Similarly, synthetic biology learns from the pathway design, analysis, and optimization concepts that are central to metabolic engineering.


1.7 The Future of Metabolic Engineering

It has been more than a quarter of a century since metabolic engineering first emerged. The concept of engineering microorganisms with the tools of molecular biology to produce fuels, polymers, food products, and pharmaceuticals is now widely accepted. As such, it drives the next generation of applications in industrial biotechnology, where some microbial technologies are entirely novel and others compete directly with chemical processes. Evidence of this is the steady growth in the number of successfully synthesized bioproducts as well as in the variety of host cells that have been engineered. At the same time, the sophistication of pathway design and optimization has risen considerably, enabling the biosynthesis of complex molecules at remarkable rates. While this state of affairs attests to the great success of metabolic engineering, it also prompts a closer look at what the future holds for the field and what new challenges lie ahead. Most of the unsolved questions revolve around characterizing cellular physiology more efficiently in order to assess the impact of genetic modifications. As mentioned, DNA synthesis and assembly technologies now allow multiplexed vector construction and strain transformation at a large scale. However, the means of assessing the impact of these genetic modifications remain relatively underdeveloped. How can we systematically generate and interpret the full “omics” profile of an organism and its many genetic variants? What is the best way to characterize and evaluate any given phenotype? Once we have a methodology to characterize a cell's response to gene edits, the question arises of how this information can be used. Which parameters or experimental inputs describing the cell do we need in order to enhance the predictability of metabolic models?
How can we integrate the results of phenotyping into both rational and combinatorial engineering approaches? We also mentioned the importance of the various tools used to engineer microbes and proteins. In that vein, how can we create a pipeline for genetic tool development in anticipation of promising newly discovered organisms? To what extent can we engineer artificial pathways and enzymes to expand the boundaries of natural constraints? A definitive answer to any of these questions would mark a major milestone in the progress of metabolic engineering. In sharp contrast to other engineering disciplines, mathematical models and first principles have been rather underutilized in metabolic engineering, despite their significant value in helping us understand the intricacies of complex metabolic networks. This is understandable, considering the great complexity of biological systems and our limited ability to systematically generate and interpret big data that enumerate and explain key cellular processes. Therefore, another area of importance is the use of powerful new computational approaches that draw on advances in fields such as machine learning and artificial intelligence. These new algorithms could provide the missing link that unites all facets of metabolic engineering to accelerate the strain development process (Figure 1.1).

[Figure 1.1 schematic: strain engineering (base strain, predetermined knowledge, targets; library creation of mutant strains), strain screening (genotype and “omics” data), machine learning, and metabolic modeling (flux values and targets; ΔG°, kcat, KM; predicted metabolite concentrations and enzyme properties; “omics”-based model refinement).]

Figure 1.1 A potential generalized metabolic engineering strategy to rapidly create strains exhibiting a desired overproducing phenotype. The strategy integrates rational engineering, combinatorial engineering, high-throughput strain genotyping and phenotyping, machine learning, and metabolic modeling. Each of the aspects illustrated here is integral to the overall process, and they work in conjunction to better elucidate how certain changes in the microbe's genome lead to metabolic and physiological shifts.

Finally, although the focus of this book is on the application of metabolic engineering to industrial biotechnology, the field need not be limited to the realm of making products. After all, metabolic engineering more broadly encompasses the study of cellular metabolism and physiology for the purpose of identifying genetic elements that serve a particular end. The same tools and concepts have found very successful use in biomedical research, for example in cancer metabolism. We therefore think that many of these ideas can potentially benefit the study of diseases directly related to metabolism, such as diabetes, where we can draw parallels between diagnostics and pathway interrogation as well as between treatment and pathway rewiring. Beyond the obvious metabolic diseases, many other areas of human physiology may also benefit from this endeavor. Immunology, for instance, is closely intertwined with human metabolism, and the importance of T-cell metabolism for T-cell proliferation, differentiation, and efficacy has only recently begun to be discussed, with huge potential for further investigation [51]. Overall, the unique focus of metabolic engineering on identifying genes as targets for altering cellular phenotype will find broad use in identifying the root causes of diseases and designing novel therapies. We envision that this will be another avenue of research through which metabolic engineering impacts our everyday lives.

References

1 Prado, J.R., Segers, G., Voelker, T. et al. (2014). Genetically engineered crops: from idea to product. Annu. Rev. Plant Biol. 65: 769–790.

2 Anguela, X.M. and High, K.A. (2019). Entering the modern era of gene therapy. Annu. Rev. Med. 70: 273–288.

3 Muyzer, G. and Stams, A.J.M. (2008). The ecology and biotechnology of sulphate-reducing bacteria. Nat. Rev. Microbiol. 6: 441–454.

4 Metallo, C.M., Gameiro, P.A., Bell, E.L. et al. (2012). Reductive glutamine metabolism by IDH1 mediates lipogenesis under hypoxia. Nature 481: 380–384.

5 Zhou, Y., Ma, Y., Zeng, J. et al. (2016). Convergence and divergence of bitterness biosynthesis and regulation in Cucurbitaceae. Nat. Plants 2: 1–8.

6 Yan, Y., Liu, Q., Zang, X. et al. (2018). Resistance-gene-directed discovery of a natural-product herbicide with a new mode of action. Nature 559: 415–418.

7 Vallino, J.J. and Stephanopoulos, G. (1993). Metabolic flux distributions in Corynebacterium glutamicum during growth and lysine overproduction. Biotechnol. Bioeng. 41: 633–646.

8 Heinrich, R. and Rapoport, T.A. (1974). A linear steady-state treatment of enzymatic chains: general properties, control and effector strength. Eur. J. Biochem. 42: 89–95.

9 Wiechert, W. (2001). 13C metabolic flux analysis. Metab. Eng. 3: 195–206.

10 Orth, J.D., Thiele, I., and Palsson, B.O. (2010). What is flux balance analysis? Nat. Biotechnol. 28: 245–248.

11 Tran, L.M., Rizk, M.L., and Liao, J.C. (2008). Ensemble modeling of metabolic networks. Biophys. J. 95: 5606–5617.

12 DeRisi, J.L., Iyer, V.R., and Brown, P.O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278: 680–686.

13 Maruyama, K., Todaka, D., Mizoi, J. et al. (2012). Identification of cis-acting promoter elements in cold- and dehydration-induced transcriptional pathways in Arabidopsis, rice, and soybean. DNA Res. 19: 37–49.

14 Jiménez-Osés, G., Osuna, S., Gao, X. et al. (2014). The role of distant mutations and allosteric regulation on LovD active site dynamics. Nat. Chem. Biol. 10: 431–436.

15 Canelas, A.B., Van Gulik, W.M., and Heijnen, J.J. (2008). Determination of the cytosolic free NAD/NADH ratio in Saccharomyces cerevisiae under steady-state and highly dynamic conditions. Biotechnol. Bioeng. 100: 734–743.

16 Chen, W.W., Freinkman, E., Wang, T. et al. (2016). Absolute quantification of matrix metabolites reveals the dynamics of mitochondrial metabolism. Cell 166: 1324–1337.e11.

17 Tao, R., Zhao, Y., Chu, H. et al. (2017). Genetically encoded fluorescent sensors reveal dynamic regulation of NADPH metabolism. Nat. Methods 14: 720–728.

18 Calvin, M. and Benson, A.A. (1948). The path of carbon in photosynthesis. Science 107: 476–479.

19 Jang, C., Chen, L., and Rabinowitz, J.D. (2018). Metabolomics and isotope tracing. Cell 173: 822–837.

20 Park, J.O., Tanner, L.B., Wei, M.H. et al. (2019). Near-equilibrium glycolysis supports metabolic homeostasis and energy yield. Nat. Chem. Biol. 15: 1001–1008.

21 Park, J.O., Liu, N., Holinski, K.M. et al. (2019). Synergistic substrate cofeeding stimulates reductive metabolism. Nat. Metab. 1: 643–651.

22 Pickens, L.B., Yang, Y., and Chooi, Y.-H. (2011). Metabolic engineering for the production of natural products. Annu. Rev. Chem. Biomol. Eng. 2: 211–236.

23 Luo, X., Reiter, M.A., d'Espaux, L. et al. (2019). Complete biosynthesis of cannabinoids and their unnatural analogues in yeast. Nature 567: 123–126.

24 Ajikumar, P.K., Xiao, W.H., Tyo, K.E. et al. (2010). Isoprenoid pathway optimization for Taxol precursor overproduction in Escherichia coli. Science 330: 70–74.

25 Jones, J.A., Toparlak, T.D., and Koffas, M.A.G. (2015). Metabolic pathway balancing and its role in the production of biofuels and chemicals. Curr. Opin. Biotechnol. 33: 52–59.

26 Xu, P., Qiao, K., Ahn, W.S., and Stephanopoulos, G. (2016). Engineering Yarrowia lipolytica as a platform for synthesis of drop-in transportation fuels and oleochemicals. Proc. Natl. Acad. Sci. U. S. A. 113: 10848–10853.

27 Zhang, Y., Li, S.Z., Li, J. et al. (2006). Using unnatural protein fusions to engineer resveratrol biosynthesis in yeast and mammalian cells. J. Am. Chem. Soc. 128: 13030–13031.

28 Dueber, J.E., Wu, G.C., Malmirchegini, G.R. et al. (2009). Synthetic protein scaffolds provide modular control over metabolic flux. Nat. Biotechnol. 27: 753–759.

29 Dinh, C.V. and Prather, K.L.J. (2019). Development of an autonomous and bifunctional quorum-sensing circuit for metabolic flux control in engineered Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 116: 25562–25568.

30 Adrio, J.L. and Demain, A.L. (2006). Genetic improvement of processes yielding microbial products. FEMS Microbiol. Rev. 30: 187–214.

31 Alper, H. and Stephanopoulos, G. (2007). Global transcription machinery engineering: a new approach for improving cellular phenotype. Metab. Eng. 9: 258–267.

32 Dietrich, J.A., McKee, A.E., and Keasling, J.D. (2010). High-throughput metabolic engineering: advances in small-molecule screening and selection. Annu. Rev. Biochem. 79: 563–590.

33 Du, G., Fang, Q., and den Toonder, J.M.J. (2016). Microfluidics for cell-based high throughput screening platforms – a review. Anal. Chim. Acta 903: 36–50.

34 Wang, B.L., Ghaderi, A., Zhou, H. et al. (2014). Microfluidic high-throughput culturing of single cells for selection based on extracellular metabolite production or consumption. Nat. Biotechnol. 32: 473–478.

35 Imaduwage, K.P., Lakbub, J., Go, E.P., and Desaire, H. (2017). Rapid LC-MS based high-throughput screening method, affording no false positives or false negatives, identifies a new inhibitor for carbonic anhydrase. Sci. Rep. 7: 1–10.

References

36 Arnold, F.H. and Volkov, A.A. (1999). Directed evolution of biocatalysts.

Curr. Opin. Biotechnol. 3: 54–59. 37 Pontrelli, S., Fricke, R.C.B., Sakurai, S.S.M. et al. (2018). Directed strain evo-

38 39 40

41 42

43

44 45

46

47

48 49 50 51

lution restructures metabolism for 1-butanol production in minimal media. Metab. Eng. 49: 153–163. Esvelt, K.M., Carlson, J.C., and Liu, D.R. (2011). A system for the continuous directed evolution of biomolecules. Nature 472: 499–503. Lee, S.Y. and Kim, H.U. (2015). Systems strategies for developing industrial microbial strains. Nat. Biotechnol. 33: 1061–1072. Choi, K.R., Jang, W.D., Yang, D. et al. (2019). Systems metabolic engineering strategies: integrating systems and synthetic biology with metabolic engineering. Trends Biotechnol. 37: 817–837. Lee, S.Y., Kim, H.U., Chae, T.U. et al. (2019). A comprehensive metabolic map for production of bio-based chemicals. Nat. Catal. 2: 18–33. Kanno, M., Carroll, A.L., and Atsumi, S. (2017). Global metabolic rewiring for improved CO2 fixation and chemical production in cyanobacteria. Nat. Commun. 8: 1–11. Jones, S.W., Fast, A., Carlson, E. et al. (2016). CO2 fixation by anaerobic non-photosynthetic mixotrophy for improved carbon conversion. Nat. Commun. 7: 20411723. Abdel-Mawgoud, A.M., Markham, K.A., Palmer, C.M. et al. (2018). Metabolic engineering in the host Yarrowia lipolytica. Metab. Eng. 50: 192–208. Ledesma-Amaro, R. and Nicaud, J.M. (2016). Metabolic engineering for expanding the substrate range of Yarrowia lipolytica. Trends Biotechnol. 34: 798–809. Gleizer, S., Ben-Nissan, R., Bar-On, Y.M. et al. (2019). Conversion of Escherichia coli to generate all biomass carbon from CO2 . Cell 179: 1255–1263.e12. Kim, S., Lindner, S.N., Aslan, S. et al. (2020). Growth of Escherichia coli on formate and methanol via the reductive glycine pathway. Nat. Chem. Biol. https://doi.org/10.1038/s41589-020-0473-5. Elowitz, M.B. and Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature 403: 335–338. Gardner, T.S., Cantor, C.R., and Collins, J.J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature 403: 339–342. Khalil, A.S. and Collins, J.J. (2010). 
Synthetic biology: applications come of age. Nat. Rev. Genet. 11: 367–379. Klein Geltink, R.I., Kyle, R.L., and Pearce, E.L. (2018). Unraveling the complex interplay between T cell metabolism and function. Annu. Rev. Immunol. 36: 461–488.

21

23

2 Genome-Scale Models

Two Decades of Progress and a 2020 Vision

Bernhard O. Palsson

Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA

2.1 Introduction

In a seminal paper in 1991, Jay Bailey outlined the field of metabolic engineering [1]. A few years later, whole microbial genome sequences started to appear: Haemophilus influenzae in 1995 [2] and Escherichia coli K-12 MG1655, a strain commonly used in metabolic engineering, in 1997 [3]. Because metabolic genes are a well-characterized class of gene products, the extensive annotation of these genome sequences with metabolic functions made it possible to reconstruct genome-scale metabolic networks [4, 5]. Such network reconstructions could be converted into computational models using flux balance analysis (FBA) [6]. Because they could explain metabolic capabilities and predict the consequences of gene deletions, these models proved useful in the metabolic engineering process. Genome-scale models (GEMs) have advanced notably over the past 20 years, through expansion in scope [7], validation using large data sets [8], and improved computational tools [9–11]. Consequently, GEMs have been broadly used for quantitative studies in microbiology [12–16]. In this chapter, I review these developments, which started 20 years ago, and attempt to forecast what may lie ahead in the coming decade.

2.2 Flux Balance Analysis

The dominant mathematical language and simulation approach used for GEMs today is that of constraint-based modeling via FBA. I begin this chapter by describing the fundamentals of this approach.

2.2.1 Dynamic Mass Balances

Metabolic engineering has roots in the field of chemical engineering and biomanufacturing. Chemical engineers use dynamic mass balances to describe chemical processes. Dynamic mass balances assume a homogeneous and continuous medium in which the reactions take place, and they are formulated by adding up all the fluxes going into and out of a node in a metabolic network. A node is represented by the concentration of a particular metabolite, $x_i$. The fluxes into and out of the node fall into four categories (Figure 2.1): (i) metabolic reactions that synthesize the metabolite ($v_{syn}$), (ii) metabolic reactions that degrade the metabolite ($v_{deg}$), (iii) the uses of a metabolite for cellular functions ($v_{use}$), and (iv) transporters that import or export the metabolite ($v_{trans}$):

$$\frac{dx_i}{dt} = v_{syn} - v_{deg} - v_{use} + v_{trans}$$

The first three are "internal" fluxes, denoted by $v_i$, while the last is a boundary flux denoted by $b_i$. The latter is given a different symbol because it can be measured directly by monitoring the concentration of $x_i$ in the medium. Because $b$ is defined as the net transport out of the system (Figure 2.1), $b_i = -v_{trans}$. Collecting the dynamic mass balances for all nodes gives the matrix differential equation

$$\frac{d\mathbf{x}}{dt} = \mathbf{S}\mathbf{v} - \mathbf{b}$$

where $\mathbf{v}$ and $\mathbf{b}$ are the vectors of internal and boundary fluxes, respectively, and $\mathbf{S}$ is the stoichiometric matrix. A network reconstruction is a knowledge base that leads to the formulation of $\mathbf{S}$, as I discuss below.

[Figure 2.1: (a) A metabolite pool x_i inside the system boundary (cell membrane), affected by the fluxes v_trans, v_syn, v_deg, and v_use; dynamic mass balance dx/dt = S·v − b, with x = metabolite concentrations, S = stoichiometric matrix, v = reaction fluxes, v = fn(x, ...), and b = net transport out. (b) Flux balances in the steady state, S·v = b, for the unknown metabolic fluxes v, given metabolic requirements, e.g. for growth and maintenance.]

Figure 2.1 Basics of flux balance analysis. (a) The types of fluxes that affect the concentration of a metabolite ($x_i$). (b) Differential equations that describe the dynamic mass balances and the algebraic matrix equations that describe flux balances. Source: Modified from Varma and Palsson [6].

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

The dynamic mass balance equations take the form of ordinary differential equations. To solve them, the fluxes must be expressed through reaction rate laws, $\mathbf{v} = fn(\mathbf{x}, \ldots)$, which require myriad parameter values. Unfortunately, for most metabolic processes, numerical values for such parameters are not available, making it difficult to fully formulate the dynamic mass balance equations.

If the sum of fluxes into a node exceeds the sum of the fluxes exiting the node, the time derivative of the concentration, $dx_i/dt$, is positive, and thus the concentration $x_i$ increases with time, and vice versa. If the input and output fluxes sum to zero, then the time derivative is zero and $x_i$ does not change over time, i.e. the node is in a steady state. The steady state equation is

$$\mathbf{S}\mathbf{v} = \mathbf{b}$$

This is an algebraic matrix equation with the elements of $\mathbf{v}$ as the variables; the concentrations do not appear. Often the vector $\mathbf{b}$ is moved into the matrix $\mathbf{S}$ to form an alternate version of the flux balance equations (see [17]):

$$\mathbf{S}\mathbf{v} = \mathbf{0}$$

This equation represents a balanced flux state around every node in the network. Thus, using it to analyze the properties of a network is referred to as flux balance analysis, or FBA.

2.2.2 Analogy to Deriving Enzymatic Rate Equations

Note that applying the quasi-steady state assumption (QSSA) to the intermediate X in the Michaelis–Menten reaction mechanism (i.e. setting $d[X]/dt = 0$)

$$S + E \rightleftharpoons X \rightarrow P + E$$

where S is the substrate, E is the free enzyme, and P is the product, leads to the well-known Michaelis–Menten equation

$$v = \frac{V_m [S]}{K_m + [S]}$$

where $V_m$ is the maximum reaction velocity, $K_m$ is the Michaelis–Menten constant, and [S] is the concentration of the substrate. The QSSA is used to derive the basic enzymatic rate laws [18, 19]. It is based on a flux balance around the intermediate complex.

2.2.3 Formulating Flux Balances at the Genome-Scale

Bacterial metabolism operates on short time scales and adjusts quickly to changes in the cellular environment. Therefore, a QSSA can be applied relative to the time scale of growth and environmental changes, such that the time derivatives in the dynamic mass balances become zero. This assumption converts the dynamic mass balances into a simple algebraic equation, Sv = 0, that represents flux balances in the steady state. In this global flux balance equation, the stoichiometric matrix (S) contains the stoichiometric coefficients, which, together with measured boundary fluxes, are what is needed to formulate the equations. The fluxes through every node across the entire network are balanced simultaneously. A solution is a flux vector (v) that describes the metabolic state of a cell in a given environment. This simple matrix equation has few parameters but represents an underdetermined system: a metabolic network typically has more reactions (unknown fluxes) than metabolites (equations). It therefore has an infinite number of steady state solutions, contained in the null space of the stoichiometric matrix, Null(S), each of which balances the flux balance equations and forms a steady state (Figure 2.2).
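A small numerical example can make the null space concrete. The sketch below assumes NumPy and a hypothetical network with three metabolites (A, B, C) and five reactions (uptake of A, A → B, A → C, C → B, and a drain of B); it is not a real organism. It obtains an orthonormal basis for Null(S) from the singular value decomposition of S and checks that both the basis and two intuitive routes through the network satisfy Sv = 0.

```python
import numpy as np

# Hypothetical toy network. Rows: metabolites A, B, C.
# Columns: v1 uptake of A, v2 A -> B, v3 A -> C, v4 C -> B, v5 drain of B.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)


def null_space(S, tol=1e-10):
    """Orthonormal basis (as columns) for Null(S), computed from the SVD of S."""
    _, sing, vt = np.linalg.svd(S)          # full_matrices=True, so vt is n x n
    rank = int(np.sum(sing > tol))
    return vt[rank:].T                      # right singular vectors beyond the rank


N = null_space(S)
# prints: reactions: 5 rank: 3 null-space dimension: 2
print("reactions:", S.shape[1], "rank:", np.linalg.matrix_rank(S),
      "null-space dimension:", N.shape[1])

# Two intuitive steady-state routes also lie in Null(S):
# A -> B -> out, and A -> C -> B -> out.
routes = np.array([[1, 1, 0, 0, 1],
                   [1, 0, 1, 1, 1]], dtype=float).T
print("S @ routes =\n", S @ routes)

# Any linear combination of the basis vectors is a steady state (dx/dt = Sv = 0).
# It may contain negative fluxes; capacity constraints on v are needed to exclude those.
v = N @ np.array([0.3, -1.2])
print("max |Sv| =", np.abs(S @ v).max())
```

With five reactions and rank(S) = 3, the null space is two-dimensional, so every steady state of this network is a combination of two basis vectors. The null space alone admits negative fluxes through irreversible reactions; the capacity constraints shown with Figure 2.2 remove those.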


[Figure 2.2: solution space plotted on the axes v1, v2, and v_obj.]

Figure 2.2 Conceptual depiction of the solution space. The null space of S, combined with capacity constraints on the reaction fluxes (a_i < v_i < b_i), forms a polytope in which optimal solutions are found. The solution space depicted is formed by two reactions (v1 and v2) and an objective function (v_obj). FBA either maximizes or minimizes the flux of a user-defined reaction (i.e. the objective function) using linear programming. Optimal solutions lie at the periphery of the solution space. One optimal solution is shown by the green circle. Another common COnstraint-Based Reconstruction and Analysis (COBRA) method is flux variability analysis (FVA), in which the maximum and minimum fluxes through each reaction are computed while the flux of the objective function is constrained near its maximum value. The FVA points are shown by yellow circles mapping out a whole face of the solution space. Yet another common COBRA method is Markov chain Monte Carlo (MCMC) sampling, which computes randomly distributed candidate flux distributions (shown as red dots) throughout the solution space. Many useful properties are derived from the probability distributions of the fluxes. Unlike optimization, MCMC sampling is unbiased, as no objective needs to be assumed. Source: Bordbar et al. [20]. © 2014, Reprinted by permission from Springer Nature.
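The two optimization problems described in this caption, FBA and FVA, can be sketched as linear programs. The example below is a minimal sketch, assuming SciPy's `scipy.optimize.linprog` with the HiGHS method and a hypothetical network (uptake of A, A → B, A → C, C → B, and a biomass drain of B). The uptake cap of 10 (arbitrary units) and the 99% optimality fraction used for FVA are illustrative choices, not values from the chapter.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B, C (rows); reactions (columns) below.
RXNS = ["uptake_A", "A_to_B", "A_to_C", "C_to_B", "biomass"]
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
# Capacity constraints a_i <= v_i <= b_i: all irreversible, uptake of A capped at 10.
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
BIOMASS = RXNS.index("biomass")


def optimize(c, bounds):
    """Minimize c.v subject to Sv = 0 and the given flux bounds."""
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun, res.x


def fba(obj, bounds=BOUNDS):
    """FBA: maximize the flux of reaction `obj` (linprog minimizes, so negate c)."""
    c = np.zeros(S.shape[1])
    c[obj] = -1.0
    fun, v = optimize(c, bounds)
    return -fun, v


def fva(obj, fraction=0.99, bounds=BOUNDS):
    """FVA: min and max of each flux while the objective stays >= fraction * optimum."""
    opt, _ = fba(obj, bounds)
    fixed = list(bounds)
    fixed[obj] = (fraction * opt, bounds[obj][1])
    ranges = []
    for j in range(S.shape[1]):
        c = np.zeros(S.shape[1])
        c[j] = 1.0
        lo, _ = optimize(c, fixed)
        hi, _ = optimize(-c, fixed)
        ranges.append((lo, -hi))
    return ranges


growth, v = fba(BIOMASS)
print("maximal biomass flux:", round(growth, 6))
for name, (lo, hi) in zip(RXNS, fva(BIOMASS)):
    print(f"{name:9s} [{lo:6.2f}, {hi:6.2f}]")
```

The optimum here is not unique: biomass can be made from A directly or via C, so at 99% of the maximal biomass flux of 10, FVA reports the full range 0 to 10 for A_to_B, A_to_C, and C_to_B. Such alternative optima are what the FVA points in the figure outline.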

2.2.4 Constrained Optimization

Additional criteria are therefore needed to find a particular solution of interest. An objective function can be stated to search the null space for the solutions that optimize it. Since the biomass composition can be measured and is relatively well known, an objective function that represents biomass synthesis can be formulated [21]. Optimizing this objective function within the null space gives the best growth rate that a cell can achieve under the conditions that define the solution space. This optimization problem is fully defined if S (the result of network reconstruction), measurements of the needed b_i, and the biomass composition are available. This information can be obtained for a particular strain under the growth conditions of interest. The assumptions made and the information used to formulate the optimization problem are well described and well understood. Primers on how to use constraint-based modeling are available [14, 16, 22–25].

2.2.5 Principles

The basic principles underlying FBA are illustrated in Figure 2.3. They are relatively simple and clearly defined, and all parameters are either formulated from experimental data or fundamental and knowable (i.e. stoichiometric coefficients).

[Figure 2.3: three panels on the axes v1, v2, and v3, showing the unconstrained solution space; the allowable solution space after imposing the constraints (1) Sv = 0 and (2) a_i < v_i < b_i; and the optimal solution obtained by optimization (maximize Z).]

Figure 2.3 The basic steps of constraint-based modeling. Source: Orth et al. [22]. © 2010, Reprinted by permission from Springer Nature.

The combination of genome-scale reconstruction and applications of constraint-based modeling is known as COnstraint-Based Reconstruction and Analysis (COBRA). At the core of these methods are FBA and other types of constraint-based modeling methods. COBRA involves two major steps: first, forming the solution space based on the reconstruction and the imposed constraints, and second, finding solutions of interest in the solution space using optimization. The flexibility of COBRA methods to incorporate disparate data types and fundamental considerations, and to formulate different optimization problems, led to the emergence of a community of researchers in the early 2000s and spawned a proliferation of methods and their applications (Figure 2.4). The details of many of these optimization methods are described in a recent book [26].

2.2.6 Additional Constraints

The constraint-based approach is flexible and readily accommodates additional constraints [17, 27]. The more data and information we have about a system, the more constraints can be formulated. Although steady state flux balancing on individual nodes of the metabolic network is a reasonable assumption for growing bacteria, it is not always applicable. For instance, metabolomic profiles have been obtained for human red blood cells (which are enucleated and thus nondividing) under blood storage conditions; they show that metabolites present in high concentrations are not at steady state and that changes in these concentrations drive changes in the state of the metabolic network [28]. Such time profiles of the metabolome can be used to estimate the rates of change of these metabolites. These estimates can then replace the zeros in the flux balances on the corresponding nodes, so that each of these balances is constrained by the estimated time derivative instead. This approach is called unsteady FBA (uFBA), and it is embodied in the workflow illustrated in Figure 2.5. This example is typical of the practical use of COBRA methods as part of complex workflows, helping with strain design and the interpretation of data obtained from bioprocesses. More on this below.
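The idea of imbalancing selected nodes can be sketched by replacing the zero right-hand side of Sv = 0 with estimated time derivatives. The sketch assumes SciPy's `linprog` and a hypothetical network with metabolites A, B, C and five reactions (uptake of A capped at 10, A → B, A → C, C → B, and a biomass drain of B); the rates of change are made-up numbers. It illustrates the principle only and is not a reproduction of the published uFBA workflow, which involves additional steps.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B, C.
# Columns: uptake of A, A -> B, A -> C, C -> B, biomass drain of B.
S = np.array([
    [1, -1, -1,  0,  0],   # A
    [0,  1,  0,  1, -1],   # B
    [0,  0,  1, -1,  0],   # C
], dtype=float)
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
BIOMASS = 4


def max_biomass(dxdt):
    """Maximize biomass flux subject to S v = dx/dt (dx/dt = 0 is ordinary FBA)."""
    c = np.zeros(S.shape[1])
    c[BIOMASS] = -1.0
    res = linprog(c, A_eq=S, b_eq=dxdt, bounds=BOUNDS, method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun


steady = max_biomass(np.zeros(3))
# Hypothetical rates estimated from a metabolite time course:
# the pool of B either accumulates or is drawn down at 1 unit per unit time.
b_accumulating = max_biomass(np.array([0.0, 1.0, 0.0]))
b_depleting = max_biomass(np.array([0.0, -1.0, 0.0]))
print(steady, b_accumulating, b_depleting)
```

Accumulation of B diverts flux away from the biomass drain (10 → 9), whereas drawing down the pool of B supplements it (10 → 11). A variant in the spirit of the inequality constraints shown in Figure 2.5 would bound Sv between lower and upper estimates of each derivative instead of fixing it.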


[Figure 2.4: tree of COBRA methods, divided into biased and unbiased approaches. Branches include flux balance analysis with biological constraints, network-based pathway analysis, regulatory mechanisms, thermodynamics, flux minimization, constraints from 'omic data, gene deletion design, reaction addition design, reaction perturbation design, algorithmic gap-filling (model refinement), and objective function development, with example methods such as FVA, MOMA, ROOM, rFBA, PROM, TMFA, GIMME, MADE, OptKnock, OptStrain, OptForce, GapFind/GapFill, GrowMatch, and ObjFind.]

Figure 2.4 The phylogenetic tree of COBRA methods as of 2012, showing the major applications of constraint-based modeling. Source: Lewis et al. [12], where these classes of algorithms are detailed. © 2012, Reprinted by permission from Springer Nature.

2.2.7 Flux–Concentration Duality

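Finally, I note that there is a duality between fluxes and concentrations in the dynamic mass balance equations. Thus, either fluxes or concentrations can be used as the state variables describing the state of a metabolic network [30]. Fluxes have been the dominant variables, as flux balances define the balanced, or homeostatic, state of a network, which is the hallmark of systems biology (i.e. the study of how myriad cellular components function together to produce a phenotype). Concentrations, however, are more easily measured than fluxes. Just as Null(S) contains the steady state flux vectors, the left null space of S, i.e. the vectors $\mathbf{l}$ with $\mathbf{l}^T\mathbf{S} = \mathbf{0}$, describes the concentration side: each such vector defines a combination of concentrations, $\mathbf{l}^T\mathbf{x}$, that does not change with time, because $\mathbf{l}^T d\mathbf{x}/dt = \mathbf{l}^T\mathbf{S}\mathbf{v} = 0$ (with the boundary fluxes included in S). The left null space of S is therefore likely to become important, as it may lead to a new set of experimentally derived constraints. Unlike Null(S), the left null space remains largely unexplored [31–33].

2.2.8 Recap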

Flux balances were used to analyze small-scale networks [6] when the first genome sequences became available in the mid to late 1990s. The annotation of these genomes enabled the development of genome-scale metabolic network reconstructions. With realistic networks and the initial success of FBA methods, a community of researchers formed to advance the COBRA methods, which have since been employed in a wide range of biological research pursuits. Given the importance of optimal performance in production strains, COBRA tools found their way into complex workflows that are used for strain design and the development of bioprocesses [14, 24, 34]. Numerous COBRA methods have been developed, as have the computational tools with which to implement them [9, 13, 25, 35–38].

[Figure 2.5: workflow panels labeled Flux 1, Flux 2, and Flux 3, with constraints of the form S·v ≥ b1, S·v ≤ b2, and vmin ≤ v ≤ vmax.]

Figure 2.5 A workflow that has been developed to incorporate knowledge of time-dependent changes in metabolite concentrations. The concentration profiles define the time derivatives of the concentrations, which in turn are used to imbalance the flux balances on the corresponding nodes by the measured rates of change. Source: Bordbar et al. [29]. CC BY 4.0

2.3 Network Reconstruction

2.3.1 Assembling the Reactome

The basics of FBA and constraint-based modeling are conceptually simple. In practice, applying them is intricate and laborious, given the scale of GEMs. We need an accurate genome-scale version of S for the target strain. Formulating it requires following a protocol for the complex genome-scale reconstruction process. This protocol has been laid out [37, 38] and partially automated [39], and quality control standards have been defined for it [40]. Just as a genome sequence is assembled from sequencing reads, a metabolic reactome for a genome is reconstructed from the chemical equations of the reactions catalyzed by every enzyme known to occur in the target organism (Figure 2.6).

2.3.2 Basic Principles of Network Reconstruction

The principles of network reconstruction are conceptually straightforward. We need to identify all the genes on a genome that encode enzymes, find the biochemical reactions that they catalyze, and then organize this knowledge into a format that can be queried computationally and converted into GEMs.

[Figure 2.6: genome assembly (base pairs, reads, contigs, chromosomes) comprehensively represents the genetic material in all human cells; network annotation (compounds, reactions, pathways, gene–protein relationships) comprehensively represents the biochemical activities in all human cells.]

Figure 2.6 Genomes and reactomes are reconstructed from their constituent parts – reads and reactions, respectively – to form a coherent whole or an "ome." Source: Palsson [17]. Reproduced with permission from Cambridge University Press.

[Figure 2.7: (a) network map of the first reactions of glycolysis (EX_glc, GLCt1, HEX1, PGI, PFK, FBP, FBA, TPI); (b) the corresponding stoichiometric matrix S, with one row per metabolite (glc-D[e], glc-D, atp, h, adp, g6p, f6p, fdp, pi, h2o, g3p, dhap) and one column per reaction.]

Figure 2.7 Principles of network reconstruction. (a) The first few reactions in glycolysis are shown, and (b) the corresponding stoichiometric matrix is formulated. Reactions: GLCt1, glucose transport (import); HEX1, hexokinase (D-glucose:ATP); PGI, phosphoglucose isomerase; PFK, phosphofructokinase; FBP, fructose bisphosphatase; FBA, fructose-bisphosphate aldolase; TPI, triose-phosphate isomerase; EX_glc, glucose exchange. Metabolites: glc-D[e], D-glucose (extracellular); glc-D, D-glucose; atp, adenosine triphosphate; h, proton (H+); adp, adenosine diphosphate; g6p, D-glucose 6-phosphate; f6p, D-fructose 6-phosphate; fdp, D-fructose 1,6-bisphosphate; pi, phosphate; h2o, water; g3p, glyceraldehyde 3-phosphate; dhap, dihydroxyacetone phosphate. Source: Palsson [17]. Reproduced with permission from Cambridge University Press.

The basic principles can be illustrated using a simple example that describes the first few reactions in glycolysis (Figure 2.7). First, the set of reactions is defined and a pathway or network map is sketched out. Then, a list of proper chemical equations is written down that describes every reaction in the network, including the boundary fluxes. These chemical equations have to be accurately represented, for instance, with respect to elemental and charge balancing. Coarse-grained thermodynamic information in terms of reversibility or irreversibility is noted. A matrix of stoichiometric coefficients is then formed. In this matrix, a row corresponds to a metabolite (i.e. a node in the network) and a column corresponds to a biochemical reaction (i.e. a link in the network). The entries in the matrix are integer stoichiometric coefficients. This makes S a "connectivity matrix" that contains topological information about the network, as discussed below.
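The procedure just described can be sketched in plain Python. Each chemical equation for the reactions named in Figure 2.7 is written as a mapping from metabolite ID to stoichiometric coefficient (negative for substrates, positive for products), and S is assembled with one row per metabolite and one column per reaction. Two assumptions are ours: the exchange reaction EX_glc is written with glc-D[e] as its only participant (a common convention for boundary reactions), and reversibility is left out because it does not change S. The last lines count the nonzero entries in each row, which gives the metabolite "connectivity" discussed in Section 2.3.7.

```python
# Reactions from Figure 2.7 as {metabolite: stoichiometric coefficient};
# negative = consumed, positive = produced. "h" is a proton, "[e]" marks extracellular.
REACTIONS = {
    "EX_glc": {"glc-D[e]": -1},                  # boundary (exchange) reaction, assumed form
    "GLCt1": {"glc-D[e]": -1, "glc-D": 1},
    "HEX1": {"glc-D": -1, "atp": -1, "g6p": 1, "adp": 1, "h": 1},
    "PGI": {"g6p": -1, "f6p": 1},
    "PFK": {"f6p": -1, "atp": -1, "fdp": 1, "adp": 1, "h": 1},
    "FBP": {"fdp": -1, "h2o": -1, "f6p": 1, "pi": 1},
    "FBA": {"fdp": -1, "g3p": 1, "dhap": 1},
    "TPI": {"dhap": -1, "g3p": 1},
}


def build_s(reactions):
    """Return (metabolites, reaction IDs, S) with S[i][j] = coefficient of metabolite i in reaction j."""
    rxn_ids = list(reactions)
    mets = []
    for equation in reactions.values():
        for met in equation:
            if met not in mets:          # keep the order of first appearance
                mets.append(met)
    S = [[reactions[r].get(met, 0) for r in rxn_ids] for met in mets]
    return mets, rxn_ids, S


mets, rxn_ids, S = build_s(REACTIONS)
print(f"S has {len(mets)} metabolites (rows) x {len(rxn_ids)} reactions (columns)")

# Connectivity: the number of reactions a metabolite participates in,
# i.e. the count of nonzero entries in its row of S.
connectivity = {met: sum(1 for x in row if x != 0) for met, row in zip(mets, S)}
for met, k in sorted(connectivity.items(), key=lambda item: item[1], reverse=True):
    print(f"{met:9s} {k}")
```

In this fragment, f6p and fdp each participate in three reactions (PGI, PFK, and FBP; and PFK, FBP, and FBA, respectively), while pi and h2o appear only in FBP.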


We can progressively expand the scope of a network in terms of its content and move forward with a reconstruction in a stepwise fashion: first complete glycolysis, then add the pentose phosphate pathway, and so on, until the pathways known to be encoded on a genome are completed. If we exhaust all the information available about the metabolism of an organism, we have reached the genome scale. This network will represent all possible metabolic states under all conditions in which the organism can grow. This feature is one of the characteristics differentiating GEMs from classical condition-specific biophysical models: the thinking is genome-centric and focused on possible phenotypic states.

2.3.3 Curation

Genome annotation alone normally leads to incomplete networks that contain knowledge "gaps." These gaps can then be filled, by computational methods [41] or experimental discovery [42], to get a contiguous network that can be flux balanced. The metabolic demands are then formulated, the most common of which is the biomass objective function [21]. Metabolic demands are typically not described in terms of proper chemical equations but as a lumped set of fluxes out of the metabolic network to meet a measured demand [43].

2.3.4 GEMs Have a Genomic Basis

Every reaction in a genome-scale network should have a genomic element describing its function. A genomic basis is found for the majority of processes in a GEM, but usually not for all of them. For example, a process called gap-filling is often required to add reactions for enzymes based only on indirect evidence of their existence [44]. Transporters often fall into this category: the transporter itself may not be known, even though we know that a compound is transported across a membrane. Inclusion of such non-gene-associated processes forms a basis for formulating hypotheses and for experimental testing. The relationship between a gene and a reaction is known as a GPR (gene–protein–reaction) association. GPRs have been formulated on the genome scale and are mathematically represented by logical equations, such as Boolean expressions. An example for glucose-6-phosphate isomerase (PGI) is shown in Figure 2.8, which also represents the genomic location of the gene. The gene (teal), transcript (purple), and protein (orange) form linked boxes that are associated with the isomerization of glucose 6-phosphate to fructose 6-phosphate.

2.3.5 Computational Queries

Boolean statements are used to describe GPRs. If the PGI gene is removed from the genome, a series of Boolean statements eliminates the isomerization reaction from the network; in other words, no gene, then no protein, then no reaction. A gene deletion is thus reflected by constraining the corresponding reaction flux to zero. In this way, essential genes and synthetically lethal gene pairs can be predicted computationally [45]. Predictions of gene essentiality are made by removing the activity of a single gene from an in silico cell and computing its growth capabilities (i.e. the synthesis of all biomass components in the right ratios). Synthetic lethals can be predicted by simultaneously removing two genes from an in silico cell and computing its growth capabilities. Such predictions of biological functions are perhaps the largest-scale and most intricate computational predictions of phenotypes performed to date, reaching hundreds of thousands of predicted experimental outcomes. The comparison of computed lethality and experimental measurements in a number of studies is shown in Figure 2.9.

[Figure 2.8: the pgi gene (b4025), next to lysC near genome coordinates 4,230,000–4,232,000 in E. coli, linked through the Pgi protein to the PGI reaction (G6P to F6P), shown within a map of glycolysis and the pentose phosphate pathway (GLCt1, HEX1, PGI, PFK, FBP, FBA, TPI, GAPD, G6PDH2R, PGL, GND, RPI, RPE, TKT1, TKT2, TALA).]

Figure 2.8 Principles of network reconstruction. The representation of a gene–protein–reaction (GPR) association, and the effect of a gene deletion on the network. Source: Palsson [17]. Reproduced with permission from Cambridge University Press.

2.3.6 Scope Expansion

The scope and content of reconstructions have continued to grow over time. Features of genetic elements and their gene products have been incorporated into reconstructions. We mention three here.

1) GPRs can add genomic features that can be used to expand the scope of a reconstruction (Figure 2.8). For instance, promoter information comes with the genomic location of the gene. This information can be used to reconstruct and include transcriptional regulatory information.

2) In multistrain reconstructions, an allele can be associated with the sequence of an ORF found in a particular strain [46]. The totality of alleles found in the sequenced genomes of many strains of a species defines the alleleome [47]. The alleleome of metabolism in the E. coli species was first delineated in 2017 [48] and is shown in Figure 2.10. For most of the metabolic ORFs in E. coli, the alleleome is "shallow" (i.e. few amino acid substitutions are found across a large number of strains), while a handful of ORFs have a "deep" alleleome, with HisD having the largest number of amino acid substitutions.

[Figure 2.9: number of predicted phenotypes (10 to 1,000,000) and prediction accuracy (%) of single-gene knockout (SKO) and double-gene knockout (DKO) predictions versus year of publication (1998 to 2016) for E. coli, S. cerevisiae, H. pylori, and B. subtilis; annotated milestones: first validated genome-scale model (GEM), first validated prediction of integrated metabolic and regulatory network, first validated prediction of gene–gene interactions using a GEM, and the E. coli double knockout library.]

Figure 2.9 Predicting experimental outcomes of cellular growth screens. The number of predictions made on growth screens that cross environmental conditions with gene knockouts has grown steadily over the past 15 years. Over this time, both single-gene knockout (SKO) predictions (red line) and double-gene knockout (DKO) predictions (blue line) have become increasingly accurate. Source: Monk and Palsson [45]. Reprinted with permission from AAAS.

[Figure 2.10: left, average number of amino acid (AA) differences (0 to 10) for each conserved metabolic gene, with hisD, gdhA, pyrD, pfk, and yhbG labeled; right, pie chart of hisD allele families (hisD-1 to hisD-8) and the HisD structure with its ALDH-like domain.]

Figure 2.10 The alleleome for the metabolic genes in E. coli. In 99% of 1122 clinical E. coli strains, 976 genes with metabolic functions are conserved. These core metabolic genes nonetheless carry mutations. The chart on the left shows the average number of amino acid substitutions in these core genes across the 1122 strains of E. coli. A few genes have many alleles, while most have six or fewer amino acid substitutions on average. HisD had the highest number of alleles. The pie chart represents the percentage of strains that contain unique hisD alleles. The hisD allele of E. coli K-12 MG1655 is present in only 19 (1.7%) of the sequenced strains. The structure of HisD can be used to display the location of the amino acid sequence variations and to classify the structures into eight families. Note that such broad variation in the alleles of an ORF is likely the result of selection pressures that are unknown at present. Source: Adapted from Monk et al. [48].

3) Protein structures can be associated with the gene products, as has been done for the recent E. coli [48] and human [49] reconstructions. The inclusion of protein structures enables a variety of applications in what is called structural systems biology [50]. For instance, the structure of HisD can be used to visualize and classify the allelic variation in HisD (Figure 2.10). Some of these issues are discussed further below.

2.3.7 Knowledge Bases

The above considerations lead to the important concept that a well-curated genome-scale reconstruction is a knowledge base. It represents disparate types of information about a genome and its properties. This knowledge base can be computationally represented and queried; the in silico gene essentiality assessment is one such query. A reconstruction allows us to scale up our thinking from pathways to networks. For instance, we can query the topological properties of a network by examining properties of S. If we count the nonzero elements in a row of S, we get the number of reactions in which the corresponding metabolite participates. This number is known as the "connectivity" of a metabolite. We can rank order the connectivities and plot them on a log–log scale. Deeper and more sophisticated topological properties of a network can be obtained by studying the singular value decomposition of S, an advanced topic covered in [17].

2.3.8 Availability of GEMs

Since the development of the first GEM, for H. influenzae [4], the field of genome-scale modeling has advanced significantly, with a rapid rise in the number of GEMs built [51, 52]. The number of tools and methods for network reconstruction and analysis has also grown rapidly, accelerating the model-building process [53] and enabling numerous uses of GEMs [54]. As of 2019, GEMs had been generated for more than 6000 sequenced genomes, either manually or via automatic GEM reconstruction tools [53], covering bacteria, archaea, and eukaryotes. These reconstructions require curation to yield a set of validated GEMs [55]. Standards have been developed to give models a set of confidence scores [40].

2.3.9 Recap

Reconstruction technology has advanced significantly since the initial reconstruction of genome-scale networks 20 years ago. We can now add an increasing number of cellular processes to a reconstruction to achieve a true genome-scale view of a target genome. We can also add an increasing amount of information about the genes and proteins in a reconstruction. A reconstruction is a knowledge base that we can query computationally and convert into computational GEMs. GEMs can be used to advance the metabolic engineering process.
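To make the knowledge-base idea concrete, the sketch below converts a handful of chemical equations into a stoichiometric matrix S and then runs the connectivity query described in Section 2.3.7, counting the nonzero entries in each row of S. The reaction IDs, metabolite names, and the simple "coeff metabolite + ... -> ..." text format are hypothetical and chosen only for illustration; real reconstructions are usually exchanged in SBML and loaded with dedicated toolboxes.

```python
import numpy as np


def parse_side(side: str) -> dict[str, float]:
    """Parse one side of an equation such as '2 B + ATP' into {metabolite: coefficient}.

    Assumes metabolite IDs contain no spaces and no '+' characters.
    """
    terms: dict[str, float] = {}
    for term in side.split("+"):
        parts = term.split()
        if not parts:  # empty side, e.g. an uptake reaction written ' -> A'
            continue
        if len(parts) == 2:
            coeff, met = float(parts[0]), parts[1]
        else:
            coeff, met = 1.0, parts[0]
        terms[met] = terms.get(met, 0.0) + coeff
    return terms


def build_stoichiometric_matrix(reactions: dict[str, str]):
    """Return (S, metabolites, reaction_ids); rows are metabolites, columns are reactions."""
    parsed = {}
    for rid, equation in reactions.items():
        lhs, rhs = equation.split("->")
        parsed[rid] = (parse_side(lhs), parse_side(rhs))
    metabolites = sorted({m for lhs, rhs in parsed.values() for m in (*lhs, *rhs)})
    reaction_ids = list(reactions)
    row = {m: i for i, m in enumerate(metabolites)}
    S = np.zeros((len(metabolites), len(reaction_ids)))
    for j, rid in enumerate(reaction_ids):
        lhs, rhs = parsed[rid]
        for met, coeff in lhs.items():
            S[row[met], j] -= coeff  # substrates are consumed (negative entries)
        for met, coeff in rhs.items():
            S[row[met], j] += coeff  # products are formed (positive entries)
    return S, metabolites, reaction_ids


# Hypothetical toy reactions (not taken from any published reconstruction).
toy_reactions = {
    "R1": "A + ATP -> B + ADP",
    "R2": "B -> 2 C",
    "R3": "C + ADP -> D + ATP",
    "R4": "C + ADP -> E + ATP",
    "R5": "D -> F",
}

S, mets, rxns = build_stoichiometric_matrix(toy_reactions)

# Connectivity: the number of reactions each metabolite participates in,
# i.e. the count of nonzero entries in its row of S.
connectivity = dict(zip(mets, np.count_nonzero(S, axis=1).tolist()))

# Rank-ordered connectivities, the quantity one would plot on a log-log scale.
ranked = sorted(connectivity.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)

# Singular values of S, the starting point for SVD-based network analyses.
print(np.linalg.svd(S, compute_uv=False))
```

In a genome-scale reconstruction the same row count would be taken over thousands of metabolites, and highly connected cofactors such as ATP would appear at the top of the ranked list.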


2 Genome-Scale Models

2.4 Brief History of the GEM for E. coli

2.4.1 Origin

In 1995, the full sequencing of the first genome (that of H. influenzae [2]) was a major milestone in the history of biology. For the first time, we had the full genomic sequence that encoded all the genes and other genetic elements necessary to describe an autonomous living organism. This genome was 1,830,137 bp (about 1.83 Mbp) in length, and on the order of 1743 open reading frames (ORFs) were identified in it. Of these, 1007 ORFs were functionally annotated. Of the functionally annotated ORFs, the metabolic genes were associated with 488 metabolic reactions that involved 343 metabolites. The biochemical function of an enzyme can be described by the chemical equation of the reaction that it catalyzes. All of these chemical equations, effectively representing the reactome of H. influenzae metabolism, could be used to reconstruct the genome-scale metabolic network and to formulate the corresponding stoichiometric matrix, S. Thus, the genome-scale reconstruction process was born, and it led to the first GEM of a metabolic network [4].

2.4.2 Model Organism

E. coli has long been a model organism for biological research. It also became a model organism for developing GEMs and systems biology, and subsequently for their use in metabolic engineering. The E. coli K-12 MG1655 genome sequence appeared in 1997 [3], and the GEM built from its first genome annotation appeared in 2000 [5]. This initial metabolic reconstruction accounted for 660 ORFs. Following the initial reconstruction, more and more of the metabolic functions encoded on the E. coli genome were added to its metabolic network reconstruction. Four serial expansions of the genome-scale reconstruction have appeared, systematically broadening the scope of the reconstructed network (Figure 2.11). The added metabolic genes came from new gene function discoveries as well as from more exhaustive searches of the literature (Figure 2.12). The latest version of this reconstruction contains 1515 genes [48]. The genes of unknown function in E. coli K-12 MG1655 are called y-genes, as their names start with a “y.” The y-ome consists of all the identified genes on the genome that do not have an established function. In 2019, an assessment of the y-ome concluded that it contained 1600 genes [57]. These were divided into two groups: 1489 genes for which partial putative functional assignments had been made and 111 genes for which no information is available regarding their function. Machine learning methods are currently being developed [58] and have predicted 113 metabolic gene functions in the y-ome. These predictions now await experimental assessment. Thus, at some point in the future, we may have the complete list of metabolic genes in E. coli defined. Even then, we will still be challenged with finding all of the reactions that they catalyze, as some enzymes are promiscuous and can also catalyze similar reactions, albeit at lower rates. These metabolic capabilities are referred to as underground metabolism [59, 60].
It has been shown that such underground metabolic functions can be enhanced using laboratory evolution by applying the appropriate selection pressure [61]. Thus, the metabolic network of E. coli has been comprehensively reconstructed, although more work remains to be done.

[Figure 2.11: timeline (1990–2015, pre-genome and genome eras) of the numbers of genes, reactions, and metabolites in successive E. coli reconstructions (iJE660, iJR904, iAF1260, iJO1366, iML1515), annotated with the scope added at each step, e.g. amino acid and nucleotide biosynthesis, alternate carbon utilization, elemental and charge balance, thermodynamics, periplasm, extensive cell wall metabolism, expanded co-factor metabolism, reactive oxygen species metabolism, metabolite repair pathways, and protein structure.]

Figure 2.11 Historical development of the reconstruction of the E. coli metabolic network: The numbers of reactions, metabolites, and genes have increased significantly over the seven iterations of this reconstruction. A larger and larger scope of metabolic capabilities was incorporated, to the point where the literature was effectively exhausted and new discoveries can be added as they appear. According to the naming convention for network reconstructions, model names consist of an “i” for in silico followed by the initials of the person who built the model and the number of open reading frames accounted for in the reconstruction. Source: Fang et al. [7]. © 2020, Reprinted by permission from Springer Nature.

[Figure 2.12: histogram of the number of new genes (y-axis) by publication year (x-axis).]

Figure 2.12 Histogram of the new genes added to the iAF1260 reconstruction to form iJO1366. These came from curated literature (prior to 2007) and new gene functions discovered between 2007 and 2010. Source: Orth et al. [56]. Reproduced with permission from John Wiley & Sons.

2.4.3 Key Predictions

Shortly after they were formulated, the GEMs for metabolism made meaningful predictions. Predictions of gene essentiality were successful (Figure 2.9), and predictions of optimal growth rates were impactful [62]. Perhaps the most impactful early predictions were those for adaptive evolution [63, 64]. GEMs can describe both proximal and distal causation in biology, and never before had distal causation (i.e. adaptation through the acquisition of mutations) been successfully predicted at this level. Laboratory evolution was used to improve the growth rate of E. coli on glycerol from a suboptimal growth rate to an optimal state. The endpoint matched the predicted metabolic and growth state of E. coli (Figure 2.13). The evolutionary trajectories were not predicted but could be mapped out as a function of the oxygen and glycerol uptake rates. A few years later, DNA sequencing technology had reached the stage where the causal mutations could be discovered [65]. The primary causal mutations were found in glycerol kinase and in RNA polymerase, and the interactions between these mutations were then detailed using multiomic data analysis [66]. Thus, throughout the 2000s, metabolic GEMs broadened in their scope and range of applications. Importantly, they showed the ability to help understand and predict intricate biological processes and proved useful for metabolic engineering purposes [14, 35, 67].

2.4.4 Design Algorithms

Metabolic engineering efforts around microbial factories aim to attain near-optimal substrate-to-product conversions. This involves modifying the gene content so as to create rational strain designs. To this end, a number of predictive algorithms have been developed, beginning with OptKnock in 2003 [68], which proposes gene deletions that couple cellular growth with an overproduction goal. OptKnock is posed as a bilevel problem, in which an outer problem maximizes production of the target chemical while an inner linear program represents the cell maximizing its growth; using the strong duality property of the inner problem allows this bilevel formulation to be recast as a single-level mixed-integer linear program (MILP). OptKnock was further extended with OptStrain to account for both reaction deletions and reaction additions, by adding heterologous enzymes that catalyze non-native metabolic functions [69]. First, OptStrain uses an MILP framework to identify the set of reactions to be added such that the desired product is maximally produced. Then, OptKnock is applied to remove competing pathways and ensure high product yields coupled to organism growth. OptForce was later developed to leverage the availability of process and/or flux data sets obtained through 13C-labeling experiments [70]. OptForce identifies which reactions have to change (i.e. increase, decrease, or be eliminated) so that the production level of a target metabolite is guaranteed (worst-case optimization). A minimal subset of reactions is then extracted such that all other network fluxes are forced to be consistent with the design objective. A further development beyond OptForce was k-OptForce [71], which designs interventions consistent with enzyme kinetics and physiological metabolite concentrations. The resultant bilevel formulation consists of nonlinear kinetic expressions in the outer problem and a linear inner problem, which is reformulated using strong duality to produce a single-level mixed-integer nonlinear program (MINLP).

GEMs can guide not only the re-engineering of microbial hosts but also the de novo design of genomes tailored to specific engineering outcomes. minGenome [72] iteratively identifies, in monotonically decreasing order of size, contiguous genomic stretches that are dispensable with respect to maximum biomass yield and other desired metabolic traits. This approach can be used for genome streamlining, for reducing the transcriptional/translational burden, and for redirecting resources toward targeted overproduction. Algorithmic developments for the redesign of not only microbial hosts but also plants are beginning to leverage the information encoded within GEMs. SNPeffect [73] assigns a putative explanation for the role of SNPs present in coding regions by linking changes in the observed agronomic data (e.g. growth yield, nitrogen efficiency) with measured omics data such as protein levels and/or metabolite concentrations. This information can then be used to devise genomic selection-based methods that combine multiple SNPs into a superior cultivar with the desired phenotypes.

Figure 2.13 Growth of E. coli K-12 on glycerol. a, Change in growth rate with time for three adaptive evolution experiments. b, The pre-evolution states. The line of optimality (LO) is shown in red. Gl-UR, glycerol uptake rate; OUR, oxygen uptake rate. c, States during the adaptive evolution. Experimental values for E1 are indicated in blue, for E2 in green, and for E3 in red. The starting point of evolution for E1 and E2 is indicated in black (day 0). d, The metabolic states after 40 days (about 700 generations) of evolution. e, The state after 60 days (1000 generations) of evolution. Source: Ibarra et al. [63]. © 2002, Reprinted by permission from Springer Nature.

[Figure 2.14: schematic of the E. coli i2K model connecting its regulation, transcription and translation, and metabolism components (exchanging proteins, monomers and energy, and input signals from the environment), annotated with the “omics” data types genomics, transcriptomics, proteomics, metabolomics, and interactomics.]

Figure 2.14 Integrated constraint-based model of E. coli: the E. coli i2K model. Constraint-based modeling frameworks have been developed for metabolism [6, 74–78], regulation [79], and transcription and translation [80]. The connectivity among the three modeling components is shown here. Integration of these three modeling components should produce an integrated model of E. coli that accounts for nearly 2000 genes, referred to as the E. coli i2K model; “i2K” refers to this gene count. Almost 20 years later, GEMs are approaching a representation of the activities of 2000 ORFs. This model can be used to reconcile diverse omics data types and to utilize the data to more accurately predict a cellular phenotype. Source: Reed and Palsson [81]. © 2003 American Society for Microbiology.
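The growth-coupling idea that OptKnock formalizes can be illustrated on a toy network. The sketch below is not OptKnock itself: instead of solving the bilevel problem as an MILP, it simply enumerates a few single reaction knockouts by brute force (feasible only for tiny networks), runs flux balance analysis to find each mutant's maximal growth rate, and then minimizes product secretion at that growth rate to check whether production is forced rather than merely possible. The network, reaction names, and bounds are hypothetical, and the code assumes SciPy's `linprog` with the HiGHS solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: substrate A is taken up (at most 10 units),
# R1 converts A into biomass precursor B while generating NADH,
# R_bio drains B as growth, R_resp reoxidizes NADH (respiration),
# and R_prod consumes A plus NADH to make a secreted product.
reactions = ["EX_A", "R1", "R_bio", "R_resp", "R_prod"]
S = np.array([
    # EX_A   R1   R_bio  R_resp  R_prod
    [1.0, -1.0, 0.0, 0.0, -1.0],   # A
    [0.0, 1.0, -1.0, 0.0, 0.0],    # B
    [0.0, 1.0, 0.0, -1.0, -1.0],   # NADH
])
BIO, PROD = reactions.index("R_bio"), reactions.index("R_prod")


def bounds_for(knockouts):
    """Flux bounds: uptake capped at 10, knocked-out reactions fixed at zero."""
    bounds = [(0.0, 10.0) if r == "EX_A" else (0.0, None) for r in reactions]
    for r in knockouts:
        bounds[reactions.index(r)] = (0.0, 0.0)
    return bounds


def growth_and_min_product(knockouts=()):
    """FBA: maximize growth, then minimize product secretion at that growth rate."""
    bounds = bounds_for(knockouts)
    b_eq = np.zeros(S.shape[0])  # steady state: S v = 0
    c = np.zeros(len(reactions))
    c[BIO] = -1.0  # linprog minimizes, so negate the growth objective
    res = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    growth = -res.fun
    # Hold growth at its optimum (small tolerance for solver round-off).
    bounds[BIO] = (max(0.0, growth - 1e-9), bounds[BIO][1])
    c = np.zeros(len(reactions))
    c[PROD] = 1.0
    res = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
    return growth, res.fun


for ko in [(), ("R_resp",), ("R_prod",)]:
    growth, product = growth_and_min_product(ko)
    print(f"knockouts={ko or 'none'}: growth={growth:.2f}, min product={product:.2f}")
```

In this toy network, deleting R_resp halves the maximal growth rate but leaves product formation as the only way to reoxidize NADH, so product secretion is forced at the growth optimum. Genome-scale tools replace the enumeration with an optimization formulation because the number of knockout combinations grows combinatorially.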

2.4.5 Scope Expansions

GEMs for metabolism were well developed by the early 2010s, and they enabled a variety of applications in basic science and engineering [16, 20]. Clearly, the COBRA approach had helped to build the field of systems biology for metabolism. This led to further explorations of possible applications of the COBRA approach to other cellular processes. A vision for how this process might unfold was put forth in 2003 (Figure 2.14). It called for three key features: (i) the reconstruction of the translation/transcription process to compute the composition of the proteome, (ii) the use of metabolite signals as activators and inhibitors of transcription factor activity, and (iii) the formulation of a mechanism-based approach to multiomic analysis. It has taken almost 20 years to realize this vision. Throughout the 2010s, progress was made in developing models of the properties of the proteome. In 2009, a reconstruction of the translation/transcription machinery appeared [82], which was subsequently integrated with metabolic models [83, 84]. The computation of the composition of the pangenome was enabled, and through the reconstruction of the protein translocation machinery [85] the proteome could be compartmentalized. At the beginning of the 2020s, reconstructions of stress mitigation mechanisms appeared, enabling the computation of stress responses. This history is summarized in Figure 2.15.

[Figure 2.15: timeline of E. coli models in three eras. Metabolic models (2000s): iJE660 (2000), iJR904 (2003), iAF1260 (2007), iJO1366 (2011), and iML1515 (2017), each annotated with the scope it added. ME models (2010s): E-matrix (2009), iOL1650-ME (2013), Multi-E. coli (2013), and iJL1678-ME (2014), built on the transcription and translation machinery, strain-specific genome sequences, and protein translocation. Stress response models (2020s): foldME (2017, proteostasis network), OxidizeME (2019, metalloprotein oxidation and repair), AcidifyME (2019, acid response mechanisms), and a prospective StressME incorporating the sensome of two-component systems and phenotypic data from stress-evolved strains.]

Figure 2.15 History of generated and potential future genome-scale models of E. coli. Gray ovals indicate models, and blue boxes represent data incorporated to generate the models.

2.4.6 Recap

The genome-scale reconstruction of E. coli’s metabolism advanced steadily after its initial version was established in 2000. In the early 2000s, it became clear that the reconstruction process could be applied to other cellular processes. This led to a steady growth in the scope of the reconstruction, which in turn led to a family of ever more comprehensive GEMs.

2.5 From Metabolism to the Proteome

2.5.1 ME Models

As we have seen, GEMs for metabolism (also known as M models) have a long history of development and use, and their utility motivated an extension to other cellular processes. The entire transcription and translation machinery can be reconstructed in terms of chemical equations, as is possible for metabolism [82]. For E. coli, this reaction network comprises 423 gene products and can be converted into a computational model, referred to as an E model (E for expression).
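Because the E model writes transcription and translation as chemical equations, the net stoichiometry of expressing a gene can be derived directly from its sequence. The sketch below is a simplified illustration, not the formulation used in any published ME model: it assumes one diphosphate (PPi) is released per phosphodiester bond (so a transcript of length L releases L − 1 PPi) and one water per peptide bond, and it ignores the ATP and GTP costs of tRNA charging and elongation. Species IDs such as "ppi" and "aa_M" are hypothetical.

```python
from collections import Counter


def transcription_stoichiometry(mrna: str) -> dict[str, int]:
    """Net stoichiometry for synthesizing one transcript (negative = consumed)."""
    seq = mrna.upper()
    if not seq or set(seq) - set("ACGU"):
        raise ValueError("expected a non-empty RNA sequence over A, C, G, U")
    counts = Counter(seq)
    # One nucleoside triphosphate (ATP, CTP, GTP, UTP) per incorporated nucleotide.
    stoich = {f"{base}TP": -counts[base] for base in "ACGU"}
    stoich["ppi"] = len(seq) - 1  # one PPi per phosphodiester bond formed
    stoich["mRNA"] = 1
    return stoich


def translation_stoichiometry(protein: str) -> dict[str, int]:
    """Net stoichiometry for one polypeptide, ignoring tRNA charging and GTP costs."""
    seq = protein.upper()
    if not seq:
        raise ValueError("expected a non-empty protein sequence")
    counts = Counter(seq)
    stoich = {f"aa_{aa}": -n for aa, n in sorted(counts.items())}
    stoich["h2o"] = len(seq) - 1  # one water released per peptide bond
    stoich["protein"] = 1
    return stoich


print(transcription_stoichiometry("AUGGCUUAA"))
print(translation_stoichiometry("MAAK"))
```

Each such equation becomes one column of the expression network, which is why the E model can be assembled gene by gene from sequence data; a fuller treatment would add the energy costs and the machinery (RNA polymerase, ribosomes, tRNAs) as explicit participants.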

[Figure 2.16: schematic of the ME-model formulation. Transcription (NTPs → mRNA), tRNA charging (amino acids), and translation reactions form the protein synthesis network (E), which is integrated with the metabolic reactions of the M model, in which each enzyme converts reactants into products. Stress modules: FoldME (temperature stress: spontaneous, DnaK-assisted, and GroEL-assisted folding of unfolded peptides); AcidifyME (acid stress: pH-dependent protein activity and stability, HdeA chaperone complexes, glutamate/GABA exchange between periplasm and cytoplasm, and proton transport by CBO, NDH-I, NDH-II, SDH, ATP synthase, and cytochrome d); OxidizeME (oxidative stress: demetallation and mismetallation by Mn2+ and Zn2+, iron–sulfur cluster damage and repair via Suf and IscU, Fenton chemistry Fe2+ + H2O2 → HO• + OH− + Fe3+, and DNA damage).]

Figure 2.16 General formulation of the ME model and its application to the study of stress responses. ME models are generated through the integration of M models and protein synthesis pathways (E models). Stress-specific response mechanisms can be integrated with the ME model to produce stress-specific ME models: FoldME, OxidizeME, and AcidifyME. Source: Fang et al. [7]. © 2020, Reprinted by permission from Springer Nature.

This E model, for example, accurately computed ribosome production without any parameterization. By integrating the metabolic and expression models, one can form integrated models of metabolism and gene expression (Figure 2.16), called ME models. As mentioned above, these models can compute various properties of the proteome. In addition, they serve as a basis for developing StressME models that incorporate known stress response mechanisms.

2.5.2 Capabilities of ME Models

The integration of metabolism and expression opens up the ability to compute properties of the proteome. These properties range from proteome allocation and the condition of the metalloproteome to the effects of stress. These capabilities are just beginning to be explored. They also enable deep exploration of scientific questions and the characterization of organism properties. We provide a few examples below.

2.5.2.1 Growth-Coupled Metabolic Designs Can Be Reproduced in GEMs

A recent survey of the published literature identified 89 strain designs that were described as growth-coupled [86]. These designs targeted 10 native metabolites and 15 non-native compounds (Figure 2.17). Depending on which parameters were used, the ME model could reproduce 40–62 of these designs, outperforming M models.

Figure 2.17 The engineered fermentation pathways in E. coli. All the engineered pathways from the literature survey are shown, along with their metabolic precursors. Source: King et al. [86]. © 2017 Elsevier.

2.5.2.2 ME Models Can Reflect Properties of the Metalloproteome

The catalytic activity of inorganic iron has played a central role in metabolic processes dating back to the beginning of life on Earth. As a result, the function of many processes in living cells still depends on the availability of iron (Figure 2.18a). Using ME models, we can predict how a reduction in the availability of iron, and thus reduced enzyme activity, causes the metabolic phenotype of E. coli to change. In agreement with experimental observations, decreasing iron availability causes the cell to grow much more slowly and to shift from secreting acetate to secreting D-lactate (Figure 2.18b,c). The predicted cause of this metabolic rewiring is the reduced activity of the metabolic enzymes that have iron in their catalytic site.

[Figure 2.18: (a) map of central carbon metabolism (glucose uptake, glycolysis, pentose phosphate pathway (PPP), citric acid cycle, oxidative respiration, ATP synthase, transhydrogenase, oxidative stress, lactate secretion); (b) in silico prediction and (c) experimental measurement of carbon secreted as acetate and lactate (Cmol Cmol−1), plotted against relative growth rate in (b) and growth rate (h−1) in (c).]

Figure 2.18 ME models can be used to study properties of the metalloproteome. (a) Central carbon metabolism with the reactions catalyzed by iron-containing enzymes in red. (b) In silico predictions of relative growth rate and metabolic byproduct secretion under varying levels of iron limitation. (c) The in silico predictions agree with experimental measurements of E. coli growing under iron limitation. Source: Schmidt [87]. © 2019 Elsevier.

2.5.2.3 ME Models Can Compute the Biomass Objective Function

M models require the biomass objective function to be stated explicitly. This function is formulated from the biomass composition of the organism in question. The biomass composition of E. coli K-12 MG1655 has been established, but comparable measurements are typically not available for most organisms of interest. Thus, in M models the metabolic demands for biomass formation are supplied as an input, whereas ME models compute the biomass composition, including that of the proteome. The amino acid composition of E. coli was determined by hydrolysis of the proteome followed by a determination of the relative amino acid abundances. These are shown in Figure 2.19. The computations of the biomass composition in the ME model are condition-dependent. We can thus compare the ME-computed biomass composition under multiple conditions with the composition that has been measured under only one condition (Figure 2.19). This feature of ME models enables a computational assessment of condition-dependent biomass composition down to the level of micronutrients [88].
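For an M model, the amino acid part of the biomass objective function is derived from measured composition data. The sketch below shows one common way to turn a protein mass fraction and amino acid mole fractions into coefficients in mmol per gram dry weight (mmol gDW−1), using residue masses (free amino acid mass minus one water) so that the coefficients add back up to the measured protein mass. The composition values are hypothetical and cover only three amino acids for brevity; a full biomass objective function would also include the other macromolecules, cofactors, and growth-associated maintenance ATP.

```python
# Hypothetical composition data for illustration (not measured E. coli values).
PROTEIN_MASS_FRACTION = 0.55  # g protein per g dry weight (gDW)
AA_MOLE_FRACTIONS = {"ala": 0.5, "gly": 0.3, "leu": 0.2}  # must sum to 1

# Molecular weights of the free amino acids (g/mol).
AA_MW = {"ala": 89.09, "gly": 75.07, "leu": 131.17}
WATER_MW = 18.015  # lost per residue when incorporated into a peptide chain


def amino_acid_bof_coefficients(protein_fraction, mole_fractions, mw):
    """Return mmol of each amino acid required per gDW of biomass."""
    if abs(sum(mole_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("mole fractions must sum to 1")
    # Average residue mass of the proteome (g/mol).
    avg_residue_mw = sum(x * (mw[aa] - WATER_MW) for aa, x in mole_fractions.items())
    # mmol of residues per gDW: (g protein / gDW) * 1000 / (g/mol).
    residues_mmol_per_gdw = protein_fraction * 1000.0 / avg_residue_mw
    return {aa: x * residues_mmol_per_gdw for aa, x in mole_fractions.items()}


coeffs = amino_acid_bof_coefficients(PROTEIN_MASS_FRACTION, AA_MOLE_FRACTIONS, AA_MW)
for aa, c in coeffs.items():
    # Each value enters the biomass reaction with a negative sign (consumed).
    print(f"{aa}: {c:.3f} mmol/gDW")
```

These fixed coefficients are exactly the kind of externally supplied input that an ME model replaces with condition-dependent values computed from its own flux predictions.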

[Figure 2.19: growth-normalized synthesis (mmol (gDW)−1, log scale from 10−2 to 101) of each of the 20 amino acids, comparing ME-aerobic, ME-anaerobic, and iJO1366 biomass objective function (BOF) values.]

Figure 2.19 Comparison of ME- and M-model predicted amino acid growth-normalized synthesis rates. The ME-model predictions are a function of the predicted intracellular fluxes provided by the simulation, whereas the M-model values are provided externally by the biomass objective function. ME-model predictions are thus condition-dependent and are shown for aerobic and anaerobic in silico conditions. Source: Lloyd et al. [88]. CC BY 4.0.

2.5.2.4 Computing Stresses

StressME models offer the opportunity to estimate the costs of the various stresses experienced by production strains under bioprocessing conditions and to evaluate how much these stresses reduce the theoretical productivity of the strains. High temperatures destabilize proteins, and the structural properties of a protein determine its heat lability. GEM-PRO can associate a structure with every reaction in an M model (Figure 2.20a). Four processes determine the functional state of a protein as a function of temperature (Figure 2.20b). Thermal instability creates the need for chaperones that refold denatured proteins (Figure 2.20c), and these processes can be reconstructed and used to compute the chaperone levels required to refold the proteome as a function of temperature. This, in turn, allows for the computation of proteome allocation to maintenance functions and to growth functions. Remarkably, these computations reveal that the computed proteome allocation is predictive of the temperature sensitivity of growth (Figure 2.20d). The computed proteome allocation for optimal growth at different temperatures is supported by expression profiling data (Figure 2.20e).

The chemical mechanisms involved in the damage caused by reactive oxygen species (ROS) have been characterized. They can be reconstructed and added to an ME model. OxidizeME was formulated and used to explain four key responses to oxidative stress: (i) ROS-induced auxotrophy for branched-chain, aromatic, and sulfurous amino acids; (ii) nutrient-dependent sensitivity of growth rate to ROS; (iii) ROS-specific differential gene expression separate from global growth-associated differential expression; and (iv) coordinated expression of the iron–sulfur cluster (ISC) and sulfur assimilation (SUF) systems for ISC biosynthesis (Figure 2.21). All these cases are rooted in the location of ROS-sensitive enzymes in the metabolic network, as was previously discussed (Figure 2.18).

Understanding E. coli’s response to acid stress is important for understanding its life cycle as well as its tolerance of pH changes during bioprocessing. To develop a fundamental understanding of the pH response, known stress mitigation mechanisms were reconstructed and added to the ME model to form the acidifyME model [92] (Figure 2.22). Three known mechanisms of acid stress mitigation were considered: (i) changes in membrane lipid fatty acid composition, (ii) changes in periplasmic protein stability with external pH, together with periplasmic chaperone protection mechanisms, and (iii) changes in the activities of membrane proteins. After these mechanisms were integrated into an established ME model, acidifyME could simulate their responses in the context of other cellular processes. The simulations were validated using RNA sequencing data obtained from five E. coli strains grown at external pH values ranging from 5.5 to 7.0. The study showed that (i) of the differentially expressed genes accounted for in the ME model, 80% of the upregulated genes were correctly predicted, and (ii) these genes are mainly involved in translation processes (45% of genes), membrane proteins and related processes (18% of genes), amino acid metabolism (12% of genes), and cofactor and prosthetic group biosynthesis (8% of genes). AcidifyME provides a quantitative framework that describes, on a genome scale, the acid stress mitigation response of E. coli, with both scientific and practical uses.

Multistress ME models can be formulated by integrating the three stress mitigation responses discussed above. Once customized to a particular production strain, such a model can be used to interpret disparate data types obtained under the bioprocessing conditions of interest.

[Figure 2.20: (a) GEM-PRO: genes, protein sequences, and 3D structures mapped onto the metabolic model; (b) protein-specific properties: kinetic folding rate kf(T), thermostability ΔG(T), enzyme catalytic rate kcat(T) (Arrhenius, optimal, stressed), and aggregation propensity; (c) proteostasis network: ribosomes, unfolded peptide, chaperones, native enzyme, complex; (d) predicted and experimental relative growth rates versus temperature (20–50 °C) in M9+glucose, M9+glucose+aa, and M9+glucose+aa+nt, with literature data in BHI, LB, and DM+glucose; (e) proteome allocation shifts at 28, 37, and 45 °C across ribosomal proteins, molecular chaperones, proteins highly expressed at 37 °C, and proteins not expressed at 37 °C.]

Figure 2.20 Utilization of protein structural properties within a genome-scale model for the purpose of modeling thermal stress adaptations. (a) The formal integration is termed a “GEM-PRO,” or genome-scale model with protein structures. Enzymatic reactions are mapped to their corresponding protein sequences and structures. (b) Protein-specific properties can be predicted from sequence and structure, and further computed to reflect protein property changes under stress. For example, four processes represent the influence of temperature on the status of the structural proteome. (c) Incorporation of chaperone folding networks adds the proteostasis network response to the model. (d) Simulated growth rates of an integrated model closely match measured experimental rates at different temperatures. (e) Proteome allocation of the cell under thermal stress can be inspected to find a dramatic shift toward chaperone production at high temperatures. Source: Adapted from Schmidt [87], Mih et al. [89], and Chen et al. [90].

[Figure 2.21: (a) demetallation and mismetallation of Fe2+ sites by Mn2+ and Zn2+; (b) iron–sulfur cluster damage by H2O2 and O2− and repair via Suf and IscU; (c) unincorporated iron, Dps storage, and Fenton chemistry (Fe2+ + H2O2 → HO• + OH− + Fe3+) causing DNA damage; (d) metal sites with high, medium, and low relative solvent accessibility (RSA); (e) integration with NTPs, mRNA, amino acids, and protein synthesis.]

Figure 2.21 OxidizeME: a multiscale description of metabolism and macromolecular expression that accounts for damage by ROS to macromolecules. (a) Mononuclear Fe(II) proteins are demetallated by ROS and mismetallated with alternative divalent metal ions. (b) Iron–sulfur clusters are oxidized and repaired. (c) Unincorporated Fe(II) spontaneously reacts with H2O2 via Fenton chemistry, generating hydroxyl radicals that damage DNA, while the Dps protein stores unincorporated iron and protects DNA from damage. (d) Protein structural properties are computed to estimate the probability of metal cofactor damage by ROS (RSA, relative solvent accessibility). (e) Processes in (a)–(d) are integrated into a multiscale oxidative model, named OxidizeME. OxidizeME is used to compute the scope of macromolecular damage and the cellular response for varying intracellular concentrations of superoxide, hydrogen peroxide, and divalent metal ions (Fe(II), Mn(II), Co(II), Zn(II)). Source: Yang et al. [91]. © 2019 National Academy of Sciences.

Figure 2.22 Comparison of acidifyME simulations, accounting for the three acid stress mechanisms, against RNA-seq data from E. coli. Differentially expressed genes (DEGs) due to acid stress were found to be consistent between model predictions and RNA-seq data. We grouped the DEGs into different COG categories. Source: Du et al. [92]. CC BY 4.0.

2.5.3 Recapitulation

GEMs have reached a level of sophistication such that they reveal intricate properties of functional proteomes and how they support different E. coli lifestyles. A more detailed summary of the capabilities of ME models is found in [7].


2.6 Current Developments

A thorough review that appeared in 2013 assessed the advantages and shortcomings of M models [93]. The shortcomings were mostly due to the lack of kinetic and regulatory information. This information is hard to come by at the genome scale, but progress has been made in recent years. Protein structures are a third data source (augmenting metabolism and gene expression) that has started to appear in GEMs, opening up an entirely new range of possible uses of GEMs (Figure 2.23).

2.6.1 Kinetics

Turnover rates of enzymes have historically been measured in vitro through enzymatic assays. The conditions chosen for such in vitro measurements may or may not accurately reflect the in vivo milieu. In vivo turnover rates have been compared to those obtained in vitro, demonstrating the discrepancy between the two (Figure 2.24) [99]. This comparison was performed in wild type strains. A more stringent test is to knock out strategically selected metabolic genes and use laboratory evolution to generate optimally growing strains with a different flux distribution. Because of their unusual flux distributions, these strains have been termed “metabolic specialists” [98]. The metabolic genes to be knocked out are selected using an M model. After adaptive laboratory evolution in which growth rate has been maximized, both the fluxome and the proteome are quantitatively measured, and the ratio between them gives the apparent in vivo turnover rate, called kapp (Figure 2.23). The maximum kapp across all the conditions considered, kapp,max, then serves as an estimate of the intrinsic in vivo turnover rate. These numerical values can then be used to parameterize ME models. The need for a genome-scale assessment of kinetic parameters has been articulated, and the set of these numbers has been termed the “kinetome” [100].

[Figure 2.23 panels: large flux perturbations via metabolic gene knockouts (e.g. GLCptspp/ptsHIcrr, PGI/pgi, TPI/tpiA, SUCDi/sdhCB); optimization of the knockout strains via adaptive laboratory evolution (growth rate vs. cumulative cell divisions, replicates 1 and 2); proteomic and fluxomic profiling of 21 optimized strains; calculation of in vivo kinetics, kapp(i,j) = v(i,j)/p(i,j) for enzyme i in strain j, with the maximum across strains, kapp,max(i) ≈ in vivo kcat(i); extrapolation to genome scale via machine learning; parameterization of genome-scale metabolic models.]

Figure 2.23 Approach for obtaining kcat in vivo from metabolic specialists: Knockout of enzymes in central metabolism was followed by adaptive laboratory evolution (ALE) to obtain strains that had diverse flux states while achieving high growth rates [94–97]. Fluxomics and proteomics data were then integrated for the evolved strains to obtain the maximum kapp across the strains (kapp,max) for each enzyme that could be mapped uniquely. The resulting kapp,max vector was then extrapolated to the genome scale via supervised machine learning and used to parameterize genome-scale metabolic models. These genome-scale models were then validated on unseen proteomics data. Source: Heckmann et al. [98].

[Figure 2.24: scatter plot of log10(kapp,max from growth conditions) against log10(kapp,max from KO ALEs), both axes spanning roughly –4 to 2; R² = 0.9, MAE = 0.36, n = 210.]

Figure 2.24 Estimates of in vivo turnover numbers are consistent between wild type strains and metabolic specialists. Comparison between 210 kapp,max values obtained from metabolic specialists [98] and kapp,max values from wild type strains [99]. Source: Heckmann et al. [98].
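A minimal sketch of the kapp calculation outlined in Figure 2.23 follows. It assumes fluxes v in mmol gDW⁻¹ h⁻¹ and enzyme abundances p in mmol gDW⁻¹, already mapped uniquely to each enzyme i in each strain j; the enzyme labels and all numbers are hypothetical placeholders, not values from [98]. Taking the absolute flux and skipping zero entries are also simplifying assumptions of this sketch.

```python
import numpy as np

# Hypothetical data: rows are enzymes i, columns are evolved strains j.
enzymes = ["PGI", "TPI", "SUCDi"]
# Fluxes v[i, j], assumed in mmol gDW^-1 h^-1 (illustrative values only).
v = np.array([[8.0, 0.0, 6.5],
              [9.0, 7.2, 8.1],
              [2.0, 3.1, 0.0]])
# Enzyme abundances p[i, j], assumed in mmol gDW^-1 (illustrative values only).
p = np.array([[2.0e-4, 1.5e-4, 1.8e-4],
              [1.0e-4, 1.2e-4, 0.9e-4],
              [5.0e-5, 4.0e-5, 6.0e-5]])

def kapp_max(v, p):
    """Return kapp[i, j] = |v[i, j]| / p[i, j] in s^-1 and its maximum over strains j.

    Entries with zero flux or zero abundance are left as NaN because they carry
    no information about the enzyme's turnover capacity.
    """
    flux = np.abs(v)
    valid = (flux > 0) & (p > 0)
    kapp = np.full(v.shape, np.nan)
    kapp[valid] = flux[valid] / p[valid] / 3600.0  # h^-1 -> s^-1
    return kapp, np.nanmax(kapp, axis=1)

kapp, kmax = kapp_max(v, p)
for name, k in zip(enzymes, kmax):
    print(f"{name}: kapp,max = {k:.1f} s^-1")  # PGI 11.1, TPI 25.0, SUCDi 21.5
```

Leaving zero-flux strains out matters: a knocked-out or inactive enzyme says nothing about its turnover capacity, so the maximum is taken only over strains in which the enzyme actually carries flux.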

2.6.2 Transcriptional Regulation

Since the early 2000s, there has been a push to make transcriptional regulatory networks (TRNs) a part of GEMs [81]. Many COBRA methods have appeared (see the “regulatory methods” branch in Figure 2.4). Unlike metabolism and expression processes, which are reconstructed from fundamental chemical equations, the TRN is the outcome of an evolutionary process and cannot be reconstructed directly from first principles. Fortunately, machine learning methods now make it possible to reconstruct TRNs in a top-down fashion from transcriptomic data. Reconstructed TRNs should provide additional constraints on solutions from GEMs. RNAseq profiling is now affordable, and large compendia of transcriptomes are becoming available [101, 102]; the number of publicly available bacterial transcriptomes is doubling roughly every two years. Recently, independent component analysis (ICA) has been applied to compendia of transcriptomic data to find the independent signals that comprise them [103–105]. When applied to transcriptomic data, ICA decomposes a collection of expression profiles (organized in a matrix X whose columns are experiments and whose rows are genes) into 1) a set of components, which represent underlying biological signals (M), and 2) the components’ condition-specific activities (A), such that X = MA (Figure 2.25a).

2.6.2.1 iModulons

Each component, represented by a column of M, contains a coefficient for each gene that represents the effect of a particular underlying signal on that gene’s expression level. Components do not contain information on the condition-specific transcriptomic state; rather, they define lists of genes that are coordinately regulated under all conditions. These gene sets have been called independently modulated sets of genes, or iModulons, and M is thus the matrix representing their composition (the iModulon matrix). The iModulons represent the outcome of evolution and thus represent distal causation.
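The decomposition X = MA and the idea of an iModulon can be illustrated with scikit-learn’s FastICA on synthetic data. This is a sketch under stated assumptions: the two planted gene sets, the Laplace-distributed activities, and the fixed two-component model are made up, and the simple standard-deviation cutoff for membership stands in for the statistical thresholding used in published iModulon workflows.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_genes, n_samples = 300, 40

# Hypothetical ground truth: two disjoint gene sets (15 genes each), each
# driven by one condition-specific, non-Gaussian activity profile.
M_true = np.zeros((n_genes, 2))
M_true[:15, 0] = rng.uniform(2.0, 4.0, 15)
M_true[100:115, 1] = rng.uniform(2.0, 4.0, 15)
A_true = rng.laplace(size=(2, n_samples))
X = M_true @ A_true + 0.1 * rng.standard_normal((n_genes, n_samples))

# Genes are treated as observations, so the independent sources are the
# gene weightings (columns of M) and the mixing matrix holds the activities.
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
M = ica.fit_transform(X)       # shape (n_genes, 2)
A = ica.mixing_.T              # shape (2, n_samples)
X_hat = M @ A + ica.mean_      # X ≈ MA, plus the per-sample mean FastICA removed

def imodulon_genes(M, n_sd=2.0):
    """Simplified membership rule (an assumption of this sketch): genes whose
    weight lies more than n_sd standard deviations from zero."""
    return [np.flatnonzero(np.abs(col) > n_sd * col.std()) for col in M.T]

for k, genes in enumerate(imodulon_genes(M)):
    print(f"component {k}: {len(genes)} genes, first few: {genes[:5]}")
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Treating genes as observations is the design choice that makes the recovered sources gene weightings (columns of M) rather than condition profiles. With real compendia, the number of components and the membership threshold must be chosen with far more care than here.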

2.6.2.2 Activities

Conversely, ICA computes activity levels for each iModulon across every condition in the compendium, represented by a row of A, to account for condition-dependent expression changes. Each expression profile is represented by the sum over all iModulons, each scaled by its condition-specific activity (Figure 2.25b,c). A thus represents the composition of the transcriptome in a particular condition, and therefore represents proximal causation. This decomposition of the transcriptome is illustrated in Figure 2.25a,b: the mathematics are shown in panel a, and the entries of the matrices M and A are illustrated in panel b. Statistically significant elements in a column of M are normally few and call out the list of coherently expressed genes across all the conditions in X. They are “painted” blue based on knowledge of transcription factor binding sites in their promoters. Often, many of these genes are regulated by the same transcriptional regulator, leading to the association of the regulator with the iModulon and to a functional annotation for the iModulon. The corresponding row of A gives the condition-specific activity of the iModulon, from which specific conditions of interest can be called out.

Relative activities of iModulons give direct information about the structure of the TRN (Figure 2.26). The “fear-greed” tradeoff (i.e. relative proteome allocation to stress vs. growth functions) is consistently found in transcriptomic data (see [103, 104]), as shown in Figure 2.26a. Tradeoffs in gene expression are also seen for iron-activated responses (Figure 2.26b) and oxygen availability (Figure 2.26c). Specific cases, such as the response to antibiotic exposure, can then be highlighted against all the conditions in the data set (see the specific data points called out in Figure 2.26).

[Figure 2.25 panels: (a) gene expression data set X (genes × samples) = M (genes × ICs) × A (ICs × samples); (b) for each iModulon class, a histogram of IC gene weightings (structure), the biological interpretation (regulons, gene ontology, genotype), and the condition-specific activities; (c) examples: a regulatory iModulon of genes regulated by GlpR (glpD, glpX, glpQ, glpB, glpF, glpT, glpC, glpK; activity by carbon source), a functional iModulon of genes with the “hyperosmotic response” gene ontology (proV, proW, proX; activity with +NaCl vs. –NaCl), and a genomic iModulon (thrA, thrB, thrC; activity in WT vs. ΔthrA).]

Figure 2.25 Characterization of iModulons. (a) Schematic illustration of the workflow applied to a gene expression data set X. (b) Descriptions of the three classes of characterized iModulons. The first column contains histograms illustrating the distribution of gene coefficients in each of the three independent components (ICs) from the RNAseq data set, where most genes have coefficients near zero. Genes outside of a threshold (in red) belong to an “iModulon.” The second column illustrates the biological interpretation of the iModulon types. iModulons are characterized by comparing their genes with known regulons, ontological annotations, and genotypes. The third column illustrates the ICA-computed activity levels for each IC across every condition in the data set. (c) Three specific examples of iModulons: a regulatory iModulon associated with glycerol metabolism that is regulated by GlpR; a functional iModulon in which the genes share the same functional annotation based on GO classification (no regulator is known, but one is discoverable with the information in the iModulon) [103]; and a genomic iModulon that represents a genomic alteration, such as a gene deletion. Genomic iModulons show up as a very specific signal for strains with genes that have been deleted or duplicated.

[Figure 2.26 panels: (a) translation iModulon activity vs. RpoS iModulon activity (growth vs. stress; Pearson R = –0.60), showing the reference (WT), unevolved and evolved strains, and the RpoB E672K and E546V mutants, colored by growth rate (0–1.5 h⁻¹); (b) Fur-2 vs. Fur-1 iModulon activities (high vs. low iron), including a ΔFur strain; (c) ArcA vs. Fnr iModulon activities (high vs. low O2; nitrate vs. anaerobic respiration) for control and antibiotic-treated cultures in M9 minimal medium, RPMI + 10% LB (R10LB), and CA-MHB.]

Figure 2.26 Tradeoffs in the bacterial transcriptome. (a) The growth vs. stress tradeoff. (b) Iron responses. The Fur-1 iModulon responds to iron starvation, de-repressing iron (II) and (III) siderophore synthesis and transport systems, ribonucleotide reductases, superoxide dismutase, and iron–sulfur cluster assembly. The Fur-2 iModulon responds to excess iron, further repressing siderophore transport and repressing the energy-transducing Ton system. (c) The aerobic–anaerobic respiration tradeoff, highlighting the differential response of E. coli to subinhibitory concentrations of three diverse antibiotics (trimethoprim-sulfamethoxazole, ceftriaxone, ciprofloxacin) in minimal vs. physiological media. Abbreviations: CIP, ciprofloxacin; R10LB, RPMI + 10% LB; CA-MHB, cation-adjusted Mueller–Hinton broth. Sources: (a) Based on Kavvas et al. [103]; (b) and (c) based on Ghatak et al. [106].

iModulons have been applied to a central issue in metabolic engineering: the response of the host to the expression of a heterologous gene. Thus far, this response has been unpredictable and varies from protein to protein. A recent study shows how the expression of heterologous genes activates five key classes of stress responses in the host, including the fear-greed tradeoff and various metalloproteome protection mechanisms [107].

We note that the percentage of variance in X explained by the iModulons represents the percentage of the transcriptome that can be explained by underlying mechanisms. This differs from the interpretation obtained with principal component analysis (PCA), which simply identifies what percentage of the variation is represented by the principal components: a statistical measure. For example, if the regulatory iModulons represent 67% of the variation in X, then they explain 67% of the functions of the TRN.
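The distinction between mechanistic and purely statistical variance explained can be made concrete with a hand-made toy example. The sketch below assumes X is expressed as deviations from a zero reference state, so “variance” is simply the sum of squares of X; the matrices M and A are illustrative and not drawn from any compendium.

```python
import numpy as np

def explained_fraction(X, M, A, subset):
    """Fraction of the sum of squares of X captured by M[:, subset] @ A[subset, :].

    X is treated as deviations from a zero reference state (an assumption of
    this sketch), so the total 'variance' is the squared Frobenius norm.
    """
    X_hat = M[:, subset] @ A[subset, :]
    return 1.0 - np.sum((X - X_hat) ** 2) / np.sum(X ** 2)

# Toy example (hand-made): genes 0-1 belong to component 0, genes 2-3 to component 1.
M = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
A = np.array([[2.0, -2.0, 1.0], [0.0, 1.0, -1.0]])
X = M @ A

ev_first = explained_fraction(X, M, A, [0])    # 18/22, about 0.818
ev_both = explained_fraction(X, M, A, [0, 1])  # 1.0

# PCA-style comparison on the same sums of squares: the leading singular
# direction is chosen to maximize captured variance, not to match a mechanism.
s = np.linalg.svd(X, compute_uv=False)
pc_fractions = s**2 / np.sum(s**2)
print(f"component 0 explains {ev_first:.3f}; both explain {ev_both:.3f}")
print(f"leading singular direction explains {pc_fractions[0]:.3f}")
```

Here the leading singular direction captures more of the sum of squares (about 0.92) than component 0 alone (about 0.82), but it mixes both gene groups, so its share of the variance has no one-to-one mechanistic interpretation.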

2.6.3 Protein Structures

Properties of proteins can be evaluated based on their structures, which allows us to overcome some of the shortcomings of GEMs. Complete structural proteomes have been assembled for the latest E. coli and human metabolic reconstructions, and with the GEM-PRO workflow [89], GEM content can now be evaluated down to the level of protein tertiary structure. Having a comprehensive set of structures available for a GEM enables the structural systems biology analysis of production strains, opening up a number of new possibilities [50]:
1) Detailed physicochemical properties of the structural proteome enable the use of protein structures as an “omics” data source for downstream data science analysis and for advanced metabolic modeling simulations incorporating residue-level measurements.
2) A better understanding of the functional assignments of proteins based on their structures leads to improved GEMs, which better predict proteome allocation in different conditions.
3) Classical analyses of protein folds and families in the context of metabolic networks add another level of functional understanding, enabling analysis of the distribution of folds within a network and between strains or species.
4) In silico structural bioinformatics tools enable residue-level predictions of mutations or post-translational modifications that can then be used to modulate changes within a metabolic network, leading to an understanding of the global effect of small changes on an entire cell.
5) Large-scale analyses of sequence variation mapped to structure can uncover how small differences in protein structure potentially lead to metabolic changes within different strains of an organism. Furthermore, these small differences are crucial to finding regions in proteins that are amenable to enzyme engineering and drug design approaches.
6) Studies of interactions between proteins and small molecules have largely been limited to those involved directly in catalysis and examined by enzymatic assays. A large-scale understanding of all small-molecule interactions with proteins inside a cell adds yet another “interactome” to systems biology models, uncovering competitive and noncompetitive interactions that regulate processes alongside the transcriptome.

2.7 Broader Perspectives

GEMs have developed to a considerable degree of sophistication in content and computational abilities and are being used for an increasing number of complex applications. Here we show how they can help form a multiscale basis for assessing causality in cellular phenotypes, and how they fit into the increasingly complex and interesting world of data analytics.

2.7.1 Distal Causation

Mutations represent important variables for strain design. A number of causal mutations resulting from mutation screens or from controlled laboratory evolution experiments [109] are now available in public databases [108]. Detailed mutation causality can be established by integrating the analysis approaches discussed in the preceding sections (Figure 2.27).

Figure 2.27 Classification of structural systems biology studies into six use categories. Source: Mih and Palsson [50]. CC BY 4.0


Numerous laboratory evolution experiments selecting for optimal growth have shown that mutations in RNA polymerase (RNAP) are causal. These mutations reallocate the proteome away from stress functions [110] and toward growth functions, reflecting the fear-greed tradeoff [103] shown in Figure 2.26a. Mutations are identified by resequencing an evolved strain and using the sequence of the wild-type strain as a reference (Figure 2.28a). These mutations can then undergo structural analysis (Figure 2.28b). Transcriptomic analysis then reveals the differentially expressed sets of genes, showing the upregulation of growth-promoting genes and the downregulation of stress functions (Figure 2.28c). The ME model can then interpret these expression changes by computing how the proteome allocation shifts toward growth-promoting functions and away from nongrowth functions (Figure 2.28d). Finally, overall phenotypic traits such as growth capabilities and stress response can be measured (Figure 2.28e). Current GEMs can analyze every step in this multiscale relationship.

2.7.2 Contextualization of GEMs Within Workflows

Many levels of statistical and model-based analysis methods are now available to the metabolic engineer. It is difficult to put them all into context and understand the advantages and uses of each one. Below, I try to put this challenge into perspective by proposing four levels of analysis (Figure 2.29):
1) Statistics: Large omics data sets are often analyzed by multivariate statistics and machine learning. These methods reveal the overall features of a data set and often yield useful classification schemes. No mechanisms are represented.
2) Knowledge-enriched methods: The outcomes of artificial intelligence (AI) approaches are often intricate and complex patterns that are hard to interpret. This has created the need for explainable AI that embeds prior knowledge. A number of approaches are now being developed along these lines, and the iModulons described above represent such a knowledge-based approach to transcriptomic data. Other examples of knowledge-enriched relationships in multiomic data have appeared [112], including protein structures [50].
3) Network models: GEMs represent a case where knowledge bases (i.e. network reconstructions) have become comprehensive enough to build meaningful computational models.
4) Biophysical models: Reduced and parameterized GEMs can be converted into classical kinetic models. Such models have historically been used in biophysics and metabolic engineering and are covered in textbooks.

Levels 1 and 4 represent classical disciplines, and level 3 is mostly represented by GEMs and their development over the past 20 years, while level 2 is relatively new and is seeing the most current progress [34]. An early depiction of the role of GEMs in the strain design process illustrates their role in designing ideal networks and performing omic data analysis (Figure 2.30). Design at the genome scale is challenging, and thus the implementation of this process has been gradual. An early success came through the engineering of a fully synthetic pathway to 1,4-butanediol (BDO), which is not a natural metabolite [114]. Subsequently, this process was scaled up and put into commercial operation in 2016 [115, 116]. GEMs played a key role in engineering and developing this strain, leading to the quote “It’s all about balance, and not just in metabolism” by Harish Nagarajan of Genomatica at a Metabolic Engineering class in 2017.

[Figure 2.28 panels: (a) adaptive RNAP mutations in rpoB and rpoC identified by resequencing; (b) structural effects on RNAP and its interaction with ppGpp; (c) TRN reprogramming, with reduced stress responses in the mutants; (d) genome-scale model interpretation: nongrowth proteome fraction (%) vs. nongrowth energy use, with lines of constant growth rate, for WT and rpoB mutants; (e) systems-level phenotype: OD600 vs. time (h) in glucose M9 for the wild type and the rpoB E546V and E672K mutants.]

Figure 2.28 Multiscale characterization of mutational effects, from genotype to phenotype. The multiscale effects of the studied adaptive regulatory mutations in RNAP are summarized. The mutations alter the structural dynamics of RNAP, perturbing the transcriptional regulatory network through the action of key transcription factors. The decreased proteome and energy allocation toward stress-preparedness processes (hedging functions) increases cellular growth. In turn, the cell can grow faster in conditions of steady state growth, but is less fit under environmental shifts and shocks. The panels are detailed in the text. Source: Utrilla et al. [111]. © 2016 Elsevier.

[Figure 2.29 panels: 1 Statistics (PCA plot, clustermap; multivariate statistics, machine learning, PCA/ICA, clustering, dimensionality reduction); 2 Assembly, reconstruction (draft reconstruction, multistrain analyses, reactomes, transcriptional regulatory network); 3 Systems biology, physiology (network, flux map, system boundary; functional systems, homeostasis, data reconciliation, data mapping, prediction); 4 Biophysical models (detailed P/C models, dynamic simulations, design, pharmacokinetics).]

Figure 2.29 Four basic layers of data analysis available for metabolic engineering. Level 1 consists of statistical tools for data analysis. Level 2 involves mapping knowledge types onto large data sets. Level 3 involves the use of GEMs and other systems models. Level 4 consists of the familiar biophysical and engineering modeling methods.

[Figure 2.30 panels: a closed design loop within systems metabolic engineering linking systems-level analysis, modeling and simulation, genome engineering, and fermentation (time courses of glucose, growth, pyruvate, and L-valine in g/l over about 25 h).]

Figure 2.30 An early depiction of the use of systems level analysis, both statistical and mechanistic (GEMs), to form a closed design loop for production. Source: Park and Lee [113]. © 2008 Elsevier.

2.8 What Does the Future Look Like for GEMs?

On their 20th anniversary, it seems appropriate to put the emergence of GEMs into historical context (Figure 2.31). As stated above, whole genome sequencing emerged in the mid-1990s, followed by GEMs around 2000. DNA sequencing and synthesis technologies advanced markedly in the late 2000s. Inexpensive sequencing led to the development of numerous methods for genome-wide characterization (RNAseq, Ribo-seq, ChIP-seq, transcription start and termination seq, etc.). In the 2010s, the synthesis of entire natural [118] and modified [119] microbial genomes led to the first semisynthetic organisms. In the mid to late 2010s, large investments were made in commercial enterprises that generate massive data sets relating genotypes to phenotypes. Taken together, these genome-scale advances represent the ability to build, to monitor, and to statistically and mechanistically model cell functions. Such a set of capabilities is foundational to all engineering disciplines, and thus I hypothesize that there will be new academic programs, and eventually departments, leading to a new engineering discipline: Genome Engineering. Such programs are likely to be in place by the end of this decade. Metabolic engineering is likely to become a component of such departments, representing an early success in engineering phenotypes at the genome scale.

Figure 2.31 Synthetic biology and minimal cells: a historical perspective. Elucidating the DNA double helix marked the beginning of the molecular biology era, and it became possible to study the molecular mechanisms that underpinned observable phenotypes. DNA sequencing methods improved, leading to whole-genome sequencing at the end of the 1990s. Methods for mathematical cell modeling were developed during the 1980s and 1990s, making it possible to reconstruct metabolic networks and simulate them on computers (genome-scale models of metabolism, or GEMs). A defining moment took place in 2008 (red), with the creation of the first artificial genome that mimicked the genetic information of Mycoplasma genitalium, the free-living, nonsynthetic organism with the smallest genome. Thanks to developments in next-generation sequencing methods, this was paired with the rise of large-scale genome sequencing ventures, such as the Human Microbiome and 1000 Genomes projects. Advances in whole-genome synthesis, assembly, and transplantation helped create the first cell living with an entirely synthetic genome shortly after. All told, these achievements marked the coming of age for synthetic biology. Source: Lachance et al. [117]. CC BY 4.0.

Biology is characterized by dual causation (both proximal and distal causation). Distal causation characterizes changes in organism properties over many generations, while proximal causation refers to the responses of an organism to its environment against a fixed genetic background. The former is fundamental to biology, while the latter is key to engineering. GEMs can describe dual causation, as they represent the set of phenotypes that can be produced by the genetic elements encoded on a genome and can also address how these capabilities shift with changes in those genetic elements. Thus, there are two distinct frontiers in GEM development.
As biology is characterized by adaptation and evolution, there is growing interest in developing GEMs across branches of the phylogenetic tree and in characterizing multiple strains of the same bacterial species. This pursuit has led to a research emphasis called pangenomics. The pangenome is defined as the totality of genes found in the available genome sequences of strains of a species. Conversely, the set of genes present in the genomes of all sequenced strains of a species is referred to as the core genome. The core genome represents one way to define a species. The accessory genome (the pangenome minus the core genome) contains genes that have a variable presence across the sequenced strains; these genes confer strain-specific properties. Numerous pangenome studies have appeared [120–124]. Now that such analyses are reaching thousands or even tens of thousands of strains, the consequences of allelic variation in shared genes can be assessed [47], giving rise to alleleomics as a field. These developments have broad implications for biology and specific ones for metabolic engineering. For example, alleleomics can provide a list of likely compatible parts, originating from other strains of the same species, for use in a production strain. More broadly, building a Global Metabolic Atlas across the phylogenetic tree would be of broad interest and represents a grand challenge for biology. Another notable consequence of a broader phylogenetic view enabled by GEMs is the design of minimal genomes [125], which represents a step toward genome-scale engineering.
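The set definitions of pangenome, core genome, and accessory genome translate directly into set operations. In this sketch the strain names and gene identifiers are hypothetical, and genes are assumed to be already clustered into homologous families so that identical identifiers denote the same gene.

```python
from collections import Counter

# Hypothetical gene content of three strains of one species; identifiers are
# made up and assumed to denote homologous gene families.
strains = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
    "strain_C": {"g1", "g2", "g3", "g4", "g6"},
}

def pangenome_partition(gene_sets):
    """Return (pangenome, core genome, accessory genome) as sets."""
    sets = list(gene_sets)
    pan = set().union(*sets)          # genes found in any strain
    core = set.intersection(*sets)    # genes found in every strain
    return pan, core, pan - core      # accessory = pangenome minus core

pan, core, accessory = pangenome_partition(strains.values())
presence = Counter(g for genes in strains.values() for g in genes)

print(sorted(core))                                 # ['g1', 'g2', 'g3']
print(sorted(accessory))                            # ['g4', 'g5', 'g6']
print({g: presence[g] for g in sorted(accessory)})  # {'g4': 2, 'g5': 1, 'g6': 1}
```

Counting presence per gene distinguishes accessory genes that are nearly universal from strain-specific ones. Real pangenome pipelines first cluster genes into homologous families (for example by sequence identity); that step is omitted here.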


In contrast to the broad scope of biology, metabolic engineering is focused on the optimal performance of one strain in a specified production environment. From this standpoint, GEMs could be improved by developing them into more specific and validated models. GEMs can be specialized to whole-cell models, where the environment and genetics are fixed and the emphasis is on optimal performance, i.e. proximal causation. This calls for a clear definition of the genes expressed under the conditions of interest, along with the establishment of their gene products’ physicochemical properties, to enable a quantitative simulation of cell behavior under specific processing conditions. It also calls for the difficult delineation of the regulatory mechanisms dominant under these specific conditions and of the kinetic properties of key enzymes. The elimination of any unknown or undesirable genes that can be inadvertently activated, leading to deteriorated performance, may also be required. Broadly, both genome reduction to eliminate unknown or unwanted parts and the systematic discovery of the functions of uncharacterized cellular components must be prioritized in the pursuit of predictive engineering models. GEMs are mechanistic models based on curated knowledge bases. However, we do not yet know everything about genomes and their products that is necessary to build such models comprehensively. The history of other engineering fields teaches us that phenomenological correlations need to be integrated with mechanistic models to enable engineering designs. Thus, workflows integrating GEMs with knowledge-enriched statistical models (such as iModulons to describe the TRN) need to be developed (Figures 2.29 and 2.30). This necessitates addressing exciting challenges in generating large data sets, developing new algorithms, expanding reconstructions, and advancing computational biology.

Major advances in science and technology are often framed around grand challenges. These include the Manhattan Project in the 1940s to split the atom, the race to send a man to the moon in the 1960s, and sequencing the human genome in the 1990s. The ability to design and build genomes would certainly impact human history. Unlike the other grand challenges mentioned, each with a specific goal, Genome Engineering will develop more broadly as a field and is likely to reach many historical milestones in the decades to come. The continued development of the scope and capabilities of GEMs is likely to play an integral role in this process.

Disclaimer

The COBRA field now has about 2000 practitioners (as estimated from Google Analytics data on the average number of unique monthly users of the BiGG Models website [126], a commonly used source of GEMs, and of the COBRA Toolbox [127] and COBRApy [128], two commonly used computational resources), with many contributions that are hard for me to represent comprehensively. Thus, this chapter is necessarily slanted toward my experience and the leadership role that the E. coli GEM has played in the field thus far. I should note that the developments outlined in this chapter have been extended to eukaryotes. In particular, extensive work on yeast GEM development has been carried out by Jens Nielsen at Chalmers University, material that is unfortunately not covered in this chapter. Apologies in advance to those who feel their contributions have been overlooked. I tried my best to get input from the leading practitioners in the field to build a comprehensive list of key citations.

Acknowledgments

Many thanks to Sang-Yup Lee, Daniel Zielinski, Colton Lloyd, Charles Norsigian, Harish Nagarajan, Xin Fang, Jonathan Monk, and Adam Feist for reviewing this chapter, providing helpful input, or inspiring certain sections. Special thanks to Costas Maranas for his detailed input on the “Design Algorithms” sub-section, which would not be in this chapter without his help. Support comes mostly from the National Institute of General Medical Sciences (Grant R01GM057089), which has been continuously funded from 1998 to 2022, and from the Novo Nordisk Foundation (Grant NNF10CC1016517) from 2011 to 2020. Special thanks to Marc Abrams, who has been editing manuscripts on GEMs since 1998 and without whom the impact of the contributions from my lab would have been diminished due to lack of clarity and poor presentation. He is intimately familiar with the history of the E. coli GEM that forms the backbone of this chapter, and that represents a pioneering contribution to metabolic systems biology and its impact on metabolic engineering.

References

1 Bailey, J.E. (1991). Toward a science of metabolic engineering. Science 252 (5013): 1668–1675.
2 Fleischmann, R.D., Adams, M.D., White, O. et al. (1995). Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269 (5223): 496–512.
3 Blattner, F.R., Plunkett, G. 3rd, Bloch, C.A. et al. (1997). The complete genome sequence of Escherichia coli K-12. Science 277 (5331): 1453–1462.
4 Edwards, J.S. and Palsson, B.O. (1999). Systems properties of the Haemophilus influenzae Rd metabolic genotype. J. Biol. Chem. 274 (25): 17410–17416.
5 Edwards, J.S. and Palsson, B.O. (2000). The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities. Proc. Natl. Acad. Sci. U. S. A. 97 (10): 5528–5533.
6 Varma, A. and Palsson, B.O. (1994). Metabolic flux balancing: basic concepts, scientific and practical use. Nat. Biotechnol. 12 (10): 994–998.
7 Fang, X., Lloyd, C.J., and Palsson, B.O. (2020). Reconstructing organisms in silico: genome-scale models and their emerging applications. Nat. Rev. Microbiol. Epub ahead of print. https://doi.org/10.1038/s41579-020-00440-4.
8 Covert, M.W., Knight, E.M., Reed, J.L. et al. (2004). Integrating high-throughput and computational data elucidates bacterial networks. Nature 429 (6987): 92–96.
9 Heirendt, L., Arreckx, S., Pfau, T. et al. (2019). Creation and analysis of biochemical constraint-based models using the COBRA toolbox v.3.0. Nat. Protoc. 14 (3): 639–702.
10 Schellenberger, J., Que, R., Fleming, R.M.T. et al. (2011). Quantitative prediction of cellular metabolism with constraint-based models: the COBRA toolbox v2.0. Nat. Protoc. 6 (9): 1290–1307.
11 Becker, S.A., Feist, A.M., Mo, M.L. et al. (2007). Quantitative prediction of cellular metabolism with constraint-based models: the COBRA toolbox. Nat. Protoc. 2 (3): 727–738.
12 Lewis, N.E., Nagarajan, H., and Palsson, B.O. (2012). Constraining the metabolic genotype-phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10 (4): 291–305.
13 Kim, W.J., Kim, H.U., and Lee, S.Y. (2017). Current state and applications of microbial genome-scale metabolic models. Curr. Opin. Syst. Biol. 2: 10–18.
14 Kim, B., Kim, W.J., Kim, D.I., and Lee, S.Y. (2015). Applications of genome-scale metabolic network model in metabolic engineering. J. Ind. Microbiol. Biotechnol. 42 (3): 339–348.
15 Oberhardt, M.A., Palsson, B.O., and Papin, J.A. (2009). Applications of genome-scale metabolic reconstructions. Mol. Syst. Biol. 5: 320.
16 O’Brien, E.J., Monk, J.M., and Palsson, B.O. (2015). Using genome-scale models to predict biological capabilities. Cell 161 (5): 971–987.
17 Palsson, B.Ø. (2015). Systems Biology: Constraint-Based Reconstruction and Analysis. Cambridge University Press.
18 Palsson, B. (2011). Systems Biology: Simulation of Dynamic Network States. Cambridge, UK: Cambridge University Press.
19 Segel, I.H. (1975). Enzyme Kinetics: Behavior and Analysis of Rapid Equilibrium and Steady-State Enzyme Systems. Wiley.
20 Bordbar, A., Monk, J.M., King, Z.A., and Palsson, B.O. (2014). Constraint-based models predict metabolic and associated cellular functions. Nat. Rev. Genet. 15 (2): 107–120.
21 Lachance, J.-C., Lloyd, C.J., Monk, J.M. et al. (2019). BOFdat: generating biomass objective functions for genome-scale metabolic models from experimental data. PLoS Comput. Biol. 15 (4): e1006971.
22 Orth, J.D., Thiele, I., and Palsson, B.Ø. (2010). What is flux balance analysis? Nat. Biotechnol. 28 (3): 245–248.
23 Chae, T.U., Choi, S.Y., Kim, J.W. et al. (2017). Recent advances in systems metabolic engineering tools and strategies. Curr. Opin. Biotechnol. 47: 67–82.
24 Kim, H.U., Kim, T.Y., and Lee, S.Y. (2008). Metabolic flux analysis and metabolic engineering of microorganisms. Mol. Biosyst. 4 (2): 113–120.
25 Park, J.M., Kim, T.Y., and Lee, S.Y. (2009). Constraints-based genome-scale metabolic simulation for systems metabolic engineering. Biotechnol. Adv. 27 (6): 979–988.


26 Maranas, C.D. and Zomorrodi, A.R. (2016). Optimization Methods in Metabolic Networks. Hoboken, NJ: Wiley.

27 Reed, J.L., Famili, I., Thiele, I., and Palsson, B.O. (2006). Towards multidimensional genome annotation. Nat. Rev. Genet. 7 (2): 130–141.

28 Bordbar, A., Johansson, P.I., Paglia, G. et al. (2016). Identified metabolic signature for assessing red blood cell unit quality is associated with endothelial damage markers and clinical outcomes. Transfusion 56 (4): 852–862.

29 Bordbar, A., Yurkovich, J.T., Paglia, G. et al. (2017). Elucidating dynamic metabolic physiology through network integration of quantitative time-course metabolomics. Sci. Rep. 7: 46249.

30 Jamshidi, N. and Palsson, B.O. (2010). Mass action stoichiometric simulation models: incorporating kinetics and regulation into stoichiometric models. Biophys. J. 98: 175–185.

31 Famili, I., Mahadevan, R., and Palsson, B.O. (2005). k-Cone analysis: determining all candidate values for kinetic parameters on a network scale. Biophys. J. 88 (3): 1616–1625.

32 Haraldsdóttir, H.S. and Fleming, R.M.T. (2016). Identification of conserved moieties in metabolic networks by graph theoretical analysis of atom transition networks. PLoS Comput. Biol. 12 (11): e1004999.

33 Ghaderi, S., Haraldsdóttir, H.S., Ahookhosh, M. et al. (2020). Structural conserved moiety splitting of a stoichiometric matrix. J. Theor. Biol. 499: 110276.

34 Lee, S.Y. and Kim, H.U. (2015). Systems strategies for developing industrial microbial strains. Nat. Biotechnol. 33 (10): 1061–1072.

35 Ko, Y.-S., Kim, J.W., Lee, J.A. et al. (2020). Tools and strategies of systems metabolic engineering for the development of microbial cell factories for chemical production. Chem. Soc. Rev. 49 (14): 4615–4636.

36 Ebrahim, A., Lerman, J.A., Palsson, B.O., and Hyduke, D.R. (2013). COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst. Biol. 7: 74.

37 Thiele, I. and Palsson, B.Ø. (2010). A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat. Protoc. 5 (1): 93–121.

38 Norsigian, C.J., Fang, X., Seif, Y. et al. (2020). A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat. Protoc. 15 (1): 1–14.

39 Faria, J.P., Rocha, M., Rocha, I., and Henry, C.S. (2018). Methods for automated genome-scale metabolic model reconstruction. Biochem. Soc. Trans. 46 (4): 931–936.

40 Lieven, C., Beber, M.E., Olivier, B.G. et al. (2020). MEMOTE for standardized genome-scale metabolic model testing. Nat. Biotechnol. 38 (3): 272–276.

41 Satish Kumar, V., Dasika, M.S., and Maranas, C.D. (2007). Optimization based automated curation of metabolic reconstructions. BMC Bioinformatics 8: 212.

42 Orth, J.D. and Palsson, B.Ø. (2010). Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng. 107 (3): 403–412.

43 Feist, A.M. and Palsson, B.O. (2010). The biomass objective function. Curr. Opin. Microbiol. 13 (3): 344–349.


44 Reed, J.L., Patel, T.R., Chen, K.H. et al. (2006). Systems approach to refining genome annotation. Proc. Natl. Acad. Sci. U. S. A. 103 (46): 17480–17484.

45 Monk, J. and Palsson, B.O. (2014). Genetics. Predicting microbial growth. Science 344 (6191): 1448–1449.

46 Kavvas, E.S., Catoiu, E., Mih, N. et al. (2018). Machine learning and structural analysis of Mycobacterium tuberculosis pan-genome identifies genetic signatures of antibiotic resistance. Nat. Commun. 9 (1): 4306.

47 Kavvas, E.S., Yang, L., Monk, J.M. et al. (2020). A biochemically-interpretable machine learning classifier for microbial GWAS. Nat. Commun. 11 (1): 2580.

48 Monk, J.M., Lloyd, C.J., Brunk, E. et al. (2017). iML1515, a knowledgebase that computes Escherichia coli traits. Nat. Biotechnol. 35 (10): 904–908.

49 Brunk, E., Sahoo, S., Zielinski, D.C. et al. (2018). Recon3D enables a three-dimensional view of gene variation in human metabolism. Nat. Biotechnol. 36 (3): 272–281.

50 Mih, N. and Palsson, B.O. (2019). Expanding the uses of genome-scale models with protein structures. Mol. Syst. Biol. 15 (11): e8601.

51 Gu, C., Kim, G.B., Kim, W.J. et al. (2019). Current status and applications of genome-scale metabolic models. Genome Biol. 20 (1): 121.

52 Monk, J., Nogales, J., and Palsson, B.O. (2014). Optimizing genome-scale network reconstructions. Nat. Biotechnol. 32 (5): 447–452.

53 Mendoza, S.N., Olivier, B.G., Molenaar, D., and Teusink, B. (2019). A systematic assessment of current genome-scale metabolic reconstruction tools. Genome Biol. 20 (1): 158.

54 Lewis, N.E., Nagarajan, H., and Palsson, B.O. (2012). Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. https://www.nature.com/articles/nrmicro2737.

55 Norsigian, C.J., Pusarla, N., McConn, J.L. et al. (2020). BiGG models 2020: multi-strain genome-scale models and expansion across the phylogenetic tree. Nucleic Acids Res. 48 (D1): D402–D406.

56 Orth, J.D., Conrad, T.M., Na, J. et al. (2011). A comprehensive genome-scale reconstruction of Escherichia coli metabolism – 2011. Mol. Syst. Biol. 7: 535.

57 Ghatak, S., King, Z.A., Sastry, A., and Palsson, B.O. (2019). The y-ome defines the 35% of Escherichia coli genes that lack experimental evidence of function. Nucleic Acids Res. 47 (5): 2446–2454.

58 Ryu, J.Y., Kim, H.U., and Lee, S.Y. (2019). Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl. Acad. Sci. U. S. A. 116 (28): 13996–14001.

59 Notebaart, R.A., Szappanos, B., Kintses, B. et al. (2014). Network-level architecture and the evolutionary potential of underground metabolism. Proc. Natl. Acad. Sci. U. S. A. 111 (32): 11762–11767.

60 Nyerges, Á., Csörgő, B., Nagy, I. et al. (2016). A highly precise and portable genome engineering method allows comparison of mutational effects across bacterial species. Proc. Natl. Acad. Sci. U. S. A. 113 (9): 2502–2507.

61 Guzmán, G.I., Sandberg, T.E., LaCroix, R.A. et al. (2019). Enzyme promiscuity shapes adaptation to novel growth substrates. Mol. Syst. Biol.: e8462. http://dx.doi.org/10.15252/msb.20188462.


62 Edwards, J.S., Ibarra, R.U., and Palsson, B.O. (2001). In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat. Biotechnol. 19 (2): 125–130.

63 Ibarra, R.U., Edwards, J.S., and Palsson, B.O. (2002). Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420 (6912): 186–189.

64 Fong, S.S. and Palsson, B.Ø. (2004). Metabolic gene-deletion strains of Escherichia coli evolve to computationally predicted growth phenotypes. Nat. Genet. 36 (10): 1056–1058.

65 Herring, C.D., Raghunathan, A., Honisch, C. et al. (2006). Comparative genome sequencing of Escherichia coli allows observation of bacterial evolution on a laboratory timescale. Nat. Genet. 38 (12): 1406–1412.

66 Cheng, K.-K., Lee, B.-S., Masuda, T. et al. (2014). Global metabolic network reorganization by adaptive mutations allows fast growth of Escherichia coli on glycerol. Nat. Commun. 5: 3233.

67 Lee, S.Y., Kim, H.U., Chae, T.U. et al. (2019). A comprehensive metabolic map for production of bio-based chemicals. Nat. Catal. 2 (1): 18–33.

68 Burgard, A.P., Pharkya, P., and Maranas, C.D. (2003). Optknock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol. Bioeng. 84 (6): 647–657.

69 Pharkya, P., Burgard, A.P., and Maranas, C.D. (2004). OptStrain: a computational framework for redesign of microbial production systems. Genome Res. 14 (11): 2367–2376.

70 Ranganathan, S., Suthers, P.F., and Maranas, C.D. (2010). OptForce: an optimization procedure for identifying all genetic manipulations leading to targeted overproductions. PLoS Comput. Biol. 6 (4): e1000744.

71 Chowdhury, A., Zomorrodi, A.R., and Maranas, C.D. (2014). k-OptForce: integrating kinetics with flux balance analysis for strain design. PLoS Comput. Biol. 10 (2): e1003487.

72 Wang, L. and Maranas, C.D. (2018). MinGenome: an in silico top-down approach for the synthesis of minimized genomes. ACS Synth. Biol. 7 (2): 462–473.

73 Sarkar, D. and Maranas, C.D. (2020). SNPeffect: identifying functional roles of SNPs using metabolic networks. Plant J. 103 (2): 512–531.

74 Bonarius, H.P.J., Schmid, G., and Tramper, J. (1997). Flux analysis of underdetermined metabolic networks: the quest for the missing constraints. Trends Biotechnol. 15 (8): 308–314.

75 Edwards, J.S., Covert, M., and Palsson, B. (2002). Metabolic modelling of microbes: the flux-balance approach. Environ. Microbiol. 4 (3): 133–140.

76 Edwards, J.S., Ramakrishna, R., Schilling, C.H., and Palsson, B.O. (1999). Metabolic flux balance analysis. In: Metabolic Engineering (eds. S.Y. Lee and E.T. Papoutsakis), 13–57. New York, NY: Marcel Dekker.

77 Schilling, C.H., Letscher, D., and Palsson, B.O. (2000). Theory for the systemic definition of metabolic pathways and their use in interpreting metabolic function from a pathway-oriented perspective. J. Theor. Biol. 203 (3): 229–248.


78 Schuster, S., Dandekar, T., and Fell, D.A. (1999). Detection of elementary flux modes in biochemical networks: a promising tool for pathway analysis and metabolic engineering. Trends Biotechnol. 17 (2): 53–60.

79 Covert, M.W. and Palsson, B.Ø. (2002). Transcriptional regulation in constraints-based metabolic models of Escherichia coli. J. Biol. Chem. 277 (31): 28058–28064.

80 Allen, T.E. and Palsson, B.Ø. (2003). Sequence-based analysis of metabolic demands for protein synthesis in prokaryotes. J. Theor. Biol. 220 (1): 1–18.

81 Reed, J.L. and Palsson, B.Ø. (2003). Thirteen years of building constraint-based in silico models of Escherichia coli. J. Bacteriol. 185 (9): 2692–2699.

82 Thiele, I., Jamshidi, N., Fleming, R.M.T., and Palsson, B.Ø. (2009). Genome-scale reconstruction of Escherichia coli’s transcriptional and translational machinery: a knowledge base, its mathematical formulation, and its functional characterization. PLoS Comput. Biol. 5 (3): e1000312.

83 Lerman, J.A., Hyduke, D.R., Latif, H. et al. (2012). In silico method for modelling metabolism and gene product expression at genome scale. Nat. Commun. 3: 929.

84 O’Brien, E.J., Lerman, J.A., Chang, R.L. et al. (2013). Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Syst. Biol. 9: 693.

85 Liu, J.K., O’Brien, E.J., Lerman, J.A. et al. (2014). Reconstruction and modeling protein translocation and compartmentalization in Escherichia coli at the genome-scale. BMC Syst. Biol. 8: 110.

86 King, Z.A., O’Brien, E.J., Feist, A.M., and Palsson, B.O. (2017). Literature mining supports a next-generation modeling approach to predict cellular byproduct secretion. Metab. Eng. 39: 220–227.

87 Schmidt, T.M. (2019). Encyclopedia of Microbiology. Academic Press.

88 Lloyd, C.J., Monk, J., Yang, L. et al. (2020). Computation of condition-dependent proteome allocation reveals variability in the macro and micro nutrient requirements for growth. bioRxiv 2020.03.23.003236. https://www.biorxiv.org/content/10.1101/2020.03.23.003236v1 (accessed 28 August 2020).

89 Mih, N., Brunk, E., Chen, K. et al. (2018). ssbio: a Python framework for structural systems biology. Bioinformatics 34 (12): 2155–2157.

90 Chen, K., Gao, Y., Mih, N. et al. (2017). Thermosensitivity of growth is determined by chaperone-mediated proteome reallocation. Proc. Natl. Acad. Sci. U. S. A. 114 (43): 11548–11553.

91 Yang, L., Mih, N., Anand, A. et al. (2019). Cellular responses to reactive oxygen species are predicted from molecular mechanisms. Proc. Natl. Acad. Sci. U. S. A. 116 (28): 14368–14373.

92 Du, B., Yang, L., Lloyd, C.J. et al. (2019). Genome-scale model of metabolism and gene expression provides a multi-scale description of acid stress responses in Escherichia coli. PLoS Comput. Biol. 15 (12): e1007525.

93 McCloskey, D., Palsson, B.Ø., and Feist, A.M. (2013). Basic and applied uses of genome-scale metabolic network reconstructions of Escherichia coli. Mol. Syst. Biol. 9: 661.


94 McCloskey, D., Xu, S., Sandberg, T.E. et al. (2018). Adaptive laboratory evolution resolves energy depletion to maintain high aromatic metabolite phenotypes in Escherichia coli strains lacking the phosphotransferase system. Metab. Eng. 48: 233–242.

95 McCloskey, D., Xu, S., Sandberg, T.E. et al. (2018). Adaptation to the coupling of glycolysis to toxic methylglyoxal production in tpiA deletion strains of Escherichia coli requires synchronized and counterintuitive genetic changes. Metab. Eng.: 82–93. http://dx.doi.org/10.1016/j.ymben.2018.05.012.

96 McCloskey, D., Xu, S., Sandberg, T.E. et al. (2018). Multiple optimal phenotypes overcome redox and glycolytic intermediate metabolite imbalances in Escherichia coli pgi knockout evolutions. Appl. Environ. Microbiol. 84 (19). http://dx.doi.org/10.1128/AEM.00823-18.

97 McCloskey, D., Xu, S., Sandberg, T.E. et al. (2018). Growth adaptation of gnd and sdhCB Escherichia coli deletion strains diverges from a similar initial perturbation of the transcriptome. Front. Microbiol. 9: 1793.

98 Heckmann, D., Campeau, A., Lloyd, C.J. et al. (2020). Kinetic profiling of metabolic specialists demonstrates stability and consistency of in vivo enzyme turnover numbers. Proc. Natl. Acad. Sci. U. S. A. 117 (37): 23182–23190.

99 Davidi, D., Noor, E., Liebermeister, W. et al. (2016). Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U. S. A. 113 (12): 3401–3406.

100 Nilsson, A., Nielsen, J., and Palsson, B.O. (2017). Metabolic models of protein allocation call for the kinetome. Cell Syst. 5 (6): 538–541.

101 Clough, E. and Barrett, T. (2016). The gene expression omnibus database. Methods Mol. Biol. 1418: 93–110.

102 Ziemann, M., Kaspi, A., and El-Osta, A. (2019). Digital expression explorer 2: a repository of uniformly processed RNA sequencing data. Gigascience 8 (4). http://dx.doi.org/10.1093/gigascience/giz022.

103 Sastry, A.V., Gao, Y., Szubin, R. et al. (2019). The Escherichia coli transcriptome mostly consists of independently regulated modules. Nat. Commun. 10 (1): 5536.

104 Poudel, S., Tsunemoto, H., Seif, Y. et al. (2020). Revealing 29 sets of independently modulated genes in Staphylococcus aureus, their regulators, and role in key physiological response. Proc. Natl. Acad. Sci. U. S. A. http://dx.doi.org/10.1073/pnas.2008413117.

105 Rychel, K., Sastry, A.V., and Palsson, B.O. (2020). Machine learning uncovers independently regulated modules in the Bacillus subtilis transcriptome. bioRxiv 2020.04.26.062638. https://www.biorxiv.org/content/10.1101/2020.04.26.062638v1 (accessed 28 August 2020).

106 Sastry, A., Dillon, N., Poudel, S. et al. (2020). Decomposition of transcriptional responses provides insights into differential antibiotic susceptibility. bioRxiv 2020.05.04.077271. https://doi.org/10.1101/2020.05.04.077271.

107 Tan, J., Sastry, A.V., Fremming, K.S. et al. (2020). Independent component analysis of E. coli’s transcriptome reveals the cellular processes that respond to heterologous gene expression. Metab. Eng. 61: 360–368.


108 Phaneuf, P.V., Gosting, D., Palsson, B.O., and Feist, A.M. (2018). ALEdb 1.0: a database of mutations from adaptive laboratory evolution experimentation. Nucleic Acids Res. http://dx.doi.org/10.1093/nar/gky983.

109 Mundhada, H., Seoane, J.M., Schneider, K. et al. (2017). Increased production of L-serine in Escherichia coli through adaptive laboratory evolution. Metab. Eng. 39: 141–150.

110 Conrad, T.M., Frazier, M., Joyce, A.R. et al. (2010). RNA polymerase mutants found through adaptive evolution reprogram Escherichia coli for optimal growth in minimal media. Proc. Natl. Acad. Sci. U. S. A. 107 (47): 20500–20505.

111 Utrilla, J., O’Brien, E.J., Chen, K. et al. (2016). Global rebalancing of cellular resources by pleiotropic point mutations illustrates a multi-scale mechanism of adaptive evolution. Cell Syst. 2 (4): 260–271.

112 Ebrahim, A., Brunk, E., Tan, J. et al. (2016). Multi-omic data integration enables discovery of hidden biological regularities. Nat. Commun. 7: 13091.

113 Park, J.H. and Lee, S.Y. (2008). Towards systems metabolic engineering of microorganisms for amino acid production. Curr. Opin. Biotechnol. 19 (5): 454–460.

114 Yim, H., Haselbeck, R., Niu, W. et al. (2011). Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 7 (7): 445–452.

115 Barton, N.R., Burgard, A.P., Burk, M.J. et al. (2015). An integrated biotechnology platform for developing sustainable chemical processes. J. Ind. Microbiol. Biotechnol. 42 (3): 349–360.

116 Burgard, A., Burk, M.J., Osterhout, R. et al. (2016). Development of a commercial scale process for production of 1,4-butanediol from sugar. Curr. Opin. Biotechnol. 42: 118–125.

117 Lachance, J.-C., Rodrigue, S., and Palsson, B.O. (2019). Minimal cells, maximal knowledge. eLife 8. http://dx.doi.org/10.7554/eLife.45379.

118 Gibson, D.G., Glass, J.I., Lartigue, C. et al. (2010). Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329 (5987): 52–56.

119 Hutchison, C.A. 3rd, Chuang, R.-Y., Noskov, V.N. et al. (2016). Design and synthesis of a minimal bacterial genome. Science 351 (6280): aad6253.

120 Monk, J.M., Charusanti, P., Aziz, R.K. et al. (2013). Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc. Natl. Acad. Sci. U. S. A. 110 (50): 20338–20343.

121 Fang, X., Monk, J.M., Mih, N. et al. (2018). Escherichia coli B2 strains prevalent in inflammatory bowel disease patients have distinct metabolic capabilities that enable colonization of intestinal mucosa. BMC Syst. Biol. 12 (1): 66.

122 Norsigian, C.J., Attia, H., Szubin, R. et al. (2019). Comparative genome-scale metabolic modeling of metallo-beta-lactamase-producing multidrug-resistant Klebsiella pneumoniae clinical isolates. Front. Cell. Infect. Microbiol. 9: 161.

123 Seif, Y., Kavvas, E., Lachance, J.-C. et al. (2018). Genome-scale metabolic reconstructions of multiple Salmonella strains reveal serovar-specific metabolic traits. Nat. Commun. 9 (1): 3771.


124 Kim, Y., Gu, C., Kim, H.U., and Lee, S.Y. (2020). Current status of pan-genome analysis for pathogenic bacteria. Curr. Opin. Biotechnol. 63: 54–62.

125 Lara, A.R. and Gosset, G. (2019). Minimal Cells: Design, Construction, Biotechnological Applications. Springer International Publishing.

126 BiGG Models. http://bigg.ucsd.edu (accessed 28 August 2020).

127 Home Page – The COBRA Toolbox. https://opencobra.github.io/cobratoolbox/stable (accessed 28 August 2020).

128 cobrapy – constraint-based metabolic modeling in Python. https://opencobra.github.io/cobrapy (accessed 28 August 2020).


3 Quantitative Metabolic Flux Analysis Based on Isotope Labeling

Wolfgang Wiechert and Katharina Nöh

Forschungszentrum Jülich GmbH, Institute of Bio- and Geosciences, Jülich, Germany

3.1 Introduction

3.1.1 What Metabolic Flux Analysis Is About

The overarching aim of Metabolic Flux Analysis (MFA) is to find out as much as possible about the intracellular reaction rates in living microorganisms. MFA is a quantitative strategy based on a metabolic network and experimental data, from which it seeks to calculate the fluxes with the highest possible accuracy. Over recent decades, a great diversity of MFA approaches and techniques has emerged, inspired by a range of biological questions and by various experimental and analytical advances. These techniques share one key aspect: they interrogate experimental data on the basis of mass balances derived from a metabolic reaction network. Consequently, MFA should not be confused with Flux Balance Analysis (FBA), which predicts physiological phenotypes that are, in principle, permitted by the underlying metabolic network and are selected by an “evolutionary” goal, possibly together with some additional information (see, for example, Chapter 4). Both methods are based on the stoichiometric mass balance equations for fluxes, but only MFA strives to narrow down the actual flux distribution by assimilating sufficient information from various measured data sources. Nonetheless, both FBA and MFA fall into the broader category of “constraint-based modeling” approaches, and both have a firm place in systems biology and, specifically, in systems metabolic engineering. This chapter introduces state-of-the-art quantitative MFA techniques, namely MFA based on isotope labeling experiments (ILEs), in short 13C-MFA. Building on the principles of MFA, 13C-MFA extends the stoichiometric models introduced in Chapter 2 by establishing the relations between the fluxes and the labeling data measured in ILEs. The ultimate question in quantitative MFA is therefore whether the intracellular metabolic fluxes of a microorganism can be determined unambiguously from data measured in a cultivation experiment.
Moreover, as the adjective quantitative emphasizes, the statistical quality of the derived fluxes, expressed as confidence bounds, is part of the analysis. The final result of a 13C-MFA investigation is thus given by a metabolic flux map along with the related statistical information on the estimated fluxes (Figure 3.1). In statistics, the term estimation is generally used for uncertain quantities whose true values are unknown. Accordingly, in 13C-MFA we speak of flux estimates: the most likely, or highly likely, values that explain the data, which are always accompanied by confidence bounds.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

[Figure 3.1: Flux map of central carbon metabolism (glucose uptake via the PTS, EMP pathway, PPP, anaplerosis (ANA), and TCA cycle); reactions are labeled with their net and exchange (xch) fluxes.]

Figure 3.1 Typical result of a 13C-MFA investigation. Flux map of central carbon metabolism of Corynebacterium glutamicum. Net fluxes are indicated by pathway strength. Statistical determinacy of the net and exchange fluxes is represented by traffic-light labels: green, flux precision of 10–20%; yellow, flux precision of 20–100%; red, flux precision of >100%. Net fluxes, except those in the anaplerosis, are well determined by the data. Source: Data taken from Zelle [1].
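To make the notion of a flux estimate with confidence bounds concrete, the following is a minimal Python sketch (assuming NumPy) of generic linearized least-squares statistics, not the chapter's own procedure. It assumes that the measurements y depend linearly on the fluxes f through a sensitivity matrix J and carry independent errors with standard deviations σ; the weighted least-squares estimate then has the approximate covariance (JᵀΣ⁻¹J)⁻¹, from which confidence intervals follow. In real 13C-MFA the labeling model is nonlinear in the fluxes, so such intervals are only local approximations. The function name, matrix, and data are hypothetical.

```python
import numpy as np

def wls_flux_estimate(J, y, sigma, z=1.96):
    """Weighted least-squares estimate with approximate confidence intervals.

    J     : (m, n) hypothetical sensitivity matrix mapping n fluxes to m measurements
    y     : (m,) measured values
    sigma : (m,) standard deviations of the measurements
    z     : standard-normal quantile (1.96 gives ~95% intervals)
    """
    J = np.asarray(J, dtype=float)
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(sigma, dtype=float) ** 2   # inverse variances as weights
    fisher = J.T @ (w[:, None] * J)                  # J^T Sigma^-1 J
    cov = np.linalg.inv(fisher)                      # approximate covariance of the estimate
    f_hat = cov @ (J.T @ (w * y))                    # solution of the normal equations
    half_width = z * np.sqrt(np.diag(cov))           # half-widths of the intervals
    return f_hat, f_hat - half_width, f_hat + half_width

# Two hypothetical fluxes observed through three measurements: f1, f2, and f1 + f2.
J = [[1, 0], [0, 1], [1, 1]]
y = [1.0, 2.0, 3.1]
sigma = [0.1, 0.1, 0.1]
f_hat, lower, upper = wls_flux_estimate(J, y, sigma)
print(f_hat)          # approximately [1.0333, 2.0333]
print(lower, upper)   # each estimate lies inside its interval
```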


The set of all time-dependent metabolic fluxes in a living cell is often referred to as the fluxome and, hence, belongs to the omics data family [2]. Roughly speaking, the genome determines the cellular inventory, the transcriptome relates to genetic regulation, the proteome describes the catalytic activities of enzymes, and the metabolome prescribes thermodynamic driving forces, but only the fluxome reveals the ultimate functional manifestation of the physiological phenotype of a cell [3–5]. For metabolic engineering, the fluxome is of utmost importance because it shows how the cell actually uses its resources to steer intracellular material flows toward desired and undesired products, including biomass. This is why MFA was developed mainly in the field of industrial biotechnology, where it immediately became a fixed member of the metabolic engineer's toolset [6, 7]. Conceptually, however, there is one big difference between fluxomics and all other omics methods: there is no direct way of measuring intracellular fluxes. As is shown below, even if all intracellular concentrations are continuously measured over time, it is in general impossible to calculate the fluxes from concentration changes without imposing additional biological assumptions. In other words, fluxomics is not just a branch of metabolomics. Flux quantification requires an additional source of information, namely isotopic enrichment data for the intracellular metabolite pools. Although direct interpretation of labeling data can give qualitative insights in special situations [8], a computational, model-based procedure is indispensable for deriving the desired intracellular fluxes from the labeling patterns. More precisely, knowledge of the metabolic reaction network, combined with material balances and mild biological assumptions, provides the information necessary for metabolic flux quantification.
Based on this general setting, many different 13C-MFA techniques have emerged over recent decades. These variants are motivated by different biological questions, kinds of information, experimental regimes, or measurement configurations. An overview is given in Figure 3.2. This introductory chapter concentrates on the current mainstream techniques¹ to provide orientation for the newcomer as well as guidance for the practitioner. Strong emphasis is placed on fundamental principles and possible pitfalls, which are motivated with simple examples; references are provided for more specialized topics. Nevertheless, it is still highly recommended to read introductory papers from the early days of 13C-MFA, such as [9–14], because at that time the variety of methods, tools, analytical instruments, and applications was much smaller and could, thus, be described in a more focused way. Step-by-step protocols for 13C-MFA in Escherichia coli are found in [15] and its recent update [16], and in the collections for microbes [17], eukaryotes [18], and plants [19]. Furthermore, plenty of timely thematic reviews are available, e.g. [20–22]. We also refer the reader to classic introductions by the same authors [23–26]. Finally, this chapter cannot cover all important contributions to the multifaceted field of 13C-MFA.

¹ “Mainstream” here means that software tools are available for data evaluation.

[Figure 3.2: Panels arranged by metabolic regime (metabolic dynamic, metabolic stationary) and isotopic regime (nonstationary, stationary). They show the mass isotopomer fractions (m+0 to m+5) of a metabolite X over time, and the resulting flux information: fluxes, or flux ratios describing the contributions of SA and SB to P at time points t1 and t2.]

Figure 3.2 Taxonomy of 13C-isotope labeling techniques and related flux information.

3.1.2 The Variants of 13C-MFA

Every 13C-MFA study is undertaken to answer a concrete biological question, that is, to gain a deeper understanding of cellular properties. Typical application scenarios in metabolic engineering are the detection of production bottlenecks and of carbon and energy leaks [27–31], the elucidation of metabolic responses to genetic manipulations [32–35], and the diagnosis of extracellular cues [36]. To address such questions, a broad variety of concepts and tools has been developed over the past decades. This chapter is directed at practitioners who want to use the available tools in their research. For readers interested in more theoretical background, another up-to-date book chapter [37] focuses in more detail on modeling, simulation, experimental design, and statistical evaluation of ILEs.

This chapter concentrates on metabolic pseudo-steady-state approaches, in which intracellular metabolite concentrations and metabolic fluxes during a cell cultivation are considered constant, both in time and across the population. In particular, all MFA approaches study cell population averages, for which the cellular exterior and interior are considered homogeneous and well mixed. Typically, cellular steady states are achieved by establishing a constant growth rate over a sufficiently long time window. In industrial biotechnology, this can be accomplished in batch, fed-batch, and continuous cultures. While most 13C-MFA applications rely on the metabolic steady state (MSS) assumption, notable exceptions exist. Among these are fully dynamic approaches [38, 39], which rely on knowledge of reaction kinetics, and mechanism-free approaches for describing the labeling dynamics [40–42]. Such dynamic 13C-MFA techniques, however, are much more demanding and are, thus, not yet supported by off-the-shelf 13C-MFA software tools.

While the MSS characterizes the metabolism, the isotopic steady state (ISS) characterizes the enrichment of a stable isotopic tracer in the metabolites. When a 13C-labeled substrate is metabolized at a constant rate, the labeling of the intracellular metabolites eventually becomes constant over time (Figure 3.2). This state, which follows an initial isotopically transient phase, is called the ISS.

A principal distinction in 13C-MFA is whether flux values are desired as absolute or relative numbers. The quantification of absolute fluxes necessarily involves network-wide mass balancing and knowledge of the extracellular rates, whereas relative fluxes can be obtained from local balances, in the simplest case at metabolic branch points where two or more pathways converge [15, 43, 44]. An example is given in Section 3.2.8. Given the limited number of observable converging nodes in a metabolic network, the number of flux ratios that can be determined in central carbon metabolism from measurable key metabolites is restricted [45]. The derivation of the flux ratio formulas nonetheless still requires knowledge of the full 13C-MFA network, and several additional assumptions are made to compensate for missing information. Still, the flux ratio approach is computationally cheap and should be considered when extracellular rates cannot be determined [46, 47].
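The local-balance idea behind flux ratios can be illustrated with a minimal Python sketch (assuming NumPy). It rests on a hypothetical simplification: a pool P is formed only from two precursor pools SA and SB without carbon rearrangement, so that at isotopic steady state the mass isotopomer distribution (MID) of P is the flux-weighted mixture f·MID(SA) + (1 − f)·MID(SB), where f is the fraction of the P-forming flux that comes from SA. The fraction f is then estimated by least squares. The function name and the MID values are illustrative only; actual flux ratio formulas are derived from the full atom-transition network and carry further assumptions.

```python
import numpy as np

def branch_flux_ratio(mid_sa, mid_sb, mid_p):
    """Least-squares estimate of the fraction f of pool P derived from SA.

    Assumes mid_p = f * mid_sa + (1 - f) * mid_sb (no carbon rearrangement,
    isotopic steady state). All arguments are mass isotopomer distributions
    (fractions m+0, m+1, ... summing to 1) of equal length.
    """
    a = np.asarray(mid_sa, dtype=float)
    b = np.asarray(mid_sb, dtype=float)
    p = np.asarray(mid_p, dtype=float)
    d = a - b
    denom = d @ d
    if denom == 0.0:
        raise ValueError("SA and SB have identical labeling; the ratio is not identifiable")
    # Minimize ||p - b - f * d||^2 over f, then clip to the meaningful range [0, 1].
    f = (p - b) @ d / denom
    return float(np.clip(f, 0.0, 1.0))

# Hypothetical MIDs (m+0, m+1, m+2) of the two precursors and of the product.
mid_sa = [0.1, 0.2, 0.7]
mid_sb = [0.9, 0.1, 0.0]
mid_p = [0.58, 0.14, 0.28]   # constructed as 0.4 * mid_sa + 0.6 * mid_sb
print(branch_flux_ratio(mid_sa, mid_sb, mid_p))  # close to 0.4
```

The `ValueError` branch reflects a general point: a flux ratio at a converging node is only identifiable when the converging pathways deliver distinguishable labeling patterns.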

3.2 A Toy Example Illustrates the Basic Principles

In this and the following sections, we explain the basics of 13C-MFA with the very simple, yet illustrative, toy network shown in Figure 3.3. The purpose is to convey an intuition of where the information needed for intracellular flux quantification actually comes from and how it is exploited mathematically. At the same time, possible pitfalls and misconceptions are pinpointed, and the concept of experimental design for an optimal information harvest is introduced. Note that all concepts described here work for realistic metabolic networks as well.

3.2.1 Fluxomics: More Than Just a Branch of Metabolomics

In the running example in Figure 3.3, the metabolic system consists of three irreversible reactions (x, y, z) that connect three intracellular metabolites (A, B, C). The system is fed by an (unlimited) resource S through the uptake reaction u, and produces a product P, which may accumulate in the extracellular space (through reaction v). Besides, a side-product Q is formed and exported out of the cell by reaction w. As we delineate below, with such a network specification at hand, the three essential sources of information for flux quantification of the reactions x, y, z are:


3 Quantitative Metabolic Flux Analysis Based on Isotope Labeling

Figure 3.3 Running example to illustrate the principles of 13C-MFA (network: u: S → A; x: A → B; y: A → C; z: B → C; v: B → P; w: C → Q; S, P, and Q lie outside the cell boundary). Extracellular reactions (u, v, w) transport the substrate (S) and the (by-)products (P, Q) across the cell boundary; intracellular reactions (x, y, z) connect the metabolites A, B, C. Arrowheads indicate reaction directions. MFA rests on the fundamental assumption that the material exchange for every intracellular metabolite pool is balanced (influxes = outfluxes).

1) Extracellular fluxes connecting intracellular metabolism with its surroundings.²
2) Isotopic labeling states of the intracellular metabolite pools.
3) Intracellular metabolite pool sizes in the case of isotopically non-stationary analysis (Section 3.2.4).

In the following, the same capital letters denote metabolites and their pool sizes, and the same small letters denote reactions and their associated metabolic fluxes. Names and quantities are, however, distinguished by normal and italic font, respectively. We consider the example network under the most general conditions, i.e. the metabolically dynamic situation. This means that the pool sizes S, P, Q, A, B, C of the metabolites S, P, Q, A, B, C are time-dependent and, likewise, the metabolic fluxes u, v, w, x, y, z of the reactions u, v, w, x, y, z may change over time. The following mass balances then hold for the three intracellular metabolite pool sizes:

$$
\begin{aligned}
A:&\quad \frac{d}{dt}A(t) = u(t) - x(t) - y(t)\\
B:&\quad \frac{d}{dt}B(t) = x(t) - v(t) - z(t)\\
C:&\quad \frac{d}{dt}C(t) = y(t) + z(t) - w(t)
\end{aligned}
\tag{3.1}
$$

The question now is whether, at each single time instant, the intracellular fluxes x, y, z can be determined from these equations using measured quantities. As a starting point, we assume the best of all possible measurement scenarios:

² Extracellular fluxes are often referred to as exchange fluxes in metabolic modeling, in particular in FBA. The notion of exchange fluxes in 13C-MFA is, however, quite different: here, exchange fluxes quantify the net-neutral material exchange between the products and substrates of a reversible intracellular reaction.


1) All extracellular rates, such as the substrate uptake rate u and the output rates v, w (by-product formation, biomass production, growth, and maintenance), are available, e.g. derived from bioprocess mass balances.
2) All intracellular pool sizes A, B, C are continuously measured.
3) The temporal sampling resolution is sufficiently high and the measurement noise is negligible, so that the time derivatives d/dt A, d/dt B, d/dt C can be approximated from the time series with sufficient precision by interpolation.

Under these optimistic conditions, Eq. (3.1) constitutes a time-dependent linear equation system with three equations for the three unknown quantities x(t), y(t), z(t). Unfortunately, these stoichiometric balances turn out to be underdetermined because the system is rank-deficient. To see this, take any solution x(t), y(t), z(t). Then, for any constant c, the shifted values x(t) + c, y(t) − c, z(t) + c constitute another solution of Eq. (3.1), because the shift cancels in each of the three balances. Thus, the stoichiometric balances permit arbitrarily many internal solutions for fixed external fluxes. This rank deficiency is not limited to the toy network; it also occurs in most realistic metabolic networks. The example shows that even with perfect time-resolved knowledge of the extracellular fluxes and intracellular metabolite pool sizes, the available information is not sufficient to calculate the intracellular fluxes. Notably, restricting the analysis to the MSS, d/dt A = d/dt B = d/dt C = 0, does not improve the situation. In particular, this shows that fluxomics should not be considered merely a branch of metabolomics, because metabolite concentrations alone do not contain the information necessary for flux quantification.

Consequently, an additional source of information is needed for flux quantification. One way to settle the problem is to introduce mechanistic assumptions about the relations between the pool sizes and the fluxes, in the form of enzyme kinetic terms [39].
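The rank deficiency of Eq. (3.1) can also be checked numerically. The following sketch uses Python with NumPy, which is our choice for illustration only. It writes the intracellular part of the balances as a matrix with rows A, B, C and columns x, y, z, computes its rank, and verifies that the shift (+c, −c, +c) leaves all balances unchanged. The flux values are made up.

```python
import numpy as np

# Intracellular stoichiometry of Eq. (3.1): rows = metabolites A, B, C,
# columns = intracellular reactions x, y, z.
N_int = np.array([
    [-1.0, -1.0,  0.0],  # A: consumed by x and y
    [ 1.0,  0.0, -1.0],  # B: produced by x, consumed by z
    [ 0.0,  1.0,  1.0],  # C: produced by y and z
])
# Contribution of the extracellular fluxes (u, v, w) to each balance
N_ext = np.array([
    [ 1.0,  0.0,  0.0],  # A: fed by u
    [ 0.0, -1.0,  0.0],  # B: drained by v
    [ 0.0,  0.0, -1.0],  # C: drained by w
])


def balances(internal, external):
    """Right-hand side of Eq. (3.1), i.e. dA/dt, dB/dt, dC/dt."""
    return N_int @ internal + N_ext @ external


rank = np.linalg.matrix_rank(N_int)
print(rank)  # 2, although there are three unknown intracellular fluxes

# Null-space direction from the SVD (right singular vector of the zero singular value)
_, _, vt = np.linalg.svd(N_int)
null_dir = vt[-1] / vt[-1][0]          # normalized to approximately (1, -1, 1)

external = np.array([10.0, 4.0, 6.0])  # made-up u, v, w at metabolic steady state
sol_1 = np.array([4.0, 6.0, 0.0])      # one steady-state solution for x, y, z
sol_2 = sol_1 + 2.0 * null_dir         # shifted by c = 2: x + c, y - c, z + c
print(np.allclose(balances(sol_1, external), 0.0),
      np.allclose(balances(sol_2, external), 0.0))  # True True
```

The two flux vectors differ but satisfy the same balances. Pool sizes and their derivatives cannot distinguish them, so the missing direction has to be fixed by information beyond the balances, such as the enzyme kinetic terms mentioned above.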
Introducing such kinetic terms leads to the field of mechanistic or kinetic network modeling (cf. Chapter 5). In this framework, flux quantification is replaced by the tasks of identifying kinetic expressions and estimating their enzyme-kinetic parameters [48]. However, despite recent advances, determining the parameters of dynamic systems still poses many severe challenges, which cannot be discussed here in detail. Apart from technical problems, one of the main hurdles is that mechanistic models rest on many critical assumptions about in vivo metabolism [49].

3.2.2 Isotope Labeling: The Key to Metabolic Fluxes

The missing piece of the puzzle is provided by isotopic labeling data, which are measurable with mass spectrometers (MSs) and nuclear magnetic resonance (NMR) devices. Isotopic labeling data are generated by ILEs, in which the microorganisms under study are fed with stable isotopic tracers. Through metabolic activity, these tracers are then distributed over the whole metabolic network. Isotopes are atoms of an element that have the same number of protons and electrons but differ in the number of neutrons, resulting in distinguishable masses. In the early days of isotopic tracer studies, long-lived radioactive tracers (e.g. 14C, 3H) played a pioneering role because they provided strong, easily detectable signals [50, 51]. Nowadays, the vast majority of studies administer non-radioactive, stable 13C-labeled nutrients. Nonetheless, isotopic labeling is not restricted to carbon atoms: stable isotopes of other elements, such as 2H, 15N, or 18O, are also used for studying specific questions. The variant of MFA that uses isotopic tracers is nevertheless commonly subsumed under the term 13C-MFA, because the theory is readily transferable irrespective of the stable tracer used. Every isotope species has a natural abundance (NA). For example, 12C occurs at an abundance of ∼98.89% and 13C at ∼1.1%,³ whereas the 14C radioisotope occurs only in trace amounts (

Figure 3.4 Common isotopically labeled glucose tracers used for 13C-MFA: [1–13C1]-, [1,2–13C2]-, [U–13C6]-, [5,6–13C2]-, and [5–13C1]-glucose (GLC).

… ADP + F16P + H | PFK1 and PFK2 | 0
FBA | Fructose-bisphosphate aldolase | F16P <=> Ga3P + DHAP | FBA1 | 1
TPI | Triosephosphate isomerase | DHAP <=> Ga3P | TPI1 | 1

Metabolite abbreviations: ADP, adenosine diphosphate; ATP, adenosine triphosphate; DHAP, dihydroxyacetone phosphate; F16P, fructose 1,6-bisphosphate; F6P, fructose 6-phosphate; G6P, glucose 6-phosphate; Ga3P, glyceraldehyde 3-phosphate; Glc, D-glucose; H, proton.

reactions (Figure 4.2a and b). Thereby, each enzyme is represented as a pseudometabolite that can be produced and consumed by reactions in the model. Each enzyme is added as a substrate to the reactions it can catalyze, with a stoichiometric coefficient of 1/kcat, where kcat is the maximal turnover rate. Using 1/kcat reflects that faster enzymes, with a higher kcat value, require a lower usage to catalyze a reaction than enzymes with a lower kcat value. Note that this does not signify consumption of the enzyme in the reaction; it merely represents the usage of the enzyme to catalyze the reaction, as detailed in Example 4.1. Furthermore, GECKO adds new columns to the matrix to represent the enzyme usages (Figure 4.2a and b), through which each enzyme pseudometabolite utilized by a metabolic reaction is replenished. This makes it possible to directly impose upper bounds on the usage of each enzyme, corresponding to experimentally measured protein levels. When measured expression levels are missing for some or all proteins, an upper bound can instead be defined for the summed usage of all enzymes. The resulting model is referred to as an enzyme-constrained model, i.e. ecModel. Through this relatively simple formulation, GECKO integrates proteome constraints into a traditional GEM without requiring larger computational resources. In the original study where GECKO was introduced [8], this formalism was applied to the S. cerevisiae GEM Yeast7, and the resulting model was named ecYeast7. Due to the additional constraints, ecYeast7 can show a lower flux variability than Yeast7 (Figure 4.2c). The flux variability is essentially the uncertainty with which each flux through the metabolic network is determined.
As a measure of the number of possible flux solutions in a particular model, it defines the range of individual metabolic flux values that the model can use to attain the same optimal value of the objective function, typically maximizing growth under a certain

Figure 4.2 Coarse-grained integration of proteome constraints. (a) The GECKO approach adds enzymes to the metabolic reactions, with their kinetics, i.e. kcat values, included as the coefficients. This enables the estimation of the enzyme usages required to sustain metabolic fluxes. M represents a metabolite, E an enzyme, v a metabolic flux, and e an enzyme usage. (b) After enzymes are added to the metabolic reactions, the original stoichiometric matrix is extended with additional rows representing enzymes and columns representing enzyme usages. The GECKO approach makes it possible to constrain the enzyme usage not only for each individual enzyme but also for the total enzyme pool. (c) The integration of proteome constraints can reduce flux variability. (d) Proteome constraints enable the prediction of a metabolic shift, illustrated here by the so-called Crabtree effect occurring in the yeast Saccharomyces cerevisiae. When the glucose uptake rate exceeds a certain threshold, metabolism shifts from purely respiratory metabolism to a mix of fermentative and respiratory metabolism. This leads to ethanol production and a decrease in the oxygen uptake rate as the growth rate increases further. (e) The integration of proteome constraints allows growth rates on various carbon sources to be predicted. Without proteome constraints, a traditional GEM usually predicts unlimited growth unless the uptake rates of the carbon source (and possibly of other nutrients) are constrained by experimental values. (f) The integration of proteome constraints captures the impact of genetic modifications, illustrated by the predicted shift of the critical threshold at which the Crabtree effect occurs upon knockout of the gene NDI1.


4 Proteome Constraints in Genome-Scale Models

constraint of substrate uptake. The reduced solution space gives more reliable flux estimations, which can be helpful when comparing different conditions and when correlating fluxes with other types of omics data. Besides, ecYeast7 can correctly predict phenotypes that Yeast7 cannot, including the Crabtree effect (Figure 4.2d), growth on various carbon sources (Figure 4.2e), and the effect of genetic modifications (Figure 4.2f). All these predictions could be explained by the addition of proteome constraints. Importantly, ecModels are suitable for integrating proteomics data by limiting the usages of individual enzymes with absolute protein levels, thereby leading to a more constrained and reliable simulation.

Example 4.1 Construction of Enzyme-Constrained Models According to the GECKO Formalism

Upper glycolysis in S. cerevisiae is used as a toy model to show how to construct an ecModel from an existing stoichiometric matrix. The toy model includes five enzymatic reactions, which can be extracted from a traditional GEM as shown in Table 4.1. The stoichiometric matrix of the toy model is shown in Figure 4.3a. To obtain an ecModel, four steps need to be taken:

1) Split reversible reactions into forward and backward reactions. This step is crucial, as each enzyme-catalyzed reaction should utilize an enzyme as a substrate, with direction-specific maximal turnover rates. In the toy model, there are three reversible reactions: PGI, FBA, and TPI. Each of them is replaced by a forward and a backward reaction without changing the gene association, and the reaction identifiers are suffixed with _f and _b, respectively. At this stage, the stoichiometric matrix is expanded: the FBA column in the original matrix, for example, is converted into two columns, i.e. FBA_f and FBA_b (Figure 4.3a and b), in which the coefficients for the same metabolites have opposite signs.
2) Split the reactions that are catalyzed by isozymes.
In the toy model, only the HXK reaction needs to be split into multiple reactions at this stage, as it is the only reaction associated with alternative isozymes. The HXK column in the original matrix is replaced by three columns, as the reaction has three isozymes (Figure 4.3a and b). The new reactions are named HXK_1, HXK_2, and HXK_3 and have identical coefficients for the same metabolites, as they all catalyze the same reaction.
3) Add enzymes as substrates to their corresponding reactions. At this stage, kcat values should be collected for each enzymatic reaction, as the coefficient for the enzyme added to the reaction is defined as 1/kcat (Figure 4.3b and c), based on Eq. (4.1). Many kcat values can be retrieved from the BRENDA [12] and SABIO-RK [13] databases; where multiple values are available for a reaction, GECKO uses the maximum to prevent overconstraining the model. If no kcat value can be found for a particular reaction, GECKO performs a matching algorithm that attempts to assign kcat values that are as closely related as possible to the specific reaction [8], but that may originate from a different organism or substrate. At this stage, the stoichiometric matrix is expanded to include new rows for the enzymes (Figure 4.3b). For reactions catalyzed by complexes, e.g.

Figure 4.3 Expansion of the stoichiometric matrix by coarse-grained integration of proteome constraints. (a) The original stoichiometric matrix from a traditional GEM of the yeast S. cerevisiae. Reactions with isoenzymes (red) and reversible reactions (blue) will be expanded into individual reactions. (b) Integration of enzymes into metabolic reactions expands the stoichiometric matrix, which includes additional rows for enzymes and columns for enzyme usage reactions. Additional columns are defined for isoenzymes (red) and reversible reactions (blue). (c) List of species-specific kcat values, as could be obtained from, e.g. the BRENDA database. The enzymes are coupled to their corresponding reactions with 1/kcat as stoichiometric coefficients. As the flux units in the model are mmol/gCDW/h, the kcat values are first converted to /h. (d) Converting the concentration unit of the enzyme usage reactions from mmol/gCDW to g/gCDW requires the molecular weights of all enzymes, which are subsequently used as stoichiometric coefficients in the expanded matrix (see panel b).


the PFK reaction, the subunits should be added according to the stoichiometry of the complex.
4) Add a usage reaction for each enzyme. Since the coarse-grained model does not take protein synthesis into account, it assumes that all enzymes are drawn directly from a common protein pool. Note that the enzyme usage reaction represents the occupation of the common protein pool by the enzyme and is thus in a concentration unit of mmol/gCDW rather than a flux unit of mmol/gCDW/h. The unit of each enzyme usage reaction should furthermore be converted from mmol/gCDW to g/gCDW, as the common protein pool represents the total mass of the modeled proteins in grams. This conversion can be performed by introducing the molecular weight of each enzyme into the reactions as coefficients (Figure 4.3d); GECKO obtains the molecular weights from UniProt [14] or calculates them from amino acid sequences. In addition, an exchange reaction should be added for the common protein pool. Accordingly, 19 new reactions are added to the model, and the size of the stoichiometric matrix increases further (Figure 4.3b).

By following the steps above, an ecModel can be generated from a traditional GEM. The metabolic part of the stoichiometric matrix is otherwise unchanged: apart from the columns split in steps 1 and 2, the upper-left submatrix is equivalent to the original stoichiometric matrix (Figure 4.3a and b). Rather, the new submatrices make it possible to impose proteome constraints or integrate proteomics data. When no proteomics data are available, the total cellular protein pool can instead be limited in the model by constraining the upper bound of the protein pool exchange reaction. When absolute proteomics data are available, enzymes can be constrained individually by applying the measured protein levels as upper bounds of the enzyme usage reactions.
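The matrix expansion of steps 3 and 4 can be sketched in a few lines of NumPy. The helper `build_ec_matrix` below is hypothetical and is not part of the GECKO toolbox. It assumes that steps 1 and 2 have already been applied, so that reactions are irreversible and each reaction is catalyzed by at most one enzyme, and it ignores enzyme complexes. Enzymes become extra rows with coefficient −1/kcat, with kcat converted from /s to /h as in Figure 4.3c. Each enzyme gets a usage column that draws MW grams from a common protein pool (MW in kDa, i.e. g/mmol), and a final column is the protein pool exchange.

```python
import numpy as np


def build_ec_matrix(S, kcat_per_s, mw_kda, rxn_to_enzyme):
    """GECKO-style expansion of a stoichiometric matrix (hypothetical sketch).

    S: (m, n) matrix of irreversible, isozyme-split reactions.
    kcat_per_s: kcat of each of the p enzymes in 1/s.
    mw_kda: molecular weight of each enzyme in kDa (= g/mmol).
    rxn_to_enzyme: for each reaction j, the index of its enzyme or None.
    """
    S = np.asarray(S, dtype=float)
    m, n = S.shape
    p = len(kcat_per_s)
    if len(rxn_to_enzyme) != n:
        raise ValueError("need one enzyme entry (or None) per reaction")
    kcat_per_h = np.asarray(kcat_per_s, dtype=float) * 3600.0  # fluxes are per hour
    # Rows: metabolites, enzymes, protein pool; columns: reactions, usages, pool exchange
    S_ec = np.zeros((m + p + 1, n + p + 1))
    S_ec[:m, :n] = S                                  # metabolic block unchanged
    for j, k in enumerate(rxn_to_enzyme):
        if k is not None:
            S_ec[m + k, j] = -1.0 / kcat_per_h[k]     # enzyme "used" at 1/kcat per unit flux
    for k in range(p):
        S_ec[m + k, n + k] = 1.0                      # usage reaction supplies enzyme k (mmol/gCDW)
        S_ec[m + p, n + k] = -mw_kda[k]               # ... drawing MW_k grams from the pool
    S_ec[m + p, n + p] = 1.0                          # protein pool exchange (g/gCDW)
    return S_ec


# Toy pathway: r0: -> A (uptake), r1: A -> B (enzyme 0), r2: B -> (export)
S = np.array([
    [1.0, -1.0,  0.0],   # A
    [0.0,  1.0, -1.0],   # B
])
S_ec = build_ec_matrix(S, kcat_per_s=[10.0], mw_kda=[50.0], rxn_to_enzyme=[None, 0, None])

v = np.array([1.0, 1.0, 1.0])     # mmol/gCDW/h
usage = 1.0 / (10.0 * 3600.0)     # mmol/gCDW of enzyme 0 needed to carry v = 1
pool = 50.0 * usage               # g/gCDW drawn from the protein pool
full = np.concatenate([v, [usage, pool]])
print(S_ec.shape, np.allclose(S_ec @ full, 0.0))  # (4, 5) True
```

At steady state, the pool exchange flux equals the sum of MW_k · e_k. An upper bound P on it therefore caps the flux through r1 at P · kcat/MW, which is how the total-protein constraint described above enters the LP.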
Coarse-grained approaches such as GECKO thus provide a straightforward platform for integrating proteome constraints.

4.3.2 Fine-Tuned Integration of Proteome Constraints

In contrast to coarse-grained integration, fine-tuned approaches explicitly integrate biological processes, e.g. protein synthesis, into a GEM. A few approaches have been developed that account for proteome constraints in a fine-tuned manner, such as models of metabolism and gene expression [4, 5] and resource balance analysis (RBA) [15]. Models developed in such a fine-tuned manner are here referred to as proteome-constrained models (pcModels). As for ecModels, Eq. (4.1) is the basis for connecting metabolism with proteome constraints in pcModels. A difference from ecModels is that pcModels use “coupling constraints” [16] to relate the synthesis, dilution, and degradation of enzymes to the corresponding metabolic reactions (Figure 4.4a), whereas these processes are not explicitly modeled in ecModels. Coupling constraints can furthermore be applied to relate other cellular machineries to their catalytic functions, e.g. ribosomes to the synthesis of individual proteins. While pcModels require substantially more parameters, such as the catalytic rates of the various machineries, this also has the potential to increase their predictive scope.

4.3 Formulation of Proteome Constraints

Figure 4.4 Fine-tuned integration of proteome constraints. (a) In pcModels, the synthesis of enzymes and machineries is formulated using amino acids and energy produced by metabolic processes as substrates. The synthesis rate is denoted v_syn,E for enzymes and v_syn,R for ribosomes. Protein dilution due to cell growth and protein degradation are included with rates v_dil and v_deg, respectively. Enzymes catalyze metabolic reactions, while ribosomes synthesize proteins, including metabolic enzymes. The link between catalysts and their functions is shown as the dependence of rates, e.g. the metabolic rate v_met, on catalyst concentrations (e.g. the enzyme concentration [E]). The steady-state assumption relates the concentration of the enzyme to its synthesis rate, which requires the degradation constant k_deg,E and the dilution rate μ. Combining these equations, the relationship between the metabolic rate and the enzyme synthesis rate can be formulated. By detailing protein synthesis, proteome-constrained models can expand the range of predictions, as illustrated by the examples below. (b) Prediction of the amino acid composition of biomass. Given that the metabolic enzymes and gene expression machineries synthesized in proteome-constrained models account for the majority of biomass protein, the amino acid composition of all modeled proteins can be estimated and compared to the average amino acid composition of biomass. (c) Prediction of the RNA/protein ratio. In addition to protein mass, proteome-constrained models can predict the total RNA mass, as rRNA, tRNA, and mRNA all act as catalysts in the gene expression process. (d) Prediction of metal utilization. Metals, as well as other micronutrients such as vitamins, can act as essential cofactors that are associated with enzymes and required for their function. Given that protein abundance can be predicted by proteome-constrained models, metal utilization can also be predicted by specifying the metal requirements of each protein in the cell. Source: Modified from O’Brien et al. [5].

In addition to predicting metabolic shifts [5] and reducing the solution space [4], which can also be achieved with ecModels, pcModels are capable of predicting additional phenotypes, such as the amino acid composition of biomass (Figure 4.4b), the RNA/protein ratio (Figure 4.4c), and the effects of metal availability (Figure 4.4d).

Example 4.2 Construction of Proteome-Constrained Models Through Coupling Constraints

The same toy model of upper glycolysis as in Example 4.1 is used here to detail the construction of a pcModel. In addition to splitting reversible reactions and


Table 4.2 The synthesis reaction of the TPI enzyme.

Reaction identifier: syn_TPI1
Reaction name: Synthesis of TPI1
Reaction equation: 992 H2O + 496 ATP + 496 GTP + 5 A-tRNA(GCC) + 20 A-tRNA(GCU) + 8 R-tRNA(AGA) + 11 N-tRNA(AAC) + N-tRNA(AAU) + 7 D-tRNA(GAC) + 8 D-tRNA(GAU) + 2 C-tRNA(UGU) + 7 Q-tRNA(CAA) + 17 E-tRNA(GAA) + 22 G-tRNA(GGU) + 3 H-tRNA(CAC) + 9 I-tRNA(AUC) + 6 I-tRNA(AUU) + 4 L-tRNA(UUA) + 15 L-tRNA(UUG) + 2 K-tRNA(AAA) + 19 K-tRNA(AAG) + M1-tRNA(AUG1) + 8 F-tRNA(UUC) + 3 F-tRNA(UUU) + 6 P-tRNA(CCA) + P-tRNA(CCU) + 7 S-tRNA(UCC) + 7 S-tRNA(UCU) + 5 T-tRNA(ACC) + 7 T-tRNA(ACU) + 3 W-tRNA(UGG) + 6 Y-tRNA(UAC) + 13 V-tRNA(GUC) + 13 V-tRNA(GUU) + 2 S-tRNA(AGC) => 992 H + 992 pi + 496 ADP + 496 GDP + 2 tRNA(AAA) + 11 tRNA(AAC) + 19 tRNA(AAG) + tRNA(AAU) + 5 tRNA(ACC) + 7 tRNA(ACU) + 8 tRNA(AGA) + 9 tRNA(AUC) + tRNA(AUG1) + 6 tRNA(AUU) + 7 tRNA(CAA) + 3 tRNA(CAC) + 6 tRNA(CCA) + tRNA(CCU) + 17 tRNA(GAA) + 7 tRNA(GAC) + 8 tRNA(GAU) + 5 tRNA(GCC) + 20 tRNA(GCU) + 22 tRNA(GGU) + 13 tRNA(GUC) + 13 tRNA(GUU) + 6 tRNA(UAC) + 7 tRNA(UCC) + 7 tRNA(UCU) + 3 tRNA(UGG) + 2 tRNA(UGU) + 4 tRNA(UUA) + 8 tRNA(UUC) + 15 tRNA(UUG) + 3 tRNA(UUU) + 2 tRNA(AGC) + TPI1
Gene association: Ribosome
Reversibility: 0

Note: Given that tRNAs transfer amino acids, tRNAs charged with amino acids are used as substrates in the reaction. For example, A-tRNA(GCC) represents the tRNA charged with alanine that reads the codon GCC, and the uncharged tRNA(GCC) is subsequently returned. Metabolite abbreviations: ADP, adenosine diphosphate; ATP, adenosine triphosphate; GDP, guanosine diphosphate; GTP, guanosine triphosphate; H, proton; H2O, water; pi, phosphate.


isozymes, synthesis reactions should be added for all involved metabolic enzymes and machineries, together with their respective degradation reactions. The synthesis of a protein involves the entire gene expression process, including transcription, RNA modification, tRNA charging, translation, etc. In this example, only the translation reaction from charged tRNAs to the mature protein is formulated, focusing on the coupling constraint between the ribosome and translation. Other processes, and the coupling constraints for other machineries, are not shown in this example, as they follow a similar formalism. In addition, protein degradation is not included in this example. The stepwise construction of the pcModel is therefore as follows:

1) Split reversible reactions into forward and backward reactions. This is the same as for constructing an ecModel, as detailed in Example 4.1.
2) Split the reactions catalyzed by isozymes. This is the same as for constructing an ecModel, as detailed in Example 4.1.
3) Add synthesis reactions, i.e. translation reactions, for all enzymes. In this step, the amino acid sequences of all proteins are used to account for the material cost of producing individual proteins (Table 4.2). Note that amino acids are transferred by tRNAs; charged tRNAs are therefore consumed in the reaction, and uncharged tRNAs are produced. The synthesis and modification of tRNAs are not shown here, although these steps should be included in a fully functional pcModel. Another consideration is the consumption of ATP and GTP as the energy cost of translation. The energy cost of translation varies among organisms but is generally determined by the length of a protein. Here (Table 4.2), the syn_TPI1 reaction is formulated for the synthesis of TPI1, and the energy cost is roughly assumed to be 496 ATP and 496 GTP molecules based on its length, i.e. two ATP and two GTP for each of the 248 charged tRNAs consumed in the reaction.
4) Add the ribosome assembly reaction.
Given that translation is catalyzed by ribosomes, the translation rate depends on the amount of ribosomes and thus on their synthesis rate. If more ribosomes are present, individual proteins can be translated faster. Therefore, the synthesis of ribosomes should be added to the model, together with coupling constraints between ribosomes and translation. mRNA is also involved in translation as the template read by the ribosome, but it would follow the same type of coupling constraint and is therefore not shown here. We formulate ribosome assembly from the individual ribosomal proteins (Table 4.3); each of these subunits should itself be synthesized, which is not shown here. In addition, rRNA should be included in the ribosomes, which is also not shown here.
5) Add dilution reactions. The model now contains synthesis reactions for enzymes and machineries. As cells grow and divide, enzymes and machineries are diluted into the daughter cells on a time scale similar to that of cell division (in contrast to metabolic reactions, which have much higher turnover rates). Therefore, dilution reactions are added, formulated as the dissipation of enzymes and ribosomes to nothing.


Table 4.3 The ribosome assembly reaction.

Reaction identifier: syn_Ribo
Reaction name: Synthesis of ribosome
Reaction equation: S0 + S1 + S2 + S3 + S4 + S5 + S6 + S7 + S8 + S9 + S10 + S11 + S12 + S13 + S14 + S15 + S16 + S17 + S18 + S19 + S20 + S21 + S22 + S23 + S24 + S25 + S26 + S27 + S28 + S29 + S30 + S31 + Asc1 + L1 + L2 + L3 + L4 + L5 + L6 + L7 + L8 + L9 + L10 + L11 + L12 + L13 + L14 + L15 + L16 + L17 + L18 + L19 + L20 + L21 + L22 + L23 + L24 + L25 + L26 + L27 + L28 + L29 + L30 + L31 + L32 + L33 + L34 + L35 + L36 + L37 + L38 + L39 + L40 + L41 + L42 + L43 + P0 + P1 + P2 => Ribosome
Gene association: (none given)
Reversibility: 0

Note: The substrates in the reaction are ribosomal subunits.

6) Change the biomass reaction and ATP requirements. In traditional GEMs without proteome constraints, the material cost of protein synthesis is included in the biomass equation. Because dilution reactions for enzymes and machineries have been added, the material cost corresponding to those proteins should be removed from the original biomass reaction. In addition, the growth-associated ATP requirement should be reduced, as the protein synthesis reactions now account for their own ATP/GTP consumption, which was previously considered part of the growth-associated energy requirement. If protein degradation is explicitly included in the model, the non-growth-associated energy requirement should also be reduced. This step is not shown in the toy model.

Although the steps above expand the metabolic network with protein translation reactions, the protein synthesis process by itself does not affect the metabolic part of the model, as enzymes do not participate in the metabolic reactions. In pcModels, coupling constraints are used to relate metabolic reaction rates to enzyme synthesis rates, and enzyme synthesis rates to ribosome assembly rates. Here we introduce the principles of coupling constraints using the toy model as an example. In the toy model, two types of coupling constraints are needed:

1) Metabolic enzymes are coupled to their corresponding metabolic reactions. Taking the enzyme TPI1 as an example, at steady state its synthesis rate equals its dilution rate plus its degradation rate:

$$v_{\mathrm{syn},\mathrm{TPI1}} = v_{\mathrm{dil},\mathrm{TPI1}} + v_{\mathrm{deg},\mathrm{TPI1}} \tag{4.2}$$


The dilution rate in Eq. (4.2) can be described as:

$$v_{\mathrm{dil},\mathrm{TPI1}} = \mu \cdot [E]_{\mathrm{TPI1}} \tag{4.3}$$

where μ is the growth rate (/h) and [E]_TPI1 is the intracellular concentration of TPI1 (mmol/gCDW). As the toy model does not account for protein degradation, v_deg,TPI1 can be eliminated. Accordingly, the synthesis rate of TPI1 should be:

$$v_{\mathrm{syn},\mathrm{TPI1}} = \mu \cdot [E]_{\mathrm{TPI1}} \tag{4.4}$$

Given Eq. (4.1), if we use the maximal turnover rate kcat in the model, we can use the following inequality to relate the concentration of TPI1 to the rate of the TPI_f reaction:

$$v_{\mathrm{TPI1\_f}} \le k_{\mathrm{cat},\mathrm{TPI1\_f}} \cdot [E]_{\mathrm{TPI1\_f}} \tag{4.5}$$

and to the rate of the backward reaction TPI_b:

$$v_{\mathrm{TPI1\_b}} \le k_{\mathrm{cat},\mathrm{TPI1\_b}} \cdot [E]_{\mathrm{TPI1\_b}} \tag{4.6}$$

where [E]_TPI1_f and [E]_TPI1_b denote the amounts of the TPI enzyme occupied in catalyzing the forward and backward reactions, respectively; their sum is the total concentration of TPI1:

$$[E]_{\mathrm{TPI1\_f}} + [E]_{\mathrm{TPI1\_b}} = [E]_{\mathrm{TPI1}} \tag{4.7}$$

Note that Eqs. (4.5) and (4.6) are non-strict inequalities: e.g. the maximal rate v_TPI1_f is the product of the concentration of TPI and the maximal turnover rate of the forward reaction, while in reality the rate is often lower because suboptimal substrate and product concentrations affect its kinetics. Combining Eqs. (4.4)–(4.7) yields:

$$\frac{1}{k_{\mathrm{cat},\mathrm{TPI1\_f}}} \cdot v_{\mathrm{TPI1\_f}} + \frac{1}{k_{\mathrm{cat},\mathrm{TPI1\_b}}} \cdot v_{\mathrm{TPI1\_b}} \le \frac{1}{\mu} \cdot v_{\mathrm{syn},\mathrm{TPI1}} \tag{4.8}$$

Accordingly, Eq. (4.8) relates the synthesis rate of the TPI1 enzyme to the metabolic fluxes through the TPI-catalyzed reactions in the model. In other words, the sum of the forward and backward rates through the TPI reaction, each weighted by 1/kcat, is limited by how fast the TPI enzyme can be synthesized. Such a coupling constraint can be extended to all other enzymes.

2) Ribosomes are coupled to all protein synthesis reactions. Analogously to coupling enzymes to metabolic reactions, balancing ribosomes yields:

$$v_{\mathrm{syn},\mathrm{Ribo}} = \mu \cdot [E]_{\mathrm{Ribo}} \tag{4.9}$$

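Following Eq. (4.1), we can describe the synthesis rate of the enzyme TPI1 as follows, where [E]_Ribo,TPI1 is the amount of ribosomes occupied in translating TPI1 and the synthesis flux is multiplied by the peptide length so that it is expressed in amino acids, like k_ribo:

$$v_{\mathrm{syn},\mathrm{TPI1}} \cdot \mathrm{length}(\mathrm{peptide}_{\mathrm{TPI1}}) = k_{\mathrm{ribo}} \cdot [E]_{\mathrm{Ribo},\mathrm{TPI1}} \tag{4.10}$$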

where k_ribo represents the catalytic rate of the ribosome. It can either be determined experimentally or fitted computationally to experimental data, and has the unit amino acids per ribosome per hour. Combining Eq. (4.9) with Eq. (4.10), expanding to all proteins synthesized by ribosomes, and noting that the ribosomes occupied by the individual proteins cannot exceed the total ribosome amount [E]_Ribo, we obtain:

$$\sum_i \left( v_{\mathrm{syn},i} \cdot \mathrm{length}(\mathrm{peptide}_i) \right) \le \frac{k_{\mathrm{ribo}}}{\mu} \cdot v_{\mathrm{syn},\mathrm{Ribo}} \tag{4.11}$$

Equation (4.11) is the coupling constraint that relates the assembly of ribosomes to the synthesis reactions of proteins. Other coupling constraints, such as the one relating the production of RNA polymerase to transcription reactions, follow a similar formalism and are therefore not detailed here. The coupling constraints, together with the stoichiometric matrix of the model, result in a self-replicating system [17], in which protein synthesis processes constrain metabolism and vice versa. The metabolic part provides sufficient precursors and energy for protein expression as well as biomass formation. Meanwhile, the protein synthesis part produces sufficient proteins and machineries to sustain growth, as otherwise limited proteome or ribosome capacities would constrain growth. Note that the growth rate is an input to the coupling constraints (μ appears as a coefficient, e.g. in Eqs. (4.8) and (4.11)), and therefore growth maximization cannot be solved as a normal linear programming (LP) objective, as is common in the FBA approach for traditional GEMs. The maximal growth rate can be estimated by a binary search, which solves a series of LPs using a range of growth rates as inputs and searches for the maximal growth rate that yields a feasible solution. However, this is a time-consuming process. Moreover, an exact rational LP solver is required, as protein synthesis fluxes are generally much lower than metabolic fluxes [18]. Therefore, a practical solution is to fix the specific growth rate, calculate all fluxes by linear programming, and scan a range of specific growth rates to identify those for which a feasible solution exists.
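The binary search over growth rates described above can be sketched as follows. In a real pcModel, each feasibility test is an LP solved with μ fixed. Here, a hypothetical closed-form oracle, `toy_is_feasible`, stands in for that LP. It combines Eq. (4.1) for a single enzyme, the steady-state synthesis relation of Eq. (4.4), and the ribosome requirement behind Eq. (4.11) under one shared protein budget. All parameter values are made up and dimensionless. The search assumes that feasibility is monotone in μ (feasible up to some maximum, infeasible beyond it), which holds for this toy oracle.

```python
from typing import Callable


def toy_is_feasible(mu: float, demand: float = 1.0, kcat: float = 1.0,
                    k_ribo: float = 1.0, length: float = 1.0,
                    budget: float = 2.0) -> bool:
    """Hypothetical stand-in for an LP feasibility check at fixed growth rate mu."""
    v_met = demand * mu                 # metabolic flux needed to grow at rate mu
    enzyme = v_met / kcat               # Eq. (4.1): [E] >= v / kcat (tightest case)
    v_syn = mu * enzyme                 # Eq. (4.4): synthesis balances dilution
    ribosome = v_syn * length / k_ribo  # ribosomes needed to translate the enzyme
    return enzyme + ribosome <= budget  # shared (made-up) protein budget


def max_growth_rate(is_feasible: Callable[[float], bool], mu_low: float = 0.0,
                    mu_high: float = 2.0, tol: float = 1e-6) -> float:
    """Bisection for the largest feasible mu in [mu_low, mu_high]."""
    if not is_feasible(mu_low):
        raise ValueError("model is infeasible even at the lower growth rate")
    if is_feasible(mu_high):
        return mu_high  # the search interval is too narrow; widen mu_high
    while mu_high - mu_low > tol:
        mid = 0.5 * (mu_low + mu_high)
        if is_feasible(mid):
            mu_low = mid    # feasible: the optimum is at or above mid
        else:
            mu_high = mid   # infeasible: the optimum is below mid
    return mu_low


print(max_growth_rate(toy_is_feasible))  # 1.0
```

Each iteration halves the interval, so about log2((mu_high − mu_low)/tol) ≈ 21 feasibility checks are needed here. With a genome-scale LP behind every check, this illustrates why the procedure is time-consuming.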

4.4 Perspectives

Adding proteome constraints can evidently improve the predictive strength of GEMs, as is well illustrated by the fact that it enables prediction of the so-called Crabtree effect (Figure 4.2d), which traditional FBA modeling does not allow for. Whether one should use an ecModel or a pcModel will, as always in modeling, depend on the application. ecModels have the advantage of being relatively simple while still having good predictive strength. A further advantage of these models is that they enable easy integration of proteomics data, thereby further improving the constraints and resulting in very precise estimation of metabolic fluxes and cellular phenotypes. Among the challenges are the need for a high-quality stoichiometric matrix and the requirement of kcat values for all the enzymes in the model. This can be an issue for less characterized organisms, and there is a need to develop good methods for estimating reasonable kcat values in organisms for which no experimental data are available. pcModels have the advantage of enabling the prediction of many cellular features, e.g. the effect of cell size, the effect of mitochondrial volume, ribosomal requirements, and the importance of protein folding. However, their model structure is more complex, and they are more demanding to set up. Besides kcat values for all the enzymes, they also require information about the translation rate, protein degradation rate, protein folding rate, etc. Many of these parameters can be obtained for well-studied organisms like yeast and Escherichia coli, but for other organisms there is limited information about these parameter values. The computation time for these models is also significantly higher than that for ecModels, so these models should only be used when a more precise and detailed insight into cellular phenotype is needed, e.g. in basic physiological or systems biology studies. For cell factory design in metabolic engineering, ecModels are therefore, at least currently, preferred due to their simpler structure and easy integration with experimental data.

References

1 Covert, M.W., Famili, I., and Palsson, B.O. (2003). Identifying constraints that govern cell behavior: a key to converting conceptual to computational models in biology? Biotechnol. Bioeng. 84 (7): 763–772.
2 Beg, Q.K., Vazquez, A., Ernst, J. et al. (2007). Intracellular crowding defines the mode and sequence of substrate uptake by Escherichia coli and constrains its metabolic activity. Proc. Natl. Acad. Sci. U. S. A. 104 (31): 12663–12668.
3 Zhuang, K., Vemuri, G.N., and Mahadevan, R. (2011). Economics of membrane occupancy and respiro-fermentation. Mol. Syst. Biol. 7 (1): 500.
4 Lerman, J.A., Hyduke, D.R., Latif, H. et al. (2012). In silico method for modelling metabolism and gene product expression at genome scale. Nat. Commun. 3 (1): 929.
5 O'Brien, E.J., Lerman, J.A., Chang, R.L. et al. (2013). Genome-scale models of metabolism and gene expression extend and refine growth phenotype prediction. Mol. Syst. Biol. 9 (1): 693.
6 Nilsson, A. and Nielsen, J. (2016). Metabolic trade-offs in yeast are caused by F1F0-ATP synthase. Sci. Rep. 6 (1): 22264.
7 Mori, M., Hwa, T., Martin, O.C. et al. (2016). Constrained allocation flux balance analysis. PLoS Comput. Biol. 12 (6): e1004913.
8 Sánchez, B.J., Zhang, C., Nilsson, A. et al. (2017). Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol. Syst. Biol. 13 (8): 935.
9 Chen, Y. and Nielsen, J. (2019). Energy metabolism controls phenotypes by protein efficiency and allocation. Proc. Natl. Acad. Sci. U. S. A. 116 (35): 17592–17597.
10 Yang, L., Yurkovich, J.T., King, Z.A., and Palsson, B.O. (2018). Modeling the multi-scale mechanisms of macromolecular resource allocation. Curr. Opin. Microbiol. 45: 8–15.
11 Adadi, R., Volkmer, B., Milo, R. et al. (2012). Prediction of microbial growth rate versus biomass yield by a metabolic network with kinetic parameters. PLoS Comput. Biol. 8 (7): e1002575.
12 Jeske, L., Placzek, S., Schomburg, I. et al. (2019). BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Res. 47 (D1): D542–D549.
13 Wittig, U., Rey, M., Weidemann, A. et al. (2018). SABIO-RK: an updated resource for manually curated biochemical reaction kinetics. Nucleic Acids Res. 46 (D1): D656–D660.
14 The UniProt Consortium (2018). UniProt: the universal protein knowledgebase. Nucleic Acids Res. 46 (5): 2699–2699.
15 Bulović, A., Fischer, S., Dinh, M. et al. (2019). Automated generation of bacterial resource allocation models. Metab. Eng. 55: 12–22.
16 Thiele, I., Fleming, R.M.T., Bordbar, A. et al. (2010). Functional characterization of alternate optimal solutions of Escherichia coli's transcriptional and translational machinery. Biophys. J. 98 (10): 2072–2081.
17 Molenaar, D., van Berlo, R., de Ridder, D., and Teusink, B. (2009). Shifts in growth strategies reflect tradeoffs in cellular economics. Mol. Syst. Biol. 5 (1): 323.
18 Lloyd, C.J., Ebrahim, A., Yang, L. et al. (2018). COBRAme: a computational framework for genome-scale models of metabolism and gene expression. PLoS Comput. Biol. 14 (7): e1006302.


5 Kinetic Models of Metabolism

Hongzhong Lu (1), Yu Chen (1), Jens Nielsen (1, 2), and Eduard J. Kerkhoven (1)

(1) Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
(2) BioInnovation Institute, Copenhagen N, Denmark

5.1 Introduction

Metabolism is the sum of all biochemical reactions that take place in living cells. Chapters 2 and 4 show that metabolism can be converted into mathematical equations by connecting all reactions and metabolites based on reaction stoichiometry. The rate of each metabolic reaction can then be estimated using constraint-based approaches, e.g. flux balance analysis (Chapter 2). While such approaches rely mainly on the stoichiometry of all reactions, the rate of a single reaction can also be calculated from its own characteristics, including enzyme concentration, enzyme properties, and metabolite concentrations, together with the way these factors are combined, i.e. the rate expression; this is referred to as reaction kinetics. This motivates integrating reaction kinetics with stoichiometric models, resulting in kinetic models of metabolism. Kinetic models are more explicit than stoichiometric models and are therefore suitable for particular applications and analyses, especially metabolic control analysis (MCA), which is detailed in Chapter 6. Here, kinetic models of metabolism are introduced, followed by examples of the construction of kinetic models and their applications. Note that most of the principles related to metabolism are also relevant for other dynamic biological processes that can be described using kinetic models, including signaling pathways, pharmacokinetics, circadian rhythms, the cell cycle, and population dynamics.

5.2 Definition of Enzyme Kinetics

5.2.1 Michaelis–Menten Formula

The Michaelis–Menten formula is one of the earliest and best-known mechanistic models describing enzyme kinetics [1], and we regard it here as the basis for building the kinetic models introduced later in this chapter.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

[Figure 5.1: three panels (a–c) plotting reaction rate v (mM/min) against substrate concentration (mM); (a) marks Vmax, ½Vmax, and KM; (b) shows curves for increased kcat or enzyme level; (c) shows curves for increased KM.]

Figure 5.1 Relationship between parameters and reaction rates in Michaelis–Menten formulated enzyme kinetics. (a) KM is equal to the substrate concentration at which the reaction rate v reaches half of the maximum reaction rate (Vmax). (b) The maximum reaction rate is elevated when increasing the kcat or enzyme level. (c) The reaction rate is decreased when increasing the KM.

Note that not all enzymes follow Michaelis–Menten kinetics, but it is a common and versatile approximation for most reactions. At first glance, the dependency of the reaction rate on substrate concentration is accurately described by the Michaelis–Menten formula (Figure 5.1a). In general, this expression is suitable for one-substrate reactions without backward reactions or effectors:

$$E + c_S \underset{k_{-1}}{\overset{k_1}{\rightleftharpoons}} Ec_S \xrightarrow{k_{\mathrm{cat}}} E + c_P \tag{5.1}$$

where k1 and k−1 are the association and dissociation rate constants for the enzyme and the substrate, respectively, and kcat is the catalytic rate constant, or turnover number, of the enzyme. c denotes a metabolite concentration, where cS is the substrate and cP the product, while E and EcS are the free enzyme and the enzyme–substrate complex, respectively. Assuming that the enzyme concentration is much smaller than the substrate concentration, the product formation rate can be calculated using the Michaelis–Menten formula:

$$\nu = \frac{dc_P}{dt} = \frac{V_{\max} \cdot c_S}{c_S + K_M} \tag{5.2}$$

In the above expression,

$$K_M = \frac{k_{-1} + k_{\mathrm{cat}}}{k_1} \tag{5.3}$$

$$V_{\max} = k_{\mathrm{cat}} E_{\mathrm{total}} \tag{5.4}$$

$$\frac{dEc_S}{dt} + \frac{dE}{dt} = 0 \quad \text{or} \quad E_{\mathrm{total}} = E + Ec_S = \text{constant} \tag{5.5}$$

where Vmax is the maximum reaction rate and KM is the Michaelis or affinity constant for the substrate; these are two important kinetic parameters of enzymes. Etotal represents the total enzyme concentration, i.e. the sum of the concentrations of free enzyme (E) and enzyme–substrate complex (EcS).
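Eqs. (5.2)–(5.4) translate directly into code. The sketch below uses hypothetical rate constants chosen only for illustration; it computes Vmax and KM from the elementary constants and checks the property shown in Figure 5.1a, namely that the rate equals half of Vmax when cS = KM.

```python
def michaelis_menten_parameters(k1, k_minus1, kcat, e_total):
    """Return (Vmax, KM) from elementary constants via Eqs. (5.3) and (5.4)."""
    km = (k_minus1 + kcat) / k1   # Eq. (5.3)
    vmax = kcat * e_total         # Eq. (5.4)
    return vmax, km


def mm_rate(c_s, vmax, km):
    """Product formation rate, Eq. (5.2)."""
    return vmax * c_s / (c_s + km)


# Hypothetical constants: k1 in 1/(mM*min), k_-1 and kcat in 1/min, E_total in mM
vmax, km = michaelis_menten_parameters(k1=10.0, k_minus1=400.0, kcat=100.0, e_total=0.8)
print(vmax, km)                # 80.0 50.0 (mM/min and mM)
print(mm_rate(km, vmax, km))   # 40.0, i.e. Vmax/2 at cS = KM
```

Note that KM depends on kcat as well as on the binding constants, so KM equals the dissociation constant k−1/k1 only when kcat is much smaller than k−1.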

5.3 Factors Affecting Intracellular Enzyme Kinetics

With the Michaelis–Menten formulation, the influence of enzyme properties (KM, kcat), enzyme abundance (Etotal), and metabolite concentration (c) on the dynamic behavior of a reaction can be explained mechanistically. First, kcat reflects the maximum catalytic efficiency of an enzyme: as kcat increases, the maximum reaction rate increases accordingly (Figure 5.1b). The kcat of a specific enzyme can itself be influenced by inhibition, activation, synergistic effects, and allosteric regulation. For example, noncompetitive inhibition can lower the catalytic efficiency, because binding of the inhibitor prevents the enzyme from catalyzing the reaction, thereby reducing the amount of effective enzyme and the apparent kcat. The second key enzyme parameter, KM, characterizes the affinity between the enzyme and its reactant c. Its value differs between reactions and depends on enzyme structure, substrate structure, and the environmental pH, temperature, and ionic strength. Once the external conditions are fixed, KM is equal to the substrate concentration at which the reaction rate reaches half of Vmax (Figure 5.1a). If cS ≫ KM, the reaction rate approaches Vmax, while if cS ≪ KM, the reaction rate simplifies to:

$$\nu = \frac{V_{\max} \cdot c_S}{K_M} \tag{5.6}$$

In this case of low substrate concentration, the reaction rate is approximately linear in the substrate concentration (Figure 5.1a). Meanwhile, if the substrate concentration is fixed, an enzyme with a higher KM value results in a lower reaction rate (Figure 5.1c). Generally, the ratio kcat/KM can be used to compare enzymes from different sources that all catalyze the same reaction: a higher kcat/KM indicates a higher catalytic efficiency, as a higher reaction rate can be reached with less substrate present. In addition to these characteristic enzyme parameters, enzyme levels also constrain the rate of each reaction. Increases in enzyme level translate into increases in reaction rate (Figure 5.1b), especially if the substrate concentration is high (cS > KM). In vivo, the intracellular enzyme level reflects a balance between protein synthesis and protein degradation, which are themselves affected by processes such as product inhibition, transcriptional regulation, and feedback regulation. In metabolic engineering, enzyme levels and activities can therefore be augmented through multiple strategies, e.g. increasing gene copy numbers, improving enzyme stability, and relieving feedback inhibition by end products.

Although enzyme kinetics are useful to describe enzymes in isolation, they are not sufficient to describe the dynamic changes in metabolite concentrations and reaction rates in vivo. In reality, enzymes function in the context of a metabolic network consisting of many different reactions and enzymes, and the dynamics of these enzymes mutually influence each other. This gives rise to emergent properties of the metabolic network, which are a function of the enzyme kinetics but can only be observed when studying the whole system rather than its constituent parts in isolation. It is therefore necessary to combine enzyme kinetics into kinetic models.
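The role of kcat/KM can be illustrated numerically. The sketch below compares two hypothetical variants of the same enzyme at an equal enzyme level (all values are invented for illustration): the variant with the higher kcat/KM is faster at low substrate concentration, where Eq. (5.6) applies, whereas the variant with the higher kcat is faster once the enzymes approach saturation.

```python
def rate_from_kcat(c_s, kcat, km, e_total):
    """Michaelis-Menten rate, Eqs. (5.2) and (5.4): v = kcat*E_total*c_s/(KM + c_s)."""
    return kcat * e_total * c_s / (km + c_s)


def low_substrate_rate(c_s, kcat, km, e_total):
    """Linear approximation, Eq. (5.6); only valid for c_s << KM."""
    return kcat * e_total * c_s / km


# Hypothetical variants: (kcat in 1/s, KM in mM)
variants = {"A": (50.0, 0.5), "B": (200.0, 5.0)}
e_total = 1e-3  # mM, identical for both variants

for name, (kcat, km) in variants.items():
    print(name, "kcat/KM =", kcat / km)  # A: 100.0, B: 40.0 (1/(mM*s))

for c_s in (0.01, 100.0):  # far below and far above both KM values
    rates = {n: rate_from_kcat(c_s, k, m, e_total) for n, (k, m) in variants.items()}
    print(f"cS = {c_s} mM ->", {n: f"{v:.2e} mM/s" for n, v in rates.items()})
```

The ranking of the two variants thus flips with substrate concentration, which is why kcat/KM is most informative for enzymes operating well below saturation.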

5.4 Kinetic Model: Definition and Scope

5.4.1 What Is a Kinetic Model?

Kinetic models mechanistically represent the processes that take place within a cell and are made up of a series of ordinary differential equations (ODEs). The ODEs encompass the detailed rate expressions and kinetic parameters that describe the dynamic behavior of individual reactions within the model, as described in the previous section. The mathematical formalism of kinetic models, whether they follow Michaelis–Menten or other kinetics, can be summarized by Eq. (5.7). In a kinetic model, the enzyme kinetics and enzyme levels are the parameters of the model while metabolite concentrations are variables, neither of which appears in purely stoichiometric models. Thus, compared with stoichiometric models, a kinetic model can predict changes and dynamics of reaction rates (fluxes) and metabolite concentrations over time [2].

$$\frac{dc_i}{dt} = \left(\mathbf{S} \cdot \mathbf{v}(\mathbf{E}; \mathbf{c}; \mathbf{k})\right)_i, \quad i = 1, 2, 3, \ldots, n, \qquad c_i(0) = c_{i,0} \tag{5.7}$$

In Eq. (5.7), S represents the stoichiometric matrix, v represents the vector of metabolic reaction rates or fluxes, and ci,0 represents the initial concentration of metabolite i in the system. Each reaction rate (vj) is determined by the enzyme abundance (Ej), the metabolite concentrations (c), and the corresponding kinetic parameters (k).

5.4.2 Scope of Kinetic Models

Typically, kinetic models consist of tens to hundreds of metabolic reactions, with detailed kinetic information, from one or several sub-pathways. The scope and size of a kinetic model depend on the computational resources and the scientific questions to be answered. For a long time, kinetic models were used to describe the dynamics of sub-pathways consisting of up to 10–20 reactions, such as the model of glycolysis in beef heart supernatant by Garfinkel et al. in 1968 [3]. Since then, larger kinetic models have frequently been reconstructed, centered on multiple core metabolic pathways including glycolysis (EMP pathway), the pentose phosphate (PP) pathway, and the tricarboxylic acid (TCA) cycle; these generally contain 50–100 reactions with detailed ODEs. To further increase the coverage of cellular metabolism, near genome-scale kinetic models with over 200 reactions have been developed for intensively studied model organisms, namely E. coli [4] and S. cerevisiae [5]. However, to date no full genome-scale models (GEMs) with detailed enzyme kinetics have been built for E. coli or S. cerevisiae, let alone for less studied organisms. A kinetic model requires the definition of rate equations and their respective parameters for each reaction, and these are currently unknown for many of the reactions contained in GEMs. As an example, while about 700 Enzyme Commission (EC) numbers are associated with the S. cerevisiae GEM (Yeast8.3), only about 42 of these have KM values recorded in the relevant databases. An alternative approach to overcome this lack of kinetic parameters for large models is to use approximative rate equations, which are discussed below.

5.4.3 How to Build a Functional Kinetic Model?

The general procedure to establish a functional kinetic model is summarized in Figure 5.2. After describing the metabolic network for which a model will be constructed, the kinetic rate expressions for individual reactions are gathered and subsequently combined with their respective parameter values to build a complete model. In detail, the procedure to establish a kinetic model is divided into the following five steps:

Step 1. Define the metabolic network. In this step, one needs to decide on the scope of the metabolic network to be modeled, i.e. which sub-pathways to include. Based on this, a detailed metabolic network structure is built, encompassing the stoichiometry of metabolites, reactions, and their respective enzymes. Ideally, information on regulation and interactions among the components of the network should also be gathered. The metabolic network is generated based on the genome annotation of the organism of interest and by consulting previous studies (Figure 5.2).

Step 2. Define the kinetic rate expressions. Each reaction in the metabolic model is assigned a rate expression, of which the Michaelis–Menten equation is one example. To infer kinetic rate expressions, biochemical and mechanistic information should be gathered from biological databases and literature.

Step 3. Assign parameter values. The rate expressions from Step 2 require parameterization, and these enzyme-specific parameters are either measured experimentally or taken from literature and/or databases. Unknown parameters can be obtained by, e.g., taking reported values for the same reaction from a different organism, or through inference by simulating the model with arbitrary parameter values and comparing the simulation results with measured data.

Step 4. Define initial concentrations of metabolites and enzyme levels, based on measured values or reported data.

Step 5. Conduct simulations with the complete kinetic model. With the information from Steps 1 to 4, a kinetic model consisting of ODEs is defined (Eq. (5.7)). During simulations with this model, measured physiological data, metabolite concentrations, enzyme levels, and fluxes determined by 13C labeling can all be used to evaluate the predictive performance of the kinetic model.

[Figure 5.2: flowchart linking (1) describe structure of metabolic network, (2) define kinetic rate expressions, (3) assign parameter values, and (4) define initial concentrations to the kinetic model and (5) model simulation, with a quality control and analysis loop comprising parameter estimation, quality curation, and cross validation.]

Figure 5.2 Framework to build a functional kinetic model.

Once a kinetic model of high quality is obtained, it can be used for practical applications in metabolic engineering and biological discovery, as discussed later in this chapter. As the reaction rate expressions and their related kinetic parameters (Steps 2 and 3) are arguably the most important components of a kinetic model, they are discussed in more detail below.
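To make the five steps and Eq. (5.7) concrete, the sketch below builds and simulates a hypothetical two-reaction network (c1 + c2 → c3 → c4, the same structure as the toy model in Example 5.1) in plain Python. The rate laws, parameter values, and initial concentrations are illustrative assumptions rather than measured data; in particular, the two-substrate rate law is assumed to be a product of two Michaelis–Menten terms, which is only one of several possible forms. Integration uses a fixed-step fourth-order Runge–Kutta scheme for transparency; in practice a dedicated (often stiff) ODE solver would normally be used.

```python
# Step 1: network structure (hypothetical): R1: c1 + c2 -> c3, R2: c3 -> c4
# Rows are metabolites (c1..c4), columns are reactions (R1, R2).
S = [
    [-1, 0],
    [-1, 0],
    [1, -1],
    [0, 1],
]


# Step 2: rate expressions (assumed forms, see lead-in)
def rates(c, p):
    c1, c2, c3, _ = c
    v1 = p["vmax1"] * c1 / (p["km1_c1"] + c1) * c2 / (p["km1_c2"] + c2)
    v2 = p["vmax2"] * c3 / (p["km2_c3"] + c3)
    return [v1, v2]


# Eq. (5.7): dc/dt = S . v(c)
def dcdt(c, p):
    v = rates(c, p)
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def rk4(c0, p, t_end, dt):
    """Fixed-step RK4 integration; returns the state after every step."""
    c = list(c0)
    trajectory = [list(c)]
    for _ in range(int(round(t_end / dt))):
        k1 = dcdt(c, p)
        k2 = dcdt([x + 0.5 * dt * k for x, k in zip(c, k1)], p)
        k3 = dcdt([x + 0.5 * dt * k for x, k in zip(c, k2)], p)
        k4 = dcdt([x + dt * k for x, k in zip(c, k3)], p)
        c = [x + dt / 6.0 * (a + 2 * b + 2 * g + d)
             for x, a, b, g, d in zip(c, k1, k2, k3, k4)]
        trajectory.append(c)
    return trajectory


# Step 3: illustrative parameter values (concentrations in mM, rates in mM/min)
PARAMS = {"vmax1": 1.0, "km1_c1": 1.0, "km1_c2": 1.0, "vmax2": 0.5, "km2_c3": 2.0}

# Step 4: initial concentrations (mM)
c0 = [10.0, 10.0, 0.0, 0.0]

# Step 5: simulate 60 min with a 0.1 min step
dt = 0.1
traj = rk4(c0, PARAMS, t_end=60.0, dt=dt)
c3_series = [state[2] for state in traj]
t_peak = c3_series.index(max(c3_series)) * dt
print(f"c3 peaks at t = {t_peak:.1f} min")
print("final state (mM):", [round(x, 3) for x in traj[-1]])
```

A useful sanity check for any such model is mass conservation: with this stoichiometry, c1 + c3 + c4 and c2 + c3 + c4 must stay constant over the whole simulation.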

5.5 Main Mathematical Expressions in Description of Reaction Rates

Reaction kinetics can be described with mathematical expressions in which the reaction rates are functions of kinetic parameters and metabolite concentrations. These rate expressions vary in complexity, depending on the catalytic mechanism they describe, potential regulatory properties, and the number of required parameters, and include mechanistic, approximate, and stochastic formulas. Of these, stochastic formulas are typically the most complex and computationally intensive, and their use should be considered when stochasticity is expected to play an important role in the simulated system, such as noise in signal transduction or gene expression. However, such processes can often be disregarded when studying a large population of cells, such as a bioreactor cultivation of microorganisms, as stochastic behavior averages out over the sheer number of cells. Instead, mechanistic (Box 5.1) and approximate (Box 5.2) rate expressions are most frequently used for kinetic models of metabolism.

5.5.1 Mechanistic Rate Expressions

Mechanistic rate expressions are based on the law of mass action, which assumes that a reaction rate is proportional to the concentrations of its reactants. While mass action expressions are suitable for enzymatic or transporter reactions, the equations become very large if every step of the catalytic process is described, resulting in many parameters that are hard to measure (compared with Eq. (5.1), there would also be association and dissociation constants for the product and a reverse catalytic constant). Instead, mass action kinetics is typically reduced to the aforementioned Michaelis–Menten equation with its interpretable parameters, while the Hill equation is another example of a mechanistic rate expression (Box 5.1). These mechanistic expressions clearly show the roles of the various factors affecting reaction rates, as in the Michaelis–Menten equations above. More importantly, such mechanistic expressions can easily be extended based on new experimental evidence, to cover reactions with more reactants and products as well as complex regulatory mechanisms, e.g. activation and inhibition of enzyme activity by metabolites (Box 5.1).

Box 5.1 Mechanistic rate expressions: two typical examples.

Michaelis–Menten. As introduced earlier in this chapter, the Michaelis–Menten formulation is a mechanistic rate expression that can be used when the reaction kinetics follow the distinct hyperbolic saturation curve (Figure 5.1). Based on the simplified case shown in Eq. (5.2), the Michaelis–Menten formulation can be extended to describe more complex cases, such as substrate inhibition, where at high concentrations the substrate itself binds the enzyme in a nonproductive manner and lowers the rate:

$$\nu = \frac{V_{\max} \cdot c}{K_M + c\,(1 + c/K_i)} \tag{5.8}$$

where Ki is the inhibition constant.

Hill equation. Enzymes with sigmoidal saturation curves can have their kinetics described by a Hill equation. A typical example of such an enzyme is a homomultimer in which the affinity of the subunits for the substrate increases once one or more other subunits have bound substrate. This cooperativity is represented by the unitless Hill coefficient n, where a value greater than 1 indicates positive cooperativity:

$$\nu = \frac{V_{\max} \cdot c^n}{K_{0.5}^n + c^n} \tag{5.9}$$

where K0.5 is the half-maximal concentration constant. If n = 1, Eq. (5.9) reduces to the Michaelis–Menten formula and K0.5 is equal to the Michaelis constant (KM).
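Both expressions in Box 5.1 are simple to evaluate. The sketch below uses hypothetical parameter values; it shows that the rate in Eq. (5.8) peaks at c = √(KM·Ki), which follows from setting dν/dc = 0, and that a larger Hill coefficient in Eq. (5.9) makes the response around K0.5 more switch-like.

```python
import math


def substrate_inhibition_rate(c, vmax, km, ki):
    """Eq. (5.8): v = Vmax*c / (KM + c*(1 + c/Ki))."""
    return vmax * c / (km + c * (1.0 + c / ki))


def hill_rate(c, vmax, k05, n):
    """Eq. (5.9): v = Vmax*c**n / (K0.5**n + c**n)."""
    return vmax * c**n / (k05**n + c**n)


vmax, km, ki = 1.0, 1.0, 100.0   # hypothetical values
# Setting dv/dc = 0 in Eq. (5.8) gives the substrate concentration of maximal rate
c_opt = math.sqrt(km * ki)       # 10.0 for these values
for c in (1.0, c_opt, 100.0):
    print(f"c = {c:6.1f}: v = {substrate_inhibition_rate(c, vmax, km, ki):.3f}")

# Cooperativity: the same 4-fold rise in c around K0.5 = 1 gives a larger
# change in rate when the Hill coefficient n is larger
for n in (1, 2, 4):
    low, high = hill_rate(0.5, 1.0, 1.0, n), hill_rate(2.0, 1.0, 1.0, n)
    print(f"n = {n}: v(0.5) = {low:.3f}, v(2.0) = {high:.3f}")
```

With n = 1, `hill_rate` is identical to the Michaelis–Menten rate with KM = K0.5, matching the statement in Box 5.1.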

5.6 Approximative Rate Expressions

While mechanistic rate expressions are simplifications of mass action kinetics, they can still be very complex for reactions with multiple reactants and products and/or intertwined metabolic regulation, such as the influences of pH, temperature, and cofactor concentrations. Moreover, detailed mechanistic knowledge and measured parameter values are typically lacking for enzymes that do not catalyze reactions in central carbon metabolism. For nonmodel organisms, this situation is further exacerbated by the sheer lack of measured kinetic parameters. To circumvent these issues, approximative rate expressions are also adopted in kinetic model reconstruction. There are various approximative rate expressions, including generalized mass action, log-lin, and lin-log (Box 5.2). Compared with the mechanistic Michaelis–Menten formulation, approximative rate expressions are of lower complexity, thereby enhancing the computational efficiency of large-scale kinetic models. Simulations of central carbon metabolism in E. coli using a lin-log model are consistent with those of a mechanistic model [6], even though the lin-log model used a simple structure and few parameters. Owing to these advantages, lin-log expressions have been used to reconstruct a kinetic model of yeast metabolism with 240 reactions [5].

Box 5.2 Approximate expressions: two typical examples.

Power laws. Generalized mass action (GMA) is a so-called power-law formalism with noninteger exponents (Eq. (5.10)) [7, 8]. In GMA expressions, the reaction rate is proportional to the enzyme activity and to a product of power laws of the dependent and independent metabolite concentrations. Compared with mechanistic rate expressions, GMA expressions require fewer parameters.

$$v_i = k_i \cdot E_i \cdot \prod_{j=1}^{m} c_{\mathrm{in},j}^{a_{ij}} \cdot \prod_{k=1}^{n} c_{\mathrm{ex},k}^{b_{ik}} \tag{5.10}$$

where Ei represents the enzyme level; cin and cex are the dependent (intracellular) and independent (extracellular, or constant) metabolite concentrations, respectively; and ki, aij, and bik denote kinetic coefficients that can be obtained by fitting the equation to observed enzyme dynamics.

Lin-log. The lin-log modeling approach is closely related to the power-law formulation, but through transformation to the logarithmic domain a set of linear equations is obtained, which significantly simplifies parameter estimation. Thus, even though the relation between the reaction rate and the enzyme level, metabolite concentrations, and kinetic parameters is highly nonlinear, the linear-logarithmic (lin-log) approximation handles reaction rates in a simple, linear manner that allows an analytical solution. In lin-log expressions, the rate is proportional to the enzyme level and to a linear sum of logarithms of metabolite concentrations [9], which helps to reduce the number of kinetic parameters in the rate expressions.

$$\nu_i = E_i \left( a_i + p_{i,1} \ln c_{\mathrm{in},1} + p_{i,2} \ln c_{\mathrm{in},2} + \cdots + p_{i,m} \ln c_{\mathrm{in},m} + q_{i,1} \ln c_{\mathrm{ex},1} + q_{i,2} \ln c_{\mathrm{ex},2} + \cdots + q_{i,r} \ln c_{\mathrm{ex},r} \right) \tag{5.11}$$

where Ei represents the enzyme level; cin and cex are the dependent (intracellular) and independent (extracellular, or constant) metabolite concentrations, respectively; and ai, p, and q are kinetic coefficients that are obtained by fitting the rate expression to observed enzyme dynamics.
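The two approximative formats in Box 5.2 can be written as short functions. The sketch below uses hypothetical coefficients and concentrations. Published lin-log models usually express concentrations and rates relative to a reference steady state (e.g. ln(c/c_ref)); the unnormalized form of Eq. (5.11) is used here only to mirror the box. The last check illustrates why power laws are convenient for fitting: the logarithm of the GMA rate is linear in the logarithms of the concentrations.

```python
import math


def gma_rate(k, e, c_in, a, c_ex, b):
    """Generalized mass action, Eq. (5.10): v = k*E*prod(c_in**a)*prod(c_ex**b)."""
    v = k * e
    for conc, exponent in zip(c_in, a):
        v *= conc ** exponent
    for conc, exponent in zip(c_ex, b):
        v *= conc ** exponent
    return v


def linlog_rate(e, a0, c_in, p, c_ex, q):
    """Lin-log, Eq. (5.11): v = E*(a + sum(p*ln c_in) + sum(q*ln c_ex))."""
    return e * (a0
                + sum(pj * math.log(conc) for pj, conc in zip(p, c_in))
                + sum(qk * math.log(conc) for qk, conc in zip(q, c_ex)))


# Hypothetical case: two intracellular metabolites (a substrate with a positive
# coefficient, an inhibitor with a negative one) and one extracellular metabolite.
c_in, c_ex = [2.0, 0.5], [10.0]
v_gma = gma_rate(k=1.5, e=0.2, c_in=c_in, a=[0.8, -0.3], c_ex=c_ex, b=[0.1])
v_ll = linlog_rate(e=0.2, a0=1.0, c_in=c_in, p=[0.7, -0.2], c_ex=c_ex, q=[0.05])
print(f"GMA rate: {v_gma:.4f}, lin-log rate: {v_ll:.4f}")

# ln(v_GMA) = ln k + ln E + sum(a_ij * ln c_in,j) + sum(b_ik * ln c_ex,k)
log_v = (math.log(1.5) + math.log(0.2) + 0.8 * math.log(2.0)
         - 0.3 * math.log(0.5) + 0.1 * math.log(10.0))
print(abs(math.log(v_gma) - log_v) < 1e-12)
```

In both formats the rate is directly proportional to the enzyme level, so doubling Ei doubles the rate, just as Vmax = kcat·Etotal does in the Michaelis–Menten formulation.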

5.7 Approaches to Assign Parameters in the Rate Expressions

Estimation of the parameters in rate expressions is essential for good predictive performance of a kinetic model. However, determining parameters through experimentation, estimation, or fitting is one of the most challenging aspects of kinetic model reconstruction. There are three main procedures to obtain the kinetic parameters included in a rate expression, namely direct experimental measurement through enzyme assays; querying databases and literature for previously reported enzyme parameters; and parameter inference using statistical analysis.

5.7.1 Direct Measurements of Kinetic Parameters in Enzyme Assays

Enzyme parameters can be measured in in vitro enzyme assays, using either purified enzymes or whole-cell extracts. The reaction rate is typically measured colorimetrically, e.g. by coupling product formation to a chemical reaction that causes a color change. Varying the substrate concentration stepwise in the enzyme assays then yields the KM and Vmax parameters of the Michaelis–Menten expression. It should be noted that in vitro measured parameters can differ from those observed in vivo, even though a positive correlation between in vitro and in vivo kinetic parameters exists [10]. Such inconsistencies are likely due to in vitro assay conditions being distinct from the intracellular microenvironment in vivo, e.g. in metabolite concentrations, temperature, pH, osmotic strength, and the presence of potential inhibitors. Moreover, it is not uncommon that Vmax values are reported at assay conditions that give the highest activity but are far from in vivo-like conditions, and Van Eunen et al. showed that using such unrealistic parameters resulted in unrealistic metabolite concentrations in a model of yeast glycolysis [11]. It is therefore important to perform enzyme assays under conditions that are close to in vivo conditions.

5.7.2 Querying Databases

Instead of measuring all kinetic parameters required for a kinetic model, one can also leverage previous work and download kinetic data for specific enzymes from the BRENDA (https://www.brenda-enzymes.org) and SABIO-RK (http://sabio.villa-bosch.de/SABIORK) databases (see Figure 5.3 for an example), which now contain 68 963 and 648 732 enzyme entries, respectively, with detailed annotation [12, 13]. When querying and using these public data in kinetic models, one implicitly assumes that the measured kinetic parameters are conserved or similar across strains of different genetic backgrounds, as the databases collate data from many different studies rather than measurements from one single strain isolate. Moreover, the parameters compiled in these databases were measured under many different experimental conditions, many of which do not represent relevant in vivo-like conditions. This does not render the information from these databases unusable, but it does indicate that one should be critical about the provided parameter values. In addition, the available parameters are not evenly distributed across all enzyme-catalyzed reactions: many enzymes have been the subject of only a few studies, while for some enzymes many experimentally measured parameter values are available. To fully address all these issues, it becomes nearly indispensable to perform parameter estimation based on large-scale physiological data, as will be discussed later in this chapter.


Figure 5.3 An example entry from SABIO-RK [13] (http://sabio.villa-bosch.de/SABIORK) with detailed reaction and kinetic information for the enzyme glucose-6-phosphate isomerase. Source: HITS gGmbH.
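Being critical about database values can be partly systematized. The sketch below shows one possible selection heuristic, assumed here for illustration and not prescribed by BRENDA or SABIO-RK: prefer values measured for the target organism at near-physiological pH, fall back to values from other organisms (as suggested in Step 3 of Section 5.4.3), and summarize several values by their median. The records are invented; in practice they would be compiled from database exports, and no database API is used or implied.

```python
import statistics

# Invented example records for one enzyme; these are not real database entries.
RECORDS = [
    {"organism": "Saccharomyces cerevisiae", "km_mM": 0.8, "ph": 7.0},
    {"organism": "Saccharomyces cerevisiae", "km_mM": 1.4, "ph": 8.5},
    {"organism": "Escherichia coli", "km_mM": 0.3, "ph": 7.5},
]


def pick_km(records, organism, ph_range=(6.5, 7.5)):
    """Return (KM, provenance) using a simple, assumed preference order."""
    # Assumption: pH within ph_range is treated as "in vivo-like"
    near_physiological = [r for r in records if ph_range[0] <= r["ph"] <= ph_range[1]]
    same_org = [r["km_mM"] for r in near_physiological if r["organism"] == organism]
    if same_org:
        return statistics.median(same_org), "target organism"
    if near_physiological:
        return statistics.median(r["km_mM"] for r in near_physiological), "other organisms"
    return None, "no suitable record: estimate by parameter inference"


print(pick_km(RECORDS, "Saccharomyces cerevisiae"))
print(pick_km(RECORDS, "Pichia pastoris"))
```

Recording the provenance alongside each value makes it easy to flag parameters that were borrowed from other organisms as candidates for later refinement by parameter estimation.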

5.7.3 Inferring from Measured Fluxes

Besides direct measurement of kinetic parameters, in vivo kcat values can be estimated from accurate measurements of fluxes and protein abundances [10] (Figure 5.4). Fluxes through individual reactions can be determined by 13C metabolic flux analysis or, alternatively, by flux balance analysis (FBA). Enzyme levels can be determined by proteomics, where protein levels are quantitatively measured by mass spectrometry. From these data, the apparent kcat is calculated as the ratio between the flux and the corresponding enzyme level. By applying this approach across a series of different cultivation conditions, the maximum apparent kcat across conditions is taken as the in vivo kcat (Figure 5.4). Through this procedure, relatively reliable kcat values can be obtained under their relevant in vivo conditions without laborious enzyme assays. A drawback of this procedure is that it requires accurate, simultaneous determination of fluxes and proteomics data.

[Figure 5.4: schematic for phosphoglucose isomerase (PGI; G6P → F6P, between HXK and PFK): vPGI from 13C fluxes or FBA (e.g. condition 1: vPGI = 6 mmol/gbiomass·h) and [PGI] from mass spectrometry (e.g. 3343 proteins/cell) give kcat = vPGI/[PGI] for each condition; kmax,in vivo ≈ max_i kcat,i.]

Figure 5.4 Procedure to calculate in vivo kcat values. For each reaction, the kcat is calculated based on fluxes and enzyme abundances. By performing this procedure under a series of cultivation conditions, reaction-specific maximum kcat values are assigned.

5.7.4 Parameter Inference Using Statistical Analysis

While kinetic parameters can be obtained from databases or measured in vivo or in vitro, many parameters are likely to remain unknown when a kinetic model is constructed, especially a large one. Consequently, parameter inference through data fitting is increasingly used to obtain a fully functional kinetic model. In general, parameter estimation aims to find parameter sets that minimize the distance between model predictions and experimental data. Parameter estimation can be a complex process for large-scale kinetic models, because the number of rate expressions with unknown parameter values increases drastically. Effective parameter-fitting methods are therefore important to balance computational cost and accuracy in parameter inference. Various algorithms have been developed for parameter estimation, of which Maximum Likelihood Estimator (MLE)-based and Monte Carlo-based approaches are widely used. MLE-based approaches try to find the best estimate for each parameter and quantify the uncertainty in these estimates. Monte Carlo-based approaches instead sample from probability distributions for each parameter to extract values that yield a reasonable output according to the optimized objective function. Because of the nonlinear structure of kinetic models and the typically large number of unknown kinetic parameters, MLE-based approaches are poorly suited to fitting complex, large kinetic models. In contrast, Monte Carlo-based approaches estimate parameter distributions instead of single parameter values. This partially overcomes the drawbacks of MLE, and these approaches are therefore commonly used for parameter estimation in large kinetic models. Regardless of which algorithm is used for parameter inference, the estimated parameters should be comprehensively evaluated. For this, simulation results obtained from the constructed kinetic model are compared with relevant experimental measurements, and the model is refined until it yields acceptable results.
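The following sketch makes the Monte Carlo idea concrete. It is a minimal rejection-sampling illustration, not the procedure of any particular tool. It draws hypothetical Michaelis–Menten parameter sets (Vmax, Km) from assumed uniform prior ranges and scores each set by its sum of squared errors against synthetic rate data. Every set whose error falls below an assumed tolerance is kept. The retained ensemble, rather than a single best value, serves as the estimate, which mirrors the distinction drawn above between Monte Carlo- and MLE-based approaches. The data, prior ranges, and tolerance are all illustrative assumptions.

```python
import random


def mm_rate(s, vmax, km):
    """Michaelis–Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def sse(params, substrate, rates):
    """Sum of squared errors between predicted and 'measured' rates."""
    vmax, km = params
    return sum((mm_rate(s, vmax, km) - v) ** 2 for s, v in zip(substrate, rates))


def monte_carlo_fit(substrate, rates, n_samples=20000, tol=0.5, seed=1):
    """Rejection sampling: draw (Vmax, Km) from uniform priors, keep sets with SSE < tol."""
    rng = random.Random(seed)
    accepted = []
    best, best_err = None, float("inf")
    for _ in range(n_samples):
        # Assumed prior ranges; in practice these might come from databases or literature
        params = (rng.uniform(1.0, 20.0), rng.uniform(0.1, 10.0))
        err = sse(params, substrate, rates)
        if err < best_err:
            best, best_err = params, err
        if err < tol:
            accepted.append(params)
    return best, accepted


# Synthetic, noise-free "measurements" generated with Vmax = 10 and Km = 2 (illustrative only)
substrate = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
rates = [mm_rate(s, 10.0, 2.0) for s in substrate]

best, accepted = monte_carlo_fit(substrate, rates)
mean_vmax = sum(p[0] for p in accepted) / len(accepted)
mean_km = sum(p[1] for p in accepted) / len(accepted)
print(f"best set: Vmax={best[0]:.2f}, Km={best[1]:.2f}")
print(f"{len(accepted)} accepted sets, mean Vmax={mean_vmax:.2f}, mean Km={mean_km:.2f}")
```

In a real kinetic model, the same loop would wrap a full model simulation rather than a single rate law. That is where the computational cost discussed above comes from.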


5 Kinetic Models of Metabolism

Example 5.1 A toy kinetic model. A toy model with two reactions, two enzymes, and four metabolites is used here to illustrate how to build a simple kinetic model for a specific sub-pathway, and how to use this kinetic model to predict the dynamic behavior of the system (Figure 5.5). The five steps of kinetic model reconstruction are:
1) Describe the structure of the toy network, made up of two reactions that are catalyzed by enzymes E1 and E2, with corresponding reaction rates v1 and v2.
2) Define the kinetic rate expressions. Michaelis–Menten kinetics are defined for both reactions. The first reaction has two substrates and therefore a more extended rate expression. Based on the network structure and rate expressions, mass balance equations are defined for the four metabolites (c1, c2, c3, c4). These describe how their respective concentrations are affected by the reaction rates.
3) Assign parameter values for the two rate expressions from measurements or databases.
4) Define initial concentrations of metabolites and enzymes according to the experimental conditions or measurements.
5) Simulate the model. The toy kinetic model can now be used to predict the evolution of fluxes and metabolite concentrations over time. The output of such a time-course simulation is shown in Figure 5.5. The concentrations of the first set of metabolites (c1, c2) decrease gradually, while the concentration of the intermediate metabolite (c3) first rises and then falls after 20 minutes. The concentration of the end metabolite (c4) increases gradually.

Figure 5.5 A toy example to show how to build a kinetic model.

Example 5.2 A functional kinetic model of a core metabolic pathway in yeast. To illustrate kinetic model reconstruction for a real sub-pathway, a kinetic model covering glycolysis in yeast is taken as an example (Figure 5.6). To obtain a functional kinetic model that predicts the dynamic behavior of this pathway, the following steps are essential:
1) Define the metabolic network. From literature reports, we can define the metabolic pathway to be modeled, including detailed annotation of reactions, metabolites, and enzymes. It is not uncommon to lump very long pathways into simpler reactions, to reduce the complexity of the resulting kinetic model and decrease the number of unknown parameters. Here the reaction network of the Embden–Meyerhof–Parnas (EMP) pathway is shown as a simple example of a real metabolic model (Figure 5.6). It consists of the metabolites and enzymes of each reaction in the pathway.
2) Define the kinetic rate expressions. Shown are the Michaelis–Menten rate expressions for phosphoglucose isomerase (PGI) and phosphoglycerate mutase (PGM). Among other enzymes, these affect the concentrations of glucose 6-phosphate and 3-phosphoglycerate, respectively, as detailed in their mass balances.
3) Assign parameter values. For enzymes whose kinetics have been characterized, kinetic parameters can be collected from literature or databases, as shown here for PGI. The known parameters are fixed, and the unknown parameters are estimated by parameter inference.
4) Define initial concentrations based on measured metabolite concentrations (or assumptions thereof) and measured enzyme levels.
5) Simulate the model. Predicted fluxes, enzyme levels, and metabolite concentrations should be compared with experimental values to validate the model. Typically, multiple rounds of curation are required to obtain a model that is highly consistent with experimental measurements. Once the quality of the kinetic model is deemed sufficient, it can be used in further applications, such as metabolic control analysis (Chapter 6) or strain design in metabolic engineering.

Figure 5.6 Detailed steps to build a functional kinetic model which could predict the cellular physiology. Source: Adapted from Smallbone et al. [14].
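Returning to the toy network of Example 5.1, its five steps can be sketched in Python as below. This is a hedged illustration, not the model behind Figure 5.5. The following are all assumptions, chosen only to reproduce the qualitative behavior described (c1 and c2 fall, c3 rises then falls, c4 rises):

- the network structure c1 + c2 → c3 (enzyme E1) and c3 → c4 (enzyme E2);
- the form of the two-substrate rate law;
- every parameter and initial value.

The mass balances are written as dc/dt = S·v with an explicit stoichiometric matrix S. They are integrated with a fixed-step fourth-order Runge–Kutta scheme to keep the sketch dependency-free.

```python
# Step 1: network structure (assumed): R1: c1 + c2 -> c3 (E1); R2: c3 -> c4 (E2)
# Rows = metabolites c1..c4, columns = reactions R1, R2
S = [
    [-1, 0],
    [-1, 0],
    [1, -1],
    [0, 1],
]

# Step 3: hypothetical parameter values (arbitrary units)
params = {"kcat1": 2.0, "Km1a": 1.0, "Km1b": 1.0, "kcat2": 1.0, "Km2": 2.0}
# Step 4: hypothetical enzyme levels and initial metabolite concentrations
enzymes = {"E1": 1.0, "E2": 1.0}
c0 = [10.0, 10.0, 0.0, 0.0]


def rates(c):
    """Step 2: Michaelis–Menten rate expressions (two-substrate form for R1)."""
    c1, c2, c3, _ = c
    v1 = (params["kcat1"] * enzymes["E1"] * c1 * c2
          / ((params["Km1a"] + c1) * (params["Km1b"] + c2)))
    v2 = params["kcat2"] * enzymes["E2"] * c3 / (params["Km2"] + c3)
    return [v1, v2]


def dcdt(c):
    """Mass balances dc/dt = S · v."""
    v = rates(c)
    return [sum(S[i][j] * v[j] for j in range(len(v))) for i in range(len(c))]


def rk4_step(c, dt):
    """One classical fourth-order Runge–Kutta step."""
    k1 = dcdt(c)
    k2 = dcdt([x + 0.5 * dt * k for x, k in zip(c, k1)])
    k3 = dcdt([x + 0.5 * dt * k for x, k in zip(c, k2)])
    k4 = dcdt([x + dt * k for x, k in zip(c, k3)])
    return [x + dt * (a + 2 * b + 2 * g + d) / 6
            for x, a, b, g, d in zip(c, k1, k2, k3, k4)]


def simulate(t_end=60.0, dt=0.05):
    """Step 5: time-course simulation; returns (times, concentration trajectory)."""
    n = int(round(t_end / dt))
    times, traj = [0.0], [list(c0)]
    for i in range(1, n + 1):
        traj.append(rk4_step(traj[-1], dt))
        times.append(i * dt)
    return times, traj


times, traj = simulate()
for i in range(0, len(times), 200):  # report every 10 time units
    t, c = times[i], traj[i]
    print(f"t={t:5.1f}  c1={c[0]:.2f}  c2={c[1]:.2f}  c3={c[2]:.2f}  c4={c[3]:.2f}")
```

Keeping S separate from the rate laws means that extending the network only requires adding a row or column to S and a rate expression to `rates`.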

5.8 Applications

Kinetic models can be used for various applications, which fall broadly into three groups: (i) metabolic control analysis (MCA)-based methods; (ii) time-course simulations of dynamic processes; and (iii) integrative analysis of omics data. Through MCA-based approaches, kinetic models enable calculation of the control that each enzyme exerts on the flux through a pathway (Figure 5.7a). Enzymes that exert high control over a particular pathway are promising targets in biotechnology and systems medicine. Indeed, kinetic models have succeeded in predicting metabolic engineering targets that improve productivity in microbial cell factories [15–17]. For example, a kinetic model was used to identify limonene synthase as a key flux-controlling enzyme for limonene biosynthesis in the cyanobacterium Synechococcus elongatus. Increasing the level of this enzyme through genetic engineering then improved limonene production [17]. In systems medicine, kinetic models and MCA approaches have been used to identify putative drug targets in biochemical networks [18, 19].

Kinetic models are also uniquely suited to simulating time-dependent behavior that cannot be captured by steady-state models. They can thereby suggest optimization strategies for industrial bioprocesses (Figure 5.7b). Ideally, the extracellular conditions in which the cells are cultured should also be modeled. An example is a kinetic model of Chinese hamster ovary (CHO) cells used to simulate a fed-batch cultivation. This model captured time-dependent extracellular metabolite concentrations and the effect of various process variables on antibody production. The researchers simulated over 9000 combinations of process variables, such as:
- cell density at inoculation;
- the day on which the culture was shifted to a lower temperature (a strategy that helps balance cell growth and protein productivity);
- how many days after inoculation the temperature shift took place;
- knockdowns of metabolic enzymes.
On this basis, they were able to optimize antibody production by modifying some of the process parameters [20].

With the increasing ease of high-throughput data generation, kinetic models can provide a framework for the analysis of omics data, which are large-scale measurements of cellular components, e.g. protein, mRNA, and metabolite concentrations. Notably, some omics data correspond directly to the parameters and variables of kinetic models (Figure 5.7c). When a single type of omics data is analyzed, kinetic models can give deeper understanding than can be obtained from the data alone. For example, researchers found that kinetic parameters estimated in personalized kinetic models of erythrocyte metabolism represented an individual’s genotype better than metabolomics data did [21]. Furthermore, kinetic models enable the integration of multiple types of omics data (multiomics), thereby providing systematic insights into metabolism and regulation. For example, researchers used a kinetic modeling framework to perform an integrated analysis of proteomics, metabolomics, and fluxomics data, and found that substrate concentrations are the strongest drivers of metabolic fluxes [22].

[Figure 5.7 panels: (a) kinetic model, metabolic control analysis, potential target, strain improvement and drug discovery; (b) simulated and measured concentration over time for different levels of a process variable; (c) metabolomics, proteomics, and fluxomics data linked to the kinetic model $\frac{dc_i}{dt} = S\,v(E; c_i; k)$, yielding kinetic parameters and regulatory mechanisms.]

Figure 5.7 Applications of kinetic models of metabolism. (a) In metabolic control analysis, kinetic models are able to calculate the control of each enzyme on the flux through a network. Enzymes with high control are potential targets for strain improvement and drug discovery. (b) Kinetic models are able to simulate dynamic processes. The example illustrates measured and simulated time-course concentrations of a metabolite in response to different levels of a process variable. (c) Kinetic models are suitable for analyzing omics data, because rate expressions contain cellular components, i.e. metabolites (ci), enzymes (E), and metabolic rates (v). These correspond to metabolomics, proteomics, and fluxomics data, respectively. Integrated analysis of multiomics data can estimate enzyme kinetics and regulatory mechanisms. The equation corresponds to Eq. (5.7).
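As a hedged illustration of the MCA-based use of kinetic models (Figure 5.7a), the sketch below computes flux control coefficients numerically for a hypothetical two-enzyme pathway X0 → s → X1 with linear (mass-action) rate laws. The pathway, rate laws, and constants are assumptions for illustration. The code finds the steady state of the internal metabolite s by bisection. It then perturbs each enzyme level by a small relative amount and takes the ratio of the log-change in flux to the log-change in enzyme level. For this simple pathway the coefficients can also be derived by hand, so the numbers can be checked analytically.

```python
import math

# Hypothetical constants: v1 = E1*(k1*X0 - km1*s), v2 = E2*k2*s; X0 is an external (fixed) metabolite
K1, KM1, K2, X0 = 1.0, 0.5, 1.0, 10.0


def steady_state_s(e1, e2):
    """Solve ds/dt = v1 - v2 = 0 for s by bisection (ds/dt decreases monotonically in s)."""
    def net(s):
        return e1 * (K1 * X0 - KM1 * s) - e2 * K2 * s

    lo, hi = 0.0, K1 * X0 / KM1  # net(lo) > 0 and net(hi) < 0 bracket the root
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def flux(e1, e2):
    """Steady-state pathway flux J = v2 evaluated at the steady state of s."""
    return e2 * K2 * steady_state_s(e1, e2)


def flux_control_coefficients(e1, e2, h=1e-4):
    """Central-difference estimate of C_i = d ln J / d ln E_i for each enzyme."""
    dlnE = math.log(1 + h) - math.log(1 - h)
    c1 = (math.log(flux(e1 * (1 + h), e2)) - math.log(flux(e1 * (1 - h), e2))) / dlnE
    c2 = (math.log(flux(e1, e2 * (1 + h))) - math.log(flux(e1, e2 * (1 - h)))) / dlnE
    return c1, c2


C1, C2 = flux_control_coefficients(1.0, 1.0)
print(f"C_E1 = {C1:.4f}, C_E2 = {C2:.4f}, sum = {C1 + C2:.4f}")
# prints: C_E1 = 0.6667, C_E2 = 0.3333, sum = 1.0000
```

Here the analytic values are C_E1 = E2k2/(E1k−1 + E2k2) = 2/3 and C_E2 = 1/3. Control is shared between both enzymes rather than residing in a single step, and the coefficients sum to 1, as the flux-control summation theorem of MCA requires.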

5.9 Perspectives

In essence, kinetic models can be regarded as stoichiometric models augmented with rate expressions that account for kinetic information. This can make the approach advantageous over classical stoichiometric GEMs in particular respects. Kinetic models can quantitatively simulate metabolite concentrations, which stoichiometric GEMs cannot. They are also better suited to simulating dynamic responses, whereas GEMs rely on the steady-state assumption. However, the augmentation with kinetic information also has drawbacks. Simulations with kinetic models are relatively computationally expensive, because nonlinear problems need to be solved. In addition, the construction of kinetic models requires extensive experimental knowledge of rate expressions and kinetic parameters. These can be assumed, estimated, or simplified, but doing so introduces more uncertainty.


Considering both the pros and cons of kinetic models and stoichiometric GEMs, researchers have proposed hybrid modeling approaches that use simplified rate expressions, in which enzyme turnover rates are set as constraints on fluxes through stoichiometric networks [23]. Despite these advanced modeling approaches, challenges remain, including unknown parameters, which for such models are solely turnover rates. This has prompted calls for measurement of the kinetome [24], i.e. all enzyme turnover rates in a cell, information that is also required in so-called proteome-constrained models (Chapter 4). With continuing progress in the generation and use of kinetic models, two directions for future advances can be envisioned.

First, the quality of kinetic models can be greatly improved. Pathway and network stoichiometries can now largely be obtained readily from genome annotation and reaction databases. The quality of kinetic models, however, depends greatly on the available knowledge of rate expressions, kinetic parameters, and concentrations of cellular components such as metabolites. Although the kinetics of many enzymes have been characterized, the parameters were mostly obtained in vitro and are not necessarily representative of in vivo behavior [25]. Inference from measured fluxes and omics data is promising, but absolutely quantified omics data are still sparse, and it remains challenging to measure metabolites in different subcellular compartments of eukaryotes [26]. All of these points currently hamper improvement of kinetic models, but they also indicate where significant breakthroughs can be made. In addition to the parameter estimation and inference approaches mentioned above, machine learning algorithms are anticipated to determine parameter values effectively in the near future, as such approaches have already been applied in various biological studies [27].
The second direction for advancing kinetic models is their effective implementation at genome scale. A number of recent efforts have been made to build genome-scale kinetic models. For example, the kinetic model k-ecoli457 covers major parts of E. coli metabolism and has shown even better predictive power than constraint-based approaches in predicting the yields of many products [4]. There are, however, several obstacles on the way to genome-scale kinetic models. Besides the high computational cost of model simulations, the lack of large-scale data and missing rate expressions for individual reactions again need to be overcome for model construction and calibration. Several approaches are being developed to address these obstacles [28–30], and genome-scale kinetic models are anticipated to find wide application in the future.

References

1 Ainsworth, S. (1977). Michaelis–Menten kinetics. In: Steady-State Enzyme Kinetics, 43–73. London: Macmillan Education UK.
2 Stalidzans, E., Seiman, A., Peebo, K. et al. (2018). Model-based metabolism design: constraints for kinetic and stoichiometric models. Biochem. Soc. Trans. 46: 261–267.
3 Garfinkel, D., Frenkel, R.A., and Garfinkel, L. (1968). Simulation of the detailed regulation of glycolysis in a heart supernatant preparation. Comput. Biomed. Res. 2: 68–91.
4 Khodayari, A. and Maranas, C.D. (2016). A genome-scale Escherichia coli kinetic metabolic model k-ecoli457 satisfying flux data for multiple mutant strains. Nat. Commun. 7: 13806.
5 St. John, P.C., Strutz, J., Broadbelt, L.J. et al. (2019). Bayesian inference of metabolic kinetics from genome-scale multiomics data. PLoS Comput. Biol. 15: e1007424.
6 Visser, D., Schmid, J.W., Mauch, K. et al. (2004). Optimal re-design of primary metabolism in Escherichia coli using linlog kinetics. Metab. Eng. 6: 378–390.
7 Savageau, M.A. (1976). Biochemical Systems Analysis. Reading, MA: Addison-Wesley Pub. Co.
8 Tucker, W., Kutalik, Z., and Moulton, V. (2007). Estimating parameters for generalized mass action models using constraint propagation. Math. Biosci. 208: 607–620.
9 Heijnen, J.J. (2005). Approximative kinetic formats used in metabolic network modeling. Biotechnol. Bioeng. 91: 534–545.
10 Davidi, D., Noor, E., Liebermeister, W. et al. (2016). Global characterization of in vivo enzyme catalytic rates and their correspondence to in vitro kcat measurements. Proc. Natl. Acad. Sci. U.S.A. 113: 3401–3406.
11 van Eunen, K., Kiewiet, J.A.L., Westerhoff, H.V., and Bakker, B.M. (2012). Testing biochemistry revisited: how in vivo metabolism can be understood from in vitro enzyme kinetics. PLoS Comput. Biol. 8: e1002483.
12 Jeske, L., Placzek, S., Schomburg, I. et al. (2019). BRENDA in 2019: a European ELIXIR core data resource. Nucleic Acids Res. 47: D542–D549.
13 Wittig, U., Rey, M., Weidemann, A. et al. (2017). SABIO-RK: an updated resource for manually curated biochemical reaction kinetics. Nucleic Acids Res. 46: D656–D660.
14 Smallbone, K., Messiha, H.L., Carroll, K.M. et al. (2013). A model of yeast glycolysis based on a consistent kinetic characterisation of all its enzymes. FEBS Lett. 587: 2832–2841.
15 Cintolesi, A., Clomburg, J.M., Rigou, V. et al. (2012). Quantitative analysis of the fermentative metabolism of glycerol in Escherichia coli. Biotechnol. Bioeng. 109: 187–198.
16 Andreozzi, S., Chakrabarti, A., Soh, K.C. et al. (2016). Identification of metabolic engineering targets for the enhancement of 1,4-butanediol production in recombinant E. coli using large-scale kinetic models. Metab. Eng. 35: 148–159.
17 Wang, X., Liu, W., Xin, C. et al. (2016). Enhanced limonene production in cyanobacteria reveals photosynthesis limitations. Proc. Natl. Acad. Sci. U.S.A. 113: 14225–14230.
18 Murabito, E., Smallbone, K., Swinton, J. et al. (2011). A probabilistic approach to identify putative drug targets in biochemical networks. J. R. Soc. Interface 8: 880–895.
19 Haanstra, J.R., Gerding, A., Dolga, A.M. et al. (2017). Targeting pathogen metabolism without collateral damage to the host. Sci. Rep. 7: 40406.
20 Nolan, R.P. and Lee, K. (2012). Dynamic model for CHO cell engineering. J. Biotechnol. 158: 24–33.
21 Bordbar, A., McCloskey, D., Zielinski, D.C. et al. (2015). Personalized whole-cell kinetic models of metabolism for discovery in genomics and pharmacodynamics. Cell Syst. 1: 283–292.
22 Hackett, S.R., Zanotelli, V.R.T., Xu, W. et al. (2016). Systems-level analysis of mechanisms regulating yeast metabolic flux. Science 354: aaf2786. https://doi.org/10.1126/science.aaf2786.
23 Sanchez, B.J., Zhang, C., Nilsson, A. et al. (2017). Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol. Syst. Biol. 13: 935.
24 Nilsson, A., Nielsen, J., and Palsson, B.O. (2017). Metabolic models of protein allocation call for the kinetome. Cell Syst. 5: 538–541.
25 Teusink, B., Passarge, J., Reijenga, C.A. et al. (2000). Can yeast glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? Testing biochemistry. Eur. J. Biochem. 267: 5313–5329.
26 Verhagen, K.J., van Gulik, W.M., and Wahl, S.A. (2020). Dynamics in redox metabolism, from stoichiometry towards kinetics. Curr. Opin. Biotechnol. 64: 116–123.
27 Kim, G.B., Kim, W.J., Kim, H.U., and Lee, S.Y. (2020). Machine learning applications in systems metabolic engineering. Curr. Opin. Biotechnol. 64: 1–9.
28 Tummler, K. and Klipp, E. (2018). The discrepancy between data for and expectations on metabolic models: how to match experiments and computational efforts to arrive at quantitative predictions? Curr. Opin. Syst. Biol. 8: 1–6.
29 Saa, P.A. and Nielsen, L.K. (2017). Formulation, construction and analysis of kinetic models of metabolism: a review of modelling frameworks. Biotechnol. Adv. 35: 981–1003.
30 Srinivasan, S., Cluett, W.R., and Mahadevan, R. (2015). Constructing kinetic models of metabolism at genome-scales: a review. Biotechnol. J. 10: 1345–1359.


6 Metabolic Control Analysis

David A. Fell

Oxford Brookes University, Oxford, UK

Metabolic control analysis (MCA) arose as a mathematical framework for relating the system properties of metabolic networks, such as metabolic fluxes and metabolite concentrations, to the activities and kinetic properties of their components: the enzymes and transporters. It demonstrated that fluxes and concentrations in principle depend on all the components of the system, and so cannot be attributed to a single key enzyme. This has subsequently been borne out in many experimental studies. An important consequence is that system interactions constrain the extent to which a metabolic flux can be increased by changing any one enzyme. This has considerable implications for the design of metabolic engineering strategies that aim at large increases in the flux to a desired product. An initial limitation of MCA was that it offered exact insight into the control of metabolism only in the immediate neighborhood of a particular metabolic state. However, there are now extensions to the theory that can provide reasonable approximations of the response to large perturbations of metabolism. Some investigations into engineering specific changes in metabolism have used designs influenced by MCA. Many others have not (or at least not explicitly), but even in these cases MCA can be applied retrospectively. It can then provide deeper insight into why some interventions worked and others did not, or had undesirable side effects. Hence, rational design of a metabolic engineering strategy can be assisted by understanding the principles of MCA.

6.1 The Metabolic Engineering Context of Metabolic Control Analysis

A major goal of metabolic engineering is to change the fluxes through specific parts of metabolism, or the concentrations of specific metabolites, by a rational process. This process begins with a design of the alterations that need to be made, followed by implementation and testing, and then perhaps updating of the design in the light of the results obtained. What makes this a difficult problem is that the metabolism of a cell is an interconnected network of thousands of enzyme and transport reactions that produce and consume thousands of metabolites. (In the context of this chapter, “reactions” include enzyme-catalyzed reactions, the spontaneous chemical reactions that some metabolites undergo, and the actions of transporters embedded in cell membranes.) The targeted metabolism may carry a significant fraction of the carbon flux, so that the many reactions carrying minor fluxes to make rare cell components can largely be ignored. Even then, the relevant, higher-flux part of the network can still consist of tens to a few hundred reactions. The fluxes and metabolite concentrations in the cell are system properties that arise from the interactions between all the relevant enzymes and metabolites. They depend on the activities and kinetic properties of the enzymes and transporters, as well as on the environment of the cell, such as the availability of nutrients. Hence, the central problem in the rational design of a metabolic engineering strategy is finding out which modifications, made to which components, will produce the desired outcome from the metabolic network.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

In most branches of engineering, the standard design tool for a multicomponent system would be computer-aided simulation. This has had limited application in metabolic engineering, though some examples will be presented later in Section 6.3. Kinetic simulations of multienzyme pathways can model fluxes and metabolite concentrations (see Chapter 5). However, they require an appropriate rate function for each enzyme, along with values of all the relevant parameters (such as the limiting rate Vm, Km, and Ki) obtained under conditions matching the intracellular state as closely as possible. Generally, this information is not fully available for the majority of enzymes. What information there is has rarely been measured under appropriate conditions, or for very many cell types. Hence, most metabolic models of this type (e.g. those in the BioModels database: www.ebi.ac.uk/biomodels) cover relatively small sections of metabolism (up to a few tens of reactions). They are mostly used to assess the extent of our knowledge or to explore hypotheses about the operation of a pathway (e.g. later examples in this chapter [1–3]). A few larger models cover a larger fraction of central carbon metabolism, but these represent enzyme properties in less detail. For artificial pathways developed within synthetic biology, the BioBricks project (https://biobricks.org) aims to simplify computer-aided design by using well-characterized standardized components. Even so, the components do not always function as expected when introduced into a non-native host cell.

Larger models, up to genome scale, can be built using only the stoichiometries of the reactions, and are analyzed by the linear programming technique termed Flux Balance Analysis (see Chapter 2). These models can answer questions about the feasibility of producing a given product within a specific cell, and about its potential maximum yield. They can also indicate potential stress points in the network caused by increased production, and candidate reactions for deletion to help redirect fluxes in the network. However, because detailed kinetic information about the reactions is absent, neither the choice of the specific reactions to alter nor the consequent effects on fluxes and metabolite concentrations can be modeled.

Given the absence of detailed and accurate models to assist the metabolic engineering design process, it is necessary to fall back on a more generic, systems-level understanding of how fluxes and concentrations in metabolism are controlled and regulated. This can increase the chances that the designed intervention is appropriate and will move the metabolic response in the required direction. It will not necessarily be quantitatively accurate, however, especially if the change made is large.

In this context, it was unfortunate that, for much of the last century, metabolic biochemists relied on verbal reasoning to develop their concepts of control and regulation. This led to general adoption of the principle that the flux in a metabolic pathway is controlled by a single pacemaker enzyme: the rate-limiting step. As will be shown below, this is rarely the case, and the criteria used to identify candidates for the rate-limiting step were inaccurate. (This is discussed in more detail in my book [4] than can be given here.) The problem was compounded by a partial understanding of the effects of feedback inhibition in metabolic pathways, such that feedback-inhibited (or allosteric) enzymes were regarded as prime candidates for the rate-limiting step of their pathway.

From the 1960s onward, mathematical interpretations of metabolic control began to develop using the concept of sensitivity analysis. In this approach, the responsiveness of a system variable (in this case fluxes or concentrations) to a change in a parameter of a system component (such as the activity of an enzyme) is evaluated, symbolically or quantitatively. The evaluation takes into account the interactions and properties of all the system components. Two main strands of this approach emerged: Savageau’s Biochemical Systems Theory (BST) [5], and Metabolic Control Analysis (MCA), developed independently by Kacser and Burns [6] and by Heinrich and Rapoport [7]. Though formulated and parameterized somewhat differently, the two are fundamentally consistent. They allowed critical examination of the concepts of the rate-limiting step and feedback inhibition, revealing the inadequacy of the conventional views.

These theoretical developments led to the design of experimental approaches for measuring the responses of metabolic fluxes to changes in enzyme activities. The results supported the theory and showed that total control of flux by a single rate-limiting enzyme was very far from the norm (see [4] for examples), and that claimed candidates for the control of a pathway failed to qualify. Unfortunately, many biochemistry textbooks have not reflected these long-standing findings. They continue to refer to rate-limiting steps and to give as instances enzymes that have been shown not to be rate-limiting. This has certainly misled researchers into overexpressing such enzymes in the expectation of increasing a metabolic flux, only to be disappointed. On the other hand, metabolic engineers who have been aware of the theoretical and experimental findings have made progress toward more successful designs.

In this chapter, I will first present some of the principal elements of the theory of MCA relevant to understanding the responses of metabolic systems to perturbations of enzyme activities. However, because MCA is a sensitivity analysis conducted about a reference point (the unperturbed metabolic steady state), it becomes less accurate as the size of the perturbation increases. Hence, I will then consider which lessons from MCA are still relevant to large changes and the extent to which the responses to them can be anticipated. Finally, I will present examples of the application of MCA to the modification of pathways and processes of biotechnological interest.

6.2 MCA Theory

6.2.1 Metabolic Steady State

MCA describes the properties of a dynamic steady state; that is, it is assumed that fluxes and metabolite concentrations in the metabolic system have become time-invariant, apart from the inputs and outputs of matter that generate the flows. This corresponds to a common setup for metabolic experiments, where the input material is supplied in an amount that is either only very slowly depleted by the metabolizing cells or tissues, or else is continually replenished, as in the continuous culture of microbial cells. Similarly, the outputs of metabolism are either diluted into a large volume or washed out by the flow of medium in continuous culture. The inputs may also be termed the source or nutrients and the outputs the sink or waste products. They can also be referred to as the external metabolites, in contrast to the metabolites in the pathway, which are internal (Figure 6.1). Note, however, that these latter designations do not necessarily correspond to extracellular and intracellular locations; the external metabolites are those that are fixed by the initial operating conditions and do not respond significantly to changes in the metabolic system, whereas the internal metabolites are those that may be affected by a change in the metabolic components, or indeed the external metabolites. Thus, if a cell is utilizing a large intracellular store, such as a starch granule or a lipid droplet, then that is an external metabolite by these definitions. Typically, at the start of an experiment, the internal metabolites may not be time-invariant, but relax asymptotically to their steady-state values. Hence, another aspect of the distinction between internal and external metabolites is that it corresponds to a separation in their characteristic timescales; internal metabolites adapt rapidly, whereas external metabolites change slowly. 
Thus, the steady state is an abstraction, and many experimental conditions will generate a quasi-steady state. Internal metabolites adapt on timescales from a fraction of a second to minutes, whereas external metabolites change, if at all, on timescales of hours. (The same quasi-steady-state assumption is applied, though on shorter timescales, in the derivation of rate equations for enzyme mechanisms.)

[Figure 6.1: schematic network in which the external source X0 enters via transporter trans1 and the external sink X1 leaves via transporter trans2; internal metabolites include x0, s1, …, sn, and x1; enzymes xase and ydh carry fluxes Jxase and Jydh.]

Figure 6.1 A schematic metabolic network. Upper-case metabolite names correspond to external metabolites that enter and leave the cell via the two transporters. Lower-case names correspond to internal metabolites. Two specific enzymes and their fluxes, Jxase and Jydh, are labeled for future reference. As the network branches at s1, these two fluxes are not equal.
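A minimal sketch can illustrate how internal metabolites relax to a steady state while an external metabolite stays fixed. It assumes a hypothetical pathway X0 → s → X1 with linear rate laws, v1 = E1(k1X0 − k−1s) and v2 = E2k2s, in which the external metabolite X0 is clamped; the constants are illustrative. Under these assumptions, ds/dt = E1(k1X0 − k−1s) − E2k2s has the solution s(t) = s* + (s(0) − s*)e^(−λt), with λ = E1k−1 + E2k2 and s* = E1k1X0/λ. The code compares a simple Euler integration with that closed form.

```python
import math

# Hypothetical pathway X0 -> s -> X1; X0 is external and held constant
E1, E2, K1, KM1, K2, X0 = 1.0, 1.0, 1.0, 0.5, 1.0, 10.0
lam = E1 * KM1 + E2 * K2          # relaxation rate constant (1/time)
s_star = E1 * K1 * X0 / lam       # steady-state concentration of s


def simulate_s(s0, t_end, dt=1e-3):
    """Forward-Euler integration of ds/dt = v1 - v2 starting from s(0) = s0."""
    s = s0
    for _ in range(int(round(t_end / dt))):
        s += dt * (E1 * (K1 * X0 - KM1 * s) - E2 * K2 * s)
    return s


for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    numeric = simulate_s(0.0, t)
    analytic = s_star + (0.0 - s_star) * math.exp(-lam * t)
    print(f"t={t:5.1f}  s(numeric)={numeric:.4f}  s(analytic)={analytic:.4f}  s*={s_star:.4f}")
```

With these constants the time constant 1/λ is about 0.67 time units, so s lies within about 0.1% of s* by t = 5. If X0 were instead depleted over hours, s would effectively track it at each moment. This is the quasi-steady state described above.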


Why should we expect metabolic systems to reach a steady state? One answer is that the need for reproducible experimental results in the face of biological variability favors designs that lead to a steady state. A strong dependence on the time of measurement would add a further source of potential error. The same constraint probably applies to organisms themselves: it would be difficult for them to evolve a coordinated metabolism if given external conditions did not lead to a reproducible response.

6.2.2 Flux Control Coefficients

Under the paradigm of control of a metabolic pathway by a rate-limiting step, investigations were very enzyme-centered, looking for specific characteristics that such a step would possess. The change in view that came with MCA and BST was to regard a pathway as a system of interacting components. The aim became to relate the degree of influence exerted by any specific component both to its own characteristics and to those of the other components, without the preconception that one of them would be especially dominant. Furthermore, metabolic networks are composed of enzymes whose kinetics with respect to the metabolites are nonlinear. There were therefore reasons to expect that the relationship between metabolic flux and the activity of any particular enzyme would also be nonlinear, as earlier work had already suggested (e.g. [8, 9]). Hence, both originating groups of MCA [6, 7] chose to characterize the influence of an enzyme on metabolic network flux by a dimensionless sensitivity coefficient, now termed the flux control coefficient.¹

An operational definition, illustrated with respect to the network shown in Figure 6.1, is as follows. Suppose a small change, δExase, is made in the amount of enzyme Exase, and that this produces a small change in the flux through the step catalyzed by ydh. The flux control coefficient $C^{J_{ydh}}_{xase}$ is approximately the percentage change in Jydh produced by a 1% change in Exase. Note that the nomenclature explicitly allows the coefficient to describe the impact on a flux in the network other than that through the enzyme under consideration, since the value may differ for a different flux. A more technical definition, as a logarithmic sensitivity coefficient, is:

$$C^{J_{ydh}}_{xase} = \frac{\partial J_{ydh}}{\partial E_{xase}} \cdot \frac{E_{xase}}{J_{ydh}} = \frac{\partial \ln J_{ydh}}{\partial \ln E_{xase}} \qquad (6.1)$$

A graphical interpretation of the two forms of the definition, as the tangent to the flux v enzyme curve, is shown in Figure 6.2. One potential way to measure the flux control coefficient is to determine the flux at two enzyme levels, $E_{xase,1}$ and $E_{xase,2}$, close enough that the coefficient is nearly constant. Using the logarithmic form of Eq. (6.1) gives:

$$C^{J_{ydh}}_{xase} \approx \frac{\ln J_{ydh,2} - \ln J_{ydh,1}}{\ln E_{xase,2} - \ln E_{xase,1}} \qquad (6.2)$$
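As a rough numerical illustration of Eq. (6.2), the Python sketch below estimates a flux control coefficient from two flux values at nearby enzyme levels. Purely as a stand-in for real measurements, it assumes that flux follows the hyperbola fitted to the argininosuccinate lyase data of Figure 6.3a (c1 = 109.5, c2 = 7.26, enzyme in % of wild type); the function names are illustrative only. It also uses Eq. (6.2) rearranged as J2 ≈ J1·(E2/E1)^C to compare predictions for a 2% increase and a 10-fold decrease in enzyme against the assumed curve.

```python
import math

# Hypothetical "true" flux-enzyme curve: the hyperbola fitted to Figure 6.3a
# (c1 = 109.5, c2 = 7.26, enzyme in % of wild type), used only as test data.
C1, C2 = 109.5, 7.26


def flux(e):
    """Flux at enzyme level e on the assumed test curve."""
    return C1 * e / (C2 + e)


def fcc_two_point(e1, j1, e2, j2):
    """Two-point estimate of a flux control coefficient, Eq. (6.2)."""
    return (math.log(j2) - math.log(j1)) / (math.log(e2) - math.log(e1))


def predict_flux(j_ref, e_ref, e_new, c):
    """Eq. (6.2) rearranged: J_new ~ J_ref * (E_new / E_ref) ** C."""
    return j_ref * (e_new / e_ref) ** c


# Two measurements bracketing the wild-type level (100 %).
c_est = fcc_two_point(99.0, flux(99.0), 101.0, flux(101.0))
small_pred = predict_flux(flux(100.0), 100.0, 102.0, c_est)   # 2 % increase
large_pred = predict_flux(flux(100.0), 100.0, 10.0, c_est)    # 10-fold decrease

print(f"estimated C = {c_est:.4f}")
print(f"E = 102: predicted {small_pred:.3f}, curve {flux(102.0):.3f}")
print(f"E = 10:  predicted {large_pred:.1f}, curve {flux(10.0):.1f}")
```

On this assumed curve the local coefficient predicts the small change closely but overestimates the flux after the 10-fold decrease, because the coefficient itself changes along the curve.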

1 Kacser and Burns [6] originally called this the sensitivity coefficient, but the name was changed in a unification of terminology with that of Heinrich and Rapoport under the title of MCA [10].


6 Metabolic Control Analysis

[Figure 6.2: (a) Flux, J, plotted against Enzyme, E, with slope t at the reference point (e, j); (b) ln(Flux, J) plotted against ln(Enzyme, E), with slope t at (ln(e), ln(j))]

Figure 6.2 Definition of the flux control coefficient. (a) The coefficient is defined at a specific reference point (e, j) on the response curve of the flux J to the enzyme E: $C^J_E = e \cdot t / j$, where the slope $t = \partial J/\partial E$. (b) For the same curve plotted with log scaling, $C^J_E = t$, where the slope $t = \partial \ln J/\partial \ln E$.

Equation (6.2) can be rearranged to give the expected change in flux for a small change in enzyme activity if the flux control coefficient is known. Evidently, this relationship is not appropriate for the large changes in enzyme activity of interest for metabolic engineering, and this issue will be revisited later. As to how flux control coefficients may be determined, there is a range of direct and indirect methods that will not be described here, though many available approaches are set out in [4, 11].

6.2.3

Examples of the Flux–Enzyme Relationship

Before the paper of Kacser and Burns [6], the response of metabolic flux to variation in the amount of an enzyme had not been studied experimentally. Their initial examples showed convex relationships, some of which tended asymptotically to a maximum at normal cellular levels of the enzyme, as in the example of argininosuccinate lyase shown in Figure 6.3. In that particular case, the points are well described by a rectangular hyperbolic relationship between flux, J, and enzyme activity, E, of the form:

$$J = \frac{c_1 \cdot E}{c_2 + E} \qquad (6.3)$$

Many examples have since been investigated, especially as developments in molecular biology techniques led to a variety of different ways of modulating the expression of a target enzyme in cells, and hyperbolic relationships are relatively common. A second example of this is also shown in Figure 6.3, and others can be found in [4]. However, not all observations are fitted by a rectangular hyperbola, nor is there a theoretical expectation that they would be. Nevertheless, in the cases where the curve is rectangular hyperbolic, it has been shown [14], by differentiation of Eq. (6.3), that the relationship between the flux control coefficient and the enzymatic activity must also be a rectangular hyperbola with the parameter $c_2$:

6.2 MCA Theory

[Figure 6.3: (a) Flux, % wild-type, and flux control coefficient (dashed) plotted against argininosuccinate lyase, % wild-type; (b) Tryptophan uptake flux, nmol/mg/h, and flux control coefficient (dashed) plotted against tryptophan 2,3-dioxygenase activity]

Figure 6.3 Examples of hyperbolic flux v enzyme curves. (a) Dependence of arginine synthesis flux on activity of argininosuccinate lyase in Neurospora crassa. Data from [6, 12] fitted by a hyperbola (Eq. 6.3) with parameters $c_1$ = 109.5 and $c_2$ = 7.26. The corresponding flux control coefficient (dashed line) was calculated with $c_2$ according to Eq. (6.4). Sources: Data from Kacser and Burns [6] and Flint et al. [12]. (b) Dependence of tryptophan uptake by rat hepatocytes on the activity of tryptophan 2,3-dioxygenase. Data from [13] fitted by a hyperbola with $c_1$ = 11.3 and $c_2$ = 45.1. Again the flux control coefficient is plotted using the value of $c_2$. Source: Data from Salter et al. [13].

$$C^J_E = \frac{c_2}{c_2 + E} \qquad (6.4)$$
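The step from Eq. (6.3) to Eq. (6.4) can be checked numerically. This minimal Python sketch uses only the fitted parameters quoted for Figure 6.3a and compares Eq. (6.4) with a central-difference estimate of d ln J/d ln E, the definition in Eq. (6.1). The last printed column applies the flux summation theorem (Eq. 6.5, Section 6.2.4) to give the combined share of control held by the other enzymes of the pathway; the function names are illustrative.

```python
import math


def hyperbola_flux(e, c1, c2):
    """Rectangular hyperbola of Eq. (6.3)."""
    return c1 * e / (c2 + e)


def fcc_from_hyperbola(e, c2):
    """Flux control coefficient implied by Eq. (6.3), i.e. Eq. (6.4)."""
    return c2 / (c2 + e)


def fcc_numeric(e, c1, c2, h=1e-6):
    """Central-difference estimate of d ln J / d ln E (Eq. 6.1)."""
    up = math.log(hyperbola_flux(e * math.exp(h), c1, c2))
    down = math.log(hyperbola_flux(e * math.exp(-h), c1, c2))
    return (up - down) / (2 * h)


# Fitted parameters for argininosuccinate lyase (Figure 6.3a); E in % of wild type.
C1, C2 = 109.5, 7.26
for e in (100.0, 20.0, 5.0, 1.0):
    c = fcc_from_hyperbola(e, C2)
    print(f"E = {e:5.1f}%  C = {c:.3f}  numeric = {fcc_numeric(e, C1, C2):.3f}  "
          f"rest of pathway (1 - C) = {1 - c:.3f}")
```

Because $c_2$ is small relative to the wild-type activity, the coefficient only becomes large once the enzyme is reduced to a few percent of its normal level.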

The relationship of Eq. (6.4) is also exemplified in Figure 6.3. Further implications of these relationships for metabolic engineering will be considered later. From the 1980s onward, additional methods were developed to measure flux control coefficients that relied on relationships between them and other measurable properties of the metabolic network that could be deduced from the theory of MCA. There is no space here to describe them all, but the main approaches can be found in [4, 11]. The general feature of the results is that most of them gave values of the control coefficient closer to 0 than to 1.0 at enzyme activities typical of the normal, unperturbed level in the cell, including measurements made on enzymes that had previously been described as “rate-limiting.” Again, this is exemplified by argininosuccinate lyase in Figure 6.3a, where the enzyme activity has to be less than 5% of the wild-type level for the flux control coefficient value to start to approach 1.0.² The results undermined the rationale for the old binary classification of enzymes as either “rate-limiting” or “non–rate-limiting,” since more than one step in the same pathway could be found to have a nonzero control coefficient, and thus have some influence on the metabolic flux, even if the percentage response of the flux to a 1% change in activity was weak. Only if a 1% change in enzyme caused a proportional 1% response in the flux would an enzyme be strongly rate-limiting, with a flux control coefficient of 1.0. Such a result was hardly ever found in the normal physiological operating range of a metabolic network; one of the rare exceptions is the flux control coefficient of hexokinase IV on glycogen synthesis in rat hepatocytes, which was 0.8 at 5 mM glucose and 1.1 at 10 mM glucose [15].

2 It should be noted that the experiment was repeated [12] for other enzymes of the arginine synthesis pathway without finding any larger control coefficients.

6.2.4

Flux Summation Theorem

An important result derived by Kacser and Burns is the flux summation theorem. This states that if a metabolic flux J is potentially influenced by the n enzymes $E_1 \ldots E_n$ of a metabolic network, the sum of their flux control coefficients is 1:

$$\sum_{i=1}^{n} C^J_{E_i} = 1 \qquad (6.5)$$
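The summation theorem can be seen at work in the simplest case with a closed-form solution: a linear chain of reversible first-order (mass-action) steps with fixed pathway substrate and product. This kinetic scheme and all its rate constants are assumptions chosen for illustration, not taken from the text. For such a chain the steady-state flux is inversely proportional to a sum of step weights, each proportional to 1/e_i, so each flux control coefficient is that step's share of the sum; the sketch checks this against finite differences. Cutting the first enzyme to 5% shows control shifting toward the manipulated step while the coefficients of the unchanged enzymes fall, as in the argininosuccinate lyase example. (`math.prod` needs Python 3.8 or later.)

```python
import math


def chain_flux_and_fccs(e, k_fwd, k_rev, x0, xn):
    """Linear chain X0 -> S1 -> ... -> Xn of reversible first-order steps,
    v_i = e_i * (k_fwd[i] * S_(i-1) - k_rev[i] * S_i), with X0 and Xn fixed.
    Returns the steady-state flux J and the analytic flux control coefficients."""
    q = [kf / kr for kf, kr in zip(k_fwd, k_rev)]          # equilibrium constants
    # Weight of step i: product of q_j over later steps, divided by e_i * k_rev_i.
    w = [math.prod(q[i + 1:]) / (e[i] * k_rev[i]) for i in range(len(e))]
    j = (x0 * math.prod(q) - xn) / sum(w)
    # J is proportional to 1 / sum(w) and w_i to 1 / e_i, so d ln J / d ln e_i = w_i / sum(w).
    return j, [wi / sum(w) for wi in w]


def numeric_fccs(e, k_fwd, k_rev, x0, xn, h=1e-6):
    """Central-difference d ln J / d ln e_i for every step."""
    out = []
    for i in range(len(e)):
        up, down = list(e), list(e)
        up[i] *= math.exp(h)
        down[i] *= math.exp(-h)
        j_up, _ = chain_flux_and_fccs(up, k_fwd, k_rev, x0, xn)
        j_down, _ = chain_flux_and_fccs(down, k_fwd, k_rev, x0, xn)
        out.append((math.log(j_up) - math.log(j_down)) / (2 * h))
    return out


# Hypothetical rate constants and boundary concentrations (not from the text).
K_FWD, K_REV, X0, XN = [2.0, 1.0, 5.0], [1.0, 0.5, 0.1], 10.0, 1.0

for e in ([1.0, 1.0, 1.0], [0.05, 1.0, 1.0]):   # reference, then step 1 cut to 5 %
    j, c = chain_flux_and_fccs(e, K_FWD, K_REV, X0, XN)
    print(f"e = {e}: J = {j:.3f}, C = {[round(x, 3) for x in c]}, sum = {sum(c):.6f}")
```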

It follows from the summation theorem that the value of any enzyme’s flux control coefficient is not wholly determined by its own kinetic characteristics but is a systems property of the metabolic network. Consider the case of argininosuccinate lyase in Figure 6.3a. At 100% of wild-type activity, its flux control coefficient is below 0.1, whereas at the reduced activity levels on the left of the plot it has risen to almost 0.6. This means that at wild-type levels, the sum of the control coefficients of other enzymes affecting arginine synthesis is around 0.9, but their share drops to around 0.4 at low activities of argininosuccinate lyase. However, these enzymes were unchanged in these experiments, so the flux control coefficients of one or more of them have been altered by the change in activity of the manipulated enzyme. A second implication is that it is possible for the control of flux to be shared between enzymes of the pathway, not necessarily equally, and in proportions that change in response to modulations of the enzymes. If the control is distributed, there is apparently no scope for a true rate-limiting enzyme with a flux control coefficient of 1 because this would leave no residue of control for the other enzymes according to the theorem. Sharing of control implies that, in a network where a significant number of enzymes can influence the flux, the average flux control coefficient will be nearer to 0 than 1, and this is largely what has been found experimentally. This in turn raises the question of how cells (or metabolic engineers) can change metabolic fluxes when there is no single enzyme that has a substantial influence on the flux. The potential answer will be presented toward the end of this chapter (Section 6.3.4). There is one proviso to these conclusions from the flux summation theorem: it is possible in some network structures to get negative flux control coefficients.
One such case would be a branched pathway, where activating an enzyme in one branch might draw flux away from the other branch. That would allow the sum of the positive flux control coefficients to exceed one by the magnitude of the negative coefficients. There have been a few reports of small negative flux control coefficients: ADP-glucose pyrophosphorylase and phosphoglucomutase in the starch-synthesizing branch of Arabidopsis metabolism have weak negative control on the competing sucrose synthesis branch in low light conditions [16], and pyruvate kinase has a control coefficient of −0.17 on the gluconeogenic flux from lactate to glucose in rat hepatocytes [17]. Large negative control coefficients probably require very specific circumstances, such as where the branch containing the negative control has a much larger share of the flux than the branch whose flux is being affected, among other factors.³ One example of a flux control coefficient of −1.0 is that of the phosphorylation subsystem of rat liver mitochondria on the proton leak flux [19], and this is only attained at maximum respiration rate (which is probably not achieved in liver cells). Under these circumstances, the control coefficients of the respiratory chain subsystem and of the leak are both close to 1.0, so that the three coefficients still sum to 1. However, negative coefficients are not seen on the physiologically relevant respiration and phosphorylation fluxes. Furthermore, control within the respiratory and phosphorylation subsystems is known from other experiments to be distributed between the components [20, 21] without any of them qualifying as rate-limiting, so possible negative control coefficients are not often a major concern. The case of a flux control coefficient of −2.3 on a flux that is a tiny fraction of the flux in the branch with the large negative control is described later in Section 6.3.3.

6.2.5

Concentration Control Coefficients

Another type of sensitivity coefficient used in MCA is the concentration control coefficient for the effect of an enzyme xase on some internal metabolite $S_j$ of the metabolic network. The definition takes the same form as the flux control coefficient:

$$C^{S_j}_{xase} = \frac{\partial S_j}{\partial E_{xase}} \cdot \frac{E_{xase}}{S_j} = \frac{\partial \ln S_j}{\partial \ln E_{xase}} \qquad (6.6)$$

Again a simple interpretation is that it is the percentage change in metabolite concentration brought about by a 1% change in the enzyme activity. In spite of the equivalence of the definition, concentration control coefficients have very different characteristics. This is because they are constrained by a summation theorem [7, 22], but to a total value of zero:

$$\sum_{i=1}^{n} C^{S_j}_{E_i} = 0 \qquad (6.7)$$

As a consequence, the positive concentration control coefficients, denoting that an increase in activity of the enzyme raises the metabolite concentration, are exactly balanced by the negative ones. Typically the positive control coefficients would be on metabolites downstream of the affected enzyme, and the negative ones upstream. In fact, this pattern was used, as the crossover theorem, in the days before MCA to deduce the site of action of activators or inhibitors in a metabolic pathway. However, Heinrich and Rapoport [23] showed in their earliest MCA papers that the change from negative to positive concentration effects could only be relied on to occur at the affected enzyme in the simplest linear unbranched pathways, and could occur elsewhere in more complex structures, such as those with multiple feedback loops. A further aspect of the concentration summation theorem is that there is no intrinsic limit on the magnitudes of the coefficients, only that they should balance, so that values greater than 1.0 are both feasible and known.

3 The justification of this statement involves the branch point theorem on flux control coefficients [18].

Unlike flux control coefficients, there have been few attempts to measure concentration control coefficients directly. In the case of hexokinase IV in rat hepatocytes, its control coefficient on glucose 6-phosphate concentration was measured as 1.4–1.7 by variable degrees of viral-mediated overexpression [24]. The control coefficients of CTP synthase on CTP, dCTP, and UTP in Lactococcus lactis were determined by using synthetic constitutive promoters to vary the expression of the pyrG gene from 3 to 665% [25]. The values at wild-type levels of CTP synthase (with the range from the lowest to highest levels of expression) were: for CTP, 1.03 (1.35 to 0.0); for dCTP, 0.49 (1.0 to 0.0); and for UTP, −0.28 (−0.11 to −0.63). It should be noted that the flux control coefficient of CTP synthase on growth (which reports on the fluxes to CTP, dCTP, and UTP) was zero except at the very lowest values of expression. Other observations support these direct demonstrations that concentration control coefficients of appreciable magnitude are common: manipulation of enzyme activity has affected metabolite concentrations, even where the results did not provide sufficient information to calculate the coefficients accurately. For example, we observed large changes in glycolytic intermediates in potato tubers on 5- to 40-fold overexpression of phosphofructokinase, even though there was no measurable impact on glycolytic flux [26]. The changes were consistent with control coefficients of up to 0.6 for metabolites from fructose 1,6-bisphosphate to phosphoenolpyruvate (PEP). In Arabidopsis thaliana, overexpression of nine genes of the histidine synthesis pathway by at least 2.5-fold (as estimated from transcript levels) was tested for effect on histidine concentrations in shoot tissue [27].
Only the two isoforms of ATP-phosphoribosyltransferase (ATP–PRT1 and ATP–PRT2) had a significant effect, causing average 15- and 10-fold increases, respectively. As the growth of the transformed plants was somewhat impaired, it is unlikely there was much change in the flux to histidine. Though direct experimental measurements of concentration control coefficients are rare, they can readily be calculated from kinetic models of metabolism of various types, and indirectly from other types of experimental data. In fact, the third of the pioneering MCA papers from Rapoport and Heinrich applied the theory developed in the first two, in combination with experimental measurements, to compute the control coefficients of the glycolytic enzymes in erythrocytes [28]. The results showed a number of concentration control coefficients with magnitudes greater than 10. In a generalized MCA investigation of regulation in a two-step supply and demand pathway, Hofmeyr and Cornish-Bowden [29] showed that for much of the feasible steady state space accessible with usual enzyme properties, the magnitude of the concentration control coefficients would be larger than the flux control coefficients, with magnitudes greater than one easily possible. Similar conclusions have been reached in other such studies, so that in most cases it would be a reasonable expectation that modulation of the activity of a single enzyme will generate a large response in one or more metabolite concentrations, the opposite of the expectation for its effect on flux.


6.2.6

Linking Control Coefficients to Enzyme Properties

The goal of MCA was not simply to define a set of sensitivity coefficients for a metabolic network, but to relate them to the properties of the underlying components of the network, and in this sense it was one of the forerunners of systems biology. The underlying components are the enzymes and transporters of the metabolic system, and their properties that are relevant to this task are their kinetic responses to the metabolites. However, to make the linkage between the kinetic characteristics of the components and the behavior of the metabolic system, and to elucidate the principles underlying it, it is necessary to use a level of description less detailed than the rate functions of enzyme kineticists. This is provided by the elasticity coefficients, which describe the responsiveness of the rate of an enzyme reaction to variations in the concentrations of the metabolites that affect it.

6.2.6.1

Enzyme Rate Equations and Elasticity Coefficients

The archetypal enzyme rate equation is that of Michaelis and Menten, describing the dependence of the initial rate v of an enzyme reaction on the substrate concentration S via the two parameters of limiting rate, $V_m$, and Michaelis constant, $K_m$:

$$v = \frac{V_m \cdot S}{K_m + S}. \qquad (6.8)$$

Unfortunately, this equation is of very little use for our purposes; it applies to laboratory situations where the initial rate is measured at a given substrate concentration in the absence of the product and before any product has accumulated. In a metabolic network at steady state, the product, P, of this enzyme will be present, leading to some degree of binding of P at the active site in competition with S, and the possibility of the reverse conversion of P to S, thereby lowering the net rate of conversion of S to P, depending on the value of the equilibrium constant, $K_{eq}$, for the reaction. A minimal equation accounting for these effects is known as the reversible Michaelis–Menten equation:

$$v_{net} = \frac{(V_{m,f}/K_{m,S})(S - P/K_{eq})}{1 + S/K_{m,S} + P/K_{m,P}}, \qquad (6.9)$$

where $V_{m,f}$ is the limiting rate in the S to P direction (equivalent to $V_m$ in Eq. (6.8)), and $K_{m,S}$ and $K_{m,P}$ are the Michaelis constants for S and P, respectively. An example is plotted in Figure 6.4. Note that there are two independent negative effects of the product on the rate, one via reversibility expressed in the numerator and the other by competitive inhibition at the active site, represented in the denominator and applying even to near-irreversible enzymes. Although it is common to find statements in research articles about the degree of saturation of an enzyme in vivo, on the basis of metabolite measurements and an in vitro $K_m$ value, the diagram shows the shortcomings of this. At S = 5, P = 0, the saturation of the enzyme with S is over 80%, as is the rate as a fraction of $V_{m,f}$; in the far corner at S = 5, P = 10, the degree of saturation with S has been reduced to 45% by competition with P, and the net rate to 23%. Furthermore, around the hypothetical intracellular steady state marked by the arrow head, the responsiveness of the enzyme rate to changes in either S or P is not represented by the initial rate rectangular hyperbolas. For these and other reasons, we need a different approach to characterizing the responses of enzymes to metabolites in MCA.

[Figure 6.4: surface plot of the net rate v against S (0 to 5) and P (0 to 10)]

Figure 6.4 The reversible Michaelis–Menten equation. Equation (6.9) illustrated for $V_{m,f}$ = 100, $K_{m,S}$ = 1, $K_{m,P}$ = 2, and $K_{eq}$ = 4. The light green grid shows the zero velocity plane; values below this plane correspond to net conversion of P to S. A hypothetical intracellular steady state of S = 2.5 and P = 3.3 is shown as the arrow projected onto the surface. The initial rate Michaelis–Menten rectangular hyperbola for S can be seen on the S–v plane at P = 0; that for P is on the P–v plane at S = 0.

The measure of the sensitivity of an enzyme toward metabolites is provided by the elasticity coefficient, which is defined, in a similar way to the control coefficients, for an enzyme xase with respect to a metabolite S as:

$$\varepsilon^{xase}_S = \frac{\partial v_{xase}}{\partial S} \cdot \frac{S}{v_{xase}} = \frac{\partial \ln |v_{xase}|}{\partial \ln S}, \qquad (6.10)$$

where $v_{xase}$ is the rate of the enzyme, and the concentrations of all other metabolites that affect the enzyme (products and effectors) are held constant. As MCA is a steady-state analysis, the metabolite levels of interest are normally those at a metabolic steady state. The elasticity coefficient is not a system coefficient, however, because there is no consideration of the impact that a change in the enzyme rate in response to a change in S might have on the other metabolites in the network; it solely describes the intrinsic response of the enzyme in isolation. Also, as enzyme kinetic responses are nonlinear, its value is not constant and will vary when determined for different metabolic states. Evidently, an elasticity coefficient can be defined with respect to every chemical that has an effect on the enzyme activity in question. In the case that the enzyme is completely described by a rate equation, elasticities can be determined by algebraic or numerical differentiation. For example, the elasticity of the enzyme described by Eq. (6.9) with respect to S is:

$$\varepsilon^v_S = \frac{1}{1 - \rho} - \frac{S/K_{m,S}}{1 + S/K_{m,S} + P/K_{m,P}}, \qquad (6.11)$$

where ρ is the disequilibrium ratio, that is, $P/(S \cdot K_{eq})$. The first term on the right-hand side depends on the degree of displacement from equilibrium, going from 1 in the absence of product to infinity at equilibrium, and the second term represents the fractional saturation of the enzyme with S, going from 0 in the absence of S to a maximum value of 1, giving the overall result illustrated in Figure 6.5. Note that the definition of elasticity as a relative effect on the reaction rate means that the limiting rate, $V_m$, cancels out and does not appear in the expression. Some generic features of elasticities can be seen in this example. Firstly, when the enzyme is far from equilibrium (e.g. when P = 0), the elasticity declines from 1 at S = 0 toward 0, as the second term on the right-hand side of Eq. (6.11) is the saturation of the enzyme with S. Secondly, in the presence of sufficient P that the equilibrium favors formation of S as product, its elasticity is negative. Thirdly, as the P/S ratio approaches its equilibrium value (ρ near 1), the first term on the right-hand side of Eq. (6.11) dominates. Thus the value of the elasticity becomes more or less independent of the parameters of the rate equation other than $K_{eq}$. Also, there is a discontinuity in the plot at equilibrium, as the elasticity approaches ±∞. Numerically, single-substrate, single-product enzymes are a minority in metabolic networks and are mainly represented by the isomerases; the rate functions for more typical enzymes with multiple substrates and products are more complicated (see [4, 30] for examples) but can be used to derive elasticity functions with the same characteristics. The only exceptions are allosteric enzymes that exhibit cooperativity with respect to substrates, products and/or inhibitors. In this case, even far from equilibrium, it is possible for substrate and activator elasticities to exceed 1, and for product and inhibitor elasticities to be less than −1. Typically the elasticity is close to the local Hill coefficient at low saturation of the ligand, but falls below it at higher saturations and drops below 1 even where the Hill coefficient is still greater than 1.

Though the discussion above relates elasticities to enzyme rate functions, if we were dependent on having appropriate rate functions with parameter values valid for intracellular conditions, we would have the same restrictions that limit the building of kinetic models of metabolism. For MCA purposes we only need the local response of the enzyme to metabolite alterations near the current steady state, and this requires less information. There exist a number of experimental methods for determining elasticities that are described in [4]. An early instance was applied to gluconeogenesis from lactate in hepatocytes [17].

[Figure 6.5: elasticity $\varepsilon^v_S$ (−2 to 4) plotted against S (0 to 5)]

Figure 6.5 Elasticity of a reversible enzyme. Equation (6.11) is plotted for P = 0 (green), 2.5 (magenta), and 5 (blue) for the enzyme shown in Figure 6.4. The rate curve at P = 0 is the dashed line, reduced by a scale factor of 20.

6.2.6.2

Elasticities and Control Coefficients

The relationships between the control coefficients and elasticities have been termed the connectivity theorems. The first of these is the flux connectivity theorem [6, 22], which has the general form:

$$\sum_{i=1}^{n} C^J_i \varepsilon^i_S = 0, \qquad (6.12)$$

where S is an internal metabolite of a metabolic network of n enzymes and J is a specific flux in the network. If S does not affect the rate of some specific enzyme i, then $\varepsilon^i_S$ will be zero, as will therefore be the term $C^J_i \varepsilon^i_S$. Hence the summation of the nonzero terms in the equation links together the control coefficients of all the enzymes for which S is a substrate, product or effector. As a simple example of the implications of this connectivity relationship, consider the hypothetical pathway:

$$X_0 \xrightarrow{xase} Y \xrightarrow{ydh} X_1$$

There is only one connectivity equation:

$$C^J_{xase}\varepsilon^{xase}_Y + C^J_{ydh}\varepsilon^{ydh}_Y = 0. \qquad (6.13)$$

An immediate consequence of this is that the ratio of the two flux control coefficients is determined by the ratio of the elasticities:

$$\frac{C^J_{xase}}{C^J_{ydh}} = -\frac{\varepsilon^{ydh}_Y}{\varepsilon^{xase}_Y} \qquad (6.14)$$

Since $\varepsilon^{xase}_Y$ will normally be negative, as a product elasticity, while $\varepsilon^{ydh}_Y$ is positive, as a substrate elasticity, the ratio of the flux control coefficients is positive, and the larger of the two is the one with the smaller magnitude of its elasticity with respect to Y. However, we can go further by introducing the flux summation equation for the pathway:

$$C^J_{xase} + C^J_{ydh} = 1. \qquad (6.15)$$

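Together, Eqs. (6.13) and (6.15) are a pair of simultaneous equations with the solution:

$$C^J_{xase} = \frac{\varepsilon^{ydh}_Y}{\varepsilon^{ydh}_Y - \varepsilon^{xase}_Y}, \qquad C^J_{ydh} = \frac{-\varepsilon^{xase}_Y}{\varepsilon^{ydh}_Y - \varepsilon^{xase}_Y} \qquad (6.16)$$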
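To make Eq. (6.16) concrete, the Python sketch below builds one hypothetical version of the two-step pathway. xase follows the reversible Michaelis–Menten equation (6.9) with the Figure 6.4 parameters; ydh is an irreversible Michaelis–Menten enzyme. The fixed substrate level X0 = 5 and the ydh parameters (V2 = 60, KM2 = 3) are assumed values, not from the text. The steady-state Y is found by bisection, the elasticities are evaluated analytically (the product elasticity of xase comes from differentiating the logarithm of Eq. (6.9) with respect to ln P), and Eq. (6.16) is compared with flux control coefficients obtained by perturbing each enzyme and re-solving the steady state. The helper `eps_substrate` implements Eq. (6.11) so it can be cross-checked by numerical differentiation, and `rmm_rate(5, 0)` and `rmm_rate(5, 10)` reproduce the roughly 83% and 23% of $V_{m,f}$ quoted for Figure 6.4.

```python
import math

# xase: reversible Michaelis-Menten, Eq. (6.9), with the Figure 6.4 parameters.
VF, KMS, KMP, KEQ = 100.0, 1.0, 2.0, 4.0
# Hypothetical choices (not from the text): fixed substrate X0 and an
# irreversible Michaelis-Menten ydh with limiting rate V2 and constant KM2.
X0, V2, KM2 = 5.0, 60.0, 3.0


def rmm_rate(s, p, e=1.0):
    """Net rate of Eq. (6.9), scaled by the relative enzyme amount e."""
    return e * (VF / KMS) * (s - p / KEQ) / (1 + s / KMS + p / KMP)


def eps_substrate(s, p):
    """Substrate elasticity of Eq. (6.9), i.e. Eq. (6.11)."""
    rho = p / (s * KEQ)
    return 1 / (1 - rho) - (s / KMS) / (1 + s / KMS + p / KMP)


def eps_product(s, p):
    """Product elasticity: d ln|v| / d ln P for Eq. (6.9)."""
    rho = p / (s * KEQ)
    return -rho / (1 - rho) - (p / KMP) / (1 + s / KMS + p / KMP)


def ydh_rate(y, e=1.0):
    return e * V2 * y / (KM2 + y)


def steady_state_y(e_x=1.0, e_y=1.0):
    """Bisection on supply minus demand; xase is positive at Y = 0, zero at equilibrium."""
    lo, hi = 0.0, KEQ * X0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if rmm_rate(X0, mid, e_x) > ydh_rate(mid, e_y):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def flux(e_x=1.0, e_y=1.0):
    return ydh_rate(steady_state_y(e_x, e_y), e_y)


def numeric_fcc(which, h=1e-4):
    """Central-difference d ln J / d ln E, re-solving the steady state each time."""
    up, down = {which: math.exp(h)}, {which: math.exp(-h)}
    return (math.log(flux(**up)) - math.log(flux(**down))) / (2 * h)


y = steady_state_y()
eps_x = eps_product(X0, y)          # xase elasticity to its product Y (negative)
eps_y = KM2 / (KM2 + y)             # ydh elasticity to its substrate Y (positive)
c_x = eps_y / (eps_y - eps_x)       # Eq. (6.16)
c_y = -eps_x / (eps_y - eps_x)

print(f"Y = {y:.3f}, J = {flux():.2f}, eps_xase = {eps_x:.3f}, eps_ydh = {eps_y:.3f}")
print(f"Eq. 6.16:  C_xase = {c_x:.4f}, C_ydh = {c_y:.4f}")
print(f"numerical: C_xase = {numeric_fcc('e_x'):.4f}, C_ydh = {numeric_fcc('e_y'):.4f}")
```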

If, in Eq. (6.16), the product elasticity $\varepsilon^{xase}_Y$ were 0, then $C^J_{xase}$ would be 1 (a rate-limiting enzyme) and $C^J_{ydh}$ would be 0. This illustrates the danger of discounting product inhibition or slight reversibility; even weak inhibition and a small product elasticity result in very different system properties from $\varepsilon^{xase}_Y = 0$ [31]. There are also connectivity relationships between concentration control coefficients and elasticities [22, 32]:

$$\sum_{i=1}^{n} C^{S_j}_i \varepsilon^i_{S_k} = -\delta_{j,k}, \qquad (6.17)$$

where the Kronecker delta $\delta_{j,k}$ = 1 when j = k (i.e. when the control coefficient and the elasticity are with respect to the same metabolite $S_j$) and 0 otherwise. In the case of the two-step pathway above there is only one concentration connectivity relationship:

$$C^Y_{xase}\varepsilon^{xase}_Y + C^Y_{ydh}\varepsilon^{ydh}_Y = -1. \qquad (6.18)$$

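As before, there is also a concentration summation theorem:

$$C^Y_{xase} + C^Y_{ydh} = 0. \qquad (6.19)$$

These two equations have the solution:

$$C^Y_{xase} = \frac{1}{\varepsilon^{ydh}_Y - \varepsilon^{xase}_Y}, \qquad C^Y_{ydh} = \frac{-1}{\varepsilon^{ydh}_Y - \varepsilon^{xase}_Y} \qquad (6.20)$$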
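The concentration control coefficients of Eq. (6.20) can be checked in a similar way. This self-contained sketch uses a hypothetical version of the two-step pathway: xase follows Eq. (6.9) with the Figure 6.4 parameters and ydh is an irreversible Michaelis–Menten enzyme (X0 = 5, V2 = 60 and KM2 = 3 are assumed values). It compares Eq. (6.20) with finite-difference values of d ln Y/d ln E, and then evaluates Eq. (6.20) for an assumed case of zero product elasticity and a substrate elasticity of 0.05, where both coefficients reach a magnitude of 20.

```python
import math

# Hypothetical two-step pathway X0 -> Y -> X1: xase uses the Figure 6.4
# parameters; X0, V2 and KM2 are assumed values, not from the text.
X0, VF, KMS, KMP, KEQ = 5.0, 100.0, 1.0, 2.0, 4.0
V2, KM2 = 60.0, 3.0


def v_xase(y, e=1.0):
    """Reversible Michaelis-Menten rate, Eq. (6.9), with S = X0 and P = Y."""
    return e * (VF / KMS) * (X0 - y / KEQ) / (1 + X0 / KMS + y / KMP)


def v_ydh(y, e=1.0):
    """Irreversible Michaelis-Menten consumption of Y."""
    return e * V2 * y / (KM2 + y)


def steady_state_y(e_x=1.0, e_y=1.0):
    """Bisection for the Y at which production by xase equals consumption by ydh."""
    lo, hi = 0.0, KEQ * X0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if v_xase(mid, e_x) > v_ydh(mid, e_y):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def numeric_ccc(which, h=1e-4):
    """Central-difference d ln Y / d ln E."""
    kw_up, kw_down = {which: math.exp(h)}, {which: math.exp(-h)}
    return (math.log(steady_state_y(**kw_up)) - math.log(steady_state_y(**kw_down))) / (2 * h)


def cccs(eps_x, eps_y):
    """Eq. (6.20): concentration control coefficients of xase and ydh on Y."""
    return 1 / (eps_y - eps_x), -1 / (eps_y - eps_x)


y = steady_state_y()
rho = y / (X0 * KEQ)                                                # disequilibrium ratio
eps_x = -rho / (1 - rho) - (y / KMP) / (1 + X0 / KMS + y / KMP)     # product elasticity
eps_y = KM2 / (KM2 + y)                                             # substrate elasticity
cx, cy = cccs(eps_x, eps_y)
print(f"Eq. 6.20:  C^Y_xase = {cx:.4f}, C^Y_ydh = {cy:.4f}")
print(f"numerical: C^Y_xase = {numeric_ccc('e_x'):.4f}, C^Y_ydh = {numeric_ccc('e_y'):.4f}")

# Poorly controlled case: zero product elasticity and a nearly saturated
# consumer (substrate elasticity 0.05, an assumed value).
print(cccs(0.0, 0.05))
```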

Note that the denominators of the expressions in Eq. (6.20) are identical to those in Eq. (6.16); this is a generic feature. With normal kinetics, where substrates activate and products inhibit, the denominators are positive, so Eq. (6.20) shows that activation of xase will increase Y whereas activation of ydh will cause it to decrease. A similar outcome is not guaranteed for more complex network structures and kinetics [23]. Again, note that a zero product elasticity, $\varepsilon^{xase}_Y$, will have an undesirable effect if Y rises to a level where the substrate elasticity $\varepsilon^{ydh}_Y$ is much less than 1: both concentration control coefficients will then have a magnitude much greater than 1, showing very poor control of Y in response to small fluctuations in the enzyme activities [29, 31]. The example above shows, for this simple case, that the system-level flux and concentration control coefficients can be expressed entirely in terms of the kinetic characteristics of the component enzymes. Extending this to larger, realistic pathways requires considering additional constraints on the control coefficients imposed by flux distributions at branch points [18, 33] and a different form of the connectivity theorem for conserved metabolites, that is, groups of metabolites that together have an invariant total concentration (such as NAD and NADH if their synthesis pathway is not included in the metabolic network under consideration) [18]. Reder [34] subsequently developed a formal mathematical proof showing that control coefficients could always be expressed in terms of the structure and component properties of any metabolic network at steady state and that relationships such as the summation theorems are embedded in the general solution. A more discursive account of Reder’s formulation has been given by Heinrich and Schuster [35].

6.2.6.3

Block Coefficients and Top-Down Analysis

Though analysis of a two-step pathway might seem unrealistic, it has greater applicability than first appears. This is because control coefficients are additive [7, 22], so that if the first step of the pathway, xase, were in fact a block of enzymes, its control coefficient would be the sum of the control coefficients of the components. Furthermore, the group can be considered to have a set of “block” elasticities to metabolites outside the block, and these can in turn be expressed as functions of the elasticities of the individual components [18]. The block elasticities can also be measured experimentally using approaches developed by Brand and co-workers [36], termed top-down control analysis, leading to calculation of block control coefficients. Hence MCA can be carried out at different scales with different information requirements: on a coarse scale with top-down analysis and relatively straightforward experiments, or by an enzyme-by-enzyme bottom-up approach that generally requires a lot more experimental effort (see [4] for examples). Top-down analysis can also be applied recursively so that the control within a block can be more precisely located, and, subject to appropriate choice of blocks, can be applied in branched networks [37].

6.2.7

Feedback Inhibition

Traditional concepts of metabolism proposed control by rate-limiting enzymes, and, as we have seen, MCA showed that this is not generally true. However, another feature of the traditional theory was the identification of an allosteric enzyme subject to feedback inhibition as the likely candidate for the rate-limiting step. MCA has also shown this to be a misinterpretation of the role of feedback inhibition in metabolism, and it is important to analyze this since engineering allosteric enzymes has been a common strategy in metabolic engineering (as discussed further in Section 6.3.1).⁴ Kacser and Burns [6] used the connectivity theorem to show how feedback inhibition would act to reduce the flux control coefficient of the inhibited enzyme, by varying the inhibition constant from weak to strong, represented by the elasticity coefficient of the enzyme to the feedback inhibitor going from slightly below zero to increasingly negative values. Their allosteric enzyme was at the beginning of a short pathway, and with a weak inhibition constant its flux control coefficient was the largest in the pathway, close to 1.0. This can also be shown by a computer simulation of the following pathway, in which P inhibits E1:

$$X_0 \xrightarrow{E_1} A \xrightarrow{E_2} B \xrightarrow{E_3} P \xrightarrow{E_4} X_1$$

4 Savageau’s Biochemical Systems Theory also provides important insights into the role of different configurations of feedback inhibition loops in metabolic networks [5, 38].

[Figure 6.6: flux control coefficients $C^J_i$ (0 to 1.0) plotted against the inhibition constant $K_{i,P}$ (1 to 100, log scale)]

Figure 6.6 Feedback inhibition transfers control to the steps consuming the inhibitor. A four-step pathway is simulated with E1 represented as a reversible Monod–Wyman–Changeux enzyme with product inhibition by A and allosteric inhibition by P, as shown in the text. E2 and E3 are reversible Michaelis–Menten enzymes with product inhibition, and E4 is an irreversible enzyme. Details of the simulation are given in Appendix 6.A.

Source: Adapted from Fell [4].

The feedback of P on the allosteric enzyme E1 can be varied from very strong (Ki,P < 1) to very weak (Ki,P ≈ 100), and the flux control coefficients calculated, as in Figure 6.6. With weak inhibition, the largest flux control coefficient is that of E1, but strong inhibition transfers control to the step that represents the demand for P. Supply–demand analysis [29] shows the conditions under which this transfer of control occurs, and its characteristics. As in Section 6.2.6.3, we can consider the four-enzyme pathway above to be equivalent to a supply block composed of the first three enzymes and a demand block comprising the fourth. The flux control coefficients of the two blocks are given by Eq. (6.16) above, where $\varepsilon^{\mathrm{xase}}_{Y}$ now represents $\varepsilon^{\mathrm{supply}}_{P}$ and $\varepsilon^{\mathrm{ydh}}_{Y}$ is $\varepsilon^{\mathrm{demand}}_{P}$. If E1 is an allosteric enzyme subject to cooperative feedback inhibition, its elasticity to P, which can be

6 Metabolic Control Analysis

Figure 6.11 Simple model to illustrate the Universal Method. N, metabolic inputs or nutrients; S, branch point metabolic intermediate; G, cell biomass and other products; P, desired metabolic product; Ji, metabolic fluxes. Source: Based on Kacser and Acerenza [61].

J10, r3 is less than r1, perhaps to a significant degree, depending on the flux J2. Evidently, metabolism is a much more complex network than shown, so in practice it would be necessary to go back to the branch point preceding S and recompute the r-fold factor for the flux leading into it. In terms of practical implementation, this could require large numbers of enzymes to be
overexpressed by different, carefully calibrated factors. The hope would be that at some point, in the high-flux core of central carbon metabolism, the required amplification factor would fall close to 1 and not require any intervention. Kacser and Acerenza illustrated more complex examples in their paper [61], including a design for extending the work on tryptophan synthesis in yeast by calculating the amplification factors needed for the shikimate pathway leading to chorismate (steps 1–7 in Figure 6.10). Implementing the Universal Method in a computer simulation is evidently much easier, and an example for a branched pathway with feedback inhibition, analogous to the aromatic amino acid pathway, was included in a comparison of different overexpression strategies [56]. Unsurprisingly, the Universal Method performed best, achieving exactly the intended flux increase with no impact on metabolite concentrations. (These authors called it the evasion strategy, since changes in metabolic regulation mechanisms, such as feedback loops, are avoided when no internal metabolite concentrations change, as noted in [61].) The only closely competitive strategy found in the simulation was activation of product removal, as previously noted in Section 6.3.2. Another computer simulation, of a mechanistically based kinetic model of yeast aromatic amino acid synthesis from glucose, examined a limited application of the Universal Method in which the number of enzyme groups to be modulated was restricted to two, but metabolite concentrations were allowed to change somewhat from their original values [62]. Using optimization techniques, the authors found several options for increasing tryptophan flux that showed synergistic interactions, resulting in flux gains of the same order as the activity amplification factors. 
Some solutions also involved downregulation of a competing branch, but this approach presupposes knowledge of the pathway kinetics detailed enough to calculate control coefficients. Experimental implementation of the Kacser and Acerenza design for increasing tryptophan flux in yeast was under way when Henrik Kacser died. The four ARO genes encoding the shikimate pathway were inserted into a single-copy plasmid, and the five genes of the tryptophan pathway into a multicopy plasmid, giving 2–5-fold and 20–60-fold enzyme overexpression, respectively [63], which was in the range of the relative degrees of amplification previously calculated. Joint expression of the plasmids in yeast raised the intracellular concentrations of the three aromatic amino acids two- to threefold but did not increase the flux. The reasons for this were not clear, but could have included the fall in demand for the amino acids because of the slower growth of the cells bearing the plasmids, the lack of a transport mechanism for tryptophan out of the cell, and flux control in reactions before the shikimate pathway. The last is certainly supported by studies on bacterial production of aromatic amino acids (Section 6.3.4.3). However, there was no follow-up to this project to resolve these issues.

6.3.4.3 Bacterial Production of Aromatic Amino Acids

There is a substantial market for production of the L-aromatic amino acids, both in their own right and as starting points for manufacture of other chemicals (e.g. the sweetener aspartame from phenylalanine). Considerable effort has been put into engineering enhanced production in E. coli and Corynebacterium strains, as

6.3 Implications of MCA for Metabolic Engineering Strategies

reviewed in [51]. As mentioned previously, this has frequently involved the introduction of feedback-resistant enzymes, but in most cases additional enzymes have to be overexpressed in tandem to prevent unwanted losses through excretion of pathway intermediates. Thus, in general, the results support the principle that multiple enzyme modulations produce better results. However, it is not clear that much of this development has been guided by MCA considerations, except for the work of the Liao group on the upper part of the pathway between glucose or xylose and DAHP. Like others before them, they isolated this section of the pathway, up to the synthesis of DAHP from erythrose 4-phosphate (E4P) and PEP by DAHPS, by using an aroB mutant to block the pathway at step 2 (Figure 6.10). This causes excretion of DAHP, allowing its synthesis flux to be calculated from its rate of accumulation. Also following earlier work, they overexpressed a feedback-resistant DAHPS (aroGfbr) and transketolase (TktA), the direct producer of E4P in the pentose pathway [64]. In addition, they identified by stoichiometric analysis that the yield of DAHP in E. coli was potentially limited because glucose is taken up by the phosphotransferase system (PTS) and phosphorylated by PEP, so that there is a net production of only one PEP per glucose. They calculated that overexpressing the gluconeogenic enzyme PEP synthase (PpsA, or pyruvate, water dikinase) to convert the pyruvate generated by the PTS back to PEP would increase the yield of DAHP from glucose. These three enzymes were expressed on plasmids under the control of different promoters, so that DAHPS and PpsA overexpression could be modulated independently by separate inducers. In a subsequent paper, they also investigated expression of another pentose pathway enzyme on the route to E4P, transaldolase (Tal) [65], and presented the control analysis of the results from both papers. 
At low levels of overexpression of feedback-resistant DAHPS, DAHP synthesis was stimulated, and the flux control coefficient of DAHPS near the wild-type level was estimated to be close to 1. In this region, overexpression of TktA, Tal, or PpsA did not stimulate flux to DAHP, so their coefficients were all near zero. At higher levels of DAHPS amplification, where its effect on DAHP synthesis had saturated and its flux control coefficient had fallen to zero, overexpression of either TktA or Tal produced further increases in flux to DAHP. For these two enzymes, flux was measured only at the initial level and at one level of overexpression, so the activity and flux changes were rather too large for the control coefficients to be calculated by the equation the authors used. Their values of 0.13 and 0.12 for TktA and Tal, respectively, actually correspond to the Deviation Index of large-change theory (Section 6.2.8 and [45]), which has been shown to estimate the control coefficient at the amplified activity level. Inserting the same data, from Table III of [65], into Eq. (6.21) gives the flux control coefficients at the original activity levels as 0.49 and 0.57.⁹ It might appear that all the control had been transferred to these two enzymes at high levels of DAHPS. However, simultaneous expression of the two revealed no synergy: the flux to DAHP was no different from that with either one alone! This seems surprising, but the pentose pathway has a complicated nonlinear structure, so additivity of the control coefficients is not guaranteed. PpsA gives a further increase in flux only after DAHPS and TktA have been overexpressed, suggesting that the demand for PEP has to reach a certain level before its supply becomes increasingly limiting. No flux control coefficient was reported for it, and one is not easily calculated from the data presented, but a semi-quantitative estimate made by reference to Figure 6.7 would be in the region of 0.3 or less in the conditions where it exerts some control. All this implies that after the overexpression of three to four targets, the flux limitation has moved to other steps in the pathway that initially did not exert significant control.

9 The authors could have calculated single coefficients using the more robust logarithmic approximation, Eq. (6.2), which gives the values 0.29 and 0.31, i.e. an estimate for the middle of the range.

6.3.4.4 Penicillin and Other Instances

The commercial development of penicillin production is primarily an example of empirical strain selection, followed by very many rounds of mutation and selection to achieve an increase of over 10⁴-fold in the yield of the antibiotic from wild-type fungi, Penicillium and Aspergillus species [66]. However, interesting insights into the changes required to obtain such a large amplification of flux have come subsequently from genome sequencing, kinetic modeling, and MCA. The final pathway for the assembly of penicillin consists of three enzymes. The first, ACVS, condenses L-𝛼-aminoadipate (an intermediate in the fungal lysine biosynthesis pathway), cysteine, and valine to form a tripeptide. This then undergoes oxidative cyclization by isopenicillin-N synthase (IPNS) to generate the 𝛽-lactam ring of isopenicillin N. Finally, acyl-CoA:isopenicillin-N acyltransferase (AAT) replaces the L-𝛼-aminoadipate with an acyl group from an acyl-CoA, giving penicillin G or penicillin V depending on the acyl group. Kinetic modeling of this final part of the pathway, based on the characteristics of a high-yielding strain of P. chrysogenum (the source of industrial production strains), suggested that flux control was shared between ACVS and IPNS depending on oxygen concentration, with ACVS having more control at high oxygen and IPNS having more as dissolved oxygen decreased [3]. This is supported by overexpression studies of the three enzymes in A. nidulans, a low-level natural penicillin producer: only ACVS amplification caused a significant increase in penicillin production, with 100-fold amplification of the enzyme causing a 30-fold increase in penicillin yield, corresponding, as the authors justifiably claimed, to a flux control coefficient close to 1 (actually 0.98) [67]. Even so, when ACVS and IPNS were simultaneously overexpressed with the same promoter, a marked further increase in penicillin production was obtained [66]. The amplification of ACVS alone had probably reduced its flux control coefficient to around 0.3, and some of this control had evidently been taken up by IPNS. This requirement for parallel amplification of the enzymes of a pathway is supported by sequencing studies of industrial strains of P. chrysogenum: the genes of the three enzymes have been amplified in tandem by factors of 50-fold or more [66]. There have been many other adaptations as well to cope with supplying the pathway, but the gene copy number of the three enzymes correlates with productivity. That mutation and selection should have resulted in coordinate amplification of the enzymes of a pathway in order to deliver increased flux should really not come as a surprise. It is seen in the organization of the genes of linear segments of metabolism into operons and regulons, which ensures coordinate induction and repression, and in the fusion of enzyme activities of linear, and even cyclic, sections of metabolism into multifunctional proteins and multienzyme complexes. Further arguments that multiple modulation is the only mechanism capable of giving large increases in metabolic flux are given in my earlier book [4].

6.3.5 Impacts on Yield from a Growing System

This chapter has concentrated on the difficulties of engineering strategies to increase flux, but the target in biotechnological processes is not flux per se, but the overall yield of the product, which is the integral of the flux over time. It turns out that the relative increase in yield from amplifying an enzyme, for a product whose synthesis is coupled to an organism's growth, can be greater than the relative increase in flux. The most straightforward case is during the exponential growth phase of an organism whose biomass formation is coupled to that of the product. This is naturally the case for catabolic fermentation products, but as mentioned previously, there are network design strategies that can make this the case for other metabolic outputs [58, 59]. Exponential growth is defined by the differential equation for the rate of growth:

$$J = \frac{dM}{dt} = kM, \tag{6.27}$$

where k is the rate constant and M is the amount (or concentration) of biomass, or of a product produced in constant proportion to the biomass in the exponential growth phase. The flux to product or biomass, J, is proportional to the value of M at time t. In my derivation of the equations for the impact of changing enzyme activity on yield [68], I showed that the control coefficient of enzyme E on flux, $C^J_E$, is equal to the control coefficient of E on k, $C^k_E$. The equation above can be integrated to give the value of M at time t, $M_t$, given its initial value $M_0$ at the start of the exponential phase, where t = 0:

$$M_t = M_0\, e^{kt}. \tag{6.28}$$

If enzyme E is amplified r-fold, the fold increase f in the rate constant k can be calculated from the large change Eq. (6.21), and hence $M_{t,rE}$ will be given by:

$$M_{t,rE} = M_0\, e^{fkt}. \tag{6.29}$$

From the previous two equations, the relative value of M at a given time t and for the same initial value $M_0$, $m_r$, is:

$$m_r = \frac{M_{t,rE}}{M_t} = \exp\big(kt(f-1)\big). \tag{6.30}$$
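Writing the length of the exponential phase as a number of doublings n, so that kt = n ln 2, puts Eq. (6.30) in a form that is convenient for checking the figures quoted in this section:

$$m_r = \exp\big(kt(f-1)\big) = 2^{\,n(f-1)}, \qquad n = \frac{kt}{\ln 2}.$$

For example, the LPAAT experiment described later in this section had kt = 3.6, i.e. n = 3.6/ln 2 ≈ 5.2 doublings, so a fold change f = 1.06 in the rate constant gives $m_r = e^{3.6 \times 0.06} = e^{0.216} \approx 1.24$.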

Thus, whereas the relative flux difference remains the same throughout the exponential phase, the yield ratio increases with time. For an exponential phase lasting four doublings, the amplification of the yield is significantly larger than that of the flux, though the effect shows the same tendency to diminish as the enzyme amplification increases, as shown by the computed example in Figure 6.12.

Figure 6.12 Changes in yield and flux to growth-coupled product for different degrees of overexpression. Relative changes in flux (dashed lines) and yield at 4 doubling times (solid lines) are shown for variable degrees of overexpression of enzymes with flux control coefficients of 0.1 (blue) and 0.3 (magenta). [Plot: relative increase in yield (1.0–2.6) against fold overexpression (1.0–6.0).]

The relationships above also allow flux control coefficients to be estimated from yield data. Yield measurements against time allow k and fk to be measured from the logarithmic forms of Eqs. (6.28) and (6.29), which determines f; given a measurement of r, the flux control coefficient is then obtained from the large change Eq. (6.22). Currently there is only one published implementation of this analysis to determine the flux control coefficient of an enzyme via its observed effects on yield after overexpression. This was part of a project, in which I participated, to engineer increased oil yield in oilseed rape, Brassica napus [69]. Analysis of the literature on the deposition of triacylglycerol (TAG) in the seed and the growth of the seed in the pod showed that both had a substantial, contemporaneous exponential phase. Previous work using top-down control analysis (Section 6.2.6.3) on oilseed rape embryos in vitro had already established that there was significant control of TAG synthesis by the lipid assembly block, in which acyl-CoAs and glycerol 3-phosphate are converted to TAG [70]. We therefore investigated overexpression of the second enzyme of the TAG assembly pathway, lysophosphatidate acyltransferase (LPAAT), by transformation and selection of plants with insertion of a single extra copy of the LPAAT gene [69]. Two of these plants, with independent insertion loci, were self-fertilized, and from the next generation, plants homozygous for the insertion were selected for biochemical
analysis alongside homozygous null plants from the same cross. Together with the original untransformed wild-type plants, this gave three lines with relative LPAAT activities of 1.0, 1.5, and 2.0, and the fold increase in the rate of TAG synthesis in the transgenic plants, calculated from the final TAG content of the seeds using Eqs. (6.28) and (6.29), was 1.06 and 1.07, respectively. Eqs. (6.22) and (6.23) give the two pairs of estimates of the flux control coefficient at wild-type and amplified levels as 0.17 and 0.12, and 0.14 and 0.07, respectively. Alternatively, a hyperbola can be fitted to the three pairs of relative enzyme and flux values, as in Eq. (6.3), and this gives a consensus estimate of 0.15 for the wild-type flux control coefficient of LPAAT. Though this may seem small, the experiment was done on whole plants, in which the full pathway runs from CO₂ via photosynthesis, sucrose transport to the seed, and generation of precursors through glycolysis and fatty acyl-CoA synthesis, to assembly, so this is not a negligible share of the whole in planta pathway. Although the relative flux changes from LPAAT amplification were estimated as 1.06–1.07, the relative yields of TAG were 1.24 and 1.29. This is accounted for by Eq. (6.30), since the kt factor in these experiments was 3.6, corresponding to an exponential phase of about 5 doublings of seed size. This new method of estimating flux control coefficients could almost certainly be applied retrospectively to enzyme overexpression studies of growth-coupled product yields wherever there are measurements of a steady-state exponential phase and enzymatic assays of the degree of activity amplification.
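The large-change arithmetic used in this section can be sketched in a few lines of Python. This is a minimal sketch, assuming that flux responds hyperbolically to enzyme activity, as in the large-change treatment of unbranched chains [45]. Under that assumption the control coefficient at the original activity is (1 − 1/f)/(1 − 1/r), the estimate at the amplified activity is (f − 1)/(r − 1), and ln f/ln r is the logarithmic mid-range approximation. These expressions are written from that assumption rather than copied from Eqs. (6.21)–(6.23), and all function names are hypothetical. Eq. (6.30) and its inverse convert between a fold change in k and a yield ratio.

```python
import math

# Hypothetical helpers. The control-coefficient formulas ASSUME a hyperbolic
# flux response to enzyme activity (large-change treatment, unbranched chain).

def c_original(r, f):
    """Control coefficient at the original activity, from an r-fold activity
    change that produced an f-fold flux change: (1 - 1/f) / (1 - 1/r)."""
    return (1.0 - 1.0 / f) / (1.0 - 1.0 / r)

def c_amplified(r, f):
    """Estimate at the amplified activity level: (f - 1) / (r - 1)."""
    return (f - 1.0) / (r - 1.0)

def c_log(r, f):
    """Logarithmic approximation, a mid-range estimate: ln f / ln r."""
    return math.log(f) / math.log(r)

def flux_fold(r, c0):
    """Fold change in flux (or in k) predicted for an r-fold amplification of
    an enzyme whose control coefficient at the original activity is c0."""
    return 1.0 / (1.0 - c0 * (1.0 - 1.0 / r))

def yield_ratio(f, kt):
    """Eq. (6.30): relative yield m_r = exp(kt (f - 1))."""
    return math.exp(kt * (f - 1.0))

def f_from_yield(m_r, kt):
    """Invert Eq. (6.30) to recover f from an observed yield ratio."""
    return 1.0 + math.log(m_r) / kt

if __name__ == "__main__":
    # LPAAT lines, using the rounded values quoted in the text.
    kt = 3.6
    for r, f in [(1.5, 1.06), (2.0, 1.07)]:
        print(f"r={r}: C(original)={c_original(r, f):.2f}, "
              f"C(amplified)={c_amplified(r, f):.2f}, "
              f"yield ratio={yield_ratio(f, kt):.2f}")
    # Penicillin: 100-fold ACVS amplification gave a 30-fold increase in yield.
    print(f"ACVS: {c_original(100, 30):.2f} -> {c_amplified(100, 30):.2f}")
    # One point of a Figure 6.12-style curve: C = 0.3, 6-fold, 4 doublings.
    f = flux_fold(6.0, 0.3)
    print(f"flux x{f:.2f}, yield x{yield_ratio(f, 4 * math.log(2)):.2f}")
```

With the rounded values above, the 1.5-fold LPAAT line gives about 0.17 and 0.12 and a yield ratio of about 1.24, and the penicillin case gives about 0.98 at the original ACVS activity and about 0.29 after amplification, in line with the text; with the rounded f = 1.07, the 2.0-fold line gives about 0.13 at wild-type activity.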

6.4 Conclusion

I hope that, in this chapter, I have shown the importance of an understanding of the principles and findings of MCA to the rational design of strategies for the engineering of metabolism, and to the analysis of experimental outcomes in order to gain deeper insight. It is unlikely now that there will be frequent experimental efforts made solely to determine flux control coefficients, as it is often experimentally demanding, and usually confirms that the value is rather low. Nevertheless, experiments in enzyme manipulation can be interpreted in terms of control coefficients and MCA, if the appropriate data is collected, using the calculation methods described here. Indeed, as I have hinted in places in this chapter, researchers have published papers that already contain the results needed for an MCA analysis that would have complemented their work had they chosen to do so. Those results remain available for mining retrospectively, if the appropriate method from this chapter is selected, and could therefore assist future studies.

Appendix 6.A: Feedback Inhibition Simulation

The flux control coefficient values shown in Figure 6.6 were simulated for the following pathway with feedback inhibition:

$$X_0 \xrightarrow{E_1} A \xrightarrow{E_2} B \xrightarrow{E_3} P \xrightarrow{E_4} X_1$$

The kinetic scheme was based on that of Olivier et al. [71], in which the first, feedback-inhibited enzyme is represented by a reversible Hill equation (shown to give regulatory behavior equivalent to that of a reversible Monod–Wyman–Changeux allosteric enzyme for an appropriate choice of parameters). The second and third steps are reversible Michaelis–Menten enzymes, and the fourth, the demand step for the feedback metabolite P, is an irreversible Michaelis–Menten enzyme. The equations and parameters for the four steps were as follows:

1)

$$v_1 = \frac{V_{f1}\,\dfrac{X_0}{K_{1,X_0}}\left(1-\dfrac{A}{X_0\,K_{\mathrm{eq},1}}\right)\left(\dfrac{X_0}{K_{1,X_0}}+\dfrac{A}{K_{1,A}}\right)^{h-1}}{\left(\dfrac{X_0}{K_{1,X_0}}+\dfrac{A}{K_{1,A}}\right)^{h}+\dfrac{1+\left(P/K_{1,P}\right)^{h}}{1+\alpha\left(P/K_{1,P}\right)^{h}}},$$

where X0 = 2.0, Vf1 = 200, K1,X0 = 1.0, K1,A = 5.0, 0.5 ≤ K1,P ≤ 80.0, h = 2.5, 𝛼 = 0.01, and Keq,1 = 200.

2)

$$v_2 = \frac{\dfrac{V_{f2}}{K_{2,A}}\left(A-\dfrac{B}{K_{\mathrm{eq},2}}\right)}{1+\dfrac{A}{K_{2,A}}+\dfrac{B}{K_{2,B}}},$$

where Vf2 = 2000, K2,A = 2.5, K2,B = 5.0, and Keq,2 = 10.0.

3)

$$v_3 = \frac{\dfrac{V_{f3}}{K_{3,B}}\left(B-\dfrac{P}{K_{\mathrm{eq},3}}\right)}{1+\dfrac{B}{K_{3,B}}+\dfrac{P}{K_{3,P}}},$$

where Vf3 = 2000, K3,B = 2.5, K3,P = 5.0, and Keq,3 = 10.0.

4)

$$v_4 = \frac{V_{f4}\,P}{K_{4,P}+P},$$

where Vf4 = 200 and K4,P = 2.0. The model was simulated with the steady-state solver in the metabolic modeling package ScrumPy [72] (https://gitlab.com/MarkPoolman/scrumpy_2) over a range of values of the feedback inhibition constant K1,P. The values of the four flux control coefficients, the concentrations of metabolites A, B, and P, and the steady-state flux were recorded. The flux control coefficients $C^J_1$ and $C^J_4$ are shown in Figure 6.6 of the main text. The effect of feedback inhibition in lowering the concentrations of metabolites within the feedback loop (A and B) is shown in Figure 6.A.1. This equally demonstrates why abolishing feedback inhibition can significantly increase intracellular metabolite concentrations. Over the same range, the pathway flux only varies between 139 and 193 flux units.

Figure 6.A.1 The concentrations of metabolites A (solid line) and B (dashed line) are plotted as a function of feedback strength (stronger on the left and weaker at the right). [Plot: concentration (0–10) against inhibition constant Ki,P (1–100).]
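The transfer of control shown in Figure 6.6 can also be reproduced without a dedicated solver. The sketch below is an illustration under stated assumptions, not the ScrumPy model used for the figure. It assumes v1 is the reversible Hill form with the allosteric modifier term (1 + (P/K1,P)^h)/(1 + 𝛼(P/K1,P)^h), and v2 to v4 are the Michaelis–Menten forms of this appendix. Because all four rates equal the flux J at steady state, it solves v3 = J and v2 = J in closed form for B and A at a trial P, then bisects on P until v1 matches the demand v4. Flux control coefficients are estimated by scaling each Vf by ±δ, since each rate is proportional to its enzyme concentration. The names BASE, steady_state, and flux_control_coefficients are hypothetical. Because the E1 rate law is read from a damaged extraction, the absolute fluxes need not match those quoted above; the point is the shift of control from E1 to E4 as K1,P decreases.

```python
# Parameters of Appendix 6.A; K1P (the feedback constant) is supplied per run.
BASE = {
    "X0": 2.0, "Vf1": 200.0, "K1X0": 1.0, "K1A": 5.0, "h": 2.5, "alpha": 0.01, "Keq1": 200.0,
    "Vf2": 2000.0, "K2A": 2.5, "K2B": 5.0, "Keq2": 10.0,
    "Vf3": 2000.0, "K3B": 2.5, "K3P": 5.0, "Keq3": 10.0,
    "Vf4": 200.0, "K4P": 2.0,
}


def v1(A, P, p):
    """Reversible Hill rate law for E1 with allosteric inhibition by P (assumed form)."""
    sigma = p["X0"] / p["K1X0"]
    s = sigma + A / p["K1A"]
    mu_h = (P / p["K1P"]) ** p["h"]
    modifier = (1.0 + mu_h) / (1.0 + p["alpha"] * mu_h)
    rho = A / (p["X0"] * p["Keq1"])          # mass-action ratio divided by Keq
    return p["Vf1"] * sigma * (1.0 - rho) * s ** (p["h"] - 1.0) / (s ** p["h"] + modifier)


def v4(P, p):
    """Irreversible Michaelis-Menten demand step."""
    return p["Vf4"] * P / (p["K4P"] + P)


def upstream(J, P, p):
    """Solve v3(B, P) = J for B, then v2(A, B) = J for A (each is linear in the unknown)."""
    B = (J * p["K3B"] * (1.0 + P / p["K3P"]) + p["Vf3"] * P / p["Keq3"]) / (p["Vf3"] - J)
    A = (J * p["K2A"] * (1.0 + B / p["K2B"]) + p["Vf2"] * B / p["Keq2"]) / (p["Vf2"] - J)
    return A, B


def steady_state(p, P_hi=1000.0, iterations=200):
    """Bisect on P until the supply v1 equals the demand v4; returns (J, A, B, P)."""
    def excess_supply(P):
        J = v4(P, p)
        A, _ = upstream(J, P, p)
        return v1(A, P, p) - J

    lo, hi = 0.0, P_hi                       # excess_supply(lo) > 0 > excess_supply(hi)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if excess_supply(mid) > 0.0:
            lo = mid                         # supply still exceeds demand: P must rise
        else:
            hi = mid
    P = 0.5 * (lo + hi)
    J = v4(P, p)
    A, B = upstream(J, P, p)
    return J, A, B, P


def flux_control_coefficients(K1P, delta=1e-4):
    """Scaled central-difference estimates of C^J for E1..E4 (scaling each Vf)."""
    p = dict(BASE, K1P=K1P)
    J0 = steady_state(p)[0]
    coefficients = {}
    for name in ("Vf1", "Vf2", "Vf3", "Vf4"):
        up, down = dict(p), dict(p)
        up[name] *= 1.0 + delta
        down[name] *= 1.0 - delta
        dJ = steady_state(up)[0] - steady_state(down)[0]
        coefficients[name] = dJ / (2.0 * delta * J0)
    return J0, coefficients


if __name__ == "__main__":
    for K1P in (0.5, 2.0, 10.0, 80.0):
        J, C = flux_control_coefficients(K1P)
        print(f"K1,P={K1P:5.1f}  J={J:6.1f}  C1={C['Vf1']:.2f}  C4={C['Vf4']:.2f}  "
              f"sum={sum(C.values()):.3f}")
```

The coefficients should sum to 1 (the summation theorem), because scaling all four Vf values together scales every rate and hence J by the same factor; this makes a convenient check on the numerics.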

References

1 Chassagnole, C., Rais, B., Quentin, E. et al. (2001). An integrated study of threonine-pathway enzyme kinetics in Escherichia coli. Biochem. J. 356: 415–423.
2 Hoefnagel, M.H.N., Starrenburg, M.J.C., Martens, D.E. et al. (2002). Metabolic engineering of lactic acid bacteria, the combined approach: kinetic modelling, metabolic control and experimental analysis. Microbiology 148: 1003–1013.
3 Noronha de Pissara, P., Nielsen, J., and Bazin, M.J. (1996). Pathway kinetics and metabolic control analysis of a high-yielding strain of Penicillium chrysogenum during fed batch cultivations. Biotechnol. Bioeng. 51: 168–176.
4 Fell, D.A. (1997). Understanding the Control of Metabolism. London: Portland Press.
5 Savageau, M.A. (1976). Biochemical Systems Analysis: A Study of Function and Design in Molecular Biology. Reading, MA: Addison–Wesley.
6 Kacser, H. and Burns, J.A. (1973). The control of flux. Symp. Soc. Exp. Biol. 27: 65–104. Reprinted in Biochem. Soc. Trans. 23, 341–366, 1995.
7 Heinrich, R. and Rapoport, T.A. (1974). A linear steady-state treatment of enzymatic chains; general properties, control and effector strength. Eur. J. Biochem. 42: 89–95.
8 Higgins, J. (1963). Analysis of sequential reactions. Ann. N. Y. Acad. Sci. 108: 305–321.
9 Waley, S.G. (1964). A note on the kinetics of multi-enzyme systems. Biochem. J. 91: 514–517.
10 Burns, J.A., Cornish-Bowden, A., Groen, A.K. et al. (1985). Control analysis of metabolic systems. Trends Biochem. Sci. 10: 16.
11 Fell, D.A. (1992). Metabolic control analysis: a survey of theoretical and experimental developments. Biochem. J. 286: 313–330.
12 Flint, H.J., Tateson, R.W., Bartelmess, I.B. et al. (1981). Control of flux in the arginine pathway of Neurospora crassa. Biochem. J. 200: 231–246.
13 Salter, M., Knowles, R.G., and Pogson, C.I. (1986). Quantification of the importance of individual steps in the control of aromatic amino acid metabolism. Biochem. J. 234: 635–647.
14 Torres, N.V., Mateo, F., Melendez-Hevia, E., and Kacser, H. (1986). Kinetics of metabolic pathways. Biochem. J. 234: 169–174.
15 Agius, L., Peak, M., Newgard, C.B. et al. (1996). Evidence for a role of glucose-induced translocation of glucokinase in the control of hepatic glycogen synthesis. J. Biol. Chem. 271: 30479–30486.
16 Neuhaus, H.E. and Stitt, M. (1990). Control analysis of photosynthate partitioning: impact of reduced activity of ADP-glucose pyrophosphorylase or plastid phosphoglucomutase on the fluxes to starch and sucrose in Arabidopsis thaliana (L.) Heynh. Planta 182: 445–454.
17 Groen, A.K., van Roermund, C.W.T., Vervoorn, R.C., and Tager, J.M. (1986). Control of gluconeogenesis in rat liver cells. Biochem. J. 237: 379–389.
18 Fell, D.A. and Sauro, H.M. (1985). Metabolic control analysis: additional relationships between elasticities and control coefficients. Eur. J. Biochem. 148: 555–561.
19 Hafner, R.P., Brown, G.C., and Brand, M.D. (1990). Analysis of the respiration rate, phosphorylation rate, proton leak rate and protonmotive force in isolated mitochondria using the "top-down" approach of metabolic control theory. Eur. J. Biochem. 188: 313–319.
20 Groen, A.K., Wanders, R.J.A., Westerhoff, H.V. et al. (1982). Quantification of the contribution of various steps to the control of mitochondrial respiration. J. Biol. Chem. 257: 2754–2757.
21 Rossignol, R., Letellier, T., Malgat, M. et al. (2000). Tissue variation in the control of oxidative phosphorylation: implication for mitochondrial diseases. Biochem. J. 347: 45–53.
22 Burns, J.A. (1971). Studies on Complex Enzyme Systems. University of Edinburgh.
23 Heinrich, R. and Rapoport, T.A. (1974). A linear steady state treatment of enzymatic chains. Eur. J. Biochem. 42: 97–105.
24 Harndahl, L., Schmoll, D., Herling, A.W., and Agius, L. (2006). The role of glucose 6-phosphate in mediating the effects of glucokinase overexpression on hepatic glucose metabolism. FEBS J. 273: 336–346.
25 Jorgensen, C.M., Hammer, K., Jensen, P.R., and Martinussen, J. (2004). Expression of the pyrG gene determines the pool sizes of CTP and dCTP in Lactococcus lactis. Eur. J. Biochem. 271: 2438–2445.
26 Thomas, S., Mooney, P.J.F., Burrell, M.M., and Fell, D.A. (1997). Finite change analysis of glycolytic intermediates in tuber tissue of lines of transgenic potato (Solanum tuberosum) overexpressing phosphofructokinase. Biochem. J. 322: 111–117.
27 Rees, J.D., Ingle, R.A., and Smith, J.A.C. (2009). Relative contributions of nine genes in the pathway of histidine biosynthesis to the control of free histidine concentrations in Arabidopsis thaliana. Plant Biotechnol. J. 7: 499–511.
28 Rapoport, T.A., Heinrich, R., Jacobasch, G., and Rapoport, S. (1974). A linear steady state treatment of enzymatic chains. Eur. J. Biochem. 42: 107–120.
29 Hofmeyr, J. and Cornish-Bowden, A. (1991). Quantitative assessment of regulation in metabolic systems. Eur. J. Biochem. 200: 223–236.
30 Segel, I.H. (1993). Enzyme Kinetics. Wiley Classics Library.
31 Cornish-Bowden, A. and Cárdenas, M.L. (2001). Information transfer in metabolic pathways. Effects of irreversible steps in computer models. Eur. J. Biochem. 268: 6616–6624.
32 Westerhoff, H.V. and Chen, Y.D. (1984). How do enzyme activities control metabolite concentrations? Eur. J. Biochem. 142: 425–430.
33 Kacser, H. (1983). The control of enzyme systems in vivo: elasticity of the steady state. Biochem. Soc. Trans. 11: 35–40.
34 Reder, C. (1988). Metabolic control theory: a structural approach. J. Theor. Biol. 135: 175–201.
35 Heinrich, R. and Schuster, S. (1996). Metabolic control analysis. In: The Regulation of Cellular Systems, 138–291. Chapman and Hall.
36 Brown, G.C., Hafner, R.P., and Brand, M.D. (1990). A "top-down" approach to the determination of control coefficients in metabolic control theory. Eur. J. Biochem. 188: 321–325.
37 Brand, M.D. (1996). Top down metabolic control analysis. J. Theor. Biol. 182: 351–360.
38 Savageau, M.A. (1974). Optimal design of feedback control by inhibition: steady state considerations. J. Mol. Evol. 4: 139–156.
39 Liu, C., Donahue, J.P., Heath, L.S., and Turnbough, C.L. (1993). Genetic evidence that promoter P2 is the physiologically significant promoter for the pyrBI operon of Escherichia coli K-12. J. Bacteriol. 175: 2363–2369.
40 Schaaff, I., Heinisch, J., and Zimmerman, F.K. (1989). Overproduction of glycolytic enzymes in yeast. Yeast 5: 285–290.
41 Ruijter, G.J., Panneman, H., and Visser, J. (1997). Overexpression of phosphofructokinase and pyruvate kinase in citric acid-producing Aspergillus niger. Biochim. Biophys. Acta 1334: 317–326.
42 Burrell, M.M., Mooney, P.J., Blundy, M. et al. (1994). Genetic manipulation of 6-phosphofructokinase in potato tubers. Planta 194: 95–101.
43 Urbano, A.M., Gillham, H., Groner, Y., and Brindle, K.M. (2000). Effects of overexpression of the liver subunit of 6-phosphofructo-1-kinase on the metabolism of a cultured mammalian cell line. Biochem. J. 352: 921–927.
44 Koebmann, B.J., Westerhoff, H.V., Snoep, J.L. et al. (2002). The glycolytic flux in Escherichia coli is controlled by the demand for ATP. J. Bacteriol. 184: 3909–3916.
45 Small, J.R. and Kacser, H. (1993). Responses of metabolic systems to large changes in enzyme activities and effectors. 1. The linear treatment of unbranched chains. Eur. J. Biochem. 213: 613–624.
46 Kruckeberg, L., Neuhaus, H.E., Feil, R. et al. (1989). Decreased-activity mutants of phosphoglucose isomerase in the cytosol and chloroplast of Clarkia xantiana. Biochem. J. 261: 457–467.
47 Tosaka, O., Takinami, K., and Hirose, Y. (1978). L-Lysine production by S-(2-aminoethyl)-L-cysteine and 𝛼-amino-𝛽-hydroxyvaleric acid resistant mutants of Brevibacterium lactofermentum. Agric. Biol. Chem. 42: 745–752.
48 Chassagnole, C., Fell, D.A., Rais, B. et al. (2001). Control of the threonine-synthesis pathway in Escherichia coli: a theoretical and experimental approach. Biochem. J. 356: 433–444.
49 Bröer, S. and Krämer, R. (1991). Lysine excretion by Corynebacterium glutamicum. 1. Identification of a specific secretion carrier system. Eur. J. Biochem. 202: 131–135.
50 Reinscheid, D.J., Kronemeyer, W., Eggeling, L. et al. (1994). Stable expression of hom-1-thrB in Corynebacterium glutamicum and its effect on the carbon flux to threonine and related amino acids. Appl. Environ. Microbiol. 60: 126–132.
51 Bongaerts, J., Krämer, M., Müller, U. et al. (2001). Metabolic engineering for the microbial production of aromatic amino acids and derived compounds. Metab. Eng. 3: 289–300.
52 Vrljic, M., Kronemeyer, W., Sahm, H., and Eggeling, L. (1995). Unbalance of L-lysine flux in Corynebacterium glutamicum and its use for the isolation of excretion-defective mutants. J. Bacteriol. 177: 4021–4027.
53 Vrljic, M., Sahm, H., and Eggeling, L. (1996). A new type of transporter with a new type of cellular function: L-lysine export from Corynebacterium glutamicum. Mol. Microbiol. 22: 815–826.
54 Yang, C., Hua, Q., and Shimizu, K. (1999). Development of a kinetic model for L-lysine biosynthesis in Corynebacterium glutamicum and its application to metabolic control analysis. J. Biosci. Bioeng. 88: 393–403.
55 Kreuzer, C., Hans, S., Mechtild, R. et al. (2001). L-Lysine-producing Corynebacteria and process for the preparation of L-lysine. U.S. Patent 6 200 785 B1.
56 Cornish-Bowden, A., Hofmeyr, J.H.S., and Cárdenas, M.L. (1995). Strategies for manipulating metabolic fluxes in biotechnology. Bioorg. Chem. 23 (4): 439–449.
57 Holms, W.H., Hamilton, I.D., and Mousdale, D. (1991). Improvements to microbial productivity by analysis of metabolic fluxes. J. Chem. Technol. Biotechnol. 50: 139–141.
58 Hädicke, O. and Klamt, S. (2011). Computing complex metabolic intervention strategies using constrained minimal cut sets. Metab. Eng. 13: 204–213.
59 Trinh, C.T., Unrean, P., and Srienc, F. (2008). Minimal Escherichia coli cell for the most efficient production of ethanol from hexoses and pentoses. Appl. Environ. Microbiol. 74: 3634–3643.
60 Niederberger, P., Prasad, R., Miozzari, G., and Kacser, H. (1992). A strategy for increasing an in vivo flux by genetic manipulation: the tryptophan system of yeast. Biochem. J. 287: 473–479.
61 Kacser, H. and Acerenza, L. (1993). A universal method for achieving increases in metabolite production. Eur. J. Biochem. 216: 361–367.
62 Stephanopoulos, G. and Simpson, T.W. (1997). Flux amplification in complex metabolic networks. Chem. Eng. Sci. 52: 2607–2627.
63 Hunt, P. (2002). The Application of the Universal Method to Tryptophan Biosynthesis in Yeast. University of Edinburgh.
64 Patnaik, R., Spitzer, R.G., and Liao, J.C. (1995). Pathway engineering for production of aromatics in Escherichia coli: confirmation of stoichiometric analysis by independent modulation of AroG, TktA, and Pps activities. Biotechnol. Bioeng. 46: 361–370.
65 Lu, J.L. and Liao, J.C. (1997). Metabolic engineering and control analysis for production of aromatics: role of transaldolase. Biotechnol. Bioeng. 53: 132–138.
66 Peñalva, M.A., Rowlands, R.T., and Turner, G. (1998). The optimization of penicillin biosynthesis in fungi. Trends Biotechnol. 16: 483–489.
67 Kennedy, J. and Turner, G. (1996). 𝛿-(L-𝛼-Aminoadipyl)-L-cysteinyl-D-valine synthetase is a rate limiting enzyme for penicillin production in Aspergillus nidulans. Mol. Gen. Genet. 253: 189–197.
68 Fell, D.A. (2018). Metabolic control analysis of exponential growth and product formation. bioRxiv: 485680.
69 Woodfield, H.K., Fenyk, S., Wallington, E. et al. (2019). Increase in lysophosphatidate acyltransferase activity in oilseed rape Brassica napus increases seed triacylglycerol content despite its low intrinsic flux control coefficient. New Phytol. 10 (224): 700–711.
70 Tang, M., Guschina, I.A., O'Hara, P. et al. (2012). Metabolic control analysis of developing oilseed rape Brassica napus (cv Westar) embryos shows that lipid assembly exerts significant control over oil accumulation. New Phytol. 196: 414–426.
71 Olivier, B.G., Rohwer, J.M., Snoep, J.L., and Hofmeyr, J.H.S. (2006). Comparing the regulatory behaviour of two cooperative, reversible enzyme mechanisms. IEE Proc. Syst. Biol. 153: 335–337.
72 Poolman, M.G. (2006). ScrumPy – metabolic modelling with Python. IEE Proc. Syst. Biol. 153 (5): 375–378.

7 Thermodynamics of Metabolic Pathways

Daniel Robert Weilandt*, Maria Masid*, and Vassily Hatzimanikatis

Laboratory of Computational Systems Biotechnology, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland

* These authors contributed equally

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

7.1 Bioenergetics in Life and in Metabolic Engineering

Living organisms require a constant supply of energy to maintain their life cycles, so they have evolved strategies to harvest the energy stored in their environment. This energy comes in different forms: in the chemical bonds of molecules such as glucose and other sugars, as chemical potentials such as proton gradients, or as radiation such as visible light [1]. The second law of thermodynamics dictates that this energy can only be harvested if a thermodynamic force drives the chemical and physical processes involved [2]. As a consequence, the intracellular processes of cells must be continuously displaced from equilibrium for the cells to maintain their functions. Organisms have therefore evolved biochemistry that converts the different forms of energy into energy stored in the chemical bonds of compounds such as ATP, NADH, or FADH2 [3]. These energy-rich compounds then drive energy-demanding reactions, such as the synthesis of biopolymers like DNA, RNA, lipids, and proteins.

The biochemical capabilities of a cell are commonly described by metabolic networks known as genome-scale metabolic models (Chapter 2). These networks contain the set of biochemical reactions through which the cell processes chemicals and energy from the environment to synthesize the macromolecules required for growth and maintenance. Under constant environmental conditions, cells operate at a steady state. They continuously process nutrients without accumulating intermediate compounds, which results either in a constant growth rate or in a nongrowing state where the synthesis rates of macromolecules equal their degradation rates. This characteristic behavior makes it possible to investigate the flux distribution of the biochemical reactions within the cells, for example, using flux balance analysis (FBA). FBA calculates and analyzes the stoichiometrically feasible flux distributions of the cell under an assumed cellular objective function [4]. It is important to note that in cells the flux distribution is subject to thermodynamic driving forces as well as to stoichiometric constraints. According to the second law of thermodynamics, these driving forces determine the directionality of the net reaction fluxes (Box 7.1) [1]. Thermodynamics therefore further restricts the stoichiometrically feasible flux distributions to those whose flux directionalities satisfy the second law.

Metabolic engineering exploits the capability of cells to convert inexpensive molecules such as sugars into high-energy intermediates that drive the synthesis of valuable (bio)chemical compounds. To this end, genetic modifications are introduced to alter the biochemical capabilities of the cells. Alternatively, the native metabolism is rewired by introducing heterologous biochemistry that forces the cells to produce the desired compounds. However, an introduced pathway will not be operational if it violates the stoichiometric or the thermodynamic constraints. When engineering the metabolism of a cell or designing a new strain to produce a certain biochemical, it is therefore crucial to check two things. First, the genetic alterations must still allow the organism to maintain its life cycle. Second, the reactions that produce the desired compound must be stoichiometrically and thermodynamically feasible.

This chapter introduces existing methods to compute and estimate, from experimental data, the thermodynamic properties of the compounds and reactions in metabolic networks. It also presents a workflow that integrates this thermodynamic information into metabolic models by extending the FBA formulation. The resulting models account for both the stoichiometric and the thermodynamic feasibility of the steady-state flux distributions of the network. Such models help in engineering cells from different organisms and in designing novel strains for the production of biochemical compounds of interest.

Box 7.1 Reaction reversibility vs. reaction directionality.
It is important to differentiate between reaction reversibility and reaction directionality. Consider an enzyme catalyzing a simple reaction A → B. Reaction reversibility is an intrinsic property of the enzyme: its capability to catalyze both the forward (A → B) and the reverse (B → A) reaction. For most enzymes in metabolic networks, information about reaction reversibility is scarce. When the catalytic properties of an enzyme are unknown, it is reasonable to assume that the enzyme is catalytically reversible. Reaction directionality, on the other hand, expresses whether the reaction can carry a net flux in either of the directions allowed by its reversibility. The directionality of a reaction is determined by its displacement from thermodynamic equilibrium. According to the second law of thermodynamics, a net flux can only occur in the direction for which the Gibbs free energy of reaction is negative. The Gibbs free energy of reaction depends on the Gibbs free energies of formation of the compounds and on their activities (those of A and B in the example).
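The rule stated in Box 7.1 can be made concrete with a few lines of Python. The sketch below is illustrative only. It evaluates ΔrG′ = ΔrG′° + RT Σj nj ln([Xj]), anticipating the formulation of Section 7.2 in which activities are approximated by concentrations, and reads off the direction in which a net flux is allowed. It also computes the range of ΔrG′ when concentrations are known only to lie within bounds. The reaction A → B, its ΔrG′° of +1.0 kcal mol−1, and all concentrations and bounds are hypothetical values chosen for illustration.

```python
import math

R = 1.986e-3  # universal gas constant, kcal/(K*mol), value used in this chapter's examples
T = 298.15    # temperature, K


def reaction_gibbs(dg0_prime, stoich, conc):
    """Delta_r G' = Delta_r G'0 + RT * sum_j n_j * ln([X_j]), in kcal/mol.

    stoich: metabolite -> stoichiometric coefficient (negative for substrates)
    conc:   metabolite -> concentration in M (activities approximated by concentrations)
    """
    return dg0_prime + R * T * sum(n * math.log(conc[m]) for m, n in stoich.items())


def net_flux_direction(dg):
    """Second law: a net flux is only possible in the direction with negative Gibbs free energy."""
    if dg < 0:
        return "forward"
    if dg > 0:
        return "reverse"
    return "equilibrium"


def gibbs_range(dg0_prime, stoich, lower, upper):
    """Minimum and maximum Delta_r G' when each concentration lies within [lower, upper]."""
    # Lowest value: substrates (n < 0) at their upper bound, products at their lower bound.
    g_min = dg0_prime + R * T * sum(
        n * math.log(upper[m] if n < 0 else lower[m]) for m, n in stoich.items())
    # Highest value: the opposite assignment.
    g_max = dg0_prime + R * T * sum(
        n * math.log(lower[m] if n < 0 else upper[m]) for m, n in stoich.items())
    return g_min, g_max


# Hypothetical reaction A -> B with an assumed Delta_r G'0 of +1.0 kcal/mol
stoich = {"A": -1, "B": 1}
dg = reaction_gibbs(1.0, stoich, {"A": 1e-3, "B": 1e-5})
print(round(dg, 2), net_flux_direction(dg))  # -1.73 forward

# Concentrations only known to lie between 1 uM and 10 mM (assumed bounds)
g_min, g_max = gibbs_range(1.0, stoich, {"A": 1e-6, "B": 1e-6}, {"A": 1e-2, "B": 1e-2})
print(round(g_min, 2), round(g_max, 2))  # -4.45 6.45
```

In the second case, ΔrG′ can take either sign, so the bounds alone do not fix the direction of this reaction. In such cases, the direction remains a decision variable that must be chosen consistently with the concentrations, as in the TFA formulation of Section 7.2.2.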


7.2 Thermodynamics-Based Flux Analysis Workflow

The fluxes through the reactions in the metabolic network of an organism are governed by the laws of thermodynamics. It is therefore fundamental to integrate this information into genome-scale metabolic models (GEMs) to correctly define the space of thermodynamically feasible reaction fluxes. Thermodynamics-based flux analysis (TFA) incorporates thermodynamic information into GEMs in two ways. First, it performs a thermodynamic curation of the model, which assigns cellular thermodynamic properties to the compounds and compartments in the model. Second, it extends the mathematical formulation of the flux balance analysis (FBA) problem to account for thermodynamic constraints, turning it into a mixed-integer linear program (MILP). Compared with FBA, TFA reduces the allowable flux space by eliminating thermodynamically infeasible flux configurations. It also makes it possible to determine the thermodynamically feasible space of metabolite concentrations and thermodynamic displacements [5, 6].

The second law of thermodynamics defines the directionality of reactions based on their Gibbs free energy. For a reaction with M reactants, the Gibbs free energy of reaction is formulated as follows:

$$\Delta_r G' = \Delta_r G'^{\circ} + RT \sum_{j=1}^{M} n_j \ln(x_j)$$

where ΔrG′° is the Gibbs free energy of reaction at aqueous standard conditions (1 atm, 25 °C, 1 M concentration, pH = 7, ionic strength I = 0 M), and nj and xj are the stoichiometric coefficient and the activity of reactant j, respectively. T denotes the temperature and R is the universal gas constant. The activities of the reactants can be approximated by their concentrations if ΔrG′° is corrected for nonstandard physiological conditions. In that case, ΔrG′° is adjusted to the cellular pH, ionic strength, temperature, and membrane potential of the specific organism, resulting in

$$\Delta_r G' = \left(\Delta_r G'^{\circ}\right)_{\mathrm{pH},I,T,\Delta\Psi} + RT \sum_{j=1}^{M} n_j \ln([X_j])$$

where [Xj] is the concentration of reactant j and ΔΨ is the membrane potential. Section 7.2.1 gives detailed guidelines for using available methods to estimate the Gibbs free energies of formation of the compounds and the Gibbs free energies of the reactions in the GEM, and for correcting them to physiological conditions. Section 7.2.2 shows how the mathematical formulation of TFA adds constraints to the FBA problem that enforce the second law of thermodynamics.

7.2.1

Thermodynamic Model Curation

The physiological standard Gibbs free energies of reaction (ΔrG′°)pH,I,T,ΔΨ are estimated as a weighted sum of the physiological standard Gibbs free energies of formation (ΔfG′°)pH,I,T,σ of the compounds participating in the reaction (Box 7.2). Here, the Gibbs free energies of formation of the compounds are curated in three steps. First, the aqueous standard Gibbs free energies of formation (ΔfG′°) are estimated using group contribution methods. Second, the ΔfG′° of the predominant dissociation state of each compound under physiological conditions is corrected for the physiological pH and ionic strength using the Debye–Hückel approximation. Third, the isomer distribution under physiological conditions is taken into account to further correct (ΔfG′°)pH,I,T for all the isomeric forms of the compound present in the cell. Finally, the resulting (ΔfG′°)pH,I,T,σ are used to compute the physiological standard Gibbs free energies of reaction, which for transport reactions also account for membrane potentials.

Box 7.2 Standard and physiological standard Gibbs free energies.
The aqueous standard Gibbs free energy of formation ΔfG′° denotes the chemical potential of a compound in aqueous solution at standard conditions (pH = 7, T = 25 °C, I = 0 M, [X] = 1 M, p = 1 atm). The physiological standard Gibbs free energy of formation (ΔfG′°)pH,I,T,σ includes the changes in the activities due to changes in pH, ionic strength, and other nonideal factors. These transformed energies are then used, together with the respective membrane potentials ΔΨ, to calculate the physiological standard Gibbs free energies of reaction (ΔrG′°)pH,I,T,ΔΨ.

7.2.1.1

Estimation of the Standard Free Energies of Formation

The first step in the thermodynamic curation of a genome-scale model is to obtain the aqueous standard Gibbs free energy of formation ΔfG′° for the compounds in the model. These formation energies can be obtained in three ways: (i) from calorimetric measurements, e.g. those reported in the NIST database [7] or the National Bureau of Standards database [8]; (ii) from quantum mechanical/molecular dynamics (QM/MD) calculations, which estimate the energies of formation from first physical principles [9–11]; and (iii) from group contribution methods, which use machine learning methods to estimate the energies of formation from the chemical structure of the compounds. Although many calorimetric measurements were conducted during the 1950s and 1960s, the data available today do not cover all the compounds known to be involved in biochemical reactions. This knowledge gap requires the use of computational methods to estimate the energies of formation. QM/MD methods could be used for this purpose, but their high computational cost limits their applicability to large reaction systems. This has promoted the emergence of data-driven methods to compute the Gibbs free energies of formation, among which group contribution methods (GCMs) [12] are the most commonly used. Generally, GCMs decompose the chemical structure of a compound into fragments (groups) whose contributions add up to the overall free energy of formation.


The first GCMs for estimating the Gibbs free energies of formation of biochemical compounds [13] and the free energies of biotransformations [14] were developed by Mavrovouniotis. Jankowski et al. [15] later extended these methods. They included additional Gibbs free energy measurements corrected to standard conditions, introduced additional groups for charged molecules and halogen residues, and added interaction groups for conjugations, thioesters, and vicinal chlorides. Later work focused on increasing the precision of the computed standard Gibbs free energies of reaction and on accounting for different temperatures [16, 17].

Depending on physiological conditions, in particular pH, compounds can exist in different dissociation states. The GCM by Jankowski et al. estimates the Gibbs free energy of formation of the predominant isomer at aqueous standard conditions. This value then serves as a reference for estimating the physiological Gibbs free energy of formation of the other dissociation states. Therefore, the first step in computing the standard Gibbs free energy of formation is to identify the predominant isomer of the compound under standard conditions. For this, physicochemical calculation software is recommended, such as the Marvin calculation plugins, SPARC [18], the Sirius Analytical website, or ChemSpider. Next, the group contribution method decomposes the predominant structure at aqueous standard conditions into molecular groups, which are identified according to a search priority. Groups with the lowest search priority value are assigned first, followed by those with higher values (Table 7.1). The only exception to this rule is phosphate chains of size n, which should always be decomposed into n − 1 −O−(PO₂)⁻− groups and one −O−(PO₂)⁻−O or −O−(PO₃)⁻ group. In the next step, interaction groups are identified, e.g. for aromatic, three-membered, or fused rings and for conjugations (Table 7.2). These groups were introduced to account for interactions among specific molecular groups, e.g. in aromatic rings, where the ring electrons are shared. Finally, if the compound can be completely decomposed into molecular groups, summing the Gibbs free energy contributions of the molecular and interaction groups yields an estimate of the standard Gibbs free energy of formation of the predominant isomer at standard conditions:

$$\Delta_f G'^{\circ}_s = \sum_g \Delta_g G'^{\circ} \times n_{g,s}$$

where ΔgG′° denotes the Gibbs free energy contribution of group g and ng,s denotes the number of times group g occurs within the structure of isomer s. For example, following the priority search, glucose is decomposed into four types of molecular groups (Table 7.1). The overall Gibbs free energy of formation of glucose (ΔfG′°GLC) is obtained by multiplying the number of occurrences of each group by its contribution and adding up the results. This gives ΔfG′°GLC = −218.28 kcal mol−1 (Figure 7.1a). Following the same procedure, ADP is decomposed into 12 types of molecular groups and 1 interaction group, resulting in an overall ΔfG′°ADP of −465.85 kcal mol−1 (Figure 7.1b).
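The group contribution sum for the two decompositions in Figure 7.1 can be reproduced with a minimal Python sketch. The dictionary keys are shortened, hypothetical labels for the groups. The contributions are the ΔgG′° values listed in Figure 7.1, where the heteroaromatic-ring value is the interaction-group contribution shown there. The helper phosphate_chain_groups is a hypothetical illustration of the phosphate-chain rule described above.

```python
# Group contributions Delta_g G'0 in kcal/mol, as listed in Figure 7.1 (labels shortened)
GROUP_ENERGY = {
    ">CH- (ring)": 4.84,
    "-O- (ring)": -36.60,
    ">CH2": 1.62,
    "-OH": -41.50,
    "-O-PO2- (chain)": -208.00,
    "-O-PO3 (terminal)": -254.00,
    ">C= (two non-aromatic rings)": 16.70,
    "-CH= (non-aromatic ring)": 8.46,
    ">C= (non-aromatic ring)": 11.70,
    "=N- (ring)": 4.17,
    "-NH2": 2.04,
    ">N-": 22.10,
    "heteroaromatic ring (interaction)": -1.95,
}


def phosphate_chain_groups(n):
    """A phosphate chain of size n gives n - 1 chain groups and one terminal group."""
    return {"-O-PO2- (chain)": n - 1, "-O-PO3 (terminal)": 1}


def formation_energy(decomposition):
    """Delta_f G'0_s = sum over groups g of Delta_g G'0 * n_{g,s}."""
    return sum(GROUP_ENERGY[g] * n for g, n in decomposition.items())


glucose = {">CH- (ring)": 5, "-O- (ring)": 1, ">CH2": 1, "-OH": 5}

adp = {
    **phosphate_chain_groups(2),           # diphosphate chain
    ">C= (two non-aromatic rings)": 2,
    ">CH- (ring)": 4,
    "-CH= (non-aromatic ring)": 2,
    ">CH2": 1,
    "-O- (ring)": 1,
    "-OH": 2,
    ">C= (non-aromatic ring)": 1,
    "=N- (ring)": 3,
    "-NH2": 1,
    ">N-": 1,
    "heteroaromatic ring (interaction)": 2,
}

print(round(formation_energy(glucose), 2))  # -218.28
print(round(formation_energy(adp), 2))      # -465.85
```

Keeping the decomposition as a count per group mirrors the term ng,s in the equation. A new compound then needs only a new dictionary of group counts.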


Table 7.1 Thermodynamic information (aqueous standard Gibbs free energy and its corresponding error) for the groups as defined in the group contribution method by Jankowski et al.

Groups | ΔgG′° (kcal mol−1) | (ΔgG′°)err (kcal mol−1) | Search priority
−Cl (attached to a primary carbon with two other Cl atoms attached) | −5.55 | 0.293 | 1
−S−OH | 32.4 | 3.42 | 1
−CO−OPO₃²⁻ | −298 | 0.239 | 1
=N⁺< (participating in two fused rings) | 3.77 | 1.27 | 1
−O−CO− (participating in a ring) | −71.0 | 0.787 | 1
≡C− | 41.6 | 2.32 | 1
>CH− (participating in two fused rings) | 2.6 | 0.779 | 1
−Cl (attached to a primary carbon with one other Cl atom attached) | −8.54 | 0.397 | 2
−S− (participating in a ring) | 0.72 | 0.706 | 2
−O−PO₂⁻− (participating in a ring) | −190 | 0.957 | 2
>N− (participating in two fused rings) | 12.4 | 1.1 | 2
−O−CO− | −75.3 | 0.422 | 2
>C= (participating in two fused aromatic rings) | −0.0245 | 0.927 | 2
>C< (participating in two fused rings) | −3.89 | 3.03 | 2
−Cl (attached to a secondary carbon with one other Cl atom attached) | −7.18 | 0.448 | 3
−S⁺< | 21.9 | 2.05 | 3
−O−PO₃²⁻ | −254 | 0.159 | 3
=N⁺< (double bond and one single bond participating in a ring) | 13.5 | 0.672 | 3
>C=O (participating in a ring) | −30.1 | 0.292 | 3
>C= (participating in two fused nonaromatic rings) | 16.7 | 0.891 | 3
>CH₂ (participating in one ring) | 3.18 | 0.247 | 3
−Cl (attached to a primary carbon with no other Cl atoms attached) | −11.7 | 0.481 | 4
−OSO₃⁻ | −156 | 0.698 | 4
−O−PO₂²⁻ | −205 | 0.44 | 4
>NH₂⁺ | 5.95 | 0.9 | 4
−COO⁻ | −83.1 | 0.111 | 4
>C= (participating in an aromatic and a nonaromatic fused ring) | 6.77 | 0.607 | 4
>CH− (participating in one ring) | 4.84 | 0.216 | 4
−Cl (attached to a secondary carbon with no other Cl atoms attached) | −10.2 | 0.6 | 5
−SH | −0.740 | 0.636 | 5
−O−PO₂⁻−O− | −234 | 0.438 | 5
>NH | 10.5 | 0.515 | 5
−CH=O | −30.4 | 0.164 | 5
=CH− (participating in one aromatic ring) | 4.93 | 0.142 | 5
>C< (participating in one ring) | 7.17 | 0.42 | 5
−Cl (attached to a tertiary carbon with no other Cl atoms attached) | −7.38 | 0.422 | 6
−S⁻ | 12.7 | 2.85 | 6
−O−PO₂⁻− | −208 | 0.122 | 6
>NH (participating in a ring) | 6.18 | 0.532 | 6
>C=O | −28.4 | 0.18 | 6
>C= (a single bond and a double bond participating in an aromatic ring) | 6.95 | 0.313 | 6
−CH₃ | −3.65 | 0.109 | 6
−Br (attached to an aromatic ring) | 2.5 | 1.26 | 7
… | 5.69 | 1.2 | 7
>N− (participating in a ring) | 22.1 | 0.617 | 7
−O− (participating in a ring) | −36.6 | 0.902 | 7
>C= (two single bonds participating in a nonaromatic ring) | 32.1 | 2.14 | 7
>CH₂ | 1.62 | 0.088 | 7
−I (attached to an aromatic ring) | 16.6 | 1.26 | 8
−S− | 8.77 | 0.74 | 8
=NH⁺− (participating in a ring) | 4.37 | 1.04 | 8
−OH | −41.5 | 0.126 | 8
=CH− (participating in a nonaromatic ring) | 8.46 | 0.293 | 8
>CH− | 5.08 | 0.153 | 8
−F (attached to an aromatic ring) | −43.0 | 1.26 | 9
=N− (participating in a ring) | 4.17 | 0.572 | 9
−O⁻ | −32.8 | 0.934 | 9
>C= (a double bond and a single bond participating in a ring) | 11.7 | 0.362 | 9
>C< | 7.12 | 0.298 | 9
−S−S− | … | … | …
=NH₂⁺ | −22.7 | 1.34 | 10
… | −23.2 | 0.408 | 10
≡CH | 60.7 | 4.74 | 10
=NH | −21.7 | 1.52 | 11
=CH− | 12.8 | 0.242 | 11
−NH₃⁺ | −6.25 | 0.196 | 12
=CH₂ | 6.87 | 0.312 | 12
−NH₂ | 2.04 | 0.331 | 13
>C= | 15.7 | 0.394 | 13
… | −32.1 | 4.34 | 14
>NH⁺− | 15.5 | 1.17 | 15
>N− | 24.4 | 1.14 | 16
=N− | 16.1 | 3.16 | 17
>N⁺ | … | … | …
(a) D-glucose group decomposition
Groups | Num. | ΔgG′° (kcal mol−1) | Num. × ΔgG′° (kcal mol−1)
>CH− (in a ring) | 5 | 4.84 | 24.20
−O− (in a ring) | 1 | −36.60 | −36.60
>CH₂ | 1 | 1.62 | 1.62
−OH | 5 | −41.50 | −207.50
ΔfG′°GLC = −218.28 kcal mol−1

(b) ADP group decomposition
Groups | Num. | ΔgG′° (kcal mol−1) | Num. × ΔgG′° (kcal mol−1)
−O−PO₂⁻− (phosphate chain) | 1 | −208.00 | −208.00
−O−PO₃⁻ (phosphate chain) | 1 | −254.00 | −254.00
>C= (in two non-arom. rings) | 2 | 16.70 | 33.40
>CH− (in a ring) | 4 | 4.84 | 19.36
−CH= (in a non-arom. ring) | 2 | 8.46 | 16.92
>CH₂ | 1 | 1.62 | 1.62
−O− (in a ring) | 1 | −36.60 | −36.60
−OH | 2 | −41.50 | −83.00
>C= (in a non-arom. ring*) | 1 | 11.70 | 11.70
=N− (in a ring) | 3 | 4.17 | 12.51
−NH₂ | 1 | 2.04 | 2.04
>N− | 1 | 22.10 | 22.10
Heteroaromatic ring (interaction group) | 2 | −1.95 | −3.90
ΔfG′°ADP = −465.85 kcal mol−1

Figure 7.1 Application of the group contribution method to metabolic compounds. (a) Group decomposition of D-glucose according to the group contribution method by Jankowski et al. (b) Group decomposition of ADP according to the group contribution method by Jankowski et al. Source: Data from Jankowski et al. [15].

Debye–Hückel approximation [19]:

$$\left(\Delta_f G'^{\circ}_s\right)_{\mathrm{pH},I,T} = \Delta_f G'^{\circ}_s + N_{H,s}\,RT\ln(10)\,\mathrm{pH} - RT\ln(10)\,\frac{A\,\left(z_s^2 - N_{H,s}\right)\sqrt{I}}{1 + \alpha_s B\sqrt{I}}$$

where NH,s is the number of hydrogen atoms in isomer s, T denotes the absolute temperature, zs is the charge of the isomer, αs is the effective hydrated diameter of the isomer, I is the ionic strength, A and B are temperature-dependent parameters, and R denotes the universal gas constant. The parameters A and B are derived from the extended Debye–Hückel limiting law (Table 7.3). Alberty recommends an effective hydrated diameter of αs = 4.87 Å, which gives reasonable corrections for ionic strengths between 0.05 and 0.25 M [19]. Because the Jankowski GCM estimates ΔfG′°s at standard conditions, the Debye–Hückel correction is performed using the parameters for 25 °C.

Table 7.3 Parameters A and B calculated for different temperatures according to the extended Debye–Hückel limiting law [20].

Temperature (°C) | A (L1/2 mol−1/2) | B (L1/2 mol−1/2 Å−1)
20 | 0.5046 | 0.3276
25 | 0.5092 | 0.3286
30 | 0.5141 | 0.3297
40 | 0.5241 | 0.3318

Source: Data from Wright [20].

7.2.1.3

Compensating the Free Energy of Formation for Isomer Distributions

So far, the Gibbs free energy of formation of the principal isomer has been corrected for nonstandard pH and ionic strength. However, changes in pH may also change the dissociation state of a compound. For example, if a weak acid is introduced into a solution whose pH is higher than its pKa, some of the molecules dissociate into their conjugate base and a hydrogen ion. This continues until an equilibrium distribution of associated and dissociated species is reached. If the compound can donate multiple protons, each proton dissociation step has its own acid dissociation constant Kai:

$$[AH_n]^{z_i+1} \rightleftharpoons [AH_{n-1}]^{z_i} + H^+ \qquad K_{ai}$$

A compound can therefore exist in several dissociation states at the same time, and the abundance of each state depends on the pH and on the acid dissociation constants. To account for the apparent free energy of the isomer mixture that originates from a single compound in its different dissociation states, the Gibbs free energy of a pseudoisomer (σ) is introduced. The standard physiological Gibbs free energy of formation of the pseudoisomer, (ΔfG′°j)pH,I,T,σ, describes the overall chemical potential of the pool of dissociated isomers. It is calculated from the individual physiological Gibbs free energy contributions (ΔfG′°s)pH,I,T of each of the Nσ isomer species s at a given pH and ionic strength:

$$\left(\Delta_f G'^{\circ}_j\right)_{\mathrm{pH},I,T,\sigma} = -RT\ln\left(\sum_{s=1}^{N_\sigma}\exp\left(-\frac{\left(\Delta_f G'^{\circ}_s\right)_{\mathrm{pH},I,T}}{RT}\right)\right)$$

Alberty and Goldberg [21] showed that this can also be written as

$$\left(\Delta_f G'^{\circ}_j\right)_{\mathrm{pH},I,T,\sigma} = \left(\Delta_f G'^{\circ}_1\right)_{\mathrm{pH},I,T} - RT\ln P$$

where (ΔfG′°1)pH,I,T is the standard transformed Gibbs energy of formation of the species with the smallest number of dissociable hydrogen atoms (which corresponds to the isomer with the smallest dissociation constant Ka1), and P is the binding polynomial:

$$P = 1 + \frac{[H^+]}{K_{a1}} + \frac{[H^+]^2}{K_{a1}K_{a2}} + \frac{[H^+]^3}{K_{a1}K_{a2}K_{a3}} + \dots = 1 + \sum_{k=1}^{N_\sigma - 1}\frac{[H^+]^k}{\prod_{i=1}^{k} K_{ai}}$$

Here, [H+] = 10−pH is the hydrogen ion concentration defined by the pH, and the Kai are the successive acid dissociation constants, ordered from smallest to largest. Because equilibrium constants depend on physiological conditions, the individual acid dissociation constants are corrected for ionic strength using the Debye–Hückel approximation introduced above:

$$\ln K_{ai}(I) = \ln K_{ai}(I=0) - \ln(10)\,\frac{A\sqrt{I}\,\left(1 - (z_i - 1)^2 + z_i^2\right)}{1 + \alpha_i B\sqrt{I}}$$

where zi is the charge of the dissociated isomer. With Ka1 as the smallest acid dissociation constant (i.e. the highest pKai value), the charge of the dissociated isomer of step i is zi = z1 + i − 1, where z1 is the charge of the least protonated isomer. The correction for Kai can therefore also be calculated as follows:

$$\ln K_{ai}(I) = \ln K_{ai}(I=0) - \ln(10)\,\frac{A\sqrt{I}\,\left(1 - (z_1 + i - 2)^2 + (z_1 + i - 1)^2\right)}{1 + \alpha_i B\sqrt{I}}$$

The corrected acid dissociation constants Kai(I) are used to calculate the corrected binding polynomial PI. The ionic strength correction is applied to the species with the lowest number of hydrogen atoms, (ΔfG′°1)pH,I,T. The physiological Gibbs free energy of formation of each pseudoisomer is then calculated as follows:

$$\left(\Delta_f G'^{\circ}_j\right)_{\mathrm{pH},I,T,\sigma} = \left(\Delta_f G'^{\circ}_1\right)_{\mathrm{pH},I,T} - RT\ln(P_I)$$
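The Debye–Hückel correction above and the isomer-distribution correction of Section 7.2.1.3 chain together naturally. A small Python sketch can therefore apply them in sequence: correct one isomer, correct the Kai for ionic strength, build the binding polynomial, and obtain the pseudoisomer energy. The sketch assumes the 25 °C parameters of Table 7.3, αs = αi = 4.87 Å, R = 1.986 × 10−3 kcal (K ⋅ mol)−1, and T = 298.15 K. The function names are hypothetical. The usage lines plug in the glucose 6-phosphate inputs used in Example 7.1.

```python
import math

R = 1.986e-3            # universal gas constant, kcal/(K*mol)
T = 298.15              # temperature, K
A, B = 0.5092, 0.3286   # Debye-Hueckel parameters at 25 degC (Table 7.3)
ALPHA = 4.87            # effective hydrated diameter, angstrom
LN10 = math.log(10)


def debye_factor(ionic_strength, alpha=ALPHA):
    """A*sqrt(I) / (1 + alpha*B*sqrt(I)), shared by the corrections below."""
    s = math.sqrt(ionic_strength)
    return A * s / (1 + alpha * B * s)


def isomer_energy(dfg0, n_h, charge, ph, ionic_strength):
    """(Delta_f G'0_s)_{pH,I,T}: pH and ionic-strength correction of a single isomer."""
    return (dfg0 + n_h * R * T * LN10 * ph
            - R * T * LN10 * (charge ** 2 - n_h) * debye_factor(ionic_strength))


def corrected_ka(ka0, charge, ionic_strength):
    """K_ai(I) from K_ai(I=0); charge is z_i, the charge of the dissociated isomer."""
    ln_ka = math.log(ka0) - LN10 * (1 - (charge - 1) ** 2 + charge ** 2) * debye_factor(ionic_strength)
    return math.exp(ln_ka)


def binding_polynomial(kas, ph):
    """P = 1 + [H+]/K_a1 + [H+]^2/(K_a1 K_a2) + ...; kas ordered from smallest to largest."""
    h = 10 ** (-ph)
    p, term = 1.0, 1.0
    for ka in kas:
        term *= h / ka  # builds [H+]^k / (K_a1 ... K_ak)
        p += term
    return p


def pseudoisomer_energy(dfg_least_protonated, p):
    """(Delta_f G'0_j)_{pH,I,T,sigma} = (Delta_f G'0_1)_{pH,I,T} - RT ln(P_I)."""
    return dfg_least_protonated - R * T * math.log(p)


# Glucose 6-phosphate in the E. coli cytosol (pH 7.5, I = 0.25 M), inputs from Example 7.1
dfg1 = isomer_energy(-430.78, n_h=11, charge=-2, ph=7.5, ionic_strength=0.25)
ka1 = corrected_ka(5.62e-7, charge=-2, ionic_strength=0.25)
ka2 = corrected_ka(6.03e-2, charge=-1, ionic_strength=0.25)
p = binding_polynomial([ka1, ka2], ph=7.5)
print(round(dfg1, 2), f"{ka1:.2e}", f"{ka2:.2e}", round(p, 3),
      round(pseudoisomer_energy(dfg1, p), 2))
# -316.95 2.07e-06 1.16e-01 1.015 -316.96
```

Building the polynomial term by term avoids recomputing the products Ka1 ⋯ Kak. It also works unchanged for compounds with more dissociation steps, such as ATP.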

7.2.1.4

Computing the Transformed Free Energies of Reaction

Finally, the transformed Gibbs free energy of each reaction, (ΔrG′°i)pH,I,T,Δψ, is computed from three inputs: the reaction stoichiometry nij, the transformed Gibbs free energies of the reactants (ΔfG′°j)pH,I,T,σ, and the Gibbs free energy required to transport ions (ΔrG′°i,tpt)pH,T,Δψ [5, 22]:

$$\left(\Delta_r G'^{\circ}_i\right)_{\mathrm{pH},I,T,\Delta\psi} = \sum_{j=1}^{M} n_{ij}\left(\Delta_f G'^{\circ}_j\right)_{\mathrm{pH},I,T,\sigma} + \left(\Delta_r G'^{\circ}_{i,tpt}\right)_{\mathrm{pH},T,\Delta\psi}$$

Here, (ΔrG′°i,tpt)pH,T,Δψ accounts for the Gibbs free energy contributions from proton and charge transport [5, 22]. These contributions originate from differences in electrochemical potential and in pH for ions transported across a membrane:

$$\left(\Delta_r G'^{\circ}_{i,tpt}\right)_{\mathrm{pH},T,\Delta\psi} = z_{i,tpt}\,F\,\Delta\psi - N_{H,tpt}\ln(10)\,RT\,\Delta\mathrm{pH}$$

In this expression, zi,tpt is the charge transported across the membrane, F is the Faraday constant, Δψ is the electrical potential across the membrane, NH,tpt is the number of hydrogen atoms transported across the membrane, and ΔpH is the difference in pH between both compartments [5, 22]. For non-transport reactions, the term (ΔrG′°i,tpt)pH,T,Δψ is zero, because Δψ = 0 and ΔpH = 0 when the reaction occurs in a single compartment. The physiological Gibbs free energies of formation of the reactants (ΔfG′°j)pH,I,T,σ already account for the chemical potential of protons at a given pH. Therefore, the proton contribution to (ΔrG′°i)pH,I,T,Δψ is taken as (ΔfG′°H+)pH,I,T,σ = 0 [19].
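The two formulas above can be sketched in Python as follows. The sketch sums nij (ΔfG′°j)pH,I,T,σ over the reactants and adds the transport term zi,tpt FΔψ − NH,tpt ln(10) RT ΔpH. Protons are assigned a formation energy of zero, as stated above. The function names and metabolite keys are hypothetical. The numbers in the usage lines are those of Examples 7.1 and 7.2 (hexokinase and glucose/proton transport in E. coli), with F = 23.06 kcal (V ⋅ mol)−1.

```python
import math

R = 1.986e-3   # universal gas constant, kcal/(K*mol)
T = 298.15     # temperature, K
F = 23.06      # Faraday constant, kcal/(V*mol)


def transport_term(z_tpt, n_h_tpt, delta_psi, delta_ph):
    """z_tpt*F*delta_psi - N_H,tpt*ln(10)*R*T*delta_pH; zero within a single compartment."""
    return z_tpt * F * delta_psi - n_h_tpt * math.log(10) * R * T * delta_ph


def reaction_energy(stoich, dfg, dg_transport=0.0):
    """sum_j n_ij * (Delta_f G'0_j) plus the transport contribution, in kcal/mol."""
    return sum(n * dfg[m] for m, n in stoich.items()) + dg_transport


# Pseudoisomer formation energies (kcal/mol) from Examples 7.1 and 7.2; protons count as zero
dfg = {"glc_c": -93.27, "glc_p": -101.45, "atp_c": -541.84, "g6p_c": -316.96,
       "adp_c": -332.52, "h_c": 0.0, "h_p": 0.0}

# Hexokinase: D-glucose + ATP -> D-glucose 6-phosphate + ADP + H+ (single compartment)
hex_stoich = {"glc_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}
print(round(reaction_energy(hex_stoich, dfg), 2))  # -14.37

# GLCt2: glucose + H+ from periplasm (pH 7) to cytosol (pH 7.5), delta_psi = -0.15 V
tpt = transport_term(z_tpt=1, n_h_tpt=13, delta_psi=-0.15, delta_ph=7.5 - 7.0)
glct2_stoich = {"glc_p": -1, "h_p": -1, "glc_c": 1, "h_c": 1}
print(round(tpt, 2), round(reaction_energy(glct2_stoich, dfg, tpt), 2))  # -12.32 -4.14
```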


Example 7.1 Thermodynamic curation of the hexokinase reaction in E. coli.
In the first step of glycolysis, the reaction catalyzed by the cytosolic enzyme hexokinase (HEX) converts glucose (GLC) into glucose 6-phosphate (G6P), with ATP and ADP as cofactors:

D-glucose + ATP → D-glucose 6-phosphate + ADP + H+   (HEX)

To obtain the thermodynamic information associated with this reaction, the chemical structures of the four reactants are first retrieved. Next, the predominant forms of these compounds at pH = 7 are identified using all pKa values within a physiologically relevant range of 0 < pKa < 11. For glucose 6-phosphate, there are two physiologically relevant pKa values. They correspond to three dissociation states that differ in the protonation of the phosphate group:

G6P[3] (−OPO₃H₂, charge 0) ⇌ G6P[2] (−OPO₃H⁻, charge −1) + H+   Ka2 = 6.03 × 10−2
G6P[2] (−OPO₃H⁻, charge −1) ⇌ G6P[1] (−OPO₃²⁻, charge −2) + H+   Ka1 = 5.62 × 10−7

Based on this information, the predominant form of glucose 6-phosphate at pH = 7 is the fully dissociated state G6P[1] (corresponding, in this case, to the lowest Kai). In the same way, the following pKai values (pKa = −log10(Ka)) were identified for the species participating in the reaction:

Compound | pKa values | Ka values
D-glucose | – | –
Glucose 6-phosphate | 1.22; 6.25 | 6.03 × 10−2; 5.62 × 10−7
ATP | 0.90; 1.55; 3.29; 7.42 | 1.26 × 10−1; 2.82 × 10−2; 5.13 × 10−4; 3.80 × 10−8
ADP | 1.77; 2.22; 7.42 | 1.70 × 10−2; 6.00 × 10−3; 3.80 × 10−8
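To identify the predominant form from a list of pKa values, compare the relative abundances of the successive dissociation states. The Python sketch below assumes ideal behavior (activity effects ignored) and pKa values sorted in increasing order. species_fractions is a hypothetical helper, and the inputs are the glucose 6-phosphate pKa values from the table above.

```python
def species_fractions(pkas, ph):
    """Fractions of each dissociation state, from the fully protonated form (index 0)
    to the fully dissociated form (last index), assuming ideal solutions."""
    weights = [1.0]
    for pka in sorted(pkas):
        # each dissociation step multiplies the abundance by 10**(pH - pKa)
        weights.append(weights[-1] * 10 ** (ph - pka))
    total = sum(weights)
    return [w / total for w in weights]


fractions = species_fractions([1.22, 6.25], ph=7.0)
print([round(f, 3) for f in fractions])  # [0.0, 0.151, 0.849]
```

Under these assumptions, the fully dissociated state (the last entry, G6P[1] in the notation above) accounts for about 85% of glucose 6-phosphate at pH 7. This is consistent with its identification as the predominant form.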

Once identified, the predominant forms at pH = 7 are used in the group contribution method to characterize the molecular and interaction groups of the individual compounds. The free energy contributions of the groups are then summed as illustrated in Figure 7.1. This yields the standard Gibbs free energies of formation of the four compounds:
ΔfG′°GLC = −218.28 kcal mol−1, ΔfG′°G6P = −430.78 kcal mol−1, ΔfG′°ATP = −673.85 kcal mol−1, ΔfG′°ADP = −465.85 kcal mol−1.


Following the workflow, the physiological Gibbs free energies of formation of the compounds and their isomers (corrected for physiological pH and ionic strength) are computed. For E. coli, a cytosolic pH of 7.5, a cytosolic ionic strength of 0.25 M, and a temperature of 298.15 K are considered. For glucose 6-phosphate, with the universal gas constant R = 1.986 × 10−3 kcal (K ⋅ mol)−1, the physiological Gibbs free energies of the three isomers are:

$$\left(\Delta_f G'^{\circ}_{G6P[1]}\right)_{\mathrm{pH},I,T} = -430.78 + 11\times R\times298.15\ln(10)\times7.5 - R\times298.15\ln(10)\,\frac{0.5092\left((-2)^2 - 11\right)\sqrt{0.25}}{1 + 4.87\times0.3286\sqrt{0.25}} = -316.95\ \mathrm{kcal\,mol^{-1}}$$

$$\left(\Delta_f G'^{\circ}_{G6P[2]}\right)_{\mathrm{pH},I,T} = -430.78 + 12\times R\times298.15\ln(10)\times7.5 - R\times298.15\ln(10)\,\frac{0.5092\left((-1)^2 - 12\right)\sqrt{0.25}}{1 + 4.87\times0.3286\sqrt{0.25}} = -305.95\ \mathrm{kcal\,mol^{-1}}$$

$$\left(\Delta_f G'^{\circ}_{G6P[3]}\right)_{\mathrm{pH},I,T} = -430.78 + 13\times R\times298.15\ln(10)\times7.5 - R\times298.15\ln(10)\,\frac{0.5092\left((0)^2 - 13\right)\sqrt{0.25}}{1 + 4.87\times0.3286\sqrt{0.25}} = -295.34\ \mathrm{kcal\,mol^{-1}}$$

Next, the Kai values are corrected for the physiological ionic strength:

$$\ln K_{a1}(I) = \ln\left(5.62\times10^{-7}\right) - \frac{0.5092\sqrt{0.25}\left(1 - \left((-2) - 1\right)^2 + (-2)^2\right)}{1 + 4.87\times0.3286\sqrt{0.25}}\ln(10); \qquad K_{a1}(I) = 2.07\times10^{-6}\ \mathrm{M}$$

$$\ln K_{a2}(I) = \ln\left(6.03\times10^{-2}\right) - \frac{0.5092\sqrt{0.25}\left(1 - \left((-1) - 1\right)^2 + (-1)^2\right)}{1 + 4.87\times0.3286\sqrt{0.25}}\ln(10); \qquad K_{a2}(I) = 1.16\times10^{-1}\ \mathrm{M}$$

The corrected Kai values are then used to compute the binding polynomial:

$$P = 1 + \frac{10^{-7.5}}{2.07\times10^{-6}} + \frac{10^{-2\times7.5}}{\left(2.07\times10^{-6}\right)\times\left(1.16\times10^{-1}\right)} = 1.015$$

From (ΔfG′°G6P[1])pH,I,T and the binding polynomial, the physiological Gibbs free energy of formation of the D-glucose 6-phosphate pseudoisomer follows:

$$\left(\Delta_f G'^{\circ}_{G6P}\right)_{\mathrm{pH},I,T,\sigma} = -316.95 - R\times298.15\ln(1.015) = -316.96\ \mathrm{kcal\,mol^{-1}}$$


Similarly, the pseudoisomer energies (ΔfG′°j)pH,I,T,σ of the other compounds are computed, giving
(ΔfG′°GLC)pH,I,T,σ = −93.27 kcal mol−1; (ΔfG′°ATP)pH,I,T,σ = −541.84 kcal mol−1; (ΔfG′°ADP)pH,I,T,σ = −332.52 kcal mol−1.

With the physiological Gibbs free energies of formation of all the compounds known, the physiological standard Gibbs free energy of the reaction is computed as follows:

$$\left(\Delta_r G'^{\circ}_{HEX}\right)_{\mathrm{pH},I,T,\Delta\psi} = n_{HEX,GLC}\left(\Delta_f G'^{\circ}_{GLC}\right)_{\mathrm{pH},I,T,\sigma} + n_{HEX,ATP}\left(\Delta_f G'^{\circ}_{ATP}\right)_{\mathrm{pH},I,T,\sigma} + n_{HEX,G6P}\left(\Delta_f G'^{\circ}_{G6P}\right)_{\mathrm{pH},I,T,\sigma} + n_{HEX,ADP}\left(\Delta_f G'^{\circ}_{ADP}\right)_{\mathrm{pH},I,T,\sigma} + n_{HEX,H}\left(\Delta_f G'^{\circ}_{H^+}\right)_{\mathrm{pH},I,T,\sigma}$$

$$\left(\Delta_r G'^{\circ}_{HEX}\right)_{\mathrm{pH},I,T,\Delta\psi} = (-1)(-93.27) + (-1)(-541.84) + 1(-316.96) + 1(-332.52) + 1(0) = -14.37\ \mathrm{kcal\,mol^{-1}}$$

As stated in the main text, the Gibbs free energy of formation of the protons is not explicitly considered when computing the Gibbs free energy of the reaction. Their chemical potential is already absorbed in the physiological Gibbs free energies of formation of the reactants.

Example 7.2 Thermodynamic curation of D-glucose/proton transport in E. coli.
As another example, the Gibbs free energy of a transport reaction is computed, here the transport of D-glucose from the periplasm to the cytosol of E. coli. In this reaction (GLCt2), D-glucose is transported together with a proton moving in the same direction:

D-glucose (periplasm) + H+ (periplasm) → D-glucose (cytosol) + H+ (cytosol)   (GLCt2)

The previous example showed that the physiological Gibbs free energy of formation of cytosolic glucose is (ΔfG′°GLCc)pH,I,T,σ = −93.27 kcal mol−1. Following the same procedure with a periplasmic pH of 7 and an ionic strength of 0.25 M, the physiological Gibbs free energy of formation of periplasmic glucose is (ΔfG′°GLCp)pH,I,T,σ = −101.45 kcal mol−1.


Next, the correction for the membrane potential is computed, assuming a membrane potential of the periplasm with respect to the cytosol of −0.15 V, as in the case of E. coli. The Faraday constant and the universal gas constant are F = 23.06 kcal (V ⋅ mol)−1 and R = 1.986 × 10−3 kcal (K ⋅ mol)−1, respectively. One unit of charge (zi,tpt = 1) and 13 hydrogen atoms (NH,tpt = 13: 12 from D-glucose and 1 from the cotransported proton) cross the membrane:

$$\left(\Delta_r G'^{\circ}_{i,tpt}\right)_{\mathrm{pH},T,\Delta\psi} = 1\times F\times(-0.15) - 13\times\ln(10)\times R\times298.15\times(7.5 - 7) = -12.32\ \mathrm{kcal\,mol^{-1}}$$

Finally, the physiological Gibbs free energy of the reaction, corrected for the membrane potential between both compartments, is:

$$\left(\Delta_r G'^{\circ}_{i}\right)_{\mathrm{pH},I,T,\Delta\psi} = 101.45 - 93.27 - 12.32 = -4.14\ \mathrm{kcal\,mol^{-1}}$$

7.2.2

Mathematical Formulation

Considering a network with j = 1, …, M metabolites and i = 1, …, N reactions, TFA is formulated as a mixed-integer linear program (MILP) that extends the traditional FBA formulation. First, each flux variable is split into a non-negative forward flux variable $v_i^F$ and a non-negative backward flux variable $v_i^B$, which account for the two directionalities separately; the net flux is $v_i = v_i^F - v_i^B$. Then, each directional flux variable is coupled to a binary variable ($z_i^F$ or $z_i^B$), so that a directional flux can be non-zero only when its binary variable equals one (Eq. 7.3), where K is a sufficiently large constant. Moreover, an additional constraint prevents both directions from being active simultaneously (Eq. 7.4). Subsequently, the constraints describing the Gibbs free energy are introduced. TFA computes the Gibbs free energy of each reaction as a function of the physiological standard Gibbs free energy of reaction, the logarithmic concentrations of the reactants $\ln[X_j]$, and the temperature T (Eq. 7.5). The physiological standard Gibbs free energy of reaction $(\Delta_r G_i'^{\circ})_{\mathrm{pH},I,T,\Delta\psi}$ accounts, on the one hand, for changes in metabolite activities due to the physicochemical properties of the cell, such as pH, ionic strength, and temperature, and, on the other hand, for additional thermodynamic driving forces in transport reactions, such as membrane potentials. Finally, the Gibbs free energy variable $\Delta_r G_i'$ of each reaction is coupled to the corresponding directionality binary variables ($z_i^F$, $z_i^B$): TFA introduces constraints that allow a binary variable to equal one only if the corresponding direction respects the second law of thermodynamics (Eq. 7.6). The resulting MILP therefore admits only stoichiometrically and thermodynamically feasible reaction directionalities together with thermodynamically feasible concentration ranges.

Objective

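$$\min/\max \quad \mathbf{c}^{T}\mathbf{v}$$

subject to

FBA constraints

Mass balance
$$\mathbf{S}\,\mathbf{v} = \mathbf{0} \tag{7.1}$$

Flux capacity
$$0 \le v_i^{F} \le \overline{v}_i^{F}, \qquad 0 \le v_i^{B} \le \overline{v}_i^{B} \tag{7.2}$$

TFA constraints

Flux coupling
$$v_i^{F} - z_i^{F}K \le 0, \qquad v_i^{B} - z_i^{B}K \le 0 \tag{7.3}$$

Directionality coupling
$$z_i^{F} + z_i^{B} \le 1 \tag{7.4}$$

Gibbs free energy of reaction
$$\Delta_r G_i' = \left(\Delta_r G_i'^{\circ}\right)_{\mathrm{pH},I,T,\Delta\psi} + RT\sum_{j=1}^{M} n_{ij}\ln[X_j] \tag{7.5}$$

Thermodynamic feasibility
$$\Delta_r G_i' + z_i^{F}K - K < 0, \qquad \Delta_r G_i' - z_i^{B}K + K > 0 \tag{7.6}$$

Concentration ranges
$$\underline{\ln[X_j]} \le \ln[X_j] \le \overline{\ln[X_j]} \tag{7.7}$$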
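Because the directionality binaries are tied to the Gibbs free energy only through Eqs. 7.5 to 7.7, the directions a single reaction may take can be read off from the range its $\Delta_r G_i'$ can span. The Python sketch below illustrates this for one reaction in isolation; it is not the full MILP, which couples all reactions through shared concentration variables and is solved with a MILP solver. The function names, the constant K = 1000, and the concentration bounds of 10 µM to 20 mM are assumptions chosen for illustration, not TFA defaults.

```python
import math

R = 1.986e-3  # kcal/(K*mol)
T = 298.15    # K


def drg_range(drg0, stoich, ln_bounds):
    """Smallest and largest value of Eq. 7.5 when every ln[X_j] stays within Eq. 7.7."""
    lo = hi = drg0
    for met, n in stoich.items():
        ln_lo, ln_hi = ln_bounds[met]
        a, b = R * T * n * ln_lo, R * T * n * ln_hi
        lo += min(a, b)
        hi += max(a, b)
    return lo, hi


def allowed_directions(drg0, stoich, ln_bounds):
    """Directions whose binary could equal 1 under Eq. 7.6 (this reaction alone)."""
    lo, hi = drg_range(drg0, stoich, ln_bounds)
    directions = []
    if lo < 0:
        directions.append("forward")   # z^F = 1 requires some Delta_r G' < 0
    if hi > 0:
        directions.append("backward")  # z^B = 1 requires some Delta_r G' > 0
    return directions


def satisfies_tfa(v_f, v_b, z_f, z_b, drg, big_k=1000.0):
    """Check non-negativity (Eq. 7.2) and Eqs. 7.3, 7.4, 7.6 at a candidate point."""
    return (
        v_f >= 0 and v_b >= 0
        and v_f - z_f * big_k <= 0 and v_b - z_b * big_k <= 0  # Eq. 7.3
        and z_f + z_b <= 1                                     # Eq. 7.4
        and drg + z_f * big_k - big_k < 0                      # Eq. 7.6, forward
        and drg - z_b * big_k + big_k > 0                      # Eq. 7.6, backward
    )


# Hexokinase with (Delta_r G'°) = -14.37 kcal/mol from Example 7.1; protons omitted.
# The bounds (10 uM to 20 mM) are illustrative only.
bounds = {m: (math.log(1e-5), math.log(2e-2)) for m in ("glc", "atp", "g6p", "adp")}
hex_stoich = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}
print(drg_range(-14.37, hex_stoich, bounds))
print(allowed_directions(-14.37, hex_stoich, bounds))  # ['forward']
```

With these illustrative bounds, $\Delta_r G'$ of hexokinase stays between about −23.4 and −5.4 kcal mol−1, so only the forward binary variable can take the value one.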

In recent years, software packages have been developed to build and solve TFA models in an integrated way. In particular, matTFA and pyTFA [23] are a MATLAB toolbox and a Python package, respectively, that provide the tools to extend the FBA formulation of a metabolic model into a TFA formulation and then solve the resulting optimization problem with MILP solvers such as the GNU Linear Programming Kit (GLPK), CPLEX, or Gurobi.

7.3 Thermodynamics-Based Flux Analysis Applications

The thermodynamic constraints implemented in TFA expand the analytic and predictive capabilities of FBA by introducing reaction directionalities, Gibbs free energies of reactions, and logarithmic metabolite concentrations as variables of the problem. These additional variables can then be used directly to integrate experimental data such as metabolite concentrations and reaction rates. The following sections provide examples of how TFA can be used to integrate metabolomics data, and of how imposing the desired directionalities of production pathways allows thermodynamically feasible titer concentrations to be inferred.

7.3.1 Constraining the Flux Space with Metabolomics Data

TFA allows metabolomics data to be integrated as bounds on the concentration variables. To this end, the upper and lower bounds of the logarithmic concentration variables are defined according to the variability of the experimental measurements. The following example demonstrates how metabolomics data can be used to gain insight into the physiologically and thermodynamically feasible flux distributions of E. coli. In this context, the E. coli genome-scale metabolic model iJO1366 [24] is thermodynamically curated to account for the Gibbs free energies of formation of the compounds and the Gibbs free energies of reaction. The thermodynamic model can then be used to integrate metabolomics data. As an example, metabolomics data from Park et al. [25] for the compounds in central carbon metabolism are used to constrain the logarithmic concentration variables according to the 95% confidence intervals of the concentration measurements (Table 7.4). The remaining logarithmic concentration variables are subject to the default physiological TFA bounds [26]. Then, the thermodynamically curated iJO1366 is optimized for growth under aerobic conditions, with glucose as the only carbon source, both with and without constrained concentrations. To analyze the effect of constraining the concentrations on the thermodynamically feasible flux ranges, a thermodynamic flux variability analysis is performed for maximal growth with and without constrained concentrations. The analysis illustrates how the integration of metabolomics data constrains the flux space (Figure 7.2): when measured metabolite concentrations are integrated, four reactions within the TCA cycle (fumarase, isocitrate dehydrogenase, aconitase A, and aconitase B) have fixed directionalities, indicating a cyclically operating TCA cycle (Figure 7.3). Note that these four reactions were able to carry flux in both directions before the metabolomics data were integrated; in this case, the mere integration of concentration data in combination with the thermodynamic constraints unveils additional phenotypic information. The analysis further indicates that integrating metabolomics data reduced the flux flexibility of most reactions in central carbon metabolism (Figure 7.2). Considering that the integrated metabolomics data only sparsely covered glycolysis and the pentose phosphate pathway, these results demonstrate that even partial metabolomics data can be used in TFA to better understand the phenotypic traits of the studied E. coli strain.

7.3.2 Characterizing the Feasible Concentration Space

Additionally, TFA allows investigating the opposite scenario, in which some of the directionalities are known and the thermodynamically feasible concentrations are unknown. This analysis helps to design novel production strains, particularly when heterologous enzymes are introduced to produce compounds that are not native to the organism. As such non-native compounds may be toxic to the organism, it is helpful to assess which concentrations will drive the reactions in the required directions while avoiding high concentrations of undesirable compounds. TFA enables the feasibility of different strain designs to be compared by calculating the thermodynamically feasible concentration ranges for the product of interest and for the intermediates of the production pathway. In the following, this is demonstrated by assessing the thermodynamically feasible concentration ranges when ethanol production is demanded under optimal growth. To this end, the model iJO1366 is again optimized for growth under aerobic conditions with glucose as the only carbon source. Subsequently, a variability analysis of the logarithmic concentration variables is performed with the model constrained to (i) maximal aerobic growth and (ii) maximal aerobic growth together with maximal ethanol production at maximal growth (Figure 7.4). The results show that three metabolites in the TCA cycle (acetyl-CoA, succinate, and oxaloacetate) and two metabolites involved in glycolysis (dihydroxyacetone phosphate and glyceraldehyde 3-phosphate) are constrained to smaller concentration ranges when maximal ethanol production is forced. Specifically, for maximal ethanol production, acetyl-CoA, dihydroxyacetone


Table 7.4 Metabolomics data from Park et al. [25] integrated into the iJO1366 model.

Metabolite

BiGG ID

2-Phosphoglycerate

2pg

6-Phospho-d-gluconate

6pgc

Mean (M)

LB (M)

9.18 × 10−5

3.81 × 10−5

3.22 × 10−4

−3

−3

3.85 × 10−3

−5

3.77 × 10

3.69 × 10

Acetoacetyl-CoA

aacoa

2.18 × 10

1.37 × 10

3.47 × 10−5

Acetyl-CoA

accoa

6.06 × 10−4

5.29 × 10−4

6.94 × 10−4

−5

−5

1.88 × 10−5

−3

1.13 × 10−3

−6

Aconitate Acetylphosphate

acon-C actp

−5

UB (M)

1.61 × 10

−3

1.07 × 10

1.02 × 10

ADP-glucose

adpglc

4.27 × 10

2.83 × 10

6.44 × 10−6

a-Ketoglutarate

akg

4.43 × 10−4

3.12 × 10−4

6.31 × 10−4

−4

−4

2.84 × 10−4

−3

3.48 × 10−3

−5

7.54 × 10−5

−5

S-Adenosyl-l-methionine Citrate Carbon dioxide

amet cit co2

−6

1.38 × 10

1.84 × 10

−3

1.96 × 10

−5

7.52 × 10

1.10 × 10 5.02 × 10

Erythrose-4-phosphate

e4p

4.90 × 10

4.19 × 10

5.64 × 10−5

Fructose-6-phosphate

f6p

2.52 × 10−3

2.16 × 10−3

2.89 × 10−3

fad

−4

−5

3.19 × 10−4

−2

1.64 × 10−2

−5

FAD Fructose-1,6-bisphosphate

fdp

−5

1.19 × 10

1.73 × 10

−2

1.52 × 10

1.40 × 10

Flavin mononucleotide

fmn

5.37 × 10

3.84 × 10

7.51 × 10−5

Glucose-6-phosphate

g6p

7.88 × 10−3

7.59 × 10−3

8.17 × 10−3

−3

−4

3.08 × 10−3

−5

1.87 × 10−4

−6

Glycerate sn-Glycerol 3-phosphate

glyc-R glyc3p

−5

9.33 × 10

1.41 × 10

−5

4.90 × 10

1.29 × 10

Isocitrate

icit

3.67 × 10

4.68 × 10

4.29 × 10−5

Malate

mal-L

1.68 × 10−3

1.66 × 10−3

1.70 × 10−3

−5

−7

3.09 × 10−3

−4

1.61 × 10−4

−3

Malonyl-CoA Methionine

malcoa met-L

−5

6.44 × 10

3.54 × 10

−4

1.45 × 10

1.31 × 10

NAD+

nad

2.55 × 10

2.32 × 10

2.80 × 10−3

NADH

nadh

8.36 × 10−5

5.45 × 10−5

1.27 × 10−4

nadp

−6

−7

3.11 × 10−5

−4

1.34 × 10−4

−7

NADP+ NADPH

nadph

−3

4.05 × 10

2.08 × 10

−4

1.21 × 10

1.10 × 10

Oxaloacetate

oaa

4.87 × 10

2.81 × 10

8.55 × 10−7

Phosphoenolpyruvate

pep

1.84 × 10−4

1.46 × 10−4

2.31 × 10−4

−5

−5

1.61 × 10−4

−3

4.20 × 10−3

−4

8.36 × 10−4

−5

Phenylpyruvate Pyruvate Ribose-5-phosphate

phpyr pyr r5p

−7

1.40 × 10

8.98 × 10

−3

3.66 × 10

−4

7.87 × 10

3.13 × 10 7.86 × 10

Riboflavin

ribflv

1.90 × 10

1.72 × 10

2.11 × 10−5

Ribulose-5-phosphate

ru5p-D

1.12 × 10−4

1.12 × 10−4

1.27 × 10−4

−4

−4

9.24 × 10−4

−4

9.49 × 10−4

Sedoheptulose-7-phosphate Succinate

s7p succ

−5

5.01 × 10

8.82 × 10

−4

5.69 × 10

8.40 × 10 3.41 × 10

LB and UB denote the respective lower and upper bounds of the 95% confidence interval. Source: Data from Park et al. [25].


[Figure 7.2 panels: "Flux range without metabolomics" and "Flux range with metabolomics"; axis: flux (mmol (gDW)−1 h−1).]

Figure 7.2 Flux ranges in the central carbon metabolism of E. coli (iJO1366) calculated using thermodynamic flux variability analysis with and without integrating metabolomics data. FUM, fumarase; ICDHyr, isocitrate dehydrogenase; ACONTa, aconitase A; and ACONTb, aconitase B.

phosphate, and glyceraldehyde 3-phosphate are constrained to higher concentration ranges, whereas succinate and oxaloacetate are constrained to lower concentration ranges. A similar analysis can be used to compare different heterologous production pathways by assessing the thermodynamic and physiological feasibility of each strain design and by analyzing the concentration ranges that thermodynamics requires for the pathway intermediates.
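The core idea of this section, that forcing a reaction direction bounds the concentrations a pathway can tolerate, can be illustrated with Eq. 7.5 alone. The Python sketch below computes, for a reaction required to run forward, the largest product concentration at which $\Delta_r G'$ can still be negative while the other reactants may take any value within their bounds. It treats one reaction in isolation, unlike the genome-scale variability analysis above, and the reaction A → P, its standard Gibbs free energy of +2.0 kcal mol−1, and the bounds are hypothetical.

```python
import math

R = 1.986e-3  # kcal/(K*mol)
T = 298.15    # K


def max_product_conc(drg0, stoich, ln_bounds, product):
    """Largest [product] (M) for which Eq. 7.5 can still be negative (forward direction)."""
    n_p = stoich[product]
    if n_p <= 0:
        raise ValueError("product must have a positive stoichiometric coefficient")
    # Most favourable (smallest) contribution of all other reactants within their bounds.
    rest = 0.0
    for met, n in stoich.items():
        if met == product:
            continue
        ln_lo, ln_hi = ln_bounds[met]
        rest += min(n * ln_lo, n * ln_hi)
    # Delta_r G' < 0  <=>  n_p * ln[P] < -drg0 / (R*T) - rest
    ln_max = (-drg0 / (R * T) - rest) / n_p
    return math.exp(ln_max)


# Hypothetical reaction A -> P with a standard Gibbs free energy of +2.0 kcal/mol
# and [A] allowed between 10 uM and 20 mM.
x = max_product_conc(2.0, {"A": -1, "P": 1},
                     {"A": (math.log(1e-5), math.log(2e-2))}, "P")
print(f"{x:.2e}")  # 6.83e-04
```

Under these assumptions P cannot accumulate above roughly 0.68 mM if the step must run forward, unless [A] is allowed to exceed 20 mM; comparing such limits across candidate pathways is one way to read the "thermodynamically required concentration ranges" mentioned above.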

7.4 Conclusion and Future Perspectives

Thermodynamics enables metabolic engineers to use fundamental physical laws to gain insight into the native metabolism of a host organism and to evaluate potential strain designs with respect to their thermodynamic and physiological properties. In particular, the ability to directly integrate metabolomics data makes thermodynamics-based flux analysis (TFA) an effective tool for metabolic engineers. TFA has been used to investigate the metabolism of many different organisms, including E. coli [22], P. putida [27], S. cerevisiae [28], T. gondii [29], P. falciparum [30], and H. sapiens [31]. The authors also believe that it is important to consider thermodynamic feasibility in order to efficiently evaluate potential heterologous pathways [32, 33] (see Chapter 8). Recently, thermodynamic constraints have been introduced into metabolism and expression models, allowing information on thermodynamics, kinetics, and expression to be integrated into genome-scale metabolic models [34].

From this chapter, it is evident that TFA relies on methods that accurately estimate the standard Gibbs free energies of reactants under physiological conditions. Current methods are able to estimate standard Gibbs free energies in aqueous solution with reasonable accuracy. These aqueous conditions hold for most metabolites, as most of the cell volume is occupied by water. Nevertheless, some metabolites, such as lipids and fatty acids, form phase-separated structures such as membranes and droplets. These metabolites therefore do not exist in an aqueous state, and novel methods are needed to model the Gibbs free energy of such reactants comprehensively [12, 35]. Further, the free energy estimates assume a dilute aqueous solution. For most cells this is not the case, as about 20–40% of the cytosolic volume is occupied by macromolecules [36]. Recent work has shown that the crowded environment has additional effects on the Gibbs free energy [37, 38]. Thus, future work should address these shortcomings in order to converge toward a complete thermodynamic coverage of the biochemical reaction space that considers all relevant physicochemical properties of the cellular environment.

[Figure 7.3: network diagram with BiGG metabolite and reaction identifiers. Legend: concentrations without data; constraint integrated in the model; fluxes that keep directionality; fluxes with altered directionality.]

Figure 7.3 Reaction network of the central carbon metabolism of E. coli indicating the integrated metabolomics concentrations and the reaction fluxes with constrained directionality.

[Figure 7.4 panels: "Concentration range for maximal growth" and "Concentration range with maximal ethanol production and growth"; axis: logarithmic concentration [log10(M)].]

Figure 7.4 Thermodynamically feasible concentration ranges in glycolysis and the TCA cycle for optimal growth and ethanol production. accoa_c, acetyl-CoA; succ_c, succinate; oaa_c, oxaloacetate; dhap_c, dihydroxyacetone phosphate; and g3p_c, glyceraldehyde 3-phosphate.

References

1 Demirel, Y. and Sandler, S.I. (2002). Thermodynamics and bioenergetics. Biophys. Chem. 97: 87–111.
2 Demirel, Y. (2014). Nonequilibrium Thermodynamics: Transport and Rate Processes in Physical, Chemical and Biological Systems. Amsterdam: Elsevier.
3 Harris, D.A. (2009). Bioenergetics at a Glance: An Illustrated Introduction. Hoboken, NJ: Wiley.
4 Orth, J.D. et al. (2010). What is flux balance analysis? Nat. Biotechnol. 28 (3): 245–248.
5 Henry, C.S., Broadbelt, L.J., and Hatzimanikatis, V. (2007). Thermodynamics-based metabolic flux analysis. Biophys. J. 92: 1792–1805.
6 Ataman, M. and Hatzimanikatis, V. (2015). Heading in the right direction: thermodynamics-based network analysis and pathway engineering. Curr. Opin. Biotechnol. 36: 176–182.
7 Goldberg, R.N., Tewari, Y.B., and Bhat, T.N. (2004). Thermodynamics of enzyme-catalyzed reactions – a database for quantitative biochemistry. Bioinformatics 20: 2874–2877.
8 Wang, P. and Neumann, D.B. (1989). A database and retrieval system for the NBS tables of chemical thermodynamic properties. J. Chem. Inf. Comput. Sci. 29: 31–38.
9 Jorgensen, W.L. (1989). Free energy calculations: a breakthrough for modeling organic chemistry in solution. Acc. Chem. Res. 22: 184–189.
10 Hu, H. and Yang, W. (2008). Free energies of chemical reactions in solution and in enzymes with ab initio quantum mechanics/molecular mechanics methods. Annu. Rev. Phys. Chem. 59: 573–601.
11 Rizzo, R.C., Aynechi, T., Case, D.A., and Kuntz, I.D. (2006). Estimation of absolute free energies of hydration using continuum methods: accuracy of partial charge models and optimization of nonpolar contributions. J. Chem. Theory Comput. 2: 128–139.
12 Du, B., Zielinski, D.C., and Palsson, B.O. (2018). Estimating metabolic equilibrium constants: progress and future challenges. Trends Biochem. Sci. 43: 960–969.
13 Mavrovouniotis, M.L. (1990). Group contributions for estimating standard Gibbs energies of formation of biochemical-compounds in aqueous-solution. Biotechnol. Bioeng. 36: 1070–1082.
14 Mavrovouniotis, M.L. (1991). Estimation of standard Gibbs energy changes of biotransformations. J. Biol. Chem. 266: 14440–14445.
15 Jankowski, M.D., Henry, C.S., Broadbelt, L.J., and Hatzimanikatis, V. (2008). Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 95: 1487–1499.
16 Noor, E., Haraldsdottir, H.S., Milo, R., and Fleming, R.M.T. (2013). Consistent estimation of Gibbs energy using component contributions. PLoS Comput. Biol. 9 (7): e1003098.
17 Du, B., Zhang, Z., Grubner, S. et al. (2018). Temperature-dependent estimation of Gibbs energies using an updated group-contribution method. Biophys. J. 114: 2691–2702.
18 Hilal, S.H., Karickhoff, S.W., and Carreira, L.A. (1995). A rigorous test for SPARC’s chemical reactivity models: estimation of more than 4300 ionization pKas. Quant. Struct.-Act. Relatsh. 14: 348–355.
19 Alberty, R.A. (2005). Thermodynamics of Biochemical Reactions. Hoboken, NJ: Wiley.
20 Wright, M.R. (2007). An Introduction to Aqueous Electrolyte Solutions. Chichester, England; Hoboken, NJ: Wiley.
21 Alberty, R.A. and Goldberg, R.N. (1992). Standard thermodynamic formation properties for the adenosine 5’-triphosphate series. Biochemistry 31: 10610–10615.
22 Henry, C.S., Jankowski, M.D., Broadbelt, L.J., and Hatzimanikatis, V. (2006). Genome-scale thermodynamic analysis of Escherichia coli metabolism. Biophys. J. 90: 1453–1461.
23 Salvy, P., Fengos, G., Ataman, M. et al. (2019). pyTFA and matTFA: a Python package and a Matlab toolbox for thermodynamics-based flux analysis. Bioinformatics 35: 167–169.
24 Orth, J.D., Conrad, T.M., Na, J. et al. (2011). A comprehensive genome-scale reconstruction of Escherichia coli metabolism – 2011. Mol. Syst. Biol. 7: 535.
25 Park, J.O., Rubin, S.A., Xu, Y.-F. et al. (2016). Metabolite concentrations, fluxes and free energies imply efficient enzyme usage. Nat. Chem. Biol. 12: 482–489.
26 Soh, K.C. and Hatzimanikatis, V. (2014). Constraining the flux space using thermodynamics and integration of metabolomics data. Methods Mol. Biol. 1191: 49–63.
27 Tokic, M., Hatzimanikatis, V., and Miskovic, L. (2020). Large-scale kinetic metabolic models of Pseudomonas putida KT2440 for consistent design of metabolic engineering strategies. Biotechnol. Biofuels 13: 33.
28 Soh, K.C., Miskovic, L., and Hatzimanikatis, V. (2012). From network models to network responses: integration of thermodynamic and kinetic properties of yeast genome-scale metabolic networks. FEMS Yeast Res. 12: 129–143.
29 Tymoshenko, S., Oppenheim, R.D., Agren, R. et al. (2015). Metabolic needs and capabilities of Toxoplasma gondii through combined computational and experimental analysis. PLoS Comput. Biol. 11: e1004261.
30 Chiappino-Pepe, A., Tymoshenko, S., Ataman, M. et al. (2017). Bioenergetics-based modeling of Plasmodium falciparum metabolism reveals its essential genes, nutritional requirements, and thermodynamic bottlenecks. PLoS Comput. Biol. 13: e1005397.
31 Masid, M., Ataman, M., and Hatzimanikatis, V. (2020). Analysis of human metabolism by reducing the complexity of the genome-scale models using redHUMAN. Nat. Commun. 11: 2821.
32 Brunk, E., Neri, M., Tavernelli, I. et al. (2012). Integrating computational methods to retrofit enzymes to synthetic pathways. Biotechnol. Bioeng. 109: 572–582.
33 Hadadi, N. and Hatzimanikatis, V. (2015). Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways. Curr. Opin. Chem. Biol. 28: 99–104.
34 Salvy, P. and Hatzimanikatis, V. (2020). The ETFL formulation allows multi-omics integration in thermodynamics-compliant metabolism and expression models. Nat. Commun. 11: 1–17.
35 Panayiotou, C., Mastrogeorgopoulos, S., Ataman, M. et al. (2016). Molecular thermodynamics of metabolism: hydration quantities and the equation-of-state approach. Phys. Chem. Chem. Phys. 18: 32570–32592.
36 Ellis, R.J. and Minton, A.P. (2003). Cell biology: join the crowd. Nature 425: 27–28.
37 Angeles-Martinez, L. and Theodoropoulos, C. (2016). Estimation of flux distribution in metabolic networks accounting for thermodynamic constraints: the effect of equilibrium vs. blocked reactions. Biochem. Eng. J. 105: 347–357.
38 Weilandt, D.R. and Hatzimanikatis, V. (2019). Particle-based simulation reveals macromolecular crowding effects on the Michaelis-Menten mechanism. Biophys. J. 117: 355–368.

8 Pathway Design

Jasmin Hafner, Homa Mohammadi-Peyhani, and Vassily Hatzimanikatis

Laboratory of Computational Systems Biotechnology, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland

Definition

Metabolic pathway design is the process of finding a series of biochemical reactions that produce a target compound of interest from one or several precursor metabolites, together with enzymes capable of catalyzing these reactions. The reactions in the pathway can come from available biochemical resources, but they can also be predicted by reaction prediction tools. Network prediction and pathway search algorithms are employed to search the biochemical reaction space for possible pathways converting the precursor(s) into the target compound. Pathway design also aims to ensure that the proposed pathways are viable within the host organism, thereby minimizing experimental effort. To ensure feasibility of the pathway, a range of evaluation criteria can be employed, including stoichiometric and thermodynamic feasibility within the host organism and avoidance of toxic intermediates. A pathway passing all these criteria has an increased chance of successfully producing the target compound in the host organism.

8.1 De Novo Design of Metabolic Pathways

Metabolic engineering of an organism for the production of a certain target molecule starts with the conceptual design of its biosynthesis pathway. Here, we discuss the case of a typical biosynthetic pathway design problem. The described workflow can easily be modified and applied to the reverse problem, the design of a biodegradation pathway. In the biosynthesis problem discussed here, the compound to be produced is called the “target compound.” The target compound is synthesized from one (or several) starting metabolites, called the “precursor compound.” The precursor compound is either produced by the host organism or fed to the host organism as a substrate. The biosynthetic pathway is therefore defined as a sequence of reactions that convert the precursor compound into the final target compound. Any other compounds that are necessary to build the target compound are called either cofactors, if they are recycled by the host metabolism, or co-substrates, if they contribute atoms to the target compound and are therefore consumed by the pathway. Here, we assume that the host organism is already known. Different host organisms, their particular characteristics, and their applications are discussed in the second part of this book.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

8.1.1 Manual Versus Computational Design

The focus of this chapter is on computational pathway design. However, metabolic pathways can also be designed manually, relying on the intuition of the scientist. It is up to the scientist to decide which parts of the design can be done manually and which require a computational approach. In general, if the pathway leading to the target compound has been characterized previously in another organism, the easiest solution might be to rely on existing knowledge and to express the enzymes catalyzing each step in the host (heterologous pathway expression). For this, the scientist can consult biochemical databases to retrieve information on the pathway to be implemented. The main drawback of manual pathway design is that the pathway chosen by intuition might be suboptimal compared with other, unexplored biochemical possibilities. Also, this approach is limited to known biochemistry from scientific articles and databases and does not consider novel, predicted reactions. The success of manual pathway design therefore relies on the individual biochemical knowledge of the researcher. While intuitive pathway design was the only possible approach at the beginning of metabolic engineering and has led to many successful examples, the advent of computational tools and methods has accelerated the development of bioproducer strains and stimulated the exploration of the biochemical space. The following sections focus on computational pathway design, keeping in mind that each of the steps can be replaced by the intuitive choices of a scientist.

8.2 Pathway Design Workflow

Computational pathway design is the search for a sequence of metabolic steps that lead to the production of a target molecule of interest. The following workflow describes the important aspects to consider when designing a metabolic pathway, organized in four working steps (Figure 8.1).

8.2.1 Biochemical Search Space

Pathway design starts with defining the biochemical search space: What kinds of metabolites and reactions are to be considered? For example, should the pathway only involve compounds and reactions that are native to a certain species, family, or kingdom? Can all known metabolic reactions be considered for creating the pathway? Are reactions from organic chemistry, and compounds found only in chemical databases, allowed in the pathway? Should hypothetical, predicted biochemical reactions, and even novel molecular structures, be considered? The answers to these questions will differ depending on how much is known about the target compound and its metabolism in nature. Once the scope of the biochemical network is defined, resources are needed that provide a set of reactions from which to design the pathway. Organism-specific reactions can be obtained from genome-scale models, which represent the entirety of known metabolism for a given organism. General known enzymatic reactions can be collected from biochemical databases (Table 8.1). If the metabolic databases are not sufficient to explain the biosynthesis of the target compound, hypothetical reactions have to be included (see Section 8.2.1.1).

Figure 8.1 Workflow for computational pathway design.

Table 8.1 Selection of popular biochemical databases as a source of molecular structures, metabolic reactions and pathways, and enzymes.

Resource | Web address | Description
Genes, proteins and enzymes
UniProt | www.ebi.uniprot.org | Comprehensive resource for information on proteins
Entrez Gene | http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene | Extensive gene database including automatically generated and curated information on genes for fully sequenced organisms
ExplorEnz | www.enzyme-database.org | The primary source of the IUBMB a) enzyme list
BRENDA | www.brenda-enzymes.org | Comprehensive resource for enzyme function, annotation, and occurrence
Metabolic reactions, pathways and networks
KEGG | www.genome.ad.jp/kegg | One of the most useful databases for metabolic networks. Organization of metabolic genes, enzymes, reactions, and metabolites into hundreds of metabolic pathways
BioCyc | biocyc.org | Literature-derived metabolic reactions and pathways (MetaCyc and EcoCyc)
BiGG | bigg.ucsd.edu | Collection of genome-scale metabolic models
Metabolites and compounds
ChEBI | www.ebi.ac.uk/chebi | Manually annotated and non-redundant information on small compounds of biological interest
PubChem | pubchem.ncbi.nlm.nih.gov | Extensive information on millions of small-molecule compounds and chemical substances

a) International Union of Biochemistry and Molecular Biology.

8.2.1.1 Reaction Prediction

In case the pathway to be engineered includes novel, hypothetical biochemical reactions, a method for reaction prediction is required. Reaction prediction usually relies on enzymatic reaction rules extracted from biochemical knowledge.


An important feature of reaction rules is that they are general: one reaction rule always describes the same type of biochemical reaction, but it applies to a range of compounds that share the same reactive site. Generalized reaction rules thus computationally mimic substrate-promiscuous enzymatic activity. As with real enzymatic activity, two descriptors are required to describe the action of a reaction rule. The first descriptor defines the nature of the reactive site by describing the types of atoms and the bond configuration recognized by the enzyme. The second descriptor defines the bond rearrangement during the reaction, either by describing which bonds are broken or formed, or by describing the resulting molecular structure of the product compound. Reaction rules can be derived manually by experts from biochemical knowledge, or they can be derived automatically from known enzymatic reactions using machine learning approaches. While the manual design of reaction rules ensures the biochemical accuracy of each rule, defining the rules is extremely tedious and time-consuming. In contrast, automatic derivation of reaction rules from enzymatic data is fast, but involves the risk of propagating errors from erroneous or incomplete database entries into the rule definitions. A comparison of different tools for reaction prediction can be found in Section 8.3.1.

8.2.1.2 Retrobiosynthesis

Reaction prediction is performed in a predefined chemical space. The standard approach to reaction prediction is to iteratively apply reaction rules to the biosynthetic target compound and to expand the reaction network (Figure 8.2). Reaction rules can also be applied to a given metabolic pathway, to all metabolites in a metabolic network, or even to all compounds of a database (e.g. ATLAS of Biochemistry [1]). In the traditional application of reaction rules, a hypothetical network is expanded around a target compound in a process called retrobiosynthesis [2].
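To make the two descriptors of a reaction rule concrete, the following Python sketch encodes one deliberately simplified rule, oxidation of a primary or secondary alcohol to a carbonyl, on a toy molecular graph with implicit hydrogens. The `Molecule` class, the rule, and the function names are illustrative assumptions rather than the representation used by any particular tool; practical reaction-prediction tools use richer chemical representations. The site test (descriptor 1) and the bond-order change (descriptor 2) are written generically, so the same rule applies to any compound presenting the reactive site; it is applied here in the forward direction, whereas retrobiosynthesis would use its reverse.

```python
from dataclasses import dataclass

VALENCE = {"C": 4, "N": 3, "O": 2}


@dataclass(frozen=True)
class Molecule:
    atoms: tuple       # element symbol per atom index; hydrogens are implicit
    bonds: frozenset   # {(i, j, order)} with i < j


def implicit_h(mol, idx):
    """Hydrogens on atom idx = valence minus the bond orders it already uses."""
    used = sum(order for i, j, order in mol.bonds if idx in (i, j))
    return VALENCE[mol.atoms[idx]] - used


def find_sites(mol):
    """Descriptor 1 (reactive site): C-O single bond, O is a hydroxyl, C carries an H."""
    sites = []
    for i, j, order in mol.bonds:
        if order != 1:
            continue
        for c, o in ((i, j), (j, i)):
            if (mol.atoms[c] == "C" and mol.atoms[o] == "O"
                    and implicit_h(mol, c) >= 1 and implicit_h(mol, o) == 1):
                sites.append((i, j))
    return sites


def apply_rule(mol):
    """Descriptor 2 (bond rearrangement): the matched C-O single bond becomes double."""
    products = []
    for i, j in find_sites(mol):
        bonds = (mol.bonds - {(i, j, 1)}) | {(i, j, 2)}
        products.append(Molecule(mol.atoms, frozenset(bonds)))
    return products


ethanol = Molecule(("C", "C", "O"), frozenset({(0, 1, 1), (1, 2, 1)}))
tert_butanol = Molecule(("C", "C", "C", "C", "O"),
                        frozenset({(0, 1, 1), (0, 2, 1), (0, 3, 1), (0, 4, 1)}))
print(apply_rule(ethanol))       # one product with a C1=O2 carbonyl (acetaldehyde)
print(apply_rule(tert_butanol))  # []: the carbinol carbon carries no hydrogen
```

The tertiary alcohol is rejected by the site descriptor alone, which is the sense in which a rule mimics an enzyme that recognizes a reactive site rather than one specific substrate.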

Figure 8.2 Schematic of retrobiosynthetic network generation.


8 Pathway Design

The concept of exploring the chemistry around a synthesis target was originally derived from retrosynthetic approaches in organic chemistry, where reaction rules are applied iteratively to target compounds, chemically “walking back” the synthesis pathway until a suitable precursor compound is reached. If the pathway to be engineered is an enzyme-catalyzed, biochemical pathway, the process is called retrobiosynthesis. The precursors, in this case, are native molecules of the organism that will serve as the chassis for pathway implementation. A typical retrobiosynthesis approach starts by defining the (bio)chemical scope of the network generation and the set of reaction rules: What types of reactions should be included (known, novel, specific cofactors)? What kinds of compounds are allowed in the network (biological, chemical, novel structures)? Once these parameters are set, one can start generating the biochemical network around the target compound by iteratively applying the reaction rules. In the first iteration, the reaction rules are applied to the target compound in order to generate all possible reactions and products within the search scope. The products of the first iteration are then used as substrates in a second iteration of reaction generation, and so on, until the network around the target compound has been expanded to the desired number of reaction steps. The resulting network contains known as well as novel, hypothetical reactions. Because each iteration can in principle generate every chemical structure reachable by the reaction rules, the network grows combinatorially. Several techniques exist to avoid such a network explosion:
• Only allow the production of compounds that belong to the predefined search space (database membership, constraints on molecular formula, molecular weight, etc.).
• Only allow compounds that share a minimum number of atoms with the target compound.
• Only allow thermodynamically feasible reactions (constrain reaction directionality).
• Explore the product compounds of each iteration in a targeted way using machine learning techniques, for example by prioritizing parts of the network to expand [3] or by prioritizing certain reaction rules over others [4].

8.2.1.3 Network Data Representation

Whether the network comes from a database or has been generated by reaction prediction, the output is a list of reactions interconnecting metabolites. Whatever the storage format of the network (text file, Excel spreadsheet, database), it should have unique identifiers for metabolites and reactions. These data are the basis for the next step, in which the biochemical network is transformed into a format suitable for pathway search, i.e. a stoichiometric matrix or a graph representation.

8.2.2 Pathway Search

The aim of pathway search is to extract, from a biochemical network, possible sequences of reactions that connect the precursor metabolite to the target compound. The network can come from a database or from a retrobiosynthetic approach (see Section 8.2.1). The biochemical network contains all the information necessary for the biosynthesis of the target compound. To search for pathways, the biochemical network needs to be represented either as a stoichiometric matrix or as a mathematical graph. Depending on the chosen network representation, different computational techniques are used to extract linear or branched pathways that lead to the bioproduction of the target compound in the host organism. The available tools for pathway search have been comprehensively summarized in [5].

8.2.2.1 Stoichiometric Matrix-Based Search

If the biochemical network is represented as a stoichiometric matrix, one can optimize the production of the target compound using Flux Balance Analysis (FBA) and then extract the subnetwork of reactions carrying flux. The extracted subnetworks can be linear or branched, depending on the cofactors and co-substrates required to produce the target compound. Because the optimization enforces mass balance, the stoichiometric balance of cofactors and co-reactants is guaranteed by construction. Furthermore, additional constraints can be implemented directly in the formulation of the optimization problem. The drawback of this method is its high computational cost, which increases significantly with the size of the network. For large reaction networks with hundreds of thousands of reactions (e.g. retrobiosynthesis networks), a graph representation is more commonly used because of its computational efficiency.

8.2.2.2 Graph-Based Search

There are different ways to represent a biochemical reaction network as a mathematical graph. In general, compounds are represented by nodes (or vertices), and reactions are represented by edges. The graph is called directed if a direction is assigned to the edges, and undirected otherwise. In a simple graph, an edge always connects exactly two nodes. A biochemical network, however, is best described as a directed hypergraph, in which one edge (hyperedge) can connect multiple nodes, in the same way as a reaction consumes and produces several metabolites. This representation is equivalent to a stoichiometric matrix, in which all substrates and products of each reaction are fully described. However, path search in hypergraphs is much less established and computationally more demanding than in simple graphs. Therefore, the biochemical network is usually translated into a simple graph, for which path search algorithms are well defined. To obtain a simple graph, usually only the main biotransformation of each reaction is represented, and cofactors are omitted to avoid the creation of shortcuts in the network (e.g. two otherwise unrelated reactions appearing connected because both use NADH). The main biotransformation can be identified by manual curation (as in KEGG), by removing known cofactors from the network, by similarity search between substrates and products, or by only considering biotransformations that conserve a minimum number of atoms. Once the network is defined, pathways leading from the precursor to the target compound can be enumerated by applying standard path search algorithms. Usually, a breadth-first algorithm is applied that explores the network starting from the target compound until a suitable precursor compound is found. Since the pathways obtained from a graph representation are not necessarily balanced in terms of co-substrate utilization and cofactor balance, pathway evaluation methods are necessary to ensure the feasibility of the pathway (see Section 8.2.4).

8.2.2.3 Pathway Ranking

The output of any pathway search algorithm is a list of possible pathways. To choose pathways for evaluation, it can be useful to rank them according to different criteria:
• Pathway length: Shorter pathways are generally preferred.
• Number of known reaction steps: Novel or orphan reaction steps may be avoided.
• Pathway weight, if the network is a weighted graph: The weight can represent different aspects of a biotransformation, such as the chemical similarity or the number of conserved atoms between substrate and product.
Depending on how the network is represented as a graph, the pathway search might provide further information, and depending on the project, one criterion might be preferred over another. At this point, the scientist decides which criteria are the most important ones and weights them accordingly to obtain the desired pathway ranking. However, the information provided by the pathway search is usually not sufficient to determine which pathway is the best one. Sections 8.2.3 and 8.2.4 discuss the steps that help to determine the best-suited pathway for implementation in the host organism. Each step contributes additional evaluation data to the overall pathway ranking, and the weights of the different criteria in the final ranking can be adjusted by the scientist to fit the project requirements.

8.2.3 Enzyme Assignment

Each enzymatic reaction step in a pathway needs to be catalyzed by an enzyme. For known, well-described reactions, appropriate enzymes can be found by literature search or database lookup. However, if the list of pathways obtained from the previous step is long, computational approaches are needed to annotate reactions with enzymes, or to predict enzymes for hypothetical reactions. Automated querying of enzyme databases can be used to assign enzymes to known reactions; for predicted, hypothetical reactions, however, no enzyme will be found. It can also happen that, although a reaction is known and classified by the Enzyme Commission (EC), no information is available on the enzyme catalyzing it. Such reactions are called orphan reactions and are treated in the same way as novel reactions.

8.2.3.1 Enzyme Prediction for Orphan and Novel Reactions

Despite steady progress in enzyme discovery, up to 50% of the enzymatic reactions cataloged in biochemical databases are orphan reactions [6]. Without an associated protein sequence, orphan reactions would have to be excluded from pathway design. To find enzymes for orphan or novel reactions, tools have been built that search for similar reactions with known catalyzing enzymes. The underlying assumption is that if the substrate structure and the type of reaction are sufficiently similar between the orphan reaction and a sequence-annotated reaction, the enzyme catalyzing the annotated reaction has an increased chance of also catalyzing the orphan reaction. Even if the enzyme has very low catalytic activity on the substrate of the orphan reaction, it can be used as a starting point for enzyme engineering.

The methods used to find reactions similar to an orphan reaction are based on molecular fingerprints, which are mathematical descriptors of the structural and topological properties of the participating metabolites. In the first step, fingerprints are calculated for all known, enzyme-annotated reactions of a reference reaction database, and also for the orphan reaction. In the second step, a similarity score is calculated between the fingerprint of the orphan reaction and the fingerprints of all reactions in the reference database. Finally, the reactions of the reference database are ranked by their similarity score with the orphan reaction, and the enzymes of the top-ranking reactions are suggested as putative catalysts of the orphan reaction. Available tools for enzyme prediction differ in the way they calculate the fingerprint (Table 8.2). For example, the enzyme prediction tool BridgIT assesses the similarity of two reactions, one orphan and one non-orphan, by including in the fingerprint information on the bond rearrangement at the reactive site of the substrate as well as on the molecular substructure surrounding the reactive site. Other tools such as EC BLAST, Selenzyme, and E-zyme use different combinations of the structural similarity of substrates, products, and reactive sites, and of the bond rearrangement during the transformation, to calculate the fingerprint.

Table 8.2 Overview of different enzyme prediction tools and methods.

Enzyme prediction tool | Method | Description of reaction comparison technique
BridgIT [7] | BridgIT reaction fingerprint | Based on the BridgIT reaction fingerprint, including information on the reactive site and its neighborhood
EC BLAST [8] | Bond changes (BC) | Based on bond change similarity, including information on bonds formed and cleaved, bond order change, and stereochemical change
EC BLAST [8] | Reaction center (RC) | Based on reaction center similarity, comparing the structure around modified bonds (reaction centers)
EC BLAST [8] | Both BC and RC | Based on both bond change and reaction center methods
EC BLAST [8] | Structural similarity | Based on substructure similarity, comparing the molecular structures of all molecules involved in the reaction
Selenzyme [9] | RDKit | Based on RDKit fingerprints
Selenzyme [9] | Morgan | Based on Morgan fingerprints
Selenzyme [9] | Pattern | Based on pattern fingerprints
E-zyme [10] | Reactant pairs | Based on structures of substrate–product pairs (reactant pairs)
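The three-step procedure described above (compute fingerprints, score similarity, rank) can be sketched with set-based fingerprints and the Tanimoto (Jaccard) coefficient. This is a generic illustration under stated assumptions, not the fingerprint or scoring scheme of BridgIT, EC BLAST, Selenzyme, or E-zyme: each reaction is reduced to a hypothetical set of feature strings (bond changes, reactive site, neighboring groups), and the reference reactions and their EC classes are invented for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[str]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_reference_reactions(query: Fingerprint,
                             reference: Dict[str, Tuple[Fingerprint, str]],
                             top_k: int = 3) -> List[Tuple[str, float, str]]:
    """Score every enzyme-annotated reference reaction against the orphan/novel
    query and return the top_k as (reaction id, similarity, EC class)."""
    scored = [(rid, tanimoto(query, fp), ec) for rid, (fp, ec) in reference.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature sets; real tools derive far richer fingerprints from structures.
QUERY = frozenset({"bond:C=O>C-O", "bond:+H", "site:C(=O)H", "env:CH2", "env:CH2OH"})
REFERENCE = {
    "ref_aldehyde_reduction": (frozenset({"bond:C=O>C-O", "bond:+H", "site:C(=O)H",
                                          "env:CH2", "env:CH3"}), "1.1.1.-"),
    "ref_ketone_reduction": (frozenset({"bond:C=O>C-O", "bond:+H", "site:C(=O)C",
                                        "env:CH2", "env:CH3"}), "1.1.1.-"),
    "ref_decarboxylation": (frozenset({"bond:C-C>cleaved", "site:C(=O)O", "env:CH2"}),
                            "4.1.1.-"),
}

if __name__ == "__main__":
    for rid, score, ec in rank_reference_reactions(QUERY, REFERENCE):
        print(f"{rid:<24} {score:.2f}  EC {ec}")
```

The reference reaction sharing both the reactive site and the bond change ranks first (similarity 4/6), the one sharing only the bond change ranks second, and the mechanistically unrelated one ranks last. A low best score indicates that the proposed enzyme is at most a starting point for enzyme engineering.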


8.2.3.2 Choice of Protein Sequence

Not all enzymes are equally compatible with the host cell. As a general rule, the closer the native organism of the enzyme is to the host organism on the phylogenetic tree, the higher the chance that the enzyme will be expressed successfully in the host cell environment. Therefore, when choosing between different sequences for the same enzymatic activity, it can be advantageous to prefer enzymes from an organism that is phylogenetically close to the host organism. If the native organism of the enzyme belongs to a different kingdom, techniques such as codon optimization might be required to express the enzyme successfully in the host.

8.2.4 Pathway Feasibility

To increase the probability of successful pathway engineering, the generated pathways can be evaluated by computationally simulating the biosynthesis of the target compound in a genome-scale metabolic model. For this, a metabolic model of the host organism is required. By inserting the candidate pathways one by one into the model, properties such as the stoichiometric, thermodynamic, and kinetic feasibility of each proposed pathway can be calculated. This approach helps to eliminate infeasible pathways, and it can be used to rank the remaining pathways and pick the best-suited ones for experimental implementation.

8.2.4.1 Chassis Metabolic Model

A metabolic model of the host organism is a prerequisite for the following analyses. If no appropriate model has been constructed yet, an existing model of a related organism can be adapted, or a new metabolic model can be generated from scratch (see Chapter 2). Databases such as BiGG [11] can be queried to find a suitable metabolic model for pathway evaluation.

8.2.4.2 Stoichiometric Feasibility

The first feasibility criterion is that the compounds participating in the pathway are stoichiometrically balanced with regard to the organism’s metabolism. To assess stoichiometric feasibility, each predicted pathway is implemented into a genome-scale model of the chassis organism, and the production of the target compound is optimized in a Flux Balance Analysis (FBA) type of problem. This automatic procedure identifies pathways that use co-substrates or cofactors that are not available within the host organism, as well as pathways that produce by-products that cannot be recycled within the host, so that both can be excluded. The stoichiometric feasibility evaluation outputs a yes/no answer. Additionally, the theoretical carbon yield from the substrate can be calculated in the process and used as a ranking criterion.

8.2.4.3 Thermodynamic Feasibility

The second criterion is the thermodynamic feasibility of the production of the target compound. To evaluate thermodynamic feasibility, thermodynamic data have to be collected or calculated for the compounds and reactions in the pathway. A common approach to estimating thermodynamic properties is the Group Contribution Method (GCM), as implemented in a range of tools (see Chapter 7) [12, 13]. Estimates of the Gibbs free energy of formation and the Gibbs free energy of reaction are necessary to perform thermodynamics-based flux analysis (TFA) on the production of the target molecule in the chassis model. For this, a thermodynamically curated model is a prerequisite, as are metabolite concentration ranges that can be used to constrain the metabolic model. Performing TFA on the production of the target molecule tells us whether the compound can be produced from a thermodynamic point of view, and therefore helps to identify infeasible pathways that can be discarded from further analysis.

8.2.4.4 Kinetic Feasibility

To assess the kinetic feasibility of a pathway, kinetic descriptors are required for each enzyme. While kinetic descriptors are difficult to obtain for the whole metabolism of the host organism (see Chapter 5), it can already be useful to perform a kinetic analysis of the isolated pathway. Identification of kinetic bottlenecks or kinetic traps is of particular importance in branched or cyclic pathways [14]. A kinetic analysis can detect such problems and can therefore be used to eliminate kinetically infeasible pathways.

8.2.4.5 Toxicity of Intermediates

The production of toxic by-products or pathway intermediates can be detrimental to the host organism and severely reduce the yield of the target product. To avoid the accumulation of toxic compounds in the engineered cell, two factors should be considered: (i) the concentration of the toxic metabolite and (ii) the degree of toxicity of the compound. The thermodynamic analysis of the pathway can inform the scientist about the allowed concentration range of each metabolite, allowing the exclusion of pathways that require high concentrations of toxic compounds. A kinetic model of the pathway can further help to determine how a toxic compound accumulates over time. Hence, thermodynamic and kinetic analyses can help to reduce the risk of intermediate accumulation. In addition, computational tools can be used to estimate the toxicity of compounds (e.g. structural alerts [15]). Combined with the thermodynamic and kinetic pathway analyses, toxicity estimates can be integrated into the overall pathway ranking to avoid the production of molecules that are toxic to the cell, hamper its growth, and eventually lower the product yield.
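Of the feasibility checks in this section, the stoichiometric one (Section 8.2.4.2) is the most mechanical: add the candidate pathway to the host model, maximize target secretion with FBA, and call the pathway feasible if the optimum is nonzero. A minimal sketch with SciPy's linear-programming solver (scipy.optimize.linprog with the HiGHS method, SciPy 1.6 or later) follows. It assumes a hypothetical three-reaction "host" instead of a genome-scale model; in practice one would use a toolbox such as COBRA, as in Section 8.3.3. All reaction and metabolite names are invented for illustration, and the yield is molar rather than in g/g.

```python
import numpy as np
from scipy.optimize import linprog


def max_flux(reactions, bounds, objective, default_bounds=(0.0, 1000.0)):
    """Flux balance analysis: maximize v[objective] subject to S v = 0 and bounds.

    reactions maps a reaction id to {metabolite: stoichiometric coefficient}
    (negative for substrates, positive for products).
    Returns (optimal objective flux, {reaction id: flux}).
    """
    rxn_ids = list(reactions)
    mets = sorted({m for stoich in reactions.values() for m in stoich})
    S = np.zeros((len(mets), len(rxn_ids)))  # stoichiometric matrix
    for j, rid in enumerate(rxn_ids):
        for met, coeff in reactions[rid].items():
            S[mets.index(met), j] = coeff
    c = np.zeros(len(rxn_ids))
    c[rxn_ids.index(objective)] = -1.0  # linprog minimizes, so negate the objective
    res = linprog(c, A_eq=S, b_eq=np.zeros(len(mets)),
                  bounds=[bounds.get(r, default_bounds) for r in rxn_ids],
                  method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return -res.fun, dict(zip(rxn_ids, res.x))


# Hypothetical toy host: glucose uptake, lumped glycolysis, and target secretion.
HOST = {
    "EX_glc": {"glc": 1.0},
    "GLYC": {"glc": -1.0, "pyr": 2.0},
    "EX_target": {"target": -1.0},
}
BOUNDS = {"EX_glc": (0.0, 10.0)}  # glucose uptake limited to 10 flux units

# Hypothetical candidate pathways: B needs a cofactor the host cannot make,
# C releases a by-product that the host can neither consume nor secrete.
PATHWAY_A = {"PW1": {"pyr": -1.0, "inter": 1.0}, "PW2": {"inter": -1.0, "target": 1.0}}
PATHWAY_B = {"PW1": {"pyr": -1.0, "cofX": -1.0, "target": 1.0}}
PATHWAY_C = {"PW1": {"pyr": -1.0, "target": 1.0, "waste": 1.0}}


def stoichiometric_feasibility(pathway, tol=1e-9):
    """Return (feasible, molar yield of target on glucose) for host + pathway."""
    flux, fluxes = max_flux({**HOST, **pathway}, BOUNDS, "EX_target")
    if flux <= tol:
        return False, 0.0
    return True, flux / fluxes["EX_glc"]


if __name__ == "__main__":
    for name, pw in [("A", PATHWAY_A), ("B", PATHWAY_B), ("C", PATHWAY_C)]:
        print(name, stoichiometric_feasibility(pw))
```

Pathways B and C fail for exactly the two reasons given in Section 8.2.4.2: the steady-state constraint S v = 0 forces any reaction that consumes an unavailable cofactor, or releases a by-product without a sink, to carry zero flux.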

8.3 Applications

In the following sections, examples of successful pathway design projects are reviewed to help the reader navigate existing tools and find relevant example applications.

8.3.1 Available Tools for Pathway Design

Tools and methods have been developed to address the different aspects of the pathway design problem (Table 8.3). Some pathway design tools offer the whole suite of network generation, pathway search, enzyme assignment, and pathway evaluation in one package, while others perform only one specific part of the design task. Furthermore, some of the more recent approaches perform network generation and pathway search in a single workflow step using machine learning [4, 46]. In these approaches, after each step of network generation, only the most promising branches of the biochemical reaction tree are explored further, and the exploration stops once a precursor metabolite has been connected to the target compound.

Table 8.3 Available pathway design tools and their characteristics. Network Enzyme generation assignment

Pathway evaluation

Tool

Enzyme Reaction identification prediction tool

Network stoichiometry Host Gibbs free (1) and organism energy of thermodynamics Toxicity of intermediates specificity Availability reaction (2)

BNICEa)

Yes

BridgIT [7]

GCM [12] Yes (1,2)

RetroPath seriesb)

Yes

Selenzyme [9, 26]

Yes

Yes (1)

novoStoic

Yes

Yes

Yes [13]

Yes (1,2)

GEM-Path

Yes

Yes

GCM [12] Yes (1,2)

Yes

SimPheny

Yes

Manual

Yes [37]

Yes

ReactPRED

Yes

TransformMinER

Yes

DESHARKY

Yes [27]

Yes

Open data via ATLAS [1]

Tool development [16], applications [17–23], reviews [2, 24, 25], experimental validation [14]

Yes

Open-source and open data via XTMS

Tool development [3, 28–30], reviews [31], pathway search [32], experimental validation [33]

Yes

Open-source

Tool development [34], reviews [5, 35]

Yes Yes Yes

References

Yes

Tool development [36] Commercial

Tool development (Genomatica) [38], experimental validation [39]

Open-source

Tool development [40]

Online tool

Tool development [41] Tool development [42], review [43]

ReBiT

Yes

Yes

Review [44]

Cho et al.

Yes

Yes

Tool development and application [45]

a) The development of BNICE started at Northwestern University in 2004 and later continued at EPFL under the new name BNICE.ch. b) Comparison includes the original RetroPath method as well as its extended versions RetroPath2.0 and RetroPath RL. Source: Adapted and updated from Hadadi and Hatzimanikatis [2].

8.3.2 Successful Applications of Pathway Design Tools

Past success stories of computational pathway design can serve as inspiration and as technical guidance in a metabolic engineering project. The following successful applications have been selected as references:
• Production of the commodity chemical 1,4-butanediol in Escherichia coli using the pathway prediction tool SimPheny [39].
• Heterologous production of the flavonoid pinocembrin in E. coli using the pathway prediction tool RetroPath [33].
• Design of a one-carbon assimilation pathway using predicted reactions from the ATLAS of Biochemistry database, with pathways selected using stoichiometric and kinetic models [14].
Comprehensive computational pathway design, so far without published experimental validation, has been performed for methyl ethyl ketone [18], isobutanol [35], and a collection of 20 commodity chemicals [36]. The design of biodegradation pathways has been discussed for benzo[a]pyrene [34].

8.3.3 Practical Example of Pathway Design

The following example illustrates the application of pathway design to 1,4-butanediol (BDO), a compound that has been the subject of several successful metabolic engineering efforts [39, 47, 48]. The previously published BDO-producing E. coli strains serve as a benchmark for this exercise. For this showcase, we used the tools available on the LCSB Databases website (lcsb-databases.epfl.ch). For the stoichiometric and thermodynamic pathway evaluation, we used the COBRA and pyTFA toolboxes, respectively. Alternative software options are discussed in Section 8.3.1.

8.3.3.1 Creating a Biochemical Network Around BDO

Since BDO is not natively produced by E. coli, we decided to include synthetic chemicals in the compound search space and to consider novel, predicted reactions. We extracted the reaction network surrounding BDO from the ATLAS of Biochemistry database, which contains known and novel reactions. Novel reactions in ATLAS are predicted by enzymatic reaction rules from BNICE.ch. The extraction of four reaction steps around BDO resulted in a network of 31 656 compounds and 50 118 biotransformations (Figure 8.3a). Each biotransformation in the network represents one reaction, or several similar reactions involving alternative cofactors (e.g. one reaction consuming NADH and another consuming NADPH). Since a network of this size can only be analyzed computationally, the subsequent steps of the pathway design workflow are illustrated on a smaller network. As a showcase, we therefore manually extracted a subset from the retrobiosynthesis network (Figure 8.3b). Network extraction around a target compound from the ATLAS of Biochemistry is available as a web application on lcsb-databases.epfl.ch.

Figure 8.3 Retrobiosynthetic network generation around BDO. (a) Visualization of the reaction network of known and hypothetical reactions for four generations around BDO. The white arrow indicates the node in the network representing the target compound. (b) The extracted showcase network connects 12 compounds through 15 reactions. The numbering of reactions follows the order of edges traversed in a depth-first graph search for listing all possible pathways connecting the target compound to E. coli precursor metabolites (grey boxes).

8.3.3.2 Search for Biosynthetic Pathways

In the next step, the retrobiosynthetic reaction network was searched for potential pathways connecting host metabolites with the target compound BDO. The simplified network was translated into a graph format that describes all biotransformations involved. The graph representation of the reaction network has 12 nodes (i.e. compounds) and 15 edges (i.e. biotransformations) (Figure 8.3b). This graph can then be searched for pathways using off-the-shelf path search algorithms such as breadth-first or depth-first graph traversal. If the edges are assigned a weight (e.g. atom conservation, probability of occurrence, compound similarity), Yen’s k-shortest loopless path algorithm can be employed to find the k shortest pathways in the weighted network [49]. Here, we performed a depth-first search from BDO to the three E. coli metabolites present in the network (Figure 8.3b), which resulted in a list of six possible pathways connecting the target to the three identified precursors (i.e. native E. coli metabolites).

8.3.3.3 Finding Enzymes for Novel Reactions

When novel reactions are involved in the predicted pathways, we need to find enzymes that can potentially catalyze them. To do this, we applied the enzyme prediction tool BridgIT to all 12 novel reactions in the network (Table 8.4). The reactions that are already listed in other biochemical databases were assigned their respective identifiers (R4, R6, and R11). BridgIT is available as a web application on lcsb-databases.epfl.ch.

Table 8.4 Each reaction involved in the extracted showcase network is either annotated with its respective database identifiers or, if the reaction is novel, with the most similar known reaction proposed by BridgIT, its corresponding enzyme, and a similarity measure (BridgIT score).

Reaction | External identifiers (KEGG a), ModelSEED b)) | BridgIT score | Most similar known reaction | EC number | Third-level EC of reaction rule
R1 | – | 0.97 | R11448 | 1.14.13.230 | 1.14.13.-
R2 | – | 0.17 | R07408 | 2.6.1.82 | 1.5.1.-
R3 | – | 0.22 | R01394 | 5.3.1.22 | 5.3.2.-
R4 | rxn23029 | – | – | 1.1.1.- | 1.1.1.-
R5 | – | 0.99 | R01172 | 1.2.1.10, 1.2.1.57, 1.2.1.87 | 1.2.1.-
R6 | R09279, rxn16139 | – | – | 6.2.1.40 | 6.2.1.-
R7 | – | 0.87 | R00272 | 4.1.1.71 | 4.1.1.-
R8 | – | 0.91 | R00402 | 1.3.1.6 | 1.3.1.-
R9 | – | 0.52 | R01082 | 4.2.1.2 | 4.2.1.-
R10 | – | 0.95 | R01644 | 1.1.1.61 | 1.1.1.-
R11 | R09186, rxn18934 | – | – | 4.2.1.43, 4.2.1.141 | 4.2.1.-
R12 | – | 0.55 | R01047 | 4.2.1.30 | 4.2.1.-
R13 | – | 0.97 | R09281 | 1.1.1.79 | 1.1.1.-
R14 | – | 0.77 | R00272 | 4.1.1.71 | 4.2.1.-
R15 | – | 0.97 | R11448 | 1.14.13.230 | 1.14.13.-

a) www.kegg.jp. b) modelseed.org.

8.3.3.4 Stoichiometric and Thermodynamic Pathway Evaluation

To evaluate the stoichiometric and thermodynamic feasibility of each pathway, we appended the pathways one by one to a thermodynamically curated genome-scale model of E. coli (iJO1366). In the first step, the production of BDO was optimized using Flux Balance Analysis (FBA), and the yield from glucose was calculated in g product/g glucose. In the second step, all stoichiometrically feasible pathways were analyzed for thermodynamic feasibility using thermodynamics-based flux analysis (TFA) (Table 8.5). For pathways that were also thermodynamically feasible, the yield from glucose was calculated in g product/g glucose.

8.3.3.5 Overall Ranking of Pathways

For an overall comparison of the pathways, we chose to consider the length of the pathway, the availability of enzymes for the reaction steps, and the thermodynamic yield. We only considered pathways that were thermodynamically feasible. Each criterion is translated into a score between 0 and 1, and the three criteria are given equal weight in the overall ranking: (i) The number of steps was translated into a length score (1/number of steps). (ii) The enzyme availability score is the BridgIT score averaged over all steps of the pathway, with a score of 1 for known reactions. (iii) The TFA yield was multiplied by a factor of 10 to obtain the TFA feasibility score. The overall score was calculated by summing the scores of the three criteria for each pathway, and the pathways were then sorted from highest to lowest overall score (Table 8.6). Of the four ranked pathways, P4 [39], P2 [47], and P5 [48] have been validated experimentally, while the in vivo feasibility of P3, which ranks third and is equivalent to P5 in terms of thermodynamics, remains to be shown. This toy example shows how computational pathway design can be used to predict, extract, annotate, and rank potential biosynthetic pathways.


Table 8.5 List of extracted pathways with number of steps, stoichiometric and thermodynamic feasibility, as well as the yield of BDO from glucose in g BDO/g glucose.

Pathway | Reaction sequence | Number of steps | Stoichiometric feasibility | Max. theoretical yield FBA | Thermodynamic feasibility | Max. theoretical yield TFA
P1 | R3 → R2 → R1 | 3 | Yes | 0.202 | No | –
P2 | R6 → R5 → R4 | 3 | Yes | 0.094 | Yes | 0.043
P3 | R9 → R8 → R7 → R4 | 4 | Yes | 0.125 | Yes | 0.056
P4 | R11 → R10 → R7 → R4 | 4 | Yes | 0.125 | Yes | 0.056
P5 | R14 → R13 → R12 → R4 | 4 | Yes | 0.125 | Yes | 0.056
P6 | R14 → R13 → R15 | 3 | Yes | 0.125 | No | –
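The pathways in Table 8.5 were obtained with a depth-first search from BDO back to E. coli precursors (Section 8.3.3.2). The sketch below shows such a search, enumerating loop-less paths on a small hypothetical graph; the compound and reaction names are invented and do not reproduce Figure 8.3b. Edges are stored in the biosynthetic direction and traversed backward from the target, and a compound already on the current branch is never revisited, so a cycle through the target does not produce infinitely long paths.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (reaction id, substrate, product), biosynthetic direction


def enumerate_pathways(edges: List[Edge], target: str, precursors: Set[str]) -> List[List[str]]:
    """Depth-first enumeration of all loop-less paths from any precursor to the target.

    The search starts at the target and follows reactions backward, ending a
    branch as soon as it reaches a precursor. Each pathway is returned as a list
    of reaction ids ordered from precursor to target.
    """
    incoming: Dict[str, List[Tuple[str, str]]] = {}
    for rid, substrate, product in edges:
        incoming.setdefault(product, []).append((rid, substrate))

    pathways: List[List[str]] = []
    path: List[str] = []          # reactions on the current branch, target first
    on_path: Set[str] = {target}  # compounds on the current branch

    def visit(compound: str) -> None:
        if compound in precursors:
            pathways.append(path[::-1])
            return
        for rid, substrate in incoming.get(compound, []):
            if substrate in on_path:
                continue  # skip cycles
            on_path.add(substrate)
            path.append(rid)
            visit(substrate)
            path.pop()
            on_path.remove(substrate)

    visit(target)
    return pathways


# Hypothetical network: X and Y are native precursors, T is the target.
EDGES = [
    ("r1", "X", "A"), ("r2", "A", "T"),
    ("r3", "Y", "B"), ("r4", "B", "A"), ("r5", "B", "T"),
    ("r6", "T", "C"), ("r7", "C", "B"),  # a cycle through the target
]

if __name__ == "__main__":
    found = enumerate_pathways(EDGES, "T", {"X", "Y"})
    for p in sorted(found, key=len):  # shorter pathways first
        print(" -> ".join(p))
```

On this toy graph the search returns three pathways (r1 → r2, r3 → r5, and r3 → r4 → r2), and the branch through r7 is abandoned because it would revisit the target. For weighted graphs, a k-shortest-path method such as Yen's algorithm would replace the exhaustive enumeration.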

Table 8.6 TFA-feasible pathways ranked by the three criteria length, enzyme availability, and TFA yield.

Pathway | No. steps | Length score | Average BridgIT score | Enzyme availability score | TFA feasibility score [10^−1 g BDO/g glucose] | Overall score | Overall rank
P4 | 4 | 1/4 | (1 + 0.95 + 0.87 + 1)/4 | 0.96 | 0.56 | 1.77 | 1
P2 | 3 | 1/3 | (1 + 0.99 + 1)/3 | 1.00 | 0.43 | 1.76 | 2
P3 | 4 | 1/4 | (0.58 + 0.91 + 0.87 + 1)/4 | 0.84 | 0.56 | 1.65 | 3
P5 | 4 | 1/4 | (0.77 + 0.97 + 0.55 + 1)/4 | 0.82 | 0.56 | 1.63 | 4
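The scoring scheme of Section 8.3.3.5 is simple enough to recompute in code. The sketch below derives the three scores and the ranking from the BridgIT scores in Table 8.4 and the TFA yields in Table 8.5; the data structures and function names are illustrative and not part of any published tool. With the Table 8.4 score for R9 (0.52), P3 obtains an overall score of 1.635 rather than the 1.65 listed in Table 8.6, which uses 0.58 for R9; the ranking order is unchanged.

```python
from typing import Dict, List, Optional, Tuple

# BridgIT scores of the novel reactions (Table 8.4); known reactions score 1.
BRIDGIT = {"R1": 0.97, "R2": 0.17, "R3": 0.22, "R5": 0.99, "R7": 0.87, "R8": 0.91,
           "R9": 0.52, "R10": 0.95, "R12": 0.55, "R13": 0.97, "R14": 0.77, "R15": 0.97}
KNOWN = {"R4", "R6", "R11"}

# Reaction sequences and TFA yields in g BDO/g glucose (Table 8.5); None = TFA-infeasible.
PATHWAYS: Dict[str, Tuple[List[str], Optional[float]]] = {
    "P1": (["R3", "R2", "R1"], None),
    "P2": (["R6", "R5", "R4"], 0.043),
    "P3": (["R9", "R8", "R7", "R4"], 0.056),
    "P4": (["R11", "R10", "R7", "R4"], 0.056),
    "P5": (["R14", "R13", "R12", "R4"], 0.056),
    "P6": (["R14", "R13", "R15"], None),
}


def scores(steps: List[str], tfa_yield: float) -> Tuple[float, float, float, float]:
    """Return (length score, enzyme availability score, TFA score, overall score)."""
    length = 1.0 / len(steps)
    enzyme = sum(1.0 if r in KNOWN else BRIDGIT[r] for r in steps) / len(steps)
    tfa = 10.0 * tfa_yield  # yield expressed in units of 0.1 g/g
    return length, enzyme, tfa, length + enzyme + tfa


def rank(pathways: Dict[str, Tuple[List[str], Optional[float]]]) -> List[Tuple[str, float]]:
    """Rank TFA-feasible pathways by the unweighted sum of the three scores."""
    overall = {pid: scores(steps, y)[3]
               for pid, (steps, y) in pathways.items() if y is not None}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (pid, total) in enumerate(rank(PATHWAYS), start=1):
        print(position, pid, round(total, 3))
```

Because the overall score is a plain sum, changing the relative importance of the criteria, as suggested in Section 8.2.2.3, amounts to multiplying each of the three scores by a project-specific weight before summing.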

8.4 Conclusions and Future Perspectives Computational pathway design has been shown in the past to help scientists to find optimal pathways from a native metabolite, or from a precursor compound taken up by the cell, toward the production of a target molecule. For each step in the pathway design process, a range of computational tools is available from different sources. However, designing metabolic pathways for biosynthesis in an accurate and reliable manner remains a challenging endeavor. A major hurdle is the integration of the different tools into integrated pipelines. The usage of standard, interchangeable formats for sharing pathway data would greatly help the field to facilitate the usage of these tools, as well as the comparison of tools with each other. In an ideal case, all the necessary methods would be integrated in one single tool that can predict biosynthesis pathways, evaluate them in the context of a host organisms, find adapted enzymes and provide codon-optimized gene sequences that can be readily used to transform the host organisms for the bioproduction of a compound of interest. This idea has been

253

254

8 Pathway Design

promoted by Nielsen and Keasling, who imagined a biological computer-aided design (CAD) tool, as it is widely used in other engineering disciplines such as mechanical, electrical, and civil engineering, for metabolic engineering [50]. The output of such a BioCAD would be an experimental recipe that enables metabolic engineers to create new strains for the bioproduction of any desired chemical compound. Even though such a tool is not yet reality for metabolic engineering, it has been achieved for chemical synthesis: The Chematica platform proposes a computer-assisted planning of synthesis routes based on the concepts of retrosynthesis and chemical reaction rules [51]. Chematica has been successfully applied to generate synthesis recipes that could be directly used by chemists to perform complex syntheses of a range of benchmark chemicals, without requiring any prior experience in multistep organic synthesis [52]. These predictions were generated within 15–20 minutes thanks to the integration of several mathematical and computational techniques such as graph theory, linear programming, artificial intelligence, and expert-curated chemical knowledge. A tool providing such accurate predictions in a short time would be desirable for metabolic engineering, and it would help to accelerate the expansion of range of biosynthetically produced chemicals. To achieve a comparable success rate for biological systems, however, we first need to overcome multiple challenges, most of them due to the complexity of biological systems. For example, knowledge gaps in biology affect the accuracy of biosynthetic predictions. Furthermore, additional factors such as choice of the best chassis organism, and uncertainties in the activity of enzymes (e.g. kinetic properties, substrate promiscuity) should be considered. Hence, an intelligent integration of available knowledge is required to predict biosynthesis pathways in a way comparable to chemical synthesis. 
The authors suggest that future developments in pathway design will move toward integrated computational platforms that make pathway design tools accessible to users without a background in bioinformatics or cheminformatics.

References

1 Hadadi, N., Hafner, J., Shajkofci, A. et al. (2016). ATLAS of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies. ACS Synth. Biol. 5: 1155–1166.
2 Hadadi, N. and Hatzimanikatis, V. (2015). Design of computational retrobiosynthesis tools for the design of de novo synthetic pathways. Curr. Opin. Chem. Biol. 28: 99–104.
3 Koch, M., Duigou, T., and Faulon, J.-L. (2019). Reinforcement learning for bio-retrosynthesis. bioRxiv: 800474.
4 Wicker, J., Lorsbach, T., Gütlein, M. et al. (2015). enviPath – The environmental contaminant biotransformation pathway resource. Nucleic Acids Res. 44 (D1): D502–D508. https://doi.org/10.1093/nar/gkv1229.
5 Wang, L., Dash, S., Ng, C.Y., and Maranas, C.D. (2017). A review of computational tools for design and reconstruction of metabolic pathways. Synth. Syst. Biotechnol. 2: 243–252.
6 Sorokina, M., Stam, M., Médigue, C. et al. (2014). Profiling the orphan enzymes. Biol. Direct 9: 10.
7 Hadadi, N., MohammadiPeyhani, H., Miskovic, L. et al. (2019). Enzyme annotation for orphan and novel reactions using knowledge of substrate reactive sites. Proc. Natl. Acad. Sci. U. S. A. 116: 201818877.
8 Rahman, S.A., Cuesta, S.M., Furnham, N. et al. (2014). EC-BLAST: a tool to automatically search and compare enzyme reactions. Nat. Methods 11: 171–174.
9 Carbonell, P., Wong, J., Swainston, N. et al. (2018). Selenzyme: enzyme selection tool for pathway design. Bioinformatics 34: 2153–2154.
10 Yamanishi, Y., Hattori, M., Kotera, M. et al. (2009). E-zyme: predicting potential EC numbers from the chemical transformation pattern of substrate-product pairs. Bioinformatics 25: i179–i186.
11 King, Z.A., Lu, J., Dräger, A. et al. (2016). BiGG models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44: D515–D522.
12 Jankowski, M.D., Henry, C.S., Broadbelt, L.J., and Hatzimanikatis, V. (2008). Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys. J. 95: 1487–1499.
13 Noor, E., Haraldsdóttir, H.S., Milo, R., and Fleming, R.M.T. (2013). Consistent estimation of Gibbs energy using component contributions. PLoS Comput. Biol. 9: e1003098.
14 Yang, X., Yuan, Q., Luo, H. et al. (2019). Systematic design and in vitro validation of novel one-carbon assimilation pathways. Metab. Eng. 56: 142–153.
15 Sushko, I., Salmina, E., Potemkin, V.A. et al. (2012). ToxAlerts: a web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. J. Chem. Inf. Model. 52: 2310–2316.
16 Hatzimanikatis, V., Li, C., Ionita, J.A. et al. (2005). Exploring the diversity of complex metabolic networks. Bioinformatics 21: 1603–1609.
17 Hadadi, N., Cher Soh, K., Seijo, M. et al. (2014). A computational framework for integration of lipidomics data into metabolic pathways. Metab. Eng. 23: 1–8.
18 Tokić, M., Hadadi, N., Ataman, M. et al. (2018). Discovery and evaluation of biosynthetic pathways for the production of five methyl ethyl ketone precursors. ACS Synth. Biol. 7 (8): 1858–1873. https://doi.org/10.1021/acssynbio.8b00049.
19 Brunk, E., Neri, M., Tavernelli, I. et al. (2012). Integrating computational methods to retrofit enzymes to synthetic pathways. Biotechnol. Bioeng. 109: 572–582.
20 Henry, C.S., Broadbelt, L.J., and Hatzimanikatis, V. (2010). Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnol. Bioeng. 106 (3): 462–473.
21 Finley, S.D., Broadbelt, L.J., and Hatzimanikatis, V. (2009). Computational framework for predictive biodegradation. Biotechnol. Bioeng. 104: 1086–1097.
22 Finley, S.D., Broadbelt, L.J., and Hatzimanikatis, V. (2009). Thermodynamic analysis of biodegradation pathways. Biotechnol. Bioeng. 103: 532–541.
23 Jeffryes, J.G., Colastani, R.L., Elbadawi-Sidhu, M. et al. (2015). MINEs: open access databases of computationally predicted enzyme promiscuity products for untargeted metabolomics. J. Cheminform. 7: 44.
24 Soh, K.C. and Hatzimanikatis, V. (2010). DREAMS of metabolism. Trends Biotechnol. 28: 501–508.
25 Hatzimanikatis, V., Li, C., Ionita, J.A., and Broadbelt, L.J. (2004). Metabolic networks: enzyme function and metabolite structure. Curr. Opin. Struct. Biol. 14: 300–306.
26 Mellor, J., Grigoras, I., Carbonell, P., and Faulon, J.-L. (2016). Semisupervised Gaussian process for automated enzyme search. ACS Synth. Biol. 5: 518–528.
27 Planson, A.-G., Carbonell, P., Paillard, E. et al. (2012). Compound toxicity screening and structure-activity relationship modeling in Escherichia coli. Biotechnol. Bioeng. 109: 846–850.
28 Delépine, B., Duigou, T., Carbonell, P., and Faulon, J.-L. (2018). RetroPath2.0: a retrosynthesis workflow for metabolic engineers. Metab. Eng. 45: 158–170.
29 Carbonell, P., Planson, A.-G., Fichera, D., and Faulon, J.-L. (2011). A retrosynthetic biology approach to metabolic pathway design for therapeutic production. BMC Syst. Biol. 5: 122.
30 Carbonell, P., Parutto, P., Herisson, J. et al. (2014). XTMS: pathway design in an eXTended metabolic space. Nucleic Acids Res. 42: W389–W394.
31 Carbonell, P., Planson, A.-G., and Faulon, J.-L. (2013). Retrosynthetic design of heterologous pathways. In: Systems Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 985 (ed. H.S. Alper), 149–173. Totowa, NJ: Humana Press. https://doi.org/10.1007/978-1-62703-299-5_9.
32 Carbonell, P., Fichera, D., Pandit, S.B., and Faulon, J.-L. (2012). Enumerating metabolic pathways for the production of heterologous target chemicals in chassis organisms. BMC Syst. Biol. 6: 10.
33 Fehér, T., Planson, A.-G., Carbonell, P. et al. (2014). Validation of RetroPath, a computer-aided design tool for metabolic pathway engineering. Biotechnol. J. 9: 1446–1457.
34 Kumar, A., Wang, L., Ng, C.Y., and Maranas, C.D. (2018). Pathway design using de novo steps through uncharted biochemical spaces. Nat. Commun. 9: 184.
35 Wang, L., Ng, C.Y., Dash, S., and Maranas, C.D. (2018). Exploring the combinatorial space of complete pathways to chemicals. Biochem. Soc. Trans. 46: 513–522.
36 Campodonico, M.A., Andrews, B.A., Asenjo, J.A. et al. (2014). Generation of an atlas for commodity chemical production in Escherichia coli and a novel pathway prediction algorithm, GEM-path. Metab. Eng. 25: 140–158.
37 Mavrovouniotis, M.L. (1990). Group contributions for estimating standard Gibbs energies of formation of biochemical compounds in aqueous solution. Biotechnol. Bioeng. 36: 1070–1082.
38 Schilling, C.H., Thakar, R., Travnik, E. et al. SimPheny™: A Computational Infrastructure for Systems Biology. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.8797 (accessed 18 December 2020).
39 Yim, H., Haselbeck, R., Niu, W. et al. (2011). Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 7: 445–452.
40 Sivakumar, T.V., Giri, V., Park, J.H. et al. (2016). ReactPRED: a tool to predict and analyze biochemical reactions. Bioinformatics 32 (22): 3522–3524. https://doi.org/10.1093/bioinformatics/btw491.
41 Tyzack, J.D., Ribeiro, A.J.M., Borkakoti, N., and Thornton, J.M. (2019). Exploring chemical biosynthetic design space with transform-MinER. ACS Synth. Biol. 8: 2494–2506.
42 Rodrigo, G., Carrera, J., Prather, K.J., and Jaramillo, A. (2008). DESHARKY: automatic design of metabolic pathways for optimal cell growth. Bioinformatics 24: 2554–2556.
43 Martin, C.H., Nielsen, D.R., Solomon, K.V., and Prather, K.L.J. (2009). Synthetic metabolism: engineering biology at the protein and pathway scales. Chem. Biol. 16: 277–286.
44 Prather, K.L.J. and Martin, C.H. (2008). De novo biosynthetic pathways: rational design of microbial chemical factories. Curr. Opin. Biotechnol. 19: 468–474.
45 Cho, A., Yun, H., Park, J. et al. (2010). Prediction of novel synthetic pathways for the production of desired chemicals. BMC Syst. Biol. 4: 35.
46 Koch, M., Duigou, T., and Faulon, J.-L. (2020). Reinforcement learning for bioretrosynthesis. ACS Synth. Biol. 9: 157–168.
47 Liu, H. and Lu, T. (2015). Autonomous production of 1,4-butanediol via a de novo biosynthesis pathway in engineered Escherichia coli. Metab. Eng. 29: 135–141.
48 Wang, J., Jain, R., Shen, X. et al. (2017). Rational engineering of diol dehydratase enables 1,4-butanediol biosynthesis from xylose. Metab. Eng. 40: 148–156.
49 Hafner, J. and Hatzimanikatis, V. (2020). Finding metabolic pathways in large networks through atom-conserving substrate-product pairs. bioRxiv 2020.11.25.398453. https://doi.org/10.1101/2020.11.25.398453.
50 Nielsen, J. and Keasling, J.D. (2016). Engineering cellular metabolism. Cell 164: 1185–1197.
51 Szymkuć, S., Gajewska, E.P., Klucznik, T. et al. (2016). Computer-assisted synthetic planning: the end of the beginning. Angew. Chem. Int. Ed. 55: 5904–5937.
52 Klucznik, T., Mikulak-Klucznik, B., McCormack, M.P. et al. (2018). Efficient syntheses of diverse, medicinally relevant targets planned by computer and executed in the laboratory. Chem 4: 522–532.


9 Metabolomics

Tomek Diederen, Alexis Delabrière, Alaa Othman, Michelle E. Reid, and Nicola Zamboni
ETH Zurich, Institute of Molecular Systems Biology, Zurich, Switzerland

9.1 Introduction

Metabolomics is the discipline that aims at detecting, quantifying, and identifying the metabolome, i.e. the ensemble of all small molecules present in a given biological sample. The key motivation behind the development and application of metabolomics is the investigation of metabolic phenotypes. Metabolic phenotypes are an emergent property of whole cells and can hardly be inferred from genomics, transcriptomics, or proteomics data. On the one hand, metabolic pathways are regulated to a large extent by mechanisms that these techniques cannot capture, e.g. post-translational modifications, allosteric interactions, or localization [1]. On the other hand, the size of metabolic networks and the poor knowledge of detailed enzyme kinetics prevent quantitative prediction of how differences in the sequence or expression of enzymes will affect enzyme activity and, in turn, metabolic fluxes. The metabolome, in contrast, is sensitive to all changes that affect fluxes, regardless of their cause. Therefore, it provides an integrated readout of metabolic function. From the analytical standpoint, the major challenge in metabolomics is diversity. The number of small molecules with a molecular weight of up to 1500 Da found in a cell is in the order of thousands. About 500–700 of them are highly conserved primary metabolites necessary for proliferation, but the number increases by 2–3 orders of magnitude if secondary metabolites (e.g. natural products) or lipids are included. Even though not all of them are present in detectable amounts in every type of cell, they pose technical challenges because of their very heterogeneous physicochemical properties (charge, size, hydrophilicity, reactivity, chirality, etc.). These differences have direct consequences on how samples are processed for metabolomics. Fortunately, one technology allows the detection of most metabolite and lipid classes: mass spectrometry (MS).
This versatile technology combines separation, quantification, sensitivity, dynamic range, and speed, and is therefore the predominant approach in modern metabolomics. In this chapter, we describe the methodological, technical, and algorithmic fundamentals of MS-based metabolomics. Finally, we provide representative examples of how metabolomics is used in metabolic engineering.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

9.2 Fundamentals

9.2.1 Experimental Design

Metabolomics is a multipurpose discipline: it is employed to screen mutant libraries, monitor metabolic states during a fermentation, pinpoint production bottlenecks, infer network fluxes, unravel cellular regulation, etc. To address these different goals, experimental and analytical workflows have to be adapted case by case. One fundamental distinction is between endometabolomics and exometabolomics, i.e. between the analysis of metabolite levels within cells and in the spent medium, respectively. Endometabolomics is the approach of choice to survey the intracellular state. As we will explain later, current technology allows the detection of thousands of metabolites and lipids within a single run. Thus, most metabolic pathways can be assayed in parallel to deliver a comprehensive snapshot of metabolism. The crux of endometabolomics is that metabolite levels alone are not sufficient to determine metabolic fluxes, i.e. the in vivo rates of reactions and pathways [2]. Unless additional data on enzyme expression or kinetic properties are used, endometabolomics serves diagnostically to identify which pathways are affected and to infer flux changes qualitatively. In contrast, exometabolomics delivers a direct measure of fluxes, but only for a small subset of reactions. This is possible when cells are grown in a closed system, for example a shake flask. By measuring the rate at which metabolite concentrations in the medium change, it is possible to calculate the flux of transport reactions to and from the extracellular space. This information can be complemented with growth rate, biomass composition, and off-gas analysis to assemble carbon and nitrogen balances that comprehensively describe cellular physiology. Measurement of intracellular fluxes relies on stable isotope tracers and detection of the resulting labeling patterns by endometabolomics. These methods require mathematical modeling and are described elsewhere in this book.
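The exometabolomics rate calculation described above can be sketched in Python. This is a minimal illustration, not a protocol from this chapter. It assumes balanced exponential growth in a batch culture, so that biomass follows X(t) = X0·exp(μt) and each extracellular concentration changes as dC/dt = q·X. Under these assumptions C is linear in X with slope q/μ, and the biomass-specific uptake or secretion rate follows as q = μ·dC/dX. The function name `specific_rate` and the units (h, gDW/L, mmol/L) are illustrative choices, and the example data are synthetic.

```python
import numpy as np

def specific_rate(time_h, biomass_gdw_per_l, conc_mmol_per_l):
    """Estimate growth rate mu (1/h) and a specific exchange rate q (mmol/gDW/h).

    Assumes balanced exponential growth (dX/dt = mu*X) and dC/dt = q*X,
    so that C is a linear function of X with slope q/mu.
    """
    t = np.asarray(time_h, dtype=float)
    x = np.asarray(biomass_gdw_per_l, dtype=float)
    c = np.asarray(conc_mmol_per_l, dtype=float)
    # Growth rate: slope of ln(X) versus time
    mu = np.polyfit(t, np.log(x), 1)[0]
    # dC/dX: negative for uptake, positive for secretion
    dc_dx = np.polyfit(x, c, 1)[0]
    return mu, mu * dc_dx

# Synthetic, noise-free example: mu = 0.4 1/h, glucose uptake q = -10 mmol/gDW/h
t = np.linspace(0.0, 5.0, 11)
x = 0.1 * np.exp(0.4 * t)
glc = 20.0 - 25.0 * (x - 0.1)  # C0 + (q/mu) * (X - X0)
mu, q = specific_rate(t, x, glc)
print(f"mu = {mu:.3f} 1/h, q = {q:.2f} mmol/gDW/h")
```

Regressing C against X, rather than differentiating C(t) point by point, avoids amplifying measurement noise. If growth is not exponential throughout, rates would have to be estimated separately for each phase.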
One additional element of experimental design is the choice between single-point measurements and the collection of time-resolved data. Endpoint measurements are the best option to screen large sets of strains, for example to identify clones with increased product formation. Time-resolved analysis facilitates the elucidation of transient regulatory processes in dynamic experiments [3, 4] and increases the accuracy with which exometabolomics determines uptake and production rates. Furthermore, temporal data are used to verify that the system under observation is in a steady state and, if not, to segment the experiment into shorter quasi-steady-state intervals that can be analyzed following the paradigm of stationary systems [5].

9.2.2 Targeted and Untargeted Metabolomics

On the analytical side, metabolomics analyses fall into two complementary categories: targeted and untargeted [6]. In targeted metabolomics, the metabolites of interest are defined in advance from prior knowledge or from a working hypothesis that relates to a specific set of metabolites. Normally, the goal is then to quantify these compounds while avoiding potential interference from other molecules present in the same sample. The entire workflow, from sample preparation to measurement and data analysis, is tailored to obtain an assay that is as selective and accurate as possible. Pure standards are employed to develop the assay, to verify linearity and detection limits in complex matrices, and to determine absolute amounts upon calibration. Hence, targeted metabolomics is the approach of choice for applications that demand absolute quantification or comparison across labs, e.g. in diagnostics. Untargeted metabolomics pursues a different goal, namely the relative comparison of as many compounds as possible across the whole metabolic network. This approach leverages the exquisite multiplexing capability of modern high-resolution instruments, which allows thousands of distinct chemical compounds to be detected in parallel. Untargeted metabolomics is ideally suited for broad discovery campaigns in which it is either not possible or not desired to focus on a subset of metabolites. In most cases, the detected features are only partially identified, e.g. at the level of chemical formula. Only for features that data analysis deems relevant is the identity verified by in-depth MS characterization and comparison with pure standards. Further differences exist in sample preparation, type of instrumentation, mode of acquisition, standardization procedures, etc., and will be discussed in the following.

9.2.3 Sequences and Standards

Modern mass spectrometers combine parallel detection of large sets of metabolites with unprecedented sensitivity and speed. Despite these benefits, the technology suffers from fundamental limitations that have to be considered when planning an experiment. An important issue is that the signal intensities reported by mass spectrometry vary considerably. Small and hard-to-control differences in sample or buffer composition, ambient conditions, instrument geometry and tuning, or inherently transient factors such as dirt accumulation or detector aging all affect ion transmission and detection. To achieve accurate quantification, two measures are necessary. First, calibration curves must be routinely acquired to determine the relation between the metabolite amount and the recorded signal. Importantly, representative calibrations are obtained with pure standards spiked into a background that reflects the composition of the study samples, i.e. the matrix. Second, internal standards at a fixed concentration are added to all study and calibration samples. Metabolite amounts are then quantified from the ratio of the signal of the analyte(s) of interest to that of the corresponding internal standard(s). Internal standards not only provide a linear correction, but they can also be spiked early during sample preparation to compensate for variations in upstream processing, such as degradation or pipetting errors. To be effective, internal standards should have the same chemical properties and abundance as the metabolites of interest. The best option is to use ¹³C or ¹⁵N isotopologues, followed by deuterated variants, which tend to deviate in elution and response [7]. In the absence of matching heavy standards, chemically similar proxies should be chosen, ensuring that they do not occur in the study samples. A common and inexpensive strategy to obtain ¹³C standards is to grow bacteria or yeasts in bioreactors on [U-¹³C] glucose [8]. The resulting metabolite extracts are almost exclusively uniformly ¹³C-labeled and can be used as an internal standard for hundreds of primary metabolites. Overall, a standard analysis sequence in targeted metabolomics starts with three to five mock samples to condition the system, followed by blanks and calibrants, and then the study samples in randomized order. A second set of calibrants is appended at the end to ensure that no tangible drift occurred.

Standardization of untargeted experiments follows a different paradigm. Internal standards are avoided because they do not cover the whole detectable metabolome and absolute quantification is not the primary goal. Instead, external standards are used. Commonly, a pooled study sample is obtained by mixing aliquots of all study samples. The resulting pooled sample is intercalated at regular intervals in the sequence. Upon acquisition, the repeated measurements of the pooled sample are used to diagnose and correct drifts [9]. Optionally, external reference material is added to normalize across batches. This standardization approach is only effective to account for linear drifts and requires the reference material to be identical across batches and over time.
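The two standardization strategies described above can be sketched as follows: internal-standard calibration for targeted assays, and pooled-sample drift correction for untargeted runs. This is a simplified illustration under stated assumptions. The calibration is an ordinary least-squares straight line of the analyte/internal-standard area ratio against spiked concentration. The drift is assumed to be multiplicative and linear in injection order, in line with the linear-drift limitation noted in the text. The function names are hypothetical, and published drift-correction procedures, such as the one cited as [9], may use different or more flexible fits.

```python
import numpy as np

def fit_calibration(conc, analyte_area, istd_area):
    """Least-squares line of analyte/internal-standard area ratio vs. spiked concentration."""
    ratio = np.asarray(analyte_area, float) / np.asarray(istd_area, float)
    slope, intercept = np.polyfit(np.asarray(conc, float), ratio, 1)
    return slope, intercept

def quantify(analyte_area, istd_area, slope, intercept):
    """Invert the calibration line to obtain concentrations in study samples."""
    ratio = np.asarray(analyte_area, float) / np.asarray(istd_area, float)
    return (ratio - intercept) / slope

def correct_linear_drift(intensity, injection_order, is_pooled_qc):
    """Divide out a linear trend fitted to the interleaved pooled-sample injections.

    Corrected values are rescaled to the median raw pooled-sample intensity
    so that they remain in the original intensity units.
    """
    y = np.asarray(intensity, float)
    order = np.asarray(injection_order, float)
    qc = np.asarray(is_pooled_qc, bool)
    slope, intercept = np.polyfit(order[qc], y[qc], 1)
    trend = slope * order + intercept
    return y / trend * np.median(y[qc])

# Hypothetical calibrants with a constant internal standard (ratio = 0.5 * conc + 0.02)
slope, intercept = fit_calibration([0, 1, 2, 5, 10],
                                   [20, 520, 1020, 2520, 5020],
                                   [1000] * 5)
print(quantify([3020], [1000], slope, intercept))
```

Dividing by the fitted trend, rather than subtracting it, reflects the assumption that drift scales all signals proportionally. This is also why the approach fails for strongly nonlinear drifts.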

9.3 Analytical Techniques

9.3.1 Sample Preparation

A major challenge in sampling is that enzymes can turn over metabolites within seconds. Hence, rapid quenching of all metabolic processes is essential to capture metabolite levels close to the in vivo state. In most cases, quenching is achieved by rapid cooling or freezing of samples. Adherent cells are directly exposed to liquid nitrogen or precooled buffers immediately after removal of the growth medium [10]. For suspended cells, the common practice is to cool the suspension, centrifuge at low temperature, discard the supernatant, and snap-freeze the pellets [11, 12]. This procedure works well mainly with baker's yeast, because yeast cells can be quickly cooled to −20 °C in water:methanol without leaking [11]. Most other cell types leak metabolites when exposed to solvents. Therefore, all buffers used for washing or cooling them must be aqueous and kept above the freezing point; as a consequence, quenching is only partial, and residual enzymatic activity biases the measurement. Alternatively, filtration has proved to be a valid approach [13, 14]. The core step of sample preparation is metabolite extraction, which aims at isolating all metabolites of interest and removing unwanted contaminants that disturb quantitation, such as salts, macromolecules (RNA, DNA, proteins), and further debris (membranes). The simplest method is protein precipitation, in which a single solvent or a mixture of miscible solvents with a chaotropic effect is added to the sample. During a period of incubation and vigorous shaking, macromolecules become disordered and metabolites are released from cells and subcellular compartments. Subsequent centrifugation pellets the precipitated proteins together with any material that is insoluble in the extraction solvent system. Various solvents are used in metabolomics. Methanol, acetonitrile, or ethanol mixed with water are commonly used to extract polar and very hydrophilic metabolites. Butanol, isopropanol, methyl-tert-butyl ether (MTBE), dichloromethane, and chloroform are commonly used in miscible combinations to extract lipids and hydrophobic metabolites [15, 16]. Protein precipitation is simple and ideal for automation on liquid-handling platforms. However, this method comes at the cost of incomplete separation of hydrophilic and hydrophobic metabolites. Moreover, protein precipitation is never complete and some proteins persist in the clear supernatant. Both issues can interfere with MS detection (e.g. through matrix effects). In contrast to protein precipitation, liquid–liquid extraction (LLE) adds two immiscible solvents sequentially to the sample. LLE follows the same steps of incubation, shaking, and centrifugation. The hallmark of LLE is that after centrifugation the solvents separate into distinct phases. The relative partitioning of analytes depends on several factors, some of which can be adjusted to enrich particular sets of metabolites over contaminants. One important factor is the octanol–water partition coefficient, usually reported as log P [4], i.e. the logarithm of the ratio of the concentrations of an analyte in octanol and in water mixed in equal volumes. Metabolites with negative log P are hydrophilic, while lipids and lipophilic metabolites have positive log P. Further relevant factors are the charge of the compounds, governed by the pH of the extraction buffer and the dissociation constants (pKa) of ionizable groups, as well as temperature, ionic strength, etc. Eventually, this unequal partitioning drives hydrophilic and hydrophobic metabolites into the aqueous and the organic phase, respectively. Traditionally, the upper phase is aqueous because it has a lower density than the organic phase, which segregates at the bottom. This happens, for example, with the widely adopted LLE procedures by Folch et al. [17] and by Bligh and Dyer [18].
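Two textbook relations, sketched here in Python, make the effect of log P, pH, and pKa on partitioning concrete. The sketch assumes a monoprotic compound of which only the neutral form enters the organic phase. This gives the distribution coefficient log D = log P − log10(1 + 10^(pH − pKa)) for an acid, and log D = log P − log10(1 + 10^(pKa − pH)) for a base. The sketch also treats the octanol–water log P as a proxy for the actual LLE solvent system, which is only an approximation for chloroform or MTBE mixtures. The function names and the example compound are hypothetical.

```python
import math

def log_d(log_p, pka, ph, acid=True):
    """Distribution coefficient of a monoprotic compound (only the neutral form partitions)."""
    exponent = ph - pka if acid else pka - ph
    return log_p - math.log10(1.0 + 10.0 ** exponent)

def fraction_in_organic(log_d_value, v_organic, v_aqueous):
    """Equilibrium fraction of the analyte found in the organic phase."""
    d = 10.0 ** log_d_value
    return d * v_organic / (d * v_organic + v_aqueous)

# Hypothetical carboxylic acid (log P = 1.0, pKa = 4.8) in a 1:1 two-phase system
for ph in (2.0, 4.8, 7.4):
    ld = log_d(1.0, 4.8, ph)
    print(f"pH {ph}: log D = {ld:.2f}, "
          f"organic fraction = {fraction_in_organic(ld, 1.0, 1.0):.3f}")
```

The example shows why acidifying the extraction buffer shifts carboxylic acids toward the organic phase, whereas at neutral pH they remain largely in the aqueous phase.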
Recent protocols with inverted phase order, e.g. the MTBE [19] and BUME [20] protocols, were devised to facilitate transfer of the upper organic phase. LLE enriches hydrophilic and hydrophobic metabolites in the aqueous and organic phases, respectively, and thereby generally provides a cleaner extract for downstream processing. Because of the two phases, however, LLE is more laborious, especially since in several protocols the proteins precipitate at the interphase. This makes it difficult to collect the lower phase without disturbing the protein interphase, and it requires more training. The same principle of metabolite partitioning applies to solid-phase extraction (SPE), but between a solid material and the liquid solvent. Typically, the sample dissolved in buffer is applied to an SPE cartridge or an SPE plate containing sorbent material. The sample is then forced through the solid sorbent by centrifugation or by positive or negative pressure, allowing some metabolites to bind (according to log P, pKa, pH, etc.) while others pass with the flow-through. To enrich for specific classes, different types of sorbents exist: alkyl materials (C4, C8, C18) capture hydrophobic metabolites, charged sorbents (weak and strong ion exchangers) bind hydrophilic compounds, and mixed-mode sorbents adsorb amphipathic metabolites. After binding and washing with a similar buffer, an elution solvent with strong affinity for the metabolites is used to release them from the sorbent. Compared with LLE, SPE provides higher purity and selectivity, which can be achieved by tailoring sorbents and elution solvents to the extraction of specific metabolite classes. SPE is well suited for desalting [21] or lipid removal [22]. SPE also allows the sample to be concentrated by adjusting the elution volume. With other extraction methods, this is only possible by adding a lyophilization (freeze-drying) and resuspension cycle at the end. This procedure, however, has several drawbacks: it adds noise, it is hard to automate, and it equally concentrates all nonvolatile contaminants, thereby increasing the risk of interferences.

Quenched cell pellets and metabolite extracts can be stored frozen for several months. In general, it is advisable to extract all samples in a single batch with identical solvents. In large-scale studies, the best practice is to store quenched but unextracted cell pellets at −80 °C until all samples are collected, and then to extract all samples together at the end.

9.3.2 Separation Techniques

In the analysis of biological samples, separation techniques are widely used to further reduce interferences, to distinguish metabolites that cannot be resolved by MS, and to support the identification of metabolites with retention time information. Chromatography exploits the differential affinity of metabolites for a solid stationary phase immobilized in a column. Affinity arises from hydrophobic, hydrogen-bonding, π–π, and electrostatic interactions. As in extraction, interactions in chromatography are governed by the physicochemical properties of the phases and analytes and by parameters such as temperature and pH. Samples are injected at one end of the column and pushed toward the MS at the other end by a flow of mobile phase. Depending on the strength of binding, each metabolite elutes at a specific time, called its retention time. Chromatographic performance is commonly expressed as the height equivalent to a theoretical plate (HETP), i.e. the column length divided by the number of theoretical plates. The plate number in turn grows with the square of the ratio between retention time and elution peak width. The smaller the HETP, the narrower the peaks and the better the chromatographic separation. The factors affecting the HETP and their relationships are summarized in the van Deemter equation [12]. Based on the nature of the mobile phase, chromatographic methods for metabolomics can be classified into three main categories: liquid, gas, and supercritical fluid chromatography.

9.3.2.1 Liquid Chromatography

In liquid chromatography (LC), pumps deliver a liquid mobile phase to the analytical column containing the solid stationary phase. An injection port is located before the column to introduce samples. Isocratic methods use the same mobile phase throughout the run, but it is more common to change the relative composition of the mobile phase through the coordinated action of multiple pumps. Typically, a gradient from aqueous to organic (or vice versa) is employed because it provides better separation, narrower peaks, and faster runs. The total flow can span orders of magnitude, from nanoliters per minute in nano-LC to a few milliliters per minute in normal-flow LC. When LC is coupled with MS, the lowest flow rates favor sensitivity, while the highest are better suited for high throughput and robustness. Columns also come in different dimensions (length, internal diameter, and particle diameter) to match different flow rates, gradients, peak widths, and backpressures. In the last decade, columns packed with particles of 2 μm diameter and less have become very popular because they feature very low HETP, but they require LC systems capable of operating at about 1000 bar.

The availability of drastically different stationary phases makes LC a very versatile approach to handle different metabolite classes. In reversed-phase liquid chromatography (RPLC), metabolites and lipids are retained by hydrophobic interactions. Stationary phases include siloxane particles coated with C30, C18, C8, or C4 alkyl chains. Further, residual surface charges on the particles are capped by chemical groups to reduce unwanted interactions. Mobile phases are water and organic solvents such as methanol, isopropanol, and acetonitrile. A typical gradient starts with a predominantly aqueous solvent to promote hydrophobic interactions with the stationary phase and, thus, binding. The content of organic phase is gradually increased to almost 100% to elute all bound metabolites. Notably, extreme values such as 0% or 100% water are generally avoided because of the risk of irreversibly collapsing the stationary phase, unless especially resistant columns are used. Also, most silica-based materials are not stable above pH 8.5; to overcome this, most modern columns have their reactive silica groups coated, which expands the usable pH range to 1–12. RPLC is well suited for polar and nonpolar lipids and for metabolites with some hydrophobic moieties, e.g. alkyl chains or aromatic rings. Retention order is predictable on the basis of log P and pKa. Overall, RPLC is very robust and the most commonly used LC mode [23]. The direct complement to RPLC is normal-phase liquid chromatography (NPLC) [24]. In NPLC, the stationary phase is polar and retains the analytes by hydrogen bonding, dipole–dipole interactions, hydrophilic partitioning, and electrostatic interactions. NPLC stationary phases include amino-, cyano-, diol-, and silica-packed columns.
Hexane or dichloromethane are frequently used binding solvents, and isopropanol or ethanol are common eluting solvents. NPLC stationary phases are not very stable at the conditions required for broad-scope metabolomics studies. Instead, hydrophilic interaction liquid chromatography (HILIC) [25] is a variant that is widely used for the separation of polar metabolites with negative log P but offers also advantages for the analysis of lipids [26, 27]. Different from NPLC where nonpolar organic solvents are used, HILIC mobile phases are aprotic polar solvents such as acetonitrile as the noneluting solvents and water or methanol as the eluting solvent. Typical stationary phases are amide-, anionic-, cationic-, and zwitterionic-bonded silica phases [28]. The retention order is less robust than in RPLC, and small changes in solvent composition, ion strength, or pH can lead to unpredictable shifts or loss of resolution. Column equilibration is critical and takes more time and volume than in RPLC. The advantages of HILIC are the separation of polar metabolites, compatibility with sample extraction methods that provide metabolites in organic solvents, and good compatibility with MS because of the high organic content. An alternative approach for the analysis of polar metabolites is ion-pairing RPLC (IP-RPLC), which is similar to RPLC with the addition of an ion-pairing agent to the mobile phase [29]. The ion-pairing agent mediates interactions between the polar metabolites and the hydrophobic stationary phase. Hence, it typically contains a polar or charged group and a nonpolar group such as

9 Metabolomics

alkyl-sulfonates, alkyl-sulfates, or alkyl-ammonium salts. Importantly, the ion-pairing agent should preferably be volatile to reduce interference with MS detection. IP-RPLC has proven very useful for the analysis of primary metabolism, in particular for the challenging separation of isomers such as sugar phosphates [30–32]. The main drawback is that the ion-pairing agent persists in the system, and thorough cleaning of the LC system is necessary before switching to other methods. Ideally, a dedicated LC–MS system should be used for IP-RPLC.

9.3.2.2 Gas Chromatography

In gas chromatography (GC), the mobile phase is an inert carrier gas (N2, He, or Ar) that does not interact with analytes. The column is typically a 10–60 m long fused-silica capillary, coated on the outside with polyimide to increase mechanical robustness. The stationary phase lines the inner walls of the capillary. It is functionalized with a variety of groups (e.g. diphenyl-, cyano-, polyethylene glycol) to modulate polarity. The column is located in an oven with precise temperature control. The sample is injected into the carrier gas flow through a heated inlet, where it is vaporized immediately. The oven temperature is then ramped up to 250–300 °C. Metabolites elute according to their boiling point as well as their interactions with the stationary phase. Retention times are highly reproducible and the separation efficiency is very high. Therefore, GC is one of the standard and most commonly used separation techniques in metabolomics studies [33, 34]. The main drawback of GC is that virtually all metabolites first have to be chemically derivatized, to increase volatility and thermal stability and to decrease analyte polarity. Derivatization modifies chemically reactive groups, such as carboxyl, amino, hydroxyl, and thiol groups, through alkylation, acylation, esterification, or silylation. The most generic derivatization procedure is methoximation followed by silylation with trimethylsilyl (TMS) groups, but other chemistries offer benefits for specific compound classes. The need for derivatization introduces new problems. Full derivatization of large molecules with numerous polar groups can be sterically hindered and thus hard to achieve. Slow, stepwise derivatization of amines yields multiple derivatives of the same metabolite and complicates quantification. Hence, precise timing and automation of derivatization are essential to obtain reliable results.
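Each methoxime or TMS group adds a fixed mass increment. The expected mass of a derivative therefore follows from simple bookkeeping, and incomplete derivatization shows up as additional species at other masses. The sketch below assumes the methoximation/silylation scheme named above and uses monoisotopic atomic masses. The constant and function names are illustrative. Glucose (one carbonyl, five hydroxyls) serves only as a familiar example.

```python
# Monoisotopic atomic masses (u)
H = 1.00782503
C = 12.0
N = 14.00307401
O = 15.99491462
SI = 27.97692653

# Methoximation: R2C=O + H2N-OCH3 -> R2C=N-OCH3 + H2O, net gain of C, 3 H, N per carbonyl
MOX_SHIFT = C + 3 * H + N
# Silylation: one active H is replaced by Si(CH3)3, net gain of Si, 3 C, 8 H per group
TMS_SHIFT = SI + 3 * C + 8 * H


def derivative_mass(neutral_mass: float, n_mox: int, n_tms: int) -> float:
    """Monoisotopic mass of a neutral MeOX/TMS derivative."""
    return neutral_mass + n_mox * MOX_SHIFT + n_tms * TMS_SHIFT


GLUCOSE = 6 * C + 12 * H + 6 * O  # C6H12O6

# Fully derivatized glucose: one methoxime (carbonyl) and five TMS groups (hydroxyls)
print(f"MeOX-5TMS: {derivative_mass(GLUCOSE, 1, 5):.4f}")
# An incompletely silylated species appears at a different mass
print(f"MeOX-4TMS: {derivative_mass(GLUCOSE, 1, 4):.4f}")
```

The same bookkeeping, applied to every candidate, helps to recognize peaks that stem from partial derivatization of one metabolite.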
Overall, GC remains a powerful approach to separate and quantify small compounds that are volatile, or become volatile upon derivatization, such as amino and organic acids, sugars, nucleobases, and fatty acids. Larger metabolites are problematic because of limited thermal stability and incomplete derivatization [35].

9.3.2.3 Alternative Separation Techniques

Several additional techniques are suited for the separation of metabolites. In supercritical fluid chromatography (SFC), the stationary phase is identical to the columns used in LC. The mobile phase is a supercritical fluid, typically CO2, which becomes supercritical under mild conditions (31 °C and 73 bar) [36]. Supercritical fluids combine the capacity of liquids to dissolve metabolites with a diffusivity close to that of gases, which allows higher flow velocities. The combination of both properties

9.3 Analytical Techniques

makes SFC ideal for fast gradient separations. SFC is ideally suited for the separation of lipids and nonpolar metabolites [37]. Modifiers such as methanol can be added to the mobile phase to extend its scope to polar and hydrophilic metabolites [38]. Ion chromatography (IC) relies solely on ionic interactions. To separate anions, the stationary phase carries positively charged groups (e.g. quaternary ammonium) and the mobile phase is a strong base (e.g. potassium hydroxide). To separate cations, the stationary phase carries negatively charged groups (e.g. carboxylic acid groups) and the mobile phase is a strong acid (e.g. methanesulfonic acid). IC is ideal for the separation of organic acids and polyamines that might be poorly retained with other chromatographic techniques [39–41]. However, because these mobile phases are not compatible with MS, an intermediate ion-exchange (suppression) step is essential before the eluent is introduced into the MS detector. An alternative, nonchromatographic separation method used in metabolomics is capillary electrophoresis (CE) [42]. In CE, separation is driven by the electrophoretic mobility of metabolites, which largely depends on their charge and size. In a capillary, surface charges cause a bulk movement of the electrolyte-containing buffer. This effect is called electro-osmosis. The resulting electro-osmotic flow (EOF) has the benefit of a nearly flat, plug-like velocity profile over the whole cross-section of the capillary. Hence, unlike pressure-driven flow, it causes little peak broadening. Importantly, the EOF can be freely modulated by adjusting buffer pH and surface charge. For instance, coating the natively negative capillary walls with neutral groups or with positively charged agents leads to null or reversed EOF, respectively [43]. CE also offers elegant options for sample concentration (stacking) or multiplexing [44, 45]. Overall, CE combines high resolution and sensitivity with broad coverage of the polar metabolome [43, 46]. However, it is less robust than chromatographic methods and therefore requires expert hands.
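The EOF carries all species through the capillary. The electrophoretic mobility of an analyte is therefore usually obtained by subtracting the EOF mobility from its apparent mobility.

The sketch below makes the following assumptions:

- the EOF is measured with a co-injected neutral marker;
- mobility follows the standard relation μ = L_d·L_t/(V·t), with L_d the length to the detector, L_t the total capillary length, V the applied voltage, and t the migration time;
- the capillary dimensions, voltage, and migration times are hypothetical.

```python
def apparent_mobility(t_migration_s: float, length_to_detector_m: float,
                      total_length_m: float, voltage_v: float) -> float:
    """Apparent mobility in m^2/(V*s): mu_app = L_d * L_t / (V * t)."""
    return length_to_detector_m * total_length_m / (voltage_v * t_migration_s)


def electrophoretic_mobility(t_analyte_s: float, t_eof_marker_s: float,
                             length_to_detector_m: float, total_length_m: float,
                             voltage_v: float) -> float:
    """Effective mobility: apparent mobility of the analyte minus that of a neutral EOF marker.

    Positive: the ion migrates in the direction of the EOF and arrives before the marker.
    Negative: the ion migrates against the EOF and arrives after the marker.
    """
    mu_app = apparent_mobility(t_analyte_s, length_to_detector_m, total_length_m, voltage_v)
    mu_eof = apparent_mobility(t_eof_marker_s, length_to_detector_m, total_length_m, voltage_v)
    return mu_app - mu_eof


# Hypothetical run: 60 cm capillary, detector at 50 cm, 25 kV, neutral marker at 300 s
mu_fast = electrophoretic_mobility(200.0, 300.0, 0.5, 0.6, 25_000.0)  # arrives before the EOF
mu_slow = electrophoretic_mobility(400.0, 300.0, 0.5, 0.6, 25_000.0)  # arrives after the EOF
print(f"{mu_fast:.2e} {mu_slow:.2e}")
```

Expressing results as mobilities rather than raw migration times makes runs with slightly different EOF easier to compare.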
The last nonchromatographic separation method of note is ion mobility. It separates ionized metabolites in the gas phase on the basis of their rotationally averaged size, expressed as the collision cross section (CCS) [47, 48]. Ion mobility can be tightly integrated into MS instruments, which, as described later, already provide the means to ionize metabolites and detect them. In drift-time ion mobility separation, ions travel through a drift tube, driven forward by an electric field and opposed by a counterflow of inert gas. The CCS of each ion determines the resistance it experiences and thus its total drift time. Upon calibration with standards, the measured drift time can be converted into a CCS value. This value can be compared with reference [49, 50] or predicted [51] values to support identification. Ion mobility ideally complements chromatographic and MS separation with orthogonal information. It is particularly useful to distinguish structural and positional isomers with different shapes. Another type of ion mobility separation exploits the different behavior of ions in high and low electric fields by applying a high-field asymmetric waveform (FAIMS). Depending on their mobility in these fields, ions drift away from the ideal trajectory that leads to the MS detector. A variable compensation voltage is applied to bring back onto this trajectory only a subset of ions with specific mobility properties. Hence, FAIMS is used as a filter to reduce interference from ions with different shapes [52, 53].
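The calibration step can be sketched as a linear fit. The sketch rests on the following assumptions:

- Under drift-tube conditions, drift time is proportional to CCS·√μ/z, where μ is the reduced mass of the ion and the drift gas. This proportionality follows from the Mason–Schamp relation.
- A constant offset t_fix accounts for time spent outside the drift region. This is the form commonly used for single-field calibration.
- N2 is the drift gas.

The function names and all numbers are illustrative.

```python
import math
from typing import List, Tuple

N2_MASS = 28.006148  # monoisotopic mass of N2 (Da); assumed drift gas


def reduced_mass(ion_mass: float, gas_mass: float = N2_MASS) -> float:
    """Reduced mass of the ion-gas pair (Da)."""
    return ion_mass * gas_mass / (ion_mass + gas_mass)


def fit_single_field(standards: List[Tuple[float, float, int, float]]) -> Tuple[float, float]:
    """Least-squares fit of t_d = beta * CCS * sqrt(mu) / z + t_fix.

    standards: (drift time, ion mass, charge, reference CCS) for each calibrant.
    Returns (beta, t_fix).
    """
    xs = [ccs * math.sqrt(reduced_mass(mass)) / z for _, mass, z, ccs in standards]
    ys = [t_d for t_d, _, _, _ in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_x


def ccs_from_drift_time(t_d: float, ion_mass: float, z: int, beta: float, t_fix: float) -> float:
    """Invert the calibration to estimate the CCS of an unknown ion."""
    return (t_d - t_fix) * z / (beta * math.sqrt(reduced_mass(ion_mass)))


# Synthetic calibrants generated with beta = 0.012 and t_fix = 0.4 (hypothetical units: ms, Å², Da)
calibrants = []
for mass, z, ccs in [(200.0, 1, 140.0), (400.0, 1, 190.0), (800.0, 1, 270.0)]:
    t_d = 0.012 * ccs * math.sqrt(reduced_mass(mass)) / z + 0.4
    calibrants.append((t_d, mass, z, ccs))
beta, t_fix = fit_single_field(calibrants)
print(beta, t_fix)
```

With real data, the calibrants would be measured standards with reference CCS values, and the residuals of the fit indicate calibration quality.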

9.3.3 Mass Spectrometry

Mass spectrometry (MS) has revolutionized the study of metabolites and proteins in terms of speed of analysis and accuracy of characterization and identification. Mass spectrometers use electric and magnetic fields to move, trap, filter, separate, or detect charged species. All MS instruments share only a few features:

- they employ a process to obtain gas-phase ions from the sample;
- they discriminate analytes on the basis of their mass-to-charge ratio (m/z);
- they report intensities as a current that relates to the number of charges detected.

Apart from these core characteristics, mass spectrometers vary in working principle and design. These differences have profound effects on the specifications of the instruments. These include sensitivity, speed, dynamic range, spectral resolution, coverage, and the capacity to support peak identification with multistage MS analysis. Multistage MS refers to selecting ions by m/z, inducing their fragmentation, and measuring the abundance and m/z of the resulting fragments (i.e. MS2). Some instruments allow multiple rounds of isolation and fragmentation (MS3 or MSn) to achieve a finer structural analysis of the analyte. Multistage MS offers key benefits in sensitivity, resolution of compounds with similar m/z, and metabolite identification [6].

9.3.3.1 Ionization Techniques

Ionization is necessary to convert metabolites into gas-phase ions for MS analysis. Electrospray ionization (ESI) is the predominant technique for liquid samples. These include samples separated by LC, SFC, and CE as well as samples introduced by direct infusion or flow injection. In ESI, a strong electric field deforms the liquid at the emitter tip into a so-called Taylor cone, from which a plume of small droplets is emitted [54]. Solvent evaporation and charge repulsion together produce either protonated or deprotonated molecular ions, depending on the polarity of the electric field. The exact mechanism is a matter of debate [55], but basic knowledge is sufficient to understand the key limitations of ESI:

- The ionization process is inefficient. Only analytes at the droplet surface eventually reach the gas phase, which forces analytes to compete and interact. The ionization of a metabolite is therefore affected by the presence of others; for example, lipids suppress the ionization of polar compounds because of their natural propensity to accumulate at the surface. This phenomenon is called the matrix effect. It is an important issue for quantification and justifies the use of isotopically labeled internal standards.
- All measures that promote the formation of small droplets or desolvation improve ionization efficiency and reduce matrix effects. Examples are heating, a higher content of solvents with a low boiling point (e.g. the methanol and acetonitrile used in RPLC), and nanoemitters [56–58].
- The soft ionization conditions in ESI favor the formation of molecular ions, but not exclusively. It is quite common to observe in-source fragments (e.g. from water or CO2 losses), noncovalent adducts with solvent or salt molecules, and multiply charged ions.

Overall, the same compound can generate multiple species with distinct m/z and, thus, multiple spectral peaks. These complicate quantification and must be considered during peak annotation in untargeted metabolomics.
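Because one compound can appear as several ion species, annotation workflows commonly compute the expected m/z of frequent adducts and in-source fragments for each candidate neutral mass. The sketch below makes the following assumptions:

- the adduct table is a small, illustrative subset (real workflows use longer lists);
- the neutral monoisotopic mass of glucose is used as the example input;
- the table and function names are hypothetical.

```python
from typing import Dict, Tuple

PROTON = 1.007276    # proton mass (u)
ELECTRON = 0.000549  # electron mass (u)
SODIUM = 22.989770   # monoisotopic mass of the Na atom (u)
WATER = 18.010565    # monoisotopic mass of H2O (u)

# Adduct name -> (mass added to the neutral molecule M, signed charge)
ADDUCTS: Dict[str, Tuple[float, int]] = {
    "[M+H]+": (PROTON, 1),
    "[M+Na]+": (SODIUM - ELECTRON, 1),
    "[M+H-H2O]+": (PROTON - WATER, 1),  # in-source water loss
    "[M+2H]2+": (2 * PROTON, 2),
    "[M-H]-": (-PROTON, -1),
}


def adduct_mz(neutral_mass: float, adduct: str) -> float:
    """Expected m/z of an ion species derived from a neutral monoisotopic mass."""
    delta, charge = ADDUCTS[adduct]
    return (neutral_mass + delta) / abs(charge)


GLUCOSE_MASS = 180.063388  # C6H12O6, monoisotopic
for name in ADDUCTS:
    print(f"{name:12s} {adduct_mz(GLUCOSE_MASS, name):.4f}")
```

In practice, such lists are matched against the peaks of a spectrum within a mass tolerance, so that peaks belonging to one compound can be grouped before annotation.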

Atmospheric pressure chemical ionization (APCI) is an alternative approach for liquid samples. In contrast to ESI, the solvent is first ionized by a corona discharge needle to produce reactive ion species, which in turn ionize the gas-phase neutral analytes. APCI operates at temperatures up to 500 °C for desolvation. Compared to ESI, APCI is less prone to matrix effects because ions form in the gas phase, without competition in droplets. However, the high temperatures can provoke undesired in-source fragmentation. APCI provides good sensitivity at high LC flow rates for high-throughput applications. Gaseous molecules eluting from GC are ionized by either electron impact (EI) or chemical ionization (CI). In EI, a beam of high-energy electrons impacts analytes to generate radical ions that fragment extensively. This ionization technique is very efficient, insensitive to matrix effects, and reproducible across labs. However, it is largely destructive: most of the charge ends up in fragments, so the intact molecular ion is often weak or absent. The fragmentation spectra obtained by EI are structure-dependent, yet too complex to directly infer the original structure. Therefore, libraries with EI spectra of pure (derivatized) compounds are necessary for identification. CI uses a reactive gas such as methane or ammonia to mediate ionization. CI is softer and therefore allows observation of the molecular ion. For solid samples, two approaches are predominant in our context. Matrix-assisted laser desorption ionization (MALDI) uses a pulsed laser beam to transfer energy to the sample, which is coated with a UV- or IR-absorbing matrix and kept in a vacuum. This process releases a plume of mostly intact, singly charged ions of small to large molecules, which then traverse the MS. Once the cells or metabolite extract have been deposited and dried, the matrix is applied to the slide using an airbrush, an automated sprayer, dried-droplet spotting, or sputter deposition [59].
Modern MALDI is fast and offers high spatial resolution, which is determined by the laser spot size (down to a few micrometers). Therefore, it is well suited for fast screening of samples spotted on a grid or for imaging of a cross-section. The drawback of MALDI is the matrix, which (i) creates a strong background in the low mass range of spectra (m/z < 300) and (ii) crystallizes heterogeneously, causing challenges for quantification. An alternative method is desorption electrospray ionization (DESI), which does not require a matrix but uses a charged solvent spray at atmospheric pressure. A thin layer of solvent builds up at the sample surface and extracts compounds. Secondary droplets carrying analyte ions then splash out of this desorption pool. Singly and multiply charged ions are drawn into the extended capillary and transferred into the mass spectrometer [56]. DESI is representative of a large family of ionization methods that work at ambient pressure [60].

9.3.3.2 Low-Resolution MS

An important building block of mass spectrometers is the quadrupole, which consists of four parallel rod electrodes arranged symmetrically around the ion path. Quadrupoles (and hexapoles, octupoles, etc.) can be operated with oscillating electric potentials at a specific radio frequency (RF), forming fields that keep ions centered and thus transmit them. If a direct-current (DC) potential is superimposed on the RF field, many trajectories become unstable and only a fraction of ions
reaches the exit, as described by the Mathieu equations [61]. Ion stability depends on m/z. As mass analyzers, quadrupoles are tuned to transmit ions in a given m/z window. This window can be as narrow as 0.2 amu but is typically wider to avoid signal loss. The peak width and mass accuracy of quadrupoles are on the order of unit mass, sometimes one decimal. Importantly, a quadrupole is set to a single voltage configuration at a time and therefore can only isolate a specific packet of ions. If multiple m/z windows are desired, the instrument has to cycle through different voltage settings. Modern instruments can cycle in milliseconds, but this does not prevent an intrinsic loss in duty cycle and, hence, sensitivity. Linear ion traps (LITs) are an alternative type of mass analyzer with similar resolution and accuracy, but better suited for quick analysis of a wide mass range. The LIT first confines a packet of ions in an electric field and then selectively excites ions to eject them from the trap. Ejection is m/z dependent. This is exploited to measure the whole spectrum by sequential excitation of m/z ranges, or simply to isolate an m/z window of interest. One drawback of ion traps is that the number of charges they can host in one packet has to be limited to avoid electrostatic interactions between analytes, referred to as space charging. This threshold is reached quickly with complex samples and limits the measurable dynamic range within each spectrum. Both quadrupoles and linear ion traps are used in multistage MS for fragmentation, which is obtained by accelerating ions into an inert collision gas. By slightly different mechanisms, both retain the resulting fragments in the ion path for further analysis. Hence, they are widely used in hybrid instruments for MS2 and MSn acquisition.

9.3.3.3 High-Resolution MS

The mass resolution of quadrupoles or LITs is insufficient to distinguish compounds with m/z differences smaller than 0.5 amu. This is only possible with high-resolution instruments, which resolve peaks within a few millidaltons and report m/z values with sub-ppm accuracy. Accurate-mass high-resolution MS is dominated by three technologies. Time-of-flight (TOF) instruments accelerate ions with a pulse into a field-free region and measure the time they require to reach the detector [62]. The pulse gives all ions of the same charge the same kinetic energy, so heavier ions fly with lower velocity and reach the detector later. The detector is coupled to fast electronics that count the number of ions in discrete time bins of about 1 ns. Pulsing and counting are repeated thousands of times per second, and the accumulated spectra are transferred to the acquisition system. TOF instruments can deliver up to 50–100 full spectra per second with minor losses in resolution. This mass analyzer pairs well with MALDI because both operate in a pulsed mode. The second important high-resolution mass analyzer is the orbital trap (i.e. Orbitrap) [63]. It consists of an inner, spindle-shaped electrode and an outer, coaxial barrel electrode split into two halves at the center. Ions orbit around the inner electrode and, importantly, oscillate along its axis with a frequency that depends on their m/z. The resulting image current is recorded with the two outer electrodes. The recorded signal is converted into frequencies by fast Fourier transform, and the frequencies are then converted into m/z. Resolution depends on how long the oscillations in the trap have been recorded. The resolution capacity is much higher than that of any TOF-MS, in particular for m/z

values 400 amu [119]. As a workaround, it is common practice to query databases of known compounds such as PubChem [120] and ChemSpider [121]. Another option is organism-specific genome-scale reconstructions provided by SEED [122], KEGG [123], and curated repositories [124–126]. Confidence of annotation can be assessed using a target-decoy strategy [118]. The second cue available for metabolite identification is retention time (RT), provided that a separation step is included. RT supports identification if only a few candidates have to be tested, standards for them are available, and they are separable by chromatography. In untargeted metabolomics, however, RT matching for hundreds of metabolites without standards is problematic because RT values depend strongly on the chromatographic system in use. This complicates comparisons across labs or with published values. In GC, the problem is addressed by including retention time standards (e.g. alkanes), which yields a retention index that is largely independent of the specific system and run. In LC, retention indexing works partly for reverse-phase methods [127] but not for systems with more complex retention mechanisms such as HILIC. For RPLC, machine learning allows predicting RT from RT data collected on the same system [128, 129] or on other platforms [130]. Dedicated software exists to assist metabolite annotation with accurate mass and retention time using correlation analysis [131] or Bayesian statistics [132]. The third pillar of metabolite annotation is MS2 (or MSn) data, which is the privileged source of structural information. Automated interpretation of MSn data has evolved swiftly in the last decade [133]. Initially, MSn spectra were annotated manually by experts. This expert knowledge was translated into fragmentation rules that software can apply to simulate MSn spectra from structures. This works well for lipids with LipidBlast [134]. More generally, MS-FINDER applies rules for hydrogen rearrangements [135], as does the commercial software Mass Frontier.
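The alkane-based retention index for temperature-programmed GC is usually computed by linear interpolation between the n-alkanes that elute immediately before and after the analyte. This is the van den Dool–Kratz form of the Kovats index. The sketch below assumes that the alkane ladder was measured in the same run as the analyte. The ladder values are hypothetical, and the function name is illustrative.

```python
from bisect import bisect_right
from typing import Dict


def retention_index(rt: float, alkanes: Dict[int, float]) -> float:
    """Linear retention index of an analyte bracketed by n-alkane standards.

    alkanes maps carbon number -> retention time measured in the same run.
    RI = 100 * (n + (N - n) * (rt - t_n) / (t_N - t_n)) for bracketing alkanes C_n and C_N.
    """
    carbons = sorted(alkanes)
    times = [alkanes[c] for c in carbons]  # alkane RTs increase with carbon number
    i = bisect_right(times, rt) - 1  # last alkane eluting at or before rt
    if i < 0 or i >= len(carbons) - 1:
        raise ValueError("retention time outside the alkane ladder")
    n, n_next = carbons[i], carbons[i + 1]
    t_n, t_next = times[i], times[i + 1]
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


ladder = {10: 5.0, 11: 6.0, 12: 7.5}  # hypothetical alkane retention times (min)
print(retention_index(5.5, ladder), retention_index(6.75, ladder))
```

Because the index is anchored to standards eluting in the same run, it compensates for shifts in flow, column length, or temperature program that would move absolute retention times.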
In parallel, MSn spectral databases have grown massively and now aggregate

9.4 Data Analysis

curated spectra for tens of thousands of pure compounds. Notable resources are NIST [136], METLIN [137], and the community databases MassBank (or MoNA) [138] and GNPS [139]. Measured spectra can be searched directly in these databases using simple similarity criteria [140]. Apart from direct matching, clustering of MS2 spectra has emerged as a successful technique to generate molecular networks, in which structurally similar compounds are linked and form clusters [139, 141]. This approach allows propagating annotations from known compounds to unknowns [142]. Further, including knowledge of metabolic reactions increases the annotation rate [143]. The coverage of databases with measured spectra remains limited compared to the space of molecules [144]. Complementary approaches were therefore developed to annotate MSn spectra starting from structural databases such as PubChem and ChemSpider. For example, combinatorial in silico fragmentation of molecular graphs with MetFrag allows searching large compound libraries for candidates [145, 146]. Notably, machine learning has substantially improved the quality of in silico fragmentation predictions, as demonstrated by the CFM-ID framework [147, 148]. An alternative approach is to infer the presence or absence of substructures (i.e. molecular fingerprints) from MSn data [149]. Structure databases are then searched for molecules with a matching profile [150]. This powerful approach is easily accessible in SIRIUS [151].

9.4.2 Data Analysis and Interpretation

An untargeted metabolomics experiment delivers intensities in arbitrary units for thousands of features, both annotated and unannotated. Therefore, it is important to adhere to statistical standards and properly account for the problem of multiple testing [152]. The number of replicates this requires can quickly become prohibitive [153, 154]. However, the analysis of untargeted metabolomics data usually does not adhere to the strict confidence thresholds that are standard, for example, in clinical studies. This is motivated by the questions for which metabolomics is employed, which focus on generating testable hypotheses on metabolic phenomena. Hence, it is quite common to work with a small number of replicates (two to five) and weak statistical significance. Leads are then followed up experimentally to verify their biological relevance directly. Regardless of the scientific context in which metabolomics data are acquired, a few exploratory steps are of general use for quality control and initial data mining. These are available in all environments used for data science (R, Python, MATLAB) [155]. They are also wrapped by metabolomics-specific solutions such as MetaboAnalyst [156] and Workflow4Metabolomics [157].

9.4.2.1 Univariate Statistics

The simplest form of data analysis is to compare metabolites between known groups, one by one. Notably, changes in metabolite levels are more subtle than those of transcripts (often less than twofold). To evaluate statistical significance, analysis of variance (ANOVA) models and Student's t-tests are frequently used, even though normality is not guaranteed [158]. Given the large number of tests, significance is evaluated after correction for false discovery rates using
the Benjamini–Hochberg [159] or Storey–Tibshirani [160] procedures. The latter method is recommended for studies in which the expected number of true differences is substantial. However, it requires a large number of features to properly fit the p-value distribution [152]. When ranking results, attention should focus more on the magnitude of change (fold change) than on significance, in particular if only a few replicates are available. The ideal outcome of each group comparison is a handful of particularly strong changes. This allows rationally generating hypotheses about the underlying metabolic differences or necessary interventions. If the number of changes is much larger, the analysis continues with a pathway or network enrichment analysis to identify the affected cellular processes. If no significant changes emerge from the univariate analysis, a multivariate analysis is adopted to assess data quality and then search for complex discriminatory patterns.

9.4.2.2 Multivariate Statistics

Multivariate methods fall into two large categories: unsupervised and supervised methods. In metabolomics, unsupervised methods are used for exploratory analyses, that is, to scan the data for predominant groups of samples or metabolites. This is particularly helpful for quality control. Biological replicates are expected to be close together. In contrast, outlier samples could be flawed by artifacts. Unexpected trends might also emerge that call for additional normalization of batch effects or drifts. All techniques of dimension reduction are well suited for exploratory, unsupervised analysis. Principal component analysis is the conceptually simplest method: it rotates the axes to condense correlated variables into a set of principal components [161]. Some data contain multiple uncorrelated variables that cannot be compressed into two to three dimensions. For such data, nonlinear techniques such as stochastic neighbor embedding (e.g. t-SNE) are recommended to obtain a visual representation of sample similarity [162]. Reduced datasets are evaluated visually or in conjunction with clustering algorithms such as hierarchical, k-means, or density-based clustering [163]. Hierarchical clustering is also the basis for plotting data as heatmaps that associate groups of samples with specific changes in metabolites. Supervised methods in untargeted metabolomics are used for feature selection, that is, to identify features that discriminate between predefined sample groups. As an extension of univariate analyses, multivariate analyses allow identifying characteristic patterns in the data. The trend in metabolomics is to avoid overly complex or nonlinear methods because they are prone to identify patterns of dubious biochemical significance. Instead, linear discrimination methods such as partial least squares or support vector machines are preferred because they ease biological interpretation [161].

9.4.2.3 Pathway Analysis

Pathway analysis is a systematic way of determining whether metabolite changes are concentrated in specific metabolic pathways [164]. This analysis becomes relevant in two scenarios: first, when univariate or multivariate analysis identifies too many metabolites for rational interpretation; second, when no marked differences are found with traditional thresholds. In the latter scenario,
pathway analysis aims at identifying more subtle changes, which become significant only because similar changes are also detected in proximal metabolites [165]. Hence, pathway analysis exploits the known network topology and metabolite annotation to project changes into a cellular context, thereby increasing sensitivity and biological significance. The simplest forms of pathway analysis consider pathways as sets, not as organized graphs. Enrichment significance is calculated with Fisher's exact test. Results from such tests depend strongly on arbitrary thresholds. To overcome this, metabolites are ranked by magnitude or significance, and the enrichment test is repeated for all top-N sets [165]. From all tests calculated for a pathway, only the best enrichment value is retained and compared with those of all other pathways. As in univariate analyses, correction for multiple testing may be necessary when large numbers of pathways are tested. The limitation of pathway enrichment analyses is that they rely on an arbitrary pathway definition (often from KEGG). Such definitions do not reflect the architecture of metabolism or its regulatory modules and are thus prone to biases [166]. The alternative approach is to perform pathway analysis on the metabolic network rather than on discrete pathway sets. An example is the Mummichog algorithm for predicting network activity from untargeted data [167]. The approach uses the known metabolic network and searches for subnetworks that concentrate strong metabolome changes. It does so with algorithms devised to detect communities within networks [168]. Because of its recursive nature, the Mummichog approach also helps to resolve annotation ambiguities by exploiting network neighborhoods [167]. Similar methods of network mapping and maximum-weight subgraph extraction have been developed, including some that also incorporate matching data on enzyme levels [169, 170].
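The set-based test and the top-N scan described above can be sketched as follows. A one-sided Fisher's exact test for over-representation equals the upper tail of the hypergeometric distribution, and that tail is what the code evaluates. The sketch assumes that all pathway members belong to the measured universe of metabolites and that the ranking is already given. The function names and the toy data are illustrative.

Note that the best p-value across many cut-offs is optimistic. It is better used to rank pathways, or calibrated for example by permutation, than read at face value.

```python
from math import comb
from typing import List, Set, Tuple


def hypergeom_upper_tail(x: int, n: int, k_pathway: int, n_universe: int) -> float:
    """P(X >= x) when n of n_universe metabolites are selected and k_pathway are in the pathway.

    Equivalent to a one-sided Fisher's exact test for over-representation.
    """
    upper = min(k_pathway, n)
    hits = sum(comb(k_pathway, k) * comb(n_universe - k_pathway, n - k) for k in range(x, upper + 1))
    return hits / comb(n_universe, n)


def best_top_n_enrichment(ranked: List[str], pathway: Set[str],
                          n_universe: int) -> Tuple[float, int]:
    """Repeat the test for every top-N cut of a ranked list and keep the best p-value.

    Assumes every pathway member is part of the measured universe.
    """
    k_pathway = len(pathway)
    best_p, best_n, hits = 1.0, 0, 0
    for n, metabolite in enumerate(ranked, start=1):
        if metabolite in pathway:
            hits += 1
        p = hypergeom_upper_tail(hits, n, k_pathway, n_universe)
        if p < best_p:
            best_p, best_n = p, n
    return best_p, best_n


# Toy example: 10 measured metabolites, 3 of them in the pathway, top 5 ranked changes shown
ranking = ["a", "b", "x", "y", "c"]
print(best_top_n_enrichment(ranking, {"a", "b", "c"}, 10))
```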
An alternative concept for network analysis aims at identifying the metabolic reactions that cause the measured metabolite changes [171]. The algorithm uses Markov random fields to segment the metabolic network into modules that are approximately at chemical equilibrium, and thereby highlights reactions that are far from equilibrium. The latter are likely to be regulated sites and should be investigated in depth with targeted methods to unravel the exact mechanism. These are only a few examples from a lively and rapidly evolving field of network methods.
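The Markov random field method of [171] is not reproduced here. As a simpler illustration of what "far from equilibrium" means quantitatively, the sketch below uses the standard thermodynamic relation Δ_rG = RT ln(Γ/K_eq). Here Γ is the mass-action ratio: the product of product concentrations divided by the product of substrate concentrations, each raised to its stoichiometric coefficient. A Δ_rG close to zero indicates a near-equilibrium reaction. The 5 kJ/mol cutoff, the function names, and the example values are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ/(mol*K)


def reaction_gibbs_energy(mass_action_ratio: float, k_eq: float,
                          temperature_k: float = 298.15) -> float:
    """Delta_r G = RT ln(Gamma / K_eq) in kJ/mol.

    Zero at equilibrium; negative when the reaction is driven in the forward direction.
    """
    return R_KJ * temperature_k * math.log(mass_action_ratio / k_eq)


def far_from_equilibrium(mass_action_ratio: float, k_eq: float,
                         cutoff_kj: float = 5.0) -> bool:
    """Flag a reaction whose |Delta_r G| exceeds a cutoff (hypothetical default)."""
    return abs(reaction_gibbs_energy(mass_action_ratio, k_eq)) > cutoff_kj


# Hypothetical reactions: Gamma from measured concentrations compared with K_eq
print(far_from_equilibrium(0.9, 1.0))   # Gamma close to K_eq -> False
print(far_from_equilibrium(0.01, 1.0))  # Gamma 100-fold below K_eq -> True
```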

9.5 Emerging Trends for Cellular Analyses

9.5.1 High-Throughput Metabolomics for Large Scale Screening

Recent developments in metabolomics open novel opportunities in metabolic engineering. A rapidly evolving area is high-throughput metabolomics, which is likely to affect how screening is applied in strain optimization. To date, a wide palette of options with a broad range of throughputs exists [172, 173]. If a separation step is needed to resolve metabolites or reduce matrix effects, it is challenging to go below five minutes per sample. Faster separations are technically possible, but only practicable if the sample composition is not overly complex
or the analytes of interest are chemically heterogeneous. However, separating similar compounds or analyzing complex matrices takes time and reduces the overall throughput to 200–400 samples per day. Substantially higher throughputs can be achieved by omitting the separation step and instead relying on high-resolution or tandem mass spectrometry to resolve individual metabolites. The simplest form of this is flow injection or direct infusion with electrospray ionization. These systems can analyze about one sample per minute, i.e. 1000–2000 samples per day. This speed is unproblematic for modern mass spectrometers. It is therefore possible to profile the full mass range with high-resolution instruments and detect thousands of features in parallel. These techniques are particularly suited for untargeted, full metabolome analysis of up to 10 000 samples in a few weeks. For example, we used this approach to comprehensively profile metabolite changes in whole single-knockout collections [174]. In vitro, the untargeted approach was instrumental in efficiently screening thousands of putative or known enzymes for unknown catalytic activities [175]. Higher throughputs in the range of 10 000 samples per day are also possible, but require different ionization techniques. Laser-based desorption ionization techniques can analyze a sample within a few seconds, with negligible carryover and time loss between samples. MALDI is widely used in pharmaceutical screening, but the spectral interference of matrix ions can become problematic for applications that focus on small molecules. A valid matrix-free alternative is nanostructure-initiator mass spectrometry (NIMS) [176]. Both techniques require depositing the sample on a solid support and drying it, but can work with minute sample amounts. Hence, their potential is maximized in combination with microfluidics [177, 178].
At this speed, it becomes prohibitive to precisely profile full metabolomes. Hence, this approach is limited to the quantification of a small number of metabolites, up to about a dozen. To date, the highest throughput in metabolomics has been obtained by exploiting the ejection of nanoliter droplets from a liquid reservoir triggered by acoustic waves [179]. This system offers an exceptional sampling rate of three samples per second, which translates into 100 000 samples per day and instrument. Such ultrahigh-throughput systems are used for screening enzymes with specific activities or for screening compound libraries.

9.5.2 Single Cell Metabolomics

Single cell metabolomics is a further trending topic of relevance in metabolic engineering, for instance to investigate clonal effects, the consequences of transcriptional noise, or the emergence of subpopulations. The quantification of metabolites in single cells is limited primarily by the absolute sensitivity of mass spectrometers. Under optimized experimental conditions, expert labs are able to profile dozens of the most abundant metabolites in a single yeast cell using laser desorption [180]. Metabolome analysis of single bacteria remains prohibitive because their small size provides an insufficient number of molecules. In contrast, larger cells such as mammalian cell lines and plant cells provide sufficient

9.6 Applications of Metabolomics in Metabolic Engineering

material and surface to be sampled multiple times and generate two-dimensional images that reflect spatial organization of small molecules within cells. Apart of instrument sensitivity, the practical challenge in single cell metabolomics is to obtain metabolites from a single cell while avoiding dilutions and all further issues that also apply to population metabolomics like cellular stress, starvation, slow extraction, insufficient quenching, etc. This is an area of active developments that heavily relies on techniques for microsampling, droplet microfluidics, and concentration at the micro scale [180–182]. 9.5.3

Dynamic Analysis

Time-resolved metabolomic measurements are increasingly used to infer metabolic regulation in perturbation experiments [3, 4]. Importantly, the sampling rate must be high to describe metabolome dynamics accurately. To investigate fast mechanisms such as allosteric regulation or post-translational modifications, multiple samples per minute are crucial. Because sampling, quenching, and extraction must take at most a few seconds, all methods relying on centrifugation are excluded. Instead, two special types of sampling are used in this case. The first uses filters [3, 183]. Cells are grown in suspension and then transferred to a filter that retains them. To preserve growth and a metabolic steady state, medium is continuously poured onto the cells and sucked through the filter by negative pressure. Step-like perturbations are achieved by switching to different media at predefined times. At any time point, immediate sampling is done by transferring the filter with the cells to a quenching solution, for example, a methanol mix. This procedure generates experiments and samples with precise timing. The drawback is that a separate filter has to be used for each sample, so collecting multiple time points is labor intensive. The second technique is to interface the cultivation system directly with an MS via ESI [4]. Cells must be in suspension and cycle continuously in a low-pressure loop that includes a six-port injection valve. Upon switching the valve, cells are diverted to the ESI source within a fraction of a second and ionized. This system allowed about 250 metabolites of central metabolism to be profiled at a frequency of five samples per minute over extended periods with low noise. The drawback of this approach is that it does not separate medium from cells and is therefore better suited to analyzing cells in minimal media.
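Why sampling frequency matters can be seen with simple arithmetic. The sketch below assumes, purely for illustration, that a metabolite relaxes to a new level after a step perturbation with first-order kinetics and a hypothetical time constant τ; by t = 3τ such a response has completed about 95% of its change. The function names and the τ of 20 s are assumptions, not values taken from the cited studies.

```python
import math


def sampling_interval_s(samples_per_minute):
    """Time between consecutive samples in seconds."""
    return 60.0 / samples_per_minute


def samples_in_transient(samples_per_minute, tau_s, n_tau=3.0):
    """Samples taken within the first n_tau time constants after a step
    perturbation (not counting the sample at t = 0)."""
    return int((n_tau * tau_s) // sampling_interval_s(samples_per_minute))


def relaxation(t_s, tau_s, start, end):
    """Hypothetical first-order approach of a metabolite level from start to end."""
    return end + (start - end) * math.exp(-t_s / tau_s)


if __name__ == "__main__":
    tau = 20.0  # hypothetical time constant in seconds
    for rate in (5.0, 0.5):  # samples per minute
        dt = sampling_interval_s(rate)
        n = samples_in_transient(rate, tau)
        points = [round(relaxation(k * dt, tau, 1.0, 2.0), 2) for k in range(n + 1)]
        print(f"{rate} samples/min -> {n} samples within 3*tau: {points}")
```

At five samples per minute, as reported for the online ESI setup, a transient with τ = 20 s is covered by five samples within 3τ; at one sample every two minutes, the entire transient would fall between two samples.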

9.6 Applications of Metabolomics in Metabolic Engineering

9.6.1 Pathway Design by Thermodynamic Analysis

In the design of biosynthetic pathways for the synthesis of a product of interest (POI), the first step is to choose a host. For hosts that do not natively produce the desired product, the design must include a heterologous pathway that uses endogenous metabolic precursors and cofactors. An absolute requirement for any pathway is thermodynamic feasibility, i.e. the Gibbs free energy change Δr G′ of each included reaction must be negative [184]. Given the pH and ionic strength of each cellular compartment, Δr G′ depends primarily on the concentrations of all involved reactants. The implication is that a reaction (or pathway) can only proceed in a given direction if metabolite concentrations lie within a range that makes this possible. Since metabolic reactions coexist in the same compartment (e.g. the cytoplasm and possibly mitochondria), reaction fluxes are constrained by metabolite concentrations beyond what is predicted by stoichiometry alone [185, 186]. Constraint-based modeling can be extended to include thermodynamic constraints [187] in the calculation of maximum yields. Further, Δr G′ also determines the ratio of forward to backward flux of a reaction, J+/J− = exp(−Δr G′/RT). This so-called flux–force relationship can be exploited to calculate the max-min driving force (MDF) of a pathway [188], i.e. the driving force (−Δr G′) of the least thermodynamically favorable reaction in the pathway, maximized over the physiologically allowed range of metabolite concentrations. A substantial fraction of an enzyme that operates close to equilibrium (Δr G′ ∼ 0) will be busy catalyzing the reaction in the undesired direction. Further from equilibrium, enzymes mostly catalyze flux in the forward direction, and their manipulation is likely to exert a larger effect. The MDF thus provides a concise way of identifying pathways that have favorable thermodynamics, and therefore a lower enzyme cost and a lower burden on the host. In all these thermodynamic analyses, quantitative metabolomics measurements are instrumental in parameterizing the models with physiological concentrations, and provide a more accurate assessment of which pathways are likely to operate or will depend on deviations from the normal concentration range.
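The thermodynamic quantities above can be made concrete in a few lines. The sketch below computes Δr G′ = Δr G′° + RT ln Q from concentrations, the net-flux fraction implied by the flux–force relationship J+/J− = exp(−Δr G′/RT), and the MDF of a linear pathway in which each reaction converts one metabolite into the next without cofactors. The general MDF is usually formulated as a linear program over log-concentrations; the bisection used here is a simplification that only holds for such unbranched, cofactor-free chains. The function names, the 25 °C temperature, and the default concentration bounds of 1 µM to 10 mM are assumptions chosen for illustration.

```python
import math

R = 8.314e-3  # gas constant in kJ mol^-1 K^-1
T = 298.15    # temperature in K (25 degC assumed)
RT = R * T    # about 2.48 kJ/mol


def delta_g(dg0_prime, substrates, products):
    """Reaction Gibbs energy in kJ/mol: dG' = dG'0 + RT * ln(Q).

    substrates and products are concentrations in M, one entry per molecule
    (all stoichiometric coefficients are assumed to be 1).
    """
    ln_q = sum(math.log(c) for c in products) - sum(math.log(c) for c in substrates)
    return dg0_prime + RT * ln_q


def net_flux_fraction(dg_prime):
    """Flux-force relationship J+/J- = exp(-dG'/RT).

    Returns (J+ - J-)/J+ = 1 - exp(dG'/RT): the fraction of forward flux that
    remains as net flux (negative if dG' > 0).
    """
    return 1.0 - math.exp(dg_prime / RT)


def mdf_linear_pathway(dg0_primes, c_min=1e-6, c_max=1e-2, tol=1e-6):
    """Max-min driving force (kJ/mol) of a linear chain S0 -> S1 -> ... -> Sn.

    Assumes each reaction converts one metabolite into the next (no cofactors)
    and every concentration is free within [c_min, c_max] M.
    """
    lo_x, hi_x = math.log(c_min), math.log(c_max)

    def feasible(b):
        # -dG'_i >= b  <=>  x_i - x_(i-1) <= (-dG'0_i - b) / RT, with x = ln(c).
        # Keeping every x_i as high as allowed is optimal for a linear chain.
        x = hi_x
        for dg0 in dg0_primes:
            x = min(hi_x, x + (-dg0 - b) / RT)
            if x < lo_x:
                return False
        return True

    low, high = -1e3, 1e3  # bracket for the driving force b, kJ/mol
    if not feasible(low):
        return None  # no feasible concentration profile even with huge slack
    while high - low > tol:
        mid = (low + high) / 2
        if feasible(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    print(f"{mdf_linear_pathway([0.0]):.1f} kJ/mol")       # one step
    print(f"{mdf_linear_pathway([0.0, 0.0]):.1f} kJ/mol")  # two steps share the range
```

For two reactions with Δr G′° = 0 kJ/mol each, the MDF is about 11.4 kJ/mol, half the 22.8 kJ/mol obtained for a single such reaction: both steps must share the available concentration range, and the MDF is reached when their driving forces are balanced. A driving force of RT ln 10 ≈ 5.7 kJ/mol already leaves 90% of the forward flux as net flux.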
For example, the POPPY software [189] integrates biosynthetic pathway enumeration, network-embedded thermodynamic and MDF analyses, and experimental metabolomics data to quantitatively assess the thermodynamic potential of pathways. The analysis was extended to different hosts to identify optimal pathways in each specific context [189]. An MDF analysis supported by metabolomics data was also applied to Clostridium thermocellum in the fermentation of cellulose to ethanol [190]. This analysis highlighted five reactions as possible bottlenecks for product formation. Modifications of the organism were evaluated by computing the MDF for 336 elementary flux modes in a metabolic model of C. thermocellum, which identified two pathways with a high positive MDF. One of these pathways appeared to be functionally similar to the pathway employed by the ethanol-producing bacterium Thermoanaerobacterium saccharolyticum. In mevalonate-producing Escherichia coli, absolute concentrations of central carbon metabolism intermediates were measured in control and recombinant strains to determine Δr G′ (i.e. the thermodynamic driving force) and substrate saturation for the reactions of central carbon metabolism [191]. The analysis indicated that limited availability of acetyl-CoA and NADPH could restrict mevalonate formation in the recombinant strain, and suggested activation of the Entner–Doudoroff pathway as an optimization strategy. The newly engineered strain with increased Entner–Doudoroff pathway activity achieved a more than 50% higher mevalonate production rate than the parent recombinant strain.

9.6.2 Alleviating Pathway Bottlenecks

Once an adequate host and pathway have been selected, it is key to find the right expression levels for all enzymes and to alleviate other types of regulation that impede flux through the pathway. A straightforward approach is to use metabolomics to identify conspicuous changes within the pathway of interest. Accumulation of an intermediate suggests that the activity of the downstream enzyme might be limiting. Conversely, a decrease of a metabolite might indicate substrate limitation. For example, targeted IP-RPLC-MS/MS metabolomics was used to identify a CoA imbalance that hampered the production of 1-butanol in E. coli [192]. The imbalance was inferred from high levels of compounds such as pyruvate and butanoate. The expression of the bottleneck enzyme, the alcohol dehydrogenase AdhE2, was engineered through a ribosome binding site library, which improved the 1-butanol titer from 15 to 18.3 g/l. In another example, quantification of central carbon metabolism intermediates by targeted LC-MS/MS in Corynebacterium glutamicum identified pyruvate kinase (pyk) as the bottleneck in the co-utilization of l-arabinose with d-glucose [193]. Upon overexpression of pyk and deletion of the genomic araR, which represses genes involved in arabinose uptake and catabolism, equal uptake rates of arabinose and glucose were achieved without compromising glucose utilization. Human intuition can only handle simple patterns; in most cases, formal frameworks are necessary to infer engineering strategies from metabolomics data. Metabolic control analysis (MCA) is a popular mathematical framework that describes how the steady state of a biochemical system depends on infinitesimal changes in parameters, including metabolite and enzyme levels [194, 195]. MCA identifies reactions or parameters that exert strong control over steady-state pathway flux and are therefore good engineering targets.
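The link between a limiting enzyme, intermediate accumulation, and flux control can be illustrated with a toy model. The sketch below assumes a hypothetical two-step pathway S → X → P with clamped S and P and linear reversible mass-action kinetics scaled by enzyme levels e1 and e2; all parameter values are invented. It computes the steady-state intermediate X and flux J, estimates the flux control coefficients C_i = ∂ln J/∂ln e_i by central finite differences, and evaluates the elasticities ε_i = ∂ln v_i/∂ln X. This is not the procedure of the cited studies, which estimate such coefficients from measured perturbation data.

```python
import math

# Hypothetical parameters (arbitrary units) for S -> X -> P with S and P clamped.
PARAMS = {"S": 1.0, "P": 0.1, "k1": 1.0, "km1": 1.0, "k2": 1.0, "km2": 0.1}


def steady_state(e1, e2, p=PARAMS):
    """Steady-state intermediate X and flux J for
    v1 = e1*(k1*S - km1*X) and v2 = e2*(k2*X - km2*P).

    Solving v1 = v2 for X gives the closed form used below.
    """
    x = (e1 * p["k1"] * p["S"] + e2 * p["km2"] * p["P"]) / (e1 * p["km1"] + e2 * p["k2"])
    j = e1 * (p["k1"] * p["S"] - p["km1"] * x)
    return x, j


def flux_control_coefficients(e1, e2, h=1e-4):
    """C_i = dln(J)/dln(e_i), estimated by central differences in log space."""
    coeffs = []
    for i in range(2):
        up, down = [e1, e2], [e1, e2]
        up[i] *= 1 + h
        down[i] *= 1 - h
        d_ln_j = math.log(steady_state(*up)[1]) - math.log(steady_state(*down)[1])
        coeffs.append(d_ln_j / (math.log(1 + h) - math.log(1 - h)))
    return coeffs


def elasticities(e1, e2, p=PARAMS):
    """eps_i = dln(v_i)/dln(X) of each reaction, evaluated at steady state."""
    x, _ = steady_state(e1, e2, p)
    eps1 = -p["km1"] * x / (p["k1"] * p["S"] - p["km1"] * x)
    eps2 = p["k2"] * x / (p["k2"] * x - p["km2"] * p["P"])
    return eps1, eps2


if __name__ == "__main__":
    for e2 in (1.0, 0.1):  # baseline vs. tenfold lower downstream enzyme
        x, j = steady_state(1.0, e2)
        c1, c2 = flux_control_coefficients(1.0, e2)
        print(f"e2={e2}: X={x:.2f}, J={j:.3f}, C1={c1:.2f}, C2={c2:.2f}")
```

Lowering e2 tenfold raises X from about 0.51 to 0.91 and shifts most flux control to the downstream enzyme (C2 ≈ 0.91), which is the pattern the accumulation heuristic described above relies on. The estimated coefficients also satisfy the summation theorem (C1 + C2 = 1) and the connectivity theorem (C1·ε1 + C2·ε2 = 0), so toy models of this kind are handy for sanity-checking coefficients estimated from data.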
Both steady-state metabolomics and short-term perturbation experiments accompanied by metabolome analysis are commonly used to calculate the flux control coefficients and elasticity coefficients that are at the heart of MCA. For example, MCA was applied to optimize tryptophan production in E. coli strains grown on glycerol [196]. Strains were perturbed with four different metabolites (glycerol, glucose, succinate, and pyruvate), and the concentrations of 56 metabolites were monitored over time. MCA identified the last step of serine biosynthesis as a likely bottleneck and a larger supply of the precursor metabolite PRPP as another engineering target. MCA requires knowledge of the steady-state fluxes through a pathway, which need to be either measured or estimated. An alternative approach that does not require flux estimation was taken in engineering the hexosamine biosynthetic pathway in Bacillus subtilis for the production of N-acetylglucosamine [197]. Time-resolved metabolome data were used to fit an ensemble of small phenomenological models based on ordinary differential equations (ODEs). Each model represented a different hypothesis of a pathway bottleneck. This relatively data-light approach allowed the identification of a futile cycle involving ATP. Knocking out the responsible glucokinase more than doubled the production of N-acetylglucosamine. Whereas this example used ODE models to describe coarse hypotheses, the same type of model can also describe the enzyme kinetics of whole metabolic networks in detail. Naturally, this increases the demand for data (i.e. metabolite and enzyme quantities, kinetic parameters, fluxes) and the computational difficulty. An excellent example is the investigation of 1,4-butanediol production in E. coli using large-scale kinetic modeling [198].

9.6.3 Reduction of Side Products and Metabolite Damage

A fundamental tradeoff between catalytic rate and specificity implies that many enzymes catalyze side reactions and produce toxic or otherwise wasteful products [199]. These so-called damage reactions are favored in pathways that carry high flux or entail large metabolite pools [200], both of which are pursued in metabolic engineering. Since damage reactions also occur natively in any organism, repair enzymes have evolved to neutralize them [200, 201]. Because such side products are poorly predictable, untargeted metabolomics remains the method of choice to discover unexpected side products hidden in the unknown fraction of detectable features [202]. Even if a side product has a negligible effect on host fitness, productivity, or yield, it can negatively affect product quality. One such example is wine making, in which yeasts play the key role in ethanol fermentation but can spoil the product with byproducts that affect its taste. Such an undesired effect occurred in yeast strains that produce reduced-ethanol wines, and prompted a multiomics investigation to improve their sensory properties [203]. Metabolomics combined with olfactory evaluation by expert assessors identified 2,4,5-trimethyl-1,3-dioxolane (TMDX) as an additional off-flavor metabolite. TMDX is produced through a spontaneous damage reaction between acetaldehyde and 2,3-butanediol. Since no enzyme is involved, TMDX formation could only be reduced by making the reaction less thermodynamically favorable. This strategy was successfully pursued by knocking out a dehydrogenase responsible for the bulk of the formation of the reaction substrate 2,3-butanediol.

9.6.4 Improving Stress Tolerance

A common goal in metabolic engineering is to engineer strains that are better suited to the constraints imposed by a production process. One key aspect is to improve the tolerance of the host to process conditions or product formation. An example is S. cerevisiae strains used to produce ethanol from lignocellulosic biomass. Their growth, and thus their rate of ethanol production, suffers from the toxicity of weak organic acids such as acetic acid and formic acid, which are released during solubilization of lignocellulosic material. Targeted absolute quantification of central carbon metabolites of S. cerevisiae using CE-MS and GC–MS revealed elevated pentose phosphate pathway (PPP) intermediates upon addition of acetic acid [204]. This indicated an unexpected thermodynamic or kinetic hurdle in the nonoxidative PPP. A strain overexpressing the PPP enzyme transaldolase TAL1 coped better with the stress induced by acetic and formic acid and produced markedly more ethanol. Enhanced tolerance can also be achieved by investigating endogenous mechanisms that confer stress tolerance. Yarrowia lipolytica exhibits natural tolerance to ionic liquids. Through adaptive laboratory evolution, strains with enhanced tolerance were obtained and analyzed by an array of methods including metabolomics and lipidomics [204]. Profound rearrangements of the membrane composition were detected, including an increase in sterol content supported by upregulated expression of the genes of the corresponding biosynthetic pathway. Pharmacological inhibition of this pathway severely sensitized the cells and, hence, confirmed the relevance of sterols in the adaptation to ionic liquids. Untargeted metabolomics was also employed to elucidate the mechanisms of osmotic stress tolerance. In E. coli, it led to the discovery that ubiquinone natively accumulates with increasing salt concentration to stabilize the membrane and withstand osmotic stress [205]. The same effect was obtained by external supplementation in the absence of endogenous biosynthesis. In a follow-up study, metabolomics was applied to 15 species to unravel specific osmoprotection strategies depending on the species' taxonomy, habitat, and envelope structure [206].

9.6.5 Engineer Medium Composition

By monitoring metabolic state, metabolomics offers unique opportunities to improve process conditions. A representative example of process development involved the bioproduction of the immunosuppressant tacrolimus with Streptomyces tsukubaensis [207]. Untargeted metabolomics was used to profile 98 metabolites in cultures with low and high productivities. A partial least squares model revealed 13 metabolites that were statistically associated with the product. The highest correlations were found for methylmalonyl-CoA and shikimate. A rational feeding strategy designed on the basis of these results increased the yield from 251 to 405 mg l−1. Concomitantly, the formation of the two main byproducts decreased by 30–40%. The same principles can be applied to the optimization of growth media for mammalian cells. For example, a metabolomics study of chicken embryo fibroblasts led to a 40% increase in cell density and a 2.7-fold increase in vaccine production [208].
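The statistical step in such studies can be sketched as follows. For autoscaled data, the weight vector of the first PLS1 component is proportional to X′y, i.e. to the correlation of each metabolite with the product, so ranking metabolites by the absolute first-component weight is a simple way to shortlist feeding candidates. This is a minimal, dependency-free sketch and not the model used in [207]: a full PLS analysis would fit several components, use cross-validation, and typically rank features by VIP scores. The metabolite names and data below are hypothetical.

```python
import statistics


def autoscale(values):
    """Center a variable and scale it to unit sample standard deviation.

    A constant variable (standard deviation 0) must be removed beforehand.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pls1_first_weights(X, y, names):
    """Weights of the first PLS1 component, w = X'y / ||X'y||, on autoscaled data.

    X is a list of samples (each a list of metabolite levels), y the product
    titer per sample. Returns (name, weight) pairs, largest |weight| first.
    """
    columns = [autoscale(col) for col in zip(*X)]  # one list per metabolite
    y_scaled = autoscale(y)
    cov = [sum(a * b for a, b in zip(col, y_scaled)) for col in columns]
    norm = sum(c * c for c in cov) ** 0.5
    weights = [c / norm for c in cov]
    return sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: six cultures, three metabolites, product titer per culture.
    X = [[2.0, 5.0, 3.0], [4.1, 4.0, 1.0], [5.9, 5.0, 2.0],
         [8.2, 4.0, 2.0], [9.9, 5.0, 1.0], [12.1, 4.0, 3.0]]
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    for name, w in pls1_first_weights(X, y, ["met_A", "met_B", "met_C"]):
        print(f"{name}: {w:+.2f}")
```

In this toy data set, met_A rises with the titer and receives the largest positive weight, met_B is weakly anticorrelated, and met_C carries essentially no weight. A positive association only nominates a metabolite for a feeding experiment; causality still has to be tested in the bioreactor.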

9.7 Final Remarks

The metabolomics toolbox has greatly expanded in the past two decades. The analytical possibilities now offer a full range of options: from quantitative measurements to broad-scope profiling, from polar to lipophilic compounds, and from high-throughput to real-time and time-resolved analyses. In metabolic engineering, quantitative metabolomics data allow thermodynamic and kinetic models to be used for predictive analyses that guide pathway optimization. In parallel, large-scale metabolomics studies with tens of thousands of mutants are within reach, and hold promise to support screening for better producers while unraveling the intracellular states that are beneficial for product formation. This offers novel opportunities for pathway engineering, in particular when combined with the tools offered by synthetic biology.

References
1 Chubukov, V., Gerosa, L., Kochanowski, K., and Sauer, U. (2014). Coordination of microbial metabolism. Nat. Rev. Microbiol. 12 (5): 327–340.
2 Gerosa, L. and Sauer, U. (2011). Regulation and control of metabolic fluxes in microbes. Curr. Opin. Biotechnol. 22 (4): 566–575.
3 Link, H., Kochanowski, K., and Sauer, U. (2013). Systematic identification of allosteric protein-metabolite interactions that control enzyme activity in vivo. Nat. Biotechnol. 31 (4): 357–361.
4 Link, H., Fuhrer, T., Gerosa, L. et al. (2015). Real-time metabolome profiling of the metabolic switch between starvation and growth. Nat. Methods 12 (11): 1091–1097.
5 Zampar, G.G., Kummel, A., Ewald, J. et al. (2013). Temporal system-level organization of the switch from glycolytic to gluconeogenic operation in yeast. Mol. Syst. Biol. 9: 651.
6 Patti, G.J., Yanes, O., and Siuzdak, G. (2012). Innovation: metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13 (4): 263–269.
7 Berg, T. and Strand, D.H. (2011). (13)C labelled internal standards – a solution to minimize ion suppression effects in liquid chromatography-tandem mass spectrometry analyses of drugs in biological samples? J. Chromatogr. A 1218 (52): 9366–9374.
8 Wu, L., Mashego, M.R., van Dam, J.C. et al. (2005). Quantitative analysis of the microbial metabolome by isotope dilution mass spectrometry using uniformly 13C-labeled cell extracts as internal standards. Anal. Biochem. 336 (2): 164–171.
9 Dunn, W.B., Broadhurst, D., Begley, P. et al. (2011). Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 6 (7): 1060–1083.
10 Pinu, F.R., Villas-Boas, S.G., and Aggio, R. (2017). Analysis of intracellular metabolites from microorganisms: quenching and extraction protocols. Metabolites 7 (4): 53.
11 Canelas, A.B., Ras, C., ten Pierick, A. et al. (2008). Leakage-free rapid quenching technique for yeast metabolomics. Metabolomics 4 (3): 226–239.
12 Ewald, J.C., Heux, S., and Zamboni, N. (2009). High-throughput quantitative metabolomics: workflow for cultivation, quenching, and analysis of yeast in a multiwell format. Anal. Chem. 81 (9): 3623–3629.


13 Taymaz-Nikerel, H., de Mey, M., Ras, C. et al. (2009). Development and application of a differential method for reliable metabolome analysis in Escherichia coli. Anal. Biochem. 386 (1): 9–19.
14 Bordag, N., Janakiraman, V., Nachtigall, J. et al. (2016). Fast filtration of bacterial or mammalian suspension cell cultures for optimal metabolomics results. PLoS One 11 (7): e0159389.
15 Alshehry, Z.H., Barlow, C.K., Weir, J.M. et al. (2015). An efficient single phase method for the extraction of plasma lipids. Metabolites 5 (2): 389–403.
16 Gil, A., Zhang, W., Wolters, J.C. et al. (2018). One- vs two-phase extraction: re-evaluation of sample preparation procedures for untargeted lipidomics in plasma samples. Anal. Bioanal. Chem. 410 (23): 5859–5870.
17 Folch, J., Lees, M., and Sloane Stanley, G.H. (1957). A simple method for the isolation and purification of total lipides from animal tissues. J. Biol. Chem. 226 (1): 497–509.
18 Bligh, E.G. and Dyer, W.J. (1959). A rapid method of total lipid extraction and purification. Can. J. Biochem. Physiol. 37 (8): 911–917.
19 Matyash, V., Liebisch, G., Kurzchalia, T.V. et al. (2008). Lipid extraction by methyl-tert-butyl ether for high-throughput lipidomics. J. Lipid Res. 49 (5): 1137–1146.
20 Lofgren, L., Stahlman, M., Forsberg, G.B. et al. (2012). The BUME method: a novel automated chloroform-free 96-well total lipid extraction method for blood plasma. J. Lipid Res. 53 (8): 1690–1700.
21 Johnson, W.M., Soule, M.C.K., and Kujawinski, E.B. (2017). Extraction efficiency and quantification of dissolved metabolites in targeted marine metabolomics. Limnol. Oceanogr. Methods 15 (4): 417–428.
22 Sitnikov, D.G., Monnin, C.S., and Vuckovic, D. (2016). Systematic assessment of seven solvent and solid-phase extraction methods for metabolomics analysis of human plasma by LC-MS. Sci. Rep. 6: 38885.
23 Naser, F.J., Mahieu, N.G., Wang, L.J. et al. (2018). Two complementary reversed-phase separations for comprehensive coverage of the semipolar and nonpolar metabolome. Anal. Bioanal. Chem. 410 (4): 1287–1297.
24 Kamleh, A., Barrett, M.P., Wildridge, D. et al. (2008). Metabolomic profiling using Orbitrap Fourier transform mass spectrometry with hydrophilic interaction chromatography: a method with wide applicability to analysis of biomolecules. Rapid Commun. Mass Spectrom. 22 (12): 1912–1918.
25 Alpert, A.J. (1990). Hydrophilic-interaction chromatography for the separation of peptides, nucleic acids and other polar compounds. J. Chromatogr. 499: 177–196.
26 Guder, J.C., Schramm, T., Sander, T., and Link, H. (2017). Time-optimized isotope ratio LC-MS/MS for high-throughput quantification of primary metabolites. Anal. Chem. 89 (3): 1624–1631.
27 Tang, D.Q., Zou, L., Yin, X.X., and Ong, C.N. (2016). HILIC-MS for metabolomics: an attractive and complementary approach to RPLC-MS. Mass Spectrom. Rev. 35 (5): 574–600.
28 Wernisch, S. and Pennathur, S. (2016). Evaluation of coverage, retention patterns, and selectivity of seven liquid chromatographic methods for metabolomics. Anal. Bioanal. Chem. 408 (22): 6079–6091.


29 Horvath, C., Melander, W., Molnar, I., and Molnar, P. (1977). Enhancement of retention by ion-pair formation in liquid-chromatography with nonpolar stationary phases. Anal. Chem. 49 (14): 2295–2305.
30 Lu, W.Y., Clasquin, M.F., Melamud, E. et al. (2010). Metabolomic analysis via reversed-phase ion-pairing liquid chromatography coupled to a stand alone Orbitrap mass spectrometer. Anal. Chem. 82 (8): 3212–3221.
31 Buescher, J.M., Moco, S., Sauer, U. et al. (2010). Ultrahigh performance liquid chromatography-tandem mass spectrometry method for fast and robust quantification of anionic and aromatic metabolites. Anal. Chem. 82 (11): 4403–4412.
32 Coulier, L., Bas, R., Jespersen, S. et al. (2006). Simultaneous quantitative analysis of metabolites using ion-pair liquid chromatography-electrospray ionization mass spectrometry. Anal. Chem. 78 (18): 6573–6582.
33 Fiehn, O. (2016). Metabolomics by gas chromatography-mass spectrometry: combined targeted and untargeted profiling. Curr. Protoc. Mol. Biol. 114: 30.4.1–30.4.32.
34 Beale, D.J., Pinu, F.R., Kouremenos, K.A. et al. (2018). Review of recent developments in GC-MS approaches to metabolomics-based research. Metabolomics 14 (11): 152.
35 Buescher, J., Czernik, D., Ewald, J.C. et al. (2009). Cross-platform comparison of methods for quantitative metabolomics of primary metabolism. Anal. Chem. 81 (6): 2135–2143.
36 Klesper, E., Corwin, A.H., and Turner, D.A. (1962). High pressure gas chromatography above critical temperatures. J. Org. Chem. 27 (2): 700–701.
37 Lisa, M. and Holcapek, M. (2018). UHPSFC/ESI-MS analysis of lipids. Methods Mol. Biol. 1730: 73–82.
38 Shulaev, V. and Isaac, G. (2018). Supercritical fluid chromatography coupled to mass spectrometry – a metabolomics perspective. J. Chromatogr. B Anal. Technol. Biomed. Life Sci. 1092: 499–505.
39 Schwaiger, M., Rampler, E., Hermann, G. et al. (2017). Anion-exchange chromatography coupled to high-resolution mass spectrometry: a powerful tool for merging targeted and non-targeted metabolomics. Anal. Chem. 89 (14): 7667–7674.
40 Petucci, C., Zelenin, A., Culver, J.A. et al. (2016). Use of ion chromatography/mass spectrometry for targeted metabolite profiling of polar organic acids. Anal. Chem. 88 (23): 11799–11803.
41 Wang, J., Christison, T.T., Misuno, K. et al. (2014). Metabolomic profiling of anionic metabolites in head and neck cancer cells by capillary ion chromatography with Orbitrap mass spectrometry. Anal. Chem. 86 (10): 5116–5124.
42 Zhang, W., Hankemeier, T., and Ramautar, R. (2017). Next-generation capillary electrophoresis-mass spectrometry approaches in metabolomics. Curr. Opin. Biotechnol. 43: 1–7.
43 Soga, T., Ueno, Y., Naraoka, H. et al. (2002). Simultaneous determination of anionic intermediates for Bacillus subtilis metabolic pathways by capillary electrophoresis electrospray ionization mass spectrometry. Anal. Chem. 74 (10): 2233–2239.


44 Grochocki, W., Markuszewski, M.J., and Quirino, J.P. (2016). Different detection and stacking techniques in capillary electrophoresis for metabolomics. Anal. Methods 8 (6): 1216–1221.
45 Kuehnbaum, N.L., Kormendi, A., and Britz-McKibbin, P. (2013). Multisegment injection-capillary electrophoresis-mass spectrometry: a high-throughput platform for metabolomics with high data fidelity. Anal. Chem. 85 (22): 10664–10669.
46 Sasaki, K., Sagawa, H., Suzuki, M. et al. (2019). Metabolomics platform with capillary electrophoresis coupled with high-resolution mass spectrometry for plasma analysis. Anal. Chem. 91 (2): 1295–1301.
47 Sinclair, E., Hollywood, K.A., Yan, C. et al. (2018). Mobilising ion mobility mass spectrometry for metabolomics. Analyst 143 (19): 4783–4788.
48 Levy, A.J., Oranzi, N.R., Ahmadireskety, A. et al. (2019). Recent progress in metabolomics using ion mobility-mass spectrometry. TrAC Trends Anal. Chem. 116: 274–281.
49 Paglia, G., Williams, J.P., Menikarachchi, L. et al. (2014). Ion mobility derived collision cross sections to support metabolomics applications. Anal. Chem. 86 (8): 3985–3993.
50 Picache, J.A., Rose, B.S., Balinski, A. et al. (2019). Collision cross section compendium to annotate and predict multi-omic compound identities. Chem. Sci. 10 (4): 983–993.
51 Plante, P.L., Francovic-Fontaine, E., May, J.C. et al. (2019). Predicting ion mobility collision cross-sections using a deep neural network: DeepCCS. Anal. Chem. 91 (8): 5191–5199.
52 Szykula, K.M., Meurs, J., Turner, M.A. et al. (2019). Combined hydrophilic interaction liquid chromatography-scanning field asymmetric waveform ion mobility spectrometry-time-of-flight mass spectrometry for untargeted metabolomics. Anal. Bioanal. Chem. 411 (24): 6309–6317.
53 Jonasdottir, H.S., Papan, C., Fabritz, S. et al. (2015). Differential mobility separation of leukotrienes and protectins. Anal. Chem. 87 (10): 5036–5040.
54 Fenn, J.B., Mann, M., Meng, C.K. et al. (1990). Electrospray ionization-principles and practice. Mass Spectrom. Rev. 9 (1): 37–70.
55 Konermann, L., Ahadi, E., Rodriguez, A.D., and Vahidi, S. (2013). Unraveling the mechanism of electrospray ionization. Anal. Chem. 85 (1): 2–9.
56 Peacock, P.M., Zhang, W.J., and Trimpin, S. (2017). Advances in ionization for mass spectrometry. Anal. Chem. 89 (1): 372–388.
57 Schmidt, A., Karas, M., and Dulcks, T. (2003). Effect of different solution flow rates on analyte ion signals in nano-ESI MS, or: when does ESI turn into nano-ESI? J. Am. Soc. Mass Spectrom. 14 (5): 492–500.
58 Liuni, P. and Wilson, D.J. (2011). Understanding and optimizing electrospray ionization techniques for proteomic analysis. Expert Rev. Proteomics 8 (2): 197–209.
59 Oviano, M. and Bou, G. (2019). Matrix-assisted laser desorption ionization-time of flight mass spectrometry for the rapid detection of antimicrobial resistance mechanisms and beyond. Clin. Microbiol. Rev. 32 (1): e00037-18.


60 Clendinen, C.S., Monge, M.E., and Fernandez, F.M. (2017). Ambient mass spectrometry in metabolomics. Analyst 142 (17): 3101–3117.
61 Sudakov, M.Y. and Mamontov, E.V. (2016). Analysis of the quadrupole mass filter with quadrupole excitation by the envelope equation method. Tech. Phys. 61 (11): 1715–1723.
62 Boesl, U. (2017). Time-of-flight mass spectrometry: introduction to the basics. Mass Spectrom. Rev. 36 (1): 86–109.
63 Zubarev, R.A. and Makarov, A. (2013). Orbitrap mass spectrometry. Anal. Chem. 85 (11): 5288–5296.
64 Eilera, J., Cesar, J., Chimiak, L. et al. (2017). Analysis of molecular isotopic structures at high precision and accuracy by Orbitrap mass spectrometry. Int. J. Mass Spectrom. 422: 126–142.
65 Ghaste, M., Mistrik, R., and Shulaev, V. (2016). Applications of Fourier transform ion cyclotron resonance (FT-ICR) and Orbitrap based high resolution mass spectrometry in metabolomics and lipidomics. Int. J. Mol. Sci. 17 (6) https://doi.org/10.3390/ijms17060816.
66 Kirwan, J.A., Weber, R.J.M., Broadhurst, D.I., and Viant, M.R. (2014). Direct infusion mass spectrometry metabolomics dataset: a benchmark for data processing and quality control. Sci. Data 1 https://doi.org/10.1038/sdata.2014.12.
67 Domingo-Almenara, X., Montenegro-Burke, J.R., Ivanisevic, J. et al. (2018). XCMS-MRM and METLIN-MRM: a cloud library and public resource for targeted analysis of small molecules. Nat. Methods 15 (9): 681–684.
68 Yuan, M., Breitkopf, S.B., Yang, X.M., and Asara, J.M. (2012). A positive/negative ion-switching, targeted mass spectrometry-based metabolomics platform for bodily fluids, cells, and fresh and fixed tissue. Nat. Protoc. 7 (5): 872–881.
69 Koelmel, J.P., Kroeger, N.M., Gill, E.L. et al. (2017). Expanding lipidome coverage using LC-MS/MS data-dependent acquisition with automated exclusion list generation. J. Am. Soc. Mass Spectrom. 28 (5): 908–917.
70 Fenaille, F., Saint-Hilaire, P.B., Rousseau, K., and Junot, C. (2017). Data acquisition workflows in liquid chromatography coupled to high resolution mass spectrometry-based metabolomics: where do we stand? J. Chromatogr. A 1526: 1–12.
71 Bonner, R. and Hopfgartner, G. (2019). SWATH data independent acquisition mass spectrometry for metabolomics. TrAC, Trends Anal. Chem. 120 https://doi.org/10.1016/j.trac.2018.10.014.
72 Nash, W.J. and Dunn, W.B. (2019). From mass to metabolite in human untargeted metabolomics: recent advances in annotation of metabolites applying liquid chromatography-mass spectrometry data. TrAC, Trends Anal. Chem. 120: 115324.
73 MacLean, B., Tomazela, D.M., Shulman, N. et al. (2010). Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26 (7): 966–968.
74 Smith, C.A., Want, E.J., O’Maille, G. et al. (2006). XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal. Chem. 78 (3): 779–787.


75 Gowda, H., Ivanisevic, J., Johnson, C.H. et al. (2014). Interactive XCMS online: simplifying advanced metabolomic data processing and subsequent statistical analyses. Anal. Chem. 86 (14): 6931–6939.
76 Pluskal, T., Castillo, S., Villar-Briones, A., and Oresic, M. (2010). MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics 11: 395.
77 Rost, H.L., Sachsenberg, T., Aiche, S. et al. (2016). OpenMS: a flexible open-source software platform for mass spectrometry data analysis. Nat. Methods 13 (9): 741–748.
78 Tsugawa, H., Cajka, T., Kind, T. et al. (2015). MS-DIAL: data-independent MS/MS deconvolution for comprehensive metabolome analysis. Nat. Methods 12 (6): 523–526.
79 Adusumilli, R. and Mallick, P. (2017). Data conversion with ProteoWizard msConvert. Methods Mol. Biol. 1550: 339–368.
80 Martens, L., Chambers, M., Sturm, M. et al. (2011). mzML – a community standard for mass spectrometry data. Mol. Cell. Proteomics 10 (1): R110.000133.
81 Mathur, R. and O’Connor, P.B. (2009). Artifacts in Fourier transform mass spectrometry. Rapid Commun. Mass Spectrom. 23 (4): 523–529.
82 Savitzky, A. and Golay, M.J.E. (1964). Smoothing + differentiation of data by simplified least squares procedures. Anal. Chem. 36 (8): 1627–1639.
83 Du, P., Kibbe, W.A., and Lin, S.M. (2006). Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching. Bioinformatics 22 (17): 2059–2065.
84 Tautenhahn, R., Bottcher, C., and Neumann, S. (2008). Highly sensitive feature detection for high resolution LC/MS. BMC Bioinformatics 9: 504.
85 Yu, T.W. and Jones, D.P. (2014). Improving peak detection in high-resolution LC/MS metabolomics data using preexisting knowledge and machine learning approach. Bioinformatics 30 (20): 2941–2948.
86 Manier, S.K., Keller, A., and Meyer, M.R. (2019). Automated optimization of XCMS parameters for improved peak picking of liquid chromatography-mass spectrometry data using the coefficient of variation and parameter sweeping for untargeted metabolomics. Drug Test. Anal. 11 (6): 752–761.
87 Libiseller, G., Dvorzak, M., Kleb, U. et al. (2015). IPO: a tool for automated optimization of XCMS parameters. BMC Bioinformatics 16: 118.
88 Prince, J.T. and Marcotte, E.M. (2006). Chromatographic alignment of ESI-LC-MS proteomics data sets by ordered bijective interpolated warping. Anal. Chem. 78 (17): 6140–6152.
89 Tsai, T.H., Tadesse, M.G., Wang, Y., and Ressom, H.W. (2013). Profile-based LC-MS data alignment-a Bayesian approach. IEEE/ACM Trans. Comput. Biol. Bioinform. 10 (2): 494–503.
90 Wandy, J., Daly, R., Breitling, R., and Rogers, S. (2015). Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets. Bioinformatics 31 (12): 1999–2006.
91 Zhang, W.C., Lei, Z.T., Huhman, D. et al. (2015). MET-XAlign: a metabolite cross-alignment tool for LC/MS-based comparative metabolomics. Anal. Chem. 87 (18): 9114–9119.

291

292

9 Metabolomics

92 Kuhl, C., Tautenhahn, R., Bottcher, C. et al. (2012). CAMERA: an inte-

93

94 95 96

97

98

99

100

101

102

103

104 105

106

grated strategy for compound spectra extraction and annotation of liquid chromatography/mass spectrometry data sets. Anal. Chem. 84 (1): 283–289. Senan, O., Aguilar-Mogas, A., Navarro, M. et al. (2019). CliqueMS: a computational tool for annotating in-source metabolite ions from LC-MS untargeted metabolomics data based on a coelution similarity network. Bioinformatics 35 (20): 4089–4097. Stekhoven, D.J. and Buhlmann, P. (2012). MissForest-non-parametric missing value imputation for mixed-type data. Bioinformatics 28 (1): 112–118. Troyanskaya, O., Cantor, M., Sherlock, G. et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics 17 (6): 520–525. Wei, R.M., Wang, J.Y., Jia, E. et al. (2018). GSimp: a Gibbs sampler based left-censored missing value imputation approach for metabolomics studies. PLoS Comput. Biol. 14 (1): e1005973. Shah, J.S., Rai, S.N., DeFilippis, A.P. et al. (2017). Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies. BMC Bioinformatics 18: 114. Wang, W.X., Zhou, H.H., Lin, H. et al. (2003). Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 75 (18): 4818–4826. Cairns, D.A., Thompson, D., Perkins, D.N. et al. (2008). Proteomic profiling using mass spectrometry does normalising by total ion current potentially mask some biological differences? Proteomics 8 (1): 21–27. Lee, J., Park, J., Lim, M.S. et al. (2012). Quantile normalization approach for liquid chromatography-mass spectrometry-based metabolomic data from healthy human volunteers. Anal. Sci. 28 (8): 801–805. Kohl, S.M., Klein, M.S., Hochrein, J. et al. (2012). State-of-the art data normalization methods improve NMR-based metabolomic analysis. Metabolomics 8 (1): S146–S160. Lin, S.M., Du, P., Huber, W., and Kibbe, W.A. (2008). 
Model-based variance-stabilizing transformation for Illumina microarray data. Nucleic Acids Res. 36 (2): e11. Dieterle, F., Ross, A., Schlotterbeck, G., and Senn, H. (2006). Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in H-1 NMR metabonomics. Anal. Chem. 78 (13): 4281–4290. Wehrens, R., Hageman, J.A., van Eeuwijk, F. et al. (2016). Improved batch correction in untargeted MS-based metabolomics. Metabolomics 12 (5): 88. Risso, D., Ngai, J., Speed, T.P., and Dudoit, S. (2014). Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32 (9): 896–902. De Livera, A.M., Sysi-Aho, M., Jacob, L. et al. (2015). Statistical methods for handling unwanted variation in metabolomics data. Anal. Chem. 87 (7): 3606–3615.

References

107 Johnson, W.E., Li, C., and Rabinovic, A. (2007). Adjusting batch effects in

108

109

110

111

112

113

114

115 116

117

118

119

120 121 122

microarray expression data using empirical Bayes methods. Biostatistics 8 (1): 118–127. Haghverdi, L., Lun, A.T.L., Morgan, M.D., and Marioni, J.C. (2018). Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat. Biotechnol. 36 (5): 421–427. Kuligowski, J., Sanchez-Illana, A., Sanjuan-Herraez, D. et al. (2015). Intra-batch effect correction in liquid chromatography-mass spectrometry using quality control samples and support vector regression (QC-SVRC). Analyst 140 (22): 7810–7817. Kirwan, J.A., Broadhurst, D.I., Davidson, R.L., and Viant, M.R. (2013). Characterising and correcting batch variation in an automated direct infusion mass spectrometry (DIMS) metabolomics workflow. Anal. Bioanal. Chem. 405 (15): 5147–5157. Redestig, H., Fukushima, A., Stenlund, H. et al. (2009). Compensation for systematic cross-contribution improves normalization of mass spectrometry based metabolomics data. Anal. Chem. 81 (19): 7974–7980. Sysi-Aho, M., Katajamaa, M., Yetukuri, L., and Oresic, M. (2007). Normalization method for metabolomics data using optimal selection of multiple internal standards. BMC Bioinformatics 8: 93. Li, B., Tang, J., Yang, Q.X. et al. (2017). NOREVA: normalization and evaluation of MS-based metabolomics data. Nucleic Acids Res. 45 (W1): W162–W170. Viant, M.R., Kurland, I.J., Jones, M.R., and Dunn, W.B. (2017). How close are we to complete annotation of metabolomes? Curr. Opin. Chem. Biol. 36: 64–69. Sumner, L.W., Amberg, A., Barrett, D. et al. (2007). Proposed minimum reporting standards for chemical analysis. Metabolomics 3 (3): 211–221. Schymanski, E.L., Jeon, J., Gulde, R. et al. (2014). Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48 (4): 2097–2098. Scheubert, K., Hufsky, F., Petras, D. et al. (2017). Significance estimation for large scale metabolomics annotations by spectral matching. Nat. Commun. 8: 1494. 
Palmer, A., Phapale, P., Chernyavsky, I. et al. (2017). FDR-controlled metabolite annotation for high-resolution imaging mass spectrometry. Nat. Methods 14 (1): 57–60. Kind, T. and Fiehn, O. (2007). Seven Golden rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry. BMC Bioinformatics 8: 105. Kim, S., Chen, J., Cheng, T.J. et al. (2019). PubChem 2019 update: improved access to chemical data. Nucleic Acids Res. 47 (D1): D1102–D1109. Pence, H.E. and Williams, A. (2010). ChemSpider: an online chemical information resource. J. Chem. Educ. 87 (11): 1123–1124. Henry, C.S., DeJongh, M., Best, A.A. et al. (2010). High-throughput generation, optimization and analysis of genome-scale metabolic models. Nat. Biotechnol. 28 (9): 977–982.

293

294

9 Metabolomics

123 Kanehisa, M., Furumichi, M., Tanabe, M. et al. (2017). KEGG: new perspec-

124 125

126

127

128

129

130

131

132

133

134 135

136

137

138

tives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45 (D1): D353–D361. Wishart, D.S., Feunang, Y.D., Marcu, A. et al. (2018). HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 46 (D1): D608–D617. Keseler, I.M., Mackie, A., Santos-Zavaleta, A. et al. (2017). The EcoCyc database: reflecting new knowledge about Escherichia coli K-12. Nucleic Acids Res. 45 (D1): D543–D550. Herrgard, M.J., Swainston, N., Dobson, P. et al. (2008). A consensus yeast metabolic network reconstruction obtained from a community approach to systems biology. Nat. Biotechnol. 26 (10): 1155–1160. Abate-Pella, D., Freund, D.M., Ma, Y. et al. (2015). Retention projection enables accurate calculation of liquid chromatographic retention times across labs and methods. J. Chromatogr. A 1412: 43–51. Broeckling, C.D., Ganna, A., Layer, M. et al. (2016). Enabling efficient and confident annotation of LC-MS metabolomics data through MS1 spectrum and time prediction. Anal. Chem. 88 (18): 9226–9234. Bach, E., Szedmak, S., Brouard, C. et al. (2018). Liquid-chromatography retention order prediction for metabolite identification. Bioinformatics 34 (17): 875–883. Stanstrup, J., Neumann, S., and Vrhovsek, U. (2015). PredRet: prediction of retention time by direct mapping between multiple chromatographic systems. Anal. Chem. 87 (18): 9421–9428. Uppal, K., Walker, D.I., and Jones, D.P. (2017). xMSannotator: an R package for network-based annotation of high-resolution metabolomics data. Anal. Chem. 89 (2): 1063–1067. Del Carratore, F., Schmidt, K., Vinaixa, M. et al. (2019). Integrated probabilistic annotation: a Bayesian-based annotation method for metabolomic profiles integrating biochemical connections, isotope patterns, and adduct relationships. Anal. Chem. 91 (20): 12799–12807. Kind, T., Tsugawa, H., Cajka, T. et al. (2018). Identification of small molecules using accurate mass MS/MS search. Mass Spectrom. Rev. 37 (4): 513–532. Kind, T., Liu, K.H., Lee, D.Y. et al. (2013). 
LipidBlast in silico tandem mass spectrometry database for lipid identification. Nat. Methods 10 (8): 755–758. Tsugawa, H., Kind, T., Nakabayashi, R. et al. (2016). Hydrogen rearrangement rules: computational MS/MS fragmentation and structure elucidation using MS-FINDER software. Anal. Chem. 88 (16): 7946–7958. Yang, X.Y., Neta, P., and Stein, S.E. (2017). Extending a tandem mass spectral library to include MS2 spectra of fragment ions produced in-source and MSn spectra. J. Am. Soc. Mass Spectrom. 28 (11): 2280–2287. Guijas, C., Montenegro-Burke, J.R., Domingo-Almenara, X. et al. (2018). METLIN: a technology platform for identifying knowns and unknowns. Anal. Chem. 90 (5): 3156–3164. Horai, H., Arita, M., Kanaya, S. et al. (2010). MassBank: a public repository for sharing mass spectral data for life sciences. J. Mass Spectrom. 45 (7): 703–714.

References

139 Wang, M.X., Carver, J.J., Phelan, V.V. et al. (2016). Sharing and commu-

140 141

142

143

144

145

146

147

148

149

150

151

152 153 154

nity curation of mass spectrometry data with global natural products social molecular networking. Nat. Biotechnol. 34 (8): 828–837. Stein, S.E. (1995). Chemical substructure identification by mass-spectral library searching. J. Am. Soc. Mass Spectrom. 6 (8): 644–655. Watrous, J., Roach, P., Alexandrov, T. et al. (2012). Mass spectral molecular networking of living microbial colonies. Proc. Natl. Acad. Sci. U. S. A. 109 (26): E1743–E1752. Ernst, M., Kang, K.B., Caraballo-Rodriguez, A.M. et al. (2019). MolNetEnhancer: enhanced molecular networks by integrating metabolome mining and annotation tools. Metabolites 9 (7) https://doi.org/10.3390/metabo9070144. Shen, X.T., Wang, R.H., Xiong, X. et al. (2019). Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics. Nat. Commun. 10: 1516. https://doi.org/10.1038/s41467-019-09550-x. Frainay, C., Schymanski, E.L., Neumann, S. et al. (2018). Mind the gap: mapping mass spectral databases in genome-scale metabolic networks reveals poorly covered areas. Metabolites 8 (3): 51. https://doi.org/10.3390/ metabo8030051. Wolf, S., Schmidt, S., Muller-Hannemann, M., and Neumann, S. (2010). In silico fragmentation for computer assisted identification of metabolite mass spectra. BMC Bioinformatics 11: 148. Ruttkies, C., Schymanski, E.L., Wolf, S. et al. (2016). MetFrag relaunched: incorporating strategies beyond in silico fragmentation. J. Cheminformatics 8: 3. Djoumbou-Feunang, Y., Pon, A., Karu, N. et al. (2019). CFM-ID 3.0: significantly improved ESI-MS/MS prediction and compound identification. Metabolites 9 (4): 72. Allen, F., Pon, A., Greiner, R., and Wishart, D. (2016). Computational prediction of electron ionization mass spectra to assist in GC/MS compound identification. Anal. Chem. 88 (15): 7689–7697. Heinonen, M., Shen, H.B., Zamboni, N., and Rousu, J. (2012). Metabolite identification and molecular fingerprint prediction through machine learning. Bioinformatics 28 (18): 2333–2341. 
Duhrkop, K., Shen, H.B., Meusel, M. et al. (2015). Searching molecular structure databases with tandem mass spectra using CSI: FingerID. Proc. Natl. Acad. Sci. U. S. A. 112 (41): 12580–12585. Duhrkop, K., Fleischauer, M., Ludwig, M. et al. (2019). SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat. Methods 16 (4): 299–302. Krzywinski, M. and Altman, N. (2014). Points of significance: comparing samples-part II. Nat. Methods 11 (4): 355–356. Krzywinski, M. and Altman, N. (2013). Points of significance: power and sample size. Nat. Methods 10 (12): 1139–1140. Blaise, B.J., Correia, G., Tin, A. et al. (2016). Power analysis and sample size determination in metabolic phenotyping. Anal. Chem. 88 (10): 5179–5188.

295

296

9 Metabolomics

155 Mendez, K.M., Pritchard, L., Reinke, S.N., and Broadhurst, D.I. (2019).

156

157

158 159

160 161 162 163 164

165

166 167

168 169

170

171

172

Toward collaborative open data science in metabolomics using Jupyter notebooks and cloud computing. Metabolomics 15 (10): 125. Chong, J., Soufan, O., Li, C. et al. (2018). MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis. Nucleic Acids Res. 46 (W1): W486–W494. Giacomoni, F., Le Corguille, G., Monsoor, M. et al. (2015). Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics. Bioinformatics 31 (9): 1493–1495. Krzywinski, M. and Altman, N. (2013). Significance, P values and t-tests. Nat. Methods 10 (11): 1041–1042. Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate – a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat Methodol. 57 (1): 289–300. Storey, J.D. and Tibshirani, R. (2003). Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 100 (16): 9440–9445. Worley, B. and Powers, R. (2013). Multivariate analysis in metabolomics. Curr. Metab. 1 (1): 92–107. Kobak, D. and Berens, P. (2019). The art of using t-SNE for single-cell transcriptomics. Nat. Commun. 10: 5416. Altman, N. and Krzywinski, M. (2017). Clustering. Nat. Methods 14 (6): 545–546. Khatri, P., Sirota, M., and Butte, A.J. (2012). Ten years of pathway analysis: current approaches and outstanding challenges. PLoS Comput. Biol. 8 (2): e1002375. Subramanian, A., Tamayo, P., Mootha, V.K. et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U. S. A. 102 (43): 15545–15550. Deo, R.C., Hunter, L., Lewis, G.D. et al. (2010). Interpreting metabolomic profiles using unbiased pathway models. PLoS Comput. Biol. 6 (2): e1000692. Li, S.Z., Park, Y., Duraisingham, S. et al. (2013). Predicting network activity from high throughput metabolomics. PLoS Comput. Biol. 9 (7) https://doi .org/10.1371/journal.pcbi.1003123. Newman, M.E.J. (2006). 
Modularity and community structure in networks. Proc. Natl. Acad. Sci. U. S. A. 103 (23): 8577–8582. Jha, A.K., Huang, S.C.C., Sergushichev, A. et al. (2015). Network integration of parallel metabolic and transcriptional data reveals metabolic modules that regulate macrophage polarization. Immunity 42 (3): 419–430. Sergushichev, A.A., Loboda, A.A., Jha, A.K. et al. (2016). GAM: a web-service for integrated transcriptional and metabolic network analysis. Nucleic Acids Res. 44 (W1): W194–W200. Kuehne, A., Mayr, U., Sevin, D.C. et al. (2017). Metabolic network segmentation: a probabilistic graphical modeling approach to identify the sites and sequential order of metabolic regulation from non-targeted metabolomics data. PLoS Comput. Biol. 13 (6): e1005577. de Raad, M., Fischer, C.R., and Northen, T.R. (2016). High-throughput platforms for metabolomics. Curr. Opin. Chem. Biol. 30: 7–13.

References

173 Fuhrer, T. and Zamboni, N. (2015). High-throughput discovery metabolomics.

Curr. Opin. Biotechnol. 31: 73–78. 174 Fuhrer, T., Zampieri, M., Sevin, D.C. et al. (2017). Genomewide landscape

175

176

177

178

179

180 181 182

183 184 185

186

187

188

189

of gene-metabolome associations in Escherichia coli. Mol. Syst. Biol. 13 (1): 907. Sevin, D.C., Fuhrer, T., Zamboni, N., and Sauer, U. (2017). Nontargeted in vitro metabolomics for high-throughput identification of novel enzymes in Escherichia coli. Nat. Methods 14 (2): 187–194. Greving, M.P., Patti, G.J., and Siuzdak, G. (2011). Nanostructure-initiator mass spectrometry metabolite analysis and imaging. Anal. Chem. 83 (1): 2–7. Heinemann, J., Deng, K., Shih, S.C.C. et al. (2017). On-chip integration of droplet microfluidics and nanostructure-initiator mass spectrometry for enzyme screening. Lab Chip 17 (2): 323–331. Kuster, S.K., Fagerer, S.R., Verboket, P.E. et al. (2013). Interfacing droplet microfluidics with matrix-assisted laser desorption/ionization mass spectrometry: label-free content analysis of single droplets. Anal. Chem. 85 (3): 1285–1289. Sinclair, I., Bachman, M., Addison, D. et al. (2019). Acoustic mist ionization platform for direct and contactless ultrahigh-throughput mass spectrometry analysis of liquid samples. Anal. Chem. 91 (6): 3790–3794. Zenobi, R. (2013). Single-cell metabolomics: analytical and biological perspectives. Science 342 (6163): 1243259. Duncan, K.D., Fyrestam, J., and Lanekoff, I. (2019). Advances in mass spectrometry based single-cell metabolomics. Analyst 144 (3): 782–793. Ali, A., Abouleila, Y., Shimizu, Y. et al. (2019). Single-cell metabolomics by mass spectrometry: advances, challenges, and future applications. TrAC Trends Anal. Chem. 120 https://doi.org/10.1016/j.trac.2019.02.033. Yuan, J., Bennett, B.D., and Rabinowitz, J.D. (2008). Kinetic flux profiling for quantitation of cellular metabolic fluxes. Nat. Protoc. 3 (8): 1328–1340. Alberty, R.A. (2003). Thermodynamics of Biochemical Reactions, 397. Hoboken, NJ: Wiley-Interscience. ix. Kummel, A., Panke, S., and Heinemann, M. (2006). Putative regulatory sites unraveled by network-embedded thermodynamic analysis of metabolome data. Mol. Syst. Biol. 2: 2006.0034. 
Zamboni, N., Kummel, A., and Heinemann, M. (2008). anNET: a tool for network-embedded thermodynamic analysis of quantitative metabolome data. BMC Bioinformatics 9: 199. Henry, C.S., Broadbelt, L.J., and Hatzimanikatis, V. (2007). Thermodynamics-based metabolic flux analysis. Biophys. J. 92 (5): 1792–1805. Noor, E., Bar-Even, A., Flamholz, A. et al. (2014). Pathway thermodynamics highlights kinetic obstacles in central metabolism. PLoS Comput. Biol. 10 (2): e1003483. Asplund-Samuelsson, J., Janasch, M., and Hudson, E.P. (2018). Thermodynamic analysis of computed pathways integrated into the metabolic

297

298

9 Metabolomics

190

191

192

193

194 195

196

197

198

199 200 201

202

203

204

networks of Escherichia coli and Synechocystis reveals contrasting expansion potential. Metab. Eng. 45: 223–236. Dash, S., Olson, D.G., Chan, S.H.J. et al. (2019). Thermodynamic analysis of the pathway for ethanol production from cellobiose in Clostridium thermocellum. Metab. Eng. 55: 161–169. Nagai, H., Masuda, A., Toya, Y. et al. (2018). Metabolic engineering of mevalonate-producing Escherichia coli strains based on thermodynamic analysis. Metab. Eng. 47: 1–9. Ohtake, T., Pontrelli, S., Lavina, W.A. et al. (2017). Metabolomics-driven approach to solving a CoA imbalance for improved 1-butanol production in Escherichia coli. Metab. Eng. 41: 135–143. Kawaguchi, H., Yoshihara, K., Hara, K.Y. et al. (2018). Metabolome analysis-based design and engineering of a metabolic pathway in Corynebacterium glutamicum to match rates of simultaneous utilization of D-glucose and L-arabinose. Microb. Cell Factories 17 (1): 76. Kacser, H. and Burns, J.A. (1973). The control of flux. Symp. Soc. Exp. Biol. 27: 65–104. Wang, L., Birol, I., and Hatzimanikatis, V. (2004). Metabolic control analysis under uncertainty: framework development and case studies. Biophys. J. 87 (6): 3750–3763. Trondle, J., Schoppel, K., Bleidt, A. et al. (2020). Metabolic control analysis of L-tryptophan production with Escherichia coli based on data from short-term perturbation experiments. J. Biotechnol. 307: 15–28. Liu, Y., Link, H., Liu, L. et al. (2016). A dynamic pathway analysis approach reveals a limiting futile cycle in N-acetylglucosamine overproducing Bacillus subtilis. Nat. Commun. 7: 11933. Andreozzi, S., Chakrabarti, A., Soh, K.C. et al. (2016). Identification of metabolic engineering targets for the enhancement of 1,4-butanediol production in recombinant Escherichia coli using large-scale kinetic models. Metab. Eng. 35: 148–159. Tawfik, D.S. (2014). Accuracy-rate tradeoffs: how do enzymes meet demands of selectivity and catalytic efficiency? Curr. Opin. Chem. Biol. 21: 73–80. 
Sun, J.Y., Jeffryes, J.G., Henry, C.S. et al. (2017). Metabolite damage and repair in metabolic engineering design. Metab. Eng. 44: 150–159. Bommer, G.T., Van Schaftingen, E., and Veiga-da-Cunha, M. (2019). Metabolite repair enzymes control metabolic damage in glycolysis. Trends Biochem. Sci. 45: 228–243. Showalter, M.R., Cajka, T., and Fiehn, O. (2017). Epimetabolites: discovering metabolism beyond building and burning. Curr. Opin. Chem. Biol. 36: 70–76. Varela, C., Schmidt, S.A., Borneman, A.R. et al. (2018). Systems-based approaches enable identification of gene targets which improve the flavour profile of low-ethanol wine yeast strains. Metab. Eng. 49: 178–191. Hasunuma, T., Sanda, T., Yamada, R. et al. (2011). Metabolic pathway engineering based on metabolomics confers acetic and formic acid tolerance to a recombinant xylose-fermenting strain of Saccharomyces cerevisiae. Microb. Cell Factories 10: 2.

References

205 Sevin, D.C. and Sauer, U. (2014). Ubiquinone accumulation improves

osmotic-stress tolerance in Escherichia coli. Nat. Chem. Biol. 10 (4): 266–272. 206 Sevin, D.C., Stahlin, J.N., Pollak, G.R. et al. (2016). Global metabolic responses to salt stress in fifteen species. PLoS One 11 (2): e0148888. 207 Xia, M.L., Huang, D., Li, S.S. et al. (2013). Enhanced FK506 production in Streptomyces tsukubaensis by rational feeding strategies based on comparative metabolic profiling analysis. Biotechnol. Bioeng. 110 (10): 2717–2730. 208 Lin, J., Yi, X.P., and Zhuang, Y.P. (2019). Medium optimization based on comparative metabolomic analysis of chicken embryo fibroblast DF-1 cells. RSC Adv. 9 (47): 27369–27377.

299

301

10 Genome Editing of Eukarya

Jonathan A. Arnesen¹, Jakob Blæsbjerg Hoof², Helene Faustrup Kildegaard³, and Irina Borodina¹

¹ The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kgs. Lyngby, Denmark
² Department of Biotechnology and Biomedicine, Technical University of Denmark, Kgs. Lyngby, Denmark
³ Department of Mammalian Expression, Novo Nordisk A/S, Maaloev, Denmark

Genome editing is the targeted alteration of an organism's DNA, either by deleting, substituting, or inserting DNA or by changing chromosomal structure. It is used in fundamental studies of gene function, in the construction of cell factories, and in gene therapy [1, 137, 263]. Humans have harnessed naturally occurring mutations for millennia to artificially select improved, domesticated plant lines and animal breeds. Modern genome-editing techniques, however, allow more accurate control of DNA changes and make it possible to modify living organisms by design [18]. The advent of new molecular biology tools, together with the growing number of available eukaryotic genomes, has enabled targeted genetic modification of many organisms, including plants, fungi, animals, and humans. This chapter presents the history and current techniques of genome editing in eukaryotes.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

10.1 Basic Principles of Genome Editing

Genome editing exploits the innate DNA repair mechanisms of living organisms to alter their genetic material. Because DNA breaks can be lethal, organisms rely on enzymatic systems to repair breaks as they arise [2, 3]. DNA repair can proceed via several mechanisms, such as homologous recombination (HR) and non-homologous end joining (NHEJ), with alternative routes such as microhomology-mediated end joining (MMEJ) (Figure 10.1) [4–6]. Both HR and NHEJ can be used in genome-editing strategies; however, HR is particularly useful because it enables defined and precise alteration of genetic sequences.

HR relies on homology between the damaged sequence and a donor sequence to guide repair. When a DNA break occurs, the HR machinery resects the ends of the damaged DNA in the 5′ to 3′ direction, leaving 3′ single-strand overhangs. A 3′ overhang of the damaged DNA then anneals to the complementary strand of the donor DNA in a process known as "strand invasion," in which the strands of the donor DNA are opened to form a D-loop. A new DNA strand can then be synthesized from the invading 3′ end, with the complementary donor strand serving as the template for the DNA polymerase. From there, repair may be completed through several different mechanisms. If the D-loop undergoes second-end capture and forms a double Holliday junction (HJ), genetic material may cross over between the two molecules, depending on how the HJ is resolved [7]. This event can be harnessed by delivering foreign DNA with high homology to the organism's native DNA into the cell, so that HR generates a crossover from the foreign DNA into the organism's chromosomes [138]. Alternatively, the damaged DNA can be repaired by NHEJ or MMEJ, which may cause deletions or insertions at the repaired junction, or chromosomal rearrangements [4, 5, 8, 9].

Many techniques for targeted genome editing therefore rely on HR to transfer genetic material from donor DNA with substantial homology to one or more loci in the recipient genome, thereby altering or adding DNA sequence at the native locus. The donor DNA typically carries regions homologous to the target site that flank the sequence to be introduced, such as a modified sequence or an expression cassette with new genes and their transcriptional elements; depending on the design, integration results in a substitution, a deletion, or an insertion (Figure 10.2) [10, 138, 143]. In addition, a method for introducing DNA into transformation-competent cells and a selection scheme are needed.
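The three outcomes shown in Figure 10.2 all follow from the same donor layout: homology arm, payload, homology arm. The toy model below is a minimal sketch of that idea, not a description of any real tool. It makes several simplifying assumptions: sequences are plain strings, the arms match the genome perfectly and occur only once, and a successful double crossover replaces everything between the two arms with the payload. The 40-bp arm length, the payloads, and the function names `design_donor` and `integrate` are hypothetical and chosen only for illustration.

```python
import random

# Toy model of HR-mediated editing with a linear donor (cf. Figure 10.2).
# Illustrative assumptions: sequences are plain strings, homology arms match
# the genome perfectly, and a double crossover replaces everything between
# the two arms with the donor "payload" (no mismatches, no NHEJ, no selection).

ARM = 40  # hypothetical homology-arm length in bp, chosen for illustration


def design_donor(genome: str, start: int, end: int, payload: str, arm: int = ARM) -> str:
    """Return upstream arm + payload + downstream arm for replacing genome[start:end].

    start == end with a payload -> insertion; empty payload -> deletion;
    otherwise -> substitution.
    """
    if start > end or start - arm < 0 or end + arm > len(genome):
        raise ValueError("target region or homology arms fall outside the genome")
    return genome[start - arm:start] + payload + genome[end:end + arm]


def integrate(genome: str, donor: str, arm: int = ARM) -> str:
    """Simulate the crossover: replace the span between the arms with the payload."""
    if len(donor) < 2 * arm:
        raise ValueError("donor is shorter than its two homology arms")
    left, right = donor[:arm], donor[len(donor) - arm:]
    payload = donor[arm:len(donor) - arm]
    if genome.count(left) != 1:  # str.count counts non-overlapping matches
        raise ValueError("upstream arm must match exactly one genomic locus")
    i = genome.index(left) + arm  # first base after the upstream arm
    j = genome.find(right, i)     # downstream arm must lie 3' of the upstream arm
    if j == -1:
        raise ValueError("downstream arm not found downstream of the upstream arm")
    return genome[:i] + payload + genome[j:]


# Usage on a random 300-bp "genome" (seeded so the run is reproducible).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(300))

inserted = integrate(genome, design_donor(genome, 150, 150, "GAATTC"))  # (a) insertion
deleted = integrate(genome, design_donor(genome, 120, 180, ""))         # (b) deletion
substituted = integrate(genome, design_donor(genome, 100, 103, "TTT"))  # (c) substitution
print(len(genome), len(inserted), len(deleted), len(substituted))
```

The model requires a single match for the upstream arm because the biological requirement is the same: if the arms matched several loci, the donor could recombine at an unintended site.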
Methods for delivering DNA into cells are numerous and varied, and they depend on the organism and cell type being transformed. Some organisms readily take up DNA after relatively simple procedures, whereas others require extensive preparation before transformation can be achieved [187]. Selection distinguishes cells that have taken up the foreign DNA from those that have not. A common approach is to include an antibiotic resistance gene in the donor DNA. When the transformed cells are grown on medium supplemented with the antibiotic, only cells that have taken up the donor DNA survive, which makes it simple to select the correct clones [138, 146].

In some organisms, HR-based integration of foreign DNA is inefficient. In mammalian cells, for example, integration of donor DNA via HR can occur at a frequency of about 1 in 1000 transfected cells with selection [10]. An efficient way to increase HR rates is to generate double-strand breaks (DSBs), either in the donor DNA (often within the flanking homology regions) or in the recipient DNA [11–13]. Donor DNA can easily be cleaved with restriction enzymes before transformation, but cleaving the recipient DNA of living cells is more challenging. This can be achieved by targeting endonucleases to specific genomic sites: the resulting local DSBs trigger HR and improve the efficiency with which the donor DNA integrates at the cut site [14]. Endonuclease-induced DSBs can also be repaired by the more error-prone NHEJ pathway, which introduces mutations. This is itself a valid strategy for mutating genomes and is often used to knock out genes [15, 16]. Meganucleases were among the first endonucleases used to generate DSBs in genomic DNA to improve integration efficiency in mammalian cells [14].
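To see what an integration frequency of about 1 in 1000 transfected cells means in practice, a short calculation helps. The sketch below assumes that each cell integrates the donor independently and with the same probability p. This is a deliberate simplification, and the tenfold boost in the last line is a hypothetical value, not a figure from the text.

```python
import math

# Worked numbers for a rare event such as HR-mediated integration in roughly
# 1 of 1000 transfected mammalian cells. Assumes each cell integrates
# independently with the same probability p (a simplification).


def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent cells carries the integration."""
    return 1.0 - (1.0 - p) ** n


def cells_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest n such that prob_at_least_one(p, n) >= confidence."""
    if not (0.0 < p < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p and confidence must lie strictly between 0 and 1")
    # Solve (1 - p)**n <= 1 - confidence for n; log1p keeps precision for small p.
    return math.ceil(math.log1p(-confidence) / math.log1p(-p))


p_hr = 1 / 1000  # integration frequency quoted in the text for mammalian cells
print(cells_needed(p_hr))       # cells to screen for a 95% chance of >= 1 integrant
print(1_000_000 * p_hr)         # expected integrants among 10**6 transfected cells
print(cells_needed(10 * p_hr))  # hypothetical 10-fold boost, e.g. from a targeted DSB
```

Under these assumptions, about 3000 cells (2995) must be screened for a 95% chance of obtaining at least one integrant. A tenfold higher integration frequency reduces this number to about 300 (299), which is why DSB-induced stimulation of HR matters so much.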


Figure 10.1 DNA double-strand break repair by homologous recombination resulting in a crossover event. Living cells have several mechanisms for DSB repair; of these, HR can lead to a crossover event in which genetic information is exchanged between the damaged DNA and the donor DNA. Repair starts with resection of the DNA ends, which generates 3′ overhangs that can invade the donor DNA and use it as a template for DNA synthesis. Second-end capture of the newly synthesized invading strand results in the formation of a double HJ, which can yield a crossover depending on how it is resolved by endogenous repair enzymes. In a crossover, genetic information is exchanged between the two DNA molecules, and this mechanism underlies HR-mediated genome editing.

[Figure 10.1 schematic: damaged DNA and donor DNA → end resection → strand invasion and DNA synthesis → extended D-loop → second-end capture, DNA synthesis, and ligation → double Holliday junction → HJ resolution → crossover.]

Meganucleases are endonucleases with relatively long recognition sites. The yeast endonuclease I-SceI, with its 18-base-pair (bp) recognition site, has been commonly used [17]. A meganuclease can only be used if its target site is already present in the genome or has been integrated beforehand, which requires additional rounds of transformation or transfection. However, meganucleases have been largely superseded by newer endonucleases that are easier to implement and more flexible, such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas. These tools are described in greater detail in the following sections.

[Figure 10.2 schematic: DNA constructs aligned with the genome, illustrating insertion, deletion, and substitution.]

Figure 10.2 HR-mediated techniques for genome editing. HR-mediated genome editing can produce three different outcomes. (a) Insertion: in addition to homologous regions that target specific loci, the DNA construct can carry new foreign DNA, such as selectable markers or genes of interest. (b) Deletion: the homologous flanking regions can be designed so that part of the genome is removed upon correct integration. The DNA construct will often need to carry a selection marker, unless the target is selectable or endonucleases are used. (c) Substitution: the DNA construct needs to carry a selection marker, unless a selectable locus is targeted or an endonuclease gene is expressed from a plasmid carrying the selection marker.

Targeted genome editing stands in contrast to random or semi-random methods of genome modification. Random approaches have a long history, because mutations arise naturally in genetic material as a result of UV radiation and other environmental factors. For centuries, naturally occurring random mutations have formed the basis of artificial selection of crop plants and domesticated animals [18, 19]. Similarly, chemical mutagens and physical mutagens such as ionizing radiation and ion beams have been used for random mutagenesis [20]. Plants have even been sent to outer space as part of mutagenesis programs, because space radiation induces mutations quite effectively [21]. Random mutagenesis has also been used in yeast to engineer better cell factories [22, 23]. Molecular approaches to semi-random genome modification include transposon-generated mutations: transposons or retrotransposons are transferred into an organism, where they insert into the genome. Their insertion is only semi-random because such transposable elements may prefer certain genomic regions and targets [24–26]. Because of space limitations, this chapter focuses mainly on the principles of targeted genome editing in eukaryotes, with emphasis on organisms amenable to metabolic engineering.

10.2 Endonucleases

10.2.1 Zinc-Finger Nucleases

Figure 10.3 Model of a pair of Cys2His2 zinc fingers. Encircled letters indicate conserved amino acid residues. The zinc ion is coordinated by the conserved cysteine and histidine residues, which folds the amino acid chain into a finger-shaped loop capable of binding DNA.

Zinc fingers were discovered in 1985 as part of the eukaryotic transcription factor IIIA from Xenopus laevis [27]. These DNA-binding domains were named “zinc fingers” because they contain a zinc ion that folds the protein into a loop structure reminiscent of a finger (Figure 10.3). Zinc fingers occur in tandem arrays, which, like a metaphorical “hand,” can “grip” the nucleotide sequence. Each finger binds a specific motif of three consecutive nucleotides, and many different zinc fingers exist with different nucleotide triplet preferences [28, 29]. However, although researchers have postulated an underlying “code” for nucleotide triplet specificity, no simple code has emerged yet [30–32]. To generate targeted DSBs, zinc fingers must be fused with DNA-cleavage domains; these fusions are called zinc-finger nucleases (ZFNs). The commonly used DNA-cleavage domain originates from the bacterial FokI restriction enzyme, which naturally consists of a specific DNA-binding domain and a non-specific DNA-cleavage domain [33]. The FokI cleavage domain was first fused to the Drosophila Ubx homeodomain, which enabled targeted cleavage of nucleotide sequences [34]. Not long after, the FokI cleavage domain was fused to zinc-finger domains, yielding nucleases that cleaved DNA at sites determined by the zinc-finger specificity [35]. To cut efficiently, the FokI cleavage domain needs to dimerize [36, 37]. Therefore, at least two fusion proteins are required, each combining a FokI cleavage domain with a different set of zinc fingers targeted to closely spaced nucleotide triplets [37, 38]. Furthermore, the FokI domain can be mutated to prevent homodimerization, which decreases the likelihood of off-target nuclease activity [39, 40]. Multiple zinc fingers can be fused to target longer specific sequences, which may reduce off-targeting, since long sequences and their close variants are less likely to occur elsewhere in the genome.
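The argument that longer recognition sites reduce chance matches can be made concrete with a back-of-the-envelope sketch. It assumes a random genome with equal base frequencies (real genomes are not random, so the numbers only indicate orders of magnitude), counts both strands, and ignores the spacer between the two ZFN half-sites; `expected_random_hits` is a hypothetical helper, not a published tool.

```python
def expected_random_hits(site_len: int, genome_size: int) -> float:
    """Expected chance occurrences of one specific site of `site_len` bp.

    Assumes a random genome with equal A/C/G/T frequencies and scans both strands,
    so each of ~2 * genome_size positions matches with probability (1/4) ** site_len.
    """
    if site_len <= 0 or genome_size <= 0:
        raise ValueError("site_len and genome_size must be positive")
    return 2 * genome_size / 4 ** site_len


if __name__ == "__main__":
    genomes = {"yeast-sized (12 Mb)": 12_000_000, "human-sized (3.1 Gb)": 3_100_000_000}
    # 9 bp = one three-finger array (3 bp per finger); 18 bp = a pair of such arrays
    for name, size in genomes.items():
        for site_len in (9, 18):
            print(f"{name}, {site_len}-bp site: {expected_random_hits(site_len, size):.3g}")
```

Under these assumptions, a 9-bp site is expected to occur by chance roughly 90 times in a 12-Mb genome, whereas an 18-bp site recognized by a ZFN pair is expected far less than once.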
However, adding too many zinc fingers to a ZFN can reduce its activity, and even well-established zinc fingers may not work together in new combinations [41–43]. The optimal distance between the target sequences of a ZFN pair depends on the fusion protein design. When zinc fingers and FokI cleavage domains are fused directly or with short amino acid linkers, the optimal spacing between the target sequences can be five to six nucleotides, whereas longer linkers result in longer optimal spacings [44, 45]. When ZFNs are used for HR-mediated editing, off-target effects can be reduced by inactivating one FokI monomer of the ZFN pair, which results in nicking of the target site rather than a DSB [46, 47]. Currently, ZFNs are being used in clinical trials for the treatment of diseases such as HIV infection and Hunter's syndrome [1, 48]. The clinical trial on Hunter's syndrome represents an instance of in vivo gene therapy in humans, whereas the earlier trial on HIV used isolated cells that were gene-edited before being reinfused into the subjects. Thus, although ZFNs are now largely obsolete for the generation of cell factories, they still play a substantial role in gene therapy.

10 Genome Editing of Eukarya

10.2.2 Transcription Activator-Like Effector Nucleases

Transcription activator-like effectors (TALEs) were first found in plant-pathogenic bacteria of the genus Xanthomonas, which influence the cellular metabolism of their plant hosts by transferring these proteins into the plant cells [49]. TALEs are delivered into plant cells through the bacterial type III secretion system. After being injected into the cytoplasm, TALEs enter the nucleus and act as transcription factors, modulating gene expression of the plant host to accommodate the infection process [50–52]. TALEs consist of several functional domains residing in the N-terminal, C-terminal, or central region (Figure 10.4). The signals important for translocation and type III secretion are located in the N-terminal region, while the nuclear localization signals (NLS) reside in the C-terminal region [53]. The N-terminal region also contains structures involved in DNA binding [54]. The C-terminally located acidic activation domain is essential for activating gene transcription [55]. The DNA-binding specificity of TALEs is dictated by the central region, which is composed of several repeats [50]. The central region consists of between 1.5 and 33.5 repeats, each of which typically consists of 34 amino acids; the final, shorter repeat toward the C-terminus is considered a “half” repeat [49]. The repeats are highly conserved except at residues 12 and 13, which are highly variable and termed repeat variable diresidues (RVDs) [56, 57]. Each repeat binds a single nucleotide, and the RVD at residues 12 and 13 determines which nucleotide is recognized, depending on its amino acid combination [58, 59]. This has led to the elucidation of a simple code connecting RVDs to target nucleotides. The commonly occurring RVDs in Xanthomonas TALEs are HD, NN, NI, and NG, which bind C, G or A, A, and T, respectively. NK and NH are also used to target G, while other, less specific RVDs can be used to generate TALEs that bind more promiscuously [60, 61]. Binding efficiency also depends on the RVDs used, as RVDs differ in how strongly they bind their respective bases [61–63]. Using mainly weak RVDs, such as NG, NI, and NK, can result in non-binding proteins, but adding a few stronger RVDs, such as NN or HD, can rescue DNA binding. Conveniently, the binding specificity of each repeat is not affected by its neighboring repeats.

Figure 10.4 Domain architecture of a Xanthomonas transcription activator-like effector (TALE). The translocation domain is located at the N-terminus (N). The repeat region varies from 1.5 to 33.5 repeats, the last being a half-repeat. Each repeat contains a repeat variable diresidue (RVD), an amino acid pair that mediates DNA binding. Both the nuclear localization signals (NLS) and the activation domain are located toward the C-terminus (C).

The straightforward code of TALE nucleotide recognition makes it possible to predict the DNA-binding sites of TALEs with known RVDs and to design synthetic TALEs with specific DNA-recognition patterns [64–68]. TALEs generally also require a thymine at the 5′ end preceding the target sequence, although TALEs have been engineered to bind more promiscuously [69]. Like zinc fingers, TALEs can be fused with nuclease domains, such as the non-specific FokI domain, to generate transcription activator-like effector nucleases (TALENs) capable of causing genomic DSBs in living cells [70, 71]. As with ZFNs, TALE–FokI fusions require dimerization of the FokI domains to function properly [72–74]. Therefore, pairs of TALE–FokI proteins are designed to bind separate sequences spaced roughly 12–24 bp apart. Alternatively, engineered TALEs can be fused to transposase or recombinase domains to modify DNA structure [75, 76]. TALEs can also be fused to activator or epigenetic modifier domains to alter gene expression without changing the nucleotide sequence [77, 78]. Increasing the number of repeats in synthetic TALEs can increase DNA-binding efficiency, similarly to using strong RVDs: with RVDs of similar strength, 10.5 repeats were not sufficient to activate a reporter gene, whereas 17.5 repeats were [61]. Repeat number also influences specificity, as shorter target sequences are more likely to occur elsewhere in the genome.
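The one-repeat-one-nucleotide code lends itself to a simple design sketch. The helper names below are hypothetical; the mapping uses only the four common Xanthomonas RVDs listed above, picks NN for G (NN also binds A), and requires the 5′ thymine. It ignores RVD strength, repeat number, and the spacing constraint for TALEN pairs, so it illustrates the code rather than serving as a design tool.

```python
# Common Xanthomonas RVDs and the base(s) each binds (NN binds G or A).
RVD_TO_BASES = {"HD": "C", "NG": "T", "NI": "A", "NN": "GA"}

# One RVD chosen per base for design; using NN for G is an assumption
# (NH or NK are the G-targeting alternatives mentioned in the text).
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_rvds(target: str) -> list[str]:
    """One RVD per base of `target` (the bases following the 5' T)."""
    target = target.upper()
    if not target or set(target) - set(BASE_TO_RVD):
        raise ValueError("target must be a non-empty A/C/G/T string")
    return [BASE_TO_RVD[base] for base in target]


def predicted_sites(rvds: list[str], dna: str) -> list[int]:
    """0-based positions of the 5' T for every site on `dna` matching `rvds`."""
    dna = dna.upper()
    n = len(rvds)
    hits = []
    for i in range(1, len(dna) - n + 1):
        if dna[i - 1] != "T":  # a thymine must precede the target sequence
            continue
        window = dna[i:i + n]
        if all(base in RVD_TO_BASES[rvd] for rvd, base in zip(rvds, window)):
            hits.append(i - 1)
    return hits


if __name__ == "__main__":
    rvds = design_rvds("GACCTG")
    print(" ".join(rvds))                               # NN NI HD HD NG NN
    print(predicted_sites(rvds, "AATGACCTGCCTAACCTG"))  # [2, 11]
```

The second predicted site starts with A instead of G, which the NN repeat also accepts; this is the kind of promiscuity that motivates choosing RVDs carefully.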
Short target sequences could potentially result in off-target effects and toxicity [71]. To improve synthetic TALEs, it is desirable to reduce their size, which can involve truncating the C- and N-terminal domains while retaining the DNA-binding function. The extent to which the C-terminal domain can be truncated seems context-dependent, as studies differ in how many amino acids can be removed without loss of function. In one study, truncating the C-terminal domain to fewer than 63 amino acids (aa) reduced activity by at least 50%, while other studies reported reduced activity when truncating to 22 aa but not to 17, 47, or 95 aa [73, 74, 77]. In contrast, shortening the N-terminal domain by more than 158 aa significantly reduced activity, while retaining 152 aa of the N-terminal domain still seems to result in a functional protein [73, 77]. The N-terminal domain may therefore be more sensitive to deletions than the C-terminal domain. The large number of near-identical repeats required to construct sequence-specific synthetic TALEs poses a challenge for cloning. Methods such as Golden Gate cloning, high-throughput solid-phase assembly, and ligation-independent cloning can be used to overcome this challenge [71, 79, 80]. TALEs do not occur exclusively in Xanthomonas spp. TALEs from other bacteria, such as Ralstonia solanacearum and Burkholderia rhizoxinica, have been studied and characterized [81–83]. Ralstonia TALEs use a different RVD code for binding nucleotides than Xanthomonas TALEs, while Burkholderia TALEs are shorter than Xanthomonas TALEs. These new types of TALEs expand the toolkit with new opportunities [84]. When they were first introduced, TALENs seemed to hold great potential compared with ZFNs, as their codified recognition pattern offered better programmability. However, TALENs have seemingly failed to gain much traction. This can in part be explained by the emergence of CRISPR/Cas shortly afterward, which enabled even greater flexibility and ease of design than TALENs. Still, it may be too early to write off TALENs. As the field of gene editing, in particular gene therapeutics, is still in its infancy, TALENs may hold some inherent benefit for certain applications compared with CRISPR/Cas and ZFNs.

10.2.3 CRISPR/Cas

Clustered regularly interspaced short palindromic repeats, later known as CRISPR, were discovered in Escherichia coli in the 1980s [85, 86]. Spacers of variable sequence occurred between the repeats, and these CRISPR elements lay next to CRISPR-associated (Cas) genes. Later, it was recognized that the variable sequences between the repeats originated from conjugative prokaryotic plasmids, viruses, and transposable elements, and they were hypothesized to be part of an immune defense against such foreign elements [87–90]. It was later discovered that Streptococcus thermophilus can acquire new spacers from viral DNA after phage exposure and that these spacers have a profound effect on immunity, as they help the Cas enzyme target the viral pathogens [91]. More specifically, spacer-derived CRISPR RNAs (crRNAs) are responsible for targeting Cas activity [92]. Importantly, the protospacer adjacent motif (PAM), located immediately downstream of the target site (called the protospacer) in the viral genome, is important for CRISPR/Cas immune activity, and the spacers have to be identical to the protospacers for full functionality [93]. The biotechnological potential of CRISPR/Cas was shown in 2012 by two independent research groups, who demonstrated that by changing the crRNA sequence, the Cas9 enzyme could be directed to specific DNA sequences through base-pairing between the crRNA and the target site [94, 95]. Furthermore, one of the groups showed that the two essential RNA molecules, the crRNA and the trans-activating crRNA (tracrRNA) required for crRNA maturation and full Cas activity, could be combined into a single guide RNA (gRNA or sgRNA). The Cas enzyme can thus be targeted by the gRNA to specific genomic sites, where it generates DSBs (Figure 10.5). In eukaryotes, this generally necessitates the use of an NLS that targets the Cas enzyme to the nucleus [96–100].
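Because Cas9 only cleaves protospacers that sit immediately upstream of a PAM, the first step in choosing a gRNA is to list PAM-adjacent sites. The sketch below assumes S. pyogenes Cas9 (5′-NGG-3′ PAM) and a 20-nt spacer, a common default that is an assumption here; `find_spcas9_sites` is a hypothetical helper, and real gRNA design tools additionally score on-target efficiency and off-target risk.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """(strand, start, protospacer) for every protospacer followed by an NGG PAM.

    `start` is the 0-based position of the protospacer on the strand it is read from
    ("+" = the given sequence, "-" = its reverse complement).
    """
    dna = dna.upper()
    sites = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for i in range(len(seq) - spacer_len - 2):
            pam = seq[i + spacer_len:i + spacer_len + 3]
            if pam[1:] == "GG":  # N of NGG can be any base
                sites.append((strand, i, seq[i:i + spacer_len]))
    return sites


if __name__ == "__main__":
    # Toy example with a 4-nt "spacer" so the result is easy to check by eye
    print(find_spcas9_sites("TTTTACGTTGGAAA", spacer_len=4))  # [('+', 4, 'ACGT')]
```

Scanning both strands matters because a PAM on the reverse strand appears as CCN on the given strand.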
While many CRISPR/Cas systems exist, the most commonly used variant is the type II CRISPR/Cas9 system from Streptococcus pyogenes, in part due to its relatively simple PAM sequence (NGG) [94]. CRISPR/Cas systems from other bacteria, such as Francisella novicida, Neisseria meningitidis, and Campylobacter jejuni, have also been used for genome editing [101–103]. Alternative systems offer benefits such as different PAM preferences, smaller Cas proteins (the C. jejuni Cas9 has only 984 aa, compared with 1368 aa for the S. pyogenes Cas9), and different substrate specificity, as in the case of Cas13, which targets RNA instead of DNA [103, 104]. Cpf1, found in both Acidaminococcus and Lachnospiraceae bacteria, is a Cas12a enzyme that naturally uses a single crRNA without a tracrRNA and creates staggered cuts with 5′ overhangs instead of the blunt ends generated by S. pyogenes Cas9 [105–107].

Figure 10.5 Components of the CRISPR/Cas9 system. The Cas9 protein generates DSBs at the protospacer recognized by the guiding sequence. The guide RNA (gRNA) is a single RNA molecule combining the crRNA, which carries the guiding sequence, and the tracrRNA with its loop structure. The protospacer adjacent motif (PAM), which for S. pyogenes Cas9 is 5′-NGG-3′, needs to be immediately downstream of the protospacer for cleavage to occur.

Recently, a transposon-associated CRISPR system was discovered, in which a catalytically inactive Cas variant targets DNA sequences and forms a complex with the transposon-associated TniQ protein, which mediates transposon integration downstream of the targeted protospacer [108]. This system could potentially be used to integrate DNA without DSBs. The Cas9 enzyme can be catalytically inactivated by mutating both of its nuclease domains, HNH and RuvC, each of which cleaves one DNA strand to generate a DSB [94, 109, 110]. Catalytically inactive Cas9 is known as “dead” Cas9 (dCas9). dCas9 still binds DNA and can be used for a variety of purposes, including transcriptional repression, activation, and DNA mutagenesis, by fusing it to appropriate protein domains [111–113]. Mutating only one of the domains results in a “nicking” Cas9 (nCas9) that generates SSBs rather than DSBs [94]. While finding and utilizing new Cas proteins from different sources is one way of expanding the CRISPR toolkit, new additions can also come from modifying existing enzymes and CRISPR systems. Engineering of Cas enzymes to broaden their targeting options has yielded S. pyogenes Cas9 and Staphylococcus aureus Cas9 variants that accept a wider range of PAM sequences, and an F. novicida Cas9 variant that recognizes the PAM sequence (C/T)G instead of NGG [101, 114, 115]. Several studies have demonstrated that CRISPR/Cas systems often cause off-target mutations and chromosomal rearrangements [8, 9, 116–118]. Therefore, considerable effort has been dedicated to understanding and preventing off-target effects, as these are undesirable in cell factory engineering and utterly unacceptable in gene therapy.
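A first, deliberately simplified way to flag potential off-target sites is to search for PAM-adjacent sequences that differ from the spacer by only a few substitutions. The sketch assumes S. pyogenes Cas9 (NGG PAM), counts mismatches with equal weight at every position, and ignores bulges, non-canonical PAMs, and chromatin state, all of which affect real off-target activity; `candidate_off_targets` and the mismatch cut-off are hypothetical choices for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def candidate_off_targets(spacer: str, dna: str,
                          max_mismatches: int = 3) -> list[tuple[str, int, str, int]]:
    """(strand, start, site, mismatches) for NGG-flanked sites similar to `spacer`.

    Only base substitutions are counted, each position weighted equally.
    The on-target site itself is reported with 0 mismatches.
    """
    spacer, dna = spacer.upper(), dna.upper()
    n = len(spacer)
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for i in range(len(seq) - n - 2):
            if seq[i + n + 1:i + n + 3] != "GG":  # require an NGG PAM after the site
                continue
            site = seq[i:i + n]
            mm = sum(a != b for a, b in zip(spacer, site))
            if mm <= max_mismatches:
                hits.append((strand, i, site, mm))
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    # Toy 4-nt spacer; real spacers are typically ~20 nt
    dna = "TTTTACGTTGGAAAACCTTGGA"
    print(candidate_off_targets("ACGT", dna, max_mismatches=1))
```

Ranking candidates by mismatch count is the simplest possible scoring; practical tools weight mismatches near the PAM more heavily.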
Studies using catalytically inactive Cas9 to map genomic Cas9-binding sites in mammalian cells found that Cas9 tends to bind non-specifically in open chromatin regions, at rates that depend on the particular gRNA used [119, 120]. Several methods have been developed for detecting genomic DSBs potentially caused by CRISPR/Cas activity [121–123]. To prevent off-target effects, mutational approaches have successfully produced more specific Cas enzymes [124, 125]. Off-target effects can also be reduced by delivering the mature gRNA–Cas9 complex directly into cells: the complex is cleared faster than when gRNA and Cas9 are expressed from a plasmid and thus has less time to make off-target DSBs [126, 127]. In another strategy, dCas9 is fused to FokI domains and used in pairs targeting closely spaced sequences; because the FokI domain must dimerize to cleave DNA, this reduces off-targeting [128, 129]. Alternatively, pairs of nCas enzymes can be used similarly to generate a single DSB at a locus targeted by both nCas enzymes [130]. The specificity of CRISPR/Cas can also be increased by changing the length of the gRNA; paradoxically, both shortening and extending the gRNA have been shown to limit off-target occurrences [118, 131]. As will be apparent from the following chapters, CRISPR/Cas has found widespread use in basic research and in the construction of designer cell factories. Having largely replaced both ZFNs and TALENs for the generation of cell factories, CRISPR seems poised to become the pre-eminent universal tool for genome editing. However, the use of CRISPR/Cas for treating genetic diseases has been shrouded in controversy due to recent scandals, such as the unregulated study in which twin girls had their genomes edited to make them resistant to HIV [132, 133]. Nevertheless, while germline editing remains controversial, clinical trials using CRISPR to treat diseases such as sickle cell anemia and β-thalassemia are ongoing [134]. In these trials, cells are removed from the patient's body, gene-edited, and reinfused. Should such trials prove safe and efficacious, they will surely pave the way for many novel CRISPR-based treatments.

10.3 Genome Editing of Industrially Relevant Eukaryotes

10.3.1 Yeast

Yeast has a long history of biotechnological use, ranging from bread baking and beer brewing to the present-day industrial production of pharmaceutical proteins, enzymes, nutraceuticals, fuels, and chemicals [135, 136]. Yeast has also been used by scientists to study the basic cellular functions of eukaryotes [137]. While many different yeast species are used industrially and for basic research, the most widely used is baker's yeast, Saccharomyces cerevisiae. It offers benefits such as a well-annotated genome sequence, high rates of HR, and excellent genome editing tools [138–140]. Owing to these high HR rates, targeted HR-mediated integration has been possible in S. cerevisiae for a long time, even without endonucleases [138, 141, 142]. The most straightforward technique for genetic modification in S. cerevisiae is the integration of a DNA fragment containing the desired modification, a selection marker, and flanking regions homologous to the integration site [142, 143]. This can result in gene deletion, mutation, or the integration of additional genes contained in the fragment. Short flanking regions of 38–50 bp have been used to integrate DNA in S. cerevisiae [138]. Such flanking regions can easily be added by PCR amplification, which eases the cloning procedure. Furthermore, integration rates may be higher for linear DNA than for circular DNA [12]. DNA constructs can also be assembled in vivo by S. cerevisiae, provided that the DNA fragments share overlapping homologous regions [145]. Even though these techniques are rather straightforward, with integration rates high enough to identify positive clones, a potential issue arises from the genomic integration of selectable markers. Developing synthetic yeast systems often requires iterative, step-wise engineering through multiple transformation steps. Integrated, stably expressed marker genes cannot be reused and can sometimes cause undesired side effects, as identical promoter/terminator elements could recombine. Although many selectable markers exist for S. cerevisiae, several strategies have been developed for the subsequent removal of integrated markers [146]. These include counter-selectable markers that can be deleted with DNA fragments in a second transformation step, or counter-selectable markers flanked by direct repeats for HR-mediated removal [148, 149]. Alternatively, loxP sites can be used to remove the marker with Cre recombinase [147]. A key advantage of the CRISPR/Cas system is that the integration of selection markers can be avoided. In a simple example of a CRISPR/Cas-mediated markerless integration system, yeast cells are co-transformed with (i) a circular plasmid containing a Cas gene cassette, a gRNA cassette targeting the desired genomic site, and a selectable marker, and (ii) a linear DNA fragment for integration, carrying flanking regions homologous to the genomic target site and a middle region containing the desired modification [98]. Because genomic DSBs are potentially lethal to the host, integration of the linear fragment is indirectly selected for, as it repairs the DSB caused by the Cas enzyme at the target site (Figure 10.6a).
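The linear donor described above is simply homology arm + desired change + homology arm. The sketch below builds such a donor from a known locus sequence and checks whether the gRNA target (protospacer + PAM) survives in the donor; designing the edit so that the target site is destroyed is a common practice assumed here, since a repaired locus that still carries the site could be cut again. All names and the toy sequences are hypothetical; real arms would be at least the 38–50 bp mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_donor(locus: str, start: int, end: int, insert: str, arm_len: int) -> str:
    """Left arm + insert + right arm, replacing locus[start:end] with `insert`.

    Both homology arms are copied from `locus`; an empty insert gives a clean deletion.
    """
    if not 0 <= start <= end <= len(locus):
        raise ValueError("invalid edit coordinates")
    if start - arm_len < 0 or end + arm_len > len(locus):
        raise ValueError("homology arms extend beyond the supplied locus sequence")
    return locus[start - arm_len:start] + insert + locus[end:end + arm_len]


def target_site_in(seq: str, protospacer_with_pam: str) -> bool:
    """True if the gRNA target (protospacer + PAM) occurs on either strand of `seq`."""
    site, seq = protospacer_with_pam.upper(), seq.upper()
    return site in seq or revcomp(site) in seq


if __name__ == "__main__":
    locus = "AAAACCCCGGGGTTTT"  # toy locus
    target = "CCCGGG"           # toy protospacer+PAM spanning the edited region
    donor = build_donor(locus, start=6, end=10, insert="ATAT", arm_len=4)
    print(donor)                          # AACCATATGGTT
    print(target_site_in(locus, target))  # True: the unedited locus can be cut
    print(target_site_in(donor, target))  # False: the repaired locus lacks the site
```

With realistic sequences, the arms would typically be generated as 5′ tails on PCR primers, as noted in the text.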
After transformation, the transient circular plasmid containing the selectable marker can be lost by cultivation on non-selective media, and the marker can subsequently be reused [150]. Furthermore, although HR is already efficient in S. cerevisiae, using CRISPR/Cas to generate DSBs improves integration rates. For example, integration of a 90-bp double-stranded oligonucleotide was enhanced 130-fold compared with integration without gRNA expression [151]. The integration rate depended on the gRNA sequence used to target the integration site, with considerable variation between different gRNAs. This variation in gRNA efficiency has been corroborated by later studies, which also suggest that the targeted position affects efficiency [152]. The Cas cassette can also be integrated into the yeast genome and stably expressed, which can significantly decrease the size of the circular gRNA-containing plasmids [153]. More than one gRNA cassette can also be included in the plasmid and co-transformed with corresponding repair templates to target several genomic sites at once [150, 154, 155]. For instance, five genomic loci were edited with a single vector harboring five gRNA cassettes, while six genomic loci were edited by co-transforming three gRNA plasmids [150, 153]. Efficiency may drop when targeting the same locus on all homologous chromosomes in polyploid strains compared with diploid strains [98]. The high HR rates of S. cerevisiae can also be used to assemble repair templates or several gRNA cassettes into a single vector in vivo, which shortens preparation time while allowing multiple loci to be targeted [154, 156]. The repair template and the gRNA cassette can also be located on the same DNA fragment [157, 158].

The gRNA can be expressed in yeast from RNA polymerase III promoters, such as the yeast SNR52 promoter or the RPR1 promoter [113, 151]. RNA polymerase III transcripts lack poly(A) tails and cap structures and may therefore stay in the nucleus, where they can guide Cas9 to the target site [170]. Alternatively, RNA polymerase II promoters can be used if combined with specific ribozymes [159]. Ribozymes are RNA molecules that can catalyze chemical reactions, such as self-cleavage. RNA polymerase II promoters generate capped and polyadenylated transcripts; however, the self-cleaving activity of a 5′ hammerhead ribozyme and a 3′ hepatitis delta virus ribozyme releases the gRNA from the larger transcript and enables it to guide the Cas enzyme. Catalytically inactive dCas has also been used to silence gene expression in yeast, as binding of dCas to transcriptional elements can block RNA polymerase activity (Figure 10.6b) [112]. Linking dCas to a transcriptional repressor domain can further downregulate gene expression. Alternatively, dCas can be linked to transcriptional activator domains and used to activate gene expression by targeting a site generally upstream of the TATA box in the promoter region of a gene (Figure 10.6c) [113]. It was also demonstrated that such systems can gradually down- or upregulate gene expression when repressor– or activator–dCas complexes are targeted at variable distances from the promoter TATA box, with the strongest effects often occurring closer to the TATA box [160]. Expanding the gRNA with additional hairpin structures that bind activator proteins can also be used to modulate gene expression in yeast [161]. As an alternative to DSB-mediated mutagenesis, CRISPR-guided base editing can be done in yeast by combining dCas9 or nCas9 with an activation-induced cytidine deaminase (AID) domain [111].
The dCas9–AID complex converts deoxycytidine to deoxyuridine through deamination, resulting in mutations that convert C to either G or T (Figure 10.6d). The mutations mainly occur 16–19 bp upstream of the PAM sequence on the noncomplementary strand. Methods like these offer a potential way to reduce off-target effects compared with DSB-mediated mutagenesis with CRISPR/Cas. CRISPR/Cas can also be used to change the chromosomal structure and localization of genomic loci. By fusing dCas9 to a cohesin domain and a dockerin domain to a nuclear membrane protein, the sequence targeted by dCas9 can be relocated to the nuclear periphery through the cohesin–dockerin interaction [162]. Cas enzymes from other organisms have also been used to engineer yeast, as they allow the use of other PAM sequences. For instance, AsCpf1, LbCpf1, and FnCpf1 from Acidaminococcus, Lachnospiraceae bacterium, and F. novicida, respectively, have been used to edit genomic sequences in S. cerevisiae [163]. In particular, Cas12a (Cpf1) has been used in S. cerevisiae with a single gRNA array consisting of three individual gRNAs targeting different genomic sequences. Cas12a, which also possesses endoribonuclease activity, cleaved the array into single gRNAs, which then guided the Cas12a endonuclease activity to each of the three target sites, resulting in DSB-mediated integration of donor DNA [145]. The development of CRISPR/Cas tools for genome editing has had a profound effect on the use of nonconventional yeast hosts, in which NHEJ rather than HR dominates the repair of DNA DSBs [164–166].
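The stated editing window can be turned into a quick check of which cytosines a given gRNA would place in range. This sketch takes the 16–19 bp window from the text for the dCas9–AID system (other base editors have different windows), expects the noncomplementary strand (the strand that carries the PAM and matches the protospacer), and uses a hypothetical helper name.

```python
def editable_cytosines(seq: str, pam_start: int,
                       nearest: int = 16, farthest: int = 19) -> list[int]:
    """0-based positions of C lying `nearest`..`farthest` nt upstream of the PAM.

    `seq` is the noncomplementary strand and `pam_start` is the index of the
    first PAM base; distance 1 is the base immediately 5' of the PAM.
    """
    seq = seq.upper()
    positions = []
    for distance in range(nearest, farthest + 1):
        pos = pam_start - distance
        if 0 <= pos < len(seq) and seq[pos] == "C":
            positions.append(pos)
    return sorted(positions)


if __name__ == "__main__":
    protospacer = "ACCTC" + "A" * 15  # toy 20-nt protospacer
    site = protospacer + "AGG"         # followed by an NGG PAM at index 20
    print(editable_cytosines(site, pam_start=20))  # [1, 2, 4]
```

For a 20-nt protospacer, this window corresponds to nucleotides 2–5 counted from the 5′ end, so only cytosines near the distal end of the protospacer are candidates.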

CRISPR/Cas tools for genome engineering have been developed for the fission yeast Schizosaccharomyces pombe [167], Yarrowia lipolytica [168], Candida albicans [169], Komagataella phaffii (also known as Pichia pastoris) [170], Kluyveromyces lactis [154], Cryptococcus neoformans [171], Issatchenkia orientalis [172], and the yeast-like fungus Aureobasidium pullulans [173]. To further increase HR rates in nonconventional yeasts, the KU70 or KU80 genes, which are involved in NHEJ repair of DSBs, can be knocked out [165, 166, 174, 175]. Simultaneous editing of multiple genomic loci with CRISPR/Cas systems has been demonstrated in some nonconventional yeast species, with up to three simultaneous edits in Y. lipolytica and two in K. phaffii [170, 176].

10.3.2 Filamentous Fungi

Filamentous fungi have long been used in food production, for example, for soy sauce, sake, and mold-ripened cheeses such as Camembert and Roquefort [177]. Later, filamentous fungi emerged as useful cell factories for important chemicals, the famous example being Alexander Fleming's discovery of penicillin-producing fungi in the early twentieth century [178]. Besides medicines such as cholesterol-lowering statins and antibiotics, filamentous fungi can also be used to produce plant hormones for agriculture, enzymes for detergents or lignocellulosic biomass degradation, polyunsaturated fatty acids for nutrition, and food additives such as citric acid [179–184]. Recombinant technology enabled the production of non-native products in fungi, heralded by Novo's release of Lipolase, an enzyme from Humicola lanuginosa produced in Aspergillus oryzae (in 2000, Novo was split into three companies: Novo A/S, Novozymes A/S, and Novo Nordisk A/S) [185, 186]. Genome editing is a powerful tool for enhancing the productivity of both recombinant and natural fungal cell factories, and it is also used to study pathogenic fungi. This section therefore presents techniques and possibilities for genome editing in filamentous fungi.

Filamentous fungi are encased in cell walls that make it difficult for DNA to enter the cells. Therefore, several methods have been developed for fungal transformation, such as electroporation, particle bombardment, Agrobacterium tumefaciens-mediated transformation, and protoplast transformation [187–190]. A basic DNA construct for genomic integration in fungal cells is a linear fragment consisting of a selectable marker flanked by regions homologous to the genomic target [191, 192]. Such constructs can be used to disrupt genes by insertion or to delete them. Other genetic elements can be included in the construct to express new genes. The homologous flanking regions generally need to be longer than in S. cerevisiae, probably because HR is less efficient; flanking regions of 0.5–2 kb are commonly used in filamentous fungi [138, 191–193].

Figure 10.6 CRISPR/Cas strategies for genome editing and gene modulation. (a) A common strategy for CRISPR/Cas-based genome editing involves targeting the Cas enzyme to specific loci with designer gRNAs. The Cas9/gRNA complex then generates double-strand breaks at the targeted loci. If donor DNA with regions homologous to the targeted site is present, the DSB can facilitate HR-mediated integration of the donor DNA. If no donor DNA is present, the DSB can be repaired by mechanisms such as NHEJ, which often result in small insertions or deletions (indels). (b) The mutated Cas (dCas), which is incapable of generating DSBs, can be targeted to promoter regions, resulting in transcriptional repression of the downstream gene. The repression can be further enhanced by fusing dCas to repressor domains. (c) dCas can also be fused to an activator domain; targeting the dCas–activator fusion protein to the gene's upstream region increases transcription. (d) Fusing dCas to the activation-induced cytidine deaminase (AID) domain yields a protein capable of substituting C with G or T at around 18 bp upstream of the PAM site.

In filamentous fungi, the NHEJ repair system often dominates over HR in mending DSBs. Therefore, knocking out the NHEJ-related KU70 or KU80 genes has been reported to increase HR-mediated integration of foreign DNA in several species. In Neurospora crassa, deleting either KU70 or KU80 resulted in 100% targeting efficiency into the mtr locus with constructs carrying 2-kb homologous flanking regions, compared with 10–30% for the control strain [191]. Deletion of KU70 also yielded an HR efficiency of around 90% in Aspergillus nidulans and increased HR efficiency from 11 to 63% in A. oryzae [192, 193]. In Aspergillus niger, deletion of KU80 increased HR efficiency from 1.78 to 65.6%, while deleting both KU70 and KU80 resulted in 100% HR efficiency [194]. Furthermore, deletion of LigD, a homolog of human DNA ligase IV, increased gene replacement efficiency in A. oryzae from 28 to 100% when using 1000-bp homologous flanking regions [195]. Disruption of LigD has also been shown to increase HR-mediated DNA integration in other filamentous fungi, such as Penicillium oxalicum, Aspergillus kawachii, and Aspergillus glaucus [196–198]. Notably, LigD disruption was more effective than KU70 disruption in P. oxalicum for all gene replacements compared.

Both auxotrophic and dominant selection markers are used in filamentous fungi, for example, URA5 in uracil-auxotrophic strains, and the bleomycin resistance gene (BLE) or the hygromycin B phosphotransferase gene (HPH), which confer resistance to bleomycin or hygromycin, respectively [182, 199, 200]. As with yeast, engineering filamentous fungi often requires iterative cycles of genome editing, which integrated selectable markers complicate. Marker-recycling methods, such as the recombinase-based Cre/loxP system or counter-selectable markers such as the orotidine 5′-phosphate decarboxylase gene (PyrG) flanked by direct repeats, have been used successfully in species such as A. nidulans, A. oryzae, Trichoderma reesei, N. crassa, and Aspergillus aculeatus [201–207]. However, by using endonucleases for gene targeting, HR efficiency can be increased, marker integration can be avoided, and the homologous flanking regions can be shortened. For example, TALENs have been used to increase HR efficiency in Pyricularia oryzae from 1.3 to 100% [208].
However, since the emergence of CRISPR and its adaptation to filamentous fungi, CRISPR has become the predominant endonuclease system for gene targeting in these organisms. CRISPR/Cas9 was first used in filamentous fungi in 2015, when Cas9-expressing strains of T. reesei were transformed with in vitro-transcribed gRNA targeting URA5 [96]. The system was also used for HR-mediated knockout of the LAE1 gene with 93% efficiency, using a construct with relatively short flanking homology regions of 200 bp; increasing the homology arms to 600 bp raised the efficiency to 100%. It was further demonstrated that two or three genes could be knocked out simultaneously with this system, albeit at lower efficiencies. Shortly thereafter, another report demonstrated a highly versatile CRISPR/Cas9 system for genome editing in several Aspergillus species, based on AMA1 vectors that can propagate in multiple fungal species [99]. A selectable marker may be necessary when using CRISPR/Cas9 in fungi. The selection cassette can be integrated with the donor DNA or located on transient, circular plasmids together with the Cas9 cassette, the gRNA cassette, or both [99, 209–212]. The transient plasmid can be lost by cultivation on nonselective media, which enables reuse of the selectable marker. Plasmid loss can also be forced by activating growth-inhibitory genes located on the plasmid, as was done in A. oryzae to rapidly prepare the mutated strain for re-transformation [212]. Some studies have also demonstrated that the gRNA and Cas9 cassettes can be integrated into the fungal genome [213, 214]. The Cas9 protein and gRNA can also be made in vitro and delivered as a preassembled complex into the fungal


10 Genome Editing of Eukarya

cells [211, 215]. The benefit of this strategy is the relatively short-lived presence of Cas9, which in theory can reduce the occurrence of off-target cleavage; however, a selectable marker may then need to be integrated with the donor DNA. Delivering preassembled gRNA and Cas9 may also reduce the toxicity of Cas9 compared with plasmid-based expression [216]. Studies have investigated off-target effects in species like Ustilago maydis, Magnaporthe oryzae, and Aspergillus fumigatus, but have not found any off-target mutations directly attributable to CRISPR/Cas9 activity [97, 216, 217]. Using CRISPR/Cas9 for targeted DNA integration also allows the use of short homologous flanking regions. Homologous flanking regions of 60 bp were shown to be sufficient for HR-mediated DNA integration using CRISPR/Cas9 in Penicillium chrysogenum, while in A. nidulans, oligo-mediated mutations and deletions were efficient with single-stranded 90 bp oligonucleotides and chimeric oligonucleotides, respectively [211, 218]. Homologous flanking regions of just 45 bp were used to integrate DNA into the Ustilago trichophora genome [219]. The first study of CRISPR/Cas9 in T. reesei demonstrated simultaneous editing of two and three genomic targets with efficiencies of 45 and 4.2%, respectively [96]. Later studies, however, have demonstrated more efficient multiplex editing in other species. In A. nidulans, two genes were mutated and one gene inserted simultaneously and correctly in 9 out of 10 screened clones [218]. In Fusarium fujikuroi, the efficiencies of two and three simultaneous edits were 20.8 and 4.2%, respectively [183]. Multiplex mutations were also achieved in A. fumigatus and in the insect-pathogenic fungus Beauveria bassiana, with efficiencies of 39% for two edits and 5% for three edits [220, 221]. In the thermophilic fungus Myceliophthora thermophila, simultaneous editing of three targets was achieved at 30% efficiency and of four targets at 22% efficiency [222].
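Editing efficiencies like those above translate directly into screening effort. As a rough guide, and assuming each screened clone is independently correct with probability p (a simple binomial model that real experiments only approximate), the number of clones n needed to find at least one correct clone with confidence c satisfies 1 - (1 - p)^n ≥ c, that is n ≥ ln(1 - c) / ln(1 - p). The function name `clones_to_screen` is illustrative.

```python
import math


def clones_to_screen(efficiency: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least one correct clone among n) >= confidence.

    Assumes each screened clone is independently correct with probability
    `efficiency` (a binomial model), which real experiments only approximate.
    """
    if not 0 < efficiency < 1:
        raise ValueError("efficiency must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - efficiency))


if __name__ == "__main__":
    for p in (0.45, 0.208, 0.042):
        print(f"efficiency {p:.1%}: screen {clones_to_screen(p)} clones")
```

Under this model, the 45% two-target efficiency in T. reesei needs about 6 clones for 95% confidence, whereas the 4.2% three-target efficiency needs about 70, which illustrates why multiplex efficiency matters in practice.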
Alternative Cas enzymes like Cpf1 have also been used for donor DNA integration and oligo-mediated mutagenesis in A. niger and A. nidulans [223]. The gRNA can be expressed in filamentous fungi from RNA polymerase III promoters. Polymerase III promoters like SNR52 from yeast have been shown to work in A. fumigatus and N. crassa, while 5S rRNA promoters have been used in A. niger and F. fujikuroi [183, 210, 214, 224]. The U6 Pol III promoter has also been used in a variety of species like A. niger, F. fujikuroi, U. trichophora, and P. oryzae [183, 209, 219, 225]. Lastly, tRNA promoters have been used in A. niger [226]. The gRNA can also be expressed from RNA polymerase II promoters if the gRNA sequence is flanked by self-cleaving ribozymes. This strategy was demonstrated in several Aspergillus species and in species like C. neoformans and Alternaria alternata, but failed to work in Cordyceps militaris [99, 213, 227, 228]. If suitable promoters for gRNA transcription are not available in a particular fungal species, delivering in vitro-transcribed gRNA may overcome this issue.

10.3.3 Chinese Hamster Ovary Cells

The first immortal laboratory culture of Chinese Hamster Ovary (CHO) cells was established by Dr. Theodore Puck in the 1950s [229, 230]. This initial culture gave rise to several distinct CHO cell lines as subcultures of the original line were handed off to other scientists and propagated independently. Later, in the 1980s, the first commercial protein product produced in CHO cells (Activase tPA, a tissue plasminogen activator for treatment of heart conditions) was approved for clinical use [231]. In a recent report, 57 of the 68 new monoclonal antibody pharmaceuticals approved between January 2014 and July 2018 were produced in CHO cells, illustrating the enormous industrial importance of CHO cells [232]. Genome editing provides the means to generate stable CHO cell lines for the production of biopharmaceuticals, as genes encoding desirable proteins can be integrated into the genome. Genome editing also offers the possibility of improving the CHO cell chassis for large-scale production by altering endogenous genes or expressing new genes that make the cells better suited to the particular environments encountered during industrial production. A typical DNA construct for expression of a heterologous protein in CHO cells contains the gene encoding the desired protein under the control of a strong promoter [233]. While circular plasmids transfected into mammalian cells may be linearized by endogenous enzymes, delivering linearized plasmids into the cells works well for generating stable cell lines [234, 235]. DNA delivery methods for mammalian cells include electroporation, lipofection, and polymer-based methods [236]. DNA transformed into mammalian cells can be assembled into tandem structures by endogenous enzymes, forming large concatemers with multiple units of the same DNA construct [237]. As in mammalian cells generally, the HR mechanism is relatively inefficient in CHO cells [238, 243]. Therefore, random approaches for DNA integration have commonly been used in CHO cells. Random integration of transgenes can be achieved by delivering a DNA fragment containing the desired gene cassette and a selectable marker. By growing the cells in selective media, surviving cells with integrated DNA fragments can be isolated [239].
Although inefficient without the assistance of DSBs, DNA fragments with homologous flanking regions can be used for HR-mediated targeted integration in mammalian cells [240, 241]. Such constructs have been used to knock out endogenous genes like FUT8 in CHO cells; this required two rounds of transfection, and more than 70 000 cell lines were screened during the second round to secure ten cell lines mutated in both FUT8 alleles [242]. Selection in CHO cells can be done with genes conferring resistance to antibiotics, such as neomycin, zeocin, or hygromycin, or with genes alleviating auxotrophic dependencies [242, 243]. A popular selection method, however, relies on CHO cells deficient in the enzyme dihydrofolate reductase (DHFR), which is involved in nucleotide metabolism [244]. DHFR is required for the production of hypoxanthine and thymidine, and restoring the gene in transformed cells allows growth in media depleted of those metabolites. Furthermore, the DHFR system allows "amplification" of the copy number of both the integrated, functional DHFR copy and co-located genes of interest [245–247]. The compound methotrexate (MTX) inhibits DHFR and can be applied in increasing concentrations to the transformed cells; the surviving cells carry increased DHFR copy numbers that overcome the higher MTX concentrations, which results in co-amplification of the genes of interest [233, 247]. A similar system can be used with glutamine synthetase (GS), encoded by the gene GLUL. GS is involved in the production of glutamine, and restoring the gene in transfected cells allows growth on media depleted of glutamine. Application of methionine sulfoximine (MSX) inhibits GS and can be used to boost the copy numbers of GLUL and co-localized genes, similar to the DHFR system [248, 249]. Furthermore, GS reduces ammonia levels in the cells, since GS uses ammonia and glutamate to produce glutamine. Recombinase systems can also be used for the integration of DNA constructs. In one strategy, a DNA fragment containing a GFP cassette, with a LoxP site situated after the GFP start codon, and a DHFR cassette was integrated into the CHO cell genome. After DHFR amplification, the CHO cell lines were co-transfected with a Cre-expression plasmid and a plasmid carrying genes encoding a human monoclonal antibody and a LoxP site followed by a selection gene lacking both start codon and promoter. Cre-mediated site-specific recombination between the two LoxP sites restored the selection gene and generated antibody-producing CHO cell lines, which were further improved by additional rounds of DHFR amplification [243]. Recombinase systems like Flp/FRT, R4, and BxB1 have also been used for DNA integration, with the more recently developed BxB1 system seemingly being more efficient than Flp/FRT [250–252]: using the BxB1 system for DNA integration resulted in 92% correct clones, compared with 67% for Flp/FRT. As with other organisms, the possibilities of genome editing in CHO cells have been dramatically enhanced by targetable endonucleases that generate DSBs, either to increase the occurrence of HR-mediated integration or to mutate genes through NHEJ-mediated repair. ZFNs have been used to simultaneously disrupt both alleles of a gene in CHO cells at a frequency of more than 1%, a remarkably high efficiency compared with the 70 000 clones that had to be screened to isolate double-allele fut8Δ mutants by HR-mediated disruption without endonucleases [242, 253, 254].
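The LoxP "trap" described above can be illustrated with a toy string model of Cre-mediated integration. The sketch assumes that recombination between a single genomic LoxP site and a single LoxP site on a circular plasmid inserts the whole plasmid between two LoxP copies; it ignores reading frame, strand exchange within the 8-bp spacer, and the reverse (excision) reaction. Bracketed gene labels and the function names `rotate_to_site` and `cre_integrate` are hypothetical placeholders; the 34-bp LoxP sequence is the canonical one.

```python
# Canonical 34-bp loxP: 13-bp inverted repeat, 8-bp spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "ATGTATGC" + "TATACGAAGTTAT"


def rotate_to_site(circle: str, site: str) -> str:
    """Linearize a circular sequence so that it starts with `site`."""
    doubled = circle + circle  # lets us find a site spanning the circle's junction
    k = doubled.find(site)
    if k < 0 or k >= len(circle):
        raise ValueError("site not found in circular sequence")
    return doubled[k:k + len(circle)]


def cre_integrate(genome: str, plasmid: str, site: str = LOXP) -> str:
    """Toy Cre-mediated integration of a circular plasmid at one genomic site.

    The product is: genome up to the site + site + rest of plasmid + site
    + genome after the site.
    """
    g = genome.find(site)
    if g < 0:
        raise ValueError("no recombination site in genome")
    insert = rotate_to_site(plasmid, site)[len(site):]  # plasmid minus its own site
    return genome[:g] + site + insert + site + genome[g + len(site):]


if __name__ == "__main__":
    genome = "[PROMOTER]ATG" + LOXP + "[GFP][DHFR]"
    plasmid = LOXP + "[SEL_NO_ATG][mAb][BACKBONE]"
    print(cre_integrate(genome, plasmid))
```

In the product, the promoterless selection gene sits directly downstream of the genomic promoter, start codon, and LoxP site. This is why only correctly recombined cells survive selection in this design.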
Through several rounds of engineering and transfection, the ZFN method could also generate triple mutants by mutating one biallelic gene per round of transfection [255]. ZFNs have also been used to stimulate HR for targeted integration of DNA constructs with short homologous flanking regions of only 50 bp [256]. It was also demonstrated that ZFNs and TALENs could be used for simultaneous gene deletion and insertion through DSB-mediated integration of a foreign antibody gene into the endogenous FUT8 gene [257]. In another demonstration of TALEN use in CHO cells, glulΔ double mutants were isolated by PCR screening of around 20 clones from a single round of transfection [258]. In 2014, the first report of CRISPR/Cas9-mediated genome editing in CHO cells was published [100]. NHEJ-mediated generation of indels in the COSMC gene was achieved at a frequency of 47.3% in a pool of cells without selection, whereas the frequency was 99.7% for the FUT8 gene with Lens culinaris agglutinin selection. Simultaneous NHEJ-mediated generation of indels in three genes (FUT8, BAX, and BAK) with CRISPR/Cas9 was demonstrated in CHO cells [259]. The mutant phenotypes were validated for one triple-mutant clone, which, as expected, lacked Bax and Bak proteins and fucosylation activity, and no significant mutations were detected at the 15 most likely off-target genomic sites for each of the gRNA sequences used. However, other studies have found off-target effects [260]. In one case, indels were found at suspected off-target sites with frequencies ranging from very low to 10.7%, depending on the gRNA used. Likewise, whole-genome sequencing of two CRISPR/Cas9-modified CHO cell lines revealed off-target mutation frequencies of 0.17 and 0.18%, respectively, although these mutations did not seem to affect cell performance during cultivation [261]. CRISPR/Cas9 has been used for HR-mediated targeted integration of transgene-containing DNA constructs into specific genomic loci in CHO cells. Site-specific integration in CHO cells can be advantageous since it may prevent interclonal phenotypic variations, such as in the hypothermic response, which can vary among clones generated by random integration [262]. Integration of a gene cassette with flanking homology regions of about 750 bp into the COSMC, Mgat1, or LdhA genomic sites was achieved with CRISPR/Cas9 at frequencies of 27.8, 16.4, and 7.4%, respectively [260]. Another study demonstrated CRISPR/Cas9-mediated integration of an anti-PD1 mAb expression construct into the loci C12orf35, GRIK1, and HPRT, with site-dependent targeting efficiencies of 25.4–41.7% [261]. Importantly, site-specific integration in CHO cells enables the targeting of so-called “hot spots,” defined genomic loci with high transcriptional activity [263]. Integrating gene cassettes into hot spots such as C12orf35 can lead to higher expression and improved production stability of the desired products [261]. CRISPR-mediated integration of constructs with very short homologous flanking regions of 20–40 bp was also achieved in CHO cells [264]. By having one gRNA target the genomic integration site HPRT and two other gRNAs target the donor DNA at the 5′ end and 3′ end of the right and left homology regions, respectively, MMEJ-based integration of the construct into HPRT was achieved at 30.7% efficiency. When the two donor-targeting gRNAs were instead directed upstream of the right homology region and downstream of the left homology region, respectively, the efficiency increased to 64.3%.
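Off-target analyses like those above typically start from a list of genomic sites that resemble the gRNA spacer. The sketch below is a deliberately simplified version of such a candidate search. It assumes SpCas9 with an NGG PAM, scans both strands, counts plain mismatches, and does not consider DNA/RNA bulges or position-dependent mismatch tolerance, which dedicated tools such as Cas-OFFinder handle. The names `revcomp` and `rank_off_targets` are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rank_off_targets(genome: str, spacer: str, max_mismatches: int = 3):
    """List NGG-adjacent sites on both strands within `max_mismatches` of `spacer`.

    Returns (mismatches, strand, start) tuples sorted by mismatch count;
    `start` is the 0-based index of the site on the strand that was scanned
    ('-' indices refer to the reverse-complemented sequence).
    """
    n = len(spacer)
    hits = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        # A site needs n protospacer bases plus a 3-base PAM (N, G, G).
        for i in range(len(seq) - n - 2):
            if seq[i + n + 1:i + n + 3] != "GG":
                continue
            mismatches = sum(a != b for a, b in zip(seq[i:i + n], spacer))
            if mismatches <= max_mismatches:
                hits.append((mismatches, strand, i))
    return sorted(hits)
```

The intended on-target site appears with zero mismatches; the sites with the fewest mismatches are the natural first candidates for targeted sequencing, although mismatch count alone is only a crude proxy for cleavage risk.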
The Cas9-encoding gene and the gRNA cassettes can be expressed from separate vectors and co-transfected [100]. The Cas9 protein and gRNA can also be expressed from the same plasmid [265, 266]. To generate multiple edits, the Cas9 plasmid can be co-transfected with several individual gRNA plasmids, or Cas9 and multiple gRNAs can be expressed from a single plasmid [259, 264, 267]. The gRNA and Cas9 can also be produced in vitro prior to delivery into the CHO cells, which may lessen off-target effects [268]. For gRNA expression, polymerase III promoters like the U6 promoter can be used [100, 261]. CRISPR/Cas can also be used to integrate recombinase-site “landing pads” for later construct integration, a technique known as recombinase-mediated cassette exchange (RMCE). In one study, an mCherry reporter cassette was integrated by CRISPR/Cas into a site in the CHO genome [269]. Lox sites flanked the mCherry gene, with the promoter and terminator located outside the recombinase sites. Thereafter, heterologous genes could be swapped in place of the mCherry gene and expressed without the need for additional construct elements or antibiotic selection. RMCE approaches have used the BxB1, Flp/FRT, or Cre/LoxP recombinase systems [252, 270]. Recombinase sites also allow multiple constructs to be integrated simultaneously: the recombinase site for BxB1 was integrated into three different hot spots, which enabled the triple integration of mAb expression vectors in a single transfection event [263]. This generated stable lines with increased mAb titers compared with single integration. Alternative Cas enzymes, such as Cpf1 from Acidaminococcus and Lachnospiraceae, have been used in CHO cells for gene deletions [271]. In addition, by targeting two gRNAs to the FUT8 locus at varying distances, deletions of 2–150 kb were achieved with CRISPR/Cas9. Furthermore, it was demonstrated that Cpf1 could function with bicistronic gRNA arrays and that Cas9 could use bicistronic gRNAs containing tRNA sequences cleaved by native enzymes.
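The dual-gRNA deletion strategy can be reasoned about with simple coordinate arithmetic: the deleted segment is the sequence between the two cut sites. The sketch below assumes SpCas9 (not Cpf1, which recognizes a different PAM and makes staggered cuts), a blunt cut 3 bp from the PAM inside the protospacer, and precise rejoining of the outer ends without extra indels. Real junctions often carry small indels, so the computed size is a nominal value. Function names and the coordinate convention are illustrative assumptions.

```python
def cas9_cut_index(pam_pos: int, strand: str) -> int:
    """Forward-strand index of the first base 3' of an SpCas9 blunt cut.

    '+' target: pam_pos is the forward-strand index of N in 5'-NGG-3'.
    '-' target: pam_pos is the forward-strand index of the first C in
    5'-CCN-3' (the PAM as seen on the forward strand), which lies 5' of
    the protospacer. The cut is assumed to fall 3 bp inside the protospacer.
    """
    if strand == "+":
        return pam_pos - 3
    if strand == "-":
        return pam_pos + 6
    raise ValueError("strand must be '+' or '-'")


def dual_cut_deletion(genome: str, site_a: tuple[int, str], site_b: tuple[int, str]):
    """Return (edited_genome, deletion_size) for two cuts joined end to end.

    Assumes the sequence between the two cuts is lost and the outer ends are
    rejoined precisely, without additional insertions or deletions.
    """
    a, b = sorted((cas9_cut_index(*site_a), cas9_cut_index(*site_b)))
    return genome[:a] + genome[b:], b - a
```

For example, a '+' site with its PAM at position 50 and a '-' site with its CCN at position 120 give cuts at 47 and 126, so the nominal deletion is 79 bp. The same arithmetic applies to the kilobase-scale deletions described above.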

10.4 Outlook

Genome editing holds vast potential to improve our understanding of nature and to build solutions to previously insurmountable problems. From the initial discoveries of heritable traits and the DNA double helix to the present, the field has undergone rapid changes that have taken us from a basic understanding of biological systems to the ability to design them at will. While the earliest methods for genome editing were often cumbersome, imprecise, and worked at very low efficiencies, the emergence of endonucleases like meganucleases and ZFNs was revolutionary and provided the means to precisely target and alter DNA sequences. The discovery of CRISPR/Cas as a gene-editing technology disrupted the field and, together with the other endonucleases, underscored the profound importance of these tools for basic research. Indeed, CRISPR/Cas has replaced the older endonucleases in many contexts by being faster, more convenient, and considerably cheaper. Even without the emergence of new disruptive tools, the field will likely continue to expand as modern gene-editing tools like CRISPR/Cas are continuously refined and broadly applied. This development will be further spurred by the increasing availability of new genomes and by genome editing of new organisms. Continuous improvement of genome editing tools will contribute both to basic research on Eukarya and to the development of novel processes for the bio-based economy.

References

1 Tebas, P., Stein, D., Tang, W.W. et al. (2014). Gene editing of CCR5 in autologous CD4 T cells of persons infected with HIV. N. Engl. J. Med. 370: 901–910.

2 Ho, K.S.Y. (1975). Induction of DNA double-strand breaks by X-rays in a radiosensitive strain of the yeast Saccharomyces cerevisiae. Mutat. Res. Mol. Mech. Mutagen. 30: 327–334.

3 Resnick, M.A. (1977). Unrepaired double-strand breaks in nuclear DNA are not always lethal. Mutat. Res. Mol. Mech. Mutagen. 42: 131–134.

4 Lieber, M.R. (2010). The mechanism of double-strand DNA break repair by the non-homologous DNA end-joining pathway. Annu. Rev. Biochem. 79: 181–211.

5 Sfeir, A. and Symington, L.S. (2015). Microhomology-mediated end joining: a back-up survival mechanism or dedicated pathway? Trends Biochem. Sci. 40: 701–714.


6 Sung, P. and Klein, H. (2006). Mechanism of homologous recombination: mediators and helicases take on regulatory functions. Nat. Rev. Mol. Cell Biol. 7: 739–750.

7 Wright, W.D., Shah, S.S., and Heyer, W.D. (2018). Homologous recombination and the repair of DNA double-strand breaks. J. Biol. Chem. 293: 10524–10535.

8 Cullot, G., Boutin, J., Toutain, J. et al. (2019). CRISPR-Cas9 genome editing induces megabase-scale chromosomal truncations. Nat. Commun. 10: 1136.

9 Cradick, T.J., Fine, E.J., Antico, C.J., and Bao, G. (2013). CRISPR/Cas9 systems targeting β-globin and CCR5 genes have substantial off-target activity. Nucleic Acids Res. 41: 9584–9592.

10 Thomas, K., Folger, K., and Capecchi, M. (1986). High frequency targeting of genes to specific sites in the mammalian genome. Cell 44: 419–428.

11 Jasin, M., de Villiers, J., Weber, F., and Schaffner, W. (1985). High frequency of homologous recombination in mammalian cells between endogenous and introduced SV40 genomes. Cell 43: 695–703.

12 Orr-Weaver, T.L., Szostak, J.W., and Rothstein, R.J. (1981). Yeast transformation: a model system for the study of recombination. Proc. Natl. Acad. Sci. 78: 6354–6358.

13 Jasin, M., Rouet, P., and Smih, F. (1994). Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol. 14: 8096–8106.

14 Choulika, A., Perrin, A., Dujon, B., and Nicolas, J.F. (1995). Induction of homologous recombination in mammalian chromosomes by using the I-SceI system of Saccharomyces cerevisiae. Mol. Cell. Biol. 15: 1968–1973.

15 Perez, E.E., Wang, J., Miller, J.C. et al. (2008). Establishment of HIV-1 resistance in CD4+ T cells by genome editing using zinc-finger nucleases. Nat. Biotechnol. 26: 808–816.

16 Holt, N., Wang, J., Kim, K. et al. (2010). Human hematopoietic stem/progenitor cells modified by zinc-finger nucleases targeted to CCR5 control HIV-1 in vivo. Nat. Biotechnol. 28: 839–847.

17 Colleaux, L., D’Auriol, L., Galibert, F., and Dujon, B. (1988). Recognition and cleavage site of the intron-encoded omega transposase. Proc. Natl. Acad. Sci. 85: 6022–6026.

18 Meyer, R.S. and Purugganan, M.D. (2013). Evolution of crop species: genetics of domestication and diversification. Nat. Rev. Genet. 14: 840–852.

19 Duarte, C.M., Marba, N., and Holmer, M. (2007). Ecology: rapid domestication of marine species. Science 316: 382–383.

20 Suprasanna, P., Mirajkar, S.J., and Bhagwat, S.G. (2015). Induced mutations and crop improvement. In: Plant Biology and Biotechnology (eds. B. Bahadur, M.V. Rajam, L. Sahijram and K.V. Krishnamurthy), 593–617. Springer India.

21 Ou, X., Long, L., Wu, Y. et al. (2010). Spaceflight-induced genetic and epigenetic changes in the rice (Oryza sativa L.) genome are independent of each other. Genome 53: 524–532.

22 Park, E.-H., Shin, Y.-M., Lim, Y.-Y. et al. (2000). Expression of glucose oxidase by using recombinant yeast. J. Biotechnol. 81: 35–44.


23 Valli, M., Sauer, M., Branduardi, P. et al. (2006). Improvement of lactic acid production in Saccharomyces cerevisiae by cell sorting for high intracellular pH. Appl. Environ. Microbiol. 72: 5492–5499.

24 Rad, R., Rad, L., Wang, W. et al. (2010). PiggyBac transposon mutagenesis: a tool for cancer gene discovery in mice. Science 330: 1104–1107.

25 Seifert, H.S., Chen, E.Y., So, M., and Heffron, F. (1986). Shuttle mutagenesis: a method of transposon mutagenesis for Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 83: 735–739.

26 Urbański, D.F., Małolepszy, A., Stougaard, J., and Andersen, S.U. (2012). Genome-wide LORE1 retrotransposon mutagenesis and high-throughput insertion detection in Lotus japonicus. Plant J. 69: 731–741.

27 Miller, J., McLachlan, A.D., and Klug, A. (1985). Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4: 1609–1614.

28 Pavletich, N. and Pabo, C. (1991). Zinc finger-DNA recognition: crystal structure of a Zif268-DNA complex at 2.1 Å. Science 252: 809–817.

29 Wolfe, S.A., Nekludova, L., and Pabo, C.O. (2002). DNA recognition by Cys2His2 zinc finger proteins. Annu. Rev. Biophys. Biomol. Struct. 29: 183–212.

30 Choo, Y. and Klug, A. (1994). Toward a Code for the Interactions of Zinc Fingers with DNA: Selection of Randomized Fingers Displayed on Phage. Proc. Natl. Acad. Sci. 91 (23): 11163–11167. https://doi.org/10.1073/pnas.91.23.11163.

31 Desjarlais, J.R. and Berg, J.M. (1992). Toward rules relating zinc finger protein sequences and DNA binding site preferences. Proc. Natl. Acad. Sci. 89: 7345–7349.

32 Dutta, S., Madan, S., and Sundar, D. (2016). Exploiting the recognition code for elucidating the mechanism of zinc finger protein-DNA interactions. BMC Genomics 17: 1037.

33 Li, L., Wu, L.P., and Chandrasegaran, S. (1992). Functional domains in Fok I restriction endonuclease. Proc. Natl. Acad. Sci. 89: 4275–4279.

34 Kim, Y.G. and Chandrasegaran, S. (1994). Chimeric restriction endonuclease. Proc. Natl. Acad. Sci. 91: 883–887.

35 Kim, Y.G., Cha, J., and Chandrasegaran, S. (1996). Hybrid restriction enzymes: zinc finger fusions to Fok I cleavage domain. Proc. Natl. Acad. Sci. U. S. A. 93: 1156–1160.

36 Bitinaite, J., Wah, D.A., Aggarwal, A.K., and Schildkraut, I. (1998). FokI dimerization is required for DNA cleavage. Proc. Natl. Acad. Sci. 95: 10570–10575.

37 Smith, J. (2002). Requirements for double-strand cleavage by chimeric restriction enzymes with zinc finger DNA-recognition domains. Nucleic Acids Res. 28: 3361–3369.

38 Bibikova, M., Carroll, D., Segal, D.J. et al. (2001). Stimulation of homologous recombination through targeted cleavage by chimeric nucleases. Mol. Cell. Biol. 21: 289–297.


39 Miller, J.C., Holmes, M.C., Wang, J. et al. (2007). An improved zinc-finger nuclease architecture for highly specific genome editing. Nat. Biotechnol. 25: 778–785.

40 Doyon, Y., Vo, T.D., Mendel, M.C. et al. (2011). Enhancing zinc-finger-nuclease activity with improved obligate heterodimeric architectures. Nat. Methods 8: 74–79.

41 Shimizu, Y., Şöllü, C., Meckler, J.F. et al. (2011). Adding fingers to an engineered zinc finger nuclease can reduce activity. Biochemistry 50: 5033–5041.

42 Ramirez, C.L., Foley, J.E., Wright, D.A. et al. (2008). Erratum: unexpected failure rates for modular assembly of engineered zinc fingers. Nat. Methods 5: 575–575.

43 Carroll, D., Morton, J.J., Beumer, K.J., and Segal, D.J. (2006). Design, construction and in vitro testing of zinc finger nucleases. Nat. Protoc. 1: 1329–1341.

44 Händel, E.-M., Alwin, S., and Cathomen, T. (2009). Expanding or restricting the target site repertoire of zinc-finger nucleases: the inter-domain linker as a major determinant of target site selectivity. Mol. Ther. 17: 104–111.

45 Shimizu, Y., Bhakta, M.S., and Segal, D.J. (2009). Restricted spacer tolerance of a zinc finger nuclease with a six amino acid linker. Bioorg. Med. Chem. Lett. 19: 3970–3972.

46 Ramirez, C.L., Certo, M.T., Mussolino, C. et al. (2012). Engineered zinc finger nickases induce homology-directed repair with reduced mutagenic effects. Nucleic Acids Res. 40: 5560–5568.

47 Wang, J., Friedman, G., Doyon, Y. et al. (2012). Targeted gene addition to a predetermined site in the human genome using a ZFN-based nicking enzyme. Genome Res. 22: 1316–1326.

48 Muenzer, J., Prada, C.E., Burton, B. et al. (2019). CHAMPIONS: a phase 1/2 clinical trial with dose escalation of SB-913 ZFN-mediated in vivo human genome editing for treatment of MPS II (Hunter syndrome). Mol. Genet. Metab. 126: S104.

49 Boch, J. and Bonas, U. (2010). Xanthomonas AvrBs3 family-type III effectors: discovery and function. Annu. Rev. Phytopathol. 48: 419–436.

50 Kay, S., Hahn, S., Marois, E. et al. (2007). A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science 318: 648–651.

51 Römer, P., Hahn, S., Jordan, T. et al. (2007). Plant pathogen recognition mediated by promoter activation of the pepper Bs3 resistance gene. Science 318: 645–648.

52 Van Den Ackerveken, G., Marois, E., and Bonas, U. (1996). Recognition of the bacterial avirulence protein AvrBs3 occurs inside the host plant cell. Cell 87: 1307–1316.

53 Szurek, B., Rossier, O., Hause, G., and Bonas, U. (2002). Type III-dependent translocation of the Xanthomonas AvrBs3 protein into the plant cell. Mol. Microbiol. 46: 13–23.

54 Gao, H., Wu, X., Chai, J., and Han, Z. (2012). Crystal structure of a TALE protein reveals an extended N-terminal DNA binding region. Cell Res. 22: 1716–1720.


55 Zhu, W., Yang, B., Chittoor, J.M. et al. (1998). AvrXa10 contains an acidic transcriptional activation domain in the functionally conserved C terminus. Mol. Plant-Microbe Interact. 11: 824–832.

56 Hopkins, C.M., White, F.F., Choi, S.-H., and Leach, J.E. (1992). Identification of a family of avirulence genes from Xanthomonas oryzae pv. oryzae. Mol. Plant Microbe Interact. 5 (6): 451–459. https://doi.org/10.1094/MPMI-5-451.

57 Bonas, U., Stall, R.E., and Staskawicz, B. (1989). Genetic and structural characterization of the avirulence gene avrBs3 from Xanthomonas campestris pv. vesicatoria. MGG Mol. Gen. Genet. 218: 127–136.

58 Boch, J., Scholze, H., Schornack, S. et al. (2009). Breaking the code of DNA binding. Science 326: 1509–1512.

59 Moscou, M.J. and Bogdanove, A.J. (2009). A simple cipher governs DNA recognition by TAL effectors. Science 326: 1501.

60 Cong, L., Zhou, R., Kuo, Y. et al. (2012). Comprehensive interrogation of natural TALE DNA-binding modules and transcriptional repressor domains. Nat. Commun. 3: 968.

61 Streubel, J., Blücher, C., Landgraf, A., and Boch, J. (2012). TAL effector RVD specificities and efficiencies. Nat. Biotechnol. 3 (7): 593–595.

62 Christian, M.L., Demorest, Z.L., Starker, C.G. et al. (2012). Targeting G with TAL effectors: a comparison of activities of TALENs constructed with NN and NK repeat variable Di-residues. PLoS One 7: 1–9.

63 Meckler, J.F., Bhakta, M.S., Kim, M.-S. et al. (2013). Quantitative analysis of TALE–DNA interactions suggests polarity effects. Nucleic Acids Res. 41: 4118–4128.

64 Pérez-Quintero, A.L., Rodriguez-R, L.M., Dereeper, A. et al. (2013). An improved method for TAL effectors DNA-binding sites prediction reveals functional convergence in TAL repertoires of Xanthomonas oryzae strains. PLoS One 8: e68464.

65 Grau, J., Wolf, A., Reschke, M. et al. (2013). Computational predictions provide insights into the biology of TAL effector target sites. PLoS Comput. Biol. 9: 1–20.

66 Doyle, E.L., Booher, N.J., Standage, D.S. et al. (2012). TAL effector-nucleotide targeter (TALE-NT) 2.0: tools for TAL effector design and target prediction. Nucleic Acids Res. 40: 117–122.

67 Bultmann, S., Morbitzer, R., Schmidt, C.S. et al. (2012). Targeted transcriptional activation of silent oct4 pluripotency gene by combining designer TALEs and inhibition of epigenetic modifiers. Nucleic Acids Res. 40: 5368–5377.

68 Garg, A., Lohmueller, J.J., Silver, P.A., and Armel, T.Z. (2012). Engineering synthetic TAL effectors with orthogonal target sites. Nucleic Acids Res. 40: 7584–7595.

69 Lamb, B.M., Mercer, A.C., and Barbas, C.F. (2013). Directed evolution of the TALE N-terminal domain for recognition of all 5′ bases. Nucleic Acids Res. 41: 9779–9785.

70 Li, T., Huang, S., Zhao, X. et al. (2011). Modularly assembled designer TAL effector nucleases for targeted gene knockout and gene replacement in eukaryotes. Nucleic Acids Res. 39: 6315–6325.


71 Reyon, D., Tsai, S.Q., Khayter, C. et al. (2012). FLASH assembly of TALENs for high-throughput genome editing. Nat. Biotechnol. 30: 460–465.

72 Christian, M., Cermak, T., Doyle, E.L. et al. (2010). Targeting DNA double-strand breaks with TAL effector nucleases. Genetics 186: 757–761.

73 Miller, J.C., Tan, S., Qiao, G. et al. (2011). A TALE nuclease architecture for efficient genome editing. Nat. Biotechnol. 29: 143–148.

74 Mussolino, C., Morbitzer, R., Lütge, F. et al. (2011). A novel TALE nuclease scaffold enables high genome editing activity in combination with low toxicity. Nucleic Acids Res. 39: 9283–9293.

75 Mercer, A.C., Gaj, T., Fuller, R.P., and Barbas, C.F. (2012). Chimeric TALE recombinases with programmable DNA sequence specificity. Nucleic Acids Res. 40: 11163–11172.

76 Luo, W., Galvan, D.L., Woodard, L.E. et al. (2017). Comparative analysis of chimeric ZFP-, TALE- and Cas9-piggyBac transposases for integration into a single locus in human cells. Nucleic Acids Res. 45: 8411–8422.

77 Zhang, F., Cong, L., Lodato, S. et al. (2011). Efficient construction of sequence-specific TAL effectors for modulating mammalian transcription. Nat. Biotechnol. 29: 149–153.

78 Maeder, M.L., Angstman, J.F., Richardson, M.E. et al. (2013). Targeted DNA demethylation and activation of endogenous genes using programmable TALE-TET1 fusion proteins. Nat. Biotechnol. 31: 1137–1142.

79 Cermak, T., Doyle, E.L., Christian, M. et al. (2011). Efficient design and assembly of custom TALEN and other TAL effector-based constructs for DNA targeting. Nucleic Acids Res. 39: 1–11.

80 Schmid-Burgk, J.L., Schmidt, T., Kaiser, V. et al. (2013). A ligation-independent cloning technique for high-throughput assembly of transcription activator–like effector genes. Nat. Biotechnol. 31: 76–81.

81 de Lange, O., Schreiber, T., Schandry, N. et al. (2013). Breaking the DNA-binding code of Ralstonia solanacearum TAL effectors provides new possibilities to generate plant resistance genes against bacterial wilt disease. New Phytol. 199: 773–786.

82 Lahaye, T. (2014). Programmable DNA-binding proteins from Burkholderia provide a fresh perspective on the TALE-like repeat domain. Nucleic Acids Res. 42: 7436–7449.

83 Li, L., Atef, A., Piatek, A. et al. (2013). Characterization and DNA-binding specificities of Ralstonia TAL-like effectors. Mol. Plant 6: 1318–1330.

84 Juillerat, A., Bertonati, C., Dubois, G. et al. (2015). BurrH: a new modular DNA binding protein for genome engineering. Sci. Rep. 4: 3831.

85 Ishino, Y., Shinagawa, H., Makino, K. et al. (1987). Nucleotide sequence of the iap gene, responsible for alkaline phosphatase isozyme conversion in Escherichia coli, and identification of the gene product. J. Bacteriol. 169: 5429–5433.

86 Jansen, R., Van Embden, J.D.A., Gaastra, W., and Schouls, L.M. (2002). Identification of genes that are associated with DNA repeats in prokaryotes. Mol. Microbiol. 43: 1565–1575.


10 Genome Editing of Eukarya

87 Mojica, F.J.M., Díez-Villaseñor, C., García-Martínez, J., and Soria, E. (2005). Intervening sequences of regularly spaced prokaryotic repeats derive from foreign genetic elements. J. Mol. Evol. 60: 174–182.

88 Bolotin, A., Quinquis, B., Sorokin, A., and Dusko Ehrlich, S. (2005). Clustered regularly interspaced short palindrome repeats (CRISPRs) have spacers of extrachromosomal origin. Microbiology 151: 2551–2561.

89 Pourcel, C., Salvignol, G., and Vergnaud, G. (2005). CRISPR elements in Yersinia pestis acquire new repeats by preferential uptake of bacteriophage DNA, and provide additional tools for evolutionary studies. Microbiology 151: 653–663.

90 Makarova, K.S., Grishin, N.V., Shabalina, S.A. et al. (2006). A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol. Direct 1: 7.

91 Barrangou, R., Fremaux, C., Deveau, H. et al. (2007). CRISPR provides acquired resistance against viruses in prokaryotes. Science 315: 1709–1712.

92 Brouns, S.J.J., Jore, M.M., Lundgren, M. et al. (2008). Small CRISPR RNAs guide antiviral defense in prokaryotes. Science 321: 960–964.

93 Deveau, H., Barrangou, R., Garneau, J.E. et al. (2008). Phage response to CRISPR-encoded resistance in Streptococcus thermophilus. J. Bacteriol. 190: 1390–1400.

94 Jinek, M., Chylinski, K., Fonfara, I. et al. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337: 816–821.

95 Gasiunas, G., Barrangou, R., Horvath, P., and Siksnys, V. (2012). Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria. Proc. Natl. Acad. Sci. 109: E2579–E2586.

96 Liu, R., Chen, L., Jiang, Y. et al. (2015). Efficient genome editing in filamentous fungus Trichoderma reesei using the CRISPR/Cas9 system. Cell Discov. 1: 1–11.

97 Schuster, M., Schweizer, G., Reissmann, S., and Kahmann, R. (2016). Genome editing in Ustilago maydis using the CRISPR–Cas system. Fungal Genet. Biol. 89: 3–9.

98 Ryan, O.W., Skerker, J.M., Maurer, M.J. et al. (2014). Selection of chromosomal DNA libraries using a multiplex CRISPR system. eLife 3: 1–15.

99 Nødvig, C.S., Nielsen, J.B., Kogle, M.E., and Mortensen, U.H. (2015). A CRISPR-Cas9 system for genetic engineering of filamentous fungi. PLoS One 10: e0133085.

100 Ronda, C., Pedersen, L.E., Hansen, H.G. et al. (2014). Accelerating genome editing in CHO cells using CRISPR Cas9 and CRISPy, a web-based target finding tool. Biotechnol. Bioeng. 111: 1604–1616.

101 Hirano, H., Gootenberg, J.S., Horii, T. et al. (2016). Structure and engineering of Francisella novicida Cas9. Cell 164: 950–961.

102 Hou, Z., Zhang, Y., Propson, N.E. et al. (2013). Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl. Acad. Sci. 110: 15644–15649.

References

103 Kim, E., Koo, T., Park, S.W. et al. (2017). In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8: 14500.

104 Cox, D.B.T., Gootenberg, J.S., Abudayyeh, O.O. et al. (2017). RNA editing with CRISPR-Cas13. Science 358: 1019–1027.

105 Yamano, T., Nishimasu, H., Zetsche, B. et al. (2016). Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell 165: 949–962.

106 Zetsche, B., Gootenberg, J.S., Abudayyeh, O.O. et al. (2015). Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163: 759–771.

107 Fonfara, I., Richter, H., Bratovič, M. et al. (2016). The CRISPR-associated DNA-cleaving enzyme Cpf1 also processes precursor CRISPR RNA. Nature 532: 517–521.

108 Klompe, S.E., Vo, P.L.H., Halpin-Healy, T.S., and Sternberg, S.H. (2019). Transposon-encoded CRISPR–Cas systems direct RNA-guided DNA integration. Nature 571: 219–225.

109 Nishimasu, H., Ran, F.A., Hsu, P.D. et al. (2014). Crystal structure of Cas9 in complex with guide RNA and target DNA. Cell 156: 935–949.

110 Qi, L.S., Larson, M.H., Gilbert, L.A. et al. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152: 1173–1183.

111 Nishida, K., Arazoe, T., Yachie, N. et al. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353: aaf8729.

112 Gilbert, L.A., Larson, M.H., Morsut, L. et al. (2013). CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154: 442–451.

113 Farzadfard, F., Perli, S.D., and Lu, T.K. (2013). Tunable and multifunctional eukaryotic transcription factors based on CRISPR/Cas. ACS Synth. Biol. 2: 604–613.

114 Kleinstiver, B.P., Prew, M.S., Tsai, S.Q. et al. (2015). Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition. Nat. Biotechnol. 33: 1293–1298.

115 Kleinstiver, B.P., Prew, M.S., Tsai, S.Q. et al. (2015). Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523: 481–485.

116 Hsu, P.D., Scott, D.A., Weinstein, J.A. et al. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31: 827–832.

117 Pattanayak, V., Lin, S., Guilinger, J.P. et al. (2013). High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity. Nat. Biotechnol. 31: 839–843.

118 Cho, S.W., Kim, S., Kim, Y. et al. (2014). Analysis of off-target effects of CRISPR/Cas-derived RNA-guided endonucleases and nickases. Genome Res. 24: 132–141.

119 Wu, X., Scott, D.A., Kriz, A.J. et al. (2014). Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells. Nat. Biotechnol. 32: 670–676.


120 Kuscu, C., Arslan, S., Singh, R. et al. (2014). Genome-wide analysis reveals characteristics of off-target sites bound by the Cas9 endonuclease. Nat. Biotechnol. 32: 677–683.

121 Kim, D., Bae, S., Park, J. et al. (2015). Digenome-seq: genome-wide profiling of CRISPR-Cas9 off-target effects in human cells. Nat. Methods 12: 237–243.

122 Tsai, S.Q., Zheng, Z., Nguyen, N.T. et al. (2015). GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33: 187–197.

123 Crosetto, N., Mitra, A., Silva, M.J. et al. (2013). Nucleotide-resolution DNA double-strand break mapping by next-generation sequencing. Nat. Methods 10: 361–365.

124 Kleinstiver, B.P., Pattanayak, V., Prew, M.S. et al. (2016). High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529: 490–495.

125 Slaymaker, I.M., Gao, L., Zetsche, B. et al. (2016). Rationally engineered Cas9 nucleases with improved specificity. Science 351: 84–88.

126 Lin, S., Staahl, B.T., Alla, R.K., and Doudna, J.A. (2014). Enhanced homology-directed human genome engineering by controlled timing of CRISPR/Cas9 delivery. eLife 3: e04766.

127 Kim, S., Kim, D., Cho, S.W. et al. (2014). Highly efficient RNA-guided genome editing in human cells via delivery of purified Cas9 ribonucleoproteins. Genome Res. 24: 1012–1019.

128 Tsai, S.Q., Wyvekens, N., Khayter, C. et al. (2014). Dimeric CRISPR RNA-guided FokI nucleases for highly specific genome editing. Nat. Biotechnol. 32: 569–576.

129 Guilinger, J.P., Thompson, D.B., and Liu, D.R. (2014). Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 32: 577–582.

130 Aach, J., Stranges, P.B., Esvelt, K.M. et al. (2013). CAS9 transcriptional activators for target specificity screening and paired nickases for cooperative genome engineering. Nat. Biotechnol. 31: 1–15.

131 Fu, Y., Sander, J.D., Reyon, D. et al. (2014). Improving CRISPR-Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32: 279–284.

132 Cyranoski, D. (2019). The CRISPR-baby scandal: what’s next for human gene-editing. Nature 566: 440–442.

133 Cohen, J. (2019). The long shadow of a CRISPR scandal. Science 365: 436.

134 Papasavva, P., Kleanthous, M., and Lederer, C.W. (2019). Rare opportunities: CRISPR/Cas-based therapy development for rare genetic diseases. Mol. Diagn. Ther. 23: 201–222.

135 Nielsen, J., Larsson, C., van Maris, A., and Pronk, J. (2013). Metabolic engineering of yeast for production of fuels and chemicals. Curr. Opin. Biotechnol. 24: 398–404.

136 Walker, R.S.K. and Pretorius, I.S. (2018). Applications of yeast synthetic biology geared towards the production of biopharmaceuticals. Genes (Basel) 9: 1–22.


137 Botstein, D. and Fink, G.R. (2011). Yeast: an experimental organism for 21st century biology. Genetics 189: 695–704.

138 Lorenz, M.C., Muir, R.S., Lim, E. et al. (1995). Gene disruption with PCR products in Saccharomyces cerevisiae. Gene 158: 113–117.

139 Chen, B., Lee, H.L., Heng, Y.C. et al. (2018). Synthetic biology toolkits and applications in Saccharomyces cerevisiae. Biotechnol. Adv. 36: 1870–1881.

140 Mewes, H., Albermann, K., and Bähr, M. (1997). Overview of the yeast genome. Nature 387: 7–65.

141 Hinnen, A., Hicks, J.B., and Fink, G.R. (1978). Transformation of yeast. Proc. Natl. Acad. Sci. 75: 1929–1933.

142 Scherer, S. and Davis, R.W. (1979). Replacement of chromosome segments with altered DNA sequences constructed in vitro. Proc. Natl. Acad. Sci. 76: 4951–4955.

143 Rothstein, R.J. (1983). One-step gene disruption in yeast. Methods Enzymol. 101: 202–211.

144 Hua, S., Qiu, M., Chan, E. et al. (1997). Minimum length of sequence homology required for in vivo cloning by homologous recombination in yeast. Plasmid 38: 91–96.

145 Ciurkot, K., Vonk, B., Gorochowski, T.E. et al. (2019). CRISPR/Cas12a multiplex genome editing of Saccharomyces cerevisiae and the creation of yeast pixel art. J. Vis. Exp. 147: 1–15.

146 Siewers, V. (2014). An overview on selection marker genes for transformation of Saccharomyces cerevisiae. In: Yeast Metabolic Engineering (eds. J.M. Walker and V. Mapelli), 3–15. Springer Science+Business Media.

147 Carter, Z. and Delneri, D. (2010). New generation of loxP-mutated deletion cassettes for the genetic manipulation of yeast natural isolates. Yeast 27: 765–775.

148 Storici, F., Lewis, L.K., and Resnick, M.A. (2001). In vivo site-directed mutagenesis using oligonucleotides. Nat. Biotechnol. 19: 773–776.

149 Akada, R., Hirosawa, I., Kawahata, M. et al. (2002). Sets of integrating plasmids and gene disruption cassettes containing improved counter-selection markers designed for repeated use in budding yeast. Yeast 19: 393–402.

150 Jakočiūnas, T., Bonde, I., Herrgård, M. et al. (2015). Multiplex metabolic pathway engineering using CRISPR/Cas9 in Saccharomyces cerevisiae. Metab. Eng. 28: 213–222.

151 DiCarlo, J.E., Norville, J.E., Mali, P. et al. (2013). Genome engineering in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. 41: 4336–4343.

152 Smith, J.D., Suresh, S., Schlecht, U. et al. (2016). Quantitative CRISPR interference screens in yeast identify chemical-genetic interactions and new rules for guide RNA design. Genome Biol. 17: 1–16.

153 Mans, R., van Rossum, H.M., Wijsman, M. et al. (2015). CRISPR/Cas9: a molecular Swiss army knife for simultaneous introduction of multiple genetic modifications in Saccharomyces cerevisiae. FEMS Yeast Res. 15: 1–15.


154 Horwitz, A.A., Walter, J.M., Schubert, M.G. et al. (2015). Efficient multiplexed integration of synergistic alleles and metabolic pathways in yeasts via CRISPR-Cas. Cell Syst. 1: 88–96.

155 Liu, K., Yuan, X., Liang, L. et al. (2019). Using CRISPR/Cas9 for multiplex genome engineering to optimize the ethanol metabolic pathway in Saccharomyces cerevisiae. Biochem. Eng. J. 145: 120–126.

156 Jakočiūnas, T., Rajkumar, A.S., Zhang, J. et al. (2015). CasEMBLR: Cas9-facilitated multiloci genomic integration of in vivo assembled DNA parts in Saccharomyces cerevisiae. ACS Synth. Biol. 4: 1226–1234.

157 Garst, A.D., Bassalo, M.C., Pines, G. et al. (2017). Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nat. Biotechnol. 35: 48–55.

158 Buchmuller, B.C., Herbst, K., Meurer, M. et al. (2019). Pooled clone collections by multiplexed CRISPR-Cas12a-assisted gene tagging in yeast. Nat. Commun. 10: 2960.

159 Gao, Y. and Zhao, Y. (2014). Self-processing of ribozyme-flanked RNAs into guide RNAs in vitro and in vivo for CRISPR-mediated genome editing. J. Integr. Plant Biol. 56: 343–349.

160 Deaner, M. and Alper, H.S. (2017). Systematic testing of enzyme perturbation sensitivities via graded dCas9 modulation in Saccharomyces cerevisiae. Metab. Eng. 40: 14–22.

161 Zalatan, J.G., Lee, M.E., Almeida, R. et al. (2015). Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds. Cell 160: 339–350.

162 Lin, J.L., Ekas, H., Deaner, M., and Alper, H.S. (2019). CRISPR-PIN: modifying gene position in the nucleus via dCas9-mediated tethering. Synth. Syst. Biotechnol. 4: 73–78.

163 Verwaal, R., Buiting-Wiessenhaan, N., Dalhuijsen, S., and Roubos, J.A. (2018). CRISPR/Cpf1 enables fast and simple genome editing of Saccharomyces cerevisiae. Yeast 35: 201–211.

164 Kegel, A. (2006). Genome wide distribution of illegitimate recombination events in Kluyveromyces lactis. Nucleic Acids Res. 34: 1633–1645.

165 Verbeke, J., Beopoulos, A., and Nicaud, J.-M. (2013). Efficient homologous recombination with short length flanking fragments in KU70 deficient Yarrowia lipolytica strains. Biotechnol. Lett. 35: 571–576.

166 Saraya, R., Krikken, A.M., Kiel, J.A.K.W. et al. (2012). Novel genetic tools for Hansenula polymorpha. FEMS Yeast Res. 12: 271–278.

167 Jacobs, J.Z., Ciccaglione, K.M., Tournier, V., and Zaratiegui, M. (2014). Implementation of the CRISPR-Cas9 system in fission yeast. Nat. Commun. 5: 5344.

168 Schwartz, C.M., Hussain, M.S., Blenner, M., and Wheeldon, I. (2016). Synthetic RNA polymerase III promoters facilitate high-efficiency CRISPR-Cas9-mediated genome editing in Yarrowia lipolytica. ACS Synth. Biol. 5: 356–359.

169 Vyas, V.K., Barrasa, M.I., and Fink, G.R. (2015). A Candida albicans CRISPR system permits genetic engineering of essential genes and gene families. Sci. Adv. 1: e1500248.


170 Weninger, A., Hatzl, A.M., Schmid, C. et al. (2016). Combinatorial optimization of CRISPR/Cas9 expression enables precision genome engineering in the methylotrophic yeast Pichia pastoris. J. Biotechnol. 235: 139–149.

171 Wang, Y., Wei, D., Zhu, X. et al. (2016). A “suicide” CRISPR-Cas9 system to promote gene deletion and restoration by electroporation in Cryptococcus neoformans. Sci. Rep. 6: 31145.

172 Tran, V.G., Cao, M., Fatma, Z. et al. (2019). Development of a CRISPR/Cas9-Based Tool for Gene Deletion in Issatchenkia orientalis. mSphere 4: 1–11.

173 Zhang, Y., Feng, J., Wang, P. et al. (2019). CRISPR/Cas9-mediated efficient genome editing via protoplast-based transformation in yeast-like fungus Aureobasidium pullulans. Gene 709: 8–16.

174 Kretzschmar, A., Otto, C., Holz, M. et al. (2013). Increased homologous integration frequency in Yarrowia lipolytica strains defective in non-homologous end-joining. Curr. Genet. 59: 63–72.

175 Näätsaari, L., Mistlberger, B., Ruth, C. et al. (2012). Deletion of the Pichia pastoris KU70 homologue facilitates platform strain generation for gene expression and synthetic biology. PLoS One 7: e39720.

176 Gao, S., Tong, Y., Wen, Z. et al. (2016). Multiplex gene editing of the Yarrowia lipolytica genome using the CRISPR-Cas9 system. J. Ind. Microbiol. Biotechnol. 43: 1085–1093.

177 Dupont, J., Dequin, S., Giraud, T. et al. (2017). Fungi as a source of food. Microbiol. Spectr. 5 (3): FUNK-0030-2016. https://doi.org/10.1128/microbiolspec.FUNK-0030-2016.

178 Goldsworthy, P.D. and McFarlane, A.C. (2002). Howard Florey, Alexander Fleming and the fairy tale of penicillin. Med. J. Aust. 176: 176–178.

179 Adrio, J.L. and Demain, A.L. (2003). Fungal biotechnology. Int. Microbiol. 6: 191–199.

180 Zhou, P.-P., Meng, J., and Bao, J. (2017). Fermentative production of high titer citric acid from corn stover feedstock after dry dilute acid pretreatment and biodetoxification. Bioresour. Technol. 224: 563–572.

181 Fang, H. and Xia, L. (2015). Cellulase production by recombinant Trichoderma reesei and its application in enzymatic hydrolysis of agricultural residues. Fuel 143: 211–216.

182 Ando, A., Sumida, Y., Negoro, H. et al. (2009). Establishment of Agrobacterium tumefaciens-mediated transformation of an oleaginous fungus, Mortierella alpina 1S-4, and its application for eicosapentaenoic acid producer breeding. Appl. Environ. Microbiol. 75: 5529–5535.

183 Shi, T.Q., Gao, J., Wang, W.J. et al. (2019). CRISPR/Cas9-based genome editing in the filamentous fungus Fusarium fujikuroi and its application in strain engineering for gibberellic acid production. ACS Synth. Biol. 8: 445–454.

184 Marumo, S., Katayama, M., Komori, E. et al. (1982). Microbial production of abscisic acid by Botrytis cinerea. Agric. Biol. Chem. 46: 1967–1968.

185 Pandey, A., Benjamin, S., Soccol, C.R. et al. (1999). The realm of microbial lipases in biotechnology. Biotechnol. Appl. Biochem. 29 (Pt 2): 119–131.


186 Novozymes’ history, our heritage, from 1921 to the present (29 July 2019). © 2019 Novozymes. https://www.novozymes.com/en/about-us/history (accessed 29 July 2019).

187 Turgeon, B.G., Condon, B., Liu, J., and Zhang, N. (2010). Molecular and Cell Biology Methods for Fungi. In: (eds. J.M. Walker and A. Sharon), 3–19. Springer Science+Business Media.

188 Bhairi, S.M. (1992). Transient expression of the β-glucuronidase gene introduced into Uromyces appendiculatus uredospores by particle bombardment. Phytopathology 82: 986.

189 Richey, M.G. (1989). Transformation of filamentous fungi with plasmid DNA by electroporation. Phytopathology 79: 844.

190 de Groot, M.J.A., Bundock, P., Hooykaas, P.J., and Beijersbergen, A.G.M. (1998). Agrobacterium tumefaciens-mediated transformation of filamentous fungi. Nat. Biotechnol. 16: 839–842.

191 Ninomiya, Y., Suzuki, K., Ishii, C., and Inoue, H. (2004). Highly efficient gene replacements in Neurospora strains deficient for non-homologous end-joining. Proc. Natl. Acad. Sci. 101: 12248–12253.

192 Nayak, T., Szewczyk, E., Oakley, C.E. et al. (2006). A versatile and efficient gene-targeting system for Aspergillus nidulans. Genetics 172: 1557–1566.

193 Takahashi, T., Masuda, T., and Koyama, Y. (2006). Enhanced gene targeting frequency in KU70 and KU80 disruption mutants of Aspergillus sojae and Aspergillus oryzae. Mol. Genet. Genomics 275: 460–470.

194 Zhang, J., Mao, Z., Xue, W. et al. (2011). KU80 gene is related to non-homologous end-joining and genome stability in Aspergillus niger. Curr. Microbiol. 62: 1342–1346.

195 Mizutani, O., Kudo, Y., Saito, A. et al. (2008). A defect of LigD (human Lig4 homolog) for non-homologous end joining significantly improves efficiency of gene-targeting in Aspergillus oryzae. Fungal Genet. Biol. 45: 878–889.

196 Qin, X., Li, R., Luo, X. et al. (2017). Deletion of ligD significantly improves gene targeting frequency in the lignocellulolytic filamentous fungus Penicillium oxalicum. Fungal Biol. 121: 615–623.

197 Tashiro, S., Futagami, T., Wada, S. et al. (2013). Construction of a ligD disruptant for efficient gene targeting in white koji mold, Aspergillus kawachii. J. Gen. Appl. Microbiol. 59: 257–260.

198 Fang, Z., Zhang, Y., Cai, M. et al. (2012). Improved gene targeting frequency in marine-derived filamentous fungus Aspergillus glaucus by disrupting ligD. J. Appl. Genet. 53: 355–362.

199 Kück, U., Walz, M., Mohr, G., and Mracek, M. (1989). The 5′-sequence of the isopenicillin N-synthetase gene (pcbC) from Cephalosporium acremonium directs the expression of the prokaryotic hygromycin B phosphotransferase gene (hph) in Aspergillus niger. Appl. Microbiol. Biotechnol. 31: 358–365.

200 Austin, B., Hall, R.M., and Tyler, B.M. (1990). Optimized vectors and selection for transformation of Neurospora crassa and Aspergillus nidulans to bleomycin and phleomycin resistance. Gene 93: 157–162.


201 Forment, J.V., Ramón, D., and MacCabe, A.P. (2006). Consecutive gene deletions in Aspergillus nidulans: application of the Cre/loxP system. Curr. Genet. 50: 217–224.

202 Mizutani, O., Masaki, K., Gomi, K., and Iefuji, H. (2012). Modified Cre-loxP recombination in Aspergillus oryzae by direct introduction of Cre recombinase for marker gene rescue. Appl. Environ. Microbiol. 78: 4126–4133.

203 Steiger, M.G., Vitikainen, M., Uskonen, P. et al. (2011). Transformation system for Hypocrea jecorina (Trichoderma reesei) that favors homologous integration and employs reusable bidirectionally selectable markers. Appl. Environ. Microbiol. 77: 114–121.

204 Honda, S. and Selker, E.U. (2009). Tools for fungal proteomics: multifunctional Neurospora vectors for gene replacement, protein expression and protein purification. Genetics 182: 11–23.

205 Tani, S., Tsuji, A., Kunitake, E. et al. (2013). Reversible impairment of the KU80 gene by a recyclable marker in Aspergillus aculeatus. AMB Express 3: 4.

206 Nielsen, M.L., Albertsen, L., Lettier, G. et al. (2006). Efficient PCR-based gene targeting with a recyclable marker for Aspergillus nidulans. Fungal Genet. Biol. 43: 54–64.

207 Nielsen, J.B., Nielsen, M.L., and Mortensen, U.H. (2008). Transient disruption of non-homologous end-joining facilitates targeted genome manipulations in the filamentous fungus Aspergillus nidulans. Fungal Genet. Biol. 45: 165–170.

208 Arazoe, T., Ogawa, T., Miyoshi, K. et al. (2015). Tailor-made TALEN system for highly efficient targeted gene replacement in the rice blast fungus. Biotechnol. Bioeng. 112: 1335–1342.

209 Arazoe, T., Miyoshi, K., Yamato, T. et al. (2015). Tailor-made CRISPR/Cas system for highly efficient targeted gene replacement in the rice blast fungus. Biotechnol. Bioeng. 112: 2543–2549.

210 Matsu-ura, T., Baek, M., Kwon, J., and Hong, C. (2015). Efficient gene editing in Neurospora crassa with CRISPR technology. Fungal Biol. Biotechnol. 2: 4.

211 Pohl, C., Kiel, J.A.K.W., Driessen, A.J.M. et al. (2016). CRISPR/Cas9 based genome editing of Penicillium chrysogenum. ACS Synth. Biol. 5: 754–764.

212 Katayama, T., Nakamura, H., Zhang, Y. et al. (2019). Forced recycling of an AMA1-based genome-editing plasmid allows for efficient multiple gene deletion/integration in the industrial filamentous fungus Aspergillus oryzae. Appl. Environ. Microbiol. 85: 1–16.

213 Arras, S.D.M., Chua, S.M.H., Wizrah, M.S.I. et al. (2016). Targeted genome editing via CRISPR in the pathogen Cryptococcus neoformans. PLoS One 11: e0164322.

214 Fuller, K.K., Chen, S., Loros, J.J., and Dunlap, J.C. (2015). Development of the CRISPR/Cas9 system for targeted gene disruption in Aspergillus fumigatus. Eukaryot. Cell 14: 1073–1080.


215 Pohl, C., Mózsik, L., Driessen, A.J.M. et al. (2018). Genome editing in Penicillium chrysogenum using Cas9 ribonucleoprotein particles. In: Synthetic Biology: Methods and Protocols (ed. J.C. Braman), 213–232. Springer Science+Business Media.

216 Foster, A.J., Martin-Urdiroz, M., Yan, X. et al. (2018). CRISPR-Cas9 ribonucleoprotein-mediated co-editing and counterselection in the rice blast fungus. Sci. Rep. 8: 1–12.

217 Al Abdallah, Q., Souza, A.C.O., Martin-Vicente, A. et al. (2018). Whole-genome sequencing reveals highly specific gene targeting by in vitro assembled Cas9-ribonucleoprotein complexes in Aspergillus fumigatus. Fungal Biol. Biotechnol. 5: 5–12.

218 Nødvig, C.S., Hoof, J.B., Kogle, M.E. et al. (2018). Efficient oligo nucleotide mediated CRISPR-Cas9 gene editing in Aspergilli. Fungal Genet. Biol. 115: 78–89.

219 Huck, S., Bock, J., Girardello, J. et al. (2019). Marker-free genome editing in Ustilago trichophora with the CRISPR-Cas9 technology. RNA Biol. 16: 397–403.

220 Zhang, C., Meng, X., Wei, X., and Lu, L. (2016). Highly efficient CRISPR mutagenesis by microhomology-mediated end joining in Aspergillus fumigatus. Fungal Genet. Biol. 86: 47–57.

221 Chen, J., Lai, Y., Wang, L. et al. (2017). CRISPR/Cas9-mediated efficient genome editing via blastospore-based transformation in entomopathogenic fungus Beauveria bassiana. Sci. Rep. 8: 1–10.

222 Liu, Q., Gao, R., Li, J. et al. (2017). Development of a genome-editing CRISPR/Cas9 system in thermophilic fungal Myceliophthora species and its application to hyper-cellulase production strain engineering. Biotechnol. Biofuels 10: 1.

223 Vanegas, K.G., Jarczynska, Z.D., Strucko, T., and Mortensen, U.H. (2019). Cpf1 enables fast and efficient genome editing in Aspergilli. Fungal Biol. Biotechnol. 6: 1–10.

224 Zheng, X., Zheng, P., Zhang, K. et al. (2018). 5S rRNA promoter for guide RNA expression enabled highly efficient CRISPR/Cas9 genome editing in Aspergillus niger. ACS Synth. Biol. 8 (7): 1568–1574.

225 Zheng, X., Zheng, P., Sun, J. et al. (2018). Heterologous and endogenous U6 snRNA promoters enable CRISPR/Cas9 mediated genome editing in Aspergillus niger. Fungal Biol. Biotechnol. 5: 2.

226 Song, L., Ouedraogo, J.P., Kolbusz, M. et al. (2018). Efficient genome editing using tRNA promoter-driven CRISPR/Cas9 gRNA in Aspergillus niger. PLoS One 13: 1–17.

227 Wenderoth, M., Pinecker, C., Voß, B., and Fischer, R. (2017). Establishment of CRISPR/Cas9 in Alternaria alternata. Fungal Genet. Biol. 101: 55–60.

228 Chen, B.-X., Wei, T., Ye, Z.-W. et al. (2018). Efficient CRISPR-Cas9 gene disruption system in edible-medicinal mushroom Cordyceps militaris. Front. Microbiol. 9: 1–11.

229 Tjio, J.H. and Puck, T.T. (1958). The genetics of somatic mammalian cells: II. Chromosomal constitution of cells in tissue culture. J. Exp. Med. 108: 259–268.


230 Wurm, F. and Wurm, M. (2017). Cloning of CHO cells, productivity and genetic stability – a discussion. Processes 5: 20.

231 Wurm, F.M. and Hacker, D. (2011). First CHO genome. Nat. Biotechnol. 29: 718–720.

232 Walsh, G. (2018). Biopharmaceutical benchmarks 2018. Nat. Biotechnol. 36: 1136–1145.

233 Kaufman, R.J., Wasley, L.C., Spiliotes, A.J. et al. (1985). Coamplification and coexpression of human tissue-type plasminogen activator and murine dihydrofolate reductase sequences in Chinese hamster ovary cells. Mol. Cell. Biol. 5: 1750–1759.

234 Finn, G.K., Kurz, B.W., Cheng, R.Z., and Shmookler Reis, R.J. (1989). Homologous plasmid recombination is elevated in immortally transformed cells. Mol. Cell. Biol. 9: 4009–4017.

235 Jordan, M., Schallhorn, A., and Wurm, F.M. (1996). Transfecting mammalian cells: optimization of critical parameters affecting calcium-phosphate precipitate formation. Nucleic Acids Res. 24: 596–601.

236 Norton, P.A. and Pachuk, C.J. (2003). Methods for DNA introduction into mammalian cells. In: New Comprehensive Biochemistry (ed. S.C. Makrides), 265–277. Elsevier.

237 Folger, K.R., Wong, E.A., Wahl, G., and Capecchi, M.R. (1982). Patterns of integration of DNA microinjected into cultured mammalian cells: evidence for homologous recombination between injected plasmid DNA molecules. Mol. Cell. Biol. 2: 1372–1387.

238 Zheng, H. and Wilson, J.H. (1990). Gene targeting in normal and amplified cell lines. Nature 344: 170–173.

239 Krämer, O., Klausing, S., and Noll, T. (2010). Methods in mammalian cell line engineering: from random mutagenesis to sequence-specific approaches. Appl. Microbiol. Biotechnol. 88: 425–436.

240 Jasin, M. and Berg, P. (1988). Homologous integration in mammalian cells without target gene selection. Genes Dev. 2: 1353–1363.

241 Smithies, O., Gregg, R.G., Boggs, S.S. et al. (1985). Insertion of DNA sequences into the human chromosomal β-globin locus by homologous recombination. Nature 317: 230–234.

242 Yamane-Ohnuki, N., Kinoshita, S., Inoue-Urakubo, M. et al. (2004). Establishment of FUT8 knockout Chinese hamster ovary cells: an ideal host cell line for producing completely defucosylated antibodies with enhanced antibody-dependent cellular cytotoxicity. Biotechnol. Bioeng. 87: 614–622.

243 Kito, M., Itami, S., Fukano, Y. et al. (2003). Construction of engineered CHO strains for high-level production of recombinant proteins. Appl. Microbiol. Biotechnol. 60: 442–448.

244 Urlaub, G. and Chasin, L.A. (1980). Isolation of Chinese hamster cell mutants deficient in dihydrofolate reductase activity. Proc. Natl. Acad. Sci. 77: 4216–4220.

245 Pallavicini, M.G., DeTeresa, P.S., Rosette, C. et al. (1990). Effects of methotrexate on transfected DNA stability in mammalian cells. Mol. Cell. Biol. 10: 401–404.


246 Gandor, C., Leist, C., Fiechter, A., and Asselbergs, F.A.M. (1995). Amplification and expression of recombinant genes in serum-independent Chinese hamster ovary cells. FEBS Lett. 377: 290–294.

247 Wurm, F.M., Gwinn, K.A., and Kingston, R.E. (1986). Inducible overexpression of the mouse c-myc protein in mammalian cells. Proc. Natl. Acad. Sci. 83: 5414–5418.

248 Barnes, L.M., Bentley, C.M., and Dickson, A.J. (2000). Advances in animal cell recombinant protein production: GS-NS0 expression system. Cytotechnology 32: 109–123.

249 Bebbington, C.R., Renner, G., Thomson, S. et al. (1992). High-level expression of a recombinant antibody from myeloma cells using a glutamine synthetase gene as an amplifiable selectable marker. Nat. Biotechnol. 10: 169–175.

250 Huang, Y., Li, Y., Wang, Y.G. et al. (2007). An efficient and targeted gene integration system for high-level antibody expression. J. Immunol. Methods 322: 28–39.

251 Lieu, P.T., Machleidt, T., Thyagarajan, B. et al. (2009). Generation of site-specific retargeting platform cell lines for drug discovery using phiC31 and R4 integrases. J. Biomol. Screen. 14: 1207–1215.

252 Inniss, M.C., Bandara, K., Jusiak, B. et al. (2017). A novel Bxb1 integrase RMCE system for high fidelity site-specific integration of mAb expression cassette in CHO cells. Biotechnol. Bioeng. 114: 1837–1846.

253 Santiago, Y., Chan, E., Liu, P.-Q. et al. (2008). Targeted gene knockout in mammalian cells by using engineered zinc-finger nucleases. Proc. Natl. Acad. Sci. 105: 5809–5814.

254 Cost, G.J., Freyvert, Y., Vafiadis, A. et al. (2010). BAK and BAX deletion using zinc-finger nucleases yields apoptosis-resistant CHO cells. Biotechnol. Bioeng. 105: 330–340.

255 Liu, P.Q., Chan, E.M., Cost, G.J. et al. (2010). Generation of a triple-gene knockout mammalian cell line using engineered zinc-finger nucleases. Biotechnol. Bioeng. 106: 97–105.

256 Orlando, S.J., Santiago, Y., DeKelver, R.C. et al. (2010). Zinc-finger nuclease-driven targeted integration into mammalian genomes using donors with limited chromosomal homology. Nucleic Acids Res. 38: 1–15.

257 Cristea, S., Freyvert, Y., Santiago, Y. et al. (2013). In vivo cleavage of transgene donors promotes nuclease-mediated targeted integration. Biotechnol. Bioeng. 110: 871–880.

258 Lin, P.-C., Chan, K.F., Kiess, I.A. et al. (2019). Attenuated glutamine synthetase as a selection marker in CHO cells to efficiently isolate highly productive stable cells for the production of antibodies and other biologics. MAbs 11: 965–976.

259 Grav, L.M., Lee, J.S., Gerling, S. et al. (2015). One-step generation of triple knockout CHO cell lines using CRISPR/Cas9 and fluorescent enrichment. Biotechnol. J. 10: 1446–1456.

260 Lee, J.S., Kallehauge, T.B., Pedersen, L.E., and Kildegaard, H.F. (2015). Site-specific integration in CHO cells mediated by CRISPR/Cas9 and homology-directed DNA repair pathway. Sci. Rep. 5: 8572.

References

261 Zhao, M., Wang, J., Luo, M. et al. (2018). Rapid development of stable

262

263

264

265

266

267

268

269

270

271

transgene CHO cell lines by CRISPR/Cas9-mediated site-specific integration into C12orf35. Appl. Microbiol. Biotechnol. 102: 6105–6117. Lee, J.S., Park, J.H., Ha, T.K. et al. (2018). Revealing key determinants of clonal variation in transgene expression in recombinant CHO cells using targeted genome editing. ACS Synth. Biol. 7: 2867–2878. Gaidukov, L., Wroblewska, L., Teague, B. et al. (2018). A multi-landing pad DNA integration platform for mammalian cell engineering. Nucleic Acids Res. 46: 4072–4086. Kawabe, Y., Komatsu, S., Komatsu, S. et al. (2018). Targeted knock-in of an scFv-Fc antibody gene into the HPRT locus of Chinese hamster ovary cells using CRISPR/Cas9 and CRIS-PITCh systems. J. Biosci. Bioeng. 125: 599–605. Lu, Y., Zhou, Q., Han, Q. et al. (2018). Inactivation of deubiquitinase CYLD enhances therapeutic antibody production in Chinese hamster ovary cells. Appl. Microbiol. Biotechnol. 102: 6081–6093. Wang, W., Zheng, W., Hu, F. et al. (2018). Enhanced biosynthesis performance of heterologous proteins in CHO-K1 cells using CRISPR-Cas9. ACS Synth. Biol. 7: 1259–1268. Eisenhut, P., Klanert, G., Weinguny, M. et al. (2018). A CRISPR/Cas9 based engineering strategy for overexpression of multiple genes in Chinese hamster ovary cells. Metab. Eng. 48: 72–81. Shin, J., Lee, N., Song, Y. et al. (2015). Efficient CRISPR/Cas9-mediated multiplex genome editing in CHO cells via high-level sgRNA-Cas9 complex. Biotechnol. Bioprocess Eng. 20: 825–833. Grav, L.M., Sergeeva, D., Lee, J.S. et al. (2018). Minimizing clonal variation during mammalian cell line engineering for improved systems biology data generation. ACS Synth. Biol. 7: 2148–2159. Kim, M.S., Kim, W.H., and Lee, G.M. (2008). Characterization of site-specific recombination mediated by Cre recombinase during the development of erythropoietin producing CHO cell lines. Biotechnol. Bioprocess Eng. 13: 418–423. Schmieder, V., Bydlinski, N., Strasser, R. et al. (2018). 
Enhanced genome editing tools for multi-gene deletion knock-out approaches using paired CRISPR sgRNAs in CHO cells. Biotechnol. J. 13: 1–10.


Part 2 Applications


11 Metabolic Engineering of Escherichia coli

Zi Wei Luo 1,2, Jung Ho Ahn 1,2, Tong Un Chae 1,2, So Young Choi 1,2, Seon Young Park 1,2, Yoojin Choi 1,2, Jiyong Kim 1,2, Cindy Pricilia Surya Prabowo 1,2, Jong An Lee 1,2, Dongsoo Yang 1,2, Taehee Han 1,2, Hanwen Xu 1,2, and Sang Yup Lee 1,2,3,4

1 Metabolic and Biomolecular Engineering National Research Laboratory, Department of Chemical and Biomolecular Engineering (BK21 Plus Program), Institute for the BioCentury, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
2 Systems Metabolic Engineering and Systems Healthcare Cross-Generation Collaborative Laboratory, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
3 BioProcess Engineering Research Center, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea
4 BioInformatics Research Center, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

11.1 Introduction

Metabolic engineering has become an essential enabling technology for developing microbial strains that produce chemicals and materials efficiently from renewable resources. Its integration with more recently developed tools and strategies from systems biology, synthetic biology, and evolutionary engineering has brought about a paradigm shift in metabolic engineering research, allowing more rapid, comprehensive, and sophisticated engineering of cellular regulatory and metabolic networks at the systems level, an approach termed systems metabolic engineering [1–3]. Thanks to these advances, an increasing number of metabolic engineering applications, such as the production of biobased chemicals, fuels, and materials, have been successfully demonstrated over the past several decades [4–6]. Such advances are moving us toward a sustainable biobased economy and helping us accomplish the United Nations' Sustainable Development Goals [7].

Escherichia coli has served as one of the most frequently employed host microorganisms for metabolic engineering and related studies, both for establishing new, proof-of-concept engineering tools and for developing novel and useful microbial biotechnological applications [8, 9]. The preference for E. coli as a workhorse for metabolic engineering and synthetic biology stems from several advantageous features: fast growth, well-established cultivation techniques, the wealth of knowledge on E. coli in all subject disciplines including genetics, biochemistry, and physiology, and the availability of a myriad of genetic and other engineering tools and strategies accumulated through extensive investigation and development over more than a century [8]. This knowledge base has made it possible to design rational approaches for engineering E. coli strains with desirable phenotypes. It has also enabled the establishment of sophisticated metabolic engineering strategies, greatly speeding up the overall strain development process and reducing development costs, especially for industrial applications [10]. For instance, elucidation of the complex gene regulatory networks involved in cellular amino acid biosynthesis has contributed significantly to the development of engineered E. coli strains that overproduce amino acids through targeted deregulation [11–13]. Advanced DNA assembly methods, featuring speed, reliability, standardization, modularity, and high-throughput automation, have made combinatorial construction of multigene pathways in E. coli a straightforward task [14–16]. In addition, various genome-wide manipulation techniques have been developed for rapid and multiplexed gene deletion, insertion, replacement, and even up- and downregulation. Representative techniques include multiplex automated genome engineering (MAGE) [17], clustered regularly interspaced short palindromic repeats (CRISPR)-based gene editing technologies along with CRISPR activation and interference [18–21], and synthetic small regulatory RNAs [22–24]; these are indispensable for systems metabolic engineering of E. coli and other microorganisms. Meanwhile, with the increasing availability of omics data, especially for E. coli, various systems biological tools and strategies supported by bioinformatics (genome-scale metabolic network reconstruction and modeling, flux balance analysis, etc.) have been developed and used in combination with metabolic engineering, facilitating the development of high-performance strains [25, 26]. Taken together, many of the metabolic engineering tools and strategies described earlier and in previous chapters were established using E. coli, making it one of the most preferred microbial cell factories.

In this chapter, we review achievements and advances in the metabolic engineering of E. coli for a wide spectrum of applications through representative examples, ranging from the biobased production of fuels (including traditional and advanced biofuels) and chemicals (including bulk, specialty, and natural chemicals) to materials (including proteins, biopolymers, and nanoparticle (NP) materials).

11.2 Metabolic Engineering of E. coli for the Production of Fuels

Our energy system relies too heavily on finite fossil resources, and this reliance is one of the major causes of climate change [27, 28]. Renewable biofuels are an important alternative for alleviating the drawbacks associated with fossil fuels [29–31]. Metabolic engineering has been applied to various native microorganisms capable of producing alcohols [32–34]. In parallel, much progress has been made in the metabolic engineering of non-native biofuel producers, especially E. coli, through reconstitution of heterologous biofuel pathways derived from other organisms. The non-natural biofuels produced by E. coli (Table 11.1) can be broadly categorized according to their production pathways (Figure 11.1): the fermentative pathway, the keto acid pathway, the isoprenoid pathway, and the fatty acid pathway.

11.2.1 Fermentative Pathway

The fermentative pathway uses pyruvate and acetyl-CoA as precursors to produce alcohols including ethanol, 1-butanol, and isopropanol (Figure 11.1, yellow highlight). Ethanol is blended with gasoline to increase the octane number of fuels and reduce engine pollutant emissions [107]. Ethanol production in E. coli was first achieved by overexpressing the Zymomonas mobilis PET operon, comprising the pdc and adhII genes that encode pyruvate decarboxylase and alcohol dehydrogenase II, respectively, under the native lac promoter [35]. The PET operon was also integrated into the E. coli chromosome at the pflB gene (encoding pyruvate formate-lyase), which maintained stable expression while simultaneously removing a competing pathway from pyruvate. The frd gene encoding fumarate reductase was further deleted to eliminate succinate production. The resultant engineered strain, designated KO11, produced 54.4 and 41.6 g l−1 of ethanol from glucose and xylose, respectively, by batch fermentation [36]. The KO11 strain was then subjected to adaptive laboratory evolution under increasing ethanol concentrations to enhance its ethanol tolerance. This approach yielded the LYO1 strain, which produced 63.2 g l−1 of ethanol from xylose by batch fermentation [37].

Butanol has a higher energy content than ethanol and is much less hygroscopic, making it a better gasoline substitute [38]. Isopropanol is a secondary alcohol that can replace methanol in the esterification of fats and oils, reducing the crystallization of biodiesel at low temperatures [108]. Production of 1-butanol and isopropanol has been established in E. coli by overexpressing the heterologous genes of their biosynthetic pathways from Clostridium acetobutylicum [39, 42]. 1-Butanol is synthesized from acetyl-CoA via six enzymatic steps (Figure 11.1, yellow highlight). Replacing the C. acetobutylicum thl gene (encoding acetyl-CoA acetyltransferase), which converts acetyl-CoA to acetoacetyl-CoA, with its E. coli ortholog (atoB) increased 1-butanol production [38, 109, 110]. In addition, an engineered E. coli strain deficient in native fermentation pathways (ΔldhA ΔadhE ΔfrdBC Δpta) was constructed to increase the cellular availability of NADH and acetyl-CoA, which served as effective driving forces for 1-butanol synthesis. Further overexpression of the Treponema denticola ter gene encoding trans-enoyl-CoA reductase in this strain resulted in the production of 30 g l−1 of 1-butanol by anaerobic fed-batch fermentation with in situ product removal [40]. Isopropanol is also synthesized from acetyl-CoA, via four enzymatic steps (Figure 11.1, yellow highlight). An engineered E. coli strain was developed to produce isopropanol by overexpressing the C. acetobutylicum thl and adc (encoding acetoacetate decarboxylase), E. coli atoAD (encoding acetoacetyl-CoA


Table 11.1 Representative biofuels, chemicals, and materials produced by metabolically engineered E. coli.

Product | Titer (g l−1) | Yield (g g−1) | Productivity (g l−1 h−1) | Carbon source | Culture style | Metabolic engineering strategies | Reference

Biofuel
Ethanol | 34.6 | N/A a) | N/A | Glucose | Flask | Overexpression of Z. mobilis pdc and adhII | [35]
Ethanol | 54.4 | 0.57 | N/A | Glucose | Batch | Chromosomal integration of Z. mobilis pdc and adhII into pfl locus; deletion of frd | [36]
Ethanol | 41.6 | 0.53 | N/A | Xylose | Batch | Same as above | [36]
Ethanol | 63.2 | 0.45 | N/A | Xylose | Batch | Mutant strain with high ethanol tolerance | [37]
Butanol | 0.552 | N/A | N/A | Glycerol | Flask | Overexpression of C. acetobutylicum adhE2, crt, bcd, etfAB, and hbd and E. coli atoB; deletion of adhE, ldhA, frdBC, fnr, and pta | [38]
Butanol | 1.2 | N/A | N/A | Glucose | Flask | Overexpression of C. acetobutylicum adhE2, crt, bcd, etfAB, hbd, thiL, and adhE | [39]
Butanol | 30 | 0.287 | 0.18 | Glucose | Fed-batch with gas stripping | Overexpression of T. denticola ter | [40]
Butanol | 0.667 | N/A | N/A | Glucose | Flask | Based on keto acid pathway; overexpression of leuABCD, ilvA, KDC, and ADH; deletion of ilvD | [41]
Isopropanol | 4.9 | 0.145 | 0.41 | Glucose | Flask | Overexpression of C. acetobutylicum thl and adc, E. coli atoAD, and C. beijerinckii adh | [42]
Isopropanol | 143 | 0.224 | N/A | Glucose | Fed-batch with gas stripping | Optimization of fermentation condition and product removal by gas stripping | [43]
Isobutanol | 22 | 0.35 | N/A | Glucose | Flask | Overexpression of B. subtilis ilvICD and alsS, KDC, and ADH; deletion of adhE, ldhA, frdAB, fnr, pta, and pflB | [41]
Isobutanol | 13.4 | 0.411 | N/A | Glucose | Flask | Overexpression of engineered ilvC and adhA | [44]
Isobutanol | 50 | 0.29 | 0.7 | Glucose | Fed-batch with gas stripping | Overexpression of L. lactis kivd and adhA, B. subtilis alsS, and ilvCD; deletion of adhE, frdBC, fnr, ldhA, pta, and pflB | [45]
Propanol | 0.6 | N/A | N/A | Glucose | Flask | Overexpression of ilvA, KDC, and ADH; deletion of ilvD | [41]
3-Methyl-1-butanol | 9.5 | 0.11 | N/A | Glucose | Two-phase flask | Overexpression of ilvCD, KDC, ADH, leuA (G462D), leuBCD, and B. subtilis alsS | [46]
2-Phenylethanol | 1.016 | 0.05 | 0.033 | Glucose | Flask | Overexpression of aroGfbr, pheAfbr, KDC, ADH, and S. cerevisiae aro8 | [47]
Isoprene | 0.587 | N/A | N/A | Glucose | Two-phase flask | Overexpression of mvaD, idi, and codon-optimized ispS | [48]
FAEEs | 0.674 | N/A | N/A | Glucose | Flask | Overexpression of TES, ACL, FAR, pdc, adhB, and AT; deletion of fadE | [49]
FAEEs | 1.5 | N/A | N/A | Glucose | Batch | Dynamic sensor-regulated expression of pdc, adhB, tesA, fadD, and atfA | [50]
FAMEs | 0.56 | N/A | N/A | Glucose | Batch | Overexpression of pdc, adhB, tesA, atfA, BTE, and DmJHAMT; deletion of fadD and aas | [51]
Long-chain alkanes and alkenes (C13–17) | 0.3 | N/A | N/A | Glucose | Flask | Overexpression of orf1594 and orf1593 | [52]
Alkenes | 0.098 | N/A | 24.9 mg g−1 DCW | Glucose | Flask | Overexpression of oleTJE and rhFRED | [53]
Short-chain alkanes (C4–12) | 0.581 | N/A | N/A | Glucose | Fed-batch | Overexpression of tesA (L109P), fadD, acr, and CER1 | [54]

Chemical
4ABA | 66.25 | N/A | 1.01 | Glucose | Fed-batch | Overexpression of C. propionicum act and mutant gadB | [55]
5AVA | 0.86 | 0.046 | 0.018 | Glucose | Flask | Overexpression of P. putida davB and davA; deletion of ldcC and cadA | [56]
3HP | 71.9 | N/A | 1.8 | Glycerol | Fed-batch | Overexpression of K. pneumoniae glycerol dehydratase regeneration factor and a mutant of C. necator aldehyde dehydrogenase; deletion of ackA, pta, and yqhD | [57]
4HB | 103.4 | 0.419 | 0.844 | Glycerol | Fed-batch | Overexpression of aceEF, lpd, sucCD, C. kluyveri sucD, and 4hbd; deletion of ptsG, iclR, sdhAB, gabD, and yneI | [58]
1,3-Diaminopropane | 13.06 | N/A | 0.19 | Glucose | Fed-batch | Overexpression of A. baumannii dat and ddc and E. coli ppc and aspC; mutation in thrA and lysC to remove feedback inhibition; deletion of pfkA | [59]
Putrescine | 43.0 | 0.256 | 1.265 | Glucose | Fed-batch | Overexpression of E. coli speC, argECBH, speF, potE, argD, and speC; deletion of lacI, speE, speG, argI, and puuPA | [24]
Cadaverine | 12.6 | N/A | 0.18 | Glucose | Fed-batch | Overexpression of E. coli dapA and cadA; deletion of lacI, speE, speG, ygjG, and puuPA; down-regulation of murE | [22]
Butyrolactam | 54.14 | 0.12 | 0.58 | Glucose | Fed-batch | Overexpression of E. coli mutant gadB and C. propionicum act; deletion of lacI | [55]
Valerolactam | 1.18 | 0.0045 | 0.017 | Glucose | Fed-batch | Overexpression of E. coli ppc, P. putida davAB, and C. propionicum act; deletion of lacI, speE, speG, ygjG, puuPA, cadA, and ldcC | [55]
Caprolactam | 79.6 μg l−1 | 3.9 μg g−1 | 9.5 μg l−1 h−1 | Glycerol | Fed-batch | Overexpression of A. vinelandii nifV, M. aeolicus Nankai-3 aksDEF, L. lactis mutant kdcA, V. fluvialis vfl, and C. propionicum act | [55]
Glutaric acid | 54.5 | 0.37 | 0.76 | Glucose | Fed-batch | Modulating expression of cadA, patA, patD, gabT, and gabD; overexpression of gabP and potE; deletion of iclR | [60]
Succinic acid | 99.2 | 1.14 | 1.3 | Glucose | Two-phase | Overexpression of R. etli pyc; deletion of pflB, ldhA, and ptsG | [61]
1,3-Propanediol | 135 | 0.51 | 3.5 | Glucose | Fed-batch | Overexpression of galP, glk, and yqhD, S. cerevisiae DAR1 and GPP2, and K. pneumoniae dhaB1, dhaB2, and dhaB3; deletion of ptsH, ptsI, crr, and tpiA; down-regulation of gapA | [62]
1,4-Butanediol | 18 | N/A | N/A | Glucose | Fed-batch | Overexpression of P. gingivalis 4hbD, cat2, and ald and C. beijerinckii adh | [63]
l-Arginine | 11.64 | 0.17 | 0.24 | Glucose | Batch | Overexpression of yggA and argP(V216A); deletion of argR, speC, speA, and adiA; replacement of argA with feedback-resistant variant | [64]
l-Threonine | 82.4 | 0.39 | 1.65 | Glucose | Fed-batch | Overexpression of rhtABC and acs; deletion of thrA, lysC, tdh, ilvA, lysA, metA, iclR, and tdcC | [26]
l-Valine | 60.7 | 0.22 | 2.06 | Glucose | Fed-batch | Overexpression of ilvB, mutant ilvN, ilvCED, lrp, and ygaZH; deletion of ilvA | [65]
l-Tyrosine | 43.14 | 0.107 | N/A | Glucose | Fed-batch | Overexpression of aroGfbr, aroL, and Z. mobilis tyrC; deletion of tyrP | [66]
Methyl anthranilate | 4.47 | N/A | N/A | Glucose | Two-phase fed-batch | Overexpression of Zea mays aamt1; gene expression optimization; increasing the precursor anthranilate supply and intracellular SAM pool; in situ product removal with tributyrin | [67]
S-Mandelic acid | 1.02 | 0.012 | N/A | Glucose | Flask | Overexpression of A. orientalis hmaS; combinatorial inactivation of competing pathways | [68]
R-Mandelic acid | 0.88 | 0.010 | N/A | Glucose | Flask | Overexpression of A. orientalis hmaS, S. coelicolor hmo, and R. graminis dmd; combinatorial inactivation of competing pathways | [68]
S-Styrene oxide | 1.32 | 0.115 | N/A | Glucose | Flask | Overexpression of A. thaliana PAL, S. cerevisiae FDC1, P. putida S12 styAB, and E. coli tktA; deletion of tyrA | [69]
Amorpha-4,11-diene | 27.4 | N/A | N/A | Glucose | Fed-batch | Overexpression of MEP pathway genes; introduction of heterologous mevalonate pathway for enhanced pool of precursors, IPP and DMAPP | [70]
Artemisinic acid | 0.105 | N/A | N/A | Complex | Flask | Overexpression of MEP pathway genes; introduction of heterologous mevalonate pathway for enhanced pool of precursors, IPP and DMAPP; N-terminal transmembrane engineering of AMO | [71]
Oxygenated taxane | 0.57 | N/A | N/A | Glycerol | Fed-batch | Balancing expression levels between T5αOH and CPR genes; N-terminal modification of T5αOH and CPR | [72]
Resveratrol | 2.3 | N/A | N/A | Glucose | Flask | Investigation of various STSs; exploration of different genetic configurations of 4CL and STS; improvement of the precursor malonyl-CoA pool by overexpressing its biosynthetic genes | [73]
Naringenin | 0.422 | N/A | N/A | Glucose | Flask | Improvement of the precursor malonyl-CoA pool by silencing candidate genes using CRISPR interference system | [74]
S-Reticuline | 0.046 | N/A | N/A | Glucose, glycerol | Fed-batch | Improvement of the precursor tyrosine pool by increasing metabolic flux in the shikimate pathway | [75]
Thebaine | 0.0021 | N/A | N/A | Complex | Flask | N-terminus deletion of enzymes for their functional expression; stepwise cultivation | [76]
Hydrocodone | 360 μg l−1 | N/A | N/A | Complex | Flask | N-terminus deletion of enzymes for their functional expression; stepwise cultivation | [76]
Erythromycin A | 0.01 | N/A | N/A | Glycerol, propionate | Flask | His-tag fusion to pathway enzymes for improving their expression; overexpression of the chaperone-encoding genes for soluble expression of a large modular PKS | [77]
Echinomycin | 300 μg l−1 | N/A | N/A | Glucose | Fed-batch | Isolation, characterization, and heterologous expression of echinomycin biosynthetic gene cluster from S. lasaliensis | [78]
Yersiniabactin | 0.067 | N/A | 0.0011 | Glucose, salicylate | Fed-batch | Lowering culture temperature to 22 °C for more abundant expression of large genes | [79]

Material
Immunoglobulin G (IgG) | 0.001–0.025 | N/A | N/A | Complex | Flask | Oxidative cytoplasm conditions | [80]
Fab | 0.131 | N/A | N/A | Complex | Flask | Transcriptional machinery engineering | [81]
Protein glycosylated with glycan trimannosyl chitobiose | 50 μg l−1 | N/A | N/A | Complex | Flask | Introduction of yeast glycosyltransferases | [82]
Protein glycosylated with glycan trimannosyl chitobiose | 0.014 | N/A | N/A | Complex | Flask | Flow cytometry cell sorting | [83]
Atm1 (yeast mitochondrial ABC transporter) | 0.0004–0.001 | N/A | N/A | Complex | Flask | OmpA signal sequence fusion | [84, 85]
Spider dragline silk | 0.5–2.7 | N/A | N/A | Glucose | Fed-batch | Overexpression of tRNAGly and glyA | [86]
Poly(3HB) | 141.6 | N/A | 4.63 | Glucose | Fed-batch | Overexpression of phaCAB | [87]
Poly(3HP) | 25.7 | N/A | 0.71 | Glycerol | Fed-batch | Overexpression of tyrA, dhaB123, gdrAB, pduP, and phaC; deletion of tyrA and prpR | [88]
Poly(4HB) | 7.843 | N/A | 0.151 | Glucose | Fed-batch | Overexpression of sucD, 4hbD, orfZ, and phaP1; deletion of sad and gabD | [89]
MCL-PHA b) | 21 wt% DCW | N/A | N/A | Decanoate | Flask | Overexpression of phaC1; deletion of fadB | [90]
SCL-MCL-PHA or MCL-PHA c) | 8 wt% DCW | N/A | N/A | Dodecanoate | Flask | Overexpression of phaC1 and fabG | [91]
SCL-MCL-PHA d) | 1.90 | N/A | N/A | Gluconate, decanoate | Flask | Overexpression of phaC2, phaAB, and parB; deletion of fadA | [92]
Poly(3HB-co-3HHx) | 21.5 | N/A | 0.53 | Dodecanoate | Fed-batch | Overexpression of phaC, phaJ, orf, and phaB | [93]
PLA | 11 wt% DCW | N/A | N/A | Glucose | Flask | Overexpression of ldhA, pct532, and phaC1400; deletion of ppc, ackA, and adhE | [94]
Poly(LA-co-3HB) | 20 | N/A | N/A | Glucose | Fed-batch | Overexpression of phaAB, pct540, phaC1310, ldhA, and acs; deletion of pflB, adhE, lacI, and frdABCD | [95]
PLGA | 1.95 | N/A | 0.023 | Glucose, xylose | Fed-batch | Overexpression of xylBC, ldhA, pct540, and phaC1437; deletion of adhE, poxB, frdB, pflB, and dld | [96]
PLGA | 6.93 | N/A | 0.099 | Xylose | Fed-batch | Overexpression of xylBC, ldhA, pct540, and phaC1437; deletion of adhE, poxB, frdB, pflB, aceB, dld, and glcDEFGB | [97]
Poly(PhLA-co-3HB) | 13.9 | N/A | 0.145 | Glucose | Fed-batch | Overexpression of aroGfbr, pheAfbr, phaAB, hadA, phaC1437, and fldH; deletion of tyrR, poxB, pflB, frdB, adhE, tyrB, aspC, and ldhA | [98]
Hyaluronan | 0.19 | N/A | N/A | Glucose | Flask | Overexpression of S. equisimilis ssehasA and E. coli ugd | [99]
Bacterial cellulose | 0.031 | N/A | N/A | Glucose | Flask | Overexpression of G. hansenii bcsABCD, cmcax, and ccpAx | [100]
Poly(glutamic acid) | 3.7 | N/A | 0.104 | Glucose, glutamate | Fed-batch | Overexpression of B. subtilis pgsBCA | [101]
Cyanophycin | 1.6 | N/A | N/A | Glucose | Batch | Overexpression of Synechocystis sp. cphA | [102]

a) Not available.
b) Poly(3HHx-co-3HO-co-3HD-co-3HDD).
c) Poly(3HB-co-3HHx) or poly(3HHx-co-3HO-co-3HD).
d) Poly(3HB-co-3HHx-co-3HO-co-3HD).


Figure 11.1 Metabolic pathways and engineering strategies for the production of representative biofuels in E. coli. Metabolite abbreviations: Ac-CoA, acetyl-CoA; AcAc-CoA, acetoacetyl-CoA; ACP, acyl carrier protein; DMAPP, dimethylallyl diphosphate; FAEE, fatty acid ethyl ester; FAME, fatty acid methyl ester; FFA, free fatty acid; FPP, farnesyl pyrophosphate; GPP, geranyl pyrophosphate; (S)3HA-ACP, 3-hydroxyacyl-ACP; 3-HB-CoA, 3-hydroxybutyryl-CoA; IPP, isopentenyl pyrophosphate; Mal-CoA, malonyl-CoA; Mal-ACP, malonyl-ACP; MEP, methylerythritol-4-phosphate; MVA, mevalonate. Gene symbols: aar, acyl-ACP reductase; accBCDA, acetyl-CoA carboxylase complex; adc, fatty aldehyde decarbonylase; adh, aldehyde/alcohol dehydrogenase; adh2, alcohol dehydrogenase; adhE2, aldehyde/alcohol dehydrogenase; adhII, alcohol dehydrogenase II; atfA, wax-ester synthase; atoAD, acetyl-CoA:acetoacetyl-CoA transferase; atoB, acetyl-CoA acetyltransferase; bcd, butyryl-CoA dehydrogenase; BTE, acyl-ACP thioesterase; CER1, aldehyde decarbonylase; crt, crotonase; DmJHAMT, juvenile hormone acid O-methyltransferase; etf, electron transfer flavoprotein; fadD, acyl-CoA synthase; fadR, fatty acid metabolism regulator protein; frd, fumarate reductase; hbd, 3-hydroxybutyryl-CoA dehydrogenase; idi, isopentenyl diphosphate isomerase; ispSopt, codon-optimized isoprene synthase; KDC, keto acid decarboxylase; kivD, keto acid decarboxylase; mvaD, diphosphomevalonate decarboxylase; mvaE, acetoacetyl-CoA thiolase; mvaS, MVA synthase; mvk, MVA kinase; oleTJE, P450 fatty acid decarboxylase; PCC7942_orf1593, fatty aldehyde decarbonylase; PCC7942_orf1594, acyl-ACP reductase; pdc, pyruvate decarboxylase; pflB, pyruvate formate-lyase; pmk, phosphomevalonate kinase; pta, phosphate acetyltransferase; rhFRED, P450 reductase domain; tesA, acyl-ACP thioesterase; thl, acetyl-CoA acetyltransferase. Sources: Atsumi et al. [41], Connor et al. [46], Bastian et al. [44], Guo et al. [47], Liao et al. [103], Kuzuyama [104], Martin et al. [105], and Kang and Nielsen [106].

transferase), and codon-optimized Clostridium beijerinckii adh (encoding alcohol dehydrogenase) genes [42]. In a fed-batch fermentation with pH control and gas stripping for product recovery, this engineered strain produced 143 g l−1 of isopropanol from glucose in 240 hours [43].

11.2.2 Keto Acid Pathway

Branched-chain alcohols generally have higher octane numbers than straight-chain alcohols of the same carbon chain length. They are also better gasoline substitutes than ethanol owing to their higher energy density and lower hygroscopicity [41]. Branched alcohols can be biosynthesized through the Ehrlich degradation pathway from the corresponding 2-keto acid precursors [41]. In this pathway, keto acid decarboxylase (KDC) decarboxylates 2-keto acids to aldehydes, which alcohol dehydrogenase (ADH) then reduces to alcohols [41]. Owing to the broad substrate spectra of KDC and ADH, various alcohols including isobutanol, 1-propanol, 1-butanol, 3-methyl-1-butanol, and 2-phenylethanol could be produced from the corresponding 2-keto acids in a nonfermentative fashion (Figure 11.1, pink highlight) [41, 44, 46, 47]. For instance, an engineered E. coli strain was constructed to convert 2-ketoisovalerate into isobutanol by overexpressing the Lactococcus lactis kivD and Saccharomyces cerevisiae adh2 genes encoding KDC and ADH, respectively. To increase the supply of the precursor 2-ketoisovalerate, several genes were overexpressed, including Bacillus subtilis alsS (encoding acetolactate synthase) and endogenous ilvC and ilvD (encoding acetohydroxy acid isomeroreductase and dihydroxy-acid dehydratase, respectively), and the pflB gene encoding pyruvate formate-lyase was deleted. As a result, the engineered E. coli strain produced 22 g l−1 of isobutanol in flask culture [41]. Using gas stripping for in situ removal of isobutanol from the fermentation broth, the engineered strain produced more than 50 g l−1 of isobutanol in 72 hours by batch fermentation [45].

11.2.3 Isoprenoid Pathway

Isoprenoid-derived fuels such as monoterpenes, sesquiterpenes, and hemiterpenes are promising next-generation jet fuels and fuel additives [30, 111]. Isoprenoids are generally biosynthesized from isopentenyl pyrophosphate (IPP) and its isomer, dimethylallyl diphosphate (DMAPP) [112]. IPP can be obtained from two different biosynthetic pathways, namely, the methylerythritol-4-phosphate (MEP) pathway and the mevalonate (MVA) pathway (Figure 11.1, blue highlight) [103–105]. The MEP pathway requires three molecules of ATP and three of NADPH, whereas the MVA pathway requires three ATPs and two NADPHs [113]. The MVA pathway has therefore been considered more efficient for producing isoprenoid compounds [114–116]. In isoprene production, for example, the MVA pathway is particularly preferred owing to the high demand for redox equivalents [113, 117, 118]. Isoprene is a volatile hydrocarbon used to produce second-order fuel molecules that supplement gasoline, jet fuel, and diesel [119]. To produce isoprene in E. coli, a single-step pathway from DMAPP was introduced by overexpressing a codon-optimized Populus alba ispS gene encoding isoprene synthase. Then, to enrich the DMAPP pool, the MVA pathway was introduced into the strain by overexpressing six heterologous genes encoding acetoacetyl-CoA thiolase (mvaE) and MVA synthase (mvaS) from Enterococcus faecalis; MVA kinase (mvk) from Methanosarcina mazei; and phosphomevalonate kinase (pmk), diphosphomevalonate decarboxylase (mvaD), and isopentenyl diphosphate isomerase (idi) from Streptococcus pneumoniae. The resultant strain produced 587 mg l−1 of isoprene from glucose in two-phase flask cultivation [48]. Dodecane, which has a high capacity to dissolve isoprene, was used as the organic layer for in situ extraction of isoprene, thereby reducing its toxicity to cells [48].

11.2.4 Fatty Acid Pathway

Fatty acids are composed of long alkyl chains, and their biosynthetic pathways can be harnessed to produce an array of advanced fuels, such as fatty acid alkyl esters (FAAEs), fatty alcohols, and alkanes. Fatty acid de novo biosynthesis usually starts with malonyl-CoA and proceeds through repeated addition of two carbon units mediated by acyl carrier proteins (ACPs) (Figure 11.1, green highlight). Various fatty acyl-ACPs can be further converted through reduction, dehydration, and esterification to fatty acid derivatives which are useful as fuels. For example, when fatty acyl-ACPs are hydrolyzed by thioesterase, free fatty acids (FFAs) are released. Then, FFAs can be esterified to esters leading to the production of FAAEs (Figure 11.1, green highlight). Two representative FAAEs are fatty acid ethyl esters (FAEEs) and fatty acid methyl esters (FAMEs), which are commonly known as biodiesel. E. coli was engineered to produce FAEEs by

355

356

11 Metabolic Engineering of Escherichia coli

overexpressing the Z. mobilis pdc and adhB (encoding pyruvate decarboxylase and alcohol dehydrogenase, respectively), Acinetobacter baylyi ADP1 atfA (encoding wax-ester synthase), and endogenous fadD and tesA (encoding acyl-CoA synthetase and thioesterase, respectively) genes. The final engineered strain produced 674 mg l−1 of FAEEs from glucose in a two-phase flask culture using dodecane as the organic phase [49]. FAEE production was further enhanced by a biosensor-regulator system that dynamically controls the expression of the biodiesel pathway genes, leading to 1.5 g l−1 of FAEEs from glucose in batch culture [50]. To produce FAMEs, the broad-specificity juvenile hormone acid O-methyltransferase from Drosophila melanogaster (encoded by DmJHAMT) was employed to methylate FFAs directly, using S-adenosyl-l-methionine (SAM) as the methyl donor. Overexpression of DmJHAMT in an engineered E. coli strain overproducing medium-chain FFAs resulted in the production of 0.56 g l−1 of FAMEs from glucose by batch culture [51]. Alkanes and alkenes, which are attractive drop-in substitutes for petroleum-derived fuels, can also be produced from fatty acyl-ACPs (Figure 11.1, green highlight) [106]. In E. coli, co-overexpression of the cyanobacterial Synechococcus elongatus PCC7942 orf1594 and orf1593 genes (encoding acyl-ACP reductase and aldehyde decarbonylase, respectively) resulted in the production of a mixture of long-chain (C13–17) alkanes and alkenes from glucose. A number of orf1593 orthologs from other cyanobacteria were also examined, and the one from Nostoc punctiforme PCC73102 gave the highest alkane titer (300 mg l−1 in flask culture) [52]. Alkenes were also produced in E. coli by direct decarboxylation of FFAs. Co-overexpression of the Jeotgalicoccus sp. ATCC 8456 oleTJE and Rhodococcus sp. rhFRED genes, which encode a P450 fatty acid decarboxylase and a P450 reductase domain, respectively, in an FFA-overproducing E. coli strain allowed production of 97.6 mg l−1 of alkenes from glucose in flask culture [53]. Furthermore, short-chain alkanes (C4–12) were produced from glucose in engineered E. coli at a titer of 580.8 mg l−1 by fed-batch fermentation. This was achieved by overexpressing the ’tesA(L109P) (encoding a mutated thioesterase that converts short-chain fatty acyl-ACPs to FFAs), fadD, C. acetobutylicum acr (encoding acyl-CoA reductase), and Arabidopsis thaliana CER1 (encoding aldehyde decarbonylase) genes [54].
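The chain-length bookkeeping behind these fatty acid-derived routes can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions that the text does not state: a saturated, even-numbered acyl chain primed by acetyl-CoA and extended by one malonyl-CoA-derived two-carbon unit per cycle, with two reduction steps per cycle; ester and fatty alcohol formation keep every acyl carbon, whereas the reductase/decarbonylase route to alkanes and P450 decarboxylation to alkenes each remove one carbon. The route labels and function names are hypothetical.

```python
# Illustrative chain-length bookkeeping for fatty acid-derived fuels.
# Assumptions (not from the source text): saturated, even-numbered acyl chains,
# one acetyl-CoA primer, one malonyl-CoA-derived two-carbon unit per elongation
# cycle, and two reduction steps per cycle. Route labels are hypothetical.

# Acyl carbons lost when a C_n precursor is converted along each route.
CARBONS_LOST = {
    "ester": 0,    # FAAEs such as FAEEs/FAMEs: the acyl chain is kept intact
    "alcohol": 0,  # fatty alcohols: the carboxyl carbon is reduced, not removed
    "alkane": 1,   # acyl-ACP reductase + aldehyde decarbonylase: aldehyde carbon removed
    "alkene": 1,   # P450 fatty acid decarboxylase: carboxyl carbon released as CO2
}


def elongation_cycles(n_carbons: int) -> int:
    """Two-carbon elongation cycles (one malonyl-CoA each) for a C_n acyl chain."""
    if n_carbons < 4 or n_carbons % 2 != 0:
        raise ValueError("expected an even chain length of at least 4")
    # The acetyl primer supplies two carbons; every cycle adds two more.
    return n_carbons // 2 - 1


def acyl_derived_carbons(n_carbons: int, route: str) -> int:
    """Carbons of the C_n acyl precursor retained in the product of a route."""
    return n_carbons - CARBONS_LOST[route]


if __name__ == "__main__":
    for n in (14, 16, 18):
        cycles = elongation_cycles(n)
        print(f"C{n}: {cycles} malonyl-CoA, {2 * cycles} reduction steps, "
              f"alkane C{acyl_derived_carbons(n, 'alkane')}")
```

For a C16 chain the script prints `C16: 7 malonyl-CoA, 14 reduction steps, alkane C15`, and C14–C18 acyl chains map to C13–C17 alkanes, in line with the chain-length range reported above for the cyanobacterial pathway. Real product distributions also depend on the chain-length specificity of the thioesterases and reductases involved.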

11.3 Metabolic Engineering of E. coli for the Production of Chemicals

Microbial metabolic engineering has played an even more important role in the sustainable production of chemicals from renewable resources. Over the past decades, a large number of chemicals, including bulk chemicals, specialty chemicals, and natural products, have been successfully produced by metabolically engineered microorganisms. This section discusses the production of these chemicals by metabolically engineered E. coli (Table 11.1).

11.3.1

Bulk Chemicals

Bulk chemicals are chemicals that are manufactured and consumed in large quantities. Representative bulk chemicals produced using metabolically engineered E. coli include ω-amino acids, hydroxy acids, diamines, dicarboxylic acids, diols, and lactams. Their amine, carboxyl, and hydroxyl functional groups make them particularly useful as building blocks for other valuable or commodity products. To date, a number of bioprocesses for these bulk chemicals have been commercialized or are being prepared for commercialization [120]. Here, advanced metabolic engineering strategies for producing ω-amino acids, hydroxy acids, diamines, dicarboxylic acids, diols, and lactams in E. coli are discussed.

11.3.1.1

ω-Amino Acids

ω-Amino acids are non-natural linear amino acids that carry a carboxyl group at one end and an amine group at the other. Typical biobased ω-amino acids include 4-aminobutyric acid (4ABA) [55], 5-aminovaleric acid (5AVA) [56], and 6-aminocaproic acid [121]. Metabolic engineering of E. coli for 4ABA and 5AVA production is discussed in detail below. 4ABA is a four-carbon ω-amino acid used in the chemical industry to manufacture butyrolactam (also known as 2-pyrrolidone) and polyamide-4 [122]. Wild-type E. coli produces 4ABA naturally but at very low efficiency; for example, the E. coli WL3110 strain (a lacI mutant of W3110) was reported to produce only 15 mg l−1 of 4ABA from glucose in flask culture [55]. To increase 4ABA production, the gadB gene (encoding glutamate decarboxylase), which is responsible for the last step of 4ABA synthesis (Figure 11.2, yellow highlight), was overexpressed in the WL3110 strain, leading to 252 mg l−1 of 4ABA [55]. However, GadB shows optimal activity at pH 4.5, which is too acidic for E. coli growth. Thus, site-directed mutagenesis was used to develop a GadB variant (E89Q, Δ452–466) that is active over a wide pH range [129]. The WL3110 strain overexpressing this engineered GadB produced 360 mg l−1 of 4ABA from glucose in flask cultivation. Interestingly, 4ABA production increased further to 1360 mg l−1 upon co-overexpression of the Clostridium propionicum act gene encoding β-alanine CoA-transferase, even though this enzyme converts 4ABA to butyrolactam. In fed-batch culture, the WL3110 strain co-overexpressing the engineered GadB and act genes produced 66.25 g l−1 of 4ABA from glucose, the highest 4ABA titer reported for metabolically engineered E. coli [55]. 5AVA is a five-carbon nonprotein ω-amino acid that can be used to make diverse valuable C5 chemicals, including glutaric acid, 5-hydroxyvaleric acid, 1,5-pentanediol, and valerolactam [130].
Microbial production of 5AVA has relied mainly on the l-lysine degradation pathway [131], which, however, does not exist in wild-type E. coli [132]. Therefore, the Pseudomonas putida KT2440 lysine degradation pathway, comprising lysine monooxygenase (encoded by davB) and 5-aminovaleramidase (encoded by davA), was introduced into E. coli to produce 5AVA (Figure 11.2, yellow highlight). Furthermore, the constitutive and inducible lysine decarboxylases (encoded by ldcC and cadA, respectively), which convert lysine to cadaverine, were inactivated to enlarge the intracellular lysine pool for increased 5AVA production. Taken together, the final engineered E. coli strain produced 0.86 g l−1 of 5AVA from glucose in flask culture [56].

Figure 11.2 Metabolic pathways for the production of representative bulk and specialty chemicals in E. coli. Metabolite abbreviations: Ac-CoA, acetyl-CoA; 6ACA, 6-aminocaproic acid; 6ACA-CoA, 6-aminocaproyl-CoA; ANT, anthranilate; 5AVA, 5-aminovaleric acid; CHA, chorismate; DAHP, 3-deoxy-D-arabino-heptulosonate 7-phosphate; DHAP, dihydroxyacetone phosphate; DHQ, 3-dehydroquinate; DHS, 3-dehydroshikimate; E4P, erythrose 4-phosphate; EPSP, 5-enolpyruvyl-shikimate 3-phosphate; FVA, formylvalerate; G3P, glyceraldehyde 3-phosphate; G6P, glucose 6-phosphate; 4HB, 4-hydroxybutyric acid; 4HBAld, 4-hydroxybutyraldehyde; 4HB-CoA, 4-hydroxybutyryl-CoA; 3HPAld, 3-hydroxypropionaldehyde; PEP, phosphoenolpyruvate; PPN, phenylpyruvate; PRE, prephenate; SHK, shikimate; S3P, shikimate 3-phosphate; Succ-CoA, succinyl-CoA. Gene symbols: aamt1, anthranilic acid methyltransferase 1; act, β-alanine CoA-transferase; adh, alcohol dehydrogenase; aksD, 3-isopropylmalate dehydratase large subunit; aksE, 3-isopropylmalate dehydratase small subunit; aksF, isopropylmalate/isohomocitrate dehydrogenase; ald, 4-hydroxybutyrate-CoA reductase; argA, N-acetylglutamate synthase; argB, N-acetylglutamate kinase; argC, N-acetylglutamate semialdehyde dehydrogenase; argD, N-acetylornithine transaminase; argE, acetylornithinase; argF, ornithine carbamoyltransferase; argG, argininosuccinate synthase; argH, argininosuccinate lyase; cadA, inducible L-lysine decarboxylase; cat2, 4-hydroxybutyrate CoA-transferase; dapA, dihydrodipicolinate synthase; dapB, dihydrodipicolinate reductase; dapC, N-succinyldiaminopimelate-aminotransferase; dapD, tetrahydrodipicolinate succinylase; dapE, N-succinyl-L-diaminopimelate desuccinylase; dapF, diaminopimelate epimerase; DAR1, glycerol 3-phosphate dehydrogenase; davA, δ-aminovaleramidase; davB, lysine 2-monooxygenase; ddh, meso-diaminopimelate dehydrogenase; dhaB, glycerol dehydratase; dmd, D-mandelate dehydrogenase; FDC1, ferulate decarboxylase; frd, fumarate reductase; fum, fumarase; gabD4, aldehyde dehydrogenase; gabD, glutarate semialdehyde dehydrogenase; gabT, 5-aminovalerate aminotransferase; gadB, L-glutamate decarboxylase; gdh, glutamate dehydrogenase; GPP2, glycerol 3-phosphate phosphatase; 4hbD, NADH-dependent 4-hydroxybutyrate dehydrogenase; hmaS, hydroxymandelate synthase; hmo, hydroxymandelate oxidase; ilvC, keto-acid reductoisomerase; ilvD, dihydroxy-acid dehydratase; ilvE, branched-chain amino acid aminotransferase; ilvIH, acetohydroxyacid synthase; kdcA, branched-chain α-ketoacid decarboxylase; ldcC, constitutive L-lysine decarboxylase; lysA, diaminopimelate decarboxylase; mdh, malate dehydrogenase; metL, homoserine dehydrogenase II; nifV, homocitrate synthase; PAL, phenylalanine ammonia lyase; patA, putrescine aminotransferase; patD, aminobutyraldehyde dehydrogenase; pheA, chorismate mutase; ppc, phosphoenolpyruvate carboxylase; pyc, pyruvate carboxylase; sdh, succinate dehydrogenase; styAB, styrene monooxygenase; sucD, CoA-dependent succinate semialdehyde dehydrogenase; thrA, homoserine dehydrogenase I; thrB, homoserine kinase; thrC, L-threonine synthase; tpi, triosephosphate isomerase; trpED, anthranilate synthase; tyrA, chorismate mutase; tyrB, tyrosine aminotransferase; vfl, pyruvate transaminase; yqhD, alcohol dehydrogenase. Sources: Chae et al. [55], Yeom et al. [123], Rajagopal et al. [124], Lütke-Eversloh and Stephanopoulos [125, 126], Juminaga et al. [127], Köllner et al. [128], and Sun et al. [68].

11.3.1.2

Hydroxy Acids

Hydroxy acids are industrially important chemicals that have a carboxyl group at one end and a hydroxyl group at the other. Two representative hydroxy acids that have been produced in E. coli are 3-hydroxypropionic acid (3HP) [57] and 4-hydroxybutyric acid (4HB) [58].

3HP is one of the most important hydroxy acids and can be used to synthesize a wide range of industrial chemicals, including acrylic acid, 1,3-propanediol, β-propiolactone, and malonic acid [133–135]. 3HP can be produced from glycerol by a two-step pathway: glycerol dehydratase (encoded by dhaB) converts glycerol to 3-hydroxypropionaldehyde (3HPAld), which aldehyde dehydrogenase (encoded by gabD4) then oxidizes to 3HP (Figure 11.2, gray highlight). To reduce acetic acid formation in E. coli, the ackA and pta genes (encoding acetate kinase and phosphate acetyltransferase, respectively) were deleted. In addition, the yqhD gene (encoding alcohol dehydrogenase) was disrupted to prevent 1,3-propanediol formation from 3HPAld. To enhance glycerol conversion to 3HP, a glycerol dehydratase regeneration factor from Klebsiella pneumoniae and a mutant of Cupriavidus necator aldehyde dehydrogenase with higher catalytic activity were introduced into E. coli. Collectively, these strategies yielded an engineered E. coli strain producing 72 g l−1 of 3HP from glycerol, the highest 3HP titer reported, with a productivity of 1.8 g l−1 h−1 in aerobic fed-batch fermentation [57]. 4HB is another important platform hydroxy acid that can be used to produce various valuable chemicals and polymers [136]. The best 4HB-overproducing E. coli strain was constructed by first removing the glucose-specific permease of the phosphotransferase system (encoded by ptsG) to prevent phosphoenolpyruvate consumption [58]. The iclR gene encoding the isocitrate lyase repressor was then deleted to deregulate the glyoxylate shunt. Because 4HB is produced from succinyl-CoA via succinate semialdehyde (Figure 11.2, gray highlight), succinate dehydrogenase (encoded by sdhAB) and succinate semialdehyde dehydrogenase (encoded by gabD or yneI) were inactivated.
Succinate dehydrogenase converts succinate to fumarate, and succinate semialdehyde dehydrogenase converts succinate semialdehyde to succinate. Pyruvic acid assimilation was additionally enhanced by overexpressing the E. coli pyruvate dehydrogenase complex (encoded by aceEF and lpd together). Lastly, the E. coli sucCD (encoding succinyl-CoA synthetase) and Clostridium kluyveri sucD and 4hbd (encoding CoA-dependent succinate semialdehyde dehydrogenase and NADH-dependent 4-hydroxybutyrate dehydrogenase, respectively) genes were overexpressed to improve the conversion of succinate to 4HB. The final engineered E. coli strain produced 103.4 g l−1 of 4HB from glycerol by microaerobic fed-batch fermentation [58].

11.3.1.3

Diamines

Diamines are valuable chemicals with diverse applications, including use as monomers for polyamides [137, 138]. Numerous metabolic engineering studies have been conducted in E. coli for the production of linear aliphatic diamines such as 1,3-diaminopropane [59], putrescine (also known as 1,4-diaminobutane) [24], and cadaverine (also known as 1,5-diaminopentane) [139] from renewable carbon sources. Here, only 1,3-diaminopropane and cadaverine are described as representative examples; readers are referred to other references, including [120], for further details. 1,3-Diaminopropane is a three-carbon diamine used as a crosslinker for epoxy resins, as well as a precursor for pharmaceuticals,
agrochemicals, organic chemicals, and polyamides. Microbial production of 1,3-diaminopropane was demonstrated in an engineered E. coli strain overexpressing the Acinetobacter baumannii dat and ddc genes, which encode 2-ketoglutarate 4-aminotransferase and l-2,4-diaminobutanoate decarboxylase, respectively. The flux toward 1,3-diaminopropane was further enhanced by overexpressing the ppc and aspC genes, relieving feedback inhibition through mutations in the thrA and lysC genes, and increasing the intracellular NADPH pool by deleting the pfkA gene. Fed-batch fermentation of the final engineered E. coli strain produced 13 g l−1 of 1,3-diaminopropane from glucose [59]. Cadaverine is a five-carbon diamine with widespread applications in agriculture, medicine, and the chemical industry, particularly as a monomer for various polymers [140]. Cadaverine can be biosynthesized directly from l-lysine by decarboxylation (Figure 11.2, dark blue highlight). To overproduce cadaverine in E. coli, the cadaverine catabolic pathway was inactivated by deleting the speE and speG genes encoding cadaverine aminopropyltransferase and spermidine acetyltransferase, respectively. The putrescine degradation pathway, encompassing putrescine aminotransferase (encoded by ygjG), γ-glutamate-putrescine ligase (encoded by puuA), and the putrescine importer (encoded by puuP), was also eliminated to prevent cadaverine degradation. Further overexpression of the cadA and dapA genes, encoding lysine decarboxylase and dihydrodipicolinate synthase, respectively, enabled the engineered E. coli strain to produce 9.61 g l−1 of cadaverine from glucose with a productivity of 0.32 g l−1 h−1 by fed-batch fermentation [139].
A genome-wide screen for knockdown targets was also conducted using synthetic small regulatory RNA technology, and downregulation of the murE gene, encoding UDP-N-acetylmuramoyl-l-alanyl-d-glutamate-2,6-diaminopimelate ligase, further increased cadaverine production to 12.6 g l−1 in fed-batch fermentation [22]. Knockdown of murE likely raised the cadaverine titer because the cell wall biosynthesis step catalyzed by this ligase consumes meso-diaminopimelate, the direct precursor of lysine, and therefore competes with cadaverine biosynthesis.

11.3.1.4

Dicarboxylic Acids

Dicarboxylic acids are industrially valuable chemicals that have a carboxyl group at each end. Representative biobased dicarboxylic acids include adipic [141], glutaric [132], sebacic [138], and succinic acids [142]. This section discusses the production of glutaric and succinic acids by metabolically engineered E. coli in detail. Glutaric acid is a five-carbon dicarboxylic acid often used to synthesize commercial plastics. Like cadaverine and 5AVA, glutaric acid can be produced through the natural l-lysine degradation pathway (Figure 11.2, orange highlight). The best glutaric acid-producing E. coli strain was developed by first modulating the expression of its native cadA, patA, patD, gabT, and gabD genes, encoding lysine decarboxylase, putrescine aminotransferase, γ-aminobutyraldehyde dehydrogenase, 4-aminobutyrate aminotransferase, and succinate semialdehyde dehydrogenase, respectively, using plasmids with different copy numbers [60]. Lysine catabolism through the successive reactions
catalyzed by these enzymes generates two molecules of glutamate as well as NAD(P)H, both of which are in turn needed for lysine biosynthesis. In addition, the 5AVA importer (encoded by gabP) and the bidirectional cadaverine transporter (encoded by potE) were introduced into E. coli to re-import extracellular 5AVA and cadaverine, respectively. Finally, the iclR gene encoding the transcriptional regulator of the glyoxylate bypass operon was deleted to enhance oxaloacetate supply. The final engineered E. coli strain produced 54.5 g l−1 of glutaric acid with a yield of 0.5 mol mol−1 glucose and a productivity of 0.76 g l−1 h−1 by fed-batch fermentation [60]. Succinic acid is a four-carbon dicarboxylic acid used to synthesize industrially important chemicals such as adipic acid, 1,4-butanediol, and tetrahydrofuran. It is also widely used as a monomer for polymers such as polybutylene succinate and polyamides [143]. To overproduce succinic acid in E. coli (Figure 11.2, orange highlight), the pflB and ldhA genes, encoding pyruvate formate-lyase and lactate dehydrogenase, respectively, were deleted to prevent byproduct formation. In addition, the glucose-specific permease of the phosphotransferase system (encoded by ptsG), which consumes phosphoenolpyruvate during glucose uptake, was inactivated to secure the phosphoenolpyruvate pool for succinic acid production [144]. Moreover, the Rhizobium etli pyc gene, encoding the anaplerotic enzyme pyruvate carboxylase, was overexpressed to enhance succinic acid production. Finally, a dual-phase fed-batch fermentation, consisting of an aerobic growth phase and an anaerobic succinic acid production phase, was employed to produce 99.2 g l−1 of succinic acid with a yield of 1.74 mol mol−1 glucose and a productivity of 1.3 g l−1 h−1 [61].

11.3.1.5

Diols

Diols are important platform chemicals that have a hydroxyl group at each end. Biobased production of diols with various carbon chain lengths, such as ethylene glycol [145], 1,3-propanediol [62], and 1,4-butanediol [63], has been successfully demonstrated in metabolically engineered E. coli. 1,3-Propanediol is an important chemical that serves as a building block for polymers, de-icers, and food additives. The most robust 1,3-propanediol-producing E. coli strain was developed by DuPont and Genencor (which later became part of DuPont) [146]. To construct this strain, the ptsH, ptsI, and crr genes encoding components of the glucose phosphotransferase system were first disrupted, and the galP and glk genes encoding galactose-proton symporter and glucokinase, respectively, were overexpressed. This shifted glucose uptake from a phosphoenolpyruvate-dependent to an ATP-dependent mechanism. Next, triosephosphate isomerase (encoded by tpiA) was removed so that carbon flux splits evenly at the fructose bisphosphate aldolase step, and glyceraldehyde 3-phosphate dehydrogenase (encoded by gapA) was downregulated to improve the flux distribution between 1,3-propanediol production and the TCA cycle. As 1,3-propanediol is produced directly from glycerol (Figure 11.2, light blue highlight), glycerol formation from glucose was also enhanced by introducing the S. cerevisiae DAR1 and GPP2 genes encoding glycerol 3-phosphate dehydrogenase and phosphatase,
respectively. Furthermore, the conversion of glycerol to 1,3-propanediol was enhanced by co-overexpression of the K. pneumoniae operon containing the dhaB1, dhaB2, and dhaB3 genes, which together encode glycerol dehydratase, and the E. coli yqhD gene, which encodes an oxidoreductase. The final engineered strain produced 135 g l−1 of 1,3-propanediol from glucose with a productivity of 3.5 g l−1 h−1 by fed-batch fermentation [62]. 1,4-Butanediol is a four-carbon diol used to synthesize spandex, plastics, elastic fibers, and films. In principle, 1,4-butanediol can be produced from either α-ketoglutarate or succinyl-CoA (Figure 11.2, light blue highlight). However, because α-ketoglutarate decarboxylase converts α-ketoglutarate to succinate semialdehyde inefficiently in E. coli, the succinyl-CoA route is preferred [147]. To build this heterologous route, the Porphyromonas gingivalis 4hbD, cat2, and ald genes (encoding 4-hydroxybutyrate dehydrogenase, 4-hydroxybutyrate-CoA transferase, and 4-hydroxybutyrate-CoA reductase, respectively) and the C. beijerinckii adh gene (encoding alcohol dehydrogenase) were overexpressed, resulting in an engineered E. coli strain that produced 18 g l−1 of 1,4-butanediol from glucose by microaerobic fed-batch fermentation [63].

11.3.1.6

Lactams

Lactams are cyclic amides of ω-amino acids that are used as monomers for various polyamides. Fermentative production of lactams long remained a grand challenge owing to the lack of natural metabolic pathways for lactam biosynthesis. To overcome this, synthetic pathways for three lactams, namely butyrolactam [55, 148], valerolactam [55, 149], and caprolactam [55, 123, 149], have recently been established in E. coli. Among them, butyrolactam has been produced most efficiently by metabolically engineered E. coli, reaching a titer of 54.14 g l−1 from glucose by fed-batch fermentation; the engineering strategies involved have already been discussed for 4ABA (see Section 11.3.1.1). This section discusses caprolactam production in detail. Caprolactam is a six-carbon lactam widely used as the precursor of polyamide 6. Recently, a synthetic metabolic pathway to caprolactam was constructed by leveraging the promiscuous activities of C. propionicum β-alanine CoA-transferase (encoded by act) [55], Streptomyces aizunensis acyl-CoA ligase [149], and Citrobacter freundii 3-hydroxybutyrate dehydrogenase [123] on 6-aminocaproic acid (Figure 11.2, purple highlight). Furthermore, caprolactam production from renewable carbon sources (e.g. glycerol) was demonstrated in an engineered E. coli strain overexpressing the C. propionicum act gene together with six additional genes (Azotobacter vinelandii nifV, Methanococcus aeolicus aksDEF, Lactococcus lactis kdcA, and Vibrio fluvialis vfl) that encode the synthetic pathway to 6-aminocaproic acid [55]. Although the final caprolactam titer was low (80 μg l−1), this result showed the potential for renewable caprolactam production in the future.

11.3.2

Specialty Chemicals

Compared with the bulk chemicals described above, specialty chemicals are generally synthesized and consumed on a smaller scale but at higher
prices. Specialty chemicals often serve as additives in food and cosmetic products or as precursors for more valuable pharmaceutical end products. Two typical categories of specialty chemicals widely produced by microbial fermentation are proteinogenic l-amino acids and aromatic compounds.

11.3.2.1

L-Amino Acids

Natural proteinogenic l-amino acids are widely used in the food, animal feed, cosmetic, and pharmaceutical industries [150]. Because these amino acids are universal metabolites of microbial primary metabolism, microbial production of amino acids such as l-aspartic acid [151], l-alanine [152], l-arginine [64], l-lysine [153], l-methionine [154], l-threonine [155], and l-valine [65] has been investigated extensively, especially using metabolically engineered E. coli. It should be noted that Corynebacterium strains can produce some amino acids (e.g. l-arginine, l-glutamate, and l-lysine) at much higher titers and productivities. Here, the production of representative l-amino acids by engineered E. coli is discussed, focusing on the metabolic engineering strategies employed. l-Arginine is a conditionally essential amino acid with medicinal and pharmaceutical applications [156] and is also the biological precursor of nitric oxide [157]. To develop an l-arginine-overproducing E. coli strain, the l-arginine-responsive repressor (encoded by argR) was removed to allow efficient transcription of the l-arginine biosynthesis genes, and ornithine decarboxylase (encoded by speC) and the l-arginine decarboxylases (encoded by speA and adiA) were inactivated to prevent ornithine and l-arginine degradation. Next, N-acetylglutamate synthase (encoded by argA), which catalyzes the first committed step of l-arginine biosynthesis (Figure 11.2, green highlight), was replaced with a feedback-resistant variant [124]. Moreover, an l-arginine exporter (encoded by yggA) was co-overexpressed with a mutant of its transcriptional regulator (encoded by argP(V216A)) to increase l-arginine secretion [158]. The final engineered E. coli strain produced 11.64 g l−1 of l-arginine from glucose by batch fermentation [64]. l-Threonine is an essential amino acid widely used as a nutritional supplement and as a precursor of flavoring agents in animal feeds and human foods [159].
To overproduce l-threonine in E. coli, the feedback inhibition of aspartokinases I and III (encoded by thrA and lysC, respectively) and the transcriptional attenuation mediated by the l-threonine operon leader peptide were eliminated. Moreover, l-threonine-consuming pathways were blocked by inactivating l-threonine 3-dehydrogenase (encoded by tdh) and mutating l-threonine dehydratase (encoded by ilvA) (Figure 11.2, green highlight). Next, the flux toward l-threonine synthesis was further enhanced by removing other competing pathways, including the l-methionine and l-lysine synthesis pathways. Finally, transcriptome profiling in conjunction with genome-scale in silico flux response analysis was applied to identify gene amplification targets linked to enhanced l-threonine production. As a result, the rhtABC genes (encoding threonine exporters) and the acs gene (encoding acetyl-CoA synthetase) were overexpressed in the final engineered E. coli strain to enhance l-threonine export and reduce acetate formation, respectively.
This engineered E. coli strain produced 82.4 g l−1 of l-threonine with a yield of 0.39 g g−1 glucose in fed-batch fermentation [26]. l-Valine is a branched-chain amino acid in high demand for animal feed additives, antibiotics, cosmetics, and pharmaceuticals [150]. The highest l-valine production has been achieved in an engineered E. coli W strain, which forms less acetate and tolerates l-valine better than the W3110 strain [65]. In this engineered strain, the ilvA gene encoding l-threonine dehydratase was deleted to enhance pyruvate availability, and the lacI repressor gene was removed to enable constitutive gene expression. Next, feedback inhibition of acetohydroxy acid synthase I (AHAS I, encoded by ilvB and ilvN together) was relieved by introducing a feedback-resistant mutant [160]. The l-valine biosynthetic pathway (Figure 11.2, green highlight) was further strengthened by overexpressing the ilvCED genes encoding acetohydroxy acid isomeroreductase, dihydroxy acid dehydratase, and transaminase, respectively. Finally, the lrp gene and ygaZH operon (encoding the leucine-responsive regulatory protein and the l-valine exporter, respectively) were overexpressed to enhance AHAS III expression and to export l-valine from the cell, respectively. The resulting engineered strain produced 60.7 g l−1 of l-valine from glucose with a productivity of 2.06 g l−1 h−1 by fed-batch fermentation [65].

11.3.2.2

Specialty Aromatics

Aromatic compounds constitute a diverse group of molecules with a myriad of applications in making polymers, fibers, esters, nutraceuticals, and pharmaceuticals [161]. Over the past decades, considerable progress has been made in the microbial production of value-added aromatics from renewable carbon sources [162–164]. Aromatic biochemicals are generally derived from the shikimate pathway or from the biosynthetic pathways of the three aromatic amino acids, l-tyrosine, l-phenylalanine, and l-tryptophan [165, 166]. Here, metabolic engineering of E. coli for the production of exemplary aromatic specialty chemicals is discussed, with examples spanning all three aromatic amino acid biosynthesis pathways. l-Tyrosine has been an attractive target for microbial fermentation owing to its widespread applications in the food, pharmaceutical, and chemical industries [167, 168]. In E. coli, l-tyrosine, like other amino acids, is subject to tight and complex regulation. Elucidation of these regulatory mechanisms, ranging from the transcriptional and translational to the metabolite level, has allowed various targeted deregulation approaches to be implemented for developing l-tyrosine overproducers [12, 169]. For instance, the aroGfbr and tyrAfbr genes, encoding feedback-resistant versions of 3-deoxy-d-arabino-heptulosonate 7-phosphate synthase and the bifunctional chorismate mutase/prephenate dehydrogenase, respectively (two key enzymes in l-tyrosine biosynthesis), were overexpressed for enhanced l-tyrosine production. The individual and/or combinatorial overexpression of genes involved in other rate-limiting steps, including aroB, aroK, and aroL in the shikimate pathway and ppsA and tktA in central metabolism (Figure 11.2, pink highlight), was also shown to increase l-tyrosine production [125–127]. Recently, an engineered E. coli strain was developed by overexpressing the aroGfbr, aroL, and Z. mobilis tyrC (functionally similar
to tyrAfbr) genes in the best combination and deleting the tyrP gene encoding the l-tyrosine transporter; this strain produced 43.14 g l−1 of l-tyrosine from glucose in fed-batch culture using a combination of exponential and DO-stat feeding strategies [66]. Alternatively, l-tyrosine overproducers have been developed using large-scale gene target screening techniques, such as global transcription machinery engineering [170] and synthetic small regulatory RNAs [22]. Methyl anthranilate is a grape flavor compound widely used in foods, soft drinks, cosmetics, and perfumes, as well as in pharmaceutical and agricultural applications [171]. There has therefore been much interest in developing purely “natural” methyl anthranilate by biological means to replace the traditional artificial grape flavor. As a proof of concept, E. coli has been successfully engineered to produce methyl anthranilate directly from glucose [67]. First, a synthetic methyl anthranilate pathway (Figure 11.2, pink highlight) was constructed in E. coli by introducing a plant SAM-dependent methyltransferase (AAMT1) that methylates anthranilic acid to methyl anthranilate in a single step [128]. Anthranilic acid, itself a useful aromatic specialty chemical [172], is a native intermediate of l-tryptophan biosynthesis, and several studies have reported its overproduction in E. coli [173, 174]. Building on this, engineering the supply of the precursor anthranilic acid improved methyl anthranilate production. Production was further increased by optimizing the AAMT1 expression level and increasing the intracellular SAM pool. Moreover, two-phase cultivation with in situ extraction using tributyrin was developed to mitigate methyl anthranilate cytotoxicity. The final engineered E. coli strain produced 4.47 g l−1 of methyl anthranilate from glucose by two-phase fed-batch fermentation [67].
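
Because AAMT1 transfers a single methyl group from SAM to each anthranilic acid molecule, the final titer sets a lower bound on how much SAM the cells must turn over, which helps explain why enlarging the intracellular SAM pool raised production. The Python sketch below assumes 1:1 stoichiometry and a molar mass of 151.16 g mol−1 for methyl anthranilate (C8H9NO2); the function name is hypothetical. Since SAM is regenerated through the methionine cycle, the result is cumulative turnover, not the intracellular SAM concentration.

```python
# Back-of-the-envelope methyl-donor demand for SAM-dependent methylation.
# Assumption (not from the source text): one SAM-derived methyl group per product
# molecule, as in anthranilate + SAM -> methyl anthranilate + S-adenosylhomocysteine.

MW_METHYL_ANTHRANILATE = 151.16  # g/mol, C8H9NO2


def methyl_donor_demand_mm(titer_g_per_l: float, product_mw: float,
                           methyl_groups_per_molecule: int = 1) -> float:
    """Minimum cumulative SAM turnover (mmol per litre of culture) for a given titer."""
    product_mol_per_l = titer_g_per_l / product_mw
    return product_mol_per_l * methyl_groups_per_molecule * 1000.0


if __name__ == "__main__":
    demand = methyl_donor_demand_mm(4.47, MW_METHYL_ANTHRANILATE)
    # The precursor pathway must supply the same molar amount of anthranilic acid.
    print(f"SAM turnover and anthranilate supply: at least {demand:.1f} mM")
```

For the reported 4.47 g l−1 titer this gives roughly 29.6 mM, and the same molar amount of anthranilic acid has to be supplied by the precursor pathway.
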
Mandelic acid is an important aromatic specialty chemical used for the synthesis of various pharmaceuticals such as cephalosporin antibiotics [175]. It is also used as a resolving agent for certain racemic alcohols and amines [176]. Engineered E. coli strains have been developed for the direct production of both stereoisomers, S-mandelic acid and R-mandelic acid (Figure 11.2, pink highlight) [68]. First, S-mandelic acid was produced in E. coli by introducing the Amycolatopsis orientalis hmaS gene encoding hydroxymandelate synthase, which can convert phenylpyruvic acid, an endogenous intermediate of the l-phenylalanine pathway, to S-mandelic acid. Further optimization of the recombinant E. coli strain by combinatorial inactivation of competing pathways resulted in the production of 1.02 g l−1 of S-mandelic acid from glucose in flask culture [68]. Building on the S-mandelic acid pathway, R-mandelic acid was also produced, at a titer of 0.88 g l−1, by further introducing the Streptomyces coelicolor hmo and Rhodotorula graminis dmd genes encoding hydroxymandelate oxidase and d-mandelate dehydrogenase, respectively [68]. In both S- and R-mandelic acid production, HmaS was identified as the bottleneck step owing to its low enzymatic activity [68]. Therefore, the discovery and use of superior HmaS isozymes from other organisms, or protein engineering of HmaS for improved catalytic efficiency, could further enhance mandelic acid production [177, 178]. S-Styrene oxide is an important aromatic compound used for the synthesis of numerous pharmaceuticals and value-added specialty chemicals, such as the

11.3 Metabolic Engineering of E. coli for the Production of Chemicals

biocides levamisole and nematocide [179]. A novel synthetic pathway has been devised for the production of S-styrene oxide from glucose in E. coli [69]. This non-natural pathway starts with l-phenylalanine and proceeds in three steps via the successive conversion of l-phenylalanine to trans-cinnamic acid catalyzed by phenylalanine ammonia lyase (PAL), then to styrene by ferulate decarboxylase (FDC1), and ultimately to S-styrene oxide by styrene monooxygenase (SMO) (Figure 11.2, pink highlight). trans-Cinnamic acid and styrene, both having commercial values, have previously been produced using engineered microbes including E. coli [180, 181], which paved the way for engineering E. coli toward S-styrene oxide production. Co-overexpression of the P. putida S12 styAB genes encoding a reportedly most efficient SMO [182] in the engineered E. coli strain that already harbored a styrene pathway comprising the S. cerevisiae FDC1 and A. thaliana PAL, successfully led to S-styrene oxide production from glucose. Through systematic engineering of the precursor l-phenylalanine production, in particular tyrA deletion and tktA overexpression, the S-styrene oxide titer was further increased to 1.32 g l−1 by flask culture, reaching the level of its toxicity limit [69]. 11.3.3

Natural Products

Natural products are chemical compounds produced in nature (often as secondary metabolites) by various organisms, including bacteria, yeasts, and plants. Owing to their distinct and valuable properties, natural products have been widely used as agricultural, nutritional, cosmetic, pharmaceutical, and industrial ingredients. As more novel natural products are discovered and characterized, demand for them continues to grow [183]. Accordingly, numerous studies have reported natural product production in various microorganisms, including E. coli, Streptomyces, and yeasts; among these, E. coli has drawn much attention as a competitive chassis owing to its fast growth and high capacity to produce common precursors such as aromatic amino acids. Natural products can be broadly classified into four main categories based on their structural features and biosynthetic pathways (Figure 11.3): terpenoids, phenylpropanoids, alkaloids, and polyketides [184]. Representative natural products of each category, and the metabolic engineering strategies applied for their biobased production in E. coli, are discussed below.

11.3.3.1 Terpenoids

Terpenoids are derivatives of terpenes, which are assembled from the precursors IPP and DMAPP supplied by either the MVA or the MEP pathway, as described earlier (see Section 11.2.3) [185, 186]. Representative terpenoids produced in E. coli include amorpha-4,11-diene and artemisinic acid, precursors of the antimalarial drug artemisinin [70], and oxygenated taxanes, key intermediates in the synthesis of the anticancer agent taxol (paclitaxel) [187]. To produce amorpha-4,11-diene in E. coli, amorphadiene synthase (ADS) was first introduced to convert farnesyl diphosphate (FPP) to amorpha-4,11-diene (Figure 11.3, gray highlight).

Figure 11.3 Metabolic pathways and engineering strategies for the production of exemplary natural products belonging to terpenoids, phenylpropanoids, alkaloids, or polyketides in E. coli. Metabolite abbreviations: Ac-CoA, acetyl-CoA; DAHP, 3-deoxy-D-arabino-heptulosonate 7-phosphate; 3,4-DHPAA, 3,4-dihydroxyphenylacetaldehyde; DMAPP, dimethylallyl diphosphate; L-DOPA, L-3,4-dihydroxyphenylalanine; DXP, 1-deoxy-D-xylulose 5-phosphate; E4P, erythrose 4-phosphate; F6P, fructose 6-phosphate; FPP, farnesyl diphosphate; GGPP, geranylgeranyl diphosphate; G3P, glyceraldehyde 3-phosphate; HMBPP, 1-hydroxy-2-methyl-2-butenyl 4-diphosphate; IPP, isopentenyl diphosphate; Mal-CoA, malonyl-CoA; MEV, mevalonate; PEP, phosphoenolpyruvate; X5P, xylulose 5-phosphate. Enzyme abbreviations: ADS, amorphadiene synthase; AMO, amorphadiene oxidase; CHI, chalcone isomerase; CHS, chalcone synthase; 4CL, 4-coumaroyl-CoA ligase; CNMT, coclaurine N-methyltransferase; CPR, cytochrome P450 reductase; DEBS, 6-deoxyerythronolide B synthase; EryG, erythromycin O-methyltransferase; EryK, erythromycin C-12 hydroxylase; 4′OMT, 4′-O-methyltransferase; PCC, propionyl-CoA carboxylase; Sbm, methylmalonyl-CoA mutase; STORR, S- to R-reticuline epimerase; STS, stilbene synthase; T5αOH, taxadiene 5α-hydroxylase; TxS, taxadiene synthase; YgfG, methylmalonyl-CoA decarboxylase. Source: Based on Park et al. [184].
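
The pathway stoichiometry can be turned into a rough precursor budget for a reported titer. The sketch below assumes complete molar conversion (no losses, byproducts, or diversion to biomass) and uses the standard fact that each FPP is built from one DMAPP and two IPP; the helper names are illustrative. The same arithmetic is applied to naringenin, discussed under phenylpropanoids, whose CHS-catalyzed synthesis consumes three malonyl-CoA and one 4-coumaroyl-CoA.

```python
# Standard atomic weights (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass (g/mol) from an element -> atom-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def precursor_demand_mM(titer_g_per_l: float, formula: dict[str, int],
                        units_per_product: dict[str, int]) -> dict[str, float]:
    """Minimum precursor consumption (mM) needed to reach a product titer.

    Assumes 100% molar conversion: no losses, byproducts, or diversion of the
    precursor to biomass, so real demand is always higher.
    """
    product_mM = titer_g_per_l / molar_mass(formula) * 1000.0
    return {name: n * product_mM for name, n in units_per_product.items()}


# Amorpha-4,11-diene (C15H24) from FPP; one FPP = one DMAPP + two IPP.
amorphadiene = precursor_demand_mM(27.4, {"C": 15, "H": 24}, {"DMAPP": 1, "IPP": 2})

# Naringenin (C15H12O5): CHS condenses 3 malonyl-CoA with 1 4-coumaroyl-CoA.
naringenin = precursor_demand_mM(0.4216, {"C": 15, "H": 12, "O": 5},
                                 {"malonyl-CoA": 3, "4-coumaroyl-CoA": 1})

for name, demand in {**amorphadiene, **naringenin}.items():
    print(f"{name}: {demand:.1f} mM")
```

Under these assumptions, 27.4 g l−1 of amorpha-4,11-diene corresponds to roughly 134 mM of product and therefore about 402 mM of C5 units, while 421.6 mg l−1 of naringenin needs at least about 4.6 mM malonyl-CoA, consistent with the emphasis below on enlarging the malonyl-CoA pool.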

In addition to the native E. coli MEP pathway, introduction of the heterologous MVA pathway increased the intracellular pool of IPP and DMAPP, which allowed high-level production of amorpha-4,11-diene (27.4 g l−1) from glucose in fed-batch fermentation [70]. Amorpha-4,11-diene can be further converted to artemisinic acid by amorphadiene oxidase (AMO) (Figure 11.3, gray highlight), a cytochrome P450 enzyme (P450) that is poorly active in E. coli owing to the lack of intracellular membrane structures and the absence of P450 reductases (CPRs). Upon codon optimization and N-terminal engineering, AMO could be functionally expressed in E. coli, leading to the production of 105 mg l−1 of artemisinic acid in flask culture [71]. This titer was much lower than the highest artemisinic acid titer achieved in yeast (25 g l−1) [188], mainly because yeast is better suited than E. coli as a cell factory for the functional expression of P450s. Likewise, functional P450 expression is an obstacle to the production of taxol precursors in E. coli (Figure 11.3, gray highlight). To solve this issue, the functional expression of taxadiene 5α-hydroxylase, the key P450 enzyme that oxygenates taxadiene, was improved by optimizing its interaction with the partner reductase and modifying its N-terminal peptide, which enabled the production of 570 mg l−1 of oxygenated taxanes in E. coli by fed-batch fermentation [72]. Oxygenated taxanes were also produced from glucose using an engineered E. coli–S. cerevisiae co-culture system, in which the engineered E. coli strain produced taxadiene from glucose and the yeast oxygenated the produced taxadiene [189]. Although the oxygenated taxane titer obtained in that study was low (only 33 mg l−1), this co-culture concept has since become an increasingly popular approach for the microbial production of complex bioproducts [190, 191].

11.3.3.2 Phenylpropanoids

Phenylpropanoids are natural aromatic compounds synthesized from intermediates of the shikimate pathway [192]. p-Coumaric acid, the central intermediate of phenylpropanoid biosynthesis, is synthesized by deamination of L-tyrosine or by deamination followed by hydroxylation of L-phenylalanine, and is converted into diverse phenylpropanoids through downstream hydroxylation, methylation, or dehydrogenation [193]. One of the best-studied phenylpropanoids is resveratrol, which has diverse health benefits and therapeutic applications. Resveratrol can be produced in E. coli from p-coumaric acid by introducing two heterologous enzymes, 4-coumaroyl-CoA ligase (4CL) and stilbene synthase (STS) (Figure 11.3, green highlight). Screening STSs of various origins and optimizing the co-expression of 4CL and STS using different genetic configurations resulted in the production of 2.3 g l−1 of resveratrol from glucose in flask culture [73]. Naringenin is another well-known phenylpropanoid with useful pharmaceutical properties, which also serves as an important precursor of various flavonoids [194]. Naringenin is synthesized by chalcone synthase (CHS)-catalyzed condensation of three molecules of malonyl-CoA with one molecule of 4-coumaroyl-CoA, followed by cyclization by chalcone isomerase (CHI) (Figure 11.3, green highlight). Because malonyl-CoA, a key precursor of naringenin, is usually limited in E. coli, enhancing its intracellular availability is necessary for optimal naringenin production. To increase the malonyl-CoA pool, knockdown targets in central metabolism associated with enhanced naringenin production were screened using CRISPR interference (CRISPRi) [74]. The repression efficiencies of the selected target genes were then fine-tuned to balance cell growth and malonyl-CoA production, which eventually resulted in the production of 421.6 mg l−1 of naringenin from glucose in flask culture [74].

11.3.3.3 Alkaloids

Alkaloids are broadly defined as nitrogen-containing secondary metabolites. Many alkaloids are widely used drugs, famous examples being vinblastine, morphine, and scopolamine. Among the many subclasses of alkaloids, such as monoterpene indole alkaloids (MIAs) and benzylisoquinoline alkaloids (BIAs), only BIAs have been successfully produced in engineered E. coli. The BIA biosynthetic pathway (Figure 11.3, yellow highlight) starts from L-tyrosine, which is converted to L-3,4-dihydroxyphenylalanine (L-DOPA) by tyrosinase (TYR) and then to dopamine by L-DOPA decarboxylase (DODC). Dopamine is further converted, through a series of enzymatic steps via S-norlaudanosoline, to S-reticuline, the core intermediate of the sanguinarine pathway [195]. S-Reticuline has been produced in both engineered E. coli (46.0 mg l−1 from glycerol) [75] and yeast (80.6 μg l−1 from glucose) [196]; the substantially higher titer in E. coli was likely attributable to its efficient production of the precursor L-tyrosine. R-Reticuline, a key intermediate of the morphine pathway (Figure 11.3, yellow highlight), could also be produced in E. coli, either from S-reticuline by an epimerase (STORR) or from R,S-norlaudanosoline by coclaurine N-methyltransferase (CNMT) and 4′-O-methyltransferase (4′OMT) with simultaneous formation of S-reticuline [76]. Although the downstream BIA pathways branch into the sanguinarine and morphine pathways after reticuline (Figure 11.3, yellow highlight), only morphinan alkaloids have been produced in engineered E. coli. To produce morphinan alkaloids, the long and complex biosynthetic pathway was divided into four modules, each introduced into a separate E. coli strain, to allow stepwise (four-step) cultivation. This fermentation approach enabled the production of 2.1 mg l−1 of thebaine and 0.36 mg l−1 of hydrocodone from glycerol [76].

11.3.3.4 Polyketides

Polyketides are synthesized by a series of Claisen condensations of acyl-CoA building blocks, followed by various modification steps [197]. A family of enzymes called polyketide synthases (PKSs) is responsible for polyketide synthesis (Figure 11.3, purple highlight); PKSs can be grouped into three categories (type I, II, and III PKSs) according to the mechanism of carbon chain elongation. A type I PKS is an assembly line of enzymes in which each module carries out one step of chain elongation. In a type II PKS system, a minimal PKS comprising a ketosynthase heterodimer and an acyl carrier protein can carry out a complete cycle of carbon chain elongation. In a type III PKS system, a single enzyme forms a homodimer that elongates the carbon chain. A notable example of a polyketide produced in engineered E. coli is erythromycin (Figure 11.3, purple highlight), which is synthesized by the modular type I PKS machinery of Saccharopolyspora erythraea. To achieve complete biosynthesis of erythromycin A in E. coli, the precursor 6-deoxyerythronolide B was first produced from propionate by expressing all components of the type I PKS in active form [198]. Various tailoring enzymes were subsequently introduced, resulting in the production of 10 mg l−1 of erythromycin A from propionate in flask culture [77]. More recently, type I PKS systems have been repurposed to produce desired small molecules such as pentadecane [199], and a type III PKS system has been harnessed to construct a malonyl-CoA biosensor [23]. Nonribosomal peptides (NRPs) are another important class of natural products with various pharmaceutically active properties; they are synthesized by modular mega-enzymes called nonribosomal peptide synthetases (NRPSs). Because of the large size and complexity of NRPSs, expressing them in active form in E. coli has been challenging, and successful NRP production in E. coli has therefore remained rather limited [78, 79].
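
The assembly-line logic of a modular type I PKS can be illustrated by counting carbons. The sketch below is a deliberately simplified model, not a description of the cited work: it assumes the standard DEBS architecture for the erythromycin example above (a propionyl-CoA starter and six methylmalonyl-CoA extender units, each adding three carbons after decarboxylative Claisen condensation) and ignores ketoreduction, dehydration, cyclization, and tailoring. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcylUnit:
    name: str
    carbons_added: int  # carbons that remain in the chain after (decarboxylative) loading


PROPIONYL_COA = AcylUnit("propionyl-CoA (starter)", 3)
METHYLMALONYL_COA = AcylUnit("methylmalonyl-CoA", 3)  # 2 backbone carbons + 1 methyl branch
MALONYL_COA = AcylUnit("malonyl-CoA", 2)


def chain_lengths(starter: AcylUnit, extenders: list[AcylUnit]) -> list[int]:
    """Chain length (carbons) after loading the starter and after each elongation step."""
    lengths = [starter.carbons_added]
    for unit in extenders:
        # One Claisen condensation per step: CO2 is released, the rest joins the chain.
        lengths.append(lengths[-1] + unit.carbons_added)
    return lengths


# Simplified DEBS model: loading module plus six extension modules.
debs = chain_lengths(PROPIONYL_COA, [METHYLMALONYL_COA] * 6)
print(debs)  # [3, 6, 9, 12, 15, 18, 21]

# Type III example: chalcone synthase, C9 4-coumaroyl starter + 3 malonyl-CoA.
chalcone = chain_lengths(AcylUnit("4-coumaroyl-CoA (starter)", 9), [MALONYL_COA] * 3)
print(chalcone[-1])  # 15
```

The final value of 21 matches the C21 skeleton of 6-deoxyerythronolide B, and the same function gives the C15 skeleton of naringenin chalcone produced by the type III PKS chalcone synthase. In a type III PKS the single enzyme iterates rather than passing the chain between modules; the model only tracks carbons, so it treats both cases alike.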

11.4 Metabolic Engineering of E. coli for the Production of Materials

11.4.1 Recombinant Proteins

Since the advent of recombinant DNA technology, production of proteins in heterologous hosts has transformed both the biochemical and biopharmaceutical industries. E. coli has been one of the most commonly used hosts for recombinant protein production owing to its low cost, rapid growth, efficient expression, and ease of genetic manipulation [200]. Moreover, some inherent limitations of E. coli in protein synthesis, such as the lack of post-translational modification machinery and the formation of inclusion bodies, have been overcome through various engineering strategies, allowing the production of even more complex recombinant proteins [201]. This section discusses strategies for engineering E. coli to produce several important and hard-to-express proteins, including therapeutic proteins, membrane proteins, and protein-based materials (Figure 11.4).

11.4.1.1 Therapeutic Proteins

Protein-based biopharmaceuticals have played a key role in improving human health. Production of therapeutic proteins in recombinant microbial hosts has revolutionized the biopharmaceutical industry; in the past, only a few milligrams of protein could be extracted from large amounts of animal tissue owing to inefficient processes [202]. Monoclonal antibodies (mAbs) are a dominant class of therapeutic proteins. The majority of mAbs are currently produced in mammalian expression systems, yet these processes have several drawbacks, such as high cost and long production cycles [203]. As such, bacterial systems have attracted much attention for mAb production. In producing functional mAbs in a bacterial host, correct disulfide bond formation has remained a major challenge. To cope with this, an engineered E. coli strain with an oxidative cytoplasm was developed, which allowed the cytoplasmic production of immunoglobulin G (IgG) [80]. In another approach, global transcriptional machinery engineering was applied to improve the production of antigen-binding fragments (Fab) in E. coli. More specifically, a mutant library of the global sigma factor RpoD was generated and screened for improved Fab production by high-throughput screening, ultimately yielding a Fab titer of 130.7 mg l−1 in flask culture [81].

Glycosylated proteins are equally valuable targets among recombinant therapeutic proteins. Glycosylation, one of the most abundant post-translational modifications, is essential to many therapeutic proteins [204]. The transfer of a functional Campylobacter jejuni glycosylation machinery into E. coli opened the possibility of using bacteria as cell factories for glycoprotein production [205]. A number of glycoconjugate vaccines have been successfully developed in E. coli engineered with glycosylation capacity [206]. However, glycosylation with eukaryotic glycans has remained difficult. To resolve this, four yeast glycosyltransferases were introduced into E. coli to synthesize the eukaryotic glycan trimannosyl chitobiose, which was subsequently transferred to target proteins by the bacterial oligosaccharyltransferase from C. jejuni [82]. To further increase glycoprotein yields in E. coli, various engineering strategies were investigated, including the introduction of rationally designed orthogonal glycosylation pathways, optimization of the glycosylation pathway, and cell-free metabolic engineering [207]. In addition, metabolic engineering combined with flow cytometric cell sorting allowed the production of glycoproteins bearing a eukaryotic glycan at up to 14 mg l−1 in E. coli [83].

Figure 11.4 Overall scheme for microbial production of recombinant proteins including therapeutic, membrane, and structural proteins.

11.4.1.2 Membrane Proteins

Functional and structural elucidation of membrane proteins has contributed to advances in cell biology and drug discovery. Because natural membrane proteins are often too scarce for functional and structural analyses, their recombinant expression has been a prerequisite for studying them [208, 209]. Two different strategies have been adopted for expressing membrane proteins in E. coli. One is cytoplasmic expression of a nonfunctional, immature membrane protein followed by in vitro refolding into its functional form. For instance, the chemokine receptor CXCR is an integral membrane protein that specifically binds and responds to cytokines of the CXC chemokine family. This protein was first expressed in E. coli as inclusion bodies, which were purified and then refolded in 1,2-dimyristoyl-sn-glycero-3-phosphocholine proteoliposomes to recover its function [210]. The other strategy is to directly produce functional, membrane-anchored proteins for immediate study or for subsequent detergent extraction. For example, the yeast mitochondrial ABC transporter Atm1 was expressed in E. coli as a fusion with the OmpA signal sequence, and the cell membranes of the recombinant strain were subsequently isolated and extracted with the detergent n-dodecyl-β-D-maltoside [84, 85].

11.4.1.3 Protein-Based Materials

Proteins can also serve as (bio)materials themselves. The flexibility of protein-based materials allows the design and manufacture of novel biomaterials with diverse and complex functionalities [211]. A prerequisite for large-scale application of these materials is an efficient and stable production platform. One such example is the successful production of recombinant native-sized spider dragline silk proteins in E. coli. Spider dragline silk is well known for its extraordinary mechanical properties. However, harvesting silk proteins directly from spiders is difficult and highly inefficient because of their territorialism and cannibalism. A more practical route to mass production of spider dragline silk is to produce recombinant spider silk proteins and subsequently spin them into silk fibers. Production of recombinant spider silk proteins has been challenging, as they are large, highly repetitive, and rich in glycine and alanine [212]. To address these issues, the glycyl-tRNA pool of E. coli was enriched by increasing the availability of tRNAGly, and the glycine biosynthetic pathway was enhanced by overexpressing the glyA gene encoding serine hydroxymethyltransferase. These strategies enabled the production of spider silk proteins ranging from 54.6 to 284.9 kDa at titers of 0.5–2.7 g l−1 in fed-batch culture [86].

11.4.2 Biopolymers

Microorganisms can naturally synthesize a range of polymeric materials, such as polynucleotides, polysaccharides, polyesters, and polyamides [213]. These biopolymers combine distinctive mechanical properties with biocompatibility and biodegradability and can thus be used for many industrial and medical applications [214, 215]. This section focuses on three representative classes of microbial biopolymers, namely polyhydroxyalkanoates (PHAs), polysaccharides, and nonprotein poly(amino acid)s, and discusses the development of metabolically engineered E. coli strains for the production of these natural and non-natural biopolymers.

11.4.2.1 PHAs

Since the first discovery of poly(3-hydroxybutyrate) [poly(3HB)] in Bacillus megaterium in the early 1900s, a family of polyesters consisting of 3-, 4-, 5-, and 6-hydroxycarboxylic acids (C3–C20) has been identified in various microorganisms; these polyesters are now referred to as PHAs [216, 217]. Many microorganisms accumulate PHAs intracellularly as an energy and redox storage material when they encounter unfavorable growth conditions, such as nitrogen or phosphate limitation in the presence of excess carbon source. To date, over 160 hydroxycarboxylic acids have been identified as PHA monomers [216, 218]. The variety of monomer types and compositions gives PHAs a wide range of material properties, from thermoplastics to elastomeric, rubber-like polymers. PHAs can be used for various everyday plastic products such as cups, containers, food packaging, and bags. In addition, PHAs are biodegradable and biocompatible and have thus been extensively studied as medical materials such as absorbable sutures and drug delivery carriers [219, 220]. To contribute to a sustainable plastics industry and to reduce pollution from carelessly discarded conventional nondegradable plastics, continuous effort has been devoted to developing industrial strains for PHA production [221–223]. Although E. coli is not a native PHA producer, it has been the most commonly employed host for PHA production owing to the advantages described earlier. E. coli can utilize a wide spectrum of carbon sources and accumulate large amounts of PHAs (e.g. poly(3HB)) with high productivity [224, 225]. In addition, E. coli cells that have accumulated large amounts of PHA are more fragile than cells of the native producer C. necator, which allows simpler, low-cost PHA purification [224]. Metabolic engineering of E. coli has mainly targeted two types of PHAs: short-chain-length PHAs (SCL-PHAs), consisting of C3–C5 monomers, and medium-chain-length PHAs (MCL-PHAs), consisting of C6–C14 monomers. In addition, E. coli has been successfully engineered to produce several non-natural polyesters such as poly(lactate) (PLA) by equipping it with engineered PHA synthases capable of polymerizing the corresponding non-natural monomers.

The SCL-PHAs include poly(3HB), poly(3HB-co-3-hydroxyvalerate) [poly(3HB-co-3HV)], poly(3-hydroxypropionate) [poly(3HP)], and poly(4-hydroxybutyrate) [poly(4HB)]. As a representative example, poly(3HB) is synthesized from acetyl-CoA through a simple three-step pathway (Figure 11.5, gray highlight). Specifically, β-ketothiolase (encoded by phaA) condenses two acetyl-CoA molecules into acetoacetyl-CoA, which is then reduced to 3-hydroxybutyryl-CoA by acetoacetyl-CoA reductase (encoded by phaB). Finally, 3-hydroxybutyryl-CoA is polymerized into poly(3HB) by PHA synthase (encoded by phaC) [227]. To synthesize poly(3HB) in E. coli, the phaCAB operon from various microorganisms such as C.
necator, Aeromonas hydrophila, or Alcaligenes latus was introduced into E. coli, and its expression level was further optimized to enhance poly(3HB) production [87, 228, 229]. For example, E. coli was engineered to constitutively overexpress the A. latus phaCAB operon from a plasmid harboring the parB (hok/sok) locus for plasmid stabilization [87]. In pH-stat fed-batch culture, the resulting E. coli strain produced 141.6 g l−1 of poly(3HB) with a productivity of 4.63 g l−1 h−1, comparable to the highest values obtained with native producers [87]. Other SCL-PHAs such as poly(3HP) and poly(4HB) have also been produced in recombinant E. coli by constructing synthetic pathways for them [88, 89, 230–232]. For poly(3HP), several synthetic pathways have been developed (Figure 11.5, gray highlight), such as the C3 substrate (glycerol or 1,3-propanediol)-based pathways and the malonyl-CoA pathway [231]. A poly(4HB) biosynthetic pathway starting from succinyl-CoA (Figure 11.5, gray highlight) was also constructed by introducing the C. kluyveri sucD, 4hbD, and cat2 genes together with the C. necator phaC gene [89].

Figure 11.5 Metabolic pathways and engineering strategies for the production of representative PHAs and non-natural polyesters in E. coli. Metabolite abbreviations: Ac-CoA, acetyl-CoA; AcAc-CoA, acetoacetyl-CoA; DAHP, 3-deoxy-D-arabino-heptulosonate 7-phosphate; Chor, chorismate; GA, glycolate; G6P, glucose 6-phosphate; 2HA, 2-hydroxyalkanoate; 3HA-ACP, 3-hydroxyacyl-ACP; 3HA-CoA, 3-hydroxyacyl-CoA; 3HB, 3-hydroxybutyrate; 4HB, 4-hydroxybutyrate; 3HP, 3-hydroxypropionate; 3HPAld, 3-hydroxypropionaldehyde; LA, lactate; Mal-ACP, malonyl-ACP; Mal-CoA, malonyl-CoA; PEP, phosphoenolpyruvate; PhAla, phenylalanine; P(3HB), poly(3-hydroxybutyrate); P(4HB), poly(4-hydroxybutyrate); PhLA, phenyllactate; PhPy, phenylpyruvate; P(3HP), poly(3-hydroxypropionate); PLGA, poly(lactate-co-glycolate); P(PhLA-co-3HB), poly(phenyllactate-co-3-hydroxybutyrate); PreP, prephenate; Suc-CoA, succinyl-CoA; Succ-SA, succinate semialdehyde. Gene symbols: acc, acetyl-CoA carboxylase; aldD, aldehyde dehydrogenase; aroGfbr, feedback-inhibition-resistant mutant of DAHP synthase; aspC, aspartate aminotransferase; cat2, 4-hydroxybutyrate-CoA transferase; dhaB, glycerol dehydratase; dhaT, 1,3-propanediol dehydrogenase; fabG, 3-ketoacyl-ACP reductase; fadB, 3-hydroxyacyl-CoA dehydrogenase; fldH, phenyllactate dehydrogenase; hadA, isocaprenoyl-CoA:2-hydroxyisocaproate CoA-transferase; 4hbD, 4-hydroxybutyrate dehydrogenase; mcr, malonyl-CoA reductase; pcs′, propionyl-CoA synthetase; pct540, Clostridium propionicum propionyl-CoA transferase mutant (Val193Ala and four silent nucleotide mutations of T78C, T669C, A1125G, and T1158C); pduP, propionaldehyde dehydrogenase; phaA, β-ketothiolase; phaB, acetoacetyl-CoA reductase; phaC, PHA synthase; phaC1437, Pseudomonas sp. MBEL 6-19 PHA synthase mutant (Glu130Asp, Ser325Thr, Ser477Gly, Gln481Lys); phaJ, enoyl-CoA hydratase; pheAfbr, feedback-inhibition-resistant mutant of chorismate mutase/prephenate dehydratase; prpE, propionyl-CoA synthetase; sucD, succinate semialdehyde dehydrogenase; tyrB, tyrosine aminotransferase; xylBC, xylose dehydrogenase and xylonolactonase. Sources: Langenbach et al. [90], Park and Lee [226], and Taguchi et al. [91].

The MCL-PHAs generally comprise C6–C14 3-hydroxycarboxylic acid monomers such as 3-hydroxyhexanoate (3HHx), 3-hydroxyoctanoate (3HO), 3-hydroxydecanoate (3HD), and 3-hydroxydodecanoate (3HDD), which can be generated through both the fatty acid de novo biosynthesis and β-oxidation (fatty acid degradation) pathways (Figure 11.5). Among the four classes of PhaCs, only class II PhaCs prefer MCL over SCL monomers as substrates. Therefore, various class II phaC genes have been expressed in E. coli and examined for MCL-PHA production [90, 91, 226]. Along with PhaC, E. coli has been engineered to generate MCL-hydroxyacyl-CoA monomers efficiently by engineering the β-oxidation and/or fatty acid de novo biosynthesis pathways. The first MCL-PHA-producing E. coli was developed by overexpressing the Pseudomonas aeruginosa phaC1 gene in a fadB mutant E. coli, in which the β-oxidation pathway was blocked so that MCL-hydroxyacyl-CoAs accumulated from the fatty acid substrate [90]. This strain produced, from decanoate, MCL-PHA containing 2.5 mol% 3HHx, 20 mol% 3HO, 72.5 mol% 3HD, and 5 mol% 3HDD at a polymer content of 23 wt% of dry cell weight (DCW). Several engineering strategies targeting the β-oxidation and fatty acid de novo biosynthesis pathways have since been successfully adopted for producing MCL-PHAs in E. coli. These include deletion of fadB (encoding 3-hydroxyacyl-CoA dehydrogenase) or fadA (encoding 3-ketoacyl-CoA thiolase) to inhibit β-oxidation, and overexpression of fabG (encoding 3-ketoacyl-ACP reductase) or phaJ (encoding enoyl-CoA hydratase) to promote MCL-hydroxyacyl-CoA synthesis (Figure 11.5) [90, 91, 226]. Moreover, several E. coli enzymes, including an enoyl-CoA hydratase (encoded by maoC) and FadB homologs (encoded by paaG, paaF, and ydbU), have been newly identified as engineering targets for improving MCL-PHA production [226, 233].

SCL-MCL-PHAs, which contain both SCL and MCL monomers, can also be produced using particular PhaC enzymes that accept both SCL and MCL monomers as substrates, such as Pseudomonas sp. 61-3 PhaC1 and PhaC2 and Aeromonas caviae PhaC [92]. These PHAs, such as poly(3HB-co-3HHx) and poly(3HB-co-MCL-3-hydroxy acid), have material properties suited to broader applications than those of SCL-PHAs and MCL-PHAs [93, 234].

Apart from the natural PHAs described above, polyesters containing 2-hydroxy acid moieties, such as PLA and poly(lactate-co-glycolate) (PLGA), are well-known biodegradable plastics with various industrial and medical applications [235, 236]. Although a wide range of hydroxy acids have been identified as PHA monomers, 2-hydroxy acids, including lactate and glycolate, have never been found in PHAs produced by native microbes because the key enzyme PhaC shows negligible activity toward them [237–239]. Thus, to develop E. coli strains producing such non-natural polyesters, PhaC was first engineered to have enhanced activity toward lactyl-CoA. The class II PhaCs from Pseudomonas sp. MBEL 6-19 and Pseudomonas sp. 61-3 were subjected to site-directed mutagenesis at amino acid residues previously reported to influence substrate specificity and activity [94, 95, 239–241]. PhaC activity was assayed in vivo by overexpressing each phaC variant together with the pct gene encoding propionyl-CoA transferase (Pct), which converts lactate to lactyl-CoA, the substrate of PhaC. Pcts from C. propionicum and Megasphaera elsdenii were both successfully employed to generate lactyl-CoA [94, 95, 239–241]. PhaC variants harboring mutations at Glu130, Ser325, Ser477, and Gln481 showed significantly enhanced activity toward lactyl-CoA [94, 95, 239–241]. In addition, through random mutagenesis of the C.
propionicum pct gene, two superior mutants, Pct532 (Ala243Thr and a single silent nucleotide mutation, A1200G) and Pct540 (Val193Ala and four silent nucleotide mutations, T78C, T669C, A1125G, and T1158C), were obtained [240]. By overexpressing Pct532 and a Pseudomonas sp. MBEL 6-19 PhaC mutant (PhaC1400; Glu130Asp, Ser325Thr, Ser477Arg, and Gln481Met), and by further engineering E. coli to concentrate flux toward PLA biosynthesis through inactivation of byproduct-forming pathways and upregulation of the lactate formation pathway, homo-PLA was produced at polymer contents of up to 11 wt% of DCW [94]. The engineered PhaCs developed for polymerizing lactyl-CoA also expanded the monomer spectrum to diverse 2-hydroxy acids beyond lactate. The various CoA transferases and PhaCs used for polymerizing 2-hydroxy acid monomers have been thoroughly reviewed elsewhere [242]. To date, E. coli strains equipped with engineered PhaCs have produced a variety of non-natural polyesters containing not only aliphatic 2-hydroxy acid monomers, such as glycolate, 2-hydroxybutyrate, 2-hydroxyisovalerate, 2-hydroxyisocaproate, and 2-hydroxy-3-methylvalerate [96, 97, 243–246], but also aromatic ones such as mandelic acid and phenyllactate (PhLA) [98]. Two successful examples involving intensive metabolic engineering are the E. coli strains developed to produce PLGA and poly(PhLA-co-3HB) from renewable biomass [96, 98].
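
The enzyme variants in this section are described with the common shorthand of wild-type residue, position, and substituted residue (three-letter amino acid codes), with silent nucleotide changes written as base, position, base. As a hypothetical helper for keeping track of such variants, the sketch below parses these strings with a regular expression and compares two PhaC variants. It assumes exactly the formatting used here and is not an implementation of a full nomenclature standard such as HGVS; all names are illustrative.

```python
import re

# Substitutions written as in the text: three-letter amino acid codes
# ("Glu130Asp") or one-letter bases for silent nucleotide changes ("T78C").
SUBSTITUTION = re.compile(r"\b([A-Z][a-z]{2}|[ACGT])(\d+)([A-Z][a-z]{2}|[ACGT])\b")


def parse_variant(spec: str) -> tuple[dict[int, tuple[str, str]], list[tuple[int, str, str]]]:
    """Split a mutation list into amino acid and nucleotide substitutions.

    Returns (aa, nt): aa maps residue position -> (wild type, mutant);
    nt lists (position, wild-type base, mutant base) in order of appearance.
    Words such as "and" or "silent" do not match the pattern and are skipped.
    """
    aa: dict[int, tuple[str, str]] = {}
    nt: list[tuple[int, str, str]] = []
    for wild, pos, mutant in SUBSTITUTION.findall(spec):
        if len(wild) != len(mutant):
            raise ValueError(f"mixed notation: {wild}{pos}{mutant}")
        if len(wild) == 3:
            aa[int(pos)] = (wild, mutant)
        else:
            nt.append((int(pos), wild, mutant))
    return aa, nt


def compare(a: dict[int, tuple[str, str]], b: dict[int, tuple[str, str]]):
    """Return positions mutated identically in a and b, and positions where they differ."""
    common = sorted(a.keys() & b.keys())
    shared = [p for p in common if a[p] == b[p]]
    differ = {p: (a[p][1], b[p][1]) for p in common if a[p] != b[p]}
    return shared, differ


phac1400, _ = parse_variant("Glu130Asp, Ser325Thr, Ser477Arg, and Gln481Met")
phac1437, _ = parse_variant("Glu130Asp, Ser325Thr, Ser477Gly, and Gln481Lys")
print(compare(phac1400, phac1437))
# ([130, 325], {477: ('Arg', 'Gly'), 481: ('Met', 'Lys')})

pct540_aa, pct540_nt = parse_variant(
    "Val193Ala, and four silent nucleotide mutations of T78C, T669C, A1125G, and T1158C"
)
print(pct540_aa, pct540_nt)
```

The comparison makes explicit that PhaC1400, used for homo-PLA, and PhaC1437, used for PLGA and poly(PhLA-co-3HB), share the Glu130Asp and Ser325Thr substitutions and differ only at positions 477 and 481.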

11.4 Metabolic Engineering of E. coli for the Production of Materials

To biosynthesize PLGA, a heterologous xylose-catabolizing pathway, the Dahms pathway, was established in E. coli by overexpressing the Caulobacter crescentus xylB and xylC genes, encoding xylose dehydrogenase and xylonolactonase, respectively. Glycolate generated from xylose through the Dahms pathway is converted to glycolyl-CoA by Pct540, which also generates lactyl-CoA from lactate. Glycolyl-CoA and lactyl-CoA are then polymerized into PLGA by an evolved Pseudomonas sp. MBEL 6-19 PhaC (PhaC1437; Glu130Asp, Ser325Thr, Ser477Gly, and Gln481Lys) (Figure 11.5, green highlight). This E. coli strain was further engineered to concentrate metabolic flux toward lactate and glycolate by knocking out several chromosomal genes, including adhE, poxB, frdB, dld, aceB, and glcDEFGB, and by replacing the native ldhA promoter with the strong trc promoter [96]. Moreover, flux through the Dahms pathway was finely tuned by expressing the C. crescentus xylBC genes from synthetic promoters of different strengths. As a result, the final engineered E. coli strain produced up to 6.93 g l⁻¹ of PLGA by fed-batch fermentation [97]. For poly(PhLA-co-3HB) production, the aromatic amino acid pathway was engineered to overproduce PhLA (Figure 11.5, purple highlight). This was achieved by overexpressing the feedback-resistant genes aroG^fbr (encoding 3-deoxy-D-arabino-heptulosonate-7-phosphate synthase) and pheA^fbr (encoding chorismate mutase/prephenate dehydratase) and by deleting the tyrB (encoding L-tyrosine aminotransferase) and aspC (encoding L-aspartate aminotransferase) genes, targets identified by in silico genome-scale metabolic flux analysis (Figure 11.5, yellow highlight).
Meanwhile, the conversion of PhLA into phenyllactyl-CoA was mediated by a CoA-transferase from Clostridium difficile (encoded by hadA) (Figure 11.5, green highlight), whose activity toward PhLA, as well as toward other aliphatic and aromatic hydroxy acids including 3HB, lactate, glycolate, and mandelic acid, was newly identified. Finally, PhaC1437 copolymerized phenyllactyl-CoA with 3-hydroxybutyryl-CoA, which was generated by overexpressing the C. necator phaAB genes, to form poly(PhLA-co-3HB). In fed-batch fermentation, the engineered E. coli strain produced up to 13.9 g l⁻¹ of poly(38.1 mol% PhLA-co-3HB) from glucose, with a polymer content of 55 wt% of DCW [98].

11.4.2.2 Polysaccharides

Polysaccharides are polymers made of simple sugars. Several kinds of polysaccharides are present in biological systems, such as hyaluronan, alginate, exopolysaccharide, and bacterial cellulose [247, 248]. Among them, hyaluronan [99], exopolysaccharide [249], and bacterial cellulose [100] have been successfully produced in engineered E. coli. Hyaluronan, also known as hyaluronic acid, is a polymer of repeating disaccharide units composed of glucuronic acid and N-acetyl-glucosamine. It has numerous applications in medicine, cosmetics, and specialty foods, such as skin moisturizers, osteoarthritis treatment, ophthalmic surgery, adhesion prevention after abdominal surgery, and wound healing [247]. Fermentation of Streptococcus sp. has been the preferred method for hyaluronan production [250], but E. coli has also been explored [99]. As a first step, the Streptococcus equisimilis ssehasA gene encoding hyaluronan synthase was overexpressed. To facilitate its expression in E. coli, all rare codons in the gene sequence were replaced with synonymous codons favored by E. coli. In addition, the ugd gene encoding UDP-glucose 6-dehydrogenase was co-overexpressed to enhance flux toward the hyaluronan precursor. The final engineered E. coli strain produced 190 mg l⁻¹ of hyaluronan from glucose in flask cultivation [99].

Bacterial cellulose has the same molecular formula as the well-known plant cellulose. Compared with plant cellulose, however, bacterial cellulose has excellent properties, including high mechanical stability, tensile strength, thermostability, crystallinity, purity, and biocompatibility. It can be used in biomedical products such as bone tissue scaffolds, artificial blood vessels, artificial skin, and dental implants [248], and more recently it has been exploited as a single-layer separator in a large-scale lithium rechargeable battery [251]. Komagataeibacter xylinus and Gluconacetobacter hansenii are representative native producers of bacterial cellulose, but E. coli has also been explored as a host. In one study, the bacterial cellulose biosynthetic operon (bcsABCD) from G. hansenii was introduced into E. coli C41(DE3). In addition, the G. hansenii cmcax and ccpAx genes, which enhance cellulose-synthesizing activity and localize the BcsABCD complex to the cell membrane, respectively, were overexpressed. After optimizing the inducer concentration (0.05 mM IPTG) and culture temperature (30 °C), the final engineered strain produced 31.1 mg l⁻¹ of bacterial cellulose in flask cultivation [100].

11.4.2.3 Nonprotein Poly(Amino Acids)

Representative nonprotein poly(amino acids) include poly(glutamic acid), cyanophycin, poly(lysine), poly(diaminopropionic acid), and poly(diaminobutyric acid). Most studies aimed at overproducing these biopolymers have focused on optimizing fermentation of the native producers. However, as in the hyaluronan case above, most of these natural producers are physiologically unfavorable for industrial application. Thus, several studies have switched the production host to E. coli by heterologously expressing the nonprotein poly(amino acid) biosynthetic genes. Poly(glutamic acid) has been produced at up to 40 g l⁻¹ by engineering the native producer B. subtilis [252]. However, native producers have several drawbacks, including strain instability due to the spontaneous appearance of poly(glutamic acid)-nonproducing cells, degradation of poly(glutamic acid) by an inherent degradation pathway, and formation of byproducts such as exopolysaccharides [253]. E. coli was therefore engineered to produce poly(glutamic acid) by overexpressing the B. subtilis biosynthetic genes pgsB, pgsC, and pgsA. Because the choice of host strain and promoter for expressing heterologous genes significantly affects production capability, three E. coli strains (BL21(DE3), W3110, and FMJ123) and two promoters (the trc promoter and PHCE, the native promoter of the Geobacillus toebii D-amino acid aminotransferase gene) were compared. The most effective combination, E. coli BL21(DE3) expressing the pgsBCA genes under the PHCE promoter, produced 0.48 g l⁻¹ of poly(glutamic acid) by fed-batch fermentation. To further improve performance, transcriptome profiling was performed for 2850 E. coli functional genes. The expression levels of glnA, glnG, glnK, nac, and nitrogen-regulated genes (yhdX, yhdY, yhdZ, amtB, argT, and cbl) were upregulated, indicating that the engineered cells were suffering from nitrogen starvation. Based on these results, a new fed-batch fermentation was conducted with a feeding solution supplemented with an additional nitrogen source, (NH₄)₂SO₄, which raised production to 3.7 g l⁻¹ of poly(glutamic acid) [101].

For cyanophycin, an engineered E. coli strain overexpressing the Synechocystis sp. cphA gene encoding cyanophycin synthetase was constructed. Transcription of cphA was controlled by the lambda PL promoter and the temperature-sensitive cI857 repressor, so expression could be induced by shifting the cultivation temperature from 30 to 37 °C without adding any expensive inducer, which is highly desirable for industrial fermentation. Furthermore, several E. coli strains (DH1, TOP10, DH5α, JM109, and SMH50) were tested as hosts to address plasmid instability; E. coli DH1 maintained the plasmid even after several transfers. Consequently, the E. coli DH1 strain overexpressing cphA was used to produce 1.6 g l⁻¹ of cyanophycin by batch fermentation in a 500-L fermentor [102]. Although the titers achieved with metabolically engineered E. coli strains are still lower than those of the native producers, further improvement appears possible through additional metabolic engineering and fermentation optimization.

11.4.3 Nanomaterials

Nanomaterials (NMs) such as metal NPs, metal oxide NPs, quantum dots (QDs), and graphene have received much attention owing to their diverse applications in catalysis, bioimaging, biosensing, drug delivery, and electronics. Several studies have reported successful biosynthesis of NMs using microorganisms [254, 255], and E. coli in particular has been one of the most frequently used cell factories for this purpose. To date, 32 elements of the periodic table have been used to biosynthesize the corresponding NMs in E. coli. Several types of NMs, including single-element NMs (e.g. Ag, Au, Cu, Pt, and Te) [256, 257], multi-element NMs (e.g. CdS and CdTe) [258, 259], and nanohybrid NMs (e.g. reduced graphene oxide/E. coli) [260], have been produced in wild-type E. coli. Genetically engineered E. coli strains for NM production have also been developed; they can largely be categorized by the engineering strategy applied (Figure 11.6, gray highlight): increasing metal-binding affinity, increasing metal ion reduction, and decreasing metal ion toxicity. To increase metal-binding affinity, several metal-binding proteins and peptides have been introduced into E. coli. For example, metallothionein (MT), a metal-binding protein, and phytochelatin (PC), a metal-binding peptide, bind heavy metals (e.g. Cu, Zn, and Cd); the genes encoding MT and PC synthase (PCS) were therefore overexpressed in E. coli to enable the biosynthesis of QDs such as CdS, CdSe, and EuSe [261–263]. To expand the diversity of biosynthesized NMs, recombinant E. coli strains co-expressing the P. putida MT and A. thaliana PCS genes have been developed [264, 265]. Recently, the biosynthesis of 60 different NMs involving 32 elements was demonstrated in a recombinant E. coli strain co-expressing the genes encoding MT and PCS [266]. Other metal-binding proteins, such as R. etli tyrosinase (encoded by melA) and Cupriavidus metallidurans osmotically inducible protein Y (OsmY), were also introduced into E. coli to produce NMs containing transition metals (Cu and Ni) and noble metals (Au, Ag, Pt, and Pd) [267, 268]. Next, to enhance metal ion reduction, the gshA gene encoding L-glutamate cysteine ligase, which is involved in glutathione biosynthesis, was overexpressed in E. coli to elevate the cellular glutathione content. The thiol group of glutathione can act as a reducing agent for Cd and Te ions; accordingly, the recombinant E. coli strain overexpressing gshA produced CdTe QDs with higher fluorescence intensity than wild-type E. coli [269]. Lastly, to overcome metal ion toxicity, for example during the biosynthesis of As NMs, several As resistance-related proteins, namely the E. coli Ars operon repressor protein (ArsR) and arsenic efflux pump protein (ArsB) and the Desulfovibrio alaskensis G20 arsenate reductases (ArsC2 and ArsC3), were overexpressed to allow the production of As NMs [270].

NMs biosynthesized by engineered E. coli have been used in a variety of applications, including Li-ion batteries, bioimaging, and drug delivery (Figure 11.6, yellow highlight). For instance, the reduced graphene oxide/E. coli hybrid exhibited excellent catalytic activity for the oxygen reduction reaction and was successfully applied in a Li-ion battery as anode material [271]. In another example, biogenic EuSe NPs showed great potential as bioimaging materials and drug-carrying agents in biomedical applications, owing to their strong fluorescence under excitation at 320 nm and their biocompatibility [262]. Biogenic Au NP/doxorubicin was also applied as a drug carrier to improve drug delivery in human cancer cells (e.g. HeLa and SKOV-3), as it displayed higher biocompatibility and lower toxicity than chemically synthesized Au NPs [272]. These examples suggest that biogenic NMs will be increasingly used in applications including chemical sensors, biosensors, drug delivery, and cancer therapy.

Figure 11.6 Engineering strategies for the biosynthesis of nanomaterials in E. coli and their applications. Thirty-two elements that have been used to biosynthesize nanomaterials are illustrated. Abbreviations or gene symbols: ArsB, arsenic efflux pump; ArsC2 and ArsC3, arsenate reductases; ArsR, Ars operon repressor protein; gshA, L-glutamate cysteine ligase; melA, tyrosinase; MT, metallothionein; OsmY, osmotically inducible protein Y; PCS, phytochelatin synthase.

11.5 Conclusions and Perspectives

In this chapter, we reviewed the current status of metabolic engineering of E. coli for the production of biobased fuels, chemicals, and materials from renewable resources. Various metabolic engineering tools and strategies applied to develop engineered strains with superior performance were also discussed. As showcased above, tremendous progress has been made in applying metabolic engineering of E. coli to the sustainable production of various bioproducts. Nevertheless, many products are still produced at low efficiencies and thus require further strain and/or process optimization to enhance titer, yield, and productivity before commercialization. To this end, systems metabolic engineering will play an increasingly significant role in developing high-performance cell factories.

One of the major weaknesses of E. coli as a microbial cell factory is its relatively low robustness compared with yeasts and even with other bacteria such as Corynebacterium. Thus, more effort will be devoted to enhancing its robustness. For example, tolerance to toxic products can be improved through rational approaches, including cell membrane engineering [273] and transporter engineering [274, 275], or through evolutionary strategies such as adaptive laboratory evolution [276]. In addition, as a bacterium, E. coli is often susceptible to phage attack, especially during large-scale industrial fermentation [277]. To address this challenge, it is important to understand the various bacteriophage resistance mechanisms [278], which can in turn facilitate the engineering of microbial cell factories with enhanced phage immunity and consequently higher production performance [279, 280]. Another weakness is that E. coli is not a generally recognized as safe (GRAS) strain; therefore, some bioproducts produced by E. coli, such as natural products, require further processing to remove endotoxins. E. coli also lacks certain advanced biological functions, such as protein glycosylation, as described earlier, and considerable effort is still needed to engineer such functions into it. The toolbox and methodologies of systems metabolic engineering continue to expand in scope and depth, for example through interdisciplinary approaches that integrate the emerging sciences of big data and artificial intelligence to extract new and useful information from complex biological networks, which may in turn suggest better strain optimization strategies [281]. These advances will allow more efficient biobased production of fuels, chemicals, and materials from renewable resources by all the microorganisms described in this book. Even so, E. coli will remain our favorite model organism and microbial production host for an increasing number of biofuels, chemicals, and materials.

Acknowledgment

This work was supported by the Technology Development Program to Solve Climate Changes on Systems Metabolic Engineering for Biorefineries (NRF-2012M1A2A2026556 and NRF-2012M1A2A2026557) from the Ministry of Science and through the National Research Foundation of Korea.

References

1 Chae, T.U., Choi, S.Y., Kim, J.W. et al. (2017). Recent advances in systems metabolic engineering tools and strategies. Curr. Opin. Biotechnol. 47: 67–82.
2 Lee, S.Y. and Kim, H.U. (2015). Systems strategies for developing industrial microbial strains. Nat. Biotechnol. 33 (10): 1061–1072.
3 Choi, K.R., Jang, W.D., Yang, D. et al. (2019). Systems metabolic engineering strategies: integrating systems and synthetic biology with metabolic engineering. Trends Biotechnol. 37 (8): 817–837.


4 Cho, C., Choi, S.Y., Luo, Z.W., and Lee, S.Y. (2015). Recent advances in microbial production of fuels and chemicals using tools and strategies of systems metabolic engineering. Biotechnol. Adv. 33 (7): 1455–1466.
5 Lee, J.W., Na, D., Park, J.M. et al. (2012). Systems metabolic engineering of microorganisms for natural and non-natural chemicals. Nat. Chem. Biol. 8 (6): 536–546.
6 Lee, S.Y., Kim, H.U., Chae, T.U. et al. (2019). A comprehensive metabolic map for production of bio-based chemicals. Nat. Catal. 2 (1): 18–33.
7 Yang, D., Cho, J.S., Choi, K.R. et al. (2017). Systems metabolic engineering as an enabling technology in accomplishing sustainable development goals. Microb. Biotechnol. 10 (5): 1254–1258.
8 Pontrelli, S., Chiu, T.-Y., Lan, E.I. et al. (2018). Escherichia coli as a host for metabolic engineering. Metab. Eng. 50: 16–46.
9 Choi, K., Shin, J., Cho, J. et al. (2016). Systems metabolic engineering of Escherichia coli. EcoSal Plus 7: 1–56.
10 Meyer, H.-P. and Schmidhalter, D.R. (2012). Microbial expression systems and manufacturing from a market and economic perspective. In: Innovations in Biotechnology (ed. E.C. Agbo), 211–250. Rijeka: IntechOpen.
11 Tong, I.T., Liao, H.H., and Cameron, D.C. (1991). 1,3-Propanediol production by Escherichia coli expressing genes from the Klebsiella pneumoniae dha regulon. Appl. Environ. Microbiol. 57 (12): 3541–3546.
12 Bongaerts, J., Krämer, M., Müller, U. et al. (2001). Metabolic engineering for microbial production of aromatic amino acids and derived compounds. Metab. Eng. 3 (4): 289–300.
13 Park, J.H. and Lee, S.Y. (2008). Towards systems metabolic engineering of microorganisms for amino acid production. Curr. Opin. Biotechnol. 19 (5): 454–460.
14 Weber, E., Engler, C., Gruetzner, R. et al. (2011). A modular cloning system for standardized assembly of multigene constructs. PLoS One 6 (2): e16765.
15 Storch, M., Casini, A., Mackrow, B. et al. (2015). BASIC: a new biopart assembly standard for idempotent cloning provides accurate, single-tier DNA assembly for synthetic biology. ACS Synth. Biol. 4 (7): 781–787.
16 Hochrein, L., Machens, F., Gremmels, J. et al. (2017). AssemblX: a user-friendly toolkit for rapid and reliable multi-gene assemblies. Nucleic Acids Res. 45 (10): e80.
17 Wang, H.H., Isaacs, F.J., Carr, P.A. et al. (2009). Programming cells by multiplex genome engineering and accelerated evolution. Nature 460 (7257): 894–898.
18 Mougiakos, I., Bosma, E.F., Ganguly, J. et al. (2018). Hijacking CRISPR-Cas for high-throughput bacterial metabolic engineering: advances and prospects. Curr. Opin. Biotechnol. 50: 146–157.
19 Jiang, W., Bikard, D., Cox, D. et al. (2013). RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31 (3): 233–239.
20 Kim, S.K., Seong, W., Han, G.H. et al. (2017). CRISPR interference-guided multiplex repression of endogenous competing pathway genes for redirecting metabolic flux in Escherichia coli. Microb. Cell Factories 16 (1): 188.


21 Bikard, D., Jiang, W., Samai, P. et al. (2013). Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Res. 41 (15): 7429–7437.
22 Na, D., Yoo, S.M., Chung, H. et al. (2013). Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nat. Biotechnol. 31 (2): 170–174.
23 Yang, D., Kim, W.J., Yoo, S.M. et al. (2018). Repurposing type III polyketide synthase as a malonyl-CoA biosensor for metabolic engineering in bacteria. Proc. Natl. Acad. Sci. U. S. A. 115 (40): 9835–9844.
24 Noh, M., Yoo, S.M., Kim, W.J., and Lee, S.Y. (2017). Gene expression knockdown by modulating synthetic small RNA expression in Escherichia coli. Cell Syst. 5 (4): 418–426.
25 Park, J.H., Lee, K.H., Kim, T.Y., and Lee, S.Y. (2007). Metabolic engineering of Escherichia coli for the production of L-valine based on transcriptome analysis and in silico gene knockout simulation. Proc. Natl. Acad. Sci. U. S. A. 104 (19): 7797–7802.
26 Lee, K.H., Park, J.H., Kim, T.Y. et al. (2007). Systems metabolic engineering of Escherichia coli for L-threonine production. Mol. Syst. Biol. 3 (1): 149.
27 Kerr, R.A. (2007). Global warming is changing the world. Science 316 (5822): 188–190.
28 Perera, F. (2018). Pollution from fossil-fuel combustion is the leading environmental threat to global pediatric health and equity: solutions exist. Int. J. Environ. Res. Public Health 15 (1): 16.
29 Mondal, M., Goswami, S., Ghosh, A. et al. (2017). Production of biodiesel from microalgae through biological carbon capture: a review. 3 Biotech 7 (2): 99.
30 Lee, S.K., Chou, H., Ham, T.S. et al. (2008). Metabolic engineering of microorganisms for biofuels production: from bugs to synthetic biology to fuels. Curr. Opin. Biotechnol. 19 (6): 556–563.
31 Srivastava, N., Srivastava, M., Gupta, V.K. et al. (2018). Recent development on sustainable biodiesel production using sewage sludge. 3 Biotech 8 (5): 245.
32 Jones, D.T. and Woods, D.R. (1986). Acetone-butanol fermentation revisited. Microbiol. Rev. 50 (4): 484–524.
33 Kim, J.H., Ryu, J., Huh, I.Y. et al. (2014). Ethanol production from galactose by a newly isolated Saccharomyces cerevisiae KL17. Bioprocess Biosyst. Eng. 37 (9): 1871–1878.
34 You, Y., Liu, S., Wu, B. et al. (2017). Bio-ethanol production by Zymomonas mobilis using pretreated dairy manure as a carbon and nitrogen source. RSC Adv. 7 (7): 3768–3779.
35 Ingram, L.O., Conway, T., Clark, D.P. et al. (1987). Genetic engineering of ethanol production in Escherichia coli. Appl. Environ. Microbiol. 53 (10): 2420–2425.
36 Ohta, K., Beall, D.S., Mejia, J.P. et al. (1991). Genetic improvement of Escherichia coli for ethanol production: chromosomal integration of Zymomonas mobilis genes encoding pyruvate decarboxylase and alcohol dehydrogenase II. Appl. Environ. Microbiol. 57 (4): 893–900.


37 Yomano, L.P., York, S.W., and Ingram, L.O. (1998). Isolation and characterization of ethanol-tolerant mutants of Escherichia coli KO11 for fuel ethanol production. J. Ind. Microbiol. Biotechnol. 20 (2): 132–138.
38 Atsumi, S., Cann, A.F., Connor, M.R. et al. (2008). Metabolic engineering of Escherichia coli for 1-butanol production. Metab. Eng. 10 (6): 305–311.
39 Inui, M., Suda, M., Kimura, S. et al. (2008). Expression of Clostridium acetobutylicum butanol synthetic genes in Escherichia coli. Appl. Microbiol. Biotechnol. 77 (6): 1305–1316.
40 Shen, C.R., Lan, E.I., Dekishima, Y. et al. (2011). Driving forces enable high-titer anaerobic 1-butanol synthesis in Escherichia coli. Appl. Environ. Microbiol. 77 (9): 2905–2915.
41 Atsumi, S., Hanai, T., and Liao, J.C. (2008). Non-fermentative pathways for synthesis of branched-chain higher alcohols as biofuels. Nature 451 (7174): 86–89.
42 Hanai, T., Atsumi, S., and Liao, J.C. (2007). Engineered synthetic pathway for isopropanol production in Escherichia coli. Appl. Environ. Microbiol. 73 (24): 7814–7818.
43 Inokuma, K., Liao, J.C., Okamoto, M., and Hanai, T. (2010). Improvement of isopropanol production by metabolically engineered Escherichia coli using gas stripping. J. Biosci. Bioeng. 110 (6): 696–701.
44 Bastian, S., Liu, X., Meyerowitz, J.T. et al. (2011). Engineered ketol-acid reductoisomerase and alcohol dehydrogenase enable anaerobic 2-methylpropan-1-ol production at theoretical yield in Escherichia coli. Metab. Eng. 13 (3): 345–352.
45 Baez, A., Cho, K.-M., and Liao, J.C. (2011). High-flux isobutanol production using engineered Escherichia coli: a bioreactor study with in situ product removal. Appl. Microbiol. Biotechnol. 90 (5): 1681–1690.
46 Connor, M.R., Cann, A.F., and Liao, J.C. (2010). 3-Methyl-1-butanol production in Escherichia coli: random mutagenesis and two-phase fermentation. Appl. Microbiol. Biotechnol. 86 (4): 1155–1164.
47 Guo, D., Zhang, L., Kong, S. et al. (2018). Metabolic engineering of Escherichia coli for production of 2-phenylethanol and 2-phenylethyl acetate from glucose. J. Agric. Food Chem. 66 (23): 5886–5891.
48 Liu, C.-L., Bi, H.-R., Bai, Z. et al. (2019). Engineering and manipulation of a mevalonate pathway in Escherichia coli for isoprene production. Appl. Microbiol. Biotechnol. 103 (1): 239–250.
49 Steen, E.J., Kang, Y., Bokinsky, G. et al. (2010). Microbial production of fatty-acid-derived fuels and chemicals from plant biomass. Nature 463 (7280): 559–562.
50 Zhang, F., Carothers, J.M., and Keasling, J.D. (2012). Design of a dynamic sensor-regulator system for production of chemicals and fuels derived from fatty acids. Nat. Biotechnol. 30 (4): 354–359.
51 Sherkhanov, S., Korman, T.P., Clarke, S.G., and Bowie, J.U. (2016). Production of FAME biodiesel in Escherichia coli by direct methylation with an insect enzyme. Sci. Rep. 6: 24239.
52 Schirmer, A., Rude, M.A., Li, X. et al. (2010). Microbial biosynthesis of alkanes. Science 329 (5991): 559–562.


53 Liu, Y., Wang, C., Yan, J. et al. (2014). Hydrogen peroxide-independent production of α-alkenes by OleTJE P450 fatty acid decarboxylase. Biotechnol. Biofuels 7 (1): 28.
54 Choi, Y.J. and Lee, S.Y. (2013). Microbial production of short-chain alkanes. Nature 502 (7472): 571–574.
55 Chae, T.U., Ko, Y.-S., Hwang, K.-S., and Lee, S.Y. (2017). Metabolic engineering of Escherichia coli for the production of four-, five- and six-carbon lactams. Metab. Eng. 41: 82–91.
56 Adkins, J., Jordan, J., and Nielsen, D.R. (2013). Engineering Escherichia coli for renewable production of the 5-carbon polyamide building-blocks 5-aminovalerate and glutarate. Biotechnol. Bioeng. 110 (6): 1726–1734.
57 Chu, H.S., Kim, Y.S., Lee, C.M. et al. (2015). Metabolic engineering of 3-hydroxypropionic acid biosynthesis in Escherichia coli. Biotechnol. Bioeng. 112 (2): 356–364.
58 Choi, S., Kim, H.U., Kim, T.Y., and Lee, S.Y. (2016). Systematic engineering of TCA cycle for optimal production of a four-carbon platform chemical 4-hydroxybutyric acid in Escherichia coli. Metab. Eng. 38: 264–273.
59 Chae, T.U., Kim, W.J., Choi, S. et al. (2015). Metabolic engineering of Escherichia coli for the production of 1,3-diaminopropane, a three carbon diamine. Sci. Rep. 5: 13040.
60 Li, W., Ma, L., Shen, X. et al. (2019). Targeting metabolic driving and intermediate influx in lysine catabolism for high-level glutarate production. Nat. Commun. 10 (1): 3337.
61 Vemuri, G.N., Eiteman, M.A., and Altman, E. (2002). Succinate production in dual-phase Escherichia coli fermentations depends on the time of transition from aerobic to anaerobic conditions. J. Ind. Microbiol. Biotechnol. 28 (6): 325–332.
62 Nakamura, C.E. and Whited, G.M. (2003). Metabolic engineering for the microbial production of 1,3-propanediol. Curr. Opin. Biotechnol. 14 (5): 454–459.
63 Yim, H., Haselbeck, R., Niu, W. et al. (2011). Metabolic engineering of Escherichia coli for direct production of 1,4-butanediol. Nat. Chem. Biol. 7 (7): 445–452.
64 Ginesy, M., Belotserkovsky, J., Enman, J. et al. (2015). Metabolic engineering of Escherichia coli for enhanced arginine biosynthesis. Microb. Cell Factories 14 (1): 29.
65 Park, J.H., Jang, Y.-S., Lee, J.W., and Lee, S.Y. (2011). Escherichia coli W as a new platform strain for the enhanced production of L-valine by systems metabolic engineering. Biotechnol. Bioeng. 108 (5): 1140–1147.
66 Kim, B., Binkley, R., Kim, H.U., and Lee, S.Y. (2018). Metabolic engineering of Escherichia coli for the enhanced production of L-tyrosine. Biotechnol. Bioeng. 115 (10): 2554–2564.
67 Luo, Z.W., Cho, J.S., and Lee, S.Y. (2019). Microbial production of methyl anthranilate, a grape flavor compound. Proc. Natl. Acad. Sci. U. S. A. 116 (22): 10749–10756.


68 Sun, Z., Ning, Y., Liu, L. et al. (2011). Metabolic engineering of the L-phenylalanine pathway in Escherichia coli for the production of S- or R-mandelic acid. Microb. Cell Factories 10 (1): 71.
69 McKenna, R., Pugh, S., Thompson, B., and Nielsen, D.R. (2013). Microbial production of the aromatic building-blocks (S)-styrene oxide and (R)-1,2-phenylethanediol from renewable resources. Biotechnol. J. 8 (12): 1465–1475.
70 Tsuruta, H., Paddon, C.J., Eng, D. et al. (2009). High-level production of amorpha-4,11-diene, a precursor of the antimalarial agent artemisinin, in Escherichia coli. PLoS One 4 (2): e4489.
71 Chang, M.C.Y., Eachus, R.A., Trieu, W. et al. (2007). Engineering Escherichia coli for production of functionalized terpenoids using plant P450s. Nat. Chem. Biol. 3 (5): 274–277.
72 Biggs, B.W., Lim, C.G., Sagliani, K. et al. (2016). Overcoming heterologous protein interdependency to optimize P450-mediated Taxol precursor synthesis in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 113 (12): 3209–3214.
73 Lim, C.G., Fowler, Z.L., Hueller, T. et al. (2011). High-yield resveratrol production in engineered Escherichia coli. Appl. Environ. Microbiol. 77 (10): 3451–3460.
74 Wu, J., Du, G., Chen, J., and Zhou, J. (2015). Enhancing flavonoid production by systematically tuning the central metabolic pathways based on a CRISPR interference system in Escherichia coli. Sci. Rep. 5: 13477.
75 Nakagawa, A., Minami, H., Kim, J.-S. et al. (2011). A bacterial platform for fermentative production of plant alkaloids. Nat. Commun. 2: 326.
76 Nakagawa, A., Matsumura, E., Koyanagi, T. et al. (2016). Total biosynthesis of opiates by stepwise fermentation using engineered Escherichia coli. Nat. Commun. 7: 10390.
77 Zhang, H., Wang, Y., Wu, J. et al. (2010). Complete biosynthesis of erythromycin A and designed analogs using Escherichia coli as a heterologous host. Chem. Biol. 17 (11): 1232–1240.
78 Watanabe, K., Hotta, K., Praseuth, A.P. et al. (2006). Total biosynthesis of antitumor nonribosomal peptides in Escherichia coli. Nat. Chem. Biol. 2 (8): 423–428.
79 Pfeifer, B.A., Wang, C.C.C., Walsh, C.T., and Khosla, C. (2003). Biosynthesis of yersiniabactin, a complex polyketide-nonribosomal peptide, using Escherichia coli as a heterologous host. Appl. Environ. Microbiol. 69 (11): 6698–6702.
80 Robinson, M.-P., Ke, N., Lobstein, J. et al. (2015). Efficient expression of full-length antibodies in the cytoplasm of engineered bacteria. Nat. Commun. 6: 8072.
81 McKenna, R., Lombana, T.N., Yamada, M. et al. (2019). Engineered sigma factors increase full-length antibody expression in Escherichia coli. Metab. Eng. 52: 315–323.
82 Valderrama-Rincon, J.D., Fisher, A.C., Merritt, J.H. et al. (2012). An engineered eukaryotic protein glycosylation pathway in Escherichia coli. Nat. Chem. Biol. 8 (5): 434–436.


83 Glasscock, C.J., Yates, L.E., Jaroentomeechai, T. et al. (2018). A flow cytometric approach to engineering Escherichia coli for improved eukaryotic protein glycosylation. Metab. Eng. 47: 488–495.
84 Srinivasan, V., Pierik, A.J., and Lill, R. (2014). Crystal structures of nucleotide-free and glutathione-bound mitochondrial ABC transporter Atm1. Science 343 (6175): 1137–1140.
85 Kuhnke, G., Neumann, K., Mühlenhoff, U., and Lill, R. (2006). Stimulation of the ATPase activity of the yeast mitochondrial ABC transporter Atm1p by thiol compounds. Mol. Membr. Biol. 23 (2): 173–184.
86 Xia, X.-X., Qian, Z.-G., Ki, C.S. et al. (2010). Native-sized recombinant spider silk protein produced in metabolically engineered Escherichia coli results in a strong fiber. Proc. Natl. Acad. Sci. U. S. A. 107 (32): 14059–14063.
87 Choi, J.-I., Lee, S.Y., and Han, K. (1998). Cloning of the Alcaligenes latus polyhydroxyalkanoate biosynthesis genes and use of these genes for enhanced production of poly(3-hydroxybutyrate) in Escherichia coli. Appl. Environ. Microbiol. 64 (12): 4897–4903.
88 Gao, Y., Liu, C., Ding, Y. et al. (2014). Development of genetically stable Escherichia coli strains for poly(3-hydroxypropionate) production. PLoS One 9 (5): e97845.
89 Zhou, X.-Y., Yuan, X.-X., Shi, Z.-Y. et al. (2012). Hyperproduction of poly(4-hydroxybutyrate) from glucose by recombinant Escherichia coli. Microb. Cell Factories 11 (1): 54.
90 Langenbach, S., Rehm, B.H.A., and Steinbüchel, A. (1997). Functional expression of the PHA synthase gene phaC1 from Pseudomonas aeruginosa in Escherichia coli results in poly(3-hydroxyalkanoate) synthesis. FEMS Microbiol. Lett. 150 (2): 303–309.
91 Taguchi, K., Aoyagi, Y., Matsusaki, H. et al. (1999). Co-expression of 3-ketoacyl-ACP reductase and polyhydroxyalkanoate synthase genes induces PHA production in Escherichia coli HB101 strain. FEMS Microbiol. Lett. 176 (1): 183–190.
92 Park, S.J. and Lee, S.Y. (2004). Biosynthesis of poly(3-hydroxybutyrate-co-3-hydroxyalkanoates) by metabolically engineered Escherichia coli strains. Appl. Biochem. Biotechnol. 114 (1): 335–346.
93 Park, S.J., Ahn, W.S., Green, P.R., and Lee, S.Y. (2001). Production of poly(3-hydroxybutyrate-co-3-hydroxyhexanoate) by metabolically engineered Escherichia coli strains. Biomacromolecules 2 (1): 248–254.
94 Jung, Y.K., Kim, T.Y., Park, S.J., and Lee, S.Y. (2010). Metabolic engineering of Escherichia coli for the production of polylactic acid and its copolymers. Biotechnol. Bioeng. 105 (1): 161–171.
95 Jung, Y.K. and Lee, S.Y. (2011). Efficient production of polylactic acid and its copolymers by metabolically engineered Escherichia coli. J. Biotechnol. 151 (1): 94–101.
96 Choi, S.Y., Park, S.J., Kim, W.J. et al. (2016). One-step fermentative production of poly(lactate-co-glycolate) from carbohydrates in Escherichia coli. Nat. Biotechnol. 34 (4): 435–440.
97 Choi, S.Y., Kim, W.J., Yu, S.J. et al. (2017). Engineering the xylose-catabolizing Dahms pathway for production of poly(D-lactate-co-glycolate) and poly(D-lactate-co-glycolate-co-D-2-hydroxybutyrate) in Escherichia coli. Microb. Biotechnol. 10 (6): 1353–1364.
98 Yang, J.E., Park, S.J., Kim, W.J. et al. (2018). One-step fermentative production of aromatic polyesters from glucose by metabolically engineered Escherichia coli strains. Nat. Commun. 9 (1): 79.
99 Yu, H. and Stephanopoulos, G. (2008). Metabolic engineering of Escherichia coli for biosynthesis of hyaluronic acid. Metab. Eng. 10 (1): 24–32.
100 Buldum, G., Bismarck, A., and Mantalaris, A. (2018). Recombinant biosynthesis of bacterial cellulose in genetically modified Escherichia coli. Bioprocess Biosyst. Eng. 41 (2): 265–279.
101 Jiang, H., Shang, L., Yoon, S.H. et al. (2006). Optimal production of poly-γ-glutamic acid by metabolically engineered Escherichia coli. Biotechnol. Lett. 28 (16): 1241–1246.
102 Frey, K.M., Oppermann-Sanio, F.B., Schmidt, H., and Steinbüchel, A. (2002). Technical-scale production of cyanophycin with recombinant strains of Escherichia coli. Appl. Environ. Microbiol. 68 (7): 3377–3384.
103 Liao, J.C., Mi, L., Pontrelli, S., and Luo, S. (2016). Fuelling the future: microbial engineering for the production of sustainable biofuels. Nat. Rev. Microbiol. 14 (5): 288–304.
104 Kuzuyama, T. (2002). Mevalonate and nonmevalonate pathways for the biosynthesis of isoprene units. Biosci. Biotechnol. Biochem. 66 (8): 1619–1627.
105 Martin, V.J.J., Pitera, D.J., Withers, S.T. et al. (2003). Engineering a mevalonate pathway in Escherichia coli for production of terpenoids. Nat. Biotechnol. 21 (7): 796–802.
106 Kang, M.-K. and Nielsen, J. (2017). Biobased production of alkanes and alkenes through metabolic engineering of microorganisms. J. Ind. Microbiol. Biotechnol. 44 (4): 613–622.
107 Hsieh, W.-D., Chen, R.-H., Wu, T.-L., and Lin, T.-H. (2002). Engine performance and pollutant emission of an SI engine using ethanol–gasoline blended fuels. Atmos. Environ. 36 (3): 403–410.
108 Atsumi, S. and Liao, J.C. (2008).
Metabolic engineering for advanced biofuels production from Escherichia coli. Curr. Opin. Biotechnol. 19 (5): 414–419. Duncombe, G.R. and Frerman, F.E. (1976). Molecular and catalytic properties of the acetoacetyl-coenzyme a thiolase of Escherichia coli. Arch. Biochem. Biophys. 176 (1): 159–170. Wiesenborn, D.P., Rudolph, F.B., and Papoutsakis, E.T. (1988). Thiolase from Clostridium acetobutylicum ATCC 824 and its role in the synthesis of acids and solvents. Appl. Environ. Microbiol. 54 (11): 2717. Niu, F.-X., Lu, Q., Bu, Y.-F., and Liu, J.-Z. (2017). Metabolic engineering for the microbial production of isoprenoids: carotenoids and isoprenoid-based biofuels. Synth. Syst. Biotechnol. 2 (3): 167–175. Yang, J., Xian, M., Su, S. et al. (2012). Enhancing production of bio-isoprene using hybrid MVA pathway and isoprene synthase in Escherichia coli. PLoS One 7 (4): e33509.


113 Partow, S., Siewers, V., Daviet, L. et al. (2012). Reconstruction and evaluation of the synthetic bacterial MEP pathway in Saccharomyces cerevisiae. PLoS One 7 (12): e52498.

114 Bentley, F.K., Zurbriggen, A., and Melis, A. (2014). Heterologous expression of the mevalonic acid pathway in cyanobacteria enhances endogenous carbon partitioning to isoprene. Mol. Plant 7 (1): 71–86.

115 George, K.W., Thompson, M.G., Kang, A. et al. (2015). Metabolic engineering for the high-yield production of isoprenoid-based C5 alcohols in Escherichia coli. Sci. Rep. 5: 11128.

116 Liu, C.L., Dong, H.G., Zhan, J. et al. (2019). Multi-modular engineering for renewable production of isoprene via mevalonate pathway in Escherichia coli. J. Appl. Microbiol. 126 (4): 1128–1139.

117 Liu, C.-L., Lv, Q., and Tan, T.-W. (2015). Joint antisense RNA strategies for regulating isoprene production in Escherichia coli. RSC Adv. 5 (91): 74892–74898.

118 Li, M., Nian, R., Xian, M., and Zhang, H. (2018). Metabolic engineering for the production of isoprene and isopentenol by Escherichia coli. Appl. Microbiol. Biotechnol. 102 (18): 7725–7738.

119 Wang, S., Wang, Z., Wang, Y. et al. (2017). Production of isoprene, one of the high-density fuel precursors, from peanut hull using the high-efficient lignin-removal pretreatment method. Biotechnol. Biofuels 10 (1): 297.

120 Qian, Z.-G., Xia, X.-X., and Lee, S.Y. (2009). Metabolic engineering of Escherichia coli for the production of putrescine: a four carbon diamine. Biotechnol. Bioeng. 104 (4): 651–662.

121 Chung, H., Yang, J.E., Ha, J.Y. et al. (2015). Bio-based production of monomers and polymers by metabolically engineered microorganisms. Curr. Opin. Biotechnol. 36: 73–84.

122 Choi, J.W., Yim, S.S., Lee, S.H. et al. (2015). Enhanced production of gamma-aminobutyrate (GABA) in recombinant Corynebacterium glutamicum by expressing glutamate decarboxylase active in expanded pH range. Microb. Cell Factories 14 (1): 21.

123 Yeom, S.-J., Kim, M., Kwon, K.K. et al. (2018). A synthetic microbial biosensor for high-throughput screening of lactam biocatalysts. Nat. Commun. 9 (1): 5053.

124 Rajagopal, B.S., DePonte, J., Tuchman, M., and Malamy, M.H. (1998). Use of inducible feedback-resistant N-acetylglutamate synthetase (argA) genes for enhanced arginine biosynthesis by genetically engineered Escherichia coli K-12 strains. Appl. Environ. Microbiol. 64 (5): 1805–1811.

125 Lütke-Eversloh, T. and Stephanopoulos, G. (2007). L-tyrosine production by deregulated strains of Escherichia coli. Appl. Microbiol. Biotechnol. 75 (1): 103–110.

126 Lütke-Eversloh, T. and Stephanopoulos, G. (2008). Combinatorial pathway analysis for improved L-tyrosine production in Escherichia coli: identification of enzymatic bottlenecks by systematic gene overexpression. Metab. Eng. 10 (2): 69–77.


127 Juminaga, D., Baidoo, E.E.K., Redding-Johanson, A.M. et al. (2012). Modular engineering of L-tyrosine production in Escherichia coli. Appl. Environ. Microbiol. 78 (1): 89–98.

128 Köllner, T.G., Lenk, C., Zhao, N. et al. (2010). Herbivore-induced SABATH methyltransferases of maize that methylate anthranilic acid using S-adenosyl-L-methionine. Plant Physiol. 153 (4): 1795–1807.

129 Thu Ho, N.A., Hou, C.Y., Kim, W.H., and Kang, T.J. (2013). Expanding the active pH range of Escherichia coli glutamate decarboxylase by breaking the cooperativeness. J. Biosci. Bioeng. 115 (2): 154–158.

130 Shin, J.H., Park, S.H., Oh, Y.H. et al. (2016). Metabolic engineering of Corynebacterium glutamicum for enhanced production of 5-aminovaleric acid. Microb. Cell Factories 15 (1): 174.

131 Park, S.J., Oh, Y.H., Noh, W. et al. (2014). High-level conversion of L-lysine into 5-aminovalerate that can be used for nylon 6,5 synthesis. Biotechnol. J. 9 (10): 1322–1328.

132 Park, S.J., Kim, E.Y., Noh, W. et al. (2013). Metabolic engineering of Escherichia coli for the production of 5-aminovalerate and glutarate as C5 platform chemicals. Metab. Eng. 16: 42–47.

133 Choi, S., Song, C.W., Shin, J.H., and Lee, S.Y. (2015). Biorefineries for the production of top building block chemicals and their derivatives. Metab. Eng. 28: 223–239.

134 Song, C.W., Kim, J.W., Cho, I.J., and Lee, S.Y. (2016). Metabolic engineering of Escherichia coli for the production of 3-hydroxypropionic acid and malonic acid through β-alanine route. ACS Synth. Biol. 5 (11): 1256–1263.

135 Tokuyama, K., Ohno, S., Yoshikawa, K. et al. (2014). Increased 3-hydroxypropionic acid production from glycerol, by modification of central metabolism in Escherichia coli. Microb. Cell Factories 13 (1): 64.

136 Choi, S., Kim, H.U., Kim, T.Y. et al. (2013). Production of 4-hydroxybutyric acid by metabolically engineered Mannheimia succiniciproducens and its conversion to γ-butyrolactone by acid treatment. Metab. Eng. 20: 73–83.

137 Wendisch, V.F., Mindt, M., and Pérez-García, F. (2018). Biotechnological production of mono- and diamines using bacteria: recent progress, applications, and perspectives. Appl. Microbiol. Biotechnol. 102 (8): 3583–3594.

138 Chae, T.U., Ahn, J.H., Ko, Y.-S. et al. (2020). Metabolic engineering for the production of dicarboxylic acids and diamines. Metab. Eng. 58: 2–16.

139 Qian, Z.-G., Xia, X.-X., and Lee, S.Y. (2011). Metabolic engineering of Escherichia coli for the production of cadaverine: a five carbon diamine. Biotechnol. Bioeng. 108 (1): 93–103.

140 Ma, W., Cao, W., Zhang, H. et al. (2015). Enhanced cadaverine production from L-lysine using recombinant Escherichia coli co-overexpressing CadA and CadB. Biotechnol. Lett. 37 (4): 799–806.

141 Skoog, E., Shin, J.H., Saez-Jimenez, V. et al. (2018). Biobased adipic acid – the challenge of developing the production host. Biotechnol. Adv. 36 (8): 2248–2263.

142 Ahn, J.H., Lee, J.A., Bang, J., and Lee, S.Y. (2018). Membrane engineering via trans-unsaturated fatty acids production improves succinic acid production in Mannheimia succiniciproducens. J. Ind. Microbiol. Biotechnol. 45 (7): 555–566.

143 Ahn, J.H., Jang, Y.-S., and Lee, S.Y. (2016). Production of succinic acid by metabolically engineered microorganisms. Curr. Opin. Biotechnol. 42: 54–66.

144 Chatterjee, R., Millard, C.S., Champion, K. et al. (2001). Mutation of the ptsG gene results in increased production of succinate in fermentation of glucose by Escherichia coli. Appl. Environ. Microbiol. 67 (1): 148–154.

145 Chae, T.U., Choi, S.Y., Ryu, J.Y., and Lee, S.Y. (2018). Production of ethylene glycol from xylose by metabolically engineered Escherichia coli. AIChE J. 64 (12): 4193–4200.

146 Cervin, M.A., Soucaille, P., and Valle, F. (2010). Process for the biological production of 1,3-propanediol with high yield. US Patent 7,745,184, filed 24 March 2008 and issued 29 June 2010.

147 Zhang, Y., Liu, D., and Chen, Z. (2017). Production of C2–C4 diols from renewable bioresources: new metabolic pathways and metabolic engineering strategies. Biotechnol. Biofuels 10 (1): 299.

148 Zhang, J., Kao, E., Wang, G. et al. (2016). Metabolic engineering of Escherichia coli for the biosynthesis of 2-pyrrolidone. Metab. Eng. Commun. 3: 1–7.

149 Zhang, J., Barajas, J.F., Burdu, M. et al. (2017). Application of an acyl-CoA ligase from Streptomyces aizunensis for lactam biosynthesis. ACS Synth. Biol. 6 (5): 884–890.

150 Hirasawa, T. and Shimizu, H. (2016). Recent advances in amino acid production by microbial cells. Curr. Opin. Biotechnol. 42: 133–146.

151 Fusee, M.C., Swann, W.E., and Calton, G.J. (1981). Immobilization of Escherichia coli cells containing aspartase activity with polyurethane and its application for L-aspartic acid production. Appl. Environ. Microbiol. 42 (4): 672–676.

152 Zhang, X., Jantama, K., Moore, J.C. et al. (2007). Production of L-alanine by metabolically engineered Escherichia coli. Appl. Microbiol. Biotechnol. 77 (2): 355–366.

153 Wang, Y., Li, Q., Zheng, P. et al. (2016). Evolving the L-lysine high-producing strain of Escherichia coli using a newly developed high-throughput screening method. J. Ind. Microbiol. Biotechnol. 43 (9): 1227–1235.

154 Huang, J.-F., Liu, Z.-Q., Jin, L.-Q. et al. (2017). Metabolic engineering of Escherichia coli for microbial production of L-methionine. Biotechnol. Bioeng. 114 (4): 843–851.

155 Zhao, H., Fang, Y., Wang, X. et al. (2018). Increasing L-threonine production in Escherichia coli by engineering the glyoxylate shunt and the L-threonine biosynthesis pathway. Appl. Microbiol. Biotechnol. 102 (13): 5505–5518.

156 Shin, J.H. and Lee, S.Y. (2014). Metabolic engineering of microorganisms for the production of L-arginine and its derivatives. Microb. Cell Factories 13 (1): 166.

157 Ignarro, L.J., Cirino, G., Casini, A., and Napoli, C. (1999). Nitric oxide as a signaling molecule in the vascular system: an overview. J. Cardiovasc. Pharmacol. 34 (6): 879–886.


158 Nandineni, M.R. and Gowrishankar, J. (2004). Evidence for an arginine exporter encoded by yggA (argO) that is regulated by the LysR-type transcriptional regulator ArgP in Escherichia coli. J. Bacteriol. 186 (11): 3539–3546.

159 Takors, R., Bathe, B., Rieping, M. et al. (2007). Systems biology for industrial strains and fermentation processes – example: amino acids. J. Biotechnol. 129 (2): 181–190.

160 Park, J.H., Kim, T.Y., Lee, K.H., and Lee, S.Y. (2011). Fed-batch culture of Escherichia coli for L-valine production based on in silico flux response analysis. Biotechnol. Bioeng. 108 (4): 934–946.

161 Lee, J.-H. and Wendisch, V.F. (2017). Biotechnological production of aromatic compounds of the extended shikimate pathway from renewable biomass. J. Biotechnol. 257: 211–221.

162 Noda, S. and Kondo, A. (2017). Recent advances in microbial production of aromatic chemicals and derivatives. Trends Biotechnol. 35 (8): 785–796.

163 Wang, J., Shen, X., Rey, J. et al. (2018). Recent advances in microbial production of aromatic natural products and their derivatives. Appl. Microbiol. Biotechnol. 102 (1): 47–61.

164 Wu, F., Cao, P., Song, G. et al. (2018). Expanding the repertoire of aromatic chemicals by microbial production. J. Chem. Technol. Biotechnol. 93 (10): 2804–2816.

165 Huccetogullari, D., Luo, Z.W., and Lee, S.Y. (2019). Metabolic engineering of microorganisms for production of aromatic compounds. Microb. Cell Factories 18 (1): 41.

166 Rodriguez, A., Martínez, J.A., Flores, N. et al. (2014). Engineering Escherichia coli to overproduce aromatic amino acids and derived compounds. Microb. Cell Factories 13 (1): 126.

167 van Spronsen, F.J., van Rijn, M., Bekhof, J. et al. (2001). Phenylketonuria: tyrosine supplementation in phenylalanine-restricted diets. Am. J. Clin. Nutr. 73 (2): 153–157.

168 Lütke-Eversloh, T., Santos, C.N.S., and Stephanopoulos, G. (2007). Perspectives of biotechnological production of L-tyrosine and its applications. Appl. Microbiol. Biotechnol. 77 (4): 751–762.

169 Dell, K.A. and Frost, J.W. (1993). Identification and removal of impediments to biocatalytic synthesis of aromatics from D-glucose: rate-limiting enzymes in the common pathway of aromatic amino acid biosynthesis. J. Am. Chem. Soc. 115 (24): 11581–11589.

170 Santos, C.N.S., Xiao, W., and Stephanopoulos, G. (2012). Rational, combinatorial, and genomic approaches for engineering L-tyrosine production in Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 109 (34): 13538–13543.

171 Wang, J. and De Luca, V. (2005). The biosynthesis and regulation of biosynthesis of Concord grape fruit esters, including “foxy” methylanthranilate. Plant J. 44 (4): 606–619.

172 Kuepper, J., Dickler, J., Biggel, M. et al. (2015). Metabolic engineering of Pseudomonas putida KT2440 to produce anthranilate from glucose. Front. Microbiol. 6: 1310.


173 Balderas-Hernández, V.E., Sabido-Ramos, A., Silva, P. et al. (2009). Metabolic engineering for improving anthranilate synthesis from glucose in Escherichia coli. Microb. Cell Factories 8 (1): 19.

174 Noda, S., Shirai, T., Oyama, S., and Kondo, A. (2016). Metabolic design of a platform Escherichia coli strain producing various chorismate derivatives. Metab. Eng. 33: 119–129.

175 Ward, M., Yu, B., Wyatt, V. et al. (2007). Anti-HIV-1 activity of poly(mandelic acid) derivatives. Biomacromolecules 8 (11): 3308–3316.

176 Bhushan, R. and Agarwal, C. (2008). Direct enantiomeric TLC resolution of dl-penicillamine using (R)-mandelic acid and l-tartaric acid as chiral impregnating reagents and as chiral mobile phase additive. Biomed. Chromatogr. 22 (11): 1237–1242.

177 Reifenrath, M. and Boles, E. (2018). Engineering of hydroxymandelate synthases and the aromatic amino acid pathway enables de novo biosynthesis of mandelic and 4-hydroxymandelic acid with Saccharomyces cerevisiae. Metab. Eng. 45: 246–254.

178 Reifenrath, M., Bauer, M., Oreb, M., and Boles, E. (2018). Bacterial bifunctional chorismate mutase-prephenate dehydratase PheA increases flux into the yeast phenylalanine pathway and improves mandelic acid production. Metab. Eng. Commun. 7: e00079.

179 Han, J.H., Park, M.S., Bae, J.W. et al. (2006). Production of (S)-styrene oxide using styrene oxide isomerase negative mutant of Pseudomonas putida SN1. Enzyme Microb. Technol. 39 (6): 1264–1269.

180 McKenna, R. and Nielsen, D.R. (2011). Styrene biosynthesis from glucose by engineered Escherichia coli. Metab. Eng. 13 (5): 544–554.

181 Noda, S., Miyazaki, T., Miyoshi, T. et al. (2011). Cinnamic acid production using Streptomyces lividans expressing phenylalanine ammonia lyase. J. Ind. Microbiol. Biotechnol. 38 (5): 643–648.

182 Panke, S., Wubbolts, M.G., Schmid, A., and Witholt, B. (2000). Production of enantiopure styrene oxide by recombinant Escherichia coli synthesizing a two-component styrene monooxygenase. Biotechnol. Bioeng. 69 (1): 91–100.

183 Butler, M.S. (2004). The role of natural product chemistry in drug discovery. J. Nat. Prod. 67 (12): 2141–2153.

184 Park, S.Y., Yang, D., Ha, S.H., and Lee, S.Y. (2018). Metabolic engineering of microorganisms for the production of natural compounds. Adv. Biosyst. 2 (1): 1700190.

185 Goldstein, J.L. and Brown, M.S. (1990). Regulation of the mevalonate pathway. Nature 343 (6257): 425–430.

186 Rohmer, M., Seemann, M., Horbach, S. et al. (1996). Glyceraldehyde 3-phosphate and pyruvate as precursors of isoprenic units in an alternative non-mevalonate pathway for terpenoid biosynthesis. J. Am. Chem. Soc. 118 (11): 2564–2566.

187 Rowinsky, E.K., Cazenave, L.A., and Donehower, R.C. (1990). Taxol: a novel investigational antimicrotubule agent. J. Natl. Cancer Inst. 82 (15): 1247–1259.


188 Paddon, C.J., Westfall, P.J., Pitera, D.J. et al. (2013). High-level semi-synthetic production of the potent antimalarial artemisinin. Nature 496 (7446): 528–532.

189 Zhou, K., Qiao, K., Edgar, S., and Stephanopoulos, G. (2015). Distributing a metabolic pathway among a microbial consortium enhances production of natural products. Nat. Biotechnol. 33 (4): 377–383.

190 Zhang, H., Pereira, B., Li, Z., and Stephanopoulos, G. (2015). Engineering Escherichia coli coculture systems for the production of biochemical products. Proc. Natl. Acad. Sci. U. S. A. 112 (27): 8266.

191 Zhang, H. and Wang, X. (2016). Modular co-culture engineering, a new approach for metabolic engineering. Metab. Eng. 37: 114–121.

192 Vogt, T. (2010). Phenylpropanoid biosynthesis. Mol. Plant 3 (1): 2–20.

193 Ferrer, J.L., Austin, M.B., Stewart, C., and Noel, J.P. (2008). Structure and function of enzymes involved in the biosynthesis of phenylpropanoids. Plant Physiol. Biochem. 46 (3): 356–370.

194 Wu, J., Zhou, T., Du, G. et al. (2014). Modular optimization of heterologous pathways for de novo synthesis of (2S)-naringenin in Escherichia coli. PLoS One 9 (7): e101492.

195 Minami, H., Kim, J.-S., Ikezawa, N. et al. (2008). Microbial production of plant benzylisoquinoline alkaloids. Proc. Natl. Acad. Sci. U. S. A. 105 (21): 7393–7398.

196 DeLoache, W.C., Russ, Z.N., Narcross, L. et al. (2015). An enzyme-coupled biosensor enables (S)-reticuline production in yeast from glucose. Nat. Chem. Biol. 11 (7): 465–471.

197 Staunton, J. and Weissman, K.J. (2001). Polyketide biosynthesis: a millennium review. Nat. Prod. Rep. 18 (4): 380–416.

198 Pfeifer, B.A., Admiraal, S.J., Gramajo, H. et al. (2001). Biosynthesis of complex polyketides in a metabolically engineered strain of Escherichia coli. Science 291 (5509): 1790–1792.

199 Liu, Q., Wu, K., Cheng, Y. et al. (2015). Engineering an iterative polyketide pathway in Escherichia coli results in single-form alkene and alkane overproduction. Metab. Eng. 28: 82–90.

200 Demain, A.L. and Vaishnav, P. (2009). Production of recombinant proteins by microbes and higher organisms. Biotechnol. Adv. 27 (3): 297–306.

201 Chen, R. (2012). Bacterial expression systems for recombinant protein production: Escherichia coli and beyond. Biotechnol. Adv. 30 (5): 1102–1107.

202 Rosano, G.L. and Ceccarelli, E.A. (2014). Recombinant protein expression in Escherichia coli: advances and challenges. Front. Microbiol. 5: 172.

203 Ecker, D.M., Jones, S.D., and Levine, H.L. (2015). The therapeutic monoclonal antibody market. MAbs 7 (1): 9–14.

204 Pandhal, J. and Wright, P.C. (2010). N-linked glycoengineering for human therapeutic proteins in bacteria. Biotechnol. Lett. 32 (9): 1189–1198.

205 Wacker, M., Linton, D., Hitchen, P.G. et al. (2002). N-linked glycosylation in Campylobacter jejuni and its functional transfer into Escherichia coli. Science 298 (5599): 1790–1793.


206 Cuccui, J. and Wren, B. (2015). Hijacking bacterial glycosylation for the production of glycoconjugates, from vaccines to humanised glycoproteins. J. Pharm. Pharmacol. 67 (3): 338–350.

207 Natarajan, A., Jaroentomeechai, T., Li, M. et al. (2018). Metabolic engineering of glycoprotein biosynthesis in bacteria. Emerg. Top. Life Sci. 2 (3): 419–432.

208 Snijder, H.J. and Hakulinen, J. (2016). Membrane protein production in Escherichia coli for applications in drug discovery. In: Advanced Technologies for Protein Complex Production and Characterization (ed. M.C. Vega), 59–77. Cham: Springer International Publishing.

209 Schlegel, S., Hjelm, A., Baumgarten, T. et al. (2014). Bacterial-based membrane protein production. Biochim. Biophys. Acta 1843 (8): 1739–1749.

210 Park, S.H., Das, B.B., Casagrande, F. et al. (2012). Structure of the chemokine receptor CXCR1 in phospholipid bilayers. Nature 491 (7426): 779–783.

211 Wang, Y., Katyal, P., and Montclare, J.K. (2019). Protein-engineered functional materials. Adv. Healthc. Mater. 8 (11): e1801374.

212 Chung, H., Kim, T.Y., and Lee, S.Y. (2012). Recent advances in production of recombinant spider silk proteins. Curr. Opin. Biotechnol. 23 (6): 957–964.

213 Steinbüchel, A. (2001). Perspectives for biotechnological production and utilization of biopolymers: metabolic engineering of polyhydroxyalkanoate biosynthesis pathways as a successful example. Macromol. Biosci. 1 (1): 1–24.

214 Rehm, B.H.A. (2010). Bacterial polymers: biosynthesis, modifications and applications. Nat. Rev. Microbiol. 8 (8): 578–592.

215 Rodríguez-Carmona, E. and Villaverde, A. (2010). Nanostructured bacterial materials for innovative medicines. Trends Microbiol. 18 (9): 423–430.

216 Steinbüchel, A. and Valentin, H.E. (1995). Diversity of bacterial polyhydroxyalkanoic acids. FEMS Microbiol. Lett. 128 (3): 219–228.

217 Lee, S.Y. (1996). Bacterial polyhydroxyalkanoates. Biotechnol. Bioeng. 49 (1): 1–14.

218 Agnew, D.E. and Pfleger, B.F. (2013). Synthetic biology strategies for synthesizing polyhydroxyalkanoates from unrelated carbon sources. Chem. Eng. Sci. 103: 58–67.

219 Bugnicourt, E., Cinelli, P., Alvarez, V., and Lazzeri, A. (2014). Polyhydroxyalkanoate (PHA): review of synthesis, characteristics, processing and potential applications in packaging. Express Polym. Lett. 8 (11): 791–808.

220 Valappil, S.P., Misra, S.K., Boccaccini, A.R., and Roy, I. (2006). Biomedical applications of polyhydroxyalkanoates, an overview of animal testing and in vivo responses. Expert Rev. Med. Devices 3 (6): 853–868.

221 Choi, J. and Lee, S.Y. (1999). Factors affecting the economics of polyhydroxyalkanoate production by bacterial fermentation. Appl. Microbiol. Biotechnol. 51 (1): 13–21.

222 Chen, G.-Q. (2009). A microbial polyhydroxyalkanoates (PHA) based bio- and materials industry. Chem. Soc. Rev. 38 (8): 2434–2446.

223 Yin, J., Chen, J.-C., Wu, Q., and Chen, G.-Q. (2015). Halophiles, coming stars for industrial biotechnology. Biotechnol. Adv. 33 (7): 1433–1442.


224 Choi, J.-I. and Lee, S.Y. (1999). Efficient and economical recovery of poly(3-hydroxybutyrate) from recombinant Escherichia coli by simple digestion with chemicals. Biotechnol. Bioeng. 62 (5): 546–553.

225 Lee, Y., Cho, I.J., Choi, S.Y., and Lee, S.Y. (2019). Systems metabolic engineering strategies for non-natural microbial polyester production. Biotechnol. J. 14 (9): e1800426.

226 Park, S.J. and Lee, S.Y. (2003). Identification and characterization of a new enoyl coenzyme A hydratase involved in biosynthesis of medium-chain-length polyhydroxyalkanoates in recombinant Escherichia coli. J. Bacteriol. 185 (18): 5391–5397.

227 Madison, L.L. and Huisman, G.W. (1999). Metabolic engineering of poly(3-hydroxyalkanoates): from DNA to plastic. Microbiol. Mol. Biol. Rev. 63 (1): 21–53.

228 Lee, S.Y., Yim, K.S., Chang, H.N., and Chang, Y.K. (1994). Construction of plasmids, estimation of plasmid stability, and use of stable plasmids for the production of poly(3-hydroxybutyric acid) by recombinant Escherichia coli. J. Biotechnol. 32 (2): 203–211.

229 Lu, X.-Y., Wu, Q., Zhang, W.-J. et al. (2004). Molecular cloning of polyhydroxyalkanoate synthesis operon from Aeromonas hydrophila and its expression in Escherichia coli. Biotechnol. Prog. 20 (5): 1332–1336.

230 Zhou, Q., Shi, Z.-Y., Meng, D.-C. et al. (2011). Production of 3-hydroxypropionate homopolymer and poly(3-hydroxypropionate-co-4-hydroxybutyrate) copolymer by recombinant Escherichia coli. Metab. Eng. 13 (6): 777–785.

231 Wang, Q., Liu, C., Xian, M. et al. (2012). Biosynthetic pathway for poly(3-hydroxypropionate) in recombinant Escherichia coli. J. Microbiol. 50 (4): 693–697.

232 Andreeßen, B., Taylor, N., and Steinbüchel, A. (2014). Poly(3-hydroxypropionate): a promising alternative to fossil fuel-based materials. Appl. Environ. Microbiol. 80 (21): 6574–6582.

233 Park, S.J. and Lee, S.Y. (2004). New fadB homologous enzymes and their use in enhanced biosynthesis of medium-chain-length polyhydroxyalkanoates in fadB mutant Escherichia coli. Biotechnol. Bioeng. 86 (6): 681–686.

234 Noda, I., Green, P.R., Satkowski, M.M., and Schechtman, L.A. (2005). Preparation and properties of a novel class of polyhydroxyalkanoate copolymers. Biomacromolecules 6 (2): 580–586.

235 Drumright, R.E., Gruber, P.R., and Henton, D.E. (2000). Polylactic acid technology. Adv. Mater. 12 (23): 1841–1846.

236 Makadia, H.K. and Siegel, S.J. (2011). Poly lactic-co-glycolic acid (PLGA) as biodegradable controlled drug delivery carrier. Polymers 3 (3): 1377–1397.

237 Yuan, W., Jia, Y., Tian, J. et al. (2001). Class I and III polyhydroxyalkanoate synthases from Ralstonia eutropha and Allochromatium vinosum: characterization and substrate specificity studies. Arch. Biochem. Biophys. 394 (1): 87–98.

238 Zhang, S., Kamachi, M., Takagi, Y. et al. (2001). Comparative study of the relationship between monomer structure and reactivity for two polyhydroxyalkanoate synthases. Appl. Microbiol. Biotechnol. 56 (1): 131–136.


239 Yang, T.H., Jung, Y.K., Kang, H.O. et al. (2011). Tailor-made type II Pseudomonas PHA synthases and their use for the biosynthesis of polylactic acid and its copolymer in recombinant Escherichia coli. Appl. Microbiol. Biotechnol. 90 (2): 603–614.

240 Yang, T.H., Kim, T.W., Kang, H.O. et al. (2010). Biosynthesis of polylactic acid and its copolymers using evolved propionate CoA transferase and PHA synthase. Biotechnol. Bioeng. 105 (1): 150–160.

241 Taguchi, S., Yamada, M., Matsumoto, K. et al. (2008). A microbial factory for lactate-based polyesters using a lactate-polymerizing enzyme. Proc. Natl. Acad. Sci. U. S. A. 105 (45): 17323–17327.

242 Choi, S.Y., Rhie, M.N., Kim, H.T. et al. (2020). Metabolic engineering for the synthesis of polyesters: a 100-year journey from polyhydroxyalkanoates to non-natural microbial polyesters. Metab. Eng. 58: 47–81.

243 Li, F.-F., Zhao, Y., Li, B.-Z. et al. (2016). Engineering Escherichia coli for production of 4-hydroxymandelic acid using glucose–xylose mixture. Microb. Cell Factories 15 (1): 90.

244 Park, S.J., Lee, T.W., Lim, S.-C. et al. (2012). Biosynthesis of polyhydroxyalkanoates containing 2-hydroxybutyrate from unrelated carbon source by metabolically engineered Escherichia coli. Appl. Microbiol. Biotechnol. 93 (1): 273–283.

245 Mizuno, S., Enda, Y., Saika, A. et al. (2018). Biosynthesis of polyhydroxyalkanoates containing 2-hydroxy-4-methylvalerate and 2-hydroxy-3-phenylpropionate units from a related or unrelated carbon source. J. Biosci. Bioeng. 125 (3): 295–300.

246 Li, Z.-J., Qiao, K., Shi, W. et al. (2016). Biosynthesis of poly(glycolate-co-lactate-co-3-hydroxybutyrate) from glucose by metabolically engineered Escherichia coli. Metab. Eng. 35: 1–8.

247 Ruffing, A. and Chen, R.R. (2006). Metabolic engineering of microbes for oligosaccharide and polysaccharide synthesis. Microb. Cell Factories 5 (1): 25.

248 Jang, W.D., Hwang, J.H., Kim, H.U. et al. (2017). Bacterial cellulose as an example product for sustainable production and consumption. Microb. Biotechnol. 10 (5): 1181–1185.

249 Ionescu, M. and Belkin, S. (2009). Overproduction of exopolysaccharides by an Escherichia coli K-12 rpoS mutant in response to osmotic stress. Appl. Environ. Microbiol. 75 (2): 483–492.

250 Chong, B.F., Blank, L.M., McLaughlin, R., and Nielsen, L.K. (2005). Microbial hyaluronic acid production. Appl. Microbiol. Biotechnol. 66 (4): 341–351.

251 Gwon, H., Park, K., Chung, S.-C. et al. (2019). A safe and sustainable bacterial cellulose nanofiber separator for lithium rechargeable batteries. Proc. Natl. Acad. Sci. U. S. A. 116 (39): 19288–19293.

252 Scoffone, V., Dondi, D., Biino, G. et al. (2013). Knockout of pgdS and ggt genes improves γ-PGA yield in Bacillus subtilis. Biotechnol. Bioeng. 110 (7): 2006–2012.

253 Goto, A. and Kunioka, M. (1992). Biosynthesis and hydrolysis of poly(γ-glutamic acid) from Bacillus subtilis IFO3335. Biosci. Biotechnol. Biochem. 56 (7): 1031–1035.


254 Park, T.J., Lee, K.G., and Lee, S.Y. (2016). Advances in microbial biosynthesis of metal nanoparticles. Appl. Microbiol. Biotechnol. 100 (2): 521–534.

255 Singh, P., Kim, Y.-J., Zhang, D., and Yang, D.-C. (2016). Biological synthesis of nanoparticles from plants and microorganisms. Trends Biotechnol. 34 (7): 588–599.

256 Rubilar, O., Rai, M., Tortella, G. et al. (2013). Biogenic nanoparticles: copper, copper oxides, copper sulphides, complex copper nanostructures and their applications. Biotechnol. Lett. 35 (9): 1365–1375.

257 Attard, G., Casadesús, M., Macaskie, L.E., and Deplanche, K. (2012). Biosynthesis of platinum nanoparticles by Escherichia coli MC4100: can such nanoparticles exhibit intrinsic surface enantioselectivity? Langmuir 28 (11): 5267–5274.

258 Sweeney, R.Y., Mao, C., Gao, X. et al. (2004). Bacterial biosynthesis of cadmium sulfide nanocrystals. Chem. Biol. 11 (11): 1553–1559.

259 Bao, H., Lu, Z., Cui, X. et al. (2010). Extracellular microbial synthesis of biocompatible CdTe quantum dots. Acta Biomater. 6 (9): 3534–3541.

260 Gurunathan, S., Han, J.W., Eppakayala, V., and Kim, J.-H. (2013). Microbial reduction of graphene oxide by Escherichia coli: a green chemistry approach. Colloids Surf. B Biointerfaces 102: 772–777.

261 Zhang, D., Yamamoto, T., Tang, D. et al. (2019). Enhanced biosynthesis of CdS nanoparticles through Arabidopsis thaliana phytochelatin synthase-modified Escherichia coli with fluorescence effect in detection of pyrogallol and gallic acid. Talanta 195: 447–455.

262 Kim, E.B., Seo, J.M., Kim, G.W. et al. (2016). In vivo synthesis of europium selenide nanoparticles and related cytotoxicity evaluation of human cells. Enzyme Microb. Technol. 95: 201–208.

263 Mi, C., Wang, Y., Zhang, J. et al. (2011). Biosynthesis and characterization of CdS quantum dots in genetically engineered Escherichia coli. J. Biotechnol. 153 (3): 125–132.

264 Park, T.J., Lee, S.Y., Heo, N.S., and Seo, T.S. (2010). In vivo synthesis of diverse metal nanoparticles by recombinant Escherichia coli. Angew. Chem. Int. Ed. 49 (39): 7019–7024.

265 Lee, K.G., Hong, J., Wang, K.W. et al. (2012). In vitro biosynthesis of metal nanoparticles in microdroplets. ACS Nano 6 (8): 6998–7008.

266 Choi, Y., Park, T.J., Lee, D.C., and Lee, S.Y. (2018). Recombinant Escherichia coli as a biofactory for various single- and multi-element nanomaterials. Proc. Natl. Acad. Sci. U. S. A. 115 (23): 5944–5949.

267 Tsai, Y.-J., Ouyang, C.-Y., Ma, S.-Y. et al. (2014). Biosynthesis and display of diverse metal nanoparticles by recombinant Escherichia coli. RSC Adv. 4 (102): 58717–58719.

268 Ouyang, C.-Y., Lin, Y.-K., Tsai, D.-Y., and Yeh, Y.-C. (2016). Secretion of metal-binding proteins by a newly discovered OsmY homolog in Cupriavidus metallidurans for the biogenic synthesis of metal nanoparticles. RSC Adv. 6 (20): 16798–16801.

269 Monrás, J.P., Díaz, V., Bravo, D. et al. (2012). Enhanced glutathione content allows the in vivo synthesis of fluorescent CdTe nanoparticles by Escherichia coli. PLoS One 7 (11): e48657.


270 Edmundson, M.C. and Horsfall, L. (2015). Construction of a modular arsenic-resistance operon in Escherichia coli and the production of arsenic nanoparticles. Front. Bioeng. Biotechnol. 3: 160.
271 Wang, X., Ai, W., Li, N. et al. (2015). Graphene–bacteria composite for oxygen reduction and lithium ion batteries. J. Mater. Chem. A 3 (24): 12873–12879.
272 Seo, J.M., Kim, E.B., Hyun, M.S. et al. (2015). Self-assembly of biogenic gold nanoparticles and their use to enhance drug delivery into cells. Colloids Surf. B Biointerfaces 135: 27–34.
273 Tan, Z., Yoon, J.M., Nielsen, D.R. et al. (2016). Membrane engineering via trans unsaturated fatty acids production improves Escherichia coli robustness and production of biorenewables. Metab. Eng. 35: 105–113.
274 Foo, J.L., Jensen, H.M., Dahl, R.H. et al. (2014). Improving microbial biogasoline production in Escherichia coli using tolerance engineering. MBio 5 (6): e01932-14.
275 Fisher, M.A., Boyarskiy, S., Yamada, M.R. et al. (2014). Enhancing tolerance to short-chain alcohols by engineering the Escherichia coli AcrB efflux pump to secrete the non-native substrate n-butanol. ACS Synth. Biol. 3 (1): 30–40.
276 Royce, L.A., Yoon, J.M., Chen, Y. et al. (2015). Evolution for exogenous octanoic acid tolerance improves carboxylic acid production and membrane integrity. Metab. Eng. 29: 180–188.
277 de Melo, A.G., Levesque, S., and Moineau, S. (2018). Phages as friends and enemies in food processing. Curr. Opin. Biotechnol. 49: 185–190.
278 Labrie, S.J., Samson, J.E., and Moineau, S. (2010). Bacteriophage resistance mechanisms. Nat. Rev. Microbiol. 8 (5): 317–327.
279 Høyland-Kroghsbo, N.M., Mærkedahl, R.B., and Svenningsen, S.L. (2013). A quorum-sensing-induced bacteriophage defense mechanism. MBio 4 (1): e00362-12.
280 Burgard, A., Burk, M.J., Osterhout, R. et al. (2016). Development of a commercial scale process for production of 1,4-butanediol from sugar. Curr. Opin. Biotechnol. 42: 118–125.
281 Kim, G.B., Kim, W.J., Kim, H.U., and Lee, S.Y. (2020). Machine learning applications in systems metabolic engineering. Curr. Opin. Biotechnol. 64: 1–9.


12 Metabolic Engineering of Corynebacterium glutamicum

Judith Becker and Christoph Wittmann

Institute of Systems Biotechnology, Saarland University, Saarbrücken, Germany

12.1 Introduction

Corynebacterium glutamicum, discovered in the 1950s during a Japanese screening program for l-glutamate-overproducing bacteria, is known as the world's most important amino acid cell factory [1–4]. Its enormous biosynthetic power, excellent large-scale performance, and GRAS status (generally recognized as safe) have continuously driven research and application efforts. Over the years, classical mutagenesis, targeted genetic engineering, metabolic engineering, and, more recently, systems and synthetic metabolic engineering have upgraded C. glutamicum step by step into an all-rounder of biotechnology. Today, systems-wide insights into metabolic fluxes [5–8], intracellular and extracellular metabolite abundance [9–12], gene expression [13–15], and protein inventory [16–18] are combined with computational modeling and design [19–21] and genome engineering [22–24] into a comprehensive strain engineering workflow known as the "design–build–test–learn" cycle. Notably, metabolically engineered strains of C. glutamicum provide more than 80 different products of recognized commercial value from more than 25 different carbon sources. Traditional products such as l-glutamate and l-lysine [1, 2, 25–28] still play an important role. However, the portfolio is rapidly expanding into a rich spectrum of bulk and specialty chemicals, active ingredients, materials, and fuels for applications in food and feed, textiles and housing, packaging, energy and transportation, and health care and well-being [1, 29–31], including further amino acids and their derivatives [1, 4, 12, 32–34], alcohols [1, 35–38], diamines [15, 39–46], terpenoids [47–49], organic acids [29, 50, 51], and aromatic compounds [52–54]. Ongoing discoveries and improvements suggest that this expansion is far from over.
For example, improved production of the amino acids l-serine [55], l-arginine [56], l-histidine [57], l-leucine [58], l-isoleucine [59], l-glutamate [60], l-lysine [61–64], l-valine [65], and l-cysteine [66–68] is complemented by novel processes for nonproteinogenic amino acids, such as l-ornithine [69], l-4-hydroxyisoleucine [70], ectoine [71], l-pipecolic acid [72], and trans-4-hydroxyproline [73]. In addition, aromatic products are experiencing a boom [52, 54]. Their applications are manifold and include bioplastics, fibers, cosmetics, pharmaceuticals, flavors, and fragrances [1, 74, 75]. Interestingly, C. glutamicum possesses a rich set of catabolic and anabolic aromatic pathways [76, 77] and is surprisingly tolerant to these rather toxic molecules. The microbe has therefore emerged as an excellent choice for producing aromatic compounds, including violacein [78], plant polyphenols [79, 80], different hydroxybenzoates [74, 81–83], protocatechuate [74, 84–86], shikimate [87], anthranilate, and methyl anthranilate [88]. Other directions of intense research focus on the use of C. glutamicum for the production of polymers [89–91], recombinant proteins [92–95], and RNA [96–98].

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

12.2 Systems Metabolic Engineering Strategies

Systems metabolic engineering is one of the major drivers that lifted C. glutamicum to the top ranks of industrial cell factories. In terms of both versatility and efficiency, C. glutamicum has set several records thanks to clever strain engineering strategies. As described in the following, systems metabolic engineering of C. glutamicum, which integrates systems and synthetic biology with metabolic engineering concepts, is a highly powerful approach for designing cell factories of unprecedented performance.

12.2.1 Experimental and Computational Systems Biology

Systems biology provides quantitative and comprehensive data and serves as a valuable knowledge base for strain engineering. The various layers of the cell, e.g. genome, transcriptome, proteome, metabolome, and fluxome, hold the key to deciphering the microbe's physiology and to modifying it à la carte. The genome sequence of the type strain C. glutamicum ATCC 13032 has been known for almost 20 years [99, 100], since the world-leading l-lysine producers Degussa (today Evonik), BASF SE, and Kyowa Hakko [101], all of which use C. glutamicum for manufacturing, completed their sequencing efforts almost simultaneously. Knowledge of the genetic repertoire of C. glutamicum was groundbreaking for later (postgenomic) developments such as genome breeding [102, 103], transcriptome analysis [26, 104, 105], proteome analysis [17, 104, 106], and genome-scale metabolic modeling [107–109]. While most follow-up studies were experimental, computational pathway models (using metabolic networks reconstructed from the annotated genome sequence) also gained ground, because they can predict growth and production behavior on a global scale, including scenarios that are not experimentally accessible. In a seminal study, the integration of in silico metabolic modeling with in vivo 13C metabolic flux analysis enabled the design of one of the best l-lysine-producing C. glutamicum cell factories to date [19]. In addition, further modeling studies investigated amino acid metabolism [107, 110], predicted optimal scenarios for l-methionine [21] and l-lysine production [20, 111], and suggested novel production routes [112].
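13C metabolic flux analysis, referred to above, infers fluxes from the labeling patterns that different pathways imprint on metabolites. The Python sketch below illustrates only the core idea at a single converging node. Suppose a product pool is fed by two routes with known, distinct mass isotopomer distributions (MIDs), for instance because one route releases a labeled carbon as CO2 while the other retains it. The measured product MID is then approximately a weighted mixture of the two route MIDs, and the weight (the flux fraction) can be estimated by least squares. This is a deliberately reduced, hypothetical example, not the workflow of the cited studies. It ignores natural isotope correction, measurement weighting, and the full isotopomer network models that real 13C flux analysis software fits. The function names and MID values are illustrative assumptions.

```python
from typing import Sequence


def normalize(mid: Sequence[float]) -> list[float]:
    """Scale a mass isotopomer distribution (M+0, M+1, ...) so that it sums to 1."""
    total = float(sum(mid))
    if total <= 0:
        raise ValueError("an MID needs at least one positive abundance")
    return [x / total for x in mid]


def estimate_flux_fraction(mid_a, mid_b, mid_product) -> float:
    """Least-squares fraction f of a product pool formed via route A.

    Simplifying model (assumption): mid_product = f * mid_a + (1 - f) * mid_b.
    Minimizing sum((f*d_i + b_i - p_i)**2) with d = a - b gives the closed form
    f = sum((p_i - b_i) * d_i) / sum(d_i**2).
    """
    a, b, p = normalize(mid_a), normalize(mid_b), normalize(mid_product)
    if not len(a) == len(b) == len(p):
        raise ValueError("all MIDs must cover the same mass isotopomers")
    d = [ai - bi for ai, bi in zip(a, b)]
    denom = sum(di * di for di in d)
    if denom == 0:
        raise ValueError("identical route MIDs: the split is not identifiable")
    f = sum((pi - bi) * di for pi, bi, di in zip(p, b, d)) / denom
    # For a one-parameter quadratic, clipping the optimum gives the bounded optimum.
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    # Hypothetical MIDs: route A passes on one 13C atom, route B loses it.
    route_a = [0.0, 1.0, 0.0]
    route_b = [1.0, 0.0, 0.0]
    measured = [0.30, 0.70, 0.00]
    print(round(estimate_flux_fraction(route_a, route_b, measured), 2))  # 0.7
```

With these made-up MIDs, 70% of the product pool is attributed to route A. A real flux analysis solves many such labeling constraints simultaneously, together with the stoichiometric mass balances of the whole network.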


In the area of experimental tools and technologies, DNA microarrays (designed from the genome sequence and introduced shortly after it was deciphered) initiated the era of transcriptomics in C. glutamicum. Detailed studies unraveled the cellular response of the microbe to external stimuli such as heat stress [102, 105], pH stress [113], process-related oscillations of oxygen and carbon [114], and different sources of nitrogen [18, 115, 116], sulfur [117], and carbon [14, 105, 118]. In this regard, transcriptomics helped to identify transcriptional regulators involved in carbon metabolism and stress response and to decode the regulatory system of C. glutamicum. Important regulators discovered over the years include AmtR [119–121], DtxR [122], MalR [123], McbR [117, 124], GlxR [125], HspR [126], and RamAB [125, 127, 128]. In addition, transcriptomics was applied to characterize producer strains for l-glutamate [129], l-valine [130], and l-lysine [131] and, more importantly, guided strain engineering toward superior production of l-lysine [14], l-arginine [132], diaminopentane [15], glycolate [133], and succinate [134], and toward improved acid tolerance [135]. On the technology side, quantitative RNA sequencing (complementing and partially replacing microarrays) now enables absolute transcriptome analysis with higher precision and information content, including the exploration of previously unknown small RNAs [136, 137]. Recently, RNA sequencing was used to characterize protein [138] and l-lysine [139] production, and to reveal the function of the alarmones ppGpp and pppGpp in C. glutamicum [140]. In the field of proteomics, different techniques were established to characterize the cytosolic, membrane, and extracellular proteomes of C. glutamicum [16, 17, 104].
As with transcriptomics, global proteomic studies focused on changes related to environmental stress [141–144], the utilization of different carbon sources [145–147], and nitrogen availability [18], at least partially unraveling the interplay between gene expression and protein level. With a view to organic acid production, the effect of different C4 and C5 dicarboxylic acids was studied at the proteome level [148]. Proteome analysis was also applied to characterize producer strains, e.g. for l-valine [130], l-ornithine [149], and l-lysine [150]. More recently, targeted proteomics was applied to determine the protein levels of terminal biosynthetic pathways in isopentenol-producing [151] and ectoine-producing [71] C. glutamicum. Metabolome analysis in C. glutamicum has proven valuable for identifying previously unknown metabolic pathways [12, 152–154] and metabolic bottlenecks, which manifest in the accumulation or depletion of specific pathway intermediates [12, 33, 155]. Moreover, it has deepened our understanding of transcriptional and metabolic regulatory circuits in C. glutamicum [13, 156] and of the cell's response to genetic and environmental perturbation [33, 114, 129, 157–160]. For intracellular metabolites, metabolomics is far more challenging than transcriptomics and proteomics [161, 162]. Metabolites turn over in the cell on time scales down to milliseconds, which demands ultrafast quenching to stop metabolic activity while preserving cell integrity to avoid leakage and uncontrolled metabolite loss. Careful method development has established robust sampling techniques and various instrumentation, including LC-ESI-MS, GC-MS, MALDI-TOF-MS, and NMR, to assess different intracellular metabolites of interest [9, 11, 163–166]; these methods are continuously being upgraded and extended, most recently to short-chain CoA thioesters [167]. In addition, the analysis of oxygen and carbon dioxide using online MS and fluorescent sensors has provided important insights into the respiration of C. glutamicum [168–173].

As in other cells, the interplay of transcription, translation, post-translational modification, and metabolic control determines the in vivo operation of the metabolic reactions, the fluxome, of C. glutamicum [161, 174]. A powerful technology for high-resolution fluxomics is 13C metabolic flux analysis [175–179]. Steady-state approaches, which infer flux information from the 13C labeling patterns of proteinogenic amino acids formed in specifically designed 13C tracer experiments, analyzed by GC-MS, and processed by suitable software, have become routine methods at lab scale and miniaturized scale [180–185]. In addition, specific experimental setups were developed to enable metabolic flux analysis under dynamic conditions [13, 186, 187], at large scale [188–191], and in nongrowing cells [192–194]. Thanks to these achievements, C. glutamicum is one of the best-studied organisms at the level of metabolic fluxes, and its detailed analysis has greatly advanced fluxomics research over the past 20 years [181, 195, 196]. The otherwise inaccessible knowledge from metabolic flux analysis has guided systems metabolic engineering many times and has substantially contributed to the fast development of streamlined C. glutamicum cell factories [7, 8, 19, 111, 158, 159, 189, 197–206].

12.2.2 Genome Editing Approaches and Technologies

Molecular biology tools for genetic engineering of C. glutamicum were already established in the 1980s, based on naturally occurring plasmids [207, 208] and DNA transfer via protoplast formation [207, 209]. First applied as episomal vectors, the plasmids were equipped with antibiotic selection markers and origins of replication (ORIs) for C. glutamicum and E. coli to facilitate amplification, maintenance, and propagation [210, 211]. Today, a variety of shuttle vectors is available for targeted genetic manipulation of the microbe. Recently, a novel expression vector was developed that relies on auxotrophic complementation, instead of the addition of antibiotics, to ensure stable inheritance [212]. Moreover, the strength of plasmid-based expression was tuned, with adjustment of the plasmid copy number being a central issue. Early studies revealed that the cryptic C. glutamicum plasmid pGA1 controls its copy number through the presence and orientation of open reading frame (ORF) A [213]. A more recent study intentionally increased the copy number through adaptive evolution: the evolved plasmid contained a nonsense mutation in its parB gene, resulting in a 10-fold higher copy number than that of the ancestor [214]. In the pVC7N shuttle vector, derived from the cryptic plasmid pAM330, the copy number depends sensitively on mutations in a specific region between long inverted repeats, designated the copA1 region. A single-base substitution of cytosine by adenine (copA1 mutation) or by guanine (copA2 mutation) increases the plasmid copy number from 11 (native) to more than 112 (copA1) and more than 300 (copA2), respectively [215]. Beyond plasmid-based expression, non-replicating vectors equipped with homology arms enabled genomic modification via homologous recombination. The implementation of the conditionally lethal sacB system as a counterselection marker substantially facilitated marker-free genome editing and has been routinely used for genetic engineering of C. glutamicum for more than two decades [31, 216, 217]. In the presence of sucrose, expression of sacB, encoding levansucrase, causes growth defects in C. glutamicum, thus selecting for excision of the vector after delivery of its genetic cargo into the genome [1, 218]. Recent research has focused on techniques based on clustered regularly interspaced short palindromic repeats (CRISPR), which promise fast, multiplexed genome editing [1, 219]. Several research groups adapted CRISPR technology from other microbes to C. glutamicum and arrived at different solutions [22–24, 220]. In short, CRISPR-based genome editing in C. glutamicum was realized using different nucleases to introduce a double-strand break (DSB) in the DNA: Cpf1 (Cas12a) from Francisella novicida [23] and Cas9 from Streptococcus pyogenes [22, 24, 220]. Homology-directed repair (HDR) then relied either on single-stranded DNA editing templates [22, 23] or on plasmid-borne editing templates [24, 220]. A major challenge was, and still is, the design of the guide RNA (sgRNA) [22, 220], which is constrained by the protospacer adjacent motif (PAM) that the nuclease must recognize. For example, the T-rich PAM of Cpf1 (Cas12a) made it impossible to implement specific point mutations in four TCA cycle genes of GC-rich C. glutamicum, because the required PAM was missing in the target region of the editing template [221]. Even Cas9, which recognizes NGG as its PAM, failed for two of the targeted gene loci [221]. Improvement can, however, be expected from ongoing research. Systematic investigation of the CRISPR-Cpf1 system identified a 5′-NYTV-3′ PAM and a 21 bp spacer sequence as the most efficient crRNA design for C. glutamicum [222].
Moreover, the development of novel Cas9 variants that accept different PAM sequences promises to increase the number of targetable loci for genome manipulation [223–226]. The powerful approaches developed over time have clearly enabled genome editing at a larger scale. For example, genome-reduced chassis strains were constructed [227], in which the genome size was reduced by up to 22% without affecting growth in minimal medium [228] and genetic stability was improved [229]. Moreover, sophisticated sensing and control functions were implemented in the microbe, enabling screening for desired cellular properties with product-specific biosensors [230–236] and control of cells by light and other external stimuli [237].
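The PAM constraint described above can be made concrete by scanning a sequence for candidate target sites. The following Python sketch is a simplified illustration, not a guide-design tool. It matches PAMs written in IUPAC code on both strands. For Cpf1 (Cas12a), whose PAM lies 5′ of the protospacer, it takes the spacer downstream of the PAM. For Cas9, whose NGG PAM lies 3′ of the protospacer, it takes the spacer upstream. It uses the 21 nt spacer and NYTV PAM reported above for Cpf1 and the commonly used 20 nt spacer for Cas9. Several elements are assumptions: the helper names, the independent-base frequency model, a GC content of 54% as a rough approximation for C. glutamicum, and the TTTV pattern, chosen only as an example of a strictly T-rich PAM. Real design must additionally consider off-target sites, the position of the edit relative to the cut, and guide efficiency.

```python
import re

# IUPAC nucleotide codes (subset) needed for the PAM patterns discussed here
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "Y": "CT", "R": "AG", "V": "ACG",
}


def pam_regex(pam: str) -> re.Pattern:
    """Compile an IUPAC PAM into a zero-width lookahead so overlapping sites are found."""
    body = "".join(f"[{IUPAC[base]}]" for base in pam.upper())
    return re.compile(f"(?={body})")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_protospacers(seq, pam, spacer_len, pam_is_5prime):
    """Return (strand, start, protospacer) tuples for every PAM site on both strands.

    pam_is_5prime=False models Cas9 (PAM 3' of the protospacer, e.g. NGG);
    pam_is_5prime=True models Cas12a/Cpf1 (PAM 5' of the protospacer).
    Coordinates on the '-' strand refer to the reverse-complement string.
    """
    regex = pam_regex(pam)
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for m in regex.finditer(s):
            pos = m.start()
            start = pos + len(pam) if pam_is_5prime else pos - spacer_len
            if start >= 0 and start + spacer_len <= len(s):
                hits.append((strand, start, s[start:start + spacer_len]))
    return hits


def expected_pam_frequency(pam: str, gc: float) -> float:
    """Chance that a random position matches the PAM, assuming independent bases."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in pam.upper():
        prob *= sum(p[b] for b in IUPAC[base])
    return prob


if __name__ == "__main__":
    demo = "A" * 20 + "TGG"  # hypothetical 20 nt protospacer followed by TGG
    print(find_protospacers(demo, "NGG", 20, pam_is_5prime=False))
    gc = 0.54  # assumed, approximate GC content
    print(round(expected_pam_frequency("NGG", gc), 4),
          round(expected_pam_frequency("TTTV", gc), 4))  # 0.0729 0.0094
```

Under this crude model, a strictly T-rich motif such as TTTV is expected almost eightfold less often than NGG in a genome with about 54% GC. The relaxed NYTV motif, in contrast, comes out comparable to or slightly more frequent than NGG. This is consistent with the expectation that improved PAM knowledge widens the set of editable loci.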

12.3 Metabolic Engineering of the Substrate Spectrum

12.3.1 Industrial Raw Materials

The choice of raw material is crucial for every biotechnological process, since it affects production performance, process operation, costs, and eco-efficiency. Naturally, C. glutamicum uses a variety of carbon sources for growth, including carbohydrates such as glucose, fructose, and sucrose; organic acids; and sugar alcohols [31]. This capability matches traditional fermentation feedstocks such as raw sugar and molasses (cane molasses in Asia and South America, beet molasses in Europe) and starch hydrolysates (from corn in the United States and wheat in Northern countries). All these raw materials are rich in glucose, fructose, and sucrose. However, the growing demand for edible crops (including corn, sugar beet, sugar cane, and wheat, listed above) to feed the growing world population has shifted the interest of industrial bioproduction toward sustainable raw materials beyond first-generation feedstocks. Lignocellulosic sugars (so-called “wood sugars”), which accumulate as side streams in the pulp and paper industry and the agricultural sector [238–240], have been intensively studied as a second-generation renewable resource. Along the same lines, the valorization of glycerol, a byproduct of biodiesel production, has been investigated [238, 241]. More recently, novel concepts aim to make the bioeconomy even greener by enabling the use of third-generation renewables such as lignin, the world's most underutilized biopolymer from terrestrial plants [76], and even marine biomass, to relieve heavily overstrained arable land [242].

12.3.2 Lignocellulosic Sugars

Hemicellulose makes up approximately 20–30% of lignocellulosic biomass. It is a branched heteropolymer composed of different sugars. Depending on the plant type, the main polysaccharide chain is either xylan (hardwood) or mannan (softwood), built from xylose and mannose, respectively [243]. The side chains of hemicellulose also contain other sugars, such as arabinose, glucose, and galactose, although to a smaller extent [243, 244]. Most C. glutamicum strains (apart from a few natural isolates) barely consume these substrates, so substantial effort was required to make “woody” sugars and hemicellulose bioavailable for biotechnological use (Figure 12.1). Table 12.1 provides a comprehensive overview of metabolic engineering studies of C. glutamicum to utilize hemicellulosic biomass. Excellent review articles covering this promising research field are also available [238, 267].

12.3.2.1 Xylose

Xylose is the predominant sugar in hemicellulosic biomass, but C. glutamicum cannot naturally metabolize this pentose. Although the wild type exhibits low xylulokinase (XylB) activity, it lacks xylose isomerase (XylA), which catalyzes the initial step of the xylose isomerase (XI) pathway [240, 246, 267]. Heterologous expression of the E. coli xylA gene was sufficient to reconstitute xylose assimilation [240]. Additional overexpression of the E. coli xylB gene, however, substantially improved growth of C. glutamicum on xylose [240]. A major hurdle for xylose utilization was its uptake by C. glutamicum: the responsible transporters remain unknown and limit productivity, particularly at low xylose concentrations in the medium [246]. Xylose utilization has accordingly been improved by heterologous expression of the xylose transporter gene xylE from E. coli [256]. Alternatively, expression of araE, encoding an arabinose transporter, from E. coli [268] and Bacillus subtilis [247] supported xylose uptake in C. glutamicum. A systematic investigation of heterologous xylA expression from different donor species (Xanthomonas campestris, B. subtilis, E. coli, Mycobacterium smegmatis) identified the X. campestris gene as the

Table 12.1 Metabolic engineering of Corynebacterium glutamicum for the utilization of hemicellulosic biomass and sugars derived thereof.

Product

1,5-Diaminopentane

Substrate

Substrate engineering strategy

Titer (g l−1 )

Yield (g g−1 )

Productivity (g l−1 h−1 )

References

Xylose

Overexpression of xylAB from E. coli

1.4

0.11

0.05

[246]

Hemicellulose hydrolysate

Overexpression of xylAB from E. coli

2.0

0.17a)

0.07

[246]

Xylose

Overexpression of xylAB from E. coli, overexpression of tkt-operon

103.0

0.22

1.37

[14]

XOS

Overexpression of xylAB from E. coli, cell surface display of β-xylosidase from B. subtilis

1.2

0.08a)

0.01a)

[253]

1,4-Diaminobutane

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

1.3

0.09a)

0.03

[248]

l-Lysine

XOS

Overexpression of xylAB from E. coli, cell surface display of β-xylosidase from B. subtilis

1.8

0.12a)

0.02a)

[253]

l-Glutamate

Arabinose

Overexpression of araBAD from E. coli

8.0

0.11



[255]

Xylan

Overexpression of xylAB from E. coli, xylE from E. coli, secretory system for endoxylanase (XlnA) from Streptomyces coelicolor and xylosidase (XynB) from Bacillus pumilus

1.1

0.06

0.01

[256]

Hemicellulose hydrolysate

Overexpression of xylAB and araBAD from E. coli

6.1

0.25

0.085

[257]

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

1.7a)

0.11a)

0.04

[248]

Hemicellulose hydrolysate

Overexpression of xylA from X. campestris, xylB from C. glutamicum

6.9

0.14a)

0.14

[248]

Arabinose

Overexpression of araBAD from E. coli

5.4

0.07



[255]

Hemicellulose hydrolysate

Overexpression of xylAB and araBAD from E. coli

13.7

0.34a)

0.14a)

[257] (continued)

Table 12.1 (Continued)

Product

l-Ornithine

Substrate

Substrate engineering strategy

Titer (g l−1 )

Yield (g g−1 )

Productivity (g l−1 h−1 )

References

0.03

[248]

0.11

[248]

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

2.0

0.13a)

Hemicellulose hydrolysate

Overexpression of xylA from X. campestris, xylB from C. glutamicum

5.4

0.11a)

Arabinose

Overexpression of araBAD from E. coli

11.7

0.31



[255]

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

2.6

0.17a)

0.04

[248] [69]

Xylose

Overexpression of xylAB from X. campestris

18.9

0.44

0.26

l-Arginine

Arabinose

Overexpression of araBAD from E. coli

4.3

0.30



[255]

Sarcosine

Xylose + acetate

Overexpression of xylA from X. campestris, xylB from C. glutamicum

2.7

0.08

0.04

[258]

Arabinose + acetate

Overexpression of araBAD from E. coli

3.4

0.11

0.05a)

[258]

γ-Aminobutyrate

Empty fruit bunch

Overexpression of xylAB from E. coli

35.7

0.36a)

0.67a)

[259]

Ectoine

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

0.4

0.04



[260]

Arabinose

Overexpression of araBAD from E. coli

0.4

0.04



[260]

l-Pipecolic acid

Xylose

Overexpression of xylA from X. campestris, xylB from C. glutamicum

0.5

0.05



[261]

Succinate

Lignocellulose hydrolysate

Overexpression of xylAB from X. campestris

98.6

0.87

4.29

[247]

Corn cob hydrolysate

Overexpression of xylAB from E. coli

40.8

0.69

0.85

[262]

Xylose

Overexpression of xylAB from E. coli

6.0a)

0.40a)

2.00

[240]

Arabinose

Overexpression of araBAD from E. coli

8.7

0.45

0.10a)

[263]

Mannose + glucose

Overexpression of manA from C. glutamicum, deletion of ptsF

11.8

0.21a)

0.98a)

[245]

Xylose

Overexpression of xylAB from E. coli

7.8a)

0.50a)

2.60

[240]

Mannose + glucose

Overexpression of manA from C. glutamicum, deletion of ptsF

35.6

0.66a)

2.97a)

[245]

Isobutanol

Hemicellulose hydrolysate

Overexpression of xylA from X. campestris, xylB from C. glutamicum, araBAD from E. coli

0.5

0.20a)

0.01

[264]

Xylitol

Hemicellulose hydrolysate

Overexpression of xylose reductase (xr) from Rhodotorula mucilaginosa, araA from E. coli, d-psicose 3-epimerase (dpe) from Agrobacterium tumefaciens, l-xylulose reductase (lxr) from Mycobacterium smegmatis, araT from B. licheniformis

31.0

0.28

2.58a)

[265]

Xylonate

Xylose b) + glucose

Overexpression of xylose dehydrogenase (xdh) from Caulobacter crescentus, xylE from E. coli

20.7

1.02

1.04

[266]

Xylan

Overexpression of xylose dehydrogenase (xdh) from Caulobacter crescentus, xylE from E. coli, secretory system for endoxylanase (XlnA) from Streptomyces coelicolor and xylosidase (XynB) from Bacillus pumilus

6.2

0.31

0.12a)

[266]

Lactate

XOS, xylo-oligosaccharides. a) Estimated from reference. b) Xylose was used only as a biotransformation substrate.
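The text of Section 12.3.2.1 quotes some DAP yields on a molar basis (165 mmol per mol xylose for the first xylose-utilizing strain [246], and 32% for the optimized strain DAP-Xyl2 [14]), whereas Table 12.1 lists mass yields in g g−1. The short Python sketch below shows the conversion, which only requires the molar masses of product and substrate. The helper names are illustrative, and the molar masses are computed here from standard atomic weights rather than taken from the cited studies.

```python
# Standard atomic weights (g/mol) of the elements needed here
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}

XYLOSE = {"C": 5, "H": 10, "O": 5}  # C5H10O5
DAP = {"C": 5, "H": 14, "N": 2}     # 1,5-diaminopentane, C5H14N2


def molar_mass(formula: dict[str, int]) -> float:
    """Molar mass in g/mol from an element -> atom count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def mass_yield(mol_per_mol: float, product: dict[str, int], substrate: dict[str, int]) -> float:
    """Convert a molar yield (mol product per mol substrate) into g product per g substrate."""
    return mol_per_mol * molar_mass(product) / molar_mass(substrate)


def molar_yield(g_per_g: float, product: dict[str, int], substrate: dict[str, int]) -> float:
    """Inverse conversion: g/g back to mol/mol."""
    return g_per_g * molar_mass(substrate) / molar_mass(product)


if __name__ == "__main__":
    print(round(mass_yield(0.165, DAP, XYLOSE), 2))  # 0.11 (strain with xylAB, [246])
    print(round(mass_yield(0.32, DAP, XYLOSE), 2))   # 0.22 (DAP-Xyl2, [14])
```

Both converted values agree with the corresponding Table 12.1 entries (0.11 and 0.22 g g−1), so the molar figures in the text and the mass yields in the table describe the same performance.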


Figure 12.1 Metabolic engineering of Corynebacterium glutamicum for the utilization of hemicellulosic biomass. The figure summarizes efforts from different studies to enable utilization of mannose (illustrated in red) [245], xylose and xylan (orange) [14, 240, 246–253], and arabinose and arabinoxylan (light orange) [155, 239, 254]. Modifications of endogenous pathways for improved substrate uptake and metabolization are indicated in turquoise and green. Abf51B, arabinofuranosidase; AraA, arabinose isomerase; AraB, ribulokinase; AraD, ribulose 5-phosphate 4-epimerase; AraE, arabinose importer; AXOS, arabino-xylo-oligosaccharides; BSU17580, extracellular β-xylosidase; Fbp, fructose 1,6-bisphosphatase; Icd, isocitrate dehydrogenase; IolT, myoinositol transporter; IolR, repressor of IolT; KDY, 2-keto-3-deoxydehydratase; KSH, 2-ketoglutarate semialdehyde dehydrogenase; ManA, mannose 6-phosphate isomerase; Pgl, phosphogluconolactonase; PtsF, fructose-specific subunit of phosphotransferase system; Tal, transaldolase; Tkt, transketolase; XDY, xylonate dehydratase; XIDH, xylose dehydrogenase; XLS, xylonolactonase; XOS, xylo-oligosaccharides; XylA, xylose isomerase; XylB, xylulokinase; XylD, intracellular β-xylosidase; XylE, xylose transporter; XylEFG, xylo-oligosaccharide ABC-transporter; XylI, secreted xylanase; Zwf, glucose 6-phosphate dehydrogenase. Sources: Sasaki et al. [245], Kawaguchi et al. [240], Buschke et al. [246], Mao et al. [247], Meiswinkel et al. [248], Radek et al. [249], Radek et al. [250], Brüsseler et al. [251], Watanabe et al. [252], Imao et al. [253], Kawaguchi et al. [155], Kawaguchi et al. [239], and Kuge et al. [254].

most favorable gene candidate [248]. Combining its expression with additional overexpression of the native xylB gene resulted in an engineered strain with a growth rate of 0.20 h−1 [248]. This gene combination improved the production of l-glutamate, l-lysine, l-ornithine, and putrescine from xylose [248]. A similar result was obtained by overexpressing the complete xylAB operon from X. campestris, E. coli, Paenibacillus polymyxa SC2, or Streptomyces coelicolor [247]. X. campestris emerged as the best donor for xylose consumption under both aerobic growth and anaerobic succinate production conditions, likely owing to the similar codon usage of X. campestris and C. glutamicum [247]. In a two-stage process, comprising aerobic biomass formation and anaerobic succinate production, strain CGS1 accumulated succinate at a high titer (28 g l−1) and yield (0.93 g g−1) [247]. Further improvement in xylose utilization was achieved by overexpression of the native transaldolase and transketolase genes tal and tkt and heterologous expression of araE from B. subtilis (araEBS) [247]. The resulting strain CGS5 produced 100 g l−1 succinate from glucose/xylose mixtures and 98 g l−1 succinate from a corn stalk hydrolysate [247]. Furthermore, overexpression of xylAB from E. coli enabled production of the biobased monomer diaminopentane (DAP) from xylose and glucose/xylose mixtures [246]. Plasmid-based expression of these genes under control of the groEL promoter in C. glutamicum DAP-3C [39] allowed DAP production at a yield of 165 mmol (mol xylose)−1 [246]. DAP production from hemicellulose was demonstrated in a cascaded process. In a first step, oat spelt hemicellulose was hydrolyzed enzymatically. In a second step, the hydrolysate served as carbon source for the engineered producer C. glutamicum DAP-Xyl1 [246], which efficiently converted xylose, glucose, acetate, and citrate, all present in the hemicellulose hydrolysate, into DAP. A combined approach of in silico modeling, 13C metabolic flux analysis, and transcriptomics then revealed novel engineering targets in the pentose phosphate (PP) pathway and the TCA cycle [14]. Downregulation of icd (encoding isocitrate dehydrogenase), overexpression of fbp (encoding fructose 1,6-bisphosphatase), overexpression of the tkt operon (comprising the genes zwf, encoding glucose 6-phosphate dehydrogenase; pgl, encoding phosphogluconolactonase; tkt, encoding transketolase; and tal, encoding transaldolase), and elimination of byproduct formation by deletion of lysE (encoding the l-lysine exporter) and NCgl1469 (encoding DAP acetyltransferase) improved the DAP yield from xylose by 54% [14]. In a fed-batch process, the optimized strain C. glutamicum DAP-Xyl2 produced 103 g l−1 DAP with a molar yield of 32% [14]. In addition, the Weimberg (WMB) pathway was reconstituted in C. glutamicum to enable xylose utilization, using the xylXABCD operon from Caulobacter crescentus [249].
The operon encodes xylose dehydrogenase, xylonolactonase, xylonate dehydratase, 2-keto-3-deoxy-dehydratase, and 2-ketoglutarate semialdehyde dehydrogenase. Through adaptive evolution, the initially poor growth (μ = 0.07 h−1) was substantially improved (μ = 0.26 h−1) [250], owing to a point mutation in IolR, a repressor of the glucose/myoinositol transporter IolT1 [251]. Furthermore, xylose was used as a biotransformation substrate for the production of ethylene glycol and glycolate in engineered C. glutamicum [221, 269]. The production process relied on a synthetic ribulose 1-phosphate pathway that recruited the genes xylA (from E. coli), dte (d-tagatose 3-epimerase from Pseudomonas cichorii), fucK (l-fuculokinase from E. coli), and fucA (l-fuculose phosphate aldolase from E. coli). The pathway intermediate glycolaldehyde was then either oxidized to glycolate by aldehyde dehydrogenase, encoded by aldA from E. coli, or reduced to ethylene glycol by aldehyde reductase, encoded by yqhD from E. coli [269]. In addition, xylB and iolR were deleted to improve production efficiency [269]. The resulting strains produced (i) 5.8 g l−1 ethylene glycol with a yield of 0.31 g g−1 and (ii) 10.1 g l−1 glycolate with a yield of 0.51 g g−1 in shake flasks. In a fed-batch experiment, the glycolate titer was increased to 24.1 g l−1 [269].

12.3.2.2 Arabinose

Next to xylose, arabinose can make up to 28% of the pentose sugars, particularly in crop residues [267]. The arabinose pathway comprises three enzymatic steps, catalyzed by arabinose isomerase (AraA), ribulokinase (AraB), and ribulose 5-phosphate 4-epimerase (AraD). Interestingly, the ability to utilize arabinose differs between C. glutamicum strains. The genomes of C. glutamicum ATCC 13032 and C. glutamicum R do not contain an araBAD operon, whereas C. glutamicum ATCC 31831 has a fully functional arabinose pathway [270]. At low concentration, arabinose utilization in the latter strain is subject to carbon catabolite repression. Metabolome studies identified phosphofructokinase and pyruvate kinase as major bottlenecks limiting arabinose utilization [155]. Pyruvate kinase overexpression was then combined with deletion of the arabinose repressor gene araR to enhance arabinose uptake. The resulting strain consumed glucose and arabinose simultaneously at equally high rates [155]. Other strains (ATCC 13032 and R) were engineered into arabinose metabolizers by heterologous expression of the araBAD operon from E. coli [239, 255] (Figure 12.1). This strategy enabled the production of organic acids [239, 263], lactate [239], acetate [239], amino acids [255], isobutanol [264], and ectoine [260]. Additional expression of araE substantially improved arabinose consumption and stimulated simultaneous consumption of arabinose and glucose [271].

12.3.2.3 Mannose

Mannan, a homopolymer of mannose, is the major constituent of softwood hemicellulose [272]. Mannose is a natural carbon source of C. glutamicum. Its catabolism involves mannose 6-phosphate isomerase (PMI), encoded by the manA gene [245]. Mannose uptake is mediated by the phosphotransferase system (PTS), with both the glucose-PTS and the fructose-PTS contributing [245]. Although C. glutamicum utilizes mannose as a carbon and energy source, growth on this hexose is poor owing to low PMI activity [245]. Enhanced expression of manA (under control of the tac promoter) overcame the growth impairment. Additional overexpression of ptsF enabled efficient growth on mannose and simultaneous consumption of mannose and glucose [245] (Figure 12.1). A shift to oxygen-deprived conditions initiated the production of succinate (12 g l−1), lactate (35 g l−1), and acetate (2 g l−1) from mannose and glucose mixtures [245]. Mannose was also used to produce 86 mg l−1 GDP-L-fucose. Genetic engineering comprised overexpression of the endogenous genes manB and manC (encoding phosphomannomutase and mannose 1-phosphate guanylyltransferase, respectively) and heterologous expression of the E. coli genes gmd and wcaG (encoding GDP-D-mannose 4,6-dehydratase and GDP-4-keto-6-deoxy-D-mannose 3,5-epimerase-4-reductase, respectively) [273].

12.3.2.4 Oligosaccharides

Xylo-oligosaccharides (XOSs), such as xylobiose, xylotriose, and xylotetraose, regularly occur during hemicellulose hydrolysis, so their utilization is desirable for hemicellulose biorefineries. Initial efforts toward direct fermentation of XOSs used heterologous expression of the xylanase-encoding gene xysA from Streptomyces halstedii JM8 in Brevibacterium lactofermentum, a close relative of C. glutamicum [274, 275]. Utilization of XOSs was first established in a xylose-consuming C. glutamicum strain by expression of an intracellular β-xylosidase (XylD) and an XOS ABC transporter (XylEFG) from Corynebacterium alkanolyticum [252] (Figure 12.1). An alternative strategy toward XOS fermentation employed cell surface display of the extracellular β-xylosidase BSU17580, derived from B. subtilis [253], with the porin PorH as an anchor [276]. Xylobiose and xylotriose supplied to the medium were cleaved by the β-xylosidase and used for growth and production of L-lysine and DAP [253]. In addition, C. glutamicum was engineered to degrade arabino-xylo-oligosaccharides (AXOSs), which stem from the hydrolysis of arabinoxylan [254]. The chassis strain C. glutamicum XA assimilated xylose and arabinose owing to incorporation of the clusters xylAB and araBAD from E. coli and the xylEFGD operon from C. alkanolyticum [254]. To enable AXOS degradation, abf51b, an arabinofuranosidase-encoding gene from C. alkanolyticum, was additionally expressed. Expression of the xylI gene from C. alkanolyticum, encoding a secreted xylanase, allowed arabinoxylan to be utilized directly as a carbon source, because XylI released xylose, xylobiose, xylotriose, and arabinoxylobiose from arabinoxylan [254].

12.3.3 Aquatic Sugars

12.3.3.1 Mannitol

Algal biomass is a highly promising sustainable and renewable feedstock [277, 278]. Marine macroalgae (seaweed) benefit from fast growth; land-independent cultivation without the need for fertilizers, pesticides, and fresh water; high carbohydrate content; and simple, low-energy processing for sugar release [278–281]. The sugar alcohol mannitol is a major constituent of macroalgae and accounts for up to 30% of their dry matter [111]. Assuming an estimated seaweed production of 500 million tons by 2050 [282], mannitol could provide a significant carbon source for future biorefineries. Although C. glutamicum encodes a mannitol degradation pathway, the wild type does not consume mannitol, because the pathway is repressed by MtlR/AtlR [283]. Interestingly, this repression is not relieved by mannitol itself, but only by the related sugar alcohol arabitol, which is also degraded via this pathway [284]. Upon deletion of the mtlR gene, C. glutamicum grew on mannitol as the sole carbon source, involving mannitol uptake by the transporter MtlT, conversion into fructose by mannitol dehydrogenase (MtlD) [283], and phosphorylation of fructose into fructose 1-phosphate by the fructose-specific PTS [111, 283]. Mannitol utilization for bioproduction was only recently established in C. glutamicum [111] (Figure 12.2). After deletion of mtlR in the L-lysine-hyperproducing strain LYS-12 [19], the derived strain SEA-1 used mannitol for growth and L-lysine production. Limitations in the mannitol-utilizing pathway became evident from the secretion of substantial amounts of fructose. To overcome this bottleneck, two fructokinase genes, scrK from Clostridium acetobutylicum and mak from E. coli, were evaluated, and the E. coli enzyme performed better [111]. 13C metabolic flux analysis further revealed that insufficient NADPH supply limited L-lysine production. During growth on mannitol, C. glutamicum exhibited only a very low flux through the PP pathway [111], which is the major NADPH source for L-lysine production [103, 159, 202, 285]. Expression of the NADP-dependent gapN gene from Streptococcus mutans then yielded strain SEA-3, which gained extra NADPH through the glycolytic chain (Figure 12.2). SEA-3 produced L-lysine from mannitol at a yield of 0.24 mol mol−1 and turned out to be as efficient as the glucose-based parent strain LYS-12 [19]. Compared to the first-generation strain SEA-1, the engineered substrate assimilation and redox supply improved the L-lysine yield in SEA-3 by 60% [111].

Figure 12.2 Systems-wide metabolic engineering of Corynebacterium glutamicum SEA-3 for the production of L-lysine from mannitol [111]. Stepwise improvement was achieved by engineering of the biosynthesis (orange), carbon precursor supply (purple), NADPH supply (turquoise), competing pathways (red), and substrate assimilation (yellow). dapB, dihydrodipicolinate reductase; ddh, diaminopimelate dehydrogenase; fbp, fructose 1,6-bisphosphatase; gapN, NADP-dependent glyceraldehyde 3-phosphate dehydrogenase from S. mutans; homV59A, homoserine dehydrogenase with amino acid exchange valine → alanine at position 59; icdA1G, isocitrate dehydrogenase with start codon exchange ATG → GTG; lysA, diaminopimelate decarboxylase; lysCT311I, aspartokinase with amino acid exchange threonine → isoleucine at position 311; mak, fructokinase from E. coli; mtlR, mannitol repressor; pck, phosphoenolpyruvate carboxykinase; Psod, promoter of the sod gene, encoding superoxide dismutase; Ptuf, promoter of the tuf gene, encoding elongation factor Tu; pycP458S, pyruvate carboxylase with amino acid exchange proline → serine at position 458; tktop, expression unit comprising the genes tkt (transketolase), tal (transaldolase), zwf (glucose 6-phosphate dehydrogenase), and pgl (phosphogluconolactonase). Source: Based on Hoffmann et al. [111].

12.3.4 Valorization of Lignin Aromatics

Lignin is a complex polymeric mesh of aromatic compounds that constitutes the structural skeleton of terrestrial plants. The current amount of lignin in the biosphere exceeds 300 billion tons and increases annually by approximately 20 billion tons, making lignin one of the most abundant renewable feedstocks on Earth [76, 286]. At present, lignin is highly underutilized, because the massive amounts of technical lignin that accumulate every year as a waste stream of the wood-processing industry are simply burned [76, 287]. Valorization of lignin is therefore highly attractive. Owing to its robustness and its potential to degrade aromatic compounds, C. glutamicum has been considered a candidate for lignin-based fermentation, alongside Pseudomonas putida [288, 289], Amycolatopsis spp. [290], and Sphingomonas paucimobilis [291, 292]. The capability of C. glutamicum to utilize aromatic compounds is impressive. The microbe grows on the following aromatic compounds: gentisate, protocatechuate, vanillate, vanillin, ferulate, 3-hydroxybenzoate, benzoate, 4-hydroxybenzoate, 2,4-dihydroxybenzoate, 3,5-dihydroxytoluene, phenol, 4-cresol, resorcinol, benzyl alcohol, and naphthalene [77]. This has enabled different valorization concepts for lignin-based aromatics using C. glutamicum, as illustrated in Figure 12.3.

12.3.4.1 Catechol, Phenol, and Benzoate

The production of cis,cis-muconic acid (MA) from small aromatics is a milestone toward a future lignin industry [76]. In a pioneering study, MA production in C. glutamicum was established by deletion of the catB gene, which encodes MA cycloisomerase, the enzyme that converts MA further in aromatic catabolism via the β-ketoadipate pathway. The resulting strain C. glutamicum MA-1 efficiently converted the aromatic compounds benzoate, catechol, and phenol into MA at a molar yield of 100% [293]. Notably, the aromatic substrate had a substantial impact on productivity. Determination of the activity of the MA-forming enzyme catechol 1,2-dioxygenase, encoded by catA, revealed strong induction when benzoate was used. During MA production from catechol and phenol, however, the enzyme was hardly expressed [293]. To uncouple catA expression from induction by benzoate, the native promoter was replaced by the strong constitutive tuf promoter. Using catechol as substrate, the resulting strain MA-2 exhibited an almost 30-fold higher specific MA production rate than MA-1 [293]. In a fed-batch fermentation, MA-2 produced 85 g l−1 MA with a maximal productivity of 2.4 g l−1 h−1. In addition, MA production (1.8 g l−1) was achieved by direct fermentation of a lignin hydrolysate obtained by thermochemical depolymerization [293].

Figure 12.3 Metabolic engineering of Corynebacterium glutamicum for valorization of lignin-based aromatic compounds through biotransformation for the production of muconate (yellow) [293], protocatechuate (purple) [84], and polyphenols (turquoise) [79, 80]. 4cl, 4-coumarate:CoA ligase; catA, catechol 1,2-dioxygenase; catB, muconate cycloisomerase; cg0344–cg0347, catabolic phenylpropanoid operon (phdBCDE); cg0502 (qsuB), phosphate isomerase/epimerase; cg1226 (pobA), 4-hydroxybenzoate 3-monooxygenase; cg2625–cg2640, catabolic benzoate (ben), catechol (cat), and protocatechuate (pca) operon; chi, chalcone isomerase; chs, chalcone synthase; f3h, flavanone 3-hydroxylase; fls, flavonol synthase; tuf, elongation factor Tu; sts, stilbene synthase. Sources: Becker et al. [293], Silberbach and Burkovski [115], Okai et al. [84], Kallscheuer et al. [79], and Kallscheuer et al. [80].

12.3.4.2 Ferulate, Caffeate, Cinnamate, and p-Coumarate

C. glutamicum was also used for the production of protocatechuate (PCA) from ferulate (FA). The host strain ATCC 21420 was modified by heterologous expression of the vanAB genes from Corynebacterium efficiens, which encode vanillate O-demethylase, the enzyme catalyzing the last step in the conversion of FA into PCA. The strain produced 1.1 g l−1 PCA with a yield of 0.45 g g−1 within 12 hours [84]. In addition, the aromatic compounds caffeate, cinnamate, and p-coumarate were used for the production of different polyphenols [79, 80]. In a first step, a chassis strain was constructed by deletion of four gene clusters responsible for aromatic catabolism in C. glutamicum. Then, codon-optimized genes encoding chalcone synthase (CHS) and chalcone isomerase (CHI) from petunia were expressed to establish production of naringenin from p-coumarate and eriodictyol from caffeate [80]. Production of the stilbenes pinosylvin (from cinnamate), resveratrol (from p-coumarate), and piceatannol (from caffeate) was then achieved by expression of the 4-coumarate:CoA ligase (4CL) gene from parsley and the stilbene synthase (STS) gene from peanut [80]. Additional expression of an O-methyltransferase from grapevine in resveratrol-producing C. glutamicum allowed the production of pterostilbene (PSB). The success of PSB production relied on fusion of the O-methyltransferase to the E. coli maltose-binding protein to increase protein solubility [79]. The production of flavanonols and flavonols was achieved by heterologous expression of flavanone 3-hydroxylase from petunia and flavonol synthase from eastern cottonwood [79].

12.4 Industrial Products

12.4.1 Amino Acids

C. glutamicum is the world's flagship organism for industrial amino acid production [2, 25, 30]. Only a few years after its discovery [294], production plants for L-glutamate and L-lysine were already operating [295, 296]. Using random mutagenesis, novel strains were soon created to produce other amino acids such as L-tryptophan [297], L-phenylalanine [298], L-tyrosine [298], L-ornithine [299, 300], and L-arginine [301]. Since then, researchers have continuously improved the efficiency of producing these traditional products. Today, the production of L-glutamate and L-lysine [2, 25, 158, 203, 302] is still a major business using C. glutamicum [1], as outlined below. Other proteinogenic amino acids produced by the microbe include L-isoleucine [59, 303–310], L-valine [65, 311–325], L-methionine [21, 33, 168, 326, 327], and L-tryptophan [328, 329], as reported in a series of compelling studies. In addition, streamlined cell factories have provided various other (nonproteinogenic) amino acids and their derivatives [1, 4, 12, 32–34, 330–332], including halogenated variants [333, 334].

12.4.1.1 L-Glutamate

Umami is one of the basic tastes perceived by humans and is elicited by the amino acid L-glutamate. Umami seasoning was commercialized in the form of monosodium glutamate and launched in Japan in 1909 under the brand name AJI-NO-MOTO (“the essence of taste”). Starting from the first fermentative production processes in the 1950s and 1960s using strains of C. glutamicum [295, 296], the market for L-glutamate has reached 4 million tons per year and continues to grow at a compound annual growth rate (CAGR) of approximately 5%. L-Glutamate accounts for 50% of the revenue of the total amino acid market. Following initial strategies of random mutagenesis and selection, metabolic engineering approaches over the past 10 to 13 years have allowed a more fine-tuned improvement of metabolism. Several efforts focused on fueling the TCA cycle, which is continuously depleted during glutamate formation, via enhanced anaplerotic carboxylation [10, 11]. In addition, downregulation of the competing 2-oxoglutarate dehydrogenase complex (ODHC) emerged as a key point of control [12–14]. These strategies generated mutants that produced about 30 g l−1 L-glutamate but remained below the performance of classical strains, with reported titers of up to 120 g l−1 [335]. Although studied for such a long time, glutamate production in C. glutamicum still holds surprises. Recently, a novel glutamate exporter was discovered in an industrial strain [336]. The novel protein, designated MscCG2, shares only low amino acid sequence identity (23%) with the previously known exporter MscCG, has apparently evolved separately, and represents an interesting target for future strain optimization. A next level in performance was reached by using industrial strains from conventional mutagenesis as a starting point for metabolic engineering. In strain C. glutamicum S9114 [337], a genetically installed pathway control effectively triggered L-glutamate accumulation [338]. First, C-terminal truncation of the glutamate secretion channel protein MscCG (ΔC110) enabled L-glutamate secretion without the need for induction. Then, attenuation of the 2-oxoglutarate dehydrogenase complex (ODHC) via modification of the odhA ribosome binding site further elevated production performance. The resulting strain XW6 reached 65 g l−1 L-glutamate with an overall yield of 0.63 g (g glucose)−1 using a lignocellulosic hydrolysate from corn stover as the feedstock. Notably, lignocellulosic biomass contains elevated levels of biotin [339], which is known to inhibit L-glutamate secretion. Initial strategies demonstrated improved L-glutamate production upon biotin depletion during pretreatment (using biotin-binding proteins) and upon the addition of penicillin (to overcome the negative biotin-excess effect). Neither concept is applicable on a large scale, but both underline the impact of even trace nutrients in raw materials on fermentation performance.

12.4.1.2 L-Lysine

L-Lysine, mainly applied as a feed additive [2], has a world market of approximately 2.5 million tons per year [340]. Over the years, substantial effort has been made to improve strains for industrial application [2, 31, 296]. Recent research has addressed alternative feedstocks and demonstrated L-lysine production from starch [341–343], xylose [248], arabinose [255], cellobiose [344], glycerol [345], and mannitol [111]. In a seminal study, systems metabolic engineering upgraded the wild type into the L-lysine-hyperproducing strain C. glutamicum LYS-12, which produced 120 g l−1 L-lysine within 30 hours at a yield of 55%, a milestone for industrial production of this premium feed amino acid [19]. In addition, a set of well-elaborated metabolic engineering studies optimized specific features of C. glutamicum relevant for L-lysine production, such as redox and building block supply. To a large extent, these strategies were inferred from a detailed understanding of how different pathways contribute to carbon flux, obtained by sophisticated 13C metabolic flux analysis [3, 12–19]. The oxidative PP pathway emerged as the major route to supply NADPH, which is required in substantial amounts to form the product (4 NADPH per L-lysine). Superior L-lysine producers were then created by improving NADPH supply, involving amplification of the gluconeogenic enzyme fructose 1,6-bisphosphatase, overexpression of individual PP pathway enzymes [21], modification of kinetic properties of flux-controlling PP pathway enzymes [21, 22], and interruption of the competing Embden–Meyerhof–Parnas (EMP) pathway [23]. More recently, the creation of an NADPH-forming EMP pathway was shown to enhance the supply of this important cofactor. Different studies implemented an NADP-dependent glyceraldehyde 3-phosphate dehydrogenase from either Streptococcus mutans (GapN) [346] or Clostridium acetobutylicum (GapC) [347], or an engineered GapA enzyme variant from C. glutamicum [348]. Notably, the use of GapN even allowed L-lysine production independent of the PP pathway [349]. GapN was also key to enhancing L-lysine production from the third-generation substrate mannitol. The EMP pathway was also targeted at the level of pyruvate kinase [201, 350, 351], though with mixed outcomes. Regarding improved supply of the precursor oxaloacetate, anaplerotic reactions were investigated systematically [352], and pyruvate carboxylase was identified as the major bottleneck [353]. In addition, PEP carboxylase was engineered into a feedback-resistant variant to eliminate its undesired allosteric regulation [354]. Moreover, attenuation of the TCA cycle flux, which competes with L-lysine biosynthesis, was found beneficial, as shown for strains with downregulated isocitrate dehydrogenase [355], pyruvate dehydrogenase [312], and citrate synthase [156, 356]. A pathway-spanning concept even coupled the TCA cycle to L-lysine biosynthesis [158] and used the high cycle flux to drive product formation. Various studies have engineered the L-lysine pathway itself [357–359]. Recently, superior variants of aspartokinase, the control valve for the L-lysine flux [360], were created by protein engineering [361, 362], and other enzymes in the biosynthetic chain were optimized for cofactor use [63, 363]. In addition, the diaminopimelate node (a branch point between L-lysine and cell wall biosynthesis) was engineered: variants of UDP-N-acetylmuramoyl-L-alanyl-D-glutamate:meso-diaminopimelate ligase, encoded by murE, enabled increased L-lysine production [364]. Moreover, electrochemical concepts recently established anaerobic L-lysine production [365]. Using anodic electrofermentation with ferricyanide as an electron carrier, C. glutamicum produced 2.9 mM L-lysine under anaerobic conditions [366].

12.4.1.3 Aminovalerate

Aminovalerate (5-aminovalerate, AVA) occurs naturally as an intermediate of microbial L-lysine degradation. The nonproteinogenic amino acid is a building block for nylons and is of commercial value for sustainable bioplastics [367, 368]. AVA production in C. glutamicum was first established by genome-based overexpression of the davBA operon from P. putida in the L-lysine-overproducing strain LYS-12 [154] (Figure 12.4). The davBA operon encodes L-lysine monooxygenase and aminovaleramidase, which catalyze the two-step conversion of L-lysine into AVA. The genetic design used in this pioneering study comprised polycistronic expression of the native genes under control of the strong constitutive eftu promoter, with the open reading frames separated by a 20 bp ribosome binding site. Implementing the DavBA pathway was sufficient to enable AVA production in strain AVA-1 [154]. However, the process suffered from substantial formation of L-lysine and glutarate as undesired byproducts. Subsequent strain engineering eliminated these byproducts by deleting the L-lysine exporter gene lysE and gabT, which encodes AVA transaminase and thus causes AVA degradation into glutarate (Figure 12.4). The final strain AVA-3 produced 28 g l−1 AVA with a yield of 0.13 g g−1 and a maximum productivity of 0.9 g l−1 h−1 within 50 hours [154].
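Titer and process time together give the average volumetric productivity, which can be compared with the maximum productivity reported for a run. The Python sketch below does this for two processes described in this section. It assumes that the stated times (50 hours for AVA-3, 30 hours for LYS-12) cover the whole process; the function name is our own and hypothetical.

```python
def average_productivity(titer_g_per_l: float, time_h: float) -> float:
    """Average volumetric productivity (g l^-1 h^-1) over the whole run.

    Assumes the titer was reached from zero within time_h hours.
    """
    if time_h <= 0:
        raise ValueError("process time must be positive")
    return titer_g_per_l / time_h


# Titers (g/l) and process times (h) as stated in this section.
runs = {
    "AVA-3 (AVA)": (28.0, 50.0),
    "LYS-12 (L-lysine)": (120.0, 30.0),
}
for name, (titer, hours) in runs.items():
    print(f"{name}: {average_productivity(titer, hours):.2f} g/l/h")
```

Under this assumption, AVA-3 averages about 0.56 g l−1 h−1, below its reported maximum of 0.9 g l−1 h−1. This is the expected pattern when the production rate changes over the course of a fed-batch run.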

Figure 12.4 Systems-wide metabolic engineering of Corynebacterium glutamicum AVA-3 for the production of aminovalerate [154]. Stepwise improvement was achieved by engineering of the biosynthesis (orange), carbon precursor supply (purple), NADPH supply (turquoise), and competing pathways (red). dapB, dihydrodipicolinate reductase; davA, aminovaleramidase from P. putida; davB, L-lysine monooxygenase from P. putida; ddh, diaminopimelate dehydrogenase; fbp, fructose 1,6-bisphosphatase; gabT, aminovalerate transaminase; homV59A, homoserine dehydrogenase with amino acid exchange valine → alanine at position 59; icdA1G, isocitrate dehydrogenase with start codon exchange ATG → GTG; lysA, diaminopimelate decarboxylase; lysCT311I, aspartokinase with amino acid exchange threonine → isoleucine at position 311; pck, phosphoenolpyruvate carboxykinase; Psod, promoter of the sod gene, encoding superoxide dismutase; Ptuf, promoter of the tuf gene, encoding elongation factor Tu; pycP458S, pyruvate carboxylase with amino acid exchange proline → serine at position 458; tktop, expression unit comprising the genes tkt (transketolase), tal (transaldolase), zwf (glucose 6-phosphate dehydrogenase), and pgl (phosphogluconolactonase). Source: Based on Rohles et al. [154].

Moreover, the classically derived L-lysine producer C. glutamicum BE was modified to produce AVA [369]. Among the different variants tested for davBA expression, the highest production was achieved when codon-optimized davA, fused to an N-terminal His6-tag, was expressed as an operon together with davB under control of the synthetic H36 promoter from an episomal plasmid [369]. In a BE-based gabT deletion strain, this expression module allowed the production of 33 g l−1 AVA within 150 hours [369]. Another study focused on host selection. For this purpose, several L-lysine producers obtained by random mutagenesis were compared with regard to L-lysine production [370]. C. glutamicum KCTC 1857 was selected and modified by plasmid-based expression of native and codon-optimized variants of the davBA operon under control of the PH30 promoter. Interestingly, the native gene variants yielded the best production. In a fed-batch fermentation, AVA accumulated up to 40 g l−1 [370]. However, L-lysine was still secreted as a major product, reaching even higher levels (45 g l−1) than the desired product [370]. In addition, a cadaverine (1,5-diaminopentane)-based route for AVA production was established in C. glutamicum [371]. Like AVA, cadaverine is an intermediate of microbial L-lysine degradation [41, 46]. The compound does not occur naturally in C. glutamicum, but is formed after heterologous expression of either ldcC [39, 40] or cadA [42], both of which encode L-lysine decarboxylases. The combined expression of ldcC together with patA (encoding putrescine transaminase) and patD (encoding γ-aminobutyraldehyde dehydrogenase) enabled AVA production in wild-type C. glutamicum ATCC 13032 and its L-lysine-producing derivative GRLys1 [371]. The resulting mutant was additionally modified by deletion of sugR (encoding a DeoR-type transcriptional regulator) and ldhA (encoding lactate dehydrogenase) to promote glucose consumption. The deletion of snaA (encoding cadaverine N-acetyltransferase [152]), cgmA (encoding a putative cadaverine export system [15]), and gabTDP (encoding a glutarate pathway [154] and an AVA importer [372]) aimed to reduce byproduct formation. The final strain 5AVA3 produced 5 g l−1 AVA from glucose via the cadaverine route [371]. Moreover, the substrate spectrum was extended to alternative feedstocks [371].

12.4.1.4 Shinorine

Shinorine is a rare mycosporine-like amino acid (MAA) and belongs to a group of small secondary metabolites that are accumulated by a wide range of prokaryotes and eukaryotes in environments with high UV exposure [373]. MAAs are highly water-soluble and efficiently absorb UV light, which explains why they are commonly referred to as “nature’s sunscreen” [373]. These properties make shinorine attractive as an ingredient for the cosmetics industry [374]. Shinorine production in C. glutamicum was recently achieved by recruiting a gene cluster from Actinosynnema mirum DSM 43827. The biosynthetic shinorine pathway originates from the PP pathway intermediate sedoheptulose 7-phosphate (S7P) and comprises four reactions encoded by the genes amir4259, amir4258, amir4257, and amir4256 (Figure 12.5) [374, 375]. The initial step, catalyzed by demethyl 4-deoxygadusol (DDG) synthase, forms DDG from S7P. Subsequently, 4-deoxygadusol (4-DG) is formed by an O-methyltransferase. The addition of glycine by an ATP-grasp family protein yields mycosporine-glycine, which is then converted into shinorine by the incorporation of serine by a nonribosomal peptide synthetase (NRPS) homolog [374].

Expression of the gene cluster in C. glutamicum was based on the episomal plasmid pCH [342] under control of the promoter of gapA, which encodes glyceraldehyde 3-phosphate dehydrogenase [374]. To avoid potential medium acidification caused by lactate formation, the ldhA gene encoding lactate dehydrogenase was deleted. Moreover, the PP pathway was engineered to improve the supply of S7P as a precursor for shinorine using different strategies. First, S7P consumption by transaldolase was eliminated by deletion of the encoding gene tal, resulting in fivefold improved production. In the next step, increased formation of S7P was attempted by overexpression of (i) the tkt (transketolase) operon, (ii) rpe (ribulose 5-phosphate epimerase), (iii) rpi (ribose 5-phosphate isomerase), and (iv) gnd (6-phosphogluconate dehydrogenase). Only the last option increased shinorine production [374]. The optimized strain YTK827 produced 19.1 mg l−1 shinorine from gluconic acid as a substrate [374].

Figure 12.5 Systems metabolic engineering of Corynebacterium glutamicum for the production of shinorine [374]. Stepwise improvement was achieved by engineering of the biosynthesis (orange), carbon precursor supply (purple), and competing pathways (red). amir4256, nonribosomal peptide synthetase (NRPS) homolog; amir4257, ATP-grasp family protein; amir4258, O-methyltransferase; amir4259, demethyl 4-deoxygadusol (DDG) synthase; gap, glyceraldehyde 3-phosphate dehydrogenase; gnd, 6-phosphogluconate dehydrogenase; ldh, lactate dehydrogenase; tal, transaldolase; tuf, elongation factor Tu. Sources: Tsuge et al. [374] and Miyamoto et al. [375].

12.4.1.5 Ectoine

The cyclic amino acid ectoine is a natural extremolyte. It is formed by extremophilic microbes to protect themselves from environmental stress [376, 377]. Because of its kosmotropic properties, ectoine protects proteins, cell membranes, and human epithelia from allergens, UV light, heat, and dryness, and is consequently applied in lotions, sprays, and ointments [378–381]. Its current industrial production relies on halophilic bacteria cultivated at high salt levels, which causes corrosion of equipment and instrumentation and extra costs in wastewater handling. Heterologous hosts are therefore regarded as attractive, because they promise ectoine production without the need for high salt. Biochemically, ectoine biosynthesis starts from aspartate semialdehyde, an intermediate of the L-lysine pathway. Owing to its remarkable ability to produce L-lysine, C. glutamicum was considered a useful chassis for heterologous production. Indeed, genome-based expression of the Pseudomonas stutzeri gene cluster ectABCD, encoding 2,4-diaminobutyrate acetyltransferase, L-2,4-diaminobutyrate transaminase, ectoine synthase, and ectoine hydroxylase, in a basic L-lysine producer was sufficient to produce and secrete ectoine and its derivative hydroxyectoine [382]. All genes were transcribed polycistronically under control of the eftu promoter, and each gene was equipped with an individual RBS for optimal initiation of translation [382]. Residual L-lysine secretion in the basic strain ECT-1 was eliminated by lysE deletion. In a fed-batch experiment, the resulting strain ECT-2 accumulated 4.5 g l−1 ectoine with a molar yield of 30% [382]. To further improve production, the genetic design of the ectoine cluster was modified. First, ectD was omitted so that pure ectoine, without hydroxyectoine, was formed. Second, the original polycistronic design was replaced by separate monocistronic modules for ectA, ectB, and ectC [71].
To this end, a plasmid library with 185 193 possible variants was created by randomly combining 19 synthetic promoters and 3 linker elements to fine-tune and balance the expression levels in the cell [71]. Screening of more than 400 mutants led to the discovery of several high-titer producers. A closer inspection of the genetic composition and of the protein levels of EctA, EctB, and EctC in the different mutants revealed that high production efficiency was supported by a specifically balanced ectoine pathway: high flux was achieved when the protein level of EctB was significantly higher than that of EctA and when the total amount of all ectoine enzymes was low [71]. During fed-batch fermentation on a molasses-based medium, the best ectoine producer, C. glutamicum ectABCopt, produced 65 g l−1 ectoine with a volumetric productivity of 1.2 g l−1 h−1 and a specific productivity of 120 mg g−1 h−1, setting a benchmark for this high-value active ingredient [71].
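The library size quoted above follows from simple combinatorics, assuming (as the numbers suggest, although the text does not spell it out) that each of the three genes ectA, ectB, and ectC received its own, independently chosen promoter and linker: 19 × 3 = 57 expression variants per gene and 57³ = 185 193 combinations in total. The short Python sketch below reproduces this count and estimates how small a fraction of the design space a screen of about 400 clones samples.

```python
# Design space of the monocistronic ectABC library.
# Assumption: every gene gets one of 19 promoters and one of 3 linkers,
# chosen independently of the other genes.
promoters = 19
linkers = 3
genes = ["ectA", "ectB", "ectC"]

options_per_gene = promoters * linkers          # expression variants per gene
library_size = options_per_gene ** len(genes)   # combinations across all genes

screened = 400                                  # "more than 400 mutants" (lower bound)
coverage = screened / library_size              # fraction of the design space sampled

print(f"options per gene: {options_per_gene}")
print(f"library size: {library_size}")
print(f"screened fraction: {coverage:.4%}")
```

On these assumptions, the screen sampled only about 0.2% of all combinations, so the design rule (high EctB relative to EctA, low total enzyme amount) was inferred from the protein levels of the best hits rather than from exhaustive testing.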


12 Metabolic Engineering of Corynebacterium glutamicum

Another study recruited the ectoine pathway from Chromohalobacter salexigens and expressed it in the l-lysine-producing strain C. glutamicum DM1729 [260]. The strain was additionally modified by deletion of sugR and ldhA to accelerate glucose metabolism [260]. It finally produced 22 g l−1 ectoine with a productivity of 0.32 g l−1 h−1 and a yield of 0.16 g g−1 in a fed-batch process, but still secreted l-lysine (12 g l−1). Although l-lysine secretion could in principle be eliminated by deletion of lysE, this modification was detrimental to growth and ectoine production [260].

12.4.1.6 L-Pipecolic Acid

l-Pipecolic acid (l-piperidine-2-carboxylic acid, l-PA) receives attention as a chiral building block for therapeutics and as a compatible solute [383]. l-PA can be obtained from l-lysine by a two-step biotransformation [383, 384]. In addition, de novo synthesis has been demonstrated in C. glutamicum. The microbe cannot produce l-PA naturally, but the accessibility of l-PA from l-lysine made well-established l-lysine-overproducing C. glutamicum strains promising chassis [72, 261]. The pathway implemented in C. glutamicum converts l-lysine into l-PA via oxidative deamination, dehydration, and reduction, involving l-lysine 6-dehydrogenase (deaminating) from Silicibacter pomeroyi and the native pyrroline-5-carboxylate reductase. The initial mutant formed l-PA as desired but excreted l-lysine as a side product. This was eliminated by deleting the l-lysine exporter; however, the deletion reduced growth because of high intracellular l-lysine levels. Increased expression of the l-PA pathway partly restored growth. In a glucose/sucrose-based fed-batch process, strain PIPE4 produced 14.4 g l−1 l-PA with a volumetric productivity of 0.21 g l−1 h−1 and an overall yield of 0.20 g g−1 [58]. Moreover, l-PA production was demonstrated from glycerol, xylose, glucosamine, and starch as alternative carbon sources.

12.4.1.7 Trans-4-hydroxyproline

Trans-4-hydroxyproline (4-HYP) is one of the major constituents of mammalian collagen and is considered a chiral building block for cosmetics and for pharmaceuticals such as anti-inflammatory and antimycotic agents [26, 385, 386]. Its production mainly relies on acid hydrolysis of animal collagen, as microbial fermentation processes are not yet competitive [73, 386]. However, recent efforts have substantially improved fermentative production and promise industrially competitive processes after further rounds of optimization. Biochemically, 4-HYP is obtained from l-proline by enantioselective l-proline trans-4-hydroxylases (P4Hs), which occur in various eukaryotic and prokaryotic species [385]. In C. glutamicum, 4-HYP production was first achieved by heterologous expression of putative p4h genes from Pseudomonas stutzeri (p4hP), Bordetella bronchiseptica (p4hB), and Dactylosporangium sp. (p4hD) from episomal plasmids under control of the Ptrc promoter [385]. Several C. glutamicum strains were tested as recombinant hosts, including the wild type ATCC 13032 and l-proline overproducers. The best production (0.11 g l−1) was achieved when p4hD was expressed in an l-proline overproducer [385]. The TCA cycle succinyl-CoA synthetase gene

12.4 Industrial Products

sucCD was then deleted to redirect carbon flux from the α-ketoglutarate pool toward 4-HYP. In addition, expression of proB* (proB with a G446A point mutation), which encodes a mutant variant of γ-glutamyl kinase, was fine-tuned and combined with optimized overexpression of p4hD [73]. The latter was achieved with a modified ribosome binding site designed by computational prediction [387, 388]. In a fed-batch experiment, the optimized producer Hyp-7 accumulated 22 g l−1 4-HYP within 60 hours with a molar yield of 27% [73].

12.4.1.8 L-Theanine

l-Theanine is a nonprotein amino acid found in tea plants and offers commercial potential as an additive to food and beverages [389]. In addition to existing enzymatic production processes, fermentative production of l-theanine was recently established in C. glutamicum. For this purpose, γ-glutamylmethylamide synthetase (GMAS), which catalyzes the ATP-dependent ligation of l-glutamate and ethylamine, was expressed in the wild type ATCC 13032 and in strain GDK-9, a previously derived l-glutamate producer. To reduce l-glutamate secretion, the corresponding exporter was deleted. The resulting mutant accumulated 42 g l−1 l-theanine with a yield of 19.6%, which, however, required ethylamine supplementation. Strategies to derive this precursor de novo by decarboxylation of l-alanine with plant-derived decarboxylases failed, but they provide an interesting concept to pursue further.

12.4.1.9 4-Hydroxyisoleucine

4-Hydroxyisoleucine (4-HIL) exhibits glucose-dependent insulinotropic activity and is considered a candidate for the treatment of diabetes. Biochemically, it is derived from l-isoleucine by l-isoleucine dioxygenase (IDO), which hydroxylates l-isoleucine to 4-HIL under consumption of α-ketoglutarate and oxygen. Pioneering work expressed the ido gene from Bacillus thuringiensis YBT-1520 in the l-isoleucine overproducer Corynebacterium glutamicum ssp. lactofermentum SN01 and obtained 4-HIL in a single-step fermentation [390]. Subsequent studies revealed the benefit of synergistically promoting substrate supply and improving IDO activity [70]. A comprehensive study recently increased 4-HIL production in another l-isoleucine-producing C. glutamicum strain [391]. Six genes encoding enzymes at the oxaloacetate and α-ketoglutarate nodes were manipulated, which raised 4-HIL production from initially low levels to the gram scale but did not prevent the accumulation of l-isoleucine. A more sophisticated strategy then modulated the activity of the TCA cycle at the level of the α-ketoglutarate dehydrogenase complex (ODHC) by l-isoleucine-responsive transcription and attenuation. This dynamic control enabled the production of 34 g l−1 4-HIL in strain HIL-18 with negligible accumulation of byproducts.

12.4.2 Organic Acids and Alcohols

Organic acids and alcohols are widely represented in industrial biotechnology, and their biobased production has become a fast-evolving field because of their broad applicability as flavor additives, preservatives, and building blocks for the manufacture of polymers and commodity chemicals [26, 29, 50, 392–394]. The portfolio of organic acids obtained by fermentation with C. glutamicum is rich. cis,cis-Muconate [288–290, 293], glutarate [372], and itaconate [395] have recently attracted attention for the sustainable production of high-value chemicals and bioplastics. In addition, C. glutamicum has been engineered into an efficient host for lactate [396–399], succinate [400, 401], and pyruvate [51]. Regarding alcohols, C. glutamicum has been successfully modified to produce isobutanol [38], 2,3-butanediol (2,3-BDO) [402], ethanol [403], 1,2-propanediol [404, 405], 1-propanol [404], ethylene glycol [406], and 1,3-propanediol [407].

12.4.2.1 cis,cis-Muconate

cis,cis-Muconate (MA) is an unsaturated dicarboxylic acid of outstanding commercial value for commodity chemicals, polymers, and plastics. As discussed above, biobased MA production from lignin-derived aromatics has greatly advanced in recent years and reached titers of up to 85 g l−1 with C. glutamicum strains [288–290, 293]. Alternative strategies demonstrated de novo production of MA from glucose via the aromatic amino acid pathway (Figure 12.6) [408]. This required substantial modification of the wild type ATCC 13032. In C. glutamicum, the aromatic amino acid pathway starts with the formation of 3-deoxy-d-arabino-heptulosonate 7-phosphate (DAHP) from erythrose 4-phosphate (E4P) and phosphoenolpyruvate (PEP) by DAHP synthase (phospho-2-dehydro-3-deoxyheptonate aldolase), encoded by aroFGH. DAHP is then converted via 3-dehydroquinate (3-DHQ) into 3-dehydroshikimate (3-DHS) by 3-DHQ synthase (AroB) and 3-DHQ dehydratase (AroD). For the designed MA pathway, 3-DHS represented an important branch point, as it could be channeled to protocatechuate (PCA) and then to MA instead of being converted further into shikimate. To avoid loss of 3-DHS to the shikimate pathway, aroE, encoding shikimate 5-dehydrogenase, was deleted. In addition, natural PCA degradation was eliminated by deletion of pcaGH, encoding protocatechuate 3,4-dioxygenase. To connect PCA to MA via catechol, functional expression of a PCA decarboxylase was the missing piece [408]. Different gene variants and combinations of heterologous genes from Klebsiella pneumoniae with different promoters were tested to achieve optimal expression. The best combination was an operon consisting of the sod promoter and codon-optimized versions of aroY, kpdB, and kpdD, which encode PCA decarboxylase and its corresponding “helper” proteins [76, 408]. In a 50 l fed-batch fermentation, 54 g l−1 MA was produced [408].
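The branch-point logic can be made explicit as a small directed graph. The sketch below is a simplified model, not a complete metabolic network: it encodes only reactions named in the text and in Figure 12.6 (the PCA-forming 3-DHS dehydratase QsuB is taken from the caption, and the catechol-to-MA ring cleavage is included as an unnamed edge because the text does not name its enzyme). The model already contains the heterologous aroY; deleting aroE and pcaGH removes their edges, and a breadth-first search then lists what remains reachable from 3-DHS.

```python
from collections import deque

# Simplified reaction list: (substrate, product) -> gene.
# The catechol -> MA step carries no gene name (assumption: not named in the text).
REACTIONS = {
    ("E4P+PEP", "DAHP"): "aroFGH",
    ("DAHP", "3-DHQ"): "aroB",
    ("3-DHQ", "3-DHS"): "aroD",
    ("3-DHS", "shikimate"): "aroE",
    ("3-DHS", "PCA"): "qsuB",
    ("PCA", "PCA ring cleavage"): "pcaGH",
    ("PCA", "catechol"): "aroY",
    ("catechol", "MA"): None,
}


def reachable(start, deleted):
    """Return all metabolites reachable from `start` when the `deleted` genes are knocked out."""
    graph = {}
    for (src, dst), gene in REACTIONS.items():
        if gene is None or gene not in deleted:
            graph.setdefault(src, []).append(dst)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}


with_competing_routes = reachable("3-DHS", deleted=set())
after_deletions = reachable("3-DHS", deleted={"aroE", "pcaGH"})
print(sorted(with_competing_routes))
print(sorted(after_deletions))
```

In this toy model, the two deletions leave PCA, catechol, and MA as the only fates of 3-DHS, which is the qualitative point of the design; real fluxes additionally depend on enzyme levels and regulation.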
A similar approach to reconstructing the MA pathway was reported in another study, which additionally enhanced the PCA-forming reaction catalyzed by 3-DHS dehydratase (QsuB) [86]. Moreover, glucose uptake was engineered. The glucose PTS was inactivated by in-frame deletion of ptsI, which encodes enzyme I (EI) of the PTS, to increase the availability of PEP, which the PTS otherwise consumes for glucose phosphorylation, as a precursor for MA biosynthesis, a target well known from aromatic amino acid overproduction. Glucose uptake was instead stimulated by deletion of iolR, which encodes the repressor of the myo-inositol transporters [86]. The engineered glucose metabolism increased the titer by 14%. The optimized producer P30 (Figure 12.6) accumulated 4.5 g l−1 MA in a batch process [86].

Figure 12.6 Systems metabolic engineering of Corynebacterium glutamicum for the production of cis,cis-muconate (MA) from sugar [408]. Stepwise improvement was achieved by engineering of the biosynthetic chain (orange), carbon precursor supply (purple), and substrate uptake (yellow), and by eliminating competing pathways (red). aroE, shikimate 5-dehydrogenase; aroY, protocatechuate decarboxylase from Klebsiella pneumoniae; catB, muconate cycloisomerase; CO, codon optimization; IolR, repressor of myo-inositol transporter; kpdBH, enhancer-like protein for AroY; pcaHG, protocatechuate 3,4-dioxygenase; ptsI, EI complex of the phosphotransferase system; sod, superoxide dismutase; qsuB, 3-dehydroshikimate dehydratase. Source: Based on Lee et al. [408].

12.4.2.2 Glutarate

The five-carbon dicarboxylic acid glutarate has great potential as a building block for bioplastics and as a green solvent in cleaning products and paints [409].


Its formation in C. glutamicum was initially discovered by chance during the design of a cell factory for AVA production [154]. Different AVA-producing strains accumulated glutarate as a byproduct, at titers of 7 g l−1 [154] and 12 g l−1 [369]. The responsible pathway from AVA to glutarate comprised transamination and oxidation steps, presumably catalyzed by aminovalerate transaminase and glutarate semialdehyde dehydrogenase. The search for the encoding genes identified gabT and gabD, originally assigned to γ-aminobutyrate/succinate catabolism in C. glutamicum [154, 410] and evidently also active on aminovalerate. Starting from the previously developed strain AVA-2, glutarate production was enhanced by overexpression of the genomic gabTD module from the strong native eftu promoter [372]. The resulting strain GTA-1 efficiently formed glutarate with a yield of 265 mmol mol−1 but still secreted aminovalerate into the medium. Feeding experiments with 13C-labeled precursors revealed that secreted AVA was reimported by an unknown transport system and channeled toward glutarate. Inspection of the genomic region around the gabTD gene cluster suggested the permease NCgl0464 as a candidate [372]. The gene was therefore overexpressed by introducing a second copy under the control of Peftu into the genome, which increased glutarate production by 22% and reduced AVA secretion by 70%. In a molasses-based fed-batch process, the optimized strain GTA-4 (Figure 12.7) accumulated 90 g l−1 glutarate without any AVA secretion. The product was purified by a multistep downstream route including acidification, evaporation, treatment with activated carbon, and freeze-drying to yield 99.9% pure glutarate. Glutarate was then polymerized with hexamethylenediamine into a novel biobased nylon 6,5 [372].

12.4.2.3 Itaconate

Itaconate (methylenesuccinate) serves as a building block for polymers, chemicals, and fuels. Historically, this unsaturated five-carbon dicarboxylic acid was obtained by distillation of citric acid, but today the preferred production route is fermentation, primarily with fungi. C. glutamicum is highly tolerant to itaconate and does not metabolize it, which made it a suitable starting point for development [395]. Expression of the Aspergillus terreus CAD1 gene, encoding cis-aconitate decarboxylase (CAD), in wild-type C. glutamicum enabled low-level production in the stationary growth phase. Expression of CAD1 as a fusion with the maltose-binding protein from Escherichia coli increased enzyme activity and itaconate titer. Nitrogen-limited growth conditions further raised the titer to the gram scale. Subsequently, isocitrate dehydrogenase activity was reduced by exchanging its ATG start codon for GTG or TTG [395], limiting the drain of isocitrate, and thus cis-aconitate, into the TCA cycle. This resulted in an itaconate titer of 7.8 g l−1 and a yield of 0.4 mol mol−1 on glucose. Admittedly, this performance does not reach the high level of fungal fermentation, but it provides a valuable proof of concept.

12.4.2.4 3-Hydroxypropionate

3-Hydroxypropionate (3-HP) is a valuable precursor for the synthesis of acrylic acid, 1,3-propanediol, and biobased plastics [26]. C. glutamicum was recently upgraded to produce 3-HP via the glycerol pathway. Metabolic engineering involved expression of the glycerol synthesis operon (gpd, gpp) from Saccharomyces cerevisiae together with a synthetically assembled 3-HP production pathway [268]. The latter comprised a diol dehydratase and its reactivating factor (pduCDEGH) from Klebsiella pneumoniae and a mutated aldehyde dehydrogenase (gapD E209Q/E269Q) from Cupriavidus necator. The mutant formed 21 g l−1 3-HP, but also high levels of lactate and acetate. Further rounds of engineering reduced this byproduct formation by deleting ldhA, poxB, ptaA, and ackA. Subsequently, the lower EMP pathway was attenuated at the level of gapA (glyceraldehyde 3-phosphate dehydrogenase), and glucose uptake was switched to inositol permease (IolT1) and glucokinase (Glk), which overall resulted in remarkable titers of 39 g l−1 3-HP in batch and 63 g l−1 3-HP in fed-batch fermentation [268]. Recent studies demonstrated 3-HP formation via the malonyl-CoA pathway, although at a lower level [411].

Figure 12.7 Systems-wide metabolic engineering of Corynebacterium glutamicum GTA-4 for the production of glutarate [372]. Stepwise improvement was achieved by engineering of the biosynthesis (orange), carbon precursor supply (purple), NADPH supply (turquoise), and competing pathways (red). NCgl0464, aminovalerate permease; dapB, dihydrodipicolinate reductase; davA, aminovaleramidase from P. putida; davB, L-lysine monooxygenase from P. putida; ddh, diaminopimelate dehydrogenase; fbp, fructose 1,6-bisphosphatase; gabD, glutarate semialdehyde dehydrogenase; gabT, aminovalerate transaminase; homV59A, homoserine dehydrogenase with amino acid exchange L-valine → L-alanine at position 59; icdA1G, isocitrate dehydrogenase with start codon exchange ATG → GTG; lysA, diaminopimelate decarboxylase; lysCT311I, aspartokinase with amino acid exchange L-threonine → L-isoleucine at position 311; pck, phosphoenolpyruvate carboxykinase; Psod, promoter of the sod gene, encoding superoxide dismutase; Ptuf, promoter of the tuf gene, encoding elongation factor Tu; pycP458S, pyruvate carboxylase with amino acid exchange L-proline → L-serine at position 458; tktop, expression unit comprising the genes tkt (transketolase), tal (transaldolase), zwf (glucose 6-phosphate dehydrogenase), and pgl (phosphogluconolactonase).

12.4.2.5 Short-Chain Alcohols

Microbial production of alcohols is of increasing interest as a renewable route to these important platform chemicals and fuels [412]. C. glutamicum tolerates high levels of alcohols, an important feature for producing them by fermentation [36]. Over the years, intensive research has provided strains that produce various alcohols, including isobutanol [38], 2,3-butanediol (2,3-BDO) [402], ethanol [403], 1,2-propanediol [404, 405], 1-propanol [404], ethylene glycol [406], and 1,3-propanediol [407]. For example, metabolic engineering of C. glutamicum for 2,3-BDO production was based on expressing budA from K. pneumoniae, encoding α-acetolactate decarboxylase, a key enzyme of natural 2,3-BDO fermentation [402]. The mutant accumulated 1.76 g l−1 2,3-BDO. Complementation with the genes budB and budC, encoding α-acetolactate synthase and acetoin reductase, revealed that only the former was beneficial. The resulting budAB strain produced 18.9 g l−1 2,3-BDO from 80 g l−1 glucose and also converted molasses and cassava powder into the product, although less efficiently [402]. Because production was accompanied by substantial amounts of byproducts (acetate, lactate, succinate, and acetoin), the corresponding pathways were systematically eliminated by deleting ldhA, aceE, pqo, and mdh [413]. Combined with heterologous expression of the Lactococcus lactis 2,3-BDO pathway, the recombinant C. glutamicum strain achieved 6.3 g l−1 2,3-BDO with a yield of 0.3 g g−1 glucose in a two-stage process [413]. In addition, C. glutamicum was engineered to produce ethanol [403]. For this purpose, the ethanol fermentation pathway was installed by expressing the Zymomonas mobilis genes pdc (pyruvate decarboxylase) and adhB (alcohol dehydrogenase).
A streamlined EMP pathway with overexpressed pgi (phosphoglucoisomerase), pfkA (phosphofructokinase), gapA (glyceraldehyde 3-phosphate dehydrogenase), tpi (triose phosphate isomerase), and pyk (pyruvate kinase) almost doubled productivity. Final production with the derived mutant was operated as a two-stage process with initial aerobic growth and subsequent anaerobic production and yielded 106 g l−1 ethanol from glucose [403]. Further rounds of engineering finally enabled fermentation of sugar mixtures of glucose, xylose, and arabinose [414].
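Titers and yields like these are easier to judge against the stoichiometric maximum. The sketch below assumes the textbook fermentation balances (glucose → 2 pyruvate → 2,3-BDO + 2 CO2, i.e., at most 1 mol 2,3-BDO per mol glucose; glucose → 2 ethanol + 2 CO2) and, for the budAB strain, that all of the 80 g l−1 glucose was consumed, which the text does not state explicitly. Molar masses are rounded standard values.

```python
# Molar masses in g/mol (rounded).
M_GLUCOSE = 180.16  # C6H12O6
M_BDO = 90.12       # 2,3-butanediol, C4H10O2
M_ETHANOL = 46.07   # C2H6O


def max_mass_yield(mol_product_per_mol_glucose, m_product):
    """Theoretical mass yield (g product per g glucose) from a molar stoichiometry."""
    return mol_product_per_mol_glucose * m_product / M_GLUCOSE


# Stoichiometric maxima: 1 mol 2,3-BDO or 2 mol ethanol per mol glucose.
bdo_max = max_mass_yield(1, M_BDO)
ethanol_max = max_mass_yield(2, M_ETHANOL)

# budAB strain: 18.9 g/l from 80 g/l glucose (assumes complete glucose consumption).
bdo_budab = 18.9 / 80
# Strain with the L. lactis pathway: reported yield of 0.3 g/g glucose.
bdo_lactis = 0.3

print(f"2,3-BDO maximum: {bdo_max:.3f} g/g")
print(f"budAB strain: {bdo_budab:.3f} g/g ({bdo_budab / bdo_max:.0%} of maximum)")
print(f"L. lactis pathway: {bdo_lactis:.3f} g/g ({bdo_lactis / bdo_max:.0%} of maximum)")
print(f"ethanol maximum: {ethanol_max:.3f} g/g")
```

Under these assumptions, the budAB strain reached a little under half of the theoretical 2,3-BDO yield and the later strain about 60%; the ethanol ceiling of about 0.51 g g−1 is the reference against which the 106 g l−1 process could be judged if its glucose consumption were known.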

12.4.3 Natural Products and Active Ingredients

12.4.3.1 Pyrazine

C. glutamicum naturally forms a variety of alkylated pyrazines, important food additives used in flavor formulations for roasted nuts, chocolate, potatoes, meat, and more [415]. The observed pyrazine spectrum included methylpyrazine, 2,5-dimethylpyrazine, 2,3-dimethylpyrazine, trimethylpyrazine, 2-ethyl-3,6-dimethylpyrazine, 2-ethyl-3,5-dimethylpyrazine, 2-propyl-3,5-dimethylpyrazine, tetramethylpyrazine, ethyltrimethylpyrazine, 2-(hydroxymethyl)-5-methylpyrazine, and 2-(hydroxymethyl)-5,6-dimethylpyrazine [75]. Studies with isotopically labeled precursors and gene deletion mutants revealed that the biosynthetic pathway originates from glycolytic intermediates [75]. In a fed-batch process, engineered strains of C. glutamicum accumulated 2.2 g l−1 tetramethylpyrazine [416].

12.4.3.2 Violacein

The bisindole violacein and its derivative deoxyviolacein have diverse biological activities and act efficiently against cancer cells and Gram-positive pathogens such as Staphylococcus aureus. Biochemically, violacein and deoxyviolacein are formed from two molecules of tryptophan by a sequential pathway encoded by the vioABCDE operon, and strains of E. coli have mainly been used to produce these compounds [417–419]. In addition, the classically mutagenized tryptophan producer C. glutamicum ATCC 21850 was employed as a host [78] to express the native vio cluster from Chromobacterium violaceum. Refactoring of the operon with improved ribosome binding sites and inducible promoters, together with the development of a suitable fermentation process, led to substantial overproduction of violacein.

12.4.3.3 Terpenoids

Terpenoids are a structurally extremely diverse group of natural products. They give color to fruits and flowers, promote plant development, and protect organisms from pathogens. Many of these molecules are commercially valuable and are applied in the pharmaceutical and chemical industries and as flavors and fragrances. Pioneering efforts investigated the natural carotenoid synthesis in C. glutamicum and identified decaprenoxanthin as the yellow pigment of the microbe [48]. Decaprenoxanthin is formed via the nonmevalonate pathway through farnesyl pyrophosphate, geranylgeranyl pyrophosphate, lycopene, and flavuxanthin. Disruption of the terminal biosynthetic steps and upregulation of upstream reactions enabled accumulation of lycopene, a carotenoid of commercial value, to 2.3 mg (g cell dry weight)−1 [48]. Pathway extension by heterologous expression of lycopene cyclase, β-carotene ketolase, and hydroxylase from different marine bacteria then enabled the production of astaxanthin with a volumetric productivity of up to 0.4 mg l−1 h−1 [49]. In addition, enhanced production of β-carotene and bisanhydrobacterioruberin (a non-native C50 carotenoid) was achieved by modulating pathway expression via the sigma factor SigA [420]. Recent studies unraveled the influence of light on carotenoid biosynthesis in C. glutamicum AJ1511 and demonstrated that the light-responsive regulator CrtR naturally controls expression of the pathway [421]. With regard to monoterpenes, overexpression of geranyl diphosphate synthases (GPPS) and pinene synthases (PS) from Pinus taeda and Abies grandis together with native 1-deoxy-d-xylulose-5-phosphate synthase (Dxs) and isopentenyl diphosphate isomerase (Idi) enabled the formation of 2.7 μg (g cell dry weight)−1 α- and β-pinene, which have applications in fragrances and flavors [422]. Similar strategies yielded the C15 sesquiterpenes patchoulol (60 mg l−1) [423] and valencene (2.4 mg l−1) [424].

12.4.4 Biopolymers

Several studies report the successful engineering of metabolic pathways in C. glutamicum that directly yield polymers of commercial interest.

12.4.4.1 Hyaluronic Acid

Hyaluronic acid (HA) is an important constituent of human connective tissue and plays an important role in cell proliferation and migration [91]. It can bind and retain large amounts of water and is used as an ingredient in moisturizing lotions [91, 425]. Chemically, HA is a copolymer of d-glucuronic acid and N-acetylglucosamine (GlcNAc). Its biosynthesis originates from glucose 6-phosphate and fructose 6-phosphate (Figure 12.8). Production of HA in C. glutamicum requires heterologous expression of only one gene, hasA, encoding HA synthase [91]; all other enzymes of the HA biosynthetic pathway are naturally encoded in the C. glutamicum genome [91]. HasA, HasB (UDP-glucose dehydrogenase), and HasC (UDP-glucose pyrophosphorylase) are of utmost importance for HA formation. The design of an HA producer therefore considered expression of a codon-optimized hasA gene from Streptococcus equisimilis alone and in combination with endogenous hasB and hasBC. Expression was controlled either by the constitutive promoters of sod (encoding superoxide dismutase) or dapB (encoding dihydrodipicolinate reductase) or by the inducible Ptac promoter [91]. The highest production (8 g l−1 HA) was achieved when the hasAB operon was expressed under control of Ptac [91]. A subsequent study tested several more gene combinations, additionally considering hasD and hasE (pgi), which encode acetyltransferase and phosphoglucoisomerase, respectively [427]. Moreover, ldhA, encoding lactate dehydrogenase, was deleted from the genome to increase the energy (ATP) supply of the host. The best strain, C. glutamicum/Δldh-AB, was tested in a 5 l fed-batch fermentation and produced 21.6 g l−1 HA, threefold more than the native producer S. equisimilis [427]. Further optimization was based on flux balance analysis, which predicted that attenuating the EMP pathway, the PP pathway, and pyruvate-consuming reactions would increase HA levels.
Antisense RNA expression then targeted fba (fructose 1,6-bisphosphate aldolase). Pyruvate dehydrogenase was downregulated by a start codon exchange in aceE, and the genes zwf, ackA, pta, cat, and poxB (which encode glucose 6-phosphate dehydrogenase, acetate kinase, phosphate acetyltransferase, acetyl-CoA:CoA transferase, and pyruvate:quinone oxidoreductase, respectively) were deleted.
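Because HA is built from alternating d-glucuronic acid and GlcNAc units, an average molecular weight can be translated into an approximate chain length. The sketch below is an estimate under stated assumptions: it uses the free-acid disaccharide repeat unit (C14H21NO11, about 379.3 g mol−1), whereas HA is often isolated as the sodium salt (about 401.3 g mol−1 per repeat), and it ignores chain ends. Applied to the 0.21 MDa reported for strain CgHA25, it gives roughly 550 disaccharide repeats.

```python
# Standard atomic masses (g/mol), rounded.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def formula_mass(composition):
    """Molar mass from an element -> count mapping."""
    return sum(ATOMIC_MASS[el] * n for el, n in composition.items())


# Monomers as free molecules.
glucuronic_acid = {"C": 6, "H": 10, "O": 7}   # C6H10O7
glcnac = {"C": 8, "H": 15, "N": 1, "O": 6}    # C8H15NO6
water = {"H": 2, "O": 1}

# Polymer repeat unit: one disaccharide minus two waters
# (one glycosidic bond inside the disaccharide, one linking it to the next unit).
repeat_unit = {}
for part, sign in ((glucuronic_acid, 1), (glcnac, 1), (water, -2)):
    for el, n in part.items():
        repeat_unit[el] = repeat_unit.get(el, 0) + sign * n

m_repeat = formula_mass(repeat_unit)


def disaccharide_repeats(mw_dalton):
    """Approximate number of disaccharide units for a given average molecular weight."""
    return mw_dalton / m_repeat


print(repeat_unit)  # element counts of the free-acid repeat unit
print(f"{m_repeat:.2f} g/mol per repeat")
print(f"{disaccharide_repeats(0.21e6):.0f} repeats at 0.21 MDa")
```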


Figure 12.8 Systems metabolic engineering of Corynebacterium glutamicum CgHA25 for the production of hyaluronic acid (HA) [426]. Stepwise improvement was achieved by engineering of the biosynthesis (orange), carbon precursor supply (purple), and competing pathways (red). aceE, subunit of pyruvate dehydrogenase complex; ackA, acetate kinase; as-E, antisense RNA of pyruvate dehydrogenase complex (AceE); as-F, antisense RNA of fructose 1,6-bisphosphate aldolase (Fba); cat, acetyl-CoA:CoA transferase; dapB, dihydrodipicolinate reductase; hasA, hyaluronic acid synthase from Streptococcus equisimilis; ldh, lactate dehydrogenase; poxB, pyruvate:quinone oxidoreductase; pta, phosphate acetyltransferase; ugdA1 (hasB homolog), UDP-glucose dehydrogenase; zwf, glucose 6-phosphate dehydrogenase. Source: Based on Cheng et al. [426].

The obtained producer CgHA25 (Figure 12.8) accumulated 28.7 g l−1 HA with an average molecular weight of 0.21 MDa [426].

12.4.4.2 Polyglutamate

Polyglutamate (PGA) is a biodegradable polymer with desirable properties, such as biocompatibility, water solubility, and viscosity, which favor its application in human health care and well-being [90]. The polypeptide chain of PGA consists of d- and/or l-glutamate connected via γ-amide linkages. Depending on the relative content of d- and l-glutamate, material properties and thus application fields differ. Although C. glutamicum is an extraordinary l-glutamate producer, it naturally lacks a pathway for PGA synthesis. To upgrade C. glutamicum into a PGA producer, different PGA synthases from B. subtilis Ia1a (pgsBCA) and Bacillus licheniformis 9945a (capBCA) were tested [90]. The highest production, 11.4 g l−1 PGA, was achieved with capBCA expressed in the glutamate hyperproducer C. glutamicum F343. The monomer composition of the polymer was then modified by additional expression of the glutamate racemase gene racE from B. subtilis under control of the tac promoter. Different racE expression levels were achieved by plasmid designs carrying different numbers of copies of the lacO operator [90]. With this strategy, the relative l-glutamate content of PGA could be adjusted between 37% and 97%. High racE expression levels, however, impaired growth and production. A strain with moderate racE expression showed the highest PGA production (15.4 g l−1) with a productivity of 0.64 g l−1 h−1 [90]. High residual glucose at the end of the process prompted optimization of the initial glucose supply: a starting concentration of 80 g l−1 glucose resulted in the highest PGA production of 21.4 g l−1 [90].

12.4.5 Recombinant Proteins

Recombinant proteins are of huge importance in biotechnology; therapeutic recombinant proteins alone record sales of more than US$200 billion [428]. While E. coli and B. subtilis are well-known, established protein producers [429–431], metabolic engineering of C. glutamicum into an efficient protein producer is still in its early stages, albeit with promising initial success stories [94]. Here, efficient expression elements, including functional promoters, terminators, and secretion systems, have emerged as key to successful recombinant protein expression.

12.4.5.1 Endoxylanase

As described above, endoxylanases are needed for the efficient utilization of lignocellulosic biomass in fermentation. Improved production of the endoxylanase XynA in C. glutamicum CGMCC1.15647 was approached by engineering its expression and secretion [432]. In a first step, the secretory proteome of the strain was analyzed, which revealed that the cell wall protein CspB2 was the most abundant secreted protein. The native cspB2 promoter and the CspB signal peptide were therefore used to control expression and secretion of XynA [432]. The strain was additionally modified by deletion of cspB2 and clpS, the latter encoding a cytosolic protease. High xynA expression was achieved by integrating xynA into the genome and concomitantly expressing it from two compatible plasmids (pXMJ19-xynA and pEC-XK99E-xynA) [432]. This strategy was highly beneficial, achieving extracellular XynA activities of up to 2500 U ml−1. In a fed-batch process, the optimized strain accumulated the protein at 1.77 g l−1 [432].

12.4.5.2 β-Glucosidase

The application of glucosidases in food-grade processes requires safe hosts for recombinant production. Because of its GRAS status, C. glutamicum ATCC 13032 was chosen to produce the ginsenoside-transforming glucosidase MT619 from Microbacterium testaceum [433]. The enzyme is used to produce the minor ginsenosides compound K (CK) and F1. Both molecules show promising anti-inflammatory and anticancer activity but are rare in ginseng, the natural source of ginsenosides [433]. A codon-optimized mt619 gene was expressed in C. glutamicum as a fusion with C3a, a cellulose-binding module from Clostridium thermocellum, under control of the synthetic H36 promoter. C3a served to immobilize the fusion protein on a cellulose carrier after its release from the cells by sonication [433]. The immobilized enzyme formed 7.6 g l−1 CK and 9.4 g l−1 F1 from commercially available protopanaxadiol (PPD)-type ginsenoside mixtures (PPDGMs) and protopanaxatriol (PPT)-type ginsenoside mixtures (PPTGMs), respectively, the highest titers reported for these minor ginsenosides [433].

12.4.6 Recombinant RNA

In recent years, there has been increasing interest in using recombinant RNA as a therapeutic molecule and as an alternative agent for pest control in agriculture [97, 434]. RNA has been produced by different methods, including recombinant expression, enzyme-based production, and chemical synthesis [435]. For recombinant overexpression, E. coli has become the major production host, although alternative hosts such as Rhodovulum sulfidophilum have also been tested for in vivo production [434, 435]. Only recently, an RNase III-deficient (Δrnc) C. glutamicum strain was engineered for the production of recombinant RNA [97]. Production relied on a high-copy-number plasmid equipped with the strong promoter F1 and a corynephage BFK20 terminator. The target U1A*-RNA, approximately 160 nucleotides long, comprised the stem/loop II hairpin that binds the U1A protein within the U1 small nuclear ribonucleoprotein (snRNP) [97]. In addition to the full-length target RNA, small amounts of 3′-truncated transcripts were formed. Overall, 300 mg l−1 of intact model RNA was produced within 24 hours [97]. A follow-up study produced 75 mg l−1 of the double-stranded RNA (dsRNA) diap1*, known to suppress expression of the essential gene diap1 (death-associated inhibitor of apoptosis protein 1) in the model target pest Henosepilachna vigintioctopunctata [96]. Unlike the earlier study, this work used a plasmid with a substantially increased copy number [215], [96]. Moreover, RNA stability was improved by sterilizing the C. glutamicum cells with ethanol after production [96].
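To put the RNA titer in process terms, the average volumetric productivity can be estimated from the two numbers given above. This is a back-of-the-envelope sketch that assumes the 24 hours span the entire cultivation and that the titer refers to culture volume; it is not a value reported in [97]. Here c_P denotes the final titer of intact RNA and t the cultivation time (symbols introduced only for this estimate):

```latex
\bar{Q}_{P} = \frac{c_{P}}{t}
            = \frac{300\ \mathrm{mg\,l^{-1}}}{24\ \mathrm{h}}
            = 12.5\ \mathrm{mg\,l^{-1}\,h^{-1}}
```

Because the estimate averages over the whole cultivation, including growth, it says nothing about the instantaneous production rate at any particular time.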

12.5 Conclusions and Perspectives

The discovery and isolation of Corynebacterium glutamicum in the 1950s opened a new era in biotechnology. Since then, C. glutamicum has been metabolically engineered into one of the best-studied and most versatile production hosts. Its product portfolio has expanded rapidly and now covers more than 80 different compounds. In addition, its substrate spectrum has been substantially extended, specifically toward nonfood feedstocks [267, 436]. Owing to highly advanced tools for systems biology, synthetic biology, and systems and synthetic metabolic engineering [22, 161, 250, 437], the time from initial idea to production strain has become increasingly short, accelerating development. Building on these achievements, the coming years promise further exciting discoveries and breakthroughs, and C. glutamicum is expected to become one of the key cell factories of the twenty-first century within the growing global bioeconomy.

References

1 Becker, J., Rohles, C.M., and Wittmann, C. (2018). Metabolically engineered Corynebacterium glutamicum for bio-based production of chemicals, fuels, materials, and healthcare products. Metab. Eng. 50: 122–141.
2 Eggeling, L. and Bott, M. (2015). A giant market and a powerful metabolism: l-lysine provided by Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 99 (8): 3387–3394.
3 Baritugo, K.A., Kim, H.T., David, Y. et al. (2018). Metabolic engineering of Corynebacterium glutamicum for fermentative production of chemicals in biorefinery. Appl. Microbiol. Biotechnol. 102 (9): 3915–3937.
4 Felix, F., Letti, L.A.J., Vinicius de Melo Pereira, G. et al. (2019). l-Lysine production improvement: a review of the state of the art and patent landscape focusing on strain development and fermentation technologies. Crit. Rev. Biotechnol. 39 (8): 1031–1055.
5 Wittmann, C. and Heinzle, E. (2001). Application of MALDI-TOF MS to lysine-producing Corynebacterium glutamicum: a novel approach for metabolic flux analysis. Eur. J. Biochem. 268 (8): 2441–2455.
6 Wittmann, C. and Heinzle, E. (2001). Modeling and experimental design for metabolic flux analysis of lysine-producing Corynebacteria by mass spectrometry. Metab. Eng. 3 (2): 173–191.
7 Wittmann, C. and Heinzle, E. (2002). Genealogy profiling through strain improvement by using metabolic network analysis: metabolic flux genealogy of several generations of lysine-producing Corynebacteria. Appl. Environ. Microbiol. 68 (12): 5843–5859.
8 Wittmann, C., Kiefer, P., and Zelder, O. (2004). Metabolic fluxes in Corynebacterium glutamicum during lysine production with sucrose as carbon source. Appl. Environ. Microbiol. 70 (12): 7277–7287.
9 Wittmann, C., Hans, M., and Heinzle, E. (2002). In vivo analysis of intracellular amino acid labelings by GC/MS. Anal. Biochem. 307 (2): 379–382.
10 Wittmann, C. and Heinzle, E. (2001). MALDI-TOF MS for quantification of substrates and products in cultivations of Corynebacterium glutamicum. Biotechnol. Bioeng. 72 (6): 642–647.
11 Krömer, J.O., Fritz, M., Heinzle, E., and Wittmann, C. (2005). In vivo quantification of intracellular amino acids and intermediates of the methionine pathway in Corynebacterium glutamicum. Anal. Biochem. 340 (1): 171–173.
12 Krömer, J.O., Heinzle, E., Schröder, H., and Wittmann, C. (2006). Accumulation of homolanthionine and activation of a novel pathway for isoleucine biosynthesis in Corynebacterium glutamicum McbR deletion strains. J. Bacteriol. 188 (2): 609–618.
13 Krömer, J.O., Sorgenfrei, O., Klopprogge, K. et al. (2004). In-depth profiling of lysine-producing Corynebacterium glutamicum by combined analysis of the transcriptome, metabolome, and fluxome. J. Bacteriol. 186 (6): 1769–1784.
14 Buschke, N., Becker, J., Schäfer, R. et al. (2013). Systems metabolic engineering of xylose-utilizing Corynebacterium glutamicum for production of 1,5-diaminopentane. Biotechnol. J. 8: 557–570.
15 Kind, S., Kreye, S., and Wittmann, C. (2011). Metabolic engineering of cellular transport for overproduction of the platform chemical 1,5-diaminopentane in Corynebacterium glutamicum. Metab. Eng. 13 (5): 617–627.
16 Bendt, A.K., Burkovski, A., Schaffer, S. et al. (2003). Towards a phosphoproteome map of Corynebacterium glutamicum. Proteomics 3 (8): 1637–1646.
17 Hermann, T., Pfefferle, W., Baumann, C. et al. (2001). Proteome analysis of Corynebacterium glutamicum. Electrophoresis 22 (9): 1712–1723.
18 Silberbach, M., Schäfer, M., Hüser, A.T. et al. (2005). Adaptation of Corynebacterium glutamicum to ammonium limitation: a global analysis using transcriptome and proteome techniques. Appl. Environ. Microbiol. 71 (5): 2391–2402.
19 Becker, J., Zelder, O., Haefner, S. et al. (2011). From zero to hero – design-based systems metabolic engineering of Corynebacterium glutamicum for l-lysine production. Metab. Eng. 13 (2): 159–168.
20 Melzer, G., Esfandabadi, M.E., Franco-Lara, E., and Wittmann, C. (2009). Flux design: in silico design of cell factories based on correlation of pathway fluxes to desired properties. BMC Syst. Biol. 3: 120.
21 Krömer, J.O., Wittmann, C., Schröder, H., and Heinzle, E. (2006). Metabolic pathway analysis for rational design of l-methionine production by Escherichia coli and Corynebacterium glutamicum. Metab. Eng. 8 (4): 353–369.
22 Cho, J.S., Choi, K.R., Prabowo, C.P.S. et al. (2017). CRISPR/Cas9-coupled recombineering for metabolic engineering of Corynebacterium glutamicum. Metab. Eng. 42: 157–167.
23 Jiang, Y., Qian, F., Yang, J. et al. (2017). CRISPR-Cpf1 assisted genome editing of Corynebacterium glutamicum. Nat. Commun. 8: 15179.
24 Liu, J., Wang, Y., Lu, Y. et al. (2017). Development of a CRISPR/Cas9 genome editing toolbox for Corynebacterium glutamicum. Microb. Cell Factories 16 (1): 205.
25 Becker, J. and Wittmann, C. (2012). Systems and synthetic metabolic engineering for amino acid production – the heartbeat of industrial strain development. Curr. Opin. Biotechnol. 23 (5): 718–726.
26 Becker, J. and Wittmann, C. (2015). Advanced biotechnology: metabolically engineered cells for the bio-based production of chemicals and fuels, materials, and health-care products. Angew. Chem. Int. Ed. Engl. 54: 3328–3350.
27 Wendisch, V.F. (2017). Microbial production of amino acid-related compounds. Adv. Biochem. Eng. Biotechnol. 159: 255–269.

28 Wang, Y.Y., Xu, J.Z., and Zhang, W.G. (2019). Metabolic engineering of l-leucine production in Escherichia coli and Corynebacterium glutamicum: a review. Crit. Rev. Biotechnol. 39 (5): 633–647.
29 Becker, J., Lange, A., Fabarius, J., and Wittmann, C. (2015). Top value platform chemicals: bio-based production of organic acids. Curr. Opin. Biotechnol. 36: 168–175.
30 Becker, J. and Wittmann, C. (2012). Bio-based production of chemicals, materials and fuels – Corynebacterium glutamicum as versatile cell factory. Curr. Opin. Biotechnol. 23 (4): 631–640.
31 Becker, J., Gießelmann, G., Hoffmann, S.L., and Wittmann, C. (2016). Corynebacterium glutamicum for sustainable bio-production: from metabolic physiology to systems metabolic engineering. In: Synthetic Biology – Metabolic Engineering (eds. H. Zhao and A.P. Zeng), 217–263. Heidelberg: Springer.
32 Ikeda, M. (2003). Amino acid production processes. Adv. Biochem. Eng. Biotechnol. 79: 1–35.
33 Bolten, C.J., Schröder, H., Dickschat, J., and Wittmann, C. (2010). Towards methionine overproduction in Corynebacterium glutamicum – methanethiol and dimethyldisulfide as reduced sulfur sources. J. Microbiol. Biotechnol. 20 (8): 1196–1203.
34 Krömer, J.O., Bolten, C.J., Heinzle, E. et al. (2008). Physiological response of Corynebacterium glutamicum to oxidative stress induced by deletion of the transcriptional repressor McbR. Microbiology 154 (Pt 12): 3917–3930.
35 Blombach, B. and Eikmanns, B.J. (2011). Current knowledge on isobutanol production with Escherichia coli, Bacillus subtilis and Corynebacterium glutamicum. Bioeng. Bugs 2 (6): 346–350.
36 Blombach, B., Riester, T., Wieschalka, S. et al. (2011). Corynebacterium glutamicum tailored for efficient isobutanol production. Appl. Environ. Microbiol. 77 (10): 3300–3310.
37 Smith, K.M., Cho, K.M., and Liao, J.C. (2010). Engineering Corynebacterium glutamicum for isobutanol production. Appl. Microbiol. Biotechnol. 87 (3): 1045–1055.
38 Yamamoto, S., Suda, M., Niimi, S. et al. (2013). Strain optimization for efficient isobutanol production using Corynebacterium glutamicum under oxygen deprivation. Biotechnol. Bioeng. 110 (11): 2938–2948.
39 Kind, S., Jeong, W.K., Schröder, H., and Wittmann, C. (2010). Systems-wide metabolic pathway engineering in Corynebacterium glutamicum for bio-based production of diaminopentane. Metab. Eng. 12 (4): 341–351.
40 Kind, S., Neubauer, S., Becker, J. et al. (2014). From zero to hero – production of bio-based nylon from renewable resources using engineered Corynebacterium glutamicum. Metab. Eng. 25: 113–123.
41 Kind, S. and Wittmann, C. (2011). Bio-based production of the platform chemical 1,5-diaminopentane. Appl. Microbiol. Biotechnol. 91 (5): 1287–1296.
42 Mimitsuka, T., Sawai, H., Hatsu, M., and Yamada, K. (2007). Metabolic engineering of Corynebacterium glutamicum for cadaverine fermentation. Biosci. Biotechnol. Biochem. 71 (9): 2130–2135.

43 Schneider, J., Eberhardt, D., and Wendisch, V.F. (2012). Improving putrescine production by Corynebacterium glutamicum by fine-tuning ornithine transcarbamoylase activity using a plasmid addiction system. Appl. Microbiol. Biotechnol. 95 (1): 169–178.
44 Schneider, J. and Wendisch, V.F. (2010). Putrescine production by engineered Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 88 (4): 859–868.
45 Schneider, J. and Wendisch, V.F. (2011). Biotechnological production of polyamines by bacteria: recent achievements and future perspectives. Appl. Microbiol. Biotechnol. 91 (1): 17–30.
46 Becker, J. and Wittmann, C. (2017). Diamines for bio-based materials. In: Industrial Biotechnology (eds. C. Wittmann and J.C. Liao), 393–404. Weinheim: Wiley-VCH.
47 Heider, S.A., Peters-Wendisch, P., Netzer, R. et al. (2014). Production and glucosylation of C50 and C40 carotenoids by metabolically engineered Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 98 (3): 1223–1235.
48 Heider, S.A., Peters-Wendisch, P., and Wendisch, V.F. (2012). Carotenoid biosynthesis and overproduction in Corynebacterium glutamicum. BMC Microbiol. 12: 198.
49 Henke, N.A., Heider, S.A., Peters-Wendisch, P., and Wendisch, V.F. (2016). Production of the marine carotenoid astaxanthin by metabolically engineered Corynebacterium glutamicum. Mar. Drugs 14 (7). https://doi.org/10.3390/md14070124.
50 Wieschalka, S., Blombach, B., Bott, M., and Eikmanns, B.J. (2013). Bio-based production of organic acids with Corynebacterium glutamicum. Microb. Biotechnol. 6 (2): 87–102.
51 Wieschalka, S., Blombach, B., and Eikmanns, B.J. (2012). Engineering Corynebacterium glutamicum for the production of pyruvate. Appl. Microbiol. Biotechnol. 94 (2): 449–459.
52 Huccetogullari, D., Luo, Z.W., and Lee, S.Y. (2019). Metabolic engineering of microorganisms for production of aromatic compounds. Microb. Cell Factories 18 (1): 41.
53 Averesch, N.J.H. and Krömer, J.O. (2018). Metabolic engineering of the shikimate pathway for production of aromatics and derived compounds – present and future strain construction strategies. Front. Bioeng. Biotechnol. 6: 32.
54 Kogure, T. and Inui, M. (2018). Recent advances in metabolic engineering of Corynebacterium glutamicum for bioproduction of value-added aromatic chemicals and natural products. Appl. Microbiol. Biotechnol. 102 (20): 8685–8705.
55 Zhang, X., Lai, L., Xu, G. et al. (2019). Rewiring the central metabolic pathway for high-yield l-serine production in Corynebacterium glutamicum by using glucose. Biotechnol. J. 14 (6): e1800497.
56 Zhan, M., Kan, B., Dong, J. et al. (2019). Metabolic engineering of Corynebacterium glutamicum for improved l-arginine synthesis by enhancing NADPH supply. J. Ind. Microbiol. Biotechnol. 46 (1): 45–54.

57 Schwentner, A., Feith, A., Munch, E. et al. (2019). Modular systems metabolic engineering enables balancing of relevant pathways for l-histidine production with Corynebacterium glutamicum. Biotechnol. Biofuels 12: 65.
58 Wang, Y.Y., Zhang, F., Xu, J.Z. et al. (2019). Improvement of l-leucine production in Corynebacterium glutamicum by altering the redox flux. Int. J. Mol. Sci. 20 (8): 2020. https://doi.org/10.3390/ijms20082020.
59 Ma, W., Wang, J., Li, Y., and Wang, X. (2019). Cysteine synthase A overexpression in Corynebacterium glutamicum enhances l-isoleucine production. Biotechnol. Appl. Biochem. 66 (1): 74–81.
60 Dele-Osibanjo, T., Li, Q., Zhang, X. et al. (2019). Growth-coupled evolution of phosphoketolase to improve l-glutamate production by Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 103: 8413–8425.
61 Kortmann, M., Mack, C., Baumgart, M., and Bott, M. (2019). Pyruvate carboxylase variants enabling improved lysine production from glucose identified by biosensor-based high-throughput fluorescence-activated cell sorting screening. ACS Synth. Biol. 8 (2): 274–281.
62 Wu, W., Zhang, Y., Liu, D., and Chen, Z. (2019). Efficient mining of natural NADH-utilizing dehydrogenases enables systematic cofactor engineering of lysine synthesis pathway of Corynebacterium glutamicum. Metab. Eng. 52: 77–86.
63 Xu, J.Z., Ruan, H.Z., Chen, X.L. et al. (2019). Equilibrium of the intracellular redox state for improving cell growth and l-lysine yield of Corynebacterium glutamicum by optimal cofactor swapping. Microb. Cell Factories 18 (1): 65.
64 Xu, J.Z., Yu, H.B., Han, M. et al. (2019). Metabolic engineering of glucose uptake systems in Corynebacterium glutamicum for improving the efficiency of l-lysine production. J. Ind. Microbiol. Biotechnol. 46 (7): 937–949.
65 Wang, X., Yang, H., Zhou, W. et al. (2019). Deletion of cg1360 affects ATP synthase function and enhances production of l-valine in Corynebacterium glutamicum. J. Microbiol. Biotechnol. 29 (8): 1288–1298.
66 Wei, L., Wang, H., Xu, N. et al. (2019). Metabolic engineering of Corynebacterium glutamicum for l-cysteine production. Appl. Microbiol. Biotechnol. 103 (3): 1325–1338.
67 Kishino, M., Kondoh, M., and Hirasawa, T. (2019). Enhanced l-cysteine production by overexpressing potential l-cysteine exporter genes in an l-cysteine-producing recombinant strain of Corynebacterium glutamicum. Biosci. Biotechnol. Biochem. 83 (12): 2390–2393.
68 Kondoh, M. and Hirasawa, T. (2019). l-Cysteine production by metabolically engineered Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 103 (6): 2609–2619.
69 Zhang, B., Gao, G., Chu, X.H., and Ye, B.C. (2019). Metabolic engineering of Corynebacterium glutamicum S9114 to enhance the production of l-ornithine driven by glucose and xylose. Bioresour. Technol. 284: 204–213.
70 Shi, F., Zhang, S., Li, Y., and Lu, Z. (2019). Enhancement of substrate supply and ido expression to improve 4-hydroxyisoleucine production in recombinant Corynebacterium glutamicum ssp. lactofermentum. Appl. Microbiol. Biotechnol. 103 (10): 4113–4124.

71 Giesselmann, G., Dietrich, D., Jungmann, L. et al. (2019). Metabolic engineering of Corynebacterium glutamicum for high-level ectoine production – design, combinatorial assembly and implementation of a transcriptionally balanced heterologous ectoine pathway. Biotechnol. J.: e201800417.
72 Perez-Garcia, F., Brito, L.F., and Wendisch, V.F. (2019). Function of l-pipecolic acid as compatible solute in Corynebacterium glutamicum as basis for its production under hyperosmolar conditions. Front. Microbiol. 10: 340.
73 Zhang, Y., Shang, X., Wang, B. et al. (2019). Reconstruction of tricarboxylic acid cycle in Corynebacterium glutamicum with a genome-scale metabolic network model for trans-4-hydroxyproline production. Biotechnol. Bioeng. 116 (1): 99–109.
74 Kallscheuer, N. and Marienhagen, J. (2018). Corynebacterium glutamicum as platform for the production of hydroxybenzoic acids. Microb. Cell Factories 17 (1): 70.
75 Dickschat, J., Wickel, S., Bolten, C.J. et al. (2010). Pyrazine biosynthesis in Corynebacterium glutamicum. Eur. J. Org. Chem. 2010 (14): 2687–2695.
76 Becker, J. and Wittmann, C. (2019). A field of dreams: lignin valorization into chemicals, materials, fuels, and health-care products. Biotechnol. Adv. 37 (6): 107360.
77 Shen, X.H., Zhou, N.Y., and Liu, S.J. (2012). Degradation and assimilation of aromatic compounds by Corynebacterium glutamicum: another potential for applications for this bacterium? Appl. Microbiol. Biotechnol. 95 (1): 77–89.
78 Sun, H., Zhao, D., Xiong, B. et al. (2016). Engineering Corynebacterium glutamicum for violacein hyper production. Microb. Cell Factories 15 (1): 148.
79 Kallscheuer, N., Vogt, M., Bott, M., and Marienhagen, J. (2017). Functional expression of plant-derived O-methyltransferase, flavanone 3-hydroxylase, and flavonol synthase in Corynebacterium glutamicum for production of pterostilbene, kaempferol, and quercetin. J. Biotechnol. 258: 190–196.
80 Kallscheuer, N., Vogt, M., Stenzel, A. et al. (2016). Construction of a Corynebacterium glutamicum platform strain for the production of stilbenes and (2S)-flavanones. Metab. Eng. 38: 47–55.
81 Kitade, Y., Hashimoto, R., Suda, M. et al. (2018). Production of 4-hydroxybenzoic acid by an aerobic growth-arrested bioprocess using metabolically engineered Corynebacterium glutamicum. Appl. Environ. Microbiol. 84 (6): e02587-17.
82 Syukur Purwanto, H., Kang, M.S., Ferrer, L. et al. (2018). Rational engineering of the shikimate and related pathways in Corynebacterium glutamicum for 4-hydroxybenzoate production. J. Biotechnol. 282: 92–100.
83 Kawaguchi, H., Sasaki, K., Uematsu, K. et al. (2015). 3-Amino-4-hydroxybenzoic acid production from sweet sorghum juice by recombinant Corynebacterium glutamicum. Bioresour. Technol. 198: 410–417.
84 Okai, N., Masuda, T., Takeshima, Y. et al. (2017). Biotransformation of ferulic acid to protocatechuic acid by Corynebacterium glutamicum ATCC 21420 engineered to express vanillate O-demethylase. AMB Express 7 (1): 130.

85 Okai, N., Miyoshi, T., Takeshima, Y. et al. (2016). Production of protocatechuic acid by Corynebacterium glutamicum expressing chorismate-pyruvate lyase from Escherichia coli. Appl. Microbiol. Biotechnol. 100 (1): 135–145.
86 Shin, W.S., Lee, D., Lee, S.J. et al. (2018). Characterization of a non-phosphotransferase system for cis,cis-muconic acid production in Corynebacterium glutamicum. Biochem. Biophys. Res. Commun. 499: 279–284.
87 Kogure, T., Kubota, T., Suda, M. et al. (2016). Metabolic engineering of Corynebacterium glutamicum for shikimate overproduction by growth-arrested cell reaction. Metab. Eng. 38: 204–216.
88 Luo, Z.W., Cho, J.S., and Lee, S.Y. (2019). Microbial production of methyl anthranilate, a grape flavor compound. Proc. Natl. Acad. Sci. U. S. A. 116 (22): 10749–10756.
89 Wiefel, L., Wohlers, K., and Steinbüchel, A. (2019). Re-evaluation of cyanophycin synthesis in Corynebacterium glutamicum and incorporation of glutamic acid and lysine into the polymer. Appl. Microbiol. Biotechnol. 103: 4033–4043.
90 Xu, G., Zha, J., Cheng, H. et al. (2019). Engineering Corynebacterium glutamicum for the de novo biosynthesis of tailored poly-gamma-glutamic acid. Metab. Eng. 56: 39–49.
91 Cheng, F., Gong, Q., Yu, H., and Stephanopoulos, G. (2016). High-titer biosynthesis of hyaluronic acid by recombinant Corynebacterium glutamicum. Biotechnol. J. 11 (4): 574–584.
92 Choi, J.W., Yim, S.S., Kim, M.J., and Jeong, K.J. (2015). Enhanced production of recombinant proteins with Corynebacterium glutamicum by deletion of insertion sequences (IS elements). Microb. Cell Factories 14: 207.
93 Goldbeck, O. and Seibold, G.M. (2018). Construction of pOGOduet – an inducible, bicistronic vector for synthesis of recombinant proteins in Corynebacterium glutamicum. Plasmid 95: 11–15.
94 Lee, M.J. and Kim, P. (2018). Recombinant protein expression system in Corynebacterium glutamicum and its application. Front. Microbiol. 9: 2523.
95 Yim, S.S., Choi, J.W., Lee, R.J. et al. (2016). Development of a new platform for secretory production of recombinant proteins in Corynebacterium glutamicum. Biotechnol. Bioeng. 113 (1): 163–172.
96 Hashiro, S., Mitsuhashi, M., Chikami, Y. et al. (2019). Construction of Corynebacterium glutamicum cells as containers encapsulating dsRNA overexpressed for agricultural pest control. Appl. Microbiol. Biotechnol. 103: 8485–8496.
97 Hashiro, S., Mitsuhashi, M., and Yasueda, H. (2019). Overexpression system for recombinant RNA in Corynebacterium glutamicum using a strong promoter derived from corynephage BFK20. J. Biosci. Bioeng. 128 (3): 255–263.
98 Sun, D., Chen, J., Wang, Y. et al. (2019). Metabolic engineering of Corynebacterium glutamicum by synthetic small regulatory RNAs. J. Ind. Microbiol. Biotechnol. 46 (2): 203–208.
99 Kalinowski, J., Bathe, B., Bartels, D. et al. (2003). The complete Corynebacterium glutamicum ATCC 13032 genome sequence and its impact on the production of l-aspartate-derived amino acids and vitamins. J. Biotechnol. 104 (1–3): 5–25.
100 Ikeda, M. and Nakagawa, S. (2003). The Corynebacterium glutamicum genome: features and impacts on biotechnological processes. Appl. Microbiol. Biotechnol. 62 (2–3): 99–109.
101 Becker, J. and Wittmann, C. (2017). Industrial microorganisms: Corynebacterium glutamicum. In: Industrial Biotechnology (eds. C. Wittmann and J.C. Liao), 183–203. Weinheim: Wiley-VCH.
102 Ohnishi, J., Hayashi, M., Mitsuhashi, S., and Ikeda, M. (2003). Efficient 40 °C fermentation of l-lysine by a new Corynebacterium glutamicum mutant developed by genome breeding. Appl. Microbiol. Biotechnol. 62 (1): 69–75.
103 Ohnishi, J., Mitsuhashi, S., Hayashi, M. et al. (2002). A novel methodology employing Corynebacterium glutamicum genome information to generate a new l-lysine-producing mutant. Appl. Microbiol. Biotechnol. 58 (2): 217–223.
104 Wendisch, V.F. and Polen, T. (2013). Transcriptome/proteome analysis of Corynebacterium glutamicum. In: Corynebacterium glutamicum – Biology and Biotechnology (eds. H. Yukawa and M. Inui), 173–216. Berlin-Heidelberg: Springer.
105 Muffler, A., Bettermann, S., Haushalter, M. et al. (2002). Genome-wide transcription profiling of Corynebacterium glutamicum after heat shock and during growth on acetate and glucose. J. Biotechnol. 98 (2–3): 255–268.
106 Hermann, T., Wersch, G., Uhlemann, E.M. et al. (1998). Mapping and identification of Corynebacterium glutamicum proteins by two-dimensional gel electrophoresis and microsequencing. Electrophoresis 19 (18): 3217–3221.
107 Kjeldsen, K.R. and Nielsen, J. (2009). In silico genome-scale reconstruction and validation of the Corynebacterium glutamicum metabolic network. Biotechnol. Bioeng. 102 (2): 583–597.
108 von Kamp, A. and Klamt, S. (2017). Growth-coupled overproduction is feasible for almost all metabolites in five major production organisms. Nat. Commun. 8: 15956.
109 Shinfuku, Y., Sorpitiporn, N., Sono, M. et al. (2009). Development and experimental verification of a genome-scale metabolic model for Corynebacterium glutamicum. Microb. Cell Factories 8: 43.
110 Radhakrishnan, D., Rajvanshi, M., and Venkatesh, K.V. (2010). Phenotypic characterization of Corynebacterium glutamicum using elementary modes towards synthesis of amino acids. Syst. Synth. Biol. 4 (4): 281–291.
111 Hoffmann, S.L., Jungmann, L., Schiefelbein, S. et al. (2018). Lysine production from the sugar alcohol mannitol: design of the cell factory Corynebacterium glutamicum SEA-3 through integrated analysis and engineering of metabolic pathway fluxes. Metab. Eng. 47: 475–487.
112 Rezola, A., de Figueiredo, L.F., Brock, M. et al. (2011). Exploring metabolic pathways in genome-scale networks via generating flux modes. Bioinformatics 27 (4): 534–540.
113 Follmann, M., Ochrombel, I., Kramer, R. et al. (2009). Functional genomics of pH homeostasis in Corynebacterium glutamicum revealed novel links between pH response, oxidative stress, iron homeostasis and methionine synthesis. BMC Genomics 10: 621.
114 Käß, F., Hariskos, I., Michel, A. et al. (2014). Assessment of robustness against dissolved oxygen/substrate oscillations for C. glutamicum DM1933 in two-compartment bioreactor. Bioprocess Biosyst. Eng. 37 (6): 1151–1162.
115 Silberbach, M. and Burkovski, A. (2006). Application of global analysis techniques to Corynebacterium glutamicum: new insights into nitrogen regulation. J. Biotechnol. 126 (1): 101–110.
116 Silberbach, M., Hüser, A., Kalinowski, J. et al. (2005). DNA microarray analysis of the nitrogen starvation response of Corynebacterium glutamicum. J. Biotechnol. 119 (4): 357–367.
117 Rey, D.A., Pühler, A., and Kalinowski, J. (2003). The putative transcriptional repressor McbR, member of the TetR-family, is involved in the regulation of the metabolic network directing the synthesis of sulfur containing amino acids in Corynebacterium glutamicum. J. Biotechnol. 103 (1): 51–65.
118 Arndt, A., Auchter, M., Ishige, T. et al. (2008). Ethanol catabolism in Corynebacterium glutamicum. J. Mol. Microbiol. Biotechnol. 15 (4): 222–233.
119 Beckers, G., Strösser, J., Hildebrandt, U. et al. (2005). Regulation of AmtR-controlled gene expression in Corynebacterium glutamicum: mechanism and characterization of the AmtR regulon. Mol. Microbiol. 58 (2): 580–595.
120 Buchinger, S., Strösser, J., Rehm, N. et al. (2009). A combination of metabolome and transcriptome analyses reveals new targets of the Corynebacterium glutamicum nitrogen regulator AmtR. J. Biotechnol. 140 (1–2): 68–74.
121 Jakoby, M., Nolden, L., Meier-Wagner, J. et al. (2000). AmtR, a global repressor in the nitrogen regulation system of Corynebacterium glutamicum. Mol. Microbiol. 37 (4): 964–977.
122 Brune, I., Werner, H., Hüser, A.T. et al. (2006). The DtxR protein acting as dual transcriptional regulator directs a global regulatory network involved in iron metabolism of Corynebacterium glutamicum. BMC Genomics 7: 21.
123 Hünnefeld, M., Persicke, M., Kalinowski, J., and Frunzke, J. (2019). The MarR-type regulator MalR is involved in stress-responsive cell envelope remodeling in Corynebacterium glutamicum. Front. Microbiol. 10: 1039.
124 Rey, D.A., Nentwich, S.S., Koch, D.J. et al. (2005). The McbR repressor modulated by the effector substance S-adenosylhomocysteine controls directly the transcription of a regulon involved in sulphur metabolism of Corynebacterium glutamicum ATCC 13032. Mol. Microbiol. 56 (4): 871–887.
125 van Ooyen, J., Emer, D., Bussmann, M. et al. (2011). Citrate synthase in Corynebacterium glutamicum is encoded by two gltA transcripts which are controlled by RamA, RamB, and GlxR. J. Biotechnol. 154 (2–3): 140–148.
126 Ehira, S., Teramoto, H., Inui, M., and Yukawa, H. (2009). Regulation of Corynebacterium glutamicum heat shock response by the extracytoplasmic-function sigma factor SigH and transcriptional regulators HspR and HrcA. J. Bacteriol. 191 (9): 2964–2972.

127 Auchter, M., Cramer, A., Hüser, A. et al. (2011). RamA and RamB are global transcriptional regulators in Corynebacterium glutamicum and control genes for enzymes of the central metabolism. J. Biotechnol. 154 (2–3): 126–139.
128 Emer, D., Krug, A., Eikmanns, B.J., and Bott, M. (2009). Complex expression control of the Corynebacterium glutamicum aconitase gene: identification of RamA as a third transcriptional regulator besides AcnR and RipA. J. Biotechnol. 140 (1–2): 92–98.
129 Hirasawa, T., Saito, M., Yoshikawa, K. et al. (2018). Integrated analysis of the transcriptome and metabolome of Corynebacterium glutamicum during penicillin-induced glutamic acid production. Biotechnol. J. 13 (5): e1700612.
130 Zhang, H., Li, Y., Wang, C., and Wang, X. (2018). Understanding the high l-valine production in Corynebacterium glutamicum VWB-1 using transcriptomics and proteomics. Sci. Rep. 8 (1): 3632.
131 Hayashi, M., Ohnishi, J., Mitsuhashi, S. et al. (2006). Transcriptome analysis reveals global expression changes in an industrial l-lysine producer of Corynebacterium glutamicum. Biosci. Biotechnol. Biochem. 70 (2): 546–550.
132 Ikeda, M., Mitsuhashi, S., Tanaka, K., and Hayashi, M. (2009). Reengineering of a Corynebacterium glutamicum l-arginine and l-citrulline producer. Appl. Environ. Microbiol. 75 (6): 1635–1641.
133 Zahoor, A., Otten, A., and Wendisch, V.F. (2014). Metabolic engineering of Corynebacterium glutamicum for glycolate production. J. Biotechnol. 192: 366–375.
134 Chung, S.C., Park, J.S., Yun, J., and Park, J.H. (2017). Improvement of succinate production by release of end-product inhibition in Corynebacterium glutamicum. Metab. Eng. 40: 157–164.
135 Xu, N., Lv, H., Wei, L. et al. (2019). Impaired oxidative stress and sulfur assimilation contribute to acid tolerance of Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 103 (4): 1877–1891.
136 Pfeifer-Sancar, K., Mentz, A., Rückert, C., and Kalinowski, J. (2013). Comprehensive analysis of the Corynebacterium glutamicum transcriptome using an improved RNAseq technique. BMC Genomics 14: 888.
137 Mentz, A., Neshat, A., Pfeifer-Sancar, K. et al. (2013). Comprehensive discovery and characterization of small RNAs in Corynebacterium glutamicum ATCC 13032. BMC Genomics 14: 714.
138 Sun, Y., Guo, W., Wang, F. et al. (2017). Transcriptome analysis of Corynebacterium glutamicum in the process of recombinant protein expression in bioreactors. PLoS One 12 (4): e0174824.
139 Kim, H.I., Nam, J.Y., Cho, J.Y. et al. (2013). Next-generation sequencing-based transcriptome analysis of l-lysine-producing Corynebacterium glutamicum ATCC 21300 strain. J. Microbiol. 51 (6): 877–880.
140 Ruwe, M., Persicke, M., Busche, T. et al. (2019). Physiology and transcriptional analysis of (p)ppGpp-related regulatory effects in Corynebacterium glutamicum. Front. Microbiol. 10: 2769.
141 Fanous, A., Hecker, M., Gorg, A. et al. (2010). Corynebacterium glutamicum as an indicator for environmental cobalt and silver stress – a proteome analysis. J. Environ. Sci. Health B 45 (7): 666–675.

12 Metabolic Engineering of Corynebacterium glutamicum

142 Fanous, A., Weiland, F., Luck, C. et al. (2007). A proteome analysis of Corynebacterium glutamicum after exposure to the herbicide 2,4-dichlorophenoxy acetic acid (2,4-D). Chemosphere 69 (1): 25–31.
143 Fanous, A., Weiss, W., Gorg, A. et al. (2008). A proteome analysis of the cadmium and mercury response in Corynebacterium glutamicum. Proteomics 8 (23–24): 4976–4986.
144 Fränzel, B., Trotschel, C., Rückert, C. et al. (2010). Adaptation of Corynebacterium glutamicum to salt-stress conditions. Proteomics 10 (3): 445–457.
145 Haussmann, U. and Poetsch, A. (2012). Global proteome survey of protocatechuate- and glucose-grown Corynebacterium glutamicum reveals multiple physiological differences. J. Proteomics 75 (9): 2649–2659.
146 Polen, T., Schluesener, D., Poetsch, A. et al. (2007). Characterization of citrate utilization in Corynebacterium glutamicum by transcriptome and proteome analysis. FEMS Microbiol. Lett. 273 (1): 109–119.
147 Qi, S.W., Chaudhry, M.T., Zhang, Y. et al. (2007). Comparative proteomes of Corynebacterium glutamicum grown on aromatic compounds revealed novel proteins involved in aromatic degradation and a clear link between aromatic catabolism and gluconeogenesis via fructose-1,6-bisphosphatase. Proteomics 7 (20): 3775–3787.
148 Vasco-Cardenas, M.F., Banos, S., Ramos, A. et al. (2013). Proteome response of Corynebacterium glutamicum to high concentration of industrially relevant C4 and C5 dicarboxylic acids. J. Proteomics 85: 65–88.
149 Lu, D.M., Liu, J.Z., and Mao, Z.W. (2012). Engineering of Corynebacterium glutamicum to enhance l-ornithine production by gene knockout and comparative proteomic analysis. Chin. J. Chem. Eng. 20 (4): 731–739.
150 Fränzel, B., Poetsch, A., Trotschel, C. et al. (2010). Quantitative proteomic overview on the Corynebacterium glutamicum l-lysine producing strain DM1730. J. Proteomics 73 (12): 2336–2353.
151 Sasaki, Y., Eng, T., Herbert, R.A. et al. (2019). Engineering Corynebacterium glutamicum to produce the biogasoline isopentenol from plant biomass hydrolysates. Biotechnol. Biofuels 12: 41.
152 Kind, S., Jeong, W.K., Schröder, H. et al. (2010). Identification and elimination of the competing N-acetyldiaminopentane pathway for improved production of diaminopentane by Corynebacterium glutamicum. Appl. Environ. Microbiol. 76 (15): 5175–5180.
153 Petri, K., Walter, F., Persicke, M. et al. (2013). A novel type of N-acetylglutamate synthase is involved in the first step of arginine biosynthesis in Corynebacterium glutamicum. BMC Genomics 14: 713.
154 Rohles, C.M., Giesselmann, G., Kohlstedt, M. et al. (2016). Systems metabolic engineering of Corynebacterium glutamicum for the production of the carbon-5 platform chemicals 5-aminovalerate and glutarate. Microb. Cell Factories 15 (1): 154.
155 Kawaguchi, H., Yoshihara, K., Hara, K.Y. et al. (2018). Metabolome analysis-based design and engineering of a metabolic pathway in Corynebacterium glutamicum to match rates of simultaneous utilization of d-glucose and l-arabinose. Microb. Cell Factories 17 (1): 76.

References

156 van Ooyen, J., Noack, S., Bott, M. et al. (2012). Improved l-lysine production with Corynebacterium glutamicum and systemic insight into citrate synthase flux and activity. Biotechnol. Bioeng. 109 (8): 2070–2081.
157 Rehm, N., Buchinger, S., Strosser, J. et al. (2010). Impact of adenylyltransferase GlnE on nitrogen starvation response in Corynebacterium glutamicum. J. Biotechnol. 145 (3): 244–252.
158 Kind, S., Becker, J., and Wittmann, C. (2013). Increased lysine production by flux coupling of the tricarboxylic acid cycle and the lysine biosynthetic pathway – metabolic engineering of the availability of succinyl-CoA in Corynebacterium glutamicum. Metab. Eng. 15: 184–195.
159 Becker, J., Klopprogge, C., Herold, A. et al. (2007). Metabolic flux engineering of l-lysine production in Corynebacterium glutamicum – over expression and modification of G6P dehydrogenase. J. Biotechnol. 132 (2): 99–109.
160 Schwentner, A., Feith, A., Munch, E. et al. (2018). Metabolic engineering to guide evolution – creating a novel mode for l-valine production with Corynebacterium glutamicum. Metab. Eng. 47: 31–41.
161 Becker, J. and Wittmann, C. (2018). From systems biology to metabolically engineered cells – an omics perspective on the development of industrial microbes. Curr. Opin. Microbiol. 45: 180–188.
162 Zampieri, M., Sekar, K., Zamboni, N., and Sauer, U. (2017). Frontiers of high-throughput metabolomics. Curr. Opin. Chem. Biol. 36: 15–23.
163 Bolten, C.J., Kiefer, P., Letisse, F. et al. (2007). Sampling for metabolome analysis of microorganisms. Anal. Chem. 79 (10): 3843–3849.
164 Van Gulik, W.M., Canelas, A.B., Taymaz-Nikerel, H. et al. (2012). Fast sampling of the cellular metabolome. Methods Mol. Biol. 881: 279–306.
165 Peifer, S., Schneider, K., Nurenberg, G. et al. (2012). Quantitation of intracellular purine intermediates in different Corynebacteria using electrospray LC-MS/MS. Anal. Bioanal. Chem. 404 (8): 2295–2305.
166 Kohlstedt, M., Sappa, P.K., Meyer, H. et al. (2014). Adaptation of Bacillus subtilis carbon core metabolism to simultaneous nutrient limitation and osmotic challenge: a multi-omics perspective. Environ. Microbiol. 16 (6): 1898–1917.
167 Gläser, L., Kuhl, M., Jovanovic, S. et al. (2020). A common approach for absolute quantification of short chain CoA thioesters in prokaryotic and eukaryotic microbes. Microb. Cell Factories 19: 160.
168 Nentwich, L.M. and Wittmann, C.W. (2020). Emergency department evaluation of the adult psychiatric patient. Emerg. Med. Clin. North Am. 38 (2): 419–435.
169 Schiefelbein, S., Fröhlich, A., John, G.T. et al. (2013). Oxygen supply in disposable shake-flasks: prediction of oxygen transfer rate, oxygen saturation and maximum cell concentration during aerobic growth. Biotechnol. Lett. 35 (8): 1223–1230.
170 John, G.T., Klimant, I., Wittmann, C., and Heinzle, E. (2003). Integrated optical sensing of dissolved oxygen in microtiter plates: a novel tool for microbial cultivation. Biotechnol. Bioeng. 81 (7): 829–836.

171 Wittmann, C., Kim, H.M., John, G., and Heinzle, E. (2003). Characterization and application of an optical sensor for quantification of dissolved O2 in shake-flasks. Biotechnol. Lett. 25 (5): 377–380.
172 Yang, T.H., Wittmann, C., and Heinzle, E. (2003). Dynamic calibration and dissolved gas analysis using membrane inlet mass spectrometry for the quantification of cell respiration. Rapid Commun. Mass Spectrom. 17 (24): 2721–2731.
173 Tholey, A., Wittmann, C., Kang, M.J. et al. (2002). Derivatization of small biomolecules for optimized matrix-assisted laser desorption/ionization mass spectrometry. J. Mass Spectrom. 37 (9): 963–973.
174 Kohlstedt, M., Becker, J., and Wittmann, C. (2010). Metabolic fluxes and beyond – systems biology understanding and engineering of microbial metabolism. Appl. Microbiol. Biotechnol. 88 (5): 1065–1075.
175 Schwechheimer, S.K., Becker, J., and Wittmann, C. (2018). Towards better understanding of industrial cell factories: novel approaches for 13C metabolic flux analysis in complex nutrient environments. Curr. Opin. Biotechnol. 54: 128–137.
176 Guo, W., Sheng, J., and Feng, X. (2018). Synergizing 13C metabolic flux analysis and metabolic engineering for biochemical production. Adv. Biochem. Eng. Biotechnol. 162: 265–299.
177 Wittmann, C. (2002). Metabolic flux analysis using mass spectrometry. Adv. Biochem. Eng. Biotechnol. 74: 39–64.
178 Wittmann, C. (2007). Fluxome analysis using GC-MS. Microb. Cell Factories 6: 6.
179 Kohlstedt, M. and Wittmann, C. (2019). GC-MS-based 13C metabolic flux analysis resolves the cyclic glucose metabolism of Pseudomonas putida KT2440 and Pseudomonas aeruginosa PAO1. Metab. Eng. 54: 35–53.
180 Becker, J. and Wittmann, C. (2014). GC-MS-based 13C metabolic flux analysis. In: Metabolic Flux Analysis – Methods and Protocols (eds. J.O. Krömer, L.K. Nielsen and L.M. Blank), 165–174. Springer (Humana Press).
181 Becker, J. and Wittmann, C. (2020). Pathways at work: metabolic flux analysis of the industrial cell factory Corynebacterium glutamicum. In: Corynebacterium glutamicum – Biology and Biotechnology (eds. H. Yukawa and M. Inui), 227–265. Berlin-Heidelberg: Springer.
182 Wittmann, C., Kim, H.M., and Heinzle, E. (2004). Metabolic network analysis of lysine producing Corynebacterium glutamicum at a miniaturized scale. Biotechnol. Bioeng. 87 (1): 1–6.
183 Heux, S., Berges, C., Millard, P. et al. (2017). Recent advances in high-throughput 13C-fluxomics. Curr. Opin. Biotechnol. 43: 104–109.
184 Heinzle, E., Yuan, Y., Kumar, S. et al. (2008). Analysis of 13C labeling enrichment in microbial culture applying metabolic tracer experiments using gas chromatography-combustion-isotope ratio mass spectrometry. Anal. Biochem. 380 (2): 202–210.
185 Quek, L.E., Wittmann, C., Nielsen, L.K., and Krömer, J.O. (2009). OpenFLUX: efficient modelling software for 13C-based metabolic flux analysis. Microb. Cell Factories 8: 25.

186 Iwatani, S., Van Dien, S., Shimbo, K. et al. (2007). Determination of metabolic flux changes during fed-batch cultivation from measurements of intracellular amino acids by LC-MS/MS. J. Biotechnol. 128 (1): 93–111.
187 Wiechert, W. and Nöh, K. (2013). Isotopically non-stationary metabolic flux analysis: complex yet highly informative. Curr. Opin. Biotechnol. 24 (6): 979–986.
188 Drysch, A., El Massaoudi, M., Mack, C. et al. (2003). Production process monitoring by serial mapping of microbial carbon flux distributions using a novel sensor reactor approach: II – 13C-labeling-based metabolic flux analysis and l-lysine production. Metab. Eng. 5 (2): 96–107.
189 Drysch, A., El Massaoudi, M., Wiechert, W. et al. (2004). Serial flux mapping of Corynebacterium glutamicum during fed-batch l-lysine production using the sensor reactor approach. Biotechnol. Bioeng. 85 (5): 497–505.
190 El Massaoudi, M., Spelthahn, J., Drysch, A. et al. (2003). Production process monitoring by serial mapping of microbial carbon flux distributions using a novel sensor reactor approach: I – sensor reactor system. Metab. Eng. 5 (2): 86–95.
191 Yuan, Y., Yang, T.H., and Heinzle, E. (2010). 13C metabolic flux analysis for larger scale cultivation using gas chromatography-combustion-isotope ratio mass spectrometry. Metab. Eng. 12 (4): 392–400.
192 Yang, T.H., Heinzle, E., and Wittmann, C. (2005). Theoretical aspects of 13C metabolic flux analysis with sole quantification of carbon dioxide labeling. Comput. Biol. Chem. 29 (2): 121–133.
193 Yang, T.H., Wittmann, C., and Heinzle, E. (2006). Respirometric 13C flux analysis, part I: design, construction and validation of a novel multiple reactor system using on-line membrane inlet mass spectrometry. Metab. Eng. 8 (5): 417–431.
194 Yang, T.H., Wittmann, C., and Heinzle, E. (2006). Respirometric 13C flux analysis – part II: in vivo flux estimation of lysine-producing Corynebacterium glutamicum. Metab. Eng. 8 (5): 432–446.
195 Wittmann, C. (2010). Analysis and engineering of metabolic pathway fluxes in Corynebacterium glutamicum. Adv. Biochem. Eng. Biotechnol. 120: 21–49.
196 Wittmann, C. and de Graaf, A.A. (2005). Metabolic flux analysis in Corynebacterium glutamicum. In: Handbook of Corynebacterium glutamicum (eds. L. Eggeling and M. Bott), 277–304. Boca Raton: CRC Press.
197 Bartek, T., Blombach, B., Lang, S. et al. (2011). Comparative 13C metabolic flux analysis of pyruvate dehydrogenase complex-deficient, l-valine-producing Corynebacterium glutamicum. Appl. Environ. Microbiol. 77 (18): 6644–6652.
198 Kappelmann, J., Wiechert, W., and Noack, S. (2016). Cutting the Gordian knot: identifiability of anaplerotic reactions in Corynebacterium glutamicum by means of 13C-metabolic flux analysis. Biotechnol. Bioeng. 113 (3): 661–674.
199 Marx, A., de Graaf, A.A., Wiechert, W. et al. (1996). Determination of the fluxes in the central metabolism of Corynebacterium glutamicum by nuclear magnetic resonance spectroscopy combined with metabolite balancing. Biotechnol. Bioeng. 49 (2): 111–129.

200 Sonntag, K., Schwinde, J., de Graaf, A.A. et al. (1995). 13C NMR studies of the fluxes in the central metabolism of Corynebacterium glutamicum during growth and overproduction of amino acids in batch cultures. Appl. Microbiol. Biotechnol. 44 (3–4): 489–495.
201 Becker, J., Klopprogge, C., and Wittmann, C. (2008). Metabolic responses to pyruvate kinase deletion in lysine producing Corynebacterium glutamicum. Microb. Cell Factories 7: 8.
202 Becker, J., Klopprogge, C., Zelder, O. et al. (2005). Amplified expression of fructose 1,6-bisphosphatase in Corynebacterium glutamicum increases in vivo flux through the pentose phosphate pathway and lysine production on different carbon sources. Appl. Environ. Microbiol. 71 (12): 8587–8596.
203 Kiefer, P., Heinzle, E., Zelder, O., and Wittmann, C. (2004). Comparative metabolic flux analysis of lysine-producing Corynebacterium glutamicum cultured on glucose or fructose. Appl. Environ. Microbiol. 70 (1): 229–239.
204 Marx, A., Eikmanns, B.J., Sahm, H. et al. (1999). Response of the central metabolism in Corynebacterium glutamicum to the use of an NADH-dependent glutamate dehydrogenase. Metab. Eng. 1 (1): 35–48.
205 Marx, A., Hans, S., Möckel, B. et al. (2003). Metabolic phenotype of phosphoglucose isomerase mutants of Corynebacterium glutamicum. J. Biotechnol. 104 (1–3): 185–197.
206 Umakoshi, M., Hirasawa, T., Furusawa, C. et al. (2011). Improving protein secretion of a transglutaminase-secreting Corynebacterium glutamicum recombinant strain on the basis of 13C metabolic flux analysis. J. Biosci. Bioeng. 112 (6): 595–601.
207 Santamaria, R., Gil, J., Mesas, J., and Martin, J. (1984). Characterization of an endogenous plasmid and development of cloning vectors and a transformation system in Brevibacterium lactofermentum. J. Gen. Microbiol. 130: 2237–2246.
208 Miwa, K., Matsui, H., Terabe, M. et al. (1984). Cryptic plasmids in glutamic acid producing bacteria. Agric. Biol. Chem. 48 (11): 2901–2903.
209 Katsumata, R., Ozaki, A., Oka, T., and Furuya, A. (1984). Protoplast transformation of glutamate-producing bacteria with plasmid DNA. J. Bacteriol. 159 (1): 306–311.
210 Miwa, K., Matsui, K., Terabe, M. et al. (1985). Construction of novel shuttle vectors and a cosmid vector for the glutamic acid-producing bacteria Brevibacterium lactofermentum and Corynebacterium glutamicum. Gene 39 (2–3): 281–286.
211 Nesvera, J. and Patek, M. (2011). Tools for genetic manipulations in Corynebacterium glutamicum and their applications. Appl. Microbiol. Biotechnol. 90 (5): 1641–1654.
212 Li, Y., Ai, Y., Zhang, J. et al. (2020). A novel expression vector for Corynebacterium glutamicum with an auxotrophy complementation system. Plasmid 107: 102476.
213 Nesvera, J., Patek, M., Hochmannova, J. et al. (1997). Plasmid pGA1 from Corynebacterium glutamicum codes for a gene product that positively influences plasmid copy number. J. Bacteriol. 179 (5): 1525–1532.

214 Choi, J.W., Yim, S.S., and Jeong, K.J. (2018). Development of a high-copy-number plasmid via adaptive laboratory evolution of Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 102 (2): 873–883.
215 Hashiro, S., Mitsuhashi, M., and Yasueda, H. (2019). High copy number mutants derived from Corynebacterium glutamicum cryptic plasmid pAM330 and copy number control. J. Biosci. Bioeng. 127 (5): 529–538.
216 Jäger, W., Schäfer, A., Pühler, A. et al. (1992). Expression of the Bacillus subtilis sacB gene leads to sucrose sensitivity in the gram-positive bacterium Corynebacterium glutamicum but not in Streptomyces lividans. J. Bacteriol. 174 (16): 5462–5465.
217 Patek, M. and Nesvera, J. (2013). Promoters and plasmid vectors of Corynebacterium glutamicum. In: Corynebacterium glutamicum – Biology and Biotechnology (eds. H. Yukawa and M. Inui), 51–88. Berlin-Heidelberg: Springer.
218 Schäfer, A., Tauch, A., Jäger, W. et al. (1994). Small mobilizable multi-purpose cloning vectors derived from the Escherichia coli plasmids pK18 and pK19: selection of defined deletions in the chromosome of Corynebacterium glutamicum. Gene 145 (1): 69–73.
219 Jinek, M., Chylinski, K., Fonfara, I. et al. (2012). A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337 (6096): 816–821.
220 Peng, F., Wang, X., Sun, Y. et al. (2017). Efficient gene editing in Corynebacterium glutamicum using the CRISPR/Cas9 system. Microb. Cell Factories 16 (1): 201.
221 Lee, S.S., Park, J., Heo, Y.B., and Woo, H.M. (2020). Case study of xylose conversion to glycolate in Corynebacterium glutamicum: current limitation and future perspective of the CRISPR-Cas systems. Enzym. Microb. Technol. 132: 109395.
222 Zhang, J., Yang, F., Yang, Y. et al. (2019). Optimizing a CRISPR-Cpf1-based genome engineering system for Corynebacterium glutamicum. Microb. Cell Factories 18 (1): 60.
223 Wang, Y., Liu, Y., Li, J. et al. (2019). Expanding targeting scope, editing window, and base transition capability of base editing in Corynebacterium glutamicum. Biotechnol. Bioeng. 116 (11): 3016–3029.
224 Hirano, S., Abudayyeh, O.O., Gootenberg, J.S. et al. (2019). Structural basis for the promiscuous PAM recognition by Corynebacterium diphtheriae Cas9. Nat. Commun. 10 (1): 1968.
225 Nishimasu, H., Shi, X., Ishiguro, S. et al. (2018). Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361 (6408): 1259–1262.
226 Hu, J.H., Miller, S.M., Geurts, M.H. et al. (2018). Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556 (7699): 57–63.
227 Wendisch, V. (2020). Genome-reduced Corynebacterium glutamicum fit for biotechnological applications. In: Minimal Cells: Design, Construction, Biotechnological Applications (eds. A. Lara and G. Gosset), 95–116. Cham: Springer.

228 Unthan, S., Baumgart, M., Radek, A. et al. (2015). Chassis organism from Corynebacterium glutamicum – a top-down approach to identify and delete irrelevant gene clusters. Biotechnol. J. 10 (2): 290–301.
229 Baumgart, M., Unthan, S., Ruckert, C. et al. (2013). Construction of a prophage-free variant of Corynebacterium glutamicum ATCC 13032 for use as a platform strain for basic research and industrial biotechnology. Appl. Environ. Microbiol. 79 (19): 6006–6015.
230 Binder, S., Siedler, S., Marienhagen, J. et al. (2013). Recombineering in Corynebacterium glutamicum combined with optical nanosensors: a general strategy for fast producer strain generation. Nucleic Acids Res. 41 (12): 6360–6369.
231 Jurischka, S., Bida, A., Dohmen-Olma, D. et al. (2020). A secretion biosensor for monitoring Sec-dependent protein export in Corynebacterium glutamicum. Microb. Cell Factories 19 (1): 11.
232 Tung, Q.N., Loi, V.V., Busche, T. et al. (2019). Stable integration of the Mrx1-roGFP2 biosensor to monitor dynamic changes of the mycothiol redox potential in Corynebacterium glutamicum. Redox Biol. 20: 514–525.
233 Liu, C., Zhang, B., Liu, Y.M. et al. (2018). New intracellular shikimic acid biosensor for monitoring shikimate synthesis in Corynebacterium glutamicum. ACS Synth. Biol. 7 (2): 591–601.
234 Schulte, J., Baumgart, M., and Bott, M. (2017). Development of a single-cell GlxR-based cAMP biosensor for Corynebacterium glutamicum. J. Biotechnol. 258: 33–40.
235 Mahr, R., Gatgens, C., Gatgens, J. et al. (2015). Biosensor-driven adaptive laboratory evolution of l-valine production in Corynebacterium glutamicum. Metab. Eng. 32: 184–194.
236 Mustafi, N., Grunberger, A., Mahr, R. et al. (2014). Application of a genetically encoded biosensor for live cell imaging of l-valine production in pyruvate dehydrogenase complex-deficient Corynebacterium glutamicum strains. PLoS One 9 (1): e85731.
237 Binder, D., Frohwitter, J., Mahr, R. et al. (2016). Light-controlled cell factories: employing photocaged isopropyl-beta-d-thiogalactopyranoside for light-mediated optimization of lac promoter-based gene expression and (+)-valencene biosynthesis in Corynebacterium glutamicum. Appl. Environ. Microbiol. 82 (20): 6141–6149.
238 Buschke, N., Schäfer, R., Becker, J., and Wittmann, C. (2013). Metabolic engineering of industrial platform microorganisms for biorefinery applications – optimization of substrate spectrum and process robustness by rational and evolutive strategies. Bioresour. Technol. 135: 544–554.
239 Kawaguchi, H., Sasaki, M., Vertes, A.A. et al. (2008). Engineering of an l-arabinose metabolic pathway in Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 77 (5): 1053–1062.
240 Kawaguchi, H., Vertes, A.A., Okino, S. et al. (2006). Engineering of a xylose metabolic pathway in Corynebacterium glutamicum. Appl. Environ. Microbiol. 72 (5): 3418–3428.

241 Rittmann, D., Lindner, S.N., and Wendisch, V.F. (2008). Engineering of a glycerol utilization pathway for amino acid production by Corynebacterium glutamicum. Appl. Environ. Microbiol. 74 (20): 6216–6222.
242 Poblete-Castro, I., Hoffmann, S.L., Becker, J., and Wittmann, C. (2020). Cascaded valorization of seaweed using microbial cell factories. Curr. Opin. Biotechnol. 65: 102–113.
243 Kamm, B., Kamm, M., Schmidt, M. et al. (2010). Lignocellulose-based chemical products and product family trees. In: Biorefineries – Industrial Processes and Products (eds. B. Kamm, P.R. Gruber and M. Kamm), 97–149. Weinheim: Wiley-VCH Verlag GmbH & Co. KGaA.
244 Peng, F., Peng, P., Xu, F., and Sun, R.C. (2012). Fractional purification and bioconversion of hemicelluloses. Biotechnol. Adv. 30 (4): 879–903.
245 Sasaki, M., Teramoto, H., Inui, M., and Yukawa, H. (2011). Identification of mannose uptake and catabolism genes in Corynebacterium glutamicum and genetic engineering for simultaneous utilization of mannose and glucose. Appl. Microbiol. Biotechnol. 89 (6): 1905–1916.
246 Buschke, N., Schröder, H., and Wittmann, C. (2011). Metabolic engineering of Corynebacterium glutamicum for production of 1,5-diaminopentane from hemicellulose. Biotechnol. J. 6 (3): 306–317.
247 Mao, Y., Li, G., Chang, Z. et al. (2018). Metabolic engineering of Corynebacterium glutamicum for efficient production of succinate from lignocellulosic hydrolysate. Biotechnol. Biofuels 11: 95.
248 Meiswinkel, T.M., Gopinath, V., Lindner, S.N. et al. (2013). Accelerated pentose utilization by Corynebacterium glutamicum for accelerated production of lysine, glutamate, ornithine and putrescine. Microb. Biotechnol. 6 (2): 131–140.
249 Radek, A., Krumbach, K., Gatgens, J. et al. (2014). Engineering of Corynebacterium glutamicum for minimized carbon loss during utilization of d-xylose containing substrates. J. Biotechnol. 192: 156–160.
250 Radek, A., Tenhaef, N., Müller, M.F. et al. (2017). Miniaturized and automated adaptive laboratory evolution: evolving Corynebacterium glutamicum towards an improved d-xylose utilization. Bioresour. Technol. 245 (Pt B): 1377–1385.
251 Brüsseler, C., Radek, A., Tenhaef, N. et al. (2018). The myo-inositol/proton symporter IolT1 contributes to d-xylose uptake in Corynebacterium glutamicum. Bioresour. Technol. 249: 953–961.
252 Watanabe, A., Hiraga, K., Suda, M. et al. (2015). Functional characterization of Corynebacterium alkanolyticum beta-xylosidase and xyloside ABC transporter in Corynebacterium glutamicum. Appl. Environ. Microbiol. 81 (12): 4173–4183.
253 Imao, K., Konishi, R., Kishida, M. et al. (2017). 1,5-Diaminopentane production from xylooligosaccharides using metabolically engineered Corynebacterium glutamicum displaying beta-xylosidase on the cell surface. Bioresour. Technol. 245 (Pt B): 1684–1691.
254 Kuge, T., Watanabe, A., Hasegawa, S. et al. (2017). Functional analysis of arabinofuranosidases and a xylanase of Corynebacterium alkanolyticum for arabinoxylan utilization in Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 101 (12): 5019–5032.
255 Schneider, J., Niermann, K., and Wendisch, V.F. (2011). Production of the amino acids l-glutamate, l-lysine, l-ornithine and l-arginine from arabinose by recombinant Corynebacterium glutamicum. J. Biotechnol. 154 (2–3): 191–198.
256 Yim, S.S., Choi, J.W., Lee, S.H., and Jeong, K.J. (2016). Modular optimization of a hemicellulose-utilizing pathway in Corynebacterium glutamicum for consolidated bioprocessing of hemicellulosic biomass. ACS Synth. Biol. 5 (4): 334–343.
257 Gopinath, V., Meiswinkel, T.M., Wendisch, V.F., and Nampoothiri, K.M. (2011). Amino acid production from rice straw and wheat bran hydrolysates by recombinant pentose-utilizing Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 92 (5): 985–996.
258 Mindt, M., Heuser, M., and Wendisch, V.F. (2019). Xylose as preferred substrate for sarcosine production by recombinant Corynebacterium glutamicum. Bioresour. Technol. 281: 135–142.
259 Baritugo, K.A., Kim, H.T., David, Y. et al. (2018). Enhanced production of gamma-aminobutyrate (GABA) in recombinant Corynebacterium glutamicum strains from empty fruit bunch biosugar solution. Microb. Cell Factories 17 (1): 129.
260 Perez-Garcia, F., Ziert, C., Risse, J.M., and Wendisch, V.F. (2017). Improved fermentative production of the compatible solute ectoine by Corynebacterium glutamicum from glucose and alternative carbon sources. J. Biotechnol. 258: 59–68.
261 Perez-Garcia, F., Max Risse, J., Friehs, K., and Wendisch, V.F. (2017). Fermentative production of l-pipecolic acid from glucose and alternative carbon sources. Biotechnol. J. 12 (7) https://doi.org/10.1002/biot.201600646.
262 Wang, C., Zhang, H., Cai, H. et al. (2014). Succinic acid production from corn cob hydrolysates by genetically engineered Corynebacterium glutamicum. Appl. Biochem. Biotechnol. 172 (1): 340–350.
263 Chen, T., Zhu, N., and Xia, H. (2014). Aerobic production of succinate from arabinose by metabolically engineered Corynebacterium glutamicum. Bioresour. Technol. 151: 411–414.
264 Lange, J., Müller, F., Takors, R., and Blombach, B. (2018). Harnessing novel chromosomal integration loci to utilize an organosolv-derived hemicellulose fraction for isobutanol production with engineered Corynebacterium glutamicum. Microb. Biotechnol. 11 (1): 257–263.
265 Dhar, K.S., Wendisch, V.F., and Nampoothiri, K.M. (2016). Engineering of Corynebacterium glutamicum for xylitol production from lignocellulosic pentose sugars. J. Biotechnol. 230: 63–71.
266 Yim, S.S., Choi, J.W., Lee, S.H. et al. (2017). Engineering of Corynebacterium glutamicum for consolidated conversion of hemicellulosic biomass into xylonic acid. Biotechnol. J. 12 (11) https://doi.org/10.1002/biot.201700040.
267 Choi, J.W., Jeon, E.J., and Jeong, K.J. (2019). Recent advances in engineering Corynebacterium glutamicum for utilization of hemicellulosic biomass. Curr. Opin. Biotechnol. 57: 17–24.

268 Chen, Z., Huang, J., Wu, Y. et al. (2017). Metabolic engineering of Corynebacterium glutamicum for the production of 3-hydroxypropionic acid from glucose and xylose. Metab. Eng. 39: 151–158.
269 Lee, S.S., Choi, J.I., and Woo, H.M. (2019). Bioconversion of xylose to ethylene glycol and glycolate in engineered Corynebacterium glutamicum. ACS Omega 4 (25): 21279–21287.
270 Kawaguchi, H., Sasaki, M., Vertes, A.A. et al. (2009). Identification and functional analysis of the gene cluster for l-arabinose utilization in Corynebacterium glutamicum. Appl. Environ. Microbiol. 75 (11): 3419–3429.
271 Sasaki, M., Jojima, T., Kawaguchi, H. et al. (2009). Engineering of pentose transport in Corynebacterium glutamicum to improve simultaneous utilization of mixed sugars. Appl. Microbiol. Biotechnol. 85 (1): 105–115.
272 Chauhan, P.S. and Gupta, N. (2017). Insight into microbial mannosidases: a review. Crit. Rev. Biotechnol. 37 (2): 190–201.
273 Chin, Y.W., Park, J.B., Park, Y.C. et al. (2013). Metabolic engineering of Corynebacterium glutamicum to produce GDP-l-fucose from glucose and mannose. Bioprocess Biosyst. Eng. 36 (6): 749–756.
274 Adham, S.A., Campelo, A.B., Ramos, A., and Gil, J.A. (2001). Construction of a xylanase-producing strain of Brevibacterium lactofermentum by stable integration of an engineered xysA gene from Streptomyces halstedii JM8. Appl. Environ. Microbiol. 67 (12): 5425–5430.
275 Adham, S.A., Honrubia, P., Diaz, M. et al. (2001). Expression of the genes coding for the xylanase Xys1 and the cellulase Cel1 from the straw-decomposing Streptomyces halstedii JM8 cloned into the amino-acid producer Brevibacterium lactofermentum ATCC13869. Arch. Microbiol. 177 (1): 91–97.
276 Tateno, T., Hatada, K., Tanaka, T. et al. (2009). Development of novel cell surface display in Corynebacterium glutamicum using porin. Appl. Microbiol. Biotechnol. 84 (4): 733–739.
277 John, R.P., Anisha, G.S., Nampoothiri, K.M., and Pandey, A. (2011). Micro and macroalgal biomass: a renewable source for bioethanol. Bioresour. Technol. 102 (1): 186–193.
278 Wei, N., Quarterman, J., and Jin, Y.S. (2013). Marine macroalgae: an untapped resource for producing fuels and chemicals. Trends Biotechnol. 31 (2): 70–77.
279 van Hal, J.W., Huijgen, W.J., and Lopez-Contreras, A.M. (2014). Opportunities and challenges for seaweed in the biobased economy. Trends Biotechnol. 32 (5): 231–233.
280 Menetrez, M.Y. (2012). An overview of algae biofuel production and potential environmental impact. Environ. Sci. Technol. 46 (13): 7073–7085.
281 Maneein, S., Milledge, J.J., Nielsen, B.V., and Harvey, P.J. (2018). A review of seaweed pre-treatment methods for enhanced biofuel production by anaerobic digestion or fermentation. Fermentation 4 (4) https://doi.org/10.3390/fermentation4040100.
282 Rasmus, B., Valderrama, D., Sims, N. et al. (2016). Seaweed Aquaculture for Food Security, Income Generation and Environmental Health, 1–17. Washington, DC: World Bank Group.

283 Peng, X., Okai, N., Vertes, A.A. et al. (2011). Characterization of the mannitol catabolic operon of Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 91 (5): 1375–1387.
284 Laslo, T., von Zaluskowski, P., Gabris, C. et al. (2012). Arabitol metabolism of Corynebacterium glutamicum and its regulation by AtlR. J. Bacteriol. 194 (5): 941–955.
285 Ohnishi, J., Katahira, R., Mitsuhashi, S. et al. (2005). A novel gnd mutation leading to increased l-lysine production in Corynebacterium glutamicum. FEMS Microbiol. Lett. 242 (2): 265–274.
286 Fernandez-Rodriguez, J., Erdocia, X., Sanchez, C. et al. (2017). Lignin depolymerization for phenolic monomers production by sustainable processes. J. Energy Chem. 26 (4): 622–631.
287 Rinaldi, R., Jastrzebski, R., Clough, M.T. et al. (2016). Paving the way for lignin valorisation: recent advances in bioengineering, biorefining and catalysis. Angew. Chem. Int. Ed. Engl. 55 (29): 8164–8215.
288 Kohlstedt, M., Starck, S., Barton, N. et al. (2018). From lignin to nylon: cascaded chemical and biochemical conversion using metabolically engineered Pseudomonas putida. Metab. Eng. 47: 279–293.
289 Vardon, D.R., Franden, M.A., Johnson, C.W. et al. (2015). Adipic acid production from lignin. Energy Environ. Sci. 8 (2): 617–628.
290 Barton, N., Horbal, L., Starck, S. et al. (2018). Enabling the valorization of guaiacol-based lignin: integrated chemical and biochemical production of cis,cis-muconic acid using metabolically engineered Amycolatopsis sp. ATCC 39116. Metab. Eng. 45: 200–210.
291 Kasai, D., Masai, E., Miyauchi, K. et al. (2005). Characterization of the gallate dioxygenase gene: three distinct ring cleavage dioxygenases are involved in syringate degradation by Sphingomonas paucimobilis SYK-6. J. Bacteriol. 187 (15): 5067–5074.
292 Sonoki, T., Takahashi, K., Sugita, H. et al. (2018). Glucose-free cis,cis-muconic acid production via new metabolic designs corresponding to the heterogeneity of lignin. ACS Sustain. Chem. Eng. 6 (1): 1256–1264.
293 Becker, J., Kuhl, M., Kohlstedt, M. et al. (2018). Metabolic engineering of Corynebacterium glutamicum for the production of cis,cis-muconic acid from lignin. Microb. Cell Factories 17 (1): 115.
294 Kinoshita, S., Nakayama, K., and Kitada, S. (1958). l-Lysine production using microbial auxotrophs. J. Gen. Appl. Microbiol. 4 (2): 128–129.
295 Kimura, E. (2005). l-Glutamate production. In: Handbook of Corynebacterium glutamicum (eds. L. Eggeling and M. Bott), 439–463. Boca Raton: CRC Press.
296 Wittmann, C. and Becker, J. (2007). The l-lysine story. From metabolic pathways to industrial production. In: Amino Acid Biosynthesis – Pathways, Regulation and Metabolic Engineering (ed. V.F. Wendisch), 40–68. Berlin-Heidelberg: Springer.
297 Sugimoto, S. and Shiio, I. (1977). Enzymes of the tryptophan synthetic pathway in Brevibacterium flavum. J. Biochem. 81 (4): 823–833.

References

298 Sugimoto, S., Nakagawa, M., Tsuchida, T., and Shiio, I. (1973). Regulation of aromatic amino acid biosynthesis and production of tyrosine and phenylalanine in Brevibacterium flavum. Agric. Biol. Chem. 37 (10): 2327–2336.
299 Udaka, S. and Kinoshita, S. (1958). Studies on l-ornithine fermentation I – the biosynthetic pathway of l-ornithine in Micrococcus glutamicus. J. Gen. Appl. Microbiol. 4 (4): 272–275.
300 Udaka, S. and Kinoshita, S. (1958). Studies on l-ornithine fermentation II – the change of fermentation product by a feedback type mechanism. J. Gen. Appl. Microbiol. 4 (4): 283–288.
301 Kubota, K., Onoda, T., Kamijo, H. et al. (1973). Production of l-arginine by mutants of glutamic acid-producing bacteria. J. Gen. Appl. Microbiol. 19: 339–352.
302 Kiefer, P., Heinzle, E., and Wittmann, C. (2002). Influence of glucose, fructose and sucrose as carbon sources on kinetics and stoichiometry of lysine production by Corynebacterium glutamicum. J. Ind. Microbiol. Biotechnol. 28 (6): 338–343.
303 Dong, X., Zhao, Y., Hu, J. et al. (2016). Attenuating l-lysine production by deletion of ddh and lysE and their effect on l-threonine and l-isoleucine production in Corynebacterium glutamicum. Enzym. Microb. Technol. 93–94: 70–78.
304 Vogt, M., Krumbach, K., Bang, W.G. et al. (2015). The contest for precursors: channelling l-isoleucine synthesis in Corynebacterium glutamicum without byproduct formation. Appl. Microbiol. Biotechnol. 99 (2): 791–800.
305 Wang, J., Wen, B., Xu, Q. et al. (2013). Enhancing (l)-isoleucine production by thrABC overexpression combined with alaT deletion in Corynebacterium glutamicum. Appl. Biochem. Biotechnol. 171 (1): 20–30.
306 Yin, L., Hu, X., Xu, D. et al. (2012). Co-expression of feedback-resistant threonine dehydratase and acetohydroxy acid synthase increase l-isoleucine production in Corynebacterium glutamicum. Metab. Eng. 14 (5): 542–550.
307 Xie, X., Xu, L., Shi, J. et al. (2012). Effect of transport proteins on l-isoleucine production with the l-isoleucine-producing strain Corynebacterium glutamicum YILW. J. Ind. Microbiol. Biotechnol. 39 (10): 1549–1556.
308 Hashiguchi, K., Kojima, H., Sato, K., and Sano, K. (1997). Effects of an Escherichia coli ilvA mutant gene encoding feedback-resistant threonine deaminase on l-isoleucine production by Brevibacterium flavum. Biosci. Biotechnol. Biochem. 61 (1): 105–108.
309 Colon, G.E., Nguyen, T.T., Jetten, M.S. et al. (1995). Production of isoleucine by overexpression of ilvA in a Corynebacterium lactofermentum threonine producer. Appl. Microbiol. Biotechnol. 43 (3): 482–488.
310 Eikmanns, B.J., Eggeling, L., and Sahm, H. (1993). Molecular aspects of lysine, threonine, and isoleucine biosynthesis in Corynebacterium glutamicum. Antonie Van Leeuwenhoek 64 (2): 145–163.
311 Hasegawa, S., Suda, M., Uematsu, K. et al. (2013). Engineering of Corynebacterium glutamicum for high-yield l-valine production under oxygen deprivation conditions. Appl. Environ. Microbiol. 79 (4): 1250–1257.


12 Metabolic Engineering of Corynebacterium glutamicum

312 Buchholz, J., Schwentner, A., Brunnenkan, B. et al. (2013). Platform engineering of Corynebacterium glutamicum with reduced pyruvate dehydrogenase complex activity for improved production of l-lysine, l-valine, and 2-ketoisovalerate. Appl. Environ. Microbiol. 79 (18): 5566–5575.
313 Hou, X., Chen, X., Zhang, Y. et al. (2012). (l)-Valine production with minimization of by-products’ synthesis in Corynebacterium glutamicum and Brevibacterium flavum. Amino Acids 43 (6): 2301–2311.
314 Krause, F.S., Henrich, A., Blombach, B. et al. (2010). Increased glucose utilization in Corynebacterium glutamicum by use of maltose, and its application for the improvement of l-valine productivity. Appl. Environ. Microbiol. 76 (1): 370–374.
315 Holatko, J., Elisakova, V., Prouza, M. et al. (2009). Metabolic engineering of the l-valine biosynthesis pathway in Corynebacterium glutamicum using promoter activity modulation. J. Biotechnol. 139 (3): 203–210.
316 Blombach, B., Arndt, A., Auchter, M., and Eikmanns, B.J. (2009). l-Valine production during growth of pyruvate dehydrogenase complex-deficient Corynebacterium glutamicum in the presence of ethanol or by inactivation of the transcriptional regulator SugR. Appl. Environ. Microbiol. 75 (4): 1197–1200.
317 Blombach, B., Schreiner, M.E., Bartek, T. et al. (2008). Corynebacterium glutamicum tailored for high-yield l-valine production. Appl. Microbiol. Biotechnol. 79 (3): 471–479.
318 Blombach, B., Schreiner, M.E., Holatko, J. et al. (2007). l-Valine production with pyruvate dehydrogenase complex-deficient Corynebacterium glutamicum. Appl. Environ. Microbiol. 73 (7): 2079–2084.
319 Elisakova, V., Patek, M., Holatko, J. et al. (2005). Feedback-resistant acetohydroxy acid synthase increases valine production in Corynebacterium glutamicum. Appl. Environ. Microbiol. 71 (1): 207–213.
320 Radmacher, E., Vaitsikova, A., Burger, U. et al. (2002). Linking central metabolism with increased pathway flux: l-valine accumulation by Corynebacterium glutamicum. Appl. Environ. Microbiol. 68 (5): 2246–2250.
321 Han, G., Xu, N., Sun, X. et al. (2020). Improvement of l-valine production by atmospheric and room temperature plasma mutagenesis and high-throughput screening in Corynebacterium glutamicum. ACS Omega 5 (10): 4751–4758.
322 Bampidis, V., Azimonti, G., Bastos, M.L. et al. (2019). Safety and efficacy of l-valine produced using Corynebacterium glutamicum CGMCC 11675 for all animal species. EFSA J. 17 (3): e05611.
323 Ma, Y., Cui, Y., Du, L. et al. (2018). Identification and application of a growth-regulated promoter for improving l-valine production in Corynebacterium glutamicum. Microb. Cell Factories 17 (1): 185.
324 Wang, X., Zhang, H., and Quinn, P.J. (2018). Production of l-valine from metabolically engineered Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 102 (10): 4319–4330.
325 Chen, C., Li, Y., Hu, J. et al. (2015). Metabolic engineering of Corynebacterium glutamicum ATCC13869 for l-valine production. Metab. Eng. 29: 66–75.


326 Li, Y., Cong, H., Liu, B. et al. (2016). Metabolic engineering of Corynebacterium glutamicum for methionine production by removing feedback inhibition and increasing NADPH level. Antonie Van Leeuwenhoek 109 (9): 1185–1197.
327 Si, M., Zhang, L., Chaudhry, M.T. et al. (2015). Corynebacterium glutamicum methionine sulfoxide reductase A uses both mycoredoxin and thioredoxin for regeneration and oxidative stress resistance. Appl. Environ. Microbiol. 81 (8): 2781–2796.
328 Ikeda, M. and Katsumata, R. (1999). Hyperproduction of tryptophan by Corynebacterium glutamicum with the modified pentose phosphate pathway. Appl. Environ. Microbiol. 65 (6): 2497–2502.
329 Ikeda, M., Nakanishi, K., Kino, K., and Katsumata, R. (1994). Fermentative production of tryptophan by a stable recombinant strain of Corynebacterium glutamicum with a modified serine-biosynthetic pathway. Biosci. Biotechnol. Biochem. 58 (4): 674–678.
330 Han, G., Hu, X., and Wang, X. (2016). Overexpression of methionine adenosyltransferase in Corynebacterium glutamicum for production of S-adenosyl-l-methionine. Biotechnol. Appl. Biochem. 63 (5): 679–689.
331 Han, G., Hu, X., Qin, T. et al. (2016). Metabolic engineering of Corynebacterium glutamicum ATCC13032 to produce S-adenosyl-l-methionine. Enzym. Microb. Technol. 83: 14–21.
332 Kim, J.Y., Lee, Y.A., Wittmann, C., and Park, J.B. (2013). Production of non-proteinogenic amino acids from alpha-keto acid precursors with recombinant Corynebacterium glutamicum. Biotechnol. Bioeng. 110 (11): 2846–2855.
333 Veldmann, K.H., Dachwitz, S., Risse, J.M. et al. (2019). Bromination of l-tryptophan in a fermentative process with Corynebacterium glutamicum. Front. Bioeng. Biotechnol. 7: 219.
334 Veldmann, K.H., Minges, H., Sewald, N. et al. (2019). Metabolic engineering of Corynebacterium glutamicum for the fermentative production of halogenated tryptophan. J. Biotechnol. 291: 7–16.
335 Li, H.W., Su, Q.H., Li, Z.J., and Dai, L.Y. (2005). Comprehensive report on Chinese glutamic acid production trade. Acad. Period. Farm Prod. Process. 42: 65–67.
336 Wang, Y., Cao, G., Xu, D. et al. (2018). A novel Corynebacterium glutamicum l-glutamate exporter. Appl. Environ. Microbiol. 84 (6): e02691-17. https://doi.org/10.1128/AEM.02691-17.
337 Lv, Y., Wu, Z., Han, S. et al. (2011). Genome sequence of Corynebacterium glutamicum S9114, a strain for industrial production of glutamate. J. Bacteriol. 193 (21): 6096–6097.
338 Wen, J. and Bao, J. (2019). Engineering Corynebacterium glutamicum triggers glutamic acid accumulation in biotin-rich corn stover hydrolysate. Biotechnol. Biofuels 12: 86.
339 Wen, J., Xiao, Y., Liu, T. et al. (2018). Rich biotin content in lignocellulose biomass plays the key role in determining cellulosic glutamic acid accumulation by Corynebacterium glutamicum. Biotechnol. Biofuels 11: 132.


340 Lee, J.H. and Wendisch, V.F. (2017). Production of amino acids – genetic and metabolic engineering approaches. Bioresour. Technol. 245 (Pt B): 1575–1587.
341 Seibold, G., Auchter, M., Berens, S. et al. (2006). Utilization of soluble starch by a recombinant Corynebacterium glutamicum strain: growth and lysine production. J. Biotechnol. 124 (2): 381–391.
342 Tateno, T., Fukuda, H., and Kondo, A. (2007). Direct production of l-lysine from raw corn starch by Corynebacterium glutamicum secreting Streptococcus bovis alpha-amylase using cspB promoter and signal sequence. Appl. Microbiol. Biotechnol. 77 (3): 533–541.
343 Tateno, T., Fukuda, H., and Kondo, A. (2007). Production of l-lysine from starch by Corynebacterium glutamicum displaying alpha-amylase on its cell surface. Appl. Microbiol. Biotechnol. 74 (6): 1213–1220.
344 Adachi, N., Takahashi, C., Ono-Murota, N. et al. (2013). Direct l-lysine production from cellobiose by Corynebacterium glutamicum displaying beta-glucosidase on its cell surface. Appl. Microbiol. Biotechnol. 97 (16): 7165–7172.
345 Meiswinkel, T.M., Rittmann, D., Lindner, S.N., and Wendisch, V.F. (2013). Crude glycerol-based production of amino acids and putrescine by Corynebacterium glutamicum. Bioresour. Technol. 145: 254–258.
346 Takeno, S., Murata, R., Kobayashi, R. et al. (2010). Engineering of Corynebacterium glutamicum with an NADPH-generating glycolytic pathway for l-lysine production. Appl. Environ. Microbiol. 76 (21): 7154–7160.
347 Xu, J., Han, M., Zhang, J. et al. (2014). Metabolic engineering Corynebacterium glutamicum for the l-lysine production by increasing the flux into l-lysine biosynthetic pathway. Amino Acids 46 (9): 2165–2175.
348 Bommareddy, R.R., Chen, Z., Rappert, S., and Zeng, A.P. (2014). A de novo NADPH generation pathway for improving lysine production of Corynebacterium glutamicum by rational design of the coenzyme specificity of glyceraldehyde 3-phosphate dehydrogenase. Metab. Eng. 25: 30–37.
349 Takeno, S., Hori, K., Ohtani, S. et al. (2016). l-Lysine production independent of the oxidative pentose phosphate pathway by Corynebacterium glutamicum with the Streptococcus mutans gapN gene. Metab. Eng. 37: 1–10.
350 Shiio, I., Yokota, A., and Sugimoto, S.I. (1987). Effect of pyruvate kinase deficiency on l-lysine productivities of mutants with feedback-resistant aspartokinases. Agric. Biol. Chem. 51 (9): 2485–2493.
351 Gubler, M., Jetten, M., Lee, S.H., and Sinskey, A.J. (1994). Cloning of the pyruvate kinase gene (pyk) of Corynebacterium glutamicum and site-specific inactivation of pyk in a lysine-producing Corynebacterium lactofermentum strain. Appl. Environ. Microbiol. 60 (7): 2494–2500.
352 Jetten, M.S. and Sinskey, A.J. (1995). Purification and properties of oxaloacetate decarboxylase from Corynebacterium glutamicum. Antonie Van Leeuwenhoek 67 (2): 221–227.
353 Peters-Wendisch, P.G., Schiel, B., Wendisch, V.F. et al. (2001). Pyruvate carboxylase is a major bottleneck for glutamate and lysine production by Corynebacterium glutamicum. J. Mol. Microbiol. Biotechnol. 3 (2): 295–300.


354 Chen, Z., Bommareddy, R.R., Frank, D. et al. (2014). Deregulation of feedback inhibition of phosphoenolpyruvate carboxylase for improved lysine production in Corynebacterium glutamicum. Appl. Environ. Microbiol. 80 (4): 1388–1393.
355 Becker, J., Klopprogge, C., Schröder, H., and Wittmann, C. (2009). Metabolic engineering of the tricarboxylic acid cycle for improved lysine production by Corynebacterium glutamicum. Appl. Environ. Microbiol. 75 (24): 7866–7869.
356 Lam, P.Y., Tobimatsu, Y., Takeda, Y. et al. (2017). Disrupting flavone synthase II alters lignin and improves biomass digestibility. Plant Physiol. 174 (2): 972–985.
357 Shaw-Reid, C.A., McCormick, M.M., Sinskey, A.J., and Stephanopoulos, G. (1999). Flux through the tetrahydrodipicolinate succinylase pathway is dispensable for l-lysine production in Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 51 (3): 325–333.
358 Eggeling, L., Oberle, S., and Sahm, H. (1998). Improved l-lysine yield with Corynebacterium glutamicum: use of dapA resulting in increased flux combined with growth limitation. Appl. Microbiol. Biotechnol. 49 (1): 24–30.
359 Sonntag, K., Eggeling, L., De Graaf, A.A., and Sahm, H. (1993). Flux partitioning in the split pathway of lysine synthesis in Corynebacterium glutamicum. Quantification by 13C- and 1H-NMR spectroscopy. Eur. J. Biochem. 213 (3): 1325–1331.
360 Kim, H.M., Heinzle, E., and Wittmann, C. (2006). Deregulation of aspartokinase by single nucleotide exchange leads to global flux rearrangement in the central metabolism of Corynebacterium glutamicum. J. Microbiol. Biotechnol. 16 (8): 1174–1179.
361 Schendzielorz, G., Dippong, M., Grünberger, A. et al. (2014). Taking control over control: use of product sensing in single cells to remove flux control at key enzymes in biosynthesis pathways. ACS Synth. Biol. 3 (1): 21–29.
362 Chen, Z., Meyer, W., Rappert, S. et al. (2011). Coevolutionary analysis enabled rational deregulation of allosteric enzyme inhibition in Corynebacterium glutamicum for lysine production. Appl. Environ. Microbiol. 77 (13): 4352–4360.
363 Xu, J.Z., Yang, H.K., Liu, L.M. et al. (2018). Rational modification of Corynebacterium glutamicum dihydrodipicolinate reductase to switch the nucleotide-cofactor specificity for increasing l-lysine production. Biotechnol. Bioeng. 115 (7): 1764–1777.
364 Hochheim, J., Kranz, A., Krumbach, K. et al. (2017). Mutations in MurE, the essential UDP-N-acetylmuramoylalanyl-d-glutamate 2,6-diaminopimelate ligase of Corynebacterium glutamicum: effect on l-lysine formation and analysis of systemic consequences. Biotechnol. Lett. 39 (2): 283–288.
365 Xafenias, N., Kmezik, C., and Mapelli, V. (2017). Enhancement of anaerobic lysine production in Corynebacterium glutamicum electrofermentations. Bioelectrochemistry 117: 40–47.
366 Vassilev, I., Giesselmann, G., Schwechheimer, S.K. et al. (2018). Anodic electro-fermentation: anaerobic production of l-lysine by recombinant Corynebacterium glutamicum. Biotechnol. Bioeng. 115 (6): 1499–1508.


367 Adkins, J., Jordan, J., and Nielsen, D.R. (2013). Engineering Escherichia coli for renewable production of the 5-carbon polyamide building-blocks 5-aminovalerate and glutarate. Biotechnol. Bioeng. 110 (6): 1726–1734.
368 Park, S.J., Oh, Y.H., Noh, W. et al. (2014). High-level conversion of l-lysine into 5-aminovalerate that can be used for nylon 6,5 synthesis. Biotechnol. J. 9 (10): 1322–1328.
369 Shin, J.H., Park, S.H., Oh, Y.H. et al. (2016). Metabolic engineering of Corynebacterium glutamicum for enhanced production of 5-aminovaleric acid. Microb. Cell Factories 15 (1): 174.
370 Joo, J.C., Oh, Y.H., Yu, J.H. et al. (2017). Production of 5-aminovaleric acid in recombinant Corynebacterium glutamicum strains from a Miscanthus hydrolysate solution prepared by a newly developed Miscanthus hydrolysis process. Bioresour. Technol. 245 (Pt B): 1692–1700.
371 Jorge, J.M.P., Perez-Garcia, F., and Wendisch, V.F. (2017). A new metabolic route for the fermentative production of 5-aminovalerate from glucose and alternative carbon sources. Bioresour. Technol. 245 (Pt B): 1701–1709.
372 Rohles, C.M., Gläser, L., Kohlstedt, M. et al. (2018). A bio-based route to the carbon-5 chemical glutaric acid and to bionylon-6,5 using metabolically engineered Corynebacterium glutamicum. Green Chem. 20 (20): 4662–4674.
373 Oren, A. and Gunde-Cimerman, N. (2007). Mycosporines and mycosporine-like amino acids: UV protectants or multipurpose secondary metabolites? FEMS Microbiol. Lett. 269 (1): 1–10.
374 Tsuge, Y., Kawaguchi, H., Yamamoto, S. et al. (2018). Metabolic engineering of Corynebacterium glutamicum for production of sunscreen shinorine. Biosci. Biotechnol. Biochem. 82 (7): 1252–1259.
375 Miyamoto, K.T., Komatsu, M., and Ikeda, H. (2014). Discovery of gene cluster for mycosporine-like amino acid biosynthesis from Actinomycetales microorganisms and production of a novel mycosporine-like amino acid by heterologous expression. Appl. Environ. Microbiol. 80 (16): 5028–5036.
376 Czech, L., Hermann, L., Stöveken, N. et al. (2018). Role of the extremolytes ectoine and hydroxyectoine as stress protectants and nutrients: genetics, phylogenomics, biochemistry, and structural analysis. Genes (Basel) 9 (4): 177.
377 Pastor, J.M., Salvador, M., Argandona, M. et al. (2010). Ectoines in cell stress protection: uses and biotechnological production. Biotechnol. Adv. 28 (6): 782–801.
378 Kunte, H.J., Lentzen, G., and Galinski, E.A. (2014). Industrial production of the cell protectant ectoine: protection mechanisms, processes, and products. Curr. Biotechnol. 3: 10–25.
379 Marini, A., Reinelt, K., Krutmann, J., and Bilstein, A. (2014). Ectoine-containing cream in the treatment of mild to moderate atopic dermatitis: a randomised, comparator-controlled, intra-individual double-blind, multi-center trial. Skin Pharmacol. Physiol. 27 (2): 57–65.
380 Unfried, K., Kramer, U., Sydlik, U. et al. (2016). Reduction of neutrophilic lung inflammation by inhalation of the compatible solute ectoine: a randomized trial with elderly individuals. Int. J. Chron. Obstruct. Pulmon. Dis. 11: 2573–2583.


381 Werkhauser, N., Bilstein, A., and Sonnemann, U. (2014). Treatment of allergic rhinitis with ectoine containing nasal spray and eye drops in comparison with azelastine containing nasal spray and eye drops or with cromoglycic acid containing nasal spray. J. Allergy (Cairo). 2014: 176597.
382 Becker, J., Schäfer, R., Kohlstedt, M. et al. (2013). Systems metabolic engineering of Corynebacterium glutamicum for production of the chemical chaperone ectoine. Microb. Cell Factories 12: 110.
383 Tani, Y., Miyake, R., Yukami, R. et al. (2015). Functional expression of l-lysine alpha-oxidase from Scomber japonicus in Escherichia coli for one-pot synthesis of l-pipecolic acid from dl-lysine. Appl. Microbiol. Biotechnol. 99 (12): 5045–5054.
384 Fujii, T., Mukaihara, M., Agematu, H., and Tsunekawa, H. (2002). Biotransformation of l-lysine to l-pipecolic acid catalyzed by l-lysine 6-aminotransferase and pyrroline-5-carboxylate reductase. Biosci. Biotechnol. Biochem. 66 (3): 622–627.
385 Yi, Y., Sheng, H., Li, Z., and Ye, Q. (2014). Biosynthesis of trans-4-hydroxyproline by recombinant strains of Corynebacterium glutamicum and Escherichia coli. BMC Biotechnol. 14: 44.
386 Bach, T.M. and Takagi, H. (2013). Properties, metabolisms, and applications of (L)-proline analogues. Appl. Microbiol. Biotechnol. 97 (15): 6623–6634.
387 Salis, H.M. (2011). The ribosome binding site calculator. Methods Enzymol. 498: 19–42.
388 Espah Borujeni, A. and Salis, H.M. (2016). Translation initiation is controlled by RNA folding kinetics via a ribosome drafting mechanism. J. Am. Chem. Soc. 138 (22): 7016–7023.
389 Ma, H., Fan, X., Cai, N. et al. (2020). Efficient fermentative production of l-theanine by Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 104 (1): 119–130. https://doi.org/10.1007/s00253-019-10255-w.
390 Shi, F., Niu, T., and Fang, H. (2015). 4-Hydroxyisoleucine production of recombinant Corynebacterium glutamicum ssp. lactofermentum under optimal corn steep liquor limitation. Appl. Microbiol. Biotechnol. 99 (9): 3851–3863.
391 Zhang, C., Li, Y., Ma, J. et al. (2018). High production of 4-hydroxyisoleucine in Corynebacterium glutamicum by multistep metabolic engineering. Metab. Eng. 49: 287–298.
392 Wendisch, V.F., Bott, M., and Eikmanns, B.J. (2006). Metabolic engineering of Escherichia coli and Corynebacterium glutamicum for biotechnological production of organic acids and amino acids. Curr. Opin. Microbiol. 9 (3): 268–274.
393 Lange, A., Becker, J., Schulze, D. et al. (2017). Bio-based succinate from sucrose: high-resolution 13C metabolic flux analysis and metabolic engineering of the rumen bacterium Basfia succiniciproducens. Metab. Eng. 44: 198–212.
394 Becker, J., Reinefeld, J., Stellmacher, R. et al. (2013). Systems-wide analysis and engineering of metabolic pathway fluxes in bio-succinate producing Basfia succiniciproducens. Biotechnol. Bioeng. 110 (11): 3013–3023.


395 Otten, A., Brocker, M., and Bott, M. (2015). Metabolic engineering of Corynebacterium glutamicum for the production of itaconate. Metab. Eng. 30: 156–165.
396 Tsuge, Y., Hasunuma, T., and Kondo, A. (2015). Recent advances in the metabolic engineering of Corynebacterium glutamicum for the production of lactate and succinate from renewable resources. J. Ind. Microbiol. Biotechnol. 42 (3): 375–389.
397 Tsuge, Y., Kato, N., Yamamoto, S. et al. (2019). Enhanced production of d-lactate from mixed sugars in Corynebacterium glutamicum by overexpression of glycolytic genes encoding phosphofructokinase and triosephosphate isomerase. J. Biosci. Bioeng. 127 (3): 288–293.
398 Tsuge, Y., Yamamoto, S., Kato, N. et al. (2015). Overexpression of the phosphofructokinase encoding gene is crucial for achieving high production of d-lactate in Corynebacterium glutamicum under oxygen deprivation. Appl. Microbiol. Biotechnol. 99 (11): 4679–4689.
399 Okino, S., Suda, M., Fujikura, K. et al. (2008). Production of d-lactic acid by Corynebacterium glutamicum under oxygen deprivation. Appl. Microbiol. Biotechnol. 78 (3): 449–454.
400 Inui, M., Murakami, S., Okino, S. et al. (2004). Metabolic analysis of Corynebacterium glutamicum during lactate and succinate productions under oxygen deprivation conditions. J. Mol. Microbiol. Biotechnol. 7 (4): 182–196.
401 Okino, S., Noburyu, R., Suda, M. et al. (2008). An efficient succinic acid production process in a metabolically engineered Corynebacterium glutamicum strain. Appl. Microbiol. Biotechnol. 81 (3): 459–464.
402 Yang, J., Kim, B., Kim, H. et al. (2015). Industrial production of 2,3-butanediol from the engineered Corynebacterium glutamicum. Appl. Biochem. Biotechnol. 176 (8): 2303–2313.
403 Inui, M., Kawaguchi, H., Murakami, S. et al. (2004). Metabolic engineering of Corynebacterium glutamicum for fuel ethanol production under oxygen-deprivation conditions. J. Mol. Microbiol. Biotechnol. 8 (4): 243–254.
404 Siebert, D. and Wendisch, V.F. (2015). Metabolic pathway engineering for production of 1,2-propanediol and 1-propanol by Corynebacterium glutamicum. Biotechnol. Biofuels 8: 91.
405 Lange, J., Müller, F., Bernecker, K. et al. (2017). Valorization of pyrolysis water: a biorefinery side stream, for 1,2-propanediol production with engineered Corynebacterium glutamicum. Biotechnol. Biofuels 10: 277.
406 Chen, Z., Huang, J., Wu, Y., and Liu, D. (2016). Metabolic engineering of Corynebacterium glutamicum for the de novo production of ethylene glycol from glucose. Metab. Eng. 33: 12–18.
407 Huang, J., Wu, Y., Wu, W. et al. (2017). Cofactor recycling for co-production of 1,3-propanediol and glutamate by metabolically engineered Corynebacterium glutamicum. Sci. Rep. 7: 42246.
408 Lee, H.N., Shin, W.S., Seo, S.Y. et al. (2018). Corynebacterium cell factory design and culture process optimization for muconic acid biosynthesis. Sci. Rep. 8. https://doi.org/10.1038/s41598-018-36320-4.


409 Trivedi, S., Fluck, D., Sehgal, A. et al. (2008). Cleaning compositions incorporating green solvents and methods for use. US8222194B2.
410 Ni, Y., Shi, F., and Wang, N. (2015). Specific gamma-aminobutyric acid decomposition by gabP and gabT under neutral pH in recombinant Corynebacterium glutamicum. Biotechnol. Lett. 37 (11): 2219–2227.
411 Chang, Z., Dai, W., Mao, Y. et al. (2020). Engineering Corynebacterium glutamicum for the efficient production of 3-hydroxypropionic acid from a mixture of glucose and acetate via the malonyl-CoA pathway. Catalysts 10 (2): 203. https://doi.org/10.3390/catal10020203.
412 Celinska, E. and Grajek, W. (2009). Biotechnological production of 2,3-butanediol – current state and prospects. Biotechnol. Adv. 27 (6): 715–725.
413 Rados, D., Carvalho, A.L., Wieschalka, S. et al. (2015). Engineering Corynebacterium glutamicum for the production of 2,3-butanediol. Microb. Cell Factories 14 (1): 171.
414 Jojima, T., Noburyu, R., Sasaki, M. et al. (2015). Metabolic engineering for improved production of ethanol by Corynebacterium glutamicum. Appl. Microbiol. Biotechnol. 99 (3): 1165–1172.
415 Maga, J.A. (1982). Pyrazines in foods: an update. Crit. Rev. Food Sci. Nutr. 16 (1): 1–48.
416 Eng, T., Sasaki, Y., Herbert, R.A. et al. (2020). Production of tetra-methylpyrazine using engineered Corynebacterium glutamicum. Metab. Eng. Commun. 10: e00115.
417 Rodrigues, A.L., Becker, J., de Souza Lima, A.O. et al. (2014). Systems metabolic engineering of Escherichia coli for gram scale production of the antitumor drug deoxyviolacein from glycerol. Biotechnol. Bioeng. 111 (11): 2280–2289.
418 Rodrigues, A.L., Trachtmann, N., Becker, J. et al. (2013). Systems metabolic engineering of Escherichia coli for production of the antitumor drugs violacein and deoxyviolacein. Metab. Eng. 20: 29–41.
419 Rodrigues, A.L., Göcke, Y., Bolten, C. et al. (2012). Microbial production of the drugs violacein and deoxyviolacein: analytical development and strain comparison. Biotechnol. Lett. 34 (4): 717–720.
420 Taniguchi, H., Henke, N.A., Heider, S.A.E., and Wendisch, V.F. (2017). Overexpression of the primary sigma factor gene sigA improved carotenoid production by Corynebacterium glutamicum: application to production of beta-carotene and the non-native linear C50 carotenoid bisanhydrobacterioruberin. Metab. Eng. Commun. 4: 1–11.
421 Sumi, S., Suzuki, Y., Matsuki, T. et al. (2019). Light-inducible carotenoid production controlled by a MarR-type regulator in Corynebacterium glutamicum. Sci. Rep. 9 (1): 13136.
422 Kang, M.K., Eom, J.H., Kim, Y. et al. (2014). Biosynthesis of pinene from glucose using metabolically-engineered Corynebacterium glutamicum. Biotechnol. Lett. 36 (10): 2069–2077.
423 Henke, N.A., Wichmann, J., Baier, T. et al. (2018). Patchoulol production with metabolically engineered Corynebacterium glutamicum. Genes (Basel) 9 (4): 219. https://doi.org/10.3390/genes9040219.


424 Frohwitter, J., Heider, S.A., Peters-Wendisch, P. et al. (2014). Production of the sesquiterpene (+)-valencene by metabolically engineered Corynebacterium glutamicum. J. Biotechnol. 191: 205–213.
425 Becker, J. and Wittmann, C. (2016). Systems metabolic engineering of Escherichia coli for the heterologous production of high value molecules – a veteran at new shores. Curr. Opin. Biotechnol. 42: 178–188.
426 Cheng, F., Yu, H., and Stephanopoulos, G. (2019). Engineering Corynebacterium glutamicum for high-titer biosynthesis of hyaluronic acid. Metab. Eng. 55: 276–289. https://doi.org/10.1016/j.ymben.2019.07.003.
427 Cheng, F., Luozhong, S., Guo, Z. et al. (2017). Enhanced biosynthesis of hyaluronic acid using engineered Corynebacterium glutamicum via metabolic pathway regulation. Biotechnol. J. 12 (10). https://doi.org/10.1002/biot.201700191.
428 Papaneophytou, C. (2019). Design of experiments as a tool for optimization in recombinant protein biotechnology: from constructs to crystals. Mol. Biotechnol. 61: 873–891.
429 Rosano, G.L., Morales, E.S., and Ceccarelli, E.A. (2019). New tools for recombinant protein production in Escherichia coli: a 5-year update. Protein Sci. 28 (8): 1412–1422.
430 Cai, D., Rao, Y., Zhan, Y. et al. (2019). Engineering Bacillus for efficient production of heterologous protein: current progress, challenge and prospect. J. Appl. Microbiol. 126 (6): 1632–1642.
431 Wittmann, C., Weber, J., Betiku, E. et al. (2007). Response of fluxome and metabolome to temperature-induced recombinant protein synthesis in Escherichia coli. J. Biotechnol. 132 (4): 375–384.
432 Zhang, W., Yang, Y., Liu, X. et al. (2019). Development of a secretory expression system with high compatibility between expression elements and an optimized host for endoxylanase production in Corynebacterium glutamicum. Microb. Cell Factories 18 (1): 72.
433 Cui, C.H., Jeon, B.M., Fu, Y. et al. (2019). High-density immobilization of a ginsenoside-transforming beta-glucosidase for enhanced food-grade production of minor ginsenosides. Appl. Microbiol. Biotechnol. 103 (17): 7003–7015.
434 Pereira, P., Pedro, A.Q., Tomas, J. et al. (2016). Advances in time course extracellular production of human pre-miR-29b from Rhodovulum sulfidophilum. Appl. Microbiol. Biotechnol. 100 (8): 3723–3734.
435 Baronti, L., Karlsson, H., Marusic, M., and Petzold, K. (2018). A guide to large-scale RNA sample preparation. Anal. Bioanal. Chem. 410 (14): 3239–3252.
436 Wendisch, V.F., Brito, L.F., Gil Lopez, M. et al. (2016). The flexible feedstock concept in industrial biotechnology: metabolic engineering of Escherichia coli, Corynebacterium glutamicum, Pseudomonas, Bacillus and yeast strains for access to alternative carbon sources. J. Biotechnol. 234: 139–157.
437 Eggeling, L., Bott, M., and Marienhagen, J. (2015). Novel screening methods – biosensors. Curr. Opin. Biotechnol. 35: 30–36.


13 Metabolic Engineering of Bacillus – New Tools, Strains, and Concepts

Mathis Appelbaum and Thomas Schweder

1 Department of Pharmaceutical Biotechnology, Institute of Pharmacy, University of Greifswald, Greifswald, Germany
2 Institute of Marine Biotechnology, Greifswald, Germany

13.1 Introduction

Metabolic engineering approaches, and the optimization of microbial cell factories in general, depend on detailed knowledge of the specific physiology of the expression host and of the cellular functions required for efficient product formation. Suitable molecular biological technologies that allow detailed functional genome analyses are also essential. Our knowledge of the cellular physiology and adaptation mechanisms of microbial host systems and their metabolic pathways has increased tremendously in the multiomics era. This progress has been driven by high-throughput sequencing techniques and by improved analytical methods for the identification and quantification of proteins, RNAs, and metabolites. These techniques produce large data sets whose functional analysis is challenging and requires suitable bioinformatic tools for data processing [1]. Analyzing bacterial cells globally is no longer a bottleneck, but understanding, or even predicting, the function and interaction of specific proteins and of biological systems in a holistic way remains challenging. For this reason, engineering in the life sciences lags far behind engineering in nonbiological fields or in chemical synthesis. The ambitious research fields of systems and synthetic biology aim to close this gap by applying engineering principles such as standardization, modularity, and (mathematically) predictable behavior [2]. Such a holistic view of biological systems requires suitable model organisms. Bacillus subtilis is one of the best characterized microbial cell factories. For almost six decades, this Gram-positive bacterium has served not only as a model organism but also for the molecular analysis of bacterial cell differentiation mechanisms [3–5]. Moreover, B. subtilis and related species are important industrial workhorses for the production of valuable enzymes and of primary and secondary metabolites [6]. However, their application in synthetic biology and metabolic engineering has not yet been widely explored [7].

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.


B. subtilis is an undemanding host characterized by high growth rates in both minimal and complex media. Its ability to grow on cheap carbon and energy sources contributes to its robustness in industrial fermentation processes [6]. Furthermore, members of the genus Bacillus are highly efficient in secreting target enzymes into the medium [8–11]. Finally, the safety of these widely used industrial hosts is reflected by the generally recognized as safe (GRAS) and qualified presumption of safety (QPS) status of selected industrial Bacillus strains [8]. The genetics, biochemistry, and physiology of Bacillus, in particular of B. subtilis as the Gram-positive model bacterium, are well characterized, including the function of central metabolic pathways, cellular differentiation, and (specific) stress adaptation mechanisms. The basis for advanced functional analyses was laid by the determination of the complete genome sequence of B. subtilis 168 as early as 1997 [12, 13]. Subsequently, genome-wide analyses of targeted gene mutations enabled the identification of essential genes in this model bacterium [14–17]. In-depth global transcriptome [18, 19], proteome [20–24], and metabolome [25–27] profiling under defined environmental conditions then enabled a detailed exploration of genome-encoded functions. These approaches allowed for the identification of new protein functions and metabolic pathways as well as the determination of specific regulatory circuits [28] in this bacterium. Established online resources such as SubtiWiki for genome annotation [29]; SporeWeb for illustrating the dynamics of the complete sporulation cycle [30]; the pathway/genome database BsubCyc and the genome-scale metabolic model iBsu1103 [12, 31, 32]; and DBTBS for the search of promoter sequences and transcriptional regulators [33] are valuable databases to integrate and search these complex data sets for in-depth functional analysis of B. subtilis as a model organism [32, 34].
Owing to the demands of a biobased, sustainable economy, robust cell factories are required for new industrial bioprocesses. The numerous established industrial Bacillus production strains and fermentation processes, together with the in-depth knowledge of their genetics and physiology, provide a strong basis for this future development.

13.2 The Determination of Essential Physiological Traits and Circuits

To gain detailed insights into physiological traits and the gene regulatory circuits controlling specific cellular activities, global gene expression analyses, such as transcriptomics or proteomics, are required. Both techniques have been used intensively to determine stress and starvation adaptation mechanisms in Gram-positive model bacteria like B. subtilis (reviewed by [35]) or Bacillus licheniformis (reviewed by [36]). Global gene expression analyses have revealed a comprehensive set of genes whose expression is specifically induced either by one stressor (specific stress genes) or by different environmental stress situations (general stress genes). At least four major strategies for the adaptation to changing environmental conditions have been determined in these model bacteria [37]. In the first stage of adapting to limiting conditions, the expression of vegetative genes for central metabolic pathways is downregulated [38–42]. This process is mainly controlled by conserved RelA/SpoT homologues via the intracellular pool of the alarmones guanosine tetra- and pentaphosphate ((p)ppGpp) [43]. The second cellular response during starvation is the induction of general survival mechanisms that provide global preventive protection against other potentially detrimental environmental conditions. In Bacillus, this mechanism is controlled by the alternative sigma factor SigB [44], which regulates the general stress genes to adapt in anticipation of different potential physical stress conditions [45]. With regard to activation of the general stress response, B. licheniformis is an exception, as it lacks the energy limitation-induced mechanism of the SigB-controlled general stress response [40, 42, 46]. The third adaptation strategy of Bacillus is the specific induction of regulons involved in the formation of different cell populations with physiologically distinct cell types. This differentiation process includes the formation of a motile, chemotactic state, the development of genetic competence for the uptake of DNA from the environment, and the formation of biofilm matrix components [47–52]. At this stage of the cellular adaptation process, the secretion of extracellular enzymes is increased to mobilize alternative nutrient sources [53, 54]. Finally, when facing prolonged nutrient starvation, Bacillus initiates the sporulation cascade to enter a dormant state [55]. The early stage of sporulation is accompanied by the killing of sibling cells to release nutrients to the remaining cell population, which allows the population to delay commitment to sporulation [56] and possibly enables the cells to complete their complex and energy-demanding sporulation process [57].
It has been suggested that this so-called “cannibalism” mechanism ensures that a critical number of still slowly growing cells can successfully and flexibly compete with the microbial community for limited nutrient resources [58]. This provides a fitness advantage, because once cells have initiated the sporulation cascade, they are unable to rapidly resume vegetative growth even if nutrients become available again [56]. These differentiation processes and the formation of heterogeneous cell populations serve to ensure the survival of Bacillus under adverse environmental conditions in its natural habitats. It has been shown that bacteria are able to recognize specific stress situations within seconds, also during large-scale fermentation processes [59]. The induced set of specific and general stress genes encompasses comprehensive regulons, which can comprise more than 150 genes, as shown for the SigB-controlled regulon in B. subtilis [18, 60, 61]. From a bioprocess point of view, the cellular stress response and differentiation can hamper productivity in industrial fermentation processes. Increased expression of large sets of stress genes presents a metabolic burden that reduces the productivity of the host cells [37, 62]. Furthermore, due to gradients in large-scale industrial bioreactors, microbial cells continuously experience variations in medium composition. These gradients are mainly caused by the feeding strategy of the frequently used fed-batch fermentation processes and by insufficient mixing, which changes the local concentrations of nutrients. It has been shown that the size of these zones, the residence time of the cells in them, and the type of bacterial response influence the outcome of a bioprocess [62].


The productivity of industrial bacterial fermentation processes is also determined by the efficiency of substrate utilization. A comparative characterization of the maintenance energy coefficient revealed distinct differences in the maintenance metabolism of different Bacillus species [63]. It was concluded that B. subtilis might not be the optimal cell factory from a bioenergetic point of view, while the sporulation-deficient strain B. licheniformis T380B displayed the lowest maintenance energy coefficient compared with B. subtilis and B. amyloliquefaciens strains [63]. The B. licheniformis strains were characterized by more robust growth with weaker acetate overflow. One explanation for the different physiological phenotypes of B. subtilis and B. licheniformis could be the glyoxylate shunt of the tricarboxylic acid (TCA) cycle, which is missing in B. subtilis [46] and whose absence hampers the utilization of overflow metabolites. It could be speculated that, owing to its missing glyoxylate shunt and its specific ecological niche adaptation, B. subtilis depends on the energy stress-controlled branch of the SigB-regulated general stress response under in situ conditions, as mentioned above. This could ensure the required resilience of selected cell populations under nutrient-limited growth conditions in nature. In contrast, B. licheniformis possesses the anaplerotic glyoxylate shunt of the TCA cycle, which enables this microbe to grow on overflow metabolites such as acetate, acetoin, or 2,3-butanediol, resulting in improved process characteristics [46]. In accordance with this hypothesis, introduction of the B. licheniformis genes encoding isocitrate lyase (aceA) and malate synthase (aceB) into the genome of B. subtilis not only enabled the recombinant cells to utilize acetate but also resulted in more robust growth and improved enzyme production [46].
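Why the glyoxylate shunt matters for growth on acetate can be made explicit by carbon bookkeeping. The Python sketch below uses textbook stoichiometry and is an illustration, not a model taken from [46]: it sums the reactions of one turn of the oxidative TCA cycle and of one turn of the glyoxylate shunt. Only carbon-containing species are tracked; cofactors (NAD+, CoA, GTP) are deliberately omitted as a simplifying assumption.

```python
from collections import Counter

# Carbon atoms per species (cofactors ignored)
CARBONS = {"acetyl-CoA": 2, "oxaloacetate": 4, "citrate": 6, "isocitrate": 6,
           "2-oxoglutarate": 5, "succinyl-CoA": 4, "succinate": 4,
           "fumarate": 4, "malate": 4, "glyoxylate": 2, "CO2": 1}

# Each reaction: (consumed, produced)
TCA = [
    ({"acetyl-CoA": 1, "oxaloacetate": 1}, {"citrate": 1}),   # citrate synthase
    ({"citrate": 1}, {"isocitrate": 1}),                      # aconitase
    ({"isocitrate": 1}, {"2-oxoglutarate": 1, "CO2": 1}),     # isocitrate dehydrogenase
    ({"2-oxoglutarate": 1}, {"succinyl-CoA": 1, "CO2": 1}),   # 2-oxoglutarate dehydrogenase
    ({"succinyl-CoA": 1}, {"succinate": 1}),                  # succinyl-CoA synthetase
    ({"succinate": 1}, {"fumarate": 1}),                      # succinate dehydrogenase
    ({"fumarate": 1}, {"malate": 1}),                         # fumarase
    ({"malate": 1}, {"oxaloacetate": 1}),                     # malate dehydrogenase
]

GLYOXYLATE = [
    ({"acetyl-CoA": 1, "oxaloacetate": 1}, {"citrate": 1}),   # citrate synthase
    ({"citrate": 1}, {"isocitrate": 1}),                      # aconitase
    ({"isocitrate": 1}, {"succinate": 1, "glyoxylate": 1}),   # isocitrate lyase
    ({"glyoxylate": 1, "acetyl-CoA": 1}, {"malate": 1}),      # malate synthase
    ({"malate": 1}, {"oxaloacetate": 1}),                     # malate dehydrogenase
]

def net(pathway):
    """Net stoichiometry of a pathway: positive = produced, negative = consumed."""
    total = Counter()
    for consumed, produced in pathway:
        for species, n in consumed.items():
            total[species] -= n
        for species, n in produced.items():
            total[species] += n
    return {s: n for s, n in total.items() if n != 0}

def carbon_balance(stoichiometry):
    """Carbon produced minus carbon consumed; 0 means carbon is conserved."""
    return sum(CARBONS[s] * n for s, n in stoichiometry.items())

if __name__ == "__main__":
    print("oxidative TCA turn:", net(TCA))
    print("glyoxylate shunt turn:", net(GLYOXYLATE))
```

The oxidative TCA cycle converts each acetyl-CoA completely into CO2 and regenerates, but does not increase, the oxaloacetate pool. The shunt bypasses both decarboxylation steps, so two acetyl-CoA yield one net succinate that can feed gluconeogenesis and biomass formation, which is the missing reasoning step behind growth on acetate.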
Up to 70% of the carbon source can be consumed by maintenance metabolism in microbial fermentation processes [64]. This illustrates the potential impact of the maintenance physiology on the productivity of a host cell.
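Maintenance coefficients such as those compared above are commonly interpreted through the classical Pirt relation, q_s = μ/Y_max + m_s, which splits the specific substrate uptake rate into a growth-associated term and a growth-independent maintenance term m_s. The Python sketch below uses this relation to show how the maintenance share of substrate consumption grows as the specific growth rate falls. The parameter values are hypothetical placeholders, not coefficients measured in [63] or [64].

```python
def substrate_uptake_rate(mu: float, y_max: float, m_s: float) -> float:
    """Pirt relation: q_s = mu / Y_max + m_s.

    mu    specific growth rate (1/h)
    y_max true (maximal) biomass yield on substrate (g biomass / g substrate)
    m_s   maintenance coefficient (g substrate / (g biomass * h))
    """
    return mu / y_max + m_s

def maintenance_fraction(mu: float, y_max: float, m_s: float) -> float:
    """Share of consumed substrate used for maintenance rather than growth."""
    return m_s / substrate_uptake_rate(mu, y_max, m_s)

def growth_rate_for_fraction(fraction: float, y_max: float, m_s: float) -> float:
    """Growth rate at which maintenance takes the given share of the substrate.

    Solving m_s / (mu / Y_max + m_s) = fraction for mu gives
    mu = Y_max * m_s * (1 - fraction) / fraction.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return y_max * m_s * (1.0 - fraction) / fraction

if __name__ == "__main__":
    # Hypothetical example parameters, not measured Bacillus values
    Y_MAX, M_S = 0.5, 0.05
    for mu in (0.5, 0.1, 0.02, 0.005):
        share = maintenance_fraction(mu, Y_MAX, M_S)
        print(f"mu = {mu:5.3f} 1/h -> maintenance share {share:.0%}")
    mu_70 = growth_rate_for_fraction(0.7, Y_MAX, M_S)
    print(f"70% maintenance share reached at mu = {mu_70:.4f} 1/h")
```

With these placeholder values, maintenance takes about 5% of the substrate at μ = 0.5 h⁻¹ but about 83% at μ = 0.005 h⁻¹, illustrating why a lower maintenance coefficient matters most in slow-growing production phases.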

13.3 The Minimal Cell Concept

13.3.1 Why Minimal Genomes?

The rational design of metabolic pathways and cell factories in general requires a deep understanding of the host’s physiology. Although B. subtilis 168 is one of the best characterized model organisms, about 350 of its genes are annotated as “hypothetical” and 370 genes encode proteins of unknown function according to https://bsubcyc.org [31]. Consequently, genes of unknown or unclear function and, more importantly, our limited understanding of complex regulatory networks hamper the predictability of the host’s physiology [2]. One strategy to understand and predict the physiology of a bacterial cell is to reduce complexity by constructing strains with a minimal set of genes required for robust growth and cellular function [65]. Simplified cells may serve as chassis for the construction of rationally designed expression platforms by integrating the genes and pathways required for product formation. In addition, reducing the number of genes related to functions not required for product formation may reduce metabolic burden and channel cellular resources toward product formation [10, 14, 66, 67]. Genome reduction is achieved by either bottom-up or top-down approaches. The most prominent example of a bottom-up approach is the semiartificially constructed Mycoplasma mycoides strain JCVI-syn3.0, whose genome was reduced by about 51% from the original 1079 kb and contains only 473 genes [68]. Recently, more complex genomes have been studied using minimal genome approaches, including those of Caulobacter crescentus (bottom-up approach), Corynebacterium glutamicum, Escherichia coli, Pseudomonas putida, Lactococcus lactis, Streptomyces strains, and Bacillus subtilis (all top-down), and even yeast [67, 69–79]. B. subtilis has been studied intensively and a wide range of tools for its genetic manipulation is available, making it an excellent model system for genome reduction [65]. In addition, its role as a model organism (e.g. for cellular differentiation) and its industrial relevance make B. subtilis a good candidate for genome streamlining projects. The estimated minimal number of essential genes in B. subtilis is 253, including two essential RNA-coding genes, ssrA (tmRNA) and scr (signal recognition particle RNA) [14]. To evaluate the physiological relevance of all genes, single-gene knockout strains were systematically constructed, revealing 6.2% of all genes to be essential for growth of B. subtilis in complex medium (lysogeny broth) at 37 °C [14, 15]. However, the minimal number of genes required to sustain life is higher than the number of essential genes alone, due to the functional redundancy of genes and pathways for essential cellular functions (e.g. paralogous proteins, ribosomal proteins, rRNAs, tRNAs, uptake or de novo synthesis of amino acids) [80]. These effects cannot be uncovered by single-gene knockout approaches but have to be considered in minimal genome projects.
A detailed description of the criteria for the target selection in genome reduction approaches and the construction of a minimal cell was previously published [65].

13.3.2 Overview of Genome Reduction Projects in B. subtilis

Milestones of genome reduction studies with B. subtilis are illustrated in Figure 13.1. The first report on a genome-reduced B. subtilis strain focused on the deletion of prophages, prophage-like elements, and selected gene clusters for the synthesis of secondary metabolites, resulting in a 7.7% reduction of the genome [81]. The resulting strain B. subtilis Δ6 showed no distinctive properties and displayed almost identical growth rates, biomass yield (g/g glucose), and metabolite flux through central carbon metabolic pathways compared with the parental strain [81]. The genome of this strain was further reduced by deleting genes related to the remaining prophage-like sequences as well as to the production of further secondary metabolites [83]. In addition, further production-relevant processes were targeted by preventing spore formation (deletion of sigG, sigE, spoIIGA) and the killing of sibling cells (deletion of skf, sdp) [58]. Growth of the resulting strain IG-Bs20-4, whose genome is 13.6% smaller, was again not hampered in either minimal or complex medium. This demonstrates that the genome of B. subtilis can be significantly reduced without detrimental effects on cell physiology under laboratory conditions [83]. Another strain derived from this lineage, B. subtilis IIG-Bs-20-5-1, showed even higher growth rates and increased biomass yields, indicating an improved conversion of glucose into biomass [87].

Figure 13.1 Genome reduction in B. subtilis 168. The timeline illustrates major steps toward a B. subtilis minimal cell. Supporting information on essential genes and proof-of-concept studies demonstrating improved productivity are shown. The size of the genome reduction is indicated as a percentage of the unmodified B. subtilis 168 genome (gray circles).

In contrast, B. subtilis BSK814, engineered for the production of guanosine and thymidine, was characterized by a lower growth rate while its biomass yield was increased [84]. In another study, sequential large-scale deletions similar to those in the above-mentioned strains were conducted. The genome of the resulting strain B. subtilis MGB469 was further reduced by deleting six regions previously identified to contain nonessential genes, resulting in a total genome reduction of about 25% in the final strain MG1M [66]. While MGB469 showed wildtype-like growth and productivity, both parameters were unstable in MG1M [66, 67], which demonstrates the risk of running into dead ends when constructing minimal cells. Therefore, starting again from B. subtilis MGB469, the effect of deleting each of 63 individual gene clusters was analyzed with single-deletion strains regarding growth in complex (LB) and minimal medium (SMM). Deletions of 11 of these gene clusters that did not severely affect growth were combined, resulting in strain B. subtilis MGB874 with a 20.7% reduced genome [67]. The strain displayed regular cell morphology and chromosome distribution, both indicators of cell viability. In contrast to the single-deletion strains based on MGB469, MGB874 was slightly affected in growth, with the doubling time increasing from 21 to 27 minutes in LB medium and from 67 to 100 minutes in SMM [67, 82]. These findings show that predicting the effect of genome reduction is difficult, even if the deleted genes encode known cellular functions or were characterized in single knockout strains before. Therefore, step-by-step evaluation after each round of genome reduction is required. Recently, another genome-wide study extended the knowledge of regions in the B. subtilis genome that are not required for robust growth [34].
In total, 146 individual regions ranging from 2 to 159 kb were deleted, and the strains were tested for growth in LB and defined minimal medium. The experimental data generated in this study were used to refine and verify the B. subtilis metabolic model iBsu1103, thereby providing valuable information for model-assisted metabolic engineering and genome reduction [32, 34]. Interestingly, these data also provided new insights into the composition of LB medium and into the metabolites that it contains or that can be derived from the degradation of its oligomeric and polymeric components [34]. Within the MiniBacillus framework, B. subtilis has undergone the most extensive genome reduction of all bacteria so far, with the exception of M. mycoides [75]. The two most advanced B. subtilis strains, PG10 and PS38, whose genomes are reduced by about 36% and 42%, respectively, were characterized by a multiomics approach, providing valuable insights into the role of essential and nonessential genes. The relative number of essential genes in the genome-reduced strains increased from 6% to 9% of the whole genome, while the proportion of the corresponding proteins decreased from 57% in the parental strain to 50% in the genome-reduced strains. In contrast, the proportion of proteins related to changes in the lifestyle of Bacillus (e.g. cell differentiation) increased, although many genes in this category were deleted because they are considered dispensable under laboratory conditions [75]. More importantly, 28% of the genome of the reference strain that had been deleted in PG10 and PS38 contributed less than 2.5% of the total proteome. Many of these proteins are related to unknown functions, stress, or cell differentiation. Thus, the minimization achieved at the genomic level is not equally reflected at the proteomic level, since many of these genes might be required under very specific conditions only [75]. It is important to note that the analysis of Reuß et al. [75] reflects gene expression profiles from the mid-exponential growth phase, which excludes specific lifestyle categories required under limiting growth conditions (e.g. sporulation and biofilm formation) or under specific environmental stress conditions [50]. Consequently, the metabolic burden caused by genes in this category is low during exponential growth but may increase from the late exponential growth phase on. This is of special interest when constructing new chassis for industrial biotechnology, as the late exponential or transition growth phase coincides with increased production of exoenzymes or secondary metabolites [50, 54]. Similar to the previous report on the genome-reduced strain MGB874 [67], growth of PG10 and PS38 was slightly affected, with a reduced cell yield and the doubling time increasing from 22 minutes to 29 and 33 minutes, respectively [75]. The reduced growth rate correlates with a lower relative amount of ribosomal proteins, although genes encoding ribosomal proteins were not deleted in strains PG10 and PS38 [75]. Interestingly, changes in the arginine degradation pathway and in the intracellular glutamate levels were observed in both genome reduction projects [67, 75, 82]. In strain MGB874 [67], the deletion of rocDEF-rocR, involved in the arginine degradation pathway, compensated for the lower cell yield of the genome-reduced precursor strains of MGB874 [82].
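Doubling times are easier to compare as specific growth rates, μ = ln 2 / t_d. The short Python sketch below converts the doubling times reported above for MGB874, PG10, and PS38 into growth rates and into the growth rate of each genome-reduced strain relative to its parent. It is simple arithmetic on the published values, not a reanalysis of the original data; the cultivation medium for the PG10 and PS38 values is not stated in this section, so those entries are labeled by strain only.

```python
import math

def specific_growth_rate(doubling_time_min: float) -> float:
    """mu = ln(2) / t_d, returned in 1/h for a doubling time given in minutes."""
    return math.log(2) / (doubling_time_min / 60.0)

def relative_growth_rate(parent_td_min: float, reduced_td_min: float) -> float:
    """mu(reduced) / mu(parent); equivalent to t_d(parent) / t_d(reduced)."""
    return specific_growth_rate(reduced_td_min) / specific_growth_rate(parent_td_min)

# (parent doubling time, genome-reduced doubling time) in minutes, as reported in the text
REPORTED = {
    "MGB874 in LB": (21, 27),
    "MGB874 in SMM": (67, 100),
    "PG10": (22, 29),
    "PS38": (22, 33),
}

if __name__ == "__main__":
    for strain, (parent, reduced) in REPORTED.items():
        print(f"{strain}: mu {specific_growth_rate(parent):.2f} -> "
              f"{specific_growth_rate(reduced):.2f} 1/h, "
              f"relative rate {relative_growth_rate(parent, reduced):.0%}")
```

Because μ is inversely proportional to t_d, the relative rate is just the ratio of doubling times: PS38 grows at two thirds of the rate of its parent (22/33), and MGB874 in SMM at 67% (67/100).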
The transcriptional activator RocR regulates the expression of genes responsible for arginine uptake and degradation, the final step of which is the deamination of glutamate to 2-oxoglutarate catalyzed by the glutamate dehydrogenase RocG [88, 89]. RocG also inhibits glutamate synthesis from glutamine and 2-oxoglutarate by indirectly repressing transcription of the glutamate synthase genes gltAB [90]. Thus, deletion of rocR results in strongly reduced expression of rocG, which in turn increases the intracellular glutamate pool by reducing glutamate degradation and relieving the repression of its synthesis [82]. In addition, further considerable changes in metabolism were observed, many of them resulting in or from increased glutamate levels and thereby contributing to an overall increased metabolic activity [67]. Higher cell yield thus likely results, at least in part, from enhanced protein synthesis, with glutamate as the key metabolite for the synthesis of further amino acids, proteins, and almost all N-containing compounds [82, 91, 92]. In contrast to strain MGB874, deletion of rocR in PG10 did not affect growth, and high activity of the arginine degradation pathway was neither required for nor detrimental to proper growth of PG10 [75]. Instead, the strong upregulation of genes related to arginine degradation (rocABC, rocDEF), without a physiological demand or benefit regarding growth, was attributed to changes in the regulatory network: increased levels of the sigma factor SigL, which is required for transcription of rocABC and rocDEF, high activity of the transcriptional activator RocR, and increased levels of its effector molecule ornithine caused the strong increase in expression levels [75].
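The regulatory chain described above for MGB874 can be written as a small Boolean model. The Python sketch below is a deliberate simplification under stated assumptions: it assumes inducing conditions and sufficient SigL so that the presence of rocR is the only variable input, treats the indirect repression of gltAB by RocG as a direct logical NOT, and reports the glutamate pool only qualitatively. It reproduces the direction of the effect attributed to rocR deletion in [82], not its magnitude, and it does not capture PG10, in which rocR deletion had no effect on growth.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RocState:
    rocG_expressed: bool         # glutamate dehydrogenase gene
    gltAB_expressed: bool        # glutamate synthase genes
    glutamate_degradation: bool  # RocG-catalyzed deamination to 2-oxoglutarate
    glutamate_pool: str          # qualitative: "high" or "low"

def roc_circuit(rocR_present: bool) -> RocState:
    """Qualitative state of the simplified RocR -> RocG -| gltAB circuit."""
    # RocR activates the roc genes, including rocG (inducers and SigL assumed present)
    rocG = rocR_present
    # RocG indirectly represses gltAB; modeled here as a direct NOT
    gltAB = not rocG
    # RocG converts glutamate to 2-oxoglutarate
    degradation = rocG
    # Synthesis on and degradation off -> glutamate accumulates
    pool = "high" if gltAB and not degradation else "low"
    return RocState(rocG, gltAB, degradation, pool)

if __name__ == "__main__":
    for present in (True, False):
        label = "rocR+" if present else "delta rocR"
        print(label, roc_circuit(present))
```

Encoding the circuit this way makes the double effect of a single deletion visible: removing rocR switches off the degradation route and, through loss of RocG, switches on the synthesis route at the same time.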

13.3.3 Productivity of Genome-Reduced Strains

Besides the strong interest in genome-reduced strains from a basic research point of view, genome reduction has also gained attention in industrial biotechnology as a promising strategy to obtain more robust strains with enhanced productivity. In contrast to the changing conditions in the soil, the natural habitat of B. subtilis, the controlled environment of industrial fermentation processes does not require complex adaptation processes, whereas productivity might be hampered by byproduct formation and the waste of cellular resources [10, 66, 93]. So far, only a limited number of studies have analyzed the biotechnological potential of genome-reduced Bacillus expression hosts [67, 84, 86, 87]. These include reports on a more than twofold increase in the secretion of the extracellular alkaline cellulase Egl-237 and on increased production of the subtilisin-like alkaline protease M [67, 94]. In addition, nucleosides, which can be used as food additives or precursor molecules, were produced at higher levels in a genome-reduced B. subtilis strain [84]. Most recently, strain PG10 [75] was shown to enable the secretory production of four staphylococcal proteins/antigens that could not be produced in the parental strain B. subtilis 168 [86]. Improved secretion and translational efficiency, as well as the lack of eight extracellular proteases in PG10, resulted in higher expression levels and product stability of these difficult-to-express proteins [86]. While the beneficial effect of these properties is obvious, their physiological basis remains unclear, with the exception of reduced extracellular proteolytic activity, a known bottleneck frequently addressed in the optimization of Bacillus expression platforms [10]. Interestingly, heterologous expression of the cellulase Egl-237 continued to increase throughout the 72-hour cultivation period in the genome-reduced strain MGB874, while cellulase activity levels remained constant after 24 hours for the progenitor strain B. subtilis 168 [67].
It could be speculated that this continued increase in expression levels is supported by enhanced carbon source utilization in case of MGB874. Maltose consumption was comparable between the two strains until the early stationary growth phase, but more pronounced in mid to late stationary phase in case of MGB874 [67]. Transcriptome analyses revealed a higher metabolic activity in MGB874 with genes for carbon and nitrogen source utilization as well as electron transport and ATP synthesis being upregulated during the stationary growth phase. Thus, the specific genome engineering of this mutant resulted in a more efficient nutrient consumption. In addition, changes in the regulatory network were observed including higher (and earlier) expression of degU, delayed inactivation of AbrB, and delayed activation of sporulation-related genes, indicating an extended transition state [67]. Another effect that increased the productivity in MGB874 was shown to result from a promoter and plasmid copy number effect, both contributing to higher transcript levels of the target gene egl-237. The alkaline cellulase was expressed from a pUB110-based expression vector using a vegetative SigA-type promoter. As the copy number of pUB110 decreases in stationary phase, maintenance of the vegetative state in MGB874, resulting from metabolic changes described above, may have prolonged the production period [82]. However, the above-mentioned points might not be the sole cause for increased productivity.


Synergistic effects of genes with known or unknown function in the deleted regions could also contribute to the improved performance of B. subtilis MGB874 [82]. Improved gene expression capacity and metabolic efficiency seem to be among the major beneficial traits that can be achieved by genome reduction, as observed in different species [67, 69, 73, 76, 82, 84, 86, 93]. In addition, the deletion of prophages and cannibalism factors (skfA, sdpC), targeted in most genome-reduced B. subtilis strains, likely contributes to reduced cell lysis, which leads to increased biomass and higher productivity [84, 87, 95, 96]. Besides the opportunity to construct improved strains, genome reduction also poses several challenges. Combined sequential large-scale deletions may result in undesirable phenotypes with unstable growth and productivity or increased cell lysis [66, 67, 75, 86]. Unknown synergistic effects, synthetic lethality (co-lethality), and quasi-essential genes necessary for robust growth require an iterative strain construction process, although this is not unique to genome minimization but a general property of strain engineering [34, 68]. Despite careful target selection and evaluation by single knockouts, the outcome of combined sequential large-scale deletions is difficult to predict. Thus, genome reduction remains a rather randomized approach when it comes to strain optimization [66, 67, 75]. Nevertheless, the construction of minimal cells that show robust growth and physiological parameters provides a powerful basis for increasing predictability when engineering organisms for higher productivity.
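Why single-knockout data cannot predict combined deletions can be illustrated with a toy model. In the Python sketch below, the genome, gene names, and function assignments are entirely hypothetical: each essential function is provided by one or more genes, a strain is viable as long as every essential function retains at least one gene, and two paralogous gene pairs stand in for the functional redundancy discussed in Section 13.3.1. Every gene that passes the single-knockout screen is individually dispensable, yet deleting all of them together is lethal.

```python
from itertools import combinations

# Hypothetical toy genome: names and function assignments are illustrative only
ESSENTIAL_FUNCTIONS = {
    "function_1": {"geneA"},           # provided by a single gene
    "function_2": {"geneB", "geneC"},  # redundant paralogues
    "function_3": {"geneD", "geneE"},  # redundant paralogues
}
GENOME = {"geneA", "geneB", "geneC", "geneD", "geneE", "geneF"}  # geneF: truly dispensable

def viable(deleted: set) -> bool:
    """Viable if every essential function retains at least one gene."""
    return all(genes - deleted for genes in ESSENTIAL_FUNCTIONS.values())

def single_knockout_screen() -> set:
    """Genes whose individual deletion is tolerated ('nonessential')."""
    return {g for g in GENOME if viable({g})}

def synthetic_lethal_pairs(candidates: set) -> list:
    """Pairs of individually dispensable genes that cannot be deleted together."""
    return [(a, b) for a, b in combinations(sorted(candidates), 2)
            if not viable({a, b})]

if __name__ == "__main__":
    dispensable = single_knockout_screen()
    print("individually dispensable:", sorted(dispensable))
    print("viable with all of them deleted?", viable(dispensable))
    print("synthetic lethal pairs:", synthetic_lethal_pairs(dispensable))
```

Real genomes add quasi-essential genes and quantitative fitness effects on top of this all-or-none logic, which is one reason why stepwise evaluation after each deletion round remains necessary.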

13.4 Tools for Genome Editing

Functional molecular genetic analyses require tools for replacing chromosomal genes with a mutated allele or an inactivated gene copy. While members of the genus Bacillus are well-established organisms for industrial applications, the available genetic tools still lag behind those for other popular production hosts such as Escherichia coli and Saccharomyces cerevisiae [97, 98]. In this section, we focus on counter-selection tools and CRISPR/Cas as key technologies for strain modification in Bacilli. Further excellent reviews on the molecular biology of Bacillus have been published recently [6, 99, 100].

13.4.1 Counter-Selection and Markerless Genome Editing

The fastest and most straightforward way to genetically modify Bacillus strains uses deletion cassettes comprising homology regions and an antibiotic resistance marker that replaces the target gene. Recently, two genome-scale deletion libraries were constructed in B. subtilis using allelic replacement [16]. The deletion cassettes are either transformed directly as PCR products into the target strain via natural genetic competence, or subcloned into suicide vectors lacking a Bacillus origin of replication prior to transformation [101]. Alternatively, integrative plasmids are used to disrupt the target gene by integration of the complete vector via a single (Campbell-type) crossover event [102]. However, the introduction of antibiotic resistance markers and artificial DNA sequences is undesirable, due to the limited number of available selection markers and to regulatory demands for genetically modified hosts in industrial production processes [103]. To remove the antibiotic resistance marker from the genome of the final strain, site-specific recombinases are deployed, allowing for recycling of the marker gene and multiple rounds of genetic modification [104, 105]. In B. subtilis and related species, the Cre/loxP system is widely used and has been further optimized to enhance recombination efficiency as well as to modify poorly accessible strains [106–109]. The Cre expression cassette is either integrated into the genome or transiently provided on a plasmid for excision of the resistance marker after allelic replacement of the target region. While the efficiency of this approach is high, the method is limited to gene deletion and integration, requires additional steps for excision and plasmid curing, and leaves “scars” behind [83, 103]. To overcome these limitations, counter-selection markers have been developed and applied in a wide range of industrially relevant species for strain optimization, metabolic engineering, and reverse genetics [83, 110–112]. Counter-selection systems apply selective pressure through the detrimental effect of a cytotoxic marker gene under specific conditions. These include the conversion of nontoxic into toxic compounds in the presence of the counter-selection marker, as well as the inducible expression of the toxin in toxin–antitoxin systems, as recently reviewed [113]. Integration of the modified gene copy together with an antibiotic resistance marker and the counter-selection marker is forced by positive selection for antibiotic resistance after a first recombination event, followed by a second recombination event that leads to excision of the antibiotic resistance gene and the counter-selection marker (Figure 13.2).
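The outcome logic of this two-step procedure (Figure 13.2) can be captured in a small Monte Carlo simulation. The Python sketch below assumes, for illustration only, that each crossover uses the upstream or downstream flank with a fixed probability (0.5 for flanks of similar length), that every simulated cell is an antibiotic-selected integrant, and that a second crossover occurs with a small, hypothetical probability p_excision. Under counter-selection, cells that retain the integrated plasmid do not grow; among the survivors, a second crossover via the same flank restores the wildtype allele, whereas one via the other flank leaves the mutated allele.

```python
import random
from collections import Counter

def two_step_replacement(n_integrants: int, p_excision: float,
                         p_upstream: float = 0.5, counter_select: bool = True,
                         seed: int = 1) -> Counter:
    """Simulate plasmid excision outcomes for Campbell-type integrants.

    Each integrant arose from a first crossover via the upstream ("up") or
    downstream ("down") flank. A second crossover excises the plasmid; if it
    uses the same flank as the first, the wildtype allele is restored,
    otherwise the mutated allele remains on the chromosome.
    """
    rng = random.Random(seed)
    outcomes = Counter()
    for _ in range(n_integrants):
        first = "up" if rng.random() < p_upstream else "down"
        if rng.random() >= p_excision:
            # No second crossover: plasmid and counter-selection marker retained
            if not counter_select:
                outcomes["integrant"] += 1
            continue  # under counter-selection these cells do not form colonies
        second = "up" if rng.random() < p_upstream else "down"
        outcomes["wildtype" if second == first else "mutant"] += 1
    return outcomes

if __name__ == "__main__":
    without = two_step_replacement(100_000, p_excision=0.01, counter_select=False)
    with_cs = two_step_replacement(100_000, p_excision=0.01, counter_select=True)
    print("without counter-selection:", dict(without))
    print("with counter-selection:   ", dict(with_cs))
```

In this hypothetical setting, about 99% of the cells still carry the plasmid without counter-selection, so the rare excision events would have to be found by screening; with counter-selection, only excised clones remain, and roughly half of them carry the mutated allele. With unequal flank usage p, the expected mutant share among excised clones is 2p(1 − p), which is why flanks of similar length are a sensible default.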
Screening for putative positive clones that have undergone the possibly rare second recombination event is greatly facilitated by selecting for loss of the counter-selection cassette [110]. The most commonly used counter-selection system in Bacillus employs the uracil phosphoribosyltransferase (UPRTase) gene upp together with the toxic pyrimidine analogue 5-fluorouracil (5-FU) [67, 110, 114, 115]. UPRTase converts 5-FU to 5-fluoro-UMP, which is further metabolized to 5-fluoro-dUMP, an inhibitor of thymidylate synthase; the resulting depletion of dTMP blocks DNA synthesis [110]. Because absence of the upp gene confers resistance to 5-FU, using upp as a counter-selection marker requires prior deletion of the endogenous upp gene, which leads to uracil auxotrophy [110, 112]. To avoid this, an alternative strategy analogous to upp counter-selection was recently published, based on the cytosine deaminase gene codA from E. coli, which is absent from many prokaryotes [112]. CodA converts cytosine to uracil and also deaminates 5-fluorocytosine (5-FC) to 5-FU, thereby acting upstream of the upp-dependent pathway described above. For efficient uptake of 5-FC in B. licheniformis, simultaneous expression of codB, encoding a cytosine transporter, was required [112]. The clear advantages of the codBA system are the avoidance of uracil auxotrophy and, more importantly, its broad applicability in a wide range of industrially relevant bacteria, including Rhodococcus, Streptomyces, and Bacillus, without the need for genetic modification of the host prior to establishing the system [112, 116, 117].

13 Metabolic Engineering of Bacillus – New Tools, Strains, and Concepts

Figure 13.2 Generation of markerless deletion mutants using counter-selection marker genes. (a) Homologous flanking regions located on the deletion plasmid lead to integration of the full plasmid by a Campbell-type mechanism. (b) Positive selection for integrants is achieved with an antibiotic resistance marker gene (here, the erythromycin resistance gene). (c) A second recombination event leads to spontaneous excision of the integrated plasmid. Depending on whether the first and second recombination events occur via the same or different flanking regions, the resulting strain carries the wild-type or the mutated allele (d). Applying counter-selection conditions simplifies identification of positive clones that have undergone the possibly rare plasmid excision event. Analogous to the scheme shown, CRISPR/Cas9 can act as a counter-selection system by specifically cleaving the wild-type allele after the second recombination event.

13.4 Tools for Genome Editing

Another new but already well-established counter-selection tool uses the mannose-specific transporter ManP [83]. As part of the mannose phosphoenolpyruvate-dependent phosphotransferase system, ManP mediates uptake and phosphorylation of mannose; the resulting mannose-6-phosphate is subsequently isomerized by ManA to fructose-6-phosphate. Strains deficient in ManA are therefore sensitive to mannose, because mannose-6-phosphate accumulates intracellularly to toxic levels. Loss of manP, on the other hand, restores growth in the presence of mannose. Like the upp system, manP-based counter-selection requires deletion of the native manPA genes [83]. The effectiveness of manP-based counter-selection was demonstrated in multiple successive rounds of large-scale deletions in B. subtilis genome reduction projects [75, 83]. The deleted regions were up to 78 kb in size, comparable to results achieved with upp-based counter-selection [67, 83]. Alternative counter-selection systems used in Bacillus species have been reviewed recently [113]. Many counter-selection procedures were designed using nonreplicative vectors or PCR-generated deletion cassettes [83, 110, 114, 118]. These require high transformation efficiency and (induced) genetic competence for efficient uptake and recombination of exogenous DNA. Nevertheless, the methods can be adapted to strains that are not accessible via genetic competence or that show only low transformation efficiencies by using shuttle vectors that replicate in Bacillus [112, 115]. Protocols for markerless genome modification based on replicative plasmids include systems with (e.g. the pKVM series) and without (e.g. pMAD, pMiniMAD) counter-selection markers [112, 115, 119–121].
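The three counter-selection systems described above (upp/5-FU, codBA/5-FC, manP/mannose) share one logic: a clone dies only if every component needed to produce the toxic metabolite is present. The following is a deliberately simplified Boolean sketch of that logic, assuming each gene is either fully functional or absent; the `Genotype` fields and `survives` function are hypothetical names introduced for illustration, and effects such as leaky expression or alternative uptake routes are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Genotype:
    """Functional genes present (True) or absent (False); deliberately simplified."""
    upp: bool = False   # uracil phosphoribosyltransferase, native or on the cassette
    codA: bool = False  # E. coli cytosine deaminase (counter-selection marker)
    codB: bool = False  # cytosine transporter needed for 5-FC uptake
    manP: bool = False  # mannose-specific PTS transporter (counter-selection marker)
    manA: bool = False  # mannose-6-phosphate isomerase


def survives(g: Genotype, agent: str) -> bool:
    """Boolean sketch of growth on a counter-selective agent."""
    if agent == "5-FU":
        # UPRTase converts 5-FU into toxic 5-fluoro-UMP
        return not g.upp
    if agent == "5-FC":
        # CodB imports 5-FC, CodA deaminates it to 5-FU, and UPRTase activates it
        return not (g.codB and g.codA and g.upp)
    if agent == "mannose":
        # ManP imports mannose as mannose-6-P, which is toxic without ManA
        return not (g.manP and not g.manA)
    raise ValueError(f"unknown agent: {agent}")


# upp system in a Δupp host: plasmid integrated (upp on cassette) vs. excised
print(survives(Genotype(upp=True), "5-FU"), survives(Genotype(), "5-FU"))  # False True
# codBA system in a host that keeps its native upp
print(survives(Genotype(upp=True, codA=True, codB=True), "5-FC"),
      survives(Genotype(upp=True), "5-FC"))  # False True
```

The sketch makes the host requirements explicit: the upp and manP systems only discriminate in a Δupp or ΔmanPA background, whereas codBA relies on the host's own upp and therefore needs no prior host modification.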
Either way, a temperature-sensitive Bacillus origin of replication (ori) is a prerequisite for forcing chromosomal integration of the plasmid by a temperature upshift after initial replicative establishment at a lower, permissive temperature. For engineering of mesophilic Bacillus species, the ori of the broad-host-range staphylococcal plasmid pE194, and more importantly its temperature-sensitive derivative pE194ts, is widely used [122]. Like the temperature-sensitive pSG5 ori in actinomycetes, the pE194ts ori is fundamental to the engineering of Bacillus strains [122, 123].

13.4.2 CRISPR/Cas in B. subtilis and Related Strains – Basic Principles

Although allelic replacement is well established and highly useful, the technique can be time-consuming and may suffer from a low success rate depending on the strain, target, and type of genetic modification [98, 103, 124–126]. In addition, most of the methods require a specific genetic background (Δupp, ΔmanPA) [83, 110]. The recent rise of genome editing based on clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 has the potential to overcome these limitations, in particular by accelerating and simplifying genome editing and metabolic engineering with high efficiency and a broad application range [127, 128]. Since 2016, various type II CRISPR/Cas9 systems have been developed for the engineering of Bacillus strains, including B. subtilis, B. licheniformis, and the thermophilic B. smithii [129–131]. Selected CRISPR-based genome editing systems developed for Bacillus species are summarized in Table 13.1. Although advances in CRISPR/Cas9-based genome editing of Bacillus have been reviewed recently [132], several additional studies have been published since, demonstrating the fast progress in this field. Strategies for delivering the CRISPR components in Bacillus (Cas9, sgRNA, tracrRNA/crRNA) comprise single- and dual-plasmid approaches as well as chromosomal integration of the Cas9 expression cassette and, optionally, the (s)gRNA [7, 98, 129]. While single-plasmid approaches are easy to handle (one transformation step, one antibiotic resistance marker, and one curing step), the size of a vector carrying the 4.1 kb cas9 gene might be limiting for integration of large inserts or for multiplexing approaches [132]. On the other hand, single-plasmid approaches may impose a lower metabolic burden than dual-plasmid systems, which require two selectable markers [132]. Dual-plasmid approaches enable simultaneous cloning of the sgRNA and the homology template into two different plasmids, thereby avoiding two consecutive cloning steps and reducing the time required [97, 125, 132]. Similarly, single-plasmid expression of the CRISPR components combined with provision of the homology template as a linear PCR product requires only one cloning step [133], but is limited to strains accessible via natural genetic competence. In addition, cloning sgRNA and cas9 on two different plasmids may circumvent potential cloning issues in the cloning host. As mentioned before, temperature-sensitive plasmid replication plays a crucial role in markerless genome modification techniques. Similarly, all plasmid-based CRISPR/Cas9 systems designed for Bacillus exploit the pE194ts backbone for curing, except for the dual-plasmid approach developed by So et al. [134].
Finally, although chromosomal integration of the Cas9 expression cassette requires a higher initial effort, this approach simplifies subsequent cloning steps, avoids potential plasmid instability, and is a good strategy for multiplexing and successive rounds of genome editing [98, 130]. In addition, single-copy integration may reduce the physiological impact on the cell [98], although other factors also influence cas9 expression levels, in particular the promoter, the growth phase, and the genomic integration site [135]. While the easily accessible laboratory strain B. subtilis 168 was modified using all three types of systems, difficult-to-transform strains, including B. subtilis ATCC6051a, B. licheniformis 2709, and B. methanolicus MGA3, have preferentially been modified with single-plasmid approaches, possibly to reduce complexity and to avoid the requirement for simultaneous uptake of two plasmids, or of one plasmid and a linear PCR product for homology-directed repair (HDR) [125, 126, 130, 136]. Likewise, chromosomal integration of the cas9 (cas9n) expression cassette reduces the complexity of the transformation steps and has been implemented in B. licheniformis DW2 [130]. In addition, using the Cas9 nickase (Cas9n) instead of Cas9 has been shown to increase transformation efficiencies: single-strand DNA breaks are less toxic than double-strand cleavage, resulting in a higher number of viable clones [97, 130, 137, 138]. In the case of the supercompetent B. subtilis 168 derivative REG19, the transformation efficiency increased 20-fold when using Cas9n instead of Cas9, while the deletion efficiency decreased from 93 to 53% [138]. In B. licheniformis DW2 carrying a chromosomally integrated Cas9n, transformation with a single plasmid providing the sgRNA and the repair template yielded 45–77 cfu/μg [130]. Given the reduced transformation efficiency observed with the regular Cas9 protein, the use of Cas9n may have been crucial for establishing the CRISPR/Cas9 system in this strain. Deletion efficiencies of 90 and 77% were achieved for single knockouts of the 0.75 kb yvmC gene and the 42.7 kb bacitracin synthetase gene cluster bacABC, respectively [130]. Thus, Cas9n provides a valid alternative for constructing robust CRISPR/Cas9 systems with respect to the tradeoff between transformation and deletion efficiency [130, 138].

Expression of Cas9 in genome editing systems developed for Bacillus is driven by inducible promoters (PmanP, PxylA, Pgrac) or constitutive promoters (P43), or is regulated by the tetLM riboswitch. Inducible promoters are chosen to limit Cas9 expression to a defined time interval, thereby reducing the metabolic burden on the cell. However, some of the commonly used promoters, including the xylose-inducible PxylA and the IPTG-inducible Pgrac, were shown not to be regulated tightly enough, leading to basal Cas9 expression that may cause toxic effects, as indicated by reduced numbers of transformants under noninducing conditions [97, 133, 139]. In contrast, the B. subtilis mannose-inducible promoter PmanPA was shown to be tightly regulated and was used to drive Cas9 expression in one of the first CRISPR/Cas9 reports for Bacillus [129]. Transformation with functional CRISPR/PmanPA–cas9 vectors resulted in only slightly fewer colony-forming units than the empty control vector in B. subtilis 168 [129]. However, regulation of PmanPA depends on the presence of the activator ManR and the transporter ManP, which limits the application of this promoter in bacteria other than B. subtilis [140]. To overcome these limitations, two alternative promoters were investigated for their ability to drive Cas9 expression in a single-plasmid approach [138]: PxylA together with a plasmid-borne copy of xylR, encoding the repressor of the xylose utilization genes, and the PtetLM promoter, which is regulated by a riboswitch with tetracycline as the ligand. Both promoters were used to construct deletion mutants in a B. subtilis 168 derivative with an efficiency of up to 97% [138]. These new promoter systems for Cas9 expression might be widely applicable in Bacillus and related species. However, transformation with the PtetLM-based plasmid resulted in extremely low transformation efficiencies, possibly due to leaky regulation of the riboswitch, a drawback known for the PxylA promoter as well [138, 139, 141]. It remains to be clarified whether the alternative systems improve applicability in strains other than B. subtilis.

Table 13.1 Selected CRISPR/Cas9 genome editing systems developed for Bacillus species.

| Strain | Control of Cas expression | sgRNA expression and cloning a) | Editing efficiency b) | Novelty and key features |
|---|---|---|---|---|
| (A) Single-plasmid systems | | | | |
| B. subtilis 168 [129] | PmanPA–cas9 (mannose inducible) | PvanABK–sgRNA (constitutive); Golden Gate | Deletion: 90–97%; trpC+: 100% | PmanP is silent in E. coli; limited host range due to ManA-/ManR-dependent regulation; high efficiency for a 25 kb deletion using only one sgRNA |
| B. subtilis ATCC6051a [125] | PamyQ–cas9 (truncated, constitutive) | P43–sgRNA (constitutive); inverse PCR of the plasmid, spacer added as overhangs | Deletion: 33–53% | First report of CRISPR/Cas9 in a poorly transformable, nonmodel B. subtilis strain |
| B. smithii ET 138 [179] | PxylL–ThermoCas9 (native, constitutive) | Ppta–sgRNA (constitutive); Gibson Assembly | Deletion: 10–40%; CRISPRi of ldhL: reduced lactate synthesis | Characterization of a new, thermostable Cas9/dCas9 from Geobacillus thermodenitrificans; wide temperature range and increased specificity at 37 °C |
| B. subtilis 3NA [138] | xylR, PxylA–cas9; PtetLM–cas9; PmanPA–cas9(n) | See Altenbuchner [129]; separate cloning of the second spacer in Cas9n approaches | 25 kb deletion: Cas9 93–97%, Cas9n 53% | Comparison of Cas9 and Cas9n shows reduced toxicity of Cas9n; potential broad host range; oriT/traJ fragment for conjugational transfer |
| B. methanolicus MGA3 [136] | PmtlR–dCas9 + lac operator site (mannitol inducible) | PmtlR–sgRNA (inducible) | CRISPRi of spo0A, mtlD, katA: 20–90% lower expression/functional proof | Functional dCas9 (S. pyogenes Cas9) at elevated temperature (50 °C), in contrast to previous reports c) |
| B. subtilis 168 [173] | lacI, Pgrac–dcas9-AID (deaminase) | Pveg–sgRNA (constitutive); Golden Gate | Single-base editing (C→T): 100% triple-site editing efficiency | Proof of concept of CRISPR-dCas9-mediated deaminase base editing, including multiplexing |
| B. subtilis 168 [176] | lacI, Pgrac–(d)MAD7 | Pveg–sgRNA (constitutive); Golden Gate compatible | Deletion, CRISPRi: 93–100% | Free-to-use MAD7 nuclease and dMAD7 in B. subtilis; analysis of the mode of action of CRISPR-based editing |
| (B) Dual-plasmid systems | | | | |
| B. subtilis 168 [134] | lacI, Pgrac–cas9 (IPTG inducible) | ParaABCD–sgRNA (constitutive); Golden Gate cloning of the spacer | Deletion/insertion: 83–97%; 38 kb Δpps: 80%; point mutations: 69% | Large-scale deletion of 38 kb required prolonged incubation and simultaneous targeting by two sgRNAs |
| B. subtilis DB104 [7] | lacI, Pgrac–cas9, dCas9-ω/α (IPTG inducible) | P43/P242–sgRNA (constitutive); BioBrick d) cloning of multiple sgRNAs | CRISPRi of vpr/bpr: 78–95% reduced; CRISPRa of prsA: 126% increased transcript levels | First CRISPRa in Bacillus; multiplexing; simultaneous activation and repression of target genes in a position-dependent manner using a single modulator (dCas9-ω) |
| B. subtilis 168 [244] | lacI, Pgrac–cas9 (IPTG inducible) | See [134]; PspoIVA–sgRNA for plasmid self-elimination | Deletion: 80% | Growth phase-dependent self-elimination of the plasmid providing the sgRNA/HDR template (98% efficiency); improved multiround editing |
| B. subtilis 168 [97] | xylR, PxylA–cas9n (xylose inducible) | P43–sgRNA; PmanP–sgRNA for plasmid elimination; Golden Gate | Multiplexing (three point mutations): 49–58%, increased to 65% in a ΔligD strain | Fast editing due to modular construction of the multiplexing plasmid and self-elimination of the HDR template plasmid; ΔligD further increases multiplexing capability |
| (C) Chromosomal integration | | | | |
| B. subtilis 1A751 [98] | Pcas–cas9(n) (constitutive) | xylR, PxylA–gRNA (inducible); BioBrick d) cloning of multiple sgRNAs | Deletion: 85%; double k.o.: 85%; CRISPRi: eightfold repression | Integration of Cas9 and (s)gRNA; HDR template provided as linear DNA; uncoupling of tracrRNA and crRNA; mazF counter-selection for removal of the integrated gRNA |
| B. licheniformis DW2 [130] | P43–cas9n (constitutive) | P43–sgRNA (two sgRNA expression cassettes for double knockout); SOE-PCR of promoter, sgRNA, and HDR template | Deletion/insertion: 77–90%; double k.o.: 11% e); 43 kb ΔbacABC: 79% | Integration of Cas9n; plasmid-based gRNA delivery; Cas9n instead of Cas9 might have been crucial for establishing CRISPR/Cas9 in this difficult-to-access strain |
| B. subtilis 168 [17] | PxylA–dCas9 (xylose inducible) | Pveg–sgRNA (strong, constitutive); BioBrick d) cloning of multiple sgRNAs | CRISPRi multiplexing (eight genes): 140- to 1350-fold reduced expression | Integration of dCas9 and sgRNA expression cassettes; CRISPRi library targeting all essential genes; titratable control of dCas9 expression |

a) Unless otherwise stated, the HDR template DNA is cloned by a standard restriction–ligation protocol (in dual-plasmid systems, the HDR template is located on the sgRNA-providing plasmid, except for Liu et al. [97]).
b) Efficiencies listed refer to deletions using at least 500 bp per flanking region [130]. For multiple target genes, the minimum reported values are listed.
c) Temperatures higher than 42 °C inhibit formation of a functional CRISPR/Cas9–sgRNA complex [131].
d) BioBrick: consecutive cloning steps ([190]; [187]).
e) Possibly due to the low deletion efficiency of epr even in single-deletion experiments, or to higher lethality caused by two double-strand breaks.

13.4.3 Expanding the Scope of Application – CRISPR/Cas9 in Metabolic Engineering and Synthetic Biology of Bacillus

13.4.3.1 Multiplex Genome Editing

Metabolic engineering requires modification of the expression levels of multiple genes and cellular networks to balance metabolic flux and optimize product titers [142–145]. In contrast to E. coli, S. cerevisiae, Streptomyces, and Corynebacterium glutamicum [143, 146–149], established multiplexing tools for the engineering of Bacillus strains are rare [97, 98]. Simultaneous modification of two independent gene loci in B. subtilis and B. licheniformis has been reported [98, 130]. However, the lack of strategies for the multiple cloning steps required for plasmid construction, as well as Cas9-induced toxicity, hampers applicability in multiplexing approaches. Using Cas9n increased cell viability because it avoids the multiple double-strand breaks created by Cas9 [97, 138]. However, the resulting lower selection pressure led to more frequent religation of nicked DNA independent of homology-directed recombination, which in turn lowered editing efficiencies in Cas9n-based systems [97]. This problem could be partially solved by expressing two different spacer sequences targeting the gene of interest, which, however, again increases the cloning effort [138]. An alternative way to increase the efficiency of Cas9n is deletion of ligD, encoding the nonessential ATP-dependent ligase LigD involved in the repair of DNA nicks [97, 150]. In a ligD deletion strain, point mutations were introduced simultaneously into three independent genes (amyE, aprE, upp) with an efficiency of 65%, compared to 49% in ligD-positive strains [97]. The role of LigD in nick repair in B. subtilis was confirmed by overexpression of ligD in strains harboring pBSCas9(n) without a repair template, which strongly improved viability. Importantly, deletion of ligD did not affect growth. Nevertheless, non-HDR repair is probably not fully abolished in this mutant, because the ligA gene, encoding the essential NAD+-dependent ligase LigA, remains intact [97, 151]. The overall time efficiency of this dual-plasmid system was improved by a one-step Golden Gate assembly of PCR-generated sgRNA expression modules to construct the Cas9n/sgRNA multiplexing plasmids, without the need for subcloning of sgRNAs and successive rounds of restriction–ligation [97, 152, 153].
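The tradeoff between viability and editing efficiency can be made concrete with the figures reported in this chapter: for B. subtilis REG19, a 20-fold increase in transformants with Cas9n against a drop in deletion efficiency from 93 to 53%, and for the triple point mutation, 65% efficiency in the ΔligD strain versus 49%. The Python sketch below assumes that correctly edited clones equal transformants multiplied by editing efficiency, and that colonies are independent when estimating how many must be screened. Both are simplifying assumptions, and the function names are hypothetical.

```python
def relative_correct_clones(fold_transformants: float,
                            eff_reference: float,
                            eff_alternative: float) -> float:
    """Ratio of correctly edited clones, alternative vs. reference nuclease.

    Assumes correct clones = transformants x editing efficiency.
    """
    if not 0.0 < eff_reference <= 1.0 or not 0.0 <= eff_alternative <= 1.0:
        raise ValueError("efficiencies must be fractions")
    return fold_transformants * eff_alternative / eff_reference


def clones_to_screen(efficiency: float, confidence: float = 0.95) -> int:
    """Smallest number of colonies giving at least one correct clone with the
    requested probability, assuming colonies are independent."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    n, p_all_wrong = 0, 1.0
    while 1.0 - p_all_wrong < confidence:
        n += 1
        p_all_wrong *= 1.0 - efficiency
    return n


# REG19 figures: 20-fold more transformants with Cas9n, deletion efficiency 93% -> 53%
print(f"{relative_correct_clones(20, 0.93, 0.53):.1f}")  # 11.4
# Triple point mutation: 49% (ligD+) vs. 65% (ΔligD)
print(clones_to_screen(0.49), clones_to_screen(0.65))      # 5 3
```

Under these assumptions, Cas9n still delivers roughly 11 times more correct clones despite the lower editing efficiency. The ligD deletion mainly reduces the screening effort per successful edit.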
In addition, curing of the HDR template-providing vector pDonor is accelerated by mannose-induced expression of a rep60-specific sgRNA located on pDonor, leading to plasmid elimination via self-targeting. However, curing of the Cas9n/sgRNA plasmid still requires a temperature upshift [97]. The applicability of this CRISPR/Cas9n-based multiplexing approach was demonstrated by optimizing riboflavin biosynthesis in B. subtilis. Using a randomized RBS library for ribB, ribA, and ribH, the riboflavin titer and the yield on glucose increased 1.4- and 1.8-fold, respectively, compared to the already improved parental strain BS89. The editing efficiency for simultaneous modification of all three loci was 50% [97].

13.4.3.2 Modulating Gene Expression Levels: CRISPRi and CRISPRa

While genome editing using CRISPR/Cas9(n) was shown to be compatible with multiplexing and highly beneficial for the construction of randomized libraries, screening for positive clones and curing of the CRISPR components before the new strains can be characterized remain laborious [154]. Technologies that rely not on genome editing but on transient modification of expression levels overcome these limitations. These include antisense RNAs as well as CRISPR interference and activation (CRISPRi/a) based on the catalytically inactive Cas9 mutant dCas9 [155–157]. CRISPRi has been applied in a wide range of industrially relevant species, including E. coli, Lactococcus lactis, C. glutamicum, Clostridium sp., Streptomyces sp., and B. subtilis [7, 17, 98, 143, 158–162].


In contrast to alternative RNA-mediated technologies, CRISPRi is simple to implement, since exchanging the short 20 bp spacer sequence is sufficient for reprogramming, and it achieves higher silencing efficiencies [98, 163]. Moreover, CRISPRi has, for the first time, made gene silencing available as a routine technology in Bacillus [17, 154]. Another advantage of CRISPRi is the ability to reduce, and ideally to precisely tune, expression levels rather than inactivating the target gene completely. This is of special interest for studying essential cellular functions or genes required for robust growth, as well as for streamlining metabolic fluxes without severely affecting cell physiology [17, 143, 164]. In a groundbreaking study, Peters et al. constructed a genome-wide CRISPRi library, using dCas9 under control of the xylose-inducible promoter PxylA, to target the essential gene set of B. subtilis [17]. Comprehensive phenotypic and physiological characterization provided new insights into the functional connections between essential genes as well as into the growth phase-dependent robustness of their expression levels. In addition, screening the library against chemical compounds enabled the discovery of antibiotic targets [17]. From a technological perspective, it is interesting to note that transcriptional repression of a chromosomally integrated, constitutively expressed mRFP was titratable by the xylose concentration, ranging from a 3- to a 150-fold reduction in mRFP expression levels. Furthermore, simultaneous knockdown of eight genes was demonstrated with remarkably strong transcriptional repression of at least 140-fold during exponential growth [17]. While CRISPRi allows downregulation of specific genes without permanent genome modifications, enhancing gene expression still requires modification or exchange of regulatory sequences by either conventional or CRISPR/Cas9-based methods.
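Because (d)Cas9 is reprogrammed simply by exchanging the 20 nt spacer, candidate spacers can be enumerated directly from a target sequence. The Python sketch below assumes the 5′-NGG-3′ protospacer adjacent motif (PAM) of S. pyogenes Cas9 and scans both strands. The function names and output format are illustrative. Practical design tools additionally score off-target risk and, for CRISPRi, the position of the binding site relative to the promoter or coding region; this sketch does neither.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spacers(seq: str, spacer_len: int = 20) -> list[tuple[str, int, str]]:
    """Return (strand, start, spacer) for each protospacer followed by an NGG PAM.

    Starts are 0-based and refer to the scanned strand: the input sequence for
    '+', its reverse complement for '-'. Overlapping sites are all reported.
    """
    seq = seq.upper()
    # Lookahead so that overlapping protospacers are found at every position
    pattern = re.compile(rf"(?=([ACGT]{{{spacer_len}}})[ACGT]GG)")
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


target = "A" * 20 + "TGG"   # toy sequence with one NGG PAM on the top strand
hits = find_spacers(target)
print(len(hits), hits[0][:2])  # 1 ('+', 0)
```

The same scan also illustrates the PAM dependence discussed later in this section: regions without a suitable NGG site cannot be targeted by SpCas9-derived tools, which motivates alternative nucleases with different PAM requirements.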
To exploit the simplicity of dCas9 for gene activation, CRISPRa was developed, in which dCas9 is fused to a transcriptional activator protein [157]. First applied in E. coli, fusion of the omega (ω) subunit of RNA polymerase to dCas9 was also used in B. subtilis to modulate gene expression levels [7, 155]. Moreover, dCas9-ω was used for simultaneous transcriptional repression and activation of three endogenous genes in a position-dependent manner, thereby providing valuable insights toward basic guidelines for choosing the spacer binding site and predicting the resulting transcriptional modulation [7]. The suitability of dCas9-ω for multidimensional fine-tuning of gene expression levels was demonstrated by repressing vpr and bpr, which encode extracellular proteases, in the already aprE- and nprE-deficient B. subtilis SCK6 strain while simultaneously activating the molecular chaperone gene prsA, resulting in 2.6-fold increased amylase expression [7].

13.4.3.3 CRISPR-dCas9 Mediated Base Editing

Another strategy that exploits the simplicity of dCas9 but enables permanent strain modifications at the genomic level is deaminase-mediated base editing [165–167]. In this case, dCas9 or Cas9n is fused to a cytidine or adenine deaminase. The sgRNA-guided Cas9 variant provides target specificity and transiently exposes single-stranded DNA, which is then modified by the deaminase, creating C-to-T or A-to-G conversions [165–169]. In eukaryotes, base editing has gained special interest as a promising approach to improve the introduction of single point mutations by circumventing the formation of DNA double-strand breaks (DSBs) and subsequent nonhomologous end joining (NHEJ) [170]. Although efficient HDR in prokaryotes prevents NHEJ-related formation of indels and allows precise genome engineering, base editing has great potential as a tool for gene inactivation, in vivo (protein) engineering, and multiplexing in prokaryotes as well [171, 172]. Genetic modification without an exogenously provided HDR template simplifies cloning while only moderately increasing the screening effort [171, 172]. In addition, it avoids the possible negative impact of HDR templates located on multicopy plasmids on (cloning) host physiology, thereby simplifying plasmid construction and handling. Although the toolbox for CRISPR-dCas9-mediated deaminase base editing has expanded rapidly in terms of specificity, efficiency, and application range, base editing has so far been applied in only a few prokaryotic hosts, including E. coli and Streptomyces [170–172]. Only recently, a CRISPR-dCas9-mediated cytosine deaminase base editing system was developed for B. subtilis. This single-plasmid system enabled inactivation of eight extracellular proteases in just two rounds of genome editing, demonstrating multiplexing capability while minimizing cloning effort, since no editing templates are required [173]. The simplicity of CRISPRi/a greatly facilitates rapid screening of new targets in metabolic engineering [154, 164]. Subsequently, gene expression levels can be modified permanently to fine-tune transcription and translation [174]. However, CRISPRi may also be used as a permanent strategy in the production host, for example to uncouple growth from production via CRISPRi-based metabolic switches, as demonstrated in E. coli [175].
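One common way to use C-to-T base editing for gene inactivation, offered here as an illustration rather than as the method of the cited B. subtilis study, is to convert an in-frame CAA, CAG, or CGA codon into a premature TAA, TAG, or TGA stop codon. The Python sketch below finds such codons on the coding strand and applies C→T conversions within an editing window. The window coordinates are hypothetical placeholders: real deaminase windows depend on the enzyme and are defined relative to the PAM. Codons editable via the template strand (e.g. TGG) are not considered.

```python
# In-frame codons that a single C->T change on the coding strand turns into a stop
STOP_BY_C_TO_T = {"CAA": "TAA", "CAG": "TAG", "CGA": "TGA"}


def stop_creating_sites(cds: str) -> list[tuple[int, str, str]]:
    """(0-based nucleotide position, codon, edited codon) for every in-frame
    codon whose first-position C->T conversion yields a premature stop."""
    cds = cds.upper()
    hits = []
    for i in range(0, len(cds) - 2, 3):
        codon = cds[i:i + 3]
        if codon in STOP_BY_C_TO_T:
            hits.append((i, codon, STOP_BY_C_TO_T[codon]))
    return hits


def apply_c_to_t(seq: str, window: range) -> str:
    """Convert every C inside the (hypothetical) editing window to T,
    including bystander Cs, mimicking unselective deamination in the window."""
    return "".join("T" if i in window and base == "C" else base
                   for i, base in enumerate(seq.upper()))


cds = "ATGCAAGGTCGATAA"                 # toy ORF: Met-Gln-Gly-Arg-stop
print(stop_creating_sites(cds))        # [(3, 'CAA', 'TAA'), (9, 'CGA', 'TGA')]
print(apply_c_to_t(cds, range(3, 5)))  # ATGTAAGGTCGATAA
```

Because no HDR template is involved, each additional target only requires another spacer, which is what makes this approach attractive for multiplexed inactivation.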
Future tasks include improving the predictability of sgRNA binding sites and of the resulting target gene expression levels, to reduce the need for testing multiple spacer sequences, as well as optimizing nuclease expression levels to reduce toxicity [7, 160, 163, 164]. Expanding the toolbox with alternative CRISPR systems addresses the inherent limitation imposed by the availability of suitable PAM sites and increases the application range, as recently reviewed [128, 154]. Moreover, the free-to-use MAD7 nuclease from Inscripta (CO) was recently adapted for genome editing in B. subtilis [176]. It will be interesting to see how CRISPR-based genome editing further boosts metabolic engineering and broadens the application range of Bacillus expression hosts in industrial biotechnology.

13.4.3.4 Boosting Strain Development of Alternative Bacillus Strains Using CRISPR/Cas9

The simplicity of CRISPR/Cas9-based genome editing not only accelerates modification of established industrial workhorses and production platforms but also widens the choice of potential expression hosts to include more difficult-to-modify strains that lack efficient genome modification tools and protocols. Recently, CRISPRi and CRISPR/Cas9-based genome editing were established in various bacteria, including the thermophilic Bacillus smithii, B. methanolicus, nonmodel Streptomyces species, and Rhodobacter sphaeroides [128, 131, 136, 177]. The applicability of dCas9 has been demonstrated in B. methanolicus MGA3, a promising host for the production of amino acids from C1 sources [136, 178].


Transcription of mtlD was reduced by 85% during cultivation at 50 °C, indicating functional dCas9 despite the previously described inactivity of Streptococcus pyogenes Cas9 (SpCas9) at temperatures above 42 °C [131, 136, 179]. However, the higher temperature might modulate CRISPRi efficiency in this setup, as considerably stronger repression was observed in previous studies with SpCas9 in B. subtilis at moderate temperatures [17, 98]. In contrast, other authors also working with dCas9 in B. subtilis reported similar repression levels. Thus, target-specific and spacer position-dependent effects may play a more important role in CRISPRi efficiency [7, 163, 164]. Mougiakos et al. exploited the inactivity of SpCas9 at elevated temperatures to develop a CRISPR/SpCas9-based genome editing system for the thermophilic bacterium B. smithii [131]. B. smithii grows between 37 and 65 °C, with an optimum at 50 °C, similar to B. methanolicus [131, 180]. Strict control of SpCas9 activity is achieved by cultivating B. smithii at 50 °C, which inhibits formation of a functional CRISPR/Cas9–sgRNA complex [131]. A subsequent stepwise temperature decrease to 37 °C allowed SpCas9-mediated dsDNA cleavage in clones that had not undergone allelic replacement in the first step [131]. To avoid temperature shifting, which requires adaptation of the host cell, and to increase the application range of CRISPR/Cas9, alternative CRISPR systems have been investigated, including the thermostable ThermoCas9 from Geobacillus thermodenitrificans [179]. ThermoCas9 is active between 20 and 70 °C and has been applied in genome editing of B. smithii and Pseudomonas putida at 50 and 37 °C, respectively. In addition, ThermoCas9 was shown to be superior to SpCas9 in terms of target specificity at 37 °C [179].
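Repression strengths in this section are reported either as percent reduction (85% for mtlD in B. methanolicus) or as fold repression (at least 140-fold for the eight-gene knockdown in B. subtilis). The two measures are related by fold = 1/(1 − r), where r is the fractional reduction, so they are easy to misjudge when compared side by side. The short Python sketch below converts between them; the function names are illustrative.

```python
def fold_repression(reduction_pct: float) -> float:
    """Fold repression corresponding to a percent reduction r: 1 / (1 - r/100)."""
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


def percent_reduction(fold: float) -> float:
    """Percent reduction corresponding to an n-fold repression: 100 (1 - 1/n)."""
    if fold < 1.0:
        raise ValueError("fold repression must be >= 1")
    return 100.0 * (1.0 - 1.0 / fold)


print(f"{fold_repression(85):.1f}-fold")  # 6.7-fold (mtlD in B. methanolicus)
print(f"{percent_reduction(140):.1f}%")   # 99.3% (eight-gene knockdown in B. subtilis)
```

An 85% reduction thus corresponds to only about 6.7-fold repression, roughly 20 times weaker than the 140-fold value, even though the percentages (85% vs. 99.3%) look similar.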
The lack of efficient molecular tools has hampered the exploration and development of alternative production hosts in general and, for Bacillus, of thermophilic species in particular [179, 180]. CRISPR/Cas9 is likely to narrow this gap between model and nonmodel strains [128, 131, 136, 179]. However, further work is required to expand the toolboxes for thermophilic bacilli with respect to more fundamental tools, as previously reviewed [181]. These include characterized (inducible) promoters and suitable plasmids, which are currently available only in limited numbers, a drawback that was recently addressed in B. methanolicus [180].

13.5 Optimization, Standardization, and Modularity in Gene Expression

In synthetic biology, standardization and modularization are prerequisites to simplify workflows, increase reproducibility, and ideally enable predictable forward engineering [182]. Most standardized modules were developed and characterized for engineering of E. coli, owing to its ease of use and its predominant role as a model organism [183–185]. Only recently has fundamental work been done on the construction and characterization of biological parts for B. subtilis and related species [174, 186–188]. The recently established Bacillus BioBrick Box comprises well-characterized (integrative) plasmids, reporter genes, inducible and constitutive promoters, as well as epitope tags for N- or C-terminal translational fusions [186, 187]. All parts meet the BioBrick standard widely used in the synthetic biology community [189, 190]. Furthermore, to isolate and characterize new promoters of varying strength, a promoter screening system based on a two-step cat-lux system was established [186]. In the first step, clones from a promoter library are selected for chloramphenicol resistance conferred by the cat gene; the antibiotic concentration defines the minimum promoter strength required. Subsequently, expression levels are quantified with high sensitivity and temporal resolution using the lux operon-based luciferase readout [186]. Recently, the repertoire of precharacterized promoters spanning a wide range of transcriptional levels was expanded to include constitutive, growth phase-dependent, and stress-induced promoters [174, 191–193]. The high dynamic range allows optimal fine-tuning of expression levels, which is crucial for balancing metabolic flux and pathway design [174, 194]. However, modifying the consensus sequences of the conserved −35 and −10 promoter regions strongly reduced promoter activity in the vast majority of B. subtilis clones [174]. As an alternative strategy to optimize promoter strength, the nonconserved sequence space of B. subtilis promoters was modified [7, 174, 195, 196]. This comprises random mutagenesis of the spacer region between the −35 and −10 motifs [174, 195] as well as shuffling of the UP element upstream of the −35 region, together with the −35/−10 spacer region and the transcriptional start site [7, 196]. This approach resulted in enhanced reporter gene expression and a high dynamic range. Sauer et al. included the spacer between the Shine–Dalgarno sequence and the translational start site as a fourth element to generate a library of ∼12 000 synthetic expression modules (SEMs) [196].
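The two-step cat-lux screen described above can be summarized as a threshold filter followed by a quantitative ranking. The following Python sketch models that workflow under loud assumptions: each clone is reduced to a single hypothetical promoter activity value, the chloramphenicol concentration is represented by an activity threshold (`min_activity`), and the luciferase measurement is a caller-supplied function standing in for a plate reader. None of these names come from the original system.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PromoterClone:
    name: str        # hypothetical clone identifier
    activity: float  # hypothetical promoter activity, arbitrary units


def cat_selection(clones: list[PromoterClone], min_activity: float) -> list[PromoterClone]:
    """Step 1: keep clones whose promoter drives enough cat expression to grow;
    min_activity stands in for the chosen chloramphenicol concentration."""
    return [c for c in clones if c.activity >= min_activity]


def lux_ranking(clones: list[PromoterClone],
                read_lux: Callable[[PromoterClone], float]) -> list[PromoterClone]:
    """Step 2: rank surviving clones by their luciferase signal (strongest first)."""
    return sorted(clones, key=read_lux, reverse=True)


library = [PromoterClone("p1", 0.4), PromoterClone("p2", 3.1), PromoterClone("p3", 1.2)]
survivors = cat_selection(library, min_activity=1.0)
ranked = lux_ranking(survivors, read_lux=lambda c: c.activity)  # stand-in for a plate reader
print([c.name for c in ranked])  # ['p2', 'p3']
```

The design point the sketch captures is the division of labor: the cheap selection step discards weak promoters in bulk, so the more laborious time-resolved luminescence measurement is only spent on clones above the chosen strength.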
This SEM library was generated to optimize the already strong σA-dependent B. subtilis housekeeping promoter Pveg and yielded SEMs with up to 13-fold enhanced expression levels [196]. Lu et al. provided an alternative approach called oligo-annealed promoter shuffling (OAPS) to construct new hybrid promoters from native promoter fragments [7]. Similar to DNA shuffling in the directed evolution of proteins, OAPS uses in vitro shuffling of three DNA fragments (UP element, −35/−10 spacing region, and TSS), each derived from six native promoters in this proof-of-concept study [7]. The fragments are generated by annealing complementary oligonucleotides, which simultaneously creates single-stranded homologous overhangs located in the conserved −35 and −10 regions and thereby allows randomized assembly of the fragments [7]. The library was integrated into the chromosome with CRISPR/Cas9 assistance and used to drive expression of the heterologous alpha-amylase BLA; 8.8% of all transformants showed improved amylase expression, a high success rate for generating strong, functional promoter variants. The maximum amylase activity was up to 100-fold higher than that obtained with the frequently used P43 reference promoter [7]. Further expression systems, including constitutive and inducible promoters, have recently been reviewed in detail [197]. For industrial applications, easily controllable auto- or self-inducible promoter systems that respond to specific shifts in nutrient availability are of particular interest. These include the

13 Metabolic Engineering of Bacillus – New Tools, Strains, and Concepts

acetoin-controlled acoA promoter and the mannose-regulated manP promoter from B. subtilis [198, 199] as well as the phosphate/phytate-controlled promoter phyL from B. licheniformis [200]. Recently, the srfA promoter of B. subtilis has been proposed for an expression system that is activated at high cell density by a quorum-sensing mechanism [201, 202]. The optimization of Bacillus expression platforms also includes the systematic evaluation of production bottlenecks, including translation initiation, as well as the screening and optimization of signal peptides [118, 174, 203–208]. In addition, the coding sequence of the target protein was shown to influence product yield, possibly through multiple factors including improved translation, secretion, and folding of the target protein [209]; this highlights the importance of integrating all stages of protein expression when optimizing expression platforms [203, 209]. Altogether, the comprehensive set of well-characterized promoter systems, together with related technological approaches for improving protein translation, folding, and secretion efficiency, provides a powerful basis for optimizing Bacillus expression platforms.
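The combinatorial logic behind OAPS can be made concrete with a short calculation. The sketch below is a minimal illustration that assumes every fragment from each of the six source promoters is equally likely to be incorporated; it enumerates the 6 × 6 × 6 possible hybrids and then estimates how many clones would have to be screened to sample any one specific hybrid with 95% probability, using the standard Clarke–Carbon library-coverage approximation (which is not part of the cited study). The promoter labels P1 to P6 and the function names are hypothetical placeholders.

```python
import itertools
import math

# Hypothetical labels: the text states only that each OAPS fragment type was
# derived from six native promoters; P1..P6 are placeholders, not the
# promoters actually used by Lu et al.
NATIVE_PROMOTERS = [f"P{i}" for i in range(1, 7)]
FRAGMENT_TYPES = ("UP", "spacer", "TSS")


def hybrid_promoters(sources, fragment_types):
    """Yield every hybrid in which each fragment type is drawn independently
    from any of the source promoters."""
    for combo in itertools.product(sources, repeat=len(fragment_types)):
        yield dict(zip(fragment_types, combo))


def clones_for_coverage(library_size, probability):
    """Clones to screen so that one given variant is sampled at least once
    with the given probability, assuming all variants are equally frequent
    (Clarke-Carbon approximation): N = ln(1 - P) / ln(1 - 1/V)."""
    return math.ceil(math.log(1 - probability) / math.log(1 - 1 / library_size))


library = list(hybrid_promoters(NATIVE_PROMOTERS, FRAGMENT_TYPES))
print(len(library))                             # 216
print(clones_for_coverage(len(library), 0.95))  # 646
```

Real shuffled libraries are rarely uniform, so this coverage figure is best read as an optimistic estimate rather than a screening plan.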

13.6 Activity-Independent Screening of Target Molecule Synthesis

Optimization of expression levels is fundamental to metabolic engineering and synthetic biology [144, 194, 210]. Strategies range from the sequential engineering of single elements to combinatorial, multidimensional pathway optimization and global transcription machinery engineering (gTME); along this range, the requirement for a priori knowledge decreases while the solution space increases [6, 100, 210–214]. To address this increasing complexity, synthetic biology requires high-throughput-compatible screening systems. A major limitation of many approaches described in the literature is the weak correlation between results obtained by reporter gene-based screening and the expression of the target gene, owing to sequence-specific mRNA structures that affect, among other things, translation initiation [7, 196, 208]. Some industrially relevant enzymes allow photometric or fluorometric activity assays compatible with high- or, more commonly, medium-throughput screening technologies. However, target proteins lacking a suitable enzymatic activity require alternative high-throughput assays, such as immunological detection or fusion proteins for fluorometric detection [207]. Knapp et al. adapted a split GFP approach to quantify secreted proteins in a protease-deficient B. subtilis strain [207]. The C-terminal β-strand of GFP is translationally fused to the target protein, which, after secretion, complements a truncated GFP provided in the detector solution. The resulting functional GFP allows fluorometric detection, and the signal was shown to correlate with the amount of secreted target protein [207]. Out-of-frame translational fusion of the target protein to GFP provides an alternative strategy that displays the translation efficiency of the target protein, rather than promoter activity as in promoter–reporter gene transcriptional fusions [215]. While this
readout was used in single-cell analysis to monitor heterogeneity in amylase production in B. subtilis [215], it might be particularly (though not exclusively) suitable for screening libraries of ribosome binding sites and of the 5′ UTR in general, which strongly affect translation initiation through mRNA structure formation [208, 216]. Intracellular GFP expression allows fluorescence-based high-throughput screening of the library and subsequent verification of a reduced number of candidate strains using enzymatic or immunological assays to analyze secretion of the target protein [215]. While the above-mentioned approaches are simple to implement for screening protein production, the analysis of chemical compounds requires chromatographic methods, which have limited high-throughput capability [217]. One possibility is to couple product formation to growth, or to design circuits that create a growth advantage for cells with increased product formation. A recent example is the enhanced synthesis of S-adenosylmethionine (SAM) in B. amyloliquefaciens [218]. Deletion of sucCD, encoding succinyl-CoA synthetase, disrupted the TCA cycle and increased the flux of succinyl-CoA into the SAM (and methionine) synthesis pathway, in which succinate is (re-)generated. Consequently, increased SAM synthesis complements the disrupted TCA cycle and creates a growth advantage, thereby driving selection for improved strains [218]. Biosensors provide an alternative approach that meets the requirement for high-throughput screening in semirational or randomized strain development, as recently reviewed [217, 219]. Originally developed for applications in environmental toxicology, genetically encoded biosensors have recently been designed for metabolic engineering [220, 221]. These include riboswitches and transcription factor (TF)-based biosensors that regulate protein expression upon binding of an effector molecule, such as the product or a central pathway intermediate [220, 222].
Coupling the sensing of the effector to the expression of a fluorescent reporter gene or a selectable marker creates an output compatible with screening of large libraries by antibiotic selection, microfluidics, or FACS [217, 219]. In addition, biosensors allow for dynamic pathway control, single-cell analysis, and adaptive laboratory evolution of production strains for products that are not related to growth or fitness of the expression host, thereby addressing major demands in metabolic engineering and synthetic biology [219]. Recently, extracytoplasmic function (ECF) sigma factors have been engineered for synthetic circuit design in B. subtilis [223, 224]. This group of alternative sigma factors shows highly diverse sensing and signaling principles and has a potentially wide range of applications in synthetic biology, including the construction of biosensors [219, 225].
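The screening principle of TF-based biosensors can be sketched as follows. This is a minimal, hypothetical model: it assumes a Hill-type dose–response between intracellular product concentration and reporter fluorescence (a common simplification, not a property stated for any biosensor cited here), uses made-up parameter values and clone data, and represents the FACS sort simply as keeping the brightest fraction of cells. The class and function names are illustrative only.

```python
from dataclasses import dataclass


# Illustrative sketch only: the Hill-type transfer function and every
# parameter value below are assumptions, not data from the cited studies.
@dataclass
class TFBiosensor:
    basal: float = 50.0         # reporter signal without effector (a.u.)
    max_signal: float = 5000.0  # saturated reporter signal (a.u.)
    k_half: float = 2.0         # effector level giving half-maximal induction (mM)
    hill_n: float = 2.0         # cooperativity of effector binding

    def fluorescence(self, effector_mM: float) -> float:
        """Reporter output as a function of the intracellular effector (product) level."""
        x = (effector_mM / self.k_half) ** self.hill_n
        return self.basal + (self.max_signal - self.basal) * x / (1.0 + x)


def facs_gate(clones: dict[str, float], sensor: TFBiosensor, top_fraction: float) -> list[str]:
    """Rank clones by biosensor fluorescence and keep the brightest fraction,
    mimicking a FACS gate on the reporter channel."""
    ranked = sorted(clones, key=lambda c: sensor.fluorescence(clones[c]), reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_keep]


# Hypothetical library: clone name -> intracellular product concentration (mM).
library = {"clone_01": 0.4, "clone_02": 3.1, "clone_03": 1.2,
           "clone_04": 6.5, "clone_05": 0.9, "clone_06": 2.2,
           "clone_07": 0.1, "clone_08": 4.0, "clone_09": 1.7, "clone_10": 0.6}

sensor = TFBiosensor()
print(facs_gate(library, sensor, top_fraction=0.2))  # ['clone_04', 'clone_08']
```

Because the modeled response saturates above k_half, clones whose product levels lie far above the sensor's operational range become hard to distinguish; matching the biosensor's dynamic range to the expected titers is therefore a key design choice.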

13.7 The Biotechnological Application of Metabolic Engineering Strategies

Figure 13.3 Overview of industrially relevant Bacilli and their commercial fields of application as well as available products. Sources: Refs. [6, 10, 226–229].
[Figure content: industrially relevant species: B. subtilis, B. licheniformis, B. amyloliquefaciens, B. pumilus, B. velezensis, B. methanolicus, B. clausii, B. megaterium, B. brevis, B. smithii, and others. Product classes: vitamins/nucleosides (riboflavin, pyridoxine, pantothenic acid, guanosine, thymidine); enzymes (protease, amylase, pullulanase, xylanase, esterase, lipase, phytase, cellulase); mono-/polysaccharides (N-acetylglucosamine, chondroitin sulfate, heparosan, hyaluronic acid); chemicals (lactic acid, isobutanol, 2,3-butanediol, acetoin); amino acids/polyamines (L-lysine, L-glutamate, L-valine, 1,5-diaminopentane, poly-γ-glutamic acid); bioactive compounds/antibiotics (bacitracin, polymyxin, gramicidin, surfactin, lichenysin, fengycin, iturin, scyllo-inositol, taxadiene); other bioproducts (probiotics, fertilizer, spores, biofuel, vaccines).]

Bacillus strains are well established as cell factories for producing proteins, especially technical enzymes, and low-molecular-weight compounds, such as biotin or bioactive compounds like surfactin or bacitracin (Figure 13.3). Although E. coli is the most important bacterial cell factory for the industrial production of biopharmaceuticals, Bacilli have become alternative hosts for the overproduction of pharmaceutically relevant proteins such as hormones and growth factors [10, 227]. One major benefit of Bacilli is their natural capacity to secrete large amounts of extracellular enzymes. However, one bottleneck for the stable overproduction of secreted target proteins in Bacillus strains is their production of diverse extracellular proteases, which degrade most heterologous proteins. This problem has been largely solved by constructing multiprotease-deficient expression hosts. B. subtilis strains carrying six [230] or even eight [231] mutations in protease-encoding genes enabled high extracellular accumulation of overproduced heterologous proteins. A similar positive effect on the production of extracellular heterologous model proteins was observed for the genome-reduced B. subtilis strain PG10, which lacks 1553 genes of the wild-type strain 168, including the genes for eight major extracellular proteases [86]. This "MiniBacillus" study further concluded that enhanced translation contributes to the improved secretory production of difficult-to-produce proteins in the genome-reduced strain PG10. An excellent review of metabolic engineering in B. subtilis and its biotechnological application was recently presented by Gu et al. [6]. One focus of this review is the discussion of recent strategies for improving 2,3-butanediol and poly-γ-glutamic acid (γ-PGA) production in B. subtilis. Besides B. subtilis, B. licheniformis is also of special interest for the production
of γ-PGA [232], a natural polymer consisting of L- and/or D-glutamic acid monomers. This multifunctional polymer has versatile biotechnological applications in cosmetics (hydrogels or hyaluronidase inhibitor), medicine (drug formulation and delivery), food production (food preservation and freeze-drying agent), and wastewater treatment (heavy metal ion chelation) [233]. γ-PGA production in Bacilli is frequently based on glycerol, a byproduct of biofuel production, as the major carbon and energy source [232]. However, Bacillus wild-type strains show low product yields in glycerol-based fermentation processes. Therefore, metabolic engineering approaches were tested to improve γ-PGA production in B. licheniformis. One recent approach focused on overexpressing key enzymes of NADPH generation to increase the critical NADPH pool [234]. It was demonstrated that overexpression of Zwf (glucose-6-phosphate dehydrogenase) increases γ-PGA production in B. licheniformis. The zwf-overexpressing strain showed an increased NADPH concentration, an improved NADPH/NADH ratio, and decreased concentrations of the main byproducts acetoin and 2,3-butanediol [232]. Zhan et al. [235] optimized γ-PGA production from glycerol by replacing the native promoter of the glpFK operon, encoding the glycerol transport facilitator (GlpF) and glycerol kinase (GlpK), with a stronger promoter such as the constitutive P43 promoter. The resulting strain utilized glycerol more efficiently, leading to 30% higher γ-PGA production in B. licheniformis. Through combinatorial overexpression of the key enzymes GlpK, GlpX, Zwf, and Tkt1, using crude glycerol as substrate, the γ-PGA titer could be increased further [232].
In this engineered strain, overexpression of the transketolase Tkt1 appears to strengthen flux through the pentose phosphate pathway (PPP), which provides more NADPH for γ-PGA biosynthesis. Furthermore, because glucose is absent from these glycerol-driven production processes, insufficient generation of glucose 6-phosphate (G6P) is thought to decrease PPP activity [232]. Therefore, to support the upper gluconeogenic pathway and to fuel the PPP for γ-PGA synthesis, the glpX gene was overexpressed. This combinatorial, targeted metabolic engineering led to a 1.50-fold increase in γ-PGA synthesis in B. licheniformis compared with the wild-type strain. Another limiting factor in polyglutamate synthesis seems to be the supply of adenosine triphosphate (ATP). Cai et al. [236] demonstrated various successful approaches to improve ATP supply in engineered B. licheniformis strains. Eliminating the cytochrome bd oxidase branch reduced the maintenance coefficient, which supported γ-PGA synthesis. Furthermore, co-expression of a heterologous hemoglobin and combined overexpression of the homologous purB and adK genes strengthened the ATP biosynthetic pathway and increased the γ-PGA yield [236]. These strategies illustrate the importance of defined steps in fine-tuning metabolic capacity by iteratively adapting the physiology of individual polyglutamate-overproducing strains. An interesting approach to producing a precursor of the chemotherapeutic compound Taxol in B. subtilis 168 was recently presented [237]. This metabolic engineering approach was based on the cloning and functional heterologous expression of the plant-derived taxadiene synthase, which catalyzes
the conversion of the precursor geranylgeranyl pyrophosphate (GGPP) to taxa-4,11-diene. In a next step, a synthetic operon (dxs, ispD, ispF, ispH, ispC, ispE, ispG) harboring the B. subtilis genes of the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway was designed. This pathway was supported by co-expression of the heterologous yqiD and crtE genes, which encode, respectively, the geranyl and farnesyl pyrophosphate synthase, providing farnesyl pyrophosphate, and the geranylgeranyl pyrophosphate synthase (GGPPS), providing the precursor GGPP. The combined overexpression of these homologous and heterologous genes yielded about 17.8 mg l−1 taxadiene in the extracellular medium of the engineered B. subtilis strain [237]. Another important field of biotechnological application of Bacillus is the production of peptide antibiotics such as bacitracin. This broad-spectrum antibiotic, which is produced by B. subtilis and B. licheniformis, is synthesized by nonribosomal peptide synthetases (NRPSs) and composed of nine amino acids, including the branched-chain amino acids (BCAAs) Leu, Ile, and Val [238]. The large NRPS multienzyme complexes are difficult to manipulate, and the supply of essential BCAAs as precursors is a major bottleneck in bacitracin production. However, as recently shown, a mutation of the uncharacterized gene yhdG, encoding a putative BCAA permease, increased bacitracin production in B. licheniformis DW2 [239]. Furthermore, deletion of the lrp gene, encoding the leucine-responsive regulator, increased the intracellular BCAA level via increased expression of the BCAA importer BrnQ [240]. Another strategy to improve BCAA supply was to enhance amino acid metabolism by manipulating the TCA cycle, namely by overexpressing its key enzymes citrate synthase CitZ and isocitrate dehydrogenase Icd [241].
This approach appeared to enhance metabolic flux through the TCA cycle, improving energy and precursor supply for BCAA biosynthesis and uptake. The increased amino acid productivity in turn improved bacitracin production [241]. An alternative approach to enhancing bacitracin production in B. licheniformis was tested by Cai et al. [238], who engineered amino acid biosynthesis by manipulating the main TFs regulating carbon and nitrogen metabolism. The authors demonstrated that deletions of the genes for the carbon catabolite repressor CcpN and the LysR-type transcriptional regulator CcpC increased bacitracin yields. The ccpC deletion resulted in overexpression of its target genes for citrate synthase CitZ and aconitate hydratase CitB of the TCA cycle, and the ccpN deletion improved expression of the phosphoenolpyruvate carboxykinase gene pckA and the glyceraldehyde-3-phosphate dehydrogenase gene gapB. Engineering these ATP-forming and NADPH-dependent enzymes most probably improved precursor levels as well as ATP and NADPH supply and thus led to more efficient amino acid and bacitracin biosynthesis. Moreover, individual P43 promoter-mediated overexpression of the nitrogen-responsive regulators CodY, TnrA, and GlnR also increased intracellular BCAA levels and thus further enhanced bacitracin yields [238]. The combinatorial genome editing of the ccpC and ccpN mutations with chromosomally integrated codY, tnrA, and glnR-overexpressing gene copies
enabled a further, significant increase in bacitracin yield in the metabolically engineered B. licheniformis strain DW2-CNCTG. Although CRISPRi is a relatively new technology, CRISPRi systems have been developed and successfully applied in synthetic biology approaches in Bacillus. Inhibiting byproduct formation and competing pathways without permanent genome modification is a major benefit with strong potential to simplify and accelerate the screening of new targets for strain engineering. Moreover, CRISPRi allows fine-tuning of gene expression levels, facilitating new design strategies. The advantage of CRISPRi-based metabolic engineering of Bacillus spp. became apparent in a recent study on the optimization of a heterologous hyaluronic acid (HA) biosynthesis pathway in B. subtilis [164]. Formation of HA from the monomeric precursors uridine diphosphate (UDP)-glucuronic acid and UDP-N-acetylglucosamine competes with central carbon metabolism and cell wall biosynthesis (Figure 13.4). Two key enzymes compete with product formation for flux: Zwf (glucose 6-phosphate dehydrogenase), which catalyzes the first step of the pentose phosphate pathway, and PfkA (6-phosphofructokinase), which feeds fructose 6-phosphate into glycolysis. Increasing product formation without severely affecting essential cellular functions is challenging, as demonstrated by the growth defects resulting from inactivation of pfkA [16]. In contrast, CRISPRi-based reduction of pfkA and zwf expression levels by 80–90% resulted in twofold higher product titers and reduced production of overflow metabolites, indicating reduced carbon flux through glycolysis, while maintaining wild type-like cell densities [164]. In another study, biosynthesis of the lipopeptide surfactin in B. subtilis was improved using CRISPRi to modify the amino acid metabolism of the production host [160].
Prescreening of 20 single knockdown strains and subsequent combinatorial repression of the most promising target genes resulted in a 4.5-fold increase in final product titer, volumetric productivity, and carbon-source yield [160]. Many of the targets that increased yields are related to aspartate or glutamate metabolism, indicating that downregulating these pathways increased the supply of precursor molecules for surfactin biosynthesis. Although repression of the yrpC and racE genes, encoding glutamate racemases, was accompanied by improved growth, this parameter contributed only partially to the improved productivity [160]. Furthermore, two of the three targets (yrpC, murC) whose repression caused the highest productivity increase in the prescreening were among the least repressed genes, still retaining 20% of their expression level. This study indicated the advantage of attenuating competing pathways rather than inactivating them completely, in particular for genes affecting cellular core functions (murC and racE are essential genes in B. subtilis) [160].
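The rationale for prescreening single knockdowns before combining them can be illustrated with simple arithmetic. If every subset of the 20 prescreened targets were treated as a candidate multi-knockdown strain, exhaustive testing would be impractical, whereas combining only the best few targets keeps the number of strains small. The sketch below counts both cases; the gene names and relative titers in prescreen are hypothetical placeholders, and restricting the combinations to the top three targets is an assumption made for illustration only.

```python
import math
from itertools import combinations

N_TARGETS = 20  # number of single-knockdown strains prescreened in the surfactin study

# Every non-empty subset of the 20 targets would be a distinct multi-knockdown strain.
all_multiplexed = 2 ** N_TARGETS - 1


def combos_from_top(k: int, min_size: int = 2) -> int:
    """Number of multi-gene knockdowns (at least min_size genes) that can be
    built from the k best-performing single knockdowns."""
    return sum(math.comb(k, r) for r in range(min_size, k + 1))


# Hypothetical prescreen: relative titer of each single knockdown versus the
# unrepressed control (placeholder values, not published results).
prescreen = {"gene_A": 1.9, "gene_B": 1.1, "gene_C": 2.4, "gene_D": 0.8, "gene_E": 1.6}
top3 = sorted(prescreen, key=prescreen.get, reverse=True)[:3]
candidates = [c for r in (2, 3) for c in combinations(top3, r)]

print(all_multiplexed)     # 1048575
print(combos_from_top(3))  # 4
print(candidates)          # the double and triple knockdowns to build next
```

Ranking by a single-knockdown prescreen assumes that beneficial effects are roughly additive; interactions between targets are only revealed once the combinations are actually tested.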

Figure 13.4 Metabolic engineering of the hyaluronic acid (HA) biosynthetic pathway in B. subtilis. Source: Adapted from Westbrook et al. [164]. Proteins marked in blue represent targets for upregulation. Primary target reactions (proteins) for downregulation are marked in orange. Expression of TuaB (UDP-glucose 6-dehydrogenase) and HasA (hyaluronan synthase from Streptococcus sp.) is upregulated in all engineered strains and is critical for high-level HA production [164]. UDP, uridine diphosphate; GlcUA, glucuronic acid; GlcNAc, N-acetylglucosamine; PfkA, 6-phosphofructokinase; Pgi, glucose-6-P isomerase; Zwf, glucose-6-P 1-dehydrogenase; PgcA, phosphoglucomutase; GlmS, glutamine-fructose-6-P amidotransferase; GlnA, glutamine synthetase; GtaB, UTP-glucose-1-P uridylyltransferase; GlmM, phosphoglucosamine mutase; GcaD, UDP-GlcNAc pyrophosphorylase; TagE, UDP-glucose:polyglycerol phosphate glucosyltransferase; TagO, UDP-GlcNAc:undecaprenyl-P N-acetylglucosaminyl-1-P transferase; MnaA, UDP-N-acetylmannosamine 2-epimerase; MurAA, UDP-GlcNAc 1-carboxyvinyltransferase; MurAB, UDP-GlcNAc 1-carboxyvinyltransferase. Note that MurAA, MurAB, TagO, and MnaA are essential in B. subtilis. Hyaluronic acid is frequently produced from sucrose as the primary carbon source [164]; the required sucrose-specific phosphotransferase system with the sucrose transporter SacP and the sucrose-6-P hydrolase SacA are not depicted.

13.8 Concluding Remarks and Future Perspectives

In an ideal scenario, metabolic modeling allows for high predictability regarding the characteristics of tailored cell factories. However, we are still at the beginning of understanding biological systems in a holistic way. Although B. subtilis is one of the best-characterized model organisms, the complexity of this highly versatile bacterium is still insufficiently understood. Consequently, there is a strong demand for the development and improvement of genome-scale metabolic models. In addition, further in-depth characterization of proteins of unknown function is required. Synthetic biology approaches have the potential to enhance our hitherto limited understanding of host cell physiology by applying new technologies, as outlined in this chapter. These include multidimensional pathway optimization and gTME, which have recently been applied successfully to optimize Bacillus cell factories [7, 212]. Moreover, these technologies increase the solution space compared with more traditional metabolic engineering approaches. The development of more sophisticated readout systems to improve the screening of large libraries resulting from complex design strategies will be crucial for the
success of these synthetic biology approaches. Bacillus has strong potential as a source for mining genetically encoded biosensors, providing new solutions for activity-independent screening [219, 225]. Finally, the construction of a minimal cell with an essential gene set will help to reduce complexity, thereby contributing to an improved understanding of cellular life and, consequently, improved chassis design. B. subtilis is one of the most extensively studied organisms in this context, and the studies conducted provide valuable information to the whole synthetic biology community [67, 75, 82]. Moreover, genome reduction approaches have the potential to uncover unknown physiological effects, which may result in improved but also undesirable strain characteristics [67]. Future work in metabolic engineering and adaptive laboratory evolution may help to construct improved and more robust minimal Bacillus cells as new cell factories [69, 82, 242, 243]. Promising targets for future strain optimization by the minimal genome concept are the versatile stress and starvation adaptation mechanisms of bacterial host cells. Bacteria have developed several molecular strategies to adapt to adverse environmental conditions and to compete against other organisms in their natural habitats. In nature, Bacilli usually encounter limiting growth conditions in dense, mixed populations within dynamic, competitive microbial communities. However, under controlled conditions in an industrial bioreactor, these natural adaptation processes are usually not required, as abiotic stress is rather low, continuous substrate supply is guaranteed, and there is no competition with other microbes for limited resources.
Thus, elucidating the mechanisms underlying specific regulatory circuits and gene functions involved in adaptive stress, starvation, or survival responses, and modulating or inactivating the corresponding gene expression, are suitable future approaches in metabolic engineering. Genomically streamlined "designer bugs" would not only lead to more resilient and efficient Bacillus cell factories for established industrial bioprocesses but would also provide new cell chassis for the sustainable and economical production of novel bioproducts. Finally, such "designer bugs" are a prerequisite for replacing established chemical production processes with new synthetic bioprocesses.

References

1 Zhang, W., Li, F., and Nie, L. (2010). Integrating multiple “omics” analysis for microbial biology: application and methodologies. Microbiology (Reading, England) 156: 287–301. https://doi.org/10.1099/mic.0.034793-0.
2 Bartley, B.A., Kim, K., Medley, J.K., and Sauro, H.M. (2017). Synthetic biology: engineering living systems from biophysical principles. Biophysical Journal 112: 1050–1058. https://doi.org/10.1016/j.bpj.2017.02.013.
3 Anagnostopoulos, C. and Spizizen, J. (1961). Requirements for transformation in Bacillus subtilis. Journal of Bacteriology 81: 741–746. https://doi.org/10.1128/JB.81.5.741-746.1961.
4 Freese, E. (1972). Chapter 3. Sporulation of Bacilli, a model of cellular differentiation. In: Current Topics in Developmental Biology, vol. 7, 85–124. Elsevier.
5 Schaeffer, P., Millet, J., and Aubert, J.P. (1965). Catabolic repression of bacterial sporulation. Proceedings of the National Academy of Sciences of the United States of America 54: 704–711. https://doi.org/10.1073/pnas.54.3.704.
6 Gu, Y., Xu, X., Wu, Y. et al. (2018). Advances and prospects of Bacillus subtilis cellular factories: from rational design to industrial applications. Metabolic Engineering 50: 109–121. https://doi.org/10.1016/j.ymben.2018.05.006.
7 Lu, Z., Yang, S., Yuan, X. et al. (2019). CRISPR-assisted multi-dimensional regulation for fine-tuning gene expression in Bacillus subtilis. Nucleic Acids Research 47: e40. https://doi.org/10.1093/nar/gkz072.
8 Schallmey, M., Singh, A., and Ward, O.P. (2004). Developments in the use of Bacillus species for industrial production. Canadian Journal of Microbiology 50: 1–17. https://doi.org/10.1139/w03-076.
9 Simonen, M. and Palva, I. (1993). Protein secretion in Bacillus species. Microbiological Reviews 57: 109–137.
10 Westers, L., Westers, H., and Quax, W.J. (2004). Bacillus subtilis as cell factory for pharmaceutical proteins: a biotechnological approach to optimize the host organism. Biochimica et Biophysica Acta 1694: 299–310. https://doi.org/10.1016/j.bbamcr.2004.02.011.
11 Zweers, J.C., Barák, I., Becher, D. et al. (2008). Towards the development of Bacillus subtilis as a cell factory for membrane proteins and protein complexes. Microbial Cell Factories 7: 10. https://doi.org/10.1186/1475-2859-7-10.
12 Barbe, V., Cruveiller, S., Kunst, F. et al. (2009). From a consortium sequence to a unified sequence: the Bacillus subtilis 168 reference genome a decade later. Microbiology (Reading, England) 155: 1758–1775. https://doi.org/10.1099/mic.0.027839-0.
13 Kunst, F., Ogasawara, N., Moszer, I. et al. (1997). The complete genome sequence of the gram-positive bacterium Bacillus subtilis. Nature 390: 249–256. https://doi.org/10.1038/36786.
14 Juhas, M., Reuß, D.R., Zhu, B., and Commichau, F.M. (2014). Bacillus subtilis and Escherichia coli essential genes and minimal cell factories after one decade of genome engineering. Microbiology (Reading, England) 160: 2341–2351. https://doi.org/10.1099/mic.0.079376-0.
15 Kobayashi, K., Ehrlich, S.D., Albertini, A. et al. (2003). Essential Bacillus subtilis genes. Proceedings of the National Academy of Sciences of the United States of America 100: 4678–4683. https://doi.org/10.1073/pnas.0730515100.
16 Koo, B.-M., Kritikos, G., Farelli, J.D. et al. (2017). Construction and analysis of two genome-scale deletion libraries for Bacillus subtilis. Cell Systems 4: 291–305.e7. https://doi.org/10.1016/j.cels.2016.12.013.
17 Peters, J.M., Colavin, A., Shi, H. et al. (2016). A comprehensive, CRISPR-based functional analysis of essential genes in bacteria. Cell 165: 1493–1506. https://doi.org/10.1016/j.cell.2016.05.003.
18 Nicolas, P., Mäder, U., Dervyn, E. et al. (2012). Condition-dependent transcriptome reveals high-level regulatory architecture in Bacillus subtilis. Science (New York, N.Y.) 335: 1103–1106. https://doi.org/10.1126/science.1206848.
19 Rasmussen, S., Nielsen, H.B., and Jarmer, H. (2009). The transcriptionally

20

21

22

23

active regions in the genome of Bacillus subtilis. Molecular Microbiology 73: 1043–1057. https://doi.org/10.1111/j.1365-2958.2009.06830.x.
20 Kohlstedt, M., Sappa, P.K., Meyer, H. et al. (2014). Adaptation of Bacillus subtilis carbon core metabolism to simultaneous nutrient limitation and osmotic challenge: a multi-omics perspective. Environmental Microbiology 16: 1898–1917. https://doi.org/10.1111/1462-2920.12438.
21 Maaß, S., Wachlin, G., Bernhardt, J. et al. (2014). Highly precise quantification of protein molecules per cell during stress and starvation responses in Bacillus subtilis. Molecular & Cellular Proteomics: MCP 13: 2260–2276. https://doi.org/10.1074/mcp.M113.035741.
22 Muntel, J., Fromion, V., Goelzer, A. et al. (2014). Comprehensive absolute quantification of the cytosolic proteome of Bacillus subtilis by data independent, parallel fragmentation in liquid chromatography/mass spectrometry (LC/MS(E)). Molecular & Cellular Proteomics: MCP 13: 1008–1019. https://doi.org/10.1074/mcp.M113.032631.
23 Otto, A., Bernhardt, J., Meyer, H. et al. (2010). Systems-wide temporal proteomic profiling in glucose-starved Bacillus subtilis. Nature Communications 1: 137. https://doi.org/10.1038/ncomms1137.
24 Tjalsma, H., Antelmann, H., Jongbloed, J.D.H. et al. (2004). Proteomics of protein secretion by Bacillus subtilis: separating the “secrets” of the secretome. Microbiology and Molecular Biology Reviews: MMBR 68: 207–233. https://doi.org/10.1128/MMBR.68.2.207-233.2004.
25 Fischer, E. and Sauer, U. (2005). Large-scale in vivo flux analysis shows rigidity and suboptimal performance of Bacillus subtilis metabolism. Nature Genetics 37: 636–640. https://doi.org/10.1038/ng1555.
26 Liu, Y., Link, H., Liu, L. et al. (2016). A dynamic pathway analysis approach reveals a limiting futile cycle in N-acetylglucosamine overproducing Bacillus subtilis. Nature Communications 7: 11933. https://doi.org/10.1038/ncomms11933.
27 Meyer, H., Weidmann, H., Mäder, U. et al. (2014). A time resolved metabolomics study: the influence of different carbon sources during growth and starvation of Bacillus subtilis. Molecular BioSystems 10: 1812–1823. https://doi.org/10.1039/c4mb00112e.
28 Buescher, J.M., Liebermeister, W., Jules, M. et al. (2012). Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science (New York, N.Y.) 335: 1099–1103. https://doi.org/10.1126/science.1206871.
29 Zhu, B. and Stülke, J. (2018). SubtiWiki in 2018: from genes and proteins to functional network annotation of the model organism Bacillus subtilis. Nucleic Acids Research 46: D743–D748. https://doi.org/10.1093/nar/gkx908.
30 Eijlander, R.T., de Jong, A., Krawczyk, A.O. et al. (2014). SporeWeb: an interactive journey through the complete sporulation cycle of Bacillus subtilis. Nucleic Acids Research 42: D685–D691. https://doi.org/10.1093/nar/gkt1007.
31 Caspi, R., Altman, T., Billington, R. et al. (2014). The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of


13 Metabolic Engineering of Bacillus – New Tools, Strains, and Concepts

pathway/genome databases. Nucleic Acids Research 42: D459–D471. https://doi.org/10.1093/nar/gkt1103.
32 Henry, C.S., Zinner, J.F., Cohoon, M.P., and Stevens, R.L. (2009). iBsu1103: a new genome-scale metabolic model of Bacillus subtilis based on SEED annotations. Genome Biology 10: R69. https://doi.org/10.1186/gb-2009-10-6-r69.
33 Sierro, N., Makita, Y., de Hoon, M., and Nakai, K. (2008). DBTBS: a database of transcriptional regulation in Bacillus subtilis containing upstream intergenic conservation information. Nucleic Acids Research 36: D93–D96. https://doi.org/10.1093/nar/gkm910.
34 Tanaka, K., Henry, C.S., Zinner, J.F. et al. (2013). Building the repertoire of dispensable chromosome regions in Bacillus subtilis entails major refinement of cognate large-scale metabolic model. Nucleic Acids Research 41: 687–699. https://doi.org/10.1093/nar/gks963.
35 Hecker, M., Reder, A., Fuchs, S. et al. (2009). Physiological proteomics and stress/starvation responses in Bacillus subtilis and Staphylococcus aureus. Research in Microbiology 160: 245–258. https://doi.org/10.1016/j.resmic.2009.03.008.
36 Voigt, B., Schroeter, R., Schweder, T. et al. (2014). A proteomic view of cell physiology of the industrial workhorse Bacillus licheniformis. Journal of Biotechnology 191: 139–149. https://doi.org/10.1016/j.jbiotec.2014.06.004.
37 Schweder, T. (2011). Bioprocess monitoring by marker gene analysis. Biotechnology Journal 6: 926–933. https://doi.org/10.1002/biot.201100248.
38 Antelmann, H., Scharf, C., and Hecker, M. (2000). Phosphate starvation-inducible proteins of Bacillus subtilis: proteomics and transcriptional analysis. Journal of Bacteriology 182: 4478–4490. https://doi.org/10.1128/jb.182.16.4478-4490.2000.
39 Bernhardt, J., Weibezahn, J., Scharf, C., and Hecker, M. (2003). Bacillus subtilis during feast and famine: visualization of the overall regulation of protein synthesis during glucose starvation by proteome analysis. Genome Research 13: 224–237. https://doi.org/10.1101/gr.905003.
40 Le Hoi, T., Voigt, B., Jürgen, B. et al. (2006). The phosphate-starvation response of Bacillus licheniformis. Proteomics 6: 3582–3601. https://doi.org/10.1002/pmic.200500842.
41 Voigt, B., Schweder, T., Becher, D. et al. (2004). A proteomic view of cell physiology of Bacillus licheniformis. Proteomics 4: 1465–1490. https://doi.org/10.1002/pmic.200300684.
42 Voigt, B., Le Hoi, T., Jürgen, B. et al. (2007). The glucose and nitrogen starvation response of Bacillus licheniformis. Proteomics 7: 413–423. https://doi.org/10.1002/pmic.200600556.
43 Atkinson, G.C., Tenson, T., and Hauryliuk, V. (2011). The RelA/SpoT homolog (RSH) superfamily: distribution and functional evolution of ppGpp synthetases and hydrolases across the tree of life. PLoS One 6: e23479. https://doi.org/10.1371/journal.pone.0023479.
44 Hecker, M., Pané-Farré, J., and Völker, U. (2007). SigB-dependent general stress response in Bacillus subtilis and related gram-positive bacteria. Annual Review of Microbiology 61: 215–236. https://doi.org/10.1146/annurev.micro.61.080706.093445.
45 Le Tam, T., Antelmann, H., Eymann, C. et al. (2006). Proteome signatures for stress and starvation in Bacillus subtilis as revealed by a 2-D gel image color coding approach. Proteomics 6: 4565–4585. https://doi.org/10.1002/pmic.200600100.
46 Kabisch, J., Pratzka, I., Meyer, H. et al. (2013). Metabolic engineering of Bacillus subtilis for growth on overflow metabolites. Microbial Cell Factories 12: 72. https://doi.org/10.1186/1475-2859-12-72.
47 Chai, Y., Chu, F., Kolter, R., and Losick, R. (2008). Bistability and biofilm formation in Bacillus subtilis. Molecular Microbiology 67: 254–263. https://doi.org/10.1111/j.1365-2958.2007.06040.x.
48 Dubnau, D. (1991). Genetic competence in Bacillus subtilis. Microbiological Reviews 55: 395–424.
49 Kearns, D.B. and Losick, R. (2005). Cell population heterogeneity during growth of Bacillus subtilis. Genes & Development 19: 3083–3094. https://doi.org/10.1101/gad.1373905.
50 Lopez, D., Vlamakis, H., and Kolter, R. (2009). Generation of multiple cell types in Bacillus subtilis. FEMS Microbiology Reviews 33: 152–163. https://doi.org/10.1111/j.1574-6976.2008.00148.x.
51 Veening, J.-W., Hamoen, L.W., and Kuipers, O.P. (2005). Phosphatases modulate the bistable sporulation gene expression pattern in Bacillus subtilis. Molecular Microbiology 56: 1481–1494. https://doi.org/10.1111/j.1365-2958.2005.04659.x.
52 Vlamakis, H., Aguilar, C., Losick, R., and Kolter, R. (2008). Control of cell fate by the formation of an architecturally complex bacterial community. Genes & Development 22: 945–953. https://doi.org/10.1101/gad.1645008.
53 Marlow, V.L., Cianfanelli, F.R., Porter, M. et al. (2014). The prevalence and origin of exoprotease-producing cells in the Bacillus subtilis biofilm. Microbiology (Reading, England) 160: 56–66. https://doi.org/10.1099/mic.0.072389-0.
54 Veening, J.-W., Igoshin, O.A., Eijlander, R.T. et al. (2008). Transient heterogeneity in extracellular protease production by Bacillus subtilis. Molecular Systems Biology 4: 184. https://doi.org/10.1038/msb.2008.18.
55 Piggot, P.J. and Hilbert, D.W. (2004). Sporulation of Bacillus subtilis. Current Opinion in Microbiology 7: 579–586. https://doi.org/10.1016/j.mib.2004.10.001.
56 González-Pastor, J.E., Hobbs, E.C., and Losick, R. (2003). Cannibalism by sporulating bacteria. Science (New York, N.Y.) 301: 510–513. https://doi.org/10.1126/science.1086462.
57 Claverys, J.-P. and Håvarstein, L.S. (2007). Cannibalism and fratricide: mechanisms and raisons d’être. Nature Reviews. Microbiology 5: 219–229. https://doi.org/10.1038/nrmicro1613.
58 González-Pastor, J.E. (2011). Cannibalism: a social behavior in sporulating Bacillus subtilis. FEMS Microbiology Reviews 35: 415–424. https://doi.org/10.1111/j.1574-6976.2010.00253.x.


59 Schweder, T., Krüger, E., Xu, B. et al. (1999). Monitoring of genes that respond to process-related stress in large-scale bioprocesses. Biotechnology and Bioengineering 65: 151–159. https://doi.org/10.1002/(sici)1097-0290(19991020)65:23.0.co;2-v.
60 Nannapaneni, P., Hertwig, F., Depke, M. et al. (2012). Defining the structure of the general stress regulon of Bacillus subtilis using targeted microarray analysis and random forest classification. Microbiology (Reading, England) 158: 696–707. https://doi.org/10.1099/mic.0.055434-0.
61 Pané-Farré, J., Quin, M.B., Lewis, R.J., and Marles-Wright, J. (2017). Structure and function of the stressosome signalling hub. Sub-Cellular Biochemistry 83: 1–41. https://doi.org/10.1007/978-3-319-46503-6_1.
62 Enfors, S.-O., Jahic, M., Rozkov, A. et al. (2001). Physiological responses to mixing in large scale bioreactors. Journal of Biotechnology 85: 175–185. https://doi.org/10.1016/s0168-1656(00)00365-5.
63 Tännler, S., Decasper, S., and Sauer, U. (2008). Maintenance metabolism and carbon fluxes in Bacillus species. Microbial Cell Factories 7: 19. https://doi.org/10.1186/1475-2859-7-19.
64 Heijnen, J.J., Roels, J.A., and Stouthamer, A.H. (1979). Application of balancing methods in modeling the penicillin fermentation. Biotechnology and Bioengineering 21: 2175–2201. https://doi.org/10.1002/bit.260211204.
65 Reuß, D.R., Commichau, F.M., Gundlach, J. et al. (2016). The blueprint of a minimal cell: MiniBacillus. Microbiology and Molecular Biology Reviews: MMBR 80: 955–987. https://doi.org/10.1128/MMBR.00029-16.
66 Ara, K., Ozaki, K., Nakamura, K. et al. (2007). Bacillus minimum genome factory: effective utilization of microbial genome information. Biotechnology and Applied Biochemistry 46: 169–178. https://doi.org/10.1042/BA20060111.
67 Morimoto, T., Kadoya, R., Endo, K. et al. (2008). Enhanced recombinant protein productivity by genome reduction in Bacillus subtilis. DNA Research: An International Journal for Rapid Publication of Reports on Genes and Genomes 15: 73–81. https://doi.org/10.1093/dnares/dsn002.
68 Hutchison, C.A., Chuang, R.-Y., Noskov, V.N. et al. (2016). Design and synthesis of a minimal bacterial genome. Science (New York, N.Y.) 351: aad6253. https://doi.org/10.1126/science.aad6253.
69 Bu, Q.-T., Yu, P., Wang, J. et al. (2019). Rational construction of genome-reduced and high-efficient industrial Streptomyces chassis based on multiple comparative genomic approaches. Microbial Cell Factories 18: 16. https://doi.org/10.1186/s12934-019-1055-7.
70 Hirokawa, Y., Kawano, H., Tanaka-Masuda, K. et al. (2013). Genetic manipulations restored the growth fitness of reduced-genome Escherichia coli. Journal of Bioscience and Bioengineering 116: 52–58. https://doi.org/10.1016/j.jbiosc.2013.01.010.
71 Komatsu, M., Uchiyama, T., Omura, S. et al. (2010). Genome-minimized Streptomyces host for the heterologous expression of secondary metabolism. Proceedings of the National Academy of Sciences of the United States of America 107: 2646–2651. https://doi.org/10.1073/pnas.0914833107.
72 Leprince, A., de Lorenzo, V., Völler, P. et al. (2012). Random and cyclical deletion of large DNA segments in the genome of Pseudomonas putida. Environmental Microbiology 14: 1444–1453. https://doi.org/10.1111/j.1462-2920.2012.02730.x.
73 Lieder, S., Nikel, P.I., de Lorenzo, V., and Takors, R. (2015). Genome reduction boosts heterologous gene expression in Pseudomonas putida. Microbial Cell Factories 14: 23. https://doi.org/10.1186/s12934-015-0207-7.
74 Pósfai, G., Plunkett, G., Fehér, T. et al. (2006). Emergent properties of reduced-genome Escherichia coli. Science (New York, N.Y.) 312: 1044–1046. https://doi.org/10.1126/science.1126439.
75 Reuß, D.R., Altenbuchner, J., Mäder, U. et al. (2017). Large-scale reduction of the Bacillus subtilis genome: consequences for the transcriptional network, resource allocation, and metabolism. Genome Research 27: 289–299. https://doi.org/10.1101/gr.215293.116.
76 Sasaki, M., Kumagai, H., Takegawa, K., and Tohda, H. (2013). Characterization of genome-reduced fission yeast strains. Nucleic Acids Research 41: 5382–5399. https://doi.org/10.1093/nar/gkt233.
77 Shen, X., Wang, Z., Huang, X. et al. (2017). Developing genome-reduced Pseudomonas chlororaphis strains for the production of secondary metabolites. BMC Genomics 18: 715. https://doi.org/10.1186/s12864-017-4127-2.
78 Unthan, S., Baumgart, M., Radek, A. et al. (2015). Chassis organism from Corynebacterium glutamicum – a top-down approach to identify and delete irrelevant gene clusters. Biotechnology Journal 10: 290–301. https://doi.org/10.1002/biot.201400041.
79 Zhu, D., Fu, Y., Liu, F. et al. (2017). Enhanced heterologous protein productivity by genome reduction in Lactococcus lactis NZ9000. Microbial Cell Factories 16: 1. https://doi.org/10.1186/s12934-016-0616-2.
80 Commichau, F.M., Pietack, N., and Stülke, J. (2013). Essential genes in Bacillus subtilis: a re-evaluation after ten years. Molecular BioSystems 9: 1068–1075. https://doi.org/10.1039/c3mb25595f.
81 Westers, H., Dorenbos, R., van Dijl, J.M. et al. (2003). Genome engineering reveals large dispensable regions in Bacillus subtilis. Molecular Biology and Evolution 20: 2076–2090. https://doi.org/10.1093/molbev/msg219.
82 Manabe, K., Kageyama, Y., Morimoto, T. et al. (2011). Combined effect of improved cell yield and increased specific productivity enhances recombinant enzyme production in genome-reduced Bacillus subtilis strain MGB874. Applied and Environmental Microbiology 77: 8370–8381. https://doi.org/10.1128/AEM.06136-11.
83 Wenzel, M. and Altenbuchner, J. (2015). Development of a markerless gene deletion system for Bacillus subtilis based on the mannose phosphoenolpyruvate-dependent phosphotransferase system. Microbiology (Reading, England) 161: 1942–1949. https://doi.org/10.1099/mic.0.000150.
84 Li, Y., Zhu, X., Zhang, X. et al. (2016). Characterization of genome-reduced Bacillus subtilis strains and their application for the production of guanosine and thymidine. Microbial Cell Factories 15: 94. https://doi.org/10.1186/s12934-016-0494-7.


85 Borriss, R., Danchin, A., Harwood, C.R. et al. (2018). Bacillus subtilis, the model Gram-positive bacterium: 20 years of annotation refinement. Microbial Biotechnology 11 (1): 3–17. https://doi.org/10.1111/1751-7915.13043.
86 Aguilar Suárez, R., Stülke, J., and van Dijl, J.M. (2019). Less is more: toward a genome-reduced Bacillus cell factory for “difficult proteins”. ACS Synthetic Biology 8: 99–108. https://doi.org/10.1021/acssynbio.8b00342.
87 Geissler, M., Kühle, I., Morabbi Heravi, K. et al. (2019). Evaluation of surfactin synthesis in a genome reduced Bacillus subtilis strain. AMB Express 9: 84. https://doi.org/10.1186/s13568-019-0806-5.
88 Ali, N.O., Jeusset, J., Larquet, E. et al. (2003). Specificity of the interaction of RocR with the rocG-rocA intergenic region in Bacillus subtilis. Microbiology (Reading, England) 149: 739–750. https://doi.org/10.1099/mic.0.26013-0.
89 Gardan, R., Rapoport, G., and Débarbouillé, M. (1997). Role of the transcriptional activator RocR in the arginine-degradation pathway of Bacillus subtilis. Molecular Microbiology 24: 825–837. https://doi.org/10.1046/j.1365-2958.1997.3881754.x.
90 Commichau, F.M., Herzberg, C., Tripal, P. et al. (2007). A regulatory protein–protein interaction governs glutamate biosynthesis in Bacillus subtilis: the glutamate dehydrogenase RocG moonlights in controlling the transcription factor GltC. Molecular Microbiology 65 (3): 642–654. https://doi.org/10.1111/j.1365-2958.2007.05816.x.
91 Sonenshein, A.L., Hoch, J.A., and Losick, R. (2002). Bacillus subtilis and its closest relatives. Washington, DC: ASM Press 629 pp.
92 Walker, M.C. and van der Donk, W.A. (2016). The many roles of glutamate in metabolism. Journal of Industrial Microbiology & Biotechnology 43: 419–430. https://doi.org/10.1007/s10295-015-1665-y.
93 Lee, J.H., Sung, B.H., Kim, M.S. et al. (2009). Metabolic engineering of a reduced-genome strain of Escherichia coli for L-threonine production. Microbial Cell Factories 8: 2. https://doi.org/10.1186/1475-2859-8-2.
94 Manabe, K., Kageyama, Y., Morimoto, T. et al. (2013). Improved production of secreted heterologous enzyme in Bacillus subtilis strain MGB874 via modification of glutamate metabolism and growth conditions. Microbial Cell Factories 12: 18. https://doi.org/10.1186/1475-2859-12-18.
95 Kabisch, J., Thürmer, A., Hübel, T. et al. (2013). Characterization and optimization of Bacillus subtilis ATCC 6051 as an expression host. Journal of Biotechnology 163: 97–104. https://doi.org/10.1016/j.jbiotec.2012.06.034.
96 Wang, Y., Chen, Z., Zhao, R. et al. (2014). Deleting multiple lytic genes enhances biomass yield and production of recombinant proteins by Bacillus subtilis. Microbial Cell Factories 13: 129. https://doi.org/10.1186/s12934-014-0129-9.
97 Liu, D., Huang, C., Guo, J. et al. (2019). Development and characterization of a CRISPR/Cas9n-based multiplex genome editing system for Bacillus subtilis. Biotechnology for Biofuels 12: 197. https://doi.org/10.1186/s13068-019-1537-1.
98 Westbrook, A.W., Moo-Young, M., and Chou, C.P. (2016). Development of a CRISPR-Cas9 tool kit for comprehensive engineering of Bacillus subtilis. Applied and Environmental Microbiology 82: 4876–4895. https://doi.org/10.1128/AEM.01159-16.
99 Harwood, C.R., Pohl, S., Smith, W., and Wipat, A. (2013). Bacillus subtilis. In: Microbial Synthetic Biology, 87–117. Elsevier.
100 van Tilburg, A.Y., Cao, H., van der Meulen, S.B. et al. (2019). Metabolic engineering and synthetic biology employing Lactococcus lactis and Bacillus subtilis cell factories. Current Opinion in Biotechnology 59: 1–7. https://doi.org/10.1016/j.copbio.2019.01.007.
101 Guérout-Fleury, A.M., Frandsen, N., and Stragier, P. (1996). Plasmids for ectopic integration in Bacillus subtilis. Gene 180: 57–61. https://doi.org/10.1016/s0378-1119(96)00404-0.
102 Vagner, V., Dervyn, E., and Ehrlich, S.D. (1998). A vector for systematic gene inactivation in Bacillus subtilis. Microbiology (Reading, England) 144 (Pt 11): 3097–3104. https://doi.org/10.1099/00221287-144-11-3097.
103 Burby, P.E. and Simmons, L.A. (2017). CRISPR/Cas9 editing of the Bacillus subtilis genome. Bio-Protocol 7 https://doi.org/10.21769/BioProtoc.2272.
104 Albert, H., Dale, E.C., Lee, E., and Ow, D.W. (1995). Site-specific integration of DNA into wild-type and mutant lox sites placed in the plant genome. The Plant Journal: For Cell and Molecular Biology 7: 649–659. https://doi.org/10.1046/j.1365-313x.1995.7040649.x.
105 Marx, C.J. and Lidstrom, M.E. (2002). Broad-host-range cre-lox system for antibiotic marker recycling in gram-negative bacteria. BioTechniques 33: 1062–1067. https://doi.org/10.2144/02335rr01.
106 Kovács, A.T., van Hartskamp, M., Kuipers, O.P., and van Kranenburg, R. (2010). Genetic tool development for a new host for biotechnology, the thermotolerant bacterium Bacillus coagulans. Applied and Environmental Microbiology 76: 4085–4088. https://doi.org/10.1128/AEM.03060-09.
107 Kumpfmüller, J., Kabisch, J., and Schweder, T. (2013). An optimized technique for rapid genome modifications of Bacillus subtilis. Journal of Microbiological Methods 95: 350–352. https://doi.org/10.1016/j.mimet.2013.10.003.
108 Wu, L., Wu, H., Chen, L. et al. (2015). Bacilysin overproduction in Bacillus amyloliquefaciens FZB42 markerless derivative strains FZBREP and FZBSPA enhances antibacterial activity. Applied Microbiology and Biotechnology 99: 4255–4263. https://doi.org/10.1007/s00253-014-6251-0.
109 Yan, X., Yu, H.-J., Hong, Q., and Li, S.-P. (2008). Cre/lox system and PCR-based genome engineering in Bacillus subtilis. Applied and Environmental Microbiology 74: 5556–5562. https://doi.org/10.1128/AEM.01156-08.
110 Fabret, C., Ehrlich, S.D., and Noirot, P. (2002). A new mutation delivery system for genome-scale approaches in Bacillus subtilis. Molecular Microbiology 46: 25–36. https://doi.org/10.1046/j.1365-2958.2002.03140.x.
111 Sawitzke, J.A., Thomason, L.C., Costantino, N. et al. (2007). Recombineering: in vivo genetic engineering in E. coli, S. enterica, and beyond. Methods in Enzymology 421: 171–199. https://doi.org/10.1016/S0076-6879(06)21015-2.
112 Kostner, D., Rachinger, M., Liebl, W., and Ehrenreich, A. (2017). Markerless deletion of putative alanine dehydrogenase genes in Bacillus licheniformis using a codBA-based counterselection technique. Microbiology (Reading, England) 163: 1532–1539. https://doi.org/10.1099/mic.0.000544.
113 Dong, H. and Zhang, D. (2014). Current development in genetic engineering strategies of Bacillus species. Microbial Cell Factories 13: 63. https://doi.org/10.1186/1475-2859-13-63.
114 Borgmeier, C., Bongaerts, J., and Meinhardt, F. (2012). Genetic analysis of the Bacillus licheniformis degSU operon and the impact of regulatory mutations on protease production. Journal of Biotechnology 159: 12–20. https://doi.org/10.1016/j.jbiotec.2012.02.011.
115 Wemhoff, S. and Meinhardt, F. (2013). Generation of biologically contained, readily transformable, and genetically manageable mutants of the biotechnologically important Bacillus pumilus. Applied Microbiology and Biotechnology 97: 7805–7819. https://doi.org/10.1007/s00253-013-4935-5.
116 Dubeau, M.-P., Ghinet, M.G., Jacques, P.-E. et al. (2009). Cytosine deaminase as a negative selection marker for gene disruption and replacement in the genus Streptomyces and other actinobacteria. Applied and Environmental Microbiology 75: 1211–1214. https://doi.org/10.1128/AEM.02139-08.
117 van der Geize, R., de Jong, W., Hessels, G.I. et al. (2008). A novel method to generate unmarked gene deletions in the intracellular pathogen Rhodococcus equi using 5-fluorocytosine conditional lethality. Nucleic Acids Research 36: e151. https://doi.org/10.1093/nar/gkn811.
118 Brockmeier, U., Caspers, M., Freudl, R. et al. (2006). Systematic screening of all signal peptides from Bacillus subtilis: a powerful strategy in optimizing heterologous protein secretion in Gram-positive bacteria. Journal of Molecular Biology 362: 393–402. https://doi.org/10.1016/j.jmb.2006.07.034.
119 Arnaud, M., Chastanet, A., and Débarbouillé, M. (2004). New vector for efficient allelic replacement in naturally nontransformable, low-GC-content, gram-positive bacteria. Applied and Environmental Microbiology 70: 6887–6891. https://doi.org/10.1128/AEM.70.11.6887-6891.2004.
120 Patrick, J.E. and Kearns, D.B. (2008). MinJ (YvjD) is a topological determinant of cell division in Bacillus subtilis. Molecular Microbiology 70: 1166–1179. https://doi.org/10.1111/j.1365-2958.2008.06469.x.
121 Rachinger, M., Bauch, M., Strittmatter, A. et al. (2013). Size unlimited markerless deletions by a transconjugative plasmid-system in Bacillus licheniformis. Journal of Biotechnology 167: 365–369. https://doi.org/10.1016/j.jbiotec.2013.07.026.
122 Villafane, R., Bechhofer, D.H., Narayanan, C.S., and Dubnau, D. (1987). Replication control genes of plasmid pE194. Journal of Bacteriology 169: 4822–4829. https://doi.org/10.1128/jb.169.10.4822-4829.1987.
123 Muth, G. (2018). The pSG5-based thermosensitive vector family for genome editing and gene expression in actinomycetes. Applied Microbiology and Biotechnology 102: 9067–9080. https://doi.org/10.1007/s00253-018-9334-5.
124 Konkol, M.A., Blair, K.M., and Kearns, D.B. (2013). Plasmid-encoded ComI inhibits competence in the ancestral 3610 strain of Bacillus subtilis. Journal of Bacteriology 195: 4085–4093. https://doi.org/10.1128/JB.00696-13.


125 Zhang, K., Duan, X., and Wu, J. (2016). Multigene disruption in undomesticated Bacillus subtilis ATCC 6051a using the CRISPR/Cas9 system. Scientific Reports 6: 27943. https://doi.org/10.1038/srep27943.
126 Zhou, C., Liu, H., Yuan, F. et al. (2019). Development and application of a CRISPR/Cas9 system for Bacillus licheniformis genome editing. International Journal of Biological Macromolecules 122: 329–337. https://doi.org/10.1016/j.ijbiomac.2018.10.170.
127 Ferreira, R., David, F., and Nielsen, J. (2018). Advancing biotechnology with CRISPR/Cas9: recent applications and patent landscape. Journal of Industrial Microbiology & Biotechnology 45: 467–480. https://doi.org/10.1007/s10295-017-2000-6.
128 Tong, Y., Weber, T., and Lee, S.Y. (2019). CRISPR/Cas-based genome engineering in natural product discovery. Natural Product Reports 36: 1262–1280. https://doi.org/10.1039/c8np00089a.
129 Altenbuchner, J. (2016). Editing of the Bacillus subtilis genome by the CRISPR-Cas9 system. Applied and Environmental Microbiology 82: 5421–5427. https://doi.org/10.1128/AEM.01453-16.
130 Li, K., Cai, D., Wang, Z. et al. (2018). Development of an efficient genome editing tool in Bacillus licheniformis using CRISPR-Cas9 nickase. Applied and Environmental Microbiology 84 https://doi.org/10.1128/AEM.02608-17.
131 Mougiakos, I., Bosma, E.F., Weenink, K. et al. (2017). Efficient genome editing of a facultative thermophile using mesophilic spCas9. ACS Synthetic Biology 6: 849–861. https://doi.org/10.1021/acssynbio.6b00339.
132 Hong, K.-Q., Liu, D.-Y., Chen, T., and Wang, Z.-W. (2018). Recent advances in CRISPR/Cas9 mediated genome editing in Bacillus subtilis. World Journal of Microbiology and Biotechnology 34: 153. https://doi.org/10.1007/s11274-018-2537-1.
133 Price, M.A., Cruz, R., Baxter, S. et al. (2019). CRISPR-Cas9 in situ engineering of subtilisin E in Bacillus subtilis. PLoS One 14: e0210121. https://doi.org/10.1371/journal.pone.0210121.
134 So, Y., Park, S.-Y., Park, E.-H. et al. (2017). A highly efficient CRISPR-Cas9-mediated large genomic deletion in Bacillus subtilis. Frontiers in Microbiology 8: 1167. https://doi.org/10.3389/fmicb.2017.01167.
135 Sauer, C., Syvertsson, S., Bohorquez, L.C. et al. (2016). Effect of genome position on heterologous gene expression in Bacillus subtilis: an unbiased analysis. ACS Synthetic Biology 5: 942–947. https://doi.org/10.1021/acssynbio.6b00065.
136 Schultenkämper, K., Brito, L.F., López, M.G. et al. (2019). Establishment and application of CRISPR interference to affect sporulation, hydrogen peroxide detoxification, and mannitol catabolism in the methylotrophic thermophile Bacillus methanolicus. Applied Microbiology and Biotechnology 103: 5879–5889. https://doi.org/10.1007/s00253-019-09907-8.
137 Ran, F.A., Hsu, P.D., Lin, C.-Y. et al. (2013). Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154: 1380–1389. https://doi.org/10.1016/j.cell.2013.08.021.


138 Toymentseva, A.A. and Altenbuchner, J. (2019). New CRISPR-Cas9 vectors for genetic modifications of Bacillus species. FEMS Microbiology Letters 366 https://doi.org/10.1093/femsle/fny284.
139 Bhavsar, A.P., Zhao, X., and Brown, E.D. (2001). Development and characterization of a xylose-dependent system for expression of cloned genes in Bacillus subtilis: conditional complementation of a teichoic acid mutant. Applied and Environmental Microbiology 67: 403–410. https://doi.org/10.1128/AEM.67.1.403-410.2001.
140 Sun, T. and Altenbuchner, J. (2010). Characterization of a mannose utilization system in Bacillus subtilis. Journal of Bacteriology 192: 2128–2139. https://doi.org/10.1128/JB.01673-09.
141 Vavrová, L., Muchová, K., and Barák, I. (2010). Comparison of different Bacillus subtilis expression systems. Research in Microbiology 161: 791–797. https://doi.org/10.1016/j.resmic.2010.09.004.
142 Bailey, J.E. (1991). Toward a science of metabolic engineering. Science (New York, N.Y.) 252: 1668–1675. https://doi.org/10.1126/science.2047876.
143 Cleto, S., Jensen, J.V., Wendisch, V.F., and Lu, T.K. (2016). Corynebacterium glutamicum metabolic engineering with CRISPR interference (CRISPRi). ACS Synthetic Biology 5: 375–385. https://doi.org/10.1021/acssynbio.5b00216.
144 Lee, S.Y. and Kim, H.U. (2015). Systems strategies for developing industrial microbial strains. Nature Biotechnology 33: 1061–1072. https://doi.org/10.1038/nbt.3365.
145 Stephanopoulos, G.N., Aristidou, A.A., and Nielsen, J. (eds.) (1998). Metabolic Engineering. Elsevier.
146 Cobb, R.E., Wang, Y., and Zhao, H. (2015). High-efficiency multiplex genome editing of Streptomyces species using an engineered CRISPR/Cas system. ACS Synthetic Biology 4: 723–728. https://doi.org/10.1021/sb500351f.
147 Garst, A.D., Bassalo, M.C., Pines, G. et al. (2017). Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering. Nature Biotechnology 35: 48–55. https://doi.org/10.1038/nbt.3718.
148 Ronda, C., Pedersen, L.E., Sommer, M.O.A., and Nielsen, A.T. (2016). CRMAGE: CRISPR optimized MAGE recombineering. Scientific Reports 6: 19452. https://doi.org/10.1038/srep19452.
149 Wang, Y., Liu, Y., Liu, J. et al. (2018). MACBETH: multiplex automated Corynebacterium glutamicum base editing method. Metabolic Engineering 47: 200–210. https://doi.org/10.1016/j.ymben.2018.02.016.
150 Moeller, R., Stackebrandt, E., Reitz, G. et al. (2007). Role of DNA repair by nonhomologous-end joining in Bacillus subtilis spore resistance to extreme dryness, mono- and polychromatic UV, and ionizing radiation. Journal of Bacteriology 189: 3306–3311. https://doi.org/10.1128/JB.00018-07.
151 Petit, M.A. and Ehrlich, S.D. (2000). The NAD-dependent ligase encoded by yerG is an essential gene of Bacillus subtilis. Nucleic Acids Research 28: 4642–4648. https://doi.org/10.1093/nar/28.23.4642.


152 Engler, C., Gruetzner, R., Kandzia, R., and Marillonnet, S. (2009). Golden gate shuffling: a one-pot DNA shuffling method based on type IIs restriction enzymes. PLoS One 4: e5553. https://doi.org/10.1371/journal.pone.0005553.
153 Li, Y., Lin, Z., Huang, C. et al. (2015). Metabolic engineering of Escherichia coli using CRISPR–Cas9 meditated genome editing. Metabolic Engineering 31: 13–21. https://doi.org/10.1016/j.ymben.2015.06.006.
154 Mougiakos, I., Bosma, E.F., Ganguly, J. et al. (2018). Hijacking CRISPR-Cas for high-throughput bacterial metabolic engineering: advances and prospects. Current Opinion in Biotechnology 50: 146–157. https://doi.org/10.1016/j.copbio.2018.01.002.
155 Bikard, D., Jiang, W., Samai, P. et al. (2013). Programmable repression and activation of bacterial gene expression using an engineered CRISPR-Cas system. Nucleic Acids Research 41: 7429–7437. https://doi.org/10.1093/nar/gkt520.
156 Na, D., Yoo, S.M., Chung, H. et al. (2013). Metabolic engineering of Escherichia coli using synthetic small regulatory RNAs. Nature Biotechnology 31: 170–174. https://doi.org/10.1038/nbt.2461.
157 Qi, L.S., Larson, M.H., Gilbert, L.A. et al. (2013). Repurposing CRISPR as an RNA-guided platform for sequence-specific control of gene expression. Cell 152: 1173–1183. https://doi.org/10.1016/j.cell.2013.02.022.
158 Berlec, A., Škrlec, K., Kocjan, J. et al. (2018). Single plasmid systems for inducible dual protein expression and for CRISPR-Cas9/CRISPRi gene regulation in lactic acid bacterium Lactococcus lactis. Scientific Reports 8: 1009. https://doi.org/10.1038/s41598-018-19402-1.
159 Lv, L., Ren, Y.-L., Chen, J.-C. et al. (2015). Application of CRISPRi for prokaryotic metabolic engineering involving multiple genes, a case study: controllable P(3HB-co-4HB) biosynthesis. Metabolic Engineering 29: 160–168. https://doi.org/10.1016/j.ymben.2015.03.013.
160 Wang, C., Cao, Y., Wang, Y. et al. (2019). Enhancing surfactin production by using systematic CRISPRi repression to screen amino acid biosynthesis genes in Bacillus subtilis. Microbial Cell Factories 18: 90. https://doi.org/10.1186/s12934-019-1139-4.
161 Wen, Z., Minton, N.P., Zhang, Y. et al. (2017). Enhanced solvent production by metabolic engineering of a twin-clostridial consortium. Metabolic Engineering 39: 38–48. https://doi.org/10.1016/j.ymben.2016.10.013.
162 Zhao, Y., Li, L., Zheng, G. et al. (2018). CRISPR/dCas9-mediated multiplex gene repression in Streptomyces. Biotechnology Journal 13: e1800121. https://doi.org/10.1002/biot.201800121.
163 Zhan, Y., Xu, Y., Zheng, P. et al. (2019). Establishment and application of multiplexed CRISPR interference system in Bacillus licheniformis. Applied Microbiology and Biotechnology https://doi.org/10.1007/s00253-019-10230-5.
164 Westbrook, A.W., Ren, X., Oh, J. et al. (2018). Metabolic engineering to enhance heterologous production of hyaluronic acid in Bacillus subtilis. Metabolic Engineering 47: 401–413. https://doi.org/10.1016/j.ymben.2018.04.016.


165 Gaudelli, N.M., Komor, A.C., Rees, H.A. et al. (2017). Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551: 464–471. https://doi.org/10.1038/nature24644.
166 Komor, A.C., Kim, Y.B., Packer, M.S. et al. (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533: 420–424. https://doi.org/10.1038/nature17946.
167 Nishida, K., Arazoe, T., Yachie, N. et al. (2016). Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science (New York, N.Y.) 353. https://doi.org/10.1126/science.aaf8729.
168 Jiang, F., Taylor, D.W., Chen, J.S. et al. (2016). Structures of a CRISPR-Cas9 R-loop complex primed for DNA cleavage. Science (New York, N.Y.) 351: 867–871. https://doi.org/10.1126/science.aad8282.
169 Komor, A.C., Zhao, K.T., Packer, M.S. et al. (2017). Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Science Advances 3: eaao4774. https://doi.org/10.1126/sciadv.aao4774.
170 Rees, H.A. and Liu, D.R. (2018). Base editing: precision chemistry on the genome and transcriptome of living cells. Nature Reviews. Genetics 19: 770–788. https://doi.org/10.1038/s41576-018-0059-1.
171 Banno, S., Nishida, K., Arazoe, T. et al. (2018). Deaminase-mediated multiplex genome editing in Escherichia coli. Nature Microbiology 3: 423–429. https://doi.org/10.1038/s41564-017-0102-6.
172 Tong, Y., Whitford, C.M., Robertsen, H.L. et al. (2019). Highly efficient DSB-free base editing for streptomycetes with CRISPR-BEST. Proceedings of the National Academy of Sciences of the United States of America 116: 20366–20375. https://doi.org/10.1073/pnas.1913493116.
173 Yu, S., Price, M.A., Wang, Y. et al. (2020). CRISPR-dCas9 mediated cytosine deaminase base editing in Bacillus subtilis. ACS Synthetic Biology 9: 1781–1789. https://doi.org/10.1021/acssynbio.0c00151.
174 Guiziou, S., Sauveplane, V., Chang, H.-J. et al. (2016). A part toolbox to tune genetic expression in Bacillus subtilis. Nucleic Acids Research 44: 7495–7508. https://doi.org/10.1093/nar/gkw624.
175 Li, S., Jendresen, C.B., Grünberger, A. et al. (2016). Enhanced protein and biochemical production using CRISPRi-based growth switches. Metabolic Engineering 38: 274–284. https://doi.org/10.1016/j.ymben.2016.09.003.
176 Price, M.A., Cruz, R., Bryson, J. et al. (2020). Expanding and understanding the CRISPR toolbox for Bacillus subtilis with MAD7 and dMAD7. Biotechnology and Bioengineering 117: 1805–1816. https://doi.org/10.1002/bit.27312.
177 Mougiakos, I., Orsi, E., Ghiffary, M.R. et al. (2019). Efficient Cas9-based genome editing of Rhodobacter sphaeroides for metabolic engineering. Microbial Cell Factories 18: 204. https://doi.org/10.1186/s12934-019-1255-1.
178 Müller, J.E.N., Heggeset, T.M.B., Wendisch, V.F. et al. (2015). Methylotrophy in the thermophilic Bacillus methanolicus, basic insights and application for commodity production from methanol. Applied Microbiology and Biotechnology 99: 535–551. https://doi.org/10.1007/s00253-014-6224-3.

References

179 Mougiakos, I., Mohanraju, P., Bosma, E.F. et al. (2017). Characterizing a thermostable Cas9 for bacterial genome editing and silencing. Nature Communications 8: 1647. https://doi.org/10.1038/s41467-017-01591-4.
180 Irla, M., Heggeset, T.M.B., Nærdal, I. et al. (2016). Genome-based genetic tool development for Bacillus methanolicus: theta- and rolling circle-replicating plasmids for inducible gene expression and application to methanol-based cadaverine production. Frontiers in Microbiology 7: 1481. https://doi.org/10.3389/fmicb.2016.01481.
181 Drejer, E.B., Hakvåg, S., Irla, M., and Brautaset, T. (2018). Genetic tools and techniques for recombinant expression in thermophilic Bacillaceae. Microorganisms 6. https://doi.org/10.3390/microorganisms6020042.
182 Decoene, T., de Paepe, B., Maertens, J. et al. (2018). Standardization in synthetic biology: an engineering discipline coming of age. Critical Reviews in Biotechnology 38: 647–656. https://doi.org/10.1080/07388551.2017.1380600.
183 Lee, M.E., DeLoache, W.C., Cervantes, B., and Dueber, J.E. (2015). A highly characterized yeast toolkit for modular, multipart assembly. ACS Synthetic Biology 4: 975–986. https://doi.org/10.1021/sb500366v.
184 Mutalik, V.K., Guimaraes, J.C., Cambray, G. et al. (2013). Quantitative estimation of activity and quality for collections of functional genetic elements. Nature Methods 10: 347–353. https://doi.org/10.1038/nmeth.2403.
185 Nielsen, A.A.K., Segall-Shapiro, T.H., and Voigt, C.A. (2013). Advances in genetic circuit design: novel biochemistries, deep part mining, and precision gene expression. Current Opinion in Chemical Biology 17: 878–892. https://doi.org/10.1016/j.cbpa.2013.10.003.
186 Popp, P.F., Dotzler, M., Radeck, J. et al. (2017). The Bacillus BioBrick Box 2.0: expanding the genetic toolbox for the standardized work with Bacillus subtilis. Scientific Reports 7: 15058. https://doi.org/10.1038/s41598-017-15107-z.
187 Radeck, J., Kraft, K., Bartels, J. et al. (2013). The Bacillus BioBrick Box: generation and evaluation of essential genetic building blocks for standardized work with Bacillus subtilis. Journal of Biological Engineering 7: 29. https://doi.org/10.1186/1754-1611-7-29.
188 Radeck, J., Meyer, D., Lautenschläger, N., and Mascher, T. (2017). Bacillus SEVA siblings: a Golden Gate-based toolbox to create personalized integrative vectors for Bacillus subtilis. Scientific Reports 7: 14134. https://doi.org/10.1038/s41598-017-14329-5.
189 Müller, K.M. and Arndt, K.M. (2012). Standardization in synthetic biology. Methods in Molecular Biology (Clifton, N.J.) 813: 23–43. https://doi.org/10.1007/978-1-61779-412-4_2.
190 Shetty, R.P., Endy, D., and Knight, T.F. (2008). Engineering BioBrick vectors from BioBrick parts. Journal of Biological Engineering 2: 5. https://doi.org/10.1186/1754-1611-2-5.
191 Liu, D., Mao, Z., Guo, J. et al. (2018). Construction, model-based analysis, and characterization of a promoter library for fine-tuned gene expression in Bacillus subtilis. ACS Synthetic Biology 7: 1785–1797. https://doi.org/10.1021/acssynbio.8b00115.

192 Song, Y., Nikoloff, J.M., Fu, G. et al. (2016). Promoter screening from Bacillus subtilis in various conditions hunting for synthetic biology and industrial applications. PLoS One 11: e0158447. https://doi.org/10.1371/journal.pone.0158447.
193 Yang, S., Du, G., Chen, J., and Kang, Z. (2017). Characterization and application of endogenous phase-dependent promoters in Bacillus subtilis. Applied Microbiology and Biotechnology 101: 4151–4161. https://doi.org/10.1007/s00253-017-8142-7.
194 Keasling, J.D. (2012). Synthetic biology and the development of tools for metabolic engineering. Metabolic Engineering 14: 189–195. https://doi.org/10.1016/j.ymben.2012.01.004.
195 Han, L., Cui, W., Suo, F. et al. (2019). Development of a novel strategy for robust synthetic bacterial promoters based on a stepwise evolution targeting the spacer region of the core promoter in Bacillus subtilis. Microbial Cell Factories 18: 96. https://doi.org/10.1186/s12934-019-1148-3.
196 Sauer, C., Ver Loren van Themaat, E., Boender, L.G.M. et al. (2018). Exploring the nonconserved sequence space of synthetic expression modules in Bacillus subtilis. ACS Synthetic Biology 7: 1773–1784. https://doi.org/10.1021/acssynbio.8b00110.
197 Cui, W., Han, L., Suo, F. et al. (2018). Exploitation of Bacillus subtilis as a robust workhorse for production of heterologous proteins and beyond. World Journal of Microbiology and Biotechnology 34: 145. https://doi.org/10.1007/s11274-018-2531-7.
198 Silbersack, J., Jürgen, B., Hecker, M. et al. (2006). An acetoin-regulated expression system of Bacillus subtilis. Applied Microbiology and Biotechnology 73: 895–903. https://doi.org/10.1007/s00253-006-0549-5.
199 Wenzel, M., Müller, A., Siemann-Herzberg, M., and Altenbuchner, J. (2011). Self-inducible Bacillus subtilis expression system for reliable and inexpensive protein production by high-cell-density fermentation. Applied and Environmental Microbiology 77: 6419–6425. https://doi.org/10.1128/AEM.05219-11.
200 Trung, N.T., Hung, N.M., Thuan, N.H. et al. (2019). An auto-inducible phosphate-controlled expression system of Bacillus licheniformis. BMC Biotechnology 19: 3. https://doi.org/10.1186/s12896-018-0490-6.
201 Guan, C., Cui, W., Cheng, J. et al. (2015). Construction and development of an auto-regulatory gene expression system in Bacillus subtilis. Microbial Cell Factories 14: 150. https://doi.org/10.1186/s12934-015-0341-2.
202 Guan, C., Cui, W., Cheng, J. et al. (2016). Development of an efficient autoinducible expression system by promoter engineering in Bacillus subtilis. Microbial Cell Factories 15: 66. https://doi.org/10.1186/s12934-016-0464-0.
203 Caspers, M., Brockmeier, U., Degering, C. et al. (2010). Improvement of Sec-dependent secretion of a heterologous model protein in Bacillus subtilis by saturation mutagenesis of the N-domain of the AmyE signal peptide. Applied Microbiology and Biotechnology 86: 1877–1885. https://doi.org/10.1007/s00253-009-2405-x.

204 Degering, C., Eggert, T., Puls, M. et al. (2010). Optimization of protease secretion in Bacillus subtilis and Bacillus licheniformis by screening of homologous and heterologous signal peptides. Applied and Environmental Microbiology 76: 6370–6376. https://doi.org/10.1128/AEM.01146-10.
205 Fu, G., Liu, J., Li, J. et al. (2018). Systematic screening of optimal signal peptides for secretory production of heterologous proteins in Bacillus subtilis. Journal of Agricultural and Food Chemistry 66: 13141–13151. https://doi.org/10.1021/acs.jafc.8b04183.
206 Heinrich, J., Drewniok, C., Neugebauer, E. et al. (2019). The YoaW signal peptide directs efficient secretion of different heterologous proteins fused to a StrepII-SUMO tag in Bacillus subtilis. Microbial Cell Factories 18: 31. https://doi.org/10.1186/s12934-019-1078-0.
207 Knapp, A., Ripphahn, M., Volkenborn, K. et al. (2017). Activity-independent screening of secreted proteins using split GFP. Journal of Biotechnology 258: 110–116. https://doi.org/10.1016/j.jbiotec.2017.05.024.
208 Liebeton, K., Lengefeld, J., and Eck, J. (2014). The nucleotide composition of the spacer sequence influences the expression yield of heterologously expressed genes in Bacillus subtilis. Journal of Biotechnology 191: 214–220. https://doi.org/10.1016/j.jbiotec.2014.06.027.
209 Skoczinski, P., Volkenborn, K., Fulton, A. et al. (2017). Contribution of single amino acid and codon substitutions to the production and secretion of a lipase by Bacillus subtilis. Microbial Cell Factories 16: 160. https://doi.org/10.1186/s12934-017-0772-z.
210 Alper, H., Fischer, C., Nevoigt, E., and Stephanopoulos, G. (2005). Tuning genetic control through promoter engineering. Proceedings of the National Academy of Sciences of the United States of America 102: 12678–12683. https://doi.org/10.1073/pnas.0504604102.
211 Alper, H. and Stephanopoulos, G. (2007). Global transcription machinery engineering: a new approach for improving cellular phenotype. Metabolic Engineering 9: 258–267. https://doi.org/10.1016/j.ymben.2006.12.002.
212 Cao, H., Villatoro-Hernandez, J., Weme, R.D.O. et al. (2018). Boosting heterologous protein production yield by adjusting global nitrogen and carbon metabolic regulatory networks in Bacillus subtilis. Metabolic Engineering 49: 143–152. https://doi.org/10.1016/j.ymben.2018.08.001.
213 Gong, Z., Nielsen, J., and Zhou, Y.J. (2017). Engineering robustness of microbial cell factories. Biotechnology Journal 12. https://doi.org/10.1002/biot.201700014.
214 Jeschek, M., Gerngross, D., and Panke, S. (2017). Combinatorial pathway optimization for streamlined metabolic engineering. Current Opinion in Biotechnology 47: 142–151. https://doi.org/10.1016/j.copbio.2017.06.014.
215 Ploss, T.N., Reilman, E., Monteferrante, C.G. et al. (2016). Homogeneity and heterogeneity in amylase production by Bacillus subtilis under different growth conditions. Microbial Cell Factories 15: 57. https://doi.org/10.1186/s12934-016-0455-1.
216 de Smit, M.H. and van Duin, J. (1990). Control of prokaryotic translational initiation by mRNA secondary structure. In: Progress in Nucleic Acid Research and Molecular Biology, vol. 38, 1–35. Elsevier. https://doi.org/10.1016/S0079-6603(08)60707-2.
217 Nielsen, J. and Keasling, J.D. (2016). Engineering cellular metabolism. Cell 164: 1185–1197. https://doi.org/10.1016/j.cell.2016.02.004.
218 Ruan, L., Li, L., Zou, D. et al. (2019). Metabolic engineering of Bacillus amyloliquefaciens for enhanced production of S-adenosylmethionine by coupling of an engineered S-adenosylmethionine pathway and the tricarboxylic acid cycle. Biotechnology for Biofuels 12: 211. https://doi.org/10.1186/s13068-019-1554-0.
219 Mahr, R. and Frunzke, J. (2016). Transcription factor-based biosensors in biotechnology: current state and future prospects. Applied Microbiology and Biotechnology 100: 79–90. https://doi.org/10.1007/s00253-015-7090-3.
220 Chou, H.H. and Keasling, J.D. (2013). Programming adaptive control to evolve increased metabolite production. Nature Communications 4: 2595. https://doi.org/10.1038/ncomms3595.
221 Mahr, R., Gätgens, C., Gätgens, J. et al. (2015). Biosensor-driven adaptive laboratory evolution of l-valine production in Corynebacterium glutamicum. Metabolic Engineering 32: 184–194. https://doi.org/10.1016/j.ymben.2015.09.017.
222 Kirchner, M., Schorpp, K., Hadian, K., and Schneider, S. (2017). An in vivo high-throughput screening for riboswitch ligands using a reverse reporter gene system. Scientific Reports 7: 7732. https://doi.org/10.1038/s41598-017-07870-w.
223 Pinto, D., Vecchione, S., Wu, H. et al. (2018). Engineering orthogonal synthetic timer circuits based on extracytoplasmic function σ factors. Nucleic Acids Research 46: 7450–7464. https://doi.org/10.1093/nar/gky614.
224 Pinto, D., Dürr, F., Froriep, F. et al. (2019). Extracytoplasmic function σ factors can be implemented as robust heterologous genetic switches in Bacillus subtilis. iScience 13: 380–390. https://doi.org/10.1016/j.isci.2019.03.001.
225 Mascher, T. (2013). Signaling diversity and evolution of extracytoplasmic function (ECF) σ factors. Current Opinion in Microbiology 16: 148–155. https://doi.org/10.1016/j.mib.2013.02.001.
226 Kaspar, F., Neubauer, P., and Gimpel, M. (2019). Bioactive secondary metabolites from Bacillus subtilis: a comprehensive review. Journal of Natural Products 82: 2038–2053. https://doi.org/10.1021/acs.jnatprod.9b00110.
227 Lakowitz, A., Godard, T., Biedendieck, R., and Krull, R. (2018). Mini review: recombinant production of tailored bio-pharmaceuticals in different Bacillus strains and future perspectives. European Journal of Pharmaceutics and Biopharmaceutics: Official Journal of Arbeitsgemeinschaft fur Pharmazeutische Verfahrenstechnik e.V 126: 27–39. https://doi.org/10.1016/j.ejpb.2017.06.008.
228 Liu, Y., Li, J., Du, G. et al. (2017). Metabolic engineering of Bacillus subtilis fueled by systems biology: recent advances and future directions. Biotechnology Advances 35: 20–30. https://doi.org/10.1016/j.biotechadv.2016.11.003.
229 Su, Y., Liu, C., Fang, H., and Zhang, D. (2020). Bacillus subtilis: a universal cell factory for industry, agriculture, biomaterials and medicine. Microbial Cell Factories 19: 173. https://doi.org/10.1186/s12934-020-01436-8.

230 Wu, X.C., Lee, W., Tran, L., and Wong, S.L. (1991). Engineering a Bacillus subtilis expression-secretion system with a strain deficient in six extracellular proteases. Journal of Bacteriology 173: 4952–4958.
231 Wu, S.-C., Hassan Qureshi, M., and Wong, S.-L. (2002). Secretory production and purification of functional full-length streptavidin from Bacillus subtilis. Protein Expression and Purification 24: 348–356. https://doi.org/10.1006/prep.2001.1582.
232 Zhan, Y., Sheng, B., Wang, H. et al. (2018). Rewiring glycerol metabolism for enhanced production of poly-γ-glutamic acid in Bacillus licheniformis. Biotechnology for Biofuels 11: 306. https://doi.org/10.1186/s13068-018-1311-9.
233 Ogunleye, A., Bhat, A., Irorere, V.U. et al. (2015). Poly-γ-glutamic acid: production, properties and applications. Microbiology (Reading, England) 161: 1–17. https://doi.org/10.1099/mic.0.081448-0.
234 Cai, D., He, P., Lu, X. et al. (2017). A novel approach to improve poly-γ-glutamic acid production by NADPH regeneration in Bacillus licheniformis WX-02. Scientific Reports 7: 43404. https://doi.org/10.1038/srep43404.
235 Zhan, Y., Zhu, C., Sheng, B. et al. (2017). Improvement of glycerol catabolism in Bacillus licheniformis for production of poly-γ-glutamic acid. Applied Microbiology and Biotechnology 101: 7155–7164. https://doi.org/10.1007/s00253-017-8459-2.
236 Cai, D., Chen, Y., He, P. et al. (2018). Enhanced production of poly-γ-glutamic acid by improving ATP supply in metabolically engineered Bacillus licheniformis. Biotechnology and Bioengineering 115: 2541–2553. https://doi.org/10.1002/bit.26774.
237 Abdallah, I.I., Pramastya, H., van Merkerk, R. et al. (2019). Metabolic engineering of Bacillus subtilis toward taxadiene biosynthesis as the first committed step for taxol production. Frontiers in Microbiology 10: 218. https://doi.org/10.3389/fmicb.2019.00218.
238 Cai, D., Zhu, J., Zhu, S. et al. (2019). Metabolic engineering of main transcription factors in carbon, nitrogen, and phosphorus metabolisms for enhanced production of bacitracin in Bacillus licheniformis. ACS Synthetic Biology 8: 866–875. https://doi.org/10.1021/acssynbio.9b00005.
239 Li, Y., Wu, F., Cai, D. et al. (2018). Enhanced production of bacitracin by knocking out of amino acid permease gene yhdG in Bacillus licheniformis DW2. Sheng Wu Gong Cheng Xue Bao = Chinese Journal of Biotechnology 34: 916–927 (in Chinese). https://doi.org/10.13345/j.cjb.170500.
240 Zhu, J., Cai, D., Xu, H. et al. (2018). Enhancement of precursor amino acid supplies for improving bacitracin production by activation of branched chain amino acid transporter BrnQ and deletion of its regulator gene lrp in Bacillus licheniformis. Synthetic and Systems Biotechnology 3: 236–243. https://doi.org/10.1016/j.synbio.2018.10.009.
241 Liu, Z., Yu, W., Nomura, C.T. et al. (2018). Increased flux through the TCA cycle enhances bacitracin production by Bacillus licheniformis DW2. Applied Microbiology and Biotechnology 102: 6935–6946. https://doi.org/10.1007/s00253-018-9133-z.

242 Abatemarco, J., Hill, A., and Alper, H.S. (2013). Expanding the metabolic engineering toolbox with directed evolution. Biotechnology Journal 8: 1397–1410. https://doi.org/10.1002/biot.201300021.
243 Pfeifer, E., Gätgens, C., Polen, T., and Frunzke, J. (2017). Adaptive laboratory evolution of Corynebacterium glutamicum towards higher growth rates on glucose minimal medium. Scientific Reports 7: 16780. https://doi.org/10.1038/s41598-017-17014-9.
244 Lim, H. and Choi, S.-K. (2019). Programmed gRNA removal system for CRISPR-Cas9-mediated multi-round genome editing in Bacillus subtilis. Frontiers in Microbiology 10: 1140. https://doi.org/10.3389/fmicb.2019.01140.

14 Metabolic Engineering of Pseudomonas

Pablo I. Nikel¹ and Víctor de Lorenzo²

¹ The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
² Systems and Synthetic Biology Department, Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain

14.1 Introduction

Since its inception as an independent discipline, metabolic engineering has enabled access to industrially relevant compounds via whole-cell biocatalysis, typically carried out by a small set of microorganisms [1–3]. Yet success stories have been largely confined to genetic and genomic manipulations leading to the overproduction of natively synthesized metabolites [4]. Only a limited number of structurally simple molecules (e.g. the diols 1,4-butanediol and 1,3-propanediol) and a few natural active compounds (e.g. artemisinin) have reached industrial-scale production and commercialization [5, 6]. A major reason for this state of affairs has been the reliance on a restricted number of microbial platforms as cell factories, typically Escherichia coli and Saccharomyces cerevisiae [7, 8]. Over time, the field of metabolic engineering (supported by synthetic biology and systems biology) has adopted alternative hosts, enabling the synthesis of novel and structurally complex products. In this context, soil bacteria have attracted attention as microbial platforms. In their natural niches they are exposed to extreme and changing environmental conditions (e.g. different types of physicochemical stress), and this exposure correlates with remarkable metabolic and physiological robustness. Within this group, the intrinsically high metabolic diversity characteristic of Pseudomonas species continues to provide a solid basis for designing and creating novel (synthetic) pathways for bioproduction [9, 10].

Pseudomonas putida strain mt-2 is one of the reference isolates of this genus and the origin of the widely used strain KT2440 (see below). It was originally isolated in Japan as a degrader of aromatic chemicals [11] and was soon recognized as a plant growth-promoting microorganism and a biocontrol agent [12–14]. Nowadays, P. putida is regarded as an efficient producer for the biotechnological synthesis of a broad portfolio of biopolymers, chemicals, and pharmaceuticals, including natural products. Key to this development is the purposeful combination of three elements:

(i) enabling technologies for genome engineering and genetic manipulation, such as novel synthetic biology tools;
(ii) alternative feedstocks that cannot be processed by other microbial platforms (including man-made waste streams); and
(iii) the ever-growing wealth of omics data that expands our fundamental knowledge of this bacterium (Figure 14.1).

A critical feature underlying the rational exploitation of Pseudomonas in whole-cell biocatalysis is the distinct structure of its native metabolism, which is amenable to targeted manipulation with enabling synthetic biology technologies. P. putida has also become a model bacterium for basic research in its own right, which has made it possible to uncover regulatory processes typical of soil microorganisms. In this chapter, we discuss the current status of, and recent developments in, the use of Pseudomonas species as a chassis of choice for metabolic reprogramming. We also identify the key steps toward exploiting the full biotechnological potential of these bacteria and highlight the areas that will require further development in the near future.

Figure 14.1 Pseudomonas putida as a functional chassis for metabolic engineering. This Gram-negative soil bacterium, originally isolated in Japan in the 1960s, has been domesticated in the laboratory. It has become the host of choice for practical applications where a versatile metabolism and high levels of stress resistance are desirable. Over the years, synthetic biology tools (enabling technologies), together with the knowledge brought about by systems biology (omics data), have facilitated the adoption of P. putida as a platform for metabolic engineering. Besides its unique value as a host for chemical production, P. putida can also be used in environmental applications and has been adopted as a model microorganism for basic research.

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

14.2 Bacteria from the Genus Pseudomonas as Platforms for Metabolic Engineering

To date, no single, naturally isolated bacterial strain seems to possess all the characteristics desirable in an optimal host for bioproduction. The choice of a suitable bacterial platform will largely depend on the intended application. Nevertheless, the first step toward developing a functional chassis is the adoption of a flexible and robust bacterium that is amenable to genetic and metabolic manipulation and naturally endowed with high catalytic performance across a variety of operating conditions. P. putida satisfies many of these requirements because of the environmental conditions prevailing in its natural niches. These conditions include, but are not limited to, near-continuous exposure to environmental contaminants. This exposure is often combined with physicochemical stresses of all sorts (e.g. pH, temperature, low levels of key nutrients) and with competition from other (predatory) microbial species. More broadly, many environmental bacterial species are recognized for their versatile and flexible metabolic lifestyles, which allow them to adapt to rapidly changing conditions (e.g. oxidative stressors, temperature challenges, and osmotic perturbations). P. putida is the best example of this combination of features, as discussed in the next section.

14.2.1 General Characteristics of P. putida

P. putida is a ubiquitous saprophytic rhizosphere bacterium and soil colonizer that belongs to the broad group of fluorescent Pseudomonas species. Strains of industrial relevance include the following:

- P. putida KT2440, the most widely used strain, both in the laboratory and in industry;
- P. putida BIRD1, a plant growth-promoting strain [15];
- P. putida F1, isolated from a polluted creek and established as a bioremediation agent [16];
- P. putida ND6, known for its naphthalene-degrading capabilities [17];
- P. putida S11, isolated from rhizosphere soil and used as a plant growth-promoting strain [18];
- P. putida S16, a nicotine-degrading isolate [19];
- P. putida UW4, another remarkable plant growth-promoting strain [20].

In addition, solvent-tolerant strains of P. putida, such as DOT-T1E [21] or S12 [22], have effective defense mechanisms and display enhanced robustness against organic solvents.

P. putida KT2440 is still the best-characterized saprophytic member of the group. It is a model laboratory strain that has nonetheless retained the ability to survive and thrive in soil environments [12–14]. The strain is a derivative of P. putida mt-2, which was isolated from a soil sample in Japan in 1960 as a degrader of 3-methylbenzoic acid («mt-2» stands for «meta-toluate degrader, isolate 2») [23]. This metabolic property was traced to the catabolic TOL plasmid pWW0, which encodes dedicated metabolic activities that enable P. putida mt-2 to grow on various aromatic substrates (e.g. toluene, m-xylene, and p-xylene) as sole carbon and energy sources [24, 25]. As a consequence, the potential of P. putida and closely related Pseudomonas species for the biodegradation of aromatic compounds was recognized soon after the first isolates were obtained. Once plasmid pWW0 had been eliminated from strain mt-2, the plasmid-free variant was designated strain KT2440 [26]. It soon became the subject of a series of genetic and genomic studies in the laboratory. Although P. putida KT2440 shares coding regions with pathogenic Pseudomonas strains (e.g. Pseudomonas aeruginosa), it lacks typical virulence factors (e.g. type III secretion system components or exotoxin A). The taxonomic credentials of P. putida KT2440 were described when the genome of this strain was sequenced [13, 27], although its taxonomic classification may be revisited in the future. Regardless, an inherent feature of strain KT2440 is its remarkable resistance to oxidative stress [28–32], which is associated with its capacity to degrade compounds that are themselves a source of stress. Finally, an important practical feature of this strain was its recognition as a safe host for recombinant
DNA early on, requiring only safety level 1 for its manipulation [33, 34]. This quality has eased its adoption both as a favorite recipient of DNA in a large number of research laboratories and as a safe and straightforward platform for industrial biocatalysis. In silico analysis of the 6 181 873-bp genome sequence of P. putida KT2440 [35, 36] confirmed the absence of any conspicuous virulence factor among the 5592 coding sequences in the genome. The core genome and pangenome of P. putida have likewise been identified [37]. The latest genome annotation of strain KT2440 produced a list of novel biochemical functions (e.g. the assimilation of alternative, nontrivial carbon and nitrogen sources) that had not previously been identified in P. putida. This annotation resulted in a total of 1256 degradation reactions, including newly identified catabolic pathways for 32 carbon sources, 28 nitrogen sources, 29 phosphorus sources, and 3 substrates that can be used simultaneously as carbon and nitrogen sources. Genome-wide exploration of the metabolic capabilities of strain KT2440 led to the development of genome-scale metabolic models [36, 38–41]. The latest update, by Nogales et al. [42], is the most comprehensive metabolic reconstruction built to date, and in it the in silico metabolic potential of P. putida was fully evaluated. Interestingly, this study mapped a conserved functional metabolic core common to 82 sequenced P. putida isolates. Building on this reconstruction, large-scale kinetic metabolic models were recently reported to study, in silico, the effect of single-gene knockouts and of ATP availability [43]. As these studies suggest, and as explained in detail in the next section, the rich metabolism of P. putida KT2440 is wired to a robust core biochemistry with a very distinct architecture.

14.2.2 Substrate Utilization and the Unique Core Metabolism of P. putida

P. putida can use the wide variety of carbon and nitrogen sources indicated above. Rapid growth and low nutrient demand are further advantages of adopting it as a chassis [44–46]. These properties arise from (or are intimately connected to) a robust central carbon metabolism [47–49]. A logical question in this regard is how sugars fuel central carbon metabolism in P. putida. Many pseudomonads are endowed with peripheral pathways for the oxidation of sugars (yielding gluconate and 2-ketogluconate), in addition to the canonical, phosphorylation-dependent routes for hexose assimilation present in other microorganisms [50–52]. Interestingly, only one phosphoenolpyruvate-dependent sugar transport system (phosphotransferase system [PTS]) is present in P. putida (FruBKA, PP_0793–PP_0795), and it is responsible for fructose uptake and phosphorylation [53, 54]. Glucose does not play the same central role as a substrate for Pseudomonas as it does for E. coli, Bacillus subtilis, or lactic acid bacteria [55]. In fact, the preferred carbon sources for Pseudomonas are certain organic acids (e.g. intermediates of the tricarboxylic acid cycle) or amino acids, rather than hexoses. Not surprisingly, pseudomonads (which arguably have broader global distributions than E. coli and other well-studied enterobacteria) have succeeded in conquering diverse environmental niches by deploying metabolic strategies nearly opposite to the classical
carbon catabolite repression phenomena. P. putida seems to use a carbon catabolite repression strategy termed reverse catabolite repression [56], because its order of preferred substrates is practically the reverse of that observed in E. coli and related species. Pseudomonads thus prefer organic acids over sugars and may or may not select preferred substrates so as to optimize growth rates. They also do not allocate intracellular resources in a way that results in overflow metabolism, such as the acetate secretion typically observed in E. coli growing in the presence of excess glucose [57]. In the presence of succinate or citrate, glucose metabolism is inhibited until these compounds are consumed. Glucose uptake in P. putida is not mediated by a PTS. Instead, the sugar enters the periplasmic space through the OprB-1 porin. Glucose can then be transported directly into the cell, or it can be converted in the periplasm to gluconate or 2-ketogluconate, which are internalized by specific transporters [55]. In contrast, and as indicated above, fructose is taken up by a PTS transporter [58] through a mechanism subject to exquisite metabolic and transcriptional regulation [54]. All other sugars metabolized by pseudomonads are transported through PTS-independent systems. P. putida, like several other Pseudomonas strains, cannot grow on C5 sugars such as xylose or arabinose. Only selected isolates (such as Pseudomonas taiwanensis) can assimilate xylose, via the Weimberg pathway [59]. To broaden the spectrum of sugar substrates available for bioprocesses involving P. putida, different xylose catabolic pathways (i.e. the isomerase route and the Weimberg and Dahms pathways) have recently been implemented in strain KT2440 for the production of added-value compounds [60]. Non-sugar substrates such as glycerol, organic acids, fatty acids, and amino acids are assimilated via specific transporters.
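The substrate hierarchy described above can be illustrated with a deliberately simple toy simulation. It is a sketch under stated assumptions, not a model taken from the cited studies. The preference ranks, the fixed uptake amount per time step, and the names `consume_hierarchically`, `PUTIDA_RANKS`, and `ECOLI_LIKE_RANKS` are all hypothetical. The sketch also ignores the partial co-utilization and multi-level regulation that real cells display. In each step, only the substrates that share the best available rank are consumed, so glucose is used only after succinate (or citrate) is exhausted.

```python
from typing import Dict, List, Tuple

# Hypothetical preference ranks (lower = consumed first). Placing organic acids ahead of
# glucose mimics the reverse catabolite repression described for P. putida.
PUTIDA_RANKS: Dict[str, int] = {"succinate": 0, "citrate": 0, "glucose": 1}

# Hypothetical ranks with the classical E. coli-like order (glucose first), for contrast.
ECOLI_LIKE_RANKS: Dict[str, int] = {"glucose": 0, "succinate": 1, "citrate": 1}


def consume_hierarchically(
    pools: Dict[str, float],
    ranks: Dict[str, int],
    uptake_per_step: float,
    max_steps: int = 1000,
) -> List[Tuple[int, List[str]]]:
    """Return (step, substrates consumed in that step) until every pool is empty.

    In each step, only the substrates that share the best (lowest) rank among those still
    present are consumed, each by a fixed amount. This strict hierarchy is a toy
    assumption, not measured uptake kinetics.
    """
    if uptake_per_step <= 0:
        raise ValueError("uptake_per_step must be positive")
    remaining_pools = dict(pools)  # copy so the caller's dictionary is left untouched
    history: List[Tuple[int, List[str]]] = []
    for step in range(max_steps):
        present = [s for s, amount in remaining_pools.items() if amount > 0]
        if not present:
            break
        best_rank = min(ranks[s] for s in present)
        active = sorted(s for s in present if ranks[s] == best_rank)
        for s in active:
            remaining_pools[s] = max(0.0, remaining_pools[s] - uptake_per_step)
        history.append((step, active))
    return history


if __name__ == "__main__":
    mixture = {"glucose": 2.0, "succinate": 2.0}
    for label, ranks in (("P. putida-like", PUTIDA_RANKS), ("E. coli-like", ECOLI_LIKE_RANKS)):
        print(label, consume_hierarchically(mixture, ranks, uptake_per_step=1.0))
```

Swapping `PUTIDA_RANKS` for `ECOLI_LIKE_RANKS` reverses the order of consumption, which is the contrast the text draws between P. putida and E. coli. Substrates that share a rank, such as succinate and citrate here, are consumed together.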
Glycerol, a byproduct of the biodiesel industry and an attractive industrial substrate [61], enters through an outer-membrane porin (OprB-1, PP_1019) and is then transported into the cytoplasm by a facilitator (GlpF, PP_1076). Its catabolism proceeds via a kinase (GlpK, PP_1075), which phosphorylates glycerol to glycerol-3-phosphate. This intermediate is oxidized to dihydroxyacetone phosphate (DHAP) by a membrane-bound dehydrogenase (GlpD, PP_1073) [62]. Glycerol metabolism has been reported to activate both glycolytic and gluconeogenic pathways in P. putida [62] and, intriguingly, this substrate consistently causes a protracted lag phase [63]. Deleting the gene encoding the GlpR regulator (PP_1074) in strain KT2440 relieves the repression of several genes involved in glycerol metabolism, thereby shortening the lag phase [64]. As indicated above, glucose is processed either via oxidation to gluconate or 2-ketogluconate (each oxidized product follows a dedicated catabolic pathway) or via direct phosphorylation by glucokinase (PP_1011) [65]. The gluconate and 2-ketogluconate excreted into the medium can be reimported into the cell upon glucose depletion. After phosphorylation, these intermediates converge at the 6-phosphogluconate node and feed the Entner–Doudoroff route (composed of Edd, 6-phosphogluconate dehydratase, PP_1010, and Eda, 2-keto-3-deoxy-6-phosphogluconate aldolase, PP_1024), yielding pyruvate and glyceraldehyde 3-phosphate. Irrespective of the first steps in sugar processing (either oxidation or phosphorylation, although the former seems to be preferred over the latter in strain KT2440), these pathways converge at the key intermediate 6-phosphogluconate, a metabolite that serves as the precursor of the Entner–Doudoroff pathway and sits at the interface between the oxidative and nonoxidative branches of the pentose phosphate route [50].

14 Metabolic Engineering of Pseudomonas

As reviewed by Udaondo et al. [55], the regulation of catabolic pathways for carbohydrate processing is orchestrated by a versatile set of transcriptional regulators in several Pseudomonas species [66–70]. A remarkable feature of the metabolic architecture of P. putida KT2440 is the EDEMP cycle, formed by the combined activity of enzymes from the Entner–Doudoroff pathway, the pentose phosphate pathway, and the (incomplete) Embden–Meyerhof–Parnas route (Figure 14.2a). During growth on hexoses, this metabolic architecture endows P. putida with high NADPH regeneration rates through partial recycling of triose phosphates [72], a property that is in turn modulated by the presence and extent of oxidative stress. Furthermore, the EDEMP cycle enables different ATP and NADPH formation rates depending on the degree of recycling, which reflects the specific stoichiometries of the Entner–Doudoroff and Embden–Meyerhof–Parnas pathways [73]. The former yields half as much ATP per molecule of glucose as the latter (one versus two), but produces one NADH and one NADPH equivalent, whereas the latter yields two NADH.
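The stoichiometric contrast between the ED and EMP routes, and the way triose-phosphate recycling shifts it, can be made concrete with a simple balance. The Python sketch below is a deliberate simplification under stated assumptions, not a published model: all glucose is phosphorylated by an ATP-dependent glucokinase (the periplasmic gluconate and 2-ketogluconate routes are ignored), glucose-6-phosphate dehydrogenase is NADP+-dependent and glyceraldehyde-3-phosphate dehydrogenase is NAD+-dependent, pyruvate is the end product, and decarboxylation through the oxidative pentose phosphate branch is ignored. The recycling parameter `h` and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetYields:
    """Net cofactor and product yields per glucose taken up (mol/mol)."""
    atp: float
    nadh: float
    nadph: float
    pyruvate: float


def emp_yields() -> NetYields:
    # Linear EMP glycolysis: 2 ATP invested (glucokinase and
    # 6-phosphofructo-1-kinase), 4 ATP recovered (2 x PGK, 2 x PK),
    # and 2 NADH from an NAD+-dependent GAPDH.
    return NetYields(atp=2.0, nadh=2.0, nadph=0.0, pyruvate=2.0)


def edemp_yields(h: float) -> NetYields:
    """ED catabolism with triose-phosphate recycling (simplified EDEMP cycle).

    h (hypothetical) is the mol of hexose phosphate regenerated from
    glyceraldehyde 3-phosphate (2 GA3P -> FBP -> F6P -> G6P) per mol of
    glucose taken up; h = 0 is the plain ED pathway.
    """
    if not 0.0 <= h <= 1.0:
        raise ValueError("h must be between 0 and 1")
    g6p_oxidized = 1.0 + h                       # uptake plus recycled hexose phosphate
    nadph = g6p_oxidized                         # one NADPH per G6P via G6PDH
    ga3p_to_pyruvate = g6p_oxidized - 2.0 * h    # one GA3P per ED turn, two consumed per recycled hexose
    nadh = ga3p_to_pyruvate                      # NAD+-dependent GAPDH
    atp = -1.0 + 2.0 * ga3p_to_pyruvate          # glucokinase cost; PGK + PK gain per GA3P
    pyruvate = g6p_oxidized + ga3p_to_pyruvate   # one from KDPG aldolase per turn, one per GA3P
    return NetYields(atp, nadh, nadph, pyruvate)


print("EMP:", emp_yields())
for h in (0.0, 0.25, 0.5):
    print(f"EDEMP, h = {h}:", edemp_yields(h))
```

Under these assumptions NADPH formation rises as 1 + h, i.e. linearly with the recycling flux, while NADH and ATP fall as 1 − h and 1 − 2h; the carbon balance of two pyruvate per glucose is unaffected.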
Assuming (i) an NAD+-dependent glyceraldehyde-3-phosphate dehydrogenase and an NADP+-dependent glucose-6-phosphate dehydrogenase under physiological conditions, (ii) pyruvate as the main triose end product (further processed to acetyl-coenzyme A), and (iii) negligible flux through the peripheral loop leading to 2-ketogluconate, the net NADPH formation flux stemming from the EDEMP cycle increases linearly with the overall recycling flux from glyceraldehyde 3-phosphate upward [47]. Although the EDEMP cycle fosters NADPH regeneration, ATP formation via a linear glycolysis (which is interrupted in strain KT2440 owing to the lack of 6-phosphofructo-1-kinase activity) would be desirable for applications in which the energy status of the cells is key for whole-cell biocatalysis. As illustrated in Figure 14.2b, extensive refactoring efforts enabled the functional replacement of this glycolytic cycle by a synthetic catabolic route [71] based on the elements of the archetypal Embden–Meyerhof–Parnas pathway [52]. Strains carrying the synthetic glycolysis were further engineered to synthesize carotenoids from glucose, and the implanted route supported a higher product content on biomass and a higher yield on sugar than the native hexose catabolism. This example of metabolic rewiring shows how the expansion of the synthetic biology toolbox enables complex engineering efforts that would have been impossible a few years ago, as discussed in the next section.

14.2.3 Synthetic Biology Tools for Metabolic Engineering of P. putida

Metabolic engineering is undergoing a transition from a mostly trial-and-error exercise to an authentic branch of rational engineering, thanks to the tools and approaches brought about by contemporary synthetic biology. Such enabling technologies are essential for designing synthetic constructs and implementing them in cell factories, and they have enabled the adoption of Pseudomonas as a platform for biochemical reprogramming. Dedicated tools for gene expression have been developed for P. putida over the years [74], and Pseudomonas species have also served as a treasure trove of functional parts (e.g. specific transcriptional regulators and their cognate promoters) that have been harnessed for the construction of novel expression systems. Table 14.1 compiles the most popular gene expression systems. Building on the wealth of catabolic pathways in P. putida, many transcriptional regulators that respond to complex molecules (e.g. regulatory proteins that control aromatic degradation pathways) have been used to design expression systems that can be considered orthogonal when implanted in a surrogate host. In other cases, expression systems originally developed for E. coli have been successfully transferred to P. putida [93], including specific biosensors for monitoring cell physiology [94].

Figure 14.2 Natural and engineered features of Pseudomonas putida useful for metabolic engineering. (a) Glucose catabolism occurs mainly through the Entner–Doudoroff (ED) pathway, and part of the triose phosphates generated is recycled to hexose phosphates by the EDEMP cycle (shaded in blue), encompassing activities from the ED (green), Embden–Meyerhof–Parnas (EMP, blue), and pentose phosphate (PP, red) routes. G6P, glucose-6-phosphate; 6PG, 6-phosphogluconate; KDPG, 2-keto-3-deoxy-6-phosphogluconate; GA3P, glyceraldehyde-3-phosphate; FBP, fructose-1,6-bisphosphate; DHAP, dihydroxyacetone phosphate; F6P, fructose-6-phosphate; and Pyr, pyruvate. Source: Based on Sánchez-Pascuala et al. [52]. (b) Functional replacement of the core sugar metabolism of P. putida KT2440 by a synthetic glycolysis composed of the minimal EMP pathway elements [71]. Source: Based on Sánchez-Pascuala et al. [71]. Note that the organization of the synthetic glycolytic modules (GlucoBrick I and II) is indicated by highlighting the EMP activities encoded by each gene [52].

The lack of well-established standards, which used to afflict not only engineering efforts with P. putida but essentially the entire field of synthetic biology, has been tackled systematically only recently. The Standard European Vector Architecture (SEVA; a user-friendly, open-source platform in constant expansion [95–97]) has been a milestone in this respect and has greatly helped to standardize the use of genetic parts for the metabolic engineering of P. putida (Figure 14.3a). Besides many of the gene expression systems listed in Table 14.1, other functionalities have been formatted and added to the SEVA collection, including Tn5- and Tn7-based transposon vectors [101–104], and the list of novel parts, devices, and options for assembling complex constructs continues to expand, including strategies for controlling plasmid replication and maintenance [105]. Calibrated promoters and other standard parts are likewise available for targeted manipulations [106, 107].

Table 14.1 Pseudomonas species as the source of genetic parts applied for regulated gene expression in metabolic engineering.

| Expression system (transcriptional regulator/promoter) | Inducer(a) | Source of regulatory elements | Reference(s) |
|---|---|---|---|
| XylS/Pm | 3-Methylbenzoic acid | P. putida mt-2 | [75–77] |
| XylR/Pu | 3-Methylbenzyl alcohol | P. putida mt-2 | [78–80] |
| AlkS/PalkB | Short-chain alkanes | P. putida GPo1 | [81, 82] |
| NahR/Psal | Salicylic acid | P. putida NCIB 9816-4 | [83–85] |
| TodST/PtodX | 4-Chloroaniline | P. putida T-57 | [86] |
| CymR/Pcym | 4-Isopropylbenzoic acid | P. putida F1 | [87] |
| ClcR/PclcA | 3- or 4-Chlorocatechol | P. putida PRS2015 | [88] |
| HpdR/PhpdH | 3-Hydroxypropionic acid | P. putida KT2440 | [89] |
| MmsR/PmmsA | 3-Hydroxypropionic acid | P. putida KT2440 | [89] |
| MekR/PmekA | Methyl-ethyl ketone | P. veronii | [90, 91] |
| MtlR/PmtlE | Mannitol | Pseudomonas fluorescens | [92] |

a) In some cases, more than one inducer can activate the expression system. Only the typical inducers are listed (usually selected because they are the cheapest and/or the most effective in triggering the activity of the transcriptional regulator).

Recent breakthroughs have come from implementing technologies based on clustered regularly interspaced short palindromic repeats (CRISPR) and the Cas9 protein [108]. These advances resulted in novel tools that facilitate the manipulation of P. putida [100, 109–111], including counterselection strategies for genome engineering (Figure 14.3b) and specific downregulation of gene expression (i.e. CRISPR interference) using catalytically inactive versions of the Cas9 protein [112–115]. Genome engineering strategies are being constantly upgraded as well [99, 116], in some cases to levels comparable to those available for E. coli. The most recent innovation is the adaptation of multiplex automated genome engineering (MAGE)-like [117] strategies for high-efficiency multiple genomic site engineering (HEMSE) of P. putida [118]. This effort was made possible by the prior identification of Pseudomonas-specific single-stranded DNA recombinases [119], which brought replication-fork invasion by mutagenic DNA to workable levels in the absence of phenotypic selection. If the desired mutations confer no selectable phenotype, DNA recombineering can be combined with CRISPR/Cas9 counterselection of wild-type sequences [100]. CRISPR/Cas-based gene editing techniques, and their combination with other approaches, are expected to extend to genetically intractable microbial species [120] and will be instrumental in harnessing the metabolic potential of nontraditional Pseudomonas strains [121].
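CRISPR/Cas9 counterselection depends on an sgRNA whose spacer matches the target locus immediately 5′ of a protospacer-adjacent motif (PAM). As a minimal sketch, assuming the widely used Streptococcus pyogenes Cas9 (20-nt spacer, NGG PAM), candidate target sites on both strands of a locus can be enumerated as below. The function names and the example sequence are hypothetical, and real guide design would also screen the whole genome for off-target matches, which this sketch does not do.

```python
from typing import Iterator, NamedTuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.translate(_COMPLEMENT)[::-1]


class Protospacer(NamedTuple):
    strand: str   # "+" if the spacer reads on the input strand, "-" otherwise
    start: int    # 0-based start of the spacer footprint on the input strand
    spacer: str   # 5'->3' sequence to encode in the sgRNA
    pam: str      # the NGG motif immediately 3' of the spacer


def _scan(seq: str, spacer_len: int) -> Iterator[tuple[int, str, str]]:
    # Yield (offset, spacer, pam) for every spacer followed by an NGG PAM.
    for i in range(len(seq) - spacer_len - 2):
        pam = seq[i + spacer_len : i + spacer_len + 3]
        if pam[1:] == "GG":
            yield i, seq[i : i + spacer_len], pam


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list[Protospacer]:
    """List candidate SpCas9 target sites (spacer + NGG) on both strands."""
    seq = seq.upper()
    n = len(seq)
    sites = [Protospacer("+", i, sp, pam) for i, sp, pam in _scan(seq, spacer_len)]
    # Offset j on the reverse complement maps back to seq[n - j - spacer_len : n - j].
    for j, sp, pam in _scan(reverse_complement(seq), spacer_len):
        sites.append(Protospacer("-", n - j - spacer_len, sp, pam))
    return sites


# Hypothetical target locus: a 20-nt spacer followed by a TGG PAM.
locus = "AAAAA" + "ACGTACGTACGTACGTACGT" + "TGG" + "AAAAA"
for site in find_spcas9_sites(locus):
    print(site)  # Protospacer(strand='+', start=5, spacer='ACGTACGTACGTACGTACGT', pam='TGG')
```

Scanning the reverse complement with the same routine keeps the PAM rule in one place; only the coordinate mapping differs between strands.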


Figure 14.3 (a) General structure of the Standard European Vector Architecture (SEVA). The key elements common to all SEVA plasmids [96] are indicated with different colors, including the antibiotic resistance determinant (AbR). PS1–PS2 identify conserved oligonucleotides that can be used for sequencing individual vector modules. Source: Modified from Silva-Rocha et al. [96], 2013, Oxford University Press. (b) Different approaches for genome engineering of Pseudomonas species using CRISPR/Cas9 as a counterselection system. Plasmid-based CRISPR/Cas9 expression is illustrated with vector pS448⋅CsR as an example [98]. This plasmid carries a streptomycin-resistance gene (SmR), cas9 under the control of the inducible XylS/Pm expression system, and a customizable cassette including a specific synthetic guide RNA (sgRNA) constitutively expressed from the PEM7 promoter. Once a co-integrate strain is constructed [99], genome engineering of Pseudomonas would include (1) Cas9 production upon induction of the system with 3-methylbenzoate (3-mBz) and introduction of a double-stranded (ds) break in the DNA at the target locus, which is lethal for wild-type cells. CRISPR/Cas9-assisted counterselection can also be combined with DNA recombineering [100]. A linear DNA fragment (e.g. dsDNA obtained via PCR (2.1) or a chemically synthesized single-stranded (ss) DNA fragment, both containing homology arms (HA) to the target locus) is used to introduce genetic modifications (indicated with a red star) at any given target site. If dsDNA is used for this purpose (2.2), one of the two DNA strands is digested by an exonuclease (e.g. Beta/Exo from the λ Red system), leaving an ssDNA fragment protected from degradation by an ssDNA-binding protein (e.g. Beta, SSR, or RecT). The two flanking HAs mediate the insertion of the mutagenic ssDNA fragment into the chromosome via homologous recombination (2.3) or as an Okazaki fragment during chromosome replication (2.4). Finally, CRISPR/Cas9 counterselection of the wild-type sequence allows for the allelic exchange of a homologous DNA fragment containing the desired modifications directly from a plasmid via a double-crossover mechanism (3). Sources: Wirth and Nikel [98]; Martínez-García and de Lorenzo [99]; Aparicio et al. [100]; Woolston et al. [2].


Besides plasmid-based gene expression and gene manipulation, tools for assembling and handling large DNA segments have also been incorporated into the toolbox [122, 123], together with transposons for random or site-directed insertion of DNA cargoes [101–103]. Moreover, the critical issue of leaky expression in Pseudomonas has recently been addressed by implementing a simple digitalizer module (based on the activity of a translation-inhibitory small RNA) that completely suppresses the basal output of strong promoters [124]. Coupling gene expression with specific degradation of the resulting protein products by orthogonal proteases has likewise been implemented as a strategy to reduce background levels of leaky expression [125, 126]. RNA levels have also been targeted to adjust gene expression in Pseudomonas [127, 128], and 16S rDNA loci can be used as landing pads for gene integration [129]. As the number and scope of synthetic biology approaches and tools continue to expand, several examples of the use of Pseudomonas as hosts for metabolic engineering have been described in the literature. Some of the most relevant cases are presented in the next section.

14.3 Examples of Metabolic Engineering of P. putida and Other Pseudomonas Species

14.3.1 Toward a Reference Chassis: Genome-Reduced Variants of Pseudomonas

The targeted elimination of metabolic and physiological functions that are deemed unnecessary (or even detrimental) for practical applications is one of the key steps in the construction of robust cell factories [9]. A major energy-wasting process is flagellar motion, which many bacteria use to move toward substrates needed for growth (including oxygen), a process that appears redundant in stirred fermentation tanks. Martínez-García et al. [130] removed a large stretch of DNA (∼70 kb, corresponding to ∼1.1% of the bacterial genome) from the chromosome of P. putida KT2440, including 69 structural and regulatory genes for the assembly and export of flagella as well as several chemotaxis functions. This deletion resulted in a diverse set of physiological changes in the reduced-genome strain. Most obviously, cell motility was suppressed, resulting in high sedimentation rates, and the loss of the outer membrane-associated flagella also decreased surface hydrophobicity. Both the lag phase and the maximum growth rate of this strain were significantly altered during growth on various carbon sources. These physiological features were accompanied by a ∼1.3-fold increase in the ATP/ADP ratio (an indicator of the energy status of the cell). The nonflagellated mutant also had a ∼1.2-fold higher NADPH/NADP+ ratio than the wild-type strain, while the NADH/NAD+ ratio (an indicator of the catabolic redox state) remained essentially constant. The increased availability of NADPH not only enhanced the anabolic capacity of the cell for biosynthesis but also increased its tolerance toward oxidative stress and UV exposure [131], traits that are useful for industrially relevant applications.
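The ATP/ADP ratio reported for the nonflagellated strain is related to, but distinct from, the adenylate energy charge defined by Atkinson, EC = ([ATP] + ½[ADP]) / ([ATP] + [ADP] + [AMP]). The Python sketch below uses hypothetical nucleotide pools, not data from the study, to show that a 1.3-fold rise in the ATP/ADP ratio can translate into a much smaller relative change in energy charge; the function and variable names are illustrative only.

```python
def adenylate_energy_charge(atp: float, adp: float, amp: float) -> float:
    """Atkinson's adenylate energy charge: (ATP + 0.5 * ADP) / (ATP + ADP + AMP)."""
    total = atp + adp + amp
    if total <= 0:
        raise ValueError("the total adenylate pool must be positive")
    return (atp + 0.5 * adp) / total


# Hypothetical pool sizes in arbitrary units, not measurements from the study.
reference = {"atp": 3.0, "adp": 1.0, "amp": 0.5}
# Same ADP and AMP pools, ATP raised 1.3-fold, i.e. a 1.3-fold higher ATP/ADP ratio.
mutant = {"atp": 3.9, "adp": 1.0, "amp": 0.5}

for label, pool in (("reference", reference), ("mutant", mutant)):
    ratio = pool["atp"] / pool["adp"]
    charge = adenylate_energy_charge(**pool)
    print(f"{label}: ATP/ADP = {ratio:.2f}, energy charge = {charge:.3f}")
```

With these illustrative numbers the ratio rises from 3.0 to 3.9, while the energy charge moves only from about 0.78 to about 0.81.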


An essential property of a cell factory in large-scale production scenarios is genetic stability. A significant source of instability is viral DNA and transposable elements (sometimes accounting for up to 20% of the genome), which usually lie dormant within the bacterial chromosome [132]. Although subject to continuous decay, prophages often retain some of their gene functions if these provide a benefit to the host. While these mobile genetic elements (and similar features) might contain beneficial functions under certain (selective) conditions, they also represent a disruptive mutagenic force by randomly inserting into gene clusters. In P. putida KT2440, 2.6% of the genome was found to encode phage-related functions, distributed in four prophage elements, each containing up to 72 ORFs [133]. None of the prophages reinitiates a lytic cycle, although they can be excised under specific environmental conditions. Deletion of all prophages enhanced tolerance to diverse stress factors, and a prophage-less P. putida variant showed increased survival in stationary phase and lower sensitivity to UV irradiation and chemical mutagens. Moreover, removal of the proviral load increased fitness across a set of different culture media [133]. Analysis of the annotated genome of P. putida KT2440 further revealed the presence of 54 transposable elements, one of which (Tn4652) has been reported to become active under carbon starvation [134]. Martínez-García et al. [133] removed all of these mobile genetic elements, as well as DNA-degrading systems of P. putida KT2440 that are likely to interfere with the introduction of foreign DNA during genetic engineering. The deleted regions comprise a total of 299 genes, corresponding to ∼4.3% of the genome. These sequential deletions resulted in the cell factory strain P. putida EM42 [130].
This strain was subjected to extensive physiological and genetic characterization. As described for the flagella-less strain, lag phases were shorter on various carbon sources, likely owing to an increased NADPH/NADP+ ratio that also enabled the cells to overcome oxidative stress. The reduced-genome variant has a higher biomass yield than its parental strain, suggesting that the multiple deletions reduced maintenance requirements [135]. This experimental observation is in line with an elevated energy charge in strain EM42 (due to the absence of flagella, as described above), as well as with an increased availability of the central metabolite acetyl-coenzyme A, indicating that more resources are available for biomass synthesis. The surplus of ATP also allows strain EM42 to survive and even grow at temperatures (42 °C) that are usually lethal for wild-type P. putida KT2440 [136]. More recently, similar approaches in other Pseudomonas species led to reduced-genome variants of P. taiwanensis VLB120 [137], which exhibited enhanced catalytic properties toward the production of aromatic compounds, and to reduced-genome derivatives of the plant growth-promoting species Pseudomonas chlororaphis [138].

14.3.2 Expansion of the Carbon Substrate Range

One of the unique features of P. putida is its versatility in terms of carbon and nitrogen substrates. This trait has been explored in silico for 57 individual carbon sources, and the growth parameters of strain KT2440 have been estimated in each case, with experimental validation in batch cultures for six substrates (i.e. acetate, glycerol, citrate, succinate, malate, and methanol) [139]. Glycerol was the carbon source that promoted the highest biomass yield on substrate (0.61 C-mol C-mol⁻¹), with a very good fit between the in silico prediction and the experimental validation. Not surprisingly, given its biotechnological value as a byproduct of the biodiesel industry, glycerol consumption by Pseudomonas has been explored from biochemical and genetic points of view [62, 64]. This substrate has also been used in bioprocesses for the cost-efficient production of polyhydroxyalkanoates [140, 141]. From a broader perspective, the nutritional landscape of P. putida, which matches its environmental niches in nature (including the plant rhizosphere and polluted soils), has pushed metabolic specialization toward the use of organic acids, amino acids, and aromatic substrates. Considering that glucose and xylose are the two most abundant building blocks of the polysaccharides in plant cell walls (i.e. cellulose and hemicellulose), both are attractive substrates for bioprocesses [142]. As indicated before, P. putida KT2440 can grow on hexoses (glucose and fructose), but this strain cannot naturally metabolize disaccharides or C5 sugars [59], which limits the number and nature of sugars that can be used in bioprocesses. Dvořák and de Lorenzo [143] expanded the substrate range of the genome-reduced strain EM42 to include the disaccharide cellobiose and xylose by adding a β-glucosidase from Thermobifida fusca (for intracellular hydrolysis of cellobiose) and three activities from E. coli (a xylose transporter, a xylose isomerase, and a xylulokinase), while blocking the oxidative branch of sugar utilization by eliminating glucose dehydrogenase.
These manipulations enabled the co-utilization and complete consumption of both cellobiose and xylose by the engineered strain in a minimal medium. Substrates beyond individual sugars are likewise relevant for industrial bioproduction. Lignocellulose consists of cellulose (25–55%), hemicellulose (11–50%), and lignin (10–40%) [144]. C5 and C6 sugars can be dehydrated to furfural and 5-(hydroxymethyl)furfural, respectively, during the extraction process. Both aldehydes are major inhibitors of microbial conversion processes, but some microorganisms are known to convert them into their less toxic alcohol counterparts, furfuryl alcohol and 5-(hydroxymethyl)furfuryl alcohol. Following this rationale, P. putida KT2440 was engineered to use both compounds as sole carbon and energy sources via genomic integration of the hmf gene cluster, encoding the eight enzymes from Burkholderia phytofirmans that transform the two aldehydes into the common metabolite 2-furoic acid [145]. Lignocellulose-derived substrates have recently been used for the production of medium-chain-length biopolymers by engineered P. putida [146].

14.3.3 Engineering the Oxygen-Dependent Lifestyle of P. putida

Microbial activity in soil is known to be spatially heterogeneous, and individual species form hotspots that contribute differentially to biogeochemical processes. Spatial organization of bacterial species contributes to the persistence of anoxic areas in soils, a hypothesis recently confirmed experimentally with a soil-like system that includes both the obligate aerobe P. putida and the facultative anaerobe Pseudomonas veronii [147]. The strictly aerobic, highly oxidative metabolism of P. putida, however, hampers its application under micro-oxic and anoxic conditions and excludes the use and production of oxygen-sensitive proteins and metabolites in this species. It also complicates industrial-scale applications, where oxygen gradients are essentially unavoidable. On the one hand, oxic bioprocesses increase capital costs because their scale-up is limited by oxygen transfer rates; as a consequence, both the maximum and average scale of oxic bioreactors are smaller than those of anoxic tanks [148]. On the other hand, while oxygen-dependent bioproduction inevitably entails substrate loss in the form of CO2, anoxic processes can achieve carbon yields close to the theoretical maximum. Engineering an anoxic P. putida chassis has therefore been the subject of intense research over the last few years, starting with a quantitative study of the responses of P. putida to different agitation rates and oxygen availability in shaken-flask cultures [149].

Three types of engineering approaches have been implemented to this end: engineering a mixed fermentation, engineering nitrate-dependent respiration, and using bioelectrochemical systems. In the first approach, the genes encoding acetate kinase from E. coli and the ethanol biosynthesis pathway from Zymomonas mobilis were introduced into strain KT2440, extending bacterial survival in the absence of oxygen [150]. This chassis was further engineered with two haloalkane dehalogenases from Pseudomonas pavonaceae [151], which conferred the ability to degrade 1,3-dichloroprop-1-ene (a recalcitrant xenobiotic that neither wild-type P. putida nor P. pavonaceae can degrade under anoxic conditions). In a different study, Steen et al. [152] constructed two cosmids encoding all the structural, maturation-related, and regulatory genes needed for the activity of nitrate reductase and of nitrite and nitric oxide reductases from P. aeruginosa in order to establish nitrate-dependent respiration in strain KT2440. The resulting engineered strains efficiently reduced nitrate or nitrite, which in turn sustained an extended anoxic lifespan. Schmitz et al. [153] exploited bioelectrochemical systems to engineer a P. putida KT2440 derivative able to synthesize phenazine redox mediators. Formation of redox-active pyocyanin allowed for partial redox balancing with an electrode under micro-oxic conditions, and the biomass yield on glucose of the engineered P. putida was doubled compared with the wild-type strain. Lai et al. [154] used [Fe(CN)₆]³⁻ or [Co(2,2′-bipyridine)₃]³⁺ as redox mediators when culturing P. putida F1 in the anodic compartment of a bioelectrochemical system run under anoxic conditions with glucose as the carbon source. Under these conditions, most of the glucose was converted into 2-ketogluconate (with a yield of ca. 0.9 mol mol⁻¹), and overexpression of the endogenous gluconate dehydrogenase boosted 2-ketogluconate formation under anoxic conditions by >600% [155].

Yet no metabolically engineered P. putida strain has so far been able to grow in the absence of oxygen, which raises the question of which factors are still missing from the picture. Kampers et al. [156] attempted to address this issue by combining genome-scale metabolic modeling and comparative genomics to pinpoint essential oxygen-dependent processes in strain KT2440. The analysis indicated that limited ATP generation hinders growth under anoxic conditions (as previously shown by Nikel and de Lorenzo [150]), together with reduced synthesis of essential metabolites. Several engineered strains were constructed to tackle these issues, adding further metabolic activities besides acetate kinase for ATP production. Some of these strains harbor a class-I dihydroorotate dehydrogenase and a class-III, anaerobic ribonucleotide triphosphate reductase from Lactococcus lactis for the synthesis of key metabolic intermediates needed for biomass formation. As observed in previous studies, the engineered strain survived longer under micro-oxic conditions than P. putida KT2440, although the biochemical mechanisms necessary for fully anoxic growth remain elusive.

14.3.4 Production of Aromatic Molecules and Organic Acids

The capacity of P. putida strains to degrade aromatic compounds was recognized early on as a key signature of the species, and the late 1980s and early 1990s witnessed the golden era of research on the biodegradation of xenobiotic compounds. Numerous attempts followed the pioneering work by Chakrabarty, who described the construction of recombinant P. putida strains by plasmid-assisted molecular breeding, i.e. propagating catabolic capabilities through directed bacterial conjugation and plasmid transfer [157]. The resulting strains were able to break down the hydrocarbons typically found in crude oil faster than any individual strain previously described, giving rise to the first patent granted for a recombinant microorganism [158]. Against this background, biodegradation, a key signature of Pseudomonas species [159], will be discussed in a separate chapter of this book. Considering that Pseudomonas species are naturally endowed with the metabolic machinery needed to degrade aromatic compounds (and with an inherently high tolerance to these chemicals [160]), these bacteria can also be used to synthesize related molecules. Production of aromatic building blocks to be used as commodities is an important objective toward sustainability, since the industrial synthesis of aromatics currently relies on petrochemical processes based on benzene, toluene, and xylenes [161]. P. putida has been extensively engineered for the production of aromatic chemicals that are toxic to other microbial hosts, e.g. cinnamate, p-coumarate, p-hydroxybenzoate, and phenol [162–164]. Phenol is an aromatic commodity with applications in the chemical industry, and its biological production from tyrosine was achieved by introducing tyrosine-phenol lyase from Pantoea agglomerans or Pasteurella multocida into P. putida or E. coli strains [165]. However, P. putida variants capable of producing phenol from glucose with reasonably high yields were obtained by nonrational strain engineering techniques, i.e. random mutagenesis followed by extensive high-throughput screening. Building on the knowledge gained in these early studies, P. taiwanensis was forward- and reverse-engineered (22 modifications in total) to yield a strain that bears no plasmids (i.e. all relevant activities were integrated into the chromosome) and has no auxotrophies (a problem that afflicted the engineered P. putida strains constructed thus far). The phenol titer and yield on glucose reached by the best engineered candidate were ca. 3 mM and 16% C-mol C-mol⁻¹, respectively, the highest reported in the absence of complex additives (e.g. yeast extract) in the culture medium. Using a very similar approach, engineered P. taiwanensis has also been used to produce trans-cinnamate with glycerol as the main carbon substrate [166].

Organic acids form a different group of compounds that can be produced at high yields by Pseudomonas species. Because of its economic importance, a noteworthy example is the production of cis,cis-muconic acid [(2Z,4Z)-hexa-2,4-dienedioic acid]. This dicarboxylic acid is a platform chemical currently recognized for its broad industrial value [167], providing synthetic access to terephthalic acid, 3-hexenedioic acid, 2-hexenedioic acid, 1,6-hexanediol, ε-caprolactam, and ε-caprolactone, all of which are important building blocks of commercial plastics, resins, and polymers (e.g. Nylon-6,6 via adipic acid [168]). The traditional chemical processes for muconate production rely on nonrenewable, oil-based feedstocks and high concentrations of heavy-metal catalysts, and yield a mixture of cis,cis- and cis,trans-muconic acid isomers. In P. putida mt-2, cis,cis-muconic acid is synthesized as an essential part of the catabolic pathways that degrade aromatic compounds [169]. All the upper catabolic segments ultimately converge at catechol (1,2-dihydroxybenzene) as a central metabolic intermediate, which then undergoes intradiol (ortho) ring cleavage by catechol 1,2-dioxygenase to yield cis,cis-muconic acid. Inactivating muconate cycloisomerase was among the first attempts to enhance cis,cis-muconic acid titers, given that this enzyme converts the product into muconolactone and thereby reduces yields.
This strategy enabled the production of cis,cis-muconic acid via the catechol branch of the 𝛽-ketoadipate pathway directly from benzoic acid and toluene [170]. In addition, the expression of phenol-hydroxylating enzymes (i.e. the dmpKLMNOP-encoded phenol monooxygenase from Pseudomonas sp. strain CF600) enabled cis,cis-muconic acid synthesis from phenol [171]. More recently, the entire protocatechuate branch of the 𝛽-ketoadipate pathway was connected to the catechol node, allowing cis,cis-muconic acid formation from an even larger number of precursors, including coniferyl alcohol, p-coumarate, vanillate, ferulate, and protocatechuate [172]. As a result, the upper pathways of P. putida for aromatic compound degradation were turned into a metabolic funnel that converts heterogeneous mixtures of aromatics via catechol as a central intermediate [173]. Kohlstedt et al. [174] designed an entire genealogy of engineered P. putida strains endowed with enhanced catechol tolerance, high conversion efficiency, and a wider substrate spectrum than other production platforms. Different metabolic configurations were tested, and cis,cis-muconic acid was produced from benzoate, catechol, mixtures of catechol and phenol, and a lignin hydrolysate liquor. Under fed-batch conditions with glucose as the main carbon source, some of the engineered strains attained titers of ca. 65 g L−1 from externally added catechol and 13 g L−1 from the lignin hydrolysate. More importantly, these engineered producers provided the first example of lignin-to-Nylon-6,6 production via an integrated, cascaded biochemical and chemical process. The bulk of the acid obtained in this process was converted to adipic acid by Pd-catalyzed hydrogenation, and the product was directly reacted with hexamethylenediamine to yield Nylon-6,6. More recently, Bentley et al. [175] combined adaptive laboratory evolution with rational engineering to rewire glucose metabolism in P. putida strains for the production of cis,cis-muconic acid. The resulting strain (in which hexR, gntZ, and gacS had been inactivated) produced 22 g L−1 of the acid at a volumetric productivity of 0.21 g L−1 h−1 and a yield of 35.6% (mol/mol) on glucose. The synthetic modules for glucose catabolism implemented recently [52, 71, 176] could be combined with these strategies to further increase cis,cis-muconic acid production from sugars.

14.3.5 Other Bioproducts

The list of products that have been targeted using Pseudomonas species as hosts for metabolic manipulation, besides the examples covered in the previous sections, includes rhamnolipids [177, 178] and their 3-(3-hydroxyalkanoyloxy)alkanoic acid precursors [179], n-butanol [180] and other short-chain alcohols [181], valerolactam [182], methyl ketones [183], terpenoids [44], and polyketides [184]. Notably, chemical and biotechnological companies (e.g. Pfizer, Lonza, DSM, DuPont, and BASF) have exploited P. putida for the biosynthesis of natural products [185]. The main interest in these chemical structures lies in their value as precursors for the pharmaceutical industry, especially for chiral molecules that are particularly difficult to synthesize using traditional chemical methods [186]. Not surprisingly, Pseudomonas species are endowed with a number of multicomponent enzymes for secondary metabolism (e.g. nonribosomal peptide synthetases) that, for the time being, remain largely unexploited for metabolic engineering. Along the same line, biopolymers are among the main products of P. putida, and polyhydroxyalkanoate production (especially of medium-chain-length polymers) has been exploited for commercial purposes. The reader is referred to the key literature in the field for further information on polyester production in engineered P. putida [187–195].

14.4 Conclusions and Future Prospects

The potential of P. putida and other Pseudomonas species to accommodate native, engineered, or completely synthetic biochemical pathways is a matter of intense research. Yet, there is room for further developments by capitalizing on the growing body of knowledge on these bacteria. In this concluding section, we summarize some of the challenges ahead in the use of Pseudomonas as bacterial platforms. Firstly, most examples of metabolic engineering of bioproducts are limited to molecules built from only six common elements (i.e. carbon, hydrogen, oxygen, nitrogen, phosphorus, and sulfur). Expanding the chemical landscape of microbial cell factories is one of the most sought-after objectives in the field nowadays [196–198]. Pseudomonas species are certainly a good option for engineering metabolism toward the production of new-to-Nature compounds that include non-biological atoms (e.g. halogens), given their natural tolerance to complex molecules and their inherent ability to handle halogenated organic substrates. Along the same line, alternative substrates should also be considered. A promising bioproduction strategy could be the use of reduced inorganic or one-carbon (C1) electron donors (e.g. CO, CH4, CH3OH, and HCOOH) to provide reducing power and energy to drive CO2 fixation. Developing large-scale processes to bridge the gap between design (laboratory scale) and production (industrial scale) continues to be a major challenge in metabolic engineering [199, 200]. The lack of genomic stability of whole-cell catalysts when expressing complex DNA constructs, the reliance on sterile liquid media for growth, and the difficulty of downstream processing for recovering the products of interest are just a few of the issues to be solved in this domain. Developing strategies for scaling up the corresponding operations and making them economically viable is thus as important as the deep genetic engineering attempts discussed in the sections above. Efficient strategies for high-cell-density cultivation of engineered P. putida are needed to reach productivities compatible with commercialization, and the regulatory circuits known to affect cell density-dependent physiological traits (e.g. quorum sensing) should be tackled accordingly. Curbing evolvability [201, 202], increasing the portability of genetic circuits [203, 204], handling water limitation and saline stress [205], engineering efficient secretion systems [206], and implementing alternative production systems [207, 208] will help establish industrial applications of Pseudomonas, as well as applications in other fields such as agriculture [209].
Valuable efforts have been made thus far, but further developments are needed toward the much-pursued shift from an oil-based economy to a biosustainable chemical industry [210, 211]. While these practical bottlenecks are expected to be tackled in the near future, we strongly advocate the relevance of P. putida and related species for metabolic engineering in an era dominated by the 4th Industrial Revolution [212].

Acknowledgments

The input of members of the authors’ laboratories is gratefully acknowledged, especially Nicolas T. Wirth (DTU Biosustain) for his valuable help with figure drawing. Financial support from The Novo Nordisk Foundation (NNF10CC1016517 and NNF18CC0033664), the Danish Council for Independent Research (SWEET, DFF-Research Project 8021-00039B), and the European Union’s Horizon 2020 Research and Innovation Program under grant agreement No. 814418 (SinFonia) to P.I.N. is gratefully recognized. This study was also supported by the HELIOS Project of the Spanish Ministry of Economy and Competitiveness BIO2015-66960-C3-2-R (MINECO/FEDER), and the ARISYS (ERC-2012-ADG-322797), EmPowerPutida (EUH2020-BIOTEC-2014-2015-6335536), and MADONNA (H2020-FET-OPEN-RIA-2017-1-766975) contracts of the European Union to V.D.L. The authors declare that no conflict of interest exists in connection with the contents of this chapter.

References

1 Nielsen, J. and Keasling, J.D. (2016). Engineering cellular metabolism. Cell 164: 1185–1197.
2 Woolston, B.M., Edgar, S., and Stephanopoulos, G. (2013). Metabolic engineering: past and future. Annu. Rev. Chem. Biomol. Eng. 4: 259–288.
3 Park, S.Y., Yang, D., Ha, S.H., and Lee, S.Y. (2018). Metabolic engineering of microorganisms for the production of natural compounds. Adv. Biosys. 2: 1700190.
4 Smanski, M.J., Zhou, H., Claesen, J. et al. (2016). Synthetic biology to access and expand nature’s chemical diversity. Nat. Rev. Microbiol. 14: 135–149.
5 King, J.R., Edgar, S., Qiao, K., and Stephanopoulos, G. (2016). Accessing Nature’s diversity through metabolic engineering and synthetic biology. F1000Res. 5: 397.
6 Chubukov, V., Mukhopadhyay, A., Petzold, C.J. et al. (2016). Synthetic and systems biology for microbial production of commodity chemicals. Syst. Biol. Appl. 2: 16009.
7 Beites, T. and Mendes, M.V. (2015). Chassis optimization as a cornerstone for the application of synthetic biology based strategies in microbial secondary metabolism. Front. Microbiol. 6: 906.
8 Liu, H. and Deutschbauer, A.M. (2018). Rapidly moving new bacteria to model-organism status. Curr. Opin. Biotechnol. 51: 116–122.
9 Calero, P. and Nikel, P.I. (2019). Chasing bacterial chassis for metabolic engineering: a perspective review from classical to non-traditional microorganisms. Microb. Biotechnol. 12: 98–124.
10 Volke, D.C., Calero, P., and Nikel, P.I. (2020). Pseudomonas putida. Trends Microbiol. 28: 512–513.
11 Palleroni, N.J. (2010). The Pseudomonas story. Environ. Microbiol. 12: 1377–1383.
12 Timmis, K.N. (2002). Pseudomonas putida: a cosmopolitan opportunist par excellence. Environ. Microbiol. 4: 779–781.
13 Regenhardt, D., Heuer, H., Heim, S. et al. (2002). Pedigree and taxonomic credentials of Pseudomonas putida strain KT2440. Environ. Microbiol. 4: 912–915.
14 Wackett, L.P. (2003). Pseudomonas putida–a versatile biocatalyst. Nat. Biotechnol. 21: 136–138.
15 Matilla, M.A., Pizarro-Tobías, P., Roca, A. et al. (2011). Complete genome of the plant growth-promoting rhizobacterium Pseudomonas putida BIRD-1. J. Bacteriol. 193: 1290–1290.
16 George, K.W. and Hay, A. (2012). Less is more: reduced catechol production permits Pseudomonas putida F1 to grow on styrene. Microbiology 158: 2781–2788.
17 Li, S., Zhao, H., Li, Y. et al. (2012). Complete genome sequence of the naphthalene-degrading Pseudomonas putida strain ND6. J. Bacteriol. 194: 5154–5155.
18 Ponraj, P., Shankar, M., Ilakkiam, D. et al. (2012). Genome sequence of the plant growth-promoting rhizobacterium Pseudomonas putida S11. J. Bacteriol. 194: 6015–6015.
19 Yu, H., Tang, H., Wang, L. et al. (2011). Complete genome sequence of the nicotine-degrading Pseudomonas putida strain S16. J. Bacteriol. 193: 5541–5542.
20 Duan, J., Jiang, W., Cheng, Z. et al. (2013). The complete genome sequence of the plant growth-promoting bacterium Pseudomonas sp. UW4. PLoS One 8: e58640.
21 Daniels, C., Godoy, P., Duque, E. et al. (2010). Global regulation of food supply by Pseudomonas putida DOT-T1E. J. Bacteriol. 192: 2169–2181.
22 van der Werf, M.J., Overkamp, K.M., Muilwijk, B. et al. (2008). Comprehensive analysis of the metabolome of Pseudomonas putida S12 grown on different carbon sources. Mol. BioSyst. 4: 315–327.
23 Nakazawa, T. (2002). Travels of a Pseudomonas, from Japan around the world. Environ. Microbiol. 4: 782–786.
24 Worsey, M.J. and Williams, P.A. (1975). Metabolism of toluene and xylenes by Pseudomonas putida (arvilla) mt-2: evidence for a new function of the TOL plasmid. J. Bacteriol. 124: 7–13.
25 Wong, C.L. and Dunn, W.B. (1974). Transmissible plasmid coding for the degradation of benzoate and m-toluate in Pseudomonas arvilla mt-2. Microbiology 23: 227–232.
26 Bagdasarian, M., Lurz, R., Rückert, B. et al. (1981). Specific purpose plasmid cloning vectors. II. Broad host range, high copy number, RSF1010-derived vectors, and a host-vector system for gene cloning in Pseudomonas. Gene 16: 237–247.
27 Mulet, M., García-Valdes, E., and Lalucat, J. (2013). Phylogenetic affiliation of Pseudomonas putida biovar A and B strains. Res. Microbiol. 164: 351–359.
28 Lemire, J., Alhasawi, A., Appanna, V.P. et al. (2017). Metabolic defence against oxidative stress: the road less travelled so far. J. Appl. Microbiol. 123: 798–809.
29 Kim, J. and Park, W. (2014). Oxidative stress response in Pseudomonas putida. Appl. Microbiol. Biotechnol. 98: 6933–6946.
30 Nikel, P.I., Chavarría, M., Martínez-García, E. et al. (2013). Accumulation of inorganic polyphosphate enables stress endurance and catalytic vigour in Pseudomonas putida KT2440. Microb. Cell Fact. 12: 50.
31 Arce-Rodríguez, A., Calles, B., Nikel, P.I., and de Lorenzo, V. (2016). The RNA chaperone Hfq enables the environmental stress tolerance super-phenotype of Pseudomonas putida. Environ. Microbiol. 18: 3309–3326.
32 Nikel, P.I., Pérez-Pantoja, D., and de Lorenzo, V. (2016). Pyridine nucleotide transhydrogenases enable redox balance of Pseudomonas putida during biodegradation of aromatic compounds. Environ. Microbiol. 18: 3565–3582.
33 Federal Register (1982). Appendix E, Certified host-vector systems. 47: 17197.
34 Kampers, L.F.C., Volkers, R.J.M., and Martins dos Santos, V.A.P. (2019). Pseudomonas putida KT2440 is HV1 certified, not GRAS. Microb. Biotechnol. 12: 845–848.
35 Nelson, K.E., Weinel, C., Paulsen, I.T. et al. (2002). Complete genome sequence and comparative analysis of the metabolically versatile Pseudomonas putida KT2440. Environ. Microbiol. 4: 799–808.
36 Belda, E., van Heck, R.G.A., López-Sánchez, M.J. et al. (2016). The revisited genome of Pseudomonas putida KT2440 enlightens its value as a robust metabolic chassis. Environ. Microbiol. 18: 3403–3424.
37 Udaondo, Z., Molina, L., Segura, A. et al. (2016). Analysis of the core genome and pangenome of Pseudomonas putida. Environ. Microbiol. 18: 3268–3283.
38 Nogales, J., Palsson, B.Ø., and Thiele, I. (2008). A genome-scale metabolic reconstruction of Pseudomonas putida KT2440: iJN746 as a cell factory. BMC Syst. Biol. 2: 79.
39 Sohn, S.B., Kim, T.Y., Park, J.M., and Lee, S.Y. (2010). In silico genome-scale metabolic analysis of Pseudomonas putida KT2440 for polyhydroxyalkanoate synthesis, degradation of aromatics and anaerobic survival. Biotechnol. J. 5: 739–750.
40 Puchałka, J., Oberhardt, M.A., Godinho, M. et al. (2008). Genome-scale reconstruction and analysis of the Pseudomonas putida KT2440 metabolic network facilitates applications in biotechnology. PLoS Comput. Biol. 4: e1000210.
41 Yuan, Q., Huang, T., Li, P. et al. (2017). Pathway-consensus approach to metabolic network reconstruction for Pseudomonas putida KT2440 by systematic comparison of published models. PLoS One 12: e0169437.
42 Nogales, J., Mueller, J., Gudmundsson, S. et al. (2020). High-quality genome-scale metabolic modelling of Pseudomonas putida highlights its broad metabolic capabilities. Environ. Microbiol. 22: 255–269.
43 Tokic, M., Hatzimanikatis, V., and Miskovic, L. (2020). Large-scale kinetic metabolic models of Pseudomonas putida KT2440 for consistent design of metabolic engineering strategies. Biotechnol. Biofuels 13: 33.
44 Loeschcke, A. and Thies, S. (2015). Pseudomonas putida–A versatile host for the production of natural products. Appl. Microbiol. Biotechnol. 99: 6197–6214.
45 Poblete-Castro, I., Borrero de Acuña, J.M., Nikel, P.I. et al. (2017). Host organism: Pseudomonas putida. In: Industrial Biotechnology: Microorganisms (eds. C. Wittmann and J.C. Liao), 299–326. Weinheim: Wiley.
46 Nikel, P.I., Martínez-García, E., and de Lorenzo, V. (2014). Biotechnological domestication of pseudomonads using synthetic biology. Nat. Rev. Microbiol. 12: 368–379.
47 Nikel, P.I., Chavarría, M., Danchin, A., and de Lorenzo, V. (2016). From dirt to industrial applications: Pseudomonas putida as a synthetic biology chassis for hosting harsh biochemical reactions. Curr. Opin. Chem. Biol. 34: 20–29.
48 Chavarría, M., Nikel, P.I., Pérez-Pantoja, D., and de Lorenzo, V. (2013). The Entner-Doudoroff pathway empowers Pseudomonas putida KT2440 with a high tolerance to oxidative stress. Environ. Microbiol. 15: 1772–1785.
49 Nikel, P.I. and Chavarría, M. (2016). Quantitative physiology approaches to understand and optimize reducing power availability in environmental bacteria. In: Hydrocarbon and Lipid Microbiology Protocols–Synthetic and Systems Biology – Tools (eds. T.J. McGenity, K.N. Timmis and B. Nogales-Fernández), 39–70. Heidelberg: Humana Press.
50 del Castillo, T., Ramos, J.L., Rodríguez-Herva, J.J. et al. (2007). Convergent peripheral pathways catalyze initial glucose catabolism in Pseudomonas putida: genomic and flux analysis. J. Bacteriol. 189: 5142–5152.
51 Lessie, T.G. and Phibbs, P.V. (1984). Alternative pathways of carbohydrate utilization in pseudomonads. Annu. Rev. Microbiol. 38: 359–388.
52 Sánchez-Pascuala, A., de Lorenzo, V., and Nikel, P.I. (2017). Refactoring the Embden-Meyerhof-Parnas pathway as a whole of portable GlucoBricks for implantation of glycolytic modules in Gram-negative bacteria. ACS Synth. Biol. 6: 793–805.
53 Pflüger-Grau, K. and de Lorenzo, V. (2014). From the phosphoenolpyruvate phosphotransferase system to selfish metabolism: a story retraced in Pseudomonas putida. FEMS Microbiol. Lett. 356: 144–153.
54 Chavarría, M., Goñi-Moreno, A., de Lorenzo, V., and Nikel, P.I. (2016). A metabolic widget adjusts the phosphoenolpyruvate-dependent fructose influx in Pseudomonas putida. mSystems 1: e00154-16.
55 Udaondo, Z., Ramos, J.L., Segura, A. et al. (2018). Regulation of carbohydrate degradation pathways in Pseudomonas involves a versatile set of transcriptional regulators. Microb. Biotechnol. 11: 442–454.
56 Park, H., McGill, S.L., Arnold, A.D., and Carlson, R.P. (2020). Pseudomonad reverse carbon catabolite repression, interspecies metabolite exchange, and consortial division of labor. Cell. Mol. Life Sci. 77: 395–413.
57 Nikel, P.I., Pettinari, M.J., Ramírez, M.C. et al. (2008). Escherichia coli arcA mutants: metabolic profile characterization of microaerobic cultures using glycerol as a carbon source. J. Mol. Microbiol. Biotechnol. 15: 48–54.
58 Velázquez, F., Pflüger, K., Cases, I. et al. (2007). The phosphotransferase system formed by PtsP, PtsO, and PtsN proteins controls production of polyhydroxyalkanoates in Pseudomonas putida. J. Bacteriol. 189: 4529–4533.
59 Köhler, K.A., Blank, L.M., Frick, O., and Schmid, A. (2015). D-Xylose assimilation via the Weimberg pathway by solvent-tolerant Pseudomonas taiwanensis VLB120. Environ. Microbiol. 17: 156–170.
60 Bator, I., Wittgens, A., Rosenau, F. et al. (2019). Comparison of three xylose pathways in Pseudomonas putida KT2440 for the synthesis of valuable products. Front. Bioeng. Biotechnol. 7: 480.
61 Poblete-Castro, I., Wittmann, C., and Nikel, P.I. (2020). Biochemistry, genetics, and biotechnology of glycerol utilization in Pseudomonas species. Microb. Biotechnol. 13: 32–53.
62 Nikel, P.I., Kim, J., and de Lorenzo, V. (2014). Metabolic and regulatory rearrangements underlying glycerol metabolism in Pseudomonas putida KT2440. Environ. Microbiol. 16: 239–254.
63 Escapa, I.F., del Cerro, C., García, J.L., and Prieto, M.A. (2012). The role of GlpR repressor in Pseudomonas putida KT2440 growth and PHA production from glycerol. Environ. Microbiol. 15: 93–110.
64 Nikel, P.I., Romero-Campero, F.J., Zeidman, J.A. et al. (2015). The glycerol-dependent metabolic persistence of Pseudomonas putida KT2440 reflects the regulatory logic of the GlpR repressor. mBio 6: e00340-15.
65 Fuhrer, T., Fischer, E., and Sauer, U. (2005). Experimental identification and quantification of glucose metabolism in seven bacterial species. J. Bacteriol. 187: 1581–1590.
66 Daddaoua, A., Krell, T., Alfonso, C. et al. (2010). Compartmentalized glucose metabolism in Pseudomonas putida is controlled by the PtxS repressor. J. Bacteriol. 192: 4357–4366.
67 Daddaoua, A., Krell, T., and Ramos, J.L. (2009). Regulation of glucose metabolism in Pseudomonas: the phosphorylative branch and Entner-Doudoroff enzymes are regulated by a repressor containing a sugar isomerase domain. J. Biol. Chem. 284: 21360–21368.
68 del Castillo, T., Duque, E., and Ramos, J.L. (2008). A set of activators and repressors control peripheral glucose pathways in Pseudomonas putida to yield a common central intermediate. J. Bacteriol. 190: 2331–2339.
69 Dolan, S.K., Pereira, G., Silva-Rocha, R., and Welch, M. (2020). Transcriptional regulation of central carbon metabolism in Pseudomonas aeruginosa. Microb. Biotechnol. 13: 285–289.
70 Corona, F., Martínez, J.L., and Nikel, P.I. (2019). The global regulator Crc orchestrates the metabolic robustness underlying oxidative stress resistance in Pseudomonas aeruginosa. Environ. Microbiol. 21: 898–912.
71 Sánchez-Pascuala, A., Fernández-Cabezón, L., de Lorenzo, V., and Nikel, P.I. (2019). Functional implementation of a linear glycolysis for sugar catabolism in Pseudomonas putida. Metab. Eng. 54: 200–211.
72 Nikel, P.I., Chavarría, M., Fuhrer, T. et al. (2015). Pseudomonas putida KT2440 strain metabolizes glucose through a cycle formed by enzymes of the Entner-Doudoroff, Embden-Meyerhof-Parnas, and pentose phosphate pathways. J. Biol. Chem. 290: 25920–25932.
73 Kohlstedt, M. and Wittmann, C. (2019). GC-MS-based 13C metabolic flux analysis resolves the parallel and cyclic glucose metabolism of Pseudomonas putida KT2440 and Pseudomonas aeruginosa PAO1. Metab. Eng. 54: 35–53.
74 Martínez-García, E. and de Lorenzo, V. (2017). Molecular tools and emerging strategies for deep genetic/genomic refactoring of Pseudomonas. Curr. Opin. Biotechnol. 47: 120–132.
75 Gawin, A., Valla, S., and Brautaset, T. (2017). The XylS/Pm regulator/promoter system and its use in fundamental studies of bacterial gene expression, recombinant protein production and metabolic engineering. Microb. Biotechnol. 10: 702–718.
76 Calero, P., Jensen, S.I., and Nielsen, A.T. (2016). Broad-host-range ProUSER vectors enable fast characterization of inducible promoters and optimization of p-coumaric acid production in Pseudomonas putida KT2440. ACS Synth. Biol. 5: 741–753.
77 de Lorenzo, V., Fernández, S., Herrero, M. et al. (1993). Engineering of alkyl- and haloaromatic-responsive gene expression with mini-transposons containing regulated promoters of biodegradative pathways of Pseudomonas. Gene 130: 41–46.
78 Ramos, J.L., Marqués, S., and Timmis, K.N. (1997). Transcriptional control of the Pseudomonas TOL plasmid catabolic operons is achieved through an interplay of host factors and plasmid-encoded regulators. Annu. Rev. Microbiol. 51: 341–373.
79 Marqués, S. and Ramos, J.L. (1993). Transcriptional control of the Pseudomonas putida TOL plasmid catabolic pathways. Mol. Microbiol. 9: 923–929.
80 Blatny, J.M., Brautaset, T., Winther-Larsen, H.C. et al. (1997). Construction and use of a versatile set of broad-host-range cloning and expression vectors based on the RK2 replicon. Appl. Environ. Microbiol. 63: 370–379.
81 Panke, S., Meyer, A., Huber, C.M. et al. (1999). An alkane-responsive expression system for the production of fine chemicals. Appl. Environ. Microbiol. 65: 2324–2332.
82 Makart, S., Heinemann, M., and Panke, S. (2007). Characterization of the AlkS/PalkB-expression system as an efficient tool for the production of recombinant proteins in Escherichia coli fed-batch fermentations. Biotechnol. Bioeng. 96: 326–336.
83 Cebolla, A., Guzmán, C., and de Lorenzo, V. (1996). Nondisruptive detection of activity of catabolic promoters of Pseudomonas putida with an antigenic surface reporter system. Appl. Environ. Microbiol. 62: 214–220.
84 Cebolla, A., Sousa, C., and de Lorenzo, V. (2001). Rational design of a bacterial transcriptional cascade for amplifying gene expression capacity. Nucleic Acids Res. 29: 759–766.
85 Becker, P.D., Royo, J.L., and Guzmán, C.A. (2010). Exploitation of prokaryotic expression systems based on the salicylate-dependent control circuit encompassing nahR/Psal::xylS2 for biotechnological applications. Bioeng. Bugs 1: 244–251.
86 Vangnai, A.S., Kataoka, N., Soonglerdsongpha, S. et al. (2012). Construction and application of an Escherichia coli bioreporter for aniline and chloroaniline detection. J. Ind. Microbiol. Biotechnol. 39: 1801–1810.
87 Choi, Y.J., Morel, L., Le François, T. et al. (2010). Novel, versatile, and tightly regulated expression system for Escherichia coli strains. Appl. Environ. Microbiol. 76: 5058–5066.
88 Guan, X., Ramanathan, S., Garris, J.P. et al. (2000). Chlorocatechol detection based on a clc operon/reporter gene system. Anal. Chem. 72: 2423–2427.
89 Hanko, E.K.R., Minton, N.P., and Malys, N. (2017). Characterisation of a 3-hydroxypropionic acid-inducible system from Pseudomonas putida for orthogonal gene expression control in Escherichia coli and Cupriavidus necator. Sci. Rep. 7: 1724.
90 Graf, N. and Altenbuchner, J. (2013). Functional characterization and application of a tightly regulated MekR/PmekA expression system in Escherichia coli and Pseudomonas putida. Appl. Microbiol. Biotechnol. 97: 8239–8251.
91 Luo, X., Yang, Y., Ling, W. et al. (2016). Pseudomonas putida KT2440 markerless gene deletion using a combination of λ Red recombineering and Cre/loxP site-specific recombination. FEMS Microbiol. Lett. 363: fnw014.
92 Hoffmann, J. and Altenbuchner, J. (2015). Functional characterization of the mannitol promoter of Pseudomonas fluorescens DSM 50106 and its application for a mannitol-inducible expression system for Pseudomonas putida KT2440. PLoS One 10: e0133248.
93 Dvořák, P., Chrást, L., Nikel, P.I. et al. (2015). Exacerbation of substrate toxicity by IPTG in Escherichia coli BL21(DE3) carrying a synthetic metabolic pathway. Microb. Cell Fact. 14: 201.
94 Arce-Rodríguez, A., Volke, D.C., Bense, S. et al. (2019). Non-invasive, ratiometric determination of intracellular pH in Pseudomonas species using a novel genetically encoded indicator. Microb. Biotechnol. 12: 799–813.
95 Martínez-García, E., Aparicio, T., Goñi-Moreno, A. et al. (2014). SEVA 2.0: an update of the Standard European Vector Architecture for de-/re-construction of bacterial functionalities. Nucleic Acids Res. 43: D1183–D1189.
96 Silva-Rocha, R., Martínez-García, E., Calles, B. et al. (2013). The Standard European Vector Architecture (SEVA): a coherent platform for the analysis and deployment of complex prokaryotic phenotypes. Nucleic Acids Res. 41: D666–D675.
97 Martínez-García, E., Goñi-Moreno, A., Bartley, B. et al. (2019). SEVA 3.0: an update of the Standard European Vector Architecture for enabling portability of genetic constructs among diverse bacterial hosts. Nucleic Acids Res. 48: D1164–D1170.
98 Wirth, N.T. and Nikel, P.I. (2020). Engineering reduced-genome strains of Pseudomonas putida for product valorization. In: Minimal Cells: Design, Construction, Biotechnological Applications (eds. A.R. Lara and G. Gosset), 69–93. Cham: Springer.
99 Martínez-García, E. and de Lorenzo, V. (2011). Engineering multiple genomic deletions in Gram-negative bacteria: analysis of the multi-resistant antibiotic profile of Pseudomonas putida KT2440. Environ. Microbiol. 13: 2702–2716.
100 Aparicio, T., de Lorenzo, V., and Martínez-García, E. (2019). CRISPR/Cas9-enhanced ssDNA recombineering for Pseudomonas putida. Microb. Biotechnol. 12: 1076–1089.
101 Martínez-García, E., Aparicio, T., de Lorenzo, V., and Nikel, P.I. (2014). New transposon tools tailored for metabolic engineering of Gram-negative microbial cell factories. Front. Bioeng. Biotechnol. 2: 46.
102 Martínez-García, E., Aparicio, T., de Lorenzo, V., and Nikel, P.I. (2017). Engineering Gram-negative microbial cell factories using transposon vectors. Methods Mol. Biol. 1498: 273–293.
103 Martínez-García, E., Calles, B., Arévalo-Rodríguez, M., and de Lorenzo, V. (2011). pBAM1: an all-synthetic genetic tool for analysis and construction of complex bacterial phenotypes. BMC Microbiol. 11: 38.
104 Nikel, P.I. and de Lorenzo, V. (2013). Implantation of unmarked regulatory and metabolic modules in Gram-negative bacteria with specialised mini-transposon delivery vectors. J. Biotechnol. 163: 143–154.
105 Volke, D.C., Friis, L., Wirth, N.T. et al. (2020). Synthetic control of plasmid replication enables target- and self-curing of vectors and expedites genome engineering of Pseudomonas putida. Metab. Eng. Commun. 10: e00126.
106 Elmore, J.R., Furches, A., Wolff, G.N. et al. (2017). Development of a high efficiency integration system and promoter library for rapid modification of Pseudomonas putida KT2440. Metab. Eng. Commun. 5: 1–8.
107 Zobel, S., Benedetti, I., Eisenbach, L. et al. (2015). Tn7-based device for calibrated heterologous gene expression in Pseudomonas putida. ACS Synth. Biol. 4: 1341–1351.
108 Jakočiūnas, T., Jensen, M.K., and Keasling, J.D. (2017). System-level perturbations of cell metabolism using CRISPR/Cas9. Curr. Opin. Biotechnol. 46: 134–140.
109 Aparicio, T., de Lorenzo, V., and Martínez-García, E. (2018). CRISPR/Cas9-based counterselection boosts recombineering efficiency in Pseudomonas putida. Biotechnol. J. 13: e1700161.
110 Mougiakos, I., Mohanraju, P., Bosma, E.F. et al. (2017). Characterizing a thermostable Cas9 for bacterial genome editing and silencing. Nat. Commun. 8: 1647.
111 Wirth, N.T., Kozaeva, E., and Nikel, P.I. (2020). Accelerated genome engineering of Pseudomonas putida by I-SceI-mediated recombination and CRISPR-Cas9 counterselection. Microb. Biotechnol. 13: 233–249.
112 Tan, S.Z., Reisch, C.R., and Prather, K.L.J. (2018). A robust CRISPR interference gene repression system in Pseudomonas. J. Bacteriol. 200: e00575-17.
113 Sun, J., Wang, Q., Jiang, Y. et al. (2018). Genome editing and transcriptional repression in Pseudomonas putida KT2440 via the type II CRISPR system. Microb. Cell Fact. 17: 41.
114 Batianis, C., Kozaeva, E., Damalas, S.G. et al. (2020). An expanded CRISPRi toolbox for tunable control of gene expression in Pseudomonas putida. Microb. Biotechnol. 13: 368–385.
115 Kim, S.K., Yoon, P.K., Kim, S.J. et al. (2020). CRISPR interference-mediated gene regulation in Pseudomonas putida KT2440. Microb. Biotechnol. 13: 210–221.
116 Choi, K.R., Cho, J.S., Cho, I.J. et al. (2018). Markerless gene knockout and integration to express heterologous biosynthetic gene clusters in Pseudomonas putida. Metab. Eng. 47: 463–474.
117 Wang, H.H., Isaacs, F.J., Carr, P.A. et al. (2009). Programming cells by multiplex genome engineering and accelerated evolution. Nature 460: 894–898.
118 Aparicio, T., Nyerges, A., Martínez-García, E., and de Lorenzo, V. (2020). High-efficiency multi-site genomic editing (HEMSE) of Pseudomonas putida through thermoinducible ssDNA recombineering. iScience 23: 100946.
119 Ricaurte, D.E., Martínez-García, E., Nyerges, A. et al. (2018). A standardized workflow for surveying recombinases expands bacterial genome-editing capabilities. Microb. Biotechnol. 11: 176–188.
120 Shapiro, R.S., Chávez, A., and Collins, J.J. (2018). CRISPR-based genomic tools for the manipulation of genetically intractable microorganisms. Nat. Rev. Microbiol. 16: 333–339.
121 Chen, W., Zhang, Y., Zhang, Y. et al. (2018). CRISPR/Cas9-based genome editing in Pseudomonas aeruginosa and cytidine deaminase-mediated base editing in Pseudomonas species. iScience 6: 222–231.
122 Domröse, A., Weihmann, R., Thies, S. et al. (2017). Rapid generation of recombinant Pseudomonas putida secondary metabolite producers using yTREX. Synth. Syst. Biotechnol. 2: 310–319.
123 Weihmann, R., Domröse, A., Drepper, T. et al. (2020). Protocols for yTREX/Tn5-based gene cluster expression in Pseudomonas putida. Microb. Biotechnol. 13: 250–262.
124 Calles, B., Goñi-Moreno, A., and de Lorenzo, V. (2019). Digitalizing heterologous gene expression in Gram-negative bacteria with a portable ON/OFF module. Mol. Syst. Biol. 15: e8777.
125 Durante-Rodríguez, G., de Lorenzo, V., and Nikel, P.I. (2018). A post-translational metabolic switch enables complete decoupling of bacterial growth from biopolymer production in engineered Escherichia coli. ACS Synth. Biol. 7: 2686–2697.
126 Volke, D.C., Turlin, J., Mol, V., and Nikel, P.I. (2019). Physical decoupling of XylS/Pm regulatory elements and conditional proteolysis enable precise control of gene expression in Pseudomonas putida. Microb. Biotechnol. 13: 222–232.
127 Neves, D., Vos, S., Blank, L.M., and Ebert, B.E. (2019). Pseudomonas mRNA 2.0: boosting gene expression through enhanced mRNA stability and translational efficiency. Front. Bioeng. Biotechnol. 7: 458.
128 Viegas, S.C., Apura, P., Martínez-García, E. et al. (2018). Modulating heterologous gene expression with portable mRNA-stabilizing 5′-UTR sequences. ACS Synth. Biol. 7: 2177–2188.
129 Domröse, A., Hage-Hülsmann, J., Thies, S. et al. (2019). Pseudomonas putida rDNA is a favored site for the expression of biosynthetic genes. Sci. Rep. 9: 7028.
130 Martínez-García, E., Nikel, P.I., Aparicio, T., and de Lorenzo, V. (2014). Pseudomonas 2.0: genetic upgrading of P. putida KT2440 as an enhanced host for heterologous gene expression. Microb. Cell Fact. 13: 159.
131 Martínez-García, E., Nikel, P.I., Chavarría, M., and de Lorenzo, V. (2014). The metabolic cost of flagellar motion in Pseudomonas putida KT2440. Environ. Microbiol. 16: 291–303.
132 Casjens, S. (2003). Prophages and bacterial genomics: what have we learned so far? Mol. Microbiol. 49: 277–300.
133 Martínez-García, E., Jatsenko, T., Kivisaar, M., and de Lorenzo, V. (2014). Freeing Pseudomonas putida KT2440 of its proviral load strengthens endurance to environmental stresses. Environ. Microbiol. 17: 76–90.
134 Ilves, H., Hõrak, R., and Kivisaar, M. (2001). Involvement of σS in starvation-induced transposition of Pseudomonas putida transposon Tn4652. J. Bacteriol. 183: 5445–5448.
135 Lieder, S., Nikel, P.I., de Lorenzo, V., and Takors, R. (2015). Genome reduction boosts heterologous gene expression in Pseudomonas putida. Microb. Cell Fact. 14: 23.

136 Aparicio, T., de Lorenzo, V., and Martínez-García, E. (2019). Improved thermotolerance of genome-reduced Pseudomonas putida EM42 enables effective functioning of the PL/cI857 system. Biotechnol. J. 14: e1800483.

137 Wynands, B., Otto, M., Runge, N. et al. (2019). Streamlined Pseudomonas taiwanensis VLB120 chassis strains with improved bioprocess features. ACS Synth. Biol. 8: 2036–2050.

138 Shen, X., Wang, Z., Huang, X. et al. (2017). Developing genome-reduced Pseudomonas chlororaphis strains for the production of secondary metabolites. BMC Genomics 18: 715.

139 Hintermayer, S.B. and Weuster-Botz, D. (2017). Experimental validation of in silico estimated biomass yields of Pseudomonas putida KT2440. Biotechnol. J. 12: 1600720.

140 Beckers, V., Poblete-Castro, I., Tomasch, J., and Wittmann, C. (2016). Integrated analysis of gene expression and metabolic fluxes in PHA-producing Pseudomonas putida grown on glycerol. Microb. Cell Fact. 15: 73.

141 Prieto, M.A., Escapa, I.F., Martínez, V. et al. (2016). A holistic view of polyhydroxyalkanoate metabolism in Pseudomonas putida. Environ. Microbiol. 18: 341–357.

142 Becker, J. and Wittmann, C. (2019). A field of dreams: lignin valorization into chemicals, materials, fuels, and health-care products. Biotechnol. Adv. 37: 107360.

143 Dvořák, P. and de Lorenzo, V. (2018). Refactoring the upper sugar metabolism of Pseudomonas putida for co-utilization of cellobiose, xylose, and glucose. Metab. Eng. 48: 94–108.

144 Ravindran, R. and Jaiswal, A.K. (2016). A comprehensive review on pre-treatment strategy for lignocellulosic food industry waste: challenges and opportunities. Bioresour. Technol. 199: 92–102.

145 Guarnieri, M.T., Franden, M.A., Johnson, C.W., and Beckham, G.T. (2017). Conversion and assimilation of furfural and 5-(hydroxymethyl)furfural by Pseudomonas putida KT2440. Metab. Eng. Commun. 4: 22–28.

146 Salvachúa, D., Rydzak, T., Auwae, R. et al. (2020). Metabolic engineering of Pseudomonas putida for increased polyhydroxyalkanoate production from lignin. Microb. Biotechnol. 13: 290–298.

147 Borer, B., Tecon, R., and Or, D. (2018). Spatial organization of bacterial populations in response to oxygen and carbon counter-gradients in pore networks. Nat. Commun. 9: 769.

148 Ruiz, J.A., de Almeida, A., Godoy, M.S. et al. (2012). Escherichia coli redox mutants as microbial cell factories for the synthesis of reduced biochemicals. Comput. Struct. Biotechnol. J. 3: e201210019.

149 Rodríguez, A., Escobar, S., Gómez, E. et al. (2018). Behavior of several Pseudomonas putida strains growth under different agitation and oxygen supply conditions. Biotechnol. Prog. 34: 900–909.


14 Metabolic Engineering of Pseudomonas

150 Nikel, P.I. and de Lorenzo, V. (2013). Engineering an anaerobic metabolic regime in Pseudomonas putida KT2440 for the anoxic biodegradation of 1,3-dichloroprop-1-ene. Metab. Eng. 15: 98–112.

151 Nikel, P.I., Pérez-Pantoja, D., and de Lorenzo, V. (2013). Why are chlorinated pollutants so difficult to degrade aerobically? Redox stress limits 1,3-dichloroprop-1-ene metabolism by Pseudomonas pavonaceae. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 368: 20120377.

152 Steen, A., Utkür, F.Ö., Borrero de Acuña, J.M. et al. (2013). Construction and characterization of nitrate and nitrite respiring Pseudomonas putida KT2440 strains for anoxic biotechnical applications. J. Biotechnol. 163: 155–165.

153 Schmitz, S., Nies, S., Wierckx, N.J.P. et al. (2015). Engineering mediator-based electroactivity in the obligate aerobic bacterium Pseudomonas putida KT2440. Front. Microbiol. 6: 284.

154 Lai, B., Yu, S., Bernhardt, P.V. et al. (2016). Anoxic metabolism and biochemical production in Pseudomonas putida F1 driven by a bioelectrochemical system. Biotechnol. Biofuels 9: 39.

155 Yu, S., Lai, B., Plan, M.R. et al. (2018). Improved performance of Pseudomonas putida in a bioelectrochemical system through overexpression of periplasmic glucose dehydrogenase. Biotechnol. Bioeng. 115: 145–155.

156 Kampers, L.F.C., van Heck, R.G.A., Donati, S. et al. (2019). In silico-guided engineering of Pseudomonas putida towards growth under micro-oxic conditions. Microb. Cell Fact. 18: 179.

157 Chakrabarty, A.M., Mylroie, J.R., Friello, D.A., and Vacca, J.G. (1975). Transformation of Pseudomonas putida and Escherichia coli with plasmid-linked drug-resistance factor DNA. Proc. Natl. Acad. Sci. U. S. A. 72: 3647–3651.

158 Chakrabarty, A.M. (1981). Microorganisms having multiple compatible degradative energy-generating plasmids and preparation thereof. Patent US4259444A.

159 Dvořák, P., Nikel, P.I., Damborský, J., and de Lorenzo, V. (2017). Bioremediation 3.0: engineering pollutant-removing bacteria in the times of systemic biology. Biotechnol. Adv. 35: 845–866.

160 de Lorenzo, V. and Loza-Tavera, H. (2011). Microbial bioremediation of chemical pollutants: how bacteria cope with multi-stress environmental scenarios. In: Bacterial Stress Responses, 2e (eds. G. Storz and R. Hengge), 481–492. Washington, D.C.: American Society for Microbiology.

161 Lee, J.H. and Wendisch, V.F. (2017). Biotechnological production of aromatic compounds of the extended shikimate pathway from renewable biomass. J. Biotechnol. 257: 211–221.

162 Calero, P., Jensen, S.I., Bojanovič, K. et al. (2018). Genome-wide identification of tolerance mechanisms toward p-coumaric acid in Pseudomonas putida. Biotechnol. Bioeng. 115: 762–774.

163 Molina-Santiago, C., Cordero, B.F., Daddaoua, A. et al. (2016). Pseudomonas putida as a platform for the synthesis of aromatic compounds. Microbiology 162: 1535–1543.


164 Wynands, B., Lenzen, C., Otto, M. et al. (2018). Metabolic engineering of Pseudomonas taiwanensis VLB120 with minimal genomic modifications for high-yield phenol production. Metab. Eng. 47: 121–133.

165 Wierckx, N.J.P., Ballerstedt, H., de Bont, J.A., and Wery, J. (2005). Engineering of solvent-tolerant Pseudomonas putida S12 for bioproduction of phenol from glucose. Appl. Environ. Microbiol. 71: 8221–8227.

166 Otto, M., Wynands, B., Lenzen, C. et al. (2019). Rational engineering of phenylalanine accumulation in Pseudomonas taiwanensis to enable high-yield production of trans-cinnamate. Front. Bioeng. Biotechnol. 7: 312.

167 Xie, N.Z., Liang, H., Huang, R.B., and Xu, P. (2014). Biotechnological production of muconic acid: current status and future prospects. Biotechnol. Adv. 32: 615–622.

168 van Duuren, J.B.J.H., de Wild, P.J., Starck, S. et al. (2020). Limited life cycle and cost assessment for the bioconversion of lignin-derived aromatics into adipic acid. Biotechnol. Bioeng. 117: 1381–1393.

169 de Lorenzo, V. and Joshi, H. (2019). Genomic responses of Pseudomonas putida to aromatic hydrocarbons. In: Handbook of Hydrocarbon and Lipid Microbiology (Consequences of Microbial Interactions with Hydrocarbons, Oils, and Lipids: Biodegradation and Bioremediation) (ed. R. Steffan), 1–15. Cham: Springer.

170 Chua, J.W. and Hsieh, J.H. (1990). Oxidative bioconversion of toluene to 1,3-butadiene-1,4-dicarboxylic acid (cis,cis-muconic acid). World J. Microbiol. Biotechnol. 6: 127–143.

171 Vardon, D.R., Franden, M.A., Johnson, C.W. et al. (2015). Adipic acid production from lignin. Energy Environ. Sci. 8: 617–628.

172 Johnson, C.W., Salvachúa, D., Khanna, P. et al. (2016). Enhancing muconic acid production from glucose and lignin-derived aromatic compounds via increased protocatechuate decarboxylase activity. Metab. Eng. Commun. 3: 111–119.

173 Linger, J.G., Vardon, D.R., Guarnieri, M.T. et al. (2014). Lignin valorization through integrated biological funneling and chemical catalysis. Proc. Natl. Acad. Sci. U. S. A. 111: 12013–12018.

174 Kohlstedt, M., Starck, S., Barton, N. et al. (2018). From lignin to nylon: cascaded chemical and biochemical conversion using metabolically engineered Pseudomonas putida. Metab. Eng. 47: 279–293.

175 Bentley, G.J., Narayanan, N., Jha, R.K. et al. (2020). Engineering glucose metabolism for enhanced muconic acid production in Pseudomonas putida KT2440. Metab. Eng. 59: 64–75.

176 Sánchez-Pascuala, A., Nikel, P.I., and de Lorenzo, V. (2018). Re-factoring glycolytic genes for targeted engineering of catabolism in Gram-negative bacteria. In: Synthetic Biology: Methods and Protocols (ed. J.C. Braman), 3–24. New York: Springer.

177 Chong, H. and Li, Q. (2017). Microbial production of rhamnolipids: opportunities, challenges and strategies. Microb. Cell Fact. 16: 137.


178 Wittgens, A., Tiso, T., Arndt, T.T. et al. (2011). Growth independent rhamnolipid production from glucose using the non-pathogenic Pseudomonas putida KT2440. Microb. Cell Fact. 10: 80.

179 Germer, A., Tiso, T., Müller, C. et al. (2020). Exploiting the natural diversity of the acyltransferase RhlA for the synthesis of the rhamnolipid precursor 3-(3-hydroxyalkanoyloxy)alkanoic acid. Appl. Environ. Microbiol. 86: e02317-19.

180 Nitschel, R., Ankenbauer, A., Welsch, I. et al. (2020). Engineering Pseudomonas putida KT2440 for the production of isobutanol. Eng. Life Sci. 20: 148–159.

181 Nikel, P.I. and de Lorenzo, V. (2014). Robustness of Pseudomonas putida KT2440 as a host for ethanol biosynthesis. New Biotechnol. 31: 562–571.

182 Thompson, M.G., Valencia, L.E., Blake-Hedges, J.M. et al. (2019). Omics-driven identification and elimination of valerolactam catabolism in Pseudomonas putida KT2440 for increased product titer. Metab. Eng. Commun. 9: e00098.

183 Dong, J., Chen, Y., Benites, V.T. et al. (2019). Methyl ketone production by Pseudomonas putida is enhanced by plant-derived amino acids. Biotechnol. Bioeng. 116: 1909–1922.

184 Yang, D., Kim, W.J., Yoo, S.M. et al. (2018). Repurposing type III polyketide synthase as a malonyl-CoA biosensor for metabolic engineering in bacteria. Proc. Natl. Acad. Sci. U. S. A. 115: 9835–9844.

185 Poblete-Castro, I., Becker, J., Dohnt, K. et al. (2012). Industrial biotechnology of Pseudomonas putida and related species. Appl. Microbiol. Biotechnol. 93: 2279–2290.

186 Liu, X., Ding, W., and Jiang, H. (2017). Engineering microbial cell factories for the production of plant natural products: from design principles to industrial-scale production. Microb. Cell Fact. 16: 125.

187 Tripathi, L., Wu, L.P., Chen, J., and Chen, G.Q. (2012). Synthesis of diblock copolymer poly-3-hydroxybutyrate-block-poly-3-hydroxyhexanoate [PHB-b-PHHx] by a β-oxidation weakened Pseudomonas putida KT2442. Microb. Cell Fact. 11: 44.

188 Li, S.Y., Dong, C.L., Wang, S.Y. et al. (2011). Microbial production of polyhydroxyalkanoate block copolymer by recombinant Pseudomonas putida. Appl. Microbiol. Biotechnol. 90: 659–669.

189 Arias, S., Bassas-Galia, M., Molinari, G., and Timmis, K.N. (2013). Tight coupling of polymerization and depolymerization of polyhydroxyalkanoates ensures efficient management of carbon resources in Pseudomonas putida. Microb. Biotechnol. 6: 551–563.

190 Liu, Q., Luo, G., Zhou, X.R., and Chen, G.Q. (2011). Biosynthesis of poly(3-hydroxydecanoate) and 3-hydroxydodecanoate dominating polyhydroxyalkanoates by β-oxidation pathway inhibited Pseudomonas putida. Metab. Eng. 13: 11–17.

191 Meng, D.C. and Chen, G.Q. (2018). Synthetic biology of polyhydroxyalkanoates (PHA). Adv. Biochem. Eng. Biotechnol. 162: 147–174.

192 Nikel, P.I., Pettinari, M.J., Galvagno, M.A., and Méndez, B.S. (2010). Metabolic selective pressure stabilizes plasmids carrying biosynthetic genes for reduced biochemicals in Escherichia coli redox mutants. Appl. Microbiol. Biotechnol. 88: 563–573.

193 Fedeson, D.T., Saake, P., Calero, P. et al. (2020). Biotransformation of 2,4-dinitrotoluene in a phototrophic co-culture of engineered Synechococcus elongatus and Pseudomonas putida. Microb. Biotechnol. 13: 997–1011.

194 López, N.I., Pettinari, M.J., Nikel, P.I., and Méndez, B.S. (2015). Polyhydroxyalkanoates: much more than biodegradable plastics. Adv. Appl. Microbiol. 93: 93–106.

195 Gomez, J.G.C., Méndez, B.S., Nikel, P.I. et al. (2012). Making green polymers even greener: towards sustainable production of polyhydroxyalkanoates from agroindustrial by-products. In: Advances in Applied Biotechnology (ed. M. Petre), 41–62. Rijeka: InTech.

196 Reed, K.B. and Alper, H. (2018). Expanding beyond canonical metabolism: interfacing alternative elements, synthetic biology, and metabolic engineering. Synth. Syst. Biotechnol. 3: 20–33.

197 Nikel, P.I. (2019). Synthesis of recoded bacterial genomes toward bespoke biocatalysis. Trends Biotechnol. 37: 1036–1038.

198 Martinelli, L. and Nikel, P.I. (2019). Breaking the state-of-the-art in the chemical industry with new-to-nature products via synthetic microbiology. Microb. Biotechnol. 12: 187–190.

199 Danchin, A. (2012). Scaling up synthetic biology: do not forget the chassis. FEBS Lett. 586: 2129–2137.

200 Sanford, K., Chotani, G., Danielson, N., and Zahn, J.A. (2016). Scaling up of renewable chemicals. Curr. Opin. Biotechnol. 38: 112–122.

201 Umenhoffer, K., Fehér, T., Balikó, G. et al. (2010). Reduced evolvability of Escherichia coli MDS42, an IS-less cellular chassis for molecular and synthetic biology applications. Microb. Cell Fact. 9: 38.

202 Fernández-Cabezón, L., Cros, A., and Nikel, P.I. (2019). Evolutionary approaches for engineering industrially-relevant phenotypes in bacterial cell factories. Biotechnol. J. 14: 1800439.

203 Hueso-Gil, A., Nyerges, A., Pál, C. et al. (2020). Multiple-site diversification of regulatory sequences enables interspecies operability of genetic devices. ACS Synth. Biol. 9: 104–114.

204 Goñi-Moreno, A. and Nikel, P.I. (2019). High-performance biocomputing in synthetic biology–integrated transcriptional and metabolic circuits. Front. Bioeng. Biotechnol. 7: 40.

205 Yin, J., Chen, J.C., Wu, Q., and Chen, G.Q. (2015). Halophiles, coming stars for industrial biotechnology. Biotechnol. Adv. 33: 1433–1442.

206 Zhou, Y., Lu, Z., Wang, X. et al. (2018). Genetic engineering modification and fermentation optimization for extracellular production of recombinant proteins using Escherichia coli. Appl. Microbiol. Biotechnol. 102: 1545–1556.

207 Benedetti, I., de Lorenzo, V., and Nikel, P.I. (2016). Genetic programming of catalytic Pseudomonas putida biofilms for boosting biodegradation of haloalkanes. Metab. Eng. 33: 109–118.

208 Volke, D.C. and Nikel, P.I. (2018). Getting bacteria in shape: synthetic morphology approaches for the design of efficient microbial cell factories. Adv. Biosyst. 2: 1800111.


209 Wurtzel, E.T., Vickers, C.E., Hanson, A.D. et al. (2019). Revolutionizing agriculture with synthetic biology. Nat. Plants 5: 1207–1210.

210 Timmis, K., de Lorenzo, V., Verstraete, W. et al. (2017). The contribution of microbial biotechnology to economic growth and employment creation. Microb. Biotechnol. 10: 1137–1144.

211 de Lorenzo, V., Prather, K.L., Chen, G.Q. et al. (2018). The power of synthetic biology for bioproduction, remediation and pollution control: the UN’s sustainable development goals will inevitably require the application of molecular biology and biotechnology on a global scale. EMBO Rep. 19: e45658.

212 Martínez-García, E. and de Lorenzo, V. (2019). Pseudomonas putida in the quest of programmable chemistry. Curr. Opin. Biotechnol. 59: 111–121.


15 Metabolic Engineering of Lactic Acid Bacteria

Robin Dorau, Jianming Liu, Christian Solem, and Peter Ruhdal Jensen

National Food Institute, Technical University of Denmark, Lyngby, Denmark

Concise definition of subject: Lactic acid bacteria (LAB) are a group of Gram-positive, nonsporulating, microaerophilic bacteria, which generally have rather small […]

[…] (500 nt usually) regions of homology flanking the cargo DNA. Additionally, this method relies entirely on the host’s native recombination machinery, as no heterologous protein expression is required. The term “double crossover” refers to the two separate recombination events that must occur for successful editing: the first integrates the entire plasmid into the genome, while the second excises the vector backbone from the chromosome. Counterselection markers, genetic elements that cause cell death when present, are useful for selecting chromosomal insertions that no longer contain the vector backbone (i.e. double-crossover events rather than single-crossover integrants). Few successful double-crossover mutants were isolated prior to the development of Clostridium-specific counterselection marker systems [83, 84]. Previously described homologous recombination mutants were often single-crossover integrants, which are by their nature unstable [85–90].
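The selection logic of double-crossover editing can be made concrete with a toy model. The sketch below is an illustration under stated assumptions, not a published method: it assumes a counterselection marker carried on the vector backbone that kills any cell retaining it. It also adds one standard detail not spelled out in the text: a second crossover can either keep the cargo (edited) or restore the original allele (wild-type revertant), so surviving colonies still have to be screened. The genotype names and colony counts are hypothetical.

```python
from collections import Counter
from enum import Enum


class Genotype(Enum):
    """Possible chromosomal outcomes after introducing a non-replicative donor plasmid."""
    SINGLE_CROSSOVER = "single crossover"  # whole plasmid integrated; backbone still present
    REVERTANT = "wild-type revertant"      # second crossover removed backbone and cargo
    EDITED = "edited"                      # second crossover removed backbone, kept cargo


def carries_backbone(genotype: Genotype) -> bool:
    # Only single-crossover integrants still contain the vector backbone (and its marker).
    return genotype is Genotype.SINGLE_CROSSOVER


def counterselect(population: list[Genotype]) -> list[Genotype]:
    """Keep only cells that have lost the backbone-borne counterselection marker."""
    return [g for g in population if not carries_backbone(g)]


def editing_efficiency(colonies: list[Genotype]) -> float:
    """Fraction of screened colonies that carry the desired edit."""
    if not colonies:
        return 0.0
    return sum(g is Genotype.EDITED for g in colonies) / len(colonies)


if __name__ == "__main__":
    # Hypothetical colony counts, for illustration only.
    population = (
        [Genotype.SINGLE_CROSSOVER] * 90
        + [Genotype.REVERTANT] * 6
        + [Genotype.EDITED] * 4
    )
    survivors = counterselect(population)
    print(Counter(g.value for g in survivors))
    print(f"efficiency before counterselection: {editing_efficiency(population):.2f}")
    print(f"efficiency after counterselection:  {editing_efficiency(survivors):.2f}")
```

With these made-up numbers, counterselection raises the fraction of edited colonies from 0.04 to 0.40. The remaining revertants are why edits still need to be confirmed, for example by PCR.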


16 Metabolic Engineering and the Synthetic Biology Toolbox for Clostridium

One counterselection method involves first inactivating an easily screenable gene on the chromosome and then using a heterologous copy of that gene as a counterselection marker in the mutant strain. Uracil auxotrophic mutants, formed by disrupting the pyrE, pyrF, or upp genes, require uracil supplementation for growth and are resistant to the antimetabolites 5-fluoroorotic acid (5-FOA) or 5-fluorouracil (5-FU). Double-crossover events in these mutants can easily be isolated by placing a functional copy of the disrupted gene on the backbone of the donor plasmid: cells that have excised the backbone lose this copy and regain resistance to 5-FOA or 5-FU [91–93] (Figure 16.1b). Similarly, in C. perfringens, disruption of the galKT operon produces mutants that lack the enzymes involved in galactose metabolism. GalK catalyzes the production of galactose-1-phosphate (Gal-1-P) from galactose, and GalT catalyzes its consumption. The accumulation of Gal-1-P is believed to inhibit cell growth by causing intracellular stress and inducing stress-responsive genes [94, 95]. When only galK, and not galT, is included on the integration vector and the mutant cells are plated on galactose-supplemented plates, cells that still carry the vector do not grow because Gal-1-P accumulates, whereas mutants that have undergone a double-crossover event can be isolated [96]. Allele-coupled exchange (ACE) couples a counterselection marker gene to a desired double-crossover event. This has been demonstrated in two ways. One method exploits the 5-FOA resistance conferred by a disrupted pyrE or pyrF gene. This method does not require the cells to be auxotrophic for uracil before recombination, nor does it rely on a heterologous version of the gene as a counterselection marker. ACE employs asymmetric homology arms to direct the order in which the crossover events occur.
The longer arm, homologous to a 1200 bp region immediately downstream of pyrE or pyrF, directs the first crossover event, in which the entire plasmid is incorporated into the genome. The second crossover event excises the plasmid backbone and is directed by the shorter arm, which is homologous to a 300 bp internal region of the pyrE or pyrF gene. This second recombination replaces the wild-type pyrE gene with a truncated form, producing a mutant that can be selected on the basis of 5-FOA resistance [93]. Alternatively, a promoterless heterologous pyrE gene or antibiotic marker can be placed in the integration vector between the regions of homology, such that a successful double-crossover event places the silent gene directly downstream of a constitutive promoter [93]. Unlike previous methods, which relied heavily on ClosTron technology to first produce auxotrophic mutants, ACE can itself be used to create pyrE mutants, while the use of an antibiotic marker circumvents the need for a prerequisite mutant strain [93, 97] (Figure 16.1c). ACE has proven applicable across a range of Clostridium species, having been used for gene editing in C. acetobutylicum [98–100], C. sporogenes [78, 93], and C. difficile [9, 93].

Several heterologous genes have been used for counterselection. The cytosine deaminase gene (codA) from E. coli can be used for counterselection because the CodA protein converts 5-fluorocytosine (5-FC), an innocuous pyrimidine analog, to the toxic 5-FU [101]. codA can only be used for counterselection in strains that have a functional upp gene but no native codA gene [98]. However, a bioinformatics survey suggests that several Clostridium species contain codA homologs, restricting the applicability of codA within the genus [102]. To genetically manipulate C. saccharobutylicum, a markerless deletion system was developed using the codBA operon, which comprises codA and codB, the latter encoding a cytosine permease. A suicide vector containing the catP gene for thiamphenicol resistance and the codBA genes carries fused DNA segments flanking the deletion target [103]. This work led to further markerless deletion and insertion techniques for editing both C. acetobutylicum and C. saccharobutylicum [104]. Restriction-less, markerless counterselection has also been used to construct strains that continuously produce n-butanol with glucose as the sole carbon source by deleting the ldhA, ctfAB, adc, ptb, buk, and adhE2 genes, which optimized the conversion of acetyl-CoA to n-butanol. With such a stable n-butanol producer, the Weizmann process can be used to ferment and scale up n-butanol production [1].

Toxin-antitoxin systems are another useful source of counterselection markers. The E. coli mazF gene encodes an mRNA interferase and is organized in an operon together with mazE. Under normal growth conditions, MazE binds to and inhibits MazF. During cellular stress, MazE is degraded, allowing MazF to bind mRNAs and cleave them at 5′-ACA-3′ sequences, thereby arresting cell growth. mazF, coupled with an antibiotic resistance marker flanked by FRT sites, can be used as a counterselection marker in plasmid-based homologous recombination. mazF is placed on the gene disruption plasmid under the control of an inducible lac promoter, so that double-crossover events can be isolated as cells able to grow on lactose-supplemented plates [102] (Figure 16.1d). The use of mazF requires no prior mutation for successful screening, is independent of the availability of Clostridium genetic parts, and has been shown to function across Clostridium species [28, 80, 102].
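To make the MazF recognition rule concrete (cleavage at 5′-ACA-3′ sequences), the hypothetical helper below locates every ACA motif in a transcript. It only reports sequence matches. It makes no attempt to predict whether MazF actually cleaves a given site in vivo, and the function names and example sequence are illustrative assumptions.

```python
def aca_sites(transcript: str) -> list[int]:
    """Return 0-based start positions of every 5'-ACA-3' motif in a transcript.

    The sequence is read 5'->3'. DNA input is accepted and read as the
    corresponding RNA (T -> U). Overlapping motifs (e.g. in 'ACACA') are all
    reported, because every start position is checked.
    """
    seq = transcript.upper().replace("T", "U")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ACA"]


def aca_density(transcript: str) -> float:
    """Motifs per 100 nt, a crude way to compare transcripts of different length."""
    if not transcript:
        return 0.0
    return 100 * len(aca_sites(transcript)) / len(transcript)


if __name__ == "__main__":
    mrna = "AUGACAUUACAGACACAUAA"  # arbitrary example sequence, not a real gene
    print(aca_sites(mrna))
    print(f"{aca_density(mrna):.1f} ACA motifs per 100 nt")
```

For the arbitrary 20-nt example, the motifs start at positions 3, 8, 12, and 14. That gives 20 motifs per 100 nt.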
Flp-FRT recombination has also been used to eliminate the backbone of a donor plasmid following a single-crossover event in C. acetobutylicum. This allows an antibiotic resistance gene to serve as a marker for the crossover event after the donor plasmid has been cured [105].

16.3.4 CRISPR-Based Editing in Clostridium

CRISPR systems have revolutionized gene editing in several non-model organisms, including Clostridium species. Natively found in prokaryotes [106, 107] and a few bacteriophages [108], these systems have been repurposed for a variety of gene manipulation applications in a wide range of organisms [109–111]. In Clostridium and other bacterial species, CRISPR systems, most commonly CRISPR/Cas9, have been used as a counterselection tool based on their ability to introduce a double-strand break (DSB) in a targeted DNA region. Clostridium spp. either lack nonhomologous end-joining (NHEJ) systems or have inefficient ones, so a chromosomal DSB results in cell death [112, 113]. Studies in E. coli have shown that DSBs can enhance homologous recombination in bacteria, with homology-directed repair (HDR) occurring after a break has been induced [114]. However, studies in various Clostridium species suggest that HDR efficiency is too low to select for successful HDR events [115–117]. Counterselection, in the context of plasmid-based double-crossover editing, is achieved by targeting the CRISPR/Cas system to the wild-type sequence to eliminate non-edited members of the population. The use of CRISPR/Cas9 can enable scarless edits and represents a major advancement in Clostridium genome editing.

The Type II CRISPR system native to Streptococcus pyogenes was the first to be exploited for genome engineering because of its minimality [118], and other CRISPR systems have since been discovered and repurposed [119]. The CRISPR/Cas9 system consists of a single effector protein (Cas9) and a single guide RNA (sgRNA, a fusion of the individual crRNA and tracrRNA found in the native system). When the two are co-expressed, Cas9 can bind to and introduce a DSB in a targeted DNA sequence: a 20 bp region immediately adjacent to the protospacer adjacent motif (PAM; NGG for SpyCas9). While CRISPR/Cas9 has been used as a counterselection tool in several Clostridium species (Table 16.1) [105, 117, 120, 136], its use has been hindered by several factors. First, the limited number of characterized genetic parts for Clostridium poses a challenge. A dearth of well-characterized, tightly regulated promoters can lead to simultaneous constitutive expression of both Cas9 and the sgRNA. This results in very few transformed colonies even in the presence of a homologous repair template, because Cas9 activity kills the cells before recombination can occur [105, 117, 120, 136]. One way to address this is a two-plasmid system, in which the donor template and sgRNA are introduced separately from Cas9, allowing time for recombination before counterselection [105, 115, 143]. Not only has this method facilitated the isolation of recombinants at high efficiencies, but it also avoids the transformation of very large plasmids, which reduce transformation efficiency. However, it requires two separate transformation events [121].
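The SpyCas9 targeting rule described above (a 20 nt protospacer lying immediately 5′ of an NGG PAM) can be expressed as a short scan over both strands of a DNA sequence. The sketch below is an illustration, not a guide-design tool. It only enumerates candidate sites and ignores off-target search, guide activity, and the expression issues discussed here. The function names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def spycas9_sites(dna: str, spacer_len: int = 20) -> list[tuple[str, int, str, str]]:
    """Enumerate candidate SpyCas9 target sites (NGG PAM) on both strands.

    Returns (strand, start, protospacer, pam) tuples. `start` is the 0-based
    position of the protospacer on the listed strand read 5'->3'; for the
    '-' strand it is an index into the reverse complement.
    """
    dna = dna.upper()
    sites = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # Zero-width lookahead so overlapping PAMs (e.g. in 'AGGG') are all found.
        for match in re.finditer(r"(?=([ACGT]GG))", seq):
            pam_start = match.start()
            if pam_start >= spacer_len:  # need a full-length protospacer 5' of the PAM
                protospacer = seq[pam_start - spacer_len:pam_start]
                sites.append((strand, pam_start - spacer_len, protospacer, match.group(1)))
    return sites


if __name__ == "__main__":
    target = "ATGCTAGCTAGGATCGATCGTTAGCTAGCATGGCTAGTTAAAGG"  # arbitrary toy sequence
    for site in spycas9_sites(target):
        print(site)
```

Scanning the reverse complement rather than searching for CCN on the top strand keeps the rule identical on both strands. The protospacer is always read 5′ to 3′ on the same strand as its PAM.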
Successful recombinants have been isolated at a rate of up to 100% [121] with commonly observed efficiencies of greater than 50% (Table 16.1). Although inducible promoters have been employed for Cas9 expression [105, 121] their use has not always mitigated cell death as basal expression from leaky promoters can still be lethal. In response, methods for more stringent control of Cas9 expression have been developed. RiboCas employs the use of a theophylline-induced riboswitch for tight control of Cas9 expression. It has facilitated genomic deletions of up to 2.4 kbp across four Clostridium species at 100% efficiency and was used in C. sporogenes to facilitate the insertion of a 2.9 kbp fragment at efficiencies over 90% (Table 16.1) [8]. Anti-CRISPR proteins, proteins which are able to post-translationally inhibit the nuclease activity of CRISPR effector proteins, also reduce unwanted Cas9 activity [151]. The antiCas protein AcrIIA4 has been coupled with a dual-plasmid CRISPR-Cas9 system in C. acetobutylicum to limit undesired activity of the Cas9 protein, resulting in significantly higher transformation rates and 100% gene editing efficiency [122]. Cas9 nickase (Cas9n) systems exploit CRISPR gene editing while circumventing the lethality associated with the co-expression of a guide RNA with Cas9. This method utilizes Cas9n, a Cas9 with one of its nuclease sites mutated, so a nick results in lieu of a DSB. Implementing a single nick into the genome via Cas9n allows homologous recombination without the lethal effects of a Cas9-mediated DSB, thus permitting a mixed population of edited and unedited strains to coexist. CRISPRn has been used to implement gene deletions and

Table 16.1 CRISPR-based gene and gene repression in Clostridium spp.

Species

CRISPR effector

Number of spacers

Desired edit (efficiency, maximum deletion/insertion, HA length)

C. acetobutylicum

SpyCas9

1

Deletion (100%, 66 bp, 500–1000 bp); replacement (100%, 306 bp, 1000 bp); DNM (100%, N/A, 664 bp)

C. autoethanogenum

C. beijerinckii

Notes

References

[120–122]

SpyCas9n

1

Deletion (7–100%, 20 bp, NR)

SpyCas9

1

Deletion (>50%, NR, NR)

1

[105]

SpyCas9

1

Deletion (0–100%, 2379 bp, 400–1000 bp); insertion (75–87%, 1614 bp, 1000 bp); SNM (>99%, N/A, 1000 bp)

[115, 117] 2

[116]

SpyCas9n

1

Deletion (0–100%, 1149 bp, 150–1200 bp)

SpyCas9nApobec1-UGI

1

Base editing (11–100%)

[116, 123, 124] [125] [126]

AsCas12a

1

Deletion (100%, 1021 bp, 500 bp)

C. botulinum

SpyCas9

1

Deletion (100%, 2400 bp, NR)

C. cellulolyticum

SpyCas9n

1

Deletion (100%, 23 bp, 200–1000 bp), insertion (95–100%, 1720 bp, 100–1000 bp) Deletion (100%, 948 bp, 1000 bp)

[130]

1

Deletion (20–100%, 2400 bp, 1000 bp); insertion (80%, >400 bp, NR)

[8, 131–133]

C. chauvoei

SpyCas9 SpyCas9

C. difficile

C. Ijungdahlii

[8] 3

[127–129]

1

Deletion (37.5–100%, 46.7 kbp, 500–1000 bp)

2

Deletion (25–58.3%, 46.7 kbp, 500–1000 bp)

Cas3b)

1

Deletion (100%, 258 bp, 1200 bp)

SpyCas9

1

Deletion (50–100%, 2600 bp, NR); insertion (100%, 8500 bp, N/A)

dSpyCas9-Deaminiase fusion

1

Base editing (12.5–100%)

[138]

FnCas12a

1

Deletion (100%, 2600 bp, 1000–2500 bp)

[139]

AsCas12a

[134] 4

[134]

5, 6

[136, 137]

[135]

(continued)

Table 16.1 (Continued)

Species

C. pasteurianum C. saccharoperbutylacetonicum

C. sporogenes

C. thermocellum

C. tyrobutyricum

CRISPR effector

Number of spacers

Desired edit (efficiency, maximum deletion/insertion, HA length)

SpyCas9

1

Deletion (100%, 2400 bp, 1000 bp)

SpyCas9n

1

Deletion (100%, 1200 bp, 1000 bp)

[141]

Cas3b)

1

Deletion (100%, 762 bp, NR)

[140]

SpyCas9

1

Deletion (>75%, NR, 1000 bp); insertion (NR, 2560 bp, 1000 bp)

Cas3b)

1

SNP (20%, N/A, NR); deletion (100%, NR, 760 bp); insertion (35–60%, 5000 bp, 950 bp)

[145]

SpyCas9

1

Deletion (100%, 2400 bp, NR); insertion (>90%, 2900 bp, NR)

[8]

Deletion (0.21–91%, NR, 50–1000 bp)

8, 9, 10

[146]

Cas3b)

1

Deletion (14–70%, NR, 50–1000 bp)

9, 10

[146]

SpyCas9

1

Deletion (25%, 816 bp, 500 bp)

[147]

AsCas12a

1

Deletion (12.5%, 816 bp, 500 bp)

[147]

Cas3b)

1

Deletion (100%, NR, 300–1000 bp); replacement (100%, NR, 500 bp)

11

[148]

4, 5

[148]

GeoCas9n

Notes

References

[8, 140]

7

[142–144]

Cas3b)

2

Deletion (100%, NR, 300 bp)

Species

CRISPR effector

Target strand (level of repression)

Method of detection

References

C. acetobutylicum

dSpyCas9

Template (20%); nontemplate (45–90%)

Fluorescence of targeted reporter AFP; RT-qPCR, GC analysis of metabolites

[116, 120]

C. beijerinckii

dSpyCas9

Target strand not reported (65–95%)

RT-qPCR, GC analysis of metabolites; enzyme activity assay, measure protein concentration

[116, 149]

C. cellulovorans

dSpyCas9

Target strand not reported (95%)

C. ljundahlii

dFnCas12a

Template (88–99%)

RT-qPCR, GC analysis of metabolites

[150]

RT-qPCR, GC analysis of metabolites

[139]

1) Lower efficiencies with ptb promoter vs. thl. 2) HA of 500 bp or less resulted in less than 50% efficiency. 3) Short homology arms did not significantly affect efficiency. Attempts to insert fragments of 3000 and 6000 bp were unsuccessful using HA of 1000 bp. 4) Duplexed CRISPR array used to target multiple crRNAs to a single gene for increased editing efficiency AND to target multiple genes simultaneously. 5) Subculturing required for maximum efficiency. 6) 8.5 kbp inserted by coupling CRISPR/Cas9 system with phage serine integrase system. 7) Also used for curing of 136 kbp megaplasmid [144]. 8) Attempts to utilize wt GeoCas9 yielded no transformed colonies. 9) Spacer of approximately 30 bp optimal. PAM Site = TCR where R is adenine or guanine. 10) Coupling CRISPR system with heterogeneous recombineering machinery greatly increased recombination efficiency. 11) Low editing efficiencies observed with HA of 50 bp even when recombineering machinery is expressed. All mentions of Cas9 refer to Streptococcus pyogenes-derived Cas9. Reported editing efficiencies are the fraction of successful mutants of total colonies screened. Abbreviations: DNM, dinucleotide modification; NR, not reported; SNM, single nucleotide modification. Cas3 is the effector protein in the native C. pasteurianum type I-B CRISPR system. a) Targeted region downstream of Ccel_3198 gene. b) Endogenous system.

16 Metabolic Engineering and the Synthetic Biology Toolbox for Clostridium

insertions in C. acetobutylicum [113], C. beijerinckii [116], C. cellulolyticum [128], and C. pasteurianum [140] with up to 100% efficiency. Although SpyCas9 is the most common effector, its use on Clostridium genomes (∼30% GC content) has been limited by the relative scarcity of its G-rich PAM site and by the difficulty of targeting multiple genes simultaneously. To address these challenges, Type V-A systems have been used in Clostridium species. Like Type II CRISPR systems, Type V-A systems require only a single CRISPR effector protein, Cas12a, co-expressed with a CRISPR RNA (crRNA) to introduce a DSB. Cas12a proteins recognize T-rich PAM sites and can process CRISPR arrays into individual crRNAs themselves, which makes it straightforward to target multiple sites simultaneously (i.e. multiplexing) [139, 152]. Various Cas12a proteins have been employed for counterselection in Clostridium species. AsCas12a (from Acidaminococcus sp.), which recognizes a TTTV PAM site, has been the most widely used Cas12a protein in Clostridium. AsCas12a-mediated editing has been demonstrated in C. beijerinckii, C. difficile, and C. tyrobutyricum [126, 134, 147]. In C. difficile, a CRISPR array was used to direct AsCas12a to multiple regions simultaneously. Multiplexed CRISPR activity accommodates both the concurrent editing of multiple genes and the targeting of a single gene at multiple regions for increased editing efficiency [134]. A second Cas12a protein, FnCas12a, native to Francisella novicida, has been used for counterselection in C. ljungdahlii with 100% observed efficiency [139] (Table 16.1). In that study, FnCas12a was selected because it was the least toxic to cells among four Cas12a proteins tested, suggesting that the toxicity of heterologous Cas effector proteins may be species-specific. Type I-B CRISPR systems are endogenous to several Clostridium species and can be programmed via expression of synthetic CRISPR arrays for gene targeting in their native species.
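The PAM-availability argument above can be made concrete with a short sketch that counts candidate PAM sites for SpyCas9 (NGG) and AsCas12a (TTTV) and estimates how often each is expected in a genome of a given GC content. It is an illustration under simplifying assumptions, not a guide-design tool: bases are treated as independent with A = T and G = C, only the IUPAC codes needed here are supported, and PAM orientation relative to the protospacer is ignored.

```python
import re

# Subset of IUPAC nucleotide codes needed for the PAMs discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "V": "[ACG]"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def count_pam_sites(seq: str, pam: str) -> int:
    """Count (possibly overlapping) PAM matches on both strands of seq.

    seq must contain only A/C/G/T; pam is written 5'->3' in IUPAC codes.
    """
    seq = seq.upper()
    # A zero-width lookahead lets overlapping matches (e.g. in "GGG") count.
    pattern = re.compile("(?=" + "".join(IUPAC[c] for c in pam) + ")")
    return len(pattern.findall(seq)) + len(pattern.findall(reverse_complement(seq)))


def expected_pam_density(pam: str, gc: float) -> float:
    """Expected PAM matches per position per strand in a random genome.

    Simplifying assumption: independent bases, P(G)=P(C)=gc/2, P(A)=P(T)=(1-gc)/2.
    """
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    density = 1.0
    for code in pam:
        allowed = re.findall("[ACGT]", IUPAC[code])
        density *= sum(p[b] for b in allowed)
    return density


if __name__ == "__main__":
    for gc in (0.3, 0.5):
        ngg = expected_pam_density("NGG", gc)
        tttv = expected_pam_density("TTTV", gc)
        print(f"GC={gc:.0%}: NGG {ngg:.4f}, TTTV {tttv:.4f} per position per strand")
```

Under these assumptions a 30% GC genome is expected to offer TTTV sites somewhat more often than NGG sites (about 0.028 versus 0.023 per position per strand), while at 50% GC the order reverses. Because "R" is included, the same counter also accepts a PAM such as TCR.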
The repurposing of endogenous CRISPR systems for gene editing circumvents the need to transform large plasmid constructs, as only a synthetic CRISPR array and donor DNA are required for successful activity. Using native systems has been especially beneficial in species whose primary route of plasmid uptake is conjugation, such as C. difficile [153] and C. tyrobutyricum, where low conjugation efficiencies have impeded progress in CRISPR-based editing. The native CRISPR system of C. tyrobutyricum, which recognizes a TCR PAM site, has been used for gene editing with 100% efficiency. The utility of this method is exemplified by the replacement of cat1, which previously could not be deleted, with adhE; this largely eliminated butyrate production in C. tyrobutyricum, a known hyper-butyrate producer, and converted it into a hyper-butanol-producing strain [148]. SpyCas9 and AsCas12a systems have since been employed successfully in C. tyrobutyricum as conjugation efficiencies have improved [147], but with lower success rates. Endogenous CRISPR systems have also been employed in C. difficile [153], C. pasteurianum [140], and C. thermocellum [146]. One major challenge in using an endogenous CRISPR system is identifying the corresponding PAM sequence, which may require bioinformatic approaches combined with wet-lab experiments to identify functional motifs. Thermophilic Clostridium species, of general interest for their cellulose-degrading capabilities, cannot use the well-characterized CRISPR systems that have been
successfully employed in mesophilic strains. In C. thermocellum, both the native Type I-B CRISPR system and a Cas9 variant from Geobacillus stearothermophilus (GeoCas9) have been used [146], with editing efficiencies of up to 70% and 94%, respectively. In general, longer homology regions on the donor plasmid have been shown to increase efficiency. In C. cellulolyticum, donor templates with homology arms longer than 0.2 kbp gave efficiencies above 95%, compared with only 55% for shorter arms, in a CRISPRn system [127]. A similar study using Cas9 in C. acetobutylicum showed increased efficiency with 1 kbp homology arms compared with 500 bp arms [120]. In C. tyrobutyricum, homology arms of 300 bp were used with the endogenous CRISPR system, trading a lower recombination rate for a smaller plasmid and higher transformation efficiency. In this case, maximal efficiency was achieved via subculturing, which has also been shown to enrich the edited population across all CRISPR systems [136, 139, 146, 148].
CRISPR systems have been further repurposed for the regulation of gene expression. In catalytically dead effector proteins, the nuclease active sites are mutated so that the proteins retain their ability to bind DNA but cannot cleave it. In Clostridium species, these proteins, most commonly dSpyCas9, have been used for targeted gene repression, in which transcription of a gene is sterically hindered by the bound CRISPR effector protein. CRISPR interference (CRISPRi) enables simple, tunable gene knockdown at the transcriptional level through co-expression of the catalytically dead effector protein and sgRNAs designed with the aid of bioinformatics tools. Because the genome is not permanently changed, the knockdown is reversible [154].
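Guide selection for dSpyCas9 CRISPRi can be sketched in a few lines. The sketch assumes the rule commonly reported for bacterial CRISPRi that a guide whose spacer base-pairs with the nontemplate (coding) strand blocks transcript elongation more effectively, which is consistent with the higher repression reported for nontemplate targeting in C. acetobutylicum (Table 16.1). With the coding sequence written 5′→3′, such a protospacer has its NGG PAM on the template strand, which appears as CCN on the coding strand; the 20-nt spacer is then the reverse complement of the 20 bases following CCN. The function name is hypothetical, and a real design tool would also score off-targets and distance from the promoter.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def nontemplate_spacers(coding: str, spacer_len: int = 20):
    """Hypothetical helper: list (position, spacer) pairs for dSpyCas9 guides
    that base-pair with the nontemplate (coding) strand.

    `coding` is the nontemplate strand written 5'->3'. A 'CCN' starting at
    position i is an NGG PAM on the template strand; the protospacer is the
    template-strand sequence just 5' of that PAM, i.e. the reverse complement
    of coding[i+3 : i+3+spacer_len]. Results are ordered by position, so
    guides closest to the 5' end of the input come first.
    """
    coding = coding.upper()
    hits = []
    for i in range(len(coding) - 3 - spacer_len + 1):
        if coding[i:i + 2] == "CC":
            target = coding[i + 3:i + 3 + spacer_len]
            hits.append((i, reverse_complement(target)))
    return hits


if __name__ == "__main__":
    # Toy coding-strand fragment (made up for illustration).
    for pos, spacer in nontemplate_spacers("ATGCCTGATTACAGATTACAGATTACTTAA"):
        print(pos, spacer)
```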
Additionally, CRISPRi activity can be modulated not only by controlling expression of the effector protein [155] but also through the position of the target site relative to the promoter and gene start site [156], allowing tight control of gene expression. dSpyCas9 has enabled gene knockdowns in C. acetobutylicum [120], C. beijerinckii [116, 149], C. cellulovorans [150], and C. difficile [8, 131, 132], and has been used to silence both native [116, 149, 150] and heterologous genes [120]. dFnCas12a has been used to regulate gene expression in C. ljungdahlii [138]. Gene repression of up to 97% was achieved with dSpyCas9 and up to 99% with dFnCas12a, although the effectiveness of CRISPRi varies among species with similar configurations [116], as well as among genes targeted within a single species (Table 16.1). The tunability of dCas9 has not yet been explored in Clostridium as fully as it has in other bacterial systems [155, 156]. Control of effector protein expression is important because continuous expression is needed to suppress target genes. One study showed that activity of the knockdown target unintentionally increased over the course of fermentation, which was attributed to the variable strength of the commonly used thiolase promoter [149]. Gene expression knockdowns have also been accomplished at the translational level in Clostridium through antisense RNA (asRNA) technology, in which an RNA complementary to the target mRNA is expressed so that it hybridizes to the transcript and interferes with its translation. This method has been used to investigate the function of genes in various Clostridium
species [128, 157–160], including essential genes for which a genetic knockout is unfeasible. asRNA technology has also been used to manipulate gene expression affecting solvent titers [161, 162]. asRNAs are tunable and reversible, and have been effective in silencing, achieving repression levels as high as 90% for certain genes. However, despite their length (>100 nt), asRNAs have been shown to be promiscuous, binding especially to transcripts with high homology to the target sequence [157]. In addition, asRNA technology requires large constructs for efficient gene repression, whereas CRISPRi achieves specific repression with only a 20-nt guide sequence in the sgRNA. Several other applications of catalytically dead CRISPR effector proteins build on their ability to bind selectively to specified DNA sequences. Base editors, deaminases that on their own indiscriminately install cytosine-to-thymine or adenine-to-guanine point mutations in DNA, have been tethered to dCas9 (as well as Cas9n) to introduce targeted point mutations in several organisms [163–168]. Fusing the base editor to dCas9 limits its activity to the region targeted by the dCas9 protein. In Clostridium, CRISPR-mediated base editing has been performed in C. ljungdahlii and C. beijerinckii with editing efficiencies between 11% and 100% [138, 169]. While CRISPR-based gene activation and imaging tools [170–173] have been developed in other bacterial species, these tools have yet to be applied in Clostridium species. Overall, the application of CRISPR tools in Clostridium is still limited by low plasmid transformation efficiencies and a lack of characterized recombineering and nonhomologous end joining (NHEJ) tools. While the use of native CRISPR systems circumvents the need to express heterologous Cas effector proteins, the applicability of these systems may be limited to strains with functional CRISPR/Cas machinery and by unknown PAM sequences.
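Repression levels such as those quoted above for asRNA and CRISPRi are typically derived from RT-qPCR, one of the detection methods listed in Table 16.1. The sketch below uses the widely used 2^−ΔΔCt (Livak) calculation, assuming roughly 100% amplification efficiency for both the target and a reference gene; the Ct values in the example are made up for illustration.

```python
def relative_expression(ct_target_kd: float, ct_ref_kd: float,
                        ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """2^-ddCt fold change of a target gene in a knockdown strain vs. a control.

    Each target Ct is normalized to a reference gene measured in the same
    sample; ~100% amplification efficiency is assumed for both genes.
    """
    d_ct_kd = ct_target_kd - ct_ref_kd          # normalized knockdown sample
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalized control sample
    return 2.0 ** -(d_ct_kd - d_ct_ctrl)


def percent_repression(fold_change: float) -> float:
    """Express a relative expression level as % repression."""
    return 100.0 * (1.0 - fold_change)


if __name__ == "__main__":
    # Made-up Ct values: after normalization the target amplifies 3.3 cycles
    # later in the knockdown strain, i.e. roughly a 10-fold decrease.
    fc = relative_expression(25.3, 18.0, 22.0, 18.0)
    print(f"relative expression {fc:.3f}, repression {percent_repression(fc):.1f}%")
```

With these illustrative values the calculation gives roughly 90% repression; a ΔΔCt of 1 cycle corresponds to only 50% repression, which is why small Ct shifts matter.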
Recombineering through λ Red technology has facilitated genome engineering in E. coli by using linear DNA repair templates, a process that skips the cloning steps required in plasmid-based homologous recombination methods. Coupled with CRISPR, this technology enables multiplexed ssDNA recombineering with efficiencies that allow large libraries (>10⁵ members) to be constructed in parallel [174]. However, the lack of ssDNA recombineering machinery that functions in Clostridium hinders the development of comparable Clostridium-based technologies. Although a RecT protein from C. perfringens demonstrated recombineering activity in C. acetobutylicum, the results were not comparable to routine recombineering in E. coli [175]. NHEJ tools are similarly lacking. In E. coli, expression of the Ku and LigD genes from Mycobacterium tuberculosis enabled NHEJ following a Cas9-induced DSB [176]. Although NHEJ-related genes (ku, DNA ligase, and ligD) are found in the C. cellulolyticum genome, they are not highly expressed, and NHEJ events have not been observed in this species after a Cas9 DSB [127]. Heterologous expression of such genes may enable NHEJ in Clostridium.

16.4 Genetic Parts in Clostridium

Rational engineering of solventogenic clostridia, and further advancement of metabolic engineering efforts in Clostridium, relies on access to a “toolbox” of well-characterized biological parts, including promoters, ribosomal binding sites (RBS), origins of replication (ORI), and terminators. To enhance the productivity and yield of desired products, the “toolbox” should allow individual genes or groups of genes to be assembled under well-regulated control [177].

16.4.1 Promoters

Promoters play a critical role in controlling gene expression. Precise control of gene expression allows the expression of metabolic pathway genes to be balanced and compatible genetic circuits to be constructed. Therefore, expanding the availability of well-characterized promoters is essential for the proper construction of genetic circuits in Clostridium. Several Clostridium regulatory promoters and transcription factors have been identified through systems biology and bioinformatic analysis. Transcriptional data on differential carbohydrate utilization in the presence of glucose in C. acetobutylicum [178] were used to develop new sugar-inducible promoter systems for carbon sources of interest [179]. However, sugar-based inducers are simultaneously utilized as carbon sources by Clostridium hosts, so they are depleted over time, and their use is complicated by carbon catabolite repression [180]. As in E. coli, promoters responsive to lactose and to its non-metabolizable analog isopropyl β-D-1-thiogalactopyranoside (IPTG) are the most widely used inducible systems in Clostridium (Table 16.2). The Pfac system, engineered with the E. coli LacI repressor and lacO operator, responds to 1 mM IPTG in C. acetobutylicum, C. botulinum, C. sporogenes, and C. difficile [51, 188]. Tetracycline/anhydrotetracycline (aTc)-inducible systems regulated by TetR from S. aureus [158], C. autoethanogenum [186], and E. coli [181] are also commonly used in Clostridium, but they can cause inducer toxicity at high concentrations [180].
Use of Clostridium promoters is complicated by the limited supply of native promoters, which originate from a few strains and are not always transferable across hosts within the same genus. Commonly used constitutive promoters such as ptb (phosphotransbutyrylase) and thl (thiolase) from C. acetobutylicum and C. pasteurianum have shown varied activity in different strains and growth stages [189, 190]. Furthermore, high basal expression is often observed with clostridial inducible promoters, such as the ARAi system [80], lactose-inducible promoters using the transcriptional regulator BgaR [182], the xylose-inducible promoter–repressor system [179, 187, 191], and many others (Table 16.2), when they are used in organisms other than the source strain, which narrows their dynamic range. Biosensor performance can be further fine-tuned to respond with the appropriate sensitivity and signal output by altering the number, location, and sequence of transcription factor (TF) binding sites. Adding two tetO1 operator sequences surrounding the −35 and −10 boxes of a native constitutive promoter created the tetracycline/aTc-inducible promoter Pcm-2tetO1, which showed reduced basal expression and increased TetR binding. This resulted in a 313-fold induction compared to a 41-fold induction performed

Table 16.2 Inducible promoters in Clostridium.

Inducer

Transcription factor Source

Promoter Source

Pptk C. acetobutylicum L-Arabinose

aTc

Fructose

Lactose

Laminaribiose

AraR C. acetobutylicum

TetR E. coli

FruR C. acetobutylicum

BgaR C. perfringens

Species tested

C. cellulolyticum

Dynamic range

Reporter gene

>800-fold

gusA (MUG)

(Imaging)

PpFbFPm

P1341–2 C. acetobutylicum

C. acetobutylicum

60-fold

C. beijerinckii

32-fold

P1343 C. acetobutylicum

C. acetobutylicum

21-fold

C. beijerinckii

27-fold

Pcm-2tetO1 Pcm /tetO1 C. acetobutylicum/E. coli

C. acetobutylicum

313-fold

C. ljungdahlii

28-fold

Pcm-tetO2/1

C. acetobutylicum

P0231 C. acetobutylicum P0234 C. acetobutylicum PbgaL C. perfringens

C. acetobutylicum C. acetobutylicum C. beijerinckii

mCherryOpt

gusA (X-Gluc)

References

[80]

[179]

[181]

120-fold 2-fold

mCherryOpt

50 g l−1 ) of citrate or α-ketoglutarate (αKG) in growth medium under nitrogen or thiamine limitation, respectively [14]. These discoveries led to the engineering of Y. lipolytica to produce other short-chain organic acids [16] and also laid the foundation for overproducing lipids and other products derived from cytosolic acetyl-CoA [17], because citrate can be cleaved in the cytosol to produce acetyl-CoA. In the rest of this chapter, we summarize the key genetic engineering tools that have been developed for Y. lipolytica and systematically discuss how these tools, together with strain screening and process optimization, have been used to produce small-molecule products. We organize the products into short-chain organic acids, TAGs, and other acetyl-CoA-derived products.

19.2 Genetic Tools for Engineering Y. lipolytica

Most genetic engineering of Y. lipolytica has been done by manipulating its genome. It is widely accepted that plasmids that replicate in Y. lipolytica are still unstable and lead to inferior performance compared with genome engineering [1]. There are two mechanisms for editing the Y. lipolytica genome: homologous recombination (HR) and nonhomologous end joining (NHEJ). In general, HR allows deletion of a specific part of the genome (e.g. knockout of a gene) or integration of an expression cassette into a specific locus to overexpress a gene, whereas NHEJ allows random integration of an expression cassette [16]. An expression cassette refers to a promoter followed by a gene and a terminator. Wild-type Y. lipolytica strains have high NHEJ activity [4]. When a linear double-stranded DNA is introduced into Y. lipolytica by the lithium acetate method, the most commonly used Y. lipolytica transformation method [11], it can be integrated randomly into a chromosomal locus via NHEJ. Cells whose genome has incorporated the foreign DNA can be selected with the aid of an auxotrophic marker, an antibiotic resistance marker, or a programmable nuclease/nickase such as Cas9 used with the CRISPR system. Auxotrophic strains are commercially available from ATCC or other sources (e.g. Y. lipolytica Po1f is available from ATCC). The most frequently used markers are ura3 and leu2; a book chapter [12] systematically summarizes these strains and markers. Hygromycin B is the most commonly used antibiotic, but its use usually leads to lower expression of the target gene than an auxotrophic marker does [18], possibly because of the stress imposed by the antibiotic. CRISPR-based genome editing is not discussed in this chapter; readers can refer to a recent review [19].
A double-stranded DNA can also be integrated into a predefined locus of Y. lipolytica via HR if two long homology arms (1000 nt) are used [16]. In Y. lipolytica, HR-mediated integration occurs at a much lower frequency than NHEJ-mediated integration, so long arms are used to increase the efficiency of HR. The NHEJ mechanism in Y. lipolytica was successfully abolished by deleting ku70 (encoding a key protein that binds broken DNA ends during NHEJ) to increase the success rate of HR-mediated integration [20]; however, the resulting strain often became abnormal during subsequent transformation and growth, suggesting that NHEJ is critical for stabilizing the chromosomes [18]. Each integration event can introduce into the genome one or more expression cassettes carried on a single double-stranded DNA. The transformant can be transformed again using a similar method to express more genes and/or to increase the expression level of genes that have already been overexpressed [21]. To establish a long metabolic pathway, one needs to express multiple genes, each of which may encode an enzyme catalyzing one step (reaction) of the pathway. A certain step of a pathway could be rate-limiting, and increasing the expression level of the corresponding gene(s) could provide more catalyst to increase the reaction rate. Using multiple rounds of integration requires multiple selection markers and/or marker recycling [21].
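The targeted-integration design described above, an expression cassette flanked by two long (about 1000 nt) homology arms, can be sketched as simple string assembly. This is a minimal illustration under stated assumptions: the chromosome is a plain string, the insertion point is a 0-based coordinate, the cassette is a placeholder, and `build_hr_donor` is a hypothetical helper. Real designs also handle selection markers, cloning junctions, and sequence verification.

```python
import random


def build_hr_donor(chromosome: str, insert_at: int, cassette: str,
                   arm_len: int = 1000, delete_len: int = 0) -> str:
    """Hypothetical helper: assemble a linear HR donor (left arm + cassette + right arm).

    The left arm is the arm_len bases ending at insert_at (0-based); the right
    arm starts delete_len bases later, so delete_len > 0 replaces that region
    (e.g. a gene to be knocked out) with the cassette.
    """
    right_start = insert_at + delete_len
    if insert_at - arm_len < 0 or right_start + arm_len > len(chromosome):
        raise ValueError("homology arms would run past the end of the sequence")
    left_arm = chromosome[insert_at - arm_len:insert_at]
    right_arm = chromosome[right_start:right_start + arm_len]
    return left_arm + cassette + right_arm


if __name__ == "__main__":
    random.seed(0)
    # Toy 5 kb "chromosome"; "NNNN" stands in for a promoter-gene-terminator cassette.
    chromosome = "".join(random.choice("ACGT") for _ in range(5000))
    donor = build_hr_donor(chromosome, insert_at=2500, cassette="NNNN")
    print(len(donor))  # 1000 + 4 + 1000 = 2004
```

Shortening `arm_len` mirrors the trade-off discussed for Clostridium donors: shorter arms give a smaller construct but, in the cases reported above, lower recombination efficiency.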

19 Harness Yarrowia lipolytica to Make Small Molecule Products

Besides increasing gene copy number, one can also increase expression efficiency to raise the expression level of a given gene. Expression efficiency is determined by many factors, including the integration site and the promoter. Different parts of a chromosome differ in their accessibility to RNA polymerase because of the three-dimensional structure of the chromosomes. If targeted integration is used, a panel of integration sites can be considered [22]; if random integration is used, one often needs to screen a number of transformants to isolate a colony with a high gene expression level [18]. A promoter controls the number of RNA copies transcribed from a coding gene within a given period. Most promoters used in engineering Y. lipolytica are endogenous promoters that control expression of important enzymes or structural proteins/RNAs. A list of frequently used promoters can be found in a book chapter [12]. Additional genetic elements, including upstream activation sequences [23] and introns [17], have been used to improve promoter efficiency when placed next to a promoter. Genes from different organisms usually have different sequences. Even when they encode enzymes that catalyze the same chemical transformation, these enzymes exhibit different catalytic performance (e.g. activity, selectivity, stability) because of differences in their three-dimensional structures and active functional groups, which arise largely from sequence variation. For a given rate-limiting reaction, genes from different sources are frequently tested [21, 24], which may increase not only the quantity but also the specific activity of the catalyst. It is worth noting that a higher rate of one reaction in a pathway does not necessarily lead to a higher rate of product formation [25], because of accumulation of reaction intermediates and/or depletion of substrates that are important to other essential cellular processes.
In such cases, one needs to tune the rate of this reaction to find its optimal value, which may be achieved by using the principles described above to change the quantity and/or specific activity of the catalyst.

19.3 Production of Short-Chain Organic Acids

19.3.1 Production of Citrate

Under nitrogen limitation with glucose as the sole carbon source, glycolysis in a typical wild-type Y. lipolytica strain remains active, and pyruvate can still be transported into mitochondria and oxidatively decarboxylated there to produce acetyl-CoA (Figure 19.2). When one acetyl-CoA is condensed with one borrowed oxaloacetate, one citrate is formed, which can be isomerized into isocitrate. Under nitrogen limitation, many intracellular nitrogen-containing compounds are degraded to release ammonium ions for basic cellular maintenance [26]. For example, one AMP can be converted into one inosine 5′-monophosphate and one ammonium ion by AMP deaminase (Figure 19.2). AMP is an allosteric activator of isocitrate dehydrogenase (Idh), which converts isocitrate into α-ketoglutarate and CO2 to drive

Figure 19.2 Metabolic pathways of Y. lipolytica related to citrate production under nitrogen limitation. Idh, isocitrate dehydrogenase; AMP, adenosine monophosphate. Idh is allosterically activated by AMP. Under nitrogen limitation, AMP is degraded to release ammonium ions as part of the stress-response nitrogen scavenging mechanism. Reduced Idh activity leads to accumulation and eventually export of citrate from mitochondria to the medium. Cytosolic malate is the countersubstrate of the mitochondrial citrate transporter. Cytosolic malate can be produced by pyruvate carboxylation followed by oxaloacetate reduction. The malate imported into mitochondria can be used to synthesize citrate, sustaining the translocation of malate from the cytosol to mitochondria. Pyruvate can enter mitochondria through a pyruvate–proton symporter. The green numbers indicate the desired flux distribution at the branch point.

the oxidative TCA pathway in mitochondria (Figure 19.2). As the mitochondrial AMP concentration declines, Idh activity decreases, leading to accumulation of isocitrate in mitochondria. The mitochondrial citrate concentration also increases during this process because isocitrate and citrate are interconverted and coexist in equilibrium. Citrate can be exported to the cytosol through a transporter that uses cytosolic malate as a countersubstrate [26]. Mitochondrial citrate accumulation thus leads to export of citrate from mitochondria to the cytosol (Figure 19.2). When one citrate is exported, one cytosolic malate is imported into mitochondria. This imported malate can be oxidized into one oxaloacetate, returning the borrowed oxaloacetate. The cytosolic malate pool can be replenished from pyruvate and CO2 through pyruvate carboxylation and oxaloacetate reduction (Figure 19.2). A ¹³C metabolic flux analysis [27]
was done for a citrate-overproducing strain, supporting the aforementioned pathway. The chemical equation for producing citrate from glucose is

$$\text{Glucose} \to\to\to \text{Citrate} + 3\,\text{NADH} + \text{ATP} \tag{19.1}$$

Equation 19.1 sets the theoretical yield at 1.07 g g−1. The NADH produced can be oxidized with molecular oxygen to drive ATP production through the electron transport chain and ATP synthase. Through this pathway, Y. lipolytica can derive the energy needed for its maintenance under nitrogen limitation. Y. lipolytica W29, a well-known wild-type strain, can produce 49 g l−1 citrate from 60 g l−1 glucose in a batch fermentation under nitrogen limitation [26], achieving 76% of the theoretical yield. The main byproduct was isocitrate (the citrate-to-isocitrate ratio was usually 85 : 15). A small quantity of lipids was also produced. Process parameters including pH, temperature, and medium composition are important for achieving high citrate yield, titer, and productivity; this topic has been thoroughly reviewed in a book chapter [26]. Citrate was also produced from alkanes or ethanol at industrial scale in the Soviet Union/Russia without modern, tailored genetic manipulation techniques [14].

19.3.2 Production of Pyruvate and α-Ketoglutarate

Some Y. lipolytica strains are thiamine auxotrophs [14]. Under thiamine limitation, the cells have a fixed or declining amount of thiamine diphosphate (Thpp) over time. Thpp, which is derived from thiamine, is a cofactor of the E1 components of both the pyruvate dehydrogenase (Pdh) complex and the α-ketoglutarate (αKG) dehydrogenase (Kgdh) complex [28]. Once thiamine becomes limiting, the activity of these complexes declines, leading to accumulation of their substrates, pyruvate and αKG (Figure 19.3). There are reports of producing either of them as the main product, depending on the carbon source and the strain's genotype. Among carbon substrates that enter central metabolism via glycolysis, glycerol rather than glucose was frequently chosen in studies aimed at overproducing pyruvate or αKG [14, 29], possibly because of its lower cost. Under this condition, pyruvate is a precursor of αKG (Figure 19.3). If a strain's Pdh activity drops below a threshold, pyruvate accumulates and becomes the major product. The overall chemical equation for producing pyruvate from glycerol is

$$\text{Glycerol} \to\to\to \text{Pyruvate} + \text{FADH}_2 + \text{NADH} + \text{ATP} \tag{19.2}$$

Equation 19.2 sets the theoretical yield at 0.96 g g−1. In a pilot-scale bioreactor [14], 61 g l−1 pyruvate was produced from 86 g l−1 glycerol, achieving 74% of the theoretical yield (Figure 19.3). The ratio of pyruvate to αKG in this experiment was not reported, but based on a materials balance we can conservatively estimate the mass ratio of pyruvate to αKG to be at least 77 : 23. Some wild-type Y. lipolytica strains still had sufficient Pdh activity under thiamine limitation, allowing conversion of pyruvate into acetyl-CoA for the synthesis of αKG (Figure 19.3). For example, screening of 500 wild-type strains led to the isolation of Y. lipolytica WSH-Z06, which produced 39 g l−1 α-ketoglutarate

Figure 19.3 Metabolic pathways of Y. lipolytica related to pyruvate and α-ketoglutarate production from glycerol under thiamine limitation. Thpp, thiamine diphosphate; Pdh, pyruvate dehydrogenase (complex); Kgdh, α-ketoglutarate dehydrogenase (complex); Idh, isocitrate dehydrogenase. Thpp is a cofactor of an essential component of the Pdh and Kgdh complexes. Under thiamine limitation, Thpp availability is low, reducing Pdh and Kgdh activity. As a result, pyruvate and α-ketoglutarate accumulate in mitochondria. A high mitochondrial pyruvate concentration stops the import of pyruvate from the cytosol into mitochondria, resulting in pyruvate excretion. α-Ketoglutarate uses a mitochondrial transporter similar to that of citrate. Whether pyruvate or α-ketoglutarate is the main product depends on the strain and the carbon source. The nitrogen source should not be limiting, to ensure sufficient Idh activity. Pyruvate can enter mitochondria through a pyruvate–proton symporter. The green numbers indicate the desired flux distribution at the branch point.

and 17 g l−1 pyruvate from 100 g l−1 glycerol [29]. The overall chemical equation for producing α-ketoglutarate from glycerol is

$$2\,\text{Glycerol} \to\to\to \alpha\text{KG} + \text{CO}_2 + 2\,\text{FADH}_2 + 4\,\text{NADH} + \text{ATP} \tag{19.3}$$

Equation 19.3 sets the theoretical yield at 0.79 g g−1. Pdh activity in this strain was experimentally confirmed [24], suggesting that the biosynthetic pathway of αKG is similar to that of citrate (Figure 19.3). Supply of the C4 unit (oxaloacetate) for αKG synthesis was found to be rate-limiting, because overexpressing pyruvate carboxylase led to the production of 49 g l−1 α-ketoglutarate and 6 g l−1 pyruvate from 100 g l−1 glycerol [24], achieving 61% of the theoretical yield. This experiment [24] used a strong hybrid promoter (Php4d) consisting of the core sequence of a promoter involved in leucine biosynthesis (core of the leu2 promoter, PLEU2) and four upstream activation sequences (UAS1B). The expression cassette was integrated into the genome randomly via the NHEJ mechanism. Pyruvate carboxylase genes from Saccharomyces cerevisiae and Rhizopus oryzae were tested, and the gene from R. oryzae gave better results.
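The theoretical yields quoted for Equations 19.1–19.3, the percentages of theoretical yield reported for the strains above, and the conservative pyruvate-to-αKG mass ratio can all be reproduced from molar masses. The sketch below assumes standard molar masses of the neutral (protonated) compounds, which may differ slightly from the values the cited studies used. The ratio bound assumes that all glycerol not accounted for by pyruvate went to αKG at its theoretical yield, which maximizes αKG and therefore gives a lower bound on the pyruvate share.

```python
# Molar masses (g/mol) of the neutral compounds; standard values (assumption).
MW = {
    "glucose": 180.16,
    "glycerol": 92.09,
    "citrate": 192.12,    # citric acid
    "pyruvate": 88.06,    # pyruvic acid
    "aKG": 146.11,        # 2-oxoglutaric acid
}


def theoretical_yield(substrate: str, n_substrate: float,
                      product: str, n_product: float = 1.0) -> float:
    """Mass yield (g product per g substrate) for n_substrate -> n_product."""
    return (n_product * MW[product]) / (n_substrate * MW[substrate])


def percent_of_theoretical(titer: float, consumed: float, y_max: float) -> float:
    """Observed g/g yield as a percentage of the theoretical maximum."""
    return 100.0 * (titer / consumed) / y_max


Y_CIT = theoretical_yield("glucose", 1, "citrate")    # Eq. 19.1
Y_PYR = theoretical_yield("glycerol", 1, "pyruvate")  # Eq. 19.2
Y_AKG = theoretical_yield("glycerol", 2, "aKG")       # Eq. 19.3

if __name__ == "__main__":
    print(f"Y_max: citrate {Y_CIT:.2f}, pyruvate {Y_PYR:.2f}, aKG {Y_AKG:.2f} g/g")
    # Titers and substrate amounts (g/l) as reported in the text.
    print(f"W29 citrate: {percent_of_theoretical(49, 60, Y_CIT):.1f}%")
    print(f"pilot-scale pyruvate: {percent_of_theoretical(61, 86, Y_PYR):.1f}%")
    print(f"aKG with pyruvate carboxylase: {percent_of_theoretical(49, 100, Y_AKG):.1f}%")

    # Conservative bound: leftover glycerol is assumed to go entirely to aKG.
    glycerol_left = 86 - 61 / Y_PYR
    akg_max = glycerol_left * Y_AKG
    print(f"pyruvate mass fraction >= {61 / (61 + akg_max):.1%}")
```

The first line reproduces 1.07, 0.96 and 0.79 g g−1, and the bound evaluates to about 77.6%, consistent with the 77 : 23 estimate.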

Figure 19.4 Metabolic pathways of Y. lipolytica related to α-ketoglutarate production from ethanol under thiamine limitation condition. α-ketoglutarate is the main product under thiamine limitation if ethanol is the carbon source, because pyruvate is not an intermediate during the ethanol-to-α-ketoglutarate conversion. Two acetyl-CoA can be transformed into one malate through the glyoxylate shunt. Export of α-ketoglutarate under this condition is not well understood, so its transport is not illustrated in this figure. Acetate is able to diffuse into mitochondria. The green numbers indicate the desired flux distribution at the branch point.

Another strategy for producing αKG as the major product is to use a carbon substrate that can be converted into αKG without pyruvate as an intermediate (Figure 19.4). This was achieved by using alkanes or ethanol as the carbon substrate [14]. When ethanol is used, it is oxidized into acetate, which enters central metabolism directly through acetyl-CoA. Two acetyl-CoA can be converted into one oxaloacetate through the glyoxylate shunt (Figure 19.4). Another acetyl-CoA can be condensed with this oxaloacetate to produce one αKG via the TCA pathway. This pathway involves both mitochondria and peroxisomes, and the detailed compartmentalization of the reactions is not fully elucidated. The overall chemical equation for producing αKG from ethanol is

$$3\,\text{Ethanol} + 6\,\text{ATP} \to\to\to \alpha\text{KG} + \text{CO}_2 + 10\,\text{NADH} \tag{19.4}$$

The ATP needed by this reaction can be supplied by oxidizing the NADH regenerated by the same reaction. Thiamine limitation is still needed to enable α-ketoglutarate accumulation. Equation 19.4 sets the theoretical yield at 1.06 g g−1. Y. lipolytica N1 can produce 49 g l−1 αKG from 117 g l−1 ethanol [30], achieving 39% of the theoretical yield. The mass ratio of pyruvate to α-ketoglutarate was estimated to be 8 : 92, indicating that bypassing pyruvate is an effective strategy for producing αKG instead of pyruvate.
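A quick consistency check on Equations 19.1–19.4 is an electron (degree-of-reduction) balance: with γ = 4C + H − 2O available electrons per molecule, 2 electrons per NADH or FADH2, and none for ATP or CO2, both sides of each equation should carry the same number of electrons. The sketch below assumes neutral molecular formulas (e.g. citric acid as C6H8O7) and also reproduces the ethanol-based theoretical yield from standard molar masses.

```python
# (C, H, O) counts of the neutral molecules (assumption: acids written protonated).
FORMULA = {
    "glucose": (6, 12, 6),
    "glycerol": (3, 8, 3),
    "ethanol": (2, 6, 1),
    "citrate": (6, 8, 7),
    "pyruvate": (3, 4, 3),
    "aKG": (5, 6, 5),
    "CO2": (1, 0, 2),
}
# Electrons carried by cofactors; ATP carries no reducing equivalents.
CARRIER_ELECTRONS = {"NADH": 2, "FADH2": 2, "ATP": 0}


def electrons(species: str) -> int:
    """Available electrons per molecule (degree of reduction, gamma = 4C + H - 2O)."""
    if species in CARRIER_ELECTRONS:
        return CARRIER_ELECTRONS[species]
    c, h, o = FORMULA[species]
    return 4 * c + h - 2 * o


def electron_balance(lhs: dict, rhs: dict) -> tuple:
    """Total available electrons on each side of {species: coefficient} equations."""
    left = sum(n * electrons(s) for s, n in lhs.items())
    right = sum(n * electrons(s) for s, n in rhs.items())
    return left, right


EQUATIONS = {
    "19.1": ({"glucose": 1}, {"citrate": 1, "NADH": 3, "ATP": 1}),
    "19.2": ({"glycerol": 1}, {"pyruvate": 1, "FADH2": 1, "NADH": 1, "ATP": 1}),
    "19.3": ({"glycerol": 2}, {"aKG": 1, "CO2": 1, "FADH2": 2, "NADH": 4, "ATP": 1}),
    "19.4": ({"ethanol": 3, "ATP": 6}, {"aKG": 1, "CO2": 1, "NADH": 10}),
}

if __name__ == "__main__":
    for name, (lhs, rhs) in EQUATIONS.items():
        print(name, electron_balance(lhs, rhs))
    # Theoretical aKG yield on ethanol (Eq. 19.4), standard molar masses.
    print(f"Y_max(aKG/ethanol) = {146.11 / (3 * 46.07):.2f} g/g")
```

All four equations balance (24, 14, 28 and 36 electrons, respectively), and the ethanol yield evaluates to 1.06 g g−1.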

19.3.3 Production of Succinate

Most wild-type Y. lipolytica strains do not naturally accumulate succinate as a major product in the growth broth. Succinate is chosen as an example in this chapter to illustrate the metabolic engineering principles used to engineer Y. lipolytica to accumulate a new short-chain organic acid. When αKG is oxidatively decarboxylated by the Kgdh complex, the product is succinyl-CoA (Figure 19.5), which can be hydrolyzed to produce succinate. This is part of the oxidative TCA pathway, along which succinate is oxidized into fumarate by the succinate dehydrogenase (Sdh) complex. The general strategy for accumulating a metabolite is to deactivate the enzymes that consume it, just as Kgdh activity is abolished by thiamine limitation to overproduce αKG. Since no fermentation condition was found to turn off the Sdh complex through the native regulatory mechanism, genetic engineering was used to intervene in the activity of the complex. This protein complex has five subunits, Sdh1–5 [31]. When deletion of sdh1 or sdh2 was attempted, no viable mutants were recovered. It was later found that growth of the mutants was sensitive to the carbon source: they grew on glycerol but not on glucose [31]. One explanation for this observation is that the succinate-to-fumarate reaction reduces FAD into FADH2. When the complex's activity is abolished and glucose is the substrate, the cells lose their means of producing FADH2, which is essential for growth. FADH2 can be regenerated when glycerol is the carbon source. Both glucose and glycerol enter lower glycolysis at glyceraldehyde 3-phosphate (GAP). While the glucose-to-GAP pathway (upper glycolysis) does not release any electrons, two electrons are released when one glycerol (a molecule more reduced than glucose) is converted into one GAP, and these two electrons can be stored in one FAD, forming one FADH2.
When sdh1 was deleted in a wild-type Y. lipolytica strain, 21 g l−1 succinate was produced from 63 g l−1 glycerol when the pH was buffered at 5 with calcium carbonate [31]. Similar results were obtained when sdh2 or sdh5 was deleted [32]. Maintaining the pH at 5 enabled the cells to quickly consume all the added glycerol, but low-pH fermentation (below the first pKa of succinic acid, 4.2) is desirable because undissociated succinic acid is then formed in the broth, resulting in much lower purification cost. A Δsdh5 strain was systematically engineered to gain the phenotype of producing succinic acid at low pH [16]. The Δsdh5 strain was found to make a large quantity of acetate as a byproduct. A gene (ach1) was found to encode a transferase that moved the CoA group from acetyl-CoA to succinate, producing acetate and succinyl-CoA (Figure 19.5). Once ach1 was deleted, almost no acetate accumulated during the fermentation, but pyruvate became a major byproduct. This issue had been resolved earlier in the αKG production example by overexpressing a pyruvate carboxylase that converts pyruvate into oxaloacetate [24]. The same strategy reduced pyruvate accumulation in this succinate-overproducing strain. A similar approach, overexpressing phosphoenolpyruvate (PEP) carboxykinase (Pck) to convert PEP into oxaloacetate, worked even better. The strain's performance was further improved when the enzyme converting succinyl-CoA into succinate (Scs2) was overexpressed. The resulting strain produced 110 g l−1

Figure 19.5 Metabolic pathways of Y. lipolytica related to succinate production from glycerol. Thicker green lines with arrowhead indicate the steps whose enzymes have been overexpressed. Brown lines with arrowhead indicate the steps whose enzymes have been knocked out for improving succinate production. Sdh, succinate dehydrogenase; Ach1, a CoA-transferase; Kgdh, α-ketoglutarate dehydrogenase; Idh, isocitrate dehydrogenase; Scs2, succinyl-CoA synthase 2; Pck, phosphoenolpyruvate carboxykinase. Compartmentalization is not considered in this figure. Glycerol, not glucose, is frequently used as the carbon source to produce succinate because assimilating glycerol regenerates FADH2 (an essential co-factor), which could not be regenerated when Sdh is inactive and glucose is the carbon source. The green numbers indicate the desired flux distribution at the branch point.

succinate at low pH (pH at the final time point was 3.4). The yield achieved was 0.53 g g−1 , 83% of the theoretical yield [16]. The chemical equation of producing succinate from glycerol is 2 Glycerol →→→ Succinate + 2 CO2 + 2 FADH2 + 5 NADH + 2 ATP (19.5) Equation 19.5 sets the theoretical yield at 0.64 g g−1 .
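Equation 19.5 can be sanity-checked for carbon and electron balance, and its theoretical yield recomputed, in a few lines of Python. This is a minimal sketch under stated assumptions: the yield is expressed on a free succinic acid basis (C4H6O4), available electrons are counted with the usual 4C + H − 2O rule for C/H/O compounds, and NADH and FADH2 are each treated as carrying two electrons (ATP carries none).

```python
# Standard atomic weights (g/mol) for the elements needed here
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass of a C/H/O compound given as an element-count dict."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def degree_of_reduction(formula):
    """Available electrons per molecule of a C/H/O compound: 4C + H - 2O."""
    return 4 * formula.get("C", 0) + formula.get("H", 0) - 2 * formula.get("O", 0)


GLYCEROL = {"C": 3, "H": 8, "O": 3}
SUCCINIC_ACID = {"C": 4, "H": 6, "O": 4}  # assumption: yield on a free-acid basis

# Eq. 19.5: 2 glycerol -> succinate + 2 CO2 + 2 FADH2 + 5 NADH + 2 ATP
carbon_in = 2 * GLYCEROL["C"]
carbon_out = SUCCINIC_ACID["C"] + 2 * 1  # two CO2, one carbon each

electrons_in = 2 * degree_of_reduction(GLYCEROL)
# CO2 carries no available electrons; each FADH2 and NADH carries two
electrons_out = degree_of_reduction(SUCCINIC_ACID) + 2 * 2 + 5 * 2

theoretical_yield = molar_mass(SUCCINIC_ACID) / (2 * molar_mass(GLYCEROL))
fraction_achieved = 0.53 / theoretical_yield

print(f"C: {carbon_in} -> {carbon_out}; e-: {electrons_in} -> {electrons_out}")
print(f"theoretical yield = {theoretical_yield:.3f} g/g; 0.53 g/g is {fraction_achieved:.0%} of it")
```

Both balances close (6 carbons and 28 electrons on each side), and the recomputed yield of about 0.641 g g−1 puts the reported 0.53 g g−1 at 83% of theoretical.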


In the above example, sdh5 and ach1 were deleted by HR, using homologous arms of ∼1000 bp. The auxotrophic markers ura3 and leu2 were used to delete sdh5 and ach1, respectively. After the deletions, ura3 and leu2 were removed from the genome by the Cre–loxP method: each marker was flanked by two loxP sites, and a plasmid expressing the Cre protein was introduced into the strain by using hygromycin B as the selection agent. Cre is a recombinase that excises the DNA between two loxP sites. The Cre-expressing plasmid can be lost afterward. Overexpression of pck and scs2 was done by using the Php4d promoter, the ura3/leu2 markers, and NHEJ-mediated random integration [16].
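The marker-recycling step can be pictured with a simplified string model of Cre excision. This is a sketch only, assuming two directly repeated loxP sites on a single strand; real recombination acts on double-stranded DNA, and inverted sites lead to inversion rather than excision. The function `cre_excise` and the marker placeholder are hypothetical names for illustration.

```python
# loxP: 13-bp inverted repeat + 8-bp spacer + 13-bp inverted repeat (34 bp)
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(seq: str, site: str = LOXP) -> str:
    """Remove the DNA between the first two directly repeated sites.

    One copy of the site is kept as a 'scar', as after Cre excision.
    Sequences with fewer than two sites are returned unchanged.
    """
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    return seq[:first] + site + seq[second + len(site):]


marker = "GATC" * 10  # hypothetical placeholder for the ura3 marker cassette
locus = "GGGG" + LOXP + marker + LOXP + "CCCC"
print(cre_excise(locus) == "GGGG" + LOXP + "CCCC")  # True
```

The single loxP left behind is why the same markers (ura3, leu2) could be reused in later rounds, as in the pck and scs2 overexpression.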

19.4 Production of Triacylglycerol

19.4.1 De novo Triacylglycerol Biosynthesis

When Y. lipolytica grows on fatty acids, it stores some of these hydrophobic substrates as TAGs in lipid bodies, and a cell can have a single large lipid body that accounts for over 50% of its dry weight [33]. When wild-type Y. lipolytica grows on hydrophilic substrates (e.g. glucose, glycerol), it accumulates TAGs to ∼10% of its dry mass under nitrogen limitation, storing carbon and energy for growth once nitrogen becomes available again. TAGs are glycerol esters of fatty acids (mainly C16 and C18). The glycerol backbone comes from glycerol 3-phosphate (G3P), which is readily derived from glycolysis (Figure 19.6). One G3P accepts two fatty acids through two transesterification reactions using two fatty acyl-CoAs. The product is one phosphatidic acid (PA), which can be dephosphorylated to one diacylglycerol (DAG); the DAG then accepts a third fatty acid through another transesterification reaction with a fatty acyl-CoA to become one TAG (Figure 19.6). In Y. lipolytica, several genes encode proteins that can catalyze this last step of TAG biosynthesis; one of them is dga1. TAG biosynthesis takes place in the endoplasmic reticulum (ER) and lipid bodies [34]. One stearoyl-CoA (an example fatty acyl-CoA, C18:0) is synthesized from eight malonyl-CoA and one acetyl-CoA through the iterative fatty acid biosynthetic pathway (Figure 19.6). In each iteration (the number of iterations equals the number of malonyl-CoA used), one ketone group is reduced to a methylene group, consuming two NADPH. One malonyl-CoA is obtained by condensing one CO2 with one acetyl-CoA; this reaction, catalyzed by acetyl-CoA carboxylase (Acc), hydrolyzes one ATP per malonyl-CoA to overcome the energy barrier. In Y. lipolytica, one ACC-encoding gene is acc1, and de novo fatty acid biosynthesis occurs in the cytosol and ER [34]. Under nitrogen limitation, Y. lipolytica diverts most of its carbon flux toward citrate production, as discussed in the previous section. On its way from the mitochondria to the extracellular space, citrate has to pass through the cytosol, where one citrate may be cleaved into one oxaloacetate (OAA) and one acetyl-CoA at the expense of one ATP. This cytosolic acetyl-CoA is the building block of de novo fatty acid biosynthesis and of other important anabolic reactions, such as ergosterol synthesis [35].

Figure 19.6 Metabolic pathways of Y. lipolytica related to triacylglycerol production from glucose. Thicker green lines with arrowheads indicate the steps whose enzymes have been overexpressed. The dotted green line connects two identical molecules placed at different positions. GAP, glyceraldehyde 3-phosphate; G3P, glycerol 3-phosphate; BPG, 1,3-bisphosphoglycerate; OAA, oxaloacetate; PA, phosphatidic acid; ER, endoplasmic reticulum; Dga1, diacylglycerol acyltransferase 1; Acc1, acetyl-CoA carboxylase 1; D9, Δ9 stearoyl-CoA desaturase. Acetyl-CoA is the main building block of triacylglycerol and is provided in the cytosol through cleavage of citrate. Unlike in citrate production, here citrate export from the mitochondria can be sustained by using the cytosolic malate derived from citrate cleavage. Nitrogen limitation is still required to accumulate citrate in the mitochondria when cytosolic acetyl-CoA is provided through citrate cleavage (an alternative way of producing cytosolic acetyl-CoA is illustrated in Figure 19.10). Stearoyl-CoA, a saturated acyl-CoA, inhibits Acc1; overexpression of D9 reduced the concentration of stearoyl-CoA, alleviating the Acc1 inhibition. The green numbers indicate the desired flux distribution at the branch points.

The overall chemical equation for producing TAG from glucose is

$$14\,\text{Glucose} + 25\,\text{ATP} + 48\,\text{NADPH} \longrightarrow \text{Triacylglycerol} + 27\,\text{CO}_2 + 53\,\text{NADH} \tag{19.6}$$

Equation 19.6 needs further revision in the following sections because the process it describes is not self-sustaining: not all of the required ATP and NADPH can be generated from the excess NADH.
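The fatty-acid part of Eq. 19.6 follows directly from the per-iteration rules of fatty acid synthesis described above: one acetyl-CoA primer, one malonyl-CoA and two NADPH per iteration, and one ATP per malonyl-CoA made by Acc. The Python sketch below assumes a TAG carrying three C18 chains and ignores the reducing equivalents used by desaturation; the `FasDemand` class and the helper function are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class FasDemand:
    acetyl_coa: int   # total acetyl-CoA, including units carboxylated to malonyl-CoA
    malonyl_coa: int
    nadph: int
    atp: int          # ATP spent by Acc to make malonyl-CoA


def saturated_acyl_coa_demand(n_carbons: int) -> FasDemand:
    """Building blocks for one saturated acyl-CoA from iterative fatty acid synthesis."""
    if n_carbons < 4 or n_carbons % 2:
        raise ValueError("expects an even chain length of at least 4 carbons")
    iterations = (n_carbons - 2) // 2  # each iteration adds two carbons from malonyl-CoA
    return FasDemand(
        acetyl_coa=1 + iterations,     # primer + one acetyl-CoA per malonyl-CoA
        malonyl_coa=iterations,
        nadph=2 * iterations,          # ketone -> methylene costs two NADPH
        atp=iterations,                # one ATP per malonyl-CoA (Acc)
    )


stearoyl = saturated_acyl_coa_demand(18)
print(stearoyl)  # FasDemand(acetyl_coa=9, malonyl_coa=8, nadph=16, atp=8)

chains = 3  # assumption: three C18 acyl chains per TAG
print(chains * stearoyl.acetyl_coa, chains * stearoyl.nadph, chains * stearoyl.atp)  # 27 48 24
```

The 48 NADPH matches the NADPH term of Eq. 19.6. The acyl chains alone account for 27 acetyl-CoA and 24 Acc-driven ATP; the remaining ATP terms in Eq. 19.6 come from glycolysis and citrate cleavage.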


19.4.2 The Push-and-Pull Strategy to Increase Flux Toward Triacylglycerol Synthesis

A very successful strategy for converting Y. lipolytica from a citrate producer into a TAG producer is overexpression of acc1 and dga1 [17], whose protein products catalyze the first step of fatty acid synthesis and the last step of TAG synthesis, respectively. Having more Acc1 improves fatty acid synthesis by providing more catalyst to compete with other reactions for cytosolic acetyl-CoA (pushing flux toward product formation). Overexpressing dga1 speeds up formation of stable, nontoxic TAGs (pulling flux toward product formation), which also reduces accumulation of intermediates that could negatively regulate the activity of upstream enzymes; for example, saturated acyl-CoAs (e.g. stearoyl-CoA) inhibit Acc1. When applied to a wild-type strain (Y. lipolytica Po1g) in a glucose fermentation under nitrogen limitation, this push-and-pull strategy reduced the citrate concentration from 47 to 5 g l−1 [17]. At the same time, the TAG titer increased from 2.5 to 17.6 g l−1. In this experiment, TAGs constituted 61.7% of DCW, the overall process yield was 0.195 g g−1, and the overall TAG productivity was 0.143 g l−1 h−1. This strain overexpressing acc1 and dga1 was named Y. lipolytica AD. In this study, the acc1 overexpression cassette used Php4d, while the dga1 cassette used the intron-containing TEF promoter (PTEFin), which was much stronger than Php4d [17]. The acc1 and dga1 expression cassettes were cloned into one plasmid, which was integrated randomly into the genome of Y. lipolytica Po1g through the NHEJ mechanism; transformants were selected by using the leu2 marker. Because the acc1 and dga1 used in this experiment were native genes, Y. lipolytica AD contained two copies of each: for acc1, one copy was under control of its native promoter and the second copy was controlled by Php4d.

19.4.3 Desaturation of Fatty Acyl Chains Improved Triacylglycerol Production

A follow-up study further increased the expression level of acc1. PTEFin was used to overexpress acc1 and dga1 in their respective expression cassettes. The two cassettes were then joined and randomly introduced into the genome of Po1g by using the leu2 marker and the NHEJ mechanism [36]. Δ9 stearoyl-CoA desaturase (D9) was then overexpressed in this new strain in a further round of transformation: expression of d9 was controlled by PTEFin, the cassette was randomly integrated, and transformants were selected by using hygromycin B. The resulting strain (Y. lipolytica ad9) had a 20% higher TAG yield than Y. lipolytica AD (0.234 g g−1 for Y. lipolytica ad9). Most importantly, the TAG productivity of Y. lipolytica ad9 was 400% higher than that of Y. lipolytica AD (0.707 g l−1 h−1 for Y. lipolytica ad9). High productivity is important for industrial processes because it means lower capital investment and operating cost. D9 could have played multiple roles in improving lipid synthesis [36]: (i) it decreased the percentage of saturated acyl-CoAs (e.g. stearoyl-CoA) that inhibit Acc1; (ii) it increased the percentage of unsaturated acyl-CoAs (e.g. oleoyl-CoA), which Dga1 prefers for TAG synthesis (resulting in higher Dga1 activity); (iii) it improved cell growth by providing more unsaturated acyl-CoAs, which are building blocks of cellular membranes; and (iv) it enhanced the cells' stress tolerance (e.g. osmotic stress tolerance) by modulating the fluidity of cellular membranes.

19.4.4 Improve the Pathway Yield Through Balancing Redox Cofactors

The overall product yield is a very important process performance indicator because it largely determines the substrate cost, which can be the largest fraction of the product manufacturing cost [37]. The overall glucose-to-TAG equation (Eq. 19.6) shows that TAG biosynthesis requires 48 NADPH and regenerates 53 NADH per TAG. A 13C metabolic flux analysis (MFA) [38] found that most of the NADPH consumed by Y. lipolytica AD in the lipid production phase was provided by the oxidative pentose phosphate pathway (PPP), through which one glucose can be fully oxidized into six CO2, regenerating 12 NADPH at the expense of one ATP. If all 48 NADPH are provided this way, Eq. 19.6 becomes

$$18\,\text{Glucose} + 29\,\text{ATP} \longrightarrow \text{Triacylglycerol} + 51\,\text{CO}_2 + 53\,\text{NADH} \tag{19.7}$$

Equation 19.7 sets the theoretical yield at 0.275 g g−1 (the needed ATP can be regenerated by using the excess NADH). This is close to the yield achieved by Y. lipolytica ad9 (0.234 g g−1), leaving little room for further yield improvement as long as NADPH is supplied through the oxidative PPP. One solution is to use the excess NADH from TAG production to regenerate NADPH. If one NADPH is regenerated per NADH, Eq. 19.6 becomes

$$14\,\text{Glucose} + 25\,\text{ATP} \longrightarrow \text{Triacylglycerol} + 27\,\text{CO}_2 + 5\,\text{NADH} \tag{19.8}$$

If we further assume that 2.5 ATP can be regenerated per NADH and 32 ATP per glucose, Eq. 19.8 becomes

$$14.4\,\text{Glucose} \longrightarrow \text{Triacylglycerol} + 29.4\,\text{CO}_2 \tag{19.9}$$
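The bookkeeping that turns Eq. 19.6 into Eqs. 19.7 and 19.9 can be reproduced with a short Python sketch using only the assumptions stated above: 12 NADPH per glucose through the oxidative PPP at a cost of one ATP, one NADPH per NADH, 2.5 ATP per NADH, and 32 ATP per glucose. The TAG molar mass (≈889.5 g mol−1, a C57 TAG such as C57H108O6) is an assumption not given in the text, so computed yields may differ from the quoted values in the third decimal place. Exact fractions keep the Eq. 19.9 coefficients unrounded; the text rounds them to 14.4 and 29.4.

```python
from fractions import Fraction

# Eq. 19.6 per TAG: 14 glucose + 25 ATP + 48 NADPH -> TAG + 27 CO2 + 53 NADH
GLUCOSE, ATP_NEED, NADPH_NEED, CO2, NADH_EXCESS = 14, 25, 48, 27, 53

# Assumptions stated in the text
NADPH_PER_GLUCOSE_OXPPP = 12      # costs 1 ATP and releases 6 CO2 per glucose
ATP_PER_NADH = Fraction(5, 2)
ATP_PER_GLUCOSE = 32

# Assumption for illustration only: molar masses in g/mol
M_TAG, M_GLUCOSE = 889.5, 180.16


def eq_19_7():
    """All NADPH from the oxidative PPP: returns (glucose, ATP, CO2)."""
    extra = Fraction(NADPH_NEED, NADPH_PER_GLUCOSE_OXPPP)      # 4 extra glucose
    return GLUCOSE + extra, ATP_NEED + extra, CO2 + 6 * extra


def eq_19_9():
    """NADPH from NADH (Eq. 19.8), remaining ATP from glucose: returns (glucose, CO2)."""
    nadh_left = NADH_EXCESS - NADPH_NEED                        # 5 NADH, as in Eq. 19.8
    atp_deficit = ATP_NEED - ATP_PER_NADH * nadh_left           # ATP still to be made
    extra = atp_deficit / ATP_PER_GLUCOSE                       # glucose fully oxidized
    return GLUCOSE + extra, CO2 + 6 * extra


def tag_yield(glucose):
    """g TAG per g glucose."""
    return M_TAG / (float(glucose) * M_GLUCOSE)


g7, atp7, co2_7 = eq_19_7()
g9, co2_9 = eq_19_9()
print(g7, atp7, co2_7, round(tag_yield(g7), 3))          # 18 29 51 0.274
print(float(g9), float(co2_9), round(tag_yield(g9), 3))  # 14.390625 29.34375 0.343
```

With this molar-mass assumption, the yields come out at about 0.274 and 0.343 g g−1, in line with the values derived in the text.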

Eq. 19.9 sets the theoretical yield at 0.343 g g−1. A number of strategies can be used to regenerate NADPH from NADH [37]. One effective approach was overexpressing a Clostridium acetobutylicum GAP dehydrogenase (encoded by gapC) that regenerates one NADPH from the oxidation of one GAP, whereas the native Y. lipolytica GAP dehydrogenase (GapA) prefers to regenerate NADH. The net result of this overexpression is to regenerate one NADPH at the cost of one NADH (Figure 19.7a). Another approach was to express a Mucor circinelloides malic enzyme (Mce2, encoded by mce2) that regenerates one NADPH when one malate is decarboxylated and oxidized into one pyruvate. This reaction can form a cycle when the resulting pyruvate is converted back into malate through pyruvate carboxylation and oxaloacetate reduction (Figure 19.7b). The net reaction of this cycle is the regeneration of one NADPH at the expense of one NADH and one ATP.

Figure 19.7 Two strategies to use NADH to regenerate NADPH. (a) The glyceraldehyde 3-phosphate (GAP) dehydrogenase approach. (b) The malate–pyruvate–oxaloacetate cycle approach. Thicker blue lines with arrowheads indicate reactions catalyzed by recombinant foreign enzymes. Green lines with arrowheads indicate native reactions. GapA, native GAP dehydrogenase; GapC, GAP dehydrogenase from C. acetobutylicum; Mce2, a malic enzyme from M. circinelloides.

When gapC and mce2 were co-expressed in Y. lipolytica AD, the overall process yield in benchtop bioreactors increased from 0.18 to 0.27 g g−1. The yield during the lipid production phase of this fermentation exceeded 0.275 g g−1, the theoretical yield set by Eq. 19.7. This strain was named Y. lipolytica ADgm. The strategy not only replenished the NADPH needed for lipid biosynthesis but also substantially reduced the amount of excess NADH that had to be oxidized with oxygen. In high-cell-density aerobic fermentation, cell growth is often limited by the oxygen transfer rate, especially in industrial-scale bioreactors [37], because increasing gas mass transfer is costly owing to the high viscosity of high-cell-density cultures. Partly because of its lower specific oxygen requirement, Y. lipolytica ADgm was able to reach a high cell density quickly, resulting in a high TAG titer (99 g l−1) and a remarkable TAG productivity (1.2 g l−1 h−1).

19.4.5 Lower Substrate Cost by Utilizing Waste Streams

Besides increasing product yield, lowering the unit cost of the raw substrate is another effective way to reduce the overall substrate cost. If a substrate is derived from a waste stream, it could have a low or even negative cost. Acetate is the main subject of this section; utilization of acid whey is briefly discussed at the end. Acetate can be produced from CO2 in a bacterial (Moorella thermoacetica) fermentation that uses hydrogen and/or CO as the energy source [39]. As the best-known greenhouse gas, CO2 has received wide public attention regarding its emission. Hydrogen and CO could be obtained from some industrial waste gas streams (e.g. exhaust gas from steel mills), and hydrogen could be produced by water electrolysis/splitting powered by renewable energy [39]. Acetate can also be obtained through anaerobic digestion of food or other organic carbon wastes [40]. Many wild-type Y. lipolytica strains can grow on acetate as the sole carbon source. Y. lipolytica AD accumulated TAG to ∼55% of its DCW when it grew on acetate under nitrogen limitation [41]. Acetate is activated into acetyl-CoA in the cytosol at the expense of two ATP per acetate (Figure 19.8). The cytosolic acetyl-CoA can be used directly for fatty acid synthesis, so fatty acid production is independent of citrate transport when the substrate is acetate. Two acetyl-CoA can be condensed into one malate through the glyoxylate shunt (the same process used in the ethanol-to-αKG process, Figure 19.4). Malate can be decarboxylated into pyruvate, which fuels gluconeogenesis to generate G3P and glucose 6-phosphate (G6P). G3P provides the glycerol backbone of TAG (Figure 19.8), and G6P is needed to regenerate NADPH through the PPP. The chemical equation of the acetate-to-TAG process is

$$29\,\text{Acetate} + 84\,\text{ATP} + 48\,\text{NADPH} \longrightarrow \text{Triacylglycerol} + \text{NADH} + \text{CO}_2 \tag{19.10}$$

Figure 19.8 Metabolic pathways of Y. lipolytica related to triacylglycerol production from acetate. Thicker green lines with arrowheads indicate the reactions whose enzymes have been overexpressed. G3P, glycerol 3-phosphate; PA, phosphatidic acid; Dga1, diacylglycerol acyltransferase 1; Acc1, acetyl-CoA carboxylase 1. All the reactions occur in the cytosol, endoplasmic reticulum, or peroxisomes. The green numbers indicate the desired flux distribution at the branch points.

This pathway requires large quantities of energy and reducing equivalents because the acetate-to-acetyl-CoA conversion is nonoxidative and energy-consuming. By comparison, in the glucose-to-fatty acid process, the segment from glucose to acetyl-CoA is oxidative, releasing energy and reducing equivalents. As a result, some acetate must be oxidized into CO2 to provide the ATP and NADPH needed for TAG synthesis. If we assume that (i) 8 ATP can be regenerated per acetate through the TCA cycle, (ii) 12 NADPH can be regenerated at the expense of 4 acetate and 3 ATP through the oxidative PPP, and (iii) 2.5 ATP are regenerated per NADH through respiration and oxidative phosphorylation, Eq. 19.10 becomes

$$55.6\,\text{Acetate} \longrightarrow \text{Triacylglycerol} + 54.1\,\text{CO}_2 \tag{19.11}$$
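The theoretical yield implied by Eq. 19.11, and the TAG titer ceiling it implies for a dilute acetate stream, can be checked quickly. This sketch assumes a TAG molar mass of ≈889.5 g mol−1 and counts acetate as acetic acid (60.05 g mol−1). Neither value is given in the text, so the computed yield (≈0.266 g g−1) differs slightly from the quoted 0.267 g g−1. The helper `max_tag_titer` and its `dilution_factor` parameter are hypothetical, added only for illustration.

```python
# Assumptions for illustration: molar masses in g/mol
M_TAG = 889.5         # a C57 TAG
M_ACETATE = 60.05     # acetate counted as acetic acid
ACETATE_PER_TAG = 55.6  # Eq. 19.11

# g TAG per g acetate at the Eq. 19.11 stoichiometry
theoretical_yield = M_TAG / (ACETATE_PER_TAG * M_ACETATE)


def max_tag_titer(acetate_g_per_l, yield_g_per_g, dilution_factor=1.0):
    """Upper bound on TAG titer (g/l) if all acetate is converted at the given yield.

    dilution_factor > 1 models dilution of the broth; 1.0 means no dilution.
    """
    return acetate_g_per_l * yield_g_per_g / dilution_factor


print(round(theoretical_yield, 3))                     # 0.266
print(round(max_tag_titer(30, theoretical_yield), 1))  # 8.0
```

For a typical 30 g l−1 acetate stream, the ceiling is about 8 g l−1 TAG, which is why the titer problem has to be solved by process design rather than strain engineering alone.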

Eq. 19.11 sets the theoretical yield at 0.267 g g−1. A 13C MFA study revealed that when Y. lipolytica AD grew on acetate, the oxidative TCA flux and the oxPPP flux were indeed high during the lipid production phase [41], presumably to regenerate ATP and NADPH, respectively. It takes 18 enzymatic steps to convert four acetate into one G6P that can enter the oxPPP, so cell growth and lipid production could be limited by the availability of NADPH. This hypothesis was supported by a recent study in which feeding gluconate to Y. lipolytica growing on acetate substantially improved cell growth and lipid production [42]. Gluconate can be phosphorylated into 6-phosphogluconate (6PG), one of the substrates that regenerate NADPH in the oxPPP, so feeding gluconate directly fuels the oxPPP. Another challenge in acetate fermentation is the low acetate concentration in the raw substrate [39]. For example, M. thermoacetica gas fermentation typically generates 30 g l−1 acetate in the aqueous broth. If this acetate were converted into TAGs at the theoretical yield (0.267 g g−1) with no dilution, the TAG titer would be only 8 g l−1, much lower than the minimal product titer required by the relevant product recovery process. An effective solution was provided through process engineering. The acetate fermentation was operated in a semi-continuous mode, in which feedstock solution was fed continuously into a stirred-tank bioreactor [43] and culture broth was also harvested continuously. The only difference from a continuous mode was that this design had a cell retention device (e.g. a membrane filtration unit), so the harvested broth was cell-free and the culture in the bioreactor did not reach a steady state (cell concentration and TAG titer increased over time).
By monitoring the CO2 evolution rate and the oxygen consumption rate of the bioreactor, and using this information as a feedback signal, acetate feeding was optimized [43], leading to good fermentation performance (TAG titer: 115 g l−1; yield: 0.16 g g−1; productivity: 0.8 g l−1 h−1).

So far in this chapter, all the examples have used substrates that Y. lipolytica can naturally utilize. How to engineer Y. lipolytica to utilize a new substrate is introduced below using a recent example [44]. Acid whey is generated in large volumes (165 000 tons per year globally) by the yogurt industry, of which 25–50% is underutilized. The most abundant carbon-containing molecule in acid whey is lactose, a disaccharide [44]. Most Y. lipolytica wild-type strains lack an enzyme that can hydrolyze lactose into one glucose and one β-galactose, and do not have any transporter that can take up lactose (Figure 19.9). When a lactose hydrolase (LacA) with an N-terminal secretion signal peptide was expressed in Y. lipolytica AD, the hydrolase was secreted, allowing the resulting strain (Y. lipolytica XLacA) to grow in a medium with lactose as the sole carbon source. The strain, however, could not efficiently utilize galactose, because it lacked sufficient amounts of the enzymes that convert β-galactose into α-glucose 1-phosphate (G1P), which can be efficiently isomerized into G6P to enter central metabolism (Figure 19.9). β-Galactose first needs to be isomerized into α-galactose by galactose mutarotase (Gal10M). α-Galactose is then phosphorylated by galactokinase (Gal1) to form α-galactose-1-phosphate (α-GAL-1P). The galactose unit in α-GAL-1P can be exchanged with the glucose unit of a borrowed α-glucose-1-UDP (α-GLC-UDP), a reaction catalyzed by α-GAL-1P uridylyl-transferase (Gal7). After the exchange, α-GAL-1P has become α-glucose 1-phosphate (G1P), and α-GLC-UDP has become α-galactose-1-UDP (α-GAL-UDP), which can be isomerized back to α-GLC-UDP (catalyzed by α-GAL-UDP epimerase, Gal10E), returning the borrowed α-GLC-UDP (Figure 19.9). At this point the transformation of β-galactose into G1P is complete. When the four genes involved in this transformation (gal10M, gal1, gal7, gal10E) were overexpressed, the resulting strain (Y. lipolytica XLacAGal4) was able to assimilate all the sugars and acids in undiluted acid whey [44]. The whey contained ∼50 g l−1 substrates (32 g l−1 lactose, 6 g l−1 lactate, and 6 g l−1 galactose). After the fermentation, 14 g l−1 cells had formed, containing 6 g l−1 TAGs. The lipid-rich yeast may be used directly as animal feed [44].

Figure 19.9 Engineering Y. lipolytica to utilize lactose as a carbon source. Thicker blue lines with arrowheads indicate the reaction catalyzed by a recombinant foreign enzyme. Green lines with arrowheads indicate native reactions. Thicker green arrows indicate native reactions that have been enhanced through gene overexpression. α-GAL-1P, α-galactose-1-phosphate; α-GLC-UDP, α-glucose-1-UDP; α-GAL-UDP, α-galactose-1-UDP; Gal10M, galactose mutarotase; Gal1, galactokinase; Gal7, α-GAL-1P uridylyl-transferase; Gal10E, α-GAL-UDP epimerase. The green numbers indicate the ratio of the fluxes. The lighter and darker shaded six-atom rings are the glucose and galactose rings, respectively.

19.4.6 Improve Availability of Cytosolic Acetyl-CoA

Nitrogen limitation was needed for TAG production when glucose was the carbon substrate. Although effective, it separated the whole process into a growth stage and a product formation stage, which limited the overall process productivity because little TAG was formed during the growth phase (when nitrogen was not limiting). Another concern is that nitrogen limitation may not be easily achieved when the substrate is a complex feedstock with a high nitrogen content. Motivated by these limitations, new pathways have been explored to provide cytosolic acetyl-CoA without nitrogen limitation. There are at least seven candidate pathways to produce cytosolic acetyl-CoA in Y. lipolytica. A study [45] evaluated five of them by using Y. lipolytica AD as the base strain. The most effective approach found in the study was overexpression of carnitine acetyltransferase (Cat2), which transfers the acetyl moiety of mitochondrial acetyl-CoA to carnitine in the mitochondria (Figure 19.10). Mitochondrial acetyl-carnitine can be exported to the cytosol by a transporter that uses cytosolic carnitine as the counter substrate. The translocated acetyl-carnitine donates its acetyl moiety to a CoA, forming acetyl-CoA in the cytosol, and the released carnitine serves as the counter substrate for further acetyl-carnitine export, sustaining the process. The net reaction is the transport of one acetyl-CoA from the mitochondrion to the cytosol. The strain expressing cat2 from S. cerevisiae started to produce TAG from the beginning of the growth phase and thus achieved 3.1-fold higher productivity than the parent strain in a well-controlled bioreactor [45]. The Cat2 overexpression approach still relied on mitochondrial acetyl-CoA, which is generated from pyruvate by the pyruvate dehydrogenase (Pdh) complex in mitochondria. The other four pathways evaluated in the study [45] all produce acetyl-CoA without involving mitochondria, and three of them improved TAG production in Y. lipolytica AD to various extents.

Figure 19.10 Seven approaches to improve the availability of cytosolic acetyl-CoA in Y. lipolytica. Cat2, carnitine acetyltransferase; Pfl, pyruvate formate lyase; Pdc, pyruvate decarboxylase; AldH, aldehyde dehydrogenase; Acs, acetyl-CoA synthase; Aad, CoA-acylating aldehyde dehydrogenase; Pk, phosphoketolase; Pta, phosphotransacetylase. Approach 1 uses Cat2 to transport acetyl-CoA from mitochondria to the cytosol. Approach 2 uses Pfl to produce acetyl-CoA from pyruvate in the cytosol. Approach 3 uses Pdc, AldH, and Acs to produce acetyl-CoA from pyruvate in the cytosol. Approach 4 uses Pdc and Aad to produce acetyl-CoA from pyruvate in the cytosol. Approach 5 uses Pk and Pta to produce acetyl-CoA from xylulose 5-phosphate in the cytosol. Approach 6 uses Pdh to produce acetyl-CoA from pyruvate in the cytosol. Approach 7 uses β-oxidation to produce acetyl-CoA from TAGs in peroxisomes. Blue lines with arrowheads indicate reactions that have been enabled or enhanced by overexpressing foreign genes. Dotted green lines with arrowheads indicate native reactions not considered in this figure, left out to make the pathways easier to follow and to keep the stoichiometric coefficients reasonable. The six pyruvate-to-acetyl-CoA reactions are in parallel: a pyruvate could be converted into one acetyl-CoA through any one of them.

These three pathways are introduced below. In the first pathway, cytosolic pyruvate was directly cleaved into acetyl-CoA and formate (pyruvate formate lyase [Pfl]-encoding genes from Escherichia coli were expressed). In the second, cytosolic pyruvate was decarboxylated into CO2 and acetaldehyde (a pyruvate decarboxylase [Pdc] from S. cerevisiae was expressed); an aldehyde dehydrogenase (AldH) from E. coli then oxidized acetaldehyde into acetate, which can be activated into acetyl-CoA by the native acetyl-CoA synthase (Acs). In the third pathway, xylulose 5-phosphate in the PPP was cleaved into acetyl phosphate and GAP (a phosphoketolase [Pk] from Aspergillus nidulans was expressed), and acetyl phosphate was transformed into acetyl-CoA by a phosphotransacetylase (Pta) from Bacillus subtilis. Because Y. lipolytica has a very high carbon flux through the PPP, the xylulose 5-phosphate consumed by the third pathway can be easily replenished.
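The statement that the Cat2 shuttle's net effect is simply moving one acetyl-CoA from the mitochondrion to the cytosol can be checked by summing its three steps. This is a minimal Python sketch that treats each step as a (substrates, products) pair with compartment tags ([m] mitochondrion, [c] cytosol). The species labels and the `net_reaction` helper are illustrative, and transporter identities and energetics are not modeled.

```python
from collections import Counter


def net_reaction(reactions):
    """Sum (substrates, products) pairs; return the net change per species.

    Negative values are net consumption, positive values net production;
    species that cancel out (shuttle intermediates) are dropped.
    """
    net = Counter()
    for substrates, products in reactions:
        for s in substrates:
            net[s] -= 1
        for p in products:
            net[p] += 1
    return {species: n for species, n in net.items() if n != 0}


cat2_shuttle = [
    # Cat2 in the mitochondrion
    (["acetyl-CoA[m]", "carnitine[m]"], ["acetyl-carnitine[m]", "CoA[m]"]),
    # transporter: acetyl-carnitine out, carnitine in (counter substrate)
    (["acetyl-carnitine[m]", "carnitine[c]"], ["acetyl-carnitine[c]", "carnitine[m]"]),
    # acetyl moiety handed back to CoA in the cytosol
    (["acetyl-carnitine[c]", "CoA[c]"], ["acetyl-CoA[c]", "carnitine[c]"]),
]

print(net_reaction(cat2_shuttle))
# {'acetyl-CoA[m]': -1, 'CoA[m]': 1, 'CoA[c]': -1, 'acetyl-CoA[c]': 1}
```

Carnitine and acetyl-carnitine cancel out, leaving only the movement of the acetyl group (and a CoA exchange) between compartments. The same helper can be applied to the galactose-to-G1P steps described above, where the borrowed α-GLC-UDP should likewise cancel.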

19.5 Production of New Products

Y. lipolytica has been engineered to produce many small molecules that the species does not naturally produce. The basic principles of three examples are discussed here, covering polyunsaturated fatty acids, isoprenoids, and polyketides. More comprehensive reviews of new products made by Y. lipolytica can be found in [1, 46].


19.5.1 Production of Eicosapentaenoic Acid

DuPont engineered Y. lipolytica to accumulate eicosapentaenoic acid (EPA) and commercialized the technology to produce nutraceuticals and animal feed [21]. EPA has 20 carbon atoms and five C=C double bonds (Figure 19.11). Oleic acid, the major fatty acid in the TAGs produced by wild-type Y. lipolytica strains, has 18 carbon atoms and one C=C double bond. In seven rounds of genome editing, four desaturases and one elongase were overexpressed in a Y. lipolytica strain (starting from the wild-type strain ATCC20362) to introduce four additional C=C double bonds into oleic acid and to elongate its carbon chain by two carbon atoms (Figure 19.11). All five genes were integrated into the genome in multiple copies (up to seven) to increase their expression levels, and different copies of each gene were driven by various promoters (PEXP1, PFBAINm, PGPAT, PGPD, PYAT). In each round, a linear double-stranded DNA fragment containing three to four expression cassettes was inserted into the genome using an auxotrophic marker, which was subsequently removed. In one round, a transformant (Y. lipolytica Y4128) showed 138% higher EPA content than its parent strain [21]. Genome walking revealed that the expression cassettes had been inserted into the pex10 locus, abolishing Pex10 activity. Pex10 is essential for maintaining the integrity of peroxisomes, where β-oxidation takes place; β-oxidation was therefore nonfunctional in Y. lipolytica Y4128, preventing the degradation of fatty acyl-CoAs. Because (i) the elongase (Δ9 elongase, D9e) catalyzed the rate-limiting step of EPA synthesis and (ii) the elongase accepted only the acyl moiety of fatty acyl-CoA as substrate, the higher fatty acyl-CoA concentration caused by Pex10 inactivation increased elongase activity and the overall conversion. With additional overexpression of C16e (an elongase that converts C16 fatty acids into C18 ones) and Cpt (which can improve fatty acid desaturation), the final strain accumulated EPA to 15% of dry cell weight (DCW).

Figure 19.11 Biosynthetic pathway of eicosapentaenoic acid (EPA) used in the DuPont study. Thick blue arrows indicate reactions catalyzed by recombinant foreign enzymes; green arrows indicate native reactions; thick green arrows indicate native reactions enhanced through gene overexpression. EDA, eicosadienoic acid; ETrA, eicosatrienoic acid; DGLA, dihomo-γ-linolenic acid; ARA, arachidonic acid; ETA, eicosatetraenoic acid; C16e, palmitic acid elongase; D9e, Δ9 elongase; D[x]d, Δ[x] desaturase, where x is 5, 8, 9, 12, or 17. Free fatty acids are shown to simplify the figure, although they are usually not the substrates of the elongases and desaturases; the actual substrates are the corresponding CoA derivatives or glycerol esters.

19 Harness Yarrowia lipolytica to Make Small Molecule Products
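The chain-length and double-bond bookkeeping from oleic acid (18:1) to EPA (20:5) can be sketched in a few lines of Python. This is a minimal illustration, not the DuPont procedure: the route order (Δ12 desaturation, Δ9 elongation, then Δ8, Δ5, and Δ17 desaturation, via linoleic acid, EDA, DGLA, and ARA) is an assumption consistent with the enzymes listed for Figure 19.11, the ETrA/ETA branch is not modeled, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FattyAcid:
    name: str
    carbons: int       # chain length
    double_bonds: int  # number of C=C double bonds

    def shorthand(self) -> str:
        # Lipid shorthand "carbons:double bonds", e.g. 18:1 for oleic acid
        return f"{self.carbons}:{self.double_bonds}"

def desaturate(fa: FattyAcid, product: str) -> FattyAcid:
    # A desaturase introduces one C=C double bond; chain length is unchanged
    return FattyAcid(product, fa.carbons, fa.double_bonds + 1)

def elongate(fa: FattyAcid, product: str) -> FattyAcid:
    # One elongation adds two carbon atoms and no double bond
    return FattyAcid(product, fa.carbons + 2, fa.double_bonds)

# Assumed route order (Delta-9 elongase / Delta-8 desaturase branch of Figure 19.11)
ROUTE = [
    ("D12d", desaturate, "linoleic acid"),
    ("D9e", elongate, "EDA"),
    ("D8d", desaturate, "DGLA"),
    ("D5d", desaturate, "ARA"),
    ("D17d", desaturate, "EPA"),
]

def run_route(start: FattyAcid) -> list[FattyAcid]:
    # Apply each enzymatic step in order and keep every intermediate
    trace = [start]
    for _enzyme, step, product in ROUTE:
        trace.append(step(trace[-1], product))
    return trace

trace = run_route(FattyAcid("oleic acid", 18, 1))
for fa in trace:
    print(f"{fa.name:>14}: {fa.shorthand()}")
```

In this simplified count the order of desaturations does not change the endpoint: four desaturations and one elongation take 18:1 to 20:5 whichever branch is followed.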

19.5.2 Production of Triacetic Acid Lactone

Because Y. lipolytica is an excellent lipid producer, and lipids are built mainly from acetyl-CoA and malonyl-CoA, it is a logical host for other products derived from the same building blocks. A successful example [47] is the production of triacetic acid lactone (TAL), a structurally simple polyketide. One TAL molecule is formed from one acetyl-CoA and two malonyl-CoA in a single reaction step. When the enzyme catalyzing this reaction (encoded by g2ps1) was expressed in a wild-type strain (Y. lipolytica Po1f), 2.1 g l−1 TAL was produced in a culture tube; four copies of g2ps1 were integrated into the Po1f genome to achieve this result. This strain was named Y. lipolytica YT. The authors then explored various ways to improve the acetyl-CoA supply in the cytosol. One new route was to establish the pyruvate dehydrogenase (Pdh) complex in the Y. lipolytica cytosol (Figure 19.10). All proteins of the complex (Pda1, Pde2, Pde3, and Pdb1) were overexpressed using a strong hybrid promoter (P16dTEF) composed of multiple upstream activating sequence (UAS) elements and the TEF core promoter. When Acc1 (the enzyme converting acetyl-CoA into malonyl-CoA) was co-expressed with the cytosolic Pdh complex (Figure 19.12) in Y. lipolytica YT, the TAL titer increased to 2.5 g l−1 under the same cultivation conditions (23% improvement). The authors [47] also tried to increase the availability of cytosolic acetyl-CoA by overexpressing Pex10, the peroxisomal protein identified as a knockout target in the EPA study [21]. Overexpression of Pex10 in Y. lipolytica YT increased the TAL titer by 22% to 2.4 g l−1; it was proposed that Pex10 overexpression indirectly enhanced β-oxidation of TAGs and thereby supplied cytosolic acetyl-CoA (Figure 19.10). Another route tested to improve the cytosolic acetyl-CoA supply was the pyruvate–acetaldehyde–acetate–acetyl-CoA route (Figures 19.10 and 19.12), which was enhanced by overexpressing native genes [47].
Because two native enzymes may catalyze the pyruvate decarboxylation step and five may catalyze the acetaldehyde oxidation step, various combinations of these enzymes were tested. One combination (Pdc2, Ald5, Acs1, and Acc1) improved the titer to 2.8 g l−1; this strain is referred to in this chapter as Y. lipolytica PAAA. Assuming that all cytosolic acetyl-CoA is supplied by this pathway and that glucose is the sole carbon substrate (Figure 19.12), the overall equation for converting glucose into TAL is

$$1.5\,\text{Glucose} + 5\,\text{ATP} \xrightarrow{\text{multiple steps}} \text{Triacetic acid lactone} + 3\,\text{CO}_2 + 6\,\text{NADH} \tag{19.12}$$
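Eq. 19.12 and the yields reported for strain PAAA can be checked with a short Python script. It balances carbon and available electrons (degree of reduction, taken here as 4C + H − 2O for C/H/O compounds, with 2 electrons per NADH) and computes the mass yield from standard atomic weights. ATP is not balanced, and the helper names are illustrative rather than from any library.

```python
# Check Eq. 19.12: 1.5 glucose + 5 ATP -> TAL + 3 CO2 + 6 NADH
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}  # g/mol

FORMULA = {
    "glucose": {"C": 6, "H": 12, "O": 6},
    "TAL": {"C": 6, "H": 6, "O": 3},   # triacetic acid lactone
    "CO2": {"C": 1, "H": 0, "O": 2},
}
NADH_ELECTRONS = 2  # electrons carried per NADH

def molar_mass(name: str) -> float:
    return sum(ATOMIC_MASS[el] * n for el, n in FORMULA[name].items())

def degree_of_reduction(name: str) -> int:
    # Available electrons per molecule of a C/H/O compound: 4C + H - 2O
    f = FORMULA[name]
    return 4 * f["C"] + f["H"] - 2 * f["O"]

carbon_in = 1.5 * FORMULA["glucose"]["C"]
carbon_out = FORMULA["TAL"]["C"] + 3 * FORMULA["CO2"]["C"]

electrons_in = 1.5 * degree_of_reduction("glucose")
electrons_out = (degree_of_reduction("TAL")
                 + 3 * degree_of_reduction("CO2")
                 + 6 * NADH_ELECTRONS)

# g TAL per g glucose if all glucose follows Eq. 19.12
theoretical_yield = molar_mass("TAL") / (1.5 * molar_mass("glucose"))
# Strain PAAA: 2.8 g/l TAL from 20 g/l glucose consumed
observed_yield = 2.8 / 20.0

print(f"carbon: {carbon_in} in, {carbon_out} out")
print(f"electrons: {electrons_in} in, {electrons_out} out")
print(f"theoretical yield: {theoretical_yield:.2f} g/g")
print(f"observed yield: {observed_yield:.2f} g/g "
      f"({observed_yield / theoretical_yield:.0%} of theoretical)")
```

Both balances close, and the script reproduces the theoretical yield of 0.47 g g−1 and the observed 0.14 g g−1 (30% of theoretical) quoted for this strain.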

19.5 Production of New Products

Figure 19.12 The metabolic pathway for producing triacetic acid lactone (TAL) from glucose. Thick blue arrows indicate reactions catalyzed by recombinant foreign enzymes; green arrows indicate native reactions; thick green arrows indicate native reactions enhanced through gene overexpression. Pdc2, pyruvate decarboxylase 2; Ald5, aldehyde dehydrogenase 5; Acs1, acetyl-CoA synthetase 1; G2ps1, TAL synthase; Acc1, acetyl-CoA carboxylase 1. The green numbers indicate the desired flux distribution at the branch point.

Based on Eq. 19.12, the theoretical yield is 0.47 g g−1. Y. lipolytica PAAA consumed 20 g l−1 glucose in the experiment, so the yield was 0.14 g g−1, or 30% of the theoretical yield. The performance of Y. lipolytica PAAA improved substantially when acetate was supplied as a co-substrate in addition to glucose [47]: the TAL titer increased to 4.9 g l−1 in culture tubes. Interestingly, a large fraction of the acetate remained unconsumed, suggesting that acetate may have promoted TAL production through protein acetylation, changes in the cellular redox state, and/or other regulatory mechanisms [47]. In a well-controlled bioreactor, 36 g l−1 TAL was produced from 180 g l−1 glucose and 13.7 g l−1 acetate. The overall productivity was 0.12 g l−1 h−1, sixfold higher than a previous report using S. cerevisiae. The process produced ∼20 g l−1 citrate as a byproduct.

19.5.3 Production of β-Carotene

Y. lipolytica has been engineered to produce many isoprenoids, such as linalool [48], lycopene [49], β-carotene [50], and astaxanthin [51]. Isoprenoids are a large family of natural products built from the C5 building blocks isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP). DMAPP can be obtained by isomerization of IPP, and in Y. lipolytica IPP is synthesized via the mevalonate pathway [50], which uses three acetyl-CoA to make one IPP in the cytosol (Figure 19.13). Many researchers have chosen Y. lipolytica for isoprenoid production because of its ample supply of cytosolic acetyl-CoA. In the mevalonate pathway, three acetyl-CoA are condensed in two steps into hydroxymethyl-glutaryl-CoA (HMG-CoA), which is then reduced to mevalonate using two NADPH (Figure 19.13). This step is catalyzed by HMG-CoA reductase (Hmgr) and is usually considered the rate-limiting step of the mevalonate pathway [52]. NAD-dependent Hmgr enzymes are also known [53]; if such an Hmgr were used exclusively, this step in Y. lipolytica would consume two NADH instead. Mevalonate is then phosphorylated twice, by two kinases, to form mevalonate diphosphate (MVAPP). The hydroxyl group of MVAPP is phosphorylated to facilitate its decarboxylation, and the newly formed phosphate group is eventually removed, forming IPP, which contains one C=C double bond (Figure 19.13).

β-Carotene is used as the example molecule here because the reported process produced grams per liter of the product [50]. β-Carotene is a C40 carotenoid and a precursor of many valuable isoprenoids, including retinoids [54] and strigolactones [55]. Three IPP and one DMAPP are assembled into one geranylgeranyl diphosphate (GGPP); this assembly can be catalyzed by a single enzyme, GGPP synthase (CrtE). Phytoene synthase (CrtB) joins two GGPP into one cis-phytoene, which is then isomerized and oxidized into lycopene, introducing four C=C double bonds per cis-phytoene. Both the isomerization and the desaturation can be catalyzed by phytoene desaturase (CrtI). Under aerobic conditions (Y. lipolytica is an obligate aerobe), CrtI uses molecular oxygen as the ultimate electron acceptor and reduces it to superoxide, which is quenched through an unknown mechanism [56]. In the last step, lycopene cyclase (CrtY) cyclizes lycopene into β-carotene.

[Figure 19.13 graphic: pathway from glucose via acetyl-CoA, acetoacetyl-CoA, HMG-CoA, mevalonate, MVAPP, IPP/DMAPP, GGPP, cis-phytoene, and lycopene to β-carotene; overall equation shown: 12 Glucose + 48 ATP → β-carotene + 32 CO2 + 36 NADH]

Figure 19.13 The metabolic pathway for producing β-carotene from glucose in Y. lipolytica. Thick blue arrows indicate reactions catalyzed by recombinant foreign enzymes; green arrows indicate native reactions; thick green arrows indicate native reactions enhanced through gene overexpression. HMG-CoA, hydroxymethyl-glutaryl-CoA; MVAPP, mevalonate diphosphate; IPP, isopentenyl diphosphate; DMAPP, dimethylallyl diphosphate; GGPP, geranylgeranyl diphosphate; Hmgr, HMG-CoA reductase; CrtE, GGPP synthase; CrtB, cis-phytoene synthase; CrtI, cis-phytoene desaturase; CrtY, lycopene β-cyclase. IPP, the building block of β-carotene, is supplied by the mevalonate pathway, which uses acetyl-CoA as substrate. All reactions take place in the cytosol or the ER.

If compartmentalization is not considered and Hmgr is assumed to use NADH, the overall equation for converting glucose into β-carotene is

$$12\,\text{Glucose} + 48\,\text{ATP} \xrightarrow{\text{multiple steps}} \beta\text{-carotene} + 32\,\text{CO}_2 + 36\,\text{NADH} \tag{19.13}$$

This process generates an excess of NADH, which is sufficient to regenerate the required ATP through respiration and oxidative phosphorylation.

When crtE, crtB, crtI, and crtY were expressed in a wild-type strain (Y. lipolytica Po1d), 18.4 mg l−1 β-carotene was produced [50]. The native crtE was expressed from promoter PPGM, and the Mucor circinelloides genes crtBY and crtI from promoters PGAPDH and PTEF1, respectively. Together, these three expression cassettes were termed the car-cassette. Because β-carotene consists only of carbon and hydrogen atoms, it is very hydrophobic; it cannot be exported from the cells and must therefore be stored intracellularly. Another reason for choosing Y. lipolytica as a β-carotene host was that its lipid bodies could serve as a hydrophobic reservoir in which β-carotene dissolves, enabling high titers. The authors [50] had constructed a lipid-producing strain, Y. lipolytica ob, derived from Po1d by overexpressing the native genes dga2 and gpd1 and deleting pox1–6 and tgl4. When the car-cassette was introduced into Y. lipolytica ob, 35.7 mg l−1 β-carotene was produced, 94% more than in the Po1d background. Overexpressing the rate-limiting gene of the mevalonate pathway (hmgr) from promoter PTEF (strain ob-CH) increased the titer by a further 240% to 121.6 mg l−1, confirming the importance of HMG-CoA reduction. Replacing all promoters in the car-cassette with PTEF gave the carTEF-cassette, which outperformed the car-cassette. Two copies of the carTEF-cassette were introduced into Y. lipolytica ob-CH, and the resulting strain (Y. lipolytica ob-CHCTEF CTEF) produced 454 mg l−1 β-carotene in shake flasks.

Y. lipolytica ob-CHCTEF CTEF was then used for medium optimization and bioreactor process engineering. YPD medium (yeast extract–peptone–dextrose, a rich medium) performed better than YNB-glucose medium (a chemically defined medium), yielding 2.9 g l−1 β-carotene in a bioreactor. Increasing the yeast extract and peptone concentrations to 20 and 40 g l−1, respectively, raised the β-carotene titer to 6.5 g l−1 [50]. In this process, 180 g l−1 glucose was consumed and 40 g l−1 TAG was produced, with almost no citrate formed.
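The carbon side of Eq. 19.13 can be traced step by step with a small Python ledger. It follows the stoichiometry given for Figure 19.13 (two acetyl-CoA per glucose with one CO2 lost per pyruvate, three acetyl-CoA and one CO2 released per IPP, and eight C5 units per β-carotene) and deliberately leaves ATP and NADH out; the class and function names are hypothetical.

```python
from dataclasses import dataclass

GLUCOSE_C = 6
BETA_CAROTENE_C = 40  # C40H56

@dataclass
class CarbonLedger:
    acetyl_coa: int
    c5_units: int       # IPP, part of which is isomerized to DMAPP
    beta_carotene: int
    co2: int

def glucose_to_beta_carotene(n_glucose: int) -> CarbonLedger:
    # Assumed, following Figure 19.13: 1 glucose -> 2 acetyl-CoA, one CO2 lost per pyruvate
    acetyl_coa = 2 * n_glucose
    co2 = acetyl_coa
    # Mevalonate pathway: 3 acetyl-CoA -> 1 IPP, one CO2 lost at MVAPP decarboxylation
    if acetyl_coa % 3:
        raise ValueError("acetyl-CoA must come in multiples of 3")
    c5_units = acetyl_coa // 3
    co2 += c5_units
    # 2 GGPP (each 3 IPP + 1 DMAPP) -> cis-phytoene -> lycopene -> beta-carotene
    if c5_units % 8:
        raise ValueError("C5 units must come in multiples of 8")
    return CarbonLedger(acetyl_coa, c5_units, c5_units // 8, co2)

ledger = glucose_to_beta_carotene(12)
carbon_in = 12 * GLUCOSE_C
carbon_out = ledger.beta_carotene * BETA_CAROTENE_C + ledger.co2
print(ledger)
print(f"carbon in: {carbon_in}, carbon out: {carbon_out}")
```

For 12 glucose the ledger gives 24 acetyl-CoA, 8 C5 units, one β-carotene, and 32 CO2, so the 72 carbon atoms of glucose are recovered as 40 in β-carotene and 32 in CO2.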

19.6 Opportunities and Challenges

Y. lipolytica is not new to industrial biotechnology or to microbiologists who study the basic cell biology of nonconventional yeasts. In the past decade, however, it has rapidly gained popularity among metabolic engineers in both academic institutions and companies. The species is attractive partly because of (i) its long history of safe use in food preparation and chemical production, (ii) the wide range of carbon substrates it can utilize, (iii) its efficient pathways for supplying acetyl-CoA in the cytosol, (iv) its tolerance to waste streams that inhibit many other species, and (v) its ability to form hydrophobic lipid bodies that can serve as storage reservoirs for other hydrophobic products. The number of new products made with this species will undoubtedly continue to increase rapidly.

All three examples in Section 19.5 show that integrating multiple copies of key expression cassettes into the genome gave better results than a single copy. In each example, every copy was inserted in a separate step, and each step consisted of a random integration operation followed by a marker recycling operation. This procedure is time- and labor-intensive, because a large number of colonies must be screened after each random integration to isolate a top producer. Methods exist that integrate multiple copies of an expression cassette into the Y. lipolytica genome in a single operation [57, 58]; incorporating such methods into the strain development workflow should substantially reduce project time and cost.

High-throughput screening methods or devices could be combined with these advances in multicopy integration to achieve a synergistic effect. Currently, most screening efforts assess strain performance by product titer, possibly because this requires only one data point per strain. Product yield, however, is a very important performance indicator, as discussed in Section 19.4.4, and substantially affects the economic viability of almost any process. New screening procedures should therefore also determine product yield, even though this is more difficult because it requires at least two data points per strain (product titer and residual substrate concentration).
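A minimal Python sketch of ranking strains by yield rather than titer, assuming each screening culture records the product titer, the initial substrate concentration (known from the medium), and the residual substrate concentration. The strain names and numbers are hypothetical, chosen only to show that the two rankings can disagree.

```python
from typing import NamedTuple

class ScreenResult(NamedTuple):
    strain: str
    titer: float            # product, g/l
    substrate_start: float  # substrate supplied, g/l
    substrate_left: float   # residual substrate, g/l

    def yield_g_per_g(self) -> float:
        # Yield = product formed / substrate consumed
        consumed = self.substrate_start - self.substrate_left
        if consumed <= 0:
            raise ValueError(f"{self.strain}: no substrate consumed")
        return self.titer / consumed

# Hypothetical screening data (not measured values)
RESULTS = [
    ScreenResult("strain-A", titer=3.0, substrate_start=40.0, substrate_left=0.0),
    ScreenResult("strain-B", titer=2.5, substrate_start=40.0, substrate_left=25.0),
    ScreenResult("strain-C", titer=1.8, substrate_start=40.0, substrate_left=30.0),
]

by_titer = [r.strain for r in sorted(RESULTS, key=lambda r: r.titer, reverse=True)]
by_yield = [r.strain for r in sorted(RESULTS, key=lambda r: r.yield_g_per_g(), reverse=True)]

print("ranked by titer:", by_titer)
print("ranked by yield:", by_yield)
for r in RESULTS:
    print(f"{r.strain}: {r.titer} g/l, {r.yield_g_per_g():.3f} g/g")
```

Here the strain with the lowest titer has the highest yield; a titer-only screen would rank it last.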
Another promising area is the application of nanopore sequencing to Y. lipolytica genome engineering. In the EPA example [21], determining where an expression cassette had been inserted led to the discovery of the importance of pex10 and to an understanding of the related cell biology. That investigation used genome walking, which can reveal only the regions close to the integration site. Although Illumina sequencing can be used to sequence all chromosomes of Y. lipolytica [36], its long turnaround time makes it difficult to use the genome sequence to guide subsequent engineering in a timely manner. A basic nanopore sequencing device is affordable for many academic labs and allows users to perform whole-genome sequencing of yeast themselves within days [59]. The resulting information would shed light on why a strain worked well or failed, by identifying unexpected gene duplications, deletions, and/or mutations. Once a typical laboratory can routinely sequence large numbers of Y. lipolytica strains, the loci to which expression cassettes are frequently integrated (integration hot spots) and new integration loci that lead to higher expression levels could be identified. This information would improve understanding of NHEJ-based random integration and/or guide HR-based targeted genome editing, which in principle does not require screening a large number of transformants. Together, these efforts would enable faster and cheaper development of better Y. lipolytica strains.
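Once insertion sites have been mapped (for example by aligning nanopore reads to a reference genome, a step not shown here), candidate integration hot spots can be found by counting how many independent strains carry an insertion in the same genomic window. The Python sketch below uses fixed, non-overlapping windows; the window size, threshold, chromosome labels, and coordinates are all hypothetical.

```python
from collections import defaultdict

def integration_hotspots(sites, window_bp=10_000, min_strains=2):
    """Return (chromosome, start, end, n_strains) for windows hit in >= min_strains strains.

    sites: iterable of (strain, chromosome, position) tuples, one per mapped insertion.
    """
    strains_per_window = defaultdict(set)
    for strain, chrom, pos in sites:
        # A set counts each strain at most once per window
        strains_per_window[(chrom, pos // window_bp)].add(strain)
    hotspots = [
        (chrom, idx * window_bp, (idx + 1) * window_bp, len(strains))
        for (chrom, idx), strains in strains_per_window.items()
        if len(strains) >= min_strains
    ]
    # Most frequently hit windows first, then by genomic coordinate
    return sorted(hotspots, key=lambda h: (-h[3], h[0], h[1]))

# Hypothetical insertion sites (strain, chromosome, position in bp)
SITES = [
    ("strain1", "chrA", 120_500),
    ("strain2", "chrA", 124_900),
    ("strain3", "chrA", 128_000),
    ("strain1", "chrC", 50_000),
    ("strain4", "chrC", 55_000),
    ("strain2", "chrE", 700_000),
]

for hit in integration_hotspots(SITES):
    print(hit)
```

Fixed windows are the simplest choice; a cluster that straddles a window boundary is split, so sliding windows or distance-based clustering would be more robust in practice.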

References

1 Abdel-Mawgoud, A.M., Markham, K.A., Palmer, C.M. et al. (2018). Metabolic engineering in the host Yarrowia lipolytica. Metab. Eng. 50: 192–208.
2 Bakkaiova, J., Arata, K., Matsunobu, M. et al. (2014). The strictly aerobic yeast Yarrowia lipolytica tolerates loss of a mitochondrial DNA-packaging protein. Eukaryot. Cell 13 (9): 1143–1157.
3 Harzevili, F.D. (2014). Biotechnological Applications of the Yeast Yarrowia lipolytica. Springer.
4 Lazar, Z., Liu, N., and Stephanopoulos, G. (2018). Holistic approaches in lipid production by Yarrowia lipolytica. Trends Biotechnol. 36 (11): 1157–1170.
5 Barns, S.M., Lane, D.J., Sogin, M.L. et al. (1991). Evolutionary relationships among pathogenic Candida species and relatives. J. Bacteriol. 173 (7): 2250–2255.
6 Mekouar, M., Blanc-Lenfle, I., Ozanne, C. et al. (2010). Detection and analysis of alternative splicing in Yarrowia lipolytica reveal structural constraints facilitating nonsense-mediated decay of intron-retaining transcripts. Genome Biol. 11 (6): R65.
7 Buckholz, R.G. and Gleeson, M.A. (1991). Yeast systems for the commercial production of heterologous proteins. Biotechnology 9 (11): 1067–1072.
8 Groenewald, M., Boekhout, T., Neuveglise, C. et al. (2014). Yarrowia lipolytica: safety assessment of an oleaginous yeast with a great industrial potential. Crit. Rev. Microbiol. 40 (3): 187–206.
9 Barth, G. and Gaillardin, C. (1997). Physiology and genetics of the dimorphic fungus Yarrowia lipolytica. FEMS Microbiol. Rev. 19 (4): 219–237.
10 Zieniuk, B. and Fabiszewska, A. (2018). Yarrowia lipolytica: a beneficious yeast in biotechnology as a rare opportunistic fungal pathogen: a minireview. World J. Microbiol. Biotechnol. 35 (1): 10.
11 Chen, D.C., Beckerich, J.M., and Gaillardin, C. (1997). One-step transformation of the dimorphic yeast Yarrowia lipolytica. Appl. Microbiol. Biotechnol. 48 (2): 232–235.
12 Madzak, C. and Beckerich, J.-M. (2013). Heterologous protein expression and secretion in Yarrowia lipolytica. In: Yarrowia lipolytica (ed. G. Barth), 1–76. Springer.
13 Michalik, B., Biel, W., Lubowicki, R., and Jacyno, E. (2014). Chemical composition and biological value of proteins of the yeast Yarrowia lipolytica growing on industrial glycerol. Can. J. Anim. Sci. 94 (1): 99–104.
14 Finogenova, T., Morgunov, I., Kamzolova, S., and Chernyavskaya, O. (2005). Organic acid production by the yeast Yarrowia lipolytica: a review of prospects. Appl. Biochem. Microbiol. 41 (5): 418–425.
15 Dobrowolski, A., Mitula, P., Rymowicz, W., and Mironczuk, A.M. (2016). Efficient conversion of crude glycerol from various industrial wastes into single cell oil by yeast Yarrowia lipolytica. Bioresour. Technol. 207: 237–243.
16 Cui, Z., Gao, C., Li, J. et al. (2017). Engineering of unconventional yeast Yarrowia lipolytica for efficient succinic acid production from glycerol at low pH. Metab. Eng. 42: 126–133.

17 Tai, M. and Stephanopoulos, G. (2013). Engineering the push and pull of lipid biosynthesis in oleaginous yeast Yarrowia lipolytica for biofuel production. Metab. Eng. 15: 1–9.
18 Cui, Z., Jiang, X., Zheng, H. et al. (2019). Homology-independent genome integration enables rapid library construction for enzyme expression and pathway optimization in Yarrowia lipolytica. Biotechnol. Bioeng. 116 (2): 354–363.
19 Shi, T.Q., Huang, H., Kerkhoven, E.J., and Ji, X.J. (2018). Advancing metabolic engineering of Yarrowia lipolytica using the CRISPR/Cas system. Appl. Microbiol. Biotechnol. 102 (22): 9541–9548.
20 Verbeke, J., Beopoulos, A., and Nicaud, J.M. (2013). Efficient homologous recombination with short length flanking fragments in Ku70 deficient Yarrowia lipolytica strains. Biotechnol. Lett. 35 (4): 571–576.
21 Xue, Z., Sharpe, P.L., Hong, S.P. et al. (2013). Production of omega-3 eicosapentaenoic acid by metabolic engineering of Yarrowia lipolytica. Nat. Biotechnol. 31 (8): 734–740.
22 Schwartz, C., Shabbir-Hussain, M., Frogue, K. et al. (2017). Standardized markerless gene integration for pathway engineering in Yarrowia lipolytica. ACS Synth. Biol. 6 (3): 402–409.
23 Blazeck, J., Liu, L., Redden, H., and Alper, H. (2011). Tuning gene expression in Yarrowia lipolytica by a hybrid promoter approach. Appl. Environ. Microbiol. 77 (22): 7905–7914.
24 Yin, X., Madzak, C., Du, G. et al. (2012). Enhanced alpha-ketoglutaric acid production in Yarrowia lipolytica WSH-Z06 by regulation of the pyruvate carboxylation pathway. Appl. Microbiol. Biotechnol. 96 (6): 1527–1537.
25 Ajikumar, P.K., Xiao, W.H., Tyo, K.E. et al. (2010). Isoprenoid pathway optimization for taxol precursor overproduction in Escherichia coli. Science 330 (6000): 70–74.
26 Carsanba, E., Papanikolaou, S., Fickers, P. et al. (2019). Citric acid production by Yarrowia lipolytica. In: Non-conventional Yeasts: From Basic Research to Application (ed. A. Sibirny), 91–117. Springer.
27 Sabra, W., Bommareddy, R.R., Maheshwari, G. et al. (2017). Substrates and oxygen dependent citric acid production by Yarrowia lipolytica: insights through transcriptome and fluxome analyses. Microb. Cell Fact. 16 (1): 78.
28 Holz, M., Otto, C., Kretzschmar, A. et al. (2011). Overexpression of alpha-ketoglutarate dehydrogenase in Yarrowia lipolytica and its effect on production of organic acids. Appl. Microbiol. Biotechnol. 89 (5): 1519–1526.
29 Zhou, J., Zhou, H., Du, G. et al. (2010). Screening of a thiamine-auxotrophic yeast for alpha-ketoglutaric acid overproduction. Lett. Appl. Microbiol. 51 (3): 264–271.
30 Chernyavskaya, O.G., Shishkanova, N.V., Il’chenko, A.P., and Finogenova, T.V. (2000). Synthesis of alpha-ketoglutaric acid by Yarrowia lipolytica yeast grown on ethanol. Appl. Microbiol. Biotechnol. 53 (2): 152–158.
31 Yuzbashev, T.V. et al. (2010). Production of succinic acid at low pH by a recombinant strain of the aerobic yeast Yarrowia lipolytica. Biotechnol. Bioeng. 107 (4): 673–682.

32 Gao, C. et al. (2016). Robust succinic acid production from crude glycerol using engineered Yarrowia lipolytica. Biotechnol. Biofuels 9 (1): 179.
33 Fickers, P. et al. (2005). Hydrophobic substrate utilisation by the yeast Yarrowia lipolytica, and its potential applications. FEMS Yeast Res. 5 (6–7): 527–543.
34 Wilfling, F. et al. (2013). Triacylglycerol synthesis enzymes mediate lipid droplet growth by relocalizing from the ER to lipid droplets. Dev. Cell 24 (4): 384–399.
35 Koch, B., Schmidt, C., and Daum, G. (2014). Storage lipids of yeasts: a survey of nonpolar lipid metabolism in Saccharomyces cerevisiae, Pichia pastoris, and Yarrowia lipolytica. FEMS Microbiol. Rev. 38 (5): 892–915.
36 Qiao, K. et al. (2015). Engineering lipid overproduction in the oleaginous yeast Yarrowia lipolytica. Metab. Eng. 29: 56–65.
37 Qiao, K. et al. (2017). Lipid production in Yarrowia lipolytica is maximized by engineering cytosolic redox metabolism. Nat. Biotechnol. 35 (2): 173–177.
38 Wasylenko, T.M., Ahn, W.S., and Stephanopoulos, G. (2015). The oxidative pentose phosphate pathway is the primary source of NADPH for lipid overproduction from glucose in Yarrowia lipolytica. Metab. Eng. 30: 27–39.
39 Hu, P. et al. (2016). Integrated bioprocess for conversion of gaseous substrates to liquids. Proc. Natl. Acad. Sci. U S A 113 (14): 3773–3778.
40 Li, Y. et al. (2015). Acetic acid production from food wastes using yeast and acetic acid bacteria micro-aerobic fermentation. Bioprocess Biosyst. Eng. 38 (5): 863–869.
41 Liu, N., Qiao, K., and Stephanopoulos, G. (2016). 13C metabolic flux analysis of acetate conversion to lipids by Yarrowia lipolytica. Metab. Eng. 38: 86–97.
42 Park, J.O. et al. (2019). Synergistic substrate cofeeding stimulates reductive metabolism. Nat. Metab. 1 (6): 643–651.
43 Xu, J. et al. (2017). Application of metabolic controls for the maximization of lipid production in semicontinuous fermentation. Proc. Natl. Acad. Sci. U S A 114 (27): E5308–E5316.
44 Mano, J. et al. (2020). Engineering Yarrowia lipolytica for the utilization of acid whey. Metab. Eng. 57: 43–50.
45 Xu, P. et al. (2016). Engineering Yarrowia lipolytica as a platform for synthesis of drop-in transportation fuels and oleochemicals. Proc. Natl. Acad. Sci. U S A 113 (39): 10848–10853.
46 Markham, K.A. and Alper, H.S. (2018). Synthetic biology expands the industrial potential of Yarrowia lipolytica. Trends Biotechnol. 36 (10): 1085–1095.
47 Markham, K.A. et al. (2018). Rewiring Yarrowia lipolytica toward triacetic acid lactone for materials generation. Proc. Natl. Acad. Sci. U S A 115 (9): 2096–2101.
48 Cao, X. et al. (2017). Enhancing linalool production by engineering oleaginous yeast Yarrowia lipolytica. Bioresour. Technol. 245 (Pt B): 1641–1644.
49 Matthaus, F. et al. (2014). Production of lycopene in the non-carotenoid-producing yeast Yarrowia lipolytica. Appl. Environ. Microbiol. 80 (5): 1660–1669.

50 Larroude, M. et al. (2018). A synthetic biology approach to transform Yarrowia lipolytica into a competitive biotechnological producer of beta-carotene. Biotechnol. Bioeng. 115 (2): 464–472.
51 Kildegaard, K.R. et al. (2017). Engineering of Yarrowia lipolytica for production of astaxanthin. Synth. Syst. Biotechnol. 2 (4): 287–294.
52 Ro, D.K. et al. (2006). Production of the antimalarial drug precursor artemisinic acid in engineered yeast. Nature 440 (7086): 940–943.
53 Ma, S.M. et al. (2011). Optimization of a heterologous mevalonate pathway through the use of variant HMG-CoA reductases. Metab. Eng. 13 (5): 588–597.
54 Jang, H.J. et al. (2011). Retinoid production using metabolically engineered Escherichia coli with a two-phase culture system. Microb. Cell Fact. 10: 59.
55 Saeed, W., Naseem, S., and Ali, Z. (2017). Strigolactones biosynthesis and their role in abiotic stress resilience in plants: a critical review. Front. Plant Sci. 8: 1487.
56 Schaub, P. et al. (2012). On the structure and function of the phytoene desaturase CRTI from Pantoea ananatis, a membrane-peripheral and FAD-dependent oxidase/isomerase. PLoS One 7 (6): e39550.
57 Juretzek, T. et al. (2001). Vectors for gene expression and amplification in the yeast Yarrowia lipolytica. Yeast 18 (2): 97–113.
58 Lv, Y. et al. (2019). Combining 26S rDNA and the Cre-loxP system for iterative gene integration and efficient marker curation in Yarrowia lipolytica. ACS Synth. Biol. 8 (3): 568–576.
59 Kono, N. and Arakawa, K. (2019). Nanopore sequencing: review of potential applications in functional genomics. Dev. Growth Differ. 61 (5): 316–326.

20 Metabolic Engineering of Filamentous Fungi

Vera Meyer

Chair of Applied and Molecular Microbiology, Technische Universität Berlin, Berlin, Germany

20.1 Introduction

In 1917, the chemist James Currie published a general equation for the metabolism of the filamentous fungus Aspergillus niger [1]:

Carbohydrate → Citric acid → Oxalic acid → Carbon dioxide → Mycelium

He speculated that "This reaction can be controlled to a very considerable extent" by varying the nature and quantity of the carbon and nitrogen sources supplied in the medium. He confirmed this assumption in this groundbreaking publication and concluded that "the conditions most favorable for a high yield of the end-products, carbon dioxide and mycelium, are least favorable for the formation of the intermediate products, citric and oxalic acids" [1]. In this study, Currie identified many factors supporting high citric acid production and thereby laid the foundation both for A. niger to become the pioneer fungus of industrial organic acid fermentation and for the birth of modern biotechnology exploiting filamentous fungi [2]. His publication ends with the following statement:

The painstaking investigation of all the conditions favoring the production of such substances will lay the only sure foundations for the development of a chemical fermentation industry. It is the hope of the writer that the work here recorded may prove a definite contribution to this much neglected but promising field of scientific endeavor. [1]

A hundred years later, not only A. niger but many other filamentous fungi are used as cell factories in very diverse industrial sectors, including the chemical, biofuel, textile, pharmaceutical, and food industries, to name but a few. Currie's hope has thus come true. The natural metabolic capacities of filamentous fungi are recognized as extraordinarily diverse, are much better understood today, and are purposefully harnessed to produce primary and secondary metabolites, proteins and enzymes, food and vitamins, and even composite materials and vegan leather (Table 20.1).
Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.


Table 20.1 List of some filamentous fungal cell factories and their products.

Filamentous fungus | Important product(s)
Acremonium chrysogenum | β-Lactam antibiotics (cephalosporins)
Aspergillus niger | Enzymes (glucoamylase, proteases, phytases, glucose oxidase); organic acids (citric acid, gluconic acid)
Aspergillus oryzae | Enzymes (amylases)
Aspergillus terreus | Enzymes (xylanases); organic acids (itaconic acid); secondary metabolites (lovastatin)
Blakeslea trispora | Vitamins (β-carotene)
Fusarium venenatum | Mycoprotein
Ganoderma lucidum | Composite materials (packaging material, construction material); imitation leather
Penicillium chrysogenum | β-Lactam antibiotics (penicillins); enzymes (glucose oxidase)
Pleurotus ostreatus | Composite materials (packaging material, construction material)
Thermothelomyces thermophilus | Enzymes (cellulases, phytases, laccases)
Trichoderma reesei | Enzymes (cellulases, hemicellulases)

Note that T. thermophilus was formerly named Myceliophthora thermophila and P. chrysogenum was recently renamed Penicillium rubens. Source: Modified after Cairns et al. [3].

Fungal biotechnology, which harnesses the metabolic activities of filamentous fungi, has thus established itself as an essential platform technology for numerous branches of industry and shapes our daily life and lifestyle in largely invisible ways. It offers not only exciting solutions for the transition from our current petroleum-based economy to a sustainable, bio-based circular economy but also new concepts for meeting the increasing food demand of a growing human population ([4] and see below). Many of the naturally existing metabolic activities of filamentous fungi have been elucidated and leveraged in strain optimization programs to obtain more efficient cell factories [5]. If everything already seems perfect, why consider metabolic engineering of filamentous fungi at all? The short answers are: (i) most molecular mechanisms determining the efficiency of substrate utilization and product formation are far from fully understood; (ii) a confounding factor that limits the productivity of filamentous fungal cell factories is their morphological development under submerged cultivation conditions (Figure 20.1); and (iii) the concept of enlarging the product portfolio of a single filamentous fungus to create a multipurpose cell factory, i.e. a one-size-fits-all solution, has not yet been fully explored. Detailed answers are given in the following sections, using the examples of six established filamentous fungal cell factories: A. niger, A. oryzae, A. terreus, P. chrysogenum, T. reesei, and T. thermophilus.

Figure 20.1 Morphologies adopted by filamentous fungi under submerged cultivation conditions, the most common fermentation method in industry. Different branching frequencies of the growing mycelium lead to different macromorphologies. These are visible to the naked eye and range from pellets (lower left), whose diameter can be from several hundred micrometers up to several centimeters, through loose clumps (not shown), to dispersed morphologies that make the culture broth in a bioreactor very viscous (lower right). The example of A. niger is shown; its hyphae are about 3.5–4 μm in diameter. Panel labels: high branching rate; low branching rate. Source: Vera Meyer.

These species were selected because their biotechnological products cover most of the product range offered by filamentous fungi, and because most current metabolic engineering efforts focus on them. Despite the progress made, metabolic engineering of filamentous fungi is still in its infancy. A PubMed search in 2019 for the keyword "Metabolic engineering" combined with either "Saccharomyces cerevisiae" or "Escherichia coli" retrieved about 2000 or 4000 articles, respectively, whereas the number of articles for filamentous fungi ranged from 5 (T. thermophilus) through 40–60 (A. oryzae, A. terreus, P. chrysogenum, T. reesei) to about 100 for A. niger. One reason for this discrepancy is that the genome sequences of filamentous fungi became available only about a decade after those of the model unicellular fungus S. cerevisiae (1996, [17]) and the model bacterium E. coli (1997, [18]) were released to the public. In addition, the genomes of filamentous fungi contain far more genes: a filamentous fungal genome usually carries between 9000 and 14 000 genes (Table 20.2), whereas E. coli can live with about 4000 [18] and S. cerevisiae with about 6000 [17] genes. Finally, the research communities studying filamentous fungi are considerably smaller. A recent mapping of research on A. niger uncovered a network of about 30 research labs worldwide [2], whereas more than 1800 research labs studying S. cerevisiae are registered at the Saccharomyces Genome Database [19].

This chapter discusses the current state of the art of metabolic engineering in filamentous fungi, using the definition of metabolic engineering given by the journal Nature:

Metabolic engineering is the use of genetic engineering to modify the metabolism of an organism.
It can involve the optimization of existing biochemical pathways or the introduction of pathway components, most commonly in bacteria, yeast or plants, with the goal of high-yield production of specific metabolites for medicine or biotechnology. [20]

20 Metabolic Engineering of Filamentous Fungi

Table 20.2 Filamentous fungal cell factories with available genome sequence data and CRISPR genome editing tools.

| Strain | First genome published | No. of predicted genes | First CRISPR tool published | References |
|---|---|---|---|---|
| A. niger | 2007 | ∼14 000 | 2015 | [6, 7] |
| A. oryzae | 2005 | ∼12 000 | 2016 | [8, 9] |
| A. terreus | 2005 | ∼10 000 | | |
| P. chrysogenum | 2008 | ∼13 000 | 2016 | [10, 11] |
| T. thermophilus | 2011 | ∼9 000 | 2017 | [12, 13] |
| T. reesei | 2008 | ∼9 000 | 2015 | [14, 15] |

Note that only the publications reporting a genome sequence or a CRISPR tool for the six fungal cell factories for the first time have been cited here. The reader is directed to [16] for a recent review of the implementation of different CRISPR protocols for filamentous fungi. Note also that T. thermophilus was formerly named M. thermophila and P. chrysogenum was recently renamed P. rubens.

This definition already indicates that, in the eyes of the metabolic engineering community, filamentous fungi play only a minor role as promising and powerful metabolite producers. However, industrially exploited filamentous fungi are often superior to bacterial and yeast cell factories with regard to robustness under harsh industrial cultivation conditions, metabolic versatility, and secretory capacity [4]. This chapter will, thus, focus on the development and implementation of genetic tools for filamentous fungi (Section 20.2), the establishment of metabolic and regulatory models (Section 20.3), and engineering strategies for improved substrate utilization (Section 20.4), enhanced product formation (Section 20.5), and new product developments (Section 20.6). It then discusses current strategies to engineer and control the development of macromorphologies in filamentous fungi (Section 20.7) and concludes with developments in the metabolic engineering of filamentous fungi expected in the near future (Section 20.8).

20.2 Development and Implementation of Genetic and Genome Tools

Molecular studies with filamentous fungi have long been considered difficult and painstaking because of their low growth rates compared to bacteria and yeast, the lack of efficient genetic transformation systems, the absence of versatile dominant or auxotrophic selection markers, and poor transformation rates. This stands diametrically opposed to the needs of bioengineers, who consider
fast and efficient genetic manipulation tools a fundamental prerequisite for metabolic engineering. Consequently, unicellular and easier-to-handle bacterial and yeast systems tend to be the default, although filamentous fungal systems may be the rational choice for many applications [4]. Fortunately, the last 10–15 years have witnessed a revolution in molecular tools and technologies for filamentous fungi. Several efficient transformation techniques, a broad range of selection markers, and a variety of constitutive or inducible expression systems are now available for filamentous fungal cell factories [21, 22], making them easy to handle for staff trained in working with fungi. In addition, the lengthy screening for transformants carrying a homologous integration has been streamlined by the use of recipient strains that are defective in nonhomologous recombination [23]. Hence, knock-out, knock-in, gene replacement, and conditional expression of any gene of interest have become routine, and a filamentous fungus carrying the intended genetic modification can be obtained within approximately a week. Simultaneous expression of all genes of a complete biosynthetic pathway from a polycistronic expression cassette has also been shown to be feasible in filamentous fungi and has been used to produce bioactive secondary metabolites, such as the antibiotics penicillin and enniatin as well as the insecticidal austinoids, in different Aspergilli [24–27]. In addition, genome sequences have been published for hundreds of filamentous fungi, including the most relevant industrial cell factories (Table 20.2), and the genomic datasets for filamentous fungi are continuously expanding and being made accessible to the research community through databases such as FungiDB [28, 29], MycoCosm [30], and Ensembl [31]. The genomes of 17 different A. niger strains, for example, have been sequenced since the first A. niger genome became available in 2007 [6, 32–39]. The availability of this wealth of genome sequence data, combined with the recent implementation of a rich and diverse set of CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) genome editing protocols for filamentous fungi (Table 20.2) that are also compatible with microtiter plate methods [40], will probably usher in a new era of genetic and metabolic engineering for filamentous fungi. Fast and efficient introduction of targeted single-point or large chromosomal mutations, or rewiring of a biosynthetic pathway in filamentous fungi, is no longer fantasy but reality. A current challenge, however, is the quality of genome data for filamentous fungi, which can vary greatly. Poor-quality data impair the outcome of comparative genomics studies as well as the efficiency of genetic and metabolic engineering efforts. The research community therefore advocates several recommendations and protocols for individual researchers to enable sustainable use and reuse of published genomics data and, more broadly, of transcriptomics, proteomics, and metabolomics data [4, 41, 42]. Unfortunately, no gold standard genome (i.e. a near error-free and near gapless reference genome that can be used to map the genomes of closely related organisms) has yet been published for a filamentous fungus. However, a gold standard genome for a lab strain of A. niger is likely to be released soon [43].
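As an illustration of a first step shared by many CRISPR workflows, the following Python sketch lists candidate target sites in a DNA sequence. It assumes the widely used SpCas9 nuclease with a 20-nt spacer and an NGG protospacer-adjacent motif (PAM); the protocols cited in Table 20.2 may use other nucleases or PAMs, and the function names are hypothetical. Real guide design additionally scores off-target matches across the whole genome, which is one reason why the quality of the underlying genome assembly matters.

```python
# Hypothetical helper for illustration; assumes SpCas9 with an NGG PAM.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, spacer_len: int = 20) -> list:
    """List candidate protospacers that are directly followed by an NGG PAM.

    Returns (strand, start, protospacer, pam) tuples. `start` is the 0-based
    index of the protospacer on the strand where it was found, so minus-strand
    positions refer to the reverse complement.
    """
    seq = seq.upper()
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # i is the first base of the 3-nt PAM, which lies 3' of the protospacer.
        for i in range(spacer_len, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                sites.append((strand, i - spacer_len, s[i - spacer_len:i], pam))
    return sites


# Usage: a 20-nt stretch followed by TGG yields one forward-strand site.
print(find_spcas9_sites("A" * 20 + "TGG"))
```

Scanning the reverse complement as a second "forward" strand keeps the PAM logic identical for both strands; the price is that minus-strand coordinates must be converted back if they are needed on the plus strand.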

20.3 Metabolic and Regulatory Models

Next to gene annotation and genetic engineering, metabolic and regulatory models are of crucial importance for metabolic engineering. They predict how genetic perturbations and changes in the ambient medium affect growth and metabolite production and can, thus, guide metabolic engineering efforts. Genome sequences generally form the basis for draft genome-scale metabolic models (GEMs), in which genes are assigned to metabolic pathways and experimental data are integrated into a structured framework. Probably the best fungal GEM, continuously curated in a community-driven effort, is available for the yeast model S. cerevisiae; it can even be used to predict phenotypic traits of single point mutations [48]. The best validated GEM for a filamentous fungus available so far is that for A. niger (Table 20.3). The strength of this community-driven GEM is that it integrates the experimental knowledge from nearly 900 publications and, thus, represents the best experimentally supported model currently available for a filamentous fungus. It is a true consensus model, as it integrates information from former models (e.g. [49]). Furthermore, it provides individual strain-specific models for the most commonly used A. niger strains (CBS 513.88 and ATCC 1015) and can easily be used for gene-protein association studies based on genomic and transcriptomic data [44]. Notably, existing GEMs can be used as templates to reconstruct GEMs for related species [50]. Older GEMs for A. niger have, thus, been used to develop GEMs for P. chrysogenum [46] and T. reesei [51] (Table 20.3). A convenient software suite for the semiautomated reconstruction, simulation, and curation of GEMs is the RAVEN toolbox, which is freely available on GitHub [52]. An alternative is the metabolic model reconstruction algorithm CoReCo, which has recently been improved and applied to T. reesei and 55 other fungi [53].
The code for this pipeline can be downloaded from GitHub [54] or the BioModels database [55]. A confounding factor when harnessing genome data is that a considerably high number of genes are "hypothetical", i.e. they lack functional predictions.

Table 20.3 Filamentous fungal cell factories with curated genome-scale metabolic models (GEMs).

| Strain | Year | GEM | References |
|---|---|---|---|
| A. niger | 2018 | Covers 2320 reactions and 1325 genes | [44] |
| A. oryzae | 2014 | Covers 2453 reactions and 820 genes | [45] |
| A. terreus | 2014 | Covers 2401 reactions and 794 genes | [45] |
| P. chrysogenum | 2013 | Covers 1471 reactions and 1006 genes | [46] |
| T. thermophilus | — | | |
| T. reesei | 2016 | Covers 3926 reactions and 697 genes | [45] |

Note that only the most recent models are referred to. The reader is directed to [47] for a recent review of the current state of genome-scale reconstruction in filamentous fungi.
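GEMs such as those in Table 20.3 connect genes to reactions through Boolean gene-protein-reaction (GPR) rules, which is what allows gene deletions to be translated into blocked reactions. The Python sketch below is a minimal illustration, not the rule format used by RAVEN, CoReCo, or any published model: it encodes GPR rules as nested tuples and lists the reactions that a set of gene deletions would block. All gene and reaction IDs are hypothetical placeholders. In a full GEM, such blocked reactions would then be constrained to zero flux before growth is simulated.

```python
from typing import Union

# A GPR rule is a gene ID or a tuple ("and" | "or", rule, rule, ...).
Rule = Union[str, tuple]


def is_active(rule: Rule, present: set) -> bool:
    """Evaluate a gene-protein-reaction (GPR) rule for the genes present.

    "and" models an enzyme complex (all subunits required);
    "or" models isoenzymes (any one suffices).
    """
    if isinstance(rule, str):
        return rule in present
    op, *args = rule
    if op == "and":
        return all(is_active(arg, present) for arg in args)
    if op == "or":
        return any(is_active(arg, present) for arg in args)
    raise ValueError(f"unknown GPR operator: {op!r}")


def blocked_reactions(gprs: dict, genome: set, deleted: set) -> list:
    """Reactions left without any catalyzing gene product after deletions."""
    present = genome - deleted
    return sorted(rxn for rxn, rule in gprs.items() if not is_active(rule, present))


# Hypothetical toy model; gene and reaction IDs are placeholders.
GPRS = {
    "R_complex": ("and", "geneA", "geneB"),  # two-subunit enzyme
    "R_isozyme": ("or", "geneC", "geneD"),   # two isoenzymes
    "R_single": "geneE",
}
GENOME = {"geneA", "geneB", "geneC", "geneD", "geneE"}

print(blocked_reactions(GPRS, GENOME, {"geneA"}))           # -> ['R_complex']
print(blocked_reactions(GPRS, GENOME, {"geneC", "geneD"}))  # -> ['R_isozyme']
```

Deleting only one of two isoenzyme genes leaves the reaction active, which is why single-gene knockouts in a GEM often show no phenotype while double knockouts do.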

Recent estimates suggest that 40–50% of the genes in a filamentous fungal genome are hypothetical [4]. Furthermore, only 2–10% of the genes with predicted functions have been studied experimentally in filamentous fungi and, thus, have a verified function [29, 56]. This leaves thousands of genes within a single filamentous fungal genome uncharacterized, renders reconstructed GEMs considerably incomplete (i.e. with gaps, dead-end reactions, and dead-end metabolites), and keeps our general understanding of filamentous fungal biology far from comprehensive. Even for the model S. cerevisiae, 21% of its predicted genes (about 1400 genes) still had dubious functional predictions in 2019, more than two decades after the release of its genome sequence and despite a research community of more than 1800 labs worldwide [57]. One powerful solution to this challenge is the interrogation of gene coexpression networks built from the hundreds of transcriptomic datasets available for filamentous fungi. This wealth of data can be harnessed to improve gene annotations and to assign gene function predictions. The approach rests on the so-called "guilt-by-association" principle, i.e. the assumption that genes that are frequently coexpressed during growth, development, and metabolite production, under diverse environmental conditions, or upon genetic perturbations are likely to function in the same or closely related biological processes or pathways [59]. A meta-analysis of 283 transcriptomics experiments publicly available for A. niger, for example, generated coexpression networks for 9579 genes, about 65% of all genes present in the A. niger CBS 513.88 genome [58]. Coupled with gene ontology enrichment analyses [60], this dataset allowed the prediction of biological processes, including metabolic and regulatory functions, for 9263 A. niger genes (Figure 20.2a).
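The guilt-by-association principle can be sketched in a few lines of Python: correlate expression profiles across conditions, keep strongly correlated gene pairs as network edges, and let annotated neighbors "vote" on the process of an uncharacterized gene. This is a minimal sketch that assumes Pearson correlation, a fixed threshold, and simple majority voting. Apart from bipA, the gene names are hypothetical, all expression values are invented, and the sketch does not reproduce the statistics of the cited A. niger meta-analysis [58] or its gene ontology enrichment step [60].

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return sxy / sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.9):
    """Link gene pairs whose |r| across all conditions reaches the threshold."""
    genes = sorted(profiles)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(profiles[g], profiles[h])
            if abs(r) >= threshold:
                edges[(g, h)] = r
    return edges


def predict_process(gene, edges, annotations):
    """Guilt by association: the most frequent process among positively
    correlated, annotated neighbors, or None if there is no evidence."""
    votes = Counter()
    for (g, h), r in edges.items():
        if r > 0 and gene in (g, h):
            neighbor = h if g == gene else g
            votes.update(annotations.get(neighbor, []))
    return votes.most_common(1)[0][0] if votes else None


# Invented expression values for four genes across five conditions.
PROFILES = {
    "bipA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "foldase_1": [2.0, 4.0, 6.0, 8.0, 10.0],
    "hyp_001": [1.1, 2.0, 3.2, 3.9, 5.1],
    "gene_z": [5.0, 1.0, 4.0, 2.0, 3.0],
}
ANNOTATIONS = {
    "bipA": ["protein folding"],
    "foldase_1": ["protein folding"],
    "gene_z": ["ergosterol biosynthesis"],
}
edges = coexpression_edges(PROFILES)
print(predict_process("hyp_001", edges, ANNOTATIONS))  # -> protein folding
```

The uncharacterized gene hyp_001 inherits "protein folding" because its profile tracks bipA and foldase_1 closely, whereas gene_z stays unconnected and receives no prediction. Pairwise correlation scales quadratically with gene number, so genome-wide analyses typically rely on vectorized or dedicated network tools.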
Remarkably, this approach assigned processes to 2970 hypothetical genes, about 50% of all hypothetical genes predicted in the A. niger genome. For the first time, these predictions therefore make it possible to link hypothetical genes with known metabolic and cellular processes (Figure 20.2b). This compendium can, thus, be used to generate hypotheses on a variety of conceptual levels and to compile shortlists of as-yet unstudied genes that can be investigated further under specific conditions to decipher their function. Taken together, many GEMs are currently available for filamentous fungal cell factories and can be used to simulate biomass accumulation and metabolite production. Nevertheless, GEMs for filamentous fungi need continuous quality improvement. They are considerably limited because they rely on the current state of genome data and gene function predictions. Older models, for example for T. reesei, predicted stoichiometrically unrealistic yields (more carbon ended up in the biomass than was available from the carbon source), whereas newer models contain more relevant biochemical pathways and no longer show such unrealistic yields [53]. However, with the most recent model for T. reesei, the GEM predictions varied most under the condition at which the highest protein production and secretion were observed experimentally, and they predicted higher growth rates than were actually measured. Hence, there is still a long way to go until in silico GEMs can fully predict the growth and product formation of filamentous fungi.
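The carbon-balance problem mentioned for older T. reesei models lends itself to a simple plausibility check: convert a predicted mass yield into a carbon yield (C-mol product per C-mol substrate), which cannot exceed 1. The Python sketch below assumes glucose as the substrate and the generic biomass formula CH1.8O0.5N0.2, a common textbook approximation that is not specific to any fungus; under these assumptions, any biomass yield above roughly 0.82 g per g glucose would create carbon. The function names are hypothetical, and a real model validation would also account for CO2 and other carbon-containing products.

```python
# Standard atomic weights in g/mol.
M_C, M_H, M_O, M_N = 12.011, 1.008, 15.999, 14.007


def cmol_mass(h: float, o: float, n: float = 0.0) -> float:
    """Mass in g of one C-mol of a compound with formula C1 Hh Oo Nn."""
    return M_C + h * M_H + o * M_O + n * M_N


GLUCOSE_CMOL = cmol_mass(2, 1)           # glucose per carbon atom: CH2O
BIOMASS_CMOL = cmol_mass(1.8, 0.5, 0.2)  # assumed generic biomass CH1.8O0.5N0.2


def carbon_yield(yield_g_per_g: float,
                 product_cmol: float = BIOMASS_CMOL,
                 substrate_cmol: float = GLUCOSE_CMOL) -> float:
    """Convert a mass yield (g product per g substrate) into C-mol per C-mol."""
    return yield_g_per_g * substrate_cmol / product_cmol


def is_carbon_feasible(yield_g_per_g: float, **kwargs) -> bool:
    """A predicted yield is unrealistic if it puts more carbon into the
    product than the substrate supplied (carbon yield above 1)."""
    return carbon_yield(yield_g_per_g, **kwargs) <= 1.0


print(round(carbon_yield(0.5), 3))  # -> 0.61
print(is_carbon_feasible(1.0))      # -> False
```

Passing other C-mol masses through `product_cmol` lets the same check be applied to products such as organic acids rather than biomass.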

Figure 20.2 Gene coexpression networks uncover highly connected genes that are functionally related, belong to the same metabolic or regulatory pathway, and are controlled by the same transcriptional regulatory program. (a) The global gene coexpression network obtained for A. niger covers 9579 genes and allows functional predictions for thousands of genes, including hypothetical ones. (b) Coexpression subnetworks can be interrogated for any gene of interest. Shown is the example of the chaperone BipA, which is of central importance for high protein secretion in A. niger. Genes are represented by circles, with positive and negative correlations depicted by grey and red lines, respectively. BipA is shown as a black diamond. The BipA-encoding gene is highly connected with genes functioning in protein folding (chaperones and foldases), protein transport (COPI- and COPII-vesicle transport, cytoskeleton), plasma membrane biosynthesis (ergosterol biosynthesis), and with other secretion- and regulation-associated proteins. The strength of this approach is that several hypothetical proteins have been identified as probably playing a role in protein secretion, which would not be possible based on gene content information alone. Source: Parts of this figure have been reproduced and modified from Schäpe et al. [59], CC BY 4.0.

20.4 Engineering Strategies for Improved Substrate Utilization

During evolution, filamentous fungi have learned to live and feed on various polymeric substances and to decompose organic matter efficiently [61]. Polysaccharides from plant biomass are among their preferred carbon sources, perhaps because plants account for most of the biomass on Earth (450 out of a total of 550 gigatons of carbon) [62]. Whereas humans first ingest and then digest, filamentous fungi first digest, through extracellular hydrolysis of the polymers, and then ingest the resulting low-molecular-weight degradation products. For this purpose, enzymes such as cellulases, amylases, pectinases, and inulinases, to name but a few, are secreted into the surrounding medium, where they hydrolyze plant polysaccharides such as cellulose, starch, pectin, and inulin. The degradation products are usually mono- and oligosaccharides, which are taken up into the cells by specific sugar transporters in the plasma membrane. Their high and efficient capacity to degrade plant biomass makes filamentous fungi very interesting as a source of enzymes for several industries, including the food and feed, pulp and paper, pharmaceutical, and chemical industries, and places filamentous fungal cell factories in a central position for the sustainable production of biofuels and chemicals [4]. The enzymes involved in carbohydrate degradation have been classified into families in the Carbohydrate-Active enZYmes database (CAZy, [63]). A recent comparative genomics analysis revealed that the predicted enzyme sets for plant polysaccharide degradation in A. niger, A. oryzae, P. chrysogenum, and T. reesei comprise 200, 242, 174, and 119 enzymes, respectively, whereas only 30 can be found in S. cerevisiae [64].
It is generally thought that the presence of a gene predicted to encode a carbohydrate-degrading enzyme correlates with the ability of a fungus to grow on the corresponding carbon source [65]. However, extra copies of an enzyme gene do not necessarily result in faster growth or degradation but rather seem to correlate with the phylogenetic relationship between species [65]. CAZyme genes are controlled by orthologous transcription factors, some of which are present in almost all filamentous fungi (e.g. Ace1, CreA, ClrA, ClrB, GaaR, XlnR), whereas others occur in only a few filamentous fungi (e.g. AmyR, InuR, Ace2 [66]). An important molecular mechanism underlying CAZyme gene expression is the inducer-dependent activation of the corresponding transcription factors. The inducing compound is a mono- or disaccharide, or a derivative thereof, that is liberated from the specific plant polysaccharide by the respective fungal enzyme. Upon uptake, the inducer activates the substrate-specific transcription factor (e.g. cellobiose, the degradation product of cellulose, activates the cellulase regulator ClrA, which in turn activates ClrB). A comprehensive overview of fungal CAZymes and the complex regulatory machinery that ensures their expression can be found in two recent reviews [66, 67].
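The inducer-dependent activation chain described above can be written as a small Boolean model, a common way of sketching regulatory logic before building quantitative models. The Python sketch below is a deliberate simplification under stated assumptions: every component is either on or off, a low basal enzyme level is assumed to release the first inducer, and ClrB is treated as depending on ClrA alone. The `synthetic_activator` flag is a hypothetical stand-in for the inducer-independent synthetic transcription factors that engineers aim for; none of the names correspond to an existing software model.

```python
from dataclasses import dataclass


@dataclass
class Conditions:
    """Hypothetical on/off states for a cellulose induction cascade."""
    cellulose_present: bool
    basal_cellulase: bool = True        # assumed low basal enzyme secretion
    transporter_present: bool = True    # uptake of the released inducer
    synthetic_activator: bool = False   # hypothetical inducer-independent TF


def cellulase_genes_on(c: Conditions) -> bool:
    """Boolean sketch of inducer-dependent CAZyme gene activation."""
    # Secreted enzyme releases the inducer (cellobiose) from cellulose.
    cellobiose_outside = c.cellulose_present and c.basal_cellulase
    # The inducer only acts after uptake into the cell.
    cellobiose_inside = cellobiose_outside and c.transporter_present
    # Cellobiose activates ClrA, which in turn activates ClrB.
    clr_a_active = cellobiose_inside
    clr_b_active = clr_a_active
    # Target genes respond to ClrB or to a constitutively binding synthetic TF.
    return clr_b_active or c.synthetic_activator


print(cellulase_genes_on(Conditions(cellulose_present=True)))   # -> True
print(cellulase_genes_on(Conditions(cellulose_present=False)))  # -> False
```

Switching `synthetic_activator` on makes the output independent of cellulose, which is the behavior that simple overexpression of a native, inducer-dependent activator does not achieve in this model.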

Several approaches have been followed to improve substrate utilization, i.e. to increase secretion of the enzyme set needed to fully degrade plant biomass. This is a major challenge, as the composition of plant biomass usually varies owing to the use of different agricultural waste streams, pretreatments, or impurities. The enzyme cocktails secreted by filamentous fungi therefore vary considerably. Although this reflects a high natural metabolic flexibility, academic and industrial groups invest considerable effort in genetically engineering filamentous fungi to produce specific CAZymes independently of these variations, i.e. to obtain a defined set of enzymes at high yields. These genetic engineering approaches include, for example, targeted deletions of transcriptional repressors (e.g. Ace1 and Rce1 in T. reesei [68, 69]) or overexpression of transcriptional activators (e.g. AmyR in A. niger [70], ManR in A. oryzae [71], and Xyr1 in T. reesei and T. thermophilus [72, 73]). However, as overexpression of AmyR or Xyr1 does not cause inducer-independent CAZyme gene expression, several approaches have been followed to generate synthetic transcription factors that constitutively bind to their target gene promoters. One successful example is the constitutive overexpression in T. reesei of a hybrid transcription factor containing DNA-binding domains from both Cre1 and Xyr1 [74]. Another very interesting approach is the rational overexpression of epigenetic regulators, which is thought to result in more loosely packed chromatin and, thus, easier access of transcription factors to their target genes. Overexpression of predicted chromatin remodelers (e.g. an N-acetyltransferase or a methyltransferase) indeed improved cellulase gene expression in T. reesei [75, 76].
The interested reader is referred to a recent review covering the research literature up to 2018 [67] for further genetic engineering examples regarding the optimization of fungal plant biomass degradation. Since then, A. oryzae has been reprogrammed to produce a cellulolytic enzyme cocktail at high yield. This metabolically engineered strain produces all three enzyme activities needed to fully degrade cellulose to glucose (cellobiohydrolase, endoglucanase, and β-glucosidase), with each cellulase gene constitutively expressed from multiple gene copies. The resulting strain displayed a 40-fold higher cellulase activity than the progenitor strain containing single copies of the respective genes [77]. Interestingly, the growth of T. reesei on cellulose is not only inducer-dependent but also light-dependent. T. reesei has photoreceptors that sense blue light and, in turn, alter cellulase gene expression tenfold or more. A remarkably high 75% of glycoside hydrolases seem to display blue light-dependent gene regulation, and much is already known about how this is achieved at the molecular level; a recent comprehensive review is recommended for further reading [78]. Although this phenomenon has not yet been studied at all in A. niger, A. oryzae, A. terreus, or T. thermophilus, an impact of light on secondary metabolite production in P. chrysogenum has been reported (see Section 20.6). It may come as no surprise that the metabolism of fungi, as in humans, has adapted to light and darkness during evolution. Still, this phenomenon remains nearly unexplored in filamentous fungal cell factories other than T. reesei. A phylogenetic analysis focusing on representative genomes of major filamentous fungi has recently identified opsin-encoding genes in A. niger, A. terreus, and P. chrysogenum. These fungal opsins are probably functional homologs of
bacterial green-light sensory rhodopsins, suggesting that light-sensing systems also exist in these cell factories [79]. In any case, the current insights, although still limited, already have important implications for research and strain improvement programs [78]. On the one hand, good laboratory practice should ensure controlled light conditions and avoid light pulses during fungal cultivations. This can be achieved, for example, by keeping the light constantly on in labs where shakers or glass-vessel bioreactors are run, which helps to obtain consistent data in gene regulation and multiomics studies. On the other hand, large-scale industrial production with filamentous fungi takes place in darkness, because cultivation occurs in stainless steel fermenters. One might, therefore, consider blinding filamentous fungal cell factories by deleting genes encoding light-sensing proteins. Genetic engineering of such blinded strains at lab scale would more likely yield production strains that, with respect to light and darkness, behave more predictably at large scale.

20.5 Engineering Strategies for Enhanced Product Formation

The product portfolio of filamentous fungi is as diverse as their ability to grow on different organic carbon sources. Figure 20.3 highlights some central catabolic routes and the main products derived from them that are harnessed in fungal biotechnology. These products include primary metabolites (organic acids), macromolecules (proteins), and secondary metabolites (polyketides, nonribosomal peptides, terpenes, alkaloids). Selected examples, together with current metabolic engineering strategies to increase their titers or to redirect metabolic fluxes into other pathways, are discussed in the following sections.

Figure 20.3 A simplified model of carbon catabolism in filamentous fungi cultivated on polysaccharides. The main product classes are summarized in grey boxes; products of secondary metabolism are indicated in italics. Acetyl-CoA provides the link between primary and secondary metabolism. For simplicity, the currency metabolites ATP and NAD(P)H are not shown. Note that most filamentous fungi also secrete proteases and lipases, which enable them to grow on other polymeric carbon sources as well, such as proteins and lipids.

20.5.1 Aspergillus niger

Citric acid, an intermediate of the citric acid cycle, is the most important bulk product of the organic acid industry worldwide. For the last 100 years, it has been produced mainly with A. niger and used as a flavoring agent, acidifier, and chelating agent in the food, pharmaceutical, and chemical industries [2]. Approximately 80% of the worldwide production of citric acid is realized by submerged fermentation of A. niger, with yields of 0.95 g g−1 glucose [80]. Although this is already close to the theoretical yield (1.067 g g−1 glucose), several strain optimization programs are ongoing to reach or even exceed the physiological limit, for example by uncoupling fermentation from biomass formation, which is feasible in principle, as shown for hydrogen biofuel production by the bacterium Thermotoga maritima [81]. Systems metabolic engineering to rationally redesign the citric acid production capacity of A. niger is feasible thanks to the wealth of A. niger multiomics data and a well-curated GEM. Comparative genomic studies of the 17 available A. niger genomes, some from citric acid producers and some from enzyme producers, have already unveiled unique genes and single nucleotide polymorphisms [32] that relate to the phenotypic traits of the production strains. A recently published review lists 18 metabolic engineering approaches dedicated to improving citric acid production in A. niger [82], a few of which are highlighted here. Dynamic flux balance analysis of time-course data from batch fermentation of the citric acid-producing strain A. niger ATCC 1015, for example, recently revealed that phosphate limitation is a key factor inducing citric acid production [83]. A proteomics approach focusing on membrane-associated proteins identified two new high-affinity glucose transporters (MstG, MstH) and one rhamnose transporter (RhtA), which can be used to optimize substrate uptake in A. niger [84, 85].
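The theoretical yield quoted above can be checked with a short calculation. The Python sketch below assumes the overall conversion C6H12O6 + 1.5 O2 → C6H8O7 + 2 H2O, in which all six glucose carbons end up in one molecule of citric acid, verifies that this equation is element-balanced, and divides the molar masses. It gives about 1.066 g g−1, in line with the quoted 1.067 g g−1 (small differences reflect the atomic weights used), and shows that the reported industrial yield of 0.95 g g−1 corresponds to roughly 89% of this maximum. The variable names are illustrative only.

```python
from collections import Counter

# Standard atomic weights (g/mol) for the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

# Molecular formulas written as element counts.
GLUCOSE = Counter(C=6, H=12, O=6)
CITRIC_ACID = Counter(C=6, H=8, O=7)
O2 = Counter(O=2)
WATER = Counter(H=2, O=1)


def molar_mass(formula: Counter) -> float:
    """Molar mass in g/mol from element counts."""
    return sum(ATOMIC_MASS[el] * n for el, n in formula.items())


def side_atoms(side) -> Counter:
    """Total element counts for a list of (coefficient, formula) pairs."""
    total = Counter()
    for coeff, formula in side:
        for el, n in formula.items():
            total[el] += coeff * n
    return total


# Assumed overall conversion: C6H12O6 + 1.5 O2 -> C6H8O7 + 2 H2O
LEFT = [(1, GLUCOSE), (1.5, O2)]
RIGHT = [(1, CITRIC_ACID), (2, WATER)]

balanced = side_atoms(LEFT) == side_atoms(RIGHT)
y_max = molar_mass(CITRIC_ACID) / molar_mass(GLUCOSE)  # g citric acid per g glucose
fraction_reached = 0.95 / y_max  # reported industrial yield [80] vs. maximum

print(balanced, round(y_max, 3), round(fraction_reached, 2))
```

Checking the element balance first guards against a common slip in yield calculations: a stoichiometry that silently gains or loses atoms gives a wrong ceiling.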
Deletion of an α-glucosidase gene (agdA) coupled with overexpression of the glucoamylase-encoding gene glaA improved substrate utilization during cultivation on liquefied corn starch and, thus, productivity [86]. Finally, ATP-mediated feedback inhibition of the glycolytic enzyme phosphofructokinase was attenuated by replacing the ATP-producing cytochrome-dependent respiration with an alternative oxidase (Aox1), which still reoxidizes NADH but does not produce ATP [87]. These and further studies highlight why A. niger is such an excellent citric acid producer: (i) efficient carbon source utilization, (ii) high glycolytic flux owing to relief of ATP and citrate feedback inhibition, (iii) high anaplerotic activities that replenish the citric acid cycle upon depletion, (iv) low activities of citrate-degrading enzymes, and (v) alternative respiratory pathways. Further improvements of citric acid production became feasible with the recent identification of the citrate exporter CexA [88] and its targeted overexpression via the inducible synthetic Tet-on gene switch [89]. The future challenge will be to integrate all these individual gene modifications synergistically into one chassis strain which, in the spirit of James Currie, yields minimum biomass but maximum citric acid production. The importance of cellular compartmentalization for citric acid production has also not yet been fully explored. The canonical view, so far, is that the citric acid cycle
runs mainly in mitochondria, as the citrate synthase CitA is localized there. However, it has recently been proposed that part of the citrate could also be synthesized in the cytosol by a cytosolic CitB enzyme, which would be accompanied by less ATP production and, hence, reduced feedback inhibition [90]; this hypothesis is worth studying further. A. niger has recently attracted considerable interest as a potential producer of other organic acids, such as itaconic and galactaric acid. Itaconate could replace petroleum-based polyacrylic acid, a precursor for the polymer industry (absorbent polymers, polyester resins, synthetic latex), and galactarate could replace the petroleum-based polyethylene terephthalate (PET) currently used for plastic production [90, 91]. Itaconate stems from the citric acid cycle, where citric acid is converted to cis-aconitate, which is then decarboxylated to itaconate by the enzyme cis-aconitate decarboxylase CadA [92]. This pathway is native to A. terreus but not to A. niger, which lacks both CadA and the cis-aconitate transporter MttA. MttA transports cis-aconitate out of the mitochondrion into the cytoplasm, where it is eventually decarboxylated to itaconate by CadA [93]. Surprisingly, rewiring the metabolism of A. niger toward itaconate does not turn it into a superior itaconate producer. This is currently attributed to unwanted conversion of itaconate, either into itaconate methyl ester or by full degradation to pyruvate and acetyl-CoA [94]. Hence, more research is necessary to establish A. niger as an itaconate producer in which product degradation has been successfully prevented. The rewiring of A. niger to overproduce galactarate, which was achieved by several single and multiplexed CRISPR approaches, seems very promising. In brief, an engineered A. niger strain was established that hydrolyzes pectin (a component of the plant primary cell wall that is most abundant in sugar beet pulp and citrus processing waste streams) to d-galacturonate, which it further oxidizes to galactarate, reaching titers of 12 g l−1 [40, 91]. Interestingly, d-galacturonate has also been shown to be convertible to l-ascorbate, i.e. vitamin C, by a genetically engineered A. niger strain [95]. Although only 170 mg l−1 vitamin C was obtained, far below what can be achieved with bacterial cell factories, this can be considered an important breakthrough: it is the first report of a metabolically engineered filamentous fungus producing vitamin C in a one-step fermentation process on citrus peel waste. Last but not least, A. niger is one of the most commonly exploited cell factories for protein and enzyme production because of its extraordinarily high secretion capacity. The most abundant enzyme secreted by A. niger is glucoamylase, which has applications in starch-based industries such as the food, feed, biofuel, and chemical industries. Up to 30 g l−1 can be achieved during industrial production [96]. A. niger is also used for the production of other enzymes, including cellulases, pectinases, proteases, catalases, and phytases [2, 5, 97]. Surprisingly, oxygen limitation is a common theme of citric acid and protein production. For both processes, limited oxygen supply was shown to favor high production in the strictly aerobic fungus A. niger [98, 99]; that is, citric acid or protein production correlates inversely with cell growth. Notably, a high specific protein production rate is also
achieved at relatively low growth rates in T. reesei [53]. A recent multiomics analysis of glucoamylase production that integrated transcriptomics, metabolomics, and GEM simulations reported that this is probably achieved through several metabolic mechanisms: (i) an increased flux through glycolysis, which probably generates more amino acid precursors for protein production; (ii) reduced fatty acid and ribosome biogenesis and, thus, reduced growth; and (iii) an increased flux through the glyoxylate bypass, which reduces NADH formation from the citric acid cycle and maintains the cellular redox balance [99]. The general view is that once growth is limited, more reducing equivalents (NADH and NADPH) and precursors can be channeled into glucoamylase production. A comparative transcriptomics analysis of A. niger strains forced to overexpress and secrete glucoamylase revealed that A. niger benefits from a very flexible transcriptional machinery that helps it adapt to the burden of high protein loads. Under this condition, A. niger increases the transcription of secretory pathway genes involved in protein folding in the endoplasmic reticulum (ER) and in protein trafficking from the ER via the Golgi to the plasma membrane (see Figure 20.2b). In addition, expression of genes less needed for growth and survival under this condition decreases [100]. This phenomenon is called "repression under secretion stress" (RESS) and was first discovered in T. reesei [101]. A. niger can, thus, fall back on a very efficient regulatory and metabolic machinery that balances cellular capacities with demands. The canonical view is that the extraordinary capacity of A. niger and other filamentous fungi for protein secretion is linked to their hyphal tip growth mode, a relationship discussed in more detail in Section 20.7.

20.5.2 Aspergillus oryzae

A. oryzae has been used traditionally for the production of Asian foods and beverages for over a thousand years and was named the national microorganism of Japan (Koku-kin) by the Brewing Society of Japan in 2006 [102]. The efficacy of A. oryzae as a protein producer and secretor is also rooted in a sophisticated transcriptional control machinery that enables high-level production and secretion of amylolytic enzymes when cultivated on starch-rich substrates, such as rice and soybeans [103]. A. oryzae has also recently gained considerable interest as a cell factory for biofuel production, as its genome encodes many cellulolytic and xylanolytic enzymes [64]. First transcriptomics insights have already revealed that the conserved transcriptional regulator XlnR seems to play a more central role in the regulation of cellulolytic gene expression in A. oryzae than in other filamentous fungi [104], making it an excellent target for future metabolic engineering approaches. A recent metabolic engineering approach that targeted multiple cellulolytic genes and increased their secretion 40-fold [77] was described in Section 20.4. A. oryzae is, furthermore, of biotechnological interest because it produces the secondary metabolite kojic acid, which is essentially a by-product of rice and soybean fermentation. Because of its high biocompatibility and antioxidant activity, kojic acid is applied in cosmetics as a skin lightener. This is due to its potent inhibition of tyrosinase, a key enzyme in the

20.5 Engineering Strategies for Enhanced Product Formation

synthesis of melanin [105]. Kojic acid is, moreover, of interest as a building block for biodegradable plastics [106, 107]. It is assumed that all enzymes necessary for kojic acid synthesis are encoded in a gene cluster comprising 14 genes, one of which encodes the pathway-specific transcription factor KojR [108]. Targeted overexpression of KojR combined with overexpression of three cellulolytic genes enabled A. oryzae to produce kojic acid directly from cellulose instead of from starch [106]. However, the kojic acid titer achieved on cellulose was about 100-fold lower than that achieved with glucose as the carbon source (26 g l−1). Hence, further optimization is necessary. Nevertheless, the use of cheap and renewable carbon sources derived from cellulose-containing waste streams might establish A. oryzae as an attractive cell factory for kojic acid-based biodegradable plastics in the near future. Finally, another promising platform chemical that can be produced with A. oryzae and that could help consolidate the bioeconomy is malic acid. This organic acid stems from the citric acid cycle and has manifold applications in the food (acidulant, flavor enhancer), chemical (polyester resins), and pharmaceutical (acidulant) industries [109]. Several bacterial, yeast, and filamentous fungal cell factories have been genetically engineered during the last few years to produce this platform chemical; the engineered biochemical routes and the yields achieved in the different organisms have recently been summarized [110]. A. oryzae is among the hosts with the greatest potential, which has fostered several metabolic engineering efforts. Recently, a strain was reported that displayed a high malate titer (127 g l−1) and malate yield (0.9 g g−1 corn starch) and produced much less succinate, the unwanted by-product of this process [111]. 
This was achieved by synergistically targeting carbon and redox metabolism through a total of 12 genetic modifications introduced into A. oryzae. These included overexpression of amylolytic genes, overexpression of the malate-producing enzyme fumarase, upregulation of the glyoxylate shunt to bypass succinate, downregulation of the malate-degrading enzyme citrate synthase, and introduction of an NADH oxidase of bacterial origin to improve the redox balance by decreasing the NADH/NAD+ ratio, to name but a few [111]. The significant increase in malate production with a simultaneous decrease of the by-product succinate improved the productivity of this process considerably and is, thus, an important step toward sustainable production of this platform chemical in A. oryzae.

20.5.3 Aspergillus terreus

A. terreus is a well-established cell factory for the production of the organic acid itaconate and the polyketide lovastatin. The former is of interest for the polymer industry [112], as described in Section 20.5.1, whereas the latter is used in medicine as a cholesterol-lowering drug for the treatment of cardiovascular diseases. It has been marketed under the trade name Mevacor since the late 1980s [113]. Lovastatin also serves as a starting molecule for manufacturing semisynthetic statins. One example is simvastatin, the second-leading statin on the market, traded under the name Zocor [114].


Interestingly, the biosynthetic routes for itaconate and lovastatin are encoded in adjacent biosynthetic gene clusters in the genome of A. terreus. However, simultaneous production of both has never been reported, probably because they are not under the control of a common regulatory mechanism [115]. This is thought to be because itaconate stems from primary metabolism, whereas lovastatin is a product of secondary metabolism. It has, therefore, been proposed that strains selected in strain development programs for itaconate manufacturing are poor producers of lovastatin and vice versa. The maximal titers that can be reached in A. terreus are 140 g l−1 for itaconate but only about 1 g l−1 for lovastatin [115]. Reports of genetic and metabolic engineering of either metabolite in A. terreus are sparse. Most efforts have been devoted to process optimization, in order to identify optimal carbon and nitrogen sources, trace elements, pH conditions, oxygen supply, and macromorphologies. The interested reader may consult [112, 115] for a detailed overview of bioprocess-related studies. Another review recommended for further reading compares citric acid production in A. niger with itaconate production in A. terreus with regard to the common and dissimilar metabolic and regulatory mechanisms that ensure high-level production in both cell factories [90]. Current metabolic engineering efforts in A. terreus aim to reroute lovastatin biosynthesis in order to obtain high amounts of one of its biosynthetic intermediates, monacolin J, because monacolin J, rather than lovastatin, is the preferred precursor for semisynthetic simvastatin production [116]. The classic production process of simvastatin has, so far, comprised several steps: (i) lovastatin production by A. terreus fermentation; (ii) lovastatin extraction and purification from the biomass of A. terreus; (iii) enzymatic hydrolysis of lovastatin to monacolin J; and (iv) chemical transformation of monacolin J to simvastatin [117]. An improved A. terreus strain was recently engineered that allowed single-step bioproduction of monacolin J by deleting the gene encoding the last enzymatic step of lovastatin biosynthesis (lovD) and constitutively overexpressing the gene encoding the pathway-specific transcription factor LovE. This enabled high-level production of monacolin J (5.5 g l−1), which far exceeds all published titers for monacolin J production in heterologous hosts, such as S. cerevisiae (75 mg l−1) or Pichia pastoris (600 mg l−1) [117].

20.5.4 Penicillium chrysogenum

P. chrysogenum (renamed P. rubens) is important for antibiotics production and is the main cell factory for β-lactams, such as penicillin and its semisynthetic derivatives. Penicillin is a secondary metabolite that is naturally produced during late stages of growth of P. chrysogenum. Its biosynthetic pathway is encoded in a gene cluster comprising three genes, the core gene of which encodes a nonribosomal peptide synthetase. Current production strains have undergone multiple classic strain improvement programs over the last 70 years, since P. chrysogenum was established as a β-lactam production strain in 1943. These strains can contain up to 50 copies of the penicillin gene cluster and are able to produce up to 55 g l−1 penicillin [118].


Strain improvement programs were based on random mutagenesis using UV or mustard gas and considerably altered the metabolic fluxes in P. chrysogenum [118, 119]. Recent attempts aim to comprehensively understand the genetic, regulatory, and metabolic mechanisms that were rewired, in order to reconstruct high-production strains by targeted genome breeding in the near future. Genome breeding is a well-established approach that was first applied successfully to the amino acid-producing bacterial cell factory Corynebacterium glutamicum [120]. Basically, mutations that are useful for production are identified in a set of different low- and high-producing classical mutant strains through genome sequencing and integrated multiomics analyses. The relevant mutations are then systematically introduced into a wild-type genome to obtain a genetically streamlined strain that carries only useful mutations. A comparative genomic and metabolomic analysis of three P. chrysogenum strains with low or high penicillin titers led to the following conclusions [119]: Firstly, 2500 mutations have been introduced into high-level P. chrysogenum penicillin production strains over the last 70 years when compared to the progenitor wild-type strain. Secondly, the epigenetic and light-sensing regulator complex Velvet, with its central methyltransferase LaeA and the scaffold protein VelA, has been repeatedly targeted. Thirdly, P. chrysogenum seems to be very flexible in redirecting nitrogen, i.e. amino acids, from one nonribosomal peptide biosynthetic route to another. If the penicillin route is blocked, for example, P. chrysogenum produces other nonribosomal peptides, such as roquefortines, meleagrin, or chrysogenins [119]. A similar redirection phenomenon is also observed in the bacterial antibiotics producer Streptomyces coelicolor [121]. 
This might suggest that nonribosomal peptides could be viewed as a flexible set of nitrogen storage molecules under high carbon flux and sufficient intracellular amino acid availability, a hypothesis worth studying further. The penicillin biosynthetic route in P. chrysogenum has recently been reprogrammed toward an industrial pravastatin production process [122]. Pravastatin (trade name Pravachol) is an interesting alternative to simvastatin (Zocor) because of its different structural, bioavailability and pharmacokinetic properties [123]. A one-step fermentative production of pravastatin in P. chrysogenum was achieved by (i) deleting the penicillin gene cluster, i.e. generating a β-lactam-free platform strain, (ii) randomly introducing the compactin gene cluster from Penicillium citrinum into the genome, and (iii) expressing a fusion protein consisting of a stereoselectively evolved compactin hydroxylase from Amycolatopsis and a reductase from Rhodococcus as its redox partner. The resulting production strain achieved titers of 6 g l−1 pravastatin [122]. Readers may consult a recent review for more engineering examples in P. chrysogenum and the strategies followed in this cell factory to activate or silence biosynthetic gene clusters [124].

20.5.5 Trichoderma reesei

The main biotechnological importance of T. reesei is attributed to its cellulase and hemicellulase enzymes, which are key to converting lignocellulosic biomass


into biofuel. Lignocellulose is composed of cellulose, hemicellulose and lignin and is a waste product of agriculture (e.g. straw, bagasse, corn stover) and forestry (sawdust). Whereas the originally isolated wild-type strain (QM6a) is a poor (hemi)cellulase producer and secretor, classic strain development yielded the Rutgers strain Rut-C30 with 30 g l−1 in the 1980s. Hypersecreting strains were obtained by directed evolution at the dawn of the twenty-first century, achieving cellulase titers of 100 g l−1 [125]. These are the highest titers ever reported for protein secretion and exceed by 10- to 10 000-fold what can currently be achieved with bacterial, yeast or mammalian cell factories, whose protein secretion titers are usually in the range of mg l−1 to a few g l−1 [126–128]. The extraordinarily high protein secretion capacity of the filamentous fungi exploited in biotechnology, such as T. reesei, A. niger, and A. oryzae, can be attributed to several factors: (i) an effective protein secretion machinery is a prerequisite for fast hyphal growth (see Section 20.7); (ii) the saprotrophic lifestyle of filamentous fungi on plant biomass is only possible due to efficient secretion of large amounts of extracellularly active enzymes; and (iii) the RESS phenomenon, which was first described in T. reesei but has also been documented in A. niger ([101], see Section 20.5.1), ensures efficient downregulation of genes less important for survival during fast colonization of dead plant material. The underlying metabolic and regulatory machineries enabling T. reesei to transcribe and express (hemi)cellulase-encoding genes efficiently, and genetic engineering strategies to make (hemi)cellulase secretion inducer-independent, have been discussed in Section 20.4. Thermostable cellulases are a major goal for current lignocellulose degradation processes [129] and have inspired protein engineering efforts in T. reesei to obtain cellulases that are functional at higher temperatures. In general, high process temperatures are preferred in biorefineries as they reduce viscosity, increase the solubility of lignocellulosic biomass and thus increase reaction rates [129]. Using protein structure and stability predictions, chimeric enzymes were designed based on a cellulase from T. reesei and its thermostable homologs from two other filamentous fungi (Talaromyces emersonii and Chaetomium thermophilum). This approach eventually improved the thermostability of the T. reesei enzyme by up to 3 °C [130]. Lytic polysaccharide monooxygenases are auxiliary enzymes that accelerate the breakdown of cellulose, chitin and starch by oxidative cleavage of glycosidic bonds [131]. When added to purified T. reesei cellulolytic enzyme cocktails, they have been shown to improve the hydrolysis of lignocellulose considerably [132]. This observation has led to the development of the new commercial cellulase cocktails Cellic CTec2 and Cellic CTec3 [133]. A thermostable lytic polysaccharide monooxygenase from the filamentous fungus Talaromyces cellulolyticus has recently been heterologously expressed in T. reesei and, indeed, improved T. reesei’s degradation efficiency of cellulose and delignified corncob residues [134]. T. reesei is also of interest for the production of the organic acid galactarate. A QM6a-derived strain in which the intrinsic galactarate degradation pathway was eliminated and a bacterial galacturonate dehydrogenase gene was introduced produced 20 g l−1 galactarate directly


from pectin [135]. A recently engineered S. cerevisiae platform strain produced 8 g l−1 galactarate from citrus peel waste [136].

20.5.6 Thermothelomyces thermophilus

The biotechnological application of the thermophilic T. thermophilus is primarily associated with its ability to produce and secrete thermostable cellulolytic enzymes. It is an emerging filamentous fungal cell factory and was previously known as M. thermophila. The T. thermophilus strain ATCC 42464 is accepted as the general wild-type strain in academia, whereas the proprietary, mature enzyme production strain C1 is used in industry [137]. Production levels of up to 100 g l−1 cellulases are possible with the C1 strain, with the additional advantage of low viscosity during fermentation. The first commercial product was CeluStar CL, which was granted GRAS status by the FDA in 2009 [138]. The C1 strain is currently being developed by industry as a producer of biologics, such as vaccines, therapeutic enzymes, proteins and biosimilars [139]. In addition, first, very promising metabolic engineering studies show how T. thermophilus can be reprogrammed to generate platform chemicals, such as fumarate and malate, directly from renewable feedstocks. So far, fumaric acid has been produced with filamentous fungi of the genus Rhizopus, which naturally employ the cytosolic reductive citric acid cycle under nitrogen-limiting conditions. Fumarate is of interest as an acidulant and antioxidant in the food and beverage industries and for the manufacturing of synthetic resins and biodegradable polymers to replace petroleum-based processes [140]. Optimization of medium composition and process-relevant parameters increased fumarate titers in natural Rhizopus strains up to 40 g l−1 [141, 142]. Metabolically engineered E. coli strains reached 28 g l−1 on glucose and 42 g l−1 on glycerol as a carbon source [143, 144], an engineered S. cerevisiae strain 6 g l−1 [145], and an engineered strain of the yeast Torulopsis glabrata 33 g l−1 [146]. A recent CRISPR-based metabolic engineering effort in T. thermophilus achieved 17 g l−1 fumarate and involved simultaneous optimization of organic acid transport into and out of the mitochondria, overexpression of fumarase to increase the flux from malate to fumarate, and deletion of the fumarate-degrading enzyme fumarate reductase [147]. In a conceptually similar approach that additionally targeted CO2 fixation through pyruvate carboxylase and mitochondrial transport systems, a malate-overproducing strain was generated that produced 200 g l−1 malate from crystalline cellulose and 110 g l−1 from corncob [148]. These data are promising for the future establishment of T. thermophilus as an efficient cell factory for platform chemical production directly from plant waste streams.

20.6 Engineering Strategies for the Production of New-to-Nature Compounds

Filamentous fungi produce a wide range of secondary metabolites, which are all derived from acetyl-CoA as the critical initial building block (Figure 20.3).


Table 20.4 Selected secondary metabolites from filamentous fungi and their applications.

Compound | Application | References
β-Lactams | Penicillins and cephalosporins account for more than 30% of the global antibiotics market | [149]
Cyclosporin | Immunosuppressant that prevents organ rejection in transplant surgery | [150]
Echinocandins | Caspofungin, micafungin, and anidulafungin, used for the treatment of Candida infections | [151]
Griseofulvin | Antifungal used for the treatment of skin infections | [152]
Mycophenolic acid | Immunosuppressant that prevents organ rejection in transplant surgery; marketed as CellCept | [153]
Myriocin | A chemical analog is used to treat multiple sclerosis; approved in 2018 as Gilenya | [154]
Statins | Lovastatin, simvastatin and pravastatin, used to treat cardiovascular diseases by lowering cholesterol levels | [113]

These compounds differ not only in structure but also in their bioactivities, which can be antibacterial, antifungal, insecticidal, antiparasitic, or cytotoxic, to name but a few. Some important, currently marketed pharmaceuticals from filamentous fungi and their medicinal applications are summarized in Table 20.4. Metabolic engineering strategies to improve the production of some of these have been discussed in Section 20.5. Genome mining of hundreds of filamentous fungal genomes has revealed that the number of predicted biosynthetic gene clusters by far exceeds the number of known fungal secondary metabolites, suggesting that millions of fungal metabolites await discovery, many of which will have potential pharmaceutical applications [4, 58, 155]. The reader is referred to [156, 157] for more information on how to identify and harness this untapped resource by integrating multiomics studies and implementing technological advances, such as microfluidics, next-generation 3D bioprinting and controlled cocultivation. Notably, the impressively high structural diversity of the secondary metabolites discovered so far suggests that they can already serve as very interesting lead structures for the development of a broad repertoire of new-to-nature compounds. These novel compounds might have new bioactivities or improved bioavailability, higher stability and better pharmacokinetics compared to related drugs currently in use, or could even serve as better precursors for semisynthetic routes to new drugs [158]. Combined with established filamentous fungal cell factories, in which pathway engineering ensures their efficient diversification, new production processes might become feasible. 
In the following, the conceptual strategy for the diversification of fungal secondary metabolites is illustrated using the example of fungal cyclodepsipeptides (CDPs), which are cyclic nonribosomal peptides composed of alternating units of amino acids and α-hydroxy acids.


Enniatin, beauvericin, bassianolide, and PF1022 belong to the class of CDPs; they exhibit antibacterial, antifungal, insecticidal, anthelmintic, or even anticancer activities and are, thus, of great interest to the pharmaceutical industry [158]. Two CDPs have already been commercialized: fusafungine (a mixture of enniatins) for the treatment of bacterial throat infections and emodepside (a semisynthetic derivative of PF1022A), which is used as an anthelmintic compound in veterinary medicine [159]. Expression of the nonribosomal peptide synthetase gene esyn1 from Fusarium oxysporum under the control of the synthetic Tet-on gene switch in A. niger, together with optimization of the medium composition, resulted in enniatin B titers of up to 4.5 g l−1 during fed-batch bioreactor cultivations [160, 161]. Tet-on-driven polycistronic expression of esyn1 and the kivR gene, which encodes an enzyme generating the α-hydroxy acid precursor molecule, resulted in 40% of the product titer [25]. This proved for the first time that polycistronic secondary metabolite biosynthesis is possible in A. niger and, furthermore, suggested that the KivR enzyme catalyzes the rate-limiting step in enniatin B biosynthesis. Remarkably, A. niger was shown to be a superior expression host not only for enniatin B but also for beauvericin and bassianolide, producing the highest titers reported so far in bacterial, yeast, or fungal hosts [162]. It was, furthermore, demonstrated that A. niger is an ideal platform strain for the production of new-to-nature CDPs, which were obtained by designing chimeric CDP synthetases through swapping of enzyme modules, domains, or subunits thereof [163, 164]. Feeding alternative α-hydroxy acid precursors also allowed the synthesis of novel CDP derivatives at up to 1 g l−1 [162]. Most importantly, some of the new-to-nature CDPs displayed considerably higher bioactivities than their parental CDPs and reference drugs [164]. 
Hence, the engineering toolbox currently available for A. niger and its high metabolic flux toward amino acids and α-hydroxy acids can be harnessed to produce nonribosomal peptides with natural or novel structures at industrially relevant titers.
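Why swapping a few building blocks diversifies CDPs so effectively can be illustrated with a minimal Python sketch. It assumes that a homo-oligomeric CDP can be written as an iterated dipeptidol unit of one α-hydroxy acid and one N-methylated amino acid, and it uses the commonly reported compositions of enniatin B, beauvericin, and bassianolide (D-2-hydroxyisovaleric acid, D-Hiv, combined with N-methyl-L-valine, -phenylalanine, or -leucine). The building-block lists, ring sizes, and function names are hypothetical choices for illustration only. The sketch does not model the substrate specificity of real CDP synthetases, so an enumerated candidate is not necessarily accessible in vivo.

```python
from itertools import product

# Illustrative building blocks (assumption): not the substrate range of any real synthetase.
HYDROXY_ACIDS = ["D-Hiv", "D-Lac", "D-PhLac"]              # alpha-hydroxy acid modules
AMINO_ACIDS = ["N-Me-L-Val", "N-Me-L-Phe", "N-Me-L-Leu"]   # N-methylated amino acid modules
RING_SIZES = [3, 4]  # number of dipeptidol repeats: hexa- or octadepsipeptide


def cdp(hydroxy_acid: str, amino_acid: str, repeats: int) -> str:
    """Write a homo-oligomeric CDP as an iterated dipeptidol, e.g. cyclo[(D-Hiv-N-Me-L-Val)3]."""
    return f"cyclo[({hydroxy_acid}-{amino_acid}){repeats}]"


# Natural CDPs expressed in the same notation (commonly reported compositions).
NATURAL = {
    cdp("D-Hiv", "N-Me-L-Val", 3): "enniatin B",
    cdp("D-Hiv", "N-Me-L-Phe", 3): "beauvericin",
    cdp("D-Hiv", "N-Me-L-Leu", 4): "bassianolide",
}


def enumerate_cdps() -> list[str]:
    """Every combination obtainable by exchanging one building block of each type."""
    return [cdp(h, a, n) for h, a, n in product(HYDROXY_ACIDS, AMINO_ACIDS, RING_SIZES)]


if __name__ == "__main__":
    library = enumerate_cdps()
    new = [s for s in library if s not in NATURAL]
    print(len(library), "candidates,", len(new), "not among the listed natural CDPs")
```

Because each exchangeable position multiplies the number of combinations (3 × 3 × 2 = 18 here), even a handful of swappable modules or fed precursors yields a sizable candidate space, of which only three members correspond to the natural products listed above.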

20.7 Engineering Strategies for Controlled Macromorphologies

The macromorphology adopted by filamentous fungi during submerged cultivation in bioreactors is critical for product titers. The formation of pellets, loose clumps, or dispersed macromorphologies (Figure 20.1) results from various interacting phenomena [3]. In brief, spores are used as inoculum. After an initial period of spore swelling in which spores break metabolic dormancy, a complex developmental program is initiated that ensures that germ tubes are formed. An intricate intracellular interplay and coordinated regulation of polarity proteins (which target growth exclusively to the newly formed tip, e.g. cell end markers, formins, polarisome), cytoskeletal elements (tubulin, actin), and vesicle transport proteins (which ensure trafficking of proteins within vesicles from the ER via the Golgi to the plasma membrane along the cytoskeletal tracks, e.g. SNARE proteins,


GTPases, myosin) ensure that vesicles accumulate at the tip of the germ tubes, where they eventually fuse with the plasma membrane. This results in plasma membrane extension, which causes the germ tube to elongate and form long, thread-like cells termed hyphae. Part of the vesicular cargo consists of cell wall-synthesizing proteins, which remain embedded in the plasma membrane and ensure the biosynthesis of chitin and glucans. Another part is released into the external environment and is mainly composed of hydrolytic enzymes that degrade organic matter, e.g. amylases, cellulases, proteases, and lipases, to name but a few. As growth continues, hyphae start to form cross-walls (called septa) and branches and, eventually, a mycelium is formed. Hence, it is thought that strains with more hyphal tips might have the theoretical potential to secrete more proteins, as more exit routes are available. From a cell biological perspective, this is an oversimplified view of what is known so far (an estimated 2000 proteins are thought to participate in hyphal growth and development in Aspergillus [165]), and the reader may consult the following reviews for further reading [166–168]. The cell biological knowledge generated so far has inspired several targeted strain optimization efforts. It was shown, for example, that downregulation of chitin synthase genes in A. niger and P. chrysogenum elevated citric acid (40%) or penicillin titers (40%), respectively [169, 170]. In another study, cell cycle genes were upregulated in A. oryzae, which increased malate titers by about 50%. Furthermore, a hyperbranching A. niger strain was generated that accumulated the same biomass as the wild type during cultivation despite its hyperbranching phenotype [171]. This hyperbranching strain was then further genetically engineered to overexpress glucoamylase. 
By placing the glaA gene under the transcriptional control of the metabolism-independent Tet-on gene switch, 400% higher glucoamylase secretion was achieved [172]. Smaller pellets and smaller dispersed mycelia were observed in all four cases when compared to the respective progenitor strains. From a process engineering perspective, it is known that the first hours after spore inoculation already influence the development of the final macromorphologies. This depends on the spore titer used, the ability of the spores to coagulate, the medium composition, the pH of the medium, and the agitation speed of the bioreactor. The frequency of hyphal branching is also decisive: the higher the branching frequency, the higher the tendency to form pellets. Notably, dispersed mycelia or loose clumps can also agglomerate during later stages of fermentation and form pellets, whereas pellets can fragment into smaller entities due to high shear forces. The formation of pellets or dispersed mycelia is, thus, mechanistically governed by hyphal extension and branching rates, pellet fragmentation rates, and bioreactor parameters, which are increasingly being modeled [173–176]. Pelleted and dispersed macromorphologies both have advantages and disadvantages. Pellets are more resistant to shear stress and result in a low viscosity of the liquid phase. However, transport of oxygen and substrate to the core of larger pellets is limited by the hyphal network, which, in turn, may limit growth, viability, and, finally, product formation. By contrast, dispersed mycelia grow rapidly and are less limited in nutrient transport but cause a higher medium viscosity and, thus, a lower volumetric gas-liquid mass transfer. Dispersed mycelia are


also more susceptible to shear stress [3]. Importantly, a consensus view of how fungal macromorphologies are coupled with product formation is missing. On the one hand, this is because many reports in the literature contradict each other [3, 115]. On the other hand, only a limited number of systematic attempts have been undertaken so far to (i) understand the genetic and metabolic network driving hyphal growth and the evolution of macromorphologies, (ii) analyze at which levels the macromorphological structures interact with the bioreactor environment, and (iii) measure to what extent these interactions feed back to the cells and activate or repress certain metabolic activities. However, several tools to measure and quantify macromorphological parameters under submerged cultivation conditions have recently been implemented; they are important prerequisites for future attempts to model and engineer fungal macromorphologies in an integrated manner. The MPD pipeline [177], for example, measures hundreds of macromorphological structures from microscopic images by quantitatively assessing filamentous fungal cultures, which usually consist of both dispersed and pelleted forms (Figure 20.4a). It, thus, gives a quantitative measurement of culture heterogeneity. Furthermore, it automatically determines key Euclidean parameters for individual fungal structures, such as projected area, circularity, aspect ratio, and surface roughness, and calculates the dimensionless morphology number MN, which varies between 0 (a one-dimensional line) and 1 (a perfect circle) [178].
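To make such Euclidean descriptors more concrete, the following Python sketch computes two of them for segmented structures and sorts structures into pellets and dispersed forms. It is not the MPD implementation. The circularity definition 4πA/P² and the major/minor-axis aspect ratio are standard image-analysis conventions assumed here, the 500 μm² pellet cut-off is borrowed from the default quoted in the Figure 20.4 caption, and the class name, function names, and input values are hypothetical. The published pipeline may apply further criteria and also computes MN [178], which is not reproduced here.

```python
import math
from dataclasses import dataclass

# Assumed projected-area cut-off (um^2) separating pellets from dispersed structures.
PELLET_MIN_AREA_UM2 = 500.0


@dataclass
class FungalStructure:
    """Measurements of one segmented structure (hypothetical input, in micrometres)."""
    area_um2: float
    perimeter_um: float
    major_axis_um: float
    minor_axis_um: float


def circularity(s: FungalStructure) -> float:
    """4*pi*A/P^2: exactly 1 for a circle, close to 0 for a thin hypha."""
    return 4.0 * math.pi * s.area_um2 / s.perimeter_um ** 2


def aspect_ratio(s: FungalStructure) -> float:
    """Major/minor axis length: 1 for a circle, large for elongated structures."""
    return s.major_axis_um / s.minor_axis_um


def classify(s: FungalStructure) -> str:
    """Call a structure a pellet if its projected area reaches the cut-off."""
    return "pellet" if s.area_um2 >= PELLET_MIN_AREA_UM2 else "dispersed"


def pellet_fraction(structures: list[FungalStructure]) -> float:
    """Share of structures called as pellets, a crude summary of culture heterogeneity."""
    calls = [classify(s) for s in structures]
    return calls.count("pellet") / len(calls)


if __name__ == "__main__":
    r = 20.0  # radius of an idealized round pellet
    pellet = FungalStructure(math.pi * r**2, 2 * math.pi * r, 2 * r, 2 * r)
    hypha = FungalStructure(200.0 * 2.0, 2 * (200.0 + 2.0), 200.0, 2.0)  # 200 x 2 um strip
    for name, s in [("pellet", pellet), ("hypha", hypha)]:
        print(name, classify(s), round(circularity(s), 3), aspect_ratio(s))
    print("pellet fraction:", pellet_fraction([pellet, hypha]))
```

Circularity and aspect ratio capture complementary aspects of shape: a loose, roughly round clump can have an aspect ratio near 1 but a low circularity because of its ragged outline, which is one reason image-analysis pipelines report several descriptors rather than a single one.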

Figure 20.4 Image analysis tools to quantify fungal macromorphologies. (a) The MPD pipeline [177] uses microscopic raw images (left panel, scale bar = 500 μm) to automatically assess pellets and dispersed structures. Structures are depicted as outlines indexed with a unique number (red), enabling simple assessment of automated calls by the end user. Processed outlines of fungal structures passing default definitions of pelleted (≥500 μm²) and dispersed (

110 μg g−1 dry mass in the tubers [131]. Alkaloids are a class of N-containing bioactive natural products that include numerous pharmacologically active compounds, such as the plant alkaloids morphine and codeine. Although the biosynthesis, regulation, and transport of alkaloids are highly complex processes [132], several engineering efforts have been successful [133, 134]. Successful engineering of colchicine alkaloids has recently been achieved [135]. Glucosinolates and cyanogenic glucosides, two other classes of N-containing bioactive natural products, have also been successfully engineered [2, 11, 125, 136, 137].


In broader terms, metabolic engineering of bioactive natural products has also been used to alter the spectrum of volatiles emitted from leaves and flowers [8], to generate genetically modified crop plants resistant to aphid herbivores [138], and to generate herbicide-resistant plants [139]. The latter has been implemented in several glyphosate-resistant transgenic crop plants, which account for 80% of the transgenic crops grown worldwide [140]. At the global scale, the most planted biotech crops in 2018 were soybean, maize, cotton, and canola. Of the total areas grown with soybean, cotton, maize, and canola, 78, 76, 30, and 29%, respectively, were genetically engineered (https://www.isaaa.org/resources/publications/pocketk/16). The long-term effects of such applications are not fully resolved.

21.5 Chloroplasts as the Site of Production

In eukaryotic organisms, the cells contain a nucleus and a number of different organelles, such as the vacuole, chloroplasts, mitochondria, the ER, the Golgi apparatus, and peroxisomes. All these organelles are situated in the cytoplasm, enclosed within the plasma membrane. Each organelle represents a distinct functional unit and serves to compartmentalize and impose order on the crowded chaos of the cell. Harboring the light and dark reactions of photosynthesis, the chloroplast is the most conspicuous organelle in algae and plants and has, for this reason, been the subject of important metabolic engineering approaches (Section 21.2.1). But the chloroplast harbors additional biochemical potential for metabolic engineering, namely the synthesis of high-value bioactive natural products, lipids, and proteins [141] (Figure 21.6). Terpenoids constitute the largest class of bioactive natural products, with approximately 50 000 known structures, making terpenoids the richest plant repository of chemicals with a wide range of bioactivities. A number of global industries base their business on the properties of terpenoids as important constituents of foods, as antimicrobial agents, as biologicals, and as pharmaceuticals or lead compounds for new pharmaceuticals. Although plants are a rich repository of terpenoids, high-value diterpenoids can often only be obtained in small amounts by extraction from their host plants, which may be rare medicinal and herbal plants. This sets the stage for metabolic engineering to increase access to desired structurally complex diterpenoids and to use combinatorial biochemistry to further augment the natural diversity of diterpenoid core structures and decoration patterns.
An ultimate goal of metabolic engineering within this research area is to design a “plug-and-play” template-based production system that can, in the long term, contribute to the production of otherwise extremely costly medicinal compounds. Typically, structurally complex diterpenoids contain numerous chiral carbon atoms, rendering chemical synthesis of the compounds a major challenge. These considerations make plant and algal chloroplasts, as well as the thylakoids of cyanobacteria, ideal targets for metabolic engineering and synthetic biology approaches aimed at light-driven, large-scale synthesis of complex diterpenoids.
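The point about chiral carbon atoms can be made concrete with a simple count: a molecule with n stereocenters can exist as at most 2^n stereoisomers, and a chemical synthesis must deliver the one that corresponds to the natural product. The sketch below only evaluates that upper bound; meso forms and ring constraints can lower the real number, and the stereocenter counts used are illustrative rather than taken from specific diterpenoids.

```python
def max_stereoisomers(n_stereocenters: int) -> int:
    """Upper bound on the number of stereoisomers for n stereocenters (2**n).

    Meso compounds or geometric constraints (e.g. fused ring systems)
    can reduce the number that is actually realizable.
    """
    if n_stereocenters < 0:
        raise ValueError("n_stereocenters must be non-negative")
    return 2 ** n_stereocenters


for n in (2, 4, 8):
    # Typically only one of these stereoisomers is the natural product
    print(n, max_stereoisomers(n))  # 2 4, then 4 16, then 8 256
```

The exponential growth is why enzymatic routes, which set each stereocenter selectively, are attractive for such scaffolds.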


21 Metabolic Engineering of Photosynthetic Cells – in Collaboration with Nature

Figure 21.6 The chloroplast as a platform for the biosynthesis of plant metabolites, enabling metabolically engineered synthesis of bioactive natural products, lipids, and proteins. Source: Nielsen et al. [82], with minor modifications. Reproduced with permission from John Wiley & Sons.

The synthesis of terpenoids is highly modular, using the five-carbon prenyl isomers isopentenyl diphosphate and dimethylallyl diphosphate as universal building blocks [84]. These are formed either by the mevalonate (MVA) pathway localized in the cytosol or by the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway localized in the chloroplast [84]. Condensation of four five-carbon units, catalyzed by prenyl transferases, provides geranylgeranyl diphosphate (GGDP), the linear C20 precursor for synthesis of the phytol side chain of chlorophyll and of diterpenoids. Class II and class I diterpene synthases, likewise localized in the chloroplast, catalyze the cyclization of GGDP, giving rise to the multitude of different diterpenoid core structures [142]. The enzymes catalyzing the founding reactions for biosynthesis of the complex core structures of diterpenoids in plants are thus compartmentalized within the chloroplast. All enzymes involved are encoded by nuclear genes and targeted to the chloroplast by transit peptides. Decoration of the diterpenoid core structures typically proceeds with P450s as key players, as illustrated by the forskolin biosynthetic pathway (Figure 21.7). In nature, the P450s are localized on the ER. Final decoration of the hydroxylated core structures may require additional acylation, alkylation, and/or glycosylation reactions. To metabolically engineer the entire pathway for production of a structurally complex diterpenoid in the chloroplast, the specific class II and class I diterpene synthases, the P450s, and the enzymes catalyzing rearrangement reactions and further decoration need to be imported into the chloroplast. This is achieved by introducing chloroplast targeting sequences into their respective nuclear-encoded genes. Construction of the entire pathway is simplified by the preexisting endogenous production of the initial GGDP substrate in the chloroplasts [84].
Likewise, the molecular oxygen and NADPH required for P450s to carry out the monooxygenase reaction are plentiful endogenous chloroplast products generated by the photosynthetic light reactions (Figure 21.2). With regard to autotoxicity resulting from accumulation of diterpenoids in the engineered chloroplasts, it should be noted that UDP-glucose is also synthesized within the chloroplast [145] and is thus available as a cofactor for detoxification by UDP-glucosyltransferase-mediated glucosylation. In a typical plant cell, the chloroplasts take up 20–60% of the cellular space, depending on the plant species, tissue, and cell type [146]. The chloroplast lumen thus provides ample space for storage of natural products. In nature, this is exemplified by the vanilla orchid (Vanilla planifolia), which deposits vanillin glucoside in its senescent chloroplasts at a concentration of 4.7 M, i.e. 1.5 g ml−1 [147, 148] (Section 21.8.1.1). Vanillin glucoside is derived from the shikimate pathway, all enzymes of which are localized within the chloroplast [149–151]. Aromatic amino acid synthesis also takes place in the chloroplast [81, 149]. This offers a separate track for metabolic engineering-based synthesis of the myriad tyrosine-derived bioactive natural products [81], such as the cyanogenic glucoside dhurrin and alkaloids like the (S)-reticuline-derived morphine-type alkaloids [152, 153]; both of these biosynthetic pathways depend on P450 reactions.

Figure 21.7 The biosynthetic pathway of forskolin from Indian coleus (Coleus forskohlii), derived from geranylgeranyl diphosphate (GGDP) and involving two diterpene synthases, several P450s, and an acetyl transferase.

[Figure 21.7 structures: GGDP → CfTPS2 → labda-13-en-8-ol diphosphate → CfTPS3 → 13R-manoyl oxide (MO) → CfCYP76AH15, CfCYP76AH8, CfCYP76AH17 → 13R-MO=O → CfCYP76AH11, CfCYP76AH16 → deacetyl-forskolin → CfACT1-8 → forskolin]

21.5.1

Metabolic Engineering Approaches

Identification of the full set of genes encoding the enzymes that catalyze the entire pathways for structurally complex diterpenoids has been a great challenge, but is now progressing [143, 154–158]. The genes encoding the enzymes of diterpenoid synthesis are sometimes localized in gene clusters in the nuclear genome, which facilitates full pathway elucidation [155]. The lack of essential genes has delayed detailed experimental testing of the full potential of plant chloroplasts as a light-driven production engine for diterpenoids. But the principle of using endogenous chloroplast production of key precursors for engineering has been verified by re-routing other entire biosynthetic pathways into the chloroplast. A typical cell in a higher plant contains 40–50 chloroplasts, with around 2000 independent photosynthetic electron transport chains per chloroplast [146, 159]. In total, a plant cell thus harbors approximately 90 000 (40–50 × 2000 = 80 000–100 000) independent photosynthetic electron transport chains. In the engineering process, only a fraction of these chains will be directly coupled to enzymes catalyzing product formation. The engineered plant would thus show only minor, if any, changes in growth properties.

21.5.1.1

Selected Examples: A Light-Driven Power-House

As an experimental model system for introducing a P450-dependent biosynthetic pathway into the chloroplast while taking advantage of the inherent de novo biosynthesis of aromatic amino acids in this organelle, the entire biosynthetic pathway for the tyrosine-derived cyanogenic glucoside dhurrin (D-glucopyranosyloxy-(S)-p-hydroxymandelonitrile) was engineered into tobacco (N. benthamiana) chloroplasts [141]. In sorghum, the dhurrin pathway is catalyzed by the two P450s SbCYP79A1 and SbCYP71E1, the UDP-glucosyltransferase SbUGT85B1, and the NADPH-P450 oxidoreductase SbPOR2b [160–165]. The engineering was achieved by transient expression of gene constructs encoding fusion proteins between the transit peptide of the chloroplast stroma-localized ferredoxin (AtFedA) from Arabidopsis and the coding regions of the two P450s SbCYP79A1 and SbCYP71E1 and of SbUGT85B1 [141]. Stable integration of the dhurrin pathway genes into the tobacco chloroplast genome and demonstration of their functional expression was also accomplished [154]. The engineered light-dependent formation of dhurrin provided experimental proof that the chloroplast was able to provide the two substrates, tyrosine and UDP-glucose, as well as ferredoxin as an electron donor to the P450s. In this context, it is important to note that PSI has been shown to serve as a direct electron donor to P450s [166], thereby obviating the need to co-introduce the membrane-bound SbPOR2b that is required when the P450s are localized in the ER. The expressed P450s were active when anchored in the thylakoid membrane and able to function in a light-driven manner. Likewise, the P450s were not inactivated by the shift in stroma pH from neutral to alkaline following irradiation [141]. PSI generates the most negative redox potential known in nature [167] and is highly stable. The subunit composition of the PSI complex is known [167], and crystal structures are also available [168]. Only three of the subunits (the chlorophyll-binding heterodimer PsaA/PsaB and PsaC) carry electron acceptors [169, 170], whereas the remaining subunits bind the light-harvesting chlorophyll antennae, act as regulators, or serve to stabilize the structure of the complex.
When a C-terminal fusion protein between SbCYP79A1 and ferredoxin (AtFd2) was introduced into the chloroplasts, the fusion protein was found to interact with PSI in a way that channeled electrons from PSI directly to the P450 without competition from free ferredoxin (Figure 21.8) [171]. Whether this efficient interaction depends on metabolon formation remains to be investigated. These properties make PSI a highly interesting biobrick for metabolic engineering approaches aimed at establishing production platforms that use chloroplasts as a light-driven power-house for the synthesis of novel and structurally complex molecules. Chimeric P450s fused to other electron-donating entities may also be used [172]. These studies demonstrate that it is indeed possible to transfer an entire P450-dependent biosynthetic pathway for a bioactive natural product to the chloroplast. Using solar radiation and water as the primary electron donor, it is possible to tap directly into the reducing power generated by photosynthesis to drive the redox reactions catalyzed by P450s [141]. Similar results were obtained in cyanobacteria (Synechocystis sp. PCC 6803) [173]. This opens the avenue for light-driven synthesis of a vast array of other bioactive natural products in the chloroplast, such as structurally complex alkaloids and diterpenoids.

Figure 21.8 Schematic illustration of a chloroplast thylakoid, showing the interfacing of photosynthetic electron transport with light-driven biosynthesis by fusing ferredoxin to SbCYP79A1, thus directing electrons toward cytochrome P450-catalyzed hydroxylation reactions. OEC, oxygen-evolving complex; PSII, photosystem II; PQ, plastoquinone pool; Cyt b6/f, cytochrome b6f; PC, plastocyanin; PSI, photosystem I; Fd, ferredoxin; FNR, ferredoxin-NADP+ reductase. Source: Reprinted with permission from Mellor et al. [171], with minor modifications. © 2014 American Chemical Society.

[Figure 21.8 diagram labels: stroma; thylakoid membrane; lumen; light-driven electron flow H2O → PSII (OEC; 2H+ + ½ O2) → PQ → Cyt b6/f → PC → PSI → Fd → FNR (NADP+ → NADPH); Fd–CYP79 fusion converting tyrosine to aldoxime]

As with essential amino acids, humans depend on acquiring vitamin E from their plant-based diet. The compounds with vitamin E activity are the tocochromanols, i.e. tocopherols and tocotrienols (Figure 21.9). Their formation requires condensation of homogentisic acid with phytyl diphosphate and GGDP, respectively. In nature, all these precursors are biosynthesized in the chloroplast, being derived from the shikimate and MEP pathways (Figure 21.6), and all enzymes involved are encoded by nuclear genes. The final products are formed following cyclization and methylation of the aromatic ring system (Figure 21.9). This illustrates how the inherent biosynthetic potential of the chloroplast interlinks the use of intermediates of the shikimate pathway, which leads to the production of aromatic amino acids, with those of the MEP pathway, which is directed toward synthesis of chlorophyll, carotenoids, phylloquinone, plastoquinols, and gibberellins. It is estimated that more than 20% of the carbon fixed in plants passes through the shikimate pathway [174]. Metabolic engineering of balanced fluxes through the shikimate pathway may provide increased biomass and yields [175].

Figure 21.9 Metabolic engineering based on incorporation of the genes encoding biosynthesis of tocochromanols (tocopherols and tocotrienols) into the plastid genome to obtain a 10-fold enrichment of product formation. Tocopherols and tocotrienols are formed in prenylation reactions with phytyl diphosphate (PDP) and geranylgeranyl diphosphate (GGDP) as donors. Cyclases and methyltransferases catalyze formation of the final set of tocochromanols. Structures and enzymes shown in black refer to formation of tocopherols. Enzymes involved in tocotrienol formation are shown in green, and the structural differences from tocopherols are marked in green on the structures. MPBQ, 2-methyl-6-phytylbenzoquinone; DMPBQ, 2,3-dimethyl-5-phytyl-1,4-benzoquinone; DMGGBQ, 2,3-dimethyl-5-geranylgeranyl-1,4-benzoquinone. Source: Lu et al. [54].

To increase the production of vitamin E in tobacco (N. tabacum), orthologous genes from the cyanobacterium Synechocystis were stably transformed into the tobacco chloroplast genome using homologous recombination [53, 54]. Cyanobacterial genes were used to avoid interference from regulation of the endogenous tobacco genes. The rate-limiting step in the tocochromanol pathway was thought to be the activity of homogentisate phytyltransferase. The transplastomic line expressing the cyanobacterial gene encoding this enzyme showed a fivefold increase in tocochromanol formation [54]. When all three pathway genes were co-expressed from an operon construct, the tocochromanol level increased only 1.7-fold. This was shown to reflect incomplete processing of the polycistronic RNA. Following incorporation of an intercistronic expression element into the operon [176], proper processing of the polycistronic transcript was achieved, and the tocochromanol level was enriched 10-fold compared with the wild type. The transplastomic lines with increased flux of isoprenoids toward the tocochromanol pathway did not show any visual phenotype but had a slightly increased carotenoid and chlorophyll content. When subjected to photooxidative stress, the high tocochromanol content protected the transplastomic lines from oxidative damage [54]. The ability of the transplastomic lines to maintain proper levels of other isoprenoid-derived metabolites demonstrates the robustness and plasticity of chloroplast metabolism. Similar studies with comparable results were carried out in tomato [54]. These studies demonstrate how stable transformation of the chloroplast genome can be used to significantly increase the vitamin E content, and thereby the nutritional value, of plants while at the same time increasing the ability of the engineered plants to resist oxidative stress [54].

21.5.1.2

Other Examples

The ability to engineer the chloroplast genome offers the opportunity to produce high levels of recombinant proteins. Thus, a proteinaceous antibiotic against pathogenic group A and group B streptococci was produced at levels representing 70% of the total soluble protein [45]. Biopharmaceuticals triggering immune responses against hepatitis C [177] and inhibitors preventing entry of HIV [178] have been produced in transplastomic plants. High-level production of proteinaceous biopharmaceuticals in transplastomic lettuce [51] and in tomato fruits [44] raises the possibility that these biopharmaceuticals might be administered orally by ingesting these vegetables as raw food [179]. A metabolic engineering approach termed combinatorial supertransformation of transplastomic recipient lines (COSTREL), which combines plastid transformation with subsequent nuclear transformation, has been developed to transfer entire pathways of natural products into tobacco (N. tabacum), with concomitant insertion of genes optimizing the carbon flux through the pathway to improve product yield [180]. This was exemplified by expressing the entire biosynthetic pathway for the sesquiterpenoid artemisinic acid in tobacco, with production levels reaching 120 mg artemisinic acid per kilogram of biomass. Artemisinic acid is converted chemically to the antimalarial drug artemisinin. The COSTREL platform, which enables introduction of all the biosynthetic genes as well as a set of genes increasing the flux through the pathway, sets the stage for future high-level production of structurally complex bioactive natural products in plants. Other examples include increased resistance to low temperature, obtained by engineering endogenous accumulation of the antifreeze compound glycine betaine in transplastomic potato plants [181]. Poplar trees are subject to damage by insect pests; such infestations were counteracted by engineering poplar (Populus L.) plants expressing the Bacillus thuringiensis cry3Bb gene. The transplastomic plants conferred high mortality on leaf-eating beetles [182].

21.6 Metabolic Engineered Production in Microalgae

Microalgae are rapidly growing, robust photosynthetic organisms. Algal biomass may be used as a replacement for fossil fuels to cover global energy demand. The ability of microalgae to adapt to high and rapidly changing light conditions is associated with their ability to synthesize a large number of isoprenoid-derived compounds securing light capture (chlorophyll, carotenoids), electron transfer (ubiquinone, plastoquinone), and radical scavenging (carotenoids) [183]. This makes microalgae obvious candidates as light-driven production systems, especially for isoprenoid-derived high-value bioactive natural products. Several algal species have been studied, often with the purpose of investigating their potential as biofuel producers or enhancing their ability to produce high-value carotenoids [184–189].

21.6.1

Metabolic Engineering Approaches

Bioproduction based on microalgae does not depend on arable land. On the contrary, it may serve to valorize nonarable marginal lands and wastewater. The microalgae may be grown in open ponds or in fermenters, in both cases with solar energy as the energy input and CO2 as the carbon source [190–196]. In open ponds, contamination with bacteria, other algae, or cyanobacteria may represent a major constraint on establishing effective cultivation processes. To circumvent such issues, the microalgal production organism may be engineered to produce a bacterial phosphite oxidoreductase (PtxD). This enables the production microalgae to metabolize phosphite and use it as a phosphorus source, whereas contaminating organisms suffer from phosphate deprivation and are hence unable to propagate [197]. In bioreactors, algae are easy to contain environmentally and thus meet the European standards for cultivation of genetically modified organisms (GMOs).
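The selective advantage conferred by PtxD can be illustrated with a deliberately simple toy model. It assumes, hypothetically, that phosphite is the only phosphorus source, that the engineered alga doubles once per day, and that contaminants do not grow at all; carry-over phosphate, grazers, and light or CO2 limitation are not modeled. The function name and parameter values are illustrative only.

```python
def contaminant_fraction(c0: float, days: float, doublings_per_day: float = 1.0) -> float:
    """Fraction of contaminating cells after `days` in a phosphite-only medium.

    Toy assumptions: the PtxD-expressing production strain grows exponentially,
    while contaminants lacking PtxD cannot use phosphite and do not propagate.
    c0 is the initial contaminant fraction (0 <= c0 < 1).
    """
    if not 0.0 <= c0 < 1.0:
        raise ValueError("c0 must be in [0, 1)")
    engineered = (1.0 - c0) * 2.0 ** (doublings_per_day * days)
    contaminants = c0  # no growth without a usable phosphorus source
    return contaminants / (engineered + contaminants)


for day in (0, 3, 7):
    print(day, f"{contaminant_fraction(0.01, day):.2e}")
```

Even in this simplified form, the contaminant share shrinks with every doubling of the production strain, which is the intuition behind using phosphite as a selective nutrient in open ponds.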


21.6.1.1

Selected Examples: Future Production Organisms

Chlamydomonas reinhardtii has been chosen as a model microalga based on its haploid genome organization, established transformation protocols for its nuclear, chloroplast, and mitochondrial genomes, and its short life cycle [196]. The potential of C. reinhardtii as a production host has been demonstrated experimentally through production of the diterpenoids casbene, taxadiene, and 13R(+) manoyl oxide using enzymes targeted to the chloroplasts [183]. A truncated P450 devoid of a membrane anchor was also targeted to the chloroplast and was functionally active with 13R(+) manoyl oxide as substrate [183]. Diterpenoids produced in the chloroplasts were excreted into the medium and collected in a dodecane solvent overlay of the growth medium, with 22.5 mg 13R(+) manoyl oxide produced per gram algal dry matter per day. A prime challenge in working with microalgae has been the low expression levels of nuclear genes from higher plants. In the study with C. reinhardtii [183], this was partly overcome using codon optimization in combination with a design of the nuclear genes that included regularly spaced in silico insertions of the first intron sequence from the Rubisco small subunit 2 gene of C. reinhardtii [146]. Photosynthetic rate and growth were increased by overexpression of the Calvin–Benson cycle enzyme sedoheptulose-1,7-bisphosphatase [198]. To advance the use of C. reinhardtii as a versatile production platform, methodologies for high-yield production of recombinant proteins [199], for synthesis of multisubunit protein complexes, and for the introduction of entire pathways for synthesis of complex natural products [200] have recently been reported.

Nannochloropsis oceanica is an oleaginous, eukaryotic marine microalga growing in salt water. Hence, production based on this microalga avoids the consumption of high-quality fresh water. N. oceanica is evolutionarily related to brown algae and diatoms, the main contributors to CO2 fixation in the world’s oceans [201]. N. oceanica has gained industrial attention because of its ability to accumulate high levels of oil, fatty acids, and other nonpolar compounds. These accumulate in lipid droplets, with eicosapentaenoic acid (an omega-3 fatty acid) as a main constituent of high commercial value that is used in food supplements [202, 203]. N. oceanica has been used to decipher the importance of nonphotochemical quenching (Section 21.2.1.1) and to identify the bifurcation step directing carotenoids toward either light harvesting or photoprotection [79, 201]. In this context, the biosynthesis of the carotenoid fucoxanthin, involving the conversion of violaxanthin to neoxanthin, was elucidated [201]. These studies provide a unique molecular background for process optimization and for metabolically engineered production of derived high-value carotenoids. The potential of N. oceanica as an important future organism for bioproduction is augmented by the reported possibility of transforming this eukaryotic alga by homologous recombination [204]. Optimized growth conditions for N. oceanica and for a mutant with increased oil content have been determined [205–207].

21.6.1.2

Other Examples

As in higher plants, the possibility of using the reducing power of PSI to drive energy-requiring reactions in bioactive natural product biosynthesis has been demonstrated in cyanobacteria [208]. As a proof of concept, the soluble catalytic domain of the P450 SbCYP79A1 [160] was fused to the membrane anchor of the PsaM subunit of Synechococcus sp. PCC 7002 [208] and incorporated into the cyanobacterial thylakoids by homologous recombination, exploiting the organism’s natural transformability. The PsaM subunit was chosen as the anchor point for the P450 because it is a peripheral subunit of the PSI complex, facilitating electron donation from soluble ferredoxin. With light as the energy input, the fusion protein was functionally active and catalyzed conversion of tyrosine to p-hydroxyphenylacetaldoxime, with endogenously reduced ferredoxin serving as the electron donor [208]. For each cyanobacterium or alga to be developed into a production host, a proper scientific toolset needs to be established to render metabolic engineering feasible. The biosynthetic potential, intracellular accumulation versus excretion, and autotoxicity are key challenges that need to be addressed, but their importance obviously varies depending on the chosen host organism and the properties of the target molecule. A key issue in all cases will be to increase photosynthetic efficiency and thus reduce the unit cost of biomass production [86, 87, 209].

21.7 Metabolons – Advantages Using Plants

On-demand formation of a myriad of bioactive natural products enables plants to respond quickly to a continuum of environmental stresses. Dynamic metabolons may be important contributors to this metabolic plasticity and extend compartmentalization to the molecular level [210, 211]. Metabolons facilitate channeling of intermediates between consecutive enzymes, decrease the transit time of intermediates, avoid interference with other pathways, and prevent leakage of potentially toxic intermediates. Based on these properties, metabolons facilitate the formation of metabolic highways [212, 213]. Metabolons are typically composed of both membrane-anchored and soluble enzymes. P450s and POR anchored at the ER membrane may provide nucleation sites for metabolon assembly based on protein interactions [210–212, 214, 215]. Enzymes are dynamic chemical catalysts subject to local folding and unfolding events. Structural disorder can induce distal changes and prime metabolon assembly [216]. Domain movements give rise to an ensemble of conformations oscillating on multiple time scales, depending on the functional properties [217–219]. Because metabolons are highly dynamic entities, their lifetime cannot be assessed from bulk studies but requires studies at the single-molecule level, and in each case it may match the demand for a metabolic output [219]. These delicate mechanisms regulate the interplay between entangled metabolic grids controlling biosynthesis, storage, and recycling. For metabolic engineering approaches involving expression of numerous genes encoding the enzymes of an entire multistep biosynthetic pathway, the possibility of metabolon formation needs to be considered to optimize production of the target compound and to avoid or limit the formation of side products.

21.7.1

Metabolic Engineering Approaches

Initially, metabolic engineering efforts in plants were carried out without focusing on metabolon formation as a way to augment yields and facilitate product isolation. In synthetic biology approaches using combinatorial biochemistry, the modules are typically isolated from multiple different biological sources. This does not readily allow one to profit from inherent association domains in the individual enzyme modules used. Inspired by natural systems that exhibit substrate channeling, strategies have been developed to incorporate the advantages of metabolon formation into metabolic engineering [220–223]. Some of these strategies can be applied independently of the metabolic pathway to be introduced and offer programmability for fine-tuning desired pathway fluxes based on synthetic scaffolds.

21.7.1.1

Selected Examples: Metabolic Highways

In plants, enzymes of numerous biosynthetic pathways involved in both general and specialized metabolism have been proposed to organize in metabolons on the ER surface [210, 212, 214, 215]. In sorghum, the biosynthetic pathway for the cyanogenic glucoside dhurrin has been shown to be organized in a metabolon (Figure 21.10). The metabolon, composed of SbCYP79A1, SbCYP71E1, SbUGT85B1, and SbPOR2b, was isolated within its native membrane disc using styrene maleic acid polymer-based technology in the absence of detergents [211, 224]. In young sorghum seedlings, dhurrin constitutes up to 30% of the dry mass [165]. This large amount of dhurrin is synthesized de novo in the seedling, since the mature seed contains only negligible amounts [225]. This illustrates the applied potential of metabolic engineering approaches based on organizing biosynthetic pathways of interest in metabolons.

Figure 21.10 Model of the dhurrin metabolon illustrating how the biosynthetic enzymes involved are envisioned to be arranged in dynamic enzyme complexes. Source: Laursen et al. [224]. Reprinted with permission from AAAS.

The crop species cassava (Manihot esculenta) produces the two cyanogenic glucosides linamarin and lotaustralin in all parts of the plant [226]. Analyses show strong diurnal regulation of the biosynthetic enzymes MeCYP79D1/D2, MeCYP71E7, and MeUGT85K4/K5 in the foliage [125, 226–228], at both the transcriptional and translational levels. Both transcripts and enzymes disappear with the onset of morning light and start to accumulate again at dusk [229]. In light, cyanogenic glucosides are recycled to provide reduced nitrogen to balance the rapid photosynthetic formation of carbon [23]. It is estimated that 20–25% of the cyanogenic glucoside content in cassava leaves is turned over every day. Resynthesis then takes place during the dark period. The recycling pathways have been shown to operate in cassava, sorghum, and almonds [22, 23]. Thus, even though cyanogenic glucosides are considered constitutive defense molecules, they are continuously recycled and replenished. The costs of recycling and resynthesis are obviously outweighed by other physiological benefits to the plant [24]. In this context, it may be important that minute quantities of enzyme are involved in biosynthesis of the often large quantities of cyanogenic glucosides produced. This has been demonstrated in sorghum seedlings, where SbCYP79A1 and SbCYP71E1, which catalyze the conversion of tyrosine to p-hydroxymandelonitrile (the aglucone of dhurrin), constitute less than 1% of the total protein in the seedling yet are responsible for de novo synthesis of dhurrin from tyrosine in amounts equaling 30% of the seedling dry mass [160, 165]. As shown in sorghum, this reflects the efficiency of metabolon formation. If the biosynthetic pathway in cassava is also organized as a metabolon, the total loss of the biosynthetic proteins during the diurnal light period indicates a maximum metabolon lifetime of 12 hours. Traditionally, defense compounds in plants have been considered end products of linear pathways. Continuous recycling and resynthesis of bioactive natural products may be more prevalent than hitherto expected and need to be taken into account in efforts to metabolically engineer plants as producers of large amounts of natural products. On the other hand, understanding the biosynthetic potential of these different systems provides unique opportunities for engineering plants toward production of large amounts of bioactive natural products.

21.7.1.2

Other Examples

The presence of metabolons in plants is gaining increased attention and has been studied, e.g., in the phenylpropanoid pathway [230–232], lipid synthesis [233], and camalexin formation [215]. Each of these systems has been characterized using different methodologies, and together they illustrate the multitude of molecular mechanisms that provide the basis for plant plasticity.

21.8 Biocondensates

All plant species produce a large number of bioactive natural products of differing structural complexity and functionality. As previously mentioned, most of these compounds are produced in low amounts, just sufficient to meet the demand of the plant. This complicates the use of most plants as reliable and economically feasible sources of the majority of these compounds and has prompted research on on-demand production of bioactive natural products in microbial systems [157, 234, 235]. However, some plant species in nature possess the ability to produce high amounts of bioactive natural products. The mechanisms enabling cells to store such high amounts of bioactive natural products while maintaining cellular homeostasis and avoiding autotoxicity are not yet understood. Knowledge of how this is accomplished could provide the route to entirely new sustainable light-driven production systems based on higher plants or microalgae. The central vacuole, chromoplasts, and trichomes are common storage sites for bioactive natural products in plants. In all cases, these storage sites are


21 Metabolic Engineering of Photosynthetic Cells – in Collaboration with Nature

encapsulated by membranes. An equivalent to cellular compartments confined by a lipid bilayer membrane may be established by liquid–liquid phase separation [236, 237]. Liquid–liquid phase separation arises when mixtures of sugars, proteinogenic amino acids, choline, and organic acids are present in certain amounts and in specific ratios. NMR studies show the ubiquitous presence of high amounts of such constituents in plant cells [237]. When these crystalline compounds are mixed in the right stoichiometric proportions, dissolved in water, and lyophilized until the mass remains constant with approximately 8% water remaining, they form a highly viscous liquid termed a NAtural Deep Eutectic Solvent (NADES). NADESs have been demonstrated to be excellent solvents for poorly soluble bioactive natural products such as vanillin glucoside, rutin, and other flavonoids [238–242]. Because the constituents required to form NADESs are all present in plants, it was proposed that they might form a third liquid phase in the plant cell and function as crucial solvents for storage of bioactive natural products. The NADES might be formed in the cytosol or within membrane-encapsulated domains [237]. A 1:1 molar mixture of glucose and tartaric acid forms a NADES and is present in raisins, which maintain a liquid phase despite almost complete removal of water. NADESs form a liquid crystal through formation of inter- and intramolecular hydrogen bonds [243–246]. This results in a large melting-point depression, causing the solids to form viscous oils that in many cases remain fluid at room temperature [237]. In this way, NADESs might function as supramolecular structures mimicking the polymers known to stabilize liquid–liquid phase separation in animal cells. The unparalleled solubilization capacity of NADESs suggests that they are able to establish hydrogen bonds to specific bioactive natural products.

Upon addition of water, the unique properties of a NADES are gradually abolished. At present, no direct evidence for the in planta existence or possible mode of formation of NADES-based biocondensates has been reported.
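The stoichiometric description above can be made concrete with a little arithmetic. The Python sketch below estimates how many water molecules remain per glucose molecule in a 1:1 glucose:tartaric acid NADES. It assumes, since the text does not say, that the "approximately 8% water" is a mass fraction (w/w) of the final liquid; the molar masses are standard values and the function name `water_per_formula_unit` is purely illustrative.

```python
# Sketch: composition of a 1:1 glucose:tartaric acid NADES with residual water.
# ASSUMPTION: the ~8% residual water is read as a mass fraction (w/w) of the
# final liquid; the source text does not state the basis.

# Molar masses in g/mol (standard atomic weights, rounded)
M_GLUCOSE = 180.16        # C6H12O6
M_TARTARIC_ACID = 150.09  # C4H6O6
M_WATER = 18.015          # H2O


def water_per_formula_unit(water_mass_fraction, ratio=(1.0, 1.0)):
    """Return mol water per mol glucose for a glucose:tartaric acid mixture.

    water_mass_fraction: water mass / total mass (0 <= value < 1).
    ratio: glucose:tartaric acid molar proportions (illustrative default 1:1).
    """
    n_glc, n_tar = ratio
    solids_mass = n_glc * M_GLUCOSE + n_tar * M_TARTARIC_ACID  # grams
    # Choose water mass so that water / (solids + water) == water_mass_fraction
    water_mass = solids_mass * water_mass_fraction / (1.0 - water_mass_fraction)
    return water_mass / M_WATER / n_glc


if __name__ == "__main__":
    w = water_per_formula_unit(0.08)
    print(f"glucose : tartaric acid : water = 1 : 1 : {w:.2f}")
```

Under that w/w assumption the result is roughly 1 : 1 : 1.6 (glucose : tartaric acid : water), i.e. water is a minor molar component rather than the bulk solvent, which is consistent with the description of a NADES as a highly viscous, water-poor liquid.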

21.8.1 Metabolic Engineering Approaches

21.8.1.1 Selected Examples: In-Cell Storage Capacity

Some plant species are known to contain tissues with extraordinarily high levels of bioactive natural products. This applies to young sorghum seedlings, where the cyanogenic glucoside dhurrin represents 30% of the dry mass [165] (Section 21.7.1.1). In the black flower petals of the gentiana “flower of death” (Lisianthius nigrescens) (Figure 21.11), anthocyanins represent 24% of the dry mass [247, 248]. The buds and flowers of sophora (Sophora japonica, pagoda tree) contain the flavonoid rutin, which constitutes up to 30% of their dry mass [237, 240, 249]. In pods of the vanilla orchid (Vanilla planifolia), vanillin glucoside is stored in senescent chloroplasts at concentrations of 4.7 M [147, 148]. Indian coleus (Coleus forskohlii) accumulates the manoyl oxide-derived diterpenoid forskolin in specialized cells in the outer root bark [144]. In cannabis (Cannabis sativa), cannabinoids accumulate in the balloon-shaped secretory cavity positioned at the tip of the capitate-stalked trichome at concentrations approaching 0.3 M [250, 251].
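The molar figures quoted above are easier to appreciate as mass concentrations. The Python sketch below multiplies each concentration by the compound's molar mass, computed from its molecular formula. Two assumptions are worth flagging: vanillin glucoside is taken as glucovanillin (C14H18O8), and tetrahydrocannabinolic acid (THCA, C22H30O4) is used as a stand-in for the cannabinoid mixture, whose exact composition the text does not give.

```python
# Sketch: express molar storage concentrations as mass concentrations (g/L).
# ASSUMPTIONS: vanillin glucoside = glucovanillin, C14H18O8; THCA (C22H30O4)
# stands in for the cannabinoid mixture, whose composition is not given.

# Standard atomic weights in g/mol (rounded)
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def molar_mass(formula):
    """Molar mass (g/mol) from an element -> atom-count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def grams_per_litre(molarity, formula):
    """Mass concentration (g/L) = molarity (mol/L) x molar mass (g/mol)."""
    return molarity * molar_mass(formula)


VANILLIN_GLUCOSIDE = {"C": 14, "H": 18, "O": 8}
THCA = {"C": 22, "H": 30, "O": 4}  # tetrahydrocannabinolic acid

if __name__ == "__main__":
    # Concentrations as quoted in the text
    for name, molarity, formula in [
        ("vanillin glucoside", 4.7, VANILLIN_GLUCOSIDE),
        ("THCA (representative cannabinoid)", 0.3, THCA),
    ]:
        print(f"{name}: {molarity} M is about {grams_per_litre(molarity, formula):.0f} g/L")
```

On these assumptions, 4.7 M vanillin glucoside corresponds to roughly 1.5 kg of solute per litre and 0.3 M THCA to roughly 0.1 kg per litre. A solute mass above 1 kg per litre of storage compartment illustrates why such values are hard to reconcile with an ordinary dilute aqueous solution and why the storage mechanism remains an open question.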


Figure 21.11 (a) Black flower petals of the gentiana “flower of death” (Lisianthius nigrescens) https://www.projectnoah.org/spottings/37532042; (b) purple lisianthus (Eustoma grandiflorum); and transverse light-microscopy sections showing anthocyanin condensates (“black holes”) in (c) purple lisianthus and (d) blue-gray carnation (Dianthus caryophyllus). Source: (a) Black Wild Flower, Project Noah. (b)–(d) Markham et al. [247]. © 2000 Elsevier.

Detailed investigations of the molecular processes orchestrating each of these storage systems would offer intriguing opportunities to turn molecularly engineered plants into highly productive light-driven producers of bioactive natural products. The enzymes catalyzing dhurrin synthesis in sorghum have all been isolated and characterized [136, 160, 161, 252–255]. Although sorghum is a well-established experimental model system, the precise subcellular localization of dhurrin remains unresolved. Raman hyperspectral imaging spectroscopy indicated that dhurrin is absent from the central vacuole in sorghum seedlings [256]. This would imply a localization in the cytosol, in an organelle such as the chloroplast, in vesicular structures, or possibly in membrane-less biocondensates formed by liquid–liquid phase separation. However, a biocondensate not confined by a membrane would be difficult to isolate. NADES may provide an inert environment protecting enzymes against irreversible denaturation and catalytic inactivation. In an experimental system combining a glucose:tartrate NADES with the dhurrin biosynthetic enzymes, the NADES was shown to increase the storage and heat stability of the enzymes [257].

Vanillin is the main aroma compound of vanilla extracts. In the vanilla orchid, vanillin is produced and stored as vanillin glucoside [147]. It is synthesized from ferulic acid or ferulic acid glucoside in a process catalyzed by a cysteine proteinase named vanillin synthase [258, 259]. Vanillin synthase is localized in the chloroplasts of the inner pod mesophyll [148]. The synthesis of vanillin glucoside progresses over the last three to four months of pod development. As part of pod maturation, the chloroplasts senesce and gradually redifferentiate into phenyloplasts (Figure 21.12). Vanillin glucoside accumulates in the stroma of the phenyloplasts at concentrations reaching 4.7 M [147]. It remains to be elucidated how such extremely high molar concentrations of vanillin glucoside are accumulated and stored in the phenyloplasts.

The diterpenoid forskolin (Section 21.5, Figure 21.7) has gained considerable interest owing to its wide range of pharmacological applications. The structure of forskolin carries eight chiral carbon atoms. The efficacy of forskolin relies on its ability to activate adenylate cyclase, resulting in increased levels of intracellular 3′,5′-cyclic adenosine monophosphate (cAMP) [143, 144]. Forskolin is produced


Figure 21.12 Vanillin glucoside synthesis and accumulation take place in the inner part of the mesophyll tissue of the vanilla pod, in chloroplasts that during pod maturation redifferentiate into phenyloplasts harboring molar concentrations of vanillin glucoside. (a) Cross-section of a vanilla pod showing the different tissues present. A section of the inner part of the mesophyll tissue where vanillin glucoside synthesis takes place is marked by a yellow square. (b) Redifferentiation of chloroplasts into phenyloplasts in the course of pod maturation. (i) Initiation of redifferentiation of chloroplasts into phenyloplasts in a vanilla fruit four months after pollination; the chloroplast shows grana thylakoids and plastoglobules. (ii) Redifferentiating chloroplast showing granular stroma and grana thylakoid membranes generating loculi between them. Inset: magnification of rough thylakoids (bar 40 nm). (iii) Budding of the thylakoid membranes into pseudocircular vesicles containing ribosomes (black-lined arrow). Free vesicles are also seen (white-lined arrow). (iv) Increasing number of loculi and emergence of osmiophilic material (white-lined arrow). (v) A plastid showing its twin membranes and a locule filled with vanillin glucoside. (vi) A mature, filled phenyloplast with an entirely osmiophilic content and a surrounding membrane system; plastoglobules are no longer visible. cy, cytoplasm; cw, cell wall; gt, grana thylakoid; lo, locule; mb, membrane; pg, plastoglobule; st, stroma; th, thylakoid; t, tonoplast; v, vacuole. Sources: (a) Gallage et al. [148], by permission of Oxford University Press and (b) Brillouet et al. [147]. CC BY 3.0.

in Indian coleus. It accumulates in the root cork in a specialized cell type, each cell containing a histochemical structure reminiscent of an oil body in which forskolin is stored [144]. The entire forskolin pathway was elucidated by the combined use of metabolomics and transcriptomics and verified by stable expression of the biosynthetic genes in yeast and by transient expression in tobacco [143, 144]. The substrate specificities of the P450s involved show that they most likely function as part of a metabolic grid rather than as part of a strictly linear pathway. Analysis of potential NADES constituents in isolated forskolin-containing oil droplets, in conjunction with in-depth analyses of the available transcriptomes, could provide insight into the onset of transcription of genes encoding enzymes that catalyze NADES production and of transcription factors controlling the entire process of forskolin biosynthesis and storage. Storage of forskolin in oil bodies in specialized cells in the outer root bark illustrates how plants combine physical defense, in the form of cell walls, with chemical defense based on bioactive natural products. In other plant species, terpenoid-derived compounds are stored in trichomes. This applies to the cannabinoids in Cannabis sativa, terpenophenolic compounds that accumulate in glandular trichomes present at the highest density in female flowers [250, 260]. The cannabinoids are secreted into a balloon-shaped secretory cavity positioned at the tip of the capitate-stalked trichome [250, 261, 262]. The cannabinoid biosynthetic pathway has recently been elucidated [157, 158, 263–265]. Interestingly, tetrahydrocannabinolic acid synthase and cannabidiolic acid synthase, which catalyze the final biosynthetic steps, are localized in the hydrophobic phase of the balloon-shaped extracellular storage cavity. The same storage cavity also contains hydrophilic, amphiphilic, and osmoprotective compounds that could represent a NADES stabilizing the enzymes and keeping the cannabinoids in solution [251, 266, 267].


21.9 Conclusion: Metabolic Engineering of Plants in the Transition Toward a Biobased Society

Plants are nature's biochemists and chemists par excellence, using CO2 from the atmosphere as their carbon source and solar energy as their energy source to provide humanity with food, feed, biomaterials, and feedstock, e.g. for biofuel production. In addition, plants produce the vitamins and essential amino acids that are vital components of the human diet. Three unique biological processes make this possible: photosynthesis, the biosynthesis of a cell wall, and the production of a myriad of bioactive natural products, often based on the action of a plethora of P450s. As outlined in this review, advances in basic plant research are now making it possible to further evolve these unique processes. To maintain plant vitality, it is important to collaborate with nature and to focus metabolic engineering efforts on inherent properties already embedded in the plant as part of its plasticity. This plasticity is displayed in nature by the major differences between species in the organization of the photosynthetic processes, in cell wall composition, and in the natural products formed. Thus, plant metabolic engineering plays a fundamental role in facilitating the transition from the petrochemical-based production systems of the Anthropocene Era into a plant-centered Planthropocene Era based on environmentally benign, global-scale production systems. This is the context in which the metabolic engineering topics discussed in this review were chosen. Plant metabolic engineering may set the stage for disruptive innovation steps that offer new industries a clear competitive edge. In conjunction with synthetic biology, plant metabolic engineering is gaining recognition as such a transformative technology, with the power to provide science-based recommendations on how to address a wide range of the global challenges humanity faces in the decades ahead.

A semiopen approach to knowledge development based on the “share-your-parts” idea has inspired citizen scientists around the globe to engage in advancing sustainable production systems. It is important to recognize that open sharing of ideas and knowledge across the globe is essential to meet the challenges we are facing. The successful transition to and realization of a knowledge-based bioeconomy relies heavily on the ability to turn future challenges into vehicles for sustainable growth and shared prosperity. A focus on the development and marketing of new and innovative products manufactured from renewable resources, on local production, and on novel green technologies that possess transformative power and are economically feasible becomes crucial. In the past, plants and other photosynthetic organisms have made the world. If humanity wants to prosper, plants are also going to make our future.

Acknowledgments

Financial support from the VILLUM Foundation to the VILLUM research center “Plant Plasticity,” from the Lundbeck Foundation to the research initiative “Brewing Diterpenoids,” from the Novo Nordisk Interdisciplinary Biosynergy Program to the research initiative “Desert-loving Therapeutics,” and from the Novo Nordisk Foundation Distinguished Researcher Program is gratefully acknowledged. BLM acknowledges the continued strong support from private Danish Research Foundations and the European Research Council throughout his entire career. We are grateful to Prof. Peter Ulvskov and Assistant Prof. Tomas Laursen for valuable discussions and advice on the manuscript. Many researchers have made important contributions within plant metabolic engineering but do not find their work discussed and cited in our review. Please accept our apologies. Our intent with this review was not to present an exhaustive list of all research advances but rather, as outlined in the introduction, to focus on advances within photosynthetic, cell wall, and bioactive natural product research, with a few glimpses of other research areas.

References

1 Choi, K.R., Jiao, S., and Lee, S.Y. (2020). Metabolic engineering strategies toward production of biofuels. Current Opinion in Chemical Biology 59: 1–14.

2 Tattersall, D.B., Bak, S., Jones, P.R. et al. (2001). Resistance to an herbivore through engineered cyanogenic glucoside synthesis. Science 293: 1826–1828.

3 Kromdijk, J., Głowacka, K., Leonelli, L. et al. (2016). Improving photosynthesis and crop productivity by accelerating recovery from photoprotection. Science 354: 857–861.

4 Hughes, J., Hepworth, C., Dutton, C. et al. (2017). Reducing stomatal density in barley improves drought tolerance without impacting on yield. Plant Physiology 174: 776–787.

5 Butelli, E., Titta, L., Giorgio, M. et al. (2008). Enrichment of tomato fruit with health-promoting anthocyanins by expression of select transcription factors. Nature Biotechnology 26: 1301–1308.

6 Zhang, Y., Butelli, E., Alseekh, S. et al. (2015). Multi-level engineering facilitates the production of phenylpropanoid compounds in tomato. Nature Communications 6: 8635.

7 Ort, D.R. and Melis, A. (2011). Optimizing antenna size to maximize photosynthetic efficiency. Plant Physiology 155: 79–85.

8 Dudareva, N. and Pichersky, E. (2008). Metabolic engineering of plant volatiles. Current Opinion in Biotechnology 19: 181–189.

9 South, P.F., Cavanagh, A.P., Liu, H.W., and Ort, D.R. (2019). Synthetic glycolate metabolism pathways stimulate crop growth and productivity in the field. Science 363: eaat9077.

10 Brugliera, F., Tao, G.Q., Tems, U. et al. (2013). Violet/blue chrysanthemums – metabolic engineering of the anthocyanin biosynthetic pathway results in novel petal colors. Plant & Cell Physiology 54: 1696–1710.

11 Nour-Eldin, H.H., Madsen, S.R., Engelen, S. et al. (2017). Reduction of antinutritional glucosinolates in Brassica oilseeds by mutation of genes encoding transporters. Nature Biotechnology 35: 377–382.


12 Renau-Morata, B., Carrillo, L., Cebolla-Cornejo, J. et al. (2020). The targeted overexpression of SlCDF4 in the fruit enhances tomato size and yield involving gibberellin signalling. Scientific Reports – UK 10: 10645. https://doi.org/10.1038/s41598-020-67537-x.

13 Drewry, D.T., Kumar, P., and Long, S.P. (2014). Simultaneous improvement in productivity, water use, and albedo through crop structural modification. Global Change Biology 20: 1955–1967.

14 Lopez-Arredondo, D.L., Leyva-Gonzalez, M.A., Alatorre-Cobos, F., and Herrera-Estrella, L. (2013). Biotechnology of nutrient uptake and assimilation in plants. International Journal of Developmental Biology 57: 595–610.

15 Chen, L.Y. and Liao, H. (2017). Engineering crop nutrient efficiency for sustainable agriculture. Journal of Integrative Plant Biology 59: 710–735.

16 Ferrol, N., Azcon-Aguilar, C., and Perez-Tienda, J. (2019). Review: arbuscular mycorrhizas as key players in sustainable plant phosphorus acquisition: an overview on the mechanisms involved. Plant Science 280: 441–447.

17 Møller, I.S., Gilliham, M., Jha, D. et al. (2009). Shoot Na+ exclusion and increased salinity tolerance engineered by cell type-specific alteration of Na+ transport in Arabidopsis. The Plant Cell 21: 2163–2178.

18 Giesemann, P., Rasmussen, H.N., Liebel, H.T., and Gebauer, G. (2020). Discreet heterotrophs: green plants that receive fungal carbon through Paris-type arbuscular mycorrhiza. The New Phytologist 226: 960–966.

19 Uga, Y., Sugimoto, K., Ogawa, S. et al. (2013). Control of root system architecture by DEEPER ROOTING 1 increases rice yield under drought conditions. Nature Genetics 45: 1097–1102.

20 Huisman, R. and Geurts, R. (2020). A roadmap toward engineered nitrogen-fixing nodule symbiosis. Plant Communications 1: 100019.

21 Dar, M.H., Zaidi, N.W., Waza, S.A. et al. (2018). No yield penalty under favorable conditions paving the way for successful adoption of flood tolerant rice. Scientific Reports – UK 8: 9245.

22 Bjarnholt, N., Neilson, E.H., Crocoll, C. et al. (2018). Glutathione transferases catalyze recycling of auto-toxic cyanogenic glucosides in sorghum. The Plant Journal 94: 1109–1125.

23 Picmanova, M., Neilson, E.H., Motawia, M.S. et al. (2015). A recycling pathway for cyanogenic glycosides evidenced by the comparative metabolic profiling in three cyanogenic plant species. The Biochemical Journal 469: 375–389.

24 Neilson, E.H., Goodger, J.Q., Woodrow, I.E., and Møller, B.L. (2013). Plant chemical defense: at what cost? Trends in Plant Science 18: 250–258.

25 Nishiyama, T., Sakayama, H., de Vries, J. et al. (2018). The Chara genome: secondary complexity and implications for plant terrestrialization. Cell 174: 448–464.e24.

26 Harholt, J., Moestrup, O., and Ulvskov, P. (2016). Why plants were terrestrial from the beginning. Trends in Plant Science 21: 96–101.

27 Ulvskov, P. and Harholt, J. (2018). A new polysaccharide with a long evolutionary history. The Plant Cell 30: 1165–1166.


28 Jensen, J.K., Busse-Wicher, M., Poulsen, C.P. et al. (2018). Identification of an algal xylan synthase indicates that there is functional orthology between algal and plant cell wall biosynthesis. The New Phytologist 218: 1049–1060.

29 Pear, J.R., Kawagoe, Y., Schreckengost, W.E. et al. (1996). Higher plants contain homologs of the bacterial celA genes encoding the catalytic subunit of cellulose synthase. Proceedings of the National Academy of Sciences 93: 12637–12642.

30 Blankenship, R.E. (2017). How cyanobacteria went green. Science 355: 1372–1373.

31 Cannell, N., Emms, D.M., Hetherington, A.J. et al. (2020). Multiple metabolic innovations and losses are associated with major transitions in land plant evolution. Current Biology 30: 1783–1800.e11.

32 Cheng, S.F., Xian, W.F., Fu, Y. et al. (2019). Genomes of subaerial Zygnematophyceae provide insights into land plant evolution. Cell 179: 1057.

33 Rousseau Gueutin, M., Keller, J., Ferreira De Carvalho, J. et al. (2018). The intertwined chloroplast and nuclear genome coevolution in plants. In: Plant Growth and Regulation – Alterations to Sustain Unfavorable Conditions. London: InTech Open.

34 Stephenson, P.G., Moore, C.M., Terry, M.J. et al. (2011). Improving photosynthesis for algal biofuels: toward a green revolution. Trends in Biotechnology 29: 615–623.

35 Zhu, X.G., Long, S.P., and Ort, D.R. (2008). What is the maximum efficiency with which photosynthesis can convert solar energy into biomass? Current Opinion in Biotechnology 19: 153–159.

36 Nelson, N. and Ben-Shem, A. (2004). The complex architecture of oxygenic photosynthesis. Nature Reviews Molecular Cell Biology 5: 971–982.

37 Faralli, M. and Lawson, T. (2020). Natural genetic variation in photosynthesis: an untapped resource to increase crop yield potential? The Plant Journal 101: 518–528.

38 Roles, J., Yarnold, J., Wolf, J. et al. (2020). Charting a development path to deliver cost competitive microalgae-based fuels. Algal Research 45.

39 Gust, D., Moore, T.A., and Moore, A.L. (2009). Solar fuels via artificial photosynthesis. Accounts of Chemical Research 42: 1890–1898.

40 Long, S.P., Marshall-Colon, A., and Zhu, X.G. (2015). Meeting the global food demand of the future by engineering crop photosynthesis and yield potential. Cell 161: 56–66.

41 Lughadha, E.N., Govaerts, R., Belyaeva, I. et al. (2016). Counting counts: revised estimates of numbers of accepted species of flowering plants, seed plants, vascular plants and land plants with a review of other recent estimates. Phytotaxa 272: 82–88.

42 Singh, S.K., Sundaram, S., Sinha, S. et al. (2016). Recent advances in CO2 uptake and fixation mechanism of cyanobacteria and microalgae. Critical Reviews in Environmental Science and Technology 46: 1297–1323.

43 Kuroda, H. and Maliga, P. (2001). Complementarity of the 16S rRNA penultimate stem with sequences downstream of the AUG destabilizes the plastid mRNAs. Nucleic Acids Research 29: 970–975.


44 Zhou, F., Badillo-Corona, J.A., Karcher, D. et al. (2008). High-level expression of human immunodeficiency virus antigens from the tobacco and tomato plastid genomes. Plant Biotechnology Journal 6: 897–913.

45 Oey, M., Lohse, M., Kreikemeyer, B., and Bock, R. (2009). Exhaustion of the chloroplast protein synthesis capacity by massive expression of a highly stable protein antibiotic. The Plant Journal 57: 436–445.

46 Maliga, P. (2004). Plastid transformation in higher plants. Annual Review of Plant Biology 55: 289–313.

47 Bock, R. (2007). Plastid biotechnology: prospects for herbicide and insect resistance, metabolic engineering and molecular farming. Current Opinion in Biotechnology 18: 100–106.

48 Ruf, S., Hermann, M., Berger, I.J. et al. (2001). Stable genetic transformation of tomato plastids and expression of a foreign protein in fruit. Nature Biotechnology 19: 870–875.

49 Dufourmantel, N., Pelissier, B., Garcon, F. et al. (2004). Generation of fertile transplastomic soybean. Plant Molecular Biology 55: 479–489.

50 Hou, B.-K., Zhou, Y.-H., Wan, L.-H. et al. (2003). Chloroplast transformation in oilseed rape. Transgenic Research 12: 111–114.

51 Ruhlman, T., Ahangari, R., Devine, A. et al. (2007). Expression of cholera toxin B-proinsulin fusion protein in lettuce and tobacco chloroplasts – oral administration protects against development of insulitis in non-obese diabetic mice. Plant Biotechnology Journal 5: 495–510.

52 Oey, M., Lohse, M., Scharff, L.B. et al. (2009). Plastid production of protein antibiotics against pneumonia via a new strategy for high-level expression of antimicrobial proteins. Proceedings of the National Academy of Sciences of the United States of America 106: 6579–6584.

53 Bock, R. (2014). Engineering chloroplasts for high-level foreign protein expression. Methods in Molecular Biology (Clifton, NJ) 1132: 93–106.

54 Lu, Y., Rijzaani, H., Karcher, D. et al. (2013). Efficient metabolic pathway engineering in transgenic tobacco and tomato plastids with synthetic multigene operons. Proceedings of the National Academy of Sciences 110: E623–E632.

55 Tong, Y., Weber, T., and Lee, S.Y. (2019). CRISPR/Cas-based genome engineering in natural product discovery. Natural Product Reports 36: 1262–1280.

56 Liu, G., Li, J., and Godwin, I.D. (2019). Genome editing by CRISPR/Cas9 in Sorghum through biolistic bombardment. In: Sorghum: Methods and Protocols (eds. Z.-Y. Zhao and J. Dahlberg), 169–183. Springer New York: New York, NY.

57 Chen, R., Chen, X., Hagel, J.M., and Facchini, P.J. (2020). Virus-induced gene silencing to investigate alkaloid biosynthesis in opium poppy. In: Virus-Induced Gene Silencing in Plants: Methods and Protocols, 75–92. New York, NY: Springer US.

58 Schnabel, A., Cotinguiba, F., Athmer, B. et al. (2020). A piperic acid CoA ligase produces a putative precursor of piperine, the pungent principle from black pepper fruits. The Plant Journal 102: 569–581.


59 Bally, J., Jung, H., Mortimer, C. et al. (2018). The rise and rise of Nicotiana benthamiana: a plant for all reasons. Annual Review of Phytopathology 56: 405–426.

60 Bjarnholt, N., Li, B., D’Alvise, J., and Janfelt, C. (2014). Mass spectrometry imaging of plant metabolites – principles and possibilities. Natural Product Reports 31: 818–837.

61 Li, B., Knudsen, C., Hansen, N.K. et al. (2013). Visualizing metabolite distribution and enzymatic conversion in plant tissues by desorption electrospray ionization mass spectrometry imaging. The Plant Journal 74: 1059–1071.

62 Montini, L., Crocoll, C., Gleadow, R.M. et al. (2020). Matrix-assisted laser desorption/ionization-mass spectrometry imaging of metabolites during Sorghum germination. Plant Physiology 183: 925–942.

63 Boughton, B.A., Thinagaran, D., Sarabia, D. et al. (2016). Mass spectrometry imaging for plant biology: a review. Phytochemistry Reviews 15: 445–488.

64 Belcher, M.S., Vuu, K.M., Zhou, A. et al. (2020). Design of orthogonal regulatory systems for modulating gene expression in plants. Nature Chemical Biology 16: 857–865.

65 Choi, K.R., Jang, W.D., Yang, D. et al. (2019). Systems metabolic engineering strategies: integrating systems and synthetic biology with metabolic engineering. Trends in Biotechnology 37: 817–837.

66 Li, D.P., Halitschke, R., Baldwin, I.T., and Gaquerel, E. (2020). Information theory tests critical predictions of plant defense theory for specialized metabolism. Science Advances 6 (24): eaaz0381. https://doi.org/10.1126/sciadv.aaz0381.

67 Xu, S.Q., Kreitzer, C., McGale, E. et al. (2020). Allelic differences of clustered terpene synthases contribute to correlated intraspecific variation of floral and herbivory-induced volatiles in a wild tobacco. The New Phytologist https://doi.org/10.1111/nph.16739.

68 Valim, H., Dalton, H., Joo, Y. et al. (2020). TOC1 in Nicotiana attenuata regulates efficient allocation of nitrogen to defense metabolites under herbivory stress. The New Phytologist.

69 Lassen, L.M., Nielsen, A.Z., Ziersen, B. et al. (2014). Redirecting photosynthetic electron flow into light-driven synthesis of alternative products including high-value bioactive natural compounds. ACS Synthetic Biology 3: 1–12.

70 Bracher, A., Whitney, S.M., Hartl, F.U., and Hayer-Hartl, M. (2017). Biogenesis and metabolic maintenance of rubisco. Annual Review of Plant Biology 68: 29–60.

71 Field, C.B., Behrenfeld, M.J., Randerson, J.T., and Falkowski, P. (1998). Primary production of the biosphere: integrating terrestrial and oceanic components. Science 281: 237–240.

72 Erb, T.J. and Zarzycki, J. (2018). A short history of RubisCO: the rise and fall (?) of Nature’s predominant CO2 fixing enzyme. Current Opinion in Biotechnology 49: 100–107.

73 Busch, F.A. (2020). Photorespiration in the context of rubisco biochemistry, CO2 diffusion and metabolism. The Plant Journal 101: 919–939.


74 Eisenhut, M. and Weber, A.P.M. (2019). Improving crop yield. Science 363: 32–33.

75 Bauwe, H., Hagemann, M., and Fernie, A.R. (2010). Photorespiration: players, partners and origin. Trends in Plant Science 15: 330–336.

76 Ray, D.K., Mueller, N.D., West, P.C., and Foley, J.A. (2013). Yield trends are insufficient to double global crop production by 2050. PLoS One 8: e66428.

77 Zhu, X.-G., Long, S.P., and Ort, D.R. (2010). Improving photosynthetic efficiency for greater yield. Annual Review of Plant Biology 61: 235–261.

78 Głowacka, K., Kromdijk, J., Kucera, K. et al. (2018). Photosystem II subunit S overexpression increases the efficiency of water use in a field-grown crop. Nature Communications 9: 868.

79 Park, S., Steen, C.J., Lyska, D. et al. (2019). Chlorophyll–carotenoid excitation energy transfer and charge transfer in Nannochloropsis oceanica for the regulation of photosynthesis. Proceedings of the National Academy of Sciences 116: 3385–3390.

80 Correa-Galvis, V., Redekop, P., Guan, K. et al. (2016). Photosystem II subunit PsbS is involved in the induction of LHCSR protein-dependent energy dissipation in Chlamydomonas reinhardtii. The Journal of Biological Chemistry 291: 17478–17487.

81 Schenck, C.A. and Maeda, H.A. (2018). Tyrosine biosynthesis, metabolism, and catabolism in plants. Phytochemistry 149: 82–102.

82 Nielsen, A.Z., Mellor, S.B., Vavitsas, K. et al. (2016). Extending the biosynthetic repertoires of cyanobacteria and chloroplasts. The Plant Journal 87: 87–102.

83 Lande, N.V., Barua, P., Gayen, D. et al. (2020). Proteomic dissection of the chloroplast: moving beyond photosynthesis. Journal of Proteomics 212: 103542.

84 Tetali, S.D. (2019). Terpenes and isoprenoids: a wealth of compounds for global use. Planta 249: 1–8.

85 Peterhansel, C., Niessen, M., and Kebeish, R.M. (2008). Metabolic engineering towards the enhancement of photosynthesis. Photochemistry and Photobiology 84: 1317–1323.

86 Ort, D.R., Merchant, S.S., Alric, J. et al. (2015). Redesigning photosynthesis to sustainably meet global food and bioenergy demand. Proceedings of the National Academy of Sciences 112: 8529–8536.

87 Bailey-Serres, J., Parker, J.E., Ainsworth, E.A. et al. (2019). Genetic strategies for improving crop yields. Nature 575: 109–118.

88 Mnich, E., Bjarnholt, N., Eudes, A. et al. (2020). Phenolic cross-links: building and de-constructing the plant cell wall. Natural Product Reports 37: 919–961.

89 Brandon, A.G. and Scheller, H.V. (2020). Engineering of bioenergy crops: dominant genetic approaches to improve polysaccharide properties and composition in biomass. Frontiers in Plant Science 11: 282.

90 Lampugnani, E.R., Flores-Sandoval, E., Tan, Q.W. et al. (2019). Cellulose synthesis – central components and their evolutionary relationships. Trends in Plant Science 24: 402–412.

References

91 Watanabe, Y., Schneider, R., Barkwill, S. et al. (2018). Cellulose synthase complexes display distinct dynamic behaviors during xylem transdifferentiation. Proceedings of the National Academy of Sciences of the United States of America 115: E6366–E6374.

92 Scheller, H.V. and Ulvskov, P. (2010). Hemicelluloses. Annual Review of Plant Biology 61: 263–289.

93 Eudes, A., Liang, Y., Mitra, P., and Loque, D. (2014). Lignin bioengineering. Current Opinion in Biotechnology 26: 189–198.

94 Loque, D., Scheller, H.V., and Pauly, M. (2015). Engineering of plant cell walls for enhanced biofuel production. Current Opinion in Plant Biology 25: 151–161.

95 Renault, H., Werck-Reichhart, D., and Weng, J.K. (2019). Harnessing lignin evolution for biotechnological applications. Current Opinion in Biotechnology 56: 105–111.

96 Schreiber, L. (2010). Transport barriers made of cutin, suberin and associated waxes. Trends in Plant Science 15: 546–553.

97 Renault, H., Alber, A., Horst, N.A. et al. (2017). A phenol-enriched cuticle is ancestral to lignin evolution in land plants. Nature Communications 8: 14713.

98 Pauly, M. and Keegstra, K. (2008). Cell-wall carbohydrates and their modification as a resource for biofuels. The Plant Journal 54: 559–568.

99 Valencia, R.H.C. and Hagan, A. (2020). The Chemical Industry Under the 4th Industrial Revolution: The Sustainable, Digital and Citizens One. Wiley-VCH Verlag GmbH.

100 UNEP (2019). The evolving chemicals economy: status and trends relevant for sustainability. In: Global Chemicals Outlook II (ed. Programme UNE). UNEP.

101 Petersen, P.D., Lau, J., Ebert, B. et al. (2012). Engineering of plants with improved properties as biofuels feedstocks by vessel-specific complementation of xylan biosynthesis mutants. Biotechnology for Biofuels 5: 84.

102 Yan, J.W., Aznar, A., Chalvin, C. et al. (2018). Increased drought tolerance in plants engineered for low lignin and low xylan content. Biotechnology for Biofuels 11: 195.

103 Brandon, A.G., Birdseye, D.S., and Scheller, H.V. (2020). A dominant negative approach to reduce xylan in plants. Plant Biotechnology Journal 18: 5–7.

104 Yang, F., Mitra, P., Zhang, L. et al. (2013). Engineering secondary cell wall deposition in plants. Plant Biotechnology Journal 11: 325–335.

105 Scullin, C., Cruz, A.G., Chuang, Y.D. et al. (2015). Restricting lignin and enhancing sugar deposition in secondary cell walls enhances monomeric sugar release after low temperature ionic liquid pretreatment. Biotechnology for Biofuels 8: 95. https://doi.org/10.1186/s13068-015-0275-2.

106 Eudes, A., Pereira, J.H., Yogiswara, S. et al. (2016). Exploiting the substrate promiscuity of hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase to reduce lignin. Plant & Cell Physiology 57: 568–579.

107 Eudes, A., Sathitsuksanoh, N., Baidoo, E.E.K. et al. (2015). Expression of a bacterial 3-dehydroshikimate dehydratase reduces lignin content and improves biomass saccharification efficiency. Plant Biotechnology Journal 13: 1241–1250.

21 Metabolic Engineering of Photosynthetic Cells – in Collaboration with Nature

108 Sundin, L., Vanholme, R., Geerinck, J. et al. (2014). Mutation of the inducible ARABIDOPSIS THALIANA CYTOCHROME P450 REDUCTASE2 alters lignin composition and improves saccharification. Plant Physiology 166: 1956–1971.

109 Li, G., Jones, K.C., Eudes, A. et al. (2018). Overexpression of a rice BAHD acyltransferase gene in switchgrass (Panicum virgatum L.) enhances saccharification. BMC Biotechnology 18: 54.

110 Baldwin, I.T. (2017). Plant science: the plant as pugilist. Nature 543: 39.

111 Banerjee, P., Erehman, J., Gohlke, B.O. et al. (2015). Super Natural II – a database of natural products. Nucleic Acids Research 43: D935–D939.

112 He, J., Fandino, R.A., Halitschke, R. et al. (2019). An unbiased approach elucidates variation in (S)-(+)-linalool, a context-specific mediator of a tri-trophic interaction in wild tobacco. Proceedings of the National Academy of Sciences of the United States of America 116: 14651–14660.

113 Halitschke, R., Stenberg, J.A., Kessler, D. et al. (2008). Shared signals – “alarm calls” from plants increase apparency to herbivores and their enemies in nature. Ecology Letters 11: 24–34.

114 Nelson, D. and Werck-Reichhart, D. (2011). A P450-centric view of plant evolution. The Plant Journal 66: 194–211.

115 Nelson, D.R. (2018). Cytochrome P450 diversity in the tree of life. Biochimica et Biophysica Acta (BBA) – Proteins and Proteomics 1866: 141–154.

116 Hamberger, B. and Bak, S. (2013). Plant P450s as versatile drivers for evolution of species-specific chemical diversity. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 368: 20120426.

117 Ilc, T., Arista, G., Tavares, R. et al. (2018). Annotation, classification, genomic organization and expression of the Vitis vinifera CYPome. PLoS One 13: e0199902.

118 Sanchez-Munoz, R., Perez-Mata, E., Almagro, L. et al. (2020). A novel hydroxylation step in the taxane biosynthetic pathway: a new approach to paclitaxel production by synthetic biology. Frontiers in Bioengineering and Biotechnology 8: 410. https://doi.org/10.3389/fbioe.2020.00410.

119 Sanchez-Perez, R., Pavan, S., Mazzeo, R. et al. (2019). Mutation of a bHLH transcription factor allowed almond domestication. Science 364: 1095–1098.

120 Xu, Z.S., Feng, K., Que, F. et al. (2017). A MYB transcription factor, DcMYB6, is involved in regulating anthocyanin biosynthesis in purple carrot taproots. Scientific Reports 7: 45324. https://doi.org/10.1038/srep45324.

121 Broun, P. (2004). Transcription factors as tools for metabolic engineering in plants. Current Opinion in Plant Biology 7: 202–209.

122 Li, J.W., Zhang, X.Y., Wu, H., and Bai, Y.P. (2020). Transcription factor engineering for high-throughput strain evolution and organic acid bioproduction: a review. Frontiers in Bioengineering and Biotechnology 8: 98.

123 Yuan, S.-F., Yi, X., Johnston, T.G., and Alper, H.S. (2020). De novo resveratrol production through modular engineering of an Escherichia coli–Saccharomyces cerevisiae co-culture. Microbial Cell Factories 19: 143.


124 Tohge, T., Zhang, Y., Peterek, S. et al. (2015). Ectopic expression of snapdragon transcription factors facilitates the identification of genes encoding enzymes of anthocyanin decoration in tomato. The Plant Journal 83: 686–704.

125 Jørgensen, K., Bak, S., Busk, P.K. et al. (2005). Cassava plants with a depleted cyanogenic glucoside content in leaves and tubers. Distribution of cyanogenic glucosides, their site of synthesis and transport, and blockage of the biosynthesis by RNA interference technology. Plant Physiology 139: 363–374.

126 Usher, S., Haslam, R.P., Ruiz-Lopez, N. et al. (2015). Field trial evaluation of the accumulation of omega-3 long chain polyunsaturated fatty acids in transgenic Camelina sativa: making fish oil substitutes in plants. Metabolic Engineering Communications 2: 93–98.

127 Vogt, T. (2010). Phenylpropanoid biosynthesis. Molecular Plant 3: 2–20.

128 Giuliano, G., Tavazza, R., Diretto, G. et al. (2008). Metabolic engineering of carotenoid biosynthesis in plants. Trends in Biotechnology 26: 139–145.

129 Ye, X.D., Al-Babili, S., Kloti, A. et al. (2000). Engineering the provitamin A (beta-carotene) biosynthetic pathway into (carotenoid-free) rice endosperm. Science 287: 303–305.

130 Paine, J.A., Shipton, C.A., Chaggar, S. et al. (2005). Improving the nutritional value of Golden Rice through increased pro-vitamin A content. Nature Biotechnology 23: 482–487.

131 Diretto, G., Al-Babili, S., Tavazza, R. et al. (2007). Metabolic engineering of potato carotenoid content through tuber-specific overexpression of a bacterial mini-pathway. PLoS One 2: e350.

132 Glenn, W.S., Runguphan, W., and O’Connor, S.E. (2013). Recent progress in the metabolic engineering of alkaloids in plant systems. Current Opinion in Biotechnology 24: 354–365.

133 Singh, A., Menendez-Perdomo, I.M., and Facchini, P.J. (2019). Benzylisoquinoline alkaloid biosynthesis in opium poppy: an update. Phytochemistry Reviews 18: 1457–1482.

134 Li, Q.S., Ramasamy, S., Singh, P. et al. (2020). Gene clustering and copy number variation in alkaloid metabolic pathways of opium poppy. Nature Communications 11.

135 Nett, R.S., Lau, W., and Sattely, E.S. (2020). Discovery and engineering of colchicine alkaloid biosynthesis. Nature.

136 Kristensen, C., Morant, M., Olsen, C.E. et al. (2005). Metabolic engineering of dhurrin in transgenic Arabidopsis plants with marginal inadvertent effects on the metabolome and transcriptome. Proceedings of the National Academy of Sciences of the United States of America 102: 1779–1784.

137 Zang, Y.X., Kim, D.H., Park, B.S., and Hong, S.B. (2009). Metabolic engineering of indole glucosinolates in Chinese cabbage hairy roots expressing Arabidopsis CYP79B2, CYP79B3, and CYP83B1. Biotechnology and Bioprocess Engineering 14: 467–473.

138 Yu, X.D., Pickett, J., Ma, Y.Z. et al. (2012). Metabolic engineering of plant-derived (E)-β-farnesene synthase genes for a novel type of aphid-resistant genetically modified crop plants. Journal of Integrative Plant Biology 54: 282–299.

139 Han, Y.-J. and Kim, J.-I. (2019). Application of CRISPR/Cas9-mediated gene editing for the development of herbicide-resistant plants. Plant Biotechnology Reports 13: 447–457.

140 Duke, S. and Powles, S. (2009). Glyphosate-resistant crops and weeds: now and in the future. AgBioForum 12 (3–4): 346–357.

141 Nielsen, A.Z., Ziersen, B., Jensen, K. et al. (2013). Redirecting photosynthetic reducing power toward bioactive natural product synthesis. ACS Synthetic Biology 2: 308–315.

142 Zhang, L.S. and Lu, S.F. (2017). Overview of medicinally important diterpenoids derived from plastids. Mini-Reviews in Medicinal Chemistry 17: 988–1001.

143 Pateraki, I., Andersen-Ranberg, J., Jensen, N.B. et al. (2017). Total biosynthesis of the cyclic AMP booster forskolin from Coleus forskohlii. eLife 6: e23001. https://doi.org/10.7554/eLife.23001.

144 Pateraki, I., Andersen-Ranberg, J., Hamberger, B. et al. (2014). Manoyl oxide (13R), the biosynthetic precursor of forskolin, is synthesized in specialized root cork cells in Coleus forskohlii. Plant Physiology 164: 1222–1236.

145 Okazaki, Y., Shimojima, M., Sawada, Y. et al. (2009). A chloroplastic UDP-glucose pyrophosphorylase from Arabidopsis is the committed enzyme for the first step of sulfolipid biosynthesis. The Plant Cell 21: 892–909.

146 Harwood, R., Goodman, E., Gudmundsdottir, M. et al. (2020). Cell and chloroplast anatomical features are poorly estimated from 2D cross-sections. The New Phytologist 225: 2567–2578.

147 Brillouet, J.-M., Verdeil, J.-L., Odoux, E. et al. (2014). Phenol homeostasis is ensured in vanilla fruit by storage under solid form in a new chloroplast-derived organelle, the phenyloplast. Journal of Experimental Botany 65: 2427–2435.

148 Gallage, N.J., Jørgensen, K., Janfelt, C. et al. (2018). The intracellular localization of the vanillin biosynthetic machinery in pods of Vanilla planifolia. Plant & Cell Physiology 59: 304–318.

149 de la Torre, F., Cañas, R.A., Pascual, M.B. et al. (2014). Plastidic aspartate aminotransferases and the biosynthesis of essential amino acids in plants. Journal of Experimental Botany 65: 5527–5534.

150 Bickel, H., Palme, L., and Schultz, G. (1978). Incorporation of shikimate and other precursors into aromatic amino acids and prenylquinones of isolated spinach chloroplasts. Phytochemistry 17: 119–124.

151 Schulze-Siebert, D., Heineke, D., Scharf, H., and Schultz, G. (1984). Pyruvate-derived amino acids in spinach chloroplasts: synthesis and regulation during photosynthetic carbon metabolism. Plant Physiology 76: 465–471.

152 Han, X., Lamshöft, M., Grobe, N. et al. (2010). The biosynthesis of papaverine proceeds via (S)-reticuline. Phytochemistry 71: 1305–1312.

153 Onoyovwe, A., Hagel, J.M., Chen, X. et al. (2013). Morphine biosynthesis in opium poppy involves two cell types: sieve elements and laticifers. The Plant Cell 25: 4110–4122.


154 Gnanasekaran, T., Vavitsas, K., Andersen-Ranberg, J. et al. (2015). Heterologous expression of the isopimaric acid pathway in Nicotiana benthamiana and the effect of N-terminal modifications of the involved cytochrome P450 enzyme. Journal of Biological Engineering 9: 24.

155 Mao, L.F., Kawaide, H., Higuchi, T. et al. (2020). Genomic evidence for convergent evolution of gene clusters for momilactone biosynthesis in land plants. Proceedings of the National Academy of Sciences of the United States of America 117: 12472–12480.

156 Irmisch, S., Jancsik, S., Yuen, M.M.S. et al. (2020). Complete biosynthesis of the anti-diabetic plant metabolite montbretin A. Plant Physiology. https://doi.org/10.1104/pp.20.00522.

157 Luo, X.Z., Reiter, M.A., d’Espaux, L. et al. (2019). Complete biosynthesis of cannabinoids and their unnatural analogues in yeast. Nature 567: 123.

158 Gülck, T., Booth, J.K., Carvalho, Â. et al. (2020). Synthetic biology of cannabinoids and cannabinoid glycosides in Nicotiana benthamiana and Saccharomyces cerevisiae. Journal of Natural Products 83 (10): 2877–2893.

159 Antal, T.K., Kovalenko, I.B., Rubin, A.B., and Tyystjärvi, E. (2013). Photosynthesis-related quantities for education and modeling. Photosynthesis Research 117: 1–30.

160 Sibbesen, O., Koch, B., Halkier, B.A., and Møller, B.L. (1994). Isolation of the heme-thiolate enzyme cytochrome P-450TYR, which catalyzes the committed step in the biosynthesis of the cyanogenic glucoside dhurrin in Sorghum bicolor (L.) Moench. Proceedings of the National Academy of Sciences of the United States of America 91: 9740–9744.

161 Sibbesen, O., Koch, B., Halkier, B.A., and Møller, B.L. (1995). Cytochrome P-450TYR is a multifunctional heme-thiolate enzyme catalyzing the conversion of L-tyrosine to p-hydroxyphenylacetaldehyde oxime in the biosynthesis of the cyanogenic glucoside dhurrin in Sorghum bicolor (L.) Moench. The Journal of Biological Chemistry 270: 3506–3511.

162 Kahn, R.A., Bak, S., Svendsen, I. et al. (1997). Isolation and reconstitution of cytochrome P450ox and in vitro reconstitution of the entire biosynthetic pathway of the cyanogenic glucoside dhurrin from sorghum. Plant Physiology 115: 1661–1670.

163 Bak, S., Kahn, R.A., Nielsen, H.L. et al. (1998). Cloning of three A-type cytochromes P450, CYP71E1, CYP98, and CYP99 from Sorghum bicolor (L.) Moench by a PCR approach and identification by expression in Escherichia coli of CYP71E1 as a multifunctional cytochrome P450 in the biosynthesis of the cyanogenic glucoside dhurrin. Plant Molecular Biology 36: 393–405.

164 Jones, P.R., Møller, B.L., and Høj, P.B. (1999). The UDP-glucose:p-hydroxymandelonitrile-O-glucosyltransferase that catalyzes the last step in synthesis of the cyanogenic glucoside dhurrin in Sorghum bicolor. Isolation, cloning, heterologous expression, and substrate specificity. The Journal of Biological Chemistry 274: 35483–35491.

165 Halkier, B.A. and Møller, B.L. (1989). Biosynthesis of the cyanogenic glucoside dhurrin in seedlings of Sorghum bicolor (L.) Moench and partial purification of the enzyme system involved. Plant Physiology 90: 1552–1559.


166 Jensen, K., Jensen, P.E., and Møller, B.L. (2011). Light-driven cytochrome P450 hydroxylations. ACS Chemical Biology 6: 533–539.

167 Jensen, P.E., Bassi, R., Boekema, E.J. et al. (2007). Structure, function and regulation of plant photosystem I. Biochimica et Biophysica Acta 1767: 335–352.

168 Jordan, P., Fromme, P., Witt, H.T. et al. (2001). Three-dimensional structure of cyanobacterial photosystem I at 2.5 Å resolution. Nature 411: 909–917.

169 Høj, P.B. and Møller, B.L. (1986). The 110 kDa reaction center protein of photosystem-I, P700-chlorophyll a-protein-1, is an iron-sulfur protein. Journal of Biological Chemistry 261: 14292–14300.

170 Høj, P.B., Svendsen, I., Scheller, H.V., and Møller, B.L. (1987). Identification of a chloroplast-encoded 9 kDa polypeptide as a 2[4Fe-4S] protein carrying center A and center B of photosystem I. Journal of Biological Chemistry 262: 12676–12684.

171 Mellor, S.B., Nielsen, A.Z., Burow, M. et al. (2016). Fusion of ferredoxin and cytochrome P450 enables direct light-driven biosynthesis. ACS Chemical Biology 11: 1862–1869.

172 Sadeghi, S.J. and Gilardi, G. (2013). Chimeric P450 enzymes: activity of artificial redox fusions driven by different reductases for biotechnological applications. Biotechnology and Applied Biochemistry 60: 102–110.

173 Wlodarczyk, A., Gnanasekaran, T., Nielsen, A.Z. et al. (2016). Metabolic engineering of light-driven cytochrome P450 dependent pathways into Synechocystis sp. PCC 6803. Metabolic Engineering 33: 1–11.

174 Maeda, H. and Dudareva, N. (2012). The shikimate pathway and aromatic amino acid biosynthesis in plants. Annual Review of Plant Biology 63: 73–105.

175 Chen, J., Zhu, M., Liu, R. et al. (2020). BIOMASS YIELD 1 regulates sorghum biomass and grain yield via the shikimate pathway. Journal of Experimental Botany 71: 5506–5520.

176 Zhou, F., Karcher, D., and Bock, R. (2007). Identification of a plastid intercistronic expression element (IEE) facilitating the expression of stable translatable monocistronic mRNAs from operons. The Plant Journal 52: 961–972.

177 Clarke, J.L., Paruch, L., Dobrica, M.-O. et al. (2017). Lettuce-produced hepatitis C virus E1E2 heterodimer triggers immune responses in mice and antibody production after oral vaccination. Plant Biotechnology Journal 15: 1611–1621.

178 Hoelscher, M., Tiller, N., Teh, A.Y. et al. (2018). High-level expression of the HIV entry inhibitor griffithsin from the plastid genome and retention of biological activity in dried tobacco leaves. Plant Molecular Biology 97: 357–370.

179 Bock, R. and Warzecha, H. (2010). Solar-powered factories for new vaccines and antibiotics. Trends in Biotechnology 28: 246–252.

180 Fuentes, P., Zhou, F., Erban, A. et al. (2016). A new synthetic biology approach allows transfer of an entire metabolic pathway from a medicinal plant to a biomass crop. eLife 5: e13664.


181 Song, Q.P., You, L.L., Liu, Y. et al. (2020). Endogenous accumulation of glycine betaine confers improved low temperature resistance on transplastomic potato plants. Functional Plant Biology 47 (12): 1105–1116.

182 Xu, S., Zhang, Y., Li, S. et al. (2020). Plastid-expressed Bacillus thuringiensis (Bt) cry3Bb confers high mortality to a leaf eating beetle in poplar. Plant Cell Reports 39: 317–323.

183 Lauersen, K.J., Wichmann, J., Baier, T. et al. (2018). Phototrophic production of heterologous diterpenoids and a hydroxy-functionalized derivative from Chlamydomonas reinhardtii. Metabolic Engineering 49: 116–127.

184 Polle, J.E.W., Jin, E., and Ben-Amotz, A. (2020). The alga Dunaliella revisited: looking back and moving forward with model and production organisms. Algal Research – Biomass Biofuels and Bioproducts 49. https://doi.org/10.1016/j.algal.2020.101948.

185 Torres-Tiji, Y., Fields, F.J., and Mayfield, S.P. (2020). Microalgae as a future food source. Biotechnology Advances 41: 107536.

186 Fayyaz, M., Chew, K.W., Show, P.L. et al. (2020). Genetic engineering of microalgae for enhanced biorefinery capabilities. Biotechnology Advances 43: 107554.

187 Lafarga, T., Clemente, I., and Garcia-Vaquero, M. (2020). Carotenoids from microalgae. In: Carotenoids: Properties, Processing and Applications (ed. C.M. Galanakis). Academic Press. https://doi.org/10.1016/b978-0-12-817067-0.00005-1.

188 Liu, Y.M., Cui, Y.L., Chen, J. et al. (2019). Metabolic engineering of Synechocystis sp. PCC6803 to produce astaxanthin. Algal Research – Biomass Biofuels and Bioproducts 44. https://doi.org/10.1016/j.algal.2019.101679.

189 Liang, Z.C., Liang, M.H., and Jiang, J.G. (2020). Transgenic microalgae as bioreactors. Critical Reviews in Food Science and Nutrition 60 (19): 3195–3213. https://doi.org/10.1080/10408398.2019.1680525.

190 Scherer, K., Stiefelmaier, J., Strieth, D. et al. (2020). Development of a lightweight multi-skin sheet photobioreactor for future cultivation of phototrophic biofilms on facades. Journal of Biotechnology 320: 28–35.

191 Jerney, J. and Spilling, K. (2020). Large scale cultivation of microalgae: open and closed systems. Methods in Molecular Biology (Clifton, NJ) 1980: 1–8.

192 Roles, J., Yarnold, J., Wolf, J. et al. (2020). Charting a development path to deliver cost competitive microalgae-based fuels. Algal Research 45: 101721.

193 Anto, S., Mukherjee, S.S., Muthappa, R. et al. (2020). Algae as green energy reserve: technological outlook on biofuel production. Chemosphere 242: 125079. https://doi.org/10.1016/j.chemosphere.2019.125079.

194 Huang, J.K., Hankamer, B., and Yarnold, J. (2019). Design scenarios of outdoor arrayed cylindrical photobioreactors for microalgae cultivation considering solar radiation and temperature. Algal Research – Biomass Biofuels and Bioproducts 41. https://doi.org/10.1016/j.algal.2019.101515.

195 Yarnold, J., Ross, I.L., and Hankamer, B. (2016). Photoacclimation and productivity of Chlamydomonas reinhardtii grown in fluctuating light regimes which simulate outdoor algal culture conditions. Algal Research 13: 182–194.


196 Vecchi, V., Barera, S., Bassi, R., and Dall’Osto, L. (2020). Potential and challenges of improving photosynthesis in algae. Plants (Basel) 9 (1). https://doi.org/10.3390/plants9010067.

197 Gonzalez-Morales, S.I., Pacheco-Gutierrez, N.B., Ramirez-Rodriguez, C.A. et al. (2020). Metabolic engineering of phosphite metabolism in Synechococcus elongatus PCC 7942 as an effective measure to control biological contaminants in outdoor raceway ponds. Biotechnology for Biofuels 13: 119.

198 Hammel, A., Sommer, F., Zimmer, D. et al. (2020). Overexpression of sedoheptulose-1,7-bisphosphatase enhances photosynthesis in Chlamydomonas reinhardtii and has no effect on the abundance of other Calvin-Benson cycle enzymes. Frontiers in Plant Science 11: 868.

199 Carrera-Pacheco, S.E., Hankamer, B., and Oey, M. (2020). Light and heat-shock mediated TDA1 overexpression as a tool for controlled high-yield recombinant protein production in Chlamydomonas reinhardtii chloroplasts. Algal Research – Biomass Biofuels and Bioproducts 48. https://doi.org/10.1016/j.algal.2020.101921.

200 Larrea-Alvarez, M. and Purton, S. (2020). Multigenic engineering of the chloroplast genome in the green alga Chlamydomonas reinhardtii. Microbiology (Reading) 166: 510–515.

201 Dautermann, O., Lyska, D., Andersen-Ranberg, J. et al. (2020). An algal enzyme required for biosynthesis of the most abundant marine carotenoids. Science Advances 6: eaaw9183.

202 Adarme-Vega, T.C., Lim, D.K.Y., Timmins, M. et al. (2012). Microalgal biofactories: a promising approach towards sustainable omega-3 fatty acid production. Microbial Cell Factories 11: 96. https://doi.org/10.1186/1475-2859-11-96.

203 Diao, J.J., Song, X.Y., Guo, T.H. et al. (2020). Cellular engineering strategies toward sustainable omega-3 long chain polyunsaturated fatty acids production: state of the art and perspectives. Biotechnology Advances 40: 107497. https://doi.org/10.1016/j.biotechadv.2019.107497.

204 Kilian, O., Benemann, C.S.E., Niyogi, K.K., and Vick, B. (2011). High-efficiency homologous recombination in the oil-producing alga Nannochloropsis sp. Proceedings of the National Academy of Sciences of the United States of America 108: 21265–21269.

205 Carneiro, M., Cicchi, B., Maia, I.B. et al. (2020). Effect of temperature on growth, photosynthesis and biochemical composition of Nannochloropsis oceanica, grown outdoors in tubular photobioreactors. Algal Research 49: 101923. https://doi.org/10.1016/j.algal.2020.101923.

206 Ryu, A.J., Kang, N.K., Jeon, S. et al. (2020). Development and characterization of a Nannochloropsis mutant with simultaneously enhanced growth and lipid production. Biotechnology for Biofuels 13: 38.

207 Xue, Z.H., Yu, Y., Yu, W.C. et al. (2020). Development prospect and preparation technology of edible oil from microalgae. Frontiers in Marine Science 7: 402. https://doi.org/10.3389/fmars.2020.00402.

208 Lassen, L.M., Nielsen, A.Z., Olsen, C.E. et al. (2014). Anchoring a plant cytochrome P450 via PsaM to the thylakoids in Synechococcus sp. PCC 7002: evidence for light-driven biosynthesis. PLoS One 9: e102184.


209 Lindblad, P., Fuente, D., Borbe, F. et al. (2019). CyanoFactory, a European consortium to develop technologies needed to advance cyanobacteria as chassis for production of chemicals and fuels. Algal Research – Biomass Biofuels and Bioproducts 41. https://doi.org/10.1016/j.algal.2019.101510.

210 Laursen, T., Møller, B.L., and Bassard, J.E. (2015). Plasticity of specialized metabolism as mediated by dynamic metabolons. Trends in Plant Science 20: 20–32.

211 Bassard, J.E., Møller, B.L., and Laursen, T. (2017). Assembly of dynamic P450-mediated metabolons – order versus chaos. Current Molecular Biology Reports 3: 37–51.

212 Møller, B.L. and Conn, E.E. (1980). The biosynthesis of cyanogenic glucosides in higher plants. Channeling of intermediates in dhurrin biosynthesis by a microsomal system from Sorghum bicolor (L.) Moench. The Journal of Biological Chemistry 255: 3049–3056.

213 Srere, P.A. (1985). The metabolon. Trends in Biochemical Sciences 10: 109–110.

214 Jørgensen, K., Rasmussen, A.V., Morant, M. et al. (2005). Metabolon formation and metabolic channeling in the biosynthesis of plant natural products. Current Opinion in Plant Biology 8: 280–291.

215 Mucha, S., Heinzlmeir, S., Kriechbaumer, V. et al. (2019). The formation of a camalexin biosynthetic metabolon. The Plant Cell 31: 2697–2710.

216 Hilser, V.J. and Thompson, E.B. (2007). Intrinsic disorder as a mechanism to optimize allosteric coupling in proteins. Proceedings of the National Academy of Sciences 104: 8311–8315.

217 Henzler-Wildman, K.A., Lei, M., Thai, V. et al. (2007). A hierarchy of timescales in protein dynamics is linked to enzyme catalysis. Nature 450: 913–916.

218 Hatzakis, N.S., Wei, L., Jorgensen, S.K. et al. (2012). Single enzyme studies reveal the existence of discrete functional states for monomeric enzymes and how they are “selected” upon allosteric regulation. Journal of the American Chemical Society 134: 9296–9302.

219 Laursen, T., Singha, A., Rantzau, N. et al. (2014). Single molecule activity measurements of cytochrome P450 oxidoreductase reveal the existence of two discrete functional states. ACS Chemical Biology 9: 630–634.

220 Dueber, J.E., Wu, G.C., Malmirchegini, G.R. et al. (2009). Synthetic protein scaffolds provide modular control over metabolic flux. Nature Biotechnology 27: 753–759.

221 Farre, G., Blancquaert, D., Capell, T. et al. (2014). Engineering complex metabolic pathways in plants. Annual Review of Plant Biology 65: 187.

222 Singleton, C., Howard, T.P., and Smirnoff, N. (2014). Synthetic metabolons for metabolic engineering. Journal of Experimental Botany 65: 1947–1954.

223 Lv, X.Q., Cui, S.X., Gu, Y. et al. (2020). Enzyme assembly for compartmentalized metabolic flux control. Metabolites 10 (4): 125. https://doi.org/10.3390/metabo10040125.

224 Laursen, T., Borch, J., Knudsen, C. et al. (2016). Characterization of a dynamic metabolon producing the defense compound dhurrin in sorghum. Science 354: 890–893.


225 Nielsen, L.J., Stuart, P., Picmanova, M. et al. (2016). Dhurrin metabolism in the developing grain of Sorghum bicolor (L.) Moench investigated by metabolite profiling and novel clustering analyses of time-resolved transcriptomic data. BMC Genomics 17: 1021.

226 Andersen, M.D., Busk, P.K., Svendsen, I., and Møller, B.L. (2000). Cytochromes P-450 from cassava (Manihot esculenta Crantz) catalyzing the first steps in the biosynthesis of the cyanogenic glucosides linamarin and lotaustralin. Cloning, functional expression in Pichia pastoris, and substrate specificity of the isolated recombinant enzymes. The Journal of Biological Chemistry 275: 1966–1975.

227 Jørgensen, K., Morant, A.V., Morant, M. et al. (2011). Biosynthesis of the cyanogenic glucosides linamarin and lotaustralin in cassava: isolation, biochemical characterization, and expression pattern of CYP71E7, the oxime-metabolizing cytochrome P450 enzyme. Plant Physiology 155: 282–292.

228 Kannangara, R., Motawia, M.S., Hansen, N.K.K. et al. (2011). Characterization and expression profile of two UDP-glucosyltransferases, UGT85K4 and UGT85K5, catalyzing the last step in cyanogenic glucoside biosynthesis in cassava. The Plant Journal 68: 287–301.

229 Schmidt, F.B., Cho, S.K., Olsen, C.E. et al. (2018). Diurnal regulation of cyanogenic glucoside biosynthesis and endogenous turnover in cassava. Plant Direct 2: e00038.

230 Bassard, J.E., Richert, L., Geerinck, J. et al. (2012). Protein-protein and protein-membrane associations in the lignin pathway. The Plant Cell 24: 4465–4482.

231 Dastmalchi, M., Bernards, M.A., and Dhaubhadel, S. (2016). Twin anchors of the soybean isoflavonoid metabolon: evidence for tethering of the complex to the endoplasmic reticulum by IFS and C4H. The Plant Journal 85: 689–706.

232 Lallemand, B., Erhardt, M., Heitz, T., and Legrand, M. (2013). Sporopollenin biosynthetic enzymes interact and constitute a metabolon localized to the endoplasmic reticulum of tapetum cells. Plant Physiology 162: 616–625.

233 Kwiatkowska, M., Polit, J.T., Stepinski, D. et al. (2015). Lipotubuloids in ovary epidermis of Ornithogalum umbellatum act as metabolons: suggestion of the name “lipotubuloid metabolon”. Journal of Experimental Botany 66: 1157–1163.

234 Paddon, C.J. and Keasling, J.D. (2014). Semi-synthetic artemisinin: a model for the use of synthetic biology in pharmaceutical development. Nature Reviews Microbiology 12: 355–367.

235 Yang, D., Park, S.Y., Park, Y.S. et al. (2020). Metabolic engineering of Escherichia coli for natural product biosynthesis. Trends in Biotechnology 38: 745–765.

236 Dolgin, E. (2018). Cell biology’s new phase: like oil in water, the contents of cells can separate into droplets. Finding out why is one of biology’s hottest questions. Nature 555: 300–302.


237 Choi, Y.H., van Spronsen, J., Dai, Y.T. et al. (2011). Are natural deep eutectic solvents the missing link in understanding cellular metabolism and physiology? Plant Physiology 156: 1701–1705.

238 Grimplet, J., Wheatley, M.D., Jouira, H.B. et al. (2009). Proteomic and selected metabolite analysis of grape berry tissues under well-watered and water-deficit stress conditions. Proteomics 9: 2503–2528.

239 Dai, Y.T., Witkamp, G.J., Verpoorte, R., and Choi, Y.H. (2013). Natural deep eutectic solvents as a new extraction media for phenolic metabolites in Carthamus tinctorius L. Analytical Chemistry 85: 6272–6278.

240 Zhao, B.Y., Xu, P., Yang, F.X. et al. (2015). Biocompatible deep eutectic solvents based on choline chloride: characterization and application to the extraction of rutin from Sophora japonica. ACS Sustainable Chemistry & Engineering 3: 2746–2755.

241 Kim, H.K., Choi, Y.H., and Verpoorte, R. (2010). NMR-based metabolomic analysis of plants. Nature Protocols 5: 536–549.

242 Gonzalez, C.G., Mustafa, N.R., Wilson, E.G. et al. (2018). Application of natural deep eutectic solvents for the “green” extraction of vanillin from vanilla pods. Flavour and Fragrance Journal 33: 91–96.

243 Francisco, M., van den Bruinhorst, A., and Kroon, M.C. (2013). Low-transition-temperature mixtures (LTTMs): a new generation of designer solvents. Angewandte Chemie International Edition 52: 3074–3085.

244 Hammond, O.S., Bowron, D.T., and Edler, K.J. (2017). The effect of water upon deep eutectic solvent nanostructure: an unusual transition from ionic mixture to aqueous solution. Angewandte Chemie International Edition 56: 9782–9785.

245 Hayes, R., Warr, G.G., and Atkin, R. (2015). Structure and nanostructure in ionic liquids. Chemical Reviews 115: 6357–6426.

246 Hammond, O.S., Bowron, D.T., and Edler, K.J. (2016). Liquid structure of the choline chloride-urea deep eutectic solvent (reline) from neutron diffraction and atomistic modelling. Green Chemistry 18: 2736–2744.

247 Markham, K.R., Gould, K.S., Winefield, C.S. et al. (2000). Anthocyanic vacuolar inclusions – their nature and significance in flower colouration. Phytochemistry 55: 327–336.

248 Markham, K.R., Bloor, S.J., Nicholson, R. et al. (2004). Black flower coloration in wild Lisianthius nigrescens: its chemistry and ecological consequences. Zeitschrift für Naturforschung Section C – A Journal of Biosciences 59: 625–630.

249 Horosanskaia, E., Nguyen, T.M., Vu, T.D. et al. (2017). Crystallization-based isolation of pure rutin from herbal extract of Sophora japonica L. Organic Process Research and Development 21: 1769–1778.

250 Happyana, N. and Kayser, O. (2016). Monitoring metabolite profiles of Cannabis sativa L. trichomes during flowering period using ¹H NMR-based metabolomics and real-time PCR. Planta Medica 82: 1217–1223.

251 Møller, B.L. and Laursen, T. (2020). Metabolons and bio-condensates: the essence of plant plasticity and the key elements in development of green production systems. In: Advances in Botanical Research. Academic Press.





22 Metabolic Engineering for Large-Scale Environmental Bioremediation

Pablo I. Nikel¹ and Víctor de Lorenzo²

¹ The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
² Systems and Synthetic Biology Department, Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain

22.1 Introduction

Bioremediation entails the deliberate use of biological agents for the biodegradation, detoxification, or immobilization of chemical waste produced by industrial and urban activity [1]. This simple definition branches out into many specific strategies to the same end, depending on (i) the type of pollutants at stake, (ii) their concentration/availability and localization, (iii) the biological agents of choice, e.g. naturally occurring or genetically designed catalysts, and (iv) the physicochemical characteristics of the target sites and other circumstances specific to the locations afflicted by the toxic waste(s) [2, 3].

Regardless of the concrete bioremediation scenario, three challenges must always be met for an intervention to have a reasonable chance of counteracting environmental pollution. The first is the biological activity proper at the core of the contaminant elimination process. It can range from a single metabolic pathway in one strain to a complex merger of catabolic activities contributed by diverse biological agents, e.g. catalytic mycorrhiza involving plant, fungal, and bacterial inputs. Cell-free activities, e.g. enzymes released by dead biomass in soil that remain catalytically active, can also participate in a multistep catalytic process and join forces with the live agents present at the target site [4]. Finally, one can artificially introduce physical (e.g. electricity, thermal desorption), chemical (e.g. O2 injection, N and P supply), and biological (e.g. straw, manure, plant rhizosphere) inputs to add extra stages to a stepwise biodegradation process and/or to alter the conditions of the niche so as to enable colonization by an efficacious microbiota [5].

A second challenge is the choice of vehicle for delivering the catalytic activities of interest to the target site (Figure 22.1). Again, there is ample variety of options to this end.
One may consider individual enzymes or enzymatic cocktails with various degrees of purity, dead bacterial biomass, and/or more sophisticated cell-free formulations. In a more elaborate approach, live biological agents (e.g. bacteria, fungi, plants, worms, or other organisms) can be developed as agents of bioremediation, either because of their innate capacities or as recipients (chassis, in the jargon of synthetic biology¹) of pathways and activities artificially introduced into them. As indicated before, the active agent can consist of just one species or form a consortium with intra-kingdom or inter-kingdom partners [6].

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.

Figure 22.1 Types of microbiological agents for bioremediation of environmental pollution. (a) Cell-free agents. Despite their microbial origin, extracts enriched with enzymes and enzymatic cocktails can be used as catalysts for eliminating chemical pollution, e.g. delivered in capsules or other physical formats. (b) Inactivated whole cells. Bacteria expressing biodegradative enzymes (whether naturally occurring or genetically engineered) are killed and used in bioremediation interventions. (c) Chromosome-less bacteria (SimCells). Cells are engineered to produce an activity of interest, and the genome is then self-destroyed upon conditional expression of a nuclease. (d) Monoclonal strains. Live, self-replicating bacteria. (e) Catalytic consortia. Combinations of various strains or species may enhance the resulting biodegradative activity. Partnerships can occur among planktonic cells in solution or within biofilms (either as flocs in suspension or surface-attached), with or without fixed three-dimensional structures. (f) DNA propagated through horizontal gene transfer (HGT). Note that these microbiological agents can be implemented by themselves or in combination with other biological (e.g. plants) or non-biological helpers (e.g. electric fields).

¹ According to the European Food Safety Authority (EFSA), a synthetic biology chassis is a genetically engineerable and reusable biological platform with a genome encoding a number of basic functions for stable self-maintenance, growth and optimal operation but with the tasks and signal processing components purposely edited for strengthening performance under pre-specified environmental conditions (https://efsa.onlinelibrary.wiley.com/doi/epdf/10.2903/j.efsa.2020.6263).

Finally, the third shared challenge in planning bioremediation interventions is the upscaling, preservation, and release of the active catalysts to their final destination, where they must do the job efficaciously. Remarkably, this essential endeavor has received very little attention compared with the much greater emphasis on pathway construction and chassis development [6–8]. Borrowing the concept from traditional pharmacy, the term environmental Galenics [9] has been proposed to describe the technologies that enable the passage from a useful activity under the controlled conditions of the laboratory (similar to the active principle of a drug formulation) to its delivery into an environment degraded by emissions (akin to a sick body). Current strategies to this end include injection of liquid cultures, seed coating, and microbial pellets, as well as strains and consortia encapsulated in a variety of inert matrices (see Section 22.8). In many cases, such a release of new strains into a given ecosystem must overcome resistance to colonization by the resident community and grazing by protozoa [10]. As explained later, one way to alleviate such problems is to engineer delivery of the activities not through direct addition of strains but by promoting horizontal gene transfer (HGT) of the corresponding DNA into a wide variety of hosts native to the target site (see Section 22.9).

Interventions can be entertained at various degrees of intensity [12]. The simplest is what has tactfully been called natural attenuation: basically, doing nothing and waiting for ordinary geochemical and biological processes to solve the problem in the long run [13]. This strategy is in fact the most widespread and, if time is not an issue, it eventually works in most cases of moderate contamination. Yet, the endpoint of such a scheme is often chronic, historical pollution: some sites contaminated with heavy metals can be traced back to the primitive metallurgy of Neolithic times. The second possibility is biostimulation, i.e.
deliberate modification of the physicochemical conditions of the site to make it amenable to spontaneous colonization by an adequate, native (micro)flora able to get rid of the noxious chemicals. This operation typically involves nutritional amendments with metabolizable N and P sources (as is habitually done with oil spills) or air injection into otherwise anoxic environments to stimulate aerobic catabolism. More complex strategies (which fall under the concept of landfarming) may include large-scale mixing of soil with organic and inorganic additives, covering with plastic, setting up bioactive walls, application of electric fields, and other manipulations. In these cases, the issue is not so much engineering an adequate live catalyst for inoculation into the site as creating an optimal habitat for naturally occurring microorganisms and plants to move in and execute catabolic or co-metabolic activities on the recalcitrant chemicals.

One way or the other, the frontline strategy that has attracted the most attention and opened the most possibilities since the late 1980s is bioaugmentation [14]. In this case, the concept involves directed inoculation of one or more specific precultivated agents (microorganisms in most instances) endowed, either naturally or through genetic engineering (GE), with a superior capacity to tackle the pollutant under scrutiny (see the types in Figure 22.1). Once at hand, the bioremediation agents must encounter (and process) the target compounds in a physicochemical landscape defined by at least six parameters: toxicity, abundance, concentration, biodegradability, bioavailability, and mobility [11] (Figure 22.2b).


[Figure 22.2 (a) Typical chemical emissions: greenhouse gases, flame retardants, cosmetics, plastics, pesticides, nitro/chloro-organics, hydrocarbons, solvents, pharmaceuticals, endocrine disruptors. (b) Pollutant-profile parameters: recalcitrance, abundance, toxicity, bioavailability, concentration, mobility.]

Figure 22.2 Typical chemical emissions and the parameters that determine the ease of bioremediation. (a) Industrial and urban activities release into the environment a diversity of molecules that have a noxious effect on the biosphere. (b) The six parameters indicated in the plot define the pollutant profile and determine the line of attack of the bioremediation strategy. Source: Redrawn from de Lorenzo et al. [11].

22.2 Metabolic Engineering for Bioremediation: From 2.0 to 3.0

Shortly after the spread of recombinant DNA technologies in the early 1980s, the opportunity to genetically design strains of environmental bacteria, especially Pseudomonads, with improved biodegradation capacities became apparent. The literature of the late 1980s and 1990s is packed with examples of the construction of new pathways for the complete, successful catabolism of otherwise recalcitrant chemicals (in particular in Pseudomonas and related species) with potential as bioremediation agents [3]. Note that genetic manipulations could be considered not just to upgrade whole-cell catalytic abilities, but also to improve the traits needed for performance under environmentally relevant conditions (e.g. surfactant production [15]). Alas, the performance of such genetically improved strains in sites actually polluted by the compounds at stake was generally poor, and the field came to a standstill in the early 2000s. This interesting period in the history of metabolic engineering for biodegradation and, eventually, bioremediation has been exhaustively documented [16] and will not be addressed here. Yet, the work done during that interlude both pinpointed problems and highlighted many of the issues that were later taken up by what has been called Bioremediation 3.0 (with 1.0 and 2.0 being natural attenuation and knowledge-based biostimulation and bioaugmentation, respectively). As such, Bioremediation 3.0 entails the further development of the field under the conceptual and technical umbrella of contemporary systems and synthetic biology [17]. Adoption of the computational and wet tools emerging from these fields has made it possible to revisit topics and issues that could not be developed in the first wave of molecular and genetically inspired bioremediation.
The first key change in the transition from 2.0 to 3.0 is access to metabolic models which, with different degrees of accuracy, can guide genetic interventions and predict their outcomes for a large number of environmental microorganisms, including typical synthetic biology chassis (see Section 22.6). Along the same line, the last few years have witnessed a series of increasingly accurate platforms for the virtual assembly of new biochemical pathways, both for biotransformations and for biodegradation. This trend started in the 1990s with the setup of the University of Minnesota Biocatalysis/Biodegradation Database [18] (now relocated to http://eawag-bbd.ethz.ch) and has recently reached a considerable landmark with the ATLAS platform [19], which allows in silico composition of biosynthetic routes using intermediate metabolites as starting building blocks. One key feature of such a framework is that it considers not only the biochemical reactions proper, but also their kinetics, thermodynamics, and global impact on a given metabolic network. In this sense, ATLAS follows up and expands the path initiated earlier in the field of chemoinformatics [20], while adding a frame of biological feasibility to reaction networks. With such computational tools at hand [21], designing new pathways largely involves setting the desired input (a substrate or an intermediate metabolite) and output (either an added-value molecule or CO2 and H2O as the products of complete catabolism). The computational platform is then allowed to explore a solution space shaped by chemical/thermodynamic constraints and by the availability of biological activities. In the best-case scenario, the queried system returns a list of possible routes and indications of where to find the corresponding genes and activities, e.g. in genomes or metagenomes. Such in silico appraisals of new pathways have boosted the otherwise limited capacity of earlier researchers, who faced similar challenges but relied only on their intuition and partial knowledge of biological reactions [22].
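The input/output workflow just described can be illustrated with a deliberately small sketch: a breadth-first enumeration of routes through a reaction graph, with a per-step ΔG cutoff standing in for thermodynamic constraints, and the surviving routes ranked by length and overall ΔG. This is an assumption-laden toy, not how ATLAS or any other cited platform works internally; the reaction network, the compound names (pollutant, int_A, int_B, int_C, int_D), and all ΔG values are hypothetical placeholders.

```python
from collections import deque

# Hypothetical toy reaction network: (substrate, product, delta_G in kJ/mol).
# Names and energies are illustrative placeholders, not measured data.
REACTIONS = [
    ("pollutant", "int_A", -20.0),
    ("pollutant", "int_B", 15.0),  # unfavorable step, pruned by the default cutoff
    ("int_A", "int_C", -35.0),
    ("int_B", "int_C", -10.0),
    ("int_A", "int_D", -5.0),
    ("int_C", "CO2+H2O", -120.0),
    ("int_D", "CO2+H2O", -90.0),
]


def enumerate_routes(reactions, source, target, max_steps=5, max_step_dg=0.0):
    """Breadth-first enumeration of acyclic routes from source to target.

    Steps whose delta_G exceeds max_step_dg are discarded before the search,
    a crude stand-in for the thermodynamic constraints of real platforms.
    Returns (route, total_delta_G) pairs, shortest and most exergonic first.
    """
    graph = {}
    for substrate, product, dg in reactions:
        if dg <= max_step_dg:
            graph.setdefault(substrate, []).append((product, dg))

    routes = []
    queue = deque([([source], 0.0)])
    while queue:
        path, total_dg = queue.popleft()
        node = path[-1]
        if node == target:
            routes.append((path, total_dg))
            continue
        if len(path) - 1 >= max_steps:  # path already holds max_steps reactions
            continue
        for product, dg in graph.get(node, []):
            if product not in path:  # skip cycles
                queue.append((path + [product], total_dg + dg))

    routes.sort(key=lambda r: (len(r[0]), r[1]))
    return routes


if __name__ == "__main__":
    for route, dg in enumerate_routes(REACTIONS, "pollutant", "CO2+H2O"):
        print(" -> ".join(route), f"(total dG = {dg:.0f} kJ/mol)")
```

With the default cutoff, the unfavorable step pollutant -> int_B is pruned, so only the two routes through int_A are returned (pollutant -> int_A -> int_C -> CO2+H2O at -175 kJ/mol first, then the int_D route at -115 kJ/mol); raising max_step_dg to 20.0 admits a third route via int_B. Real tools replace the single per-step cutoff with network-level thermodynamic and kinetic analyses.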
Alas, rational and/or in silico design of a biodegradation pathway is just the beginning of the actual process of constructing effective agents for environmental catalysis. A second development that separates Bioremediation 2.0 from 3.0 is the boom in sophisticated genetic tools for deep engineering of bacterial genomes beyond the usual model microorganisms, e.g. Escherichia coli or yeast. These tools are constantly being improved [23, 24] and include not only assets for typical genome editing (i.e. insertions, deletions, allelic replacements) but also the implantation of entirely alien metabolic routes and signal-responsive logic circuits implemented with genetic parts, many of them synthetic or semisynthetic. These enable what we call cyborgization of bacterial strains, i.e. merging naturally occurring properties with engineered ones. Archetypal modifications include (i) enhancement of innate desirable traits, (ii) replacement of innate traits by better ones, (iii) knocking-in of entirely new traits, and (iv) elimination of drawbacks (debottlenecking). Oftentimes, multiple genomic changes have to be implemented simultaneously [25], with an optimal stoichiometry that is impossible to calculate from first principles. In these cases, one can rely on variation in regulatory sequences (i.e. promoters and intergenic regions) brought about by single-stranded (ss) DNA recombineering, followed by selection of the best performers [26]. This approach, initially developed for E. coli under the name multiplex automated genome engineering (MAGE), has found its way into multiple-site engineering of bioremediation chassis such as P. putida ([27]; see below). In fact, ssDNA recombineering, along with the CRISPR/Cas9 toolbox available for counterselection of wild-type sequences and interference with given target activities [28, 29], seems to cover most of the technical demands for genome editing in a considerable number of Gram-negative bacteria generally considered adequate vehicles of bioremediation activities.

The third change that has pushed the transition from 2.0 to 3.0 in bioremediation is the ease of both sequencing nucleic acids and synthesizing DNA. The price tag of either has come down by several orders of magnitude in recent years, opening possibilities that were unthinkable in the not-so-distant past. On the one hand, what started as discrete DNA sequences of individual genomes has expanded toward metagenomes of specific sites, then to global surveys (e.g. the Global Ocean Sampling Expedition [30]), and then to the ambition of sequencing the whole of the DNA of the biosphere [31, 32]. In parallel, massive transcriptomic data from the most diverse environments have exposed the composition, activities, and interactions of (mostly nonculturable) environmental bacteria in unprecedented detail. In turn, this has allowed the assembly of metabolic models and the prediction of the catabolic potential of whole communities with reasonable accuracy [33]. Two important take-home lessons from these recent bodies of information are worth mentioning for the sake of this chapter. One is that the environmental bacteria that are most efficient at degrading recalcitrant pollutants may not be those that grow the fastest in the laboratory, even on the same target compounds. In fact, some of the best degraders can hardly be grown in isolation on a Petri dish. The second lesson is that natural catalysts for the elimination of chemical waste typically involve whole bacterial consortia rather than the all-powerful super-bugs entertained under the Bioremediation 2.0 paradigm [17]. The counterpart to the ease of massive DNA and RNA sequencing (accompanied by a suite of omics technologies, e.g.
proteomics, metabolomics, and fluxomics) is the equally affordable synthesis of DNA on demand, along with a plethora of methods for DNA assembly in vitro. The often time-consuming and fastidious endeavor of building complex genetic constructs that was so typical of genetic engineering laboratories is increasingly being replaced by direct orders of synthetic DNA from a suite of commercial providers. The still expensive and intricate endeavor of synthesizing whole bacterial genomes is bound to become routine in the not-so-distant future, perhaps making earlier technologies for genome editing obsolete [34]. As a consequence of the above, the field of bioremediation now has a much larger toolbox, both wet and computational, for understanding the fate of pollutants and for devising interventions guided by sound molecular and physiological information about the biological actors of the process. Yet, note that advanced metabolic engineering in the field is thus far limited to specific species and even strains, among which various types of Pseudomonas, like P. putida and P. fluorescens, stand out as the best examples of possible chassis for engineering and, eventually, large-scale delivery of activities of interest [35, 36]. Alas, despite their key role in a plethora of environmental processes (e.g. dehalogenation and halorespiration), strictly anaerobic bacteria are still difficult to reprogram genetically compared with their facultative aerobic or oxygen-tolerant (e.g. Azoarcus species) counterparts [37]. The same is true for fungi, the only biological agents able to deliver strong oxidative activities for biodegradation, e.g. of lignocellulosic waste [38].

Fortunately, at the time of writing this chapter, CRISPR/Cas-based tools are quickly reaching species that were not amenable to genome editing and GE, and it is only a matter of time before more bioremediation agents can be designed in the laboratory based on a large repertoire of microbial chassis [39]. In the meantime, some advanced engineering of catalytic activities of environmental interest, and the ensuing bioremediation interventions, can be entertained with just knowledge of the genomes of the microorganisms involved and (if possible) availability of their metabolic models. This approach, which some call GE-free synthetic biology in a different context [40], may fill the gap between GE and non-GE catalysts and also ease public and industrial acceptance of bioremediation. The idea in this case is to play with community composition, which can be modified through direct intervention (e.g. by adding a new constituent that fills a gap in a multistep process) or by creating environmental conditions optimal for the spontaneous emergence of the desired consortium. In any case, whether based on GE agents or not, any bioremediation intervention starts by identifying the target compounds to be tackled. But what are the most pressing issues, and how can metabolic engineering help to tackle them?

22.3 Dealing with Global Environmental Waste

Traditional environmental microbiology has largely dealt with the issue of chemical pollution from three different perspectives: prevention, monitoring, and remediation. Metabolic and genetic engineering of bacteria have contributed very significantly to each of these aspects. Many of the accompanying chapters of this book deal directly or indirectly with prevention, insofar as they propose sustainable and environmentally friendly alternatives to chemical processes that are otherwise highly harmful to ecosystems [11]. Over the years, many types of pollutant-responsive whole-cell biosensors have also been generated that convert the presence or absence of given chemicals into optical or electrical readouts, enabling quantitation of the bioavailable fraction of specific compounds [41]. Most of these are based on transcriptional factors (TFs) or riboswitches that, either naturally or after some modifications, enable expression of suitable reporters (e.g. fluorescent proteins). Although the concept of such whole-cell biosensors for small molecules originated in environmental pollution control, the notion has reached the field of metabolic engineering as a way of monitoring the performance of biosynthetic or biodegradative pathways [42]. This situation, in turn, has afforded general strategies for detecting chemicals for which neither TFs nor riboswitches exist. This objective is accomplished by engineering a bacterial pathway that converts the target molecule into another chemical species that can indeed be recognized by existing TFs [43, 44]. More recently, cell-free in vitro transcription systems have been engineered for simple, portable, point-of-care detection of contaminants in drinking water, including antibiotics, heavy metals, and small molecules [45]. Yet, bacterial-based manufacturing of chemicals and biomonitoring continue to be supportive actions in bioremediation.
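As a rough illustration of how a biosensor readout can be turned into an estimate of the bioavailable concentration, the sketch below assumes a Hill-type dose-response curve for a TF-based reporter and inverts it algebraically. The Hill model, the parameter names (basal, vmax, k, n), and the calibration values in the usage line are assumptions for illustration only, not properties of any biosensor cited here; in practice, such parameters are fitted to measurements of known standards.

```python
def hill_response(conc: float, basal: float, vmax: float, k: float, n: float) -> float:
    """Reporter signal predicted by an assumed Hill-type dose-response curve."""
    return basal + vmax * conc**n / (k**n + conc**n)


def estimate_concentration(signal: float, basal: float, vmax: float, k: float, n: float) -> float:
    """Invert the assumed Hill curve to recover the inducer concentration.

    Solving signal = basal + vmax * c**n / (k**n + c**n) for c gives
    c = k * (f / (1 - f)) ** (1 / n), where f = (signal - basal) / vmax.
    Only signals strictly between basal and basal + vmax can be inverted;
    outside that window the sensor is below detection or saturated.
    """
    frac = (signal - basal) / vmax  # fractional induction, 0..1
    if not 0.0 < frac < 1.0:
        raise ValueError("signal outside the sensor's quantifiable range")
    return k * (frac / (1.0 - frac)) ** (1.0 / n)


if __name__ == "__main__":
    # Hypothetical calibration: basal 100 a.u., max induction 1000 a.u., K = 5 µM, n = 2
    params = dict(basal=100.0, vmax=1000.0, k=5.0, n=2.0)
    # A half-maximal signal (600 a.u.) maps back to K, so this prints 5.0
    print(estimate_concentration(600.0, **params))
```

The explicit range check reflects a general property of saturating sensors: near the basal level or near saturation, small changes in signal map onto large changes in estimated concentration, so quantitation is only meaningful within the sensor's dynamic range.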
Table 22.1 Major global environmental problems caused by urban and industrial activities.

Chemical emissions
Major origin/activity | Pollutants | Consequences | References
Fossil-fuel burning | CO2 | Greenhouse effect | [46]
Agricultural and urban activities | Methane, N oxides, S oxides, fluorocarbons | Greenhouse effect; ozone depletion | [47]
Plastic industry | PET, polyethylene, polystyrene | Perturbation of trophic chain (e.g. water ecosystems) | [48–51]
Intensive agriculture; paper industry | Lignocellulosic residues | Biomass surplus, agricultural waste incineration | [52, 53]
Intensive agriculture | Soluble N and P species | Eutrophication of surface and coastal waters | [54]
Urban activities, medical prescriptions | Micropollutants: pharmaceuticals, flame retardants | Endocrine disruption; microbiome interference | [55]

Mismanagement of natural resources
Major origin/activity | Process elicited | Consequences | References
Land (re)use | Desertification | Growing expansion of arid lands | [56]
Demand of metabolizable N | Chemical synthesis of ammonia through the Haber–Bosch reaction | Energy overconsumption | [57, 58]
Fossil-fuel based transport and heating | Suspended particulates in air (dust, dirt, soot, smoke) | Decline of air quality | [59]
Freshwater overexploitation | Chemical and biological adulteration, oversalting | Shortage of salubrious drinking water | [60]
Exhaustion of bioavailable P | Depletion of mines of soluble geological phosphates | Irreversible loss of nonrenewable fertilizer | [61]
Bushfires and wildfires | Air and soil overheating | Firestorms, scorched land | [62]

The real biological protagonist of the endeavor is biodegradation proper; the question is therefore what is to be degraded and how the cognate remediating agents can be propagated at the desired scale. The field of modern bioremediation was born at a time of growing awareness of the environmental impact of industrialization in the 1980s. In contrast, the transition between what we called Bioremediation 2.0 and 3.0 above [17] has taken place amid an unprecedented deterioration of global environmental quality. As shown in Table 22.1, out of the different types of
urban and industrial discharges, at least five categories of chemical emissions can be identified that directly impact the operation of basic functions at a planetary scale. While they are not particularly toxic per se, their extensive dispersal and penetration into all types of ecosystems do perturb many of the operations necessary to maintain large-scale homeostasis. The most evident of them is the accumulation of greenhouse gases, which are largely to blame for the ongoing global warming and the ensuing climate crisis. But others are a matter of concern as well: plastic pollution, micropollutants, lignocellulosic residues, and eutrophication of water with discharged P and N, to name just a few. Another type of environmental problem stems not so much from specific chemical emissions as from human mishandling of natural resources, e.g. water treatment, scorched soil, expansion of arid ecosystems, particulates in air, and mismanagement of fertilizers [63]. These scenarios have elicited new opportunities for genetic and metabolic engineering of whole-cell catalysts with the potential to mitigate the problems, some of which are the subject of other chapters of this collection. Major efforts are currently under way to develop biological agents with a superior capacity to handle CO2 [64–66], to synthesize biodegradable polymers [67], and to bring about biological alternatives to otherwise environmentally costly chemical processes [68]. Against this background, the concept of a waste-free circular economy is increasingly seen as the most promising way out of the current state of affairs. One of the drivers of this notion is the shift of emphasis from mere pollutant removal or biodegradation to the more economically appealing valorization of waste. Advanced metabolic engineering plays a pivotal role in converting, e.g., CO2 and other industrial gases [69–71] into biopolymers, in the biological conversion of “bad” plastics into their environmentally friendly counterparts, and in the generation of added-value chemicals out of lignin. These efforts enlarge the plethora of biotransformations designed for adding worth to cheap substrates otherwise destined for burning or landfilling [52, 72].

These and others are welcome examples of what one could call reactive responses to environmental pollution. Such actions are envisioned to work within specific geographical locations/factories, with the biological components well confined in reactors. Even bioremediation interventions are generally thought to be applicable to identifiable sites and, in the case of a genetically modified organism (GMO), to ensure that the biological agent at stake does not escape the target spot. While these approaches have functioned reasonably well thus far, escalating climate change obliges us to rethink proactively (i) the scale at which bioremediation interventions need to be carried out, (ii) which agents are able to do it, and (iii) how such a phenomenal task could be managed in practice.

22 Metabolic Engineering for Large-Scale Environmental Bioremediation

22.4 Beyond Bioremediation 3.0: From the Test Tube to Planet Earth

The notion of massive human interventions deployed to reshape the ecology of a large geographical area is by no means absent from the scientific literature. Such processes can happen non-deliberately, e.g. the effect of the Haber–Bosch reaction that generates ammonia from N2 and H2 [57, 58], or purposely, e.g. the
whole remake of the ecology of Ascension Island after Darwin's stopover in 1836 [73]. At a still larger scale, Astrobiology has been toying with the concept of Terraforming (i.e. creating an unconstrained planetary environment that supports life) as a way to engineer sustainable biological networks following the deliberate inoculation of ecosystem-forming agents into otherwise lifeless settings [74]. This hypothetical picture has recently become a reference point for interventions aimed not at seeding life in an abiotic scenario but at restoring extensive ecosystems spoiled by human emissions and industrial/urban activities [75, 76]. There seems to be a considerable opportunity for Synthetic Biology to fight climate change and environmental decline [77]. Yet, how realistic are these scenarios? The starting point of such a "Terraforming Earth" approach is the existence of ∼100 Gt of microbial biomass on the Earth's surface, including soil and marine systems [79]. Excluding just one complex molecule that is more abundant by mass (i.e. lignin), no other biological actor on our planet has the sheer volume and the catalytic power to make an impact on the environmental problems listed in Table 22.1, in particular when combined with plants. Owing to their inherent properties, microorganisms can propagate very quickly through countless physical paths and deliver their activities across many different scales [78] (Figure 22.3a). In some remarkable cases, just one strain can colonize many square kilometers of soil [80]. Occasional pandemics bear witness to the astounding proficiency of microorganisms at overcoming material and geographical barriers. While this spreading capacity is a serious problem when the strain at stake is a pathogen, the same dispersal mechanisms could be adapted with the molecular tools of contemporary genetic engineering described above to unleash beneficial traits at a large scale.
On this basis, the environmental microbiome would seem to be the obvious executor of remediation deeds and thus the subject of metabolic and (more generally) genetic programming to this end (Figure 22.3a). A second advantage of the existing environmental microbiome is its remarkable ability to host massive HGT events through a suite of mechanisms, typically conjugation, transformation, and transduction, often in combination with the abundant mobile genetic elements already present [81]. As studies on antibiotic resistance genes have repeatedly shown, specific traits can disseminate very quickly from one individual genome to an entire landscape under the right selective pressure [82]. As indicated above, these events and their underlying mechanisms may be undesirable from a human health perspective, but the corresponding molecular machineries could be repurposed to disperse environmentally favorable activities (Figure 22.3b). In sum, the global environmental troubles of the twenty-first century seem to pose three distinct but related challenges: (i) engineering activities for either reactive or proactive interventions aimed at alleviating one or more of the risks listed in Table 22.1, (ii) developing tools for the advanced genetic design of agents destined for environmental release, along with methods for their dispersal, and (iii) domesticating HGT mechanisms for the deliberate spreading of activities that help counteract the effect of human emissions. The state of affairs on these three fronts (summarized in Table 22.2) is addressed in the following sections.
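
The claim that a trait can sweep from a single genome to a whole landscape under selection can be made concrete with the textbook haploid selection recursion. The sketch below is only an illustration under stated assumptions, not a model used in this chapter: it assumes a well-mixed population, a constant selective advantage `s`, and no drift or mutation, and the function names are hypothetical.

```python
import math


def generations_to_frequency(p0: float, target: float, s: float) -> int:
    """Iterate the haploid selection recursion p' = p(1 + s) / (1 + s*p).

    Assumes a well-mixed population, constant selective advantage s > 0,
    and no drift or mutation. Returns the number of generations needed
    for the trait frequency to rise from p0 to at least `target`.
    """
    if not (0.0 < p0 < target < 1.0) or s <= 0.0:
        raise ValueError("require 0 < p0 < target < 1 and s > 0")
    p, generations = p0, 0
    while p < target:
        p = p * (1.0 + s) / (1.0 + s * p)
        generations += 1
    return generations


def generations_closed_form(p0: float, target: float, s: float) -> float:
    """Same quantity from the odds form: p/(1 - p) grows by (1 + s) per generation."""
    odds0 = p0 / (1.0 - p0)
    odds_target = target / (1.0 - target)
    return math.log(odds_target / odds0) / math.log1p(s)


if __name__ == "__main__":
    # One carrier among 10^12 cells, 10% fitness advantage per generation.
    print(generations_to_frequency(1e-12, 0.5, 0.1))
    print(math.ceil(generations_closed_form(1e-12, 0.5, 0.1)))
```

Because the odds of carrying the trait are multiplied by (1 + s) every generation, the time to takeover scales with the logarithm of the population size divided by s, which is one way to see why spread can be fast even when it starts from a single genome.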


Figure 22.3 Multiscale propagation of microbial activities. (a) Microorganisms deliver their activities across the full range of scales at which the Biosphere operates. This makes them ideal catalysts for eliminating chemical pollution, from a few grams at a relatively small site to many millions of tons spread across the planet. Source: Redrawn from De Lorenzo [78]. (b) Dissemination of DNA through horizontal gene transfer (HGT). As studies on the dispersal of antibiotic resistance genes have shown, a genetic innovation that originates in a single genome can propagate within a few years through the whole environmental microbiome, provided there is selective pressure to drive HGT forward. We argue that the same mechanisms could be exploited to circulate beneficial activities through the whole environmental microbiome.

Table 22.2 Synthetic biology-based technologies for large-scale bioremediation.

Scientific questions                                      References
  Pathway/biological trait design/optimization            [27, 83, 84]
  Device interoperability                                 [85, 86]
  Broad-host-range expression systems                     [24, 87]
  Genomic stability                                       [88, 89]
  Ecological theory/models                                [75, 90]

Enabling genetic technologies                             References
  Environmental chassis                                   [36, 91, 92]
  Containment and safety-by-design                        [93–96]
  Barcoding/unique identifiers                            [97, 98]
  Community design                                        [40, 99–101]

Environmental Galenics                                    References
  Formulation of catalytic organisms and meta-organisms   [102, 103]
  Stimulation of horizontal gene transfer (HGT)           [104–106]
  Infiltrating bacterial dispersion and HGT highways      [107–109]


22.5 Bottlenecks in the Development of Environmental Biocatalysts

Figure 22.4 Exploration of the solution space for a given pathway through induced variability of the regulatory regions. The problem can be abstracted as a route to generate compound 𝜑 by subjecting substrate 𝛼 to the action of enzymes ABCDE. A retrosynthesis platform (or other CAD resources for pathway assembly) delivers a draft DNA sequence that is then diversified in vivo by targeting promoters, intergenic regions (IRs), and key metabolic bottlenecks with a variety of diversity-generating tools. The result (sketched as a geometric/physical problem solved by gravity) can be followed with a biosensor, e.g. a TF that responds to 𝜑 by activating a cognate promoter that drives expression of a growth trait.

Naturally occurring bacteria possess an apparently inexhaustible ability to run chemically difficult reactions. The design of any bioremediation agent thus starts with identifying the genetic complement that encodes the properties of interest and rationally refactoring it so that the catalytic activity at stake is delivered optimally by an equally optimal live carrier. At that stage, the scientific challenge is identical to that posed by many other metabolic engineering efforts. The pathway of interest for counteracting any of the environmental issues listed in Table 22.1 can be genetically assembled in a carrier or chassis of choice, and the stoichiometry of the different components allowed to vary until the route reaches peak efficiency under the desired operating conditions [83]. A wealth of computational resources, genetic tools, and specialized hosts has been developed in recent years to this end, including mutagenic ssDNA recombineering for tuning regulatory sequences to an adequate performance [25–27] (Figure 22.4). At present, access to these enabling methodologies is not a major issue. Constructing new biochemical pathways by recruiting genes from diverse origins, improving their performance, assembling the cognate route(s), and evolutionarily optimizing the constructs to fit specific hosts may be time-consuming, but it is by no means intractable. Difficulties start, however, when a pathway optimized for a certain host under the controlled conditions of the laboratory or a bioreactor is expected to perform somewhere else: in an open environment and/or, as explained above, after propagating into other microbial carriers. This raises the question of interoperability and portability of
genetic constructs, an issue hardly tackled thus far. While orthogonalization of genetic devices has been proposed as a way to avoid this predicament and make their output more predictable [110, 111], the solutions proposed so far include too many components that may mutate and compromise durability. Overcoming this bottleneck may first require a conceptual framework for quantifying how well constructs tolerate the different genetic and biochemical backgrounds of environmental bacteria, an important endeavor that is still in its infancy [112]. A related aspect is that of securing expression of the genes or pathways of interest in a variety of hosts. The intrinsic activity of enzymes is most often retained when they are produced in a heterologous recipient. Yet the regulatory actors (e.g. promoters, TFs, RBSs) that determine expression levels, and the parameters that govern their role in vivo, change drastically from one species to another, and even from one strain to another. To overcome this impasse for environmental applications, it might be necessary to extend the existing collection of regulatory parts toward promoters and other control sequences found in promiscuous plasmids and other parasitic or unrestrained mobile elements, as these have naturally evolved to function in a large variety of hosts [87]. Ideally, such promiscuous expression systems could operate not just among bacteria but also cross interkingdom barriers and propagate into eukaryotic hosts, especially fungi and plants. This already happens naturally (e.g. [113]), so there is no reason why the same mechanisms could not be refactored for the environmental spread of beneficial traits. The next obvious issue is that of genetic stability, i.e. the persistence of an unaltered DNA sequence over time and in the face of environmental fluctuations and physicochemical insults.
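
The expression-tuning step described at the start of this section (Figure 22.4) can be illustrated with a deliberately crude toy model. Everything in it is an assumption for illustration only: a hypothetical library of five part strengths, a pathway whose output is capped by its slowest step (expression level times a per-enzyme activity), and a linear expression burden. Exhaustive search stands in for the full solution space, and random sampling filtered by a threshold stands in for in vivo diversification followed by biosensor-based screening.

```python
import itertools
import random

# Hypothetical promoter/RBS library: relative expression strengths (arbitrary units).
LIBRARY = (0.1, 0.3, 1.0, 3.0, 10.0)


def toy_output(levels, activities, burden=0.02):
    """Toy pathway output: the slowest step caps flux; total expression costs a burden."""
    capacity = min(level * act for level, act in zip(levels, activities))
    return capacity - burden * sum(levels)


def exhaustive_best(activities):
    """Enumerate every combination of library parts (5**n designs for n enzymes)."""
    return max(
        itertools.product(LIBRARY, repeat=len(activities)),
        key=lambda levels: toy_output(levels, activities),
    )


def biosensor_screen(activities, n_variants=2000, threshold=4.0, seed=1):
    """Randomly diversify all regulatory slots and keep variants whose output
    passes a threshold (a stand-in for a biosensor-coupled selection)."""
    rng = random.Random(seed)
    hits = set()
    for _ in range(n_variants):
        levels = tuple(rng.choice(LIBRARY) for _ in activities)
        if toy_output(levels, activities) >= threshold:
            hits.add(levels)
    return hits


if __name__ == "__main__":
    # Hypothetical per-enzyme activities for steps A-E.
    acts = (5.0, 1.0, 2.0, 0.5, 3.0)
    best = exhaustive_best(acts)
    print("balanced design:", best, round(toy_output(best, acts), 2))
    print("all parts at maximum:", round(toy_output((10.0,) * 5, acts), 2))
    print("screen hits:", len(biosensor_screen(acts)))
```

With these made-up numbers the best design does not put every part at maximum strength: overexpressing steps that are not limiting only adds burden, which is the intuition behind letting stoichiometry float rather than maximizing every step.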
Mutations and spontaneous genetic diversification in response to change seem to be inherent to any living system, and bypassing them is extremely difficult as long as we remain within the realm of "familiar" biology. Yet not every bacterial host is as evolvable as others, and some are endowed with specific mechanisms that secure the preservation of DNA-encoded information over time. These naturally occurring devices include, inter alia, hyperactive SOS systems of DNA repair [114], polyploidy, and metabolic networks geared toward NAD(P)H overproduction to counteract oxidative stress [115]. At the other extreme, we find very unstable genomes characterized by an excess of mobile genetic elements, error-prone DNA polymerases, and/or deficient mismatch repair systems [116]. A traditional approach to artificially increasing genomic constancy is simply to render cells recA-minus, thereby abolishing the SOS response. Yet this manipulation comes at the price of lower tolerance to other environmental stresses. More recently, a number of synthetic genetic approaches have been proposed to either prevent or penalize the loss of heterologous genes, or to eliminate cells that bear mutations in DNA sequences of interest. The strategies range from editing host genomes to eliminate insertion sequences and other instability-generating elements [88] to active genetic circuits that detect the underperformance of engineered constructs [89]. More sophisticated approaches include artificial fail-safe genetic codes [117] and entanglement of sequences of essential and engineered genes [118, 119]
that purge loss-of-function mutations. Such an overlap protects a potentially costly gene from removal by natural evolution by coupling the benefit of its elimination with a larger or even lethal cost. A separate strategy has recently been proposed in which cells are programmed to dissolve their chromosome upon conditional induction of an endogenous nuclease by an external signal (e.g. a chemical inducer), resulting in catalytically active but DNA-free vesicles [120]. These biocatalysts maintain their activity for many hours before finally dying out, which increases their containment and safety. The downside of this strategy is, of course, that such cells (called SimCells) stop propagating, so their applicability to large-scale bioremediation scenarios is limited. While all these stratagems for avoiding evolution have considerable value, they still need to be tested under field conditions. Finally, an overarching bottleneck that hinders the immediate possibility of large-scale environmental interventions is the lack of adequate models for the propagation of GMOs and SBAs through terrestrial, aquatic, and airborne scenarios, whether as living agents or as their DNA. Despite decades of discussion and controversy on the deliberate or accidental release of such GMOs, reliable models do not yet exist, and it is thus problematic to carry out a sound environmental risk assessment (ERA) of such heavily engineered agents. The closest approach to this question can be found in current models of the propagation of pathogens and antibiotic resistance genes [121, 122]. But there are essential differences between simulations based on damage to human health and those based on global environmental benefits. Only recently has the issue started to be tackled in overarching models that span a multi-scale space connecting individual microbiomes to the larger context of the ecological communities of the entire Biosphere [90].
One key aspect is the identification of critical tipping points that, once crossed, push spoiled sites into an irreversible deterioration mode [123]. At the same time, these models indicate the types of biological activity necessary for restoration and the characteristics of the ecosystem engineering agents (either natural or generated through synthetic biology) required to that end [75, 76]. While still at an early stage, these models will be invaluable for planning large-scale bioremediation interventions.
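
The logic of entangling an engineered gene with an essential one, discussed earlier in this section, can be illustrated with a minimal deterministic model of construct retention. This is a toy sketch, not a model from the cited work: it assumes a constant per-generation loss/mutation rate `mu`, a fitness cost `cost` for carrying the intact construct, and a relative fitness `mutant_fitness` for cells that have lost it. Entanglement is represented simply by lowering `mutant_fitness` (to 0 when loss is lethal); all parameter values are illustrative.

```python
def intact_fraction(generations, mu=1e-6, cost=0.05, mutant_fitness=1.0, f0=1.0):
    """Fraction of cells still carrying an intact construct after `generations`.

    Each generation: intact cells mutate/lose the construct at rate mu, then
    intact cells grow with relative fitness (1 - cost) and mutants with
    relative fitness `mutant_fitness`. Deterministic, infinite population.
    """
    f = f0
    for _ in range(generations):
        intact = f * (1.0 - mu) * (1.0 - cost)
        mutant = (1.0 - f + f * mu) * mutant_fitness
        f = intact / (intact + mutant)
    return f


if __name__ == "__main__":
    for label, w in [("no entanglement", 1.0),
                     ("partial penalty", 0.9),
                     ("lethal entanglement", 0.0)]:
        print(f"{label:>20}: {intact_fraction(1000, mutant_fitness=w):.6f}")
```

In this toy setting a 5% carriage cost is enough to purge the construct within a few hundred generations, whereas making loss costlier than retention holds mutants at a low mutation-selection balance, and a lethal penalty keeps them out altogether.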

22.6 Chassis for Delivery of Activities Beneficial for the Environment

While the previous section listed a number of still-open questions for designing sound bioremediation agents, there are already assets available from contemporary Synthetic Biology that should ease the transition from small-scale to large-magnitude undertakings. One key aspect is the design of the catalytic agent(s) proper. As sketched in Figure 22.1, such agents may range from cell-free extracts to dead cells, chromosome-free vesicles, and live microorganisms. The latter can operate as single, individual strains or form communities that, in turn, can be either planktonic or shaped as flocks or surface-attached biofilms. Adding yet another layer of complexity, the active agent might also be a multi-strain or
multi-species consortium allowed to assemble spontaneously or built with a preset 3D structure [124]. Finally, microbial activities can be sustained by or combined with plants (i.e. rhizoremediation [125]) and also empowered by bioelectrogenic processes [126]. We cannot cover all these interesting scenarios in this chapter, but many of them converge on the question of which properties a bacterial chassis should have for optimal performance in environmental settings, independently of the catalytic traits engineered into it. The concept of chassis is central to contemporary synthetic biology [91]. The concept is not merely technical or scientific, as it can also have important regulatory consequences that make the difference between being approved for use by supervisory agencies or not. Acting as a host of recombinant DNA is not enough for an environmental microorganism to become a bona fide chassis for bioremediation. The bacterium at stake must also fulfill a large number of requirements, including genetic stability, lack of virulence factors, defined stress-resistance traits, and known HGT abilities [91, 92]. All these features might vary depending on the application site, which logically determines the choice of the species best suited for each target ecosystem. For instance, cyanobacteria appear to be very appealing chassis for dealing with arid soil crusts, rhizobacteria for plant roots, Alteromonas for marine systems, and Azoarcus for anaerobic sites. There is often a trade-off between ease of genetic engineering and environmental efficacy, as molecular tools for programming the most promising species are frequently lacking or underdeveloped. The advent of CRISPR/Cas9-based approaches is quickly changing this state of affairs, and more and more species are entering the pipeline of genome editing and rational programming [127, 128].
Yet it seems reasonable that the number of chassis used as vehicles of bioremediation activities will remain limited, if only to ease ERA and regulation of deliberate release. In the meantime, the cosmopolitan soil bacterium P. putida, in particular strain KT2440 [35, 36] and its derivatives, has emerged as a prototypic bioremediation agent and a test bed for examining the many questions regarding the release of SBAs and the large-scale propagation of the recombinant DNA they carry. A separate chapter of this book deals in detail with the biochemical and physiological characteristics that make P. putida so appealing for enduring stress and hosting the harsh redox reactions often required for the thorough degradation of environmental pollutants (Chapter 14). Indeed, the properties that positioned P. putida as a cell factory of choice for the bioproduction of complex molecules also explain its choice as a promising bioremediation agent, not least its wide range of catabolic activities that make the catabolism of recalcitrant structures feasible. Nevertheless, when the challenge is not to run a whole-cell catalyst in a bioreactor but to deploy its activity in an open environment, issues other than biochemical performance need to be considered. Genetic stability and catalytic efficacy granted, the next questions include containment and traceability. The history of genetic engineering is intimately connected to concerns about the possible detrimental effects of the accidental or deliberate escape of GMOs (and, more recently, synthetic biology agents, SBAs) into the environment, including apprehension about malicious use [129]. A large number of approaches have been entertained to limit the propagation of such genetically manipulated
strains (or their DNA) beyond the time and space in which they were expected to perform [93]. These range from engineering simple auxotrophies to the far more complex recoding of the entire genome to make DNA dependent on exogenous xenobases, reassignment of the genetic code, or the introduction of kill switches (reviewed in [93]). While some of these stratagems do curtail the survival of the recombinant agents and/or HGT of their genes, none of them secures containment with an escape frequency below 10⁻¹². This is a very low escape rate, but it is still too high in absolute numbers relative to the dimensions of the global microbiota. Genetic and semantic firewalls may thus be a kind of prophylactic measure for SBAs, but absolute restraint is not yet in sight [130]. Instead, the last few years have witnessed a change of emphasis from containment to traceability. Implanting unique sequence identifiers in the genomes of microorganisms destined for the environment is not just a more realistic way to track them but also a way to quickly retrieve their pedigree, identify their characteristics and, if necessary, implement countermeasures to stop propagation [131]. Such identifiers can be associated with specific chassis by introducing DNA barcodes into stable regions of the genome, thereby allowing a sort of version control for each strain and generating digital twins of each of the agents under study [97]. Alternatively, or simultaneously, specific DNA constructs can be watermarked with short identifiers to certify their origin [131–133] and prevent tampering or misappropriation. This might be particularly useful for tagging genes and constructs destined for propagation, when widespread circulation of recombinant DNA, rather than its containment (see Section 22.9), is desired.
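
The point that an escape frequency of 10⁻¹² is low but not negligible is simple arithmetic: what matters is the product of that frequency and the number of cells released. The sketch below assumes independent escape events with a fixed per-cell probability; the culture volumes and the density of 10⁹ cells per mL are illustrative assumptions, not figures from the text.

```python
import math


def expected_escapees(n_cells: float, escape_frequency: float) -> float:
    """Mean number of escapees if each cell escapes independently."""
    return n_cells * escape_frequency


def prob_any_escape(n_cells: float, escape_frequency: float) -> float:
    """P(at least one escapee) = 1 - (1 - p)^N, computed stably for tiny p and huge N."""
    return -math.expm1(n_cells * math.log1p(-escape_frequency))


if __name__ == "__main__":
    p = 1e-12
    cells_per_ml = 1e9  # assumed density of a dense bacterial culture
    for volume_ml in (1.0, 1e3, 1e6):  # 1 mL, 1 L, 1,000 L
        n = volume_ml * cells_per_ml
        print(f"{volume_ml:>9.0e} mL: expected {expected_escapees(n, p):.3g}, "
              f"P(any) = {prob_any_escape(n, p):.3g}")
```

Under these assumptions a single 1,000 L batch already carries about 10¹⁵ cells, i.e. around a thousand expected escapees, so escape becomes practically certain long before the scale of the global microbiota is reached.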

22.7 Manufacturing Catalytic Consortia

Figure 22.5 Single-strain multistep process vs. consortium-based distributed catalysis. (a) Single-strain catalyst. This is the most typical objective of metabolic engineering. Genes of various origins (BCDE) are combined in a single microbial chassis in a fashion that converts substrate S into product P (which, in a bioremediation scenario, can be just CO2 + H2O). The most advantageous stoichiometry and flow of intermediates through the pathway are achieved by optimizing the regulatory elements (promoters and intergenic regions) of the engineered route. (b) Distributed catalysis in a multistrain/multispecies partnership. The same pathway BCDE can be implemented through a combination of different microorganisms, each holding one or more (optimized) steps of the route. Enzymatic stoichiometry is adjusted in this case through fluctuations in consortium composition.

Naturally occurring biodegradation of environmental pollutants is hardly ever carried out by a single bacterium present at the afflicted site. Instead, this task is executed by microbial consortia that often become spatially structured for optimal execution of a complex catabolic pathway (Figure 22.5). Not infrequently, the consortium includes members that get rid of toxic intermediates, protect against grazing [135], or provide a physical scaffold on which the community assembles. At the same time, natural consortia have to deal with cheaters that benefit from the common goods present in a syntrophic scenario but give nothing in return [136]. This well-documented natural history of biodegradation processes argues against developing monoclonal agents and single strains to eliminate environmental pollution. After a decade of trials, heavily engineered super-degraders failed altogether as bioremediation instruments, and the notion was basically set aside by the end of the twentieth century [16]. Evolution has often cracked the problem of reaching the exact stoichiometry and the best biochemical background for each step of a pathway not so much by selecting single strains with all the biochemical capacity to do the job as by gathering multistrain and multispecies communities. These assemblies can quickly adapt to specific conditions by changing their relative composition instead of regulating gene expression at the individual level (which is in any case difficult to ascertain). The take-home lesson from this body of knowledge is that engineering
biodegradative consortia might be more useful for bioremediation than designing monoclonal bacteria, which may do well in a test tube but might prove useless under real operating conditions. This scenario has elicited considerable interest in developing tools for artificially assembling partnerships of different strains and species to generate meta-organismal catalysts [40, 99, 100, 137]. Apart from just combining biochemical activities, the various partners can be physically connected by means of surface-exposed adhesins that follow certain association rules (Figure 22.6). Adhesin-target libraries are available, e.g. as series of nanobodies and their target antigens [138]. Once encoded genetically and displayed on the cell surface, such combinations of adhesins enable the design of flocks with a certain stoichiometry and 3D structure, and thus bearing what have been called synthetic morphologies [139, 140]. Also, if the targets of one or more adhesins are presented on a solid surface, the bacteria involved may form a monolayer reminiscent of a biofilm but with an artificially programmed regular structure. To this end, eliminating bulky structures from the cell surface increases access between matching interaction partners and thus the predictability of the resulting agents [134]. Playing with physical morphologies (including making them controllable with an external signal [141–143]) provides opportunities for upgrading catalytic performance well beyond the sheer optimization of the corresponding biochemical route. This trait can itself be further enhanced by endowing the designed partnership with inter-cell communication devices that can help to maintain community composition and hold the consortium together. Such synthetic communication channels have been shown to be feasible by repurposing one or more of the quorum sensing systems often found in environmental bacteria for the sake of keeping a given community stable [144, 145]. Alternatively, a complex degradative pathway (e.g. the TOL pathway for catabolism of toluene by P. putida mt-2) can be split into various metabolic segments, and the secreted intermediates used as signals for communication between different members of the consortium [146]. Although these advanced methods of bacterial community manufacturing are still incipient at the time of writing, they are expected to take whole-cell catalyst engineering to the next level of complexity and efficacy, whether for industrial biotransformations or for environmental applications.

Figure 22.6 Engineering bacterial consortia in flocks with predetermined 3D structures. (a) Surface editing of whole-cell catalysts to ease display of adhesins on the bacterial envelope. Genetic editing of the cell surfome to eliminate structures that impede access to adhesins presented on the cell exterior (wild type vs. naked cell) eases the engineering of artificial partnerships through specific attachment. Source: Reprinted with permission from Martínez-García et al. [134]. © 2020 American Chemical Society. (b) The ensuing surface-naked cells can be modeled as spherocylinders, and the 3D structures of the resulting flocks (singlets and aggregates) can be modeled mathematically. Source: Based on [134].
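
The idea in Figure 22.5b that stoichiometry is set by consortium composition rather than by promoter tuning can be illustrated with a toy calculation. It assumes a linear pathway split so that each strain carries one step, freely exchanged intermediates, and a steady-state flux capped by the slowest population-weighted step; the per-cell rates are made-up numbers and the function names are hypothetical.

```python
def pathway_flux(fractions, per_cell_rates, total_cells=1.0):
    """Steady-state flux of a linear pathway split across strains.

    Strain i makes up fraction fractions[i] of the consortium and carries
    step i at per_cell_rates[i]; the slowest step caps the overall flux.
    """
    return min(f * r * total_cells for f, r in zip(fractions, per_cell_rates))


def balanced_composition(per_cell_rates):
    """Fractions that equalize every step (f_i proportional to 1/r_i),
    which maximizes the capped flux for a fixed total number of cells."""
    inverse = [1.0 / r for r in per_cell_rates]
    total = sum(inverse)
    return [x / total for x in inverse]


if __name__ == "__main__":
    # Hypothetical per-cell rates for strains carrying steps B, C, D and E.
    rates = [4.0, 1.0, 2.0, 0.5]
    even = [0.25] * 4
    balanced = balanced_composition(rates)
    print("even mix:", pathway_flux(even, rates))
    print("balanced:", [round(f, 3) for f in balanced],
          round(pathway_flux(balanced, rates), 3))
```

This is the same stoichiometry problem as in Figure 22.5a, but solved by adjusting how many cells of each kind are present rather than by adjusting expression inside a single cell.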

22.8 Environmental Galenics

The step immediately preceding a bioremediation intervention is the preparation of the biological agents at stake in a material format that makes them amenable to extensive spreading. There is a considerable body of experience in formulating microorganisms for area-wide inoculation as plant growth promoters and as biocontrol agents to prevent agricultural pests [147]. Also, the probiotic sector has developed over the years many ways of delivering beneficial bacteria to the human gut, e.g. capsules, pills, and syrups [148]. Not to mention the burgeoning field of fecal transplantation, in which a preexisting microbial community is complemented or altogether replaced by another following a microbiological logic [149]. In stark contrast, little has been developed to the same end for conceptually equivalent environmental probiotics capable of counteracting pollution. In the case of oil spills in marine ecosystems, the strategy has mostly involved in situ fertilization of the leaked petroleum with oleophilic N and P sources to promote the emergence of naturally occurring degraders [150]. But deliberate, large-scale spreading of engineered bacteria, whether mono-strain or multi-strain, raises its own specific issues that have not received much attention thus far. One attractive possibility involves packaging the degradative GMOs or SBAs in water-soluble capsules [102, 103] filled with the biological agent proper plus additives that increase its catalytic performance, shelf-life, and containment. One typical problem that often hinders the applicability of such microorganisms is their sensitivity to desiccation, which could be mitigated by adding osmoprotectants to the formulation [103]. The same capsules can contain nutrients for securing co-metabolic activity and expression of adequate enzymes under otherwise oligotrophic conditions.
Finally, the formulation can contain one or more compounds on which the SBA is entirely dependent for growth. These may range from a simple metabolite that compensates for an auxotrophy [94] to an essential synthetic chemical [151], thereby imposing a degree of containment. Furthermore, if given a physical shape and properties not unlike those of plant seeds, gelatin capsules could be dispersed over large areas of land with the seeding strategies and machinery common in intensive agriculture. A separate challenge for spreading catalytic SBAs is predation, as artificial increases in bacterial populations create a niche for grazing protozoa [152]. This could be avoided either by engineering the biodegradative bacteria to produce compounds unpalatable to protists or by adding other bacteria to the formulation to the same end [135]. In an extreme case, whole-cell catalysts have been placed inside protective plastic tubing that shields the microorganisms from grazing while still allowing them to deliver their activities to soil [153, 154]. Most of these possibilities remain to be explored, but they need to be tackled to make bioremediation based on deeply engineered agents a reality. In any case, as mentioned above, the field still requires the development of what could be called Environmental Galenic Science to enable the efficacious transfer of beneficial activities developed in the laboratory into ecosystems deteriorated by human activities.

22.9 Toward HGT-Based, Large-Scale Bioremediation

While the concepts and approaches discussed above expand the range of scenarios in which advanced bio-based interventions can be entertained, global actions to curb pollution at a planetary level still appear problematic [63]. The bottlenecks hampering further developments include public reservations about the ethics, safety, and governance of such actions, issues that are also recurrently discussed regarding Geoengineering [156, 157]. On the technical side, the reality is that very few approaches have been developed thus far for deliberately spreading biological activities at a very large scale. On top of this, the historical development of genetic engineering has been systematically accompanied by concern about the fate of recombinant DNA released into the environment, whether accidentally or intentionally [158–160]. The spreading of antibiotic resistance genes (Figure 22.3b) exposes how fast a specific trait can propagate through the multi-scale layers of the biological world owing to the efficacy of HGT mechanisms. This has traditionally created legitimate distress about the release of bioremediation agents, let alone the propagation of recombinant DNA encoding the traits of interest, and has stimulated much research on genetic and biological containment. The underlying assumption of these efforts is that HGT is bad and needs to be avoided whenever possible. However, the last few years have seen a number of synthetic biology-based strategies, e.g. for depletion of pathogens in a complex bacterial community, that depend on massive HGT of conjugative plasmids bearing an engineered genetic device for the detection and destruction of virulence or antibiotic resistance genes [161–163].
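
Why a conjugative plasmid can spread through a community even without external selection can be illustrated with a discrete version of the classic mass-action model of plasmid transfer. This is a toy sketch under stated assumptions (a well-mixed population, transfer proportional to donor × recipient frequencies with a lumped per-generation coefficient `gamma`, a fitness cost for carriers, and segregational loss); the parameter values are illustrative, not measured.

```python
def plasmid_frequency(p0, gamma, cost=0.05, loss=1e-3, generations=1000):
    """Trajectory of the plasmid-bearing fraction p in a well-mixed population.

    Per generation: (1) selection against carriers with relative fitness 1 - cost,
    (2) mass-action conjugation, p += gamma * p * (1 - p), with 0 <= gamma <= 1,
    (3) segregational loss of the plasmid at rate `loss`.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1] to keep p within [0, 1]")
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p = p * (1.0 - cost) / (p * (1.0 - cost) + (1.0 - p))
        p = p + gamma * p * (1.0 - p)
        p = p * (1.0 - loss)
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    start = 1e-6  # one donor per million cells
    for gamma in (0.0, 0.02, 0.1):
        final = plasmid_frequency(start, gamma)[-1]
        print(f"gamma = {gamma:<4}: final plasmid frequency {final:.3g}")
```

At low frequency the plasmid invades when, roughly, (1 − cost)(1 + gamma)(1 − loss) > 1, so a sufficiently high transfer rate can outweigh a carriage cost with no external selective pressure at all.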
Furthermore, the recent invention of gene drives based on CRISPR/Cas9 [164] has opened the possibility of designing processes for the self-dispersal of selected traits in a fashion that does not require an external selective pressure. Most envisioned applications of gene drives deal with the extinction of noxious species, but the same framework could be adapted to propagate beneficial properties through a population, including biodegradative capacities. Gene drives are not automatically adaptable to bacteria, which are typically haploid with a single chromosome, but some strategies have recently started to emerge, inspired by the concept of propagating specific DNA sequences without phenotypic selection [165, 166]. These two examples indicate that HGT is valuable under given circumstances. Alas, the very creative efforts invested in stopping DNA transfer in microbial communities lack any counterpart in the opposite direction, i.e. strategies for fostering HGT. Available data at small scale suggest that such a strategy for spreading biodegradative genes is indeed feasible [104–106], and the scheme could be scaled up to a much larger dimension. In particular, this overarching objective can be combined with the broad-host-range expression signals and the DNA watermarks discussed above. Engineering super-spreaders of environmentally useful activities could become the ultimate tool for fortifying environmental microbiomes in their ability to deal with pollutants and other challenges (Figure 22.7). A final consideration that goes beyond the scope of this chapter is the eventual exploitation of already existing channels of area-wide dispersion of bacteria and their genes. Once more, such channels are generally considered a target to destroy in order to avoid the spread of virulent pathogens and antibiotic resistance. Note, however, that some transmission networks could also be invaluable for delivering bioremediation agents. In this respect, cloud microbiology [107, 108], a somewhat marginal branch of contemporary bacteriology, may emerge in the not-so-distant future as a fertile playground for developing novel stratagems aimed at spreading biodegradative microorganisms and/or their genes over long distances and large areas.

Figure 22.7 Fortification of environmental microbiomes with beneficial traits through massive HGT. Engineered HGT “super-donors” could propagate beneficial traits through a diversity of microbiomes, whether environmental, plant-based, or gut-based. Conjugation could be the preferred mechanism to this end, but others may also operate. As explained in the text, designing such a process in a predictable and safe fashion requires tackling a number of issues, including broad-host-range expression, device interoperability, watermarking of human-made constructs, and the development of models for the large-scale propagation of recombinant DNA and the features encoded therein. Source: Pinilla-Redondo et al. [155]. © 2018 Elsevier.
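Reasoning about such super-spreaders calls for models of the large-scale propagation of recombinant DNA, one of the issues listed for Figure 22.7. As a purely illustrative sketch, and not a model used by the authors, a conjugative plasmid can be treated like an infection in a susceptible-infected-susceptible (SIS) epidemic model, borrowing the epidemiological notion of a basic reproduction number R0. The assumptions are loud ones: a well-mixed population of constant size N, mass-action conjugation with rate constant γ, segregational loss at rate τ per cell, and no fitness benefit or cost of carriage. The plasmid-bearing fraction p then obeys dp/dt = γN·p(1 − p) − τp, so that R0 = γN/τ; the trait invades and settles at p* = 1 − 1/R0 when R0 > 1 and disappears when R0 ≤ 1. All function names and parameter values below are hypothetical.

```python
"""Illustrative SIS-style model of conjugative plasmid spread (hypothetical sketch)."""


def plasmid_r0(gamma_n: float, loss: float) -> float:
    """Basic reproduction number of the plasmid: transfer rate (gamma*N) over loss rate tau."""
    return gamma_n / loss


def equilibrium_fraction(gamma_n: float, loss: float) -> float:
    """Steady-state plasmid-bearing fraction p* = 1 - 1/R0, or 0 if the plasmid cannot invade."""
    r0 = plasmid_r0(gamma_n, loss)
    return max(0.0, 1.0 - 1.0 / r0)


def simulate_spread(gamma_n: float, loss: float, p0: float,
                    t_end: float, dt: float = 0.01) -> float:
    """Integrate dp/dt = gamma*N*p*(1 - p) - tau*p with forward Euler steps.

    gamma_n : conjugation rate constant times population size (assumed units: per hour)
    loss    : segregational loss rate tau (assumed units: per hour)
    p0      : initial fraction of plasmid-bearing cells (e.g. introduced donors)
    """
    p = p0
    for _ in range(int(round(t_end / dt))):
        dpdt = gamma_n * p * (1.0 - p) - loss * p
        p = min(1.0, max(0.0, p + dpdt * dt))  # keep p a valid fraction
    return p


if __name__ == "__main__":
    # Transfer outpaces loss (R0 = 5): a tiny donor inoculum spreads to most of the population.
    print(simulate_spread(gamma_n=1.0, loss=0.2, p0=0.001, t_end=200.0))
    print(equilibrium_fraction(1.0, 0.2))
    # Loss outpaces transfer (R0 = 0.5): the trait vanishes despite starting in half the cells.
    print(simulate_spread(gamma_n=0.1, loss=0.2, p0=0.5, t_end=200.0))
```

Under these assumptions, whether a trait spreads without any external selective pressure hinges only on transfer outpacing loss. Real microbiomes are spatially structured, multi-species, and subject to selection, so any practical propagation model would need to go well beyond this well-mixed toy.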

22.10 Conclusions and Future Prospects: Towards Bioremediation 4.0

Bioremediation as a scientific discipline and a technology has gone through various stages, which can be tagged as 1.0 (natural attenuation and trial and error with naturally occurring biologicals), 2.0 (genetically engineered catalysts), and 3.0 (strains and microbial consortia designed through systems and synthetic biology). Yet there is still room for a 4.0 version (and beyond), in which strategies and constructs for global action are considered and novel conceptual, material, and genetic tools to bring them about are developed. If nearly every biotechnological development faces the challenge of scaling up the cognate process, the task of Bioremediation 4.0 is even more formidable, given that the ultimate application target is the whole biosphere. Yet a large challenge can be broken down into a collection of smaller and more tractable questions, like the ones addressed above. Under these circumstances, we advocate developing suitable experimental systems in which many of the uncertainties listed in this chapter can be adequately addressed. In this sense, much of our most recent work has been spent upgrading P. putida not just as an efficient chassis for industrial biotransformations (see Chapter 14) but also as a prototype of what a biological agent for Bioremediation 4.0 could look like. The genome of the archetypal KT2440 strain of this species has been edited to enhance environmentally valuable properties, eliminate metabolic drawbacks, and host synthetic devices [23, 36]. Furthermore, the same strain is currently being improved to become a safe, tractable, and efficacious chassis of choice for the in situ delivery of a suite of biodegradative activities, either on its own or as a member of a multi-species consortium [134]. By focusing on this specific experimental system and microorganism, we expect to expose, and eventually overcome, virtually all bottlenecks that limit the application of modern GE for the sake of a cleaner environment and a more sustainable planet. In conclusion, the bioremediation paradigm has shifted beyond the decontamination of oil spills and toxic wastes in relatively confined environments, and the field is now empowered with the concepts and framework of synthetic biology for tackling problems at a global scale.
Whereas bioremediation was once limited to hydrocarbons and low-production-volume contaminants, Bioremediation 4.0 has the potential to resolve environmental issues that were not considered a problem a few decades ago, such as plastics, greenhouse gas emissions, and the inadvertent but continuous release of drugs into aquatic and terrestrial ecosystems.

Acknowledgments

Ricard Solé and Alvaro San Millán are gratefully acknowledged for inspiring discussions, along with members of the authors’ laboratories. This work was funded by the SETH (RTI2018-095584-B-C42) (MINECO/FEDER) and SYCOLIM (PCI2019-111859-2 ERA-COBIOTECH 2018) Projects of the Spanish Ministry of Science and Innovation; the MADONNA (H2020-FET-OPEN-RIA-2017-1-766975), BioRoboost (H2020-NMBP-BIO-CSA-2018/820699), SYNBIO4FLAV (H2020-NMBP/0500), and MIX-UP (H2020-Grant 870294) Contracts of the European Union; and the InGEMICS-CM (S2017/BMD-3691) Project of the Comunidad de Madrid/European Structural and Investment Funds (FSE, FEDER). Financial support from The Novo Nordisk Foundation (NNF10CC1016517 and NNF18CC0033664), the Danish Council for Independent Research (SWEET, DFF-Research Project 8021-00039B), and the European Union’s Horizon 2020 Research and Innovation Program under grant agreement No. 814418 (SinFonia) to P.I.N. is also gratefully acknowledged. The authors declare that no conflict of interest exists in connection with the contents of this article.

References

1 Daubaras, D. and Chakrabarty, A.M. (1992). The environment, microbes and bioremediation: microbial activities modulated by the environment. Biodegradation 3: 125–135.
2 de Lorenzo, V. (2008). Systems biology approaches to bioremediation. Current Opinion in Biotechnology 19 (6): 579–589.
3 Timmis, K.N. and Pieper, D.H. (1999). Bacteria designed for bioremediation. Trends in Biotechnology 17 (5): 201–204.
4 Karig, D.K. (2017). Cell-free synthetic biology for environmental sensing and remediation. Current Opinion in Biotechnology 45: 69–75.
5 Adams, G.O., Tawari-Fufeyin, P., Okoro, S., and Ehinomen, I. (2015). Bioremediation, biostimulation and bioaugmention: a review. International Journal of Environmental Bioremediation & Biodegradation 3 (1): 28–39.
6 Pritchard, P.H. (1992). Use of inoculation in bioremediation. Current Opinion in Biotechnology 3 (3): 232–243.
7 Santos, M.S., Nogueira, M.A., and Hungria, M. (2019). Microbial inoculants: reviewing the past, discussing the present and previewing an outstanding future for the use of beneficial bacteria in agriculture. AMB Express 9 (1): 205.
8 Baez-Rogelio, A. et al. (2017). Next generation of microbial inoculants for agriculture and bioremediation. Microbial Biotechnology 10 (1): 19–21.
9 De Lorenzo, V. (2009). Recombinant bacteria for environmental release: what went wrong and what we have learnt from it. Clinical Microbiology and Infection 15: 63–65.
10 Holmes, D.E. et al. (2013). Enrichment of specific protozoan populations during in situ bioremediation of uranium-contaminated groundwater. The ISME Journal 7 (7): 1286–1298.
11 De Lorenzo, V. et al. (2018). The power of synthetic biology for bioproduction, remediation and pollution control: the UN’s Sustainable Development Goals will inevitably require the application of molecular biology and biotechnology on a global scale. EMBO Reports 19 (4): e45658.
12 Bento, F.M. et al. (2005). Comparative bioremediation of soils contaminated with diesel oil by natural attenuation, biostimulation and bioaugmentation. Bioresource Technology 96 (9): 1049–1055.
13 Illman, W.A. and Alvarez, P.J. (2009). Performance assessment of bioremediation and natural attenuation. Critical Reviews in Environmental Science and Technology 39 (4): 209–270.
14 Tyagi, M., da Fonseca, M.M.R., and de Carvalho, C.C. (2011). Bioaugmentation and biostimulation strategies to improve the effectiveness of bioremediation processes. Biodegradation 22 (2): 231–241.
15 Golyshin, P. et al. (1999). Effect of novel biosurfactants on biodegradation of polychlorinated biphenyls by pure and mixed bacterial cultures. The New Microbiologica 22 (3): 257.
16 Cases, I. and de Lorenzo, V. (2005). Genetically modified organisms for the environment: stories of success and failure and what we have learned from them. International Microbiology 8 (3): 213–222.


17 Dvořák, P. et al. (2017). Bioremediation 3.0: engineering pollutant-removing bacteria in the times of systemic biology. Biotechnology Advances 35 (7): 845–866.
18 Ellis, L.B. et al. (2003). The University of Minnesota biocatalysis/biodegradation database: post-genomic data mining. Nucleic Acids Research 31 (1): 262–265.
19 Hadadi, N. et al. (2016). ATLAS of biochemistry: a repository of all possible biochemical reactions for synthetic biology and metabolic engineering studies. ACS Synthetic Biology 5 (10): 1155–1166.
20 Engel, T. (2006). Basic overview of chemoinformatics. Journal of Chemical Information and Modeling 46 (6): 2267–2277.
21 Finley, S.D., Broadbelt, L.J., and Hatzimanikatis, V. (2009). Computational framework for predictive biodegradation. Biotechnology and Bioengineering 104 (6): 1086–1097.
22 Pazos, F., Valencia, A., and De Lorenzo, V. (2003). The organization of the microbial biodegradation network from a systems-biology perspective. EMBO Reports 4 (10): 994–999.
23 Martínez-García, E. and de Lorenzo, V. (2017). Molecular tools and emerging strategies for deep genetic/genomic refactoring of Pseudomonas. Current Opinion in Biotechnology 47: 120–132.
24 Martínez-García, E. et al. (2020). SEVA 3.0: an update of the Standard European Vector Architecture for enabling portability of genetic constructs among diverse bacterial hosts. Nucleic Acids Research 48 (D1): D1164–D1170.
25 Csörgő, B., Nyerges, A., and Pál, C. (2020). Targeted mutagenesis of multiple chromosomal regions in microbes. Current Opinion in Microbiology 57: 22–30.
26 Aparicio, T. et al. (2020). High-efficiency multi-site genomic editing (HEMSE) of Pseudomonas putida through thermoinducible ssDNA recombineering. iScience: 100946.
27 Hueso-Gil, A. et al. (2020). Multiple-site diversification of regulatory sequences enables interspecies operability of genetic devices. ACS Synthetic Biology 9 (1): 104–114.
28 Tan, S.Z., Reisch, C.R., and Prather, K.L.J. (2018). A robust CRISPR interference gene repression system in Pseudomonas. Journal of Bacteriology 200 (7): e00575-17.
29 Batianis, C. et al. (2020). An expanded CRISPRi toolbox for tunable control of gene expression in Pseudomonas putida. Microbial Biotechnology 13 (2): 368–385.
30 Parthasarathy, H., Hill, E., and MacCallum, C. (2007). Global ocean sampling collection. PLoS Biology 5 (3): e83.
31 Landenmark, H.K., Forgan, D.H., and Cockell, C.S. (2015). An estimate of the total DNA in the biosphere. PLoS Biology 13 (6): e1002168.
32 Lewin, H.A. et al. (2018). Earth BioGenome Project: sequencing life for the future of life. Proceedings of the National Academy of Sciences of the United States of America 115 (17): 4325–4333.


33 Machado, D. et al. (2018). Fast automated reconstruction of genome-scale metabolic models for microbial species and communities. Nucleic Acids Research 46 (15): 7542–7553.
34 Fredens, J. et al. (2019). Total synthesis of Escherichia coli with a recoded genome. Nature 569 (7757): 514–518.
35 Nikel, P.I., Martínez-García, E., and De Lorenzo, V. (2014). Biotechnological domestication of pseudomonads using synthetic biology. Nature Reviews Microbiology 12 (5): 368–379.
36 Nikel, P.I. and de Lorenzo, V. (2018). Pseudomonas putida as a functional chassis for industrial biocatalysis: from native biochemistry to trans-metabolism. Metabolic Engineering 50: 142–155.
37 Watanabe, K. (2001). Microorganisms relevant to bioremediation. Current Opinion in Biotechnology 12 (3): 237–241.
38 Deshmukh, R., Khardenavis, A.A., and Purohit, H.J. (2016). Diverse metabolic capacities of fungi for bioremediation. Indian Journal of Microbiology 56 (3): 247–264.
39 Shi, T.-Q. et al. (2017). CRISPR/Cas9-based genome editing of the filamentous fungi: the state of the art. Applied Microbiology and Biotechnology 101 (20): 7435–7443.
40 Blasche, S. et al. (2017). Model microbial communities for ecosystems biology. Current Opinion in Systems Biology 6: 51–57.
41 van der Meer, J.R. (2010). Bacterial sensors: synthetic design and application principles. Synthesis Lectures on Synthetic Biology 2 (1): 1–167.
42 Liu, D., Evans, T., and Zhang, F. (2015). Applications and advances of metabolite biosensors for metabolic engineering. Metabolic Engineering 31: 35–43.
43 Pandi, A. et al. (2019). Optimizing cell-free biosensors to monitor enzymatic production. ACS Synthetic Biology 8 (8): 1952–1957.
44 Koch, M. et al. (2019). Custom-made transcriptional biosensors for metabolic engineering. Current Opinion in Biotechnology 59: 78–84.
45 Jung, J.K. et al. (2020). Cell-free biosensors for rapid detection of water contaminants. Nature Biotechnology https://doi.org/10.1038/s41587-020-0571-7.
46 Davis, S.J., Caldeira, K., and Matthews, H.D. (2010). Future CO2 emissions and climate change from existing energy infrastructure. Science 329 (5997): 1330–1333.
47 Montzka, S.A., Dlugokencky, E.J., and Butler, J.H. (2011). Non-CO2 greenhouse gases and climate change. Nature 476 (7358): 43–50.
48 York, A. (2020). Adapting to plastic. Nature Reviews Microbiology 18 (7): 362–363.
49 Lebreton, L.C.M. et al. (2017). River plastic emissions to the world’s oceans. Nature Communications 8: 15611.
50 Galloway, T.S. and Lewis, C.N. (2016). Marine microplastics spell big problems for future generations. Proceedings of the National Academy of Sciences of the United States of America 113 (9): 2331–2333.
51 Sussarellu, R. et al. (2016). Oyster reproduction is affected by exposure to polystyrene microplastics. Proceedings of the National Academy of Sciences of the United States of America 113 (9): 2430–2435.


52 Beckham, G.T. et al. (2016). Opportunities and challenges in biological lignin valorization. Current Opinion in Biotechnology 42: 40–53.
53 Guerriero, G. et al. (2016). Lignocellulosic biomass: biosynthesis, degradation, and industrial utilization. Engineering in Life Sciences 16 (1): 1–16.
54 Conley, D.J. et al. (2009). Controlling eutrophication: nitrogen and phosphorus. Science 323 (5917): 1014–1015.
55 Luo, Y. et al. (2014). A review on the occurrence of micropollutants in the aquatic environment and their fate and removal during wastewater treatment. Science of the Total Environment 473: 619–641.
56 Berdugo, M. et al. (2020). Global ecosystem thresholds driven by aridity. Science 367 (6479): 787–790.
57 Erisman, J.W. et al. (2013). Consequences of human modification of the global nitrogen cycle. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 368 (1621): 20130116.
58 Ritter, S.K. (2008). The Haber-Bosch reaction: an early chemical impact on sustainability. Chemical and Engineering News 86: 33.
59 Bennett, J.E. et al. (2019). Particulate matter air pollution and national and county life expectancy loss in the USA: a spatiotemporal analysis. PLoS Medicine 16 (7): e1002856.
60 Coffey, R. et al. (2019). A review of water quality responses to air temperature and precipitation changes 2: nutrients, algal blooms, sediment, pathogens. JAWRA Journal of the American Water Resources Association 55 (4): 844–868.
61 Wu, Y. et al. (2019). Potentials and challenges of phosphorus recovery as vivianite from wastewater: a review. Chemosphere 226: 246–258.
62 Alexander, M.E. (2000). Fire behaviour as a factor in forest and rural fire suppression. Forest Research Bulletin No. 197. Forest and Rural Fire Scientific and Technical Series Report No. 5. https://www.cfs.nrcan.gc.ca/bookstore_pdfs/18242.pdf.
63 de Lorenzo, V., Marliere, P., and Sole, R. (2016). Bioremediation at a global scale: from the test tube to planet Earth. Microbial Biotechnology 9 (5): 618–625.
64 Antonovsky, N. et al. (2016). Sugar synthesis from CO2 in Escherichia coli. Cell 166 (1): 115–125.
65 Gleizer, S. et al. (2019). Conversion of Escherichia coli to generate all biomass carbon from CO2. Cell 179 (6): 1255–1263.e12.
66 Miller, T.E. et al. (2020). Light-powered CO2 fixation in a chloroplast mimic with natural and synthetic parts. Science 368 (6491): 649–654.
67 Lee, Y. et al. (2019). Systems metabolic engineering strategies for non-natural microbial polyester production. Biotechnology Journal 14 (9): 1800426.
68 Chae, T.U. et al. (2017). Recent advances in systems metabolic engineering tools and strategies. Current Opinion in Biotechnology 47: 67–82.
69 Phillips, J.R., Huhnke, R.L., and Atiyeh, H.K. (2017). Syngas fermentation: a microbial conversion process of gaseous substrates to various products. Fermentation 3 (2): 28.


70 Erdogan, A. and Orhan, Ö.Y. (2017). CO2 utilization: developments in conversion processes. Petroleum 3: 109–126.
71 Naims, H. (2016). Economics of carbon dioxide capture and utilization: a supply and demand perspective. Environmental Science and Pollution Research 23 (22): 22226–22241.
72 Blank, L.M. et al. (2020). Biotechnological upcycling of plastic waste and other non-conventional feedstocks in a circular economy. Current Opinion in Biotechnology 62: 212–219.
73 Wilkinson, D.M. (2004). The parable of Green Mountain: Ascension Island, ecosystem construction and ecological fitting. Journal of Biogeography 31 (1): 1–4.
74 Rivera-Valentín, E.G. (2019). Reimagining terraforming. Nature Astronomy 3 (10): 883–884.
75 Vicente, S., Montañez, R., and Duran Nebreda, S. (2015). Synthetic circuit designs for earth terraformation. Biology Direct 10: 37.
76 Sole, R. (2015). Bioengineering the biosphere? Ecological Complexity 22: 40–49.
77 DeLisi, C. (2019). The role of synthetic biology in climate change mitigation. Biology Direct 14 (1): 1–5.
78 de Lorenzo, V. (2009). Exploiting microbial diversity: the challenges and the means. In: Handbook of Hydrocarbon and Lipid Microbiology, vol. 4 (ed. K.N. Timmis), 2437–2457. Berlin-Heidelberg: Springer-Verlag.
79 Bar-On, Y.M., Phillips, R., and Milo, R. (2018). The biomass distribution on Earth. Proceedings of the National Academy of Sciences of the United States of America 115 (25): 6506.
80 Ferguson, B.A. et al. (2003). Coarse-scale population structure of pathogenic Armillaria species in a mixed-conifer forest in the Blue Mountains of northeast Oregon. Canadian Journal of Forest Research 33 (4): 612–623.
81 Aminov, R.I. (2011). Horizontal gene exchange in environmental microbiota. Frontiers in Microbiology 2: 158.
82 Normark, B.H. and Normark, S. (2002). Evolution and spread of antibiotic resistance. Journal of Internal Medicine 252 (2): 91–106.
83 Pfleger, B.F. et al. (2006). Combinatorial engineering of intergenic regions in operons tunes expression of multiple genes. Nature Biotechnology 24 (8): 1027–1032.
84 Jeschek, M., Gerngross, D., and Panke, S. (2017). Combinatorial pathway optimization for streamlined metabolic engineering. Current Opinion in Biotechnology 47: 142–151.
85 Khan, N. et al. (2020). A broad-host-range event detector: expanding and quantifying performance between Escherichia coli and Pseudomonas species. Synthetic Biology 5 (1): ysaa002.
86 Beal, J. et al. (2016). Reproducibility of fluorescent expression from engineered biological constructs in E. coli. PLoS One 11 (3): e0150182.
87 Martínez-García, E., Benedetti, I., Hueso, A., and de Lorenzo, V. (2015). Environmental plasmids for synthetic biology parts and devices. Microbiology Spectrum 3 (1): PLAS-0033-2014.


88 Vernyik, V. et al. (2020). Exploring the fitness benefits of genome reduction in Escherichia coli by a selection-driven approach. Scientific Reports 10 (1): 1–12.
89 Rugbjerg, P. and Sommer, M.O. (2019). Overcoming genetic heterogeneity in industrial fermentations. Nature Biotechnology 37 (8): 869–876.
90 Conde-Pueyo, N. et al. (2020). Synthetic biology for terraformation lessons from Mars, Earth, and the microbiome. Life 10 (2): 14.
91 Calero, P. and Nikel, P.I. (2019). Chasing bacterial chassis for metabolic engineering: a perspective review from classical to non-traditional microorganisms. Microbial Biotechnology 12 (1): 98–124.
92 Nora, L.C. et al. (2019). Recent advances in plasmid-based tools for establishing novel microbial chassis. Biotechnology Advances 37 (8): 107433.
93 Schmidt, M. and de Lorenzo, V. (2016). Synthetic bugs on the loose: containment options for deeply engineered (micro)organisms. Current Opinion in Biotechnology 38: 90–96.
94 Schmidt, M. and Pei, L. (2015). Improving biocontainment with synthetic biology: beyond physical containment. In: Hydrocarbon and Lipid Microbiology Protocols, 185–199. Springer.
95 Schmidt, M. and de Lorenzo, V. (2012). Synthetic constructs in/for the environment: managing the interplay between natural and engineered biology. FEBS Letters 586 (15): 2199–2206.
96 Asin-Garcia, E. et al. (2020). Genetic safeguards for safety-by-design: so close yet so far. Trends in Biotechnology https://doi.org/10.1016/j.tibtech.2020.04.005.
97 Tellechea-Luzardo, J. et al. (2020). Linking engineered cells to their digital twins: a version control system for strain engineering. ACS Synthetic Biology 9 (3): 536–545.
98 Qian, J. et al. (2020). Barcoded microbial system for high-resolution object provenance. Science 368 (6495): 1135–1140.
99 Karkaria, B.D., Fedorec, A.J., and Barnes, C.P. (2021). Automated design of synthetic microbial communities. Nature Communications 12 (1): 672.
100 McCarty, N.S. and Ledesma-Amaro, R. (2019). Synthetic biology tools to engineer microbial communities for biotechnology. Trends in Biotechnology 37 (2): 181–197.
101 Che, S. and Men, Y. (2019). Synthetic microbial consortia for biosynthesis and biodegradation: promises and challenges. Journal of Industrial Microbiology & Biotechnology 46 (9–10): 1343–1358.
102 Johnston, T.G. et al. (2020). Compartmentalized microbes and co-cultures in hydrogels for on-demand bioproduction and preservation. Nature Communications 11 (1): 1–11.
103 de las Heras, A. and de Lorenzo, V. (2011). In situ detection of aromatic compounds with biosensor Pseudomonas putida cells preserved and delivered to soil in water-soluble gelatin capsules. Analytical and Bioanalytical Chemistry 400 (4): 1093–1104.
104 Li, L. et al. (2020). Plasmids persist in a microbial community by providing fitness benefit to multiple phylotypes. The ISME Journal: 1–12.


105 Wang, S. et al. (2020). Conjugative transfer of megaplasmids pND6–1 and pND6–2 enhancing naphthalene degradation in aqueous environment: characterization and bioaugmentation prospects. Applied Microbiology and Biotechnology 104 (2): 861–871.
106 Pinilla-Redondo, R. et al. (2020). Conjugative dissemination of plasmids in rapid sand filters: a trojan horse strategy to enhance pesticide degradation in groundwater treatment. bioRxiv. https://doi.org/10.1101/2020.03.06.980565.
107 Temkiv, T.Š. et al. (2012). The microbial diversity of a storm cloud as assessed by hailstones. FEMS Microbiology Ecology 81 (3): 684–695.
108 Amato, P. et al. (2017). Active microorganisms thrive among extremely diverse communities in cloud water. PLoS One 12 (8): e0182869.
109 Sun, D. (2018). Pull in and push out: mechanisms of horizontal gene transfer in bacteria. Frontiers in Microbiology 9: 2154.
110 Nilgiriwala, K.S. et al. (2015). Synthetic tunable amplifying buffer circuit in E. coli. ACS Synthetic Biology 4 (5): 577–584.
111 Costello, A. and Badran, A.H. (2020). Synthetic biological circuits within an orthogonal central dogma. Trends in Biotechnology 39 (1): 59–71.
112 Boo, A., Ellis, T., and Stan, G.-B. (2019). Host-aware synthetic biology. Current Opinion in Systems Biology 14: 66–72.
113 Wang, H. et al. (2020). Horizontal gene transfer of Fhb7 from fungus underlies Fusarium head blight resistance in wheat. Science 368 (6493): eaba5435.
114 Radman, M. (2016). Protein damage, radiation sensitivity and aging. DNA Repair 44: 186–192.
115 Akkaya, Ö. et al. (2018). The metabolic redox regime of Pseudomonas putida tunes its evolvability toward novel xenobiotic substrates. mBio 9 (4): e01512-1.
116 Zhao, Q. et al. (2017). Comparative genomic analysis of 26 Sphingomonas and Sphingobium strains: dissemination of bioremediation capabilities, biodegradation potential and horizontal gene transfer. Science of the Total Environment 609: 1238–1247.
117 Calles, J. et al. (2019). Fail-safe genetic codes designed to intrinsically contain engineered organisms. Nucleic Acids Research 47 (19): 10439–10451.
118 Decrulle, A.L. et al. (2019). Engineering gene overlaps to sustain genetic constructs in vivo. bioRxiv: 659243.
119 Blazejewski, T., Ho, H.-I., and Wang, H.H. (2019). Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science 365 (6453): 595.
120 Fan, C. et al. (2020). Chromosome-free bacterial cells are safe and programmable platforms for synthetic biology. Proceedings of the National Academy of Sciences of the United States of America 117 (12): 6752–6761.
121 Aparicio, J.P. and Pascual, M. (2007). Building epidemiological models from R0: an implicit treatment of transmission in networks. Proceedings of the Royal Society B: Biological Sciences 274 (1609): 505–512.
122 Brockmann, D. and Helbing, D. (2013). The hidden geometry of complex, network-driven contagion phenomena. Science 342 (6164): 1337–1342.
123 Lenton, T.M. (2011). Early warning of climate tipping points. Nature Climate Change 1 (4): 201–209.


124 Gupta, S. et al. (2020). Investigating the dynamics of microbial consortia in spatially structured environments. Nature Communications 11 (1): 1–15.
125 Kuiper, I. et al. (2004). Rhizoremediation: a beneficial plant-microbe interaction. Molecular Plant-Microbe Interactions 17 (1): 6–15.
126 Wang, X. et al. (2020). Microbial electrochemistry for bioremediation. Environmental Science and Ecotechnology 1: 100013.
127 Shapiro, R.S., Chavez, A., and Collins, J.J. (2018). CRISPR-based genomic tools for the manipulation of genetically intractable microorganisms. Nature Reviews Microbiology 16 (6): 333–339.
128 Tian, P. et al. (2017). Fundamental CRISPR-Cas9 tools and current applications in microbial systems. Synthetic and Systems Biotechnology 2 (3): 219–225.
129 de Lorenzo, V. (2010). Environmental biosafety in the age of synthetic biology: do we really need a radical new approach? Environmental fates of microorganisms bearing synthetic genomes could be predicted from previous data on traditionally engineered bacteria for in situ bioremediation. BioEssays 32 (11): 926–931.
130 Schmidt, M. (2019). A metric space for semantic containment: towards the implementation of genetic firewalls. Biosystems 185: 104015.
131 Gallegos, J.E. et al. (2020). Securing the exchange of synthetic genetic constructs using digital signatures. ACS Synthetic Biology 9 (10): 2656–2664.
132 Liss, M. et al. (2012). Embedding permanent watermarks in synthetic genes. PLoS One 7 (8): e42465.
133 Kar, D.M. et al. (2020). Synthesizing DNA molecules with identity-based digital signatures to prevent malicious tampering and enabling source attribution. Journal of Computer Security 28: 437–467.
134 Martínez-García, E. et al. (2020). The naked bacterium: emerging properties of a surfome-streamlined Pseudomonas putida strain. ACS Synthetic Biology 9 (9): 2477–2492.
135 Matz, C. and Kjelleberg, S. (2005). Off the hook: how bacteria survive protozoan grazing. Trends in Microbiology 13 (7): 302–307.
136 Leinweber, A., Inglis, R.F., and Kümmerli, R. (2017). Cheating fosters species co-existence in well-mixed bacterial communities. The ISME Journal 11 (5): 1179–1188.
137 Zhou, K. et al. (2015). Distributing a metabolic pathway among a microbial consortium enhances production of natural products. Nature Biotechnology 33 (4): 377–383.
138 Glass, D.S. and Riedel-Kruse, I.H. (2018). A synthetic bacterial cell-cell adhesion toolbox for programming multicellular morphologies and patterns. Cell 174 (3): 649–658.e16.
139 Volke, D.C. and Nikel, P.I. (2018). Getting bacteria in shape: synthetic morphology approaches for the design of efficient microbial cell factories. Advanced Biosystems 2 (11): 1800111.
140 Kassinger, S.J. and van Hoek, M.L. (2020). Biofilm architecture: an emerging synthetic biology target. Synthetic and Systems Biotechnology 5 (1): 1–10.


141 Chen, F. and Wegner, S.V. (2020). Blue-light-switchable bacterial cell–cell adhesions enable the control of multicellular bacterial communities. ACS Synthetic Biology 9 (5): 1169–1180.
142 Chen, F. and Wegner, S.V. (2017). Blue light switchable bacterial adhesion as a key step toward the design of biofilms. ACS Synthetic Biology 6 (12): 2170–2174.
143 Benedetti, I., de Lorenzo, V., and Nikel, P.I. (2016). Genetic programming of catalytic Pseudomonas putida biofilms for boosting biodegradation of haloalkanes. Metabolic Engineering 33: 109–118.
144 Stephens, K. and Bentley, W.E. (2020). Synthetic biology for manipulating quorum sensing in microbial consortia. Trends in Microbiology 28 (8): 633–643.
145 Miano, A., Liao, M.J., and Hasty, J. (2020). Inducible cell-to-cell signaling for tunable dynamics in microbial communities. Nature Communications 11 (1): 1–8.
146 Silva-Rocha, R. and de Lorenzo, V. (2014). Engineering multicellular logic in bacteria with metabolic wires. ACS Synthetic Biology 3 (4): 204–209.
147 Bashan, Y. (1998). Inoculants of plant growth-promoting bacteria for use in agriculture. Biotechnology Advances 16 (4): 729–770.
148 Sanders, M.E. and Marco, M.L. (2010). Food formats for effective delivery of probiotics. Annual Review of Food Science and Technology 1: 65–85.
149 Filip, M., Tzaneva, V., and Dumitrascu, D.L. (2018). Fecal transplantation: digestive and extradigestive clinical applications. Clujul Medical 91 (3): 259.
150 Lim, M.W., Von Lau, E., and Poh, P.E. (2016). A comprehensive guide of remediation technologies for oil contaminated soil: present works and future directions. Marine Pollution Bulletin 109 (1): 14–45.
151 Chan, C.T. et al. (2016). “Deadman” and “Passcode” microbial kill switches for bacterial containment. Nature Chemical Biology 12 (2): 82–86.
152 Kota, S., Borden, R.C., and Barlaz, M.A. (1999). Influence of protozoan grazing on contaminant biodegradation. FEMS Microbiology Ecology 29 (2): 179–189.
153 Mertens, B., Boon, N., and Verstraete, W. (2006). Slow-release inoculation allows sustained biodegradation of gamma-hexachlorocyclohexane. Applied and Environmental Microbiology 72 (1): 622–627.
154 Boon, N. et al. (2002). Bioaugmenting bioreactors for the continuous removal of 3-chloroaniline by a slow release approach. Environmental Science & Technology 36 (21): 4698–4704.
155 Pinilla-Redondo, R. et al. (2018). Monitoring plasmid-mediated horizontal gene transfer in microbiomes: recent advances and future perspectives. Plasmid 99: 56–67.
156 Vaughan, N.E. and Lenton, T.M. (2011). A review of climate geoengineering proposals. Climatic Change 109 (3–4): 745–790.
157 Corner, A. and Pidgeon, N. (2010). Geoengineering the climate: the social and ethical implications. Environment: Science and Policy for Sustainable Development 52 (1): 24–37.
158 Ramos, J.L. et al. (1994). The behavior of bacteria designed for biodegradation. Bio/Technology 12 (12): 1349–1356.


22 Metabolic Engineering for Large-Scale Environmental Bioremediation

159 Ramos, J.L. et al. (1995). Suicide microbes on the loose. Bio/Technology 13 (1): 35–37.
160 Keiper, F. and Atanassova, A. (2020). Regulation of synthetic biology: developments under the convention on biological diversity and its protocols. Frontiers in Bioengineering and Biotechnology 8: 310.
161 López-Igual, R. et al. (2019). Engineered toxin–intein antimicrobials can selectively target and kill antibiotic-resistant bacteria in mixed populations. Nature Biotechnology 37 (7): 755–760.
162 Bikard, D. et al. (2014). Exploiting CRISPR-Cas nucleases to produce sequence-specific antimicrobials. Nature Biotechnology 32 (11): 1146–1150.
163 Bikard, D. and Barrangou, R. (2017). Using CRISPR-Cas systems as antimicrobials. Current Opinion in Microbiology 37: 155–160.
164 Hammond, A. et al. (2016). A CRISPR-Cas9 gene drive system targeting female reproduction in the malaria mosquito vector Anopheles gambiae. Nature Biotechnology 34 (1): 78–83.
165 Valderrama, J.A. et al. (2019). A bacterial gene-drive system efficiently edits and inactivates a high copy number antibiotic resistance locus. Nature Communications 10 (1): 1–8.
166 French, K.E., Zhou, Z., and Terry, N. (2020). Horizontal “gene drives” harness indigenous bacteria for bioremediation. Scientific Reports 10: 15091.


Index a acetaldehyde production, in Streptococcus thermophilus 574–575 acetate 472 acetate-to-acetyl-CoA process 750 production 749 acetoin 196, 432, 472, 492, 495, 558, 575–578 acetolactate synthase (ALS) 196, 354, 697, 706 acetone–butanol–ethanol (ABE) metabolic route 611, 613 butanol production 698 acetyl-CoA 14–15, 738 cytoplasmic acetyl-CoA 713 cytosolic acetyl-CoA 746, 750, 753, 754 AcidifyME model 48 actinomycetes CRISPR-based genome editing techniques 663–665 generalized systems metabolic engineering workflow for 672 genome editing of 663 actinomycetes synthetic biology biological parts of 665, 667–669 biosensors 669–670 full pathway refactoring 671 riboswitches 670–671 activity-independent screening of target molecule synthesis 492–493 acyl carrier proteins (ACPs) 354–356, 370, 377, 699–702

adaptive laboratory evolution (ALE) 50, 285, 343, 384, 493, 499, 534, 555, 566, 567–570, 694 adenosine triphosphate (ATP) 31, 102, 140, 147–148, 180, 188, 191, 194, 196, 213, 224, 284, 354, 355, 362, 424, 427, 434, 477, 487, 495–496, 522, 524, 528–529, 532, 555, 558, 585, 631, 693, 701, 704–705, 714, 740, 742, 744–746, 748–751, 775–777, 808–809 α-ketoglutarate (αKG) production 363, 420, 427, 703, 736, 738, 740–744, 750 alanine 117, 374, 416, 574 alanine (ALA2-3) 117 alarmones guanosine tetra- and penta-phosphate 471 alcohol dehydrogenase (ADH) 283, 343, 354, 356, 359–360, 363, 432, 576–577, 583–585, 613, 692–693, 695, 697–698, 700–702, 704 algae biomass 829 algal biomass 415 alkaloids 367, 368, 370, 713, 775, 812, 817, 820, 824–825 production, in S. cerevisiae 710–712 alkane biosynthetic pathway 702 alleleome 33–34 Allelic Coupled Exchange (ACE) couples 618, 703

Metabolic Engineering: Concepts and Applications, First Edition. Edited by Sang Yup Lee, Jens Nielsen, and Gregory Stephanopoulos. © 2021 WILEY-VCH GmbH. Published 2021 by WILEY-VCH GmbH.


amino acids aminovalerate 421–423 L-arginine 364 auxotrophic markers 745 cell factory 403 ectoine 425–426 L-glutamate 419–420 4-HIL production 427 L-lysine 420–421 L-pipecolic acid 426 L-theanine 427 L-threonine 364 trans-4-hydroxyproline 426–427 L-valine 365 ω-amino acids 4ABA 357 5AVA 357–359 4-aminobutyric acid (4ABA) 357, 363 6-aminocaproic acid 357, 359, 363 aminovalerate 359, 421, 422, 430–431 5-aminovaleric acid (5AVA) 357, 359, 361, 362, 423 anaerobic reporter 631 anaplerosis 74, 101 anaplerotic glyoxylate pathway 472 anhydrotetracycline (aTc)-inducible systems 627 anoxic P. putida chassis 531 anthocyanin 588, 709, 818–820, 834–835 antimicrobial activity (AA), of bacteriocins 557, 582 anti-CRISPR proteins 620 antibiotic resistance genes 302, 479, 615, 868–869, 878 antibiotics 302, 313, 317, 365–366, 406, 496, 561, 581, 583, 653, 665–666, 673–676, 769, 780–781, 788–789, 865 Antirrhinum majus 819 antiSMASH 658–660 approximative rate expression 159–160 aquatic cyanobacteria 806 Arabidopsis thaliana 180, 356, 588, 632, 710, 806, 815

arabinose 408, 413–415, 420, 432, 523, 571–572, 495, 697 arabinoxylans 412, 415, 816 L-arginine 364, 403, 405, 419 aromatic amino acids 824 bacterial production of 200–202 biosynthesis, in S. cerevisiae 708 synthesis 824 aromatic compounds mandelic acid 366 methyl anthranilate 366 S-styrene oxide 366, 367 L-tyrosine 365, 366 artemisinic acid production 369, 713, 715 Aspart 720 Aspartate transcarbamylase (ATCase) 188 Aspergillus niger 765, 766, 776, 785, 789 enzyme set for plant polysaccharide degradation 773 genome-scale metabolic models 770 glucoamylase 777 hyperbranching strain 786 metabolic and regulatory functions 771 X-ray microcomputed tomography (μCT) 788 Aspergillus oryzae 766, 778 enzyme set for plant polysaccharide degradation 773 Aspergillus terreus 430, 767, 779–780 asRNA knockdown method 625 atmospheric pressure chemical ionization (APCI) 269 atom enumeration scheme 103–104 atom transition network 91, 92, 94–95, 104 atom transition network specification 103 atom transitions 91–92, 97, 99, 103–104 auto-scaling 275 azadirachtin 817


b Bacillus amyloliquefaciens strains 472, 493 Bacillus subtilis 469 B. subtilis ATCC6051a 482 B. subtilis BSK814 475 B. subtilis MGB469 475 B. subtilis MGB874 475 CRISPR/Cas and related strains 481–486 genome reduction projects 473–476 Bacillus licheniformis 470 B. licheniformis 2709 482 isocitrate lyase (aceB) 472 Bacillus thuringiensis cry3Bb gene 829 Bacillus BioBrick Box 490 Bacillus origin of replication (ori) 481 bacitracin 486, 494, 496–497 42.7 kb bacitracin synthase gene cluster bacABC 486 backslopping process 566 bacterial 3-dehydroshikimate dehydratase 816 bacterial cellulose 379–380 bacteriocins production, in LAB 581 bacteriophage infections 565 basic helix-loop-helix (bHLH) transcription factor 818–819 batch cultures 109, 111, 112, 356, 530 benzylisoquinoline alkaloids (BIAs) 370, 710–712 bioactive natural products 653, 804 biosynthetic pathways 655, 667, 668 tools and strategies for discovery 653–655 bioaugmentation 861–862 biochemical databases 238, 240, 244, 251 biochemical network 166, 240, 242–243, 249 biochemical search space 238, 240 Biochemical Systems Theory (BST) 173, 175 biocondensates 803, 833–835 bioenergetics 213–214

bioethanol production first-generation 691–694 second generation 694–697 bioluminescent reporters 631 biomass effluxes 102–103 bioproducer strains 238 bioproduction 243, 253–254, 285, 408, 415, 519, 530–531, 535, 780, 810–811, 817, 829–830, 873 bioremediation 875 challenges 859 chemical emissions and parameters 862, 867 definition 859 transition from 2.0 to 3.0 862 bioremediation 3.0 862–865, 867–869 bioremediation 4.0 879–880 bioremediation agent 521, 861, 862, 865, 870, 872, 878–879 P. putida 873 biosensors 13, 16, 383, 493, 525, 669, 670, 865 biostimulation 861–862 biosustainable industrial production platforms 806–808 biosynthetic gene clusters (BGCs) 655, 657 cloning and heterologous expression 660–663 genome mining 657–660 promoter engineering 667 reporter-guided mutant selection (RGMS) 670 biosynthetic pathways of natural products 655, 657 search for 251 block elasticities 186 blood proteins 721 B. methanolicus MGA3 482, 489 bondomers 121 bottromycin biosynthetic pathway 667 branched chain amino acids (BCAA) 359, 365, 496, 552 branched-chain alcohols 354 BsubCyc 470


butanediol (BDO)-isomers 577 1,4-butanediol 57, 249, 363 2,3-butanediol 472, 494, 706 butanol 343, 697 production of 343, 588, 611–613, 698–699 butyrolactam 357, 363

c cadaverine 359, 361–362, 423 camalexin 833 cannabidiolic acid synthase 837 Cannabis sativa 834, 837 cannibalism 374, 471, 478 capillary electrophoresis (CE) 267 capitate-stalked trichome 834, 837 caprolactam 363, 533 Carbohydrate-Active enzyme database (CAZy) 773 carbon atom transitions 91–92 carbon catabolite repressor 496 β-carotene production 758, 759, 820 carotenoids 433, 524, 713, 718, 758, 817, 820, 826, 828–830 Cas-enzymes 308, 310, 312, 316, 320 Cas12a (Cpf1) enzyme 312, 663 Cas9 nickase (Cas9n) systems 482, 620 Cas9-Assisted Targeting of CHromosome segments (CATCH) 662 Cas9-mediated mutagenesis 564 Caulobacter crescentus 379, 413, 473 CAZyme gene expression, molecular mechanism of 773 cell type specific metabolic engineering 815 cell-free in vitro transcription systems 865 cell-free agents 860 cellular compartments 103, 282, 718, 834 cellular constraints 137–139 cellulase Egl-237 477 CeluStar CL 783 central carbon intermediates 112 13CFLUX2 117 13C glucose 106 1

chalcone isomerase (CHI) 369, 417, 418, 708, 709 chalcone synthase (CHS) 369, 417, 418, 708, 709 chassis development 861 chassis for bioremediation 873 chassis metabolic model 246 chemical ionization (CI) 269 Chinese Hamster Ovary (CHO) 166, 188, 317–320 ChIPseq 61 Chlamydomonas reinhardtii 806, 830 chloramphenicol acetyltransferase (CAT) reporter 630 chlorophyll 822, 825–826, 828, 829 chloroplast stroma-localized ferredoxin (AtFedA) 825 chloroplast thylakoid 809, 826 chloroplasts 805, 806, 810, 821–830, 834–835, 837 chromosome-less bacteria (SimCells) 860 cinnamic acid 367, 707, 708, 819 circular gRNA-containing plasmids 311 cis,cis-muconate 428, 429 cis-cis-muconic acid 417 in P. putida 533, 534 citrate production 739, 745–746 Y. lipolytica 738 citrate synthase CitZ 496 citric acid production 765, 780, 788 in A. niger 776 13C-labeled substrate 77 Class II and Class I diterpenoid synthases 822 cloning, of BGCs 661 Clostridium 5′-UTRs & Riboswitches 634 genetic parts 626–627 promoters 627–630 reporters 630 enzymatic-based reporters 630–631 bioluminescent reporters 631 fluorescent reporters 631–632


FbFP-based fluorescent reporters 632–633 FAST, HaloTag, and SNAP-tag fluorescent reporters 633 genome editing 614–615 ClosTron system 615 counter-selection markers 617–619 CRISPR systems 619–626 transposon-based random mutagenesis 615–617 terminators 633–634 ClosTron systems 615 cloud microbiology 879 Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) 304, 308, 309, 619, 625 associated (Cas) genes 308 based metabolic engineering, in T. thermophilus 783 mediated gene manipulation systems, for filamentous actinomycetes 664 technology, for C. glutamicum 407 13C metabolic flux analysis 76, 98, 162, 404, 406, 413, 415, 420, 739–740, 748 2-C-methyl-D-erythritol-4-phosphate pathway 496 codBA system 479 CO2 fixation 15, 535, 783, 806, 811, 830 coarse grained thermodynamic information 31 coarse-grained integration of proteome constraints 139–144 cobalamin (vitamin B12) production 578, 579 coclaurine N-methyltransferase (CNMT) 370 Coleus forskohlii 823, 834 combinatorial biosynthesis 662, 667, 670 combinatorial metabolic engineering 9, 12–14

combinatorial supertransformation of transplastomic recipient lines (COSTREL) 828–829 computational pathway design 238, 239, 252, 253 computer-aided simulation 172 concentration control coefficient 179–180, 185, 188 confidence intervals (CoI) 89–90, 229 conjugative transposons 615 connectivity matrix 31 conserved metabolites 186 constrained optimization 26 COnstraint-Based Reconstruction and Analysis (COBRA) method 26–28, 30, 41, 51, 62, 240 continuous culture 77, 83, 110–113, 174 COPI vesicles 719–720 COPII cargo proteins 720 core phenylpropanoid biosynthesis 819 CoReCo metabolic model reconstruction algorithm 770 Corynebacterium glutamicum 191, 194, 403, 437, 473 aquatic sugars 415–416 amino acids 419–427 biopolymers 434–436 CRISPR technology 407 genome editing approaches 406 hemicellulosic biomass 408, 409, 412 industrial product production 419 natural products and active ingredients 433 industrial raw materials 407–408 lignin aromatics, valorization of 416–419 lignocellulosic sugars 408–415 L-lysine production 404 marker-free genome editing 407 metabolome analysis 405 organic acids and alcohols 427 cis-cis muconate 428–429 3-hydroxypropionate 430–432


Corynebacterium glutamicum (contd.) glutarate production 429–430 itaconate 430 short-chain alcohols 432 plasmid-based expression 406 systems biology 404 systems metabolic engineering 404 Cosmid/fosmid library vectors 662 coumarins 817 biosynthesis 706, 708, 817 p-coumaroyl-CoA 707, 816 counter-selection and markerless genome editing 478–481 counter-selection markers 616, 617–619 coupling constraints 144–145, 147–150 Crabtree effect 141, 142, 150 14C radioisotope 80 Cre/loxP system 315, 479, 561 CRISPR-BEST system 665, 666 CRISPR-Cas-technology 304, 313 -based genome editing 314 in B. subtilis and related strains 481–486 LAB genome editing 563–565 CRISPR-Cas9 and Cas12a-based genome editing methods 663 based approach 873 based genome editing 481, 489 CRISPR-dCas9 mediated base editing 488–489 mediated cytosine deaminase base editing system 489 mediated deaminase base editing 489 CRISPR interference (CRISPRi) 487–490, 497, 564–565, 625–626 CRISPR interference and activation (CRISPRi/a) 487, 489 CRISPR/PmanPA-Cas9 vectors 486 CRISPR RNAs (crRNAs) 308, 624 CRISPR/SpCas9-based genome editing system 490 cross collisional section (CCS) 267

cross-over theorem 179 cucurbitacin synthesis 4 cumomers 121 cutin 813 cyanobacteria 356, 805–806, 808, 811, 822, 825–826, 829–831, 873 cyanogenic glucosides 817–818, 820, 824, 832–833 cyanophycin 380–381 cyborgization of bacterial strains 863 cyclodepsipeptides (CDPs) 784–785 cytochrome P450 (CYP) enzymes 369, 701, 707–708, 710, 714, 812, 816, 826 cytochrome P450 oxidoreductase 2 816 cytoplasmic acetyl-CoA 713, 716 cytosolic acetyl-CoA 705, 717, 736, 746–747, 750, 756, 757 availability 698, 753–754

d data analysis, in metabolomics multivariate statistics 278 pathway analysis 278–279 univariate statistics 277–278 untargeted MS data processing 273–277 data-dependent acquisition (DDA) 272 data-independent acquisition (DIA) 272 Daucus carota 818 DcMYB6 818 Del/Ros1 transgenic plant 819 Δ9 stearoyl-CoA desaturase (D9) 746, 747 de novo triacylglycerols biosynthesis 745–746 ‘dead’ Cas9 (dCas9) 309 Debye–Hückel approximation 216, 221–223 deep alleleome 34 2-dehydro-3-deoxy-phosphogluconate aldolase reaction 91–92 dehydrogenase 92 alcohol 283, 343, 354, 356, 359–360, 363, 432, 576–577, 583–585,


613, 692–693, 695, 697–698, 700–702, 704 GAP 428, 432, 748–749 lactate 196, 362, 377, 423–424, 434–435, 554, 571–573, 577, 587–588 3-dehydroshikimate 428–429, 816 deoxyviolacein 433 desorption electrospray ionization (DESI) 269 dhurrin synthesis 835 diacetyl production 574 diamines 360–361 1,3-diaminopropane 360 cadaverine 361 putrescine 361 diaminopimelate 92, 359, 416, 421, 422, 431 1,3-diaminopropane 360–361 dicarboxylic acids 361–362 glutaric acid 361, 362 succinic acid 362 dihydrofolate reductase (DHFR) 317–318 L-3,4-dihydroxyphenylalanine (L-DOPA) 370 dimethylallyl diphosphate (DMAPP) 355, 367, 369, 713–716, 757–758, 822 diols 1,4-butanediol 363 production 706 1,3-propanediol 362–363 “direct” omics approach 100 diterpenoid forskolin 834–835 diterpenoids 821, 830 DNA double-strand break (DSB) 489 DNA double-strand breaks (DSB) 302–303, 305, 307–314, 317–318, 407, 489, 563–565, 619–620, 624, 626, 663–665 DNA microarray analysis 722 DNA microarrays 405 docosanol 702 L-DOPA decarboxylase (DODC) 370 double crossover 615–619

double-gene knockout (DKO) predictions 34 double-helix model 320 double-stranded breaks (DSBs) 302 Drosophila DNA-binding Ubx homeodomain 305 dSpyCas9 625 dual plasmid approaches 482 dwarf-type mutants 814 D-xylose 695, 697 ‘dynamic degradation of TAG’ (ddTAG) strategy 673 dynamic mass balances 23–25, 28 dynamic metabolons 831

e ecModels 140, 142, 144–145, 147, 150–151 ectoine production 425–426 EDEMP cycle 524–525 Ehrlich degradation pathway 354 eicosapentaenoic acid (EPA) 755–756, 760, 830 electron impact (EI) ionization 269 electrospray ionization (ESI) 268–269, 280–281 Elementary Metabolite Units (EMU) 121 Embden–Meyerhof–Parnas pathway 524 Embryophyta 806 emodepside 785 enantiopure L-lactic acid production 572 endogenous chloroplast-localized NADP-malic enzyme 810 endometabolomics 260 endonucleases CRISPR 308–310 TALEs 306–308 ZFN 304–306 endoplasmic reticulum (ER) 8, 701, 745–746, 750, 778, 818 endoplasmic reticulum-associated protein degradation (ERAD) 719 endoxylanases 436


energy-demanding sporulation process 471 energy-rich adenosine triphosphate (ATP) 808 Entner–Doudoroff pathway 92, 101, 282, 283, 524 environmental biocatalysts, development of 870 environmental Galenics 861, 877– Environmental Galenic Science 877 environmental microbiology 865 environmental microbiomes, fortification of 879 environmental pollution 859, 865, 867, 874 microbiological agents for bioremediation 860 enzymatic rate equations 25 enzyme abundance 155–156, 163 enzyme bottom-up approach 186 enzyme kinetics factors affecting intracellular enzyme kinetics 155–156 Michaelis–Menten formula 153–154 enzyme mechanisms 91, 174 enzyme prediction for orphan and novel reactions 244–246, 251, 257 enzyme properties 153, 155, 172, 180–186 equality constraints 107 ERG9 714–715, 717 erythrose-4-phosphate (E4P) 14, 201, 428, 709 Escherichia coli (E. coli) 36, 473, 556 bulk chemicals ω-amino acids 357–359 diamines 360–361 dicarboxylic acids 361–362 diols 362–363 hydroxy acids 359–360 lactams 363 definition 341 fuels production non-native biofuel producers 343 non-natural biofuels 343

renewable biofuels 342 microbial biopolymers non-protein poly(amino acid)s 380–381 PHAs 374–379 polysaccharides 379–380 nanomaterials (NMs) 381–383 natural products 367–371 recombinant protein production 371, 372 membrane proteins 372, 373 protein-based materials 374 therapeutic proteins 371, 373 specialty chemicals aromatic compounds 365–367 L-amino acids 364–365 Escherichia coli K-12 MG1655 23, 36 essential genes 32, 437, 470, 473–475, 478, 488, 497, 499, 565, 626, 824 ethanol production 229, 233, 284, 343, 575, 611, 691–695, 697–698, 703–706 eukaryotic microalgae 806 evolutionary engineering 341, 571, 697 exometabolomics 260 exopolysaccharide 379–380 expression cassette 302, 310, 318, 479, 482, 737, 741, 747, 755, 760, 769 extended Debye–Hückel limiting law 221, 222 external constraints 138–139 external metabolites 174 extra-cytoplasmic function sigma factors (ECF) 493 extracellular alkaline cellulase Egl-237 477 extracellular fluxes 6, 78–79, 83–85, 91, 93, 98, 106, 111–113 extracellular reactions 78, 103

f FAIMS 267 Faraday constant 223, 227 β-farnesene production 713, 716


fatty acid de novo biosynthesis 355, 375, 377 fatty acid alkyl esters (FAAEs) 355 fatty acid ethyl esters (FAEEs) 354–356 fatty acid methyl esters (FAMEs) 354–356 fatty acid pathway 355–356 feedback inhibition simulation 205–207 fermentative pathway 343, 354 ferulic acids 588, 816, 835 filamentous actinomycetes genome-scale metabolic models 674–675 multi-omics studies 671–674 filamentous fungi 313, 767, 784 A. niger 765, 776–778 A. oryzae 778–779 A. terreus 779–780 carbon catabolism 775 cell factories and products 766 CRISPR genome editing protocols for 768, 769 genetic and genome tool development 768–769 improved substrate utilization 773–775 industrially exploited 768 macromorphology of 785–788 metabolic and regulatory models 770–772 morphologies 767 natural metabolic capacities of 765 P. chrysogenum 780–781 protein production 788 secondary metabolites from 784 T. reesei 781–783 T. thermophilus 783 fine-tuned integration of proteome constraints coupling constraints 148, 150 pcModels 144, 145, 147 ribosome assembly reaction 148 TPI enzyme 146 first-generation bioethanol production 691–694

five-carbon containing prenyl isomers 822 FK506 biosynthesis 674 flavonoids 817, 819 biosynthesis 708, 818 in S. cerevisiae 709 flavonols 418–419, 708–710, 819 fluorescence-activating and absorption-shifting tag (FAST) protein 633 fluorescent reporters 631–633 5-fluoro-dUMP 479 5-fluorouracil (5-FU) 479 flux balance analysis (FBA) 6, 7, 162, 172, 215, 251 additional constraints 27, 29 analogy to deriving enzymatic rate equations 25 constrained optimization 26 dynamic mass balances 23–25 flux-concentration duality 28 genome-scale 25, 26 flux-concentration duality 28 flux connectivity theorem 184 flux constraints 107 flux control coefficient 175–180, 184, 186–191, 195–198, 201–202, 204–206, 283 flux quotient 96 flux summation theorem 178–179, 199 flux vector 25 flux–enzyme relationship 176–178 fluxomics 50, 75, 77–79, 167, 406, 864 FMN-based fluorescent proteins (FbFPs) 632 FokI-cleavage domain 305 folate production, in LAB 579 forskolin 822–823, 834–835, 837 forward simulation 84–85, 98, 121, 123 Fourier-transform ion cyclotron resonance (FT-ICR) 271 fractional labeling enrichment (FLE) 81, 82, 106, 116 free amino acids 112, 115, 116, 735, 755


free fatty acids (FFAs) 354, 355, 701, 735, 755 free/independent fluxes 84 fructose 6-phosphate 31, 140, 434, 481, 580 fruit-specific E8 promoter 819 fumaric acid 783 functional CRISPR/Cas9-sgRNA complex formation 490 functional kinetic model 157–158, 163, 165–166 fungal biotechnology 766, 775, 788–789 fusafungine 785

g γ-aminobutyrate (GABA) 587 GAP dehydrogenase (GapA) 428, 432, 748–749 gas chromatography (GC) 114, 266 GE-free synthetic biology 865 GEM with enzymatic constraints using kinetics and omics data (GECKO) 139–142, 144 gene expression knockdowns 625 gene expression, in lactic acid bacteria 557 gene/genetically modified organisms (GMOs) 555, 565–566, 574, 579, 590, 829, 867, 872, 873, 877 gene-protein-reaction (GPR) 32–33, 674 generalized mass action (GMA) 159–160 genome annotation 7, 32, 36, 157, 168, 470, 522 genome editing definition of 301 of industrially relevant eukaryotes CHO 317–320 filamentous fungi 313, 315–316 yeast 310–313 principles of 301–304 genome editing tools counter-selection and markerless genome editing 478–481

CRISPR/Cas in B. subtilis and related strains 481–486 genome mining 581, 671, 673, 675–676, 784 for biosynthetic gene clusters 657–660 genome mining tools 658, 660 limitations 658 genome reduction projects in B. subtilis 473–477, 481 genome-reduced strain MGB874 476–477 genome-reduced variants, of P. putida 528–529 genome-scale kinetic models 7, 156, 168 genome-scale metabolic models (GEMs) 213, 215, 674, 770 internal constraints membrane economics 139 molecular crowding 139 proteome constraints 139 genome scale models (GSM) 7, 23–63, 137–151, 156, 240 genome-scale models (GEMs) 7, 23 E. coli 36–42 developments 50–55 from metabolism to the proteome 42–49 perspectives 56–59 genome-wide CRISPRi 488 genome-wide transcriptome analysis 690 Geobacillus thermodenitrificans 490 geranylgeranyl diphosphate (GGDP) 758, 822–824, 828 geranylgeranyl pyrophosphate (GGPP) 496, 713, 714, 717–718, 758 geranylgeranyl pyrophosphate synthase (GGPPS) 496 gibberellins 817, 826 Gibbs free energy 214–218, 220–223, 225–228, 233, 247 of pseudoisomer 222 global environmental problems, by urban and industrial activities 866


global gene expression analyses 470 global scale 6, 404, 803, 821, 838, 880 global Transcriptional Machinery Engineering (gTME) method 12, 366, 373, 492 glpX gene 495 glucoamylase 776–778, 786 glucose 6-phosphate (G6P) generation 31–32, 165, 180, 224, 359, 413, 422, 431, 434–435, 495, 497, 524–525, 554, 750 glucose-6-phosphate dehydrogenase 495, 497, 554 glucose-6-phosphate isomerase (PGI) 32, 142, 162, 165, 554 β-glucosidase 436–437, 530, 571, 697, 774 glucosinolates 817, 820 L-glutamate 364, 383, 403, 405, 412, 419–420, 427, 435–436, 587 glutamate (GLU1-5) 117 glutamate dehydrogenase RocG 476 glutamate racemases 497 Glutamine Synthase (GS) 318 glutarate production 431, 741, 742 glutaric acid 357, 361, 362 glyceraldehyde 3-phosphate (GAP) dehydrogenase approach 749 dehydrogenase-encoding gene 496 glycerate 809 glycerol kinase (GlpK) 38, 495 glycerol metabolism, in LAB 585 glycerol transport facilitator (GlpF) 495 glycine betaine 829 glycolysis 14, 31–32, 101, 112, 140, 142, 145, 156, 161, 165, 188, 192, 205, 224, 229, 233, 497, 524–525, 553, 674, 704–705, 738, 740, 743, 745, 778 glycosylated proteins 373, 720 glyoxylate shunt 101, 360, 472, 703, 742, 750, 779 Golden Rice 820 Golgi apparatus 718–720, 821 Golgi-derived secretory vesicles 720

gram cell dry weight (gCDW) 8, 144, 149 graph-based search 243–244 green fluorescent protein (GFP) 318, 492–493, 631–632, 667 guide RNA (gRNA) 308–309, 620, 663 guilt-by-association approach 771

h Haemophilus influenzae 23 “hard” constraints 138–139 heat stress tolerance 567–569 height equivalent of a theoretical plate (HETP) 264 heme biosynthesis 721 hemicellulosic polysaccharides 813 hemoglobin 495, 721 hepatitis B surface antigen (HBsAg) 722 hepatitis B vaccines 722 high resolution mass spectrometry 270, 280 high-resolution fluxomics 406 high-throughput metabolomics 279–280 Hill coefficient 184 Hill equation 159, 206 homogalacturonan 813 homogentisic acid 826 homologous recombination (HR) mechanism 301, 303, 406, 559–560, 564, 616–620, 626, 662–663, 737, 769, 807, 826, 830, 831 horizontal gene transfer (HGT) 861 based, large-scale bioremediation 878 events 868 HR-mediated technique for genome editing 302, 304 human insulin 720–721 human papillomavirus (HPV) vaccines 722 Hunter’s syndrome 306 hyaluronan 379–380 hyaluronic acid (HA) production 435 in C. glutamicum 434, 435


hydrocortisone 717 hydrophilic interaction liquid chromatography (HILIC) 265, 276 hydroxy acids 3HP 360 4HB 360 4-hydroxybutyric acid (4HB) 359–360 4-hydroxycinnamic acid (p-HCA) 707–709 3-hydroxydecanoate (3HD) 375 3-hydroxydodecanoate (3HDD) 375, 377 3-hydroxyhexanoate (3HHx) 375 4-hydroxyisoleucine (4-HIL) production 403, 427 3-hydroxyoctanoate (3HO) 375 3-hydroxypropionate (3-HP) 375, 377, 430, 432 3-hydroxypropionic acid (3-HP) 359–360, 585, 703, 705 hygromycin B 315, 737, 745, 747

i iBsu1103 470 iChip 654 Illumina sequencing 760 iModulons 52–55, 57, 62 in-depth global transcriptome 470 inactivated whole cells 860 INCA 117 independent component analysis (ICA) 51 inducible promoters 620, 627 inequality constraints 107 input labeling composition 104 input labeling design 93 in silico cell 33 in silico experimental ILE design 108 INST-13C-MFA 83, 123 integrative analysis of omics data 166 intermediates, toxicity of 247 internal constraints 138–139 internal metabolites 174, 179, 184, 200 intracellular compartments 8

intracellular enzyme kinetics 155–156 intracellular fluxes 46, 75, 77–79, 82–84, 91, 93, 99–100, 103, 107, 112, 260 intracellular metabolite pool size 78, 95, 106, 123 inverse problem 85, 98, 121 in vitro enzyme assays 161 in vivo RBS-selector 667 ion chromatography (IC) 264, 267 ion mobility 267 ion-pairing RPLC (IP-RPLC) 265–266, 283 ionization techniques 268–269, 280 irregular xylem (IRX) mutants 814 IRX7 deficient plants 814 isobutanol production 698–699 isocitrate dehydrogenase Icd 496 isoflavonoids 708, 710, 817 isopentenyl diphosphate 354–355, 434, 757–758, 822 isopentenyl pyrophosphate (IPP) 14, 354–355, 367–369, 661, 713–717, 757–758 isoprenoid pathway 343, 355 isopropanol 263, 265, 343, 354, 613 isotope labeling experiment 100 atom transition network specification 103–104 input labeling composition 104–106 metabolic network specification 101–103 modeling and simulation 101 isotopic labeling 78–80, 96, 98, 115 isotopic steady state (ISS) 77, 90, 123 approximation 115–116 isotopic tracers 4, 6, 77, 79–80, 99 isotopic tracing 8–9 isotopically labelled tracer 8 isotopologues 97, 261, 274 isotopomer fractions 94, 96–97, 103 isotopomers 91–98, 106, 114, 116 itaconate 428, 430, 777, 779, 780 A. niger 777


k keto acid decarboxylase (KDC) 354, 697 2-keto-acid route 698 α-ketoglutarate production 741–742 ketones bioreduction 584–585 keyicin production 673 kinetic feasibility 246, 247 kinetic model applications 166–167 approximative rate expression 159–160 description 156 functional 157–158 mechanistic rate expressions 158–159 perspectives 167–168 rate expressions 160–166 scope 156–157 toy example 164–165 toy model 164–165 yeast 165–166 kinetome 51, 168 kirromycin biosynthesis 656, 657 knowledge bases 24, 35, 57, 62, 404, 838, 862 kojic acid A. oryzae 778, 779 k-OptForce 41

l L- and/or D-glutamic acid monomers 495 L-arabinose 283, 695, 697 labeled substrate 77, 80–81, 99, 104, 108, 110–113, 116 labeling enrichments 81–83, 85, 90–91, 100, 103, 106, 109, 113, 115–117, 121, 123 lactams 357, 363 lactate dehydrogenase (LDH) 196, 362, 377, 423–424, 434–435, 554, 571–573, 577, 587–588 lactic acid 195, 572, 573, 613, 705 lactic acid bacteria (LAB) 555 as biocatalysts 583–589 bacteriocins 581–583

biotransformations 587–588 bulk chemical production acetoin 575–577 butanediol-isomers 577–578 ethanol 575 butanol production 588 CRISPR-Cas-technology for genome editing 563–565 features and phylogeny of 552, 553 fermentation 552 food ingredients acetaldehyde 574–575 alanine 574 diacetyl 574 lactic acid 572, 573 gene expression 557–559 genetic engineering of 559–565 glycerol conversion 585–587 heat tolerance acid tolerance 569 oxygen tolerance 570–571 host for plant metabolite production 588 ketones bioreduction 584–585 metabolism of 553–555 multistress-resistance 567 plasmid integration by homologous recombination 559–560 plasmid integration using phage attachment sites 560–561 polyols production 579–580 preservative effect 551 recombineering based genetic engineering 561–563 roles and functions 551 shuttle vectors 556–557 starter cultures for food fermentation 565–567 stress tolerance heat tolerance 567–569 substrate utilization routes 571–572 therapeutic proteins production 580–581 thermotolerance 568 transformation of 556–557 vitamins production cobalamin (vitamin B12) 579


lactic acid bacteria (LAB) (contd.) folate (vitamin B9) 579 riboflavin (vitamin B2) 578 lactic acid production 569, 572, 587, 705 Lactobacillus casei 556, 561, 563–564, 568–569, 577, 580, 582, 584, 587 Lactobacillus reuteri 561–562, 579, 584, 586 Lactococcus lactis 473, 556–558, 560, 570 for bulk chemicals production 576 as drug delivery vehicle (DDV) 580 for over-producing flavor compounds 573 lanthipeptides 581–582 ligD positive strains 487 lignans 817 lignocellulose 530, 694, 782 lignocellulosic biofuel production 694 limiting rate 172, 181, 183 lin-log modeling approach 160 linear ion traps (LITs) 270 linear plus linear homologous recombination (LLHR) 662 lipid synthesis 747, 833 lipid-rich yeast 753 lipolase 313 lipopeptide surfactin 497 Lisianthius nigrescens 834–835 liquid chromatography (LC) 114, 264–266, 669 liquid–liquid extraction (LLE) 263 lithium acetate method 737 lovastatin 653, 779–780 low resolution mass spectrometry 269–270 Lpd1 740 lycopene, in S. avermitilis 667 L-lysine 110, 357, 359, 361, 364, 403–405, 413, 415–416, 419–423, 425–426 lysophosphatidate acyltransferase (LPAAT) 204–205

m machine learning approaches 241, 242, 249, 660 maize 806, 821 malate synthase 472, 697, 810 malate–pyruvate–oxaloacetate cycle approach 749 malic acid production 703–704 A. oryzae 779 maltose consumption 477 mandelic acid 366, 378–379 Manihot esculenta 832 mannitol 420–421, 579–580 utilization 415–416 mannose 414 mannose phosphoenolpyruvate-dependent phosphotransferase system 481 mannose-specific transporter ManP 481 manPA alleles 481 manual pathway design 238 Mariner-transposable Himar1-based systems 617 markerless-deletion system 619 Markov chain Monte Carlo (MCMC) approach 89 sampling 26 mass isotopomers 97–98, 114, 116–117 mass spectrometry (MS) 268 acquisition modes for targeted metabolomics 271–272 for targeted MS 272 for untargeted metabolomics 272 features 268 high resolution 270–271 ionization techniques 268–269 low resolution 269–270 Mathieu equations 270 matrix effect 263, 268–269, 279 matrix-assisted laser desorption ionization (MALDI) 269–270, 280 max-min driving force (MDF) of pathway 282

Index

Maximum Likelihood Estimator (MLE) 163 maximum reaction velocity 25 mCherry-reporter cassette 319 ME models 42–49, 57 mechanistic rate expressions 158–160 medium chain length PHA (MCL-PHA) 375, 378 meganucleases 302–304, 320 membrane-bound cytochrome P450 (P450) enzymes 812 metabolic control analysis (MCA) 6, 153, 171, 173, 283 aromatic amino acids, bacterial production of 200–202 based methods 166 concentration control coefficient 179–180 demand for product 194–195 flux control coefficient 175–176 flux summation theorem 178–179 flux–enzyme relationship 176–178 inhibition of competing pathways 195–196 limitation of 171, 188 linking control coefficients to enzyme properties 181 alterations of enzyme activity 188–190 block elasticities 186 control coefficients and elasticities 184–186 elasticity coefficient 181–184 enzyme rate equations 181–184 feedback inhibition 186–188 feedback inhibition, abolishing 191–194 top–down analysis 186 metabolic steady state 174–175 Universal Method 199–200 yeast tryptophan synthesis 197–198 yield impacts 203–205 metabolic engineering bioenergetics in life 213–214 of Bacillus activity-independent screening of target molecule synthesis 492–493 biotechnological application 493–497 generally recognized as safe 470 minimal cell concept 472–478 optimization, standardization, and modularity in gene expression 490–492 physiological traits and circuits 470–472 qualified presumption of safety 470 tools for genome editing 478–490 cellular metabolism and physiology ‘omics’ technologies 7–9 FBA 6, 7 isotopic tracing 8, 9 MCA 6 MFA 6, 7 combinatorial metabolic engineering 12–14 Escherichia coli 341–384 goal of 171 history and overview of 3 host organism selection 15 industrial biotechnology 17 rational metabolic engineering 10–11 strain development process 17, 18 substrates 15–16 synthetic biology 16 systems metabolic engineering 14 Metabolic Engineering: Principles and Methodologies 3 metabolic flux analysis (MFA) 6, 7 ¹³C-MFA 97–99, 124–125 ¹³C-MFA variants 76–77 bidirectional reaction steps 95–96 biotechnologically relevant organisms 108 carbon atom transitions 91–93 extracellular fluxes 111–112 flux constraints 107 fluxomics 77–79 from the data to the intracellular fluxes 82–83

metabolic flux analysis (MFA) (contd.) in silico experimental ILE design 108 input labeling design 93–94 INST-¹³C-MFA 83–84, 123–124 introduction 73–77 isotope labeling experiment 100–108 isotopic labeling 79–81 isotopic steady state approximation 115–116 isotopomer fractions 96–97 isotopomers 94–95 labeled substrate 112–113 measurement specification 106–107 metabolic and isotopic stationarity 110–111 metabolic and isotopic stationary state 90–91 metabolomics 113–115 natural isotope abundance 116–117 parameter fitting 84–86 simulation of labeling data and flux estimation 117–123 statistical analysis 86, 89–90 statistical evaluation and optimal experimental design 99–100 metabolic pathway design 237–238 metabolic pseudo-steady state approaches 76, 109 metabolic steady state (MSS) 77, 173–175, 182, 188, 281 assumption 77 metabolism 153 of LAB 553–555 metabolism 523 sugar 554 metabolite extraction 8, 262 metabolomics 8, 113, 259 analytical techniques sample preparation 262–264 separation techniques 264–267 applications 281–285 calibration curves 261 challenges 259 data analysis 272–277 and interpretation 277–279 engineer medium composition 285 experimental design 260 improving stress tolerance 284–285 internal standards 261 pathway design by thermodynamic analysis 281–283 reduction of side products and metabolite damage 284 sequences and standards 261–262 targeted and untargeted 260–261 metabolons 825, 831–833 MetaCyc 101 metallo-proteome 43–45, 55 meta-organismal catalysts 875 methotrexate (MTX) 317 methyl anthranilate 366, 404 2-methyl-D-erythritol 4-phosphate (MEP) pathway 822 methylerythritol-4-phosphate (MEP) 355 mevalonate (MVA) pathway 354, 355, 822 in S. cerevisiae 714 mevalonate diphosphate (MVAPP) 758 mevalonate pathway 588–589, 757–759 Michaelis constant 159, 181 Michaelis–Menten constant 25 Michaelis–Menten formula 153–154 enzyme kinetics 154 formulation 159 Michaelis–Menten kinetics 7, 154, 164 Michaelis–Menten reaction mechanism 25 microalgae 803, 806, 811, 829–831 microbial activities, multi-scale propagation of 869 microbial biocatalysts 4 microbial biopolymers non-protein poly(amino acid)s 380 PHAs 374–379 polysaccharides 379–380 microbial consortia 874, 879 microbiological agents for bioremediation 860

microfluidics 13, 280–281, 483, 654, 784 microhomology-mediated end-joining (MMEJ) 301, 302, 319 minGenome 41 MiniBacillus 494 MiniBacillus framework 475 minimal cell concept genome reduction projects in B. subtilis 473–476 minimal genomes 472–473 productivity of genome-reduced strains 477–478 minimal genomes 61, 472–473, 499, 789 missing values 85, 274 mixed-integer linear program (MILP) 38, 215, 227–228 mixed-integer nonlinear program (MINLP) 41 monoclonal antibodies (mAbs) 371, 373 monoterpene indole alkaloids (MIAs) 370, 710–712 monoterpenes, in S. cerevisiae 716 Monte Carlo based approach 163 MPD pipeline 787–788 M. thermoacetica gas fermentation 751 multi-strain reconstructions 33 multiple reaction monitoring (MRM) method 271 multiplex automated genome editing/engineering (MAGE) 342, 526, 863 multiplex genome editing 486–487 multivariate statistics 278 Mummichog algorithm 279 mutagenic ssDNA recombineering 870 mutS mutations 569 Mycoplasma mycoides JCVI-syn3.0 strain 473 mycosporine-like amino acid (MAA) 423

n NADH oxidase (NOX) of Lactococcus lactis 196 NADPH-P450 oxidoreductase SbPOR2b 825 NADPH-P450 oxidoreductases (PORs) 817 Nannochloropsis oceanica 806, 830 nanomaterials (NMs) 381–383 nanopore sequencing technology 655 nanopore sequencing, in Y. lipolytica genome 760 nanostructure-initiator mass spectrometry (NIMS) 280 natural abundance (NA) 80 correction 116–117 NAtural Deep Eutectic Solvent (NADES) 834–835, 837 natural isotope abundance 116–117 natural isotope enrichments 105 natural products alkaloids 368, 370 phenylpropanoids 368–370 polyketides 368, 370–371 terpenoids 367–369 naturally labeled ¹²C glucose 80 N-containing bioactive natural products 820–821 network data representation 242 network reconstruction availability of GEMs 35 basic principles 30–32 computational queries 32–33 curation 32 genomic basis 32 knowledge bases 35 scope expansion 33–35 Nicotiana benthamiana 807 nicotinamide adenine dinucleotide phosphate (NADPH) 714, 808, 809 Nisin-controlled gene expression (NICE) system, in LAB 557, 558, 565 nisin Z 582 nisK 557, 558 nisR 557, 558

non-homologous end joining (NHEJ) mechanism 301, 302, 313, 314, 318, 489, 619, 626, 663, 737, 741, 745, 747, 760 non-native biofuel producers fatty acids pathway 355–356 fermentative pathway 343–354 isoprenoid pathway 355 keto acid pathway 354–355 non-natural biofuels 343 non-oxidative pentose phosphate pathway (PPP) 96, 101, 284, 285, 495, 553, 554, 748, 750, 751, 753, 754 non-protein poly(amino acid)s 380–381 nonribosomal peptide synthetases (NRPSs) 371, 496, 534, 657, 667 R,S-norlaudanosoline 370 normal-phase liquid chromatography (NPLC) 265 novel reactions 244–245, 251, 257, 583 nuclear localization signal (NLS) 306, 308

o Obiwarp method 274 off-target effects 305, 307, 309, 316, 319 oleochemicals 699 oligo-annealed promoter shuffling (OAPS) 491 oligosaccharides 412, 414–415, 773 omega-3 fatty acid 818 4′-O-methyltransferase (4′OMT) 368, 370 ‘omics’ technologies 7–9 OPENFLUX 117 OptForce 38, 41 optimal experimental design (OED) 89, 99–100, 101, 108, 109, 123 OptKnock 38 OptStrain 38 orbital trap 270 Orbitrap instruments 270, 271

organic acids production S. cerevisiae 702 orotate transporter (oroP) 559, 560 Oryza sativa 806 oxidative pentose phosphate pathway 553, 748 oxidative TCA cycle 703, 704 OxidizeME 43, 46, 48 2-oxoglutarate 419, 476, 711 oxygen tolerance 570–571

p paclitaxel 367, 717, 818 pangenome 41, 61, 522 paralogous proteins 473 parameter covariance matrix 89 parameter fitting 84–86, 96, 98, 107, 116, 121, 123, 163 parametric Monte Carlo bootstrap 89 pathway design 238 available tools for 247–249 practical example of 249 successful applications of 249 pathway feasibility 246–247 pathway search 237, 242–243, 244, 249 algorithm 244 PCR-generated sgRNA expression 487 P450-dependent biosynthetic pathway 824, 825 peak grouping 274 peak picking 273–274 pectin matrix 813 penicillin 202–203, 420, 653, 769, 780, 781 penicillin-producing fungi 313 Penicillium chrysogenum 767, 780 enzyme set for plant polysaccharide degradation 773 GEMs for 770 secondary metabolite production 774 X-ray microcomputed tomography (μCT) 787–788 pentose fermentation rate 695 pentose phosphate pathway (PPP) 96, 101, 156, 229, 284, 285, 495, 524, 553, 554, 674, 695, 748, 750, 753, 754 pentose sugars 413, 553, 695, 697 γ-PGA biosynthesis 495 pG+ host system 560 phage-assisted continuous evolution (PACE) 13 phenylalanine ammonia lyase (PAL) 359, 367, 707, 708, 819, 820 3-phenyllactic acid (PLA) 375, 378, 587, 588 phenylpropanoid biosynthesis 819 phenylpropanoid pathway 707, 819, 833 phenylpropanoids 367, 368–370, 706–710, 813, 817, 819 phosphite oxidoreductase (PtxD) 829 phosphoenolpyruvate carboxykinase gene 416, 422, 431, 496, 744 phosphofructokinase (PFK1) 31, 180, 188, 191, 414, 432, 497, 498, 554, 674, 692, 776 6-phosphofructokinase 497, 498, 674 butanol production 698 phosphoglucose isomerase (PGI) 31, 32, 33, 142, 165, 692 3-phosphoglycerate 165, 692, 809 phosphoglycerate mutase (PGM) 165, 692 phosphotransferase systems (PTS) 201, 414, 415, 428, 522, 523, 554, 555, 580 photorespiration 809, 810 photosynthetic cells biosustainable industrial production platforms 806–808 plants 803–805 solar radiation 805–806 photosynthetic organisms cyanobacteria 808 metabolic engineering strategies 810 photorespiration 809 ribulose-1,5-bisphosphate 809 Rubisco 808 thylakoids 808 phylloquinone 826

physiological effects 8, 499 L-pipecolic acid 403, 426 plant bioactive natural products phenylpropanoids 817 synthesis and regulation of 817–818 terpenoids 817 upregulated anthocyanin synthesis 818–819 plant cell wall 530, 803, 813, 814, 816 plant metabolic engineering 803, 804, 820, 838 plant metabolites 588–589, 807, 822 plant-derived taxadiene synthase 495 plants 803–808 plasmid copy number effect 477 plasmid-based CRISPR/Cas9 systems 482 plastoquinols 826 plastoquinone 809, 826, 829 pollutant-responding whole-cell biosensors 865 poly(diaminobutyric acid) 380 poly(diaminopropionic acid) 380 poly(glutamic acid) 380, 381 poly(lactate-co-glycolate) (PLGA) 377, 378, 379 poly(lysine) 380 poly(PhLA-co-3HB) production 379 poly-γ-glutamic acid (γ-PGA) 494, 495 polyglutamate (PGA) 435–436, 494, 495 polyglutamate synthesis 495 polyhydroxyalkanoates (PHAs) energy and redox storage material 374 MCL-PHA 375, 378 PHA (SCL-PHA) 375 PhaCs 378 PLGA 379 poly(3HB) 375, 377 poly(PhLA-co-3HB) production 379 polyketides 14, 367, 368, 370–371, 534, 581, 713, 754, 775 polymerase chain reaction (PCR) 12, 311, 478, 481, 482, 487, 527, 558, 662, 663, 736

polysaccharides 374, 379–380, 408, 530, 773, 775, 782, 813, 814, 815, 816 pOri plasmid 560 practical non-identifiability 100 pravastatin 781 precursor compound 237, 242, 243, 253 P43 reference promoter 491 prenyltransferases 822 principal component analysis 55, 278 pristinamycin 663, 664 pro-vitamin A 820 productivity of genome-reduced strains 477–478 profile likelihoods 89 proinsulin 720, 721 promoters 627–630 promoters, in actinomycetes 667 1,2-propanediol 428, 432, 706 1,3-propanediol (1,3-PDO) 360, 362, 363, 375, 377, 428, 430, 432, 519, 585, 586 protein precipitation 262, 263 protein sequence 47, 244, 246 protein structures 35, 47, 50, 55, 57, 632, 782 protein-based biopharmaceuticals 371 proteinogenic amino acids 112, 114, 115, 116, 403, 406, 419, 655, 657 proteome constraints coarse-grained integration of 139–144 fine-tuned integration of 144–150 proteome-constrained models (pcModel) 144, 145, 147, 150, 168 protospacer adjacent motif (PAM) sequence 308, 309, 407, 620 Pseudomonas putida 473, 528 EDEMP cycle 524 in silico analysis 522 in silico metabolic potential 522 aromatic molecules, production of 532–534 biosynthesis of natural products 534 carbon substrate range, expansion of 529–530 characteristics of 521–522 CRISPR/Cas-based gene editing techniques 526 features 529 genome-reduced variants 528–529 glucose uptake 523 glycerol metabolism 523 high-efficiency multiple genomic site engineering (HEMSE) of 526 nutritional landscape 530 organic acids, production of 533 oxygen-dependent lifestyle engineering 530–532 solvent-tolerant strains 521 substrate utilization and core metabolism 522–524 synthetic biology tools for 524–528 taxonomic credentials 521 xylose catabolism, pathways for 523 Pseudomonas putida KT2440 357, 519 ptb 619, 627 putative BCAA permease 496 putrescine 359, 360, 361, 412, 423 pyrazines 433 pyruvate 741 -acetaldehyde-acetate-AcCoA route 756 and α-ketoglutarate production 740 pyruvate decarboxylase-negative strain (PDC) 704, 705

q quadrupoles 269–271 quantum mechanical/molecular dynamics (QM/MD) 216 quasi-steady state assumption (QSSA) 25, 174 quenching 110, 114, 262, 281, 811, 830 quorum sensing systems 876

r random mutagenesis 12, 304, 378, 419, 491, 532, 566, 567, 615–617, 670, 781

rate expressions 153, 156–166, 163, 164, 167, 168 rate-limiting steps 173, 365, 721 rational metabolic engineering 9, 10–11, 613 reaction directionality 214, 242 reaction prediction 237, 240–241, 242 reaction reversibility 214 Recombinant DNA Biotechnology III: The Integration of Biological and Engineering Science 3 recombinant DNA technologies 3, 862 recombinant proteins 371, 372, 404, 436, 718–723, 828, 830 recombinant protein production membrane proteins 371, 372, 373 protein-based materials 374 therapeutic proteins 371–373 recombinant RNA 437 Recombinase-Mediated Cassette Exchange (RMCE) 319, 320 recombineering-based genetic engineering, in LAB 561–562 reduced-genome variants, of P. taiwanensis VLB120 529 RelA/SpoT homologues 471 renewable biofuels 342 Repeat Variable Diresidues (RVDs) 306, 307 reporter-guided mutant selection (RGMS) 670 repression under secretion stress (RESS) 778, 782 rep60-specific sgRNA 487 resveratrol 369, 418, 818 retention time 264, 266, 272, 274, 276 (S)-reticuline-derived morphine-type alkaloids 824 retrobiosynthesis 241–242, 243, 251 retrobiosynthetic approach 115, 243 reverse catabolite repression strategy 523 reverse-phase liquid chromatography (RPLC) 265, 266, 268, 276, 283

reversible Michaelis–Menten equation 181, 182 rhamnogalacturonan 813 Ribo-seq 61 ribosomal binding sites (RBSs) 433, 667, 668, 871 ribosomal proteins 147, 473, 476 riboswitches 493, 634, 670, 865 ribozymes 312, 316 ribulose-1,5-bisphosphate 808, 809 carboxylase-oxygenase (Rubisco) 808 rice 571, 778, 806, 809, 816, 818, 820 RNA sequencing 48, 51, 53, 61, 405 roseoflavin 578 RPLC 265, 266, 268, 276, 283 rRNAs 473 16S rRNA-based unrooted phylogenetic tree, of LAB 552–553

s SABIO-RK 161–162 Saccharomyces cerevisiae alkaloid compounds production 710 aromatic amino acids 708 commodity chemicals production diols 706 organic acids 702–705 cytosolic fatty acid synthesis 699 fatty acid synthesis 699–700 first-generation bioethanol production 691–694 flavonoids biosynthesis 710 higher alcohol production 695–697 history 689 insulin analogue expression 721 mevalonate and sterol pathway 713–714 monoterpenes, triterpenes, and isoprenoids production 716–718 phenylpropanoids production 706–710 protein secretory pathway 718–719 recombinant proteins production 718–723

Saccharomyces cerevisiae (contd.) second generation bioethanol 694–697 timeline 690 virus-like particles in 721–723 S-adenosylmethionine (SAM) synthesis 356, 366, 493 Salinispora genomes 672 SAM-dependent methyltransferase (AAMT1) 366 sample batches 275 sample-specific variations, correction of 275 S. cerevisiae GEM 140, 157 scrambling reactions 102, 104, 107 second generation bioethanol 694–697 second law of thermodynamics 213–215, 227 Secondary Metabolite Bioinformatics Portal 658 secondary metabolites 653, 655, 665, 673 biosynthesis 660–671 BGC cloning for heterologous expression 661–663 genome mining, software for 658, 659 host strain selection, for heterologous expression 660–661 genome mining, software for 658 -related bioinformatics tools 658 sedoheptulose-1,7-bisphosphatase 830 selected reaction monitoring (SRM) 271 sequencing-based approaches 655 sesquiterpenoid artemisinic acid 829 sgRNA-guided Cas9 488 ‘shallow’ alleleome 34 shikimate p-coumaroyl transferase 816 shinorine production, in C. glutamicum 423, 424 short chain length PHA (SCL-PHA) 375 short-chain alcohols 432, 534, 697

SigB-controlled general stress response 471 SigB-controlled regulon 471 sigma factor SigB 471 signal-recognition particle (SRP) 718 simultaneous saccharification and fermentation (SSF) 694 single cell metabolomics 280–281 single-gene knockout (SKO) 33, 34 single-strain multi-step process vs. consortium-based distributed catalysis 874–875 site-directed mutagenesis 357, 582, 711 SNPeffect 41 Solanum lycopersicum 819 solar radiation 805–806, 825 solid-phase extraction (SPE) 263, 264 Sophora japonica 834 sorbitol 579, 580, 694 space charging 270 SpCas9-mediated dsDNA cleavage 490 specialized metabolites 653, 804, 805 spectinabilin gene cluster 671 SporeWeb 470 SpyCas9 624, 625 squalene 714, 715, 717 S-reticuline by an epimerase (STORR) 368, 370 Standard European Vector Architecture (SEVA) 525, 527 statistical analysis 86–90, 122, 161, 163–166 sterol pathway, in S. cerevisiae 714 stilbenes 418, 817 stoichiometric coefficient 25, 27, 31, 138, 139, 140, 143, 215, 754 stoichiometric feasibility 246 stoichiometric matrix-based search 243 stoichiometric pathway evaluation 249, 251 strain development methods 566 strain development process 17, 342 strand invasion 302

Streptococcus thermophilus 308, 555, 565 Streptomyces antibiotics producers 673 biosensor 669 metabolic engineering 654 species 489, 661 strains 473 Streptomyces coelicolor 659, 660, 663, 666, 670, 673, 675 genome-scale metabolic models 674–675 Streptomyces tsukubaensis 673 genome-scale metabolic models 674–675 streptomycin 527, 665 streptothricin 665 stressME model 43, 46 structural non-identifiability 100 structural systems biology 35, 55, 56 S-styrene oxide 366, 367 suberin 813 substrate mixture 94, 104, 105 substrate utilization routes, LAB 571 subtilisin-like alkaline protease M 477 SubtiWiki 470 succinate dehydrogenase (Sdh) complex 360, 704, 743, 744 succinate production 343, 412, 743, 744 succinic acid 361, 362, 690, 703, 704, 743 succinylase 92, 359 sugar metabolism, of lactic acid bacteria 554 supercritical fluid chromatography (SFC) 264, 266, 267, 268 supervised methods, in untargeted metabolomics 278 surfactin 494, 497 surfactin biosynthesis 497 switchgrass 816 synthetic biology 862 metabolic engineering 16 synthetic biology agents (SBAs) 872, 873, 874, 877

synthetic biology-based technologies for large-scale bioremediation 868–869 synthetic morphologies 875 synthetic promoter libraries (SPL) 557–558 synthetically lethal gene pairs 32 systems biology 11, 14, 28, 35, 36, 41, 55, 56, 63, 73, 181, 341, 404, 438, 519, 520, 614, 627, 654, 672 systems metabolic engineering 5, 9, 14, 73, 341, 342, 383, 384, 404–407, 420, 424, 429, 435, 671–675, 776

t target compound 237, 238, 240, 241, 242, 243, 246, 249, 250, 251, 660, 663, 674, 817, 818, 831, 861, 864, 865 targeted metabolomics 260–261, 262, 268, 272, 273, 274, 275, 276, 277, 278, 284, 285 target gene egl-237 477 TATA-box 312 taxa-4,11-diene 496 taxadiene 368, 369, 495, 496, 717, 830 Taylor cone 268 TCA cycle 229 in cancer cells 4 temperature-sensitive derivative pE194ts 481 temperature-sensitive pSG5 ori 481 temporal control 11 temporal drifts 275 terminators 436, 612, 633, 634, 667 terminators, in actinomycetes 667 terpenes 367, 712–714, 775, 812 terpenoid production 713, 718, 812 terpenoids 367–369, 433, 534, 712, 713, 714, 716, 717, 718, 817, 821, 822, 824, 825, 830, 837 ‘Terraforming Earth’ approach 868 terrestrial charophycean green algae 805

tert-butyldimethylsilyl (TBDMS) 117, 118 tetLM riboswitch 486 tetracycline 486, 581, 627, 653, 673 tetrahydrocannabinolic acid synthase 837 L-theanine production 427 theophylline-responsive riboswitch 634, 670 therapeutic proteins 371–373, 552, 580–583 thermodynamic feasibility 214, 232, 237, 246–247, 251, 253, 282 thermodynamic flux analysis (TFA) 215, 227, 228, 229, 232, 247, 251, 252, 253 thermodynamic pathway evaluation 249, 251–257 thermodynamics-based flux analysis (TFA) workflow 215 characterizing feasible concentration space 229–233 compartment-specific ionic strength and pH 220–221 constraining flux space with metabolomics data 228–229, 232 estimation of standard free energies of formation 216–221 free energy of formation for isomer distributions 221–223 mathematical formulation 227–228 model curation 215–216 transformed free energies of reaction 223–227 thermostable cellulases 782 thermostable ThermoCas9 490 Thermothelomyces thermophilus 766–767, 774, 783 L-threonine 359, 364, 365, 431 thylakoids 808, 821, 831, 837 thymidylate synthase 479, 581 time-course simulations of dynamic processes 166 time-of-flight (TOF) MS 114, 270, 271, 272, 273, 274, 405

time-resolved metabolomic measurements 281 tocochromanols 826, 828 tocopherols 818, 826, 828 tocotrienols 826, 828 top–down control analysis 186, 204 toxic pyrimidine analogue 5-fluorouracil (5-FU) 479, 618 toxin-antitoxin systems 479, 619 toy model 142, 145, 148, 164 TP901-1 integration system 561 trans-activating RNA (tracrRNA) 308, 309, 482 transcription factor (TF)-based biosensors 493 Transcription Activator-Like Effectors (TALEs) 306 Transcription Activator-Like Effector Nucleases (TALENs) 304, 306–308, 310, 315, 318 transcriptional regulatory networks (TRNs) 51 transcriptomics 8, 259, 405, 413, 470, 589, 672, 673, 675, 690, 769, 771, 778, 837 transformation-associated recombination (TAR) 662 trans-4-hydroxyproline (4-HYP) production 403, 426–427 triacetic acid lactone (TAL) production 756–757 triacylglycerols (TAGs) 673, 735, 736, 745, 747, 748, 751, 753, 754, 755, 756 triacylglycerol production Y. lipolytica cytosolic acetyl-CoA availability 753–754 desaturation of fatty acyl chains 747–748 lower raw substrate cost 749–753 pathway yield through balancing redox cofactors 748–749 push-and-pull strategy 747 tricarboxylic acid (TCA) cycle 156, 472, 522, 703

tricarboxylic citric acid cycle (TCA) 4, 101, 110, 124, 156, 229, 233, 362, 403, 406, 407, 413, 419, 421, 426, 427, 472, 674, 703, 704, 705, 739, 742, 743, 750, 751 Trichoderma reesei 767, 771, 781 enzyme set for plant polysaccharide degradation 773 growth on cellulose 774 overexpression of hybrid transcription factor in 774 2,4,5-trimethyl-1,3-dioxolane (TMDX) 284 triple-quadrupole (QQQ) instruments 271 Triticum aestivum 806 tRNAs 147, 473, 712 type I-B CRISPR systems 624 type II CRISPR system 620, 624 tyrosinase (TYR) 370, 382, 383, 778 L-tyrosine 365, 366, 369, 370, 379, 419, 707–710 tyrosine-derived cyanogenic glucoside dhurrin (D-glucopyranosyloxy-(S)-p-hydroxymandelonitrile) 824–825

u ubiquinone 829 UDP-Glucose 380, 435, 498, 588, 824, 825 UDP-glucosyltransferase mediated de-toxification 824 UDP-N-acetyl-glucosamine 497 UDPG-glucosyltransferase SbUGT85B1 825 unfolded protein response (UPR) 719 univariate intervals 89 Universal Method 199, 200, 564 unlabeled isotopomer 94 unsteady FBA (uFBA) 27 untargeted metabolomics 260–261, 268, 272–278, 284, 285 untargeted MS data processing annotation 276–277 missing values 274

normalization approach 275 peak alignment and retention time correction 274 peak grouping 274 peak picking 273–274 preprocessing step 273 upper glycolysis in S. cerevisiae 139–140, 142 upregulated anthocyanin synthesis 818–819 uracil-phosphoribosyltransferase (UPRTase) gene upp 479, 560 uridine diphosphate-glucuronic acid 497, 498

v valerolactam 357, 363, 534 L-valine 364, 365, 405, 419, 431 vanilla orchid 824, 834, 835 Vanilla planifolia 824, 834 vanillin 417, 588, 835, 837 vegetative SigA-type promoter 477 very-long-chain fatty acid (VLCFA)-derived chemicals 702 violacein 404, 423, 433 Viridiplantae 806 virus like particles (VLPs), in S. cerevisiae 721 vitamin E 818, 825, 826, 828

w waste-free circular economy 867 weighted sum of squared residuals (SSR) 86, 89, 121, 527 wheat 571, 806, 809 whole-cell biocatalysis 519, 520, 524, 589

x XlnR transcriptional regulator 778 xylan production 814–816 xylo-oligosaccharides (XOSs) 412, 414, 415 xylose 408 inducible PxylA 486 inducible promoter PxylA 488, 627

y Yarrowia lipolytica 15, 285 applications 736 β-carotene production 757–759 cytosolic acetyl-CoA 753–754 eicosapentaenoic acid production 755–756 genetic engineering tools for 737–738 genotypic features 735 homologous recombination mechanism 737 lactose as carbon source 752 NHEJ mechanism 737 opportunities and challenges 759–760 phenotypic features 735 protease secretion 735 short-chain organic acid production 738 citrate 738–740 pyruvate and α-ketoglutarate 740–741 succinate 743–745 triacetic acid lactone (TAL) production 756–757 triacylglycerol production 746, 747–748, 750 de novo biosynthesis 745–746 cytosolic acetyl-CoA availability 753–754 desaturation of fatty acyl chains 747–748 lower raw substrate cost 749–753 pathway yield through balancing redox cofactors 748–749 push-and-pull strategy 747 yeast 310–313 S. cerevisiae 689 Y. lipolytica 735 yeast tryptophan synthesis 197–198 y-genes 36 yellow fluorescent protein (YFP) 632

z Zea mays 806 zinc-finger nucleases (ZFNs) 304–306, 307, 308, 310, 318, 320