Yeast Metabolic Engineering: Methods and Protocols [1 ed.] 1493905627, 9781493905621

Yeast Metabolic Engineering: Methods and Protocols provides the widely established basic tools used in yeast metabolic e

English Pages 316 [327] Year 2014

Methods in Molecular Biology 1152

Valeria Mapelli Editor

Yeast Metabolic Engineering Methods and Protocols

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK

For further volumes: http://www.springer.com/series/7651

Yeast Metabolic Engineering Methods and Protocols

Edited by

Valeria Mapelli Industrial Biotechnology, Chalmers University of Technology, Gothenburg, Sweden

ISSN 1064-3745 ISSN 1940-6029 (electronic) ISBN 978-1-4939-0562-1 ISBN 978-1-4939-0563-8 (eBook) DOI 10.1007/978-1-4939-0563-8 Springer New York Heidelberg Dordrecht London Library of Congress Control Number: 2014936204 © Springer Science+Business Media, LLC 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein. 
Cover illustration: “Matzot” by Mara Haregu Pagani, mixed media Printed on acid-free paper Humana Press is a brand of Springer Springer is part of Springer Science+Business Media (www.springer.com)

Preface

The incidental use of yeast for fermented products can be traced back about 6,000 years. However, the definition of yeast as a living organism responsible for sugar fermentation became clear only in the 1800s, when considerable attention was paid, especially for economic reasons, to the study of fermentation, with the aim of preventing spoilage of wines and other alcoholic beverages. Those studies were seminal for a better understanding of the fermentation process and of the role of yeast, and further steps forward were made with the discovery and isolation, at the beginning of the 1900s, of different yeast species and strains with peculiar properties. From those years on, the science of yeast has never stopped.

Thanks to the development of novel molecular biology techniques and the availability of the complete genome sequence of Saccharomyces cerevisiae, yeast has been used both as a model organism for higher eukaryotes and as a workhorse microorganism for diverse industrial productions, ranging from proteins to metabolites with diverse applications. The branch of technologies and techniques that have brought the use of yeast to several production processes goes under the name of Metabolic Engineering, whose aim is to modify and tune yeast metabolism according to the production target.

Several publications already exist on this topic, but technologies and methods are continuously being developed and improved. Therefore, this volume is intended to provide an overview of the widely established basic tools used in yeast metabolic engineering, while describing in greater detail novel and innovative methods and protocols that have valuable potential to improve metabolic engineering strategies aimed at industrial biotechnology applications.

With this perspective, the first part of the volume gives an overview of the basic tools existing for S. cerevisiae metabolic engineering, such as selection markers and engineered promoters, with the aim of giving the reader a compendium of tools that will always remain fundamental in the field. On the other hand, novel metabolic engineering techniques and technologies, such as the use of RNA switches and the generation of arming yeasts, are described in the form of detailed protocols, as they are not yet commonly established and their potential might be great for certain applications.

Although S. cerevisiae is the species to which the word “yeast” commonly refers, other yeast genera and species are receiving increasing interest thanks to peculiar features that give them high potential for specific biotechnological applications. Therefore, particular focus is given to protocols that can be used for metabolic engineering of Komagataella spp. (formerly known as Pichia spp.), Hansenula polymorpha, and Zygosaccharomyces bailii. The reader familiar with laboratory practices is also aware that the protocols developed for the so-called laboratory yeast strains are often not easily transferable to wild or industrial yeasts, which are known to be genetically more complex. For this reason, a few chapters provide protocols for the engineering of industrial strains, and an innovative protocol for the optimization of fed-batch fermentations with Pichia pastoris is also presented.

While the first section provides the tools for engineering yeasts, the second section (Tools and technologies for investigation and determination of yeast metabolic features) provides detailed protocols established to identify and evaluate the actual metabolic changes generated through genetic engineering. In particular, a protocol for metabolic flux analysis is described using the yeast P. pastoris as a case study, and a specific metabolite profiling method is reported together with a summary of existing methodologies for yeast metabolome analysis. Since one of the most challenging steps in metabolome studies is the analysis of the resulting huge amount of data, it has been considered worthwhile to dedicate one full chapter to a novel bioinformatics tool for processing and understanding metabolome data.

Along the bioinformatics line, the third section of the volume deals with Metabolic models for yeast metabolic engineering, which are increasingly popular for the initial definition and the improvement of metabolic engineering strategies. The two chapters focusing on this topic provide an overview of how genome-scale metabolic models are constructed and show a metabolic engineering application that has been developed by exploiting yeast metabolic models and the related bioinformatics tools.

Since the topics in this volume have been treated giving considerable relevance to the industrial application of metabolically engineered yeasts, the editor thought that some space, though little, could be given to patenting practice as the conclusion of the volume. It might not look like a proper conclusion for a book of methods and protocols, but the editor’s personal opinion is that knowing the fundamental principles of patenting the products resulting from laboratory investigation can be extremely useful, also in guiding the choice of the methods that researchers intend to use in their research.

In conclusion, I would like to thank all the researchers and authors who contributed with enthusiasm, patience, and professionalism to this volume, willing to share the protocols they developed and the knowledge they hold with the scientific community. It has been a real pleasure dealing with such people. Last but not least, I would like to thank Dr. John Walker, the Editor-in-Chief of the Methods in Molecular Biology series, for his continued trust and support.

Gothenburg, Sweden

Valeria Mapelli

Contents

Preface
Contributors

PART I MOLECULAR TOOLS AND TECHNOLOGY FOR YEAST ENGINEERING

1 An Overview on Selection Marker Genes for Transformation of Saccharomyces cerevisiae
Verena Siewers
2 Natural and Modified Promoters for Tailored Metabolic Engineering of the Yeast Saccharomyces cerevisiae
Georg Hubmann, Johan M. Thevelein, and Elke Nevoigt
3 Tools for Genetic Engineering of the Yeast Hansenula polymorpha
Ruchi Saraya, Loknath Gidijala, Marten Veenhuis, and Ida J. van der Klei
4 Molecular Tools and Protocols for Engineering the Acid-Tolerant Yeast Zygosaccharomyces bailii as a Potential Cell Factory
Paola Branduardi, Laura Dato, and Danilo Porro
5 Strains and Molecular Tools for Recombinant Protein Production in Pichia pastoris
Michael Felber, Harald Pichler, and Claudia Ruth
6 Methods for Efficient High-Throughput Screening of Protein Expression in Recombinant Pichia pastoris Strains
Andrea Camattari, Katrin Weinhandl, and Rama K. Gudiminchi
7 Synthetic RNA Switches for Yeast Metabolic Engineering: Screening Recombinant Enzyme Libraries
Joshua K. Michener and Christina D. Smolke
8 Generation of Arming Yeasts with Active Proteins and Peptides via Cell Surface Display System: Cell Surface Engineering, Bio-arming Technology
Kouichi Kuroda and Mitsuyoshi Ueda
9 Genetic Engineering of Industrial Saccharomyces cerevisiae Strains Using a Selection/Counter-selection Approach
Dariusz R. Kutyna, Antonio G. Cordente, and Cristian Varela
10 Evolutionary Engineering of Yeast
Ceren Alkım, Burcu Turanlı-Yıldız, and Z. Petek Çakar
11 Determination of a Dynamic Feeding Strategy for Recombinant Pichia pastoris Strains
Oliver Spadiut, Christian Dietzsch, and Christoph Herwig

PART II TOOLS AND TECHNOLOGIES FOR INVESTIGATION AND DETERMINATION OF YEAST METABOLIC FEATURES

12 Yeast Metabolomics: Sample Preparation for a GC/MS-Based Analysis
Sónia Carneiro, Rui Pereira, and Isabel Rocha
13 13C-Based Metabolic Flux Analysis in Yeast: The Pichia pastoris Case
Pau Ferrer and Joan Albiol
14 Pathway Activity Profiling (PAPi): A Tool for Metabolic Pathway Analysis
Raphael B.M. Aggio
15 QTL Mapping by Pooled-Segregant Whole-Genome Sequencing in Yeast
Thiago M. Pais, María R. Foulquié-Moreno, and Johan M. Thevelein

PART III METABOLIC MODELS FOR YEAST METABOLIC ENGINEERING

16 Genome-Scale Metabolic Models of Yeast, Methods for Their Reconstruction, and Other Applications
Sergio Bordel
17 Model-Guided Identification of Gene Deletion Targets for Metabolic Engineering in Saccharomyces cerevisiae
Ana Rita Brochado and Kiran Raosaheb Patil

PART IV PATENTING AND REGULATIONS

18 Patents: A Tool to Bring Innovation from the Lab Bench to the Marketplace
Z. Ying Li and Wolfram Meyer

Index

Contributors

RAPHAEL B.M. AGGIO • Department of Gastroenterology, Institute of Translational Medicine, University of Liverpool, Liverpool, UK
JOAN ALBIOL • Department of Chemical Engineering, Escola d’Enginyeria, Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain
CEREN ALKIM • Department of Molecular Biology and Genetics, Faculty of Science and Letters, Dr. Orhan Öcalgiray Molecular Biology, Biotechnology and Genetics Research Center (ITU-MOBGAM), Istanbul Technical University, Istanbul, Turkey
SERGIO BORDEL • Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
PAOLA BRANDUARDI • Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
ANA RITA BROCHADO • Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
Z. PETEK ÇAKAR • Department of Molecular Biology and Genetics, Faculty of Science and Letters, Dr. Orhan Öcalgiray Molecular Biology, Biotechnology and Genetics Research Center (ITU-MOBGAM), Istanbul Technical University, Istanbul, Turkey
ANDREA CAMATTARI • Graz University of Technology, Graz, Austria
SÓNIA CARNEIRO • Center of Biological Engineering, IBB Institute for Biotechnology and Bioengineering, University of Minho, Braga, Portugal
ANTONIO G. CORDENTE • The Australian Wine Research Institute, Adelaide, SA, Australia
LAURA DATO • Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
CHRISTIAN DIETZSCH • Research Area Biochemical Engineering, Institute of Chemical Engineering, Vienna University of Technology, Vienna, Austria
MICHAEL FELBER • Austrian Centre of Industrial Biotechnology, Graz, Austria
PAU FERRER • Department of Chemical Engineering, Escola d’Enginyeria, Universitat Autònoma de Barcelona, Bellaterra (Cerdanyola del Vallès), Spain
MARÍA R. FOULQUIÉ-MORENO • Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Flanders, Belgium; Department of Molecular Microbiology, VIB, Flanders, Belgium
LOKNATH GIDIJALA • Molecular Cell Biology, Kluyver Centre for Genomics of Industrial Fermentation, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
RAMA K. GUDIMINCHI • Austrian Centre of Industrial Biotechnology (ACIB), Graz, Austria
CHRISTOPH HERWIG • Research Area Biochemical Engineering, Institute of Chemical Engineering, Vienna University of Technology, Vienna, Austria
GEORG HUBMANN • Molecular Systems Biology, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands; Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Flanders, Belgium; Department of Molecular Microbiology, VIB, Flanders, Belgium
IDA J. VAN DER KLEI • Molecular Cell Biology, Kluyver Centre for Genomics of Industrial Fermentation, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
KOUICHI KURODA • Division of Applied Life Sciences, Graduate School of Agriculture, Kyoto University, Kyoto, Japan
DARIUSZ R. KUTYNA • The Australian Wine Research Institute, Adelaide, SA, Australia
Z. YING LI • Ropes & Gray LLP, New York, NY, USA
WOLFRAM MEYER • European Patent Office, Munich, Germany
JOSHUA K. MICHENER • Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA, USA
ELKE NEVOIGT • School of Engineering and Science, Jacobs University gGmbH, Bremen, Germany
THIAGO M. PAIS • Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Flanders, Belgium; Department of Molecular Microbiology, VIB, Flanders, Belgium; Instituto de Ciências da Saúde, Universidade Federal de Mato Grosso – UFMT, Sinop, MT, Brazil
KIRAN RAOSAHEB PATIL • Structural and Computational Biology, European Molecular Biology Laboratory, Heidelberg, Germany
RUI PEREIRA • Center of Biological Engineering, IBB Institute for Biotechnology and Bioengineering, University of Minho, Braga, Portugal
HARALD PICHLER • Institute of Molecular Biotechnology, Graz University of Technology, Graz, Austria; Austrian Centre of Industrial Biotechnology, Graz, Austria
DANILO PORRO • Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
ISABEL ROCHA • Center of Biological Engineering, IBB Institute for Biotechnology and Bioengineering, University of Minho, Braga, Portugal
CLAUDIA RUTH • Austrian Centre of Industrial Biotechnology, Graz, Austria
RUCHI SARAYA • Molecular Cell Biology, Kluyver Centre for Genomics of Industrial Fermentation, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
VERENA SIEWERS • Department of Chemical and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
CHRISTINA D. SMOLKE • Department of Bioengineering, Stanford University, Stanford, CA, USA
OLIVER SPADIUT • Research Area Biochemical Engineering, Institute of Chemical Engineering, Vienna University of Technology, Vienna, Austria
JOHAN M. THEVELEIN • Laboratory of Molecular Cell Biology, Institute of Botany and Microbiology, KU Leuven, Flanders, Belgium; Department of Molecular Microbiology, VIB, Flanders, Belgium
BURCU TURANLI-YILDIZ • Department of Molecular Biology and Genetics, Faculty of Science and Letters, Dr. Orhan Öcalgiray Molecular Biology, Biotechnology and Genetics Research Center (ITU-MOBGAM), Istanbul Technical University, Istanbul, Turkey
MITSUYOSHI UEDA • Division of Applied Life Sciences, Graduate School of Agriculture, Kyoto University, Kyoto, Japan
CRISTIAN VARELA • The Australian Wine Research Institute, Adelaide, SA, Australia
MARTEN VEENHUIS • Molecular Cell Biology, Kluyver Centre for Genomics of Industrial Fermentation, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Groningen, The Netherlands
KATRIN WEINHANDL • Austrian Centre of Industrial Biotechnology (ACIB), Graz, Austria

Part I Molecular Tools and Technology for Yeast Engineering

Chapter 1

An Overview on Selection Marker Genes for Transformation of Saccharomyces cerevisiae

Verena Siewers

Abstract

For genetic manipulation of yeast, numerous selection marker genes have been employed. These include prototrophic markers, markers conferring drug resistance, autoselection markers, and counterselectable markers. This chapter describes the different classes of selection markers and provides a number of examples for different applications.

Key words: Auxotrophy, Autoselection, Drug resistance, Counterselection, Marker loop-out

1 Introduction

Deletion of endogenous genes, introduction of new features into the yeast genome, and transformation with centromeric or episomal plasmids all require marker genes in order to select for transformation events. After genomic integration, the new properties are usually stably inherited and the strain can be cultivated under nonselective conditions. After transformation with a non-integrative plasmid, in contrast, selective conditions will in most cases have to be maintained in order to avoid plasmid loss.

The first marker genes used for yeast transformation were endogenous prototrophic markers, which were later complemented by dominant (mainly drug-resistance) markers and autoselection systems. In the following subchapters, different types of marker genes are introduced together with their potential applications, advantages, and disadvantages.

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_1, © Springer Science+Business Media, LLC 2014

2 Prototrophic Markers

Prototrophic marker genes are probably the most commonly used selection markers. They are usually derived from either amino acid (e.g., LEU2, TRP1) or nucleotide base (e.g., URA3, ADE2) biosynthesis pathways and require the availability of an auxotrophic host strain carrying a nonfunctional version or a deletion of the respective gene. Further examples are listed in Table 1.

Apart from using endogenous genes, it is also possible to complement auxotrophies in S. cerevisiae with heterologous genes. Examples that have shown sufficient activity to be used as selection markers are the URA3 gene of Kluyveromyces lactis [1] and the Schizosaccharomyces pombe his5+ gene [2], the equivalents of S. cerevisiae URA3 and HIS3, respectively.

Some prototrophic markers allow for additional genotype screening based on colony color. Strains carrying an inactive ade1 or ade2 allele form red colonies due to the vacuolar accumulation of purine biosynthetic pathway precursors; adenine-prototrophic colonies, in contrast, appear white [3]. Other examples are methionine-auxotrophic met15 strains, which become black when grown in the presence of divalent lead ions (Pb2+), while their prototrophic counterparts stay white [4, 5].
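The relationship between host auxotrophies, marker genes, and medium composition described in this section can be made concrete with a small lookup. The following Python sketch is only an illustration: the function name dropout_plan and the dictionary SELECTED_BY_OMITTING are hypothetical, and the gene-to-supplement pairs are a subset of the selection conditions listed in Table 1. It assumes a chemically defined medium in which every other requirement of the host is supplied.

```python
# Supplement that is omitted from the medium to select for each marker.
# Subset of Table 1; the names used here are illustrative, not a standard API.
SELECTED_BY_OMITTING = {
    "ADE2": "adenine",
    "HIS3": "histidine",
    "LEU2": "leucine",
    "LYS2": "lysine",
    "TRP1": "tryptophan",
    "URA3": "uracil",
}


def dropout_plan(host_auxotrophies, plasmid_markers):
    """Return (omit, supplement) for a chemically defined medium.

    host_auxotrophies: marker genes that are nonfunctional in the host strain.
    plasmid_markers:   functional marker genes carried by the construct.
    """
    host = set(host_auxotrophies)
    markers = set(plasmid_markers)
    # A prototrophic marker can only be selected for in a host that
    # lacks the corresponding function.
    not_selectable = markers - host
    if not_selectable:
        raise ValueError(f"host is not auxotrophic for: {sorted(not_selectable)}")
    # Leave out the compounds whose synthesis the markers restore ...
    omit = sorted(SELECTED_BY_OMITTING[g] for g in markers)
    # ... and keep supplying those the host still cannot make.
    supplement = sorted(SELECTED_BY_OMITTING[g] for g in host - markers)
    return omit, supplement


if __name__ == "__main__":
    omit, supplement = dropout_plan({"URA3", "LEU2", "HIS3"}, {"URA3"})
    print("omit:", omit)
    print("supplement:", supplement)
```

For a ura3 leu2 his3 host transformed with a URA3 plasmid, the sketch suggests omitting uracil while still supplying histidine and leucine.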

3 C/N Source-Related Markers

Several genes that confer the ability to grow on certain carbon or nitrogen sources have been used as selection markers (Table 2). S. cerevisiae cells expressing FCY1 encoding cytosine deaminase and GAP1 encoding a general amino acid permease can be selected on medium containing cytosine and L-citrulline, respectively, as sole nitrogen source [16, 17]. Since both genes are present in a wild-type strain, in analogy to auxotrophic markers, the availability of a background strain carrying the respective deletion is required. On the other hand, the LAC4/LAC12 and LSD1 genes, which allow for growth on lactose and dextran as sole carbon sources, respectively, are derived from different species and do not have any equivalents in the S. cerevisiae genome [18, 19]; i.e., they represent dominant marker genes, and this feature makes them very attractive markers for the transformation of industrial strains.

All marker genes discussed so far rely on the use of chemically defined media for selection. When selective conditions are required for stable maintenance of centromeric or episomal plasmids, chemically defined media might not represent an obstacle for small-scale fermentations. They are, however, not practical for long-term plasmid maintenance in industrial processes, which are normally based on complex media. Here, autoselection systems can serve as an alternative.

Table 1 Prototrophic markers

Gene name | Gene product | Selection conditions | Reference
Endogenous genes
ADE1 | N-succinyl-5-aminoimidazole-4-carboxamide ribotide synthetase involved in purine biosynthesis | w/o^a adenine | [6]
ADE2 | Phosphoribosylaminoimidazole carboxylase involved in purine biosynthesis | w/o adenine | [5]
ADE8 | Phosphoribosyl-glycinamide transformylase involved in purine biosynthesis | w/o adenine | [7]
ECM31 | Ketopantoate hydroxymethyltransferase involved in pantothenic acid biosynthesis | w/o pantothenic acid | [8]
HIS2 | Histidinolphosphatase involved in histidine biosynthesis | w/o histidine | [6]
HIS3 | Imidazoleglycerol-phosphate dehydratase involved in histidine biosynthesis | w/o histidine | [9]
LEU2 | β-Isopropylmalate dehydrogenase involved in leucine biosynthesis | w/o leucine | [9]
LYS2 | α-Aminoadipate reductase involved in lysine biosynthesis | w/o lysine | [7]
LYS5 | Phosphopantetheinyl transferase involved in lysine biosynthesis | w/o lysine | [10]
MET15 (=MET17) | O-acetyl homoserine-O-acetyl serine sulfhydrylase involved in sulfur amino acid biosynthesis | w/o methionine, w/o cysteine | ([4, 5])
TRP1 | Phosphoribosylanthranilate isomerase involved in tryptophan biosynthesis | w/o tryptophan | [9]
URA3 | Orotidine-5′-phosphate decarboxylase involved in pyrimidine biosynthesis | w/o uracil | [9]
Heterologous genes
AURA3 | Arxula adeninivorans orotidine-5′-phosphate decarboxylase | w/o uracil | [11]
CaLYS5 | Candida albicans phosphopantetheinyl transferase | w/o lysine | [10]
CaURA3 | C. albicans orotidine-5′-phosphate decarboxylase | w/o uracil | [12]
KlLEU2 | Kluyveromyces lactis β-isopropylmalate dehydrogenase | w/o leucine | [13]
KlURA3 | K. lactis orotidine-5′-phosphate decarboxylase | w/o uracil | [1]
MET2-CA | Saccharomyces carlsbergensis L-homoserine-O-acetyltransferase involved in methionine biosynthesis | w/o methionine | [14]
Sp his5+ | Schizosaccharomyces pombe imidazoleglycerol-phosphate dehydratase | w/o histidine | [2]
Sp ura4+ | Schizosaccharomyces pombe orotidine-5′-phosphate decarboxylase | w/o uracil | [15]

^a w/o: without

Table 2 Carbon/nitrogen source-specific markers

Gene name | Gene product | Selection conditions | Reference
amdS | Aspergillus nidulans acetamidase | Acetamide as sole nitrogen source | [65]
FCY1 | S. cerevisiae cytosine deaminase | Cytosine as sole nitrogen source | [16]
FCA1 | Candida albicans cytosine deaminase | Cytosine as sole nitrogen source | [16]
GAP1 | S. cerevisiae general amino acid permease | L-citrulline as sole nitrogen source | [17]
LAC4 + LAC12 | K. lactis β-galactosidase and lactose permease | Lactose as sole carbon source | [18]
LSD1 | Lipomyces starkeyi dextranase | Dextran as sole carbon source | [19]

4 Autoselection Systems

In an autoselection system (Table 3), the marker gene is essential for the viability of the cell under any (or almost any) growth condition. Thus, selection pressure can be maintained even in complex media. Furthermore, there is little risk of cross-feeding, which, when prototrophic markers are used, can lead even under selective conditions to subpopulations of cells that have lost the marker gene and live on metabolites provided by the marker gene-carrying cells [20].

The URA3 system (see above) was modified by using a background strain in which not only pyrimidine biosynthesis is blocked by a ura3 mutation but the pyrimidine salvage pathway is also inactivated through a fur1 urk1 double mutation. External supplementation with uracil, uridine, cytosine, or cytidine therefore does not enable growth in the absence of the URA3 gene, and URA3-bearing plasmids are stably maintained [21]. In several examples, glycolytic pathway genes such as FBA1, TPI1 (derived from either S. cerevisiae or a heterologous host), and PGI1 were used as marker genes and shown to provide stable plasmid maintenance in complex media [22–24]. A second group of genes used as autoselection markers are essential cell division cycle genes such as CDC4, CDC9, and CDC28 [23, 25, 26].

The construction and maintenance of the host strain used in an autoselection system can, however, require a special procedure, since an essential gene needs to be deleted. One possibility is the use of a strain that is still viable under specific conditions. For example, a strain carrying the srb1-1 allele, a mutation in PSA1 encoding GDP-mannose pyrophosphorylase involved in cell wall synthesis, is nonviable in the absence of osmotic stabilizers but can be maintained by the addition of sorbitol to the medium [27]. A second option is the use of a maintenance plasmid carrying the essential gene, which can be exchanged for the target plasmid in a plasmid-shuffling procedure [26].

Table 3 Autoselection systems

Gene name | Gene product | Reference
URA3 fur1 urk1 | Orotidine-5′-phosphate decarboxylase; uracil phosphoribosyltransferase; uridine/cytidine kinase | [21]
FBA1 | Fructose 1,6-bisphosphate aldolase | [22]
POT | Schizosaccharomyces pombe triose phosphate isomerase | [24]
TPI | A. nidulans triose phosphate isomerase | [23]
PGI1 | Phosphoglucose isomerase | [23]
CDC4 | F-box protein | [23]
CDC9 | DNA ligase | [25]
CDC28 | Catalytic subunit of the main cell cycle cyclin-dependent kinase | [26]
MOB1 | Component of the mitotic exit network | [26]
PSA1 (SRB1) | GDP-mannose pyrophosphorylase | [27]
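The constraints discussed so far (prototrophic markers need an auxotrophic host and defined medium, autoselection markers need a specially constructed host but tolerate complex medium, and dominant markers need neither) can be summarized as a simple decision helper. The Python sketch below is a rough reading of the text and not a protocol: the function name, the flag names, and the class labels are hypothetical, and the assumption that resistance markers remain usable in complex medium is an interpretation that would have to be checked for each drug.

```python
def candidate_marker_classes(auxotrophic_host, autoselection_host, complex_medium):
    """List marker classes that are compatible with a host and a medium.

    auxotrophic_host:   host carries a suitable auxotrophic mutation.
    autoselection_host: host lacks an essential gene that the plasmid provides.
    complex_medium:     cultivation has to take place in complex medium.
    All three flags are simplifications introduced for this illustration.
    """
    # Dominant resistance markers need no special host background
    # (assumption: a workable drug concentration exists for the medium).
    candidates = ["resistance marker"]
    if autoselection_host:
        # Selection pressure is maintained even in complex media.
        candidates.append("autoselection marker")
    if not complex_medium:
        # Carbon-source markers from other species are dominant but rely
        # on chemically defined media.
        candidates.append("dominant C-source marker (LAC4/LAC12, LSD1)")
        if auxotrophic_host:
            candidates.append("prototrophic marker")
    return candidates


if __name__ == "__main__":
    # Industrial strain without auxotrophies, grown in complex medium.
    print(candidate_marker_classes(False, False, True))
    # Laboratory strain with auxotrophies, grown in defined medium.
    print(candidate_marker_classes(True, False, False))
```

The nitrogen-source markers FCY1 and GAP1 are left out of the sketch on purpose, because they additionally require a background strain carrying the respective deletion.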

5

Resistance Markers

If the host strain does not contain the appropriate mutant allele required for the use of a prototrophic or an autoselection marker, as is often the case for industrial strains, a (semi)dominant marker needs to be employed. Two examples of dominant markers (LAC4/LAC12 and LSD1) based on carbon source utilization have already been mentioned above. Most (semi)dominant markers, however, confer resistance to various growth-inhibitory or toxic compounds (Table 4). These can be divided into three groups:

1. Endogenous genes, which confer resistance to specific agents when overexpressed, either by introduction of multiple copies or by expression from a strong promoter: There are many examples of such genes in the literature, but only those specifically tested as marker genes are listed in Table 4. For instance, expression of the formaldehyde dehydrogenase-encoding gene SFA1 from the strong GPD1 promoter allowed cells to grow at up to 7 mM formaldehyde [28].

2. Mutant alleles of endogenous genes: These may encode proteins with a lower affinity for an inhibitory drug, such as a ribosomal protein with reduced affinity for cycloheximide [29], or mutated transcription factors that increase the expression of drug exporters, such as Fzf1-4 and Pdr3-9 [30, 31].

3. Heterologous genes: Drug resistance in these cases can, e.g., be based on the expression of an enzyme which, in contrast to the native one, is not susceptible to an inhibitor, such as E. coli dihydrofolate reductase, which unlike the homologous yeast enzyme is insensitive to methotrexate [32]. Other mechanisms are based on drug inactivation, e.g., by binding (ble, [13]) or enzymatic inactivation (kan, [33]).

Table 4 Resistance markers

Gene name | Gene product | Selection conditions | Reference

Endogenous genes
CUP1 | Metallothionein conferring resistance to copper and cadmium | 1–14 mM CuSO4 | [34, 35]
ERG11 | Lanosterol 14α-demethylase conferring resistance to azole antifungals | 1–3 mg/l flusilazole | [36]
MPR1 | N-acetyltransferase conferring resistance to L-azetidine-2-carboxylic acid (AZC) | 0.5–2.0 mg/ml AZC | [37]
SSU1 | Plasma membrane sulfite pump conferring sulfite resistance | 3.5 mM Na2SO3 | [30]
SFA1 | Formaldehyde dehydrogenase conferring resistance to formaldehyde | 4 mM formaldehyde | [28]
YAP1 | Transcription factor conferring resistance to cerulenin and cycloheximide | 0.5–1.0 μg/ml cycloheximide, 1.0–4.0 μg/ml cerulenin | [38]

Mutant alleles of endogenous genes
ARO4-OFP | Mutated DAHP synthase conferring resistance to fluorophenylalanine | 2 mg/ml fluorophenylalanine | [39]
AUR1-C | Mutated inositol-phosphoceramide synthase conferring resistance to aureobasidin A | 0.5–2.0 μg/ml aureobasidin A | [40]
cyh2 | Mutated ribosomal protein conferring resistance to cycloheximide | 0.3–10 μg/ml cycloheximide | [29]
FZF1-4 | Mutated transcription factor conferring sulfite resistance | 3.5 mM Na2SO3 | [30]
LEU4-1 | Mutated α-isopropylmalate synthase conferring resistance to trifluoroleucine | 200 μg/ml trifluoroleucine | [41]
pdr3-9 | Mutated transcriptional activator conferring multidrug resistance | For example 1 μg/ml cycloheximide | [31]
SMR1-410/SMR1B | Mutated acetolactate synthases (Ilv2) conferring resistance to sulfometuron methyl | 20 μg/ml sulfometuron methyl | [42]

Heterologous genes
aroA | E. coli 5-enolpyruvylshikimate-3-phosphate synthase conferring resistance to glyphosate | 0.5–6 mg/ml glyphosate | [43]
ble | Tn5 phleomycin-binding protein conferring resistance to phleomycin | 7.5 μg/ml phleomycin | [13]
cat | Tn9 acetyltransferase conferring resistance to chloramphenicol | 1–3 mg/ml chloramphenicol (glycerol/ethanol medium) | [44]
dehH1 | Moraxella sp. dehalogenase conferring resistance to fluoroacetate | 1 mM fluoroacetate (acetate/ethanol medium) | [28]
dsdA | E. coli deaminase conferring resistance to D-serine | 2 mg/ml D-serine, 5 mg/ml L-proline | [45]
hph | Klebsiella pneumoniae phosphotransferase conferring resistance to hygromycin B | 300 μg/ml hygromycin B | [46]
kan | Tn903 phosphotransferase conferring resistance to G418 | 200 mg/l G418 | [33]
mdr3 | Mus musculus P-glycoprotein conferring resistance to FK520 | 100 μg/ml FK520 | [47]
nat1 | Streptomyces noursei acetyltransferase conferring resistance to nourseothricin | 100 μg/ml nourseothricin | [46]
pat | Streptomyces viridochromogenes acetyltransferase conferring resistance to bialaphos | 200 μg/ml bialaphos | [46]
R · dhfr | E. coli dihydrofolate reductase conferring resistance to methotrexate | 10 μg/ml methotrexate, 5 mg/ml sulfanilamide | [32]

It should be noted that the selection conditions given in Table 4 represent indicative values derived from the cited examples. Drug concentrations suitable for selection may need to be adapted depending on the natural resistance/sensitivity of the desired host strain, the expression level of the resistance gene (strong vs. weak promoter and single-copy vs. multi-copy vector), and the medium composition (defined vs. complex medium).

Some resistance marker genes come with very specific medium requirements. As chloramphenicol only acts on mitochondrial and not on cytoplasmic ribosomes, it should be used in combination with non-fermentable carbon sources, whose assimilation depends on intact mitochondria. Another example is G418, which has reduced activity in ammonium sulfate-containing media. When using minimal medium for G418-based selection, ammonium sulfate as nitrogen source should therefore be replaced with glutamate.

In many cases, it has been shown to be beneficial to incubate the transformed cells for approximately 2–18 h in nonselective medium before plating them on selective plates, in order to allow the resistance gene to be expressed.
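The medium requirements mentioned above can be written down as a small lookup. The following Python sketch is illustrative only: the dictionary layout, the function name `check_selection_medium`, and the way a medium is described are assumptions introduced here, and only the two rules stated in the text (chloramphenicol and G418) are encoded.

```python
# Hypothetical helper; names and data layout are illustrative assumptions.
# Only the two medium rules stated in the text are encoded.

MEDIUM_RULES = {
    # Chloramphenicol acts on mitochondrial ribosomes only, so selection
    # requires a non-fermentable carbon source.
    "chloramphenicol": lambda m: m["carbon_source_fermentable"] is False,
    # G418 has reduced activity in ammonium sulfate-containing media.
    "G418": lambda m: m["nitrogen_source"] != "ammonium sulfate",
}


def check_selection_medium(drug, medium):
    """Return True if the medium is compatible with the drug.

    Drugs without a recorded rule are treated as compatible.
    """
    rule = MEDIUM_RULES.get(drug)
    return True if rule is None else rule(medium)


glucose_ammonium = {
    "carbon_source_fermentable": True,
    "nitrogen_source": "ammonium sulfate",
}
glycerol_glutamate = {
    "carbon_source_fermentable": False,
    "nitrogen_source": "glutamate",
}

print(check_selection_medium("G418", glucose_ammonium))
print(check_selection_medium("G418", glycerol_glutamate))
print(check_selection_medium("chloramphenicol", glucose_ammonium))
```

Such a check does not replace empirical titration of the drug concentration, which, as noted above, depends on strain, expression level, and medium.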

6 Marker Reuse and Counterselection

If a yeast strain needs to be cured of a plasmid, e.g., during plasmid shuffling, or an integrated marker is to be eliminated, e.g., for later reuse in an iterative gene deletion or gene integration approach, counterselectable markers can be of great benefit. These allow selecting for their absence in a cell. Several selectable markers presented so far are at the same time counterselectable, the most widely used system being the URA3 marker, which does not allow for growth in the presence of 5-fluoroorotic acid [48] (Table 5). Other markers do not allow for positive selection but only counterselection. An example is PKA3 (TPK2), encoding the catalytic subunit of cAMP-dependent protein kinase, which becomes toxic to cells when overexpressed. By placing this gene under the control of a strong conditional promoter, cells that have lost the marker gene can be selected through promoter induction [14].

Table 5 Counterselectable markers

Gene name | Gene product | Counterselection conditions | Reference
amdS | A. nidulans acetamidase conferring sensitivity to fluoroacetamide | 2.3 g/l fluoroacetamide | [65]
LYS2, LYS5, CaLYS5 | α-aminoadipate reductase; phosphopantetheinyl transferase | α-aminoadipate as sole nitrogen source | [56]
MET15 (=MET17) | O-acetyl homoserine-O-acetyl serine sulfhydrylase conferring sensitivity to methylmercury | 1 μM methylmercury | [4]
TRP1 | Phosphoribosylanthranilate isomerase conferring sensitivity to 5-fluoroanthranilic acid (5-FAA) | 0.5–1 g/l 5-FAA | [57]
URA3, AURA3, KlURA3, Sp ura4+ | Orotidine-5′-phosphate decarboxylase conferring sensitivity to 5-fluoroorotic acid (5-FOA) | 1 g/l 5-FOA | [48]
FCY1, FCA1 | Cytosine deaminase conferring sensitivity to 5-fluorocytosine (5-FC) | 100 μM–1 mM 5-FC | [16]
GAP1 | General amino acid permease conferring sensitivity to D-histidine | 1.6 g/l D-histidine, 1 g/l L-proline | [17]
CAN1 | Arginine permease conferring sensitivity to L-canavanine | 100 μg/ml L-canavanine | [58]
GIN11M86 | (Modified version of a subtelomeric, growth-inhibitory sequence) | Induced overexpression | [59]
PKA3 (=TPK2) | Catalytic subunit of cAMP-dependent protein kinase | Induced overexpression | [14]

When eliminating an integrated marker, the aim is usually to select for complete deletions and not for random mutations. A common approach is to flank the marker gene with direct repeats, as homologous recombination between these repeats will lead to a marker gene "loop-out" [49]. While rather simple, this method has the disadvantage that it leaves one copy of the relatively large repeat sequence in the genome (usually at least about 150 bp, [50]), which can lead to unwanted recombination events if the same marker gene cassette is used multiple times.

To avoid any sequence remnants, the so-called delitto perfetto approach has been developed. Here, the marker gene is removed in a second transformation step using oligonucleotides homologous to the sequences flanking the integration site [51]. The efficiency of both methods has been improved through the introduction of I-SceI recognition sites into the marker gene cassette and inducible expression of the corresponding endonuclease to generate double-strand breaks [52, 53].

An additional possibility is recombination between direct repeats that represent recognition sites for specific recombinase enzymes. At about 34 bp, these are much shorter than the direct repeats employed for recombination via the yeast DNA repair system as described above. Both the bacteriophage-derived Cre/loxP system and the yeast 2 μm plasmid-based Flp/FRT system have been applied for marker gene excision [54, 55]. An advantage is the high efficiency of this approach, which therefore does not necessarily require the use of a counterselectable marker.
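The loop-out described above, in which recombination between two direct repeats removes the marker and leaves a single repeat copy behind, can be illustrated on plain strings. This is a toy sketch, not a model of the recombination machinery: the function name `loop_out` and the sequences are invented for illustration, and the repeats are far shorter than the roughly 150 bp (homologous recombination) or 34 bp (recombinase sites) mentioned in the text.

```python
def loop_out(genome, repeat):
    """Simulate recombination between the first two copies of a direct repeat.

    The segment from the start of the first copy up to the start of the
    second copy is removed, so exactly one repeat copy remains as a scar.
    The genome is returned unchanged if fewer than two copies are found.
    """
    first = genome.find(repeat)
    if first == -1:
        return genome
    second = genome.find(repeat, first + len(repeat))
    if second == -1:
        return genome
    return genome[:first] + genome[second:]


# Toy sequences (hypothetical, much shorter than real repeats or markers).
repeat = "ACGTAC"
marker = "TTTGGGCCC"
genome = "AAAA" + repeat + marker + repeat + "CCCC"

cured = loop_out(genome, repeat)
print(cured)
```

The remaining repeat copy in `cured` is the sequence remnant that the delitto perfetto approach is designed to avoid.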

7 Marker Modifications

When introducing marker genes into S. cerevisiae, it is in general desirable to reduce the amount of sequence with a high degree of homology to yeast genome sequences in order to prevent unwanted recombination events. One possibility is to use entire marker gene cassettes consisting of promoter, gene, and terminator derived from a different organism, as in the case of the KlURA3 cassette (i.e., the URA3 gene from K. lactis) [1]. Ideally, this organism should have enough sequence dissimilarity to avoid recombination but still be related closely enough for the promoter and terminator to function in S. cerevisiae. In other cases, marker genes from non-related organisms are combined with a promoter and terminator of fungal origin other than S. cerevisiae. Very commonly used are the so-called MX cassettes containing the Ashbya gossypii (Ag) TEF promoter and terminator (e.g., [33, 46]). It has, however, recently been discovered that the AgTEF promoter can be toxic to S. cerevisiae at high copy numbers [60].

A second advantage of using heterologous promoters can be their low activity in S. cerevisiae. If the transcription level from a single copy is not high enough to sustain cell growth, this may be compensated for by increasing the copy number. This can be exploited when selecting for cells with multiple integration events or increased copy numbers of episomal plasmids. An extreme example is probably the use of the neo gene conferring resistance to G418 without any eukaryotic promoter to select for multiple genomic integrations [61]. In other studies, marker genes have been put under the control of a regulatable [22] or a truncated promoter, as in the widely used LEU2-d and URA3-d alleles, in order to increase episomal plasmid copy number [62, 63]. Additional approaches to promote high plasmid copy number are the use of temperature-sensitive marker gene alleles [25] or the use of markers with a reduced protein half-life [64].
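The compensation argument above can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, as a deliberate simplification not stated in the text, that marker expression scales linearly with gene copy number; the function name and the numbers (arbitrary units) are invented for illustration.

```python
import math


def min_copies_for_growth(required_expression, expression_per_copy):
    """Smallest integer copy number whose summed expression meets the requirement.

    Assumes (as a simplification) that marker expression scales linearly
    with gene copy number.
    """
    if expression_per_copy <= 0:
        raise ValueError("expression_per_copy must be positive")
    return max(1, math.ceil(required_expression / expression_per_copy))


# Arbitrary units; values are invented for illustration only.
strong_promoter = min_copies_for_growth(100, 100)
weak_promoter = min_copies_for_growth(100, 8)
print(strong_promoter, weak_promoter)
```

Under this assumption, the weaker the marker promoter, the higher the copy number a cell must carry to survive selection, which is the rationale behind truncated-promoter alleles such as LEU2-d.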

8 Concluding Remarks

As the preceding sections show, a marker gene needs to be selected based on the specific application. The first decisive element is the background strain: if it does not contain the appropriate genotype, e.g., an auxotrophy that allows for the use of a recessive marker allele, a dominant marker gene has to be chosen. This is often the case when transforming wild or industrial strains. The marker gene will also influence transformation efficiency. The use of drug resistance markers in particular requires optimization of the selection conditions to obtain sufficient numbers of transformants while avoiding the growth of non-transformed background colonies. For certain drugs, spontaneous resistant mutants may appear. Marker choice is of crucial importance when plasmids need to be maintained in the cell. In fact, the specific medium composition needed to maintain selection pressure can represent an important cost factor, especially when antibiotics are used and in large-scale fermentations. In addition, the marker will influence the stability as well as the copy number of an episomal plasmid.

References

1. Längle-Rouault F, Jacobs E (1995) A method for performing precise alterations in the yeast genome using a recyclable selectable marker. Nucleic Acids Res 23:3079–3081
2. Wach A, Brachat A, Alberti-Segui C et al (1997) Heterologous HIS3 marker and GFP reporter modules for PCR-targeting in Saccharomyces cerevisiae. Yeast 13:1065–1075
3. Ugolini S, Bruschi CV (1996) The red/white colony color assay in the yeast Saccharomyces cerevisiae: epistatic growth advantage of white ade8-18, ade2 cells over red ade2 cells. Curr Genet 30:485–492
4. Cost GJ, Boeke JD (1996) A useful colony colour phenotype associated with the yeast selectable/counter-selectable marker MET15. Yeast 12:939–941
5. Brachmann CB, Davies A, Cost GJ et al (1998) Designer deletion strains derived from Saccharomyces cerevisiae S288C: a useful set of strains and plasmids for PCR-mediated gene disruption and other applications. Yeast 14:115–132
6. Chee MK, Haase SB (2012) New and redesigned pRS plasmid shuttle vectors for genetic manipulation of Saccharomyces cerevisiae. G3 (Bethesda) 2:515–526
7. Sadowski I, Su T-C, Parent J (2007) Disintegrator vectors for single-copy yeast chromosomal integration. Yeast 24:447–455
8. Shimoi H, Okuda M, Ito K (2000) Molecular cloning and application of a gene complementing pantothenic acid auxotrophy of sake yeast Kyokai no. 7. J Biosci Bioeng 90:643–647
9. Sikorski RS, Hieter P (1989) A system of shuttle vectors and yeast host strains designed for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122:19–27
10. Ito-Harashima S, McCusker JH (2004) Positive and negative selection LYS5MX gene replacement cassettes for use in Saccharomyces cerevisiae. Yeast 21:53–61
11. Giersberg M, Degelmann A, Bode R et al (2012) Production of a thermostable alcohol dehydrogenase from Rhodococcus ruber in three different yeast species using the Xplor®2 transformation/expression platform. J Ind Microbiol Biotechnol 39:1385–1396
12. Goldstein AL, Pan X, McCusker JH (1999) Heterologous URA3MX cassettes for gene replacement in Saccharomyces cerevisiae. Yeast 15:507–511
13. Gueldener U, Heinisch J, Koehler GJ et al (2002) A second set of loxP marker cassettes for Cre-mediated multiple gene knockouts in budding yeast. Nucleic Acids Res 30:e23
14. Olesen K, Johannesen PF, Hoffmann L et al (2000) The pYC plasmids, a series of cassette-based yeast plasmid vectors providing means of counter-selection. Yeast 16:1035–1043
15. Jakopec V, Walla E, Fleig U (2011) Versatile use of Schizosaccharomyces pombe plasmids in Saccharomyces cerevisiae. FEMS Yeast Res 11:653–655
16. Hartzog PE, Nicholson BP, McCusker JH (2005) Cytosine deaminase MX cassettes as positive/negative selectable markers in Saccharomyces cerevisiae. Yeast 22:789–798
17. Regenberg B, Hansen J (2000) GAP1, a novel selection and counter-selection marker for multiple gene disruptions in Saccharomyces cerevisiae. Yeast 16:1111–1119
18. Leite FC, Dos Anjos RS, Basilio AC et al (2013) Construction of integrative plasmids suitable for genetic modification of industrial strains of Saccharomyces cerevisiae. Plasmid 69:114–117
19. Zhang Y, Wang Z-Y, He X-P et al (2008) New industrial brewing yeast strains with ILV2 disruption and LSD1 expression. Int J Food Microbiol 123:18–24
20. Pronk JT (2002) Auxotrophic yeast strains in fundamental and applied research. Appl Environ Microbiol 68:2095–2100
21. Napp SJ, Da Silva NA (1993) Enhancement of cloned gene product synthesis via autoselection in recombinant Saccharomyces cerevisiae. Biotechnol Bioeng 41:801–810
22. Compagno C, Tura A, Ranzi BM et al (1993) Copy number modulation in an autoselection system for stable plasmid maintenance in Saccharomyces cerevisiae. Biotechnol Prog 9:594–599
23. Kawasaki GH, Bell L (1999) Stable DNA constructs. US Patent 5871957
24. Thim L, Hansen MT, Norris K et al (1986) Secretion and processing of insulin precursors in yeast. Proc Natl Acad Sci U S A 83:6766–6770
25. Unternährer S, Pridmore D, Hinnen A (1991) A new system for amplifying 2 μm plasmid copy number in Saccharomyces cerevisiae. Mol Microbiol 5:1539–1548
26. Geymonat M, Spanos A, Sedgwick SG (2007) A Saccharomyces cerevisiae autoselection system for optimised recombinant protein expression. Gene 399:120–128
27. Rech SB, Stateva LI, Oliver SG (1992) Complementation of the Saccharomyces cerevisiae srb1-1 mutation: an autoselection system for stable plasmid maintenance. Curr Genet 21:339–344
28. van den Berg MA, Steensma HY (1997) Expression cassettes for formaldehyde and fluoroacetate resistance, two dominant markers in Saccharomyces cerevisiae. Yeast 13:551–559
29. del Pozo L, Abarca D, Claros MG, Jiménez A (1991) Cycloheximide resistance as a yeast cloning marker. Curr Genet 19:353–358
30. Park H, Lopez NI, Bakalinsky AT (1999) Use of sulfite resistance in Saccharomyces cerevisiae as a dominant selectable marker. Curr Genet 36:339–344
31. Lacková D, Šubík J (1999) Use of mutated PDR3 gene as a dominant selectable marker in transformation of prototrophic yeast strains. Folia Microbiol 44:171–176
32. Miyajima A, Miyajima I, Arai K-I, Arai N (1984) Expression of plasmid R388-encoded type II dihydrofolate reductase as a dominant selective marker in Saccharomyces cerevisiae. Mol Cell Biol 4:407–414
33. Wach A, Brachat A, Pöhlmann R, Philippsen P (1994) New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10:1793–1808
34. Hottiger T, Kuhla J, Pohlig G et al (1995) 2-μm vectors containing the Saccharomyces cerevisiae metallothionein gene as a selectable marker: excellent stability in complex media, and high-level expression of a recombinant protein from a CUP1-promoter-controlled expression cassette in cis. Yeast 11:1–14
35. Zhang JG, Liu XY, He XP et al (2011) Improvement of acetic acid tolerance and fermentation performance of Saccharomyces cerevisiae by disruption of the FPS1 aquaglyceroporin gene. Biotechnol Lett 33:277–284
36. Doignon F, Aigle M, Ribereau-Gayon P (1993) Resistance to imidazoles and triazoles in Saccharomyces cerevisiae as a new dominant marker. Plasmid 30:224–233
37. Ogawa-Mitsuhashi K, Sagane K, Kuromitsu J et al (2009) MPR1 as a novel selection marker in Saccharomyces cerevisiae. Yeast 26:587–593
38. Akada R, Shimizu Y, Matsushita Y et al (2002) Use of a YAP1 overexpression cassette conferring specific resistance to cerulenin and cycloheximide as an efficient selectable marker in the yeast Saccharomyces cerevisiae. Yeast 19:17–28
39. Fukuda K, Watanabe M, Asano K et al (1992) Molecular breeding of a sake yeast with a mutated ARO4 gene which causes both resistance to o-fluoro-DL-phenylalanine and increased production of β-phenethyl alcohol. J Ferment Bioeng 73:366–369
40. Hashida-Okado T, Ogawa A, Kato I, Takesako K (1998) Transformation system for prototrophic industrial yeasts using the AUR1 gene as a dominant selection marker. FEBS Lett 425:117–122
41. Bendoni B, Cavalieri D, Casalone E et al (1999) Trifluoroleucine resistance as a dominant molecular marker in transformation of strains of Saccharomyces cerevisiae isolated from wine. FEMS Microbiol Lett 180:229–233
42. Xie Q, Jiménez A (1996) Molecular cloning of a novel allele of SMR1 which determines sulfometuron methyl resistance in Saccharomyces cerevisiae. FEMS Microbiol Lett 137:165–168
43. Kunze G, Bode R, Rintala H, Hofemeister J (1989) Heterologous gene expression of the glyphosate resistance marker and its application in yeast transformation. Curr Genet 15:91–98
44. Hadfield C, Cashmore AM, Meacock PA (1986) An efficient chloramphenicol-resistance marker for Saccharomyces cerevisiae and Escherichia coli. Gene 45:149–158
45. Vorachek-Warren MK, McCusker JH (2004) DsdA (D-serine deaminase): a new heterologous MX cassette for gene disruption and selection in Saccharomyces cerevisiae. Yeast 21:163–171
46. Goldstein AL, McCusker JH (1999) Three new dominant drug resistance cassettes for gene disruption in Saccharomyces cerevisiae. Yeast 15:1541–1553
47. Raymond M, Ruetz S, Thomas DY, Gros P (1994) Functional expression of P-glycoprotein in Saccharomyces cerevisiae confers cellular resistance to the immunosuppressive and antifungal agent FK520. Mol Cell Biol 14:277–286
48. Boeke JD, LaCroute F, Fink GR (1984) A positive selection for mutants lacking orotidine-5′-phosphate decarboxylase activity in yeast: 5-fluoro-orotic acid resistance. Mol Gen Genet 197:345–346
49. Alani E, Cao L, Kleckner N (1987) A method for gene disruption that allows repeated use of URA3 selection in the construction of multiply disrupted yeast strains. Genetics 116:541–545
50. Reid RJ, Lisby M, Rothstein R (2002) Cloning-free genome alterations in Saccharomyces cerevisiae using adaptamer-mediated PCR. Methods Enzymol 350:258–277
51. Storici F, Lewis LK, Resnick MA (2001) In vivo site-directed mutagenesis using oligonucleotides. Nat Biotechnol 19:773–776
52. Fairhead C, Llorente B, Denis F et al (1996) New vectors for combinatorial deletions in yeast chromosomes and for gap-repair cloning using "split-marker" recombination. Yeast 12:1439–1457
53. Storici F, Durham CL, Gordenin DA, Resnick MA (2003) Chromosomal site-specific double-strand breaks are efficiently targeted for repair by oligonucleotides in yeast. Proc Natl Acad Sci U S A 100:14994–14999
54. Güldener U, Heck S, Fielder T et al (1996) A new efficient gene disruption cassette for repeated use in budding yeast. Nucleic Acids Res 24:2519–2524
55. Storici F, Coglievina M, Bruschi CV (1999) A 2-μm DNA-based marker recycling system for multiple gene disruption in the yeast Saccharomyces cerevisiae. Yeast 15:271–283
56. Chattoo BB, Sherman F, Azubalis DA et al (1979) Selection of lys2 mutants of the yeast Saccharomyces cerevisiae by the utilization of α-aminoadipate. Genetics 93:51–65
57. Toyn JH, Gunyuzlu PL, White WH et al (2000) A counterselection for the tryptophan pathway in yeast: 5-fluoroanthranilic acid resistance. Yeast 16:553–560
58. Suizu T, Iimura Y, Gomi K et al (1989) L-Canavanine resistance as a positive selectable marker in diploid yeast transformation through integral disruption of the CAN1 gene. Agric Biol Chem 53:431–436
59. Akada R, Hirosawa I, Kawahata M et al (2002) Sets of integrating plasmids and gene disruption cassettes containing improved counterselection markers designed for repeated use in budding yeast. Yeast 19:393–402
60. Babazadeh R, Jafari SM, Zackrisson M et al (2011) The Ashbya gossypii EF-1α promoter of the ubiquitously used MX cassettes is toxic to Saccharomyces cerevisiae. FEBS Lett 585:3907–3913
61. Wang X, Wang Z, Da Silva NA (1996) G418 Selection and stability of cloned genes integrated at chromosomal delta sequences of Saccharomyces cerevisiae. Biotechnol Bioeng 49:45–51
62. Loison G, Vidal A, Findeli A et al (1989) High-level of expression of a protective antigen of schistosomes in Saccharomyces cerevisiae. Yeast 5:497–507
63. Erhart E, Hollenberg CP (1983) The presence of a defective LEU2 gene on 2 μ DNA recombinant plasmids of Saccharomyces cerevisiae is responsible for curing and high copy number. J Bacteriol 156:625–635
64. Chen Y, Partow S, Scalcinati G et al (2012) Enhancing the copy number of episomal plasmids in Saccharomyces cerevisiae for improved protein production. FEMS Yeast Res 12:598–607
65. Solis-Escalante D, Kuijpers NGA, Bongaerts N et al (2013) amdSYM, a new dominant recyclable marker cassette for Saccharomyces cerevisiae. FEMS Yeast Res 13:126–139

Chapter 2

Natural and Modified Promoters for Tailored Metabolic Engineering of the Yeast Saccharomyces cerevisiae

Georg Hubmann, Johan M. Thevelein, and Elke Nevoigt

Abstract

The ease of highly sophisticated genetic manipulations in the yeast Saccharomyces cerevisiae has prompted numerous initiatives towards the development of metabolically engineered strains for novel applications beyond its traditional use in brewing, baking, and wine making. In fact, baker's yeast has become a key cell factory for the production of various bulk and fine chemicals. Successful metabolic engineering requires fine-tuned adjustments of metabolic fluxes and coordination of multiple pathways within the cell. This has mostly been achieved by controlling gene expression at the transcriptional level, i.e., by using promoters with appropriate strengths and regulatory properties. Here we present an overview of natural and modified promoters which have been used in metabolic pathway engineering of S. cerevisiae. Recent developments in creating promoters with tailor-made properties are also discussed.

Key words Promoter, Yeast, Saccharomyces cerevisiae, Engineering, Metabolic engineering

1 Introduction

Many initial metabolic engineering approaches established new pathways and/or restructured the cell's metabolic network by either strong, constitutive expression or complete deletion of genes encoding appropriate enzymes, transporters, or regulatory proteins. However, these extreme types of modifications often cause "metabolic burden," particularly when the modification affects the cell's energy and redox metabolism or when the final product or one or more pathway intermediates are toxic. It became obvious that metabolic engineering approaches targeting the central carbon and energy metabolism require fine-tuning of pathways to ensure a good balance between smooth flux towards the final product and the basic metabolic requirements of the cell. A recent example of such an attempt has been provided by the study of Hubmann et al. [1]. In some cases, it might even be necessary to uncouple biomass production from product formation, allowing a period of cell growth without the burden caused by product formation.

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_2, © Springer Science+Business Media, LLC 2014

In this case, the metabolic flux towards the product is only switched on in a later phase of the process, i.e., after sufficient biomass has been formed. This latter scenario has traditionally been applied when engineering cell factories for heterologous protein production.

There are many molecular levels at which metabolic fluxes (i.e., the activities of the proteins involved) in a cell can be modified. Gene copy number, transcription efficiency, mRNA stability, translation efficiency, protein stability, and allosteric control are the most obvious control points. In cell factory design, control of transcription efficiency has been the most popular. This is certainly due to the fact that cellular regulation at this molecular level is relatively well understood and, thus, easy to modify by targeted genetic modifications.

The transcription of genes is mainly controlled by regions upstream of the coding sequences and by appropriate protein factors (global and gene-specific transcription factors) which bind to these regions and recruit the RNA polymerase. These sequences are referred to as promoters. They control how often a gene is transcribed in a given period of time. They also allow the cell to regulate transcription efficiency as a function of the external and internal conditions. High diversity in promoter structure within a genome allows for very complex gene- and condition-dependent transcriptional regulation.

In prokaryotes, genes encoding enzymes, transporters, and regulatory proteins involved in a certain metabolic pathway are often clustered as an operon in the genome, i.e., transcribed as one single polycistronic mRNA and thus concertedly regulated at this level. Eukaryotic organisms, such as S. cerevisiae, usually lack these gene modules, and each coding sequence has its own upstream sequence for initiation (promoter) and a downstream sequence for termination of transcription (terminator). In the context of engineering complex metabolic pathways and regulatory networks of S. cerevisiae, the natural lack of polycistronic gene expression requires precisely coordinating transcription of each individual gene in terms of strength and timing [2]. Nevertheless, there have been attempts to obtain polycistronic gene expression in eukaryotic organisms, including the yeast S. cerevisiae, by using viral ribosome entry sequences [3].

A large body of knowledge exists regarding the initiation and control of transcription in S. cerevisiae due to the fact that this yeast has become a major eukaryotic model organism in fundamental research. Various well-characterized endogenous as well as synthetic promoters of different strength and regulation have been identified or developed in S. cerevisiae in order to precisely control gene expression at the transcriptional level. This review first introduces the general structure of S. cerevisiae promoters and the basic principle of the transcription initiation process. Afterwards, natural and modified promoters as well as promoter collections which have been used in yeast metabolic engineering are described, and methods to obtain promoters with tailor-made properties are also addressed. Finally, the review discusses the limitations of promoter modifications in the context of metabolic engineering.

2 Promoter Structure of Protein-encoding Genes in Saccharomyces cerevisiae

2.1 Structure of Yeast Core Promoters

In eukaryotes, transcription of protein-encoding genes is performed by RNA polymerase II. The complex process of transcription generally occurs in two steps. First, the basic transcription machinery, composed of RNA polymerase II and general (basal) transcription factors, is assembled at the core promoter, forming the pre-initiation complex (PIC). The core promoter responsible for basal transcription is located directly upstream of an open reading frame (Fig. 1). The polymerase is subsequently released from the PIC and starts gene transcription at the transcriptional start site (TSS) (Fig. 1). The general transcription factors remain bound at the core promoter and facilitate re-initiation of transcription of the same gene in consecutive rounds by recruiting free RNA polymerase II. Therefore, the core promoter is indispensable for significant transcription of any gene. For a comprehensive review on transcription initiation in S. cerevisiae, the reader is referred to Hahn and Young [4]. In general, initiation at the TSS can occur either at a single nucleotide, referred to as focused transcription initiation, or within a broader cluster of 50–100 nucleotides containing several weak start points, referred to as dispersed transcription initiation [5]. Focused and dispersed transcription initiation correlate with the two distinct types of promoters commonly found in yeast [6]; i.e., focused transcription is typically associated with regulated promoters, whereas dispersed transcription is commonly found in constitutive promoters [5].

Fig. 1 Frequent promoter motifs of (a) metazoan and (b) Saccharomyces cerevisiae promoters. The metazoan promoter contains several conserved motifs, including the TATA box, the transcriptional start site (TSS; +1) at the initiator (INR), recognition elements of the transcription factor IIB (BRE elements), and downstream regulatory elements, such as the motif 10 element (MTE) and the downstream promoter element (DPE). In contrast to metazoan promoters, only the TATA box has been identified as a similarly conserved motif, in about 20 % of all S. cerevisiae core promoters. The cis-acting elements in S. cerevisiae, which are recognized by specific transcription activators or repressors, are located further upstream of the core promoter in the upstream activator or repressor sequences (UAS/URS)

Both RNA polymerase II and general transcription factors occupy at least 60 bp of the core promoter DNA [4]. In fact, they interact with several core promoter motifs, which are located from around 40 bp upstream to 40 bp downstream, relative to the TSS. In general, core promoters contain several functional DNA motifs, referred to as core promoter elements (Fig. 1). Metazoan core promoter elements include (1) the TATA box located at −30 relative to the TSS; (2) the initiator (INR) element located at, or immediately adjacent to, the TSS; (3) the TFIIB recognition element (BRE) immediately flanking the TATA box; as well as (4) two further elements located downstream of the TSS, i.e., the downstream promoter element (DPE) centered at +30 and the motif 10 element (MTE) (Fig. 1). Notably, most of these core promoter motifs are degenerated in S. cerevisiae, and their occurrence is remarkably variable in different promoters. In fact, out of the above metazoan core promoter motifs, only the TATA motif is clearly discernible in approximately 20 % of all yeast genes [7, 8]. The location of the TATA box in S. cerevisiae is variable between 40 and 120 bp upstream of the TSS, while it is fixed at 25–30 bp upstream of the transcription start site in metazoan promoters (Fig. 1). The TATA box synergistically acts together with the TSS to determine the direction of the transcription. As comprehensively reviewed by Rando and Winston [9], the TATA box is found in regulated S. cerevisiae promoters, i.e., those which are highly controlled by external and internal conditions. This is in contrast to constitutive promoters found upstream of housekeeping genes, whose expression level is basically the same even under different environmental conditions. According to the authors, genes with and without TATA box are referred to as stress and growth genes, respectively. Interestingly, those S. 
cerevisiae promoters which carry a TATA box are associated with an atypical chromatin structure, exhibiting dense occupation of the core promoter by nucleosomes instead of the usual nucleosome depletion at the TSS in constitutive promoters. Thus, yeast promoters with a TATA box show chromatin structure-dependent expression, resulting in a higher expression variability compared to TATA box-less promoters [10]. Transcription is most commonly initiated at an adenosine site within the TSS, referred to as the “+1” position. A frequently appearing TSS motif was identified with the consensus sequence “PuPuPyPuPu”—i.e., a pyrimidine nucleotide (Py) flanked on either side by two purine nucleotides (Pu) [11, 12]. Other not yet identified TSS motifs might exist: this is supported by the fact that promoters deficient in the latter motif only show slightly reduced or even unaffected expression strength. As a consequence of variability in TSS motifs, the transcription process in S. cerevisiae might even start at several heterogeneous positions within the core promoters [11]. Other frequently appearing motifs in the core promoter in S. cerevisiae are AT-rich sequences, which usually consist of short

Promoters for Engineering of Saccharomyces cerevisiae


repetitive nucleotide sequences of two to six nucleotides, referred to as tandem repeats. However, only one-fourth of all yeast core promoters contain such poly(dA-dT) motifs, which are located about 20–100 nucleotides upstream of the TSS [10]. These repeats have been shown to be very unstable; i.e., they mutate at a frequency which is much higher than the average natural mutation rate [13, 14]. Mutations frequently change the number of repeats, which increases or decreases the length of the entire repeat region. As addressed by Tirosh et al. [10], the number of tandem repeats present in a given yeast core promoter affects the density and positioning of nucleosomes in promoter regions relevant for transcription. Close or distant positioning of nucleosomes clearly changes the accessibility of a motif for transcription factors. Therefore, insertion or deletion of these tandem repeats is a way to modify promoter strength and characteristics.

2.2 Upstream Cis-Acting Promoter Elements

Additional regulatory sequences may be located upstream of the S. cerevisiae core promoter. These sequences enable gene-specific transcriptional regulation in response to certain stimuli, such as changes in external conditions, and are referred to as cis-regulatory elements. These elements are short specific DNA motifs, typically 10–30 bp in length. They represent binding sites for transcription factors which are not part of the PIC but act as regulators for either activation or repression of transcription in a gene-specific manner depending on the conditions. Transcriptional activators (TAs) activate gene expression by recruiting the transcriptional machinery [15]. In contrast, transcriptional repressors (TRs) maintain chromatin structure in a repressive state when bound to the cis-regulatory element, thereby preventing transcription and the binding of activators [4]. TAs and TRs bind to specific upstream activation sequences (UAS) or upstream repression sequences (URS), respectively. These binding sequences are predominantly located within nucleosome-depleted regions of yeast promoters or are exposed at the surface of nucleosomes for easier accessibility [4]. Genes which are concertedly regulated by a particular signal often have the same specific upstream binding motif and are regulated by the same transcriptional regulator. The majority of upstream cis-regulatory sites are located between 100 and 500 bp upstream of the TSS. In some rare cases, the distance between a UAS or a URS and the TSS can be up to 1,400 bp [16]. Some S. cerevisiae promoters have a very simple architecture and contain only a single cis-regulatory element, enabling activation or repression in response to one specific stimulus. However, more often, several different UASs and URSs are arranged next to each other, allowing for a more complex regulation by various stimuli, involving multiple TAs and TRs [17]. Particular motifs might also be present in multiple copies. Copy number variation in cis-regulatory promoter elements possibly allows for a graded transcriptional response [17].
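The positional statements above (most cis-regulatory sites lying 100–500 bp upstream of the TSS, sometimes in several copies) can be illustrated with a short Python sketch that lists the copies of a candidate binding motif within that window. This is only a minimal illustration: the function name, the toy sequence, and the choice of the Gal4 consensus CGG-N11-CCG as the example motif are assumptions made here and are not taken from the studies cited in this section.

```python
import re

def find_upstream_motifs(promoter, motif_regex, tss_index, near=100, far=500):
    """Return motif start positions relative to the TSS (negative = upstream)
    for matches lying entirely between `far` and `near` bp upstream of the TSS."""
    start = max(0, tss_index - far)
    end = max(0, tss_index - near)
    window = promoter[start:end]
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + motif_regex + "))")
    return [start + m.start() - tss_index for m in pattern.finditer(window)]

# Assumed example motif: Gal4-type site, CGG followed by 11 arbitrary bases and CCG.
GAL4_LIKE = "CGG[ACGT]{11}CCG"

# Toy promoter (hypothetical sequence): three sites embedded in poly-T filler,
# with the TSS taken to be the position directly after the last base.
site = "CGG" + "A" * 11 + "CCG"
promoter = "T" * 200 + site + "T" * 83 + site + "T" * 250 + site + "T" * 33

positions = find_upstream_motifs(promoter, GAL4_LIKE, tss_index=len(promoter))
print(len(positions), "copies in the -500 to -100 window at", positions)
```

In this toy sequence the third site lies only 50 bp upstream of the assumed TSS and is therefore not reported. Real promoter analyses would of course work on annotated genomic sequence and on both strands.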


Georg Hubmann et al.

Many conserved cis-regulatory motifs have been identified in S. cerevisiae by classical genetic approaches, such as specific reporter plasmids [18], or by in silico searches for conserved motifs in promoter regions across the genome sequences of several yeast species [19, 20]. Recently, promoter motif detection by bioinformatics, combined with chromatin immunoprecipitation (ChIP) sequencing data [21, 22], has allowed the establishment of a detailed genome map of binding sites for specific TAs and TRs in yeast, which is accessible at http://fraenkel.mit.edu/yeast_map_2006/ [17].

2.3 Bidirectional Promoters

Often, two neighboring genes transcribed in opposite directions are regulated by a single promoter which is active in both directions. This allows for the relatively high gene density in the yeast genome. The most prominent example of a bidirectional promoter is the GAL1/GAL10 promoter. Approximately 600 bp lie between the TSS of GAL1 and the TSS of GAL10, and the expression of both genes is coordinately induced in the presence of galactose but repressed as long as glucose is present [23, 24]. A GC-rich sequence located almost in the center between the two genes is the binding site for the TA Gal4, which is responsible for the bidirectional transcriptional activation induced by galactose. This regulatory region is flanked by AT-rich regions. Recent investigations by Xu et al. [25] and Neil et al. [26] found that bidirectionality is a widespread phenomenon in yeast promoters.

3 Natural Endogenous and Heterologous Promoters for Gene Expression in S. cerevisiae

Several promoters harboring cis-acting promoter elements of the yeast S. cerevisiae are well characterized in terms of regulation and expression strength under diverse growth conditions. These endogenous promoters have represented valuable tools and found wide application in metabolic engineering of S. cerevisiae. The homologous promoters most often used are discussed below. Heterologous promoters can also be useful in yeast metabolic engineering, and a few examples are mentioned.

3.1 Constitutive Promoters

Constitutive promoters are frequently used in metabolic pathway engineering, particularly if the desired product can or should be produced throughout the entire production process covering all phases of microbial growth. Constitutive gene expression is most frequently found for housekeeping genes, for example genes encoding enzymes of central catabolic pathways such as glycolysis, structural proteins such as actin, or proteins involved in other pivotal cellular processes such as transcription and translation. The transcription of these genes is largely independent of environmental conditions and growth phase; i.e., strong


induction or repression is absent, and expression is generally assumed to occur at a relatively constant level. Nevertheless, it has to be noted that the promoters have often been tested only under a limited number of conditions, and fluctuations in expression might still occur when different environmental conditions are applied.

Several endogenous, constitutive yeast promoters are well characterized in terms of relative activity. Taken together, these natural S. cerevisiae promoters cover a wide range of activities, i.e., from low to very high expression strength [27–31], and per se represent useful tools for metabolic engineering. Many commonly used constitutive promoters in S. cerevisiae originate from genes encoding enzymes of the yeast glycolytic pathway, for instance the promoters of the genes encoding phosphoglycerate kinase (PPGK1), pyruvate decarboxylase (PPDC1), triosephosphate isomerase (PTPI1), alcohol dehydrogenase (PADH1), glyceraldehyde 3-phosphate dehydrogenase (PTDH3; also often referred to as PGPD), and pyruvate kinase (PPYK1) [32]. Other frequently used constitutive promoters for S. cerevisiae are derived from genes encoding a translation elongation factor (PTEF1), a cytochrome c isoform (PCYC1), actin (PACT1), or a hexose transporter (PHXT7).

The relative strength of these constitutive promoters has been determined by several authors [27–31]. Notably, the cited studies differ considerably with regard to the experimental setup. For example, some studies used episomal (multicopy) plasmids for expression studies of promoter/reporter constructs, while others integrated the constructs (in a single copy) into the genome. Moreover, the reporter proteins chosen as readout for promoter activity as well as the growth conditions varied. These experimental differences hamper a reliable comparison of the obtained results and may well explain discrepancies between them. Nevertheless, one can deduce general tendencies from these studies, particularly with regard to the relative expression strength of the tested promoters, as recently reviewed by Da Silva and Srikrishnan [32]; the former studies have also been included in Table 1.

A more comprehensive characterization of several constitutive promoters has recently been carried out by Partow et al. [30] and Sun et al. [31]. Both studies constitute a very valuable source of knowledge for choosing promoters in yeast metabolic engineering. Partow and co-workers [30] compared the expression strength of different promoters using the lacZ gene as a reporter, i.e., β-galactosidase activity as a readout. The authors used an integrative plasmid and compared seven different constitutive yeast promoters (PADH1, PHXT7, PPGK1, PTPI1, PPYK1, PTDH3, and PTEF1) with regard to their expression strength during different growth phases. Generally, the activity of most promoters was dependent on the carbon source. When consuming glucose in batch cultivation, the relative strength of the tested promoters was as follows:

PTEF1 ~ PPGK1 ~ PTDH3 > PTPI1 ~ PPYK1 > PADH1 > PHXT7


Table 1 Relative strengths of often used constitutive Saccharomyces cerevisiae promoters

Promoter   Promoter strength   References
ACT1       ++                  [28]
ADH1       +                   [28, 30, 31, 99, 100]
CYC1       +                   [99]
ENO2       ++                  [31, 100]
FBA1       ++                  [31]
GPM1       ++                  [31]
HXT7       +                   [30, 31, 100]
PDC1       ++                  [31, 100]
PGK1       ++                  [28, 30, 31, 100]
PGI1       +                   [31]
PYK1       ++                  [30, 31, 100]
TDH2       ++                  [31]
TDH3       ++                  [30, 31]
TEF1       +++                 [30, 31]
TEF2       ++                  [31]
TPI1       ++                  [30, 31, 100]

Their approximate strength as measured in glucose-grown cells is categorized as high (+++), intermediate (++), or low (+). However, the reader should consider the variable experimental conditions used in the underlying studies, as discussed in the text
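For readers who wish to use Table 1 programmatically, e.g., when drawing up a panel of promoters with graded strength, the categories can be captured in a small lookup. The following Python sketch simply transcribes the table; the dictionary and function names are hypothetical, and the categories inherit all the caveats stated in the table footnote.

```python
# Strength categories transcribed from Table 1 (glucose-grown cells):
# "+++" = high, "++" = intermediate, "+" = low.
TABLE1_STRENGTH = {
    "ACT1": "++", "ADH1": "+", "CYC1": "+", "ENO2": "++",
    "FBA1": "++", "GPM1": "++", "HXT7": "+", "PDC1": "++",
    "PGK1": "++", "PGI1": "+", "PYK1": "++", "TDH2": "++",
    "TDH3": "++", "TEF1": "+++", "TEF2": "++", "TPI1": "++",
}

def promoters_by_strength(category):
    """Return the promoters of Table 1 that fall into the given category."""
    if category not in {"+", "++", "+++"}:
        raise ValueError("category must be '+', '++' or '+++'")
    return sorted(name for name, strength in TABLE1_STRENGTH.items()
                  if strength == category)

for category in ("+++", "++", "+"):
    print(category, promoters_by_strength(category))
```

Such a lookup is only a starting point for choosing candidates; as stressed in the text, the actual expression level should be verified under the conditions of interest.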

After glucose depletion, i.e., when cells entered a fully respiratory metabolism and used the previously formed ethanol as the main source of carbon, the expression strength of some of the tested promoters changed considerably, resulting in a different order of promoters with regard to their relative strength:

PTEF1 ~ PHXT7 > PPGK1 > PTPI1 ~ PTDH3 > PPYK1 ~ PADH1

The strong impact of the metabolic mode (fermentative or respiro-fermentative versus respiratory) on the activity of several of these promoters is consistent with the results of other studies [27–29, 31] and indicates glucose-dependent regulation. The strongest differences were observed for PHXT7. These results substantiate our previous statement that not all promoters which are usually considered constitutive can be expected to give exactly the same level of expression under all experimental conditions or metabolic modes. Sun et al. [31] cloned and characterized


14 different constitutive promoters including those tested by Partow et al. [30]. All promoters were placed onto an episomal 2μ plasmid and controlled the expression of the green fluorescent protein (GFP), which was used here as a reporter. The expression strength of the various promoters was tested under glucose- and oxygen-limited and non-limited conditions. Independent of the carbon source and oxygen availability, PTEF1 was consistently highly active. Other promoters, such as PCYC1 and PADH1, showed consistently low activity, whereas a rather modest expression strength was found, for instance, for PPYK1 and PTDH3. Hence, the endogenous constitutive promoters characterized in these two studies [30, 31] already cover a wide dynamic range of expression strength. The available information from these and the abovementioned older studies about the relative strength of often used S. cerevisiae promoters (when measured in glucose-grown cells) has been collected, and the promoters have been grouped into three rough categories (Table 1). Nevertheless, to find the optimal expression strength for a gene in a metabolic engineering approach, it is best to test different promoters. This strategy has been used, for instance, by Lu and Jeffries [33] to optimize xylose fermentation. They succeeded in obtaining optimal expression of all genes encoding the metabolic pathway enzymes relevant for xylose utilization simply by shuffling different homologous constitutive promoters.

3.2 Regulated Promoters

High constitutive expression of genes in a pathway is rather counterproductive when the product of interest or a pathway intermediate is toxic to the cells or when severe re-routing of central metabolic pathways is a burden to the cell. In such cases, production needs to be separated from cell growth, and the use of regulated promoters to control the expression of one or more pathway genes has usually been the method of choice. Ideally, such promoters control gene expression like an on–off switch, so that genes can be induced or repressed in a targeted manner, i.e., in response to a specific stimulus. Promoter regulation usually occurs at the cis-regulatory elements through recruitment of transcriptional activator(s) and/or release of transcriptional repressor(s). Whether a transcription factor is able to bind to its binding site in the promoter depends on changes of its conformation, which can be triggered by several molecular events in response to specific stimuli. Phosphorylation by a kinase and binding of an effector (small molecule) are only two well-known possibilities. This topic is beyond the scope of this review, and the reader is referred to other publications such as Campbell et al. [34]. The use of regulated promoters in metabolic pathway engineering allows the activity of the target pathway to be induced at a certain time point during the production process. The induction


Table 2 Regulated promoters of Saccharomyces cerevisiae

Promoter source     Regulatory stimulus (general)   Induction               Repression    Reference
CUP1                Chemical                        Addition of Cu2+        -             [29, 48, 52, 101]
GAL1/GAL7 & GAL10   Chemical                        Galactose               Glucose       [29, 52, 101]
DAN1                Physical                        Oxygen depletion        -             [95]
ADH2                Chemical                        Glucose depletion       Glucose       [52, 101]
MET3 & MET25        Chemical                        Absence of methionine   Methionine    [29]
PHO5                Chemical                        Phosphate depletion     -             [32]
Tet-off             Chemical                        -                       Doxycycline   [77]
Tet-on              Chemical                        Doxycycline             -             [78]
Dual Tet-on         Chemical                        Doxycycline             -             [79]
Dual Tet-off        Chemical                        -                       Doxycycline   [79]

can be coupled to a certain growth phase or external stimulus. In S. cerevisiae, multiple regulated endogenous and heterologous promoters have been used for metabolic pathway engineering, including heterologous protein production [32, 35], and the most frequently used regulated promoters are summarized in Table 2.

One way to purposefully regulate gene expression at the transcriptional level is the use of chemical stimuli. For example, specific nutrients can be used as chemical stimuli when added to the growth medium. Other stimuli can be generated by the accumulation of cellular metabolites or products during the course of cellular growth and fermentation, or inducer molecules can be added to the medium. The most prominent examples of regulated gene transcription in S. cerevisiae, i.e., the well-characterized glucose-repressed and galactose-inducible promoters of the genes GAL1, GAL7, and GAL10, fall into this class. The respective genes encode the enzymes of the Leloir pathway for galactose dissimilation. The corresponding promoters are tightly regulated and show strict repression in the presence of glucose and strong induction in the presence of galactose, with the latter being nullified by the presence of even low levels of glucose [36]. The GAL genes are regulated by the transcriptional activator Gal4. In the absence of galactose, Gal4, which is constitutively bound to a specific UAS in the promoters, is inhibited by a second, galactose-responsive protein factor, Gal80, which interacts with Gal4 and represses the activation process. The repression of the GAL promoters is relieved by the galactokinase homologue Gal3, which has lost its phosphorylation activity, binds galactose and ATP, and


interacts with Gal80 and Gal4. The complete galactose-regulatory system or particular parts, including the core promoter and the UAS, were further optimized [37–39], and the regulatory motif has very often been used for the construction of sophisticated regulatory systems, as described in the next section. Natural galactose-regulatory systems have also found wide application in metabolic engineering, including the production of genistein [40], amorpha-4,11-diene [41], and n-butanol [42].

Besides galactose, other sugars, such as maltose [43] or sucrose [44], are known to specifically induce the expression of particular genes encoding enzymes involved in maltose or sucrose dissimilation, respectively. However, glucose repression overrules induction by the respective sugars, similar to the situation in the natural GAL promoters, limiting the applicability of these promoters in mixed substrates. In particular, the use of non-food (waste) renewable feedstocks in industrial biotechnology demands regulatory systems that are applicable in the microbial production host and work properly even in growth media of complex composition.

Another commonly used regulated promoter is PCUP1. The CUP1 gene encodes a metallothionein, which is upregulated in response to copper ions to maintain cellular copper homeostasis [45, 46]. The expression of CUP1 is regulated by the Cup2 TA, which is converted into its active state in the presence of Cu2+ ions [45]. This copper-activated TA binds specifically to a regulatory UAS of PCUP1 and activates the expression of the metallothionein. Copper-induced gene expression systems have been used frequently for heterologous protein production [47, 48] and, to a lesser extent, in metabolic engineering, such as for the production of methyl benzoate [49] and 1,2-propanediol [50].

The induction or the repression of genes might occur not only upon addition of a chemical but also upon depletion of certain nutrients. The lack of phosphate induces the expression of the starvation-responsive acid phosphatase Pho5, which leads to the recovery of phosphate from phosphate-containing compounds in the medium [32]. PHO5 gene expression is inhibited in the presence of high inorganic phosphate concentrations. Upon depletion of phosphate, the two TAs, Pho4 and Pho2, bind to PPHO5 and activate transcription of the downstream-located gene. Other frequently used promoters which are responsive to the depletion of a nutrient are the methionine-repressed promoters PMET25 [29] and PMET3 [51] and the glucose-repressed promoter PADH2 [52]. Their particular advantage is that gene expression is turned on after the specific nutrient has been depleted from the medium through usage by the cells (self-inducing promoters). Hence, no expensive and/or toxic inducer has to be added, which is attractive for the control of product formation, particularly when it comes to large-scale commercial processes. Other examples of natural self-inducing promoters are those which control conditional flocculation [53] or the induction of tolerance to certain process-induced stress types [54].
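The chemical stimuli discussed so far can be summarized as a simple truth table. The Python sketch below is a deliberately idealized Boolean picture (promoter fully on or fully off) of the induction and repression conditions stated in the text and in Table 2; the class and function names are hypothetical, and basal expression, graded responses, and induction kinetics are ignored.

```python
from dataclasses import dataclass

@dataclass
class Medium:
    """Presence (True) or absence (False) of selected medium components."""
    glucose: bool = True
    galactose: bool = False
    copper: bool = False
    methionine: bool = True
    phosphate: bool = True

def promoter_states(medium):
    """Idealized on/off state of regulated promoters in a given medium."""
    return {
        # Galactose induces, but glucose repression overrules induction.
        "GAL1": medium.galactose and not medium.glucose,
        # Cu2+ converts the activator Cup2 into its active state.
        "CUP1": medium.copper,
        # Self-inducing promoters: switched on once a nutrient is depleted.
        "ADH2": not medium.glucose,
        "MET3": not medium.methionine,
        "MET25": not medium.methionine,
        "PHO5": not medium.phosphate,
    }

# Example: glucose has been consumed and galactose is present.
print(promoter_states(Medium(glucose=False, galactose=True)))
```

The example makes one practical consequence visible: once glucose is exhausted, both the galactose-induced and the glucose-repressed promoters are switched on in this idealized picture, so stimuli need to be chosen with possible overlap in mind.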


In general, gene transcription can also be specifically induced by physical stimuli, such as changes in temperature, (osmotic) pressure, light exposure, or oxygen availability. Certain physical stimuli are attractive for implementation in large-scale production processes, particularly if they do not require sophisticated equipment. A temperature-regulated promoter system was developed based on artificial thermosensitivity of mating-type regulation in S. cerevisiae [55]. Specifically, a temperature-sensitive mutation in the SIR3 gene renders the silencing protein Sir3 inactive at 35 °C and active at 25 °C. Inactivation of Sir3 leads to the synthesis of several mating-type regulatory proteins, including the Matα2 repressor, which represses the transcription of Mata-specific genes in haploid Mata strains by binding to a known UAS sequence of 31 bp [56]. In the temperature-sensitive sir3 mutants, expression of the Matα2 repressor is temperature dependent. Therefore, specific target genes cloned downstream of a promoter with this 31 bp UAS sequence are repressed at 35 °C, because the inactivation of Sir3 allows for the expression of the Matα2 repressor [55]. At the permissive temperature, 25 °C, Sir3 is active and therefore inhibits expression of the Matα2 repressor, hence enabling the expression of genes controlled by the Matα2 system [55, 57, 58]. Despite the attractiveness of induction by a simple temperature downshift, the system has been shown to respond slowly, with a delay of several hours after the shift [59].

Promoters regulated by changes in oxygen availability would also potentially be attractive tools for regulating gene expression in large-scale processes. Depletion of oxygen is known to induce expression of the DAN/TIR genes encoding yeast cell wall mannoproteins [60]. The regulation of these genes occurs through several cis-regulatory sites. Cohen and co-workers [61] identified and tested two regulatory promoter elements in the promoter PDAN1, which activate DAN1 expression under anaerobic conditions. A third region regulates oxygen-dependent repression of DAN1 [61]. Recently, this system has been subjected to directed promoter evolution, as described in Subheading 4.3.

3.3 Heterologous Promoters

Promoters from organisms other than S. cerevisiae have rarely been used when constructing metabolic pathways and regulatory networks in this organism. However, there are particular advantages in using nonhomologous promoters which are active in yeast, particularly when multiple gene expression cassettes are applied in an engineering approach. In fact, a high DNA sequence identity between the promoter within an expression cassette and a natural yeast promoter in the yeast genome could cause false homologous integrations into the genome and/or increase the risk of instabilities within the genome of engineered yeast due to homologous recombination. The sequences of heterologous promoters generally diverge more and can circumvent the problem of sequence similarity [2]. In particular, DNA sequence similarities should be


avoided when stably integrating gene cassettes into selected target sites in the genome based on homologous recombination [62]. A prominent example of a heterologous promoter is the PTEF1 of Ashbya gossypii, which is commonly used to drive the expression of selectable genetic markers within gene deletion or integration cassettes. The use of this heterologous promoter avoids integration of the cassette at the position of the endogenous PTEF1 promoter [62, 63].

The use of heterologous promoters provides another major advantage. If the heterologous promoter is inducible by a stimulus that is nonnatural for the yeast S. cerevisiae, there is a chance that regulation of this promoter does not interfere with the existing yeast transcriptional network. Such promoters usually derive from bacterial, viral, or phage species and ensure a more predictable behavior at the whole-cell level than endogenous promoters [2]. One example of a functional viral promoter is the human cytomegalovirus promoter (PCMV), which is a strong constitutive promoter in several eukaryotic cells [64]. It is generally assumed that this viral promoter is not influenced by the yeast’s own transcriptional network, and it has been used for strong expression of genes in S. cerevisiae [65]. As first demonstrated in human cells, PCMV is induced if the cells are exposed to environmental stresses [66]. The induced expression of genes regulated by PCMV is probably stimulated by the mitogen-activated protein kinase p38, which has the ortholog Hog1 in S. cerevisiae. Recent investigations by Romero-Santacreu and co-workers [65] have shown that PCMV in yeast is indeed upregulated through Hog1, which is activated under osmotic stress [65]. These results again confirm that expression from so-called constitutive promoters can vary depending on the experimental conditions or the metabolic mode. Moreover, the viral PCMV does not even seem to be completely independent of the yeast transcriptional network and can be subject to regulation under certain conditions. Hence, the regulation characteristics of heterologous promoters should be carefully analyzed before use.

Besides A. gossypii PTEF1 and viral PCMV, there might be even more heterologous promoters with potential applications in yeast engineering. Often, specific regulatory motifs of heterologous promoters, such as TA- and/or TR-binding sites, have been used for the construction of hybrid promoters for controlling gene expression in S. cerevisiae. A variety of examples of such hybrid promoters are discussed in the following section.

4 Promoter Engineering: Generating Promoters with Tailor-Made Properties

All methods which modify promoter sequences with the goal to optimize and/or customize their characteristics, such as strength and regulation, are collectively referred to as promoter engineering. Promoter engineering can be performed in a rational or an


undirected way. Rational methods comprise the simple deletion of promoter motifs, the amplification of certain transcription factor-binding sites, and the combination of well-characterized structural promoter motifs from different endogenous or heterologous promoters [67–69]. Nonrational methods include random mutagenesis of whole natural promoters or of specific parts of natural, chimeric/hybrid, or synthetic promoters. Several examples of successful applications of promoter engineering are discussed below.

The reasons why promoter engineering has been applied in S. cerevisiae are manifold. With regard to metabolic engineering approaches, Blazeck and Alper [67] have emphasized three limitations of natural S. cerevisiae promoters which have motivated promoter engineering: (1) the isolation and characterization of a natural promoter with the desired properties can be tedious and genetic context specific, (2) isolated natural promoters only sample the continuum of gene expression at a few discrete points and may be plagued by disparate regulation patterns, and (3) isolated endogenous promoters are unable to maximize the true transcriptional capacity attainable within the host. This list can be further extended, particularly in view of regulatable promoters. In fact, most available natural promoters described in the previous section (4) show residual basal transcription in their repressed state (i.e., no tight regulation), (5) require expensive or toxic inducer molecules, or (6) are difficult to handle. It might also be a limitation of natural regulated promoters that (7) the inducer molecule causes pleiotropic effects in the cell. The latter is particularly relevant in the design of complex synthetic regulatory networks. Independence from existing networks is often called orthogonality and is most readily achieved by using and combining heterologous and endogenous promoters and parts thereof [2].

4.1 Modification of Endogenous Promoters

The modular structure of yeast promoters as a fusion of core promoters and further upstream located regulatory elements has facilitated the rational engineering of promoters. Strategies of rational promoter engineering have often been based on changing the number of inducible or repressible elements in the upstream promoter sequence. An example of a simple approach of rational promoter engineering has been the deletion of regulatory motifs from the 5′ end of promoters. This can lead to significant changes in the expression strength or the regulation of the promoter, as shown for PADH1 [70, 71]. The activity of the original PADH1 decreases when cells enter the ethanol-consuming phase after depletion of glucose or during growth on non-fermentable carbon sources. The removal of 1,100 bp (i.e., from −414 to −1,500 bp) resulted in constant PADH1 promoter activity until late in the ethanol-consuming phase but also caused a lag in reporter production in the


glucose-consuming phase [71]. A slightly shorter deletion (from −414 to −700 bp) fixed the latter problem [70].

More sophisticated strategies incorporated cis-acting elements in one or more copies from other homologous S. cerevisiae promoters. For example, the incorporation of multiple copies of either the same or different types of regulatory motifs upstream of a core promoter resulted in variable expression strength and/or changed regulatory properties of a promoter [72–75]. Blazeck et al. [72] used endogenous core promoters obtained from TDH3 (PTDH3), TEF1 (PTEF), CYC1 (PCYC1), and a shorter version of the LEU2 promoter (PLEUM), which naturally show very different activities. These core promoters were fused with variable numbers of copies of different UASs to enhance expression strength. This strategy was able to convert weak promoters such as PCYC1 and PLEUM into strong promoters. However, the activities of the latter never exceeded the level of a strong natural reference promoter such as PTEF or PTDH3. Notably, a similar strategy applied to core promoters with naturally high activities (PTEF and PTDH3) increased their activity by 40–50 %.

Other attempts to customize/optimize natural S. cerevisiae regulatory systems did not directly target promoter sequences. For example, Napp and Da Silva [37] reduced the inducer costs required for the GAL promoter system. They deleted the galactokinase gene GAL1 to prevent the consumption of the galactose used as the inducer of this promoter. The deletion makes the addition of high amounts of expensive galactose for promoter induction unnecessary. Another strategy aimed at fine-tuning the expression strength of the galactose-inducible system. Intermediate levels of PGAL1-dependent gene expression were achieved by controlling the intracellular concentration of the inducer galactose. The intracellular galactose concentration is dependent on the activity of the galactose permease Gal2. The expression of GAL2 itself was controlled at the transcriptional level by a chimeric promoter (see Subheading 4.2) using the tet-off system, which is tunable by the tetracycline concentration added to the medium [38, 39].

4.2 Chimeric/Hybrid Promoters

Chimeric/hybrid promoters are fusions of sections from different natural S. cerevisiae or heterologous promoters. Often, chimeric promoters for use in S. cerevisiae have been constructed by combining heterologous regulatory promoter motifs with homologous core promoters. An important driver for the development of chimeric/hybrid promoters has been the demand for gene regulatory tools which do not interfere with the natural S. cerevisiae transcriptional network through, e.g., erroneous activation of other promoters in the network or any other type of cross talk. In synthetic biology, this property is referred to as orthogonality, meaning that the regulation of a part of an artificial circuit should occur independently and all parts should act as new insulated wires [2].
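The modular view of a chimeric promoter, i.e., one or more copies of a regulatory element fused upstream of a core promoter, can be made concrete with a small data structure. The following Python sketch is purely illustrative: the part names, source promoters, and sequences are hypothetical placeholders and do not correspond to any published construct.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PromoterPart:
    name: str             # label of the part (hypothetical)
    role: str             # "regulatory" or "core"
    source_promoter: str  # promoter the part was taken from
    organism: str         # organism the part originates from
    sequence: str         # DNA sequence (placeholder)

def assemble_hybrid(regulatory, copies, core):
    """Fuse `copies` of a regulatory element upstream of a core promoter."""
    if regulatory.role != "regulatory" or core.role != "core":
        raise ValueError("expected a regulatory element and a core promoter")
    if copies < 1:
        raise ValueError("at least one copy of the regulatory element is required")
    return regulatory.sequence * copies + core.sequence

def is_hybrid(parts):
    """True if the parts stem from more than one source promoter."""
    return len({part.source_promoter for part in parts}) > 1

def uses_heterologous_part(parts, host="S. cerevisiae"):
    """True if at least one part originates from an organism other than the host."""
    return any(part.organism != host for part in parts)

# Hypothetical parts with placeholder sequences.
operator = PromoterPart("opX", "regulatory", "promoter X", "bacterium", "ACGTACGT")
core = PromoterPart("coreY", "core", "promoter Y", "S. cerevisiae", "TATAAAGG")

hybrid = assemble_hybrid(operator, 2, core)
print(hybrid, is_hybrid([operator, core]), uses_heterologous_part([operator, core]))
```

Keeping track of the origin of each part in this way also makes it easy to flag constructs whose parts share sequence with endogenous promoters, although such a check is not implemented here.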


One prominent example for such a chimeric regulatory system is the tet promoter system, which has originally been developed for mammalian and plant cells but is also functional in yeast [76]. It is based on the bacterial promoter tet operators, which are repressed in the presence of tetracycline [77]. These tet operator sequences have been fused to core promoter sequences and enable the repression of target gene expression in the presence of tetracycline. The constructed chimeric promoters contain two or seven tetO operators, referred to as tetO2 and tetO7 fused to the CYC1 or the GAL1 core promoter, respectively. The results showed that one tetO box was sufficient to completely turn off expression of the respective downstream gene by tetracycline, and the introduction of more tetO boxes had almost no additional effect on expression strength and regulation [77]. The residual expression strength of tet-off promoters is inversely dependent on the concentration of the effector molecule, i.e., tetracycline or the tetracycline analogue, doxycycline [77, 78]. While doxycycline concentrations higher than 1 μg/ml completely turn off gene expression, lower concentrations allow for some residual expression. Another tetracycline-dependent system, referred to as tet-on, has also been constructed. It is based on a mutated version of the transcription factor tetR referred to as tetR’. While tetR binds to the operator when tetracycline is absent, tetR’ only binds in its presence. This allows induction of target gene expression by the addition of the effector, and the system is also tuneable by the inducer concentration [79]. Another tuneable expression system used in controlling eukaryotic gene expression is the lac operator–repressor system, which originates from the bacterial genes required for lactose utilization. It is composed of a TR, which is bound to the cis-acting lacO operator, both together forming a genetic switch. 
In the presence of lactose (more precisely, its isomer allolactose) or its artificial substitute isopropyl β-D-thiogalactopyranoside (IPTG), the repressor undergoes a conformational change that releases the TR from its operator, enabling the transcription of the gene [80]. The lac operator–repressor system was first adapted for use in mammalian cells, in particular in transgenic mice, where it was used to regulate transcription [81]. The bacterial repressor gene was codon optimized to resemble a mammalian coding sequence, and the lacO operator needed to be included in the targeted promoter. Recently, Ellis and co-workers [82] have used this eukaryote-optimized version to construct a regulatory synthetic gene network to control the timing of yeast sedimentation. Hybrid promoters have also been constructed using regulatory sequences from human cells. For example, the constitutive yeast core promoter PPGK1 was fused to human androgen-responsive sequences [83]. The authors were able to control target gene expression by androgen and set the expression level within a 1,400-fold range without affecting normal cell growth.
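The dose dependence of the tet-off system described above (residual expression falling as the doxycycline concentration rises) can be pictured with a simple Hill-type repression curve. The following is a minimal sketch for illustration only: the Hill form itself, the half-repression concentration `k_half`, and the coefficient `hill_n` are assumptions chosen for this example and are not taken from the cited studies [77, 78].

```python
def tet_off_expression(dox_ug_per_ml, k_half=0.1, hill_n=2.0):
    """Relative expression (1.0 = fully on) of a tet-off promoter.

    Assumed Hill-type repression curve. k_half (ug/ml) and hill_n are
    hypothetical placeholders, not measured parameters.
    """
    if dox_ug_per_ml < 0:
        raise ValueError("concentration must be non-negative")
    return 1.0 / (1.0 + (dox_ug_per_ml / k_half) ** hill_n)


if __name__ == "__main__":
    # Print the assumed dose-response at a few doxycycline concentrations.
    for dose in (0.0, 0.01, 0.1, 1.0):
        level = tet_off_expression(dose)
        print(f"{dose:5.2f} ug/ml doxycycline -> relative expression {level:.3f}")
```

Note that a Hill curve only approaches zero and never reaches it, so in practice the parameters would have to be fitted to measured reporter data before such a curve could be used to choose an effector concentration.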

Another type of approach to generate systems for regulating target gene expression by nonnatural inducers is the use of chimeric transcription factors rather than chimeric promoters. For example, Shimizu-Sato et al. [84] developed a sophisticated light-responsive promoter system, which induces gene expression after exposure to red light and switches it off after exposure to far-red light. The system is based on a chimera of the homologous Gal4 transcriptional activator, which binds to a regulatory site at the promoter PGAL1. The Gal4 DNA-binding domain was fused to the plant photoreceptors phytochrome phyA and phyB. In darkness, both phytochromes are present in their inactive Pr conformers. Exposure to red light converts both phyA and phyB into their active Pfr conformers, which allows for binding of the plant transcriptional coactivator Pif3. The Pif3 regulator was fused to the Gal4 activation domain and allowed the induction of yeast gene transcription following the change in wavelength of light exposure [84]. The activation is reversible by conversion of the phytochromes from their Pfr back into their Pr conformers. This light-inducible promoter system is an exciting example of how regulatory elements of different kingdoms can be combined to meet the requirements of biotechnological applications, even though the practical use of this particular system in large-scale processes is questionable. Another regulatory system has been based on a chimeric version of the human estrogen receptor, the viral protein 16 transcriptional activation domain (VP16), and the DNA-binding domain of Gal4 [85, 86]. Upon addition of the inducer hormone β-estradiol, the chimeric transcription factor can bind to promoters containing the Gal4-binding sequences and hence activate transcription. The system acts fast, resulting in readily detectable transcription within 5 min after addition of the inducer [86].
Importantly, the inducer β-estradiol hardly induces the expression of any other gene and thus allows a very specific gene induction in S. cerevisiae without changing nutrients or temperature. Moreover, the expression strength is graded and depends on the inducer concentration. With the advent of synthetic biology, the term synthetic promoter emerged. In the view of synthetic biology, the behavior of a synthetic promoter should be deducible from its individual motifs in order to reliably predict its behavior within the respective biological system [87]. This strategy has been hampered to a certain extent by the fact that the motifs of S. cerevisiae core promoters are diverse and knowledge about structure–function relationships is still limited. Nevertheless, there have been attempts towards providing synthetic promoters. Several authors combined sequence motifs from known promoters and filled the spacer regions with random sequences. Afterwards, they searched for those promoter versions which fulfilled specific requirements [68, 69, 73, 82]. We discuss this approach (saturation mutagenesis) in Subheading 4.3. Although these studies provided useful promoters with novel characteristics

and might be considered as a first step towards synthetic promoters, we believe that a clear definition of a synthetic promoter is still lacking. In fact, the design and construction of those promoters have been based on the body of knowledge which exists for homologous and heterologous promoters. In this context, any hybrid promoter simply combining two or more promoter sections with well-known characteristics could be considered as a synthetic promoter. Extensive synthetic metabolic networks in S. cerevisiae will require many independent regulatory systems with predictable characteristics. In this context, it is relevant to mention a novel type of transcription factor. Plant pathogens produce the so-called transcription activator-like effectors (TALEs), which act as eukaryotic transcriptional activators in the plant cell nucleus, where they directly bind to DNA via a central domain of tandem repeats [88, 89]. The most interesting feature of TALEs is that they use a DNA-binding code that can be exploited to generate DNA-binding domains for any DNA target. Blount et al. [73] recently modified TALEs for the first time and designed them to act as orthogonal repressors for specific PFY1-based promoters. The core promoter originated from the constitutive PFY1 promoter because it shows almost no regulation and has no regulatory binding sites within its 100 bp core sequence. The ability to diversify a promoter at its core sequence and then independently target transcription activator-like orthogonal repressors (TALORs) to virtually any of these sequences shows great promise for the design and construction of future synthetic gene networks. For a comprehensive overview on synthetic networks in S. cerevisiae the reader is referred to Blount et al. [2].

4.3 Undirected Approaches to Customize Promoter Characteristics

In contrast to rational promoter construction, random methods to diversify promoter sequences, followed by selection of promoter mutant versions with desired properties, have the advantage of being almost independent of prior knowledge about sequence–function relationships. Random diversification has targeted either restricted promoter sections or entire natural promoters and aimed at either modifying the activity of a given promoter or customizing its regulatory properties. The first type of approach, referred to as saturation mutagenesis [67, 68], was initially developed for bacterial promoter systems [90, 91] and has later been successfully applied in yeast [69, 73, 82]. This technique usually starts with several known promoter motifs connected by degenerate oligonucleotides. While the conserved motifs ensure promoter function, the random bases surrounding them can modulate the motif efficiency and thus the promoter expression strength. Unlike the well-conserved core promoter motifs in bacteria, i.e., the −35 and −10 regions, S. cerevisiae promoters are more variable and less strict in their sequence, as described in a previous section. To find an ideal starting construct for S. cerevisiae, authors usually analyzed

numerous natural S. cerevisiae promoters and used the information for combining several promoter sections. Starting from such synthetic promoters, the saturation mutagenesis approach was successfully applied to yield several promoter libraries covering a wide range of expression strengths or regulatory properties [69, 73, 82]. A second approach has been used to diversify whole natural S. cerevisiae promoters or large sections of them by random mutations based on error-prone PCR [92, 93]. Mutated promoter versions with desired expression strength or regulatory properties have been selected out of the respective promoter libraries, which contain thousands of different mutated versions. This method requires a suitable reporter such as GFP and a high-throughput screening method such as fluorescence-activated cell sorting for testing multiple promoter versions and sorting them according to their characteristics. In S. cerevisiae, directed evolution was applied to the promoters of the TEF1 gene [92, 94] and the hypoxia-inducible DAN1 gene [95]. The mutagenesis of the TEF1 promoter aimed at obtaining a robust collection of promoters with finely graded activities for fine-tuning gene expression in S. cerevisiae. Most mutated versions of the natural PTEF1 showed reduced activity, and only rare promoter mutants showed an increase in promoter strength. The final selection of 11 PTEF1 mutants can be used to generate appropriate promoter replacement cassettes for controlling the expression level of any given gene in the S. cerevisiae genome [94]. This collection is attractive for finely adjusting metabolic fluxes in metabolic engineering. For example, two promoter mutants out of the abovementioned promoter collection exhibiting relatively low activities were used to reduce (but not abolish) glycerol production in S. cerevisiae.
Glycerol is an unwanted side product during ethanol production, but complete abolishment of its formation pathway has resulted in an osmo-sensitive phenotype and abolished anaerobic growth [96, 97]. The lower strength TEF1 promoter mutants allowed generating “intermediate” phenotypes [1] and made it possible to find a good compromise between low glycerol production, ethanol productivity, and osmotolerance. This provides an example of how fine-tuning of transcriptional activity can be used to find a compromise between a certain metabolic engineering target and fundamental cellular requirements such as energy and redox balance as well as stress tolerance. In order to obtain a spontaneously inducible system for the induction of gene expression upon depletion of oxygen during industrial fermentation processes, the DAN1 promoter was also subjected to error-prone PCR [95]. While the natural version can only be induced under fastidious anaerobiosis, as described in Subheading 3.2, two promoter mutant versions were isolated which allow induction just after consuming the residual oxygen in a bioreactor.
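The library-construction step of the saturation mutagenesis approach described in this section (conserved motifs joined by degenerate spacers) can be mimicked in silico. The sketch below is an assumption-laden illustration and not the procedure of the cited studies [69, 73, 82]: the function names, the two example motifs, and the spacer lengths are hypothetical placeholders.

```python
import random

BASES = "ACGT"


def random_spacer(length, rng):
    """Return `length` random bases, the in-silico analogue of an N stretch."""
    return "".join(rng.choice(BASES) for _ in range(length))


def build_promoter_library(motifs, spacer_lengths, n_variants, seed=0):
    """Generate promoter variants: fixed motifs separated by random spacers.

    spacer_lengths needs one more entry than motifs, because a spacer
    flanks every motif on both sides. All inputs are illustrative.
    """
    if len(spacer_lengths) != len(motifs) + 1:
        raise ValueError("need len(motifs) + 1 spacer lengths")
    rng = random.Random(seed)  # seeded so that a library can be regenerated
    library = []
    for _ in range(n_variants):
        parts = [random_spacer(spacer_lengths[0], rng)]
        for motif, gap in zip(motifs, spacer_lengths[1:]):
            parts.append(motif)                    # conserved element
            parts.append(random_spacer(gap, rng))  # randomized element
        library.append("".join(parts))
    return library


if __name__ == "__main__":
    # Hypothetical motifs used only to make the example concrete.
    example_motifs = ["TATAAA", "CACGTG"]
    for variant in build_promoter_library(example_motifs, [10, 20, 15], n_variants=3):
        print(variant)
```

Every variant keeps the motifs at fixed positions while the spacers differ, which is the property that the subsequent reporter-based screening step exploits.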

5 Limitations of Transcriptional Control in Metabolic Engineering

Transcriptional control has certainly been the most frequently used tool to adjust the level of a certain protein for metabolic engineering approaches. Nevertheless, there are limitations which cannot be overcome by appropriate promoter choice or engineering. In theory, limitations might occur at different molecular levels such as restricted availability of required transcription factors in their activated form, processivity of RNA polymerase, mRNA stability and transport, translational activity, and protein stability. For example, strong overexpression of a protein at the level of transcription will not result in increased protein activity if the protein itself is subject to regulation by inactivation. This is particularly true for transporter proteins, whose activity is often regulated by degradation via endocytosis depending on environmental conditions. Besides limitations which influence the concentration of certain proteins in a cell, further limitations might occur at the posttranslational level. Catalytic properties of enzymes are often dependent on posttranslational modifications, in particular phosphorylation, acetylation, or adenylation. A tricky situation might also occur if a specific protein requires a coenzyme or a prosthetic group whose cellular concentrations are limited. Last but not least, a limitation might also occur at the level of the metabolic flux itself. The reaction rate of an enzyme or a metabolic pathway is influenced by not only the catalyst (enzyme) concentration but also the concentration of the substrate(s) and product(s), as described by Michaelis–Menten kinetics. In fact, fluxes in metabolic pathways mainly depend on metabolite concentration rather than on catalyst concentration [98]. Hence, small changes in substrate concentrations might have more effect than any change in the expression of the catalyst itself.
Finally, the synthesized proteins have kinetic characteristics, such as the KM value and the turnover number (kcat, i.e., Vmax per unit of enzyme), which are independent of their concentration. In addition, the enzyme(s) catalyzing the first step(s) of a certain pathway is often subject to allosteric control, where the final product of the pathway acts as the allosteric effector (feedback inhibition). Simple overproduction of such enzymes will also not affect metabolic flux through the target pathway.
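The kinetic argument of this section can be made concrete with a small numerical sketch. It combines the Michaelis–Menten rate law with a deliberately simple feedback model that is an assumption of this example and not taken from [98]: the first step is inhibited by the end product P through a factor 1 / (1 + P / ki), and P is removed at a rate k_drain * P. All parameter values are arbitrary placeholders.

```python
import math


def mm_rate(enzyme, substrate, kcat, km):
    """Michaelis-Menten rate v = kcat * E * S / (KM + S)."""
    return kcat * enzyme * substrate / (km + substrate)


def feedback_flux(enzyme, substrate, kcat, km, ki, k_drain):
    """Steady-state flux of a toy pathway with end-product inhibition.

    Assumed model: the first step runs at mm_rate / (1 + P / ki) and the
    product P leaves at k_drain * P. Equating both rates gives
    (k_drain / ki) * P**2 + k_drain * P - mm_rate = 0, solved for P > 0.
    """
    uninhibited = mm_rate(enzyme, substrate, kcat, km)
    a = k_drain / ki
    product = (-k_drain + math.sqrt(k_drain ** 2 + 4.0 * a * uninhibited)) / (2.0 * a)
    return k_drain * product


if __name__ == "__main__":
    # Arbitrary illustrative parameters (dimensionless placeholders).
    params = dict(substrate=1.0, kcat=1.0, km=1.0)
    isolated = mm_rate(2.0, **params) / mm_rate(1.0, **params)
    coupled = (feedback_flux(2.0, ki=0.01, k_drain=1.0, **params)
               / feedback_flux(1.0, ki=0.01, k_drain=1.0, **params))
    print(f"doubling enzyme, isolated reaction: {isolated:.2f}-fold rate")
    print(f"doubling enzyme, with feedback:     {coupled:.2f}-fold flux")
```

In this toy model, doubling the enzyme amount doubles the rate of the isolated reaction but raises the steady-state pathway flux by clearly less than twofold once feedback inhibition is included. The model shows attenuation rather than complete insensitivity; how strongly a real pathway responds depends on its actual regulatory architecture.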

6 Conclusions

Methods to customize well-characterized promoters are without doubt highly valuable tools for metabolic engineering and synthetic biology despite the abovementioned limitations of transcriptional control. As summarized above, various well-characterized endogenous as well as modified promoters, which cover a wide

range of expression strengths and regulatory properties, are available for targeted gene expression in S. cerevisiae. The presented examples impressively show the possibility to successfully combine promoter elements from different kingdoms in order to obtain promoters with tailor-made characteristics, to customize the regulatory properties of natural S. cerevisiae promoters, to achieve expression strengths higher than provided by any natural yeast promoter, and to construct promoter libraries for fine-tuning of transcriptional activity. Still, there is demand for promoters which meet the requirements of specific metabolic engineering approaches: in particular, cost-efficient and well-defined regulatory systems applicable in large-scale yeast-based bioprocesses are of very high interest.

References

1. Hubmann G, Guillouet S, Nevoigt E (2011) Gpd1 and Gpd2 fine-tuning for sustainable reduction of glycerol formation in Saccharomyces cerevisiae. Appl Environ Microbiol 77(17):5857–5867. doi:AEM.05338-11 [pii] 10.1128/AEM.05338-11 2. Blount BA, Weenink T, Ellis T (2012) Construction of synthetic regulatory networks in yeast. FEBS Lett 586(15):2112–2121. doi:10.1016/j.febslet.2012.01.053 3. Edwards SR, Wandless TJ (2010) Dicistronic regulation of fluorescent proteins in the budding yeast Saccharomyces cerevisiae. Yeast 27(4):229–236. doi:10.1002/yea.1744 4. Hahn S, Young ET (2011) Transcriptional regulation in Saccharomyces cerevisiae: transcription factor regulation and function, mechanisms of initiation, and roles of activators and coactivators. Genetics 189(3):705–736. doi:189/3/705 [pii] 10.1534/genetics.111.127019 5. Juven-Gershon T, Kadonaga JT (2010) Regulation of gene expression via the core promoter and the basal transcriptional machinery. Dev Biol 339(2):225–229. doi:S0012-1606(09)01116-6 [pii] 10.1016/j.ydbio.2009.08.009 6. Struhl K (1986) Constitutive and inducible Saccharomyces cerevisiae promoters: evidence for two distinct molecular mechanisms.
Mol Cell Biol 6(11):3847–3853 7. Basehoar AD, Zanton SJ, Pugh BF (2004) Identification and distinct regulation of yeast TATA box-containing genes. Cell 116(5):699–709. doi:S0092867404002053 [pii] 8. Sugihara F, Kasahara K, Kokubo T (2011) Highly redundant function of multiple AT-rich sequences as core promoter elements in the TATA-less RPS5 promoter of Saccharomyces cerevisiae. Nucleic Acids Res 39(1):59–75. doi:gkq741 [pii] 10.1093/nar/gkq741 9. Rando OJ, Winston F (2012) Chromatin and transcription in yeast. Genetics 190(2):351–387. doi:10.1534/genetics.111.132266 10. Tirosh I, Barkai N (2008) Two strategies for gene regulation by promoter nucleosomes. Genome Res 18(7):1084–1091. doi:gr.076059.108 [pii] 10.1101/gr.076059.108 11. Mosch HU, Graf R, Braus GH (1992) Sequence-specific initiator elements focus initiation of transcription to distinct sites in the yeast TRP4 promoter. EMBO J 11(12):4583–4590 12. Yang C, Bolotin E, Jiang T, Sladek FM, Martinez E (2007) Prevalence of the initiator over the TATA box in human and yeast genes and identification of DNA motifs enriched in human TATA-less core promoters. Gene 389(1):52–65. doi:S0378-1119(06)00623-8 [pii] 10.1016/j.gene.2006.09.029 13. Lynch M, Sung W, Morris K, Coffey N, Landry CR, Dopman EB, Dickinson WJ, Okamoto K, Kulkarni S, Hartl DL, Thomas WK (2008) A genome-wide view of the spectrum of spontaneous mutations in yeast. Proc Natl Acad Sci U S A 105(27):9272–9277. doi:10.1073/pnas.0803466105 14. Lee TH, Maheshri N (2012) A regulatory role for repeated decoy transcription factor binding sites in target gene expression. Molecular systems biology 8:576. doi:10.1038/msb.2012.7 15. Ptashne M, Gann A (1997) Transcriptional activation by recruitment. Nature 386(6625):569–577. doi:10.1038/386569a0

16. Guarente L (1987) Regulatory proteins in yeast. Annu Rev Genet 21:425–452. doi:10.1146/annurev.ge.21.120187.002233 17. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA (2004) Transcriptional regulatory code of a eukaryotic genome. Nature 431(7004):99–104. doi:10.1038/nature02800 nature02800 [pii] 18. Struhl K (1989) Molecular mechanisms of transcriptional regulation in yeast. Annu Rev Biochem 58:1051–1077. doi:10.1146/ annurev.bi.58.070189.005155 19. Bailey TL, Williams N, Misleh C, Li WW (2006) MEME: discovering and analyzing DNA and protein sequence motifs. Nucleic Acids Res 34((Web Server issue)):W369– W373. doi:34/suppl_2/W369 [pii] 10.1093/ nar/gkl198 20. Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, Makeev VJ, Mironov AA, Noble WS, Pavesi G, Pesole G, Regnier M, Simonis N, Sinha S, Thijs G, van Helden J, Vandenbogaert M, Weng Z, Workman C, Ye C, Zhu Z (2005) Assessing computational tools for the discovery of transcription factor binding sites. Nat Biotechnol 23(1):137– 144. doi:nbt1053 [pii] 10.1038/nbt1053 21. Reid JE, Evans KJ, Dyer N, Wernisch L, Ott S (2010) Variable structure motifs for transcription factor binding sites. BMC Genomics 11:30. doi:1471-2164-11-30 [pii] 10.1186/1471-2164-11-30 22. Hu M, Yu J, Taylor JM, Chinnaiyan AM, Qin ZS (2010) On the detection and refinement of transcription factor binding sites using ChIP-Seq data. Nucleic Acids Res 38(7):2154– 2167. doi:gkp1180 [pii] 10.1093/nar/ gkp1180 23. Johnston M, Davis RW (1984) Sequences that regulate the divergent GAL1-GAL10 promoter in Saccharomyces cerevisiae. Mol Cell Biol 4(8):1440–1448 24. West RW Jr, Yocum RR, Ptashne M (1984) Saccharomyces cerevisiae GAL1-GAL10 divergent promoter region: location and function of the upstream activating sequence UASG. 
Mol Cell Biol 4(11):2467–2478 25. Xu Z, Wei W, Gagneur J, Perocchi F, Clauder-Munster S, Camblong J, Guffanti E, Stutz F, Huber W, Steinmetz LM (2009) Bidirectional promoters generate pervasive transcription in yeast. Nature 457(7232):1033–1037. doi:nature07728 [pii] 10.1038/nature07728

26. Neil H, Malabat C, d’Aubenton-Carafa Y, Xu Z, Steinmetz LM, Jacquier A (2009) Widespread bidirectional promoters are the major source of cryptic transcripts in yeast. Nature 457(7232):1038–1042. doi:nature07747 [pii] 10.1038/nature07747 27. DeMarini DJ, Carlin EM, Livi GP (2001) Constitutive promoter modules for PCRbased gene modification in Saccharomyces cerevisiae. Yeast 18(8):723–728. doi:10.1002/ yea.721 28. Monfort A, Finger S, Sanz P, Prieto JA (1999) Evaluation of different promoters for the efficient production of heterologous proteins in baker’s yeast. Biotechnology Letters 21 (3):225–229. doi: 10.1023/A:1005467912623 29. Mumberg D, Muller R, Funk M (1994) Regulatable promoters of Saccharomyces cerevisiae: comparison of transcriptional activity and their use for heterologous expression. Nucleic Acids Res 22(25):5767–5768 30. Partow S, Siewers V, Bjorn S, Nielsen J, Maury J (2010) Characterization of different promoters for designing a new expression vector in Saccharomyces cerevisiae. Yeast 27(11):955–964. doi:10.1002/yea.1806 31. Sun J, Shao Z, Zhao H, Nair N, Wen F, Xu JH (2012) Cloning and characterization of a panel of constitutive promoters for applications in pathway engineering in Saccharomyces cerevisiae. Biotechnol Bioeng 109(8):2082– 2092. doi:10.1002/bit.24481 32. Da Silva NA, Srikrishnan S (2012) Introduction and expression of genes for metabolic engineering applications in Saccharomyces cerevisiae. FEMS Yeast Res 12(2):197–214. doi:10.1111/j.1567-1364.2011.00769.x 33. Lu C, Jeffries T (2007) Shuffling of promoters for multiple genes to optimize xylose fermentation in an engineered Saccharomyces cerevisiae strain. Appl Environ Microbiol 73(19):6072–6077. doi:AEM.00955-07 [pii] 10.1128/AEM.00955-07 34. Campbell RN, Leverentz MK, Ryan LA, Reece RJ (2008) Metabolic control of transcription: paradigms and lessons from Saccharomyces cerevisiae. The Biochemical journal 414(2):177–187. doi:10.1042/BJ20080923 35. 
Maya D, Quintero MJ, de la Cruz M-CM, Chavez S (2008) Systems for applied gene control in Saccharomyces cerevisiae. Biotechnol Lett 30(6):979–987. doi:10.1007/ s10529-008-9647-z 36. Johnston M (1987) A model fungal gene regulatory mechanism: the GAL genes of Saccharomyces cerevisiae. Microbiol Rev 51(4):458–476

37. Napp SJ, Da Silva NA (1994) Enhanced productivity through gratuitous induction in recombinant yeast fermentations. Biotechnol Prog 10(1):125–128. doi:10.1021/bp00025a015 38. Hawkins KM, Smolke CD (2006) The regulatory roles of the galactose permease and kinase in the induction response of the GAL network in Saccharomyces cerevisiae. J Biol Chem 281(19):13485–13492. doi:M512317200 [pii] 10.1074/jbc.M512317200 39. Hawkins KM, Smolke CD (2008) Production of benzylisoquinoline alkaloids in Saccharomyces cerevisiae. Nat Chem Biol 4(9):564–573. doi:nchembio.105 [pii] 10.1038/nchembio.105 40. Katsuyama Y, Miyahisa I, Funa N, Horinouchi S (2007) One-pot synthesis of genistein from tyrosine by coincubation of genetically engineered Escherichia coli and Saccharomyces cerevisiae cells. Appl Microbiol Biotechnol 73(5):1143–1149. doi:10.1007/s00253-006-0568-2 41. Lindahl AL, Olsson ME, Mercke P, Tollbom O, Schelin J, Brodelius M, Brodelius PE (2006) Production of the artemisinin precursor amorpha-4,11-diene by engineered Saccharomyces cerevisiae. Biotechnol Lett 28(8):571–580. doi:10.1007/s10529-006-0015-6 42. Steen EJ, Chan R, Prasad N, Myers S, Petzold CJ, Redding A, Ouellet M, Keasling JD (2008) Metabolic engineering of Saccharomyces cerevisiae for the production of n-butanol. Microb Cell Fact 7:36. doi:1475-2859-7-36 [pii] 10.1186/1475-2859-7-36 43. Finley RL Jr, Zhang H, Zhong J, Stanyon CA (2002) Regulated expression of proteins in yeast using the MAL61-62 promoter and a mating scheme to increase dynamic range. Gene 285(1–2):49–57. doi:S0378111902004201 [pii] 44. Park YS, Shiba S, Lijima S, Kobayashi T, Hishinuma F (1993) Comparison of three different promoter systems for secretory alpha-amylase production in fed-batch cultures of recombinant Saccharomyces cerevisiae. Biotechnol Bioeng 41(9):854–861. doi:10.1002/bit.260410904 45.
Furst P, Hu S, Hackett R, Hamer D (1988) Copper activates metallothionein gene transcription by altering the conformation of a specific DNA binding protein. Cell 55(4):705–717. doi:0092-8674(88)90229-2 [pii] 46. Huibregtse JM, Engelke DR, Thiele DJ (1989) Copper-induced binding of cellular factors to yeast metallothionein upstream activation sequences. Proc Natl Acad Sci USA 86(1):65–69

47. Koller A, Valesco J, Subramani S (2000) The CUP1 promoter of Saccharomyces cerevisiae is inducible by copper in Pichia pastoris. Yeast 16(7):651–656. doi:10.1002/(SICI)1097-0061(200005)16:73.0.CO;2-F [pii] 10.1002/(SICI)1097-0061(200005)16:73.0.CO;2-F 48. Macreadie IG (1990) Yeast vectors for cloning and copper-inducible expression of foreign genes. Nucleic Acids Res 18(4):1078 49. Farhi M, Dudareva N, Masci T, Weiss D, Vainstein A, Abeliovich H (2006) Synthesis of the food flavoring methyl benzoate by genetically engineered Saccharomyces cerevisiae. J Biotechnol 122(3):307–315. doi:S0168-1656(05)00764-9 [pii] 10.1016/j.jbiotec.2005.12.007 50. Lee W, Dasilva NA (2006) Application of sequential integration for metabolic engineering of 1,2-propanediol production in yeast. Metab Eng 8(1):58–65. doi:S1096-7176(05)00071-6 [pii] 10.1016/j.ymben.2005.09.001 51. Mountain HA, Bystrom AS, Larsen JT, Korch C (1991) Four major transcriptional responses in the methionine/threonine biosynthetic pathway of Saccharomyces cerevisiae. Yeast 7(8):781–803. doi:10.1002/yea.320070804 52. Lee KM, DaSilva NA (2005) Evaluation of the Saccharomyces cerevisiae ADH2 promoter for protein synthesis. Yeast 22(6):431–440. doi:10.1002/yea.1221 53. Cunha AF, Missawa SK, Gomes LH, Reis SF, Pereira GA (2006) Control by sugar of Saccharomyces cerevisiae flocculation for industrial ethanol production. FEMS Yeast Res 6(2):280–287. doi:FYR038 [pii] 10.1111/j.1567-1364.2006.00038.x 54. Cardona F, Carrasco P, Perez-Ortin JE, del Olmo M, Aranda A (2007) A novel approach for the improvement of stress resistance in wine yeasts. Int J Food Microbiol 114(1):83–91. doi:S0168-1605(06)00587-3 [pii] 10.1016/j.ijfoodmicro.2006.10.043 55. Sledziewski AZ, Bell A, Yip C, Kelsay K, Grant FJ, MacKay VL (1990) Superimposition of temperature regulation on yeast promoters. Methods Enzymol 185:351–366. doi:0076-6879(90)85031-I [pii] 56.
Rine J, Herskowitz I (1987) Four genes responsible for a position effect on expression from HML and HMR in Saccharomyces cerevisiae. Genetics 116(1):9–22 57. Kobayashi H, Nakazawa N, Harashima S, Oshima Y (1990) A system for temperature-controlled expression of a foreign gene with dual mode in Saccharomyces cerevisiae. J Ferment Bioeng 69(6):322–327. doi:10.1016/0922-338X(90)90237-Q 58. Silva NAD, Bailey JE (1989) Construction and characterization of a temperature-sensitive expression system in recombinant yeast. Biotechnol Prog 5(1):18–26. doi:10.1002/btpr.5420050107 59. Cheng C, Yang S-T (1996) Dynamics and modeling of temperature-regulated gene product expression in recombinant yeast fermentation. Biotechnol Bioeng 50(6):663–674. doi:10.1002/(sici)1097-0290(19960620)50:63.0.co;2-i 60. Abe F (2007) Induction of DAN/TIR yeast cell wall mannoprotein genes in response to high hydrostatic pressure and low temperature. FEBS Lett 581(25):4993–4998. doi:10.1016/j.febslet.2007.09.039 61. Cohen BD, Sertil O, Abramova NE, Davies KJ, Lowry CV (2001) Induction and repression of DAN1 and the family of anaerobic mannoprotein genes in Saccharomyces cerevisiae occurs through a complex array of regulatory sites. Nucleic Acids Res 29(3):799–808 62. Gueldener U, Heinisch J, Koehler GJ, Voss D, Hegemann JH (2002) A second set of loxP marker cassettes for Cre-mediated multiple gene knockouts in budding yeast. Nucleic Acids Res 30(6):e23 63. Wach A, Brachat A, Pohlmann R, Philippsen P (1994) New heterologous modules for classical or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast 10(13):1793–1808 64. Becskei A, Seraphin B, Serrano L (2001) Positive feedback in eukaryotic gene networks: cell differentiation by graded to binary response conversion. EMBO J 20(10):2528–2535. doi:10.1093/emboj/20.10.2528 65. 
Romero-Santacreu L, Orozco H, Garre E, Alepuz P (2010) The bidirectional cytomegalovirus immediate/early promoter is regulated by Hog1 and the stress transcription factors Sko1 and Hot1 in yeast. Molecular genetics and genomics: MGG 283(5):511–518. doi:10.1007/s00438-010-0537-4 66. Bruening W, Giasson B, Mushynski W, Durham HD (1998) Activation of stress-activated MAP protein kinases up-regulates expression of transgenes driven by the cytomegalovirus immediate/early promoter. Nucleic Acids Res 26(2):486–489 67. Blazeck J, Alper HS (2013) Promoter engineering: recent advances in controlling transcription at the most fundamental level. Biotechnol J 8(1):46–58. doi:10.1002/biot.201200120

68. Hammer K, Mijakovic I, Jensen PR (2006) Synthetic promoter libraries-tuning of gene expression. Trends Biotechnol 24(2):53–55. doi:S0167-7799(05)00326-4 [pii] 10.1016/ j.tibtech.2005.12.003 69. Jeppsson M, Johansson B, Jensen PR, HahnHagerdal B, Gorwa-Grauslund MF (2003) The level of glucose-6-phosphate dehydrogenase activity strongly influences xylose fermentation and inhibitor sensitivity in recombinant Saccharomyces cerevisiae strains. Yeast 20(15):1263–1272. doi:10.1002/yea.1043 70. Ruohonen L, Aalto MK, Keranen S (1995) Modifications to the ADH1 promoter of Saccharomyces cerevisiae for efficient production of heterologous proteins. J Biotechnol 39(3):193–203. doi:016816569500024 K [pii] 71. Ruohonen L, Penttila M, Keranen S (1991) Optimization of Bacillus alpha-amylase production by Saccharomyces cerevisiae. Yeast 7(4):337–346. doi:10.1002/yea.320070404 72. Blazeck J, Garg R, Reed B, Alper HS (2012) Controlling promoter strength and regulation in Saccharomyces cerevisiae using synthetic hybrid promoters. Biotechnol Bioeng 109(11):2884–2895. doi:10.1002/bit.24552 73. Blount BA, Weenink T, Vasylechko S, Ellis T (2012) Rational diversification of a promoter providing fine-tuned expression and orthogonal regulation for synthetic biology. PLoS One 7(3):e33279. doi:10.1371/journal. pone.0033279 PONE-D-11-25479 [pii] 74. Murphy KF, Balazsi G, Collins JJ (2007) Combinatorial promoter design for engineering noisy gene expression. Proc Natl Acad Sci U S A 104(31):12726–12731. doi:0608451104 [pii] 10.1073/pnas.0608451104 75. Raijman D, Shamir R, Tanay A (2008) Evolution and selection in yeast promoters: analyzing the combined effect of diverse transcription factor binding sites. PLoS Comput Biol 4(1):e7. doi:07-PLCB-RA-0237 [pii] 10.1371/journal.pcbi.0040007 76. 
Dingermann T, Frank-Stoll U, Werner H, Wissmann A, Hillen W, Jacquet M, Marschalek R (1992) RNA polymerase III catalysed transcription can be regulated in Saccharomyces cerevisiae by the bacterial tetracycline repressor-operator system. EMBO J 11(4):1487–1492 77. Gari E, Piedrafita L, Aldea M, Herrero E (1997) A set of vectors with a tetracycline-regulatable promoter system for modulated gene expression in Saccharomyces cerevisiae. Yeast 13(9):837–848. doi:10.1002/(SICI)1097-0061(199707)13:93.0.CO;2-T [pii] 10.1002/(SICI)1097-0061(199707)13:93.0.CO;2-T 78. Belli G, Gari E, Aldea M, Herrero E (1998) Functional analysis of yeast essential genes using a promoter-substitution cassette and the tetracycline-regulatable dual expression system. Yeast 14(12):1127–1138. doi:10.1002/(SICI)1097-0061(19980915)14:123.0.CO;2-# [pii] 10.1002/(SICI)1097-0061(19980915)14:123.0.CO;2-# 79. Belli G, Gari E, Piedrafita L, Aldea M, Herrero E (1998) An activator/repressor dual system allows tight tetracycline-regulated gene expression in budding yeast. Nucleic Acids Res 26(4):942–947. doi:gkb206 [pii] 80. Lewis M (2005) The lac repressor. Comptes rendus biologies 328(6):521–548. doi:10.1016/j.crvi.2005.04.004 81. Scrable H, Stambrook PJ (1997) Activation of the lac repressor in the transgenic mouse. Genetics 147(1):297–304 82. Ellis T, Wang X, Collins JJ (2009) Diversity-based, model-guided construction of synthetic gene networks with predicted functions. Nat Biotechnol 27(5):465–471. doi:10.1038/nbt.1536 83. Purvis IJ, Chotai D, Dykes CW, Lubahn DB, French FS, Wilson EM, Hobden AN (1991) An androgen-inducible expression system for Saccharomyces cerevisiae. Gene 106(1):35–42. doi:0378-1119(91)90563-Q [pii] 84. Shimizu-Sato S, Huq E, Tepperman JM, Quail PH (2002) A light-switchable gene promoter system. Nat Biotechnol 20(10):1041–1044. doi:10.1038/nbt734 nbt734 [pii] 85. 
Louvion JF, Havaux-Copf B, Picard D (1993) Fusion of GAL4-VP16 to a steroid-binding domain provides a tool for gratuitous induction of galactose-responsive genes in yeast. Gene 131(1):129–134. doi:0378-1119(93)90681-R [pii] 86. McIsaac RS, Silverman SJ, McClean MN, Gibney PA, Macinskas J, Hickman MJ, Petti AA, Botstein D (2011) Fast-acting and nearly gratuitous induction of gene expression and protein depletion in Saccharomyces cerevisiae. Mol Biol Cell 22(22):4447–4459. doi:mbc.E11-05-0466 [pii] 10.1091/mbc.E11-05-0466 87. Andrianantoandro E, Basu S, Karig DK, Weiss R (2006) Synthetic biology: new engineering rules for an emerging discipline. Molecular systems biology 2(2006):0028. doi:10.1038/msb4100073

41

88. Kay S, Hahn S, Marois E, Hause G, Bonas U (2007) A bacterial effector acts as a plant transcription factor and induces a cell size regulator. Science 318(5850):648–651. doi:318/5850/648 [pii] 10.1126/science. 1144956 89. Scholze H, Boch J (2011) TAL effectors are remote controls for gene activation. Current opinion in microbiology 14(1):47–53. doi:10.1016/j.mib.2010.12.001 90. Jensen PR, Hammer K (1998) Artificial promoters for metabolic optimization. Biotechnol Bioeng 58(2–3):191–195. doi:10.1002/ (SICI)1097-0290(19980420)58: 2/33.0.CO;2-G [pii] 91. Jensen PR, Hammer K (1998) The sequence of spacers between the consensus sequences modulates the strength of prokaryotic promoters. Appl Environ Microbiol 64(1): 82–87 92. Alper H, Fischer C, Nevoigt E, Stephanopoulos G (2005) Tuning genetic control through promoter engineering. Proc Natl Acad Sci U S A 102(36):12678–12683. doi:0504604102 [pii] 10.1073/pnas.0504604102 93. Tyo KE, Nevoigt E, Stephanopoulos G (2011) Directed evolution of promoters and tandem gene arrays for customizing RNA synthesis rates and regulation. Methods Enzymol 497:135–155. doi:B978-012-385075-1.00006-8 [pii] 10.1016/ B978-0-12-385075-1.00006-8 94. Nevoigt E, Kohnke J, Fischer CR, Alper H, Stahl U, Stephanopoulos G (2006) Engineering of promoter replacement cassettes for fine-tuning of gene expression in Saccharomyces cerevisiae. Appl Environ Microbiol 72(8):5266–5273. doi:72/8/5266 [pii] 10.1128/AEM.00530-06 95. Nevoigt E, Fischer C, Mucha O, Matthaus F, Stahl U, Stephanopoulos G (2007) Engineering promoter regulation. Biotechnol Bioeng 96(3):550–558. doi:10.1002/bit.21129 96. Bjorkqvist S, Ansell R, Adler L, Liden G (1997) Physiological response to anaerobicity of glycerol-3-phosphate dehydrogenase mutants of Saccharomyces cerevisiae. Appl Environ Microbiol 63(1):128–132 97. 
Nissen TL, Hamann CW, Kielland-Brandt MC, Nielsen J, Villadsen J (2000) Anaerobic and aerobic batch cultivations of Saccharomyces cerevisiae mutants impaired in glycerol synthesis. Yeast 16(5):463–474 98. Daran-Lapujade P, Rossell S, van Gulik WM, Luttik MA, de Groot MJ, Slijper M, Heck AJ, Daran JM, de Winde JH, Westerhoff

42

Georg Hubmann et al.

HV, Pronk JT, Bakker BM (2007) The fluxes through glycolytic enzymes in Saccharomyces cerevisiae are predominantly regulated at posttranscriptional levels. Proc Natl Acad Sci U S A 104(40):15753–15758. doi:10.1073/pnas. 0707476104 99. Mumberg D, Muller R, Funk M (1995) Yeast vectors for the controlled expression of heterologous proteins in different genetic backgrounds. Gene 156(1):119–122

100. Hauf J, Zimmermann FK, Muller S (2000) Simultaneous genomic overexpression of seven glycolytic enzymes in the yeast Saccharomyces cerevisiae. Enzyme and microbial technology 26(9–10):688–698 101. Shen MW, Fang F, Sandmeyer S, Da Silva NA (2012) Development and characterization of a vector set with regulated promoters for systematic metabolic engineering in Saccharomyces cerevisiae. Yeast 29(12): 495–503. doi:10.1002/yea.2930

Chapter 3

Tools for Genetic Engineering of the Yeast Hansenula polymorpha

Ruchi Saraya, Loknath Gidijala, Marten Veenhuis, and Ida J. van der Klei

Abstract

Hansenula polymorpha is a methylotrophic yeast species that has favorable properties for heterologous protein production and metabolic engineering. It provides an attractive expression platform with the capability to secrete high levels of commercially important proteins. Over the past few years many efforts have led to advances in the development of this microbial host, including the generation of expression vectors containing strong constitutive or inducible promoters and a large array of dominant and auxotrophic markers. Moreover, highly efficient transformation procedures used to generate genetically stable strains are now available. Here, we describe these tools as well as the methods for genetic engineering of H. polymorpha.

Key words Genetic engineering, Hansenula polymorpha, Methylotrophic yeast, Metabolic engineering, Heterologous protein production

1 Introduction

Natural compounds of high medical value (e.g., taxol) are often synthesized in very low quantities by the natural producer(s), which makes their purification time consuming and costly. It is therefore attractive to manipulate the biosynthetic pathways to enhance production rates in the natural producer or even to transfer these pathways into a suitable heterologous host that can readily be genetically manipulated. Various cell lines and microbial production systems, among them yeast species, are well known for this purpose. The use of microorganisms has the advantage that it allows the introduction of complete biosynthetic pathways for the commercial production of valuable chemical compounds, resulting in sustainable “green” production processes. A striking example from our laboratory is the introduction of five heterologous genes in the yeast Hansenula polymorpha (also designated as Pichia angusta), which resulted in an engineered strain that produced and secreted the antibiotic penicillin [1].

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_3, © Springer Science+Business Media, LLC 2014


Successful introduction of novel biosynthetic pathways in microorganisms requires the availability of optimized genetic tools, for example for gene deletion or targeted integration, that allow the expression of multiple heterologous genes in a single strain. Fine-tuning of gene expression rates (for enzymes of the production pathway as well as for transporter proteins) is also important for optimal production levels.

Microorganisms are also used for the (bulk) production of a large range of heterologous proteins for various industrial and pharmaceutical applications (e.g., vaccines, enzymes, protein hormones). At first, Escherichia coli was mainly applied as a favorable host for production of heterologous proteins and metabolic engineering. However, eukaryotic proteins often require specific cofactors and/or posttranslational modifications that cannot be achieved using E. coli or other prokaryotic systems. Eukaryotic microorganisms like the yeast Saccharomyces cerevisiae are therefore attractive alternatives. S. cerevisiae has the advantage of being a GRAS (generally recognized as safe) organism, being readily accessible to genetic manipulation, and allowing secretion of the product into the cultivation medium, thus facilitating product purification. Various examples of successfully engineered strains and optimized metabolic pathways that result in the production of valuable pharmaceuticals and fine chemicals have been established in this yeast species [2]. However, product yields are often lower for S. cerevisiae strains compared to H. polymorpha, the latter having stronger promoters. In addition, secreted proteins in S. cerevisiae are often hyper-glycosylated.

For metabolic engineering and heterologous protein production, methylotrophic yeast species like H. polymorpha have favorable properties compared to baker’s yeast. First, several strong inducible promoters are available, such as those of the genes encoding key enzymes of methanol metabolism, namely, alcohol oxidase (AO), dihydroxyacetone synthase (DHAS), and formaldehyde dehydrogenase (FHD) [3], as well as the strong promoter of the cell membrane-bound ATPase [4]. Strong constitutive promoters such as those of glyceraldehyde 3-phosphate dehydrogenase (GAP) and translation elongation factor 1-α (EF1α) are also available [5, 6]. Second, H. polymorpha efficiently secretes proteins. Engineered strains have been constructed that show human-like N-glycosylation of secreted proteins [7]. Third, H. polymorpha has excellent fermentation characteristics: it can be grown in large fermenters to high cell densities; it is thermotolerant (growth is possible up to 49 °C), which reduces the costs of cooling during fermentation; and it tolerates a broad range of pH values (2.5–6.5). Finally, H. polymorpha secretes only very minor amounts of endogenous proteins, facilitating downstream processing.

Nowadays, H. polymorpha is widely used as a production host for industrial and pharmaceutical proteins via metabolic engineering: examples are the production of pharmaceuticals such as insulin-like growth factor 1 or the hepatitis B vaccine [8, 9]. In addition, phytase for food supplementation (13.5 g/l) produced by H. polymorpha is an example of one of the highest protein productivities ever observed for yeasts [10]. Importantly, like S. cerevisiae, H. polymorpha also has GRAS status. In this contribution we focus on current protocols that are used for genetic engineering of H. polymorpha, specifically for gene deletion, single- or multi-copy gene integration, and transformation.

2 Materials

Prepare all solutions using deionized water (dH2O) and analytical grade reagents. Growth media should be autoclaved before use at 121 °C, 15 psi, for 20 min, unless otherwise indicated. All solutions can be stored at room temperature, unless stated otherwise.

2.1 H. polymorpha and E. coli Strains

Three H. polymorpha strains are commonly applied for basic research and industrial purposes: CBS4732, NCYC495, and DL-1 (Table 1). H. polymorpha CBS4732 was first isolated from soil irrigated with distillery waste water, and its uracil auxotrophic derivatives LR9 and RB11 (odc1, deficient in orotidine-5′-phosphate decarboxylase) have been developed as hosts for heterologous gene expression [11, 12]. Strain NCYC495 is identical to a strain isolated by Wickerham from spoiled orange juice [13], while DL-1 was isolated from soil and is very similar to CBS4732 [14–16]. These strains exhibit different features and chromosome numbers. The genomes of strains CBS4732 [17] (http://www.ncbi.nlm.nih.gov/genbank) and NCYC495 leu1.1 (http://genome.jgi.doe.gov/Hanpo2/Hanpo2.info.html) have been completely sequenced. All protocols and tools detailed in this chapter have been developed for strain NCYC495. The H. polymorpha yku80 strain, in which the YKU80 gene is deleted, is deficient in nonhomologous end joining, which strongly enhances the efficiency of gene targeting relative to the parental NCYC495 leu1.1 strain (see Subheading 3.4.4). For cloning purposes, Escherichia coli DH5α is used.

Table 1 H. polymorpha strains

| Strain | Alternative names | Auxotrophy | References |
|---|---|---|---|
| CBS4732 | ATCC34438; NRRL-Y-5445; CCY38-22-2 | | [17] |
| LR9 (odc1 derivative of CBS4732) | | ura | [11] |
| RB11 (odc1 derivative of CBS4732) | | ura | [12] |
| NCYC495 | CBS1976; ATCC 14754; NRRL-Y-1798 | | [16] |
| DL-1 | NRRL-Y-7560; ATCC26012 | | [14, 15] |
| ku80 | | leu1.1 | [22] |

2.2 Growth Media

2.2.1 YPD Medium

Dissolve 10 g of yeast extract, 10 g of peptone, and 10 g of glucose in 1,000 ml of dH2O, and autoclave the solution. For YPD agar plates add 20 g/l agar before autoclaving.

2.2.2 Yeast Nitrogen Base Plates

1. Prepare 100 ml of a 10× concentrated yeast nitrogen base (YNB) solution by dissolving 1.7 g of YNB w/o amino acids and nitrogen source in 80 ml of dH2O. Subsequently add 2.5 g of ammonium sulfate, (NH4)2SO4, and bring the final volume to 100 ml with dH2O. Filter sterilize using a 0.2 μm sterile filter.
2. For preparing YNB agar plates add 20 g agar to 900 ml dH2O and autoclave. After autoclaving, when the agar solution has cooled down to approx. 50 °C, add 100 ml of filter-sterilized 10× concentrated YNB and the other medium constituents (carbon and nitrogen sources and supplements) and pour the plates.
3. For the choice of the carbon and nitrogen sources see Subheading 3.1 and Table 2. Addition of other medium constituents (Table 3) from sterile stock solutions depends on the auxotrophic requirements of the strain (see Note 1).
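The dilution of the 10× YNB stock can be checked with a few lines of arithmetic. The sketch below is only an illustration (the function name is hypothetical and not part of the protocol); it assumes that % (w/v) means grams per 100 ml and that 100 ml of stock is made up to a final volume of 1,000 ml.

```python
def final_percent_wv(mass_g, stock_volume_ml, stock_added_ml, final_volume_ml):
    """Return the final % (w/v) of a solute after diluting a stock solution.

    % (w/v) is taken as grams of solute per 100 ml of solution.
    """
    stock_percent = mass_g / stock_volume_ml * 100.0
    return stock_percent * stock_added_ml / final_volume_ml


# 2.5 g ammonium sulfate in 100 ml of 10x stock, 100 ml added per 1,000 ml plates
ammonium_sulfate_percent = final_percent_wv(2.5, 100, 100, 1000)

# 1.7 g YNB (w/o amino acids and nitrogen source) in the same 10x stock
ynb_percent = final_percent_wv(1.7, 100, 100, 1000)

print(f"ammonium sulfate: {ammonium_sulfate_percent:.2f} % (w/v)")
print(f"YNB: {ynb_percent:.2f} % (w/v)")
```

Under these assumptions the ammonium sulfate ends up at 0.25 % (w/v), which agrees with the concentration used for cultivation in Subheading 3.1.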

2.2.3 Vishniac Solution (1,000× Concentrated)

1. Dissolve 10 g of EDTA in 800 ml of dH2O. Then add 4.4 g of ZnSO4·7H2O, and adjust the pH to 6.0 using 1 M NaOH solution.
2. Add trace elements (1.01 g of MnCl2·4H2O, 0.32 g of CoCl2·6H2O, 0.315 g of CuSO4·5H2O, 0.22 g of (NH4)6Mo7O24·4H2O, 1.47 g of CaCl2·2H2O, 1.0 g of FeSO4·7H2O) one by one under continuous stirring while continuously adjusting the pH to 6.
3. Adjust the pH to 4.0 using 1 M HCl. Make up the volume to 1 l. Sterilize the 1,000× concentrated Vishniac solution by autoclaving. Store at 4 °C in the dark.

2.2.4 Vitamin Solution (1,000× Concentrated Stock)

1. Dissolve 10 mg of biotin in 10 ml of 0.1 N NaOH. Add 80 ml of 10 mM potassium phosphate buffer, pH 7.5.
2. Add 20 mg of thiamine, 10 mg of riboflavin, 0.5 g of nicotinic acid, 30 mg of p-aminobenzoic acid, 10 mg of pyridoxal hydrochloride, 0.2 g of Ca-pantothenate, and 1 g of inositol.


Table 2 Carbon and nitrogen sources for growth of H. polymorpha

| Compound | Final concentration | Stock | Sterilization of stock |
|---|---|---|---|
| Carbon source | | | |
| Glucose | 0.5 % (w/v) (a) | 50 % (w/v) | Autoclave |
| Methanol | 0.4 % (v/v) (a) | 40 % (v/v) | Filter sterilization (b) |
| Ethanol | 0.3 % (v/v) (a) | 50 % (v/v) | Filter sterilization (b) |
| Glycerol | 1 % (v/v) (a) | 50 % (v/v) | Autoclave |
| Nitrogen source | | | |
| Methylamine | 0.25 % (w/v) | 25 % (w/v) | Filter sterilization (b) |
| Ethylamine | 0.25 % (w/v) | 25 % (w/v) | Filter sterilization (b) |
| Choline | 0.25 % (w/v) | 50 % (w/v) | Filter sterilization (b) |
| D-alanine | 0.25 % (w/v) | 25 % (w/v) | Filter sterilization (b) |
| (NH4)2SO4 | 0.25 % (w/v) | 50 % (w/v) | Autoclave |

(a) These are maximum concentrations in the media
(b) Filter sterilize using a sterile filter with 0.2 μm pore size

Table 3 Additional medium constituents

| Constituent | mg/l in medium | Stock solution (mg/ml) (concentration of stock) (a) |
|---|---|---|
| Adenine-sulfate | 20 | 2 (100×) |
| Uracil | 30 | 3 (100×) |
| L-tryptophane | 20 | 10 (500×) |
| L-histidine·HCl | 20 | 10 (500×) |
| L-arginine·HCl | 20 | 10 (500×) |
| L-methionine | 20 | 10 (500×) |
| L-tyrosine | 30 | 2 (66×) |
| L-leucine | 30 | 3 (100×) |
| L-isoleucine | 30 | 3 (100×) |
| L-lysine·HCl | 30 | 10 (300×) |
| L-phenylalanine | 50 | 10 (200×) |
| L-glutamate | 100 | 10 (100×) |
| L-aspartate | 100 | 10 (100×) |
| L-valine | 150 | 30 (200×) |
| L-threonine | 200 | 40 (200×) |
| L-serine | 400 | 80 (200×) |

(a) All stock solutions can be autoclaved and stored at room temperature
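The volume of a supplement stock to add follows directly from Table 3: the required mass (mg/l in medium times the medium volume) divided by the stock concentration (mg/ml). The short sketch below illustrates this for a few entries of the table; the dictionary and function names are hypothetical helpers, not part of the protocol.

```python
# (final concentration in mg/l, stock concentration in mg/ml), values from Table 3
SUPPLEMENTS = {
    "adenine-sulfate": (20, 2),
    "uracil": (30, 3),
    "L-leucine": (30, 3),
    "L-serine": (400, 80),
}


def stock_volume_ml(constituent, medium_volume_l=1.0):
    """Return the ml of sterile stock needed for the given volume of medium."""
    final_mg_per_l, stock_mg_per_ml = SUPPLEMENTS[constituent]
    required_mg = final_mg_per_l * medium_volume_l
    return required_mg / stock_mg_per_ml


for name in SUPPLEMENTS:
    print(f"{name}: {stock_volume_ml(name):.1f} ml stock per litre of medium")
```

For a leucine auxotroph such as NCYC495 leu1.1, for example, this gives 10 ml of the 3 mg/ml L-leucine stock per litre, consistent with the 100× designation in the table.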


3. Add dH2O to a final volume of 100 ml. Sterilize the solution by filtration through a 0.2 μm sterile filter. Store at 4 °C.

2.2.5 Mineral Medium for H. polymorpha

1. Dissolve 2.5 g of (NH4)2SO4, 0.2 g of MgSO4, 0.7 g of K2HPO4, 3.0 g of NaH2PO4, 0.5 g of yeast extract, and 1 ml of (1,000×) Vishniac solution in 900 ml of dH2O. Make up the final volume to 1 l and autoclave.
2. After autoclaving add 1 ml of 1,000× concentrated filter-sterilized vitamin solution. Add carbon source (see Table 2) and additional constituents (see Table 3) from sterile stock solutions (see Note 2). When alternative nitrogen sources are used instead of (NH4)2SO4, omit (NH4)2SO4, but add 0.2 % (w/v) K2SO4 to prevent sulfate (SO4²⁻) limitation.

2.2.6 LB Medium

1. Dissolve 10 g of tryptone, 5 g of yeast extract, and 10 g of NaCl in 950 ml dH2O. Adjust the pH to 7.0 with 0.1 N NaOH, and make up the volume to 1 l. For LB medium agar plates, supplement the medium with 2 % (w/v) agar.
2. After autoclaving, add 100 μg/ml ampicillin or 50 μg/ml kanamycin when required for selective pressure.

2.3 Solutions for Genomic DNA Isolation

1. T100E50 (pH 8) buffer: 100 mM Tris–HCl, pH 8, 50 mM EDTA, pH 8 in dH2O. Adjust the pH to 8.0 using 1 M HCl.
2. T10E1 (pH 8): 10 mM Tris–HCl, pH 8, 1 mM EDTA, pH 8 in dH2O. Adjust the pH to 8.0 using 1 M HCl.
3. C100E10: 100 mM Na-citrate and 10 mM EDTA in dH2O. Adjust the pH to 5.8 using 0.1 M NaOH.
4. β-mercaptoethanol (supplied as 14.3 M stock).
5. 10 % SDS: Dissolve 10 g of SDS in dH2O.
6. 6 M NaCl in dH2O.
7. RNase (1,000×): Prepare an RNase stock solution by dissolving 20 mg/ml RNase in dH2O. In order to inactivate the DNase, incubate the solution at 100 °C for 10 min, and then slowly cool it to room temperature. Store aliquots at −20 °C.
8. Zymolyase solution (10×): Dissolve 10 mg of Zymolyase-100T from Arthrobacter luteus in 1 ml of 1 M sorbitol.

2.4 Materials and Solutions for Cloning and Transformation

2.4.1 Primers and Gateway Vectors

Dissolve primers to a final concentration of 100 μM in dH2O, and store them at −20 °C. PCR products are stored at −20 °C.

1. Forward Gateway primer: The primer consists of a 22 or 25 bp attB site followed by 18–25 bp of template/gene-specific sequence. Use Table 4 to choose the appropriate attB site (see Fig. 1 and Subheading 3.4.1) (see Notes 3–5).


Table 4 Choice of appropriate att site to be used in designing Gateway primers

| DNA sequence of interest | Forward PCR primer | Reverse PCR primer |
|---|---|---|
| 5′ region | attB4 | attB1 |
| Selection marker | attB1 | attB2 |
| 3′ region | attB2 | attB3 |

Fig. 1 Recombination cloning using Gateway™ technology in Hansenula polymorpha. (a) Schematic representation of the multi-site Gateway™ cloning system. Three modules in ENTR vectors are recombined with a destination vector resulting in a new plasmid. Depending on the modules chosen, the system can be utilized for (heterologous) gene expression or making gene deletion cassettes. B1, B2, B3, and B4 represent the respective att sites (recombination sites)

Panel labels: pENTR 4 1 (B4, 5′ region, B1); pENTR 2 1 (B1, selection marker, B2); pENTR 2 3 (B2, 3′ region, B3); pDEST 4 3, final deletion construct (B4, 5′ region, selection marker/gene, 3′ region, B3)

2. Reverse Gateway primer: The primer consists of a 22 or 25 bp attB site followed by 18–25 bp of template/gene-specific sequence. Use Table 4 to choose the appropriate attB site (see Fig. 1 and Subheading 3.4.1, see Notes 3–5).
3. Attachment sites for primer sequences are listed in Table 5 (see Notes 3–5).
4. Primers for single-step PCR: These primers are 70–80 nucleotides long and typically contain 20–30 nucleotides specific to the selection marker used to delete the gene of interest (refer to Table 6, pDONR 221). Add to the primer a sequence of 50 base pairs homologous to the region of interest (see Note 6).


Table 5 Primer sequence of the att sites used in designing the Gateway primers

| attB site | Primer 5′–3′ |
|---|---|
| attB Forward | |
| attB1 | GGGGACAAGTTTGTACAAAAAAGCAGGCTNN… (15–20 gene-specific nucleotides) |
| attB2 | GGGGACAGCTTTCTTGTACAAAGTGGNN… (15–20 gene-specific nucleotides) |
| attB4 | GGGGACAACTTTGTATAGAAAAGTTGNN… (15–20 gene-specific nucleotides) |
| attB Reverse | |
| attB1 | GGGGACTGCTTTTTTGTACAAACTTGN… (15–20 gene-specific nucleotides) |
| attB2 | GGGGACCACTTTGTACAAGAAAGCTGGGTN… (15–20 gene-specific nucleotides) |
| attB3 | GGGGACAACTTTGTATAATAAAGTTGN… (15–20 gene-specific nucleotides) |

The sequences shown are the att attachment sites; the gene-specific nucleotides follow at the 3′ end of each primer
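Assembling a pair of Gateway primers from Tables 4 and 5 can be sketched as follows. This is only an illustration under stated assumptions: the function and dictionary names are hypothetical, the optional N spacer nucleotides shown in Table 5 are omitted, the reverse site paired with the 3′ region is keyed as attB3 following Table 4, and the gene-specific part is simply taken as the first or last 20 nucleotides of the fragment without any check of melting temperature or secondary structure.

```python
# attB sites from Table 5 (spacer "N" nucleotides omitted in this sketch)
ATTB_FORWARD = {
    "attB1": "GGGGACAAGTTTGTACAAAAAAGCAGGCT",
    "attB2": "GGGGACAGCTTTCTTGTACAAAGTGG",
    "attB4": "GGGGACAACTTTGTATAGAAAAGTTG",
}
ATTB_REVERSE = {
    "attB1": "GGGGACTGCTTTTTTGTACAAACTTG",
    "attB2": "GGGGACCACTTTGTACAAGAAAGCTGGGT",
    "attB3": "GGGGACAACTTTGTATAATAAAGTTG",
}
# (forward site, reverse site) for each module, from Table 4
SITE_PAIRS = {
    "5prime": ("attB4", "attB1"),
    "marker": ("attB1", "attB2"),
    "3prime": ("attB2", "attB3"),
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def design_gateway_primers(module, fragment, specific_len=20):
    """Return (forward, reverse) primers, each written 5'->3'."""
    fw_site, rv_site = SITE_PAIRS[module]
    forward = ATTB_FORWARD[fw_site] + fragment[:specific_len]
    reverse = ATTB_REVERSE[rv_site] + reverse_complement(fragment[-specific_len:])
    return forward, reverse


# Hypothetical 5' flanking fragment, for illustration only
fragment = "ATGGCTTCTAACGGTAAGCTTGAATTCGGATCCACTAGTCCATGGTTAACG"
fw, rv = design_gateway_primers("5prime", fragment)
print("forward:", fw)
print("reverse:", rv)
```

In practice the gene-specific part would be chosen and checked with a primer design tool; the sketch only shows how the attB site and the template-specific sequence are joined.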

Table 6 Gateway vectors

| Available vectors | Cloned element (promoter/marker gene/terminator) | References |
|---|---|---|
| pDONR P4-P1 | 5′ region | Invitrogen |
| pDONR P4-P1R + PAOX | H. polymorpha AOX promoter | [21] |
| pDONR P4-P1R + PAMO | H. polymorpha AMO promoter | [39] |
| pDONR P4-P1R + PTEF1 | H. polymorpha TEF1 promoter | Unpublished |
| pDONR P4-P1R + PPEX14 | H. polymorpha PEX14 promoter | Unpublished |
| pDONR P4-P1R + PPEX11 | H. polymorpha PEX11 promoter | [40] |
| pDONR 221 | pENTR221-URA3 (uracil); pENTR221-Ca LEU2 (leucine); pENTR221-bsd (blasticidin); pENTR221-hph (hygromycin); pENTR221-kanMX (geneticin); pENTR221-nat1 (nourseothricin); pENTR221-ble (zeocin); pENTR221-pat (bialaphos) | [22] |
| pDONR 23 | 3′ element | Invitrogen |
| pDONR P2R-P3 + TAMO | H. polymorpha AMO terminator | [41] |
| pDEST 43 | Final destination vector | Invitrogen |
| pDEST-NAT | Destination vector with nat1 (nourseothricin) gene | [21] |
| pDEST-ZEO | Destination vector with ble (zeocin) gene | [21] |

2.4.2 Materials and Solutions for PCR

1. DNA polymerase in an appropriate buffer according to the instructions of the manufacturer.
2. Primers, DNA template, and RNase- and DNase-free dH2O.
3. 50× TAE buffer: Dissolve 242 g Tris base in 500 ml of dH2O, add 57.1 ml glacial acetic acid and 100 ml of a solution of 500 mM EDTA (pH 8.0), and make up the final volume to 1 l.
4. Agarose gel: 1 % agarose in TAE buffer.

2.4.3 Electrotransformation

1. TED buffer: 100 mM Tris–HCl, pH 8.0, 50 mM EDTA, pH 8.0 in dH2O. Adjust the pH to 8.0 using 1 M HCl. Add 25 mM dithiothreitol (DTT).
2. STM buffer: 270 mM sucrose, 10 mM Tris–HCl, pH 8.0, 1 mM MgCl2 in dH2O. Adjust pH to 8.0 using 1 M HCl.
3. BTX electro cell manipulator model number ECM 600.

3 Methods

3.1 Cultivation of H. polymorpha in Batch Cultures

H. polymorpha is grown in Erlenmeyer flasks in a shaker incubator at 37 °C and 200 rpm. For the medium composition used to grow strains harboring a plasmid, see Note 7.

1. Inoculate cells from colonies on a YPD plate (plates with colonies can be stored in the refrigerator for a maximum of 2 months) into 20 ml of mineral medium supplemented with 0.5 % (w/v) glucose and 0.25 % (w/v) ammonium sulfate in a 100 ml Erlenmeyer flask. Incubate overnight in a shaker incubator.
2. Use the overnight culture (optical density at 660 nm (OD660) approx. 4–5.5) to inoculate 20 ml of mineral medium with glucose/ammonium sulfate to an OD660 of 0.1 (see Note 7). Incubate until this culture reaches the exponential growth phase (OD660 = 1.5). Dilute this culture again into fresh mineral medium with glucose/ammonium sulfate and grow until OD660 = 1.5. Finally, dilute this culture once more into fresh medium. When this third culture reaches OD660 = 1.5, use it as inoculum for the final culture.
3. The composition of the medium with respect to carbon and nitrogen sources depends on the promoter used for gene expression (Table 7). For induction of the alcohol oxidase promoter (PAOX) or the dihydroxyacetone synthase promoter (PDHAS), cells should be grown on methanol. For induction of the amine oxidase promoter (PAMO), cells should be grown on glucose in the presence of methylamine as nitrogen source.


The TEF1 promoter is constitutive. For expression vectors containing PTEF1, cells can be grown on any medium. The regulation of the INP2 promoter is cell cycle dependent but independent of carbon and nitrogen sources. The CAT gene and PEX genes show a relatively low expression in cells grown on glucose. The expression of these genes (and of genes under the control of the corresponding promoters) is slightly increased during growth on methanol. Of the PEX promoters, PPEX4 is the weakest whereas PPEX11 is the strongest.

Table 7 Plasmids for genetic engineering of H. polymorpha

| Plasmid | Promoter | Gene/marker | References |
|---|---|---|---|
| pHI1 | Hp AOX | Hp URA3/uracil | [23] |
| pHIPA4 | Hp AOX | Hp ADE11/adenine | [24] |
| pHIPX4 | Hp AOX | Sc LEU2/leucine | [25] |
| pHIPM4 | Hp AOX | Hp MET6/methionine | [26] |
| pHIPX5 | Hp AMO | Sc LEU2/leucine | [27] |
| pHIPX6 | Hp PEX3 | Sc LEU2/leucine | [27] |
| pHIPX7 | Hp TEF1 | Sc LEU2/leucine | [28] |
| pHIPX8 | Hp TEF2 | Sc LEU2/leucine | Lab collection |
| pHIPX9 | Hp CAT | Sc LEU2/leucine | Lab collection |
| pHIPX10 | Hp PEX14 | Sc LEU2/leucine | [29] |
| pHIPX11 | Hp PEX4 | Sc LEU2/leucine | Lab collection |
| pHIPX12 | Hp PEX5 | Sc LEU2/leucine | [30] |
| pHIPX13 | Hp PEX19 | Sc LEU2/leucine | [31] |
| pHIPX4-B | Hp AOX | Sc LEU2/leucine | [32] |
| pHIPX4-HNBESX | Hp AOX | Sc LEU2/leucine | [33] |
| pHIPX14 | Hp INP2 | Sc LEU2/leucine | Lab collection |
| pHIPZ4 | Hp AOX | Sh-ble/zeocinR | [34] |
| pHIPZ5 | Hp AMO | Sh-ble/zeocinR | [35] |
| pHIPZ6 | Hp PEX3 | Sh-ble/zeocinR | Lab collection |
| pHIPZ7 | Hp TEF1 | Sh-ble/zeocinR | [28] |
| pHIPZ15 | Hp DHAS | Sh-ble/zeocinR | [33] |
| pHIPZ17-Nia | Hp PEX11 | Sh-ble/zeocinR | Lab collection |
| pHIPN4 | Hp AOX | Sn-nat1/nourseothricinR | [21] |
| pHIPH4 | Hp AOX | Kp-hph/hygromycin BR | [21] |
| pHIPB4 | Hp AOX | Sv-pat/bialaphosR (a) | Lab collection |
| pHIPK4 | Hp AOX | Tn-KanMX/G-418/geneticinR | Lab collection |

Replicating plasmids

| Plasmid | Gene/marker | References |
|---|---|---|
| pHARS1 | Sc URA3/uracil | [11] |
| pHRP2 | Sc LEU2/leucine | [36] |
| pHS6 | Sc LEU2/leucine | [37] |
| pMPT121 | Sc URA3/leucine | [12] |
| pYT3 | Sc LEU2/leucine | [38] |
| pSK92 | HARO7/tyrosine | [36] |

Hp Hansenula polymorpha, Kp Klebsiella pneumoniae, Sc Saccharomyces cerevisiae, Sh Streptoalloteichus hindustanus, Sn Streptomyces noursei, Sv Streptomyces viridochromogenes, Tn Escherichia coli transposon Tn903, AOX Alcohol oxidase, AMO Amine oxidase, TEF1 and TEF2 Translational elongation factor EF-1 alpha, CAT Catalase, PEX various genes involved in peroxisome biogenesis, R resistance
(a) Bialaphos is a peptide that can be used only in combination with synthetic mineral medium and not with rich yeast extract peptone dextrose (YPD) plates

3.2 Genomic DNA Isolation

All steps are performed at room temperature unless stated otherwise.

1. Inoculate cells from colonies on a YPD plate into YPD medium. Incubate the culture at 37 °C overnight in a shaker incubator.
2. Collect the cells by centrifugation at 15,000 × g for 2 min using microcentrifuge tubes.
3. Resuspend the cell pellet in 500 μl T100E50 buffer containing 5 μl of β-mercaptoethanol using a micropipette. Make sure that the pellet is completely resuspended and no cell clumps remain.
4. Incubate the tubes for 15 min at 37 °C with shaking at 200 rpm.
5. Harvest the cells by centrifugation for 2 min at 15,000 × g, wash the cells by resuspending the cell pellet in 1 ml C100E10 (pH 5.8), and harvest the cells again by centrifugation at 15,000 × g for 2 min.
6. In order to digest the cell walls, suspend the cells in 300 μl C100E10 (pH 5.8) with 1 mg zymolyase. Incubate the cells at 37 °C for 60 min with shaking at 200 rpm.
7. Lyse the spheroplasts by adding 15 μl of 10 % SDS solution, mixing gently, and incubating for 10 min at 65 °C without shaking.
8. In order to precipitate the proteins, add 100 μl 6 M NaCl. Mix carefully and incubate the suspension on ice for 5 min.
9. Pellet the precipitated proteins by centrifuging at 15,000 × g for 3 min. Transfer the supernatant containing the DNA to a new 1.5 ml microcentrifuge tube.
10. Precipitate the DNA by adding 300 μl of isopropanol and gently mixing by inversion; centrifuge the tubes at 15,000 × g for 5 min.
11. Carefully decant the supernatant. Wash the DNA pellet by adding 750 μl of 70 % (v/v) ethanol and centrifuging at 15,000 × g for 5 min.
12. Completely remove the supernatant, and air-dry the pellet (see Note 8).
13. Dissolve the DNA in 50 μl of T10E1 (pH 8). If the pellet does not dissolve completely, incubate the tubes at 37 °C for 10 min.
14. Add RNase to a final concentration of 20 μg/ml and incubate the tubes for 1 h at 37 °C or overnight at 4 °C.
15. Analyze the DNA by gel electrophoresis using 1–2 μl of the DNA preparation on a 0.8 % agarose gel. Intact DNA typically migrates above the 10 kb marker band.

3.3 Transformation of H. polymorpha

3.3.1 Preparation of H. polymorpha Competent Cells

1. Grow H. polymorpha cells overnight at 37 °C in YPD medium (see Note 9). Inoculate 2 ml of the overnight culture into 100 ml of pre-warmed YPD medium and incubate at 37 °C with shaking at 200 rpm (New Brunswick Innova incubator shaker 44R) up to OD660 1.2–1.5 (this takes approximately 6 h).
2. Harvest cells by centrifugation for 5 min at 3,000 × g at room temperature (RT). Use 50 ml Falcon tubes.
3. Resuspend the cells in 25 ml of TED buffer (see Note 10).
4. Incubate for 15 min at 37 °C with shaking at 200 rpm. Harvest cells by centrifugation for 5 min at 3,000 × g at RT, and discard the supernatant.
5. Wash the cells by resuspending them in 100 ml of ice-cold STM buffer.
6. Collect the cells by centrifugation for 5 min at 3,500 × g at 4 °C. Discard the supernatant.
7. Wash the cells by resuspending in 50 ml of ice-cold STM buffer.
8. Spin down the cells for 5 min at 3,500 × g at 4 °C, and discard the supernatant.
9. Resuspend the cells in 0.5 ml of ice-cold STM buffer (see Note 11).
10. Aliquot the cell suspension in batches of 60 μl in microcentrifuge tubes.
11. Competent cells are immediately used for transformation (see Note 12).

3.3.2 Electrotransformation of H. polymorpha Cells

Electroporation is an efficient method to introduce foreign DNA into H. polymorpha. We use this method for making both gene integrations and deletions. Depending on the chosen integration site, the expression plasmid DNA is linearized with an appropriate restriction enzyme either in the promoter region or in the yeast auxotrophic selection marker. For deletions we use PCR cassettes generated using a deletion plasmid as template (containing the 5′ region, a selection marker, and the 3′ region; see Subheading 3.4.3).

1. Transfer a 60 μl competent cell suspension and a maximum of 4 μl of DNA solution to a 2 mm electroporation cuvette.
2. Set the pulse in the BTX ECM 600 electroporator as follows: 50 μF, 129 Ω, and 1.5 kV (7.5 kV/cm). Electro-pulse the cuvette (the resulting pulse length will be 4–5 ms).
3. Immediately add 940 μl of YPD (RT) to the cell/DNA mixture and transfer to 2.0 ml microcentrifuge tubes.
4. Incubate cells at 37 °C, shaking at 200 rpm for 1 h.
5. Harvest the cells by centrifugation at 5,500 × g for 2 min, and resuspend the cells in 1 ml of YND.
6. Harvest the cells by centrifugation at 5,500 × g for 2 min.
7. Resuspend cells in 1 ml of YND, and plate 1, 10, and 89 % of the suspension of transformed cells on selective YND plates, respectively. Incubate the plates at 37 °C. Colonies appear after 2–3 days. In case of a dominant marker, the YND washing step (steps 6 and 7 in this section) can be omitted when plating in the presence of antibiotics (100 μg/ml zeocin or 100 μg/ml nourseothricin or 300 μg/ml hygromycin B).
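The pulse settings in step 2 can be cross-checked with simple arithmetic. The sketch below (hypothetical helper functions, not vendor software) computes the field strength from the voltage and the cuvette gap, and the nominal time constant of an ideal RC discharge from the resistance and capacitance settings. It assumes that the 129 Ω setting acts as the only discharge path; the shorter pulse length of 4–5 ms reported above would be consistent with the sample conducting in parallel, but that interpretation is an assumption and not stated in the protocol.

```python
def field_strength_kv_per_cm(voltage_kv, gap_mm):
    """Field strength across the cuvette gap in kV/cm."""
    return voltage_kv / (gap_mm / 10.0)


def nominal_time_constant_ms(resistance_ohm, capacitance_uf):
    """Time constant tau = R * C of an ideal RC discharge, in ms."""
    tau_s = resistance_ohm * capacitance_uf * 1e-6
    return tau_s * 1e3


# Settings from step 2: 1.5 kV, 2 mm cuvette, 129 ohm, 50 uF
print(f"field strength: {field_strength_kv_per_cm(1.5, 2):.1f} kV/cm")
print(f"nominal time constant: {nominal_time_constant_ms(129, 50):.2f} ms")
```

The first value reproduces the 7.5 kV/cm given in the protocol; the second is an upper estimate of the pulse length under the stated assumption.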

3.4 Gateway™ Cloning

3.4.1 Principle of Gateway™ Cloning

Expression vectors or constructs for gene deletion can be readily made using the Gateway™ technology (Invitrogen). This technology is based on the site-specific recombination properties of bacteriophage lambda. It provides an efficient way to clone DNA sequences into multiple vectors. The major components are DNA recombination sequences and specific recombination enzymes. Specific recombination enzymes bind to DNA recombination sequences, called attachment sites (att sites). Recombination occurs between attB (the sequence that is present on the DNA fragment to be cloned) and attP (the sequence present on the pDONR Gateway™ plasmid) in the BP reaction.

The first step in Gateway cloning is the preparation of a Gateway ENTR clone. ENTR clones are generally made in two steps. First, Gateway attB sequences are added to the 5′ and 3′ end of a gene fragment using gene-specific PCR primers with appropriate attB sites (Table 5). Subsequently, the PCR amplification products are mixed with Gateway Donor vectors and the BP Clonase™ enzyme mix. The enzyme mix catalyzes the recombination and insertion of the attB sequence-containing PCR product into the attP recombination sites in the Gateway Donor vector. The resulting plasmid is called the ENTR clone. The gene cassette in the ENTR clone can be transferred into any Gateway Destination vector (Gateway vector with attR recombination sequences) using LR Clonase™.

In order to make a deletion construct, the 5′ region of the gene to be deleted is cloned in pDONR 4 1, resulting in vector pENTR 4 1. By cloning the 3′ region of the gene in pDONR 2 3, vector pENTR 2 3 is generated. A selection marker is cloned in pDONR 2 1, which results in pENTR 2 1. Finally the three ENTR vectors are combined with the Gateway destination vector pDEST 4 3, resulting in the final deletion plasmid. See Fig. 1 for a schematic explanation of the cloning steps described above.

In an essentially similar way expression plasmids can be made: the promoter is cloned in the pDONR 4 1 vector, the gene of interest in the pDONR 2 1 vector, and the terminator in the pDONR 2 3 vector to obtain the corresponding pENTR vectors. Upon combining these vectors with the pDEST 4 3 Gateway Destination vector containing a selection marker (Table 6), an expression plasmid is generated.

3.4.2 Construction of Deletion Plasmids Using Gateway™ Cloning

Using pDONR 2 1 with a yeast selection marker (Table 6) and gene-specific 5′ and 3′ regions cloned in the pDONR 4 1 and pDONR 2 3 clones, respectively, a deletion construct is generated by mixing different pENTR clones (Fig. 1).
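The way the three pENTR clones line up in the destination vector can be pictured as a chain of matching site numbers (4 to 1, 1 to 2, 2 to 3). The following is a minimal sketch of that bookkeeping only, not a simulation of the recombination reaction. The function name and the dictionary layout are hypothetical; the site numbers are taken from the vector names used in the text (pENTR 4 1, pENTR 2 1, pENTR 2 3, pDEST 4 3).

```python
from typing import Dict, List, Tuple


def order_entry_clones(
    clones: Dict[str, Tuple[int, int]], dest: Tuple[int, int]
) -> List[str]:
    """Return the entry clones in the order their inserts would be chained.

    Each clone is described by the two site numbers in its vector name.
    The chain starts at the first destination site and must end at the second.
    """
    start, end = dest
    remaining = dict(clones)
    order: List[str] = []
    site = start
    while site != end:
        # Find the single unused clone that carries the current site number.
        matches = [name for name, sites in remaining.items() if site in sites]
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one unused clone with site {site}, "
                f"found {len(matches)}"
            )
        name = matches[0]
        a, b = remaining.pop(name)
        site = b if a == site else a  # move on to the clone's other site
        order.append(name)
    if remaining:
        raise ValueError(f"unused clones: {sorted(remaining)}")
    return order


# Deletion construct as described in the text (labels are illustrative).
deletion_clones = {
    "3' region (pENTR 2 3)": (2, 3),
    "selection marker (pENTR 2 1)": (2, 1),
    "5' region (pENTR 4 1)": (4, 1),
}
print(order_entry_clones(deletion_clones, dest=(4, 3)))
```

For an expression plasmid the same chaining would apply with the promoter, the gene of interest, and the terminator in place of the 5′ region, the marker, and the 3′ region.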

BP Reaction

The BP reaction in the Gateway™ system catalyzes recombination of an attB site with an attP site.

1. Amplify via PCR the 5′ region of the target gene using total genomic DNA as template with Gateway primers (designed to include the attachment sites).
2. Reaction mix for PCR:
   (a) 0.5 μl of Fw primer (10 μM).
   (b) 0.5 μl of Rev primer (10 μM).
   (c) 5 μl DNA polymerase buffer (10×).
   (d) 1 μl DNA polymerase enzyme.
   (e) 42 μl dH2O.
   (f) 2–3 μl Template DNA (100 ng).
   Total volume approximately 50 μl.
3. Amplify via PCR the 3′ region of the target gene using total genomic DNA as template with Gateway primers (designed to include the attachment sites), using the same PCR reaction mix except for the primers used.
4. Isolate the PCR products for the BP reaction from an agarose gel (see Note 13).
5. Reaction mix for the BP reaction (see Note 14):
   (a) 0.5 μl vector pDONRxy (150 ng/μl = 50 fmol/μl).
   (b) 1.0 μl PCR product (25 fmol/μl).
   (c) 2.5 μl dH2O.
   (d) 1.0 μl BP clonase II (incl. 5× buffer).
   (e) 5.0 μl total.
6. Incubate overnight (maximum 18 h) at 25 °C for cleavage of the sites by BP clonase.
7. To stop the activity of BP clonase add 0.5 μl Proteinase K solution (see Note 15) and incubate for 10 min at 37 °C.
8. Perform electrotransformation to E. coli (DH5α) according to [18]. It is recommended to perform ethanol precipitation of the DNA before electrotransformation (see Note 16).
9. Plate the transformation reaction mix on selective LB plates containing 50 μg/ml kanamycin. Incubate the plates at 37 °C overnight.

LR Reaction

The LR reaction facilitates recombination of an attL site (ENTR clone) with an attR site (destination vector).

1. Isolate vector plasmids according to standard plasmid isolation methods (e.g., using commercial kit for plasmid isolation).
2. Reaction mix for LR reaction (see Note 14):
   (a) x μl pDONR R4-R1 (5 fmol).
   (b) x μl pDONR R 221 (5 fmol) (Table 6).
   (c) x μl pDONR R2-L3 (5 fmol).
   (d) 0.5 μl pDEST R4-R3 (10 fmol).
   (e) x μl dH2O.
   (f) μl LR clonase PlusII (incl. 5× buffer).
   5 μl total.
3. In order to let LR clonase cleave the sites, incubate overnight (maximum 18 h) at 25 °C.
4. To stop the activity of the LR clonase enzyme add 0.5 μl Proteinase K solution (see Note 15) and incubate for 10 min at 37 °C.
5. The mix can be stored at −20 °C for maximum 1 week.
6. Perform electrotransformation of E. coli (DH5α). Ethanol precipitation for DNA is recommended [18] before electrotransformation to purify and concentrate the DNA (see Note 16).
7. Plate the transformation reaction mix on selective LB plate containing 100 μg/ml ampicillin, and incubate the plates at 37 °C overnight.

3.4.3 Gene Deletion in H. polymorpha

1. Isolate from E. coli the final LR plasmid containing the 5′ region, a selection marker, and the 3′ region of the gene to be deleted.
2. Amplify the deletion cassette via PCR using LR plasmid DNA as a template and 5′ forward and 3′ reverse gene-specific primers.
3. Isolate the PCR product (to be used for H. polymorpha transformation) from an agarose gel (see Note 8).
4. Perform electrotransformation of H. polymorpha using the purified PCR product (see Subheading 3.3.2). The transformation reaction mix is plated on YND or YPD agar plates and incubated at 37 °C for 2–3 days until colonies appear (generally 1 day). In case of dominant markers use YPD plates with the corresponding antibiotic. For auxotrophic markers use YND plates with the appropriate amino acids.
5. Transformants are first screened via yeast colony PCR [19]. Primers are designed based on the genome sequence: one primer anneals a few nucleotides upstream of the deleted gene, and the other anneals inside the marker present in the transformed fragment. Correct integration should be confirmed by Southern blot analysis [20]. Examples of gene deletions can be found in [21].
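The BP and LR reaction mixes in Subheading 3.4.2 specify DNA amounts in fmol, whereas DNA concentrations are usually measured in ng/μl. A minimal sketch of the conversion follows, assuming an average mass of 660 g/mol per base pair of double-stranded DNA (a common approximation). The function names and the 4,700 bp and 3,000 bp lengths are hypothetical values chosen for illustration and are not taken from this chapter.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def ng_to_fmol(mass_ng: float, length_bp: int) -> float:
    """Convert a mass of dsDNA (ng) to an amount (fmol)."""
    # ng -> g is a factor 1e-9 and mol -> fmol is 1e15, so the net factor is 1e6.
    return mass_ng * 1e6 / (length_bp * AVG_BP_MASS)


def fmol_to_ng(amount_fmol: float, length_bp: int) -> float:
    """Convert an amount of dsDNA (fmol) to a mass (ng)."""
    return amount_fmol * length_bp * AVG_BP_MASS / 1e6


def volume_for_fmol(target_fmol: float, conc_ng_per_ul: float, length_bp: int) -> float:
    """Volume (in microliters) of a DNA stock that contains target_fmol."""
    return fmol_to_ng(target_fmol, length_bp) / conc_ng_per_ul


# Hypothetical 4,700 bp donor vector at 150 ng/ul, expressed in fmol/ul.
print(round(ng_to_fmol(150, 4700), 1))

# Volume of a hypothetical 3,000 bp entry clone at 100 ng/ul that holds 5 fmol.
print(round(volume_for_fmol(5, 100, 3000), 3))
```

Volumes calculated this way are often well below 1 μl, so in practice a stock may need to be diluted before it is pipetted into a 5 μl reaction.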

3.5 Gene Integration in H. polymorpha

The Gateway™ technology can also be used to construct H. polymorpha expression plasmids. In this case the pENTR 4 1 vector with the desired promoter, the pENTR 2 1 vector with the gene of interest, and the pENTR 2 3 vector with the terminator are combined with the final destination vector (pDEST plasmid with selection marker, Table 6). The final expression plasmid can be used as an integration plasmid.

3.5.1 Selection of Strains with Single-, Double-, or Multi-Copy Gene Integration

1. Pool transformants of the 89 % plate and resuspend the colonies in 20 ml YPD.
2. Grow for approximately 50 generations in YPD medium containing 50 mg/l ampicillin (to prevent contamination). Growth on nonselective medium is performed to get rid of transformants in which the DNA is not integrated.
3. Harvest the cells by centrifugation at 3,000 × g for 5 min at 37 °C. Discard the supernatant, and add 30–40 ml of fresh YPD medium. Repeat this step 4–5 times.
4. Make dilutions from cell suspension in YND and plate on YND plates containing the required amino acids. Normally, 50 μl of 10³-, 10⁴-, and 10⁵-fold diluted suspension give selectable individual colonies.
5. After 2–4 days, small-, medium-, and large-sized colonies can be discriminated. Small ones typically have one copy of the expression plasmid integrated, medium-sized ones two copies, and large ones three or more copies. Selecting these different-sized colonies will give a range of expression levels of the heterologous gene (i.e., the correlation between the copy number of the plasmid and the size of the colony is linked to the presence of different copies of the auxotrophic marker) [24, 26].
6. For protein production, leucine should be added to the culture medium to get optimal growth of the strains containing single- or double-copy integration, because otherwise leucine may become limiting (Table 3).
7. Check the site-specific integration (for expression we normally integrate in the promoter region by means of single crossover) by Southern blotting [20].
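The dilution plating in step 4 and the roughly 50 generations of nonselective growth in step 2 can be checked with simple arithmetic. The sketch below is illustrative only: the function names, the example density of 1 × 10⁸ cells/ml, and the 1,000-fold dilution between growth rounds are assumptions, not values from this protocol.

```python
import math


def expected_colonies(cells_per_ml: float, dilution_fold: float, plated_ul: float) -> float:
    """Expected number of colonies when plated_ul of a diluted suspension is plated."""
    return cells_per_ml / dilution_fold * (plated_ul / 1000.0)


def generations(n_start: float, n_end: float) -> float:
    """Number of doublings needed to grow from n_start to n_end cells."""
    return math.log2(n_end / n_start)


# Assumed culture density of 1e8 cells/ml, 50 ul plated per dilution.
for fold in (1e3, 1e4, 1e5):
    print(f"{fold:.0e}-fold dilution: {expected_colonies(1e8, fold, 50):.0f} colonies expected")

# Regrowing a 1,000-fold diluted culture to its starting density takes about
# ten doublings, so five such rounds give roughly 50 generations.
per_round = generations(1, 1000)
print(f"{per_round:.2f} generations per round, {5 * per_round:.1f} after five rounds")
```

Under these assumed values only the highest dilution would give a countable plate, which is one reason to plate several dilutions in parallel.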

4 Notes

1. YNB is not suitable to grow H. polymorpha in batch cultures on methanol. Use YNB only in agar plates (2 % agar) to select autotrophic/prototrophic colonies.
2. Minimal medium is the preferred medium for cultivating yeast cells in liquid media.
3. The attB1 and attB2 reverse primers must include one additional nucleotide to maintain the proper reading frame.
4. Any in-frame stop codons between the attB sites and your gene of interest must be removed.
5. If you do not wish to fuse your PCR product in frame with a C-terminal tag, your gene of interest or the attB2 primer must include a stop codon.
6. Choose the regions in such a way that the terminator and the promoter of the flanking genes remain intact.
7. For strains harboring plasmids, pre-cultivation is performed in media without yeast extract. For the final culture (methanol) 0.02 % yeast extract can be used.
8. The DNA pellet is not translucent; over-drying the pellet might cause difficulties in resuspending the DNA.
9. Potential bacterial contamination of the cultures can be minimized by adding 50 mg/l kanamycin or 100 mg/l ampicillin to the cultures.
10. Instead of preparing a stock solution of DTT, alternatively use a fresh solution of 100 mg DTT per 25 ml TE buffer.
11. To obtain the highest frequencies, resuspend the cell pellet in the smallest possible volume of STM.
12. For long-term storage, aliquots of cells are frozen in liquid N2 and stored at −80 °C. The efficiency of competent cells drops upon freezing. For maximum efficiency use freshly prepared competent cells.
13. Purify the PCR product from an agarose gel using commercially available agarose gel extraction kits in order to remove any primer dimers and other nonspecific DNA fragments.
14. Thaw and keep enzymes in a −20 °C block and return to −80 °C freezer as soon as possible.
15. Proteinase K is also active at low temperatures (even on ice); thaw on ice, and return to −80 °C freezer as soon as possible to avoid proteolytic digestion of the enzyme.
16. Because of the low amount of DNA used in these reactions you can add 1 μl glycogen (20 μg/μl) before ethanol precipitation. Addition of glycogen makes the DNA pellet visible and reduces the chance of losing the DNA pellet during ethanol precipitation.

References

1. Gidijala L, Kiel JA, Douma RD et al (2009) An engineered yeast efficiently secreting penicillin. PLoS One 12:e8317
2. Keasling JD (2010) Manufacturing molecules through metabolic engineering. Science 330:1355–1358
3. Faber KN, Harder W, Ab G et al (1995) Review: methylotrophic yeasts as factories for the production of foreign proteins. Yeast 11:1331–1344
4. Cox H, Mead D, Sudbery P et al (2000) Constitutive expression of recombinant proteins in the methylotrophic yeast Hansenula polymorpha using the PMA1 promoter. Yeast 16:1191–1203
5. Heo JH, Hong WK, Cho EY et al (2003) Properties of the Hansenula polymorpha-derived constitutive GAP promoter, assessed using an HSA reporter gene. FEMS Yeast Res 4:175–184
6. Kiel JA, Titorenko VI, van der Klei IJ et al (2007) Overproduction of translation elongation factor 1-α (eEF1A) suppresses the peroxisome biogenesis defect in a Hansenula polymorpha pex3 mutant via translational readthrough. FEMS Yeast Res 7:1114–1125
7. Hamilton SR, Bobrowicz P, Bobrowicz B et al (2003) Production of complex human glycoproteins in yeast. Science 301:1244–1246
8. Janowicz ZA, Melber K, Merckelbach A et al (1991) Simultaneous expression of the S and L surface antigens of hepatitis B and formation of mixed particles in the methylotrophic yeast Hansenula polymorpha. Yeast 7:431–433
9. Brierley RA, Davis GR, Holtz GC (1994) Production of insulin-like growth factor 1 in methylotrophic yeast cells, United States Patent, No. 5324639
10. Mayer AF, Hellmuth K, Schlieker H et al (1999) An expression system matures: a highly efficient and cost-effective process for phytase production by recombinant strains of Hansenula polymorpha. Biotechnol Bioeng 63:373–381
11. Rainer Roggenkamp H, Eckart M, Janowicz Z, Hollenberg CP (1986) Transformation of the methylotrophic yeast Hansenula polymorpha by autonomous replication and integration vectors. Mol Gen Genet 202:302–308
12. Suckow M, Gellissen G (2005) The expression platform based on H. polymorpha Strain RB11 and its derivatives – history, status and perspectives. In: Gellissen G (ed) Hansenula polymorpha. Wiley, Weinheim, pp 105–123
13. Levine DW, Cooney CL (1973) Isolation and characterization of a thermotolerant methanol-utilizing yeast. Appl Microbiol 26:982–990
14. Higgins DR, Cregg JM (1998) Introduction to Pichia pastoris. Methods Mol Biol 103:1–15
15. Gellissen G, Kunze G, Gaillardin C et al (2005) New yeast expression platforms based on methylotrophic Hansenula polymorpha and Pichia pastoris and on dimorphic Arxula adeninivorans and Yarrowia lipolytica – a comparison. FEMS Yeast Res 5:1079–1096
16. Gleeson MA, Sudbery PE (1988) Genetic analysis in the methylotrophic yeast Hansenula polymorpha. Yeast 4:293–303
17. Ramezani-Rad M, Hollenberg CP, Lauber J et al (2003) The Hansenula polymorpha (strain CBS4732) genome sequencing and analysis. FEMS Yeast Res 4:207–215
18. Sambrook J, Fritsch EF, Maniatis T (1989) Molecular cloning: a laboratory manual, 2nd edn. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
19. Amberg DC, Burke DJ, Strathern JN (2006) Yeast Colony PCR. Cold Spring Harbor Protocols. 1, prot4170
20. Amberg DC, Burke DJ, Strathern JN (2006) Isolation of Yeast Genomic DNA for Southern Blot Analysis. Cold Spring Harbor Protocols. 1, prot4149
21. Saraya R, Krikken AM, Veenhuis M et al (2011) Peroxisome reintroduction in Hansenula polymorpha requires Pex25 and Rho1. J Cell Biol 193:885–900
22. Saraya R, Krikken AM, Kiel JA et al (2012) Novel genetic tools for Hansenula polymorpha. FEMS Yeast Res 12:271–278
23. Kiel JA, Hilbrands RE, van der Klei IJ et al (1999) Hansenula polymorpha Pex1p and Pex6p are peroxisome-associated AAA proteins that functionally and physically interact. Yeast 15:1059–1078
24. Haan GJ, van Dijk R, Kiel JA et al (2002) Characterization of the Hansenula polymorpha PUR7 gene and its use as selectable marker for targeted chromosomal integration. FEMS Yeast Res 2:17–24
25. Gietl C, Faber KN, van der Klei IJ et al (1994) Mutational analysis of the N-terminal topogenic signal of watermelon glyoxysomal malate dehydrogenase using the heterologous host Hansenula polymorpha. Proc Natl Acad Sci U S A 91:3151–3155
26. Gidijala L, van der Klei IJ, Veenhuis M et al (2007) Reprogramming Hansenula polymorpha for penicillin production: expression of the Penicillium chrysogenum pcl gene. FEMS Yeast Res 7:1160–1167
27. Kiel JA, Keizer-Gunnink IK, Krause T et al (1995) Heterologous complementation of peroxisome function in yeast: the Saccharomyces cerevisiae PAS3 gene restores peroxisome biogenesis in a Hansenula polymorpha per9 disruption mutant. FEBS Lett 377:434–438
28. Baerends RJ, Salomons FA, van der Klei IJ et al (1997) Deviant Pex3p levels affect normal peroxisome formation in Hansenula polymorpha: a sharp increase of the protein level induces the proliferation of numerous, small protein-import competent peroxisomes. Yeast 13:1449–1463
29. Bellu AR, Komori M, van der Klei IJ et al (2001) Peroxisome biogenesis and selective degradation converge at Pex14p. J Biol Chem 276:44570–44574
30. van der Klei IJ, Hilbrands RE, Swaving GJ et al (1995) The Hansenula polymorpha PER3 gene is essential for the import of PTS1 proteins into the peroxisomal matrix. J Biol Chem 270:17229–17236
31. Otzen M, Perband U, Wang D et al (2004) Hansenula polymorpha Pex19p is essential for the formation of functional peroxisomal membranes. J Biol Chem 279:19181–19190
32. Komori M, Rasmussen SW, Kiel JA et al (1997) The Hansenula polymorpha PEX14 gene encodes a novel peroxisomal membrane protein essential for peroxisome biogenesis. EMBO J 16:44–53
33. Kiel JA, Otzen M, Veenhuis M et al (2005) Obstruction of polyubiquitination affects PTS1 peroxisomal matrix protein import. Biochim Biophys Acta 1745:176–186
34. Salomons FA, Kiel JAKW, Faber KN et al (2000) Overproduction of Pex5p stimulates import of alcohol oxidase and dihydroxyacetone synthase in a Hansenula polymorpha Pex14 null mutant. J Biol Chem 275:12603–12611
35. Faber KN, van Dijk R, Keizer-Gunnink I et al (2002) Import of assembled PTS1 proteins into peroxisomes of the yeast Hansenula polymorpha: yes and no! Biochim Biophys Acta 1591:157–162
36. Faber KN, Swaving GJ, Faber F et al (1992) Chromosomal targeting of replicating plasmids in the yeast Hansenula polymorpha. J Gen Microbiol 138:2405–2416
37. Gellissen G, Kang AH (2005) Hansenula polymorpha. In: Production of recombinant proteins. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
38. Tan X, Waterham HR, Veenhuis M et al (1995) The Hansenula polymorpha PER8 gene encodes a novel peroxisomal integral membrane protein involved in proliferation. J Cell Biol 128:307–319
39. Nagotu S, Saraya R, Otzen M et al (2008) Peroxisome proliferation in Hansenula polymorpha requires Dnm1p which mediates fission but not de novo formation. Biochim Biophys Acta 1783:760–769
40. Cepinska MN, Veenhuis M, van der Klei IJ et al (2011) Peroxisome fission is associated with reorganization of specific membrane proteins. Traffic 12:925–937
41. Nagotu S, Krikken AM, Otzen M et al (2008) Peroxisome fission in Hansenula polymorpha requires Mdv1 and Fis1, two proteins also involved in mitochondrial fission. Traffic 9:1471–1484

Chapter 4

Molecular Tools and Protocols for Engineering the Acid-Tolerant Yeast Zygosaccharomyces bailii as a Potential Cell Factory

Paola Branduardi, Laura Dato, and Danilo Porro

Abstract

Microorganisms offer tremendous potential as cell factories, and they have indeed been used by humans for centuries for biotransformations. Among them, yeasts combine the advantage of a unicellular state with a eukaryotic organization, and, in the era of biorefineries, their biodiversity can offer solutions to specific process constraints. Zygosaccharomyces bailii, an ascomycetous budding yeast, is widely known for its peculiar tolerance to various stresses, among which are organic acids. Although some of the molecular tools and protocols routinely used to manipulate Saccharomyces cerevisiae can be applied to this yeast, adjustments and optimizations are necessary. Here, we describe in detail protocols for transformation, for targeted gene disruption or gene integration, and for designing episomal expression plasmids helpful for developing and further studying the yeast Z. bailii.

Key words Zygosaccharomyces bailii, Yeast transformation, Targeted gene deletion, Plasmids, Promoters

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_4, © Springer Science+Business Media, LLC 2014

1 Introduction

Industrial biotechnology allows for the development of biocompatible processes carried out by microorganisms for the production of many different proteins and chemical compounds with a wide range of applications, from pharmaceuticals to renewable energy, food, cosmetics, and textiles, and therefore with relevant economic and social impact [1, 2]. Efforts in current research are made to render existing bioprocesses more efficient, and hence more economically sustainable and competitive, and to establish novel processes to widen the set of useful products. Yeasts are extensively used in such biotechnological processes [3], and besides the most widely used and best-known species, Saccharomyces cerevisiae, other yeast species, the so-called nonconventional yeasts, have revealed their potential and are now being exploited and studied [4–6]. In fact, the natural properties of some species, such as utilization of particular compounds as nutrients or tolerance to different kinds of stress factors, can help to overcome some of the main bottlenecks.

Zygosaccharomyces bailii has been proposed as a new host because of its convenient properties, among which is its resistance to environmental stresses and specifically to weak organic acids [7–13]. Such compounds, if present at high concentration and especially in their undissociated form, are detrimental for growth and cellular metabolism [14]. However, coping with their presence will be of crucial importance in the era of biorefineries. Acetic acid is in fact abundantly present in pretreated lignocellulose, which is the most promising substrate for second-generation processes [15]. Other organic acids, such as lactic acid and succinic acid, are target products of industrial bioprocesses, as they will represent the new building blocks for the construction of biodegradable and sustainable polymers [16]. For these reasons, the development of organic acid-tolerant cell factories is of primary importance.

Although molecular tools and protocols routinely used to manipulate Saccharomyces cerevisiae can be applied, some optimizations are necessary for their use in Z. bailii. One of the major constraints is the impossibility of obtaining stable Z. bailii haploid strains, which makes genetic manipulations more complicated. Moreover, for a long time no auxotrophic strains were available, making it necessary to rely on dominant markers, which also had to be defined. Finally, considering the ultimate goal of using this yeast in industrial processes, there is a need to develop marker-free stable engineered strains.

By applying and mixing tools and techniques developed for conventional and nonconventional yeasts, and by considering the specific physiology of Z. bailii, transformation protocols have been optimized. Centromeric, multicopy, and integrative vectors exploiting different promoters have been developed. With the plasmids and strains mentioned here, it has been possible to achieve the expression of different heterologous genes, resulting in the production of the corresponding heterologous proteins of diverse origin, from bacterial to mammalian and plant [17–23]. Furthermore, different leader sequences were successfully defined and tested for heterologous protein secretion, which is highly desirable for reducing downstream recovery costs and for relieving the cell from the metabolic stress caused by protein accumulation. Here, we describe the protocols we routinely use for transforming Z. bailii together with the molecular tools developed for gene expression and deletion.

2 Materials

All the solutions are prepared using ultrapure water (MilliQ) and analytical grade reagents, while media for cell cultivation are prepared using deionized water. Unless indicated otherwise, solutions are autoclaved for sterilization; storing temperature and time are specified in Subheading 3.
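Several stock solutions in Subheading 2.2 are defined by a target molarity and a final volume, and the mass to weigh out follows from mass = molarity × volume × molecular weight. The sketch below illustrates this relationship. The function name is hypothetical, and the molecular weights other than that of DTT (154.25, as given in the text) are standard values supplied here as assumptions.

```python
def grams_needed(molarity_mol_per_l: float, volume_ml: float, mol_weight: float) -> float:
    """Mass (g) of solute needed for a solution of the given molarity and final volume."""
    return molarity_mol_per_l * (volume_ml / 1000.0) * mol_weight


# Molecular weights in g/mol; all except DTT are assumed standard values.
MOL_WEIGHTS = {
    "lithium acetate dihydrate": 102.02,
    "DTT": 154.25,
    "Tris base": 121.14,
    "sorbitol": 182.17,
}

# (solute, molarity in mol/l, final volume in ml)
stocks = [
    ("lithium acetate dihydrate", 1.0, 100),
    ("lithium acetate dihydrate", 2.0, 100),
    ("DTT", 1.0, 10),
    ("Tris base", 1.0, 1000),
    ("sorbitol", 1.0, 100),
]

for solute, molarity, volume in stocks:
    grams = grams_needed(molarity, volume, MOL_WEIGHTS[solute])
    print(f"{molarity:g} M {solute}, {volume} ml: {grams:.2f} g")
```

The same calculation can be used to scale any of these stocks to a different final volume.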

2.1 Zygosaccharomyces bailii Strains

Z. bailii strains utilized in our laboratory are as follows: ATCC 36947, ATCC 60483, and ATCC 8766. The described protocols apply to all of them, with minor differences in results, unless otherwise specified.

2.2 Solutions for Transformation Protocols

Solutions are described in detail here only if specific technical details are required for their preparation.

1. YPD/YPF media. Dissolve 2 % (w/v) Tryptone, 1 % (w/v) Yeast Extract, 2 % (w/v) Glucose or Fructose for YPD or YPF medium, respectively, in distilled or deionized water and autoclave to sterilize. In case of solid medium, add 2 % (w/v) Agar to the ingredients. Store the media at room temperature and pre-warm them to 30 °C before inoculation.

2. Antibiotic stock solutions. 1,000× G418 stock solution: dissolve 2 g of geneticin (G418) in 10 ml of distilled or deionized water, sterilize by filtration through a sterile 0.22 μm pore size filter and dispense into 1-ml aliquots. 1,000× nourseothricin stock solution: dissolve 1 g of nourseothricin (nourseothricin sulfate, cloNAT, WERNER BioAgents, Germany) in 10 ml of distilled or deionized water, sterilize by filtration through a sterile 0.22 μm pore size filter and dispense into 1-ml aliquots. Store all aliquots at −20 °C.

3. Lithium acetate. 1 M lithium acetate: dissolve 10.2 g lithium acetate dihydrate in 100 ml of distilled or deionized water; 2 M solution: dissolve 20.4 g in 100 ml of distilled or deionized water; filter-sterilize or autoclave. There is no need to titrate this solution, but the final pH should be between 8.4 and 8.9. Store the solution at room temperature in small aliquots to better preserve it and prepare new stocks every 2–3 months. Prepare 0.1 M lithium acetate solution just before use and in required amounts. When the solution is too old, white precipitates tend to form.

4. DL-dithiothreitol (DTT). Prepare a 1 M DTT solution by dissolving 1.5 g of DTT (anhydrous m.w. = 154.25) in 8 ml of MilliQ water. Adjust the total volume to 10 ml, filter-sterilize (do not autoclave) and dispense into 1-ml aliquots. Store DTT aliquots in the dark (wrapped in aluminum foil) at −20 °C, indefinitely. It is advisable to use the complete aliquot once it has been thawed; re-freezing it more than 2–3 times is not advisable.

5. Tris–HCl, pH 7.5. Prepare 1 M Tris–HCl by dissolving 121.1 g of Tris base in 800 ml of distilled or deionized water. Adjust the pH by adding 65 ml of concentrated HCl (37 %, i.e., approximately 12 M) (check the pH of the solution only if your electrode is suitable for Tris buffers) and then adjust the volume to 1,000 ml. Sterilize by autoclaving. The solution can be stored for 3–4 months at room temperature, preferably in smaller aliquots. If the solution turns yellowish, discard it.

6. Sorbitol. 1 M sorbitol solution: dissolve 18.2 g of sorbitol in 100 ml of distilled or deionized water, and autoclave to sterilize. Store at room temperature. When preparing YPD/YPF medium with sorbitol, add sorbitol directly as powder together with the other ingredients of the medium recipe.

7. Single-strand carrier DNA. Weigh 200 mg of high molecular weight DNA (for example, deoxyribonucleic acid Sodium Salt Type III from Salmon Testes, Sigma D1626) into 100 ml of TE buffer (10 mM Tris–HCl pH 8.0, 1 mM EDTA). Disperse the DNA into solution by drawing it up and down repeatedly in a 10 ml pipette. Mix vigorously on a magnetic stirrer for 2–3 h (or until fully dissolved; if overnight stirring is necessary, stir continuously in a cold room at 4 °C). Filter-sterilize the solution through a 0.45-μm sterile filter unit, aliquot and store at −20 °C. Boil an aliquot for 5 min and then chill it on ice before use. A boiled sample can be used 2–3 times.

8. Polyethylene glycol (PEG) 3350. Prepare 50 % (w/v) PEG 3350 (MW 3350) solution with distilled or deionized water. Place 50 g of PEG 3350 in a 150 ml glass beaker and add 35 ml of deionized water. Dissolve PEG by stirring and then adjust to the final volume of 100 ml. Filter-sterilize the solution with a 0.45-μm filter unit. Alternatively, the PEG solution can be autoclaved, but autoclaving can result in water evaporation and therefore change the actual concentration of the PEG solution. Store the PEG solution in a tightly capped container to prevent water evaporation.
Small variations above or below the optimal PEG concentration (i.e., 33 % (w/v)) during the transformation reaction can reduce the efficiency of transformation.

2.3 Plasmids, Promoters, and Leader Sequences for Gene Expression and Protein Secretion in Z. bailii

The development of molecular tools for Z. bailii is necessary for genetic manipulation of this yeast. Centromeric, multicopy, and integrative vectors have been developed (see Table 1). Here, we summarize the array of tools currently available for Z. bailii genetic manipulation (in terms of protein expression and secretion), citing appropriate references for molecular details.

2.3.1 Centromeric Plasmids

1. pZ3 (Fig. 1a) is the first centromeric plasmid built for Z. bailii. pZ3 bears the S. cerevisiae autonomous replicating sequence ARS1 and the CEN4 sequence; the latter improves plasmid stability during the segregation step [24]. S. cerevisiae genetic elements were initially used because of the lack of information about similar elements in Z. bailii and because a similar approach had been applied successfully in the closely related yeast Zygosaccharomyces rouxii [25]. The backbone of the vector pZ3 is the S. cerevisiae integrative expression plasmid pYX022 (R&D Systems, Wiesbaden, Germany). The ARS1-CEN4 fragment is from plasmid YCplac33 (ATCC 87623, GenBank accession numbers X75456, L26352) and replaces the original HIS3 marker gene of pYX022, usually utilized for homologous integration (see [17] for molecular details of the cloning steps). As a dominant selection marker, plasmid pZ3 bears the E. coli-derived kanamycin resistance gene (KanR), conferring resistance to geneticin (G418).
2. pZ4: same as pZ3 except for the dominant marker, which is the hygromycin resistance gene (hphR) [26, 27] from Klebsiella pneumoniae (Fig. 1b).
3. pZ5: same as pZ3 and pZ4 except for the dominant marker, which is the nourseothricin resistance gene (natR) [18, 27, 28] from Streptomyces noursei (Fig. 1c).

It should be underlined that the described plasmids (1) can be used to transform both Z. bailii and S. cerevisiae, giving the possibility to compare the experimental results obtained in the two yeasts; and (2) can be used as alternative plasmids for baker's yeast transformation, being particularly useful when manipulating industrial prototrophic strains.

Table 1 Plasmids constructed for heterologous gene expression in Z. bailii
Columns: Vector; Promoter; Propagation system; Heterologous protein; Leader sequence (source); Selection; Reference.
Vectors listed: pZ3, pZ3rG, pZ4, pZ5, pZ5(-Nco), pZ5(-Nco)ScGAS1, pZ3bT, pZ5bT, pZ3LacZ, pZ3bTLacZ, pZ3bT-LDH, pZ3rGLacZ, pZ3GFP, pZ3ppαGFP, pZ3klIL-1β, pZ5klhIL-1β, pZ3ppαIL-1β, pZ3pkbIL-1β, pZ3GAA, pZ3STA2, pZ3klSTA2, YCPlac111KlIL-1β, YCPlac111bTLDH, pZ-ALO, pZ-RGLO, pYX022ZbGAS1, pLAT-ADH, pZLN022, pZLN022XPR2, pZLN022LIP1, pZ3ILbTLacZ, pZbrDL2d, p195Z3-LacZ, p195I-LacZ, p195IF-LacZ, p195ITF-LacZ, p195ITFI-LacZ, p195ITFi-LacZ, p212ScLEU2d.
Promoters: ScTPI, ZbTPI, ZrGAPDH, ScADH1. Propagation systems: centromeric, episomal, integrative (in some cases in ZbLEU2). Heterologous proteins: β-galactosidase, LDH, GFP, IL-1β, glucoamylase, β-1,3-glucanosyl-transglycosylase, L-gulono-1,4-lactone oxidase, D-arabinono-1,4-lactone oxidase, lipase, protease. Leader sequences (source): Sc MFα-1, Kl killer toxin, Zb killer toxin, Aa glucoamylase, Sd glucoamylase, Cr lipase, Yl protease. Selection: kanR, hphR, natR, LEU2, LEU2d, HIS3. References: Branduardi et al. [17, 18], Passolunghi et al. [21], Dato et al. [22], Sauer et al. [23], and Branduardi et al. (unpublished data).
Sc S. cerevisiae, Zb Z. bailii, Zr Z. rouxii, Kl K. lactis, Sd S. cerevisiae var. diastaticus, Aa A. adeninivorans, Yl Yarrowia lipolytica, Cr Candida rugosa, TPI triose phosphate isomerase, GAPDH glyceraldehyde phosphate dehydrogenase, ADH alcohol dehydrogenase, GFP green fluorescent protein, IL-1β human interleukin 1β, LDH lactate dehydrogenase; kanR, hphR, and natR: cassettes conferring resistance to kanamycin, hygromycin, and nourseothricin, respectively

Fig. 1 Schematic maps of the plasmids constructed for gene expression in Z. bailii. (a) pZ3: the backbone of the plasmid is the S. cerevisiae pYX022 expression plasmid; the expression cassette is based on the glycolytic S. cerevisiae TPI promoter and the corresponding poly A tail. The ARS/CEN sequence is from YCplac33 and ensures replication and stability of the plasmid, while the KanR cassette, derived from pFA6-KanMX4, allows G418-based selection of the transformants; (b) pZ4: the same as pZ3, except for the selective marker, being the hphR cassette, derived from pAG26 and allowing hygromycin-based selection of the transformants; (c) pZ5: the same as pZ3, except for the selective marker, being the natR cassette, derived from pAG25 and allowing nourseothricin-based selection of the transformants (adapted from [17, 18])


Paola Branduardi et al.

The segregational stability of the described centromeric plasmids is 70 ± 5 %, independent of the plasmid or the yeast species (see Note 1). Being based on the pYX plasmid series, in these new plasmids gene expression is under the control of the S. cerevisiae TPI (ScTPI) promoter, which is proven to be functional in Z. bailii.

2.3.2 Multicopy Plasmids

If using a [cir+] strain (i.e., a strain harboring 2 μm DNA), an additional artificial vector only needs to have the cis-acting elements, i.e., the autonomously replicating sequence (ARS) for replication, the inverted repeat (IR) for amplification, and the sequence conferring stability (STB) [29], as the endogenous 2 μm plasmid possesses all the trans-acting elements necessary for amplification and maintenance. In Z. bailii, two natural plasmids, pSB1 and pSB2, have been described, which were shown to be functionally and structurally related to the S. cerevisiae 2 μm plasmid [30, 31].
1. p195I: considering that ATCC 36947, ATCC 60483, and ATCC 8766 are [cir+] strains containing the pSB2 plasmid, the multicopy plasmid p195I for Z. bailii harbors only the cis-acting elements (see Notes 2 and 3).
2. p195I-LacZ (Fig. 2a): It derives from p195I and harbors the bacterial β-galactosidase gene as a reporter system under the control of the ScTPI promoter. This plasmid yields a β-galactosidase activity only slightly higher than that obtained with the centromeric plasmid and is characterized by very low stability.
3. p195ITFI and p195ITFI-LacZ: These plasmids have been constructed by cloning all the other pSB2 functional elements, including the FLP recombinase gene, the regulatory element TF-C, and the second inverted repeat IR-B (see Fig. 2a, b). The elements mentioned above were inserted maintaining their relative position, according to the pSB2 structure described in [31]. These plasmids have higher stability and guarantee effective heterologous expression.

2.3.3 Integrative Plasmids

pZ3ILbT-LacZ: It is an integrative vector developed by exploiting the Z. bailii LEU2 (ZbLEU2) gene sequence for targeted gene integration into the ZbLEU2 locus. The ZbLEU2 ORF, with its own promoter, is taken from pSTbLeuZ1 (SnaBI/HincII restriction fragment) and inserted into the plasmid pZ3bT-LacZ cut with NaeI (blunt) [17], replacing the ARS/CEN sequence. This directs the integration into the ZbLEU2 genomic locus. The KanMX4 cassette was maintained as a selectable marker. This integrative plasmid is 100 % stable, independently of the cultivation medium and of the generation time, as demonstrated by measuring β-galactosidase activity and plasmid stability in different media and over time.
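Plasmid stability in this section is expressed as the percentage of colony-forming units (CFU) found on selective versus nonselective medium. The following minimal Python sketch shows that calculation; the function name and the colony counts are hypothetical and serve only as an illustration.

```python
def plasmid_stability_percent(cfu_selective, cfu_nonselective):
    """Percentage of cells that still carry the plasmid.

    cfu_selective: colonies counted on selective plates
    cfu_nonselective: colonies counted on nonselective plates,
        plated from the same dilution and the same volume
    """
    if cfu_nonselective <= 0:
        raise ValueError("the nonselective plate must contain colonies")
    return 100.0 * cfu_selective / cfu_nonselective


# Hypothetical counts: 140 colonies on selective, 200 on nonselective plates
stability = plasmid_stability_percent(140, 200)
print(f"plasmid stability: {stability:.0f} %")
```

Both plates have to be spread from the same dilution and volume; otherwise the counts must first be converted to CFU/mL.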

Engineering Z. bailii as Cell Factory


Fig. 2 Heterologous β-galactosidase activity and plasmid stability: comparison between the centromeric and the different multicopy plasmids. (a) Schematic representation of the episomal vectors of the pSB2 elements. P promoter, L LacZ gene, T terminator (poly-A), A AmpR gene, K KanR cassette, o E. coli ori. (b) Z. bailii cells or (c) S. cerevisiae cells were transformed with the indicated expression vectors bearing the E. coli LacZ gene, or with the empty vector. Clones were grown in YPD-G418 until exponential phase and then harvested. The β-galactosidase (β-gal) activity was assayed on crude cell extracts as described previously [47]. Plasmid stability was determined as the percentage of CFU found on selective vs. nonselective medium. Three independent clones for each strain were tested in at least three independent experiments. Standard deviation was always

[Fig. 4 schematic; recoverable labels: 24 h incubation, 3rd passage, survivors, inoculate survivors into stress medium, stress, nth passage, stock at −80 °C]

Fig. 4 Experimental protocol for selection under continuous stress conditions

3.4 Random Selection of Individual Mutants from Final Population

1. Inoculate stock cultures of the final populations obtained from both batch and chemostat selections into liquid YMM in Erlenmeyer flasks and incubate them until the logarithmic phase of growth (OD600 = 1.2–1.5).
2. Prepare serial dilutions of the samples (see Note 9).
3. Spread the diluted culture onto petri plates. Use an optimum dilution factor to obtain individual colonies (see Note 9).
4. Incubate the plates at 30 °C until individual colonies appear on the surface of the plates.
5. Randomly pick at least 12 single colonies of different sizes (small and large) and prepare their stock cultures as explained in Subheading 2.1.
6. Each single colony represents an individual mutant.

3.5 Estimation of Stress Resistance

3.5.1 Most Probable Number (MPN) Method

Stress resistance of the single colonies that were selected randomly (step 5 in Subheading 3.4) is estimated by the MPN method [10], with slight modifications.
1. Inoculate the culture to be tested (step 5 in Subheading 3.4) into YMM for overnight growth at 30 °C.

Evolutionary Engineering of Yeast


2. After 14–16 h of incubation, determine the OD600 of the pre-culture.
3. Inoculate the pre-culture into 10 mL of YMM in 50 mL culture tubes to an initial OD600 of 0.35.
4. Incubate the cultures at 30 °C until an OD600 of 1.2–1.6 is reached.
5. Prepare YMM and YMM with the selected stress factor(s) to fill two separate 96-well plates (180 μL in each well).
6. Distribute 180 μL of each medium into the first five columns of the 96-well plate as the five repeats of one sample.
7. Add 20 μL of the culture from step 4 into the first row (five wells) of the 96-well plate to obtain a tenfold dilution of the culture in the first well (see Note 10).
8. Take 20 μL of the diluted samples in the first row and transfer them to the next row below, containing 180 μL of medium, to obtain a 10²-fold dilution.
9. Use new tips for each dilution step and continue diluting the sample up to a 10⁸-fold dilution.
10. After the serial dilution steps, incubate the cells at 30 °C and 50 rpm for 72 h.
11. Use the five-tube MPN table (Table 1 from [11]) to calculate the number of cells/mL in stress-treated and control samples, as explained in Fig. 5.
12. Calculate the survival ratio according to Eq. 2:

$$\text{Survival ratio} = \frac{\text{number of cells/mL of stress-treated samples}}{\text{number of cells/mL of control samples}} \tag{2}$$

3.5.2 Spot Assay

1. Inoculate the wild-type strain, the final populations, and individual mutants (step 5 in Subheading 3.4) into YMM and incubate them overnight at 30 °C and 150 rpm.
2. The next day, inoculate those cultures into fresh YMM to an initial OD600 of 0.35 (~5 × 10⁶ cells/mL).
3. Grow the cultures until their late logarithmic phase (OD600 ~ 3).
4. Harvest cells by centrifugation (10,000 × g, 5 min) to obtain 4 OD600 units of cells (e.g., 4 OD units = 1 mL of culture at OD600 4.0).
5. Centrifuge them at 10,000 × g for 5 min, discard the supernatant, and resuspend the cell pellets in 50 μL of sterile dH2O.
6. For each culture, prepare 10¹-, 10²-, 10³-, and 10⁴-fold dilutions and spot 3 μL of each dilution onto the control and stress-containing YMM plates.
7. Incubate the plates for 3 days at 30 °C and observe the growth of the cultures (Fig. 6).
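The volumes behind steps 2, 4, and 6 follow from simple proportionality. The sketch below is only an illustration: the function names, the overnight OD600 of 7.0, and the 10 mL culture volume are assumptions and are not specified in the protocol.

```python
def inoculum_volume_ml(od_preculture, final_volume_ml, od_target=0.35):
    """Preculture volume (mL) that gives od_target in final_volume_ml (C1*V1 = C2*V2)."""
    if od_preculture <= 0:
        raise ValueError("OD600 of the preculture must be positive")
    return od_target * final_volume_ml / od_preculture


def harvest_volume_ml(od600, od_units=4.0):
    """Culture volume (mL) containing the requested number of OD600 units.

    One OD600 unit is taken as 1 mL of culture at OD600 = 1.0, so that
    4 units correspond to 1 mL of culture at OD600 = 4.0.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600


def dilution_factors(steps=4, fold=10):
    """Cumulative dilution factors of a serial dilution (10, 100, ...)."""
    return [fold ** i for i in range(1, steps + 1)]


# Hypothetical overnight culture at OD600 = 7.0, re-inoculated into 10 mL of YMM
print(inoculum_volume_ml(7.0, 10.0))
# Volume to harvest from a culture that has reached OD600 = 3.0
print(round(harvest_volume_ml(3.0), 2))
print(dilution_factors())
```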


Ceren Alkım et al.

Table 1 Five-tube MPN table (adapted from [11])

No. of tubes positive in first set | No. of tubes positive in middle set | No. of tubes positive in last set | MPN in the inoculum of the middle set of tubes

0

0

0

24

1. Inoculate an individual mutant (obtained in step 5 of Subheading 3.4) taken from a frozen glycerol stock stored at −80 °C to 10 mL of YMM in a 50 mL culture tube and incubate at 30 °C, 150 rpm. The individual mutant to be analyzed for genetic stability should have improved stress resistance, based on the tests explained in Subheadings 3.5.1 and 3.5.2. The mutant can be any individual mutant selected by batch or chemostat selection procedures.

[Fig. 5 plate schematic; rows correspond to the dilutions 10⁻¹ to 10⁻⁸]

Fig. 5 Schematic representation of the MPN method on a 96-well plate. Dark spots indicate the wells with growth upon serial dilution and 72 h of incubation. The MPN score for this growth pattern is 5-4-0 in the range of 10⁻⁴–10⁻⁶ dilutions; all five parallel samples could grow up to the 10⁻⁴ dilution, four out of five cultures could grow up to the 10⁻⁵ dilution, and none (0) of the five parallel samples could grow at the 10⁻⁶ dilution. Thus, according to the five-tube MPN table (Table 1), this result corresponds to 1.3 × 10⁵ cells/mL. Since the dilution factor is 10⁵ in this range, the number of cells in the culture is determined as 1.3 × 10¹⁰ cells/mL
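The value read from a five-tube MPN table can also be obtained numerically. The sketch below uses a standard maximum-likelihood formulation of the MPN; it is not taken from the chapter. The function names are hypothetical, and the relative inoculum volumes (10, 1, 0.1) simply encode three successive tenfold dilutions with the middle set as reference. Scaling the result to cells/mL of the original culture follows Fig. 5 and is not part of the sketch.

```python
import math


def mpn_estimate(positives, tubes_per_set=5, volumes=(10.0, 1.0, 0.1)):
    """Maximum-likelihood MPN in the inoculum of the middle set.

    positives: number of positive wells in each set, e.g. (5, 4, 0)
    volumes: relative inoculum per well in each set (middle set = 1)
    """
    if all(p == 0 for p in positives):
        return 0.0
    if all(p == tubes_per_set for p in positives):
        raise ValueError("all wells positive: the MPN is not bounded")

    total = sum(tubes_per_set * v for v in volumes)

    def score(lam):
        # Likelihood equation: sum(p_i*v_i / (1 - exp(-lam*v_i))) = sum(n_i*v_i)
        return sum(p * v / -math.expm1(-lam * v)
                   for p, v in zip(positives, volumes)) - total

    lo, hi = 1e-9, 1e6  # score(lo) > 0 > score(hi)
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def survival_ratio(cells_per_ml_stress, cells_per_ml_control):
    """Eq. 2: cells/mL of the stress-treated sample over cells/mL of the control."""
    return cells_per_ml_stress / cells_per_ml_control


print(round(mpn_estimate((5, 4, 0)), 1))  # 5-4-0 score of Fig. 5
# Hypothetical cell numbers for a stress-treated and a control sample
print(survival_ratio(1.3e8, 1.3e10))
```

For the 5-4-0 pattern the estimate rounds to 1.3, the mantissa quoted in the caption of Fig. 5.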

[Fig. 6 image; panels: control medium and stress-containing medium; strains: wild type and final population; dilutions: 10⁻⁴, 10⁻³, 10⁻², 10⁻¹]

Fig. 6 Spot assay results of the wild type and the final population obtained by evolutionary engineering on media with and without stress. Cells are spotted in a 1:10 dilution series (10⁻¹–10⁻⁴) on YMM plates without stress (control medium) and on stress-containing medium. Growth is monitored after 3 days of incubation. Both the wild type and the stress-resistant final population could grow on control medium at all dilutions (from 10⁻¹ to 10⁻⁴). However, the wild type could barely grow at its tenfold dilution (10⁻¹) on stress-containing medium, whereas the final population selected for that stress could grow at nearly all of its dilutions (10⁻¹–10⁻⁴), which demonstrates its high resistance to the stress factor tested

2. After incubation at 30 °C and 150 rpm for 24 h, prepare a glycerol stock culture from the first culture. Re-inoculate this culture into 10 mL of fresh liquid YMM in a 50 mL culture tube to an initial OD600 of 0.35 (~5 × 10⁶ cells/mL).
3. After 24 h, repeat the stocking and re-inoculation (step 2) until five passages of the mutant culture under nonselective conditions have been obtained.
4. Determine the stress resistance of these five successive passages quantitatively, as explained in Subheading 3.5.1 (see Note 11).
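Note 11 treats a mutant as genetically stable if the survival ratio shows no regular or gradual decrease over the five passages. One possible way to make this criterion explicit is sketched below. Reading "gradual decrease" as a strictly lower survival ratio at every successive passage is an assumption, and the survival ratios are hypothetical values.

```python
def shows_gradual_decrease(survival_ratios):
    """True if every passage has a lower survival ratio than the one before."""
    return all(later < earlier
               for earlier, later in zip(survival_ratios, survival_ratios[1:]))


def is_stable(survival_ratios):
    """Stable in the sense of Note 11: no gradual decrease over the passages."""
    return not shows_gradual_decrease(survival_ratios)


# Hypothetical survival ratios of five successive passages
fluctuating = [0.42, 0.40, 0.45, 0.41, 0.43]
declining = [0.42, 0.30, 0.21, 0.12, 0.05]
print(is_stable(fluctuating), is_stable(declining))
```

Replicate variation of the MPN estimate should be taken into account before a small monotonic drift is interpreted as instability.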


3.7 Analysis of Cross-Resistance to Other Stress Types

Mutants selected by evolutionary engineering for a specific stress resistance usually gain cross-resistance against other types of stresses [9, 10]. For this reason, cross-resistance to other stress types should be tested for mutants and compared to the results of the wild type (see Note 12).

4 Notes

1. Chemical or physical (UV) mutagenesis has been used to induce genetic variation and to increase genetic diversity [12]. Lawrence’s method [13] of EMS mutagenesis has been carried out with slight modifications in our group for all types of selection strategies.
2. Prototrophic yeast cultures are generally preferred for evolutionary engineering research, as the growth requirements of auxotrophic hosts might be complicated [14].
3. Ethyl methane sulfonate (EMS) is genotoxic; maximum care must be taken when handling it [15]. Wear gloves, a lab coat, and safety goggles, and work in a chemical fume hood.
4. EMS treatment should be performed in such a way that about 10 % of the cells survive. This survival rate depends both on the amount of EMS applied and on the exposure time. It can vary depending on the strain that is used.
5. The stress level to be applied depends on the toxicity of the chemicals and is also directly related to the type of strain. In our case, for the S. cerevisiae CEN.PK 113.7D strain, millimolar levels of cobalt and micromolar levels of copper and nickel are acceptable.
6. Optical density measurement is a rapid and easy method to determine the approximate cell density in a culture [16]. However, some stress factors such as FeCl2 precipitate in the growth media and lead to serious errors in measurements. In that case, direct colony counts by triplicate plating can be used for determining the survival ratios.
7. Freeze-thaw and temperature stresses are suitable for pulse stress application. Metal stresses, however, can be applied successfully in both pulse and continuous selection strategies [8].
8. During increasing stress level selection, 24 h of incubation may not be sufficient to allow the growth of survivors. In that case, incubation should be continued for an additional 48 or 72 h.
9. Prepare dilutions of the cultures from 10⁻¹ to 10⁻⁶. Plating all dilutions increases the probability of obtaining well-separated individual colonies. However, the 10⁻⁵ or 10⁻⁶ dilutions usually give the best results.


Table 2 Stress types that can be applied in YMM and their concentrations

Stress type | Chemical formula | Concentration

Metal stress
Aluminum chloride hexahydrate | AlCl3 · 6H2O | 5–7.5 mM
Ammonium iron (II) sulfate hexahydrate | (NH4)2Fe(SO4)2 · 6H2O | 35–50 mM
Boric acid | H3BO3 | 80 mM
Calcium chloride hexahydrate | CaCl2 · 6H2O | 1–1.5 M
Cobalt chloride anhydrous | CoCl2 | 2.5 and 5 mM
Copper (II) sulfate pentahydrate | CuSO4 · 5H2O | 0.1 mM
Ferrous chloride | FeCl2 | 10–15 mM
Magnesium chloride hexahydrate | MgCl2 · 6H2O | 1 M
Manganese (II) sulfate monohydrate | MnSO4 · H2O | 20–25 mM
Nickel (II) chloride hexahydrate | NiCl2 · 6H2O | 0.25–0.5 mM
Sodium chloride | NaCl | 0.5–0.75 M
Zinc sulfate heptahydrate | ZnSO4 · 7H2O | 10 mM

Non-metal stress
Bathophenanthroline disulfonic acid disodium salt hydrate | C24H14N2Na2O6S2 · H2O (empirical formula) | 80–160 μM
Caffeine | C8H10N4O2 | 10 mM
Ethanol | C2H6O | 8–10 % (v/v)
Hydrogen peroxide | H2O2 | 1 mM
Sorbitol | C6H14O6 | 2 M

10. Wild-type or other cultures could grow even at the final dilution (10⁻⁸) under non-stress conditions. For this reason, dilute the cultures 100-fold in 1.5 mL microfuge tubes and use these diluted samples for inoculating the first row.
11. If there is no regular/gradual decrease of the survival ratio through the five passages obtained during the genetic stability test, the resistance characteristics of the mutant of interest are considered to be stable.
12. Metal and non-metal stresses were tested for the wild-type S. cerevisiae CEN.PK 113.7D (MATa, MAL2-8c, SUC2) strain. The suitable stress concentrations for the spot assay are indicated in Table 2. All tests are performed in YMM, and the concentrations may need to be changed for other strains of S. cerevisiae.
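When stress-containing YMM is prepared at the concentrations of Table 2, the amount of compound to weigh follows from its molar mass. The sketch below covers three of the listed compounds as an illustration. The function and dictionary names are hypothetical, and the molar masses are standard literature values that are not given in the chapter.

```python
# Molar masses in g/mol (standard values, not taken from the chapter)
MOLAR_MASS = {
    "NaCl": 58.44,
    "CoCl2": 129.84,
    "CuSO4.5H2O": 249.68,
}


def grams_needed(compound, molar_concentration, volume_l):
    """Mass (g) of compound for the given molar concentration and medium volume."""
    return MOLAR_MASS[compound] * molar_concentration * volume_l


# 0.5 M NaCl, 5 mM CoCl2, and 0.1 mM CuSO4 · 5H2O, each in 1 L of YMM
for name, conc in [("NaCl", 0.5), ("CoCl2", 5e-3), ("CuSO4.5H2O", 0.1e-3)]:
    print(f"{name}: {grams_needed(name, conc, 1.0):.4f} g")
```

For the micromolar and low millimolar stresses, weighing a concentrated stock solution and diluting it is usually more accurate than weighing milligram amounts directly.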

Acknowledgements

We thank TÜBİTAK (project no: 109 T638, 105 T314), COST (Action no: CM0902), and ITU Research Funds (project no: 33237, 34200) for financial support of our evolutionary engineering research.


References

1. Patnaik R (2008) Engineering complex phenotypes in industrial strains. Biotechnol Prog 24(1):38–47
2. Bailey JE (1991) Toward a science of metabolic engineering. Science 252(5013):1668–1675
3. Bailey JE, Sburlati A, Hatzimanikatis V, Lee K, Renner WA, Tsai PS (1996) Inverse metabolic engineering: a strategy for directed genetic engineering of useful phenotypes. Biotechnol Bioeng 52(1):109–121
4. Oud B, van Maris AJA, Daran JM, Pronk JT (2012) Genome-wide analytical approaches for reverse metabolic engineering of industrially relevant phenotypes in yeast. FEMS Yeast Res 12(2):183–196
5. Bro C, Nielsen J (2004) Impact of ‘ome’ analyses on inverse metabolic engineering. Metab Eng 6(3):204–211
6. Warner JR, Patnaik R, Gill RT (2009) Genomics enabled approaches in strain engineering. Curr Opin Microbiol 12(3):223–230
7. Çakar ZP, Turanlı-Yıldız B, Alkım C, Yılmaz Ü (2012) Evolutionary engineering of Saccharomyces cerevisiae for improved industrially important properties. FEMS Yeast Res 12(2):171–182
8. Çakar ZP, Alkim C, Turanli B, Tokman N, Akman S, Sarikaya M, Tamerler C, Benbadis L, Francois JM (2009) Isolation of cobalt hyper-resistant mutants of Saccharomyces cerevisiae by in vivo evolutionary engineering approach. J Biotechnol 143(2):130–138
9. Çakar ZP, Seker UOS, Tamerler C, Sonderegger M, Sauer U (2005) Evolutionary engineering of multiple-stress resistant Saccharomyces cerevisiae. FEMS Yeast Res 5(6–7):569–578
10. Russek E, Colwell RR (1983) Computation of most probable numbers. Appl Environ Microbiol 45(5):1646–1650
11. Lindquist J (2012) A five-tube MPN table. http://www.jlindquist.net/generalmicro/102dil3a.html. Accessed on November 2012
12. Sauer U (2001) Evolutionary engineering of industrially important microbial phenotypes. Adv Biochem Eng Biotechnol 73:129–169
13. Lawrence CW (1991) Classical mutagenesis techniques. Methods Enzymol 194:273–281
14. Çakar ZP, Sauer U, Bailey J (1999) Metabolic engineering of yeast: the perils of auxotrophic hosts. Biotechnol Lett 21(7):611–616
15. Gocke E, Buergin H, Mueller L, Pfister T (2009) Literature review on the genotoxicity, reproductive toxicity, and carcinogenicity of ethyl methanesulfonate. Toxicol Lett 190(3):254–265
16. Monod J (1949) The growth of bacterial cultures. Annu Rev Microbiol 3:371–394

Chapter 11

Determination of a Dynamic Feeding Strategy for Recombinant Pichia pastoris Strains

Oliver Spadiut, Christian Dietzsch, and Christoph Herwig

Abstract

The knowledge of certain strain-specific parameters of recombinant P. pastoris strains is required to be able to set up a feeding regime for fed-batch cultivations. To date, these parameters are commonly determined either by time-consuming and labor-intensive continuous cultivations or by several consecutive fed-batch cultivations. Here, we describe a fast method based on batch experiments with methanol pulses to extract certain strain characteristic parameters, which are required to set up a dynamic feeding strategy for P. pastoris strains based on the specific substrate uptake rate (qs). We further describe in detail the course of actions which have to be taken to obtain the desired dynamics during feeding.

Key words Pichia pastoris, Methanol pulses, Specific substrate uptake rate, Dynamic fed-batch strategy

1 Introduction

Recombinant protein production with the methylotrophic yeast Pichia pastoris is a key process not only in academic research but also in the biopharmaceutical industry. To date, several of the implemented fermentation strategies for P. pastoris are based on the Invitrogen protocol suggesting constant feeding profiles for fed-batch cultivations (http://tools.invitrogen.com). Different strategies, like a feed forward regime based on a constant specific growth rate (μ; e.g., [1, 2]), are based on this protocol. However, these strategies do not aim at minimizing substrate consumption or at improving and optimizing the specific productivity. One of the major goals of each recombinant protein production process is maximum productivity. Currently, the findings regarding a possible interdependency between the specific productivity (qp) and the specific growth rate (μ) of P. pastoris are inconsistent, as some studies show that qp does not relate to μ [1, 3, 4], whereas another study demonstrates growth association [5].

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_11, © Springer Science+Business Media, LLC 2014


Oliver Spadiut et al.

Due to these controversial findings, a parameter other than μ, namely, the specific substrate uptake rate (qs), has recently been analyzed for its possible correlation with qp [6–8]. However, regardless of which control parameter is chosen to set up a feeding profile for P. pastoris, specific parameters of the respective strain have to be determined. This can be done either by very time-consuming and labor-intensive continuous cultivations [9] or by several consecutive fed-batch cultivations [10, 11]. However, to meet industrial demands, a fast and easy-to-do characterization of recombinant P. pastoris strains to extract bioprocess-relevant strain characteristic parameters for the subsequent set-up of production processes is essential. Here, we describe a novel, fast method based on batch experiments with methanol pulses to extract a minimal set of strain characteristic parameters, which are required to set up a dynamic feeding strategy for P. pastoris strains based on qs.

2 Materials

Prepare all media and solutions with analytical grade reagents and deionized water.

2.1 Medium for Preculture

Yeast nitrogen base (YNB) medium per L: 1.0 M potassium phosphate buffer (dissolve 118.1 g KH2PO4 and 23.0 g K2HPO4 in 1,000 mL distilled water, pH 6.0), 3.4 g YNB w/o amino acids and ammonium sulfate, 10 g (NH4)2SO4, 400 mg biotin, 20 g glucose. Weigh YNB w/o amino acids and ammonium sulfate, (NH4)2SO4, biotin, and glucose in a beaker. Add 100 mL of 1.0 M potassium phosphate buffer (pH 6.0), dissolve by stirring, and fill up with water to 1 L. Filter-sterilize through a 0.2 μm cutoff filter into a sterile flask. If the strain of interest carries a resistance gene, add the respective antibiotic according to the specific resistance marker present in the strain (e.g., Zeocin, Kanamycin) to an appropriate concentration (e.g., 100 μg/mL medium). Store at 4 °C.

2.2 Medium for Batch and Fed-Batch Culture

1. Basal salt medium (BSM) per L: 26.7 mL of 85 % (v/v) phosphoric acid, 1.17 g CaSO4 · 2H2O, 18.2 g K2SO4, 14.9 g MgSO4 · 7H2O, 4.13 g KOH, 44 g C6H12O6 · H2O, 0.3 mL antifoam. Weigh the chemicals in a beaker, dissolve in around 600 mL of water, and then fill up to 725 mL in a measuring cylinder. Fill the bioreactor with this medium and autoclave.
2. C-source per L: weigh 220 g glucose monohydrate or 200 g glycerol (depending on which C-source you prefer) and fill up with water to 1 L. Autoclave and store at 4 °C.
3. Trace metal solution (PTM1) per L: 6.0 g CuSO4 · 5H2O, 0.08 g NaI, 3.0 g MnSO4 · H2O, 0.2 g Na2MoO4 · 2H2O, 0.02 g H3BO3, 0.5 g CoCl2, 20.0 g ZnCl2, 65.0 g FeSO4 · 7H2O, 0.2 g biotin, 5 mL H2SO4. Weigh the chemicals in a beaker and fill up with water to 1 L. Filter-sterilize through a 0.2 μm cutoff filter into a sterile flask and store at room temperature.
4. Base solution to set the pH: 2–3 M NH4OH.

2.3 Methanol Solution for Pulses

For methanol pulses during batch cultivations use pure methanol supplemented with PTM1. For this purpose add 12 mL PTM1 solution into 1 L pure methanol, filter-sterilize this solution into a sterile bottle and store this solution at 4 °C.

2.4 Feeding Medium

Depending on the goal of the fed-batch cultivation (either biomass formation or induction of protein expression), different feeding media can be prepared. For P. pastoris, glucose and glycerol are prominent C-sources for biomass formation, whereas methanol is used for the induction of protein expression.

1. Glucose feed per L: 275 g glucose monohydrate, 12 mL PTM1, 0.3 mL antifoam.
2. Glycerol feed per L: 250 g glycerol, 12 mL PTM1, 0.3 mL antifoam.
3. Methanol feed per L: 300 g methanol (use a balance), 4 mL PTM1, 0.3 mL antifoam.

The glucose and the glycerol feed can be sterilized by autoclaving; the methanol feed is sterile-filtered through a 0.2 μm cutoff filter into a sterile flask in order to avoid methanol evaporation.

2.5 Equipment

For a standard fed-batch experiment, at least the following equipment is required:

1. Bioreactor (e.g., 5 L working volume glass bioreactor; Infors, Switzerland).
2. pH and pO2 probe.
3. Air and oxygen lines.
4. Offgas analyzer (e.g., infrared cell for CO2 and a zirconium dioxide sensor for O2 concentration; DasGip, Germany).
5. Pumps and tubings for base and feed.
6. Balances (reactor balance, feed balance, base balance), connected to the process information management system.
7. Process information management system (PIMS; e.g., Lucullus, SecureCell, Switzerland).
8. Spectrophotometer, centrifuge, and dry oven for sample preparation.
9. HPLC for exact determination of methanol concentrations (e.g., Agilent Technologies, USA) equipped with a Supelco guard column, a SUPELCOGEL C-610H ion-exclusion column (Sigma-Aldrich, USA), and a refractive index detector (Agilent Technologies, USA).

3 Methods

3.1 Preculture of Pichia pastoris

Start a preculture of the P. pastoris strain of interest in 100 mL of YNB medium in 1 L baffled shaking flasks at 220 rpm and 28 °C for a maximum of 24 h (to guarantee good aeration, only 1/10 of the total volume of the flask is filled with medium). The preculture is inoculated with 1 mL of frozen glycerol stock (see Notes 1–3).

3.2 Batch Cultivation in Bioreactors

After autoclaving the BSM in a bioreactor vessel, aseptically add the C-source (e.g., 40 g/L glucose or glycerol as final concentration in the vessel). Then adjust the temperature and the stirring speed to the desired values, before the pH in the bioreactor is adjusted to pH 5.0 using 25 % (v/v) ammonia solution (NH4OH) by manually pumping the solution into the bioreactor. Aseptically transfer sterile PTM1 solution to the bioreactor (4.5 mL/L BSM; see Note 4). Aseptically transfer the preculture into a sterile inoculation flask (i.e., a vessel providing a connection to the bioreactor). The inoculum should be around 10 % of the final volume in the bioreactor. Dissolved oxygen (dO2) is measured with a sterilizable polarographic dissolved oxygen electrode. The pH is measured on-line with a sterilizable electrode and maintained constant with a step controller using 2–3 M NH4OH, which also represents the N-source during cultivation. The exact concentration of NH4OH in the base bottle is determined by titration with 0.25 M potassium hydrogen phthalate (KHP; see Note 6). Base consumption is determined gravimetrically by putting the base bottle on a balance and recording the loss in weight over time. Set the cultivation temperature (e.g., 28 °C) and fix the agitation to the highest possible setpoint (e.g., 1,495 rpm) to guarantee good aeration. Aerate the culture with 2.0 vvm dried air (i.e., volume per volume per minute; in 1 L cultivation volume, 2.0 vvm corresponds to 2 L of dry air per minute). Measure the offgas of the culture by using an infrared cell for CO2 and a paramagnetic cell for O2 concentration. Temperature, pH, dO2, agitation in the vessel, as well as CO2 and O2 in the offgas are measured on-line and logged in a process information management system.
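Two of the quantities in this subheading are simple calculations: the molarity of the base from the KHP titration (formula given in the Notes) and the gas flow that corresponds to an aeration rate in vvm. The following Python sketch illustrates both. The function names and the titration volumes are hypothetical, and a 1:1 reaction between base and KHP is assumed.

```python
def base_molarity(khp_molarity, khp_volume_ml, base_volume_ml, dilution_factor=1.0):
    """Molarity (mol/L) of the undiluted base from a KHP titration.

    khp_volume_ml: volume of KHP solution consumed at the equivalence point
    base_volume_ml: volume of (diluted) base placed in the beaker
    dilution_factor: dilution of the base before titration (10 for 1:10)
    """
    return dilution_factor * khp_molarity * khp_volume_ml / base_volume_ml


def air_flow_l_per_min(vvm, culture_volume_l):
    """Gas flow (L/min) for an aeration rate given in vvm."""
    return vvm * culture_volume_l


# Hypothetical titration: 2 mL of 1:10 diluted NH4OH consume 2.0 mL of 0.25 M KHP
print(base_molarity(0.25, 2.0, 2.0, dilution_factor=10))
# 2.0 vvm in 1 L of cultivation volume
print(air_flow_l_per_min(2.0, 1.0))
```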

3.3 Analysis of Growth-Parameters During Cultivations

Harvest 5 mL of culture broth by centrifugation in 10 mL glass tubes (4,300 × g, 4 °C, 10 min), wash the pellet twice with 5 mL deionized water, and determine the dry cell weight (DCW) after drying at 105 °C to a constant weight in an oven (approximately 2–3 days; see Note 5). The optical density of the culture broth throughout the process is measured using a spectrophotometer at a wavelength of 600 nm (OD600). Dry cell weight measurements and OD600 have to be correlated to be able to use the measured OD600 values for qs adaptation in subsequent fed-batch cultivations (see Notes 7 and 8).

Fig. 1 Experimental strategy for the fast determination of strain-specific parameters of P. pastoris using a batch experiment with methanol pulses of 0.5 and 1 % (v/v). Continuous line, carbon dioxide emission rate (CER); circle, calculated specific substrate uptake rate (qs). Figure adapted with permission from [14]

3.4 Batch Cultivation with Methanol Pulses

After the complete consumption of the C-source (e.g., glucose or glycerol), which is indicated by an increase of dissolved oxygen and a drop in offgas activity, perform the first methanol pulse to a final concentration in the bioreactor of 0.5 % (v/v). Subsequent pulses are performed one after the other with 1 % (v/v) methanol as soon as the methanol in the bioreactor is depleted (evident in the offgas analysis; Fig. 1). To obtain the specific rates for substrate uptake during each pulse, a minimum of two samples has to be taken, one directly after methanol addition and the other before complete methanol depletion. The samples are used to determine the concentration of residual substrate, the OD600, and the dry cell weight. The values determined at the beginning and the end of the respective pulse are used to calculate an average value of the specific substrate uptake rate (qs) according to Eq. 1 (in short: take a sample; pulse methanol; when the offgas signal drops, indicating depletion of methanol, take another sample; measure the exact methanol concentration in these two samples by HPLC; calculate the volumetric methanol uptake rate and relate it to the total biomass content at the latter sample point).
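The calculation summarized in the parenthesis above (Eq. 1) can be sketched in a few lines of Python. The function names are hypothetical, the example values are invented so that the arithmetic is easy to follow, and the change in broth volume during the pulse is neglected.

```python
M_MEOH = 32.04  # g/mol, molar mass of methanol


def methanol_consumed_g(conc_start_g_l, conc_end_g_l, broth_volume_l):
    """Methanol consumed (g) between two HPLC samples, at constant broth volume."""
    return (conc_start_g_l - conc_end_g_l) * broth_volume_l


def specific_uptake_rate(delta_meoh_g, delta_time_h, biomass_g):
    """qs in mmol/g/h according to Eq. 1.

    biomass_g: total biomass (g) at the time point of the latter sample
    """
    return delta_meoh_g / (delta_time_h * biomass_g) / M_MEOH * 1000.0


# Hypothetical pulse: 6.408 g of methanol consumed within 4 h by 25 g of biomass
qs = specific_uptake_rate(6.408, 4.0, 25.0)
print(f"qs = {qs:.2f} mmol/g/h")
```

In this example, 6.408 g of methanol equal 200 mmol, so that 200 mmol / (4 h × 25 g) gives 2.00 mmol/g/h.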


$$q_s\ (\text{mmol/g/h}) = \frac{\dfrac{\Delta\text{MeOH (g)}}{\Delta\text{time (h)} \times \text{biomass (g)}}}{M(\text{MeOH})} \times 1{,}000 \tag{1}$$

M(MeOH) = 32.04 g/mol; biomass = amount of biomass in grams at the time point of the latter sampling.

The parameters which can be determined by this strategy are: (1) the adaptation time of P. pastoris to methanol, (2) the qs during the adaptation pulse, and (3) a maximum qs for methanol (qs max) (see Note 9). We recommend performing at least four methanol pulses to obtain statistically significant parameters (see Note 9).

3.5 Fed-Batch

During fed-batch, the dissolved oxygen (dO2) signal is used to adjust the air-in flow to keep levels >30 % dO2 at all time points. In case the air flow is not sufficient to keep this dO2 level, pure oxygen is added. After the complete consumption of the substrate at the end of the batch phase (see Notes 10 and 11), which is indicated by an increase of dissolved oxygen and a drop in offgas activity, the feeding phase of the fed-batch is started. The feed rate is measured and controlled using a gravimetrically based PID flow controller (see Note 12). The culture is first adapted to methanol: the adaptation is performed at a qs setpoint of 0.5 mmol/g/h (Fig. 2). As soon as the offgas signal (CO2) becomes constant, the qs setpoint can be increased stepwise up to the predetermined qs max of the respective strain (Fig. 2). To be able to analyze the physiology and the productivity of the P. pastoris strain at each qs step, the cells must be allowed to adapt to the respective qs step before it is increased (see Notes 13 and 14).
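A qs setpoint can be translated into a feed rate by a simple mass balance, as sketched below. This is not the authors' controller: it assumes that all methanol fed is taken up (no accumulation in the broth) and that the total biomass is known, e.g., from OD600 and the OD600/DCW correlation. The function name and the biomass value are hypothetical; 300 g/L is the methanol content of the feed described in Subheading 2.4.

```python
M_MEOH = 32.04  # g/mol, molar mass of methanol


def feed_rate_ml_per_h(qs_setpoint, biomass_g, feed_conc_g_per_l=300.0):
    """Volumetric feed rate (mL/h) that supplies methanol at the qs setpoint.

    qs_setpoint: specific substrate uptake rate in mmol/g/h
    biomass_g: total biomass in the bioreactor (g dry cell weight)
    feed_conc_g_per_l: methanol concentration of the feed (g/L)
    """
    methanol_g_per_h = qs_setpoint * biomass_g * M_MEOH / 1000.0
    return methanol_g_per_h / feed_conc_g_per_l * 1000.0


# Adaptation step: qs setpoint of 0.5 mmol/g/h with a hypothetical 60 g of biomass
print(round(feed_rate_ml_per_h(0.5, 60.0), 3))
```

Because the biomass increases during feeding, the feed rate has to be recalculated whenever a new biomass estimate becomes available in order to hold qs at its setpoint.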

3.6 Substrate Concentrations

Samples are centrifuged (20,000 × g, 15 min), and the methanol concentrations are then determined in the cell-free samples by HPLC. The mobile phase is 0.1 % H3PO4 at a constant flow rate of 0.5 mL/min, and the system is run isocratically. Calibration is done by measuring standard points in the range of 0.1–10 g/L methanol.

4 Notes

1. When combining the sterile solutions for the preculture in a baffled shaking flask, work in the laminar flow hood and take care to work aseptically and avoid contamination.
2. If necessary, add antibiotics specific for the selection markers harbored by the strain (e.g., Zeocin, Kanamycin) to the preculture in appropriate concentrations to further reduce the risk of contamination.


Fig. 2 Fed-batch cultivation of a P. pastoris strain on methanol with a stepwise increase of qs to qs max. Straight line, setpoint for qs; black dot, calculated qs values; black triangle, methanol concentration in the supernatant. The qs max of the respective P. pastoris strain had previously been determined as 1.94 mmol/g/h. When this level is exceeded in fed-batch cultivations, methanol accumulates in the cultivation broth. Figure adapted with permission from [15]

3. The glycerol stocks are prepared by mixing 1 mL of a fresh overnight culture of the respective P. pastoris strain with 0.5 mL of sterile 75 % (v/v) glycerol and snap-freezing the mixture in liquid N2. The frozen glycerol stocks are then stored at −80 °C.
4. Before inoculating the bioreactor with the appropriate amount of preculture, the following actions should be taken:
– Aseptically add the C-source to the sterile BSM in the bioreactor.
– Set the desired temperature (typically 28–30 °C) and stirring speed (e.g., 1,495 rpm).
– Set the pH value of the BSM to pH 5.0 with NH4OH and note the amount of base required, in order to determine the overall content in the bioreactor vessel.
– Add PTM1 aseptically to the cultivation broth.
– Calibrate the pO2 electrode according to the manufacturer’s instructions.
– Adjust the weight of the bioreactor balance to the weight of the bioreactor content. The bioreactor weight is logged in the process information management system, and adjusting it correctly at this stage of the bioprocess facilitates the final data analysis.
– Note the “O2 wet value,” which corresponds to the O2 content measured in the offgas before inoculation. This value will be needed for the final data analysis.
– Aseptically inoculate the bioreactor with preculture (i.e., 100 mL for a final volume of 1 L cultivation broth).
– When taking samples, note the exact process time for the calculation of specific rates.
5. We recommend taking at least two samples for the batch phase (right after inoculation and after the C-source is depleted) as well as at least two samples for each methanol pulse (before the pulse and after methanol depletion, which is indicated by a drop in the offgas signal). During the fed-batch phase we recommend taking samples every 4 h.
6. For the base titration the following materials are required: base (KOH, NH4OH), 0.25 M KHP, bromothymol blue (indicator), a burette, and a beaker with a magnetic stirrer. Add 2 mL of base to the beaker (dilute NH4OH 1:10) and use the burette to add 0.25 M KHP. At the point of equivalence, the color of the indicator turns from blue to grey and then to green. Calculate the molarity of the base according to:

$$\text{molarity\_base} = f \times \frac{\text{molarity\_KHP} \times \text{base\_consumption (KHP)}}{\text{volume\_base (diluted, before titration)}}$$

where molarity_base is in mol/L, molarity_KHP in mol/L, base_consumption (KHP) in mL, volume_base in mL, and f is the dilution factor of the base before titration.
7. To be able to use the OD600 values to set the feeding rate to the desired qs setpoint, it is crucial to have a good and reliable calibration curve relating OD600 to the biomass content (dry cell weight, in g/L). Before starting the fed-batch bioreactor cultivation, generate such a calibration curve using biomass from batch cultivations at different dilutions.
8. During cultivations, use the same photometer for OD600 measurements as for the calibration curve. Do not switch photometers during the experiment.
9. The strain characteristic parameters which can be analyzed by the batch experiment with methanol pulses are:
– Δtimeadapt: time period from induction until the offgas (CO2) has reached its maximum.
– qs adapt: specific uptake rate for methanol during the adaptation pulse.
– qs max: the maximum specific uptake rate for methanol during consecutive pulses.
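The base molarity calculation of Note 6 can be sketched in a few lines of Python. This is only an illustration: the function name and the example readings are hypothetical, and it assumes that a 1:10 dilution corresponds to a dilution factor f of 10.

```python
def base_molarity(molarity_khp, khp_consumed_ml, base_volume_ml, dilution_factor=1.0):
    """Return the molarity (mol/L) of the undiluted base from a KHP titration.

    molarity_khp    -- molarity of the KHP titrant (mol/L)
    khp_consumed_ml -- KHP volume added from the burette up to equivalence (mL)
    base_volume_ml  -- volume of (diluted) base placed in the beaker (mL)
    dilution_factor -- f; assumed to be 10 for a 1:10 dilution, 1 for undiluted base
    """
    if base_volume_ml <= 0:
        raise ValueError("base_volume_ml must be positive")
    return dilution_factor * molarity_khp * khp_consumed_ml / base_volume_ml


# Hypothetical readings: 2 mL of 1:10 diluted NH4OH, 10.4 mL of 0.25 M KHP consumed
nh4oh_molarity = base_molarity(0.25, 10.4, 2.0, dilution_factor=10)
print(f"NH4OH stock: {nh4oh_molarity:.2f} mol/L")
```

Because both volumes enter as a ratio, any volume unit can be used as long as it is the same for the KHP consumed and the base in the beaker.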


10. To obtain even more precise data for qs, methanol stripping from the bioreactor can be taken into account using the Antoine equation [12, 13].
11. For bioreactor cultivations, the C-source can be freely chosen depending on the research question (e.g., glucose, glycerol, sorbitol).
12. If a high cell density cultivation is envisioned, the BSM should be concentrated (e.g., twofold) to ensure the availability of enough salt throughout the cultivation process. Additionally, when performing a high cell density cultivation (e.g., more than 100 g/L DCW), the concentration of the C-source can be increased.
13. The feeding profile based on a constant qs value describes an exponential function. Manual adjustment of the feeding rate to the desired qs setpoint is laborious and cumbersome, but this strategy does not require sophisticated equipment or soft sensor tools. Besides manual adjustment, automatic feed-forward feeding regimes are applicable. Similar to feeding strategies based on a predefined equation for a certain specific growth rate, the specific substrate uptake rate can also be controlled automatically, assuming a constant yield coefficient on the substrate methanol. Based on the consumed methanol feed (with known concentration), which is determined gravimetrically via a feed balance, the amount of generated biomass is easily computable. To calculate the biomass yield coefficient on the substrate methanol for a certain strain, the methanol pulse strategy during batch cultivations described here can be used.
14. For laboratories which are not equipped with an offgas analyzer, the methanol pulses can also be followed by the dissolved oxygen (dO2) signal. After pulse addition, dO2 declines and only rises again when methanol is depleted.
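The exponential feeding profile mentioned in Note 13 can be illustrated with a minimal feed-forward sketch. It is not the authors' implementation: it assumes a constant biomass yield on methanol, no maintenance term, and a known total biomass at the start of the feed. The function name and all numerical values are hypothetical.

```python
import math

METHANOL_MOLAR_MASS = 32.04  # g/mol


def feed_rate_l_per_h(t_h, qs_mmol_g_h, biomass0_g, yield_g_g, feed_conc_g_l):
    """Feed rate (L/h) that keeps qs at its set point t_h hours after feed start.

    Assumes a constant yield coefficient, so the total biomass grows
    exponentially with the specific growth rate mu = yield * qs.
    """
    qs_g_g_h = qs_mmol_g_h * METHANOL_MOLAR_MASS / 1000.0  # g methanol per g DCW per h
    mu = yield_g_g * qs_g_g_h                              # specific growth rate (1/h)
    biomass_g = biomass0_g * math.exp(mu * t_h)            # total biomass in the vessel (g)
    return qs_g_g_h * biomass_g / feed_conc_g_l


# Hypothetical values, for illustration only
for t in (0, 4, 8):
    rate = feed_rate_l_per_h(t, qs_mmol_g_h=1.0, biomass0_g=30.0,
                             yield_g_g=0.4, feed_conc_g_l=790.0)
    print(f"t = {t} h: feed = {rate * 1000:.3f} mL/h")
```

In practice the biomass estimate would be updated from the feed balance or from OD600 measurements rather than from the initial value alone.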

Acknowledgements
The authors are very grateful to the Austrian Science Fund (FWF): project P24861-B19 for financial support.

References
1. Potgieter TI, Kersey SD, Mallem MR, Nylen AC, d’Anjou M (2010) Antibody expression kinetics in glycoengineered Pichia pastoris. Biotechnol Bioeng 106:918–927
2. Jacobs PP, Inan M, Festjens N, Haustraete J, Van Hecke A, Contreras R, Meagher MM, Callewaert N (2010) Fed-batch fermentation of GM-CSF-producing glycoengineered Pichia pastoris under controlled specific growth rate. Microb Cell Fact 9:93
3. Sinha J, Plantz BA, Zhang W, Gouthro M, Schlegel V, Liu CP, Meagher MM (2003) Improved production of recombinant ovine interferon-tau by mut(+) strain of Pichia pastoris using an optimized methanol feed profile. Biotechnol Prog 19:794–802
4. Zhang W, Sinha J, Smith LA, Inan M, Meagher MM (2005) Maximization of production of secreted recombinant proteins in Pichia pastoris fed-batch fermentation. Biotechnol Prog 21:386–393
5. Ohya T, Ohyama M, Kobayashi K (2005) Optimization of human serum albumin production in methylotrophic yeast Pichia pastoris by repeated fed-batch fermentation. Biotechnol Bioeng 90:876–887
6. Khatri NK, Hoffmann F (2006) Oxygen-limited control of methanol uptake for improved production of a single-chain antibody fragment with recombinant Pichia pastoris. Appl Microbiol Biotechnol 72:492–498
7. Khatri NK, Hoffmann F (2006) Impact of methanol concentration on secreted protein production in oxygen-limited cultures of recombinant Pichia pastoris. Biotechnol Bioeng 93:871–879
8. Cunha AE, Clemente JJ, Gomes R, Pinto F, Thomaz M, Miranda S, Pinto R, Moosmayer D, Donner P, Carrondo MJ (2004) Methanol induction optimization for scFv antibody fragment production in Pichia pastoris. Biotechnol Bioeng 86:458–467
9. d’Anjou MC, Daugulis AJ (2001) A rational approach to improving productivity in recombinant Pichia pastoris fermentation. Biotechnol Bioeng 72:1–11
10. Ren H, Yuan JQ, Bellgardt KH (2003) Macrokinetic model for methylotrophic Pichia pastoris based on stoichiometric balance. J Biotechnol 106:53–68
11. Ren HT, Yuan JQ (2005) Model-based specific growth rate control for Pichia pastoris to improve recombinant protein production. J Chem Technol Biotechnol 80:1268–1272
12. Dietzsch C, Spadiut O, Herwig C (2011) A dynamic method based on the specific substrate uptake rate to set up a feeding strategy for Pichia pastoris. Microb Cell Fact 10:14
13. Rodgers RC, Hill GE (1978) Equations for vapour pressure versus temperature: derivation and use of the Antoine equation on a handheld programmable calculator. Br J Anaesth 50:415–424
14. Krainer FW, Dietzsch C, Hajek T, Herwig C, Spadiut O, Glieder A (2012) Recombinant protein expression in Pichia pastoris strains with an engineered methanol utilization pathway. Microb Cell Fact 11:22
15. Dietzsch C, Spadiut O, Herwig C (2011) A fast approach to determine a fed batch feeding profile for recombinant Pichia pastoris strains. Microb Cell Fact 10:85

Part II Tools and Technologies for Investigation and Determination of Yeast Metabolic Features

Chapter 12

Yeast Metabolomics: Sample Preparation for a GC/MS-Based Analysis

Sónia Carneiro, Rui Pereira, and Isabel Rocha

Abstract
Metabolome sample preparation is one of the key factors in metabolomics analyses. The quality of the metabolome data will depend on the suitability of the experimental procedures to the cellular system (e.g., yeast cells) and the analytical performance. Here, we summarize a protocol for metabolome analysis of yeast cells using gas chromatography–mass spectrometry (GC-MS). First, the main phases of a metabolomics analysis are identified: sample preparation, metabolite extraction, and analysis. We also provide an overview on different methods used to quench samples and extract intracellular metabolites from yeast cells. This protocol provides a detailed description of a GC-MS-based analysis of yeast metabolome, in particular for metabolites containing amino and/or carboxyl groups, which represent most of the compounds participating in the central carbon metabolism.

Key words Metabolic profiling, Metabolome, Gas chromatography–mass spectrometry, Quenching, Metabolite extraction, Metabolite derivatization

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_12, © Springer Science+Business Media, LLC 2014

1 Introduction

Metabolomics has emerged as a powerful tool to identify and quantify large numbers of metabolites (i.e., small organic molecules), providing substantial information on the organization of cellular metabolism. Nevertheless, these analyses are still challenging, especially due to the number of analytes (i.e., metabolites, in this specific case) present in biological samples, their concentration ranges and, most significantly, their chemical diversity (classes of metabolic compounds). With the development of novel and diverse experimental procedures, significant progress has been accomplished. In a basic workflow, three essential steps determine the capacity and success of the designed metabolomic analysis: (1) the preparation of samples, which should take into consideration the target metabolite set (i.e., intracellular or extracellular metabolites) and its turnover time, as well as the final concentration of biomass required for the analysis; (2) the extraction method, which should be specific for the microbial system and targeted compound classes under analysis; and (3) the selection of the analytical platform, which can consist of a single analytical technique or a combination of various techniques.

1.1 Sample Preparation

During this step, intracellular metabolites (i.e., endometabolome) and/or the extracellular metabolites secreted by cells (i.e., exometabolome) are collected. The endometabolome is more technically demanding than the exometabolome: besides the difficulty in designing protocols capable of extracting metabolites from the intracellular milieu, a previous quenching step has to be devised to assure a rapid metabolic activity arrest. Typically, a good criterion to evaluate the need for quenching is the turnover time of target metabolites, which can be determined as the ratio between the metabolite intracellular concentration and the flux through that pool at physiological conditions. For example, most metabolites associated with the central metabolism have short turnover times, such as pyruvate or fumarate with 2–4 s, compared to the ~3,000 s for alanine or histidine [1]. The quenching step is essential to assure that all metabolic activities inside the cell are arrested, and thus no further enzymatic conversions of metabolites are taking place. Usually, quenching protocols employ a rapid increase or decrease in temperature to prevent enzyme activity. Commonly applied quenching procedures to halt yeast metabolism are based on cold aqueous methanol (e.g., 60 % methanol at −40 °C) [2, 3]. Although methanol has been contested as a quenching agent due to the fact that it causes metabolite leakage from the cells [1, 4, 5], it has recently been demonstrated that increasing the methanol concentration could effectively prevent the leakage of intracellular metabolites [1]. During the quenching procedure it is crucial that metabolites remain inside the cells, otherwise the levels of metabolites after cellular extraction would be underestimated. Besides the specific quenching agent, many metabolomics protocols recommend the addition of buffers to maintain a neutral pH [6, 7]. This is supposed to prevent or minimize cell damage caused by changes in pH or osmotic shock. 
However, non-salt-based buffers (e.g., tricine) should be selected over salt-based buffers (e.g., phosphate, HEPES), as the presence of salts in samples may cause undesirable ion suppression effects in mass spectrometric assays. Several quenching protocols have been proposed and successfully applied to yeast cells (see Table 1). Although a few studies have evaluated the effects of different quenching agents on cell integrity, any quenching method is a tradeoff between metabolite leakage and the turnover of intracellular metabolites. Other factors, such as the time of exposure to the quenching solution and the temperature, are equally important when optimizing a quenching procedure but are disregarded in most protocols.
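The turnover-time criterion described above (intracellular pool size divided by the flux through that pool) can be written out as a short Python sketch. The pool sizes and fluxes below are hypothetical values chosen only to reproduce the orders of magnitude quoted in the text, and comparing the turnover time with the sample handling time is an assumed way of applying the criterion.

```python
def turnover_time_s(pool_umol_per_gdw, flux_umol_per_gdw_s):
    """Turnover time (s) = intracellular pool size / flux through the pool."""
    if flux_umol_per_gdw_s <= 0:
        raise ValueError("flux must be positive")
    return pool_umol_per_gdw / flux_umol_per_gdw_s


def needs_rapid_quenching(turnover_s, handling_time_s):
    """Assumed rule of thumb: quench rapidly when the pool turns over
    faster than the time needed to take and process a sample."""
    return turnover_s < handling_time_s


# Hypothetical pools (umol/gDW) and fluxes (umol/gDW/s), for illustration only
fast_pool = turnover_time_s(0.6, 0.2)    # a central-metabolism intermediate
slow_pool = turnover_time_s(30.0, 0.01)  # a large amino acid pool

for name, tau in (("fast pool", fast_pool), ("slow pool", slow_pool)):
    print(f"{name}: {tau:.0f} s, rapid quenching needed: "
          f"{needs_rapid_quenching(tau, handling_time_s=30.0)}")
```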


Table 1 List of quenching solutions that have been applied in yeast metabolome analysis

Quenching agent/buffer | Details | References
Methanol (1) | Pure methanol at −40 °C with a sample/quenching solution ratio of 1:5 | [1]
Methanol (2) | Pure methanol at −80 °C with a sample/quenching solution ratio of 1:3 | [8]
Methanol–tricine | 60 % (v/v) methanol buffered with 10 mM tricine (pH 7.4) at −50 °C | [6]
Methanol–ammonium acetate | 60 % (v/v) methanol buffered with 10 mM ammonium acetate (pH 7.5) at −40 °C | [9]
Methanol–N-ethylmaleimide^a | Methanol–N-ethylmaleimide solution (4 mM) at −70 °C | [10]
Glycerol–sodium chloride | 3:2 (v/v) glycerol–saline solution (13.5 % NaCl) at −20 °C | [11, 12]
Perchloric acid | 35 % (w/w) in water at −40 °C | [13]
Nitrogen | Liquid nitrogen | [14]

^a N-ethylmaleimide was used to protect thiols from oxidation by binding to –SH groups

Prolonged exposure to “detrimental” quenching agents and the difficulty of maintaining samples at low temperatures (e.g., −40 °C) can strongly affect the endometabolome.

1.2 Metabolite Extraction

In the endometabolome analysis, after the quenching step, cells are normally separated from the quenching solution by centrifugation or filtration to eliminate contaminants and quenching chemicals. This is particularly necessary when cultivating cells in rich media, where a significant fraction of target metabolites (e.g., amino acids) is also present in the culture broth. Some protocols, however, exclude this step, performing a whole-broth extraction. Yet it should be kept in mind that the extracellular volume is greater than the intracellular volume and very high intracellular concentrations can represent a small fraction of the total whole-broth amount. Furthermore, it has recently been demonstrated that several metabolites have higher extracellular fractions under certain physiological conditions, particularly organic acids and amino acids [15], which can lead to the overestimation of intracellular levels of such metabolites if not properly eliminated before analysis. Intracellular metabolites are extracted by permeabilization of cell walls, usually with chemical and/or physical agents. These agents should allow for maximum extraction (i.e., as many metabolites as possible) with minimal alteration of the extracted metabolites in order to obtain meaningful metabolome data.


The appropriate choice of the chemical agent, minimal sample processing, and low temperatures (e.g., < −20 °C) during the extraction protocol can help to prevent metabolite degradation. Currently, there is a wide range of chemical agents for extracting metabolites from yeast cells (see Table 2). Most extraction methodologies apply chemical agents combined with physical or mechanical processes, such as freeze–thaw, sonication, bead-beating, or boiling. The freeze–thaw extraction with cold aqueous methanol is one of the most popular methods used with yeast cells [11], but other methodologies have also been demonstrated to perform with high efficiency. For instance, combined chemical-mechanical disruption by bead-beating using zirconia/silica beads or by sonication has also been tested with yeast cells [10]. Though none of the published methodologies has been shown to cover the entire scope of the yeast metabolome, extraction protocols that do not require heating (e.g., boiling ethanol) are often preferred to prevent the degradation of heat-labile compounds. Protocols that minimize sample processing are also often chosen, not only for simplicity, but because this helps to prevent metabolite degradation (e.g., metabolite oxidation).

1.3 Metabolite Analysis

Mass spectrometry (MS)-based techniques have been widely applied in yeast metabolomics analysis, in particular gas chromatography with MS detection (i.e., GC-MS). These techniques are mostly applied to the measurement of relative concentration changes of metabolites using an internal standard (e.g., d4-alanine), but can also be used for the determination of absolute concentrations in samples. However, measurements are normally noisy, with high relative standard deviations in the range of 20–30 %, and the correlation between the MS signal and concentration depends strongly on the analyte structure. Therefore, it is most common to use relative ion intensities to estimate relative metabolite concentrations and to compare metabolic changes across samples. For further details on data processing and statistical analysis, we refer to existing literature on chemometrics [19–21]. Commonly, a derivatization step is required prior to the GC analysis, so that metabolites containing functional groups with active hydrogens, such as –SH, –OH, –NH, and –COOH, are modified by derivatizing chemical reagents through alkylation, acylation, or silylation. These derivatization reactions render metabolites less polar, more volatile, and more thermally stable. Nearly all functional groups can be derivatized by silylation, which makes this the most widely used derivatization technique [22]. However, the most suitable derivatization agent should be evaluated according to the functional groups of the target analytes, among other criteria, such as complete derivatization of the analytes and stability of the derivatives during the reaction. Taking all these aspects into consideration allows a more comprehensive GC-MS analysis.
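The relative quantification described above can be sketched as follows: each metabolite peak is divided by the internal standard peak of the same sample, and the relative standard deviation (RSD) and fold change are computed from the normalized values. The function names and peak intensities are hypothetical, and this is an illustration rather than a full chemometric workflow.

```python
from statistics import mean, stdev


def normalize(peak_intensity, standard_intensity):
    """Peak intensity relative to the internal standard (e.g., d4-alanine)."""
    if standard_intensity <= 0:
        raise ValueError("internal standard intensity must be positive")
    return peak_intensity / standard_intensity


def rsd_percent(values):
    """Relative standard deviation (%) of replicate measurements."""
    return 100.0 * stdev(values) / mean(values)


def fold_change(condition, reference):
    """Ratio of mean normalized intensities between two conditions."""
    return mean(condition) / mean(reference)


# Hypothetical (metabolite peak, internal standard peak) pairs for three replicates
reference = [normalize(p, s) for p, s in [(900, 1000), (1000, 1000), (1100, 1000)]]
treated = [normalize(p, s) for p, s in [(1800, 1000), (2000, 1000), (2200, 1000)]]

print(f"RSD of reference replicates: {rsd_percent(reference):.1f} %")
print(f"Fold change treated/reference: {fold_change(treated, reference):.2f}")
```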

Table 2 List of some chemical-based methodologies to extract intracellular metabolites applied to yeast

Extraction agent | Compound classes | Details | References
Acetonitrile–methanol | | Cold extraction solvent (−20 °C) of acetonitrile–methanol–water (40:40:20) is added to pellets and kept for 15 min at 4 °C. After centrifugation the supernatant is collected and cell pellets can be extracted again | [8]
Hot ethanol (1) | Carboxylic acids, nucleotides, sugar phosphates | Boiling ethanol solution (75 % v/v ethanol–water containing 0.25 M HEPES, pH 7.5) is added to cell pellets and incubated for 3 min at 80 °C. After cooling down, the mixture is placed on ice for 3 min and the volume is reduced by evaporation at 45 °C using a rotavapor apparatus. The residue is then resuspended to a final volume of 1 mL with distilled water and centrifuged for 10 min at 5,000 × g at 4 °C to remove the insoluble particles | [3]
Hot ethanol (2) | Sugar phosphates | Boiling ethanol solution (75 % v/v ethanol–water) is added to the pellet, which is then placed in a hot water bath at 95 °C for 3 min. The ethanol extracts are cooled to −40 °C in a cryostat and dried for 110 min under controlled vacuum and temperature. The dried residue is then dissolved in 500 μL of water and centrifuged for 5 min at 3,000 × g | [16]
Methanol–chloroform | Amino acids, carboxylic acids, nucleotides, sugar phosphates | Cold aqueous methanol (50 % v/v, −40 °C) and chloroform (−40 °C) are added to the pellet and samples are vigorously shaken for 45 min in an orbital shaker at −40 °C, and then centrifuged (5,000 × g, 5 min, −20 °C). The upper water–methanol phase is collected and the lower phase is re-extracted with cold aqueous methanol (50 % v/v, −40 °C) by vortexing for 30 s. After centrifugation, the upper phase is pooled with the first extracts | [17]
Perchloric acid | Nucleotides, sugar phosphates | Samples are resuspended in perchloric acid (35 %, −25 °C) and three freeze–thaw cycles are carried out to extract metabolites. After neutralization to pH 7.0 (2 M KOH, 0.5 M imidazole), a filtration step is performed to remove the KClO4 precipitate | [14]
α-Aminobutyrate | Amino acids | Pellets are incubated in an α-aminobutyrate solution (200 μM) for 15 min at 100 °C. Subsequently, extracts are cooled on ice and centrifuged for 5 min at 16,000 × g, 4 °C | [18]
Cold methanol | Amino acids, carboxylic acids | Freeze–thaw cycles at low temperatures (−20 °C). To enhance recovery, cells are washed with cold methanol once or twice | [11]


The following protocol summarizes a consensus set of techniques for yeast metabolome analysis, particularly for the analysis of metabolites (both from the endometabolome and exometabolome) containing amino and/or carboxyl groups, which represent most of the compounds participating in the central carbon metabolism. The alkylation of amino and non-amino organic acids into volatile carbamates and esters using methyl chloroformate (MCF) as derivatization agent has been chosen because it provides complete and stable derivatives for GC analysis and has shown good results with yeast samples [11, 23]. Procedures for the sample preparation have also been selected based on successful results with yeast samples. The quenching method using pure cold methanol has recently been shown to be a leakage-free quenching method for yeast metabolomics [1], and the extraction method is based on the widely applied freeze–thaw method in methanol [11, 23, 24]. The choice of the sample preparation is a critical step with significant consequences for the accuracy of results and biological interpretation of data; therefore, this protocol describes methodologies that have shown high reproducibility and adequate metabolite coverage in yeast metabolomics studies [1, 11, 17, 23].

2 Materials

All solutions are prepared using solvents with the highest purity available (minimum purity should be HPLC grade) and bi-distilled water (MilliQ™). Solutions should be filtered through a 0.2 μm filter and kept at low temperatures (−20 °C) before use.

2.1 Quenching

Quenching solution: pure methanol. Store at −20 °C.

2.2 Extraction

Extraction solution: 1:1 (v/v) methanol–water solution. Mix 500 mL of methanol with 500 mL of bi-distilled water. Store at −20 °C.

2.3 Derivatization

1. Methyl chloroformate (MCF), methanol, and pyridine (see Note 1).
2. Sodium hydroxide solution: 1 M NaOH. Add 800 mL of bi-distilled water to a 1 L volumetric flask. Weigh 40 g of NaOH and transfer to the volumetric flask. Dissolve and adjust the volume to 1 L with bi-distilled water. Store at room temperature.
3. Sodium bicarbonate solution: 50 mM NaHCO3. Add 900 mL of bi-distilled water to a 1 L volumetric flask. Weigh 4.2 g of NaHCO3 and transfer to the volumetric flask. Dissolve and adjust the volume to 1 L with bi-distilled water. Store at room temperature.

2.4 Silanization

Silanization solution: 1:9 (v/v) dichlorodimethylsilane–hexane solution. Mix 900 mL of hexane with 100 mL of dichlorodimethylsilane. Store at room temperature (see Note 1).

2.5 Metabolite Analysis

1. Internal standard: 10 mM DL-Alanine-2,3,3,3-d4. Weigh 0.09312 g of d4-alanine and transfer to a 100 mL volumetric flask. Make up to 100 mL with bi-distilled water. Store at −80 °C (see Note 2).
2. GC-MS equipment: gas chromatograph coupled to a mass spectrometer. Suggested column: GC capillary column (30 m × 0.25 mm ID × 0.15 μm film thickness) with 5-m guard column.
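The weighed amounts given for the NaOH, NaHCO3, and d4-alanine solutions follow from mass = molarity × volume × molar mass. A short Python check is shown below; the function name is hypothetical and the molar masses are standard values rounded to two decimals.

```python
def reagent_mass_g(molarity_mol_per_l, volume_l, molar_mass_g_per_mol):
    """Mass of solute (g) needed for a solution of given molarity and volume."""
    return molarity_mol_per_l * volume_l * molar_mass_g_per_mol


# Molar masses in g/mol (standard values, rounded)
NAOH = 40.00
NAHCO3 = 84.01
D4_ALANINE = 93.12  # alanine with four hydrogens replaced by deuterium

naoh_g = reagent_mass_g(1.0, 1.0, NAOH)               # 1 M NaOH, 1 L
nahco3_g = reagent_mass_g(0.050, 1.0, NAHCO3)         # 50 mM NaHCO3, 1 L
d4_ala_g = reagent_mass_g(0.010, 0.100, D4_ALANINE)   # 10 mM d4-alanine, 100 mL

print(f"NaOH: {naoh_g:.1f} g, NaHCO3: {nahco3_g:.1f} g, d4-alanine: {d4_ala_g:.5f} g")
```

The same function can be used to rescale the recipes when smaller volumes are prepared.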

3 Methods

Sample procedures should be carried out, as far as possible, at low temperatures (i.e., below −20 °C).

3.1 Harvesting and Quenching the Yeast Culture: Pure Methanol Quenching

1. Grow the yeast strain in the desired conditions (see Note 3).
2. Precool the centrifuge rotor (with capacity for 15 mL centrifuge tubes) to −40 °C, in order to prevent warming of the samples to temperatures above −20 °C while centrifuging at −20 °C.
3. Prepare 15 mL centrifuge tubes with 8 mL of pure methanol at −40 °C and keep them in an ethanol-refrigerated bath at −40 °C until harvesting the yeast culture.
4. Directly quench approximately 2 mL of culture broth into the cold quenching solution (see Note 4) and homogenize immediately, either manually or using a vortex mixer (~5 s). Keep the quenched samples in the refrigerated bath at −20 °C for 5 min before centrifugation (when using the pure methanol method for extraction, the temperature can be kept even lower than −20 °C).
5. In the meantime, collect 5 mL of the culture broth and centrifuge at 10,000 × g for 10 min at 10 °C. Divide the supernatant into 3 × 1 mL aliquots. Store at −80 °C and then freeze-dry before derivatization (see Note 5).
6. Centrifuge the quenched samples at 10,000 × g for 20 min at −20 °C. Discard the supernatant and store the pellet at −80 °C (see Note 6).
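As a rough check of the quenching conditions in step 4, the methanol fraction after mixing broth and quenching solution can be estimated as below. This sketch assumes additive volumes (volume contraction of methanol–water mixtures is ignored) and treats the culture broth as methanol-free; the function name is hypothetical.

```python
def methanol_fraction(solution_ml, solution_fraction, diluent_ml):
    """Approximate v/v methanol fraction after mixing, assuming additive volumes.

    solution_ml       -- volume of the methanol-containing solution (mL)
    solution_fraction -- methanol fraction of that solution (1.0 for pure methanol)
    diluent_ml        -- volume of methanol-free liquid added (mL), e.g. culture broth
    """
    total_ml = solution_ml + diluent_ml
    if total_ml <= 0:
        raise ValueError("total volume must be positive")
    return solution_ml * solution_fraction / total_ml


# Step 4: about 2 mL of culture broth quenched into 8 mL of pure methanol
quenched = methanol_fraction(8.0, 1.0, 2.0)
print(f"Methanol after quenching: {quenched:.0%} (v/v)")
```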

3.2 Extraction of Intracellular Metabolites: Freeze–Thaw Procedure

1. Add 2.5 mL of the extraction solution (1:1 (v/v) methanol–water, −20 °C) to the cell pellet.
2. Add 15–20 μL of internal standard (10 mM d4-alanine). Mix vigorously with a vortex mixer (~1 min). Freeze at −80 °C (for ~30 min).
3. Remove the samples from the freezer and thaw them in an ice bath (for ~4 min). Mix vigorously using a vortex mixer (~1 min) and freeze again at −80 °C (for ~30 min). Repeat this step two more times.
4. Centrifuge samples at 10,000 × g for 15 min at −20 °C. Collect the supernatant in a new tube and store at −80 °C.
5. Add another 2.5 mL of extraction solution (1:1 (v/v) methanol–water, −20 °C) to the cell pellet. Mix vigorously with a vortex mixer (~30 s).
6. Centrifuge samples at 10,000 × g for 15 min at −20 °C. Collect the supernatant in the previous tube and store at −80 °C until lyophilization (see Note 7).
7. Transfer samples into silanized glass vials (see Notes 7 and 8) and lyophilize samples in a freeze-dryer.

3.3 Derivatization: Methyl-Chloroformate Derivatization

1. Resuspend the freeze-dried samples in 200 μL of 1 M NaOH.
2. Transfer the suspension into a silanized glass reaction tube (see Note 8).
3. Add 167 μL of pure methanol and 34 μL of pyridine to the samples (see Note 9).
4. Add 20 μL of MCF and mix vigorously using a vortex mixer for 30 s (see Note 10).
5. Add another 20 μL of MCF and mix vigorously for 30 s.
6. Add 400 μL of chloroform to the samples and mix vigorously for 10 s.
7. Add 400 μL of 50 mM NaHCO3 to the samples and mix vigorously for 10 s (see Note 11).
8. Remove the aqueous (top) phase using a glass Pasteur pipette (see Note 12).
9. Add ~100 mg of anhydrous Na2SO4 to “dry” the organic phase.
10. Transfer the dried organic phase using a glass Pasteur pipette to a GC-MS vial assembled with a silanized insert (see Note 13).
11. Samples are ready for the GC-MS analysis (see Note 14).
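When planning a derivatization batch, the per-sample reagent volumes listed in the steps above can be totaled with a small helper. This is a planning sketch only: the names are hypothetical, and it assumes that the optional doubling of volumes is applied uniformly to all liquid reagents.

```python
# Per-sample volumes in microliters, taken from the derivatization steps
PER_SAMPLE_UL = {
    "1 M NaOH": 200,
    "methanol": 167,
    "pyridine": 34,
    "MCF": 40,  # two additions of 20 uL each
    "chloroform": 400,
    "50 mM NaHCO3": 400,
}


def reagent_totals_ul(n_samples, scale=1.0):
    """Total reagent volumes (uL) for n_samples; scale=2 doubles every volume."""
    if n_samples < 1 or scale <= 0:
        raise ValueError("n_samples must be >= 1 and scale must be positive")
    return {name: volume * scale * n_samples
            for name, volume in PER_SAMPLE_UL.items()}


# Hypothetical batch of 12 samples at the standard volumes
for name, total in reagent_totals_ul(12).items():
    print(f"{name}: {total / 1000:.2f} mL")
```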

3.4 GC-MS Analysis

1. Use helium as the carrier gas at a constant flow rate of 1.0 mL/min.
2. Set the injector temperature to 290 °C, the interface temperature to 250 °C, and the trap temperature to 190 °C.
3. Inject 1 μL of sample in splitless injection mode.
4. Following injection, hold the GC oven temperature at 45 °C for 2 min, then ramp to 180 °C at a rate of 9 °C/min.
5. Hold the oven temperature for 5 min, then ramp to 220 °C at a rate of 40 °C/min and hold for another 5 min.
6. Raise the temperature to 240 °C at a rate of 40 °C/min, hold for 11.5 min, then raise it again to 280 °C at a rate of 40 °C/min and hold for 2 min, giving a total analysis time of 43 min.
7. Operate the MS in full-scan acquisition mode (m/z 38–650) at 1.64 scans/s, starting after 5 min.
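The oven program above can be checked by summing the hold and ramp times of each segment, which reproduces the stated total analysis time of 43 min. The function and the tuple layout are illustrative choices, not part of any instrument software.

```python
def run_time_min(start_temp_c, initial_hold_min, segments):
    """Total run time (min) of a GC oven program.

    segments -- list of (ramp rate in C/min, target temperature in C,
                hold time at the target in min)
    """
    total = initial_hold_min
    temp = start_temp_c
    for rate, target, hold in segments:
        total += (target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return total


# Oven program from the GC-MS analysis steps
oven_program = [
    (9, 180, 5),      # 45 -> 180 C at 9 C/min, hold 5 min
    (40, 220, 5),     # 180 -> 220 C at 40 C/min, hold 5 min
    (40, 240, 11.5),  # 220 -> 240 C at 40 C/min, hold 11.5 min
    (40, 280, 2),     # 240 -> 280 C at 40 C/min, hold 2 min
]

total_min = run_time_min(45, 2, oven_program)
print(f"Total analysis time: {total_min:.1f} min")
```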

4 Notes

1. Most chemicals used in the derivatization and silanization processes are flammable, corrosive, and/or toxic and should be handled in a fume hood.
2. Store the internal standard in 200 μL aliquots that should be sufficient for the analysis of 15–20 samples.
3. This protocol was optimized for batch cultivation where samples were taken with an approximate dry weight of 2 g/L.
4. When growing yeast in a bioreactor vessel, the sample collection can be performed using incorporated sampling devices (if available) or using a plastic 10 mL syringe to withdraw approximately 10–15 mL of culture broth that is immediately split (~2 mL) among the five 15 mL centrifuge tubes containing the cold quenching solution. If the culture is performed in shaker flasks, the culture broth should be mixed well by pipetting and rapidly transferred (~2 mL) to the 15 mL centrifuge tubes.
5. The exometabolome does not need to be quenched, as the extracellular milieu is frequently deprived of metabolic activity and extracellular metabolites are more stable.
6. At this stage pellets can be kept at −80 °C before proceeding with the extraction procedure.
7. Before storing at −80 °C, add 10 mL of cold bi-distilled water (4 °C) in order to dilute methanol to a concentration ≤30 % (v/v) and allow the solution to freeze, which is necessary for lyophilization in the freeze-drying process. After dilution, transfer the solution into silanized glass vials that will be transferred to the freeze-dryer for lyophilization.
8. Glass tubes and vials used in the derivatization procedure should be silanized. Always use glassware (not plastic tubes and containers) when performing the silanization process. The silanization process deactivates active groups on the glass surface and prevents metabolites from adhering to the surface. This chemical modification maximizes the recovery of analytes stored in these tubes. The silanization of reaction tubes can be performed as follows:
– Submerge the tubes in the silanization solution for 30 min and then transfer them to a methanol bath for 5 min.
– ATTENTION: Dichlorodimethylsilane and hexane are highly flammable and toxic. The silanization process should be undertaken in a fume hood while wearing proper protective gloves.
– Transfer the tubes to a beaker with running distilled water for 5–10 min and then transfer them to an acetone bath for 5 min.
– Drain the tubes and let them dry at room temperature.
9. If samples are not completely dissolved, the volume of solutions used in the derivatization method can be doubled (including solutions used in this step and subsequent ones).
10. Once started, it is important to go through this procedure (i.e., steps 4 and 5) without interruption until the addition of chloroform, which stops the derivatization reaction. Mixing times and intensity should be kept constant among samples.
11. After mixing, a two-phase separation should be observed: the upper phase is the water phase and the bottom phase is the organic phase containing amino and non-amino metabolites.
12. If necessary, briefly spin the tubes in a centrifuge to separate the two phases.
13. Avoid transferring Na2SO4 crystals into the vials, because they may block the GC column during the chromatographic analysis.
14. Samples can be stored at −80 °C for 24–48 h before analysis.

References
1. Canelas AB, Ras C, Pierick A et al (2008) Leakage-free rapid quenching technique for yeast metabolomics. Metabolomics 4:226–239
2. De Koning W, Van Dam K (1992) A method for the determination of changes of glycolytic metabolites in yeast on a subsecond time scale using extraction at neutral pH. Anal Biochem 204:118–123
3. Gonzalez B, François J, Renaud M (1997) A rapid and reliable method for metabolite extraction in yeast using boiling buffered ethanol. Yeast 13:1347–1355
4. Taymaz-Nikerel H, de Mey M, Ras C et al (2009) Development and application of a differential method for reliable metabolome analysis in Escherichia coli. Anal Biochem 386:9–19
5. Wittmann C, Krömer JO, Kiefer P et al (2004) Impact of the cold shock phenomenon on quantification of intracellular metabolites in bacteria. Anal Biochem 327:135–139
6. Castrillo JI, Hayes A, Mohammed S et al (2003) An optimized protocol for metabolome analysis in yeast using direct infusion electrospray mass spectrometry. Phytochemistry 62:929–937
7. Faijes M, Mars AE, Smid EJ (2007) Comparison of quenching and extraction methodologies for metabolome analysis of Lactobacillus plantarum. Microb Cell Fact 6:27
8. Boer VM, Crutchfield CA, Bradley PH et al (2010) Growth-limiting intracellular metabolites in yeast growing under diverse nutrient limitations. Mol Biol Cell 21:198–211
9. Ewald JC, Heux S, Zamboni N (2009) High-throughput quantitative metabolomics: workflow for cultivation, quenching, and analysis of yeast in a multiwell format. Anal Chem 81:3623–3629
10. Sasidharan K, Soga T, Tomita M et al (2012) A yeast metabolite extraction protocol optimised for time-series analyses. PLoS One 7:e44283
11. Smart KF, Aggio RBM, Van Houtte JR et al (2010) Analytical platform for metabolome analysis of microbial cells using methyl chloroformate derivatization followed by gas chromatography-mass spectrometry. Nat Protoc 5:1709–1729
12. Spura J, Reimer LC, Wieloch P et al (2009) A method for enzyme quenching in microbial metabolome analysis successfully applied to gram-positive and gram-negative bacteria and yeast. Anal Biochem 394:192–201
13. Weuster-Botz D (1997) Sampling tube device for monitoring intracellular metabolite dynamics. Anal Biochem 246:225–233
14. Buziol S, Bashir I, Baumeister A et al (2002) New bioreactor-coupled rapid stopped-flow sampling technique for measurements of metabolite dynamics on a subsecond time scale. Biotechnol Bioeng 80:632–636
15. Paczia N, Nilgen A, Lehmann T et al (2012) Extensive exometabolome analysis reveals extended overflow metabolism in various microorganisms. Microb Cell Fact 11:122
16. Mashego MR, Wu L, Van Dam JC et al (2004) MIRACLE: mass isotopomer ratio analysis of U-13C-labeled extracts. A new method for accurate quantification of changes in concentrations of intracellular metabolites. Biotechnol Bioeng 85:620–628
17. Canelas AB, ten Pierick A, Ras C et al (2009) Quantitative evaluation of intracellular metabolite

18.

19.

20. 21. 22.

23.

24.

207

extraction techniques for yeast metabolomics. Anal Chem 81:7379–89 Bolten CJ, Wittmann C (2008) Appropriate sampling for intracellular amino acid analysis in five phylogenetically different yeasts. Biotechnol Lett 30:1993–2000 Brown SD (1988) Chemometrics: A textbook. D. L. Massart. B. G. M. Vandeginste, S. N. Deming, Y. Michotte, and L. Kaufman, Elsevier, Amsterdam, 1988. ISBN 0-44442660-4. Price Dfl 175.00. J Chemometr 2: 298–299 Putri SP, Yamamoto S, Tsugawa H et al (2013) Current metabolomics: technological advances. J Biosci Bioeng 116:9–16 Van Der Greef J, Smilde AK (2005) Symbiosis of chemometrics and metabolomics: past, present, and future. J Chemometr 19:376–386 Koek MM, Jellema RH, van der Greef J et al (2011) Quantitative metabolomics based on gas chromatography mass spectrometry: status and perspectives. Metabolomics 7:307–328 Villas-Bôas SG, Højer-Pedersen J, Akesson M et al (2005) Global metabolite analysis of yeast: evaluation of sample preparation methods. Yeast (Chichester, England) 22:1155–69 Winder CL, Dunn WB (2011) Fit-for-purpose quenching and extraction protocols for metabolic profiling of yeast using chromatographymass spectrometry platforms. Methods Mol Biol (Clifton, NJ) 759:225–38

Chapter 13

13C-Based Metabolic Flux Analysis in Yeast: The Pichia pastoris Case

Pau Ferrer and Joan Albiol

Abstract

Metabolic flux analysis based on tracing patterns of stable isotopes, particularly 13C, comprises a set of methodologies to experimentally quantify intracellular biochemical reaction rates, i.e., to measure carbon flux distributions through a metabolic network. This allows quantifying the response of a metabolic network to an environmental or genetic perturbation (i.e., the metabolic phenotype). Here, we describe a protocol based on growing yeast on a 13C-labelled substrate and subsequent NMR detection of 13C-patterns in proteinogenic amino acids. To calculate metabolic fluxes, we describe two complementary mathematical approaches using available software; namely, an approach based on the estimation of local ratios in network nodes, and a method based on a global iterative fitting approach. Furthermore, we consider specificities of these protocols for their application to the yeast Pichia pastoris growing on multicarbon substrates other than glucose (glycerol), as well as the case when methanol is used as co-substrate in combination with glucose or glycerol.

Key words Metabolic flux analysis, 13C-labelling, Yeast, NMR, MS, Pichia pastoris, Glycerol, Methanol

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_13, © Springer Science+Business Media, LLC 2014

1 Introduction

13C-based metabolic flux analysis (13C-MFA) has been developed as a set of powerful methodologies for quantitative analysis of microbial metabolism [1, 2]. In particular, there have been remarkable methodological advances and applications to yeast over the past decade, not only for S. cerevisiae growing on glucose as sole carbon source [3–5] but also extended to other yeasts and substrates [6–13].

We describe here a protocol for metabolic flux analysis based on the use of nuclear magnetic resonance (NMR) to trace patterns of stable isotopes in protein-bound amino acids from yeast growing on 13C-labelled substrates [1, 2] for extended periods. Since the carbon backbones of several key metabolic intermediates are conserved in amino acids, cell protein is a stable and abundant source of labelling information on the metabolites of the central metabolism, thereby enabling the characterization of the metabolic network topology and fluxes under steady state conditions [14, 15].

Two major (complementary) approaches have been used to calculate metabolic fluxes from labelling patterns [2, 16]. The first strategy is based on the calculation of flux ratios in converging pathways from the direct interpretation of selected labelling patterns of protein-bound amino acids. These flux ratios are then used as constraints for the calculation of net fluxes [7, 17], e.g., using the MATLAB programming environment. As metabolic flux ratio (METAFoR) analysis relies on amino acids, it can only resolve fluxes in central carbon and amino acid metabolism. Furthermore, the flux ratio equations have to be derived manually for every metabolic network (i.e., for every specific microorganism under study), carbon source(s), and isotopic tracer. For instance, Szyperski [14] proposed the use of mixtures of non-labelled and uniformly labelled substrates for METAFoR analysis. Therefore, the information gained from these experiments relies solely on the 13C–13C connectivities. This limits the applicability of the methodology when using carbon sources with a reduced number of C−C bonds or C1 compounds, such as glycerol and methanol, respectively [18, 19].

The second method uses a system of balance equations describing the propagation of 13C within a certain metabolic network. These equations are derived for each isotopomer (isotope isomer) of each metabolite [1]. By integrating 13C data, measured extracellular fluxes, biosynthetic requirements, and an initial set of intracellular flux estimates into the computer model, flux distributions are calculated by global iterative fitting of fluxes to measured data, which minimizes the difference between observed and simulated labelling patterns and measured flux data. To this purpose, the publicly available 13CFlux2 software package can be used [20].

Both local flux ratio analysis and the global iterative fitting approach can also be carried out using mass spectrometry (MS) as the analytical method for detection of isotope patterns. The pros and cons of NMR versus MS for 13C-based flux analysis are discussed elsewhere [16]. Alternative protocols for 13C-MFA based on the analysis of labelling patterns in free intracellular metabolites potentially allow resolving fluxes through a larger metabolic network, as well as assessing dynamic flux changes [21]. Such an approach has been recently applied to Pichia pastoris [22]. However, it is experimentally more demanding and is not within the scope of the protocol described here.

In this protocol, we provide two procedures for MFA suitable for the cases of glucose, glucose–methanol (using the METAFoR-based method), and glycerol–methanol (using the global iterative fitting method) metabolism in P. pastoris. P. pastoris is grown in continuous cultures and analysis of proteinogenic amino acids is performed by NMR. The MFA procedure using glucose as the only carbon source is representative of the standard analysis performed in most yeasts, and has also been carried out through MS analysis in several prokaryotic and eukaryotic microorganisms [6]. An excellent detailed protocol based on MS data has been previously published by Zamboni and coworkers [16], using E. coli grown on glucose as case study. The mixed substrate cases represent two alternative scenarios highly relevant to the methylotrophic yeast P. pastoris.

Overall, the procedure includes yeast growth in continuous culture in the presence of 13C-labelled carbon source(s), sampling, labelling pattern detection by NMR, and calculation of flux ratios and net fluxes in the central carbon metabolism. Metabolic models and experimental data to reproduce the P. pastoris case study are available in the literature [7, 8, 19, 23].

2 Materials

2.1 Cultivation of P. pastoris in Continuous Culture

1. A prototrophic P. pastoris strain (e.g., X-33, Invitrogen) (see Note 1).
2. Unlabelled carbon sources (glucose, glycerol, methanol), i.e., with 13C at its natural abundance (ca. 1.1 %).
3. YPG medium: 10 g/L yeast extract, 20 g/L peptone, 10 g/L glycerol, used for seed cultures, supplemented with appropriate antibiotics (e.g., zeocin) when required (e.g., for recombinant strains).
4. PTM1 trace salts stock solution: 6.0 g/L CuSO4⋅5H2O, 0.08 g/L NaI, 3.0 g/L MnSO4⋅H2O, 0.2 g/L Na2MoO4⋅2H2O, 0.02 g/L H3BO3, 0.5 g/L CoCl2, 20.0 g/L ZnCl2, 65.0 g/L FeSO4⋅7H2O, and 5.0 mL H2SO4 (95–98 %).
5. Minimal medium for chemostat culture, e.g., as described in ref. 24: 50 g/L glucose·H2O, 0.9 g/L citric acid, 4.35 g/L (NH4)2HPO4, 0.01 g/L CaCl2⋅2H2O, 1.7 g/L KCl, 0.65 g/L MgSO4⋅7H2O, 1 mL/L biotin (0.2 g/L), and 1.6 mL/L PTM1 trace salts stock solution. The pH is set to 5.0 with 25 % (v/v) HCl (see Note 2).
6. 0.22 μm filters (sterile or autoclavable) for sterilization of required medium components and 0.45 μm filters for HPLC sample preparation.

2.2 13C-labelled Substrates and Materials Needed for Sample Preparation

1. Uniformly 13C-labelled carbon source (e.g., Cortecnet, Sigma Aldrich, Isotec), of chemical purity >99 % and 13C-enrichment >98 %.
2. 20 mM Tris–HCl, pH 7.6.
3. 6 M HCl.
4. Deuterium oxide (D2O).
5. 15–20 mL capacity hydrolysis tubes (e.g., borosilicate glass) and caps suitable to resist 6 M HCl at 110 °C for 24 h.
6. 0.22 μm filters.
7. NMR tubes.

2.3 Equipment

1. Benchtop bioreactor equipment, preferably equipped with CO2 and O2 off-gas analyzers.
2. HPLC equipment, ion exchange column (e.g., Aminex HPX-87H, Bio-Rad Laboratories, Inc.) and HPLC vials (HPLC is used for extracellular metabolite analyses).
3. Spectrophotometer equipped with visible (400–700 nm) light source.
4. Benchtop centrifuge for 2 and 50 mL tubes and centrifuge for 250 mL tubes.
5. Heat block for temperatures up to 110 °C.
6. Benchtop lyophilizer.
7. NMR spectrometer of frequency 500 MHz or higher (e.g., Varian Inova spectrometer), equipped with probes for 1H and 13C.
8. Standard (Varian) spectrometer software VNMR™ for spectral acquisition and initial processing.

2.4 Software Tools for MFA Based on 13C-NMR Datasets

1. Microsoft Excel or similar spreadsheet application.
2. MATLAB®.
3. 13CFlux2 [20] (http://www.13cflux.net/13cflux2/). It runs on the Linux operating system.

3 Methods

3.1 Chemostat Cultivations and Labelling Experiment

1. Inoculate a seed culture in unlabelled YPG medium (200 mL in a 1 L Erlenmeyer flask) from a glycerol cell stock kept at −80 °C or from an agar plate. Grow overnight with shaking at 200 rpm and 30 °C. For inoculation of the bioreactor, harvest the cells by centrifugation in sterile centrifuge tubes at 4 °C and resuspend the cell pellet in a small volume of bioreactor medium.
2. Inoculate the bioreactor at an optical density (OD600) of 1.0. After a batch phase of about 24 h (the length may vary depending on the growth conditions and medium composition), the end of which is indicated by a sharp increase in the dissolved oxygen concentration, start the continuous culture by feeding the medium of choice at a defined rate.
3. Cultivate the cells in continuous culture feeding unlabelled growth medium until they reach a steady state, which is indicated by steady values of specific cultivation parameters (i.e., CO2 and O2 concentrations in the bioreactor off-gas, pO2, OD600, and levels of extracellular metabolites in the bioreactor, e.g., carbon source(s), arabitol, ethanol, pyruvate, citrate, glycerol). Generally, the steady state should be maintained for no less than five residence times before sampling (see Note 3).
4. During cultivation, aseptically remove 2 mL aliquots from the bioreactor (2–3 times per residence time); measure OD600 and centrifuge a volume of 1 mL for 2 min at 14,000 × g, 4 °C. Remove the supernatant with a pipette and store it at −20 °C until analysis. Perform HPLC measurement of substrates and by-products in the supernatant samples collected along the continuous culture. Also measure the biomass concentration (dry cell weight/L) of samples taken at the same time as those for supernatant analysis.
5. Once the steady state is reached and has been kept for five residence times, switch the feed line to the 13C-labelled growth medium, registering the time at which the new medium starts entering the bioreactor. The 13C-labelled growth medium must have exactly the same composition as the unlabelled (i.e., naturally labelled) medium, except that 10–15 % of its carbon source must be replaced by uniformly 13C-labelled substrate. Importantly, for the case of mixed carbon feeds (e.g., glucose–methanol and glycerol–methanol), both carbon sources have to be fractionally labelled at exactly the same level, that is, the percentage of uniformly 13C-labelled substrate in the feed medium must be equal for both carbon sources.
6. Allow the cells to grow as long as possible with the labelled medium. Typically, this should be 1.5 residence times (and not less than 1 residence time) (see Note 4). Harvest the cells and record the harvest time. Allow for a minimum of three aliquots (technical replicates) with about 100 mg of dry cell weight each for later analysis. If required, harvest extra biomass for compositional (elemental and macromolecular) analysis (e.g., as described in ref. 25).
7. Centrifuge the harvested cells at 4,000 × g, 4 °C. Discard the supernatant, wash the cell pellet with an equal volume of isotonic buffer at 4 °C (e.g., 20 mM Tris–HCl pH 7.6), and pellet the cells by centrifugation again. Store cell pellets at −20 °C. At this step, cells can also be lyophilized and stored in a dry environment at −20 °C before analysis (cell samples can be stored in this condition for 2 months).

3.2 Data Consistency Check
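The rate calculations and the carbon balance on which this data consistency check relies can be sketched numerically. The following is a minimal illustration only, not part of the original protocol: it assumes an ideal steady-state chemostat, in which the specific rate of a dissolved compound equals D·(c_in − c_out)/X, and all numerical values (dilution rate, concentrations, CO2 evolution rate, and a biomass carbon content of 40 C-mmol/gDCW) are hypothetical placeholders to be replaced by measured data.

```python
# Minimal sketch: specific rates and carbon recovery in a steady-state chemostat.
# All numbers below are hypothetical; replace them with measured values.

def specific_rate(dilution_rate, conc_in, conc_out, biomass):
    """Net specific uptake rate in mmol/(gDCW*h) of a dissolved compound.

    dilution_rate: D in 1/h; conc_in, conc_out: mmol/L in feed and broth;
    biomass: gDCW/L. Positive values mean uptake, negative values secretion.
    """
    return dilution_rate * (conc_in - conc_out) / biomass


def carbon_recovery(carbon_in, carbon_out):
    """Ratio of carbon leaving to carbon entering, both in C-mmol/(gDCW*h)."""
    return sum(carbon_out) / sum(carbon_in)


D = 0.10                 # dilution rate, 1/h
X = 22.5                 # biomass concentration, gDCW/L
CER = 58.5               # volumetric CO2 evolution rate, mmol/(L*h)
C_PER_G_BIOMASS = 40.0   # C-mmol per gDCW (assumes 25 g biomass per C-mol)

q_glc = specific_rate(D, conc_in=250.0, conc_out=0.0, biomass=X)
q_co2 = CER / X                      # specific CO2 production rate
q_biomass_c = D * C_PER_G_BIOMASS    # at steady state, growth rate equals D

# Glucose carries six carbon atoms per molecule.
recovery = carbon_recovery([6 * q_glc], [q_biomass_c, q_co2])
print(f"q_glc = {q_glc:.3f} mmol/(gDCW*h), carbon recovery = {recovery:.1%}")
```

A recovery clearly different from 100 % would point to an unmeasured by-product or to a biased measurement and should be resolved before the fluxes are calculated.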

8. Calculate the substrate(s) uptake rate(s), metabolite secretion rates, CO2 production rates (CER), and O2 uptake rates (OUR), for instance, in mmol/gDCW·h (see Note 5).
9. If not available from the literature or from previous experiments under analogous conditions, perform elemental and macromolecular (protein, carbohydrates, lipids, RNA, and DNA) biomass compositional analyses.
10. At this point, it is advisable to perform a data consistency and validation step. Usually, this step relies on the elemental mass balances around the bioreactor, and the details have been described extensively in the literature [26–28]. The method allows detecting any significant measurement error or inconsistency in the collected data. In this way, biases in the metabolic flux calculation due to erroneous measurement data are limited. It is important to verify that the data used are as reliable as possible, as the metabolic fluxes cannot be directly measured for verification purposes. Their determination therefore relies entirely on the accuracy and precision of the obtained data and on the correctness of the assumed stoichiometry and individual carbon transfers.

To be able to perform this step, it is necessary to have redundant information among the measured data. This is because the reconciliation step relies on establishing a set of linear relations (for example, from elemental balances) from which the complete set of rates can be determined, provided that a minimum number of measurements is available. Measuring more rates than the minimum required allows a consistency analysis to be performed. Therefore, knowledge of the elemental composition of all the compounds involved (see Note 6), including biomass, is usually required, besides measuring as many substrates and products as possible.

If a simpler approach is desired, it is advisable to perform at least a carbon balance, that is, to verify that the total carbon supplied to the bioreactor is found in the outputs. Otherwise, there is the risk that a single biased measurement compromises the rest of the calculations. Of course, if not all the carbon-containing compounds entering or leaving the bioreactor are measured, the accuracy and precision of the calculated fluxes will depend on the measured ones, and their confidence (biological replicates, etc.) will become a key point. As a first step, a minimum of two biological replicates is necessary. Nevertheless, obtaining small confidence intervals for the fluxes may require an increased number of biological replicates, as well as a carefully designed labelling strategy.

3.3 Sample Processing for NMR Analysis

1. Resuspend the frozen or lyophilized cells in 3 mL of H2O and add 6 mL of 6 M HCl; transfer the resuspended cells into hydrolysis tubes. Seal the tubes to prevent evaporation of HCl. Incubate the tubes for 24 h in a heat block or oven at 110 °C.
2. Filter the cell hydrolysate through a 0.22 μm filter and vacuum-dry or lyophilize it.
3. Dissolve the dried filtrates in D2O for NMR analysis. The pH of the sample will be below 1 due to residual HCl.

3.4 2D 1H-13C-HSQC NMR Spectral Acquisition

Detailed procedures for the acquisition of 1H-13C-HSQC nuclear magnetic resonance (NMR) spectra can be found in refs. 14, 29. A summarized protocol for HSQC spectral acquisition has also been recently described in ref. 30.

3.5 Spectral Processing

The spectra are processed using the standard spectrometer software (e.g., Varian VNMR™).

1. Integration of 13C scalar fine structures of proteinogenic amino acid carbon signals

Szyperski and coworkers [31] developed a specialized software package, FCAL, for the integration of 13C-scalar fine structures of proteinogenic amino acid carbon signals in the 1H-13C-HSQC NMR spectra and for the calculation of relative abundances of intact carbon fragments originating from a single molecule of glucose. This methodology was further extended for investigating eukaryotic systems (i.e., consideration of cell compartmentation) by [3, 32]. However, this package has not been made available to many researchers in the field. Other systematic procedures for spectral fitting have been described [33]. Alternatively, we refer to the protocol recently described by [30], which relies on an openly available MATLAB module (NMRisotopomer) for extracting, deconvoluting, and quantifying peaks. The data resulting from this analysis are relative intensities of the fine structures observed in the multiplets of the proteinogenic amino acids (see Note 7).

2. Calculation of relative abundances of intact carbon fragments (fragmentomers)

Statistical formulae to translate multiplet intensities into bond integrities have been developed in ref. 14. Notably, these probabilistic equations can be readily applied to the case of two simultaneous carbon sources, one of which is methanol. This is because, as a C1 compound, methanol does not introduce contiguous multiple-carbon fragments into the metabolism and, therefore, all contiguous 13Cn (n > 1) fragments must originate from the multicarbon C-source (e.g., glucose or glycerol) [19].

The nomenclature used here for the intact carbon fragments, "fragmentomers," has been previously described [3]. Briefly, f(1) represents the fraction of molecules in which the observed carbon atom and the two neighboring carbons originate from different carbon source molecules; f(2) is the fraction of molecules in which the observed carbon atom and one of the two neighboring carbon atoms originate from the same molecule of the multicarbon source; and f(3) represents the fraction of molecules in which the observed carbon atom and both carbon neighbors originate from the same molecule of the multicarbon source (e.g., glucose) (Fig. 1). If the observed carbon exhibits significantly different 13C–13C scalar coupling constants (see Note 7) for the two neighbor carbons, two different fractions, f(2) and f(2*), are distinguished. In this case, the fraction of molecules with a conserved bond between the observed carbon atom and the neighboring carbon with the smaller coupling is represented by f(2). Accordingly, f(2*) denotes the fraction of molecules where the carbon bond is conserved between the observed carbon and the neighboring carbon with the larger coupling. If the observed carbon is located at the end of a carbon chain, only the f(1) and f(2) fragmentomers can be observed.

Fig. 1 Isotopomers (fragmentomers) causing the various fine structures from 13C–13C scalar coupling in the NMR spectra of fractionally 13C-labelled amino acids. 13C atoms are depicted as black circles and 12C atoms as white circles. The observed carbon atom (carbon 2, C2) is indicated by an arrow. The corresponding fine structures (multiplets) observable on a one-dimensional 13C section of a 2D 1H-13C HSQC spectrum are shown below each fragmentomer. JCC denotes the 13C–13C scalar coupling constant

The fragmentomer information obtained from the proteinogenic amino acids can be traced back to their metabolic precursors, which are intermediates of the central carbon metabolism. The carbon backbones of those eight precursors (i.e., ribose-5-P, erythrose-4-P, 3-phosphoglycerate, phosphoenolpyruvate, pyruvate, oxaloacetate, acetyl-coenzyme A, and 2-oxoglutarate) are conserved in the amino acid synthesis pathways [32].

3.6 Metabolic Flux Ratios (METAFoR) Analysis

The calculation of metabolic flux ratios (METAFoR analysis) when using fractional 13C-labelling of amino acids in yeast (i.e., considering cell compartmentation) is described in detail in ref. 3. An extension of this formalism for the case of methanol being used as a co-substrate with glucose is described in [8].

The calculation of flux ratios is based on the assumption that both a metabolic and an isotopomeric steady state occur. To establish a cost-effective protocol for a larger number of 13C labelling experiments, chemostat cultures operating in metabolic steady state are typically fed with the medium containing the 13C-labelled substrates for the duration of 1 or 1.5 volume changes before harvesting the biomass [15, 18]. Then, the fraction of labelled biomass (Xlabelled) is calculated according to first-order wash-out kinetics: Xlabelled = 1 − e^(−t/θ), where θ is the residence time of the chemostat and t the labelling time [15, 18, 29]. For instance, 63 % of the biomass will be fractionally labelled after one residence time (i.e., volume change) (see Notes 1−4).

3.7 Calculation/Computation Protocols for MFA Using Local METAFoR Constraints
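Before the flux calculation is set up, the first-order wash-out relation given in Subheading 3.6 can be evaluated with a few lines of code, for example to decide on the labelling time. The sketch below is illustrative only; the function names and the example reactor volume and feed rate are assumptions and do not come from the protocol.

```python
import math

def residence_time(volume_l, feed_rate_l_per_h):
    """Residence time theta (h) of a chemostat: working volume / feed rate."""
    return volume_l / feed_rate_l_per_h


def labelled_fraction(t, theta):
    """Fraction of biomass formed on labelled medium after labelling time t.

    Implements X_labelled = 1 - exp(-t / theta) (first-order wash-out).
    """
    if theta <= 0 or t < 0:
        raise ValueError("theta must be positive and t non-negative")
    return 1.0 - math.exp(-t / theta)


# Hypothetical example: 1 L working volume fed at 0.1 L/h (theta = 10 h).
theta = residence_time(1.0, 0.1)
for n_residence_times in (1.0, 1.5):
    t = n_residence_times * theta
    print(f"{n_residence_times} residence times: "
          f"{labelled_fraction(t, theta):.1%} of the biomass is labelled")
```

The same function can be used to correct the measured labelling patterns for the unlabelled biomass still present at harvest.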

1. Define the metabolic stoichiometric model

13C-constrained metabolic flux analysis (13C-MFA) is performed using a stoichiometric model comprising the relevant pathways of the organism of interest. As a typical example, the central carbon metabolism of P. pastoris [8] can be considered. The stoichiometric model results in a linear system of equations, which has to be solved in order to determine the fluxes. To this purpose, the degrees of freedom of the system (number of fluxes minus rank of the stoichiometric matrix) have to be equal to (or lower than) the sum of the number of measurements and the number of available flux ratios. This has to be taken into account when selecting the reactions to include in the model.

2. Calculate the consumed intermediate metabolites for biomass formation

Besides extracellular input–output fluxes, metabolic networks usually supply intermediates for other metabolic processes, which have to be taken into account. In this model, the consumption of intermediate metabolites of central metabolic pathways for the formation of the major biomass macromolecular components (i.e., proteins, carbohydrates, lipids, and nucleic acids) is calculated as previously described [25]. The amount of central metabolism precursors consumed for biomass formation is important for the correct determination of metabolic fluxes. A proper biomass composition (see, for example, those provided in refs. 8, 23, 25) should be used to calculate the amount of consumed precursors; therefore, the biomass formation equation should be updated when the biomass composition is known to change under different growth conditions (or genetic backgrounds).

3. Constrain the network using the flux ratios

To calculate the intracellular net fluxes, the stoichiometric model is constrained not only by the measured extracellular metabolic input/output rates (evolution rates of biomass, methanol and glucose uptake rate, CO2 uptake rate) but also by the set of intracellular flux ratios derived from the METAFoR analysis, as described by [17], thereby constituting a determined linear system of equations.
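How a flux ratio closes an otherwise underdetermined system can be illustrated on a deliberately small toy network. The sketch below is not the P. pastoris model: the four-flux network, the measured uptake rate of 10, and the ratio a = 0.3 are hypothetical, and the least-squares solution is obtained with plain Gaussian elimination on the normal equations rather than with MATLAB.

```python
# Toy network (hypothetical): S is taken up (x1) and converted into A through
# two parallel routes (x2, x3); A leaves the network via x4.

def rank(matrix, tol=1e-9):
    """Rank of a matrix by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    r = 0
    for col in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < tol:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            f = m[i][col] / m[r][col]
            m[i] = [vi - f * vr for vi, vr in zip(m[i], m[r])]
        r += 1
    return r


def solve_least_squares(N, b):
    """Solve min ||N x - b||^2 via the normal equations (Gauss-Jordan)."""
    n = len(N[0])
    ata = [[sum(row[i] * row[j] for row in N) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * bk for row, bk in zip(N, b)) for i in range(n)]
    aug = [r + [v] for r, v in zip(ata, atb)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("system is underdetermined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [aug[i][n] for i in range(n)]


S = [[1.0, -1.0, -1.0, 0.0],   # balance of S: x1 - x2 - x3 = 0
     [0.0, 1.0, 1.0, -1.0]]    # balance of A: x2 + x3 - x4 = 0
dof = len(S[0]) - rank(S)      # degrees of freedom: 4 fluxes - rank 2 = 2

a = 0.3                        # fraction of A formed via route x2 (flux ratio)
N = S + [[1.0, 0.0, 0.0, 0.0],        # measured uptake rate: x1 = 10
         [0.0, 1.0 - a, -a, 0.0]]     # ratio constraint: x2(1-a) + x3(-a) = 0
b = [0.0, 0.0, 10.0, 0.0]

fluxes = solve_least_squares(N, b)
print("degrees of freedom:", dof)
print("fluxes x1..x4:", [round(v, 3) for v in fluxes])
```

One measured rate plus one flux ratio match the two degrees of freedom, so the fluxes are uniquely determined; with fewer constraints the solver raises an error instead of returning an arbitrary solution.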

Fig. 2 Metabolic flux ratios. (a) Metabolic flux ratios generally used for yeast grown in glucose-limited chemostats. (b) Additional metabolic flux ratio applicable to glucose+methanol-limited chemostats. Flux ratio PEP from PPP does not apply to glycerol (and glycerol+methanol) grown cells. The X values are directly inferred from the 13C patterns. PEP phosphoenolpyruvate, PGA phosphoglycerate, OAA oxaloacetate, TCA tricarboxylic acid, cyt cytosolic, mit mitochondrial, ub upper bound, lb lower bound

In particular, for glucose-grown cells, a minimum of four flux ratios (see Note 8) are generally used [3, 7] (Fig. 2a). For the glucose–methanol co-assimilation case, specific ratios have been recently formulated [8] (Fig. 2b). Typically, redox cofactors are not used as mass balance constraints to solve the 13C-MFA system. Cofactor mass balances are potential sources of errors since the correct balancing requires detailed knowledge of the relative activities of different isoenzymes and the enzyme cofactor specificities on a cell wide scale. Depending on the growth conditions, some reactions of the central metabolism (e.g., glyoxylate cycle and malic enzyme reactions when yeast are growing on glucose-limited aerobic chemostats) can be omitted from the model on the grounds of the inspection of the METAFoR analysis, as previously described [3, 18, 19]. As an example, for a simple metabolic model including glycolysis, pentose phosphate pathway, and tricarboxylic acid cycle (Fig. 3), the following ratios can be experimentally obtained in oxygen-limiting and hypoxic conditions and subsequently used for flux calculations (see ref. 7 for details):


Fig. 3 Metabolic network model of the central carbon metabolism of P. pastoris. Fluxes are represented as net fluxes, and the directions of the arrows indicate the directions of the positive net fluxes. The metabolites consumed or produced by extracellular fluxes (shown as dashed arrows) are denoted with (E). (Taken from ref. 7)

● The fraction of mitochondrial oxaloacetate (OAAmit) originating from cytoplasmic oxaloacetate (OAAcyt), i.e., OAAcyt transported into the mitochondria, is defined as:

$$a = \frac{x_{23}}{x_{23} + x_{16}} \tag{1}$$

● Similarly, in oxygen-limiting and hypoxic conditions, the fraction of OAAcyt originating from cytoplasmic pyruvate (Pyrcyt) via pyruvate carboxykinase (PyrCK) is defined as:

$$b = \frac{x_{17}}{x_{17} + x_{24}} \tag{2}$$

● On the other hand, the upper limit of the fraction of phosphoenolpyruvate (Pep) originating from the pentose phosphate pathway (PPP), assuming a maximal contribution of the PPP, is:

$$c \geq \frac{x_{9} + 2x_{11} + 3x_{10}}{2x_{3} + x_{9} + x_{10}} \tag{3}$$

From these ratios, an initial solution can be obtained by deriving the following linear constraint equations (see Note 8):

$$x_{23}(1 - a) + x_{16}(-a) = R_a \tag{4}$$

$$x_{17}(1 - b) + x_{24}(-b) = R_b \tag{5}$$

$$x_{9}(1 - c) + x_{11}(2) + x_{10}(3 - c) + x_{3}(-2c) \leq R_c \tag{6}$$

where each R represents the residual of the balance, which ideally should be equal to zero. Equations 4–6 can be added to the corresponding stoichiometric model as a submatrix F, obtaining the complete metabolic model to solve the metabolite mass balances:

$$\begin{bmatrix} S \\ F \end{bmatrix} \cdot x = \begin{bmatrix} c \\ 0 \end{bmatrix} \equiv N \cdot x = b \tag{7}$$

where S represents the stoichiometric matrix (including input/output reactions), c is a column vector with either 0 for internal reactions or the corresponding value for each one of the input/output rates, and x is the vector of fluxes.

4. Solve the linear system of equations to obtain flux estimates

The solution of the resulting linear system can be obtained using any mathematical programming environment such as MATLAB. If MATLAB is used, the function lsqlin (x = lsqlin(C, d, A, b, Aeq, beq, lb, ub, x0)) can be used to solve Eq. 7, where C and d are the matrix N and vector b, respectively, of Eq. 7. The vector x0 contains the initial values for the fluxes in the fitting procedure and x the final solution. This function allows including additional constraints such as reaction irreversibility, upper and lower flux bounds, or upper and lower bounds to the flux ratio constraints. The lsqlin arguments A and b (not to be confused with the vector b of Eq. 7) define the inequality constraints A⋅x ≤ b, while matrix Aeq and vector beq have to fulfil Aeq⋅x = beq (see Note 9). Thus, if irreversibility (positive flux) can be assumed for some intracellular fluxes for which this information is available, one row of zeros must be added to matrix A with a −1 at the position corresponding to the irreversible flux (so that −x ≤ 0), while a 0 has to be set at the corresponding position in vector b. Fixing a certain flux to a value can be done in a similar way using matrix Aeq and vector beq.

Alternatively, a nonlinear error minimization approach can be followed to determine the metabolic fluxes iteratively. An excellent example of this approach is described by [17]. Briefly, an objective function of the following form has to be minimized:

$$f_{obj} = \sum \frac{R_{mb}^{2}}{\sigma_{mb}^{2}} + \frac{R_a^{2}}{\sigma_a^{2}\left(\frac{\partial R_a}{\partial a}\right)^{2}} + \frac{R_b^{2}}{\sigma_b^{2}\left(\frac{\partial R_b}{\partial b}\right)^{2}} + \frac{\left(\frac{R_c + |R_c|}{2}\right)^{2}}{\sigma_c^{2}\left(\frac{\partial R_c}{\partial c}\right)^{2}} \tag{8}$$

where Rmb is:

$$S \cdot x - c = R_{mb} \tag{9}$$

When using this approach, the solution obtained using "lsqlin" above can be used as the starting point for the minimization. Minimization of this objective function can be done using the MATLAB function "fmincon" (x = fmincon(fobj, x0, A, b, Aeq, beq, lb, ub)), using constraints in a similar way as when solving the linear system above.

5. Determine confidence intervals for the fluxes

Once the metabolic fluxes are obtained, confidence intervals for them should always be calculated. This can be performed by calculating their standard deviations using the Fisher Information Matrix (FIM) approach [17], such as:

$$\sigma_{j} = \sqrt{\left(FIM^{-1}\right)_{jj}} \qquad (10)$$

The FIM matrix is calculated as:

$$FIM = \sum W^{T} C^{-1} W \qquad (11)$$


Pau Ferrer and Joan Albiol

where C is the variance–covariance matrix of the measurements (usually assumed to be independent) and W is a parameter sensitivity matrix in which each element wij corresponds to:

$$w_{ij} = \frac{\partial x_{i}}{\partial p_{j}} \qquad (12)$$

which describes an infinitesimal change of the variable xi (e.g., a measurement) due to an infinitesimal change in parameter pj (a flux). Confidence intervals for the estimates p̂j of the fluxes pj can be derived as described in [34]:

$$\hat{p}_{j} - \sigma_{p_{j}}\, t^{\nu}_{\alpha/2} < p_{j} < \hat{p}_{j} + \sigma_{p_{j}}\, t^{\nu}_{\alpha/2} \qquad (13)$$

where tα/2,ν is the value of the Student t distribution with ν degrees of freedom, and α defines the selected (1−α) confidence level. Raw NMR data (intensities and f-values) and physiological parameters, as well as the summary of results (flux ratios and net fluxes), for a case study on MFA based on METAFoR for P. pastoris grown on glucose or glucose–methanol mixed feeds can be found in refs. 7, 8.

3.8 Calculation Protocols for MFA Using Global Iterative Fitting

Although the previous approaches can give useful values for the metabolic fluxes, there is a more general and powerful approach that takes full advantage of the complete information contained in the labelling data. Among other advantages, it potentially enables bidirectional fluxes to be determined and makes the prior calculation of flux ratios unnecessary. In general terms, this approach combines steps that iteratively calculate the labelling distribution, together with the corresponding expected values of the measurements, with a step that modifies the fluxes so as to minimize the difference between the calculated and the actual measurements. Several approaches have been developed for calculating the label distribution (e.g., isotopomers, cumomers, bondomers, and elementary metabolite units), and several software tools exist for this purpose. Here, the use of the software tool 13CFlux2 [20] is proposed (see Note 10).

1. Specification of metabolic and isotopic reaction networks

As a first step, a text file must be written that describes both the metabolic network and the carbon transfer of each reaction, as well as the measurements of the input–output fluxes and of the labelling data. In the new 13CFlux2 version, this file must follow a specific file format derived from the XML (Extensible Markup Language) standard, which the authors have adapted for use in metabolic flux calculation and named FluxML.
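The iterative principle described at the start of this subheading (simulate the labelling for a given set of fluxes, compare it with the measurement, and adjust the fluxes to reduce the difference) can be illustrated with a deliberately tiny Python sketch. It has nothing to do with the internals of 13CFlux2: the single unknown split ratio `f`, the two hypothetical routes delivering 20 % and 5 % labelled carbon, and the ternary search used as the optimizer are all assumptions chosen to keep the example self-contained.

```python
def simulate_labelling(f, label_route_1=0.20, label_route_2=0.05):
    """Predicted labelled fraction of a pool fed by two routes.

    f is the fraction of the pool formed through route 1 (hypothetical).
    """
    return f * label_route_1 + (1.0 - f) * label_route_2


def fit_split_ratio(measured, lo=0.0, hi=1.0, tol=1e-8):
    """Find f in [lo, hi] minimizing the squared simulation error.

    The cost is convex in f, so a simple ternary search is sufficient here.
    """
    def cost(f):
        return (simulate_labelling(f) - measured) ** 2

    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cost(m1) < cost(m2):
            hi = m2   # minimum lies left of m2
        else:
            lo = m1   # minimum lies right of m1
    return 0.5 * (lo + hi)


# A measured labelled fraction of 0.155 corresponds to f = 0.7 in this toy model.
print(round(fit_split_ratio(0.155), 4))
```

Real networks have many free fluxes and many measurements, so the simulation step is far more involved and local minima become an issue, which is why multiple starting points are used later in the protocol.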


Box 1 Required Starting and Ending Lines in the FluxML File

.... ....

Box 2 Example of the Block







In FluxML, the structure determined by the established rules must be maintained. The file must have precise starting and ending lines, as shown in Box 1. Several sections must be defined between these starting and ending lines. When writing the file, it must be taken into account that 13CFlux2 is case sensitive. An exhaustive description of all the possibilities that can be implemented in this file is beyond the scope of this protocol, so only the major sections are briefly described here. The user is referred to the Linux “man” page included in the software installation for more extensive details. The “” block (see Box 2) describes the metabolic network in terms of metabolites and reactions. Within this section, another block “” must contain an identifier for each metabolite together with its number of carbon atoms. This block is followed by a description of all the reactions in the network. For each reaction, an identifier is provided and, optionally, an indication of bidirectionality. By default, the program assumes that all reactions are bidirectional. The reaction substrates (educts) and products must also be given using their identifier (“id”). The atom transitions are specified using a string of characters that uniquely identifies each carbon atom of each educt. For example, the educt Xul5P has five carbon atoms represented by the letters “ABCDE.” In the character string of each product, these characters are placed at the positions that the corresponding educt carbons occupy in that product. For example, the carbons represented by the letters “AB” in Xul5P are later found as


Box 3 Example of the Module

Met4-Met5=0; 0 lt= Met4





carbons 1 and 2 of the Fru6P product. As an alternative, the program also accepts a universal notation to specify the atom position (atom#num_of:atom@num_of_educt) (see Note 10). Once the reaction network has been described, a section of constraints can be included (see Box 3). Constraints are specified as combinations of metabolic fluxes, using the reaction identifiers. They can be specified both for the net part and for the exchange part of the reaction fluxes. The program considers each reversible reaction to be the combination of a net flux (the difference between the forward and backward fluxes) and an exchange flux, which can be described as the difference between the larger of the two fluxes (forward or backward) and the absolute value of the net flux, i.e., the smaller of the forward and backward fluxes. In the constraints section, different types of constraints can be included in a text line. For example, “Met4−Met5=0” indicates that the fluxes of reactions Met4 and Met5 must be equal, and “0 lt= Met4” indicates that the flux “Met4” must always be positive or zero (see Note 11). This section must include the ratio of labelled to unlabelled input metabolites. For example, a constraint such as (4·UptU−Upt0=0) indicates that the unlabelled input “Upt0” equals four times the uniformly labelled input “UptU,” which is equivalent to a mixture of 20 % labelled and 80 % unlabelled input. In a further section (see Box 4), the rest of the necessary data must be included. This section includes:

● A section for the definition of the input labelling as a series of statements
● A section of the measured values, either as labelling or flux measurements
● A section to specify flux values to be used as starting points for simulation.
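The bookkeeping described above (atom-transition strings, net and exchange fluxes, and the labelled-input ratio) can be made concrete with a short Python sketch. It only mirrors the arithmetic of the text and does not read or write FluxML. The function names are hypothetical, and in the transketolase-like example the second educt E4P with letters “abcd” and the product GA3P are assumptions added for illustration; the text itself only specifies Xul5P as “ABCDE” and the fate of carbons “AB.”

```python
def carbon_map(educts, products):
    """Map (educt, carbon position) to (product, carbon position).

    educts and products are dicts of metabolite id -> atom string, where each
    letter identifies one educt carbon and reappears once among the products.
    """
    origin = {}
    for name, cfg in educts.items():
        for pos, letter in enumerate(cfg, start=1):
            origin[letter] = (name, pos)
    mapping = {}
    for name, cfg in products.items():
        for pos, letter in enumerate(cfg, start=1):
            mapping[origin[letter]] = (name, pos)
    return mapping


def net_and_exchange(forward, backward):
    """Net flux (forward - backward) and exchange flux (the smaller one)."""
    return forward - backward, min(forward, backward)


def labelled_fraction(unlabelled_per_labelled):
    """Labelled share implied by a constraint such as 4*UptU - Upt0 = 0."""
    return 1.0 / (1.0 + unlabelled_per_labelled)


transitions = carbon_map(
    educts={"Xul5P": "ABCDE", "E4P": "abcd"},    # E4P letters are assumed
    products={"Fru6P": "ABabcd", "GA3P": "CDE"},
)
print(transitions[("Xul5P", 1)], transitions[("Xul5P", 2)])
print(net_and_exchange(8.0, 3.0))
print(labelled_fraction(4))
```

In this sketch a forward flux of 8 and a backward flux of 3 give a net flux of 5 and an exchange flux of 3, and a 4:1 ratio of unlabelled to labelled input gives a labelled share of 0.2, in line with the 20 %/80 % mixture mentioned in the text.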

Labelling is specified in a dedicated “input pool” section. For substrates with a mixture of labelled and unlabelled molecules, two input pools must be created: one for the labelled and one for the


Box 4 Example of the Section

1



.....

......

....





non labelled species. Both input fluxes converge to a common metabolite, which is taken as the input of labelled substrate to the cell. The ratio between labelled and unlabelled species can be specified as a constraint. The program accepts different types of measurements at the same time, including MS, MS/MS, 1H-NMR, and 13C-NMR data (see Note 12). The program can calculate the labelling distribution using either the elementary metabolite unit method or the cumomer method. This is specified in the option “method” as either “emu” or “cumomer.” Alternatively, if “auto” is specified, the program selects the more appropriate method for the measurements provided. Similarly, the option “type” indicates whether all the “cumomers” should be calculated or, alternatively, the “emus” in case the network can be reduced. Again, if “auto” is specified, the program selects the most appropriate option for the case.

A utility is available to verify the syntax of the FluxML file. It can be used in the following way:

● fmllint -i input_file_name.fml

If no problems are detected, the program produces no output. Otherwise, a list of errors (red) and warnings (yellow) is shown in the Linux shell.


3.9 Simulation and Flux Determination

Once the FluxML file has been created and validated, it can be used both to simulate the labelling process and for the fitting and flux determination procedure. The simulation step can be very useful when planning a labelling experiment, in order to decide which labelling strategy is most useful for the intended identification purpose. Whether simulation or identification is performed next, the free-of-charge academic version of the program first requires a digital signature to be obtained for the file. In this way, the program enforces the need for a license. An internet connection must be active for this step. Therefore, the first step must always be to digitally sign the file with the following command:

● fmlsign -i input_file_name.fml -o signed_file_name.fml

The internet connection established is usually very fast, and the digitally signed file is obtained after a few seconds.

1. Simulation

If a preliminary simulation step has to be performed, the following command can be used:

● fwdsim -i signed_file_name.fml -o sim.fwd -H simulation.hdf5

The “sim.fwd” output file contains the result of the simulation, including cumomer fractions as well as net and exchange fluxes, in an “XML” format. Further data, including the stoichiometric matrix and the cumomer equations, can optionally be obtained in a file in “hdf5” format, which may be more convenient for transferring numerical data to other environments, such as MATLAB, for further calculations.

2. Flux identification

The flux identification step requires initial values from which to start the identification process. The program may converge to a local minimum from which no further improvement is possible, and whether this happens usually depends on the starting point of the minimization. Therefore, it is customary to start the minimization procedure from multiple different starting points and to select the result with the minimum residual among all the results given by the program. The range of possible flux values (the feasible flux space of initial values) can be determined using the following command:

● sscanner -i signed_file_name.fml -o signed_file_name_new.fml

This step can also be used to select among different possible free fluxes if several possibilities are available. The user starts by specifying only one free flux (normally the labelled substrate input), and the program automatically selects the remaining free fluxes. If the labelled input is measured, it must also be included in the measurements section.
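How many free fluxes a network has can be estimated from stoichiometry alone, as the number of reactions minus the rank of the stoichiometric matrix. The Python sketch below does this for net fluxes in a hypothetical four-reaction network. It is a generic linear algebra illustration under that assumption, not a description of how sscanner chooses free fluxes, and it ignores exchange fluxes and any additional constraints.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix by exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[rank])]
        rank += 1
    return rank


def count_free_fluxes(S):
    """Degrees of freedom of the net fluxes: reactions minus rank(S)."""
    return len(S[0]) - matrix_rank(S)


# Hypothetical network: v1 makes A; v2 converts A to B; v3 drains A; v4 drains B.
S = [[1, -1, -1, 0],   # balance of A
     [0, 1, 0, -1]]    # balance of B
print(count_free_fluxes(S))
```

Here two balances constrain four fluxes, so two fluxes remain free, for example the uptake v1 and one of the two branches.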


The selection of a number of different random initial points as starting values is automated by the command:

● ssampler -i signed_file_name.fml -o model_sampl.hdf5 -n 15

where the parameter “-n” indicates the desired number of different starting points (15 in this example). The starting points selected by the program are stored in the hdf5 file for further use. To include one of these starting points in the model file, the following command must be used:

● setfluxes -l n -H model_sampl.hdf5 -i signed_file_name.fml -o signed_file_name_new.fml -f

where “n” is the number of the starting point data group that has to be included for simulation. The next step is the estimation of the flux distribution from each of the different starting points:

● fitfluxes -i signed_file_name_new.fml -o model_opt.fwd

The program shows the residual of the model being optimized on the screen. Once the program has run, the solution can be found in the file “model_opt.fwd” (see Note 13). Note that the residual is affected by the value given to the deviation of the measurements, because the residuals are divided (weighted) by the deviation. The intention is to give more weight to the more reliable (lower deviation) data. Assigning higher deviation values to the measurements lowers the residual, but it also results in a poorer fit of the measurements. Therefore, it is critical to determine the standard deviation of the measurements properly. From all the solutions found for the different starting points, the one with the minimum residual should be selected and its flux values written to a new model file:

● setfluxes -F model_opt.fwd -i signed_file_name_new.fml -o model_opt.fml -f

Once the new file with the optimized fluxes has been obtained, further statistical data can be obtained by performing a new simulation that includes the parameter “-s” for the statistics and “-H” to obtain a covariance matrix:

● fwdsim -i model_opt.fml -o sim_opt.fwd -s -H sim_opt.hdf5

The file sim_opt.fwd contains the simulated fluxes and labelling fractions together with the residual for each measurement (in the “measurement residual” section). The residual values of each measurement group should be reviewed and high residual values carefully checked. A high residual indicates that the simulation cannot reach a solution closer to the measurement, which suggests that the measurement is incorrect or has a high deviation, or that the model introduced is incorrect (see Note 14). As the whole procedure is laborious, it is convenient to write a script (e.g., in Perl) to automate it.
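As one possible starting point for such automation, the Python sketch below assembles the command lines of this subheading in order, for an arbitrary number of starting points. Python is used here instead of Perl purely for illustration. The program names and options are the ones quoted in the text; the per-start output names (model_opt_1.fwd, ...), the numbering of starting point groups from 1, and the helper names are assumptions that should be checked against the “man” pages. Choosing the result with the lowest residual is left to the user, because the layout of the .fwd files is not described here.

```python
import subprocess


def build_multistart_commands(model="input_file_name.fml", n_starts=15):
    """Return the command lines for validation, signing and multi-start fitting."""
    signed = "signed_file_name.fml"
    working = "signed_file_name_new.fml"
    samples = "model_sampl.hdf5"
    commands = [
        ["fmllint", "-i", model],
        ["fmlsign", "-i", model, "-o", signed],
        ["sscanner", "-i", signed, "-o", working],
        ["ssampler", "-i", signed, "-o", samples, "-n", str(n_starts)],
    ]
    # ASSUMPTION: starting point groups are numbered 1..n_starts.
    for k in range(1, n_starts + 1):
        commands.append(["setfluxes", "-l", str(k), "-H", samples,
                         "-i", signed, "-o", working, "-f"])
        # Hypothetical naming so that each fit keeps its own result file.
        commands.append(["fitfluxes", "-i", working, "-o", f"model_opt_{k}.fwd"])
    return commands


def build_final_commands(best_fwd):
    """Commands that store the best fit and rerun the simulation with statistics."""
    return [
        ["setfluxes", "-F", best_fwd, "-i", "signed_file_name_new.fml",
         "-o", "model_opt.fml", "-f"],
        ["fwdsim", "-i", "model_opt.fml", "-o", "sim_opt.fwd",
         "-s", "-H", "sim_opt.hdf5"],
    ]


def run(commands):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_multistart_commands(n_starts=3):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to print and inspect the planned calls before anything is run on a machine where 13CFlux2 is installed.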


The raw NMR data (relative intensities), physiological parameters, and summary of results for a case study on MFA based on global iterative fitting using 13CFlux2 for P. pastoris grown on glycerol–methanol mixed feeds can be found in [19] and [23], respectively.

4 Notes

1. Strains with biosynthetic auxotrophies should be avoided, since supplemented nutrients containing carbon atoms, e.g., amino acids, will be metabolized, thereby interfering with the labelling patterns of proteinogenic amino acids and the subsequent estimation of fluxes.

2. It is desirable to grow cells in chemically defined media, i.e., where all the carbon sources are known. Most 13C-MFA studies reported in the literature describe experiments with minimal medium and a single carbon source (glucose), as this simplifies the calculation of fluxes. Co-assimilation of multiple carbon substrates, e.g., glucose–methanol, requires the development of new flux ratio equations on a case-by-case basis that can account for the new substrate [8]. However, this is not always possible [18, 19]. In this case, iterative flux fitting is a more versatile and straightforward approach, as in principle it only requires the extension of the metabolic model. Moreover, multiple labelling experiments under the same growth conditions using the same substrates but with different isotopic labels (e.g., uniformly labelled substrate, where all carbon atoms are 13C, mixed with substrate labelled in one or some of its carbon atoms) may help to increase flux resolution in some nodes of the network. Indeed, 13CFlux2 can also be used as a tool for 13C-labelling experimental design.

3. Batch and fed-batch cultivations can also be considered for the 13C-flux analysis methodology described in this protocol, which assumes that cells are at metabolic steady state. In batch cultures, a metabolic pseudo-steady state exists during the exponential growth phase, where the specific growth rate is constant. A detailed protocol for 13C-labelling experiments in batch (shake flask) cultures is described in ref. 16. Similarly, fed-batch cultivations operating at a controlled constant growth rate have also been considered [6].

4. As mentioned above, the calculation of the flux ratios when using fractional 13C-labelling of amino acids is based on assuming both a metabolic and an isotopomeric steady state, i.e., transport and metabolic reaction rates, metabolite concentrations, and labelling patterns are constant over time. Theoretically, the isotopic steady state of the biomass components is only


reached after an infinite number of residence times. Nevertheless, to establish an affordable protocol for 13C-labelling, chemostats operating in metabolic steady state are typically fed with the medium containing the 13C-labelled substrate(s) for 1–1.5 residence times (reactor volume changes) before harvesting biomass. Therefore, the experimental labelling data must be corrected for the deviation from the isotopic steady state. In particular, the fraction of labelled biomass produced at the end of the supply with 13C-labelled medium can be calculated following first-order wash-out kinetics. Furthermore, it is desirable to operate chemostats under carbon-limiting conditions, i.e., where the residual concentration of carbon source in the bioreactor is small (or even below the detection limit). Operation under carbon-excess conditions will dilute the 13C-substrate fraction upon switching from unlabelled to labelled medium, thereby resulting in altered 13C incorporation kinetics in the generated biomass. Moreover, growing yeast in carbon-limited conditions may result in smaller amounts of metabolic by-products being secreted, which could potentially be re-imported into the cell.

5. Calculation of specific conversion rates in a continuous bioreactor can be done in several ways. For instance, multiply the input flow rate (e.g., L/h) by the relevant input concentrations (e.g., mmol/L), thereby obtaining the input molar flow rates (e.g., mmol/h). Do the same with the output. For each relevant component, subtract the input rate from the output rate and divide the result by the reactor volume. You now have the input (with negative sign) or output (positive sign) rates per unit of reactor volume (e.g., mmol/(h·L)). Finally, divide the result by the biomass concentration (e.g., gDCW/L) to obtain the specific conversion rates (mmol/(h·gDCW)).

6. The “involved compounds” are the compounds involved in cell biomass growth, that is, the major components in the input and output flows that the biomass consumes or produces during growth. Ideally, the total mass of the input components should equal that of the output components within the measurement accuracy limits. In practice, it is sufficient if they are the components appearing in the elemental mass balance equations. For instance, if only C and N balances are used, it is only necessary to take into account the compounds containing C and N. Also, micronutrients can usually be neglected.

7. The scalar coupling (also known as spin–spin coupling or J-coupling) is the effect of the spin state of one nucleus on the energy of another nucleus, i.e., the effect of one nucleus on the local magnetic field (and, thus, resonance frequency) of the other nucleus. The coupling constant J therefore measures the interactions between two nuclei. Notably, such effect can


be noted through 1–4 covalent bonds. Consequently, the scalar coupling between NMR active nuclei is the result of different spin states through the chemical bonds of a molecule, e.g., an amino acid, therefore resulting in the splitting of NMR signals, known as multiplets. These splitting patterns provide detailed insight into the connectivity of atoms in a molecule. The heteronuclear 1H-13C single quantum coherence spectroscopy (2D 1H-13C HSQC) results in spectra with signals between directly bound pairs of 1H and 13C atoms. In the fine structures of these signals the scalar couplings between directly linked 13C spins are represented.

8. To find a proper solution, one flux ratio is necessary for each remaining degree of freedom in the metabolic system after taking into account all available measurements.

9. If only an upper or lower bound for the flux ratios can be obtained (e.g., in Eq. 6), the constraint should be added as a row constraint using matrix A and vector b instead of including it in matrix N.

10. The 13CFlux2 software is a full upgrade to the original 13CFlux tool with numerous improvements. This tool operates in the Linux environment and a license (free for academic users) must be obtained from the developers.

11. When a symmetric molecule is involved in a reaction, it is convenient to define two parallel reactions with reversed labelling specifications and set a constraint specifying that the sum of the two fluxes must be zero in the net and exchange constraint sections.

12. Reaction section must include output reactions to the metabolites being measured for their labelling information.

13. The program can be stopped at any time with the “Ctrl+C” command. It can be used if the residual settles to a number to terminate the optimization.

14. The file “sim_opt.hdf5” can be displayed using the tool h5dump (h5dump sim_opt.hdf5 > sim_opt.txt).
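The calculations described in Notes 4 and 5 can be sketched in a few lines of Python. The function names and the example numbers are hypothetical. The wash-out expression assumes ideal first-order kinetics in a well-mixed chemostat at steady state, and the sign convention follows Note 5 (consumption negative, production positive).

```python
import math


def labelled_biomass_fraction(residence_times):
    """Fraction of biomass formed from labelled medium after n residence times.

    Assumes first-order wash-out of the unlabelled biomass: exp(-n) remains.
    """
    return 1.0 - math.exp(-residence_times)


def specific_rate(flow_l_per_h, conc_in, conc_out, volume_l, biomass_g_per_l):
    """Specific conversion rate in mmol/(h.gDCW).

    conc_in and conc_out are in mmol/L; negative results mean consumption.
    """
    volumetric = flow_l_per_h * (conc_out - conc_in) / volume_l  # mmol/(h.L)
    return volumetric / biomass_g_per_l


# Hypothetical chemostat: 1 L working volume, 0.1 L/h feed, 5 gDCW/L biomass,
# 100 mmol/L substrate in the feed and none detectable in the outflow.
print(round(labelled_biomass_fraction(1.0), 3))
print(round(labelled_biomass_fraction(1.5), 3))
print(specific_rate(0.1, 100.0, 0.0, 1.0, 5.0))
```

Under these assumptions roughly 63 % of the biomass is labelled after one residence time and about 78 % after 1.5, which is why the measured labelling data need the correction mentioned in Note 4.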

Acknowledgement

This work was supported by CTQ2010-15131 grant from the Spanish Ministry of Science and Innovation, and the Catalan Government (contract grant 2009-SGR-281 and Xarxa de Referència en Biotecnologia). The authors wish to thank Dr Hannu Maaheimo (VTT, Finland) for useful comments on NMR terminology.


References

1. Wiechert W (2001) 13C metabolic flux analysis. Metab Eng 3:195–206
2. Sauer U (2006) Metabolic networks in motion: 13C-based flux analysis. Mol Syst Biol 2:62
3. Jouhten P, Rintala E, Huuskonen A, Tamminen A, Toivari M, Wiebe M, Ruohonen L, Penttilä M, Maaheimo H (2008) Oxygen dependence of metabolic fluxes and energy generation of Saccharomyces cerevisiae CEN.PK113-1A. BMC Syst Biol 2:60
4. Frick O, Wittmann C (2005) Characterization of the metabolic shift between oxidative and fermentative growth in Saccharomyces cerevisiae by comparative 13C flux analysis. Microb Cell Fact 4:30
5. Kleijn RJ, Geertman JM, Nfor BK, Ras C, Schipper D, Pronk JT, Heijnen JJ, van Maris AJ, van Winden WA (2007) Metabolic flux analysis of a glycerol-overproducing Saccharomyces cerevisiae strain based on GC-MS, LC-MS and NMR-derived 13C-labelling data. FEMS Yeast Res 7:216–231
6. Heyland J, Fu J, Blank LM, Schmid A (2010) Quantitative physiology of Pichia pastoris during glucose-limited high-cell density fed-batch cultivation for recombinant protein production. Biotechnol Bioeng 107:357–368
7. Baumann K, Carnicer M, Dragosits M, Graf AB, Stadlmann J, Jouhten P, Maaheimo H, Gasser B, Albiol J, Mattanovich D et al (2010) A multi-level study of recombinant Pichia pastoris in different oxygen conditions. BMC Syst Biol 4:141
8. Jordà J, Jouhten P, Cámara E, Maaheimo H, Albiol J, Ferrer P (2012) Metabolic flux profiling of recombinant protein secreting Pichia pastoris growing on glucose:methanol mixtures. Microb Cell Fact 11:57
9. Fredlund E, Blank LM, Schnurer J, Sauer U, Passoth V (2004) Oxygen- and glucose-dependent regulation of central carbon metabolism in Pichia anomala. Appl Environ Microbiol 70:5905–5911
10. Klein T, Heinzle E, Schneider K (2013) Metabolic fluxes in Schizosaccharomyces pombe grown on glucose and mixtures of glycerol and acetate. Appl Microbiol Biotechnol 97:5013–5026
11. Blank LM, Lehmbeck F, Sauer U (2005) Metabolic-flux and network analysis in fourteen hemiascomycetous yeasts. FEMS Yeast Res 5:545–558
12. dos Santos MM, Gombert AK, Christensen B, Olsson L, Nielsen J (2003) Identification of in vivo enzyme activities in the cometabolism of glucose and acetate by Saccharomyces cerevisiae by using 13C-labeled substrates. Eukaryot Cell 2:599–608
13. Christen S, Sauer U (2011) Intracellular characterization of aerobic glucose metabolism in seven yeast species by 13C flux analysis and metabolomics. FEMS Yeast Res 11:263–272
14. Szyperski T (1995) Biosynthetically directed fractional 13C-labeling of proteinogenic amino acids. An efficient analytical tool to investigate intermediary metabolism. Eur J Biochem 232:433–448
15. Sauer U, Hatzimanikatis V, Bailey JE, Hochuli M, Szyperski T, Wuthrich K (1997) Metabolic fluxes in riboflavin-producing Bacillus subtilis. Nat Biotechnol 15:448–452
16. Zamboni N, Fendt SM, Ruhl M, Sauer U (2009) 13C-based metabolic flux analysis. Nat Protoc 4:878–892
17. Fischer E, Zamboni N, Sauer U (2004) High-throughput metabolic flux analysis based on gas chromatography-mass spectrometry derived 13C constraints. Anal Biochem 325:308–316
18. Solà A, Maaheimo H, Ylonen K, Ferrer P, Szyperski T (2004) Amino acid biosynthesis and metabolic flux profiling of Pichia pastoris. Eur J Biochem 271:2462–2470
19. Solà A, Jouhten P, Maaheimo H, Sanchez-Ferrando F, Szyperski T, Ferrer P (2007) Metabolic flux profiling of Pichia pastoris grown on glycerol/methanol mixtures in chemostat cultures at low and high dilution rates. Microbiology 153:281–290
20. Weitzel M, Nöh K, Dalman T, Niedenfuhr S, Stute B, Wiechert W (2013) 13CFLUX2—high-performance software suite for 13C-metabolic flux analysis. Bioinformatics 29:143–145
21. Wiechert W, Nöh K (2013) Isotopically non-stationary metabolic flux analysis: complex yet highly informative. Curr Opin Biotechnol 24(6):979–986
22. Jordà J, Suarez C, Carnicer M, ten Pierick A, Heijnen JJ, van Gulik W, Ferrer P, Albiol J, Wahl A (2013) Glucose-methanol co-utilization in Pichia pastoris studied by metabolomics and instationary 13C flux analysis. BMC Syst Biol 7:17
23. Jordà J, Santos de Jesus S, Peltier S, Ferrer P, Albiol J (2014) Metabolic flux analysis of recombinant Pichia pastoris growing on different glycerol/methanol mixtures by iterative fitting of NMR-derived 13C-labelling data from proteinogenic amino acids. N Biotechnol 31(1):120–132
24. Baumann K, Maurer M, Dragosits M, Cos O, Ferrer P, Mattanovich D (2008) Hypoxic fed-batch cultivation of Pichia pastoris increases specific and volumetric productivity of recombinant proteins. Biotechnol Bioeng 100:177–183
25. Carnicer M, Baumann K, Toplitz I, Sanchez-Ferrando F, Mattanovich D, Ferrer P, Albiol J (2009) Macromolecular and elemental composition analysis and extracellular metabolite balances of Pichia pastoris growing at different oxygen levels. Microb Cell Fact 8:65
26. van der Heijden RT, Romein B, Heijnen JJ, Hellinga C, Luyben KC (1994) Linear constraint relations in biochemical reaction systems: II. Diagnosis and estimation of gross errors. Biotechnol Bioeng 43:11–20
27. Stephanopoulos GN, Aristidou AA, Nielsen J (1998) Metabolic Engineering. Principles and methodologies. Academic, San Diego
28. Verheijen PJT (2010) Data reconciliation and error detection. In: Smolke CD (ed) The metabolic engineering handbook. CRC Press, Boca Raton, FL, pp 8.1–8.13
29. Szyperski T (1998) 13C-NMR, MS and metabolic flux balancing in biotechnology research. Q Rev Biophys 31:41–106
30. Nargund S, Joffe ME, Tran D, Tugarinov V, Sriram G (2013) Nuclear magnetic resonance methods for metabolic fluxomics. Methods Mol Biol 985:335–351
31. Szyperski T, Glaser RW, Hochuli M, Fiaux J, Sauer U, Bailey JE, Wuthrich K (1999) Bioreaction network topology and metabolic flux ratio analysis by biosynthetic fractional 13C labeling and two-dimensional NMR spectroscopy. Metab Eng 1:189–197
32. Maaheimo H, Fiaux J, Cakar ZP, Bailey JE, Sauer U, Szyperski T (2001) Central carbon metabolism of Saccharomyces cerevisiae explored by biosynthetic fractional 13C labeling of common amino acids. Eur J Biochem 268:2464–2479
33. van Winden WA, Schipper D, Verheijen P, Heijnen J (2001) Innovations in generation and analysis of 2D [13C,1H] COSY NMR spectra for metabolic flux analysis purposes. Metab Eng 3:322–343
34. Press W, Flannery B, Teukolsky S, Vetterling W (2002) Numerical recipes example book (C++). The art of scientific computing. Cambridge University Press, Cambridge

Chapter 14

Pathway Activity Profiling (PAPi): A Tool for Metabolic Pathway Analysis

Raphael B.M. Aggio

Abstract

Pathway Activity Profiling (PAPi) is a method developed to correlate levels of metabolites to the activity of metabolic pathways operating within biological systems. Based solely on a metabolomics data set and the Kyoto Encyclopedia of Genes and Genomes, PAPi predicts and compares the activity of metabolic pathways across experimental conditions, which considerably improves the hypothesis generation process for achieving the biological interpretation of biological studies. In this chapter, we describe how to apply PAPi to a metabolomics data set using the R-software.

Key words Metabolic pathway activity, Metabolomics and systems biology

1 Introduction

Metabolites are initial substrates, intermediates, or final products of biochemical reactions, and are the link between the different metabolic pathways that operate within a biological system. Many regulatory processes involving gene transcription, mRNAs, and enzymes determine the level of metabolites inside and outside of the cells. However, the convoluted nature of cell metabolism, where the same metabolite can participate in many different metabolic pathways, makes pathway activity analysis one of the most difficult tasks in the interpretation of “omics” data [1]. Correlating metabolite levels with metabolic pathway activity requires extensive knowledge about biochemical reactions, metabolic pathways, and the organism under study, which is not available in every research lab, and it can be considerably time consuming if performed manually. In the last decade, many Web-based databases containing important information regarding metabolite diversity, metabolic pathways, biochemical reactions, enzymes, and genes have been developed [2]. Among them, the Kyoto Encyclopedia of Genes and Genomes (KEGG) [3] is one of the most popular, and it is freely available through http://www.genome.jp/kegg/.

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_14, © Springer Science+Business Media, LLC 2014


Also, KEGG has application programming interfaces (APIs) that allow its use by external software. Consequently, several computational tools have been created to automatically access, extract, and manipulate the information contained in this database (http://www.genome.jp/kegg/soap/) [4]. R [5], an open-source software environment developed for statistical computing (www.r-project.org), is one of them, with hundreds of packages available for different purposes, in particular “KEGGREST” [6], which enables flexible access to the KEGG database. Pathway Activity Profiling (PAPi) is an R package that uses a metabolite profile and the KEGG database to predict and compare the activity of metabolic pathways between different experimental conditions [7]. For each metabolic pathway, PAPi generates an activity score (AS), which is calculated based on the absolute number of metabolites identified from each metabolic pathway and their relative abundances in each analyzed sample.

The PAPi method is grounded on two assumptions. First, more intermediate compounds from a specific metabolic pathway will be detected when this metabolic pathway is operating at a higher activity (up-regulated). Higher metabolic pathway activity is associated with a higher intracellular conversion of metabolites; thus, metabolomics techniques have a greater probability of detecting these metabolites. Second, if the same compound is detected under two different experimental conditions, higher abundances of this intermediate will be detected when the associated metabolic pathway is operating at a lower activity (down-regulated). Lower activity of a metabolic pathway should result in lower intracellular conversion rates and, consequently, accumulation of metabolites. The AS calculated by PAPi represents the likelihood that a metabolic pathway is active inside the cell and allows a quick comparison of metabolic pathway activities across experimental conditions; the higher the AS, the lower the predicted activity of a metabolic pathway.

2 Materials

2.1 Installing R-Software and PAPi

R-software and its installation guide are available at http://www.r-project.org/ (see Note 1). To install PAPi, simply start R and enter:

source("http://bioconductor.org/biocLite.R")
biocLite("PAPi")

3 Methods

3.1 Metabolomics Data

The metabolomics data set to be analyzed by PAPi consists of a spreadsheet (data frame) containing the names of the identified metabolites in the first column and their respective abundances (or relative abundances) in each sample in the subsequent columns (Table 1). Some functions of the PAPi package require the samples belonging to each experimental condition under study to be identified; for this purpose, an additional row defining the experimental condition associated with each sample can be used. In this case, the word “Replicates” is inserted in the first column, and each of the following columns receives the name of the experimental condition associated with the corresponding sample. Any name can be used for identifying samples and experimental conditions. If this additional row (“Replicates”) is not present in the input data, PAPi will show dialog boxes that allow the user to interactively select the samples belonging to each experimental condition.

3.2 Preparing the Input Data

1. Save the metabolomics data set into a file using the comma-separated value (.csv) format. For this, open the metabolomics data set using OpenOffice (www.openoffice.org), which is free, or Microsoft Excel (http://office.microsoft.com/en-gb/excel/). Make sure the metabolomics data is structured as shown in Table 1.
2. Click on “File” -> “Save As….” A new window will open.
3. Choose a name for the new file, the folder where the file will be saved, and the format .csv. The new file with the extension .csv will be generated inside the specified folder.

3.3 Preparing R-Software and PAPi

1. Open R (click on R icon) and enter
    library(PAPi)

3.4 Adding KEGG Codes to the Metabolomics Data

Follow option 1 if using the function addKeggCodes for the first time, or follow option 2 if the function addKeggCodes has been previously applied.

(1) Using addKeggCodes for the first time
(I) In R, enter
    data(keggLibrary)
    addKeggCodes(keggCodes = keggLibrary)
(II) A new window called “Select the CSV file containing the input data” will open. Select the .csv file generated in step 1 in Subheading 3.2, and click on “Choose.”
(III) A new window called “Select the folder where the output file will be saved” will open. Select the folder where the results produced by addKeggCodes must be saved, and click on “Choose.”

Table 1 Metabolomics data set

Name | Sample 1.1 | Sample 1.2 | Sample 1.3 | Sample 2.1 | Sample 2.2 | Sample 2.3
Replicates | ExpCond1 | ExpCond1 | ExpCond1 | ExpCond2 | ExpCond2 | ExpCond2
M1 | Abundance of M1 in Sample 1.1 | Abundance of M1 in Sample 1.2 | Abundance of M1 in Sample 1.3 | Abundance of M1 in Sample 2.1 | Abundance of M1 in Sample 2.2 | Abundance of M1 in Sample 2.3
M2 | Abundance of M2 in Sample 1.1 | Abundance of M2 in Sample 1.2 | Abundance of M2 in Sample 1.3 | Abundance of M2 in Sample 2.1 | Abundance of M2 in Sample 2.2 | Abundance of M2 in Sample 2.3
M3 | Abundance of M3 in Sample 1.1 | Abundance of M3 in Sample 1.2 | Abundance of M3 in Sample 1.3 | Abundance of M3 in Sample 2.1 | Abundance of M3 in Sample 2.2 | Abundance of M3 in Sample 2.3
… | … | … | … | … | … | …
Mn | … | … | … | … | … | …

A metabolomics data set to be analyzed by PAPi consists of a spreadsheet (data frame) containing the names of identified metabolites in the first column and their respective abundances in each sample in the following columns. Note that an additional row (“Replicates”) may be added to define the experimental condition associated with each sample.

236 Raphael B.M. Aggio


(IV) If compounds in the metabolomics data used as input data are not part of the keggLibrary used, a new window will open informing that a few compounds from the input data are not listed in the KEGG code library used. Click on “Yes” to search the KEGG database for these missing compounds, and proceed to (V). Or click on “No,” and proceed to (XII). If every compound in the input data is part of the keggLibrary used, no new window will open at this stage. In this case, proceed to (XII) (see Note 2).
(V) A new window will open giving directions on what is going to happen in the following steps. Click on “OK.”
(VI) For each missing compound in the input data, a new window will open. This new window contains a list of compounds from the KEGG database that are named in the same way as, or similarly to, the compound in the input data. If the desired compound (see the name of the compound in the title of the window) is part of the presented list, select this compound, click on “OK,” and proceed to (VIII). If the desired compound is not part of the presented list, go to the bottom of this list and select “OTHER” to manually insert its respective KEGG code or select “SKIP” to exclude this specific analysis from the results. Click on “OK.” If “OTHER” has been selected, proceed to (VII). If “SKIP” has been selected, proceed to (VIII).
(VII) A new window will open. Manually insert the KEGG code for this specific compound, and click on “OK.” KEGG codes can be manually found at http://www.genome.jp/dbget-bin/www_bfind?compound.
(VIII) After every missing compound is searched in the KEGG database, their respective KEGG codes are automatically added to the keggLibrary used and a new window will open to decide if this new keggLibrary should be saved into a .csv file for future use. If yes, click on “Yes” and proceed to (IX). If no, click on “No” and proceed to (XII).
(IX) A new window called “Select the folder where the library will be saved” will open. Select the folder where the new keggLibrary must be saved, and click on “Choose.”
(X) A new window will open. Define the name of the new keggLibrary, and click on “Ok.”
(XI) A new file named according to (X) will be saved in the folder defined in (IX) (see Note 3).
(XII) A new file called data_with_kegg_codes.csv will be saved in the folder defined in (III). The file data_with_kegg_codes.csv has the same format as the metabolomics data used as input data; however, the names of compounds are substituted by their respective KEGG codes.

(2) If addKeggCodes has been previously applied
(I) In R, enter
    addKeggCodes()
(II) A new window called “Select the CSV file containing the input data” will open. Select the .csv file generated in step 1 in Subheading 3.2, and click on “Choose.”
(III) A new window called “Select the CSV file containing the KEGG codes” will open. Select the keggLibrary generated when addKeggCodes was first used (Subheading 3.4, option 1, step XI), and click on “Choose.” The keggLibrary is a .csv file containing a list of compound names and their respective KEGG codes.
(IV) A new window called “Select the folder where the output file will be saved” will open. Select the folder where the file produced by addKeggCodes must be saved, and click on “Choose.”
(V) If compounds in the metabolomics data used as input data are not part of the keggLibrary used, a new window will open informing that compounds from the input data are not listed in the KEGG code library used. Click on “Yes” to search the KEGG database for these missing compounds, and proceed to (VI). Or click on “No,” and proceed to (XIII). If every compound in the input data is part of the keggLibrary used, no new window will open. In this case, proceed to (XIII) (see Note 2).
(VI) A new window will open giving directions on what is going to happen in the following steps. Click on “OK.”
(VII) For each missing compound in the input data, a new window will open. This new window contains a list of compounds from the KEGG database that are named in the same way as, or similarly to, the compound in the input data. If the desired compound (see the name of the compound in the title of the window) is part of the presented list, select this compound, click on “OK,” and proceed to (IX). If the desired compound is not part of the presented list, go to the bottom of this list and select “OTHER” to manually insert its respective KEGG code or select “SKIP” to exclude this specific analysis from the results. Click on “OK.” If “OTHER” has been selected, proceed to (VIII). If “SKIP” has been selected, proceed to (IX).
(VIII) A new window will open. Manually insert the KEGG code for this specific compound, and click on “OK.” KEGG codes can be manually found at http://www.genome.jp/dbget-bin/www_bfind?compound.
(IX) After every missing compound is searched in the KEGG database, their respective KEGG codes are automatically added to the keggLibrary used and a new window will open to decide if this new keggLibrary should be saved into a .csv file for future use. If yes, click on “Yes” and proceed to (X). If no, click on “No” and proceed to (XIII).
(X) A new window called “Select the folder where the library will be saved” will open. Select the folder where the new keggLibrary must be saved, and click on “Choose.”
(XI) A new window will open. Define the name of the new keggLibrary, and click on “Ok.”
(XII) A new file named according to (XI) will be saved in the folder defined in (X) (see Note 4).
(XIII) A new file called data_with_kegg_codes.csv will be saved in the folder defined in (IV). The file data_with_kegg_codes.csv has the same format as the metabolomics data used as input data; however, the names of compounds are substituted by their respective KEGG codes.

3.5 Ways of Applying PAPi

PAPi can be performed off-line using a local database or online using the Internet connection to collect data from the KEGG database. The PAPi package version 0.99.3 brings a local version of the KEGG database (default) generated on the 25th of March of 2013. Follow option 1 for generating a new local version of the KEGG database and applying PAPi off-line; follow option 2 for applying PAPi off-line using the default database; or follow option 3 for applying PAPi online using the Internet connection and the KEGG database (see Note 5).

(1) Install a new local version of KEGG database, and apply PAPi.
(I) In R, enter
    buildDatabase(save = TRUE)
A new database will be saved in the R installation folder R/library/PAPi/databases/ with the name “KEGGDatabaseYEAR-MONTH-DAYTIME,” e.g., KEGGDatabase2013-04-01191629 for a database generated on 1st of April of 2013 at 19 h:16 min:29 s (see Note 6).
(II) In R, enter
    papi(localDatabase = "choose")
(III) A new window called “Which database do you want to use?” will open. Select the database generated in (I), and click on “OK.”
(IV) A new window called “Select the CSV file containing the input data” will open. Select the file data_with_kegg_codes.csv generated in Subheading 3.4 (option 1, step XII or option 2, step XIII), and click on “Choose.”
(V) A new window called “Select the folder where the output file will be saved” will open. Choose the folder where PAPi results must be saved, and click on “Choose.” A file called papi_results.csv will be generated in this specific folder. Proceed to Subheading 3.6.

(2) Apply PAPi off-line using the default database.
(I) In R, enter
    papi()
(II) A new window called “Select the CSV file containing the input data” will open. Select the file data_with_kegg_codes.csv generated in Subheading 3.4 (option 1, step XII or option 2, step XIII), and click on “Choose.”
(III) A new window called “Select the folder where the output file will be saved” will open. Choose the folder where PAPi results must be saved, and click on “Choose.” A file called papi_results.csv will be generated in this specific folder. Proceed to Subheading 3.6.

(3) Apply PAPi online using the Internet connection and KEGG database.
(I) In R, enter
    papi(offline = FALSE)
(II) A new window called “Select the CSV file containing the input data” will open. Select the file data_with_kegg_codes.csv generated in Subheading 3.4 (option 1, step XII or option 2, step XIII), and click on “Choose.”
(III) A new window called “Select the folder where the output file will be saved” will open. Choose the folder where PAPi results must be saved, and click on “Choose.” A file called papi_results.csv will be generated in this specific folder. Proceed to Subheading 3.6.

3.6 Find Metabolic Pathways Showing Significantly Different Activity Scores Across Experimental Conditions

Follow option 1 for performing t-tests on PAPi results, or follow option 2 for performing ANOVAs. ANOVA is generally performed when comparing the activities of metabolic pathways between three or more experimental conditions, while the t-test is applied when comparing pathway activities between two experimental conditions.
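To make the distinction concrete, a one-way ANOVA on the activity scores of a single pathway can be sketched in base R as follows. This is an illustration only and not the code run by papiHtest: the three conditions (CondA, CondB, CondC) and all activity scores are invented, and the object names are hypothetical.

```r
# Hypothetical activity scores (AS) of one metabolic pathway measured in
# three experimental conditions with three replicates each.
as_values <- data.frame(
  AS        = c(10.2, 11.0, 9.8, 20.1, 19.5, 21.3, 10.5, 9.9, 10.8),
  Condition = factor(rep(c("CondA", "CondB", "CondC"), each = 3))
)

# One-way ANOVA: fit a linear model with the condition as the only factor
# and summarize it with the base R function anova().
fit <- lm(AS ~ Condition, data = as_values)
anova_table <- anova(fit)
print(anova_table)

# The first row of the table refers to the factor "Condition".
p_value <- anova_table[["Pr(>F)"]][1]

# Keep the pathway only if its p-value is below the chosen threshold,
# mirroring the role of the signif.level argument.
signif_level <- 0.05
is_significant <- p_value < signif_level
```

With only two conditions the same comparison reduces to a two-sample test, which is why the t-test option is used in that case.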


(1) Performing a t-test on PAPi results:
(I) In R, enter
    papiHtest(StatTest = "t-test", signif.level = 0.05)
Only metabolic pathways showing a p-value lower than the one defined through the argument signif.level will be returned. The t-test is performed as two sided, as non-paired, and with unequal variance between experimental conditions.
(II) A new window called “Select the CSV file containing the input data” will open. Choose the file papi_results.csv generated in Subheading 3.5 (option 1, step V; option 2, step III; or option 3, step III), and click on “Choose.”
(III) A new window called “Select the folder where the output file will be saved” will open. Select the folder where the results from papiHtest must be saved, and click on “Choose.” A new file called ttest.csv will be generated in this specific folder.

(2) Performing an ANOVA on PAPi results:
(I) In R, enter
    papiHtest(StatTest = "ANOVA", signif.level = 0.05)
Only metabolic pathways showing a p-value lower than the one defined through the argument signif.level will be returned. The ANOVA is performed as one-way ANOVA. See the R function anova for more details.
(II) A new window called “Select the CSV file containing the input data” will open. Choose the file papi_results.csv generated in Subheading 3.5 (option 1, step V; option 2, step III; or option 3, step III), and click on “Choose.”
(III) A new window called “Select the folder where the output file will be saved” will open. Select the folder where the results from papiHtest must be saved, and click on “Choose.” A new file called anova.csv will be generated in this specific folder.

3.7 Plotting PAPi Results

Follow option 1 for generating a line graph with PAPi results.

(1) Plotting PAPi results:
(I) In R, enter (see Note 7)
    papiLine(setRef.cond = TRUE, color = "manual")
(II) A new window called “Select the CSV file containing the input data” will open. Choose the file papi_results.csv, ttest.csv, or anova.csv generated in Subheading 3.5 (option 1, step V; option 2, step III; or option 3, step III) or Subheading 3.6 (option 1, step III or option 2, step III), and click on “Choose.”
(III) After selecting the folder where the PAPi graph will be saved, a new window called “Please select the condition to be used as reference” will open. Select the experimental condition to be used as reference, and click on “OK.”
(IV) A new window called “Choose the color to represent the condition X” will open, where X is the name of the experimental condition under analysis. Select a color, and click on “OK.” A new window will open to select the color representing each of the other experimental conditions. Once all the experimental conditions are associated with one color, a new window will open with the graphical output.
(V) For saving the resultant graph to a PDF file, click on “R->save.” A new window will open. Select the folder where the file must be saved, define a name for the new file, and click on “Save.” The new PDF file will be generated in the specified folder.

4 Example

4.1 PAPi Analysis

Table 2 shows an example of a metabolomics data set to be analyzed by PAPi. This data set represents part of the intracellular metabolite profile of the bacterium Lactococcus lactis (the example is applicable to any other metabolite profile of unicellular organisms) growing aerobically and anaerobically in a chemically defined medium. The data set is composed of six samples (Aero_1, Aero_2, Aero_3, Ana_1, Ana_2, Ana_3) from two experimental conditions (aerobic and anaerobic) and will be used here to exemplify the use of each function from the PAPi package.

1. Open the OpenOffice software (www.openoffice.org) or Microsoft Excel, and insert the content of Table 2 in a spreadsheet. Click on “File” -> “Save As….” In the new window, select the format .csv, add demo_papi.csv as file name, and click on “Save.”
2. Open R, and enter
    library(PAPi)
3. In R, enter
    data(keggLibrary)
    addKeggCodes(keggCodes = keggLibrary)
4. In the new window, select the file demo_papi.csv saved in step 1 and click on “Choose.”
5. In the new window, select the folder where the results of addKeggCodes must be saved and click on “Choose.”


Table 2 A metabolite profile of Lactococcus lactis

Name | Aero_1 | Aero_2 | Aero_3 | Ana_1 | Ana_2 | Ana_3
Replicates | Aerobic | Aerobic | Aerobic | Anaerobic | Anaerobic | Anaerobic
Lactate | 5.95 | 5.05 | 9.80 | 523.32 | 560.01 | 510.23
Valine | 39.22 | 43.50 | 38.51 | 484.11 | 509.85 | 431.45
Methionine | 2.20 | 2.65 | 2.25 | 15.02 | 12.48 | 18.19
Benzoate | 0.38 | 0.37 | 0.18 | 0.64 | 0.66 | 0.84
Citrate | 3.20 | 2.52 | 2.04 | 31.50 | 22.97 | 29.12
Cysteine | NA | NA | NA | 0.15 | 0.21 | 0.19
Glutathione | 0.19 | 0.13 | 0.18 | 0.03 | 0.03 | 0.04
Threonine | 3.77 | 4.37 | 2.98 | 9.73 | 8.61 | 9.02
Tetradecanoate | 0.14 | 0.14 | 0.13 | NA | NA | NA
Alanine | 43.02 | 48.74 | 34.09 | 183.43 | 197.21 | 109.80

Metabolite profile of Lactococcus lactis growing aerobically and anaerobically in a chemically defined medium. It is composed of metabolite names in the first column and their respective abundances or intensities in six samples (Aero_1, Aero_2, Aero_3, Ana_1, Ana_2, Ana_3) in the following columns. These six samples represent three biological replicates of L. lactis growing aerobically (Aero) and three biological replicates of L. lactis growing anaerobically (Ana).

6. A new window will open informing that a few compounds from the input data are not present in the KEGG code library used. These compounds are listed in the R command window. Click on “Yes” to search the KEGG database for these missing compounds (see Note 2).
7. A new window will open giving directions on what is going to happen in the following steps. Click on “OK.”
8. A new window called “Alanine” will open. This new window contains a list of compounds from the KEGG database that show “Alanine” as part of their name. Select the first compound of the list, “L-Alanine; L-2-Aminopropionic acid; L-alpha-Alanine,” and click on “OK.” A new window will open with a list of options for each of the remaining missing compounds. The correct compound is usually at the top of the presented list. Choose the compound that best matches the title of the window. Repeat this step until no more lists of compounds are presented.
9. After every missing compound is searched, a new window will open asking if the new KEGG library must be saved into a .csv file for future use. Click on “No.”
10. A new file called data_with_kegg_codes.csv (Table 3) will be saved in the folder defined in step 5 of this section.


Table 3 Metabolite profile of Lactococcus lactis with KEGG codes

Name | Aero_1 | Aero_2 | Aero_3 | Ana_1 | Ana_2 | Ana_3
Replicates | Aerobic | Aerobic | Aerobic | Anaerobic | Anaerobic | Anaerobic
C00041 | 43.02 | 48.74 | 34.09 | 183.43 | 197.21 | 109.8
C00180 | 0.38 | 0.37 | 0.18 | 0.64 | 0.66 | 0.84
C00097 | NA | NA | NA | 0.15 | 0.21 | 0.19
C00051 | 0.19 | 0.13 | 0.18 | 0.03 | 0.03 | 0.04
C00256 | 5.95 | 5.05 | 9.8 | 523.32 | 560.01 | 510.23
C00073 | 2.2 | 2.65 | 2.25 | 15.02 | 12.48 | 18.19
C06424 | 0.14 | 0.14 | 0.13 | NA | NA | NA
C00188 | 3.77 | 4.37 | 2.98 | 9.73 | 8.61 | 9.02
C00183 | 39.22 | 43.5 | 38.51 | 484.11 | 509.85 | 431.45

Metabolite profile of Lactococcus lactis growing aerobically and anaerobically in a chemically defined medium. It is composed of KEGG codes for metabolites in the first column and their respective abundances or intensities in six samples (Aero_1, Aero_2, Aero_3, Ana_1, Ana_2, Ana_3) in the following columns. These six samples represent three biological replicates of L. lactis growing aerobically (Aero) and three biological replicates of L. lactis growing anaerobically (Ana).
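The substitution of compound names by KEGG codes, as seen when comparing Tables 2 and 3, can be pictured as a lookup against a two-column library of names and codes. The sketch below uses base R only and is not the implementation of addKeggCodes; the object names (kegg_lookup, profile, coded) are hypothetical, and the name and code pairs are those obtained by matching the abundance values of Table 2 with those of Table 3.

```r
# Name-to-code library, in the spirit of the keggLibrary .csv file
# (a list of compound names and their respective KEGG codes).
kegg_lookup <- data.frame(
  Name = c("Alanine", "Benzoate", "Cysteine", "Glutathione", "Lactate",
           "Methionine", "Tetradecanoate", "Threonine", "Valine"),
  KEGG = c("C00041", "C00180", "C00097", "C00051", "C00256",
           "C00073", "C06424", "C00188", "C00183"),
  stringsAsFactors = FALSE
)

# A few rows and columns of Table 2, kept short for readability.
profile <- data.frame(
  Name   = c("Lactate", "Valine", "Citrate"),
  Aero_1 = c(5.95, 39.22, 3.20),
  Ana_1  = c(523.32, 484.11, 31.50),
  stringsAsFactors = FALSE
)

# Find each compound in the library; NA marks compounds that are absent.
idx <- match(profile$Name, kegg_lookup$Name)

# Compounds without an entry would have to be searched in KEGG or skipped.
missing_names <- profile$Name[is.na(idx)]

# Keep the matched rows and replace their names by the KEGG codes.
coded <- profile[!is.na(idx), ]
coded$Name <- kegg_lookup$KEGG[idx[!is.na(idx)]]
print(coded)
print(missing_names)
```

In this sketch Citrate is deliberately left out of the library, so it ends up in missing_names; this mirrors the situation in which a compound must be looked up interactively or excluded.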

4.2 Applying PAPi Off-Line Using the Default Database

1. In R, enter
    papi()
2. A new window called “Select the CSV file containing the input data” will open. Select the file data_with_kegg_codes.csv generated in step 10 of Subheading 4.1, and click on “Choose.”
3. A new window called “Select the folder where the output file will be saved” will open. Choose the folder where PAPi results must be saved, and click on “Choose.” A file called papi_results.csv (Table 4) will be generated in this specific folder.

4.3 Performing t-Test on PAPi Results: Aerobic vs. Anaerobic

1. In R, enter
    papiHtest(signif.level = 0.05, StatTest = "t-test")
Only metabolic pathways showing a p-value lower than the one defined through the argument signif.level will be returned. The t-test is performed as two sided, as non-paired, and with unequal variance between experimental conditions.
2. A new window called “Select the CSV file containing the input data” will open. Choose the file papi_results.csv generated in step 3 of Subheading 4.2, and click on “Choose.”
3. A new window called “Select the folder where the output file will be saved” will open. Select the folder where the results from papiHtest must be saved, and click on “Choose.” A new file called ttest.csv (Table 5) will be generated in the folder specified.
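The test described in step 1 (two sided, non-paired, unequal variance) corresponds to Welch's t-test, which is available in base R as t.test. The following sketch applies it to three pathways whose activity scores are copied from Table 4. It is an illustration of the statistical test under these assumptions, not the source code of papiHtest, and the object names are hypothetical.

```r
# Activity scores of three pathways, copied from Table 4
# (three aerobic and three anaerobic samples).
as_scores <- data.frame(
  Pathway = c("Pyruvate metabolism",
              "Alanine, aspartate, and glutamate metabolism",
              "Glycine, serine, and threonine metabolism"),
  Aero_1 = c(190.40, 1032.48, 192.27),
  Aero_2 = c(161.60, 1169.76, 222.87),
  Aero_3 = c(313.60, 818.16, 151.98),
  Ana_1  = c(16746.24, 4402.32, 251.94),
  Ana_2  = c(17920.32, 4733.04, 224.91),
  Ana_3  = c(16327.36, 2635.20, 234.86),
  stringsAsFactors = FALSE
)
aerobic   <- c("Aero_1", "Aero_2", "Aero_3")
anaerobic <- c("Ana_1", "Ana_2", "Ana_3")

# Two-sided, non-paired t-test with unequal variances, one pathway at a time.
as_scores$p_value <- vapply(seq_len(nrow(as_scores)), function(i) {
  t.test(unlist(as_scores[i, aerobic]),
         unlist(as_scores[i, anaerobic]),
         alternative = "two.sided", paired = FALSE, var.equal = FALSE)$p.value
}, numeric(1))

# Return only the pathways with a p-value below the chosen threshold.
signif_level <- 0.05
significant <- as_scores[as_scores$p_value < signif_level, ]
print(significant[, c("Pathway", "p_value")])
```

Of the three pathways used here, the first two are listed in Table 5 (with p-values of 0.0007 and 0.0434), whereas glycine, serine, and threonine metabolism is not, so the filter would be expected to drop it.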


Table 4 Results from PAPi function

Pathway name | Aero_1 | Aero_2 | Aero_3 | Ana_1 | Ana_2 | Ana_3
Replicates | Aerobic | Aerobic | Aerobic | Anaerobic | Anaerobic | Anaerobic
ABC transporters | 2133.45 | 2394.32 | 1875.06 | 16763.18 | 17713.58 | 13620.17
Cyanoamino acid metabolism | 1804.12 | 2001.00 | 1771.46 | 22269.06 | 23453.10 | 19846.70
Valine, leucine, and isoleucine degradation | 1608.02 | 1783.50 | 1578.91 | 19848.51 | 20903.85 | 17689.45
Glucosinolate biosynthesis | 1553.25 | 1730.63 | 1528.50 | 18717.38 | 19587.38 | 16861.50
Propanoate metabolism | 1411.92 | 1566.00 | 1386.36 | 17427.96 | 18354.60 | 15532.20
Aminoacyl-tRNA biosynthesis | 1168.78 | 1315.20 | 1031.25 | 7339.86 | 7720.62 | 6027.69
Selenocompound metabolism | 1161.54 | 1315.98 | 920.43 | 4952.61 | 5324.67 | 2964.60
Pantothenate and CoA biosynthesis | 1098.16 | 1218.00 | 1078.28 | 6779.64 | 7140.84 | 6042.96
Protein digestion and absorption | 1036.47 | 1166.31 | 914.50 | 6508.94 | 6846.58 | 5345.31
Alanine, aspartate, and glutamate metabolism | 1032.48 | 1169.76 | 818.16 | 4402.32 | 4733.04 | 2635.20
Carbon fixation in photosynthetic organisms | 989.46 | 1121.02 | 784.07 | 4218.89 | 4535.83 | 2525.40
Taurine and hypotaurine metabolism | 946.44 | 1072.28 | 749.98 | 2019.38 | 2171.62 | 1209.89
Cysteine and methionine metabolism | 862.79 | 978.88 | 693.88 | 2830.48 | 2991.50 | 1827.14
Penicillin and cephalosporin biosynthesis | 705.96 | 783.00 | 693.18 | 4358.34 | 4590.54 | 3884.76
Mineral absorption | 639.52 | 719.64 | 564.27 | 5019.10 | 5279.09 | 4121.34
Valine, leucine, and isoleucine biosynthesis | 494.39 | 550.51 | 477.14 | 5679.16 | 5962.29 | 5065.41
Porphyrin and chlorophyll metabolism | 467.48 | 541.88 | 369.52 | 1206.52 | 1067.64 | 1118.48
Sulfur relay system | 430.20 | 487.40 | 340.90 | 917.90 | 987.10 | 549.95
D-alanine metabolism | 258.12 | 292.44 | 204.54 | 1100.58 | 1183.26 | 658.80
Glycine, serine, and threonine metabolism | 192.27 | 222.87 | 151.98 | 251.94 | 224.91 | 234.86
Pyruvate metabolism | 190.40 | 161.60 | 313.60 | 16746.24 | 17920.32 | 16327.36
Metabolic pathways | 88.78 | 99.76 | 78.19 | 693.11 | 729.05 | 569.53
Biosynthesis of plant secondary metabolites | 88.21 | 99.26 | 77.83 | 692.44 | 728.36 | 568.65
Microbial metabolism in diverse environments | 47.17 | 53.48 | 37.25 | 193.95 | 206.69 | 119.85
Biosynthesis of secondary metabolites | 45.57 | 50.89 | 43.92 | 509.50 | 531.60 | 459.50
2-Oxocarboxylic acid metabolism | 41.42 | 46.15 | 40.76 | 499.13 | 522.33 | 449.64
Biosynthesis of alkaloids derived from shikimate pathway | 39.60 | 43.87 | 38.69 | 484.75 | 510.51 | 432.29
Bile secretion | 33.25 | 22.75 | 31.50 | 5.25 | 5.25 | 7.00
Aminobenzoate degradation | 31.92 | 31.08 | 15.12 | 53.76 | 55.44 | 70.56
Phenylalanine metabolism | 27.36 | 26.64 | 12.96 | 46.08 | 47.52 | 60.48
Benzoate degradation | 25.08 | 24.42 | 11.88 | 42.24 | 43.56 | 55.44
Dioxin degradation | 22.04 | 21.46 | 10.44 | 37.12 | 38.28 | 48.72
Toluene degradation | 17.10 | 16.65 | 8.10 | 28.80 | 29.70 | 37.80
Glutathione metabolism | 7.22 | 4.94 | 6.84 | 3.42 | 4.56 | 4.37
Fatty acid biosynthesis | 6.86 | 6.86 | 6.37 | NA | NA | NA
Biosynthesis of plant hormones | 2.20 | 2.65 | 2.25 | 15.02 | 12.48 | 18.19
Degradation of aromatic compounds | 0.38 | 0.37 | 0.18 | 0.64 | 0.66 | 0.84
Sulfur metabolism | NA | NA | NA | 2.70 | 3.78 | 3.42
Thiamine metabolism | NA | NA | NA | 3.90 | 5.46 | 4.94

The result from the PAPi function was saved to the file papi_results.csv. The first column contains the names of the metabolic pathways found, and the following columns contain the activity score of each pathway in the analyzed samples.

4.4 Plotting PAPi Results

1. In R, enter (see Note 7)
    papiLine(relative = TRUE, setRef.cond = TRUE, color = "manual", save = FALSE)
2. A new window called “Select the CSV file containing the input data” will open. Choose the file ttest.csv generated in step 3 in Subheading 4.3, and click on “Choose.”
3. A new window called “Please select the condition to be used as reference” will open. Select the experimental condition (e.g., aerobic) to be used as reference, and click on “OK.” Any experimental condition may be selected as reference. The best experimental condition to be selected as reference depends on the hypothesis being tested.

Table 5 Result of papiHtest function

Pathway name | Aero_1 | Aero_2 | Aero_3 | Ana_1 | Ana_2 | Ana_3 | p-values
Replicates | Aerobic | Aerobic | Aerobic | Anaerobic | Anaerobic | Anaerobic | NA
Protein digestion and absorption | 1036.47 | 1166.31 | 914.50 | 6508.94 | 6846.58 | 5345.31 | 0.0065
Alanine, aspartate, and glutamate metabolism | 1032.48 | 1169.76 | 818.16 | 4402.32 | 4733.04 | 2635.20 | 0.0434
Carbon fixation in photosynthetic organisms | 989.46 | 1121.02 | 784.07 | 4218.89 | 4535.83 | 2525.40 | 0.0434
Cysteine and methionine metabolism | 862.79 | 978.88 | 693.88 | 2830.48 | 2991.50 | 1827.14 | 0.0372
Penicillin and cephalosporin biosynthesis | 705.96 | 783.00 | 693.18 | 4358.34 | 4590.54 | 3884.76 | 0.0030
Mineral absorption | 639.52 | 719.64 | 564.27 | 5019.10 | 5279.09 | 4121.34 | 0.0063
Valine, leucine, and isoleucine biosynthesis | 494.39 | 550.51 | 477.14 | 5679.16 | 5962.29 | 5065.41 | 0.0026
Porphyrin and chlorophyll metabolism | 467.48 | 541.88 | 369.52 | 1206.52 | 1067.64 | 1118.48 | 0.0006
ABC transporters | 2133.45 | 2394.32 | 1875.06 | 16763.18 | 17713.58 | 13620.17 | 0.0072
D-alanine metabolism | 258.12 | 292.44 | 204.54 | 1100.58 | 1183.26 | 658.80 | 0.0434
Pyruvate metabolism | 190.40 | 161.60 | 313.60 | 16746.24 | 17920.32 | 16327.36 | 0.0007
Metabolic pathways | 88.78 | 99.76 | 78.19 | 693.11 | 729.05 | 569.53 | 0.0063
Biosynthesis of plant secondary metabolites | 88.21 | 99.26 | 77.83 | 692.44 | 728.36 | 568.65 | 0.0063
Microbial metabolism in diverse environments | 47.17 | 53.48 | 37.25 | 193.95 | 206.69 | 119.85 | 0.0388
Biosynthesis of secondary metabolites | 45.57 | 50.89 | 43.92 | 509.50 | 531.60 | 459.50 | 0.0020
2-Oxocarboxylic acid metabolism | 41.42 | 46.15 | 40.76 | 499.13 | 522.33 | 449.64 | 0.0022
Biosynthesis of alkaloids derived from shikimate pathway | 39.60 | 43.87 | 38.69 | 484.75 | 510.51 | 432.29 | 0.0027
Bile secretion | 33.25 | 22.75 | 31.50 | 5.25 | 5.25 | 7.00 | 0.0164
Cyanoamino acid metabolism | 1804.12 | 2001.00 | 1771.46 | 22269.06 | 23453.10 | 19846.70 | 0.0027
Aminobenzoate degradation | 31.92 | 31.08 | 15.12 | 53.76 | 55.44 | 70.56 | 0.0114
Phenylalanine metabolism | 27.36 | 26.64 | 12.96 | 46.08 | 47.52 | 60.48 | 0.0114
Benzoate degradation | 25.08 | 24.42 | 11.88 | 42.24 | 43.56 | 55.44 | 0.0114
Dioxin degradation | 22.04 | 21.46 | 10.44 | 37.12 | 38.28 | 48.72 | 0.0114
Toluene degradation | 17.10 | 16.65 | 8.10 | 28.80 | 29.70 | 37.80 | 0.0114
Fatty acid biosynthesis | 6.86 | 6.86 | 6.37 | NA | NA | NA | 0.0000
Biosynthesis of plant hormones | 2.20 | 2.65 | 2.25 | 15.02 | 12.48 | 18.19 | 0.0155
Degradation of aromatic compounds | 0.38 | 0.37 | 0.18 | 0.64 | 0.66 | 0.84 | 0.0114
Sulfur metabolism | NA | NA | NA | 2.70 | 3.78 | 3.42 | 0.0000
Valine, leucine, and isoleucine degradation | 1608.02 | 1783.50 | 1578.91 | 19848.51 | 20903.85 | 17689.45 | 0.0027
Thiamine metabolism | NA | NA | NA | 3.90 | 5.46 | 4.94 | 0.0000
Glucosinolate biosynthesis | 1553.25 | 1730.63 | 1528.50 | 18717.38 | 19587.38 | 16861.50 | 0.0022
Propanoate metabolism | 1411.92 | 1566.00 | 1386.36 | 17427.96 | 18354.60 | 15532.20 | 0.0027
Aminoacyl-tRNA biosynthesis | 1168.78 | 1315.20 | 1031.25 | 7339.86 | 7720.62 | 6027.69 | 0.0065
Selenocompound metabolism | 1161.54 | 1315.98 | 920.43 | 4952.61 | 5324.67 | 2964.60 | 0.0434
Pantothenate and CoA biosynthesis | 1098.16 | 1218.00 | 1078.28 | 6779.64 | 7140.84 | 6042.96 | 0.0030

The result of the papiHtest function was saved to the file ttest.csv. The first column contains the names of the metabolic pathways, the following six columns contain the activity score of each pathway in the analyzed samples, and the last column contains the p-values returned by the t-test (two sided, non-paired, and with unequal variance between experimental conditions). PAPi searches every metabolic pathway in the KEGG database. Therefore, it may report metabolic pathways not necessarily related to the organism under study, such as the bile secretion pathway reported for Lactococcus lactis (see Note 8).

[Figure 1: line graph of the Activity Score (AS) of each metabolic pathway; legend (Conditions): Aerobic, Anaerobic, Not present in Aerobic, Not present in Anaerobic]

Fig. 1 PAPi graph. The activity scores (ASs) calculated for each metabolic pathway were plotted using the experimental condition aerobic as reference. The ASs of the anaerobic condition were inverted before plotting, as papiLine’s argument relative was set to TRUE. Any experimental condition may be selected as reference.

4. A new window called “Choose the color to represent the condition Aerobic” will open. Select a color (e.g., red), and click on “OK.”

5. A new window called “Choose the color to represent the condition Anaerobic” will open. Select a color (e.g., blue), and click on “OK.” A new window will open with the graphical output.

6. For saving the resultant graph (Fig. 1) to a PDF file, click on “R->save.” A new window will open. Select the folder where the file must be saved, define a name for the new file, and click on “Save.” The new PDF file will be generated in the specified folder.

5 Notes

1. R and PAPi require administrative permissions to be installed. The user log-in MUST be one of the administrators of the computer in use.
2. Internet connection is required for searching the KEGG database.
3. Any existing .csv file showing the same name defined in (X) will be replaced.
4. Any existing .csv file showing the same name defined in (XI) will be replaced.
5. Applying PAPi off-line is considerably faster than online, and the results produced off-line are always reproducible when the same database is used. On the other hand, the KEGG database is constantly being updated, and applying PAPi online may generate different results when performed on different days.
6. Internet connection is required, and the computer user must have administrative permission. If using a Windows operating system, R must be opened as administrator (click on R icon with the right button of the mouse, and click on “Run as administrator”). Depending on the Internet connection speed, this process might take more than 5 h.
7. In R, enter ?papiLine for more information about each of its arguments and how they can be used to customize the graphical output.
8. PAPi performs non-species-specific enquiries to the KEGG database. Therefore, it may report metabolic pathways not necessarily related to the organism under study. This strategy allows the identification of metabolites and metabolic pathways potentially connecting the metabolism of different organisms, such as metabolic pathways linking the metabolism of bacteria and humans.

References

1. Cakir T et al (2006) Integration of metabolome data with metabolic networks reveals reporter reactions. Mol Syst Biol 2:50
2. Kopka J et al (2005) GMD@CSB.DB: the Golm Metabolome Database. Bioinformatics 21:1635–1638
3. Ogata H et al (1999) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 27:29–34
4. Arita M (2004) Computational resources for metabolomics. Brief Funct Genomic Proteomic 3:84–93
5. R_Development_Core_Team (2012) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
6. Tenenbaum D (2013) KEGGREST: Client-side REST access to KEGG.
7. Aggio RBM, Ruggiero K, Villas-Boas SG (2010) Pathway Activity Profiling (PAPi): from the metabolite profile to the metabolic pathway activity. Bioinformatics 26:2969–2976

Chapter 15

QTL Mapping by Pooled-Segregant Whole-Genome Sequencing in Yeast

Thiago M. Pais, María R. Foulquié-Moreno, and Johan M. Thevelein

Abstract

Quantitative trait locus (QTL) mapping by pooled-segregant whole-genome sequencing in yeast is a robust methodology for the simultaneous identification of superior genes involved in polygenic traits (e.g., high ethanol tolerance). Two haploid strains with opposite phenotypes, one of which displays the phenotype of interest, are crossed; the resulting diploid is sporulated, the meiotic segregants are phenotyped, and a pool of selected segregants with the phenotype of interest is assembled. Genotyping by pooled-segregant sequencing constitutes a fast and reliable methodology to map all QTLs defining the trait of interest. The QTLs can be further analyzed by reciprocal hemizygosity analysis to identify the causative superior alleles that can subsequently be used for yeast strain improvement by targeted genetic engineering.

Key words QTL mapping, Yeast, Genome sequencing, Polygenic traits, Meiotic segregants, Reciprocal hemizygosity analysis, Causative alleles

1

Introduction

Polygenic or complex traits involve the participation of two or usually more genes. All the causative genes have to act together in order to manifest the phenotype. The regions of the genome where the causative genes are located can be determined by quantitative trait locus (QTL) mapping. The QTL(s) can be identified by linkage analysis through the evaluation of phenotypes and genetic markers, preferably single-nucleotide polymorphisms (SNPs), in meiotic segregants [1, 2]. The scoring of the SNPs can be performed by DNA hybridization to microarrays of PCR-amplified DNA [3] or genomic DNA [4]. The genotyping can also be done by polarized fluorescence spectroscopy [5], mass spectrometry [6], using DNA barcodes [7], or DNA sequencing to identify SNPs [8, 9]. Swinnen et al. [10] demonstrated that pooled-segregant whole-genome sequencing can be successfully applied in QTL mapping with a relatively small selected pool of meiotic segregants.

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_15, © Springer Science+Business Media, LLC 2014


The evaluation of candidate genes within the QTLs can be performed by reciprocal hemizygosity analysis [1, 10]. With this method, a candidate gene is evaluated in a hemizygous hybrid background: two hybrid diploid strains are compared, each of which retains one of the two parental alleles of the candidate gene while the other allele is deleted. Positive alleles identified in this way can then be exchanged between inferior and superior strains, disrupted, or overexpressed, and the biological effect of this alteration can be assessed to confirm their involvement in determining the trait of interest. An overview of QTL mapping by pooled-segregant sequencing is shown in Fig. 1.

2

Materials

2.1 DNA Staining with Propidium Iodide (PI)

1. 70 % ethanol (store at −20 °C).
2. DNA staining buffer: 50 mM Tris–HCl, pH 7.7, 15 mM MgCl2.
3. RNAse (DNAse free): 10 mg/L in buffer; boil for 5 min, and let it cool down at room temperature. Store at −20 °C.
4. Propidium iodide (PI, light sensitive (see Note 1)): Prepare 1.84 mM stock solution (40× stock solution) in DNA staining buffer.
5. Sterile ice-cold Milli-Q water.

2.2 Sporulation and Tetrad Dissection

1. Solid sporulation medium: 0.5 % (w/v) potassium acetate and 1.5 % (w/v) bacto agar, pH 6.0 (adjust pH with 0.5 M NaOH or 0.5 M HCl).
2. Lyticase: 500 U/mL.
3. Micromanipulator.
4. YPD medium: 1 % (w/v) yeast extract, 2 % (w/v) bactopeptone, and 2 % (w/v) glucose; for solid media add 1.5 % (w/v) agar.
5. Sterile ice-cold Milli-Q water.

2.3 Mating Type Determination

1. Sodium hydroxide (0.02 M NaOH).
2. Primers for the mating type locus:
   MAT locus: 5′ AGTCACATCAAGATCGTTTATGG 3′.
   HMRα: 5′ GCACGGAATATGGGACTACTTCG 3′.
   HMRa: 5′ ACTCCACTTCAAGTAAGAGTTTG 3′.
3. PCR reaction mix: 2 μL buffer for Standard Taq polymerase, 1 μL dNTPs (2.5 mM each), 1 μL each primer (10 μM), 0.5 μL Standard Taq polymerase, 1 μL DNA solution, Milli-Q water up to 20 μL.
4. 1 % (w/v) agarose gel.


Fig. 1 QTL mapping by pooled-segregant whole-genome sequencing in yeast. Causative genes involved in polygenic traits can be identified simultaneously through analysis of yeast meiotic segregants. Pooled-segregant sequencing allows the rapid mapping of QTLs, whereas RHA identifies the gene(s) involved in the trait within the QTLs. Legend: Red and white sections represent the DNA from the superior and inferior parent strains, respectively. Black lines represent the polymorphisms (SNPs and indels) that are used for the genetic mapping


2.4 Genomic DNA Isolation

1. Extraction buffer: 1 M sorbitol, 100 mM trisodium citrate, 60 mM EDTA, pH 7.0 (adjust pH with 0.5 M NaOH or 0.5 M HCl).
2. Lyticase solution: 300 U/mL Lyticase and 8 μL/mL β-mercaptoethanol in extraction buffer (stored at −20 °C).
3. Lysis buffer: 2 % (w/v) SDS, 50 mM Tris–HCl, pH 8.0, 10 mM EDTA, pH 8.0.
4. 5 M NaCl.
5. TE buffer: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA, pH 8.0.
6. PCI: Phenol:chloroform:isoamyl alcohol (25:24:1, v/v).
7. 100 % ethanol and 70 % (v/v) ethanol (stored at −20 °C).

Extraction, lysis, and TE buffers and 5 M NaCl solution are sterilized.

2.5 Mating and Diploidization Detection

1. YPD medium: 1 % (w/v) yeast extract, 2 % (w/v) bactopeptone, and 2 % (w/v) glucose; for solid media add 1.5 % (w/v) agar.
2. Microscope (40× enlargement).

2.6 Inbreeding of Meiotic Segregants

1. Sporulation medium (see item 1, Subheading 2.2).
2. Sterile Milli-Q water.
3. Glass beads.
4. Zymolyase (1 mg/mL, 100T).
5. β-mercaptoethanol.
6. 1.5 % (v/v) Nonidet™ P-40 (NP-40 is a nonionic, nondenaturing detergent).
7. YPD plates with chloramphenicol (100 μg/mL).

3

Methods

3.1 Yeast Strain Selection and Characterization

1. A superior yeast strain, displaying the phenotype of interest (e.g., high ethanol tolerance), has to be selected under experimental conditions that clearly distinguish its performance from an inferior yeast strain (e.g., low ethanol tolerance). For many phenotypes, a laboratory strain (e.g., S288c [10]) can be utilized as inferior strain (see Note 2).
2. The selected superior strain needs to be characterized for its ploidy and sporulation capacity (see Note 3). The DNA content can be determined by PI staining followed by flow cytometry analysis [11].

3.2 Ploidy Determination by DNA Staining with Propidium Iodide (PI)

Propidium iodide is a membrane-impermeant dye that stains by intercalating into nucleic acid molecules. PI stains all double-stranded regions of both DNA and RNA by intercalating between the bases of the double helix. PI staining can be used for flow cytometry to quantitatively measure DNA content.
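Step 2 of this procedure asks for the culture volume that provides about 2 × 10^7 cells. A minimal sketch of that calculation follows. It assumes the conversion implied by the protocol's own example (500 μL of an OD600 = 1.0 culture, i.e., roughly 4 × 10^7 cells/mL per OD600 unit) and a linear relationship between OD600 and cell number. The function name and the constant are illustrative only, and the conversion factor should be calibrated for each strain and spectrophotometer.

```python
# Assumed conversion, back-calculated from the protocol's example:
# 500 uL of an OD600 = 1.0 culture holds about 2e7 cells.
CELLS_PER_ML_PER_OD = 4e7


def volume_for_cells(od600, target_cells=2e7,
                     cells_per_ml_per_od=CELLS_PER_ML_PER_OD):
    """Return the culture volume (in uL) expected to contain target_cells.

    Assumes cell number scales linearly with OD600, which only holds
    within the linear range of the spectrophotometer.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    cells_per_ml = od600 * cells_per_ml_per_od
    return target_cells / cells_per_ml * 1000.0  # convert mL to uL


if __name__ == "__main__":
    for od in (0.5, 1.0, 2.0):
        print(f"OD600 = {od}: take {volume_for_cells(od):.0f} uL of culture")
```

The computed volume is then brought to 1.0 mL with sterile ice-cold water, as described in step 3.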


1. Grow a culture of the selected yeast strain to mid-exponential phase.
2. Measure the optical density at 600 nm (OD600) of the culture, and calculate the volume needed to obtain 2 × 10^7 cells/mL in the final sample (around 500 μL of an OD600 = 1.0 cell culture).
3. Transfer the determined volume of cell culture to two microcentrifuge tubes, and adjust the volume of each sample to 1.0 mL with sterile ice-cold water. Reserve one sample as a background control that will not be treated with PI and RNAse.
4. Centrifuge the tubes for 5 min at 2,000 × g.
5. Discard the supernatant, resuspend the pellet in 1 mL of 70 % (v/v) ethanol (−20 °C), and vortex briefly. Incubate for at least 1 day at 4 °C.
6. Centrifuge (5 min, 2,000 × g, 4 °C).
7. Discard the supernatant and resuspend the pellet in 100 μL of 1 mg/mL RNAse in order to remove the RNA. Resuspend the control sample in 100 μL of sterile water.
8. Incubate the samples for 90 min in a water bath at 37 °C.
9. Centrifuge, remove supernatant, and add 100 μL of 1× PI solution.
10. Incubate the samples for at least 2 days (maximum 1 week) at 4 °C. Cover the samples with aluminum foil, since PI is light sensitive.
11. Analyze the samples by flow cytometry, diluting the samples in water (50 μL of sample in 450 μL of water). PI is excited with 488 nm wavelength light, and it fluoresces red. Include known haploid and diploid strains as controls for comparison of the DNA content measurements (Fig. 2).

3.3 Sporulation and Tetrad Dissection

1. The sporulation and tetrad dissection can be performed as described by Sherman and Hicks [12].
2. Grow the cell culture (yeast strain with the desired superior phenotype) in 3 mL of YPD medium for around 7 h (200 rpm, 30 °C).
3. Harvest the cells by centrifugation (3 min, 500 × g, 4 °C), discard the supernatant, and wash the cell pellet with 1 mL of ice-cold sterile water.
4. Harvest the cells by centrifugation, and discard the supernatant. Add 20–100 μL of ice-cold sterile water to the pellet.
5. Spot the cell suspension on solid sporulation medium.
6. Incubate the plates at 23 °C for at least 5 days (see Note 4).
7. The sporulation efficiency can be checked by microscopy (40×). After 5 days, tetrads are visible with well-sporulating strains.


Fig. 2 Ploidy determination by flow cytometry of S288c strain: examples of haploid strain (a) and diploid strain (b)

8. For the tetrad dissection, resuspend a small amount of cells in 50 μL of lyticase solution and incubate for 10 min at room temperature.
9. 10 μL of the lyticase-treated cell sample are spotted onto a YPD plate and used for tetrad dissection, which is performed with a micromanipulator (see Note 5).

3.4 Mating Type Determination by PCR

1. Resuspend a small amount of cells in 20 μL of 0.02 M NaOH and incubate for 1 h at room temperature.
2. 1 μL of the treated cell suspension is used to determine the mating type by PCR. The three primers for the MAT locus (see Subheading 2.3) are used together in the PCR reaction mix [13].
3. The PCR conditions are as follows:
   Step 1: 94 °C (4 min) denaturation
   Step 2: 94 °C (30 s) denaturation
   Step 3: 58 °C (30 s) annealing
   Step 4: 72 °C (45 s) elongation
   Step 5: 72 °C (10 min) final elongation
   Step 6: 10 °C (∞) hold
   Steps 2 to 4 are repeated 25–30 times before proceeding to step 5.
4. The size of the PCR products is checked via electrophoresis on agarose gel. The expected amplicon sizes are 544 bp for MATa (primers MAT locus and HMRa, see item 2, Subheading 2.3) and 404 bp for MATα (primers MAT locus and HMRα, see item 2, Subheading 2.3). Heterozygous diploid strains give both PCR products. Strains with only MATa or MATα products are not necessarily haploids; they can also be diploids (a/a or α/α). Therefore, determination of the DNA content (see Subheading 3.2) is important before starting the strain crossing (see Note 6).

3.5 Crossing of Haploid Yeast Strains

1. A small amount of cells from each haploid strain of opposite mating type (MATa or MATα) are streaked over each other and mixed on a YPD plate.
2. The plate is incubated at 30 °C for 6–10 h, and the presence of shmoos (cellular bulges produced by a haploid yeast cell in response to pheromone from the opposite mating type) is checked by microscopy (40×).
3. A small amount of cells is transferred with a sterile loop from the plate to 3 mL of YPD and incubated for 1 day (200 rpm, 30 °C).
4. A cell suspension with OD600 = 1.0 is prepared and diluted 1:10,000 in sterile deionized water.
5. 500 μL of the diluted cell suspension is plated on a YPD plate for isolation of single colonies (incubation at 30 °C for 1 day).
6. The single colonies can be analyzed for their mating type (see Subheading 3.4) and DNA content (see Subheading 3.2). A diploid strain is selected for the genetic mapping studies.
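The colony screen in step 6 combines two read-outs: the mating-type PCR bands (544 bp for MATa, 404 bp for MATα) and the DNA content from PI staining. The sketch below shows one way to record that decision. The function names, the ±30 bp tolerance for reading band sizes off a gel, and the example colonies are assumptions made for illustration.

```python
MAT_A_BP = 544      # expected MATa amplicon (Subheading 3.4)
MAT_ALPHA_BP = 404  # expected MATalpha amplicon (Subheading 3.4)


def call_mating_type(band_sizes_bp, tolerance_bp=30):
    """Translate observed gel band sizes (bp) into a mating-type call.

    tolerance_bp is an assumed allowance for estimating sizes from a gel.
    """
    has_a = any(abs(b - MAT_A_BP) <= tolerance_bp for b in band_sizes_bp)
    has_alpha = any(abs(b - MAT_ALPHA_BP) <= tolerance_bp for b in band_sizes_bp)
    if has_a and has_alpha:
        return "a/alpha"
    if has_a:
        return "a"
    if has_alpha:
        return "alpha"
    return "no call"


def is_hybrid_diploid(band_sizes_bp, ploidy_n):
    """True when both MAT products are present and flow cytometry indicates 2n."""
    return call_mating_type(band_sizes_bp) == "a/alpha" and ploidy_n == 2


if __name__ == "__main__":
    # Hypothetical colonies: (observed bands in bp, ploidy from PI staining)
    colonies = {"col1": ([540, 410], 2), "col2": ([545], 1), "col3": ([400], 2)}
    for name, (bands, ploidy) in colonies.items():
        print(name, call_mating_type(bands), is_hybrid_diploid(bands, ploidy))
```

A single band is deliberately not treated as proof of haploidy, because a/a and α/α diploids give the same PCR result; the ploidy measurement settles that case.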

3.6 Isolation and Inbreeding of Meiotic Segregants for Genetic Mapping

1. The diploid strain deriving from the crossing of the superior with the inferior strain (superior × inferior) is sporulated, and meiotic segregants are obtained as described before (see Subheading 3.3).
2. Check for sporulation under the microscope. If tetrads are observed, harvest all cells.
3. Resuspend the sporulating cells in a 300 mL sterile Erlenmeyer flask containing 25 mL of Milli-Q water and glass beads (approximately two full tubes of 0.2 mL). Add 500 μL of zymolyase and 10 μL of β-mercaptoethanol. Incubate overnight in a shaking incubator at 30 °C. This treatment will result in lysis of unsporulated diploid cells.
4. Transfer the cells and the glass beads from the Erlenmeyer flask to a 50 mL screw cap sterile tube and shake/vortex for 5 min.
5. Transfer the supernatant (cells without glass beads) into a fresh 50 mL screw cap tube.
6. Spin down the cells by centrifugation for 5 min at 10,000 × g, 4 °C. Check for a pellet. If no pellet is visible, use a high-speed centrifuge.
7. Resuspend the cells in 10 mL of NP-40, and transfer the solution to a 15 mL screw cap tube. Incubate the suspension on ice for 15 min.
8. Sonicate the cell suspension for 30 s (amplitude = 75 %, cycle = 1) while keeping the tube on ice. Let the cells rest for 2 min on ice. Repeat this cycle three times (four sonication cycles in total). The sonication separates the spores so that they do not stick together.
9. Harvest the cells by centrifugation (5 min at 10,000 × g, 4 °C), and resuspend the pellet in 5 mL of NP-40. Repeat this step two times.
10. Repeat the sonication step (again four sonications).
11. Harvest the cells by centrifugation for 10 min at 10,000 × g, 4 °C.
12. Decant the supernatant and remove the remaining liquid with a micropipette. Resuspend the pellet in 300 μL of sterile Milli-Q water.
13. Prepare serial dilutions to obtain isolated colonies and plate on YPD + chloramphenicol plates. The chloramphenicol is needed since sonication cannot be done in a sterile environment.
14. Plate the remaining spore solution on a single YPD + chloramphenicol plate to allow the spores to mate. Incubate for 48 h at 30 °C.
15. Pick colonies from the serial dilution plates, and check for mating type (see Subheading 3.4). If additional internal crosses are desired, take a lump of cells from the mating plate and restart the protocol from step 1 [9].
16. The meiotic segregants isolated can have their mating type determined as a fast method to assess the ploidy (see Subheading 3.4).
17. The phenotype of the segregants is characterized in order to select the ones that have the same superior phenotype as the superior parent, which has the phenotype of interest (see Note 7).
18. The selected segregants (ca. 30) are used for pooled-segregant whole-genome sequencing (see Note 8).
19. The genomic DNA of the segregants is extracted separately or after pooling the cells, based on dry weight or OD600 [10] (see Notes 9 and 10).
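For step 19, pooling by dry weight (see Note 10) comes down to taking, from each culture, the volume that holds the same dry biomass. A minimal sketch of that arithmetic is given below. The function name, the segregant labels, and the numerical values are hypothetical, and equal dry biomass is used here as the proxy for equal cell number.

```python
def pooling_volumes(dry_weight_ug_per_ml, biomass_per_segregant_ug):
    """Return the culture volume (mL) per segregant that holds the same dry biomass.

    dry_weight_ug_per_ml: mapping of segregant name -> measured ug dry cells per mL.
    biomass_per_segregant_ug: dry biomass to contribute from every segregant.
    """
    volumes = {}
    for name, concentration in dry_weight_ug_per_ml.items():
        if concentration <= 0:
            raise ValueError(f"{name}: dry weight must be positive")
        volumes[name] = biomass_per_segregant_ug / concentration
    return volumes


if __name__ == "__main__":
    # Hypothetical measurements from the 10 mL filtered aliquots
    measured = {"seg01": 500.0, "seg02": 250.0, "seg03": 400.0}
    for name, volume in pooling_volumes(measured, 1000.0).items():
        print(f"{name}: pool {volume:.2f} mL")
```

The requested biomass should be chosen so that no volume exceeds what remains of the frozen culture.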


3.7 Isolation of Genomic DNA


1. Grow the cells in 5 mL YPD medium until stationary phase (2–3 days, 30 °C, 200 rpm).
2. Harvest the cells in a 15 mL tube (10,000 × g, 3 min).
3. Resuspend the cells in 0.5 mL of sterile water and transfer to a microcentrifuge tube.
4. Centrifuge the cells in a microcentrifuge (10,000 × g, 5 min, 4 °C).
5. Resuspend the cells in 400 μL of lyticase solution (pipette up and down).
6. Incubate for 2–3 h at 37 °C.
7. Add 400 μL lysis buffer with Proteinase K (50–100 μg/mL). Vortex briefly. Incubate at 55 °C for 30 min. If all the cells are lysed, the solution becomes transparent. After this step, keep your lysate on ice.
8. Spin down cell debris at max speed (4,750 × g) in a benchtop centrifuge for 15 min at 4 °C.
9. Decant supernatant into a new tube on ice and discard the pellet.
10. Add 5 M NaCl to a final concentration of 0.2 M. Add 0.6–1 volume of cold isopropanol to 1 volume of DNA/salt solution. Keep it at −20 °C for 30–60 min for an optimal precipitation (0 °C can be used for >20 ng/mL). Mix by inverting the tube gently several times.
11. Spin down at max speed in a benchtop centrifuge for 10 min at 4 °C. Discard the supernatant by decanting, and resuspend the pellet in 4 mL TE buffer.
12. Add 4 mL PCI reagent pH 8.0 (1:1, v:v).
13. Mix gently for 5 min (rocking platform) and microcentrifuge for 10 min at 10,000 rpm at room temperature. Phases should be well separated. If the DNA solution is viscous or contains a large amount of protein, it should be spun longer.
14. Carefully remove the top (aqueous) phase containing the DNA (avoid taking the interphase) and transfer it to a new tube.
15. Repeat PCI extraction (equal volume as transferred).
16. Add RNase A (10 mg/mL), DNase free, to a final concentration of 100 μg/mL and incubate for 2 h at 37 °C.
17. Repeat PCI extraction, now using phase lock gel to separate the aqueous phase and avoid contamination.
18. Decant the upper aqueous layer into another tube.
19. Add 1/10th volume of 3 M NaAc and 2 volumes of ice-cold 95 % ethanol (or 1 volume of ice-cold isopropanol). Keep it at −20 °C for about 30–60 min.
20. Centrifuge at max speed in a benchtop centrifuge for 5–10 min at 4 °C. Make sure all supernatant is removed.


21. Wash pellet with 500 µL 70 % EtOH. Make sure all supernatant is removed (take the remaining liquid off with a yellow tip).
22. Dry sample in speedvac. (Alternative: dry sample in laminar flow.)
23. Resuspend DNA in 100 µL TE buffer; use 10 µL for analysis; store the rest at −80 °C.
24. Check DNA by gel: 0.8 % agarose; 4 µL in medium-size slot; 100 V for 35 min.
25. Quantify DNA with PicoGreen.
26. Check the absence of bacterial DNA contamination by PCR.
27. If the sample is qualified, store it in DNAstable® (Biomatrica).

3.8 Pooled-Segregant Whole-Genome Sequencing

1. The genomic DNA of the pool of segregants is sequenced using the Illumina platform (see Note 11).
2. The sequencing data can be analyzed with open-source tools (NGSEP, http://sourceforge.net/p/ngsep/wiki/Home/) [14] or with licensed software (e.g., DNASTAR Lasergene, CLC Genomics Workbench).
3. The mapping of the sequencing reads of the selected pool of segregants against the two parental strains provides the single-nucleotide variant (SNV) frequencies over the 16 yeast chromosomes.
4. Because the SNV frequencies present some variability over the length of the chromosomes, an algorithm is used to smooth the sequencing data in order to create a curve with confidence intervals that indicates the linkage towards one of the two parental strains.
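Note 11 recommends a sequencing depth of at least around 30× with reads of 75–100 bp. When ordering sequencing, the expected average depth can be estimated as (number of reads × read length) / genome size. The sketch below applies that relation. The genome size of about 12.1 Mb for S. cerevisiae is an assumption not stated in this chapter, the function names are illustrative, and the estimate ignores unmapped, duplicated, and low-quality reads.

```python
import math

# Assumed approximate haploid genome size of S. cerevisiae (not given in the text)
GENOME_SIZE_BP = 12_100_000


def expected_depth(n_reads, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Average depth if every read maps once and coverage is uniform."""
    return n_reads * read_length_bp / genome_size_bp


def reads_for_depth(target_depth, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Smallest number of reads that reaches target_depth on average."""
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    for read_length in (75, 100):
        n = reads_for_depth(30, read_length)
        print(f"{read_length} bp reads: about {n:,} reads for 30x")
```

In practice a safety margin is usually added on top of this figure, since real coverage is uneven along the chromosomes.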

3.9 Statistical Analysis for the Raw SNV Frequency Data [10]

1. For each chromosome, the quantified frequencies of the detected SNPs are considered to be binomially distributed. By fitting smoothing splines, the underlying structure in the SNP scatterplot of a given chromosome can be identified (scatterplot smoother for genetic mapping based on next-generation sequencing (NGS) data: http://www.ibiostat.be/software/bioinformatics.asp#ScatterplotSmoother). An example of the sequencing data (SNV frequency) for two chromosomes that were smoothed by applying this script is shown in Fig. 3.
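To give a feel for what smoothing does to raw SNV frequencies, the sketch below uses a much simpler stand-in than the smoothing-spline, linear mixed model approach of [10]: reads are pooled in sliding windows along a chromosome, and a normal-approximation 95 % interval is attached to each binomial proportion. It is not the published script. The function name, window sizes, and input layout are assumptions, and treating all reads in a window as independent draws will understate the true uncertainty.

```python
import math


def windowed_frequency(snps, window_bp=50_000, step_bp=25_000, z=1.96):
    """Pool read counts in sliding windows along one chromosome.

    snps: iterable of (position_bp, superior_allele_reads, total_reads).
    Returns a list of (window_center_bp, frequency, ci_low, ci_high);
    windows without any reads are skipped.
    """
    snps = sorted(snps)
    if not snps:
        return []
    last_position = snps[-1][0]
    results = []
    start = 0
    while start <= last_position:
        end = start + window_bp
        counts = [(k, n) for pos, k, n in snps if start <= pos < end]
        k_sum = sum(k for k, _ in counts)
        n_sum = sum(n for _, n in counts)
        if n_sum > 0:
            p = k_sum / n_sum
            half_width = z * math.sqrt(p * (1.0 - p) / n_sum)
            results.append((start + window_bp // 2, p,
                            max(0.0, p - half_width), min(1.0, p + half_width)))
        start += step_bp
    return results


if __name__ == "__main__":
    # Hypothetical SNVs: (position, reads with superior-parent allele, depth)
    example = [(10_000, 30, 36), (20_000, 28, 36), (60_000, 18, 36)]
    for center, p, low, high in windowed_frequency(example):
        print(f"{center:>7} bp  freq = {p:.2f}  95% CI = [{low:.2f}, {high:.2f}]")
```

Windows whose interval stays clearly above 0.5 are the candidates for linkage to the superior parent; the published smoother should be preferred for actual mapping.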

3.10 QTL Determination and Fine Mapping

3.10.1 Determination of QTLs by SNP PCR

1. In addition to the fast assessment of SNP frequencies within a given pool of segregants (see Subheading 3.9), one can also determine the exact SNP frequency of a few selected SNPs, especially in places where the data smoothing indicates the occurrence of genetic linkage with one of the parental strains.


Fig. 3 Pooled-segregant sequencing data smoothing. (a) The data set (chr XIV) of an unselected pool of segregants (237 segregants) sequenced twice (Illumina platform), by Beijing Genomics Institute, BGI (BGI-Hong Kong, China) (green line) and GATC (GATC Biotech AG, Konstanz, Germany) (red line). (b) Sequencing data (chr I) of a selected pool of segregants (22 segregants). The data were smoothed using the algorithm that applies a linear mixed model (LMM) framework [10]. Dashed lines (a) and filled grey band (b) represent the confidence intervals for the respective smoothed lines. Grey dots (background) represent the raw sequencing data. The sequencing depths of the unselected pool were 37.71× and 134.3× for BGI and GATC, respectively (a), and 36.36× for the selected pool (b). The unselected pool has smoothed curves with SNV frequencies around 50 % as expected with normal segregation (a). On the other hand, the selected pool has a deviation from 50 % that indicates linkage towards the superior parent strain (b)

2. Two SNPs, preferably around 1,000 bp apart, are chosen for the primer design. Two sets of primers are constructed, differing only in the 3′ end of each primer. The difference corresponds exactly to the two selected SNPs that differentiate the two parent strains (see Note 12).
3. A gradient PCR is performed using the two sets of primers and the two template DNAs (from the two parental strains) in order to determine at which temperature the mismatch in the annealing allows the detection of each SNP.
4. The optimal annealing temperatures determined in the previous step are used for PCR analyses of the segregants, individually, to determine which parental nucleotide is present in a given genomic position. Thereby, testing all the selected segregants separately, SNP frequencies of selected genomic regions can be determined precisely and therefore genetic linkage (the presence of QTLs) defined by statistical means.

3.10.2 Binomial Test
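The per-segregant SNP calls obtained by SNP PCR can be tested against the frequency of 0.5 expected in the absence of linkage. The sketch below implements an exact two-sided binomial test for that null hypothesis and, for the case where several sites are tested together, the Benjamini–Yekutieli adjustment [15]. The function names and the example count (19 of 22 segregants carrying the superior-parent nucleotide) are illustrative assumptions, not data from this chapter.

```python
from math import comb


def binomial_test_half(k, n):
    """Exact two-sided P-value for k successes in n trials under p = 0.5.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    observed = comb(n, k)
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


def benjamini_yekutieli(p_values):
    """Return Benjamini-Yekutieli adjusted P-values in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    c_m = sum(1.0 / j for j in range(1, m + 1))  # harmonic correction factor
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m * c_m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    p = binomial_test_half(19, 22)  # hypothetical: 19 of 22 segregants
    print(f"P = {p:.6f}, significant at 0.05: {p < 0.05}")
    print(benjamini_yekutieli([p, 0.04, 0.5]))
```

With only a handful of SNP PCR sites the unadjusted test is often all that is reported; the adjustment matters most when many positions along the genome are tested at once.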

The SNV frequency at an individual site can be tested with a simple binomial test, which tests for a statistically significant deviation of the SNV frequency from 0.5. The deviation is considered significant when the P-value is below 0.05. More advanced methods can also be applied to obtain fewer false positives, as described by Benjamini and Yekutieli [15].

3.11 Reciprocal Hemizygosity Analysis

Two hemizygous diploid strains are constructed by crossing of the superior and inferior parent strains, either the original parent or the parent with the candidate gene deleted. In case of essential genes, the diploid strains are constructed first and a different copy of the candidate gene is deleted in each diploid strain. In this way, two hybrid diploid strains differing in a single allele of the candidate gene are obtained: one diploid strain has the allele of the superior parent, and the other diploid strain has the allele of the inferior parent.

1. Choose a selection marker (e.g., antibiotic resistance gene) for gene knockout that is appropriate for the yeast strains under study (see Note 13).
2. Design primers for the gene that has to be deleted, adding a tail of 50–80 base pairs that matches with the target site for homologous recombination (see Note 14).
3. PCR amplify the selection marker to be used in the gene deletion with the abovementioned primers.
4. Use the PCR product (ca. 1 μg) for yeast transformation (inferior and/or superior parent), selecting transformants on media appropriate for the selection marker chosen (see Note 15). Transformation can be done by the Gietz method [16].
5. Construct the two hemizygous diploid parent strains by crossing, and evaluate the phenotype using the same conditions utilized for the selection of the superior meiotic segregants. The deletion strains can also be compared to the original diploid strain (see Subheading 3.5) that has no deletion (see Note 16).
6. If a superior allele is identified, this gene can be inserted in other genetic backgrounds for further characterization and for industrial yeast strain improvement.
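The primer design in step 2 can be sketched as string handling: each primer carries a 5′ tail homologous to the sequence flanking the candidate gene, followed by a 3′ part that anneals to the selection-marker cassette. The example below is a minimal illustration. The function names, the 60 bp default tail, and the marker-annealing sequences are hypothetical placeholders and must be replaced by the sequences appropriate for the cassette actually used.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def deletion_primers(locus_seq, orf_start, orf_end,
                     marker_fwd, marker_rev, tail_bp=60):
    """Build forward and reverse deletion primers (5'->3').

    locus_seq: top-strand sequence of the gene plus flanking regions.
    orf_start, orf_end: 0-based, end-exclusive coordinates of the region to delete.
    marker_fwd, marker_rev: 5'->3' sequences annealing to the two ends of the
        marker cassette (placeholders here).
    tail_bp: length of each homology tail (the protocol suggests 50-80 bp).
    """
    if tail_bp <= 0:
        raise ValueError("tail_bp must be positive")
    if orf_start < tail_bp or len(locus_seq) - orf_end < tail_bp:
        raise ValueError("not enough flanking sequence for the requested tail")
    upstream = locus_seq[orf_start - tail_bp:orf_start]
    downstream = locus_seq[orf_end:orf_end + tail_bp]
    forward = upstream + marker_fwd
    reverse = reverse_complement(downstream) + marker_rev
    return forward, reverse


if __name__ == "__main__":
    # Synthetic locus for illustration only
    locus = "A" * 60 + "ATGCCCTAA" + "G" * 60
    fwd, rev = deletion_primers(locus, 60, 69, "CAGCTG", "GGATCC")
    print(len(fwd), fwd)
    print(len(rev), rev)
```

Because some loci need longer tails (see Note 14), tail_bp is left as a parameter rather than being fixed.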

4

Notes

1. PI solution can be protected from light by covering the vial with aluminum foil.
2. Laboratory strains are usually more stress sensitive and, in most cases, are appropriate strains to be used as the inferior parents when investigating stress tolerance traits. However, when the final natural or industrial target strain for the improvement strategy already presents a clear inferior phenotype compared to the selected superior strain, it should be used as the inferior parent instead of the laboratory strain. In this way, a considerable amount of work in analyzing the QTLs to identify the causative genes can be saved.
3. The ploidy of natural and especially industrial strains can vary from aneuploidy to multiploidy. With the described methodology, one needs stable haploid strains for the genetic mapping of the polygenic trait of interest.
4. Sporulation efficiency varies according to the genetic background. Some yeast strains show sporulation only after a few weeks.
5. If a micromanipulator is not available, sporulation and isolation of segregants can be performed by random spore isolation methodology [17].
6. In case a strain is homothallic, preferably all copies of the gene HO, which is responsible for the mating type switch, should be deleted, which will result in stable haploid strains.
7. The phenotype should be assessed preferably using exactly the same conditions utilized to identify the superior parent strain (see Subheading 3.1). Alternatively, if the trait of interest is selectable (e.g., high temperature tolerance), one can design a screening in order to quickly select thousands or even millions of superior segregants [4]. Nevertheless, it has already been demonstrated that it is possible to map QTLs and to identify the respective causative genes with a relatively small number of meiotic segregants [5].
8. The option of pooling the segregants for sequencing is mainly economic. When the technologies for DNA sequencing have evolved further and the prices decreased enough, it might become affordable and preferable to sequence the segregants individually.
9. The DNA extraction of pooled cells is more convenient than the individual DNA extraction. However, the pooling of cells, especially of a small number of segregants (ca. 30), should be performed based on dry weight and not OD600. Segregants can have different cell sizes, which can interfere with the ratio cell number/OD600.
Moreover, if cell aggregation is observed in part of the segregants, this discrepancy can become even more significant. If individual DNA extraction is chosen for the segregants, one should utilize a specific DNA measurement method (e.g., PicoGreen, Invitrogen, for double-strand DNA) to precisely mix the DNAs in equimolar concentrations while pooling the DNA samples for sequencing.
10. Cells are pooled based on dry weight:
    (a) Grow cells in 50 mL.
    (b) Filter (40 μm pore size) exactly 10 mL of each culture, and freeze (−80 °C) the rest of the culture(s).
    (c) Dry the cells completely on the filter, and weigh them to calculate μg of cells/mL culture.
    (d) Take appropriate volumes from the original culture to mix the same number of cells from each culture.
11. The Illumina platform sequencing generates short reads of 75–100 bp that can be assembled and mapped to a reference genome (e.g., S288c Saccharomyces cerevisiae genome). If novel strains are used for the crossing, the parental strains are sequenced and assembled and novel genomic consensus sequences are generated, which will be used for the mapping of the pool of selected segregants. The DNA sample quality requirements vary with the different companies that provide DNA sequencing, but, as a general rule, the DNA sample should have at least 3 μg of genomic DNA, no RNA contamination, and a purity ratio (260/280) between 1.8 and 2.0. The sequencing depth should be at least around 30×, with a technical quality limitation at around 70×, at least regarding the Illumina platform. Sequencing quality and genetic mapping do not improve with higher depth (see Fig. 3).
12. The selected SNPs are close to each other to avoid the occurrence of recombination in between them. If a crossing-over has occurred in a given segregant at a position located between the two SNPs targeted by the SNP PCR approach, the optimal annealing temperatures defined by the gradient PCR using the parental strain DNA as template will not work as effectively for that recombinant segregant.
13. Certain natural and industrial yeast strains show moderate resistance to antibiotics normally utilized for selection of laboratory strains (e.g., geneticin). In this case higher concentrations of antibiotic or different selection markers should be used to make the gene knockouts.
14. The length of the primers’ flanking regions, which will be responsible for the gene knockout by homologous recombination, varies according to the genomic position of the targeted gene. Some gene deletions work well with shorter flanking regions, whereas other genes require longer primers for efficient knockout.
15. The purpose of the reciprocal hemizygosity analysis (RHA) test is to construct heterozygous diploid strains, in which only the gene of interest is affected (deleted or replaced). The gene knockout can be performed in the parent strains separately or in the diploid directly. If the target gene is essential, one can transform the diploid strain directly and check which allele was deleted afterwards, since double deletion will not be feasible.


The test of which allele is deleted can be done by sequencing the flanking regions of the deleted gene if SNVs are present nearby. The SNVs work as natural markers to differentiate each parent strain DNA. On the other hand, if the gene is not essential, transforming the diploid strain can result in a double-deleted strain (loss of function). To avoid this, one can delete the target gene in the parent strains separately and only then cross the selected haploid strains carrying the deletion with the original opposite parent strain (see Subheading 3.5). In addition to gene knockouts, one can also perform large deletions (bulk reciprocal hemizygosity analysis, bRHA) in order to evaluate several adjacent candidate genes simultaneously. In this case primers with flanking regions of 80 base pairs were successfully employed to delete regions of up to 27 kb. In some cases, even deletion of smaller regions requires primers with longer flanking regions.

16. If the (superior) allele of a candidate gene is involved in the trait under investigation, it will cause a difference in the phenotype, compared to the strain with the inferior allele.

References

1. Steinmetz LM, Sinha H, Richards DR, Spiegelman JI, Oefner PJ, McCusker JH, Davis RW (2002) Dissecting the architecture of a quantitative trait locus in yeast. Nature 416:326–330
2. Deutschbauer AM, Davis RW (2005) Quantitative trait loci mapped to single-nucleotide resolution in yeast. Nat Genet 37:1333–1340
3. Patil N, Berno AJ, Hinds DA, Barrett WA, Doshi JM, Hacker CR, Kautzer CR et al (2001) Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 294(5547):1719–1723
4. Winzeler EA et al (1998) Direct allelic variation scanning of the yeast genome. Science 281(5380):1194–1197
5. Kwok P-Y (2002) SNP genotyping with fluorescence polarization detection. Hum Mutat 19(4):315–323
6. Lechner D, Lathrop GM, Gut IG (2002) Large-scale genotyping by mass spectrometry: experience, advances and obstacles. Curr Opin Chem Biol 6(1):31–38
7. Hardenbol P, Banér J, Jain M, Nilsson M, Namsaraev EA, Karlin-Neumann GA, Fakhrai-Rad H et al (2003) Multiplexed genotyping with sequence-tagged molecular inversion probes. Nat Biotechnol 21(6):673–678
8. Ehrenreich IM, Torabi N, Jia Y, Kent J, Martis S, Shapiro JA, Gresham D, Caudy AA, Kruglyak L (2010) Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464(7291):1039–1042
9. Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M (2011) Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21:1131–1138
10. Swinnen S, Schaerlaekens T, Pais T, Claesen J, Hubmann G, Yang Y, Demeke M, Foulquié-Moreno M, Goovaerts A, Souvereyns K, Clement L, Dumortier F, Thevelein JM (2012) Identification of novel causative genes determining the complex trait of high ethanol tolerance in yeast using pooled-segregant whole-genome sequence analysis. Genome Res. doi:10.1101/gr.131698.111
11. Popolo L, Vanoni M, Alberghina L (1982) Control of the yeast cell cycle by protein synthesis. Exp Cell Res 142:69–78
12. Sherman F, Hicks J (1991) Micromanipulation and dissection of asci. Methods Enzymol 194:21–37
13. Huxley C, Green ED, Dunham I (1990) Rapid assessment of S. cerevisiae mating type by PCR. Trends Genet 6:236
14. Duitama J et al (2014) An integrated framework for discovery and genotyping of genomic variants from high-throughput sequencing experiments. Nucleic Acids Res. doi:10.1093/nar/gkt1381
15. Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188
16. Gietz RD, Schiestl RH, Willems AR, Woods RA (1995) Studies on the transformation of intact yeast cells by the LiAc/SS-DNA/PEG procedure. Yeast 11:355–360
17. Douglas AT, Fred W (2008) Growth and manipulation of yeast. Curr Protoc Mol Biol 82:13.2.1–13.2.12

Part III Metabolic Models for Yeast Metabolic Engineering

Chapter 16
Genome-Scale Metabolic Models of Yeast, Methods for Their Reconstruction, and Other Applications
Sergio Bordel

Abstract
Here, we present the concept of genome-scale metabolic models and some of their applications in metabolic engineering of yeast and in the analysis of gene expression data. The yeast species for which there are available genome-scale metabolic models are reviewed, as well as the methods for the reconstruction of genome-scale metabolic models for new species. Some commonly used algorithms for metabolic engineering and data integration are described.

Key words Metabolic engineering, Gene expression, Metabolic networks, Data integration

1 Introduction

Genome-scale metabolic models are comprehensive compilations of all the metabolic reactions taking place in an organism. Each metabolic reaction is linked to one or several enzymes and to the genes coding these enzymes. Genome-scale metabolic models are already available for the most industrially relevant yeast species such as Saccharomyces cerevisiae [1], Schizosaccharomyces pombe [2], Pichia pastoris [3, 4], Pichia stipitis [4], and also the lipid-accumulating yeast Yarrowia lipolytica [5]. Some numbers summarizing the contents of these models are presented in Table 1. The published models can be used as templates for future reconstructions of genome-scale metabolic models of other species of yeast. In this chapter we summarize the pipeline to be followed in order to reconstruct genome-scale models for other species and also the kinds of information that it is possible to obtain from genome-scale metabolic models for metabolic engineering purposes.

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_16, © Springer Science+Business Media, LLC 2014


Table 1 Existing genome-scale metabolic models of yeast species

Model name    Organism       Reactions  Metabolites  Genes  Genome coverage (%)
iMM904        S. cerevisiae  1,312      1,177        904    13.7
SpoMBEL1693   S. pombe       1,693      1,744        605    12.2
iNL895        Y. lipolytica  2,002      1,847        895    13.8
iSS884        P. pastoris    1,332      920          884    15.1
iLC915        P. stipitis    1,423      899          915    17.2

The genome coverage indicates the percentage of the protein-coding ORFs that are involved in metabolic reactions. The same reaction or metabolite occurring in a different subcellular compartment is normally considered a different reaction or metabolite, so the total numbers of metabolites and reactions in the models are higher than the real numbers of distinct chemical species and reactions present in a cell.

2 Pipeline for the Reconstruction of Genome-Scale Metabolic Models

The first genome-scale metabolic model for a eukaryotic organism was a model of S. cerevisiae, built from the available bibliographic information about the organism [6]. This is only possible for model organisms that have already been extensively studied, which is often not the case. Moreover, a manual reconstruction based on literature surveys is a very time- and effort-consuming process. To overcome these limitations, it is essential to rely on a methodology that automatically produces an initial draft, which is then manually curated taking into account the existing literature. The existence of genome-scale metabolic models for several yeast species provides a very good basis to start the reconstruction work for any other yeast, because these organisms are closely related and share a substantial number of metabolic features.

The automated reconstruction process starts by identifying ortholog pairs of proteins between the target organism and each of the yeast genome-scale models that have already been reconstructed. This can be done using a software suite such as the RAVEN Toolbox (Reconstruction, Analysis, and Visualization of Metabolic Networks) [7]. The RAVEN Toolbox takes as input a number of genome-scale metabolic models in the SBML (Systems Biology Markup Language) format [8] and the corresponding FASTA files containing the protein sequences for each of the template models and the target organism. A bidirectional BLASTp [9] is performed for each protein in the target organism against all the proteins in the set of template models. A typical cutoff for the BLAST is an E-value equal to or lower than 10^-40 [4]. Other homology measurements can also be used within the RAVEN framework.
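The ortholog identification step can be illustrated with a minimal Python sketch. It is not the RAVEN implementation: it assumes that the two BLASTp runs have already been parsed into (query, subject, E-value) tuples and uses a reciprocal-best-hit rule, a common simplification of a bidirectional search. The function names and the protein identifiers (T1, M1, and so on) are hypothetical.

```python
def best_hits(hits, max_evalue=1e-40):
    """Map each query to its lowest-E-value subject among hits passing the cutoff."""
    best = {}
    for query, subject, evalue in hits:
        if evalue <= max_evalue and (query not in best or evalue < best[query][1]):
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(target_vs_template, template_vs_target, max_evalue=1e-40):
    """Return (target, template) pairs that are each other's best hit."""
    forward = best_hits(target_vs_template, max_evalue)
    backward = best_hits(template_vs_target, max_evalue)
    return sorted((t, m) for t, m in forward.items() if backward.get(m) == t)


# Hypothetical parsed BLASTp results: (query, subject, E-value).
target_vs_template = [
    ("T1", "M1", 1e-80),
    ("T1", "M2", 1e-50),
    ("T2", "M2", 1e-30),  # fails the 1e-40 cutoff
    ("T3", "M3", 1e-60),
]
template_vs_target = [
    ("M1", "T1", 1e-78),
    ("M2", "T1", 1e-45),
    ("M3", "T4", 1e-70),  # best hit of M3 is not T3, so the pair is rejected
]
ortholog_pairs = reciprocal_best_hits(target_vs_template, template_vs_target)
```

With these illustrative inputs only the pair (T1, M1) survives, since T2 fails the E-value cutoff and T3 is not the best hit of M3.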


The reactions linked to proteins passing the mentioned homology threshold are added to a draft model, keeping the same directionality (reversible or irreversible) and localization that they have in the template models. This approach limits the reactions that can be included in the new model to the set of reactions already present in the template models. To overcome this limitation, RAVEN allows including any reaction present in the KEGG (Kyoto Encyclopedia of Genes and Genomes; http://www.genome.jp/kegg/) database. In 2011, KEGG contained 556,272 metabolic genes grouped into KEGG Orthologies (KOs), which are sets of genes from different organisms involved in a particular metabolic function. For each KO a hidden Markov model is generated from the set of sequences that it contains, using the software HMMER [10]. The user can choose to reduce the sequences in each KO to those from evolutionarily closer organisms. For example, for a yeast model it is possible to use only eukaryotes or only fungi. The protein sequences of the organism of interest are then queried against the hidden Markov model of each KO: if a gene has a significant match to one KO, the metabolic reactions associated with that KO are added to the model and linked to the corresponding gene.

In the case of eukaryotic cells such as yeast, the subcellular localization of each reaction is also fundamental. The best way to assign subcellular localization is to rely on evidence in the literature. If this evidence is not available, the RAVEN Toolbox includes the algorithm FLocA, which combines the localization predictions obtained from the protein sequences with network connectivity information in order to infer the subcellular localization. Special effort should be put into identifying transport reactions between compartments. Too many transporters lead to unrealistic flux distributions predicted by the model and to wrong predictions of gene essentiality. The best approach is to add transport reactions only when there is bibliographical evidence or when a transporter protein has been sufficiently characterized. The described bioinformatics-based methodology yields a draft model in a time-efficient manner. These draft models are normally not functional and need to be subjected to quality control and manual curation.
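As a rough illustration of how reactions are transferred from a template to the draft, the following Python sketch copies every template reaction that has at least one gene with an ortholog in the target organism, preserving its reversibility and compartment. The data layout, the function name, and the rule that a single orthologous gene is enough (ignoring AND/OR gene associations) are simplifying assumptions made here, not the behavior of any particular toolbox.

```python
def build_draft(template_reactions, ortholog_pairs):
    """Copy template reactions whose genes have orthologs in the target organism."""
    template_to_target = {}
    for target_gene, template_gene in ortholog_pairs:
        template_to_target.setdefault(template_gene, set()).add(target_gene)

    draft = {}
    for rxn_id, rxn in template_reactions.items():
        target_genes = set()
        for gene in rxn["genes"]:
            target_genes |= template_to_target.get(gene, set())
        if target_genes:
            # Reversibility and compartment are inherited unchanged from the template.
            draft[rxn_id] = {**rxn, "genes": sorted(target_genes)}
    return draft


# Hypothetical template reactions and ortholog pairs (target gene, template gene).
template_reactions = {
    "R1": {"genes": ["M1"], "reversible": False, "compartment": "cytosol"},
    "R2": {"genes": ["M2", "M3"], "reversible": True, "compartment": "mitochondrion"},
    "R3": {"genes": ["M4"], "reversible": False, "compartment": "cytosol"},
}
ortholog_pairs = [("T1", "M1"), ("T3", "M3")]
draft = build_draft(template_reactions, ortholog_pairs)
```

Here R1 and R2 enter the draft linked to T1 and T3, respectively, while R3 is left out and would become a candidate for the KEGG-based search or for manual curation.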

3 Model Curation

The first step in the curation of a genome-scale metabolic model is "gap filling," which consists of identifying metabolites that cannot be produced or consumed in the model and suggesting new reactions able to fill the gaps. Several computational tools are available to help the gap filling process. The RAVEN Toolbox includes a gap filling function, and so does the second version of the COBRA Toolbox [11]. The COBRA Toolbox includes the function gapFind, which provides a list of all the metabolites lacking reactions for their production or consumption, as well as metabolites downstream of each gap (which cannot be produced by the network but could be produced once the gap is filled). Once the gaps have been identified, they have to be filled with reactions able to transform some of the compounds that cannot be consumed in the model into some of the compounds that cannot be produced. This requires using databases such as KEGG in order to identify candidate reactions. Evidence for the existence of a specific reaction in the target organism should be sought in the literature; this process may lead to the re-annotation of some genes. If no evidence is found in the literature, it is advisable to fill the gap only if it is strictly necessary for the model functionality.

A model is said to be functional if it is able to produce all the biomass precursors under realistic growth conditions [12]. By growth conditions we mean the presence or absence of external metabolites/compounds present in a real growth medium. In order to assess the functionality of a model, it is necessary to define the so-called biomass equation, which is an extra reaction in the model whose stoichiometry mimics the biomass composition. The biomass composition must be experimentally determined. It is easy to determine the fractioning of the biomass into macromolecular components such as proteins, nucleic acids, lipids, etc. The detailed biomass composition is more difficult to obtain, and it might be necessary to estimate the detailed amino acid and nucleic acid compositions from the genome. The most difficult task is to identify the detailed lipid composition, due to the extreme diversity of lipid species. The definition of a realistic biomass equation is essential for the good performance of the model and is crucial for the gene essentiality predictions. Besides the biomass building blocks, the biomass equation includes a number of ATP molecules which are dephosphorylated. It is possible to estimate theoretically the ATP (or equivalents such as GTP) necessary for the polymerization of macromolecules; however, the real ATP costs associated with growth are always higher than the theoretical ones, and the growth-associated ATP cost is a parameter that should be given a value based on experimental growth yields. The parameter estimation is briefly described later.
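The first part of gap finding, listing metabolites that no reaction can produce or consume, can be sketched in a few lines of Python. This is not the gapFind function of the COBRA Toolbox: it only reports the gaps themselves, not the metabolites downstream of them, and the data layout and names are assumptions chosen for illustration.

```python
def find_gaps(reactions, exchanged=()):
    """Return (never produced, never consumed) metabolites.

    reactions: {reaction id: (stoichiometry dict, reversible flag)}, where
    negative coefficients mark substrates and positive ones mark products.
    exchanged: metabolites that can enter or leave the system.
    """
    produced, consumed = set(), set()
    for stoich, reversible in reactions.values():
        for met, coef in stoich.items():
            if coef > 0 or reversible:
                produced.add(met)
            if coef < 0 or reversible:
                consumed.add(met)
    metabolites = produced | consumed
    exchanged = set(exchanged)
    no_production = sorted(metabolites - produced - exchanged)
    no_consumption = sorted(metabolites - consumed - exchanged)
    return no_production, no_consumption


# Hypothetical toy network: A is taken up and C is secreted.
reactions = {
    "R1": ({"A": -1, "B": 1}, False),
    "R2": ({"B": -1, "X": -1, "C": 1}, False),
    "R3": ({"B": -1, "Y": 1}, False),
}
no_production, no_consumption = find_gaps(reactions, exchanged=["A", "C"])
```

In this toy network X can never be produced and Y can never be consumed, so both would be passed on to the literature and database search described above.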

4 Model Validation and Parameterization

The reconstruction of genome-scale metabolic models is an iterative process in which an initial model is used to make predictions about the physiology of the organism, and the accuracy (or lack of accuracy) of these predictions is used to infer reactions that should be added to or removed from the model. Model validation necessarily involves translating the structure of the model into a mathematical formulation that allows computing possible flux distributions. The core of this mathematical representation is the so-called stoichiometric matrix S, which contains the stoichiometric coefficients of each intracellular metabolite in each reaction of the model. The steady-state condition for the intracellular metabolites can then be imposed on the model by using the following equation:

$$S v = 0 \qquad (1)$$

Equation (1) is the matrix notation of a system of linear equations that contains as many equations as internal metabolites and as many unknowns as metabolic reactions. The vector v contains the rates of each of the metabolic reactions. Each reaction must be defined as reversible (able to take both positive and negative values) or irreversible (allowing only non-negative values). If biochemical data are not available, it is possible to compute the standard Gibbs free energy of reaction $\Delta_r G^{0}$ [13] in order to infer the most likely reaction directionality.

Once the directionality constraints have been fixed, the model can be checked for consistency. The first check is the model viability. The model is said to be viable if it is able to synthesize all the components in the biomass equation from a minimal set of precursors on which the studied organism is known to grow. It is convenient to check the biomass components one by one in order to identify the missing reactions (in case the model is not viable) more efficiently. The next step in the validation process is checking gene essentiality. A gene deletion can be simulated with a genome-scale metabolic model by setting to zero the flux through the reactions catalyzed by the product of the deleted gene.
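The steady-state condition and the simulation of a gene deletion can be sketched with a toy example in Python. The three-reaction network, the function names, and the assumption that every listed reaction strictly requires the deleted gene (no isoenzymes) are illustrative choices, not part of any toolbox.

```python
def mat_vec(S, v):
    """Multiply the stoichiometric matrix S (list of rows) by the flux vector v."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True if S v = 0 within the tolerance, i.e. no internal metabolite accumulates."""
    return all(abs(x) <= tol for x in mat_vec(S, v))


def knock_out(bounds, gene_reactions, gene):
    """Return new flux bounds with all reactions of the deleted gene fixed to zero."""
    new_bounds = dict(bounds)
    for rxn in gene_reactions.get(gene, ()):
        new_bounds[rxn] = (0.0, 0.0)
    return new_bounds


# Toy network: uptake (-> A), conversion (A -> B), export (B ->).
# Rows are the internal metabolites A and B; columns are the three reactions.
S = [
    [1, -1, 0],
    [0, 1, -1],
]
bounds = {"uptake": (0.0, 10.0), "conversion": (0.0, 1000.0), "export": (0.0, 1000.0)}
gene_reactions = {"geneX": ["conversion"]}  # hypothetical gene-reaction association

balanced = is_steady_state(S, [2.0, 2.0, 2.0])
unbalanced = is_steady_state(S, [2.0, 1.0, 2.0])
mutant_bounds = knock_out(bounds, gene_reactions, "geneX")
```

After the knockout the only flux vector that still satisfies both Eq. (1) and the bounds is the zero vector, which is how the loss of an essential function shows up in the model.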
Deletions of essential genes should make the model not viable. If a gene that has been shown experimentally to be essential is not predicted as essential, the model is likely to contain one or several reactions that do not correspond to the actual cellular metabolism. If a nonessential gene is predicted to be essential, reactions are still missing from the model, and a new gap filling round should be started in order to make the model viable even in the absence of the nonessential gene.

In order to use genome-scale metabolic models for quantitative predictions, it is necessary to adjust at least three energetic parameters: (1) the growth-associated ATP consumption, (2) the maintenance (non-growth-associated) ATP consumption, and (3) the P/O ratio (the number of ATP molecules produced in oxidative phosphorylation per pair of electrons transferred to oxygen). This parameterization can be easily performed using chemostat experiments at different dilution rates and under aerobic and anaerobic conditions (if the yeast of interest can grow under both conditions). By plotting the glucose uptake rate as a function of the specific growth rate under anaerobic conditions, we obtain a linear relationship that can be used to estimate the growth-associated ATP consumption from its slope and the non-growth-associated ATP consumption from its intercept. This is done by setting the glucose uptake rate in the model to its experimental value and maximizing the rate of biomass production (with the oxygen uptake rate set to zero). Repeating this calculation at several growth rates gives a linear relationship, whose slope can be compared with the experimental one. The growth-associated ATP consumption can be changed until the theoretical slope fits the experimental slope. The non-growth-associated ATP maintenance is obtained by fitting the predicted intercept to the experimental one. Once the ATP maintenance costs have been estimated (it is normally assumed that they are the same under anaerobic and aerobic conditions), the P/O ratio can be obtained from the slope of the plot of the specific oxygen uptake rate as a function of the specific growth rate under aerobic conditions (the oxygen uptake rate can also be inferred from the specific CO2 production rate).

The reconstruction, validation and parameterization of a genome-scale metabolic model can be seen as an iterative process in which the model is improved after each iteration round. This iterative process is summarized in Fig. 1.
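The fitting procedure described above can be sketched in Python in two steps: a least-squares line through the chemostat data, and a search for the parameter value at which the simulated slope matches the experimental one. All numbers are made up for illustration, and the `simulated_slope` function is a hypothetical stand-in for repeated model simulations, assumed here to increase monotonically with the growth-associated ATP cost.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def tune(simulated_slope, target_slope, lo, hi, tol=1e-6):
    """Bisection on a parameter, assuming simulated_slope increases with it."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if simulated_slope(mid) < target_slope:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical anaerobic chemostat data (not measurements):
growth_rates = [0.05, 0.10, 0.20, 0.30]   # specific growth rate, 1/h
glucose_uptake = [1.0, 1.5, 2.5, 3.5]     # specific uptake rate, mmol/(gDW h)
exp_slope, exp_intercept = fit_line(growth_rates, glucose_uptake)


def simulated_slope(atp_cost):
    """Stand-in for the slope predicted by the model at a given ATP cost."""
    return 2.0 + 0.1 * atp_cost


fitted_atp_cost = tune(simulated_slope, exp_slope, 0.0, 200.0)
```

In practice each call to the stand-in would be replaced by a series of model optimizations at several growth rates, and the intercept would be matched in the same way by adjusting the non-growth-associated ATP consumption.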

5 Some Computational Tools for Metabolic Engineering

Genome-scale metabolic models are used to identify sets of genes whose deletion is expected to result in a higher production yield of a particular metabolite of interest (Fig. 2). The assumption behind this approach is the existence of an objective function which is optimized by the metabolic network (normally assumed to be the growth rate). The reactions to be removed from the model are therefore those whose removal results in a model in which the organism needs to produce the compound(s) of interest in order to grow at an optimal yield. Several algorithms have been developed with the aim of finding optimal sets of deletion targets. The COBRA Toolbox [11] incorporates three of these algorithms: OptKnock [14], OptGene [15], and GDLS [16]. OptKnock solves a bi-level linear optimization problem and is computationally expensive, which is why the search for deletion candidates is normally restricted to a subset of the metabolic network. OptGene and GDLS are heuristic methods; this makes them computationally more efficient but does not guarantee reaching an optimal solution.
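The bi-level structure behind this kind of search can be written schematically as follows. This is a simplified sketch in the spirit of OptKnock [14], not its exact published formulation; the binary variables $y_j$ (0 if reaction $j$ is deleted), the bounds $v_j^{\min}$ and $v_j^{\max}$, and the maximum number of deletions $K$ are notation introduced here for illustration.

$$
\begin{aligned}
\max_{y}\quad & v_{\text{product}} \\
\text{s.t.}\quad & v \in \arg\max_{v}\ \left\{\, v_{\text{biomass}} \;:\; S v = 0,\quad y_j\, v_j^{\min} \le v_j \le y_j\, v_j^{\max}\ \ \forall j \,\right\} \\
& \sum_j (1 - y_j) \le K, \qquad y_j \in \{0, 1\}
\end{aligned}
$$

The inner problem represents the cell maximizing growth under the steady-state condition of Eq. (1), while the outer problem chooses the deletions so that this growth-optimal state also carries a high product flux.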


Fig. 1 Schematic representation of the reconstruction and validation of a genome-scale metabolic model

Fig. 2 A set of gene deletions can transform the space of possible metabolic flux distributions in such a way that the maximal growth yield corresponds to a substantial production rate of the product of interest

If the objective is to produce a metabolite that is not native to the yeast, genome-scale metabolic models can be used to infer the optimal set of new reactions that have to be added to the network in order to obtain the desired product with an optimal yield. OptStrain [16] is an algorithm developed by the creators of OptKnock; it uses a database of known enzymes to find the minimal set of nonnative genes that are good candidates for expression in the modeled host organism in order to obtain the desired product with an optimal yield. An additional algorithm aiming to design new pathways to be introduced in a host organism is BNICE [17], which is also able to propose reactions without known enzymes catalyzing them and to suggest potentially suitable enzymes based on their EC numbers.

6 Genome-Scale Metabolic Models for Data Integration in Systems Biology

The gene–reaction associations included in genome-scale metabolic models can be used to map observed changes in gene expression onto the global metabolic network of the organism. Here, we review a methodology that can be used for the comparative analysis of yeast strains and allows inferring the molecular mechanisms behind physiological differences between strains. Microarray and RNA sequencing technologies are already well established and can provide lists of differentially expressed genes at a genome-wide scale. However, the differential expression of a metabolic gene does not automatically result in a significant change in the rate of the reaction catalyzed by its product. Genome-scale metabolic models can be used to assess which metabolic fluxes are more likely to change significantly between two different strains or conditions. By comparing the obtained list of significantly perturbed metabolic fluxes with the differentially expressed genes, it is possible to obtain a list of transcriptionally regulated metabolic reactions (in which both the reaction rate and the expression level change in the same direction). This list is typically a relatively small subset of all the differentially expressed metabolic genes and can be used to identify transcription factors responsible for the observed physiological changes by performing a simple hypergeometric statistical test on the list of transcriptionally regulated reactions.

In order to assess how likely it is that a metabolic flux changes between two different strains or conditions, a set of flux distributions consistent with the observed phenotype (growth rate, substrate uptake rate, and product secretion rates) is obtained for each of the strains or conditions of interest.
This can be done using the random sampling algorithm [18], which, after fixing the experimentally measured fluxes (typically uptake and secretion fluxes, and also fluxes in the central carbon metabolism if those have been measured using 13C labeling), optimizes a set of different objective functions in order to get a set of possible flux distributions consistent with the observations. The COBRA Toolbox can be used to solve the optimization problems. Here we transcribe a MATLAB® code to perform the described random sampling.

    function Output = MuestreoCB(model)
    % Sample flux distributions by maximizing each reaction flux in turn.
    % model: COBRA Toolbox model structure with the measured fluxes fixed.
    S = [];
    model.c = zeros(numel(model.c), 1);      % clear any pre-existing objective
    for i = 1:numel(model.c)
        model.c(i) = 1;                      % objective: flux of reaction i only
        solution = optimizeCbModel(model);
        if max(solution.x) > 1.5             % keep only solutions passing this filter
            S = [S solution.x];              % store the flux vector as a new column
        end
        model.c = zeros(numel(model.c), 1);  % reset the objective
    end
    Output = S;

In order to run this code it is important to first set the default upper and lower bounds (which are normally 1,000 and −1,000) to Inf and −Inf, respectively. This avoids solutions that contain unfeasible loops in a very simple manner. The input to the function is the model structure as it is defined in the COBRA Toolbox (with the measured fluxes set to their experimental values). The output is a matrix containing a set of possible solutions, one per column. The COBRA Toolbox also includes a function to perform random sampling using the Hit and Run algorithm; however, this method seems to underestimate the standard deviation of the fluxes [18]. This underestimation can be observed experimentally by comparing average fluxes predicted with the random sampling with their real values measured with 13C labeling [18].

Once we have a set of putative flux distributions for each strain or condition, we can obtain a statistical score indicating how significant the possible change of each metabolic flux between the compared conditions is. This score can be obtained in two different ways (A and B below).

(A) If it is supposed that the difference between the real metabolic fluxes (which are unknown) and the average fluxes obtained with the random sampling is normally distributed, we can compute the difference between the average fluxes in the two considered conditions and normalize it by the standard deviations of the flux in both conditions [18]. From the obtained Z-score (Eq. 2) it is possible to compute the probability of each flux changing between conditions just by using the normal distribution. The symbol μ is the average flux of a reaction in a particular condition, and the symbol σ is its standard deviation.

$$Z_i = \frac{\mu_2(v_i) - \mu_1(v_i)}{\sqrt{\sigma_1(v_i)^2 + \sigma_2(v_i)^2}} \qquad (2)$$
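A minimal Python sketch of approach A is given below. It assumes that the sampled fluxes of one reaction are available as two plain lists (one per condition), uses the population variance of the samples, and converts the Z-score into a probability through the standard normal distribution. The function names, the sample values, and the two-sided reading of the "probability of changing" are assumptions made for illustration.

```python
import math
import statistics


def flux_z_score(samples_1, samples_2):
    """Z-score of one reaction from its sampled fluxes in conditions 1 and 2 (Eq. 2)."""
    mu_1 = statistics.mean(samples_1)
    mu_2 = statistics.mean(samples_2)
    var_1 = statistics.pvariance(samples_1)
    var_2 = statistics.pvariance(samples_2)
    # Undefined if the flux does not vary at all in either condition.
    return (mu_2 - mu_1) / math.sqrt(var_1 + var_2)


def prob_changed(z):
    """Standard normal probability mass within |z| (two-sided reading)."""
    return math.erf(abs(z) / math.sqrt(2.0))


# Hypothetical sampled fluxes of a single reaction, not real data.
condition_1 = [1.0, 2.0, 3.0]
condition_2 = [4.0, 5.0, 6.0]
z = flux_z_score(condition_1, condition_2)
p = prob_changed(z)
```

With these values the means are 2 and 5 and each variance is 2/3, so Z is 3 divided by the square root of 4/3, roughly 2.6.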

(B) An approach that does not assume a normal distribution uses the number of pairs of solutions in which the flux in the second condition is higher than the flux in the first condition, divided by the total number of pairs of solutions, as an estimate of the probability of the real flux being higher in the second condition. This method is less stringent than the first one and leads to more reactions being classified as transcriptionally regulated. If we are comparing strains that are relatively similar in physiology, it is better to use the second approach in order to improve the sensitivity of the method.

Once a set of transcriptionally regulated reactions has been identified, and if the transcription factors binding each metabolic gene are known (as is the case, for example, for S. cerevisiae), a hypergeometric enrichment test can be performed to identify transcription factors regulating a high proportion of the transcriptionally regulated reactions. This provides very valuable information for understanding the molecular mechanisms responsible for the physiological differences between the compared strains or growth conditions.
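Approach B and the enrichment test can both be sketched with the Python standard library. The pairwise estimate counts all combinations of one solution from each condition, and the enrichment p-value is the upper tail of the hypergeometric distribution. The function names, argument names, and example numbers are hypothetical, and the choice of the gene universe is left to the user.

```python
from itertools import product
from math import comb


def prob_flux_higher(samples_1, samples_2):
    """Fraction of solution pairs in which the flux is higher in condition 2."""
    pairs = list(product(samples_1, samples_2))
    return sum(1 for a, b in pairs if b > a) / len(pairs)


def hypergeom_enrichment_p(n_total, n_tf_targets, n_regulated, n_overlap):
    """P(X >= n_overlap) when n_regulated genes are drawn from n_total genes,
    n_tf_targets of which are targets of the transcription factor."""
    upper = min(n_tf_targets, n_regulated)
    tail = sum(
        comb(n_tf_targets, k) * comb(n_total - n_tf_targets, n_regulated - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_regulated)


# Hypothetical sampled fluxes of one reaction in two conditions.
p_higher = prob_flux_higher([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])

# Hypothetical counts: 10 metabolic genes, 4 of them targets of a transcription
# factor, 3 transcriptionally regulated genes, 2 of which are targets.
p_enrichment = hypergeom_enrichment_p(10, 4, 3, 2)
```

In this example 6 of the 9 solution pairs have a higher flux in the second condition, and the enrichment p-value is 40/120, so this small illustrative overlap would not count as significant.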

7 Conclusions

Genome-scale metabolic models are now available for several yeast species and it is likely that the number of these species will grow substantially in the near future. There exist today several software tools that allow semi-automatic reconstruction of genome-scale metabolic models. There are also algorithms that can be used to find nontrivial metabolic engineering strategies and allow the interpretation of high-throughput omics data. Genome-scale metabolic models are an important step towards a holistic understanding of the cell function and we can expect their predictive power to increase in the future, thanks to their integration with regulatory networks.

References

1. Österlund T, Nookaew I, Nielsen J (2011) Fifteen years of large scale metabolic modeling of yeast: developments and impacts. Biotechnol Adv 30:979–988
2. Sohn SB, Kim TY, Lee JH, Lee SY (2012) Genome-scale metabolic model of the fission yeast Schizosaccharomyces pombe and the reconciliation of in silico/in vivo mutant growth. BMC Syst Biol 6:49
3. Sohn SB, Graf AB, Kim TY, Gasser B, Maurer M, Ferrer P, Mattanovich D, Lee SY (2010) Genome-scale metabolic model of methylotrophic yeast Pichia pastoris and its use for in silico analysis of heterologous protein production. Biotechnol J 5:705–715
4. Caspeta L, Shoaie S, Agren R, Nookaew I, Nielsen J (2012) Genome-scale metabolic reconstructions of Pichia stipitis and Pichia pastoris and in silico evaluation of their potentials. BMC Syst Biol 6:24
5. Loira N, Dulermo T, Nicaud J-M, Sherman DJ (2012) A genome-scale metabolic model of the lipid-accumulating yeast Yarrowia lipolytica. BMC Syst Biol 6:35
6. Förster J, Famili I, Fu P, Palsson BO, Nielsen J (2003) Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res 13:244
7. Agren R et al (2013) The RAVEN toolbox and its use for generating a genome-scale metabolic model for Penicillium chrysogenum. PLoS Comput Biol 9(3):e1002980
8. Hucka M et al (2003) The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models. Bioinformatics 19:524–531
9. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
10. Eddy SR (1998) Profile hidden Markov models. Bioinformatics 14:755–763
11. Schellenberger J et al (2011) Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nat Protoc 6:1290–1307
12. Thiele I, Palsson BO (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121
13. Jankowski MD, Henry CS, Broadbelt LJ, Hatzimanikatis V (2008) Group contribution method for thermodynamic analysis of complex metabolic networks. Biophys J 95:1487–1499
14. Burgard AP, Pharkya P, Maranas CD (2003) OptKnock: a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84:647–657
15. Patil K, Rocha I, Förster J, Nielsen J (2005) Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinformatics 6:308
16. Lun DS et al (2009) Large-scale identification of genetic design strategies using local search. Mol Syst Biol 5:296
17. Hatzimanikatis V et al (2005) Exploring the diversity of complex metabolic networks. Bioinformatics 21:1603–1609
18. Bordel S, Agren R, Nielsen J (2010) Sampling the solution space in genome-scale metabolic networks reveals transcriptional regulation in key enzymes. PLoS Comput Biol 6:e1000859

Chapter 17
Model-Guided Identification of Gene Deletion Targets for Metabolic Engineering in Saccharomyces cerevisiae
Ana Rita Brochado and Kiran Raosaheb Patil

Abstract
Identification of metabolic engineering strategies for rerouting intracellular fluxes towards a desired product is often a challenging task owing to the topological and regulatory complexity of metabolic networks. Genome-scale metabolic models help tackle this complexity through systematic consideration of mass balance and reaction directionality constraints over the entire network. Here, we describe how genome-scale metabolic models can be used for identifying gene deletion targets leading to increased production of the desired product. Vanillin production in Saccharomyces cerevisiae is used as a case study throughout this chapter.

Key words Metabolic engineering, Genome-scale metabolic reconstructions, Flux balance analysis, OptGene, Minimization of metabolic adjustment, Minimization of metabolite balance

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_17, © Springer Science+Business Media, LLC 2014

1 Introduction

Genome-scale metabolic reconstructions are comprehensive representations of the cellular metabolic network of a given organism, encompassing information on all metabolic reactions in terms of their stoichiometry, reversibility, localization, cofactor usage, and gene–protein-reaction associations [1–3]. Metabolic reconstructions constitute powerful tools for systems biology with a variety of applications, which range from integrative data analysis to modeling of genotype–phenotype relationships [3–8]. One of the fields benefiting from the applied potential of genome-scale metabolic models is metabolic engineering, where the aim is to genetically engineer a microbial cell factory towards obtaining a desired metabolic phenotype [9–12].

A particular advantage offered by the use of genome-scale metabolic models is the possibility to systematically account for the high network interconnectivity that is inherent to metabolism. This accounting is achieved through mass balance constraints over all intracellular metabolites, constraints on the uptake and production rates of exchanged metabolites, and directionality constraints over the fluxes. Collectively, these constraints define the feasible space for the flux phenotype of the cell [13]. Assuming a constant growth rate (pseudo-steady state), the problem of identifying a biologically relevant flux distribution, which reflects the metabolic phenotype, can be formulated as a linear optimization problem. The ability of the models to predict meaningful metabolic phenotypes stems from the use of a biologically relevant objective function (needed for solving the optimization problem), which for some microorganisms has been proposed to be maximization of growth. The underlying assumption is that the organism has evolved metabolic regulation such that the resulting flux distribution is geared for supporting maximal growth. This approach was initially termed Flux Balance Analysis (FBA) [13–15]. FBA and allied methods (referred to as constraint-based methods) have been successfully used for phenotype predictions, particularly in the case of Escherichia coli and Saccharomyces cerevisiae [16, 17].

The use of metabolic modeling for metabolic engineering relies on the possibility of simulating metabolic phenotypes upon genetic perturbation in a relatively simple and fast manner. Simulating the deletion of an enzyme-coding gene modifies the constraints imposed on the metabolic network, therefore altering the feasible space and potentially the optimal solution. Consider removal of a reaction that is the only reaction producing an essential amino acid. Removing this reaction will render the in silico growth infeasible, and the model would thereby predict the essentiality of the corresponding gene. Analogously, the removal of other reactions may turn out to increase the production of the desired product (see Subheading 2.4).
The ability of metabolic models to predict phenotypes following gene knockouts can be exploited for metabolic engineering target identification in various ways (see Note 1) (reviewed in [18]). As growth is an inherent part of any cellular biological objective function, metabolic networks are often finely tuned towards producing biomass precursors. For metabolic engineering, such well-developed mechanisms for resource allocation bring difficulties, as any product formation pathway often competes with growth in terms of mass and energy requirements. In this chapter, we focus on a bi-level optimization approach in which the goal is to identify a set of knockouts that will couple the biological objective (e.g., growth, see Subheading 3.1) with the design objective (e.g., product yield) [19, 20]. This approach to model-guided metabolic engineering can be applied to large-scale networks and has already led to the successful assembly of yeast cell factories, namely, for the production of sesquiterpenes [21] and vanillin [22]. The overall workflow for using genome-scale metabolic models to identify metabolic engineering strategies involves multiple steps, several of which need problem-specific considerations (Fig. 1).

Model-Guided Metabolic Engineering

[Figure 1. Workflow steps, with examples of the corresponding resources/tools in parentheses: Selection of the metabolic network reconstruction (iFF708, iAZ900, ...); Selection of the objective function (FBA, MiMBl, ...); Deciding relevant physiological constraints (aerobic, anaerobic, ...); Shortlisting the potential targets (remove essential genes, ...); Simulations and final ranking of the targets (OptKnock, OptGene, ...)]

Fig. 1 Schematic representation of the general workflow for designing metabolic engineering strategies described in this chapter. For each depicted step (dark grey), one or two examples (light grey) of the corresponding resources/tools etc. are given

2 Materials

2.1 Yeast Genome-Scale Metabolic Reconstructions

To date, more than ten high-quality genome-scale metabolic reconstructions are available for S. cerevisiae, most of which have been manually curated to some extent [8]. Choosing the best-suited model for the application at hand is one of the first questions faced by the modeler. The available network reconstructions differ in various aspects, including pathway coverage and comprehensiveness. The first-generation genome-scale reconstructions, iFF708 [23] and iLL672 [24], contain approximately 700 genes, 1,200 reactions and 600 metabolites, distributed between two cellular compartments: cytoplasm and mitochondrion. More recent models, e.g., iMM904 [25], iAZ900 [26], or Yeast 5 [27], contain ~900 genes, 2,000 reactions and >1,500 metabolites, distributed between up to ten cellular compartments.

The methods for building genome-scale metabolic reconstructions are well established and extensively reviewed [1, 2, 7]. Thus, only a general overview of the information used in the reconstruction process is provided here, which may be useful in interpreting the results from the modeling tools described in this chapter.

1. Functional annotation of the genome sequence of the organism, in particular annotations for the enzyme-coding genes.
2. Biochemical evidence from various published sources, including databases, textbooks and scientific articles. Reaction-specific information that is needed for comprehensive reconstruction includes EC number, localization, associated gene(s) and reversibility.

In the case of S. cerevisiae, the availability of a large amount of biochemical, genomic, and physiological data has led to high-quality reconstructions.

The reconstruction is converted into a mathematical model that allows simulation of metabolic fluxes by using constraint-based optimization approaches. This process often requires network completion through gap-filling, wherein several reactions that are not enzyme-catalyzed, or that lack a known enzyme/gene association, are introduced in order to obtain a functional model [28, 29]. A functional model is one that is capable of simulating biomass formation and other physiological behaviors, such as ethanol production. Physiological/biochemical evidence for the added reactions is frequently available, e.g., diffusion of ethanol across the membrane [30]. In the case of S. cerevisiae, phenotypic data from large-scale single/double gene deletion studies have also been used to curate or validate the models [26]. Thanks to the extensive data available for reconstruction, the sensitivity of gene essentiality prediction is typically above 90 % [26, 27].

2.2 Selecting a Suitable Metabolic Reconstruction

Various general features, such as the number of reactions, metabolites, genes and compartments, are well described for each of the available models. The biomass equation used and the model performance in predicting gene essentiality (a commonly used model validation criterion) are also well described. Consideration of these parameters alone, however, is often insufficient to ensure that the model is adequate for the metabolic engineering problem at hand, which demands good predictions of metabolic physiology and thereby of intracellular fluxes. We recommend the use of models that have been validated with physiological data other than viability assays, e.g., models that can describe observed growth and by-product formation in continuous cultures at various specific growth rates. Constraining the model with glucose and oxygen uptake rates should be sufficient to obtain predictions of the main products being formed. For yeast, these products are typically biomass, ethanol, glycerol, and carbon dioxide in respiro-fermentative growth mode (batch cultivations, glucose-excess chemostats, etc.). See Notes 2–5 for recommendations regarding the currently available models. For new models that might become available in the future, the same physiology prediction criteria can be readily applied. It should be noted that a reconstruction that is suited for in silico metabolic engineering may not be the best choice for other applications, e.g., data integration, where comprehensiveness might be more important than physiological considerations.
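The physiological validation recommended above can be sketched as a simple comparison between predicted and measured exchange rates. All numbers below are hypothetical placeholders rather than measurements or model outputs, and the 20 % tolerance and the function names are arbitrary assumptions made for illustration.

```python
def relative_deviations(predicted, measured):
    """Relative deviation of each predicted rate from its measured value."""
    deviations = {}
    for name, observed in measured.items():
        # Fall back to the absolute deviation when the measured rate is zero.
        scale = abs(observed) if observed != 0 else 1.0
        deviations[name] = abs(predicted[name] - observed) / scale
    return deviations


def poorly_predicted(predicted, measured, tolerance=0.2):
    """Names of the rates whose deviation exceeds the chosen tolerance."""
    deviations = relative_deviations(predicted, measured)
    return sorted(name for name, dev in deviations.items() if dev > tolerance)


# Hypothetical rates (mmol/gDW/h; growth in 1/h), for illustration only.
measured = {"ethanol": 20.0, "glycerol": 2.0, "co2": 22.0, "growth": 0.30}
predicted = {"ethanol": 21.0, "glycerol": 1.0, "co2": 23.0, "growth": 0.33}

print(poorly_predicted(predicted, measured))
```

With these placeholder values only the glycerol secretion rate falls outside the tolerance, which would prompt a closer look at how the candidate model handles redox balancing before using it for target prediction.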


2.3 Inserting New Metabolic Pathways into the Model (Heterologous Products)


Production of heterologous compounds is often the aim of metabolic engineering projects. Assuming that the biosynthetic pathway leading to the desired product is already known, the first in silico task is to add the heterologous pathway to the metabolic reconstruction. Listed below are some factors that might need special attention (also see Note 6).

1. Stoichiometry and cofactor usage of all new reactions.
2. Biochemical evidence concerning the reversibility of the reactions.
3. Compartmentalization of the reactions/metabolites.

In cases where the objective of modeling is to identify a novel metabolic pathway leading to a new compound, or to find an alternative pathway that is better suited to the host (e.g., due to less toxic intermediates), several available algorithms can be used [18, 27].
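Two of the checks implied by the list above, stoichiometric consistency and consistent metabolite naming (see Note 6), can be sketched as follows. The metabolite identifiers, the bracketed compartment suffix, and the function names are hypothetical and do not follow any particular reconstruction; the dehydration of 3-dehydroshikimate to protocatechuate is used purely as an illustrative reaction, with neutral formulas assumed.

```python
import re
from collections import Counter

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count the atoms in a simple formula such as 'C7H8O5' (no brackets)."""
    counts = Counter()
    for element, number in FORMULA_TOKEN.findall(formula):
        counts[element] += int(number) if number else 1
    return counts


def element_imbalance(stoichiometry, formulas):
    """Net atom count per element; an empty dict means the reaction balances."""
    net = Counter()
    for metabolite, coefficient in stoichiometry.items():
        for element, n in parse_formula(formulas[metabolite]).items():
            net[element] += coefficient * n
    return {element: n for element, n in net.items() if n != 0}


def unknown_metabolites(stoichiometry, model_metabolites):
    """Identifiers used by the new reaction that the model does not contain."""
    return sorted(m for m in stoichiometry if m not in model_metabolites)


# Hypothetical identifiers; negative coefficients are substrates.
FORMULAS = {"3dhsk[c]": "C7H8O5", "pca[c]": "C7H6O4", "h2o[c]": "H2O"}
dehydratase = {"3dhsk[c]": -1, "pca[c]": 1, "h2o[c]": 1}
missing_water = {"3dhsk[c]": -1, "pca[c]": 1}

print(element_imbalance(dehydratase, FORMULAS))    # balanced reaction
print(element_imbalance(missing_water, FORMULAS))  # water was forgotten
# Metabolites already present in the (hypothetical) host model.
print(unknown_metabolites(dehydratase, {"3dhsk[c]", "h2o[c]"}))
```

An identifier reported by `unknown_metabolites` is either a genuinely new metabolite that must be added to the model or a spelling that departs from the model's own nomenclature, and the two cases need to be told apart by hand.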

2.4 Metabolic Modeling for Metabolic Engineering Applications

There are a few metabolic modeling simulation platforms for metabolic engineering (reviewed in [18]). We will focus on modeling platforms that are based on the growth-product coupling philosophy, first proposed by Burgard et al. in 2003 as OptKnock. OptKnock is a bi-level optimization approach that aims at finding gene deletion targets that couple product formation with biomass formation [20]. Thus, if successful, the resulting engineered strain would produce the desired compound (design objective) as a byproduct of biomass formation (cellular objective), and not as a competitor [20]. Among other simulation tools based on the growth-product coupling approach, a relevant example is OptGene, which allows the use of nonlinear design objective functions such as productivity [19]. OptGene is also flexible in terms of the use of biological objective functions other than growth, e.g., minimization of metabolic adjustment, as has been previously used in the model-aided construction of yeast cell factories [19].

3 Methods

Having selected a suitable metabolic network reconstruction and the simulation platform, three major questions remain to be addressed before proceeding to the identification of metabolic engineering targets (Fig. 1): (1) what objective function to use to simulate the cellular phenotype? (2) what physiological constraints to apply? and (3) how to preprocess the network and assemble the list of potential targets?

3.1 Choosing the Biological Objective Function

The choice of the biological objective function is fundamental for obtaining meaningful solutions. Maximization of growth, sometimes simply referred to as FBA (though FBA has a much broader meaning and scope), is one of the most widely used objective functions. Although growth maximization is an elegant way to mathematically formulate complex evolutionary principles, the underlying assumption of optimized growth implies that the microbe at hand has been subjected to evolutionary pressure for growth. Such an assumption will hold true mostly in the case of "wild-type" strains. In the case of mutant strains, adaptive evolution may be necessary before the fluxes are reorganized to support optimal growth [31].

In order to better predict the phenotypes of perturbed metabolic networks, i.e., after genetic manipulation (e.g., gene deletion) and without adaptive evolution, alternative biological principles have been suggested. Instead of optimal growth, proximity of the mutant's metabolism to that of the wild type is the hypothesis behind the currently used objective functions [32–34]. Experimental studies, including our previous work on vanillin production in S. cerevisiae, have demonstrated the usefulness of this hypothesis. Minimization of Metabolites Balance (MiMBl) is a recently published example of an objective function stemming from this hypothesis. MiMBl has contributed to the increased capability of metabolic models to predict genetic interactions in yeast [34]. This result is particularly important for metabolic engineering, as multiple genetic manipulations are often necessary for successful strain improvement. Although growth and minimization of metabolic adjustment are the objective functions most commonly applied for metabolic engineering, several others have also been proposed, e.g., Regulatory On/Off Minimization of metabolic flux changes (ROOM) and maximization of ATP yield [18, 33, 35].

3.2 Gathering Constraints to Be Used in Simulations

Choosing appropriate physiological constraints for modeling is a determining factor, as the predicted targets can change depending on the constraints used. Typically, one imposes constraints based on experimental data on the uptake/secretion of metabolites. For S. cerevisiae, exchange rates of key metabolites, such as glucose, carbon dioxide, oxygen and ethanol, are often available from the literature. It is important to verify that the constraints used for modeling mimic physiologically relevant situations, e.g., whether the cultivation conditions will induce respiratory, fermentative or respiro-fermentative metabolism (see Note 7). In the case of MiMBl, or other methods that minimize metabolic adjustment, the physiological constraints are useful not only in constraining the mutant simulations but also in obtaining the wild-type/reference flux distribution. The latter is often obtained by using FBA. Thus, the physiological constraints used will also influence the predicted phenotypes of the knockout strains and hence the identified targets for improved production (see Note 8). It is therefore important to use as much experimental data as possible to obtain a meaningful wild-type/reference flux distribution. In addition to the exchange rates of the secreted/consumed compounds, in some cases it might be possible to further constrain the reference flux distribution by using 13C-labeling experiments [36]. Such data represent a significant advantage, since they can substantially decrease the solution space delimited by the metabolic network, and thereby result in a more reliable intracellular flux distribution. It should be noted that even when several empirical constraints are used, the FBA solution for the reference flux distribution might not be unique, and it is therefore recommended to run the simulations by using several alternative optimal reference flux distributions (see Note 9).

In addition to flux measurements, transcriptomic, proteomic, and/or metabolomic data are becoming available for S. cerevisiae and can be valuable in modeling. Translation of such molecular abundance data into model constraints is, however, still a largely open question. We will not elaborate further on this topic here, and instead refer the reader to a recent review [37].

3.3 Model Preprocessing and Shortlisting of Target Genes

Model preprocessing and the compilation of a list of potential targets for genetic manipulation is the last step before starting the model simulations. This step is critical for: (1) reducing the chances of obtaining biologically implausible solutions; and (2) improving the speed of simulations. Starting with all genes in the network, the following can be removed from the list of potential targets:

1. Genes that are essential in vivo or known to lead to severe growth defects. Note that several such genes would be false positives in silico and hence would not be automatically discarded during simulations.
2. Genes that are predicted to be essential by using FBA. Some of these genes might not be essential in vivo. Simulation of any mutant that includes deletion of one or more of these genes will result in a zero growth prediction.
3. Genes that code for enzymes catalyzing reactions predicted to be always inactive under the simulated conditions (e.g., dead-end reactions). Deletion of these genes will not alter the phenotype prediction. A set of such genes can be found by using flux variability analysis (FVA) [38].

Finally, note the representation of isoenzymes in the metabolic network. In some models these are described as multiple identical reactions, each associated with a single gene, while in others a single reaction is associated with multiple genes through an "or" relationship. Currently, there are no generally applicable methods for modeling the contribution of each isoenzyme to the overall flux through the reaction. In the absence of such constraints, deletion of all isoenzymes is necessary in silico for simulating the inactivation of the corresponding reaction. Consequently, if the removal of a reaction is beneficial for product formation, such a strategy will be identified only when searching for a higher number of gene knockouts. If the inclusion of genes corresponding to isoenzymes in the list of potential targets is intended, we suggest the following:

1. Removal of duplicate reactions from the network; and/or
2. Changing the gene–reaction association from "or" to "and", so as to enforce the suppression of flux through the reaction even when only one of the genes is deleted.

A strategy involving isoenzymes was used in the vanillin case study [22]; PDC1, a gene with two other structural isoenzymes, was thereby identified as a deletion target and led to an improvement in vanillin production. See Note 10 for further considerations regarding network preprocessing and selection of deletion targets.

3.4 Performing Simulations

Several software platforms are available for carrying out metabolic engineering simulations, as well as more general constraint-based modeling routines, such as performing FBA or MiMBl. Among the main tools for target identification are OptKnock and OptGene. An example workflow showing the major steps involved when using OptGene is shown in Fig. 2. Following is a short list of selected tools, most of which are available online for download or as Web-based applications.

1. OptFlux, an open-source software platform for in silico metabolic engineering [39]. Among other functionalities, it allows performing OptKnock and OptGene simulations. http://www.optflux.org
2. COBRA Toolbox (v2.0), a comprehensive collection of tools and functions for constraint-based reconstruction and analysis of metabolic networks [40]. The available tools include FBA, MoMA, OptGene and OptKnock, among others. It is available as a MATLAB toolbox and also as a Python application. http://opencobra.sourceforge.net/openCOBRA/Welcome.html
3. BioMet, a Web-based toolbox for genome-wide analysis of metabolism [41]. It allows integration of high-throughput data with flux analysis. An FBA tool and a selection of yeast genome-scale metabolic reconstructions can also be accessed. http://129.16.106.142
4. MiMBl webserver, a Web-based tool for simulation of perturbed metabolic networks using MiMBl [34]. http://www.patil.embl.de
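Before running any of these tools, the preprocessing described in Subheading 3.3 can be prototyped in a few lines. The sketch below is independent of the listed platforms: the placeholder gene names and function names are hypothetical, and PDC5 and PDC6 are named on the assumption that they are the two other structural isoenzymes of PDC1 referred to in the text.

```python
from itertools import combinations


def shortlist(all_genes, in_vivo_essential, in_silico_essential, blocked):
    """Candidate targets left after removing the three gene classes."""
    excluded = set(in_vivo_essential) | set(in_silico_essential) | set(blocked)
    return sorted(set(all_genes) - excluded)


def reaction_active(isoenzyme_genes, deleted, rule="or"):
    """Evaluate a gene-reaction association for one reaction.

    rule="or":  the reaction stays active while any isoenzyme gene remains.
    rule="and": the reaction is switched off as soon as one gene is deleted.
    """
    present = [gene not in deleted for gene in isoenzyme_genes]
    return any(present) if rule == "or" else all(present)


def smallest_inactivating_deletion(isoenzyme_genes, rule):
    """Smallest gene set whose deletion inactivates the reaction in silico."""
    for size in range(1, len(isoenzyme_genes) + 1):
        for combo in combinations(isoenzyme_genes, size):
            if not reaction_active(isoenzyme_genes, set(combo), rule):
                return combo
    return None


# Placeholder gene names, for illustration only.
candidates = shortlist(
    all_genes=["G1", "G2", "G3", "G4", "G5"],
    in_vivo_essential=["G1"],
    in_silico_essential=["G1", "G2"],
    blocked=["G5"],
)
print(candidates)

pyruvate_decarboxylase = ["PDC1", "PDC5", "PDC6"]
print(smallest_inactivating_deletion(pyruvate_decarboxylase, "or"))
print(smallest_inactivating_deletion(pyruvate_decarboxylase, "and"))
```

Under the "or" rule the reaction is only removed by a triple deletion, so it would surface only in searches allowing at least three knockouts, whereas the "and" rule exposes it already in single-gene searches, at the cost of overstating the effect of each individual deletion.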

[Figure 2. Flowchart elements: Reference physiological constraints; FBA; Reference flux distribution; Check for uniqueness; FVA; OptGene; MiMBl; List of metabolic engineering strategies; Ranking; Final list of metabolic engineering targets. An inset lists example gene labels (GLK1, PGI1, PFK2, FBA1, TPI1, TDH1, PGK1, GPM1, ENO1, CDC19, PDA1) against an axis with ticks at 20, 10, 0, 10, 20.]

Fig. 2 Illustration of the steps involved in designing metabolic engineering strategies by using OptGene with MiMBl as the biological objective function. Circles represent methods/procedures/algorithms. Rounded rectangles represent input and/or output to the procedures. Full arrows mark required steps, while the dashed arrows symbolize optional but recommended steps. Note that, in comparison to FBA, the use of MiMBl requires an additional step for obtaining the reference flux distribution, and hence it entails a more complex workflow. We recommend that the modeler perform simulations by using several alternative reference flux distributions when using FBA to generate these, since the alternative optimal solutions can lead to different metabolic engineering targets

3.5 Ranking of Identified Targets

Throughout this chapter, we have suggested that the modeler explore, in silico, several hypothetical physiological scenarios, as well as try different objective functions, depending on the problem at hand. Such a procedure usually yields several sets of gene deletion targets potentially leading to the desired phenotypic trait. Manual inspection and screening of these results is decisive for which of the model-suggested strategies should proceed to experimental verification. Owing to the complexity of the genotype–phenotype relationship, there is no single protocol for performing such screening, but several guidelines can be used.

1. Check for uniqueness of the solution by using flux variability analysis (see Notes 9 and 11).
2. If data are available, check that mutants manipulated in the corresponding gene(s) have not been reported to display severe growth defects or other undesirable traits.
3. If the target gene set includes isoenzymes, check the experimentally observed phenotype of the individual genes and accordingly select the gene(s) that will most likely result in lower flux, without leading to severe growth defects. In some cases, it might be necessary to delete all genes coding for the respective isoenzymes.


Criteria other than growth or product yield, possibly better suited to the problem at hand, can also be used to rank the strategies identified in silico. One example is presented in Note 12.
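The ranking criterion of Note 12, the Reward-Risk-Ratio, is the product flux multiplied by the biomass flux and divided by the metabolic adjustment. A minimal sketch of such a ranking follows; the strategy labels and all flux values are hypothetical placeholders, and how the metabolic adjustment is computed depends on the simulation method and is simply assumed to be given.

```python
def reward_risk_ratio(product_flux, biomass_flux, metabolic_adjustment):
    """Product flux times biomass flux, divided by the metabolic adjustment."""
    if metabolic_adjustment <= 0:
        # A zero adjustment corresponds to the unperturbed reference itself.
        raise ValueError("metabolic adjustment must be positive for a mutant")
    return product_flux * biomass_flux / metabolic_adjustment


# Hypothetical simulation results for three deletion strategies.
strategies = {
    "delta_X": {"product_flux": 0.5, "biomass_flux": 0.08,
                "metabolic_adjustment": 2.0},
    "delta_Y": {"product_flux": 0.3, "biomass_flux": 0.10,
                "metabolic_adjustment": 0.5},
    "delta_X_delta_Y": {"product_flux": 0.8, "biomass_flux": 0.05,
                        "metabolic_adjustment": 4.0},
}

ranked = sorted(strategies,
                key=lambda name: reward_risk_ratio(**strategies[name]),
                reverse=True)
print(ranked)
```

With these placeholder values the double deletion has the highest product flux but ranks last, because it combines the lowest growth with the largest predicted departure from the reference flux distribution.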

4 Notes

1. The modeling methods discussed here assume that there are no limitations in the availability of active enzymes, especially in the pathway of interest. If such limitations exist, the model-suggested targets may not yield the expected results.
2. Besides their gene/reaction/metabolite coverage, the reconstructions often differ in their assumptions about biomass composition, which is summarized in the biomass formation reaction. For example, iLL672, which is very similar to iFF708 in terms of gene/reaction/metabolite coverage, has "noncanonical" constituents in the biomass composition, e.g., pyruvate [24]. Although such differences do not imply a substantially different macromolecular biomass composition (e.g., lipid and protein content), they can lead to different phenotypic predictions, typically resulting in improved gene essentiality predictions [24, 26]. On the other hand, inclusion of some noncanonical constituents can also lead to false negative predictions when considering a higher number of gene knockouts, and it may be necessary to review the biomass equation in some cases.
3. In addition to growth and product flux, it is advisable to check which by-products are being formed (e.g., ethanol) and whether the observed by-product profile is in agreement with known yeast physiology.
4. Most of the time, the data available for modeling are the rates of uptake and secretion of a few metabolites, e.g., glucose, oxygen, carbon dioxide, ethanol, glycerol, acetate and pyruvate. Constraints on the corresponding fluxes provide a good starting point for modeling. However, multiple optimal solutions often exist owing to the complexity of the metabolic networks. The variability in intracellular fluxes due to the alternative solutions can be particularly large in models involving multiple compartments. In order to reduce the consequent uncertainty in metabolic engineering simulations, one plausible option is to use smaller models such as iFF708 or iLL672, which have been curated with physiological considerations. Smaller models can compensate for their lower coverage and comprehensiveness by reducing the variability while still capturing the major flux routes.
5. In cases where the product of interest is produced in a compartment other than the cytosol or mitochondrion, a model with comprehensive compartmentalization information will, of course, be necessary.


6. Currently there is no established standard nomenclature for naming metabolites or reactions in the model. Therefore, when adding a new pathway, the nomenclature already used in the model should be followed.
7. Glucose (or another carbon source) is often the limiting nutrient for yeast cultivations in many laboratory and industrial setups. This physiological condition should be correctly reflected in the model constraints. Although there is no guarantee that the uptake of glucose will remain constant after genetic manipulations, this parameter is often constrained to its wild-type value. It has often been observed that substrate uptake rates in mutants are lower than in the wild-type/reference strain. However, the current models cannot quantitatively predict this decrease. It is therefore recommended to interpret the model predictions in terms of product yields rather than production rates.
8. Minimization of metabolic adjustment was used as the biological objective when designing the metabolic engineering strategy for improving vanillin production in S. cerevisiae [22]. Three different sets of constraints were used in order to span the relevant yeast life-styles: (1) completely respiratory metabolism, characterized by a low glucose uptake rate and no ethanol formation; (2) respiro-fermentative metabolism under aerobic conditions, characterized by a high glucose uptake rate, ethanol formation and active but low respiration, classically observed in batch cultivations; (3) respiro-fermentative metabolism with additional constraints based on the experimentally determined exchange rates of the vanillin-producing reference strain. Major differences in the predicted targets were observed depending on the reference used, especially between the respiratory and respiro-fermentative situations. It is therefore important to consider different metabolic scenarios, as close as possible to the experimental conditions that will be used to grow the strains.
9. An important factor to take into account when using optimization-based approaches for metabolic modeling is the possibility of nonunique solutions, i.e., several alternative flux distributions may exist that lead to the same objective function value. If the objective is growth maximization (FBA), such alternative solutions will correspond to alternative pathways supporting the same growth rate (or biomass yield). When the number of experimental constraints added to the model increases, the solution space is often reduced and the variability due to alternative solutions decreases. For metabolic engineering simulations, alternative optima need to be tackled at two major steps: (1) obtaining the reference flux distribution for MiMBl (or similar algorithms); (2) gene knockout simulations for coupling biomass with product.


In the former, it is recommended first to reduce the alternative solution space by constraining the model with relevant experimental data (see Subheading 3.2), and then to run simulations by using several of the alternative solutions as references. For the latter, it is generally suggested to use the most pessimistic estimate of the product flux (or other design objective value) for any given mutant. Simulation tools such as RobustKnock [42] and tilted objective functions [43] do this intrinsically. Alternatively, one can use flux variability analysis (FVA) [38] in order to verify that the flux leading to the product of interest is unique, or to choose the lower estimate for that mutant. In essence, FVA consists of constraining the objective function value (e.g., growth) to its optimum, and then minimizing and maximizing each flux of interest to identify the lower and upper limits of its range. Fluxes with equal upper and lower limits are the ones that are uniquely determined for that problem.
10. As opposed to simulating the deletion of genes, one can also simulate the "deletion" or suppression of reactions. When a reaction is associated with a single gene, deleting one or the other is equivalent. On the other hand, the in vivo suppression of non-enzyme-catalyzed reactions, or of reactions for which the enzyme-coding genes are not known, might not be possible. Nevertheless, reaction deletion routines can be very useful for gaining insights into the metabolic network and how different reactions affect the production of the desired product.
11. Although MiMBl is less prone to multiple optimal solutions, we suggest performing FVA of the in silico mutants before selecting targets for experimental work.
12. For improving vanillin production in yeast, we used the predicted metabolic adjustment as an additional feature for target ranking.
It was hypothesized that the metabolic distance of the mutant from the wild type is a measure of the regulatory changes that a mutant will need to undergo to achieve the predicted performance. Therefore, we ranked all the strategies based on a Reward-Risk-Ratio, defined as the product of the flux leading to the product and the flux leading to biomass, divided by the metabolic adjustment [22].

References

1. Thiele I, Palsson BØ (2010) A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc 5:93–121
2. Schellenberger J, Park JO, Conrad TM, Palsson BØ (2010) BiGG, a biochemical genetic and genomic knowledgebase of large scale metabolic reconstructions. BMC Bioinformatics 11:213
3. Oberhardt MA, Palsson BØ, Papin JA (2009) Applications of genome-scale metabolic reconstructions. Mol Syst Biol 5:320
4. Usaite R, Jewett MC, Oliveira AP et al (2009) Reconstruction of the yeast Snf1 kinase regulatory network reveals its role as a global energy regulator. Mol Syst Biol 5:319
5. Patil KR, Nielsen J (2005) Uncovering transcriptional regulation of metabolism by using metabolic network topology. Proc Natl Acad Sci U S A 102:2685–2689
6. Covert MW, Knight EM, Reed JL et al (2004) Integrating high-throughput and computational data elucidates bacterial networks. Nature 429:92–96
7. Kim TY, Sohn SB, Kim YB et al (2011) Recent advances in reconstruction and applications of genome-scale metabolic models. Curr Opin Biotechnol 23:1–7
8. Osterlund T, Nookaew I, Nielsen J (2011) Fifteen years of large scale metabolic modeling of yeast, developments and impacts. Biotechnol Adv 30:979–988
9. Stephanopoulos GN, Aristidou AA, Nielsen J (1998) Metabolic engineering, 1st edn. Academic, San Diego, CA
10. Hong K-K, Nielsen J (2012) Metabolic engineering of Saccharomyces cerevisiae, a key cell factory platform for future biorefineries. Cell Mol Life Sci 69:2671–2690
11. Patil KR, Akesson M, Nielsen J (2004) Use of genome-scale microbial models for metabolic engineering. Curr Opin Biotechnol 15:64–69
12. Bailey JE (1991) Toward a science of metabolic engineering. Science 252:1668–1675
13. Varma A, Palsson BØ (1993) Metabolic capabilities of Escherichia coli. I. Synthesis of biosynthetic precursors and cofactors. J Theor Biol 165:477–502
14. Varma A, Palsson BØ (1994) Metabolic flux balancing, basic concepts scientific and practical use. Nat Biotechnol 12:994–998
15. Kauffman KJ, Prakash P, Edwards JS (2003) Advances in flux balance analysis. Curr Opin Biotechnol 14:491–496
16. Edwards JS, Ibarra RU, Palsson BØ (2001) In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data. Nat Biotechnol 19:125–130
17. Famili I, Förster J, Nielsen J, Palsson BØ (2003) Saccharomyces cerevisiae phenotypes can be predicted by using constraint-based analysis of a genome-scale reconstructed metabolic network. Proc Natl Acad Sci U S A 100:13134–13139
18. Zomorrodi AR, Suthers PF, Ranganathan S, Maranas CD (2012) Mathematical optimization applications in metabolic networks. Metab Eng 14:672–686
19. Patil KR, Rocha I, Förster J, Nielsen J (2005) Evolutionary programming as a platform for in silico metabolic engineering. BMC Bioinformatics 6:308
20. Burgard AP, Pharkya P, Maranas CD (2003) Optknock, a bilevel programming framework for identifying gene knockout strategies for microbial strain optimization. Biotechnol Bioeng 84:647–657
21. Asadollahi M, Maury J, Patil KR et al (2009) Enhancing sesquiterpene production in Saccharomyces cerevisiae through in silico driven metabolic engineering. Metab Eng 11:328–334
22. Brochado AR, Matos C, Møller BL et al (2010) Improved vanillin production in baker’s yeast through in silico design. Microb Cell Fact 9:84
23. Förster J, Famili I, Fu P et al (2003) Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res 13:244–253
24. Kuepfer L, Sauer U, Blank LM (2005) Metabolic functions of duplicate genes in Saccharomyces cerevisiae. Genome Res 15:1421–1430
25. Mo ML, Palsson BØ, Herrgård MJ (2009) Connecting extracellular metabolomic measurements to intracellular flux states in yeast. BMC Syst Biol 3:37
26. Zomorrodi AR, Maranas CD (2010) Improving the iMM904 S. cerevisiae metabolic model using essentiality and synthetic lethality data. BMC Syst Biol 4:178
27. Heavner BD, Smallbone K, Barker B et al (2012) Yeast 5 - an expanded reconstruction of the Saccharomyces cerevisiae metabolic network. BMC Syst Biol 6:55
28. Kumar VS, Dasika MS, Maranas CD (2007) Optimization based automated curation of metabolic reconstructions. BMC Bioinformatics 8:212
29. Kumar VS, Maranas CD (2009) GrowMatch, an automated method for reconciling in silico/in vivo growth predictions. PLoS Comput Biol 5:e1000308
30. Guijarro JM, Lagunas R (1984) Saccharomyces cerevisiae does not accumulate ethanol against concentration gradient. J Bacteriol 160:874–878
31. Ibarra RU, Edwards JS, Palsson BØ (2002) Escherichia coli K-12 undergoes adaptive evolution to achieve in silico predicted optimal growth. Nature 420:20–23
32. Segrè D, Vitkup D, Church GM (2002) Analysis of optimality in natural and perturbed metabolic networks. Proc Natl Acad Sci U S A 99:15112–15117
33. Shlomi T, Berkman O, Ruppin E (2005) Regulatory on/off minimization of metabolic flux changes after genetic perturbations. Proc Natl Acad Sci U S A 102:7695–7700
34. Brochado AR, Andrejev S, Maranas CD, Patil KR (2012) Impact of stoichiometry representation on simulation of genotype-phenotype relationships in metabolic networks. PLoS Comput Biol 8:e1002758
35. Schuetz R, Kuepfer L, Sauer U (2007) Systematic evaluation of objective functions for predicting intracellular fluxes in Escherichia coli. Mol Syst Biol 3:119
36. Sauer U (2006) Metabolic networks in motion, 13C-based flux analysis. Mol Syst Biol 2:62
37. Reed JL (2012) Shrinking the metabolic solution space using experimental datasets. PLoS Comput Biol 8:e1002662
38. Mahadevan R, Schilling CH (2003) The effects of alternate optimal solutions in constraint-based genome-scale metabolic models. Metab Eng 5:264–276
39. Rocha I, Maia P, Evangelista P et al (2010) OptFlux, an open-source software platform for in silico metabolic engineering. BMC Syst Biol 4:45
40. Schellenberger J, Que R, Fleming RMT et al (2011) Quantitative prediction of cellular metabolism with constraint-based models, the COBRA Toolbox v2.0. Nat Protoc 6:1290–1307
41. Cvijovic M, Olivares-Hernández R, Agren R et al (2010) BioMet Toolbox, genome-wide analysis of metabolism. Nucleic Acids Res 38:W144–W149
42. Tepper N, Shlomi T (2010) Predicting metabolic engineering knockout strategies for chemical production, accounting for competing pathways. Bioinformatics 26:536–543
43. Feist AM, Zielinski CD, Orth JD et al (2010) Model-driven evaluation of the production potential for growth-coupled products of Escherichia coli. Metab Eng 12:173–186

Part IV Patenting and Regulations

Chapter 18

Patents: A Tool to Bring Innovation from the Lab Bench to the Marketplace

Z. Ying Li and Wolfram Meyer

Abstract

Intellectual property (IP) refers to creations of the mind. Protecting IP through patents is an important avenue for a researcher to reap rewards from his scientific endeavors. It is part of a competitive strategy for bringing one’s invention to the marketplace. Using the US and European patent systems as examples, we provide here an overview of how patents protect innovation, with a focus on biotechnology. We explain what a patent is, what a patent owner can do with a patent, and how patents are granted. The article ends with some recent examples of noteworthy patents in the field of yeast research.

Key words Patents, Applications, European patents, US patents, European Patent Office (EPO), United States Patent and Trademark Office (USPTO), International patent applications, Patent Cooperation Treaty (PCT), World Intellectual Property Organization (WIPO), Yeast, Invention, Innovation, Novelty, Inventive step, Support, Industrial application, Royalties, Licenses, Infringement

1 Patents: Rights and Limitations

Patent protection in the field of microbiology dates back to the nineteenth century. In the 1870s Louis Pasteur received French and US patents on improved methods of brewing beer and ale [1–3]. In the early 1890s Jokichi Takamine patented his improved methods of fermenting rice and soy, the starting materials for many fermentation products ranging from soy sauce and miso to sake [4]. An amylase used in these patented methods, Takadiastase, was one of the first products sold by the pharmaceutical company Parke-Davis, for treating digestive ailments.

So you may ask: what is a patent? A patent is an exclusive right of a limited duration, conferred by a government to the patent owner, to preclude others from making, using, or selling the patented invention without the owner’s permission. A patent gives its owner the right to stop others from commercially exploiting the patented invention. A patent owner, also called a patentee or patent

Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8_18, © Springer Science+Business Media, LLC 2014


proprietor, can be the inventor or the inventor’s assignee (e.g., an employer of an inventor). A person who makes, uses, or sells a patented invention without the owner’s permission is called a patent infringer. A patent, however, does not give its owner the right to practice the patented invention. To do so, the owner will have to make sure that he is not infringing others’ patents, i.e., he has “freedom to operate” (FTO) his own invention (see Note 1). A patent has two important limitations. The first is its territorial limitation—a patent is only effective in the country where it is granted. Thus, when you are deciding where to file for a patent, you should consider where infringement is likely to occur. The second limitation is time. In most countries a patent has a term starting on the date of grant (or issuance) and ending 20 years from the earliest filing date of the application for the patent. For example, if you filed an application for patent in 2010, and were granted a patent in 2013, your patent will have a term of 17 years, going from 2013 to 2030, assuming that all the maintenance fees or annuities are paid. During the term of a patent, a patent owner can enforce his patent right against an infringer. The patent owner may demand, through negotiation and/or a lawsuit, the ending of the infringement, and/or compensation for economic losses caused by the infringement (see Note 2). A patent owner can also, of course, grant someone the right to use the patented invention in exchange for monetary or other benefits (see below).
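The term arithmetic in the example above can be checked with a short calculation. The following is a minimal Python sketch, assuming only the simple rule stated in the text (the term starts at grant and ends 20 years after the earliest filing date). The function names `add_years` and `patent_term` and the specific dates are hypothetical, and real terms may be changed by adjustments, extensions, or unpaid maintenance fees that are not modeled here.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb falls back to 28 Feb in non-leap years."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def patent_term(filing: date, grant: date, term_years: int = 20):
    """Return (start, expiry, enforceable_years) under the simple rule in the text.

    Assumption: the term starts at grant and ends `term_years` after the
    earliest filing date, with all maintenance fees or annuities paid.
    """
    if grant < filing:
        raise ValueError("grant date cannot precede filing date")
    expiry = add_years(filing, term_years)
    enforceable_years = (expiry - grant).days / 365.25
    return grant, expiry, enforceable_years


# Hypothetical dates matching the example in the text: filed 2010, granted 2013.
start, expiry, years = patent_term(date(2010, 6, 1), date(2013, 6, 1))
print(start.year, expiry.year, round(years))  # prints: 2013 2030 17
```

The longer an application spends in examination, the shorter the enforceable period becomes, because the expiry date is fixed by the filing date rather than the grant date.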

2 Why Patents?

Once an invention is made, an inventor can keep it confidential, disclose it to the public (e.g., through publication), or patent it (see Note 3). Obtaining a patent is a major undertaking. It typically takes 3–5 years and costs a small fortune. Nonetheless, there are many reasons why an inventor should obtain a patent, ranging from personal recognition to commercial prospects.

With a patent or a patent application in hand, an inventor can more easily attract funding for starting a company to commercialize the invention, or even just for further research. The inventor can also license a patent or patent application to a company that is interested in commercializing the invention, in exchange for licensing fees, milestones, and royalties. Sometimes your patent can be a “blocking” patent, meaning that a commercial entity cannot do business without infringing your patent; in that case, the company may be interested in negotiating a license agreement with you. A patent or patent application can also be an important asset in a cross-licensing deal, where two parties obtain IP rights from each other to further their respective business goals.


In any event, whether to obtain a patent is often a decision that one will need to discuss with his employer. Many employment contracts dictate that an employee has an obligation to assign (transfer) the right to his invention to the employer.

3 Patent Granting Systems

Patent systems exist in most countries of the world and have evolved tremendously over the past century. Although historically, different countries have had different regimes for granting and enforcing patents, many countries have been working together to harmonize their patent laws through international treaties.

One of the most important international IP treaties is the Paris Convention for the Protection of Industrial Property, signed on 20 March 1883 by 11 countries. Its membership has since expanded to more than 170 countries. The Paris Convention established a union for the protection of industrial property, including patents. It provides that each contracting state must grant the same protection of industrial property to nationals of the other contracting states as it grants to its own nationals (see Note 4).

Importantly, the Paris Convention provides for the right of priority in patents. This right means that, if an applicant files a valid first application in one of the contracting states, the applicant may, within 12 months, apply for protection in any of the other contracting states. These later applications will be regarded as if they had been filed on the same day as the first application. In other words, these later applications will have priority over applications that may have been filed during that 12-month period by others for the same invention. Moreover, these later applications, being based on the first application, will not be affected by any disclosure of the invention by the inventor or others during that 12-month period. In other words, the earlier application puts a stake in the sand for the applicant. One great advantage of this provision is that, when an applicant desires protection in several countries, he is not required to present applications simultaneously in those countries. The applicant has 12 months at his disposal to decide where he wishes protection (see Note 5).
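The effect of the priority right can be illustrated with a small date check. The sketch below is a simplified Python model under stated assumptions: the 12-month window is treated as exactly one calendar year, any public disclosure before the effective date is treated as prior art, and all function names and dates are hypothetical. It ignores grace periods, office holidays, and other national rules.

```python
from datetime import date


def priority_deadline(first_filing: date) -> date:
    """Last day to file abroad with priority: 12 months after the first filing."""
    try:
        return first_filing.replace(year=first_filing.year + 1)
    except ValueError:  # first filing made on 29 Feb
        return first_filing.replace(year=first_filing.year + 1, day=28)


def effective_date(first_filing: date, later_filing: date) -> date:
    """Date on which the later application is treated as having been filed."""
    if first_filing <= later_filing <= priority_deadline(first_filing):
        return first_filing  # priority validly claimed
    return later_filing      # outside the window: judged on its own filing date


def is_prior_art(disclosure: date, effective: date) -> bool:
    """Simplified rule: a disclosure counts only if made before the effective date."""
    return disclosure < effective


first = date(2013, 1, 15)        # hypothetical first filing
publication = date(2013, 6, 1)   # hypothetical journal article by the inventor
on_time = date(2014, 1, 10)      # foreign filing within 12 months
too_late = date(2014, 2, 1)      # foreign filing after the deadline

for later in (on_time, too_late):
    eff = effective_date(first, later)
    print(later, "->", eff, "| article is prior art:", is_prior_art(publication, eff))
```

With these dates, the on-time filing keeps the first filing date, so the article does not count against it. The late filing is judged on its own date, and the same article becomes prior art.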
In 1970, a number of contracting states to the Paris Convention signed the Patent Cooperation Treaty (PCT) (see Note 6). The PCT makes it possible to seek patent protection in many countries simultaneously by first filing an “international” (PCT) patent application (see Note 7). A PCT application has the same effect as a national application in each designated contracting state. However, the administrator of the PCT—the World Intellectual Property Organization (WIPO)—does not grant patents. A PCT application is ultimately examined and granted at the national level. Thus, while one can file an “international” application, it is not


possible to obtain an “international” patent. To date, there are close to 150 contracting states to the PCT.

During the “international” phase, a PCT application is subjected to preliminary examination by a selected patent office that acts as an International Searching Authority (ISA). This examination results in a preliminary, non-binding written opinion on whether the claimed invention appears to be patentable. A PCT applicant has up to 30 or 31 months after the earliest priority date (typically 18 or 19 months after the PCT filing date) to enter the PCT application into the “national” or “regional” phase, that is, to present the PCT application to the patent offices of the individual countries or regions (see below) where the applicant desires patent protection. While some patent offices take the ISA’s preliminary opinion under advisement only and conduct their own examination of the application, many patent offices rely heavily on the ISA’s opinion.

Filing a PCT application allows significant cost savings to applicants by deferring the major costs of patent procurement for at least 18 months. During this period, an applicant can evaluate his business needs and the ISA’s preliminary patentability opinion, and decide accordingly whether to go forward with the patent procurement process and if so, in which countries.

In many cases patents are examined and granted by national patent offices. But in some regions of the world, a centralized regional patent organization examines patent applications and grants patents under a regional treaty. The biggest such organization is the European Patent Office (EPO), established pursuant to the European Patent Convention (EPC), which was signed in 1973, to give European countries a unified patent granting system. The EPO grants European patents, which are then “validated” (through translation and registration) in each designated member state to become a national patent. The national patent is enforced in accordance with each country’s own national laws.
This system will change within the next few years: in 2013, 25 of the then 27 members of the European Union signed an agreement to establish a unitary patent system (see Note 8). Under this system, patents in the contracting states will still be granted by the EPO, but patent disputes will be litigated in a common court system, rather than in national courts.
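The PCT time limits described in this section follow from simple month arithmetic on the priority date. The following Python sketch works through one case under stated assumptions: the dates are hypothetical, the PCT application is assumed to be filed exactly at the end of the 12-month priority period, and the helper names `add_months` and `months_between` are illustrative. Actual deadlines depend on each office's rules and should be checked individually.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by calendar months, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


priority = date(2013, 3, 1)             # hypothetical first (priority) filing
pct_filing = add_months(priority, 12)   # PCT filed at the end of the priority year

for limit in (30, 31):                  # national/regional phase limits named in the text
    deadline = add_months(priority, limit)
    print(f"{limit}-month deadline: {deadline}, "
          f"{months_between(pct_filing, deadline)} months after PCT filing")
```

Because the limits run from the priority date, filing the PCT application earlier in the priority year does not move the deadline; it only lengthens the interval between the PCT filing and national phase entry.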

4 Patent Disclosure

A patent exists in the form of a legal document that is derived from a patent application. A patent document is usually substantially identical to the patent application, save for revisions made during the patent procurement process (also called patent prosecution). Most patent jurisdictions require that an applicant disclose the


invention in full, clear, and enabling details in the application, such that another person skilled in the field can reproduce the invention without an unreasonable amount of experimentation. A patent contains a description of the invention (also called the “specification”) and often figures, and ends with one or more claims. The description usually starts by providing background on the prior art (i.e., the collective relevant information available to the public at the time the application was filed), and describing the technical problem to be solved. It then describes the invention in detail, using specific examples, and experimental data, if any. Experimental data are not legally required, but they are often helpful to include. It is advisable to put much effort into drafting the description. Once a patent application is filed, it is nearly impossible to improve it substantially without losing the right of priority (see also Subheading 6). Claims are the most important part of a patent. They describe the exact subject matter to be protected, thus defining the legal boundary of a patentee’s exclusionary right. Consider this hypothetical claim: “A genetically engineered yeast cell comprising a nucleic acid sequence encoding human protein A isoform 1.” Someone using yeast cells expressing human protein A isoform 2 would not infringe this claim (see Note 9). While a patent applicant may have a natural desire to claim his invention broadly to “catch” more infringers, it is important to bear in mind that narrower claims are usually easier to get and less vulnerable to invalidity attack or revocation. Thus, a patent application should contain claims of various scopes (i.e., broad, narrow, and medium-breadth claims) so as to optimize protection for the invention. For biotechnological inventions, many patent offices require any biological material crucial for the claimed invention to be deposited in a public depository. 
For example, if a patent application claims the use of a genetically modified yeast strain, and this strain cannot otherwise be described sufficiently (e.g., by its modified genetic sequence), the patent offices may require the applicant to deposit the strain in an acceptable public depository and to provide assurance that the deposit will become publicly available once a patent is granted. In many jurisdictions, a patent application is published by the patent office in an official gazette or online about 18 months after its earliest filing date. Once a patent is granted, the patent in its granted form will also be published. In case the patent applicant later decides not to let the invention become publicly known, it is possible to withdraw the patent application just before publication—but then he can no longer obtain a patent from this application, and will have to file a new application if he still wishes to patent the invention.
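The hypothetical isoform claim discussed in this section can be read as a checklist of required features. The following Python sketch is a deliberately simplified illustration, not a statement of how patent offices or courts construe claims. It assumes that a claim can be reduced to a set of feature labels (all labels here are hypothetical) and that a product falls within the literal scope of a claim only if it shows every claimed feature. Equivalents and validity are not modeled.

```python
def literally_covers(claim: frozenset, product: frozenset) -> bool:
    """True if the product shows every feature recited in the claim."""
    return claim <= product


# Hypothetical feature labels for the example claim in the text.
broad_claim = frozenset({"engineered yeast cell", "encodes human protein A"})
narrow_claim = frozenset({"engineered yeast cell", "encodes human protein A",
                          "isoform 1"})

products = {
    "yeast expressing isoform 1": frozenset(
        {"engineered yeast cell", "encodes human protein A", "isoform 1"}),
    "yeast expressing isoform 2": frozenset(
        {"engineered yeast cell", "encodes human protein A", "isoform 2"}),
}

for name, features in products.items():
    print(name,
          "| broad claim:", literally_covers(broad_claim, features),
          "| narrow claim:", literally_covers(narrow_claim, features))
```

In this toy model the broad claim reaches both products while the narrow claim reaches only the isoform 1 strain, which mirrors the trade-off between breadth of coverage and ease of grant that motivates filing claims of varying scope.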

5 Patent Grant Criteria

With the trend of worldwide harmonization, almost all patent systems have adopted similar criteria for granting a patent, although differences in examination approach remain among different patent offices. The following subsections outline the basic criteria followed by most patent offices.

5.1 Patentable Subject Matter

A first criterion for granting a patent is eligibility of the subject matter sought to be patented. In the USA, for example, any “new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof” is eligible for patenting (see Note 10). The country’s Supreme Court has limited the scope of patentable subject matter to specifically exclude abstract ideas, laws of nature, and natural phenomena. The emergence of biotechnology has posed difficult questions for the Supreme Court. In the landmark Diamond v. Chakrabarty case, the Supreme Court held that genetically modified microorganisms could be patented. In that case, the Court stated that “anything under the sun that is made by man” is patentable subject matter (see Note 11). That expansive view has been somewhat tempered in recent years. In Mayo v. Prometheus, the Supreme Court evaluated the patentability of a diagnostic method based on the discovery of a correlation between the level of a certain drug metabolite in a patient and the effectiveness of the drug in the patient. A unanimous Court held that the method was not patentable because it essentially claimed a natural law—the correlation (see Note 12). In 2013, the Supreme Court imposed further limits on patentable subject matter in Association for Molecular Pathology v. Myriad Genetics (see Note 13). In this case, the Court tackled the long-debated question of whether human genes are patentable. Myriad Genetics held patents on the BRCA1 and BRCA2 genes, and was the sole provider of BRCA1 and BRCA2 genetic screening for breast and ovarian cancers. A unanimous Supreme Court held that isolated genomic DNA is not patentable because genomic DNA is from nature. But cDNA, on the other hand, may be patentable if it differs from the natural genomic DNA in sequence. These recent cases will undoubtedly have an impact on the patenting of biotechnology inventions in the USA. 
In Europe, inventions in all fields of technology are patentable subject matter under the EPC if they are new, involve an inventive step and are susceptible of industrial application (see Note 14). Industrial application means that the invention can be made or used in any kind of industry, including agriculture (see Note 15). In most cases, the industrial application of a biotechnological invention is self-evident. But there are exceptions. For example, in the case of a newly identified gene sequence or partial gene


sequence, the industrial application may not be self-evident if the functions of the gene product are not described (see Note 16).

Under the EPC, inventions have to be of technical character to be patentable. The EPC expressly excludes the following from patentability: discoveries; scientific theories; mathematical methods; aesthetic creations; schemes, rules, and methods for performing mental acts, playing games, or doing business; computer programs; and presentations of information (see Note 17). These exclusions are similar to those in US case law. But unlike the USA, where there is no legislation or patent office regulation specifically addressing the patentability of biotechnology inventions, the EPC’s Implementing Regulations have specific biotech provisions (see Note 18). This part of European patent law provides the ground rules for determining the patentability of biotech inventions, together with the criteria applicable to all inventions.

Under these provisions, biotech inventions are in principle patentable, with the following exceptions: (a) methods for treatment of the human or animal body by surgery or therapy, and diagnostic methods practiced on the human or animal body; (b) plant and animal varieties; and (c) essentially biological processes for the production of plants and animals (see Note 19). These exceptions do not include microbiological processes or the products thereof; that is, microbiological processes and products are patentable subject matter under the EPC.

The EPC biotech provisions expressly permit patenting of biological material isolated from its natural environment or produced by means of a technical process, even if it previously occurred in nature (see Note 20). For example, the sequence or partial sequence of a gene may be patented even if it is identical to the naturally occurring sequence (see Note 21); this is in contrast to the Myriad Genetics decision in the USA.

5.2 Novelty and Inventiveness Requirements

Other important criteria for patentability are novelty and inventiveness of the invention sought to be patented. Novelty means that the invention must be new and previously undisclosed in the prior art. Inventiveness means that the invention must not be obvious to a person of ordinary skill in the art at the time of filing. Simply put, the questions asked by a patent examiner are: did the invention exist before? If not, would it have been obvious? A patent examiner evaluates the novelty and inventiveness of an invention by comparing it to the prior art that he has found through searches of publications such as peer-reviewed journals as well as patent databases. Some patent offices, such as those in the USA, Canada, Israel, and India, also require the patent applicant to submit prior art documents cited by other patent offices in counterpart applications, as well as any other relevant prior art document of which the applicant may be aware.


In order for an invention to be novel, it must be different in at least a single aspect from what is known in the prior art. In other words, novelty can be established by even a small difference between the claimed invention and the prior art. It is important to note that sometimes an inventor may inadvertently create novelty-destroying art against his own patent application by disclosing his invention to the public through publications, conference presentations, discussions with others, etc., before patent filing (see Note 22). Thus, an inventor should consider whether to file for a patent before disclosing the invention to the public. If an inventor must discuss his invention with someone else (e.g., an investor) before patent filing, it is advisable for the inventor to obtain a confidentiality agreement beforehand.

In order for an invention to be inventive, it must not be obvious in view of the prior art. To introduce objectivity, most patent offices assess inventiveness from the standpoint of a person of ordinary skill in the technical field. This is a hypothetical person who possesses all the public knowledge in the field and has ordinary or average skill for this field.

In the USA, the Supreme Court has provided the so-called Graham v. Deere test to help determine non-obviousness of an invention (see Note 23). This test requires the consideration of (a) the scope and content of the prior art; (b) the differences between the claimed invention and the prior art; (c) the level of ordinary skill in the pertinent art; and (d) “secondary considerations” such as commercial success, long felt but unsolved needs, and failure of others. In practice, factors to be considered include whether the prior art provides a motivation to try the invention, whether the prior art provides a reasonable expectation of success, and whether the invention provides unexpected results over the prior art.

At the EPO, inventiveness is analyzed using the “problem-and-solution” approach. This approach has the following key steps: (a) determining the closest prior art, (b) defining the objective technical problem to be solved, and (c) considering whether or not a skilled person, knowing the closest prior art and the objective technical problem, would have arrived at the same invention, i.e., a technical solution to the problem. This problem-and-solution approach strives to provide a systematic and uniform analysis of non-obviousness. To ensure uniform application of this analytical approach, this analysis is performed by an Examining Division, which consists of three patent examiners.

5.3 Support Requirement

As discussed above, a patent applicant is required to disclose his invention in such a clear and detailed manner that a skilled person in the art can understand and reproduce the disclosed invention without an unreasonable (undue) amount of experimentation. One reason behind this requirement is that the patent system is intended to promote progress in science and technology; a patent disclosure


should provide adequate information for others in the field to improve upon the disclosed invention. Another reason for this requirement is that a patent confers an exclusive right. To avoid the granting of an unduly broad right, a patent disclosure must provide sufficient evidence that the granted right is commensurate with what was invented. Thus, to submit a strong patent application, an applicant should draft the patent specification carefully, and make certain that the claims are fully supported by the specification.

6 Obtaining a Patent

A patent application is filed with a national or regional patent office. Usually this is done in a country or region where the applicant resides. The day the application is filed is also called the effective filing date, or the priority date, of any future, related patent application. This is the critical date before which all the public knowledge in the art is considered the prior art. The applicant now has the right to file, within a period of 12 months, a foreign application at other patent offices under the Paris Convention, or an international (PCT) application.

After all formalities have been taken care of, the patent office will start examining the patent application on its merits, i.e., to determine if the patent application meets the patentability requirements discussed above. (The EPO and some other patent offices also issue a search report beforehand, so that the applicant can decide on his prosecution strategy, e.g., whether to request examination of the application, and/or whether to amend the claims.) The patent office will then issue examination reports detailing its assessment of the patentability of the application. In case the examination report is unfavorable, e.g., one or more claims are rejected, the applicant has the opportunity to respond to the report by arguments and/or modifications to the application. These modifications are called “amendments.” Any amendment can only be based on what was already disclosed in the application as originally filed.

Once the patent office finds the pending claims allowable, it will issue a notice of allowance or grant. This process can take as little as 1 year (if expedited examination is requested) or as long as 5 or more years.

7 Success Stories of Patenting Yeast Related Inventions

There are many examples illustrative of patents’ role in helping to bring yeast-related inventions made in a laboratory to the marketplace. In the middle of the twentieth century, patents in the yeast field were mostly related to growth media or storage for baker’s yeast [5]. Since then, modern biotechnology has developed rapidly. More and more patents are being sought for genetically modified yeast.


In 1976 Dr. Maria-Regina Kula isolated an enzyme called formate dehydrogenase (FDH) from Candida boidinii, and spent the next decades unlocking the catalytic potential of the enzyme [6]. In the late 1990s she and her coworkers created and patented a genetically modified, more stable version of FDH for industrial use [7]. This patented modified FDH is now used in the large-scale production of many products. The German company Evonik Industries AG employs Dr. Kula’s patented enzyme as a catalyst in the commercial manufacture of new medications for cancer, asthma, high cholesterol, and AIDS. The cost savings for the company are tremendous: a reaction that used to cost 2,000 € can now be carried out for 2 € (see Note 24).

Toward the end of the twentieth century, innovation in the yeast field moved in the direction of producing biological medicinal products, such as protein therapeutics and vaccines, by using genetically modified yeast. For example, Drs. Benjamin Hall and Gustav Ammerer at the University of Washington, together with collaborators at other institutions, developed a platform technology for making biologics in yeast. The resulting patents were licensed to more than 50 companies. Merck & Co., Inc. uses the patented technology to manufacture the hepatitis B vaccine Recombivax HB® and the human papillomavirus vaccine Gardasil®. Novo Nordisk A/S uses the patented technology for the production of insulin. The royalties from the licensing of these patents have generated substantial revenue for the University.

Another excellent example demonstrating patents’ role in commercializing biotech inventions is the Hansenula technology invented and patented by Drs. Zbigniew Janowicz and Cornelius Hollenberg at Rhein Biotech GmbH. This technology allows the high-yield production of heterologous proteins in Hansenula yeast, and has been acknowledged as an industrial standard for protein production. The “Hansenula technology” was granted a European patent in 1994 [8]. Using the patented Hansenula technology, Rhein Biotech became the third largest producer of hepatitis B vaccines worldwide. Millions of doses of the WHO-qualified vaccine have been sold in over 90 countries, helping to combat the spread of hepatitis B. The success of the Hansenula technology enabled the company to list on the Frankfurt Stock Exchange in 1999. Rhein Biotech has granted numerous research and commercial licenses for the patented technology, helping licensees to bring other biological products to the market (see Note 25).

In 2013, another patented yeast technology made headlines around the world. Sanofi announced the launch of large-scale manufacturing of artemisinin, the key ingredient of a highly effective antimalarial therapeutic, using a patented yeast technology invented by Dr. Jay Keasling and his group at the University of California, Berkeley [9–11]. In the past, the only source of artemisinin had been the sweet wormwood plant grown mostly in


China and Vietnam. But variable harvests and a long production cycle often made supplies and prices fluctuate, severely hampering the availability of the drug to patients in developing countries. With the new technology, the drug can be supplied with much less price fluctuation. Sanofi plans to produce 50–60 tons of artemisinin per year by 2014, which corresponds to 80–150 million malaria treatments, and to sell them at cost (see Note 26). Yeast technology has also shown great promise in producing biofuels. This can be seen in the increasing number of biofuel companies and an explosion of patent activity in this field. One of the leading companies in the biofuel field is Butamax™ Advanced Biofuels LLC, a joint venture established by BP and DuPont in 2009 to develop microorganism-made biobutanol for the transportation market. The company has been expanding its patent portfolio to protect its proprietary technology, and is actively partnering with other companies to commercialize the patented technology (see Note 27).

8 Summary

For more than a century, patents have provided a strong incentive for scientific innovation. They will continue to play an important role in bridging the lab bench and the marketplace, helping to bring the fruits of scientific endeavors to practical everyday use, and to improve the quality of our lives.

9 Notes

1. Commercialization of certain patented inventions in the biotech/pharma industry may also be subject to other legal constraints such as governmental regulation.

2. In some jurisdictions, including the USA and Europe, a patent application may also provide “provisional rights” to the applicant once it is published. For example, in the USA, a patent owner may sue, within 6 years of patent issuance, to collect a reasonable royalty from someone who infringed the patent application during the period between the date of the patent application publication and the date of the patent issuance. To win such a lawsuit, the patent holder must show that the infringer had actual notice of the published patent application, and that the invention claimed in the patent application is substantially identical to the invention claimed in the issued patent. See 35 United States Code § 154(d). In Europe, the enforcement of the provisional rights varies from country to country. See, e.g., European Patent Convention (14th Ed., 2010), Article 67.


Z. Ying Li and Wolfram Meyer

3. Publishing an invention and patenting it are not mutually exclusive. However, it is usually desirable to file for a patent on the invention before disclosing it to the public; otherwise the public disclosure may adversely affect the patentability of the invention.
4. Nationals of non-contracting states are also entitled to national treatment under the Paris Convention if they are domiciled or have a real and effective industrial or commercial establishment in a contracting state.
5. http://www.wipo.int/treaties/en/ip/paris/summary_paris.html. Accessed 18 September 2013.
6. Another major initiative for harmonizing IP laws took place during the Uruguay Round of the multilateral trade negotiations conducted under the General Agreement on Tariffs and Trade (GATT), with more than 100 countries participating. The Round led to the creation of the World Trade Organization (WTO). It also led to the Agreement on Trade Related Aspects of Intellectual Property Rights (TRIPS) in 1994, a significant milestone in harmonizing IP laws among WTO member countries.
7. A PCT application may be filed by anyone who is a national or resident of a PCT contracting state. The application is filed with the national patent office of the contracting state of which the applicant is a national or resident, or with WIPO’s International Bureau in Geneva. http://www.wipo.int/treaties/en/registration/pct/summary_pct.html. Accessed 18 September 2013.
8. Italy and Spain did not sign the agreement.
9. In some countries such as the USA, there are two theories of patent infringement: literal infringement and infringement under the Doctrine of Equivalents. In the above example, while a yeast cell expressing isoform 2 does not fall within the literal scope of the claim, a court or jury may decide whether the yeast cell is equivalent to the claimed yeast cell and therefore is an infringing product under the Doctrine of Equivalents.
10. 35 United States Code § 101.
11. Diamond v. Chakrabarty, 447 U.S. 303, 309 (1980).
12. Mayo v. Prometheus, 132 S. Ct. 1289 (2012).
13. Association for Molecular Pathology v. Myriad Genetics, 133 S. Ct. 2107 (2013).
14. EPC (14th Ed., 2010), Article 52.
15. EPC (14th Ed., 2010), Article 57.
16. The EPO Guidelines for Examination, Part G, Chapter III, Section 4.

Patents: A Tool to Bring Innovation from the Lab Bench to the Marketplace


17. EPC (14th Ed., 2010), Article 52. Article 53 of the EPC also prohibits the patenting of any invention whose commercial exploitation would be contrary to public order or morality.
18. Implementing Regulations to the Convention on the Grant of European Patents (as amended in 2010), Part II, Chapter V. These provisions were adopted from the 1998 EU Directive 98/44/EC, known as the “Biotech Patent Directive.”
19. EPC (14th Ed., 2010), Article 53. Many other countries similarly prohibit the patenting of methods for treating a human body or diagnostic methods performed on a human body. In Europe and in such countries, patent claims have to be worded differently from those in, for example, the USA, where such methods can be patented, so as not to run afoul of the prohibition.
20. Implementing Regulations to the Convention on the Grant of European Patents (as amended in 2010), Rule 27.
21. Implementing Regulations to the Convention on the Grant of European Patents (as amended in 2010), Rule 29.
22. Some countries (e.g., the USA, Australia, Canada, and Japan) provide a grace period of 6 or 12 months to protect a patent applicant from disclosure of the invention before the application filing date. This grace period does not exist under the EPC or in most countries.
23. Graham v. John Deere Co., 383 U.S. 1 (1966).
24. “A Gentler Biotechnology.” http://www.epo.org/learning-events/european-inventor/finalists/2009/kula.html. Accessed 17 September 2013.
25. “Yeast Technology Helps Fight Against Hepatitis B.” http://www.epo.org/learning-events/european-inventor/finalists/2006/Janowicz-Hollenberg_fr.html. Accessed 17 September 2013.
26. Sanofi and PATH Press Release (11 April 2013). Sanofi and PATH Announce the Launch of Large-Scale Production of Semisynthetic Artemisinin Against Malaria.
27. http://butamaxpatents.com/PatentHistory/ButamaxPatents.aspx. Accessed 9 October 2013.

Acknowledgments

We would like to thank Drs. Siobhán Yeats, Jose Garcia-Sanz, and Manfred Cassens for their careful reading of the manuscript and their helpful comments. The present article is based on personal considerations by the authors and does not reflect the official positions of the European Patent Office or Ropes & Gray LLP.


References

1. Pasteur L (1873) Improvement in brewing beer and ale. United States Patent 135,245
2. Pasteur L (1873) Improvement in the manufacture of beer and yeast. United States Patent 141,072
3. Federico PJ (1937) Louis Pasteur’s patents. Science 86:327
4. Takamine J (1894) Preparing and making moto. United States Patent 525,821
5. Gelinas P (2012) In search of perfect growth media for baker’s yeast production: mapping patents. Compr Rev Food Sci Food Saf 11:13–33
6. Schütte H, Flossdorf J, Sahm H, Kula M-R (1976) Purification and properties of formaldehyde dehydrogenase and formate dehydrogenase from Candida boidinii. Eur J Biochem 62:151–160
7. Slusarczyk H, Felber S, Kula M-R, Pohl M (2000) Stabilization of NAD-dependent formate dehydrogenase from Candida boidinii by site-directed mutagenesis of cysteine residues. Eur J Biochem 267:1280–1289
8. Hollenberg CP, Janowicz Z (1994) DNA-molecules coding for FMDH control regions and structured gene for a protein having FMDH-activity and their uses. European Patent EP0299108 B1
9. Keasling J et al (2007) Biosynthesis of isopentenyl pyrophosphate. United States Patent 7,172,886
10. Keasling J et al (2009) Methods for synthesizing mevalonate. United States Patent 7,622,283
11. Keasling J et al (2012) Host cells for production of isoprenoid compounds. United States Patent 8,288,147


Valeria Mapelli (ed.), Yeast Metabolic Engineering: Methods and Protocols, Methods in Molecular Biology, vol. 1152, DOI 10.1007/978-1-4939-0563-8, © Springer Science+Business Media, LLC 2014

311

YEAST METABOLIC ENGINEERING: METHODS AND PROTOCOLS 312 Index Directed evolution ................................................ 35, 95, 104 Direct repeats .................................................................... 11 Dissolved oxygen (dO2) .................................... 188, 190, 193 Dithiothreitol (DTT) ..................................51, 60, 65, 76, 82 2μ DNA .............................................................................70 Dot blot .............................................114, 116, 117, 119, 121 Downstream promoter element (DPE) ........................19, 20 DpnI ................................................................. 129, 134, 135 Drug resistance ............................................. 3, 9, 12, 94, 158 DsRed-monomer...................................... 141–145, 148, 149 DTT. See Dithiothreitol (DTT)

E Electroporation....................................... 55, 73–74, 128, 130 EMS. See Ethyl methane sulfonate (EMS) Enterokinase ............................................. 144, 150–151, 153 EPC. See European Patent Convention (EPC) Episomal vector ................................................................. 23 EPO. See European Patent Office (EPO) Ethanol ................................8, 24, 35, 47, 54, 57, 58, 60, 101, 102, 129, 139, 142, 145, 170–172, 182, 200, 201, 203, 213, 252, 254, 255, 259, 284, 286, 290, 291 Ethylamine ........................................................................ 47 Ethyl methane sulfonate (EMS)....................... 170–173, 181 European Patent Convention (EPC)....................... 300, 302, 303, 307–309 European Patent Office (EPO) ....................... 300, 304, 305, 308, 309 Evolutionary engineering .........................................169–182

F FACS. See Fluorescence-activated cell sorting (FACS) FBA. See Flux balance analysis (FBA) Fed-batch ............................ 97, 101, 185–187, 189–192, 228 dynamic fed-batch ......................................................186 Feeding medium .......................................................171, 187 FLAG epitope ..........................................................141, 144 FlocA................................................................................271 Flo1p ................................................................................138 Flow cytometry.........................126, 131, 133, 134, 148–149, 254–256 Flp/FRT system ................................................................ 11 FLP recombinase..........................................................70, 94 Fluorescence-activated cell sorting (FACS) ............................................ 35, 126–129, 132–134, 140, 143, 145 5-Fluoroorotic acid (5-FOA) .............................................10 Flux balance analysis (FBA) ..................... 282, 285–289, 291 Flux distribution ....................... 210, 227, 271, 273, 275–277, 282, 286, 287, 289, 291 FluxML ...................................................... 22, 223, 225, 226 Flux variability analysis (FVA) ......................... 287, 289, 292 Formaldehyde .................. 7, 8, 44, 90, 94, 101, 143, 147, 153 Fragmentomers.........................................................215, 216

Freeze-dry ................................................................ 203, 204 FVA. See Flux variability analysis (FVA)

G G418 ..........................8, 9, 12, 65, 69, 81, 87, 93, 94, 96, 158 Galactose-inducible promoters .......................................... 26 Gas chromatography–mass spectrometry (GC–MS) ....................................... 200, 203–205 Gateway cloning .............................................................. 49, 55–58 ENTR .............................................................. 49, 56, 57 primer ......................................................... 48–50, 56, 57 GC–MS. See Gas chromatography–mass spectrometry (GC–MS) GC-rich sequence ............................................................. 22 GDLS ..............................................................................274 Gene deletion ........................... 10, 29, 44, 45, 49, 55, 58, 78, 83, 158, 160, 164, 168, 262, 264, 273, 281–292 Gene expression....................................18, 21–33, 35, 37, 44, 45, 49, 51, 64, 66–73, 105, 276 Geneticin (G418) .....................................8, 9, 12, 50, 65, 69, 81, 87, 93, 94, 96, 158, 264 Genetic stability ...............................................179–180, 182 Genome coverage .............................................................270 Genome scale metabolic models curation...............................................................271–272 parameterization .................................................272–274 reconstruction ..............269–278, 281, 283–285, 288, 290 validation ............................................ 214, 272–275, 284 GFP. See Green fluorescent protein (GFP) Gibbs free energy .............................................................273 GIN11 ..............................................................................165 GIN11M86 ................................................ 10, 160, 165, 167 Glucoamylase ......................................... 37, 38, 72, 141, 146 Glycerol ...................................... 
8, 35, 47, 98–101, 115, 116, 121, 170, 173–175, 179, 180, 186–189, 191, 193, 199, 210–213, 215, 218, 228, 284, 290 Glycoengineered strains................................................91–92 GlycoFi technology ........................................................... 92 Glycomodification ............................................................105 GlycoSwitch technology.................................................... 91 Glycosylphosphatidylinositol (GPI) .................................138 Green fluorescent protein (GFP) .......................... 25, 35, 67, 68, 98, 99, 102, 104, 126, 127, 129, 130, 132 Growth inhibition gene ....................................................160

H Hansenula polymorpha (H. polymorpha) .........................43–60 Hansenula technology .......................................................306 Heterologous protein production .................... 18, 26, 27, 44, 96, 97, 103, 105, 306 Heteronuclear single quantum coherence (HSQC).......................................... 215, 216, 230 High-throughput enzyme screening.................................127

YEAST METABOLIC ENGINEERING: METHODS AND PROTOCOLS 313 Index High throughput screening ....................... 35, 113–122, 125, 137, 138, 140, 145 His-tag (or His-tagged).................................... 114, 116, 118 HMMER .........................................................................271 HMRa ...................................................................... 252, 257 HMRα ...................................................................... 252, 257 Homologous recombination (HR) ...................11, 28, 29, 77, 79, 82, 88, 92, 114, 157, 158, 261, 264 Homothallic .....................................................................262 Horseradish peroxidase (HRP).......................... 90, 100, 102, 115, 116, 119, 120 Hot ethanol ......................................................................201 HPLC ......................................................131, 135, 187, 190, 202, 211–213 HRP. See Horseradish peroxidase (HRP) Hygromycin resistance....................................................... 69

I Induction .................................. 11, 23, 25–28, 31–33, 35, 51, 93, 97, 101, 102, 187, 192 Industrial application ................................................302, 303 Industrial property ............................................................299 Industrial strains ......................................... 4, 7, 77, 157, 262 Initiator element (INR) ......................................................20 Innovation ................................................................297–309 In silico metabolic engineering .................................284, 288 International patent applications ......................................300 Invention .................................................. 297–303, 305–309 Inventiveness ............................................................303–304 Inventive step ...................................................................302 Inverse metabolic engineering ..........................................169 Inverted repeat (IR) ............................................................70 inverted repeat IR-B .................................................... 70 Isoenzyme................................................. 119, 218, 287–289 Isotopomer ...................................................... 210, 215, 216, 222, 228

K Kanamycin...................................... 48, 57, 59, 68, 69, 80, 87, 93, 186, 190. See also G418 kex2Δ ..............................................................89, 91, 141, 152 Klebsiella pneumoniae ....................................................... 8, 53 KlURA3 .................................................................... 5, 10, 11 Kluyveromyces lactis ................................................ 4, 5, 72, 73 Komagataella K. pastoris.......................................................................88 K. phaffii ........................................................................88 K. pseudopastoris .............................................................88 K28 preprotoxin ...............................................................104 Kyoto encyclopedia of genes and genomes (KEGG) codes ................................................... 235–240, 243, 244 KEGG Orthologies (KO) ..........................................271 library .........................................................................243

L lacO operator ..................................................................... 32 Lactococcus lactis ................................................. 242–244, 248 Leader sequences ...................................... 64, 66–74, 96, 104 LiAc. See Lithium acetate (LiAc) Licenses .............................................226, 230, 259, 298, 306 Linear expression cassettes ...........................................88, 95 Lithium acetate (LiAc) ....................65, 74–77, 143, 159, 162 LR clonase.................................................................... 56, 57 plasmid ........................................................................ 58 reaction ...................................................................57–58 Lyophilisation ............................115, 146, 204, 205, 212–214

M Marker. See also Selection marker carbon-/nitrogen-source specific markers ...................4, 6 counterselectable markers .......................................10, 11 (semi)dominant marker ................................................. 7 marker-free knockout .................................................. 94 marker loop-out ........................................................... 11 prototrophic markers ..................................................3–6 resistance marker ...................................... 3, 7–9, 96, 186 Mass spectrometry (MS) ................................. 200, 205, 210, 211, 225, 251 Matα2 system .................................................................... 28 Mathematical model.........................................................284 MATLAB ........................................210, 212, 215, 220, 221, 226, 276, 288 MAT locus ............................................... 167, 252, 256, 257 Matrix........................ 217, 220–222, 226, 227, 230, 273, 277 mazF...................................................................................94 mCherry ................................................................... 129, 132 Meiotic segregants ............................251, 253, 254, 257–258, 262, 263 Metabolic flux........................................ 18, 35, 36, 209–230, 275–277, 284, 286 Metabolic flux analysis (MFA) ................................. 209–230 Metabolic flux ratio (METAFoR) ............ 210, 216–218, 222 Metabolic models .............................218, 220, 228, 269–278, 281, 282, 286 Metabolic network ....................... 
17, 34, 210, 217, 219, 222, 223, 270, 274, 276, 281, 282, 285–288, 290, 292 Metabolic pathway activity .......................................233, 234 Metabolic profiling ...........................................................234 Metabolite derivatization ..............................................................200 extraction ............................................................199–200 Metabolome/metabolomics .....................................197–206, 234–239, 242, 287 METAFoR. See Metabolic flux ratio (METAFoR) Methanol/chloroform .......................................................201 Methanol pulses ....................................... 186, 187, 189–193

YEAST METABOLIC ENGINEERING: METHODS AND PROTOCOLS 314 Index Methomyl ...........................................................................95 Methotrexate ....................................................................... 9 Methylamine .......................................... 47, 51, 94, 101, 102 Methyl chloroformate (MCF) .................................. 202, 204 Methylotrophic yeast ............................ 44, 88, 113, 185, 211 MFA. See Metabolic flux analysis (MFA) Minimization of metabolic adjustment (MoMA) ............. 285, 286, 291 of metabolites Balance (MiMBl) ....................... 286, 288, 289, 291, 292 Model pre-processing.....................................................287–288 validation ....................................................272–274, 284 Most probable number (MPN) ........................ 170, 176–180 Multi copy vector ................................................................ 9 Multiploidy.......................................................................262 Mutated transcription factors ...........................................8, 9 Mutation rates .................................................... 21, 134, 135 MX cassettes...................................................................... 11

N natMX gene ......................................................................160 N-glycosylation .........................................44, 91, 92. See also N-linked glycosylation N-linked glycosylation....................................................... 91 NMR. See Nuclear magnetic resonance (NMR) Nourseothricin............................ 9, 50, 52, 55, 65, 68, 69, 81, 158–160, 165 Novelty .....................................................................303–304 NP-40....................................................................... 254, 258 Nuclear magnetic resonance (NMR) ....................... 209–212, 214–216, 222, 228, 230

O Objective function ................................... 221, 274, 276, 282, 285–286, 289, 291, 292 O-linked glycosylation ...................................................... 91 OptGene .................................................. 274, 285, 288, 289 OptKnock................................................. 274, 275, 285, 288 OptStrain .........................................................................275 Orthogonality ...............................................................30, 31 Ortholog ............................................................. 29, 270, 271 O2 uptake rates (OUR).....................................................213

P P450 ..................................................114–117, 119, 121, 126 Paris Convention .............................................. 299, 305, 308 Patent applications .....................................................93, 300 Patent Cooperation Treaty (PCT) ........................... 299, 300, 305, 308 Patent infringement..........................................................308 Patents European patents (EP) ....................................... 300, 309 U.S. patents .........................................................308, 309

Pathway activity profiling (PAPi) analysis................................................................242–244 function ..............................................................245–246 PAPiHtest .......................................... 241, 244, 247–248 plotting PAPi results........................... 241, 246, 249, 250 PEX genes ....................................................................52, 53 Phospholipase C (PI-PLC) ..............................................138 phyA ...................................................................................33 phyB ...................................................................................33 Phytochrome ..................................................................... 33 Pichia P. angusta .......................................................................43 P. pastoris Mut+ .......................................................................90 Mut- ........................................................................90 MutS ............................................................... 90, 117 P. stipitis ....................................................... 101, 269, 270 Pichia genome .................................................................... 89 PichiaPinkT system........................................................... 94 Pif3 regulator ..................................................................... 33 Plasmids Hansenula polymorpha plasmids ..............................52–53 maintenance plasmid ..................................................... 7 pCORE5 ....................................................................160 Pichia pastoris episomal plasmid (pBGP1) .................... 95 plasmid integration .................................................77–80 Zygosaccharomyces bailii plasmids.............................66–73 Polycistronic gene .............................................................. 
18 Polyethylene glycol (PEG) .............................66, 75, 81, 143, 159, 162 Polygenic traits .................................................................253 P/O ratio .................................................................. 273, 274 Pre-initiation complex (PIC)........................................19, 21 Prepro-leader sequence ...............................................97, 104 Promoter Ashbya gossipii promoter ............................................... 11 constitutive promoters ......................... 19, 20, 22–25, 29, 44, 98–99 core promoter ....................................... 19–21, 27, 30–34 cytomegalovirus promoter (PCMV ) ................................29 heterologous promoters ........................ 12, 22–31, 34, 71 Pichia pastoris constitutive promoters ......................................98–99 inducible promoters ........................................99–102 self-inducing promoters ............................................... 27 strength....................................................... 21, 24, 25, 35 structure ..................................................................18–22 synthetic promoters .............................. 18, 30, 33–35, 97 Zygosaccharomyces bailii endogenous promoter .......................................................... 72 Propidium iodide (PI) .............................. 252, 254–255, 262 Protein A isoform 1 ..........................................................301 Proteinase A (PEP4) ....................................................89, 90 Proteinase B (PRB1) ....................................................89, 90 Pseudosteady state ............................................................228

YEAST METABOLIC ENGINEERING: METHODS AND PROTOCOLS 315 Index PTM1 solution.........................................................187, 188 Pyridine .................................................................... 202, 204

Q
QTL mapping ..... 251–264
Quenching ..... 198, 199, 202, 203
Quenching solutions ..... 199, 202

R
Ranking ..... 289–290, 292
RAVEN toolbox ..... 270, 271
Reciprocal hemizygosity analysis (RHA) ..... 252, 253, 261–262, 264
Red fluorescent protein (RFP) ..... 132, 133
Regulated promoters ..... 19, 25–28
Regulatory element TF-C ..... 70
Regulatory networks ..... 18, 28, 30, 278
Replica plating ..... 79, 162, 163
Repression ..... 21, 23, 26–28, 32, 101, 102
Resistance cassettes ..... 78, 87
Respiration ..... 291
Respiro-fermentative metabolism ..... 286, 291
Reversibility of the reactions ..... 285
RNA aptamer ..... 126, 128
RNA polymerase II ..... 19, 20
RNA switch ..... 125–135
Royalties ..... 298, 306
R-software ..... 234, 235

S
Saccharomyces cerevisiae (S. cerevisiae) ..... 44, 53, 63, 64, 88, 102, 113, 125, 127, 138, 141, 157–167, 170, 171, 263, 269, 281–292
Schizosaccharomyces pombe ..... 4, 5, 7, 269
Secretion rates ..... 213, 276
Secretion signal
  Arxula adeninivorans secretion signal of glucoamylase ..... 72
  bacterial secretion leaders ..... 105
  Kluyveromyces lactis signal peptide of the K1 killer toxin ..... 73
  S. diastaticus secretion signal of glucoamylase ..... 72
  Z. bailii pre-killer toxin zygocin leader peptide ..... 73
Segregants ..... 251, 253, 254, 257–263
Selection
  batch selection ..... 171–175
  chemostat selection ..... 171, 175, 179
  continuous stress selection ..... 174–175
  pulse stress selection ..... 173–175
Selection marker ..... 3–12, 49, 55, 56, 58, 69, 160, 190, 261, 262, 264
Sequence conferring stability (STB) ..... 70
Shortlisting of target genes ..... 287–288
Signal sequence ..... 72, 87, 97, 99, 104, 138, 141, 146, 153
Silanization ..... 203, 205, 206
Silylation ..... 200
Simulation ..... 224, 226–228, 284–292
Single cell analysis ..... 149–150
Single nucleotide polymorphisms (SNPs) ..... 251, 253, 260, 264
Single nucleotide variants (SNVs) ..... 261, 264
  SNV frequency ..... 260
Single strand carrier DNA (ss-DNA) ..... 66, 75–77
sir3 mutants ..... 28
Sorbitol ..... 7, 48, 66, 74, 76, 77, 100, 101, 182, 193, 254
Specific growth rate (μ) ..... 185
Specific productivity (qp) ..... 185
Specific substrate uptake rate (qs)
  qs adapt ..... 192
  qs max ..... 192
Sporulation ..... 77, 93, 159, 164–165, 252, 254–257, 262
Steady state ..... 210, 212, 213, 216, 217, 228, 229, 273, 282
Stoichiometric matrix ..... 217, 220, 226, 273
Stoichiometry ..... 214, 272, 281, 285
Strains
  [cir+] strain ..... 70
  Hansenula polymorpha strains ..... 44
  Pichia pastoris
    commonly used strains (P. pastoris) ..... 89
    protease deficient strains ..... 90–91
  polyploid strains ..... 158–159
Streptoalloteichus hindustanus ..... 53, 93
Streptomyces noursei ..... 9, 53, 69
Stress resistance ..... 180, 181
  estimation of stress resistance ..... 171, 176–179
Survival ratio ..... 172–175, 177, 181, 182
Systems biology markup language (SBML) ..... 270

T
Targeted gene deletion ..... 77–80, 82
TATA box ..... 19, 20
tet-off system ..... 31
tetO operators ..... 32
tet promoter system ..... 32
Tetrad dissection ..... 252, 255–256
Theophylline ..... 126, 135
Transaminase ..... 115, 116, 119, 120
Transcription activator-like effectors (TALEs) ..... 34
Transcription activator-like orthogonal repressors (TALORs) ..... 34
Transcriptional activators (TA) ..... 21, 22, 27, 29, 34
Transcriptional control ..... 36
Transcriptional repressors (TR) ..... 21, 29, 32
Transcriptional start site (TSS) ..... 19–22
t-test ..... 240, 241, 244, 248
Tunicamycin ..... 92

U
Upstream activation (UAS) ..... 19, 21, 26–28, 31
Upstream cis-acting promoter element ..... 21–22
Upstream repression sequences (URS) ..... 19, 21
Uptake rate ..... 186, 189, 192, 193, 213, 217, 274, 276, 284, 291

V
Vanillin ..... 282, 286, 288, 291, 292
Vector. See also Plasmids
  centromeric ..... 67, 68, 157
  episomal ..... 68, 71, 74, 114 (see also Multi copy)
  gateway vectors ..... 48–50
  integrative ..... 64, 66, 68–71, 88, 93, 114
  multi copy ..... 9 (see also Episomal)
  Pichia pastoris expression vectors ..... 87–105
  pop-in vectors ..... 88
  pop-out vectors ..... 88
Vishniac solution ..... 46, 48

W
Whole-cell biocatalyst ..... 137, 139, 145
World Intellectual Property Organisation (WIPO) ..... 297, 299, 308

Y
Yarrowia lipolytica ..... 68, 269, 270
Yeast cell chip ..... 137, 140, 149–150
Yeast minimal medium (YMM) ..... 170–177, 179, 180, 182
YKU80 gene ..... 46

Z
Zeocin ..... 50, 55, 87, 93, 94, 96, 186, 190, 211
Z-score ..... 277
Zygosaccharomyces
  Z. bailii ..... 63–83
  Z. rouxii ..... 68, 69, 71, 80