Parasite Genomics: Methods and Protocols (Methods in Molecular Biology, 2369) 1071616803, 9781071616802

This detailed book provides a comprehensive series of innovative research techniques and methodologies applied to the pa

English Pages 347 [333] Year 2021

Table of contents :
Preface
Contents
Contributors
Part I: Novel Sequencing and Bioinformatic Pipelines for the Study of Parasite Genomes
Chapter 1: Nanopore Long Read DNA Sequencing of Protozoan Parasites: Hybrid Genome Assembly of Trypanosoma cruzi
1 Introduction
2 Materials
2.1 Parasites
2.2 Solutions, Reagents, and Consumables
2.3 Equipment
3 Methods
3.1 High Molecular Weight Genomic DNA Isolation
3.2 DNA Fragmentation (Optional)
3.3 ONT Ligation Sequencing Protocol
3.3.1 DNA Repair and Preparation for Adaptor Ligation
3.3.2 Barcode Ligation
3.3.3 Adaptor Ligation
3.4 ONT Rapid Sequencing Protocol
3.5 Genome Assembly
3.5.1 Basecall Reads Using Guppy
3.5.2 Quality Control of Reads
3.5.3 Genome Assembly
3.5.4 Assembly Polishing
4 Notes
Annex 1 Masurca Configuration File
References
Chapter 2: Chromosomes Conformation Capture Coupled with Next-Generation Sequencing (Hi-C) in Plasmodium falciparum
1 Introduction
2 Materials
2.1 Specific Equipment
2.2 Kits
2.3 Reagents and Buffers
3 Methods
3.1 Preparation and Crosslinking of the Parasites (Day 1)
3.2 Nuclear Extraction and Restriction Digestion (Day 2)
3.3 Labeling of DNA Ends, Blunt End Ligation, and Crosslinking Reversal (Day 3)
3.4 DNA Purification and Shearing (Day 4)
3.5 Pull-Down of Biotinylated DNA
3.6 End Repair
3.7 A-Tailing
3.8 Adaptor Ligation
3.9 PCR Amplification and Library Purification
3.10 Sequencing (Day 5)
3.11 Read Processing, Normalization, Visualization, and Differential Interaction Analysis
4 Notes
References
Chapter 3: Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data
1 Introduction
2 Materials
2.1 Reagents
2.2 Equipment
3 Methods
3.1 Preparation of Samples for High-Molecular Weight (HMW) Genomic DNA Extraction
3.2 Isolation of HMW gDNA
3.3 Quantification of gDNA Yield
3.4 Examination of gDNA Using Agarose Gel Electrophoresis
3.5 Sequencing
3.6 Pre-processing of NGS Data
3.7 Bioinformatic Assembly, Annotation, and Analysis of Sequence Data
3.7.1 Assembly of Sequence Data
3.7.2 Evaluation of Mitochondrial Genome Assembly
3.7.3 Annotation of Sequence Data
3.7.4 Additional Analysis of Repeat-Rich Sequences
3.8 Publication of Annotated mt Genome
4 Notes
References
Chapter 4: Automated Phylogenetic Analysis Using Best Reciprocal BLAST
1 Introduction
2 Materials
2.1 Installation of batch_brb
3 Methods
3.1 Setup
3.2 Data Selection
3.3 Create BLAST Database
3.4 Create an Alias Database
3.5 Retrieve Accessions
3.6 Identify Putative Orthologs
3.7 Genome Walk
3.8 Phylogenetic Trees
3.9 Analyze Results
3.10 Finalize and Validate Results
4 Notes
References
Part II: Diagnostic Approaches Using Genomic Tools
Chapter 5: An Illumina MiSeq-Based Amplicon Sequencing Method for the Detection of Mixed Parasite Infections Using the Blastoc...
1 Introduction
2 Materials
2.1 Concentration of Parasite Forms from Feces
2.2 DNA Extraction
2.3 Sample Screening and Sequencing Library Preparation
3 Methods
3.1 Concentrating Parasite Forms from Feces Via CsCl Centrifugation
3.2 DNA Extraction
3.3 Sample Screening and Sequencing Library Preparation
3.4 Bioinformatic Analysis of Illumina Sequences
4 Notes
References
Chapter 6: Giardia duodenalis: Detection by Quantitative Real-Time PCR and Molecular Diversity
1 Introduction
2 Materials
2.1 Extraction and Purification of Genomic DNA from Stool Samples
2.2 Molecular Detection of Giardia duodenalis by Quantitative Real-Time PCR (qPCR)
2.3 Molecular Characterization of Giardia duodenalis
2.4 Gel Electrophoresis of PCR Products
2.5 Analysis of DNA Sequences
3 Methods
3.1 Extraction and Purification of Genomic DNA from Stool Samples
3.2 Molecular Detection of Giardia duodenalis by Quantitative Real-Time PCR
3.3 Molecular Characterization of Giardia duodenalis
3.3.1 Amplification of the Glutamate Dehydrogenase Gene of G. duodenalis
3.3.2 Amplification of the β-Giardin Gene of G. duodenalis
3.3.3 Amplification of Triosephosphate Isomerase Gene of G. duodenalis
3.4 Gel Electrophoresis of PCR Products
3.5 Analysis of DNA Sequences
4 Notes
References
Part III: Host-Parasite Interactions: Deciphering Gene Function and Molecular Processes in Parasites
Chapter 7: Conditional Gene Deletion in Mammalian and Mosquito Stages of Plasmodium berghei Using Dimerizable Cre Recombinase
1 Introduction
2 Materials
2.1 Lab Equipment
2.2 Plasticware and Glassware
2.3 Animals, Parasites, Bacteria, and Cells
2.4 Media, Solutions, and Reagents
2.4.1 Reagents for DNA Cloning and Genotyping
2.4.2 Reagents for Cell Culture, Parasite Transfection, and Selection
2.4.3 Other Reagents
3 Methods
3.1 Generation of Plasmids
3.2 Generation of Conditional Deletion Mutants in Plasmodium berghei
3.2.1 Plasmid Preparation for Transfection
3.2.2 Parasite Transfection and Selection (see Note 3)
3.2.3 Construct Integration Checks
3.3 Induction of Conditional Gene Deletion
3.3.1 Conditional Gene Deletion in Asexual Blood Stages
3.3.2 Conditional Gene Deletion in Sexual Blood Stages Prior to Transmission to Mosquitoes
3.3.3 Conditional Gene Deletion in Mosquito Stages
3.3.4 Conditional Gene Deletion in Liver Stages
4 Notes
References
Chapter 8: Coupling Auxin-Inducible Degron System with Ultrastructure Expansion Microscopy to Accelerate the Discovery of Gene...
1 Introduction
2 Materials
2.1 Coverslip Coating
2.2 Gels Preparation and Expansion
2.3 Immunostaining
2.4 Imaging
3 Methods
3.1 Coverslips Coating and Preliminary Preparations (Day 0)
3.2 Extracellular Parasite Preparation (Day 1)
3.3 Protein Anchoring and Crosslinking Prevention (Day 1)
3.4 Gelation (Day 1)
3.5 Denaturation (Day 1)
3.6 First Round of Expansion (Day 1)
3.7 Gel Measurement and Shrinkage (Day 2)
3.8 Immunostaining (Day 2)
3.9 Final Round of Expansion (Day 2)
3.10 Imaging (Day 3)
4 Notes
References
Chapter 9: Genome-Wide Analysis of RNA-Protein Interactions in Plasmodium falciparum Using eCLIP-Seq
1 Introduction
2 Materials
2.1 Specific Equipment
2.2 Kits
2.3 Specific Consumables
2.4 Reagents (see Note 1)
2.5 Primers
2.6 Recipes
3 Methods
3.1 Day 1: Preparation of Parasite Protein Extract and Immunoprecipitation (IP)
3.1.1 Parasite Extraction (see Note 2)
3.1.2 UV Crosslinking
3.1.3 Preparation of Antibody-Coupled Magnetic Beads
3.1.4 Cell Lysis and Fragmentation
3.1.5 Immunoprecipitation
3.2 Day 2: IP Washes, Gel Electrophoresis, and Transfer
3.2.1 Save Input Samples
3.2.2 Immunoprecipitation Washes
3.2.3 RNA 5′-End Repair
3.2.4 RNA 3′-End Repair
3.2.5 Immunoprecipitation Wash
3.2.6 Ligation of RNA Adaptor to IP Samples
3.2.7 Immunoprecipitation Wash
3.2.8 Elution and Denaturation of IP and Input Samples
3.2.9 Loading SDS-PAGE Gel
3.2.10 Transfer to Nitrocellulose Membrane
3.3 Day 3: Library Preparation
3.3.1 Size Selection
3.3.2 RNA Release
3.3.3 Clean Samples with RNA Clean and Concentrator-5
3.3.4 5′-End Repair of Input RNA
3.3.5 3′-End Repair of Input RNA
3.3.6 Clean Input Samples with RNA Clean and Concentrator-5
3.3.7 Ligation of RNA Adaptor to Input Samples
3.3.8 Input Samples Cleanup
3.3.9 Reverse Transcription of IP and Input Samples
3.3.10 cDNA Cleanup
3.3.11 Bead Cleanup
3.3.12 IP and Input cDNA Ligation
3.4 Day 4: Library Amplification
3.4.1 Samples Cleanup
3.4.2 Quantification by qPCR
3.4.3 PCR Amplification
3.4.4 AMPure Beads Cleanup
3.4.5 Gel Purification
3.4.6 Library Quantification and Sequencing
3.4.7 Read Processing, Normalization, and Visualization
4 Notes
References
Chapter 10: Transcriptional Analysis of Tightly Synchronized Plasmodium falciparum Intraerythrocytic Stages by RT-qPCR
1 Introduction
2 Materials
3 Methods
3.1 Tight Synchronization of Parasite Cultures
3.2 Sample Lysis and RNA Collection Using Trizol
3.3 RNA Extraction and DNase Treatment
3.3.1 Large RNA Quantity Samples
3.3.2 Small RNA Quantity Samples
3.4 Reverse Transcription
3.5 Analysis of cDNA by Quantitative PCR (qPCR)
3.6 Data Analysis
3.6.1 Quality Control
3.6.2 cDNA Quantification and Normalization
4 Notes
References
Chapter 11: Analysis of the Interaction Between Plasmodium falciparum-Infected Erythrocytes and Human Endothelial Cells Using ...
1 Introduction
2 Materials
2.1 Cells
2.2 Laminar Flow System
2.3 Lab Equipment
2.4 Kits
3 Methods
3.1 Co-Incubation of IEs and ECs Using a Laminar Flow System
3.2 Isolation of RNA/miRNA and NGS Sequencing of ECs
3.3 Bioinformatics Data Analysis
3.4 Imaging
3.5 R Tracking Script Used for Bioinformatics Analysis
4 Notes
References
Chapter 12: Integration of Genomic and Transcriptomic Data to Elucidate Molecular Processes in Babesia divergens
1 Introduction
2 Materials
2.1 B. divergens In Vitro Culture
2.2 Giemsa Stain
2.3 B. divergens-Free Merozoite Isolation
2.4 B. divergens Intraerythrocytic Parasites Isolation
2.5 RNA Isolation
2.6 RNAseq Library Preparation
2.7 Computational Tools
2.8 Real-Time RT-qPCR
3 Methods
3.1 B. divergens In Vitro Culture
3.2 B. divergens-Free Merozoite Isolation
3.3 B. divergens Intraerythrocytic Parasites Isolation
3.4 RNA Isolation
3.5 RNAseq Library Preparation
3.6 RNA Sequencing
3.7 Genome Reassembly, Integration, and Improvement Using Data Obtained from Multiple DNA Sequencing Platforms
3.8 Differential Expression Analysis Using RNAseq
3.9 Real-Time RT-qPCR
3.9.1 Reverse Transcription (RT) Reaction
3.9.2 Real-Time qPCR
4 Notes
References
Chapter 13: Integration of Functional Genomic, Transcriptomic, and Metabolomic Data to Identify Key Features in Genomic Expres...
1 Introduction
2 Materials
2.1 B. divergens In Vitro Culture
2.2 CE-TOF/MS
2.3 GC-QTOF/MS
2.4 LC-QTOF/MS
2.5 LC-QqQ/MS
2.6 Statistics
2.7 Genomic, Transcriptomic and Metabolomic Data Integration
3 Methods
3.1 B. divergens In Vitro Culture
3.2 B. divergens-Free Merozoite Isolation
3.3 B. divergens Intraerythrocytic Parasites Isolation
3.4 Supernatants Isolation from B. divergens In Vitro Cultures
3.5 Isolation of uRBC and uS
3.6 Metabolite Extraction
3.7 CE-TOF/MS Analysis and Data Reprocessing
3.8 GC-QTOF/MS Analysis and Data Reprocessing
3.9 LC-QTOF/MS Analysis and Data Reprocessing
3.10 LC-QqQ/MS Analysis and Data Reprocessing
3.11 Statistics
3.12 Biocomputational Integration of Genomic, Transcriptomic, and Metabolomic Data
4 Notes
References
Chapter 14: Genome Analysis of Programmed DNA Elimination in Parasitic Nematodes
1 Introduction
2 Materials
2.1 Parasitic Nematode Materials
2.2 Buffers and Solutions
3 Methods
3.1 High-Quality Genomic DNA Preparation
3.2 Genomic Data Analysis on DNA Elimination
3.2.1 Identify Retained and Eliminated Sequences
3.2.2 Identify DNA Break Regions
3.2.3 Determine the Consequence of DNA Break Ends in the Somatic Cells
4 Notes
References
Chapter 15: Helminth Microbiota Profiling Using Bacterial 16S rRNA Gene Amplicon Sequencing: From Sampling to Sequence Data Mi...
1 Introduction
2 Materials
2.1 Sampling of Schistosoma mansoni Developmental Stages for Helminth Microbiota Profiling
2.1.1 Consumables and Equipment
2.1.2 Buffers, Solutions, Reagents, and Other Compounds
2.2 DNA Extraction
2.3 16S rRNA Gene Amplicon Library Preparation and Sequencing on the Illumina MiSeq Platform
2.3.1 Equipment (see Note 2)
2.3.2 Reagents and Consumables
2.4 Analysis of 16S rRNA Gene Sequencing Data Using QIIME2
3 Methods
3.1 Sampling of S. mansoni Developmental Stages for Helminth Microbiota Profiling
3.1.1 Cercariae
3.1.2 Adult Worms
3.1.3 Eggs
3.1.4 Miracidia
3.1.5 Sporocysts
3.1.6 Collecting Sample Metadata
3.2 DNA Extraction
3.3 16S rRNA Gene Amplicon Library Preparation and Sequencing on the Illumina MiSeq Platform
3.3.1 Amplicon PCR
3.3.2 Amplicon PCR Clean-up
3.3.3 Index PCR
3.3.4 Index PCR Clean-up
3.3.5 Library Quantification and Pooling
3.3.6 Library Denaturation and Loading onto the MiSeq
3.4 Analysis of 16S rRNA Gene Sequencing Data Using QIIME2
3.5 Sequencing Data Visualization, Statistical Analysis and Mining
4 Notes
References
Part IV: Genomics of Parasite-Derived Extracellular Vesicles
Chapter 16: Methods for the Isolation and Study of Exovesicle DNA from Trypanosomatid Parasites
1 Introduction
2 Materials
2.1 Cell Culture of Epimastigotes and Trypomastigotes Forms of T. cruzi
2.2 EVs Induction by Serum Starvation
2.3 Exovesicles Isolation by Ultracentrifugation
2.4 Exovesicles Isolation by Ultrafiltration
2.5 Exovesicles DNA Isolation
2.6 Characterization of the EV-Associated DNA Cargo
3 Methods
3.1 In Vitro Differentiation of Epimastigotes into Metacyclic Trypomastigotes
3.2 Recovery of Trypomastigotes Forms of T. cruzi from Mammalian Cell Cultures and Exovesicles Induction
3.3 Exovesicles Induction from Epimastigotes Forms of T. cruzi
3.4 EVs Isolation and Purification
3.4.1 EVs Isolation by Ultracentrifugation
3.4.2 EVs Isolation by Ultrafiltration
3.5 ExoDNA Isolation
3.6 Characterization of the ExoDNA
4 Notes
References
Chapter 17: Isolation and Analysis of MicroRNAs from Extracellular Vesicles of the Parasitic Model Nematodes Nippostrongylus b...
1 Introduction
2 Materials
2.1 Laboratory Facilities and Equipment
2.2 General Reagents for Molecular Biology
2.3 Biological Material: Extracellular Vesicles from Trichuris muris and Nippostrongylus brasiliensis excretory/secretory prod...
2.4 Kits and Reagents for RNA Extraction and Sequencing
2.5 Bioinformatic Resources
3 Methods
3.1 Density Gradient Purification of EVs from Nematodes
3.2 miRNA Isolation and Quality Control (QC)
3.3 (Optional) RNA Precipitation Procedure for Small RNAs
3.4 miRNA Library Preparation and Sequencing on a NextSeq500 Platform
3.5 miRNA Identification by Using the miRDeep2 Package
4 Notes
References
Index
Methods in Molecular Biology 2369

Luis M. de Pablos and Javier Sotillo, Editors

Parasite Genomics: Methods and Protocols

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Parasite Genomics: Methods and Protocols

Edited by

Luis M. de Pablos Departamento de Parasitologia, Universidad de Granada, Granada, Spain

Javier Sotillo Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain

Editors

Luis M. de Pablos Departamento de Parasitologia, Universidad de Granada, Granada, Spain

Javier Sotillo Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, Instituto de Salud Carlos III, Majadahonda, Madrid, Spain

ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-0716-1680-2 ISBN 978-1-0716-1681-9 (eBook) https://doi.org/10.1007/978-1-0716-1681-9

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: In vitro culture of Babesia divergens: The figure shows B. divergens parasites stained with MitoTracker (green fluorescent) within human red blood cells stained with PKH26 (red fluorescent). Photo credits: Elena Sevilla, Dr. Luis Miguel González, Dr. Estrella Montero.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature.
The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

Parasite genomics is a scientific research area devoted to the study of the genetic material synthesized by parasites. This field has benefited tremendously over recent years from the development of novel methodologies for the sequencing, assembly, and annotation of parasite genomes, which has resulted in numerous publications and significant advances. The expansion and lowered costs of these methodologies, as well as the development of novel strategies integrating genomics with other -omics such as proteomics and metabolomics, have propelled our knowledge in drug discovery, vaccine discovery, gene editing, and the fundamental biology of these pathogens. As James D. Wasmuth stated in an opinion paper in Trends in Parasitology in 2014 [1], “a genome is a resource that can be exploited by many different researchers, in many different ways, to answer many different questions”. Therefore, it is of outstanding importance to share with the research community the appropriate tools and detailed descriptions needed not only to generate high-precision genome sequences but also to analyze and interpret the large amount of data generated. The Parasite Genomics: Methods and Protocols volume aims to provide a comprehensive series of innovative research and methodologies applied to the parasite genomics research area. These methods are provided and written by internationally recognized experts in the field, applying different approaches to analyzing parasite genomes and furthering the study of genetic complexity and the mechanisms of regulation. We hope that this book provides a useful guide to current protocols and techniques for the study of parasite genomes, and we believe that readers will find answers and solutions for analyzing their favorite parasite genome. The volume is organized into four parts.
The first part, “Novel Sequencing and Bioinformatic Pipelines for the Study of Parasite Genomes,” provides state-of-the-art approaches for high-quality genome assembly using third-generation sequencing (Díaz-Viraqué et al., Chap. 1), analysis of 3D genome conformation and interactions among genomic loci using Hi-C (Gupta et al., Chap. 2), and a straightforward method for reconstructing mitochondrial genomes from Next-Generation Sequencing (NGS) datasets (Palevich and Maclean, Chap. 3). Finally, the last chapter of this section provides a detailed pipeline for ortholog and paralog gene identification in eukaryotic genomes (Butterfield et al., Chap. 4).

The second part, “Diagnostic Approaches Using Genomic Tools,” is fully dedicated to the identification of parasite species in environmental samples using two different approaches: next-generation amplicon sequencing using the Blastocystis SSU rRNA gene as an example (Maloney et al., Chap. 5) and Giardia duodenalis coupled to SSU rRNA gene multilocus sequencing for parasite genotyping (Dashti et al., Chap. 6).

The third part, “Host–Parasite Interactions: Deciphering Gene Function and Molecular Processes in Parasites,” describes a wide range of multi-omics approaches for the comprehension of gene expression and regulatory networks, functional genomics, and phenotypic analysis. Fernandes et al. (Chap. 7) employ the dimerisable Cre (DiCre) recombinase system, which has emerged as a powerful approach for conditional gene knockout and phenotypic analysis in Plasmodium parasites. Dos Santos Pacheco and Soldati-Favre (Chap. 8) use an auxin-inducible degron system coupled to a high-resolution microscopy technique termed ultrastructure expansion microscopy to study the cellular architecture of Toxoplasma gondii. In turn, an eCLIP method for the analysis of RNA-binding protein–RNA associations is adapted


and described for the asexual blood stages of Plasmodium falciparum (Hollin et al., Chap. 9), while Casas-Vila et al. (Chap. 10) describe a highly detailed RT-qPCR workflow for the precise characterization of synchronized erythrocyte stages of P. falciparum, and Wu et al. (Chap. 11) examine the cytoadhesive properties of P. falciparum-infected red blood cells and further analyze endothelial cells using transcriptome approaches. González et al. (Chap. 12) and Fernández-García et al. (Chap. 13) introduce integrative approaches for the multi-omic characterization of parasites of the genus Babesia. Finally, two chapters are dedicated to helminth parasites. Firstly, Wang (Chap. 14) introduces new developments and methods to study programmed DNA elimination using the parasitic nematode Ascaris as a model, and Formenti et al. (Chap. 15) describe the appropriate workflow to characterize worm-associated microbial communities, from sampling to sequencing data analysis, visualization, and mining.

The fourth part, “Genomics of Parasite-Derived Extracellular Vesicles,” is entirely dedicated to the characterization of these released nanovesicles, which play a pivotal role in parasite pathogenesis. To characterize the exovesicle DNA cargoes (exoDNA), a novel protocol for exoDNA isolation is proposed (Orrego et al., Chap. 16), whereas Eichenberger (Chap. 17) describes the methodology to isolate and characterize the microRNA cargoes of EVs.

This volume has been a highly collaborative effort to create a detailed picture of the genomic approaches for the better understanding and characterization of parasite nucleic acid content.

Luis M. de Pablos, Granada, Spain
Javier Sotillo, Madrid, Spain

Reference

1. Wasmuth JD (2014) Realizing the promise of parasite genomics. Trends Parasitol 30:321–323.

Contents

Preface
Contributors

PART I: NOVEL SEQUENCING AND BIOINFORMATIC PIPELINES FOR THE STUDY OF PARASITE GENOMES

1 Nanopore Long Read DNA Sequencing of Protozoan Parasites: Hybrid Genome Assembly of Trypanosoma cruzi
Florencia Díaz-Viraqué, Gonzalo Greif, Luisa Berná, and Carlos Robello

2 Chromosomes Conformation Capture Coupled with Next-Generation Sequencing (Hi-C) in Plasmodium falciparum
Mohit Kumar Gupta, Todd Lenz, and Karine G. Le Roch

3 Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data
Nikola Palevich and Paul Haydon Maclean

4 Automated Phylogenetic Analysis Using Best Reciprocal BLAST
Erin R. Butterfield, James C. Abbott, and Mark C. Field

PART II: DIAGNOSTIC APPROACHES USING GENOMIC TOOLS

5 An Illumina MiSeq-Based Amplicon Sequencing Method for the Detection of Mixed Parasite Infections Using the Blastocystis SSU rRNA Gene as an Example
Jenny G. Maloney, Nadja S. George, Aleksey Molokin, and Monica Santin

6 Giardia duodenalis: Detection by Quantitative Real-Time PCR and Molecular Diversity
Alejandro Dashti, Pamela C. Köster, and David Carmena

PART III: HOST-PARASITE INTERACTIONS: DECIPHERING GENE FUNCTION AND MOLECULAR PROCESSES IN PARASITES

7 Conditional Gene Deletion in Mammalian and Mosquito Stages of Plasmodium berghei Using Dimerizable Cre Recombinase
Priyanka Fernandes, Manon Loubens, Olivier Silvie, and Sylvie Briquet

8 Coupling Auxin-Inducible Degron System with Ultrastructure Expansion Microscopy to Accelerate the Discovery of Gene Function in Toxoplasma gondii
Nicolas Dos Santos Pacheco and Dominique Soldati-Favre

9 Genome-Wide Analysis of RNA–Protein Interactions in Plasmodium falciparum Using eCLIP-Seq
Thomas Hollin, Steven Abel, and Karine G. Le Roch

10 Transcriptional Analysis of Tightly Synchronized Plasmodium falciparum Intraerythrocytic Stages by RT-qPCR
Núria Casas-Vila, Anastasia K. Pickford, Harvie P. Portugaliza, Elisabet Tintó-Font, and Alfred Cortés

11 Analysis of the Interaction Between Plasmodium falciparum-Infected Erythrocytes and Human Endothelial Cells Using a Laminar Flow System, Bioinformatic Tracking and Transcriptome Analysis
Yifan Wu, Philip Bouws, Stephan Lorenzen, Iris Bruchhaus, and Nahla Galal Metwally

12 Integration of Genomic and Transcriptomic Data to Elucidate Molecular Processes in Babesia divergens
Luis Miguel Gonzalez, Elena Sevilla, Miguel Fernández-García, Alejandro Sanchez-Flores, and Estrella Montero

13 Integration of Functional Genomic, Transcriptomic, and Metabolomic Data to Identify Key Features in Genomic Expression, Metabolites, and Metabolic Pathways of Babesia divergens
Miguel Fernández-García, Alejandro Sanchez-Flores, Luis Miguel Gonzalez, Coral Barbas, Mª. Fernanda Rey-Stolle, Elena Sevilla, Antonia García, and Estrella Montero

14 Genome Analysis of Programmed DNA Elimination in Parasitic Nematodes
Jianbin Wang

15 Helminth Microbiota Profiling Using Bacterial 16S rRNA Gene Amplicon Sequencing: From Sampling to Sequence Data Mining
Fabio Formenti, Gabriel Rinaldi, Cinzia Cantacessi, and Alba Cortés

PART IV: GENOMICS OF PARASITE-DERIVED EXTRACELLULAR VESICLES

16 Methods for the Isolation and Study of Exovesicle DNA from Trypanosomatid Parasites
Lina María Orrego, Romina Romero, Antonio Osuna, and Luis M. De Pablos

17 Isolation and Analysis of MicroRNAs from Extracellular Vesicles of the Parasitic Model Nematodes Nippostrongylus brasiliensis and Trichuris muris
Ramon M. Eichenberger

Index

Contributors

JAMES C. ABBOTT • Data Analysis Group, Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK
STEVEN ABEL • Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, USA
CORAL BARBAS • CEMBIO (Center for Metabolomics and Bioanalysis), Facultad de Farmacia, Universidad San Pablo CEU, CEU Universities, Campus Monteprincipe, Boadilla del Monte, Madrid, Spain
LUISA BERNÁ • Laboratorio de Interacciones Hospedero-Patógeno - UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay; Sección Biomatemática - Unidad de Genómica Evolutiva, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
PHILIP BOUWS • Bernhard Nocht Institut for Tropical Medicine, Hamburg, Germany
SYLVIE BRIQUET • Sorbonne Université, INSERM, CNRS, Centre d’Immunologie et des Maladies Infectieuses, CIMI-Paris, Paris, France
IRIS BRUCHHAUS • Bernhard Nocht Institut for Tropical Medicine, Hamburg, Germany
ERIN R. BUTTERFIELD • Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee, UK
CINZIA CANTACESSI • Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
DAVID CARMENA • Parasitology Reference and Research Laboratory, National Centre for Microbiology, Madrid, Spain
NÚRIA CASAS-VILA • ISGlobal, Hospital Clínic - Universitat de Barcelona, Barcelona, Catalonia, Spain
ALBA CORTÉS • Departament de Farmàcia i Tecnologia Farmacèutica i Parasitologia, Facultat de Farmàcia, Universitat de València, València, Spain
ALFRED CORTÉS • ISGlobal, Hospital Clínic - Universitat de Barcelona, Barcelona, Catalonia, Spain; ICREA, Barcelona, Catalonia, Spain
ALEJANDRO DASHTI • Parasitology Reference and Research Laboratory, National Centre for Microbiology, Madrid, Spain
LUIS M. DE PABLOS • Institute of Biotechnology, University of Granada, Granada, Spain; Department of Parasitology, Biochemical and Molecular Parasitology Group CTS-183, University of Granada, Granada, Spain
FLORENCIA DÍAZ-VIRAQUÉ • Laboratorio de Interacciones Hospedero-Patógeno - UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
RAMON M. EICHENBERGER • Institute of Parasitology, University of Zurich, Zurich, Switzerland
PRIYANKA FERNANDES • Sorbonne Université, INSERM, CNRS, Centre d’Immunologie et des Maladies Infectieuses, CIMI-Paris, Paris, France
MIGUEL FERNÁNDEZ-GARCÍA • CEMBIO (Center for Metabolomics and Bioanalysis), Facultad de Farmacia, Universidad San Pablo CEU, CEU Universities, Campus Monteprincipe, Boadilla del Monte, Madrid, Spain
MARK C. FIELD • Wellcome Centre for Anti-Infectives Research, School of Life Sciences, University of Dundee, Dundee, UK; Biology Centre, Institute of Parasitology, Faculty of Sciences, University of South Bohemia, České Budějovice, Czech Republic


FABIO FORMENTI • Department of Veterinary Medicine, University of Cambridge, Cambridge, UK; IRCCS Sacro Cuore Don Calabria Hospital, Negrar, Verona, Italy
ANTONIA GARCÍA • CEMBIO (Center for Metabolomics and Bioanalysis), Facultad de Farmacia, Universidad San Pablo CEU, CEU Universities, Campus Monteprincipe, Boadilla del Monte, Madrid, Spain
NADJA S. GEORGE • United States Department of Agriculture, Agricultural Research Service, Environmental Microbial Food Safety Laboratory, Beltsville, MD, USA
LUIS MIGUEL GONZALEZ • Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
GONZALO GREIF • Laboratorio de Interacciones Hospedero-Patógeno - UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay
MOHIT KUMAR GUPTA • Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, USA
THOMAS HOLLIN • Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, USA
PAMELA C. KÖSTER • Parasitology Reference and Research Laboratory, National Centre for Microbiology, Madrid, Spain
TODD LENZ • Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, USA
KARINE G. LE ROCH • Department of Molecular, Cell and Systems Biology, University of California, Riverside, CA, USA
STEPHAN LORENZEN • Bernhard Nocht Institut for Tropical Medicine, Hamburg, Germany
MANON LOUBENS • Sorbonne Université, INSERM, CNRS, Centre d’Immunologie et des Maladies Infectieuses, CIMI-Paris, Paris, France
PAUL HAYDON MACLEAN • AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
JENNY G. MALONEY • United States Department of Agriculture, Agricultural Research Service, Environmental Microbial Food Safety Laboratory, Beltsville, MD, USA
NAHLA GALAL METWALLY • Bernhard Nocht Institut for Tropical Medicine, Hamburg, Germany
ALEKSEY MOLOKIN • United States Department of Agriculture, Agricultural Research Service, Environmental Microbial Food Safety Laboratory, Beltsville, MD, USA
ESTRELLA MONTERO • Laboratorio de Referencia e Investigación en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
LINA MARÍA ORREGO • Programa de Estudio y Control de Enfermedades Tropicales (PECET), Facultad de Medicina, Universidad de Antioquia, Medellín, Colombia
ANTONIO OSUNA • Institute of Biotechnology, University of Granada, Granada, Spain; Department of Parasitology, Biochemical and Molecular Parasitology Group CTS-183, University of Granada, Granada, Spain
NICOLAS DOS SANTOS PACHECO • Department of Microbiology and Molecular Medicine, CMU, University of Geneva, Geneva, Switzerland
NIKOLA PALEVICH • AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand
ANASTASIA K. PICKFORD • ISGlobal, Hospital Clínic - Universitat de Barcelona, Barcelona, Catalonia, Spain
HARVIE P. PORTUGALIZA • ISGlobal, Hospital Clínic - Universitat de Barcelona, Barcelona, Catalonia, Spain


Mª. FERNANDA REY-STOLLE • CEMBIO (Center for Metabolomics and Bioanalysis), Facultad de Farmacia, Universidad San Pablo CEU, CEU Universities, Campus Monteprincipe, Boadilla del Monte, Madrid, Spain
GABRIEL RINALDI • Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
CARLOS ROBELLO • Laboratorio de Interacciones Hospedero-Patogeno—UBM, Institut Pasteur de Montevideo, Montevideo, Uruguay; Departamento de Bioquímica, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
ROMINA ROMERO • Institute of Biotechnology, University of Granada, Granada, Spain; Department of Parasitology, Biochemical and Molecular Parasitology Group CTS-183, University of Granada, Granada, Spain
ALEJANDRO SANCHEZ-FLORES • Unidad Universitaria de Secuenciacion Masiva y Bioinformática, Instituto de Biotecnología, Cuernavaca, Morelos, Mexico
MONICA SANTIN • United States Department of Agriculture, Agricultural Research Service, Environmental Microbial Food Safety Laboratory, Beltsville, MD, USA
ELENA SEVILLA • Laboratorio de Referencia e Investigacion en Parasitología, Centro Nacional de Microbiología, ISCIII Majadahonda, Madrid, Spain
OLIVIER SILVIE • Sorbonne Université, INSERM, CNRS, Centre d’Immunologie et des Maladies Infectieuses, CIMI-Paris, Paris, France
DOMINIQUE SOLDATI-FAVRE • Department of Microbiology and Molecular Medicine, CMU, University of Geneva, Geneva, Switzerland
ELISABET TINTÓ-FONT • ISGlobal, Hospital Clínic—Universitat de Barcelona, Barcelona, Catalonia, Spain
JIANBIN WANG • Department of Biochemistry and Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, USA; UT-ORNL Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
YIFAN WU • Bernhard Nocht Institut for Tropical Medicine, Hamburg, Germany

Part I Novel Sequencing and Bioinformatic Pipelines for the Study of Parasite Genomes

Chapter 1 Nanopore Long Read DNA Sequencing of Protozoan Parasites: Hybrid Genome Assembly of Trypanosoma cruzi

Florencia Díaz-Viraqué, Gonzalo Greif, Luisa Berná, and Carlos Robello

Abstract

Due to highly repetitive genome sequences, short-read-based Trypanosoma cruzi genomes are extremely fragmented. Contiguous trypanosomatid genome assemblies have become possible with the advent of third-generation sequencing technologies. Long reads span several to hundreds of kilobases, allowing correct assembly of repeated and low-complexity DNA regions. However, these techniques present higher error rates. Hybrid assembly strategies that combine error-prone long reads with much more accurate Illumina short reads represent a very convenient approach for enhancing genome completeness. Here, we describe how to perform a hybrid assembly for genomic analysis of protozoan pathogens using Illumina and Oxford Nanopore sequencing.

Key words Protozoan parasites, Trypanosomes, Trypanosoma cruzi, Hybrid Genome Assembly, Long read sequencing, Oxford Nanopore Technologies, Illumina

1 Introduction

Even though trypanosomatid genomes are small (ranging from 25 to 90 Mb) [1], Trypanosoma cruzi assembly has been challenging due to the abundance of repetitive sequences, including multigene families, transposable elements, the 195-bp satellite, and tandem repeats [2–5]. Nevertheless, sufficiently long reads can help to overcome this hurdle. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) sequencing are useful for assembling genomes that are rich in repetitive elements because their long reads can span entire tandem repeat arrays and anchor them to uniquely occurring segments of the genome, resolving these complex regions and improving contiguity. However, these technologies have higher per-base error rates than Illumina short reads, so building assemblies with reasonable consensus accuracy from long reads alone demands considerable amounts of data and intensive computation. Conveniently, it is possible to perform hybrid assemblies, which combine long reads with accurate Illumina short reads to enhance genome completeness.

This protocol describes how to carry out high molecular weight genomic DNA extraction, library preparation, and sequencing with long read technologies, as well as the computational analysis needed to generate high-quality hybrid T. cruzi genome assemblies. The procedure can be applied in the same way to other protozoan parasites; in our laboratory, it has been successfully used with T. cruzi, Trypanosoma vivax, Toxoplasma gondii, and Neospora caninum.

Luis M. de Pablos and Javier Sotillo (eds.), Parasite Genomics: Methods and Protocols, Methods in Molecular Biology, vol. 2369, https://doi.org/10.1007/978-1-0716-1681-9_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

Florencia Díaz-Viraqué et al.

2 Materials

2.1 Parasites

Handling of T. cruzi epimastigotes must be carried out in Class II microbiological safety cabinets. T. cruzi epimastigotes [6] should be cultured axenically in liver infusion tryptose (LIT) medium supplemented with 10% (v/v) heat-inactivated fetal bovine serum at 28 °C.

2.2 Solutions, Reagents, and Consumables

1. LIT medium: 5.0 g/L Liver Infusion Broth, 4.4 g/L NaCl, 0.4 g/L KCl, 2.2 g/L glucose, 5 g/L tryptose, 11.6 g/L Na2HPO4, 15 g/L Yeast Extract. Adjust pH to 7.2 and filter-sterilize. Add 25 mg hemin, 100 U/mL penicillin, and 0.1 mg/mL streptomycin.
2. 10x phosphate-buffered saline (PBS): 1.37 M NaCl, 27 mM KCl, 100 mM Na2HPO4, and 18 mM KH2PO4 pH 7.4 in DEPC water. Autoclave. Prepare fresh 1x solution in sterile conditions for parasite washes.
3. Buffer A: 0.2 M Tris-HCl pH 8, 0.1 M EDTA.
4. Proteinase K (20 mg/mL).
5. 10% (w/v) SDS.
6. Sodium chloride (NaCl) 0.5 M.
7. Phenol (saturated with 10 mM Tris buffer pH 8 and EDTA 1 mM).
8. Chloroform.
9. Isoamyl alcohol.
10. DNAse-free RNAse A.
11. Nuclease-free water.
12. Ethanol absolute.
13. Freshly prepared 70% ethanol in nuclease-free water.
14. Chloroform/Isoamyl alcohol (24:1, v/v).
15. Phenol (TE saturated, pH 8)/chloroform/isoamyl alcohol (25:24:1, v/v).

Long Read DNA Sequencing of Protozoan Genomes Using Nanopore


16. Solutions and buffers contained in: ONT Rapid Sequencing Kit (Nanopore, SQK-RBK004), ONT Ligation Sequencing Kit SQK-LSK109, ONT Native Barcoding Expansions 1–12 EXP-NBD104, NEB Blunt/TA Ligase Master Mix (M0367), NEBNext® Quick Ligation Reaction Buffer (NEB B6058), and NEBNext® Companion Module for Oxford Nanopore Technologies® Ligation Sequencing (cat # E7180S).
17. Agencourt AMPure XP beads.
18. g-TUBEs (Covaris, 520079).
19. 1.5 mL Eppendorf DNA LoBind tubes.
20. 0.2 mL thin-walled PCR tubes.
21. Sterile Falcon tubes.
22. Micropipette tips.

2.3 Equipment

1. MinION Sequencing Device and flow cells (ONT).
2. Thermal cycler.
3. Agarose gel electrophoresis equipment or Bioanalyzer (Agilent).
4. Microcentrifuge.
5. Class II microbiological safety cabinet.
6. Magnetic separator rack.
7. Fluorescence-based quantification equipment (Qubit, DeNovix, etc.).

3 Methods

Two library preparation protocols for genomic studies with long reads are presented, using T. cruzi as a model of a highly repetitive and complex genome. The ONT Rapid Sequencing protocol is a simple and rapid library preparation method for genomic DNA, whereas the ONT Ligation Sequencing protocol is a versatile method optimized for the highest throughput that allows read length to be controlled through user-defined fragmentation. With both protocols, several samples can be analyzed in one sequencing experiment, which reduces costs. These protocols are used with high molecular weight DNA obtained from parasites. In combination with highly accurate Illumina short reads [7], the data generated can be used to perform a highly accurate hybrid genome assembly.

3.1 High Molecular Weight Genomic DNA Isolation

1. Centrifuge 5 × 10^8 epimastigotes at 1400 × g for 15 min at 4 °C. Wash twice with PBS and resuspend in 4 mL of Buffer A. Mix gently by flicking the tube and add 20 μL of proteinase K (20 mg/mL) and 0.2 mL of 10% SDS. Incubate for 1 h at 50 °C. During incubation, gently invert the tubes occasionally to mix (see Note 1).
2. For DNA isolation, add 0.8 mL of 0.5 M NaCl and 5 mL of phenol (pH >7.8). Mix carefully by inversion for 15 min (see Note 2). Separate the phases by centrifuging the sample at 5000 × g for 15 min at room temperature and transfer the upper phase containing the DNA to a new Falcon tube. Add DNAse-free RNAse A to 100 μg/mL and incubate for 15 min at 37 °C. Repeat the extraction with phenol/chloroform/isoamyl alcohol and then with chloroform/isoamyl alcohol (see Note 3).
3. Add two volumes of pre-cooled (−20 °C) absolute ethanol, mix gently by inversion (~10 times) until the solution is completely mixed and the DNA forms a visible precipitate, and incubate overnight at −20 °C (see Note 4). Centrifuge the samples at 5000 × g for 10 min at 4 °C and wash the pellet with 1 mL of freshly prepared, pre-cooled 70% ethanol. Air-dry the pellet and resuspend it in 200 μL of nuclease-free water pre-heated to 60 °C.
4. Quantify DNA samples using a fluorescence-based quantification method and store the DNA at 4 °C for later use. Alternatively, DNA can be quantified spectrophotometrically by A260, although this method is less accurate.

3.2 DNA Fragmentation (Optional)

1. Add 1.5 μg of genomic DNA to a g-TUBE and centrifuge for 1 min at 3000 × g (see Note 5). Invert the tube and centrifuge for another minute. The first spin transfers the sample to the bottom and the second transfers it into the cap. Collect the sample from the screw-cap. Proceed to the next step within 15 min. These settings will produce 10 kb fragments (see Note 6).
2. Quantify DNA fragments using a fluorescence-based quantification method and store the DNA at 4 °C for later use. Optional: Perform a DNA quality check on a 2100 Bioanalyzer (Agilent, USA) using the High Sensitivity DNA Kit.

3.3 ONT Ligation Sequencing Protocol

This protocol consists of three steps: repairing DNA ends and preparing them for adaptor attachment, ligating barcodes to each sample, and ligating sequencing adaptors.

3.3.1 DNA Repair and Preparation for Adaptor Ligation

1. Prepare ONT Ligation libraries from fragmented gDNA according to the manufacturer’s instructions.
2. To 200 fmol of DNA in 48 μL (see Note 7), add 3.5 μL of NEBNext FFPE DNA Repair Buffer, 2 μL of NEBNext FFPE DNA Repair Mix, 3.5 μL of Ultra II End-prep reaction buffer, and 3 μL of Ultra II End-prep enzyme mix. Mix gently by flicking the tube and spin down.
3. Incubate 5 min at 20 °C and 5 min at 65 °C in a thermal cycler.
4. Add 60 μL of AMPure XP beads and mix by flicking the tube (see Note 8).
5. Incubate 5 min at room temperature in a rotator mixer.
6. Place the tube on a magnetic rack for 15 s and carefully remove the supernatant once the solution is clear.
7. Add 200 μL of freshly prepared 70% ethanol without disturbing the pellet and remove. Repeat this step.
8. Spin down and remove any residual ethanol on the magnetic rack. Allow to air-dry for 30 s.
9. Take the tube out of the magnetic rack, resuspend in 25 μL of DNAse-free water, and incubate 2 min at room temperature.
10. Pellet on the magnetic rack and transfer the supernatant to a 1.5 mL Eppendorf tube.
11. Quantify the end-prepped DNA using a fluorescence-based quantification method.

3.3.2 Barcode Ligation

1. To 500 ng of end-prepped DNA in 22.5 μL, add 2.5 μL of barcode (select a unique barcode for every sample) and 25 μL of Blunt/TA Ligase Master Mix. Mix gently by flicking the tube and spin down.
2. Incubate 10 min at room temperature.
3. Add 50 μL of AMPure XP beads and mix by pipetting.
4. Incubate 5 min at room temperature in a rotator mixer.
5. Pellet on a magnetic rack and discard the supernatant.
6. Add 200 μL of freshly prepared 70% ethanol without disturbing the pellet and remove. Repeat this step.
7. Spin down and remove any residual ethanol on the magnet. Allow to air-dry for 30 s.
8. Remove from the magnetic rack and resuspend in 26 μL of DNAse-free water.
9. Incubate 2 min at room temperature.
10. Pellet on the magnetic rack and transfer the supernatant to a 1.5 mL Eppendorf tube.
11. Quantify the barcoded DNA using a fluorescence-based quantification method.

3.3.3 Adaptor Ligation

1. Pool barcoded samples in an equimolar or desired ratio (see Note 9).
2. Quantify the pool using a fluorescence-based quantification method.
3. To 700 ng of DNA pool in 65 μL of DNAse-free water, add 5 μL of Adaptor Mix II (AMII), 20 μL of NEBNext Quick Ligation Reaction Buffer (5X), and 10 μL of Quick T4 DNA ligase. Mix gently by flicking the tube and spin down.
4. Incubate 10 min at room temperature.
5. Add 50 μL of AMPure XP beads and mix by pipetting.
6. Incubate 5 min at room temperature in a rotator mixer.
7. Pellet on the magnetic rack and discard the supernatant.
8. Add 250 μL of Long Fragment Buffer and resuspend the beads by flicking the tube. Pellet on the magnetic rack and pipette off the supernatant. Repeat this step.
9. Spin down and remove any residual supernatant on the magnetic rack. Allow to air-dry for 30 s.
10. Remove from the magnet and resuspend in 15 μL of Elution Buffer (EB, included in the ONT kit).
11. Incubate 10 min at 37 °C.
12. Pellet on a magnet and transfer the supernatant to a 1.5 mL Eppendorf tube.
13. Quantify the adaptor-ligated DNA using a fluorescence-based quantification method.
14. 50 fmol are then used for sequencing.
15. Incubate 5 min at room temperature and proceed with priming and loading the MinION flow cell according to the manufacturer’s instructions.

3.4 ONT Rapid Sequencing Protocol

This straightforward 10-min protocol makes it possible to prepare genomic libraries using basic laboratory equipment.

1. Prepare ONT Rapid libraries from high molecular weight gDNA according to the manufacturer’s instructions.
2. In a 0.2 mL tube, add 400 ng of gDNA in 7.5 μL and 2.5 μL of Fragmentation Mix RB01 to RB12 (one for each sample).
3. Incubate 1 min at 30 °C and then 1 min at 80 °C.
4. Keep the tubes on ice to cool them down.
5. Prepare 10 μL of pooled barcoded libraries by mixing the samples in equal volumes or in the desired ratio (if barcoding four or more libraries, it is recommended to clean and concentrate the pooled material using an equal volume of AMPure XP beads).
6. Add 1 μL of RAP to 10 μL of pooled libraries.
7. Incubate 5 min at room temperature and proceed with priming and loading the MinION flow cell according to the manufacturer’s instructions.


3.5 Genome Assembly

The computational analysis for genome assembly includes (1) basecalling of ONT reads, (2) quality control of reads, (3) hybrid genome assembly, and (4) assembly polishing.

3.5.1 Basecall Reads Using Guppy

A DNA strand passing through the nanopore causes characteristic changes in ionic current. ONT devices produce fast5 files containing these ionic current measurements. Basecalling is the process of converting the electrical signals into the corresponding base sequence. It can be done in real time using the MinKNOW software, or post-run using Guppy from the command line:

$ guppy_basecaller --input_path /data/my_folder/reads --save_path /data/output_folder/basecall --compress_fastq --flowcell FLO-MIN106 --kit SQK-LSK109 --barcode_kits EXP-NBD104 --trim_barcodes

Merge all of the fastq files together into a single file:

$ cat basecall/*fastq.gz > all.fastq.gz

3.5.2 Quality Control of Reads

$ fastqc ~/workdir/all.fastq.gz
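FastQC summarizes base quality, but for long reads the read-length distribution is often inspected as well. The following is a minimal sketch, not part of the original protocol, that reports the number of reads, total bases, and read N50 using standard awk and sort. The function name read_n50 is hypothetical, and the sketch assumes ordinary four-line FASTQ records (no wrapped sequence lines).

```shell
# Summarize read lengths (count, total bases, N50) for FASTQ data on stdin.
# Assumption: each record is exactly 4 lines, so the sequence is line 2 of 4.
read_n50() {
  awk 'NR % 4 == 2 { print length($0) }' |   # one read length per line
    sort -rn |                               # longest reads first
    awk '{ len[NR] = $1; total += $1 }
         END {
           if (NR == 0) { print "reads=0 bases=0 N50=0"; exit }
           half = total / 2; run = 0
           # N50: length of the read at which the running sum reaches half of all bases
           for (i = 1; i <= NR; i++) {
             run += len[i]
             if (run >= half) {
               print "reads=" NR " bases=" total " N50=" len[i]
               exit
             }
           }
         }'
}
```

With the merged file from the previous step, it could be run as `gzip -dc ~/workdir/all.fastq.gz | read_n50`.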

3.5.3 Genome Assembly

Along with the development of long read sequencing, new assembly algorithms have been developed that are specifically tailored to long reads. These algorithms attempt to reach a balance between being fast, accurate, and computationally inexpensive. Despite reported differences in performance, not all are optimal for dealing with complex genomes. Our results in T. cruzi indicate that the best-performing assemblers are Canu [8] and MaSuRCA [9]. However, this is highly dependent on genome characteristics; some faster and less computationally expensive algorithms could give similar results in “simpler” genomes. Although Canu does not use the Illumina reads in its algorithm, they are needed in the following steps to correct the assembly (see Note 10). Here, we describe how to perform a hybrid assembly using MaSuRCA, but it should be noted that new algorithms for long reads are continually being developed; for instance, HASLR has been released early this year [10].

MaSuRCA

1. Create a configuration file with the location of the data and parameter information. Copy the template from the directory where the software was installed into your working directory and edit it with your information. An example is detailed in Annex 1.

$ cp ~/softwaredir/sr_config_example.txt ~/workdir/

2. Make the shell script

$ ./masurca/

3. Run the shell script

$ ./assemble.sh

3.5.4 Assembly polishing

Illumina reads are then used to improve the draft assembly with Pilon [11]. Pilon takes as input the draft assembly in FASTA format and BAM files of reads aligned to the draft assembly. To generate the BAM files, Illumina reads are mapped to the assembly using bwa [12], and the BAM files are then sorted and indexed with SAMtools [13]:

$ bwa index ~/workdir/assembly.fasta
$ bwa mem ~/workdir/assembly.fasta ~/workdir/Illumina_R1.fastq.gz ~/workdir/Illumina_R2.fastq.gz | samtools view -bS - | samtools sort -o ~/workdir/mapping.sorted.bam -
$ samtools index ~/workdir/mapping.sorted.bam

Finally, polish the assembly with Pilon:

$ pilon -Xmx200G --genome ~/workdir/assembly.fasta --fix all --changes --frags ~/workdir/mapping.sorted.bam --threads 14 --output ~/workdir/pilon_round1

Polishing can be repeated for several rounds: index the assembly produced in round 1 (the pilon_round1.fasta file), map the reads to it with bwa, and call Pilon again with the new mappings. After several polishing rounds, the result is a high-quality genome assembly that can be annotated and submitted to genome databases.
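The repeated rounds described above can be sketched as a small shell function. This is an illustrative dry-run planner rather than the authors' script: the function name polish_plan and the output file names are assumptions, and the function only prints the bwa, SAMtools, and Pilon commands for each round so they can be reviewed before being executed (for example, by piping the output to sh). Memory and thread options are omitted and should be set for the machine in use.

```shell
# Print the commands for N rounds of Pilon polishing (dry run).
# Usage: polish_plan ROUNDS DRAFT_FASTA ILLUMINA_R1 ILLUMINA_R2
# Outputs are named pilon_round<i>.* in the current directory (hypothetical naming).
polish_plan() {
  rounds=$1; draft=$2; r1=$3; r2=$4
  i=1
  while [ "$i" -le "$rounds" ]; do
    out="pilon_round${i}"
    bam="${out}.sorted.bam"
    # Index the current draft, map the Illumina reads, sort and index the BAM
    echo "bwa index ${draft}"
    echo "bwa mem ${draft} ${r1} ${r2} | samtools view -b - | samtools sort -o ${bam} -"
    echo "samtools index ${bam}"
    # Polish; Pilon writes the corrected assembly to <output prefix>.fasta
    echo "pilon --genome ${draft} --frags ${bam} --fix all --changes --output ${out}"
    # The polished assembly becomes the draft for the next round
    draft="${out}.fasta"
    i=$((i + 1))
  done
}
```

For example, `polish_plan 3 ~/workdir/assembly.fasta ~/workdir/Illumina_R1.fastq.gz ~/workdir/Illumina_R2.fastq.gz` prints twelve commands, with each round taking the previous round's FASTA file as its draft.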

4 Notes

1. Safe stopping point: samples can be stored at room temperature until the next step (up to 48 h). Do not store the samples at 4 °C because sodium dodecyl sulfate precipitates below 16 °C.
2. It is highly recommended to use a DNA extraction method that preserves the integrity of high molecular weight DNA. ONT read lengths are limited by the length of the DNA molecules present in the sample. Take care to avoid mechanical fragmentation during DNA extraction. Normal DNA manipulations such as pipetting or vortexing can shear or nick DNA. To transfer the viscous upper phases, we recommend cutting off the end of the pipette tip (with clean scissors) to increase the bore size of the tip to 0.3–0.4 mm diameter.
3. Use sterile polypropylene centrifuge tubes suitable for organic solutions for this purpose.
4. DNA should quickly become visible as a cloud of fibers. The DNA precipitate can be transferred by spooling with a Pasteur pipette to an Eppendorf tube containing 70% ethanol.
5. Eppendorf® 5415 and MiniSpin plus centrifuges are compatible, and the settings recommended by the manufacturers to produce 10 kb fragments are 6000 rpm and 8000 rpm, respectively.
6. DNA fragmentation for ONT sequencing is a trade-off between the amount of data produced and read length. These settings will produce 10 kb fragments from 227 fmol of double-stranded DNA. g-TUBEs can be re-used for the same sample. Different fragment sizes can be selected by changing the acceleration rate and speed of the centrifuge.
7. 200 fmol of double-stranded DNA fragments of 10 kb corresponds to 1.32 μg, where m is the mass, n the amount of substance, and PM the molecular weight. For fluorescence-based quantification, we usually use a Qubit fluorometer.

\[ 10000 \times 660\ \text{Da} = 6600000\ \text{Da} \]
\[ m = n \times PM \]
\[ m = 200 \times 10^{-15}\ \text{mol} \times 6600000\ \text{Da} \]
\[ m = 1.32 \times 10^{-6}\ \text{g} = 1.32\ \mu\text{g} \]

8. Resuspend the AMPure XP beads by vortexing before use.
9. Since 700 ng is required, pool equimolar amounts of each barcoded DNA to produce a pooled sample of 750 ng total.
10. Genome assembly using Canu:

$ canu -p ~/workdir/canu_assembly -d canu_assembly_v1 genomeSize=60m -nanopore ~/workdir/all.fastq.gz
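The molar conversion in Note 7 recurs throughout the library preparation (200 fmol for end-repair, 50 fmol for loading, 227 fmol per g-TUBE). As a convenience, it can be scripted with awk. This is a minimal sketch using the same approximation as Note 7 (660 g/mol per base pair of double-stranded DNA); the function names fmol_to_ng and ng_to_fmol are hypothetical.

```shell
# Average molar mass of one base pair of double-stranded DNA (g/mol), as in Note 7.
BP_MASS=660

# fmol_to_ng FMOL LENGTH_BP -> mass in ng
# ng = fmol * 1e-15 mol * (length * 660 g/mol) * 1e9 ng/g = fmol * length * 660 / 1e6
fmol_to_ng() {
  awk -v n="$1" -v len="$2" -v bp="$BP_MASS" \
    'BEGIN { printf "%.1f\n", n * len * bp / 1e6 }'
}

# ng_to_fmol NG LENGTH_BP -> amount in fmol (inverse of the conversion above)
ng_to_fmol() {
  awk -v m="$1" -v len="$2" -v bp="$BP_MASS" \
    'BEGIN { printf "%.1f\n", m * 1e6 / (len * bp) }'
}
```

For 10 kb fragments, `fmol_to_ng 200 10000` prints 1320.0 (that is, 1.32 μg, as in Note 7), and `ng_to_fmol 1500 10000` prints 227.3, matching the 227 fmol quoted in Note 6 for 1.5 μg of DNA.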

Acknowledgments

This work was funded by Research Council United Kingdom Grand Challenges Research Funder “A Global Network for Neglected Tropical Diseases” grant number MR/P027989/1. FDV has an ANII doctoral fellowship (No. POS_NAC_2016_1_129916). GG, LB, and CR are members of the Sistema Nacional de Investigadores (SNI-ANII, UY).

Annex 1 Masurca Configuration File

# DATA is specified as type {PE,JUMP,OTHER,PACBIO} and 5 fields:
# 1)two_letter_prefix 2)mean 3)stdev 4)fastq(.gz)_fwd_reads
# 5)fastq(.gz)_rev_reads. The PE reads are always assumed to be
# innies, i.e. --->.

14,000 rpm for 15 min at 4 °C (see Note 7). Supernatant was carefully removed to avoid disturbing the DNA pellet.
4. The DNA pellet was washed twice with 70% ethanol (see Note 8) and centrifuged at 13,000 × g for 30 min at 4 °C.
5. Supernatant was carefully removed, and the DNA pellet was allowed to air dry (see Note 9).
6. The DNA pellet was resuspended in 50 μL of TE buffer or nuclease-free H2O.

3.3 Quantification of gDNA Yield

The yield and concentration of the genomic DNA were determined using two independent methodologies, NanoDrop and the Qubit BR dsDNA assay (see Note 10).

1. Nucleic acid quality was measured spectrophotometrically using the NanoDrop by analyzing 2 μL of gDNA, where nuclease-free H2O was used as a blank. The nucleic acid concentration and quality were recorded using OD260/280 nm and OD260/230 nm ratios (i.e., absorbance at 260 nm and 230 nm wavelengths), respectively.
2. Nucleic acid quantity was also measured using a Qubit fluorometer with the Quant-iT dsDNA Broad-range (BR) Assay kit (2–1000 ng) as per the manufacturer’s instructions.

3.4 Examination of gDNA Using Agarose Gel Electrophoresis

1. To verify the quality and integrity of the genomic DNA, add approximately 1000 ng of gDNA sample diluted in 10 μL of H2O and 2 μL of 6× loading dye, and load on a 0.8% (w/v) agarose, 0.5× TBE gel. Also load 5 μL of 1 Kb Plus DNA molecular size marker into a lateral lane for reference and size estimation.
2. Gels were cast in stands, submerged in 1× TBE buffer in a gel electrophoresis system, and run at 70 V for 3 h in 0.5× TBE buffer. The gel was then visualized using UV trans-illumination and photographed (see Note 11).

3.5 Sequencing

1. The complete mt genome sequences of helminths can be obtained using a wide variety of next-generation sequencing technologies.
2. For whole-genome shotgun paired-end (PE) sequencing, the Illumina HiSeq2500 sequencing platform or similar is suitable (see Note 12), according to the manufacturer’s recommended protocol.


Nikola Palevich and Paul Haydon Maclean

3. Alternatively, closed mt genomes can be generated from a single read using the Pacific Biosciences (PacBio) or Oxford Nanopore sequencing platforms, according to the manufacturer’s instructions.

3.6 Pre-processing of NGS Data

The bioinformatic pipeline and instructions described below for mitochondrial genome assembly, annotation, and sequence analysis can be found on the GitHub page (github.com/NikPalevich/helminth_mt-genomes) and can be easily performed without any programming knowledge.

1. Most NGS data is made available in FASTQ format, and the quality of the raw sequence reads can be evaluated (see Note 13) using the FastQC software package [29].
2. The error correction tool Trimmomatic v.0.36 [32] is used for removal of adaptor and low-quality (Phred scores

Root tree > Midpoint
3. Show the local support values. In FigTree: Node Labels > Display: label
4. When analyzing the tree:
(a) Look at the node support values (see Note 44).
(b) How does the topology compare to the species tree? (see Fig. 3a–d and Note 45)
(c) Is long branch attraction present? (see Fig. 3e and Note 46)
(d) Compare the tree to the alignment. Is there anything in the alignment which could explain discrepancies in the tree? (see Note 47)
5. Open the _aln.fasta and _edited.fasta files in an alignment viewer such as Jalview [15].
6. Color the residues to show conservation. In Jalview [15]: Color > Clustalx
7. When analyzing the alignment:
(a) Is the conservation strong or poor? (see Note 48)
(b) Does the conservation cover the sequence or is it restricted to a domain region only?
(c) How does the edited alignment compare to the unedited alignment? Is there still good representation of the entire protein, or has the alignment been trimmed to a specific domain only?
(d) Investigate any sequences which are significantly shorter than the rest of the alignment in the edited alignment (see Note 49).
8. If sequences need to be removed from the alignment, the tree will need to be recomputed.
(a) Remove required sequences from the unaligned fasta files and save.
(b) Within the jobs folder, run the following command to make a new directory, where name is the name of the directory to create.
mkdir name
(c) Use the following command to change into the new directory, where name is the name of the new directory.

Automated Phylogenetic Analysis


Fig. 3 Examples of phylogenetic trees. (a) Represents the species tree and is labeled to show the different components of a phylogenetic tree. (b) An example of rotation of branches within nodes, which does not change the tree topology; although they look different, Tree A and Tree B are the same. (c) Rearrangement of branches: species “e” now groups with the outgroup “a”. (d) An example of the presence of paralogs in the same tree. (e) Species “a” is an example of a long branch

cd name
(d) Move the new unaligned fasta files into the new directory.
(e) Run the following command (see Notes 38–41).
fasttree_pipeline
9. Repeat steps 1–8 as many times as required until the user is satisfied no false hits are present.

3.10 Finalize and Validate Results

Orthology results should be validated through multiple methods where possible to increase confidence. At the very least results should be submitted to other phylogenetic algorithms, e.g.,


Erin R. Butterfield et al.

PhyML [29] and MrBayes [30]. It is also useful to compare domain predictions and domain organization between candidate orthologs. Proteins performing the same function would be expected to have the same (or highly similar) domain predictions and organizations; however, divergence can sometimes lead to a failure to predict a domain. Domain predictions can also be useful for identifying gain or loss of features. It is also useful to compare orthogonal data, e.g., localization of different orthologs (ideally experimentally confirmed rather than predicted). Orthogonal approaches for validation can increase confidence in functional orthology results, particularly if conservation is poor between orthologs.

4 Notes

1. This pipeline has been tested on MacOS Mojave 10.14.3, MacOS Catalina 10.15.7, CentOS 6, and Red Hat Enterprise Linux 7.
2. Miniconda can be downloaded from Anaconda: https://docs.conda.io/en/latest/miniconda.html
3. Bioconda [14] installation instructions can be found here: https://bioconda.github.io/user/install.html If Miniconda is already installed, then only the channels should need to be set up.
4. Jalview [15] can be downloaded from: http://www.jalview.org/getdown/release/ or installed using Bioconda [14].
5. FigTree can be downloaded from: https://github.com/rambaut/figtree/releases or installed using Bioconda [14].
6. batch_brb can be installed from Bioconda [14]; source code is available on GitHub: https://github.com/erin-r-butterfield/batch_brb and Zenodo: https://doi.org/10.5281/zenodo.4282534
7. Depending on the operating system, Conda may first need to be activated using the following command, where path_to_conda is the path where Conda is installed.
source /path_to_conda/bin/activate
Once Conda is activated, “(base)” should be visible next to the prompt.
8. Once activated, the environment name “(batch_brb)” should be visible next to the prompt.
9. Several considerations should be made when choosing organisms to include. Firstly, is the goal to analyze a specific lineage or is it pan-eukaryotic? Secondly, even sampling across the desired dataset is important; include at least two organisms from each group, as this can give confidence in negative results. Thirdly, understand the quality of the datasets and specifically whether the entire coding complement is adequately covered [9]. Finally, how divergent are the genes of interest? A large search database can compromise identification of divergent candidates because the Expect value (Evalue) depends on database size, so increasing levels of conservation are required to secure a significant result [31].
10. Protein sequences enable detection of greater divergence due to codon degeneracy and the increase in character states (20 vs 4) decreasing random matching [9].
11. Genome annotations can change; therefore, it is important to log information regarding data source, relevant publications, date of download, and data type. This information will also be required for publication.
12. Valid input fasta file header formats:
>..|accession information (e.g., UniProt [18])
>ENA|accession|information (e.g., the European Nucleotide Archive (ENA) [32])
>jgi|organism|accession (e.g., JGI [22])
>. . .|accession information
>accession information (e.g., NCBI [19], Ensembl [20])
>accession | information (e.g., EuPathDB [21])
These are valid provided the accession is less than or equal to 44 characters; if it is longer, the user will need to truncate the accession.
13. Files with the extension .tar.gz can be decompressed using the following command, where file is the filename.
tar -zxvf file.tar.gz
14. If all files in the databases folder need to be converted to BLAST databases, the ls command can be used to get all the names, and the results can be copied into the infile column.
ls *
15. Do not include spaces (use _ instead, and replace them in the filename as well) and do not start with a number.
16. If using MacOS Numbers, be sure to remove columns which contain no headings or data before converting to CSV.
17. The batch_makeblastdb command first checks that a valid fasta file has been submitted and which data type it contains (protein or nucleotide).
The headers of the sequences in the fasta file are converted into the required format for the pipeline and a five character code followed by an underscore is appended as a prefix to the accessions. A single code is used for all sequences

in the fasta file. The details about the file, the unique code, and all the accessions are added to the SQLite3 database, and the fasta file is converted into a BLAST [8] database using the BLAST makeblastdb command [33].
18. Some common errors:
(a) Names are already present in database; ensure the file has not already been used. If it has and it needs to be replaced, the delete_db pipeline can be used (see Note 19); otherwise, change the required names.
(b) lock file detected, sleeping...; this occurs most frequently if the batch_makeblastdb command was terminated prematurely by the user. The lock_file protects the SQLite3 database from corruption, so first confirm that no other script is running (this is important if many users are using the software at once, e.g., in a cluster environment). Then end the current script (Ctrl + c), delete the lock_file (rm lock_file), and repeat the script.
(c) BLAST Database creation error: Near line x, the local id is too long. Its length is y but the maximum allowed local id length is 50. Please find and correct all local ids that are too long; the accessions in the input fasta file are too long (they must be less than or equal to 44 characters because a unique id is added). Truncate the accessions.
19. To delete a database, fill in the delete_database_template.xlsx located in the templates/Excel_files folder:
(a) BLAST_db; name of the BLAST [8] database to delete (do not include _database in the name (Table 1)).
(b) SQLite3_db; name of the SQLite3 database; leave blank if left blank in Subheading 3.3.
(c) Save as a CSV into the databases folder (see Note 16).
(d) Change into the databases folder.
    cd ../databases
(e) Run the following command, where csv.csv is the name of the CSV file.
    delete_db -csv csv.csv
This will delete the database and remove information about this database from the SQLite3 database.
20. Ensure all database names end in _database (Table 1). Provided all databases in the databases folder are to be included, a simple way to do this is to run the following command, where out is the name of the output file to create.
    ls *_converted.fasta | sed 's/_converted.fasta/_database/g' > out.txt
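The 44-character accession limit in Notes 12 and 18c follows from the six-character prefix described in Note 17 and the BLAST local id limit of 50. The following is a minimal Python sketch of a check that could be run before batch_makeblastdb; it is not part of the pipeline. The function name is hypothetical, and the sketch assumes the accession is the first whitespace-delimited token of each header, which holds for the ">accession information" style but would need adapting for the pipe-delimited formats.

```python
# Hypothetical pre-flight check; not part of the published pipeline.
MAX_ACCESSION = 44    # accession limit stated in Note 12
PREFIX_LEN = 6        # five-character code plus underscore (Note 17)
BLAST_ID_LIMIT = 50   # maximum local id length reported by BLAST (Note 18c)


def long_accessions(fasta_lines, max_len=MAX_ACCESSION):
    """Return (accession, length) for each header accession longer than max_len.

    Assumes the accession is the first whitespace-delimited token after '>'.
    """
    too_long = []
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            accession = tokens[0] if tokens else ""
            if len(accession) > max_len:
                too_long.append((accession, len(accession)))
    return too_long


# Made-up records: one accession at the limit, one a single character over it.
example = [
    ">" + "A" * 44 + " hypothetical protein",
    "MKT",
    ">" + "B" * 45 + " hypothetical protein",
    "MKV",
]
flagged = long_accessions(example)
print(flagged)
```

To check a real file, an open file handle can be passed in place of the example list, since the function only iterates over lines.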

21. Be sure to include the organism that the sequences of interest come from in the alias database; this will provide useful information regarding possible paralogs and also ensure these sequences are included in the fasttree_pipeline later.
22. This script will check that the new alias database has not previously been used, create an alias database using the BLAST blastdb_aliastool [33], and add the newly formed database information to the SQLite3 database.
23. Some common errors:
(a) BLAST database creation error: BLASTDB alias file creation failed. Some referenced files may be missing; ensure no errors occurred during the batch_makeblastdb pipeline in Subheading 3.3 and that _database (Table 1) has been added to the end of all database names.
(b) Names are already present in database; see Notes 18a and 19.
(c) lock file detected, sleeping...; see Note 18b.
24. The accession_retrieve pipeline determines the md5 checksum for the input sequences and compares these to the md5 checksums in the SQLite3 database. Any md5 checksums that match will be retrieved. These are filtered to those present in the BLAST [8] database specified by the user (this is why a single organism database should be used for retrieval, rather than an alias database). For sequences where the relevant accession cannot be retrieved, the pipeline will run BLAST [8] so the user can determine which is the relevant accession.
25. The columns of the BLAST [8] tabular output are: query id (qseqid), subject id (sseqid), percentage identity (pident), length, number of mismatches (mismatch), number of gap openings (gapopen), alignment start for query sequence (qstart), alignment end for query sequence (qend), alignment start for subject sequence (sstart), alignment end for subject sequence (send), Expect value (evalue), bitscore, number of gaps (gaps), total query coverage (qcovs), query coverage per high scoring pair (qcovhsp), query length (qlen), and subject length (slen) [34]. Column names are not provided in the output; the names in brackets are the identifiers from [34].
26. Common errors:
(a) The accessions retrieved are not from the organism of the sequences of interest; this occurs because the sequences are identical. The user can ensure the correct organism accession is retrieved by listing the specific organism database rather than an alias database.
(b) Error: Cannot find Fasta_file; ensure the fasta file is located in the jobs directory (it will have moved if the job is being
rerun) and that the file name is correctly spelt and contains the extension.
27. Each accession should be on a new line. Commas (“,”) should be removed (if using results from previous ortholog predictions).
28. Accessions need to be the version stored within the SQLite3 database. For the first round of orthology searching, these accessions will need to be retrieved from the SQLite3 database using the accession_retrieve pipeline in Subheading 3.5. For genome walking (Subheading 3.7), the accessions listed in the orthology results from Subheading 3.6 will be the SQLite3 database accessions, so these will not need to be retrieved.
29. If not using an alias database, include _database in the database name (Table 1).
30. It is advisable to keep this number low, as the greater the number, the greater the false-positive rate. A value of 5 is normally used for divergent sequences. A value of 2 or 3 is normally used for more conserved sequences.
31. The lower the value, the greater the false-positive rate. A value of 30 is normally used for divergent sequences. A value of 50 is normally used for more conserved sequences.
32. If searching a database with many organisms, this may need to be increased.
33. Alignments are edited before performing phylogenetic analyses. While gaps can be useful when looking at alignments, i.e., for the identification of indels, poorly conserved regions of an alignment can decrease the alignment signal-to-noise ratio, which can affect tree quality, although this can be less of an issue for short alignments [35].
34. If specifying a model, use lower case.
35. The orthology_pipeline retrieves sequences from the organism database for the accessions provided and performs a BLAST [8] against the search database. Results are filtered to take the top x hits per organism per query, where x is user supplied (identical results do not count towards the hit count). These are further filtered to those hits which cover y percentage of the query, where y is user supplied. The sequences for this list of hits are retrieved and BLAST [8] is performed against the organism database. These results are filtered to take the top x hits per query, where x is user supplied (identical results do not count towards the hit count). These are further filtered to those which cover y percentage of the query, where y is user supplied. The two result sets are compared; where they match, the hits are considered orthologs (i.e., if A detects B in first BLAST [8] and B detects A in reverse BLAST [8], then A and B
are called orthologs). If the tree pipeline is selected, the sequences of all predicted orthologs per query will be retrieved, aligned using MUSCLE [11], and edited using alncut [12], and a FastTree [13] phylogenetic tree will be built.
36. Some common errors:
(a) Error: [blastdbcmd] Entry not found: accession Error: [blastdbcmd] Entry or entries not found in BLAST database; ensure the SQLite3 database accessions are used, obtained with the accession_retrieve pipeline (Subheading 3.5).
(b) Can’t find (first/reverse) BLAST database; ensure the database names are correct.
(c) mv: rename Accession_list to ../Accession_list: No such file or directory; ensure Accession_list (from step 2b) contains the file extension.
(d) Didn’t get the anticipated results; stringency may be too high. Lower the alignment coverage and potentially increase the hit number. Alternatively, perform “genome walking” (Subheading 3.7).
37. merge_results will combine the results of two orthology searches. The script will map the query accessions from the round of genome walking back to the organism results in the first orthology search to determine the query accessions of the original orthology search. This mapping is used to add the results of the genome walk. Instances where the result for an organism is detected by two different queries will result in the new results being added to both queries, i.e., if A and B from the first orthology results both predict C as an ortholog, then the orthology results using C as query will be mapped back to both A and B.
38. Editing frequency can be altered by using the argument below, where frequency is a value between 0 and 1. The default value is 0.25, which allows gaps to be present in 25% of sequences at a given residue.
    -f frequency
39. The model can be altered by using the argument below, where model is either lg [25] or wag [26] for protein or gtr for nucleotide [13]. The default is JTT [23] for protein and JC [24] for nucleotide.
    -m model
40. This script requires there be greater than three sequences present for each query in order to build an alignment and tree.
41. This script also works with fasta files or text files of accession lists. For text files the -db argument is required. The -csv
argument is not required for either text or fasta files. Each file will produce its own alignment and tree, so only include the accessions/sequences that should appear on the same tree.
42. Some common errors:
(a) Database required if text files supplied; the -db argument is required for both text files and a CSV file; ensure this has been included.
(b) Not enough sequences to build tree; this pipeline requires there to be at least three sequences for a tree to be built.
(c) Error: Inappropriate model type chosen...; this occurs if a nucleotide model was chosen for protein sequences or vice versa. If this occurs, the default algorithm will be selected (JTT [23] for protein, JC [24] for nucleotide).
(d) The pipeline is making folders/trees, etc. of everything in the jobs folder; ensure a new folder is created which contains only the files the user wants to use to build a tree. Make sure the terminal is within this folder before launching fasttree_pipeline.
43. Where possible the tree should be rooted by an outgroup; this can either be an organism that is known to be at the base of the tree or a gene sequence which shares ancestry with the gene of interest. Where this is not possible, trees can be rooted by midpoint, which takes the two longest branches of the tree and places the root at the midpoint between them. Assuming equal evolutionary rates across the tree, this will normally result in the tree following the species tree [7].
44. The closer the value is to 1, the more confident the topology. Nodes with values ≥0.95 show strong support, 0.85–0.94 show good support, 0.75–0.84 show moderate support, and ≤0.74 show poor support.
45. While the tree topology will not always follow the species tree, it can be used to give an indication as to whether false positives or paralogs have been included [7]. Figure 3 demonstrates various examples of how the tree can differ from the species tree (A). Branches within a node are of equal distance from other branches within the same node. Therefore, rotation of branches within a node does not change how the tree reads, i.e., Tree A and Tree B in Fig. 3 are the same, as “d” and “e” are the same evolutionary distance from “b” and “c”. In Fig. 3c, species “e” has moved to group with species “a”; this may suggest that “e” is a mishit as the sequence is now grouping with the outgroup. However, this requires further investigation to ensure there are no features in the alignment which could support this move and knowing whether these
organisms “normally” attract each other when building phylogenetic trees. Figure 3d represents paralog presence on the tree; further investigation is required to determine whether these sequences should continue to be considered orthologous or should be split into two separate trees.
46. Sequences which evolve rapidly relative to other sequences within the tree will be given long branches (Fig. 3e species “a” is an example of a long branch) and can be falsely grouped together. This is referred to as long branch attraction. These sequences can be removed individually to test their placement; if they move around, it suggests long branch attraction. There are many methods for dealing with long branch attraction, including removal of these sequences, increased taxa sampling, and selection of a more appropriate evolutionary model [36, 37].
47. Topology of the tree should be checked against the alignment to:
(a) Ensure the alignment is of sufficient quality to build the tree.
(b) Ensure there are no features of the alignment which could explain oddities in the tree (e.g., short sequences, lack of domains, etc.).
48. Tree building algorithms require a phylogenetic signal to be able to build a tree. Alignments with poor conservation have a low signal-to-noise ratio and can produce poor quality trees. Equally, alignments with too much conservation can also produce poor quality trees due to the lack of phylogenetic signal. Applying more or less stringent editing, respectively, can improve these [35].
49. The sequence origin should be investigated to determine whether the sequence is complete (i.e., ensure the entire sequence has been predicted). If the sequence is complete and still considerably shorter than the remainder of the alignment, the sequence should usually be removed.
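The reciprocal filtering described in Note 35 can be sketched in a few lines of Python. This is an illustration of the logic only, not the orthology_pipeline code: the function names and the dictionary layout of a hit are hypothetical, hits are ranked by bitscore as an assumption, and the rule that identical results do not count towards the hit count is left out for brevity.

```python
from collections import defaultdict


def top_hits(hits, x, min_cov, per_organism):
    """Keep the top x hits per query (optionally per organism), then filter by coverage.

    hits: iterable of dicts with keys query, subject, organism, bitscore, qcov.
    """
    groups = defaultdict(list)
    for h in hits:
        key = (h["query"], h["organism"]) if per_organism else h["query"]
        groups[key].append(h)
    kept = []
    for group in groups.values():
        ranked = sorted(group, key=lambda h: h["bitscore"], reverse=True)[:x]
        kept.extend(h for h in ranked if h["qcov"] >= min_cov)
    return kept


def reciprocal_orthologs(forward, reverse, x=2, min_cov=50):
    """Return sorted (query, hit) pairs where the hit also detects the query in reverse."""
    fwd = top_hits(forward, x, min_cov, per_organism=True)
    rev = top_hits(reverse, x, min_cov, per_organism=False)
    rev_pairs = {(h["query"], h["subject"]) for h in rev}
    pairs = {(h["query"], h["subject"]) for h in fwd
             if (h["subject"], h["query"]) in rev_pairs}
    return sorted(pairs)


# Made-up hits: A detects B and C; only B detects A in the reverse search.
forward = [
    {"query": "A", "subject": "B", "organism": "sp1", "bitscore": 200, "qcov": 90},
    {"query": "A", "subject": "C", "organism": "sp1", "bitscore": 80, "qcov": 40},
]
reverse = [
    {"query": "B", "subject": "A", "organism": "sp0", "bitscore": 300, "qcov": 85},
    {"query": "C", "subject": "Z", "organism": "sp0", "bitscore": 100, "qcov": 70},
]
orthologs = reciprocal_orthologs(forward, reverse)
print(orthologs)
```

The defaults of 2 hits and 50% coverage mirror the values that Notes 30 and 31 give for more conserved sequences; 5 and 30 would be the corresponding choices for divergent sequences.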

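Note 9 states that a larger search database demands more conservation for a significant result. A short Python sketch makes the dependence concrete using the standard bit-score form of the Karlin-Altschul relation, E = m × n × 2^(−S′), where m is the query length, n the database length, and S′ the bit score. The function names and the numbers are hypothetical, and raw lengths are used here whereas BLAST applies effective lengths, so the values are indicative only.

```python
import math


def expect_value(bit_score, query_len, db_len):
    """Expected number of chance hits scoring at least bit_score."""
    return query_len * db_len * 2.0 ** (-bit_score)


def min_bit_score(e_threshold, query_len, db_len):
    """Smallest bit score that still gives E <= e_threshold."""
    return math.log2(query_len * db_len / e_threshold)


# Hypothetical numbers: a 300-residue query against a small and a large database.
query_len = 300
small_db, large_db = 1_000_000, 100_000_000

fold_change = (expect_value(50, query_len, large_db)
               / expect_value(50, query_len, small_db))
extra_bits = (min_bit_score(1e-5, query_len, large_db)
              - min_bit_score(1e-5, query_len, small_db))
print(f"E-value fold change for the same alignment: {fold_change:.0f}")
print(f"Extra bits needed to reach the same E-value: {extra_bits:.2f}")
```

Under these assumptions, the same alignment receives an E-value that grows in proportion to database size, which is why divergent candidates are easier to recover from a focused database.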
Acknowledgments

Development of the software described and provided here has been supported by a Wellcome Trust grant for establishing the Wellcome Centre for Anti-Infectives Research (WCAIR) at the University of Dundee (to Paul Wyatt (PI), MCF, and others). We thank Tim Butterfield, Frederik Dröst, and Michele Tinti for comments on the code. We also thank Ricardo Canavate del Pino and Ning Zhang for help with testing.

References

1. Stamboulian M, Guerrero RF, Hahn MW et al (2020) The ortholog conjecture revisited: the value of orthologs and paralogs in function prediction. Bioinformatics 36(Supplement_1):i219–i226. https://doi.org/10.1093/bioinformatics/btaa468
2. Baragaña B, Forte B, Choi R et al (2019) Lysyl-tRNA synthetase as a drug target in malaria and cryptosporidiosis. Proc Natl Acad Sci U S A 116(14):7015–7020. https://doi.org/10.1073/pnas.1814685116
3. Klinger CM, Ramirez-Macias I, Herman EK et al (2016) Resolving the homology–function relationship through comparative genomics of membrane-trafficking machinery and parasite cell biology. Mol Biochem Parasitol 209:88–103. https://doi.org/10.1016/j.molbiopara.2016.07.003
4. Huerta-Cepas J, Szklarczyk D, Forslund K et al (2016) eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res 44(D1):D286–D293. https://doi.org/10.1093/nar/gkv1248
5. Aslett M, Aurrecoechea C, Berriman M et al (2009) TriTrypDB: a functional genomic resource for the Trypanosomatidae. Nucleic Acids Res 38(Database issue):D457–D462. https://doi.org/10.1093/nar/gkp851
6. Emms DM, Kelly S (2015) OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol 16(1):157. https://doi.org/10.1186/s13059-015-0721-2
7. Altenhoff AM, Glover NM, Dessimoz C (eds) (2019) Inferring orthology and paralogy (vol. 1910). Evolutionary genomics. Methods in molecular biology. Springer, New York
8. Altschul SF, Madden TL, Schäffer AA et al (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. https://doi.org/10.1093/nar/25.17.3389
9. Klute MJ, Melançon P, Dacks JB (2011) Evolution and diversity of the Golgi. Cold Spring Harb Perspect Biol 3:a007849
10. Shen W, Le S, Li Y et al (2016) SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation. PLoS One 11(10):e0163962. https://doi.org/10.1371/journal.pone.0163962
11. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792–1797. https://doi.org/10.1093/nar/gkh340
12. Lawrence TJ, Kauffman KT, Amrine KCH et al (2015) FAST: FAST analysis of sequences toolbox. Front Genet 6:172. https://doi.org/10.3389/fgene.2015.00172
13. Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One 5(3):e9490. https://doi.org/10.1371/journal.pone.0009490
14. Grüning B, Dale R, Sjödin A et al (2018) Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods 15(7):475–476. https://doi.org/10.1038/s41592-018-0046-7
15. Waterhouse AM, Procter JB, Martin DM et al (2009) Jalview version 2 – a multiple sequence alignment editor and analysis workbench. Bioinformatics 25(9):1189–1191. https://doi.org/10.1093/bioinformatics/btp033
16. Barlow LD (2018) AMOEBAE. https://github.com/laelbarlow/amoebae
17. Larson RT, Dacks JB, Barlow LD (2019) Recent gene duplications dominate evolutionary dynamics of adaptor protein complex subunits in embryophytes. Traffic 20(12):961–973. https://doi.org/10.1111/tra.12698
18. The UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res 47(D1):D506–D515. https://doi.org/10.1093/nar/gky1049
19. NCBI Resource Coordinators (2018) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 46(D1):D8–D13. https://doi.org/10.1093/nar/gkx1095
20. Yates AD, Achuthan P, Akanni W et al (2020) Ensembl 2020. Nucleic Acids Res 48(D1):D682–D688. https://doi.org/10.1093/nar/gkz966
21. Aurrecoechea C, Barreto A, Basenko EY et al (2017) EuPathDB: the eukaryotic pathogen genomics database resource. Nucleic Acids Res 45(D1):D581–D591. https://doi.org/10.1093/nar/gkw1105
22. Nordberg H, Cantor M, Dusheyko S et al (2014) The genome portal of the Department of Energy Joint Genome Institute: 2014 updates. Nucleic Acids Res 42(D1):D26–D31. https://doi.org/10.1093/nar/gkt1069
23. Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Bioinformatics 8(3):275–282. https://doi.org/10.1093/bioinformatics/8.3.275
24. Jukes TH, Cantor CR (eds) (1969) Evolution of protein molecules, Mammalian protein metabolism, vol 3. Academic, New York
25. Le SQ, Gascuel O (2008) An improved general amino acid replacement matrix. Mol Biol Evol 25(7):1307–1320. https://doi.org/10.1093/molbev/msn067
26. Whelan S, Goldman N (2001) A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol 18(5):691–699. https://doi.org/10.1093/oxfordjournals.molbev.a003851
27. Liu K, Linder CR, Warnow T (2011) RAxML and FastTree: comparing two methods for large-scale maximum likelihood phylogeny estimation. PLoS One 6(11):e27731. https://doi.org/10.1371/journal.pone.0027731
28. Smirnov V, Warnow T (2021) Phylogeny estimation given sequence length heterogeneity. Syst Biol 70(2):268–282. https://doi.org/10.1093/sysbio/syaa058
29. Guindon S, Dufayard J-F, Lefort V et al (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59(3):307–321. https://doi.org/10.1093/sysbio/syq010
30. Ronquist F, Teslenko M, van der Mark P et al (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst Biol 61(3):539–542. https://doi.org/10.1093/sysbio/sys029
31. Kerfeld CA, Scott KM (2011) Using BLAST to teach “E-value-tionary” concepts. PLoS Biol 9(2):e1001014. https://doi.org/10.1371/journal.pbio.1001014
32. Amid C, Alako BTF, Balavenkataraman Kadhirvelu V et al (2020) The European nucleotide archive in 2019. Nucleic Acids Res 48(D1):D70–D76. https://doi.org/10.1093/nar/gkz1063
33. Camacho C, Coulouris G, Avagyan V et al (2009) BLAST+: architecture and applications. BMC Bioinform 10:421. https://doi.org/10.1186/1471-2105-10-421
34. Bethesda (MD): National Center for Biotechnology Information (US) (2008) Appendices. https://www.ncbi.nlm.nih.gov/books/NBK279684/
35. Talavera G, Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56(4):564–577. https://doi.org/10.1080/10635150701472164
36. Brinkmann H, van der Giezen M, Zhou Y et al (2005) An empirical assessment of long-branch attraction artefacts in deep eukaryotic phylogenomics. Syst Biol 54(5):743–757. https://doi.org/10.1080/10635150500234609
37. Bergsten J (2005) A review of long-branch attraction. Cladistics 21(2):163–193

Part II Diagnostic Approaches Using Genomic Tools

Chapter 5

An Illumina MiSeq-Based Amplicon Sequencing Method for the Detection of Mixed Parasite Infections Using the Blastocystis SSU rRNA Gene as an Example

Jenny G. Maloney, Nadja S. George, Aleksey Molokin, and Monica Santin

Abstract

Parasite mixed infections remain a relatively unexplored field, in part due to the difficulties of unraveling complex mixtures of parasite DNA using classical methods of sequencing. Next-generation amplicon sequencing (NGS) is a powerful tool for exploring mixed infections of multiple genetic variants of the same parasite in clinical, environmental (water or soil), or food samples. Here, we provide a method for NGS-based detection of mixed parasite infections which uses the Blastocystis SSU rRNA gene as an example and includes steps for parasite concentration, DNA extraction, sequencing library preparation, and bioinformatic analysis.

Key words Blastocystis, Illumina MiSeq, Mixed infections, Next-generation amplicon sequencing, NGS, Parasites, Protist, Subtypes

1 Introduction

Environmental samples and individuals infected with gastrointestinal protists often carry several genetic variants of the same parasite. This intra-host parasite diversity is an important aspect of parasite epidemiology and host–parasite interaction that may influence transmission, infection outcome, and response to treatment. However, this phenomenon remains understudied, in part due to the technical difficulties associated with accurate identification of mixed infections. Next-generation amplicon sequencing (NGS) offers a solution to this problem through massively parallel sequencing of amplicons from individual samples which provides the sequencing depth and resolution needed for characterization of a microbial community within a given host or environmental sample. NGS is commonly used in bacterial community analyses, but this type of sequencing has not been widely applied to studying protist parasite communities. In studies which have used NGS to study intra-host

Luis M. de Pablos and Javier Sotillo (eds.), Parasite Genomics: Methods and Protocols, Methods in Molecular Biology, vol. 2369, https://doi.org/10.1007/978-1-0716-1681-9_5, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

parasite diversity, the superior ability of NGS to detect mixed infections is a recurring theme [1–4]. The method detailed here describes how to perform NGS for community analysis of protist parasites, beginning with parasite concentration from feces and ending with recommendations for bioinformatic analysis of the sequencing data generated by the NGS protocol. This protocol has been successfully applied to studies of Blastocystis diversity in multiple hosts including humans, cattle, birds, and carnivores and has demonstrated mixed infections to be far more common than previously reported using traditional Sanger sequencing methods [2, 5–7]. This method uses the Blastocystis SSU rRNA gene as an example but is adaptable for use with other parasites and genes through selection of primers suitable for use with Illumina MiSeq sequencing technology [8].

2 Materials

Carefully follow all disposal regulations for accumulated waste materials.

2.1 Concentration of Parasite Forms from Feces

1. TE Buffer: 50 mM Tris, 10 mM EDTA, pH 7.2. Add approximately 700 mL of deionized water to a 1 L glass beaker containing a magnetic stir bar. Weigh 6.057 g of Tris base (MW = 121.14 g/mol) and transfer to the beaker. Then weigh 2.9224 g of EDTA (MW = 292.24 g/mol) and transfer into the same beaker. Rinse any powder residue on the inner side of the beaker with deionized water and fill the beaker with water to a volume of 900 mL. Mix by stirring and adjust the pH to 7.2 with HCl. Fill to a final volume of 1 L with deionized water. Transfer to a 1 L glass or plastic bottle and cap firmly. Store at 4 °C.
2. 1.8 g/mL Cesium chloride (CsCl) solution: Add approximately 700 mL of deionized water to a 1 L glass beaker containing a magnetic stir bar. Weigh 800 g of CsCl (MW = 168.36 g/mol) and transfer to the beaker (see Note 1). Use a stirrer/hot plate to heat the solution until the CsCl is dissolved (see Note 2). Under constant stirring, allow the CsCl powder to dissolve completely before adding deionized water to a final volume of 1 L. Transfer to a 1 L glass bottle and cap firmly. Store at 4 °C (see Note 3).
3. 1.4 g/mL Cesium chloride solution: Add 500 mL of the 1.8 g/mL CsCl solution and 500 mL of TE buffer, pH 7.2 (see Note 4) into a 1 L bottle containing a stir bar. Cap the bottle and mix via magnetic stirrer until the two liquids become a homogenous solution (see Note 5). Store at 4 °C.

4. Lab bench scale.
5. Wooden spatula.
6. Deionized water.
7. 45 μm sieves.
8. Disposable 16 oz. cups.
9. Bleach (Sodium hypochlorite) diluted 1:10 for sanitization.
10. Cooling centrifuge that accommodates 50 mL and 15 mL tubes.
11. Transfer pipettes.

2.2 DNA Extraction

1. Column-based commercial DNA extraction kit. Numerous kits are available and reported in the literature for extraction of parasite DNA from feces. Kits usually include some sort of lysis buffer, proteinase K, wash buffers, elution buffer, and a membrane-based spin column. The example given below uses the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA). The kit contains membrane spin columns, collection tubes, ready-to-use proteinase K solution, ATL buffer (tissue lysis buffer), AL buffer (lysis buffer), two wash buffers (concentrated AW1 and AW2), and AE buffer (elution buffer).
2. 96–100% Ethanol (200 proof).
3. Water bath for heating at 56 °C and 70 °C.
4. Benchtop centrifuge.

2.3 Sample Screening and Sequencing Library Preparation

1. Amplicon library preparation protocol: the Illumina 16S Metagenomic Sequencing Library Preparation Protocol (Part # 15044223 Rev. B) (Illumina, San Diego, CA) has been modified for use with the primers described in this chapter.
2. 50 μM (stock) PCR forward and reverse primers (which include Illumina adaptor sequences) (see Note 6).
3. 2× KAPA HiFi HotStart ReadyMix (Roche Diagnostics).
4. 0.1 g/10 mL Bovine Serum Albumin (BSA).
5. Nuclease-free water.
6. DNA visualization method (such as agarose gel or capillary gel electrophoresis).
7. 96–100% Ethanol (200 proof).
8. 80% Ethanol (should be prepared immediately before use).
9. Nuclease-free reagent reservoirs (5 and 25 mL).
10. Agencourt AMPure XP beads (Beckman Coulter Life Sciences, Indianapolis, IN).
11. Illumina Nextera XT Index Kit v2 (Illumina, San Diego, CA).
12. MiSeq Reagent Kit v3 (600 cycle) (Illumina, San Diego, CA).

13. PhiX Control Kit v3 (Illumina, San Diego, CA).
14. 10 mM Tris-HCl, pH 8.5 (see Note 7).
15. 1 N or 10 N Sodium hydroxide (NaOH) for preparing 0.2 N NaOH in step 52.
16. Magnetic stand for 96-well plates.
17. TruSeq Index Plate Fixture Kit (Illumina, San Diego, CA).
18. Quant-iT™ dsDNA Broad-Range or High-Sensitivity Assay Kits (Thermo Fisher Scientific, Waltham, MA).
19. Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA) or fluorescence microplate reader such as SpectraMax iD5 (Molecular Devices, San Jose, CA).
20. Illumina MiSeq (Illumina, San Diego, CA).
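The reagent quantities in this section follow from two simple relations: mass = molarity × molecular weight × volume for the TE buffer in Subheading 2.1, and C1 × V1 = C2 × V2 for diluting the NaOH stock in item 15. The Python sketch below reproduces the stated TE buffer masses as a check and shows the dilution arithmetic. The function names are hypothetical, and the 1 mL final volume of 0.2 N NaOH is an assumed example rather than a volume taken from the protocol.

```python
import math


def grams_for_molarity(molar_conc, mol_weight, volume_l):
    """Mass (g) of solute for a target concentration (mol/L) in volume_l litres."""
    return molar_conc * mol_weight * volume_l


def stock_volume(stock_conc, final_conc, final_volume):
    """Volume of stock required by C1 * V1 = C2 * V2 (consistent units assumed)."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


# TE buffer, 1 L: 50 mM Tris (MW 121.14 g/mol) and 10 mM EDTA (MW 292.24 g/mol).
tris_g = grams_for_molarity(0.050, 121.14, 1.0)
edta_g = grams_for_molarity(0.010, 292.24, 1.0)
print(f"Tris base: {tris_g:.3f} g, EDTA: {edta_g:.4f} g")

# 0.2 N NaOH from a 1 N stock; the 1 mL final volume is an assumed example.
naoh_ml = stock_volume(1.0, 0.2, 1.0)
print(f"1 N NaOH: {naoh_ml:.2f} mL, water: {1.0 - naoh_ml:.2f} mL")
```

The computed masses agree with the 6.057 g of Tris base and 2.9224 g of EDTA given in Subheading 2.1. Starting from the 10 N stock instead, the same function gives one tenth of the stock volume.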

3 Methods

Wear appropriate Personal Protective Equipment (PPE), such as laboratory coat and disposable gloves, during all sample-processing steps. In addition, wear face shield or facemask/eye protection when processing fecal samples.

3.1 Concentrating Parasite Forms from Feces Via CsCl Centrifugation [9]

1. Weigh fecal sample directly into a 50 mL tube to a maximum of 15 g and add deionized water until a volume of 50 mL is reached (see Note 8). Cap tube and mix thoroughly by vortexing.
2. Sieve each sample with a 45 μm reusable sieve (see Note 9). The sieve is placed on top of a disposable 16 oz. cup, and the contents of the 50 mL tube are decanted onto the membrane. Gravity filtration of the fecal sample can be aided by the addition of more water or the use of the 50 mL tube to break bigger pieces apart. Total filtrate volume should not exceed 50 mL.
3. Transfer sieved fecal solution into a new 50 mL tube and add enough deionized water to reach a volume of 50 mL.
4. Place the 50 mL sample tubes in a cooling benchtop centrifuge with swing-out bucket rotor, add a balance tube if necessary, and centrifuge at 1800 × g for 15 min at 4 °C.
5. Decant supernatant into designated waste container. Retain the pellet for the next step.
6. Add deionized water to pellet to a volume of 25 mL and vortex to completely resuspend pellet. Add 25 mL of 1.4 g/mL CsCl solution (see Note 10).
7. Mix gently by inverting tube ten times before placing in centrifuge. Centrifuge tubes at 300 × g for 20 min at 4 °C with no braking.

8. Carefully retrieve tubes from the centrifuge so as not to disturb parasite forms present at the top of the gradient.
9. Transfer 4 mL from the top of the gradient to a 15 mL tube containing 11 mL of deionized water. Mix by inverting 10 times, and centrifuge at 1800 × g for 15 min at 4 °C.
10. Retrieve 15 mL tubes and decant the supernatant into a waste container.
11. Use a transfer pipette to mix and transfer the pellet from the 15 mL tubes into a labeled 1.5 mL tube.
12. Samples can be stored at 4 °C for up to 1 week before proceeding to DNA extraction. Store samples at −20 °C if DNA extraction will not be performed within 1 week.

3.2 DNA Extraction

There are numerous commercial kits available to perform DNA extraction. This protocol uses the DNeasy Blood & Tissue Kit (Qiagen Inc., Valencia, CA) for the extraction of DNA from parasite forms. The following DNA extraction protocol has been adapted and modified from the manufacturer’s manual.
1. Transfer 100 μL of concentrated parasite sample to a 1.5 mL microcentrifuge tube. Add 180 μL of ATL buffer.
2. Add 20 μL of Proteinase K to each sample tube. Mix thoroughly by vortexing for 15 s. Incubate in a 56 °C water bath overnight. Following incubation, briefly spin tube to collect sample volume at the bottom of the tube.
3. Add 200 μL of AL buffer to each sample tube and mix thoroughly by vortexing for 15 s. Incubate in a 70 °C water bath for 10 min. Spin down briefly to collect condensation from the tube lid.
4. Add 200 μL of 96–100% ethanol to each sample. Mix by pipetting up and down three times.
5. Place the DNeasy Mini spin column into a 2 mL collection tube and pipette the mixture (including any precipitate) into the column. Centrifuge at 6000 × g for 1 min. Discard flow-through and collection tube.
6. Place the DNeasy Mini spin column in a new 2 mL collection tube, add 500 μL of AW1 buffer to the column, and centrifuge for 1 min at 6000 × g. Discard flow-through and collection tube.
7. Place the DNeasy Mini spin column in a new 2 mL collection tube, add 500 μL of AW2 buffer to the column, and centrifuge for 3 min at 20,000 × g to dry the DNeasy membrane. Discard flow-through and collection tube.
8. Place the DNeasy Mini spin column in a new 2 mL collection tube, and pipette 100 μL of Buffer AE directly onto the DNeasy membrane. Incubate at room temperature for 1 min, and then centrifuge for 1 min at 6000 × g to elute.

Jenny G. Maloney et al.

9. Discard the column and transfer the eluate from the collection tube into a 1.5 mL microcentrifuge tube.
10. DNA samples can be stored at −20 °C for later use.

3.3 Sample Screening and Sequencing Library Preparation

The following protocol uses Blastocystis SSU rRNA as an example but can be modified for use with other parasites by using primers specific to your organism and gene of interest (see Note 6). This amplicon sequencing protocol has been adapted from the Illumina 16S Metagenomic Sequencing Library Preparation Protocol (Part # 15044223 Rev. B). The details of library preparation are given below. Because this is a multistep process, which can span several days, safe stopping places are noted.
1. Prepare PCR master mix. Each run should include a positive control and negative control (nuclease-free water). The reaction setup in Table 1 is used as a guideline and can be modified accordingly for the number of samples being tested.
2. Pipette the appropriate amount of master mix solution, in this case 22.5 μL, into labeled PCR tubes or a 96-well microplate.
3. Add 2.5 μL of each DNA sample, positive control, and no template control (nuclease-free water) into designated tube or well. Seal plate or close PCR tube and briefly centrifuge before placing in PCR thermal cycler. Run the PCR program outlined in Table 2.
4. Confirm your PCR product by either an agarose gel or capillary gel electrophoresis method (see Note 11). The total volume of the screening PCR may need to be adjusted depending on the volume of product required for your DNA visualization method (see Note 12).
5. Select all Blastocystis-positive samples, the Blastocystis-positive control, and the no template control and transfer these samples to a new 96-well plate for library preparation. Samples can be stored at 4 °C overnight or at −20 °C for up to 1 week before proceeding to the next step.
6. Perform the first PCR clean-up which uses AMPure XP beads to clean primers and primer dimers from amplicon.
7. Allow the AMPure XP beads to equilibrate to room temperature for at least 30 min before use.
8. Briefly centrifuge plate to collect condensation and carefully remove seal (see Note 13).
9. Mix the AMPure XP beads thoroughly by vortexing for at least 30 s. Dispense the volume of beads needed for sample processing into a reservoir. Use a multichannel pipette to transfer 12 μL of beads into each well containing a 15 μL volume of sample.

NGS Method for Detection of Mixed Parasite Infections

Table 1 PCR master mix reaction setup

Component                          1X Reaction (μL)
Template DNA                       2.5
50 μM forward primer               0.5
50 μM reverse primer               0.5
2X KAPA HiFi HotStart ReadyMix     12.5
0.1 g/10 mL BSA                    1.25
Nuclease-free water                7.75
Total volume                       25
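As a worked illustration of scaling Table 1 to a full run, the sketch below multiplies each per-reaction volume (template excluded) by the number of reactions plus a pipetting overage. The 10% overage, the function name, and the example sample count are assumptions made for illustration; they are not part of the protocol.

```python
# Per-reaction volumes (uL) taken from Table 1, excluding the 2.5 uL of template DNA.
PER_REACTION_UL = {
    "50 uM forward primer": 0.5,
    "50 uM reverse primer": 0.5,
    "2X KAPA HiFi HotStart ReadyMix": 12.5,
    "0.1 g/10 mL BSA": 1.25,
    "Nuclease-free water": 7.75,
}


def master_mix(n_reactions, overage=0.10):
    """Return the total volume (uL) of each component for n_reactions.

    overage is an assumed extra fraction to cover pipetting loss (hypothetical default).
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    # Hypothetical run: 22 samples + positive control + no template control.
    for name, vol in master_mix(24).items():
        print(f"{name:32s} {vol:8.2f} uL")
    # Volume dispensed per tube or well before adding template.
    print(f"{'Master mix per reaction':32s} {sum(PER_REACTION_UL.values()):8.2f} uL")
```

The per-reaction sum of the listed components is the 22.5 μL dispensed in step 2; adding the 2.5 μL of template gives the 25 μL total in Table 1.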

Table 2 PCR reaction conditions for amplification of Blastocystis SSU rRNA gene

Step                    Temperature (°C)    Time      Cycles
Initial denaturation    95                  4 min
Denaturation            95                  30 s      35 cycles
Annealing               54                  30 s      35 cycles
Extension               72                  30 s      35 cycles
Final extension         72                  5 min
Hold                    4                   ∞

10. Adjust volume of multichannel pipette to 20 μL and gently mix by pipetting up and down ten times.
11. Incubate for 5 min at room temperature.
12. Secure plate on a magnetic plate stand and wait for at least 2 min or until the supernatant is clear before proceeding to the next step.
13. Remove and discard the supernatant by multichannel pipette using care not to disturb the bead pellet.
14. Wash the pellet by adding 200 μL of freshly prepared 80% ethanol to each well with a multichannel pipette. Remove and discard the supernatant.
15. Perform a second ethanol wash as above. After discarding the supernatant, use a multichannel pipette and 20 μL fine tips to remove any excess ethanol from the bottom of the sample well.
16. Allow the beads to air dry for up to 10 min (see Note 14).

17. Dispense enough 10 mM Tris pH 8.5 into a reservoir to be able to multichannel pipette 52.5 μL into each sample well.
18. Remove the plate from the magnetic stand. Transfer 52.5 μL of 10 mM Tris pH 8.5 into each sample well and resuspend bead pellet by gently pipetting up and down until bead pellet is fully resuspended.
19. Incubate at room temperature for 2 min.
20. Secure plate on a magnetic plate stand and wait for at least 2 min or until the supernatant is clear before proceeding to the next step.
21. Use a multichannel pipette to transfer 45 μL of supernatant to a new 96-well plate. Use caution not to disturb the bead pellet (see Note 15). Samples can be stored at 4 °C overnight or at −20 °C for up to 1 week before proceeding to the next step.
22. Perform the index PCR which uses the Nextera XT index kit to attach dual indices and Illumina sequencing adaptors to each sample for sample identification following sequencing.
23. Transfer 5 μL of each sample to a new 96-well plate using a multichannel pipette.
24. Place the Index 1 and Index 2 tubes into the TruSeq Index Plate Fixture with Index 1 (orange caps) in columns 1 through 12 and Index 2 (white caps) in rows A through H.
25. Add 10 μL of nuclease-free water to each sample well.
26. Add 5 μL of Primer 1 to each sample well.
27. Add 5 μL of Primer 2 to each sample well.
28. Add 25 μL of Kapa HiFi HotStart Ready Mix to each sample well. Mix by pipetting up and down ten times.
29. Seal plate and briefly centrifuge before placing in PCR thermal cycler. Run the PCR program outlined in Table 3. Samples can be stored at 4 °C overnight or at −20 °C for up to 1 week before proceeding to the next step.
30. Perform the second PCR clean-up.
31. Allow the AMPure XP beads to equilibrate to room temperature for at least 30 min before use.
32. Briefly centrifuge plate to collect condensation and carefully remove seal.
33. Mix the AMPure XP beads thoroughly by vortexing for at least 30 s. Dispense the volume of beads needed for sample processing into a reservoir. Use a multichannel pipette to transfer 56 μL of beads into each sample well.
34. Gently mix by pipetting up and down ten times.
35. Incubate for 5 min at room temperature.

Table 3 Index PCR reaction conditions

Step                    Temperature (°C)    Time      Cycles
Initial denaturation    95                  3 min
Denaturation            95                  30 s      8 cycles
Annealing               55                  30 s      8 cycles
Extension               72                  30 s      8 cycles
Final extension         72                  5 min
Hold                    4                   ∞

36. Secure plate on a magnetic plate stand and wait for at least 2 min or until the supernatant is clear before proceeding to the next step.
37. Remove and discard the supernatant using care not to disturb the bead pellet.
38. Wash the pellet by adding 200 μL of freshly prepared 80% ethanol to each well with a multichannel pipette. Remove and discard the supernatant.
39. Perform a second ethanol wash as above. After discarding the supernatant, use a multichannel pipette and 20 μL fine tips to remove any excess ethanol from the bottom of the sample well.
40. Allow the beads to air dry for up to 10 min (see Note 14).
41. Dispense enough 10 mM Tris pH 8.5 into a reservoir to be able to multichannel pipette 30 μL into each sample well.
42. Remove the plate from the magnetic stand. Transfer 30 μL of 10 mM Tris pH 8.5 into each sample well and resuspend bead pellet by gently pipetting up and down until bead pellet is fully resuspended.
43. Incubate at room temperature for 2 min.
44. Secure plate on a magnetic plate stand and wait for at least 2 min or until the supernatant is clear before proceeding to the next step.
45. Use a multichannel pipette to transfer 25 μL of supernatant to a new 96-well plate. Use caution not to disturb the bead pellet (see Note 15). Samples can be stored at 4 °C overnight or at −20 °C for up to 1 week before proceeding to the next step (see Note 16).
46. Quantify the cleaned final libraries by fluorometric quantification either by Qubit or using a microplate reader with the appropriate wavelength.

47. From the DNA concentration (ng/μL) obtained via the fluorometric quantification of the libraries and the average library size (in this case about 630 bp), calculate the DNA concentration in nM (Molarity) (see Note 17).
48. Normalize the libraries by diluting each library to 4 nM with 10 mM Tris pH 8.5.
49. Pool 5 μL of each normalized library into a 1.5 mL microcentrifuge tube. The protocol can be stopped here, and the pooled library can be stored at −20 °C for up to 1 week. Do not proceed to the final steps of library preparation until you are ready to load the library on the Illumina MiSeq for sequencing.
50. Prepare library for loading on the Illumina MiSeq.
51. Thaw MiSeq reagent cartridge at room temperature or overnight at 4 °C.
52. Prepare fresh 0.2 N NaOH (see Note 18). Set a heat block to 96 °C and prepare an ice water bath.
53. Denature pooled final library DNA by combining 5 μL 4 nM pooled library with 5 μL 0.2 N NaOH in a 1.5 mL microcentrifuge tube. Vortex and centrifuge the sample briefly.
54. Incubate for 5 min at room temperature.
55. Add 990 μL pre-chilled HT1 (found in reagent cartridge) to the denatured DNA to dilute to 20 pM and place on ice.
56. Dilute the PhiX control library by combining 2 μL of 10 nM PhiX library with 3 μL 10 mM Tris pH 8.5 in a 1.5 mL microcentrifuge tube.
57. Denature PhiX control library by adding 5 μL 0.2 N NaOH. Vortex and centrifuge the sample briefly.
58. Incubate for 5 min at room temperature.
59. Add 990 μL pre-chilled HT1 to the denatured PhiX library to dilute to 20 pM and place on ice.
60. Dilute the denatured DNA library to a final concentration of 8 pM by combining 240 μL of 20 pM library with 360 μL of pre-chilled HT1. Invert tube several times to mix, centrifuge briefly by pulse spinning, and place on ice.
61. Dilute the denatured PhiX library to 8 pM by combining 240 μL of 20 pM library with 360 μL of pre-chilled HT1. Invert tube several times to mix, centrifuge briefly, and place on ice.
62. Spike the final 8 pM amplicon library with 10% 8 pM PhiX by combining 540 μL amplicon library with 60 μL PhiX library (see Note 19). Leave on ice until ready to load into the MiSeq reagent cartridge.
63. Immediately before loading, denature the library in a heat block at 96 °C for 2 min.

64. Invert the tube 1–2 times to mix and immediately place in an ice water bath for 5 min.
65. Prepare Illumina MiSeq Instrument and then load library.

3.4 Bioinformatic Analysis of Illumina Sequences

Raw data generated by Illumina sequencers comes in the form of Binary Base Call (bcl) files that encode the nucleotide bases of each DNA library fragment sequenced and their quality scores. The raw data files are converted to the text-based FASTQ format, and sequences (or reads) are demultiplexed into samples using the index sequence of each read by the sequencer’s on-board software. The reads are trimmed on the 3′ end to remove any Illumina adaptor sequences beyond the biological sequence of interest (i.e., DNA insert). The following steps can be used for analysis of reads generated using this protocol.
1. Merge read pairs. As each library fragment was sequenced from both ends to produce a forward and reverse read, each read pair should be merged into a single, longer sequence that covers the entire DNA insert. Read pair merging involves the alignment of the forward and reverse read via their 3′ overlapping region. The merging process takes into account the number of matching and mismatching bases within the overlap to determine if the pair can be reliably merged. Base quality information is also incorporated into this step and low-quality bases at the 3′ ends may be trimmed off if initial attempts to merge are unsuccessful. Pair merging has the benefit of both reducing the number of reads for downstream steps and recalculating base qualities within the overlapping region using quality scores from both reads. Some examples of stand-alone Linux tools that can be used to merge read pairs include BBMerge [10], VSEARCH [11], or USEARCH [12].
2. Quality filter merged reads. Merged reads should be quality filtered using an expected error rate threshold as described by Edgar 2015 [13] and implemented in tools such as USEARCH or VSEARCH to remove reads that contain sequencing errors.
3. De-replicate and denoise reads. High quality reads should be de-replicated within each sample to prepare the reads for denoising. De-replication reduces identical reads into unique reads with abundance values. In order to remove or reduce the number of spurious sequences that result from PCR/sequencing errors and contaminants, unique reads are denoised. The need for denoising becomes especially important when the goal of an experiment involves distinguishing sequence variants at a high resolution (i.e., SNP-level differences). Several algorithms that can be used for denoising include UNOISE (implemented in USEARCH and VSEARCH) [14], DADA2 [15], and Deblur [16].

4. Remove chimeric sequences. Denoised reads should be checked for chimeric sequences which may form during PCR amplification due to incomplete extension. One method for efficiently detecting and removing chimeras involves pooling denoised reads from all samples together, de-replicating reads again, and then performing chimera detection and removal using the UCHIME [17] algorithm that is implemented in the USEARCH or VSEARCH toolkits. The rationale behind pooling sample reads together is primarily to increase the abundance of real biological sequences relative to chimeric sequences. This in turn has the effect of increasing the sensitivity for detecting and filtering out chimeras.
5. Cluster reads. Chimera-free reads are demultiplexed back out into individual samples for read clustering. Clustering groups reads that fall within a user-chosen similarity threshold; alternatively, a threshold of 100% similarity forgoes clustering and differentiates reads with as little as a single base difference. Clusters are referred to as operational taxonomic units (OTUs), or as amplicon sequence variants (ASVs)/exact sequence variants (ESVs) when sequences are not clustered.
6. OTUs/ASVs/ESVs can be optionally filtered by their abundance to further remove low abundance sequences suspected of being artifacts, contaminants, or chimeras that were not caught by the UCHIME algorithm. Another optional round of chimera detection and removal can be performed at this point within each sample.
7. Align sequences to references. Using an appropriate reference database, sequences can be aligned to references using BLAST’s megablast option for intraspecies comparisons or blastn for interspecies comparisons [18]. BLAST results (hits) can be optionally filtered using metrics such as e-value, percent identity, or alignment length based on experimental needs and goals.
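Three of the computations named in the steps above are simple enough to sketch directly: the expected-error filter of step 2, the de-replication of step 3, and the filtering of BLAST hits in step 7. The sketch below is only an illustration of those ideas in plain Python, not a substitute for USEARCH, VSEARCH, or BLAST. It assumes Phred+33 quality strings and BLAST tabular output with the default `-outfmt 6` columns; the function names and every threshold shown are hypothetical.

```python
from collections import Counter


def expected_errors(quality, offset=33):
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def quality_filter(reads, max_ee=1.0):
    """Keep (sequence, quality) pairs with expected errors <= max_ee.

    max_ee=1.0 is an illustrative threshold, not one prescribed by the protocol.
    """
    return [(seq, qual) for seq, qual in reads if expected_errors(qual) <= max_ee]


def dereplicate(sequences):
    """Collapse identical sequences into (sequence, abundance), most abundant first."""
    return Counter(sequences).most_common()


def filter_blast_hits(lines, max_evalue=10.0, min_pident=0.0, min_length=0):
    """Filter BLAST tabular hits (-outfmt 6, default column order).

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        pident, length, evalue = float(fields[2]), int(fields[3]), float(fields[10])
        if evalue <= max_evalue and pident >= min_pident and length >= min_length:
            kept.append(fields)
    return kept


if __name__ == "__main__":
    # Toy reads: two identical high-quality reads and one very low-quality read.
    reads = [("ACGT", "IIII"), ("ACGT", "IIII"), ("ACGA", "!!!!")]
    good = quality_filter(reads)
    print(dereplicate(seq for seq, _ in good))
```

Loosening or tightening the three keyword arguments of `filter_blast_hits` corresponds to the liberal and stringent settings discussed for step 7.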
For example, if novel sequence detection is a goal of the study, percent identity and alignment length should use liberal settings (e.g., eval 10, no min id, min align length = less than length of references). While if perfect alignment to a set of reference sequences is desired, more stringent cutoffs (e.g., eval [...]

[...] >4% parasitemia (ring stage) or >1% parasitemia (mature trophozoite/schizont stage) and the method for “Small RNA quantity samples” when working with field isolates or cultures containing a lower amount of RNA. The latter method is suitable for as little as 120 μL of RBC pellet at 0.02% parasitemia [13].

3.3.1 Large RNA Quantity Samples

1. Thaw the Trizol samples at room temperature (see Note 13).
2. Centrifuge for 2 min at 800 × g at room temperature.
3. Transfer the supernatant to a new 15 mL tube. Avoid taking the pellet, which contains insoluble material.
4. Add 0.2 volumes of chloroform and mix vigorously by inverting the tube several times. Incubate for 2–3 min at room temperature.
5. Centrifuge at 5000 × g for 40 min at 4 °C.
6. Transfer the aqueous upper phase to a new tube (see Note 14).
7. Add 0.8 volumes of pure isopropanol and mix well by inverting the tube vigorously. Incubate overnight at 4 °C for the RNA to precipitate.
8. Centrifuge at 5000 × g for 50 min at 4 °C and carefully remove the supernatant. A white gel-like RNA pellet may be visible.
9. Add 6 mL of cold (−20 °C) 75% ethanol to the pellet and mix by inversion. Centrifuge the sample for 20 min at 5000 × g at 4 °C. Carefully remove all the ethanol, leaving only the RNA pellet (see Note 15).
10. Dry the RNA pellet by leaving the tube open at room temperature for 5 min. If the RNA pellet is not totally dry, extend this step for an additional 10–15 min, but do not over dry the RNA pellet.
11. Resuspend the RNA pellet with 30–45 μL of RNase-free or DEPC-treated water and tap the tube gently to mix. Incubate for 5 min at 55 °C using a heated block.
12. Gently pipette up and down the resuspended RNA and transfer to an RNase-free microtube. Measure the RNA concentration using a spectrophotometer (e.g., NanoDrop) and store at −80 °C or proceed to the next step.

Transcriptional Analysis of Plasmodium falciparum

13. Incubate up to 45 μg of RNA with the DNase treatment mix for 10 min at room temperature, following manufacturer’s recommendations (see Notes 16 and 17). DNase treatment mix:
• RDD Buffer 10 μL.
• DNase I 2.5 μL (2.7 U/μL).
• RNA 15 μg of total RNA.

3.3.2 Small RNA Quantity Samples

1. Thaw the Trizol samples at room temperature (see Note 13).
2. Centrifuge for 2 min at 800 × g at 20–25 °C.
3. Transfer the supernatant to a new tube.
4. Add 0.2 volumes of chloroform and mix vigorously by inverting the tube several times. Incubate the solution for 2–3 min at room temperature.
5. Centrifuge for 30 min at 3000 × g at 4 °C.
6. Transfer the aqueous upper phase to a new tube (see Note 14).
7. Add 1 volume of 70% ethanol and mix well by pipetting up and down.
8. Transfer 700 μL of the sample to an RNeasy spin column placed in a 2 mL collection tube from the RNeasy Mini Kit.
9. Centrifuge 15 s at >8000 × g and discard the flow-through. If the sample volume exceeds 700 μL, centrifuge successive aliquots in the same RNeasy spin column and discard the flow-through after each centrifugation. Follow the next steps of the manufacturer’s protocol.
10. After the first Buffer RW1 washing of the RNeasy spin column, perform an on-column DNase I treatment by adding a mixture of 10 μL RNase-free DNase I and 70 μL RDD buffer. Incubate for 15 min at room temperature (see Note 16).
11. After DNase treatment, wash with Buffer RW1 and incubate for 5 min at room temperature. Continue the purification following the RNeasy Mini Kit instructions.

Núria Casas-Vila et al.

12. Finally, elute the RNA by adding 30 μL of RNase-free water directly to the center of the spin column membrane. Measure the RNA concentration and store it at −80 °C until use.

3.4 Reverse Transcription

1. Incubate two tubes with 100–500 ng of RNA for 10 min at 70 °C in a thermocycler and then place them on ice (see Note 18).
2. Set up a 10 μL reaction with AMV Reverse Transcriptase (+R.T.) and an equivalent reaction with water instead of the enzyme (−R.T. control) (Table 2).
3. Incubate the reaction in a thermal cycler under the following conditions: Primer Hybridization, 25 °C, 10 min; Reverse Transcription, 42 °C, 60 min; Inactivation, 95 °C, 5 min; Hold, 4 °C, 5 min.
4. Dilute the cDNA in nuclease-free water as needed (see Note 19) and store at −20 °C.

Table 2 Reverse transcription reaction setup (per sample). If multiple RNAs are reverse transcribed, it is recommended to prepare a master mix containing all components except the RNA

Component                                             +R.T.         −R.T.
MgCl2 25 mM                                           2 μL          2 μL
Reverse Transcription 10X buffer                      1 μL          1 μL
dNTP mixture 10 mM                                    1 μL          1 μL
Recombinant RNAsin ribonuclease inhibitor 40 U/μL     0.25 μL       0.25 μL
AMV reverse transcriptase 25 U/μL                     0.3 μL        0 μL
Oligo(dT)15 primer 500 μg/mL                          0.25 μL       0.25 μL
Random primers 500 μg/mL                              0.25 μL       0.25 μL
RNA                                                   100–500 ng    100–500 ng
RNase-free water                                      To 10 μL      To 10 μL

3.5 Analysis of cDNA by Quantitative PCR (qPCR)

Two main approaches are available to analyze cDNA by qPCR: probe-based approaches (e.g., TaqMan assays) and DNA stain-based approaches (e.g., SYBR green-based assays). For SYBR green assays, the absolute and the relative quantification methods are available. The absolute quantification approach, also known as the standard curve method, involves interpolating the Cq values of the samples of interest against a standard curve generated from serial dilutions of standards of known quantity. In contrast, the relative quantification approach, known as the ΔΔCt method, relies on the comparison of the Cq values obtained between samples, such as test vs. control samples, for the gene of interest and a housekeeping gene. The protocol described here is a mixture of absolute and relative quantification methods: we use the standard curve method, but the standard curve does not consist of samples of known quantity but rather of serial dilutions of a genomic DNA sample (see Note 20). Therefore, interpolation against the standard curve yields transcript level values in arbitrary units. By including a standard curve, our approach has the advantage of being more robust and reproducible than ΔΔCt methods and is less affected by differences in the amplification efficiency. However, by using genomic DNA for the standard curve, the method is straightforward to perform, as it does not require the time-consuming preparation of standards with a known absolute quantity of the amplicon (e.g., plasmids) for each gene analyzed.
1. Design primer pairs with which to amplify the gene(s) of interest and normalization gene(s) (see Note 21).
2. Prepare the serial dilutions for the standard curve using genomic DNA (gDNA) from a parasite line that has one copy of the genes of interest. Perform serial 1:10 dilutions from 10 to 0.001 ng/μL, mixing well after each dilution to obtain reproducible results (see Notes 22 and 23).
3. Decide the plate distribution of all the samples taking into account the following guidelines (Fig. 5) (see Note 24):
• Each primer pair (both for genes of interest and normalization genes) requires a standard curve of five different concentrations and a no-template control (NTC) containing water instead of gDNA.
• The standard curve dilutions, NTCs, −R.T. controls and test samples (+R.T.) for each primer pair should all be placed in the same plate. If all the samples do not fit in a single plate, the standard curve will have to be repeated in each plate.
• 96-well or 384-well plates can be used depending on the number of samples and the thermocycler that is available.
• The Reverse Transcriptase negative controls (−R.T.) for each sample must be included in order to exclude the possibility of gDNA contamination.
• All samples are analyzed in triplicate.

4. Prepare a SYBR green reaction mix considering the type of plate, final primer concentration (see Note 25) and number of samples (Table 3).

Fig. 5 Example of a 96-well plate distribution to analyze by qPCR the relative transcript levels of one gene of interest (gene X) and one normalizing gene (gene N) in four cDNA samples (A to D), using a standard curve. The corresponding −R.T. controls (A− to D−) are included. Wells containing the reaction mix with the primer pair for gene X are in yellow, whereas wells for gene N are in blue. Samples with the same master mix are grouped together and triplicates are placed in the same row to facilitate master mix dispensing and sample loading. NTC, no-template control

Table 3 qPCR reaction setup (per well), not including the template. For each primer pair, prepare a mix containing all of these components

Component                               384-well plate        96-well plate
Power SYBR® Green PCR Master Mix 2X     5 μL                  10 μL
Forward primer                          200–500 nM (final)    200–500 nM (final)
Reverse primer                          200–500 nM (final)    200–500 nM (final)
Milli-Q water                           To 8 μL               To 18 μL
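Table 3 gives the primers as final concentrations, so the volume of each primer and of water depends on the primer stock in use. The sketch below works this out per well with C1V1 = C2V2. It assumes that "final" refers to the complete reaction (mix plus 2 μL of template, i.e., 10 μL or 20 μL); the 10 μM primer stock, the 300 nM target, and the function name are hypothetical values chosen for illustration.

```python
def qpcr_mix_per_well(plate="96", primer_final_nM=300.0, primer_stock_uM=10.0):
    """Per-well volumes (uL) of the Table 3 reaction mix, template excluded.

    Assumptions (not stated in the protocol): primer stock concentration,
    and final primer concentration referring to the full reaction volume.
    """
    mix_volume = {"96": 18.0, "384": 8.0}[plate]   # volume dispensed per well
    template = 2.0                                 # sample added afterwards
    total = mix_volume + template                  # 20 uL or 10 uL reaction
    sybr = total / 2                               # 2X master mix
    # C1 * V1 = C2 * V2, with the stock converted from uM to nM.
    primer = primer_final_nM * total / (primer_stock_uM * 1000.0)
    water = mix_volume - sybr - 2 * primer
    if water < 0:
        raise ValueError("primer stock too dilute for the requested concentration")
    return {
        "SYBR Green master mix 2X": round(sybr, 2),
        "Forward primer": round(primer, 2),
        "Reverse primer": round(primer, 2),
        "Milli-Q water": round(water, 2),
    }


if __name__ == "__main__":
    for plate in ("96", "384"):
        print(plate, qpcr_mix_per_well(plate))
```

Multiplying each value by the number of wells for a primer pair (plus some excess) gives the volumes for the mix prepared in step 4.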

5. Distribute 8 or 18 μL of reaction mix in the corresponding wells of the 384 or 96-well plate, respectively. It is advisable to use a stepper pipette.
6. Add 2 μL of sample to each well, being careful to leave the sample at the bottom of the well.
7. Cover the plate with Optical Adhesive Film and gently press with an Optical Adhesive Film Applicator in order to eliminate any possible air bubbles from under the film.
8. Vortex the plate for 30 s and spin it (see Note 26).
9. Incubate the plate in a real-time PCR instrument. The precise conditions will depend on the size of the amplicon and the melting temperature of the primers; our standard thermal cycler conditions are as follows:
• Initial denaturation
  – 95 °C, 10:00 min
• Cycling stage (40 cycles)
  – 95 °C, 00:15
  – 57 °C, 00:30
  – 60 °C, 00:30 (measure SYBR green signal)
• Melt curve
  – 95 °C, 00:15
  – 60 °C, 01:00
  – 60–95 °C (measure SYBR green signal in 0.3 °C increments)
  – 95 °C, 00:15

3.6 Data Analysis

All the steps of the analysis can be performed using the software of the real-time PCR instrument. The data generated can be exported to a spreadsheet for downstream analysis. A representative example of how to calculate relative transcript levels from the RT-qPCR data is provided in Fig. 6.

3.6.1 Quality Control

1. Check the specificity of the reaction using the melt-curve graph (see Note 27).
2. Check the amplification plot and define a threshold within the exponential phase of the curve. The Cq values obtained depend on this threshold. Be consistent between experiments in the criteria used to define the threshold (e.g., set it at the middle of the exponential phase).
3. Verify that there is no amplification in the NTCs and −R.T. controls (see Note 28).
4. Confirm that all triplicates for each sample have a similar Cq value (allow a 0.3 difference between replicates). Exclude from the analysis wells that have differences in Cq >0.3 compared to the other two replicates (see Note 29).
5. For each primer set, construct a standard curve by plotting the logarithm of the quantity in arbitrary units (x-axis, typically the largest dilution is given a value of 1) against the respective Cq values (y-axis) and using linear regression. Next, calculate the amplification efficiency from the slope of the standard curve. Good efficiencies range from 90 to 110% (see Note 30).

\[ \%\,\text{Efficiency} = \left(10^{-1/\text{slope}} - 1\right) \times 100 \]
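The standard curve fit and the efficiency calculation in step 5 can be sketched as follows. This is a minimal illustration using an ordinary least-squares fit written out by hand; the real-time PCR instrument software performs the equivalent calculation. The Cq values in the example are invented for illustration, and the function names are hypothetical.

```python
import math


def fit_standard_curve(quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def efficiency_percent(slope):
    """Amplification efficiency: (10^(-1/slope) - 1) * 100."""
    return (10 ** (-1 / slope) - 1) * 100


if __name__ == "__main__":
    # Five 1:10 gDNA dilutions in arbitrary units (most dilute point = 1).
    quantities = [10000, 1000, 100, 10, 1]
    # Invented Cq values, for illustration only.
    cqs = [15.0, 18.4, 21.8, 25.2, 28.6]
    slope, intercept = fit_standard_curve(quantities, cqs)
    eff = efficiency_percent(slope)
    print(f"slope={slope:.3f} intercept={intercept:.3f} efficiency={eff:.1f}%")
    print("within 90-110%:", 90 <= eff <= 110)
```

A slope of about −3.32 corresponds to 100% efficiency, i.e., a doubling of product in every cycle.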

Fig. 6 Example of the analysis pipeline to calculate the relative transcript levels in an RT-qPCR experiment using the standard curve method. (a) Standard curves for the gene of interest (hsp70-1) and the normalizing gene (serrs) are generated by plotting the Cq values against the log10 of the quantity of the gene in arbitrary units (a.u.) for the serial dilutions of gDNA. An equation relating the two variables is obtained using linear regression. (b) The quantity of hsp70-1 and serrs transcripts (in a.u.) in samples A and B is calculated by interpolating their Cq values in the standard curve. Finally, hsp70-1 transcript levels relative to serrs are calculated for each sample. The majority of calculations (e.g., generation of the standard curves, interpolation) can be performed using the software provided with the real-time PCR instrument

3.6.2 cDNA Quantification and Normalization

1. Interpolate the Cq value of each unknown sample against the standard curve to determine the quantity (in arbitrary units) of transcripts (see Note 31).
2. Calculate the quantity mean and standard deviation of the triplicate reactions for each sample.
3. Divide the transcript levels of the gene of interest by that of the reference gene to obtain the relative transcript levels.

The selection of the normalizing gene is critically important. As a general principle to decide which normalizing gene to use (Table 4):
(a) To normalize by “number of parasites”: use genes that have constant expression along the parasite development cycle.
(b) To normalize by “number of parasites at the stage of development at which the gene of interest is expressed”: use genes with a temporal pattern of expression that coincides with that of the gene of interest. This enables comparison of the transcript levels between parasite samples that were not collected at exactly the same stage of life cycle progression. The more similar the expression pattern between test gene and normalizing gene, the more robust the comparison will be.

Table 4 Examples of genes commonly used to normalize transcript levels, with roughly constant expression along the IDC or with marked stage-specific expression. Transcript levels for these genes during the IDC, obtained from a microarray-based study (3D7-A line) are shown [4]. The gene uce has also been used for normalization in studies involving gametocyte stages [14]

Name                                                         Gene ID          Stage of expression    Time-course expression
serine-tRNA ligase (serrs) [15, 16]                          PF3D7_0717700    Constant               [plot]
ubiquitin-conjugating enzyme E2 (uce) [14]                   PF3D7_0812600    Constant               [plot]
skeleton-binding protein 1 (sbp1) [17]                       PF3D7_0501300    Rings                  [plot]
High molecular weight rhoptry protein 2 (RhopH2) [15, 16]    PF3D7_0929400    Schizonts              [plot]
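Steps 1–3 of Subheading 3.6.2 amount to inverting the standard curve, averaging the triplicates, and taking a ratio. The sketch below illustrates this under the assumption that each standard curve has the form Cq = slope × log10(quantity) + intercept; the slopes, intercepts, and Cq values used in the example are invented, and the function names are hypothetical.

```python
from statistics import mean, stdev


def quantity_from_cq(cq, slope, intercept):
    """Interpolate a Cq value on the standard curve; result is in arbitrary units."""
    return 10 ** ((cq - intercept) / slope)


def summarize(cqs, slope, intercept):
    """Mean and standard deviation of the quantities from replicate wells."""
    quantities = [quantity_from_cq(cq, slope, intercept) for cq in cqs]
    return mean(quantities), stdev(quantities)


def relative_transcript_level(goi_cqs, ref_cqs, goi_curve, ref_curve):
    """Mean quantity of the gene of interest divided by that of the normalizing gene."""
    goi_mean, _ = summarize(goi_cqs, *goi_curve)
    ref_mean, _ = summarize(ref_cqs, *ref_curve)
    return goi_mean / ref_mean


if __name__ == "__main__":
    # Invented (slope, intercept) pairs, one per primer pair, for illustration only.
    goi_curve = (-3.40, 28.6)
    ref_curve = (-3.35, 27.9)
    # Invented triplicate Cq values for one cDNA sample.
    goi_cqs = [22.1, 22.0, 22.2]
    ref_cqs = [19.5, 19.6, 19.4]
    print("gene of interest (mean, SD):", summarize(goi_cqs, *goi_curve))
    print("normalizing gene (mean, SD):", summarize(ref_cqs, *ref_curve))
    print("relative level:", relative_transcript_level(goi_cqs, ref_cqs, goi_curve, ref_curve))
```

Because each gene is interpolated on its own standard curve, differences in amplification efficiency between the two primer pairs are accounted for before the ratio is taken.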

4 Notes

1. Our standard conditions for parasite cultures are 3% hematocrit, Albumax II-supplemented parasite culture medium (rather than human serum-supplemented medium), 5% CO2/3% O2/92% N2 atmosphere.
2. Use a biosafety level 2 rated tissue culture cabinet to work under sterile and safe conditions.
3. It is important to estimate in the cycles previous to tight synchronization the time of the day at which Percoll purification can be started. This will depend on the age of the ring stage parasites in the frozen stock (this can range from predominantly very early rings to predominantly late rings) and the time of thawing. For parasite lines with a regular ~48 h duration of the IDC, in consecutive cycles cultures will be ready for Percoll purification at approximately the same time. For parasite lines with an intrinsic multiplication rate 6 (under the same conditions used during the synchronization, i.e., static or shaking), we consider it optimal to start the Percoll purification when there are 2 “new rings” for each schizont, but for parasite lines with lower multiplication rates, a “new rings” to schizonts ratio 1 is appropriate.
4. The required parasitemia before Percoll-sorbitol synchronization depends on the total number of tightly synchronized parasites needed for downstream analysis. The reinvasion efficiency of the purified schizonts (i.e., the number of new rings formed from each schizont) should be considered in the calculations to determine the required starting parasitemia. The reinvasion efficiency can be estimated from the intrinsic multiplication rate of each strain (fold increase in parasitemia from one cycle to the next) under the culture conditions used during schizont bursting/reinvasion (e.g., shaking or static; shaking enhances reinvasion), and is affected by the starting parasitemia (high parasitemia compromises reinvasion efficiency in some strains) [18]. Generally, it is sufficient to start Percoll purification with a 30 mL, 3% hematocrit culture at a 2–3% schizonts parasitemia to obtain a culture of the same volume and hematocrit with ~5% tightly synchronized rings, for a multiplication rate of ~8 (this calculation takes into account that the typical yield of Percoll purification is ~50% and many of the schizonts do not burst before sorbitol lysis).
5. Use 10 mL of pre-warmed 63% Percoll for up to 1.8 mL of RBCs pellet (i.e., 60 mL of culture at 3% hematocrit). Percoll purification gives poor yields in some parasite lines; for the 3D7 reference strain it works well.

6. Determine the number of RBCs that need to be added to adjust the parasitemia between 1 and 4%, assuming a 50% yield of the Percoll purification, while maintaining the hematocrit at 3%. Higher parasitemia should be avoided because it can compromise schizont bursting and reinvasion, and there is variation between parasite lines in their sensitivity to high density [18]. Confirm the parasitemia of the cultures using Giemsa-stained smears and, if it is >4%, dilute the cultures.
7. To harvest RNA at the ring, trophozoite or schizont stages, the parasitemia after sorbitol lysis should be