Trends in the Systematics of Bacteria and Fungi 1789244986, 9781789244984

Methods in microbial systematics have developed and changed significantly in the last 40 years. This has resulted in con


English Pages 366 [367] Year 2020

Table of contents :
Cover
Trends in the Systematics of Bacteria and Fungi
Copyright
Contents
List of Figures
List of Tables
List of Authors
Preface
1 Bridging 200 Years of Bacterial Classification
Introduction
The Historical Perspective
The changing consideration of bacterial taxonomic assessment
The early era
A witness of scientific progress: Bergey’s Manual of Determinative Bacteriology
The dawn of unravelling the evolution of Prokaryotes
Reconciliation of bacterial taxonomy
Changing gear: Bergey’s Manual of Systematic Bacteriology
Some Considerations on Taxonomy and the Misunderstandings
The Paramount Relevance of the Genomic Data
Recent Innovations
Taxonomy Needs to Change Its Path
Conclusions: Reconciliation or Divorce
Acknowledgements
References
2 Identification of Fungi: Background, Challenges and Prospects
Introduction
Fungi and Fungi
The Identification Process
Challenges of Identifying Fungi
Prospects for Addressing Challenges
Conclusion
References
3 Names of Microorganisms and Data Resources to Retrieve Information About Published Names
Introduction
The International Code of Nomenclature of Prokaryotes (the Prokaryotic Code)
Resources from Which Information About Names of Prokaryotic Taxa Can Be Retrieved
The ‘official’ sources of information: articles and lists in the IJSB/IJSEM
Online Resources that Provide Information on Validly Published Names of Taxa of Prokaryotes
List of Prokaryotic names with Standing in Nomenclature (LPSN) (Euzéby, 1997; Parte, 2014, 2018)
Prokaryotic Nomenclature Up-to-Date (Leibniz Institute DSMZ, 2019)
NamesforLife (Garrity, 2010)
Names of Prokaryotes Effectively but not Validly Published
Names of Candidatus Taxa of Prokaryotes
The Special Status of the Cyanobacteria/Cyanophyta
Names of Fungi and Related Digital Resources
Effective Publication Under the ICN
Chapter F of the ICN
Typification
Priority, starting-point dates and hemihomonyms
Registration of nomenclatural acts
Pleomorphic life cycles – One fungus: one name
Lists of approved and rejected names
Naming cryptic diversity
eDNA
Linking names to DNA, and DNA to names
Data Standards and Databases
Digital Resources
Names
Taxa
Descriptive data
Sequence-related databases
Conclusion
References
4 Preserving the Reference Strains
Rationale
Introduction
Handling Samples from the Environment to the Laboratory
DNA Sample Preparation and Storing
Sample Acquisition and Authentication
Preservation Techniques
Approaches to Testing Stability in Storage
mBRC Management: Adopting an Appropriate Standard
Conclusion
References
5 Can Older Fungal Sequence Data be Useful?
Introduction
Data Available
Placing mOTUs in Beauveria
‘Fishing’ for New Sequences
‘Clustering’ for New Sequences
Outcome
Mislabelled Sequences
Limitations in the Methodology
ITS
Cut-off values
Duplicated sequences
Limitations of names and labels
Species Complexes
Conclusion
References
6 Data Resources: Role and Services of Culture Collections
Introduction
The Importance of Reliable Data
Desired Function of a Modern Culture Collections Management System
Reliable and Useful Data
Supporting Fungal Taxonomy
Standards and Open Access
Conclusion
References
7 MALDI-TOF MS and Currently Related Proteomic Technologies in Reconciling Bacterial Systematics
Introduction
Proteins in Microbial Systematics
Arrival of MALDI-TOF MS in Microbiology
Establishing MALDI-TOF MS in Clinical Microbiology
MALDI-TOF MS in the Non-clinical Laboratory and its Role in Searching for New Diversity
MALDI-TOF MS in Subspecies Identification, Typing and Screening for Genetic Variants: Implication for Systematics
MALDI-TOF MS in Microbial Systematics; a Case Study Involving Cutibacterium acnes
Brief biology of Cutibacterium acnes
MALDI-TOF MS delineates three proteotypes
Correlation of proteotypes with whole-genome sequencing
MALDI-TOF MS and the Future Interest of MS Companies
Use of MALDI-TOF MS in a Clinical Laboratory
Limitations of MALDI-TOF MS as currently used
Retaining the Interest of Mass Spectrometry Companies
Potential to Identify the Biomarker Peaks in a MALDI-TOF MS Spectrum: Towards a MALDI-TOF MS Global Database
High-resolution forms of MS that may be used to deduce peptide/protein taxon-specific signatures
From linear MALDI-TOF MS to tandem LC-MS/MS: unravelling the proteome of microbial species and future implications for bacterial systematics
Case study: use of tandem LC-MS/MS during a major disease outbreak of pathogenic E. coli and taxonomic implications
Nature of the outbreak
Proteomics and systematics in a high-containment laboratory
Conclusion
References
8 MALDI-TOF MS and its Requirements for Fungal Identification
Introduction
Principles of MALDI-TOF MS and its Application in Fungal Taxonomy
Examples of the Use of MALDI-TOF MS Technique in Fungal Identification
Limitations to the Use of MALDI-TOF MS Technique in Fungal Identification
MALDI-TOF MS for Cryptic and Dimorphic Fungal Identification
MALDI-TOF MS Databases and Data Analysis in Fungal Identification
Current situation of each different commercial database dedicated to fungal identification
In-house MALDI-TOF MS databases for fungal identification
Conclusion
Acknowledgements
References
9 The Strength of Chemotaxonomy
Introduction
Background and History of Chemotaxonomic Biomarkers
Cell wall components
Lipids
Polyamines
Applications of Chemotaxonomy to Bacterial Systematics
Winds of Change: Chemotaxonomy in the Era of Omics
Conclusion: Chemotaxonomy and What Lies Ahead
References
10 Microbial Genomic Taxonomy
Introduction
Genomic Microbial Taxonomy
In silico Phenotyping
Suggestions for a Genome-based Taxonomy
Challenges Ahead for Microbial Taxonomy in the Context of Microbial Ecology
Challenges in the Taxonomy of the Cyanobacteria Phylum
Conclusion
References
11 Navigating Bacterial Taxonomy in a World of Unchartered Microbial Organisms
Introduction
Determining Taxonomy in Metabarcoding Experiments
Approaches for Assigning OTUs to Amplicon Sequences
Emergence of Amplicon Sequence Variants (ASVs)
Assigning Taxonomy to MAGS
The Disconnect Between MAGs and Metabarcoding Approaches
Conclusion
References
12 Sequence-based Identification and Classification of Fungi
Introduction
The ITS Region as a Universal Barcode for Fungal Identification: Advantages and Limitations
Secondary DNA Barcode Regions as Adjuncts to (or Replacements for) ITS
Quality of Reference Sequence Libraries
The Problem of Sequences Without Names: ‘Dark Taxa’
Implications for Fungal Taxonomy and Nomenclature
Conclusion
References
13 Identification and Classification of Prokaryotes Using Whole-genome Sequences
Introduction
Genome-based Classification: Advantages
What Did Whole-genome Sequencing Reveal About Traditional Taxonomic Practices?
Genome-based Classification: Limitations
Genome Classification Resources Available
Unculturable Taxa: Genome-based Classification is the Only Way Forward
Tips for Genome-based Classification of an Unknown Query Genome
Acknowledgements
References
14 Genomic Sequences for Fungi
Introduction
The Species Concept in the Next-generation Sequencing (NGS) Era
Methodology
Sequencing technologies
De novo, resequencing and targeted sequencing
RNA sequencing (RNA-Seq)
Epigenetics
Genomic variation and mutation detection
Data analysis and interpretation
Experimental design and generation of data
Analysis
Interpretation
Visualization and reporting
Techniques
Comparative genomics
Genome sequences to link genetics with biological traits
From biochemistry to genomics
Metagenomics
Technology and Fungal Systematics
Saccharomyces
Penicillium
Aspergillus
Fusarium
Colletotrichum
Discussion and Conclusion
Acknowledgements
References
15 What can Genome Analysis Offer for Bacteria?
Introduction
Schools of Taxonomic Thought and Associated Methods of Analysis
Methodological Issues in Polyphasic Taxonomy
Causes of Conflict Between Taxonomic Classifications and Genome-scale Analyses
Assigning Taxonomic Ranks Using Genome-scale or Other Data
Conclusion
References
16 Genomes Reveal the Cohesiveness of Bacterial Species Taxa And Provide a Path Towards Describing All of Bacterial Diversity
Introduction
How Taxonomy Demarcates Bacterial Species
A Genome-based Species Taxonomy
Substituting a type genome sequence for a type strain
Demarcating genomes into new species
Describing the phenotype of novel species
Is There Something Real About Species?
Recombination Does Not Prevent Ecological Divergence Between Bacterial Populations
Periodic Selection as a Force of Cohesion in Bacterial Species
Ecotypes as Species-like Lineages
Enriching Bacterial Systematics with Ecotypes
Recombination as a Force of Cohesion Among Ecologically Distinct Lineages
A Force of Cohesion That is Limited to Species Taxa Across Much of Life
Conclusion
References
17 Are Species Concepts Outdated for Fungi? Intraspecific Variation in Plant-pathogenic Fungi Illustrates the Need for Subspecific Categorization
Introduction
Difficulties in Applying Species Concepts in Fungi
Phylogenetic Species Concept and Molecular Data
Structured Case Summaries
Rhizoctonia solani
Colletotrichum
Fusarium oxysporum
Verticillium
Redefinition of species in Verticillium
Intraspecific diversity in Verticillium species and its phytopathological relevance
Conclusion
References
18 Where to Now?
Introduction
Progress in Mycological Systematics
Species Concepts
Diverging Developments in Bacterial Classification
Bacterial Nomenclature in the Future
Reference Materials for Mycology
Herbarium resources
Curating the names
Networking Microbial Strain Information
Systematics in the Post-Nagoya Era
Conclusion
References
Appendix Abbreviations and Acronyms List
Abbreviations and Acronyms List
Chapters 1–10
Chapters 11–17
Index
Back Cover



Author / Uploaded
Paul D. Bridge
David Smith
Erko Stackebrandt


Trends in the Systematics of Bacteria and Fungi


Edited by

Paul Bridge, David Smith and Erko Stackebrandt

CABI is a trading name of CAB International

CABI
Nosworthy Way
Wallingford
Oxfordshire OX10 8DE
UK
Tel: +44 (0)1491 832111
Fax: +44 (0)1491 833508
E-mail: [email protected]
Website: www.cabi.org

CABI
WeWork
One Lincoln St
24th Floor
Boston, MA 02111
USA
Tel: +1 (617)682-9015
E-mail: [email protected]

© CAB International 2021. All rights reserved. No part of this publication may be reproduced in any form or by any means, electronically, mechanically, by photocopying, recording or otherwise, without the prior permission of the copyright owners. A catalogue record for this book is available from the British Library, London, UK.

References to Internet websites (URLs) were accurate at the time of writing.

ISBN-13: 9781789244984 (hardback)
9781789244991 (ePDF)
9781789245004 (ePub)

Commissioning Editor: Rebecca Stubbs
Editorial Assistant: Lauren Davies
Production Editor: Marta Patiño

Typeset by SPI, Pondicherry, India
Printed and bound in the UK by Severn, Gloucester

Contents

List of Figures
List of Tables
List of Authors
Preface

Chapter 1 Bridging 200 Years of Bacterial Classification
Ramon Rosselló-Móra and Erko Stackebrandt

Chapter 2 Identification of Fungi: Background, Challenges and Prospects
Tom W. May

Chapter 3 Names of Microorganisms and Data Resources to Retrieve Information about Published Names
Aharon Oren, Aidan Parte and Jerry Cooper

Chapter 4 Preserving the Reference Strains
David Smith and Vera Bussas

Chapter 5 Can Older Fungal Sequence Data be Useful?
Paul Bridge

Chapter 6 Data Resources: Role and Services of Culture Collections
Matthew J. Ryan, Gerard Verkleij and Vincent Robert

Chapter 7 MALDI-TOF MS and Currently Related Proteomic Technologies in Reconciling Bacterial Systematics
Haroun N. Shah, Ajit J. Shah, Omar Belgacem, Malcolm Ward, Itaru Dekio, Lyna Selami, Louise Duncan, Kenneth Bruce, Zhen Xu, Hermine V. Mkrtchyan, Rory Cave, Laila M.N. Shah and Saheer E. Gharbia

Chapter 8 MALDI-TOF MS and its Requirements for Fungal Identification
Cledir Santos, Paula Galeano, Reginaldo Lima-Neto, Manoel Marques Evangelista Oliveira and Nelson Lima

Chapter 9 The Strength of Chemotaxonomy
Paul A. Lawson and Nisha B. Patel

Chapter 10 Microbial Genomic Taxonomy
Cristiane C. Thompson, Livia Vidal, Vinicius Salazar, Jean Swings and Fabiano L. Thompson

Chapter 11 Navigating Bacterial Taxonomy in a World of Unchartered Microbial Organisms
Varsha Kale, Lorna Richardson and Robert D. Finn

Chapter 12 Sequence-based Identification and Classification of Fungi
Andrew M. Borman and Elizabeth M. Johnson

Chapter 13 Identification and Classification of Prokaryotes Using Whole-genome Sequences
Luis M. Rodriguez-R, Ramon Rosselló-Móra and Konstantinos T. Konstantinidis

Chapter 14 Genomic Sequences for Fungi
Riccardo Baroncelli and Giovanni Cafà

Chapter 15 What can Genome Analysis Offer for Bacteria?
Markus Göker

Chapter 16 Genomes Reveal the Cohesiveness of Bacterial Species Taxa and Provide a Path Towards Describing All of Bacterial Diversity
Frederick M. Cohan

Chapter 17 Are Species Concepts Outdated for Fungi? Intraspecific Variation in Plant-pathogenic Fungi Illustrates the Need for Subspecific Categorization
Enrique Monte, Rosa Hermosa, María del Mar Jiménez-Gasco and Rafael M. Jiménez-Díaz

Chapter 18 Where to Now?
Paul Bridge, Erko Stackebrandt and David Smith

Appendix

Index

List of Figures

1.1 Development of taxonomic approaches through different historical phases

1.2 Yearly number of new species descriptions and new 16S rRNA gene entries in public repositories. The primary Y-axis (left) in red relates to the number of described taxa (red line), and the secondary Y-axis (right) relates to the number of gene entries in public repositories (blue line) in accordance with the SILVA database (www.arb-silva.de). Owing to the lack of releases in 2018, the numbers of gene entries for 2018 and 2019 are given as the mean yearly increase between SILVA SSU 132 and 138 (Quast et al., 2013)

3.1 Taxon entry for the species Klenkia soli in LPSN (http://www.bacterio.net/klenkia.html; accessed 23 October 2019)

3.2 NamesforLife Guide widget displayed by clicking on the genus name Acetobacter in an IJSEM paper

3.3 NamesforLife Name Abstract for Haloferax volcanii

3.4 The entry for the genus Chroococcidiopsis in CyanoDB (http://www.cyanodb.cz/#/Chroococcidiopsis; accessed 31 October 2019)

3.5 The entry for the species Hortaea werneckii in MycoBank (partial view) (http://www.mycobank.org/BioloMICSDetails.aspx?Rec=13049; accessed 31 October 2019)

6.1 Management of collection data in line with both FAIR standards and the GODAN model

7.1 MALDI-TOF MS profiles obtained for the same sample using three different ProteinChip arrays (NP1, SAX2 and WCX1) to reveal the complexity of a sample

7.2 The separation of Lactobacillus species and subspecies using Bruker Biotyper (software version 3.0)

7.3 Dendrograms showing phenotypic similarities and relationships of 53 Pseudomonas aeruginosa strains from cystic fibrosis (CF) and non-CF sites and the same strains analysed genetically using VNTR (variable number tandem repeat)

7.4 (Left) Unrooted cluster analysis of Staphylococcus species in the community based upon MALDI-TOF MS spectral profiles. (Right) Three-dimensional scatter plot of Staphylococcus aureus isolates using a supervised method such as linear discriminant analysis (LDA, Bionumerics) to show relationships among isolates from specific sites which may be useful for identifying unique traits during transmission (adapted from Vranckx et al., 2017)

7.5 Unique biomarker mass ions in the MALDI-TOF MS spectrum of Cutibacterium acnes subspecies (Dekio et al., 2015)

7.6 Unique biomarker mass ions between the 10 and 20 kD segment of the SELDI-TOF MS spectra of anaerobically-cultured Cutibacterium acnes isolates

7.7 (Left) MALDI-TOF MS analysis of staphylococcal species using ASTA’s Tinkerbell Linear MS and Bruker’s Autoflex LRF MALDI-TOF MS. (Right) 16S rRNA identification of 10 atypical strains of Staphylococcus cohnii. Both instruments revealed low identification scores, but most samples were correctly identified

7.8 Use of top-down proteomics to delineate genetically closely related species. In an early attempt to demonstrate the high resolution of this approach, proteins unique to both species were evident but were differentially expressed (Shah et al., 2015)

8.1 MALDI-TOF MS spectrum of Fusarium guttiforme strain E-480 (adapted from Santos et al., 2016)

8.2 Dendrograms obtained from (A) MALDI-TOF MS mass spectra of T. rubrum strains; and (B) ITS sequence data of T. rubrum strains. (Data from the Mycology Applied Group, University of Minho, Braga, Portugal)

8.3 Early identification by MALDI-TOF MS of the pineapple pathogen Fusarium guttiforme (A and B) and its antagonist Trichoderma asperellum on decayed pineapple (C and D). (Data from the Mycology Applied Group, University of Minho, Braga, Portugal)

8.4 MALDI-TOF MS spectra of darkly pigmented strains of Colletotrichum spp. Colony pigmentation increased qualitatively from colony A to D. (Data from the Chemistry of Fungi Group, Department of Chemical Science and Natural Resources, Universidad de La Frontera, Temuco, Chile)

8.5 MALDI-TOF MS lipid fingerprinting of different Phytophthora (A) species and (B) strains (adapted from Galeano, 2019)

9.1 Peptidoglycan types

9.2 Phylogenetic tree of 16S rRNA gene sequences and some diagnostic chemotaxonomic biomarkers of members of the family Intrasporangiaceae. The phylogenetic analysis was performed using MEGA X (Kumar et al., 2018) using the maximum-likelihood method (Felsenstein, 1981) employing the Kimura 2-parameter substitution model (Kimura, 1980). Bootstrap values (%) were obtained with 1000 replicates and are displayed on their relative branches (Felsenstein, 1985)

9.3 Phylogenetic tree of UDP-acetylmuramoylalanyl-D-glutamate-L-lysine ligase, UDP-acetylmuramoylalanyl-D-glutamate-L-lysine ornithine, and UDP-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase. The phylogenetic analysis was performed using MEGA X (Kumar et al., 2018) using the maximum-likelihood method (Felsenstein, 1981) employing the Kimura 2-parameter substitution model (Kimura, 1980). Bootstrap values (%) were obtained with 1000 replicates and are displayed on their relative branches (Felsenstein, 1985)

9.4 Polar lipid analysis. Laboratory-based information compared to in silico information

9.5 Polar lipid analysis. Phylogenetic groups with laboratory-based information compared to in silico information

10.1 Phylogenomic tree of the Cyanobacteria phylum with the proposed new names (Walter et al., 2017). New species names in red correspond to new genera based on genome analyses (phylogenomics, AAI, GGDH). Former species names in black. Ecogenomic groups are reflected by branch colour

11.1 A schematic of the potential workflow from prokaryotic genomes to taxonomic annotation

11.2 A subtree of the GTDB phylogeny: order 4c28d-15, class Clostridia, phylum Firmicutes, generated with iTOL (Letunic and Bork, 2019)

13.1 Distribution of average nucleotide identity (ANI) and aligned fraction (AF) between complete prokaryotic genomes

17.1 Rhizoctonia solani AG2-2 growing on minimal medium. The colony is characterized by the absence of any sporulation and the hyphae are wide and tend to branch at right angles. Inset: the formation of a septum near each hyphal branch and a slight constriction at the branch are diagnostic

17.2 Verticillate conidiophore characteristic of the genus Verticillium. Note phialides arranged in whorls and conidia at the tip of phialides

17.3 Microsclerotia of Verticillium dahliae formed in water agar and within a xylem vessel of an infected plant. Note different stages of microsclerotial development and morphology. Elongated microsclerotia are characteristic of the cotton- and olive-defoliating V. dahliae pathotype

List of Tables

4.1 Quality control procedures recommended for microorganisms

5.1 Matches obtained from NCBI Nucleotide database for reference ITS sequences from Beauveria species

5.2 Some recent examples of multigene sets used in delineating fungal species

6.1 Example of a typical culture collection minimal microbial data set

6.2 Management and handling of data in a typical culture collection

7.1 Barriers to using MALDI-TOF MS exclusively as a universal typing tool

7.2 Summary of the current advantages of MALDI-TOF MS

8.1 Features of the two MALDI-TOF MS matrices most commonly used for fungal identification (maximum wavelength values from Robinson et al., 2018)

8.2 Some examples of MALDI-TOF MS studies for the identification of cryptic and dimorphic species of filamentous fungi

8.3 Main features of the major commercial MALDI-TOF MS systems available

9.1 Examples of chemotaxonomic biomarkers

9.2 Chemotaxonomic characteristics of members of the family Intrasporangiaceae. DAP, 2,6-diaminopimelic acid; Orn, ornithine; DPG, diphosphatidylglycerol; PG, phosphatidylglycerol; PI, phosphatidylinositol; PIM, phosphatidylinositol mannosides; PSer, phosphatidyl serine; PL, unidentified phospholipid(s); GL, unidentified glycolipid(s); PGL, unidentified phosphoglycolipid(s); ND, not determined

12.1 Selected loci and primers used for DNA sequencing identification of fungi. Primary (ITS) and secondary (TEF1α) barcode primers are in bold

12.2 List of currently available, curated databases for the sequence-based identification of fungi

12.3 Useful fungal taxonomy reference sites

15.1 Main differences between the phenetic school of taxonomy and phylogenetic systematics with respect to general principles, manual interpretation of single characters and computational tools used to analyse multiple characters

15.2 Interpretation of the character states diagnostic for Turicella (and different from Corynebacterium) in distinct studies. The two 2018 studies were based on genome-scale data

List of Authors

Riccardo Baroncelli
Spanish–Portuguese Institute for Agricultural Research (CIALE), University of Salamanca, Spain; [email protected]

Omar Belgacem
Ascend Diagnostics Limited, CityLabs 1.0, Nelson Street, Manchester M13 9NQ, UK

Andrew M. Borman
UK National Mycology Reference Laboratory (MRL), Public Health England South-West, Bristol, Science Quarter, Southmead Hospital, Bristol BS10 5NB; and Medical Research Council Centre for Medical Mycology (MRC CMM), University of Exeter, UK; [email protected]

Paul Bridge
Axminster, UK; [email protected]

Kenneth Bruce
King’s College London, Molecular Microbiology Research Laboratory, Pharmaceutical Science Research Division, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, UK

Vera Bussas
Formerly Leibniz-Institut DSMZ - German Collection of Microorganisms and Cell Cultures, Inhoffenstr. 7 B, 38124 Braunschweig, Germany; [email protected]

Giovanni Cafà
CABI Europe, Bakeham Lane, Egham, Surrey TW20 9TY, UK; [email protected]

Rory Cave
School of Health, Sport and Biosciences, University of East London, Water Lane, London E1 4NS, UK

Frederick M. Cohan
Department of Biology, Wesleyan University, Middletown, CT 06459, USA; [email protected]

Jerry Cooper
Manaaki Whenua – Landcare Research, 54 Gerald Street, Lincoln 7608, New Zealand

Itaru Dekio
Department of Biochemistry and Integrative Medical Biology, School of Medicine, Keio University, Japan

Louise Duncan
King’s College London, Molecular Microbiology Research Laboratory, Pharmaceutical Science Research Division, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, UK

Robert D. Finn
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; [email protected]

Paula Galeano
Facultad de Ciencias Básicas, Universidad de la Amazonia, Florencia, Caquetá 180002, Colombia

Saheer E. Gharbia
Public Health England, Department of Gastrointestinal Pathogens, 61 Colindale Avenue, London NW9 5EQ, UK

Markus Göker
Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Inhoffenstraße 7 B, D-38124 Braunschweig, Germany; [email protected]

Rosa Hermosa
Spanish–Portuguese Institute for Agricultural Research (CIALE), University of Salamanca, 37185 Salamanca, Spain

Rafael M. Jiménez-Díaz
College of Agriculture and Forestry, Universidad de Córdoba and Instituto de Agricultura Sostenible, CSIC, Avda. Menéndez Pidal s/n, 14004 Córdoba, Spain

María del Mar Jiménez-Gasco
Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA 16802, USA

Elizabeth M. Johnson
UK National Mycology Reference Laboratory (MRL), Public Health England South-West, Bristol; and Medical Research Council Centre for Medical Mycology (MRC CMM), University of Exeter, UK

Varsha Kale
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; [email protected]

Konstantinos T. Konstantinidis
School of Civil and Environmental Engineering, School of Biological Sciences, Georgia Institute of Technology, 311 Ferst Dr, Ford ES&T Building, Suite 3224, Atlanta, GA 30332, USA; kostas@ce.gatech.edu

Paul A. Lawson
Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, USA; [email protected]

Nelson Lima
CEB–Biological Engineering Centre, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal

Reginaldo Lima-Neto
Department of Tropical Medicine, Federal University of Pernambuco (UFPE), Recife-PE 50.740600, Brazil

Tom W. May
Royal Botanic Gardens Victoria, Birdwood Avenue, Melbourne, Victoria 3004, Australia; Tom. [email protected]

Hermine V. Mkrtchyan
School of Biomedical Sciences, University of West London, St Mary’s Road, London W5 5RF, UK; School of Health, Sport and Biosciences, University of East London, Water Lane, London E1 4NS, UK

Enrique Monte
Spanish–Portuguese Institute for Agricultural Research (CIALE), University of Salamanca, 37185 Salamanca, Spain; [email protected]

Manoel Marques Evangelista Oliveira
Laboratory of Taxonomy, Biochemistry and Bioprospecting of Fungi, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro 21040-900, Brazil

Aharon Oren
Department of Plant and Environmental Sciences, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem 9190401, Israel; [email protected]

Aidan C. Parte
List of Prokaryotic names with Standing in Nomenclature (LPSN), Sudbury, MA 01776, USA

Nisha B. Patel
Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, USA

Lorna Richardson
European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK; [email protected]

Vincent Robert
Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands

Luis M. Rodriguez-R
School of Civil and Environmental Engineering, Georgia Institute of Technology, 311 Ferst Dr. NW, Atlanta, GA 30332, USA

Ramon Rosselló-Móra
Grup de Microbiologia Marina, Department Animal and Microbial Diversity, IMEDEA (CSIC-UIB), Miquel Marqués 21, 07190 Esporles, Illes Balears, Spain

Matthew J. Ryan
CABI, Bakeham Lane, Egham, Surrey TW20 9TY, UK; [email protected]

Vinicius Salazar
Universidade Federal do Rio de Janeiro, Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil

Cledir Santos
Department of Chemical Science and Natural Resources, Universidad de La Frontera, Av. Francisco Salazar 01145, 4811-230 Temuco, Chile; [email protected]

Lyna Selami
Ascend Diagnostics Limited, CityLabs 1.0, Nelson Street, Manchester M13 9NQ, UK

Ajit J. Shah
Department of Natural Sciences, Middlesex University, London NW4 4BT, UK

Haroun N. Shah
Department of Natural Sciences, Middlesex University, London NW4 4BT, UK; harounnshah@gmail.com

Laila M.N. Shah
King’s College London, Faculty of Natural and Mathematical Sciences, Department of Chemistry, Franklin-Wilkins Building, 150 Stamford Street, London SE1 9NH, UK

David Smith
CABI, Bakeham Lane, Egham, Surrey TW20 9TY, UK; [email protected]

Erko Stackebrandt
38170 Kneitlingen OT Ampleben, Germany; [email protected]

Jean Swings
Universidade Federal do Rio de Janeiro, Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil; Ghent University, St. Pietersnieuwstraat 33, 9000 Gent, Belgium

Cristiane C. Thompson
Universidade Federal do Rio de Janeiro, Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil

Fabiano L. Thompson
Universidade Federal do Rio de Janeiro, Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil; fabianothompson1@gmail.com

Gerard Verkleij
Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT, Utrecht, The Netherlands

Livia Vidal
Universidade Federal do Rio de Janeiro, Av. Pedro Calmon, 550 - Cidade Universitária da Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, 21941-901, Brazil

Malcolm Ward
Department of Natural Sciences, Middlesex University, London NW4 4BT, UK

Zhen Xu
Tianjin Key Laboratory of Environment, Nutrition and Public Health, Department of Toxicology and Sanitary Chemistry, School of Public Health, Tianjin Medical University, Qixiang Road No. 22, Tianjin 300070, China

Preface

Janus (Ianus) was one of the most ancient Roman gods who was said to have ruled with Saturnus during the ‘Golden Age’ that was free from toil, disease and war. He was the god of doors, bridges, passages and transitions. As a door has two sides he is depicted as looking in two directions, forwards and backwards, and he is perhaps an appropriate image as we consider some of the potential changes in microbial systematics going forwards. Systematics is a word that conveys different concepts to different people. It was described by Cowan in his Dictionary of Microbial Taxonomy (1978) as including ‘everything that contributes to the organization of organisms, both intrinsic and extrinsic, and man’s attempts to arrange them neatly and to label them so that the knowledge available can be transmitted to others succinctly’. Looking back, the first major systematic treatment of organisms is generally accepted as Aristotle’s zoological studies detailed in Historia Animalium. Aristotle reasoned that to progress towards an organized scientific understanding it was necessary to develop a system of concepts and propositions organized hierarchically. This has remained as the primary goal of systematists even though the scientific knowledge and methods available have increased vastly. There have been many fundamental developments in systematics through time, including the first descriptions of bacteria in the 17th century, the establishment of evolutionary concepts and the first tree of life approaches in the 19th century. These were followed by increased study of anatomy, physiology and biochemistry that resulted in a very wide-ranging phenetic systematics. The terms phylum and phylogeny are attributed to the 19th century German zoologist Ernst Haeckel who proposed some of the first ‘Trees of life’, and the concept of phylogenetic systematics was further developed from the 1950s by Willi Hennig and others. In microbial systematics a major change occurred in the 1970s. 
The first DNA-based tree of life for the bacteria was published in 1977 based on sequences from the 16S subunit of the ribosomal RNA gene, and this marker has become the major defining feature for bacterial genera and species. In mycology, phylogenetic studies were developed with various regions of the rDNA gene cluster, and the internal transcribed spacer region became established as a ‘universal bar code’ for fungal species in the 2000s. Looking forwards, this is a pivotal time for microbial systematics. There are some shortcomings in the use of single gene regions, and wider-ranging multi-locus studies have become necessary for many groups. The science and technology needed for whole-genome and whole-proteome studies are becoming more accurate, quicker and cheaper, and consequently more available. This raises exciting possibilities for using this technology to investigate fundamental phylogenetic and systematic


questions, not least of which is the definition of a species. At the same time both genomic and proteomic sampling in the environment indicate that there may be a vast majority of microorganisms that have yet to be isolated or grown in culture, and that can only be defined through such technology. This raises many questions as to how these organisms can be identified and classified, and how they can be related to the taxa already described. There are fundamental differences between the current levels of genomic and proteomic knowledge for bacteria and fungi. With multiple growth forms and over 100,000 known species the fungi probably present a more complex situation, but genomic studies are hindered by the lack of reliable reference data for many species. As activities such as environmental sampling, and genomic and proteomic profiling, become more important in extending our understanding of ecosystems, there is an increasing imperative for researchers in microbial systematics to develop the methods and concepts required to interpret the information being generated. In this volume we present a collection of chapters that provide some insights into how current methods and resources are being used in microbial systematics, together with some thoughts and suggestions about how both methodologies and concepts may develop in the future.

But human pride
Is skilful to invent most serious names
To hide its ignorance
(Queen Mab: A Philosophical Poem; with Notes. Percy Bysshe Shelley, 1813)

Paul Bridge, David Smith and Erko Stackebrandt, May 2020

Reference

Cowan, S.T. (1978) A Dictionary of Microbial Taxonomy (ed. Hill, L.R.). Cambridge University Press, Cambridge, UK.

1 Bridging 200 Years of Bacterial Classification

Ramon Rosselló-Móra1 and Erko Stackebrandt2,*
1Grup de Microbiologia Marina, Department Animal and Microbial Diversity, IMEDEA (CSIC-UIB), Esporles, Spain; 2Kneitlingen, OT Ampleben, Germany

Introduction

The history of bacterial systematics reflects scientific progress in which approaches (often developed in non-biological disciplines) were adopted to improve the accuracy of bacterial taxonomy and to develop comprehensible relationships. What started two centuries ago with a blurred vision of the simplest properties of the bacterial cell became more focused once pure cultures were achievable. At an amazing speed, their existence as both harmful pathogens for humans, plants and animals and as beneficial partners in agriculture, food and industrial applications was disclosed. Microbiology as a scientific discipline in its own right was established, although the historical connection to botany was still recognizable until recently. In parallel with the enormous avalanche of strains, names and accompanying data already generated at the dawn of microbiology, attempts were made to understand the origin of bacteria, and to order them into a system that depicted the relationships among themselves and to other living organisms. A large number of systems were outlined with hardly any of them being identical to another. It was not until far into the 20th century that two major events occurred that would change our perception of how to reject the plethora of

bacterial names and classification systems. The first was the establishment of the Approved List of Bacterial Names (Skerman et al., 1980) that brought order into the huge number of synonyms based on obscure species descriptions; the second was the perception that proper classification needed to be based upon phylogenetic relationships. Within a short period of 20 years the tree of life began to unfold, and what originally was founded on a single evolutionary conservative gene has now been extended to genome sequences. Moreover, the era in which pure cultures were needed to assess phylogenetic novelty has been extended to the direct application of metagenome and microbiome studies to environmental samples. As a result, the ‘traditional’ bacteriologists are confronted with the interest of ‘molecular systematists’ in giving names to as yet uncultured organisms and even to genomes or fragments thereof. No doubt, without a mutual understanding of the rationale and purpose of naming the individual entities (from genes to cultured cells) confusion is bound to occur, especially if different entities of a given organism are named differently. This chapter will briefly take a historical view of the major steps in bacterial systematics leading to the first reconciliation workshop in 1987 and a re-evaluation of the species concept in 2002.

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


New challenges and concepts developed since then will be outlined.

The Historical Perspective

The changing consideration of bacterial taxonomic assessment

The early era

Microorganisms became visible by means of the introduction of Antony van Leeuwenhoek’s microscope in the mid-17th century. He and Lazzaro Spallanzani objected to the idea of spontaneous generation of microorganisms, and paved the way for the future development of the science of microbiology. It took about another 100 years before the first report on classification and characterization of microorganisms was attempted (Müller, 1786). Another century of optical and instrumental improvement passed before the name ‘Bacterium’ as a genus was introduced as a scientific word by Christian Gottfried Ehrenberg (1838). As Ehrenberg stated: ‘eine Milchstrasse der kleinsten Organisation geht durch die Gattungen Monas, Vibrio, Bacterium…’ (‘a Milky Way of the smallest organization runs through the genera Monas, Vibrio, Bacterium…’). However, while at that time their occurrence and relationships in different niches became apparent to the early microbiologists, neither their ecological function nor their mutual relationships revealed themselves to the observer. It is a part of human nature to arrange subjects into categories, no matter how small the number and how difficult the selection criteria. As these early systematists were mostly botanists by education, it is not surprising that, in the absence of facts about their phylogeny, they could not properly evaluate the nature of bacteria (based on nothing more than basic morphological descriptions). The apparent similarity of mainly unicellular forms and separation by fission allowed the early microbial systematists to deduce that bacteria derived from animals (van Leeuwenhoek, Müller, Ehrenberg), and later from fungi, classifying them as Schizophytae (Cohn, 1872), a system in which bacteria were placed together with ‘Spaltalgen’ (today named cyanobacteria).
This early period which Paul de Kruif (1926) characterizes so well in his famous novel Microbe Hunters is dominated by discoveries in the medical

field. But this era also saw the exploration into other scientific fields, such as immunology, chemotherapy, physiology, industrial microbiology, biochemistry and the study of metabolisms, leading to a rapid growth in knowledge about the properties of microorganisms and their interactions with biotic and abiotic matter. In the second half of the 19th century the deposition and exchange of microbial cultures, the organization of international conferences and the accessibility of scientific literature were limited, and it is not surprising that individual systematists generated a plethora of systems. Migula (1900) compared about 30 such systems, published between 1836 and 1894. At the dawn of microbiology, and even later, the same microorganism was given different names as individual researchers based their taxonomy on different properties. It was not until 1980 that the Approved List of Bacterial Names (Skerman et al., 1980) brought nomenclatural order into bacteriology by starting a new date for bacterial nomenclature. The tens of thousands of names for bacterial species were reduced to about 2500 names that could be linked unambiguously to a previously defined name of a bacterial species. Nevertheless, despite the methodological shortcomings of taxon descriptions, the early 19th century must be considered a period of great scientific achievement and accurate observations, as some of the names given to bacteria with particular morphologies in the first half of the 19th century (e.g. Spirillum [Ehrenberg, 1835], Spirochaeta [Ehrenberg, 1835] and Sarcina [Goodsir, 1842]) were recognized as valid and included in the Approved List.
To recognize the historic development of species characterization and the changes in affiliating genera to higher taxa we use the fate of Vibrio cholerae as an example, starting from the first description by Filippo Pacini (1854) through a series of progress reports as laid down in Bergey’s Manual of Determinative Bacteriology and later in Bergey’s Manual of Systematic Bacteriology. Almost any other taxon described before the late 20th century was prone to the same fate as the genus Vibrio and its type species V. cholerae.

A witness of scientific progress: Bergey’s Manual of Determinative Bacteriology

About 20 years after the publication of Ehrenberg (1832) a Vibrio-shaped bacterium was deduced

Fig. 1.1. Development of taxonomic approaches through different historical phases. MLST: Multilocus Sequence Typing; FISH: Fluorescent in situ Hybridization.

Early 1600s: Beginning of microscopy
Around 1850: considering microorganisms as pathogenic agents; the early champions of microbiology
Phase 1: around 1880–1900. Medical microbiology; first species description; culture observation; physiology
Phase 2: until mid-1950s. Ecology; biochemistry; genetics; serology; immunology
Phase 3: mid-1950s–mid-1980s. Numerical taxonomy; chemotaxonomy; DNA-DNA and DNA-rRNA reassociation; mol% G+C of DNA; rRNA gene cataloguing; anaerobic techniques
Phase 4: to present. rRNA gene, housekeeping genes and genome sequences; MLST and MLSA; MALDI-TOF, DNA patterns; FISH-probes; -omics; integrated databases
Further panel labels: Subjective classification into ranks based on individual observations; unravelling the high diversity and niches of microorganisms. 1st reconciliation workshop; polyphasic taxonomy; commercial kits; rRNA gene sequence (evolution/objective)-based phylogeny. In silico genome analysis of relationships; genome-based phylogeny; ecotype concept; rapid molecular identification tools; 2nd reconciliation workshop.


as the most likely causative agent of cholera disease and was named V. cholerae by Pacini (1854), although for several decades Robert Koch was listed as the discoverer of this bacterium. For some of the reasons indicated above, strains of this species or similar organisms were named (among others) Bacillus cholerae-asiaticae (Pacini, 1854) Trevisan, 1884; Spirillum cholerae-asiaticae (Trevisan, 1884) Zopf, 1885; Microspira comma Schröter, 1886; Spirillum cholerae (Pacini, 1854) Macé, 1889; Vibrio comma (Schröter, 1886) Blanchard, 1906; or Liquidivibrio cholerae (Pacini, 1854) Orla-Jensen, 1909 (see www.bacterio.net/vibrio.html, accessed 4 July 2020). After World War I the dominance of microbial systematics and taxonomy shifted from Europe to the USA. Initiated by the Society of American Bacteriologists, David Hendricks Bergey was appointed chairman of an editorial board in charge of publication of a Manual; under the name Bergey’s Manual of Determinative Bacteriology, this book and its several editions remained the international reference work and benchmark for bacterial taxonomy. Since 1923 this Manual has served as the main data depository for identification and systematics. No other textbook on bacterial properties is a more accurate witness of taxonomic innovation and progress in identification. Its editors chose a useful, reasonably stable, artificial classification rather than trying to place systematics within a phylogenetic framework (cited in Sapp, 2005). This was not, however, consistently put into practice, as seen in the affiliation of the genus Vibrio with the family Spirillaceae in the 1st edition (Bergey et al., 1923). The number of properties included for V. comma (first Koch, then Schröter were given as references) is small and restricted to culture observations such as shape, flagellation, reaction of growth and pigmentation in different media, a few physiological properties and habitat.
The name and taxonomic affiliation, as well as the descriptive criteria, remained basically unchanged in the 2nd to the 5th edition (Bergey, 1925, 1930, 1935; Bergey et al., 1939), except for the placement of the genus Vibrio in the family Pseudomonadaceae in the 5th edition and the listing of a few more physiological reactions (acid production from carbohydrates). The stagnation seen in microbial taxonomy in the first 30 years of the 20th century contrasts with the progress witnessed in the ‘Golden Age of Microbiology’ before and around

the turn of the 19th century. Although DNA was discovered around 1860, bacterial transformation experiments were published in the 1920s, and chromosomes were identified as the cellular structures responsible for heredity, the structure of DNA had still not been deciphered and genetics was in its infancy. The absence of a clearly defined nucleus led Copeland (1938) to propose a separate kingdom – Monera – for the bacteria (and cyanobacteria), bringing them to the same level as those of animals, plants and protoctista. A historic view on the development of the highest systematic ranks (kingdoms, domains) and a summary of the thoughts of leading scientists on the likelihood (and purpose) of constructing a phylogenetic framework for bacteria has been given by Sapp (2005). The 1940s and 1950s saw the development of electron microscopy, the evidence that DNA, not protein, is the genetic material, and the proposal of the double helix structure of DNA. As progress in these scientific eras was not evaluated for application in bacterial systematics, the range of taxonomic properties remained basically the same until the early 1970s. van Niel’s statement (1955) about the inappropriateness of phenotypic data to order taxa into a hierarchic system and to replace the binominal system of species by common names is a clear testimony to the frustration with bacterial classification prevailing in these decades. Still concentrating on the cultural properties of V. comma, the descriptions of this species in the 6th and 7th editions (Breed et al., 1948 and 1957, respectively) were almost identical but included physiological properties additional to those in the previous editions. More important, information on the antigen structure (O, somatic and H, flagellar) led to the clustering of V. cholerae strains into groups and subgroups. While in the 6th edition the genus Vibrio was affiliated to the family Spirilleae, it was a member of the family Spirillaceae in the 7th edition.
The 1960s and 1970s saw a boom in the introduction of taxonomic methods, ranging from the molecular (DNA-DNA and DNA-rRNA hybridization, mol% G+C of DNA) over chemotaxonomic (peptidoglycan, fatty acid, isoprenoid quinone, lipid A, polar lipid) to the phenotypic level (numerical taxonomy) as well as the design of novel cultivation strategies (anaerobic techniques). Naming the key authors of these many approaches would be beyond the framework of


this chapter (see Chapters 9, 10, 13, 15 and 16). However, Sokal and Sneath (1963) should be mentioned here, as computer-assisted numerical taxonomy was the favoured classification of microbes of the 1960s and 1970s. The need to optimally characterize bacterial isolates by the many taxonomic techniques available was coined ‘polyphasic taxonomy’ by Colwell (1970), a term still very much in use today. The 8th edition (Buchanan and Gibbons, 1974) included a few changes from the first seven editions, in that (i) the name V. comma was replaced by V. cholerae Pacini 1854; (ii) the genus Vibrio was placed into the family Vibrionaceae, listed under Part III ‘Gram-negative facultative anaerobic rods’ and the rank Schizomycetes was abolished; and (iii) the first genomic marker, here the DNA base composition, was included. The range of phenotypic properties was significantly expanded and used in the differentiation from other Vibrio species. Molecular approaches not directly applicable to identification (e.g. DNA-DNA hybridization data) were mentioned under ‘Further Comments’. In the mid-1970s the taxonomic toolbox was well filled with a wide range of methods suited to characterize strains and to delineate closely related species from each other. Chemotaxonomy provided a significant number of genomically stable properties which especially facilitated the characterization of Gram-positive bacteria. The era of numerical taxonomy ended when the inability of phenotypic data to unravel natural relationships became obvious by nucleic acid hybridization. Nevertheless, phenotypic data still played a significant role in species identification and characterization, as later seen by the introduction of commercial systems such as BIOLOG, VITEK or API. Although DNA-rRNA hybridization allowed the detection of broader genetic relationships, it was mainly used for some Gram-negative taxa and the resolution power ended at the suprageneric level.
But the fate of bacterial systematics changed rapidly as, almost unnoticed by the majority of taxonomists, a few scientists began to develop methods that would fundamentally change our idea of the natural relationships of bacteria, tracing them back to the origin of life. A 9th edition was published in 1994, between the release of the two editions of Bergey’s Manual of Systematic Bacteriology (see below), as


a true identification reference book in which the taxonomic data over the past 20 years were added to the 8th edition. Some changes in family composition, biovar characteristics of V. cholerae and an extensive list of phenotypic traits of old and newly added genus members were included.

The dawn of unravelling the evolution of Prokaryotes

The majority of the younger readers of this chapter may not be aware of the significance of the term ‘oligonucleotide sequencing’ as they have been educated in times of rapid genome sequencing. What is done today by sophisticated machines and a range of treeing algorithms, at an incredible speed not foreseen 50 years ago, was once a labour- and intellectually intensive exercise. Devised in the 1960s, the oligonucleotide sequence approach was the only method able to assemble entire RNA species, such as small microbial ribosomal or transfer RNA, or the genomes of single-stranded RNA bacteriophages. These molecules had a number of intrinsic properties, such as small size, slow evolution of rrn genes, the availability of site-specific RNase motifs, the lack of complementary strands (Heather and Chain, 2016) and the ease with which bulk RNA could be labelled and isolated, and so at that time had a head start over DNA sequencing, which began at the beginning of the 1970s. Originally developed by analytical chemists, the use of RNA oligomers was the basis on which to test the hypothesis of Zuckerkandl and Pauling (1965) that ‘semantides’ – the sequences of information carrying macromolecules – serve as the basis for deciphering molecular phylogenies (Sogin et al., 1971). Once the method of two-dimensional separation of 32P-labelled short 16S rRNA fragments (6 to ~24 nt long), and the ‘mastermind’ approach of reassembling these T1 RNase fragments from subsets of even smaller ones generated by RNases of different specificity (U2, A) had been established and applied to Bacteria and Archaea (Sogin et al., 1971; Uchida et al., 1974; Woese et al., 1975), the door opened to what culminated in 1977 in the definition of three primary kingdoms (Woese and Fox, 1977). First sceptically considered by the majority of taxonomists, the convincing phylogenetic


relationships led to a rethink in terms of the use of molecular approaches and in the genomic relationships among living matter. The paradigm shift in the understanding of natural relationships embraced and clarified diverse aspects of bacterial taxonomy, systematics, phylogeny and evolution. It later extended to bacterial identification and, crossing kingdom/domain boundaries, to yeast and fungi, algae and protozoa up to the ranks of Plantae and Animalia. Within this period of rapid change in concepts of affiliating lower to higher ranks, the contribution of a few scientists laying the breathtaking foundation of revolutionary changes has been well covered by Sapp (2009). No stone was left unturned (taxonomically speaking), and no historic grail was considered to be untouchable. A change in the taxonomist’s consideration of systematics resulted in a dramatic rearrangement of higher taxa in the domain Bacteria and laid the basis for a hierarchic structure of the domain Archaea (Woese et al., 1990). Speculations on the ancestral, primitive stage of the living cell tried to explain the differences in members of the three ‘primary kingdoms’. In addition to the numerous reclassifications of species, taxa like Firmicutes (Gibbons and Murray, 1978) and Proteobacteria (Stackebrandt et al., 1988) were introduced; the origins of mycoplasmas (Woese et al., 1984) and Cyanobacteria (Woese et al., 1985) were linked to the root of Firmicutes (Gram-positive bacteria); the significance of phenotypic and ecological forces in shaping bacterial taxa (Zavarzin et al., 1991) was explained; and chemotaxonomic traits (Goodfellow et al., 1988) were evaluated in an evolutionary context. The period of ‘oligonucleotide sequencing’ was replaced, first by reverse sequencing of rRNA genes, then shortly afterwards by cycle sequencing of rDNA for taxonomic purposes (Sanger et al., 1975; Mullis et al., 1986).
Since then, sequencing of rRNA genes has become a universal approach to placing a new isolate into a phylogenetic framework, making taxonomy an objective pursuit and less subject to personal judgements about relationships.
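The cataloguing step described in this section can be illustrated with a minimal sketch. It relies on the well-known specificity of RNase T1, which cleaves single-stranded RNA after guanosine residues, and keeps only fragments of at least 6 nt, the lower bound quoted above. The function names and the toy sequences are hypothetical and serve only as illustration; the sketch does not reproduce the two-dimensional separation or the reassembly of fragments from secondary digests.

```python
from typing import List, Set


def t1_digest(rna: str) -> List[str]:
    """Split an RNA string after every G, mimicking RNase T1 specificity."""
    rna = rna.upper()
    fragments: List[str] = []
    start = 0
    for i, base in enumerate(rna):
        if base == "G":
            fragments.append(rna[start:i + 1])
            start = i + 1
    if start < len(rna):
        # Trailing 3' fragment that does not end in G
        fragments.append(rna[start:])
    return fragments


def catalogue(rna: str, min_len: int = 6) -> Set[str]:
    """Return the set of T1 fragments of at least min_len nucleotides.

    min_len=6 follows the fragment length range quoted in the text;
    treating the catalogue as a plain set is an assumption of this sketch.
    """
    return {frag for frag in t1_digest(rna) if len(frag) >= min_len}


# Toy sequences (hypothetical, not real 16S rRNA)
seq_a = "AUCGGCAUUAGACCUUACGUUCAAUG"
seq_b = "AUCGGCAUUAGACUUUACGUUCAAUG"

shared = catalogue(seq_a) & catalogue(seq_b)
```

Comparing two organisms then reduces to comparing their catalogues, for example by inspecting `shared`; any similarity coefficient built on top of that would be a further assumption not taken from the text.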

Reconciliation of bacterial taxonomy

The influence of 16S rRNA and 16S rDNA gene sequencing in rearranging the place of taxa as

outlined in Bergey’s Manual of Determinative Bacteriology raised the concern of some taxonomists about the power of a single methodological approach that may result in nomenclatural disruption and the abolishment of phenotypic and chemotaxonomic traits. As a consequence, an ad hoc committee, including experts from a wide range of taxonomic fields, met to make recommendations about the future of bacterial systematics (Wayne et al., 1987). In summary, the committee acknowledged, among other things: (i) that a single formal classification system appears to be adequate; (ii) that the genome sequence would be the reference standard for phylogeny and that phylogeny should determine taxonomy; (iii) the need to search for molecules other than rrn genes to verify the findings based on a ribosomal RNA species; (iv) that there should be no designation of higher ranks without chemotaxonomic and sequence data support; and (v) that DNA-DNA hybridization is the superior approach for species delineation. The latter notion was included because the sequences of rrn genes are too conserved to discriminate between closely related species. These recommendations were expanded by subsequent working groups, concentrating on specific taxa (Murray et al., 1990) or on the species level (Stackebrandt et al., 2002), expressing the opinion that the ‘polyphasic approach’ is superior to an emphasis on a few methods and data, and that recent advances in molecular techniques (e.g. MLSA, riboprinting) should be regularly evaluated for their application in shaping taxonomic ranks (see Chapters 10, 11, 13).

Changing gear: Bergey’s Manual of Systematic Bacteriology

In light of the ongoing creation of names for phylogenetically defined higher taxa, in 1984 Bergey’s Trust changed the name of the Manual from ‘Determinative Bacteriology’ to ‘Systematic Bacteriology’ to reflect the advancement in phylogenetic considerations. The 1st edition (Holt, 1984) divided the kingdom Prokaryotae into four phylogenetically defined divisions, but the lower ranks were still based on phenotypic descriptions. In contrast, the 2nd edition (Brenner et al., 2005, for Volume 2, Proteobacteria) made a


courageous step forward and included a complete hierarchic system of taxa described until then, and consequently continued the naming of higher taxa up to the Phylum level. For species including V. cholerae (Phylum Proteobacteria, class Gammaproteobacteria, order Vibrionales, family Vibrionaceae), the ecological, phylogenetic and molecular characterization data (e.g. plasmids, bacteriophages, bacteriocins, antigenic structure, gene-specific probes, Restriction Fragment Length Polymorphism) were referenced, and new identification data were included to specifically differentiate the ranks of species and genus.

Some Considerations on Taxonomy and the Misunderstandings

As Cowan (1965) stated, taxonomy is a discipline that is dedicated to three major objectives: (i) classification; that is, the establishment of a structured system of categories of living entities by means of their natural relationships; (ii) nomenclature; that is, the procedure of naming taxa in accordance with scientifically established rules; and (iii) identification, the major goal of taxonomy that focuses on the recognition of members of classified taxa by means of a series of diagnostic features. The system created needs to be universal (applying to all members that the system should embrace), operative (it must work and not be so complex that it is unworkable) and predictive (ideally, one should be able to predict genetic, phenotypic and even ecologic traits when hearing or reading a name) (Rosselló-Móra, 2012). Thus, taxonomy is about being accurate but also pragmatic, as all disciplines (in our case, those of bacteriology) must be able to make use of it with no major complications. And this is one of the major problems that taxonomists are disputing among themselves, as well as with the scientific community: the disjunction between being precise and operational, between constructing a system for everyone and showing fine-tuned natural relations. However, the major problem of prokaryotic taxonomy is that only nomenclature is official, while there is no official classification, nor classification requirements. The Bacteriological Code, also known as the International Code of Nomenclature of Prokaryotes


(ICNP; Parker et al., 2019), was first created in 1958 and reviewed in 1990, 1992 and 2006 (Lapage et al., 1992; Parker et al., 2019), and is the official document showing how names must be constructed and prioritized once a species is published. The new taxonomic descriptions must be first published in an international, peer-reviewed journal. Then, upon publication, the protologue (or formal description in which the etymology, diagnostic features and type material are indicated) is evaluated by the list editors of the International Journal of Systematic and Evolutionary Microbiology (IJSEM) who validate the name if formulated in accordance with the Bacteriological Code. Only the manuscripts published in IJSEM are automatically validated and listed in its notification lists. All other classifications in journals other than IJSEM must be evaluated by the responsible IJSEM list editors who generate the validation lists. Unfortunately, and despite the editors’ reminder to submit a request for validation to IJSEM, many names remain invalid as a request has never been submitted (Oren et al., 2018). Both notification and validation lists become the ‘official’ lists of accepted names, and ultimately are what most scientists believe is ‘the official taxonomy’ (see Chapter 3). Theoretically, the classification of a taxon is just a matter of the scientific opinion of a taxonomist, without restricting the criteria to circumscribe taxa and giving all freedom. Ultimately, this is a judgement of an expert, and as such is referred to in the Bacteriological Code (Parker et al., 2019). However, this freedom is only partially true, because to catalogue a name in the validation or notification lists, the taxonomic description must be first published in an international, peer-reviewed journal.
Thus, the first restrictions to the scientific freedom originate from the consensus (or opinion) of a specific subcommittee of scientists (if any) who outline the minimal standards for the classification of the microorganisms they cover as experts. They also come from the referees and editors who request specific parameters and standards (beyond the naming rules) for taxa descriptions (Konstantinidis et al., 2018). The name will ultimately be accepted if the description meets the requirements established in the Bacteriological Code, in which the deposition of a pure culture in two different strain collections is an indispensable requisite. No other form of type or reference material is


allowed. Altogether, the highly restrictive way bacteriologists proceed with the classification of bacterial taxa and valid publication of the new names is excellent in guaranteeing taxonomic accuracy, but ultimately one must admit that there is no freedom to decide what a taxon is (see Chapter 3).

The Paramount Relevance of the Genomic Data

As mentioned in the historical perspective, the early age of microbial taxonomy was based on phenotypic distinction of taxa, and the hierarchy was mostly reflecting physiological and morphological similarities and divergences. However, major breakthroughs occurred with the discovery that DNA was the genetic information store of living cellular organisms, and of the intrinsic chemical properties of this molecule. Already in the 1960s the DNA-DNA hybridization experiments were designed to evaluate the genomic similarities between classified taxa (Rosselló-Móra et al., 2011). Soon, it was observed that often phenotypically coherent groups of strains also formed genomically coherent groups based on either reassociation kinetics or percentage of complementarity. The many laboratory experiments done in the second half of the 20th century showed that a difference smaller than 5°C in the melting temperature between hybrids and homologous DNA (ΔTm) or higher than 70% in DNA-DNA similarity could be the inclusive border of members of the same species. Actually, these two parameters have been the gold standard for circumscribing species (despite the fact that some taxonomists do not accept them), as for several decades they provided the only approach that would allow numerically measured boundaries. Almost all new classifications using two or more strains were evaluated with one of the different DNA-DNA hybridization techniques (Rosselló-Móra et al., 2011). Only in cases where a single strain was serving as type material, and its 16S rRNA gene identity with the closest relative type strain was below 98.7% (Stackebrandt and Ebers, 2006), was the new classification without genomic comparisons tolerated. The influence of the species circumscription by means of genomic data has been of paramount

relevance since the first measurements, and has gained significance as the feasibility of sequencing genomes at relative low costs allowed in silico comparisons. As indicated below, the wet-lab studies using error-prone DNA-DNA hybridization techniques have been substituted by a series of more precise in silico measured overall genome-related indices (OGRI; Chun et al., 2014). The provision of draft or complete genome sequences of the type strain has become compulsory for any new classification, demanded by journals publishing taxonomic papers, such as Systematic and Applied Microbiology (compulsory since 2014), Archives of Microbiology and Current Microbiology (since 2017) or IJSEM (since 2019). Although not well accepted, modern bacterial classification has a strong genetic and genomic basis. Phenotyping has been losing relevance, and most of the tests used are either unnecessary and/or uninformative in many taxonomic papers (Sutcliffe, 2015). Like it or not, the current classification of higher taxa is only driven by molecular phylogenies based on 16S rRNA genes or, more recently, by analyses of essential genes (Parks et al., 2018), with the fine-tuned species and genus circumscriptions using OGRI. Recent developments in high-throughput sequencing platforms, and bioinformatics allowing the determination of almost complete genomes retrieved from culture-independent methods, challenge the future of microbial taxonomy, as discussed below.
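The numerical boundaries quoted in this section can be written down as a small decision rule. The following is only an illustrative sketch: the three thresholds (70% DNA-DNA similarity, 5 °C ΔTm, 98.7% 16S rRNA gene identity) come from the text, whereas the function and parameter names are hypothetical and do not correspond to any published tool.

```python
from typing import Optional

# Thresholds quoted in the text; the names below are illustrative assumptions.
DDH_SPECIES_MIN = 70.0       # % DNA-DNA similarity
DELTA_TM_SPECIES_MAX = 5.0   # degrees C difference in melting temperature
SSU_EXEMPTION_MAX = 98.7     # % 16S rRNA gene identity to closest type strain


def same_species_by_hybridization(ddh_percent: Optional[float] = None,
                                  delta_tm_c: Optional[float] = None) -> bool:
    """Return True if either measurement falls inside the species border."""
    if ddh_percent is None and delta_tm_c is None:
        raise ValueError("provide ddh_percent, delta_tm_c, or both")
    by_ddh = ddh_percent is not None and ddh_percent > DDH_SPECIES_MIN
    by_tm = delta_tm_c is not None and delta_tm_c < DELTA_TM_SPECIES_MAX
    return by_ddh or by_tm


def genomic_comparison_required(n_strains: int,
                                ssu_identity_percent: float) -> bool:
    """A single-strain proposal below the 16S threshold was tolerated
    without genomic comparison; every other case needed one."""
    exempt = n_strains == 1 and ssu_identity_percent < SSU_EXEMPTION_MAX
    return not exempt


if __name__ == "__main__":
    print(same_species_by_hybridization(ddh_percent=82.0))
    print(genomic_comparison_required(n_strains=1, ssu_identity_percent=97.0))
```

The sketch treats the two hybridization parameters as alternatives, as the text does; in practice taxonomists weighed them together with phenotypic data.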

Recent Innovations

The classification of prokaryotes has evolved in parallel with technological developments (see above and Rosselló-Móra, 2012), and prokaryotic taxonomy has always pioneered the use and development of molecular approaches to discern among distinct but closely related organisms. GC mol% content, followed by DNA-DNA hybridization techniques and molecular phylogenies using 16S rRNA gene analysis, have been the major advances in genotyping that allowed the current, highly stable classification. Early in this century, multilocus sequence analysis (MLSA) was foreseen as an alternative to DNA-DNA hybridization (Stackebrandt et al., 2002), but it has mostly been used to complement molecular phylogenies.

Bridging 200 Years of Bacterial Classification

However, the major breakthrough in the taxonomy of Bacteria and Archaea occurred with the development of high-throughput sequencing technologies. The capability to fully sequence genomes at a relatively low cost, together with the development of a myriad of bioinformatic tools to analyse and compare them, has revealed a completely new dimension in classification. Again, genotyping has been the major source of innovative and precise measurements that can numerically circumscribe taxa. As indicated above, high-throughput sequencing technologies have allowed in silico genome-to-genome comparisons, and several OGRI (Chun and Rainey, 2014) have been developed. The simplest and therefore most straightforward parameters are the average nucleotide identity (ANI; Konstantinidis and Tiedje, 2005a) and the average amino acid identity (AAI; Konstantinidis and Tiedje, 2005b): the former calculates the mean identity between orthologous pairs of DNA fragments, and the latter the mean similarity among orthologous DNA-translated proteins. All other OGRI are slightly more complex, as they use additional genome parameters to generate a value, but ultimately all of them render similarly valuable information. Since its development (Konstantinidis and Tiedje, 2005a), ANI has been used extensively to circumscribe species. At the dawn of in silico analyses using the extant genomic information, the commonly used range of similarity around the 70% DNA-DNA hybridization boundary could be equivalent to an ANI range of 94%–96% (Richter and Rosselló-Móra, 2009). However, the existence of discrete biological units (which we call ‘species’) has often been questioned, as the boundaries were considered pragmatic, artificial and of no biological significance (Rosselló-Móra, 2012).
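The definition of ANI as a mean identity over orthologous fragment pairs can be illustrated with a toy calculation. This is a minimal sketch under strong assumptions: the fragment pairs are taken as already identified and aligned to equal length, whereas real ANI tools first search for the matching fragments themselves. All function names are hypothetical; only the 94%–96% range comes from the text.

```python
from statistics import mean
from typing import Iterable, Tuple

ANI_ZONE = (94.0, 96.0)  # % ANI range quoted in the text


def fragment_identity(a: str, b: str) -> float:
    """Percent identical positions between two pre-aligned fragments."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def average_nucleotide_identity(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean identity over all orthologous fragment pairs."""
    return mean(fragment_identity(a, b) for a, b in pairs)


def interpret_ani(ani: float) -> str:
    """Place an ANI value relative to the 94%-96% boundary range."""
    low, high = ANI_ZONE
    if ani > high:
        return "same species"
    if ani < low:
        return "different species"
    return "boundary zone"


if __name__ == "__main__":
    # Invented toy fragments, for illustration only.
    toy_pairs = [("ACGTACGTAC", "ACGTACGTAC"), ("ACGTACGTAC", "ACGTACGTAA")]
    ani = average_nucleotide_identity(toy_pairs)
    print(ani, interpret_ani(ani))
```

AAI follows the same averaging logic, applied to translated protein pairs instead of DNA fragments.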
With the increasing number of genomes in the databases, it became possible to observe a bimodal ANI curve among the species of a genus, reinforcing the existence of a fuzzy zone between 92% and 96% ANI (Rosselló-Móra and Amann, 2015). Stronger proof that ‘species’ exist, however, came from pairwise comparisons among close to 100,000 genomes, which showed that between a species and its closest relative species there was a gap in the ANI values (ANI-gap) showing a clear jump (Jain et al., 2018). The demonstration, with a rational biological foundation, of the existence of what bacteriologists call ‘species’ is perhaps the most important breakthrough in the taxonomy of prokaryotes since the beginning of the 19th century, and will most probably have a paramount influence in the very near future. AAI is also a parameter that has been shown to be very informative in the classification of novel taxa. The much more conservative nature of the nucleic acid information once translated into proteins provides resolution at the level of higher taxa, and especially at the genus level (Konstantinidis and Tiedje, 2005b). The boundary of 70% AAI can serve as a very plausible threshold to discern two genera, similar to the previously used 94% threshold of 16S rRNA gene identity (Yarza et al., 2014). The artificial nature of the present hierarchical classification system (Rosselló-Móra, 2012) results in a much fuzzier circumscription of higher taxa than that of species and genus, but there is still a correlation between how taxonomists have classified families, orders and classes, and the AAI and 16S rRNA gene identity thresholds (Konstantinidis and Tiedje, 2005b; Yarza et al., 2014). It is remarkable that the taxonomic rank Phylum has been thoroughly used in molecular phylogenies and molecular ecology, but this category has no standing in nomenclature (Whitman et al., 2018). Yet there is a definite need to implement this category in the code of nomenclature for prokaryotes as it is possibly, after the rank of species, the most popular category used among molecular microbial ecologists. In addition to OGRI, the high-throughput sequencing platforms have made it possible to expand phylogenetic analyses beyond a single gene, such as the 16S rRNA gene, or a small set of essential single-copy genes (MLSA). Access to the almost complete gene content of genomes has allowed genealogical reconstruction using the shared core of genes among a group of organisms (phylogenomics), or even reconstruction of a prokaryote-wide phylogeny using the universally conserved housekeeping genes (Parks et al., 2018).
The impact of the genomic taxonomy approach is enormous, and has even influenced the structure of Bergey’s Manual of Systematics of Archaea and Bacteria, which will implement the taxonomic structure proposed in the Genome Taxonomy Database (GTDB) (www.bergeys.org/). The broad range of effects of genomic taxonomy has resulted in the proposal of some relevant changes in the hitherto accepted classification.
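The ANI-gap argument made earlier in this section (a clear jump in ANI values between a species and its closest relative) can be pictured with a short sketch that looks for the widest empty interval in a set of pairwise ANI values. This is a simplified stand-in, not the method of Jain et al. (2018): the function names are hypothetical and the values in the usage example are invented purely for illustration.

```python
from typing import List, Sequence, Tuple


def widest_gap(values: Sequence[float]) -> Tuple[float, float]:
    """Return (lower, upper) bounds of the widest empty interval
    between consecutive values after sorting."""
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    return max(zip(ordered, ordered[1:]), key=lambda p: p[1] - p[0])


def split_at_gap(values: Sequence[float],
                 gap: Tuple[float, float]) -> Tuple[List[float], List[float]]:
    """Split values into those below the gap and those above it."""
    lower, upper = gap
    below = [v for v in values if v <= lower]
    above = [v for v in values if v >= upper]
    return below, above


if __name__ == "__main__":
    # Invented pairwise ANI values (%), not real measurements.
    toy_ani = [78.2, 80.5, 81.9, 83.0, 95.4, 96.8, 97.5, 98.9, 99.3]
    gap = widest_gap(toy_ani)
    between_species, within_species = split_at_gap(toy_ani, gap)
    print(gap, len(between_species), len(within_species))
```

With real data the comparison set is far larger, and the gap is usually assessed from the full distribution of values rather than from a single widest interval.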


R. Rosselló-Móra and E. Stackebrandt

The influence of the 16S rRNA gene sequence remains predominant in systematics, however: its highly conserved nature and the large database guarantee its dominant role in prokaryotic classification, and it will not be abandoned in the short term. As the sequencing approaches and bioinformatic tools improve, and genome databases such as GTDB (Parks et al., 2018) or the Microbial Genomes Atlas (Rodriguez-R et al., 2018) expand, the relevance of the small subunit ribosomal RNA gene will most probably decrease slowly. Altogether, the influence of genotyping, and especially that of the basic genomic comparisons, has guided the structure of the current classification system (see Chapters 10 and 15) and, somehow, phenotyping has been left aside and relegated to just the minimal standards needed for classification. Often most of the phenotypic data provided in current taxonomic descriptions are of very low relevance and do not really include truly valuable diagnostic traits (Sutcliffe et al., 2012; Sutcliffe, 2015). To overcome this lack of solid phenotypic data, an effort is needed to implement the high-throughput technologies originally developed in chemistry and known as metabolomics (Rosselló-Móra et al., 2008). A series of different mass spectrometry and fine-tuned technologies could be implemented. For example, large molecules (between 1000 and 10,000 Da) could be clearly detected using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS), and small molecules (between 50 and 2000 Da) using high-resolution spectrometry such as Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR MS). The latter has already been used to determine metabolite profiles that distinguish between different strains of the same species (e.g. Antón et al., 2013), along the growth curve of the same strain (Brito-Echeverría et al., 2011), or even to show the biogeographic patterns of members of the same species (Rosselló-Móra et al., 2008).
Such metabolomics have never, however, been used in taxonomy. On the other hand, MALDI-TOF has already gained relevance in systematics (Welker and Moore, 2011), and also in medical diagnostics. The relatively specific whole-cell profiles, mostly originating from ribosomal proteins, show patterns that could be diagnostic for members of the same species or of very close species (see Chapters 7 and 8). This approach has been

already well used in classifying and/or identifying thousands of strains in a single study (Viver et al., 2015; Alejandre-Colomo et al., 2020). The cost efficiency of this technology allows the large-scale screening of strains and their classification in accordance with their profiles. The cumulative database of profiles will, in the future, allow rapid classification of new isolates, as well as precise identification of those that may be new and unclassified, facilitating the discovery of taxonomic novelty. Moreover, it helps in generating collections of organisms in which single species can be represented by several strains, avoiding the problems of species described only using a single strain (Christensen et al., 2001). A strategy to assess the large-scale cultivation of microorganisms from the human gut resulted in the isolation of thousands of strains on about 200 variations of culture media (later reduced to about 20) and led to the development of the so-called Culturomics approach (Lagier et al., 2016). In accelerating the process of identification, and to provide a shortcut through the lengthy traditional species description, MALDI-TOF MS represents the beginning of phenotypic characterization of the microbiome, followed by rRNA gene sequence analysis of putative novel species. In accord with the idea of phylotaxonomy, the genome sequence of such species is generated and compared to those of the nearest related type strains, based upon 16S rRNA gene sequence similarity. A few basic phenotypic traits, and whole-cell fatty acid composition, complemented the spectrum of data which led to a significant increase in the recording of novel species from the human gut (Lagier et al., 2018). The identification of a huge number of strains from hosts or the environment is not only a source of novelty, but also a good way to expand the collection of isolated organisms in pure culture that can serve as a source for biotechnological applications. 
The fact that not all members of the same species isolated from the same source are genomically identical (e.g. Gevers et al., 2005; Antón et al., 2013) makes large collections of strains of the same taxon very useful material for understanding genomic diversity. Another benefit of high-throughput sequencing in systematics is that it promotes our understanding of how genes show the blueprint of classified taxa (Munoz et al., 2020).


Taxonomy Needs to Change Its Path

The relevance of high-throughput approaches in the study of the diversity of prokaryotes contrasts with the relatively slow pace in the description of new species. At the time of writing, about 14,000 species had been classified and close to 19,000 names had been proposed (in accordance with the List of Prokaryotic Names with Standing in Nomenclature [LPSN] figures; www.bacterio.net/; Parte, 2018). It is commonly acknowledged that the number of species in the environment is orders of magnitude higher, and we have hypothesized a range of between 1 and 10 million (Yarza et al., 2014). The relatively slow pace of bacterial taxonomy is due to the entire, complex procedure of classifying new taxa, which has several bottlenecks that need to be resolved. In the first instance, only names given to pure cultures can be validated. The Bacteriological Code requires that the reference material (in the form of a type strain) be deposited in two international culture collections, to preserve the living genetic material and to make it available to the scientific community (Parker et al., 2019). The deposition of strains in collections alone takes some time, as these have


to certify the identity of the deposit; other collections implement legal constraints on sharing material that may make the valid publication of a name problematic (Oren et al., 2018). The pace of species descriptions (Fig. 1.2) has grown steadily since the 16S rRNA gene sequence allowed the recognition of novelty in the genealogical tree. The arithmetic growth reached a plateau between 2007 and 2014 (with about 600 species descriptions/year), and a significant jump in new descriptions occurred in 2015 (with about 900 species descriptions/year) and in the following years. Basically, the reasons were the increase in the number of publications (owing to the appearance of new journals publishing taxonomic papers), and the increased number of papers, especially from countries that recognized the value of indigenous resources. But the yearly number of papers dealing with species is still limited, and the policies of some journals may also lead them to drastically reduce the number of taxonomic papers (Sutcliffe, 2019). Altogether, this may have a negative impact on the pace of new taxonomic proposals. One of the main reasons for the current fluctuations is the rise of single-strain species descriptions (SSSD; Tamames and Rosselló-Móra,

[Fig. 1.2 plot: ‘Historical perspective of the yearly new description and 16S rRNA gene entries’. X-axis: year (1930 to 2010); left Y-axis: new species; right Y-axis: new sequences.]

Fig. 1.2. Yearly number of new species descriptions and new 16S rRNA gene entries in public repositories. The primary Y-axis (left) in red relates to the number of described taxa (red line), and the secondary Y-axis (right) relates to the number of gene entries in public repositories (blue line) in accordance with the SILVA database (www.arb-silva.de). Owing to the lack of releases in 2018, the numbers of gene entries for 2018 and 2019 are given as the mean in yearly increase between SILVA SSU 132 and 138 (Quast et al., 2013).


2012), often with reduced relevance owing to the very sparse information content of the descriptions (Christensen et al., 2001). This, together with ‘one-species one-paper’ practices (Sutcliffe, 2019), has strongly impacted the citation indices of the journals publishing these papers. The impact has had such an influence that many journals adhered to the Declaration on Research Assessment (DORA; https://sfdora.org/) and avoided the use of altmetrics in their journal advertisements (Parish et al., 2019). It seems that there is a conflict between fostering an increase in new classifications, and the journal policies, owing to the scientific relevance of their publications. Actually, the cataloguing of new taxa may well be disassociated from the standard publication procedures. Journals could centre their publication basis on highly relevant classifications. As Sutcliffe (2019) stated, ‘…it seems unfeasible, given the sheer scale of microbial diversity, that a “one species, one paper” publishing model can be sustained indefinitely, particularly if there is to be a shift towards “high throughput” approaches to the circumscription and valid naming of prokaryotic taxa’. Thus, it is clear that a new path for taxonomic cataloguing is needed. To this purpose, the Digital Protologue Database (DPD; Rosselló-Móra et al., 2017a,b) was created. In the first instance this was to generate a repository of taxonomic descriptions in an orderly manner in a database-based format that could be cumulative and searchable following the current bioinformatics requirements. The protologues were laid out in a way that was reminiscent of the gene and genome entries in public repositories, formatted in fields of searchable information and universal to all new classifications. Each entry was given a unique TaxoNumber. 
But, in the second instance (and as future perspective), it was suggested that the DPD was to be a repository of descriptions, each having been given a digital object identifier (DOI) entry. It was to be citable independently from publishing journals, and ideally run by one of the managing entities of the International Nucleotide Sequence Database Collaboration (INSDC) that would guarantee stability. The DPD was warmly welcomed by other journals publishing taxonomic papers, such as Archives of Microbiology (Stackebrandt and Smith, 2017a), Current Microbiology (Stackebrandt and Smith, 2017b) and New Microbes New Infections (Drancourt and

Fournier, 2018). In its 2-year life it accumulated over 1000 entries and 750 registered users. However, the inability to secure the support of the editorial board of the IJSEM (important because of its dominant role in publishing > 80% of all new names) forced closure of the DPD (Rosselló-Móra and Sutcliffe, 2019a,b). The authors believe that it is only a question of time before the classification of microorganisms is automated (Rosselló-Móra and Whitman, 2019), especially if we want to speed up the process of cataloguing the expected vast number of as yet unclassified taxa. The DPD, in one form or another, will then prevail. We foresee that the global centralization of data, with taxonomic descriptions and their interlinked genomic, genetic and phenotypic data, will be the choice of the future; the one-species one-paper practices will then go extinct. However, the major bottleneck to cataloguing prokaryotic species is cultivation. The observation that the vast majority of prokaryotes are recalcitrant to cultivation (Konstantinidis et al., 2017) is commonplace, but it remains one of the major problems if we want to classify the entire diversity at a relatively fast pace. Traditionally, since the introduction of molecular microbial ecology tools, uncultured microorganisms were detected solely by means of amplified and cloned 16S rRNA gene sequences, and by the application of phylogenetic probes directed to the ribosomal RNA sequences using either northern blots or fluorescence in situ hybridization (Amann et al., 1995). The failure to retrieve accurate genetic and phenotypic data from uncultivated microorganisms made their stable classification and nomenclature impossible. Only some microorganisms with conspicuous characters such as size, inclusions or lifestyle could be identified, and for these the provisional category of Candidatus was created (Murray and Stackebrandt, 1995).
The disadvantage of this provisional category is that it has no standing in the Code and, as a consequence, the names given have no priority (Whitman et al., 2019). The lack of priority means that if someone isolates a representative of a Candidatus taxon as a pure culture, any new name could be published, while the earlier given name would be denied. The lack of a stable framework seems to have discouraged molecular ecologists from ‘formally’ naming the uncultivated prokaryotes they detected and, in about 20 years of existence, fewer than about 500 Candidatus taxa were proposed (Oren, 2017).


But things have changed, and high-throughput sequencing strategies have brought a completely new dimension to microbial systematics (see Chapters 10, 11, 13 and 15). Metagenomic approaches, together with new bioinformatic tools, have allowed the segregation of single-species genomes from complex metagenome pools of sequences. Metagenome-assembled genomes (MAGs) represent the pooled mosaic of the genomes of the coexisting populations or strains of a single species thriving in the same sample (Konstantinidis et al., 2017). On the other hand, single-cell amplified genomes (SAGs) can render genomic information on single strains without a cultivation step (Hedlund et al., 2015). This giant step in microbial molecular ecology has allowed the retrieval of genetic information from uncultured organisms that is at least equivalent in quality to that derived from cultured organisms (Konstantinidis and Rosselló-Móra, 2015). The most important parameters for a taxonomic classification that infers phylogeny and calculates OGRI can readily be determined, and with high accuracy, using these culture-free methods. Essential gene analyses to infer global phylogenies are now feasible, and therefore the new taxonomic framework already includes uncultured MAGs and SAGs (Parks et al., 2018). Similarly, global genome analyses to precisely circumscribe species and genera using ANI and AAI can now also include MAGs and SAGs (Rodriguez-R et al., 2018), with which the ANI-gap has been demonstrated (see above; Jain et al., 2018).
Metabolism can also be inferred from the genome, and although it is clear that the presence of genes may not always correlate with gene expression (Bisgaard et al., 2019), the phenotypic inference can always be proven by means of extant sophisticated techniques using radiolabelled (Rosselló-Móra et al., 2003) or stable isotope-labelled compounds (Musat et al., 2016), as well as metatranscriptomics (Zuñiga et al., 2017) and metaproteomics (Armengaud, 2016). The revolution of high-throughput techniques and their application in systematics has led to the radical proposal that the Bacteriological Code allow DNA sequences to become type material as an alternative to the deposit of living cultures in international collections (Whitman, 2015, 2016). If DNA could become type material, the classification of all (not only fastidious) microorganisms would be facilitated and the


door would be opened to classifying uncultured taxa for which high-quality MAGs or SAGs had been retrieved (Konstantinidis et al., 2017). The most important benefit of cataloguing the as yet uncultured taxa using the same classification standards as those used for cultured taxa is the provision of a stable nomenclature that would avoid anarchic designations and the synonymy that often arises owing to the lack of rules. However, this proposal has not always been welcomed by taxonomists (Oren and Garrity, 2018; Bisgaard et al., 2019; Overmann et al., 2019). There is an uneasiness that nomenclatural chaos will arise from the avalanche of names based on the classification of MAGs and SAGs; there is also great concern that these sequences do not represent extant species genomes, but non-existent chimeras resulting from insufficiently powerful bioinformatic tools. We are at the dawn of the high-throughput bioinformatics era, and in the future the accuracy of the new approaches will improve enormously. But, even today, there is evidence indicating that MAGs share ANI values > 97% with the genomes of isolates: for example, Haloquadratum walsbyi (Viver et al., 2019), Salinibacter ruber (Ramos-Barbero et al., 2019), Escherichia coli (Almeida et al., 2019; Peña-Gonzalez et al., 2019) and Candidatus ‘Macondimonas diazotrophica’ (Karthikeyan et al., 2019). Moreover, given that different populations of the same species with different genomic content coexist in a given sample (Antón et al., 2013), the retrieved MAGs would represent the core genome of the species that thrived when the sample was taken. Indeed, there is an advantage with this approach: it is most likely that the genomic blueprint of what a species is relies on the shared set of genes, so a MAG will represent the species better than the single isolate of the common SSSDs.

Conclusions: Reconciliation or Divorce

Whether DNA is accepted as alternative type material is independent of whether the code of nomenclature will consider MAGs and SAGs as nomenclatural types. Thus, both cases can be treated independently by the International Committee on Systematics of Prokaryotes (ICSP;


www.the-icsp.org). Several proposals have been forwarded to the committee for its evaluation and vote, on the use of DNA as type material (Whitman, 2016) and on the acceptance of Phylum as the highest category (Whitman et al., 2018). At present, both requests can only apply to pure cultures but, to allow the names of MAGs and SAGs to gain priority, a third proposal was raised for them (Whitman et al., 2019). At the time of writing, none of the proposals had been voted on by the ICSP, and so none of the relevant topics had been either accepted or rejected for implementation in the Bacteriological Code. Microbial systematics is now at the tipping point between reconciliation and divorce among classical taxonomists and molecular microbial ecologists. The opinion of the ICSP will definitively clarify the path of prokaryotic taxonomy, and the decision is especially important for the cataloguing of the uncultured organisms. As already proposed (Konstantinidis et al., 2017), either the Bacteriological Code is adapted to the new winds of molecular systematics, thus allowing MAGs and SAGs to be designated with stable names with priority, or an alternative nomenclatural code should be created by microbial molecular ecologists. A parallel code for the uncultured organisms would result in an alternative taxonomy exclusively for DNA-based classifications that would run independently of the decisions of the ICSP. But this divorce would surely have many other negative effects. For this reason, taxonomists and molecular ecologists have again reinforced the need for an urgent solution (Murray et al., 2020). The preferred Plan A would include a common nomenclature for cultured and uncultured taxa, fostering a harmonious classification. The less desirable Plan B would include a new code allowing uncultured taxa to be stably named (the UnCode) that would at least have the effect of halting chaos. Never before has the future of prokaryotic taxonomy been at such a critical point. The uncertain future, and whether Plan A or Plan B will prevail, depends totally on the wisdom of the ICSP. Perhaps by the time this book is published the situation will have been clarified, but the current situation is as full of uncertainty as of excitement, and none of the scenarios can be predicted. In April 2020, a majority of the ICSP members decided to reject the proposals to use DNA as type material. This rejection leaves Plan B as the only path for microbial ecologists. Only time will reveal whether this was the best decision for the future of the taxonomy of prokaryotes.

Acknowledgements

RRM acknowledges the financial support from the Spanish Ministry of Science and Innovation through the projects CLG2015_66686-C3-1-P, PGC2018-096956-B-C41, PRX18/00048 and RTC-2017-6405-1, also supported by the European Regional Development Fund (FEDER).

References

Alejandre-Colomo, C., Harder, J., Fuchs, B.M., Rosselló-Móra, R. and Amann, R. (2020) High-throughput cultivation of heterotrophic bacteria along a spring phytoplankton bloom in the North Sea. Systematic and Applied Microbiology 43: 126066. https://doi.org/10.1016/j.syapm.2020.126066
Almeida, A., Mitchell, A.L., Boland, M., Forster, S.C., Gloor, G.B., Tarkowska, A., Lawley, T.D. and Finn, R.D. (2019) A new genomic blueprint of the human gut microbiota. Nature 568, 499–504. https://doi.org/10.1038/s41586-019-0965-1
Amann, R.I., Ludwig, W. and Schleifer, K.-H. (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiological Reviews 59, 143–169. https://doi.org/10.1128/MMBR.59.1.143-169.1995
Antón, J., Lucio, M., Peña, A., Cifuentes, A., Brito-Echeverría, J., Moritz, F., Tziotis, D., López, C., Urdiain, M., Schmitt-Kopplin, P. and Rosselló-Móra, R. (2013) High metabolomic microdiversity within co-occurring isolates of the extremely halophilic bacterium Salinibacter ruber. PLoS ONE 8(5): e64701. https://doi.org/10.1371/journal.pone.0064701
Armengaud, J. (2016) Next-generation proteomics faces new challenges in environmental biotechnology. Current Opinion in Biotechnology 38, 174–182. https://doi.org/10.1016/j.copbio.2016.02.025


Bergey, D.H. (1925) Bergey’s Manual of Determinative Bacteriology 2nd edn. Williams & Wilkins Co., Baltimore, Maryland.
Bergey, D.H. (1930) Bergey’s Manual of Determinative Bacteriology 3rd edn. Williams & Wilkins Co., Baltimore, Maryland.
Bergey, D.H. (1934) Bergey’s Manual of Determinative Bacteriology 4th edn. Williams & Wilkins Co., Baltimore, Maryland. https://doi.org/10.1097/00000441-193408000-00030
Bergey, D.H., Harrison, F.C., Breed, R.S., Hammer, B.W. and Huntoon, F.M. (1923) Bergey’s Manual of Determinative Bacteriology. Williams & Wilkins Co., Baltimore, Maryland.
Bergey, D.H., Breed, R.S., Murray, E.G.D. and Hitchens, A.P. (1939) Bergey’s Manual of Determinative Bacteriology 5th edn. Williams & Wilkins Co., Baltimore, Maryland.
Bisgaard, M., Christensen, H., Clermont, D., Dijkshoorn, L., Janda, J.M., Moore, E.R.B., Nemec, A., Norskov-Lauritsen, N., Overmann, J. and Reubsaet, F.A.G. (2019) The use of genomic DNA sequences as type material for valid publication of bacterial species names will have severe implications for clinical microbiology and related disciplines. Diagnostic Microbiology and Infectious Disease 95, 102–103. https://doi.org/10.1016/j.diagmicrobio.2019.03.007
Breed, R.S., Murray, E.G.D. and Hitchens, A.P. (1948) Bergey’s Manual of Determinative Bacteriology 6th edn. Williams & Wilkins Co., Baltimore, Maryland.
Breed, R.S., Murray, E.G.D. and Smith, N.R. (1957) Bergey’s Manual of Determinative Bacteriology 7th edn. Williams & Wilkins Co., Baltimore, Maryland.
Brenner, D.J., Krieg, N.R., Staley, J.T. and Garrity, G.M. (eds) (2005) Proteobacteria. Bergey’s Manual of Systematic Bacteriology 2nd edn, vol. 2, parts A, B and C. Springer-Verlag, New York, NY. https://doi.org/10.1007/0-387-28021-9
Brito-Echeverría, J., Lucio, M., López-López, A., Antón, J., Schmitt-Kopplin, P. and Rosselló-Móra, R. (2011) Response to adverse conditions in two strains of the extremely halophilic species Salinibacter ruber. Extremophiles 15, 379–389. https://doi.org/10.1007/s00792-011-0366-3
Buchanan, R.E. and Gibbons, N.E. (1974) Bergey’s Manual of Determinative Bacteriology 8th edn. Williams & Wilkins Co., Baltimore, Maryland.
Christensen, H., Bisgaard, M., Frederiksen, W., Mutters, R., Kuhnert, P. and Olsen, J.E. (2001) Taxonomic note: is characterization of a single isolate sufficient for valid publication of a new genus or species? Proposal on reformulation of recommendation 30b of the Bacteriological Code. International Journal of Systematic and Evolutionary Microbiology 51, 2221–2225. https://doi.org/10.1099/00207713-51-6-2221
Chun, J. and Rainey, F.A. (2014) Integrating genomics into the taxonomy and systematics of Bacteria and Archaea. International Journal of Systematic and Evolutionary Microbiology 64, 316–324. https://doi.org/10.1099/ijs.0.054171-0
Cohn, F. (1872) Untersuchungen über Bacterien. Beiträge zur Biologie der Pflanzen 1(2), 127–224.
Colwell, R.R. (1970) Polyphasic taxonomy of the genus Vibrio: Numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. Journal of Bacteriology 104, 410–433. https://doi.org/10.1128/JB.104.1.410-433.1970
Copeland, H. (1938) The kingdoms of organisms. Quarterly Review of Biology 13, 383–420. https://doi.org/10.1086/394568
Cowan, S.T. (1965) Principles and practice of bacterial taxonomy - a forward look. Journal of General Microbiology 39, 148–159. https://doi.org/10.1099/00221287-39-1-143
De Kruif, P. (1926) Microbe Hunters. 1996 Reprint. Houghton Mifflin Harcourt, USA. New https://doi.org/10.2307/3221690
Drancourt, M. and Fournier, P.E. (2018) New species announcement 2.1. New Microbes New Infections 25, 48. https://doi.org/10.1016/j.nmni.2018.06.009
Ehrenberg, C.G. (1832) Beiträge zur Kenntnis der Organization der Infusorien und ihrer geographischen Verbreitung besonders in Sibirien. Abhandlungen der Königlichen Akademie der Wissenschaften zu Berlin 1830, 1–88. https://doi.org/10.5962/bhl.title.143632
Ehrenberg, C.G. (1835) Dritter Beitrag zur Erkenntniss grosser Organisation in der Richtung des kleinsten Raumes. Abhandlungen der Königlichen Akademie der Wissenschaften zu Berlin aus den Jahren 1833-1835 143–336.
Ehrenberg, C.G. (1838) Die Infusionsthierchen als vollkommene Organismen. Ein Blick in das tiefere organische Leben der Natur. Voss, Leipzig 1838. https://doi.org/10.5962/bhl.title.97605
Gevers, D., Cohan, F.M., Lawrence, J.G., Spratt, B.G., Coenye, T., Feil, E.J., Stackebrandt, E., Van de Peer, Y., Vandamme, P., Thompson, F.L. and Swings, J. (2005) Opinion: Re-evaluating prokaryotic species. Nature Reviews Microbiology 3, 733–739. https://doi.org/10.1038/nrmicro1236

16

R. Rosselló-Móra and E. Stackebrandt

Gibbons, N.E. and Murray, R.G.E. (1978) Proposals concerning the higher taxa of bacteria. International Journal of Systematic Bacteriology 28, 1–6. https://doi.org/10.1099/00207713-28-1-1 Goodfellow, M., Stackebrandt, E. and Kroppenstedt, R.M. (1988) Chemotaxonomy and actinomycete systematics In: Okami. Y., Beppu, T. and Ogawara, H. (Eds) Biology of Actinomycetes. Japan Scientific Societies Press, Tokyo, Japan, pp. 233–238. Goodsir, J. (1842) History of a case in which a fluid periodically ejected from the stomach contained vegetable organisms of an undescribed form. Edinburgh Medical Surgery Journal 57, 430–443. Heather, J.M. and Chain, B. (2016) The sequence of sequencers: The history of sequencing DNA. Genomics 1-8. https://doi.org/10.1016/j.ygeno.2015.11.003 Hedlund, B.P., Dodsworth, J.A., Staley, J.T. (2015) The changing landscape of microbial biodiversity exploration and its implications for systematics. Systematic and Applied Microbiology 38, 231–236. https://doi.org/10.1016/j.syapm.2015.03.003 Holt, J.G.B (1994) Bergey’s Manual of Systematic Bacteriology 1st edn. Williams & Wilkins Co., Baltimore, Maryland. Jain, C., Rodriguez-R, L.M., Phyllippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018) High-throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9, 5114. https://doi.org/10.1038/s41467-018-07641-9 Karthikeyan, S., Rodriguez-R, J.M., Heritier-Robbins, P., Kim, M., Overholt, W.A., Gaby, J.C., Hatt, J.K., Spain, J., Rosselló-Móra, R., Huettel, M., Kostka, J. and Konstantinidis, K. (2019) “Candidatus Macondimonas diazotrophica” a novel gammaproteobacterial genus dominating crude-oil-contaminated coastal sediments. The ISME Journal 13, 2129–2134. https://doi.org/10.1038/s41396-019-0400-5 Konstantinidis, K. and Rosselló-Móra, R. (2015) Classifying the uncultivated microbial majority: a place for metagenomic data in the Candidatus proposal. Systematic and Applied Microbiology 38, 223–230. 
https://doi.org/10.1016/j.syapm.2015.01.001 Konstantinidis, K. and Tiedje, J.M. (2005a) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences USA 102, 2567–2592. https://doi. org/10.1073/pnas.0409727102 Konstantinidis, K.T. and Tiedje, J.M. (2005b) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187, 6258–6264. https://doi.org/10.1128/JB.187.18.6258-6264.2005 Konstantinidis, K., Rosselló-Móra, R. and Amann, R. (2017) Uncultivated microbes in need of their own taxonomy. The ISME Journal 11, 2399–2406. https://doi.org/10.1038/ismej.2017.113 Konstantinidis, K., Rosselló-Móra, R. and Amann, R. (2018) Reply to the commentary “Uncultivated microbes-in need of their own nomenclature?”. The ISME Journal 12, 653–654. https://doi.org/10.1038/ s41396-017-0011-y Lagier, J.C., Khelaifia, S. and Tidjani-Alou, M. (2016) Culture of previously uncultured members of the human gut microbiota by culturomics. Nature Microbiology 1, 16203. https://doi.org/10.1038/nmicrobiol.2016.203 Lagier, J.C., Dubourg, G., Million, M., Cadoret, F., Bilen, M., Fenollar, F., Levasseur, A., Rolain, J.M., Fournier, P.E. and Raoult, D. (2018) Culturing the human microbiota and culturomics. Nature Reviews Microbiology 16, 540–550. https://doi.org/10.1038/s41579-018-0041-0 Lapage, S.P., Sneath, P.H.A., Lessel, E.F., Skerman, V.B.D., Seeliger, H.P.R. and Clark, W.A. (1992) International Code of Nomenclature of Bacteria. American Society for Microbiology, Washington, D.C. Migula, W. (1900) System der Bakterien. Jena Gustav Fischer Verlag. Müller, O.F. (1786) Animalcula infusoria; fluvia tilia et marina. Hauniae: Typis Nicolai Mölleri (in Latin). Mullis, K.F., Faloona, F., Scharf, S., Saiki, R., Horn, G. and Erlich, H. (1986) Specific enzymatic amplification of DNA in vitro: The polymerase chain reaction. Cold Spring Harbor Symposium in Quantitative Biology 51, 263–273. 
https://doi.org/10.1101/SQB.1986.051.01.032 Munoz, R., Teeling, H., Amann, R. and Rosselló-Móra, R. (2020) Ancestry and adaptive radiation of Bacteroidetes as assessed by comparative genomics. Systematic and Applied Microbiology 43, 126065. https://doi.org/10.1016/j.syapm.2020.126065 Murray, A.E., Freudenstein, J., Gribaldo, S., Hatzenpichler, R., Hedlund, B.P., Hugenholtz, P., Kämpfer, P., Konstantinidis, K.T., Lane, C., Papke, R.T., Parks, D.H., Reysenbach, A-L, Rosselló-Móra, R., Stott, M., Sutcliffe, I.C., Thrash, J.C., Venter, S.N., Whitman, W.B. et al. (2020) Roadmap for naming uncultivated Archaea and Bacteria. Nature Microbiology 5, 987–994. https://doi.org/10.1038/s41564-020-0733-x. Murray, R.G. and Stackebrandt, E. (1995) Taxonomic note: implementation of the provisional status Candidatus for incompletely described procaryotes. International Journal of Systematic Bacteriology 45, 186–187. https://doi.org/10.1099/00207713-45-1-186

Bridging 200 Years of Bacterial Classification

17

Murray, R.G.E., Brenner, D.J., Colwell, R.R., DeVos, P., Goodfellow, M., Grimont, P.A.D., Pfennig, N., Stackebrandt, E. and Zavarzin, G.A. (1990). Report of the ad hoc Committee on Approaches to Taxonomy within the Proteobacteria. International Journal of Systematic Bacteriology 40, 213–215. https://doi.org/10.1099/00207713-40-2-213 Musat, N., Musat, F., Weber, P.K. and Pett-Ridge, J. (2016) Tracking microbial interactions with NanoSIMS. Current Opinion in Biotechnology 41, 114–121. https://doi.org/10.1016/j.copbio.2016.06.007 Oren, A. (2017) A plea for linguistic accuracy - also for Candidatus taxa. International Journal of Systematic and Evolutionary Microbiology 67, 1085–1094. https://doi.org/10.1099/ijsem.0.001715 Oren, A. and Garrity, G. (2018) Commentary: Uncultivated microbes - in the need of their own nomenclature? The ISME Journal 12, 309–311. https://doi.org/10.1038/ismej.2017.188 Oren, A., Garrity, G.M. and Parte, A.C. (2018) Why are so many effectively published names of prokaryotic taxa never validated? International Journal of Systematic and Evolutionary Microbiology 68, 2125– 2129. https://doi.org/10.1099/ijsem.0.002851 Overmann, J., Huang, S., Nübel, U., Hahnke, R.L. and Tindall, B.J. (2019) Relevance of phenotypic information for the taxonomy of not-yet-cultured microorganisms. Systematic and Applied Microbiology 42, 22–29. https://doi.org/10.1016/j.syapm.2018.08.009 Pacini, F. (1854) Osservazione microscopiche e deduzioni patologiche sul cholera asiatico. Gazette Medicale de Italiana Toscano Firenze 6, 405–412. Parish, T., Harris, M., Fry, N., Mathee, K., Trujillo, M.E., Bentley, S. and Thomson, N. (2019) DORA Editorial. International Journal of Systematic and Evolutionary Microbiology. 69, 1–2. https://doi. org/10.1099/ijsem.0.003172 Parker, C.T., Tindall, B.J. and Garrity, G.M. (2019) International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 69, S1–S111. https://doi. 
org/10.1099/ijsem.0.000778 Parks, D., Chuvochina, M., Waite, D.V., Rinke, C., Skarshewsky, A., Chaumeil, P.A. and Hugenholtz, P. (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36, 996–1004. https://doi.org/10.1038/nbt.4229 Parte, A.C. (2018) LPSN - list of prokaryotic names with standing in nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology. 68, 1825–1829. https://doi.org/ 10.1099/ijsem.0.002786 Peña-Gonzalez, A., Soto-giron, M.J., Smith, S., Sistrunk, J., Montero, L., Paez, M., Ortega, E., Hatt, J.K., Cevallos, W., Trueba, G., Levy, K. and Konstantinidis, K. (2019) Metagenomic signatures of gut infections caused by different Escherichia coli pathotypes. Applied and Environmental Microbiology 85: e01820-19. https://doi.org/10.1128/AEM.01820-19 Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J. and Glöckner, F.O. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41, D590–D596. https://doi.org/10.1093/nar/gks1219 Ramos-Barbero, M.D., Martin-Cuadrado, A.-B., Viver,T., Santos, F., Martinez-Garcia, M. and Anton, J. (2019) Recovering microbial genomes from metagenomes in hypersaline environments. The good, the bad and the ugly. Systematic and Applied Microbiology 42, 30–40. https://doi.org/10.1016/j.syapm.2018.11.001 Richter, M. and Rosselló-Móra, R. (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences USA 106, 19126–19131. https://doi. org/10.1073/pnas.0906412106 Rodriguez-R, L.M., Gunturu, S., Harvey, W.T., Rosselló-Móra, R., Tiedje, J.M., Cole, J.R. and Konstantinidis, K.T. (2018) The microbial genomes atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level. 
Nucleic Acids Research 46(W1), W282–W288. https://doi.org/10.1093/nar/gky467 Rosselló-Móra, R. (2012) Towards a taxonomy of Bacteria and Archaea based on interactive and cumulative data repositories. Environmental Microbiology 14, 318–334. https://doi.org/10.1111/ j.1462-2920.2011.02599.x Rosselló-Móra, R. and Amann, R. (2015) Past and future species definitions for Bacteria and Archaea. Systematic and Applied Microbiology 38, 209–216. https://doi.org/10.1016/j.syapm.2015.02.001 Rosselló-Móra, R. and Sutcliffe, I. (2019a) Reflections on the introduction of the Digital Protologue Database - a partial success? Systematic and Applied Microbiology 42, 1–2 https://doi.org/10.1016/j.syapm.2018.12.002 Rosselló-Móra, R. and Sutcliffe, I. (2019b) Reflections on the introduction of the Digital Protologue Database - a partial success? Antonie van Leeuwenhoek 112, 141–143. https://doi.org/10.1007/ s10482-018-01221-z

18

R. Rosselló-Móra and E. Stackebrandt

Rosselló-Móra, R. and Whitman, W.B. (2019) Dialogue on the nomenclature and classification of prokaryotes. Systematic and Applied Microbiology 42, 5–14. https://doi.org/10.1016/j.syapm.2018.07.002 Rosselló-Mora, R., Lee, N., Antón, J. and Wagner, M. (2003) Substrate uptake in extremely halophilic microbial communities revealed by microautoradiography and fluorescence in situ hybridisation. Extremophiles 5, 409–413. https://doi.org/10.1007/s00792-003-0336-5 Rosselló-Móra, R., Lucio, M., Peña, A., Brito-Echeverría, J., López-López, A., Valens-Vadell, M., Frommberger, M., Antón, J. and Schmitt-Kopplin, P. (2008) Metabolic evidence for biogeographic isolation of the extremophilic bacterium Salinibacter ruber. The ISME Journal 2, 242–253. https://doi. org/10.1038/ismej.2007.93 Rosselló-Móra, R., Urdiain, M. and López-López, A. (2011) DNA-DNA hybridization. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Elsevier Ltd. pp. 325–347. https://doi.org/10.1016/ B978-0-12-387730-7.00015-2 Rosselló-Móra, R., Trujillo, M. and Sutcliffe, I.C. (2017a) Introducing a digital protologue: a timely move towards a database-driven systematics of archaea and bacteria. Systematic and Applied Microbiology 40, 121–122. https://doi.org/10.1016/j.syapm.2017.02.001 Rosselló-Móra, R., Trujillo, M. and Sutcliffe, I.C. (2017b) Introducing a digital protologue: a timely move towards a database-driven systematics of archaea and bacteria. Antonie van Leewenhoek 110, 455–456. https://doi.org/10.1007/s10482-017-0841-7 Sanger, F. and Coulson, A.R. (1975) A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase. Journal of Molecular Biology 93, 441–448. https://doi.org/10.1016/00222836(75)90213-2 Sapp, J. (2005) The Prokaryote-Eukaryote Dichotomy: Meanings and Mythology. Microbiology and Molecular Biology Reviews 69, 292–305. https://doi.org/10.1128/MMBR.69.2.292-305.2005 Sapp, J. (2009) The New Foundation of Evolution: On the Tree of Life. 
Oxford University Press, New York. Sogin, S.J., Sogin, M.L. and Woese, C.R. (1971) Phylogenetic measurement in procaryotes by primary structural characterization. Journal of Molecular Evolution 1, 173–184. https://doi.org/10.1007/ BF01659163 Sogin, S.J., Sogin, M.L. and Woese, C.R. (1972) Phylogenetic measurement in procaryotes by primary structural characterization. Journal of Molecular Evolution 1, 173–184. Sokal, R.R. and Sneath, P.H.A. (1963) Principles of Numerical Taxonomy. W.H. Freeman, San Francisco, CA. Stackebrandt, E. and Ebers, J. (2006) Taxonomic parameter revisited: tarnished gold standards. Microbiology Today 33, 152–155. Stackebrandt, E. and Smith D. (2017a) Expanding the ‘Digital Protologue’ database (DPD) to ‘Archives of Microbiology’: an offer to scientists and science. Archives of Microbiology 199, 519–520. https://doi. org/10.1007/s00203-017-1369-y Stackebrandt, E. and Smith, D. (2017b) Expanding the ‘Digital Protologue’ database (DPD) to ‘Current Microbiology’: an offer to scientists and science. Current Microbiolology 74, 1003–1004. https://doi.org/10.1007/ s00284-017-1290-2 Stackebrandt, E, Murray, R.G.E. and Trüper, H.G. (1988) Proteobacteria classis nov., a name for the phylogenetic taxon that includes the purple bacteria and their relatives. International Journal of Systematic Bacteriology 38, 321–325. https://doi.org/10.1099/00207713-38-3-321 Stackebrandt, E., Frederiksen, W., Garrity, G.M., Grimont, P.A.D., Kämpfer, P., Maiden, M.C.J., Nesme, X., Rosselló-Móra, R., Swings, J., Trüper, H.G., Vauterin, L., Ward, A.C. and Whitman, W.B. (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. International Journal of Systematic and Evolutionary Microbiology 52, 1043–1052. https://doi.org/10.1099/00207713-52-3-1043 Sutcliffe, I.C. (2015) Challenging the anthropocentric emphasis on phenotypic testing in prokaryotic species descriptions: rip it up and start again. Frontiers in Genetics 6, 218. 
https://doi.org/10.3389/ fgene.2015.00218 Sutcliffe, I.C. (2019) Valediction: descriptions of novel prokaryotic taxa published in Antonie van Leeuwenhoek-change in editorial policy and a signpost to the future? Antonie Van Leeuwenhoek 112, 1281–1282. https://doi.org/10.1007/s10482-019-01311-6 Sutcliffe, I.C., Trujillo, M.E. and Goodfellow, M. (2012) A call to arms for systematists: revitalising the purpose and practices underpinning the description of novel microbial taxa. Antonie Van Leeuwenhoek 101, 13–20. https://doi.org/10.1007/s10482-011-9664-0 Tamames, J. and Rosselló-Móra, R. (2012) On the fitness of microbial taxonomy. Trends in Microbiology 20, 514–516. https://doi.org/10.1016/j.tim.2012.08.012

Bridging 200 Years of Bacterial Classification

19

Uchida, T., Bonen, L., Schaup, H.W., Lewis, B.J., Zablen, L. and Woese, C. (1974) The use of ribonuclease U2 in RNA sequence determination. Some corrections in the catalog of oligomers produced by ribonuclease T1 digestion of Escherichia coli 16S ribosomal RNA. Journal of Molecular Evolution 3, 63–77. https://doi.org/10.1007/BF01795977 van Niel, C.B. (1955) Classification and taxonomy of the bacteria and blue green algae. In: Kessel, E.L. (ed.), A Century of Progress in the Natural Sciences, 1853–1953. California Academy of Sciences, San Francisco, CA, pp. 89–114. Viver, T., Cifuentes, A., Díaz, S., Rodríguez-Valdecantos, G., González, B., Antón, J. and Rosselló-Móra, R. (2015) Diversity of extremely halophilic cultivable prokaryotes in Mediterranean, Atlantic and Pacific solar salterns: evidence that unexplored sites constitute sources of cultivable novelty. Systematic and Applied Microbiology 38, 266–275. https://doi.org/10.1016/j.syapm.2015.02.002 Viver, T., Orellana, L.H., Díaz, S., Urdiain, M., Ramos-Barbero, M.D., Gonzalez-Pastor, J.E., Oren, A., Hatt, J.K., Amann, R., Anton, J., Konstantinidis, K.T. and Rosselló-Móra, R. (2019) Predominance of deterministic microbial community dynamics in salterns exposed to different light intensities. Environmental Microbiology 21, 4300–4315. https://doi.org/10.1111/1462-2920.14790 Wayne, L., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. and Trüper, H.G. (1987) International Committee on Systematic Bacteriology: Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463–464. https://doi. org/10.1099/00207713-37-4-463 Welker, M. and Moore, E.R.B. (2011) Applications of whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry in systematic microbiology. Systematic and Applied Microbiology 34, 2–11. 
https://doi.org/10.1016/j.syapm.2010.11.013 Whitman, W.B. (2015) Genome sequences as the type material for taxonomic descriptions of prokaryotes. Systematic and Applied Microbiology 38, 217–222. https://doi.org/10.1016/j.syapm.2015.02.003 Whitman, W.B. (2016) Modest proposals to expand the type material for naming of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 66, 2108–2112. https://doi.org/10.1099/ ijsem.0.000980 Whitman, W.B., Oren, A., Chuvochina, M., da Costa, M.S., Garrity, G.M., Rainey, F.A., Rosselló-Móra, R., Schink, B., Sutcliffe, I., Trujillo, M.E. and Ventura, S. (2018) Proposal of the suffix -ota to denote phyla. Addendum to ‘proposal to include the Rank of phylum in the International Code of Nomenclature of Prokaryotes’. IJSEM 68, 967–969. https://doi.org/10.1099/ijsem.0.002593 Whitman, W.B., Sutcliffe, I.C. and Rosselló-Móra, R. (2019) Proposal for changes in the International Code of Nomenclature of Prokaryotes: granting priority to Candidatus names. International Journal of Systematic and Evolutionary Microbiology 69, 2174–2175. https://doi.org/10.1099/ijsem.0.003419 Woese, C.R., Sogin, M.L., Bonen, L. and Stahl, D. (1975) Sequence studies on 16S ribosomal RNA from a blue-green alga. Journal of Molecular Evolution 4, 307–315. https://doi.org/10.1007/BF01732533 Woese, C.R., and Fox, G.E. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences USA 74, 5088–5090. https://doi. org/10.1073/pnas.74.11.5088 Woese, C.R., Stackebrandt, E. and Ludwig, W. (1984) What are mycoplasmas: the relationship of tempo and mode in bacterial evolution. Journal of Molecular Evolution 21, 305–316. https://doi.org/10.1007/ BF02115648 Woese, C.R., Debrunner-Vossbrinck, B.A., Oyaizu, H., Stackebrandt, E. and Ludwig, W. (1985) Gram-positive bacteria: possible photosynthetic ancestry. Science 229, 762–765. https://doi. org/10.1126/science.11539659 Woese, C.R., Kandler, O. 
and Wheelis, M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences USA 87, 4576–4579. https://doi.org/10.1073/pnas.87.12.4576 Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews in Microbiology 12, 635–645. https:// doi.org/10.1038/nrmicro3330 Zavarzin, G.A., Stackebrandt, E. and Murray, R.G. (1991) A correlation of phylogenetic diversity in the Proteobacteria with the influences of ecological forces. Canadian Journal of Microbiology 37, 1–6. https://doi.org/10.1139/m91-001

20

R. Rosselló-Móra and E. Stackebrandt

Zuckerkandl, E. and Pauling, L. (1965) Evolutionary divergence and convergence in proteins. In: Bryson, V. and Vogel, H.J., (eds) Evolving Genes and Proteins. Academic Press, New York, pp. 97–166. https:// doi.org/10.1016/B978-1-4832-2734-4.50017-6 Zuñiga, C., Zaramela, L. and Zengler, K. (2017) Elucidation of complexity and prediction of interactions in microbial communities. Microbial Biotechnology 6, 1500–1522. https://doi.org/10.1111/1751-7915.12855

2 Identification of Fungi: Background, Challenges and Prospects

Tom W. May*
Royal Botanic Gardens Victoria, Melbourne, Victoria, Australia

Introduction

Fungi are one of the main radiations of eukaryote life along with Plantae and Metazoa (multicellular animals), and are diverse phylogenetically and ecologically. This chapter introduces the kingdom Fungi, and other organisms referred to as fungi, as background for discussion of issues around the identification of fungi. The process of identification is sketched out – particularly with reference to names, taxa and taxon concepts, which it is important to recognize as potentially changing over time. The diversity of fungi makes identification challenging, especially because many species are yet to be described and/or documented. Other challenges include variable biology and, frequently, wide geographical distributions. Populating and interlinking name, type, trait and sequence databases is one approach that offers promise for facilitating the identification of fungi. Nevertheless, an ongoing challenge is development of an accepted system of referring to ‘dark taxa’, those fungi known only from environmental DNA sequences.

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Fungi and Fungi

The term ‘fungi’ (lower case) colloquially refers to organisms that reproduce by spores and typically comprise hyphae that extend apically, branch and anastomose. Fungi in this sense are polyphyletic, appearing in several branches of the tree of life. The circumscription of kingdom Fungi (upper case and italicised to denote that it is the name of a taxon) has evolved as elements were removed. Initially, lineages that belonged elsewhere in the tree of life were recognized as discordant in their morphological and biochemical characters. For example, zoospores (motile spores) are present in various fungi, but those of Fungi have a single flagellum, while those of the Oomycota (water moulds) have two flagella; one of these is whiplash-like and the other has a tinsel-like structure. Such heterokont zoospores suggested a closer relationship with brown algae than with Fungi. The widespread use of molecular data to reconstruct evolutionary relationships over the last several decades has further clarified the limits of the Fungi (which is where most fungi belong) and the position of fungi-like organisms in other lineages. The main groups of fungi-like organisms belonging outside of Fungi are the Oomycota, Hyphochytriomycota and Labyrinthulomycota, which are placed in the Straminipila (Wijayawardene et al., 2020). In addition, slime moulds superficially resemble fungi by production of spores but are motile and phagotrophic and belong outside of Fungi, with most placed in the Eumycetozoa within the supergroup Amoebozoa (Wijayawardene et al., 2020). While there is broad agreement among mycologists on what organisms are Fungi, apart from some basal lineages (see below), there is considerable variation in recent classifications of Fungi in the rank accepted for higher taxa. The number of phyla accepted ranges from eight in Spatafora et al. (2017) – Ascomycota, Basidiomycota, Blastocladiomycota, Chytridiomycota, Cryptomycota, Microsporidia, Mucoromycota and Zoopagomycota – to 19 by Wijayawardene et al. (2020). The classification of phyla in the latter work is largely congruent with that of Tedersoo et al. (2018), with the addition of Caulochytriomycota. In contrast to Spatafora et al. (2017), the arrangement of Wijayawardene et al. (2020) accepted 11 further phyla (Basidiobolomycota, Calcarisporiellomycota, Caulochytriomycota, Entomophthoromycota, Entorrhizomycota, Glomeromycota, Kickxellomycota, Monoblepharomycota, Mortierellomycota, Neocallimastigomycota and Olpidiomycota), while placing Microsporidia at class level in the Rozellomycota (an alternative name for Cryptomycota). Most of the additional phyla were recognized by Spatafora et al. (2017), but at lower ranks in the taxonomic hierarchy, such as at subphylum or class level. In addition, the Aphelidiomycota are recognized by Tedersoo et al. (2018) and Wijayawardene et al. (2020) as a phylum-level taxon in the Fungi. The aphelids were not considered by Spatafora et al. (2017) but are discussed by Richards et al. (2017) in relation to basal members of the kingdom Fungi. In addition to formally named taxa there are numerous Fungi known only from DNA sequences in environmental samples such as soil. When placed in phylogenies, some of these sequences appear to represent novel lineages as high as the order level (Tedersoo et al., 2017). Some lineages of Fungi are well-represented among environmental DNA, but hardly at all by specimens or cultures, for example the Archaeorhizomycetes (Rosling et al., 2011).
Owing to the nomenclatural requirement for a physical specimen, fungi known only from sequences cannot be formally named at present. Within the phyla of Fungi there is considerable diversity, with Wijayawardene et al. (2020) recognizing 78 classes, 268 orders and more than 1000 families. Changes to the classification of Fungi over time have often been due to raising
of taxa at lower rank (such as subphyla or classes) to taxa of higher rank (such as phyla). These changes reflect the realization of the distinctiveness of lineages within Fungi – indicated from analysis of molecular data – and the deep evolutionary origin of many lineages. Such changes are likely to continue, especially as further taxa are sampled. However, resolution of the topology and timing of deep nodes in the fungal tree of life may remain problematic, owing to the ancient origin of the kingdom and the paucity of fossils. Early classifications of Fungi utilized morphological characters, such as the way sexual spores were produced. Current classifications rely heavily on phylogenetic reconstructions utilizing molecular data. Some morphological and biochemical characters do map well to phylogenies. For example, dikaryotic hyphae (containing two genetically distinct nuclei) are present at some stage in the life cycle of members of the subkingdom Dikarya (comprising Ascomycota and Basidiomycota). However, phenotypic synapomorphies are not always able to be distinguished. Indeed, for Fungi, many traits formerly used to characterize the kingdom are found elsewhere in the tree of life. Richards et al. (2017) discussed a number of seemingly ‘Fungal’ characteristics, such as the presence of ergosterol and cell walls containing chitin, that are present outside of the fungal kingdom, particularly among protists. The Spitzenkörper is a cytological feature at the tip of hyphae comprising an organized grouping of vesicles, actin and ribosomes involved in hyphal elongation (Riquelme and Sánchez-León, 2014). This structure is unique to Fungi, but is not found in all phyla within the kingdom (Richards et al., 2017). A more promising candidate for a synapomorphy for Fungi is the polarisome, a protein complex also present at the hyphal tip and involved in growth. Functional genes coding for polarisome proteins seem to be unique and widespread in Fungi, but Richards et al.
(2017) point out that more sampling of basal lineages is required and some polarisome components may be present in opisthokonts outside of Fungi. To overcome the lack of phenotypic synapomorphies, Hibbett et al. (2018) provide phylogenetic taxon descriptions for Fungi and several of the higher-level taxa within. They characterize the taxa by node- or stem-based definitions with reference to particular phylogenies.
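The phylum counts quoted earlier in this section (eight in Spatafora et al., 2017, versus 19 in Wijayawardene et al., 2020) can be reconciled with simple set arithmetic. The sketch below is only an illustrative check of the numbers as stated in the text; it assumes that the seven Spatafora phyla other than Microsporidia are retained at phylum rank in the later arrangement, and the function name `reconcile` is hypothetical.

```python
# Phyla accepted by Spatafora et al. (2017), as listed in the text.
spatafora_2017 = {
    "Ascomycota", "Basidiomycota", "Blastocladiomycota", "Chytridiomycota",
    "Cryptomycota", "Microsporidia", "Mucoromycota", "Zoopagomycota",
}

# Further phyla accepted by Wijayawardene et al. (2020), as listed in the text.
further_2020 = {
    "Basidiobolomycota", "Calcarisporiellomycota", "Caulochytriomycota",
    "Entomophthoromycota", "Entorrhizomycota", "Glomeromycota",
    "Kickxellomycota", "Monoblepharomycota", "Mortierellomycota",
    "Neocallimastigomycota", "Olpidiomycota",
}


def reconcile(base, further):
    """Derive the later phylum list from the earlier one (illustrative only)."""
    phyla = set(base)
    # Microsporidia is placed at class level within Rozellomycota.
    phyla.discard("Microsporidia")
    # Rozellomycota is an alternative name for Cryptomycota.
    phyla.discard("Cryptomycota")
    phyla.add("Rozellomycota")
    # Assumption: the remaining phyla keep phylum rank; add the further ones.
    phyla |= further
    # Aphelidiomycota is additionally recognized at phylum level.
    phyla.add("Aphelidiomycota")
    return phyla


wijayawardene_2020 = reconcile(spatafora_2017, further_2020)
print(len(spatafora_2017), len(further_2020), len(wijayawardene_2020))  # 8 11 19
```

Under these assumptions the count works out as 8 − 1 + 11 + 1 = 19, which agrees with the total given in the text.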

Fungi are heterotrophic, with a diversity of trophic modes including parasitism, mutualism and saprotrophy. Mutualists include mycorrhizal fungi that grow in close association with most green plants and lichenized fungi (‘lichens’) that form a compound thallus with fungal tissue (generally of an ascomycete) housing photosynthetic photobionts that may be algae or cyanobacteria. Fungi occupy a great diversity of niches and substrates across terrestrial and marine environments. Among the many diverse assemblages of fungi associated with particular hosts or niches, examples include lichenicolous fungi that grow on lichens (Lawrey and Diederich, 2003); endophytic fungi that are considered to form symbioses with ‘all plants in natural ecosystems’ (Rodriguez et al., 2009); fungi inhabiting sessile marine organisms such as corals and sponges (Yarden, 2014); anaerobic Neocallimastigomycetes occurring in the gastrointestinal tract of ruminant animals (Fliegerova et al., 2015); and the more than 1000 species of fungi pathogenic to insects (Vega et al., 2012). Throughout the fungi, and especially in the basal lineages, there are fungal parasites of fungi. For example, Rozella allomycis (Rozellomycota) forms spores within hyphae of Allomyces (Blastocladiomycota), necessitating careful interpretation to distinguish the structures of host and parasite. Many fungi grow as hyphae divided into compartments by frequent cross-walls, but there are variations on this basic theme. Hyphae in some fungi are coenocytic, with large compartments free of cross-walls, allowing free movement of nuclei. Where cross-walls (septa) are present, some movement of organelles may still be possible via septal pores.
Specialized cell types produced by Fungi include: (i) unicellular yeasts; (ii) the rhizoid thallus of chytrids; and (iii) the cellular thallus of Laboulbeniomycetes (insect ectoparasites), which consists of a defined number of cells formed from cell division in multiple planes (Blackwell et al., 2020). Some species of fungi are only known to produce a yeast morphology, but in other species there can be switching between yeast and hyphal phases. There is a great variety of ways that sex occurs in Fungi. For example, in Blastocladiomycota there is an alternation of generations, during which motile, flagellate gametes produced by gametangia fuse to form a zygote leading to production of a meiosporangium in which meiosis
occurs; in Mucoromycota compatible hyphae fuse to form a sexual spore called the zygospore; in Ascomycota sexual spores are formed within the ascus; and in Basidiomycota sexual spores are formed on the exterior of the basidium (Lee et al., 2010). Karyogamy (nuclear fusion) may occur immediately after plasmogamy (cell fusion), as in the Chytridiomycota, or there can be an extended dikaryotic stage where parental nuclei remain separate, as in Basidiomycota. Mating types, controlled by one or two loci, determine mating compatibility, with successful matings occurring when the mating types differ. Often there are two alleles at the mating type locus, but sometimes there may be numerous alleles (Lee et al., 2010; Coelho et al., 2017). In addition to many fungi having multinucleate stages, such as the dikaryon in the Dikarya, there are unusual arrangements of nuclei and chromosomes in some fungal lineages. For example, in the Glomeromycota, spores and coenocytic hyphae are multinucleate and may contain many thousands of nuclei. In this group, population genomics indicates significant genetic variation among these nuclei (Wyss et al., 2016). Ploidy (the number of full sets of chromosomes) may vary, with evidence in various fungi for polyploids and aneuploids along with variation in the ploidy level related to environmental conditions (Todd et al., 2017). An extreme form of variation in nuclear number and ploidy is represented by Eremothecium gossypii (=Ashbya gossypii, Ascomycota) which forms a syncytium with multiple nuclei which may be of different ploidy (Todd et al., 2017). Asexual reproduction is common, and indeed many fungi are known only from the asexual stage. Resting stages such as sclerotia or chlamydospores may also be present. The connection between asexual and sexual stages of the same fungus may not be evident except after careful study of the life cycle, and different stages may utilize different hosts. 
In some fungi known only from an asexual stage, sex is implicated by population genetic structure and/or the presence of mating type genes (Dyer and Kück, 2017). However, there is also evidence for a parasexual cycle that enables recombination during mitosis (Lee et al., 2010). The most complex life cycles among Fungi are exhibited by rust fungi (Pucciniales) which may have up to four spore types (asexual and sexual) with different stages often occurring on unrelated plant hosts (Aime et al., 2018).

T.W. May

The considerable variation in biology, ecology and reproductive modes of fungi, coupled with morphological diversity, means that different terminology is applied in different lineages; this can be a challenge when identifying unfamiliar groups. Further challenges are discussed below, after a consideration of the identification process.

The Identification Process

For any organism, identification is the process of assigning a sample of unknown identity to a taxon. Taxa are entities in a taxonomic hierarchy. Examples of different ranks of taxa within the hierarchy are: species, genera, families, orders, classes and phyla (in this sequence within the hierarchy). Colloquially, when a new species is discovered and characterized, one might say that ‘a new species has been identified’, although it is useful to restrict ‘identification’ to the process of assigning unknowns to taxa. Delimitation is the appropriate term for characterizing a novel taxon.

Names of fungi are regulated by the International Code of Nomenclature for Algae, Fungi, and Plants (ICN; Turland et al., 2018). The ICN deals with whether names are effectively published and the conditions for names to be valid and legitimate. Examples of stipulations of the ICN for novel species include the need for a description or diagnosis in English or Latin, and the requirement for a type specimen to be explicitly cited. However, the ICN does not regulate taxonomy, and the limits of taxa are a matter for taxonomists, moderated by peer review and community acceptance of particular taxonomies. Each species name is anchored to a type specimen, which provides certainty for the application of the name in a taxonomy. However, the type specimen is not necessarily typical of the taxon in the sense of representing the average of variation across characters within the species. When the concept of a taxon changes, such as when a species is split, the name always travels with the type. When one species is split into two, one of the constituents retains the name (attached to the original type) and the other constituent either gets a new name (with a new type) or else a more recent synonym is taken up. Therefore, an identification to the species name
prior to a split may not be correct in respect of an updated taxonomy (see also Chapter 5). At any given point in time, an identification is to a particular concept of a taxon rather than simply to a name. A taxon concept is based on the characteristics of individual isolates or collections that are assigned to the taxon by the taxonomist. The concept of a taxon may well – and often does – change over time, even if the name does not. Nevertheless, some taxa remain known only from their type for many years. Indeed, the limits of fungal taxa, in terms of the range of variation in morphology, physiology and DNA sequences, may take some considerable time to establish. Over time, the addition of isolates or collections to a taxon can only widen the limits of variation of that taxon. For continuous characters, such as spore dimensions, adding exemplars may lessen the distinction from other taxa when the range of variation in such characters is utilized for identification. For other characters, such as DNA sequence variants (‘haplotypes’), additional variants may continue to be unique, at least in comparison to other taxa, but the gap between the range of within-species and between-species distances may be lessened. Whatever characters are used in an identification, it is useful to specify both the method of identification and the source of taxon concepts against which the identification has been made (see also Chapters 5 and 18).
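The narrowing of the gap between within-species and between-species distances as exemplars are added can be illustrated with a small calculation. The following is a minimal sketch only: the isolate labels, taxon names and pairwise distances (counts of differing positions) are hypothetical, and the function `barcode_gap` is an illustrative helper rather than part of any established tool.

```python
from itertools import combinations


def barcode_gap(labels, distances):
    """Return (max within-taxon distance, min between-taxon distance, gap).

    labels    -- dict mapping isolate id -> taxon name (hypothetical data)
    distances -- dict mapping frozenset({id1, id2}) -> pairwise distance
    """
    within, between = [], []
    for a, b in combinations(sorted(labels), 2):
        d = distances[frozenset((a, b))]
        if labels[a] == labels[b]:
            within.append(d)
        else:
            between.append(d)
    if not between:
        raise ValueError("need isolates from at least two taxa")
    max_within = max(within, default=0)
    min_between = min(between)
    return max_within, min_between, min_between - max_within


# Hypothetical starting point: two isolates of one taxon, one of another.
labels = {"A1": "sp. A", "A2": "sp. A", "B1": "sp. B"}
distances = {
    frozenset(("A1", "A2")): 5,
    frozenset(("A1", "B1")): 25,
    frozenset(("A2", "B1")): 27,
}
before = barcode_gap(labels, distances)

# A third isolate is assigned to sp. A; it is more divergent from A1 and A2
# and sits closer to B1, so the limits of sp. A widen and the gap shrinks.
labels["A3"] = "sp. A"
distances.update({
    frozenset(("A1", "A3")): 15,
    frozenset(("A2", "A3")): 12,
    frozenset(("A3", "B1")): 17,
})
after = barcode_gap(labels, distances)

print(before, after)
```

In this invented example the gap drops from 20 to 2 differences once the third isolate is added, even though every sequence variant is still unique to its taxon.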

Challenges of Identifying Fungi

Identifying a specimen of a fungus to a named species is challenging owing to many factors related to biology, diversity, biogeography, evolution and state of documentation. The specimen available for identification is rarely the whole fungus in all its stages. Coupled with this challenge, the possibility of mixed samples must always be taken into account, particularly due to the intimate association between many fungi and their substrate, which means that other fungi are likely to be present. Obtaining a pure culture can address the issue of mixtures, but the particular stages required for morphological identification may be difficult or time-consuming to produce in culture. Where cultures cannot be
obtained, as in many biotrophic fungi, the material at hand must always be treated as a potential mixture, especially where solely sequence-based identification is utilized. Even a mushroom, preserved as a fungarium specimen, may contain mycoparasites within tissues as well as a variety of spores of other fungi on surfaces.

Historically, different forms of the one fungus, producing either asexual or sexual spores, could be given different names under the ICN. As more and more fungi were able to be placed in a phylogenetic classification, including species known only from asexual stages, a paradigm shift to ‘one fungus:one name’ occurred, leading to a change in the ICN preventing the overt naming of different stages of the one fungus with different names (Crous et al., 2015). There was a managed transition to mesh together names for asexual and sexual stages for fungi of economic importance. For example, the genus Trichoderma was chosen to represent the taxon for which sexual forms were formerly placed in Hypocrea and asexual forms were formerly placed in Trichoderma. Therefore, Trichoderma now encompasses species producing asexual and/or sexual spores (Bissett et al., 2015). However, in some lineages of fungi, the transition to ‘one fungus:one name’ is ongoing. This transition may complicate identification, as there can be high similarity in DNA sequence comparisons to species in different genera, which are merely different names for stages of the one fungus.

The magnitude of diversity contributes to the difficulty in identifying fungi, first because there are so many species, and also because so many species are yet to be formally described. Hawksworth and Lücking (2017) estimated fungal diversity at between 2.2 and 3.8 million species. This estimate of fungal diversity is based on rates of description of new taxa, comparison of pre- and post-revision accounts for recently well-studied groups and indications from metabarcoding studies.
Even in well-studied genera such as Penicillium, novel species continue to be detected (see also Chapters 5 and 14). In all groups of fungi, revisions that utilize sequence information often lead to the description of novel cryptic species with subtle morphological characteristics, which would not on their own be sufficient to characterize species. Less often, revisions utilizing sequence information merge existing taxa. There are around 120,000 formally described and accepted species of fungi. In recent years, around 2000 novel species of fungi have been described per year. Using this rate, and a figure of 3 million expected species, it will take at least 1400 years to fully document the fungal kingdom (Lücking, 2020).

A particular case of how mega-diversity impacts identification is metabarcoding. At species level, metabarcoding studies using next-generation sequencing recover high numbers of molecular operational taxonomic units (mOTUs) when binning short sequences by similarity at a given level such as 97% (Tedersoo et al., 2014). However, equating these numerous mOTUs to phylogenetic species is challenging. This is because reference sequence databases do not fully cover known species, and because of issues in the analysis of the often rather short sequences including primer choice, chimera detection, and selection of appropriate analyses and settings in the multiple steps of the bioinformatic pipeline (Lindahl et al., 2013). Computation time when handling large data sets has been a constraint, leading to use of clustering algorithms based on similarity (not involving alignment), but methods that utilize placement into a reference tree such as phylogenetic binning are emerging as practical options (Carbone et al., 2019).

Parallel evolution of morphological features is a common occurrence across fungal lineages. Older classifications based solely on morphology used simple features at higher taxonomic levels (such as orders or families) to define taxonomic groups: for example, mushrooms, truffles or yeasts. Many groups so defined have been shown to be polyphyletic once DNA sequences are taken into account. The newly defined higher taxa often lack morphological synapomorphies (shared derived characters unique to the taxon). Consequently, it is no longer feasible in many lineages to create keys based on morphology that cascade down the taxonomic hierarchy (through class, order and family to genus).
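The similarity-based binning of short sequences into mOTUs mentioned above can be sketched as a greedy clustering at a fixed threshold. This is a toy illustration under stated assumptions, not the method of any particular pipeline: the function names `identity` and `greedy_cluster` are hypothetical, the sequences are invented, and identity is computed position by position on equal-length strings, which real tools do not require.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences.

    Assumption for this sketch: sequences are already the same length, so no
    alignment step is needed.
    """
    if not a or len(a) != len(b):
        raise ValueError("toy identity expects non-empty, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold=0.97):
    """Bin sequences into clusters by similarity to cluster centroids.

    Each sequence joins the first centroid it matches at >= threshold;
    otherwise it founds a new cluster. The result depends on input order.
    """
    centroids = []
    clusters = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters


# Invented 100-base sequences: 'near' differs from 'ref' at 2 positions (98%),
# 'far' differs at 5 positions (95%).
ref = "ACGT" * 25
near = "NN" + ref[2:]
far = "NNNNN" + ref[5:]

clusters = greedy_cluster([ref, near, far])
print([len(c) for c in clusters])  # [2, 1]
```

Changing the threshold changes the number of bins, which is one reason why mOTU counts cannot be equated directly with numbers of species.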
The wide geographic distribution of fungi is an additional challenge to identifying fungi. While fungi are not everywhere, they do often have distributions at the continental level, or at least over very broad geographic areas (May, 2017). Therefore, when making an identification (even at the genus level), the pool of potential candidates is often large. Widely distributed fungi may nevertheless have specific niche requirements, in terms of hosts or microhabitats. Where there is strong host specificity, such as in some biotrophic fungi, their host may provide reliable support for identification, but the breadth of host range varies considerably even within lineages, and host jumps are possible, especially for species outside of their normal range. A further complicating factor to identification is the increasing movement of fungi around the globe due to human agency, whether crop or human pathogens, or mycorrhizal fungi such as the death cap Amanita phalloides (Pringle and Vellinga, 2006).

Changes to names are a challenge for user groups of taxonomy (see also Chapters 5, 12 and 18). In some user groups there can be resistance to accepting name changes, while other groups (such as citizen scientists) may be eager to adopt the ‘latest names’. In a relatively small number of cases, changes to names are nomenclatural, resulting from following the rules in the ICN. Examples include the discovery of overlooked earlier names for the same taxon or the discovery that a name has been used before for a different taxon (homonymy). Under a given taxonomy, there is one correct name for each taxon under the ICN. However, mechanisms are available under the ICN to promote stability, allowing conservation of newer names against older names, or names with conserved types. Nevertheless, most name changes result from taxonomic decisions, such as taxonomists altering the boundaries of genera or splitting species. Disruption to established names can sometimes be dealt with by conservation. More often than not, however, taxonomic changes lead to name changes. On the positive side, changes to names – such as the circumscription of species and the placement of species within genera that more closely reflect evolutionary relationships – can lead to greater predictive power in relation to the phenotype as a whole.
When identifying any fungus, it is vital to take into account the possibility that the specimen at hand may be undescribed. Incorporation of a validation step in the identification process should always be considered. For example, in using a dichotomous key, a small subset of characters may have been utilized. Once a name is obtained, comparison of the material across all available characters is advisable as a double check on the reliability of the identification.

Prospects for Addressing Challenges

Identification is underpinned by information. Name databases act as a skeleton, to which other critical information can be attached either directly, or by cross reference to the names. Various processes, including identification, could be streamlined if name and associated databases were fully populated for all formally described species. Since 2013, it has been mandatory under the ICN to register new names of fungi, leading to almost full compliance with authors depositing details of names with one of the approved repositories (Fungal Names, Index Fungorum and MycoBank; see also Chapter 3). Names that are not registered are not valid. Ideally, in name databases, there is access to the protologue (original description) and information on types. There is a steady increase in populating such information, such as by providing links to protologues in literature digitized for the Biodiversity Heritage Library (2020). Type information is also gradually being added to name databases. Once the verbatim original type citation is available (such as locality, collector and reference collection), this needs to be linked to contemporary reference collection databases, as evidence of the existence and current whereabouts of the type specimen. Population of reference collection databases is patchy, but is progressing (as is imaging of types), such as through the Global Plants Initiative (Global Plants, 2020), which, despite the name, does include fungi. Early names often need typification, such as by selecting lectotypes (in the case of multiple syntypes) or neotypes (in the case of missing or destroyed types). Until recently, tracking such later typifications has been difficult, but the ICN now mandates registration of typification acts, which is leading to effective capture of information on typifications in the name databases. Nevertheless, numerous older publications need checking for typifications.
Original descriptions are sources of information on traits, whether morphological, physiological or ecological. Subsequent collections also provide enhanced trait information. Synthesizing trait information into databases facilitates identification by opening up the possibility of dynamic multiple access identification systems in contrast to conventional dichotomous keys.

Dichotomous keys produced in taxonomic treatments remain static, and are difficult to revise as novel taxa are added, while multiple access keys can readily accept further taxa. For example, there is no simple way of using unusual morphological characters (such as large or peculiarly shaped spores) to lead to an identification, unless spore characters for all fungi are available via a single database. Such an approach requires standardized ontologies (lists of characters and character states). Ideally, information on traits would be held at the specimen or isolate level, allowing automatic assembly of taxon descriptions as the boundaries of taxa (whether species or higher taxa) are adjusted by the taxonomist. Owing to the utility of sequence data in identifying fungi (but see critique by Lücking et al., 2020), it is desirable that there is a sequence attached to each name, if possible through the type specimen (see also Chapters 5 and 12). Where older types are not available for destructive sampling, or will not yield DNA, epitypification is a mechanism under the ICN for attaching a sequenced collection to a name. Initiatives such as the Reference Sequence (RefSeq) collection are making it easier to locate sequences obtained from type specimens (Schoch et al., 2014; National Center for Biotechnology Information, 2020; see also Chapters 5 and 12). Increased interlinking of name, type, trait and sequence databases is required to optimally use information and to clearly see what work remains to be done in fully populating each database. More complete coverage in databases would not only provide significant support for fast and accurate identification, but also assist greatly in the rapid publication of novel species by facilitating comparisons against known species. Comprehensive systems utilizing ‘cloud-based dynamic data network platforms’ are already envisaged for certain groups, such as pathogenic fungi (Prakash et al., 2017). 
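The contrast drawn above between static dichotomous keys and multiple access keys can be made concrete with a small data structure. The sketch below is illustrative only: the taxon names, characters and character states are hypothetical, and `candidates` is an invented helper, not the interface of any existing identification system.

```python
def candidates(observations, taxa):
    """Return the taxa compatible with every observed character state.

    observations -- dict mapping character -> observed state
    taxa         -- dict mapping taxon name -> {character: set of states}
    A character that has not been scored for a taxon does not exclude it.
    """
    matches = []
    for name, traits in taxa.items():
        if all(state in traits.get(char, {state})
               for char, state in observations.items()):
            matches.append(name)
    return sorted(matches)


# Hypothetical trait matrix using a shared list of characters and states.
taxa = {
    "Taxon A": {"spore_shape": {"globose"},
                "spore_size": {"large"},
                "substrate": {"wood"}},
    "Taxon B": {"spore_shape": {"ellipsoid", "globose"},
                "substrate": {"soil"}},
    "Taxon C": {"spore_shape": {"ellipsoid"},
                "spore_size": {"small"},
                "substrate": {"wood", "soil"}},
}

# Characters can be used in any order and in any combination.
by_shape = candidates({"spore_shape": "globose"}, taxa)
by_shape_and_substrate = candidates(
    {"spore_shape": "globose", "substrate": "soil"}, taxa)

# Adding a newly described taxon is a single new entry; no couplets need
# to be rewritten, unlike in a dichotomous key.
taxa["Taxon D"] = {"spore_shape": {"globose"}, "substrate": {"soil"}}
after_addition = candidates(
    {"spore_shape": "globose", "substrate": "soil"}, taxa)

print(by_shape, by_shape_and_substrate, after_addition)
```

The design choice to treat unscored characters as non-excluding reflects the patchy population of trait databases; a stricter system could instead flag such taxa as unresolved.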
The scale of such databases is a step change from current approaches and the amount of work to populate them should not be underestimated. Trait-scoring in the Global Information System for Lichenized and Non-Lichenized Ascomycetes (LIAS) evolved from an original concept of community-driven data input to LIAS-light, which involved scoring primarily by members of one research group, covering around 50 characters scored for around 10,000 taxa (Rambold et al., 2014). Emerging platforms that enable citizen science contributions to trait-scoring and label transcription have great potential to assist in dealing with large numbers of individual tasks (Ellwood et al., 2018).

The most significant current challenge for mycology is how to deal with the naming of so-called dark taxa where there is no physical type specimen (in the intended sense of a gathering of a single species) but a DNA sequence indicating that a novel taxon exists. Discussion on this issue to date has been polarized (Lücking and Hawksworth, 2018; Thines et al., 2018). There is a formal route, where the provisions of the ICN would be modified; and an informal route, outside of the ICN. Provisions of the ICN specific to fungi (those in Chapter F of the ICN) can be revised or added through actions of an International Mycological Congress (IMC), held every 4 years. It is important to separate the use of characteristics of DNA sequences for diagnostic purposes when establishing taxa on a physical specimen (permitted under the ICN) from the use of a DNA sequence as a ‘type’ for a novel species in the absence of a physical specimen (not currently permitted). At the 2018 San Juan IMC there was an unsuccessful attempt to change the wording of the ICN to allow a DNA sequence to serve as a type specimen (May and Redhead, 2018). As an alternative, there are already systems for informal naming, such as the versioned designation of species hypotheses in the UNITE fungal Internal Transcribed Spacer (ITS) sequence database (Kõljalg et al., 2019; UNITE, 2020). It is essential to develop a naming system for dark taxa – whether formal or informal – that has wide acceptance in the mycological community.

Conclusion

The diversity of fungi and their myriad interactions with other organisms make them a compelling subject for study, but this diversity does complicate identification. Accurate identification is a crucial underpinning of biological research, and taxonomy underpins identification. Identification will continue to be challenging, but conceptualizing and realizing the goal of completing the taxonomic inventory of fungi (Lücking, 2020), in relation to both describing and documenting, will be a key contribution to enabling accurate and precise identification.

References

Aime, M.C., Bell, C.D. and Wilson, A.W. (2018) Deconstructing the evolutionary complexity between rust fungi (Pucciniales) and their plant hosts. Studies in Mycology 89, 143–152. https://doi.org/10.1016/j.simyco.2018.02.002
Biodiversity Heritage Library (2020) Biodiversity Heritage Library. Available at: https://www.biodiversitylibrary.org/ (accessed 2 June 2020).
Bissett, J., Gams, W., Jaklitsch, W. and Samuels, G.J. (2015) Accepted Trichoderma names in the year 2015. IMA Fungus 6, 263–295. https://doi.org/10.5598/imafungus.2015.06.02.02
Blackwell, M., Haelewaters, D. and Pfister, D.H. (2020) Laboulbeniomycetes: evolution, natural history, and Thaxter’s final word. Mycologia. https://doi.org/10.1080/00275514.2020.1718442
Carbone, I., White, J.B., Miadlikowska, J., Arnold, A.E., Miller, M.A. et al. (2019) T-BAS version 2.1: Tree-Based Alignment Selector toolkit for evolutionary placement of DNA sequences and viewing alignments and specimen metadata on curated and custom trees. Microbiology Resource Announcements 8(29), e00328-19. https://doi.org/10.1128/MRA.00328-19
Coelho, M.A., Bakkeren, G., Sun, S., Hood, M.E. and Giraud, T. (2017) Fungal sex: the Basidiomycota. Microbiology Spectrum 5(3). https://doi.org/10.1128/microbiolspec.FUNK-0046-2016
Crous, P.W., Hawksworth, D.L. and Wingfield, M.J. (2015) Identifying and naming plant-pathogenic fungi: past, present, and future. Annual Review of Phytopathology 53, 247–267. https://doi.org/10.1146/annurev-phyto-080614-120245
Dyer, P.S. and Kück, U. (2017) Sex and the imperfect Fungi. Microbiology Spectrum 5(3). https://doi.org/10.1128/microbiolspec.FUNK-0043-2017
Ellwood, E.R., Kimberly, P., Guralnick, R., Flemons, P., Love, K. et al. (2018) Worldwide engagement for digitizing biocollections (WeDigBio): the biocollections community’s citizen-science space on the calendar. BioScience 68, 112–124. https://doi.org/10.1093/biosci/bix143
Fliegerova, K., Kaerger, K., Kirk, P. and Voigt, K. (2015) Rumen Fungi. In: Puniya, A., Singh, R. and Kamra, D. (eds) Rumen Microbiology: From Evolution to Revolution. Springer, New Delhi, India.
Global Plants (2020) Global Plants. Available at: https://plants.jstor.org/ (accessed 2 June 2020).
Hawksworth, D.L. and Lücking, R. (2017) Fungal diversity revisited: 2.2 to 3.8 million species. Microbiology Spectrum 5(4). https://doi.org/10.1128/microbiolspec.FUNK-0052-2016
Hibbett, D.S., Blackwell, M., James, T.Y., Spatafora, J.W., Taylor, J.W. et al. (2018) Phylogenetic taxon definitions for Fungi, Dikarya, Ascomycota and Basidiomycota. IMA Fungus 9, 291–298. https://doi.org/10.5598/imafungus.2018.09.02.05
Kõljalg, U., Abarenkov, K., Nilsson, R.H., Larsson, K.H. and Taylor, A.F. (2019) The UNITE database for molecular identification and for communicating fungal species. Biodiversity Information Science and Standards 3, e37402. https://doi.org/10.3897/biss.3.37402
Lawrey, J.D. and Diederich, P. (2003) Lichenicolous fungi: interactions, evolution, and biodiversity. The Bryologist 106, 80–120. https://doi.org/10.1639/0007-2745(2003)106[0080:LFIEAB]2.0.CO;2
Lee, S.C., Ni, M., Li, W., Shertz, C. and Heitman, J. (2010) The evolution of sex: a perspective from the fungal kingdom. Microbiology and Molecular Biology Reviews 74, 298–340. https://doi.org/10.1128/MMBR.00005-10
Lindahl, B.D., Nilsson, R.H., Tedersoo, L., Abarenkov, K., Carlsen, T. et al. (2013) Fungal community analysis by high-throughput sequencing of amplified markers – a user’s guide. New Phytologist 199, 288–299. https://doi.org/10.1111/nph.12243
Lücking, R. (2020) Three challenges to contemporaneous taxonomy from a licheno-mycological perspective. Megataxa 1(1), 78–103. https://doi.org/10.11646/megataxa.1.1.16
Lücking, R. and Hawksworth, D.L. (2018) Formal description of sequence-based voucherless Fungi: promises and pitfalls, and how to resolve them. IMA Fungus 9, 143–165. https://doi.org/10.5598/imafungus.2018.09.01.09
Lücking, R., Aime, M.C., Robbertse, B., Miller, A.N., Ariyawansa, H.A. et al. (2020) Unambiguous identification of fungi: where do we stand and how accurate and precise is fungal DNA barcoding? IMA Fungus 11(14), 1–32. https://doi.org/10.1186/s43008-020-00033-z
May, T.W. (2017) Biogeography of Australasian fungi: from mycogeography to the mycobiome. In: Ebach, M. (ed.) Handbook of Australasian Biogeography. CRC Press, Boca Raton, Florida, pp. 155–214.
May, T.W. and Redhead, S.A. (2018) Synopsis of proposals on fungal nomenclature: a review of the proposals concerning Chapter F of the International Code of Nomenclature for algae, fungi, and plants submitted to the XI International Mycological Congress, 2018. IMA Fungus 9, ix–xiv. https://doi.org/10.1007/BF03449482
National Center for Biotechnology Information (2020) RefSeq: NCBI Reference Sequence Database. Available at: https://www.ncbi.nlm.nih.gov/refseq/ (accessed 2 June 2020).
Prakash, P.Y., Irinyi, L., Halliday, C., Chen, S., Robert, V. et al. (2017) Online databases for taxonomy and identification of pathogenic fungi and proposal for a cloud-based dynamic data network platform. Journal of Clinical Microbiology 55, 1011–1024. https://doi.org/10.1128/JCM.02084-16
Pringle, A. and Vellinga, E.C. (2006) Last chance to know? Using literature to explore the biogeography and invasion biology of the death cap mushroom Amanita phalloides (Vaill. ex Fr. :Fr.) Link. Biological Invasions 8, 1131–1144. https://doi.org/10.1007/s10530-005-3804-2
Rambold, G., Elix, J., Heindl-Tenhunen, B., Köhler, T., Nash III, T. et al. (2014) LIAS light – towards the ten thousand species milestone. MycoKeys 8, 11–16. https://doi.org/10.3897/mycokeys.8.6605
Richards, T.A., Leonard, G. and Wideman, J.G. (2017) What defines the “Kingdom” Fungi? Microbiology Spectrum 5, 1–21. https://doi.org/10.1128/microbiolspec.FUNK-0044-2017
Riquelme, M. and Sánchez-León, E. (2014) The Spitzenkörper: a choreographer of fungal growth and morphogenesis. Current Opinion in Microbiology 20, 27–33. https://doi.org/10.1016/j.mib.2014.04.003
Rodriguez, R.J., White, J.F. Jr., Arnold, A.E. and Redman, R.S. (2009) Fungal endophytes: diversity and functional roles. New Phytologist 182, 314–330. https://doi.org/10.1111/j.1469-8137.2009.02773.x
Rosling, A., Cox, F., Cruz-Martinez, K., Ihrmark, K., Grelet, G.-A. et al. (2011) Archaeorhizomycetes: unearthing an ancient class of ubiquitous soil Fungi. Science 333, 876–879. https://doi.org/10.1126/science.1206958
Schoch, C.L., Robbertse, B., Robert, V., Vu, D., Cardinali, G. et al. (2014) Finding needles in haystacks: linking scientific names, reference specimens and molecular data for fungi. Database 2014, 1–21. https://doi.org/10.1093/database/bau061
Spatafora, J.W., Aime, M.C., Grigoriev, I.V., Martin, F., Stajich, J.E. et al. (2017) The fungal tree of life: from molecular systematics to genome-scale phylogenies. Microbiology Spectrum 5(5), FUNK-0053-2016. https://doi.org/10.1128/microbiolspec.FUNK-0053-2016
Tedersoo, L., Bahram, M., Põlme, S., Kõljalg, U., Yorou, N.S. et al. (2014) Global diversity and geography of soil fungi. Science 346, 1256688. https://doi.org/10.1126/science.1256688
Tedersoo, L., Bahram, M., Puusepp, R., Nilsson, R.H. and James, T.Y. (2017) Novel soil-inhabiting clades fill gaps in the fungal tree of life. Microbiome 5(1), 42. https://doi.org/10.1186/s40168-017-0259-5
Tedersoo, L., Sánchez-Ramírez, S., Kõljalg, U., Bahram, M., Döring, M. et al. (2018) High-level classification of the Fungi and a tool for evolutionary ecological analyses. Fungal Diversity 90, 135–159. https://doi.org/10.1007/s13225-018-0401-0
Thines, M., Crous, P.W., Aime, M.C., Aoki, T., Cai, L. et al. (2018) Ten reasons why a sequence-based nomenclature is not useful for fungi anytime soon. IMA Fungus 9, 177–183. https://doi.org/10.5598/imafungus.2018.09.01.11
Todd, R.T., Forche, A. and Selmecki, A. (2017) Ploidy variation in fungi: polyploidy, aneuploidy, and genome evolution. Microbiology Spectrum 5(4). https://doi.org/10.1128/microbiolspec.FUNK-0051-2016
Turland, N.J., Wiersema, J.H., Barrie, F.R., Greuter, W., Hawksworth, D.L. et al. (eds) (2018) International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Koeltz Botanical Books, Glashütten, Germany. https://doi.org/10.12705/Code.2018
UNITE (2020) rDNA ITS based identification of Eukaryotes and their communication via DOIs. Available at: https://unite.ut.ee/ (accessed 2 June 2020).
Vega, F.E., Meyling, N.V., Luangsa-Ard, J.J. and Blackwell, M. (2012) Fungal entomopathogens. In: Vega, F. and Kaya, H.K. (eds) Insect Pathology, 2nd edn. Academic Press, San Diego, California, pp. 171–220.
Wijayawardene, N.N., Hyde, K.D., Al-Ani, L.K.T., Tedersoo, L., Haelewaters, D. et al. (2020) Outline of Fungi and fungus-like taxa. Mycosphere 11(1), 1060–1456. https://doi.org/10.5943/mycosphere/11/1/8
Wyss, T., Masclaux, F., Rosikiewicz, P., Pagni, M. and Sanders, I.R. (2016) Population genomics reveals that within-fungus polymorphism is common and maintained in populations of the mycorrhizal fungus Rhizophagus irregularis. The ISME Journal 10, 2514–2526. https://doi.org/10.1038/ismej.2016.29
Yarden, O. (2014) Fungal association with sessile marine invertebrates. Frontiers in Microbiology 5, 228. https://doi.org/10.3389/fmicb.2014.00228

3

Names of Microorganisms and Data Resources to Retrieve Information About Published Names

Aharon Oren1,*, Aidan C. Parte2 and Jerry Cooper3

1 Department of Plant and Environmental Sciences, The Institute of Life Sciences, The Hebrew University of Jerusalem, Edmond J. Safra Campus, Jerusalem, Israel; 2 List of Prokaryotic names with Standing in Nomenclature (LPSN), Sudbury, MA, USA; 3 Manaaki Whenua – Landcare Research, Lincoln, New Zealand

Introduction

When this chapter was written at the end of 2019, the names of more than 18,000 species of prokaryotes (Bacteria and Archaea combined), belonging to about 3400 genera, had been validly published (Parte, 2018). By the end of 2018, more than 100,000 species of fungi had been described in the literature, classified in >10,000 genera (Kirk et al., 2008; IndexFungorum, 2019). A discussion of the basic question of how to define species and genera of prokaryotes and fungi is beyond the scope of this chapter. For the prokaryotes there is still no generally accepted and well-defined species concept (Rosselló-Móra and Amann, 2001, 2015; Gevers et al., 2005). For the fungi, too, there is no universally accepted definition, but there has been a general transition to a polyphasic approach based on a phylogenetic species concept (Taylor et al., 2000; see also Chapter 2). However, this does not prevent microbiologists from describing new taxa at an ever-increasing rate. In recent years, more than 1000 new species and around 200 new genera of prokaryotes were added annually to the list (Oren and Garrity, 2014a; Parte, 2018), and more than 4000 species and over 300 new genera of fungi were added annually (Robert et al., 2005). This chapter deals with the process of naming newly discovered microorganisms so that the names are correctly formed in agreement with the rules of the relevant code of nomenclature: the International Code of Nomenclature of Prokaryotes (ICNP, https://dx.doi.org/10.1099/ijsem.0.000778, accessed 9 July 2020) (Parker et al., 2019) for the prokaryotes (Archaea and Bacteria, including a small number of cyanobacteria), and the International Code of Nomenclature for algae, fungi, and plants (ICN, https://www.iapt-taxon.org/historic/2018.htm, accessed 9 July 2020) (Turland et al., 2018) for the microfungi and for most cyanobacteria. In-depth discussions will be devoted to the online databases and handbooks, where updated information can be found about all names with standing in the nomenclature.

*[email protected]

© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Microorganisms and Data Resources – Published Names

The International Code of Nomenclature of Prokaryotes (the Prokaryotic Code)

To understand the ways in which new taxa of prokaryotes are named and how names obtain standing in the nomenclature, it is important to highlight and explain a selection of the Rules of the ICNP, the Code that defines the formal framework that regulates the nomenclature. The ICNP, with its General Considerations, Principles, Rules and Recommendations and Appendices, is an official document approved by the International Committee on Systematics of Prokaryotes (ICSP, the-icsp.org, accessed 9 July 2020), and alterations to this Code can only be made by the ICSP at one of its plenary sessions (Rule 1b). For further information about the way the ICSP functions see Whitman et al. (2019). Some of the quotations from the Code given below are abbreviated from the original text (Parker et al., 2019).

Principle 1: The essential points in nomenclature are as follows: (1) Aim at stability of names; (2) Avoid or reject the use of names which may cause error or confusion; (3) Avoid the useless creation of names…

Many nomenclatural changes have been introduced in recent years, including extensive renaming of species by transferring them to newly created genera. Sometimes such changes have led to confusion, especially in cases of names of well-known organisms, some of which have clinical importance. Examples are the renaming of Clostridium difficile as Clostridioides difficile and the transfer of a number of species of the genus Mycobacterium to the newly proposed genera Mycobacteroides, Mycolicibacter, Mycolicibacterium and Mycolicibacillus. Papers were published to explain the changes and to clarify the status of the new names (Oren and Rupnik, 2018; Oren and Trujillo, 2019; Tortoli et al., 2019; see also Tindall, 1999).

Principle 1(4): Nothing in this Code may be construed to restrict the freedom of taxonomic thought or action.

There is a common misunderstanding that the Rules of the ICNP govern taxonomy and that classification schemes of prokaryotes are approved by the ICSP. Principle 1(4) was added to the Code to make it clear that its Rules govern nomenclature only, and do not deal with issues of taxonomic opinion, phylogenetic relationships and classification. It is based on General Consideration 4: ‘Rules of nomenclature do not govern the delimitation of taxa nor determine their relations.’

Principle 2: The nomenclature of prokaryotes is not independent of botanical and zoological nomenclature. When naming new taxa in the rank of genus or higher, due consideration is to be given to avoiding names which are regulated by the International Code of Zoological Nomenclature and the International Code of Nomenclature for algae, fungi, and plants. Note. This principle takes effect with publication of acceptance of this change by the ICSP (from November 2000) and is not retroactive.

There are a number of generic names with standing in the prokaryotic nomenclature that have homonyms in the botanical and/or in the zoological nomenclature. Examples are Bacillus (also an insect), Proteus (also an amphibian) and Lawsonia (both an insect and a flowering plant). Since 2001 it has no longer been possible to propose new generic names for prokaryotes that are already in use in botany or in zoology. However, as the botanical and the zoological nomenclatures do not have central registration of names, there is no guaranteed way to ascertain that a name has not been used earlier. Online resources such as the Index Nominum Genericorum (Farr and Zijlstra, 1996), Nomenclator Zoologicus (www.ubio.org/NomenclatorZoologicus, accessed 9 July 2020), Indexing & Organizing Biological Names (uBio; ubio.org) and NamesforLife (www.namesforlife.com, accessed 9 July 2020) (Garrity, 2010) are helpful. These resources are discussed in more depth below.

Principle 6: The correct name of a taxon is based upon valid publication, legitimacy and priority of publication …
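The homonym screening that Principle 2 calls for can be sketched as a simple lookup. The following is a minimal illustration only: the function `find_homonyms` and the toy `REGISTRIES` are hypothetical, the registries hold nothing but the three examples named above, and a clean result would not prove that a name is unused, since botany and zoology have no central registration.

```python
from typing import Dict, List, Set

# Toy registries holding only the examples mentioned in the text.
# Real checks would consult resources such as Index Nominum Genericorum.
REGISTRIES: Dict[str, Set[str]] = {
    "prokaryotic": {"Bacillus", "Proteus", "Lawsonia"},
    "zoological": {"Bacillus", "Proteus", "Lawsonia"},
    "botanical": {"Lawsonia"},
}


def find_homonyms(proposed: str, registries: Dict[str, Set[str]]) -> List[str]:
    """Return the registries that already contain the proposed generic name."""
    key = proposed.strip().casefold()
    hits = []
    for source, names in registries.items():
        # Compare case-insensitively so that 'lawsonia' matches 'Lawsonia'.
        if key in {name.casefold() for name in names}:
            hits.append(source)
    return sorted(hits)


if __name__ == "__main__":
    for candidate in ("Lawsonia", "Proteus", "Klenkia"):
        print(candidate, find_homonyms(candidate, REGISTRIES))
```

An empty list here means only that the name is absent from the lists consulted, which is the limitation the text describes.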

The concept and implications of valid publication (as opposed to effective publication) of names are further explained below. Each taxon belonging to one of the ranks covered by the rules of the ICNP can bear only one correct name; that is, the earliest that is in accordance with the Rules of this Code (Principle 6).

Rule 15: For each named taxon of the various taxonomic categories… , there shall be designated a nomenclatural type. The


A. Oren et al.

nomenclatural type… is that element of the taxon with which the name is permanently associated, whether as a correct name or as a later heterotypic synonym. The nomenclatural type is not necessarily the most typical or representative element of the taxon.

The taxonomic categories covered by the rules of the ICNP, as defined in Table 2 found in Rule 15, are from subspecies to class. The rank of phylum is not included. A proposal to include the rank of phylum in the Code was submitted in 2015 (Oren et al., 2015), but at the time of writing of the present chapter it had not yet been discussed by the ICSP.

Rule 18a: Whenever possible, the type of a species or subspecies is a designated strain. The type strain is made up of living cultures of an organism, which are descended from a strain designated as the nomenclatural type. The strain should have been maintained in pure culture and should agree closely in its characters with those in the original description… (1) Until 31 December 2000, for a species (or subspecies) which has not so far been maintained in laboratory cultures or for which a type does not exist, a description, preserved specimen, or illustration… may serve as the type. (2) As from 1 January 2001, a description, preserved (non-viable) specimen, or illustration may not serve as the type.

There are a small number of species-level taxa of prokaryotes for which no living type material is available and whose descriptions are based on drawings, photographs, etc. There are a few such names in the Approved Lists of Bacterial Names of 1980 (Skerman et al., 1980, 1989) derived from the older literature (see further Rule 24a). A few more such species were described after 1980. However, in 1999 the ICSP decided to modify the rules of the Code so that valid publication of names of new species (and subspecies) can only be based on the availability of axenic cultures as type material.

Rule 25a: Effective publication is effected under this Code by making generally available, by sale or distribution, to the scientific community, printed and/or electronic material for the purpose of providing a permanent record. … Note. Electronic publication should follow the tradition of publication of printed matter acceptable to this Code.

Publication of descriptions of new taxa in electronic journals is thus acceptable as effective publication of the names. Information in online supplementary material, both for ‘traditional’ printed journals and for journals published in electronic format only, is not accepted as the basis for later validation of names, as it is not certain that such supplementary material can be considered a permanent record.

Rule 27: A name of a new taxon, or a new combination for an existing taxon, is not validly published unless the following criteria are met. (1) The name is published in the IJSB/IJSEM. (2) The publication of the name in the IJSB/IJSEM is accompanied by a description of the taxon or by a reference to a previous effectively published description of the taxon… As of 1 January 2001 the following criteria also apply: a. The new name or new combination should be clearly stated and indicated as such (i.e. fam. nov., gen. nov., sp. nov., comb. nov., etc.). b. The derivation (etymology) of a new name… must be given. c. The properties of the taxon being described must be given directly after (a) and (b). This may include reference to tables or figures in the same publication, or reference to previously effectively published work. d. All information contained in (c) should be accessible. (3) The type of the taxon must be designated… In the case of species or subspecies including new combinations, the type strains must be deposited according to Rule 30.

The International Journal of Systematic and Evolutionary Microbiology (IJSEM, www.microbiologyresearch.org/content/journal/ijsem, accessed 9 July 2020; prior to 2000, International Journal of Systematic Bacteriology – IJSB) is thus the only journal in which names of prokaryotic taxa can be validly published and obtain standing in the nomenclature. This distinguishes the nomenclature of prokaryotes from that of plants and animals, as the ICN and the International Code of Zoological Nomenclature (ICZN, www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/, accessed 9 July 2020) do not have such a mechanism of central registration of names. There is a widespread misconception that publication of the name of a new prokaryotic taxon in the IJSEM automatically grants valid publication of the name. A name published in the IJSEM can be considered to be validly published only if it conforms to the rules of the ICNP. Thanks to the journal’s rigorous reviewing process, problematic names of new taxa are seldom published. When it does happen, a correction can be made in a Notification List.

Rule 30: For the name of a species to be validly published, it must conform with the following conditions. … (3) (b) As of 1 January 2001, the description of a new species, or new combinations previously represented by viable cultures must include the designation of a type strain… and a viable culture of that strain must be deposited in at least two publicly accessible culture collections in different countries from which subcultures must be available. The designations allotted to the strain by the culture collections should be quoted in the published description. Evidence must be presented that the cultures are present, viable, and available at the time of publication. Note. In exceptional cases, such as organisms requiring specialized facilities (e.g. Risk Group/Biological Safety Level 3, high pressure requirements, etc.), exceptions may be made to this Rule. Exceptions will be considered on an individual basis by a committee consisting of the Chairman of the ICSP, the Chairman of the Judicial Commission and the Editor of the IJSEM. …(4) Organisms deposited in such a fashion that access is restricted, such as safe deposits or strains deposited solely for current patent purposes, may not serve as type strains.
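The deposit requirement of Rule 30 (3)(b) and (4) can be expressed as a small check. This is a sketch under stated assumptions, not a tool used by the IJSEM: the `Deposit` record and `meets_deposit_requirement` function are hypothetical, the accession strings are placeholders rather than real strain numbers, and the individually granted exceptions mentioned in the Note to the Rule are not modelled.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass(frozen=True)
class Deposit:
    collection: str           # culture collection acronym, e.g. "DSMZ"
    country: str              # country in which the collection is located
    accession: str            # designation allotted by the collection
    restricted: bool = False  # True for safe deposits or patent-only deposits


def meets_deposit_requirement(deposits: List[Deposit]) -> bool:
    """True if unrestricted cultures sit in two collections in two countries."""
    # Rule 30(4): restricted deposits cannot serve as type strains.
    open_deposits = [d for d in deposits if not d.restricted]
    # Rule 30(3)(b): at least two collections located in different countries.
    return any(
        a.collection != b.collection and a.country != b.country
        for a, b in combinations(open_deposits, 2)
    )


if __name__ == "__main__":
    # Placeholder accession numbers, invented for illustration only.
    strain = [
        Deposit("DSMZ", "Germany", "DSM <placeholder>"),
        Deposit("JCM", "Japan", "JCM <placeholder>"),
    ]
    print(meets_deposit_requirement(strain))
```

Marking one of the two deposits as restricted makes the check fail, which mirrors the point that a patent or safe deposit must be duplicated in an open collection before it can support valid publication.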

Availability of the type strains of species and subspecies without restrictions is considered extremely important. When access to type material is restricted (e.g. because the strain is protected by a patent or was deposited in a culture collection as a ‘safe deposit’ that can be accessed only with the permission of the depositor or a third party), such cultures cannot serve as type material on which valid publication of the name is based. A copy of such a strain can be deposited in an open collection under a different accession number than the original one, in order to make it public.

Rule 56a: Only the Judicial Commission can place names on the list of rejected names (nomina rejicienda)… A name may be placed on this list for various reasons, including the following.
(1) An ambiguous name (nomen ambiguum), i.e. a name which has been used with different meanings and thus has become a source of error. …
(2) A doubtful name (nomen dubium), i.e. a name whose application is uncertain. …
(3) A name causing confusion (nomen confusum), i.e. a name based upon a mixed culture. …
(4) A perplexing name (nomen perplexum), a name whose application is known but which causes uncertainty in bacteriology…
(5) A perilous name (nomen periculosum), i.e. a name whose application is likely to lead to accidents endangering health or life or both or of serious economic consequences. …

A name once validly published thus remains validly published in perpetuity unless the Judicial Commission of the ICSP decides otherwise. Proposals to reject names can be submitted to the Judicial Commission by publication of a Request for an Opinion in the IJSEM in the format described in Appendix 8 of the ICNP. Further information about the way the Judicial Commission is appointed and functions is found in the statutes of the ICSP (Whitman et al., 2019).

Resources from Which Information About Names of Prokaryotic Taxa Can Be Retrieved

The ‘official’ sources of information: articles and lists in the IJSB/IJSEM

As stated in Rule 27, valid publication of the name of a prokaryotic taxon requires publication of that name in the IJSB/IJSEM. Being the official journal of the ICSP (formerly the International Committee on Systematic Bacteriology, ICSB), this journal provides central registration of names with standing in the nomenclature. In the prokaryotic nomenclature, priority of publication dates from 1 January 1980 (see Rule 24a) with the publication of the Approved Lists of Bacterial Names (Skerman et al., 1980). Many older names that were in use before 1980 were not included in the ‘Approved Lists’, and these lost their status in the nomenclature. Most of those names can be found in the Index Bergeyana – An Annotated Alphabetic Listing of Names of the Taxa of the Bacteria (Buchanan et al., 1966). This monumental 1472-page document is today mainly of historical interest. With the publication of the Approved Lists, a new start was made


in the nomenclature of prokaryotes. The first list gives the names of taxa above the rank of genus; the second list has the names of genera, species and subspecies. Following the original publication in 1980 (Skerman et al., 1980), corrigenda were published (Hill et al., 1984), and a corrected and emended version was later published in book form (Skerman et al., 1989). The Approved Lists contain the names of seven classes, one subclass, 21 orders, three suborders, 66 families, 24 tribes, 290 genera, 1792 species and 131 subspecies. In addition, names of two divisions were included: Firmacutes (now known as Firmicutes) and Gracilicutes.

Since the publication of the 1980 Approved Lists, names of new taxa of prokaryotes can be validly published only in the IJSB/IJSEM. This can be accomplished in either of two ways: in an original publication in that journal or by inclusion of the names in a validation list after they had earlier been published elsewhere (effective publication). As explained above, publication of a new name in the IJSB/IJSEM is considered valid publication only if all conditions set by the rules of the ICNP are met. There have been a few cases where names published in the IJSEM had to be changed when it was discovered that they contravened one or more rules of the Code and were thus illegitimate. The proposal of the generic names Parvimonas and Quatrionicoccus to replace the illegitimate names Micromonas and Quadricoccus is an example (Tindall and Euzéby, 2006).

At the request of the Judicial Commission, the IJSB/IJSEM provides notification lists that itemize all nomenclatural changes as well as changes in taxonomic opinion that have occurred in each issue of the journal. The names are listed according to the rules of priority; that is, the page number and order of valid publication of names in the original articles. These lists have no formal status in the nomenclature except to allow for orthographic corrections to be made.
The first of these Notification Lists was published in July 1991. Currently, each Notification List is published 3 months after the publication of each issue of the journal. It must be stressed that the date of valid publication of a name is that of its publication in the IJSB/IJSEM, not the date of publication of the Notification List in which the name features. The notification lists also contain information about emendation of circumscriptions or the creation of synonyms proposed in the IJSEM. Starting with the list published in January 2014, this information is presented in a separate table after the main table that lists the new names and new combinations published in each issue. These taxonomic opinions cannot be considered as validly published nor in any other way approved by the ICSP or by its Judicial Commission.

Names effectively published in other journals can obtain standing in the nomenclature by their inclusion in the Validation Lists in the IJSB/IJSEM. The first validation list was published in the IJSB in July 1977, and currently these lists are published bimonthly in the IJSEM. Authors and other interested parties (individuals, journals, etc.) wishing to have new names and/or combinations included in a validation list should send an electronic copy of the published paper to the IJSEM Editorial Office. For the validation of the names of species, subspecies and new combinations, evidence must be supplied that the type strains are deposited and are available without restrictions from two public culture collections in two different countries, to comply with the requirements of Rule 30 of the ICNP. The requests are checked by the List Editors of the IJSEM. If approved, the date of valid publication of the new names and combinations is the date of publication of the list, not the date of the effective publication of the names in different journals. The inclusion of a name on a validation list is not to be construed as taxonomic acceptance of the taxon to which the name is applied. Indeed, some of these names may, in time, be shown to be synonyms, or the organisms may be transferred to another genus, thus necessitating the creation of a new combination. For further information about the validation procedure, see Tindall et al. (2006) and Trujillo et al. (2019).
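The dating rule described above (original publication in the IJSB/IJSEM versus later inclusion in a Validation List) can be made concrete with a short sketch. It assumes that the name otherwise satisfies all rules of the ICNP; the `NameRecord` type, its field names, and the example name and dates are invented for illustration and do not correspond to any real taxon.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class NameRecord:
    name: str
    effective_publication: date               # first publication of the name
    published_in_ijsem: bool                  # original paper appeared in IJSB/IJSEM
    validation_list: Optional[date] = None    # Validation List date, if included
    notification_list: Optional[date] = None  # recorded, but never sets priority


def valid_publication_date(record: NameRecord) -> Optional[date]:
    """Date from which the name has standing, or None if not validly published."""
    if record.published_in_ijsem:
        # Valid publication coincides with publication in the journal itself,
        # not with the later Notification List.
        return record.effective_publication
    # Published elsewhere: standing begins with the Validation List, if any.
    return record.validation_list


if __name__ == "__main__":
    # Hypothetical name and dates, for illustration only.
    elsewhere = NameRecord(
        name="Examplea hypothetica",
        effective_publication=date(2018, 3, 1),
        published_in_ijsem=False,
        validation_list=date(2018, 9, 1),
    )
    print(valid_publication_date(elsewhere))
```

A record with `published_in_ijsem=False` and no Validation List date returns `None`, which corresponds to a name that is effectively but not validly published.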

Online Resources that Provide Information on Validly Published Names of Taxa of Prokaryotes

The articles and the lists published in the IJSB/IJSEM are the only official sources for information about validly published names of prokaryotic taxa that are endorsed by the ICSP. Much useful information can be retrieved from searchable online resources that make the data readily available and that are linked to the original publications in the IJSB/IJSEM. These include the List of Prokaryotic names with Standing in Nomenclature (LPSN) (Euzéby, 1997; Parte, 2014, 2018), Prokaryotic Nomenclature Up-to-Date (Leibniz Institute DSMZ, 2019) and NamesforLife (Garrity, 2010). None of these resources has official standing in the nomenclature. In cases of discrepancies, the IJSB/IJSEM contains the authoritative version. NamesforLife is not restricted to the prokaryotes.

List of Prokaryotic names with Standing in Nomenclature (LPSN) (Euzéby, 1997; Parte, 2014, 2018)

The List of Bacterial Names with Standing in Nomenclature (LBSN, https://lpsn.dsmz.de/archive/, accessed 9 July 2020) was launched on 28 March 1997 by Professor Jean P. Euzéby, a veterinary bacteriologist from Toulouse, France. LBSN was an extension of the papers that Euzéby wrote for Revue de Médecine Vétérinaire from 1990 onwards, listing changes in the systematics of bacteria of veterinary interest. It was originally available as text files by anonymous FTP and then on the World Wide Web as of 28 January 1998 (Euzéby, 1997). It was renamed LPSN when the term ‘prokaryotes’ was adopted in the name of the nomenclatural body, the International Committee on Systematics of Prokaryotes (formerly the International Committee on Systematic Bacteriology). LPSN has been owned and curated by Aidan Parte since Euzéby’s retirement in 2013 (Oren and Garrity, 2013; Parte, 2018). LPSN consists of HTML files for each genus, family, suborder, order and class; there is a separate


page, listing taxa above the rank of class. For every taxon, the nomenclatural type and full reference(s) are given. The genus files list all the member species and subspecies of the genus; each species entry lists the etymology, valid publication and effective publication where appropriate. Species entries contain detailed information (Fig. 3.1), including: (i) the ‘defining publication’ or authorship, in the correct format according to the ICNP; however, Rule 34a, note 1, is not followed because citations would be too long and because basonyms are given elsewhere in the entry; (ii) the status (e.g. new species: sp. nov.; new combination: comb. nov.), and whether the species is the type for the genus; (iii) the type strain accession numbers (with direct links to American Type Culture Collection (ATCC), Belgian Coordinated Collection of Microorganisms (BCCM), Culture Collection University of Gothenburg (CCUG), Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ), Japan Collection of Microorganisms (JCM) and Korean Collection for Type Cultures (KCTC) catalogue entries when applicable) and a link to the species in the Global Catalogue of Microorganisms (Wu et al., 2013), a service of the World Federation for Culture Collections; (iv) links to the National Center for Biotechnology Information (NCBI) sequence database for the 16S rRNA gene and the whole-genome sequence when available; (v) the original name (or basonym), if the taxon is a new combination, along with a link to the original entry in LPSN; (vi) the etymology of the name when it is originally published in IJSEM or when LPSN has access to the effective publication, along with the source of isolation of the type strain where possible; and (vii) references to the valid publication – either the original paper in IJSEM or the Validation List, and the effective publication. Throughout LPSN, whenever possible links to publications use the Digital Object Identifier (DOI) for reliability.

Fig. 3.1. Taxon entry for the species Klenkia soli in LPSN (www.bacterio.net/klenkia.html; accessed 23 October 2019).

LPSN may be searched using the Google custom search boxes at the top of every page, or entries may be found using the alphabetical listings of all names cited in the LPSN: List A–C, D–L, M–R and S–Z or the listing of genera and taxa above the rank of genus: A–C, etc. There is also a listing of Candidatus taxa, and of nonvalid names (i.e. with no standing in nomenclature), which is occasionally updated.

Each genus (or higher) file also links to the genus (or higher) in the Classification of domains and phyla – Hierarchical classification of prokaryotes (www.bacterio.net/-classifphyla.html). Using data from the previous iteration, which was based on the original publications, the latest ‘Taxonomic Outline of the Bacteria and Archaea’ (Garrity et al., 2007), the ‘NCBI Taxonomy Browser’ (www.ncbi.nlm.nih.gov/taxonomy) and the Taxonomic Outlines for Volumes 3 and 4 of Bergey’s Manual of Systematic Bacteriology (2nd edn) (Ludwig et al., 2009, 2010) and/or ‘The All-Species Living Tree Project’ (Yarza et al., 2008), the classification was updated in collaboration with Pablo Yarza (Ribocon GmbH) in October 2016 (designated version 2.0), July 2018 (version 2.1) and June 2019 (version 2.2) to include new taxa at or above the rank of genus validly published since 2013. Each taxon entry links directly to the appropriate file in LPSN, providing another means of accessing information.

In addition to the taxon entries, LPSN contains a list of all names in the Approved Lists with links to the taxon entry in LPSN and to the ATCC or DSMZ catalogue entries, and a number of pages concerning prokaryotic nomenclature, culture collections, etc.
Prokaryotic Nomenclature Up-to-Date (Leibniz Institute DSMZ, 2019)

Prokaryotic Nomenclature Up-to-Date (PNU, https://www.dsmz.de/services/online-tools/prokaryotic-nomenclature-up-to-date, accessed 9 July 2020) is a service provided by the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures. It was launched in 1993 by the DSM – Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (www.dsmz.de, accessed 9 July 2020) together with the Information Centre for European Culture Collections (ICECC, www.cordis.europa.eu/project/id/BIOT0162, accessed 9 July 2020) as Bacterial Nomenclature Up-to-Date, and it was curated by Norbert Weiss of the Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) (Anonymous, 1993). It included all bacterial names validly published since 1 January 1980 (since the Approved Lists) and all nomenclatural changes validly published since then. It was sold on floppy disks in two different electronic files: one for import into a text-processing system and the other for import into a database, with updates being sent after the publication of each new issue of the IJSB. In time, the files became available via FTP. Presently, data are accessed in Prokaryotic Nomenclature Up-to-Date by browsing genus names through alphabetized lists, by downloading a PDF listing of the new names validly published every month in IJSEM or by downloading a PDF or Excel file of a complete list of validly published names, which includes a reference to the valid publication in IJSB/IJSEM. Information may also be retrieved via PNU Web Service 1.0 in either XML or JavaScript Object Notation (JSON) format, neither of which depends on the client-side software. The system enables the retrieval of data by class, family, genus, species and subspecies name or culture collection number via a RESTful (Representational State Transfer) web service. The RESTful web services are designed for high performance, reliability and expandability.
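Because JSON is independent of the client software, a response from a nomenclature web service can be consumed with a few lines in most languages. The sketch below parses a made-up payload offline: the field names (`query`, `results`, `rank`, `name`) and the helper `names_by_rank` are assumptions chosen for illustration and are not the documented PNU Web Service schema, which should be taken from the DSMZ documentation.

```python
import json
from typing import List

# Hypothetical response body; the structure is invented for this example.
SAMPLE_RESPONSE = """
{
  "query": {"genus": "Haloferax"},
  "results": [
    {"rank": "genus", "name": "Haloferax"},
    {"rank": "species", "name": "Haloferax volcanii"}
  ]
}
"""


def names_by_rank(payload_text: str, rank: str) -> List[str]:
    """Extract the names recorded at a given rank from a JSON payload."""
    payload = json.loads(payload_text)
    return [
        entry["name"]
        for entry in payload.get("results", [])
        if entry.get("rank") == rank
    ]


if __name__ == "__main__":
    print(names_by_rank(SAMPLE_RESPONSE, "species"))
```

The same function would work unchanged on text fetched over HTTP, provided the real service returned data in this assumed shape.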
Note: In November 2019, the Leibniz Institute DSMZ acquired LPSN with the aim of combining it with PNU. This move will ensure the long-term future of LPSN, owing to more stable funding, and LPSN will be produced more efficiently using the technology available at the DSMZ (www.bacterio.net and www.lpsn.dsmz.de/, accessed 9 July 2020).

NamesforLife (Garrity, 2010)

NamesforLife (namesforlife.com, accessed 9 July 2020) was first proposed in 2003 as a service designed to ‘future-proof biological nomenclature’, using as its globally unique persistent identifier the DOI, the system for identifying ‘objects’ in the digital world, most commonly used by publishers for providing persistent links to journal articles. The DOI system has several components: a specified numbering syntax (the DOI), a resolution service (the Handle System, which connects opaque identifiers such as the DOI with metadata such as URLs), a data model (a data dictionary and a framework for applying it, ensuring interoperability) and procedures for the implementation of DOIs (Paskin, 2004). NamesforLife applies its DOIs to taxonomic concepts, names and exemplars (e.g. type strains), thus providing a standards-based means of linking the prokaryotic nomenclature to a range of online resources, such as the original literature, culture collections, sequence and physiological data. The application of DOI technology enables NamesforLife to retrieve the current nomenclatural status of a taxon as well as any historical information such as synonyms.

The Microbiology Society has integrated NamesforLife into all of its non-virology journals, including IJSEM, which has NamesforLife in both its HTML and PDF formats. In the HTML version, validly published names are displayed in a green font, indicating to users that further details concerning the taxon are available. In the first example (Fig. 3.2), clicking on the name Acetobacter activates the NamesforLife Guide widget, a popup with a number of menus. Clicking the Nomenclature menu opens a submenu from


which the user can dig deeper – clicking the Nomenclature Abstract (described below) at the top opens the taxon entry on the NamesforLife website in a separate browser tab or window; in the example here, the user has followed the submenu to Nomenclatural Events, which lists the 1898 effective publication, a 1974 emendation and the 1980 validation of the genus name in the Approved Lists. BioSafety Risk Group classification information is included where appropriate. As mentioned above, the curated NamesforLife database enables the tracking of synonyms: under the Synonyms menu, earlier or later synonyms are displayed, along with links to any synonymous taxa and citation information. The NamesforLife website has Name (Fig. 3.3) and Taxonomic Abstracts, linkable from the Guide widget, from the NamesforLife DOI embedded in a PDF file, or by using the site’s search facility. All the information in the Guide widget is available in the Name and Taxonomic Abstracts. The Taxonomic Abstract also gives the taxonomic placement according to the Taxonomic Outline of Bacteria and Archaea, a NamesforLife classification (Garrity et al., 2007).
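The two DOI components most visible to users, the numbering syntax and the resolution service, can be illustrated with the DOI of the ICNP cited in this chapter. This is a sketch only: `parse_doi` and `resolver_url` are hypothetical helpers, and the regular expression is a commonly used heuristic for modern DOIs (a `10.` prefix with a registrant code, a slash, then a suffix) rather than the full formal syntax.

```python
import re
from typing import NamedTuple


class Doi(NamedTuple):
    prefix: str  # "10." plus the registrant code
    suffix: str  # opaque string assigned by the registrant


# Heuristic pattern: prefix "10.<4-9 digits>", a slash, then a non-empty suffix.
_DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")
_LEADS = ("https://doi.org/", "http://doi.org/",
          "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def parse_doi(text: str) -> Doi:
    """Split a DOI, given bare or as a resolver URL, into prefix and suffix."""
    cleaned = text.strip()
    for lead in _LEADS:
        if cleaned.lower().startswith(lead):
            cleaned = cleaned[len(lead):]
            break
    match = _DOI_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"not recognised as a DOI: {text!r}")
    return Doi(match.group(1), match.group(2))


def resolver_url(doi: Doi) -> str:
    """Build the URL that hands the DOI to the doi.org resolution service."""
    return f"https://doi.org/{doi.prefix}/{doi.suffix}"


if __name__ == "__main__":
    icnp = parse_doi("https://dx.doi.org/10.1099/ijsem.0.000778")
    print(icnp.prefix, icnp.suffix)
    print(resolver_url(icnp))
```

Keeping the identifier separate from the resolver URL reflects the design point made above: the DOI itself is opaque and persistent, while the location it resolves to can change.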

Fig. 3.2. NamesforLife Guide widget displayed by clicking on the genus name Acetobacter in an IJSEM paper.

Fig. 3.3. NamesforLife Name Abstract for Haloferax volcanii.

Names of Prokaryotes Effectively but not Validly Published

While the IJSB/IJSEM is obviously the preferred journal for publication of original papers in which new names of prokaryotic taxa are proposed, many new names are published in other journals and in printed and/or online resources such as Bergey’s Manual of Systematics of Archaea and Bacteria (Whitman, 2015a) and in older versions of Bergey’s Manual. Such names are then effectively published, based on Rule 25a of the ICNP, but do not have standing in the nomenclature until they are later validated by inclusion in a validation list in the IJSB/IJSEM, as explained above (see further Rules 24b and 27 of the ICNP). There is no central registration of prokaryotic names that are effectively but not validly published, so there is no information about the number of names without standing in the nomenclature that have been added since the publication of the Approved Lists of Bacterial Names in January 1980 (Skerman et al., 1980).

Names of Candidatus Taxa of Prokaryotes

In the mid-1990s, proposals were published for the implementation of the provisional status Candidatus for taxa that could not (yet) be isolated in pure culture, but for which sufficient information was available so that the taxa could be recognized and described (Murray and Schleifer, 1994; Murray and Stackebrandt, 1995). A Candidatus name is by definition a preliminary name and therefore has no standing in prokaryote nomenclature. These proposals were endorsed by the ICSP, but the rank of Candidatus was not formally included in the rules of the ICNP. More information about the status of Candidatus taxa is found in Appendix 11 of the Code.

The fact that the naming of Candidatus taxa is not regulated by the ICNP means that there has never been any nomenclatural quality control for the majority of Candidatus names. Names of Candidatus taxa that feature in articles submitted to the IJSEM are subjected to the same nomenclature checks as names of taxa that will be validly published upon their publication in the journal. However, many Candidatus taxa names published elsewhere are unfortunately malformed (Oren, 2017).

According to Appendix 11 of the ICNP, a list in the form of a codified record of organisms of the status Candidatus should be kept by the Judicial Commission of the ICSP in cooperation with the Editorial Board of the IJSEM, and should be published in that journal at appropriate intervals. However, this was never implemented. Only recently, an effort was made to prepare an inventory of Candidatus names found in the literature. An annotated list of names of Candidatus taxa of prokaryotes with ranks between subspecies and class, proposed between the mid-1990s when the rank of Candidatus taxa was first established and the end of 2018, has been completed (Oren et al., 2020). The total number of names listed is 1093: one higher taxon of undefined rank, seven classes, one subclass, 12 orders, 25 families, 329 genera, 708 species and 10 subspecies. Where necessary, corrected names were proposed that comply with the current provisions of the ICNP and its Orthography appendix (Appendix 9). In addition, a second annotated list of Candidatus names published in the literature since the beginning of 2019 is in preparation. These lists, as well as later updated lists of newly published names of Candidatus taxa, and additions and corrections to the current lists, will be published periodically in the IJSEM, providing a kind of central registration of Candidatus names, even if the rank of Candidatus is currently not covered by the rules of the ICNP. It must be stressed that these tables may not be considered to be ‘Approved Lists’ of Candidatus names comparable to the 1980 Approved Lists of Bacterial Names (Skerman et al., 1980) or lists that will automatically serve as Validation Lists if and when the ICSP may decide to include Candidatus taxa under the rules of the ICNP. Their sole goal is to implement the recommendations of Appendix 11 and to provide an attempt towards an inventory of Candidatus names found in the literature.

In past years, proposals have been published to expand the type material for naming of prokaryotes to the uncultivated majority, and to allow Candidatus taxa, genome sequences, sequences from metagenomics data, etc., as type material under the nomenclature rules of the ICNP (Konstantinidis and Tiedje, 2005; Konstantinidis and Rosselló-Móra, 2015; Thompson et al., 2015; Whitman, 2015b, 2016; Konstantinidis et al., 2017; Oren and Garrity, 2018). Discussions on the merit and technical feasibility of these proposals are outside the scope of this chapter.
If indeed the ICSP will endorse these proposals in the future, the above-described lists of Candidatus taxa may form the basis for a further evaluation of which names can be validated. It is clear that not all Candidatus names will qualify. Many of the papers in which Candidatus taxa were proposed do not contain a satisfactory description of the taxon, and in some cases the name is only incidentally mentioned and no further information about the taxon is available.
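As a quick consistency check on the inventory figures quoted earlier in this section, the per-rank counts can be summed and compared with the stated total of 1093 names. The sketch below uses only the numbers given in the text; the dictionary keys are informal labels chosen for illustration, not terms defined by the ICNP.

```python
# Per-rank counts of Candidatus names as quoted in the text (Oren et al., 2020).
# The keys are informal labels chosen for this illustration.
candidatus_counts = {
    "higher taxon of undefined rank": 1,
    "class": 7,
    "subclass": 1,
    "order": 12,
    "family": 25,
    "genus": 329,
    "species": 708,
    "subspecies": 10,
}

# Sum the counts and compute the share of names at species rank.
total = sum(candidatus_counts.values())
species_share = candidatus_counts["species"] / total

print(f"total names: {total}")
print(f"share at species rank: {species_share:.1%}")
```

The sum agrees with the stated total, and roughly two-thirds of the listed names are at species rank.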


A. Oren et al.

The Special Status of the Cyanobacteria/Cyanophyta

The cyanobacteria occupy a special place in the nomenclature of prokaryotes, as the nomenclature of this group is covered by the rules of the International Code of Nomenclature for algae, fungi, and plants (ICN, the ‘Botanical Code’) (Turland et al., 2018). The 1980 Approved Lists of Bacterial Names that reset the date of priority for the valid publication of names of prokaryotes (Skerman et al., 1980) did not contain any names of cyanobacteria. After it was realized that they belong to the bacterial world because of their prokaryotic cell structure and their phylogenetic affiliation, a formal proposal was made to include the cyanobacteria under the rules of the International Code of Nomenclature of Bacteria (the ‘Bacteriological Code’, the ICNB; now the ‘Prokaryotic Code’, the ICNP) (Stanier et al., 1978). This proposal was positively evaluated by the Judicial Commission of the ICSB (Wayne, 1991). Following this proposal, many discussions were held between the authorities of the International Code of Nomenclature of Bacteria (ICNB/ICNP) and the International Code of Botanical Nomenclature (ICBN/ICN), discussions that today, nearly 40 years later, still have not led to a satisfactory solution. This is mainly because the rules of the ICN (Turland et al., 2018) differ from those of the ICNP (Parker et al., 2019). Reconciliation between the two nomenclature systems is, therefore, difficult (Oren, 2004; Oren and Tindall, 2005). For further information see, for example, Hoffmann (2005), Oren et al. (2009), Oren and Komárek (2010), Oren and Ventura (2017) and Palinska and Surosz (2014).
Two opposing proposals were submitted for discussion by the ICSP: the first proposed excluding most cyanobacteria from the rules of the ICNP and ‘returning’ the nomenclature of the cyanobacteria to the botanical authorities (Oren and Garrity, 2014b; see also Imhoff, 2014); the second proposed the consistent application of the rules of the ICNP to all cyanobacteria (Pinevich, 2015). At the time of writing, the two proposals are still waiting to be evaluated by the ICSP. A recurring problem relating to the nomenclature of the cyanobacteria is that there is no well-defined and generally agreed-upon species

concept for the group. Castenholz (1992) provided an excellent summary of the older literature. Molecular approaches based on comparisons of 16S rRNA and other genes showed that the earlier morphology-based classification and nomenclature had created many polyphyletic taxa that need to be rearranged and renamed. There are currently over 2700 species of cyanobacteria described in the literature. Based on theoretical models it was estimated that more than 6000 species may exist in nature (Nabout et al., 2013). There is no official central registration of names of cyanobacterial taxa, but the CyanoDB website – the online database of cyanobacterial genera (www.cyanodb.cz, accessed 9 July 2020) – provides excellent and well-updated information at least to the genus level, with references and some other data for the species assigned to each genus (Hauer and Komárek, 2019). Figure 3.4 gives an example of the kind of information provided for cyanobacterial genera in the CyanoDB.

Names of Fungi and Related Digital Resources

The names of fungi are governed by the same code as that used for naming plants. For most of its history this has been known as the International Code of Botanical Nomenclature (ICBN). In 2011 it was formally renamed the International Code of Nomenclature for algae, fungi, and plants (ICN). This change recognized the breadth of organisms governed by the code and acknowledged the significantly different nomenclature requirements of each organism group. The requirements for naming fungal species are:

• The species must be described in a journal or book with an ISSN or ISBN.
• The species name must be a unique binomial.
• There must be an English or Latin description or diagnosis (how it differs).
• There must be a preserved type in a recognized biological collection.
• Specific nomenclature data must be deposited in an approved repository and the issued registration number included.

Microorganisms and Data Resources – Published Names


Fig. 3.4. The entry for the genus Chroococcidiopsis in CyanoDB (www.cyanodb.cz/#/Chroococcidiopsis; accessed 31 October 2019).

• It is recommended good practice to deposit associated sequence data in GenBank and sequence alignments and phylogenetic trees in TreeBASE (https://www.treebase.org/treebase-web/home.html, accessed 9 July 2020).
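The requirements listed above can be read as a checklist. The following is a minimal sketch of such a checklist in Python, not a validator for the ICN: the class `FungalNameProposal`, its fields and the function `missing_requirements` are hypothetical names introduced for illustration, and the uniqueness of a binomial cannot be checked without consulting a name repository, so only its two-word form is tested here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FungalNameProposal:
    """Hypothetical summary of what a manuscript proposing a new species provides."""
    name: str                            # proposed binomial
    has_issn_or_isbn: bool               # publication carries an ISSN or ISBN
    description_language: Optional[str]  # "English", "Latin" or None
    type_collection: Optional[str]       # collection holding the preserved type
    registration_id: Optional[str]       # identifier issued by a repository


def missing_requirements(p: FungalNameProposal) -> List[str]:
    """Return the checklist items that the proposal does not yet satisfy."""
    problems = []
    if not p.has_issn_or_isbn:
        problems.append("publication lacks an ISSN or ISBN")
    # Only the two-word form is checked; uniqueness needs a repository lookup.
    if len(p.name.split()) != 2:
        problems.append("name is not a binomial")
    if p.description_language not in ("English", "Latin"):
        problems.append("no English or Latin description or diagnosis")
    if not p.type_collection:
        problems.append("no preserved type in a recognized collection")
    if not p.registration_id:
        problems.append("no registration identifier from an approved repository")
    return problems


# 'Examplea hypothetica' and 'XYZ Herbarium' are placeholders, not real entities.
draft = FungalNameProposal("Examplea hypothetica", True, "English", "XYZ Herbarium", None)
print(missing_requirements(draft))
```

Returning a list of unmet items, rather than a single true/false value, mirrors how an editor would report back to an author.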

The procedures for naming plants and fungi under the rules of the ICN are adequately covered elsewhere (Turland, 2019; also see Chapter 2). Hawksworth (2010) provides a useful reference to the technical terminology used in each of the biological codes and the definition of these terms. Some terms can cause confusion because they have different meanings under different codes, for example ‘valid’ (ICN) versus ‘available’ (ICZN), and ‘valid’ (ICNP) versus ‘correct’ (ICN). Here we highlight some key points in the naming of fungi with a focus on the recent and


substantial changes to the ICN introduced in the Melbourne Code (McNeill et al., 2012) and more recently the Shenzhen Code (Turland et al., 2018). We also include some discussion of the impact of changing technology and the currently available digital resources to assist in correctly naming fungi and their associated plant hosts, and the analysis of fungal data linked to names.

Effective Publication Under the ICN

Starting in 2012 names can be published electronically in addition to traditional print (Article 29.1). To be effectively published, the digital article must be available online via the World Wide Web, in a publication with a registered ISSN or ISBN number, and must be formatted as a PDF document. It is recommended that the PDF conform to the PDF/A archival standard format (www.en.wikipedia.org/wiki/PDF/A; accessed 9 July 2020). Any subsequent alterations to the document content are not effectively published. To establish priority, it is important that publications contain the correct date on which they became publicly available, preferably with pagination. The valid publication of subsequent combinations (Article 41.5) requires a full and direct reference to the basionym author and place of valid publication, with page or plate reference and date. The correct citation of digital (in particular digital-first and ‘early view’) publications can be problematic. Prior to 2012 the description or diagnosis accompanying a new taxon had to be in Latin. Starting in 2012 it became permissible to use Latin or English.
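The date-dependent conditions described in this section can be summarized in a short sketch. It follows the chapter's own simplified statement of the rules (a cut-off at the start of 2012) and is not a substitute for reading the relevant Articles; the function names and parameters are hypothetical and were chosen for illustration.

```python
from datetime import date

# Year from which electronic publication and English descriptions are
# permitted, as stated in the text ("starting in 2012").
CUTOFF_YEAR = 2012


def electronic_publication_effective(published: date, online: bool,
                                     has_issn_or_isbn: bool,
                                     file_format: str) -> bool:
    """Check the conditions the text lists for an electronic-only publication."""
    return (
        published.year >= CUTOFF_YEAR
        and online
        and has_issn_or_isbn
        and file_format.upper() == "PDF"
    )


def permitted_description_languages(published: date) -> set:
    """Languages allowed for the description or diagnosis, per the text."""
    if published.year >= CUTOFF_YEAR:
        return {"Latin", "English"}
    return {"Latin"}


print(electronic_publication_effective(date(2015, 3, 1), True, True, "pdf"))
print(sorted(permitted_description_languages(date(2010, 6, 1))))
```

Keeping the cut-off in a single named constant makes it easy to substitute an exact date if the precise wording of the Article is needed.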

Chapter F of the ICN

Chapter F of the ICN (Turland et al., 2018) is concerned with some specific rules applied to the names of organisms treated as fungi. A key point to note is that not all ‘organisms treated as fungi’ are members of the kingdom Fungi. The ICN applies to some groups that have traditionally been treated as fungi. This includes the slime moulds currently classified within the Amoebozoa, and also the downy mildews and water moulds, which are members of the kingdom Chromista.

Typification

The type specimen of a named taxon must be a single preserved collection that is either dead or metabolically inactive and conserved in a named institution (Article 8.1, 8.4). Living cultures cannot serve as nomenclatural types. Cultures that are stored by lyophilization or at liquid nitrogen temperatures may be considered metabolically inactive and may serve as nomenclatural types.

Priority, starting-point dates and hemihomonyms

The ICN, in common with other codes, adopts the ‘principle of priority’: the first valid publication of a taxon name establishes priority. For plant and fungal names treated under ICN, the starting point for establishing priority is 1 May 1753, with the publication of Linnaeus’s Species Plantarum edn 1. However, significant early comprehensive catalogues of fungal names were published later by Christiaan Persoon and Elias Fries. To stabilize the use of fungal names, some publications by these two authors have the special status of ‘sanctioned works’ (Article F.3) (en.wikipedia.org/wiki/Sanctioned_name). Any names occurring in these publications have a privileged status and are automatically treated as if conserved against all earlier synonyms. In establishing the first correct introduction of a name it is important to check the inclusion in lists of sanctioned and conserved names. After 1 January 2019 a proposed new name is illegitimate if it is a later homonym of a prokaryote or protozoan (Article F.6). It is therefore essential to consult the available databases of organism names for the existence of potential homonyms. Aside from reducing the ambiguity created by hemihomonyms (identical names covered by multiple codes) the ruling paves the way for a future harmonization of the different codes into a single code of life (Hawksworth, 2010).
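The interaction between the starting-point date, the principle of priority and sanctioned names, as described above, can be illustrated with a deliberately simplified sketch. It assumes that all candidate names are validly published synonyms for one taxon, and it ignores conserved and rejected lists, legitimacy and homonymy; `PublishedName` and `name_with_priority` are hypothetical names, and the example binomials are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable

# Starting point for priority under the ICN, as given in the text.
STARTING_POINT = date(1753, 5, 1)


@dataclass(frozen=True)
class PublishedName:
    name: str
    published: date           # date of valid publication
    sanctioned: bool = False  # appears in a sanctioned work (Article F.3)


def name_with_priority(candidates: Iterable[PublishedName]) -> PublishedName:
    """Pick the name with priority among synonyms for a single taxon.

    Simplification: a sanctioned name is preferred over any unsanctioned
    competitor; otherwise the earliest eligible name wins.
    """
    eligible = [n for n in candidates if n.published >= STARTING_POINT]
    if not eligible:
        raise ValueError("no name published on or after the starting point")
    sanctioned = [n for n in eligible if n.sanctioned]
    pool = sanctioned or eligible
    return min(pool, key=lambda n: n.published)


# Placeholder names and dates, for illustration only.
names = [
    PublishedName("Examplea prima", date(1790, 1, 1)),
    PublishedName("Examplea sancta", date(1821, 1, 1), sanctioned=True),
    PublishedName("Examplea tertia", date(1900, 1, 1)),
]
print(name_with_priority(names).name)
```

In this example the sanctioned name is selected even though an earlier synonym exists, which is the behaviour the text describes for names in sanctioned works.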


Registration of nomenclatural acts

After 1 January 2013 it became mandatory to register new names (including new combinations) with a repository, and to include the registration identifier issued by a repository in the publication establishing the nomenclatural novelty (Article F.5). Currently there are three registration repositories recognized by the Nomenclatural Committee for Fungi: IndexFungorum (www.indexfungorum.org/Names/Names.asp, accessed 9 July 2020), MycoBank (www.mycobank.org, accessed 9 July 2020) and Fungal Names (www.fungalinfo.im.ac.cn/fungalname/fungalname.html; accessed 9 July 2020). After 1 January 2019 there is also a requirement to register and cite identifiers for new designations of lectotypes, neotypes or epitypes (Article F.5.4).

Pleomorphic life cycles – One fungus: one name

Fungi can exhibit one or more asexual (anamorph) stages in addition to a sexual (teleomorph) stage, and are referred to as pleomorphic fungi (see Chapter 2). The propagules associated with each of these stages of pleomorphic fungi may exhibit quite different morphological characters and, in addition, they may also exhibit different ecological requirements and host associations. This behaviour is typical of many important plant-pathogenic fungi. Historically the connection between these different stages was often unrecognized, and consequently they were often named separately, but multiple names were accepted and persisted even when the connection was known. Until recently, the ICN was the only code of life that accepted dual nomenclature allowing these different and validly published names for the same taxon to coexist. With the advent of sequencing, it has become much easier to recognize the equivalence of these stages. Starting in 2013 dual nomenclature was dropped, and all fungal names for the same taxon compete according to the principle of priority.
This change to the ICN came about after considerable and lengthy debate within the mycological community, with competing views on how the situation should be resolved (Gams et al., 2012). An alternative proposal was that a name associated with the teleomorph stage


should take priority over names of anamorphs. Either way it was clear that abandoning dual nomenclature would result in changes to many names familiar to plant pathologists and other important end users. This process of change is ongoing.

Lists of approved and rejected names

To minimize this disruption, the ICN now also allows for lists of accepted names to be approved by the ICN Nomenclatural Committee. Names on these lists are effectively conserved over competing names (Article 14.3 and F.2). The same process may apply to lists of rejected names (Article F7.7). A significant role of these lists is to stabilize the use of generic names. Working groups were established to review names in various fungal groups (May, 2017), and several lists have been proposed and approved. Further lists have been proposed, for example names of clinical importance (de Hoog et al., 2015) and rust/smut fungi (Aime et al., 2018), but have yet to be formally approved by the ICN Nomenclatural Committee. They have, nevertheless, been adopted by mycologists. The insights provided by sequence data and the need to resolve relationships between pleomorphic fungi in many groups have led to an increased level of international collaboration and rapid progress in establishing phylogenetic relationships in many taxonomic groups, especially within the Ascomycota.

Naming cryptic diversity

The increasing availability of sequence data has revealed many unanticipated relationships and the recognition of a high degree of cryptic diversity. The universal adoption of a phylogenetic taxon concept by the mycological community has, therefore, resulted in a large-scale systematic rearrangement of fungal classification, together with many new genera and species being described in recent years. The proliferation of new genera has sometimes been based on inadequate phylogenetic support, and guidelines for establishing new fungal genera have been published (Vellinga et al., 2015). The recognition of


morphological distinctions between some taxa is often difficult or impossible, and sequencing provides the only tool for resolving identification issues. It seems likely that the current era of significant change will continue for some time.

eDNA

The development of several relatively recent techniques and technologies (next-generation or high-throughput sequencing – NGS/HTS) has enabled the sampling and identification of DNA extracted from environmental samples (eDNA). These techniques include the sampling of fungal DNA barcode regions, for example the internal transcribed spacer region (ITS) (Schoch et al., 2012), which may then be compared with reference databases to identify the species present (Caporaso et al., 2010). More recently, techniques have been developed to facilitate whole-genome shotgun sequencing of environmental samples. The application of these techniques has exposed the extent of the hidden ‘dark matter fungi’ (Grossart et al., 2016) and also greatly accelerated our understanding of fungal ecology. Reference databases of sequenced material have an increasingly important role in interpreting such data. The interpretation of eDNA data requires careful consideration owing to several factors including the presence of extracellular DNA, the quality of eDNA reads, the short length of reads from some platforms, the clustering of reads to generate operational taxonomic units (OTUs), the presence of multiple intraspecies variants of a locus, the relatively low coverage of described species in fungal reference databases and the inability of a single locus to distinguish many species, especially plant-pathogenic groups.
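To make the idea of clustering reads into OTUs concrete, the following is a minimal sketch of greedy, centroid-based clustering at a fixed similarity threshold. It is not the algorithm of any particular pipeline: it assumes the reads have already been aligned and trimmed to equal length, it measures similarity as the simple fraction of identical positions, and the 97% default is an illustrative choice. The function names are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_otus(reads: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Assign each read to the first cluster whose centroid it matches.

    The first read placed in a cluster acts as its centroid. A read that
    matches no existing centroid at or above the threshold starts a new
    cluster.
    """
    clusters: List[List[str]] = []
    for read in reads:
        for cluster in clusters:
            if identity(read, cluster[0]) >= threshold:
                cluster.append(read)
                break
        else:
            clusters.append([read])
    return clusters
```

Because assignment depends on the order of the reads and on the chosen threshold, the same data can yield different OTUs under different settings, which is one reason the text urges caution when interpreting eDNA results.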

Linking names to DNA, and DNA to names

It is a fundamental purpose of the Code to define the application of a name by the declaration of a nomenclatural type for each name, which is generally a vouchered type specimen. It is becoming increasingly important to link names

and type specimens to deposited sequence data. In 2013, 35% of all novel species had barcode loci data available, and that figure continues to rise (Crous et al., 2015). The vast number of names in the older literature are difficult to interpret within the context of a modern phylogenetic species concept. The use of these older names may be stabilized by proposing epitypes (Article 9.9) to act as a modern interpretive type, associated with deposited sequence data. This practice is to be encouraged (Ariyawansa et al., 2014). The fungal diversity revealed through eDNA includes many lineages of fungi that are currently unseen, undescribed and unculturable; these have become known as dark matter fungi, and include ancient but poorly known lineages (Menkis et al., 2014). This situation has led to a proposal that a DNA sample, even when mixed in an environmental sample, may serve as a holotype, together with an appropriate phylogenetically based description (Hawksworth et al., 2016; Ryberg and Nilsson, 2018). The proposal, in its current form, has so far been rejected (Hongsanan et al., 2018). It has become recommended practice to deposit reference sequences for the barcode loci associated with the publication of novel taxa, and especially of type material (Ariyawansa et al., 2014).

Data Standards and Databases

Correct nomenclatural practice provides a stable, reliable and unique way of referring to each taxon. The scientific name then provides the gateway to all the information associated with organisms. Increasingly these data are managed and interrogated digitally. In this digital format, the scientific name is a deficient means for recognizing linkages between different data. The deficiency is due to the inevitable variability in the way scientific names are used – including orthographic variants, misspellings, homonyms, inclusion/exclusion of authorship and the form of the authorship. To resolve any ambiguity in the formatting of names it is, therefore, ideal for each name to be associated with a permanent and universally unique identifier (www.en.wikipedia.org/wiki/Universally_unique_identifier,


accessed 9 July 2020) which may act as a proxy for the name, in turn acting as a proxy for the taxon. The establishment of these identifiers is a key function of the registration repositories. The need to cross-reference different digital data extends far beyond names. For this reason, there has been a systematic effort over the last two decades to establish openly accessible biological data standards, including taxon names. One of the principal bodies responsible for establishing these standards is Biodiversity Information Standards (TDWG, formerly the Taxonomic Databases Working Group; www.tdwg.org/standards/). TDWG established the Darwin Core (www.rs.tdwg.org/dwc/, accessed 9 July 2020), which was initially developed to facilitate the management of data associated with biological collections. One purpose of a data standard is to establish a uniform description of data elements (fields) within related groups of data, such as those associated with collections, names, locations, literature, media, people and so on. Another purpose is to define vocabularies for describing the content of data elements such as species names and place names. The Darwin Core is part of a much broader effort to establish and promote digital data standards, such as geospatial data (www.ogc.org, accessed 9 July 2020) and metadata (www.dublincore.org/, accessed 9 July 2020). These standards allow the development of generic data-management software which may be widely shared, for example the Specify collection management systems (www.sustain.specifysoftware.org/, accessed 9 July 2020). More importantly, the existence and adoption of data standards facilitates data interoperability (www.en.wikipedia.org/wiki/Interoperability, accessed 9 July 2020); that is, the ability to exchange, share, query and synthesize fragmented data that may be widely distributed. Data sharing and synthesis increasingly occur globally and in near real time.
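The role of an identifier as a proxy for a name can be sketched as follows. The example maps several written variants of one name to a single identifier and then emits a record using Darwin Core term names (`taxonID`, `scientificName`, `scientificNameAuthorship`, `taxonRank`). The normalization rule and the `NameRegistry` class are hypothetical simplifications: a real repository handles misspellings and homonyms far more carefully, and the names and authors shown are placeholders.

```python
import re
import uuid
from typing import Dict, Optional


def normalise(name: str) -> str:
    """Crude comparison key: first two words (genus and epithet), lower-cased.

    Punctuation is dropped, so authorship in parentheses does not affect the
    key. Misspellings are NOT detected by this simple rule.
    """
    words = re.sub(r"[^A-Za-z\s-]", " ", name).split()
    return " ".join(words[:2]).lower()


class NameRegistry:
    """Toy stand-in for a registration repository issuing identifiers."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def register(self, name: str) -> str:
        key = normalise(name)
        if key not in self._ids:
            self._ids[key] = str(uuid.uuid4())  # universally unique identifier
        return self._ids[key]

    def resolve(self, name: str) -> Optional[str]:
        return self._ids.get(normalise(name))


registry = NameRegistry()
# Placeholder name and authorship, for illustration only.
taxon_id = registry.register("Examplea alba (Smith) Jones")

# A record keyed by the identifier, using Darwin Core term names.
record = {
    "taxonID": taxon_id,
    "scientificName": "Examplea alba",
    "scientificNameAuthorship": "(Smith) Jones",
    "taxonRank": "species",
}
print(registry.resolve("examplea  ALBA") == record["taxonID"])
```

Once data sets share the identifier rather than the name string, they can be linked without first reconciling every spelling and authorship variant.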
The Global Biodiversity Information Facility (GBIF, www.gbif.org, accessed 9 July 2020) is a good example of a global-scale information infrastructure based on open-data standards. GBIF is an intergovernmental organization currently with 60 participant countries plus many organizations. It provides centralized access to over 1.3 billion biodiversity species occurrence records from 46,000 different data sources, using open-data standards to


dynamically harvest and integrate data from these databases scattered around the globe. Central to the operation of GBIF is a global list of taxonomic names, in part provided by the global Catalogue of Life (CoL, www.catalogueoflife.org/, accessed 9 July 2020), which is an initiative to assemble and maintain a standard list of names of all organisms. Large-scale, standards-based digital infrastructure such as GBIF are becoming increasingly important as a means of synthesizing information about the status and change of life on earth. In the next section we briefly outline some of the principal digital resources supporting the creation, use and interpretation of scientific names relating to fungi and related areas.

Digital Resources

Names

New names for fungi can be validly introduced in any publication, provided the requirements of the Code are satisfied. Until the introduction of name registration, comprehensive lists of fungal names were compiled manually by reviewing all the available literature. From 1940, this service was provided through the Index of Fungi published by the Commonwealth Mycological Institute, which became the International Mycological Institute and now CABI (www.cabi.org/publishingproducts/online-information-resources/indexof-fungi/, accessed 13 July 2020). In the 1980s, a database of names was created from the printed product and later supplemented by digitized data from other catalogues, such as the older Saccardo’s Sylloge Fungorum (1877–1886) and Petrak’s Index of Fungi (1936–1939), and lichen names from Lamb’s Index Nominum Lichenum (1963) and Zahlbruckner’s Catalogus Lichenum Universalis (1921–1940). At the same time, data for the higher classification of fungi were established to support the regular publication of the Dictionary of the Fungi (Kirk et al., 2008). The resulting databases were made available online starting in the late 1990s as IndexFungorum (IF; http://www.indexfungorum.org/names/names.asp, accessed 15 October 2020). Currently IF is managed as an open resource by the IF Partnership and supported by the Royal


Botanic Gardens, Kew. IF provides an equivalent resource to the International Plant Names Index (IPNI), also maintained by RBG Kew. The IF baseline data were made available to support the introduction of name registration in 2013 by three repositories: IndexFungorum, MycoBank and Fungal Names. Each of these repositories provides an equivalent service and maintains equivalent data. Crous et al. (2004) and Robert et al. (2013) provide background information on MycoBank (www.mycobank.org, accessed 9 July 2020). Figure 3.5 shows an example of the information found in MycoBank. ZooBank (www.zoobank.org, accessed 9 July 2020) provides the equivalent registration service for names covered by the zoological code (ICZN, 1999). There is currently no authoritatively maintained single global resource covering names for all organisms, although uBio and the Global Names Architecture (GNA) provide useful exemplars. Such a resource could be used to check for all homonyms across codes (hemihomonyms). A resource for checking potential generic homonyms across all organism groups is provided by the Interim Register of Marine and Nonmarine Genera (IRMNG, www.irmng.org, accessed 9 July 2020).
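The kind of cross-code check described above can be sketched in a few lines, assuming that lists of generic names are available separately for each code. The function `find_hemihomonyms` is a hypothetical helper and the generic names in the example are placeholders; in practice the lists would be drawn from resources such as those named in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def find_hemihomonyms(names_by_code: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return generic names that occur under more than one code.

    Keys of the input are code labels (e.g. 'ICN'); values are the generic
    names governed by that code. Names are compared after trimming
    whitespace and normalizing capitalization.
    """
    codes_for_name: Dict[str, Set[str]] = defaultdict(set)
    for code, names in names_by_code.items():
        for name in names:
            codes_for_name[name.strip().capitalize()].add(code)
    return {n: c for n, c in codes_for_name.items() if len(c) > 1}


# Placeholder generic names, for illustration only.
example = {
    "ICN": ["Examplea", "Samplea"],
    "ICNP": ["Testobacter", "examplea "],
    "ICZN": ["Samplea", "Demoides"],
}
for genus, codes in sorted(find_hemihomonyms(example).items()):
    print(genus, sorted(codes))
```

A proposed new generic name could be run against such a table before submission, in line with the advice to consult name databases for potential homonyms.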

Taxa

The focus of digital resources such as IF and MycoBank is scientific names, where they were published and their nomenclatural status under ICN. They are nomenclatural databases and, in this respect, they are as complete as possible, and are continually improved and extended. A secondary activity is to provide taxonomic opinion; that is, to provide a consensus view on the currently preferred name for a taxon, together with lists of both homotypic (obligate/objective) and heterotypic (subjective) synonyms for this preferred name. Both MycoBank and SpeciesFungorum (based on IF) provide information on taxonomic opinion, but with limited information on the sources of opinion, or the evidence for consensus within the community. For fungi, the establishment and management of an effective resource would require substantial ongoing effort. One approach would be the development

of an open platform and broad community involvement, such as that provided by the World Register of Marine Species (WoRMS, marinespecies.org, accessed 9 July 2020) for the classification of marine species. Within the plant community, the most widely adopted global-scale taxonomic resource is Plants of the World Online (PoWo, www.plantsoftheworldonline.org, accessed 9 July 2020), although the taxonomic opinion presented in PoWo is currently patchy both geographically and taxonomically. The Taxonomic Name Resolution Service (TNRS, www.tnrs.iplantcollaborative.org, accessed 9 July 2020) is a software tool for converting scientific names of plants into a standard form and resolving them to accepted names following the Tropicos databases (www.tropicos.org, accessed 9 July 2020). Another important global initiative covering all species is the Catalogue of Life (CoL, www.catalogueoflife.org, accessed 9 July 2020). CoL is a global-scale project to regularly collate information from numerous Global Species Databases (GSD, globalspecies.org, accessed 9 July 2020) into a single combined catalogue of preferred names of taxa. Currently CoL covers 1.8 million species aggregated from 168 GSD. The currency of CoL names, and the completeness of synonymic names, varies according to taxonomic sector and to the status and open availability of GSD. There is a growing list of standards-based, open-software tools and data that facilitate taxonomic data-management processes. The ROpenSci library (www.ropensci.org, accessed 9 July 2020) is a good example of these kinds of frameworks that include taxonomic data-management packages.
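The distinction drawn in this section between nomenclatural data and taxonomic opinion can be illustrated with a small data structure in which each synonym records its preferred name, whether it is homotypic or heterotypic, and the source of the opinion. This is a minimal sketch under stated assumptions: the `Synonym` class and `resolve_name` function are hypothetical, and the names and source shown are placeholders rather than entries from any database.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Synonym:
    accepted_name: str             # currently preferred name for the taxon
    kind: str                      # "homotypic" or "heterotypic"
    source: Optional[str] = None   # where the taxonomic opinion comes from


def resolve_name(name: str, accepted: Set[str],
                 synonymy: Dict[str, Synonym]) -> Optional[str]:
    """Return the preferred name for `name`, or None if it is unknown."""
    if name in accepted:
        return name
    synonym = synonymy.get(name)
    return synonym.accepted_name if synonym else None


# Placeholder names and source, for illustration only.
accepted_names = {"Examplea alba"}
synonyms = {
    "Samplea alba": Synonym("Examplea alba", "homotypic", "Hypothetical checklist 2020"),
    "Examplea pallida": Synonym("Examplea alba", "heterotypic"),
}

print(resolve_name("Examplea pallida", accepted_names, synonyms))
```

Recording the source alongside each synonym addresses the gap noted above, namely that current resources give limited information on where an opinion originates.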

Descriptive data

Another significant component of a global information infrastructure is the descriptive data associated with each taxon. Unfortunately, existing digital descriptive data resources for fungi are scattered, taxonomically and geographically restricted, and are often not managed within an open standards-based framework. Fungal descriptive data may be found in such resources as the Encyclopedia of Life (EOL, www.eol.org, accessed 9 July 2020), MycoBank


Fig. 3.5. The entry for the species Hortaea werneckii in MycoBank (partial view) (www.mycobank.org/BioloMICSDetails.aspx?Rec=13049; accessed 31 October 2019).


(www.mycobank.org, accessed 9 July 2020), Wikipedia (www.wikipedia.org, accessed 9 July 2020), Wikispecies (https://species.wikimedia.org/wiki/Main_Page, accessed 9 July 2020), and Wikidata (www.wikidata.org/wiki/Wikidata:Main_Page, accessed 9 July 2020). In addition, there are numerous taxonomically or geographically focused resources for descriptive mycological data (e.g. Johnston et al., 2017).

Sequence-related databases

There is an increasingly important role for sequence databases in fungal identification. GenBank (www.ncbi.nlm.nih.gov/genbank, accessed 9 July 2020), as part of the International Nucleotide Sequence Database Collaboration, plays a key role as the primary open repository for sequence data. It is well known that GenBank contains many incorrectly identified and poorly annotated sequences (Bridge et al., 2003). It is, therefore, important to identify high-quality data within GenBank, such as the subset associated with key publications, and especially the subset of data associated with barcode loci and type material, which are tagged as RefSeq (www.ncbi.nlm.nih.gov/refseq, accessed 9 July 2020) targeted locus entries (Schoch et al., 2014). The National Center for Biotechnology Information (NCBI) Sequence Read Archive (www.ncbi.nlm.nih.gov/sra, accessed 9 July 2020) provides a repository for environmental sequence data generated by HTS, and searching such databases will be increasingly important as a source of species occurrence data. The International Barcode of Life (iBOL, www.ibol.org, accessed 9 July 2020) is the international initiative driving the adoption and use of barcode loci for species identification, and the Barcode of Life Data Systems (BOLD; www.barcodinglife.org, accessed 9 July 2020) provides online access to the available data. The inclusion of sequence data in the BOLD database requires adherence to several data-processing and data-quality standards, whereas GenBank will contain many additional sequences for barcode loci that may or may not achieve the same standard. The UNITE (eukaryotic nuclear ribosomal ITS region) database (https://unite.ut.ee, accessed 9 July 2020) was originally established as a

web-based tool for the identification of sequences of ectomycorrhizal fungi, although it now has much broader application. UNITE maintains a database of reference sequences which may be incorporated into standard workflows for analysing HTS data, for example Quantitative Insights into Microbial Ecology (QIIME) (Caporaso et al., 2010). An important innovation by UNITE was to generate unique identifiers for species hypotheses (SH) linked to reference sequences (Köljalg et al., 2013). UNITE SH identifiers represent the OTUs derived by clustering sequence data at some threshold (> 97% similarity). Many of these OTUs can be unambiguously linked to reference sequences for named taxa. However, SH identifiers may also refer to taxa that do not currently have species names. They facilitate identification and communication of the substantial unnamed dark matter fungi. There is a widespread practice of establishing similarity of sequence data to reference taxa using the basic local alignment search tool (BLAST) (https://blast.ncbi.nlm.nih.gov/Blast.cgi, accessed 9 July 2020). However, BLAST similarity is not always a good indication of phylogenetic relationships. It is always recommended that taxon relationships are confirmed through a phylogenetic analysis of sequence alignments of selected and similar representative taxa employing maximum likelihood estimation, for example Randomized Axelerated Maximum Likelihood (RAxML) (Stamatakis, 2006) and/or Bayesian analysis, for example MrBayes 3 (Ronquist and Huelsenbeck, 2003). In the last two decades we have witnessed an exponential growth in the generation and use of sequence data to establish and confirm species concepts (see Chapters 11, 12, 13, 14, 15). Initially this work was based on one or few loci of relatively short sequence length, including the now-established barcode loci. However, single barcode loci are often unable to distinguish species in some important groups, such as plant pathogens. 
In addition, single loci, which are often good at distinguishing species-level differences (such as ITS), are simultaneously poor at resolving the deeper phylogenetic relationships between species. Consequently, there is an increasing use of multiple loci to establish species concepts and relationships (Jayawardena et al., 2019). The natural progression of this approach is to employ full genome data and to


analyse many thousands of appropriate loci. Generating full genome data is becoming cheaper and easier, and the tools available to analyse these data continue to improve. A steady progression to a phylogenomic-based approach (en.wikipedia.org/wiki/Phylogenomics) is likely in the coming years and will necessitate an evolution in the resources and tools we employ to discover and name fungi.

Conclusion

The naming of microorganisms is governed by different codes of nomenclature: for prokaryotes (excluding most cyanobacteria) the ICNP; for yeasts and other microfungi, and also most cyanobacteria, the ICN; and for protists (not further discussed here) the ICZN (International Commission on Zoological Nomenclature, 1999). In the past there have been attempts towards the harmonization of all biological nomenclature by the establishment of a single ‘BioCode’. Different draft versions of this BioCode have been published, the last one dating from 2011 (Greuter et al., 2011; Hawksworth, 2011). As far as we are aware, little progress has been made since, and we do not expect that a universal BioCode will soon replace the separate codes of nomenclature for prokaryotes, plants (including algae and fungi), animals and viruses.

One of the basic features of the Prokaryotic Code – the ICNP – is the central registration of validly published names. This makes it very easy to search for names with standing in the nomenclature, and these names can be retrieved from different online resources. As explained above, an effort towards registration of Candidatus names is now under way. The efforts made by the curators of the CyanoDB online database (thus far only at the genus level) have greatly simplified access to nomenclatural information on this group, whose status within the ICNP and the ICN still needs to be discussed. The recent introduction of the concept of central registration of validly published names of fungi (Article F.5; Turland et al., 2018) is an important step forward, and fungal name registration resources such as Index Fungorum and MycoBank already provide this information in a single framework. Two decades ago, universal registration for any group of organisms covered by the rules of the ‘Botanical Code’ (ICBN, now the ICN) would have been unthinkable. It is interesting to read the last paragraphs of the Preface to the 2000 version of that code (the ‘Saint Louis code’; Greuter and Hawksworth, 2000) to discover how times have changed. We can only hope that the good example given by the fungal taxonomists will soon be followed by the experts on other groups of organisms whose nomenclature is regulated by the ICN.

References

Aime, M.C., Castlebury, L.M., Abbasi, M., Begerow, D., Berndt, R. et al. (2018) Competing sexual and asexual generic names in Pucciniomycotina and Ustilaginomycotina (Basidiomycota) and recommendations for use. IMA Fungus 9, 75–89. DOI: 10.5598/imafungus.2018.09.01.06 Anonymous (1993) Book announcement: Bacterial nomenclature up-to-date. FEMS Microbiology Letters 106(3), ii. DOI: 10.1111/j.1574-6968.1993.tb05967.x Ariyawansa, H.A., Hawksworth, D.L., Hyde, K.D., Jones, E.G., Maharachchikumbura, S.S. et al. (2014) Epitypification and neotypification: guidelines with appropriate and inappropriate examples. Fungal Diversity 69, 57–91. DOI: 10.1007/s13225-014-0315-4 Bridge, P.D., Roberts, P.J., Spooner, B.M. and Panchal, G. (2003) On the unreliability of published DNA sequences. New Phytologist 160, 43–48. DOI: 10.1046/j.1469-8137.2003.00861.x Buchanan, R.E., Holt, J.G. and Lessel, E.F. (1966) Index Bergeyana. An Annotated Alphabetic Listing of Names of the Taxa of the Bacteria. Williams & Wilkins, Baltimore, Maryland. Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D. et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nature Methods 7, 335–336. DOI: 10.1038/nmeth.f.303 Castenholz, R.W. (1992) Species usage, concept, and evolution in the cyanobacteria (blue-green algae). Journal of Phycology 28, 737–745. DOI: 10.1111/j.0022-3646.1992.00737

Crous, P.W., Gams, W., Stalpers, J.A., Robert, V. and Stegehuis, G. (2004) MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50, 19–22. Crous, P.W., Hawksworth, D.L. and Wingfield, M.J. (2015) Identifying and naming plant-pathogenic fungi: past, present, and future. Annual Review of Phytopathology 53, 247–267. DOI: 10.1146/annurev-phyto-080614-120245 de Hoog, G.S., Chaturvedi, V., Denning, D.W., Dyer, P.S., Frisvad, J.C. et al. (2015) Name changes in medically important fungi and their implications for clinical practice. Journal of Clinical Microbiology 53, 1056–1062. DOI: 10.1128/JCM.02016-14 Euzéby, J.P. (1997) List of Bacterial Names with Standing in Nomenclature: a folder available on the Internet. International Journal of Systematic Bacteriology 47, 590–592. DOI: 10.1099/00207713-47-2-590 Farr, E.R. and Zijlstra, G. (1996+) Index Nominum Genericorum (Plantarum). Available at: botany.si.edu/ing/ (accessed 5 August 2019) Gams, W., Humber, R.A., Jaklitsch, W., Kirschner, R. and Stadler, M. (2012) Minimizing the chaos following the loss of Article 59: Suggestions for a discussion. Mycotaxon 119, 495–507. https://doi.org/10.5248/119.495 Garrity, G. (2010) NamesforLife Browser Tool takes expertise out of the database and puts it right in the browser. Microbiology Today 2(2), 9. Garrity, G.M., Lilburn, T.G., Cole, J.R., Harrison, S.H., Euzéby, J. et al. (2007) Taxonomic Outline of the Bacteria and Archaea, Release 7.7. Michigan State University Board of Trustees. DOI: 10.1601/TOBA7.7 Gevers, D., Cohan, F.M., Lawrence, J.G., Spratt, B.G., Coenye, T. et al. (2005) Re-evaluating prokaryotic species. Nature Reviews Microbiology 3, 733–739. DOI: 10.1038/nrmicro1236 Greuter, W. and Hawksworth, D.L. (2000) Preface. In: Greuter, W., McNeill, J., Barrie, F.R., Burdet, H.M., Demoulin, V. et al. (eds) International Code of Botanical Nomenclature (Saint Louis Code). Koeltz Scientific Books, D-61453 Königstein, Germany, pp. vii–xviii. 
Available at: https://archive.bgbm.org/iapt/ nomenclature/code/SaintLouis/0000St.Luistitle.htm (accessed 13 August 2019) Greuter, W., Garrity, G., Hawksworth, D.L., Jahn, R., Kirk, P.M. et al. (2011) Draft BioCode (2011). Principles and Rules regulating the naming of organisms. New draft, revised in November 2010. Taxon 60, 201–212; Bionomina 3, 26–44. DOI: dx.doi.org/10.11646/bionomina.3.1.3 Grossart, H.-P., Wurzbacher, C., James, T.Y. and Kagami, M. (2016) Discovery of dark matter fungi in aquatic ecosystems demands a reappraisal of the phylogeny and ecology of zoosporic fungi. Fungal Ecology 19, 28–38. https://doi.org/10.1016/j.funeco.2015.06.004 Hauer, T. and Komárek, J. (2019) CyanoDB.cz 2.0 – On-line database of cyanobacterial genera – Worldwide electronic publication, University of South Bohemia & Institute of Botany AS CR. Available at: www.cyanodb.cz (accessed 5 August 2019) Hawksworth, D.L. (2010) Terms Used in Bionomenclature.The Naming of Organisms (and Plant Communities). Global Biodiversity Information Facility, Copenhagen, 216 pp, accessible online at www.gbif.org/document/80577. ISBN: 87-92020-09-7. Hawksworth, D.L. (2011) Introducing the Draft BioCode (2011) Bionomina 3, 24–25. DOI: dx.doi. org/10.11646/bionomina.3.1.2 Hawksworth, D.L., Hibbett, D.S., Kirk, P.M. and Lucking, R. (2016) (308–310) Proposals to permit DNA sequence data to serve as types of names of fungi. Taxon 6, 899–900. https://doi.org/10.12705/654.31 Hill, L.R., Skerman, V.B.D. and Sneath P.H.A. (eds) (1984) Corrigenda to the Approved Lists of Bacterial Names. International Journal of Systematic Bacteriology 34, 508–511. Hoffmann, L. (2005) Nomenclature of Cyanophyta/Cyanobacteria: roundtable on the unification of the nomenclature under the Botanical and Bacteriological Codes. Algological Studies 117(6), 13–29. DOI: 10.1127/1864-1318/2005/0117-0013 Hongsanan, S., Jeewon, R., Purahong, W., Xie, N., Liu, J.-K. et al. (2018) Can we use environmental DNA as holotypes? 
Fungal Diversity 92, 1–30. https://doi.org/10.1007/s13225-018-0404-x ICZN (1999) International Code of Zoological Nomenclature. Fourth Edition. The International Trust for Zoological Nomenclature, London. 306 pp. ISBN: 0 85301 006 4. Also available at https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/ (accessed 14 August 2019). Imhoff, J.F. (2014) International Committee on Systematics of Prokaryotes. Subcommittee on the taxonomy of phototrophic bacteria. Minutes of the closed online meeting, 10–30 June 2014. International Journal of Systematic and Evolutionary Microbiology 64, 3910–3912. DOI: 10.1099/ijs.0.068908-0 Index Fungorum. Available at: www.indexfungorum.org (accessed 10 September 2019) Indexing & Organizing Biological Names. uBio. Available at: ubio.org (accessed 5 August 2019)

International Commission on Zoological Nomenclature (1999) International Code of Zoological Nomenclature. Fourth Edition. Available at https://www.iczn.org/the-code/the-international-code-of-zoological-nomenclature/the-code-online/ (accessed 14 August 2019). Jayawardena, R.S., Hyde, K.D., Jeewon, R., Ghobad-Nejhad, M., Wanasinghe, D.N. et al. (2019) One stop shop II: taxonomic update with molecular phylogeny for important phytopathogenic genera: 26–50 (2019). Fungal Diversity 94, 41–129. DOI: https://doi.org/10.1007/s13225-019-00418-5. Johnston, P.R., Weir, B.S. and Cooper, J.A. (2017) Open data on fungi and bacterial plant pathogens in New Zealand. Mycology 8(2), 59–66. DOI: dx.doi.org/10.1080/21501203.2016.1278409 Kirk, P.M., Cannon, P.F., Minter, D.W. and Stalpers, J.A. (2008) Dictionary of the Fungi, 10th edn. CABI, Wallingford, UK. Köljalg, U., Nilsson, R.H., Abarenkov, K., Tedersoo, L., Taylor, A.F. et al. (2013) Towards a unified paradigm for sequence-based identification of fungi. Molecular Ecology 22, 5271–5277. DOI: 10.1111/mec.12481 Konstantinidis, K.T. and Rosselló-Móra, R. (2015) Classifying the uncultivated microbial majority: A place for metagenomic data in the Candidatus proposal. Systematic and Applied Microbiology 38, 223–230. DOI: 10.1016/j.syapm.2015.01.001. Konstantinidis, K.T. and Tiedje, J.M. (2005) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the USA 102, 2567–2572. DOI: 10.1073/pnas.0409727102 Konstantinidis, K.T., Rosselló-Móra, R. and Amann, R. (2017) Uncultivated microbes in need of their own taxonomy. The ISME Journal 11, 2399–2406. DOI: 10.1038/ismej.2017.113 Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Germany, Prokaryotic Nomenclature Up-to-Date [August 2019]. Available at: www.dsmz.de/bacterial-diversity/prokaryotic-nomenclature-up-to-date (accessed 5 August 2019) Ludwig, W., Schleifer, K.H. and Whitman, W.B. 
(2009) Revised road map to the phylum Firmicutes. In: De Vos, P., Garrity, G.M., Jones, D., Krieg, N.L., Ludwig, W. et al. (eds) Bergey’s Manual of Systematic Bacteriology. Vol. 3. The Firmicutes. Springer, New York. DOI: 10.1002/9781118960608. bm00025 Ludwig, W., Euzéby, J., and Whitman, W.B. (2010) Taxonomic outline of the Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes. In: Krieg N.L., Staley, J.T., Brown, D.R., Hedlund, B.P., Paster, B.J. et al. (eds) Bergey's Manual of Systematic Bacteriology. Vol. 4, The Bacteroidetes, Spirochaetes, Tenericutes (Mollicutes), Acidobacteria, Fibrobacteres, Fusobacteria, Dictyoglomi, Gemmatimonadetes, Lentisphaerae, Verrucomicrobia, Chlamydiae, and Planctomycetes. Springer, New York. May, T.W. (2017) Report of the Nomenclature Committee for Fungi – 21: Lists from working groups. IMA Fungus 8, 205–210. https://doi.org/10.12705/662.16 McNeill, J., Barrie, F.R., Buck, W.R. Demoulin, V., Greuter, W. et al. (eds) (2012) International Code of Nomenclature for algae, fungi, and plants (Melbourne Code) adopted by the Eighteenth International Botanical Congress Melbourne, Australia, July 2011. [Regnum Vegetabile no. 154.] A.R.G. Gantner Verlag, Ruggell, Liechtenstein. https://doi.org/10.12705/Code.2018 Menkis, A., Urbina, H., James, T.Y. and Rosling, A. (2014) Archaeorhizomyces borealis sp. nov. and a sequence-based classification of related soil fungal species. Fungal Biology 118, 943–955. DOI: 10.1016/j.funbio.2014.08.005 Murray, R.G.E. and Schleifer, K.H. (1994) Taxonomic notes: a proposal for recording the properties of putative taxa of procaryotes. International Journal of Systematic Bacteriology 44, 174–176. DOI: 10.1099/00207713-44-1-174 Murray, R.G.E. and Stackebrandt, E. (1995) Taxonomic Note: implementation of the provisional status Candidatus for incompletely described procaryotes. 
International Journal of Systematic Bacteriology 45, 186–187. DOI: 10.1099/00207713-45-1-186 Nabout, J.C., da Silva Rocha, B., Carniero, F.M. and Sant’Anna, C.L. (2013) How many species of cyanobacteria are there? Using a discovery curve to predict the species number. Biodiversity and Conservation 22, 2907–2918. DOI: 10.1007/s10531-013-0561-x NamesforLife. Available at: namesforlife.com (accessed 5 August 2019) Nomenclator Zoologicus. Available at: ubio.org/index.php?pagename=NZ (accessed 5 August 2019) Oren, A. (2004) A proposal for further integration of the cyanobacteria under the Bacteriological Code. International Journal of Systematic and Evolutionary Microbiology 54, 1895–1902. DOI: 10.1099/ ijs.0.03008-0

Oren, A. (2017) A plea for linguistic accuracy – also for Candidatus taxa. International Journal of Systematic and Evolutionary Microbiology 67, 1085–1094. DOI: 10.1099/ijsem.0.001715 Oren, A. and Garrity, G.M. (2013) Retirement of Professor Jean Paul Euzéby as List Editor. International Journal of Systematic and Evolutionary Microbiology 63, 2373. DOI: 10.1099/ijs.0.052316-0 Oren, A. and Garrity, G.M. (2014a) Then and now: a systematic review of the systematics of prokaryotes in the last 80 years. Antonie van Leeuwenhoek 106, 43–56. DOI: 10.1007/s10482-013-0084-1 Oren, A. and Garrity, G.M. (2014b) Proposal to change General Consideration 5 and Principle 2 of the International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 64, 309–310. DOI: 10.1099/ijs.0.059568-0 Oren, A. and Garrity, G.M. (2018) Uncultivated microbes – in need of their own nomenclature? The ISME Journal 12, 309–311. DOI: 10.1038/ismej.2017.113 Oren, A. and Komárek, J. (2010) Nomenclature of the Cyanobacteria/Cyanophyta – current problems and proposed solutions. Notes based on a roundtable discussion held on August 16, 2010 during the 18th Symposium of the International Association for Cyanophyte Research, České Budějovice, Czech Republic. The Bulletin of BISMiS 1, 25–33. ISSN 2159-287X Oren, A. and Rupnik, M. (2018) Clostridium difficile and Clostridioides difficile: Two validly published and correct names. Anaerobe 52, 125–126. DOI: 10.1016/j.anaerobe.2018.07.005 Oren, A. and Tindall, B.J. (2005) Nomenclature of the cyanophyta/cyanobacteria/cyanoprokaryotes under the International Code of Nomenclature of Prokaryotes. Algological Studies 117, 39–52. DOI: 10.1127/1864-1318/2005/0117-0039 Oren, A. and Trujillo, M.E. (2019) On the valid publication of names of mycobacteria. Comments on “Same meat, different gravy: ignore the new names of mycobacteria” by E. Tortoli et al. (Eur Respir J 2019; 54: 1900795). 
European Respiratory Journal 2019, 1901483. DOI: 10.1183/13993003.01483-2019 Oren, A. and Ventura, S. (2017) The current status of cyanobacterial nomenclature under the “prokaryotic” and the “botanical” code. Antonie van Leeuwenhoek 110, 1257–1269. DOI: 10.1007/s10482-017-08480 Oren, A., Komárek, J. and Hoffmann, L. (2009) Nomenclature of the Cyanophyta/Cyanobacteria/Cyanoprokaryotes: what has happened since IAC Luxembourg? Algological Studies 130, 17–26. DOI: 10.1127/1864-1318/2009/0130-0017 Oren, A., da Costa, M.S., Garrity, G.M., Rainey, F.A., Rosselló-Móra, R. et al. (2015) Proposal to include the rank of phylum in the International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 65, 4284–4287. DOI: 10.1099/ijsem.0.000664 Oren, A., Chuvochina, M., Garrity, G.M. and Trujillo, M.E. (2020) Lists of names of prokaryotic Candidatus taxa. International Journal of Systematic and Evolutionary Microbiology 70, 3956–4042. DOI: 10.1099/ ijsem.0.003789 Palinska, K.A. and Surosz, W. (2014) Taxonomy of cyanobacteria: a contribution to consensus approach. Hydrobiologia 740, 1–11. DOI: 10.1007/s10750-014-1971-9 Parker, C.T., Tindall, B.J. and Garrity, G.M. (2019) International Code of Nomenclature of Prokaryotes. Prokaryotic Code (2008 revision). International Journal of Systematic and Evolutionary Microbiology 69, S1–S111. DOI: 10.1099/ijsem.0.000778 Parte, A.C. (2014) LPSN – list of prokaryotic names with standing in nomenclature. Nucleic Acids Research 42, D613–D616. DOI: 10.1093/nar/gkt1111 Parte, A.C. (2018) LPSN – List of Prokaryotic names with Standing in Nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology 68, 1825–1829. DOI: 10.1099/ ijsem.0.002786 Paskin, N. (2004) Digital Object Identifiers for scientific data. Paper presented at 19th International CODATA Conference, Berlin, November, 7–10, 2004. www.codata.info/04conf/papers/Paskin-paper.pdf Pinevich, A.V. 
(2015) Proposal to consistently apply the International Code of Nomenclature of Prokaryotes (ICNP) to names of the oxygenic photosynthetic bacteria (cyanobacteria), including those validly published under the International Code of Botanical Nomenclature (ICBN)/International Code of Nomenclature for algae, fungi, and plants (ICN), and proposal to change Principle 2 of the ICNP. International Journal of Systematic and Evolutionary Microbiology 65, 1070–1074. DOI: 10.1099/ijs.0.000034 Robert, V., Stegehuis, G. and Stalpers, J. (2005) The MycoBank engine and related databases. Available at www.MycoBank.org (accessed 5 August 2019) Robert, V., Vu, D., Amor, A.B.H., van de Wiele, N., Brouwer, C. et al. (2013) MycoBank gearing up to new horizons. IMA Fungus 4, 371–379. DOI: 10.5598/imafungus.2013.04.02.16

Ronquist, F. and Huelsenbeck, J.P. (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. https://doi.org/10.1093/bioinformatics/btg180 Rosselló-Mora, R. and Amann, R. (2001) The species concept for prokaryotes. FEMS Microbiology Reviews 25, 39–67. DOI: 10.1111/j.1574-6976.2001.tb00571.x Rosselló-Móra, R. and Amann, R. (2015) Past and future species definitions for Bacteria and Archaea. Systematic and Applied Microbiology 38, 209–216. DOI: 10.1016/j.syapm.2015.02.001 Ryberg, M. and Nilsson, R.H. (2018) New light on names and naming of dark taxa. MycoKeys 23, 31–39. DOI: 10.3897/mycokeys.30.24376 Schoch, C.L., Seifert, K.A., Huhndorf, S., Robert, V., Spouge, J.L. et al. (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences of the USA 109, 6241–6246. DOI: 10.1073/pnas.1117018109 Schoch, C.L., Robbertse, B., Robert, V., Vu, D., Cardinali, G. et al. (2014) Finding needles in haystacks: linking scientific names, reference specimens and molecular data for fungi. Database 2014, 1–21. DOI: https://doi.org/10.1093/database/bau061 Skerman, V.B.D., McGowan, V. and Sneath, P.H.A. (1980) Approved Lists of Bacterial Names. International Journal of Systematic Bacteriology 30, 225–420. DOI: 10.1099/00207713-30-1-225 Skerman, V.B.D., McGowan, V. and Sneath, P.H.A. (1989) Approved Lists of Bacterial Names (Amended Edition). American Society for Microbiology, Washington, DC. Stamatakis, A. (2006) RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690. https://doi.org/10.1093/bioinformatics/btl446 Stanier, R.Y., Sistrom, W.R., Hansen, T.A., Whitton, B.A., Castenholz, R.W. et al. (1978) Proposal to place the nomenclature of the cyanobacteria (blue-green algae) under the rules of the International Code of Nomenclature of Bacteria. 
International Journal of Systematic Bacteriology 28, 335–336. Taylor, J.W., Jacobson, D.J., Kroken, S., Kasuga, T., Geiser, D.M. et al. (2000) Phylogenetic species recognition and species concepts in fungi. Fungal Genetics and Biology 31, 21–32. DOI: 10.1006/fgbi.2000.1228 Thompson, C.C., Amaral, G.R., Campeão, M., Edwards, R.A., Polz, M.E. et al. (2015) Microbial taxonomy in the post-genomic era: Rebuilding from scratch? Archives of Microbiology 197, 359–370. DOI: 10.1007/s00203-014-1071-2 Tindall, B.J. (1999) Misunderstanding the Bacteriological Code. International Journal of Systematic and Evolutionary Microbiology 49, 1313–1316. DOI: 10.1099/00207713-49-3-1313 Tindall, B.J. and Euzéby, J.P. (2006) Proposal of Parvimonas gen. nov. and Quatrionicoccus gen. nov. as replacements for the illegitimate, prokaryotic, generic names Micromonas Murdoch and Shah 2000 and Quadricoccus Maszenan et al. 2002, respectively. International Journal of Systematic and Evolutionary Microbiology 56, 2711–2713. DOI: 10.1099/ijs.0.64338-0 Tindall, B.J., Kämpfer, P., Euzéby, J. and Oren, A. (2006) Valid publication of names of prokaryotes according to the rules of nomenclature: past history and current practice. International Journal of Systematic and Evolutionary Microbiology 56, 2715–2720. DOI: 10.1099/ijs.0.64780-0 Tortoli, E., Brown-Elliott, B.A., Chalmers, J.D., Cirillo, D.M., Daley, C.L. et al. (2019) Same meat, different gravy: ignore the new names of mycobacteria. European Respiratory Journal 54, 1900795. DOI: 10.1183/13993003.00795-2019 Trujillo, M.E., Oren, A. and Garrity, G.M. (2019) Preparation of the Validation Lists and the role of the List Editors. International Journal of Systematic and Evolutionary Microbiology 69, 3–4. DOI: 10.1099/ ijsem.0.003106 Turland, N.J. (2019) The Code Decoded. A User’s Guide to the International Code of Nomenclature for Algae, Fungi, and Plants. 2nd edition. Pensoft. 
https://ab.pensoft.net/book/38075 Turland, N.J., Wiersema, J.H., Barrie, F.R., Greuter, W., Hawksworth, D.L. et al. (eds) (2018) International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Koeltz Botanical Books, Glashütten. DOI: https://doi.org/10.12705/Code.2018. Available at: iapt-taxon.org/nomen/main. php (accessed 5 August 2019) Vellinga, E.C., Kuyper, T.W., Ammirati, J., Desjardin, D.E., Halling, R.E. et al. (2015) Six simple guidelines for introducing new genera of fungi. IMA Fungus 6, 65–68. DOI: 10.1007/BF03449356 Wayne, L.G. (1991) Judicial Commission of the International Committee on Systematic Bacteriology. Minutes of the Meeting, 14 September 1990, Osaka, Japan. International Journal of Systematic Bacteriology 41, 185–187. DOI: 10.1099/00207713-41-1-185 Whitman, W.B. (ed.) (2015a) Bergey’s Manual of Systematics of Archaea and Bacteria. John Wiley, Chichester, UK. DOI: 10.1002/9781118960608

Whitman, W.B. (2015b) Genome sequences as the type material for taxonomic descriptions of prokaryotes. Systematic and Applied Microbiology 38, 217–222. DOI: 10.1016/j.syapm.2015.02.003 Whitman, W.B. (2016) Modest proposals to expand the type material for naming of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 66, 2108–2112. DOI: 10.1099/ijsem.0.000980. Whitman, W.B., Bull, C.T., Busse, H.-J., Fournier, P.-E., Oren, A. et al. (2019) Request for revision of the Statutes of the International Committee on Systematics of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 69, 584–593. DOI: 10.1099/ijsem.0.003117. Wu, L., Sun, Q., Sugawara, H., Yang, S., Zhou, Y. et al. (2013) Global catalogue of microorganisms (gcm): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources. BMC Genomics 14:933. DOI:10.1186/1471-2164-14-933 Yarza, P., Richter, M., Peplies, J., Euzéby, J., Amann, R. et al. (2008) The All-Species Living Tree project: a 16S rRNA-based phylogenetic tree of all sequenced type strains. Systematic and Applied Microbiology 31, 241–250. DOI: 10.1016/j.syapm.2008.07.001

4 Preserving the Reference Strains

David Smith¹,* and Vera Bussas²

¹CABI, Bakeham Lane, Egham, Surrey, UK; ²Formerly Leibniz-Institut DSMZ-German Collection of Microorganisms and Cell Cultures, Inhoffenstr, Braunschweig, Germany

Rationale

In the context of microbial identification, reference strains represent the nomenclatural types of the species; they are deposited as vouchers in public service collections or microbial domain biological resource centres (mBRC) (OECD, 2017). They are made available for study and comparison with unknown strains and, among other things, to resolve problems and to be revisited as technologies, opinions and understanding change. The status of these living representatives of fungi and bacteria differs (see Chapter 3). The International Code of Nomenclature for algae, fungi, and plants (ICN) provides the set of rules and recommendations dealing with the formal botanical names that are given to plants, fungi and a few other groups of organisms; all those ‘traditionally treated as algae, fungi, or plants’ (www.iapt-taxon.org/nomen/main.php, accessed 9 July 2020). Article 8.1 of this code describes the type (holotype, lectotype or neotype) of a name of a species or infraspecific taxon as either a single specimen conserved in one herbarium or other collection or institution, or a published or unpublished illustration. Article 8.4 states that type specimens of named taxa must be preserved permanently and may not be living organisms or cultures. However, Recommendation 8B states that, whenever practicable, a living culture should be prepared from the holotype material of the name of a newly described taxon of algae or fungi and deposited in at least two institutional culture or genetic resource collections. The recommendation also states, however, that ‘such action does not obviate the requirement for a holotype specimen under Art. 8.4.’ The status given to a living culture of a fungus cultured from the holotype that is preserved in a collection is that of ‘ex-type species’.

In bacteriology, the requirements for a valid publication of a name of a taxon (including a new combination) are described in the International Code of Nomenclature of Prokaryotes (www.the-icsp.org/bacterial-code, accessed 9 July 2020); they include publication in the International Journal of Systematic and Evolutionary Microbiology (IJSEM). IJSEM is the leading forum for describing novel microbial taxa, as it is the official journal of record for prokaryotic names of the International Committee on Systematics of Prokaryotes (ICSP). The journal publishes notification lists where new names of prokaryotes, new combinations and taxonomic opinions – which have already been published in the journal – are presented. However, if the new name or combination is published elsewhere, it is not validly published until it has been published

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

in IJSEM (see Chapter 3). Names of new prokaryotes that were not originally published in IJSEM are validated by the journal in official validation lists (www.microbiologyresearch.org/journal/ijsem/scope, accessed 9 July 2020). The valid description is accompanied by the requirement for the type strain to be deposited in two different collections in two different countries. Tindall (2015) discusses the valid publication of bacterial names in his paper ‘On the valid publication of names and combinations’. The type specimens of fungi and the type strains of bacteria are made available for use in subsequent taxonomic studies, industrial applications, plant growth enhancement, bioremediation and many other applications. The dead, dried specimens of fungi do not often lend themselves to techniques such as genome sequencing, or protein or metabolic profiling, although technologies have been developed to extract DNA from them (Forin et al., 2018; also see Chapter 18). Such methodologies can often overcome the high fragmentation of the DNA and the massive occurrence of non-target DNA from fungal contaminants. However, the DNA often deteriorates over time and in the course of the sample preparation, and so ex-type strains are more often used. There have been several large-scale sequencing projects in both bacteriology and mycology; however, whereas most type species of bacteria are sequenced, this is far from being the case with the fungi. It is estimated that, of the 120,000 described fungal species, still only 35,000 are correctly identified and represented by genome sequences in public databases (Forin et al., 2018). It is critical that the living reference strains, on which the names and properties are based and from which the DNA is sequenced to assign a name (the reference genetic resources), are stored and preserved optimally to retain stability. 
The fact that less than 1% of microbial diversity can be grown sets enormous challenges for repositories (mBRCs). Most often it is an axenic culture of the reference genetic resource that is preserved but, for those organisms that cannot be grown or where molecular techniques are used to identify the organism, DNA should be stored. This task increases further when the microbiome is being studied, and environmental samples from whole communities are examined; mBRCs need to address how these can be preserved too. This chapter focuses on property retention, selecting the appropriate techniques for long-term survival and stability of characters. It covers the operations of mBRCs and the most appropriate technologies and mechanisms for stability testing and quality assurance. It addresses the preservation of the wide range of archaeal, bacterial (including cyanobacterial), yeast and fungal type and reference strains. However, it should be noted that viruses and bacteriophages are not covered in this chapter.
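
The deposit requirement described in this section (a type strain held in two different collections in two different countries) can be checked mechanically. The Python sketch below is an illustration only: the `Deposit` record, the function name, and the example collection names and accession numbers are hypothetical and do not correspond to any mBRC system or real deposit.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Deposit:
    collection: str  # name or acronym of the culture collection (hypothetical here)
    country: str     # country in which that collection is located
    accession: str   # accession number assigned by the collection (made up here)


def meets_deposit_requirement(deposits: list[Deposit]) -> bool:
    """True if some pair of deposits differs in both collection and country."""
    return any(
        a.collection != b.collection and a.country != b.country
        for a, b in combinations(deposits, 2)
    )


# Two collections in two countries: requirement met.
ok = [
    Deposit("Collection A", "Germany", "A 0001"),
    Deposit("Collection B", "Japan", "B 0002"),
]

# Two collections in the same country: requirement not met.
not_ok = [
    Deposit("Collection A", "Germany", "A 0001"),
    Deposit("Collection C", "Germany", "C 0003"),
]

print(meets_deposit_requirement(ok))      # True
print(meets_deposit_requirement(not_ok))  # False
```

The pairwise comparison is deliberate: it accepts the strain only when at least one pair of deposits differs in both collection and country, which is how the requirement is worded above.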

Introduction

The first public microbial culture collections were established around the end of the 19th century, providing researchers with the raw material for their research and innovation to improve health, crop production and food quality. In parallel with the evolving microbiological disciplines – introducing novel isolation and identification strategies, unravelling the physiology of microorganisms and their unique chemical structures, and tracing life in its enormous depth and breadth back to the dawn of evolution by analyses of genes – new tasks were introduced into culture collections. Published science, research and development rely on the biological material on which key hypotheses and discoveries are based. Originally confined to the accession, long-term maintenance and provision of microbial resources, culture collections evolved into multi-task centres in which the core activities were accompanied by consultation and the offer of identification and training services. The recent surge in the importance of biodiversity has led to an impressive increase in new collections registered in the World Federation for Culture Collections’ World Data Centre for Microorganisms (www.wdcm.org/, accessed 16 October 2020). At the same time, the need to harmonize the quality of collection management to better serve bioindustry was recognized by the Organisation for Economic Co-operation and Development (OECD), which coined the phrase ‘biological resource centres’ (BRC) for those user-driven public culture collections with an improved quality management system and which were introducing a collection-related research component (OECD, 2007). Other issues of public interest,

such as bio-risk, intellectual property rights and material transfer agreement, or provision of resources and associated data, are additional tasks required of BRCs (including the mBRC) (OECD, 2007). As no single mBRC is sufficiently equipped to comply with all these personnel and labourintensive tasks there is a need for better coordination of the fragmented landscape of mBRCs. The World Federation for Culture Collections (WFCC) was established as a Multidisciplinary Commission of the International Union of Biological Sciences (IUBS) and a Federation within the International Union of Microbiological Societies (IUMS). The WFCC (www.wfcc.info, accessed 9 July 2020) brings together collections from over 70 countries to discuss and develop standards and protocols to improve collection operations. There are also regional efforts, over 20 such networks are listed and linked to on the WFCC website (www.wfcc.info/collections/ networks/, accessed 9 July 2020). Examples of these include the European Culture Collections’ Organisation (ECCO; www.eccosite.org/, accessed 9 July 2020) and the United States Culture Collection Network (USCCN; www.usccn.org/Pages/ default.aspx/, accessed 16 October 2020). Together, such networks help collections select the most appropriate approaches and methodologies, and raise awareness of best practice and regulatory compliance. The European Strategy Forum on Research Infrastructures (ESFRI; www. esfri.eu/, accessed 16 October 2020) recognized this need for better coordination and, as a result, initiated the Microbial Resource Research Infrastructure (MIRRI) in 2012. The goal was to facilitate harmonized collaboration between some European mBRCs, working on strategies to improve the provision of resources and services from mBRCs to the user by better communication and streamlining mBRC operations. 
Similar initiatives on other continents, such as the Asian Biological Resource Centers Network, provide the potential to establish the global biological resource centre network envisaged by the OECD (www.oecd.org/science/emerging-tech/towardsaglobalbiologicalresourcecentrenetwork.htm, accessed 9 July 2020). The mBRC services are there to ensure that investment in research is protected, and that the materials and information generated are there for confirmation of work and further study; they also facilitate shipping that conforms to regulations across


the globe, enabling the researchers to focus on the science (see Chapter 6). Using a non-culture assessment of the prokaryotic richness in various environments (Whitman et al., 1998), the number of cells on Earth has been estimated to be as high as about 4–6 × 10³⁰. The majority (about 5 × 10³⁰ cells) live in the subsurface of oceans and soil; less than one per million inhabit humans, cattle, birds and termites. Based upon the genetic diversity of samples, the number of fungal species has been estimated to be 9.9 × 10⁶ (Cannon, 1997), while that of prokaryotes has been estimated to be as high as 10⁹ (Torsvik et al., 1990). A single gram of human gut content harbours about 10¹² microbial cells, while one tablespoon of garden soil (10 g) contains about 10⁹ bacterial cells, representing about 50,000 species. Considering that since the dawn of bacteriology not more than 18,000 species have been validly named and taxonomically described worldwide, enormous efforts are ongoing to increase the cultured fraction of hidden diversity (www.bacterio.net/number.html, accessed 9 July 2020). This is needed for deciphering the evolution of the genetic blueprint, its gene expression and the formation of secondary metabolites, to underpin our knowledge and realise the value and opportunities from research in academia and bioindustry. It is the main task of public mBRCs to make such cultured strains available to the research environment. The following paragraphs guide the researcher through a series of steps safeguarding the preservation of microbial strains and the necessary quality management, which include: (1) how to handle samples in the environment, including the clinical environment; (2) the tasks of the mBRC in authenticating a microbial culture; and (3) aspects of quality management in which samples are properly controlled from acquisition to deposition.
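To put the figures quoted above in proportion, the short calculation below compares the number of validly named species with the upper estimate of prokaryotic species and with the species richness quoted for a single soil sample. It is only an illustrative sketch: the variable names are assumptions, and the inputs are the rounded estimates given in the text, not measured values.

```python
# Rounded figures taken from the text; variable names are illustrative only.
named_species = 18_000          # validly named species (upper bound quoted in the text)
estimated_species = 10**9       # upper estimate of prokaryotic species (Torsvik et al., 1990)
soil_species_per_10_g = 50_000  # species quoted for about 10 g of garden soil


def fraction(part: float, whole: float) -> float:
    """Return part/whole as a plain ratio."""
    return part / whole


# Share of the estimated prokaryotic diversity that has been named so far.
described_fraction = fraction(named_species, estimated_species)

# Named species relative to the richness quoted for one soil sample.
named_vs_soil_sample = fraction(named_species, soil_species_per_10_g)

print(f"Described fraction of estimated species: {described_fraction:.1e}")
print(f"Named species relative to one soil sample: {named_vs_soil_sample:.2f}")
```

Under these assumptions, all species named to date amount to fewer than the number of species estimated for a single tablespoon of soil, which is the gap that cultivation efforts and mBRC deposits are meant to narrow.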

Handling Samples from the Environment to the Laboratory

It is clear that, at present, researchers are not able to retrieve a pure culture directly from any environment without also collecting adventitious contaminating organisms. Carefully outlined strategies are necessary, therefore, to plan expeditions for collecting the raw material for subsequent studies.


D. Smith and V. Bussas

A research plan must include considerations about whether to target a specific member of a physiologically defined taxon, or to leave that selection to the growth media. The former strategy must be based on knowledge of the properties of the taxon, the most likely occurrence, the mode of enrichment and the application of specific growth media that are most likely to select the organisms with the desired properties. But before taking samples, the location of the sample in terms of depth from the surface, the sample volume and basic physico-chemical properties of the sample need to be recorded (sample type, temperature, pH). More extensive sample analysis is usually done in the laboratory (heavy metal content, total organic compound, content of nitrogen and phosphorus). Up-to-date taxonomic descriptions require the geographic coordinates and, since the signing of the Convention on Biological Diversity (CBD; www.cbd.int/convention/text/, accessed 16 October 2020), the prior informed consent of the landowner and/or national authority before taking samples. Working with genetic resources and associated data requires compliance with the Nagoya Protocol on Access and Benefit Sharing, which came into force in October 2014 (www.cbd.int/abs/). The enactment of EU Regulation No. 511/2014 implements Nagoya Protocol elements that govern compliance measures for users and offers the opportunity to demonstrate due diligence in sourcing their organisms from holdings of ‘Registered Collections’. The individual countries have been left the option to put in place access control on their own genetic resources. While most EU countries have decided not to control access (the UK, for example, has introduced a Statutory Instrument that establishes enforcement measures to implement this EU Regulation), some countries, such as France and Spain, have specific controls; see the Access and Benefit-Sharing Clearing-House (ABSCH) (https://absch.cbd.int/about/, accessed 9 July 2020).
Countries beyond Europe are also enacting legislation to implement the Nagoya Protocol. Currently, there are 62 countries of the 118 parties that have administrative or policy measures in place (as of 26 August 2019); see the ABSCH for up-to-date information. Brazil, in particular, has taken measures to introduce legislation and implementing acts to facilitate

benefit sharing for utilization of its biodiversity (Smith et al., 2017a). Microbiologists need to be aware of the actions and procedures required by the source country with respect to the materials they use, and to put in place best practices for compliance (Smith et al., 2017a); such best practice has been drafted by MIRRI (Verkley et al., 2016). Additionally, there are restrictions on the distribution of dangerous organisms, owing to their biosecurity risk (Smith et al., 2017b; also see Chapter 6). The literature describes a plethora of detailed techniques for sample retrieval, the listing of which would exceed the scope and size of this chapter. Most important is the rapid, undisturbed and cold (4°C) transport into the laboratory to initiate isolation. Immediate freezing on dry ice or in liquid nitrogen has been used but is often not the method of choice because of logistical problems. All delayed isolation procedures have the same problem in that the fraction of the original diversity lost during transport cannot be assessed. For researchers to be able to benefit from scientific innovation, close cooperation and coordination of activities is needed between them and the research infrastructures, and particularly with the partner mBRCs, to maintain the resources for sustainability, the validity of science and future development. Taking the publication of a scientific paper as an example it is recognized that the written information is not enough to validate the science or to enable researchers to follow up the results. Most scientific journal policies state in the ‘Instructions to Authors’ that material, data and protocols should be made available in order to allow others to replicate and build upon the authors’ published claims. This is most often interpreted to mean that the researchers themselves keep the material. 
Unfortunately, unless the authors have a dedicated mBRC in their institution, there are often no facilities for culture maintenance, resulting most often in the loss of the strain, particularly when research takes a new direction or the researcher retires. Therefore, it is recommended that key strains from research papers be deposited in mBRCs. Coordinated and targeted efforts between the researchers, journals, funders and mBRCs can address capacity problems and provide a workable solution (Stackebrandt et al., 2014).


DNA Sample Preparation and Storing

European collections in the European Consortium of Microbial Resource Centres (EMbaRC) project have outlined their extraction, quality control and storage protocols to optimize their DNA banking activities with the different organism types (www.embarc.eu/deliverables/EMbaRC_D.JRA1.2.2_D14.18_method-storage-DNA.pdf, accessed 9 July 2020). This project deliverable emphasizes that the long-term storage of high-quality DNA is key to the development of the European microbial DNA bank network, and summarizes in precise protocols the different options to preserve isolated microbial DNA (dry, frozen or encapsulated), depending on the species and on the BRC facilities and strategy. For consistency, genomic DNA extraction kits are often used; several are available to choose from, to best suit the sample type and needs. Several storage methods are recommended; most favour storage in liquid nitrogen vapour at temperatures below −175°C, but storage at −20°C and −80°C is also suggested. There are also dry DNA storage methods and kits available. It is critical that the DNA quality and identity are checked both pre- and post-preservation.

Sample Acquisition and Authentication

The mBRC must document its acquisition policy, defining the biological material to be maintained and the criteria on which the acceptance of new biological material offered to the collection is based. This policy must balance capability and capacity with the scientific and users’ needs, and the policy must also meet the needs of the provider country of the materials. Only biological material that meets the defined acquisition criteria and which falls into the groups of its specialist expertise should be accepted. Safe procedures for receipt and storage of the microorganisms for deposit appropriate to the type of biological materials handled must be documented and implemented. The Common Access to Biological Resources and Information (CABRI, www.cabri.org, accessed 9 July 2020) website provides many examples of the technical and standard operating procedures needed to help


with this documentation. All incoming parcels that contain known or unknown microorganisms must be opened in a suitable containment laboratory or appropriate microbiological safety cabinet, with local facilities for the safe handling and disposal of biological materials. Part of the preparation for receipt must be the provision of assurance from the depositor that biological materials were obtained legitimately and that they are safe to handle. Conditions of deposit must be laid down in a material transfer agreement (MTA), for example to protect assigned intellectual property rights (IPR). Where deposits are outside the expertise of the mBRC, alternative suitable mBRCs should be recommended. Quality control procedures must be carried out upon receipt of biological material to confirm its purity, identity and viability. Before accepting a deposit, the material must be checked against risk group lists and other lists to make sure that the biological material does not exceed the laboratory’s biological safety containment level. It is required by law in Europe that a risk assessment is carried out on the material and the methods recorded to determine, as far as possible, the potential of harm to personnel, the public and the environment. A unique collection number is then allocated to the biological material; this number is never reassigned, even if the biological material is later discarded. All records are retained and used as a baseline when in-storage maintenance checks are performed or for validation after preservation restocking. A ‘maintenance plan’ (i.e. a scheme for periodic control of the preserved material) is then put in place for each item stored. Several factors determine the frequency of the maintenance checks (e.g. the type of biological material, the preservation method and the turnover of the material). An mBRC should only accept biological material that meets its acquisition criteria and for which expertise is available.
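The rule that a collection number is unique and never reassigned, even after the material is discarded, can be sketched as a small registry. This is a minimal illustration under stated assumptions: the class name `AccessionRegistry`, the `COLL` prefix and the status labels are hypothetical and do not describe any particular collection's database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AccessionRegistry:
    """Hypothetical in-memory registry of collection numbers."""

    prefix: str = "COLL"  # assumed collection acronym, for illustration only
    _next: int = 1
    _records: Dict[str, dict] = field(default_factory=dict)

    def allocate(self, description: str) -> str:
        """Allocate the next collection number; the counter only moves forward."""
        number = f"{self.prefix} {self._next}"
        self._next += 1
        self._records[number] = {"description": description, "status": "held"}
        return number

    def discard(self, number: str) -> None:
        """Flag the material as discarded; the record and its number are retained."""
        self._records[number]["status"] = "discarded"

    def status(self, number: str) -> Optional[str]:
        """Return the status of a number, or None if it was never allocated."""
        record = self._records.get(number)
        return record["status"] if record else None


registry = AccessionRegistry()
first = registry.allocate("soil isolate, deposit A")
registry.discard(first)
second = registry.allocate("soil isolate, deposit B")
print(first, registry.status(first))
print(second, registry.status(second))
```

Because discarding only changes a status flag, the retained record remains available as a baseline and the freed number can never be handed to a different strain.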
The OECD best practice for BRCs recommends that the scope of the types of organisms collected by the mBRC is defined by its expertise. This is not limited to taxonomy of the organisms but includes the technologies and knowledge needed for their growth, handling and preservation, be they of human, plant, animal or environmental origin. The material received should be accompanied by a deposition form containing at least a minimum of information, although more information



is advantageous, such as attached publications, gene sequence accession numbers, reference to associated data, including:

• species name (if previously determined), other identifier or cell culture description;
• isolator’s and depositor’s names and addresses;
• provision of prior informed consent (PIC) from the sample site owner;
• source, substrate or host from which the biological material was isolated or derived (where identified) and date of isolation;
• geographical origin of material (the minimum requirement is the country of origin or the furnisher of the source, substrate or host) and coordinates of sampling site;
• depositor’s biological material number or other collection number(s), if deposited elsewhere;
• growth media and conditions, cell preservation or storage conditions where known; and
• hazard information (e.g. in the form of a safety data sheet).
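The deposition items listed above lend themselves to a simple completeness check before a deposit is processed. The sketch below is an assumption-laden illustration: the `DepositForm` field names are invented here to mirror the eight items, and it treats every item as required, which is a simplification because the text qualifies some items with ‘where known’ or ‘if deposited elsewhere’.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DepositForm:
    """Hypothetical deposition form; one field per item in the list above."""

    species_name_or_identifier: Optional[str] = None
    isolator_and_depositor: Optional[str] = None
    prior_informed_consent: Optional[str] = None
    source_and_isolation_date: Optional[str] = None
    geographical_origin: Optional[str] = None
    depositor_material_number: Optional[str] = None
    growth_and_storage_conditions: Optional[str] = None
    hazard_information: Optional[str] = None


def missing_items(form: DepositForm) -> List[str]:
    """Return the names of all items left empty on the form."""
    return [f.name for f in fields(form) if not getattr(form, f.name)]


form = DepositForm(
    species_name_or_identifier="Bacillus subtilis",
    geographical_origin="Germany",
)
print(missing_items(form))
```

A collection could use such a list to ask the depositor for the outstanding information before allocating a collection number.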

Details of minimal and recommended data sets for microbial accessions (bacteria, archaea, cyanobacteria, filamentous fungi, yeasts, microalgae, plasmids, protozoa, phages, DNA and viruses) to mBRCs are listed in the Annex of the OECD Best Practice Guidelines (OECD, 2007). According to this Annex, ‘authentication is the process by which biological materials are characterized up to a defined level using appropriate technology to establish a conclusive basis for accepting the material as genuine’. The defined level is usually the one that was provided with the resource by the depositor, whether a type strain of a new species or a strain from a research collection. Generally, the amount of data is higher for a type strain included in a publication, as the requirements for publishing a new species according to minimal or recommended standards include a wide range of different tests for its circumscription. Non-type strains, such as those included in clinical, ecological and applied studies, are affiliated to a species by the authors of a publication on the basis of some salient features that fit the description of the species, generally physiological tests or gene sequences. In cases where research strains are sent to collections for preservation, the degree of characterization varies

depending upon the research goal, but currently at least partial 16S rRNA gene sequences should be available. Besides authentication, mBRCs are asked to test the stability of some key features used to affiliate to a species. While this is not necessary for gene sequences, plasmid-coded features and other properties not lost during repeated growth–cold storage cycles, several phenotypic (mostly physiological) properties are prone to change, and these characters should not be used for taxonomic purposes or quality checks. Information on all test results needs to be recorded, preferably in electronic form; this can be used for future validation of identity and for determination of changes in geno- and phenotype after preservation replenishment. When materials are received by the mBRC they must be grown on defined media, often recommended by the author or depositor, and standard reagents should be used. Standards for all preparations used in the growth and/or maintenance of the living biological materials held must be documented with the appropriate mechanisms in place to record any changes made to procedures. Every precaution must be taken to ensure the accurate preparation and storage conditions of culture media; this fundamental step in the growth and maintenance of biological materials must be given special attention. The media formulae must be documented and procedures put in place for approval and adoption of changes. Media batches need to be clearly labelled, and expiry dates (date after which media and reagents are not to be used) defined and clearly indicated. Supplies of materials for use must be of high standard and uncontaminated. The quality control parameters depend upon the material received:

• The growth of most organisms should be checked on appropriate media, while alternative tests for obligate pathogens or those yet to be grown in culture should be selected.
• Purity should be checked by the absence of contaminants using macro- and microscopic observations on the culture grown on an appropriate medium.
• Identification to species level requires a polyphasic approach including morphological (macroscopic and microscopic), physiological and molecular tools.
• Stability testing would include viability and purity checks, followed by confirmation of identity.
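The requirement that media batches be clearly labelled and carry a defined expiry date (the date after which media and reagents are not to be used) can be illustrated with a small record type. This is a sketch only: `MediaBatch`, its field names and the example identifiers are hypothetical, and a real collection would hold this information in its own laboratory information system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MediaBatch:
    """Hypothetical label data for one batch of culture medium."""

    batch_id: str      # batch number printed on the label
    formula_ref: str   # reference to the documented medium formula
    prepared: date     # preparation date
    expiry: date       # date after which the batch is not to be used

    def usable_on(self, day: date) -> bool:
        """True from the preparation date up to and including the expiry date."""
        return self.prepared <= day <= self.expiry


batch = MediaBatch(
    batch_id="B-0001",
    formula_ref="medium-formula-1",
    prepared=date(2020, 7, 1),
    expiry=date(2020, 7, 31),
)
print(batch.usable_on(date(2020, 7, 31)))
print(batch.usable_on(date(2020, 8, 1)))
```

Making the record immutable (`frozen=True`) reflects the idea that a change to a formula or expiry date should go through the documented approval procedure and produce a new record, not silently alter an existing one.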

Procedures to detect errors in data to improve their quality and consistency must be put in place. This is an essential part of information management as it is the window to the user, helping them select the material that is most suitable for their anticipated use. A standard data schema and protocols are preferable to make the databases distributed and interoperable (see Chapter 6). Appropriate measures (protocols, tools and standards) should be employed to assure the reasonable security of information. Existing systems, such as authentication by user ID and password, encryption of messages and restriction of IP addresses, may provide the basis for such measures. Back-up files should be stored in secure cabinets. To date, the most straightforward method of checking the identity of prokaryotic and eukaryotic microorganisms is verification of the ribosomal 16S/18S rRNA gene sequence (preferably the almost full-length gene sequence and ITS regions, respectively), but this may be replaced by rapid genome sequencing in the near future. Although mislabelled strains can be identified immediately, a small percentage of contaminant in a culture would remain undetected, as the microheterogeneity of several ribosomal RNA (rrn) operon copies in the target organisms would mask the presence of foreign rDNA. There is the option of cloning and sequencing rrn genes, or of identifying other genes for specific groups of microorganisms, especially those for which a genome sequence has been obtained. Authentication by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) is a rapid method that is in wide use in the clinical environment, and also applied increasingly in the mBRC environment (see Chapters 7 and 8). As most of the protein peaks detected are ribosomal proteins, the method is appropriate in identifying microorganisms to the species – and often even to the subspecies – level. 
The advantage of both approaches is the construction of databases, although only the rRNA database is open access. Other fast methods, such as generation of DNA and rDNA patterns (e.g. Enterobacterial Repetitive Intergenic Consensus; Amplified Ribosomal DNA Restriction Analysis, Restriction Fragment


Length Polymorphism, Randomly Amplified Polymorphic DNA) are not interoperable, but serve authentication at the intra-laboratory level. Identification at the physiological level by application of commercial systems has the advantage of multi-character testing, but experience has shown that often some of these properties are not expressed under different pregrowth conditions, excluding identification at the strain level. As the majority of bacterial species are described on the basis of a single strain only, the variety of phenotypic reactions in additional strains of a species remain obscure before thorough testing. Most importantly, a preserved authenticated strain must be sent back to the original depositor for confirmation of identity, and only after the mBRC receives a confirmation letter will the respective strains enter the storage routine.
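Verification of identity by rRNA gene sequence, as described above, ultimately rests on comparing the sequence obtained from the received culture with a reference sequence. The function below is a minimal sketch of that comparison, assuming the two sequences have already been aligned to equal length by an external alignment tool and that gaps are written as `-`. It computes percent identity over ungapped columns only; it does not perform the alignment, and no acceptance threshold is implied because the text does not state one.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy sequences for illustration only; real 16S rRNA genes are about 1500 nt long.
reference = "ACGTACGT"
received = "ACGTACGA"
print(percent_identity(reference, received))
```

A clear mismatch with the depositor's sequence would point to mislabelling, whereas, as noted above, a low-level contaminant would not be revealed by this kind of comparison.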

Preservation Techniques

The next step in the process is the selection of appropriate preservation and maintenance methods; these may be in accordance with recommendations from the depositor and/or previous experience. The mBRC should document these preservation procedures to ensure they are reproducible and that key parameters of the process are recorded and monitored. The biological material must be preserved by at least two methods (where two distinct methods are not applicable to the biological material, cryopreserved stocks can be maintained in separate locations) and as master cell banks and as stocks for distribution. The labels on preserved materials should include at least the batch date or number and the unique accession number. Biological materials with specific hazards must be clearly differentiated. Special care must be taken to select appropriate materials and methods when labelling tubes for cryostorage, for which specially designed labels are available. Several innovative labelling devices can be considered, such as the use of barcoding or radio-frequency identification (RFID), where digital data is encoded in RFID tags or smart labels, and outputs are captured by a reader via radio waves (e.g. www.zebra.com/gb/en/resource-library/getting-started/rfid-printing-encoding.html, accessed 9 July 2020).



Numerous criteria need to be considered when selecting the most appropriate technique for preservation (Ryan et al., 2000). The commonly used approach for sustainable preservation of microbial cultures is long-term preservation employing liquid nitrogen, deep freezing, freeze-drying or L-drying methods (Smith and Ryan, 2008). Freeze-drying of microorganisms has been a method of choice for many years, and relies on the removal of water from a suspension of cells directly from frozen material by sublimation under vacuum. The exact methodology is equipment dependent; there is a wide range of equipment available, ranging from laboratory bench models through to pilot scale and huge industrial installations. The protocol should be optimized for different organisms and cell types; the method has been shown to be applicable to the majority of bacteria, sporulating fungi and yeasts (Smith et al., 2001; Smith, 2012; Smith et al., 2013). L-drying methods were developed for those organisms that fail to survive the rigours of freezing and dehydration. L-drying is a useful alternative method of vacuum drying for the preservation of bacteria that are particularly sensitive to the initial freezing stage of the normal lyophilization process. The process prevents the cultures from freezing under vacuum; drying occurs directly from the liquid phase (Smith et al., 2001). Checking of hundreds of ampoules of bacterial strains after more than 40 years revealed that there was no loss at all in viability (Bussas, personal communication). The most widely applicable preservation technique is freezing suspensions of cells or agar plugs cut from growing cultures using appropriate cryoprotectants such as glycerol. There are numerous controlled freezing options and storage temperatures, but storage in or above liquid nitrogen is considered the best for maintaining stability and long-term survival (Smith et al., 2001, 2013; Smith and Ryan, 2008; Smith et al., 2012).
Taking cryopreservation as an example, a standard operational procedure would include the selection of optimal growth conditions prior to preservation in order to produce healthy material. It would go on to describe criteria for all elements in the procedure including measuring and recording baseline data for stability checks. The latter might utilize morphological characteristics, photomicrographs, growth rates, metabolic data, sequence data and genome

fingerprinting techniques. The selection of the most appropriate preservation protocol would include the selection of a cryoprotectant that is appropriate for the cell type, the most appropriate cooling rate, storage temperature (below −140°C) and the most appropriate thawing protocol. Temperature measurement must be carried out using a thermometer calibrated to a standard; cooling and thawing should take place in calibrated and controlled equipment; and all storage conditions must be monitored and recorded. Other criteria as described by Smith and Ryan (2008) include:

• preparation of master and distribution stocks that have high recovery and no contamination;
• authentication of samples utilizing morphology, phenotype, molecular integrity;
• method validation (e.g. performing blind tests, reproducibility checks by comparing results of the same method at different times, utilizing different methods with different operators);
• equipment calibrated and regularly serviced with gauges calibrated to recognized standards; and
• recording parameters by keeping daily records of temperature readings of incubators and cryostorage units to ensure they remain within set parameters.
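The daily temperature records mentioned in the last item can be screened automatically against the set parameters. The sketch below assumes a log held as a mapping from date to the recorded temperature in °C and uses the cryostorage limit quoted in the text (storage below −140°C) as its default; the function name and the data layout are illustrative assumptions, not part of any cited procedure.

```python
from datetime import date
from typing import Dict, List

# Limit quoted in the text for cryostorage: the temperature must stay below -140 degrees C.
STORAGE_LIMIT_C = -140.0


def out_of_range_days(readings: Dict[date, float],
                      limit_c: float = STORAGE_LIMIT_C) -> List[date]:
    """Return, in date order, the days on which the reading was not below the limit."""
    return sorted(day for day, temp in readings.items() if temp >= limit_c)


# Invented example log for one cryostorage unit.
log = {
    date(2020, 7, 1): -178.0,
    date(2020, 7, 2): -139.5,  # excursion above the limit
    date(2020, 7, 3): -150.0,
}
print(out_of_range_days(log))
```

Any day returned by the check would need to be investigated and documented, since an unrecorded warming event undermines later stability claims for the affected stocks.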

The preservation and handling records kept must be able to demonstrate that the correct organism is in the resulting ampoule and it has followed the desired protocol and been tested satisfactorily, for example:

• confirmation of organism identity or traceable to source;
• organism is pure, viable and stable;
• suspending medium formula and preparation is traceable;
• equipment used is recorded and serviced;
• key parameters are measured and measurements calibrated to an accepted standard (pressure, temperature);
• post-preservation tests completed satisfactorily;
• when the work was done and by whom;
• the persons involved had the correct training; and
• environmental conditions controlled.


mBRCs are designed and equipped to manage such processes; they are centres for technology and knowledge transfer and, individually, through their scientific communities or research infrastructures or projects they offer opportunities for training, joint activities or support in many other ways.

Approaches to Testing Stability in Storage

Traditionally, growth and stability testing in mBRCs has been limited to recovering strains from storage, comparing germination results, checking the identity and examining specific phenotypic characteristics, normally those relevant to the application of the organism. Many phenotypic and molecular technologies are available (Table 4.1). The OECD best practice for BRCs recommends that, for stability in storage, the viability and purity should be checked and the identity confirmed (OECD, 2007). This should be sufficient for reference strains as long as the identification check includes the phenotypic and genotypic technologies needed to name the strain. Stable strains are needed to support validated databases, including those supporting

Table 4.1. Quality control procedures recommended for microorganisms.

Yeasts and filamentous fungi
Identity (OECD, 2007): Identify to species level using morphological (macroscopic and microscopic) and physiological features; where appropriate use biochemical features and molecular tools, according to the taxa.
Stability (OECD, 2007): Check viability and purity. Confirm identity.
Recommended today: Viability and purity check. Phenotypic: morphology; MS/MS spectral patterns. Genotypic: sequencing the internal transcribed spacer (ITS); MicroSeq.

Bacteria
Identity (OECD, 2007): Identify to species level using morphological (macroscopic and microscopic) and physiological tools; where appropriate, use molecular tools.
Stability (OECD, 2007): Check viability and purity. Confirm identity.
Recommended today: Viability and purity check. Phenotypic: manual phenotypic methods (API Strips, BBL etc.); automated phenotypic methods; cellular fatty acids (MIDI System); carbon source utilization (Biolog); MALDI-TOF. Genotypic: 16S rRNA gene sequencing; MicroSeq; Riboprinter.

Cyanobacteria
Identity (OECD, 2007): Identify to genus level using morphological (macroscopic and microscopic) and physiological tools; where appropriate, use molecular tools.
Stability (OECD, 2007): Check viability and purity. Confirm identity.
Recommended today: Viability and purity check. Phenotypic: morphology; ELISA. Genotypic: phylogenetic analyses based on 16S rDNA gene sequences.

Archaea
Identity (OECD, 2007): Identify to species level using morphological (macroscopic and microscopic) and physiological tools; where appropriate, use molecular tools.
Stability (OECD, 2007): Check viability and purity. Confirm identity.
Recommended today: Viability and purity check. Phenotypic: phospholipid fatty acid (PLFA) profiling; polar lipid analysis; carbon (C)-source consumption patterns (e.g. Biolog). Genotypic: 16S ribosomal RNA.

MS/MS: Tandem mass spectrometry; API: Analytical Profile Index; BBL: BD BBL™ Crystal™ identification system; MIDI: Sherlock Microbial Identification System; MicroSeq: MicroSeq 500 16S rDNA microbial identification system; Biolog: Biolog Identification System; MALDI-TOF: matrix-assisted laser desorption/ionization time-of-flight mass spectrometry.



automated identification systems, and it is prudent to carefully select the stability testing regime for such strains. Phenotypic properties (e.g. biochemical/carbon source utilization) can be variable, subjective and dependent on growth parameters and the health of an organism. Cellular fatty acid profiles obtained with the MIDI system are also variable; for example, the profiles may change with temperature, age of the culture and growth medium. Although automated systems such as MALDI-TOF are available and reliable for bacteria, the technique is not yet fully adapted to fungi (Bader, 2017; Hendrickx, 2017). Bader (2017) points out that the protocols for harvesting cells for MALDI-TOF analyses differ between the different groups of organisms; with respect to fungi, methods further vary between yeasts and moulds, as they do not lyse in a similar fashion. Genomic technologies involving DNA and RNA sequencing are often applied; these require somewhat complicated laboratory procedures, and manual interpretation of the results requires phylogenetic analysis. The impact of such approaches is far-reaching, but limited, because not all microorganisms have yet been sequenced. Again, work with bacteria is ahead of that with fungi, as there are comprehensive 16S ribosomal RNA gene databases to assist bacterial identification. ITS sequencing reveals considerable fungal diversity and to date there are no complete databases. However, whole-genome sequencing, which is now becoming more rapid and affordable, can be used to detect strain differences at the molecular level.

mBRC Management: Adopting an Appropriate Standard

When processing microbial samples it is critical that recommended best practices are followed to ensure stability and reproducibility of properties of the microorganisms. Such best practice guidelines have been around for at least four decades (Smith, 2012). In this chapter, the authors recommend that operational systems be based upon the OECD Best Practice Guidelines (OECD, 2007). The public service culture collections have been following the WFCC guidelines (Anon., 2010) for at

least the last three decades, and many have gone on to adopt International Organization for Standardization (ISO) standards, such as the ISO 9001:2015 series or ISO 17025, for their operations as a benchmark for quality. The various culture collection or mBRC organizations and infrastructures recommend that their member collections put in place the most appropriate management systems suited to the environment in which the collection operates (Martin et al., 2015). This includes the quality management of processing samples from arrival to their long-term storage utilizing technical operating procedures (TOPs) and standard operating procedures (SOPs) in process management. The management practices are based upon a documented system, giving management systems effectiveness through process performance measures (Martin et al., 2015). As with most technological areas, the ISO standards have to be adapted to suit the use to which they are being applied. Consequently, the French biobanks and culture collection community worked with the French national organization for standardization, the Association Française de Normalisation (AFNOR), to develop the French standard NF S96-900 ‘Quality of biological resource centres (BRCs) – Management system of a BRC and quality of biological resources from human or micro-organism origin’ (AFNOR, 2008). This brings together most of the essential elements of such standards that are relevant to the operation of biobanks and culture collections to facilitate consistency of application across the biobank community. More recently, an ISO standard has been launched for biobanks, including mBRCs. This provides general requirements for the competence, impartiality and consistent operation of biobanks, including quality control requirements to ensure biological material and data collections of appropriate quality: ‘ISO 20387:2018 Biotechnology – Biobanking – General requirements for biobanking’ (www.iso.org/standard/67888.html, accessed 9 July 2020).
The Brazilian network of collections has gone a step further. Working with its accreditation bodies and led by INMETRO (National Institute of Metrology, Quality and Technology), they have published the standard ‘NIT-Dicla061 Aplicações e Requisitos Adicionais Acreditação ABNT NBR ISO/IEC 17025 dos Centros de Recursos Biológicos’ for the accreditation of BRCs (www.inmetro.gov.br/credenciamento/pdf/nit_dicla_061.pdf, accessed 9 July 2020). ‘Conformity assessment for biological resource centres (BRC): The Brazilian approach’ was published in 2012 (Holanda et al., 2012) and the first BRC to be accredited globally, the Collection of Reference Microorganisms on Health Surveillance (Fiocruz-CMRVS), followed. The collections at Fiocruz follow the best practices that were established (Forti et al., 2016; see also https://portal.fiocruz.br/en/biological-collections, accessed 9 July 2020). Whatever the approach that is selected, an mBRC quality management system must address several specific areas:

• organizational requirements;
• equipment use, calibration, testing and maintenance records;
• documentation management;
• data management, processing and publication;
• preparation of media and reagents;
• accession of deposits to the mBRC;
• preservation and maintenance;
• supply; and
• quality audit and quality review.

Among the microorganism domain criteria set down by the OECD Best Practice Guidelines (BPG) are recommendations on:

• staff qualifications and training;
• hygiene and biosafety;
• equipment use, calibration, testing and maintenance records;
• preparation of samples; and
• information provided with the biological material supplied.

All these issues must be addressed when carrying out the duties of the mBRC. It is important to document procedures for good science as well as good management practice. There is a need to ensure continuity when there are staff changes and when new technologies become available; changes to procedures must be made in a way that is traceable and evidence based. Records need to be kept to ensure that the impact of change can be monitored. Applying measures to maintain the genomic stability of the biological materials is paramount to ensure reproducibility and that they remain fit for purpose. An assessment of the activities of the institution and its outputs to determine the most important elements of the OECD BPG (OECD, 2007) will help to prioritize and develop the implementation plan (see EMbaRC; www.embarc.eu/, accessed 16 October 2020). It is not essential to try to implement all elements of the guidance at once. Public culture collections have in common the basic requirements to provide authentic, well-preserved biological material whose properties are reproducible for the long term (Smith, 2012). Establishing quality management processes in these areas should always come first, followed by other key activities, introducing a culture of continual improvement. As with all such systems, the ultimate beneficiary is the user, so their requirements must always be considered alongside institutional needs and internationally accepted criteria. The processes implemented by an mBRC, from arrival of microorganisms to their long-term storage, are relatively complex and each step must be carefully taken and documented. The OECD BPG (OECD, 2007) laid the foundation for basic requirements for the management of the organization. At the microbial domain level, it is important that:

• there is a minimum number of generations of original material before preservation and storage;
• a master stock is created from the original material and sufficient distribution stocks are produced to minimize the need for stock replenishment;
• material is stored under environmental parameters that assure genomic stability;
• details of the inventory control, lead times and restocking practices are documented;
• duplicate collections should be maintained, preferably on another site, as a ‘disaster protection’ measure;
• all methods used in the handling, characterization and storage of the microbial material are validated;
• quality audits and review procedures are in place; and
• biological material is preserved by at least two methods.

The organization must meet the OECD definition and be compliant with appropriate national law and regulations. It should describe and document the nature of the biological resources it holds. The collection has a long-term commitment to provide strains to its user community and therefore it must have a strategy for its long-term sustainability; and, if its future is threatened, the mBRC should have a plan to ensure that its key holdings remain available. There are also managerial responsibilities and staff requirements. For example, senior management must ensure that appropriate resources are available for staff members to discharge their responsibilities; that staff are proved to be competent; and that they are assigned only to duties that they are trained for and able to do. Authorization to use specialist equipment should be documented in training records. For example, new staff must not be allowed to use autoclaves, centrifuges, freeze-drying equipment, cryopreservation facilities and safety cabinets until they have been trained in their use and are proved competent. Health and safety (biosafety) procedures must be laid down under the appropriate level of containment for the microorganisms being handled, as defined by the World Health Organization (WHO, 2004) and as interpreted by national law, regulations and policies, to avoid contaminating samples, risk of infection and environmental dispersion. Additionally, the environment for handling microorganisms must be, for example, free from contamination. The premises should be conducive to the acquisition, maintenance and provision of biological material and the organization’s services. Appropriate areas are required for specific operations. These include areas for:

• receipt and storage of the initial sample;
• preparation, regeneration, handling and processing of samples;
• biological material storage and backup for safety (a duplicate collection, preferably in a remote building or alternative site);
• supply, delivery/sales (separate from incoming accessions); and
• decontamination and cleaning of equipment and processing of wastes.

Access to the mBRC should be restricted to authorized staff or those accompanied by them. Those housing hazardous biological materials should pay particular attention to security and, where appropriate, be fitted with security devices. A contamination monitoring programme should be in place to include environmental monitoring of laboratory air and surfaces (see above). Any support services used should be of adequate quality to sustain confidence in its
activities. Supplies should be sought from reputable companies with, where possible, proven quality of products. Equipment management procedures including use, control of performance, maintenance and calibration must be laid down in a predefined schedule. Service records should also be maintained and copies of key documents held in equipment maintenance and calibration logbooks. Documentation management procedures are crucial: alterations to any operating documents need to be controlled, recorded and issued, and all staff must adhere to the prescribed policies and procedures. Any departures from documented procedures must be agreed by senior management prior to deviation, and written permission and justification be included in the relevant records. The mBRC should manage and store data and produce electronic catalogues based on authenticated and validated information. Depositors are responsible for assuring the quality of data associated with the biological material deposited, but the collection should seek evidence to assure the validity of the data. The authentication of data may differ from centre to centre, but it should include:

• providing traceability of data through a history of modifications (dates and signatures of inputs, validations, modifications and deletions); and
• signature for data entry, validation, modification or deletion.
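The traceability requirement in the list above can be illustrated with a small append-only history. What follows is a minimal sketch rather than a prescribed mBRC schema: the class names `AuditEntry` and `TrackedField`, the action vocabulary and the example curator names are all hypothetical, and a typed name stands in for whatever signature mechanism a collection actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical action vocabulary, taken from the four events named in the text.
ACTIONS = ("input", "validation", "modification", "deletion")


@dataclass(frozen=True)
class AuditEntry:
    """One signed, dated event in the life of a data field."""
    action: str
    signed_by: str
    timestamp: datetime
    value: Optional[str] = None


@dataclass
class TrackedField:
    """A data field whose history is only ever appended to."""
    name: str
    history: List[AuditEntry] = field(default_factory=list)

    def record(self, action: str, signed_by: str,
               value: Optional[str] = None,
               when: Optional[datetime] = None) -> AuditEntry:
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        if not signed_by.strip():
            raise ValueError("every entry must carry a signature")
        entry = AuditEntry(action, signed_by,
                           when or datetime.now(timezone.utc), value)
        self.history.append(entry)
        return entry

    @property
    def current_value(self) -> Optional[str]:
        # Walk backwards: the latest input/modification wins,
        # unless a later deletion has cleared the field.
        for entry in reversed(self.history):
            if entry.action == "deletion":
                return None
            if entry.action in ("input", "modification"):
                return entry.value
        return None


# Illustrative use with made-up data.
source = TrackedField("isolation_source")
source.record("input", "A. Curator", "soil")
source.record("validation", "B. Reviewer")
source.record("modification", "A. Curator", "forest soil")
for e in source.history:
    print(e.timestamp.isoformat(), e.action, e.signed_by, e.value)
```

Because entries are never overwritten, the state of a record at any earlier date can be reconstructed from the history, which is the practical point of keeping dates and signatures for every input, validation, modification and deletion.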

Conclusion

It is critical that reference genetic resources are preserved and supplied, and that the quality, reproducibility and retention of properties are given the highest priority. The repository (mBRC) must operate to internationally accepted criteria to ensure this. Attention to detail must be maintained in the supply of materials from collections, taking into consideration the nature of the biological materials and compliance with all relevant national and international regulations and policies (see above). Microorganisms should only be supplied to laboratories, and only to those individuals who are trained in microbiology and have access to properly equipped laboratories.

The recipient’s facilities must meet the specific requirements of relevant national and international regulations and policies. The mBRC must have procedures to ensure that only authorized users may access biological material and that supply follows national and international requirements such as those associated with shipping, packaging, health and safety, quarantine and biosecurity. State-of-the-art preservation for long-term maintenance and genetic stability has always been one of the core functions of culture collections and mBRCs. The reader is referred to the OECD BPG (OECD, 2007) and the ISO Biobank standard (www.iso.org/standard/67888.html, accessed 9 July 2020) to obtain an overview of current methods. Other useful references are Smith et al. (2013), Spring (2014) and Overmann and Smith (2017). The resources provided by mBRCs, in particular, require specialist management and expert knowledge to give the high quality and legal certainty needed today. Additionally, if the user cannot find the microorganisms needed, the mBRCs’ extended networks of depositors and expertise in isolation and growth may be able to collect new isolates. Alternatively, mBRCs can help to find close relatives or other organisms from similar environments that have the needed property. An identification is only as good as the reference material and the technologies used to obtain it. mBRCs operating to recognized quality management systems and standards are designed to ensure that an identification is accurate.

References

AFNOR (2008) French Standard NF S96-900 “Quality of biological resource centers (BRCs) – Management system of a BRC and quality of biological resources from human or micro-organism origin”. Association Française de Normalisation, 11 rue Francis de Préssensé F, 93 571 La Plaine Saint Denis cedex. http://www.afnor.fr
Anon (2010) The WFCC guidelines for the establishment and operation of culture collections (Online), http://www.wfcc.info/index.php/wfcc_library/publication/ (accessed March 2020).
Bader, O. (2017) Fungal species identification by MALDI-ToF mass spectrometry. Methods in Molecular Biology 1508, 323–337. DOI: 10.1007/978-1-4939-6515-1_19
Cannon, P.F. (1997) Diversity of the Phyllachoraceae with special reference to the tropics. In: Hyde, K.D. (ed.) Biodiversity of Tropical Microfungi University Press, Hong Kong, pp 255–278.
Forin, N., Nigris, S., Voyron, S., Girlanda, M., Vizzini, A., Casadoro, G. and Baldan, B. (2018) Next generation sequencing of ancient fungal specimens: the case of the Saccardo Mycological Herbarium. Frontiers in Ecology and Evolution 6, 129. https://doi.org/10.3389/fevo.2018.00129
Forti, T., Souto Ada, S., do Nascimento, C.R., Nishikawa, M.M., Hubner, M.T.W., Sabagh, F.P., Temporal, R.M., Rodrigues, J.M. and da Silva, M. (2016) Evaluation of a fungal collection as certified reference material producer and as a biological resource center. Brazilian Journal of Microbiology 47(2), 403–409. DOI: 10.1016/j.bjm.2016.01.021
Hendrickx, M. (2017) Fungal species identification by MALDI-ToF mass spectrometry. Current Fungal Infection Reports 11, 60. https://doi.org/10.1007/s12281-017-0277-6
Holanda, P., Cavalcanti, E., Borges, R.M.H. and Souza, W.S. (2012) Conformity assessment for biological resource centres (BRC): The Brazilian approach. WFCC Newsletter 52, 8–10.
Martin, D., Stackebrandt, E. and Smith, D. (2015) MIRRI promoting quality management systems for microbiology. ECronicon Microbiology 2.2, 278–287.
OECD (2007) Best Practice Guidelines for Biological Resource Centers (June 2007), http://www.oecd.org/document/36/0,3343,en_2649_34537_38777060_1_1_1_1,00.html (accessed March 2020).
Overmann, J. and Smith, D. (2017) Bioprospecting in biological resource centres: business plans and molecular databases. In: Paterson, R. and Lima, N. (eds) Bioprospecting – Successes, Potential and Constraints Vol 16. Springer, http://link.springer.com/book/10.1007/978-3-319-47935-4
Ryan, M.J., Smith, D. and Jeffries, P. (2000) A decision-based key to determine the most appropriate protocol for the preservation of fungi. World Journal of Microbiology & Biotechnology 16, 183–186. https://link.springer.com/article/10.1023/A:1008910006419
Smith, D. (2012) Culture collections. Advances in Applied Microbiology 79, 73–118. https://doi.org/10.1016/B978-0-12-394318-7.00004-8
Smith, D. and Ryan, M.J. (2008) The impact of OECD best practice on the validation of cryopreservation techniques for microorganisms. Cryo-letters 29, 63–72. https://www.ingentaconnect.com/content/cryo/cryo/2008/00000029/00000001/art00010
Smith, D., Ryan, M.J. and Day, J.G. (eds) (2001) The UK National Culture Collection Biological Resource: Properties, maintenance and management. UK National Culture Collection, Egham, UK.
Smith, D., Fritze, D. and Stackebrandt, E. (2013) Public service collections and biological resource centres of microorganisms. In: Rosenberg, E., DeLong, E.F., Stackebrandt, E., Lory, S. and Thompson, F. (eds) The Prokaryotes: Prokaryotic Biology and Symbiotic Associations, 4th edn. Springer Verlag, Heidelberg, Germany, pp 267–304.
Smith, D., da Silva, M., Jackson, J. and Lyal, C. (2017a) The impact of the Nagoya Protocol on Access and Benefit Sharing (ABS) on microbiology. Microbiology 163(3). DOI: 10.1099/mic.0.000425
Smith, D., Martin, D. and Novossiolova, T. (2017b) Microorganisms: good or evil, MIRRI provides biosecurity awareness. Current Microbiology 74, 299. DOI: 10.1007/s00284-016-1181-y
Spring, S. (2014) Preservation of thermophilic microorganisms. In: Rainey, F.A. and Oren, A. (eds) Extremophiles. Elsevier B.V., Amsterdam, the Netherlands, pp 349–368.
Stackebrandt, E., Smith, D., Casaregola, S., Varese, G.C., Verkleij, G., Lima, N. and Bridge, P.D. (2014) Deposit of microbial strains in public service collections as part of the publication process to underpin good practice in science. SpringerPlus 3, 208.
Tindall, B.J. (2015) On the valid publication of names and combinations. International Journal of Systematic and Evolutionary Microbiology 65, 3226–3227. DOI: 10.1099/ijs.0.000367
Torsvik, V., Goksøyr, J. and Daae, F. (1990) High diversity in DNA of soil bacteria. Applied and Environmental Microbiology 56, 782–787.
Verkley, G., Martin, D. and Smith, D. (2016) Microbial Resource Research Infrastructure Best Practice Manual on Access and Benefit Sharing. https://absch.cbd.int/api/v2013/documents/F1C80F1C-1EB7F02A-CEED-E7D523F17079/attachments/MIRRI%20ABS%20Manual_web.pdf and http://www.mirri.org/fileadmin/mirri/media/Dokumente/generalDocs/MIRRI_ABS_Manual_web.pdf (accessed July 2020)
Whitman, W.B., Coleman, D.C. and Wiebe, W.J. (1998) Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences of the United States of America 95, 6578–6583. https://doi.org/10.1073/pnas.95.12.6578
WHO - World Health Organization (2004) Laboratory Biosafety Manual, 3rd edn. World Health Organization, Geneva, ISBN 92 4 154650 6, WHO/CDS/CSR/LYO/2004.11

5 Can Older Fungal Sequence Data be Useful?

Paul Bridge*
Axminster, UK

Introduction

A fundamental purpose of microbial systematics is to obtain information that can be used to elucidate relationships between strains and specimens. At one level this information may be required to determine high-level evolutionary relationships, or at the other extreme it may be required to identify individual strains. In between these is the collection of data to determine genera and species. This middle level is probably the most fundamental in systematics as it provides the name that is placed on the sample. The terms ‘species’ and ‘species name’ possibly originate from Aristotle and, although the concept was formalized by Linnaeus, Darwin and others, the species has remained a fundamental unit of systematics (Cowan, 1978; Abbott et al., 1985; Leroi, 2014). Systematics is not an entirely exact science, as the information available has to be interpreted, and as scientific methods develop more information becomes available. New information has to be incorporated into a systematic framework and so, over time, taxonomic interpretations change. This is particularly true in microbial systematics where developments in molecular biology such as hybridization and PCR, together with the routine access to DNA sequencing, have resulted in a huge increase in the
information available. These advances have allowed both improved discrimination between organisms, and increased capabilities in detecting microbes in environmental material. As molecular techniques are often culture independent they can identify organisms that have not been, or cannot be, isolated into culture. Molecular screening of environmental samples has led to the detection of many previously unknown genera and species and these, in turn, have raised many questions about the total numbers of microbial species in existence. For many years there has been considerable discussion as to how many species of microbes exist worldwide, particularly in relation to the potential numbers of fungi and bacteria. The estimated numbers and the number of those species characterized increase as new techniques for sampling are developed and older material is re-examined and evaluated. During 2018 some 2189 species of fungi were described as new (Willis, 2018) and there were on average around 50 new species of bacteria described in the International Journal of Systematic and Evolutionary Microbiology each month. These new taxa include newly discovered species reported for the first time, and new species derived from existing taxa that had previously been considered under wider concepts or acknowledged as species complexes.

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

In microbial systematics information is widely disseminated when it is obtained – historically in monographs and papers – and reference materials such as specimens, cultures and DNA sequences are deposited in appropriate collections and institutions. These remain available for comparison to, and incorporation with, new information as it becomes available and such activities provide some opportunities to re-evaluate the earlier data. The curation and use of culture collection data and materials are dealt with elsewhere in this volume, as are many of the issues regarding bacteria, and so this chapter will focus primarily on the use and evaluation of fungal DNA sequences.

Data Available

The sequence of the internal transcribed spacer (ITS) of the rRNA gene has been adopted as a universal barcode for fungal systematics (see Schoch et al., 2012). As a result, ITS sequences have been recovered from an enormous range of fungi and the ITS region has been used as a target in environmental studies. Journal publication guidelines generally require deposition of sequence data prior to publication and so large numbers of fungal ITS sequences are available in the International Nucleotide Sequence Database Collaboration (INSDC) sequence databases. Various specific ITS barcoding databases have been established within the systematic mycology community (for details see Yahr et al., 2016; Raja et al., 2017 and Chapter 12, this volume). In addition to sequences from named cultures, the INSDC sequence databases contain multiple ITS sequences from only partly identified fungi. Some of these have been derived from cultures from surveys and screens, but many are from clones obtained from environmental surveys. These may have been labelled as unidentified because they did not match any of the reference sequences available at the time, or in some cases they may not have been identified because the study being undertaken did not require it. As more ITS sequences become available from both new species and existing type, ex-type and reference material, these can potentially be used to screen the environmental and part-identified sequences. This may give some insights into how many are from previously described species and
how many may be from currently unknown species. How many unidentified ITS sequences, designated as molecular operational taxonomic units (mOTUs), represent new taxa, and what proportion represent already known but unsequenced species, has been widely discussed in the literature (for discussion see Bidartondo et al., 2009; Hibbett et al., 2009, 2011). Nagy et al. (2011) investigated to what extent mOTUs might be identified if sequences were available from all current type materials. They used the zygomycete genus Mortierella as a model and obtained ITS sequences from 102 type and ex-type materials of 78 of the c.155 known species of Mortierella. They grouped these with 832 ITS sequences deposited in GenBank as from identified and unidentified mortierellalean fungi. The combined data set resulted in 92 distinct mOTUs, of which 52 contained sequences from their named reference set. They observed a roughly linear relationship between the number of mOTUs and type sequences and concluded, for the Mortierella-like fungi, ‘that most “unidentifiable” environmental sequences in fact represent species already described by taxonomists, but have not been sequenced to date’. Similar instances of environmental and unidentified sequences being matched to newly sequenced older taxa have been reported, but the wider significance of this in estimating fungal diversity is still debated (see Hibbett and Glotzer, 2011). One limitation to comparing new mOTUs to existing species that was highlighted by Nagy et al. (2011) is the limited number of verified type and ex-type sequences available for known species. This is not limited to Mortierella species and, although across mycology around 100,000 fungal species have been described, the use of rRNA sequences in those descriptions is relatively new. Yahr et al. (2016) recently reviewed links between rRNA sequences and fungal species descriptions and they identified that, in 2016, the National Center for Biotechnology Information (NCBI) database included 32,431 taxonomically named fungal species, but only 7308 of those included sequences that could be linked to type or ex-type material. In addition, they determined that between 2011 and 2016 only 55% of new fungal species descriptions included sequence data. One issue of particular concern in mycology is the number of species for
which type material is either no longer available or not suitable for modern molecular studies. For many years type material was restricted to either a dried specimen in a herbarium or, until 2007, an illustration. A recent initiative at NCBI to obtain reference sequences for targeted loci for bacterial type material has been extended to include fungal ITS sequences (see Federhen, 2015). As a result, type and ex-type information from more than 20 established taxonomic collections has been obtained, and has resulted in the release of around 2600 reference ITS sequences covering about 2500 species (Schoch et al., 2014). These sequences are maintained under the NCBI targeted loci projects in their RefSeq database and can be searched through BLAST and downloaded. The NCBI project has included re-annotation or renaming of existing sequences with the type or ex-type status included in the sequence title, and the newly numbered sequences are detailed in RefSeq (see Federhen, 2015). As reference ITS sequences are obtained for more type and ex-type material, then there will be greater opportunities for comparing current mOTUs with validated sequences. As this occurs it may be possible to put names to more ‘unidentified material’ and to identify sequences labelled with earlier names. There are potentially two different ways in which the new and old sequences can be compared. Either all of the unidentified sequences of interest can be compared to the new reference sequences by clustering, or the new reference sequences can be compared individually to all the available sequences. Nagy et al. (2011) used the first approach and downloaded unknown sequences, combining them in a data set with their new reference sequences. This allowed them to use clustering methods to identify groups of highly similar sequences. This approach works well when there are clearly distinct sequence differences and arbitrary cut-off values can be assigned to the species being considered.
This approach has been used extensively with fungal ITS sequences to delineate species groupings within sequence databases (e.g. Kõljalg et al., 2013; Vu et al., 2019; see also Chapter 12, this volume), both to determine species groupings and to investigate the placement of an unknown sequence. However, a clustering approach can be quite difficult with groups where there are very high levels of similarity between sequences from similar species. The very
small discrepancies can be problematic in these cases, and overlap between species or closely related genera can require individual cut-off values to be determined for each species being considered (see later). The alternative approach, where each new reference sequence is compared to all potential sequences – essentially using the sequence to ‘fish’ for closely related or identical sequences – can be a slow process, and may also depend on how many sequences from that species are already present in the database. It could, however, prove useful where ITS sequence differences are very small, and any previously unidentified sequences that were representatives of the taxon being considered may be apparent quickly in simple BLAST searches.
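The clustering route described above can be sketched in a few lines. This is an illustrative sketch only: single-linkage grouping is an assumption (the text does not say which clustering method Nagy et al. used), the function names `cluster_by_cutoff` and `groups_without_reference` are hypothetical, and the sequence labels and identity values are made up. Pairwise percent identities are taken as already computed.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def cluster_by_cutoff(names: Iterable[str],
                      identity: Dict[FrozenSet[str], float],
                      cutoff: float) -> List[List[str]]:
    """Single-linkage grouping: two sequences are linked when their
    pairwise percent identity is at or above the cut-off."""
    parent = {name: name for name in names}

    def find(x: str) -> str:
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pair, pct in identity.items():
        a, b = tuple(pair)
        if pct >= cutoff:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in parent:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(g) for g in groups.values())


def groups_without_reference(groups: List[List[str]],
                             references: Set[str]) -> List[List[str]]:
    """Groups (mOTUs) that contain no type or ex-type reference sequence."""
    return [g for g in groups if not references.intersection(g)]


# Made-up example: two reference sequences and two environmental clones.
names = ["ref_A", "ref_B", "env_1", "env_2"]
identity = {
    frozenset(("ref_A", "env_1")): 99.1,
    frozenset(("ref_A", "ref_B")): 95.0,
    frozenset(("ref_B", "env_1")): 94.8,
    frozenset(("ref_A", "env_2")): 90.0,
    frozenset(("ref_B", "env_2")): 91.0,
    frozenset(("env_1", "env_2")): 90.5,
}
groups = cluster_by_cutoff(names, identity, cutoff=97.0)
print(groups)
print(groups_without_reference(groups, {"ref_A", "ref_B"}))
```

At the 97% cut-off the toy data give three groups, one of which has no reference sequence; lowering the cut-off to 90% merges everything into a single group, which illustrates why the choice of cut-off matters so much when species differ by only a few positions.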

Placing mOTUs in Beauveria

Nagy et al. (2011) used a cut-off of 97% sequence similarity to define their species groups in Mortierella. This may be considered rather low for many fungal groups (see later) and it may be useful to have a comparison with a genus where species are more closely defined. One candidate for this is the widely distributed entomopathogenic fungal genus Beauveria. Species of Beauveria are commonly isolated from insect cadavers and soil, and have also been found in water, air and in association with plant material (see Meyling and Eilenberg, 2007; Jaronski, 2008; Vega et al., 2008). They have been reported from both tropical and temperate regions, so it would seem likely that Beauveria isolates and sequences would be among those obtained from a range of environmental studies. The taxonomy of the genus has developed considerably over the last few years. De Hoog recognized three species in 1972, and four further species were described by 2006. In 2011 Rehner et al. characterized 12 species and in 2017 Imoulan et al. listed and reviewed 17 species. Since that review further species have been described and currently it is thought that there are between 20 and 28 species (de Hoog, 1972; Rehner et al., 2011; Imoulan et al., 2017; Bustamante et al., 2019). Imoulan et al. (2017) reported that there were some 60 species names listed for Beauveria in Index Fungorum. At the present time there are 48 species names listed in MycoBank although some, such as Beauveria delacroixii, are currently
considered as synonyms and others such as Beauveria felina have been associated with other teleomorph genera. The genus occurs as a distinct, single entity within the Cordycipitaceae and appears quite distinct from other anamorphic genera placed in that family (Sung et al., 2007; Kepler et al., 2017). ITS regions across the species are very similar to each other and multigene phylogenies have been needed to elucidate relationships between species (Rehner et al., 2011; Imoulan et al., 2017). Rehner et al. (2011) used a four-loci study, including ITS, to evaluate species concepts in the genus, and they also reported that each molecular data set was effective for accurate diagnosis. Reference ITS sequences from type and verified materials are available for many of the species. A set of reference ITS sequences was obtained for ex-type or otherwise verified strains of 25 of the species recognized by Imoulan et al. (2017) and the recently described Beauveria majiangensis (Chen et al., 2018). Imoulan et al. (2017) considered that species identification by ITS alone had reached the limit of its resolution in Beauveria and that the ITS–nrDNA region performed poorly in resolving some species. To determine if appropriate identification cut-offs could be established for individual species, the reference sequences were BLAST-searched using the ‘sequences from type material’ option at NCBI and aligned through multiple alignment using fast Fourier transform (MAFFT) at the EBI website (www.ebi.ac.uk/Tools/msa/mafft/, accessed 1 February 2020). Percentage identity was determined for all pairwise alignments to provide an estimate of interspecific variation. In the current exercise simple percentage identity values, and the percent coverage, were used as the primary criteria to avoid any possibility of small differences in e scores that could theoretically be due to large differences in the numbers of sequences available for the various species.
The percent identities between the accepted Beauveria species were high, ranging from 92.5% to 99.2%. In this instance identity scores of between 98% and 99% were adopted as cut-off values. Applying these across the reference sequences allowed the 25 species to be arranged in 18 ITS groups (see Table 5.1). While this prevents definitive identification of some species, it should permit clear identification at genus and species group level.
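As an illustration of the pairwise comparison step, the sketch below computes percent identity between sequences that have already been aligned to equal length (for example, rows exported from a multiple alignment). The definition used is an assumption, since several conventions exist: columns where both sequences have a gap are ignored, and a gap opposite a base counts as a mismatch. The function names and the three toy sequences are hypothetical and are not Beauveria data.

```python
from itertools import combinations
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences of equal length.

    Assumed convention: columns gapped in both sequences are skipped;
    a gap opposite a base is counted as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def pairwise_identities(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent identity for every pair of sequences in an alignment."""
    return {
        (n1, n2): percent_identity(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment, for illustration only.
alignment = {
    "sp1": "ACGTACGTAC",
    "sp2": "ACGTACGTAT",
    "sp3": "ACG-ACGTTT",
}
scores = pairwise_identities(alignment)
for pair, pct in scores.items():
    print(pair, round(pct, 1))
print("range:", min(scores.values()), "to", max(scores.values()))
```

The minimum and maximum of such a table give the kind of interspecific range quoted above, and the values for pairs of reference sequences are what a per-species cut-off has to separate.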

‘Fishing’ for New Sequences

Reference ITS sequences for each Beauveria species were compared by BLAST against the full NCBI nucleotide database to determine whether ITS sequences from environmental clones or other unidentified material showed any matches. The number of matching sequences that fell within the cut-off levels was determined for each reference sequence, and these are shown in Table 5.1. Overall, across all of the reference data, there were 679 ‘matches’ above the cut-off values. The great majority of these matches were to sequences from appropriately named Beauveria species. Discounting sequences that had possibly been mislabelled or misidentified, this exercise found 23 matches to sequences labelled as Beauveria sp. or Cordyceps sp., and 26 to ‘unidentified sequences’. Of the 23 Beauveria/Cordyceps sp. sequences identified, five were sequences that had originally been deposited as unnamed species, although the strains used were subsequently described as new species of Beauveria (B. acridophila, B. diapheromeriphila, B. gryllotalpidicola and B. loensis). The 26 unidentified sequences were from a variety of sources including insects, mites, plants, soil, air and water. One group of 11 sequences labelled as ‘Fungal strain’ that matched with Beauveria caledonica were from an unpublished study of entomopathogenic fungi, and it is possible that they were re-isolations of a single strain. In total there were 44 matches that could be considered as being to partly identified or unidentified material. As there were 679 ‘matches’ within the cut-off values, approximately 6.5% were to unidentified material. The limited resolution in ITS sequences for Beauveria species raises problems in making direct comparisons with the Mortierella study, but one possible interpretation is that there are very few environmental isolations of ITS sequences of Beauveria that have not been identified to at least genus and ‘species group’ level.
The BLAST-based approach focused on the cut-off values determined for the existing sequences. This then raises the question of whether there are unidentified ITS sequences that could potentially be new species of the genus. This was tested by downloading sequences described as unclassified and comparing them individually to the reference Beauveria species ITS sequences.
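The counting step in the ‘fishing’ exercise can be sketched as follows. The chapter does not say how the BLAST results were exported, so this example assumes BLAST tabular output (the `-outfmt 6` layout, in which the first three columns are query id, subject id and percent identity). The function name, accessions and identity values are hypothetical.

```python
import csv
import io


def hits_above_cutoff(blast_tsv, cutoff):
    """Return (subject id, percent identity) for hits at or above the cut-off.

    Assumes BLAST tabular output, where column 2 is the subject id and
    column 3 is the percent identity.
    """
    hits = []
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, pident = row[1], float(row[2])
        if pident >= cutoff:
            hits.append((subject_id, pident))
    return hits


# Hypothetical hits for one reference query; the accessions are placeholders.
example = (
    "ref_caledonica\tACC0001\t99.64\t560\t2\t0\t1\t560\t1\t560\t0.0\t1020\n"
    "ref_caledonica\tACC0002\t98.75\t560\t7\t0\t1\t560\t1\t560\t0.0\t995\n"
    "ref_caledonica\tACC0003\t96.10\t540\t21\t0\t1\t540\t1\t540\t0.0\t880\n"
)
print(hits_above_cutoff(example, cutoff=98.7))
```

Repeating the count for every reference sequence, each with its own cut-off, would give per-group totals of the kind reported in Table 5.1.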

Table 5.1. Matches obtained from NCBI Nucleotide database for reference ITS sequences from Beauveria species.

ITS group (and cut-off value used) | No. of sequences above cut-off value | No. of ‘correct’ matches | Matches to older name/concept, to other Beauveria species, and to unidentified sequences or others
B. acridophila* (95%) | 6 | 6 | 0
B. amorpha (99%) | 9 | 2 | 6 as B. bassiana; 1 as Hypocreales sp.
B. caledonica (98.7%) | 40 | 22 | 1 as B. tenella; 1 as Beauveria sp., 2 as B. bassiana, 1 as B. brongniartii; 11 as Fungal strain sp., 1 as Hypocreales sp., 1 as Ascomycete
B. diapheromeriphila* (92%) | 2 | 2 | 0
B. gryllotalpidicola* (98.5%) | 2 | 2 | 0
B. hoplocheli* (98.5%) | 12 | 12 | 0
B. lii (98.5%) | 1 | 1 | 0
B. locustiphila* (98.5%) | 4 | 4 | 0
B. loensis* (96.5%) | 3 | 3 | 0
B. majiangensis (99%) | 32 | 3 | 16 as B. brongniartii, 4 as B. asiatica, 1 as B. amorpha, 2 as B. bassiana, 1 as B. medogensis, 1 as Beauveria sp. strain GZ1214, 1 as uncultured Beauveria clone; 1 as Cordyceps sp., 2 as Cordyceps militaris
B. malawiensis (99%) | 11 | 10 | 1 as B. brongniartii
B. pseudobassiana (99%) | 183 | 67 | 94 as B. bassiana, 4 as B. tenella; 1 as B. brongniartii, 8 as Beauveria sp.; 1 as Fungus sp., 3 as Uncultured fungal clone, 2 as Uncultured fungus, 1 as Clavicipitaceae clone, 1 as Ascomycete clone, 1 as Cordyceps taishanensis
B. sinensis (98%) | 4 | 1 | 3 as Beauveria or Cordyceps sp.
B. sungii/scarabeicola* (99%) | 17 | 11 | 5 as B. bassiana; 1 as Cordyceps militaris
B. varroae (98.5%) | 18 | 6 | 1 as B. kipukae, 4 as B. bassiana, 4 as Beauveria sp., 1 as Uncultured fungal clone
B. vermiconia (98%) | 4 | 3 | 1 as B. bassiana
B. bassiana/staphylinidicola*/rudraprayagi* ITS group (99%) | 249 | 235 | older name/concept n/a; 1 as B. pseudobassiana, 2 as B. brongniartii; 8 as Beauveria or Cordyceps sp., 1 as Uncultured Asco, 1 as Uncultured fungal clone, 1 as Uncultured Cordycipitaceae clone
B. brongniartii/asiatica/australis/kipukae/medogensis ITS group (98%) | 82 | 70 | older name/concept n/a; 6 as B. bassiana, 1 as B. amorpha, 1 as Beauveria sp.; 1 as Uncultured fungal clone, 3 as Cordyceps militaris

ITS sequences could not be obtained for ex-types of B. araneola, B. peruviensis or B. blattidicola.
*ex-type sequence not included in RefSeq

P. Bridge

‘Clustering’ for New Sequences

The search terms ‘unclassified Hypocreales’ and ‘internal’ were used to search the NCBI nucleotide database and to download 66 candidate ITS sequences of between 450 and 1500 bp. The sequences were aligned through MAFFT with all of the named reference Beauveria sequences, and clustered. The Beauveria sequences were recovered as two closely related groups independent of the unidentified sequences. Individual BLAST searches of the unclassified sequences gave significant matches to a variety of species and genera, but none of the matches was to any Beauveria species. This approach was further refined by reducing the potential taxonomic diversity and considering sequences that had been deposited as unknown members of Cordyceps. Sixty records containing ITS sequences labelled as ‘unclassified Cordyceps’ were downloaded. Seven sequences that had been identified with known species in the previous BLAST ‘fishing’ approach were removed, and the remaining 53 sequences were BLAST-searched against both the full NCBI nucleotide data set and the restricted data set of sequences from type material only. Five of the unclassified Cordyceps sequences matched to the known Beauveria species at better than 95% identity but below the cut-off values used for the species earlier. The initial BLAST ‘fishing’ exercise was based on only type, ex-type and reference sequences, and did not make any allowance for ITS sequence variation within species, and so it cannot be determined whether these five matches represent new species or outlying strains of known species. None of the remaining 48 ‘unclassified Cordyceps’ sequences showed any significant similarity to named Beauveria species. Twenty of the sequences could each be matched to multiple sequences of known Cordyceps species, although verified reference sequences were not available for these.
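The first step of the ‘clustering’ approach, restricting downloaded candidates to sequences of between 450 and 1500 bp, could be sketched as below. The length window is the one given in the text. The small FASTA parser, the function names and the two example records are assumptions for illustration, and the later alignment and clustering steps are not reproduced here.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def filter_by_length(records, min_len=450, max_len=1500):
    """Keep only sequences whose length lies within the given window (bp)."""
    return {h: s for h, s in records.items() if min_len <= len(s) <= max_len}


# Two made-up records: one of 200 bp and one of 600 bp.
fasta = ">short_clone\n" + "ACGT" * 50 + "\n>candidate_its\n" + "ACGT" * 150 + "\n"
kept = filter_by_length(read_fasta(fasta))
print(sorted(kept))
```

A lower bound of this kind removes partial fragments, while the upper bound excludes records that extend well beyond the ITS region.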

Outcome

Each of the approaches used here gives results that provide some insights into the potential value of the older sequence data in Beauveria. Superficially, the small proportion of unknown/unclassified sequences (6.5%) found in the BLAST matches could suggest that there are relatively few older unknown ITS sequences that can be identified to current Beauveria species. This is supported by the examination of the downloaded unclassified Hypocreales sequences. There are, however, a number of particular factors that need to be considered. Beauveria is a widespread cosmopolitan genus that could be expected to feature in environmental studies, and there are over 2000 ITS sequences available for the various strains and species. Therefore, any close additional matches may be a long way down any list from BLAST screening. This is apparent when the unclassified Cordyceps sequences were examined and five further potential matches were found at levels near or just below the identification cut-off values used. There were an additional seven sequences that had subsequently been assigned to Beauveria, but not relabelled, and so 12 of the 60 unidentified Cordyceps sequences (20%) were considered as Beauveria. Although Beauveria species are difficult to identify by ITS alone, the genus is reasonably distinct by morphology and very distinct at the ITS level, so generic identifications or identifications to the two most common species of Beauveria bassiana and Beauveria brongniartii could be expected. This was seen in the fishing approach in Table 5.1, where matches to named non-target sequences were largely to B. bassiana/B. brongniartii (115/21) and the only other named sequences were from various Cordyceps species (7). The systematics of Beauveria has been substantially refined in recent years and 19 of the species names used here have been described in the last 20 years. In some instances, there were no extraneous matches to those taxa, suggesting that they have not been encountered before. The major exception to this was Beauveria pseudobassiana where more than half of the sequences that matched in the BLAST ‘fishing’ exercise had originally been assigned to older Beauveria species.
In general terms the results of this study, although based on small data sets, tend to give support to the views of Nagy et al. (2011) that many of the currently unidentified ITS sequences can be assigned to existing species as more verified reference ITS sequences become available. This is supported in the clustering approach, where 20 unclassified Cordyceps species showed matches at around 99% to multiple named sequences, where verified reference material was either not available or not specified. It would seem likely that these sequences could be assigned to those species, but the lack of sequences from ex-type material would make any re-annotation difficult to validate. Comparison of the results from the two different approaches suggests that the BLAST search ‘fishing’ exercise was appropriate for identifying older sequences that could now be labelled as known species, and that the ‘clustering’ approach was more suitable for identifying sequences that may represent as yet undescribed Beauveria taxa.

Mislabelled Sequences

One concern regarding the use of the public sequence databases for fungal identification is the presence of incorrectly named or annotated sequences. This was reported in detail by Nilsson et al. (2006), and has partially been addressed through a number of initiatives (see Yahr et al., 2016) including the curation and re-annotation of ITS sequences from ex-type material mentioned earlier (Schoch et al., 2014; Federhen, 2015). In this study, sequences labelled as ‘different’ Beauveria species (or Cordyceps species) were obtained in the top matches above the cut-off levels for 10 of the 19 ITS groups. It is, however, difficult to estimate to what extent these represent misidentified/mislabelled sequences. There were seven accepted species names available in 2006, compared to the 20–28 currently reported (Rehner et al., 2011; Imoulan et al., 2017; Bustamante et al., 2019), and so some of the older sequences will have been labelled with the most appropriate name available at that time. Imoulan et al. (2017) recovered their multigene phylogeny as nine main clades. Sequences with ‘different’ names could be considered correct if they are to older names in the same clade or to an older morphologically similar species. If these are excluded then there were 66 sequences that were incorrectly named at species level (see Table 5.1). Many of these sequences appeared in the matches for more than one species, and there were 25 independent sequences that were possibly incorrectly named. There were 679 matches above the cut-off levels and so this equates to 3.7% of matches being to incorrectly labelled sequences. However, it is difficult to apply any confidence levels to this figure, particularly as the cut-off values used in this study were generated by comparing all of the new taxonomic studies and so are probably more stringent than those used previously. There is limited ITS resolution in Beauveria and so some ‘different’ names may simply be due to lack of resolution between similar species.
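The step from 66 flagged matches to 25 independent sequences reflects the fact that one database record can appear in the match lists of several ITS groups. A minimal sketch of that de-duplication is given below. The accessions and group assignments are invented for illustration, and only the final percentage uses figures from the text (25 of 679).

```python
# Hypothetical accessions flagged as carrying a 'different' name, per ITS group.
flagged = {
    "B. amorpha": ["ACC0101", "ACC0102"],
    "B. caledonica": ["ACC0102", "ACC0103"],
    "B. pseudobassiana": ["ACC0101", "ACC0103", "ACC0104"],
}

# Summing per-group lists counts a record once for every group it matched.
total_flagged = sum(len(accs) for accs in flagged.values())

# A set keeps each accession once, giving the independent sequences.
independent = set().union(*flagged.values())


def percent_of_matches(count, total):
    """Express a count as a percentage of all matches, to one decimal place."""
    return round(100 * count / total, 1)


print(total_flagged, len(independent))
print(percent_of_matches(25, 679))  # figures reported in the text
```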

Limitations in the Methodology

Although BLAST searching and clustering approaches with older data can provide some useful insights, there are a number of clear limitations to these approaches. Three of these are the ‘universal’ use of ITS sequences, assigning cut-off values, and the names and labels applied to sequences.

ITS

As mentioned earlier, the ribosomal internal transcribed spacer region has been adopted as the universal barcode for fungi (see Schoch et al., 2012), and a number of ITS or barcode databases have been established (e.g. see Yahr et al., 2016; Raja et al., 2017). The ITS region was proposed as the universal fungal barcode as it was found to have ‘the highest probability of successful identification for the broadest range of fungi, with the most clearly defined barcode gap between inter- and intraspecific variation’ (Schoch et al., 2012). There are other opinions as to whether it is appropriate to use a multi-copy gene, as there may be opportunities for intragenomic variability (e.g. Nilsson et al., 2008; Simon and Weiss, 2008; Lindner et al., 2013). Instances where this may have occurred and resulted in more than one ITS sequence being present have been reported by numerous authors, and although it does seem more prevalent in some genera, it may only be relevant in a very small proportion of fungal species (Lindner et al., 2013). ITS barcodes have become an important tool in many different areas, ranging from the identification of individual taxa in diagnostic procedures to markers in large-scale community ecology and biodiversity surveys. One limitation to a solely ITS-based identification for fungi is that average inter- and intraspecific sequence differences vary for different genera (e.g. Nilsson et al., 2008). As a result, species of some genera such as Mortierella can be clearly identified solely from ITS sequences; whereas in other genera, such as Penicillium or Colletotrichum, ITS sequences can only reliably be used to place strains in groups containing multiple named species. It has been estimated that the ITS sequence is effective for species discrimination for around 70% of fungi (Schoch et al., 2012; Yahr et al., 2016) and so the remaining 30% (which include many commonly encountered genera) require alternative or multigene sets to define individual species. Various genes have been used for different genera (see Table 5.2 and Chapter 12) and so comparable sequences may not be available for unknown or environmental samples.

Table 5.2. Some recent examples of multigene sets used in delineating fungal species.

Genus | Gene regions used | References
Beauveria | RPB1, RPB2, TEF-1α, Bloc | Imoulan et al., 2017
Cladosporium | ACT, ITS, TEF-1α | Tibpromma et al., 2019
Colletotrichum | ACT, TUB2, CHS-1, GAPDH, CAL, ITS, SOD, GS | Weir et al., 2012
Emarellia | LSU, ITS, TEF-1α, RPB2 | Borman et al., 2016
Fusarium | SSU, ITS1, 5.8S, LSU, TUB2, TEF-1α, LYS2 | Watanabe et al., 2011
Fusarium | TUB2, CAL, TEF-1α, ITS, LSU, RPB1, RPB2 | Maryani et al., 2019
Fusarium | mtSSU, IGS, TEF-1α | Kee et al., 2020
Fomitopsis | ITS, LSU, SSU, mtSSU, TEF-1α, RPB2 | Han et al., 2016
Metarhizium | TEF-1α, RPB1, RPB2, TUB2, IGS | Bischoff et al., 2009
Mycosphaerella | ITS, TEF-1α, ACT | Hunter et al., 2006
Penicillium | TUB2, CAL, RPB2 | Houbraken et al., 2011
Pestalotiopsis | ITS, TEF-1α, TUB2 | Tibpromma et al., 2019
Trichoderma | ACT, CAL, ITS, RPB2, TEF-1α | Chaverri et al., 2015

Gene regions: ACT, actin; Bloc, a nuclear intergenic region; CAL, calmodulin; CHS-1, chitin synthase; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; GS, glutamine synthetase; IGS, nuclear ribosomal intergenic spacer region; ITS, nuclear rDNA internal transcribed spacer; LSU, rDNA large subunit; LYS2, aminoadipate reductase; mtSSU, mitochondrial rDNA small subunit; SSU, rDNA small subunit; RPB1, RNA polymerase II largest subunit; RPB2, RNA polymerase II second largest subunit; SOD, manganese-superoxide dismutase; TEF-1α, translation elongation factor 1α; TUB2, partial β-tubulin 2; 5.8S, rDNA 5.8S subunit

Cut-off values

Various authors have investigated the use of different cut-off values for identifying fungal species, and differences in intrageneric and intraspecies ITS variation have been discussed elsewhere (see Chapter 12). Kõljalg et al. (2013) demonstrated the performance of various cut-off values ranging from 97% to 99%, and more recently Vu et al. (2019) have suggested a figure of 99.6% for reference sequence data. In this study with Beauveria it was found that intraspecific ITS variation was not consistent, and separate cut-off values were determined for each species or species group. Although some very high cut-off values have been suggested, one issue that needs to be considered is whether these cut-offs are greater than values that may arise due to differences between multiple copies of rDNA genes. Such events do not appear to be frequent, but they have been reported from a wide range of single and multinucleate fungi (e.g. O’Donnell and Cigelnik, 1997; Fatehi and Bridge, 1998; Lindner and Banik, 2011; Li et al., 2013). Simon and Weiss (2008) obtained rRNA gene sequences from multiple clones of four ascomycete fungi. They found that 2%–3% of sites in the ITS were polymorphic, possibly due to multiple gene copies not evolving concertedly. They also raised the possibility that some apparent polymorphisms may be due to Taq polymerase misreadings. The fidelity of modern DNA polymerases is generally around 10⁻⁵ to 10⁻⁶; however, this can depend on the DNA sequence, fragment size and temperature (see McInerney et al., 2014). Fidelity is measured as the accuracy for each copying event, and so errors could compound during the sequencing of long fragments with multiple PCR cycles. Diaz and Sabino (1998) reported fidelity values of 10⁻⁵ to 10⁻⁶ but obtained 0.13%–0.2% error rates in sequencing a 3 kb fragment over 35 cycles. This area has been reviewed recently, and temperature damage to DNA during thermal cycling was raised as a more significant source of errors (Potapov and Ong, 2017). Other errors can occur during high-throughput and pyrosequencing due to carry-over events, and these are collectively termed CAFIE (carry forward/incomplete extension) errors (for details see Balzer et al., 2011). Cannon et al. (2008) reported a 1.5% variation in published ITS sequences derived from different lines of the same strain of Colletotrichum gloeosporioides, and multiple similar examples can be readily found in sequence databases. One example is ARSEF324, a strain of Metarhizium acridum that has been extensively studied. It was subsequently relabelled as FI985 by Driver et al. (2000) and has been used in biocontrol studies and as a commercial formulation (Milner and Hunter, 2001). Five ITS sequences are available for the strain under the two strain numbers. These are not strictly duplicates, as they differ in length, and one sequence differs by two bases from the others. It is not possible to determine if the slightly divergent sequence is different owing to sequencing errors, or whether it represents a different clone of the original isolate. Arguments for and against universal cut-off values have been extensively discussed elsewhere (see Nilsson et al., 2008; Kõljalg et al., 2013) but, in light of such uncertainties, it may be unrealistic to consider any universal cut-off values for ITS sequences of greater than about 98.5%.
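The way copying errors could compound over PCR cycles can be illustrated with a deliberately simple calculation. The sketch below assumes that a product molecule descends from a lineage of at most one copying event per cycle, with an independent per-base error chance equal to the quoted fidelity value. This is an illustrative upper bound under that assumption, not the model used by the authors cited, and the function name is hypothetical.

```python
def max_error_fraction(fidelity, cycles):
    """Per-base chance of at least one error along a lineage of `cycles` copies."""
    return 1 - (1 - fidelity) ** cycles


# Fidelity values quoted in the text, applied to 35 cycles and a 3 kb fragment.
for fidelity in (1e-5, 1e-6):
    fraction = max_error_fraction(fidelity, cycles=35)
    print(
        f"fidelity {fidelity:.0e}: {100 * fraction:.4f}% per base, "
        f"about {3000 * fraction:.2f} expected errors in 3 kb"
    )
```

Under these assumptions the compounded figure for 35 cycles stays below 0.04% per base, lower than the 0.13%–0.2% observed by Diaz and Sabino (1998), which is in keeping with the suggestion that other sources such as thermal damage also contribute.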
Duplicated sequences

Comparing sequences from different lines of the same strain can be useful in estimating errors, etc., but it also raises the issue of duplication in the sequence databases. This is partly being addressed through the NCBI type strain initiative mentioned earlier (see Schoch et al., 2014; Federhen, 2015) but appears to be only partly successful as the type status may not have been clearly reported when the sequence was deposited. One example is Penicillium chrysogenum where the new RefSeq sequence effectively replaces the earlier sequences from CBS and IMI lines of the ex-type strain, although sequences from the FRR, ATCC and JCM lines are also available. There are likely to be instances where a single strain has been used as a reference in multiple studies, or has been of interest to several different research groups, as in the M. acridum example above. In these cases, it may be possible to identify ‘duplicate’ sequences by cross-comparison of reference and culture collection numbers, but this can be problematic. For example, the ex-type strain of P. chrysogenum has acquired at least ten different culture collection numbers, not all of which can be easily cross-referenced. In environmental studies it is possible that multiple identical sequences may be the result of re-isolations of a single strain, and the same may be true for identical sequences recovered from situations such as the spread of a new plant or animal disease. For example, there are around 500 ITS sequences available through the EBI portal for the causal agent of Ash die-back (Hymenoscyphus fraxineus), but it is not possible to determine how many of these are from re-isolations of a single strain. Duplicated sequences can lead to redundancy in nucleotide databases and may have an effect on various tasks (see Chen et al., 2017).
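Cross-comparison of strain numbers can be sketched as a simple lookup that maps equivalent numbers to one canonical label before grouping records. The ARSEF324/FI985 equivalence is the one described above. The accessions, the `XYZ 0001` strain number and the function names are hypothetical, and a real synonym table would have to be compiled from culture collection catalogues.

```python
# Equivalent strain numbers mapped to one canonical label.
# ARSEF324 and FI985 are the pair given in the text for one M. acridum strain.
SYNONYMS = {"FI985": "ARSEF324"}


def canonical_strain(strain):
    """Normalise spacing and case, then resolve known equivalent numbers."""
    key = strain.replace(" ", "").upper()
    return SYNONYMS.get(key, key)


def group_by_strain(records):
    """Group accession numbers by the canonical strain they derive from."""
    groups = {}
    for accession, strain in records:
        groups.setdefault(canonical_strain(strain), []).append(accession)
    return groups


# Hypothetical records: (accession, strain number as deposited).
records = [
    ("ACC0201", "ARSEF 324"),
    ("ACC0202", "FI985"),
    ("ACC0203", "arsef324"),
    ("ACC0204", "XYZ 0001"),
]
print(group_by_strain(records))
```

Groups with more than one accession are candidates for duplication, although, as with the M. acridum example, sequences from the same strain need not be identical.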

Limitations of names and labels

One issue identified in this study is whether sequences should, or can, be re-annotated when an identification is possible. Nearly half of the ‘unclassified Cordyceps’ ITS sequences used here could be placed at species or species complex level. Arguably there would be benefits in some form of re-annotation of older sequences as identification becomes possible, because this could provide useful further information on the species such as distribution, ecology and hosts. The same could be said about some level of re-annotation of sequences after systematic revision. In this study a number of the ITS sequences labelled as Cordyceps sp. were found to be from strains subsequently described as new species. Although there is now provision for collection curators to retrospectively notify NCBI of ex-type cultures, it is unclear what the procedure would be with a group of sequences from multiple strains that were all found to belong to new species, although only one was specified as being from type material. The examples seen here are not uncommon events and there are other instances of sequences retaining the name as deposited before a new name was adopted.

Species Complexes

One development from the increased use of sequence data in fungal taxonomy has been the re-examination of large complex species that may be aggregations of two or more separate species. This has been undertaken on many occasions in recent years. These name changes are recognized at the time for the materials studied and in subsequent identifications. Any older cultures and specimens can also be re-examined, but older sequences – particularly those obtained from non-culture-based studies – can be problematic. Species concepts in the black yeast Exophiala were revised by de Hoog et al. (2003). In that study new and existing ITS sequences were used in a multigene study that resulted in some strains previously identified as Exophiala jeanselmei being placed in the new species Exophiala oligosperma. Some of the older ITS sequences from those strains remain labelled as E. jeanselmei in the nucleotide databases. In this case the ITS sequences of the two species are distinct and so the earlier ITS sequences can be re-identified, although again how any re-annotation would be achieved is not clear. In other cases, where there is little ITS sequence variation between the new and old species concepts, the situation becomes more difficult. One example of this is in P. chrysogenum, where strains previously assigned to this species have subsequently been shown to belong to a number of distinct groups, each of which has been recognized as a separate species. As mentioned earlier, species of Penicillium are difficult to separate by ITS alone, and multigene approaches are required to differentiate the species in section Chrysogena. The species P. chrysogenum was critically examined in 2011 and several strains previously named as P. chrysogenum were transferred to a new species, Penicillium rubens (Houbraken et al., 2011). The ITS sequences from the ex-types of P. chrysogenum and P. rubens differ by only a single base, which equates to 99.8% identity. However, species concepts in Penicillium section Chrysogena were reconsidered in 2012 and 2013, and a further five new species were described that had ITS sequences indistinguishable from P. chrysogenum/P. rubens (Houbraken et al., 2012; Browne et al., 2013). There are over 1500 ITS sequences labelled as P. chrysogenum available at EBI in the Sequence database. Around 550 of these sequences were deposited before publication of the 2011–2013 systematic changes and so were not named according to the current concepts. These sequences cannot, therefore, be unequivocally named as P. chrysogenum as now described, and can only be considered as members of one of the species in section Chrysogena unless they can be linked to the new names through strain number or some other data. The revision of P. chrysogenum is not an isolated example and many recent fungal studies have identified polyphyletic lines in previously named species. In many of these instances the ITS sequences have not been sufficient to distinguish the new species. Other examples can be found in many genera, one such being the genus Colletotrichum where various species complexes have been reconsidered. One of these was Colletotrichum acutatum where 31 distinct species were identified, 21 of them new, and many that could not be distinguished by ITS sequences alone (Damm et al., 2012). In this case there are some 1900 ITS sequences available in the EMBL Sequence database under the name C. acutatum with around 700 predating the taxonomic revision. Species subdivisions have widespread implications outside systematics as the same uncertainty in the ITS sequences also applies to any DNA sequence associated with that species name.
For example, there are around 5000 non-ITS DNA sequences labelled as C. acutatum in the EMBL Sequence database, and around 1600 of these predate the revision. Other non-systematic target gene regions may also differ between the new species, and so re-examination of the sequence may allow the sequence to be re-annotated.
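Identifying records that predate a revision amounts to comparing a deposition date with the date of the taxonomic change. The sketch below does this at year resolution. The revision years follow the citations in the text (Houbraken et al., 2011; Damm et al., 2012), with 2011 taken as the start of the 2011–2013 changes for P. chrysogenum as a simplifying assumption. The records and function name are hypothetical.

```python
# Year in which each name was first subdivided, following the revisions cited
# in the text. Using a single year per name is a simplification.
REVISION_YEAR = {
    "Penicillium chrysogenum": 2011,
    "Colletotrichum acutatum": 2012,
}


def predates_revision(species_name, deposit_year):
    """True if a record was deposited under a name before that name was revised."""
    revised = REVISION_YEAR.get(species_name)
    return revised is not None and deposit_year < revised


# Hypothetical records: (accession, name as deposited, year of deposition).
records = [
    ("ACC0301", "Colletotrichum acutatum", 2005),
    ("ACC0302", "Colletotrichum acutatum", 2016),
    ("ACC0303", "Penicillium chrysogenum", 2009),
    ("ACC0304", "Beauveria bassiana", 2001),
]
to_review = [acc for acc, name, year in records if predates_revision(name, year)]
print(to_review)
```

Year resolution is coarse: a record deposited in the revision year itself is treated here as post-revision, so a careful check would use full deposition and publication dates.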


Conclusion

The study of Beauveria species demonstrates that validated ITS sequences can be used to ‘fish’ for or cluster similar older sequences in nucleotide databases and identify potential new or more appropriate species names for some of the older records. This type of approach is, however, limited to fungal genera where ITS sequences give sufficient resolution (allowing for errors); in some other common genera such as Penicillium the approach would only allow identification to be made to a species complex or section level. One of the definitions given for the term ‘legacy data’ is ‘old information that an organization has, especially information stored in an old-fashioned way’ (Longman Dictionary of Contemporary English, 2020). The examples given here demonstrate that some of the older fungal sequence data can be considered to fall into the category of legacy data, where earlier nomenclature can be taken as the ‘old-fashioned way’. In particular this occurs where older names cannot be reliably matched against the current species concepts, as in the Penicillium example. This is compounded as alternatives to the ITS regions become necessary for accurate species delineation. This does not mean that legacy data necessarily loses any scientific value. Strain numbers can be traced through the literature and culture collection databases, although in some cases (such as with environmental isolates) they may not have been deposited or considered elsewhere. Similarly, multiple depositions of the same strain may allow a sequence from an alternative DNA region to be found under an equivalent collection number. As systematics is dynamic, and names fixed to sequences are not, there is a need to check that the name attached to any sequence used for further study remains appropriate. Easily accessible resources such as Index Fungorum and MycoBank provide dates for species descriptions and lists of earlier synonyms (see Chapters 2 and 3) and are useful starting points for identifying recent taxonomic changes. Where long-established species have been subdivided, however, it can be difficult to determine if an older name remains valid for a particular sequence. In these cases, it may be necessary to trace the history of the originating strain, and its identification, to determine if the name attached to a sequence remains valid. Non-systematic sequences perhaps present the major issue in the use of legacy data. The systematic tracing of strain and sequence numbers, and linking these with recent taxonomic revisions, may be a routine process for systematists. However, many of the users of sequence data are non-systematists who use the non-systematic sequences, and who may be unaware of any changes to the species (or generic) names. This, in turn, could be further compounded if the original organism that the sequence came from had been identified using an older sequence. There is a potential risk of different species concepts developing in and outside the systematic community, and this could lead to further issues if organisms, or their sequences, are utilized for a range of activities such as environmental markers and commercial development. In conclusion: legacy sequences (particularly ITS ones) can have a role in identifying new taxa or re-examining existing ones. However, attaching current species names to legacy sequences can be problematic, and should only be undertaken in conjunction with a review of any subsequent taxonomic revisions within the genus.

References

Abbott, L.A., Bisby, F.A. and Rogers, D.J. (1985) Taxonomic Analysis in Biology. Columbia University Press, New York. https://doi.org/10.7312/abbo93026
Balzer, S., Malde, K. and Jonassen, I. (2011) Systematic exploration of error sources in pyrosequencing flowgram data. Bioinformatics (Oxford, England) 27, i304–i309. https://doi.org/10.1093/bioinformatics/btr251
Bidartondo, M., Ameri, G. and Döring, H. (2009) Closing the mycorrhizal DNA sequence gap. Mycological Research 113, 1025–1026. https://doi.org/10.1016/j.mycres.2009.09.009


Bischoff, J.F., Rehner, S.A. and Humber, R.A. (2009) A multilocus phylogeny of the Metarhizium anisopliae lineage. Mycologia 101, 512–30. https://doi.org/10.3852/07-202
Borman, A.M., Desnos-Ollivier, M., Campbell, C.K., Bridge, P.D., Dannaoui, E. and Johnson, E.M. (2016) Novel taxa associated with human fungal black-grain mycetomas – Emarellia grisea gen nov. et sp. nov. and Emarellia paragrisea sp. nov. Journal of Clinical Microbiology 54, 1738–1745. https://doi.org/10.1128/JCM.00477-16
Browne, A.G., Fisher, M.C. and Henk, D.A. (2013) Species-specific PCR to describe local-scale distributions of four cryptic species in the Penicillium chrysogenum complex. Fungal Ecology 6, 419–429. https://doi.org/10.1016/j.funeco.2013.04.003
Bustamante, D.E., Oliva, M., Leiva, S., Mendoza, J.E., Bobadilla, L., Angulo, G. and Calderon, M.S. (2019) Phylogeny and species delimitations in the entomopathogenic genus Beauveria (Hypocreales, Ascomycota), including the description of B. peruviensis sp. nov. MycoKeys 58, 47–68. https://doi.org/10.3897/mycokeys.58.35764
Cannon, P.F., Buddie, A.G. and Bridge, P.D. (2008) The typification of Colletotrichum gloeosporioides. Mycotaxon 104, 189–204.
Chaverri, P., Branco-Rocha, F., Jaklitsch, W., Gazis, R., Degenkolb, T. and Samuels, G.J. (2015) Systematics of the Trichoderma harzianum species complex and the re-identification of commercial biocontrol strains. Mycologia 107, 558–590. https://doi.org/10.3852/14-147
Chen, Q., Zobel, J. and Verspoor, K. (2017) Duplicates, redundancies and inconsistencies in the primary nucleotide databases: a descriptive study. Database: the Journal of Biological Databases and Curation 2017, baw163. https://doi.org/10.1093/database/baw163
Chen, W.H., Liu, M., Huang, Z.X., Yang, G.M., Han, Y.F., Liang, J.D. and Liang, Z.Q. (2018) Beauveria majiangensis, a new entomopathogenic fungus from Guizhou, China. Phytotaxa 333, 243–250. https://doi.org/10.11646/phytotaxa.333.2.8
Cowan, S.T. (1978) A Dictionary of Microbial Taxonomy (ed. Hill, L.R.) Cambridge University Press, Cambridge, UK.
Damm, U., Cannon, P.F., Woudenberg, J.H.C. and Crous, P.W. (2012) The Colletotrichum acutatum species complex. Studies in Mycology 73, 37–113. https://doi.org/10.3114/sim0010
Diaz, R.S. and Sabino, E.C. (1998) Accuracy of replication in the polymerase chain reaction. Comparison between Thermotoga maritima DNA polymerase and Thermus aquaticus DNA polymerase. Brazilian Journal of Medical and Biological Research 31, 1239–1242. https://doi.org/10.1590/S0100-879X1998001000001
Driver, F., Milner, R.J. and Trueman, J.W.H. (2000) A taxonomic revision of Metarhizium based on a phylogenetic analysis of rDNA sequence data. Mycological Research 104, 134–150. https://doi.org/10.1017/S0953756299001756
de Hoog, G.S. (1972) The genera Beauveria, Isaria, Tritirachium and Acrodontium gen. nov. Studies in Mycology 1, 1–41.
de Hoog, G.S., Vicente, V., Caligiorne, R.B., Kantarcioglu, S., Tintelnot, K., Gerrits van den Ende, A.H. and Haase, G.G. (2003) Species diversity and polymorphism in the Exophiala spinifera clade containing opportunistic black yeast-like fungi. Journal of Clinical Microbiology 41, 4767–4778. https://doi.org/10.1128/JCM.41.10.4767-4778.2003
Fatehi, J. and Bridge, P.D. (1998) Detection of multiple rRNA-ITS regions in isolates of Ascochyta. Mycological Research 102, 762–766. https://doi.org/10.1017/S0953756297005704
Federhen, S. (2015) Type material in the NCBI Taxonomy Database. Nucleic Acids Research 43 (Database issue), D1086–D1098. https://doi.org/10.1093/nar/gku1127
Han, M., Chen, Y., Shen, L., Song, J., Vlasák, J., Dai, Y. and Cui, B. (2016) Taxonomy and phylogeny of the brown-rot fungi: Fomitopsis and its related genera. Fungal Diversity 80, 343–373. https://doi.org/10.1007/s13225-016-0364-y
Hibbett, D.S., Ohman, A., Glotzer, D., Nuhn, M., Kirk, P. and Nilsson, R.H. (2011) Progress in molecular and morphological taxon discovery in Fungi and options for formal classification of environmental sequences. Fungal Biology Reviews 25, 38–47. https://doi.org/10.1016/j.fbr.2011.01.001
Hibbett, D.S., Ohman, A. and Kirk, P.M. (2009) Fungal ecology catches fire. New Phytologist 184, 279–282. https://doi.org/10.1111/j.1469-8137.2009.03042.x
Hibbett, D. and Glotzer, D. (2011) Where are all the undocumented fungal species? A study of Mortierella demonstrates the need for sequence-based classification. New Phytologist 191, 592–596. https://doi.org/10.1111/j.1469-8137.2011.03819.x


Houbraken, J., Frisvad, J.C. and Samson, R.A. (2011) Fleming's penicillin producing strain is not Penicillium chrysogenum but P. rubens. IMA Fungus 2, 87–95. https://doi.org/10.5598/imafungus.2011.02.01.12
Houbraken, J., Frisvad, J.C., Seifert, K.A., Overy, D.P., Tuthill, D.M., Valdez, J.G. and Samson, R.A. (2012) New penicillin-producing Penicillium species and an overview of section Chrysogena. Persoonia 29, 78–100. https://doi.org/10.3767/003158512X660571
Hunter, G.C., Wingfield, B.D., Crous, P.W. and Wingfield, M.J. (2006) A multi-gene phylogeny for species of Mycosphaerella occurring on Eucalyptus leaves. Studies in Mycology 55, 147–161. https://doi.org/10.3114/sim.55.1.147
Imoulan, A., Hussain, M., Kirk, P.M., Meziane, A.E. and Yao, Y-J. (2017) Entomopathogenic fungus Beauveria: Host specificity, ecology and significance of morpho-molecular characterization in accurate taxonomic classification. Journal of Asia-Pacific Entomology 20, 1204–1212. https://doi.org/10.1016/j.aspen.2017.08.015
Jaronski, S.T. (2008) Soil ecology of the entomopathogenic Ascomycetes: a critical examination of what we (think) we know. In Ekesi, S. and Maniania, N.K. (eds) Use of Entomopathogenic Fungi in Biological Pest Management. Research Signpost, Thiruvananthapuram, India, pp. 91–144.
Kee, Y.J., Zakaria, L. and Mohd, M.S. (2020) Morphology, phylogeny and pathogenicity of Fusarium species from Sansevieria trifasciata in Malaysia. Plant Pathology 69, 442–454. https://doi.org/10.1111/ppa.13138
Kepler, R.M., Luangsa-ard, J.J., Hywel-Jones, N.L., Quandt, C.A., Sung, G.H., Rehner, S.A., Aime, M.C., Henkel, T.W., Sanjuan, T., Zare, R., Chen, M., Li, Z., Rossman, A.Y., Spatafora, J.W. and Shrestha, B. (2017) A phylogenetically-based nomenclature for Cordycipitaceae (Hypocreales). IMA Fungus 8, 335–353. https://doi.org/10.5598/imafungus.2017.08.02.08
Kõljalg, U., Nilsson, R.H., Abarenkov, K., Tedersoo, L., Taylor, A.F.S., Bahram, M., Bates, S.T., Bruns, T.D., Bengtsson-Palme, J., Callaghan, T.M., Douglas, B., Drenkhan, T., Eberhardt, U., Dueñas, M., Grebenc, T., Griffith, G.W., Hartmann, M., Kirk, P.M., Kohout, P., Larsson, E., Lindahl, B.D., Lücking, R., Martín, M.P., Matheny, P.B., Nguyen, N.H., Niskanen, T., Oja, J., Peay, K.G., Peintner, U., Peterson, M., Põldmaa, K., Saag, L., Saar, I., Schüßler, A., Scott, J.A., Senés, C., Smith, M.E., Suija, A., Taylor, D.L., Telleria, M.T., Weiss, M. and Larsson, K.-H. (2013) Towards a unified paradigm for sequence-based identification of fungi. Molecular Ecology 22, 5271–5277. https://doi.org/10.1111/mec.12481
Leroi, A.M. (2014) The Lagoon: How Aristotle Invented Science. Bloomsbury, London.
Li, Y., Jiao, L. and Yao, Y.J. (2013) Non-concerted ITS evolution in fungi, as revealed from the important medicinal fungus Ophiocordyceps sinensis. Molecular Phylogenetics and Evolution 68, 373–379. https://doi.org/10.1016/j.ympev.2013.04.010
Lindner, D.L. and Banik, M.T. (2011) Intragenomic variation in the ITS rDNA region obscures phylogenetic relationships and inflates estimates of operational taxonomic units in genus Laetiporus. Mycologia 103, 731–740. https://doi.org/10.3852/10-331
Lindner, D.L., Carlsen, T., Nilsson, R.H., Davey, M., Schumacher, T. and Kauserud, H. (2013) Employing 454 amplicon pyrosequencing to reveal intragenomic divergence in the internal transcribed spacer rDNA region in fungi. Ecology and Evolution 3, 1751–1764. https://doi.org/10.1002/ece3.586
Longman Dictionary of Contemporary English (2020). Available at https://www.ldoceonline.com/dictionary/legacy-data (accessed 5 February 2020).
Maryani, N., Sandoval-Denis, M., Lombard, L., Crous, P.W. and Kema, G.H.J. (2019) New endemic Fusarium species hitch-hiking with pathogenic Fusarium strains causing Panama disease in small-holder banana plots in Indonesia. Persoonia 43, 48–69. https://doi.org/10.3767/persoonia.2019.43.02
McInerney, P., Adams, P. and Hadi, M.Z. (2014) Error rate comparison during Polymerase Chain Reaction by DNA polymerase. Molecular Biology International 2014, Article ID 287430. https://doi.org/10.1155/2014/287430
Meyling, N.V. and Eilenberg, J. (2007) Ecology of the entomopathogenic fungi Beauveria bassiana and Metarhizium anisopliae in temperate agroecosystems: Potential for conservation biological control. Biological Control 43, 145–155. https://doi.org/10.1016/j.biocontrol.2007.07.007
Milner, R.A. and Hunter, D.M. (2001) Recent developments in the use of fungi as biopesticides against locusts and grasshoppers in Australia. Journal of Orthopteran Research 10, 271–276. https://doi.org/10.1665/1082-6467(2001)010[0271:RDITUO]2.0.CO;2
Nagy, L.G., Petkovits, T., Kovács, G.M., Voigt, K., Vágvölgyi, C. and Papp, T. (2011) Where is the unseen fungal diversity hidden? A study of Mortierella reveals a large contribution of reference collections to the identification of fungal environmental sequences. New Phytologist 191, 789–794. https://doi.org/10.1111/j.1469-8137.2011.03707.x


Nilsson, R.H., Kristiansson, E., Ryberg, M., Hallenberg, N. and Larsson, K.H. (2008) Intraspecific ITS variability in the Kingdom Fungi as expressed in the international sequence databases and its implications for molecular species identification. Evolutionary Bioinformatics Online 4, 193–201. https://doi.org/10.4137/EBO.S653
Nilsson, R.H., Ryberg, M., Kristiansson, E., Abarenkov, K., Larsson, K-H. and Kõljalg, U. (2006) Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective. PLoS ONE 1, e59. https://doi.org/10.1371/journal.pone.0000059
O’Donnell, K. and Cigelnik, E. (1997) Two divergent intragenomic rDNA ITS2 types within a monophyletic lineage of the fungus Fusarium are non-orthologous. Molecular Phylogenetics and Evolution 7, 103–116. https://doi.org/10.1006/mpev.1996.0376
Potapov, V. and Ong, J.L. (2017) Examining sources of error in PCR by single-molecule sequencing. PLoS ONE 12, e0169774. https://doi.org/10.1371/journal.pone.0169774
Raja, H.A., Miller, A.N., Pearce, C.J. and Oberlies, N.H. (2017) Fungal identification using molecular tools: A primer for the natural products research community. Journal of Natural Products 80, 756–770. https://doi.org/10.1021/acs.jnatprod.6b01085
Rehner, S.A., Minnis, A.M., Sung, G.H., Luangsa-ard, J.J., Devotto, L. and Humber, R.A. (2011) Phylogeny and systematics of the anamorphic, entomopathogenic genus Beauveria. Mycologia 103, 1055–1073. https://doi.org/10.3852/10-302
Schoch, C.L., Seifert, K.A., Huhndorf, S., Robert, V., Spouge, J.L., Levesque, C.A., Chen, W. and Fungal Barcoding Consortium (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences 109, 6241–6246. https://doi.org/10.1073/pnas.1117018109
Schoch, C.L., Robbertse, B., Robert, V., Vu, D., Cardinali, G., Irinyi, L., Meyer, W., Nilsson, R.H., Hughes, K., Miller, A.N. et al. (2014) Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi. Database: the Journal of Biological Databases and Curation 2014, bau061. https://doi.org/10.1093/database/bau061
Simon, U.K. and Weiss, M. (2008) Intragenomic variation of fungal ribosomal genes is higher than previously thought. Molecular Biology & Evolution 25, 2251–2254. https://doi.org/10.1093/molbev/msn188
Sung, G.H., Hywel-Jones, N.L., Sung, J.M., Luangsa-Ard, J.J., Shrestha, B. and Spatafora, J.W. (2007) Phylogenetic classification of Cordyceps and the clavicipitaceous fungi. Studies in Mycology 57, 5–59. https://doi.org/10.3114/sim.2007.57.01
Tibpromma, S., Mortimer, P.E., Karunarathna, S.C., Zhan, F., Xu, J., Promputtha, I. and Yan, K. (2019) Morphology and multi-gene phylogeny reveal Pestalotiopsis pinicola sp. nov. and a new host record of Cladosporium anthropophilum from edible pine (Pinus armandii) seeds in Yunnan province, China. Pathogens 8, 285. https://doi.org/10.3390/pathogens8040285
Vega, F.E., Posada, F., Aime, M.C., Pava-Ripoll, M., Infante, F. and Rehner, S.A. (2008) Entomopathogenic fungal endophytes. Biological Control 46, 72–82. https://doi.org/10.1016/j.biocontrol.2008.01.008
Vu, D., Groenewald, M., de Vries, M., Gehrmann, T., Stielow, B., Eberhardt, U., Al-Hatmi, A., Groenewald, J.Z., Cardinali, G., Houbraken, J., Boekhout, T., Crous, P.W., Robert, V. and Verkley, G.J.M. (2019) Large-scale generation and analysis of filamentous fungal DNA barcodes boosts coverage for kingdom fungi and reveals thresholds for fungal species and higher taxon delimitation. Studies in Mycology 92, 135–154. https://doi.org/10.1016/j.simyco.2018.05.001
Watanabe, M., Yonezawa, T., Lee, K., Kumagai, S., Sugita-Konishi, Y., Goto, K. and Hara-Kudo, Y. (2011) Molecular phylogeny of the higher and lower taxonomy of the Fusarium genus and differences in the evolutionary histories of multiple genes. BMC Evolutionary Biology 11, 322. https://doi.org/10.1186/1471-2148-11-322
Weir, B.S., Johnston, P.R. and Damm, U. (2012) The Colletotrichum gloeosporioides species complex. Studies in Mycology 73, 115–180. https://doi.org/10.3114/sim0011
Willis, K.J. (ed.) (2018) State of the World’s Fungi 2018. Report. Royal Botanic Gardens, Kew, UK. https://stateoftheworldsfungi.org/
Yahr, R., Schoch, C.L. and Dentinger, B.T. (2016) Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 371, 20150336. https://doi.org/10.1098/rstb.2015.0336

6

Data Resources: Role and Services of Culture Collections

Matthew J. Ryan1,*, Gerard Verkleij2 and Vincent Robert2
1CABI, Egham, UK; 2Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, Utrecht, The Netherlands

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Introduction

Microbial culture collections (CC) support taxonomy through the provision and preservation of microorganisms, underpinning the integrity of research. However, cultures are essentially worthless without associated data. Details of isolation (geographical location, date of collection, date of isolation), preservation, identification, characterization, growth conditions, properties and applications, etc., make up the key microbial data set that should be associated with every culture (Table 6.1). Additionally, there are aspects of information necessary to meet regulatory requirements, and conditions for use and distribution, that must be held with the strain. Therefore, it is imperative that scientists document and record key information and that curators check data and maintain them in a sustainable way. There is software available to curators to assist in the management of data and it is often used to populate key microbial catalogues of individual microbial biological resource centre (mBRC) holdings and aggregate them in resources such as the Global Catalogue of Microorganisms (GCM), or the Microbial Resource Research Infrastructure (MIRRI), which is under construction. However, it is important to note that while small collections can build their (online) strain databases using the GCM, larger individual mBRCs usually have their own tools (often developed in-house) to keep their own strain data and thus do not depend on the GCM for this. It is imperative that the names of organisms and their taxonomy that are held in collections are correct and linked to curated nomenclature resources such as Index Fungorum and MycoBank.

CC are dealing with an increasing number of objectives and duties. Clients want more information about the strains, improved quality services for identification, the choice of a large panel of strains, to be able to order strains online, rapid delivery, etc. At the same time, funding bodies are increasing their expectations for high-quality reproducible research, requiring more high-impact factor scientific publications, increased biosecurity and the tracking of the origin of the strains, while maintaining or improving the overall quality of the collection. As usual, all these objectives should be achieved with reduced staff and money. An additional complexity is that taxonomy is dynamic with the introduction of new technologies that are producing large amounts of data. More and more, CC staff have to handle increasing amounts and diversity of data quickly and with limited resources.

While CCs, herbaria and museums were considered for decades as core facilities to access type or reference material (see Chapter 4), there is a current trend to consider them as less important as data acquisition on newly sampled material by new methodologies such as next-generation sequencing is becoming cheaper and easier (see Chapters 8, 12 and 14). Therefore, strains or specimens that are poorly annotated or lack useful metadata can be considered by some as useless. In this chapter we will summarize the current status of data curation in fungal CCs, assess the challenges of maintaining genomic and bioinformatic data and investigate how collections need to evolve to meet the challenges of data acquisition and maintenance in the future in order to underpin taxonomy.

Table 6.1. Example of a typical culture collection minimal microbial data set.

Culture Collection     Unique identifier number
Organism name          Current accepted taxonomic name
Other name             Any previous name
Date collected         Date sample collected from the environment
Date isolated          Date strain isolated
Geographical           GPS location
Country of origin      ISO country list
Isolated from          Host or substrate
Environment            e.g. desert, arable farm, etc.
Isolated by            Name of isolator
Collected by           Name of collector
Other collections      Strain number of other centres where the culture is held
Preservation           Details of preservation method history and dates
Biosecurity            Organism-specific biosecurity risk
Biosafety              Organism-specific biosafety risk
Security               Restrictions of the culture collection for release e.g. Nagoya Protocol
Molecular tests        Sequences
Proteomic tests        Protein sequences and spectra

ISO, International Organization for Standardization
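As an illustration only, the minimal data set of Table 6.1 could be held as a simple record type so that gaps in the documentation of a culture are easy to report. The sketch below is not taken from any collection's software; the class name, the field names, the `missing_fields` helper and the example values are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class StrainRecord:
    """One culture with the Table 6.1 items (field names are illustrative)."""
    collection_number: str                       # unique identifier number
    organism_name: str                           # current accepted taxonomic name
    other_names: List[str] = field(default_factory=list)        # previous names
    date_collected: Optional[date] = None        # sample taken from environment
    date_isolated: Optional[date] = None         # strain isolated
    gps_location: Optional[Tuple[float, float]] = None          # (lat, lon)
    country_of_origin: Optional[str] = None      # code from the ISO country list
    isolated_from: Optional[str] = None          # host or substrate
    environment: Optional[str] = None            # e.g. desert, arable farm
    isolated_by: Optional[str] = None
    collected_by: Optional[str] = None
    other_collections: List[str] = field(default_factory=list)  # strain numbers elsewhere
    preservation: Optional[str] = None           # method, history and dates
    biosecurity: Optional[str] = None            # organism-specific risk
    biosafety: Optional[str] = None              # organism-specific risk
    release_restrictions: Optional[str] = None   # e.g. Nagoya Protocol
    sequences: List[str] = field(default_factory=list)          # molecular tests
    proteomic_data: List[str] = field(default_factory=list)     # spectra, protein sequences


def missing_fields(record: StrainRecord) -> List[str]:
    """Return the names of fields that are still empty (None, "" or [])."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in (None, "", [])]


# Hypothetical example record, not a real accession.
record = StrainRecord(
    collection_number="CC 000001",
    organism_name="Beauveria bassiana",
    country_of_origin="GB",
    date_isolated=date(2020, 2, 5),
)
print(missing_fields(record))
```

A curator-facing tool could run such a check at accession and prompt the depositor for whatever is still missing.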

The Importance of Reliable Data

A fungal or bacterial culture without its associated data is not useful and has little intrinsic value. All scientists involved in processing a culture, from sample collection to isolation and from characterization to supply, have a responsibility to ensure that all data and information pertaining to a culture are recorded and documented. In recent times, the rapid evolution of technology has allowed microbiologists to undertake analyses on microbes that were unimaginable even 5–10 years ago. This has produced massive data sets that need curating in association with the organism from which the data were generated. This now includes the relatively straightforward sequencing of whole microbial genomes and the subsequent bioinformatic analysis (see Chapter 14). Ryan et al. (2019) summarized the key issues facing curators in their handling of data and these include:

• the amount of data to be stored, handled and analysed;
• the diversity of data types;
• algorithms that need to be developed and tuned to be able to properly analyse complex and diverse data sets;
• the interconnection of data sets;
• the formats, re-usability of data and data exchange;
• the target audience of the data produced (e.g. if they are for humans or for machines);
• the rise of artificial intelligence and its consequences;
• the costs associated with the points mentioned above and the shortage of software developers, database specialists or data analysts (to mention only a few);
• building large multi-disciplinary teams (biologists, ecologists, bioinformaticians; and software, algorithm and databasing specialists, etc.).

The GCM provides a good example of a comprehensive database and information retrieval, analysis and visualization system for microbial resources established through the World Data Centre for Microorganisms (WDCM) (Wu et al., 2013). The WDCM has been focusing on how to enhance, enrich and link data to multiple sources (Wu et al., 2017). There are many separate data sources of relevance to the use and characterization of microorganisms; they require collaboration and coordination mechanisms to provide interoperability and dynamic linking in order to benefit the user and enable innovation (Ryan et al., 2019). Some examples are described in this chapter; see also Chapters 11, 12, 13 and 14.

Desired Function of a Modern Culture Collections Management System

CC data management systems (CCMS) must include functionalities that are useful to curators, technicians, researchers, clients buying the strains and end-users of the website wishing to get data for their studies, and developers should clearly distinguish the features each of these groups needs. Clients of a CC are usually looking for strains that have a number of properties and want to order them quickly via an order form available from printed or, more likely now, web-based catalogues. They usually want to know how much strains will cost and when they will be delivered. Previously, hard copy CC catalogues were the only way to list all the strains and provide additional data to clients. Such printed catalogues have now become redundant and most CCs have websites that contain the list of available strains, with some additional features or metadata. Many CCs still do not provide more data than that previously listed in printed catalogues. However, there is certainly a trend to increase the amount of data associated with each strain, as it provides significant added value to the strains. Most CCs allow searching for basic strain data such as strain number, species name, country of origin, substrate or equivalent CC numbers in other collections. Some collections allow clients to query their databases by multiple criteria, e.g. morphological, physiological, chemical, molecular, ecological, geo-localization, bibliographical or other properties.

Researchers interested in using data held in CCs may not yet be clients and may use CC websites and associated databases to retrieve specific information, to perform correlation analyses or to identify their unknown strains against reference databases. CCs that have created websites that are more than just online catalogues are more likely to attract more traffic – and therefore more clients – than those just posting basic strain data without any additional tools or features. Good examples of such websites are those of the Westerdijk Institute-KNAW (www.wi.knaw.nl) and MycoBank (www.MycoBank.org) (both accessed 16 October 2020), which allow online pairwise DNA sequence alignments against reference and curated databases. MycoBank attracts between 2000 and 3500 unique users per day and offers a number of free tools that allow researchers to analyse and compare data. Other websites, such as Barcode of Life Database (BOLD; www.boldsystems.org) or GenBank (www.ncbi.nlm.nih.gov) (both accessed 16 October 2020), attract even more users and provide a wide range of functional tools.

Tools should be present and integrated in a CCMS to provide support for the users mentioned above. The first one is an integrated set of data retrieval tools that includes a laboratory information management system (LIMS) to handle a variety of data entries, such as those related to the administration of the collection, and ecological, morphological, physiological, chemical and molecular data (among others). Integrated data retrieval tools are important as they can help technicians, scientists and curators to save time, be more efficient and reduce error rates. Data importation and exportation are also critically important to allow connections with different external software or machines. Automated importation is required when regular, high-throughput data, such as matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF) spectra, DNA sequences or HPLC profiles, need to be associated with strains. Once the data are stored in the central databasing system, they should be managed through an easy-to-use interface that includes advanced security features and access management, tracking of database modifications by each user, stock management of strains, customer information management, orders and management of invoices. The system should be able to read data in multiple formats and include the ability to create custom layouts such as invoices, catalogues or sample labels, and scripting tools to automate routine tasks and extend functionalities of the software for storage, editing and analysis of DNA and protein sequence data.

When data are properly stored in a database they will be available for subsequent taxonomic analysis, or comparisons such as generating dynamic geographic distribution maps. This will enable identifications, classifications, species determinations and phylogeny construction. Additionally, comparisons of spectral data (such as MALDI-TOF, GC or HPLC profiles) may be useful.
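The automated importation step described above can be sketched, under stated assumptions, as a routine that reads a batch of DNA sequences and attaches each one to the strain it belongs to. The sketch assumes FASTA-formatted input whose header lines take the hypothetical form `>strain number|locus`; real instrument exports and real CCMS software will differ, and the function names and strain numbers here are illustrative.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def import_sequences(fasta_text, known_strains):
    """Attach sequences to known strains; report headers that match no strain.

    Assumes (hypothetically) that each header is 'strain number|locus'.
    """
    attached = defaultdict(list)
    unmatched = []
    for header, sequence in parse_fasta(fasta_text):
        strain_id, _, locus = header.partition("|")
        strain_id = strain_id.strip()
        if strain_id in known_strains:
            attached[strain_id].append((locus.strip(), sequence))
        else:
            unmatched.append(header)  # left for a curator to resolve by hand
    return dict(attached), unmatched


# Hypothetical batch: one sequence for a held strain, one for an unknown number.
batch = ">CC-0001|ITS\nACGT\nTTGA\n>CC-9999|ITS\nGGCC\n"
attached, unmatched = import_sequences(batch, {"CC-0001"})
print(attached)
print(unmatched)
```

Returning the unmatched headers rather than discarding them keeps a curator in the loop, which reflects the emphasis on reducing error rates.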


Finally, the perfect CCMS should also be able to publish a selected portion of the information defined by the curator directly on the internet. This means that changes in the data made in the management software can be easily and quickly made available to the website. This can allow the release of new strains and associated data while restricting access to data that should not be made visible to the general public. The creation of webpages and website content should also be integrated in the system without requiring the intervention of a highly qualified web designer or manager. It should be fast and easy to update the information. Online deposit forms for the use of depositors of strains should be easy for the curators to create and edit. Online ordering and automated payments and invoicing should also be part of the ideal system. The majority of CCs’ data should respect the FAIR principles (findable, accessible, interoperable and reusable; Wilkinson et al., 2016) by being exposed through web services such as REST or SOAP (Mumbaikar and Padiya, 2013) that can be utilized by other software systems through the Global Biodiversity Information Facility (GBIF), the European Bioinformatics Institute (EMBL)/National Center for Biotechnology Information (NCBI), European Open Science Cloud (EOSC), Microbial Resource Research Infrastructure (MIRRI), World Data Centre for Microorganisms (WDCM) or by individual researchers. There are many more features that could be mentioned; the list above is certainly not exhaustive and will grow over the years.
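To make the idea of a REST-style web service concrete, the following is a minimal sketch of how requests for strain data might be answered in a machine-readable form. It is not the interface of any existing collection: the `/strains` paths, the `handle_get` function and the example catalogue are hypothetical, and the catalogue is assumed to contain only the fields the curator has already chosen to publish.

```python
import json


def handle_get(path, catalogue):
    """Return (HTTP status, JSON body) for GET /strains or GET /strains/<id>."""
    parts = [p for p in path.split("/") if p]
    if parts == ["strains"]:
        # List the identifiers of all published strains.
        return 200, json.dumps(sorted(catalogue))
    if len(parts) == 2 and parts[0] == "strains":
        record = catalogue.get(parts[1])
        if record is None:
            return 404, json.dumps({"error": "strain not found"})
        return 200, json.dumps(record, sort_keys=True)
    return 404, json.dumps({"error": "unknown resource"})


# Hypothetical published catalogue keyed by strain number.
catalogue = {
    "CC-0001": {"organism_name": "Beauveria bassiana", "country_of_origin": "GB"},
}
print(handle_get("/strains", catalogue))
print(handle_get("/strains/CC-0001", catalogue))
print(handle_get("/strains/CC-4242", catalogue))
```

Stable, predictable addresses and a standard format such as JSON are what allow aggregators and individual researchers to reuse the same data without manual intervention.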

Reliable and Useful Data

Classically, CCs gather a number of basic administrative features related to the strains in their collection. The minimal data set is often enhanced through a more comprehensive approach to the management and handling of data (see Table 6.2). While such data are of course necessary, they are now certainly insufficient, as a taxonomic affiliation does not provide the expected features, properties and abilities of a given strain. End-users now expect to have access to ecological, morphological, physiological, chemical, behavioural or molecular data in order to select the most appropriate strains and conduct specific research with them.

To allow third parties to conduct efficient research, curators and scientists working in CCs must ensure the highest quality standards for the preservation of the integrity of the strains and the data that are associated with them (see Chapter 4). This is a critical point as nobody wants to repeatedly reassess the properties of strains. The idea is to build on existing reliable data and add new quantitative properties based on, for example, reproducible physiological tests or reliable DNA sequencing of barcodes or genomes.

Supporting Fungal Taxonomy

The name of an organism is often the starting point to access other information on its provenance, taxonomy, morphology and general biology. The overarching quality of databases is dependent on the use of correct, current names and linking to key data held by collections, such as the ex-type strains upon which descriptions are based (see Chapters 2, 3 and 4). This may include images, digital sequence information and important information sourced from the microbial data set. Registration of new fungal names is a mandatory requirement for valid publication under the International Code of Nomenclature for algae, fungi, and plants (ICN) (Turland et al., 2018). Scientists can register new fungal names through one of three online databases: Fungal Names, Index Fungorum and MycoBank. Additionally, they should provide a ‘core’ set of data which should include the deposit of the type specimen (as a dried specimen in a fungarium/herbarium or as a living culture preserved in a metabolically inactive state in a CC) and details of the place of publication. For each deposit, the online database provides a unique registration number; for the name to be valid, this number needs to be cited in the place of publication of the name (see also Chapter 3).

One of the three online databases, MycoBank, was launched in 2004 by mycologists at the CBS-KNAW (now the Westerdijk Fungal Biodiversity Institute) and recognized by the International Mycological Association (IMA) in 2010. The primary aim of MycoBank is to register all fungal taxonomic novelties, including new names and combinations (each receiving an ‘MB’ number), and to provide descriptions and illustrations (Crous et al., 2004; Robert et al., 2013).


Table 6.2. Management and handling of data in a typical culture collection.

Process: Submission to culture collection (usually through form issued to depositor)
Data or requirements:
  Place of collection (GPS) including country of origin
  Collector
  Date of collection
  Place of isolation
  Isolator
  Date of isolation
  Depositor
  Depositor address (incl. institution)
  Depositor number
  Other collection number
  Substrate/host
  Environment/ecosystem
  Provisional taxonomy
  Maintenance (media, temperature, culture regime)
  Bibliographic reference
  Prior informed consent (Nagoya/ABS) documentation

Process: Accession into collection. Characterization and additional information from the depositor if known
Data or requirements:
  Correct taxonomy:
    Genus name and species epithet
    Subspecies (only if applicable)
    Variety (only if applicable)
    Forma (only if applicable)
    Forma specialis (only if applicable)
    Authors of the name
    Race
    Status (under nomenclature, ex-type, etc.)
  Optimal growth conditions
  Special features
  Risk assessment
  Biosecurity
  Restriction (if any)
  Purpose of accession (patent, safe deposit, publication deposit, new species, novel properties, etc.)
  Other culture collections

Process: Characterization
Data or requirements:
  Proteomic (profile) data
  Genomic (sequence) data
  Morphological tests
  Physiological tests
  Analytical tests
  Correct taxonomy
  Tests applied
  Results of tests

Process: Preservation
Data or requirements:
  Preservation method applied (liquid nitrogen cooling rate, freeze drying, etc.)
  Storage conditions
  Number of replicates/batches preserved
  Date of preservation and re-preservation
  Post-preservation viability check (germ test)

Process: Supply
Data or requirements:
  Release conditions
  Instructions for resuscitation
  Instructions for culture
  Full provenance data
  Full deliveries data (customer confidentiality)
  Future research by customer
  References/bibliography (back to collection database)


Running on BioloMICS software (Robert et al., 2011) the MycoBank database is actively curated and regularly updated and improved. Functionality for registration of typification events (epi-, lecto- or neotype) was added in 2013. A MycoBank Typification (MBT) number is issued for each typification event occurring after initial publication of the name; this aids recognition of such events and helps to avoid later superfluous alternative typifications (e.g. for the same name based on different specimens). In common with Index Fungorum and Fungal Names, MycoBank is an open access, community-driven online repository which aims to be comprehensive and to avoid a high degree of redundancy. One of the key challenges for MycoBank is to correctly link species (and lower taxonomic ranks, such as subspecies) with genomic sequence data, and this can only be done via physical specimens (dried specimens, living and metabolically inactive strains) (Durães Sette et al., 2013; Robert et al., 2013). CCs preserve these biological reference materials and collect and store associated information (Verkleij et al., 2016), which includes the key microbial information set discussed above. Further, it is good practice to deposit type- or other correctly identified voucher material in public collections and herbaria, where it can be optimally preserved and stored under strict standards, and from where it can be made available for further study (Schoch et al., 2014; Vu et al., 2016; also see Chapter 4). The onset of DNA sequencing revolutionized our understanding of fungal science and, as a result, fungal taxonomy. This has generally been beneficial in allowing for better insight into the evolution of major fungal lineages, more accurate identifications and a more in-depth comprehension of fungal diversity as cryptic taxa have been revealed. The new subclades thus revealed were often found to differ in their physiology or pathogenicity (Crous et al., 2015). 
In turn, these distinctions assisted in more accurately recognizing new species; indeed, many fungal genera have proved to be poly- and/or paraphyletic. This has resulted in many new generic names but unfortunately the proportion of genus names actually linked to validated DNA data is still relatively low. This is further complicated by the fact that GenBank holds a large number of unreliable sequences derived from fungi that have been misidentified (Bridge et al., 2003; Hawksworth, 2004), that cannot be updated to more recent taxonomic concepts (e.g. for lack of sequence data from more diagnostic loci for the voucher specimens), or that have no link to validated material. This can cause users to name their sample incorrectly, as the information on which they base their assumption is of poor quality. This is a problem for any database that cannot keep up with the changes or that requires resequencing of vouchers (such as for new loci) to be able to re-identify the materials. Further, Hawksworth (2015) recognized that misidentifications led to unnecessary and costly efforts in research and management, economic losses and even societal risks.

The linking of organism names in all curated nomenclatural databases to other sources of scientific information is required to more effectively find information for scientific analyses and management in health care, food security and conservation. In MycoBank, structural links have been created to several websites and data sources including Catalogue of Life (CoL), Encyclopaedia of Life (EOL), GBIF, Integrated Taxonomic Information System (ITIS), Google Scholar, PubMed, Wikispecies, BOLD Systems, EMBL, NCBI, All Russian Collection of Microorganisms (VKM), CBS collection and GCM. ELIXIR (https://elixir-europe.org/, accessed 16 October 2020) is an EU initiative that allows scientists to share and store their research data. It coordinates, integrates and sustains bioinformatics resources across its member states and enables users in academia and industry to access services that are vital for their research. Another example of a valuable partner in such a relationship is EU-OPENSCREEN (www.eu-openscreen.eu/, accessed 16 October 2020); this integrates some high-capacity screening platforms throughout Europe which jointly use a rationally selected compound collection, comprising up to 140,000 commercial and proprietary compounds collected from European chemists.
The European Marine Biological Research Infrastructure Cluster (EMBRIC; www.embric.eu/, accessed 16 October 2020) has demonstrated how such research infrastructures can be brought together with researchers and resource holders such as mBRCs to offer targeted research and solutions to bioindustry (Brennecke et al., 2018; Piña et al., 2018). Without international networks, researchers will continue to rely on existing global capacity such as services provided through EMBL,

Data Resources: Role and Services of Culture Collections

GenBank (an NIH genetic sequence database) and the training opportunities that these European and North American institutions offer.

Standards and Open Access

As no common standards exist for how CCs use and provide data, reference is most often made to the FAIR data standards (Wilkinson et al., 2016). FAIR is supported by the G20 group of nations to encourage ‘open science’. However, ultimately all collection data should be made open access, over and above what is typically provided. A model for open data access is provided by the Global Open Data for Agriculture and Nutrition (GODAN) model (Musker et al., 2018), but obstacles must be surmounted. For example, industry may require free access to databases while often restricting access to its own commercially valuable data. This is particularly relevant to the large private CCs held by the agritech and pharmaceutical industries, in which neither the cultures nor the data are publicly available.


It must be noted that nearly all CCs restrict some data from their holdings on the grounds of biosecurity, intellectual property (IP) protection or the Nagoya Protocol, which is concerned with the fair and equitable sharing of benefits (Verkleij et al., 2020; see also Chapters 4 and 18). Figure 6.1 shows how collections can manage data in line with both FAIR standards and the GODAN model. The central sector represents restricted information specific to the collection, such as inventory positions or preservation information. It would also cover all information related to strains that are restricted and not publicly available, such as those related to internal use, safe deposits or patent deposits. The intermediate segment represents information held on strains available in publicly available catalogues such as mBRC websites and global databases such as the GCM. These data can be used and reproduced for research use only, and may not be exploited commercially without an appropriate licence. The data available will be screened by the collection to include the key microbial data set and may exclude other or exploitable proteomic and genomic data. Finally, the outer sector represents all the data on a strain that is completely open and not restricted by licence.

Fig. 6.1. Management of collection data in line with both FAIR standards and the GODAN model. Implications of the Nagoya Protocol and other legislation. Labels in the figure: Free information; Provenance and genomic data; Licence free; Restricted information/Reserve holdings (e.g. inventory positions, biosecurity); Licensed access unless research use only; Commercially exploitable; Open information on publicly available holdings; Public domain access; Non-commercial use; All open access data and information. FAIR, findable, accessible, interoperable and reusable; GODAN, Global Open Data for Agriculture and Nutrition.
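The three sectors described for Fig. 6.1 can be illustrated with a small filter over a strain record. This is only a minimal sketch under stated assumptions: the tier names, the field names and the assignment of each field to a tier are hypothetical and are not taken from any collection's actual policy.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Access tiers, ordered from most to least restricted."""
    RESTRICTED = 0     # central sector: internal collection information
    RESEARCH_ONLY = 1  # intermediate segment: public catalogue, research use only
    OPEN = 2           # outer sector: open, not restricted by licence


# Hypothetical field-to-tier assignment; a real collection would define its own.
FIELD_TIERS = {
    "inventory_position": Tier.RESTRICTED,
    "preservation_method": Tier.RESTRICTED,
    "growth_conditions": Tier.RESEARCH_ONLY,
    "isolation_substrate": Tier.RESEARCH_ONLY,
    "species_name": Tier.OPEN,
    "country_of_origin": Tier.OPEN,
}


def visible_fields(record, access, strain_public=True):
    """Return the fields of `record` visible to a user holding `access`.

    A user sees every field whose tier is at least as open as their own
    access level. Fields missing from FIELD_TIERS are treated as restricted.
    Records for strains that are not publicly available (for example safe
    or patent deposits) are hidden from everyone except internal users.
    """
    if not strain_public and access != Tier.RESTRICTED:
        return {}
    return {
        name: value
        for name, value in record.items()
        if FIELD_TIERS.get(name, Tier.RESTRICTED) >= access
    }


# Illustrative record with made-up values.
record = {
    "species_name": "Example species",
    "country_of_origin": "Exampleland",
    "growth_conditions": "25 C, malt extract agar",
    "inventory_position": "Rack 3, box 7",
}
public_view = visible_fields(record, Tier.OPEN)
catalogue_view = visible_fields(record, Tier.RESEARCH_ONLY)
```

Defaulting unknown fields to the restricted tier is a deliberate design choice in this sketch: a newly added field stays internal until someone decides it can be released.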


M.J. Ryan et al.

At a time when data are being generated at an exponential rate, it is also possible that data generated from a microorganism cannot be freely used or commercially exploited when the terms of the Nagoya Protocol and associated national legislation of the country of origin apply to that organism. The Nagoya Protocol of the Convention on Biological Diversity (CBD) sets a number of stringent requirements for scientists to observe when collecting and isolating microorganisms (Overmann and Scholz, 2016; see Chapter 4). A beneficial aspect of this is complete data, as all information concerning the collection and isolation of the organism must be recorded to comply. This includes the date of collection and isolation, geographical location (including the country of origin and geographic coordinates) and the substrate/environment from which the organism was collected. An overview of the information that needs recording can be found in best practices developed by the community of microbial BRCs, including the ‘MIRRI Best practice manual on access and benefit sharing for mBRCs’ (Verkleij et al., 2016), primarily designed for compliance with EU Regulation 511/2014, and the TRUST (TRansparent User-friendly System of Transfer) tools (Desmeth, 2017) focusing on a more global scale. The recently updated tools for transfer agreements of the European Culture Collections’ Organisation (ECCO) (Verkleij et al., 2020) also provide guidance for collections that seek to implement ‘Nagoya-compliant model agreements’ for deposit and supply of microorganisms from a public collection to stakeholders. Important resources and the legal necessity to record more information will help to improve the accuracy and completeness of data and metadata sets but, depending on whether a country has laws and restrictions on the access and use of its biodiversity, there may be additional requirements to collect and store specific documentation.
This would include documents that provide evidence that prior informed consent (PIC) for collecting the organisms was obtained from the competent authority in the country of origin, and that the intended use and terms for sharing benefits have been settled between the provider and the user (the mutually agreed terms, MAT). Some countries encourage use of an internationally recognized certificate of compliance (IRCC) where the data requirements for documentation are predefined. The competent

authorities of the country of origin are responsible for publishing the IRCC in the Access and Benefit-Sharing Clearing-House (ABSCH) database, where it will be permanently available for users (https://absch.cbd.int/, accessed 16 October 2020). With the exception of certain data, for which public availability is mandatory, certain information that must be recorded in the IRCC can be kept confidential if so requested by the party that applied for PIC. This includes information on the identity of the collecting party, details of the location, and the type of materials or samples that were collected. A big question facing researchers relates to digital sequence information (DSI), which has been raised in relation to the Nagoya Protocol, and whether publishing and using sequence data invokes that protocol. If it is decided that it does, then it will become essential to record and track data associated with both the organisms and any sequences generated from them (see Chapter 18). Correct taxonomy and associated information are also essential for other legislative and regulatory processes. For example, when shipping material, the correct name provides an indication of an organism’s hazard status and country of origin; this is important, not only for importation processes and licensing, but also for how organisms are handled and packaged in the postal system (Smith and Ryan, 2019). Biosecurity is in itself an important consideration; information and data concerning the provenance of an organism, from where (location, habitat) and what (host, substrate) it was isolated, all inform the taxonomic decision-making process. This, in turn, influences the risk assessment process and the allocation to a specific hazard group or risk status. It is imperative that this information is authoritative and up to date.
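The provenance fields and access and benefit-sharing documents listed earlier in this section (dates of collection and isolation, location, substrate, PIC and MAT) lend themselves to a simple completeness check on a strain record. The following is a minimal sketch, assuming a flat dictionary per strain; the field names and the `abs_applies` flag are hypothetical and do not come from the MIRRI manual, the TRUST tools or the ABSCH.

```python
# Provenance items the text says must be recorded (field names are hypothetical).
REQUIRED_PROVENANCE = (
    "collection_date",
    "isolation_date",
    "country_of_origin",
    "latitude",
    "longitude",
    "substrate",
)

# Extra documentation needed when the country of origin regulates access:
# prior informed consent (PIC) and mutually agreed terms (MAT).
ABS_DOCUMENTS = ("pic_reference", "mat_reference")


def missing_fields(record, abs_applies):
    """List the required fields that are absent or empty in `record`.

    `abs_applies` should be True when the country of origin has access and
    benefit-sharing legislation, in which case PIC and MAT are also checked.
    """
    required = list(REQUIRED_PROVENANCE)
    if abs_applies:
        required.extend(ABS_DOCUMENTS)
    # A value of 0.0 (e.g. a latitude on the equator) counts as present.
    return [name for name in required if record.get(name) in (None, "")]


# Illustrative record with made-up values.
strain = {
    "collection_date": "2019-05-01",
    "isolation_date": "2019-05-03",
    "country_of_origin": "Exampleland",
    "latitude": 0.0,
    "longitude": 12.5,
    "substrate": "soil",
}
gaps = missing_fields(strain, abs_applies=True)
```

Running such a check at the point of deposit, rather than at the point of supply, would flag missing documentation while the depositor can still provide it.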

Conclusion

Without good-quality data associated with the organisms they study, the ability of taxonomists to produce accurate and reliable identifications is compromised. CCs have a duty to ensure that the information associated with cultures is accurate and is sourced from reliable and referenced sources. This requires a validated database infrastructure that captures all appropriate information, with verifiable sources and links.


Scientists must themselves act responsibly and take appropriate care to ensure the quality of their work. To do this they should source cultures from public service CCs, rather than from privately held collections where associated data may often be poorer or not linked to current information. When describing new species, they should submit their cultures for deposit in a public collection with all of the necessary provenance information required for the microbial data set. Most importantly of all, they should not compromise quality by publishing in journals that have less stringent requirements concerning the provenance of data and that lack the requirement to deposit material in at least two public service collections (as stipulated by the nomenclatural code for bacteria and recommended in Article 8 of the current edition of the Fungal code, Recommendation 8B.1). Going forward, collections must play a role in ensuring that databases link to resources such as the European Nucleotide Archive and GenBank (and, in the future, MGnify for microbiome


samples). This requires the mBRC community to join forces and work with research infrastructures such as ELIXIR, which unites some of Europe’s leading life science organizations in managing and safeguarding the increasing volume of data being generated by publicly funded research. Collections also have a duty to ensure that the taxonomic information they hold is correct and up to date. With names changing frequently, linkage to the authoritative nomenclatural databases is essential. However, if there are no requirements to do so, or if resources are stretched, this is not always possible – especially if collections are housed privately or have database/software versions that are not kept updated. All of these measures will ultimately make science more robust, improve stringency and allow taxonomists to repeat and critically reproduce and review the work of others. This will ultimately improve the quality of taxonomy for many years to come, at a time when the number of traditional taxonomists is drastically declining in many countries.

References

Brennecke, P., Ferrante, M.I., Johnston, I.A. and Smith, D. (2018) A collaborative European approach to accelerating translational marine science. Journal of Marine Science & Engineering 6, 81. https://doi.org/10.3390/jmse6030081
Bridge, P.D., Roberts, P.J., Spooner, B.M. and Panchal, G. (2003) On the unreliability of published DNA sequences. New Phytologist 160, 43–48. https://doi.org/10.1046/j.1469-8137.2003.00861.x
Crous, P.W., Gams, W., Stalpers, J.A., Robert, V. and Stegehuis, G. (2004) MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50, 19–22.
Crous, P.W., Hawksworth, D.L. and Wingfield, M.J. (2015) Identifying and naming plant-pathogenic fungi: past, present, and future. Annual Review of Phytopathology 53, 12.1–12.21. https://doi.org/10.1146/annurev-phyto-080614-120245
Desmeth, P. (2017) The Nagoya Protocol Applied to Microbial Genetic Resources. In: Kurtböke, I. (ed.) Microbial Resources. Academic Press, pp. 205–217. https://doi.org/10.1016/B978-0-12-804765-1.00010-2
Durães Sette, L., Pagnocca, F.C. and Rodrigues, A. (2013) Microbial culture collections as pillars for promoting fungal diversity, conservation and exploitation. Fungal Genetics and Biology 60, 2–8.
Hawksworth, D.L. (2004) ‘Misidentifications’ in fungal DNA sequence databanks. The New Phytologist 161, 13–15. https://doi.org/10.1111/j.1469-8137.2004.00958.x
Hawksworth, D.L. (2015) Naming fungi involved in spoilage of food, drink, and water. Current Opinion in Food Science 5, 23–28. https://doi.org/10.1016/j.cofs.2015.07.004
Mumbaikar, S. and Padiya, P. (2013) Web services based on soap and rest principles. International Journal of Scientific and Research Publications 3, 1–4.
Musker, R., Tumeo, J., Schaap, B. and Parr, M. (2018) GODAN’s impact 2014-2018 – improving agriculture, food and nutrition with open data [version 1; not peer reviewed]. F1000Research 7, 1328. https://doi.org/10.7490/f1000research.1115970.1


Overmann, J. and Scholz, A.H. (2016) Microbiological research under the Nagoya Protocol: facts and fiction. Trends in Microbiology 25(2), 85–88. https://doi.org/10.1016/j.tim.2016.11.001
Piña, M., Colas, P., Cancio, I., Audic, A., Bosser, L., Canario, A., Gribbon, P., Johnston, I.A., Kervella, A.E., Kooistra, W.H.C.F., Merciecca, M., Magoulas, A., Nardello, I., Smith, D., Pade, N., Robinson, D., Schoen, A., Schultz, F. and Kloareg, B. (2018) The European Marine Biological Research Infrastructure Cluster: an alliance of European research infrastructures to promote the blue bio-economy. In: Rampello, P.H. and Trincone, A. (eds) Grand Challenges in Marine Biotechnology, Grand Challenges in Biology and Biotechnology. Springer, New York, pp. 405–442. https://doi.org/10.1007/978-3-319-69075-9_10
Robert, V., Szoke, S., Jabas, B., Vu, D., Chouchen, O., Blom, E. and Cardinali, G. (2011) BioloMICS software: biological data management, identification, classification and statistics. Open Applied Informatics Journal 5, 87–98. DOI: 10.2174/1874136301005010087
Robert, V., Vu, D., Amor, A.B.H. et al. (2013) MycoBank gearing up for new horizons. IMA Fungus 4, 371–379. https://doi.org/10.5598/imafungus.2013.04.02.16
Ryan, M.J., McCluskey, K., Verkleij, G., Robert, V. and Smith, D. (2019) Fungal biological resources to support international development: challenges and opportunities. World Journal of Microbiology & Biotechnology 35, 139. https://doi.org/10.1007/s11274-019-2709-7
Schoch, C.L., Robbertse, B., Robert, V., Vu, D., Cardinali, G., Irinyi, L., Meyer, W., Nilsson, R.H., Hughes, K., Miller, A.N. and Kirk, P.M. (2014) Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi. Database 1. https://doi.org/10.1093/database/bau061
Smith, D. and Ryan, M.J. (2019) International postal, quarantine and safety regulations. Microbiology Australia 40, 117–120. https://doi.org/10.1071/MA19032
Turland, N.J., Wiersema, J.H., Barrie, F.R., Greuter, W., Hawksworth, D.L., Herendeen, P.S., Knapp, S., Kusber, W.-H., Li, D.-Z., Marhold, K., May, T.W., McNeill, J., Monro, A.M., Prado, J., Price, M.J. and Smith, G.F. (eds) (2018) International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Koeltz Botanical Books, Glashütten, Germany. https://doi.org/10.12705/Code.2018
Verkleij, G., Martin, D. and Smith, D. (2016) MIRRI best practice manual on access and benefit sharing. https://absch.cbd.int/database/A19A20/ABSCH-A19A20-SCBD-208213 (accessed 11 March 2020).
Verkleij, G., Perrone, G., Piña, M., Hartman Scholz, A., Overmann, J., Zuzuarregui, A., Perugini, I., Turchetti, B., Hendrickx, M., Stacey, G., Law, S., Russell, J., Smith, D. and Lima, N. (2020) New ECCO model documents for Material Deposit and Transfer Agreements in compliance with the Nagoya Protocol. FEMS Microbiology Letters 367, fnaa044. https://doi.org/10.1093/femsle/fnaa044
Vu, D., Groenewald, M., Szöke, S., Cardinali, G., Eberhardt, U., Stielow, B., de Vries, M., Verkleij, G.J.M., Crous, P.W., Boekhout, T., Robert, V. and Samson, R.A. (2016) DNA barcoding analysis of more than 9 000 yeast isolates contributes to quantitative thresholds for yeast species and genera delimitation. Studies in Mycology 85, 91–105. http://dx.doi.org/10.1016/j.simyco.2016.11.007
Wilkinson, M.D., Dumontier, M., Aalbersberg, I.J., Appleton, G., Axton, M., Arie Baak, A., Blomberg, N., Jan-Willem Boiten, J., Bonino da Silva Santos, L., Bourne, P.E., Bouwman, J., Brookes, A.J., Clark, T., Crosas, M., Ingrid Dillo, I., Dumon, O., Edmunds, S., Evelo, C.T., Finkers, R., Gonzalez-Beltran, A., Gray, A.J.G., Groth, P., Goble, C., Grethe, J.S., Jaap Heringa, J., Hoen, P.A.C., Hooft, R., Kuhn, T., Kok, R., Joost, K., Lusher, S., Martone, M.E., Albert Mons, A., Packer, A.L., Persson, B., Rocca-Serra, P., Marco Roos, M., van Schaik, R., Sansone, S., Schultes, E., Sengstag, T., Slater, T., Strawn, G., Swertz, M.A., Thompson, M., van der Lei, J., Mulligen, E., Velterop, J., Waagmeester, A., Wittenburg, P., Wolstencroft, K., Zhao, J. and Mons, B. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3 (1), 160018. http://dx.doi.org/10.1038/sdata.2016.18
Wu, L., Sun, Q., Sugawara, H., Yang, S., Zhou, Y., McCluskey, K., Vasilenko, A., Suzuki, K., Ohkuma, M., Lee, Y., Robert, V., Ingsriswang, S., Guissart, F., Desmeth, P. and Ma, J. (2013) Global Catalogue of Microorganisms (GCM): a comprehensive database and information retrieval, analysis, and visualization system for microbial resources. BMC Genomics 14, 933. https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-14-933
Wu, L., Sun, Q., Desmeth, P., Sugawara, H., Xu, Z., McCluskey, K., Smith, D., Alexander, V., Lima, N., Ohkuma, M., Robert, V., Zhou, Y., Li, J., Fan, G., Ingsriswang, S., Ozerskaya, S. and Ma, J. (2017) World Data Centre for Microorganisms: an information infrastructure to explore and utilize preserved microbial strains worldwide. Nucleic Acids Research 1, 8. https://doi.org/10.1093/nar/gkw903

7

MALDI-TOF MS and Currently Related Proteomic Technologies in Reconciling Bacterial Systematics

Haroun N. Shah1,*, Ajit J. Shah1, Omar Belgacem2, Malcolm Ward1, Itaru Dekio3, Lyna Selami2, Louise Duncan4, Kenneth Bruce4, Zhen Xu5, Hermine V. Mkrtchyan6,7, Rory Cave7, Laila M.N. Shah8 and Saheer E. Gharbia9 1 Department of Natural Sciences, Middlesex University, London, UK; 2Ascend Diagnostics Limited, CityLabs 1.0, Manchester, UK; 3Department of Biochemistry and Integrative Medical Biology, School of Medicine, Keio University, Japan; 4King’s College London, Molecular Microbiology Research Laboratory, Pharmaceutical Science Research Division, London, UK; 5Tianjin Key Laboratory of Environment, Nutrition and Public Health, Department of Toxicology and Sanitary Chemistry, School of Public Health, Tianjin Medical University, Tianjin, China; 6School of Biomedical Sciences, University of West London, London, UK; 7School of Health, Sport and Biosciences, University of East London, London, UK; 8King’s College London, Faculty of Natural and Mathematical Sciences, Department of Chemistry, Franklin-Wilkins Building, London, UK; 9Public Health England, Department of Gastrointestinal Pathogens, London, UK

Introduction

The effect of the introduction of the Gram stain into clinical microbiology over a century ago has been profound and enduring (Gram, 1884). The test, based on the cell’s colour reaction to the stain, enabled the dichotomous division of the microbiological kingdom and stimulated microbiologists to seek new methods to further characterize potential pathogens. Today, even in the current era of whole-genome sequencing (WGS), many traditional diagnostic laboratories (such as those long established at Public Health England (PHE), UK) are still designated as enteric ‘Gram-negative’ or ‘Gram-positive’ food pathogen laboratories to describe their function. As new physiological tests and, subsequently, chemotaxonomic methods were introduced into microbiology

(see Chapter 9), various editions of Bergey’s Manual of Determinative Bacteriology (from 1923 to 1974) assigned different taxa to chapters/volumes in accordance with their reactions to the Gram stain. There was a brief period when peptidoglycan patterns (based mainly upon the characteristic dibasic amino acids lysine or diaminopimelic acid) were considered as a new, well-defined dichotomous key for the microbial kingdom (Schleifer and Kandler, 1972). However, while this was retained as a key character for the description of bacterial species, various anomalies within some genera such as the Bacteroides or Fusobacterium prevented this from being universally adopted. Microbiology changed fundamentally with the introduction by Sneath (1957) of numerical

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


phenetic analysis, which strongly advocated a transition from single weighted tests to those based on multiple characteristics to delineate nomenclatural groups. The arrival of DNA/DNA reassociation, and subsequently of 16S rRNA, reinforced the move towards a phylogenetic-based structure and added rigour to the definition of a species. Thus, as microbiology progressed from an arbitrary identification system to one based upon systematic classification principles, new characters were reported and microbial cellular components were analysed for the first time to maintain a polyphasic approach (Goodfellow and Board, 1980; Minnikin and Goodfellow, 1980; Goodfellow and Minnikin, 1985). During this process, mass spectrometry (MS) was used for the first time for microbial systematics. Both polar (and particularly nonpolar) lipids revealed enormous complexity and diversity and exhibited good congruence with established taxonomic methods. Very multifaceted and hitherto poorly defined groups such as the ‘acid-fast’ bacteria that encompassed several ill-defined taxa (e.g. Mycobacterium, Nocardia, Actinomadura, Corynebacterium and Rhodococcus) began to show resolution. Such data provided the impetus to drastically transform microbiology from a determinative- to a systematic-based approach; and, for the first time, Bergey’s Manual reflected this change in being renamed Bergey’s Manual of Systematic Bacteriology (Krieg et al., 1984) and microbial classification was no longer pivotally structured around the Gram stain (Shah and Gharbia, 2011).

Proteins in Microbial Systematics

The use of protein profiles in microbial systematics has long been regarded as highly compatible with phylogenetic approaches, since proteins were envisioned as a direct product of the genome. Numerous studies were reported that showed excellent correlation between protein profiles and methods such as DNA/DNA reassociation (reviewed by Jackman, 1985). The immense complexity of the bacterial proteome was already established through studies of intermediary metabolism and microbial biochemistry. However, although protein sequences were being reported as early as 1967 (Ambler and

Brown, 1967) and their immense value in microbial systematics demonstrated (Ambler, 1985), existing technologies were considerably laborious, costly and outside the scope of bacterial systematics. Microbiologists therefore turned towards electrophoretic-based platforms to provide insight into the expressed proteome of bacterial species. Three basic approaches were pursued: (i) multilocus enzyme electrophoresis (MLEE; Selander et al., 1986); (ii) peptide/protein profiles using sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE; Laemmli, 1970); and (iii) isoelectric focusing protein profiling (IEF-PP) (e.g. Shah et al., 1982). When analyses commenced, optimization of methods to release cellular proteins was based largely upon the Gram stain, but this was due to differences in complexity of the cell envelope rather than to taxonomic rationale. Early results revealed enormous implications for bacterial systematics, and soon methods such as MLEE, based upon the electrophoretic mobility of specific enzymes (e.g. Selander et al., 1986), became established as a tool for microbial systematics. This subsequently translated into its high-resolution counterpart, multilocus sequence typing (MLST), based upon comparative DNA sequencing of selected genes. MLST is used widely today for studies in systematics, evolution and epidemiology (e.g. High et al., 2015). However, both SDS-PAGE and IEF-PP continued to have a significant impact in microbial identification and phylogeny. In an early study, strains designated Bacteroides oralis were shown to exhibit such profound heterogeneity using IEF-PP that this led to the proposal of three new species (Shah and Collins, 1981; Shah et al., 1982). Results were corroborated by other chemotaxonomic methods and subsequently by comparative 16S rRNA sequencing (Shah and Collins, 1981). SDS-PAGE is robust, inexpensive, simple and rapid and is still used in its original form today by some laboratories (e.g. Berber, 2004).

Arrival of MALDI-TOF MS in Microbiology

A major drawback of the above electrophoretic methods is that they rely solely on pattern-matching algorithms and provide no information on the


protein biomarkers that were used to differentiate taxa. In theory, new forms of MS such as MALDI-TOF MS and tandem MS/MS should have provided the analytical tools to develop these methods, analogous to those achieved for transforming MLEE to MLST. However, systematic analyses using whole-cell protein profiles are only now gaining momentum, as new forms of MS become more accessible. Current MS enables identification of post-translational modification of proteins that cannot be deduced from WGS, and is likely to gain widespread application both in cell biology and systematics, as these techniques become automated, accessible, miniaturized, portable and less costly. The platform that paved the way for these developments was matrix-assisted laser desorption/ionization time of flight MS (MALDI-TOF MS). Earlier forms of MS that were used in microbiology, such as electron impact or pyrolysis MS, had upper detection limits of c.1500 daltons and were, therefore, restricted to small molecules such as lipids and metabolites (Goodfellow and Minnikin, 1985). Proteins are orders of magnitude larger; initial success was reported in the analysis of small proteins using fast atom bombardment MS in the mid-1990s (e.g. Drucker, 1997), but this soon gave way to the more efficient and versatile platform MALDI-TOF MS, which prominent microbiologists have described as ‘transformational’ (Patel, 2013). Pioneered by the late Franz Hillenkamp’s group in the mid-1980s (Karas et al., 1985, 1987), MALDI-TOF MS soon became established as a method for analysis of high molecular weight compounds, particularly proteins (Karas and Hillenkamp, 1988; Tanaka et al., 1988). The work of Cain et al. (1994) drew attention to its first application in microbiology through their publication, ‘Differentiation of bacteria using protein profiles from matrix-assisted laser desorption/ionization time of flight mass spectrometry’.
The response by the MS company Kratos Analytical was to launch the first ‘benchtop’ MS instrument, which was groundbreaking for this early period. Three papers reported its use almost simultaneously in 1996 (Claydon et al., 1996; Holland et al., 1996; Krishnamurthy and Ross, 1996) and provided further evidence of its potential. The response by microbiologists was inconsequential, perhaps due to the work being reported in non-microbiological journals;


possibly more significant was the previous failure of MS methods such as pyrolysis MS (e.g. Shute et al., 1985) to gain acceptance as a tool for bacterial systematics.

Establishing MALDI-TOF MS in Clinical Microbiology

In mid-1997, PHE (then the Public Health Laboratory Service and later the Health Protection Agency) established a new laboratory (Molecular Identification Services Unit, MISU) to identify novel, emerging and atypical pathogens that were increasingly being reported in the UK. This provided an opportune moment to assess the potential of new and emerging technologies. The various MALDI-TOF MS platforms were among those introduced, and had an almost immediate impact on MISU. The laboratory was in the middle of assembling a database of MALDI-TOF MS spectra that included most staphylococcal species. A sample labelled ‘atypical Staphylococcus aureus’ was received by MISU for identification. Its MALDI-TOF MS spectrum was shown to be incompatible with any staphylococci and the isolate was identified by 16S rRNA as Exiguobacterium aurantiacum. The inclusion of spectral data into the database facilitated the subsequent rapid identification of 18 blood cultures using the Kratos Linear MALDI-TOF MS, and allowed notification to be sent to hospitals to alert them of the impending threat. The clinical application of MALDI-TOF MS increased substantially and early reviews reflected increasing optimism (e.g. Jarman and Wahl, 2006; Singhal et al., 2015; Belén et al., 2019). It was already hailed as ‘the quantum leap’ by Greub (2010) and later regarded as the gold standard for clinical microbiology (Schubert and Kostrzewa, 2017). However, the clinical applications of MALDI-TOF MS, which was accredited and widely used in Europe, only gained widespread success in North America when quality assurance procedures were subsequently established (Bourassa, 2018) and approval received from the Food and Drug Administration (FDA). From 1998, 21 consecutive international conferences were held at Public Health England and, subsequently, with Middlesex University, London, to advance the application of these


technologies in microbiology (Shah and Gharbia, 2017). Among the early MALDI-TOF MS methods was the novel application of DNA resequencing that was developed by Sequenom GmbH Hamburg. The system, designated a MassARRAY single-nucleotide polymorphism (SNP) typing platform, employed a MALDI-TOF MS coupled with single-base extension PCR for high-throughput multiplex SNP detection (reviewed by Honisch et al., 2010). Very complex traditional serotyping methods such as the Kauffmann–White method, for the rapid typing of Salmonella, were shown to be amenable for transfer to the Sequenom MALDI-MS platform (Bishop et al., 2010; Syrmis et al., 2011). However, despite its huge potential for microbial analysis, the technology never gained traction in microbiology although it continued to find applications in clinical diagnostics and is currently used for the detection of cancer biomarkers (e.g. Gloss et al., 2012). Subsequently, PCR-based products were analysed using tandem MS/MS, which was excellent for species identification and typing. Perhaps because of its colossal cost, however, the system (designated PCR-ESI-MS) was not acquired by laboratories and soon became obsolete (reviewed by Emonet et al., 2010).


Protein-based target molecules were viewed with more optimism by microbiologists and different approaches were explored. While the standard MALDI instrument developed by Kratos Analytical Ltd employed a stainless steel target plate for sample analysis, the Ciphergen Biosystems instrument utilized a chemically active surface (ProteinChip Array) to selectively capture various classes of proteins (Fig. 7.1). Thus, the same sample, using sinapinic acid as its matrix solution, could be analysed by using various ProteinChips to obtain more in-depth coverage of the proteome (Shah et al., 2010). Data sets obtained from Surface Enhanced Laser Desorption/Ionisation Time of Flight Mass Spectrometry (SELDI-TOF MS) analysis were complex and the accompanying software utilized heatmaps to visualize differences in mass ion intensities (Shah et al., 2010). However, raw data could also be extracted and analysed separately. Artificial neural networks were generally employed to assess the diversity of species (Encheva et al., 2005; Lancashire et al., 2005; Schmid et al., 2005; Hamilton et al., 2010; Chiu, 2014). Large proteins of >150,000 daltons were reported while mass spectra were so vast that PHE considered this to

Fig. 7.1. MALDI-TOF MS profiles obtained for the same sample using three different ProteinChip arrays (NP1, SAX2 and WCX1) to reveal the complexity of a sample. To enable rapid visual comparison of spectra, the software converted the mass ions into Gel view images; the greater the mass intensity, the darker the band. Panels: Normal phase Protein Array (NP1); Strong anionic exchange Protein Array (SAX2); Weak cationic exchange Protein Array (WCX1). Horizontal axis ticks run from 5000 to 25000; annotation: 154,000 daltons, Strep. pneumoniae. Strep., Streptococcus. (Adapted from Shah et al., 2010.)

MALDI-TOF MS and Currently Related Proteomic Technologies

be a more comprehensive and robust method for typing bacterial species (Shah et al., 2005; Encheva et al., 2006). The launch of the Ciphergen Biosystems next-generation ProteinChip System, Series 4000, in 2004 enabled rapid biomarker discovery and development of predictive, high-throughput SELDI-based analysis. Unfortunately, laboratories with the first-generation MALDI-TOF MS instruments, on which considerable background work had been undertaken, were left unsupported. In 2006, the company was acquired by Bio-Rad and there was an abrupt loss of interest in the use of the technology in microbiology. It was evident that, for any of these technologies to have a future, a comprehensive microbial database of mass spectral profiles of species was required. Early protocols for analysis of whole cells by MALDI-TOF MS again resorted to the use of different matrix solutions based upon the Gram stain. In general, laboratories utilized alpha-cyano-4-hydroxycinnamic acid for Gram-negative cells and 5-chloro-2-mercaptobenzothiazole for Gram-positive cells. Once a method had been optimized on the basis of available data, considerable in-house laboratory work was undertaken to initiate this process (Shah et al., 2000, 2002). With the arrival of a dedicated new upright linear MALDI-TOF MS instrument by Micromass (later acquired by Waters Inc.), 4 years were spent developing the first mass spectral database (Keys et al., 2004). Proof of concept was demonstrated through a direct trial at The Royal London Hospital, University of London, and was highly successful. More than 600 isolates were analysed in parallel with traditional tests undertaken in the hospital’s laboratory, and yielded an 80% concordance (Rajakaruna et al., 2009). During this period, Clostridium difficile infection in the UK was at a peak, and the potential value of MALDI-TOF MS as an analytical tool was again assessed. Initial results were spurious and led to a re-examination of the basic protocol.
A change from 5-chloro-2-mercaptobenzothiazole to 2,5-dihydroxybenzoic acid in acetonitrile:ethanol:water (1:1:1) with 0.3% trifluoroacetic acid (TFA) (Shah et al., 2010) led to a dramatic improvement in results. Similar results were also obtained using α-cyano-4-hydroxycinnamic acid. Furthermore, re-optimization of protocols led to the use of a preliminary extraction with formic acid and the use of α-cyano-4-hydroxycinnamic acid for both Gram-positive and Gram-negative cells. This was demonstrated in several laboratories by a group who met periodically to optimize methods (Kallow et al., 2010). It was soon evident that MALDI-TOF MS spectra gave similar results to the corresponding ribosomal RNA preparation of various species, and provided unequivocal proof that these were the molecules giving rise to the specific mass ions that comprised the spectrum of a given species. This removed the need to standardize the growth conditions, since the mass spectra were not adversely affected by the physiological conditions of the cell. A universal protocol was now established for use with all microbial species, and interest in the technique gained momentum. Commercial databases were now available through Bruker Daltonik GmbH (Bremen, Germany) and AnagnosTec (Potsdam, Germany; subsequently acquired by bioMérieux). In 2011, the UK was in the middle of preparing to stage the London 2012 Olympics. Part of these preparations was a responsibility of PHE to implement technologies that enabled rapid identification of pathogens in the event of an infectious disease outbreak. A proposal by the Molecular Identification Services Unit of PHE led to the placement of an initial six MALDI-TOF MS instruments, and then a further ten instruments, in major UK cities. This was the largest network of instruments in one organization globally, and represented endorsement of the technology after 13 years of consolidated work. It continues to function to the present and has retained parallel analysis using comparative 16S rRNA sequence analysis.

MALDI-TOF MS in the Non-clinical Laboratory and its Role in Searching for New Diversity

Because MALDI-TOF MS databases were assembled using mainly clinically relevant strains for species identification, it was assumed initially that application of the technology for identification of isolates from other locations such as soil, marine, freshwater, agricultural, poultry and industrial waste, bioremediation in landfill, inhospitable geological sites, and manufacturing sites


H.N. Shah et al.

would require new databases with correspondingly relevant species. The method began attracting interest a decade ago, and several specialist groups used an existing database in tandem with 16S rRNA sequencing to extend applications of the technology. Species that could not be detected by MALDI-TOF MS, but could be identified by 16S rRNA sequencing, were added to the database for studying microbial communities of hitherto poorly studied sites. This approach, in which MALDI-TOF MS and 16S rRNA sequencing are used in parallel and the latter is used to add new diversity to a pre-existing mass spectral database, has successfully moved the technology well beyond the clinical laboratory where it had its roots. For example, MALDI-TOF MS and 16S rRNA sequencing were used in tandem for characterizing the cultivable bacterial communities from polluted soils and water following copper mining. The results revealed that MALDI-TOF MS analysis was reliable and could be used as a rapid tool for identifying copper-resistant bacteria (Avanzi et al., 2017). As anticipated, some results were equivocal, but accuracy was improved by enhancing the reference database. In a very interesting application, Timperio et al. (2017) used MALDI-TOF MS in parallel with 16S rRNA sequencing to study Arctic bacteria isolated from the White Sea, Russia. Agreement between the two methods was 100% at the genus level, although it decreased to 48% at the species level; this was subsequently remedied by inclusion of the missing reference spectra in the database. Notable examples were strains of Exiguobacterium oxidotolerans and Pseudomonas costantinii, which were misidentified initially using the MALDI BioTyper owing to the absence of reference spectra in the database. Interestingly, the concordance for Pseudomonas species was low (29%), confirming the problematic taxonomy of this genus and highlighting areas where MALDI-TOF MS may be used in systematics.
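The genus-level and species-level agreement figures quoted above can be reproduced from paired identifications with very little code. The following is a minimal sketch, assuming each isolate has one MALDI-TOF MS call and one 16S rRNA call written as "Genus species"; the function names and the example pairs are hypothetical illustrations, not data from the cited studies.

```python
from typing import List, Tuple

# Placeholders that mean "identified to genus only".
GENUS_ONLY = {"sp.", "sp", "spp."}


def split_name(binomial: str) -> Tuple[str, str]:
    """Split 'Genus species' into (genus, species); species is '' if unresolved."""
    parts = binomial.strip().split()
    genus = parts[0].lower() if parts else ""
    species = parts[1].lower() if len(parts) > 1 else ""
    if species in GENUS_ONLY:
        species = ""
    return genus, species


def concordance(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (genus-level %, species-level %) agreement between two methods."""
    if not pairs:
        raise ValueError("no paired identifications supplied")
    genus_hits = 0
    species_hits = 0
    for maldi_id, rrna_id in pairs:
        genus_a, species_a = split_name(maldi_id)
        genus_b, species_b = split_name(rrna_id)
        if genus_a == genus_b:
            genus_hits += 1
            # A species match only counts when the genus also matches.
            if species_a and species_a == species_b:
                species_hits += 1
    total = len(pairs)
    return 100.0 * genus_hits / total, 100.0 * species_hits / total


# Hypothetical paired calls: (MALDI-TOF MS result, 16S rRNA result).
pairs = [
    ("Pseudomonas sp.", "Pseudomonas costantinii"),
    ("Exiguobacterium sp.", "Exiguobacterium oxidotolerans"),
    ("Pseudomonas aeruginosa", "Pseudomonas aeruginosa"),
    ("Stenotrophomonas maltophilia", "Stenotrophomonas maltophilia"),
]
genus_pct, species_pct = concordance(pairs)
print(f"genus: {genus_pct:.0f}%  species: {species_pct:.0f}%")
```

Re-running such a tally after new reference spectra have been added to the database gives a simple way to track how far species-level agreement has improved.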
In a vastly different study, the geomicrobiology of a 300 m stretch of a karst hardwater creek in northern Germany (the Westerhofer Bach; > 120 mg Ca²⁺/L) was investigated using 16S rRNA sequencing and MALDI-TOF MS as a phenotypic method. Some 35 genera were identified, with a predominance of Flavobacterium, Pseudomonas and Stenotrophomonas, together with some 60 novel phylospecies. The study provided a deep insight into the bacterial community of the geosphere and biosphere of a non-marine environment (Cousin et al., 2008) and demonstrated the versatility of the technique. Habitat selection and human transmission of Pseudomonas aeruginosa are currently being investigated. Strains from environments across London are being sampled and analysed by MALDI-TOF MS following presumptive identification on agar plates. MALDI-TOF MS, in parallel with 16S rRNA sequencing, revealed that several phenotypically closely related isolates that produce the characteristic fluorescent and green pigments on agar may have been misidentified as P. aeruginosa rather than Pseudomonas citronellolis (Louise Duncan, PhD student, unpublished). P. citronellolis is a documented soil microbe that exhibits a strong biotic relationship with pine tree or basil plant hosts (Seubert, 1960; Remus-Emsermann et al., 2016). Interestingly, P. citronellolis has been reported for the first time in a case of human infection (Williams, 2019), which demonstrates the value of MALDI-TOF MS in studying microbial community habitats (Duncan et al., 2019). Investigation into these microbial communities revealed that phenotypic colour formation is not an exclusive property of P. aeruginosa (Pirnay et al., 2005; Batrich et al., 2019). MALDI-TOF MS has therefore proved to be a highly sensitive and accurate tool for identifying diverse species of Pseudomonas. The application of MALDI-TOF MS to bacterial identification is key to both diagnostic laboratories and industry. Food and beverage manufacturers are abandoning the methods of their traditional microbiology laboratories and turning to the routine use of MALDI-TOF MS through various stages of production (e.g. Pavlovic et al., 2013; Santos et al., 2016). It facilitates rapid communication of important quality control results, thus preventing contaminated products from leaving the manufacturing environment for distributors and, subsequently, consumers (Bourassa, 2018).
These studies, together with the huge volume of reports on the successful use of MALDI-TOF MS in clinical laboratories, indicate that where the taxonomy of species is unambiguous and the associated reference spectra are in the database employed, the use of MALDI-TOF MS for microbial species identification appears likely to continue for the foreseeable future. The exceptions to date are Mycobacterium spp., where the rigid cell wall contains very complex long-chain fatty acids of up to 80 carbon atoms, and additional extraction methods are required. Furthermore, many species are so poorly resolved that a separate database is generally created for this genus and lower threshold scores are used to identify some species (e.g. Kim and Kim, 2017; Pranada et al., 2017; Rose et al., 2017; Kim et al., 2020).

MALDI-TOF MS in Subspecies Identification, Typing and Screening for Genetic Variants: Implication for Systematics

Among certain taxa where subspecies-level identification is apparent and better defined, MALDI-TOF MS has been used to delineate such diversity. Several species and subspecies are recognized within the genus Lactobacillus and are commonly used as probiotics. Most species of Lactobacillus may be identified using MALDI-TOF MS, and several subspecies (e.g. L. delbrueckii subsp. delbrueckii, L. delbrueckii subsp. lactis and L. delbrueckii subsp. indicus), which are favoured by various probiotic companies and need authentication and verification, may also be delineated by MALDI-TOF MS. Many companies, such as Natren Inc., California, USA, previously used long-chain fatty acids to detect minor strain variation, to characterize starter cultures and to ascertain strain stability during manufacturing. Today, this has been transformed by the use of MALDI-TOF MS for rapid and routine monitoring. PHE has worked closely with Natren Inc. to assess the potential of MALDI-TOF MS to identify probiotic Lactobacillus spp. and Bifidobacterium spp. in parallel with 16S rRNA and, in some species, WGS, and found excellent correlation (Fig. 7.2). Numerous reports highlight the use of MALDI-MS in the detection of genetic variants and the typing of a vast range of species such as Staphylococcus aureus, Pseudomonas spp., Clostridium difficile, Bacillus spp., Listeria monocytogenes, Salmonella spp., Streptococcus spp., Escherichia coli, Klebsiella pneumoniae and Acinetobacter baumannii (Biswas et al., 2016). We have investigated the capacity of MALDI-TOF MS to subtype the above taxa against a recognized DNA-based typing system and have consistently failed to substantiate such correlations. It is sometimes assumed that, because MALDI-TOF MS spectra comprise mainly highly conserved ribosomal protein structures, dendrograms would mirror 16S rRNA trees. However, concordance between the two is uncommon except where taxa have very high gene sequence similarities (> 98.2%) (Schumann and Maier, 2014). Even so, we could not demonstrate this for strains of P. aeruginosa that had > 99% similarity in 16S rRNA, and bifurcations were not shared in dendrograms. Figure 7.3 shows an example of the discordance among isolates of P. aeruginosa using MALDI-TOF MS against the DNA-based typing system (variable number tandem repeat (VNTR)) that is used routinely at PHE. Similarly, Schumann and Maier (2014) reported that the dendrogram generated on the basis of MALDI-TOF MS did not correspond with the topology of the tree derived from the 16S rRNA gene sequences for the type strains of type species of genera of the family Microbacteriaceae or genus Arthrobacter. The two methods are based on different target molecules and were expected to differ, as these results demonstrate. If the long-term goal of such studies is to use MALDI-TOF MS to replace a recognized DNA-based typing system, our studies have shown that this is not achievable. However, if the objective is to use MALDI-TOF MS exclusively to establish a ‘mass spectral typing’ system for use within a specific laboratory, indications are that this is achievable. Figure 7.4 shows results from a study to map the transmission of S. aureus and related species in the community. A total of 411 isolates was recovered from the general public and various environmental sites in London. The MALDI-TOF MS data were analysed using the software BioNumerics (Applied Maths) (Vranckx et al., 2017), and evidence of clustering within the same sites was apparent, with the human hand being the reservoir (Xu et al., 2017).
Nineteen species of staphylococci were identified, most of which were coagulase negative. BioNumerics revealed hierarchical interrelationships of nine major clusters comprising S. hominis, S. haemolyticus, S. epidermidis, S. pasteuri, S. warneri, S. aureus, S. saprophyticus, S. capitis and S. simiae (Fig. 7.4). It is our view that, with the current level of resolution of the technology, MALDI-TOF MS is excellent for identification of species and (in some instances) subspecies, but does not provide a means to type isolates in a manner similar to DNA-based methods. Table 7.1 highlights some of the obstacles facing the development of a universal typing platform analogous to the database developed for species identification. However, the value of MALDI-TOF MS as a tool for microbial systematics has been grossly underestimated, and reports in systematics journals remain sparse, despite an early report by Schumann and Maier (2014) that provides detailed methods for adapting MALDI-TOF MS to systematics applications.

Fig. 7.2. The separation of Lactobacillus species and subspecies using Bruker Biotyper (software version 3.0). The results highlight the capacity of MALDI-TOF MS to delineate some taxa to the subspecies level. These include L. delbrueckii subspecies delbrueckii, L. delbrueckii subspecies lactis and L. delbrueckii subspecies indicus, which are frequently used by various probiotic companies. [Figure not reproduced. MSP dendrogram; horizontal axis: Distance Level, 0 to 1000. Leaves comprise DSM reference strains of L. delbrueckii subspecies bulgaricus, lactis, indicus and delbrueckii alongside test isolates prepared from broth and from plates.]
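The chapter does not state how congruence between the MALDI-TOF MS and VNTR groupings was quantified. One widely used measure for comparing two typing schemes is the adjusted Rand index, which is 1 when the two partitions of the isolates are identical and close to 0 (or slightly negative) when they agree no better than chance. The sketch below is an illustration under that assumption; the cluster labels are hypothetical and would in practice come from cutting each dendrogram at a chosen similarity level.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two cluster assignments of the same isolates."""
    if len(a) != len(b):
        raise ValueError("both methods must label the same isolates")
    n = len(a)
    if n < 2:
        raise ValueError("at least two isolates are required")

    # Count isolate pairs that share a cluster in both, in a only, and in b only.
    together_in_both = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    together_in_a = sum(comb(c, 2) for c in Counter(a).values())
    together_in_b = sum(comb(c, 2) for c in Counter(b).values())

    expected = together_in_a * together_in_b / comb(n, 2)
    maximum = (together_in_a + together_in_b) / 2
    if maximum == expected:
        # Degenerate case: both partitions are all-singletons or one big cluster.
        return 1.0
    return (together_in_both - expected) / (maximum - expected)


# Hypothetical cluster labels for six isolates under each method.
maldi_clusters = ["A", "A", "A", "B", "B", "B"]
vntr_concordant = [1, 1, 1, 2, 2, 2]
vntr_discordant = [1, 2, 3, 1, 2, 3]

print(adjusted_rand_index(maldi_clusters, vntr_concordant))
print(adjusted_rand_index(maldi_clusters, vntr_discordant))
```

A value near zero for real data would be consistent with the conclusion drawn here that one method cannot stand in for the other, although the index depends on where each dendrogram is cut.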

MALDI-TOF MS in Microbial Systematics; a Case Study Involving Cutibacterium acnes

The widespread use of comparative 16S rRNA sequence analysis enabled the transition of a determinative microbial classification system to a phylogenetic structure (see above). Systematic subcommittees provide essential information on the proposal of new species and continue to ensure that a polyphasic description is still the desired approach (Chapters 1 and 3). However, several species are still proposed solely on the basis of 16S rRNA where a level > 98.8% has been detected, and where there is a paucity of reliable characters to facilitate their identification. While WGS represents the most comprehensive data available to describe a new taxon, acquired genetic elements may skew the description of the new entity. Consequently, in silico DNA-DNA hybridization or ANI values are now used to circumscribe new diversity (Dekio et al., 2015). For many new fastidious species, especially nonfermentative taxa, characteristic phenotypic characters may be extremely difficult to discern. MALDI-TOF MS has the potential to provide unique characters. The current case report (below), based on Propionibacterium acnes, helped in its reclassification to Cutibacterium acnes and provides an example of how these techniques may be used to reconcile microbial systematics.

Fig. 7.3. Dendrograms showing phenotypic similarities and relationships of 53 Pseudomonas aeruginosa strains from cystic fibrosis (CF) and non-CF sites, and the same strains analysed genetically using variable number tandem repeat (VNTR). Congruence between the phylotypes was negligible and indicates that one method cannot supplant another. (Adapted from Olkun et al., 2017.) [Figure not reproduced. Two facing dendrograms with similarity scales from 0 to 100; leaves are labelled with strain identifiers and site codes.]

Fig. 7.4. (Left) Unrooted cluster analysis of Staphylococcus species in the community based upon MALDI-TOF MS spectral profiles. While each species was distinctively delineated, the data enabled the intraspecies diversity to be clearly discerned (adapted from Xu et al., 2017). (Right) Three-dimensional scatter plot of Staphylococcus aureus isolates using a supervised method such as linear discriminant analysis (LDA, BioNumerics) to show relationships among isolates from specific sites, which may be useful for identifying unique traits during transmission (adapted from Vranckx et al., 2017). [Figure not reproduced. Cluster labels: S. hominis, S. haemolyticus, S. simiae, S. aureus, S. warneri, S. pasteuri, S. capitis, S. saprophyticus, S. epidermidis; scatter plot axes: X, Y, Z.]

Brief biology of Cutibacterium acnes

C. acnes (previously, Propionibacterium acnes) is a facultative anaerobic bacterium and one of the most abundant resident species of the human skin, with a count of 10⁵–10⁶ cells/cm² on the facial skin (Dekio et al., 2007). Although it is considered beneficial to humans, it plays a fundamental role in the development of inflammatory acne, a disease that gave the species its specific epithet. Moreover, it also inhabits the eye, oral cavity, large intestine and genito-urinary tract of humans and may be isolated from cases of opportunistic infection of the cornea, from prostate cancer, from surgically treated bone and from blood-borne illnesses. Differences in cell wall composition led to subdivision of the species into types I and II (Johnson and Cummins, 1972), and later into a group with a unique filamentous morphology, designated type III (McDowell et al., 2008). These groups were found to be serologically different and, using the housekeeping genes recA and tly, to be phylogenetically distinct (McDowell et al., 2005, 2008). In an attempt to resolve the taxonomic substructure of the group, in-depth analysis using genomics and proteomics against known morphological and biological features was undertaken in our laboratory (Dekio et al., 2015) and in that of McDowell et al. (2016). To clarify the role of C. acnes in acne, studies have focused on potential virulence determinants, which finally led to the conclusion that type I was responsible for inflammatory acne (Lomholt and Kilian, 2010).

Table 7.1. Barriers to using MALDI-TOF MS exclusively as a universal typing tool.

Limitations of MALDI-TOF MS | Comments
MALDI-TOF MS uses a subset of the proteome | The method selects the most abundant proteins in the cells (ribosomal proteins); small peaks below the threshold may be unreliable and are excluded in the creation of the database
Linear MALDI-TOF MS as used in microbiology is qualitative | MALDI-TOF MS, as currently used for species identification, is a qualitative method; mass ions from the same sample can vary in mass intensity between different analyses. However, for practical applications, MALDI-TOF MS relies on pattern-matching algorithms using multiple mass ions in the spectrum, and therefore acquires sufficient data for identification at the species level
Inter-laboratory reproducibility of instruments | Reports to date utilize minor peaks in the mass spectrum for typing. The use of these ions to aid typing can be misleading, because these are the most variable mass ions derived, even when using the same instrument. When the same sample is analysed using a different instrument, the variability of these minor peaks is exacerbated
Reproducibility among technical staff | Inter-laboratory reproducibility as ‘ring tests’ has shown successful identification to the species level but variation in identification scores (Veloo et al., 2017). If uniform identification scores are unachievable at the species level among a group of collaborators, where sample preparation and analytical parameters are rigorously standardized, they are unlikely to be achieved for typing in different laboratories
Comparative 16S rRNA and MALDI-TOF MS | Because MALDI-TOF MS spectra comprise mainly highly conserved ribosomal protein structures, it is often assumed that the topology of dendrograms would mirror 16S rRNA trees. It has been reported that if strains are > 98.2% concordant between both methods, bifurcations are shared in dendrograms (Schumann and Maier, 2014). However, we could not demonstrate this for strains of P. aeruginosa that had > 99% similarity in 16S rRNA, nor could the same authors for family Microbacteriaceae or genus Arthrobacter

MALDI-TOF MS delineates three proteotypes

Finding characters that consistently discriminate bacterial strains below the species level is often fraught with difficulties. In an attempt to delineate these subtypes, we collected MALDI-TOF MS spectra of C. acnes strains of 12 MLST types belonging to types I, II and III. As expected, the majority of the peaks were common among the three types, but peaks at 6950–7200 m/z were found to be highly indicative of each type and may be used as biomarkers (Fig. 7.5) (Nagy et al., 2013; Dekio et al., 2015). These biomarkers are clearly detected by MALDI-TOF MS and have been confirmed at four different institutes: Keio University, Japan; Japan Collection of Microorganisms, Japan; PHE, UK; and Middlesex University, UK, with acceptable shifts in m/z values. These results were corroborated using another form of MALDI-TOF MS, SELDI-TOF MS, which analysed a far greater depth of the proteome (see above). Furthermore, the more precise analysis with the wider range of mass ions provided by SELDI-TOF MS was found to be useful in functional investigations. C. acnes isolates showed significantly higher expression of a protein with a mass range of c. 15–17 kD in type I (Fig. 7.6). This expression disappeared almost completely when cells were cultured under aerobic conditions (Dekio et al., 2013). We further investigated this peak following preliminary separation by 1D SDS-PAGE, protein extraction and tandem LC/MS/MS using a Thermo Fisher Orbitrap. Among several proteins contained in this gel band, CAMP factor, an infection-related protein, was identified. This protein is hypothesized to be a key protein related to the development of inflammatory acne (Dekio et al., 2013).
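The idea of assigning a proteotype from a small set of biomarker ions in the 6950–7200 m/z window, while allowing for small inter-laboratory mass shifts, can be sketched as follows. This is an illustration only: the reference m/z values and the 3 m/z tolerance are hypothetical placeholders, not the published biomarker masses, and the function is not the procedure used in the cited studies.

```python
from typing import Dict, List, Optional, Sequence

# m/z window reported in the text as carrying the type-specific biomarkers.
WINDOW = (6950.0, 7200.0)

# HYPOTHETICAL reference biomarker peaks per proteotype (placeholders only).
REFERENCE_PEAKS: Dict[str, List[float]] = {
    "type I": [6980.0, 7035.0],
    "type II": [7005.0, 7180.0],
    "type III": [6985.0, 7150.0],
}


def assign_proteotype(peaks: Sequence[float], tolerance: float = 3.0) -> Optional[str]:
    """Return the proteotype whose reference peaks are best matched, or None.

    A reference peak counts as matched when any observed peak inside the
    biomarker window lies within `tolerance` m/z units of it. Ties and
    spectra with no matches are reported as None rather than guessed.
    """
    in_window = [p for p in peaks if WINDOW[0] <= p <= WINDOW[1]]
    scores = {
        label: sum(any(abs(p - ref) <= tolerance for p in in_window) for ref in refs)
        for label, refs in REFERENCE_PEAKS.items()
    }
    best = max(scores.values())
    winners = [label for label, hits in scores.items() if hits == best]
    if best == 0 or len(winners) > 1:
        return None
    return winners[0]


# Hypothetical peak lists; ions outside the window are ignored.
print(assign_proteotype([6981.2, 7034.1, 7106.0, 9500.0]))
print(assign_proteotype([7006.5, 7178.0]))
print(assign_proteotype([7100.0]))
```

Widening the tolerance makes the assignment more forgiving of instrument drift but raises the risk of ambiguous matches, which is why ties return no call in this sketch.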

Fig. 7.5. Unique biomarker mass ions in the MALDI-TOF MS spectrum of Cutibacterium acnes subspecies (Dekio et al., 2015). [Figure not reproduced. Panels: Type IA1, Type IA2/IB, Type II and Type III; labelled peaks lie between m/z 6976.7 and 7260.4.]

Fig. 7.6. Unique biomarker mass ions between the 10 and 20 kD segment of the SELDI-TOF MS spectra of anaerobically-cultured Cutibacterium acnes isolates. [Figure not reproduced. Panels: Type I, Type II and Type III; horizontal axis: m/z, 10000 to 30000.]

Correlation of proteotypes with whole-genome sequencing

Since the taxonomic basis of the proteotypes of C. acnes was established using primarily MALDI-TOF MS and its extended form, SELDI-TOF MS, a key consideration was to investigate the rigour of these subtypes by their correlation with genome sequencing. While they showed 16S rRNA gene similarity of > 99.3%, the genome similarities of the groups (calculated by in silico DNA-DNA hybridization) were in the range 72–78%, slightly higher than the accepted species cut-off of 70% (Dekio et al., 2012, 2015). This resulted in the confirmation of the three subspecies C. acnes subsp. acnes, C. acnes subsp. defendens and C. acnes subsp. elongatum, for types I, II and III, respectively (McDowell et al., 2016; Dekio et al., 2019). Unlike many taxa where there is a paucity of reliable characters for subspecies-level identification, these results demonstrate unequivocally the huge potential of MALDI-TOF MS and proteomics in microbial systematics.

MALDI-TOF MS and the Future Interest of MS Companies

The current and long-term microbiological use of MALDI-TOF MS is entirely dependent on the interest of, and development by, leading MS companies. There are numerous MS companies worldwide, and several manufacture MALDI-TOF MS instruments, yet despite the success of the technology and its transformative impact in clinical microbiology, only two systems, the Bruker Biotyper (Bruker Daltonics, Germany) and the VITEK MS (bioMérieux, France, in partnership with Shimadzu, Japan), have dominated the global market. However, this is likely to change, because more companies are showing an interest in entering the field. The arrival of ASTA’s MALDI-TOF MS (Tinkerbell LT, ASTA, Korea) and recent developments at Ascend Diagnostics Ltd, Manchester, UK, are examples. The latter’s new MALDI-TOF MS platform (Lexi) is a state-of-the-art MALDI bench-top mass spectrometer that allows rapid identification of microorganisms. Its novel electronic and mechanical modules are embedded in a compact and robust design, generating the smallest footprint on the market at present. The unique features and versatility of this instrument, and other platforms that enable users to have more flexibility, are likely to stimulate further interest across the field of microbiology. The most significant barrier to more MS companies entering clinical microbiology is the monumental task of developing an accredited microbial database for use with a new instrument. ASTA has successfully achieved this and demonstrated the reliability of its MicroIDSys system (ASTA, Korea) by carrying out parallel studies using the Bruker Biotyper (Bruker Daltonics, Germany) on over 5000 clinical isolates. Identical results with high confidence scores (≥ 2.0 for Bruker Biotyper and ≥ 140 for ASTA MicroIDSys) were obtained for 86.1% of isolates, with 99.2% (4267 strains) showing good scores in both systems. The authors concluded that the ASTA MicroIDSys had the capacity to reliably identify clinically important microorganisms (Lee et al., 2017). In further developments, the ASTA MicroIDSys was challenged with 370 clinical anaerobic isolates and attained 91.6% success at the species level. Among these were many poorly described species and many nonfermentative taxa that are normally difficult to speciate (Kim et al., 2020). The MS companies SAI (Scientific Analysis Instruments, Manchester, UK), Waters Corporation, Elstree, UK (formerly Micromass) and Shimadzu Corporation, Japan, have been manufacturing MALDI-TOF MS instruments since the onset of these developments, and have shown a long-term interest in promoting microbiological applications. Two other MS companies, AB Sciex and ThermoFisher Scientific, supply tandem MS instruments that are likely to feature significantly in the future development of microbial proteomics and diagnostic applications in microbiology.
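A parallel evaluation of two identification systems, of the kind described above, reduces to counting the isolates for which both systems return the same name with a high-confidence score. The sketch below assumes the two thresholds quoted in the text (≥ 2.0 for the Bruker Biotyper and ≥ 140 for the ASTA MicroIDSys); the record layout and the example isolates are hypothetical, and the code does not reproduce the analysis of Lee et al. (2017).

```python
from dataclasses import dataclass
from typing import Iterable

BRUKER_HIGH = 2.0   # high-confidence Biotyper score quoted in the text
ASTA_HIGH = 140.0   # high-confidence MicroIDSys score quoted in the text


@dataclass
class PairedResult:
    """One isolate analysed on both systems (hypothetical record layout)."""
    bruker_id: str
    bruker_score: float
    asta_id: str
    asta_score: float


def high_confidence_agreement(results: Iterable[PairedResult]) -> float:
    """Percentage of isolates with identical, high-confidence calls on both systems."""
    rows = list(results)
    if not rows:
        raise ValueError("no paired results supplied")
    agree = sum(
        1
        for r in rows
        if r.bruker_id == r.asta_id
        and r.bruker_score >= BRUKER_HIGH
        and r.asta_score >= ASTA_HIGH
    )
    return 100.0 * agree / len(rows)


# Hypothetical isolates for illustration only.
results = [
    PairedResult("Escherichia coli", 2.31, "Escherichia coli", 215.0),
    PairedResult("Klebsiella pneumoniae", 2.10, "Klebsiella pneumoniae", 180.0),
    PairedResult("Staphylococcus aureus", 1.85, "Staphylococcus aureus", 160.0),
    PairedResult("Enterococcus faecalis", 2.20, "Enterococcus faecium", 150.0),
]
print(f"{high_confidence_agreement(results):.1f}% identical high-confidence calls")
```

Keeping the thresholds as named constants makes it straightforward to repeat the tally with the lower cut-offs that some laboratories apply to difficult genera.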

Use of MALDI-TOF MS in a Clinical Laboratory

The initial microbiological development and acceptance of MALDI-TOF MS was mainly in Europe. A decade later, when accreditation was achieved, the technology entered the US market and was met with considerable approval by some of the leading laboratories (Patel, 2013; Angeletti, 2017). This was subsequently mirrored in many developing countries, largely because, after the outlay for an instrument, the accuracy, robustness, low running cost and simplicity of the method markedly surpass those of traditional methods (Table 7.2).

Table 7.2. Summary of the current advantages of MALDI-TOF MS.

Criteria | Comments
Simplicity of technique | Minimal sample preparation is required. Formic acid extraction is simple and may be automated if required. Minimal biomaterial (a few cells) is required from a primary agar plate for analysis
Time to result | Rapid technique: time to result may be 2–3 min. Analysis may be interrupted, and new samples added during processing
Maintenance of the instrument | Extremely low running cost; e.g. reagents used are minuscule. Many technical problems may be resolved remotely through electronic communications with the company
Versatility | Allows for multiple samples to be analysed with marginal cost implications. New instruments, such as ASTA’s Tinkerbell MS, include target plates that allow parallel analysis on a Bruker MALDI-TOF MS
Laboratory workflow | Fits well into existing laboratory workflows and may be automated into existing IT network systems
Technical expertise | Staff training is minimal and only a low level of technical expertise is required to use an instrument routinely
Scientific credibility | Concordance with 16S rRNA and WGS is excellent, therefore strong taxonomic endorsement at the species level
Database entries | Amenable to additional diversity being added to existing databases and acquisition of data for additional analysis (e.g. see Vranckx et al., 2017)
Blood culture | Aids early diagnosis (e.g. blood cultures) and provides preliminary information on antibiotic resistance
Mixed culture identification | Analysis of mixed samples is improving (see Charretier et al., 2015; Mahé et al., 2017; Yang et al., 2018)

MALDI-TOF MS and Currently Related Proteomic Technologies

Limitations of MALDI-TOF MS as currently used

Pattern matching of mass spectral profiles to delineate bacterial species has enabled microbiologists to demonstrate unequivocal proof of concept. The biomarker mass ions are largely derived from the highly abundant ribosomal proteins of the cell, and their stability and reproducibility have been pivotal in the development of a universal method that has gained widespread appeal at the species level. Its application at the subspecies level using mass spectral profiles is equivocal (see above). Indications are that several of the minor peaks in the mass spectrum may be lipids, lipoproteins or glycolipids, and may provide biomarkers for subspecies-level identification, as shown for yeasts and other fungi (Stübiger et al., 2016; see also Chapter 8). These molecules have been extensively studied in bacteria using older forms of MS (Minnikin and Goodfellow, 1980), but to date their use in MALDI-TOF MS to aid species/subspecies identification of bacteria has not been demonstrated.

DNA-based methods are inherently more precise, and an elegant method developed by Hiroto Tamura takes advantage of the exactitude of DNA and combines it with the ease and speed of MALDI-TOF MS (Tamura, 2017). The method expands the use of the ribosomal proteins used in current MALDI-TOF MS analysis but differs in that it decodes the base sequence of the S10-spc-alpha operon, which codes for approximately half of the ribosomal proteins. Designated the S10-GERMS method, it allows rigorous typing of bacterial isolates. Compared with conventional gene-based typing methods, which require DNA sequencing, Tamura’s method is simpler and faster, and it has been adopted by Shimadzu Corporation, Japan.
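The pattern-matching principle described at the start of this section can be illustrated with a minimal sketch. It is not the scoring algorithm of any commercial platform: the peak lists, the species names, the 500 ppm tolerance and the simple "fraction of reference peaks matched" score are all assumptions chosen for illustration.

```python
from bisect import bisect_left

def match_fraction(query, reference, tol_ppm=500.0):
    """Fraction of reference peaks with a query peak within tol_ppm (hypothetical score)."""
    query = sorted(query)
    hits = 0
    for ref_mz in reference:
        window = ref_mz * tol_ppm / 1e6          # absolute tolerance at this m/z
        i = bisect_left(query, ref_mz - window)  # first query peak >= lower bound
        if i < len(query) and query[i] <= ref_mz + window:
            hits += 1
    return hits / len(reference) if reference else 0.0

def best_match(query, library, tol_ppm=500.0):
    """Return (name, score) of the library entry that best matches the query."""
    scores = {name: match_fraction(query, peaks, tol_ppm)
              for name, peaks in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical reference peak lists (m/z) for two unnamed species.
library = {
    "species_A": [4305.0, 5543.0, 6463.0, 8336.0, 9705.0],
    "species_B": [4450.0, 5096.0, 6255.0, 7274.0, 9536.0],
}
# Hypothetical peak list picked from an unknown isolate.
query = [4305.8, 5544.1, 6464.0, 7100.0, 9706.5]

print(best_match(query, library))
```

A tolerance expressed in ppm scales with mass, which is one common way of allowing for the lower mass accuracy of linear instruments at higher m/z; commercial systems also weight peak intensity and frequency, which this sketch ignores.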

Retaining the Interest of Mass Spectrometry Companies

While microbiologists have an enormous interest in retaining the use of MALDI-TOF MS for the foreseeable future, the interest of the MS companies will depend on financial returns. Several MALDI-TOF MS companies have contributed significantly to the development of applications for the technology, but subsequently discontinued their interest owing to budgetary constraints or a change in the direction of the company’s strategy. Examples include Micromass MALDI-MS (Waters Corporation), the Sequenom MassArray system and Ciphergen Biosystems SELDI-TOF MS. In general, MS companies need to work with microbiologists to assemble appropriate databases; the exception is Bruker Daltonik GmbH, which established its own microbiology facility at Bremen, Germany. The Shimadzu Corporation developed a database for use in the Far East, and in Europe and America it used the expanded accredited database of bioMérieux. Several studies have shown near-congruent results through parallel analyses of the same samples using the Bruker Biotyper and bioMérieux VITEK MS (Lévesque et al., 2015). Each sample had to be run according to the methodology of each platform and analysed using that platform’s search engine, and this may have accounted for minor differences in output.

ASTA (MicroIDSys system, ASTA, Korea) is unique in that it is the only non-specialist MS company to have successfully built its own MALDI-TOF MS instrument for microbiological applications. Its design features include a target plate that may be fitted to a Bruker Microflex LT/SH or Autoflex LRF MALDI-TOF MS. The 384-well target plate therefore allows, for the first time, MALDI-TOF MS analysis of the same sample on two entirely different platforms. The ability to combine two different instruments and databases adds new depth and confidence to the type of analyses that may now be undertaken. This was demonstrated in our laboratory at Middlesex University using several staphylococcal species analysed on both Bruker’s Autoflex LRF and ASTA’s Tinkerbell MALDI-TOF MS with considerable ease and confidence (Fig. 7.7).
The key to successful identification of an isolate is the comprehensive databases held by these companies; these are continually expanding and released periodically. While valuable for users, they remain a significant financial burden for more modest laboratories, particularly those in developing countries where (paradoxically) more diversity may be encountered. Ideally, an online database of MALDI-TOF MS profiles analogous to DNA sequencing databases would remove the constraints currently imposed by a company-based database, and add value to a more diverse and pragmatic solution for future MALDI-TOF MS platforms.

Fig. 7.7, left panel (spectrum overlay; image not reproduced). x-axis: Mass/Charge (M/Z), 4000–11,000; y-axis: Relative Intensity; traces: ASTA’s Tinkerbell and Bruker’s Autoflex; species: S. sciuri; scores: Autoflex 2.173, Tinkerbell 107.

Fig. 7.7, right panel (table).

Strain ID | Bruker species | Bruker score | ASTA species | ASTA score
211 | Not reliable ID | 1.657 | Invalid ID | 96
211 | S. capitis | 1.725 | Invalid ID | 90
211 | S. cohnii | 1.874 | Invalid ID | 89
295 | Not reliable ID | 1.665 | S. cohnii | 223
295 | Not reliable ID | 1.445 | S. cohnii | 152
295 | S. cohnii | 1.771 | S. cohnii | 207
297 | S. cohnii | 1.703 | S. cohnii | 235
297 | S. cohnii | 1.965 | S. cohnii | 237
297 | S. cohnii | 1.827 | S. cohnii | 235
319 | S. cohnii | 1.846 | M. lylae | 180
319 | S. cohnii | 1.832 | M. lylae | 168
319 | S. cohnii | 1.834 | M. lylae | 165
342 | S. cohnii | 1.816 | S. cohnii | 239
342 | S. cohnii | 1.766 | S. cohnii | 249
342 | S. cohnii | 1.884 | S. cohnii | 252
343 | S. cohnii | 1.848 | S. cohnii | 228
343 | S. cohnii | 1.993 | S. cohnii | 219
343 | S. cohnii | 1.927 | S. cohnii | 227
344 | S. cohnii | 1.856 | S. cohnii | 230
344 | S. cohnii | 1.933 | S. cohnii | 221
344 | S. cohnii | 1.866 | S. cohnii | 217
345 | S. cohnii | 1.821 | S. cohnii | 219
345 | S. cohnii | 1.904 | S. cohnii | 222
345 | S. cohnii | 1.702 | S. cohnii | 216
347 | S. cohnii | 1.702 | S. cohnii | 221
347 | S. cohnii | 2.026 | S. cohnii | 231
347 | Not reliable ID | 1.648 | S. cohnii | 222
349 | S. cohnii | 1.807 | S. cohnii | 222
349 | S. cohnii | 1.991 | S. cohnii | 236
349 | S. cohnii | 1.955 | S. cohnii | 239

Fig. 7.7. (Left) MALDI-TOF MS analysis of staphylococcal species using ASTA’s Tinkerbell Linear MS and Bruker’s Autoflex LRF MALDI-TOF MS. An example using Staphylococcus sciuri showing the spectrum obtained on ASTA’s Tinkerbell Linear MS (red) superimposed on the Bruker Autoflex LRF MALDI-TOF MS spectrum (light blue). The results show unequivocally the correspondence of the major mass ions of this species. (Right) 16S rRNA identification of ten atypical strains of Staphylococcus cohnii. Both instruments revealed low identification scores, but most samples were correctly identified. The identification scores of seven strains were too low to provide an identification, while strain 319 was incorrectly identified as Micrococcus lylae using ASTA’s Tinkerbell MS. Once added to the ASTA database, this and other strains were correctly identified.

H.N. Shah et al.
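When the same samples are run on two platforms, as in the right-hand panel of Fig. 7.7, the agreement between the two sets of calls can be tabulated directly. The sketch below is one simple way to do so; it is not the authors' analysis. The calls are transcribed from the figure in the order listed there, and treating every "Not reliable ID", "Invalid ID" or other-species call as "other" is an assumption made here for illustration.

```python
from collections import Counter

EXPECTED = "S. cohnii"  # the strains in Fig. 7.7 (right) are atypical S. cohnii

# Replicate-level calls transcribed from Fig. 7.7 (right), in the order listed.
bruker = (["Not reliable ID", "S. capitis", "S. cohnii",
           "Not reliable ID", "Not reliable ID"]
          + ["S. cohnii"] * 21 + ["Not reliable ID"] + ["S. cohnii"] * 3)
asta = (["Invalid ID"] * 3 + ["S. cohnii"] * 6
        + ["M. lylae"] * 3 + ["S. cohnii"] * 18)

def summarize(bruker_calls, asta_calls, expected=EXPECTED):
    """Count analyses by (Bruker outcome, ASTA outcome), each 'correct' or 'other'."""
    counts = Counter()
    for b, a in zip(bruker_calls, asta_calls):
        key = ("correct" if b == expected else "other",
               "correct" if a == expected else "other")
        counts[key] += 1
    return counts

counts = summarize(bruker, asta)
total = sum(counts.values())
for key in sorted(counts):
    print(key, counts[key])
print("both platforms correct:", counts[("correct", "correct")] / total)
```

A table of this kind shows at a glance which analyses fail on only one platform, which is the practical benefit of running a shared target plate on two instruments with independent databases.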

Potential to Identify the Biomarker Peaks in a MALDI-TOF MS Spectrum: Towards a MALDI-TOF MS Global Database

The development and clinical application of MALDI-TOF MS began in Europe in the late 1990s, gained confidence in the USA a decade ago and is now being implemented rapidly in many developing countries. To date, no attempts have been made to identify the key mass ions in MALDI-TOF MS spectra that give each species its unique signature. However, as MALDI-TOF MS is used more for systematics, it will be necessary to identify signature mass ions that may be added to the description of a new species, as discussed above for the subspecies of C. acnes. Being able to identify the signature mass ions of species will add immensely to the development of an open access global database in the future and help develop the next generation of MALDI-TOF MS applications. To undertake such a task, it will be necessary to use several high-resolution forms of MS, because MALDI-TOF MS alone cannot deduce the identity of these ions.

High-resolution forms of MS that may be used to deduce peptide/protein taxon-specific signatures

Although the available MALDI-TOF MS databases encompass mass ions within the range of 500–20,000 daltons, most bacterial species signature biomarkers are within the range of 500–12,000 daltons and remain uncharacterized. MALDI-TOF MS is efficient at ionizing proteins in this mass range. However, to characterize these proteins, and species-specific proteins that are much larger (in the range of 50,000–100,000 daltons), they need to be isolated using techniques such as gel electrophoresis or high-performance liquid chromatography and subsequently digested with a protease before MALDI-TOF MS analysis and database searching. This can result in a better level of taxonomic identification. Furthermore, peptides that are highly abundant and detected reproducibly may be identified and used as biomarkers (Fox et al., 2011).

Alternatively, proteins can be extracted from an organism and analysed using liquid chromatography with high-resolution, accurate-mass MS, using either a top-down or a bottom-up proteomic approach (Wynne et al., 2010). Apart from requiring simpler sample preparation than the bottom-up approach, the top-down workflow provides additional information such as post-translational modifications and identification of proteoforms. With rapid improvements in ion optics, vacuum technology and fragmentation techniques, the upper mass limit for top-down analysis is moving to > 100 kDa. Groups that have used bottom-up approaches for proteotyping have created their own databases (Jabbour et al., 2010), but there are currently no generally available databases for processing data from either tryptic digests or intact proteins. Such a database could lead not only to better identification, especially for closely related species, but also to identification of specific protein markers.

An area where the bottom-up workflow has been used extensively is comparative proteomics. Studies in this area include comparison of the protein profiles of methicillin-susceptible and methicillin-resistant S. aureus (Xu et al., 2020), the antibacterial mechanisms of antibiotics (Ma et al., 2017), resistance mechanisms (Chen et al., 2019) and the effect of culture conditions on organisms such as P. aeruginosa (Duncan et al., 2019). A bottom-up workflow has also been used for discovery of biomarkers that can be used for detection and identification of bacteria (Charretier et al., 2015). The application of high-resolution MS approaches to identifying bacteria and studying the bacterial proteome would become even more widespread if databases were available for both top-down and bottom-up workflows.
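The bottom-up step of digesting a protein with a protease and searching the resulting peptide masses against a database can be sketched in silico. The example below is a minimal illustration, assuming trypsin with the commonly used rule of cleavage after K or R except before P, standard monoisotopic residue masses, and a made-up protein sequence; it is not the search engine of any platform mentioned in this chapter.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water is added per free peptide
PROTON = 1.007276   # added for a singly protonated ion [M+H]+

def tryptic_digest(protein, missed_cleavages=0, min_length=6):
    """In silico trypsin digest: cleave after K or R unless followed by P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptide = "".join(pieces[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides

def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER

# Hypothetical protein sequence, used only to exercise the functions.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
for pep in tryptic_digest(protein):
    print(pep, round(monoisotopic_mass(pep) + PROTON, 4))
```

Short peptides are discarded by default (`min_length=6`) because they are rarely unique to a protein, let alone to a taxon; the threshold is an illustrative choice.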
From linear MALDI-TOF MS to tandem LC-MS/MS: unravelling the proteome of microbial species and future implications for bacterial systematics

To date, MALDI-TOF MS and tandem MS/MS analyses have operated independently; consequently, the ability to rapidly link bacterial phylogenetics with potential environmental functions has not been explored. To bridge this gap, Clark et al. (2018) designed a novel MALDI-TOF MS data acquisition and bioinformatics pipeline (IDBac) to integrate data from both intact protein and specialized metabolite spectra acquired directly from bacterial cells grown on agar. This technique organized bacteria into highly similar phylogenetic groups and allowed the metabolic differences of hundreds of isolates to be compared in just a few hours.

The literature detailing the proteome of bacterial species is colossal and continues to expand appreciably. Nearly all investigations pertain to aspects of the physiological flux of the cell, from the effect of environmental stimuli and antibiotic resistance to detailed insights into the pathogenic mechanisms of infectious agents (Chamot-Rooke et al., 2011; Chilton et al., 2014, 2017; Soufi and Soufi, 2016; Gault et al., 2017). Although protein profiling methods gained a strong presence in microbial systematics, MALDI-TOF MS, which utilizes protein signatures, dominates taxonomic and species identification outputs. The capability of high-resolution tandem MS methods to identify novel characters and translate genomic traits into phenotypes that may be used in microbial systematics remains poorly studied.

We embarked on peptide/protein analysis to decipher species and subspecies markers for predicting microbial behaviour and devising identification markers in the late 1990s, initially using MALDI and SELDI-TOF MS. A few years later we explored the value of LC-MS/MS for higher resolution and direct biomarker sequencing (reviewed by Shah and Gharbia, 2017). Acquisition of the first generation of Thermo Fisher’s LTQ Orbitrap MS enabled us to establish new, rapid extraction methods and bioinformatic approaches applicable across the microbial kingdom (Lancashire et al., 2005; Schmid et al., 2005).
Initially, these were confined to high-risk pathogens that are genetically indistinguishable from non-pathogenic taxa, such as separating Bacillus anthracis from Bacillus cereus; Shigella dysenteriae from Shigella boydii; and Clostridium botulinum from Clostridium sporogenes. Huge challenges are encountered in the identification and systematics of Burkholderia cepacia and its relationship to P. aeruginosa in the cystic fibrosis lung. A parallel problem occurs with Mycobacterium tuberculosis, a member of an almost identical complex of species and subspecies known as the TB complex, where differences between taxa are blurred. A coherent bioinformatic pipeline incorporating comparative genome sequencing and open reading frame prediction and translation was developed (Al-Shahib et al., 2010; Misra et al., 2012, 2015, 2017). This enabled the discovery of novel peptide biomarkers characterizing isolates of each species tested, using peptide sequences obtained by GeLC or nano-LC-MS/MS. Such approaches, integrating peptide discovery against comparative genomics (Karlsson et al., 2017, 2018), have the capacity to effectively utilize bacterial proteomes for species-, subspecies- and strain-level typing.
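The "open reading frame prediction and translation" step of such a pipeline can be illustrated with a deliberately simplified sketch. It is not the published pipeline: it scans only the three forward frames, accepts only ATG as a start codon (bacteria also use alternatives such as GTG and TTG), and uses the standard genetic code; the DNA sequence is invented for the example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDDDEEEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def find_orfs(dna, min_aa=5):
    """Translate ATG-to-stop ORFs in the three forward frames (simplified)."""
    dna = dna.upper()
    proteins = []
    for frame in range(3):
        current = None                      # residues of the ORF being read
        for i in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
            if current is None:
                if aa == "M":
                    current = ["M"]         # open a new ORF at ATG
            elif aa == "*":
                if len(current) >= min_aa:
                    proteins.append("".join(current))
                current = None              # stop codon closes the ORF
            else:
                current.append(aa)
    return proteins

# Hypothetical DNA fragment containing one short ORF in the third frame.
dna = "CCATGAAAACCGCGTATATTGCGAAATAAGG"
print(find_orfs(dna))
```

The predicted protein sequences form the in silico database against which peptide MS/MS spectra are matched; dedicated gene finders should be used for real genomes.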

Case study: use of tandem LC-MS/MS during a major disease outbreak of pathogenic E. coli and taxonomic implications

During outbreaks, samples received by a diagnostic laboratory often comprise a diverse range of isolates of the putative pathogen in which the boundaries between subspecies are blurred or the recognition of pathogenic and non-pathogenic strains is difficult, owing to the exchange of genetic elements. The expression of virulence factors such as toxins, which are nearly always proteins, makes proteomics a powerful and sensitive tool to simultaneously identify isolates and map the pathogenic factors within strains. Ideally, this should be considered against a background of in silico analysis of the relevant genome, to provide a sound basis to screen for genetic variants and to corroborate the pathogenic potential of the isolate in terms of its proteome. The E. coli O104:H4 outbreak in Europe in summer 2011 is a poignant example of the role of modern systematics in unravelling the pathogenic mechanisms of a well-defined taxon and controlling the outbreak.

Nature of the outbreak

On 1 May 2011, an E. coli outbreak began in Germany with a few patients presenting with bloody diarrhoea. The aggressiveness of the strain was realized within days as the outbreak extended to eight other European countries. With over 4000 reported cases and 50 deaths, the status of the outbreak was soon elevated to one of the ‘deadliest E. coli outbreaks’ (Rogers, 2012). The outbreak strain was positively identified on 25 May by Karch and colleagues at the University of Münster and the Robert Koch Institute, Germany, using serotyping and PCR assays. Multilocus sequence typing confirmed that the outbreak was caused by a single clone, and that it was a rare serotype: O104:H4. This serotype is normally associated with enteroaggregative E. coli (EAEC), which is known to cause persistent diarrhoea but not haemorrhaging or haemolytic uraemic syndrome (HUS). In June, patients exhibiting symptoms of infection with the outbreak strain arrived in the UK. PHE (then the Health Protection Agency) initiated plans to undertake its first in-house bacterial genome and proteome sequencing to aid investigation of this outbreak strain. WGS confirmed the mosaic nature of the strain’s chromosome, which was consistent with reports from other isolates of this outbreak. Genome sequencing of multiple strains identified virulence factors that may account for the higher incidence of HUS and the unique traits of the strain. Two independent groups completed about 80% of the sequencing of the outbreak isolate’s 5.2 million base pair genome and two large plasmids using short-read DNA sequencers (Mellmann et al., 2011; Editorial, 2011), which partly explained the strain’s pathogenicity, acquisition of new genetic elements and evolutionary origin (Shah and Gharbia, 2012).

Proteomics and systematics in a high-containment laboratory

Cell culture, mechanical disruption of cells, centrifugation and sample preparation were carried out in Class III facilities at PHE. Proteomic analysis was performed on five strains: three clinical isolates from patients with confirmed E. coli O104 infections acquired during the ‘German outbreak’, and two comparator strains from PHE’s archives previously characterized as enteroaggregative (EAEC) and enterohaemorrhagic (EHEC), respectively. Two parallel approaches were used to reduce the complexity of the protein mixture prior to mass spectral analysis.
In the first approach, lysates were separated by SDS-PAGE, gel slices were digested with trypsin, and the peptides were analysed using nanoLC-MS/MS. In the second, the entire cell lysate was digested directly with trypsin and injected onto two LC-MS/MS systems (Thermo Scientific LTQ Orbitrap and Thermo Scientific LTQ Orbitrap Velos, each with a front-end Ultimate 3000 Dionex nano/capillary liquid chromatography system, Thermo Fisher Scientific) that provided ultra-high resolution and accurate mass for differentiating closely related peptides. The recorded peptide MS/MS spectra were matched to both protein and in silico genome-translated databases to identify sequenced proteins. The peptides were then subjected to further analysis using bioinformatic pipelines to characterize unique signatures. In total, 2500 proteins were identified from the E. coli outbreak isolates, which included 68 peptide signatures that were not shared by EAEC, EHEC or other Enterobacteriaceae. Species-level peptide signatures, including those for the AggR transcription factor, haemolysin protein, Aaf fimbriae protein and Iha adhesin protein, were detected. Furthermore, genus-characteristic peptides were also mapped, suggesting that, using proteome mining, an unknown isolate can be identified at genus, species and strain level from a single analysis.
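The idea of peptide signatures that are "not shared" by comparator strains, alongside peptides common to a whole group, reduces to simple set operations once each isolate has a list of confidently identified peptides. The sketch below illustrates that logic only; the strain labels mirror the case study, but every peptide sequence is invented, and the real analysis involved database matching and filtering steps not shown here.

```python
def specific_peptides(detected, target):
    """Peptides detected in `target` and in none of the other entries."""
    others = set().union(*(peps for name, peps in detected.items()
                           if name != target))
    return detected[target] - others

def shared_by_all(detected):
    """Peptides detected in every entry (candidate group-level markers)."""
    return set.intersection(*(set(peps) for peps in detected.values()))

# Hypothetical identified-peptide sets; sequences are invented placeholders.
detected = {
    "outbreak_O104": {"LSDNQTR", "AVGEYLK", "TMPELVR", "GSWIDAK"},
    "EAEC_comparator": {"LSDNQTR", "AVGEYLK", "NHTFVEK"},
    "EHEC_comparator": {"LSDNQTR", "QYDGLIR"},
}

print(sorted(specific_peptides(detected, "outbreak_O104")))
print(sorted(shared_by_all(detected)))
```

Applied at successive ranks (all strains of a genus, all strains of a species, a single strain), the same two operations give the genus-, species- and strain-level markers referred to in the text.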

Conclusion

Over the last two decades we have employed a parallel approach to maximizing the application of proteomics to characterize bacterial species from a vast range of environments. Shah (2014) and Nomura (2015) have reported on the manner in which proteomic data were previously used in bacterial taxonomy. While MALDI-TOF MS has gained formidable recognition in microbiological laboratories, the huge potential available through in-depth analysis of the proteome by new, advanced high-resolution MS-based technologies is grossly underestimated (Xu et al., 2020). This may be owing to the absence of a high-throughput, low-cost microscale instrument that is comparable with a MALDI-TOF MS platform; this largely restricts the technology to the confines of the research laboratory. The methods described above use a bottom-up approach in which cells need to be efficiently disrupted, and lysates prepared and trypsinized, prior to LC-MS/MS. This demands a higher level of technical expertise and lengthy sample preparation steps, and has perhaps also discouraged MALDI-TOF MS users.

More recently, Thermo Fisher Scientific superseded the well-established LTQ Orbitrap MS with the Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer. This highly advanced instrument, with the ability to undertake top-down MS proteome analysis, paves the way for a more automated, high-throughput platform that is comparable with MALDI-TOF MS. Early work done at PHE and reported at the European Congress of Clinical Microbiology and Infectious Diseases (Shah et al., 2015) showed that, even for species that are genetically very closely related (such as E. coli and Shigella sonnei, which are inseparable even using WGS), top-down MS analysis revealed differential biomarkers (Fig. 7.8). The next generation of MS to supersede MALDI-TOF MS is likely to be an instrument such as the Q Exactive Hybrid Quadrupole-Orbitrap Mass Spectrometer that can analyse the bacterial proteome using a top-down approach that is as manageable as processing samples on a MALDI-TOF MS instrument (Armengaud, 2017; Gault et al., 2017). The data derived from such analyses, which draw on the entire proteome, will propel microbiology into a new era of systematics and biology.

Fig. 7.8 (image not reproduced). Panels: E. coli; S. sonnei; zoom view marking protein(s) unique to E. coli and protein(s) unique to S. sonnei.

Fig. 7.8. Use of top-down proteomics to delineate genetically closely related species. In an early attempt to demonstrate the high resolution of this approach, proteins unique to both species were evident, but were differentially expressed (Shah et al., 2015).

References

Al-Shahib, A., Misra, R., Ahmod, N., Fang, M., Shah, H.N. and Gharbia, S.E. (2010) Coherent pipeline for biomarker discovery using mass spectrometry and bioinformatics. BMC Bioinformatics 11, 437.
Ambler, R.P. (1985) Protein sequencing and taxonomy. In: Jones, D., Goodfellow, M. and Priest, F.G. (eds) Computer-aided Bacterial Systematics. Academic Press, London, pp. 307–335.
Ambler, R.P. and Brown, L.H. (1967) The amino acid sequence of Pseudomonas fluorescens azurin. Biochemical Journal 104, 784–825.
Angeletti, S. (2017) Matrix assisted laser desorption time of flight mass spectrometry (MALDI-TOF MS) in clinical microbiology. Journal of Microbiological Methods 138, 20–29.
Armengaud, J. (2017) Proteogenomics of non-model microorganisms. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 529–538.
Avanzi, I.R., Gracioso, L.H., Baltazar, M.D.P.G., Karolski, B., Perpetuo, E.A. and do Nascimento, C.A.O. (2017) Rapid bacteria identification from environmental mining samples using MALDI-TOF MS analysis. Environmental Science and Pollution Research 24, 3717–3726.
Batrich, M., Maskeri, L., Schubert, R., Ho, B., Kohout, M., Abdeljaber, M., Abuhasna, A., Kholoki, M., Psihogios, P., Tahir, R., Sawhney, S., Siddiqui, S., Xoubi, E., Cooper, A., Hatzopoulos, T. and Putonti, C. (2019) Pseudomonas diversity within urban freshwaters. Frontiers in Microbiology 10, 195. doi: 10.3389/fmicb.2019.00195


Belén, R-S., Emilia, C., Coste, A.T. and Gilbert, G. (2019) Review of the impact of MALDI-TOF MS in public health and hospital hygiene, 2018. Eurosurveillance 24, https://doi.org/10.2807/1560-7917.ES.2019.24.4.1800193
Berber, I. (2004) Characterization of Bacillus species by numerical analysis of their SDS-PAGE protein profiles. Journal of Cell and Molecular Biology 3, 33–37.
Bergey, D.H., Harrison, F.C., Breed, R.S., Hammer, B.W. and Huntoon, F.M. (eds) (1923) Genus Bacteroides Castellani and Chalmers 1919. In: Bergey’s Manual of Determinative Bacteriology. 1st edn. Williams and Wilkins Company, Baltimore, Maryland, pp. 255–264.
Bishop, C., Arnold, C. and Gharbia, S.E. (2010) Transfer of a traditional serotyping system (Kauffmann–White) onto a MALDI-TOF MS platform for the rapid typing of Salmonella isolates. In: Shah, H.N. and Gharbia, S.E. (eds) Mass Spectrometry for Microbial Proteomics. Wiley, Chichester, UK, pp. 463–496.
Biswas, S., Gouriet, F. and Rolain, J.-M. (2016) Molecular typing of bacteria/fungi using MALDI-TOF MS. In: Kostrzewa, M. and Schubert, S. (eds) MALDI-TOF Mass Spectrometry in Microbiology. Caister Academic Press, Munich, Germany, pp. 79–92.
Bourassa, L. (2018) MALDI-TOF MS quality control in clinical microbiology. Clinical Laboratory News. https://www.aacc.org/publications/cln/articles/2018/janfeb/maldi-tof-ms-quality-control-in-clinical-microbiology (accessed 16 October 2020).
Cain, T.C., Lubman, D.M. and Weber Jr, W.J. (1994) Differentiation of bacteria using protein profiles from matrix assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 8, 1026–1030.
Chamot-Rooke, J., Mikaty, G., Malosse, C., Soyer, M., Dumont, A., Gault, J., Imhaus, A-F., Martin, P., Trellet, M., Clary, G., Chafey, P., Camoin, L., Nilges, M., Nassif, X. and Duménil, G. (2011) Posttranslational modification of pili upon cell contact triggers N. meningitidis dissemination. Science 331, 778–782.
Charretier, Y., Dauwalder, O., Franceschi, C., Degout-Charmette, E., Zambardi, G., Cecchini, T., Bardet, C., Lacoux, X., Dufour, P., Veron, L., Rostaing, H., Lanet, V., Fortin, T., Beaulieu, C., Perrot, N., Dechaume, D., Pons, S., Girard, V., Salvador, A., Durand, G., Mallard, F., Theretz, A., Broyer, P., Chatellier, S., Gervasi, G., Van Nuenen, M., Roitsch, C.A., Van Belkum, A., Lemoine, J., Vandenesch, F. and Charrier, J.P. (2015) Rapid bacterial identification, resistance, virulence and type profiling using selected reaction monitoring mass spectrometry. Scientific Reports 5, 13944. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4563557/
Chen, C.Y., Clark, C.G., Langner, S., Boyd, D.A., Bharat, A., McCorrister, S.J., McArthur, A.G., Graham, M.R., Westmacott, G.R. and Van Domselaar, G. (2019) Detection of antimicrobial resistance using proteomics and the comprehensive antibiotic resistance database: a case study. Proteomics Clinical Applications. https://doi.org/10.1002/prca.201800182
Chilton, C.H., Gharbia, S.E., Fang, M., Misra, R., Poxton, I.R., Borriello, S.P. and Shah, H.N. (2014) Comparative proteomic analysis of Clostridium difficile isolates of varying virulence. Journal of Medical Microbiology 63, 489–503.
Chilton, C.H., Gharbia, S.E., Misra, R.V., Fang, M., Poxton, I.R., Borriello, S.P. and Shah, H.N. (2017) Mapping of the proteogenome of Clostridium difficile isolates of varying virulence. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 379–398.
Chiu, T-C. (2014) Recent advances in bacteria identification by matrix-assisted laser desorption/ionization mass spectrometry using nanomaterials as affinity probes. International Journal of Molecular Sciences 15, 7266–7280.
Clark, C.M., Costa, M.S., Sanchez, L.M. and Murphy, B.T. (2018) Coupling MALDI-TOF mass spectrometry protein and specialized metabolite analyses to rapidly discriminate bacterial function. PNAS 115, 4981–4986.
Claydon, M.A., Davey, S.N., Edwards-Jones, V. and Gordon, D.B. (1996) The rapid identification of intact microorganisms using mass spectrometry. Nature Biotechnology 14, 1584–1586.
Cousin, S., Brambilla, E., Yang, J. and Stackebrandt, E. (2008) Culturable aerobic bacteria from the upstream region of a karst water rivulet. International Microbiology 11, 91–100.
Dekio, I., Sakamoto, M., Hayashi, H., Amagai, M., Suematsu, M. and Benno, Y. (2007) Characterization of skin microbiota in patients with atopic dermatitis and normal subjects using 16S rRNA gene-based comprehensive analysis. Journal of Medical Microbiology 56, 1675–1683.
Dekio, I., Rajendram, D., Morita, E., Gharbia, S.E. and Shah, H.N. (2012) Genetic diversity of Propionibacterium acnes strains isolated from human skin in Japan and comparison with their distribution in Europe. Journal of Medical Microbiology 61, 622–630.


Dekio, I., Culak, R., Fang, M., Ball, G., Gharbia, S. and Shah, H.N. (2013) Correlation between phylogroups and intracellular proteomes of Propionibacterium acnes and differences in the protein expression profiles between anaerobically and aerobically grown cells. BioMed Research International, article 151797. doi: 10.1155/2013/151797
Dekio, I., Culak, R., Misra, R., Gaulton, T., Fang, M., Sakamoto, M., Oshima, K., Hattori, M., Klenk, H-P., Rajendram, D., Gharbia, S.E. and Shah, H.N. (2015) Dissecting the taxonomic heterogeneity within Propionibacterium acnes: proposal for Propionibacterium acnes subsp. acnes subsp. nov. and Propionibacterium acnes subsp. elongatum subsp. nov. International Journal of Systematic and Evolutionary Microbiology 65, 4776–4787.
Dekio, I., McDowell, A., Sakamoto, M., Tomida, S. and Ohkuma, M. (2019) Proposal of new combination, Cutibacterium acnes subsp. elongatum comb. nov., and emended descriptions of genus Cutibacterium, Cutibacterium acnes subsp. acnes, and Cutibacterium acnes subsp. defendens. International Journal of Systematic and Evolutionary Microbiology 69, 1087–1092.
Drucker, D.B. (1997) Use of Fast Atom Bombardment Mass Spectrometry for analysis of polar lipids of anaerobes. In: Eley, A.R. and Bennett, K.W. (eds) Anaerobic Pathogens. Sheffield Academic Press, Sheffield, UK, pp. 331–341.
Duncan, L.K., Kenneth, B., Shah, A.J., Selami, L., Belgacem, O., Ward, M. and Shah, H.N. (2019) Isolation of putative strains of P. aeruginosa and verification using Mass Spectrometry. Society of Chemical Industry December 2019 Conference, Rapid conformation using MALDI-TOF III. SCI, 14/15 Belgrave Square, London.
Editorial (2011) Outbreak genomics. Whole-genome sequencing and crowdsourced analyses proved a powerful adjunct to traditional typing in the recent Escherichia coli outbreak. Nature Biotechnology 29, 769.
Emonet, S., Shah, H.N., Cherkaoui, A. and Schrenzel, J. (2010) Application and use of various mass spectrometry methods in clinical microbiology. Clinical Microbiology and Infection 16, 1604–1613.
Encheva, V., Wait, R., Gharbia, S.E., Begum, S. and Shah, H.N. (2005) Proteome analysis of serovars Typhimurium and Pullorum of Salmonella enterica subspecies. BMC Microbiology 5, 42–52.
Encheva, V., Wait, R., Gharbia, S.E., Begum, S. and Shah, H.N. (2006) Comparison of extraction procedures for proteome analysis of Streptococcus pneumoniae and a basic reference map. Proteomics 6, 3306–3317.
Fox, K., Fox, A., Rose, J. and Walla, M. (2011) Speciation of coagulase negative staphylococci, isolated from indoor air, using SDS page gel bands of expressed proteins followed by MALDI-TOF MS and MALDI TOF-TOF MS-MS analysis of tryptic peptides. Journal of Microbiological Methods 84, 243–250.
Gault, J., Vorontsov, E., Dupré, M., Calvaresi, V., Duchateau, M., Lima, D.B., Malosse, C. and Chamot-Rooke, J. (2017) Top-down proteomics in the study of microbial pathogenicity. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 493–504.
Gloss, B.S., Patterson, K.I., Barton, C.A., Gonzalez, M., Scurry, J.P., Hacker, N.F., Sutherland, R.L., O’Brien, P.M. and Clark, S.J. (2012) Integrative genome-wide expression and promoter DNA methylation profiling identifies a potential novel panel of ovarian cancer epigenetic biomarkers. Cancer Letters 318, 76–85.
Goodfellow, M. and Board, R.G. (1980) Microbiological Classification and Identification. Society for Applied Bacteriology, Academic Press, London, New York, Toronto, Sydney, San Francisco.
Goodfellow, M. and Minnikin, D.E. (1985) Chemical Methods in Bacterial Systematics. Society for Applied Bacteriology, Technical Series 20. Academic Press, London, Orlando, San Diego, New York, Toronto, Sydney, Tokyo.
Gram, H.C. (1884) Über die isolierte Färbung der Schizomyceten in Schnitt- und Trockenpräparaten. Fortschritte der Medizin 2, 185–189.
Greub, G. (2010) MALDI-TOF mass spectrometry: the quantum leap. Clinical Microbiology and Infection 16, 1603.
Hamilton, S., Levin, M., Knoll, J.S. and Langford, P.R. (2010) Characterisation of microorganisms by pattern matching of mass spectral profiles and biomarker approaches requiring minimal sample preparation. In: Shah, H.N. and Gharbia, S.E. (eds) Mass Spectrometry for Microbial Proteomics. Wiley, Chichester, UK, pp. 223–254.
High, N.J., Fan, F. and Schwartzman, J.D. (2015) Haemophilus influenzae. In: Molecular Medical Microbiology 2nd edn. Chapter 97, Vol. 3, pp. 1709–1728.
Holland, R.D., Wilkes, J.G., Rafii, F. and Lay, J.O. (1996) Rapid identification of intact whole bacteria based on spectral patterns using matrix-assisted laser desorption/ionization with time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 10, 1227–1232.

MALDI-TOF MS and Currently Related Proteomic Technologies


Honisch, C., Chen, Y. and Hillenkamp, F. (2010) DNA resequencing by MALDI-TOF Mass Spectrometry and its application to traditional microbiological problems. In: Shah, H.N. and Gharbia, S.E. (eds) Mass Spectrometry for Microbial Proteomics. Wiley, Chichester, UK, pp 443–462. Jabbour, R.E., Deshpande, S.V., Wade, M.M., Stanford, M.F., Wick, C.H., Zulich, A.W., Evan, W., Skowronski, E.W. and Snyder, A.P. (2010) Double-blind characterization of non-genome-sequenced bacteria by Mass Spectrometry-based proteomics. Applied and Environmental Microbiology 76, 3637–3644. Jackman, P.J.H. (1985) Bacterial taxonomy based on electrophoretic whole-cell protein patterns. In Goodfellow M. and Minnikin, D.E. (eds) Chemical Methods in Bacterial Systematics. Academic Press, London and New York, pp. 115–129. Jarman, K.H. and Wahl, K.L. (2006) Development of spectral pattern-matching approaches to Matrix Assisted Laser Desorption/Ionization-Time of Flight Mass Spectrometry for bacterial identification. In: Wilkins, C.L. and Lay, J.O. (ed.) Identification of Microorganisms by Mass Spectrometry. John Wiley & Sons, Inc., Hoboken, New Jersey, pp 153–160. Johnson, J.L. and Cummins, C.S. (1972) Cell wall composition and deoxyribonucleic acid similarities among the anaerobic coryneforms, classical propionibacteria, and strains of Arachnia propionica. Journal of Bacteriology 109, 1047–1066. Kallow, W., Erhard, M., Shah, H.N., Raptakis, E. and Welker, M. (2010) MALDI-TOF MS for microbial identification: years of experimental development to an established protocol. In: Shah, H.N. and Gharbia, S.E. (eds) Mass Spectrometry for Microbial Proteomics, Wiley, Chichester, UK, pp. 255–276. Karas, M. and Hillenkamp, F. (1988) Laser desorption ionization of proteins with molecular masses exceeding 10,000 daltons. Analytical Chemistry 60, 2299–2301. Karas, M., Bachmann, D. and Hillenkamp, F. 
(1985) Influence of the wavelength in high-irradiance ultraviolet laser desorption mass spectrometry of organic molecules. Analytical Chemistry 57, 2935–2939. Karas, M., Bachmann, D., Bahr, U. and Hillenkamp, F. (1987) Matrix-Assisted Ultraviolet Laser Desorption of non-volatile compounds. International Journal of Mass Spectrometry and Ion Processes 78, 53–68. Karlsson, R., Gonzales-Siles, L., Boulund, F., Lindgren, Å., Svensson-Stadler, L., Karlsson, A., Kristiansson, E. and Moore, E.R.B. (2017) Proteotyping: tandem Mass Spectrometry shotgun proteomic characterization and typing of pathogenic microorganisms. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 419–450. Karlsson, R., Gonzales-Siles, L., Gomila, M., Busquets, A., Salvà-Serra, F., Jaén-Luchoro, D., Jakobsson, H.E., Karlsson, A., Boulund, F., Kristiansson, E. and Moore, E.R.B. (2018) Proteotyping bacteria: Characterization, differentiation and identification of pneumococcus and other species within the Mitis Group of the genus Streptococcus by tandem mass spectrometry proteomics. PLoS One. https://doi.org/10.1371/journal.pone.0208804 Keys, C.J., Dare, D.J., Sutton, H., Wells, G., Lunt, M., McKenna, T., McDowall, M. and Shah, H.N. (2004) Compilation of a MALDI-TOF mass spectral database for the rapid screening and characterisation of bacteria implicated in human infectious diseases. Infection, Genetics and Evolution 4, 221–242. Kim, D., Ji, S., Kim, J.R., Kim, M., Byun, J-H., Yum, J.H., Yong, D. and Lee, K. (2020) Performance evaluation of a new matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, ASTA MicroIDSys system, in bacterial identification against clinical isolates of anaerobic bacteria. Anaerobe 61, 102–131. Kim, Y. and Kim, J-S. (2017) ASTA’S MicroID System and its MycoMp database for mycobacterium. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. 
Wiley, Chichester, UK, pp. 110–12. Krieg, N.R. and Holt, J.G. (eds) (1984) Bergey’s Manual of Systematic Bacteriology, 1st ed, vol. 1. Williams and Wilkins, Baltimore, Maryland. Krishnamurthy, T. and Ross, P.L. (1996) Detection of pathogenic and non-pathogenic bacteria by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 10, 883–888. Laemmli, U.K. (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227, 680–685. Lancashire, L., Schmid, O., Shah, H.N. and Ball, G. (2005) Classification of bacterial species from proteomic data using combinatorial approaches incorporating artificial neural networks, cluster analysis and principal components analysis. Bioinformatics 21, 2191–2199. Lee, Y., Sung, J.Y., Kim, H., Yong, D. and Lee, K. (2017) Comparison of a new matrix-assisted laser desorption/ionization time-of-flight mass spectrometry platform, ASTA MicroIDSys, with Bruker Biotyper for species identification. Annals of Laboratory Medicine 37, 531–535.


H.N. Shah et al.

Lévesque, S., Dufresne, P.J., Soualhine, H., Domingo, M-C., Bekal, S., Lefebvre, B. and Tremblay, C. (2015) A side by side comparison of Bruker Biotyper and VITEK MS: Utility of MALDI-TOF MS technology for microorganism identification in a public health reference laboratory. PLoS One 10 (12), doi: 10.1371/journal.pone.0144878 Lomholt, H.B. and Kilian, M. (2010) Population genetic analysis of Propionibacterium acnes identifies a subpopulation and epidemic clones associated with acne. PLoS One 5, e12277. Ma, W., Zhang, D., Li, G., Liu, J., He, G., Zhang, P., Yang, L., Zhu, H., Xu, N. and Liang, S. (2017) Antibacterial mechanism of daptomycin antibiotic against Staphylococcus aureus based on a quantitative bacterial proteome analysis. Journal of Proteomics 150, 242–251. Mahé, P., Arsac, M., Perrot, N., Charles, M-H., Broyer, P., Hyman, J., Walsh, J., Chatellier, S., Girard, V., van Belkum A. and Veyrieras, J-B. (2017) Identification of species in mixed microbial populations using MALDI-TOF MS. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp 167–186. McDowell, A., Valanne, A., Ramage, G., Tunney, M.M., Glenn, J.V., McLorinan, Bhatia, A., Maisonneuve, J.F., Lodes, M., Persing, D.H. and Patrick, S. (2005) Propionibacterium acnes Types I and II represent phylogenetically distinct groups. Journal of Clinical Microbiology 43, 326–334. McDowell, A., Perry, A.L., Lambert, P.A. and Patrick, S. (2008) A new phylogenetic group of Propionibacterium acnes. Journal of Medical Microbiology 57, 218–224. McDowell, A., Barnard, E., Liu, J., Li, H. and Patrick, S. (2016) Proposal to reclassify Propionibacterium acnes type I as Propionibacterium acnes subsp. acnes subsp. nov. and Propionibacterium acnes type II as Propionibacterium acnes subsp. defendens subsp. nov. International Journal of Systematic and Evolutionary Microbiology 66, 5358–5365. 
Mellmann, A., Harmsen, D., Cummings, C.A., Zentz, E.B., Leopold, S.R., Rico, A., Prior, K., Szczepanowski, R., Ji, Y., Zhang, W., McLaughlin, S.F., Henkhaus, J.K., Leopold, B., Bielaszewska, M., Prager, R., Brzoska, P.M., Moore, R.L., Guenther, S., Rothberg, J.M. and Karch H. (2011) Prospective genomic characterization of the German enterohemorrhagic Escherichia coli O104:H4 outbreak by rapid next generation sequencing technology. PLoS One 6 (7), e22751. Minnikin, D.E. and Goodfellow, M. (1980) Lipid composition in the classification and identification of acid-fast bacteria. In: Goodfellow, M. and Board, R.G. (eds) Microbiological Classification and Identification. Academic Press, London, pp. 189–256. Misra, R.V., Ahmod, N.Z., Parker, R., Fang, M., Shah, H.N. and Gharbia, S.E. (2012) Developing an integrated proteo-genomic approach for the characterisation of biomarkers for the identification of Bacillus anthracis. Journal of Microbiological Methods 88, 237–247. Misra, R.V., Gaulton, T., Fang, M., Culak, R.A., Hornshaw, M.M., Ho, J., Gharbia, S.E. and Shah, H.N. (2015) Tandem Mass Spectrometry Analysis as an approach to delineate genetically related taxa, with specific implication for differentiating Escherichia coli from amongst the complex Enterobacteriaceae family. Journal of Proteomics & Enzymology 4, 1–11. Misra, R.V., Gaulton, T., Ahmod, N., Fang, M., Hornshaw, M.M., Ho, J., Gharbia, S.E. and Shah, H.N. (2017) Tandem Mass Spectrometry Analysis as an approach to delineate genetically related taxa. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 313–378. Nagy, E., Urbán, E., Becker, S., Kostrzewa, M., Vörös, A., Vörös, A., Hunyadkürti, J. and Nagy I. (2013) MALDI-TOF MS fingerprinting facilitates rapid discrimination of phylotypes I, II and III of Propionibacterium acnes. Anaerobe 20, 20–26. Nomura, F. 
(2015) Proteome-based bacterial identification using matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS): A revolutionary shift in clinical diagnostic microbiology. Biochimica Biophysica Acta 1854, 528–537. Olkun, A., Shah, A. and Shah, H.N. (2017) Elucidating the intra-species proteotypes of Pseudomonas aeruginosa from cystic fibrosis. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 579–592. Pavlovic, M., Huber, I., Konrad, R. and Busch, U. (2013) Application of MALDI-TOF MS for the identification of food borne bacteria. The Open Microbiology Journal 7, 135–141. Patel, R. (2013) MALDI-TOF Mass Spectrometry: Transformative proteomics for clinical microbiology. Clinical Chemistry 59, 340–342. Pirnay, J.-P., Matthijs, S., Colak, H., Chablain, P., Bilocq, F., Van Eldere, J., De Vos, D., Zizi, M., Triest, L. and Cornelis, P. (2005) Global Pseudomonas aeruginosa biodiversity as reflected in a Belgian river. Environmental Microbiology 7, 969–980.


Pranada, A.B., Witt, E., Bienia, M., Kostrzewa, M. and Timke, M. (2017) Accurate differentiation of Mycobacterium chimaera from Mycobacterium intracellulare by MALDI-TOF MS analysis. Journal Medical Microbiology 66, 670–677. Rajakaruna, L., Hallas, G., Dare, D., Sutton, H., Encheva, V., Culak, R., Innes, I., Ball, G., Sefton, A.M., Eydmann, M., Kearns, A.M. and Shah, H.N. (2009) High throughput identification of clinical isolates of Staphylococcus aureus using MALDI-TOF-MS of intact cells. Infection, Genetics and Evolution 9, 507–513. Remus-Emsermann, M., Schmid, M., Gekenidis, M.T., Pelludat, C., Frey, J.E., Ahrens, C.H. and Drissner, D. (2016) Complete genome sequence of Pseudomonas citronellolis P3B5, a candidate for microbial phyllo-remediation of hydrocarbon-contaminated sites. Standards in Genomic Sciences 11, 75. doi:10.1186/s40793-016-0190-6 Rogers, K. (2012) The German E. coli outbreak of 2011. In: Britannica 2012 Book of the Year. Encyclopaedia Britannica, Inc., Chicago, Illinois, pp. 196–197. Rose, G., Culak, R., Chambers, T., Gharbia, S.E. and Shah, H.N. (2017) The challenges of identifying mycobacterium to the species level using MALDI-TOF MS. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 93–121. Santos, I.C., Hildenbrand, Z.L. and Schug, K.A. (2016) Applications of MALDI-TOF MS in environmental microbiology. The Analyst 141, 2827–2837. Schleifer, K.H. and Kandler, O. (1972) Peptidoglycan types of bacterial cell walls and their taxonomic implications. Bacteriological Reviews 36, 407–477. Schmid, O., Ball, G., Lancashire, L., Culak, R. and Shah, H.N. (2005) New approaches to identification of bacterial pathogens by surface enhanced laser desorption/ionisation time of flight mass spectrometry in concert with artificial neural networks, with special reference to Neisseria gonorrhoeae. Journal of Medical Microbiology 54, 1205–11. Schubert, S. and Kostrzewa, M. 
(2017) MALDI-TOF MS in the microbiology laboratory: Current Issues in Molecular Biology. doi: 10.21775/cimb.023.017. Schumann, P. and Maier, T. (2014) MALDI-TOF Mass Spectrometry applied to classification and identification of bacteria. In: Goodfellow, M., Sutcliffe, I. and Chun, J. (eds) Methods in Microbiology, New Approaches to Prokaryotic Systematics. Vol. 41. Elsevier, Oxford, UK, Chapter 13, pp. 275–306. Selander, R.K., Caugant, D.A., Ochman, H., Musser, J.M., Gilmour, M.N. and Whittam, T.S. (1986) Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Applied and Environmental Microbiology 51, 873–884. Seubert, W. (1960). Degradation of isoprenoid compounds by microorganisms. Pseudomonas citronellolis nov. sp. Isolation and characterization of an isoprenoid-degrading bacterium. Journal of Bacteriology 79, 426–434. Shah, H.N. (2014) New Methods bring new Information: What is hidden behind a bacterial species name. European Congress of Clinical Microbiology & Infectious Diseases (ECCMID), Congress, May, 2014, Barcelona. Shah, H.N. and Collins, M.D. (1981) Bacteroides buccalis sp nov Bacteroides denticola sp nov and Bacteroides pentosaceus sp nov, new species of the genus Bacteroides from the oral cavity. Zentralblatt fur Bakteriologie Mikrobiologie und Hygiene I. Abteilung Originale C. 2, 235–241. Shah, H.N. and Gharbia, S.E. (2011) A century of systematics of the genus Bacteroides, from a single genus up to the 1980s to an explosion of assemblages and the dawn of MALDI-TOF-Mass Spectrometry. The Bulletin of BISMiS 2, 87–106. Shah, H.N. and Gharbia, S.E. (2012) Using nano-LC-MS/MS to investigate the toxicity of outbreak E. coli 0104:h4 strain. Thermo Fisher Scientific, UK. Culture 33- ISSN 0965-0989. Shah, H.N. and Gharbia, S.E. 
(2017) A paradigm shift from research to front-line microbial diagnostics in MALDI-TOF and LC-MS/MS: a laboratory’s vision and relentless resolve to help develop and implement this new technology amidst formidable obstacles. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 1–38. Shah, H.N., van Steenbergen, T.J.M., Hardie, J.M. and de Graaff, J. (1982) DNA base composition, DNA-DNA reassociation and isoelectric focusing of proteins of strains designated Bacteroides oralis. FEMS Microbiology Letters 13, 125–130. Shah, H.N., Keys, C., Gharbia, S.E., Ralphson, K., Trundle, F., Brookhouse, I. and Claydon, M. (2000) The application of MALDI-TOF Mass Spectrometry to profile the surface of intact bacterial cells. Microbial Ecology in Health and Disease 12, 241–246. Shah, H.N., Keys, C.J., Schmid, O. and Gharbia, S.E. (2002) Matrix-Assisted Laser Desorption/Ionisation Time of Flight Mass Spectrometry and Proteomics; a new era in anaerobic microbiology. Clinical Infectious Diseases 35, 58–64.


Shah, H.N., Encheva, V., Schmid, O., Nasir, P., Culak, R.A., Ines, I., Chattaway, M.A., Keys, C.J., Jacinto, R.C., Molenaar, L., Ayenza, R.S., Hallas, G., Hookey, J.V. and Rajendram, D. (2005) Surface Enhanced Laser Desorption/Ionization Time of Flight Mass Spectrometry (SELDI-TOF-MS): A potentially powerful tool for rapid characterisation of microorganisms. In: Miller, M.J. (ed.) Encyclopedia of Rapid Microbiological Methods. Vol. 3. DHI Publishing, LLC, River Grove, IL, pp. 57–96. Shah, H.N., Chilton, C., Rajakaruna, L., Gaulton, T., Hallas, G., Atanassov, H., Khoder, G., Rakowska, P.D., Cerasoli, E., Gharbia, S.E. (2010) Changing concepts in the characterisation of microbes and the influence of mass spectrometry. In: Shah H.N. and Gharbia, S.E. (eds) Mass Spectrometry for Microbial Proteomics. Wiley, Chichester, UK, pp. 3–34. Shah, H.N., Stephenson, J., Cardasis, H., Neil, J., Yip, P., Ravela, S., Ritamo, I., Damsbo, M., Grist, R., Freeke, J., Stielow, B., Gvozdyak, O., Dukik, K., de Hoog, S., Damoc, E., Valmu, L., Cherkassky, A., Fang, M., Gaulton, T., Misra, R. and Gharbia, S.E. (2015) A Global Diagnostic Approach for Microbial Identification: Accurate characterization of difficult to differentiate pathogens using Top-down proteomics. European Congress of Clinical Microbiology & Infectious Diseases (ECCMID) Congress, April 2015, Copenhagen. Shute, L.A., Berkeley, R.C.W., Norris, J.R. and Gutteridge, C.S. (1985) Pyrolysis mass spectrometry in bacterial systematics. In: Goodfellow, M. and Minnikin, D.E. (eds) Chemical Methods in Bacterial Systematics. Academic Press, London and New York, pp. 95–114. Singhal, N., Kumar, M., Kanaujia, P.K. and Virdi, J.S. (2015) MALDI-TOF mass spectrometry: an emerging technology for microbial identification and diagnosis. Frontiers in Microbiology https://doi.org/10.3389/fmicb.2015.00791 Sneath, P.H.A. (1957) The application of computers to taxonomy. Journal of General Microbiology 17, 201–26. Soufi, Y. and Soufi, B. 
(2016) Mass Spectrometry-Based Bacterial Proteomics: focus on dermatologic microbial pathogens. Frontiers in Microbiology https://doi.org/10.3389/fmicb.2016.00181 Stübiger, C.G., Wuczkowski, M., Mancera, L., Lopandic, K., Sterflinger, K. and Belgacem, O. (2016) Characterization of yeasts and filamentous fungi using MALDI Lipid Phenotyping. Journal of Microbiological Methods 130, 27–37. Syrmis, M.W., Moser, R.J., Whiley, D.M., Vaska, V., Coombs, G.W., Nissen, M.D., Sloots, T.P. and Nimmo, G.R. (2011) Comparison of a multiplexed MassARRAY system with real-time allele-specific PCR technology for genotyping of methicillin-resistant Staphylococcus aureus. Clinical Microbiology and Infection 17, 1804–1810. Tamura, H. (2017) MALDI-TOF MS based on ribosomal protein coding in S10-spc-alpha operons for proteotyping. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp 269–310. Tanaka, K., Waki, H., Ido, Y., Akita, S., Yoshida, Y., Yoshida, T. and Matsuo, T. (1988) Protein and polymer analyses up to m/z 100 000 by Laser Ionization Time-of flight Mass Spectrometry. Rapid Communications in Mass Spectrometry 2, 151–153. Timperio, A.M., Gorrasi, S., Zolla, L. and Fenice, M. (2017) Evaluation of MALDI-TOF mass spectrometry and MALDI BioTyper in comparison to 16S rDNA sequencing for the identification of bacteria isolated from Arctic sea water. PLoS One 12, e0181860. Vranckx, K., De Bruyne, K. and Pot, B. (2017) Analysis of MALDI-TOF MS Spectra using the BioNumerics Software. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 539–562. Veloo, A.C.M., Jean-Pierre, H., Justesen, U.S., Morris, T., Urban, E., Wybo, I., Shah, H.N., Friedrich, A.W., ENRIA workgroup, Nagy, E. and Kostrzewa, M. (2017) A multi-center ring trial for the identification of anaerobic bacteria using MALDI-TOF MS. Anaerobe 48, 94–97. Williams, G. 
(2019) First report of infection with Pseudomonas citronellolis: a case of urosepsis. New Microbes and New Infections, 30, 100531. doi:10.1016/j.nmni.2019.100531 Wynne, C., Edwards, N.J. and Fenselau, C. (2010) Phyloproteomic classification of unsequenced organisms by top-down identification of bacterial proteins using cap LC-MS/MS on an Orbitrap. Proteomics 10, 3631–43. Xu, Z., Olkun, A., Vranckx, K., Mkrtchyan, H.V., Shah, A.J., Pot, B., Cutler, R.C. and Shah, H.N. (2017) Subtyping of Staphylococcus spp. based upon MALDI-TOF MS data analysis. In: Shah, H.N. and Gharbia, S.E. (eds) MALDI-TOF and Tandem MS for Clinical Microbiology. Wiley, Chichester, UK, pp. 563–578. Xu, Z., Chen, J., Vougas, K., Shah, A.J., Shah, H.N., Misra, R. and Mkrtchyan, H.V. (2020) Comparative proteomic profiling of methicillin-susceptible and resistant Staphylococcus aureus. Proteomics 20(2), 1–6. Yang, Y., Lin, Y. and Qiao, L. (2018) Direct MALDI-TOF MS identification of bacterial mixtures. Analytical Chemistry 17, 10400–10408.

8

MALDI-TOF MS and its Requirements for Fungal Identification

Cledir Santos1,*, Paula Galeano2, Reginaldo Lima-Neto3, Manoel Marques Evangelista Oliveira4 and Nelson Lima5 1 Department of Chemical Science and Natural Resources, Universidad de La Frontera, Temuco, Chile; 2Facultad de Ciencias Básicas, Universidad de la Amazonia, Florencia, Caquetá, Colombia; 3Department of Tropical Medicine, Federal University of Pernambuco (UFPE), Recife, Brazil; 4Laboratory of Taxonomy, Biochemistry and Bioprospecting of Fungi, Oswaldo Cruz Institute, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil; 5CEB–Biological Engineering Centre, University of Minho, Campus de Gualtar, Braga, Portugal

Introduction

Taxonomy comprises classification and nomenclature, where classification is the arrangement of new species into taxa, and nomenclature is concerned with the assignment of valid names to each taxon, in agreement with published rules of the code of nomenclature. Species are the basic rank of fungal taxonomy and, although classification is the ultimate aim for many fungal taxonomists, much of their daily routine work is fungal identification. The aim of identification is to determine that a particular organism belongs to a recognized taxon. The present chapter is concerned with the methodology of fungal identification. Traditionally, identification of filamentous fungal species is based on morphological data, mainly those linked to the fungal sexual and/or asexual reproductive structures. This method of fungal identification has critical limitations; for example, some fungal cultures do not develop reproductive structures, or there is morphological similarity between members of different species

(Simões et al., 2013). Macro- and micro-morphological analyses are time-consuming, prone to subjectivity and may involve complex methodology. Despite this, morphology is still used as an important feature in delineating fungal species within a genus (Simões et al., 2013; Alsohaili and Bani-Hasan, 2018; Lima et al., 2019). The incorporation of biochemical features in fungal identification, including secondary metabolite production and assays based on physiological characters, has helped to overcome these problems in some cases (Rodrigues et al., 2011; Lima-Neto et al., 2014; Matos et al., 2020; Paziani et al., 2020). However, their use is somewhat limited and reproducibility is clearly affected by factors such as environmental changes and cell ageing. DNA sequence-based identification, such as with DNA barcodes, is considered the ‘gold standard’ of fungal taxonomy (see Chapters 12 and 14). It generates accurate identifications and represents a useful strategy with fast results, and high sensitivity and specificity in comparison with the conventional methods of fungal identification (e.g. classical morphology

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


and biochemical assays) (Hibbett et al., 2016; Norlia et al., 2018). Despite this, there are significant limitations to molecular identification (Paziani et al., 2020). Rapid and reliable sequencing analysis is still available only for a limited number of fungal taxa and the quality of reference DNA sequences contained in public access databases is not always as high as expected (Bidartondo et al., 2008). In addition, the application of molecular methods for routine identification is relatively expensive and requires highly specialized equipment and labour. Delays in fungal identification (which in some cases can be weeks), as well as limitations in the discrimination of closely related fungal species, can occur during DNA sequence-based identification (Paziani et al., 2020).

Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is now used as a routine technique for the fast and reliable identification of fungi at the species level and, currently, it represents an important phenotypic methodology based on proteomic profiles (Santos et al., 2010; Rodrigues et al., 2011, 2019). The application of MALDI-TOF MS in microbial identification dates from the early 1990s with the studies by Cain et al. (1994) for identification of bacteria (see Chapter 7). Although this work used a minimal sample preparation method for the purification of ribosomal proteins from cell contents, it was the starting point for the intact microbial cell methods currently used, and was an important milestone for the use of MALDI-TOF MS in modern microbial taxonomy. Subsequently Holland et al. (1996) described the first method for the rapid identification of whole bacteria. This was the first reported example of successful bacterial chemotaxonomy by MALDI-TOF MS analysis of intact cells. Holland et al. (1996) sampled bacterial cells directly from colonies on an agar plate, transferring a sample to a Parafilm® M surface. 
A matrix solution was added and the sample was gently mixed and left to air-dry. The sample was finally spotted onto the MALDI stainless steel sample plate, which was transferred into the MALDI-TOF MS apparatus for analysis. After the whole-cell bacterial analysis, each spectrum was archived and used as a reference spectrum in a preliminary in-house database that allowed species-level bacterial identification. This work established the basis that was

extended for fungal identification by MALDI-TOF MS as we know it today. The first use of MALDI-TOF MS for fungal identification was reported by Welham et al. (2000), who analysed spores of Penicillium spp., Scytalidium dimidiatum (current name Neoscytalidium dimidiatum) and Trichophyton rubrum. All the fungi evaluated gave different spectra with just a few discrete peaks being observed over the 2–13 kDa mass range. All the spectra were reproducible under the method used. Li et al. (2000) applied the same methodology in analysing spores by MALDI-TOF MS to identify strains of Aspergillus flavus, Aspergillus oryzae, Aspergillus parasiticus and Aspergillus sojae. The results were similar to those obtained by Welham et al. (2000) regarding the number and mass of ions in the spectra, and Li et al. (2000) were able to use the different mass peak profiles of the fungal spores to differentiate between aflatoxigenic and non-aflatoxigenic strains in the three species. Although fungi can be successfully identified using fungal spores, it is a time-consuming procedure. In addition, fungal spores yield only a small number of peaks in the MALDI-TOF mass spectrum. Some of these peaks can be related to compounds other than ribosomal proteins. Fungal cell walls generally consist of about 80–90% polysaccharides, including the long-chain polymer chitin. If non-ribosomal protein ion peaks were present in the MALDI-TOF mass spectrum, the reliability of fungal identification could be improved. Amiri-Eliasi and Fenselau (2001) described the first MALDI-TOF MS method based on the analysis of intact cells for the identification of the yeast species Candida albicans and Saccharomyces cerevisiae, and the dermatophyte Epidermophyton floccosum. They reported that the best technique for cell wall lysis for MALDI-TOF MS involved the use of high concentrations of formic acid solutions. 
It is now known that there are issues with the use of high concentrations of formic acid for the analysis of filamentous fungi, as it can lead to the fast degradation of biomarkers, and can interfere with spectrum acquisition by MALDI-TOF MS. In contrast, the use of formic acid levels of up to 25% (v/v) is a good option for the identification of yeasts. Oliveira et al. (2015) established a MALDI-TOF MS methodology that used 25% formic acid for the identification of the Sporothrix species complexes. They


found this a good option for the yeast form of Sporothrix species, but it affected the spectra acquisition of the filamentous form of these fungi. Schmidt and Kallow (2005) provided the first methodology for the identification of various Basidiomycete species (Serpula lacrymans, Serpula himantioides, Coniophora puteana, Coniophora marmorata, Antrodia vaillantii and Antrodia sinuosa). They used mycelia of each fungal strain grown on agar plates. The method was fast, with simple sample preparation, and the final fungal identification was performed by comparing each spectrum with reference spectra in a preliminary in-house database. Kallow et al. (2006) evaluated MALDI-TOF MS identification with 18 strains belonging to Aspergillus section Nigri. Their methodology was based on the fungal mycelium method; mycelium was collected from agar plates, washed with distilled water and freeze-dried. The dried mycelia were transferred as a thin film to the MALDI stainless steel template, mixed with matrix solution, air-dried and analysed. It is now known that freeze-drying the material is not optimal for fungi and that the age of the cells will affect the spectra obtained and any subsequent identification (Lima et al., 2019; Matos et al., 2020; Paziani et al., 2020). Since 2006 there have been a large number of studies on fungal identification by MALDI-TOF MS (for reviews see Santos et al., 2010; Lima et al., 2019). MALDI identification of different fungal groups has been described using biological material from both pure cultures (including a mixture of mycelium and spores and single yeast cells) and complex matrices, such as fungi colonizing plants and human clinical samples (Santos et al., 2015, 2016; de Almeida et al., 2016). Currently, the MALDI-TOF MS technique is a routine methodology for fungal identification in both research and applied studies. 
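The comparison of a spectrum with reference spectra in an in-house database, as used in these early studies, can be illustrated with a minimal sketch. The Python example below is not the algorithm of any published study or commercial system. It assumes that each spectrum has already been reduced to a list of peak masses (Da), that two peaks match when they lie within a fixed mass tolerance, and that similarity is scored as the fraction of shared peaks. The function names, the 2 Da tolerance, the species labels and all peak values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left


def shared_peak_fraction(query, reference, tol_da=2.0):
    """Return matched peaks divided by the union of peaks in both lists.

    query and reference are lists of peak masses in Da (assumed free of
    duplicates). Two peaks match when they differ by at most tol_da, and
    each reference peak can be matched only once.
    """
    ref = sorted(reference)
    used = [False] * len(ref)
    matched = 0
    for mass in sorted(query):
        # First reference peak that could fall inside the tolerance window.
        i = bisect_left(ref, mass - tol_da)
        while i < len(ref) and ref[i] <= mass + tol_da:
            if not used[i]:
                used[i] = True
                matched += 1
                break
            i += 1
    union = len(query) + len(ref) - matched
    return matched / union if union else 0.0


def rank_matches(query, library, tol_da=2.0):
    """Score the query against every library entry, best match first."""
    scores = [
        (name, shared_peak_fraction(query, peaks, tol_da))
        for name, peaks in library.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical in-house library: invented peak lists, not measured data.
library = {
    "Species A": [2105.0, 3410.5, 5022.0, 6830.0, 9120.0],
    "Species B": [2300.0, 4100.0, 5022.5, 7450.0, 11200.0],
}
query = [2104.2, 3411.0, 5021.0, 6831.5, 9500.0]

for name, score in rank_matches(query, library):
    print(f"{name}: {score:.2f}")
```

Commercial systems typically use more elaborate, often proprietary, scoring, so this sketch should be read only as an outline of the matching principle.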
The main differences between the methods used for yeast and filamentous fungi identification by MALDI-TOF MS have been reviewed by Bader (2017). That study was largely focused on the production of samples suitable for MALDI-TOF MS fungal species identification from both pure cultures and different clinical materials. The publication detailed the various consumables and chemicals used for fungal identification by MALDI-TOF MS, and included procedures for processing samples, including

harvesting from matrices such as blood culture bottles and urine. Methodologies for fungal growth prior to MALDI-TOF MS analysis were also discussed, as were procedures for sample preparation and mass spectra acquisition. This reference, therefore, provides a good guideline for fungal identification by MALDI-TOF MS. Crossay et al. (2017) have recently developed a MALDI-TOF MS method for the identification of arbuscular mycorrhizal fungi (AMF) based on the analysis of spores. The authors used the existing knowledge on MALDI-TOF MS identification from fungal spores and extended it to the identification of some AMF at species level. They successfully identified 19 AMF strains belonging to 14 species placed in seven genera and five families. In addition, they also demonstrated intraspecific differentiation for the majority of isolates. MALDI-TOF mass spectra were recorded in the positive linear mode with a mass range from 2000 to 20,000 Da and, although the authors did not mention the average peak numbers per spectrum, visual examination of their spectra shows that spores of AMF generated spectra richer than those obtained from spores of other fungal groups (e.g. Aspergillus, Penicillium and Trichophyton). Sanguinetti and Posteraro (2017) reviewed the use of MALDI-TOF MS for the identification of various species of Aspergillus and Fusarium, as well as fungi in the order Mucorales, and some species of dimorphic fungi and dermatophytes. They described in detail the key points needed for routine fungal identification by MALDI-TOF MS analysis, including sample preparation, reference databases and interpretive identification cut-offs. They reported that MALDI-TOF MS has contributed to improving the laboratory diagnosis of infections by filamentous fungi in terms of rapidity and accuracy of identification. 
However, they cautioned that further improvements are needed in the reference fungal databases, in order to explore the evolving diversity of fungal species recovered from clinical specimens in real time. Drissner and Freimoser (2017) reviewed the application of MALDI-TOF mass spectrometry in the research and diagnostics of yeasts and filamentous fungi in the agricultural value chain. The authors focused their review on the role of MALDI-TOF MS as a tool for fungal species identification, mainly in comparison to


MALDI-TOF MS and its Requirements for Fungal Identification

DNA-based identification methods. In addition, the value of custom-made reference spectra for fungal MALDI-TOF MS identification and the advance of the technique, mainly from the field of clinical diagnostics, were also highlighted. The review concluded with a summary of MALDI-TOF MS studies of yeasts and filamentous fungi of agricultural relevance. The authors concluded that MALDI-TOF MS is a technology that is still far from being a standard tool for the identification of fungi and their functions. Patel (2019) recently reviewed the use of MALDI-TOF MS for the identification of pure cultures of yeasts, and dimorphic and filamentous fungi grown on culture medium, particularly those obtained from clinical samples. Patel described the main technical procedures for fungal identification in detail and also discussed the multiple MALDI-TOF MS systems and databases available worldwide for fungal identification. Patel reported limitations in MALDI-TOF MS databases, and considered that the main improvements in MALDI-TOF MS databases had been for clinically relevant fungi, mainly owing to the use of in-house databases. Overall, fungal identifications based on MALDI-TOF MS have in some cases been as good as those from molecular biology analysis, and fungal identification based exclusively on MALDI-TOF MS analysis is becoming more acceptable for publications (Matos et al., 2020). However, it will only be accepted as the sole procedure required for identification when more diverse taxa are studied, and more extensive spectra databases are available. The main limitations to MALDI-TOF MS for fungal identification are related to sample quality (e.g. quality of biological material such as rigidity or pigmentation of cell walls), sample preparation (e.g. the myriad of sample preparation methodologies that deliver different data sets to different MALDI-TOF MS databases – see Santos et al., 2017) and the databases themselves (e.g. the ‘black-box’ commercial databases) (Lau et al., 2019).
In this chapter we present an overview and discussion of the use of MALDI-TOF MS for fungal identification. The major known limitations of the technique for fungal taxonomy, and how to overcome these, are also discussed.

Principles of MALDI-TOF MS and its Application in Fungal Taxonomy

MALDI-TOF is a mass spectrometry technique developed over the years for the analysis of organic molecules by determining their molecular mass. MALDI-TOF MS analysis utilizes solid-state samples and this is one of the main advantages of the technique, as it only requires the solubilization of a fraction of the molecular component of the fungal cell wall. Based on this, the fungal sample can be transferred directly from the culture sample plate to the MALDI stainless steel sample plate, and the final sample can be prepared in situ with minimal sample preparation requirements (Santos et al., 2010; Lau et al., 2013; Lima et al., 2019; Paziani et al., 2020). During sample preparation, the analyte (fungal biomass) is covered with an organic compound that is referred to as the matrix. The matrix functions as an energy mediator to the MALDI pulsed laser. The laser irradiates a sample that has been impregnated with a molar excess of the matrix (molar analyte-to-matrix ratio of 1:10,000) (Hillenkamp and Karas, 2007). The fungal biomass used can be either mycelium or spores, or a mixture of both. Although there is a significant amount of literature regarding fungal identification based on the analysis of spores, the most common and reliable methods for MALDI-TOF MS fungal identification are based on the analysis of mycelium, or of mixtures of mycelium and spores (Lima and Santos, 2017). As previously mentioned, analysis of spores for identification is time-consuming and constrained by the low number and quality of peaks in the final mass spectrum. Sonicating samples to disrupt rigid spores and fungal cell walls has been used as an option to improve the quality of the mass spectra. Sonication will be necessary for all taxonomic groups being studied, as spectra in databases may have been acquired with multiple procedures (Matos et al., 2020).
Preparing the sample in a microtube and avoiding direct transfer from the agar plates to MALDI stainless sample plates can lead to a significant increase in both quality and peak numbers in the final spectrum. Once the sample is spotted on the MALDI sample plate, the liquid phase of the matrix must be completely

C. Santos et al. 123

evaporated before the plate is transferred to the MALDI-TOF MS system. In the first stage of the process the sample plate is placed under a vacuum of c. 8 × 10⁻⁷ mbar prior to spectra acquisition. Yeast samples must first be treated with 25% aqueous formic acid and spotted onto the MALDI plate; the matrix solution can be added only after the liquid acid has evaporated. The laser emission triggers ablation and desorption of the sample and matrix material and a plume of ions is generated. The mass spectra in the 2–20 kDa range are recorded using the MALDI-TOF MS linear mode. In fungal MALDI-TOF MS spectra the visible ion peaks are usually found between 4 and 13 kDa, but it is important to use the 2–20 kDa mass range for three reasons: (1) the main commercial databases are based on this mass range; (2) differences in mass range can occur in different fungal groups; and (3) the spectra of some fungi can contain peaks that cannot be observed in a simple visible plot. These peaks are present at a low intensity (ion abundance) but are important for fungal identification. The spectrum of a strain of Fusarium guttiforme is illustrated in Fig. 8.1. The strain was grown on a potato dextrose agar plate at 20°C for 5 days. The spectrum was acquired with a 2,5-di-hydroxy-benzoic acid (DHB) matrix using an Axima LNR MALDI-TOF MS system (Kratos Analytical, Shimadzu, Manchester, UK) equipped with a nitrogen laser (337 nm), where the laser intensity was set just above the threshold for ion production. Expanding the display over the 5.5–8.5 kDa mass range identifies peaks that were not apparent in the initial full spectrum plot. All data from the full mass spectrum (from 2 to 20 kDa) must be kept and used for fungal identification.
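The effect of expanding the display can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak list is invented (it is not read from Fig. 8.1), the 10% cutoff for what counts as 'visible' in a full-range plot is arbitrary, and the helper names `peaks_in_window` and `rescale` are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z in Da, relative intensity in %)


def peaks_in_window(peaks: List[Peak], lo: float, hi: float) -> List[Peak]:
    """Return the peaks whose m/z lies within [lo, hi]."""
    return [(mz, inten) for mz, inten in peaks if lo <= mz <= hi]


def rescale(peaks: List[Peak]) -> List[Peak]:
    """Rescale intensities so that the tallest peak in the list is 100%."""
    if not peaks:
        return []
    top = max(inten for _, inten in peaks)
    return [(mz, 100.0 * inten / top) for mz, inten in peaks]


# Hypothetical peak list, invented for illustration only.
spectrum: List[Peak] = [
    (4300.0, 100.0), (5200.0, 40.0), (5800.0, 4.0), (6900.0, 2.5),
    (7400.0, 6.0), (9800.0, 25.0), (12500.0, 15.0), (17800.0, 1.5),
]

# Peaks that stand out in a full 2-20 kDa plot (assumed 10% cutoff).
visible_full = [p for p in spectrum if p[1] >= 10.0]

# Expanding the 5.5-8.5 kDa window and rescaling brings out the weak peaks.
zoomed = rescale(peaks_in_window(spectrum, 5500.0, 8500.0))

for mz, inten in zoomed:
    print(f"{mz:8.1f} Da  {inten:5.1f}%")
```

In this toy list the three peaks inside the window all fall below the cutoff in the full plot, yet they remain in `spectrum`, which mirrors the point that the full 2–20 kDa data set should be retained.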

In addition to interspecific differences, MALDI-TOF mass spectra can show differences due to the culture medium used for growth, and other parameters such as the age of colony, growth temperature, the geographical origin of strains and the type and formulation of the MALDI-TOF MS matrix (Santos et al., 2010, 2017; Lau et al., 2019). MALDI-TOF MS matrices are molecules with both an aromatic moiety (the aromatic ring in each molecule) and some electron attractant substituents (nitrile, hydroxyl and carboxyl groups in the molecules). These features are fundamental for energy transfer, where the matrix transfers the absorbed photoenergy from the irradiation source to the surrounding sample molecules. This provides a ‘soft’ ionization of the sample with minimal fragmentation of the molecules being analysed (Hillenkamp and Karas, 2007; Robinson et al., 2018). There are many different matrices available, and optimal MALDI-TOF mass spectra acquisition depends on using the most appropriate matrix for the laser type and the molecular mass range. The matrix used must be suitable for the maximum wavelength of the laser. The two most common MALDI-TOF MS matrices used for fungal identification are α-cyano-4-hydroxy-cinnamic acid (CHCA) and DHB (Table 8.1). Both are based on ultraviolet MALDI-TOF MS (UV-MALDI-TOF MS) absorbing compounds and, historically, MALDI-TOF MS databases have mainly been developed from spectra acquired with these matrices (Santos et al., 2017). The 2–20 kDa molecular mass range has become a standard setting for fungal identification and is used in the various commercially available MALDI-TOF MS databases. The molecular

Fig. 8.1. MALDI-TOF MS spectrum of Fusarium guttiforme strain E-480 (adapted from Santos et al., 2016). [Plot: vertical scale 0–100 against m/z; full spectrum from 2000 to 20000, with an expanded view whose axis runs from 5000 to 8500.]


Table 8.1. Features of the two MALDI-TOF MS matrices most commonly used for fungal identification (maximum wavelength values from Robinson et al., 2018).

Name                              Abbreviation   Maximum wavelength   MW (g·mol⁻¹)
α-cyano-4-hydroxy-cinnamic acid   CHCA           350 nm               189.1675
2,5-di-hydroxy-benzoic acid       DHB            351 nm               154.1201

[Chemical structure column: structure drawings not reproduced.]

mass range selection is related to the molecular mass of fungal ribosomal proteins. These are large and conserved molecules and their use can make fungal identifications by MALDI-TOF MS comparable to those from molecular methods for some groups of fungi (Pereira et al., 2014; Lima et al., 2019; Matos et al., 2020). Early studies on fungal identification by MALDI-TOF MS were mainly based on the use of the DHB matrix because of its ease of handling for sample preparation and protein crystallization. However, with the introduction of high-throughput fungal identification, this matrix was replaced by CHCA. This latter matrix needs less time for sample analysis and spectra acquisition, leading to a reduction of laser use and faster spectra generation and species identification (Santos et al., 2010; Rodrigues et al., 2011; Lima and Santos, 2017). Discounting time factors, both CHCA and DHB matrices are good options for MALDI-TOF mass spectra acquisition and fungal identification. Additional compounds present in the fungal cell walls, mycelium and spores (such as sugars, carbohydrates and non-ribosomal peptides) have been used as biomarkers in MALDI-TOF MS proteomic profiles for fungal identification (Santos et al., 2010; Lima et al., 2019; Matos et al., 2020). Although these molecules are the major constituents of fungal cell walls, only a small number produce peaks in the molecular mass range above 2 kDa (less than 30% of total ion peaks in the mass spectra). Nitrogen (N2) gas lasers are by far the most commonly used for MALDI-TOF MS owing to their simplicity, small size and relatively low cost. They radiate at a wavelength of 337 nm, close to

the maximum absorption of many commonly used matrices. Their main limitation is related to their limited pulse repetition frequency, which is typically less than 100 Hz, and to their life cycle of about 2 years, depending on the equipment configuration and usage. Both features limit the use of N2 gas lasers for high-throughput analyses, such as fungal identification in routine clinical laboratories; in these kinds of applications, frequency-tripled Nd:YAG solid-state lasers with a wavelength of 355 nm are a better choice (O’Connor and Hillenkamp, 2007; Santos et al., 2010). The laser pulse is also fundamental for the efficient ionization of any molecules, including those present in the fungal cell wall, mentioned above. Pulsed lasers avoid fragmentation during the ionization process before spectra acquisition. Moreover, axial TOF analysers also require a short laser pulse for good mass resolution (O’Connor and Hillenkamp, 2007; Santos et al., 2010). Different pulsed gas and solid-state lasers are available for MALDI-TOF MS, and the equipment normally used for fungal identification is based on UV-MALDI-TOF MS lasers. External calibration of the mass-to-charge ratio (m/z) scale is a crucial process before mass spectra acquisition. Calibration typically involves analysing a compound which yields ions of known m/z. The m/z scale is then adjusted to give the correct values for the calibration peaks. Frequency of calibration will depend on the instrument and the reason for acquiring the mass spectrum. Mass calibration is one of the most critical parameters when undertaking accurate mass measurements and should cover the complete range of analyte masses. For fungal identification by MALDI-TOF MS, the normal mass range (from


2 to 20 kDa) used must be covered by the calibration. Cells of Escherichia coli strain K12 or DH5-α are normally used for the in situ extraction of the 13 defined ribosomal proteins with molecular masses of 4365.4, 5096.8, 5381.4, 6241.4, 6255.4, 6316.2, 6411.6, 6856.1, 7158.8, 7274.5, 7872.1, 9742 and 12227.3 Da. These proteins are used as reference compounds for the external calibration of the MALDI-TOF MS equipment (Oliveira et al., 2015; Santos et al., 2015, 2016; Chang et al., 2016). Bruker markets a bacterial test standard (BTS), which is an E. coli extract spiked with two extra high-molecular weight proteins. According to the manufacturer, the specific composition of BTS covers the entire mass range of proteins used by MALDI-TOF MS for precise identification of microorganisms (Bruker, 2019). Other chemical compounds can be used for the external calibration. Paziani et al. (2020) evaluated MALDI-TOF proteomic profiles and phylogenetic analysis for the identification of Fusarium species complexes isolated from clinical cases in the State of São Paulo, Brazil. In their study, the authors used the commercial protein calibration standard of Bruker Daltonics. The use of commercial calibration kits may have cost implications, but the use of such BTS or other calibration kits is required for fungal identification by MALDI-TOF MS in accredited laboratories. In other circumstances E. coli K12 or DH5-α strains are good and cost-effective options.
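The adjustment of the m/z scale against calibrant peaks can be illustrated with a short sketch. It assumes a simple linear correction fitted by least squares, which is a simplification chosen for brevity (instrument software typically calibrates on flight time with its own model), and the 'observed' values are simulated with an invented distortion rather than measured. The function names `fit_linear` and `calibrate` are hypothetical.

```python
from typing import List, Tuple

# Masses (Da) of the 13 E. coli ribosomal proteins listed in the text.
REFERENCE_MASSES: List[float] = [
    4365.4, 5096.8, 5381.4, 6241.4, 6255.4, 6316.2, 6411.6,
    6856.1, 7158.8, 7274.5, 7872.1, 9742.0, 12227.3,
]


def fit_linear(observed: List[float], reference: List[float]) -> Tuple[float, float]:
    """Least-squares fit of reference = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(mz: float, slope: float, intercept: float) -> float:
    """Apply the fitted correction to a measured m/z value."""
    return slope * mz + intercept


# Simulated uncalibrated readings: an invented scale error and offset.
observed = [(m - 3.0) / 1.0005 for m in REFERENCE_MASSES]

slope, intercept = fit_linear(observed, REFERENCE_MASSES)
corrected = [calibrate(x, slope, intercept) for x in observed]
max_error = max(abs(c - m) for c, m in zip(corrected, REFERENCE_MASSES))

print(f"slope={slope:.6f} intercept={intercept:.3f} max residual={max_error:.2e} Da")
```

The 13 reference masses span 4365.4 to 12227.3 Da, so in this sketch any correction applied outside that interval is an extrapolation; this is consistent with the manufacturer's rationale, cited above, for spiking the BTS with two additional high-molecular-weight proteins.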

Examples of the Use of MALDI-TOF MS Technique in Fungal Identification

The identification of fungi at the species level is an important goal in taxonomic mycology. Historically, information about the fungus – such as ecological role, morphological description, physiological and biochemical properties (and sometimes societal benefits/detriments) – has been a key element in this process. Such fungal identification can be a long process and can be complicated by frequent revisions of the taxonomic schemes (Simões et al., 2013). The species concept is clearly abstract and its delimitation can be difficult and not always universally accepted. It is gradually becoming clearer that microbial identification and authentication require a multiple step approach to generate accurate

and useful data (Keys et al., 2004; Simões et al., 2013). The development and application of MALDI-TOF MS for fungal identification has provided some significant advances in fungal taxonomy. In some instances, the inclusion of MALDI-TOF MS techniques in polyphasic approaches to fungal taxonomy has significantly improved the reliability of some fungal identifications (Rodrigues et al., 2011; Soares et al., 2012). Rodrigues et al. (2011) characterized and identified 352 isolates of Aspergillus section Flavi collected from Portuguese almonds by applying a set of phenotypic analyses, including spectral analysis by MALDI-TOF MS and phylogenetics based on multilocus sequence analysis (MLSA). Subsequently, two new species, Aspergillus sergii and Aspergillus transmontanensis, were described from these strains and the previous MALDI-TOF MS analysis was important in delineating these (Soares et al., 2012). Chalupová et al. (2014) reviewed fungal identification by the MALDI-TOF MS technique and focused on the basic knowledge needed to apply MALDI-TOF MS in fungal identification. They discussed species identification of fungi associated with different branches of science. These included species of Aspergillus, Fusarium, Penicillium and Trichoderma, and other fungal species of agricultural, clinical and biotechnological relevance including wood-decaying fungi and phytopathogens; and Candida, Cryptococcus, Malbranchea, Pichia, Saccharomyces and Trichosporon. This is a significant study, as the authors identified the need to create new databases for specialized research purposes. They also identified the big gap in linking MALDI-TOF MS peptide/protein profiles with proteomic identification of individual biomarker molecules. Pereira et al. (2014) investigated MALDI-TOF MS as an alternative tool for identification and typing in Trichophyton rubrum. A total of 24 clinical strains of T. rubrum isolated from clinical samples in northern Portugal, and two reference strains obtained from the American Type Culture Collection were assessed (Dias et al., 2011). Species identification by MALDI-TOF MS was compared with the results of sequencing the internal transcribed spacers (ITS1 and ITS2) plus the 5.8S rDNA region. Intraspecific variability of the T. rubrum strains was assessed by PCR fingerprinting analysis with the primers M13, (GACA)4 and (AC)10. Pereira et al. (2014) found that the spectral dendrogram


based on the MALDI-TOF MS data (Fig. 8.2A) showed intraspecific variability within the T. rubrum cluster, where the clinical strains MUM 09.12, 08.11, 10.133 and the reference strain ATCC MYA-4438 were more distant from the other strains. In the ITS sequence data (Fig. 8.2B), 16 of the 18 strains were clustered in a single group with identical sequences, and isolates MUM 08.11 and 10.133 were 99% similar to them. Comparison of both dendrograms demonstrates that the MALDI-TOF MS technique was more discriminative than the ITS region at strain level. PCR fingerprinting with the three primers showed a lack of genetic variability and DNA polymorphisms among the T. rubrum strains (see Pereira et al., 2014). These results match the clonal mode of reproduction of T. rubrum (Gräser et al., 1999, 2008; Gupta et al., 2001; Seyfarth et al., 2007) where the exclusively asexual reproduction can explain the genetic uniformity within this species (Fig. 8.2B). In contrast, the mass spectral data show that there is intraspecific variability between the different proteomes of the T. rubrum strains (Fig. 8.2A). This comparison demonstrates that strains of the species exhibit uniformity in their genetic make-up, which may be an expression of a general survival strategy for those fungi that propagate in a very specialized ecological niche such as human skin (Gräser et al., 1999). However, their proteomes have been able to vary (possibly through mechanisms such as post-translational modification). The MALDI-TOF MS technique has been used for the identification of different fungal taxa (some of which were mentioned earlier). Its use in identifying fungi in complex situations has also been evaluated with some success. Santos et al. (2015) evaluated the potential of MALDI-TOF MS for the in situ identification of both F. guttiforme on pineapple side shoots and its antagonist, Trichoderma asperellum.
They established a multistep identification methodology that was sensitive and accurate for the detection of both fungi. Figure 8.3A shows F. guttiforme growing on pineapple side shoots and Fig. 8.3C shows its antagonist, T. asperellum. The F. guttiforme strain was successfully identified when grown solely on pineapple side shoots (F. guttiforme, Fig. 8.3A–B) and T. asperellum was identified after isolation (T. asperellum, Fig. 8.3C–D). In this study, the in situ fungal growth conditions seemed to be less relevant to the correct identification, as there was a clear difference in the intraspecific mass spectra peaks displayed by F. guttiforme and T. asperellum. de Almeida et al. (2016) described a promising new strategy for the direct analysis of blood cultures for fungi and arthroconidial yeasts by MALDI-TOF MS. Samples were Gram stained and hyphae and/or arthroconidia were selected and treated through an in-house protein extraction protocol. MALDI-TOF MS spectra were obtained and the fungal species Fusarium solani, Fusarium verticillioides, Exophiala dermatitidis, Saprochaete clavata and Trichosporon asahii were correctly identified. Li et al. (2017) evaluated the accuracy of the Bruker Biotyper MALDI-TOF MS system for the identification of clinical strains of Aspergillus species grown on agar media. A set of 381 Aspergillus isolates distributed among 21 species previously identified by molecular analysis were assessed. They found that the Biotyper system was able to identify 30.2% (115/381) of the isolates at species level and 49.3% at genus level. They considered that the limited number of spectra in the database was the main limiting factor to using this platform as a routine identification method. The available MALDI-TOF mass spectra databases have mainly been built with a focus on human pathogens, and Becker et al. (2019) assessed a large in-house database of reference spectra and a dedicated web application for their suitability for use in veterinary laboratories. They used both MALDI-TOF MS and conventional techniques to identify a set of 290 filamentous fungal and yeast strains of some 69 different species isolated from animals such as pets, cattle and zoo animals. MALDI-TOF MS results correctly identified 89% of the isolates at the species level. In comparison, only 60% of the isolates were correctly identified with conventional approaches.
They reported that their online database, which was developed for clinical practice in humans, could also be used for the identification of veterinary strains by MALDI-TOF MS.
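The identification rates quoted above can be checked, and the percentages turned back into approximate isolate counts, with simple arithmetic. The sketch below assumes only the figures stated in the text; the counts derived from percentages are rounded estimates, not values reported by the cited studies, and the helper names `percent` and `approx_count` are hypothetical.

```python
def percent(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


def approx_count(pct: float, total: int) -> int:
    """Approximate number of isolates implied by a reported percentage."""
    return round(pct * total / 100.0)


# Li et al. (2017): 115 of 381 Aspergillus isolates identified at species level.
li_species_pct = round(percent(115, 381), 1)

# The genus-level count is not given in the text; estimated from 49.3%.
li_genus_count = approx_count(49.3, 381)

# Becker et al. (2019): 290 strains, 89% (MALDI-TOF MS) vs 60% (conventional).
becker_maldi_count = approx_count(89, 290)
becker_conventional_count = approx_count(60, 290)

print(li_species_pct, li_genus_count, becker_maldi_count, becker_conventional_count)
```

On these figures the reported 30.2% is consistent with 115/381, and the 29-point gap in the veterinary set corresponds to roughly 84 of the 290 strains.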

Limitations to the Use of MALDI-TOF MS Technique in Fungal Identification

Although there have been several successful applications of MALDI-TOF MS for fungal identification in both pure culture and complex situations, some important limitations have been

Fig. 8.2. Dendrograms obtained from (A) MALDI-TOF MS mass spectra of T. rubrum strains; and (B) ITS sequence data of T. rubrum strains. (Data from the Mycology Applied Group, University of Minho, Braga, Portugal.) [Labels in the panels: Clade I, Clade II, T. rubrum, Outgroup, T. mentagrophytes; strain labels MUM 08.05 to MUM 10.133, ATCC MYA-4438 and ATCC MYA-4439; similarity scales in %.]


Fig. 8.3. Early identification by MALDI-TOF MS of the pineapple pathogen Fusarium guttiforme (A and B) and its antagonist Trichoderma asperellum on decayed pineapple (C and D). (Data from the Mycology Applied Group, University of Minho, Braga, Portugal.) [Panels: (A) pineapple infected with F. guttiforme; (B) direct analysis by MALDI-TOF MS, Fusarium guttiforme; (C) antagonism: Trichoderma asperellum on F. guttiforme; (D) fungal isolation, then fungal identification by macro- and micro-morphologies and MALDI-TOF MS, Trichoderma asperellum.]

reported. One of these is the presence of pigments in the fungal colony, and darkly pigmented fungi give a poor mass spectral fingerprint that contains few peaks of low relative abundance. Many fungi may produce pigmented colonies (Souza et al., 2016), and fungi such as Colletotrichum spp. can present highly melanised colonies that pose problems for MALDI-TOF mass spectra acquisition. Figure 8.4 shows the MALDI-TOF mass spectra obtained from four Colletotrichum spp. strains with increasing levels of pigmentation. Pigmentation increased from trace A to trace D, and the spectra show a corresponding decrease in ion abundance and peak number. Analysis of a fungal sample containing melanin and/or fungal pigments may interfere with the MALDI-TOF MS in a number of fundamental ways, mainly through direct competition by melanin and/or pigments with the MALDI-TOF MS matrix for irradiating photons. Moreover, during the ionization, the plume of ions above the sample can only occur where sufficient populations of electronically excited matrix molecules exist. If a photoexcitation/pooling mechanism for MALDI-TOF MS is assumed,

mobile excitations in the bulk sample deposit can be quenched by melanin and/or fungal pigments (Buskirk et al., 2011). Although the ribosomal protein fingerprint can be affected by melanised and/or pigmented fungal colonies, lipids (e.g. membrane phospholipids) may provide an alternative group of molecules to overcome the impact of pigmented molecules on fungal identification by MALDI-TOF MS (dos Santos et al., 2017). During the extraction of lipids from the fungal cell wall, some polar pigmented molecules remain in the aqueous phase while the lipids migrate with the organic non-polar phase. Fungal lipids are involved in essential functions for fungal cell survival and adaptation to specific environments and habitats (Stübiger et al., 2016). Fungal identification based on MALDI-TOF MS fingerprinting of lipids has been assessed for a number of fungal taxa (dos Santos et al., 2017), and Galeano García (2019) evaluated the technique for differentiating strains within some Phytophthora species. Phytophthora species can cause severe diseases in many crops including cocoa, citrus, tomato


Fig. 8.4. MALDI-TOF MS spectra of darkly pigmented strains of Colletotrichum spp. Colony pigmentation increased qualitatively from colony A to D. (Data from the Chemistry of Fungi Group, Department of Chemical Science and Natural Resources, Universidad de La Frontera, Temuco, Chile.) [Four stacked traces (A) to (D); axes: intensity (a.u.) against m/z, with ticks from 2000 to 18000.]

and potato, and individual species are difficult to recognize by morphological characteristics. The early identification of this pathogen is an important task for fast decision making in crop protection. Galeano García (2019) was able to successfully differentiate strains belonging to the species Phytophthora palmivora, Phytophthora heveae, Phytophthora andina, Phytophthora betacei, Phytophthora infestans, Phytophthora citrophthora and Phytophthora capsici. Strains of Phytophthora were obtained from a single host (Solanum betaceum) cultivated at different sites. Figure 8.5 shows the MALDI-TOF MS differentiation of the isolates at the species and strain levels, based on

lipid fingerprinting. Distinct profiles were obtained for each species (Fig. 8.5A), and individual strains (Fig. 8.5B) could be differentiated (Galeano García, 2019). Although these results demonstrate the utility of MALDI-TOF MS lipid fingerprinting for fungal identification, it is important to consider that lipids are not as conserved as the ribosomal proteins. In addition to pigmentation, the rigidity of fungal cell walls can also be an important limitation in fungal identification by MALDI-TOF MS. Both these limitations have led to a variety of different sample preparation and treatment methodologies being used in the generation of


MALDI-TOF MS databases. The database is the main limitation of MALDI-TOF MS in fungal identification, and this issue has been discussed in detail by Santos et al. (2017).
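Why identification depends so heavily on the reference database can be illustrated with a toy peak-matching routine. This is not the algorithm of any commercial system (those are described above as 'black-box'); it is a generic sketch that assumes a Jaccard-style score, an arbitrary 800 ppm mass tolerance and invented peak lists. The names `count_matches`, `similarity` and `best_match` are hypothetical.

```python
from typing import Dict, List, Tuple


def count_matches(query: List[float], reference: List[float],
                  tol_ppm: float = 800.0) -> int:
    """Count query peaks that pair with an unused reference peak within tol_ppm."""
    unused = sorted(reference)
    matches = 0
    for mz in sorted(query):
        tol = mz * tol_ppm / 1e6
        for i, ref in enumerate(unused):
            if abs(ref - mz) <= tol:
                matches += 1
                del unused[i]  # each reference peak is used at most once
                break
    return matches


def similarity(query: List[float], reference: List[float],
               tol_ppm: float = 800.0) -> float:
    """Jaccard-style score: shared peaks / all distinct peaks."""
    if not query and not reference:
        return 0.0
    shared = count_matches(query, reference, tol_ppm)
    return shared / (len(query) + len(reference) - shared)


def best_match(query: List[float], database: Dict[str, List[float]],
               tol_ppm: float = 800.0) -> Tuple[str, float]:
    """Return the database entry with the highest similarity to the query."""
    scored = ((name, similarity(query, peaks, tol_ppm))
              for name, peaks in database.items())
    return max(scored, key=lambda item: item[1])


# Invented reference peak lists (m/z in Da); not taken from any real database.
database = {
    "reference A": [4300.0, 5200.0, 6900.0, 9800.0],
    "reference B": [4450.0, 6100.0, 7400.0, 11200.0],
}
query = [4301.0, 5199.0, 6902.0, 12500.0]

print(best_match(query, database))
```

If the species that produced the query spectrum is absent from `database`, the routine still returns its nearest entry, only with a lower score; this is one way to picture why sparse or inconsistently prepared reference spectra limit species-level identification.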

MALDI-TOF MS for Cryptic and Dimorphic Fungal Identification

There has been considerable debate on the true number of fungal species present in nature. A recent study by Hawksworth and Lücking (2017) considered the estimate of 1.5 million fungal species as conservative and suggested that the actual range could be better estimated at 2.2–3.8 million. There are some 120,000 currently accepted fungal species and these include cryptic species. Cryptic species are biologically distinct species that are morphologically indistinguishable from others. These may include some of the predicted ‘missing fungi’, as physiological and other biological divergences often precede morphological divergence in the evolution of fungal species. On this basis Hawksworth and Lücking (2017) suggested that the number of known fungi might rise by a factor of five or more. Molecular studies have shown that cryptic speciation is widespread through diverse groups of fungi and that they play an increasingly important role in species discovery (Hawksworth

and Lücking, 2017). Overall, it is possible to identify most filamentous fungi at the genus level using phenotypic methods but, owing to phenotypic variability between and within species, accurate species identification often requires molecular methods. For example, identification of dimorphic fungi or cryptic species can be complicated by the presence of multiple morphotypes, lack of sporulation and their presence in culture as either anamorphs or teleomorphs, or both (Del Chierico et al., 2012; Ranque et al., 2014; de Almeida et al., 2015; Oliveira et al., 2015; Valero et al., 2018; Ceballos-Garzón et al., 2019). As discussed above, the accurate identification of fungal species by MALDI-TOF MS has some important drawbacks. This is particularly true for cryptic and dimorphic species (Quéro et al., 2019, 2020; Matos et al., 2020; Paziani et al., 2020), and various MALDI-TOF MS studies on these groups have been published with varying success rates (see Table 8.2). In fungal taxonomy MALDI-TOF MS has largely been applied to yeast identification, particularly for clinically important groups (Santos et al., 2011; Lacroix et al., 2014; Lima-Neto et al., 2014; de Almeida et al., 2015; Oliveira et al., 2015; Panda et al., 2015). Compared to yeasts, MALDI-TOF MS identification rates of filamentous fungi remain low (Cassagne, 2016; Angeletti et al., 2017). A study by Dupont et al. (2019) reported that almost 49% of fungal identifications by MALDI-TOF MS with commercial databases

Fig. 8.5. MALDI-TOF MS lipid fingerprinting of different Phytophthora (A) species; and (B) strains (adapted from Galeano García, 2019).


Table 8.2. Some examples of MALDI-TOF MS studies for the identification of cryptic and dimorphic species of filamentous fungi.

Columns: cryptic species and dimorphic fungi; proteomic characterization via an extraction protocol; proteomic characterization by direct analysis; references.

Species                        Extraction   Direct   References
Sporothrix spp.                +            −        Oliveira et al. (2015); Peng et al. (2019); Matos et al. (2020)
Paracoccidioides sp.           +            −        de Almeida et al. (2015)
Histoplasma capsulatum var.    +            −        Valero et al. (2018)
Fusarium spp.                  +            −        Dong et al. (2009); Kemptner et al. (2009); Quéro et al. (2019); Paziani et al. (2020)
Aspergillus spp.               −            +        Li et al. (2000); Alanio et al. (2011); Rodrigues et al. (2011, 2019); Caira et al. (2012)
Aspergillus spp.               +            −        Hettick et al. (2008); Pan et al. (2011); Tam et al. (2014); Masih et al. (2016); Park et al. (2017); Vidal-Acuña et al. (2018); Hedayati et al. (2019); Imbert et al. (2019); Quéro et al. (2019); Heireman et al. (2020); Quéro et al. (2020)
Penicillium spp.               −            +        Passarini et al. (2013)
Penicillium spp.               +            −        Welham et al. (2000); Chen and Chen (2005); Quéro et al. (2019); Peng et al. (2019); Heireman et al. (2020); Quéro et al. (2020)
Cryptococcus sp.               +            −        Firacative et al. (2012); Danesi et al. (2014); Siqueira et al. (2019)
Trichophyton spp.              +            −        Peng et al. (2019); Heireman et al. (2020)

+, fungal sample analysis performed with previous protein extraction; −, direct fungal sample analysis performed without previous protein extraction.

were unsatisfactory. They found that with the commercial databases, most identifications by MALDI-TOF MS were correct at the genus or complex level but not at the species level. This is particularly true for complexes of cryptic species and has previously been reported in a number of studies (e.g. Alanio et al., 2011; Sanguinetti and Posteraro, 2014; Tam et al., 2014; Gautier et al., 2016; Masih et al., 2016; Park et al., 2017; Vidal-Acuña et al., 2018; Hedayati et al., 2019; Imbert et al., 2019). There is a considerable need to expand MALDI-TOF MS databases, particularly for species complexes. This could be achieved by expanding the number of reference spectra for fungal species belonging to a species complex through the addition of in-house reference spectra (Gautier et al., 2014; Schulthess et al., 2014; Normand et al., 2017). A new culture medium developed to overcome difficulties and to improve the identification of filamentous fungi by MALDI-TOF MS (ID-Fungi-Plates) was evaluated by Robert et al. (2020). They found that the growth rates and morphology of fungal colonies grown on the ID-Fungi-Plates culture medium were similar to those achieved when the same fungal species were grown on other common culture media. However, the use of the ID-Fungi-Plates allowed for a quicker identification by MALDI-TOF MS. This new culture medium may improve the MALDI-TOF MS identification of cryptic and dimorphic fungal species.


MALDI-TOF MS and its Requirements for Fungal Identification

MALDI-TOF MS Databases and Data Analysis in Fungal Identification

Current status of the commercial databases dedicated to fungal identification

There are three major commercial MALDI-TOF MS instruments available for fungal identification: Flex Line, by Bruker Daltonics (Bremen, Germany); Axima Series, by Shimadzu (Manchester, UK), which is commercialized as Vitek MS (bioMérieux, Marcy l’Etoile, France); and ASTA (Suwon, South Korea) (Clark et al., 2013; Lee et al., 2017; Santos et al., 2017). Each commercial instrument has its own commercial database: Biotyper (Bruker Daltonics); SARAMIS, the Spectral ARchive and Microbial Identification System (bioMérieux); and MicroIDSys (MicroIDSys, 2020), respectively (Table 8.3). In addition, a fourth supplier, Andromas SAS (Paris, France), provides a database and software for fungal identification based on MALDI-TOF mass spectra. Andromas, MALDI Biotyper and VITEK MS have been accredited for microbiological identification purposes under EU Directive 98/79/EC in different European countries. However, at the current time, Andromas is not approved by the US Food and Drug Administration or by any other regulatory agency in the Americas (North, Central and South America) (Sampedro et al., 2018). These different systems are not compatible with each other (Patel, 2013).

In-house MALDI-TOF MS databases for fungal identification

Fungal identification by MALDI-TOF MS is critically dependent on the quality and accuracy of the databases available. The need to improve existing commercial databases has been highlighted in various studies, particularly those in clinical mycology (e.g. Flórez-Muñoz et al., 2019; Matos et al., 2020; Paziani et al., 2020). Paziani et al. (2020) evaluated the performance of the Biotyper database for identifying clinical strains of Fusarium isolated in the State of São Paulo (Brazil). They found around 90% concordance between MALDI-TOF MS and molecular identification methods. They highlighted the potential of MALDI-TOF MS as a fast and cost-efficient alternative for the identification of filamentous fungi, but concluded that this technique would require a larger and more accurate database. Many researchers around the world are adding spectral data to commercially available databases, while previous studies have shown that establishing in-house databases is a valuable alternative for improving fungal identification. Borman et al. (2019) used MALDI-TOF MS to distinguish the dimorphic fungal pathogen Talaromyces marneffei, using an invertebrate model with Galleria mellonella. Conversion of conidia to the yeast form occurred at 24 h post-inoculation, and identification was possible after supplementation of the MALDI Biotyper database with in-house mass spectral profiles created from both the yeast-like and the mycelial phase. Matos et al. (2020) successfully identified a clinical strain of Sporothrix brasiliensis by MALDI-TOF MS with an in-house database. They reported that an in-house database was a good way to overcome limitations associated with the use of MALDI-TOF MS in fungal identification. Sogawa et al. (2012) added spectra of 229 microbial isolates to the commercial Biotyper database and found this increased the identification rate from c. 87% to

Table 8.3. Main features of the major commercial MALDI-TOF MS systems available.

Manufacturer | Instrument | Database | Library (species) | ID criteria | ID reliability | Reference
Bruker Daltonics | Flex Line | Biotyper | 2748 | Score confidence | 2.00–2.299 | Clark et al. (2013); Bruker (2018)
bioMérieux | Vitek MS | SARAMIS | 1316 | Percentage | 60%–99.9% | Clark et al. (2013); Santos et al. (2017)
ASTA | Tinkerbell | MicroIDSys | 2604 | Score confidence | ≥ 140 | Ha et al. (2016); Lee et al. (2017); BioMerieux (2020); MicroIDSys (2020)

C. Santos et al. 133

98% for a sub-set of 498 microbial isolates. Honnavar et al. (2018) were able to identify only 2 out of 15 species of Malassezia with the Biotyper database. They constructed an in-house database from 88 Malassezia isolates previously identified by DNA sequencing. When tested against 190 isolates previously identified by PCR-RFLP, this database identified 94.7% and 95.3% of isolates to species and genus level, respectively. They reported that Malassezia identification by MALDI-TOF MS was a reliable approach. Matos et al. (2020) and Paziani et al. (2020) have shown that results obtained by MALDI-TOF MS can be as good as those obtained by molecular methods when a reliable database is available for fungal identification. Paul et al. (2019) identified 117 melanised fungi by establishing an in-house database of spectra obtained from a modified protein extraction protocol. They found that MALDI-TOF MS-based identification of melanised fungi was faster and fully reliable with this database. Although this was a good strategy to solve the specific problem in this case, the problems of using diverse sample preparation protocols have been widely discussed (see Santos et al., 2017).

MALDI-TOF MS users can create and/or enrich their own libraries of mass spectra by including regional, specific and relevant fungal strains. It is worth noting that adding spectra to an in-house database was more practical in the former Biotyper software than in the newer version (Posteraro et al., 2013). The most recent version of the Biotyper system has become a ‘black-box’ database in which, for example, it is not easy to compare two spectra acquired at the same time, even within the Biotyper system.

Many studies have compared the effectiveness of the different commercial MALDI-TOF MS systems, particularly Biotyper and VITEK MS, which are the two most commonly used worldwide. Overall, spectra obtained in the different systems are not interchangeable, and therefore cannot be assessed against other databases. This is one of the main limitations in establishing a common MALDI-TOF MS-based fungal identification system. It is a particularly limiting factor when laboratories need to share useful metadata to detect the emergence of new fungal species, and thus more rapidly detect the epidemic spread of a given fungal strain (Normand et al., 2017).
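The effect of supplementing a commercial library with in-house reference spectra can be illustrated with a minimal sketch. It assumes a deliberately simplified similarity measure (the fraction of reference peaks found in the query within a mass tolerance); this is not the scoring algorithm of Biotyper, VITEK MS or any other commercial system. The species labels, the m/z values and the 500 ppm tolerance are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSpectrum:
    species: str   # hypothetical label, not a real library entry
    source: str    # "commercial" or "in-house"
    peaks: tuple   # peak positions as m/z values


def shared_peak_fraction(query, reference, tol_ppm=500.0):
    """Fraction of reference peaks that have a query peak within tol_ppm."""
    if not reference:
        return 0.0
    matched = 0
    for ref_mz in reference:
        tolerance = ref_mz * tol_ppm / 1e6
        if any(abs(q_mz - ref_mz) <= tolerance for q_mz in query):
            matched += 1
    return matched / len(reference)


def best_match(query, library, tol_ppm=500.0):
    """Return (species, source, score) of the best reference, or None if the library is empty."""
    best = None
    for ref in library:
        score = shared_peak_fraction(query, ref.peaks, tol_ppm)
        if best is None or score > best[2]:
            best = (ref.species, ref.source, score)
    return best


# Hypothetical libraries: the regional isolate is only covered in-house.
commercial = [ReferenceSpectrum("species_A", "commercial", (3000.0, 4500.0, 6200.0, 7800.0))]
in_house = [ReferenceSpectrum("species_B", "in-house", (3010.0, 4520.0, 6150.0, 7900.0))]
query = (3010.5, 4519.0, 6151.0, 7899.0)

print(best_match(query, commercial))             # poor match against the commercial library alone
print(best_match(query, commercial + in_house))  # match improves once in-house spectra are added
```

In this toy case the query shares no peaks with the only commercial reference but matches every peak of the in-house reference, which mirrors the pattern reported in the studies cited above: identification depends on whether a suitable reference spectrum is present in the library.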

Chao et al. (2014) compared the Bruker Biotyper and VITEK MS systems with rDNA gene sequencing analysis, evaluating 200 clinical isolates of commonly encountered yeasts. The Bruker Biotyper system correctly identified 92.5% of isolates with high scores, compared with 87.5% for the VITEK MS system. Cassagne et al. (2016) reviewed the performance of MALDI-TOF MS platforms in fungal identification. They described the preprocessing and identification steps among platforms and compared the identification efficacy for yeast, filamentous fungi and dermatophyte species. They found that for yeasts the percentage of correct identifications varied from 82.7% to 100% with the SARAMIS system and from 98.2% to 98.8% with the Andromas system. With filamentous fungi they obtained 81.8% to 100% correct identifications when using the VITEK MS. The results for dermatophytes varied between 59.6% and 100% and from 99.3% to 100% with the Biotyper and SARAMIS systems, respectively.

Comparing commercially available systems is not an easy task, mainly because of differences in the composition and architecture of reference databases, sample pre-treatment and biological composition of the fungal samples used. Nevertheless, trends can be highlighted when assessing each MALDI-TOF MS platform. Currently, commercial MALDI-TOF MS systems undoubtedly outperform some conventional techniques for yeast identification, although slight improvements of the reference databases are needed, especially for Cryptococcus, Trichosporon and Malassezia species. With regard to filamentous fungi, the commercially available databases remain insufficient for routine analyses. In contrast, high identification rates have been reported by different research groups, but only when using in-house databases. The construction of enhanced databases is time-consuming and possible only in centres equipped with both MALDI-TOF technology and considerable mycological expertise.

Therefore, there is a need for the development of publicly available reference spectra databases that could be queried online, similar to the NCBI GenBank nucleotide database. Establishing a freely available online database of fungal spectra covering all fungal species analysed by MALDI-TOF MS would provide a breakthrough for the use of the system in mycology.


The inclusion of those fungal species involved in human diseases, food safety and/or biotechnological processes could provide a major advance for the global economy and well-being. This issue has been addressed by Normand et al. (2017), who developed the online software known as the Mass Spectrometry Identification (MSI) Platform. This includes a database of 11,851 fungal MALDI-TOF MS spectra comprising 938 pathogenic and non-pathogenic fungal species of clinical relevance. This application has been freely available to medical mycology specialists since July 2016 at https://biological-mass-spectrometry-identification.com/msi/ (accessed 1 August 2020). Although the MSI Platform is a good open access option, this system is built on spectra acquired by different researchers and clinical service laboratories. All spectra are obtained according to the standardized MSI Platform methodology and authorship of the spectra is transferred to the MSI Platform. The online application has only been tested with spectra obtained by the Flex Line equipment from Bruker. Overall, an adapted MALDI-TOF MS database able to process spectra obtained from different MALDI-TOF MS equipment, such as the Flex Line, Axima Series or ASTA, is required to serve mycologists globally in the future. Importantly, there needs to be some guarantee that spectra stored in such a database will remain freely available in the public domain, and will not be used in the future for commercial purposes. The MSI online library was compared to the Bruker filamentous fungi library and the National Institutes of Health (NIH) library by Stein et al. (2018). They evaluated these three MALDI-TOF mass spectrometry libraries for the identification of filamentous fungi in clinical microbiology laboratories in Canada, and demonstrated greater accuracy in genus-level identification (≥ 94.9%) for all libraries than for conventional methods (86.4%).
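Accuracy and concordance figures of the kind quoted throughout this section are, in essence, the percentage of isolates for which the MALDI-TOF MS result agrees with a reference identification. The following minimal sketch shows one way such a figure could be computed at species and genus level. It assumes that a reference identification is available for every isolate, that names are binomials whose first word is the genus, and that an isolate with no MALDI-TOF MS result counts as a disagreement. The function names and the four example isolates are hypothetical and are not data from any of the cited studies.

```python
def genus_of(name):
    """Return the genus, taken as the first word of a binomial name."""
    return name.split()[0]


def concordance(maldi_ids, reference_ids, level="species"):
    """Percentage of isolates whose MALDI-TOF MS result agrees with the reference.

    maldi_ids may contain None for isolates that gave no identification;
    these are counted as disagreements (an assumption of this sketch).
    """
    if level not in ("species", "genus"):
        raise ValueError("level must be 'species' or 'genus'")
    if len(maldi_ids) != len(reference_ids):
        raise ValueError("both lists must describe the same isolates")
    if not reference_ids:
        raise ValueError("at least one isolate is required")

    agreements = 0
    for maldi, reference in zip(maldi_ids, reference_ids):
        if maldi is None:
            continue
        if level == "genus":
            if genus_of(maldi) == genus_of(reference):
                agreements += 1
        elif maldi == reference:
            agreements += 1
    return 100.0 * agreements / len(reference_ids)


# Hypothetical example: four isolates with a reference identification.
reference = ["Aspergillus fumigatus", "Aspergillus flavus",
             "Fusarium solani", "Penicillium citrinum"]
maldi = ["Aspergillus fumigatus", "Aspergillus oryzae",
         "Fusarium solani", None]

print(concordance(maldi, reference, level="species"))
print(concordance(maldi, reference, level="genus"))
```

The second isolate illustrates the pattern described for cryptic species: the result is correct at genus level but not at species level, so the genus-level figure is higher than the species-level one.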

Conclusion

It is some 20 years since MALDI-TOF MS applications were introduced for fungal identification. During these two decades, several advances have been made, and different fungal taxa grown in pure culture or in complex situations can be identified in a few seconds. However, there are still important limitations to the methodology that need to be overcome, such as pigmentation in fungal cell walls and cell wall rigidity. The latter can be overcome through reduced growth times and mechanical cell disruption (sonication), while cell wall pigmentation can be addressed by using some of the newly designed culture media or by removing pigmented compounds with selective organic solvents. The major current bottleneck in the development of MALDI-TOF MS for fungal identification is the lack of comprehensive reference strain databases (see Chapter 18). The currently available closed databases are not aligned with the increasing scientific demand for open data and the findable, accessible, interoperable and reusable (FAIR) principles, and so will not contribute to the acceleration of research and innovation. A truly public and open access MALDI-TOF MS database is therefore urgently required. Moreover, there should be a guarantee that spectra deposited in such a MALDI-TOF MS database will remain public, preferably with free open access. To avoid misidentification, these stored spectra must be curated and based on well-established standard operating procedures. The number of spectra available within species needs to be increased to accommodate diversity and geographic differences, unique strain traits and the varied culture conditions and procedures in order to establish a single public and open access MALDI-TOF MS database. This could then be used with metadata analysis and artificial intelligence algorithms to provide reliable fungal identification.
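As an illustration of what curation of deposited spectra might involve, the sketch below models a reference spectrum together with the metadata that the text identifies as relevant (strain, culture conditions, procedure, origin) and checks that nothing essential is missing. The record layout, the field names and the list of required fields are assumptions made for this example and do not correspond to the schema of any existing database.

```python
from dataclasses import dataclass, field

# Fields treated as mandatory in this sketch (an assumption, not a standard).
REQUIRED_FIELDS = (
    "species",
    "strain_id",
    "instrument",
    "culture_medium",
    "preparation_protocol",
    "identified_by",
)


@dataclass
class SpectrumRecord:
    species: str
    strain_id: str
    instrument: str
    culture_medium: str
    preparation_protocol: str
    identified_by: str          # reference method, e.g. "ITS sequencing"
    country: str = ""           # optional geographic origin
    peaks: list = field(default_factory=list)  # m/z values


def curation_problems(record):
    """Return a list of problems that should block deposition of the record."""
    problems = []
    for name in REQUIRED_FIELDS:
        if not getattr(record, name).strip():
            problems.append(f"missing {name}")
    if not record.peaks:
        problems.append("no peaks")
    elif any(mz <= 0 for mz in record.peaks):
        problems.append("non-positive m/z value")
    return problems


# Hypothetical records: one complete, one lacking a strain identifier and peaks.
complete = SpectrumRecord(
    species="species_A", strain_id="STRAIN-0001", instrument="instrument_X",
    culture_medium="medium_Y", preparation_protocol="protocol_Z",
    identified_by="ITS sequencing", country="Chile", peaks=[3010.0, 4520.0],
)
incomplete = SpectrumRecord(
    species="species_A", strain_id="", instrument="instrument_X",
    culture_medium="medium_Y", preparation_protocol="protocol_Z",
    identified_by="ITS sequencing",
)

print(curation_problems(complete))
print(curation_problems(incomplete))
```

Recording the instrument, culture medium and preparation protocol alongside each spectrum is what would later allow spectra acquired under different conditions to be filtered or compared in a shared database.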

Acknowledgements

C. Santos thanks the Universidad de la Amazonia (Florencia, Colombia) for the financial support for his visit as Visiting Professor during summer 2020, when this book chapter was designed and written. This work was partially funded by the Universidad de La Frontera (Temuco, Chile) through the Project DIUFRO PIA19-0001. P. Galeano thanks Colciencias/Colombia for the doctoral fellowship No. 6172; the Laboratory of Advanced Analytical Techniques in Natural Products of the Universidad de los Andes (Bogotá, Colombia); and the Thomson Mass Spectrometry Laboratory of the University of Campinas (Campinas, Brazil) for the fruitful discussions on MALDI-TOF MS applied to fungal identification. R. Lima-Neto thanks CNPq/Brazil for the grant 310822/2018-1 and CETENE/MCTI/Brazil for the permanent support and access to the MALDI-TOF MS. M.M.E. Oliveira thanks FAPERJ/Brazil for the grant JCNE E-26/203.301/2017 and CNPq/Brazil for the grant Proc. 409227/2016-1. N. Lima acknowledges the support of FCT/Portugal under the scope of the strategic funding of the UID/BIO/04469/2019 unit and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020–Programa Operacional Regional do Norte. Some MALDI-TOF MS analyses presented in this chapter were developed using equipment funded by CONICYT/Chile through the project Fondequip EQM160054 2016.

References

Alanio, A., Beretti, J.L., Dauphin, B., Mellado, E., Quesne, G. et al. (2011) Matrix-assisted laser desorption ionization time-of-flight mass spectrometry for fast and accurate identification of clinically relevant Aspergillus species. Clinical Microbiology and Infection 17 (5), 750–755. doi:10.1111/j.1469-0691.2010.03323.x
Alsohaili, S.A. and Bani-Hasan, B.M. (2018) Morphological and molecular identification of fungi isolated from different environmental sources in the northern eastern Desert of Jordan. Jordan Journal of Biological Sciences 11 (3), 329–337.
Amiri-Eliasi, B. and Fenselau, C. (2001) Characterization of protein biomarkers desorbed by MALDI from whole fungal cells. Analytical Chemistry 73 (21), 5228–5231. doi:10.1021/ac010651t
Angeletti, S. (2017) Matrix assisted laser desorption time of flight mass spectrometry (MALDI-TOF MS) in clinical microbiology. Journal of Microbiological Methods 138, 20–29. doi:10.1016/j.mimet.2016.09.003
Becker, P., Normand, A.C., Vanantwerpen, G., Vanrobaeys, M., Haesendonck, R. et al. (2019) Identification of fungal isolates by MALDI-TOF mass spectrometry in veterinary practice: validation of a web application. Journal of Veterinary Diagnostic Investigation 31 (3), 471–474.
Bader, O. (2017) Fungal species identification by MALDI-ToF mass spectrometry. In: Lion, T. (ed.) Human Fungal Pathogen Identification. Springer Nature, Switzerland, pp. 323–337. doi:10.1007/978-1-4939-6515-1
Bidartondo, M.I. et al. (2008) Preserving accuracy in GenBank. Science 319 (5870), 1616a. doi:10.1126/science.319.5870.1616a
Borman, A.M., Fraser, M., Szekely, A. and Johnson, E.M. (2019) Rapid and robust identification of clinical isolates of Talaromyces marneffei based on MALDI-TOF mass spectrometry or dimorphism in Galleria mellonella. Medical Mycology 57 (8), 969–975. doi:10.1093/mmy/myy162
BioMerieux (2020) Mass spectrometry microbial identification system. Available at: https://www.biomerieux-diagnostics.com/vitekr-ms-0 (accessed 13 February 2020).
Bruker (2018) MBT Compass Library, Revision E MBT 7854 MSP Library. Available at: ftp://ftp.bdal.de/data/Support/TOF/MaldiBiotyper/Version8.0.0.0/MaldiBiotyperDBUpdate_V8.0.0.0_7311-7854(RUO)_Release-Notes.pdf (accessed 13 February 2020).
Bruker (2019) Bacterial Test Standard. Available at: https://www.bruker.com/fileadmin/user_upload/8-PDF-Docs/Separations_MassSpectrometry/Literature/Flyers/1862828_MBT_BTS_USA_01-2019_eBook.pdf (accessed 10 February 2020).
Buskirk, A.D., Hettick, J.M., Chipinda, I., Law, B.F., Siegel, P.D. et al. (2011) Fungal pigments inhibit the matrix-assisted laser desorption/ionization time-of-flight mass spectrometry analysis of darkly pigmented fungi. Analytical Biochemistry 411 (1), 122–128. doi:10.1016/j.ab.2010.11.025
Cain, T.C., Lubman, D.M., Weber, W.J. and Vertes, A. (1994) Differentiation of bacteria using protein profiles from matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 8 (12), 1026–1030. doi:10.1002/rcm.1290081224
Caira, M., Posteraro, B., Sanguinetti, M., de Carolis, E., Leone, G. and Pagano, L. (2012) First case of breakthrough pneumonia due to Aspergillus nomius in a patient with acute myeloid leukemia. Medical Mycology 50 (7), 746–750. doi:10.3109/13693786.2012.660507
Cassagne, C., Normand, A.C., L’Ollivier, C., Ranque, S. and Piarroux, R. (2016) Performance of MALDI-TOF MS platforms for fungal identification. Mycoses 59 (11), 678–690. doi:10.1111/myc.12506


Ceballos-Garzón, A., Cortes, G., Morio, F., Zamora-Cruz, E.L., Linares, M.Y. et al. (2019) Comparison between MALDI-TOF MS and MicroScan in the identification of emerging and multidrug resistant yeasts in a fourth-level hospital in Bogotá, Colombia. BMC Microbiology 19 (1), 106. doi:10.1186/s12866-019-1482-y
Chang, S., Carneiro-Leão, M.P., de Oliveira, B.F., Souza-Motta, C., Lima, N. et al. (2016) Polyphasic approach including MALDI-TOF MS/MS analysis for identification and characterisation of Fusarium verticillioides in Brazilian corn kernels. Toxins 8 (3), 1–13. doi:10.3390/toxins8030054
Chalupová, J., Raus, M., Sedlářová, M. and Sebela, M. (2014) Identification of fungal microorganisms by MALDI-TOF mass spectrometry. Biotechnology Advances 32 (1), 230–241.
Chao, Q.T., Lee, T.F., Teng, S.H., Peng, L.Y., Chen, P.H. et al. (2014) Comparison of the accuracy of two conventional phenotypic methods and two MALDI-TOF MS systems with that of DNA sequencing analysis for correctly identifying clinically encountered yeasts. PLoS One 9 (10), e109376. doi:10.1371/journal.pone.0109376
Chen, H.Y. and Chen, Y.C. (2005) Characterization of intact Penicillium spores by matrix-assisted laser desorption/ionization mass spectrometry. Rapid Communications in Mass Spectrometry 19 (23), 3564–3568.
Clark, A.E., Kaleta, E.J., Arora, A. and Wolk, D.M. (2013) Matrix-assisted laser desorption ionization-time of flight mass spectrometry: a fundamental shift in the routine practice of clinical microbiology. Clinical Microbiology Reviews 26 (3), 547–603. doi:10.1128/CMR.00072-12
Crossay, T., Antheaume, C., Redecker, D., Bon, L., Chedri, N. et al. (2017) New method for the identification of arbuscular mycorrhizal fungi by proteomic-based biotyping of spores using MALDI-TOF-MS. Scientific Reports 7 (1), 14306. doi:10.1038/s41598-017-14487-6
Danesi, P., Drigo, I., Iatta, R., Firacative, C., Capelli, G. et al. (2014) MALDI-TOF MS for the identification of veterinary non-C. neoformans-C. gattii Cryptococcus spp. isolates from Italy. Medical Mycology 52 (6), 659–666. doi:10.1093/mmy/myu031
de Almeida, J.N., Del Negro, G.M.B., Grenfell, R.C., Vidal, M.S.M., Thomaz, D.Y. et al. (2015) Matrix-assisted laser desorption ionization–time of flight mass spectrometry for differentiation of the dimorphic fungal species Paracoccidioides brasiliensis and Paracoccidioides lutzii. Journal of Clinical Microbiology 53 (4), 1383–1386. doi:10.1128/JCM.02847-14
de Almeida, J.N., Sztajnbok, J., da Silva, A.R., Vieira, V.A., Galastri, A.L. et al. (2016) Rapid identification of moulds and arthroconidial yeasts from positive blood cultures by MALDI-TOF mass spectrometry. Medical Mycology 54 (8), 885–889. doi:10.1093/mmy/myw044
Del Chierico, F., Masotti, A., Onori, M., Fiscarelli, E., Mancinelli, L. et al. (2012) MALDI-TOF MS proteomic phenotyping of filamentous and other fungi from clinical origin. Journal of Proteomics 75 (11), 3314–3330. doi:10.1016/j.jprot.2012.03.048
Dias, N., Santos, C., Portela, M. and Lima, N. (2011) Toenail onychomycosis in a Portuguese geriatric population. Mycopathologia 172 (1), 55–61. doi:10.1007/s11046-011-9402-1
Dong, H., Kemptner, J., Marchetti-Deschmann, M., Kubicek, C.P. and Allmaier, G. (2009) Development of a MALDI two-layer volume sample preparation technique for analysis of colored conidia spores of Fusarium by MALDI linear TOF mass spectrometry. Analytical and Bioanalytical Chemistry 395 (5), 1373–1383. doi:10.1007/s00216-009-3067-3
dos Santos, F.N., Tata, A., Belaz, K.R.A., Magalhães, D.M.A., Luz, E.D.M.N. and Eberlin, M.N. (2017) Major phytopathogens and strains from cocoa (Theobroma cacao L.) are differentiated by MALDI-MS lipid and/or peptide/protein profiles. Analytical and Bioanalytical Chemistry 409 (7), 1765–1777. doi:10.1007/s00216-016-0133-5
Drissner, D. and Freimoser, F.M. (2017) MALDI-TOF mass spectroscopy of yeasts and filamentous fungi for research and diagnostics in the agricultural value chain. Chemical and Biological Technologies in Agriculture 4 (1), 1–12. doi:10.1186/s40538-017-0095-7
Dupont, D., Normand, A.C., Persat, F., Hendrickx, M., Piarroux, R. et al. (2019) Comparison of matrix-assisted laser desorption ionization time of flight mass spectrometry (MALDI-TOF MS) systems for the identification of moulds in the routine microbiology laboratory. Clinical Microbiology and Infection 25 (7), 892–897. doi:10.1016/j.cmi.2018.10.013
Flórez-Muñoz, S.V., Gómez-Velásquez, J.C., Loaiza-Díaz, N., Soares, C., Santos, C. et al. (2019) ITS rDNA gene analysis versus MALDI-TOF MS for identification of Neoscytalidium dimidiatum isolated from onychomycosis and dermatomycosis cases in Medellin (Colombia). Microorganisms 7 (9), 306. doi:10.3390/microorganisms7090306
Firacative, C., Trilles, L. and Meyer, W. (2012) MALDI-TOF MS enables the rapid identification of the major molecular types within the Cryptococcus neoformans/C. gattii species complex. PLoS One 7 (5), e37566.


Galeano García, P.L. (2019) Estudio del metabolismo de plantas de Solanum lycopersicum durante la infección de Phytophthora infestans, mediante técnicas de espectrometría de masas. PhD thesis. Universidad de los Andes, Bogotá, Colombia.
Gautier, M., Normand, A.C. and Ranque, S. (2016) Previously unknown species of Aspergillus. Clinical Microbiology and Infection 22 (8), 662–669. doi:10.1016/j.cmi.2016.05.013
Gautier, M., Ranque, S., Normand, A.C., Becker, P., Packeu, A. et al. (2014) Matrix-assisted laser desorption ionization time-of-flight mass spectrometry: revolutionizing clinical laboratory diagnosis of mould infections. Clinical Microbiology and Infection 20 (12), 1366–1371. doi:10.1111/1469-0691.12750
Gräser, Y., Kühnisch, J. and Presber, W. (1999) Molecular markers reveal exclusively clonal reproduction in Trichophyton rubrum. Journal of Clinical Microbiology 37 (11), 3713–3717.
Gräser, Y., Scott, J. and Summerbell, R. (2008) The new species concept in dermatophytes – a polyphasic approach. Mycopathologia 166 (5–6), 239–256. doi:10.1007/s11046-008-9099-y
Gupta, A.K., Kohli, Y. and Summerbell, R.C. (2001) Variation in restriction fragment length polymorphisms among serial isolates from patients with Trichophyton rubrum infection. Journal of Clinical Microbiology 39 (9), 3260–3266. doi:10.1128/JCM.39.9.3260-3266.2001
Ha, M., Choi, E.J., Son, E.J., Yang, J., Choi, E.K. et al. (2016) Discrimination of the food borne bacteria by MALDI-TOF MS. Safe Food 11 (2), 18–27.
Hawksworth, D.L. and Lücking, R. (2017) Fungal diversity revisited: 2.2 to 3.8 million species. Microbiology Spectrum 5 (4), 79–95. doi:10.1128/microbiolspec.FUNK-0052-2016
Hedayati, M.T., Taghizadeh-Armaki, M., Zarrinfar, H., Hoseinnejad, A., Ansari, S. et al. (2019) Discrimination of Aspergillus flavus from Aspergillus oryzae by matrix-assisted laser desorption/ionisation time-of-flight (MALDI-TOF) mass spectrometry. Mycoses 62 (12), 1182–1188. doi:10.1111/myc.13010
Heireman, L., Patteet, S. and Steyaert, S. (2020) Performance of the new ID fungi plate using two types of reference libraries (Bruker and MSI) to identify fungi with the Bruker MALDI Biotyper. Medical Mycology myz138. doi:10.1093/mmy/myz138
Hettick, J.M., Green, B.J., Buskirk, A.D., Kashon, M.L., Slaven, J.E. et al. (2008) Discrimination of Aspergillus isolates at the species and strain level by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry fingerprinting. Analytical Biochemistry 380 (2), 276–281. doi:10.1016/j.ab.2008.05.051
Hibbett, D., Abarenkov, K., Kõljalg, U., Öpik, M., Chai, B. et al. (2016) Sequence-based classification and identification of Fungi. Mycologia 108 (6), 1049–1068. doi:10.3852/16-130
Hillenkamp, F. and Karas, M. (2007) The MALDI process and method. In: Hillenkamp, F. and Peter-Katalinić, J. (ed.) MALDI MS: A Practical Guide to Instrumentation, Methods and Applications. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp. 1–28. doi:10.1002/9783527610464.fmatter
Holland, R.D., Wilkes, J.G., Rafii, F., Sutherland, J.B., Persons, C.C. et al. (1996) Rapid identification of intact whole bacteria based on spectral patterns using matrix-assisted laser desorption/ionization with time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 10 (10), 1227–1232. doi:10.1002/(SICI)1097-0231(19960731)10:103.0.CO;2-6
Honnavar, P., Ghosh, A.K., Paul, S., Shankarnarayan, S.A., Singh, P. et al. (2018) Identification of Malassezia species by MALDI-TOF MS after expansion of database. Diagnostic Microbiology and Infectious Disease 92 (2), 118–123. doi:10.1016/j.diagmicrobio.2018.05.015
Imbert, S., Normand, A.C., Gabriel, F., Cassaing, S., Bonnal, C. et al. (2019) Multi-centric evaluation of the online MSI platform for the identification of cryptic and rare species of Aspergillus by MALDI-TOF. Medical Mycology 57 (8), 962–968. doi:10.1093/mmy/myz004
Kallow, W., Santos, I.M., Erhard, M., Serra, R., Venâncio, A. et al. (2006) Aspergillus ibericus: a new species of section Nigri characterised by MALDI-TOF MS. In: Meyer, W. and Pearce, C. (ed.) Proceedings of 8th International Mycological Congress. Medimond S.r.l., Bologna, Italy, pp. 189–193.
Kemptner, J., Marchetti-Deschmann, M., Mach, R., Druzhinina, I.S., Kubicek, C.P. et al. (2009) Evaluation of matrix-assisted laser desorption/ionization (MALDI) preparation techniques for surface characterization of intact Fusarium spores by MALDI linear time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry: RCM 23 (6), 877–884. doi:10.1002/rcm.3949
Keys, C.J., Dare, D.J., Sutton, H., Wells, G., Lunt, M. et al. (2004) Compilation of a MALDI-TOF mass spectral database for the rapid screening and characterisation of bacteria implicated in human infectious diseases. Infection, Genetics and Evolution 4 (3), 221–242. doi:10.1016/j.meegid.2004.02.004
Lacroix, C., Gicquel, A., Sendid, B., Meyer, J., Accoceberry, I. et al. (2014) Evaluation of two matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) systems for the identification of Candida species. Clinical Microbiology and Infection 20 (2), 153–158. doi:10.1111/1469-0691.12210


Lau, A.F., Drake, S.K., Calhoun, L.B., Henderson, C.M. and Zelazny, A.M. (2013) Development of a clinically comprehensive database and a simple procedure for identification of molds from solid media by matrix-assisted laser desorption ionization–time of flight mass spectrometry. Journal of Clinical Microbiology 51 (3), 828–834. doi:10.1128/JCM.02852-12
Lau, A.F., Walchak, R.C., Miller, H.B., Slechta, E.S., Kamboj, K., Riebe, K., Robertson, A.E., Gilbreath, J.J., Mitchell, K.F., Wallace, M.A., Bryson, A.L., Balada-Llasat, J.-M., Bulman, A., Buchan, B.W., Burnham, C.-A.D., Butler-Wu, S., Desai, U., Doern, C.D., Hanson, K.E., Henderson, C.M., Kostrzewa, M., Ledeboer, N.A., Maier, T., Pancholi, P., Schuetz, A.N., Shi, G., Wengenack, N.L., Zhang, S.X., Zelazny, A.M. and Frank, K.M. (2019) Multicenter study demonstrates standardization requirements for mold identification by MALDI-TOF MS. Frontiers in Microbiology 10, 2098. doi:10.3389/fmicb.2019.02098
Lee, Y., Sung, J.Y., Kim, H., Yong, D. and Lee, K. (2017) Comparison of a new matrix-assisted laser desorption/ionization time-of-flight mass spectrometry platform, ASTA MicroIDSys, with Bruker Biotyper for species identification. Annals of Laboratory Medicine 37 (6), 531. doi:10.3343/alm.2017.37.6.531
Li, T.Y., Liu, B.H. and Chen, Y.C. (2000) Characterization of Aspergillus spores by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 14 (24), 2393–2400. doi:10.1002/1097-0231(20001230)14:243.0.CO;2-9
Li, Y., Wang, H., Zhao, Y.P., Xu, Y.C. and Hsueh, P.R. (2017) Evaluation of the Bruker Biotyper Matrix-Assisted Laser Desorption/Ionization Time-of-Flight mass spectrometry system for identification of Aspergillus species directly from growth on solid agar media. Frontiers in Microbiology 8, 1209. doi:10.3389/fmicb.2017.01209
Lima-Neto, R., Santos, C., Lima, N., Sampaio, P., Pais, C. et al. (2014) Application of MALDI-TOF MS for requalification of a Candida clinical isolates culture collection. Brazilian Journal of Microbiology 45 (2), 515–522. doi:10.1590/S1517-83822014005000044
Lima, M.S., de Lucas, R.C., Lima, N., Polizeli, M.L.T.M. and Santos, C. (2019) Fungal community ecology using MALDI-TOF MS demands curated mass spectral databases. Frontiers in Microbiology 10, 315. doi:10.3389/fmicb.2019.00315
Lima, N. and Santos, C. (2017) MALDI-TOF MS for identification of food spoilage filamentous fungi. Current Opinion in Food Science 13, 26–30. doi:10.1016/j.cofs.2017.02.002
Masih, A., Singh, P.K., Kathuria, S., Agarwal, K., Meis, J.F. et al. (2016) Identification by molecular methods and matrix-assisted laser desorption ionization–time of flight mass spectrometry and antifungal susceptibility profiles of clinically significant rare Aspergillus species in a referral chest hospital in Delhi, India. Journal of Clinical Microbiology 54 (9), 2354–2364. doi:10.1128/JCM.00962-16
Matos, A., Moreira, L., Barczewski, B., de Matos, L., de Oliveira, J. et al. (2020) Identification by MALDI-TOF MS of Sporothrix brasiliensis isolated from a subconjunctival infiltrative lesion in an immunocompetent patient. Microorganisms 8 (1), 22. doi:10.3390/microorganisms8010022
MicroIDSys (2020) MicroIDSys: Microorganism Identification. Available at: http://astams.co.kr/asta_/en/product/microidsys.php (accessed 13 February 2020).
Norlia, M., Jinap, S., Nor-Khaizura, M.A.R., Son, R., Chin, C.K. et al. (2018) Polyphasic approach to the identification and characterization of aflatoxigenic strains of Aspergillus section Flavi isolated from peanuts and peanut-based products marketed in Malaysia. International Journal of Food Microbiology 282, 9–15. doi:10.1016/j.ijfoodmicro.2018.05.030
Normand, A.C., Becker, P., Gabriel, F., Cassagne, C., Accoceberry, I. et al. (2017) Validation of a new web application for identification of fungi by use of matrix-assisted laser desorption ionization–time of flight mass spectrometry. Journal of Clinical Microbiology 55 (9), 2661–2670. doi:10.1128/JCM.00263-17
O’Connor, P.B. and Hillenkamp, F. (2007) MALDI Mass spectrometry instrumentation. In: Hillenkamp, F. and Peter-Katalinić, J. (ed.) MALDI MS: A Practical Guide to Instrumentation, Methods and Applications. Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, pp. 29–82. doi:10.1002/9783527610464.fmatter
Oliveira, M.M.E., Santos, C., Sampaio, P., Romeo, O., Almeida-Paes, R. et al. (2015) Development and optimization of a new MALDI-TOF protocol for identification of the Sporothrix species complex. Research in Microbiology 166 (2), 102–110. doi:10.1016/j.resmic.2014.12.008
Pan, Y.L., Chow, N.H., Chang, T.C. and Chang, H.C. (2011) Identification of lethal Aspergillus at early growth stages based on matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Diagnostic Microbiology and Infectious Disease 70 (3), 344–354.
Panda, A., Ghosh, A.K., Mirdha, B.R., Xess, I., Paul, S. et al. (2015) MALDI-TOF mass spectrometry for rapid identification of clinical fungal isolates based on ribosomal protein biomarkers. Journal of Microbiological Methods 109, 93–105. doi:10.1016/j.mimet.2014.12.014
Park, J.H., Shin, J.H., Choi, M.J., Choi, J.U., Park, Y.J. et al. (2017) Evaluation of matrix-assisted laser desorption/ionization time-of-flight mass spectrometry for identification of 345 clinical isolates of Aspergillus species from 11 Korean hospitals: comparison with molecular identification. Diagnostic Microbiology and Infectious Disease 87 (1), 28–31. doi:10.1016/j.diagmicrobio.2016.10.012
Passarini, M.R.Z., Santos, C., Lima, N., Berlinck, R.G.S. and Sette, L.D. (2013) Filamentous fungi from the Atlantic marine sponge Dragmacidon reticulatum. Archives of Microbiology 195 (2), 99–111. doi:10.1007/s00203-012-0854-6
Patel, R. (2013) Matrix-assisted laser desorption ionization-time of flight mass spectrometry in clinical microbiology. Clinical Infectious Diseases 57 (4), 564–572. doi:10.1093/cid/cit247
Patel, R. (2019) A Moldy application of MALDI: MALDI-ToF mass spectrometry for fungal identification. Journal of Fungi 5 (1), 4. doi:10.3390/jof5010004
Paul, S., Singh, P., Sharma, S., Prasad, G.S., Rudramurthy, S.M. et al. (2019) MALDI-TOF MS-Based identification of melanized fungi is faster and reliable after the expansion of in-house database. Proteomics Clinical Applications 13 (3), e1800070.
Paziani, M.H., Tonani Carvalho, L., Carvalo Melhem, M.S., Gottardo de Almeida, M.T., Nadaletto Bonifácio da Silva, M.E. et al. (2020) First comprehensive report of clinical Fusarium strains isolated in the state of Sao Paulo (Brazil) and identified by MALDI-TOF MS and molecular biology. Microorganisms 8 (1), 66. doi:10.3390/microorganisms8010066
Pereira, L., Dias, N., Santos, C. and Lima, N. (2014) The use of MALDI-TOF ICMS as an alternative tool for Trichophyton rubrum identification and typing. Enfermedades Infecciosas y Microbiología Clínica 32 (1), 11–17. doi:10.1016/j.eimc.2013.01.009
Peng, Y., Zhang, Q., Xu, C. and Shi, W. (2019) MALDI-TOF MS for the rapid identification and drug susceptibility testing of filamentous fungi. Experimental and Therapeutic Medicine 18 (6), 4865–4873.
Posteraro, B., De Carolis, E., Vella, A. and Sanguinetti, M. (2013) MALDI-TOF mass spectrometry in the clinical mycology laboratory: identification of fungi and beyond.
Expert Review of Proteomics 10 (2), 151–164. doi:10.1586/epr.13.8 Quéro, L., Courault, P., Cellière, B., Lorber, S., Jany, J.L. et al. (2020). Application of MALDI-TOF MS to species complex differentiation and strain typing of food related fungi: Case studies with Aspergillus section Flavi species and Penicillium roqueforti isolates. Food Microbiology 86, 103311. doi:10.1016/j.fm.2019.103311 Quéro, L., Girard, V., Pawtowski, A., Tréguer, S., Weill, A. et al. (2019) Development and application of MALDI-TOF MS for identification of food spoilage fungi. Food Microbiology 81, 76–88. doi:10.1016/j.fm.2018.05.001 Ranque, S., Normand, A.C., Cassagne, C., Murat, J.B., Bourgeois, N. et al. (2014) MALDI-TOF mass spectrometry identification of filamentous fungi in the clinical laboratory. Mycoses 57 (3), 135–140. doi:10.1111/myc.12115 Robert, M.G., Romero, C., Dard, C., Garnaud, C., Cognet, O. et al. (2020) Evaluation of ID-Fungi-Plates™ media for identification of molds by MALDI-Biotyper™. Journal of Clinical Microbiology, In press. doi: 10.1128/JCM.01687-19 Robinson, K.N., Steven, R.T. and Bunch, J. (2018) Matrix optical absorption in UV-MALDI MS. Journal of The American Society for Mass Spectrometry 29 (3), 501–511. doi:10.1007/s13361-017-1843-4 Rodrigues, P., Santos, C., Venâncio, A. and Lima, N. (2011) Species identification of Aspergillus section Flavi isolates from Portuguese almonds using phenotypic, including MALDI-TOF ICMS, and molecular approaches. Journal of Applied Microbiology 111 (4), 877–892. doi:10.1111/j.1365-2672.2011.05116.x Rodriguez, R., Santos, C., Simões, M.F., Soares, C., Santos, C. et al. (2019) Polyphasic, including MALDITOF MS, evaluation of freeze-drying long-term preservation on Aspergillus (Section Nigri) Strains. Microorganisms 7 (9), 291. doi:10.3390/microorganisms7090291 Sampedro, A., Ceballos Mendiola, J. and Aliaga Martínez, L. (2018) MALDI-TOF commercial platforms for bacterial identification. In: Cobo, F. (ed.) 
The Use of Mass Spectrometry Technology (MALDI-TOF) in Clinical Microbiology. Academic Press, Elsevier, London, pp. 47–57. doi:10.1016/B978-0-12-814451-0.00003-4 Sanguinetti, M. and Posteraro, B. (2014) MALDI-TOF mass spectrometry: any use for Aspergilli? Mycopathologia 178 (5–6), 417–426. doi:10.1007/s11046-014-9757-1 Sanguinetti, M. and Posteraro, B. (2017) Identification of molds by matrix-assisted laser desorption ionization–time of flight mass spectrometry. Journal of Clinical Microbiology 55 (2), 369–379. doi:10.1128/JCM.01640-16 Santos, C., Paterson, R.R.M., Venâncio, A. and Lima, N. (2010) Filamentous fungal characterizations by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Journal of Applied Microbiology 108 (2), 375–385. doi:10.1111/j.1365-2672.2009.04448.x Santos, C., Lima, N., Sampaio, P. and Pais, C. (2011) Matrix-assisted laser desorption/ionization time-offlight intact cell mass spectrometry to detect emerging pathogenic Candida species. Diagnostic Microbiology and Infectious Disease 71 (3), 304–308. doi:10.1016/j.diagmicrobio.2011.07.002

140

MALDI-TOF MS and its Requirements for Fungal Identification

Santos, C., Ventura, J.A. Costa, H. Fernandes, P.M.B. and Lima, N. (2015) MALDI-TOF MS to identify the pineapple pathogen Fusarium guttiforme and its antagonist Trichoderma asperellum on decayed pineapple. Tropical Plant Pathology 40 (4), 227–232. doi:10.1007/s40858-015-0027-7 Santos, C., Ventura, J.A. and Lima, N. (2016) New insights for diagnosis of pineapple fusariosis by MALDITOF MS technique. Current Microbiology 73 (2), 206–213. doi:10.1007/s00284-016-1041-9 Santos, C., Francisco, E., Mazza, M., Padovan, A.C.B., Colombo, A. et al. (2017) Impact of MALDI-TOF MS in clinical mycology; progress and barriers in diagnostics. In: Shah, H.N. and Gharbia, S.E. (ed.) MALDI-TOF and Tandem MS for Clinical Microbiology. John Wiley & Sons, Ltd, Chichester, UK, pp. 211–230. doi:10.1002/9781118960226.ch9 Schmidt, O. and Kallow, W. (2005) Differentiation of indoor wood decay fungi with MALDI-TOF mass spectrometry. Holzforschung 59 (3), 374–377. doi:10.1515/HF.2005.062 Schulthess, B., Ledermann, R., Mouttet, F., Zbinden, A., Bloemberg, G.V. et al. (2014) Use of the Bruker MALDI Biotyper for identification of molds in the clinical mycology laboratory. Journal of Clinical Microbiology 52 (8), 2797–2803. doi:10.1128/JCM.00049-14 Seyfarth, F., Ziemer, M., Gräser, Y., Elsner, P. and Hipler, U.C. (2007) Widespread tinea corporis caused by Trichophyton rubrum with non-typical cultural characteristics- diagnosis via PCR. Mycoses 50 (s2), 26–30. doi:10.1111/j.1439-0507.2007.01427.x Simões, M.F., Pereira, L., Santos, C. and Lima, N. (2013) Polyphasic identification and preservation of fungal diversity: concepts and applications. In: Malik, A., Grohmann, E. and Alves, M. (eds) Management of Microbial Resources in the Environment. Springer, Dordrecht, Netherlands, pp. 91–117. doi:10.1007/978-94-007-5931-2_5 Siqueira, L.P.M., Gimenes, V.M.F., de Freitas, R.S., Melhem, M.S.C., Bonfietti, L.X. et al. 
(2019) Evaluation of Vitek MS for differentiation of Cryptococcus neoformans and Cryptococcus gattii genotypes. Journal of Clinical Microbiology 57 (1), pii: e01282-18. Soares, C., Rodrigues, P., Peterson, S.W., Lima, N. and Venâncio, A. (2012) Three new species of Aspergillus section Flavi isolated from almonds and maize in Portugal. Mycologia 104 (3), 682–697. doi:10.3852/11-088 Sogawa, K., Watanabe, M., Sato, K., Segawa, S., Miyabe, A. et al. (2012) Rapid identification of microorganisms by mass spectrometry: improved performance by incorporation of in-house spectral data into a commercial database. Analytical and Bioanalytical Chemistry 403 (7), 1811–1822. doi:10.1007/s00216-011-5656-1 Souza, P., Grigoletto, T., de Moraes, L., Abreu, L., Guimarães, L. et al. (2016) Production and chemical characterization of pigments in filamentous fungi. Microbiology 162 (1), 12–22. doi:10.1099/mic.0.000168 Stein, M., Tran, V., Nichol, K.A., Lagacé-Wiens, P., Pieroni, P. et al. (2018) Evaluation of three MALDI-TOF mass spectrometry libraries for the identification of filamentous fungi in three clinical microbiology laboratories in Manitoba, Canada. Mycoses 61 (10), 743–753. doi:10.1111/myc.12800 Stübiger, G., Wuczkowski, M., Mancera, L., Lopandic, K., Sterflinger, K. et al. (2016) Characterization of yeasts and filamentous fungi using MALDI lipid phenotyping. Journal of Microbiological Methods 130, 27–37. doi:10.1016/j.mimet.2016.08.010 Tam, E.W T., Chen, J.H.K., Lau, E.C.L., Ngan, A.H.Y., Fung, K.S.C. et al. (2014) Misidentification of Aspergillus nomius and Aspergillus tamarii as Aspergillus flavus: Characterization by internal transcribed spacer, ß-Tubulin, and calmodulin gene sequencing, metabolic fingerprinting, and matrixassisted laser desorption ionization-time of flight mass spectometry. Journal of Clinical Microbiology 52 (4), 1153–1160. doi:10.1128/JCM.03258-13 Valero, C., Buitrago, M.J., Gago, S., Quiles-Melero, I. and García-Rodríguez, J. 
(2018) A matrix-assisted laser desorption/ionization time of flight mass spectrometry reference database for the identification of Histoplasma capsulatum. Medical Mycology 56 (3), 307–314. doi:10.1093/mmy/myx047 Vidal-Acuña, M.R., Ruiz-Pérez de Pipaón, M., Torres-Sánchez, M.J. and Aznar, J. (2018. Identification of clinical isolates of Aspergillus, including cryptic species, by matrix assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS). Medical Mycology 56 (7), 838–846. doi:10.1093/ mmy/myx115 Welham, K.J., Domin, M.A., Johnson, K., Jones, L. and Ashton, D.S. (2000) Characterization of fungal spores by laser desorption/ionization time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 14 (5), 307–310. doi:10.1002/(SICI)1097-0231(20000315)14:53.0.CO;2-3

9 The Strength of Chemotaxonomy

Paul A. Lawson* and Nisha B. Patel
Department of Microbiology and Plant Biology, University of Oklahoma, Norman, Oklahoma, USA

Introduction

For the majority of investigators, the terms ‘systematics’ and ‘taxonomy’ are synonymous; however, there are important distinctions between them. Systematics is the broader field of biology that studies the diversity of organisms and the relationships between them, whereas taxonomy is more narrowly defined and can be considered to fall under the umbrella of systematics, consisting of (i) classification (the arrangement of organisms into taxonomic groups); (ii) nomenclature (the assignment of names to the taxonomic groups); and (iii) identification (the process of determining that a new isolate belongs to one of the established taxa). Furthermore, in light of the enormous amount of diversity yet to be described by science (Whitman et al., 1998; Yarza et al., 2014; Hedlund et al., 2015; Lagkouvardos et al., 2017; Overmann et al., 2018), Staley (2010) proposed that ‘comprehending microbial diversity’ should be a fourth goal of taxonomy.

Currently, there is a robust debate taking place within the taxonomic community on precisely which methods should be used to classify microbial diversity efficiently (see Chapters 1, 13, 16 and 18). The traditional polyphasic approach is more comprehensive (Tindall et al., 2010; Rainey, 2011; Garrity and Oren, 2013; Kämpfer, 2014); methods based upon genome sequences are more minimalist, relying on fewer laboratory-based characteristics, or on the genome alone (Sutcliffe et al., 2011, 2013; Sutcliffe, 2015; Whitman, 2014, 2015, 2016). Although the number of laboratories adopting a minimalist taxonomic approach has increased, journals dealing with the description of novel taxa show no sign of major changes to their publishing guidelines, aside from requiring genome sequences to be deposited in DNA databanks. Indeed, most journals still insist on a checklist of methods alongside reference strains, contributing towards unacceptable delays and increased costs (Sutcliffe et al., 2011, 2013; Sutcliffe, 2015).

An essential component of taxonomy is the application of appropriate methods that can be used to identify and then classify microorganisms; chemotaxonomy embraces a number of techniques that have been used extensively in this activity. The term ‘chemotaxonomy’ refers to analytical methods that exploit chemical differences in the composition of cell constituents, which may include cellular fatty acids, mycolic acids, polar lipids, respiratory quinones, pigments, polyamines, cellular sugars and peptidoglycan (Busse et al., 1996). Taken together, the collection of information on various constituents of the cell provides a robust approach to the generation

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

of information used in the identification and classification of prokaryotes (Busse et al., 1996; Schleifer, 2009; Tindall et al., 2010). Strictly speaking, the G+C content of the DNA may be considered a chemotaxonomic trait, and has been treated as such in the past; it is now normally discussed in the context of molecular and genotypic methods and therefore will not be covered in this text (Mesbah et al., 2011). However, many taxa – especially in the pre-molecular era – were circumscribed on the basis of G+C mol% in tandem with chemotaxonomic criteria (Goodfellow and Minnikin, 1985; Goodfellow and O’Donnell, 1994). Likewise, matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry is now included in many texts dealing with chemotaxonomy; this is covered in Chapters 7 and 8 and therefore will not be discussed here.

Chemotaxonomic markers are fundamentally part of the microbial phenotype but are generally discussed separately. A shortcoming of some phenotypic methods is that particular tests cannot be applied universally across all bacterial and archaeal groups; for example, not all organisms are metabolically active in these assays, leading to data and classification schemes that are often fragmented. On the other hand, chemotaxonomic markers located in cell walls or cell membranes are always present to some extent. Not only have these properties been used extensively, but they have been found to reflect phylogenetic relationships and are recommended for use in polyphasic investigations (Tindall et al., 2010; Rainey, 2011). It is important to note that these chemical constituents can be identical in different organisms; chemotaxonomic differences should therefore be used to exclude an organism from a particular taxon, rather than similarities being used as the sole criterion for inclusion (Tindall et al., 2010).
Chemotaxonomic methods in particular have been very useful for those groups of prokaryotes where morphological and physiological characters failed to differentiate taxa or provide a satisfactory classification (Schleifer and Stackebrandt, 1983). The individual methods are beyond the scope of this chapter and have been reported extensively elsewhere (Busse et al., 1996; Rainey and Oren, 2011). The history and development of chemotaxonomic methods will be reviewed, with examples of the application to different taxa, and with extensive reference to primary literature and reviews. The application of in silico methods utilizing information from the genome and future directions will also be discussed.
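The point made above, that chemotaxonomic differences exclude an organism from a taxon while similarities alone do not justify inclusion, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the marker names, the profile values and the function names are hypothetical and are not taken from any published scheme.

```python
from typing import Dict, Set

# A profile maps a marker name to its observed state (hypothetical representation).
Profile = Dict[str, str]


def differing_markers(isolate: Profile, taxon: Profile) -> Set[str]:
    """Return the markers recorded in both profiles whose states disagree."""
    shared = isolate.keys() & taxon.keys()
    return {marker for marker in shared if isolate[marker] != taxon[marker]}


def assess(isolate: Profile, taxon: Profile) -> str:
    """Differences exclude; agreement only means 'not excluded'."""
    diffs = differing_markers(isolate, taxon)
    if diffs:
        return "excluded: differs in " + ", ".join(sorted(diffs))
    return "not excluded (similarity alone does not establish membership)"


# Hypothetical values, for illustration only.
taxon_x = {"major quinone": "MK-8", "diamino acid": "lysine"}
isolate_1 = {"major quinone": "MK-8", "diamino acid": "lysine"}
isolate_2 = {"major quinone": "MK-9", "diamino acid": "lysine"}

print(assess(isolate_1, taxon_x))
print(assess(isolate_2, taxon_x))
```

The asymmetry is deliberate: a matching profile returns only "not excluded", so other lines of evidence in a polyphasic study would still be needed before an isolate is assigned to the taxon.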

Background and History of Chemotaxonomic Biomarkers

The field of chemotaxonomy has a long and rich history. Many of the chemical constituents that now form the basis of chemotaxonomic biomarkers, and are now routinely employed to characterize microorganisms, were first described in the early 1900s. Those early studies began to demonstrate the variation in the composition of the bacterial cell envelope between different bacterial groups (for a review see Salton, 1994). However, it was the technological advances achieved in the 1950s that resulted in an explosion in the application of these methods to microbial systematics (Kates, 1964; Lechevalier and Moss, 1977; Salton, 1994). Laboratory-based investigations revealed the array of structural differences in these cellular biomarkers and, as more information was gathered across different microbiological groups, microbiologists realized that these could be used not only in the identification but also in the classification of microorganisms (Kates, 1964; Lechevalier and Moss, 1977; Salton, 1994; Busse et al., 1996; Schleifer, 2009).

Cell wall components

The constituents of the bacterial cell wall were some of the first chemotaxonomic markers to be utilized in the identification and classification of microorganisms. However, these observations were only possible with the development of methods that allowed the purification of cell wall material. These included (i) electron microscopy, which demonstrated that the insoluble material obtained from mechanically disintegrated bacteria was mostly cell wall fragments; and (ii) paper chromatography, which provided a simple means to separate and purify specific components liberated by enzymatic digestion (Cummins, 1956; Salton, 1994). Of particular importance to microbial taxonomy is the cell wall peptidoglycan, or ‘murein’, a term introduced by Weidel and Pelzer (1964) that may still be found in the literature, although the former is now preferred (Weidel et al., 1960). Peptidoglycan is a heteropolymer composed of glycan strands, usually made up of alternating β-1,4-linked N-acetylglucosamine and N-acetylmuramic acid residues, cross-linked through short peptides (Ghuysen, 1968; Vollmer et al., 2008). Indeed, these chromatographic methods are still used today in the majority of taxonomic papers, but with silica-based plates replacing paper (Becker et al., 1965; Minnikin and Abdolrahimzadeh, 1971; Staneck and Roberts, 1974; Busse et al., 1996).

These studies revealed that the cell wall preparations contained three important components: sugars, amino sugars and amino acid residues. Sugars in whole-cell hydrolysates may also originate from a number of other sources, including cell wall-associated or capsular polysaccharides, glycolipids, nucleic acids or carbohydrate storage products. Sugars contained within these hydrolysates may vary depending on cultivation conditions (medium, environmental conditions, duration, etc.). Glucose, galactose and mannose were found to be common components across different genera, with arabinose, rhamnose and ribose being less common (Cummins and Harris, 1956). With these methods established, there followed a number of studies focusing on the analysis of the cell wall, but it was not until the pioneering work of Cummins and Harris that a wide range of organisms (corynebacteria, lactobacilli, streptococci, staphylococci and other Gram-stain-positive cocci) were examined (Cummins, 1956; Cummins and Harris, 1956). Cummins and Harris (1956) first proposed that cell wall components could be used in microbial taxonomy, citing the earlier investigations of Work and Dewey (1953), who demonstrated that the distribution of diaminopimelic acid among microorganisms could be used in bacterial classification.
Furthermore, it was discovered that the amino acids were limited to alanine, glutamic acid, aspartic acid, glycine, isomers of diaminopimelic acid (DAP) and lysine. Later studies revealed that some pathogenic plant species of Corynebacterium contained ornithine and 2,4-diaminobutyric acid, and that Micrococcus radiodurans also contained ornithine rather than DAP and lysine (Perkins and Cummins, 1964; Work, 1964). The importance of the diamino acids lysine, DAP, ornithine and 2,4-diaminobutyric acid as taxonomic biomarkers was soon realized (Work and Dewey, 1953; Cummins, 1956; Cummins and Harris, 1956; Work, 1957; Salton, 1994). It is this variation in the ‘stem peptide’ that holds taxonomically significant information, especially in Gram-stain-positive bacteria.

Peptidoglycans were subsequently divided into two main types (A and B) based on their cross-linkages, in the seminal work of Schleifer and Kandler (1972). It is pertinent to note that the B-type is rare, and found only in some members of the phylum Firmicutes (e.g. Erysipelotrichaceae) and in some members of the phylum Actinobacteria (e.g. Microbacteriaceae) (Schleifer and Kandler, 1972; Schumann et al., 2009; Schumann, 2011; Verbarg et al., 2014). In the case of type A, the cross-linkage is achieved by linking the ω-amino group of the diamino acid at position 3 to the carboxylic group of D-alanine at position 4 of the adjacent peptide subunit, either directly or by means of an interpeptide bridge consisting of 1–7 amino acid residues (Fig. 9.1A). In the case of type B cell walls, the α-carboxylic group of D-glutamic acid or threo-3-hydroxyglutamic acid at position 2 is involved in the cross-linkage (Schleifer et al., 1967). This type of cross-linkage requires the presence of a diamino acid (such as 2,4-diaminobutyric acid) in the interpeptide bridge, which provides a free amino group for linkage to the carboxylic group of the adjacent peptide subunit (Schleifer and Kandler, 1972; Schumann, 2011) (Fig. 9.1B). For a comprehensive list of peptidoglycan types, see www.peptidoglycan-types.info (accessed 20 July 2020).

All Archaea lack peptidoglycan; it is replaced by polymers of diverse chemical structures to form a rigid cell wall (Visweswaran et al., 2011). One such compound, termed pseudomurein, is a fundamentally different type of peptidoglycan found in members of the Methanobacteriales and the genus Methanopyrus.
Its glycan moiety contains L-talosaminuronic acid instead of muramic acid, and its peptide moiety lacks D-amino acids (Kandler and König, 1998). A review of the literature suggests that pseudomurein has not been used as a biomarker in chemotaxonomy.

Another taxonomically significant component of the cell wall is the glycolic acid that is linked to the muramic acid of the glycan backbone; its molar amount is comparable to those of other constituents of the peptidoglycan, such as muramic acid, glucosamine and D-glutamic acid. Early studies on Mycobacterium smegmatis found a variation of the acyl form, N-glycolylmuramic acid, in the muramyl residue of the peptidoglycan (Adam et al., 1969; Petit et al., 1969). A simple method was developed by Uchida and Aida (1977) and applied to a study on members of Actinomycetales that demonstrated its use as a chemotaxonomic tool (Uchida and Seino, 1997). Details of the isolation and characterization of peptidoglycan are provided by Schleifer and Kandler (1972) and Schumann (2011).

Fig. 9.1. Peptidoglycan types A and B. (A) Type A cross-linkage; (B) type B cross-linkage. L-Ala, L-alanine; L-Glu, L-glutamic acid; L-DA, L-diamino acid; L-AA, L-amino acid; L-Gly, L-glycine; NAM, N-acetylmuramic acid; NAG, N-acetylglucosamine.
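The distinction between type A and type B cross-linkage described in this section reduces to a simple decision rule, sketched below in Python. It is a minimal sketch, assuming a cross-link is described only by the stem-peptide position involved and the residues of the interpeptide bridge; the `CrossLink` class, the function name and the plain residue names (stereochemistry is ignored) are hypothetical simplifications, not an established data format.

```python
from dataclasses import dataclass
from typing import Tuple

# Diamino acids named in the text as taxonomically important (stereochemistry ignored).
DIAMINO_ACIDS = frozenset(
    {"lysine", "ornithine", "diaminopimelic acid", "2,4-diaminobutyric acid"}
)


@dataclass(frozen=True)
class CrossLink:
    stem_position: int            # stem-peptide position taking part in the cross-link
    bridge: Tuple[str, ...] = ()  # interpeptide bridge residues; () means a direct link


def peptidoglycan_group(link: CrossLink) -> str:
    """Assign group A or B from the cross-linkage, following the rule in the text."""
    if link.stem_position == 3:
        # Type A: diamino acid at position 3, linked directly or via 1-7 residues.
        if len(link.bridge) > 7:
            raise ValueError("type A bridges are described as 1-7 residues")
        return "A"
    if link.stem_position == 2:
        # Type B: position 2 is involved, and the bridge must supply a diamino acid.
        if not any(residue in DIAMINO_ACIDS for residue in link.bridge):
            raise ValueError("type B requires a diamino acid in the bridge")
        return "B"
    raise ValueError("cross-linkage is described only for positions 2 and 3")


print(peptidoglycan_group(CrossLink(3)))                                # direct link
print(peptidoglycan_group(CrossLink(2, ("2,4-diaminobutyric acid",))))  # bridged link
```

The sketch covers only the A/B split; the finer subtypes catalogued by Schleifer and Kandler (1972) depend on the identity of the position 3 residue and on the bridge composition, which are not modelled here.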

Lipids

Lipids have been used widely as chemotaxonomic markers; however, early studies focused more on their identification and structural properties than on their use as a taxonomic tool. It was not until the 1950s that lipid research became more widespread, and their application to microbial taxonomy only became extensive in the 1970s (Kates, 1964; Shaw, 1970; Lechevalier and Moss, 1977). Lipids of particular use in chemotaxonomy include polar lipids (phospholipids, glycolipids, glycophospholipids, aminolipids and sulfur-containing lipids), cellular fatty acids and a class of terpenoid lipids known as isoprenoid or respiratory quinones.

Microorganisms inhabit vastly different environments and possess very different lifestyles, but aerobes, anaerobes, thermophiles, acidophiles and alkaliphiles all synthesize ATP as a source of energy. Aerobes synthesize ATP via an electron transport (respiratory) chain, whereas anaerobes carry out anaerobic respiration, and certain fermentative strains generate ATP by substrate-level phosphorylation. A variety of molecules may be involved in the electron transport mechanisms (e.g. cytochromes and ferredoxins), and respiratory lipoquinones are an important constituent of this process (Nowicka and Kruk, 2010). The three most studied electron carriers are menaquinones (MK, vitamin K2), ubiquinone (UQ, coenzyme Q) and 2-demethylmenaquinone (DMK) (Nowicka and Kruk, 2010). For details on the occurrence, structure, biosynthesis and function of isoprenoid quinones, the reader is directed to the reviews of Collins and Jones (1981) and Nowicka and Kruk (2010). MK are the most widespread respiratory quinones and, from an evolutionary perspective, are the most ancient type (Nitschke et al., 1995). The quinone ring system, the functional component of electron transport, is attached to an isoprenoid side chain. These side chains can vary not only in length (MK-n) but also in degree of saturation, written MK-n(Hx), where x refers to the number of hydrogens in the isoprenoid side chain, as in MK-8(H2) (Collins and Jones, 1981). UQ have a hydrophobic polyprenyl side chain of varying length (U-n), where n denotes the number of isoprenyl units. The variety of these structural differences makes these components powerful chemotaxonomic tools.
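The quinone shorthand used above, such as MK-8(H2), is regular enough to be read mechanically, which is convenient when comparing quinone profiles across many strains. The following Python sketch parses such labels; the `Quinone` type, the function names and the set of accepted prefixes (MK, DMK, U and Q) are assumptions made for illustration. The conversion of hydrogens to saturated units assumes that each saturated isoprene unit carries two additional hydrogen atoms.

```python
import re
from typing import NamedTuple


class Quinone(NamedTuple):
    kind: str             # quinone class, e.g. "MK", "DMK", "U" or "Q"
    isoprene_units: int   # n in MK-n: length of the isoprenoid side chain
    extra_hydrogens: int  # x in (Hx); 0 when no saturation is indicated


# Accepted prefixes are an assumption; the text uses MK, DMK and U.
_LABEL = re.compile(r"^(MK|DMK|Q|U)-(\d+)(?:\(H(\d+)\))?$")


def parse_quinone(label: str) -> Quinone:
    """Parse a label such as 'MK-8(H2)' into its components."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised quinone label: {label!r}")
    kind, units, hydrogens = match.groups()
    return Quinone(kind, int(units), int(hydrogens) if hydrogens else 0)


def saturated_units(quinone: Quinone) -> int:
    """Number of saturated isoprene units, assuming two hydrogens per unit."""
    return quinone.extra_hydrogens // 2


mk = parse_quinone("MK-8(H2)")
print(mk.kind, mk.isoprene_units, saturated_units(mk))
```

Keeping chain length and saturation as separate fields means two strains can be compared on either property independently, which mirrors how the two kinds of variation are described in the text.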

Respiratory lipoquinones, or isoprenoid quinones, are membrane-bound constituents found in nearly all living organisms, although among bacteria UQs are found only in the alpha-, beta- and gamma-Proteobacteria. Exceptions include some obligately fermentative bacteria (Collins and Jones, 1981; Zhi et al., 2014), methanogenic Archaea (Beifuss and Tietze, 2005) and early-branching bacterial phyla (Schütz et al., 2000). Composed of a polar head and a hydrophobic side chain, they play important roles in electron transport. The majority of biological isoprenoid quinones belong to the naphthoquinones or benzoquinones. Naphthoquinones can be further divided into phylloquinones (vitamin K1) and menaquinones (vitamin K2). The two most important groups of benzoquinones are UQs and plastoquinones, although benzothiophene quinones are an essential constituent of certain thermophilic and acidophilic Archaea, differing in the pattern of ring substitution (Soballe and Poole, 1999). Early studies established the presence of these compounds in different groups of bacteria, and also revealed that the structural variation exhibited by isoprenoid quinones may be of value in microbial taxonomy (Page et al., 1960; Bishop et al., 1962; Crane, 1965). These studies provided the impetus for a large number of studies in the 1970s that demonstrated their wide distribution in Bacteria and established their usefulness in taxonomy and in determining phylogenetic affiliations (see review in Collins and Jones, 1981). Details of the isolation, characterization and distribution of respiratory quinones are provided in Collins and Jones (1981), Busse et al. (1996) and Costa et al. (2011a).

Polar lipids are important constituents of membranes; they are amphiphilic lipids which, in most studies, are glycerophospholipids composed of two fatty acids, a glycerol moiety, a phosphate group and a variable head group.
Examples are phosphatidylethanolamine (PE), phosphatidylglycerol (PG), cardiolipin (CL; more commonly known in bacteria as diphosphatidylglycerol, DPG), lysyl-phosphatidylglycerol (LPG), phosphatidylinositol (PI), phosphatidic acid (PA) and phosphatidylserine (PS) (Kates, 1964; Costa et al., 2011c; Sohlenkamp and Geiger, 2016). In the 1960s and 1970s reports emerged that described the presence of polar lipids in a number of diverse taxa, providing evidence that polar lipids could be used in the identification and classification of microorganisms (Ames, 1968; Kamp et al., 1969; Kamio et al., 1970; Olsen and Ballou, 1971; Khuller and Brennan, 1972; Steiner et al., 1973; Hagen, 1974; Makula and Finnerty, 1974). For taxonomic purposes, the vast majority of published studies show the lipids separated by two-dimensional thin-layer chromatography (2D-TLC) and identified on the basis of staining with specific sprays (Busse et al., 1996; Costa et al., 2011b). Although more sophisticated methods are available to determine the structure of lipids, namely nuclear magnetic resonance (NMR), mass spectrometry (MS) and liquid chromatography/mass spectrometry (LC/MS), in general these have not been employed in taxonomic descriptions (Jensen and Gross, 1988; Pulfer and Murphy, 2003; Ejsing et al., 2006; Buyukpamukcu et al., 2007; Schmidt et al., 2009; Bird et al., 2011). The most common polar lipids of bacteria are phospholipids, glycolipids, glycophospholipids, aminolipids and sulfur-containing lipids. It is pertinent to mention that the anionic lipid PS is often not observed on the 2D-TLC plate, as it is normally converted to PE by the enzyme phosphatidylserine decarboxylase (Psd) (Martinez-Morales et al., 2003; Sohlenkamp and Geiger, 2016). A few bacteria are known to accumulate PS, for instance the δ-proteobacterium Bdellovibrio bacteriovorus (Nguyen et al., 2008), Clostridium botulinum (Evans et al., 1998) and some Flavobacterium species (Lata et al., 2012).

Archaeal lipids have been studied since the 1960s, and both their structures and their biosynthetic pathways have been reported, revealing novel and unique archaeal polar lipids (Kushwaha et al., 1981; Langworthy and Pond, 1986; Rosa et al., 1986; Rosa and Gambacorta, 1988; Zhang and Poulter, 1992; Koga and Morii, 2007; Koga, 2010; Jain et al., 2014).
These studies demonstrated that Archaea possess a wide diversity of polar lipids, including phospholipids, glycolipids, phosphoglycolipids, sulpholipids and aminolipids, which have significant taxonomic applications. The occurrence of different polar head groups depends on the individual archaeal family and they can be used as unique taxonomic markers (Ulrih et al., 2009). For instance, aminolipids are prevalent in methanogens, but are completely absent in halophiles and thermophiles (Gambacorta et al., 1995). These studies as a whole revealed some important differences when compared to Bacteria: (i) hydrocarbon chains are bound to the glycerol moiety exclusively by

ether linkages in archaeal lipids, in contrast to bacterial polar lipids, most of which have ester linkages between fatty acids and a glycerol moiety; and (ii) the stereo structure of the glycerophosphate backbone: hydrocarbon chains are bound at the sn-2 and sn-3 positions of the glycerol moiety in archaeal lipids (2,3-dialkyl-sn-glycerol backbones), whereas bacterial and eucaryal lipids have sn-1 and 2-diradyl chains. Information on extraction and identification of polar lipids is reported by Busse et al. (1996) and Costa et al. (2011b). Cellular fatty acids (CFA) are defined as carboxylic acid derivatives of long-chain aliphatic molecules, with most being located in the cytoplasmic membrane as constituents of polar lipids and glycolipids (Kates, 1964). Bacteria have an enormous variety of fatty acyl chains that comprise straight chain saturated and unsaturated fatty acids, iso- and anteiso-branched fatty acids, internally branched fatty acids, hydroxy fatty acids, cyclopropane fatty acids, ω-cyclic fatty acids and dicarboxylic fatty acids (Goldfine and Bloch, 1961; Suzuki et al., 1993; Busse et al., 1996; Costa et al., 2011b). Polyunsaturated forms are rare but have been reported in psychrophilic bacteria and cyanobacteria (Yano et al., 1997; Russell and Nichols, 1999; Singh et al., 2002). Following the establishment of gas chromatography (GC) to separate organic compounds (James and Martin, 1952), Abel and co-workers (Abel et al., 1963) were the first to present evidence suggesting that CFA analysis could successfully identify bacteria. This study demonstrated different CFA patterns within members of the family Enterobacteriaceae and in some Gram-positive-staining bacteria (Abel et al., 1963). Once the potential usefulness of CFA analysis in the identification and classification of bacteria had been established, a plethora of studies ensued (Bousfield et al., 1983; Kaneda, 1991; Kämpfer and Kroppenstedt, 1996; Costa et al., 2011a).
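The fatty acid classes listed above are usually reported in a compact shorthand, such as the i-C15:0, ai-C15:0 and C17:1 labels used later in this chapter. The following is a minimal Python sketch of how such labels might be read into a structured form. The class name, field names and the set of accepted spellings are assumptions made for illustration only; the sketch deliberately handles just the straight-chain, iso- and anteiso- labels and rejects anything else (for example 10-Me-C18:0) instead of guessing.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FattyAcid:
    """Hypothetical container for one parsed shorthand label."""
    branch: str                    # "straight", "iso" or "anteiso"
    carbons: int                   # number of carbon atoms
    double_bonds: int              # 0 means saturated
    bond_position: Optional[str]   # e.g. "ω9c" when the label gives one


# Optional branch prefix, then C<carbons>:<double bonds>, then an optional ω suffix.
_LABEL = re.compile(
    r"^(?:(?P<branch>i|ai|iso|anteiso)-)?"
    r"C(?P<carbons>\d+)\s*:\s*(?P<bonds>\d+)"
    r"(?P<position>ω\S+)?$"
)

_BRANCH_NAMES = {None: "straight", "i": "iso", "iso": "iso",
                 "ai": "anteiso", "anteiso": "anteiso"}


def parse_fatty_acid(label: str) -> FattyAcid:
    """Parse labels such as 'i-C15:0' or 'ai-C17:1ω9c'; raise on anything else."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unsupported fatty acid label: {label!r}")
    return FattyAcid(
        branch=_BRANCH_NAMES[match.group("branch")],
        carbons=int(match.group("carbons")),
        double_bonds=int(match.group("bonds")),
        bond_position=match.group("position"),
    )
```

Parsing the labels before comparing profiles avoids treating spelling variants such as i-C15:0 and iso-C15:0 as different compounds.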
Perhaps the most significant development in the use of fatty acids in microbial taxonomy was its partial automation and commercialization by the introduction of the HP Microbial Identification System released in 1985, later upgraded in 1991 to the Sherlock Microbial Identification System (MIS). This system, created by Myron Sasser and co-workers, opened up the analysis of fatty acids to an enormous number of users, so much so that today almost every taxonomic description of a novel taxon includes this information. Furthermore, in addition to specialized

The Strength of Chemotaxonomy

research laboratories, it is used extensively in clinical laboratories for rapid identification of pathogens (Welch, 1991). In Archaea, the role(s) of long-chain fatty acids, and even their presence, remain uncertain. Fatty acids are not part of archaeal membrane lipids, with several studies failing to detect fatty acid-based lipids (Falb et al., 2008; Koga, 2010). Small amounts of fatty acids (C14, C16, C18) have, however, been detected in membrane proteins of Halobacterium salinarum (Pugh and Kates, 1994). Information on the extraction, structure and distribution of fatty acids is reviewed in da Costa (2011), Busse et al. (1996) and Suzuki et al. (2002). Mycolic acids (MAs) are characteristic lipid components of the cell envelope of mycobacteria and related Actinobacteria, constituting approximately 40%–60% of the cell envelope (Minnikin et al., 1971; Minnikin, 1982). The presence of a large amount of wax-like substance in the lipid fraction of the tubercle bacilli was first investigated by Aronson (1898). The chemical composition and properties were later studied by a number of investigators (Tamura, 1913; Anderson, 1929), and it was Stodola and colleagues who first proposed to call the ether-soluble, unsaponifiable, high molecular weight hydroxy acid of the human tubercle bacillus ‘mycolic acid’ (Stodola et al., 1938). The structural details of MAs were determined by Asselineau and Lederer (1950), who further defined mycolic acids as high molecular weight β-hydroxy fatty acids with a long α side chain. Subsequent studies demonstrated that MA-containing organisms are phylogenetically members of the Actinobacteria (Ludwig et al., 2012; Marrakchi et al., 2014; Nouioui et al., 2018). Mycobacteria and related MA-containing bacteria possess many unique characteristics due to the presence of MAs (Minnikin, 1982; Daffé and Draper, 1998).
An important landmark in clinical microbiology and in the study of TB was the observation that MA-containing bacteria retain coloration with carbol fuchsin (acid-fast staining) following acid treatment of intact, stained cells (Barksdale and Kim, 1977; Smith, 2003; O’Sullivan et al., 2012). MAs are also involved in survival strategies such as antibiotic resistance and resistance to degradation within the macrophage phagolysosome (Minnikin, 1982; Smith, 2003; Gebhardt et al., 2007; Jackson, 2014). These studies, and others, have demonstrated the enormous number


of individual MA components, with novel structures continuing to be identified (Lanéelle et al., 2011; Marrakchi et al., 2014). Again, these structural attributes make them ideal chemotaxonomic markers, and their analysis has been applied in the identification and characterization of members of the order Corynebacteriales (Lanéelle et al., 2013; Marrakchi et al., 2014). Information on the extraction, structure and distribution of MAs is given in Yassin (2011).

Polyamines

Polyamines are non-lipophilic constituents of the cell and are a diverse group of compounds found in the domains Eukarya, Bacteria and Archaea. They are derived from amino acids, are positively charged at physiological pH and have important roles in a range of cellular processes. For example, polyamines participate in nucleic acid and protein synthesis and stabilization, and they also regulate the cell cycle and growth processes (Tabor and Tabor, 1985; Michael, 2016). The more common polyamines (Table 9.1) include the diamines 1,3-diaminopropane (Dap), 1,4-diaminobutane (putrescine, Put) and 1,5-diaminopentane (cadaverine, Cad); the triamines sym-norspermidine (Nspd), spermidine (Spd) and sym-homospermidine (Hspd); the uncommon triamines aminopropylcadaverine and aminobutylcadaverine; and the tetraamines norspermine (Nspm), spermine (Spm) and thermospermine (Tspm) (Tabor and Tabor, 1985; Michael, 2016). The use of these polyamine patterns for chemotaxonomic purposes began to emerge in the 1970s and later expanded to include a wide range of taxa; for example, studies on the family Vibrionaceae (Yamamoto et al., 1979, 1983), Proteobacteria (Hamana and Matsuzaki, 1993), Gram-stain-positive bacteria (Gvozdiak et al., 1998), thermophilic bacteria (Hosoya et al., 2004) and methanogenic Archaea (Scherer and Kneifel, 1983; Kneifel et al., 1986). Information on the characterization and distribution of polyamines is given by Busse (1996).

Applications of Chemotaxonomy to Bacterial Systematics

The application and significance of chemotaxonomic methods to microbial systematics cannot be


P.A. Lawson and N.B. Patel

Table 9.1. Examples of chemotaxonomic biomarkers.

Biomarker: Variations
Peptidoglycan: Diamino acid; Interpeptide bridge
Acyl type: Acetyl or glycolyl
Whole cell sugars: arabinose, fucose, galactose, glucose, madurose, mannose, 2-O-methyl-D-mannose, rhamnose, ribose, tyvelose and xylose
Respiratory lipoquinones: Menaquinones (MK); Ubiquinones (Q-n); Length of isoprenoid units; Degree of saturation
Fatty acids: Carbon length, saturated/unsaturated, branched, cyclic
Mycolic acids: Carbon length, double bonds, functional groups
Polar lipids: Phosphatidylethanolamine (PE), phosphatidylglycerol (PG), diphosphatidylglycerol (DPG), lysyl-phosphatidylglycerol (LPG), phosphatidylinositol (PI), phosphatidic acid (PA), phosphatidylserine (PS)
Polyamines: Cadaverine (CAD), Caldopentamine (CLPA), Canavalmine (CANV), 1,3-diaminopropane (DAP), Homocaldopentamine (HCPA), 2-Hydroxyputrescine (HPUT), Putrescine (PUT), Spermidine (SPD), Spermine (SPM), sym-norspermidine (NSPD), sym-homospermidine (HSPD), sym-norspermine (NSPM), Thermospermine (TSPD)

overstated. The affiliation of microorganisms to particular taxa, especially in the era before the application of sequencing methods (16S rRNA gene and, more recently, whole genomes), relied to a great extent on chemotaxonomic biomarkers. Following early investigations, patterns began to emerge where particular biomarkers could be attributed to certain bacterial groups; as a result, chemotaxonomic methods were increasingly used in descriptions of novel organisms. This was further enhanced by the application of molecular methods, which demonstrated the stable nature of chemotaxonomic biomarkers with respect to their presence within phylogenetic groups (Stackebrandt and Schumann, 2006). One example is the confusion over the taxonomy of the bacterial family Rhodobacteraceae over the past two decades. Suresh et al. (2019) used a taxogenomic approach to resolve the

taxonomy of this group. However, it is interesting to note that the phylogenetic clusters identified by Suresh et al. (2019) were complementary to the chemical data (polar lipids and fatty acids) reported in the 1990s (Imhoff, 1991, 2015; Imhoff and Bias-Imhoff, 1995). The reviews of Collins and Jones (1981) were instrumental in the taxonomy of Pseudomonas and supported the delineation of the proteobacteria into the alpha, beta and gamma groupings (Woese, 1987). Similarly, the review of Schleifer and Kandler (1972) was of enormous value in attributing peptidoglycan types to a wide range of bacterial groups. Given the huge number of published taxonomic descriptions that include chemotaxonomic information, a single text such as this cannot hope to include them all. However, the significance of chemotaxonomic biomarkers is evidenced by their


inclusion in many published minimal standards given for the description of species and genera (Levy-Frebault and Portaels, 1992; Freney et al., 1999; Mattarelli et al., 2014; Lajudie et al., 2019), families (Arahal et al., 2007; Christensen et al., 2007; Logan et al., 2009; Mattarelli et al., 2014), suborders (Schumann et al., 2009) and orders (Oren et al., 1997). To exemplify the strength of chemotaxonomic biomarkers, a representative family (Intrasporangiaceae) of the suborder Micrococcineae will be discussed (Schumann et al., 2009). The suborder Micrococcineae was established by Stackebrandt et al. (1997) on the basis of a characteristic set of 16S rRNA gene signature nucleotides and updated by Zhi et al. (2009) and Nouioui (2018). The large number of taxa affiliated with this suborder demonstrate a remarkable variety of chemotaxonomic markers, including peptidoglycan structure, CFA, polar lipids and respiratory quinones; as such they are a critical component of the minimal standards for the suborder Micrococcineae (Schumann et al., 2009). To date, the family Intrasporangiaceae contains over 19 validly named genera (https://lpsn.dsmz.de/family/intrasporangiaceae, accessed 16 October 2020) (Parte, 2018). In addition to phylogenetic criteria, these groups are circumscribed on a number of well-documented chemotaxonomic markers (Fig. 9.2). For example, the diagnostic diamino acid of the genera Intrasporangium, Humibacillus, Lapillicoccus, Terrabacter and Terracoccus is LL-DAP, whereas Janibacter, Knoellia, Kribbia, Oryzobacter, Phycicoccus and Tetrasphaera contain meso-DAP. Ornithinibacter and Ornithinicoccus, as the names imply, contain ornithine, as does Segeticoccus. Where reported, all genera contain the acetyl form of the N-acyl type of the muramyl residue of the peptidoglycan. For the respiratory quinones almost all genera contain MK-8(H4).
The exceptions are Aquipuribacter (which contains MK-10(H4)) and Knoellia (which, in addition to MK-8(H4), also contains MK-7(H4) and MK-6(H4)). Polar lipids contain combinations of PG, PI, PIM, PE, DPG and PL. Most genera produce iso- and anteiso- forms of fatty acids (i-C15:0, ai-C15:0, i-C16:0) in addition to other straight chain saturated and unsaturated CFA. Even with comprehensive phylogenetic and phylogenomic information, for all novel taxa within the suborder Micrococcineae, the chemotaxonomic


information is still considered essential in taxonomic descriptions.

Winds of Change: Chemotaxonomy in the Era of Omics

It is generally agreed that chemotaxonomic biomarkers are very stable within taxa but, with the advances in both genomics and proteomics, the relevance of these methods to modern taxonomy is now being questioned (Sutcliffe et al., 2013; Mahato et al., 2017). Most chemotaxonomic methods are laborious and not amenable to high-throughput methods; one exception is the extraction and identification of CFA using the MIDI system (http://midi-inc.com/pages/microbial_id.html, accessed 9 July 2020). Other criticisms include, with polar lipids, the lack of resolution and information content on the 2D-TLC (unidentified products) often used in taxonomic descriptions. Although more sophisticated methods such as NMR and MS are available, these have not been adopted and used in the vast majority of taxonomic papers (Khuller and Goldfine, 1974; Johnston et al., 2010; Garrett et al., 2012). Inter-laboratory variation, subjectivity in the interpretation of data, the repetitive use of reference strains/materials (adding to the financial costs) and the lack of portable/searchable databases are also major impediments to the generation of accurate information and the transfer of knowledge (Moore et al., 2010; Whitman, 2014). These criticisms, set against the high-throughput methods and computational advances associated with the ‘omics’, are often cited as reasons to move away from traditional chemotaxonomic methods towards genomic and phylogenomic approaches to taxonomy (Sutcliffe et al., 2013; Whitman, 2014). Indeed, compared to the omics, some regard chemotaxonomy as antiquated, leading many principal investigators and students to adopt the molecular route, with a high-throughput approach to laboratory practice and publication rates.
Unfortunately, such approaches are leading to fewer laboratories having the capability to culture a wide range of microorganisms, and even fewer having the capability to perform laboratory-based chemotaxonomic investigations (Tamames and Rosselló-Móra, 2012).

[Fig. 9.2: tree of 16S rRNA gene sequences of type strains of the family Intrasporangiaceae (Terrabacter, Humibacillus, Terracoccus, Intrasporangium, Ornithinicoccus, Segeticoccus, Oryzihumus, Lapillicoccus, Tetrasphaera, Knoellia, Oryzobacter, Ornithinibacter, Fodinibacter, Aquipuribacter, Marihabitans, Kribbia, Janibacter, Phycicoccus and Pedococcus species, with Sanguibacter suarezii NBRC 16159), annotated with columns headed Diamino acid, Polar Lipids and Major Fatty Acids; scale bar, 1%.]

Fig. 9.2. Phylogenetic tree of 16S rRNA gene sequences and some diagnostic chemotaxonomic biomarkers of members of the family Intrasporangiaceae. The phylogenetic analysis was performed using MEGA X (Kumar et al., 2018) using the maximum-likelihood (Felsenstein, 1981) employing the Kimura 2-parameter substitution model (Kimura, 1980). Bootstrap values (%) were obtained with 1000 replicates and are displayed on their relative branches (Felsenstein, 1985). DAP, 2,6-diaminopimelic acid; Orn, ornithine; DPG, diphosphatidylglycerol; PG, phosphatidylglycerol; PI, phosphatidylinositol; PIM, phosphatidylinositol mannosides; PSer, phosphatidyl serine; PL, unidentified phospholipid(s); GL, unidentified glycolipid(s); PGL, unidentified phosphoglycolipid(s); ND, not determined.
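The Kimura 2-parameter model named in the caption corrects an observed pairwise difference for multiple substitutions, treating transitions and transversions separately. With P the proportion of sites showing a transition and Q the proportion showing a transversion, the distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The Python sketch below illustrates that formula only; it is not the MEGA X implementation, the function name is hypothetical, and the choice to skip gaps and ambiguous bases is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Assumption: the sequences are already aligned and of equal length;
    columns containing anything other than A, C, G or T are skipped
    (RNA sequences would need U converted to T first).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)
```

For closely related sequences, such as 16S rRNA genes within one family, the corrected distance stays close to the raw proportion of differing sites; the correction grows as sequences diverge.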


Table 9.2. Chemotaxonomic characteristics of members of the family Intrasporangiaceae.

Family/Genus; Diamino acid; Acetyl type; Major respiratory quinones; Polar lipids; Major fatty acids

PI, PIM, PG, DPG, PG

i-C15:0, ai-C15:0, i-C16:0, i-C15:0, ai-C15:0, i-C16:0, ai-C16:0, C16:0, C18:1ω9c and C18:3ω6,9,12c C18:1ω9c, C16:0,

Family Intrasporangiaceae Intrasporangium

LL-DAP

Acetyl

MK-8

Aquipuribacter

meso-DAP

ND

MK-10 (H4)

Fodinibacter

meso-DAP

Humibacillus

LL-DAP

ND

MK-8 (H4)

DPG, PE, PG, PI, PLs PE

Janibacter

meso-DAP

ND

MK-8 (H4)

PI, PG, DPG,

Knollia

meso-DAP

Acetyl

Kribbia

meso-DAP

ND

MK-8 (H4), MK-7 PI, PG, DPG, (H4), MK-6 (H4) MK-8 (H4) ND

Lapillicoccus

LL-DAP

Acetyl

MK-8 (H4)

DPG, PI

Marihabitans

meso-DAP

ND

MK-8 (H4)

ND

Ornithinibacter

L-Orn

ND

MK-8 (H4)

Ornithinicoccus

L-Orn

Acetyl

MK-8 (H4)

Oryzihumus

meso-DAP

Acetyl

MK-8 (H4)

PI, PE, PG, DPG, PI, PSer, PG, DPG, PL ND

Oryzobacter

meso-DAP

ND

MK-8 (H4)

Pedococcus

meso-DAP

Phycicoccus

meso-DAP

Acetyl

MK-8 (H4)

Segeticoccus

L-Orn

ND

MK-8 (H4)

DPG, PG, PS,

Terrabacter

LL-DAP

Acetyl

MK-8 (H4)

Terracoccus

LL-DAP

ND

MK-8 (H4)

Tetraphaera

(3-hydroxy) ND meso-DAP

MK-8 (H4)

PE, PI, DPG, PL PE, PI, PG, DPG PI, PG, DPG, PE, APL, PL

MK-8 (H4)

MK-8 (H4)

DPG, PE, PI, APLs, PLs PE, PI, DPG, PG PE, PI, PG, DPG, PL

i-C15:0, ai-C15:0, i-C16:0, C17:1 C17:1, C17:0, i-C16:0, C18:1 i-C15:0, i-C17:0, i-C16:0, ai-C17:0 10-Me-C18:0, i-C16:0, C18:1, C16:0, C18:0 i-C16:0, C17:1, i-C15:0 i-C16:0, C17:1, i-C15:0, C18:1, ai-C17:0 i-C18:1ω9c, i-C16:0, i-C15:0, C17:0 i-C15:0, ai-C15:0 i-C15:0, C16:0, i-C16:0, i-C15:0, i-C14:0, ai-C15:0 i-C16:0, C17:1ω8c, i-C14:0 i-C15:0, i-C16:0, C17:1, i-C16:0, i-C15:0, C15:0, C17:0 i-C15:0, i-C16:1 h, i-C16:0, anteiso-C17:1ω9c i-C15:0, i-C14:0, i-C16:0 i-C15:0, ai-C15:0, C16:0 i-C16:0, ai-C17:0, i-C16:1, i-C15:0, C16:1, i-C14:1, C16:0


It is clear that the use of information coded for in the genome has assumed an increasing role in modern taxonomy (Chun and Rainey, 2014; Ramasamy et al., 2014; Thompson et al., 2015; Garrity, 2016; Hahnke et al., 2016; see also Chapters 10, 11, 13 and 16). Scientific journals that publish descriptions of prokaryotic taxa now require that, in addition to 16S rRNA gene sequences, the genome sequence also be available in publicly accessible databases. Taxonomic descriptions now exploit the application of in silico analysis, replacing characteristics traditionally determined in the laboratory, with the dual goals of increasing the speed of the taxonomic description of taxa and the accuracy and consistency of information provided. For example, laboratory-based DNA-DNA hybridization methods are now routinely replaced by the reporting of average nucleotide identity (ANI) and in silico DNA-DNA analysis (Goris et al., 2007; Meier-Kolthoff et al., 2013; Lee et al., 2016). Although attempts to derive biochemical pathways of reactions used in routine laboratory identifications have been reported, these have mostly gone unnoticed and have not been broadly adopted by the taxonomic community, but nevertheless should be encouraged (Barona-Gómez et al., 2012; Amaral et al., 2014). Such advances have, however, been contingent upon improved sequencing and computational analysis (Köser et al., 2012; Loman et al., 2012; Bertelli and Greub, 2013). Online tools such as Kyoto Encyclopedia of Genes and Genomes (KEGG, www.genome.jp/kegg/); Rapid Annotations using Subsystems Technology (RAST, www.rast.nmpdr.org/); PathoSystems Resource Integration Center (PATRIC, https://www.patricbrc.org/); and MetaCyc (https://metacyc.org/, all websites accessed 9 July 2020) are integrated knowledge-based reference databases consisting of genomic information, systems information (metabolic pathways), chemical information (metabolites, ligands, enzymes) and health information (drugs and diseases), all of which are essential in the omics era (Aziz et al., 2008; Caspi et al., 2012; Kanehisa et al., 2016; El-Gebali et al., 2018; Davis et al., 2019). Such tools should encourage the ‘in silico’ approach to the prediction of chemotaxonomic biomarkers from the genome (Barona-Gómez et al., 2012; Sutcliffe et al., 2013; Lawson et al., 2016; Patel et al., 2016; Mahato et al., 2017). In silico predictions of chemotaxonomic

traits are still in their infancy, despite calls to utilize this tool (Sutcliffe, 2010; Sutcliffe et al., 2013; Whitman, 2014). An examination of the literature reveals that, where genomic information is included in taxonomic descriptions, many descriptions consist of little more than superficial information automatically extracted from the genome by software programs; often this valuable information is consigned to ‘on-line supplementary information’, relegated to history and never utilized in a taxonomically beneficial manner. In principle the entire phenotype is encoded in the genome; however, even with the tremendous advances in bioinformatics and computational tools, a complete understanding of the phenotype based solely on genome sequence is not yet possible (Kämpfer, 2014). For example, it has been debated whether values for temperature, pH and salt tolerance profiles could ever be derived from the genome sequence alone. However, Jensen et al. (2012) employed protein families associated with specific thermophilicity classes to distinguish between thermophilic-, mesophilic- and psychrophilic-adapted bacterial genomes. This approach offers tantalizing glimpses into similar applications that may be applied to tease out other important physiological growth parameters required in taxonomic descriptions. For chemotaxonomic traits, in silico studies have been limited, although programs such as KEGG, RAST, PATRIC and Pfam are commonly used tools that can be utilized to identify genes and biosynthetic pathways associated with these biomarkers. A review of some examples follows. The peptidoglycan is composed of glycan strands consisting of N-acetylglucosamine and N-acetylmuramic acid, cross-linked by pentapeptides. It is the diamino acid in this stem peptide that is of taxonomic and diagnostic importance (Vollmer et al., 2008; Schumann, 2011) (Fig. 9.1).
The cytoplasmic steps of the biosynthesis of bacterial peptidoglycan are catalysed by a series of enzymes. Among them are individual ligases that are responsible for the successive additions of L-alanine, D-glutamic acid, meso-diaminopimelic acid (or lysine or ornithine) and D-alanyl-D-alanine to UDP-N-acetylmuramic acid (Maruyama et al., 1988; Vollmer et al., 2008). The ligases that add the diamino acid are classified as MurE (Lugtenberg and Dam, 1972). At the second meeting of Bergey’s International Society for


Microbial Systematics, ‘Defining Microbial Diversity in the Genomic Era’, held in Edinburgh, UK, in 2014, Gary Olsen presented a paper entitled ‘Reconciling computer conjectures with facts’ using sequences retrieved from the PATRIC database. Using the DNA sequences of the ligases that add the diamino acid in the third position of the stem peptide of the peptidoglycan, it can be demonstrated by phylogenetic analysis that the UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ligase, UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ornithine and UDP-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase, for the addition of lysine, ornithine and diaminopimelate, respectively, cluster separately within the phylogenetic tree. The analysis also demonstrates the diversity of the individual ligases between taxa (Fig. 9.3). Within the KEGG database (www.genome.jp/kegg-bin/show_pathway?map00550) Enzyme Commission (EC) numbers are given for UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ligase (EC 6.3.2.7) (Triolo et al., 2004) and UDP-acetylmuramoylalanyl-D-glutamate-2,6-diaminopimelate ligase (EC 6.3.2.13) (Michaud et al., 1990); UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ornithine does not appear to be present in the database. When genomes are processed by the KEGG databases, they appear not to identify the specific MurE gene, presumably owing to the diversity within the sequences, as demonstrated in the PATRIC database. The KEGG database also contains pathways for glycerophospholipid metabolism (www.genome.jp/kegg-bin/show_pathway?map00564) that include the production of polar lipids. In our own laboratory, we have initiated preliminary investigations into the use of genome sequences in predicting polar lipid patterns (Lawson et al., 2016; Patel et al., 2016).
The use of the genome sequence to predict polar lipid production is discussed using two examples, the first using the organism Ezakiella peruensis, which belongs to the phylum Firmicutes (Patel et al., 2015) and the second using Thermoanaerobaculum aquaticum, which belongs to the Acidobacteria (Losey et al., 2013; Stamps et al., 2014). In addition to a number of unidentified lipids, both organisms are reported to produce DPG, PG and PE (Fig. 9.4A and C). But, when KEGG is used to examine the two genome sequences, different patterns arise with respect to the enzymes present in the synthesis of these


products. For example, both organisms are predicted to contain phosphatidylserine synthase (EC 2.7.8.8), used in the production of phosphatidyl-L-serine, although PS is not seen on the chromatography plates derived in the laboratory. This can be explained, however, by the fact that (as previously stated) PS is normally very efficiently converted to PE by the enzyme phosphatidylserine decarboxylase (Psd) (EC 4.1.1.65), which is found in both genomes (Fig. 9.4B and D). Similarly, both genomes are predicted to contain phosphatidylglycerophosphate synthetase (EC 2.7.8.5), which produces phosphatidylglycerophosphate, an intermediate in the production of PG, a product that is observed on the experimentally derived chromatography plates. Both organisms, however, appear to lack a second enzyme, phosphatidylglycerophosphatase A (EC 3.1.3.27), which converts this intermediate to PG (Fig. 9.4B and D). PG is then used in the production of DPG (cardiolipin), which again is seen on the laboratory plate with both organisms; but, again, we see discrepancies in the enzymes predicted by KEGG. In the genome of E. peruensis, cardiolipin synthase is predicted, but it is absent in T. aquaticum. A possible explanation is that a set of three homologous genes encoding cardiolipin synthases has been reported in a few prokaryotes; for example, in the Enterobacteriaceae (Salmonella, Escherichia and Shigella), in the Gammaproteobacteria (Pseudomonas) and in Betaproteobacteria (Burkholderia and Bordetella) (Tan et al., 2012). It follows that, with the absence of cardiolipin synthase in T. aquaticum, alternative homologous genes may be present, or an as yet unidentified pathway may exist. To further test the robustness of this approach, Patel (2018) compared the reported laboratory data with the corresponding genomes of 100 organisms (Fig. 9.5). Although the sample size was limited, some patterns were observed.
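The enzyme-by-enzyme reasoning above can be written down as a small rule set. The Python sketch below is an illustration only and not the procedure used by Patel (2018): the function names are hypothetical, the rule that DPG requires a complete PG pathway is a deliberately strict assumption, and because the text gives no EC number for cardiolipin synthase a plain label is used as a placeholder key. It takes the set of enzymes annotated in a genome, predicts which of PS, PE, PG and DPG would be expected on a TLC plate, and labels each lipid concordant or discordant against the lipids reported in the laboratory.

```python
from typing import Dict, Set

# EC numbers as quoted in the text and in Fig. 9.4.
PS_SYNTHASE = "2.7.8.8"        # phosphatidylserine synthase
PS_DECARBOXYLASE = "4.1.1.65"  # Psd, converts PS to PE
PGP_SYNTHASE = "2.7.8.5"       # phosphatidylglycerophosphate synthetase
PGP_PHOSPHATASE = "3.1.3.27"   # phosphatidylglycerophosphatase A
# Placeholder key (assumption): no EC number is given in the text.
CARDIOLIPIN_SYNTHASE = "cardiolipin synthase"


def predict_lipids(enzymes: Set[str]) -> Dict[str, bool]:
    """Predict detectable lipids from the annotated enzyme set."""
    makes_ps = PS_SYNTHASE in enzymes
    has_psd = PS_DECARBOXYLASE in enzymes
    makes_pg = PGP_SYNTHASE in enzymes and PGP_PHOSPHATASE in enzymes
    return {
        "PS": makes_ps and not has_psd,   # PS accumulates only if not decarboxylated
        "PE": makes_ps and has_psd,
        "PG": makes_pg,
        "DPG": makes_pg and CARDIOLIPIN_SYNTHASE in enzymes,
    }


def concordance(predicted: Dict[str, bool], observed: Set[str]) -> Dict[str, str]:
    """Label each lipid by agreement between prediction and laboratory report."""
    return {
        lipid: "concordant" if expected == (lipid in observed) else "discordant"
        for lipid, expected in predicted.items()
    }


# Enzyme set following the description of E. peruensis in the text:
# EC 3.1.3.27 is not predicted, cardiolipin synthase is.
e_peruensis = {PS_SYNTHASE, PS_DECARBOXYLASE, PGP_SYNTHASE, CARDIOLIPIN_SYNTHASE}
report = concordance(predict_lipids(e_peruensis), observed={"DPG", "PG", "PE"})
```

Under these rules PE and the absence of PS agree with the plate, whereas PG and DPG are flagged as discordant, which mirrors the discrepancy described for the missing phosphatidylglycerophosphatase A.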
Observations were considered as concordant (in agreement) when the genomic data matched the lipid data reported in the literature, and discordant when the literature-based information did not match the genomic data present in the KEGG database. When DPG was examined, there was higher concordance within the phylum Firmicutes compared to other phyla. Discordance was observed within the other groups, where the presence of DPG in the literature encountered lack of the gene for DPG synthesis in the genome. With regards to PE, results showed concordance

[Fig. 9.3: tree of MurE ligase sequences with leaf labels annotated as UDP-N-acetylmuramoyl-dipeptide--L-ornithine ligase (Deinococcus species and Arsenicicoccus sp.), UDP-N-acetylmuramoyl-dipeptide--2,6-diaminopimelate ligase (Aeromonas species, Acidaminococcus fermentans, Jonquetella anthropi and Acetobacter indonesiensis) and UDP-N-acetylmuramoyl-dipeptide--L-lysine ligase (Enterococcus and Lactobacillus species, Catellicoccus marimammalium and Clostridium difficile); scale bar, 10%.]

Fig. 9.3. Phylogenetic tree of UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ligase, UDP-acetylmuramoylalanyl-D-glutamate-L-Lysine ornithine and UDP-acetylmuramoylalanyl-D-glutamate-2,6,-diaminopimelate ligase. The phylogenetic analysis was performed using MEGA X (Kumar et al., 2018) using the maximum-likelihood (Felsenstein, 1981) employing the Kimura 2-parameter substitution model (Kimura, 1980). Bootstrap values (%) were obtained with 1000 replicates and are displayed on their relative branches (Felsenstein, 1985).

P.A. Lawson and N.B. Patel

[Figure 9.3, additional tip label: Aeromonas lusitana MDC 2473, UDP-N-acetylmuramoyl-dipeptide--2,6-diaminopimelate ligase]

The Strength of Chemotaxonomy

[Figure 9.4, panels A–D. Panels A and C: TLC plates; recoverable spot labels include DPG, PG, PE, PL1–PL4, L1–L5, GL1, AL1, A1, A2, P1–P4, YPig1 and YPig2. Panels B and D: biosynthetic pathway. CDP-diacylglycerol → phosphatidyl-L-serine (phosphatidylserine synthase, EC 2.7.8.8) → phosphatidylethanolamine (phosphatidylserine decarboxylase, EC 4.1.1.65); CDP-diacylglycerol → phosphatidylglycerophosphate (phosphatidylglycerophosphate synthetase, EC 2.7.8.5) → phosphatidylglycerol (phosphatidylglycerophosphatase A, EC 3.1.3.27) → cardiolipin (cardiolipin synthase, Cls; ClsA/B, ClsC).]

Fig. 9.4. Polar lipid analysis. Laboratory-based information compared to in silico information.

with genomic data and were consistent between the different phyla. Analysis of PG, however, showed that the majority of genomes had only one of the two enzymes required to synthesize the lipid (phosphatidylglycerophosphate synthase (EC 2.7.8.5) and phosphatidylglycerophosphatase A (EC 3.1.3.27)), even when the literature reported the lipid on TLC plates. Across all taxa, PS is rarely reported, but phosphatidyl-L-serine synthase (EC 2.7.8.8) is present in the genomes of the majority of organisms. It is pertinent to note, however, that examination of the TLC plates published in the literature showed that some spots were mislabelled or not labelled at all, a common criticism of the TLC method of analysing polar lipids. On the whole, and based on these observations, the in silico method of predicting polar lipids is promising, but additional studies should be encouraged.

[Figure 9.5: phylogenetic tree (scale values 0.06 and 0.03) of reference strains grouped by phylum (Firmicutes, Actinobacteria, Spirochaetes, Acidobacteria, Bacteroidetes and Proteobacteria), with paired TLC and KEGG columns for PE, PS, PG and DPG. Key: Present; Absent; Not Reported; pgpA Absent.]

Fig. 9.5. Polar lipid analysis. Phylogenetic groups with laboratory-based information compared to in silico information.

With respect to fatty acid biosynthesis, the pathways for the fatty acid products are well documented (Cronan and Thomas, 2009) and are present in the KEGG pathway, but few enzymes have been assigned EC numbers (www.genome.jp/kegg-bin/show_pathway?map00061/, accessed 7 July 2020). Among the enzymes that have been assigned EC numbers are those involved in chain elongation and the final production of individual fatty acids. For example, the medium-chain acyl-[acyl-carrier-protein] hydrolases for C10:0 (decanoic acid) and C12:0 (dodecanoic acid) have been assigned the same EC number (3.1.2.21); similarly, for C14:0 (tetradecanoic acid), C16:0 (hexadecanoic acid) and C18:0 (octadecanoic acid) the EC number is 3.1.2.14. However, probably owing to the widespread use of automated systems such as the MIDI system, this in silico approach has not been applied to taxonomic investigations. Similarly, the genes for MAs within the Corynebacteriales have been investigated (Lanéelle et al., 2013) but have not been assigned EC numbers or incorporated into in silico investigations. It is envisaged that, with the huge amount of genomic data being generated, metabolic pathways and individual genes responsible for the synthesis of particular fatty acids will be identified with greater accuracy and utilized in future in silico analyses.

Databases such as KEGG are based on reference organisms, and many phylogenetic groups are not represented. Furthermore, the majority of organisms cannot be cultivated; it follows, therefore, that many novel genes and biosynthetic pathways may remain to be elucidated. It is apparent that relatively few genes/enzymes have been characterized and experimentally verified. Indeed, it was reported that 30% of EC numbers lack sequence data (Feist et al., 2009; Kanehisa et al., 2016), although this figure is likely to decrease over time. The KEGG analyses that have been performed to date, therefore, have been based on a small set of experimentally verified enzymes. This would also account for the discrepancies seen between the experimentally derived information and genomic data and, again, further investigations are encouraged. As knowledge of genes and biochemical pathways derived from genomic studies grows, investigators should be encouraged to include this information in formal taxonomic descriptions. The correlation of experimentally derived information with information extracted from the genome will improve confidence in in silico modelling of chemotaxonomic biomarkers.
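The comparison of laboratory (TLC) reports with enzymes found in a genome can be sketched in a few lines. The example below is a minimal illustration, not the authors' workflow: it uses only the EC numbers quoted in this chapter for PS, PE and PG, omits DPG because no EC number is given here for cardiolipin synthase, and the names `REQUIRED_ECS`, `missing_enzymes` and `compare_with_tlc`, as well as the example genome, are hypothetical.

```python
# EC numbers as quoted in the text; DPG is omitted (no EC number given here).
REQUIRED_ECS = {
    "PS": {"2.7.8.8"},
    "PE": {"2.7.8.8", "4.1.1.65"},
    "PG": {"2.7.8.5", "3.1.3.27"},
}


def missing_enzymes(genome_ecs):
    """Return, for each lipid, the required EC numbers absent from the genome."""
    genome_ecs = set(genome_ecs)
    return {lipid: required - genome_ecs for lipid, required in REQUIRED_ECS.items()}


def compare_with_tlc(genome_ecs, tlc_reported):
    """Label each lipid by agreement between genome prediction and TLC report."""
    report = {}
    for lipid, gaps in missing_enzymes(genome_ecs).items():
        predicted = not gaps            # complete enzyme set found in the genome
        reported = lipid in tlc_reported
        if predicted and reported:
            report[lipid] = "concordant (present)"
        elif not predicted and not reported:
            report[lipid] = "concordant (absent)"
        elif reported:
            report[lipid] = "TLC only; genome lacks EC " + ", ".join(sorted(gaps))
        else:
            report[lipid] = "genome only"
    return report


# Hypothetical genome annotation and TLC report, chosen to mirror the
# discrepancies described in the text (PG reported but pgpA missing;
# PS synthase present but PS not reported).
example_genome = {"2.7.8.8", "4.1.1.65", "2.7.8.5"}
example_tlc = {"PE", "PG"}
for lipid, status in compare_with_tlc(example_genome, example_tlc).items():
    print(lipid, status)
```

A "TLC only" result may reflect an unannotated or uncharacterized enzyme rather than a laboratory error, and a "genome only" result may reflect an unlabelled spot, so either outcome is a prompt for further investigation rather than a verdict.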


The aforementioned computational programs will undoubtedly improve, yielding more precise phenotypic and chemotaxonomic predictions, as artificial intelligence, machine learning and neural networks are incorporated (Bini, 2018; Qu et al., 2019).

Conclusion: Chemotaxonomy and What Lies Ahead

The inevitable march towards reliance on the genome of Bacteria and Archaea, combined with the diminishing number of laboratories that focus on microbial cultivation (and even fewer with the capability to perform chemotaxonomic methods), raises the question: ‘What lies ahead?’ First, although the technology for these chemotaxonomic methods is not a barrier in itself, the lack of automation and high-throughput approaches has certainly discouraged many investigators from routinely utilizing these techniques. Second, many investigators now question the very application of chemotaxonomic biomarkers as a useful tool in the characterization and description of novel taxa. Despite all the debate within the literature, the format of taxonomic descriptions has changed little over the past 50 years or so. Depending on the taxa in question, most journals, editors and reviewers still insist on chemotaxonomic traits being included in the protologue of novel taxa (Tindall et al., 2010; Rainey, 2011). While the demarcation of species and genera is now defined mostly by methods based on DNA sequences and measures of sequence relatedness such as ANI and amino acid identity (AAI), respectively, chemotaxonomic information can strongly support these affiliations. The delineation of higher taxa at the family level and above may especially be aided by chemotaxonomic criteria, as demonstrated in published minimum standards. Although chemotaxonomic methods have been enormously important in past identification and classification schemes, it remains to be seen in what form they will be utilized in the genomic era, and within the suite of methods available in the era of omics.

References

Abel, K., DeSchmertzing, H. and Peterson, J.I. (1963) Classification of microorganisms by analysis of chemical composition I. Feasibility of utilizing gas chromatography. Journal of Bacteriology 85, 1039–1044. https://doi.org/10.1128/JB.85.5.1039-1044.1963


Adam, A., Petit, J.F., Wietzerbin-Falszpan, J., Sinay, P., Thomas, D.W. and Lederer, E. (1969) L'acide N-glycolyl-muramique, constituant des parois de Mycobacterium smegmatis: identification par spectrometrie de masses. FEBS Letters 4, 87–92. https://doi.org/10.1016/0014-5793(69)80203-6
Amaral, G.R.S., Dias, G.M., Wellington-Oguri, M., Chimetto, L., Campeão, M.E., Thompson, F.L. and Thompson, C.C. (2014) Genotype to phenotype: identification of diagnostic vibrio phenotypes using whole genome sequences. International Journal of Systematic and Evolutionary Microbiology 64, 357–365. https://doi.org/10.1099/ijs.0.057927-0
Ames, G.F. (1968) Lipids of Salmonella typhimurium and Escherichia coli: Structure and metabolism. Journal of Bacteriology 95, 833–843. https://doi.org/10.1128/JB.95.3.833-843.1968
Anderson, R.J. (1929) The chemistry of the lipoids of tubercle bacilli. Journal of Biological Chemistry 83, 505–522.
Arahal, D.R., Vreeland, R.H., Litchfield, C.D., Mormile, M.R., Tindall, B.J., Oren, A. et al. (2007) Recommended minimal standards for describing new taxa of the family Halomonadaceae. International Journal of Systematic and Evolutionary Microbiology 57 (10), 2436–2446. https://doi.org/10.1099/ijs.0.65430-0
Aronson, R.J. (1898) Zur Biologie der Tuberkelbacillen. Berliner Klinische Wochenschrift 35, 484–486.
Asselineau, J. and Lederer, E. (1950) Structure of the mycolic acids of Mycobacteria. Nature 166, 782–783. https://doi.org/10.1038/166782a0
Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A. et al. (2008) The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9 (1), 75. https://doi.org/10.1186/1471-2164-9-75
Barksdale, L. and Kim, K.-S. (1977) Mycobacterium. Bacteriological Reviews 41, 217–372. https://doi.org/10.1128/MMBR.41.1.217-372.1977
Barona-Gómez, F., Cruz-Morales, P. and Noda-García, L. (2012) What can genome-scale metabolic network reconstructions do for prokaryotic systematics? Antonie van Leeuwenhoek 101, 35–43. https://doi.org/10.1007/s10482-011-9655-1
Becker, B., Lechevalier, M.P. and Lechevalier, H.A. (1965) Chemical composition of cell-wall preparations from strains of various form-genera of aerobic Actinomycetes. Applied Microbiology 13, 236–243. https://doi.org/10.1128/AEM.13.2.236-243.1965
Beifuss, U. and Tietze, M. (2005) Methanophenazine and other natural biologically active phenazines. In: Mulzer, J. (ed.) Natural Product Synthesis II. Topics in Current Chemistry. pp. 77–113. https://doi.org/10.1007/b96889
Bertelli, C. and Greub, G. (2013) Rapid bacterial genome sequencing: methods and applications in clinical microbiology. Clinical Microbiology and Infection 19, 803–813. https://doi.org/10.1111/1469-0691.12217
Bini, S.A. (2018) Artificial Intelligence, Machine Learning, Deep Learning, and Cognitive Computing: what do these terms mean and how will they impact health care? Journal of Arthroplasty 33, 2358–2361. https://doi.org/10.1016/j.arth.2018.02.067
Bird, S.S., Marur, V.R., Sniatynski, M.J., Greenberg, H.K. and Kristal, B.S. (2011) Lipidomics profiling by high-resolution LC−MS and high-energy collisional dissociation fragmentation: focus on characterization of mitochondrial cardiolipins and monolysocardiolipins. Analytical Chemistry 83, 940–949. https://doi.org/10.1021/ac102598u
Bishop, D., Pandya, K. and King, H. (1962) Ubiquinone and vitamin K in bacteria. Biochemical Journal 83, 606–614. https://doi.org/10.1042/bj0830606
Bousfield, I.J., Smith, G.L., Dando, T.R. and Hobbs, G. (1983) Numerical analysis of total fatty acid profiles in the identification of coryneform, nocardioform and some other bacteria. Microbiology 129, 375–394. https://doi.org/10.1099/00221287-129-2-375
Busse, H.J. (2011) Polyamines. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 239–259. https://doi.org/10.1016/B978-0-12-387730-7.00011-5
Busse, H.-J., Denner, E.B.M. and Lubitz, W. (1996) Classification and identification of bacteria: current approaches to an old problem. Overview of methods used in bacterial systematics. Journal of Biotechnology 47, 3–38. https://doi.org/10.1016/0168-1656(96)01379-X
Buyukpamukcu, E., Hau, J., Fay, L.-B. and Dionisi, F. (2007) Analysis of phospholipids using electrospray ionisation tandem mass spectrometry. Lipid Technology 19, 136–138. https://doi.org/10.1002/lite.200700046
Caspi, R., Altman, T., Dreher, K., Fulcher, C.A., Subhraveti, P., Keseler, I.M. et al. (2012) The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Research 40, D742–D753. https://doi.org/10.1093/nar/gkr1014
Christensen, H., Kuhnert, P., Busse, H.-J., Frederiksen, W.C. and Bisgaard, M. (2007) Proposed minimal standards for the description of genera, species and subspecies of the Pasteurellaceae. International Journal of Systematic and Evolutionary Microbiology 57, 166–178. https://doi.org/10.1099/ijs.0.64838-0


Chun, J. and Rainey, F.A. (2014) Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. International Journal of Systematic and Evolutionary Microbiology 64, 316–324. https://doi.org/10.1099/ijs.0.054171-0
Collins, M.D. and Jones, D. (1981) Distribution of isoprenoid quinone structural types in bacteria and their taxonomic implication. Microbiological Reviews 45, 316–354. https://doi.org/10.1128/MMBR.45.2.316-354.1981
Costa, M.S. da, Albuquerque, L., Nobre, M.F. and Wait, R. (2011a) The extraction and identification of respiratory lipoquinones of prokaryotes and their use in taxonomy. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 197–206. https://doi.org/10.1016/B978-0-12-387730-7.00009-7
Costa, M.S. da, Albuquerque, L., Nobre, M.F. and Wait, R. (2011b) The identification of fatty acids in bacteria. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 183–196. https://doi.org/10.1016/B978-0-12-387730-7.00008-5
Costa, M.S. da, Albuquerque, L., Nobre, M.F. and Wait, R. (2011c) The identification of polar lipids in Prokaryotes. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 165–181. https://doi.org/10.1016/B978-0-12-387730-7.00007-3
Crane, F.L. (1965) Distribution of ubiquinones. In: Biochemistry of Quinones. pp. 183–206.
Cronan, J.E. and Thomas, J. (2009) Bacterial fatty acid synthesis and its relationships with polyketide synthetic pathways. Methods in Enzymology 459, 395–433. https://doi.org/10.1016/S0076-6879(09)04617-5
Cummins, C.S. (1956) The chemical composition of the bacterial cell wall. International Review of Cytology 5, 25–50. https://doi.org/10.1016/S0074-7696(08)62566-8
Cummins, C.S. and Harris, H. (1956) The chemical composition of the cell wall in some gram-positive bacteria and its possible value as a taxonomic character. Journal of General Microbiology 14, 583–600. https://doi.org/10.1099/00221287-14-3-583
Daffé, M. and Draper, P. (1998) The envelope layers of mycobacteria with reference to their pathogenicity. Advances in Microbial Physiology 39, 131–203. https://doi.org/10.1016/S0065-2911(08)60016-8
Davis, J.J., Wattam, A.R., Aziz, R.K., Brettin, T., Butler, R., Butler, R.M. et al. (2019) The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities. Nucleic Acids Research 48, D606–D612. https://doi.org/10.1093/nar/gkz943
Ejsing, C.S., Duchoslav, E., Sampaio, J., Simons, K., Bonner, R., Thiele, C. et al. (2006) Automated identification and quantification of glycerophospholipid molecular species by multiple precursor ion scanning. Analytical Chemistry 78, 6202–6214. https://doi.org/10.1021/ac060545x
El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C. et al. (2018) The Pfam protein families database in 2019. Nucleic Acids Research 47, gky995. https://doi.org/10.1093/nar/gky995
Evans, R.I., McClure, P.J., Gould, G.W. and Russell, N.J. (1998) The effect of growth temperature on the phospholipid and fatty acyl compositions of non-proteolytic Clostridium botulinum. International Journal of Food Microbiology 40, 159–167. https://doi.org/10.1016/S0168-1605(98)00029-4
Falb, M., Müller, K., Königsmaier, L., Oberwinkler, T., Horn, P., Gronau, S. von et al. (2008) Metabolism of halophilic archaea. Extremophiles 12, 177–196. https://doi.org/10.1007/s00792-008-0138-x
Feist, A.M., Herrgård, M.J., Thiele, I., Reed, J.L. and Palsson, B.Ø. (2009) Reconstruction of biochemical networks in microorganisms. Nature Reviews Microbiology 7, 129–143.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. Journal of Molecular Evolution 17, 368–376. https://doi.org/10.1007/BF01734359
Felsenstein, J. (1985) Confidence limits on phylogenies: An approach using the Bootstrap. Evolution 39, 783–791. https://doi.org/10.1111/j.1558-5646.1985.tb00420.x
Freney, J., Kloos, W.E., Hajek, V., Webster, J.A., Bes, M., Brun, Y. and Vernozy-Rozand, C. (1999) Recommended minimal standards for description of new staphylococcal species. International Journal of Systematic Bacteriology 49, 489–502. https://doi.org/10.1099/00207713-49-2-489
Gambacorta, A., Gliozzi, A. and Rosa, M.D. (1995) Archaeal lipids and their biotechnological applications. World Journal of Microbiology and Biotechnology 11, 115–131. https://doi.org/10.1007/BF00339140
Garrett, T.A., O’Neill, A.C. and Hopson, M.L. (2012) Quantification of cardiolipin molecular species in Escherichia coli lipid extracts using liquid chromatography/electrospray ionization mass spectrometry. Rapid Communications in Mass Spectrometry 26, 2267–2274. https://doi.org/10.1002/rcm.6350
Garrity, G.M. (2016) A new genomics-driven taxonomy of Bacteria and Archaea: Are we there yet? Journal of Clinical Microbiology 54, 1956–1963. https://doi.org/10.1128/JCM.00200-16
Garrity, G.M. and Oren, A. (2013) Response to Sutcliffe et al.: regarding the International Committee on Systematics of Prokaryotes. Trends in Microbiology 21, 53–55. https://doi.org/10.1016/j.tim.2012.12.003
Gebhardt, H., Meniche, X., Tropis, M., Krämer, R., Daffé, M. and Morbach, S. (2007) The key role of the mycolic acid content in the functionality of the cell wall permeability barrier in Corynebacterineae. Microbiology 153, 1424–1434. https://doi.org/10.1099/mic.0.2006/003541-0


Ghuysen, J.M. (1968) Use of bacteriolytic enzymes in determination of wall structure and their role in cell metabolism. Bacteriological Reviews 32, 425–464. https://doi.org/10.1128/MMBR.32.4_Pt_2.425-464.1968
Goldfine, H. and Bloch, K. (1961) On the origin of unsaturated fatty acids in clostridia. Journal of Biological Chemistry 236, 2596–2601.
Goodfellow, M. and Minnikin, D. (1985) Introduction to chemosystematics. In: Goodfellow, M. and O’Donnell, A.G. (eds) Chemical Methods in Bacterial Systematics. Academic Press, New York, pp. 1–15.
Goodfellow, M. and O’Donnell, A.G. (1994) Chemosystematics: current state and future prospects. In: Goodfellow, M. and Minnikin, D. (eds) Chemical Methods in Prokaryotic Systematics. pp. 1–20.
Goris, J., Konstantinidis, K.T., Klappenbach, J.A., Coenye, T., Vandamme, P.A.R. and Tiedje, J.M. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology 57, 81–91. https://doi.org/10.1099/ijs.0.64483-0
Gvozdiak, O.R., Schumann, P., Griepenburg, U. and Auling, G. (1998) Polyamine profiles of Gram-positive catalase positive cocci. Systematic and Applied Microbiology 21, 279–284. https://doi.org/10.1016/S0723-2020(98)80034-9
Hagen, P.O. (1974) Lipids of Sphaerophorus ridiculosis: plasmalogen composition. Journal of Bacteriology 119, 643–645. https://doi.org/10.1128/JB.119.2.643-645.1974
Hahnke, R.L., Meier-Kolthoff, J.P., García-López, M., Mukherjee, S., Huntemann, M., Ivanova, N.N. et al. (2016) Genome-based taxonomic classification of Bacteroidetes. Frontiers in Microbiology 7, 2003. https://doi.org/10.3389/fmicb.2016.02003
Hamana, K. and Matsuzaki, S. (1993) Polyamine distribution patterns serve as a phenotypic marker in the chemotaxonomy of the Proteobacteria. Canadian Journal of Microbiology 39, 304–310. https://doi.org/10.1139/m93-043
Hedlund, B.P., Dodsworth, J.A. and Staley, J.T. (2015) The changing landscape of microbial biodiversity exploration and its implications for systematics. Systematic and Applied Microbiology 38, 231–236. https://doi.org/10.1016/j.syapm.2015.03.003
Hosoya, R., Hamana, K., Niitsu, M. and Itoh, T. (2004) Polyamine analysis for chemotaxonomy of thermophilic eubacteria: Polyamine distribution profiles within the orders Aquificales, Thermotogales, Thermodesulfobacteriales, Thermales, Thermoanaerobacteriales, Clostridiales and Bacillales. Journal of General and Applied Microbiology 50, 271–287. https://doi.org/10.2323/jgam.50.271
Imhoff, J.F. (1991) Polar lipids and fatty acids in the genus Rhodobacter. Systematic and Applied Microbiology 14, 228–234. https://doi.org/10.1016/S0723-2020(11)80373-5
Imhoff, J.F. (2015) Genus Rhodobacter. In: Whitman, W.B., Rainey, F., Kämpfer, P., Trujillo, M., Chun, J., DeVos, P., Hedlund, B. and Dedysh, S. (eds) Bergey’s Manual of Systematics of Archaea and Bacteria. John Wiley & Sons, Inc., New York, NY. https://doi.org/10.1002/9781118960608.gbm00862
Imhoff, J.F. and Bias-Imhoff, U. (1995) Lipids, quinones and fatty acids of anoxygenic phototrophic bacteria. In: Blankenship, R.E., Madigan, M.T. and Bauer, C.E. (eds) Anoxygenic Photosynthetic Bacteria. Advances in Photosynthesis and Respiration, vol 2. Springer, Dordrecht, pp. 179–205. https://doi.org/10.1007/0-306-47954-0_10
Jackson, M. (2014) The mycobacterial cell envelope-lipids. Cold Spring Harbor Perspectives in Medicine 4, a021105. https://doi.org/10.1101/cshperspect.a021105
Jain, S., Caforio, A. and Driessen, A.J.M. (2014) Biosynthesis of archaeal membrane ether lipids. Frontiers in Microbiology 5, 641. https://doi.org/10.3389/fmicb.2014.00641
James, A.T. and Martin, A.J.P. (1952) Gas-liquid partition chromatography: the separation and micro-estimation of volatile fatty acids from formic acid to dodecanoic acid. Biochemical Journal 50, 679–690. https://doi.org/10.1042/bj0500679
Jensen, D.B., Vesth, T.C., Hallin, P.F., Pedersen, A.G. and Ussery, D.W. (2012) Bayesian prediction of bacterial growth temperature range based on genome sequences. BMC Genomics 13, S3. https://doi.org/10.1186/1471-2164-13-S7-S3
Jensen, N.J. and Gross, M.L. (1988) A comparison of mass spectrometry methods for structural determination and analysis of phospholipids. Mass Spectrometry Reviews 7, 41–69. https://doi.org/10.1002/mas.1280070103
Johnston, N.C., Aygun-Sunar, S., Guan, Z., Ribeiro, A.A., Daldal, F., Raetz, C.R.H. and Goldfine, H. (2010) A phosphoethanolamine-modified glycosyl diradylglycerol in the polar lipids of Clostridium tetani. Journal of Lipid Research 51, 1953–1961. https://doi.org/10.1194/jlr.M004788
Kamio, Y., Kim, K.C. and Takahashi, H. (1970) Glyceryl ether phospholipids in Selenomonas ruminantium. Journal of General and Applied Microbiology 16, 291–300. https://doi.org/10.2323/jgam.16.4_291


Kamp, J.A. den, Redai, I. and Deenen, L.L. van (1969) Phospholipid composition of Bacillus subtilis. Journal of Bacteriology 99, 298–303. https://doi.org/10.1128/JB.99.1.298-303.1969
Kämpfer, P. (2014) Continuing importance of the ‘Phenotype’ in the genomic era. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 41. New Approaches to Prokaryotic Systematics. Academic Press, London, pp. 307–320. https://doi.org/10.1016/bs.mim.2014.07.005
Kämpfer, P. and Kroppenstedt, R.M. (1996) Numerical analysis of fatty acid patterns of coryneform bacteria and related taxa. Canadian Journal of Microbiology 42, 989–1005. https://doi.org/10.1139/m96-128
Kandler, O. and König, H. (1998) Cell wall polymers in Archaea (Archaebacteria). Cellular and Molecular Life Sciences 54, 305–308. https://doi.org/10.1007/s000180050156
Kaneda, T. (1991) Iso- and anteiso-fatty acids in bacteria: biosynthesis, function, and taxonomic significance. Microbiology and Molecular Biology Reviews 55, 288–302. https://doi.org/10.1128/MMBR.55.2.288-302.1991
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. and Morishima, K. (2016) KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Research 45, D353–D361. https://doi.org/10.1093/nar/gkw1092
Kates, M. (1964) Bacterial lipids. In: Paoletti, R. and Kritchevsky, D. (eds) Advances in Lipid Research. Academic Press, New York, pp. 17–90. https://doi.org/10.1016/B978-1-4831-9938-2.50008-X
Khuller, G.K. and Brennan, P.J. (1972) The polar lipids of some species of Nocardia. Journal of General Microbiology 73, 409–412. https://doi.org/10.1099/00221287-73-2-409
Khuller, G.K. and Goldfine, H. (1974) Phospholipids of Clostridium butyricum. V. Effects of growth temperature on fatty acid, alk-1-enyl ether group, and phospholipid composition. Journal of Lipid Research 15, 500–507.
Kimura, M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16, 111–120. https://doi.org/10.1007/BF01731581
Kneifel, H., Stetter, K.O., Andreesen, J.R., Wiegel, J., König, H. and Schoberth, S.M. (1986) Distribution of polyamines in representative species of Archaebacteria. Systematic and Applied Microbiology 7, 241–245. https://doi.org/10.1016/S0723-2020(86)80013-3
Koga, Y. (2010) The biosynthesis and evolution of archaeal membranes and ether phospholipids. In: Timmis, K.N. (ed.) Handbook of Hydrocarbon and Lipid Microbiology. Springer-Verlag, Berlin, Heidelberg, pp. 451–458. https://doi.org/10.1007/978-3-540-77587-4_33
Koga, Y. and Morii, H. (2007) Biosynthesis of ether-type polar lipids in archaea and evolutionary considerations. Microbiology and Molecular Biology Reviews 71, 97–120. https://doi.org/10.1128/MMBR.00033-06
Köser, C.U., Ellington, M.J., Cartwright, E.J.P., Gillespie, S.H., Brown, N.M., Farrington, M. et al. (2012) Routine use of microbial whole genome sequencing in diagnostic and public health microbiology. PLOS Pathogens 8, e1002824. https://doi.org/10.1371/journal.ppat.1002824
Kumar, S., Stecher, G., Li, M., Knyaz, C. and Tamura, K. (2018) MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35, 1547–1549. https://doi.org/10.1093/molbev/msy096
Kushwaha, S.C., Kates, M., Sprott, G.D. and Smith, I.C.P. (1981) Novel polar lipids from the methanogen Methanospirillum hungatei GP1. Biochimica et Biophysica Acta (BBA) - Lipids and Lipid Metabolism 664, 156–173. https://doi.org/10.1016/0005-2760(81)90038-2
Lagkouvardos, I., Overmann, J. and Clavel, T. (2017) Cultured microbes represent a substantial fraction of the human and mouse gut microbiota. Gut Microbes 91, 1–11. https://doi.org/10.1080/19490976.2017.1320468
Lajudie, P.M. de, Andrews, M., Ardley, J., Eardly, B., Jumas-Bilak, E., Kuzmanović, N. et al. (2019) Minimal standards for the description of new genera and species of rhizobia and agrobacteria. International Journal of Systematic and Evolutionary Microbiology 69, 1852–1863. https://doi.org/10.1099/ijsem.0.003426
Lanéelle, M.-A., Eynard, N., Spina, L., Lemassu, A., Laval, F., Huc, E. et al. (2013) Structural elucidation and genomic scrutiny of the C60-C100 mycolic acids of Segniliparus rotundus. Microbiology 159, 191–203. https://doi.org/10.1099/mic.0.063479-0
Lanéelle, M.-A., Launay, A., Spina, L., Marrakchi, H., Laval, F., Eynard, N. et al. (2011) A novel mycolic acid species defines two novel genera of the Actinobacteria, Hoyosella and Amycolicicoccus. Microbiology 158, 843–855. https://doi.org/10.1099/mic.0.055509-0
Langworthy, T.A. and Pond, J.L. (1986) Archaebacterial ether lipids and chemotaxonomy. Systematic and Applied Microbiology 7, 253–257. https://doi.org/10.1016/S0723-2020(86)80015-7
Lata, P., Lal, D. and Lal, R. (2012) Flavobacterium ummariense sp. nov., isolated from hexachlorocyclohexane-contaminated soil, and emended description of Flavobacterium ceti Vela et al. 2007. International Journal of Systematic and Evolutionary Microbiology 62, 2674–2679. https://doi.org/10.1099/ijs.0.030916-0


Lawson, P.A., Sankaranarayanan, K., Patel, N.B. and Busse, H.-J. (2016) In-silico chemotaxonomy: A tool for 21st century microbial systematics. In: Lawson, P.A. (ed.) Bulletin of Bergey’s International Society for Microbial Systematics, Abstracts Book. p. 27.
Lechevalier, M.P. and Moss, C.W. (1977) Lipids in bacterial taxonomy - a taxonomist’s view. CRC Critical Reviews in Microbiology 5, 109–210. https://doi.org/10.3109/10408417709102311
Lee, I., Kim, Y.O., Park, S.-C. and Chun, J. (2016) OrthoANI: An improved algorithm and software for calculating average nucleotide identity. International Journal of Systematic and Evolutionary Microbiology 66, 1100–1103. https://doi.org/10.1099/ijsem.0.000760
Levy-Frebault, V.V. and Portaels, F. (1992) Proposed minimal standards for the genus Mycobacterium and for description of new slowly growing Mycobacterium species. International Journal of Systematic Bacteriology 42, 315–323. https://doi.org/10.1099/00207713-42-2-315
Logan, N.A., Berge, O., Bishop, A.H., Busse, H.-J., Vos, P. de, Fritze, D. et al. (2009) Proposed minimal standards for describing new taxa of aerobic, endospore-forming bacteria. International Journal of Systematic and Evolutionary Microbiology 59, 2114–2121. https://doi.org/10.1099/ijs.0.013649-0
Loman, N.J., Constantinidou, C., Chan, J.Z.M., Halachev, M., Sergeant, M., Penn, C.W. et al. (2012) High-throughput bacterial genome sequencing: an embarrassment of choice, a world of opportunity. Nature Reviews Microbiology 10, 599–606. https://doi.org/10.1038/nrmicro2850
Losey, N.A., Stevenson, B.S., Busse, H.-J., Damste, J.S.S., Rijpstra, W.I.C., Rudd, S. and Lawson, P.A. (2013) Thermoanaerobaculum aquaticum gen. nov., sp. nov., the first cultivated member of Acidobacteria subdivision 23, isolated from a hot spring. International Journal of Systematic and Evolutionary Microbiology 63, 4149–4157. https://doi.org/10.1099/ijs.0.051425-0
Ludwig, W., Euzéby, J., Schumann, P., Busse, H.-J., Trujillo, M.E., Kämpfer, P. and Whitman, W.B. (2012) Road map of the phylum Actinobacteria. In: Bergey’s Manual of Systematic Bacteriology, Volume 4, The Actinobacteria. pp. 1–28. https://doi.org/10.1007/978-0-387-68233-4_1
Lugtenberg, E.J. and Dam, A. v S. (1972) Temperature-sensitive mutants of Escherichia coli K-12 with low activity of the diaminopimelic acid adding enzyme. Journal of Bacteriology 110, 41–46. https://doi.org/10.1128/JB.110.1.41-46.1972
Mahato, N.K., Gupta, V., Singh, P., Kumari, R., Verma, H., Tripathi, C. et al. (2017) Microbial taxonomy in the era of OMICS: application of DNA sequences, computational tools and techniques. Antonie van Leeuwenhoek 110, 1357–1371. https://doi.org/10.1007/s10482-017-0928-1
Makula, R.A. and Finnerty, W.R. (1974) Phospholipid composition of Desulfovibrio species. Journal of Bacteriology 120, 1279–1283. https://doi.org/10.1128/JB.120.3.1279-1283.1974
Marrakchi, H., Lanéelle, M.-A. and Daffé, M. (2014) Mycolic acids: structures, biosynthesis, and beyond. Chemistry & Biology 21, 67–85. https://doi.org/10.1016/j.chembiol.2013.11.011
Martinez-Morales, F., Schobert, M., Lopez-Lara, I.M. and Geiger, O. (2003) Pathways for phosphatidylcholine biosynthesis in bacteria. Microbiology 149, 3461–3471. https://doi.org/10.1099/mic.0.26522-0
Maruyama, I.N., Yamamoto, A.H. and Hirota, Y. (1988) Determination of gene products and coding regions from the murE-murF region of Escherichia coli. Journal of Bacteriology 170, 3786–3788. https://doi.org/10.1128/JB.170.8.3786-3788.1988
Mattarelli, P., Holzapfel, W.H., Franz, C.M.A.P., Endo, A., Felis, G.E., Hammes, W. et al. (2014) Recommended minimal standards for description of new taxa of the genera Bifidobacterium, Lactobacillus and related genera. International Journal of Systematic and Evolutionary Microbiology 64, 1434–1451. https://doi.org/10.1099/ijs.0.060046-0
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.-P. and Göker, M. (2013) Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14, 60. https://doi.org/10.1186/1471-2105-14-60
Mesbah, N.M., Whitman, W.B. and Mesbah, M. (2011) Determination of the G+C content of prokaryotes. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 299–324. https://doi.org/10.1016/B978-0-12-387730-7.00014-0
Michael, A.J. (2016) Polyamines in Eukaryotes, Bacteria, and Archaea. Journal of Biological Chemistry 291, 14896–14903. https://doi.org/10.1074/jbc.R116.734780
Michaud, C., Mengin-Lecreulx, D., Heijenoort, J. and Blanot, D. (1990) Over-production, purification and properties of the uridine-diphosphate-N-acetylmuramoyl-l-alanyl-d-glutamate: meso-2,6-diaminopimelate ligase from Escherichia coli. European Journal of Biochemistry 194, 853–861. https://doi.org/10.1111/j.1432-1033.1990.tb19479.x
Minnikin, D.E. (1982) Lipids: complex lipids, their chemistry, biosynthesis and roles. In: Ratledge, C. and Stanford, J.L. (eds) The Biology of the Mycobacteria. Academic Press, London, pp. 95–184.

The Strength of Chemotaxonomy


Minnikin, D.E. and Abdolrahimzadeh, H. (1971) Thin-layer chromatography of bacterial lipids on sodium acetate-impregnated silica gel. Journal of Chromatography A 63, 452–454. https://doi.org/10.1016/ S0021-9673(01)85672-7 Minnikin, D.E., Abdolrahimzadeh, H. and Baddiley, J. (1971) The interrelation of polar lipids in bacterial membranes. Biochimica et Biophysica Acta (BBA) – Biomembranes 249, 651–655. https://doi. org/10.1016/0005-2736(71)90148-9 Moore, E.R.B., Mihaylova, S.A., Vandamme, P.A.R., Krichevsky, M.I. and Dijkshoorn, L. (2010) Microbial systematics and taxonomy: relevance for a microbial commons. Research Microbiology 161, 430– 438. https://doi.org/10.1016/j.resmic.2010.05.007 Nguyen, N.-A.T., Sallans, L. and Kaneshiro, E.S. (2008) The major glycerophospholipids of the predatory and parasitic bacterium Bdellovibrio bacteriovorus HID5. Lipids 43, 1053–1063. https://doi. org/10.1007/s11745-008-3235-9 Nitschke, W., Kramer, D.M., Riedel, A. and Liebl, U. (1995) From naphtho- to benzoquinones-(r)evolutionary reorganisations of electron transfer chains. In: Mathis, P. (ed.) Photosynthesis: From Light to Biosphere, Vol. I, Kluwer Academic Publishers, Dordrecht, pp. 945–950. https://doi.org/10.1007/ 978-94-009-0173-5_225. Nouioui, I., Carro, L., García-López, M., Meier-Kolthoff, J.P., Woyke, T., Kyrpides, N.C. et al. (2018) Genome-based taxonomic classification of the phylum Actinobacteria. Frontiers in Microbiology 9, 2007. https://doi.org/10.3389/fmicb.2018.02007 Nowicka, B. and Kruk, J. (2010) Occurrence, biosynthesis and function of isoprenoid quinones. Biochimica et Biophysica Acta (BBA) – Bioenergetics 1797, 1587–1605. https://doi.org/10.1016/j.bbabio. 2010.06.007 Olsen, R.W. and Ballou, C.E. (1971) Acyl phosphatidylglycerol: a new phospholipid from Salmonella typhimurium. Journal of Biological Chemistry 246, 3305–3313. Oren, A., Ventosa, A. and Grant, W.D. (1997) Proposed minimal standards for description of new taxa in the order Halobacteriales. 
International Journal of Systematic Bacteriology 47, 233–238. https://doi. org/10.1099/00207713-47-1-233 O’Sullivan, D.M., Nicoara, S.C., Mutetwa, R., Mungofa, S., Lee, O.Y.C., Minnikin, D.E. et al. (2012) Detection of Mycobacterium tuberculosis in sputum by gas chromatography-mass spectrometry of methyl mycocerosates released by thermochemolysis. PLOS One 7, e32836. https://doi.org/10.1371/journal. pone.0032836 Overmann, J., Huang, S., Nübel, U., Hahnke, R.L. and Tindall, B.J. (2018) Relevance of phenotypic information for the taxonomy of not-yet-cultured microorganisms. Systematic and Applied Microbiology 42, 22–29. https://doi.org/10.1016/j.syapm.2018.08.009 Page, A.C., Gale, P., Wallick, H., Walton, R.B., McDaniel, L.E., Woodruff, H.B. and Folkers, K. (1960) Coenzyme Q. XVII. Isolation of Coenzyme Qlo from bacterial fermentation. Archives of Biochemistry and Biophysics 318–321. https://doi.org/10.1016/0003-9861(60)90062-X Parte, A.C. (2018) LPSN - List of Prokaryotic names with Standing in Nomenclature (bacterio.net), 20 years on. International Journal of Systematic and Evolutionary Microbiology 1825–1829. https://doi. org/10.1099/ijsem.0.002786 Patel, N.B. (2018) Studying the gastrointestinal microbiome of a traditional Peruvian community. Ph.D. Dissertation. University of Oklahoma. Patel, N.B., Sankaranarayanan, K., Busse, H.-J. and Lawson, P.A. (2016) Investigating genomic tools for polar lipid prediction. Bergey’s International Society for Microbial Systematics, Abstracts Book. p. 41. Patel, N.B., Tito, R.Y., Obregon-Tito, A.J., O’Neal, L., Trujillo-Villaroel, O., Marin-Reyes, L. et al. (2015) Ezakiella peruensis gen. nov., sp. nov. isolated from human fecal sample from a coastal traditional community in Peru. Anaerobe 32, 43–48. https://doi.org/10.1016/j.anaerobe.2014.12.002 Perkins, H.R. and Cummins, C.S. 
(1964) Chemical structure of bacterial cell walls: ornithine and 2,4-diaminobutyric acid as components of the cell walls of plant pathogenic Corynebacteria. Nature 201, 1105–1107. https://doi.org/10.1038/2011105a0 Petit, J.F., Adam, A., Wietzerbin-Falszpan, J., Lederer, E. and Ghuysen, J.M. (1969) Chemical structure of the cell wall of Mycobacterium smegmatis. I -Isolation and partial characterization of the peptidoglycan. Biochemical and Biophysical Research Communications 35, 478–485. https://doi.org/10.1016/0006291X(69)90371-4 Pugh, E.L. and Kates, M. (1994) Acylation of proteins of the archaebacteria Halobacterium cutirubrum and Methanobacterium thermoautotrophicum. Biochimica et Biophysica Acta (BBA) – Biomembranes 1196, 38–44. https://doi.org/10.1016/0005-2736(94)90292-5


P.A. Lawson and N.B. Patel

Pulfer, M. and Murphy, R.C. (2003) Electrospray mass spectrometry of phospholipids. Mass Spectrometry Reviews 22, 332–364. https://doi.org/10.1002/mas.10061 Rainey, F.A. (2011) How to describe new species of Prokaryotes. In: Methods in Microbiology Volume 38. Academic Press, London, pp. 7–14. Rainey, F.A. and Oren, A. (2011) Methods in Microbiology Volume 38. Academic Press, London. Ramasamy, D., Mishra, A.K., Lagier, J.-C., Padhmanabhan, R., Rossi, M., Sentausa, E. et al. (2014) A polyphasic strategy incorporating genomic data for the taxonomic description of novel bacterial species. International Journal of Systematic and Evolutionary Microbiology 64, 384–391. https://doi. org/10.1099/ijs.0.057091-0 Rosa, M.D. and Gambacorta, A. (1988) The lipids of archaebacteria. Progress in Lipid Research 27, 153– 175. https://doi.org/10.1016/0163-7827(88)90011-2 Rosa, M.D., Gambacorta, A. and Gliozzi, A. (1986) Structure, biosynthesis, and physicochemical properties of archaebacterial lipids. Microbiology and Molecular Biology Reviews 50, 70–80. https://doi. org/10.1128/MMBR.50.1.70-80.1986 Russell, N.J. and Nichols, D.S. (1999) Polyunsaturated fatty acids in marine bacteria - a dogma rewritten. Microbiology 145, 767–779. https://doi.org/10.1099/13500872-145-4-767 Salton, M.R.J. (1994) The bacterial cell envelope - a historical perspective. In: Ghuysen, J.M., Hakenbeck, R. (eds) Bacterial Cell Wall. Elsevier Science, pp. 1–22. https://doi.org/10.1016/S0167-7306(08)60404-4 Scherer, P. and Kneifel, H. (1983) Distribution of polyamines in methanogenic bacteria. Journal of Bacteriology 154, 1315–1322. https://doi.org/10.1128/JB.154.3.1315-1322.1983 Schleifer, K.-H. (2009) Classification of Bacteria and Archaea: Past, present and future. Systematic and Applied Microbiology 32, 533–542. https://doi.org/10.1016/j.syapm.2009.09.002 Schleifer, K.H. and Kandler, O. (1972) Peptidoglycan types of bacterial cell walls and their taxonomic implications. Bacteriological Reviews 36, 407–477. 
https://doi.org/10.1128/MMBR.36.4.407-477.1972 Schleifer, K.H., Plapp, R. and Kandler, O. (1967) Identification of threo-3-hydroxyglutamic acid in the cell wall of Microbacterium lacticum. Biochemical and Biophysical Research Communications 28, 566–570. https://doi.org/10.1016/0006-291X(67)90351-8 Schleifer, K.-H. and Stackebrandt, E. (1983) Molecular systematics of Prokaryotes. Annual Review of Microbiology 37, 143–187. https://doi.org/10.1146/annurev.mi.37.100183.001043 Schmidt, V.S.J., Mayr, R., Wenning, M., Glöckner, J., Butler-Wu, S.M. and Scherer, S. (2009) Bavariicoccus seileri gen. nov., sp. nov., isolated from the surface and smear water of German red smear soft cheese. International Journal of Systematic and Evolutionary Microbiology 59, 2437–2443. https://doi. org/10.1099/ijs.0.006601-0 Schumann, P. (2011) Peptidoglycan structure. In: Rainey, F. and Oren, A. (eds) Methods in Microbiology Volume 38. Taxonomy of Prokaryotes. Academic Press, London, pp. 101–129. https://doi.org/10.1016/ B978-0-12-387730-7.00005-X Schumann, P., Kämpfer, P., Busse, H.-J. and Evtushenko, L.I. (2009) Proposed minimal standards for describing new genera and species of the suborder Micrococcineae. International Journal of Systematic and Evolutionary Microbiology 59, 1823–1849. https://doi.org/10.1099/ijs.0.012971-0 Schütz, M., Brugna, M., Lebrun, E., Baymann, F., Huber, R., Stetter, K.-O. et al. (2000) Early evolution of cytochrome bc complexes. Journal of Molecular Biology 300, 663–675. https://doi.org/10.1006/jmbi.2000.3915 Shaw, N. (1970) Bacterial glycolipids. Bacteriological Reviews 34, 365–377. https://doi.org/10.1128/ MMBR.34.4.365-377.1970 Singh, S., Sinha, R.P. and Hader, D.T. (2002) Role of lipids and fatty acids in stress tolerance in Cyanobacteria. ACTA Protozoologica 41, 297–308. Smith, I. (2003) Mycobacterium tuberculosis pathogenesis and molecular determinants of virulence. Clinical Microbiology Reviews 16, 463–496. 
https://doi.org/10.1128/CMR.16.3.463-496.2003 Soballe, B. and Poole, R.K. (1999) Microbial ubiquinones: multiple roles in respiration, gene regulation and oxidative stress management. Microbiology 145, 1817–1830. https://doi.org/10.1099/13500872145-8-1817 Sohlenkamp, C. and Geiger, O. (2016) Bacterial membrane lipids: diversity in structures and pathways. FEMS Microbiology Reviews 40, 133–159. https://doi.org/10.1093/femsre/fuv008 Stackebrandt, E. and Schumann, P. (2006) Introduction to the taxonomy of Actinobacteria. In: Dworkin, M., Falkow, S., Rosenberg, E., Schleifer K.H. and Stackebrandt, E. (eds) The Prokaryotes. Springer, New York, NY, pp. 297–321. https://doi.org/10.1007/0-387-30743-5_16


Stackebrandt, E., Rainey, F.A. and Ward-Rainey, N.L. (1997) Proposal for a new hierarchic classification system, Actinobacteria classis nov. International Journal of Systematic and Evolutionary Microbiology 47, 479–491. https://doi.org/10.1099/00207713-47-2-479 Staley, J.T. (2010) Comprehending microbial diversity: the fourth goal of microbial taxonomy. The Bulletin of BISMiS 1, 1–4. Stamps, B.W., Losey, N.A., Lawson, P.A. and Stevenson, B.S. (2014) Genome sequence of Thermoanaerobaculum aquaticum MP-01t, the first cultivated member of Acidobacteria subdivision 23, isolated from a hot spring. Genome Announcements 2, e00570-14. https://doi.org/10.1128/genomeA. 00570-14 Staneck, J.L. and Roberts, G.D. (1974) Simplified approach to identification of aerobic actinomycetes by thin-layer chromatography. Applied Microbiology 28, 226–231. https://doi.org/10.1128/AEM.28.2.226231.1974 Steiner, S., Conti, S.F. and Lester, R.L. (1973) Occurrence of phosphonosphingolipids in Bdellovibrio bacteriovorus strain UKi2. Journal of Bacteriology 116, 1199–211. https://doi.org/10.1128/JB.116.3.11991211.1973 Stodola, F.H., Lesuk, A. and Anderson, R.J. (1938) The chemistry of the lipids of Tubercle bacilli. Journal of Biological Chemistry 126, 505–513. Suresh, G., Lodha, T.D., Indu, B., Sasikala, C. and Ramana, C.V. (2019) Taxogenomics resolves conflict in the genus Rhodobacter: A two and half decades pending thought to reclassify the genus Rhodobacter. Frontiers in Microbiology 10, 2480. https://doi.org/10.3389/fmicb.2019.02480 Sutcliffe, I.C. (2010) A phylum level perspective on bacterial cell envelope architecture. Trends in Microbiology 18, 464–470. https://doi.org/10.1016/j.tim.2010.06.005 Sutcliffe, I.C. (2015) Challenging the anthropocentric emphasis on phenotypic testing in prokaryotic species descriptions: rip it up and start again. Frontiers in Genetics 6, 218. https://doi.org/10.3389/ fgene.2015.00218 Sutcliffe, I.C., Trujillo, M.E. and Goodfellow, M. 
(2011) A call to arms for systematists: revitalising the purpose and practices underpinning the description of novel microbial taxa. Antonie van Leeuwenhoek 101, 13–20. https://doi.org/10.1007/s10482-011-9664-0 Sutcliffe, I.C., Trujillo, M.E., Whitman, W.B. and Goodfellow, M. (2013) A call to action for the International Committee on Systematics of Prokaryotes. Trends in Microbiology 21, 51–52. https://doi.org/10.1016/j. tim.2012.11.004 Suzuki, K., Goodfellow, M. and O’Donnell, A.G. (1993) Cell envelopes and classification. In: Goodfellow, M. and O’Donnell (eds) Handbook of New Bacterial Systematics. Academic Press, London, pp. 195–250. Tabor, C.W. and Tabor, H. (1985) Polyamines in microorganisms. Microbiology and Molecular Biology Reviews 49, 81–99. https://doi.org/10.1128/MMBR.49.1.81-99.1985 Tamames, J. and Rosselló-Móra, R. (2012) On the fitness of microbial taxonomy. Trends in Microbiology 20, 514–516. https://doi.org/10.1016/j.tim.2012.08.012 Tamura, S. (1913) Zur Chemie der Bakterien. Hoppe-Seyler’s Zeitschr Physiological Chemistry 87, 85–114. https://doi.org/10.1515/bchm2.1913.87.2.85 Tan, B.K., Bogdanov, M., Zhao, J., Dowhan, W., Raetz, C.R.H. and Guan, Z. (2012) Discovery of a cardiolipin synthase utilizing phosphatidylethanolamine and phosphatidylglycerol as substrates. Proceedings of the National Academy of Sciences of the United States of America 109, 16504–16509. https://doi. org/10.1073/pnas.1212797109 Thompson, C.C., Amaral, G.R., Campeão, M., Edwards, R.A., Polz, M.F., Dutilh, B.E. et al. (2015) Microbial taxonomy in the post-genomic era: Rebuilding from scratch? Archives of Microbiology 197, 359–370. https://doi.org/10.1007/s00203-014-1071-2 Tindall, B.J., Rosselló-Móra, R., Busse, H.-J., Ludwig, W. and Kampfer, P. (2010) Notes on the characterization of prokaryote strains for taxonomic purposes. International Journal of Systematic and Evolutionary Microbiology 60, 249–266. https://doi.org/10.1099/ijs.0.016949-0 Triolo, T.A., Chabin, R.M. 
and Pompliano, D.L. (2004) Cloning, expression and characterization of the Streptococcus pyogenes murE gene encoding a UDP-MurNAc-l-alanyl-d-glutamate: l-lysine ligase. Enzyme and Microbial Technology 35, 300–308. https://doi.org/10.1016/j.enzmictec.2004.03.020 Uchida, K. and Aida, Ko. (1977) Acyl type of bacterial cell wall: its simple identification by colorimetric method. Journal of General and Applied Microbiology 23, 249–260. https://doi.org/10.2323/jgam.23.249


Uchida, K. and Seino, A. (1997) Intra- and intergeneric relationships of various Actinomycete strains based on the acyl types of the muramyl residue in cell wall peptidoglycans examined in a glycolate test. International Journal of Systematic Bacteriology 47, 182–190. https://doi.org/10.1099/0020771347-1-182 Ulrih, N.P., Gmajner, D. and Raspor, P. (2009) Structural and physicochemical properties of polar lipids from thermophilic Archaea. Applied Microbiology and Biotechnology 84, 249–260. https://doi.org/10.1007/ s00253-009-2102-9 Verbarg, S., Göker, M., Scheuner, C., Schumann, P. and Stackebrandt, E. (2014) The Families Erysipelotrichaceae emend., Coprobacillaceae fam. nov., and Turicibacteraceae fam. nov. In: The Prokaryotes. pp. 79–105. Visweswaran, G.R.R., Dijkstra, B.W. and Kok, J. (2011) Murein and pseudomurein cell wall binding domains of Bacteria and Archaea-a comparative view. Applied Microbiology and Biotechnology 92, 921–928. https://doi.org/10.1007/s00253-011-3637-0 Vollmer, W., Blanot, D. and Pedro, M.A.D. (2008) Peptidoglycan structure and architecture. FEMS Microbiology Reviews 32, 149–167. https://doi.org/10.1111/j.1574-6976.2007.00094.x Weidel, W., Frank, H. and Martin, H.H. (1960) The rigid layer of the cell wall of Escherichia coli strain B. Journal of General Microbiology 22, 158–166. https://doi.org/10.1099/00221287-22-1-158 Weidel, W. and Pelzer, H. (1964) Bagshaped macromolecules-a new outlook on bacterial cell walls. In: Nord, N.N. (ed.) Advances in Enzymology - and Related Areas of Molecular Biology. Volume 26. John Wiley and Sons, Inc. USA. pp. 193–232. https://doi.org/10.1002/9780470122716.ch5 Welch, D.F. (1991) Applications of cellular fatty acid analysis. Clinical Microbiology Reviews 4, 422–438. https://doi.org/10.1128/CMR.4.4.422 Whitman, W.B. (2014) The need for change: Embracing the genome. Methods in Microbiology 41, 1–12. https://doi.org/10.1016/bs.mim.2014.08.002 Whitman, W.B. 
(2015) Genome sequences as the type material for taxonomic descriptions of prokaryotes. Systematic and Applied Microbiology 38, 217–222. https://doi.org/10.1016/j.syapm.2015.02.003 Whitman, W.B. (2016) Modest proposals to expand the type material for naming of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 66, 2108–2112. https://doi.org/10.1099/ ijsem.0.000980 Whitman, W.B., Coleman, D.C. and Wiebe, W.J. (1998) Prokaryotes: The unseen majority. Proceedings of the National Academy of Sciences of the United States of America 95, 6578–6583. https://doi. org/10.1073/pnas.95.12.6578 Woese, C.R. (1987) Bacterial evolution. Microbiology Reviews 51, 221–271. Work, E. (1957) Biochemistry of the bacterial cell wall. Nature 179, 841–847. https://doi.org/10.1038/ 179841a0 Work, E. (1964) Amino-acids of walls of Micrococcus radiodurans. Nature 201, 1107. https://doi. org/10.1038/2011107a0 Work, E. and Dewey, D.L. (1953) The distribution of α, ε-diaminopimellc acid among various microorganisms. Journal of General Microbiology 9, 394–409. https://doi.org/10.1099/00221287-9-3-394 Yamamoto, S., Shinoda, S. and Makita, M. (1979) Occurrence of norspermidine in some species of genera Vibrio and Beneckea. Biochemical and Biophysical Research Communications 87, 1102–1108. https://doi.org/10.1016/S0006-291X(79)80021-2 Yamamoto, S., Shinoda, S., Kawaguchi, M., Wakamatsu, K. and Makita, M. (1983) Polyamine distribution in Vibrionaceae: norspermidine as a general constituent of Vibrio species. Canadian Journal of Microbiology 29, 724–728. https://doi.org/10.1139/m83-118 Yano, Y., Nakayama, A. and Yoshida, K. (1997) Distribution of polyunsaturated fatty acids in bacteria present in intestines of deep-sea fish and shallow-sea poikilothermic animals. Applied and Environmental Microbiology 63, 2572–2577. https://doi.org/10.1128/AEM.63.7.2572-2577.1997 Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H. et al. 
(2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12, 635–645. https://doi.org/10.1038/nrmicro3330 Yassin, A.-A.F. (2011) 10. Detection and characterization of mycolic acids and their use in taxonomy and classification. In: methods in microbiology. pp. 207–237. https://doi.org/10.1016/B978-0-12-3877307.00010-3 Zhang, D. and Poulter, C.D. (1992) Biosynthesis of Archaebacterial ether lipids. Formation of ether linkages by prenyltransferases. Journal of the American Chemical Society 115, 1270–1277. https://doi. org/10.1021/ja00057a008


Zhi, X.Y., Li, W.J. and Stackebrandt, E. (2009) An update of the structure and 16S rRNA gene sequence-based definition of higher ranks of the class Actinobacteria, with the proposal of two new suborders and four new families and emended descriptions of the existing higher taxa. International Journal of Systematic and Evolutionary Microbiology 59, 589–608. https://doi.org/10.1099/ijs.0.65780-0 Zhi, X.-Y., Yao, J.-C., Tang, S.-K., Huang, Y., Li, H.-W. and Li, W.-J. (2014) The futalosine pathway played an important role in menaquinone biosynthesis during early prokaryote evolution. Genome Biology and Evolution 6, 149–60. https://doi.org/10.1093/gbe/evu007

10

Microbial Genomic Taxonomy

Cristiane C. Thompson¹, Livia Vidal¹, Vinicius Salazar¹, Jean Swings¹,² and Fabiano L. Thompson¹,*
¹Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil; ²Ghent University, Gent, Belgium

Introduction

Microbial taxonomy comprises: (i) the identification of isolates as members of known species; (ii) the classification of new isolates (creation of new taxa); and (iii) nomenclature, that is, naming the isolates according to the bacterial Code (Stackebrandt and Ebers, 2006). The history of microbial taxonomy over the last 100 years is that of a scientific field that is both progressive and conservative. It is progressive in that it incorporates the most advanced technologies, but conservative in that it adheres to standards and rules (Thompson et al., 2015). Microbial taxonomy evolves as new technological tools become available, and this affects the species definition that forms the cornerstone of prokaryotic taxonomy. Taxonomic schemes today are still based on the polyphasic approach, which is recognized as orthodox; that is, it follows the accepted standard rules for species delineation. These rules are: (i) DNA-DNA hybridization (DDH) values of at least 70% (De Ley, 1970); (ii) at least 97% 16S rRNA gene sequence similarity (98.7% was proposed by Stackebrandt and Ebers, 2006); (iii) a maximum span of 2% in G+C content; and (iv) differentiating chemotaxonomic and phenotypic features. Great weight is placed on phenotypic (chemotaxonomic) characterization obtained from specialized analyses of fatty acid methyl esters (FAME), polyamines, peptidoglycan types and sphingolipids, and from matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI–TOF MS) (see Chapters 7 and 9); however, in most cases, these chemotaxonomic properties are not very useful for discriminating closely related species. The need to undertake laborious DDH determinations, together with phenotypic differentiation, has limited the number of prokaryotic species descriptions.

The seminal work of Woese and Fox (1977) used ribosomal RNAs as evolutionary chronometers and identified three domains of life. The subsequent reconstruction of the tree of life provided a starting point for unravelling the natural relationships among species (Woese et al., 1990; Ludwig and Klenk, 2005). Until now this consensus classification has remained the pragmatic basis for prokaryotic taxonomy (see Chapter 9). A more recent version of the tree of life has been proposed from a comprehensive analysis of ribosomal protein sequences (Hug et al., 2016). In that study the term candidate phyla radiation (CPR) was proposed for a large group of prokaryotes that are known from their genomes but have not been isolated in culture. These novel organisms have reduced genomes and metabolic capacities, and many may be endosymbionts.

*[email protected]


© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


The current nomenclatural rules (see Chapter 3) are impeding progress both in the description of new species and in the development of taxonomy as a scientific discipline. DDH is still considered the gold standard for species delineation, even though other approaches such as average amino acid identity (AAI), average nucleotide identity (ANI), genome-to-genome distance (GGD) and in silico phenotyping have been shown to be portable and to have greater discriminatory power (see, for example, Gevers et al., 2005; Konstantinidis and Tiedje, 2005; Auch et al., 2010; Thompson et al., 2015). Following the first century of bacterial taxonomy (until ~1970), the polyphasic approach enabled considerable progress and stability in microbial taxonomy. However, orthodox polyphasic taxonomy is increasingly outdated: it can neither keep up with progress in environmental and evolutionary microbiology nor satisfy the needs of clinical microbiologists and epidemiologists. It has also been unable to keep up with the rapid increase in the number of genome sequences. At the time of writing, there are about 234,282 prokaryote genomes, including some 7525 from type strains, available at GenBank and the GEBA-VI project. Much of the recent progress in microbiology is due to the dramatic reduction in the cost and time involved in genome sequencing. Recent taxonomic studies on Escherichia coli have used over 100,000 genomes (Abram et al., 2020). The time has come to integrate genomics as a reliable and reproducible standard into microbial taxonomy (Chun et al., 2018). However, simply incorporating genome sequence data into polyphasic taxonomy might not rejuvenate microbial taxonomy. Taxonomists share the responsibility for describing the microbial world with ecologists and phylogeneticists.
In fact, with the available genomic technology and sufficient metadata, the standards and rules needed to develop robust and fast tools that describe and order microbial diversity can be constructed.
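
The numeric polyphasic criteria listed at the start of this Introduction can be read as a simple decision rule. The following is a minimal sketch in Python, not an established tool: the class and function names are hypothetical, the G+C criterion is treated here as a pairwise difference (an assumption made for illustration), and criterion (iv), differentiating phenotypic features, is not numeric and is left out.

```python
from dataclasses import dataclass


@dataclass
class PairwiseComparison:
    """Measurements for one pair of strains (hypothetical field names)."""
    ddh_percent: float        # DNA-DNA hybridization value, in %
    rrna_16s_identity: float  # 16S rRNA gene sequence similarity, in %
    gc_difference: float      # absolute difference in G+C content, in %


def same_species_polyphasic(pair, rrna_cutoff=97.0):
    """Apply the three numeric criteria quoted in the Introduction.

    rrna_cutoff defaults to the 97% value; pass 98.7 to use the
    stricter value proposed by Stackebrandt and Ebers (2006).
    """
    return (
        pair.ddh_percent >= 70.0
        and pair.rrna_16s_identity >= rrna_cutoff
        and pair.gc_difference <= 2.0
    )


if __name__ == "__main__":
    # Made-up values, for illustration only.
    pair = PairwiseComparison(ddh_percent=75.0,
                              rrna_16s_identity=98.0,
                              gc_difference=1.0)
    print(same_species_polyphasic(pair))
    print(same_species_polyphasic(pair, rrna_cutoff=98.7))
```

Keeping the 16S rRNA cut-off as a parameter makes explicit that the outcome for a given pair of strains can depend on which of the two published values is adopted.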

Genomic Microbial Taxonomy

Whole-genome sequencing (WGS) launched microbial taxonomy into the new era of genomic microbial taxonomy (see Chapter 13), establishing systematics based on the information retrieved from genomes. Genomic microbial taxonomy will not merely be an enriched polyphasic taxonomy; it will be a taxonomy framed on a genomic backbone. Genomic taxonomy is defined on the basis of an integrated comparative genomics approach that includes genome signatures, for example: (i) GGD (Auch et al., 2010); (ii) AAI (Rohwer and Edwards, 2002); (iii) ANI (Konstantinidis and Tiedje, 2005); (iv) Karlin genomic signature (Karlin and Burge, 1995); (v) supertree analysis (Brown et al., 2001); (vi) codon usage bias (Wright, 1990); (vii) metabolic pathway content; (viii) core and pan-genome analysis; (ix) pan-genome family trees (Snipen and Ussery, 2010); and (x) in silico proteome analysis and genotype-to-phenotype-derived metabolic features, including those features that may inform ecology (e.g. host–microbe interactions and energy/nutrient cycling) and evolution (Dutilh et al., 2013, 2014; Amaral et al., 2014). The main goal of genomic taxonomy is to extract taxonomic information from WGS data that can be used to establish a solid framework for the identification and classification of prokaryote species and possibly populations. Taxonomy must adjust to the genomics era and address the needs of its users in microbial ecology and clinical microbiology (Preheim et al., 2011; Olm et al., 2020) in a new paradigm of open-access genomic taxonomy (Thompson et al., 2013a).

We have already seen the tremendous efforts put into initiatives on prokaryote genomics, such as the Genomic Encyclopedia of Bacteria and Archaea (GEBA) (Wu et al., 2009; Klenk and Göker, 2010), the Genomes OnLine Database (GOLD) (Kyrpides, 1999; Pagani et al., 2012), the Integrated Microbial Genomes system (IMG) (Markowitz et al., 2006, 2014) and the Genome Taxonomy Database (GTDB) (Parks et al., 2018, 2020; Chaumeil et al., 2019). We argue for an open-access catalogue of taxonomic descriptions with prototypes; diagnostic tables; and links to culture collections, to genome and gene sequences, and to other phenotypic and ecological databases mentioned above (see Chapters 3 and 6). Ideally, the open-access taxonomy will be based solely on genome sequences that allow both the phylogenetic allocation of new strains and species in the taxonomic space and the phenotypic/metabolic characterization in open online databases. Careful and thorough annotation of the genome sequences for function and chemotaxonomic data will be required. An alternative Code will be required to govern the naming of genomes.

A new species description should be based on at least one genome (Thompson et al., 2013a). In this way, the genomic landscape of the novel bacterium becomes available to microbiologists. The genome may be from a cultured strain or from uncultured organisms (i.e. single-cell genomes, SCGs; metagenome-assembled genomes, MAGs). Ideally, additional representative genomes of cultured or uncultured strains belonging to the new species will be included in order to provide information on the intraspecies genomic and phenotypic variation. Genomic taxonomy has already been successfully applied as an alternative to the more traditional species description and reclassification (Thompson et al., 2009; Haley et al., 2010; Thompson et al., 2011a, 2013b; Moreira et al., 2014a,b; Coutinho et al., 2016; Amin et al., 2017; Walter et al., 2017; Fróes et al., 2018; Nóbrega et al., 2018; Silva et al., 2018; Vidal et al., 2019; Azevedo et al., 2020). For example, new taxonomic frameworks for Prochlorococcus (Thompson et al., 2013c; Walter et al., 2017; Tschoeke et al., 2020), Synechococcus (Coutinho et al., 2016) and the phylum Cyanobacteria (Walter et al., 2017) were proposed, with the descriptions of new species based on both cultured strains and uncultured organisms.
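
Of the genome signatures listed in this section, the Karlin genomic signature is compact enough to sketch directly. The Python example below is an illustration under stated assumptions rather than the implementation used in any of the cited studies: it computes the dinucleotide relative abundance, ρ*xy = f*xy / (f*x f*y), with frequencies pooled over a sequence and its reverse complement, and the mean absolute difference of the 16 values between two sequences. The function names are hypothetical, and windows containing ambiguous bases are simply skipped.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement; characters other than A, C, G, T are kept."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dinucleotide_signature(seq):
    """Return rho*_xy for all 16 dinucleotides.

    Frequencies are pooled over the sequence and its reverse
    complement, so the signature is strand-symmetric.
    """
    seq = seq.upper()
    mono, di = Counter(), Counter()
    for strand in (seq, reverse_complement(seq)):
        mono.update(base for base in strand if base in BASES)
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair[0] in BASES and pair[1] in BASES:
                di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_mono == 0 or n_di == 0:
        raise ValueError("sequence contains no usable dinucleotides")
    signature = {}
    for x, y in product(BASES, repeat=2):
        f_x, f_y = mono[x] / n_mono, mono[y] / n_mono
        f_xy = di[x + y] / n_di
        # A base that never occurs gives an undefined ratio; report 0.0.
        signature[x + y] = f_xy / (f_x * f_y) if f_x and f_y else 0.0
    return signature


def signature_distance(seq_a, seq_b):
    """Mean absolute difference between two dinucleotide signatures."""
    sig_a = dinucleotide_signature(seq_a)
    sig_b = dinucleotide_signature(seq_b)
    return sum(abs(sig_a[k] - sig_b[k]) for k in sig_a) / 16


if __name__ == "__main__":
    # Toy sequences; real comparisons use whole genomes or long contigs.
    print(signature_distance("ATGCGCGATATTTAGCGC",
                             "AAAATTTTGGGGCCCCAAAATTTT"))
```

Because the signature depends only on base composition within the sequence itself, it needs no alignment, which is one reason such measures are attractive for large genome collections.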

In silico Phenotyping

Genome sequences can also allow for the rapid identification of major phenotypic features associated with the organism, and the translation of genomic information into phenotype will become increasingly precise as more genomes are annotated. The manner in which phenotypic information is retrieved and presented in new species descriptions and identification schemes will need to change in order to allow for open access to taxonomic data. Classical phenotypic data are very hard to retrieve from the literature and lack portability (see, e.g. Bergey’s Manual, The Prokaryotes (Garrity et al., 2004)). Huge amounts of valuable phenotypic data are simply out of reach because they are only available in species description papers, manuals or handbooks. Researchers, on the other hand, need portable electronic data in order to advance different fields of microbiology.

Moreover, advances in bioinformatics have enabled the mining of genome sequences to predict the phenotype of the sequenced strain, known as in silico phenotyping, which avoids costly experimental phenotypic screens in the laboratory. Analyses of genes coding for the specific proteins involved in the metabolic pathways responsible for diagnostic features (e.g. Voges–Proskauer reaction, indole production, arginine dihydrolase) may be an alternative to the time-consuming phenotypic characterization obtained from standard biochemical tests (Karp et al., 2005; Romero et al., 2005; Dutilh et al., 2013; Amaral et al., 2014). We have proposed an approach for in silico genomic phenotyping based on gene content screens (Amaral et al., 2014). In this study, genes involved in the molecular pathways leading to the phenotypes were selected and genome sequences were screened for the presence of these genes. This allowed us to assign to each of the genomes studied a confident phenotypic prediction that can be tested experimentally (Amaral et al., 2014). A large collection of phenotypes and their associated genes is contained in the SEED database (Overbeek et al., 2014). This database contains hundreds of expert-annotated, manually curated subsystems that can be rapidly projected onto new genome sequences, providing an automated approach for in silico prediction of phenotypes.

One interesting example of in silico phenotyping is the study of alginate-degrading bacteria associated with seaweeds (Hehemann et al., 2016). The authors delineated three groups of marine vibrios (pioneer, harvester, scavenger) based on their gene content and ability to use alginate resources. Pioneers colonize and degrade the intact alginate polymer, constructing a niche for scavenger and harvester populations. Harvesters tether alginate lyases to their cell surface, and scavenger populations devoid of any alginate lyases can only use the smallest alginate oligosaccharides.

Predicting the genes that are involved in each phenotype is known as gene-trait matching. A complete in silico pipeline has been outlined for the consistent annotation of bacterial genomes followed by automated gene-trait matching (Dutilh et al., 2013). In this pipeline, the trait is measured consistently for all sequenced genomes.
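The idea of gene-trait matching can be illustrated with a minimal sketch. It is not the published pipeline of Dutilh et al. (2013): a simple agreement score stands in for the statistical or machine-learning step, and the function name, strain labels and gene labels are all invented for illustration.

```python
from typing import Dict, List, Set, Tuple


def rank_genes_by_trait(
    gene_content: Dict[str, Set[str]],
    trait: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Rank genes by how well their presence/absence agrees with a binary trait.

    The score is the fraction of genomes in which gene presence equals
    trait presence (1.0 means the gene tracks the trait perfectly).
    This is a simplified stand-in, not a real association test.
    """
    genomes = sorted(trait)
    all_genes = set().union(*(gene_content[g] for g in genomes))
    scores = []
    for gene in all_genes:
        agree = sum((gene in gene_content[g]) == trait[g] for g in genomes)
        scores.append((gene, agree / len(genomes)))
    # Highest agreement first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Toy data, invented for illustration: four genomes and one measured trait.
gene_content = {
    "strain_A": {"geneX", "geneY"},
    "strain_B": {"geneX", "geneZ"},
    "strain_C": {"geneY"},
    "strain_D": {"geneZ"},
}
trait = {"strain_A": True, "strain_B": True, "strain_C": False, "strain_D": False}

ranking = rank_genes_by_trait(gene_content, trait)
print(ranking[0])  # the gene whose presence best matches the trait
```

In this toy set only `geneX` is present in exactly the trait-positive strains, so it ranks first. A real analysis would need many more genomes and a proper correction for population structure.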

Microbial Genomic Taxonomy

With this gene-trait matching approach (dubbed ‘genome-wide association study for microbes’, GWAS-M), candidate genes contributing to the trait can be obtained. The approach employs a machine-learning tool: by analysing a training set of bacteria that differ with respect to the trait, it identifies which genomic variables best explain the trait variation. These genomic variables can then be used to infer the phenotype of a strain from its genome sequence.

Specific programs and databases related to different taxonomic groups will need to be developed in order to automate searches for genes related to phenotypes of interest. For instance, amino acid FASTA files with the coding sequences for a target phenotypic feature can be used as queries to check whether hits for the gene (enzyme) of interest are found in a specific database. Orthologous genes will have higher BLAST scores, and their identity will be > 40% in this type of search. The length of the matching gene sequence normally needs to be > 70% of the query length. After these steps, the organism is considered positive for a given phenotype if all the genes (enzymes) involved in the metabolic pathway are present in the genome, and negative if one or more of them are absent. It is also important to evaluate regulatory genes, global regulators of the different diagnostic phenotypic features/metabolic pathways, the presence of indels in the gene sequences, sRNA regulation and promoter sequences (Amaral et al., 2014; Thompson et al., 2015).

Previous studies have suggested that growth temperatures and oxygen requirements can be predicted from genome sequences. Genome sequences obtained from metagenomes (MAGs) have been used to model the interactions between uncultured microorganisms and their environment. The phenotypic potential of such MAGs allows the inference of possible ecological roles in nature (Garza and Dutilh, 2015).
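The presence/absence rule described above (identity > 40%, hit length > 70% of the query, all pathway genes required) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Hit` record, the function names and the enzyme labels are hypothetical, the hits are assumed to have been parsed already from a search such as BLAST, and "length" is interpreted here as aligned length divided by query length.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Hit:
    query_gene: str          # gene (enzyme) that was searched for
    percent_identity: float  # amino acid identity of the hit, in percent
    aligned_length: int      # length of the aligned region, in residues
    query_length: int        # full length of the query protein


def gene_is_present(hits: Iterable[Hit],
                    min_identity: float = 40.0,
                    min_coverage: float = 0.70) -> bool:
    """A gene counts as present if any hit passes both thresholds."""
    return any(
        h.percent_identity > min_identity
        and h.aligned_length / h.query_length > min_coverage
        for h in hits
    )


def predict_phenotype(pathway_genes: List[str], hits: Iterable[Hit]) -> bool:
    """Positive only if every gene of the pathway is present in the genome."""
    by_gene: Dict[str, List[Hit]] = {}
    for h in hits:
        by_gene.setdefault(h.query_gene, []).append(h)
    return all(gene_is_present(by_gene.get(g, [])) for g in pathway_genes)


# Hypothetical two-enzyme pathway and invented hits for two genomes.
pathway = ["enzA", "enzB"]
genome_1 = [Hit("enzA", 65.0, 280, 300), Hit("enzB", 48.0, 200, 250)]
genome_2 = [Hit("enzA", 72.0, 290, 300),
            Hit("enzB", 35.0, 240, 250),   # identity too low
            Hit("enzB", 90.0, 100, 250)]   # hit too short

print(predict_phenotype(pathway, genome_1))
print(predict_phenotype(pathway, genome_2))
```

The rule is deliberately strict: a single missing enzyme makes the prediction negative. As the text notes, regulators, indels and promoter sequences are not captured by such a screen and would need separate evaluation.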
Genome-scale metabolic models reconstructed from the genome sequences of microorganisms are validated by comparing in silico predictions with the growth of the microbes in the laboratory. The metabolic fluxes of individual microbial populations can be linked to their community dynamics. Distinct microbial metabolomes obtained from different human body sites have allowed predictive microbiome modelling (Garza et al., 2018). Recent analyses of colorectal cancer metagenomic samples demonstrated an increase in the metabolic networks of bacteria that depend, for biomass production, on metabolites abundant in this type of cancer. The increase in colorectal cancer metabolites induces the growth of certain bacterial types. This study demonstrates that shifts in the microbiome can be explained by changes in the metabolome (Garza et al., 2020).

A recent study has demonstrated the prediction of ecological niches from genomes in a vast marine ecosystem. The genome encodes the metabolic and functional capabilities of an organism and should be a major determinant of its ecological niche. The study used 123 metagenomes from the Baltic Sea to obtain 1961 MAGs. Machine learning predicted the position of a genome cluster along various niche gradients (salinity, depth, size fraction) from its functional genes, suggesting a strong link between genome and ecological niche (Alneberg et al., 2020; Lieven et al., 2020). Taken together, these studies indicate that in silico phenotyping is feasible and reliable. The application of in silico phenotyping will allow tremendous advances in microbial taxonomy.

Suggestions for a Genome-based Taxonomy

Microbial taxonomy is moving from polyphasic taxonomy to a new open-access genomic microbial taxonomy, with a set of standardized tools available for genome sequences. The highest priority for genomic microbial taxonomy is to help describe microbial diversity better. Species descriptions need to include whole-genome sequences of the novel type and reference strains, and to report genome similarity within the species and towards the closest known species by means of phylogenetic analysis based on the core genome, GGD, AAI and/or ANI. The genome sequence will finally become the basic unit of taxonomy, as already suggested by Wayne et al. (1987). The genome sequences of the novel type and reference strains must be deposited in public open-access databases and the cultures in public collections. The type can be a culture, DNA or a WGS. Accepting the principle that whoever discovers a new genome has the right to name it according to a simplified procedure would be a major advance.
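Reporting similarity "within species and towards the closest known species" amounts to a small summary over pairwise values. The sketch below assumes that pairwise ANI values have already been computed by some external tool. The genome labels, species labels and values are invented, and the 95% cutoff is a frequently used convention adopted here as an assumption, not a value taken from this section.

```python
from typing import Dict, Tuple


def summarize_similarity(
    ani: Dict[Tuple[str, str], float],
    species_of: Dict[str, str],
    novel_species: str,
) -> Tuple[float, str, float]:
    """Return (lowest within-species ANI, closest known species, its highest ANI)."""
    within = []
    towards: Dict[str, float] = {}
    for (a, b), value in ani.items():
        sa, sb = species_of[a], species_of[b]
        if sa == novel_species and sb == novel_species:
            within.append(value)  # pair inside the proposed species
        elif novel_species in (sa, sb):
            other = sb if sa == novel_species else sa
            towards[other] = max(towards.get(other, 0.0), value)
    closest = max(towards, key=lambda s: towards[s])
    return min(within), closest, towards[closest]


# Invented example: two genomes of a proposed species and two known species.
species_of = {"N1": "sp_nov", "N2": "sp_nov", "K1": "known_1", "K2": "known_2"}
ani = {
    ("N1", "N2"): 98.5,
    ("N1", "K1"): 91.0,
    ("N2", "K1"): 90.4,
    ("N1", "K2"): 84.2,
    ("N2", "K2"): 83.9,
    ("K1", "K2"): 85.0,
}

SPECIES_CUTOFF = 95.0  # assumed ANI cutoff, in percent
lowest_within, closest, best = summarize_similarity(ani, species_of, "sp_nov")
is_distinct = best < SPECIES_CUTOFF
print(lowest_within, closest, best, is_distinct)
```

The same summary could be produced for AAI or GGD by swapping the input table and cutoff. Phylogenetic analysis of the core genome remains a separate step.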

C.C. Thompson et al.

A Code is required, but the present one needs rewriting to promote the development of the taxonomy of under-studied groups (e.g. the candidate phyla radiation and Cyanobacteria). By checking the published literature (i.e. species descriptions, Bergey’s Manual, The Prokaryotes (Garrity et al., 2004)), a list of useful discriminatory phenotypic features can be compiled for the closest relatives of a newly sequenced genome. The genotype-to-phenotype approach can then be applied to identify and define diagnostic phenotypes on the basis of the presence of the corresponding gene sequences, and to obtain the maximum number of phenotypes from genome sequences. Matching genes in a newly described genome to those in its closest phylogenetic relatives allows the in silico phenotypes to be determined. Species descriptions need at least the most basic phenotypic characterization of the novel strains in vitro, such as cell and colony morphology and growth at different ranges of temperature, pH and salinity. FAME, MALDI–TOF, AFLP and other non-portable fingerprinting techniques should be avoided.

An efficient and simplified procedure for genomic taxonomy operates bottom-up, from DNA (WGS) to species. Bioinformatics and analytical work are also needed in order to use the information available in genome sequences. Genomic taxonomy should encompass the existing information, integrating it with new data on DNA, genomes, isolates/strains (cultured and uncultured), Candidatus taxa and genomes reconstructed from metagenomes (Hugerth et al., 2015). Species description papers should be concise texts reporting the findings in a machine-readable format, so that automating the production of descriptions and database updates becomes a plausible development.
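What a machine-readable species description might look like can be sketched as a small structured record serialized to JSON. No such schema is defined in this chapter: every field name below is a hypothetical choice, and the species name, accession and values are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class SpeciesDescription:
    """Hypothetical record layout; not an established standard."""
    name: str
    type_material: str              # "culture", "DNA" or "WGS"
    genome_accessions: List[str]    # public database identifiers
    closest_relative: str
    similarity: Dict[str, float]    # e.g. ANI or AAI to the closest relative
    in_silico_phenotypes: Dict[str, bool] = field(default_factory=dict)
    in_vitro_phenotypes: Dict[str, str] = field(default_factory=dict)


record = SpeciesDescription(
    name="Examplea hypothetica sp. nov.",        # placeholder name
    type_material="WGS",
    genome_accessions=["PLACEHOLDER_ACCESSION"],  # placeholder identifier
    closest_relative="Examplea knownensis",       # placeholder name
    similarity={"ANI": 91.0, "AAI": 92.3},        # invented values
    in_silico_phenotypes={"indole_production": True,
                          "arginine_dihydrolase": False},
    in_vitro_phenotypes={"growth_temperature_C": "15-37"},
)

as_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(as_json)
```

Because the record is plain JSON, it can be validated, indexed and merged into databases without parsing free text, which is the portability argument made above.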

Challenges Ahead for Microbial Taxonomy in the Context of Microbial Ecology

The most extreme challenge arises where single-cell sequencing is used to describe new biodiversity and to generate reference genomes of uncultured taxa, such as those of the marine bacterioplankton; examples include the two uncultured flavobacteria described by Woyke et al. (2009) and thousands of picocyanobacterial genomes (Pachiadaki et al., 2019).

In these and many other cases, traditional polyphasic taxonomy is of little help in describing novelty. The prokaryotic Code must be totally revised to include the description of uncultured organisms, or even DNA, based on whole-genome sequences. Complete genomes can be reconstructed from metagenomes through the process of binning contigs or scaffolds derived from the same strain. Recent studies have obtained new metagenome-assembled genomes from complex environmental samples (Hugerth et al., 2015; Almstrand et al., 2016; Haroon et al., 2016; Pinto et al., 2016; Campeão et al., 2019). The abundance of these genomes across different environments and their metabolic and eco-functional potential can now be inferred from metagenomics. Thousands of bacterial genomes representing new phyla in the domain Bacteria have been reconstructed from environmental metagenomes. Collectively, this large group has been called the candidate phyla radiation (CPR) (Brown et al., 2015; Hug et al., 2016; Parks et al., 2017; Castelle and Banfield, 2018), and it may comprise the majority of life’s current biodiversity. This biodiversity can only be revealed by cultivation-independent genome approaches. We propose that taxonomists accept the principle that the researchers who discover a new genome sequence have the right to name it, following a simple, legitimate taxonomic procedure that meets technical quality standards (Murray and Stackebrandt, 1995). One group that deserves all our attention is the phylum Cyanobacteria.

Challenges in the Taxonomy of the Cyanobacteria

The phylum Cyanobacteria is one of the most diverse and widely distributed prokaryote groups. Members of this group were historically named according to the Botanical Code, and this has caused serious confusion among microbiologists (Kauff and Büdel, 2010). The inclusion of Cyanobacteria in bacterial taxonomic schemes was first proposed by Stanier et al. (1978); over time, bacterial taxonomic names have come into conflict with the botanical nomenclature (Oren, 2004). More than two decades passed before the Note to General Consideration 5 (1999) was published, indicating that Cyanobacteria should be included under the rules of the International Committee on Systematic Bacteriology (ICSB)/International Committee on Systematics of Prokaryotes (ICSP) (Tindall, 1999; De Vos and Trüper, 2000; Labeda, 2000). Taxon nomenclature within this group has long been a topic of discussion, but there is currently no consensus (Hoffmann et al., 2005; Oren and Tindall, 2005; Oren et al., 2009; Oren and Ventura, 2017).

As a result of inconsistent and flawed taxonomic procedures, more than 50 genera of Cyanobacteria have been described since 2000, and many of them remain unrecognized in the List of Prokaryotic names with Standing in Nomenclature (LPSN, www.bacterio.net) (Parte, 2014) and in databases (e.g. NCBI) (Gaget et al., 2015a,b). Cyanobacterial taxonomy has been based on morphological traits and may not reflect the results of phylogenetic analyses (Walter et al., 2017). Specialized journals (e.g. IJSEM) still accept descriptions of species of Cyanobacteria without genome sequences, compounding the problems caused by the lack of genome information.

Insights have been gained recently on the genomic taxonomy of Cyanobacteria (Walter et al., 2017). This study delimited 57 different genera (of which 28 were new) and 87 different species (of which 32 were new) (Fig. 10.1). In addition, in silico phenotyping analysis showed three major ecological groups of Cyanobacteria: (i) low temperature; (ii) low temperature copiotroph; and (iii) high temperature oligotroph, which were coherently linked to the genomic taxonomy. A recent study using genome sequences from uncultured picocyanobacteria demonstrated that the so-called species Prochlorococcus marinus comprises at least 137 species (Tschoeke et al., 2020). These newly defined species will be identified by genomic taxonomy.

Fig. 10.1. Phylogenomic tree of the Cyanobacteria phylum with the proposed new names (Walter et al., 2017). New species names in red correspond to new genera based on genome analyses (phylogenomics, AAI, GGD). Former species names in black. Ecogenomic groups are reflected by branch colour.

Conclusion

Current microbial taxonomy is not able to keep up with the pace of development in microbial ecology. Innovative ways of developing microbial taxonomy are, therefore, needed urgently. Excellent examples of species circumscription using genome and ecological information have been put forward by Arevalo et al. (2019) and Bobay et al. (2018). These authors propose an alternative strategy: using recent gene flow to detect genetic–ecological clusters, which are separated from one another by a lack of gene flow. These clusters correspond to species and are consistent with previous studies. The authors have also developed software to associate species clusters with habitats and hosts. This novel approach can, for the first time, allow microbial species to be described from genomes on the basis of ecological and evolutionary theory. One challenge ahead is to leverage genome sequences to obtain insights into the (in silico) phenotypes and ecology of novel taxa.

References

Abram, K., Udaondo, Z., Bleker, C., Wanchai, V., Wassenaar, T.M., Robeson, II M.S. and Ussery, D.W. (2020) What can we learn from over 100,000 Escherichia coli genomes? bioRxiv 708131; https://doi.org/10.1101/708131
Almstrand, R., Pinto, A.J., Figueroa, L.A. and Sharp, J.O. (2016) Draft genome sequence of a novel Desulfobacteraceae member from a sulfate-reducing bioreactor metagenome. Genome Announcements 4 (1), e01540-15. doi: 10.1128/genomeA.01540-15
Alneberg, J., Bennke, C., Beier, S., Bunse, C., Quince, C., Ininbergs, K., Riemann, L., Ekman, M., Jürgens, K., Labrenz, M., Pinhassi, J. and Andersson, A.F. (2020) Ecosystem-wide metagenomic binning enables prediction of ecological niches from genomes. Communications Biology 3 (1), 119. doi: 10.1038/s42003-020-0856-x
Amaral, G.R., Dias, G.M., Wellington-Oguri, M., Chimetto, L., Campeão, M.E., Thompson, F.L. and Thompson, C.C. (2014) Genotype to phenotype: identification of diagnostic vibrio phenotypes using whole genome sequences. International Journal of Systematic and Evolutionary Microbiology 64 (Pt 2), 357–365. doi: 10.1099/ijs.0.057927-0. PMID: 24505074
Amin, A.K.M.R., Tanaka, M., Al-Saari, N., Feng, G., Mino, S., Ogura, Y., Hayashi, T., Meirelles, P.M., Thompson, F.L., Gomez-Gil, B. and Sawabe, T. (2017) Thaumasiovibrio occultus gen. nov. sp. nov. and Thaumasiovibrio subtropicus sp. nov. within the family Vibrionaceae, isolated from coral reef seawater off Ishigaki Island, Japan. Systematic and Applied Microbiology 40 (5), 290–296. doi: 10.1016/j.syapm.2017.04.003. Erratum in: Systematic and Applied Microbiology (2018) 41 (1), 62–63.
Arevalo, P., VanInsberghe, D., Elsherbini, J., Gore, J. and Polz, M.F. (2019) A reverse ecology approach based on a biological definition of microbial populations. Cell 178 (4), 820–834. doi: 10.1016/j.cell.2019.06.033

Auch, A.F., von Jan, M., Klenk, H-P. and Göker, M. (2010) Digital DNA–DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2, 117–134. doi:10.4056/sigs.531120
Azevedo, G.P.R., Mattsson, H.K., Appolinario, L.R., Calegario, G., Leomil, L., Walter, J.M., Campeão, M., Tonon, L.A.C., Moreira, A.P.B., Vidal, L., Vieira, V.V., Otsuki, K., Tschoeke, D.A., Swings, J., Thompson, F.L. and Thompson, C.C. (2020) Enterovibrio baiacu sp. nov. Current Microbiology 77 (1), 154–157. doi: 10.1007/s00284-019-01785-7
Bobay, L.M., Ellis, B.S. and Ochman, H. (2018) ConSpeciFix: classifying prokaryotic species based on gene flow. Bioinformatics 34 (21), 3738–3740. doi: 10.1093/bioinformatics/bty400
Brown, J.R., Douady, C.J., Italia, M.J. et al. (2001) Universal trees based on large combined protein sequence data sets. Nature Genetics 28, 281–285. doi: 10.1038/90129
Brown, C.T., Hug, L.A., Thomas, B.C., Sharon, I., Castelle, C.J., Singh, A., Wilkins, M.J., Wrighton, K.C., Williams, K.H. and Banfield, J.F. (2015) Unusual biology across a group comprising more than 15% of domain Bacteria. Nature 523 (7559), 208–211. doi: 10.1038/nature14486
Campeão, M.E., Swings, J., Silva, B.S., Otsuki, K., Thompson, F.L. and Thompson, C.C. (2019) “Candidatus Colwellia aromaticivorans” sp. nov., “Candidatus Halocyntiibacter alkanivorans” sp. nov., and “Candidatus Ulvibacter alkanivorans” sp. nov. genome sequences. Microbiology Resource Announcements 8 (15), e00086-19. doi: 10.1128/MRA.00086-19
Castelle, C.J. and Banfield, J.F. (2018) Major new microbial groups expand diversity and alter our understanding of the Tree of Life. Cell 172 (6), 1181–1197. doi: 10.1016/j.cell.2018.02.016
Chaumeil, P-A., Mussig, A.J., Hugenholtz, P. and Parks, D.H. (2019) GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz848
Chun, J., Oren, A., Ventosa, A., Christensen, H., Arahal, D.R., da Costa, M.S., Rooney, A.P., Yi, H., Xu, X.W., De Meyer, S. and Trujillo, M.E. (2018) Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 68 (1), 461–466. doi: 10.1099/ijsem.0.002516
Coutinho, F., Tschoeke, D.A., Thompson, F. and Thompson, C. (2016) Comparative genomics of Synechococcus and proposal of the new genus Parasynechococcus. PeerJ 4, e1522. https://doi.org/10.7717/peerj.1522
De Ley, J. (1970) Reexamination of the association between melting point, buoyant density, and chemical base composition of deoxyribonucleic acid. Journal of Bacteriology 101, 738–754.
De Vos, P. and Trüper, H.G. (2000) Judicial commission of the International Committee on Systematic Bacteriology. International Journal of Systematic and Evolutionary Microbiology 50, 2239–2244.
Dutilh, B.E., Backus, L., Edwards, R.A., Wels, M., Bayjanov, J.R. and van Hijum, S.A. (2013) Explaining microbial phenotypes on a genomic scale: GWAS for microbes. Briefings in Functional Genomics 12 (4), 366–380. doi: 10.1093/bfgp/elt008
Dutilh, B.E., Thompson, C.C., Vicente, A.C., Marin, M.A., Lee, C., Silva, G.G., Schmieder, R., Andrade, B.G., Chimetto, L., Cuevas, D., Garza, D.R., Okeke, I.N., Aboderin, A.O., Spangler, J., Ross, T., Dinsdale, E.A., Thompson, F.L., Harkins, T.T. and Edwards, R.A. (2014) Comparative genomics of 274 Vibrio cholerae genomes reveals mobile functions structuring three niche dimensions. BMC Genomics 15 (1), 654. doi: 10.1186/1471-2164-15-654
Fróes, A.M., Freitas, T.C., Vidal, L., Appolinario, L.R., Leomil, L., Venas, T., Campeão, M.E., Silva, C.J.F., Moreira, A.P.B., Berlinck, R.G.S., Thompson, F.L. and Thompson, C.C. (2018) Genomic attributes of novel symbiont Pseudovibrio brasiliensis sp. nov. isolated from the sponge Arenosclera brasiliensis. Frontiers in Marine Science. doi:10.3389/fmars.2018.00081
Gaget, V., Welker, M., Rippka, R. and de Marsac, N.T. (2015a) A polyphasic approach leading to the revision of the genus Planktothrix (Cyanobacteria) and its type species, P. agardhii, and proposal for integrating the emended valid botanical taxa, as well as three new species, Planktothrix paucivesiculata sp. nov.ICNP, Planktothrix tepida sp. nov.ICNP, and Planktothrix serta sp. nov.ICNP, as genus and species names with nomenclatural standing under the ICNP. Systematic and Applied Microbiology 38, 141–158.
Gaget, V., Welker, M., Rippka, R. and de Marsac Tandeau, N. (2015b) Response to: “Comments on:” A polyphasic approach leading to the revision of the genus Planktothrix (Cyanobacteria) and its type species, P. agardhii, and proposal for integrating the emended valid botanical taxa, as well as three new species, Planktothrix paucivesiculata sp. nov.ICNP, Planktothrix tepida sp. nov.ICNP, and Planktothrix serta sp. nov.ICNP, as genus and species names with nomenclatural standing under the ICNP. Systematic and Applied Microbiology 38, 368–370.

Garrity, G.M., Bell, J.A. and Lilburn, T.G. (2004) Taxonomic outline of the procaryotes. In: Bergey’s Manual of Systematic Bacteriology 2nd edn, release 5.0. Springer, New York, pp. 1–399.
Garza, D.R. and Dutilh, B.E. (2015) From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems. Cellular and Molecular Life Sciences 72 (22), 4287–4308. doi: 10.1007/s00018-015-2004-1
Garza, D.R., van Verk, M.C., Huynen, M.A. and Dutilh, B.E. (2018) Towards predicting the environmental metabolome from metagenomics with a mechanistic model. Nature Microbiology 3 (4), 456–460. doi: 10.1038/s41564-018-0124-8
Garza, D.R., Taddese, R., Wirbel, J., Zeller, G., Boleij, A., Huynen, M.A. and Dutilh, B.E. (2020) Metabolic models predict bacterial passengers in colorectal cancer. Cancer and Metabolism 8, 3. doi: 10.1186/s40170-020-0208-9
Gevers, D., Cohan, F.M., Lawrence, J.G. et al. (2005) Re-evaluating prokaryotic species. Nature Reviews Microbiology 3, 733–739.
Haley, B.J., Grim, C.J., Hasan, N.A. et al. (2010) Comparative genomic analysis reveals evidence of two novel Vibrio species closely related to V. cholerae. BMC Microbiology 10, 154. doi:10.1186/1471-2180-10-154
Haroon, M.F., Thompson, L.R., Parks, D.H., Hugenholtz, P. and Stingl, U. (2016) A catalogue of 136 microbial draft genomes from Red Sea metagenomes. Scientific Data 3, 160050. doi: 10.1038/sdata.2016.50
Hehemann, J.H., Arevalo, P., Datta, M.S., Yu, X., Corzett, C.H., Henschel, A., Preheim, S.P., Timberlake, S., Alm, E.J. and Polz, M.F. (2016) Adaptive radiation by waves of gene transfer leads to fine-scale resource partitioning in marine microbes. Nature Communications 7, 12860. doi: 10.1038/ncomms12860
Hoffmann, L., Komárek, J. and Kaštovsky, J. (2005) System of cyanoprokaryotes (cyanobacteria)--state in 2004. Archives of Hydrobiology Suppl Algological Studies 117, 95–115.
Hug, L.A., Baker, B.J., Anantharaman, K., Brown, C.T., Probst, A.J., Castelle, C.J. et al. (2016) A new view of the tree of life. Nature Microbiology 1, 16048.
Hugerth, L.W., Larsson, J., Alneberg, J., Lindh, M.V., Legrand, C., Pinhassi, J. and Andersson, A.F. (2015) Metagenome-assembled genomes uncover a global brackish microbiome. Genome Biology 16, 279. doi: 10.1186/s13059-015-0834-7
Karlin, S. and Burge, C. (1995) Dinucleotide relative abundance extremes: a genomic signature. Trends in Genetics 11, 283–290.
Karp, P.D., Ouzounis, C.A., Moore-Kochlacs, C. et al. (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Research 33, 6083–6089. doi:10.1093/nar/gki892
Kauff, F. and Büdel, B. (2010) Phylogeny of cyanobacteria: an overview. In: Progress in Botany 72. Springer, pp. 209–224.
Klenk, H-P. and Göker, M. (2010) En route to a genome-based classification of Archaea and Bacteria? Systematic and Applied Microbiology 33, 175–182. doi:10.1016/j.syapm.2010.03.003
Konstantinidis, K.T. and Tiedje, J.M. (2005) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187, 6258–6264. doi:10.1128/JB.187.18.6258
Kyrpides, N.C. (1999) Genomes OnLine Database (GOLD 1.0): a monitor of complete and ongoing genome projects world-wide. Bioinformatics 15, 773–774.
Labeda, D.P. (2000) International committee on systematic bacteriology; IXth international (IUMS) congress of bacteriology and applied microbiology. International Journal of Systematic and Evolutionary Microbiology 50, 2245–2247.
Lieven, C., Beber, M.E., Olivier, B.G., Bergmann, F.T., Ataman, et al. (2020) MEMOTE for standardized genome-scale metabolic model testing. Nature Biotechnology 38 (3), 272–276. doi: 10.1038/s41587-020-0446-y. Erratum in: Nature Biotechnology (2020) 38 (4), 504.
Ludwig, W. and Klenk, H.P. (2005) Overview: a phylogenetic backbone and taxonomic frame for procaryotic systematics. In: Brenner, D.J., Krieg, N.R., Staley, T.S. and Garrity, G.M. (eds), Bergey’s Manual of Systematic Bacteriology, vol. 2, 2nd edn. Springer, New York, pp. 49–69.
Markowitz, V.M., Korzeniewski, F., Palaniappan, K. et al. (2006) The integrated microbial genomes (IMG) system. Nucleic Acids Research 34, D344–D348. doi:10.1093/nar/gkj024
Markowitz, V.M., Chen, I-M.A., Palaniappan, K. et al. (2014) IMG 4 version of the integrated microbial genomes comparative analysis system. Nucleic Acids Research 42, D560–D567. doi:10.1093/nar/gkt963
Moreira, A.P.B., Duytschaever, G., Tonon, L.A.C. et al. (2014a) Photobacterium sanctipauli sp. nov. isolated from bleached Madracis decactis (Scleractinia) in the St Peter & St Paul Archipelago, Mid-Atlantic Ridge, Brazil. PeerJ 2, e427. doi:10.7717/peerj.427
Moreira, A.P.B., Duytschaever, G., Tonon, L.A.C. et al. (2014b) Vibrio madracius sp. nov. isolated from Madracis decactis (Scleractinia) in St Peter & St Paul Archipelago, Mid-Atlantic Ridge, Brazil. Current Microbiology 2, e427. doi:10.1007/s00284-014-0600

Murray, R.G.E. and Stackebrandt, E. (1995) Taxonomic note: implementation of the provisional status Candidatus for incompletely described procaryotes. International Journal of Systematic and Evolutionary Microbiology 45, 186–187. doi: 10.1099/00207713-45-1-186
Nóbrega, M.S., Silva, B.S., Leomil, L., Tschoeke, D.A., Campeão, M.E., Garcia, G.D., Dias, G.A., Vieira, V.V., Thompson, C.C. and Thompson, F.L. (2018) Description of Alteromonas abrolhosensis sp. nov., isolated from sea water of Abrolhos Bank, Brazil. Antonie Van Leeuwenhoek 111, 1131–1138. doi:10.1007/s10482-018-1016-x
Olm, M.R., Crits-Christoph, A., Diamond, S., Lavy, A., Matheus Carnevali, P.B. and Banfield, J.F. (2020) Consistent metagenome-derived metrics verify and delineate bacterial species boundaries. mSystems 5 (1), e00731-19. doi: 10.1128/mSystems.00731-19
Oren, A. (2004) A proposal for further integration of the cyanobacteria under the Bacteriological Code. International Journal of Systematic and Evolutionary Microbiology 54, 1895–1902. https://doi.org/10.1099/ijs.0.03008-0
Oren, A. and Tindall, B.J. (2005) Nomenclature of the cyanophyta/cyanobacteria/cyanoprokaryotes under the International Code of Nomenclature of Prokaryotes. Archives of Hydrobiology Suppl Algological Studies 117, 39–52.
Oren, A. and Ventura, S. (2017) The current status of cyanobacterial nomenclature under the “prokaryotic” and the “botanical” code. Antonie Van Leeuwenhoek 110, 1257–1269.
Oren, A., Komárek, J. and Hoffmann, L. (2009) Nomenclature of the Cyanophyta/Cyanobacteria/Cyanoprokaryotes--what has happened since IAC Luxembourg? Archives of Hydrobiology Suppl Algological Studies 130, 17–26.
Overbeek, R., Olson, R., Pusch, G.D. et al. (2014) The SEED and the Rapid annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Research 42, D206–D214. doi:10.1093/nar/gkt1226
Pachiadaki, M.G., Brown, J.M., Brown, J., Bezuidt, O., Berube, P.M., Biller, S.J., Poulton, N.J., Burkart, M.D., La Clair, J.J., Chisholm, S.W. and Stepanauskas, R. (2019) Charting the complexity of the marine microbiome through Single-Cell Genomics. Cell 179 (7), 1623–1635. doi: 10.1016/j.cell.2019.11.017
Pagani, I., Liolios, K., Jansson, J. et al. (2012) The Genomes OnLine Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Research 40, D571–D579. doi:10.1093/nar/gkr1100
Parks, D.H., Rinke, C., Chuvochina, M., Chaumeil, P.A., Woodcroft, B.J., Evans, P.N., Hugenholtz, P. and Tyson, G.W. (2017) Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology 2 (11), 1533–1542. doi: 10.1038/s41564-017-0012-7. Erratum in: Nature Microbiology (2017) Dec 12; PMID: 28894102.
Parks, D.H., Chuvochina, M., Waite, D.W. et al. (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36, 996. https://doi.org/10.1038/nbt.4229
Parks, D.H., Chuvochina, M., Chaumeil, P-A. et al. (2020) A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology. https://doi.org/10.1038/s41587-020-0501-8
Parte, A.C. (2014) LPSN--list of prokaryotic names with standing in nomenclature. Nucleic Acids Research 42, D613–D616. https://doi.org/10.1093/nar/gkt1111
Pinto, A.J., Sharp, J.O., Yoder, M.J. and Almstrand, R. (2016) Draft genome sequences of two novel Acidimicrobiaceae members from an acid mine drainage biofilm metagenome. Genome Announcements 4 (1), e01563-15. doi: 10.1128/genomeA.01563-15
Preheim, S.P., Timberlake, S. and Polz, M.F. (2011) Merging taxonomy with ecological population prediction in a case study of Vibrionaceae. Applied and Environmental Microbiology 77, 7195–7206. doi:10.1128/AEM.00665-11
Rohwer, F. and Edwards, R. (2002) The phage proteomic tree: a genome based taxonomy for phage. Journal of Bacteriology 184, 4529–4535. doi:10.1128/JB.184.16.4529
Romero, P., Wagg, J., Green, M.L. et al. (2005) Computational prediction of human metabolic pathways from the complete human genome. Genome Biology 6, R2. doi:10.1186/gb-2004-6-1-r2
Silva, C.S.F., Walter, J.M., Nobrega, M.S., Calegario, G., Appolinario, L.R., Leomil, L., Cavalcanti, G., Silva, B.S., Garcia, G.D., Tschoeke, D., Swings, J., Thompson, F.L. and Thompson, C.C. (2018) Genome sequences of Vibrio maerlii sp. nov. and Vibrio rhodolitus sp. nov., isolated from rhodoliths. Microbiology Resource Announcements 7 (19), e01039-18. doi: 10.1128/MRA.01039-18
Snipen, L. and Ussery, D.W. (2010) Standard operating procedure for computing pangenome trees. Standards in Genomic Sciences 2, 135–141. doi:10.4056/sigs.38923
Stackebrandt, E. and Ebers, J. (2006) Taxonomic parameters revisited: tarnished gold standards. Microbiology Today 33, 152–155.

Stanier, R.Y., Sistrom, W.R., Hansen, T.A. et al. (1978) Proposal to place the nomenclature of the Cyanobacteria (blue-green algae) under the rules of the International Code of Nomenclature of Bacteria. International Journal of Systematic and Evolutionary Microbiology 28, 335–336. https://doi.org/10.1099/00207713-28-2-335
Thompson, C.C., Vicente, A.C.P., Souza, R.C. et al. (2009) Genomic taxonomy of vibrios. BMC Evolutionary Biology 9, 258. doi:10.1186/1471-2148-9-258
Thompson, C., Vieira, N.M., Vicente, A. and Thompson, F. (2011) Towards a genome based taxonomy of Mycoplasmas. Infection, Genetics and Evolution 11, 1798–1804. doi:10.1016/j.meegid.2011.07.020
Thompson, C.C., Chimetto, L., Edwards, R.A. et al. (2013a) Microbial genomic taxonomy. BMC Genomics 14, 913. doi:10.1186/1471-2164-14-913
Thompson, C.C., Emmel, V.E., Fonseca, E.L. et al. (2013b) Streptococcal taxonomy based on genome sequence analyses. F1000Res 2, 67. doi:10.12688/f1000research.2-67.v1
Thompson, C.C., Silva, G.Z., Vieira, N.M. et al. (2013c) Genomic taxonomy of the genus Prochlorococcus. Microbial Ecology 66, 752–762. doi:10.1007/s00248-013-0270-8
Thompson, C.C., Amaral, G.R., Campeão, M. et al. (2015) Microbial taxonomy in the post-genomic era: rebuilding from scratch? Archives of Microbiology 197, 359–370.
Tindall, B.J. (1999) Note: Proposals to update and make changes to the Bacteriological Code. International Journal of Systematic and Evolutionary Microbiology 49, 1309–1312.
Tschoeke, D., Salazar, V.W., Vidal, L., Campeão, M., Swings, J., Thompson, F. and Thompson, C. (2020) Unlocking the genomic taxonomy of the Prochlorococcus collective (in press).
Vidal, L.M.R., Gonçalves, A., Venas, T.M., Campeão, M.E., Calegario, G., Walter, J.M., Silva, B.S., Garcia, G.D., Tschoeke, D.A., Swings, J., Thompson, F.L. and Thompson, C.C. (2019) Halomonas coralii sp. nov. isolated from Mussismilia braziliensis. Current Microbiology 76 (6), 678–680. doi: 10.1007/s00284-019-01674-z
Walter, J.M., Coutinho, F.H., Dutilh, B.E. et al. (2017) Ecogenomics and taxonomy of Cyanobacteria phylum. Frontiers in Microbiology 8. https://doi.org/10.3389/fmicb.2017.02132
Wayne, L.G., Brenner, D.J., Colwell, R.R. et al. (1987) International Committee on Systematic Bacteriology announcement of the report of the ad hoc Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic Bacteriology 463–464.
Woese, C.R. and Fox, G.E. (1977) Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences USA 74, 5088–5090.
Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eucarya. Proceedings of the National Academy of Sciences USA 87, 4576–4579.
Woyke, T., Xie, G., Copeland, A., González, J.M., Han, C., Kiss, H., Saw, J.H., Senin, P., Yang, C., Chatterji, S., Cheng, J.F., Eisen, J.A., Sieracki, M.E. and Stepanauskas, R. (2009) Assembling the marine metagenome, one cell at a time. PLoS One 4 (4), e5299. doi: 10.1371/journal.pone.0005299
Wright, F. (1990) The effective number of codons used in a gene. Gene 87, 23–29.
Wu, D., Hugenholtz, P., Mavromatis, K. et al. (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462, 1056–1060. doi:10.1038/nature08656

11 Navigating Bacterial Taxonomy in a World of Unchartered Microbial Organisms

Varsha Kale, Lorna Richardson and Robert D. Finn*

European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, UK

*[email protected]
© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Introduction

Traditional taxonomic classifications employed culture-dependent methods to characterize microbes by their morphology and biochemical profile. However, it became apparent that the microbial diversity in nature was far greater than the capacity of laboratory culture, and curiosity about the unknown fuelled the development of culture-independent methods. Molecular methods, first introduced in the 1960s, started to distinguish bacterial species based on their guanine-cytosine nucleotide content, and DNA-DNA hybridization was used as a gold standard until a series of conserved marker genes were later identified (Janda and Abbott, 2002). More recently, DNA sequencing approaches have increasingly been used to investigate the composition of microbial communities. These approaches have allowed new identities and functional capabilities of organisms to be revealed.

The human microbiome represents the most widely studied microbial ecosystem, especially the gut, where the composition of the bacterial community has been demonstrated to be important in human health and can indicate the difference between healthy and pathogenic states. Today, an ever-increasing spectrum of microbial communities (or biomes) is under investigation, ranging from the deep ocean abyss (Kopf et al., 2015; Sunagawa et al., 2015) to microbes interacting with plants grown on the International Space Station (Be et al., 2017). From monitoring environmental biodiversity, to the application of bacteria or the enzymes they encode within the biotechnology, food production and farming industries, the potential for discovery is vast.

Regardless of the biome, two overarching experimental methodologies are routinely applied: metabarcoding and metagenomics. Metabarcoding utilizes specific, conserved marker genes that are diagnostic of species (see Chapters 12 and 16). In the late twentieth century, Carl Woese highlighted that the ribosomal RNA (rRNA) genes are a strong indicator of evolutionary change, and established Bacteria and Archaea as separate domains based on them (Woese et al., 1990). Thus, for bacteria and archaea the small subunit (SSU) rRNA (also called the 16S rRNA) is a standard for classification (see Chapter 16). The SSU rRNA is a housekeeping gene found in all cellular organisms, with highly conserved regions punctuated by hypervariable regions (Yarza et al., 2014). The fact that the function has remained constant suggests that any sequence changes are likely to have evolutionary importance. The conserved rRNA regions enable the design of DNA sequencing primers that are used to amplify the hypervariable regions, which provide the unique ‘fingerprint’ for each species.

In the formative years of microbiome research, metabarcoding was the only approach used, as the cost of sequencing (e.g. Sanger [1st-generation] and 454 [2nd-generation] sequencing) was prohibitively high (Heather and Chain, 2016). As sequencing costs diminished, direct random shotgun sequencing of the entire sum of DNA extracted from the sample, termed metagenomics, became possible (Tyson et al., 2004; Venter et al., 2004). The cost of sequencing has continued to decline and computational methods have advanced, so it is now possible to reconstruct complete (or near-complete) genomes from deeply sequenced metagenome samples. These so-called metagenome-assembled genomes (MAGs) are rapidly expanding the ‘tree of life’ (Hug et al., 2016). Specialized assemblers and binning software make it possible to resolve near-complete genomes, helping to uncover novel prokaryotic diversity from complex environmental data sets, and to distinguish closely related species.

Experimental approaches to resolve MAGs are well established, and consequently this chapter will focus on state-of-the-art informatics techniques that perform bacterial taxonomic assertions on metabarcoding data sets and MAGs. We will highlight some of the advantages and disadvantages of the respective approaches and will finish by exploring why they are yet to be perfectly complementary.

Determining Taxonomy in Metabarcoding Experiments

As described in the introduction, the SSU rRNA is a widely used marker gene to classify the bacteria and archaea found in a microbiome. Presently, this approach is most effective for prokaryotes. Although eukaryotic genomes also contain a homologous SSU rRNA (the 18S rRNA), the bacterial DNA sequencing primers do not universally amplify the eukaryotic gene. Furthermore, the evolutionary distance between many of the microbial eukaryotes, especially the fungi, is relatively short compared to bacteria. As a result, there are fewer differences in the hypervariable regions of the eukaryotic SSU rRNA, which restricts the taxonomic resolution of the marker gene. Consequently, alternative genetic markers are frequently used for studying eukaryotes, notably the internal transcribed spacers (ITS1 and ITS2) found within the ribosomal operon (see Chapter 12). Finally, as viruses lack a universal marker gene, metabarcoding approaches cannot elucidate viral composition or abundance. Hence, metagenomic and metatranscriptomic approaches are the only sequence-based approaches currently in use for determining viral composition.

To accurately classify the individual species in a community we need to ascertain which sequences are phylogenetically dissimilar, which sequences contain errors (i.e. are not representative of the true community biodiversity) and what the taxonomic lineage is for each unique sequence. The ideal scenario would be to conduct error-free sequencing, then look up a perfect match for each sequence in a catalogue of all known organisms. If each one of these organisms had a complete genome, we could then create a mapping between the marker gene sequence and the functional profile for that species, reducing the need for metagenomic studies. We will return to this idea later but, unsurprisingly, we currently do not have such a comprehensive catalogue. As elegantly illustrated by Rinke and colleagues, our knowledge of microbial genomes is extremely limited in phylogenetic breadth, owing to our inability to cultivate most microorganisms in the laboratory (Rinke et al., 2013). Based on SSU rRNA profiling, estimates suggest that perhaps as few as 1% of bacterial species have been cultured. Databases such as SILVA (Glöckner et al., 2017) have been developed to capture SSU rRNA sequences, but their composition still falls short of the total microbial diversity. Thus, in the absence of a comprehensive catalogue of SSU rRNA, how can the unknown organisms be identified, taxonomically labelled and contextualized across different samples? One approach that has been widely used to overcome this is sequence-based clustering.

Clustering enables the generation of what are termed operational taxonomic units (OTUs), either in the context of a reference catalogue or database (closed reference) or de novo. Although widely applied, OTU techniques and species delineation thresholds have come under increasing scrutiny, both in terms of accuracy and taxonomic resolution. For example, researchers have questioned the validity of OTUs, the extent to which OTUs can be used in parallel for different samples and whether comparative analysis yields meaningful results (Chen et al., 2013; Callahan et al., 2017). In the following section we discuss the two main systematic methodologies for assigning taxonomy to bacterial metabarcoding data sets. We also highlight an emerging method based on amplicon sequence variants (ASVs), which overcomes some of the pitfalls encountered by OTUs.

Approaches for Assigning OTUs to Amplicon Sequences

Grouping SSU rRNA sequences by similarity has been the mainstay method to identify sequences that belong to the same species. More specifically, the OTU clustering method groups marker gene sequences that fall within a threshold of sequence ‘dissimilarity’, with each cluster of sequences termed an OTU and believed to have been derived from the same bacterial grouping. The typical threshold to group bacterial sequences to a species rank is 3% dissimilarity (97% sequence similarity). This threshold was first proposed by Stackebrandt and Goebel (1994) and was based on the gold standard DNA-DNA hybridization (DDH) threshold of 70% to delineate species. A comparative survey of the available bacterial SSU rRNA sequences with DDH results showed that sequences with > 70% DNA similarity correlated to a sequence identity greater than 97% (Edgar, 2017). As sequencing technologies have progressed, the number of available SSU rRNA sequences has increased and the validity of this OTU species threshold has been re-evaluated. The 97% value proved to be an approximation for species delineation, benchmarked on a subset of the SSU sequences, and incorrect merging of species is still observed at thresholds of 98–99%. For example, species within the genus Mycobacterium are greater than 98% similar (Beye et al., 2018; Edgar, 2018). An increase to this ‘gold standard’ percentage threshold has been proposed (Stackebrandt and Ebers, 2006; Edgar, 2018).

There are two main processes for OTU classification (closed reference and de novo clustering), which can be used individually or together. Closed reference OTU classification depends on a comprehensive reference database, which is representative of the biodiversity found in the community being sequenced. While it is out of the scope of this chapter to compare the merits of different reference databases (Almeida et al., 2018), it is noteworthy that this dependence has led to the development of both general and specific reference databases, with taxonomy targeted to particular biomes, e.g. the Human Oral Microbiome Database (HOMD) (Escapa et al., 2018). Using different databases with closed reference methods can present discrepancies in taxonomic annotation.

Briefly, sequences are first clustered into OTUs based on their matches to the reference database (termed closed reference OTU picking), and those that do not match within the predefined threshold remain unclassified (Edgar, 2017). The primary reason for adopting closed reference OTU classification is to easily assert a taxonomy for the OTU by directly mapping a representative sequence for each cluster to the reference database. The sequences within a cluster are assigned the same taxonomy as the representative. Another advantage of closed reference OTU picking is the ability to cluster together two different, non-overlapping query sequences that belong to a common reference sequence, for instance in experiments that employ multiple primers during the amplification process. Notably, closed reference OTU picking is the only method appropriate for assigning taxonomy to the SSU rRNA sequences from raw shotgun metagenomics sequences, which will have fragmentary sequence matches across the whole SSU rRNA gene. Since success is dependent on the reference database, as long as both non-overlapping regions are covered in the same database, it is possible to utilize this method for analysing complex data sets. Conversely, a drawback of closed reference OTU analysis is that novel species that do not match a reference sequence are left unclassified.
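The closed reference logic described above can be sketched in a few lines. This is a minimal illustration and not the procedure of any particular tool: the function names, the toy sequences and the taxonomy labels are hypothetical, and sequences are assumed to be pre-aligned to equal length so that identity reduces to counting matching positions.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference_assign(queries, reference, threshold=0.97):
    """Give each query the taxonomy of its best reference hit.

    Queries whose best hit falls below `threshold` stay unclassified (None),
    which mirrors the main drawback of closed reference OTU picking.
    """
    assignments = {}
    for query_id, query_seq in queries.items():
        best_taxonomy, best_identity = None, 0.0
        for ref_seq, taxonomy in reference:
            score = identity(query_seq, ref_seq)
            if score > best_identity:
                best_taxonomy, best_identity = taxonomy, score
        assignments[query_id] = best_taxonomy if best_identity >= threshold else None
    return assignments


# Toy data (hypothetical): two 100-nt reference sequences with made-up labels.
ref_a = "ACGT" * 25
ref_b = "TGCA" * 25
reference = [(ref_a, "Genus_A"), (ref_b, "Genus_B")]

queries = {
    "q1": "TT" + ref_a[2:],  # 2 mismatches against ref_a, i.e. 98% identity
    "q2": "A" * 100,         # far from both references
}
result = closed_reference_assign(queries, reference)
```

Here `q1` inherits the label of its reference, whereas `q2` remains unclassified even though it may be a genuine, novel sequence.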
It is also possible, particularly with second-generation short-read DNA sequencing, that a query sequence could match multiple sequences in the reference database. This occurs more frequently if the amplified marker gene sequence contains both conserved and variable regions (e.g. amplification of the SSU rRNA across variable regions 3 and 4). While a global alignment between the full-length SSU rRNA sequences in a reference database would reveal at least 3% dissimilarity between every sequence, an alignment of a sub-region of the SSU (as is produced by modern metabarcoding experiments) may not display the same level of dissimilarity, especially as each of the SSU variable regions evolves at a different rate (Chakravorty et al., 2007). Thus, one amplified sub-region can be clustered with multiple different reference SSU rRNAs, even though the full-length alignments of the reference sequences do not form an OTU. The original threshold (3% dissimilarity) to delineate species was defined by a comparative analysis of full-length SSU sequences with their DNA-DNA reassociation values (Stackebrandt and Goebel, 1994), and subsequently any suggested updates to this threshold were also benchmarked with high-quality, full-length sequences (Edgar, 2018). The variable regions alone do not reflect the same sequence similarity as a full-length SSU, as illustrated with a range of clustering algorithms by Edgar (2018), whereby an average threshold of almost 100% was required to distinguish species for the popular V4 16S variable region. It has been suggested that a threshold between 97% and 99% could allow delineation of species with the full length of the gene (Johnson et al., 2019). We will discuss long-read sequencing and resolving a full-length SSU further in this chapter.

The alternative approach for OTU classification is de novo clustering, which is independent of a reference database and involves clustering of the entire set of query sequences to each other using distance-based algorithms, followed by the assignment of a consensus sequence for each cluster. This approach is widely adopted by research scientists, especially for analyses of environmental microbiomes, where the constituent bacteria are poorly represented by reference databases. However, distance-based algorithms can be computationally intensive, so different tools utilize different measures and algorithms to improve efficiency.
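To make the idea of de novo clustering concrete, the following is a minimal sketch of greedy centroid clustering under strong simplifying assumptions: reads are pre-aligned to equal length, identity is a plain per-position match fraction, and reads are processed in a fixed order of decreasing abundance. Real tools use k-mer heuristics and proper alignment, so this illustrates the principle only; the function names and toy reads are hypothetical.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Cluster reads de novo; returns {centroid sequence: total reads in the OTU}."""
    counts = Counter(reads)
    clusters = {}
    # Most abundant unique sequences are considered first (ties broken alphabetically).
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += n  # joins an existing OTU
                break
        else:
            clusters[seq] = n  # no centroid is close enough: seed a new OTU
    return clusters


# Toy reads (hypothetical): an abundant sequence, a 1-mismatch neighbour and a distant one.
seq_a = "ACGT" * 25
seq_a1 = "T" + seq_a[1:]
seq_b = "TGCA" * 25
reads = [seq_a] * 5 + [seq_a1] * 2 + [seq_b] * 3

otus_97 = greedy_cluster(reads, threshold=0.97)
otus_100 = greedy_cluster(reads, threshold=1.0)
```

At 97% the one-mismatch neighbour is merged into the abundant centroid, giving two OTUs; at 100% it seeds its own cluster, giving three. The outcome depends on both the threshold and the processing order, which is one reason why results differ between tools.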
The biological accuracy and number of OTUs output by the various tools are influenced by the type of sequence alignment and clustering model used. As such, results can differ based on the tool used, restricting cross-study comparison (without recomputation) (Chen et al., 2013). Most de novo clustering tools utilize k-mer-based algorithms (k-mers being overlapping sub-sequences of length k) with a random seed; therefore, when the clustering is repeated, there can be subtle differences in the output (Ghodsi et al., 2011). This is all the more evident should sequences be added from a second sequencing experiment, as there could potentially be major differences in the clustering results. Furthermore, the OTUs are typically assigned an identification number, but until the representative sequence for this OTU is mapped to a taxonomy, that number carries no meaning beyond the data set being clustered. Thus, identifying an equivalent OTU between two different data sets would actually require the re-clustering of the entire, combined data sets. For large data sets, such as the ‘earth microbiome’ (Thompson et al., 2017) and the American Gut projects (McDonald et al., 2018), which could be reanalysed by different researchers, replicating the same clustering with de novo methods is computationally expensive and may result in a marginally different taxonomic classification. Moreover, multiple data sets can only be directly compared if they are from the equivalent region of the marker gene. As de novo clustering initially treats each sequence cluster as though it could be novel, and does not provide any taxonomic lineage, the primary information provided is the diversity, explained by the number of OTUs and their relative abundance.

To overcome the respective drawbacks of de novo clustering and closed reference database mapping, open reference OTU analysis can be employed. It combines both approaches: first, closed reference database mapping is performed, followed by de novo clustering of the sequences that have no match to the reference database in the first instance. This provides a taxonomy for those OTUs that match the reference databases, as well as enabling the relative abundance of those OTUs that do not match to be estimated. A mapping of the latter to a more generic database can be attempted, or they can represent hitherto novel organisms. This makes it possible to compare those OTUs that have a taxonomic assertion between two different data sets; however, those OTUs arising from the de novo clustering are not comparable.
It is evident that OTU clustering has limitations, and the potential for comparative study is restricted, but it is still a popular method to classify bacterial communities. Aside from condensing large bacterial data sets into more manageable groups, OTUs were primarily introduced to diminish the effects of PCR and sequencing errors. Advanced Illumina sequencing has reduced the chance of an incorrect base call to < 0.1% and, augmented with post-sequencing quality control, this could be further improved. However, if each sequence is treated uniquely, it is possible to interpret errors as a new strain, incorrectly inflating the diversity within a sample. PCR errors are multiplied through the amplification process and become more prominent (increasing in frequency) in deeply sequenced experiments that attempt to recover bacterial strains present at a low abundance. PCR or sequencing errors can be as small as a single nucleotide misassignment which, when clustering full-length SSU, is effectively masked in a 97% similarity cluster. Grouping similar sequences into OTUs minimizes the rate at which the errors can be misinterpreted as biodiversity. As current error rates with second-generation sequencing platforms such as Illumina are so low, clustering at 99% is feasible, thus providing better taxonomic resolution.
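A short worked calculation helps to put these numbers in context. The sketch below assumes a full-length SSU rRNA of roughly 1,500 nucleotides and a 250-nucleotide amplicon read (both illustrative figures, not taken from the chapter), takes the 0.1% per-base figure as an upper bound, and treats base-call errors as independent, which is a simplification.

```python
import math


def mismatches_allowed(length: int, similarity: float) -> int:
    """Largest whole number of mismatches that keeps identity at or above `similarity`."""
    # round() guards against floating-point noise such as 45.00000000000004
    return math.floor(round(length * (1.0 - similarity), 9))


def p_read_has_error(length: int, per_base_error: float) -> float:
    """Probability that a read carries at least one error, assuming independent errors."""
    return 1.0 - (1.0 - per_base_error) ** length


FULL_LENGTH = 1500   # assumed approximate length of a full-length SSU rRNA gene
READ_LENGTH = 250    # assumed amplicon read length
ERROR_RATE = 0.001   # the < 0.1% per-base figure, used here as an upper bound

full_97 = mismatches_allowed(FULL_LENGTH, 0.97)   # mismatches tolerated at 97%
full_99 = mismatches_allowed(FULL_LENGTH, 0.99)   # mismatches tolerated at 99%
read_99 = mismatches_allowed(READ_LENGTH, 0.99)   # mismatches tolerated on a short read
p_read = p_read_has_error(READ_LENGTH, ERROR_RATE)
```

Under these assumptions a 97% cluster of full-length sequences tolerates 45 mismatches and a 99% cluster tolerates 15, so a single erroneous base is comfortably absorbed. At the same time, roughly one in five 250-nucleotide reads would carry at least one error at a 0.1% per-base rate, which is why treating every unique read as a real sequence can inflate diversity.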

Emergence of Amplicon Sequence Variants (ASVs)

Environmental analyses have typically looked for significant bacterial differences between studies at higher taxonomic ranks. While OTUs are an ideal method to perform an analysis of sample diversity from within an individual project, an increased interest in meta-analyses focused on detailed strain-level differences in environmental bacteria, and/or longitudinal time-series analyses, means that they are less appropriate. Strain-level differences make important contributions to symbiotic bacterial relationships, pathogenicity and more (Bauer et al., 2018). The combination of a number of unknown novel strains in the lower lineages, partly due to the grouping of different strains into OTU clusters, and the lack of accurate species/strain taxonomy means that environmental diversity is often summarized to a trusted phylum or genus rank. As the number of known strains increases, it may be possible to elucidate species- or even strain-level differences for meta-analysis. In such scenarios, it can be unclear whether the variation within an OTU is due to errors or true biological diversity.

To overcome many of the limitations of OTU methods, an alternative method has recently grown in popularity. The ASV method is designed to eliminate sequencing errors and reveal only the true biological data. In addition to the removal of erroneous sequences, ASVs avoid the classic clustering methods used to resolve OTUs, thereby abolishing the need for a 97% threshold. Arguably, the taxonomy derived from ASVs is a more accurate representation of the true biological community, delineating strain variants at a detailed single nucleotide level. For instance, Neisseria meningitidis is often carried in humans as a commensal bacterium and can become highly virulent with single nucleotide polymorphisms. N. meningitidis has several serotypes that carry combinations of hypervirulent complexes (Sacchi et al., 2002). Surveillance of the SSU rRNA sequences from those strains involved in outbreaks revealed single-base differences with clear ancestry, depicting evolutionary recombination events (Sacchi et al., 2002). This is one clinical example where ASVs could be a useful method to differentiate commensal and pathogenic strains.

ASVs have been referred to in the literature by other terms: exact sequence variants, subOTUs and zero-radius/noise OTUs (zOTUs). Some of the earliest references to ASV-like techniques were ‘denoising’, oligotyping and minimum entropy decomposition (MED). Pyrosequencing runs would be denoised with AmpliconNoise, an algorithm that removes erroneous sequences prior to constructing OTUs (Callahan et al., 2017). By contrast, oligotyping is a focused clustering of short informative DNA sequences taken from an initial alignment. In the context of marker gene data, the information measure is the Shannon entropy calculated at each position of an alignment of query sequences; the entropy score pinpoints the nucleotide positions most likely to reflect evolutionary variation.
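The per-position Shannon entropy mentioned above can be sketched as follows. This is a minimal illustration rather than the implementation of any oligotyping tool: the function names are hypothetical, the base-2 logarithm is a choice made here, and gap characters are simply treated as another symbol.

```python
import math
from collections import Counter


def column_entropy(column) -> float:
    """Shannon entropy (in bits) of the characters found in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(aligned_sequences):
    """Entropy of every column of an alignment of equal-length sequences."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_entropy(column) for column in zip(*aligned_sequences)]


def informative_positions(aligned_sequences, min_entropy=0.2):
    """Indices of columns whose entropy reaches `min_entropy` (an arbitrary cut-off)."""
    entropies = alignment_entropy(aligned_sequences)
    return [i for i, h in enumerate(entropies) if h >= min_entropy]


# Toy alignment (hypothetical): only the last column varies.
alignment = ["ACGT", "ACGA", "ACGT", "ACGA"]
entropies = alignment_entropy(alignment)
variable_columns = informative_positions(alignment)
```

Columns in which every sequence agrees have zero entropy, whereas a column split evenly between two bases scores one bit; only the high-entropy columns would be carried forward to define oligotypes.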
An oligotyping study of SSU sequences from the human gastrointestinal tract revealed greater diversity of Bacteroides than a de novo OTU approach, with differences as small as 2 nucleotides correlating to geographic specificity (Eren et al., 2013). MED, an extension of oligotyping introduced by Eren et al. (2015), was designed to remove the reliance on pre-clustering of data sets. The marker gene sequences are grouped by entropy score, and iteratively regrouped until the score drops to 0; that is, there are no remaining nucleotide positions that are distinctly variable between all sequences in a cluster. To illustrate the presence of diversity within OTUs, MED was applied to OTUs clustered at a 99% threshold for three different marker genes, and significant sequence variation was observed within the clusters (Needham et al., 2017). Clusters with high abundance taxa had the most variation in diversity. This demonstrates that it is possible to differentiate single-base variants within an OTU cluster at a threshold higher than 97%, highlighting diversity that was otherwise missed by OTUs. However, oligotyping is a procedure that needs refining. It requires an estimation of nucleotide positional diversity for each expected bacterium in an environment, and is heavily dependent on user-defined parameters to calculate the rate of error (e.g. the minimum number of samples in which an oligotype is expected, or the minimum count of the most abundant species) (Eren et al., 2013).

‘Denoising’, entropy and MED are methods and measures used as a guideline to validate ASVs, and the revised approaches avoid OTU clustering altogether. ASV algorithms primarily focus on the abundance of a sequence to evaluate the chance of it being erroneous, the hypothesis being that errors are less prevalent than real sequences (Callahan et al., 2017). DADA2, Deblur and UNOISE are three examples of ASV pipelines. Read-preparation steps perform quality control by de-multiplexing raw reads and filtering chimeric sequences prior to deriving ASVs. Merging of paired-end reads is an additional step occurring prior to or after the assignment of ASVs, depending on the parameters and algorithm used. We will briefly discuss the differences between the three pipelines and explore some examples of ASV analysis.

UNOISE (Edgar, 2016) uses a model based on the distribution of abundance and distance. Initial steps remove low-abundance sequences as they are deemed more likely to be errors. This approach applies a coarse filter to the data set and risks eliminating real low-abundance strain variants.
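As a rough illustration of abundance-guided denoising in this spirit, the sketch below treats a rare sequence as a probable error when it lies within a small distance of a much more abundant sequence. It is a deliberately simplified stand-in and does not reproduce the published UNOISE rule: the function names, the fixed `min_abundance_ratio` and `max_distance` values, and the use of Hamming distance on equal-length reads are all assumptions made for this example.

```python
def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def absorb_likely_errors(counts, max_distance=1, min_abundance_ratio=8.0):
    """Fold rare near-neighbours into abundant centroids.

    `counts` maps each unique sequence to its read count. A sequence is treated
    as an error of a centroid when it is within `max_distance` substitutions AND
    the centroid's own read count is at least `min_abundance_ratio` times larger.
    """
    centroid_counts = {}  # original abundance of each accepted ("true") sequence
    totals = {}           # abundance after absorbing predicted errors
    # Process unique sequences from most to least abundant.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        parent = None
        for centroid, centroid_n in centroid_counts.items():
            if (len(seq) == len(centroid)
                    and hamming(seq, centroid) <= max_distance
                    and centroid_n >= min_abundance_ratio * n):
                parent = centroid
                break
        if parent is None:
            centroid_counts[seq] = n
            totals[seq] = n
        else:
            totals[parent] += n
    return totals


# Toy counts (hypothetical sequences and abundances).
true_a = "ACGT" * 25
variant = "G" + true_a[1:]   # 1 mismatch from true_a, but abundant
error = "T" + true_a[1:]     # 1 mismatch from true_a, and rare
other = "TGCA" * 25          # unrelated sequence
toy_counts = {true_a: 1000, variant: 400, error: 3, other: 50}

denoised = absorb_likely_errors(toy_counts)
```

The abundant one-mismatch variant survives as its own sequence, while the rare one is folded into its neighbour. A genuine strain present at three reads would be removed in exactly the same way, which is the risk to low-abundance variants noted above.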
UNOISE depends on two measures: the maximum distance allowed (accounting for substitutions and gaps only), and the minimum abundance ratio between a true sequence and a query (Edgar and Flyvbjerg, 2015). Query sequences are clustered into unique reads and processed from most abundant to least. The most abundant sequence is treated as a centroid, or a true sequence. The neighbouring sequences are compared to the centroid by their abundance and distance measure. An erroneous sequence would have a small distance measure and also low abundance in comparison to a true sequence. Both measures are user-defined parameters, and they either increase or decrease the sensitivity of the error model. User-defined distance measures are the biggest hurdle with the UNOISE model, as the distance value needs to be no larger than the expected error rate. ASVs aim to eliminate arbitrary thresholds; however, this mathematical model is only accurate if the sensitivity of the model mirrors the likelihood that an error will occur. The sensitivity would be conditional on the type of PCR and sequencing platforms used (Edgar and Flyvbjerg, 2015; Edgar, 2016). This error model was validated using ‘mock bacterial communities’ (two data sets comprising defined collections of bacterial species) sequenced with Illumina, and as such may not be representative of the error rates for different sequencing techniques.

Deblur (Amir et al., 2017) is predominantly guided by abundance. A multiple sequence alignment of reads detects the number of indels and substitutions using a distance measure to determine which reads are most similar, namely the neighbouring reads. As the distance increases, so does the error probability (the probability that a single nucleotide mismatch could occur). Insertions and deletions are treated with a constant probability of 0.01. In order of abundance, an error prediction for each read is subtracted from defined neighbouring reads. If the number of reads remaining at the end of the iterations is 0, then it is assumed that this sequence was an error. Deblur predefines harsher error estimates per substitution to negate the effects of different PCR conditions. The error predictions are also trained with Sanger and Illumina sequence data sets. A caveat of this approach is the risk of falsely labelling true sequences as errors.
In most cases this should not be an issue, as both the sequences themselves and the sequence abundances of an error and true read need to be very similar for this to occur. However, since Deblur prioritizes abundant sequences, if a true rare variant has low counts, it could be eliminated.

The divisive amplicon denoising algorithm (DADA) uses the AmpliconNoise approach mentioned earlier (Quince et al., 2011), and has now been superseded by DADA2 (Rosen et al., 2012; Callahan et al., 2016). DADA2 uses machine learning to learn the error rate of a sequencing run, estimating the rate at which one base can be substituted for another. The second step uses the learned error rate as a parameter to perform pairwise k-mer clustering and re-clustering until a significant p-value is reached. The p-value is a calculation of abundance, whereby a significant value means that a sequence abundance is too high to be classified as an error. Clustering begins by assuming that the most abundant sequence is real and that the remainder are errors, and the model is incrementally improved. The result is a set of clusters where each cluster representative is a real error-free biological sequence. DADA2 is modelled on Illumina reads and the initial estimation of error rates is modelled on Illumina quality scores; however, the quality metrics can be manually assigned for other sequencers. Since the algorithm is dependent on quality scores, paired-end reads are merged after ASV elucidation. Each sample is processed independently and the errors on each read are treated independently.

All three of the above examples aim to output only true biological sequences, removing errors that introduce false biodiversity. ASVs are unique, and each sequence is treated as a single entity with its own taxonomy, which means it is possible to make comparative analyses across studies. In contrast, the constituent sequences of an OTU can contain errors, causing uncertainty about its true diversity, which is masked by classifying only a representative sequence. The OTU cluster identifiers are arbitrary numbers and cannot be extrapolated for replicate studies, or used for direct comparison to another taxonomic lineage, until they are mapped to a database. A schematic of the advantages of ASV over closed reference OTU and de novo OTU clustering is shown in Fig. 11.1. Straub et al. (2019) performed a comparative analysis between ASV and OTU methods for SSU rRNA from mock and true environmental data sets.
Focusing on the environmental samples, the ASV approach outperformed the OTU approach in both specificity and sensitivity. Fewer false positives were observed as the ‘denoising’ steps effectively eliminated erroneous reads. To reduce the number of error-containing reads within the OTUs, low-abundance reads were removed. However, this also meant that true rare variant sequences were discarded. Strikingly, there was no overlap in the most abundant genera between the two methods, and this is most likely due to algorithm differences. Specifically, ASVs take into account the abundance difference between real and erroneous sequences and refit models to an optimum, whereas OTU abundance is evaluated by the number of sequences in a cluster. A range of environmental samples, including soil, were analysed by Straub et al. (2019). Soil is known to be a microbe-rich environment, and two independent soil samples can differ vastly in diversity. Environments such as soil have a high beta-diversity (the number of species that differ between environments) and produce complex phylogenetic trees with many low-abundance branches at the lower taxonomic ranks. For example, 75% of the genus-level diversity in soil and sediment samples in this study had less than 1% abundance (Straub et al., 2019). It can be overwhelming to analyse this volume of data when broken down to ASVs. For this reason, Straub et al. (2019) could only confidently analyse bacterial taxonomy to genus level. If elucidating a detailed taxonomy of low-abundance, rare strains is the aim of an analysis, the resolution of ASVs is an advantage. Otherwise, a parallel OTU and ASV approach may be more informative, where the initial OTU clusters inform about the general diversity, and ASVs hint at potential errors.

Meta-analysis has inspired sequencing of new interesting environments, and at times there is only a small quantity of a sample available, or the correct extraction technique still needs to be refined. Ancient DNA, for example from skeletal or calcified dental plaque samples (Weyrich et al., 2018) or hair samples with low-quality DNA and low biomass, respectively, may be subjected to multiple rounds of PCR and deeper sequencing to achieve viable data. Longer exposure to laboratory techniques makes these samples susceptible to contamination, and removal of error sequences becomes vital to prevent false positive diversity estimates. Caruso et al. (2019) tested this theory by diluting mock community samples with a known bacterial diversity to a low biomass, followed by OTU and ASV analyses. The number of detected contaminants increased linearly as sample biomass decreased for Deblur, DADA2 and UNOISE (the ASV tools discussed above), and non-linearly for OTU clustering algorithms. The results support the view that ASV methods are effective in eliminating error sequences. We have discussed two studies that illustrate the advantages of ASVs, but it is important

Fig. 11.1. A schematic of the potential workflow from prokaryotic genomes to taxonomic annotation. From left to right, full-length marker gene sequences are elucidated from complete or near complete genomes. Marker gene sequences are classified: a closed reference OTU approach can only classify sequences present in the reference database. Clusters of unknown sequences from a de novo OTU approach cannot be compared across projects. An ASV approach can identify species/strains with single nucleotide differences, and can expand clades in a phylogenetic tree where these were previously clustered together. Ultimately a combination of the functional profile (from coding sequences in the genomes) and ASV taxonomy will highlight the variation in diversity between projects. OTU, operational taxonomic unit; ASV, amplicon sequence variant. [Figure not reproduced. Panel labels: Marker genes from isolate genomes; Reference marker gene sequences; Unknown; Closed reference; De novo; ASVs; Functional profile; Cross-analyses of taxonomic and functional diversity (Samples 1 to 4); Cross-project compatibility (Project 1, Project 2).]

to scrutinize the models they use. As with de novo OTU clustering, two ASV studies can only be compared accurately if they used the same primers targeting the same genetic locus, a limitation that closed reference OTU clustering does not have. Additionally, all three tools (DADA2, Deblur and UNOISE) ‘denoise’ the data and need to estimate error rates to do so. The models are trained beforehand on a range of Illumina and/or Sanger data sets (i.e. short-read sequencing), and it is unlikely that the trained models encompass all the possible PCR and sequencing conditions. Moreover, with an increased interest in long-read sequencing, all models would need to be retrained with new test data before they can be applied. Error estimation is always approximate, regardless of the algorithm used, thus finding a universal method that is accurate for a wide range of environments and sequencing conditions is a difficult task. Nevertheless, the fine-scale bacterial diversity uncovered with ASVs is promising, and calls into question whether OTU clustering is the appropriate method for meta-analysis.

The accuracy of taxonomic assignments for metabarcoding experiments can be variable, depending on the marker gene, the region of the gene that is targeted and the reference database (Almeida et al., 2018). Each of these factors should influence the choice of classifier that is used. First, the marker gene. There are some gold standard marker genes that are trusted, depending on the microbe type you wish to classify, and we have discussed SSU rRNA here. However, the SSU rRNA gene used for prokaryotes can exist as multiple copies, and sometimes with intra-genomic heterogeneity. In fact, E. coli is known to have seven copies of the SSU rRNA gene (Johnson et al., 2019). Where the copies differ in sequence, each could be classified as a different ASV, thereby inflating diversity and risking the introduction of false novel strains to an already complex bacterial phylogeny. For a data set comprising DNA from bacteria, many of which are known to carry multiple copies of the targeted marker gene, one may choose to use a combined OTU and ASV analysis to capture false positives, or consider a different marker gene. The SSU rRNA is only one potential marker gene we have discussed. The large subunit (LSU) rRNA (23S in bacteria or archaea and 28S in eukaryotes) (Pei et al., 2009) has a phylogeny largely in agreement with the SSU rRNA, and both ITS regions are increasingly used for fungal data sets (Schoch et al., 2012). Each gene is a different length and evolves at a different rate; therefore, the advantages and limitations of ASVs will apply variably (Needham et al., 2017).

Second, the marker gene region. The SSU rRNA gene has nine variable regions which all evolve at different rates. Using a de novo OTU approach for non-overlapping regions would result in fragmented clusters and falsely inflated bacterial diversity, and so a closed reference OTU approach or ASVs may be more appropriate.

Third, the database. This represents the existing phylogeny that query sequences are mapped to. Databases with a biome-targeted taxonomy could be selected for a well-characterized environment. Additionally, the way a taxonomy is generated may largely influence whether it is relevant for a data set. Taking SILVA (2020) as an example: the database is available clustered anywhere between 97% and 99% sequence similarity, and the user should select which taxonomy they wish to use depending on the data type. Furthermore, each database sources its sequences from other high-throughput studies or general databases, and it is important to consider whether the source or quality of sequences in that resource suits your data set.

In this section we have discussed OTUs and ASVs as two potential options to classify marker gene sequences. While each has its benefits and pitfalls, in each case there are parameters and options which can and should be optimized for the query data set.
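The difference between the two strategies discussed in this section can be illustrated with a toy sketch. It contrasts grouping reads by exact sequence (the idea underlying ASVs, but without the error modelling that DADA2, Deblur or UNOISE perform) with a greedy, abundance-ordered clustering at 97% identity (the idea underlying de novo OTUs). The reads, the function names and the position-wise identity measure on equal-length sequences are all assumptions made for illustration; real tools use alignment-based identity and far more careful heuristics.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this toy measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def exact_variants(reads):
    """Group reads by exact sequence (ASV-like, with no denoising step)."""
    return Counter(reads)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering: abundant sequences seed clusters first."""
    counts = Counter(reads)
    otus = {}  # centroid sequence -> total number of reads in the cluster
    # Sort by abundance (descending), then by sequence, so the order is deterministic.
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n  # no centroid is close enough: start a new cluster
    return otus


# Hypothetical 100-base reads: a reference, a one-base variant and a distant sequence.
ref = "ACGT" * 25
variant = "T" + ref[1:]              # 99% identical to ref
distant = "TTTTTTTTTT" + ref[10:]    # 92% identical to ref

reads = [ref] * 5 + [variant] * 3 + [distant] * 2
asvs = exact_variants(reads)
otus = greedy_otus(reads)
print(len(asvs), "exact variants;", len(otus), "OTUs at 97%")
```

In this toy data set the one-base variant is kept as its own exact variant but is absorbed into the reference OTU. Whether that variant is a real strain, a divergent copy of the marker gene within one genome or a sequencing error cannot be decided by exact matching alone, which is why ASV tools add an error model on top.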

Assigning Taxonomy to MAGs

So far in this chapter we have focused entirely on methods for assigning taxonomy to the sequences that are obtained from metabarcoding experiments. Such experiments represent the cheapest approaches for understanding the bacterial composition of a microbiome, both in terms of sequencing and of computational analysis. SSU rRNA has been a standard classification tool for prokaryotic taxonomy data sets for many years. We have described above the potential inaccuracies of amplicon classification tools, and metabarcoding is reliant on amplicons that target only partial regions of this marker gene. In this part of the chapter we turn our attention to the other extreme, the taxonomic assignment of genomes (or MAGs) recovered from deeply sequenced shotgun metagenomics experiments.

This represents arguably the most costly form of experiment, but also potentially the most insightful. It involves reconstructing prokaryotic taxonomy with whole-genome data using the functional, metabolic and structural genes in an organism. Ideally, the genes should be universally found and single copy; that is, present just once within the genome. Using single-copy marker genes means that any change is a good indication of phylogenetic signal, and prevents mistakenly classifying a species more than once, or classifying multiple novel species where multiple copies of a slightly variable marker gene exist (Konstantinidis et al., 2006). For example, the species identification (SpecI) tool was developed to enable the reclassification of prokaryotic isolate genomes based on 40 universal, single-copy marker genes (Mende et al., 2013). As our ability to isolate and sequence prokaryotic genomes improves, such methods are increasingly important, as the rate of discovery is outstripping the rate at which classical taxonomy can be assigned (Forster et al., 2019; Zou et al., 2019). At the time of writing, there are 249,239 prokaryotic genomes deposited in RefSeq. The use of functional marker genes has also become increasingly applicable to classify MAGs, which have increased substantially in number over the past 3 years (Parks et al., 2017; Almeida et al., 2019; Nayfach et al., 2019; Pasolli et al., 2019; Glendinning et al., 2020), due both to the increased volume of shotgun metagenomic data and to improved methods for the assembly and quality assessment of MAGs. The CheckM software has been widely used to assess the quality of MAGs, and estimates completeness and contamination by surveying for single-copy genes (Parks et al., 2015). Unlike SpecI, CheckM uses lineage-specific single-copy marker genes for completeness and contamination, and provides an indication of the MAG taxonomy.
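The logic of estimating completeness and contamination from single-copy genes can be sketched in a few lines. This is a deliberately simplified illustration and not CheckM's actual algorithm (which, among other things, works with lineage-specific and collocated marker sets): completeness is taken as the share of expected markers seen at least once, and contamination as the number of surplus copies relative to the number of expected markers. The function name and marker names are hypothetical.

```python
def completeness_contamination(marker_counts, expected_markers):
    """Return (completeness %, contamination %) from single-copy marker counts.

    marker_counts maps a marker name to the number of copies found in the bin.
    expected_markers lists the markers expected exactly once per genome.
    """
    n = len(expected_markers)
    if n == 0:
        raise ValueError("at least one expected marker is required")
    # A marker seen at least once counts towards completeness.
    found = sum(1 for m in expected_markers if marker_counts.get(m, 0) >= 1)
    # Every copy beyond the first hints at sequence from a second organism.
    extra = sum(max(marker_counts.get(m, 0) - 1, 0) for m in expected_markers)
    return 100.0 * found / n, 100.0 * extra / n


# Hypothetical bin: 8 markers found once, one found twice, one missing.
expected = [f"marker_{i}" for i in range(10)]
counts = {m: 1 for m in expected[:8]}
counts["marker_8"] = 2
completeness, contamination = completeness_contamination(counts, expected)
print(f"completeness={completeness:.1f}% contamination={contamination:.1f}%")
```

The sketch also makes the weakness of the approach visible: a missing marker may reflect an incomplete bin or a genuine gene loss in that lineage, and the two cases are indistinguishable from the counts alone.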
However, the primary purpose of CheckM is to assess completeness and contamination. Why do we need such metrics? The process of generating a MAG is complex. Short reads are assembled into contigs, then these contigs are sorted into sets (a process termed binning) that are believed to have come from a single organism, typically based on both a uniform coverage of the reads and on tetranucleotide content. However, there are numerous steps where errors can be introduced. First, the tools used for assembly often fail to assemble repetitive regions such as tandem repeats, or common elements shared between genomes. Second, the process of binning can be imperfect, either missing contigs or inappropriately grouping contigs that actually come from different organisms. This becomes particularly problematic when there are closely related organisms with similar levels of abundance, typically in diverse environments. According to the MIMAG standard (Bowers et al., 2017), completeness of a medium-quality MAG could be as low as 50% of the genome. Thus, assigning taxonomy based on a relatively small set of marker genes becomes less appropriate, as numerous marker genes could be missing. In a very extreme scenario, two different 50% complete MAGs from the same species could represent entirely different halves of the same genome. A set of MAGs is typically reduced by a process of de-replication whereby genomes that are identified as being the ‘same’ are reduced down to a single representative high-quality genome. This avoids redundancy when depositing MAGs into a database. Novel MAGs may not match any existing isolate genomes, resulting in an uninformative taxonomy where a lineage is empty or labelled with an arbitrary ID such as ‘uncultured bacteria’ or ‘unknown bacteria’. Thus, there is the need for a better-resolved taxonomic classification.

The Genome Taxonomy Database (GTDB) resource was established to overcome these limitations of simply using marker genes. GTDB proposes a reconstructed bacterial and archaeal taxonomy derived from publicly available isolate genomes and culture-independent genomes; that is, MAGs (Parks et al., 2018b). A subset of genomes is selected from RefSeq/GenBank and the Sequence Read Archive (SRA) to produce a combination of frequently sequenced genomes, and high-quality MAGs (see MIMAG standard) from under-sampled lineages. Genomes are de-replicated into groups of similar species based on estimated average nucleotide identity (ANI).
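De-replication as described above can be sketched as a greedy procedure: genomes are visited from best to worst, and each one either joins the first representative it matches above an ANI threshold or becomes a new representative. The sketch below is an assumption-laden illustration rather than the procedure of any particular tool. The ranking score (completeness minus 5 × contamination), the 95% ANI cut-off, the genome names and the pre-computed ANI table are all illustrative choices.

```python
def dereplicate(genomes, ani, threshold=95.0):
    """Greedy de-replication.

    genomes maps a genome name to (completeness %, contamination %).
    ani maps frozenset({name_a, name_b}) to a pairwise ANI in percent.
    Returns a dict: representative name -> list of cluster members.
    """
    def quality(name):
        completeness, contamination = genomes[name]
        return completeness - 5.0 * contamination  # assumed ranking score

    clusters = {}
    # Visit genomes from highest to lowest quality (name breaks ties).
    for name in sorted(genomes, key=lambda g: (-quality(g), g)):
        for rep in clusters:
            # Missing pairs are treated as unrelated (ANI of 0).
            if ani.get(frozenset((name, rep)), 0.0) >= threshold:
                clusters[rep].append(name)
                break
        else:
            clusters[name] = [name]  # no match: becomes a representative
    return clusters


# Hypothetical MAGs with (completeness, contamination) and pairwise ANI values.
genomes = {"MAG_A": (95.0, 1.0), "MAG_B": (80.0, 2.0), "MAG_C": (60.0, 0.0)}
ani = {
    frozenset(("MAG_A", "MAG_B")): 98.5,
    frozenset(("MAG_A", "MAG_C")): 82.0,
    frozenset(("MAG_B", "MAG_C")): 81.0,
}
clusters = dereplicate(genomes, ani)
print(clusters)
```

Because the highest-quality genome is visited first, it ends up representing its cluster, which mirrors the aim of keeping a single high-quality representative per group.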
An additional quality control step filters the de-replicated sequences for an estimated quality > 50 (defined as completeness minus 5 × contamination), as calculated with CheckM, such that only the very best isolate and MAG genomes are utilized from the previous step. Both RefSeq and SRA have a set of stringent quality measures to ensure that their databases contain near-complete, high-quality genomes. For example, RefSeq only comprises annotated prokaryotic genomes that pass specific completeness and contamination thresholds (Haft et al., 2018). GTDB then attempts to computationally reconstruct an unbiased bacterial taxonomy with equal representation for taxonomic classes that are usually sparse (have low depth). However, counts still favour frequently studied phyla, with > 65,000 genomes classified as Proteobacteria and > 36,000 classified as Firmicutes. This is an issue encountered with most bacterial taxonomy, as earlier studies predominantly focused on sequencing human pathogenic bacteria to aid clinical research. As interest in environmental microbiomes has grown over the last decade and sequencing no longer requires cultured isolates, the bias towards certain clades may diminish over time. It may be the case, however, that the existing bias is simply a true reflection of evolution.

The backbone of the GTDB phylogeny is a concatenated alignment of 120 ubiquitous single-copy proteins per genome. The reference proteins were present in more than 90% of prokaryotic genomes, mostly as a single copy (Parks et al., 2017). Using Prodigal to infer coding regions, the 120 single-copy marker proteins are extracted with HMMER for each genome. The concatenated alignment of the proteins is trimmed, leaving only regions that are well represented in most genomes. The phylogenetic tree is inferred from the remaining concatenated single-copy proteins. The cultured genomes are labelled with NCBI taxonomy, standardized to seven ranks, and SSU rRNA sequences in MAGs are mapped to the Greengenes and SILVA databases to resolve taxon names for uncultured genomes. The GTDB phylogeny is restricted to monophyletic clades with the exception of some known polyphyletic phyla such as Proteobacteria and Firmicutes (Parks et al., 2018a), and phylogenetic depths are normalized by relative evolutionary divergence (RED).
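The RED normalization can be made concrete with a small worked example. The sketch assumes the commonly stated definition: the root has RED 0, and a node n with parent p receives RED(n) = RED(p) + (d / u) × (1 − RED(p)), where d is the branch length from p to n and u is the mean distance from p to the leaves below n, so that every leaf ends at 1. The formula as written here, the dictionary-based tree representation and the node names are assumptions for illustration and should be checked against the GTDB publications before being relied on.

```python
def leaf_distances(tree, node):
    """Distances from `node` to every leaf beneath it (a leaf returns [0.0])."""
    children = tree.get(node, [])
    if not children:
        return [0.0]
    return [length + d
            for child, length in children
            for d in leaf_distances(tree, child)]


def red_values(tree, root):
    """Compute RED for every node of a rooted tree.

    tree maps a node name to a list of (child name, branch length) pairs;
    nodes that do not appear as keys are leaves.
    """
    red = {root: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child, length in tree.get(parent, []):
            # u: mean distance from the parent to the leaves below this child.
            dists = [length + d for d in leaf_distances(tree, child)]
            u = sum(dists) / len(dists)
            if u == 0:
                raise ValueError("zero-length lineage; RED is undefined here")
            red[child] = red[parent] + (length / u) * (1.0 - red[parent])
            stack.append(child)
    return red


# Hypothetical tree: root -> X (0.5) and leaf c (1.0); X -> leaves a (0.5) and b (1.5).
tree = {"root": [("X", 0.5), ("c", 1.0)], "X": [("a", 0.5), ("b", 1.5)]}
red = red_values(tree, "root")
for node in ("root", "X", "a", "b", "c"):
    print(node, round(red[node], 3))
```

For node X the mean root-to-leaf distance through X is (1.0 + 2.0) / 2 = 1.5, so its RED is 0.5 / 1.5, about 0.333, while all three leaves reach 1 regardless of their raw branch lengths. Ranks can then be compared by RED interval rather than by raw depth in the tree.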
Meaningful nomenclature for uncultured genomes will ensure universal and informative communication within the scientific community. It will also aid comparative studies where there may be overlap in novel microbial diversity. Large-scale metagenomic studies aim to accurately locate the position of each MAG in a phylogenetic tree to determine the microbial content and diversity of a particular environment.

In a nutshell, assigning taxonomy relies on an accurate pairwise comparison between a query sequence and a phylogeny of existing sequences. The genetic distance can be calculated using a range of measures such as ANI (the percentage similarity between two sequences), alignment fraction (AF, the relatedness of protein-coding genes) and average amino acid identity (AAI). ANI has frequently been used to delineate prokaryotic species, with genomes sharing more than 95–96% ANI assigned to the same species. The threshold is consistent with the SSU rRNA species boundary and > 70% DDH gold standards (Konstantinidis and Tiedje, 2005; Kim et al., 2014). ANI is typically used to measure the distance between a concatenated set of orthologous single-copy proteins potentially present in both the data set and a sequence database, giving an initial estimate of which species are likely to be closely related to a query genome. ANI is computationally intensive to calculate, but can be made more efficient with hashing techniques. One such hashing-based method is Mash, a k-mer-based comparison which generates intermediate MinHash signatures (Jain et al., 2018). Owing to the improved scalability and robust thresholds for ANI, popular databases such as NCBI GenBank now use this measure as part of a standard validation pipeline for new submissions, to ensure accurate classification of public genome assemblies (Ciufo et al., 2018). Parks et al. (2020) proposed an update to GTDB, whereby species clusters are reclassified with the ANI sequence similarity measure, to resolve the species name for a large proportion of genomes where it was previously missing. Each cluster was assigned a representative genome, typically belonging to a type strain, and an ANI:AF ratio between each representative was used as a measure for the expected circumscription for the sequences falling into each cluster (Parks et al., 2020).
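The MinHash idea behind such hashing-based comparisons can be sketched as follows. Each sequence is reduced to the smallest hash values of its k-mers (a bottom-s sketch), the Jaccard index j of the two k-mer sets is estimated from the sketches, and j is converted into a distance with the published Mash formula D = −(1/k) ln(2j / (1 + j)). Everything else is an assumption for illustration: SHA-1 stands in for the hash function used by real tools, reverse-complement (canonical) k-mers are ignored, and the k-mer length, sketch size and test sequences are arbitrary.

```python
import hashlib
import math
import random


def kmers(seq, k):
    """Set of all k-mers in a sequence (strand is ignored in this toy)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=8, size=64):
    """Bottom-`size` MinHash sketch: the smallest k-mer hash values."""
    hashes = sorted(
        int.from_bytes(hashlib.sha1(km.encode()).digest()[:8], "big")
        for km in kmers(seq, k)
    )
    return hashes[:size]


def jaccard_estimate(sketch_a, sketch_b, size=64):
    """Estimate the Jaccard index from two bottom-`size` sketches."""
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:size]  # smallest hashes of the combined sketch
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


def mash_distance(j, k=8):
    """Mash distance from a Jaccard estimate; 1.0 when nothing is shared."""
    if j <= 0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Hypothetical sequences: a random 2000-base 'genome' and a copy with every
# 50th base substituted (2% divergence).
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
mutated = "".join(
    ("A" if base != "A" else "C") if i % 50 == 0 else base
    for i, base in enumerate(genome)
)

d_same = mash_distance(jaccard_estimate(sketch(genome), sketch(genome)))
d_mut = mash_distance(jaccard_estimate(sketch(genome), sketch(mutated)))
print("identical:", d_same, "mutated copy:", round(d_mut, 4))
```

For closely related genomes, one minus this distance is often used as a rough proxy for ANI. With a sketch of only 64 hashes the estimate is noisy, which is one reason real tools use much larger sketches and longer k-mers on whole genomes.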
This reclassification improved the overall domain to species taxonomy, ensuring more discrete clades that are phylogenetically similar, with some notable changes to the genus Escherichia. The taxonomic boundaries proposed by GTDB classify MAGs from a variety of environmental data sets coherently in comparison to the NCBI taxonomy (Méric et al., 2019), and GTDB has been utilized for a number of metagenomics studies (Almeida et al., 2019). The use of alignment and distance measures to assign taxonomy via databases such as GTDB is streamlined by classification tools. GTDB-Tk is a complementary toolkit to GTDB, which assigns taxonomy with the same processes used to construct the database. The same concatenated set of 120 marker genes is aligned between the query set and database, and the genomes are placed in the GTDB reference tree based on maximum likelihood. The RED score is used to consolidate the taxonomic rank where it is not apparent from the branching, and the ANI calculation distinguishes between existing and novel species (Chaumeil et al., 2019). We have utilized GTDB-Tk to visualize a clade of MAGs generated for a chicken caecal metagenomics data set (Fig. 11.2). Metagenomic reads from Glendinning et al. (2020) were assembled with metaSPAdes, binned with metaWRAP (including CheckM completeness and contamination filters) and de-replicated to resolve MAGs. The Clostridia clade shown illustrates a number of species for which taxonomy could not be resolved to the lowest lineage (Letunic and Bork, 2019). Additionally, most of these are split into sub-clades without the support of a high-quality GTDB genome with a known taxonomy, highlighting the potential for novel diversity to be discovered and the need for an accurate taxonomy for novel species.

Prokaryotes are well known for their interactions with their hosts and the environment around them. While a portion of their genome is composed of ancient, vertically inherited genes, there is evidence of frequent horizontal gene transfer (HGT) events, particularly in bacteria that are put under constant survival pressure, such as pathogenic bacteria frequently treated with antimicrobials, or in the transfer of symbiotic genes across genera to aid host nutrition (Philippe and Douady, 2003; Andrews et al., 2018). Phylogenetic trees can present incongruent taxonomy, clustering unrelated lineages where HGT events are frequent (Wolf et al., 2001; Philippe and Douady, 2003).
As seen with GTDB, an alignment of ubiquitous single-copy marker genes is commonly used to place MAGs in a phylogenetic tree. Some methods predict which genes are more likely to have been part of HGT events, and selectively exclude these from taxonomic classification (Ciccarelli et al., 2006). Some studies also try to classify exclusively with conserved genes, often genes encoding ribosomal proteins, which are part of the ‘core’ bacterial genome (Lang et al., 2013). The selection of a reference gene catalogue may be biased by the specific sample assembly set. Because of the uncertainty of multi-marker gene taxonomy, reconstructed phylogenetic trees are compared to their SSU rRNA counterparts, in particular to ensure that high-level taxa from domain to phylum are in agreement.

The Disconnect Between MAGs and Metabarcoding Approaches

There are several advantages of metagenomic sequencing over amplicon sequencing; for example, it avoids the introduction of primer bias when designing sample-specific primers (Eisenstein, 2018). Reconstructing the SSU rRNA marker gene from MAGs may therefore yield a congruent phylogeny that does not require significant re-engineering of a trusted bacterial taxonomy. However, assembling full-length SSU rRNA sequences from short-read sequence data can be difficult, primarily due to the highly conserved regions of the SSU rRNA. Assembly graphs would confidently align reads belonging to the highly conserved region of the genes from different species. However, paths would diverge at the variable regions, and assembly algorithms attempt to collapse these points of divergence to give uniform paths through the graph. The result is then typically a fragmented SSU rRNA assembly, which negates the possibility of a simple local alignment to reference databases. Most MAGs lack a full-length SSU rRNA, or even fragments of it, and those that do contain one are typically dominant in the bacterial population (Parks et al., 2017). A mere 10.2% of SSU rRNA fragments of reasonable length were recovered (identified with BLASTN) in a broad study of 1500 uncultivated genomes from the SRA, prompting a protein-driven taxonomic classification (Parks et al., 2017). Alignment to covariance models for the target region to extract SSU rRNA genes from metagenomes, as performed by Schulz et al. (2017), shows promise as a method. They were able to recover 56,875 SSU sequences with a minimum length of 1200 bp from 6744 single amplified genomes (SAGs) and MAGs. Clustered initially at 97% similarity, these sequences significantly expanded the bacterial tree, yielding more than 4000 novel OTUs with no match in

Fig. 11.2. A subtree of the GTDB phylogeny: order 4c28d-15, class Clostridia, phylum Firmicutes, generated with iTOL (Letunic and Bork, 2019). MAGs generated from a chicken caecal metagenomics data set (Glendinning et al., 2020) for this clade are highlighted, and strains of the representative GTDB genomes are shown next to their identifier. Taxonomy was assigned with GTDB-Tk, of which 7 MAGs were annotated to a genus level, indicated with arrows. This phylogeny shows a number of closely related potentially novel species, for which the existing reference genomes are unable to adequately support the taxonomy. GTDB, Genome Taxonomy Database; MAG, metagenome-assembled genome. [Figure not reproduced. Legend entries: MAGs; Genus UBA11940; Genus QALS01.]

SILVA. The SSU rRNA gene is typically > 1500 bp long, meaning these sequences still do not represent the full length of the gene, with the mapped sections falling anywhere between variable regions V3 and V7 (Schulz et al., 2017). As mentioned briefly, the lack of full-length genes is partly a result of short-read sequencing.

MAGs are resolved by grouping contigs with similar coverage and composition into bins. Short reads from highly conserved genes, or those with repeat elements such as retrotransposons, can have high coverage at conserved regions and produce conflicted assembly graphs at variable regions, potentially leading to incorrect binning of species (Moss et al., 2020). The use of long-read sequencing is becoming popular to study microbiome taxonomy, with PacBio and Oxford Nanopore Technologies now producing cost-effective, high-quality shotgun metagenomic reads that are kilobases long (Oxford Nanopore Technologies, 2017). With longer reads, assembly graph resolution is higher, and the number of contigs per assembly decreases. This means the binning process is more specific, and in turn the presence of full-length marker genes will ease taxonomic classification. Long-read technology still needs refinement, and requires a large starting volume of concentrated DNA, which is challenging to extract from some environments, but has the potential to bridge the gap between MAGs and metabarcoding.

Advancement in long-read technology could help resolve full-length SSU rRNA sequences, and this is one solution to preserve the current SSU taxonomy used almost universally. However, the fundamental issue is that SSU and reconstructed isolate/MAG taxonomy such as GTDB are not the same. In fact, when the authors of GTDB proposed a reconstructed tree of life, Parks et al. (2018b) found that 58% of the genomes had a different taxonomy above the rank of species when compared to NCBI. Moreover, 7% of these changes were at the phylum rank. GTDB have aimed to resolve monophyletic groups and use the rank-defining RED score to overcome uneven phylogenetic depth where the ranks mostly consist of uncultured genomes. These normalization techniques are partly responsible for some changes at higher ranks. For example, Tenericutes exists as a separate phylum in SILVA, Greengenes and RDP-II (based on SSU rRNA), and there is agreement that this is a separate monophyletic group. The RED score, however, questions the branching point of Tenericutes and rearranges taxa, incorporating the mycoplasmas and other orders into phylum Firmicutes (Parks et al., 2018b).
The placement of Tenericutes has long been debated, but this is just one example of taxonomic disagreements highlighted by multiple gene markers. SILVA has taken steps to reconcile taxonomy, and the new SSU release 138 of the database also contains an alternative GTDB taxonomy. SILVA has warned that its implementation of GTDB taxonomy poses significant changes to some former groups. Nevertheless, this offers scientists the option to directly compare SSU rRNA and MAG classifications using the same backbone taxonomy.

Both GTDB genome-based taxonomy and single marker gene taxonomy have limitations. First, GTDB is inclusive of uncultured genomes, resulting in phyla which have no support from a cultured or type strain. For example, Candidatus species previously placed into the ‘unclassified bacteria’ phylum have been grouped into phylum Moduliflexota owing to the discovery of two closely related MAGs in a sludge metagenome (Sekiguchi et al., 2015). As the number of uncultured MAGs increases, more closely related species from a different phylum could mean that this taxonomy is significantly revised again. Second, the conserved nature of the SSU rRNA regions may not effectively mimic the true evolution of bacteria. Some species boundaries are observed at greater than 99% SSU sequence similarity. An example is the clinically studied streptococcal species Streptococcus mitis, Streptococcus oralis and Streptococcus pneumoniae (Facklam, 2002). Phylogenies can be constructed using alternative markers (e.g. ITS1 and ITS2), which are generally used for studying eukaryotic organisms that are not yet part of the GTDB taxonomy; however, a single marker gene represents a very small proportion of the genome. As mentioned previously, these genes also have different rates of evolution, making a comparative analysis of data sets difficult, and highlighting that extensive curation of single-copy marker genes is required to identify a concatenated set that accurately mimics the phylogenetic signal of an organism.

Conclusion

In this chapter we describe the main techniques to determine taxonomy of the bacterial component of microbiota for both metabarcoding and metagenomics data sets. Currently, the results of these two approaches can be difficult to compare for the various reasons we highlight, and they are compounded by the fact that the data sets typically have different purposes. Despite the current difficulties, the field is heading in a direction that attempts to harmonize bacterial taxonomy generated by both approaches, ultimately making these comparisons less problematic in the future. In the context of the techniques discussed in this chapter, reconciling metabarcoding and genome taxonomy would be a twofold process.

First, as new SSU rRNAs are gathered from isolate genomes and MAGs, it will be possible to map these genes to an ASV-based taxonomy. Second, the genome sequences will provide a good approximation of the functional potential of the identified microbes. While methods to resolve MAGs have undoubtedly improved, owing to the nature of MAG assembly, a consensus genome may represent a set of closely related members of a population rather than a true clonal species. Figure 11.1 is a schematic representation of the ideal workflow to elucidate taxonomy from genomes, and similar methods could be applied for long-read data sets. It is also true that assembly and binning steps favour MAGs of the more abundant microbes in an environment. Hence there is still a need for metabarcoding methods and other taxonomic methods (not covered by this chapter) that are capable of classifying all reads from a metagenomics experiment. The development of long-read sequencing protocols, as well as hybrid short- and long-read sequencing, will undoubtedly help increase the completeness and accuracy of MAGs, adding genomes to the tree of life, reducing errors and increasing the potential to resolve full-length SSU rRNA to then assign taxonomy for ASVs. Finally, increased genomic knowledge will enable greater functional insights into the ASV-based results that have been, and will be, generated.

References

Almeida, A., Mitchell, A.L., Tarkowska, A. and Finn, R.D. (2018) Benchmarking taxonomic assignments based on 16S rRNA gene profiling of the microbiota from commonly sampled environments. GigaScience 7 (5). doi: 10.1093/gigascience/giy054
Almeida, A., Nayfach, S., Boland, M., Strozzi, F., Beracochea, M., Shi, Z.J., Pollard, K.S., Parks, D.H., Hugenholtz, P., Segata, N., Kyrpides, N.C. and Finn, R.D. (2019) A unified sequence catalogue of over 280,000 genomes obtained from the human gut microbiome. bioRxiv. doi: 10.1101/762682
Amir, A., McDonald, D., Navas-Molina, J.A., Kopylova, E., Morton, J.T., Zech Xu, Z., Kightley, E.P., Thompson, L.R., Hyde, E.R., Gonzalez, A. and Knight, R. (2017) Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems 2 (2). doi: 10.1128/mSystems.00191-16
Andrews, M., De Meyer, S., James, E.K., Stępkowski, T., Hodge, S., Simon, M.F. and Young, J.P.W. (2018) Horizontal transfer of symbiosis genes within and between rhizobial genera: occurrence and importance. Genes 9 (7). doi: 10.3390/genes9070321
Bauer, M.A., Kainz, K., Carmona-Gutierrez, D. and Madeo, F. (2018) Microbial wars: Competition in ecological niches and within the microbiome. Microbial Cell 5 (5), 215–219. doi: 10.15698/mic2018.05.628
Be, N.A., Avila-Herrera, A., Allen, J.E., Singh, N., Checinska Sielaff, A., Jaing, C. and Venkateswaran, K. (2017) Whole metagenome profiles of particulates collected from the International Space Station. Microbiome 5 (1), 81. doi: 10.1186/s40168-017-0292-4
Beye, M., Fahsi, N., Raoult, D. and Fournier, P.-E. (2018) Careful use of 16S rRNA gene sequence similarity values for the identification of Mycobacterium species. New Microbes and New Infections 22, 24–29. doi: 10.1016/j.nmni.2017.12.009
Bowers, R.M. et al. (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35 (8), 725–731. doi: 10.1038/nbt.3893
Callahan, B.J., McMurdie, P.J., Rosen, M.J., Han, A.W., Johnson, A.J.A. and Holmes, S.P. (2016) DADA2: High-resolution sample inference from Illumina amplicon data. Nature Methods 13 (7), 581–583. doi: 10.1038/nmeth.3869
Callahan, B.J., McMurdie, P.J. and Holmes, S.P. (2017) Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. The ISME Journal 11 (12), 2639–2643. doi: 10.1038/ismej.2017.119
Caruso, V., Song, X., Asquith, M. and Karstens, L. (2019) Performance of microbiome sequence inference methods in environments with varying biomass. mSystems 4 (1). doi: 10.1128/mSystems.00163-18
Chakravorty, S., Helb, D., Burday, M., Connell, N. and Alland, D. (2007) A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. Journal of Microbiological Methods 69 (2), 330–339. doi: 10.1016/j.mimet.2007.02.005

V. Kale et al.

Chaumeil, P.-A., Mussig, A.J., Hugenholtz, P. and Parks, D.H. (2019) GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics. doi: 10.1093/bioinformatics/btz848
Chen, W., Zhang, C.K., Cheng, Y., Zhang, S. and Zhao, H. (2013) A comparison of methods for clustering 16S rRNA sequences into OTUs. PloS One 8 (8), e70837. doi: 10.1371/journal.pone.0070837
Ciccarelli, F.D., Doerks, T., von Mering, C., Creevey, C.J., Snel, B. and Bork, P. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311 (5765), 1283–1287. doi: 10.1126/science.1123061
Ciufo, S., Kannan, S., Sharma, S., Badretdin, A., Clark, K., Turner, S., Brover, S., Schoch, C.L., Kimchi, A. and DiCuccio, M. (2018) Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI. International Journal of Systematic and Evolutionary Microbiology 68 (7), 2386–2392. doi: 10.1099/ijsem.0.002809
Edgar, R.C. (2016) UNOISE2: improved error-correction for Illumina 16S and ITS amplicon sequencing. bioRxiv. doi: 10.1101/081257
Edgar, R.C. (2017) Accuracy of microbial community diversity estimated by closed- and open-reference OTUs. PeerJ 5, e3889. doi: 10.7717/peerj.3889
Edgar, R.C. (2018) Updating the 97% identity threshold for 16S ribosomal RNA OTUs. Bioinformatics 34 (14), 2371–2375. doi: 10.1093/bioinformatics/bty113
Edgar, R.C. and Flyvbjerg, H. (2015) Error filtering, pair assembly and error correction for next-generation sequencing reads. Bioinformatics 31 (21), 3476–3482. doi: 10.1093/bioinformatics/btv401
Eisenstein, M. (2018) Microbiology: making the best of PCR bias. Nature Methods 15 (5), 317–320. doi: 10.1038/nmeth.4683
Eren, A.M., Maignien, L., Sul, W.J., Murphy, L.G., Grim, S.L., Morrison, H.G. and Sogin, M.L. (2013) Oligotyping: Differentiating between closely related microbial taxa using 16S rRNA gene data. Methods in Ecology and Evolution 4 (12). doi: 10.1111/2041-210X.12114
Eren, A.M., Morrison, H.G., Lescault, P.J., Reveillaud, J., Vineis, J.H. and Sogin, M.L. (2015) Minimum entropy decomposition: unsupervised oligotyping for sensitive partitioning of high-throughput marker gene sequences. The ISME Journal 9 (4), 968–979. doi: 10.1038/ismej.2014.195
Escapa, I.F., Chen, T., Huang, Y., Gajare, P., Dewhirst, F.E. and Lemon, K.P. (2018) New insights into human nostril microbiome from the expanded human oral microbiome database (eHOMD): a resource for the microbiome of the human aerodigestive tract. mSystems 3 (6). doi: 10.1128/mSystems.00187-18
Facklam, R. (2002) What happened to the streptococci: overview of taxonomic and nomenclature changes. Clinical Microbiology Reviews 15 (4), 613–630. doi: 10.1128/cmr.15.4.613-630.2002
Forster, S.C., Kumar, N., Anonye, B.O., Almeida, A., Viciani, E., Stares, M.D., Dunn, M., Mkandawire, T.T., Zhu, A., Shao, Y., Pike, L.J., Louie, T., Browne, H.P., Mitchell, A.L., Neville, B.A., Finn, R.D. and Lawley, T.D. (2019) A human gut bacterial genome and culture collection for improved metagenomic analyses. Nature Biotechnology 37 (2), 186–192. doi: 10.1038/s41587-018-0009-7
Ghodsi, M., Liu, B. and Pop, M. (2011) DNACLUST: accurate and efficient clustering of phylogenetic marker genes. BMC Bioinformatics 12, 271. doi: 10.1186/1471-2105-12-271
Glendinning, L., Stewart, R.D., Pallen, M.J., Watson, K.A. and Watson, M. (2020) Assembly of hundreds of novel bacterial genomes from the chicken caecum. Genome Biology 21 (1), 34. doi: 10.1186/s13059-020-1947-1
Glöckner, F.O., Yilmaz, P., Quast, C., Gerken, J., Beccati, A., Ciuprina, A., Bruns, G., Yarza, P., Peplies, J., Westram, R. and Ludwig, W. (2017) 25 years of serving the community with ribosomal RNA gene reference databases and tools. Journal of Biotechnology 261, 169–176. doi: 10.1016/j.jbiotec.2017.06.1198
Haft, D.H., DiCuccio, M., Badretdin, A., Brover, V., Chetvernin, V., O’Neill, K., Li, W., Chitsaz, F., Derbyshire, M.K., Gonzales, N.R., Gwadz, M., Lu, F., Marchler, G.H., Song, J.S., Thanki, N., Yamashita, R.A., Zheng, C., Thibaud-Nissen, F., Geer, L.Y., Marchler-Bauer, A. and Pruitt, K.D. (2018) RefSeq: an update on prokaryotic genome annotation and curation. Nucleic Acids Research 46 (D1), D851–D860. doi: 10.1093/nar/gkx1068
Heather, J.M. and Chain, B. (2016) The sequence of sequencers: The history of sequencing DNA. Genomics 107 (1), 1–8. doi: 10.1016/j.ygeno.2015.11.003
Hug, L.A., Baker, B.J., Anantharaman, K., Brown, C.T., Probst, A.J., Castelle, C.J., Butterfield, C.N., Hernsdorf, A.W., Amano, Y., Ise, K., Suzuki, Y., Dudek, N., Relman, D.A., Finstad, K.M., Amundson, R., Thomas, B.C. and Banfield, J.F. (2016) A new view of the tree of life. Nature Microbiology 1, 16048. doi: 10.1038/nmicrobiol.2016.48

Navigating Bacterial Taxonomy

Jain, C., Rodriguez-R, L.M., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9 (1), 5114. doi: 10.1038/s41467-018-07641-9
Janda, J.M. and Abbott, S.L. (2002) Bacterial identification for publication: when is enough enough? Journal of Clinical Microbiology 40 (6), 1887–1891. doi: 10.1128/jcm.40.6.1887-1891.2002
Johnson, J.S., Spakowicz, D.J., Hong, B.-Y., Petersen, L.M., Demkowicz, P., Chen, L., Leopold, S.R., Hanson, B.M., Agresta, H.O., Gerstein, M., Sodergren, E. and Weinstock, G.M. (2019) Evaluation of 16S rRNA gene sequencing for species and strain-level microbiome analysis. Nature Communications 10 (1), 5029. doi: 10.1038/s41467-019-13036-1
Kim, M., Oh, H.-S., Park, S.-C. and Chun, J. (2014) Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 64 (Pt 2), 346–351. doi: 10.1099/ijs.0.059774-0
Konstantinidis, K.T. and Tiedje, J.M. (2005) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America 102 (7), 2567–2572. doi: 10.1073/pnas.0409727102
Konstantinidis, K.T., Ramette, A. and Tiedje, J.M. (2006) Toward a more robust assessment of intraspecies diversity, using fewer genetic markers. Applied and Environmental Microbiology 72 (11), 7286–7293. doi: 10.1128/AEM.01398-06
Kopf, A. et al. (2015) The ocean sampling day consortium. GigaScience 4, 27. doi: 10.1186/s13742-015-0066-5
Lang, J.M., Darling, A.E. and Eisen, J.A. (2013) Phylogeny of bacterial and archaeal genomes using conserved genes: supertrees and supermatrices. PloS One 8 (4), e62510. doi: 10.1371/journal.pone.0062510
Letunic, I. and Bork, P. (2019) Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Research 47 (W1), W256–W259. doi: 10.1093/nar/gkz239
McDonald, D. et al. (2018) American Gut: an open platform for citizen science microbiome research. mSystems 3 (3). doi: 10.1128/mSystems.00031-18
Mende, D.R., Sunagawa, S., Zeller, G. and Bork, P. (2013) Accurate and universal delineation of prokaryotic species. Nature Methods 10 (9), 881–884. doi: 10.1038/nmeth.2575
Méric, G., Wick, R.R., Watts, S.C., Holt, K.E. and Inouye, M. (2019) Correcting index databases improves metagenomic studies. bioRxiv. doi: 10.1101/712166
Moss, E.L., Maghini, D.G. and Bhatt, A.S. (2020) Complete, closed bacterial genomes from microbiomes using nanopore sequencing. Nature Biotechnology. doi: 10.1038/s41587-020-0422-6
Nayfach, S., Shi, Z.J., Seshadri, R., Pollard, K.S. and Kyrpides, N.C. (2019) New insights from uncultivated genomes of the global human gut microbiome. Nature 568 (7753), 505–510. doi: 10.1038/s41586-019-1058-x
Needham, D.M., Sachdeva, R. and Fuhrman, J.A. (2017) Ecological dynamics and co-occurrence among marine phytoplankton, bacteria and myoviruses shows microdiversity matters. The ISME Journal 11 (7), 1614–1629. doi: 10.1038/ismej.2017.29
Oxford Nanopore Technologies (2017) Nanopore sequencing. The advantages of long reads for genome assembly. Oxford Nanopore Technologies. Available at: https://nanoporetech.com/sites/default/files/s3/white-papers/WGS_Assembly_white_paper.pdf?submissionGuid=40a7546b-9e51-42e7-bde9-b5ddef3c3512 (accessed 28 May 2020).
Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W. (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25 (7), 1043–1055. doi: 10.1101/gr.186072.114
Parks, D.H. et al. (2017) Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature Microbiology 2 (11), 1533–1542. doi: 10.1038/s41564-017-0012-7
Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.-A. and Hugenholtz, P. (2018a) A proposal for a standardized bacterial taxonomy based on genome phylogeny. bioRxiv. doi: 10.1101/256800
Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.-A. and Hugenholtz, P. (2018b) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36 (10), 996–1004. doi: 10.1038/nbt.4229
Parks, D.H., Chuvochina, M., Chaumeil, P.-A., Rinke, C., Mussig, A.J. and Hugenholtz, P. (2020) A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology. doi: 10.1038/s41587-020-0501-8

Pasolli, E., Asnicar, F., Manara, S., Zolfo, M., Karcher, N., Armanini, F., Beghini, F., Manghi, P., Tett, A., Ghensi, P., Collado, M.C., Rice, B.L., DuLong, C., Morgan, X.C., Golden, C.D., Quince, C., Huttenhower, C. and Segata, N. (2019) Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176 (3), 649–662.e20. doi: 10.1016/j.cell.2019.01.001
Pei, A., Nossa, C.W., Chokshi, P., Blaser, M.J., Yang, L., Rosmarin, D.M. and Pei, Z. (2009) Diversity of 23S rRNA genes within individual prokaryotic genomes. PloS One 4 (5), e5437. doi: 10.1371/journal.pone.0005437
Philippe, H. and Douady, C.J. (2003) Horizontal gene transfer and phylogenetics. Current Opinion in Microbiology 6 (5), 498–505. doi: 10.1016/j.mib.2003.09.008
Quince, C., Lanzen, A., Davenport, R.J. and Turnbaugh, P.J. (2011) Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12, 38. doi: 10.1186/1471-2105-12-38
Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N.N., Anderson, I.J., Cheng, J.-F., Darling, A., Malfatti, S., Swan, B.K., Gies, E.A., Dodsworth, J.A., Hedlund, B.P., Tsiamis, G., Sievert, S.M., Liu, W.-T., Eisen, J.A., Hallam, S.J., Kyrpides, N.C., Stepanauskas, R., Rubin, E.M., Hugenholtz, P. and Woyke, T. (2013) Insights into the phylogeny and coding potential of microbial dark matter. Nature 499 (7459), 431–437. doi: 10.1038/nature12352
Rosen, M.J., Callahan, B.J., Fisher, D.S. and Holmes, S.P. (2012) Denoising PCR-amplified metagenome data. BMC Bioinformatics 13, 283. doi: 10.1186/1471-2105-13-283
Sacchi, C.T., Whitney, A.M., Reeves, M.W., Mayer, L.W. and Popovic, T. (2002) Sequence diversity of Neisseria meningitidis 16S rRNA genes and use of 16S rRNA gene sequencing as a molecular subtyping tool. Journal of Clinical Microbiology 40 (12), 4520–4527. doi: 10.1128/jcm.40.12.4520-4527.2002
Schoch, C.L., Seifert, K.A., Huhndorf, S., Robert, V., Spouge, J.L., Levesque, C.A., Chen, W., Fungal Barcoding Consortium and Fungal Barcoding Consortium Author List (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences of the United States of America 109 (16), 6241–6246. doi: 10.1073/pnas.1117018109
Schulz, F., Eloe-Fadrosh, E.A., Bowers, R.M., Jarett, J., Nielsen, T., Ivanova, N.N., Kyrpides, N.C. and Woyke, T. (2017) Towards a balanced view of the bacterial tree of life. Microbiome 5 (1), 140. doi: 10.1186/s40168-017-0360-9
Sekiguchi, Y., Ohashi, A., Parks, D.H., Yamauchi, T., Tyson, G.W. and Hugenholtz, P. (2015) First genomic insights into members of a candidate bacterial phylum responsible for wastewater bulking. PeerJ 3, e740. doi: 10.7717/peerj.740
SILVA (2020) SILVA taxonomy [Online]. Available at: https://www.arb-silva.de/documentation/silva-taxonomy/ (accessed 28 May 2020).
Stackebrandt, E. and Ebers, J. (2006) Taxonomic parameters revisited: tarnished gold standards. Microbiology Today 33, 152–155.
Stackebrandt, E. and Goebel, B.M. (1994) Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic and Evolutionary Microbiology 44 (4), 846–849. doi: 10.1099/00207713-44-4-846
Straub, D., Blackwell, N., Fuentes, A.L., Peltzer, A., Nahnsen, S. and Kleindienst, S. (2019) Interpretations of microbial community studies are biased by the selected 16S rRNA gene amplicon sequencing pipeline. bioRxiv. doi: 10.1101/2019.12.17.880468
Sunagawa, S. et al. (2015) Ocean plankton. Structure and function of the global ocean microbiome. Science 348 (6237), 1261359. doi: 10.1126/science.1261359
Thompson, L.R. et al. (2017) A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551 (7681), 457–463. doi: 10.1038/nature24621
Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., Solovyev, V.V., Rubin, E.M., Rokhsar, D.S. and Banfield, J.F. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428 (6978), 37–43. doi: 10.1038/nature02340
Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W., Fouts, D.E., Levy, S., Knap, A.H., Lomas, M.W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y.-H. and Smith, H.O. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304 (5667), 66–74. doi: 10.1126/science.1093857

Weyrich, L.S., Farrer, A.G., Eisenhofer, R., Arriola, L.A., Young, J., Selway, C.A., Handsley-Davis, M., Adler, C., Breen, J. and Cooper, A. (2018) Laboratory contamination over time during low-biomass sample analysis. bioRxiv. doi: 10.1101/460212
Woese, C.R., Kandler, O. and Wheelis, M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences of the United States of America 87 (12), 4576–4579. doi: 10.1073/pnas.87.12.4576
Wolf, Y.I., Rogozin, I.B., Grishin, N.V., Tatusov, R.L. and Koonin, E.V. (2001) Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evolutionary Biology 1, 8. doi: 10.1186/1471-2148-1-8
Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12 (9), 635–645. doi: 10.1038/nrmicro3330
Zou, Y., Xue, W., Luo, G., Deng, Z., Qin, P., Guo, R., Sun, H., Xia, Y., Liang, S., Dai, Y., Wan, D., Jiang, R., Su, L., Feng, Q., Jie, Z., Guo, T., Xia, Z., Liu, C., Yu, J., Lin, Y., Tang, S., Huo, G., Xu, X., Hou, Y., Liu, X., Wang, J., Yang, H., Kristiansen, K., Li, J., Jia, H. and Xiao, L. (2019) 1,520 reference genomes from cultivated human gut bacteria enable functional microbiome analyses. Nature Biotechnology 37 (2), 179–185. doi: 10.1038/s41587-018-0008-8

12

Sequence-based Identification and Classification of Fungi

Andrew M. Borman* and Elizabeth M. Johnson
UK National Mycology Reference Laboratory (MRL), Public Health England South-West, Bristol, and Medical Research Council Centre for Medical Mycology (MRC CMM), University of Exeter, UK

Introduction

The fungal kingdom is vast, with estimates varying from 1.5 to > 5 million species, only approximately 140,000 of which have been described in detail and accepted (Hawksworth, 1991; O’Brien et al., 2005; Hawksworth and Lücking, 2017; Lücking and Hawksworth, 2018). Ubiquitous in nature, fungi cause human, animal and plant diseases that are associated with huge health and economic burdens (Brown et al., 2012; Fisher et al., 2018). Accurate identification to species level is essential for managing fungal diseases of humans and animals, for elucidating outbreaks and possible transmission events, and for establishing the extent of fungal distribution and diversity in nature (Borman et al., 2008; Olson et al., 2013; Fones et al., 2017; Lockhart et al., 2017; Fisher et al., 2018; Gladieux et al., 2018; Borman et al., 2019b; Szekely et al., 2019). Historically, fungal identification relied upon the careful examination of morphological and other phenotypic characters, often including detailed analyses of carbohydrate assimilation/fermentation or biochemical profiling (Huppert et al., 1975; Barnett et al., 2000; Frisvad et al., 2008; Campbell et al., 2013; de Hoog et al., 2016). Although such morphological and phenotypic traits remain useful for species descriptions and delineations, they are fraught with limitations (see Chapter 2). For filamentous fungi, microscopic and macroscopic features are often produced infrequently and are dependent on the substrate (Slepecky and Starmer, 2009; Borman et al., 2008, 2016) or differ between the teleomorph and anamorph states, and numerous genera exhibit di- or pleomorphism (Wolff et al., 2002; Dukik et al., 2017; Friedman and Schwartz, 2019). Morphological identification is further confounded by convergent evolution of unrelated taxa (Brun and Silar, 2010; Luangsa-Ard et al., 2011), divergent evolution of closely related organisms (Xu et al., 2007), hybridization (Olson and Stenlid, 2002; Ioos et al., 2006; Hagen et al., 2015), and the presence of cryptic species in many well-studied morphospecies (Tavanti et al., 2005; Balajee et al., 2009; Houbraken et al., 2010; Hagen et al., 2015). In addition, a changing medical and environmental landscape has resulted in numerous novel human, animal and plant pathogens that are difficult to identify using traditional morphological/phenotypic characters (Linton et al., 2007; Olson et al., 2013; Dukik et al., 2017; Fones et al., 2017; Lockhart et al., 2017; Friedman and Schwartz, 2019).

*[email protected]

© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Faced with the above limitations to conventional fungal identification and classification, the mycology field has enthusiastically embraced molecular approaches (Linton et al., 2007; Borman et al., 2008; Balajee et al., 2009); see also Chapters 14 and 17. Such approaches have involved two complementary but distinct aims: sequence-based classification and sequence-based identification (Balajee et al., 2009; Herr et al., 2015; Hibbett et al., 2016). Sequence-based classification, which strives to discover, name, classify and delimit novel fungal organisms according to their phylogenetic relationships with pre-existing taxa, typically requires PCR amplification, sequencing and phylogenetic comparisons of multiple genes/fragments (phylogenetic species recognition, PSR). Theoretically, once a species has been sufficiently delimited by PSR and intraspecific variation has been established, sequence-based identification can be achieved via comparative sequence analyses of a single well-defined locus (DNA barcoding) (Hebert et al., 2004; Balajee et al., 2009). Numerous individual loci have been explored for their taxonomic discriminatory power for both sequence-based classification and sequence-based identification (Linton et al., 2007; Balajee et al., 2009; Borman et al., 2010; Hibbett et al., 2016; Badotti et al., 2017). Among these, the nuclear rDNA gene cassette, which encompasses the small subunit (SSU) and large subunit (LSU) rRNA genes and the internal transcribed spacer regions 1 and 2 (ITS1 and ITS2), has received the most attention (White et al., 1990). Although the SSU and LSU regions have been shown to be useful barcodes for the Glomeromycota and human pathogenic ascomycete and basidiomycete yeast species (Kurtzman and Robnett, 1998; Fell et al., 2000; Beck et al., 2007; Linton et al., 2007), they show insufficient discrimination for most other groups of fungi. However, these regions are still frequently employed alongside more discriminatory loci for PSR approaches (Campbell et al., 2006; Liu et al., 2015; Wang et al., 2015a,b). Although the ITS region was formally declared as the universal fungal barcode only in the last decade (Schoch et al., 2012), it had long been adopted as such by the mycology community (White et al., 1990; Pryce et al., 2003; Bridge et al., 2005; Hinrikson et al., 2005; Schwartz et al., 2006; Balajee et al., 2009; Seifert, 2009).

The ITS Region as a Universal Barcode for Fungal Identification: Advantages and Limitations

The ITS region offers several practical advantages as a universal fungal barcode region. The region encompasses segments that permit resolution at different taxonomic levels as it includes the highly conserved 5.8S rRNA gene, the moderately rapidly evolving ITS2 region and the rapidly evolving ITS1 region, flanked by the highly conserved SSU and LSU genes which permit design of PCR primers that are almost panfungal (White et al., 1990; Hillis and Dixon, 1991). In addition, it is a multi-copy region, which increases amplification success, with PCR success rates of > 90% reported across the fungal kingdom (Stielow et al., 2015). Moreover, of all of the regions of the nrDNA cistron, ITS has the highest probability of successful identification for the broadest range of fungi (White et al., 1990; Schoch et al., 2012; Stielow et al., 2015). This is principally because, of the various regions in the cistron, ITS has the largest barcode gap for most groups of fungi, the barcode gap being defined as the difference between intraspecific and interspecific variation (Schoch et al., 2012). Because of the widespread adoption of ITS as the primary fungal barcode, it is the locus with by far the highest number of reference sequences in currently available databases. A structured search of the International Nucleotide Sequence Database Collaboration (INSDC [GenBank]) for Fungi and ITS1, ITS2, 5.8S or internal transcribed spacer returned 1,306,154 hits on 31 October 2019 (Borman and Johnson, unpublished), nearly triple the number of available sequences corresponding to the LSU (28S rRNA) gene. When the current number of ITS hits is compared to the 1,042,545 returned with the equivalent search on 19 October 2017 (Lücking and Hawksworth, 2018), and the meagre 51,000 ITS entries in INSDC in 2006 (Nilsson et al., 2006), the staggering increase in comparator sequence availability becomes apparent.
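The barcode gap defined above lends itself to a small worked illustration. The sketch below is a minimal example under stated assumptions: the sequences are invented, pre-aligned toy strings; the function names `p_distance` and `barcode_gap` are hypothetical; and the gap is taken as the smallest interspecific distance minus the largest intraspecific distance, which is one common operational reading of the definition rather than necessarily the calculation used by Schoch et al. (2012).

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(seqs_by_species: dict[str, list[str]]) -> float:
    """Smallest interspecific distance minus largest intraspecific distance.

    A positive value means every between-species pair is more divergent than
    every within-species pair; zero or negative means there is no gap.
    """
    labelled = [(sp, s) for sp, group in seqs_by_species.items() for s in group]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return min(inter) - max(intra)


# Invented toy alignment (assumption): two species, two sequences each.
toy = {
    "species_A": ["ACGTACGTAC", "ACGTACGTAT"],
    "species_B": ["ACGAACCTTC", "ACGAACCTTC"],
}
gap = barcode_gap(toy)
print(f"barcode gap = {gap:.2f}")
```

In this toy case the two species are separated by a clear gap; for the genera with narrow or no ITS barcode gaps discussed later in the chapter, the same calculation would return a value at or below zero.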
One of the issues of employing any single barcode for fungal identification remains the establishment of a reliable sequence similarity cut-off for discrimination at species and genus levels. For yeast identification, this was partially resolved by dual barcoding of ~9000 reference strains from the CBS collection using ITS and LSU sequences (Vu et al., 2016). That study demonstrated that only ~3% of tested species could not be distinguished using a combination of both loci, and proposed taxonomic thresholds of 98.41% and 99.51% at the species level, and 96.31% and 97.11% at genus level for ITS and LSU, respectively. A second study performed with ~17,000 species of filamentous fungi yielded similar results, with only ~8% of species that could not be discriminated using combined ITS and LSU loci (Vu et al., 2019), with genus-level cut-offs of 94.3% and 98.2% and species-level thresholds of 99.6% and 99.8% for ITS and LSU, respectively. ITS outperformed LSU for species discrimination (correct identification of 82% versus 77.6% for LSU), whereas LSU had better discriminatory power at higher taxonomic levels. While the data releases associated with the above studies represent an incredibly valuable addition to databases for fungal barcoding efforts, the thresholds proposed are unlikely to be universally applicable. It has been known for some years that intraspecific variability in ITS is not constant across the kingdom, and varies considerably among different groups of fungi in a manner that cannot easily be correlated with taxonomic affiliations (Nilsson et al., 2008). For the individual groups of fungi with intraspecific hyper-variance, it thus remains likely that group-specific thresholds will need to be developed based on the sequencing of large numbers of strains from diverse geographical regions or habitats (Nilsson et al., 2008). The converse problem is encountered in those groups of pathogenic fungi that have co-evolved with man (e.g. the anthropophilic dermatophytes Trichophyton rubrum, Trichophyton soudanense and Trichophyton violaceum), farmed animals or crops, where short evolutionary times and their clonal nature have resulted in very low levels of intra- and interspecific variation across the ITS region and very poor ITS barcode gaps (Gräser et al., 2000; Bridge et al., 2008; de Hoog et al., 2017; Zhan et al., 2018; Kandemir et al., 2020). These species nonetheless exhibit consistent differences in independent parameters (geographic localization, clinical features/sites of infection, morphological and physiological behaviours) sufficient to warrant their maintenance as independent taxa (Su et al., 2019; Kandemir et al., 2020).
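To make the use of such cut-offs concrete, the following minimal sketch applies the thresholds quoted above to a pairwise percent identity. The lookup table simply restates the values reported in the text for Vu et al. (2016, 2019); the function name `shared_rank`, the group labels and the three-way outcome are illustrative assumptions, not part of either study.

```python
# Percent-identity thresholds as quoted in the text above
# (yeasts: Vu et al., 2016; filamentous fungi: Vu et al., 2019).
THRESHOLDS = {
    ("yeast", "ITS"): {"species": 98.41, "genus": 96.31},
    ("yeast", "LSU"): {"species": 99.51, "genus": 97.11},
    ("filamentous", "ITS"): {"species": 99.6, "genus": 94.3},
    ("filamentous", "LSU"): {"species": 99.8, "genus": 98.2},
}


def shared_rank(identity_pct: float, group: str, locus: str) -> str:
    """Lowest rank at which a query and its best reference hit would be
    treated as the same taxon under the fixed thresholds above."""
    cutoffs = THRESHOLDS[(group, locus)]
    if identity_pct >= cutoffs["species"]:
        return "species"
    if identity_pct >= cutoffs["genus"]:
        return "genus"
    return "above genus"


# The same 99.0% ITS identity is interpreted differently in the two groups.
print(shared_rank(99.0, "yeast", "ITS"))        # species
print(shared_rank(99.0, "filamentous", "ITS"))  # genus
```

Given the caveats that follow from Nilsson et al. (2008), a table of this kind is best regarded as a set of defaults that would be overridden by group-specific values wherever these exist.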

Additionally, ITS is less effective in several important, species-rich genera, including Aspergillus, Fusarium, Penicillium and Cladosporium, as those genera also have narrow or no ITS barcode gaps (Seifert et al., 2007; Balajee et al., 2009; Samson et al., 2014; Stielow et al., 2015; also see Chapters 5 and 14). A final potential pitfall when employing ITS as the primary fungal barcode stems from the fact that it is multi-copy, and is thus potentially subject to intragenomic variation (Kiss, 2012). The results of studies aimed towards detecting such variation are conflicting. Two non-orthologous ITS2 types (which may have arisen from ancient gene duplication or hybridization events), and multiple ITS types with high levels of intraspecific divergence have been reported previously in species of Fusarium (O’Donnell, 1992; O’Donnell and Cigelnik, 1997), together with rRNA gene introgression (Short et al., 2014). Similarly, incomplete concerted evolution and the presence of multiple ITS versions were detected in lichen-forming fungi (Mark et al., 2016), although much of this variation has been suggested to result directly from pyrosequencing errors and artefacts (Lücking et al., 2014; Lücking and Hawksworth, 2018). Whatever the precise extent of non-orthologous ITS types across different fungal lineages, limited intragenomic variation in ITS must occur as a result of DNA polymerase errors during replication. However, such errors (and the generation of random single nucleotide polymorphisms) should not significantly impact either ITS-based DNA barcoding using conventional Sanger sequencing, which would always reveal the dominant haplotype (Kiss, 2012), or the results of phylogenetic analyses if alignment-based phylogenetic methods rather than clustering approaches are employed (Lücking et al., 2014).

Secondary DNA Barcode Regions as Adjuncts to (or Replacements for) ITS

Despite the many advantages of ITS as the universal fungal barcode, its species-resolution power is poor for certain groups of fungi (see above) and higher-level taxonomic resolution is more reliable with protein-coding genes (Nilsson et al., 2006; Seifert, 2009; Schoch et al., 2014) or even with LSU (Vu et al., 2019). Thus, mycologists have strived to identify additional gene regions that are sufficiently conserved to allow development of universal (or at least pan-phylum) primers for amplification, but variable enough to permit higher-level phylogenetic inferences and that possess sufficient barcode gaps for all fungal taxa (Capella-Gutierrez et al., 2014; Stielow et al., 2015). Among the loci that have received the most attention, the Assembling the Fungal Tree of Life (AFTOL; http://aftol.umn.edu, accessed 11 September 2020) gene targets DNA-directed RNA polymerase II subunits 1 and 2 (RPB1, RPB2), nucLSU, nucSSU and translation elongation factor 1α (TEF1α) have been shown to fulfil many of the requirements of an additional fungal barcode, including broad coverage and good discriminatory power (James et al., 2006; Schoch et al., 2012), as have several other universal and non-universal genes including Beta Tubulin 2 (TUB2), Gamma Actin (ACT), 60S ribosomal protein L10 and the PRP8 intein (Carbone and Kohn, 1999; Aveskamp et al., 2009; Kenyon et al., 2013; Borman et al., 2016, 2019a; Dukik et al., 2017). Stielow et al. (2015) undertook the daunting task of assessing many of these alternative, secondary barcode regions among a panel of 14 loci in a direct head-to-head comparison of PCR success and barcoding potential across > 1500 species. Although several of the comparator loci showed promise in terms of universal primer design/amplification success and adequate interspecific variation, they concluded that TEF1α, which has already been used extensively in a wide variety of phylogenetic analyses, was the most promising and proposed that locus as the universal secondary fungal DNA barcode for fungal identification (Stielow et al., 2015). However, as mentioned previously, DNA barcoding is designed to identify and allocate fungal isolates to known, described taxa based on known taxonomic boundaries, rather than to underpin the DNA-driven description of novel taxonomic units, and many of the promising additional loci proposed by various groups including LSU, RPB2, ACT, PRP8, TUB2 and RP L10 and combinations thereof are already extensively used to support the erection of new taxa. A list of the loci that are most commonly employed for fungal DNA barcoding and/or phylogenetic analyses for PSR is given in Table 12.1.
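Many of the primers listed in Table 12.1 contain IUPAC ambiguity codes (Y, R, H, D, N and so on), which is how a single primer name can cover sequence variation across a phylum. The sketch below is a minimal illustration of what that degeneracy means in practice; the helper names `degeneracy` and `primer_regex` are hypothetical, and the target sequence used for the match is invented. Only exact, forward-strand matching is shown, which is a simplification of how primers anneal.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct unambiguous oligonucleotides a primer represents."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total


def primer_regex(primer: str) -> re.Pattern:
    """Regular expression matching every sequence the degenerate primer covers."""
    parts = [b if len(IUPAC[b]) == 1 else f"[{IUPAC[b]}]" for b in primer]
    return re.compile("".join(parts))


# Primer sequences as given in Table 12.1.
print(degeneracy("TCCGTAGGTGAACCTGCGG"))       # ITS1: no ambiguity codes
print(degeneracy("GCYCCYGGHCAYCGTGAYTTYAT"))   # EF1-983F: five Y and one H

# Invented target fragment (assumption) containing one variant of EF1-1018F.
target = "AAGACTTCATCAAGAACATGATCC"
print(bool(primer_regex("GAYTTCATCAAGAACATGAT").search(target)))
```

The contrast between the non-degenerate ITS primers and the highly degenerate protein-coding primers reflects the trade-off described above: conserved rDNA flanks allow near-universal primers, whereas protein-coding loci need ambiguity codes to approach the same breadth.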

Quality of Reference Sequence Libraries

The largest single limitation to the utilization of fungal DNA barcodes remains the absence of a comprehensive, high-quality reference sequence data set that encompasses fungi of interest to phytopathologists, medical mycologists and those studying environmental fungi (Balajee et al., 2009; Kang et al., 2010; Underwood and Green, 2011; Stielow et al., 2015). Although sequences deposited in public, non-curated databases (such as INSDC) have enormously benefited the scientific community, several studies have underscored the potential weakness of relying on such repositories for fungal DNA barcoding. Nilsson et al. (2006) reported that some 20% of the 51,000 ITS entries in INSDC at that time were erroneously identified at species level, and almost half of entries had issues surrounding correct taxonomic affiliations, descriptions or sequence annotations. A similar rate of sequence unreliability was reported by Bridge et al. (2003) in a study specifically examining fungal groups selected on the basis that sequences had been deposited by multiple laboratories to avoid sampling bias. A further problem with INSDC ITS sequences is that more than half of the entries correspond to environmental fungi and, as such, are not named (or named no further than to genus level) (Hawksworth and Lücking, 2017). In addition, 1.5% of 12,300 ITS sequences from environmental fungi were shown to be chimeric (artificial sequences comprising sequences from two or more unrelated species generated during PCR amplification) at the ordinal level (Nilsson et al., 2010). Subsequent fungal ITS chimera detection software allowed the removal of almost 1000 such sequences from public circulation (Nilsson et al., 2015). A variety of databases and tools has been developed in an attempt to address the issues of database fidelity (Hibbett et al., 2016; Table 12.2).
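The notion of a chimeric sequence described above can be illustrated with a deliberately simplified check: if the two halves of a query each find their best match in a different reference taxon, the query is a candidate chimera. The sketch below is a toy heuristic under stated assumptions; the sequences are invented and pre-aligned to equal length, the function names are hypothetical, and it is not the algorithm used by Nilsson et al. (2010, 2015) or by any published chimera-detection tool.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length aligned strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_reference(segment: str, refs: dict[str, str], start: int, end: int) -> str:
    """Name of the reference whose [start:end] slice is most similar to the segment."""
    return max(refs, key=lambda name: identity(segment, refs[name][start:end]))


def looks_chimeric(query: str, refs: dict[str, str]) -> bool:
    """Flag a query whose left and right halves match different references best."""
    mid = len(query) // 2
    left = best_reference(query[:mid], refs, 0, mid)
    right = best_reference(query[mid:], refs, mid, len(query))
    return left != right


# Invented reference sequences (assumption), aligned to the same length.
refs = {
    "taxon_X": "AAAAAAAAAACCCCCCCCCC",
    "taxon_Y": "GGGGGGGGGGTTTTTTTTTT",
}
chimera = refs["taxon_X"][:10] + refs["taxon_Y"][10:]

print(looks_chimeric(chimera, refs))          # True
print(looks_chimeric(refs["taxon_X"], refs))  # False
```

A practical tool would need proper alignment, a scoring margin between the two candidate parents and a breakpoint search rather than a fixed midpoint; the sketch is intended only to show why chimeras can escape notice when a sequence is compared as a whole.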
Table 12.1. Selected loci and primers used for DNA sequencing identification of fungi. ITS is the primary fungal barcode and TEF1α the secondary barcode.

Locus | Primer name | Primer sequence (5ʹ-3ʹ) | Reference
ITS | ITS1 | TCCGTAGGTGAACCTGCGG | White et al., 1990
ITS | ITS4 | TCCTCCGCTTATTGATATGC | White et al., 1990
ITS | ITS5 | GGAAGTAAAAGTCGTAACAAGG | Ward and Adams, 1998
ITS1 | ITS2 | GCTGCGTTCTTCATCGATGC | White et al., 1990
ITS2 | ITS3 | GCATCGATGAAGAACGCAGC | White et al., 1990
ITS2 | ITS4 | TCCTCCGCTTATTGATATGC | White et al., 1990
LSU | LROR | ACCCGCTGAACTTAAGC | Vilgalys and Hester, 1990
LSU | LR5 | TCCTGAGGGAAACTTCG | Vilgalys and Hester, 1990
TEF1α | EF1-1018F | GAYTTCATCAAGAACATGAT | Stielow et al., 2015
TEF1α | EF1-1620R | GACGTTGAADCCRACRTTGTC | Stielow et al., 2015
TEF1α | EF1-983F | GCYCCYGGHCAYCGTGAYTTYAT | Rehner and Buckley, 2005
TEF1α | EF1-1567R | ACHGTRCCRATACCACCRATCTT | Rehner and Buckley, 2005
TUB2 | Btub2Fd | GTBCACCTYCARACCGGYCARTG | Woudenberg et al., 2009
TUB2 | Btub4Rd | CCRGAYTGRCCRAARACRAAGTTGTC | Woudenberg et al., 2009
ACT | ACT512F | ATGTGCAAGGCCGGTTTCG | Carbone and Kohn, 1999
ACT | ACT783R | TACGAGTCCTTCTGGCCCAT | Carbone and Kohn, 1999
RPB2 | fRPB2-5F | GAYGAYMGWGATCAYTTYGG | Liu et al., 1999
RPB2 | fRPB2-7cR | CCCATRGCTTGYTTRCCCAT | Liu et al., 1999
60SL10 | 60S-506F | GHGACAAGCGTTTCTCNGG | Stielow et al., 2015
60SL10 | 60S-908R | CTTVAVYTGGAACTTGATGGT | Stielow et al., 2015
TEF3 | EF3-3185F | TCYGGWGGHTGGAAGATGAAG | Stielow et al., 2015
TEF3 | EF3-3538R | YTTGGTCTTGACACCNTC | Stielow et al., 2015
TOP1 | TOP1-501F | ACTGCCAAGGTTTTCCGTACHTACAACGC | Stielow et al., 2015
TOP1 | TOP1-501R | CCAGTCCTCGTCAACWGACTTRATRGCCCA | Stielow et al., 2015
LNS2 | LN2-468F | GGCCATGTGCCTGAACATGATCGGHCGWGAYTGGAC | Stielow et al., 2015
LNS2 | LN2-468R | CGGTTGCCRAAKCCRGCATAGAAKGG | Stielow et al., 2015
PGK | PGK-533F | GTYGAYTTCAAYGTYCC | Stielow et al., 2015
PGK | PGK-533R | ACACCDGGDGGRCCGTTCCA | Stielow et al., 2015
PRP8 | PRP8-F | GCCAAAGGAACACAGCTGCTTCG | Theodoro et al., 2011
PRP8 | PRP8-R | GCTGAGGATTCAGAAAGAGG | Theodoro et al., 2011

Table 12.2. List of currently available, curated databases for the sequence-based identification of fungi. Links correct as of 27 July 2020.

Database | URL | Scope | Loci covered | Year | Reference
AFTOL | http://aftol.umn.edu | All fungi | ITS | 2006 | Celio et al., 2006
Barcode of Life | http://v4.boldsystems.org/ | All fungi | ITS | 2007 | Ratnasingham and Hebert, 2007
CBS-KNAW | http://www.cbs.knaw.nl/collections | All fungi | ITS and MLST, polyphasic | 1989 | N/A
FungiDB | http://fungidb.org/fungidb/ | All fungi | Multiple | 2012 | Stajich et al., 2012
Fusarium-ID | http://isolate.fusariumdb.org/blast.php | Fusarium spp. | ITS, TEF1α, RPB1, RPB2, TUB2 | 2005 | Park et al., 2011
Fungal MLST | http://www.q-bank.eu/Fungi/ | Plant pathogens | Multiple (including TEF1α, RPB1, RPB2, TUB2, ACT) | 2010 | N/A
ISHAM-Barcoding | http://its.mycologylab.org/ | Medical fungi | ITS, TEF1α | 2011 | Irinyi et al., 2015
ISHAM-MLST | http://mlst.mycologylab.org/ | Scedosporium, Cryptococcus, Pneumocystis, Bipolaris | Multiple different loci depending on the target organism group | 2011 | Bernhardt et al., 2013; Meyer et al., 2009; Phipps et al., 2011; Pham et al., 2015
Institut Pasteur FungiBank | http://fungibank.pasteur.fr | Medical fungi | Medical | 2015 | N/A
MycoBank | http://mycobank.org/ | All fungi | ITS and polyphasic | 2004 | Robert et al., 2013
RefSeq Target Loci | http://ncbi.nlm.nih.gov/refseq/ | All fungi | ITS, 28S, 18S | 2014 | Schoch et al., 2014
UNITE | https://unite.ut.ee/ | All fungi | ITS | 2003 | Kõljalg et al., 2013

The web-based UNITE database, which targets the primary fungal barcode ITS and was originally developed for ectomycorrhizal fungi, has been expanded dramatically to include all fungi (Kõljalg et al., 2005, 2013). As of January 2019, the database contained ~1,000,000 fungal ITS sequences arranged into > 450,000 ‘species hypotheses’ based on percent similarity thresholds (Nilsson et al., 2019). Submission of reference sequences to the database is a restricted procedure that requires both that the submitter is a recognized expert in the fungal organisms in question and that the sequenced strain is a fully described organism from a public herbarium or recognized culture collection (see Chapter 4). Additionally, the database is curated in-house, with the further possibility of third-party curation and annotation (Nilsson et al., 2019).

A number of alternatives to the UNITE database exist, although their sequence coverage is much reduced. The Barcode of Life Database (BOLD; Ratnasingham and Hebert, 2007) also provides tools for fungal identification based on ITS sequences linked to voucher data, but currently encompasses less than 10% of the number of species covered by the UNITE database (Hibbett et al., 2016). Similarly, the RefSeq Targeted Loci Project, which is a recent addition to NCBI (Tatusova et al., 2015a,b; O’Leary et al., 2016), is a curated database that contains those ITS accessions from GenBank that can be linked to type strains or other verified voucher material (Schoch et al., 2014). Currently, it contains only ~11,000 ITS sequences, and less than half that number of 28S LSU sequences (Borman and Johnson, unpublished).

A number of additional databases, concentrating particularly on fungi pathogenic to humans and animals, have recently been developed by various consortia to address more specifically the needs of the medical mycology community (Hibbett et al., 2016; Prakash et al., 2017). The Centraalbureau voor Schimmelcultures (CBS) collection and databases contain ITS and LSU sequences for ~15,000 different species and allow pairwise DNA alignments as well as polyphasic identifications simultaneously against several linked remote databases (http://wi.knaw.nl, accessed 20 October 2020). The ISHAM-ITS reference database for human and animal pathogenic fungi, established in 2015 (Irinyi et al., 2015, 2016), contains > 4000 ITS sequences from more than 600 fungal species. Following the proposal that TEF1α be adopted as the secondary fungal DNA barcode for those groups of fungi that are insufficiently resolved with ITS, this database was recently extended to become the ISHAM Barcoding database with the inclusion of a growing TEF1α data set (currently ~500 sequences from ~130 fungal species; Meyer et al., 2019). The ISHAM databases are completely integrated into the UNITE, RefSeq and BOLD databases via direct links/flags (Prakash et al., 2017). The Institut Pasteur FungiBank (IP-FungiBank, curated by the French National Research Centre for Invasive Mycoses and Antifungals) similarly provides DNA sequence alignments for yeast and mould species of medical interest, with a well-curated ITS database plus sequences for additional loci (TEF1α, TUB2 etc.) for those fungal groups that are ill-discriminated by ITS alone (http://fungibank.pasteur.fr, accessed 20 October 2020).

Specific tools for certain fungal groups are also available. The CBS-KNAW provides dedicated polyphasic databases for dermatophytes, and for Penicillium and Aspergillus spp., and a multi-locus sequence typing (MLST) system for identification of the phylogenetically complex genus Fusarium that was established in collaboration with Pennsylvania State University, which also curates the Fusarium-ID database (O’Donnell et al., 2010; Park et al., 2011; O’Donnell et al., 2012).
Finally, the MycoBank database, which was an initiative of the CBS-KNAW Fungal Biodiversity Centre (now the Westerdijk Fungal Biodiversity Institute) and was later transferred to the International Mycological Association (IMA), provides a variety of comprehensive nomenclatural and taxonomic data, permits the centralized deposit of novel fungal taxa and currently contains sequence data for > 215,000 fungal strains. In addition, it allows pairwise sequence alignments as well as polyphasic search approaches against a number of external, curated reference databases including the ISHAM-ITS/ISHAM Barcoding database, UNITE, GenBank, IP-FungiBank and CBS (Crous et al., 2004; Robert et al., 2013). However, despite all of the above advances, only approximately half of the currently described fungal species (which themselves represent only a small proportion of the total estimated diversity of the fungal kingdom) have any sequence data available in any public database (Hibbett et al., 2016; Xu, 2016; Prakash et al., 2017).
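Several of the primers in Table 12.1 are degenerate: they contain IUPAC ambiguity codes such as Y (C or T) or R (A or G) and therefore represent a pool of related oligonucleotides rather than a single sequence. As an illustration only, the following Python sketch expands these codes to count how many distinct oligonucleotides a primer represents and to locate exact forward-strand matches in a template. The function names and the toy template are assumptions introduced here for illustration and do not correspond to any published pipeline.

```python
import re

# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct oligonucleotides encoded by a (degenerate) primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def primer_regex(primer):
    """Compile a regular expression that matches every expansion of the primer."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def find_primer_sites(primer, template):
    """Start positions (0-based) of non-overlapping exact matches on the given strand."""
    return [m.start() for m in primer_regex(primer).finditer(template.upper())]


# fRPB2-5F from Table 12.1 contains six two-fold ambiguity codes.
frpb2_5f = "GAYGAYMGWGATCAYTTYGG"
print(degeneracy(frpb2_5f))

# Toy template (invented): one possible expansion of the primer flanked by filler bases.
toy_template = "AAAA" + "GATGATCGTGATCATTTCGG" + "CCCC"
print(find_primer_sites(frpb2_5f, toy_template))
```

A realistic primer search would also need to scan the reverse complement and tolerate mismatches, neither of which is attempted in this sketch.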

The Problem of Sequences Without Names: ‘Dark Taxa’

The issues with database reliability are exacerbated further when fungal DNA barcode sequences are compared against those in the Sequence Read Archive (SRA). The SRA is estimated to contain over 1.2 billion fungal ITS reads (Lücking and Hawksworth, 2018), none of which are named satisfactorily (Yahr et al., 2016; Lücking and Hawksworth, 2018; Ryberg and Nilsson, 2018). Moreover, since many of these sequences originate from large-scale metagenomic studies of ecological/environmental samples (Boekhout, 2005; Buée et al., 2009; Porras-Alfaro et al., 2011; Hibbett et al., 2013, 2016; Ortiz-Vera et al., 2018), they are known only from sequence data: type strains, vouchers or even living exemplars are missing. These dark taxa (or ‘sequences without names’) are thus completely separated from classical methods of fungal taxonomy or description based on morphological and other phenotypic examinations of living or preserved specimens, and the availability of such sequences is likely to increase exponentially with improvements in massively parallel metabarcoding technologies (Porras-Alfaro et al., 2011; Hibbett et al., 2013, 2016; Yahr et al., 2016; Ortiz-Vera et al., 2018). However, these data probably encompass many thousands of potentially novel taxonomic units, which would be extremely valuable if sequences were correctly ascribed to lineages. Several different approaches for exploiting such sequences have been explored, each with its own proponents (Hawksworth et al., 2016; Lücking and Hawksworth, 2018; Ryberg and Nilsson, 2018).

One approach is that adopted by the UNITE database (Kõljalg et al., 2013), where clustering techniques are used to define ‘species hypotheses’ or operational taxonomic units (OTUs) based on a predefined similarity cut-off. These sequence clusters can then be mapped to known fungal taxa where phylogenetic overlaps occur. Currently, UNITE has ~800,000 fungal ITS sequences, arranged into approximately 70,000 species hypotheses using a 98.5% sequence identity threshold (Lücking and Hawksworth, 2018), an almost insignificant fraction (< 0.1%) of the number of ITS sequences currently in the SRA. It is hard to imagine the curation of > 1000-fold more species hypotheses using a clustering approach. Additional disadvantages of this approach are that it can only be applied to taxa represented by ITS sequences; that an alternative nomenclatural system is required to describe the sequence clusters; and that, since OTUs are consensus sequences rather than real sequences, they cannot be used to formally describe new fungal taxa (Schoch et al., 2012; Ryberg, 2015; Lücking and Hawksworth, 2018). Finally, as discussed above, finding a sequence identity threshold that is applicable across the entire fungal kingdom is no mean feat.

An alternative approach has been proposed (Hawksworth et al., 2016; Lücking and Hawksworth, 2018) that would allow DNA sequence data alone to serve as the formal types for naming novel fungi. For this approach to work it would necessarily have to employ sequences from a single locus, with the primary fungal barcode ITS being the most obvious choice. The approach is not permitted under the current International Code of Nomenclature for algae, fungi and plants (ICN; ‘the code’; Turland et al., 2018), which requires stored physical material, dried or metabolically inert (or an illustration of the organism), linked to the sequence to serve as holotype. However, the code does not formally exclude the use of any category of characters (including DNA sequences) for delineating taxa.
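The threshold-based clustering used to derive species hypotheses, described earlier in this section, can be sketched in a few lines of Python. The example is a deliberately simplified illustration and not the UNITE procedure: it assumes that sequences are already aligned to equal length, it assigns each sequence greedily to the first cluster representative that meets the threshold, and it operates on invented toy sequences. All function and variable names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_cluster(seqs, threshold):
    """Group sequence names; each joins the first cluster whose representative
    (the first sequence placed in that cluster) it matches at >= threshold."""
    clusters = []  # list of (representative sequence, [member names])
    for name, seq in seqs.items():
        for representative, members in clusters:
            if identity(seq, representative) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


def mutate(seq, positions):
    """Toy-data helper: return a copy of seq with the base at each position swapped."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Invented 200-nt sequences standing in for aligned ITS reads.
base = "ACGT" * 50
toy = {
    "isolate_1": base,
    "isolate_2": mutate(base, [10, 120]),            # 2 differences, 99.0% identity
    "env_read_1": mutate(base, range(0, 200, 20)),   # 10 differences, 95.0% identity
}

# 98.5% is the identity threshold quoted in the text for UNITE species hypotheses.
clusters = greedy_cluster(toy, threshold=0.985)
print(clusters)

# Scale comparison from the figures quoted in the text:
# ~800,000 UNITE sequences against ~1.2 billion ITS reads in the SRA.
fraction_of_sra = 800_000 / 1_200_000_000
print(f"{fraction_of_sra:.4%}")
```

With these toy data the two isolates fall into one cluster and the environmental read forms its own. Because the outcome of a greedy scheme depends on input order and on the chosen threshold, the sketch also illustrates why a single cut-off is difficult to apply across the whole kingdom.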
Attempts to circumvent the need for a physical specimen to date have included preserving the environmental sample from which a sequence was obtained (Kirk, 2012) or providing illustrations of DNA sequence alignments (Lücking and Moncada, 2017). The advantages of employing DNA sequences as types are that sequences are intrinsically more stable than many of the ephemeral phenotypic/biological characters that are traditionally used to erect taxa; that DNA sequences obtained from Sanger sequencing (see Chapter 14) are real sequences that exist in nature rather than consensus sequences generated from clustering/OTU approaches; and that using DNA sequences as types would allow a more rapid response to formal naming of the exponentially increasing number of dark taxa without the need for alternative, parallel nomenclatural systems based on OTUs or species hypotheses (Lücking and Hawksworth, 2018). Finally, it appears to be the only valid approach to naming those fungi that can be recovered as DNA sequences but cannot be cultured. Arguments against the approach include: (i) the likelihood that using ITS alone initially might under- or over-resolve lineages, requiring either the creation of epitypes or synonymization, respectively, when physical specimens and sequences from additional loci become available; (ii) if loci other than ITS are permitted, the possibility that a single taxon may be described as novel multiple times based on different non-overlapping loci; and (iii) the risk of erecting taxa based on DNA sequences for those parts of the kingdom that have already been described but not sequenced at a particular locus (Thines et al., 2018; Zamora et al., 2018). One potential compromise to address some of these issues would be to make sequence-based species names identifiable in the literature by the addition of ‘nom. seq.’ (nomen sequentiae). While valid when published, those names would not take priority over taxa named independently on the basis of physical types (Lücking et al., 2018; Zamora et al., 2018), an arrangement similar to the ‘Candidatus’ status employed for bacteria (Lücking and Hawksworth, 2018).

Implications for Fungal Taxonomy and Nomenclature

While the discovery, classification and naming of novel fungal taxa has been a continuous endeavour since the dawn of mycology, the introduction of molecular approaches to fungal identification has resulted in an almost exponential acceleration of these processes. On the basis of sequence-based fungal identification, cryptic species have been described in many common morphospecies, the dual nomenclatural system for teleomorph and anamorph life cycles was rendered obsolete and abandoned, and many extant fungi have been reassigned to new genera on the basis of phylogenetic analyses (Samson et al., 2011, 2014; Visagie et al., 2014; Borman et al., 2016, 2018; discussed in Warnock, 2017, 2019; Wiederhold and Gibas, 2018). However, since it is widely accepted that phylogenetic relationships are highly subject to sampling bias (Seifert, 2009), novel phylogenies are likely to be subject to considerable change when additional, diverse taxa are sampled. Moreover, several large-scale fungal barcoding initiatives have revealed that the conventional fungal barcode regions ITS and LSU perform poorly at higher taxonomic classifications (Vu et al., 2016, 2019). Using both LSU and ITS, taxonomic clustering of filamentous fungi at the generic level was poor, highlighting the need for wholesale higher-level classification changes in the future (Vu et al., 2019). The same was true when these two loci were employed for barcoding 9000 isolates of yeast (Vu et al., 2016). However, in that study, clustering of yeast genera in Basidiomycota was significantly improved when the taxonomic changes that resulted from the wholesale generic revision of basidiomycetous yeasts (Liu et al., 2015; Wang et al., 2015a, 2015b) were taken into account (Vu et al., 2016). Thus, although all of these studies suggested that ITS and LSU can be used to separate fungal strains at the species level, and that LSU outperformed ITS at the family and order levels, neither locus adequately classified fungi at the generic level. Wholesale revision of the fungal kingdom at the generic level will thus be necessary in the future. The increasing availability of completely sequenced fungal genomes (Hibbett et al., 2013; Grigoriev et al., 2014) will likely contribute to these revisions, as they generate robust scaffolds with improved taxonomic resolution (Dentinger et al., 2015; Lockhart et al., 2017; Ropars et al., 2018).
However, incorporation of whole-genome sequencing data into revised fungal taxonomy is not without problems, since assemblies frequently do not include the ribosomal RNA cistron that is necessary to allow comparability with data sets generated by DNA barcoding (Yahr et al., 2016). While nomenclatural instability as described above is an inevitable (and hopefully transient) repercussion of sequence-based identification of fungi, nomenclatural conflicts are a further ongoing issue. The description of novel cryptic species in recognized morphospecies and the renaming of historically accepted fungi will inevitably cause confusion to those not directly involved in the field (e.g. clinicians treating patients with fungal infections). Theoretically, these problems can to some extent be circumvented by the way that clinical mycology laboratories convey results. The term ‘species complex’, although not clearly defined taxonomically, can be employed for those cryptic species that share clinically similar properties with the well-known original morphospecies. For those fungi that have been renamed, reporting of the novel name with reference to the previously accepted genus and species epithet will allow clinicians to still access the wealth of historical data concerning treatment options and outcomes. The latter approach is the one that we are currently employing in an attempt to rationalize the nomenclature of the clearly polyphyletic ‘genus’ Candida (Borman and Johnson, 2018). However, even these suggestions are not without opposition. For complexes of cryptic species, it is rarely apparent initially whether all or any of the novel taxa are likely to exhibit clinically significant differences, or whether such differences are universally applicable. For example, several of the cryptic species in Aspergillus section Fumigati were initially identified from studies in the USA which demonstrated that isolates had unusual patterns of antifungal drug resistance (Balajee et al., 2005). The equivalent studies in the UK confirmed the isolation of a similar range of cryptic species from clinical samples, but failed to find clinically relevant differences in antifungal susceptibility (Borman and Johnson, unpublished data), a reflection that geographically distant populations of the same species might have evolved disparate phenotypic features.
Similarly, although there are currently six cryptic species in the Scedosporium species complex (Scedosporium apiospermum, Scedosporium aurantiacum, Scedosporium boydii, Scedosporium dehoogii, Scedosporium minutisporum and Pseudallescheria angusta; Gilgado et al., 2005; Chen et al., 2016), S. aurantiacum, S. dehoogii and S. minutisporum have been more recently excluded from the ‘S. apiospermum species complex’ on the basis of reported differences in virulence and antifungal susceptibility profiles (Gilgado et al., 2005, 2009; Lackner et al., 2014; Chen et al., 2016).

Additional problems abound for species that have been (or should be) renamed. It has long been accepted that genetic diversity in the medically important Cryptococcus neoformans/gattii species complex far exceeds the number of currently accepted species (Kwon-Chung et al., 2017). However, a proposal to partially address this undescribed diversity through the erection of several additional species (Hagen et al., 2015), some of which differed in clinically significant behaviour or geographic prevalence (Nyazika et al., 2016; Hagen et al., 2017), was strongly criticized as being premature since even these additional taxonomic novelties would not fully encompass the diversity of the complex (Kwon-Chung et al., 2017). Additionally, since recognition of the novel taxa relied upon MLST phylogenetic approaches involving 11 loci (but not including the fungal barcoding regions), clinical mycology laboratories are unlikely to be able to reliably differentiate them using existing technologies.

A slightly different controversy surrounds nomenclature of the complex genera Aspergillus and Fusarium, both of which are important human and agricultural pathogens (de Hoog et al., 2013, 2015). The type species of Aspergillus is Aspergillus glaucus, and on the basis of DNA sequencing approaches most other Aspergillus species should be removed from the genus and placed in one of the nine new teleomorph genera that would be required to encompass the genetic diversity (reviewed in Samson et al., 2014). Similarly, since the type species for Fusarium is Fusarium sambucinum, which has a Gibberella teleomorph, all Fusarium species with non-Gibberella teleomorphs should be accommodated elsewhere. Thus, a novel genus Bisifusarium was erected for Fusarium dimerum and relatives, and it was proposed that all members of the Fusarium solani species complex should be moved to the genus Neocosmospora (Lombard et al., 2015; Sandoval-Denis et al., 2018). However, several working groups have suggested that the status quo should be maintained to ‘preserve established research connections’ in the diverse communities interested in Fusarium (Geiser et al., 2013) and ‘maintain the prevailing, broad concept of Aspergillus’ (Samson et al., 2014). Despite overwhelming phylogenetic evidence to the contrary, this nomenclatural obfuscation has received support from the International Commission on Penicillium and Aspergillus (ICPA). Although this approach would reduce confusion in medical mycology by reducing the overall number of nomenclatural changes, its application lacks consistency across the kingdom, where a large number of nomenclatural changes to less prominent fungi have been ratified and filtered down to clinicians (de Hoog et al., 2013, 2015; Warnock, 2017, 2019; Wiederhold and Gibas, 2018). The approach of our own laboratory, which is shared by Wiederhold and Gibas (2018), circumvents these objections: we report the new (accurate) nomenclature together with the previous name (or names) most commonly encountered in the literature. A number of websites that attempt to record recent accepted taxonomic changes are listed in Table 12.3.
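The reporting convention described above, in which the current name is given together with the name most familiar from the literature, could be implemented in a laboratory reporting system along the following lines. This is a minimal sketch: the lookup table, the function name and the wording of the report string are assumptions made for illustration, and the two entries simply reflect the Fusarium reassignments mentioned in the text.

```python
# Hypothetical lookup: current name -> previously used name(s).
# The entries are illustrative only and would need to be maintained
# against an authoritative nomenclatural resource.
PREVIOUS_NAMES = {
    "Bisifusarium dimerum": ["Fusarium dimerum"],
    "Neocosmospora solani": ["Fusarium solani"],
}


def report_name(current_name):
    """Return the current name, followed by any previous names in parentheses."""
    previous = PREVIOUS_NAMES.get(current_name, [])
    if not previous:
        return current_name
    return f"{current_name} (previously {', '.join(previous)})"


print(report_name("Bisifusarium dimerum"))
print(report_name("Aspergillus fumigatus"))
```

Keeping the mapping in one place means that a later ratified change needs to be recorded only once, while clinicians continue to see the name under which the historical treatment data were published.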

Table 12.3. Useful fungal taxonomy reference sites. All links correct as of 27 July 2020.

Organization/Website | URL
MycoBank | http://www.mycobank.org/
Index Fungorum | http://www.indexfungorum.org/
Atlas of Clinical Fungi | http://www.clinicalfungi.org/
Westerdijk Institute | http://www.westerdijkinstitute.nl/
The yeasts website | http://theyeasts.org/
ICPA | http://www.aspergilluspenicillium.org/
Mycology Online | http://www.mycology.adelaide.edu.au/
International Commission on the Taxonomy of Fungi | http://www.fungaltaxonomy.org/

Conclusion

Over the last two decades the sequence-based identification of fungi has certainly come of age. The ITS region is universally accepted as the primary fungal barcoding region owing to the large barcode gap at this locus for many groups of fungi. Since the species-resolution power of ITS is poor for certain groups of fungi, and higher-level taxonomic resolution is greater with protein-coding genes, the TEF1α locus has been proposed as the universal secondary barcode region. In addition, the historical problems surrounding the reliability of fungal DNA sequences in centralized repositories are slowly being resolved by the development of an increasing number of publicly accessible, curated databases. However, a number of practical and theoretical issues still require addressing, including how to integrate the huge volume of sequences from dark taxa produced as a result of metagenomics studies (see Chapters 2, 5 and 14), and how best to deal with the enormous taxonomic revisions and wholesale higher-level classification changes that will inevitably result from molecular examination of hitherto under-sampled parts of the kingdom.

References

Aveskamp, M.M., Woudenberg, J.H., de Gruyter, J., Turco, E., Groenewald, J.Z. and Crous, P.W. (2009) Development of taxon-specific sequence characterized amplified region (SCAR) markers based on actin sequences and DNA amplification fingerprinting (DAF): a case study in the Phoma exigua species complex. Molecular Plant Pathology 10, 403–414. https://doi.org/10.1111/j.1364-3703.2009.00540.x
Badotti, F., de Oliveira, F.S., Garcia, C.F., Vaz, A.B., Fonseca, P.L. et al. (2017) Effectiveness of ITS and sub-regions as DNA barcode markers for the identification of Basidiomycota (Fungi). BMC Microbiology 17, 42. https://doi.org/10.1186/s12866-017-0958-x
Balajee, S.A., Gribskov, J.L., Hanley, E., Nickle, D. and Marr, K.A. (2005) Aspergillus lentulus sp. nov., a new sibling species of A. fumigatus. Eukaryotic Cell 4, 625–632. https://doi.org/10.1128/EC.4.3.625-632.2005
Balajee, S.A., Borman, A.M., Brandt, M.E., Cano, J., Cuenca-Estrella, M. et al. (2009) Sequence-based identification of Aspergillus, Fusarium, and mucorales species in the clinical mycology laboratory: where are we and where should we go from here? Journal of Clinical Microbiology 47, 877–884. https://doi.org/10.1128/JCM.01685-08
Barnett, J.A., Payne, R.W. and Yarrow, D. (2000) Yeasts: Characteristics and Identification, 3rd edn. Cambridge University Press, Cambridge, UK.
Beck, A., Haug, I., Oberwinkler, F. and Kottke, I. (2007) Structural characterisation and molecular identification of arbuscular mycorrhiza morphotypes of Alzatea verticillata (Alzateaceae), a prominent tree in the tropical mountain rain forest of South Ecuador. Mycorrhiza 17, 607–625. https://doi.org/10.1007/s00572-007-0139-0
Bernhardt, A., Sedlacek, L., Wagner, S., Schwarz, C., Würstl, B. and Tintelnot, K. (2013) Multilocus sequence typing of Scedosporium apiospermum and Pseudallescheria boydii isolates from cystic fibrosis patients. Journal of Cystic Fibrosis 12, 592–598. https://doi.org/10.1016/j.jcf.2013.05.007
Boekhout, T. (2005) Gut feeling for yeasts. Nature 434, 449–451. https://doi.org/10.1038/434449a
Borman, A.M. and Johnson, E.M. (2018) Candida, Cryptococcus and other yeasts of medical importance. In: Jorgensen, J., Pfaller, M., Carroll, K., Funke, G., Landry, M., Richter, S. and Warnock, D. (eds) Manual of Clinical Microbiology, 12th edn. ASM Press, Washington, DC, pp 2056–2086.
Borman, A.M., Linton, C.J., Miles, S.J. and Johnson, E.M. (2008) Molecular identification of pathogenic fungi. Journal of Antimicrobial Chemotherapy 61, Suppl 1, i7–12. https://doi.org/10.1093/jac/dkm425
Borman, A.M., Linton, C.J., Oliver, D., Palmer, M.D., Szekely, A. and Johnson, E.M. (2010) Rapid molecular identification of pathogenic yeasts by pyrosequencing analysis of 35 nucleotides of internal transcribed spacer 2. Journal of Clinical Microbiology 48, 3648–3653. https://doi.org/10.1128/JCM.01071-10
Borman, A.M., Desnos-Ollivier, M., Campbell, C.K., Bridge, P.D., Dannaoui, E. and Johnson, E.M. (2016) Novel taxa associated with human fungal black-grain mycetomas: Emarellia grisea gen. nov., sp. nov., and Emarellia paragrisea sp. nov. Journal of Clinical Microbiology 54, 1738–1745. https://doi.org/10.1128/JCM.00477-16
Borman, A.M., Szekely, A., Fraser, M., Lovegrove, S. and Johnson, E.M. (2019a) A novel dermatophyte relative, Nannizzia perplicata sp. nov., isolated from a case of tinea corporis in the United Kingdom. Medical Mycology 57, 548–556. https://doi.org/10.1093/mmy/myy099
Borman, A.M., Muller, J., Walsh-Quantick, J., Szekely, A., Patterson, Z. et al. (2019b) Fluconazole resistance in isolates of uncommon pathogenic yeast species from the United Kingdom. Antimicrobial Agents and Chemotherapy 63, pii: e00211-19. https://doi.org/10.1128/AAC.00211-19


Bridge, P.D., Roberts, P.J., Spooner, B.M. and Panchal, G. (2003) On the unreliability of published DNA sequences. New Phytologist 160, 43–48. https://doi.org/10.1046/j.1469-8137.2003.00861.x
Bridge, P.D., Spooner, B.M. and Roberts, P.J. (2005) The impact of molecular data in fungal systematics. Advances in Botanical Research 42, 33–67. https://doi.org/10.1016/S0065-2296(05)42002-9
Bridge, P.D., Schlitt, T., Cannon, P.F., Buddie, A.G., Baker, M. and Borman, A.M. (2008) Domain II hairpin structure in ITS1 sequences as an aid to differentiating recently evolved animal and plant pathogenic fungi. Mycopathologia 166, 1–16. https://doi.org/10.1007/s11046-008-9094-3
Brown, G.D., Denning, D.W., Gow, N.A., Levitz, S.M., Netea, M.G. et al. (2012) Hidden killers: human fungal infections. Science Translational Medicine 4, 165rv13. https://doi.org/10.1126/scitranslmed.3004404
Brun, S. and Silar, O. (2010) In: Pontarotti, P. (ed.) Evolutionary Biology - Concepts, Molecular and Morphological Evolution. Springer, Berlin.
Buée, M., Reich, M., Murat, C., Morin, E., Nilsson, R.H. et al. (2009) 454 Pyrosequencing analyses of forest soils reveal an unexpectedly high fungal diversity. New Phytologist 184, 449–456. https://doi.org/10.1111/j.1469-8137.2009.03003.x
Campbell, C.K., Johnson, E.M. and Warnock, D.W. (2013) Identification of Pathogenic Fungi, 2nd edn. Wiley-Blackwell, Hoboken, New Jersey. ISBN: 978-1-444-33070-0
Campbell, C.K., Borman, A.M., Linton, C.J., Bridge, P.D. and Johnson, E.M. (2006) Arthroderma olidum, sp. nov. a new addition to the Trichophyton terrestre complex. Medical Mycology 44, 451–459. https://doi.org/10.1080/13693780600796538
Capella-Gutierrez, S., Kauff, F. and Gabaldón, T. (2014) A phylogenomics approach for selecting robust sets of phylogenetic markers. Nucleic Acids Research 42, e54. https://doi.org/10.1093/nar/gku071
Carbone, I. and Kohn, L.M. (1999) A method for designing primer sets for speciation studies in filamentous ascomycetes. Mycologia 91, 553–556. https://doi.org/10.1080/00275514.1999.12061051
Celio, G.J., Padamsee, M., Dentinger, B.T., Bauer, R. and McLaughlin, D.J. (2006) Assembling the Fungal Tree of Life: constructing the structural and biochemical database. Mycologia 98, 850–859. https://doi.org/10.1080/15572536.2006.11832615
Chen, M., Zeng, J., de Hoog, G.S., Stielow, B., Gerrits Van Den Ende, A.H. et al. (2016) The ‘species complex’ issue in clinically relevant fungi: A case study in Scedosporium apiospermum. Fungal Biology 120, 137–146. https://doi.org/10.1016/j.funbio.2015.09.003
Crous, P.W., Gams, W., Stalpers, D., Robert, V. and Stegehuis, G. (2004) MycoBank: an online initiative to launch mycology into the 21st century. Studies in Mycology 50, 19–22.
Dentinger, B., Gaya, E., O’Brien, H., Suz, L.M., Lachlan, R. et al. (2015) Tales from the crypt: genome mining from fungarium specimens improves resolution of the mushroom tree of life. Biological Journal of the Linnean Society 117, 11–32. https://doi.org/10.1111/bij.12553
de Hoog, G.S., Haase, G., Chaturvedi, V., Walsh, T.J., Meyer, W. and Lackner, M. (2013) Taxonomy of medically important fungi in the molecular era. Lancet Infectious Diseases 13, 385–386. https://doi.org/10.1016/S1473-3099(13)70058-6
de Hoog, G.S., Chaturvedi, V., Denning, D.W., Dyer, P.S., Frisvad, J.C., ISHAM Working Group on Nomenclature of Medical Fungi et al. (2015) Name changes in medically important fungi and their implications for clinical practice. Journal of Clinical Microbiology 53, 1056–1062. https://doi.org/10.1128/JCM.02016-14
de Hoog, G.S., Guarro, J., Gene, J. and Figueras, M.J. (2016) Atlas of Clinical Fungi, 2nd edn. CBS Press, Utrecht, The Netherlands.
de Hoog, G.S., Dukik, K., Monod, M., Packeu, A., Stubbe, D. et al. (2017) Toward a novel multilocus phylogenetic taxonomy for the dermatophytes. Mycopathologia 182, 5–31. https://doi.org/10.1007/s11046-016-0073-9
Dukik, K., Muñoz, J.F., Jiang, Y., Feng, P., Sigler, L. et al. (2017) Novel taxa of thermally dimorphic systemic pathogens in the Ajellomycetaceae (Onygenales). Mycoses 60, 296–309. https://doi.org/10.1111/myc.12601
Fell, J.W., Boekhout, T., Fonseca, A., Scorzetti, G. and Statzell-Tallman, A. (2000) Biodiversity and systematics of basidiomycetous yeasts as determined by large-subunit rDNA D1/D2 domain sequence analysis. International Journal of Systematic and Evolutionary Microbiology 50, 1351–1371. https://doi.org/10.1099/00207713-50-3-1351
Fisher, M.C., Hawkins, N.J., Sanglard, D. and Gurr, S.J. (2018) Worldwide emergence of resistance to antifungal drugs challenges human health and food security. Science 360, 739–742. https://doi.org/10.1126/science.aap7999


A.M. Borman and E.M. Johnson


Sequence-based Identification and Classification of Fungi













13

Identification and Classification of Prokaryotes Using Whole-genome Sequences

Luis M. Rodriguez-R¹, Ramon Rosselló-Móra² and Konstantinos T. Konstantinidis¹,³*
¹School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA, USA; ²Grup de Microbiologia Marina, Institut Mediterrani d’Estudis Avançats (IMEDEA), Universitat de les Illes Balears (UIB) and Consejo Superior de Investigaciones Científicas (CSIC), Esporles, Illes Balears, Spain; ³School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, USA

*[email protected]
© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Introduction

The 16S ribosomal RNA (rRNA) gene tree, supplemented with (or ‘decorated by’) phenotypic and/or morphological traits assessed by traditional culture-based methods such as lipid and substrate utilization profiles and DNA-DNA hybridization values for delineating the species level, has served effectively as the backbone for identification and classification (i.e. the taxonomy) of prokaryotic taxa. Standards to define the different taxonomic ranks based on the branching pattern and/or 16S rRNA gene sequence identity level have also been proposed, and are commonly used (Yarza et al., 2014). In the last two decades or so (i.e. between 2000 and 2020), it has become clear that whole-genome sequencing approaches, mostly at the DNA level but also the RNA and proteome levels, offer specific advantages such as higher species-level resolution and lower experimental noise over the traditional and, generally speaking, coarser methods for microbial taxon identification and classification (Konstantinidis and Tiedje, 2005a). For example, the 16S rRNA gene offers limited resolution at the species and subspecies level, and is often not assembled as part of the metagenome-assembled genomes (MAGs) recovered from environmental DNA, limiting its usefulness (Parks et al., 2018; Rodriguez-R et al., 2018a). Traditional taxonomic methods are also not easily applicable to uncultivated taxa, rendering the genome-sequencing approach the only plausible and adequately high-throughput approach for cataloguing the uncultivated taxa that represent the predominant majority of prokaryotes in nature (Konstantinidis and Rosselló-Móra, 2015). Accordingly, we are in an era of taxonomic revolution, and it is highly likely that taxonomy will be practised differently in the next decade (2020–2030) compared to the current practice. Here, we attempt to summarize the major findings from genome-based taxonomic studies in the past two decades, and briefly describe the major genome-based approaches currently available for species identification and classification with special focus on the ‘uncultivated majority’ and associated limitations. We also outline future directions towards a truly genome-based taxonomy for prokaryotes that will equally encompass cultured and uncultivated taxa.

Genome-based Classification: Advantages The classification of prokaryotes has traditionally been based on the recognition of monophyletic groups of organisms that can be differentiated from their closest relatives by means of genotypic uniqueness and a diagnostic phenotype (Rosselló-Móra and Amann, 2015) (see also Chapters 11 and 15). Monophyly has been assessed by sequencing single (or multiple) genetic markers, most notably the 16S rRNA gene (Stackebrandt et al., 2002), whereas genotypic circumscription is typically based on nucleic acid reassociation techniques such as DNA-DNA hybridization (DDH), and phenotype is generally determined using biochemical tests and chemotaxonomic markers (Rosselló-Móra and Amann, 2015). DDH has been used as the gold standard for species circumscriptions, as it offers robust resolution among closely (but not distantly) related organisms, and organisms with high DDH values (i.e. > 70%) tend to show high enough phenotypic homogeneity. However, because DDH is a cumbersome, error-prone technique, and its values are not always directly comparable and portable among different laboratories, alternative approaches were sought to replace this method (Stackebrandt et al., 2002). 16S rRNA gene analysis was explored as such an alternative (Stackebrandt and Ebers, 2006), but the lack of resolution of this gene at the species level ensured that DDH remained indispensable for organisms that show 16S rRNA gene identity values greater than ~98.5%. Alternatively, multi-locus sequence analysis (MLSA) using sets of concatenated essential genes was proposed as the alternative to DDH (Stackebrandt et al., 2002), but without success (Rosselló-Móra and Amann, 2015). This was presumably due to practical reasons related to the amount of effort required and the (low) robustness of the signal obtained when compared to whole-genome sequencing. 
The advantages of whole (or partial) genome sequences for revealing the complete phenotypic potential of organisms, finding diagnostic traits and delineating the (different) evolutionary histories of individual genes, as well as weighting strategies to deal with the conflicting phylogenetic signal of individual genes, have been well described (e.g. Gophna et al., 2005; Ciccarelli et al., 2006). We focus here on comparisons of whole-genome approaches to traditional methods of 16S rRNA gene phylogeny, and the genome-based classification methods that have recently been proposed.

What Did Whole-genome Sequencing Reveal About Traditional Taxonomic Practices?

The major conclusion from trees based on whole-genome concatenated alignments (Ciccarelli et al., 2006), individual or consensus trees of protein-coding universal genes (Gophna et al., 2005; Parks et al., 2018), or trees from shared gene content (Snel et al., 1999) and other genome-derived metrics (e.g. Konstantinidis and Tiedje, 2005a) is that there is good correlation overall between genome-derived metrics and 16S rRNA (or the 23S rRNA) gene phylogeny. That is, the 16S rRNA gene phylogeny is robust at the genus level and above, and its resolution may be higher for classification purposes than that of (at least some of) the protein-coding genes, owing to stronger selective constraints (i.e. slower sequence evolution). Given also the much larger number of taxa (at least 50 times at the time of this writing) and type strains currently known by their 16S rRNA gene versus whole-genome sequences (Cole et al., 2011; Quast et al., 2013; Yarza et al., 2013), as well as the limitations of whole-genome approaches (discussed below), it seems that the 16S rRNA gene could continue serving as the backbone of prokaryotic phylogenetic reconstructions and taxonomic classifications for the next decade. There is no compelling reason to replace it with a genome-derived metric at the genus level and higher. For within-genus resolution, it has become apparent that the 16S rRNA gene sequences are sometimes too conserved to robustly resolve between closely related – yet distinct – species (Rodriguez-R et al., 2018b), confirming the earlier results mentioned above (Stackebrandt and Ebers, 2006). Further, whole-genome sequencing provides significantly higher resolution than DDH at this level (Konstantinidis and Tiedje, 2005a; Jain

et al., 2018), and the genome-derived metrics are (more) portable and reproducible. The lack of such portability and reproducibility is one of the major drawbacks of DDH (Stackebrandt et al., 2002). For instance, the genome-aggregate average nucleotide identity (ANI) of shared genes between two related genomes (Konstantinidis and Tiedje, 2005a,b) has been shown to correlate well with their DDH values, and any disagreements are most commonly due to the experimental noise of the latter as opposed to the former method (Goris et al., 2007). Accordingly, ANI has been recognized as a replacement for DDH because it provides robust resolution at the species and subspecies levels and represents a straightforward, easy-to-compute measure of genetic relatedness based on complete or draft genome sequences. The ANI signal saturates at the genus level, similar to DDH; hence, ANI is not useful for comparisons among distantly related genomes. Notably, at least seven webservers currently provide online ANI calculation capabilities: JSpeciesWS (Richter et al., 2016); the pairwise Genome-to-Genome Distance Calculator (GGDC) of the German Culture Collection (Auch et al., 2010); pairwise OrthoANI with the Orthologous Average Nucleotide Identity Tool (OAT) from ChunLab (Lee et al., 2016); the pre-computed Microbial Species Identifier (MiSI) at the Department of Energy’s Joint Genome Institute (DOE-JGI) (Varghese et al., 2015); pre-computed ANI at NCBI GenBank (Ciufo et al., 2018); and our own pairwise ANI with the Enveomics Collection (Rodriguez-R et al., 2016), as well as the database search with the Microbial Genomes Atlas or MiGA webserver (Rodriguez-R et al., 2018a).
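
To make the ANI definition above concrete, the following minimal Python sketch averages the nucleotide identity of the fragments (or genes) that two genomes share, and reports the fraction of fragments that were shared at all. It is an illustration only and not the implementation used by any of the webservers listed above, which differ in how they fragment genomes, align sequences and filter matches. The `FragmentHit` type, its fields and the function name are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class FragmentHit:
    """Result for one query fragment (hypothetical structure for illustration)."""
    identity: float  # percent nucleotide identity of the best match (0-100)
    aligned: bool    # True if the fragment found an acceptable match


def ani_and_aligned_fraction(hits: Sequence[FragmentHit]) -> Tuple[float, float]:
    """Return (ANI %, aligned fraction %) from per-fragment results.

    ANI is taken as the plain mean identity over shared fragments only;
    the aligned fraction is the share of fragments that were matched.
    """
    if not hits:
        raise ValueError("no fragments supplied")
    shared = [h.identity for h in hits if h.aligned]
    if not shared:
        raise ValueError("no shared fragments; ANI is undefined")
    ani = sum(shared) / len(shared)
    aligned_fraction = 100.0 * len(shared) / len(hits)
    return ani, aligned_fraction
```

Because unmatched fragments are excluded from the mean, ANI and the aligned fraction carry different information: two genomes can share only part of their sequence and still show a high ANI over that shared part.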
Another major conclusion from genomic studies of microbial diversity is that the 95% ANI level, which corresponds to the recommended 70% DDH standard for species demarcation, commonly (Goris et al., 2007) but not always (Konstantinidis and DeLong, 2008) corresponds to species-like discontinuities among all available prokaryotic genomes (~100,000 genomes at the time of writing) (Jain et al., 2018) and to the assessment of natural populations based on metagenomics (Caro-Quintero and Konstantinidis, 2012) (Fig. 13.1). That is, genomes of the same species tend to show > 95% ANI among themselves (and sometimes > 98% for more clonal or recently emerged species) and < 95% (actually, < 85% in most of the cases) ANI with representatives of other species (i.e. the genomes form a genetic or sequence discontinuity).
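
The 95% ANI discontinuity described above can be turned into species-like groups by linking every pair of genomes whose ANI exceeds the threshold. The sketch below does this with a small union-find structure. Single-linkage grouping is an assumption made for this illustration (the text does not prescribe a clustering method), and the genome labels and ANI values in the usage note are invented toy data.

```python
from typing import Dict, List, Sequence, Tuple


def ani_species_clusters(
    genomes: Sequence[str],
    pair_ani: Dict[Tuple[str, str], float],
    threshold: float = 95.0,
) -> List[List[str]]:
    """Group genomes into clusters linked by ANI > threshold (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g: str) -> str:
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), ani in pair_ani.items():
        if ani > threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())
```

With toy values such as `{("g1", "g2"): 98.7, ("g1", "g3"): 84.1, ("g3", "g4"): 96.2}` for genomes `g1` to `g4`, the function returns two groups, `g1` with `g2` and `g3` with `g4`. Note that single linkage can chain genomes together through intermediates, so a cluster may contain pairs whose direct ANI is below the threshold.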

Further studies are required to elucidate the ecological and/or genetic mechanisms underlying this ‘sequence gap’, and the factors responsible for the exceptions to the 95% rule (Caro-Quintero and Konstantinidis, 2012; Bendall et al., 2016) and thus advance the species concept (i.e. what a species is) (Rosselló-Móra and Amann, 2001; Cohan, 2019). Nonetheless, the gap represents a significant finding for circumscribing species (the species definition) because it indicates that species may exist for prokaryotes, and hence can be described and catalogued (for a contrasting view, see Lawrence, 2002; Doolittle, 2019). It is important to mention that approaches based on 16S rRNA or a set of universally conserved genes (Mende et al., 2013), owing to the high sequence conservation of these genes, frequently fail to reveal clear genetic discontinuities among closely related taxa that are discrete based on ANI. Notably, such sequence discontinuities were not observed for any other category/rank of taxonomy but the species category (Konstantinidis and Tiedje, 2005b; Rodriguez-R and Konstantinidis, 2014), making it a challenge to unequivocally define these ranks based on genomic or other relatedness-based criteria. Importantly, the 95% ANI standard is largely consistent with how the 100,000 genomes in the NCBI database have been classified into distinct (named) species by taxonomists thus far since, in ~93% of the cases, genomes of the same species also show > 95% ANI among themselves. Only about 7% of the genomes are classified as different species, while their ANI values are > 95% (Jain et al., 2018). 
Almost all of the latter cases are because of a few species of medical importance such as Escherichia coli and Shigella spp., a well-documented case of inconsistency between taxonomy and genomic relatedness (Lan and Reeves, 2001), or a group of closely related named species such as Mycobacterium tuberculosis (reference genome in the ANI comparison), Mycobacterium canettii (ANI 97%–99% against reference), Mycobacterium bovis (ANI 99.6%), Mycobacterium microti (ANI 99.8%–99.9%) and Mycobacterium africanum (ANI 99.9%), which are part of the M. tuberculosis complex (Jain et al., 2018). Hence, the 95% ANI criterion is largely consistent with how species have been defined thus far, providing further support for the value of the existing taxonomic system, and can guide future species descriptions. This is similar to how DDH – upon

which the ANI concept is essentially founded (Goris et al., 2007) – has functioned for the past four decades. Importantly, by combining ANI (sub-genus resolution) with the phylogeny or sequence identity of universally conserved genes such as the 16S rRNA gene (above-genus resolution), robust resolution within the whole bacterial and archaeal domains can be achieved. Our MiGA webserver employs this idea, together with the genome-aggregate average amino acid identity of all shared genes between two genomes (AAI). AAI provides good resolution of genomes related at the genus level and above, and thus, when combined with ANI, coverage of all prokaryotes (Rodriguez-R et al., 2018a), as explained further on.

[Figure 13.1. Panel (A): histogram of genome pairs against Average Nucleotide Identity (ANI; %), x-axis 80–100, coloured by lowest shared rank: Order (n: 52), Family (n: 97,137), Genus (n: 82,179), Species (n: 135,622). Panel (B): histogram of genome pairs against Aligned Fraction (AF; %), x-axis 40–100.]

Fig. 13.1. Distribution of average nucleotide identity (ANI) and aligned fraction (AF) between complete prokaryotic genomes. The collection of complete and chromosome status prokaryotic genomes in the NCBI Genomes database was de-replicated at 99.5% ANI for pairs classified in the same species, and constitutes one of the reference databases in MiGA (NCBI_Prok). (A) Distribution of ANI values between all pairs of genomes in NCBI_Prok with AAI ≥ 85%, coloured by the lowest shared rank between the two genomes (see figure key). Note that values in the region 83%–95% ANI are much rarer than values in its two flanking regions, and that the right region (ANI > 95%) is truncated by the applied de-replication threshold used (ANI = 99.5%); the left region (ANI < 85%) is truncated by the lowest AAI threshold used. All ANI values were estimated using FastANI (Jain et al., 2018). (B) Distribution of aligned fractions (AFs) for the same pairs of genomes. Note that AF displays a bimodal distribution, with a valley roughly corresponding to genus-level delimitation around 65%. However, AF is highly susceptible to varying genome completeness, unlike ANI values, which can be accurately estimated from genomic fragments covering an aligned fraction as small as 4%.

Genome-based Classification: Limitations

In the previous section, we outlined the major advantages of genome-based approaches for taxon identification and classification. However, there are also several limitations to how these genomic approaches are currently being used, which we attempt to summarize below. Note, however, that some of these limitations are not specific to genome-based methods but apply more broadly. Most notably, the goal of taxonomy is to group in the same species the organisms that show an identical, or nearly identical, function (or ‘phenotype’, more generally speaking) in their natural environment (i.e. in situ), such that these organisms form a cohesive – and monophyletic – unit that is distinct from other related units. Taxonomic studies have focused predominantly on the DNA level to date, but DNA reflects only the potential functions the organisms can perform. RNA and proteome-level measurements are getting closer to the actual function performed under the sampling conditions, and there is typically a good correlation between functional activity inferred based on DNA, RNA and proteome data (Helbling et al., 2012; Orellana et al., 2019). However,

current methodologies to measure functional activity in situ are still technically limited, even with the latest RNA/proteome technologies available (reviewed in VerBerkmoes et al., 2009). Further, activity measured from pure cultures in the laboratory is generally not reflective of in situ activity, unless the laboratory growth conditions closely simulate the natural environment, which is not often the case (Konstantinidis and Rosselló-Móra, 2015). Hence, there is still a need for new method development for assessing in situ activity, and it is likely that an important level of functional (and thus taxonomic) differentiation may be found within the currently named species. Until such methods become widely available, the recent proposal to infer functions and diagnostic traits from bioinformatic analysis of genomic sequence data against the genomes of close relatives (and, when possible or desirable, to validate these functions with RNA, isotope or other experimental data) remains probably the most pragmatic approach available (Konstantinidis and Rosselló-Móra, 2015; Konstantinidis et al., 2017). Another important limitation of genomic methods is standardization. Take as an example the seven ANI webservers mentioned above. While all webservers provide similar ANI values for the same pair of genomes compared, these values are not identical to each other and typically deviate by 0.1%, if not more. In general, algorithms that gain in speed typically lose in sensitivity. This means that the ANI values they calculate for moderately or distantly related genomes tend to represent overestimates, compared to the slower but more sensitive algorithms, in aligning moderately divergent sequences (Varghese et al., 2015; Jain et al., 2018). It would be important to standardize these methods against each other and provide direct interpolated values between different algorithms. 
Perhaps the sensitive Blastn method that was optimized to closely match the experimental DDH values for the same genomes compared (Goris et al., 2007) could represent the reference for standardization. Finally, no method is perfect and the genome-based methods would be no exception to this rule. Most notably, metagenome assembly and binning – as well as single-amplified genomes – now provide the opportunity to assemble the genome of uncultivated taxa on an unprecedented scale, and perform genealogical and taxonomic studies of such genome sequences. It is now feasible

(probably for the first time) to describe the total prokaryotic diversity in nature based on these genome-binning methods (Thompson et al., 2017). However, the underlying methodologies are prone to errors; in particular, they could provide chimeric genome sequences, which represent (combined) pieces of the genome sequence of distinct (even unrelated) species, and/or incomplete genome sequences. Bioinformatics pipelines to detect such chimeric sequences (e.g. Parks et al., 2015; Rodriguez-R et al., 2018a), and quality standards to distinguish high-quality, sufficiently complete (and thus reliable) genome sequences from low-quality ones, have been established (Bowers et al., 2017; Konstantinidis et al., 2017). The accuracy of these pipelines, however, could be low in some (predictable) cases of high intra-population heterogeneity or low sequencing coverage of the natural populations (Becraft et al., 2017; Sczyrba et al., 2017; Ramos-Barbero et al., 2019). For several researchers this lower quality (in general) of binned genomes compared to genomes of isolates is unacceptable (Bisgaard et al., 2019; Overmann et al., 2019). We argue, nevertheless, that the advantages provided by the assembly and binning methodologies to access organisms that are otherwise almost impossible to assess at a similar level outweigh the disadvantages. In addition, the low-quality MAGs and SAGs are often detectable by the approaches mentioned above, and the use of such low-quality genomes for taxonomic descriptions should not be encouraged. Further, imperfect methods have served prokaryotic taxonomy well in the past (e.g. DDH; see also further discussion below). Accordingly, we foresee that genome-binning methods will, in the near future, have a similar impact to DDH in describing the uncultivated prokaryotic diversity that exists in nature (see also below). The upcoming availability and wide accessibility of long-read sequencing (e.g.
Andersen et al., 2019) will help to identify, fix and replace any chimeric genome sequences present in the reference databases, in a process that could be analogous to replacing the (usually lost) type strain of a validly named species by a neotype strain for isolated organisms.

Genome Classification Resources Available

A few genome-based approaches such as ProGenomes (Mende et al., 2013), the Genome Taxonomy
Database or GTDB (Parks et al., 2018) and our own MiGA (Rodriguez-R et al., 2018a) have recently been described. They attempt to catalogue and taxonomically organize the microbial diversity revealed by genome and metagenome sequencing. Often, but not always, these approaches give similar results for the same genomes. Deviations in their results are typically attributable to well-understood differences of the underlying algorithms, or the level of comprehensiveness of the reference genome databases against which the query genomes are classified. In our view, MiGA provides several key advantages for taxonomic purposes, compared to these alternative approaches. First, the alternative approaches are based on a set of universally conserved genes (Mende et al., 2013), such as the 16S rRNA and ribosomal protein-encoding genes, which are often not applicable to incomplete genomes recovered from metagenomic data sets (the genes are not assembled and/or binned into MAGs). Perhaps more importantly, these genes typically show higher sequence conservation than the genome average. Consequently, analysis of universal genes does not provide sufficient resolution at the species level (Konstantinidis et al., 2006), and has frequently resulted in lack of clear genetic discontinuities among closely related taxa (Mende et al., 2013). ANI used by MiGA effectively circumvents these limitations, and our recent evaluation shows that robust ANI values can be obtained in a high-throughput manner even from as little as a ~200 Kbp-long subset of the genome that is shared by the pair of genomes in comparison (Jain et al., 2018). Notably, we managed to compute reliable ANI values between 90,000 genomes (i.e. 4 billion genome pair comparisons) using the FastANI algorithm recently developed by our team in a couple days, using modest computational resources (a personal laptop) (Jain et al., 2018). 
Performing this with alignment-based approaches, such as those implemented by ProGenomes (mOTUs) and GTDB, represents a daunting task that cannot easily be completed online. Thus, the ANI approach, and AAI for comparisons among more distantly related genomes (as implemented in the MiGA webserver) is more easily scalable to the geometrically increasing number of genomes. It should be able to scale to the 1 million genomes mark much more easily than the alternative methodologies.
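
The scaling argument above rests on simple arithmetic: comparing every genome with every other genome requires n(n - 1)/2 pairwise comparisons. The short sketch below reproduces the roughly 4 billion comparisons quoted for 90,000 genomes and shows how the count grows at the 1 million genome mark. It assumes unordered all-versus-all pairs, which is an interpretation that matches the quoted figure; the function name is introduced here for illustration.

```python
def genome_pair_count(n_genomes: int) -> int:
    """Number of unordered genome pairs in an all-versus-all comparison."""
    if n_genomes < 0:
        raise ValueError("number of genomes cannot be negative")
    return n_genomes * (n_genomes - 1) // 2


pairs_90k = genome_pair_count(90_000)    # 4,049,955,000 (about 4 billion)
pairs_1m = genome_pair_count(1_000_000)  # 499,999,500,000 (about 500 billion)
growth = pairs_1m / pairs_90k            # about 123-fold more comparisons
```

An 11-fold increase in genomes therefore translates into a more than 100-fold increase in pairwise comparisons, which is why the per-comparison cost of the underlying method matters so much.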

Importantly, the statistical and algorithmic framework of MiGA is designed to be descriptive; that is, to capture a statistical signal from genome comparisons of historic taxonomic classifications. In contrast, methods seeking uniform phylogenetic descriptions based solely on genomic information are inherently prescriptive: they aim to generate stand-alone classification schemes. Consequently, another important limitation of alignment-based methods like GTDB is the instability of the resulting phylogenetic tree that guides the taxonomic naming of the genomes based on the branching patterns of the tree. With the addition of new genomes, especially those that are deeply branched, the branching patterns of the tree would change. This would mean that the naming of the taxa grouped at the unstable branches would have to change accordingly. Benchmarking studies have shown that the unstable branches are relatively infrequent compared to the total branches available; for example, typically less than 5% of the branches change with the addition of a new genome (Parks et al., 2018). However, 5% (or less) of 20,000 or more branches (the approximate number of available unique species or 95% ANI clusters currently) represents too high a number of unstable taxa, and so this framework does not provide adequate taxonomic stability. The AAI/ANI clustering approach employed by MiGA does not suffer this limitation because it uses the taxonomic names already assigned to the genome without a need to change them to match branching patterns. Finally, an important concept to consider is that GTDB attempts to normalize the intra-taxon evolutionary (or phylogenetic) relatedness, and make it equal for taxa grouped at the same taxonomic rank. The aim is to move towards a system that is more uniform, standardized and predictive of the relatedness of genomes grouped at the same taxonomic rank. That is, all species should have similar intraspecies diversity, and species that are more clonal (i.e. 
less diverse) than other species should be adjusted (i.e. be merged and their names changed) in order to become more uniform. While there may be some value to this idea for future taxon descriptions of uncharacterized taxa and higher ranks (family and above), applying this idea at the species level will create unnecessary confusion, at least temporarily, and may not (biologically) be easily defendable. Take, for instance, the Bacillus anthracis species, one of the most clonal species ever

described (ANI among B. anthracis genomes > 99.5%) (Van Ert et al., 2007; Pena-Gonzalez et al., 2018). If the GTDB method is applied to it, Bacillus anthracis would have to be merged with (and renamed) Bacillus cereus, a sister species whose members often share > 95% ANI with B. anthracis genomes. However, there are important diagnostic properties, such as the presence of the anthrax toxin genes in the genome of the former species but not of the latter. These result in major phenotypic consequences (highly virulent versus avirulent organisms in this case) that taxonomically justify the separation of B. anthracis into a distinct species. Clinical diagnostic tests are also attuned to these diagnostic properties, and it would be highly challenging for taxonomists (as well as practitioner microbiologists) to change this naming system. Moreover, some species have evolved (appeared) more recently (i.e. a shorter evolutionary time has elapsed since the last population sweep or bottleneck that led to their speciation, compared to other species), or have long dormant stages that do not allow for accumulation of high intraspecies sequence diversity, as in the above B. anthracis case. In contrast, other species such as E. coli are evolving faster and show high promiscuity in acquiring or losing foreign DNA (Lawrence and Ochman, 1998). Equating such distinct ecological and genetic characteristics under similar (calibrated) intra-taxon diversity seems unjustifiable at present, and probably would have more disadvantages than advantages for practising microbiologists. The limitations mentioned above for the species level also apply to the higher ranks (e.g. genus and family levels) of the GTDB taxonomy. For example, sweeping taxonomic changes have recently been proposed for members of the orders Mycoplasmatales and Entomoplasmatales (Gupta et al., 2018, 2019), matching the GTDB reclassification.
However, among other problems, these changes would generate significant nomenclatural destabilization and pose significant risks to public health management, and so a subcommittee of the International Committee on Systematics of Prokaryotes that was considering the taxonomy of Mollicutes has recommended that they should be rejected (Balish et al., 2019). The AAI approach has been criticized for not offering robust resolution at the phylum level and above. This is because an increasing

number of multiple substitutions at the same site have most likely occurred between the ancestors of more divergent genomes that are not considered in the ANI/AAI measurement (but are considered in molecular evolution analysis of sequence alignments). Indeed, phylogenetic analysis of aligned universal gene sequences is probably advantageous at this level (Ciccarelli et al., 2006; Cole et al., 2010; Parks et al., 2018), but not at the sub-genus level, as explained above. However, it is important to realize that protein-coding genes are sometimes challenging to identify and align between organisms of different phyla (e.g. the protein sequences are too divergent); hence, the limitation of AAI also applies to some degree to protein-coding universal gene alignment methods. Indeed, AAI is increasingly based on universal protein-coding genes between genomes that are more divergent. If such genes cannot be reliably identified and aligned (as is often the case between – for instance – archaeal and bacterial genomes) no AAI or alignment-based phylogeny is possible (or the derived tree is not robust). Accordingly, AAI-based classification of deep-branching genomes is frequently consistent with those derived based on phylogenetic analysis of universal protein-coding genes (Rodriguez-R et al., 2018a). It appears that only a relatively few essential genes (or gene domains) fulfil the criteria for universal distribution and sufficient sequence conservation for reliable alignment and so can serve as phylogenetic markers for all prokaryotes. These genes encode the 23S rRNA, transcription elongation and initiation factors, subunits of proton translocation ATPase, RNA polymerase, DNA gyrase, RecA, heat shock proteins and amino acyl tRNA synthetases (Ludwig and Schleifer, 2005). Comparative phylogenetic analysis of these markers is in good agreement with that of the 16S rRNA gene, at least with respect to the major taxa. 
However, local tree topologies often differ, depending on the gene analysed. Therefore, the 16S or 23S rRNA gene phylogeny may be as robust as (if not more than) those of protein-coding genes at the phylum and domain levels. rRNA genes are easier to identify and align; for example, they share at least 60% nucleotide identity between archaeal and bacterial genomes versus < 30% amino acid identity for the protein-coding gene markers (Konstantinidis and Tiedje, 2005a). It is also important to note

that < 30% amino acid identity is in the twilight zone of homology searches; that is, the alignment at this level is not always reliable and may be the result of spuriously matching amino acids and not of true homology (Rost, 1999). Hence, a hybrid approach of rRNA gene coupled with whole-genome derived metrics such as AAI/ANI may be advantageous for covering the whole prokaryotic tree, especially in terms of computation and curation time needed. How best to combine genome-derived metrics with rRNA gene phylogeny should be a subject of future investigations.
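
As one possible reading of the hybrid idea, the sketch below picks which relatedness measure to rely on for a given pair of genomes: ANI for close relatives, AAI for more distant ones, and rRNA gene phylogeny when even AAI falls into the twilight zone. This is a hypothetical decision rule written for illustration, not the procedure used by MiGA or any other resource. The 30% AAI floor echoes the twilight-zone remark above, whereas the 80% ANI floor is a placeholder assumption standing in for the point at which the ANI signal saturates.

```python
from typing import Optional


def pick_metric(
    ani: Optional[float],
    aai: Optional[float],
    ani_floor: float = 80.0,  # placeholder assumption, not a published cutoff
    aai_floor: float = 30.0,  # twilight zone of homology mentioned in the text
) -> str:
    """Choose the relatedness measure to trust for one genome pair.

    Pass None for a metric that could not be computed for the pair.
    """
    if ani is not None and ani >= ani_floor:
        return "ANI"  # closely related genomes: sub-genus resolution
    if aai is not None and aai >= aai_floor:
        return "AAI"  # more distant genomes: genus level and above
    return "rRNA gene phylogeny"  # deep branches: fall back to rRNA genes
```

Both floors are exposed as parameters precisely because how best to combine these measures remains an open question.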

Unculturable Taxa: Genome-based Classification is the Only Way Forward

Unlike taxa represented by isolates that can be studied by several different (culture-based) techniques, uncultivated taxa can only be assessed using culture-independent genomic techniques, largely MAGs and single-cell amplified genomes (SAGs). This creates an important difference; that is, the type material for the former taxa is a living isolate (the type strain), deposited in at least two public culture collections, but this is not possible to apply to the latter taxa. For the latter taxa, as proposed recently (Whitman, 2016), the DNA/genome sequence could effectively serve as the type material voucher, but this proposal has not been accepted yet. Accordingly, prokaryotic taxonomy is biased against uncultivated taxa in that these taxa can only be provisionally named using the Candidatus approach since they cannot meet the living culture requirement. However, Candidatus names are given no priority compared to names of isolates and, as a consequence, Candidatus names can be overwritten when a representative organism is brought to culture and given a different name (Konstantinidis and Rosselló-Móra, 2015). This has discouraged researchers from naming uncultivated taxa; instead, alphanumeric identifiers such as SAR-11 and SUP05 (Brown et al., 2012; Glaubitz et al., 2013) have commonly been used for uncultivated taxa. However, these identifiers are challenging to remember, and do not typically reflect any important ecological or phenotypic/functional information about the organism in question, unlike officially recognized names that

use the Linnaean naming system (see Chapter 3). Further, because the alphanumeric identifiers are not regulated, multiple identifiers (synonyms) often exist for the same group of organisms, causing unnecessary confusion among researchers and in communication with the public. It is important to try to reconcile the taxonomy of cultivated organisms with that of the uncultivated taxa towards a single, standardized system. This will encompass all prokaryotes and encourage the scientific community to officially describe at least the important uncultivated taxa. We (Konstantinidis et al., 2017) and others (Hedlund et al., 2015) believe that this is a feasible task and, in fact, it only requires two straightforward changes. That is, to give priority to Candidatus names, as is given to the names of isolated organisms, and to qualify the genome sequence as the type material (voucher) for taxonomic descriptions. We argue that this approach is unlikely to result in decreased focus on isolation efforts, since isolating an organism in the laboratory still has important advantages for its study and use in downstream applications. Hence, the taxonomy of the cultivated taxa would not be threatened by these changes, especially because the recommendation to deposit an isolate to two culture collections should be maintained for cultured organisms. Nor do we anticipate an overwhelming increase in the number of uncultivated taxa that will be taxonomically described, especially if the taxonomic description of uncultivated taxa requires multiple genome sequences (e.g. single-genome descriptions should not be encouraged) and information is supplied on the ecological breadth and metabolic functions carried out by the taxon in question, as suggested recently (Konstantinidis and Rosselló-Móra, 2015; Konstantinidis et al., 2017).
The description of such taxa would require a substantial effort by the authors, and thus only organisms of interest would be taxonomically classified among the great majority of uncultivated taxa that exist in nature. Finally, and perhaps more importantly, we believe that metagenomic methods have advanced adequately to allow one to gather enough ecological and phenotypic data for robust taxonomic descriptions, similar in quality and thoroughness to what can be achieved based on isolate description. For instance, MAGs and SAGs can reveal the genealogy of the organisms to be described taxonomically. Time- and/or spatial-

series metagenomics can reveal their relative in situ abundance and dynamics upon changing environmental conditions, thus providing important information towards defining the ecological niche of the organism. Bioinformatics predictions of the functional gene content of the genome following the community standards recently proposed (Field et al., 2008) can serve as a minimum description of the functional potential of the organism. When desirable, metatranscriptomics or isotope-based approaches (e.g. nanoscale secondary ion mass spectrometry (NanoSIMS)) can confirm the bioinformatics predictions and/or reveal the in situ functions carried out by the organisms to a level that is as good as (if not better than) the functions that can be inferred from isolates in the laboratory, especially when the laboratory growth conditions deviate from in situ conditions. We believe that obtaining the information described above is achievable for most organisms, and hence represents a broadly applicable yet robust foundation for a classification system suitable for all microorganisms, not only the uncultivated taxa. Several scientists have argued that the MAG and SAG information is not of similar quality to that derived from isolate-based experiments in the laboratory, or have presented examples of lower MAG/SAG quality than predicted by the currently available bioinformatics pipelines for quality estimation. For these reasons they feel that MAGs and SAGs do not provide a good representation of the organisms or population under investigation (Bisgaard et al., 2019; Overmann et al., 2019). While this is at least partly true, for several reasons it is not critical enough to prevent progress towards cataloguing the taxonomic diversity of uncultivated organisms. (i) Prokaryotic taxonomy has always been based on imperfect methods, and MAGs/SAGs are no exception to this (discussed above in the context of DDH).
(ii) Quality can be assessed beyond reasonable doubt, such as by visual examination of read-recruitment plots in combination with the quality-checking pipelines (Rodriguez-R and Konstantinidis, 2016). We believe that only genomes of high enough quality, based on these tests, should be taxonomically described (Konstantinidis et al., 2017; the latter publication also discusses some possible exceptions to this). (iii) The standards to use have been outlined previously (Bowers et al., 2017; Konstantinidis et al., 2017), and are of similar stringency to those used for isolate genomes. (iv) Long-read sequencing for routine taxonomic descriptions, even for environmental samples, is developing rapidly (e.g. Andersen et al., 2019), and there is a strong belief that it will circumvent several of the issues around low quality reported for MAGs and SAGs in the literature. For example, it will provide a complete genome of similar quality to the isolate genomes, and/or help to identify and fix genome sequences that may be chimeric.

It is argued that when DNA sequence type material is replaced by new versions owing to new sequencing technologies and/or tools for genome assembly, the species descriptions would consequently have to be revised, resulting in an unstable species (Bisgaard et al., 2019). However, this is unlikely to be true for most (if not all) taxa, because such new versions will mostly affect only a small number of genes or nucleotide substitution positions in the genome. This was shown by analysis of mock data sets of known composition (Sczyrba et al., 2017) and by the sequencing of the isolated Candidatus Macondimonas diazotrophica, which was almost identical to its corresponding (and previously assembled) MAG (e.g. ANI > 99.9%) (Karthikeyan et al., 2019). It is also important to realize that for two genomes to accumulate ~1% difference in their ANI value, more than 20,000 years of evolution would be required (Lawrence and Ochman, 1998), which represents too long a time to affect current taxonomy. Hence the genealogy of the genome, and thus its nomenclature and classification, will remain unaffected in the great majority of cases where new versions of the genome become available. In the few cases in which the new genome version includes major changes in gene content, the old version could be replaced by the new version, in a process analogous to the replacement of the (usually lost) type strain of a classified species by a neotype strain, as practised in the current taxonomy.
This will probably require less effort, since a genome sequence represents digitized information that can be managed and tracked more easily than culture collections. Overall, the advantages of adopting the genome sequence as the type material for uncultivated taxa, and for cultivated taxa that are difficult to grow and maintain in culture collections, far outweigh the disadvantages. We feel it is high time for our scientific community to form an expert committee and to revise current practices and standards towards a more thorough taxonomic framework of all prokaryotes. If the official taxonomy is not willing to adopt the DNA sequence as the type material for uncultivated taxa and give priority to Candidatus names, a parallel system to the official taxonomy of isolates could be developed as an alternative plan for uncultivated taxa. Importantly, the need for a system to catalogue uncultivated taxa is very urgent, because the genomes and ecological/functional data that are becoming available are already overwhelming, and alphanumeric identifiers and synonyms are creating confusion of Babylonian dimensions, as discussed above (also see Chapter 16).
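The quality standards cited in point (iii) above (Bowers et al., 2017) can be turned into a simple screening step before any MAG or SAG is considered for description. The following is a minimal sketch, not the authors' procedure: the class and function names are hypothetical, and the numeric cut-offs are the MIMAG tiers as we recall them from Bowers et al. (2017), so they should be checked against that publication before use.

```python
from dataclasses import dataclass


@dataclass
class GenomeQuality:
    """Quality estimates for one MAG or SAG (hypothetical container)."""
    completeness: float          # percent, e.g. estimated by a marker-gene pipeline
    contamination: float         # percent
    has_rrna_set: bool = False   # 5S, 16S and 23S rRNA genes all recovered
    trna_count: int = 0          # number of distinct tRNAs recovered


def mimag_tier(q: GenomeQuality) -> str:
    """Return a MIMAG-style tier; cut-offs are assumed from Bowers et al. (2017)."""
    if (q.completeness > 90 and q.contamination < 5
            and q.has_rrna_set and q.trna_count >= 18):
        return "high"
    if q.completeness >= 50 and q.contamination < 10:
        return "medium"
    if q.contamination < 10:
        return "low"
    return "unclassified"


if __name__ == "__main__":
    candidate = GenomeQuality(95.0, 2.0, has_rrna_set=True, trna_count=20)
    print(mimag_tier(candidate))
```

A numeric tier of this kind complements, but does not replace, the visual inspection of read-recruitment plots recommended in point (ii).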

Tips for Genome-based Classification of an Unknown Query Genome

These suggested steps aim to guide the reader on how to create a taxonomic description of a query genome sequence, partial or complete, using the resources mentioned above. The majority of these suggestions represent modified versions of proposals in Konstantinidis and Rosselló-Móra (2015) and Konstantinidis et al. (2017).

• If the 16S rRNA gene is present in the query genome sequence, rely on its phylogeny for classification and verify for consistency with a whole-genome approach such as ANI/AAI (e.g. using MiGA) or marker-based phylogeny (e.g. GTDB-Tk).
• If the best match of the query 16S rRNA gene against the previously classified genomes is > 98.5% nucleotide identity, switch to using ANI and the 95% ANI threshold instead. Note that organisms that do not have a sequenced genome representative may provide a better (and more reliable) match based on 16S rRNA gene data than the best-matching organism(s) based on ANI/AAI.
• To determine the degree of novelty of the query (e.g. whether it represents a novel species, genus, family etc.), rely on the thresholds suggested previously for the 16S rRNA gene (Yarza et al., 2014) and ANI/AAI (Konstantinidis et al., 2017). Notably, MiGA offers statistical support for taxonomic novelty in its webserver deployment (Rodriguez-R et al., 2018a) for AAI values that deviate from the average AAI values (Konstantinidis et al., 2017).
• Perform bioinformatics analysis of the functional gene content of the whole-genome sequence to describe its metabolic potential, and to identify diagnostic pathways and traits compared to the closest classified relatives. When possible and/or desirable, verify these bioinformatics predictions with RNA, isotope or other experimental data (Konstantinidis and Rosselló-Móra, 2015).
• Include spatial and/or temporal abundance data to define the ecological niche of the query organism.
• If possible, include additional MAGs (ideally from distinct samples in terms of space and/or time) or SAGs to avoid single-genome descriptions and low-quality genome sequences, and a description of the cell morphology derived by Fluorescence In Situ Hybridization (FISH) or another technique. Note, however, that a MAG represents the composite genome of a population and not a single cell; hence, it can be trusted as a population genome when coverage by sequencing is high enough for robust assembly. A coverage of 5-7X or more is recommended (Sczyrba et al., 2017; Meziti et al., 2018).
• Name the organism following the International Code of Nomenclature of Prokaryotes (Parker et al., 2015).
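The species-level part of these tips can be sketched as a small decision function. This is an illustrative sketch only: the function and constant names are hypothetical, the 98.5% 16S rRNA gene identity, 95% ANI and 5X coverage values are taken from the text above, and rank assignment above the species level (Yarza et al., 2014; Konstantinidis et al., 2017) is deliberately left out.

```python
from typing import Optional

SPECIES_16S_IDENTITY = 98.5  # % 16S rRNA gene identity above which ANI is used instead
SPECIES_ANI = 95.0           # % ANI threshold for the same species
MIN_COVERAGE = 5.0           # lower end of the recommended 5-7X coverage


def species_call(identity_16s: Optional[float], ani: Optional[float]) -> str:
    """Species-level decision for a query genome against its best match.

    identity_16s: best 16S rRNA gene identity (%), or None if the gene is absent.
    ani: ANI (%) against the best-matching classified genome, or None if not computed.
    """
    # A 16S match at or below the threshold already indicates a novel species
    # (or higher rank); whole-genome methods should still be used to verify it.
    if identity_16s is not None and identity_16s <= SPECIES_16S_IDENTITY:
        return "novel species (or higher rank)"
    # 16S gene absent, or match above 98.5%: the decision rests on ANI.
    if ani is None:
        return "undetermined: ANI needed"
    return "same species" if ani >= SPECIES_ANI else "novel species"


def mean_coverage(mapped_bases: int, genome_length: int) -> float:
    """Average sequencing depth: total mapped bases divided by genome length."""
    return mapped_bases / genome_length


if __name__ == "__main__":
    print(species_call(99.2, 96.1))
    print(mean_coverage(30_000_000, 3_000_000) >= MIN_COVERAGE)
```

Keeping the thresholds as named constants makes it easy to substitute other published values; the function returns text labels rather than taxon names because naming still has to follow the Code.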

There are several excellent examples of taxonomic descriptions that meet these standards and recommendations. The reader is referred to these publications for further details (e.g. Konstantinidis et al., 2019).

Acknowledgements

Our work was supported by US NSF (awards 1759831 and 1831582 to KTK), the Spanish Ministry of Science and Innovation projects CLG2015_66686-C3-1-P, PGC2018-096956B-C41, PRX18/00048 and RTC-2017-6405-1, and a European Regional Development Fund (RRM). Jim Tiedje, Jim Cole and Barny Whitman are particularly acknowledged for their helpful discussions related to the manuscript.


References

Andersen, M.H., McIlroy, S.J., Nierychlo, M., Nielsen, P.H. and Albertsen, M. (2019) Genomic insights into Candidatus Amarolinea aalborgensis gen. nov., sp. nov., associated with settleability problems in wastewater treatment plants. Systematic and Applied Microbiology 42 (1), 77–84. https://doi.org/10.1016/j.syapm.2018.08.001
Auch, A.F., von Jan, M., Klenk, H.P. and Goker, M. (2010) Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2 (1), 117–134. https://doi.org/10.4056/sigs.531120
Balish, M., Bertaccini, A., Blanchard, A., Brown, D., Browning, G., Chalker, V., Frey, J., Gasparich, G., Hoelzle, L., Knight, T., Knox, C., Kuo, C.H., Manso-Silvan, L., May, M., Pollack, J.D., Ramirez, A.S., Spergser, J., Taylor-Robinson, D., Volokhov, D. and Zhao, Y. (2019) Recommended rejection of the names Malacoplasma gen. nov., Mesomycoplasma gen. nov., Metamycoplasma gen. nov., Metamycoplasmataceae fam. nov., Mycoplasmoidaceae fam. nov., Mycoplasmoidales ord. nov., Mycoplasmoides gen. nov., Mycoplasmopsis gen. nov. [Gupta, Sawnani, Adeolu, Alnajar and Oren 2018] and all proposed species comb. nov. placed therein. International Journal of Systematic and Evolutionary Microbiology 69 (11), 3650–3653. https://doi.org/10.1099/ijsem.0.003632
Becraft, E.D., Woyke, T., Jarett, J., Ivanova, N., Godoy-Vitorino, F., Poulton, N., Brown, J.M., Brown, J., Lau, M.C.Y., Onstott, T., Eisen, J.A., Moser, D. and Stepanauskas, R. (2017) Rokubacteria: Genomic Giants among the Uncultured Bacterial Phyla. Frontiers in Microbiology 8, 2264. https://doi.org/10.3389/fmicb.2017.02264
Bendall, M.L., Stevens, S.L., Chan, L.K., Malfatti, S., Schwientek, P., Tremblay, J., Schackwitz, W., Martin, J., Pati, A. et al. (2016) Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations. ISME Journal 10 (7), 1589–1601. https://doi.org/10.1038/ismej.2015.241
Bisgaard, M., Christensen, H., Clermont, D., Dijkshoorn, L., Janda, J.M., Moore, E.R.B., Nemec, A., Norskov-Lauritsen, N., Overmann, J. and Reubsaet, F.A.G. (2019) The use of genomic DNA sequences as type material for valid publication of bacterial species names will have severe implications for clinical microbiology and related disciplines. Diagnostic Microbiology and Infectious Disease 95 (1), 102–103. https://doi.org/10.1016/j.diagmicrobio.2019.03.007
Bowers, R.M., Kyrpides, N.C., Stepanauskas, R., Harmon-Smith, M., Doud, D., Reddy, T.B.K., Schulz, F., Jarett, J., Rivers, A.R. et al. (2017) Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature Biotechnology 35 (8), 725–731. https://doi.org/10.1038/nbt.3893
Brown, M.V., Lauro, F.M., DeMaere, M.Z., Muir, L., Wilkins, D., Thomas, T., Riddle, M.J., Fuhrman, J.A., Andrews-Pfannkoch, C., Hoffman, J.M., McQuaid, J.B., Allen, A., Rintoul, S.R. and Cavicchioli, R. (2012) Global biogeography of SAR11 marine bacteria. Molecular Systems Biology 8, 595. https://doi.org/10.1038/msb.2012.28
Caro-Quintero, A. and Konstantinidis, K.T. (2012) Bacterial species may exist, metagenomics reveal. Environmental Microbiology 14 (2), 347–355. https://doi.org/10.1111/j.1462-2920.2011.02668.x
Ciccarelli, F.D., Doerks, T., von Mering, C., Creevey, C.J., Snel, B. and Bork, P. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311 (5765), 1283–1287. https://doi.org/10.1126/science.1123061
Ciufo, S., Kannan, S., Sharma, S., Badretdin, A., Clark, K., Turner, S., Brover, S., Schoch, C.L., Kimchi, A. and DiCuccio, M. (2018) Using average nucleotide identity to improve taxonomic assignments in prokaryotic genomes at the NCBI. International Journal of Systematic and Evolutionary Microbiology 68, 2386–2392. https://doi.org/10.1099/ijsem.0.002809
Cohan, F.M. (2019) Systematics: The Cohesive Nature of Bacterial Species Taxa. Current Biology 29 (5), R169–R172. https://doi.org/10.1016/j.cub.2019.01.033
Cole, J., Konstantinidis, K.T., Farris, R.J. and Tiedje, J.M. (2010) Microbial diversity and phylogeny: extending from rRNAs to genomes. In: Liu, W.T. and Jansson, J. (eds) Environmental Molecular Biology. Horizon Scientific Press, Norwich, UK, pp. 1–20.
Cole, J.R., Wang, Q., Chai, B. and Tiedje, J.M. (2011) The Ribosomal Database Project: sequences and software for high-throughput rRNA analysis. In: Bruijn, F.J. de (ed.) Handbook of Molecular Microbial Ecology I: Metagenomics and Complementary Approaches. Wiley & Sons, Inc, Hoboken, NJ, pp. 313–324. https://doi.org/10.1002/9781118010518.ch36
Doolittle, W.F. (2019) Speciation without Species: A Final Word. Philosophy, Theory, and Practice in Biology 11. https://doi.org/10.3998/ptpbio.16039257.0011.014


Field, D., Garrity, G., Gray, T., Morrison, N., Selengut, J., Sterk, P., Tatusova, T., Thomson, N., Allen, M.J., Angiuoli, S.V., Ashburner, M. et al. (2008) The minimum information about a genome sequence (MIGS) specification. Nature Biotechnology 26 (5), 541–547. https://doi.org/10.1038/nbt1360
Glaubitz, S., Kiesslich, K., Meeske, C., Labrenz, M. and Jurgens, K. (2013) SUP05 dominates the Gammaproteobacterial sulfur oxidizer assemblages in pelagic redoxclines of the central Baltic and Black Seas. Applied and Environmental Microbiology 79 (8), 2767–2776. https://doi.org/10.1128/AEM.03777-12
Gophna, U., Doolittle, W.F. and Charlebois, R.L. (2005) Weighted genome trees: refinements and applications. Journal of Bacteriology 187 (4), 1305–1316. https://doi.org/10.1128/JB.187.4.1305-1316.2005
Goris, J., Konstantinidis, K.T., Klappenbach, J.A., Coenye, T., Vandamme, P. and Tiedje, J.M. (2007) DNA-DNA hybridization values and their relationship to whole-genome sequence similarities. International Journal of Systematic and Evolutionary Microbiology 57 (Pt 1), 81–91. https://doi.org/10.1099/ijs.0.64483-0
Gupta, R.S., Sawnani, S., Adeolu, M., Alnajar, S. and Oren, A. (2018) Phylogenetic framework for the phylum Tenericutes based on genome sequence data: proposal for the creation of a new order Mycoplasmoidales ord. nov., containing two new families Mycoplasmoidaceae fam. nov. and Metamycoplasmataceae fam. nov. harbouring Eperythrozoon, Ureaplasma and five novel genera. Antonie van Leeuwenhoek 111 (9), 1583–1630. https://doi.org/10.1007/s10482-018-1047-3
Gupta, R.S., Son, J. and Oren, A. (2019) A phylogenomic and molecular markers based taxonomic framework for members of the order Entomoplasmatales: proposal for an emended order Mycoplasmatales containing the family Spiroplasmataceae and emended family Mycoplasmataceae comprised of six genera. Antonie van Leeuwenhoek 112 (4), 561–588. https://doi.org/10.1007/s10482-018-1188-4
Hedlund, B.P., Dodsworth, J.A. and Staley, J.T. (2015) The changing landscape of microbial biodiversity exploration and its implications for systematics. Systematic and Applied Microbiology 38 (4), 231–236. https://doi.org/10.1016/j.syapm.2015.03.003
Helbling, D.E., Ackermann, M., Fenner, K., Kohler, H.P. and Johnson, D.R. (2012) The activity level of a microbial community function can be predicted from its metatranscriptome. ISME Journal 6 (4), 902–904. https://doi.org/10.1038/ismej.2011.158
Jain, C., Rodriguez, R.L., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9 (1), 5114. https://doi.org/10.1038/s41467-018-07641-9
Karthikeyan, S., Rodriguez, R.L., Heritier-Robbins, P., Kim, M., Overholt, W.A., Gaby, J.C., Hatt, J.K., Spain, J.C., Rosselló-Móra, R., Huettel, M., Kostka, J.E. and Konstantinidis, K.T. (2019) ‘Candidatus Macondimonas diazotrophica’, a novel gammaproteobacterial genus dominating crude-oil-contaminated coastal sediments. ISME Journal 13 (8), 2129–2134. https://doi.org/10.1038/s41396-019-0400-5
Konstantinidis, K.T. and DeLong, E.F. (2008) Genomic patterns of recombination, clonal divergence and environment in marine microbial populations. ISME Journal 2 (10), 1052–1065. https://doi.org/10.1038/ismej.2008.62
Konstantinidis, K.T. and Rosselló-Móra, R. (2015) Classifying the uncultivated microbial majority: A place for metagenomic data in the Candidatus proposal. Systematic and Applied Microbiology 38 (4), 223–230. https://doi.org/10.1016/j.syapm.2015.01.001
Konstantinidis, K.T. and Tiedje, J.M. (2005a) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences USA 102 (7), 2567–2572. https://doi.org/10.1073/pnas.0409727102
Konstantinidis, K.T. and Tiedje, J.M. (2005b) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187 (18), 6258–6264. https://doi.org/10.1128/JB.187.18.6258-6264.2005
Konstantinidis, K.T., Ramette, A. and Tiedje, J.M. (2006) Toward a more robust assessment of intraspecies diversity, using fewer genetic markers. Applied and Environmental Microbiology 72 (11), 7286–7293. https://doi.org/10.1128/AEM.01398-06
Konstantinidis, K.T., Rosselló-Móra, R. and Amann, R. (2017) Uncultivated microbes in need of their own taxonomy. ISME Journal. https://doi.org/10.1038/ismej.2017.113
Konstantinidis, K.T., Rosselló-Móra, R. and Amann, R. (2019) Moving the cataloguing of the ‘uncultivated majority’ forward. Systematic and Applied Microbiology 42 (1), 3–4. https://doi.org/10.1016/j.syapm.2018.12.001
Lan, R. and Reeves, P.R. (2001) When does a clone deserve a name? A perspective on bacterial species based on population genetics. Trends in Microbiology 9 (9), 419–424. https://doi.org/10.1016/S0966-842X(01)02133-3
Lawrence, J.G. (2002) Gene transfer in bacteria: speciation without species? Theoretical Population Biology 61 (4), 449–460. https://doi.org/10.1006/tpbi.2002.1587


Lawrence, J.G. and Ochman, H. (1998) Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences USA 95 (16), 9413–9417. https://doi.org/10.1073/pnas.95.16.9413
Lee, I., Ouk Kim, Y., Park, S.C. and Chun, J. (2016) OrthoANI: An improved algorithm and software for calculating average nucleotide identity. International Journal of Systematic and Evolutionary Microbiology 66 (2), 1100–1103. https://doi.org/10.1099/ijsem.0.000760
Ludwig, W. and Schleifer, K. (2005) Molecular phylogeny of bacteria based on comparative sequence analysis of conserved genes. Microbial Phylogeny and Evolution, Concepts and Controversies. Oxford University Press, New York, pp. 70–98.
Mende, D.R., Sunagawa, S., Zeller, G. and Bork, P. (2013) Accurate and universal delineation of prokaryotic species. Nature Methods 10 (9), 881–884. https://doi.org/10.1038/nmeth.2575
Meziti, A., Tsementzi, D., Rodriguez, R.L., Hatt, J.K., Karayanni, H., Kormas, K.A. and Konstantinidis, K.T. (2018) Quantifying the changes in genetic diversity within sequence-discrete bacterial populations across a spatial and temporal riverine gradient. ISME Journal. https://doi.org/10.1038/s41396-018-0307-6
Orellana, L.H., Hatt, J.K., Iyer, R., Chourey, K., Hettich, R.L., Spain, J.C., Yang, W.H., Chee-Sanford, J.C., Sanford, R.A., Loffler, F.E. and Konstantinidis, K.T. (2019) Comparing DNA, RNA and protein levels for measuring microbial dynamics in soil microcosms amended with nitrogen fertilizer. Scientific Reports 9 (1), 17630. https://doi.org/10.1038/s41598-019-53679-0
Overmann, J., Huang, S., Nubel, U., Hahnke, R.L. and Tindall, B.J. (2019) Relevance of phenotypic information for the taxonomy of not-yet-cultured microorganisms. Systematic and Applied Microbiology 42 (1), 22–29. https://doi.org/10.1016/j.syapm.2018.08.009
Parker, C.T., Tindall, B.J. and Garrity, G.M. (2015) International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology.
Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.A. and Hugenholtz, P. (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36 (10), 996–1004. https://doi.org/10.1038/nbt.4229
Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W. (2015) CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25 (7), 1043–1055. https://doi.org/10.1101/gr.186072.114
Pena-Gonzalez, A., Rodriguez, R.L., Marston, C.K., Gee, J.E., Gulvik, C.A., Kolton, C.B., Saile, E., Frace, M., Hoffmaster, A.R. and Konstantinidis, K.T. (2018) Genomic Characterization and Copy Number Variation of Bacillus anthracis Plasmids pXO1 and pXO2 in a Historical Collection of 412 Strains. mSystems 3 (4). https://doi.org/10.1128/mSystems.00065-18
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J. and Glockner, F.O. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Research 41 (Database issue), D590–596. https://doi.org/10.1093/nar/gks1219
Ramos-Barbero, M.D., Martin-Cuadrado, A.B., Viver, T., Santos, F., Martinez-Garcia, M. and Anton, J. (2019) Recovering microbial genomes from metagenomes in hypersaline environments: The Good, the Bad and the Ugly. Systematic and Applied Microbiology 42 (1), 30–40. https://doi.org/10.1016/j.syapm.2018.11.001
Richter, M., Rosselló-Móra, R., Glöckner, F.O. and Peplies, J. (2016) JSpeciesWS: a web server for prokaryotic species circumscription based on pairwise genome comparison. Bioinformatics 32 (6), 929–931. https://doi.org/10.1093/bioinformatics/btv681
Rodriguez-R, L.M. and Konstantinidis, K.T. (2014) Bypassing Cultivation To Identify Bacterial Species. Microbe Magazine (March issue). https://doi.org/10.1128/microbe.9.111.1
Rodriguez-R, L.-M. and Konstantinidis, K.T. (2016) The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Preprints (e1900v1). https://doi.org/10.7287/peerj.preprints.1900v1
Rodriguez, R.L., Castro, J.C., Kyrpides, N.C., Cole, J.R., Tiedje, J.M. and Konstantinidis, K.T. (2018a) How Much Do rRNA Gene Surveys Underestimate Extant Bacterial Diversity? Applied and Environmental Microbiology 84 (6). https://doi.org/10.1128/AEM.00014-18
Rodriguez-R, L.M., Gunturu, S., Harvey, W.T., Rosselló-Móra, R., Tiedje, J.M., Cole, J.R. and Konstantinidis, K.T. (2018b) The Microbial Genomes Atlas (MiGA) webserver: taxonomic and gene diversity analysis of Archaea and Bacteria at the whole genome level. Nucleic Acids Research 46 (W1), W282–W288. https://doi.org/10.1093/nar/gky467
Rosselló-Móra, R. and Amann, R. (2001) The species concept for prokaryotes. FEMS Microbiology Reviews 25 (1), 39–67. https://doi.org/10.1111/j.1574-6976.2001.tb00571.x


Rosselló-Móra, R. and Amann, R. (2015) Past and future species definitions for Bacteria and Archaea. Systematic and Applied Microbiology 38 (4), 209–216. https://doi.org/10.1016/j.syapm.2015.02.001
Rost, B. (1999) Twilight zone of protein sequence alignments. Protein Engineering 12 (2), 85–94. https://doi.org/10.1093/protein/12.2.85
Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Droge, J., Gregor, I., Majda, S., Fiedler, J., Dahms, E. et al. (2017) Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nature Methods 14 (11), 1063–1071. https://doi.org/10.1038/nmeth.4458
Snel, B., Bork, P. and Huynen, M.A. (1999) Genome phylogeny based on gene content. Nature Genetics 21 (1), 108–110. https://doi.org/10.1038/5052
Stackebrandt, E. and Ebers, J. (2006) Taxonomic parameter revisited: tarnished gold standards. Microbiology Today 33, 152–155.
Stackebrandt, E., Frederiksen, W., Garrity, G.M., Grimont, P.A., Kampfer, P., Maiden, M.C., Nesme, X., Rosselló-Móra, R., Swings, J., Truper, H.G., Vauterin, L., Ward, A.C. and Whitman, W.B. (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. International Journal of Systematic and Evolutionary Microbiology 52 (Pt 3), 1043–1047. https://doi.org/10.1099/00207713-52-3-1043
Thompson, L.R., Sanders, J.G., McDonald, D., Amir, A., Ladau, J., Locey, K.J., Prill, R.J., Tripathi, A., Gibbons, S.M. et al. (2017) A communal catalogue reveals Earth's multiscale microbial diversity. Nature 551 (7681), 457–463. https://doi.org/10.1038/nature24621
Van Ert, M.N., Easterday, L.Y., Huynh, W.R., Okinaka, R.T., Hugh-Jones, M.E., Ravel, J., Zanecki, S.R., Pearson, T. (2007) Global genetic population structure of Bacillus anthracis. PLoS One 2 (5), e461. https://doi.org/10.1371/journal.pone.0000461
Varghese, N.J., Mukherjee, S., Ivanova, N., Konstantinidis, K.T., Mavrommatis, K., Kyrpides, N.C. and Pati, A. (2015) Microbial species delineation using whole genome sequences. Nucleic Acids Research 43 (14), 6761–6771. https://doi.org/10.1093/nar/gkv657
VerBerkmoes, N.C., Denef, V.J., Hettich, R.L. and Banfield, J.F. (2009) Systems biology: Functional analysis of natural microbial consortia using community proteomics. Nature Reviews Microbiology 7 (3), 196–205. https://doi.org/10.1038/nrmicro2080
Whitman, W.B. (2016) Modest proposals to expand the type material for naming of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 66 (5), 2108–2112. https://doi.org/10.1099/ijsem.0.000980
Yarza, P., Sproer, C., Swiderski, J., Mrotzek, N., Spring, S., Tindall, B.J., Gronow, S., Pukall, R., Klenk, H.P., Lang, E. et al. (2013) Sequencing orphan species initiative (SOS): Filling the gaps in the 16S rRNA gene sequence database for all species with validly published names. Systematic and Applied Microbiology 36 (1), 69–73. https://doi.org/10.1016/j.syapm.2012.12.006
Yarza, P., Yilmaz, P., Pruesse, E., Glockner, F.O., Ludwig, W., Schleifer, K.H., Whitman, W.B., Euzeby, J., Amann, R. and Rosselló-Móra, R. (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12 (9), 635–645. https://doi.org/10.1038/nrmicro3330

14

Genomic Sequences for Fungi

Riccardo Baroncelli1 and Giovanni Cafà*,2
1Spanish–Portuguese Institute for Agricultural Research (CIALE), University of Salamanca, Spain; 2CABI Europe, Egham, UK

Introduction

Genomics is an area of genetics that aims to combine DNA sequencing techniques and bioinformatics tools to assemble, annotate and study genomes. The term ‘genomics’ was first introduced in 1986 by Tom Roderick (Yadav, 2007) and has since expanded from its original definition to include further DNA sequence applications such as transcriptomics and epigenomics. The expansion of genomics is having a major impact on many research fields, and those related to mycology are no exception. The capability to analyse whole fungal genomes has significantly boosted our understanding of fungal diversity, taxonomy, evolution, physiology and biology.

More than 40 years ago, a sequencing technique based on DNA polymerase and radiolabelled nucleotides was introduced. Shortly after, this technique was used to fully sequence the first genome, a 5-kb genome of the bacteriophage φX174 (Sanger et al., 1977). This DNA polymerase-based approach developed into what is now referred to as the ‘Sanger sequencing method’. In 1996, 20 years later, the entire genome of Saccharomyces cerevisiae was published by a consortium of researchers from laboratories around the world. This event marked a milestone in genetics, as this was not only the first fungus, but also the first eukaryotic organism to be fully sequenced and released (Goffeau et al., 1996).

As the first fungal genome became available, it appeared to be much more complex than had been previously assumed. Since then, owing to the exponential increase in sequencing techniques, a large number of fungal genomes have become available, enabling detailed comparative genomics studies, especially when coupled with transcriptomics, proteomics and epigenomics.

The first fungal genomes were sequenced by large international consortia. Since then, most genome sequencing has been undertaken in large research centres such as the Broad Institute and the Joint Genome Institute (JGI, part of the US Department of Energy), which produce an enormous amount of sequencing data each year. In 2000, the Broad Institute launched the Fungal Genome Initiative (Cuomo and Birren, 2010), a large-scale effort to sequence genomes throughout the Kingdom of Fungi. The following decade included many important milestones in fungal genomics. For example, between 2002 and 2005, a further four fungal genomes were sequenced and released: Schizosaccharomyces pombe (Wood et al., 2003), the first filamentous fungus Neurospora crassa (Galagan et al., 2003), Phanerochaete chrysosporium (Martinez et al., 2004) and the plant pathogen model Magnaporthe oryzae (Dean et al., 2005).

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

In 2004, Génolevures, an international consortium of scientists studying yeast genetics and genomics supported by the French institutions CNRS and INRA, published the first large-scale comparative analysis of five yeast genomes (Dujon et al., 2004). In 2009, the JGI started its fungal programme to explore fungi in the context of diversity and biomass degradation, and included many plant-associated fungi (Grigoriev et al., 2011). A few years later came the 1000 Fungal Genomes Project, one of the most important of these projects, which aimed to study comprehensively the fungal genomic diversity associated with ecological adaptation (Grigoriev et al., 2014). Subsequently, smaller-scale genomics centres have been established in universities and research institutes, and provide internal and external services to a broader scientific community. More recently, the Darwin Tree of Life Project, led by the Wellcome Sanger Institute, has been developed to read the genomes of all known species of animals, plants, fungi and protists in the British Isles (sanger.ac.uk/science/collaboration/darwin-tree-life-project, accessed 7 May 2020). As sequencing technology continues to advance, a new generation of benchtop sequencers has become available (Quail et al., 2012), making genome sequencing more affordable and practicable to an even larger community (see Chapter 13). At the beginning of 2020, a full genome sequence was publicly available for at least one representative individual from more than 2180 fungal species (ftp.ncbi.nlm.nih.gov/genomes/genbank/fungi/), while 1566 genomes were available in the MycoCosm portal (Grigoriev et al., 2014), and more than 4000 whole-genome sequencing (WGS) projects were registered in the Genomes OnLine Database (GOLD) (Mukherjee et al., 2017).
A large proportion of the genome sequences released in GenBank are incomplete, poorly assembled or lacking annotation, and this needs careful consideration (e.g. see section below on Colletotrichum). Genome data produced by large research centres are usually technically managed by experts. However, the data produced by external services or benchtop machines may be overwhelming, and while a large spectrum of users can afford sequencing, not many researchers have the appropriate skills to technically and rationally interpret the data generated. In the near future, genome sequencing will become even more affordable. Current techniques will improve, and new methods will be developed, particularly those aimed at sequencing single molecules in real time. Such developments will provide challenges for those researchers developing and improving bioinformatic tools, pipelines and platforms. The purpose of this chapter is to give an overview of the basic knowledge, understanding and perspectives in fungal genomics.

The Species Concept in the Next-generation Sequencing (NGS) Era

While the species concept is well defined in many of the lineages of the tree of life, fungi seem to be more challenging. Traditionally, the scientific debate on the species concept has focused on two main objectives: to provide a definition of ‘species’, and to obtain the best method of species identification and delimitation. Endler (1989) delineated four different bipolar perspectives that summarize the species concept debate:

● Taxonomic versus evolutionary. The taxonomic concept is based on the practical need to define species without considering the evolutionary relationships of the organisms in the study. In the evolutionary concept, the species designation is used as a communicative tool to study evolutionary processes and relationships.
● Theoretical versus operational. In the theoretical case, the concept is linked to the study of the origin of the species. Theoretical concepts are more appropriate in intellectual debates than in practical terms. The operational concept includes mainly taxonomic studies, but may also include those that aim to understand evolutionary relationships. In this case, the objective of the concept relies on its applicability and practicality.
● Contemporaneous versus clade. The contemporaneous concept refers to studies of existing organisms that rarely consider the concept of ‘species’ as an ongoing process. The clade concept refers to those studies that are inclined to consider species within the context of clades and ancestor–descendant relationships.
● Reproductive versus cohesive. These two concepts differ by the consideration of genetic exchanges or the possibility that these events occur. Reproductive concepts focus on genetic exchange processes (such as reproduction) and those that maintain segregation between species. The reproductive concept contrasts with the cohesive concept, which is related to those studies that focus on species as units with phenotypic and genetic cohesion.

The species concept is key to conventional systematics and has been widely debated, but the definition of ‘a species’ has not been generally agreed (Cai et al., 2011; Boluda et al., 2019; Valent et al., 2019; Wagner et al., 2019). Most plant and animal populations are groups of mainly diploid systems with population growth obtained through sexual reproduction. Fungi, on the other hand, are much more complicated, as they can exist in haploid, diploid, polyploid or multinucleate heterokaryotic states, with unlimited growth and with the capability of a wide range of reproductive strategies. Fungi may reproduce sexually (genetic recombination), asexually (clonal propagation) and also through a unique process described in no other systems. This is the parasexual cycle, a genetic recombination system that does not require two mating types and that takes place in three steps: diploidization, mitosis and haploidization. These reasons, and the greater complexity of the fungal kingdom, explain why the species concept debate has focused on animals and plants, while fungi have generally remained marginal. Nevertheless, defining species in the fungal kingdom remains crucial for at least two main reasons: one is practical, as scientists must be able to identify, name and communicate the organisms in fields such as biotechnology, ecology, agriculture and medicine. This impacts on the establishment of quarantine protocols, identification of plant pathogens, description of organisms associated with biochemical production and patent application. The second reason is more theoretical, as fungi are organisms that, owing to their complexity, could be considered as model systems in evolutionary studies and speciation in a broad sense of the term: for example, ecological adaptation, reproductive behaviour and lifestyles.

Species criteria are related to recognition and delimitation of species (Taylor et al., 2000), and have been subdivided into four categories (Giraud et al., 2008):

● the biological species concept (BSC), based mainly on genetic isolation;

● the morphological species concept (MSC), based on morphological divergence;

● the ecological species concept (ESC), based on adaptation to a specific environment; and

● the phylogenetic species concept (PSC), based on genetic divergence. The PSC in fungi has two distinct approaches, the strict genealogic concordance (SGC) and the coalescent-based species delimitation (CBSD). These approaches are not mutually exclusive; they use phylogenetics, but their aim is different (Matute and Sepúlveda, 2019).

All these criteria reflect different events that happen during population divergence and speciation. Traditionally, the criterion most frequently used for fungi has been the MSC. In the past two decades many new species have been described or revised using the Genealogical Concordance Phylogenetic Species Recognition (GCPSR), which is a bioinformatics approach to the PSC (Taylor et al., 2000). The GCPSR is currently the most widely used approach within the fungal kingdom (Giraud et al., 2008) and it has been remarkably useful in many cases (see also Chapter 17). This criterion is better at discriminating close relationships than the other criteria, as the information retrieved from DNA is higher compared to that of the other approaches. Even if the genetic variability of standard loci (e.g. ITS rRNA) may not be sufficient, new loci can be identified to resolve genetic divergence (Cabral et al., 2020). The final aim of identifying species boundaries should not be considered merely as a simple description of a species. The first step in a robust research programme in evolutionary biology is to define species boundaries accurately. To understand diversification patterns in fungi, it is crucial to understand the degree of diversity in the group. Reliable species definitions that can be applied across taxa can be critical to understanding how fungal genomes


R. Baroncelli and G. Cafà

and divergence evolve. For example, genomic population analyses are defined within a species, and so are dependent on its definition. If species borders are not correctly defined, our understanding of how genetic diversity is distributed across populations and species will not be correctly represented. Plant pathogenic fungi provide a useful example of the accurate delimitation of species boundaries with insights into pathogenesis. One example is in Pyricularia, where the application of the SGC approach revealed the existence of two species, P. oryzae and P. grisea (Couch and Kohn, 2002). Phylogenetic trees based on partial sequences of actin, beta-tubulin and calmodulin genes were found to be concordant and distinguished two groups within strains previously assigned to P. grisea. One clade was associated with the grass Digitaria and retained the original name, and the other was associated with rice (Oryza sativa) and other cultivated grasses, and was therefore named P. oryzae. Castroagudín et al. (2016) used ten housekeeping gene loci to study relationships among 128 isolates of P. oryzae sampled from sympatric populations of wheat, rice and grasses growing in or near wheat fields. Their analyses grouped the isolates into three major clades. The first clade comprised only isolates associated with rice that matched the previously described rice blast pathogen P. oryzae pathotype Oryza (PoO). The second clade contained isolates associated almost exclusively with wheat (Triticum sp.) and corresponded to the previously described wheat blast pathogen P. oryzae pathotype Triticum (PoT). The third clade comprised strains isolated from wheat as well as from other Poaceae hosts. The authors stated that this clade was distinct from P. oryzae, and represented the new species Pyricularia graminis-tritici (Pgt). Gladieux et al. (2018) used whole-genome sequence information for 76 strains of P. 
oryzae isolated from 12 grass and cereal genera to infer the population structure of this pathogen, and to reconsider the taxonomic status of the wheat-infecting populations. Species recognition based on genealogical concordance failed to confirm a prior assignment of wheat blast isolates to the new species (P. graminis-tritici). This is an example of how increasing available data, such as new loci or complete genomes, can complicate matters for a unique approach to define a species.
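The concordance criterion used in the Pyricularia studies can be illustrated with a toy check. The following is a minimal sketch, not the method of any of the cited authors: it assumes that each gene tree has already been summarized as the set of clades (groups of isolates) that it supports, and it simply reports the clades recovered at every locus. The isolate names and clade memberships are hypothetical, and real analyses rely on phylogenetic software and branch-support thresholds.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical input: for each locus, the clades (groups of isolates)
# recovered in that locus's gene tree. All names are illustrative only.
gene_tree_clades: Dict[str, List[Set[str]]] = {
    "actin": [{"iso1", "iso2", "iso3"}, {"iso4", "iso5"}, {"iso1", "iso2"}],
    "beta_tubulin": [{"iso1", "iso2", "iso3"}, {"iso4", "iso5"}, {"iso2", "iso3"}],
    "calmodulin": [{"iso1", "iso2", "iso3"}, {"iso4", "iso5"}],
}


def concordant_clades(trees: Dict[str, List[Set[str]]]) -> List[FrozenSet[str]]:
    """Return the clades that are present in every gene tree."""
    if not trees:
        return []
    # Convert each locus's clades to hashable frozensets so they can be intersected.
    per_locus = [{frozenset(clade) for clade in clades} for clades in trees.values()]
    shared = set.intersection(*per_locus)
    # Sort for a stable, readable order.
    return sorted(shared, key=lambda clade: sorted(clade))


shared = concordant_clades(gene_tree_clades)
for clade in shared:
    print(sorted(clade))
```

In this toy data set the two larger groups are supported by all three loci, whereas the conflicting pairings inside the first group are not, which is the pattern that concordance-based recognition treats as evidence of a species boundary.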

Genomics investigates vast amounts of DNA sequences and studies the genes of organisms and/or the genome, the full DNA sequence containing all the hereditary information needed to build and maintain an organism. Genomes come in different sizes and structures, and complex algorithms and bioinformatic calculations applied to large numbers of DNA sequences can be used to detect genomic variations. Genomics has opened new possibilities in the understanding of how organisms change over time, and how genome sequences link genetics with biological traits. While taxonomy has historically relied on phenotypic characters, molecular data have frequently revealed that morphological traits are insufficient to describe biodiversity. In this context, genomics provides new perspectives on the identification of previously undescribed species, and the understanding of speciation in fungi.

Methodology

Sequencing technologies

The discovery of the structure of nucleic acids in 1953 was a major milestone in the field of genetics (Watson and Crick, 1953). Since then, the investigation of nucleotide sequences in organisms has become one of the most studied scientific research disciplines, and has had a major impact on microbial systematics. In 1977, 24 years after the discovery of the structure of DNA, Frederick Sanger introduced the ‘dideoxy’ chain-termination method (known as the ‘Sanger method’) for sequencing DNA molecules. The method selectively incorporates chain-terminating dideoxynucleotides, which act as specific chain-terminating inhibitors of DNA polymerase (Sanger et al., 1977). This was a major breakthrough and delivered a significant advance in DNA sequencing technology. It made possible the investigation of the DNA sequence of genes, and allowed entirely different types of investigations in which the DNA sequence was the main focus: genomics was born.

After Sanger, second-generation sequencing of short reads (or next-generation sequencing, NGS) represented another breakthrough, with a significant increase in the number of bases generated by a single run. The first platform was developed between 2000 and 2005 by the company 454 Life Sciences, and the first NGS instrument available on the market was the GS 20. The newly developed technique, which combined single-molecule emulsion PCR with pyrosequencing (Ronaghi et al., 1998), was successfully validated. Soon after, Margulies et al. (2005) sequenced the complete genome of Mycoplasma genitalium with 96% coverage at 99.96% accuracy in one run of the machine. The new technology had the capacity to sequence hundreds of millions of DNA fragments, thereby allowing investigations of entire genomes instead of single targeted genes. Since then, other NGS platforms have been developed. These include other sequencing-by-synthesis platforms such as Illumina (Kawashima et al., 1998) and Ion Torrent (Merriman et al., 2012) and other chemistries such as sequencing by ligation, which is used in the AB SOLiD (Thermo Fisher) and Complete Genomics (BGI) platforms (Valouev et al., 2008; Drmanac et al., 2010).

Another technology, known as third-generation sequencing, is the process by which single molecules of DNA are sequenced without the need to fragment nucleic acids. With third-generation sequencing, data analysis and processing can be reduced, as data sets of long reads may not require assembly. Third-generation methods include platforms such as Oxford Nanopore (Howorka et al., 2001) and Pacific Biosciences (PacBio) (Eid et al., 2009).

All of the above technologies, and the gradual reduction in the costs associated with them, have contributed to the generation of an enormous quantity of data (see Chapter 13). However, full interpretation of the data is not always achieved. Short grant durations and poor curation of databases mean that the data are not always kept accessible, so it can become difficult to retrieve these resources at a later time.

In genomics, the structure, function, evolution, mapping and editing of genomes can be studied by an extensive range of methods. Some of the principal methods used in genomics are given below. Depending on the requirements of the investigation, different methods of sequencing can be implemented to achieve multiple objectives (Pareek et al., 2011).


De novo, resequencing and targeted sequencing

Genome sequencing is one of the widest applications of sequencing methods, and it is usually used in two main types of studies: sequencing of genomes from a species for which a reference genome is available (resequencing), and de novo sequencing (Nowrousian, 2010). These are the major applications of genome sequencing, and they are routinely used to investigate pathogen genomics, for example to inform the management of infectious disease outbreaks. In resequencing, sequence reads are mapped to a reference genome and can then be used to identify genetic variation in strains (e.g. single-nucleotide polymorphisms (SNPs) or other variants), thereby providing information at the highest possible resolution, which can be used to determine the biology of pathogens (PHE, 2018). An example of targeted sequencing is whole-exome sequencing (WES), which is used to determine variations in the protein-coding regions of the genome (referred to as the exome). This is becoming a standard approach in studying genetic variants in diseases, and represents a faster and cheaper alternative to resequencing (Ng et al., 2010) (see ‘Genomic variation and mutation detection’, below).

RNA sequencing (RNA-Seq)

RNA-Seq provides higher coverage and resolution of the transcriptome compared to previous Sanger sequencing and microarray-based methods. This method not only quantifies gene expression, but can discover novel proteins, identify gene structure and detect allele-specific expression (Kukurba and Montgomery, 2015). Advances in the RNA-Seq method have enabled the detection of the complexity of transcription. Apart from enabling investigations into polyadenylated messenger RNA (mRNA), the method can be used to investigate the diversity of RNA (Kukurba and Montgomery, 2015).

Epigenetics

An epigenetic trait is a stable heritable change in gene expression that does not include alterations in the DNA sequence (Berger et al., 2009). Emerging studies have allowed the recognition of assorted gene regulatory mechanisms. Epigenetic regulations encompass chromatin-based regulatory events such as chromatin remodelling, histone modification and DNA methylation, which are essential for precise and stable control of various nuclear processes (Zeng et al., 2019). DNA methylation combines conserved and derived features, which indicates that methylation is an ancient property of eukaryotic genomes (Zemach et al., 2010); in fungi it has been studied less frequently than in higher eukaryotes (Zeng et al., 2019).

Genomic variation and mutation detection

Global analysis of mutations drives the study of microevolution in a heterozygous diploid organism (Ene et al., 2018). For example, WES is the application of sequencing technologies to determine the variations in the exome, and it is the ideal method to detect changes in the protein-coding regions of genes in a genome, with the final aim of identifying changes in the sites that alter protein sequences. As the data necessary for this technique are significantly less than for whole-genome sequencing (WGS), WES can be achieved at a much lower cost than WGS (Sawyers, 2008).
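The idea of detecting variants from reads mapped to a reference can be sketched in a few lines. This is an illustrative toy, not a production variant caller: it assumes that reads are already aligned, are on the same strand as the reference, carry a known 0-based start position and contain no insertions or deletions. The function name and the `min_depth` and `min_fraction` thresholds are hypothetical choices made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple


def call_snps(
    reference: str,
    reads: List[Tuple[int, str]],
    min_depth: int = 3,
    min_fraction: float = 0.8,
) -> Dict[int, Tuple[str, str]]:
    """Report positions where the majority base in the reads differs from the reference.

    Each read is (start, sequence) with a 0-based start on the reference.
    Returns {position: (reference_base, observed_base)}.
    """
    # Build a pileup: one base counter per reference position.
    pileup = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps: Dict[int, Tuple[str, str]] = {}
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to make a call
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / depth >= min_fraction:
            snps[pos] = (reference[pos], base)
    return snps


reference = "ACGTACGTAC"
reads = [(0, "ACGTAGGT"), (2, "GTAGGTAC"), (3, "TAGGTA"), (1, "CGTAGG")]
snps = call_snps(reference, reads)
print(snps)
```

Real pipelines also weigh base and mapping qualities, handle diploid or polyploid genotypes and model indels, all of which are omitted here for brevity.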

Data analysis and interpretation

Data analysis in genomics has a common pattern, regardless of the type of analysis: to screen high- and low-quality DNA reads. Good assemblies are derived from high-quality data, for example the generation of simple consensus sequences from forward- and reverse-sequencing data, to build up entire genomes from data sets of short reads. Data analysis uses various software packages depending on the techniques and hypothesis under consideration. Quantity and quality control remain the essential part of each step of the pipeline, and can be extremely time consuming (Akalin, 2019). The data analysis steps typically include the following.

Experimental design and generation of data

Generation of data refers to any source, experiment or survey that provides data for testing the hypothesis.

Analysis

Data analysis starts with quality control, which entails checking the quality of the output of sequencing instruments. The DNA reads do not necessarily have the same quality, and those of low quality may include bases that might have been called incorrectly during sequencing. Identifying low-quality DNA reads and bases called incorrectly, and removing them from the data set, will improve the read-mapping step. In addition, processing the data may require conversion to a format that is suitable for exploratory analysis and modelling. In genomics, data processing involves multiple steps. DNA reads are aligned to reference genomes to screen the sequence to annotate genes and regions of interest, and are counted to generate the coverage.

Interpretation

This phase usually takes the processed or semi-processed data and includes machine-learning or statistical analysis to explore the data and link it to the initial biological question. In genomics this step could be for gene prediction or annotation.

Visualization and reporting

Visualization is necessary for all the previous steps, although the final phase requires final figures, tables and text that describe the outcome of the analyses. In genomics, this can include alignments or maps of genes across genomes. In general, the choice of methods and technology used for sequencing and data analysis should be strongly influenced by the hypothesis that is being tested.
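The quality-control and coverage steps described above can be sketched as follows. This is a minimal illustration rather than a recommended pipeline: it assumes four-line FASTQ records with Phred+33 quality encoding, and the function names and the mean-quality threshold of 20 are arbitrary choices made for the example.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(
    records: Iterable[Tuple[str, str, str]], min_mean_quality: float = 20.0
) -> List[Tuple[str, str]]:
    """Keep reads whose mean base quality reaches the threshold."""
    return [
        (name, seq)
        for name, seq, qual in records
        if qual and mean_phred(qual) >= min_mean_quality
    ]


def mean_coverage(reads: List[Tuple[str, str]], genome_length: int) -> float:
    """Average coverage: total sequenced bases divided by genome length."""
    return sum(len(seq) for _, seq in reads) / genome_length


# Toy data: read1 has high-quality bases, read2 has very low-quality bases.
fastq = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2", "ACGTTTGA", "+", "!!!!####",
]
kept = filter_reads(parse_fastq(fastq))
print(kept, mean_coverage(kept, genome_length=4))
```

Dedicated tools additionally trim low-quality read ends and adapters instead of discarding whole reads; the whole-read filter is used here only because it is the simplest form of the step.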

Techniques

Comparative genomics

The availability of large numbers of fully sequenced genomes has made possible a wide range of comparative studies. Genomes can be compared at multiple levels, reflecting the complexity of the genome itself and depending on the aim of the research. For example, chromosome rearrangement and genomic reshuffling analyses are usually applied to a small number of genomes, with the aim of understanding differences in the number and structure of chromosomes. These changes also relate to the biology of an organism and require high-quality, complete genomes. However, comparative genomics achieves its full potential only when biological data are available from all individuals. Conversely, there are limitations when released genomes are not fully analysed and linked to biological traits.

Another focus of comparative genomics is the identification of lineage-specific regions (LSRs), such as those that are species specific or pathogen specific. When working with complete genome sequences, LSRs can be associated with a specific biological trait that is peculiar to one or, better, several of the sequenced individuals, and they can be used to develop targets for diagnostic tools. For example, comparative analyses have shown LSRs in Fusarium oxysporum that include four entire chromosomes and account for more than one-quarter of the genome (Ma et al., 2010). LSRs are rich in transposons and in genes with distinct evolutionary profiles that may be related to pathogenicity (see Chapter 17).

WGS can be used to compare the gene content, and this can then be used to identify specific genes or to study expansion and contraction in gene families. Sampling whole genomes has allowed researchers to study the evolutionary history of individual genes, and has identified events that generate gene family expansions, such as gene duplications. Comparisons of gene families can also identify contractions and gene losses. Comparisons of gene families between species have revealed several instances of gene copy number expansions, possibly driven by evolutionary adaptations. Such expansions have been instrumental in the evolution of pathogenicity traits (Teixeira et al., 2014). Gene family contraction and gene loss have contributed to the evolution of fungal genomes, and in plant pathogenic fungi some of these changes have been correlated to recent changes in host-association patterns (Baroncelli et al., 2016).
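A comparison of gene family sizes between two genomes can be sketched as below. This is a deliberately crude illustration under stated assumptions: gene families have already been defined and counted per genome, and a simple fold-change threshold (`min_fold`, an arbitrary value) stands in for the phylogeny-aware models used in real studies. The genome and family names are hypothetical.

```python
from typing import Dict

# Hypothetical copy numbers per gene family in two genomes.
family_counts: Dict[str, Dict[str, int]] = {
    "genome_A": {"family_1": 12, "family_2": 30, "family_3": 8},
    "genome_B": {"family_1": 25, "family_2": 29, "family_3": 0},
}


def classify_families(
    reference: Dict[str, int], query: Dict[str, int], min_fold: float = 2.0
) -> Dict[str, str]:
    """Label each family in the query genome relative to the reference genome."""
    result: Dict[str, str] = {}
    for family in sorted(set(reference) | set(query)):
        ref_n = reference.get(family, 0)
        qry_n = query.get(family, 0)
        if ref_n > 0 and qry_n == 0:
            result[family] = "lost"
        elif ref_n == 0 and qry_n > 0:
            result[family] = "gained"
        elif ref_n > 0 and qry_n / ref_n >= min_fold:
            result[family] = "expanded"
        elif qry_n > 0 and ref_n / qry_n >= min_fold:
            result[family] = "contracted"
        else:
            result[family] = "stable"
    return result


labels = classify_families(family_counts["genome_A"], family_counts["genome_B"])
print(labels)
```

A pairwise comparison of this kind cannot tell whether one lineage expanded or the other contracted; resolving the direction of change requires more genomes and a phylogeny.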
Population genomics is a branch of population genetics that shares its aims, but uses a larger number of genetic markers with a known distribution in the genome (Hartl and Clark, 2007). The biological questions of this discipline fit into three broad categories: (i) evolutionary and demographic processes affecting population structure; (ii) processes affecting speciation and the divergence of closely related taxa; and (iii) locus-specific effects (mainly selection) acting on specific loci that affect adaptation or phenotypes (Hartl and Clark, 2007). Research approaches aiming to answer these questions include genetic analyses of demographic processes, comparative genomics of closely related species, identification of loci under selection, genome-wide association studies (GWASs) and analyses of segregating populations resulting from crosses. The latter approach, for example quantitative trait locus (QTL) mapping, can identify genetic loci that explain observed phenotypes. The basic principle behind these approaches is that evolutionary history and demography act on all neutral genomic loci in the same way. Therefore, genotyping a very large number of loci, or sequencing the genomes of several individuals, can allow inferences of genome-wide effects to be made. Such inferences can be stronger than those from studies limited to a small number of genetic markers.

One of the most important improvements in population genomics is the identification of traits and genes that are under selection for adaptation. Genome-wide evolutionary analyses, such as selection scanning within a population, may identify regions with unbalanced genetic variations compared to the rest of the genome. For example, a reduction in genetic variation can be an indication of selective sweeps, the process through which a beneficial mutation that becomes fixed in a population leads to the decrease or elimination of genetic variation in the nucleotides that neighbour the mutation. This information can be used to find the genetic features responsible for a specific phenotype or to associate phenotypes with genetic variation. Genomic plasticity allows fungi to adapt to environmental changes and conquer new niches; this is particularly relevant for pathogens that engage in coevolutionary ‘arms races’ with their hosts (Raffaele and Kamoun, 2012). A remarkable example of a ‘two-speed’ genome is provided by the comparative genomics analysis between the wheat pathogen Mycosphaerella graminicola (current name Zymoseptoria tritici) and its closest known species relative. This revealed how essential and conditionally dispensable chromosomes have evolved at different rates (Stukenbrock et al., 2010).

Whereas selection scans aim to detect loci under selection without previous knowledge of associated phenotypes, approaches such as QTL mapping and GWAS aim to detect loci linked with heritable phenotypic variation. The main difference between GWAS and QTL mapping is the target population: QTL mapping is applied to progeny genotypes and analyses segregating genetic variants, while GWAS is applied to unrelated genotypes from natural populations. Both GWAS and QTL mapping aim to associate genetic polymorphisms with variation in phenotypic traits. QTL mapping is only applicable in fungi with sexual reproduction or, more generally, in fungi for which crosses can be achieved. GWAS success, however, is directly associated with the level of recombination in the targeted population. QTL mapping is limited by the phenotypic and genotypic diversity of the parents, while the natural populations targeted by GWASs generally exhibit a wider range of phenotypes and genotypes. The appropriate definition of what constitutes a population can significantly impact GWASs, as the population represents all the organisms of the same group or species living in a given geographical area, and able to interbreed or exchange genetic material. If a GWAS is applied erroneously (e.g. to individuals not belonging to the same species), the strongest phenotypic differences, which are most likely the aim of the investigation, will be those between the populations, and they will be associated with every fixed genetic difference between those populations (Taylor et al., 2017).

Genome sequences to link genetics with biological traits

Interpreting the results of genome sequences is one of the biggest challenges of our time. The generation of data is faster than our ability to carry out detailed interpretation, and this has created a gap between the data and their interpretation. Sequencing entire genomes makes all genetic information available, but there is a crucial need for reference genomes in order to make accurate comparisons.
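The association step at the core of a GWAS can be sketched as a per-locus test of whether isolates carrying different alleles differ in phenotype. The following is a simplified illustration, assuming haploid isolates, biallelic markers coded 0/1 and a quantitative trait; it ranks loci by the absolute value of Welch's t-statistic. The marker names and values are hypothetical, and the sketch makes no correction for population structure, which is exactly the pitfall noted above for samples that mix populations or species.

```python
import math
from statistics import mean, variance
from typing import Dict, List


def association_scan(
    genotypes: Dict[str, List[int]], phenotypes: List[float], min_group: int = 2
) -> Dict[str, float]:
    """Return |Welch's t| per locus for a haploid 0/1 genotype matrix.

    Loci with too few isolates per allele, or with no phenotypic
    variance in either group, are skipped.
    """
    scores: Dict[str, float] = {}
    for locus, alleles in genotypes.items():
        group0 = [p for a, p in zip(alleles, phenotypes) if a == 0]
        group1 = [p for a, p in zip(alleles, phenotypes) if a == 1]
        if len(group0) < min_group or len(group1) < min_group:
            continue  # minor allele too rare to test
        se = math.sqrt(variance(group0) / len(group0) + variance(group1) / len(group1))
        if se == 0:
            continue  # statistic undefined without variance
        scores[locus] = abs(mean(group1) - mean(group0)) / se
    return scores


# Toy data: eight isolates, one quantitative trait, three markers.
phenotypes = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
genotypes = {
    "snp_1": [0, 0, 0, 0, 1, 1, 1, 1],  # allele tracks the phenotype
    "snp_2": [0, 1, 0, 1, 0, 1, 0, 1],  # allele unrelated to the phenotype
    "snp_3": [0, 0, 0, 0, 0, 0, 0, 1],  # minor allele seen only once
}
scores = association_scan(genotypes, phenotypes)
best = max(scores, key=scores.get)
print(best, scores)
```

In practice the statistic is converted to a p-value, corrected for the number of loci tested, and fitted in a model that accounts for relatedness among isolates.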
Genetic association studies aim to (i) define the genetic architecture of complex traits and diseases; and (ii) provide new insights into normal physiology and disease pathophysiology (Lowe and Reddy, 2015). A major unresolved issue is how information encoded at the DNA level (e.g. genotypes) is translated to complex phenotypes and disease. Specifically, once DNA sequence variation is linked to whole-body phenotypes by GWAS, WES or WGS approaches, the question remains as to which molecular and signalling pathways are actually involved in the disease process under investigation, and which of these should be targeted to design new or better therapeutics (Moreno-Moral and Petretto, 2016). In the area of microbial ecology, similar challenges occur when interpreting complex DNA sequencing data and linking them to biological functions of microorganisms in the environment.

GWASs are among the latest developments in genomic analysis, and entail analysis of a large number of individuals of the same species to identify associations between common genetic variants and biological traits. They are a powerful approach to dissecting complex traits into small components and detecting areas of the genomes where variability occurs (Korte and Farlow, 2013; Fang et al., 2017). GWASs have widely replaced QTL mapping, although QTL mapping has proved, and remains, a powerful method to identify regions of the genome that segregate with a given trait either in F2 populations or recombinant inbred line (RIL) families. GWASs overcome the limitations of QTL mapping, and provide insights into the genetic architecture of traits (Korte and Farlow, 2013).

From biochemistry to genomics

Genome data contain information on genes and pathways, and can deliver insights into complex biochemical compounds that may be virtually impossible to synthesize in the laboratory. The understanding of fungal metabolites has developed into a branch of research where genome data are investigated to deliver pathways for bioactive secondary metabolites. Penicillin, the first broad-spectrum antibiotic, is the most famous secondary metabolite. Pathways of secondary metabolites can be very complex, and include those of aflatoxin, penicillins and alkaloids. Research into secondary metabolism has generated knowledge of the organization of genes in clusters, and this discovery has had important implications for gene regulation and evolution (Keller et al., 2005).
Over the last two decades, genome sequencing has enhanced understanding of the genomics of the fungal kingdom. Since the sequencing of the first fungal genome in 1996, the number of available fungal genome sequences has increased significantly. Fungal genomes have been publicly released and more are being sequenced every day (Galagan et al., 2005). This represents a great opportunity because these data allow the study of the biology and evolution of the fungal kingdom, not only in terms of taxonomic rearrangements, but also in terms of their impact on biological functions.

Metagenomics

Metagenomics (environmental and community genomics) is the analysis of microorganisms by direct sequencing of genomic DNA that is isolated from an environment hosting an assemblage of microorganisms (Handelsman, 2004). It can be considered as a set of molecular methods and a research field in itself. Metagenomics can be used to detect unculturable microorganisms and investigate their genomic diversity, and aims to understand individual organisms and their genes in the environment. Owing to the large amount of data that it generates, metagenomics overlaps with bioinformatics in developing computational methods that maximize our understanding of the genetic composition and activities of microbial communities (NRC, 2007a).

Metagenomic investigations start from direct isolation of genomic DNA from an environment, and the direct sequencing of it without the need for isolation of pure cultures. The method, therefore, circumvents the need to culture the organisms under study (Handelsman, 2004). Once genomic DNA is isolated from the environmental matrix, it is made available for sequencing via library preparation. The library contains the genomic DNA of all the organisms in that matrix and can, therefore, be sequenced directly without the use of PCR amplification. This approach can be used to sequence the genomes of the organisms comprising the system, and to deliver useful information on the pool of genes/loci of the individuals inhabiting the system. However, it cannot discriminate whether the pool of genes/loci belongs to the same individual or to the same nucleus of a multinucleate individual. For example, genetically different nuclei may coexist in the same individual or in populations of arbuscular mycorrhizal fungi (Kuhn et al., 2001; Wyss et al., 2016).


Sequence-based metagenomics captures a very large amount of information on the microbial community under study (NRC, 2007b). For example, a study of the metagenome of the microbial inhabitants of the Sargasso Sea generated sequences of about 1,000,000 genes, and revealed whole classes of genes that were more diverse than could ever have been anticipated on the basis of studies of cultured organisms (Venter et al., 2004). Another study explored the microbial community structure dynamics within a natural wetland exposed to acidic mine drainage, and demonstrated the potential of metagenomics to investigate detailed interactions among microbial community members (Aguinaga et al., 2018). Investigation of the functions of the organisms in a system can provide insights into the genetic diversity and the associated biological functions present. However, connecting the identity of the organisms to their function in the ecosystem remains challenging (see Chapter 11).

Metagenomics research has recently benefited from the decreasing costs of DNA sequencing, improvements in technology, development of novel analysis tools and growth in taxonomic databases. However, there are no standardized practices that can guarantee the best result for a given project, and several combinations of software, parameters and databases can be utilized (Escobar-Zepeda et al., 2018). Important advances in microbial ecology, and in many other fields, have been made by studying microbial communities and characterizing their genetic information (Wilmes et al., 2009; Zarraonaindia et al., 2013; Imhoff, 2016; Escobar-Zepeda et al., 2018).

Although the 16S rRNA gene has been widely accepted as a molecular fingerprint for bacterial species, there are some limitations to its use. Some bacterial species have been shown to contain more than one 16S rRNA gene, each with different sequences, leading to the risk of an artificial over-representation of diversity in 16S rRNA-focused studies (Pei et al., 2010; Escobar-Zepeda et al., 2018). The multi-copy nuclear ribosomal DNA internal transcribed spacer (ITS) has been widely used to assess the fungal composition in different environments. Its use in deep sequencing was evaluated by Yang et al. (2018), and their study of pyrosequencing data showed that use of different regions of the ribosomal cluster (e.g. ITS, ITS1 and ITS2) gave variable results with different taxa, but that the taxonomic preferences for ITS and ITS2 were similar. The results indicated that ITS2 alone might be a more suitable marker for revealing the operational taxonomic richness and specificity of fungal communities, although the full-length ITS does contain more information, so that the microbial population can be described in more detail.
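How marker sequences are grouped into operational taxonomic units (OTUs), and how divergent copies of a marker can inflate apparent richness, can be illustrated with a toy greedy clustering. This is a sketch under strong simplifying assumptions: reads are of equal length and already aligned, identity is the simple fraction of matching positions, and the thresholds are arbitrary values chosen for the example. It is not the procedure used in the studies cited above.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length in this toy example")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads: List[str], threshold: float) -> List[Tuple[str, int]]:
    """Assign each read to the first centroid it matches, else start a new OTU.

    Returns a list of (centroid_sequence, read_count) pairs.
    """
    centroids: List[str] = []
    counts: List[int] = []
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                counts[i] += 1
                break
        else:
            # No existing centroid is similar enough: this read founds a new OTU.
            centroids.append(read)
            counts.append(1)
    return list(zip(centroids, counts))


# Hypothetical 20-base marker reads; the second differs from the first at one site,
# as two divergent copies of a marker within one genome might.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",
    "TTTTGGGGCCCCAAAATTTT",
    "TTTTGGGGCCCCAAAATTTT",
]
relaxed = greedy_otus(reads, threshold=0.90)
strict = greedy_otus(reads, threshold=1.00)
print(len(relaxed), len(strict))
```

With the relaxed threshold the near-identical reads collapse into one OTU, whereas demanding exact identity counts them as separate units, which is one way that within-genome variation in a multi-copy marker can be mistaken for additional taxa.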

Technology and Fungal Systematics Saccharomyces Since the first genome of S. cerevisiae was sequenced and published in 1996 (Goffeau et al., 1996), there has been intense research activity, and other complete genomes of Saccharomyces strains have been sequenced. Multiple strains of this yeast have been sequenced and intensively studied by geneticists, molecular biologists and computational scientists across the world (Otero et al., 2010). The first original genome sequence was of 12,068 kb with 5885 potential protein- encoding genes. It had around 140 genes specifying ribosomal RNA, small nuclear RNA molecules and transfer RNA genes (Goffeau et al., 1996). This data set provided information about the 16 chromosomes of S. cerevisiae and allowed insight into their evolutionary history. The genome showed a considerable amount of apparent genetic redundancy, and generated questions about how to elucidate the biological functions of all of the genes in the complete genome (Goffeau et al., 1996). This study opened the path to data repositories, and facilitated the Saccharomyces Genome Database (SGD; www.yeastgenome.org/, accessed 21 October 2020). SGD is one of the most wellknown repositories for the maintenance and annotation of the genome of the original model organism in genomics. It provides information about yeast genes based not only on the literature, but also on the systematic study of every Saccharomyces gene where any information is available (Botstein and Fink, 2011). The S. cerevisiae strain S288C reference genome sequence was updated in 2010 in the first major update since 1996, and this has had significant impacts on the understanding of the eukaryotic genome. The new version, S288C 2010, was determined from a single yeast colony using s econd-generation

sequencing technologies and remains a key reference for yeast genomics (Engel et al., 2014). This milestone revolutionized genomic analysis in yeasts, enabled the first global studies of eukaryotic gene function and expression, and provided initial evidence of the biological complexity of the fungal kingdom. The studies that followed, with the Sch. pombe and N. crassa genomes (Galagan et al., 2003; Wood et al., 2003), demonstrated the limits of yeasts as reference organisms for all other fungi. In particular, analysis of the genome of N. crassa, the first filamentous fungus to be sequenced, showed nearly twice as many genes as in S. cerevisiae and Sch. pombe, and over 40% of these genes lacked homologues to known proteins (Galagan et al., 2005).

The comprehensive work on the yeasts S. cerevisiae and Sch. pombe as model systems has created the basis for understanding eukaryotic biology at the cellular and molecular levels in multicellular organisms such as humans (Cazzanelli et al., 2018). This success can be attributed to simple experimental handling, especially when applying classical and molecular genetic methods to associate genes with proteins and functions within the cell (Botstein et al., 1988). Since 1996, the fraction of the protein-coding genes of Saccharomyces with a known biological role has increased significantly, becoming the highest among the eukaryotes (Botstein and Fink, 2011).

Penicillium

The discovery of penicillin by Fleming (1929) represented one of the most important advances in human history, and provided the basis for antibiotic research. Since then, Penicillium species have been extensively studied, and genome sequence data have been integrated with additional levels of cellular information, for example from metabolome (Nasution et al., 2008) and proteome (Jami et al., 2010) investigations. In particular, penicillin synthesis has been investigated in depth, with the full genome of the strain Wisconsin 54-1255 being published in 2008 as Penicillium chrysogenum (Van Den Berg et al., 2008). Three years later, the same strain was described as Penicillium rubens based on analysis of partial sequences of three loci: β-tubulin, calmodulin and RPB2 (Houbraken et al., 2011). However, in 2014 Wang et al. compared the genomes of an industrial

Genomic Sequences for Fungi

high-penicillin producing strain NCPC10086, labelled as P. chrysogenum, with the genome of Wisconsin 54-1255, referring to both as P. chrysogenum. This shows that despite the large amount of available information, and a keen interest in accurate identification, the taxonomy of Penicillium is still in a state of flux.

Yang et al. (2016) provided further insights into the pathogenic potential of other species of Penicillium, with Penicillium capsulatum as an example of a novel fungal pathogen. This species, first reported as a human pathogen by Chen et al. (2013), has since been investigated to determine the genetic basis of its novel pathogenicity. Genome sequencing and comparative genomics of two clinical and environmental strains of P. capsulatum were undertaken alongside RNA-Seq and phylogenetic analyses (Yang et al., 2016).

Other sequencing initiatives have included investigations into 'blue mould', a postharvest rot of pomaceous fruits caused by Penicillium expansum and a number of other Penicillium species. The genome of the highly aggressive strain P. expansum R19 was resequenced and analysed together with the genome of the less aggressive Penicillium solitum RS1 (Wu et al., 2019). Whole-genome-scale similarities and differences were examined, and phylogenetic analysis of P. expansum, P. solitum and several closely related Penicillium species showed that the two pathogens were directly involved with blue mould symptoms during apple fruit decay.

Aspergillus

Aspergillus is one of the best-studied genera of filamentous fungi, as its species are relevant to several fields including medicine (Aspergillus fumigatus, Aspergillus terreus), the food industry (Aspergillus flavus, Aspergillus parasiticus) and industrial applications (Aspergillus niger, Aspergillus aculeatus, Aspergillus oryzae). Fundamental studies of the model fungus Aspergillus nidulans have contributed broadly to the understanding of eukaryotic cell biology and molecular processes (de Vries et al., 2017). Many species are used in biotechnology for the production of various metabolites such as antibiotics, organic acids, medicines or enzymes, or as agents in various food fermentations (Samson et al., 2014). Traditionally, the classification and identification

of Aspergillus species has been based on morphology, but molecular investigations and genomics have had a significant impact on Aspergillus characterization (McClenny, 2005). Strains of Aspergillus can cause invasive pulmonary aspergillosis (IPA), a major cause of morbidity and mortality in immunocompromised patients (Zhao et al., 2010). Therapeutic success depends on early diagnosis and initiation of antifungal therapy, so rapid and sensitive detection of Aspergillus in clinical samples may facilitate the prompt diagnosis of IPA. Molecular diagnostics have been exploited for this purpose; for example, real-time quantitative PCR (RTqPCR) has been extensively explored as a tool for the detection and identification of Aspergillus and other pathogenic fungi in clinical samples (Kami et al., 2001). Because of its medical importance, many genomes of different species of the genus have been sequenced, and in 1998 the Fungal Infection Trust created The Aspergillus Website repository in partnership with the University of Manchester (UK) and the NHS National Aspergillosis Centre in Manchester (UK) (www.aspergillus.org.uk/, accessed 7 May 2020).

Genomics can be used for early detection of toxigenic strains. The detection of A. flavus, which can cause a broad spectrum of diseases in humans, is of particular interest. It produces an important class of mycotoxins: the aflatoxins. Together with A. fumigatus and A. terreus, A. flavus is recognized as among the most common causes of aspergillosis. As international trade has expanded, aflatoxin contamination has become a serious risk to human and animal health in developing countries, as it can affect key agricultural commodities such as maize, groundnuts, almonds and cottonseed. The continuing threat to the world population from aflatoxin contamination of food, feed and agricultural commodities has made aflatoxin research one of the most rapidly developing areas of study in food security and public health (Razzaghi-Abyaneh et al., 2014). Aflatoxins are potent carcinogenic and mutagenic compounds produced as secondary metabolites by A. flavus and A. parasiticus. A. flavus belongs to Aspergillus section Flavi and is a common soil inhabitant, and is also found in crops and foods at both pre- and postharvest stages. In a recent study, metabarcoding with the ITS region of the ribosomal

R. Baroncelli and G. Cafà

cluster was used to detect strains of A. flavus taken from agricultural soils, for early detection of toxigenic strains (Cafà et al., 2019). Aspergillus section Flavi includes beneficial species as well as harmful ones, such as A. oryzae, which is used in food fermentation and enzyme production. A comparative genomics study of 23 Aspergillus species from section Flavi was carried out in 2020 (Kjærbølling et al., 2020). Other sequencing initiatives included de novo sequencing of species in Aspergillus section Nigri (Vesth et al., 2018). Such experimental and computational analyses showed that secondary metabolism and regulation can significantly impact the delineation of Aspergillus species.

Fusarium

Fusarium species are of particular interest in genomics research, and have been studied extensively, as they are among the most important phytopathogenic and toxigenic fungi. Some Fusarium strains can produce mycotoxins, threatening animal and human health, and some can cause opportunistic mycoses in humans (Marasas et al., 1984; Nucci and Anaissie, 2007; O’Donnell et al., 2015; Waalwijk et al., 2017). Some species can produce industrially applicable enzymes and others are important plant pathogens in many economically important crops (Nelson et al., 1983, 1994). Specific Fusarium strains are also applied as biocontrol agents (Edel-Hermann et al., 2009), while others, such as a strain of Fusarium venenatum, are used for the production of meat alternatives (Wiebe, 2004). This economic importance, in addition to their complex reproduction strategies, has led to many molecular investigations ranging from multilocus sequence typing to WGS (Waalwijk et al., 1996, 2017; Watanabe, 2013; Walder et al., 2017; Lombard et al., 2019). Studies such as these have had consequences for the definition of species and their delimitation.

Historically, the classification of Fusarium was based on morphological characteristics. Compared to other taxa, the species concept has been more challenging in Fusarium, as it must account for strains that share identical phenotypic characteristics yet differ substantially at the genetic level (Waalwijk et al., 2017). Three predominant species concepts have been used to differentiate Fusarium species: morphology, biology

and phylogenetics (Summerell, 2019). As a consequence, several morphological species have been reorganized into multiple biological species. For example, the Fusarium fujikuroi species complex is composed of at least 11 separate mating populations (Martin et al., 2011; Waalwijk et al., 2017). Species discrimination and taxonomy in Fusarium have been developed through a combination of morphology, biology and phylogenetics. F. oxysporum, one of the most economically important and commonly encountered taxa, is a species complex consisting of numerous cryptic species. The identification of these cryptic species is challenging because of multiple subspecific classification systems, and the lack of reference type material has complicated phylogenetic studies (Lombard et al., 2019). An attempt to resolve the taxonomic position of F. oxysporum as a species has been published recently, providing an epitype for the species and names for many of the cryptic species. This study used multilocus phylogenetic inference and subtle morphological differences to identify 15 cryptic taxa described as species (Lombard et al., 2019). Other recent attempts to resolve the taxonomy of F. oxysporum have focused on F. oxysporum f. sp. cubense, the causal agent of Fusarium wilt or Panama disease on banana (Maryani et al., 2019). The study investigated ~200 isolates of F. oxysporum f. sp. cubense sampled from across Indonesia. These strains were assessed by multilocus phylogenetic analyses with partial sequences of the genes for translation elongation factor 1-alpha (TEF-1α), RPB1 and RPB2. The study showed nine independent genetic lineages for F. oxysporum f. sp. cubense, and one novel clade in the F. oxysporum species complex (Maryani et al., 2019).
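As a rough illustration of how sequences from several loci can be combined before tree inference, the following Python sketch concatenates per-locus alignments into a single supermatrix and records the position of each locus. Concatenation is only one common way of handling multilocus data, and the account above does not state that the cited studies used it. The isolate names, locus labels, sequences and the function name concatenate_loci are all invented for illustration.

```python
from typing import Dict, List, Tuple


def concatenate_loci(
    alignments: Dict[str, Dict[str, str]],
    locus_order: List[str],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-locus alignments into one supermatrix.

    alignments maps locus name -> {isolate name -> aligned sequence}.
    Isolates lacking a locus are padded with '?' (missing data).
    Returns the supermatrix and 1-based, inclusive partition boundaries.
    """
    # Every isolate seen in any locus gets a row in the supermatrix.
    isolates = sorted({iso for locus in locus_order for iso in alignments[locus]})
    supermatrix = {iso: "" for iso in isolates}
    partitions = []
    start = 1
    for locus in locus_order:
        seqs = alignments[locus]
        lengths = {len(seq) for seq in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        length = lengths.pop()
        for iso in isolates:
            supermatrix[iso] += seqs.get(iso, "?" * length)
        partitions.append((locus, start, start + length - 1))
        start += length
    return supermatrix, partitions


# Toy data (hypothetical): three loci, three isolates, one missing sequence.
alignments = {
    "tef1": {"iso_A": "ATGC", "iso_B": "ATGT", "iso_C": "ACGT"},
    "rpb1": {"iso_A": "GGA", "iso_B": "GGC"},
    "rpb2": {"iso_A": "TTAAC", "iso_B": "TTAAC", "iso_C": "TTGAC"},
}
matrix, parts = concatenate_loci(alignments, ["tef1", "rpb1", "rpb2"])
```

The partition boundaries are kept so that a separate substitution model could be fitted to each locus in downstream phylogenetic software; padding missing loci with '?' keeps every isolate in the matrix rather than discarding incomplete samples.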
The complexity of Fusarium, and the need to understand the molecular basis of pathogenicity in the genus, has meant that the number of loci employed in multilocus studies of Fusarium has gradually increased (O’Donnell et al., 2008). Wider genomic studies have focused on the comparative genomics of phenotypically diverse species. Ma et al. (2010) investigated the three species Fusarium graminearum, Fusarium verticillioides and F. oxysporum f. sp. lycopersici and identified lineage-specific (LS) genomic regions. LS regions are rich in transposons and genes with distinct evolutionary profiles that are related to pathogenicity, or may be indicative of horizontal acquisition (Ma et al., 2010).

More recently, Zhang et al. (2020) analysed two F. oxysporum isolates from humans in order to examine the role of LS chromosomes in niche adaptation. Other genome investigations of Fusarium species have included F. graminearum (Zhao et al., 2014; King et al., 2015), F. fujikuroi (Wiemann et al., 2013) and Fusarium poae (Vanheule et al., 2016).

Another aspect of particular interest in species of Fusarium is genome structure. Comparative genomics studies of Fusarium have advanced knowledge of the complexity of the genomes of fungal pathogens, and have shown that genomes can be divided into two major components defined as core and accessory regions (Ma et al., 2010; Raffaele and Kamoun, 2012; Vanheule et al., 2016). Accessory regions (ARs) may occur within core chromosomes (CCs) or in wholly dispensable (accessory) chromosomes (ACs). Fungal ACs and ARs tend to accumulate mutations and structural rearrangements more rapidly than CCs, so these regions are of particular interest in plant pathology and include key gene clusters related to host specificity and secondary metabolite synthesis (Bertazzoni et al., 2018).

More recently, genome sequencing of Fusarium species has been used to try to determine biological and taxonomic characteristics. For example, the full genome of F. oxysporum f. sp. vasinfectum, an important plant pathogen responsible for vascular wilt disease in cotton, has been sequenced to provide a resource for comparative genomics (Seo et al., 2020). Other genome-sequencing initiatives have included other strains of F. oxysporum f. sp. cubense (Asai et al., 2019), F. oxysporum f. sp. lycopersici, causal agent of tomato wilt (Henry et al., 2019) and F. oxysporum f. sp. conglutinans, a taxon that can significantly affect the yield and quality of cabbages worldwide (Liu et al., 2019). These studies have contributed to Fusarium systematics, and demonstrated some of the complexity of genome structures in fungi. Additional investigations into Fusarium could further clarify the species concept and determine whether and how core and accessory regions can help us to understand speciation processes in the genus (see Chapter 17).

Colletotrichum

Species in the genus Colletotrichum are among the most important fungal phytopathogens owing to their ubiquity and significant capacity

to infect plants. They are also of scientific importance as hemibiotrophic and evolutionary model systems (Perfect et al., 1999; Dean et al., 2012; Baroncelli et al., 2017). The genus includes nearly 230 species organized into at least 14 major phylogenetic lineages (also known as species complexes), which are responsible for diseases such as anthracnose of nearly every crop grown (Cannon et al., 2012; Marin-Felix et al., 2017; Crous et al., 2019a,b; Damm et al., 2019; Gan et al., 2019).

In addition to the importance of Colletotrichum as a plant pathogen and a model system, the genus is causing concern, as several species have been reported to cause human infections. These include Colletotrichum coccodes, Colletotrichum crassipes, Colletotrichum dematium, Colletotrichum gloeosporioides and Colletotrichum graminicola (Guilherme et al., 2001; Fernandez et al., 2002; Werbel et al., 2019). Colletotrichum species usually cause subcutaneous and systemic infections, most commonly in immunosuppressed patients (Guarro et al., 1998). However, their identification is usually based on morphology or on ITS sequences, and therefore species names associated with human pathogens may not be accurate under the current taxonomy. Werbel et al. (2019) reported a case of invasive, cutaneous infection due to Colletotrichum in a stem cell transplant recipient. An extensive multilocus phylogenetic analysis placed the isolate in Colletotrichum siamense, a member of the C. gloeosporioides species complex that had not previously been described as a human pathogen. C. siamense is a cosmopolitan plant pathogen causing severe diseases in many economically important plants. In recent years, there has been some discussion among taxonomists as to whether C. siamense is a single species or a complex comprising up to seven species. Liu et al. (2016) used GCPSR and coalescent methods with multiple loci to show the lack of any independent evolutionary lineages within C. siamense, confirming the view that it constitutes a single distinct species.

Over the last 20 years, significant progress has been made in understanding the diversity of Colletotrichum, but the taxonomy of this genus is still in a state of flux and this lack of clarity has generated major controversy. One example of this relates to Colletotrichum kahawae. This species is an extremely destructive and specialized pathogen of coffee, causing what is known as coffee berry disease (CBD). While other species can

infect mature berries, the CBD pathogen is distinguished by its ability to infect green, developing coffee berries (Waller et al., 1993). This pathogen is confined to countries in Africa where Coffea arabica is grown, and can cause yield losses of up to 80% (Silva et al., 2006; van der Vossen and Walyaro, 2008; Hindorf and Omondi, 2011). There is concern that it could spread to other coffee-growing regions, and it is ranked as a quarantine pathogen and even as a biological weapon (Australia Group, 2014; Batista et al., 2017). Weir et al. (2012) used a 6-gene phylogeny to place CBD-causing strains in C. kahawae subsp. kahawae and the phylogenetically close non-CBD-causing isolates in C. kahawae subsp. cigarro. Cabral et al. (2020) subsequently used a much wider set of molecular genomic loci (selected from whole-genome functional studies), together with pathological, morphological, biochemical, cytogenomic and biological features, to show that the CBD pathogen and the closely related non-pathogenic strains were better considered as separate, distinct species (C. kahawae and C. cigarro).

Comparative analyses and transcription profiles of four Colletotrichum genome sequences have been used to investigate host interaction. The comparisons were between Colletotrichum higginsianum and C. graminicola (O’Connell et al., 2012), and Colletotrichum orbiculare and Colletotrichum fructicola (Gan et al., 2013). These studies showed that Colletotrichum species have large sets of pathogenicity-related genes compared to other closely related pathogens. Genome-wide expression profiling showed that pathogenicity-related genes were transcribed in successive waves that were linked to the lifestyle of the pathogens. Genes encoding candidate effectors and key secondary metabolism enzymes were induced in the appressoria and during the biotrophic phase, while most degrading enzymes and transporters were upregulated at the switch to the necrotrophic phase. Since 2012, when the first four genomes of Colletotrichum were released, an increasing number of pathogenic Colletotrichum species have been sequenced, leading to the identification of several putative genes with key roles in pathogenicity, and providing signatures of host adaptation (Baroncelli et al., 2016; Gan et al., 2016). Most of the genomes sequenced so far have been

selected to cover the diversity in the genus and have been from different species and complexes. As a result, the diversity shown in these studies is much wider than expected. This demonstrates the need for greater taxonomic resolution in order to better understand the evolution of fungal genomes and the possible association with biological characters. Inevitably, the recent revisions in Colletotrichum taxonomy have affected information released on newly sequenced genomes. In March 2020, almost 70 genomes of Colletotrichum were available on GenBank; of these, at least five were released with an incorrect taxonomic designation (unpublished data). Although most genomics research to date has focused on identifying genes involved in pathogenicity, the genus Colletotrichum may provide a suitable platform for understanding the genetic bases of speciation, and for finding a pragmatic approach to the use of WGS to resolve taxonomic ambiguities (see Chapter 17).

Discussion and Conclusion

We have established that genomics is the field of genetics that aims to combine DNA sequencing techniques and bioinformatics tools to assemble, annotate and investigate genomes. Since the discovery of the structure of DNA, technological advances have allowed genomes to be completely described and linked to the biological traits of fungi. However, data are being generated faster than they can be interpreted in detail, and this has created a gap between the data and their interpretation. Fungal taxonomy is dynamic and this has led to considerable changes in generic and species concepts in many fungal groups. The development of genome-sequencing technology has resulted in a complex range of methods, which at times creates almost too many possibilities for the researcher. Many genomes have only been partially sequenced, and these data have not always been fully interpreted. In addition, the large amount of data has created a need for reliable data repositories. These sources/databases are difficult to maintain and require skilled personnel and funds for their development. Genomics has opened up the field to new studies, such as comparative genomics and population studies, by using complete genome

sequencing to resolve taxonomic ambiguities. This has, on one hand, resulted in the generation of an enormous amount of data that have contributed to clarifying key biological questions. On the other hand, however, there is a challenge in providing clear interpretation of most data sets. Sequencing initiatives have involved a very small proportion of the estimated 100,000 fungal taxa. In this chapter, we provide an overview of five major sequencing initiatives that have contributed to the expansion of the field of genomics. The genome of S. cerevisiae, for example, was the first eukaryotic genome ever sequenced and represented a major milestone in genomics. Penicillium is important for antibiotic production; Aspergillus and Fusarium are notable for the medical, food and industrial relevance of some of their species. Finally, with almost 70 genomes covering 35 species, Colletotrichum may be used as a model system that provides a suitable platform for understanding the genetic basis of speciation and other evolutionary processes.

Genomics has progressed enormously in the last 20 years, but it has only touched the surface of the questions that it can and will answer in the near future. Most of the research in fungal genomics so far has focused on identifying genes with specific functions, in order to explore the evolution of specific lineages. However, complete genomes are yet to be fully exploited to determine the genetic basis of biological traits such as those associated with speciation and pathogenicity in fungi. Quality control of data in GenBank is improving, with regular and consistent development

of the standard validation of genome sequence data. For example, submitted genomes undergo standard validations, which include screening for foreign contaminants and vector sequences, as well as checks on the organism identification. Any annotated assemblies that do not pass these validations may need to be modified prior to being released to the public. However, a more comprehensive standardization of data submission and release should be considered during the peer-review process. Researchers should release raw data, genome assembly and gene prediction based on the requirements of their work. For example, for those who describe the gene content of a genome, the submission of gene predictions should be compulsory. In addition, all the strains under investigation should be deposited in registered culture collections, so that development and reproducibility of the research can be guaranteed by the accessibility of original strains. It is likely that fungal genome sequencing will soon become simpler and cheaper, and that sequencers will be accessible enough for most research laboratories to undertake in-house WGS on a regular basis. Nonetheless, most of the innovation in the next decade will be driven by new theoretical perspectives and fields of investigation, rather than by novel technical approaches.

Acknowledgements

Riccardo Baroncelli was supported by grant RTI2018-093611-B-I00 from the Ministerio de Ciencia, Innovación y Universidades, Spain.

References

Aguinaga, O.E., McMahon, A., White, K.N., Dean, A.P. and Pittman, J.K. (2018) Microbial community shifts in response to acid mine drainage pollution within a natural wetland ecosystem. Frontiers in Microbiology 9, 1–14. https://doi.org/10.3389/fmicb.2018.01445
Akalin, A. (2019) Computational Genomics with R. https://compgenomr.github.io/book/index.html
Asai, K. (2019) High-quality draft genome sequence of Fusarium oxysporum f. sp. cubense strain 160527, a causal agent of Panama disease. Microbiology Resource Announcements June, 27–29.
Australia Group (2014) Australia Group Common Control List Handbook – Volume II: Biological Weapons-Related Common Control Lists. Available online at: http://www.australiagroup.net
Baroncelli, R., Amby, D.B., Zapparata, A., Sarrocco, S., Vannacci, G., Le Floch, G., Harrison, R.J., Holub, E., Sukno, S.A., Sreenivasaprasad, S. and Thon, M.R. (2016) Gene family expansions and contractions are associated with host range in plant pathogens of the genus Colletotrichum. BMC Genomics 17 (1), 1–17. https://doi.org/10.1186/s12864-016-2917-6

Baroncelli, R., Talhinhas, P., Pensec, F., Sukno, S.A., Le Floch, G. and Thon, M.R. (2017) The Colletotrichum acutatum species complex as a model system to study evolution and host specialization in plant pathogens. Frontiers in Microbiology 8, 1–7. https://doi.org/10.3389/fmicb.2017.02001
Batista, D., Silva, D.N., Vieira, A., Cabral, A., Pires, A.S., Loureiro, A., Guerra-Guimarães, L., Pereira, A.P., Azinheira, H., Talhinhas, P., Silva, M. do C. and Várzea, V. (2017) Legitimacy and implications of reducing Colletotrichum kahawae to subspecies in plant pathology. Frontiers in Plant Science 7, 1–4. https://doi.org/10.3389/fpls.2016.02051
Berger, S.L., Kouzarides, T., Shiekhattar, R. and Shilatifard, A. (2009) An operational definition of epigenetics. Genes and Development 23 (7), 781–783. https://doi.org/10.1101/gad.1787609
Bertazzoni, S., Williams, A.H., Jones, D.A., Syme, R.A., Tan, K.-C. and Hane, J.K. (2018) Accessories make the outfit: accessory chromosomes and other dispensable DNA regions in plant-pathogenic fungi. Molecular Plant-Microbe Interactions 31 (8), 779–788. https://doi.org/10.1094/mpmi-06-17-0135-fi
Boluda, C.G., Rico, V.J., Divakar, P.K., Nadyeina, O., Myllys, L., McMullin, R.T., Zamora, J.C., Scheidegger, C. and Hawksworth, D.L. (2019) Evaluating methodologies for species delimitation: The mismatch between phenotypes and genotypes in lichenized fungi (Bryoria sect. implexae, Parmeliaceae). Persoonia: Molecular Phylogeny and Evolution of Fungi 42, 75–100. https://doi.org/10.3767/persoonia.2019.42.04
Botstein, D. et al. (1988) Yeast: an experimental organism for modern biology. Science 240, 1439–1443. https://doi.org/10.1126/science.3287619
Botstein, D. and Fink, G.R. (2011) Yeast: An Experimental Organism for 21st Century Biology. Genetics 189 (3), 695–704. https://doi.org/10.1534/genetics.111.130765
Cabral, A., Azinheira, H.G., Talhinhas, P., Batista, D., Ramos, A.P., Silva, M.D.C., Oliveira, H. and Várzea, V. (2020) Pathological, morphological, cytogenomic, biochemical and molecular data support the distinction between Colletotrichum cigarro comb. et stat. nov. and Colletotrichum kahawae. Plants 9 (4), 1–22. https://doi.org/10.3390/plants9040502
Cafà, G., Caggiano, B., Reeve, M.A., Bhatti, H., Honey, S.F., Bajwa, B. and Buddie, A.G. (2019) A polyphasic approach aids early detection of potentially toxigenic aspergilli in soil. Microorganisms 7 (9). https://doi.org/10.3390/microorganisms7090300
Cai, L., Giraud, T., Zhang, N., Begerow, D., Cai, G. and Shivas, R.G. (2011) The evolution of species concepts and species recognition criteria in plant pathogenic fungi. Fungal Diversity 50, 121–133. https://doi.org/10.1007/s13225-011-0127-8
Cannon, P.F., Damm, U., Johnston, P.R. and Weir, B.S. (2012) Colletotrichum - current status and future directions. Studies in Mycology 73, 181–213. https://doi.org/10.3114/sim0014
Castroagudín, V.L., Moreira, S.I., Pereira, D.A.S., Moreira, S.S., Brunner, P.C., Maciel, J.L.N., Crous, P.W., McDonald, B.A., Alves, E. and Ceresini, P.C. (2016) Pyricularia graminis-tritici, a new Pyricularia species causing wheat blast. Persoonia: Molecular Phylogeny and Evolution of Fungi 37, 199–216. https://doi.org/10.3767/003158516X692149
Cazzanelli, G., Pereira, F., Alves, S., Francisco, R., Azevedo, L., Dias Carvalho, P., Almeida, A., Côrte-Real, M., Oliveira, M., Lucas, C., Sousa, M. and Preto, A. (2018) The yeast Saccharomyces cerevisiae as a model for understanding RAS proteins and their role in human tumorigenesis. Cells (2). https://doi.org/10.3390/cells7020014
Chen, M., Houbraken, J., Pan, W., Zhang, C., Peng, H., Wu, L., Xu, D., Xiao, Y., Wang, Z. and Liao, W. (2013) Pulmonary fungus ball caused by Penicillium capsulatum in a patient with type 2 diabetes: A case report. BMC Infectious Diseases 13 (1), 1. https://doi.org/10.1186/1471-2334-13-496
Couch, B.C. and Kohn, L.M. (2002) A multilocus gene genealogy concordant with host preference indicates segregation of a new species, Magnaporthe oryzae, from M. grisea. Mycologia 94 (4), 683–693. https://doi.org/10.1080/15572536.2003.11833196
Crous, P.W., Carnegie, A.J., Wingfield, M.J., Sharma, R., Mughini, G., Noordeloos, M.E., Santini, A., Shouche, Y.S., Bezerra, J.D.P. et al. (2019a) Fungal Planet description sheets: 868–950. Persoonia: Molecular Phylogeny and Evolution of Fungi 42, 291–473. https://doi.org/10.3767/persoonia.2019.42.11
Crous, P.W., Wingfield, M.J., Lombard, L., Roets, F., Swart, W.J., Alvarado, P., Carnegie, A.J., Moreno, G., Luangsa-Ard, J., Thangavel, R. et al. (2019b) Fungal Planet description sheets: 951–1041. Persoonia: Molecular Phylogeny and Evolution of Fungi 43, 223–425. https://doi.org/10.3767/persoonia.2019.43.06
Cuomo, C.A. and Birren, B.W. (2010) The Fungal Genome Initiative and lessons learned from genome sequencing. Methods in Enzymology 470, 833–855. https://doi.org/10.1016/S0076-6879(10)70034-3

Damm, U., Sato, T., Alizadeh, A., Groenewald, J.Z. and Crous, P.W. (2019) The Colletotrichum dracaenophilum, C. magnum and C. orchidearum species complexes. Studies in Mycology 92, 1–46. https://doi.org/10.1016/j.simyco.2018.04.001
de Vries, R.P., Riley, R., Wiebenga, A., Aguilar-Osorio, G., Amillis, S., Uchima, C.A., Anderluh, G., Asadollahi, M., Askin, M., Barry, K., Battaglia, E., Bayram, Ö., Benocci, T., Braus-Stromeyer, S.A., Caldana, C., Cánovas, D., Cerqueira, G.C., Chen, F., Chen, W. et al. (2017) Comparative genomics reveals high biological diversity and specific adaptations in the industrially and medically important fungal genus Aspergillus. Genome Biology 18 (1). https://doi.org/10.1186/s13059-017-1151-0
Dean, R.A., Talbot, N.J., Ebbole, D.J., Farman, M.L., Mitchell, T.K., Orbach, M.J., Thon, M., Kulkarni, R., Xu, J.R., Pan, H., Read, N.D., Lee, Y.I., Carbone, I., Brown, D., Yeon, Y.O., Donofrio, N., Jun, S.J., Soanes, D.M., Djonovic, S. et al. (2005) The genome sequence of the rice blast fungus Magnaporthe grisea. Nature 434 (7036), 980–986. https://doi.org/10.1038/nature03449
Dean, R., Van Kan, J.A.L., Pretorius, Z.A., Hammond-Kosack, K.E., Di Pietro, A., Spanu, P.D., Rudd, J.J., Dickman, M., Kahmann, R., Ellis, J. and Foster, G.D. (2012) The Top 10 fungal pathogens in molecular plant pathology. Molecular Plant Pathology 13 (4), 414–430. https://doi.org/10.1111/j.1364-3703.2011.00783.x
Drmanac, R., Sparks, A.B., Callow, M.J., Halpern, A.L., Burns, N.L., Kermani, B.G., Carnevali, P., Nazarenko, I., Nilsen, G.B., Yeung, G., Dahl, F., Fernandez, A., Staker, B., Pant, K.P., Baccash, J., Borcherding, A.P., Brownley, A., Cedeno, R., Chen, L. et al. (2010) Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327 (5961), 78–81. https://doi.org/10.1126/science.1181498
Dujon, B., Sherman, D., Fischer, G., Durrens, P., Casaregola, S., Lafontaine, I., De Montigny, J., Marck, C., Neuvéglise, C., Talla, E., Goffard, N., Frangeul, L., Aigle, M., Anthouard, V., Babour, A., Barbe, V., Barnay, S., Blanchin, S., Beckerich, J.M. et al. (2004) Genome evolution in yeasts. Nature 430 (6995), 35–44. https://doi.org/10.1038/nature02579
Edel-Hermann, V., Brenot, S., Gautheron, N., Aimé, S., Alabouvette, C. and Steinberg, C. (2009) Ecological fitness of the biocontrol agent Fusarium oxysporum Fo47 in soil and its impact on the soil microbial communities. FEMS Microbiology Ecology 68 (1), 37–45. https://doi.org/10.1111/j.1574-6941.2009.00656.x
Eid, J., Fehr, A., Gray, J., Luong, K., Lyle, J., Otto, G., Peluso, P., Rank, D., Baybayan, P., Bettman, B., Bibillo, A., Bjornson, K., Chaudhuri, B., Christians, F., Cicero, R., Clark, S., Dalal, R., deWinter, A., Dixon, J. et al. (2009) Real-time DNA sequencing from single polymerase molecules. Science 323 (5910), 133–138. https://doi.org/10.1126/science.1162986
Endler, J.A. (1989) Conceptual and other problems in speciation. In Otte, D. and Endler, J.A. (eds) Speciation and its Consequences. Sinauer Associates, Sunderland, Massachusetts. ISBN 978-0-87893-658-8.
Ene, I.V., Farrer, R.A., Hirakawa, M.P., Agwamba, K., Cuomo, C.A. and Bennett, R.J. (2018) Global analysis of mutations driving microevolution of a heterozygous diploid fungal pathogen. Proceedings of the National Academy of Sciences of the United States of America 115 (37), E8688–E8697. https://doi.org/10.1073/pnas.1806002115
Engel, S.R., Dietrich, F.S., Fisk, D.G., Binkley, G., Balakrishnan, R., Costanzo, M.C., Dwight, S.S., Hitz, B.C., Karra, K., Nash, R.S., Weng, S., Wong, E.D., Lloyd, P., Skrzypek, M.S., Miyasato, S.R., Simison, M. and Cherry, J.M. (2014) The reference genome sequence of Saccharomyces cerevisiae: then and now. G3: Genes, Genomes, Genetics 4 (3), 389–398. https://doi.org/10.1534/g3.113.008995
Escobar-Zepeda, A., Godoy-Lozano, E.E., Raggi, L., Segovia, L., Merino, E., Gutiérrez-Rios, R.M., Juarez, K., Licea-Navarro, A.F., Pardo-Lopez, L. and Sanchez-Flores, A. (2018) Analysis of sequencing strategies and tools for taxonomic annotation: Defining standards for progressive metagenomics. Scientific Reports 8 (1), 1–13. https://doi.org/10.1038/s41598-018-30515-5
Fang, C., Ma, Y., Wu, S., Liu, Z., Wang, Z., Yang, R., Hu, G., Zhou, Z., Yu, H., Zhang, M., Pan, Y., Zhou, G., Ren, H., Du, W., Yan, H., Wang, Y., Han, D., Shen, Y., Liu, S. et al. (2017) Genome-wide association studies dissect the genetic networks underlying agronomical traits in soybean. Genome Biology 18 (1). https://doi.org/10.1186/s13059-017-1289-9
Fernandez, V., Dursun, D., Miller, D. and Alfonso, E.C. (2002) Colletotrichum keratitis. American Journal of Ophthalmology 134 (3), 435–438. https://doi.org/10.1016/S0002-9394(02)01576-3
Fleming, A. (1929) On the antibacterial action of cultures of a Penicillium, with special reference to their use in the isolation of B. influenzæ. British Journal of Experimental Pathology 10 (3), 226–236. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2048009/
Galagan, J.E., Calvo, S.E., Borkovich, K.A., Selker, E.U., Read, N.O., Jaffe, D., FitzHugh, W., Ma, L.J., Smirnov, S., Purcell, S., Rehman, B., Elkins, T., Engels, R., Wang, S., Nielsen, C.B., Butler, J.,

Endrizzi, M., Qui, D., Ianakiev, P. et al. (2003) The genome sequence of the filamentous fungus Neurospora crassa. Nature 422 (6934), 859–868. https://doi.org/10.1038/nature01554 Galagan, J.E., Henn, M.R., Ma, L.J., Cuomo, C.A. and Birren, B. (2005) Genomics of the fungal kingdom: Insights into eukaryotic biology. Genome Research 15 (12), 1620–1631. https://doi.org/10.1101/ gr.3767105 Gan, P., Tsushima, A., Hiroyama, R., Narusaka, M., Takano, Y., Narusaka, Y., Kawaradani, M., Damm, U. and Shirasu, K. (2019) Colletotrichum shisoi sp. nov., an anthracnose pathogen of Perilla frutescens in Japan: molecular phylogenetic, morphological and genomic evidence. Scientific Reports 9 (1), 1–13. https://doi.org/10.1038/s41598-019-50076-5 Gan, P., Ikeda, K., Irieda, H., Narusaka, M., O’Connell, R.J., Narusaka, Y., Takano, Y., Kubo, Y. and Shirasu, K. (2013) Comparative genomic and transcriptomic analyses reveal the hemibiotrophic stage shift of Colletotrichum fungi. New Phytologist 197 (4), 1236–1249. https://doi.org/10.1111/nph.12085 Gan, P., Narusaka, M., Kumakura, N., Tsushima, A., Takano, Y., Narusaka, Y. and Shirasu, K. (2016) Genus-wide comparative genome analyses of Colletotrichum species reveal specific gene family losses and gains during adaptation to specific infection lifestyles. Genome Biology and Evolution 8 (5), 1467–1481. https://doi.org/10.1093/gbe/evw089 Giraud, T., Refrégier, G., Le Gac, M., de Vienne, D.M. and Hood, M.E. (2008) Speciation in fungi. Fungal Genetics and Biology 45 (6), 791–802. https://doi.org/10.1016/j.fgb.2008.02.001 Gladieux, P., Ravel, S., Rieux, A., Cros-Arteil, S., Adreit, H., Milazzo, J., Thierry, M., Fournier, E., Terauchi, R. and Tharreau, D. (2018) Coexistence of multiple endemic and pandemic lineages of the rice blast pathogen. MBio 9 (2), 1–18. 
https://doi.org/10.1128/mBio.01806-17 Goffeau, A., Barrell, G., Bussey, H., Davis, R.W., Dujon, B., Feldmann, H., Galibert, F., Hoheisel, J.D., Jacq, C., Johnston, M., Louis, E.J., Mewes, H.W., Murakami, Y., Philippsen, P., Tettelin, H. and Oliver, S. G. (1996). Life with 6000 genes. Science 274 (5287), 546–567. https://doi.org/10.1126/science.274. 5287.546 Grigoriev, I.V., Cullen, D., Goodwin, S.B., Hibbett, D., Jeffries, T. W., Kubicek, C.P., Kuske, C., Magnuson, J.K., Martin, F., Spatafora, J.W., Tsang, A. and Baker, S.E. (2011) Fueling the future with fungal genomics. Mycology 2 (3), 192–209. https://doi.org/10.1080/21501203.2011.584577 Grigoriev, I.V., Nikitin, R., Haridas, S., Kuo, A., Ohm, R., Otillar, R., Riley, R., Salamov, A., Zhao, X., Korzeniewski, F., Smirnova, T., Nordberg, H., Dubchak, I. and Shabalov, I. (2014). MycoCosm portal: Gearing up for 1000 fungal genomes. Nucleic Acids Research 42 (D1), 699–704. https://doi. org/10.1093/nar/gkt1183 Guarro, J., Svidzinski, T.E., Zaror, L., Forjaz, M.H., Gené, J. and Fischman, O. (1998) Subcutaneous hyalohyphomycosis caused by Colletotrichum gloeosporioides. Journal of Clinical Microbiology 36 (10), 3060–3065. https://doi.org/10.1128/jcm.36.10.3060-3065.1998 Guilherme, L., Castro, M., Da, C., Lacaz, S., Guarro, J., Heins-vaccari, E.M., Santos, R., Leite, D.E.F., Herna, C.I.A., Arriagada, N., Ito, E.M., Yuriko, N., Valente, S. and Nunes, R.S. (2001). Phaeohyphomycotic cyst caused by Colletotrichum crassipes. Journal of Clinical Microbiology 39 (6), 2321–2324. https://doi.org/10.1128/JCM.39.6.2321 Handelsman, J. (2004) Metagenomics: application of genomics to uncultured microorganisms. Microbiology and Molecular Biology Reviews 68 (4), 669–685. https://doi.org/10.1128/MMBR.68.4.669-685.2004 Hartl, D.L. and Clark, A.G. (2007) Principles of Population Genetics, 4th edn. Sinauer Assoc. 652 pp. Henry, P.M., Stueven, M., Li, S., Miyao, E.M., Gordon, T.R., Davis, R.M. and Doan, H.K. 
(2019) Genome sequence of a California isolate of Fusarium oxysporum f. sp. lycopersici race 3, a fungus causing wilt disease on tomato. Microbiology Resource Announcements 8 (15), 1–2. https://doi.org/10.1128/ mra.01713-18 Hindorf, H. and Omondi, C.O. (2011) A review of three major fungal diseases of Coffea arabica L. in the rainforests of Ethiopia and progress in breeding for resistance in Kenya. Journal of Advanced Research 2 (2), 109–120. https://doi.org/10.1016/j.jare.2010.08.006 Houbraken, J., Frisvad, J.C. and Samson, R.A. (2011) Fleming’s penicillin producing strain is not Penicillium chrysogenum but P. rubens. IMA Fungus 2 (1), 87–95. https://doi.org/10.5598/imafungus. 2011.02.01.12 Howorka, S., Cheley S. and Bayley, H. (2001) Sequence-specific detection of individual DNA strands using engineered nanopores. Nature Biotechnology 19, 636–639. Imhoff, J. (2016) New dimensions in microbial ecology—functional genes in studies to unravel the biodiversity and role of functional microbial groups in the environment. Microorganisms 4 (2), 19. https://doi. org/10.3390/microorganisms4020019

Jami, M.S., Barreiro, C., García-Estrada, C. and Martín, J. F. (2010) Proteome analysis of the penicillin producer Penicillium chrysogenum: Characterization of protein changes during the industrial strain improvement. Molecular and Cellular Proteomics 9 (6), 1182–1198. https://doi.org/10.1074/mcp. M900327-MCP200 Kami, M., Fukui, T., Ogawa, S., Kazuyama, Y., Machida, U., Tanaka, Y., Kanda, Y., Kashima, T., Yamazaki, Y., Hamaki, T., Mori, S., Akiyama, H., Mutou, Y., Sakamaki, H., Osumi, K., Kimura, S. and Hirai, H. (2001) Use of Real-Time PCR on blood samples for diagnosis of invasive Aspergillosis. Clinical Infectious Diseases 33 (9), 1504–1512. https://doi.org/10.1086/323337 Kawashima, E.H., Farinelli, L. and Mayer, P. (1998) “Patent: Method of nucleic acid amplification”. Keller, N.P., Turner, G. and Bennett, J.W. (2005) Fungal secondary metabolism - From biochemistry to genomics. Nature Reviews Microbiology 3 (12), 937–947. https://doi.org/10.1038/nrmicro1286 King, R., Urban, M., Hammond-Kosack, M.C.U., Hassani-Pak, K. and Hammond-Kosack, K.E. (2015) The completed genome sequence of the pathogenic ascomycete fungus Fusarium graminearum. BMC Genomics 16 (1), 1–21. https://doi.org/10.1186/s12864-015-1756-1 Kjærbølling, I., Vesth, T., Frisvad, J.C., Nybo, J.L., Theobald, S., Kildgaard, S., Petersen, T.I., Kuo, A., Sato, A., Lyhne, E.K., Kogle, M.E., Wiebenga, A., Kun, R.S., Lubbers, R.J.M., Mäkelä, M.R., Barry, K., Chovatia, M., Clum, A., Daum, C. et al. (2020) A comparative genomics study of 23 Aspergillus species from section Flavi. Nature Communications. https://doi.org/10.1038/s41467-019-14051-y Korte, A. and Farlow, A. (2013) The advantages and limitations of trait analysis with GWAS: A review. Plant Methods 9 (1), 1. https://doi.org/10.1186/1746-4811-9-29 Kuhn, G., Hijri, M. and Sanders, I.R. (2001) Evidence for the evolution of multiple genomes in arbuscular mycorrhizal fungi. Nature 414 (6865), 745–748. https://doi.org/10.1038/414745a Kukurba, K.R. 
and Montgomery, S.B. (2015) RNA sequencing and analysis. Cold Spring Harbor Protocols 2015 (11), 951–969. https://doi.org/10.1101/pdb.top084970 Liu, F., Wang, M., Damm, U., Crous, P.W. and Cai, L. (2016) Species boundaries in plant pathogenic fungi: A Colletotrichum case study. BMC Evolutionary Biology 16 (1), 1–14. https://doi.org/10.1186/s12862016-0649-5 Liu, X., Xing, M., Kong, C., Fang, Z., Yang, L., Zhang, Y., Wang, Y., Ling, J., Yang, Y. and Lv, H. (2019) Genetic diversity, virulence, race profiling, and comparative genomic analysis of the Fusarium oxysporum f. sp. conglutinans strains infecting cabbages in China. Frontiers in Microbiology 10, 1–14. https://doi.org/10.3389/fmicb.2019.01373 Lombard, L., Sandoval-Denis, M., Lamprecht, S.C. and Crous, P.W. (2019). Epitypification of Fusarium oxysporum – Clearing the taxonomic chaos. Persoonia: Molecular Phylogeny and Evolution of Fungi 43, 1–47. https://doi.org/10.3767/persoonia.2019.43.01 Lowe, W.L., & Reddy, T.E. (2015) Genomic approaches for understanding the genetics of complex disease. Genome Research 25 (10), 1432–1441. https://doi.org/10.1101/gr.190603.115 Ma, L.J., Van Der Does, H.C., Borkovich, K.A., Coleman, J.J., Daboussi, M.J., Di Pietro, A., Dufresne, M., Freitag, M., Grabherr, M., Henrissat, B., Houterman, P.M., Kang, S., Shim, W.B., Woloshuk, C., Xie, X., Xu, J.R., Antoniw, J., Baker, S.E., Bluhm, B. H. et al. (2010) Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464 (7287), 367–373. https://doi.org/10.1038/nature08850 Marasas, W.F.O., Nelson, P.E. and Toussoun, T.A. (1984) Toxigenic Fusarium species: identity and mycotoxicology. Pennsylvania State University Press, University Park, Pennsylvania. Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Irzyk, G.P. et al. (2005). 
Genome sequencing in microfabricated high-density picolitre reactors. Nature 437 (7057), 376–380. https://doi.org/10.1038/nature03959 Marin-Felix, Y., Groenewald, J. Z., Cai, L., Chen, Q., Marincowitz, S., Barnes, I., Bensch, K., Braun, U., Camporesi, E., Damm, U., de Beer, Z.W., Dissanayake, A., Edwards, J., Giraldo, A., HernándezRestrepo, M., Hyde, K.D., Jayawardena, R.S., Lombard, L., Luangsa-ard, J. et al. (2017) Genera of phytopathogenic fungi: GOPHY 1. Studies in Mycology 86, 99–216. https://doi.org/10.1016/j.simyco.2017.04.002 Martin, S.H., Wingfield, B.D., Wingfield, M.J. and Steenkamp, E. T. (2011) Causes and consequences of variability in peptide mating pheromones of ascomycete fungi. Molecular Biology and Evolution 28 (7), 1987–2003. https://doi.org/10.1093/molbev/msr022 Martinez, D., Larrondo, L.F., Putnam, N., Sollewijn Gelpke, M.D., Huang, K., Chapman, J., Helfenbein, K.G., Ramaiya, P., Detter, J.C., Larimer, F., Coutinho, P.M., Henrissat, B., Berka, R., Cullen, D. and

Rokhsar, D. (2004) Genome sequence of the lignocellulose degrading fungus Phanerochaete chrysosporium strain RP78. Nature Biotechnology 22 (6), 695–700. https://doi.org/10.1038/nbt967 Maryani, N., Lombard, L., Poerba, Y.S., Subandiyah, S., Crous, P.W. and Kema, G.H.J. (2019) Phylogeny and genetic diversity of the banana Fusarium wilt pathogen Fusarium oxysporum f. sp. cubense in the Indonesian centre of origin. Studies in Mycology 92, 155–194. https://doi.org/10.1016/j.simyco.2018.06.003 Matute, D.R. and Sepúlveda, V.E. (2019) Fungal species boundaries in the genomics era. Fungal Genetics and Biology 131, 103249. https://doi.org/10.1016/j.fgb.2019.103249 McClenny, N. (2005) Laboratory detection and identification of Aspergillus species by microscopic observation and culture: The traditional approach. Medical Mycology 43 (1), 125–128. https://doi.org/10.1080/13693780500052222 Merriman, B., Ion Torrent R&D Team and Rothberg, J.M. (2012) Progress in ion torrent semiconductor chip based sequencing. Electrophoresis 33, 3397–3417. https://doi.org/10.1002/elps.201200424 Moreno-Moral, A. and Petretto, E. (2016) From integrative genomics to systems genetics in the rat to link genotypes to phenotypes. DMM Disease Models and Mechanisms 9 (10), 1097–1110. https://doi.org/10.1242/dmm.026104 Mukherjee, S., Stamatis, D., Bertsch, J., Ovchinnikova, G., Verezemska, O., Isbandi, M., Thomas, A.D., Ali, R., Sharma, K., Kyrpides, N.C. and Reddy, T.B.K. (2017) Genomes OnLine Database (GOLD) v.6: Data updates and feature enhancements. Nucleic Acids Research 45 (D1), D446–D456. https://doi.org/10.1093/nar/gkw992 Nasution, U., van Gulik, W.M., Ras, C., Proell, A. and Heijnen, J.J. (2008) A metabolome study of the steady-state relation between central metabolism, amino acid biosynthesis and penicillin production in Penicillium chrysogenum. Metabolic Engineering 10 (1), 10–23. https://doi.org/10.1016/j.ymben.2007.07.001 Nelson, P.E., Toussoun, T.A. and Marasas, W.F.
(1983) Fusarium species: an illustrated manual for identification. Pennsylvania State University Press, University Park, Pennsylvania. Nelson, P.E., Dignani, M.C. and Anaissie, E.J. (1994) Taxonomy, biology, and clinical aspects of Fusarium species. Clinical Microbiology Reviews 7 (4), 479–504. https://doi.org/10.1128/CMR.7.4.479 Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Abigail, W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Evan, E., Bamshad, M., Nickerson, D. and Shendure, J. (2010) Targeted capture and massively parallel sequencing of twelve human exomes. Nature 461 (7261), 272–276. https://doi.org/10.1038/nature08250 Nowrousian, M. (2010) Next-generation sequencing techniques for eukaryotic microorganisms: Sequencing-based solutions to biological problems. Eukaryotic Cell 9 (9), 1300–1310. https://doi.org/10.1128/EC.00123-10 NRC (2007a) National Research Council (US) Committee on Metagenomics: Challenges and Functional Applications. The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. National Academies Press (US), Washington DC. Available from: https://www.ncbi.nlm.nih.gov/books/NBK54006/ https://doi.org/10.17226/11902 NRC (2007b) The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet. National Research Council (US) Committee on Metagenomics: Challenges and Functional Applications. National Academies Press (US), Washington DC. https://doi.org/10.17226/11902 Nucci, M. and Anaissie, E. (2007) Fusarium infections in immunocompromised patients. Clinical Microbiology Reviews 20 (4), 695–704. https://doi.org/10.1128/CMR.00014-07 O’Connell, R.J., Thon, M.R., Hacquard, S., Amyotte, S.G., Kleemann, J., Torres, M.F., Damm, U., Buiate, E.A., Epstein, L., Alkan, N., Altmüller, J., Alvarado-Balderrama, L., Bauser, C.A., Becker, C., Birren, B.W., Chen, Z., Choi, J., Crouch, J.A., Duvick, J.P. et al.
(2012) Lifestyle transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nature Genetics 44 (9), 1060–1065. https://doi.org/10.1038/ng.2372 O’Donnell, K., Sutton, D.A., Fothergill, A., McCarthy, D., Rinaldi, M.G., Brandt, M.E., Zhang, N. and Geiser, D.M. (2008) Molecular phylogenetic diversity, multilocus haplotype nomenclature, and in vitro antifungal resistance within the Fusarium solani Species complex. Journal of Clinical Microbiology 46 (8), 2477–2490. https://doi.org/10.1128/JCM.02371-07 O’Donnell, K., Ward, T.J., Robert, V.A.R.G., Crous, P.W., Geiser, D.M. and Kang, S. (2015) DNA sequence-based identification of Fusarium: Current status and future directions. Phytoparasitica 43 (5), 583–595. https://doi.org/10.1007/s12600-015-0484-z

Otero, J. M., Vongsangnak, W., Asadollahi, M.A., Olivares-Hernandes, R., Maury, J., Farinelli, L., Barlocher, L., Østerås, M., Schalk, M., Clark, A. and Nielsen, J. (2010) Whole genome sequencing of Saccharomyces cerevisiae: From genotype to phenotype for improved metabolic engineering applications. BMC Genomics 11 (1), 1–17. https://doi.org/10.1186/1471-2164-11-723 Pareek, C.S., Smoczynski, R. and Tretyn, A. (2011) Sequencing technologies and genome sequencing. Journal of Applied Genetics 52 (4), 413–435. https://doi.org/10.1007/s13353-011-0057-x PHE (2018) Implementing pathogen genomics: a case study. The work by PHE to establish the central Whole Genome Sequencing (WGS) service and the transformation of a national bacteriology reference laboratory. https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_ data/file/731057/implementing_pathogen_genomics_a_case_study.pdf Pei, A.Y., Oberdorf, W.E., Nossa, C.W., Agarwal, A., Chokshi, P., Gerz, E.A., Jin, Z., Lee, P., Yang, L., Poles, M., Brown, S.M., Sotero, S., DeSantis, T., Brodie, E., Nelson, K. and Pei, Z. (2010) Diversity of 16S rRNA genes within individual prokaryotic genomes. Applied and Environmental Microbiology 76 (12), 3886–3897. https://doi.org/10.1128/AEM.02953-09 Perfect, S.E., Hughes, H.B., O’Connell, R.J. and Green, J.R. (1999) Colletotrichum: A model genus for studies on pathology and fungal–plant interactions. Fungal Genetics and Biology 27 (2–3), 186–198. https://doi.org/10.1006/FGBI.1999.1143 Quail, M.A., Smith, M., Coupland, P., Otto, T.D., Harris, S.R., Connor, T.R., Bertoni, A., Swerdlow, H.P. and Gu, Y. (2012) A tale of 3 NGS sequencing platforms. BMC Genomics 13 (341), 13. https://doi. org/10.1186/1471-2164-13-341 Raffaele, S. and Kamoun, S. (2012) Genome evolution in filamentous plant pathogens: why bigger can be better. Nature Reviews Microbiology 10, 417–430. doi: 10.1038/nrmicro2790. Razzaghi-Abyaneh, M., Chang, P-K, Shams-Ghahfarokhi, M. and Rai, M. 
(2014) Global health issues of aflatoxins in food and agriculture: challenges and opportunities. Frontiers in Microbiology 5, 420. doi:10.3389/fmicb.2014.00420 Ronaghi, M., Uhlén, M. and Nyrén, P. (1998) A sequencing method based on real-time pyrophosphate. Science 281 (5375), 363–365. https://doi.org/10.1126/science.281.5375.363 Samson, R.A., Visagie, C.M., Houbraken, J., Hong, S.B., Hubka, V., Klaassen, C.H.W., Perrone, G., Seifert, K.A., Susca, A., Tanney, J.B., Varga, J., Kocsubé, S., Szigeti, G., Yaguchi, T. and Frisvad, J.C. (2014) Phylogeny, identification and nomenclature of the genus Aspergillus. Studies in Mycology 78 (1), 141–173. https://doi.org/10.1016/j.simyco.2014.07.004 Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences 74 (12), 5463–5467. https://doi.org/10.1073/pnas. 74.12.5463 Sawyers, C.L. (2008) The cancer biomarker problem. Nature 452 (7187), 548–552. https://doi.org/10.1038/ nature06913 Seo, S., Pokhrel, A. and Coleman, J.J. (2020) The genome sequence of five genotypes of Fusarium oxysporum f. sp. vasinfectum: A resource for studies on Fusarium wilt of cotton. Molecular Plant- Microbe Interactions 33 (2), 138–140. https://doi.org/10.1094/MPMI-07-19-0197-A Silva, M. do C., Várzea, V., Guerra-Guimarães, L., Azinheira, H.G., Fernandez, D., Petitot, A.-S., Bertrand, B., Lashermes, P. and Nicole, M. (2006) Coffee resistance to the main diseases: leaf rust and coffee berry disease. Brazilian Journal of Plant Physiology 18 (1), 119–147. https://doi.org/10.1590/s167704202006000100010 Stukenbrock, E.H., Jørgensen, F.G., Zala, M., Hansen, T.T., McDonald, B.A. and Schierup, M.H. (2010) Whole-genome and chromosome evolution associated with host adaptation and speciation of the wheat pathogen Mycosphaerella graminicola. PLoS Genetics 6 (12), 1–13. https://doi.org/10.1371/ journal.pgen.1001189 Summerell, B.A. 
(2019) Resolving Fusarium: current status of the genus. Annual Review of Phytopathology 57 (1), 323–339. https://doi.org/10.1146/annurev-phyto-082718-100204 Taylor, J., Branco, S., Gao, C., Hann-Soden, C., Montoya, L., Sylvain, I. and Gladieux, P. (2017) Sources of fungal genetic variation and associating it with phenotypic diversity. Microbiology Spectrum 5 (5). doi:10.1128/microbiolspec.FUNK-0057-2016. Taylor, J.W., Jacobson, D.J., Kroken, S., Kasuga, T., Geiser, D.M., Hibbett, D.S. and Fisher, M.C. (2000) Phylogenetic species recognition and species concepts in fungi. Fungal Genetics and Biology 31 (1), 21–32. https://doi.org/10.1006/fgbi.2000.1228 Teixeira, M.M., de Almeida, L.G.P., Kubitschek-Barreira, P., Alves, F.L., Kioshima, É.S., Abadio, A.K.R., Fernandes, L., Derengowski, L.S., Ferreira, K.S., Souza, R.C., Ruiz, J.C., de Andrade, N.C., Paes,

H.C., Nicola, A.M., Albuquerque, P., Gerber, A.L., Martins, V.P., Peconick, L.D.F., Neto, A.V. et al. (2014) Comparative genomics of the major fungal agents of human and animal sporotrichosis: Sporothrix schenckii and Sporothrix brasiliensis. BMC Genomics, 15(1), 1–22. https://doi.org/10.1186/14712164-15-943 Valent, B., Farman, M., Tosa, Y., Begerow, D., Fournier, E., Gladieux, P., Islam, M.T., Kamoun, S., Kemler, M., Kohn, L.M., Lebrun, M.H., Stajich, J.E., Talbot, N.J., Terauchi, R., Tharreau, D. and Zhang, N. (2019) Pyricularia graminis-tritici is not the correct species name for the wheat blast fungus: response to Ceresini et al. (MPP 20:2). Molecular Plant Pathology 20 (2), 173–179. https://doi.org/10.1111/ mpp.12778 Valouev, A., Ichikawa, J., Tonthat, T., Stuart, J., Ranade, S., Peckham, H., Zeng, K., Malek, J.A., Costa, G., McKernan, K., Sidow, A., Fire, A. and Johnson, S.M. (2008) A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Research, 18 (7), 1051–1063. https://doi.org/10.1101/gr.076463.108 Van Den Berg, M.A., Albang, R., Albermann, K., Badger, J.H., Daran, J.M., Driessen, A.J., Garcia-Estrada, C., Fedorova, N.D., Harris, D.M., Heijne, W.H.M., Joardar, V., Kiel, J.A.K., Kovalchuk, A., Martín, J.F., Nierman, W.C., Nijland, J.G., Pronk, J.T., Roubos, J.A., Van Der Klei, I.J. et al. (2008) Genome sequencing and analysis of the filamentous fungus Penicillium chrysogenum. Nature Biotechnology 26 (10), 1161–1168. https://doi.org/10.1038/nbt.1498 van der Vossen, H.A.M. and Walyaro, D.J. (2008) Additional evidence for oligogenic inheritance of durable host resistance to coffee berry disease (Colletotrichum kahawae) in arabica coffee (Coffea arabica L.). Euphytica 165 (1), 105. https://doi.org/10.1007/s10681-008-9769-3 Vanheule, A., Audenaert, K., Warris, S., van de Geest, H., Schijlen, E., Höfte, M., De Saeger, S., Haesaert, G., Waalwijk, C. and van der Lee, T. 
(2016) Living apart together: Crosstalk between the core and supernumerary genomes in a fungal plant pathogen. BMC Genomics 17 (1), 1–18. https://doi. org/10.1186/s12864-016-2941-6 Venter, J.C., Remington, K., Heidelberg, J.F., Halpern, A.L., Rusch, D., Eisen, J.A., Wu, D., Paulsen, I., Nelson, K.E., Nelson, W., Fouts, D.E., Levy, S., Knap, A.H., Lomas, M.W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R. et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304 (5667), 66–74. https://doi.org/10.1126/science.1093857 Vesth, T.C., Nybo, J.L., Theobald, S., Frisvad, J.C., Larsen, T.O., Nielsen, K.F., Hoof, J.B., Brandl, J., Salamov, A., Riley, R., Gladden, J. M., Phatale, P., Nielsen, M.T., Lyhne, E.K., Kogle, M.E., Strasser, K., McDonnell, E., Barry, K., Clum, A. et al. (2018) Investigation of inter- and intraspecies variation through genome sequencing of Aspergillus section Nigri. Nature Genetics 50 (12), 1688–1695. https://doi.org/10.1038/s41588-018-0246-1 Waalwijk, C., Koning, J.R.A. de, Baayen, R.P. and Gams, W. (1996) Discordant Groupings of Fusarium spp. from sections Elegans, Liseola and Dlaminia Based on ribosomal ITS1 and ITS2 sequences. Mycologia 88 (3), 361–368. https://doi.org/10.2307/3760877 Waalwijk, C., Vanheule, A., Audenaert, K., Zhang, H., Warris, S., van de Geest, H. and van der Lee, T. (2017) Fusarium in the age of genomics. Tropical Plant Pathology 42 (3), 184–189. https://doi. org/10.1007/s40858-017-0128-6 Wagner, L., Stielow, J.B., de Hoog, G.S., Bensch, K., Schwartze, V.U., Voigt, K., Alastruey-Izquierdo, A., Kurzai, O. and Walther, G. (2019) A new species concept for the clinically relevant Mucor circinelloides complex. Persoonia - Molecular Phylogeny and Evolution of Fungi 67–97. https://doi.org/10.3767/ persoonia.2020.44.03 Walder, F., Schlaeppi, K., Wittwer, R., Held, A.Y., Vogelgsang, S. and Van Der Heijden, M.G.A. 
(2017) Community profiling of Fusarium in combination with other plant-associated fungi in different crop species using SMRT sequencing. Frontiers in Plant Science 8, 1–17. https://doi.org/10.3389/fpls.2017.02019 Waller, J.M., Bridge, P.D., Black, R. and Hakiza, G. (1993) Characterization of the coffee berry disease pathogen, Colletotrichum kahawae sp. nov. Mycological Research 97 (8), 989–994. https://doi. org/10.1016/S0953-7562(09)80867-8 Wang, F.Q., Zhong, J., Zhao, Y., Xiao, J., Liu, J., Dai, M., Zheng, G., Zhangm, L., Yu, J., Wu, J. and Duan, B. (2014) Genome sequencing of high-penicillin producing industrial strain of Penicillium chrysogenum. BMC Genomics. 15 (1), S11. doi: 10.1186/1471-2164-15-S1-S11 Watanabe, M. (2013) Molecular phylogeny and identification of Fusarium species based on nucleotide sequences. Mycotoxins 63 (2), 133–142. https://doi.org/10.2520/myco.63.133

Watson, J.D. and Crick, F.H.C. (1953) Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature 171 (4356), 737–738. https://doi.org/10.1038/171737a0 Weir, B.S., Johnston, P.R. and Damm, U. (2012) The Colletotrichum gloeosporioides species complex. Studies in Mycology 73, 115–180. https://doi.org/10.3114/sim0011 Werbel, W.A., Baroncelli, R., Shoham, S. and Zhang, S.X. (2019) Angioinvasive, cutaneous infection due to Colletotrichum siamense in a stem cell transplant recipient: Report and review of prior cases. Transplant Infectious Disease 21 (5), e13153. https://doi.org/10.1111/tid.13153 Wiebe, M.G. (2004) Quorn™ Myco-protein – Overview of a successful fungal product. Mycologist 18 (1), 17–20. https://doi.org/10.1017/S0269-915X(04)00108-9 Wiemann, P., Sieber, C.M.K., von Bargen, K.W., Studt, L., Niehaus, E.M., Espino, J.J., Huß, K., Michielse, C.B., Albermann, S., Wagner, D., Bergner, S.V., Connolly, L.R., Fischer, A., Reuter, G., Kleigrewe, K., Bald, T., Wingfield, B.D., Ophir, R., Freeman, S. et al. (2013) Deciphering the cryptic genome: genome-wide analyses of the rice pathogen Fusarium fujikuroi reveal complex regulation of secondary metabolism and novel metabolites. PLoS Pathogens 9 (6). https://doi.org/10.1371/journal.ppat.1003475 Wilmes, P., Simmons, S.L., Denef, V.J. and Banfield, J.F. (2009) The dynamic genetic repertoire of microbial communities. FEMS Microbiology Reviews 33 (1), 109–132. https://doi.org/10.1111/j.1574-6976.2008.00144.x Wood, V., Gwilliam, R., Rajandream, M.A., Lyne, M., Lyne, R., Stewart, A., Sgouros, J., Peat, N., Hayles, J., Baker, S., Basham, D., Bowman, S., Brooks, K., Brown, D., Brown, S., Chillingworth, T., Churcher, C., Collins, M., Connor, R. et al. (2003) Erratum: The genome sequence of Schizosaccharomyces pombe (Nature (2002) 415 (871–880)). Nature 421 (6918), 94. Wu, G., Jurick, W.M., Lichtner, F.J., Peng, H., Yin, G., Gaskins, V.L., Yin, Y., Hua, S.S., Peter, K.A. and Bennett, J.W.
(2019) Whole-genome comparisons of Penicillium spp. reveals secondary metabolic gene clusters and candidate genes associated with fungal aggressiveness during apple fruit decay. PeerJ 2019 (1). https://doi.org/10.7717/peerj.6170 Wyss, T., Masclaux, F.G., Rosikiewicz, P., Pagni, M. and Sanders, I.R. (2016) Population genomics reveals that within-fungus polymorphism is common and maintained in populations of the mycorrhizal fungus Rhizophagus irregularis. ISME Journal 10 (10), 2514–2526. https://doi.org/10.1038/ismej.2016.29 Yadav, S.P. (2007) The wholeness in suffix -omics, -omes, and the word om. Journal of Biomolecular Techniques 18 (5), 277. Yang, R.H., Su, J.H., Shang, J.J., Wu, Y.Y., Li, Y., Bao, D.P. and Yao, Y.J. (2018) Evaluation of the ribosomal DNA internal transcribed spacer (ITS), specifically ITS1 and ITS2, for the analysis of fungal diversity by deep sequencing. PLoS ONE 13 (10), 1–17. https://doi.org/10.1371/journal.pone.0206428 Yang, Y., Chen, M., Li, Z., Al-Hatmi, A.M.S., de Hoog, S., Pan, W., Ye, Q., Bo, X., Li, Z., Wang, S., Wang, J., Chen, H. and Liao, W. (2016) Genome sequencing and comparative genomics analysis revealed pathogenic potential in Penicillium capsulatum as a novel fungal pathogen belonging to Eurotiales. Frontiers in Microbiology 7, 1–14. https://doi.org/10.3389/fmicb.2016.01541 Zarraonaindia, I., Smith, D.P. and Gilbert, J.A. (2013) Beyond the genome: Community-level analysis of the microbial world. Biology and Philosophy 28 (2), 261–282. https://doi.org/10.1007/s10539-012-9357-8 Zemach, A., McDaniel, I.E., Silva, P. and Zilberman, D. (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328 (5980), 916–919. https://doi.org/10.1126/science.1186366 Zeng, Z., Wu, J., Kovalchuk, A., Raffaello, T., Wen, Z., Liu, M. and Asiegbu, F.O. (2019) Genome-wide DNA methylation and transcriptomic profiles in the lifestyle strategies and asexual development of the forest fungal pathogen Heterobasidion parviporum.
Epigenetics 14 (1), 16–40. https://doi.org/10.1080/15592294.2018.1564426 Zhang, Y., Yang, H., Turra, D., Zhou, S., Ayhan, D.H., DeIulio, G.A., Guo, L., Broz, K., Wiederhold, N., Coleman, J.J., O’Donnell, K., Youngster, I., McAdam, A.J., Savinov, S., Shea, T., Young, S., Zeng, Q., Rep, M., Pearlman, E. et al. (2020) The genome of opportunistic fungal pathogen Fusarium oxysporum carries a unique set of lineage-specific chromosomes. Communications Biology 3 (1), 1–12. https://doi.org/10.1038/s42003-020-0770-2

254

R. Baroncelli and G. Cafà

Zhao, Y., Park, S., Warn, P., Shrief, R., Harrison, E. and Perlin, D.S. (2010) Detection of Aspergillus fumigatus in a rat model of invasive pulmonary aspergillosis by real-time nucleic acid sequencebased amplification. Journal of Clinical Microbiology 48 (4), 1378–1383. https://doi.org/10.1128/ JCM.02214-09 Zhao, Z., Liu, H., Wang, C. and Xu, J.R. (2014) Correction to Comparative analysis of fungal genomes reveals different plant cell wall degrading capacity in fungi. BMC Genomics 15 (1). https://doi. org/10.1186/1471-2164-15-6

15

What can Genome Analysis Offer for Bacteria?

Markus Göker* Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany

Introduction

Providing an overview of the benefits of genome analysis for microbiology in a single book chapter appears to be an impossible task. In a book on the reconciliation of microbial systematics one can, fortunately, focus on the impact of the use of genome sequences on taxonomy. Among the distinct aspects of taxonomy – namely, characterization, classification, identification and nomenclature (Tindall et al., 2007) – we will here only consider taxonomic classification: that is, the arrangement of prokaryotes into named groups, as the other aspects are treated elsewhere in this book (see Chapters 9, 10, 11, 12, 13 and 14). Integrating genomic information in microbial systematics (Klenk and Göker, 2010) can be regarded as mandatory for the sole reason of the abundant availability of this kind of information. Taxonomists have always reacted positively to technological advances (Gram, 1884; Cummins and Harris, 1956; Lester and Crane, 1959; De Ley, 1970; Schleifer and Kandler, 1972; Fox et al., 1977; Sanger et al., 1977). For instance, the earlier progress in sequencing single genes had led to the view that genotypic and phenotypic data should be integrated to obtain classifications at all levels in the taxonomic hierarchy (Wayne et al., 1987; Stackebrandt, 1992).

The phenomenal increase in the number of taxonomically relevant and publicly accessible whole-genome sequences is caused by two factors. First, sequencing technologies have progressed at a steadily accelerating pace, starting with the automation of Sanger sequencing in the 1990s (Mavromatis et al., 2012). Second, large-scale sequencing projects for nomenclatural types such as the Genomic Encyclopaedia of Archaea and Bacteria (GEBA) pilot phase (Klenk and Göker, 2010) and its KMG (1000 Microbial Genomes) follow-up projects (Kyrpides et al., 2014; Mukherjee et al., 2017) made genome sequences available to taxonomists. A further shift took place when taxonomic journals such as the International Journal of Systematic and Evolutionary Microbiology (IJSEM) made genome sequencing mandatory for descriptions of new taxa (www.microbiologyresearch.org/journal/ijsem/scope, accessed 6 February 2020; see also Chapter 3). While it is now logical to request the use of genomic data for classifying prokaryotes (Ramasamy et al., 2014), this does not yet imply a need for reconciliation. For there to be a need to reconcile taxonomic classifications, conflict must have arisen between genome-based classifications on the one hand and those that were proposed prior to the use of genome sequences on the other hand; or, alternatively, between

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


distinct kinds of genome-based classifications. Conflicts between classifications may be caused by conflicts between distinct sets of primary data from which these classifications were derived; by conflicts between the way these primary data were refined or analysed; or by conflicts between the principles used to taxonomically interpret the results from these analyses. A discussion of these conflicts, let alone an attempt to settle them, requires an analysis of their causes. The ceteris paribus assumption dictates that in order to determine the effect of a given factor, all other factors need to be held constant; otherwise the assumption is violated. For this reason, we need to establish the way non-genomic and genomic data were analysed and interpreted in the past to obtain taxonomic classifications, and how they are currently analysed and interpreted. To discover what genome analysis can offer for the classification of bacteria, as opposed to the analysis of other kinds of data, one needs to distinguish the effect of the data from the effect of the analysis methods applied to these data. Microbial systematics uses the Linnaean approach to classification, in which each taxon name belongs to a certain category and these categories are ordered, yielding ranks in a hierarchy. Deriving a partial or complete taxonomic classification thus not only requires a justification for the assignment of organisms to a taxon, but also a justification for the rank assigned to that taxon. While assigning ranks may also require more special considerations, the arrangement of organisms into taxa depends on the outcome of some analysis on the one hand and on the overall goal of taxonomic classification on the other hand. Yet this overall goal differed between the distinct schools of taxonomic thought.
This chapter is organized as follows: (i) the main approaches to the philosophy of taxonomic classification are recapitulated; (ii) the paradigm of polyphasic taxonomy is discussed in this context; (iii) the causes of conflict between previous classifications and genome-scale analyses are investigated, using examples from recent phylum-wide studies, with a discussion of how markers used in polyphasic taxonomy can be replaced by genome-derived ones; and (iv) the challenges in assigning taxonomic ranks using genome-scale or other data are revisited. The conclusion assesses the chances, or lack thereof, of reconciling taxonomic classifications.

Schools of Taxonomic Thought and Associated Methods of Analysis

Three major lines of the philosophy of taxonomy have been distinguished in the literature (Hull, 1970; Wiley and Lieberman, 2011). Two are of particular relevance to this chapter (Table 15.1). Charles Darwin proclaimed that ‘our classifications will come to be, as far as they can be so made, genealogies’ (Darwin, 1859/2004). This called for a natural system of taxonomic classification in agreement with the branching order imposed by the course of evolution (Ghiselin, 1992). The implications of this goal for data analysis became apparent only later on (Hennig, 1965, 1975). Hennig’s major insight was that only monophyletic taxa can be accepted; that is, groups of organisms that include all descendants of their last common ancestor, and only those (Hennig, 1965; Wiley and Lieberman, 2011). This view can arguably be traced back to Darwin (Ghiselin, 2004), even though his original phrasing had to be reformulated; providing a summary of the phylogeny of the classified organisms is now regarded as the purpose of taxonomic classification (Wiley and Lieberman, 2011). This modification was necessary as an adaptation to the Linnaean system because the number of taxonomic categories is limited (Hull, 1970; Farris, 1976a). A phylogenetic hypothesis becomes necessary, which is then summarized by a taxonomic classification. A summary must not contradict the statements it summarizes (Hull, 1964). A classification would contradict its underlying phylogenetic hypothesis if it contained groups that are non-monophyletic according to this hypothesis. Because this can be formally proven, only monophyletic taxa are permitted (Hull, 1964; Farris, 1974, 1976, 1977, 1979; Wiley and Lieberman, 2011). The school of taxonomy that results is best called ‘phylogenetic systematics’ (Wiley and Lieberman, 2011).
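The definition of monophyly given above can be made concrete with a short sketch. The tree, the taxon labels and the function names below are hypothetical and serve only as an illustration; the tree is assumed to be already rooted and is written as nested tuples. A group counts as monophyletic here if it is exactly the set of leaves below some node.

```python
# Hypothetical rooted tree as nested tuples; leaves are strings.
TREE = ("O", (("A", "B"), ("C", ("D", "E"))))

def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    found = set()
    for child in node:
        found |= leaves(child)
    return found

def clades(node):
    """Yield the leaf set of every node (clade) of the rooted tree."""
    yield frozenset(leaves(node))
    if not isinstance(node, str):
        for child in node:
            yield from clades(child)

def is_monophyletic(tree, group):
    """A group is monophyletic if it equals the leaf set of some node."""
    return frozenset(group) in set(clades(tree))

print(is_monophyletic(TREE, {"D", "E"}))       # all descendants of one ancestor
print(is_monophyletic(TREE, {"A", "B", "C"}))  # omits D and E, hence not a clade
```

A classification that named the second group as a taxon would contradict the tree it is meant to summarize, which is the situation the text rules out.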
A more narrowly defined term is ‘cladistics’, which should now be more appropriately used for approaches that restrict phylogenetic analysis to applying the maximum parsimony optimality criterion (discussed further


Table 15.1. Main differences between the phenetic school of taxonomy and phylogenetic systematics with respect to general principles, manual interpretation of single characters and computational tools used to analyse multiple characters. Note that ‘clustering’ here is not understood as a synonym for ‘agglomerative algorithm’; see the main text for details.

| | Phylogenetic systematics | Phenetics |
|---|---|---|
| Overall goal | Summarize the phylogeny of the organisms | Classify according to overall similarity |
| Main approach | Allow only monophyletic taxa in taxonomic classification | Classify more similar organisms more closely together no matter whether the resulting taxa are monophyletic |
| Characters to which the approach can be applied | Genotypic and phenotypic | Genotypic and phenotypic |
| How to interpret single characters | Only apomorphies support monophyletic groups, plesiomorphies do not | Overall similarity irrespective of whether character states are apomorphic or plesiomorphic |
| Main tools to computationally analyse sets of characters | Phylogenetic inference using ML, MP, NJ, Bayesian Inference, rarely ULS, WLS, ME, balanced ME | Clustering tools such as UPGMA, WPGMA, single/average/complete linkage, etc. (or MONA, PCoA, PCA, etc.) |
| Immediate result from applying these tools | Unrooted, non-ultrametric phylogenetic tree | Rooted, ultrametric dendrogram (or two-dimensional scatterplot) |
| Goal of these tools | Estimate of the phylogeny of the organisms | Group more similar (= less distant) items more closely together |
| Differences between these tools | Specific phylogenetic optimality criterion | Internal calculation of similarity or distance between groups of items |
| Statistical support | Taxa should be strongly supported as monophyletic | Hardly, if ever, approached originally. Technically, resampling can be used in conjunction with clustering algorithms |

ME, minimum evolution; ML, maximum likelihood; MONA, monothetic analysis; MP, maximum parsimony; NJ, neighbour joining; PCA, principal component analysis; PCoA, principal coordinate analysis; UPGMA/WPGMA, (un-)weighted pair group method by average; ULS/WLS, (un-)weighted least squares.

below). Cladistics was originally used as a derogatory term by proponents of the alternative school of ‘evolutionary taxonomy’, which permitted and preferred paraphyletic groups in certain situations (Hull, 1970; Mayr, 1974; Sokal, 1985). Evolutionary taxonomy has since largely fallen out of favour (Wiley and Lieberman, 2011). When derived from single characters, paraphyletic groups are based on plesiomorphies (ancestral character states), whereas monophyletic groups are based on apomorphies (derived character states) and polyphyletic groups on homoplasies (parallelisms or reversals). Accordingly, ‘diagnostic’ character states may well diagnose non-monophyletic groups, with reptiles being the classic example (Wiley and Lieberman, 2011).

The major alternative to phylogenetic systematics is ‘phenetics’. The phenetic approach (Sneath and Sokal, 1973) did not deny evolution either but was based on the assertion that it is unlikely that evolution can be reconstructed with certainty. It was concluded that organisms should better be classified using overall similarity (Sokal and Camin, 1965; Sneath and Sokal, 1973; Sokal, 1984, 1985), which led to the development of many clustering algorithms (Table 15.1). Clustering or cluster analysis is understood in statistics as an approach to group more similar (less distant) items more closely together. Methods originating in phenetics are still widely applied (Legendre and Legendre, 1998; Estivill-Castro, 2002; Crawley, 2007). Methods such


as PCA and principal coordinate analysis (PCoA) (Legendre and Legendre, 1998) may also be appended here. Historically, phenetics was also strongly associated with microbiology (Colwell, 1970; Sneath and Sokal, 1973). However, phenetics was criticized from a phylogenetic viewpoint (Hull, 1970; Johnson, 1970; Farris, 1979), which caused phylogenetic systematics to become the cornerstone of taxonomic classification in most areas (Wiley and Lieberman, 2011). As a consequence, trees obtained using methods of phylogenetic inference (Felsenstein, 2004; Wiley and Lieberman, 2011) replaced the ultrametric dendrograms that resulted from clustering as the main way of analysing sets of (either phenotypic or genotypic) characters (Table 15.1). Whether a method is a phylogenetic one or belongs to cluster analysis may not always be obvious. This may particularly affect the widely applied neighbour joining (NJ) algorithm because it is agglomerative; that is, it builds a tree in a series of equivalent agglomerative steps until a topology without multifurcations is obtained (Felsenstein, 2004). However, clustering is sometimes used as a synonym of ‘agglomerative algorithm’ (Felsenstein, 2004), and this usage must be distinguished from the definition of clustering as an approach that groups by increasing similarity, which is equivalent to grouping by decreasing distance (Legendre and Legendre, 1998; Estivill-Castro, 2002; Crawley, 2007). The unweighted pair group method by average (UPGMA) is a clustering algorithm according to both definitions (Legendre and Legendre, 1998), whereas NJ is an agglomerative phylogenetic method (Felsenstein, 2004). Interestingly, NJ was found later on to be mathematically equivalent to a greedy optimization approach under the so-called balanced minimum evolution criterion (Desper and Gascuel, 2004), which led to the development of the more accurate FastME approach (Lefort et al., 2015).
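The distinction drawn above between UPGMA and NJ can be illustrated with the first agglomeration step of each method. The sketch below is a toy under stated assumptions: the four taxa and their distances are hypothetical, computed as path lengths on a tree ((A,B),(C,D)) with terminal branches of length 1 for A and C, 5 for B and D, and an internal branch of length 1. UPGMA joins the pair with the smallest distance, whereas NJ joins the pair that minimizes the usual rate-corrected criterion Q(i,j) = (n − 2)·d(i,j) − r(i) − r(j), where r is the sum of distances from a taxon to all others.

```python
from itertools import combinations

# Hypothetical path-length distances on the tree ((A,B),(C,D)).
TAXA = ["A", "B", "C", "D"]
DIST = {
    ("A", "B"): 6, ("A", "C"): 3, ("A", "D"): 7,
    ("B", "C"): 7, ("B", "D"): 11, ("C", "D"): 6,
}

def d(x, y):
    """Symmetric lookup in the distance table."""
    return 0 if x == y else DIST[tuple(sorted((x, y)))]

def first_upgma_join(taxa):
    """UPGMA starts by joining the least distant (most similar) pair."""
    return min(combinations(taxa, 2), key=lambda pair: d(*pair))

def first_nj_join(taxa):
    """NJ starts by joining the pair that minimizes the Q criterion."""
    n = len(taxa)
    row_sum = {t: sum(d(t, u) for u in taxa) for t in taxa}
    def q(pair):
        i, j = pair
        return (n - 2) * d(i, j) - row_sum[i] - row_sum[j]
    return min(combinations(taxa, 2), key=q)

print(first_upgma_join(TAXA))  # the most similar pair
print(first_nj_join(TAXA))     # a pair that is adjacent on the generating tree
```

In this example the most similar pair (A and C) is not a pair of neighbours on the tree that generated the distances, so grouping by similarity and estimating the phylogeny give different answers from the same matrix.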
Thus, all major phylogenetic approaches (Table 15.1) are now known to make use of a specific optimality criterion. Some of these methods, including NJ and FastME, infer a tree from a distance matrix (which is, in most cases, derived from a character matrix), whereas others, including maximum likelihood and maximum parsimony, infer a tree from a character matrix (Felsenstein, 2004; Wiley and Lieberman, 2011). Clustering methods that operate on distance matrices include

UPGMA, while others (such as monothetic analysis, MONA) operate on character matrices. Phylogenetic methods do not differ from clustering methods with respect to the kinds of primary data that can be analysed (Table 15.1); for instance, phylogenetic methods can be applied to genotypic as well as to phenotypic data (Wiley and Lieberman, 2011). The first maximum parsimony algorithms were even developed for application to phenotypic characters (Camin and Sokal, 1965; Kluge and Farris, 1969; Farris, 1970), whereas algorithms suitable for molecular sequences followed later (Fitch, 1971). Maximum likelihood algorithms were first devised for molecular sequences (Felsenstein, 1981) but were subsequently adapted to phenotypic characters (Lewis, 2001) and implemented in popular programs (Berger and Stamatakis, 2010; Berger et al., 2011). The PAUP* software (Swofford, 2002) can deal with binary and multi-state characters under both Fitch (1971) and Wagner Parsimony (Farris, 1970); TNT (Goloboff et al., 2008) can even infer phylogenies from continuous characters (Goloboff et al., 2005). Analysis of binary or multi-state characters under maximum likelihood is possible with RAxML (Stamatakis et al., 2008). The availability of such implementations also enables researchers to infer trees from non-standard genomic character matrices such as gene-content or ortholog-content data (Klenk and Göker, 2010; Breider et al., 2014). The main difference in perspective is that phylogenetic methods, as opposed to clustering methods, are needed to assess monophyly (Farris, 1974). Of course, occasionally phylogenetic inference and clustering may yield the same topology, but this does not imply that the goals of phylogenetics and phenetics are identical. For instance, the replacement of UPGMA by NJ was not based on the fact that UPGMA always fails where NJ succeeds in obtaining the correct topology, but on the observation that UPGMA fails more often (Felsenstein, 2004).
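As an illustration of what a gene-content character matrix looks like, the following minimal sketch converts per-genome inventories of gene families into binary presence/absence characters. The genome names, family names and the function are hypothetical; how such a matrix is then passed to a parsimony or likelihood program is deliberately left open.

```python
# Hypothetical gene-family inventories per genome (all names invented).
GENOMES = {
    "strain_1": {"fam_a", "fam_b", "fam_c"},
    "strain_2": {"fam_a", "fam_b"},
    "strain_3": {"fam_a", "fam_d"},
}

def gene_content_matrix(genomes):
    """Return the ordered gene families and one 0/1 string per genome."""
    families = sorted(set().union(*genomes.values()))
    matrix = {
        name: "".join("1" if fam in content else "0" for fam in families)
        for name, content in genomes.items()
    }
    return families, matrix

families, matrix = gene_content_matrix(GENOMES)
for name, row in matrix.items():
    print(name, row)
```

Each column is one binary character (presence or absence of a gene family), so the result has the same form as any other binary character matrix.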
In situations with a sufficiently large deviation from clock-like evolution, phylogenetic methods deliberately do not group more similar organisms more closely together, which is why they were invented in the first place (Felsenstein, 2004). More similar organisms are not necessarily more closely related. To assess monophyly, trees must be rooted, but phylogenetic methods yield unrooted trees


(Felsenstein, 2004). The extra rooting step may be overlooked when software is used that does this automatically. The most frequently used approach is outgroup rooting, but midpoint rooting may alternatively be applied (Hess and De Moraes Russo, 2007). Finally, methods that enforce a molecular clock on a phylogeny may be used for rooting (To et al., 2015). Incorrectly rooted trees may yield non-monophyletic taxa and inaccurate assessments of apomorphy (Wiley and Lieberman, 2011). Statistical inference from limited data may be affected by random errors, which also holds for phylogenetic inference (Felsenstein, 2004; Wiley and Lieberman, 2011). Phylogenetic inference algorithms will always yield a tree, irrespective of whether or not there is any support in the data for a certain branch. Branch support values are needed to identify parts of the topology that might have arisen by chance alone; the bootstrapping approach to character resampling is still the most prominent way to estimate branch support (Felsenstein, 2004; Taylor and Piel, 2004; Wiley and Lieberman, 2011). Establishing taxa based on trees for which branch support has not been calculated, or on apparently unsupported branches, is likely to lead to the need for later revision (Vences et al., 2013). As an aside, resampling techniques such as bootstrapping could also be coupled with phenetic methods (Suzuki and Shimodaira, 2006); but, owing to the decline in phenetics with respect to the analysis of sequence data, this seems to have been hardly (if ever) approached. The signal-to-noise ratio in a data set can be improved by sampling more characters (Gee, 2003). An obvious promise of genome sequencing for phylogenetics is to obtain more characters that can be analysed (Philippe et al., 2005; Ciccarelli et al., 2006; Wu and Eisen, 2008; Klenk and Göker, 2010).
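The character resampling behind bootstrap support can be sketched for the smallest possible case, four taxa. Everything in the example is an assumption made for illustration: the alignment is invented, and full tree inference is replaced by a simple stand-in that picks the pairing of taxa with the smallest sum of within-pair Hamming distances (the pairing favoured by the four-point condition). Alignment columns are drawn with replacement, the topology is re-estimated for each replicate, and the support of a split is the fraction of replicates in which it is recovered.

```python
import random

# Hypothetical alignment: 20 columns favour AB|CD, 2 favour AC|BD, 8 are constant.
ALIGNMENT = {
    "A": "A" * 20 + "A" * 2 + "G" * 8,
    "B": "A" * 20 + "C" * 2 + "G" * 8,
    "C": "C" * 20 + "A" * 2 + "G" * 8,
    "D": "C" * 20 + "C" * 2 + "G" * 8,
}

def hamming(x, y):
    """Number of differing positions between two aligned sequences."""
    return sum(a != b for a, b in zip(x, y))

def quartet_topology(aln):
    """Stand-in for tree inference: the pairing with the smallest distance sum."""
    a, b, c, d = (aln[t] for t in "ABCD")
    sums = {
        "AB|CD": hamming(a, b) + hamming(c, d),
        "AC|BD": hamming(a, c) + hamming(b, d),
        "AD|BC": hamming(a, d) + hamming(b, c),
    }
    return min(sums, key=sums.get)

def bootstrap_support(aln, split, replicates=200, seed=1):
    """Fraction of column-resampled replicates that recover the given split."""
    rng = random.Random(seed)
    length = len(next(iter(aln.values())))
    hits = 0
    for _ in range(replicates):
        cols = [rng.randrange(length) for _ in range(length)]
        resampled = {t: "".join(seq[i] for i in cols) for t, seq in aln.items()}
        hits += quartet_topology(resampled) == split
    return hits / replicates

print(quartet_topology(ALIGNMENT))
print(bootstrap_support(ALIGNMENT, "AB|CD"))
```

With so many columns favouring one split the support is close to 1; shifting the balance of column patterns lowers it, which is the sense in which support values flag branches that may have arisen by chance.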
The analysis of so-called ‘supermatrices’ applies the same principles as the analysis of single loci, but is based on the concatenation of many alignments of individual orthologous genes detected in genomes (Dutilh et al., 2007; Wolf et al., 2001). Impressively well resolved phylogenetic trees can be obtained in this way (Klenk and Göker, 2010; Breider et al., 2014; Mukherjee et al., 2017). Pioneering studies in supermatrix generation were followed by the implementation of phylogenetic analysis pipelines freely available for other researchers


(Ciccarelli et al., 2006; Wu and Eisen, 2008; Meier-Kolthoff and Göker, 2019), their inclusion in recommendations for genomic analyses in microbial systematics (Chun et al., 2018) and the routine use of genome-scale data to answer taxonomic questions (Sangal et al., 2016; Kublanov et al., 2017; Carro et al., 2018). Stronger branch support may theoretically yield more conflict between distinct analyses (Jeffroy et al., 2006; Klenk and Göker, 2010). This particularly holds if distinct selections of genes have distinct histories. Horizontal gene transfer (HGT) is particularly prominent in bacteria, where it can be mediated by plasmids and easily affect entire operons (Bartling et al., 2017; Brinkmann et al., 2018). HGT is a well-known cause of topological conflict between analyses of single genes. Because it is so abundant in prokaryotes, it has even been used to argue for the complete abandonment of hierarchical classification, on the grounds that a tree of life is a meaningless concept for prokaryotes (Bucknam et al., 2006; Doolittle and Bapteste, 2007; Bapteste and Boucher, 2009; Klenk and Göker, 2010). This view may actually be regarded as a fourth school of taxonomic thought. The conclusion is logical within the paradigm of phylogenetic systematics: if the goal of taxonomic classification is to summarize the phylogeny (Wiley and Lieberman, 2011), and if a common organismal phylogeny is a misleading idea, taxonomic classification should be abandoned entirely (or at least its Linnaean, hierarchical variant). In fact, whether or not distinct genes yield essentially the same topology was regarded as a test case for evolutionary theory (Penny et al., 1982). In particular, approaches that sample a relatively small set of ‘core genes’ that are phylogenetically congruent to each other (Ciccarelli et al., 2006; Wu and Eisen, 2008) were criticized because these genes only amount to a tiny fraction of the total number of genes in the genomes (Dagan and Martin, 2006).
Moreover, selecting a previously compiled set of genes does not normally yield genome-scale data, makes a priori assumptions about the relative suitability of genes, and does not normally yield the same set of genes (Lienau and DeSalle, 2009; Klenk and Göker, 2010; García-López et al., 2019). The increase in support for phylogenomic analyses when almost all available genes are included indicates a strong hierarchical signal,


despite HGT (Breider et al., 2014). Methods that skip supermatrix creation (Auch et al., 2010; Meier-Kolthoff et al., 2014a) and directly infer a tree from genome sequences also yield phylogenetic trees with high support (Hahnke et al., 2016; Nouioui et al., 2018; García-López et al., 2019). Overestimating phylogenetic confidence, however, needs to be avoided (Taylor and Piel, 2004) in analyses of huge but potentially conflicting data sets. Even supermatrices whose included genes are subsets of each other may yield strong phylogenetic conflict (Simon et al., 2017). It has been suggested that standard bootstrapping could be replaced by resampling entire genes (or an equivalent) to provide more reliable support values and reduce conflict (Siddall, 2010; Hahnke et al., 2016; Simon et al., 2017; Nouioui et al., 2018; García-López et al., 2019). Provided such approaches are successful, the presence of HGT speaks for the use of genome-scale data, not against it. Biases of the phylogenetic inference methods themselves may also complicate matters. Explicit evolutionary models are used in ML, NJ and Bayesian analyses, including (in the case of molecular data) the estimation of a nucleotide or amino-acid substitution model and the estimation of a gamma distribution to model rate heterogeneity between sites (Felsenstein, 2004; Wiley and Lieberman, 2011). The more characters are sampled, the more easily mis-specified models may lead to strong support for wrong groupings, as in the case of heterogeneity in base composition (Phillips et al., 2004; Jeffroy et al., 2006). Sampling more characters does not help against long-branch attraction artefacts (Felsenstein, 1978), which may better be tackled by sampling more organisms (Siddall and Whiting, 1999; Bergsten, 2005; Leebens-Mack et al., 2005).
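The idea of resampling entire genes rather than single alignment columns can be sketched as follows. The per-gene alignments, taxon labels and function names are hypothetical, and the sketch assumes that every taxon is present in every gene alignment. A supermatrix is built by concatenation; a gene-wise replicate draws whole genes with replacement before concatenating, so that all sites of a gene stay together.

```python
import random

# Hypothetical per-gene alignments (taxon -> aligned sequence).
GENES = {
    "gene1": {"A": "ACGT", "B": "ACGA"},
    "gene2": {"A": "GG", "B": "GC"},
    "gene3": {"A": "TTT", "B": "TTA"},
}

def concatenate(gene_names, genes):
    """Concatenate the named gene alignments, in order, into one supermatrix."""
    taxa = sorted(next(iter(genes.values())))
    return {t: "".join(genes[g][t] for g in gene_names) for t in taxa}

def resample_genes(genes, rng):
    """Draw as many genes as there are, with replacement (whole genes only)."""
    names = sorted(genes)
    return [rng.choice(names) for _ in names]

supermatrix = concatenate(sorted(GENES), GENES)
picks = resample_genes(GENES, random.Random(0))
replicate = concatenate(picks, GENES)
print(supermatrix)
print(picks, replicate)
```

Each replicate would then be analysed with the same inference method as the original supermatrix; only the resampling unit differs from the standard site-wise bootstrap.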
While the known phylogenetic methods are obviously made for phylogenetic analysis as opposed to clustering, this does not mean that they perform well under all circumstances (Felsenstein, 2004; Bergsten, 2005). Finally, analysing genome-scale data is computationally more challenging than analysing single genes. Various steps are usually needed prior to phylogenetic analysis, including sequencing, assembly and annotation (Hyatt et al., 2010; Mavromatis et al., 2012; Seemann, 2014; Huntemann et al., 2015; Tanizawa et al., 2018), detection of orthologous genes (Lechner et al.,

2011), inference of sequence alignments and alignment filtering (Tan et al., 2015). Phylogenetic inference from huge data sets is hampered by the increased need for applying heuristic solutions (Stamatakis et al., 2008; Price et al., 2010), which implies that the best trees that have been found are likely to differ more strongly from the overall best tree, according to the chosen optimality criterion. All in all, the presence of more steps in an analysis pipeline implies that more choices are possible, and the greater methodological variability may cause greater variability between the phylogenetic outcomes. This may, in turn, result in discrepancies between taxonomic classifications derived from these results. Bacterial genomes thus offer much higher resolution in phylogenetic analyses. This may well be the main promise of genome sequencing for the taxonomic classification of prokaryotes. Despite rampant HGT, highly supported and meaningful phylogenies with reliable branch support values can, in principle, be inferred from genome-scale data, and the presence of HGT is one of the reasons why such approaches are preferable over analyses of single genes. However, systematic errors cannot easily be ruled out, overestimating branch support must be avoided and more complex analysis pipelines can cause a larger variance in the results. Phenetic approaches can also benefit from genome-scale data, but their use may be associated with other problems. It should be obvious from the previous discussion that distinct philosophies of taxonomic classification yield distinct preferences for analysis methods. Even if they are associated with the same school of taxonomic thought, distinct analysis methods can yield distinct results when applied to the same data. This may also mean that distinct taxonomic classifications may be constructed from the same underlying data.
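The need for heuristic tree searches mentioned above follows from the size of tree space. As a minimal worked example, the number of distinct unrooted, strictly bifurcating topologies for n labelled taxa is the double factorial (2n − 5)!!, a standard combinatorial result; the function name below is hypothetical.

```python
def n_unrooted_trees(n_taxa):
    """Number of unrooted, strictly bifurcating trees: (2n - 5)!! for n >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5.
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

for n in (4, 5, 10, 20):
    print(n, n_unrooted_trees(n))
```

Ten taxa already allow more than two million topologies, so exhaustive evaluation under any optimality criterion is out of reach for data sets with hundreds of genomes.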
Such discrepancies are a source of conflict and need to be reconciled, irrespective of whether phenotypic or genotypic data (including genome-scale data) are analysed. This must be taken into account when discussing the impact of genomes on bacterial classification, particularly in areas where the impact of distinct schools of taxonomic thought persists. For instance, as considered below, the seemingly innocent use of a pairwise similarity value in taxonomic classifications is actually not a phylogenetic but a phenetic approach (Meier-Kolthoff et al., 2014b). It also appears


that the history of polyphasic taxonomy was affected by distinct approaches to phylogenetic classification and that the resulting discrepancies may still have a misleading effect on the construction and interpretation of taxonomic classifications.

Methodological Issues in Polyphasic Taxonomy

Polyphasic taxonomy primarily aimed to integrate distinct kinds of phenotypic data (Pelczar, 1957; Lechevalier and Lechevalier, 1970; Staneck and Roberts, 1974; Gregersen, 1978; Kroppenstedt, 1982; Minnikin et al., 1984; Collins, 1985; Sasser, 1990) and later on, to integrate genetic and phenotypic data (Vandamme et al., 1996; Gillis et al., 2005; Kämpfer and Glaeser, 2012). It dominated microbial systematics for decades but has more recently been called into question (Sutcliffe et al., 2012; Vandamme and Peeters, 2014; Sutcliffe, 2015; Thompson et al., 2015). On the one hand, the phenotypic tests still conducted routinely were regarded as unnecessary; on the other hand, critics emphasized that more genomic information should be incorporated. But these criticisms apply more to the characterization of prokaryotes (Tindall et al., 2010), targeting the kinds of characters used and which of them are actually useful, rather than to the methods used to draw conclusions, such as taxonomic classifications, from the sampled characters. In the present discussion, it is more important that polyphasic taxonomy was originally introduced in conjunction with phenetic methodology (Colwell, 1970). The inclusion of genotypic data into polyphasic taxonomy (Wayne et al., 1987; Stackebrandt, 1992) primarily happened by determining genomic G+C content (Marmur and Doty, 1962; Schildkraut et al., 1962; Owen et al., 1969; Ko et al., 1977; Mesbah et al., 1989; Moreira et al., 2011) or genome size (Brenner et al., 1972; Neimark and Lange, 1990), by conducting DNA:DNA hybridization (Brenner, 1973; Wayne et al., 1987; Stackebrandt and Goebel, 1994; Rosselló-Móra and Amann, 2001; Johnson and Whitman, 2007; Tindall et al., 2010; Meier-Kolthoff et al., 2013b), by conducting analyses of the 16S rRNA gene (Fox et al., 1977) and later on by conducting


multi-locus sequence analysis of ‘housekeeping’ genes (Glaeser and Kämpfer, 2015). This had two interesting consequences, one for the kinds of methods applied to these data, and another for the relative impact of distinct kinds of data. The first consequence of the introduction of genotypic information into polyphasic taxonomy was that, contrary to the phenetic background of polyphasic taxonomy (Colwell, 1970), phylogenetic methods such as neighbour joining began to dominate the analysis of sequence data. Neighbour joining is still in frequent use today, augmented by other phylogenetic approaches such as maximum parsimony, maximum likelihood and Bayesian methods (Table 15.1). This created a methodological inconsistency, as the interpretation of phenotypic characters continued to be based on phenetic thinking. Phenetics is most apparent in the persistent application of ‘diagnostic’ character states (García-López et al., 2019). The term ‘diagnostic’ is used in the International Code of Nomenclature of Prokaryotes (Parker et al., 2019), but a definition of the term is not given. The term is also used fairly frequently in the IJSEM, where it may or may not be applied correctly. From our perspective, the more important issue is that, in contrast to apomorphies, diagnostic character states can be plesiomorphic and thus be diagnostic of a paraphyletic group. Accordingly, in phylogenetic systematics a single character with two states would never be used to group organisms into two taxa. Determining phenotypic apomorphies requires outgroup comparisons but does not present other difficulties (Montero-Calasanz et al., 2017). While the pioneering studies of the principles of phylogenetic systematics (Hennig, 1965; Wiley and Lieberman, 2011) dealt with groups of organisms quite distinct from bacteria, the terms employed (as well as the principles and algorithms derived from these terms) are too general to be bound to certain characters or organisms.
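Why a single two-state character should not be used to delimit two taxa can be shown with a small sketch. The rooted tree, the taxon labels and the character states are hypothetical, and polarity is assigned by the simplest form of outgroup comparison (the state shared with the outgroup is treated as ancestral), which is a simplification. For each state, the code reports its polarity and whether the ingroup taxa bearing it form a clade.

```python
# Hypothetical rooted tree as nested tuples; "O" is the outgroup.
TREE = ("O", ("A", ("B", ("C", "D"))))
# Hypothetical binary character (for example, 1 = trait present).
STATES = {"O": 0, "A": 0, "B": 0, "C": 1, "D": 1}

def clade_sets(node):
    """Return (leaves below node, leaf sets of all clades below node)."""
    if isinstance(node, str):
        leaf = frozenset([node])
        return leaf, [leaf]
    below, found = frozenset(), []
    for child in node:
        child_leaves, child_clades = clade_sets(child)
        below |= child_leaves
        found.extend(child_clades)
    found.append(below)
    return below, found

def assess_states(tree, states, outgroup):
    """Map each state to (polarity, whether its ingroup bearers form a clade)."""
    clades = set(clade_sets(tree)[1])
    result = {}
    for state in sorted(set(states.values())):
        bearers = frozenset(t for t, s in states.items()
                            if s == state and t != outgroup)
        if not bearers:
            continue
        polarity = "plesiomorphic" if state == states[outgroup] else "apomorphic"
        result[state] = (polarity, bearers in clades)
    return result

print(assess_states(TREE, STATES, "O"))
```

Both states are ‘diagnostic’ in the sense that they separate {A, B} from {C, D}, but only the apomorphic state marks a monophyletic group; the plesiomorphic state delimits a paraphyletic one.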
Phylogenetic principles for character analysis have, nevertheless, rarely been used in microbiology (García-López et al., 2019). The analysis of phenotypic characters using phenetic clustering algorithms also ceased to be conducted even though it could readily be carried out using state-of-the-art software tools (Suzuki and Shimodaira, 2006; Crawley, 2007; Vaas et al., 2013; Montero-Calasanz et al., 2017; R Core Team, 2019). The only technique that


remained was the manual interpretation of diagnostic character states. Using the states of individual characters for grouping, without distinguishing between apomorphic and plesiomorphic states, is equivalent to applying clustering algorithms such as UPGMA to single characters, as the outcome is the same. In microbial taxonomy, the search for diagnostic features as opposed to apomorphies was not restricted to phenotypic characters but was also apparent in the attempts to detect 16S signatures (Zhi et al., 2009). The quest for diagnostic combinations of character states may be due to the perceived need to define taxa. Interestingly, the International Code of Nomenclature of Prokaryotes (Parker et al., 2019) only emphasizes the need for taxon descriptions; definitions of taxa are not even mentioned. There does not appear to be a theoretical or empirical proof for the assumption that it is always possible to find unique character combinations for a chosen monophyletic group in a phylogenetic tree. Indeed, ‘while it is always convenient when a group can be concisely defined, it is folly to base a principle of classification on the assumption that any taxon can be defined by a limited list of the characters of its own members’ (Farris, 1967). This was one of the reasons for proposing alternative systems such as the PhyloCode (De Queiroz and Gauthier, 1990). The use of the term ‘phylogenetic data’ for ‘sequence data’ in microbial taxonomy may be related to the methodological inconsistency caused by the coexistence of phenetic and phylogenetic thinking. For instance, Tang et al. (2012) describe results ‘based on the phenotypic, chemotaxonomic and phylogenetic data’ even though the three terms do not exclude each other. Since one can either infer phylogenetic trees from sequence data or conduct some other kind of inference (or none at all), there is nothing inherently phylogenetic in sequence data. 
Conversely, while microbial taxonomists hardly ever inferred phylogenies from phenotypic data, this does not mean that such data are inherently non-phylogenetic. Phylogenetic inference from phenotypic data was indeed much more common in other disciplines (Kluge and Farris, 1969; Farris, 1970; Lewis, 2001; Bergsten, 2005; Goloboff et al., 2005). The abuse of the term ‘phylogenetic data’ may have led microbiologists to regard all sequence-based classification as phylogenetic

classification, although this is inappropriate. Whether or not a classification is phylogenetic does not depend on the underlying data but on the methods that are applied to these data. A method that does not involve a phylogenetic tree is not phylogenetic. For this reason, referring to an oligonucleotide signature as a ‘phylogenetic definition’ of taxa (Woese et al., 1985) should also be discouraged. Phylogenetic definitions of taxa are those that are applied by the PhyloCode (De Queiroz and Gauthier, 1990). The use of the term ‘phylogenetically coherent group’ (Okamura et al., 2015) may also be caused by the decoupling of the philosophy of taxonomy in microbiology from that in other disciplines. From the viewpoint of phylogenetic systematics, taxa must be monophyletic groups. If the term ‘phylogenetically coherent group’ is supposed to have the same meaning as ‘monophyletic group’, the term ‘phylogenetically coherent’ is superfluous; if otherwise, it is misleading. In a similar vein, a clade is by definition monophyletic (Wiley and Lieberman, 2011); the expression ‘monophyletic clade’ is a pleonasm. The second consequence of the introduction of genotypic information into polyphasic taxonomy was the decrease of the weight of the examined phenotypic characters, not just because their relative importance decreased by adding other characters. Rather, 16S rRNA gene trees became the starting point in polyphasic taxonomy. After – or in conjunction with – choosing taxon boundaries from such a tree, the polyphasic approach then proceeded by determining diagnostic features for the new taxa. This was usually done manually, even though statistical methods could assist in selecting features predictive for a certain group of interest from larger numbers of characters (Breiman, 2001; Montero-Calasanz et al., 2017). 
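The distinction between diagnostic and apomorphic character states can be made concrete with a small sketch. The rooted tree, the taxon labels and the binary character below are hypothetical and serve illustration only; the tree is encoded as nested tuples, and a group is treated as monophyletic if it equals the leaf set of some subtree. State 0 is assumed to be ancestral (plesiomorphic) and state 1 derived (apomorphic).

```python
def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a string)."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree):
    """Return the leaf sets of all subtrees (clades) of a rooted tree."""
    if isinstance(tree, str):
        return [frozenset([tree])]
    found = [frozenset(leaves(tree))]
    for child in tree:
        found.extend(clades(child))
    return found


def is_monophyletic(tree, group):
    """A group is monophyletic if it is exactly the leaf set of a subtree."""
    return frozenset(group) in clades(tree)


def group_by_state(states):
    """Phenetic grouping: collect the taxa that share a character state."""
    groups = {}
    for taxon, state in states.items():
        groups.setdefault(state, set()).add(taxon)
    return groups


# Hypothetical rooted tree and hypothetical binary character.
TREE = ("O", ("A", ("B", ("C", "D"))))
STATES = {"O": 0, "A": 0, "B": 0, "C": 1, "D": 1}

for state, members in sorted(group_by_state(STATES).items()):
    print(state, sorted(members), is_monophyletic(TREE, members))
```

In this toy case both states are equally diagnostic in the phenetic sense, but only the taxa sharing the derived state form a clade; the taxa sharing the ancestral state form a paraphyletic assemblage.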
A potentially more serious problem is that such diagnostic features do not provide independent evidence for taxon boundaries, because here these boundaries were already used to choose the features. Independent evidence could only be obtained if the same groups were detected when analysing the additional features independently of the 16S rRNA gene (or other) tree (Montero-Calasanz et al., 2017). It seems that publications applying polyphasic taxonomy rarely addressed these two issues. While they may be regarded as theoretical

What can Genome Analysis Offer for Bacteria?

subtleties, the difference between diagnostic character states and apomorphic character states, in particular, can apparently affect the interpretation of taxonomic classifications. The phylogenetic placement of Turicella (Funke et al., 1994) within a paraphyletic Corynebacterium that was independently found in two genome-scale analyses (Baek et al., 2018; Nouioui et al., 2018) provides an interesting example (Table 15.2). Turicella differs from Corynebacterium with regard to the predominant menaquinones, and because mycolic acids are absent in Turicella. Baek et al. (2018) concluded that the separation of Turicella from Corynebacterium had correctly been based on this distribution of character states and that, given the conflict between this separation of genera and the phylogenetic tree, these characters were unreliable as taxonomic markers. Nouioui et al. (2018) concluded that, in the same situation, the taxonomic separation of Turicella from Corynebacterium had not correctly been based on this distribution of character states and that we cannot conclude from this outcome that these characters are unreliable as taxonomic markers. According to Nouioui et al. (2018), both Funke et al. (1994) and Baek et al. (2018) applied a phenetic approach to character interpretation, not taking into account that plesiomorphies do not justify the establishment of a taxon. But the menaquinones found in Corynebacterium, as well as the presence of mycolic acids, are plesiomorphies within the entire order to which the two genera belong (Nouioui et al., 2018). Conflict between the character and phylogenetic tree is not apparent in the menaquinones (Table 15.2). Character conflict is apparent owing to independent losses of mycolic acids (Baek


et al., 2018) but this is as expected (Nouioui et al., 2018) according to Dollo’s law, which states that complex features arise only once in evolution but may be lost several times (Farris, 1977). Homoplasy (i.e. a distribution of character states that conflicts with the phylogeny) is, particularly in bacteria, often thought to be related to HGT. A simpler phenomenon, though – the loss of features – can equally result in homoplasy. Conversely, homoplastic character-state distributions that were attributed to ancestral presence followed by multiple losses can, alternatively, be caused by HGT, as in the case of photosynthesis in Alphaproteobacteria (Brinkmann et al., 2018). But, in any case, it would be inaccurate to postulate a character conflict when there is none. Information on the genes that play an essential role in mycolic acid biosynthesis (Baek et al., 2018) does not affect this interpretation of the character-state distribution. The need to taxonomically include Turicella in Corynebacterium does not imply that the characters that had justified the split are untrustworthy if the separation of the two genera was incorrectly based on these characters. This conclusion is not affected by the possibility of alternatively classifying Corynebacterium, for example by splitting it into several genera. It is easy to create examples of a combination of phylogenetic tree and character-state distribution in which it is impossible to detect diagnostic character states for the majority of the observed monophyletic groups, even if the character is variable and fits perfectly to the tree (Nouioui et al., 2018). Such examples provide a formal proof of the possibility of such situations and cannot, apparently, be rebutted. The problem may be of even greater relevance with respect

Table 15.2. Interpretation of the character states diagnostic for Turicella (and different from Corynebacterium) in distinct studies. The two 2018 studies were based on genome-scale data.

                                                  Funke et al., 1994   Baek et al., 2018   Nouioui et al., 2018
Character interpretation                          Phenetic             Phenetic            Phylogenetic
Characters support separation of genera           Yes                  Yes                 No
Separation of genera in conflict with phylogeny   No                   Yes                 Yes
Characters in conflict with phylogeny             No                   Yes                 No (menaquinones) or not substantially (mycolic acids)
Conclusion to treat characters cautiously         No                   Yes                 No


to multi-state characters such as 16S rRNA gene signatures (Zhi et al., 2009) or peptidoglycan types (Schleifer and Kandler, 1972). The use of plesiomorphies as diagnostic character states to yield paraphyletic groups is not a new insight in phylogenetic systematics (Hennig, 1965). Whether conflict between previous taxonomic classifications and trees inferred from genome-scale data is caused by homoplasy, or by phenetic thinking, needs to be determined in each specific case (Hahnke et al., 2016; Nouioui et al., 2018; García-López et al., 2019). The principles of phylogenetic systematics are also neglected when non-monophyletic taxa are deliberately created. Whenever a study proposes a taxon based on a phylogenetic tree in which this taxon is obviously non-monophyletic, the presented data specifically contradict the conclusions. Such studies are rare but they do occur (Kulichevskaya et al., 2007, 2015; Wagener et al., 2014; Ben Hania et al., 2015). Although in some cases the decision of the authors may also be motivated by nomenclatural problems caused by the non-availability of types (Scheuner et al., 2014; Dedysh et al., 2020) or by the technique of ‘salami slicing’ in publishing (Trujillo and Oren, 2018), another possible cause of such taxonomic proposals is the application of phenetic principles. Thus, one need for reconciliation in the taxonomic classification of microbes is the coexistence of phenetic and phylogenetic thinking, probably caused by the historical affiliation of phenetics with microbiology (Sneath and Sokal, 1973) and by the according origin of polyphasic taxonomy (Colwell, 1970). This conflict in the philosophy of systematics cannot be solved by the introduction of genomic data into microbial systematics. It can only be resolved if either phenetic or phylogenetic thinking is abandoned. 
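The remark above that the loss of features can by itself produce homoplasy may be illustrated with a minimal sketch. It assumes a Dollo-like model in which a feature is gained once, at the smallest clade containing all taxa that carry it, and is subsequently only lost. The tree and the presence/absence data are hypothetical and do not represent the actual distribution of mycolic acids or any other real character.

```python
def leaf_states(tree, states):
    """Return the list of character states at the leaves below a node."""
    if isinstance(tree, str):
        return [states[tree]]
    result = []
    for child in tree:
        result.extend(leaf_states(child, states))
    return result


def count_losses(tree, states):
    """Count losses below a node that is assumed to carry the feature."""
    if not any(leaf_states(tree, states)):
        return 1  # the whole subtree lacks the feature: a single loss
    if isinstance(tree, str):
        return 0  # a leaf that still carries the feature
    return sum(count_losses(child, states) for child in tree)


def dollo_losses(tree, states):
    """Place one gain at the smallest clade with all carriers, count losses."""
    if not any(leaf_states(tree, states)):
        return 0  # the feature is absent everywhere
    node = tree
    # Descend as long as a single child clade contains every carrier.
    while not isinstance(node, str):
        with_feature = [c for c in node if any(leaf_states(c, states))]
        if len(with_feature) != 1:
            break
        node = with_feature[0]
    return count_losses(node, states)


# Hypothetical tree; 1 = feature present, 0 = feature absent.
TREE = (("A", "B"), (("C", "D"), ("E", "F")))
PRESENT = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0}

print(dollo_losses(TREE, PRESENT))  # 3 independent losses
```

Under these assumptions the taxa lacking the feature (B, D and F) do not form a clade, although no horizontal transfer is involved; three independent losses suffice to explain the distribution.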
It must be emphasized that both phylogenetic systematics and phenetics are consistent research programmes, but it is hard to see how a mixture of both should consistently be applied. A final issue with polyphasic taxonomy based on 16S rRNA gene trees is that single-gene trees frequently lack statistical support, at least on some branches. While early phylogenetic analyses of bacteria did not even calculate branch support, others were accompanied by bootstrapping, but branch support was not necessarily taken into account when drawing taxonomic

conclusions (Hahnke et al., 2016; Nouioui et al., 2018; García-López et al., 2019). This is expected to lead to the establishment of taxa that would need to be revised later on (Vences et al., 2013). Even recently, phylogenetic studies (Salter et al., 2019) are published that do not calculate branch support, either from single-gene or from genome-scale data, although the latter are expected to yield much higher support.

Causes of Conflict Between Taxonomic Classifications and Genome-scale Analyses

Two recent phylum-level studies not only proposed reclassifications based on phylogenomic analyses but also attempted to determine the causes of the need to reclassify; that is, the causes of the conflict between genome-scale results and previous taxonomic classifications. In Actinobacteria (Nouioui et al., 2018), use of plesiomorphic phenotypic character states to establish taxa had a negative impact (an example from this study was provided in the last section). In three cases a significant conflict between 16S rRNA gene and whole-genome phylogenies was detected: an apparently real conflict between single-gene and genome-scale data, potentially caused by HGT. The vast majority of the observed taxonomic problems concerned taxa for which statistical support for their monophyly was not apparent in the originally inferred or recalculated 16S rRNA gene trees. Here, the questionable taxa appeared to be monophyletic but with low bootstrap support, or even paraphyletic with weak to moderate support against their monophyly. Within Bacteroidetes (García-López et al., 2019), real conflict between 16S rRNA gene and supermatrix analyses was also observed in three cases. A handful of ancient taxa turned out to be non-monophyletic because they were not accompanied by a phylogenetic analysis in the original publication. In a small number of other situations, conflicting taxonomic views were published at about the same time, and so were difficult to avoid. As in the case of Actinobacteria, most of the taxonomic discrepancies observed within Bacteroidetes were caused by low resolution of the 16S rRNA genes used to propose


taxa. Another important cause of taxonomic conflict in Bacteroidetes was insufficient sampling of organisms, which caused the non-monophyly of taxa proposed in the literature to be overlooked. Phylogenetically conflicting taxa may easily appear monophyletic when species or strains of relevance are missing. Assignments of new species to a genus without considering its type species; of new genera to a family without considering its type genus; and analogous sampling biases for other taxonomic categories are often found in the literature. Such shortcomings in taxon sampling may be related to ‘salami slicing’ in publishing (Trujillo and Oren, 2018) but are also mechanisms for perpetuating and enlarging already non-monophyletic taxa. We can conclude that genome-based classification solves some of the taxonomic problems observed in the past, but not all of them. Insufficient sampling can also happen in genome-scale analyses and present phylogenetic (Bergsten, 2005) as well as taxonomic problems. Because not all type-strain genomes of bacterial species with validly published names (Parker et al., 2019) are yet available, comprehensive analyses of 16S rRNA gene sequences are still necessary. Backbone constraints can be used to integrate information from analyses of more genes but fewer organisms into comprehensively sampled single-gene data (Hahnke et al., 2016; Nouioui et al., 2018; García-López et al., 2019). These results do not yet reveal to what extent real conflict between traditionally employed phenotypic markers and genomic information can cause taxonomic discrepancies. Two ways can be conceived for an unbiased comparison of phenotypic and genotypic characters with respect to the phylogeny. The first approach is the independent inference of trees from separate datasets followed by the search for significantly supported branches that cause conflict between two trees. 
Because the statistical support in a phylogenetic analysis increases with the number of characters, this approach is unlikely to yield meaningful results for most sets of traditionally sampled phenotypic characters (Pelczar, 1957; Lechevalier and Lechevalier, 1970; Staneck and Roberts, 1974; Gregersen, 1978; Kroppenstedt, 1982; Minnikin et al., 1984; Collins, 1985; Sasser, 1990) as the overall number of characters is too low. The resulting phylogenies are unlikely to yield significant conflict simply because they lack branch support.


The second, more promising approach for an unbiased comparison of phenotypic and genotypic characters is to infer a single reference tree from many characters – which are to be expected in a genome-scale analysis (Klenk and Göker, 2010) – and to then compare each phenotypic and genotypic character of interest with respect to its fit to this tree using phylogenetic instead of phenetic criteria. Such a comparison was conducted for various genotypic and phenotypic features of the genus Micromonospora (Carro et al., 2018). The analysis found few significantly phylogenetically conserved phenotypic characters, but also a low number of significantly phylogenetically conserved genotypic characters, even within ‘housekeeping’ genes. It was also obvious from that study that combined phylogenetic analysis of these ‘housekeeping’ genes yielded a tree closer to the whole-genome tree (Carro et al., 2018). These results indicate that the differences in suitability between phenotypic and genotypic characters are a matter of quantity rather than a matter of quality. In a study on the entire phylum Bacteroidetes (García-López et al., 2019) characters traditionally employed in the classification of these bacteria were compared to a genome-scale tree. All characters showed a detectable fit to the tree and, with the exception of cell size, cell width and motility by gliding, even an agreement with the tree that rivalled the one of genome size and G+C content. Relationship to oxygen and average number of isoprene residues in major menaquinones (Collins and Jones, 1981) showed a higher fit to the phylogeny than the investigated genomic features. As in the case of Micromonospora, in this analysis a difference in quality between phenotypic and genotypic features could not be observed. While more phenotypic features could be analysed (Barberán et al., 2017), it must be taken into account that the lack of a difference in quality is the appropriate null hypothesis (Crawley, 2007). 
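How the fit of a single character to a fixed reference tree might be quantified can be sketched as follows. The example uses Fitch parsimony to count the minimum number of state changes on a rooted binary tree and reports the consistency index (minimum conceivable number of changes divided by the number required on the tree). This is merely one familiar phylogenetic criterion; the studies cited above may have applied other statistics, and the tree and both characters are hypothetical.

```python
def fitch(tree, states):
    """Return (possible state set, number of changes) for a rooted binary tree."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # No shared state: one additional change is needed at this node.
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """Minimum conceivable changes divided by the changes required on the tree.

    Assumes that `states` covers exactly the leaves of `tree`.
    Returns None for an invariant character, for which the index is undefined.
    """
    observed = fitch(tree, states)[1]
    if observed == 0:
        return None
    minimum = len(set(states.values())) - 1
    return minimum / observed


# Hypothetical reference tree and two hypothetical binary characters.
TREE = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
CLADE_CHAR = {"A": 0, "B": 0, "C": 0, "D": 0, "E": 1, "F": 1, "G": 1, "H": 1}
SCATTERED_CHAR = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}

print(consistency_index(TREE, CLADE_CHAR))      # 1.0: one change, perfect fit
print(consistency_index(TREE, SCATTERED_CHAR))  # 0.25: four changes needed
```

Because the same criterion is applied to every character regardless of whether it is phenotypic or genotypic, a comparison of this kind does not presuppose a difference in quality between the two kinds of data.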
Additional assumptions about differences in quality may be unnecessary, particularly because it is obvious that genome sequencing yields orders of magnitude more characters than traditional phenotypic techniques. These considerations are of relevance to the replacement of markers traditionally used in polyphasic taxonomy by genome-derived information. It is straightforward to replace all


genotypic markers with data obtained from genome sequences. This holds for analyses of single genes such as the 16S rRNA gene (Fox et al., 1977; Wayne et al., 1987; Stackebrandt, 1992), for analyses of multiple genes (Glaeser and Kämpfer, 2015), for the G+C content (Mesbah et al., 1989; Moreira et al., 2011) and for DNA:DNA hybridization (Brenner, 1973; Wayne et al., 1987; Stackebrandt and Goebel, 1994; Rosselló-Móra and Amann, 2001; Johnson and Whitman, 2007; Tindall et al., 2010; Meier-Kolthoff et al., 2013b). Single-gene analyses and multi-locus sequence analyses can be replaced by analyses of larger supermatrices up to genome-scale data sets, as discussed above. The replacement of experimental methods that indirectly determine DNA G+C composition (Marmur and Doty, 1962; Schildkraut et al., 1962; Owen et al., 1969; Ko et al., 1977; Mesbah et al., 1989; Moreira et al., 2011) by calculating it directly from accurate genome sequences yielded interesting insights. Literature statements that the variation in G+C content within bacterial species is at most 3 mol% (Mesbah et al., 1989) or even 5% (Rosselló-Móra and Amann, 2001) were attributed to experimental error when within-species variation turned out to be at most 1% if G+C content is calculated from genome sequences (Meier-Kolthoff et al., 2014b). It thus makes sense to propose emendations of species descriptions wherever previous estimates from experimental methods differ by more than 1% from values derived in silico (Meier-Kolthoff et al., 2014b; Hahnke et al., 2016; Carro et al., 2018; Nouioui et al., 2018; García-López et al., 2019). Differences in G+C content are thus routinely displayed by some servers for bacterial species delineation (Meier-Kolthoff and Göker, 2019). Bacterial genome size was experimentally accessible for decades (Brenner et al., 1972; Neimark and Lange, 1990) but, much like G+C content, it can now be trivially calculated from genome sequences. 
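Calculating the G+C content directly from a sequence, and comparing it with a previously published experimental value, is simple enough to sketch in a few lines. The function names and the toy sequence are hypothetical; the default tolerance of one percentage point follows the 1% criterion mentioned above, and ambiguous symbols are simply ignored, which is an assumption of this sketch.

```python
def gc_content(sequence):
    """G+C content in percent, ignoring ambiguous symbols and gaps."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous nucleotides in sequence")
    return 100.0 * gc / (gc + at)


def needs_emendation(published_percent, sequence, tolerance=1.0):
    """True if the published value deviates from the in silico value by more
    than `tolerance` percentage points."""
    return abs(published_percent - gc_content(sequence)) > tolerance


# Toy sequence; a real genome would be the concatenation of all replicons.
TOY_GENOME = "GGCCAATTNN"

print(gc_content(TOY_GENOME))              # 50.0
print(needs_emendation(53.0, TOY_GENOME))  # True
print(needs_emendation(50.5, TOY_GENOME))  # False
```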
Genome size appeared to be phylogenetically conserved in phylum-wide studies, albeit less so than G+C content (Nouioui et al., 2018; García-López et al., 2019). The non-linear correlation found in Actinobacteria between G+C content and genome size was not unexpected, given the known mechanisms of G+C content evolution (Rocha and Danchin, 2002; Hildebrand et al., 2010; Mann and Chen, 2010; Lassalle et al., 2015). The

smallest genomes also tend to display the lowest G+C content values, although exceptions from this rule exist (McCutcheon et al., 2009). While the two characters could accordingly be regarded as non-independent, the correspondence between G+C content and genome size dropped dramatically after correcting for the impact of the phylogeny (Nouioui et al., 2018), an effect not considered in other studies of this topic (Almpanis et al., 2018). The Actinobacteria study thus also confirmed the insight from comparative biology that phylogenies need to be taken into account when correlating characters (Harvey and Pagel, 1991; Felsenstein, 2004). This is based on Felsenstein’s (1985) observation that distinct organisms cannot be regarded as statistically independent data points, and so ordinary statistical methods fail to yield reliable results. Genome sequencing yields not only highly resolved phylogenies, but also genomic features whose evolution can be inferred using ancestral character-state reconstruction, and which can be correlated with each other and with phenotypic features or ecological preferences of interest (Simon et al., 2017). Like G+C content, DNA:DNA hybridization (DDH) is a method that targets genomic features, in that case by calculating pairwise similarities, which yielded the gold standard for bacterial species delineation (Brenner, 1973; Wayne et al., 1987; Stackebrandt and Goebel, 1994; Rosselló-Móra and Amann, 2001; Johnson and Whitman, 2007; Tindall et al., 2010; Meier-Kolthoff et al., 2013b). A variety of so-called overall genome relatedness indices (OGRI) have been implemented (Chun and Rainey, 2014), including original average nucleotide identity (ANI) (Konstantinidis and Tiedje, 2005), OrthoANI (Yoon et al., 2017), JSpecies with ANIb and ANIm (Richter and Rosselló-Móra, 2009), gANI (Varghese et al., 2015) and digital DDH (Auch et al., 2010; Meier-Kolthoff et al., 2013a; Meier-Kolthoff and Göker, 2019). 
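The need to take the phylogeny into account when correlating characters, noted above, can be illustrated with phylogenetically independent contrasts, the approach introduced by Felsenstein (1985). The sketch below is not a reconstruction of the analysis in the Actinobacteria study; the tree, its branch lengths and the two trait vectors are hypothetical. A leaf is a string, and an internal node is a pair of (child, branch length) tuples.

```python
import math


def contrasts(tree, x, y):
    """Return (x estimate, y estimate, added branch length, contrast pairs)."""
    if isinstance(tree, str):
        return x[tree], y[tree], 0.0, []
    (left, v1), (right, v2) = tree
    x1, y1, extra1, pairs1 = contrasts(left, x, y)
    x2, y2, extra2, pairs2 = contrasts(right, x, y)
    v1, v2 = v1 + extra1, v2 + extra2
    scale = math.sqrt(v1 + v2)  # standardize by the expected variance
    pair = ((x1 - x2) / scale, (y1 - y2) / scale)
    w1, w2 = 1.0 / v1, 1.0 / v2
    x_node = (w1 * x1 + w2 * x2) / (w1 + w2)
    y_node = (w1 * y1 + w2 * y2) / (w1 + w2)
    return x_node, y_node, v1 * v2 / (v1 + v2), pairs1 + pairs2 + [pair]


def pearson(xs, ys):
    """Ordinary correlation that treats the taxa as independent data points."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def contrast_correlation(tree, x, y):
    """Correlation of the contrasts, computed through the origin."""
    pairs = contrasts(tree, x, y)[3]
    sxy = sum(a * b for a, b in pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical tree with two deeply separated pairs of taxa.
TREE = (((("A", 1.0), ("B", 1.0)), 10.0), ((("C", 1.0), ("D", 1.0)), 10.0))
X = {"A": 1.0, "B": 2.0, "C": 11.0, "D": 12.0}
Y = {"A": 2.0, "B": 1.0, "C": 12.0, "D": 11.0}

taxa = sorted(X)
raw = pearson([X[t] for t in taxa], [Y[t] for t in taxa])
corrected = contrast_correlation(TREE, X, Y)
print(round(raw, 3), round(corrected, 3))
```

In this toy data set the high raw correlation is driven by a single deep split, whereas within each pair the two traits vary in opposite directions; the correlation of the contrasts is accordingly lower than the raw one.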
Given the statistical techniques that were employed, it is not surprising that digital DDH outperformed the ANI methods regarding the correlation to conventional DDH (Auch et al., 2010; Meier-Kolthoff et al., 2013a). This correlation was nevertheless the criterion used for establishing ANI in the first place (Konstantinidis and Tiedje, 2005; Richter and Rosselló-Móra, 2009), because it was proposed by the ad hoc committee for the re-evaluation of the species definition in bacteriology for judging any genome-sequence-based


bacterial species delineation method that works in silico (Wayne et al., 1987; Stackebrandt et al., 2002). The later ANI versions (Varghese et al., 2015; Yoon et al., 2017) were not examined with respect to their correlation with conventional DDH, but at most examined regarding their correlation with each other, which increased the degree of indirection. In line with the distinction between phenetic and phylogenetic criteria explained earlier in the chapter, it must be emphasized that taxonomic conclusions directly based on pairwise similarity values (such as those calculated with one of the ANI variants or by digital DDH) are not phylogenetic methods (Meier-Kolthoff et al., 2013b), and nor was traditional DDH. More similar (that is, less distant) organisms are not necessarily more closely related. For this reason, even though it is customary in the taxonomic literature to call the 70% DDH boundary the ‘threshold for the phylogenetic definition of a species’ (Yoon et al., 2007), this terminology is inaccurate. This issue is rarely mentioned in the literature, and there is very little discussion, even of the kind of discrepancy that can arise from a purely phenetic viewpoint if the data are deviating from a molecular clock and thus from the ultrametricity condition (Meier-Kolthoff et al., 2013b). A gold standard for clustering DDH values when more than two genomes are involved has not been established. But a distance or similarity threshold alone does not yield a clustering if more than two organisms are involved. Choices in the literature for clustering algorithms are often arbitrary (Meier-Kolthoff and Göker, 2019). It can be concluded that the common species delineation approaches are not fully satisfying, neither from a phenetic nor from a phylogenetic viewpoint. 
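That a similarity threshold alone does not determine a clustering once more than two organisms are involved can be shown with three hypothetical strains. The sketch applies the same 70% boundary with two standard clustering rules, single linkage and complete linkage; the similarity values are invented for illustration and do not stem from any real DDH experiment.

```python
from itertools import combinations


def single_linkage(names, sim, threshold):
    """Join strains whenever any chain of pairs reaches the threshold."""
    clusters = [{n} for n in names]
    for a, b in combinations(names, 2):
        if sim[frozenset((a, b))] >= threshold:
            ca = next(c for c in clusters if a in c)
            cb = next(c for c in clusters if b in c)
            if ca is not cb:
                clusters.remove(cb)
                ca |= cb
    return sorted(sorted(c) for c in clusters)


def complete_linkage(names, sim, threshold):
    """Join clusters only if every pair across them reaches the threshold."""
    clusters = [{n} for n in names]
    while True:
        best = None
        for c1, c2 in combinations(clusters, 2):
            weakest = min(sim[frozenset((a, b))] for a in c1 for b in c2)
            if weakest >= threshold and (best is None or weakest > best[0]):
                best = (weakest, c1, c2)
        if best is None:
            return sorted(sorted(c) for c in clusters)
        _, c1, c2 = best
        clusters.remove(c2)
        c1 |= c2


# Hypothetical pairwise similarities in percent.
NAMES = ["A", "B", "C"]
SIM = {
    frozenset(("A", "B")): 75.0,
    frozenset(("B", "C")): 72.0,
    frozenset(("A", "C")): 65.0,
}

print(single_linkage(NAMES, SIM, 70.0))    # [['A', 'B', 'C']]
print(complete_linkage(NAMES, SIM, 70.0))  # [['A', 'B'], ['C']]
```

With identical data and an identical threshold, one rule yields a single species and the other yields two, which is why the choice of clustering algorithm needs to be stated and justified.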
However, phylogenies with statistical branch support values can be calculated well from genomic distances that represent digital DDH values, and species clusterings conducted so as to implement taxonomic conservatism by minimizing the number of proposed species (Meier-Kolthoff and Göker, 2019). Replacing all genotypic markers that were routinely applied in polyphasic taxonomy by genome sequence analysis is (conceptually) apparently easy, although it is obvious that using such methods causes conflicts between phenetic and phylogenetic thinking to resurface. The numerous phenotypic features that were sampled in the past (Pelczar, 1957; Lechevalier and Lechevalier, 1970; Staneck and Roberts, 1974; Gregersen, 1978;


Kroppenstedt, 1982; Minnikin et al., 1984; Collins, 1985; Sasser, 1990) could, in theory, also be replaced by sequence-derived characters. While still imperfect, sequence information allows for the prediction of phenotypes (Weimann et al., 2016; Deneke et al., 2017; Gardner and Boyle, 2017; Baek et al., 2018). On the other hand, the continued importance of phenotypic testing has been emphasized (Kämpfer, 2014; Tabssum et al., 2018; Overmann et al., 2019). Accurately predicting phenotypes is dependent on the functional annotation of genome sequences. However, particularly in groups that hardly contain any cultivated bacteria that were phenotypically characterized in the laboratory, more than 75% of all open reading frames can only be annotated as unknown or hypothetical genes (Overmann et al., 2017). When critically reappraising polyphasic taxonomic classification, it becomes clear that the lack of consistency due to the coexistence of phenetic and phylogenetic thinking when interpreting distributions of character states cannot be overcome by replacing laboratory tests with predictions from genome sequences. Moreover, some traditionally employed phenotypic characters are not expected to fit well to genome-scale phylogenetic trees, but others are expected to display a strong phylogenetic signal. For the purpose of comparing distinct organisms, results from phenotypic tests should not be contrasted with inferred phenotypes. Genome sequences could ease taxon ‘definitions’ because they yield more characters than traditional phenotypic tests. But this would only be possible if any genomic feature was considered, not just those used to predict phenotypic features traditionally used in polyphasic taxonomy. It may also be beneficial to emphasize the way the characters are analysed, rather than only the way they are obtained. 
Inferring character evolution by reconstructing ancestral character states, and conducting statistical tests from comparative biology, can make more sense of the available data than just their presentation in tables or taxon descriptions.

Assigning Taxonomic Ranks Using Genome-scale or Other Data

A phylogenetic tree cannot be translated directly to a hierarchical classification such as the


Linnaean system because its number of ranks is limited (Hull, 1970; Farris, 1976a). If the monophyly criterion is honoured, the classification needs to be constructed by selecting subtrees (clades) from the given rooted tree. Apart from non-monophyly, phylogenetic trees and Linnaean classifications seem to fit well to each other since both are hierarchically structured. Each taxon in a Linnaean hierarchy has an associated rank; however, there was a need to introduce monotypic taxa because of (i) the convention that sister taxa should have the same rank (Hennig, 1965; Vences et al., 2013); and (ii) the frequent asymmetry of phylogenetic trees (Farris, 1976b). Monotypic taxa contain only a single child taxon and thus convey no grouping information, causing difficulties in their biological interpretation, known as ‘Gregg’s paradox’ (Farris, 1976a; Wiley and Lieberman, 2011). Some studies even concluded that monotypic taxa should be avoided as far as possible (Farris, 1976a; Wiley and Lieberman, 2011). Another difficulty related to Linnaean ranks is that distinct taxa of the same rank may or may not be quantitatively comparable. Misunderstanding this issue may have serious consequences for comparative biology and diversity estimates in ecology (Zachos, 2011). For instance, taxa of the same rank can have hugely different evolutionary ages (Avise and Johns, 1999; Avise and Liu, 2011; Holt and Jønsson, 2014). For this reason, it was suggested that Linnaean ranks should be abandoned (Zachos, 2011), but a classification without ranks can hardly be called Linnaean. The PhyloCode was suggested as a major alternative (De Queiroz and Gauthier, 1990) but also attracted severe criticism (Dominguez and Wheeler, 1997; Keller et al., 2003; Nixon et al., 2003). 
Conversely, several proposals have been made to quantitatively standardize the ranks in a Linnaean classification, but the suggested methods for standardization (as well as the popularity of these approaches) strongly differed between distinct taxonomic disciplines. Some zoologists have suggested ‘time banding’ to objectively assign higher ranks (Avise and Johns, 1999; Avise and Liu, 2011; Holt and Jønsson, 2014), an idea that goes back to Hennig (1965) and applies evolutionary age of origin to standardize ranks. Several kinds of objections have been raised against this proposal (Vences et al., 2013). A major practical concern is that

even though many molecular sequences are now available to infer phylogenies and to estimate divergence times, distinct methods often yield quite distinct estimates, even where a rich fossil record is available (Bromham and Penny, 2003; Grimm et al., 2015; Wilf and Escapa, 2015). Even though these discrepancies can partially be addressed by relaxing the application of fixed age limits when assigning taxonomic ranks (Talavera et al., 2013), this is unlikely to solve all issues with time estimates. As sequences have seldom fully evolved under a molecular clock, fitting one will almost always have to distort the signal to some degree. Moreover, fossils are often only partially preserved, and thus not necessarily easy to assign to an extant group of organisms (Bateman and Hilton, 2009). Finally, in other groups of organisms such as bacteria or fungi, the entire fossil record is scarce. For this reason, relative (rather than absolute) evolutionary times could at most be used. This holds even though the objectivity of absolute time has been regarded as its major advantage for defining taxonomic ranks (Farris, 1976a; Zachos, 2011). A further concern is that some groups of organisms simply evolved more rapidly, while others experienced stasis (which is why a molecular clock might be difficult to apply), and that this information is actually more valuable for taxonomic classification than evolutionary age (Vences et al., 2013). The quantitative standardization of ranks could be seen as a criterion of secondary importance. From that observation, however, it could be concluded that, instead of applying evolutionary ages, ranks should be standardized via phylogenetic trees that are able to display distinct evolutionary speeds because they were not forced under a molecular clock. 
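Whether a tree with branch lengths is compatible with a strict molecular clock can be checked by comparing the root-to-tip distances of all leaves, which must be equal under a clock. The sketch below assumes a rooted tree in which a leaf is a string and an internal node is a tuple of (child, branch length) pairs; both example trees and the tolerance are hypothetical.

```python
def root_to_tip(tree, distance=0.0):
    """Return the summed branch length from the root to every leaf."""
    if isinstance(tree, str):
        return {tree: distance}
    result = {}
    for child, length in tree:
        result.update(root_to_tip(child, distance + length))
    return result


def is_clocklike(tree, tolerance=1e-9):
    """True if all leaves are (nearly) equidistant from the root."""
    distances = list(root_to_tip(tree).values())
    return max(distances) - min(distances) <= tolerance


# Hypothetical trees: the second differs only in a faster-evolving lineage B.
CLOCKLIKE = (((("A", 1.0), ("B", 1.0)), 2.0), ((("C", 1.5), ("D", 1.5)), 1.5))
UNEVEN = (((("A", 1.0), ("B", 4.0)), 2.0), ((("C", 1.5), ("D", 1.5)), 1.5))

print(is_clocklike(CLOCKLIKE))  # True: every leaf is 3.0 from the root
print(is_clocklike(UNEVEN))     # False: B is 6.0 from the root
```

A tree of the second kind retains the information that one lineage evolved faster, which is precisely what is lost when the data are forced under a molecular clock.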
The suggestions of using the age of origin of a clade (Hennig, 1965) or its age of differentiation (Farris, 1976a) for assigning ranks could be revisited, after dropping the restriction that branch lengths in trees actually represent absolute or relative time spans. We might expect the species rank to be comparatively easy to standardize as, in contrast to the higher ranks, it has a distinct biological interpretation (Cohan and Perry, 2007; Vences et al., 2013). The distinct modes of speciation are well understood, but there is no specific mechanism for the generation of genera, families and any other category (Wiley and Lieberman, 2011). Nevertheless, numerous distinct


and often conflicting species concepts have been suggested, applied and criticized by taxonomists (Wheeler and Meier, 2000; Wiley and Lieberman, 2011). In fact, even modern species-delimitation methods based on population statistics can yield widely differing estimates for species boundaries (Carstens et al., 2013; Miralles and Vences, 2013); see also Chapters 15 and 16. Whatever the discrepancies in detail, species concepts applied in zoology and botany at least have the advantage of being based on theory (Cohan and Perry, 2007). In contrast, species delineation derived from DNA:DNA hybridization, as commonly applied in microbiology, is a theory-free phenetic approach. This criticism is valid even though providing an alternative may be quite challenging (Cohan and Perry, 2007; Cohan and Koeppel, 2008). Of greater concern is the delineation of taxa of higher ranks. While there are mechanisms for speciation, there are no apparent mechanisms for genuation, familation, orderation, classation and so on. The Merriam-Webster Learner’s Dictionary (www.learnersdictionary.com/, accessed 7 February 2020) defines objectivity as ‘based on facts rather than feelings or opinions’ and as ‘existing outside of the mind: existing in the real world’. If taxa of a rank higher than species are not real, one would expect that it is impossible to objectively delineate them. Subjective choices by taxonomists could, in turn, lead to discrepancies between taxonomic classifications and so to a need for reconciliation, and the probability of such discrepancies would be entirely independent of the kind of data – genome sequences or otherwise – that were analysed. Quantitative definitions of taxonomic ranks could be regarded as objective because they apply the same quantitative criterion to all taxa of the same rank. However, this does not imply that they are objective with respect to the choice of this quantitative criterion. 
In microbiology, several studies have attempted to obtain objectivity by estimating quantitative thresholds for each taxonomic rank from the genetic divergence of taxa already proposed at that rank. These efforts were directed towards single genes or genome sequences. For instance, pairwise 16S rRNA gene similarity thresholds were proposed (Yarza et al., 2014), as was the percentage of conserved proteins (POCP) as a genomic criterion for the demarcation of bacterial genera (Qin et al., 2014). The first limitation of these approaches is that they are phenetic but not phylogenetic. This was recognized by Yarza et al. (2014), who suggested a relatively complicated procedure to align the ‘operational taxonomic units’ obtained by applying similarity thresholds with the ‘operational phylogenetic units’ obtained from some phylogenetic tree. Apart from the phenetic nature of similarity or distance thresholds, such approaches have three further limitations: two technical shortcomings and one fundamental obstacle. First, by taking only a single hierarchical classification into account when estimating boundaries, these approaches failed to consider that alternative taxonomic arrangements were proposed in the literature for a huge proportion of taxon names. These alternative classifications would yield higher similarity boundaries if they contained less divergent taxa and lower boundaries if they contained more divergent taxa. Choosing a certain classification simply because it is electronically available at GenBank may be pragmatic but it does not assist in obtaining objective results. Second, both Qin et al. (2014) and Yarza et al. (2014) had to modify or remove problematic, non-monophyletic taxa prior to estimating boundaries. It must not be overlooked, however, that non-monophyletic (polyphyletic or paraphyletic) taxa can be repaired either by splitting them or by merging them with other taxa, or by a combination of both kinds of changes. But whether splitting or merging is preferable depends on the typical divergence expected for a taxon of a given rank. Because the typical divergence per rank is the major result of this kind of study, it should not be presupposed by it. Even the complete removal of all non-monophyletic taxa would not necessarily be a sufficient precaution against circular reasoning.
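The dependence of an estimated boundary on the classification chosen can be illustrated with a minimal sketch. The strain names, similarity values and the two classifications below are hypothetical, and the lowest within-taxon similarity is used only as a simple stand-in for a boundary estimate; it is not the procedure of Yarza et al. (2014) or Qin et al. (2014).

```python
from itertools import combinations


def estimate_boundary(similarity, classification):
    """Return the lowest pairwise similarity observed within any taxon.

    This is an illustrative stand-in for a rank boundary, not a published method.
    """
    within = [
        similarity[frozenset((a, b))]
        for a, b in combinations(sorted(classification), 2)
        if classification[a] == classification[b]
    ]
    return min(within)


# Hypothetical pairwise 16S rRNA gene similarities (percent) between four strains.
similarity = {
    frozenset(("s1", "s2")): 98.0,
    frozenset(("s1", "s3")): 95.0,
    frozenset(("s2", "s3")): 94.5,
    frozenset(("s1", "s4")): 90.0,
    frozenset(("s2", "s4")): 89.5,
    frozenset(("s3", "s4")): 91.0,
}

# Two hypothetical published classifications of the same strains into genera.
splitter = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}  # less divergent genera
lumper = {"s1": "A", "s2": "A", "s3": "A", "s4": "C"}    # more divergent genera

print(estimate_boundary(similarity, splitter))  # 98.0
print(estimate_boundary(similarity, lumper))    # 94.5
```

The same similarity data thus yield a higher boundary under the classification with less divergent genera and a lower one under the classification with more divergent genera, which is the effect described above.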
Because dropping taxa would discard information, the estimation could be biased, as the non-monophyletic taxa would not enter the analysis but would be affected by its outcome. It appears that the only way to properly estimate thresholds for pairwise distances or similarities for taxa of a given rank is to apply an optimality criterion that minimizes the discrepancies between two partitions: the result of applying some threshold on the one hand and the taxonomic classification on the other (Göker et al., 2009). The final result would then be obtained by integrating over a representative selection of alternative published classifications. This also appears to be the sole way of properly implementing taxonomic conservatism in this context, by minimizing the number of reclassifications that would subsequently need to be proposed.

M. Göker

Yet the third shortcoming, the fundamental obstacle of such attempts to quantitatively standardize higher ranks, would nevertheless remain. In the case of statistical distributions such as the normal curve (Crawley, 2007), a true mean exists. Empirical values drawn from a distribution like that are affected by random fluctuation and deviate from the mean by an error of an expected size. These individual values can thus be regarded as more or less precise estimates of the true value. Since the error terms cancel out, a sufficiently large number of observations allows for obtaining a reasonable estimate of the true mean. Crucially, determining the mean from empirical data does not generate objectivity. To the contrary, objectivity is presupposed, as such an approach only makes sense if the underlying real distribution contains a real mean which can be recovered from empirical observations. Depending on the robustness of the statistical approach, the analysed data may, of course, deviate to some degree from the quantitative assumptions of the model. However, the assumption that the result of the analysis reflects something that exists in reality cannot be dispensed with. In contrast, differences in maximum 16S rRNA gene or genomic distance within taxa of the same rank are unlikely to be caused by errors that occurred when these taxa were proposed or modified. Rather, these differences are most probably due to the fact that the authors of these taxa did not intend to create groups of a specified genetic or genomic divergence in the first place. In a situation like this, thresholds that are ‘existing in the real world’ cannot be determined.
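The optimality criterion mentioned earlier in this passage, minimizing the discrepancy between a threshold-derived partition and the taxonomic classification and then combining several alternative classifications, might be sketched as follows. Everything here is an assumption made for illustration: the single-linkage clustering, the pair-counting discrepancy, the plain summation over classifications, and all names and values. It is not the implementation of Göker et al. (2009).

```python
from itertools import combinations


def threshold_clusters(items, distance, threshold):
    """Single-linkage clustering: join every pair whose distance is <= threshold."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(items, 2):
        if distance[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)
    return {x: find(x) for x in items}


def discrepancy(p, q):
    """Count pairs that are grouped together in one partition but not in the other."""
    return sum(
        (p[a] == p[b]) != (q[a] == q[b])
        for a, b in combinations(sorted(p), 2)
    )


def best_threshold(items, distance, classifications, candidates):
    """Pick the candidate with the lowest discrepancy summed over all classifications."""
    def total(t):
        clusters = threshold_clusters(items, distance, t)
        return sum(discrepancy(clusters, c) for c in classifications)
    return min(candidates, key=total)


# Hypothetical pairwise distances (100 minus percent similarity) between four strains.
items = ["s1", "s2", "s3", "s4"]
distance = {
    frozenset(("s1", "s2")): 2.0,
    frozenset(("s1", "s3")): 5.0,
    frozenset(("s2", "s3")): 5.5,
    frozenset(("s1", "s4")): 10.0,
    frozenset(("s2", "s4")): 10.5,
    frozenset(("s3", "s4")): 9.0,
}
# Hypothetical alternative classifications of the same strains.
splitter = {"s1": "A", "s2": "A", "s3": "B", "s4": "C"}
lumper = {"s1": "A", "s2": "A", "s3": "A", "s4": "C"}
lumper_b = {"s1": "X", "s2": "X", "s3": "X", "s4": "Y"}
candidates = [1.0, 3.0, 6.0, 11.0]

print(best_threshold(items, distance, [splitter], candidates))                    # 3.0
print(best_threshold(items, distance, [splitter, lumper, lumper_b], candidates))  # 6.0
```

In this toy case the preferred threshold changes once further classifications are taken into account, and the discrepancy count corresponds loosely to the number of pairwise regroupings that enforcing the threshold would imply.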
Such approaches cannot yield objective criteria for taxon delineation. The resulting classifications are at most pseudo-objective because they are not actually inferred from some character data. Indeed, an approach that estimates standardized divergences of taxa of distinct ranks from the empirical divergences of taxa that were proposed in the literature cannot be regarded as objective, unless taxa of that rank really existed in nature. This may well hold for species, but there is no evidence that bacterial genera, families and taxa of even higher rank are real groups. Moreover, even if these taxa were real groups, it remains unclear whether standardizing the divergences of their taxonomic categories would allow for recognizing these real groups. Alternatively, an approach that estimates standardized divergences of taxa from empirical divergences of taxa could be justified by taxonomic conservatism alone, provided the estimation actually minimized the number of taxonomic changes that would be caused if this standardization was enforced. This would depend on the set of validly published taxon names at the time of estimation, and would require updates once new taxa are published by third parties. Objectivity cannot, apparently, be generated by such an approach.

In comparison, the Genome Taxonomy Database (GTDB) platform (Parks et al., 2018) has, apart from being available as an online database, the primary advantage of representing an entirely phylogenetic approach. It replaces pairwise distances or similarities with heights of subtrees in a comprehensive, rooted phylogenetic tree. GTDB implements a variant of the time banding approach but uses the somewhat idiosyncratic ‘relative evolutionary divergence’ approach for modifying branch lengths to account for differences in speed of evolution between lineages, instead of fitting a molecular clock (To et al., 2015). GTDB differs from an approach that only attempts to fix non-monophyletic taxa by also trying to standardize taxonomic ranks using evolutionary ages. As may be expected, this yields a high number of proposals for taxonomic revisions. A major shortcoming seems to be the failure of the authors to distinguish between paraphyletic and polyphyletic taxa (Farris, 1974), which may in turn have led the GTDB authors to solve non-monophyly only by splitting, not by merging (Parks et al., 2018).
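To make the ‘relative evolutionary divergence’ idea more concrete, the following minimal sketch rescales node heights on a rooted tree so that the root is 0 and every leaf is 1. The recurrence used, RED(n) = p + (d / u) × (1 − p), with p the value of the parent, d the branch length to the parent and u the mean distance from the parent to the leaves below n, is the formula as it is generally described for this approach. It is not given in this chapter and should be treated as an assumption, as should the `Node` class and the toy tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str                    # node names are assumed to be unique
    length: float = 0.0          # branch length to the parent (assumed positive)
    children: list = field(default_factory=list)


def leaf_distances(node):
    """Distances from `node` down to each of its descendant leaves."""
    if not node.children:
        return [0.0]
    return [child.length + d
            for child in node.children
            for d in leaf_distances(child)]


def relative_divergence(root):
    """Map node names to rescaled heights: root 0, leaves 1, internal nodes between."""
    red = {root.name: 0.0}

    def visit(node):
        p = red[node.name]
        for child in node.children:
            if not child.children:
                red[child.name] = 1.0
                continue
            below = leaf_distances(child)
            # Mean distance from the parent node to the leaves below `child`.
            u = child.length + sum(below) / len(below)
            red[child.name] = p + (child.length / u) * (1.0 - p)
            visit(child)

    visit(root)
    return red


# Hypothetical tree: lineage X evolves twice as fast as lineage Y.
tree = Node("root", children=[
    Node("X", 1.0, [Node("a", 1.0), Node("b", 1.0)]),
    Node("Y", 0.5, [Node("c", 0.5), Node("d", 0.5)]),
])

red = relative_divergence(tree)
print(red["X"], red["Y"])  # 0.5 0.5
```

Although node X lies twice as far from the root as node Y in raw branch length, both receive the same rescaled value, which is how such a normalization is meant to compensate for differences in evolutionary rate between lineages without fitting a molecular clock.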
Resolving non-monophyly only by splitting introduces a bias, because the split taxa are then used for estimating the typical ranges of evolutionary divergence for each rank, although the decision of whether to split or to lump should only be made once the expected evolutionary divergence is already known. GTDB employed only 120 marker proteins (Parks et al., 2018), and while this does not yield a genome-scale data set, the choice was probably dictated by computational limitations. In addition to using the GenBank classification, GTDB also seems to assign the same weight to taxa without a validly published name as it assigns to taxa that do have a validly published name (Parker et al., 2019), which may not be the correct approach to implement taxonomic conservatism. However, one may counter that the GTDB approach is not conservative anyway. The three general criticisms raised above also fully apply to GTDB, as there seem to be too many subjective choices to yield an objective criterion for assigning taxonomic ranks. GTDB is not an objective approach to assigning taxonomic ranks, because estimating standardized divergences of taxa from the empirical divergences of taxa does not allow for detecting taxa that really exist in nature. GTDB is not a conservative approach, either, as it does not minimize the number of implied taxonomic reclassifications. In fact, GTDB has no optimality criterion at all.

Are there alternatives to quantitatively assigning ranks? A qualitative approach uses detected synapomorphies of a set of species to assign them to a taxon of a higher rank. This is the normal manner in which phylogenetic systematics is applied in other disciplines such as zoology or botany (Wiley and Lieberman, 2011). Molecular synapomorphies became more widely applied in microbial taxonomy once genome sequences became routinely available. For instance, the actinobacterial genus Mycobacterium was dissected into five distinct genera based on molecular synapomorphies, namely conserved signature indels and conserved signature proteins (Gupta et al., 2018a). Another study (Nouioui et al., 2018) noted that because the analysis of Gupta et al. (2018a) revealed a monophyletic genus Mycobacterium with molecular synapomorphies of its own, the decision to split could be regarded as arbitrary. Gupta (2019) responded by emphasizing that ‘molecular synapomorphies can exist at different phylogenetic/taxonomic levels ranging from phylum to the genus levels’.
However, Nouioui et al. (2018) neither claimed nor presupposed that such synapomorphies could only exist at one level. Quite the contrary: precisely because molecular synapomorphies exist for many clades, their occurrence cannot rigidly be linked to a specific taxonomic rank. From the viewpoint of the monophyly criterion, and regarding the presence of molecular synapomorphies, the entire genus Mycobacterium could as well be kept. Decisions to split monophyletic genera based on molecular synapomorphies were also criticized by others (Margos et al., 2018; Tortoli et al., 2019). This criticism should not be extended to cases in which such synapomorphies were used to repair non-monophyletic groups (Gupta et al., 2018b). Another approach is to emphasize neither a specific divergence per rank nor molecular synapomorphies, but to concentrate on the relatively modest task of only fixing non-monophyletic or poorly supported taxa (Hahnke et al., 2016; Nouioui et al., 2018; García-López et al., 2019). Apart from branch support and taxonomic conservatism, objective criteria are hardly present in such an approach, and some taxonomic decisions may also appear arbitrary to some degree. However, such a protocol most likely yields fewer changes of names overall, and thus higher taxonomic conservatism.
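The modest task of detecting non-monophyletic taxa, and of checking whether a proposed split or merger would repair them, can be sketched as follows. The nested-tuple tree encoding, the function names and the toy taxa are assumptions made for illustration; branch support, which the studies cited above also take into account, is ignored here.

```python
def clades(tree):
    """Return the leaf sets of all nodes of a rooted tree given as nested tuples.

    Leaves are strings; the last element of the result is the leaf set of `tree`.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result = []
    leaves = frozenset()
    for subtree in tree:
        sub = clades(subtree)
        result.extend(sub)
        leaves |= sub[-1]  # last entry is the subtree's own leaf set
    result.append(leaves)
    return result


def is_monophyletic(tree, members):
    """A taxon is monophyletic if its members are exactly the leaves below some node."""
    return frozenset(members) in set(clades(tree))


# Hypothetical rooted tree in which species b1 is nested within genus A.
tree = ((("a1", "a2"), ("b1", "a3")), "c1")
genus_a = {"a1", "a2", "a3"}

print(is_monophyletic(tree, genus_a))                   # False
# Repair by splitting genus A into two taxa:
print(is_monophyletic(tree, {"a1", "a2"}))              # True
print(is_monophyletic(tree, {"a3"}))                    # True
# Repair by merging genus A with the genus containing b1:
print(is_monophyletic(tree, {"a1", "a2", "a3", "b1"}))  # True
```

Both repairs restore monophyly in this toy case, so the monophyly criterion alone does not decide between splitting and merging; that choice has to rest on further considerations such as taxonomic conservatism.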

Conclusion

The abundant availability of bacterial genome sequences yields several orders of magnitude more characters than traditional phenotypic techniques and Sanger sequencing of single genes (see Chapters 11 and 13). These huge data matrices can be used to infer highly supported, evolutionarily meaningful phylogenetic trees. Moreover, the evolutionary histories of this enormous number of genomic characters can be reconstructed and related to each other. Whereas comparative genomic analyses provide ‘grist to the taxonomic mill’, a variety of obstacles would need to be overcome to reconcile the conflicting taxonomic philosophies that have an impact on microbial systematics, and to ensure that distinct analysis pipelines yield the same outcome. Phenetic and phylogenetic thinking still compete with each other on the classification of bacteria, with potentially conflicting and confusing results. Some causes of problematic taxonomic classifications are independent of the type and number of characters that can be used and can only be mitigated if, for example, taxon sampling and branch support are more appropriately taken into account. It may be possible to devise objective criteria for separating bacterial species, but the currently dominating approaches for microbial species delineation may be inadequate. It is even harder to delineate higher taxa; in contrast to claims in the literature, it may prove to be impossible to objectively assign taxonomic ranks above species level.

References

Almpanis, A., Swain, M., Gatherer, D. and McEwan, N. (2018) Correlation between bacterial G+C content, genome size and the G+C content of associated plasmids and bacteriophages. Microbial Genomics 4, 1–8. https://doi.org/10.1099/mgen.0.000168
Auch, A.F., von Jan, M., Klenk, H.P. and Göker, M. (2010) Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2, 117–134. https://doi.org/10.4056/sigs.531120
Avise, J.C. and Johns, G.C. (1999) Proposal for a standardized temporal scheme of biological classification for extant species. Proceedings of the National Academy of Sciences USA 96, 7358–7363. https://doi.org/10.1073/pnas.96.13.7358
Avise, J.C. and Liu, J.X. (2011) On the temporal inconsistencies of Linnean taxonomic ranks. Biological Journal of the Linnean Society 102, 707–714. https://doi.org/10.1111/j.1095-8312.2011.01624.x
Baek, I., Kim, M., Lee, I., Na, S.-I., Goodfellow, M. and Chun, J. (2018) Phylogeny trumps chemotaxonomy: a case study involving Turicella otitidis. Frontiers in Microbiology 9, 1–10. https://doi.org/10.3389/fmicb.2018.00834
Bapteste, E. and Boucher, Y. (2009) Epistemological impacts of horizontal gene transfer on classification in microbiology. Methods in Molecular Biology 532, 55. https://doi.org/10.1007/978-1-60327-853-9
Barberán, A., Velazquez, H.C., Jones, S. and Fierer, N. (2017) Hiding in plain sight: mining bacterial species records for phenotypic trait information. mSphere 2, e00237–17. https://doi.org/10.1128/mSphere.00237-17
Bartling, P., Brinkmann, H., Bunk, B., Overmann, J., Göker, M. and Petersen, J. (2017) The composite 259kb plasmid of Martelella mediterranea DSM 17316T - a natural replicon with functional RepABC modules from Rhodobacteraceae and Rhizobiaceae. Frontiers in Microbiology 8, 1787. https://doi.org/10.3389/fmicb.2017.01787
Bateman, R.M. and Hilton, J. (2009) Palaeobotanical systematics for the phylogenetic age: Applying organ-species, form-species and phylogenetic species concepts in a framework of reconstructed fossil and extant whole-plants. Taxon 58, 1254–1280. https://doi.org/10.1002/tax.584016
Ben Hania, W., Joseph, M., Schumann, P., Bunk, B., Fiebig, A., Sproer, C., Klenk, H.P., Fardeau, M.L. and Spring, S. (2015) Complete genome sequence and description of Salinispira pacifica gen. nov., sp. nov., a novel spirochaete isolated from a hypersaline microbial mat. Standards in Genomic Sciences 10, 7. https://doi.org/10.1111/1462-2920.13639
Berger, S.A. and Stamatakis, A. (2010) Accuracy of Morphology-based Phylogenetic Fossil Placement under Maximum Likelihood. 8th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA-10), Hammamet, Tunisia. https://doi.org/10.1109/AICCSA.2010.5586939
Berger, S.A., Stamatakis, A. and Lücking, R. (2011) Morphology-based phylogenetic binning of the lichen genera Allographa and Graphis via molecular site weight calibration. Taxon 60, 1450–1457. https://doi.org/10.1002/tax.605020
Bergsten, J. (2005) A review of long-branch attraction. Cladistics 21, 163–193. https://doi.org/10.1111/j.1096-0031.2005.00059.x
Breider, S., Scheuner, C., Schumann, P., Fiebig, A., Petersen, J., Pradella, S., Klenk, H.P., Brinkhoff, T. and Göker, M. (2014) Genome-scale data suggest reclassifications in the Leisingera-Phaeobacter cluster including proposals for Sedimentitalea gen. nov. and Pseudophaeobacter gen. nov. Frontiers in Microbiology 5, 416. https://doi.org/10.3389/fmicb.2014.00416
Breiman, L. (2001) Random Forests. Machine Learning 45, 5–32. https://doi.org/10.1023/A:1010933404324
Brenner, D.J., Fanning, G.R., Skerman, F.J. and Falkow, S. (1972) Polynucleotide sequence divergence among strains of Escherichia coli and closely related organisms. Journal of Bacteriology 109, 953–965. https://doi.org/10.1128/JB.109.3.953-965.1972
Brenner, D.J. (1973) Deoxyribonucleic acid reassociation in the taxonomy of enteric bacteria. International Journal of Systematic Bacteriology 23, 298–307. https://doi.org/10.1099/00207713-23-4-298


Brinkmann, H., Göker, M., Koblížek, M., Wagner-Döbler, I. and Petersen, J. (2018) Horizontal operon transfer, plasmids and the evolution of photosynthesis in Rhodobacteraceae. The ISME Journal 12, 1994–2010. https://doi.org/10.1038/s41396-018-0150-9
Bromham, L. and Penny, D. (2003) The modern molecular clock. Nature Reviews Genetics 4, 216–224. https://doi.org/10.1038/nrg1020
Bucknam, J., Boucher, Y. and Bapteste, E. (2006) Refuting phylogenetic relationships. Biology Direct 1, 26. https://doi.org/10.1186/1745-6150-1-26
Camin, J.H. and Sokal, R.R. (1965) A method for deducing branching sequences in phylogeny. Evolution 19, 311–326. https://doi.org/10.1111/j.1558-5646.1965.tb01722.x
Carro, L., Nouioui, I., Sangal, V., Meier-Kolthoff, J.P., Trujillo, M.E., Montero-Calasanz, M.D.C., Sahin, N., Smith, D., Spittle, K., Peluso, P., Deshpande, S., Woyke, T., Shapiro, N., Kyrpides, N.C., Klenk, H-P., Göker, M. and Goodfellow, M. (2018) Genome-based classification of micromonosporae with a focus on their biotechnological and ecological potential. Scientific Reports 8, 525. https://doi.org/10.1038/s41598-017-17392-0
Carstens, B.C., Pelletier, T.A., Reid, N.M. and Satler, J.D. (2013) How to fail at species delimitation. Molecular Ecology 22, 4369–4383. https://doi.org/10.1111/mec.12413
Chun, J., Oren, A., Ventosa, A., Christensen, H., Arahal, D.R., da Costa, M.S., Rooney, A.P., Yi, H., Xu, X.-W., De Meyer, S. and Trujillo, M.E. (2018) Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 68, 461–466. https://doi.org/10.1099/ijsem.0.002516
Chun, J. and Rainey, F.A. (2014) Integrating genomics into the taxonomy and systematics of the Bacteria and Archaea. International Journal of Systematic and Evolutionary Microbiology 64, 316–324. https://doi.org/10.1099/ijs.0.054171-0
Ciccarelli, F.D., Doerks, T., von Mering, C., Creevey, C.J., Snel, B. and Bork, P. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311, 1283–1287. https://doi.org/10.1126/science.1123061
Cohan, F.M. and Koeppel, A.F. (2008) The origins of ecological diversity in prokaryotes. Current Biology 18, R1024–R1034. https://doi.org/10.1016/j.cub.2008.09.014
Cohan, F.M. and Perry, E.B. (2007) A systematics for discovering the fundamental units of bacterial diversity. Current Biology 17, R373–R386. https://doi.org/10.1016/j.cub.2007.03.032
Collins, M.D. (1985) Analysis of isoprenoid quinones. Methods in Microbiology 18, 329–366. https://doi.org/10.1016/S0580-9517(08)70480-X
Collins, M.D. and Jones, D. (1981) Distribution of isoprenoid quinone structural types in bacteria and their taxonomic implication. Microbiological Reviews 45, 316–354. https://doi.org/10.1128/MMBR.45.2.316-354.1981
Colwell, R.R. (1970) Polyphasic taxonomy of the genus Vibrio: numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. Journal of Bacteriology 104, 410–433. https://doi.org/10.1128/jb.104.1.410-433.1970
Crawley, M.J. (2007) The R Book. John Wiley and Sons, Inc., Hoboken, New Jersey. https://doi.org/10.1002/9780470515075
Cummins, C.S. and Harris, H. (1956) The chemical composition of the cell wall in some Gram-positive bacteria and its possible value as a taxonomic character. Journal of General Microbiology 14, 583–600. https://doi.org/10.1099/00221287-14-3-583
Dagan, T. and Martin, W. (2006) The tree of one percent. Genome Biology 7, 118. https://doi.org/10.1186/gb-2006-7-10-118
Darwin, C. (1859/2004) Recapitulation and conclusion. In: On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. Reprint 2004. CRW Publishing Limited, London, pp. 494–526.
Dedysh, S.N., Henke, P., Ivanova, A.A., Kulichevskaya, I.S., Philippov, D.A., Meier-Kolthoff, J.P., Göker, M., Huang, S. and Overmann, J. (2020) 100 year-old enigma solved: identification of Planctomyces bekefii, the type genus and species of the phylum Planctomycetes. Environmental Microbiology 22, 198–211. https://doi.org/10.1111/1462-2920.14838
De Ley, J. (1970) Reexamination of the association between melting point, buoyant density and chemical base composition of deoxyribonucleic acid. Journal of Bacteriology 101, 738–754. https://doi.org/10.1128/JB.101.3.738-754.1970
Deneke, C., Rentzsch, R. and Renard, B. (2017) PaPrBaG: A machine learning approach for the detection of novel pathogens from NGS data. Scientific Reports 7, 39194. https://doi.org/10.1038/srep39194


De Queiroz, K. and Gauthier, J. (1990) Phylogeny as a central principle in taxonomy: phylogenetic definitions of taxon names. Systematic Zoology 39, 307–322. https://doi.org/10.2307/2992353
Desper, R. and Gascuel, O. (2004) Theoretical foundation of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Molecular Biology and Evolution 21, 587–598. https://doi.org/10.1093/molbev/msh049
Dominguez, E. and Wheeler, Q.D. (1997) Forum - Taxonomic stability is ignorance. Cladistics 13, 367–372. https://doi.org/10.1111/j.1096-0031.1997.tb00325.x
Doolittle, W.F. and Bapteste, E. (2007) Pattern pluralism and the tree of life hypothesis. Proceedings of the National Academy of Sciences USA 104, 2043–2049. https://doi.org/10.1073/pnas.0610699104
Dutilh, B.E., van Noort, V., van der Heijden, R.T.J.M., Boekhout, T., Snel, B. and Huynen, M.A. (2007) Assessment of phylogenomic and orthology approaches for phylogenetic inference. Bioinformatics 23, 815–824. https://doi.org/10.1093/bioinformatics/btm015
Estivill-Castro, V. (2002) Why so many clustering algorithms - A Position Paper. ACM SIGKDD Explorations Newsletter 4, 65–75. https://doi.org/10.1145/568574.568575
Farris, J.S. (1967) Definitions of taxa. Systematic Zoology 16, 174–175. https://doi.org/10.2307/2411414
Farris, J.S. (1970) Methods for computing Wagner trees. Systematic Zoology 19, 83–92. https://doi.org/10.1093/sysbio/19.1.83
Farris, J.S. (1974) Formal definitions of paraphyly and polyphyly. Systematic Zoology 23, 548–554. https://doi.org/10.1093/sysbio/23.4.548
Farris, J.S. (1976a) Phylogenetic classification of fossils with recent species. Systematic Zoology 25, 271–282. https://doi.org/10.2307/2412495
Farris, J.S. (1976b) Expected asymmetry of phylogenetic trees. Systematic Zoology 25, 196–198. https://doi.org/10.2307/2412748
Farris, J.S. (1977) Phylogenetic analysis under Dollo’s Law. Systematic Zoology 26, 77–88. https://doi.org/10.1093/sysbio/26.1.77
Farris, J.S. (1979) The information content of the phylogenetic system. Systematic Zoology 28, 483–519. https://doi.org/10.2307/sysbio/28.4.483
Felsenstein, J.S. (1978) Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology 27, 401–410. https://doi.org/10.1093/sysbio/27.4.401
Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17, 368–376. https://doi.org/10.1007/BF01734359
Felsenstein, J. (1985) Phylogenies and the comparative method. The American Naturalist 125, 1–15. https://doi.org/10.1086/284325
Felsenstein, J. (2004) Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts.
Fitch, W.M. (1971) Toward defining the course of evolution: minimum change for a specific tree topology. Systematic Zoology 20, 406–416. https://doi.org/10.1093/sysbio/20.4.406
Fox, G.E., Pechman, K.R. and Woese, C.R. (1977) Comparative cataloging of 16S ribosomal ribonucleic acids: molecular approach to prokaryotic systematics. International Journal of Systematic Bacteriology 27, 44–57. https://doi.org/10.1099/00207713-27-1-44
Funke, G., Stubbs, S., Altwegg, M., Carlotti, A. and Collins, M.D. (1994) Turicella otitidis gen. nov., sp. nov., a coryneform bacterium isolated from patients with otitis media. International Journal of Systematic Bacteriology 44, 270–273. https://doi.org/10.1099/00207713-44-2-270
García-López, M., Meier-Kolthoff, J.P., Tindall, B.J., Gronow, S., Woyke, T., Kyrpides, N.C., Hahnke, R.L. and Göker, M. (2019) Analysis of 1,000 type-strain genomes improves taxonomic classification of Bacteroidetes. Frontiers in Microbiology 10, 2083. https://doi.org/10.3389/fmicb.2019.02083
Gardner, J.J. and Boyle, N.R. (2017) The use of genome-scale metabolic network reconstruction to predict fluxes and equilibrium composition of N-fixing versus C-fixing cells in a diazotrophic cyanobacterium, Trichodesmium erythraeum. BMC Systems Biology 11, 4. https://doi.org/10.1186/s12918-016-0383-z
Gee, H. (2003) Evolution: ending incongruence. Nature 425, 782. https://doi.org/10.1038/425782a
Ghiselin, M.T. (1992) Will a real evolutionary ecologist please stand up? Biology and Philosophy 7, 355–359. https://doi.org/10.1007/BF00129976
Ghiselin, M.T. (2004) Mayr and Bock versus Darwin on genealogical classification. Journal of Zoological Systematics and Evolutionary Research 42, 165–169. https://doi.org/10.1111/j.1439-0469.2004.00258.x
Gillis, P., Vandamme, P., De Vos, P., Swings, J. and Kersters, K. (2005) Polyphasic taxonomy. In: Brenner, D.J., Krieg, N.R., Staley, J.T. and Garrity, G.M. (eds) Bergey’s Manual of Systematic Bacteriology, Vol. 2, The Proteobacteria (Part A). Springer Verlag, New York, pp. 43–48. https://doi.org/10.1007/0-387-28021-9_7


Glaeser, S.P. and Kämpfer, P. (2015) Multilocus sequence analysis (MLSA) in prokaryotic taxonomy. Systematic and Applied Microbiology 38, 237–245. https://doi.org/10.1016/j.syapm.2015.03.007
Göker, M., García-Blázquez, G., Voglmayr, H., Tellería, M.T. and Martín, M.P. (2009) Molecular taxonomy of phytopathogenic fungi: a case study in Peronospora. PLoS ONE 4, e6319. https://doi.org/10.1371/journal.pone.0006319
Goloboff, P.A., Farris, J.S. and Nixon, K.C. (2008) TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786. https://doi.org/10.1111/j.1096-0031.2008.00217.x
Goloboff, P., Mattoni, C. and Quinteros, S. (2005) Continuous characters analyzed as such. Cladistics 22, 589–601. https://doi.org/10.1111/j.1096-0031.2006.00122.x
Gram, C. (1884) Über die isolierte Färbung der Schizomyceten in Schnitt- und Trockenpräparaten. Fortschritte der Medizin 2, 185–189.
Gregersen, T. (1978) Rapid method for distinction of gram-negative from positive bacteria. Applied Microbiology and Biotechnology 5, 123–127. https://doi.org/10.1007/BF00498806
Grimm, G.W., Kapli, P., Bomfleur, B., McLoughlin, S. and Renner, S.S. (2015) Using more than the oldest fossils: dating Osmundaceae with three Bayesian clock approaches. Systematic Biology 64, https://doi.org/10.1101/005777
Gupta, R.S. (2019) Commentary: Genome-based taxonomic classification of the phylum Actinobacteria. Frontiers in Microbiology 10, 206. https://doi.org/10.3389/fmicb.2019.00206
Gupta, R.S., Lo, B. and Son, J. (2018a) Phylogenomics and comparative genomic studies robustly support division of the genus Mycobacterium into an emended genus Mycobacterium and four novel genera. Frontiers in Microbiology 9, 67. https://doi.org/10.3389/fmicb.2018.00067
Gupta, R.S., Sawnani, S., Adeolu, M., Alnajar, S. and Oren, A. (2018b) Phylogenetic framework for the phylum Tenericutes based on genome sequence data: proposal for the creation of a new order Mycoplasmoidales ord. nov., containing two new families Mycoplasmoidaceae fam. nov. and Metamycoplasmataceae fam. nov. harbouring Eperythrozoon, Ureaplasma and five novel genera. Antonie Van Leeuwenhoek 111, 1583–1630. https://doi.org/10.1007/s10482-018-1047-3
Hahnke, R.L., Meier-Kolthoff, J.P., García-López, M., Mukherjee, S., Huntemann, M., Ivanova, N.N., Woyke, T., Kyrpides, N.C., Klenk, H.P. and Göker, M. (2016) Genome-based taxonomic classification of Bacteroidetes. Frontiers in Microbiology 7, 2003. https://doi.org/10.3389/fmicb.2016.02003
Harvey, P.H. and Pagel, M.D. (1991) The Comparative Method in Evolutionary Biology (Vol. 239). Oxford University Press, Oxford, UK.
Hennig, W. (1965) Phylogenetic systematics. Annual Review of Entomology 10, 97–116. https://doi.org/10.1146/annurev.en.10.010165.000525
Hennig, W. (1975) ‘Cladistic analysis or cladistic classification?’: a reply to Ernst Mayr. Systematic Zoology 24, 244–256. https://doi.org/10.1093/sysbio/24.2.244
Hess, P.N. and De Moraes Russo, C.A. (2007) An empirical test of the midpoint rooting method. Biological Journal of the Linnean Society 92, 669–674. https://doi.org/10.1111/j.1095-8312.2007.00864.x
Hildebrand, F., Meyer, A. and Eyre-Walker, A. (2010) Evidence of selection upon genomic GC-content in bacteria. PLoS Genetics 6, e1001107. https://doi.org/10.1371/journal.pgen.1001107
Holt, B.G. and Jønsson, K.A. (2014) Reconciling hierarchical taxonomy with molecular phylogenies. Systematic Biology 63, 1010–1017. https://doi.org/10.1093/sysbio/syu061
Hull, D.L. (1964) Consistency and monophyly. Systematic Zoology 13, 1–11. https://doi.org/10.2307/2411431
Hull, D.L. (1970) Contemporary systematic philosophies. Annual Review of Ecology and Systematics 1, 19–54. https://doi.org/10.1146/annurev.es.01.110170.000315
Huntemann, M., Ivanova, N.N., Mavromatis, K., Tripp, H.J., Paez-Espino, D., Palaniappan, K. et al. (2015) The standard operating procedure of the DOE-JGI Microbial Genome Annotation Pipeline (MGAP v.4). Standards in Genomic Sciences 10, 86. https://doi.org/10.1186/s40793-015-0077-y
Hyatt, D., Chen, G.L., LoCascio, P.F., Land, M.L., Larimer, F.W. and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119. https://doi.org/10.1186/1471-2105-11-119
Jeffroy, O., Brinkmann, H., Delsuc, F. and Philippe, H. (2006) Phylogenomics: the beginning of incongruence? Trends in Genetics 22, 225–231. https://doi.org/10.1016/j.tig.2006.02.003
Johnson, L.A.S. (1970) Rainbow’s end: the quest for an optimal taxonomy. Systematic Biology 19, 203–239. https://doi.org/10.2307/2412206
Johnson, J.L. and Whitman, W.B. (2007) Similarity analysis of DNAs. In: Beveridge, T.J., Breznak, J.A., Marzluf, G.A., Schmidt, T.M. and Snyder, L.R. (eds) Methods for General and Molecular Microbiology. American Society for Microbiology, Washington, DC, pp. 624–652.


Kämpfer, P. (2014) Continuing importance of the ‘phenotype’ in the genomic era. In: Methods in Microbiology Vol. 41. Academic Press, London, pp. 307–320. https://doi.org/10.1016/bs.mim.2014.07.005
Kämpfer, P. and Glaeser, S.P. (2012) Prokaryotic taxonomy in the sequencing era – the polyphasic approach revisited. Environmental Microbiology 14, 291–317. https://doi.org/10.1111/j.1462-2920.2011.02615.x
Keller, R., Boyd, R. and Wheeler, Q. (2003) The illogical basis of phylogenetic nomenclature. The Botanical Review 69, 93–110. https://doi.org/10.1663/0006-8101(2003)069[0093:TIBOPN]2.0.CO;2
Klenk, H.P. and Göker, M. (2010) En route to a genome-based classification of Archaea and Bacteria? Systematic and Applied Microbiology 33, 175–182. https://doi.org/10.1016/j.syapm.2010.03.003
Kluge, A.G. and Farris, J.S. (1969) Quantitative phyletics and the evolution of anurans. Systematic Zoology 18, 1–32. https://doi.org/10.1093/sysbio/18.1.1
Ko, C.Y., Johnson, J.L., Barnett, L.B., McNair, H.M. and Vercellotti, J.R. (1977) A sensitive estimation of the percentage of guanine plus cytosine in deoxyribonucleic acid by high performance liquid chromatography. Analytical Biochemistry 80, 183–192. https://doi.org/10.1016/0003-2697(77)90638-8
Konstantinidis, K.T. and Tiedje, J.M. (2005) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187, 6258–6264. https://doi.org/10.1128/JB.187.18.6258-6264.2005
Kroppenstedt, R.M. (1982) Separation of bacterial menaquinones by HPLC using reverse phase (RP18) and a silver loaded ion exchanger. Journal of Liquid Chromatography 5, 2359–2387. https://doi.org/10.1080/01483918208067640
Kublanov, I.V., Sigalova, O.M., Gavrilov, S.N., Lebedinsky, A.V., Rinke, C., Kovaleva, O., Chernyh, N.A., Ivanova, N.N., Daum, C., Reddy, T.B.K., Klenk, H.-P., Spring, S., Göker, M., Reva, O.N., Miroshnichenko, M.L., Kyrpides, N.C., Woyke, T., Gelfand, M.S. and Bonch-Osmolovskaya, E.A. (2017) Genomic analysis of Caldithrix abyssi and proposal of a novel bacterial phylum Calditrichaeota. Frontiers in Microbiology 8, 195. https://doi.org/10.3389/fmicb.2017.00195
Kulichevskaya, I.S., Ivanova, A.O., Belova, S.E., Baulina, O.I., Bodelier, P.L., Rijpstra, W.I., Sinninghe Damste, J.S., Zavarzin, G.A. and Dedysh, S.N. (2007) Schlesneria paludicola gen. nov., sp. nov., the first acidophilic member of the order Planctomycetales, from Sphagnum-dominated boreal wetlands. International Journal of Systematic and Evolutionary Microbiology 57, 2680–2687. https://doi.org/10.1099/ijs.0.65157-0
Kulichevskaya, I.S., Ivanova, A.A., Detkova, E.N., Rijpstra, W.I., Sinninghe Damste, J.S. and Dedysh, S.N. (2015) Planctomicrobium piriforme gen. nov., sp. nov., a stalked planctomycete from a littoral wetland of a boreal lake. International Journal of Systematic and Evolutionary Microbiology 65, 1659–1665. https://doi.org/10.1099/ijs.0.000154
Kyrpides, N.C., Hugenholtz, P., Eisen, J.A., Woyke, T., Göker, M., Parker, C.T., Amann, R., Beck, B.J., Chain, P., Chun, J., Colwell, R.R., Danchin, A., Dawyndt, P., Dedeurwaerdere, T., DeLong, E.F., Detter, J.C., Vos, P.D., Donohue, T.J., Dong, X.Z., Ehrlich, D.S., Fraser, C., Gibbs, R., Gilbert, J., Gilna, P., Glöckner, F.O., Jansson, J.K., Keasling, J.D., Knight, R., Labeda, D., Lapidus, A., Lee, J.S., Li, W.J., Ma, J., Markowitz, V., Moore, E.R., Morrison, M., Meyer, F., Nelson, K.E., Ohkuma, M., Ouzounis, C.A., Pace, N., Parkhill, J., Qin, N., Rossello-Mora, R., Sikorski, J., Smith, D., Sogin, M., Stevens, R., Stingl, U., Suzuki, K.I., Taylor, D., Tiedje, J.M., Tindall, B.J., Wagner, M., Weinstock, G., Weissenbach, J., White, O., Wang, J., Zhang, L., Zhou, Y.G., Consortium, G.S., Field, D., Whitman, W.B., Garrity, G.M. and Klenk, H.P. (2014) Genomic encyclopedia of bacteria and archaea: sequencing a myriad of type strains. PLoS Biology 12, e1001920. https://doi.org/10.1371/journal.pbio.1001920
Lassalle, F., Périan, S., Bataillon, T., Nesme, X., Duret, L. and Daubin, V. (2015) GC-Content evolution in bacterial genomes: the biased gene conversion hypothesis expands. PLoS Genetics 11, e1004941. https://doi.org/10.1371/journal.pgen.1004941
Lechevalier, M.P. and Lechevalier, H.A. (1970) Chemical composition as a criterion in the classification of aerobic actinomycetes. International Journal of Systematic Bacteriology 20, 435–443. https://doi.org/10.1099/00207713-20-4-435
Lechner, M., Findeiß, S., Steiner, L., Marz, M., Stadler, P.F. and Prohaska, S.J. (2011) Proteinortho: Detection of (Co-)orthologs in large-scale analysis. BMC Bioinformatics 12, 124. https://doi.org/10.1186/1471-2105-12-124
Leebens-Mack, J., Raubeson, L.A., Cui, L., Kuehl, J.V., Fourcade, M.H., Chumley, T.W., Boore, J.L., Jansen, R.K. and dePamphilis, C.W. (2005) Identifying the basal angiosperm node in chloroplast genome phylogenies: sampling’s one way out of the Felsenstein zone. Molecular Biology and Evolution 22, 1948–1963. https://doi.org/10.1093/molbev/msi191

What can Genome Analysis Offer for Bacteria?

Lefort, V., Desper, R. and Gascuel, O. (2015) FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Molecular Biology and Evolution 32, 2798–2800. https://doi.org/10.1093/molbev/msv150
Legendre, P. and Legendre, L. (1998) Numerical Ecology, 2nd edn. Elsevier Science BV, Amsterdam.
Lester, R.L. and Crane, F.L. (1959) The natural occurrence of coenzyme Q and related compounds. Journal of Biological Chemistry 234, 2169–2175.
Lewis, P. (2001) A likelihood approach to estimating phylogeny from discrete morphological character data. Systematic Biology 50, 913–925. https://doi.org/10.1080/106351501753462876
Lienau, K.E. and DeSalle, R. (2009) Evidence, content and corroboration and the tree of life. Acta Biotheoretica 57, 187–199. https://doi.org/10.1007/s10441-008-9066-5
Mann, S. and Chen, Y.P.P. (2010) Bacterial genomic G + C composition-eliciting environmental adaptation. Genomics 95, 7–15. https://doi.org/10.1016/j.ygeno.2009.09.002
Margos, G., Gofton, A., Wibberg, D., Dangel, A., Marosevic, D., Loh, S.M. et al. (2018) The genus Borrelia reloaded. PLoS ONE 13, e0208432. https://doi.org/10.1371/journal.pone.0208432
Marmur, J. and Doty, P. (1962) Determination of the base composition of deoxyribonucleic acid from its thermal denaturation temperature. Journal of Molecular Biology 5, 109–118. https://doi.org/10.1016/S0022-2836(62)80066-7
Mavromatis, K., Land, M.L., Brettin, T.S., Quest, D.J., Copeland, A., Clum, A. et al. (2012) The fast changing landscape of sequencing technologies and their impact on microbial genome assemblies and annotation. PLoS One 7, e48837. https://doi.org/10.1371/journal.pone.0048837
Mayr, E. (1974) Cladistic analysis or cladistic classification? Journal of Zoological Systematics And Evolutionary Research 12, 94–128. https://doi.org/10.1111/j.1439-0469.1974.tb00160.x
McCutcheon, J.P., McDonald, B.R. and Moran, N.A. (2009) Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genetics 5, e1000565. https://doi.org/10.1371/journal.pgen.1000565
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.P. and Göker, M. (2013a) Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 14, 60. https://doi.org/10.1186/1471-2105-14-60
Meier-Kolthoff, J.P., Göker, M., Spröer, C. and Klenk, H.P. (2013b) When should a DDH experiment be mandatory in microbial taxonomy? Archives of Microbiology 195, 413–418. https://doi.org/10.1007/s00203-013-0888-4
Meier-Kolthoff, J.P., Auch, A.F., Klenk, H.P. and Göker, M. (2014a) Highly parallelized inference of large genome-based phylogenies. Concurrency And Computation-Practice & Experience 26, 1715–1729. https://doi.org/10.1002/cpe.3112
Meier-Kolthoff, J.P., Klenk, H.P. and Göker, M. (2014b) Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. International Journal of Systematic and Evolutionary Microbiology 64, 352–356. https://doi.org/10.1099/ijs.0.056994-0
Meier-Kolthoff, J.P. and Göker, M. (2019) TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nature Communications 10, 2182. https://doi.org/10.1038/s41467-019-10210-3
Mesbah, M., Premachandran, U. and Whitman, W.B. (1989) Precise measurement of the G+C content of deoxyribonucleic acid by high-performance liquid chromatography. International Journal of Systematic Bacteriology 39, 159–167. https://doi.org/10.1099/00207713-39-2-159
Minnikin, D.E., O’Donnell, A.G., Goodfellow, M., Alderson, G., Athalye, M., Schaal, K. and Parlett, J.H. (1984) An integrated procedure for the extraction of bacterial isoprenoid quinones and polar lipids. Journal of Microbiological Methods 2, 233–241. https://doi.org/10.1016/0167-7012(84)90018-6
Miralles, A. and Vences, M.O. (2013) New metrics for comparison of taxonomies reveal striking discrepancies among species delimitation methods in Madascincus lizards. PLoS One 2013, 8. https://doi.org/10.1371/journal.pone.0068242
Montero-Calasanz, M.D.C., Meier-Kolthoff, J.P., Zhang, D.F., Yaramis, A., Rohde, M., Woyke, T., Kyrpides, N., Schumann, P., Li, W.J. and Göker, M. (2017) Genome-scale data call for a taxonomic rearrangement of the family Geodermatophilaceae. Frontiers in Microbiology 8, 2501. https://doi.org/10.3389/fmicb.2017.02501
Moreira, A.P.B., Pereira Jr., N. and Thompson, F.L. (2011) Usefulness of a real-time PCR platform for G+C content and DNA-DNA hybridization estimations in vibrios. International Journal of Systematic and Evolutionary Microbiology 61, 2379–2383. https://doi.org/10.1099/ijs.0.023606-0

Mukherjee, S., Seshadri, R., Varghese, N.J., Eloe-Fadrosh, E.A., Meier-Kolthoff, J.P., Göker, M., Coates, R.C., Hadjithomas, M., Pavlopoulos, G.A., Espino, D.P., Yoshikuni, Y., Visel, A., Whitman, W.B., Garrity, G.M., Eisen, J.A., Hugenholtz, P., Pati, A., Ivanova, N.N., Woyke, T., Klenk, H.-P. and Kyrpides, N.C. (2017) 1,003 reference genomes of bacterial and archaeal isolates expand coverage of the tree of life. Nature Biotechnology 35, 676–685. https://doi.org/10.1038/nbt.3886
Neimark, H.C. and Lange, C.S. (1990) Pulse-field electrophoresis indicates full-length mycoplasma chromosomes range widely in size. Nucleic Acids Research 18, 5443–5448. https://doi.org/10.1093/nar/18.18.5443
Nixon, K., Carpenter, J. and Stevenson, D. (2003) The PhyloCode is fatally flawed, and the ‘Linnaean’ system can easily be fixed. The Botanical Review 69, 111–120. https://doi.org/10.1663/0006-8101(2003)069[0111:TPIFFA]2.0.CO;2
Nouioui, I., Carro, L., García-López, M., Meier-Kolthoff, J.P., Woyke, T., Kyrpides, N.C., Klenk, H.P., Goodfellow, M. and Göker, M. (2018) Genome-based taxonomic classification of the phylum Actinobacteria. Frontiers in Microbiology 9, 2007. https://doi.org/10.3389/fmicb.2018.02007
Okamura, K., Kawai, A., Wakao, N., Yamada, T. and Hiraishi, A. (2015) Acidiphilium iwatense sp. nov., isolated from an acid mine drainage treatment plant, and emendation of the genus Acidiphilium. International Journal of Systematic and Evolutionary Microbiology 65, 42–48. https://doi.org/10.1099/ijs.0.065052-0
Overmann, J., Abt, B. and Sikorski, J. (2017) Present and future of culturing bacteria. Annual Review of Microbiology 71, 711–730. https://doi.org/10.1146/annurev-micro-090816-093449
Overmann, J., Huang, S., Nübel, U., Hahnke, R.L. and Tindall, B.J. (2019) Relevance of phenotypic information for the taxonomy of not-yet-cultured microorganisms. Systematic and Applied Microbiology 42, 22–29. https://doi.org/10.1016/j.syapm.2018.08.009
Owen, R.J., Hill, L.R. and Lapage, S.P. (1969) Determination of DNA base compositions from melting profiles in dilute buffers. Biopolymers 7, 503–516. https://doi.org/10.1002/bip.1969.360070408
Parker, C.T., Tindall, B.J. and Garrity, G.M. (2019) International Code of Nomenclature of Prokaryotes. Prokaryotic Code (2008 Revision). International Journal of Systematic and Evolutionary Microbiology 69, S1–S111. https://doi.org/10.1099/ijsem.0.000778
Parks, D., Chuvochina, M., Waite, D. et al. (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36, 996–1004. https://doi.org/10.1038/nbt.4229
Pelczar, M.J. Jr (ed.) (1957) Manual of Microbiological Methods. McGraw-Hill Book Co, New York.
Penny, D., Foulds, L.R. and Hendy, M.D. (1982) Testing the theory of evolution by comparing phylogenetic trees constructed from five different protein sequences. Nature 297, 197–200. https://doi.org/10.1038/297197a0
Phillips, M.J., Delsuc, F. and Penny, D. (2004) Genome-scale phylogeny and the detection of systematic biases. Molecular Biology and Evolution 21, 1740–1752. https://doi.org/10.1093/molbev/msh137
Philippe, H., Delsuc, F., Brinkmann, H. and Lartillot, N. (2005) Phylogenomics. Annual Review of Ecological Systems 36, 541–562. https://doi.org/10.1146/annurev.ecolsys.35.112202.130205
Price, M.N., Dehal, P.S. and Arkin, A.P. (2010) FastTree 2 – Approximately Maximum-Likelihood trees for large alignments. PLoS ONE 5, e9490. https://doi.org/10.1371/journal.pone.0009490
Qin, Q.L., Xie, B.B., Zhang, X.Y., Chen, X.L., Zhou, B.C., Zhou, J. et al. (2014) A proposed genus boundary for the prokaryotes based on genomic insights. Journal of Bacteriology 196, 2210–2215. https://doi.org/10.1128/JB.01688-14
Ramasamy, D., Mishra, A.K., Lagier, J.C., Padhmanabhan, R., Rossi, M., Sentausa, E. et al. (2014) A polyphasic strategy incorporating genomic data for the taxonomic description of novel bacterial species. International Journal of Systematic and Evolutionary Microbiology 64, 384–391. https://doi.org/10.1099/ijs.0.057091-0
R Core Team (2019) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Richter, M. and Rosselló-Móra, R. (2009) Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences USA 106, 19126–19131. https://doi.org/10.1073/pnas.0906412106
Rocha, E.P.C. and Danchin, A. (2002) Base composition bias might result from competition for metabolic resources. Trends in Genetics 18, 291–294. https://doi.org/10.1016/S0168-9525(02)02690-2
Rosselló-Mora, R. and Amann, R. (2001) The species concept for prokaryotes. FEMS Microbiological Reviews 25, 39–67. https://doi.org/10.1016/S0168-6445(00)00040-1
Salter, S.J., Scott, P., Page, A.J., Tracey, A., de Goffau, M.C., Cormie, C., Ochoa-Montano, B., Ling, C.L., Tangmanakit, J., Turner, P. et al. (2019) ‘Candidatus Ornithobacterium hominis’: insights gained from draft genomes obtained from nasopharyngeal swabs. Microbial Genomics 159, 157–160. https://doi.org/10.1099/mgen.0.000247
Sangal, V., Goodfellow, M., Jones, A.L., Schwalbe, E.C., Blom, J., Hoskisson, P.A. et al. (2016) Next-generation systematics: an innovative approach to resolve the structure of complex prokaryotic taxa. Scientific Reports 6, 38392. https://doi.org/10.1038/srep38392
Sanger, F., Nicklen, S. and Coulson, A.R. (1977) DNA sequencing with chain-termination inhibitors. Proceedings of the National Academy of Sciences USA 74, 5463–5467. https://doi.org/10.1073/pnas.74.12.5463
Sasser, M. (1990) Identification of bacteria by gas chromatography of cellular fatty acids. USFCC Newsletter 20, 16.
Schildkraut, C.L., Marmur, J. and Doty, P. (1962) Determination of the base composition of deoxyribonucleic acid from its buoyant density in CsCl. Journal of Molecular Biology 4, 430–443. https://doi.org/10.1016/S0022-2836(62)80100-4
Scheuner, C., Tindall, B.J., Lu, M., Nolan, M., Lapidus, A., Cheng, J.F., Goodwin, L.A., Pitluck, S., Huntemann, M., Liolios, K., Pagani, I., Mavromatis, K., Ivanova, N.N., Pati, A., Chen, A., Palaniappan, K., Jeffries, C.D., Hauser, L., Land, M., Mwirichia, R., Rohde, M., Abt, B., Detter, J.C., Woyke, T., Eisen, J.A., Markowitz, V., Hugenholtz, P., Göker, M., Kyrpides, N.C. and Klenk, H.P. (2014) Complete genome sequence of Planctomyces brasiliensis type strain (DSM 5305T), phylogenomic analysis and reclassification of Planctomycetes including the descriptions of Gimesia gen. nov., Planctopirus gen. nov. and Rubinisphaera gen. nov. and emended descriptions of the order Planctomycetales and the family Planctomycetaceae. Standards in Genomic Sciences 9, 10. https://doi.org/10.1186/1944-3277-9-10
Schleifer, K.-H. and Kandler, O. (1972) Peptidoglycan types of bacteria cell walls and their taxonomic implications. Bacteriological Reviews 36, 407–477. https://doi.org/10.1128/MMBR.36.4.407-477.1972
Seemann, T. (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069. https://doi.org/10.1093/bioinformatics/btu153
Siddall, M.E. (2010) Unringing a bell: Metazoan phylogenomics and the partition bootstrap. Cladistics 26, 444–452. https://doi.org/10.1111/j.1096-0031.2009.00295.x
Siddall, M.E. and Whiting, M.F. (1999) Long-branch abstractions. Cladistics 15, 9–24. https://doi.org/10.1111/j.1096-0031.1999.tb00391.x
Simon, M., Scheuner, C., Meier-Kolthoff, J.P., Brinkhoff, T., Wagner-Döbler, I., Ulbrich, M., Klenk, H.P., Schomburg, D., Petersen, J. and Göker, M. (2017) Phylogenomics of Rhodobacteraceae reveals evolutionary adaptation to marine and non-marine habitats. The ISME Journal 2017, 1–17. https://doi.org/10.1038/ismej.2016.198
Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy: The Principle and Practice of Numerical Classification. W.H. Freeman and Company, San Francisco, California.
Sokal, R.R. (1984) Phenetic taxonomy: theory and methods. Annual Review on Ecology, Evolution and Systematics 17, 423–442. https://doi.org/10.1146/annurev.es.17.110186.002231
Sokal, R.R. (1985) The continuing search for order. The American Naturalist 126, 729–749. https://doi.org/10.1086/284450
Sokal, R.R. and Camin, J.H. (1965) The two taxonomies: areas of agreement and conflict. Systematic Zoology 14, 176–195. https://doi.org/10.2307/2411548
Stackebrandt, E. (1992) Unifying phylogeny and phenotypic diversity. In: Balows, A., Trueper, H.G., Dworkin, M., Harder, W. and Schleifer, K.H. (eds) The Prokaryotes. Springer, New York, pp. 19–47.
Stackebrandt, E. and Goebel, B.M. (1994) Taxonomic note: a place for DNA-DNA reassociations and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44, 846–849. https://doi.org/10.1099/00207713-44-4-846
Stackebrandt, E. et al. (2002) Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. International Journal of Systematic and Evolutionary Microbiology 52, 1043–1047. https://doi.org/10.1099/ijs.0.02360-0
Stamatakis, A., Hoover, P. and Rougemont, J. (2008) A rapid bootstrap algorithm for the RAxML web-servers. Systematic Biology 57, 758–771. https://doi.org/10.1080/10635150802429642
Staneck, J.L. and Roberts, G.D. (1974) Simplified approach to identification of aerobic actinomycetes by thin-layer chromatography. Applied Microbiology 28, 226–231. https://doi.org/10.1128/AEM.28.2.226-231.1974
Sutcliffe, I.C. (2015) Challenging the anthropocentric emphasis on phenotypic testing in prokaryotic species descriptions: Rip it up and start again. Frontiers in Genetics 6, 6–9. https://doi.org/10.3389/fgene.2015.00218

Sutcliffe, I.C., Trujillo, M.E. and Goodfellow, M. (2012) A call for arms to systematists: Revitalising the purpose and practices underpinning the description of novel microbial taxa. Antonie van Leeuwenhoek 101, 13–20. https://doi.org/10.1007/s10482-011-9664-0
Suzuki, R. and Shimodaira, H. (2006) Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542. https://doi.org/10.1093/bioinformatics/btl117
Swofford, D.L. (2002) PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4.0 b10. Sinauer Associates, Sunderland, Massachusetts.
Tabssum, F., Ahmad, Q.-U. and Qazi, J.I. (2018) DNA sequenced based bacterial taxonomy should entail decisive phenotypic remarks: Towards a balanced approach. Journal of Basic Microbiology 58, 918–927. https://doi.org/10.1002/jobm.201800319
Talavera, G., Lukhtanov, V.A., Pierce, N.E. and Vila, R. (2013) Establishing criteria for higher-level classification using molecular data: The systematics of Polyommatus blue butterflies (Lepidoptera, Lycaenidae). Cladistics 29, 166–192. https://doi.org/10.1111/j.1096-0031.2012.00421.x
Tan, G., Muffato, M., Ledergerber, C., Herrero, J., Goldman, N., Gil, M. and Dessimoz, C. (2015) Current methods for automated filtering of multiple sequence alignments frequently worsen single-gene phylogenetic inference. Systematic Biology 64, 778–791. https://doi.org/10.1093/sysbio/syv033
Tang, X., Zhou, Y., Zhang, J., Ming, H., Nie, G.X., Yang, L.L., Tang, S.K. and Li, W.J. (2012) Actinokineospora soli sp. nov., a thermotolerant actinomycete isolated from soil, and emended description of the genus Actinokineospora. International Journal of Systematic and Evolutionary Microbiology 62, 1845–1849. https://doi.org/10.1099/ijs.0.035832-0
Tanizawa, Y., Fujisawa, T. and Nakamura, Y. (2018) DFAST: a flexible prokaryotic genome annotation pipeline for faster genome publication. Bioinformatics 34, 1037–1039. https://doi.org/10.1093/bioinformatics/btx713
Taylor, D.J. and Piel, W.H. (2004) An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data. Molecular Biology and Evolution 21, 1534–1537. https://doi.org/10.1093/molbev/msh156
Thompson, C.C., Amaral, G.R., Campeão, M., Edwards, R.A., Polz, M.F., Dutilh, B.E. et al. (2015) Microbial taxonomy in the post-genomic era: Rebuilding from scratch? Archives of Microbiology 197, 359–370. https://doi.org/10.1007/s00203-014-1071-2
Tindall, B.J., Sikorski, J., Smibert, R.A. and Krieg, N.R. (2007) Phenotypic characterization and the principles of comparative systematics. In: Reddy, A. (ed.) Methods for General and Molecular Microbiology, 3rd edn. ASM Press, Washington, DC, pp. 159–167. ISBN: 978-1-55581-223-2 https://doi.org/10.1128/9781555817497.ch15
Tindall, B.J., Rosselló-Móra, R., Busse, H.J., Ludwig, W. and Kämpfer, P. (2010) Notes on the characterization of prokaryote strains for taxonomic purposes. International Journal of Systematic and Evolutionary Microbiology 60, 249–266. https://doi.org/10.1099/ijs.0.016949-0
To, T.H., Jung, M., Lycett, S. and Gascuel, O. (2015) Fast dating using least-squares criteria and algorithms. Systematic Biology 65, 82–97. https://doi.org/10.1093/sysbio/syv068
Tortoli, E., Brown-Elliott, B.A., Chalmers, J.D. et al. (2019) Same meat, different gravy: ignore the new names of mycobacteria. European Respiratory Journal 54, 1900795. https://doi.org/10.1183/13993003.00795-2019
Trujillo, M.E. and Oren, A. (2018) Avoiding ‘salami slicing’ in publications describing new prokaryotic taxa. International Journal of Systematic and Evolutionary Microbiology 68, 977–978. https://doi.org/10.1099/ijsem.0.002634
Vaas, L.A.I., Sikorski, J., Hofer, B., Fiebig, A., Buddruhs, N., Klenk, H.P. and Göker, M. (2013) opm: An R package for analysing OmniLog Phenotype Microarray data. Bioinformatics 29, 1823–1824. https://doi.org/10.1093/bioinformatics/btt291
Vandamme, P. and Peeters, C. (2014) Time to revisit polyphasic taxonomy. Antonie van Leeuwenhoek 106, 57–65. https://doi.org/10.1007/s10482-014-0148-x
Vandamme, P., Pot, B., Gillis, M., de Vos, P., Kersters, K. and Swings, J. (1996) Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiological Reviews 60, 407–438. https://doi.org/10.1128/MMBR.60.2.407-438.1996
Varghese, N.J. et al. (2015) Microbial species delineation using whole genome sequences. Nucleic Acids Research 43, 6761–6771. https://doi.org/10.1093/nar/gkv657
Vences, M., Guayasamin, J.M., Miralles, A. and De La Riva, I. (2013) To name or not to name: Criteria to promote economy of change in Linnaean classification schemes. Zootaxa 3636, 201–244. https://doi.org/10.11646/zootaxa.3636.2.1

Wagener, K., Drillich, M., Baumgardt, S., Kämpfer, P., Busse, H.J. and Ehling-Schulz, M. (2014) Falsiporphyromonas endometrii gen. nov., sp. nov., isolated from the post-partum bovine uterus, and emended description of the genus Porphyromonas Shah and Collins 1988. International Journal of Systematic and Evolutionary Microbiology 64, 642–649. https://doi.org/10.1099/ijs.0.057307-0
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. and Trüper, H.G. (1987) International Committee on Systematic Bacteriology. Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463–464. https://doi.org/10.1099/00207713-37-4-463
Weimann, A., Mooren, K., Frank, J., Pope, P.B., Bremges, A. and McHardy, A.C. (2016) From genomes to phenotypes: Traitar, the microbial trait analyzer. mSystems 1, 6. https://doi.org/10.1128/mSystems.00101-16
Wheeler, Q. and Meier, R. (2000) Species Concepts and Phylogenetic Theory: A Debate. Columbia University Press, New York.
Wiley, E.O. and Lieberman, B.B.S. (2011) Phylogenetics: Theory and Practice of Phylogenetic Systematics. 2nd edn. John Wiley and Sons, Inc., Hoboken, New Jersey. https://doi.org/10.2307/3280934
Wilf, P. and Escapa, I.H. (2015) Green Web or megabiased clock? Plant fossils from Gondwanan Patagonia speak on evolutionary radiations. New Phytologist 207, 283–290. https://doi.org/10.1111/nph.13114
Woese, C.R., Stackebrandt, E., Macke, T.J. and Fox, G.E. (1985) A phylogenetic definition of the major eubacterial taxa. Systematic and Applied Microbiology 6, 143–151. https://doi.org/10.1016/S0723-2020(85)80047-3
Wolf, Y., Rogozin, I., Grishin, N., Tatusov, R. and Koonin, E. (2001) Genome trees constructed using five different approaches suggest new major bacterial clades. BMC Evolutionary Biology 1, 8. https://doi.org/10.1186/1471-2148-1-8
Wu, M. and Eisen, J.A. (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biology 9, R151. https://doi.org/10.1186/gb-2008-9-10-r151
Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12, 635–645. https://doi.org/10.1038/nrmicro3330
Yoon, J., Yasumoto-Hirose, M., Matsuo, Y., Nozawa, M., Matsuda, S., Kasai, H. and Yokota, A. (2007) Pelagicoccus mobilis gen. nov., sp. nov., Pelagicoccus albus sp. nov. and Pelagicoccus litoralis sp. nov., three novel members of subdivision 4 within the phylum ‘Verrucomicrobia’, isolated from seawater by in situ cultivation. International Journal of Systematic and Evolutionary Microbiology 57, 1377–1385. https://doi.org/10.1099/ijs.0.64970-0
Yoon, S.H. et al. (2017) Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. International Journal of Systematic and Evolutionary Microbiology 67, 1613–1617. https://doi.org/10.1099/ijsem.0.001755
Zachos, F.E. (2011) Linnean ranks, temporal banding, and time-clipping: Why not slaughter the sacred cow? Biological Journal of the Linnean Society 103, 732–734. https://doi.org/10.1111/j.1095-8312.2011.01711.x
Zhi, X.Y., Li, W.J. and Stackebrandt, E. (2009) An update of the structure and 16S rRNA gene sequence-based definition of higher ranks of the class Actinobacteria, with the proposal of two new suborders and four new families and emended descriptions of the existing higher taxa. International Journal of Systematic and Evolutionary Microbiology 59, 589–608. https://doi.org/10.1099/ijs.0.65780-0

16

Genomes Reveal the Cohesiveness of Bacterial Species Taxa and Provide a Path Towards Describing All of Bacterial Diversity

Frederick M. Cohan*

Department of Biology, Wesleyan University, Middletown, CT, USA

*[email protected]

© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

Introduction

Scientists in many disciplines rely on the systematics of bacteria, among them epidemiologists, biotechnologists, agriculturists, microbial ecologists and evolutionary biologists, and they all place high demands on the classification of bacterial species. All these consumers of bacterial systematics demand a reasonably complete accounting and description of the world’s species. I will address how far bacterial systematics has fallen behind in identifying and describing species that we now know exist (Yarza et al., 2014; Garrity, 2016; Locey and Lennon, 2016). An existential problem for systematics is that our current approach to describing species applies only to cultivated bacteria, since it requires a cultivated type strain (Garrity, 2016). Moreover, the current ‘polyphasic’ approach to species taxonomy requires a labour-intensive characterization of any new species through laboratory testing of its properties (Vandamme and Peeters, 2014). I will discuss how genomic approaches give us a means to catch up, by potentially revealing the full metabolic capacity of any organism, and a means to demarcate organisms into species taxa (Garrity and Lyons, 2011; Thompson et al., 2014; Vandamme and Peeters, 2014; Garrity, 2016; Konstantinidis et al., 2017).

To satisfy evolutionary biologists, systematists have aspired to define species according to an evolutionary theory of species origination. Speciologists of animals and plants have proposed that a species should have certain dynamical properties. Most famously, Ernst Mayr and others have suggested that species should be defined by properties of genetic exchange (Mayr, 1963; Coyne and Orr, 2004), and some bacteriologists have suggested extending that property to bacteria (Dykhuizen and Green, 1991; Cadillo-Quiroz et al., 2012; Bobay and Ochman, 2017). The motivation is that ecological diversification within a species is constrained by genetic exchange, while different species may diverge without limit (Mayr, 1963; Templeton, 1989; Cohan, 2017a). We shall see that this widely celebrated property of species does not apply reliably to any group of organisms, even to the animals that inspired Mayr’s work, and much less to bacteria (Mallet, 2008; Cohan, 2017a; see also Chapters 10 and 15). However, there is another universal property of species cohesion that applies to species of animals, plants and bacteria (and probably beyond): recombination prevents neutral sequence divergence within a species (Cohan, 2011a, 2019). I will show that bacterial systematists of the mid-20th century fortuitously created a species-level systematics that actually fits an important universal theory of speciation.

Finally, I will discuss what may be the most exacting demand on species taxonomy. The animal ecologist G. Evelyn Hutchinson aspired to a taxonomy in which every species is homogeneous in its biochemical, physiological, morphological and ecological characteristics. He argued that such a taxonomy would allow us to infer the important characteristics of any unknown organism once we classify it to species (Hutchinson, 1968). In the case of bacteria, many microbiologists may agree that we would benefit from a taxonomy that classifies bacteria into groups that are uniform in the characteristics we care most about. For example, we might want a pathogenic taxon to be uniform in its tissue tropisms, host range, disease aetiology and so on. However, microbiologists understand that the typical bacterial species taxon houses a substantial level of ecological diversity (Konstantinidis et al., 2006; Cohan and Kopac, 2017). There are many good reasons not to move to a higher-resolution species taxonomy that would abide by Hutchinson’s aspiration. However, I will discuss ways in which we can extend the infraspecific systematics of close relatives to better abide by the Hutchinsonian aspiration of homogeneous taxa.

How Taxonomy Demarcates Bacterial Species

Each species experiences its world in its own unique way, and has evolved a special sense of its surroundings, known as an ‘umwelt’ (Yoon, 2009). In the case of our species, we inherited an umwelt from our hunter-gatherer past that enabled our ancestors to distinguish animal and plant species, driven by the need to tell beneficial and safe species from close relatives that were dangerous. Unfortunately, we have no umwelt for distinguishing bacteria because they are so new to us (Cohan, 2011a).

Lacking an umwelt for bacteria, mid-20th-century microbiologists set the foundation for species systematics based on the metabolic and chemical traits that were then available. By borrowing the methods of numerical taxonomy developed by zoologists and botanists, bacterial systematists developed a new intuition for using phenotypes to estimate the relatedness of organisms. Through numerical taxonomy, they sought to identify meaningful clusters of similar organisms (Sneath and Sokal, 1973), and these clusters became the species of bacteriology (Holmberg and Nord, 1984).

In principle, mid-century bacterial systematists could have defined their species narrowly, for example by distinguishing species by subtle, quantitative differences in metabolic capacities (Cohan, 2002). Instead, they made a pragmatic decision early on to include, within a species, strains that were hugely heterogeneous in the presence versus absence of many metabolic capabilities (Rosselló-Móra and Amann, 2001; Cohan, 2011a). There was no reason to believe that these species should abide by any evolutionary theory about the properties of species, nor was there any desire for them to do so. However, we will see that these species fortuitously share a universal, species-like, dynamic property with species of animals and plants (Jain et al., 2018; Cohan, 2019).

Beyond metabolic analyses, a series of molecular tools has contributed profoundly to bacterial systematics. DNA sequencing has revealed polyphyletic groups; that is, evolutionarily distant groups that were mistakenly identified as a single taxon on the basis of having converged on the same phenotype. For example, phylogenetic study of the 16S rRNA gene sequence revealed that the bacteria assigned to the Caulobacter genus (based on sharing the stalked phenotype) are really a set of divergent taxa that independently evolved the stalk (Stackebrandt et al., 1988). Also, phylogenetic study of the 16S rRNA molecule was able to place all cellular organisms on a single tree (Woese, 1987).
Universal trees have come a long way since then by utilizing a large set of universal, single-copy genes (Zhu et al., 2019).

An early whole-genome approach to analyzing relatedness was able to corroborate the mid-century metabolic species clusters. That is, DNA-DNA hybridization (DDH) could estimate the percentage of the genome shared between a pair of strains (De Ley, 1970), and it turned out that sharing more than 70% of the genome gave about the same species demarcations as metabolic clustering (Wayne et al., 1987). This was the first in a series of molecular traits that was calibrated to yield the species previously demarcated by numerical taxonomy of metabolic traits. Following DDH, systematists brought various molecular tools into the systematics of species demarcation, and they calibrated each to yield the earlier metabolic clusters (Thompson et al., 2014; Jain et al., 2018). Sequence identity at the 16S rRNA gene locus was calibrated to yield the metabolic species demarcations, first with a criterion of 97% sequence identity (Stackebrandt and Goebel, 1994) and then at 98.5% (Stackebrandt and Ebers, 2006). More recently, multilocus sequence analysis, with clusters based on sequence identity of seven or so shared genes, has also corroborated the species taxa based on metabolic sequence clusters (Gevers et al., 2005). A multilocus approach yields an advantage over 16S rRNA in providing greater resolution for discovering significant within-species diversity (Gevers et al., 2005). Moreover, multilocus approaches are less likely to misclassify when a given marker has recombined.

Whole-genome sequencing has given new opportunities for species-level systematics (see Chapter 13). First, the extent of gene content sharing, which DDH measures only indirectly, can be estimated directly by comparing whole-genome sequences (Auch et al., 2010). While DDH was limited by requiring pairwise measures of genomic distance, and by requiring special expertise, whole-genome sequencing allows the taxonomy of species to grow incrementally. Each new species can be added to an existing database, without the need for comprehensive pairwise experiments every time a new species is added to the taxonomy (Garrity, 2016). Additionally, whole-genome sequencing takes clustering by multiple loci to the limit, with the potential to take into account the sequence identity levels of thousands of shared genes.
In 2005, with the prospect of extremely cheap whole-genome sequencing on the horizon, Kostas Konstantinidis and his colleagues presciently developed whole-genome average nucleotide identity (ANI) as a measure of relatedness (Konstantinidis and Tiedje, 2005). Like each of the earlier molecular markers of relatedness, whole-genome sequencing was calibrated to yield the old metabolically defined species. This team found that an ANI value of 95% closely approximated the existing species demarcations of bacterial systematics.
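
To make the idea of an average identity concrete, the following is a minimal Python sketch, not the published ANI procedure. It assumes that the shared fragments of two genomes have already been aligned without gaps (real tools perform the alignment or mapping step themselves), and the function names and toy sequences are hypothetical, chosen only for illustration.

```python
from statistics import mean


def fragment_identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length fragments."""
    if len(a) != len(b) or not a:
        raise ValueError("fragments must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def average_nucleotide_identity(pairs) -> float:
    """Mean identity, as a percentage, over all aligned fragment pairs."""
    return 100.0 * mean(fragment_identity(a, b) for a, b in pairs)


# Hypothetical aligned fragments shared by two genomes (toy data, not real sequences).
toy_pairs = [
    ("ACGTACGTAC", "ACGTACGTAC"),  # identical
    ("ACGTACGTAC", "ACGTACGTTC"),  # one mismatch in ten positions
    ("GGGTTTAAAC", "GGGTTTAAAC"),  # identical
    ("TTGACCAGTA", "TTGACCAGTA"),  # identical
]

ani = average_nucleotide_identity(toy_pairs)
# Apply the 95% species-level criterion discussed in the text.
same_species = ani >= 95.0
```

With these toy fragments the two genomes would fall within the same species under the 95% criterion; the sketch ignores fragment length weighting and alignment filters, which real implementations must handle.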

A recent study by Matt Olm and colleagues found the optimal cutoff for yielding the species taxa with various genes and sets of genes, and it rated each marker for its ability to delineate the existing species taxa of bacterial systematics (Olm et al., 2019). They found that the optimal level of 16S rRNA divergence for recalling the existing species taxa was 99% (similar to the suggestion by Stackebrandt and Ebers, 2006). Notably, 16S rRNA was the least discerning of all the single-gene markers that they studied, and the ribosomal protein L6 was the most discerning single-gene marker. However, an ANI value of 94.5% yielded the highest delineation of species of any molecular marker, far greater than for any single gene.

Systematists are currently adopting ANI as a means for classifying new isolates and metagenome-assembled genomes to existing species and for discovering new species. For example, nearly 2500 published genomes in the Lactobacillus genus were recently classified by ANI (Wittouck et al., 2019). Wittouck and colleagues demarcated the entire set based on 94% ANI, a value lower than that recommended (Jain et al., 2018; Olm et al., 2019). Their analysis merged several species, and allowed discovery of eight previously uncharacterized species. Similarly, Kanny Diallo and colleagues applied ANI to discover and classify the full extent of human-infecting Neisseria species (Diallo et al., 2019). There is also a pseudo-genomic approach similar to multilocus sequence analysis. Here, a small sample of genes from whole-sequence genomes is concatenated to yield sequence clusters. For example, a recent survey of Acinetobacter was based on a concatenation of 13 genes selected from whole-genome sequences (Mateo-Estrada et al., 2019).

For decades now, and over generations of molecular techniques for demarcating species, systematists have applied a ‘polyphasic’ approach to identifying and describing species.
This approach has sought to characterize novel species as fully as possible to reach a consensus among molecular and phenotypic traits (Vandamme et al., 1996; Tindall et al., 2010). Here, laboratory tests of physiology and metabolism are compared with molecular analyses with the aim of creating a stable taxonomy with a minimum of contradiction. While no one would argue that more information about a taxon is unhelpful,
there is a growing concern that polyphasic taxonomy places too high a standard on the quality of diagnostic information for a species taxon (Vandamme and Peeters, 2014). The problem with polyphasic taxonomy is that bacterial systematics is almost hopelessly behind in bringing all the species we know exist into our taxonomy. Pablo Yarza and colleagues have pointed out that the number of species taxa that have been discovered and are not yet classified is increasing steadily. At the current rate of describing species, systematists would require thousands of years to classify just the species that we currently are aware of (Yarza et al., 2014). High-throughput sequencing has made us aware of this problem but, as we will see, high-throughput sequencing can also solve the problem by giving us full genomes. This is because genomes are more than a high-resolution method of distinguishing clusters of bacteria. They can, potentially, also provide the full metabolic capabilities and ultimately the physiology and ecology of bacteria. As a result, some systematists and microbial ecologists are eager to replace physiological testing with a genome-based estimate of a bacterium’s capabilities (Garrity, 2016; Konstantinidis et al., 2017). Let us next consider how a genome-based species taxonomy would serve the microbiological community.

A Genome-based Species Taxonomy

A genome-based species taxonomy will need to consider three points: (i) how to obviate the need for a cultivated type strain; (ii) how to demarcate genomes into new species; and (iii) how to describe and recurrently update the phenotype and diagnostic criteria for new species as new data become available.

Substituting a type genome sequence for a type strain

Recognizing the limitations of a culture-based systematics at the species level, microbiologists are increasingly demanding a genome-based route to characterizing new species (Garrity and Lyons, 2011; Thompson et al., 2014; Vandamme and Peeters, 2014; Garrity, 2016; Konstantinidis et al., 2017). One solution is to relax Rules 27 and 30 of the taxonomic code (Garrity, 2019), such that a genome sequence can be substituted for a type strain. Supporters of reform argue that this change will further the democratization of systematics, such that anyone who can sequence and analyse genomes will be able to demarcate novel species and characterize their metabolic features (Garrity, 2016). Those genome sequences based on a single uncultivated cell would make the most reasonable substitute material for a type strain. Characterizing the metabolic and ecological features of the type sequence would be comparable to characterizing those features from study of an isolate. However, I will argue that a metagenome-assembled genome should be considered as type material only for a candidate taxon. The problem is that a metagenome-assembled genome (MAG) is based on a concatenation across reads from multiple organisms. Therefore, a MAG could contain multiple ecologically distinct populations (ecotypes) (Nelson et al., 2016), considering that ecotypes appear to have as little as 1% divergence in ANI (Konstantinidis and Tiedje, 2005), and possibly less (Cohan, 2016a).

Whether a genome sequence is based on a single cell or from a metagenome, any metabolism inferred from the genome should be considered tentative until assayed directly in laboratory tests. Taking into account both the limited opportunities for funding and the urgencies of the science, we should at least aspire to eventually confirm the inferred phenotypic features of a type genome sequence. George Garrity has anticipated that descriptions of species based on genomes will be in flux, and he and his coworkers have proposed and patented the Names for Life database (www.namesforlife.com/search, accessed 30 July, 2020), which allows continual updating of the data resources of a taxon.
This includes a record of the original description as well as further taxonomic and nomenclatural events relating to the taxon (Garrity and Lyons, 2011).

Demarcating genomes into new species

Because ANI delineates genomes into the recognized species taxa more accurately than any other molecular approach (Olm et al., 2019), it is reasonable to demarcate genomes of uncultivated bacteria into species by ANI (Varghese et al., 2015; Parks et al., 2019). One protocol is to apply complete-linkage clustering of ANI values to demarcate the genomes. That is, a new species would be demarcated such that all pairwise distances yield > 95% ANI (Varghese et al., 2015). In an alternative classification, systematists would base the demarcation of each novel species on sequence identity with the type genome for the species (Parks et al., 2019). That is, after the type genome for a novel species is chosen, other genomes would be added to the species based on having an ANI value > 95% with the type genome. This would capture the central importance of type strains in taxonomy. Although ANI can now be calculated extremely quickly with the new FastANI algorithm (Jain et al., 2018), there is still a need to improve the speed of classifying millions of novel species by their ANI values. One recent approach is to prescreen for close relatives by tetranucleotide composition, and to then apply ANI only to close relatives (Zhou et al., 2020).
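
The two demarcation protocols described above can give different answers for the same genomes, and a small Python sketch may help to show why. This is an illustrative toy under stated assumptions, not the procedure of Varghese et al. (2015) or Parks et al. (2019): the function names, the genome labels A to D and the ANI values are all hypothetical, and the pairwise ANI table is assumed to be complete and symmetric.

```python
from itertools import combinations


def complete_linkage_species(ani, threshold=95.0):
    """Agglomerative complete-linkage clustering on a pairwise ANI table.

    `ani` maps frozenset({g1, g2}) to percent identity. Two clusters are merged
    only if every cross-cluster pair of genomes exceeds `threshold`.
    """
    genomes = sorted({g for pair in ani for g in pair})
    clusters = [{g} for g in genomes]

    def linkage(c1, c2):
        # Complete linkage: cluster similarity is set by the least similar pair.
        return min(ani[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        (i, j), best = max(
            (
                ((i, j), linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best <= threshold:
            break  # no remaining pair of clusters is similar enough to merge
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


def assign_to_type_genomes(ani, type_genomes, threshold=95.0):
    """Add each genome to the species of its closest type genome if ANI > threshold."""
    genomes = sorted({g for pair in ani for g in pair})
    species = {t: {t} for t in type_genomes}
    unassigned = []
    for g in genomes:
        if g in species:
            continue  # type genomes already anchor their own species
        best_type = max(type_genomes, key=lambda t: ani[frozenset((g, t))])
        if ani[frozenset((g, best_type))] > threshold:
            species[best_type].add(g)
        else:
            unassigned.append(g)
    return species, unassigned


# Hypothetical ANI values (percent) for four genomes; A and C are each close to B
# but fall just below 95% with one another.
toy_ani = {
    frozenset(("A", "B")): 97.0,
    frozenset(("B", "C")): 96.0,
    frozenset(("A", "C")): 94.0,
    frozenset(("A", "D")): 80.0,
    frozenset(("B", "D")): 81.0,
    frozenset(("C", "D")): 80.0,
}

by_linkage = complete_linkage_species(toy_ani)
by_type, leftover = assign_to_type_genomes(toy_ani, ["B"])
```

In this toy case complete linkage keeps C apart from A and B, because the A and C genomes share only 94% ANI, whereas assignment to the type genome B places A, B and C in one species. Which behaviour is preferable is a taxonomic judgement that the sketch does not settle.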

Describing the phenotype of novel species

Genomes provide a trove of information for describing the metabolic, chemical and ecological properties of species taxa. Most straightforwardly, the standard phenotypic traits of polyphasic studies can be estimated by analyzing gene content. For example, the capacity for the Voges–Proskauer reaction, indole production and utilization of any number of carbon sources can be tentatively determined through gene content analyses (Thompson et al., 2014).

Using genomes to characterize novel bacteria for their tolerances of physical and chemical conditions is more challenging. However, microbial ecologists are making progress towards identifying genes that confer complex phenotypes. In genome-wide association studies, one correlates the genome content with phenotypes among close relatives, to identify genes responsible for the phenotypic differences. For example, the Traitar algorithm can predict 67 phenotypic traits from genomes from various phyla (Weimann et al., 2016). The algorithm was based on correlating phenotypic data from the GIDEON database (Berger, 2005) with gene content from sequenced genomes. Others have predicted phenotypes from genome variation within a species, for example by predicting invasiveness and resistance among strains of Neisseria meningitidis (Collins and Didelot, 2018). Various approaches to identifying genes responsible for complex traits such as salt and pH tolerance in laboratory studies promise future predictions of phenotypes (Hahne et al., 2010; Mirete et al., 2015; Barberán et al., 2017). Genes that promote successful interactions with other bacterial species have also been identified from laboratory studies (He et al., 2017).

Genome-wide association studies (GWAS) can, in principle, be used more generally to discover the genes conferring the ability to live in various habitats (Dutilh et al., 2013). For example, correlation of genes in Novosphingobium with diverse habitats including rhizospheres, contaminated soils, freshwater and marine water, yielded the discovery of genes consistently associated with each habitat type (Kumar et al., 2017). We should note that GWAS analyses can be performed retrospectively by future investigators, but only if microbiologists are careful to publish detailed accounts of the environments from which they isolate novel organisms. Following the MIxS protocols for describing habitats will become especially important as we try to characterize ecological abilities from GWAS (Cohan, 2011b; Yilmaz et al., 2011). This open-ended discovery of genes responsible for ecological adaptations will contribute to estimating the phenotype from the genomes of unclassified strains.

In short, a genome-based systematics will allow us to demarcate novel, uncultivated species that are similar in their phylogenetic breadth (i.e. down to 95% ANI) to the traditional species, and we can in principle obtain a tentative outline of their phenotypic features from genomes. Moreover, I will discuss how the species we have classified up to now, as well as the species classified by their genomes, will abide by a universal theory of the dynamic evolutionary properties that a species should hold (Cohan, 2019). Let us next consider the properties that evolutionary biologists expect for species.
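
The core of a gene-habitat association of the kind described above can be sketched in a few lines of Python. The example below is a deliberately simplified illustration, not the method of any study cited here: it applies a one-sided Fisher's exact test to the presence or absence of a single gene across genomes from two habitats. The gene names, habitat labels and records are hypothetical, and a real GWAS would also need to correct for phylogenetic relatedness and for multiple testing.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    a: genomes with the gene, from the habitat;    b: with the gene, other habitats
    c: genomes without the gene, from the habitat; d: without the gene, other habitats
    """
    n = a + b + c + d
    with_gene = a + b
    in_habitat = a + c
    upper = min(with_gene, in_habitat)
    favourable = sum(
        comb(with_gene, k) * comb(n - with_gene, in_habitat - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, in_habitat)


def gene_habitat_pvalue(records, gene, habitat):
    """Build the 2x2 table for one gene and one habitat, then test enrichment."""
    a = sum(1 for h, genes in records if h == habitat and gene in genes)
    b = sum(1 for h, genes in records if h != habitat and gene in genes)
    c = sum(1 for h, genes in records if h == habitat and gene not in genes)
    d = sum(1 for h, genes in records if h != habitat and gene not in genes)
    return fisher_one_sided(a, b, c, d)


# Hypothetical records: (habitat of isolation, set of genes present in the genome).
toy_records = [
    ("rhizosphere", {"geneX", "geneY"}),
    ("rhizosphere", {"geneX", "geneY"}),
    ("rhizosphere", {"geneX"}),
    ("rhizosphere", {"geneX"}),
    ("marine", {"geneY"}),
    ("marine", {"geneY"}),
    ("marine", set()),
    ("marine", set()),
]

p_gene_x = gene_habitat_pvalue(toy_records, "geneX", "rhizosphere")  # perfectly associated
p_gene_y = gene_habitat_pvalue(toy_records, "geneY", "rhizosphere")  # evenly distributed
```

The sketch also shows why habitat metadata matter: without the habitat label on each record, the table cannot be built at all.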

Is There Something Real About Species?

Systematics begins with the observation that life’s diversity is organized into clusters of related organisms that are similar in structure, function and genomic properties. These phenotypic and genetic clusters are found from the most complex organisms to the prokaryotes and at all levels of life’s diversity, from the domains and phyla to species (Mallet, 1995; Caro-Quintero and Konstantinidis, 2012). Between these clusters are gaps that represent intermediate phenotypes, which we can imagine, but do not actually exist in the natural world (Wilkins, 2009; Cohan, 2013). This pattern of clusters and gaps reflects the genealogical continuity of all organisms, taking into account that some lineages have been extremely successful, while nearly every lineage that has ever existed has gone extinct.

The higher ranks of systematics take into account obvious gaps in phenotype among closely related genera, families and so on. Systematists generally agree that these higher ranks (above the species level) are simply a convenience for consumers of systematics (McDonald et al., 2012). That is, systematists and evolutionary biologists have not hypothesized any dynamic force that would apply within a genus, for example, that would not apply across different genera (Cohan, 2017b). Nevertheless, there has been recent interest in determining a universal criterion for how much diversity should be included within each taxonomic rank. To this end, Donovan Parks and colleagues have developed a universal method for reclassifying organisms so that every genus contains organisms that have diverged for the same amount of time (Parks et al., 2018). In the case of genera, for example, each taxon is reclassified so that it contains organisms that have diverged up to 7% of the time since the last common ancestor of all of life.

Among the most contentious issues in systematics is whether there is some biological reality to species.
Some systematists believe that species are no more biologically real than the higher taxa (Hey, 2001; Doolittle and Zhaxybayeva, 2009), while others hold that there is something special about species – that they hold certain dynamic properties that transcend human attempts at classification (Mayr, 1942; de Queiroz, 2005; Cohan and Perry, 2007). Among these proposed properties are that each species is ecologically distinct and irreversibly separate from other species, and that each species is cohesive in that some force constrains diversification within a species (de Queiroz, 2005; Kopac et al., 2014). We will see that there has been recent progress in showing that bacterial species taxa (as well as plant and animal species) are cohesive, although not in the way that most evolutionary biologists had expected (Jain et al., 2018; Cohan, 2019). I will next consider how recombination and selection can act as forces of cohesion within and between bacterial populations.

Recombination Does Not Prevent Ecological Divergence Between Bacterial Populations

The first force of cohesion proposed for species was genetic exchange. Ernst Mayr and Theodosius Dobzhansky defined species such that populations within an animal or plant species could exchange genes at some high frequency, but that members of different species could not (Mayr, 1942; Dobzhansky, 1951). They argued that this pattern would limit the divergence among populations of the same species but not the divergence between different species. Thus, populations could diverge without bound only when they break free of their recurrent recombination, through evolving sexual isolation. The term ‘Mayr’s brake’ was applied to the action of recombination in stifling the adaptive divergence between populations of the same species. The concept of Mayr’s brake ruled with hegemony over the thinking of animal and plant speciologists throughout the 20th century, and its influence still rules to some extent over bacterial speciology (Bobay and Ochman, 2017; Cohan, 2017b).

However, nearly a century ago the population geneticist J.B.S. Haldane noted an essential problem with Mayr’s brake (Haldane, 1932). He showed mathematically that a recurrent trickle of gene flow (exchange of genes) between populations adapted to different circumstances could have only a negligible effect on the abilities of the populations to maintain their unique adaptations.

That is, if $c_b$ is the rate of recombination between populations (frequently called $m$, for migration) and $s$ is the selection intensity against migrant alleles, then the equilibrium frequency of a maladaptive, migrant allele in a population is $c_b/s$, which would be tiny for any set of populations with limited recombination between them (Vos and Didelot, 2009; Cohan, 2011a). James Mallet recently argued that even adjacent populations of animals or plants that are adapted to different environments can diverge without hindrance from recombination (Mallet, 2008).

Recombination is exceedingly unlikely to hinder adaptive divergence between ecologically distinct populations of bacteria. This is because the rate of recombination in bacteria is extremely low, even within populations, hovering within an order of magnitude or two of the mutation rates, around $10^{-6}$ per gene per generation (Vos and Didelot, 2009). Thus, even if different populations were to recombine at the same rate as cells of the same population, the equilibrium frequency of a foreign allele would be negligible. This is to say that two populations could diverge even if they were in exactly the same place (e.g. two populations living on different soluble compounds in the same aquatic environment) and recombining at the same rate between as within populations. I have, therefore, argued that the evolution of sexual isolation is not a milestone in the ecological divergence of bacterial lineages (Cohan, 1994). We should expect, then, that one bacterial population should be able to diverge into two ecologically distinct lineages, even without any geographic separation or any kind of reduction in their recombination rate.
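
The size of this equilibrium frequency can be checked numerically. The Python sketch below iterates a first-order recursion in which recombination introduces the foreign allele at rate c_b and selection removes it at rate s; this recursion is a simplifying assumption made here for illustration (valid when the allele is rare), not Haldane's full derivation, and the value of s is an arbitrary illustrative choice. Its fixed point is c_b/(c_b + s), which is very close to c_b/s whenever c_b is much smaller than s.

```python
def equilibrium_migrant_frequency(c_b, s, generations=10_000, q0=0.0):
    """Iterate q' = q + c_b * (1 - q) - s * q and return the final frequency q.

    c_b: per-generation rate at which recombination introduces the foreign allele
    s:   selection intensity against the foreign allele
    """
    q = q0
    for _ in range(generations):
        q = q + c_b * (1.0 - q) - s * q
    return q


c_b = 1e-6  # order of magnitude cited in the text for bacterial recombination rates
s = 0.01    # assumed selection intensity against the migrant allele (illustrative only)

q_hat = equilibrium_migrant_frequency(c_b, s)
approximation = c_b / s  # the equilibrium frequency given in the text
```

Under these assumed values the foreign allele settles at a frequency of roughly one in ten thousand, which illustrates how little recurrent recombination can erode an adaptive difference between populations.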
Thus, laboratory evolution experiments have repeatedly brought diversification of the founding lineage into multiple populations within a culture flask, owing to specialization on different soluble resources (Treves et al., 1998; Blount et al., 2012) or to specialization on different microhabitats within the same flask (Rainey and Travisano, 1998; Koeppel et al., 2013). Likewise, surveys of ecological diversity within natural habitats have demonstrated ecological divergence among extremely closely related strains (Shapiro and Polz, 2014).

Some researchers (Cadillo-Quiroz et al., 2012; Polz et al., 2013; Kashtan et al., 2014) have argued that recombination must be reduced before bacterial lineages can diverge into different species (Cohan, 2016a). Their evidence was that the closely related, ecologically distinct clades that were the focus of their studies showed reduced recombination between them, compared to recombination rates within them. However, these studies did not consider whether there was ecological diversification within each of the focus clades, where recombination may have been higher (Melendrez et al., 2016). Our work with hot spring Synechococcus demonstrated ecological divergence among extremely close relatives, even those with the highest levels of recombination (Melendrez et al., 2016). Given this evidence, as well as the theoretical expectation that reduced recombination is not necessary for ecological divergence, we may conclude that sexual isolation is not likely to be a necessary step for adaptive diversification of bacteria. Let us next consider what might be the most significant forces of cohesion in the bacterial world.

Periodic Selection as a Force of Cohesion in Bacterial Species

One important force of cohesion for bacteria is periodic selection. Because recombination rates in bacteria are so low (Vos and Didelot, 2009), natural selection favouring an adaptive gene within an ecologically homogeneous population (or ecotype) can reduce the genome-wide genetic variation within the ecotype to near zero (Cohan, 1994). However, because different ecotypes are ecologically distinct, a periodic selection event cannot purge the diversity genome-wide across ecotypes (Cohan, 2017a). Genome-wide selective sweeps have been observed in the bacterial world, as expected for periodic selection events acting within a single ecotype. A metagenomic survey of diversity in a bog lake has yielded the first direct evidence of genome-wide sweeps in nature (Bendall et al., 2016). However, more frequently genomic (Bhaya et al., 2007; Shapiro et al., 2012) and metagenomic (Bendall et al., 2016) surveys have revealed evidence of single-gene sweeps (or sweeps over only a short segment of the chromosome) within a sequence cluster. These results appear at first to demonstrate that recombination is sufficient to prevent genome-wide purges of diversity within any given population (Papke et al., 2004; Shapiro and Polz, 2015). However, my colleagues and I have previously argued that single-gene sweeps do not occur within just a single population. Instead, single-gene sweeps most likely involve the transfer of one generally adaptive gene segment across all the ecotypes within a sequence cluster (Majewski and Cohan, 1999a; Kopac and Cohan, 2012; Cohan, 2016a).

Ecotypes as Species-like Lineages

Bacterial ecotypes meet a diversity of aspirations of what a species should be (Ward, 1998; Koeppel et al., 2008; Sikorski, 2008). Because ecotypes are each ecologically homogeneous, the ecotypes meet Hutchinson’s call for a species taxonomy that allows a precise description of any unknown organism that is classified to an ecotype (Hutchinson, 1968). The ecotype also meets the speciologists’ aspiration that a species should hold species-like properties – that ecotypes are each ecologically homogeneous and cohesive, and that different ecotypes are ecologically distinct and irreversibly separate (Cohan, 2017a).

Ecotypes have been a target of study for microbial ecologists because they represent the most newly divergent, ecologically distinct populations of bacteria (Koeppel et al., 2008; Martiny et al., 2009; Becraft et al., 2015; Chase et al., 2019). Ecotypes may be tentatively discovered as closely related lineages that form distinct sequence-based clusters (Koeppel et al., 2008; Martiny et al., 2009; Wood et al., 2020). The hypothesized clusters may then be confirmed to be ecologically distinct, most easily by finding that they are substantially different in their habitat associations (Cohan, 2017a). Microbial ecologists have found that closely related ecotypes can differ in the chemical and physical conditions to which they are adapted (Connor et al., 2010; Denef et al., 2010; Becraft et al., 2015; Thompson and Kouba, 2019) or in the resources that they consume (Hunt et al., 2008; Kopac et al., 2014; Ramírez et al., 2020). A sampling of the ecological dimensions along which infraspecific ecotypes have diverged includes solar exposure and soil texture in desert Bacillus (Connor et al., 2010), temperature and depth in hot spring Synechococcus (Becraft et al., 2015), host specificity within Agrobacterium tumefaciens (Lassalle et al., 2011) and adaptation in Alteromonas macleodii to marine environments with different levels of organic content (Koch et al., 2020).

There are generally many ecotypes within a bacterial species taxon that are recognized by bacterial systematics (Staley, 2006; Hunt et al., 2008; Connor et al., 2010; Cohan, 2016b). For example, the marine species Vibrio splendidus was found to have 15 ecotypes, which were confirmed to differ by the size of the particle on which they were sampled and by their seasonal abundance (Hunt et al., 2008). One may argue that ecotypes are the true species of the bacterial world, for being ecologically homogeneous and having periodic selection as a force of cohesion that limits their diversity. However, this would severely disrupt the stability of bacterial taxonomy. I will, instead, discuss the prospects for enriching bacterial systematics by including ecotypes as infraspecific taxa.

Enriching Bacterial Systematics with Ecotypes

Within species taxa, both bacteria and higher organisms have diverged to form ecotypes that are adapted to different conditions and resources. Botanists developed the concept of ‘ecotype’ to represent populations in different locations that have diverged in their local adaptations (Turesson, 1922; Clausen et al., 1947). Whenever botanists study a single location of plants belonging to one species, they are not confused by the exuberance of ecotypes within the species, because the ecotypes tend to be in different places. When we study a collection within an animal or plant community, we know for example that all the fruit flies of Drosophila melanogaster from one site represent one evolutionary unit. The existence of ecotypes is not generally a confusion for botanists and zoologists. On the other hand, the proliferation of ecotypes is much more confusing for bacteriology. When we study Escherichia coli strains isolated from one habitat (even from one microhabitat), there can be any number of ecotypes subsumed within the collection (Cohan and Kopac, 2011; Luo et al., 2011). Because bacterial ecotypes can differ quantitatively in their habitat preferences, any one microhabitat could include various ecotypes, some specialized to different resources in the same microhabitat and others specialized to different microhabitats (Hunt et al., 2008).

There are significant pitfalls in reifying a pool of ecotypes into one bacterial species. One problem is that population geneticists may incorrectly estimate population sizes and migration rates from sequence data when they focus on an entire bacterial species taxon. Population genetic estimates work well for a typical animal species, when we can assume that all individuals from a region are members of the same evolutionary unit (Ho and Shapiro, 2011; Volz, 2012). The principle of estimating the size of a single population is that, as population size increases, genetic drift will have lower potential to reduce the sequence diversity of the population. This works well for the Drosophila melanogaster fruit flies of a region, where we can reasonably figure that the sequence diversity of the species is limited by genetic drift. However, the sequence diversity within a bacterial species taxon containing multiple ecotypes is determined only very little by the population size of any one ecotype. Instead, the sequence diversity of the whole species taxon is determined mostly by the time that ecotypes have diverged from their common ancestor (Cohan and Kopac, 2017). Mistaking a bacterial species taxon for a single evolutionary unit has recurrently introduced errors into population genetic estimates (Roberts and Cohan, 1995; Bobay and Ochman, 2018).
My colleagues and I have previously discussed how classifying organisms to ecotypes may bring practical benefits for biotechnologists who are looking for close relatives of a useful strain that may differ in its optimal conditions; the search for a vaccine may also benefit from an ecotype-based taxonomy (Cohan and Kopac, 2017). Most generally, when we define a species taxon so broadly as to include many ecotypes, we reduce the opportunity for a full exploration of the metabolic, physiological, ecological and genomic diversity within the species. Systematics would, therefore, benefit from official recognition of the ecotypic diversity within a species taxon.

The International Code of Nomenclature of Prokaryotes allows for infraspecific classification (Parker et al., 2019), and gives a path for inclusion of ecotypes in taxonomy. Rule 13a–d regulates subspecies taxa, and Rule 14a states that taxa below the subspecies level are not regulated (Garrity, 2016; Parker et al., 2019). Systematists have regularly applied infrasubspecific labels to describe diversity within a species or subspecies. These labels include pathovar, serovar, phagovar, biovar, chemovar and morphovar (among others) (Parker et al., 2019). While each of these labels applies specifically to certain kinds of variants (e.g. biovar applies to symbionts of different plant species), ecovar could apply more generally to closely related phylogenetic groups within a species that are ecologically distinct in any way (Cohan, 2006).

I will next consider possible criteria for introducing an ecovar to the taxonomy. The proposed ecovar should be identifiable as a sequence cluster, ideally demarcated by an algorithm intended to discover ecologically distinct groups, such as Ecotype Simulation (Koeppel et al., 2008; Wood et al., 2020), AdaptML (Hunt et al., 2008), or Generalized Mixed Yule Coalescent (GMYC) (Barraclough et al., 2009). Alternatively, ecotypes could be revealed by whole-genome clustering at an ANI level of about 99.5% (Konstantinidis and Tiedje, 2005). Ecotypes hypothesized from sequence analysis should then be confirmed, at least tentatively, by quantitative differences in habitat associations. We need to keep in mind that very closely related ecotypes are frequently not totally divergent in their habitat preferences, and so we should not discount habitat differences that are not complete (Hunt et al., 2008). It is worth noting that if a survey of diversity within a species taxon provides the habitat provenance of each strain, future researchers can later add ecovar descriptions. Also, analysis of genomes can more fully confirm the ecological distinctness of ecotypes hypothesized by sequence clustering analyses.
Michiel Vos has argued for using genome-wide analyses of positive selection to identify ecotypes (Vos, 2011). That is, when two closely related lineages show different histories of adaptive evolution in their shared genes, we may conclude that they are ecotypes (Kopac et al., 2014). Also, differences in genome content can predict ecological differences among hypothesized ecotypes (Lassalle et al., 2011). Bacterial systematics will be enriched by including ecotypes where it is convenient and

useful to describe them. Perhaps most appealing to microbial evolutionary biologists is that ecotypes would represent taxa with the species-like properties of being cohesive and ecologically homogeneous. Also, ecotypes are the most newly divergent lineages that can diverge without bound in their adaptations (Cohan, 2017a). However, I will argue that ecotypes are not the only bacterial groups to abide by a universal species concept. Recent developments in comparative genomics have indicated that the more inclusive and recognized species taxa, originally based on metabolic and chemical features alone, surprisingly follow a different criterion of species dynamics.

Recombination as a Force of Cohesion Among Ecologically Distinct Lineages

While recombination does not act to hinder adaptive divergence among bacterial lineages, recombination can nevertheless act as a force of cohesion in a different way. In the world of animals and plants, recombination can act to spread an adaptive gene from one population to another, within a species or between extremely closely related species that can interbreed. In the process of ‘adaptive introgression’, a generally useful adaptation can spread between populations that have adapted to different environments while the populations maintain their adaptive divergence. For example, in the case of humans, an adaptive allele for lactase persistence (allowing lifelong consumption of fresh cow milk in pastoral societies) spread between North Africa and Scandinavia, while allowing each population to maintain its adaptations to its particular environment (in this case, maintaining adaptive differences in skin colour) (Gerbault et al., 2011). Also, in our species, Tibetans acquired some of their adaptation to high elevations through adaptive introgression from Denisovans, an extinct sister species with whom humans once interbred (Huerta-Sánchez et al., 2014). This process of adaptive introgression takes many generations in animals and plants because it involves a meiotic recombination of whole genomes, followed by natural selection of the hybrids and backcrosses that contain the adaptive

recombinants. However, in bacteria, transferring a single adaptive gene or a set of chromosomally linked adaptive genes can be immediate. This is because the size of recombined segments is usually much smaller than the full genome, and can contain just the generally adaptive genes (i.e. that are adaptive across different populations) without bringing along the whole genetic baggage of the donor cell (Zawadzki and Cohan, 1995; Wiedenbeck and Cohan, 2011). We can view recombination among close relatives as a cohesive force because of its ability to spread generally adaptive genes across ecologically distinct populations. Thus, closely related but ecologically distinct populations that recombine frequently have the greatest opportunity to share generally adaptive genes, as proposed in the Adapt Globally, Act Locally model (Majewski and Cohan, 1999a, 1999b; Cohan, 2016a). In whole-genome comparisons, Martin Polz and his colleagues found many examples of a single gene (or closely linked set of genes) that swept across ecologically distinct populations within a Vibrio species taxon (Shapiro et al., 2012; Arevalo et al., 2019). Likewise, a metagenomic survey of diversity has shown many cases of such single-gene sweeps across ecologically distinct populations (Bendall et al., 2016; Cohan, 2016a). To sum up, in both bacteria and higher organisms, recombination can act as a force of cohesion that allows the sharing of adaptations among close relatives. We can add that ongoing, recurrent recombination can also act as a force of cohesion for neutral sequence variation. That is, recurrent transfer of ecologically interchangeable, neutral variants across ecotypes may homogenize the ecotypes’ sequences genome-wide (with the exception of genes involved in adaptive divergence).
Ecotypes will homogenize nearly genome-wide when homologous recombination of neutral variants within a species taxon occurs faster than sequence divergence by mutations, causing a limit on the accumulation of neutral diversity (Palmer et al., 2019).
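The balance described above, in which recombination caps the accumulation of neutral diversity, can be made concrete with a deliberately simple toy model. The model is an assumption introduced here for illustration and is not taken from the cited studies: the neutral divergence `d` between two ecotypes grows by `2 * mu` per generation through mutation in both lineages, and recombination removes existing differences at a rate `rho * d`. All parameter values are hypothetical.

```python
def simulate_neutral_divergence(mu, rho, generations, d0=0.0):
    """Iterate d[t+1] = d[t] + 2*mu - rho*d[t].

    mu  : hypothetical per-site mutation rate per generation
    rho : hypothetical per-generation rate at which recombination
          erases an existing difference between the two ecotypes
    """
    d = d0
    for _ in range(generations):
        d += 2.0 * mu - rho * d
    return d


mu = 1e-6             # hypothetical value
rho = 1e-4            # hypothetical value
generations = 100_000

# In this toy model divergence settles where mutation input equals
# recombinational removal: 2*mu = rho*d, so d = 2*mu/rho.
plateau = 2.0 * mu / rho

with_recombination = simulate_neutral_divergence(mu, rho, generations)
without_recombination = simulate_neutral_divergence(mu, 0.0, generations)
```

Under these assumptions divergence with recombination levels off near `2 * mu / rho`, whereas without recombination it keeps increasing in proportion to time. The sketch ignores the decline of recombination with divergence, which the following section takes up.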

A Force of Cohesion That is Limited to Species Taxa Across Much of Life

Fortuitously, the systematics laid down by the mid-century microbiologists appears to have

delineated bacterial species taxa that are cohesive in the same way as animal and plant species taxa. Adaptive divergence within any species taxon, whether bacterial, animal or plant, does not appear to be hindered by recurrent recombination, yet recombination within animal or plant species appears to homogenize variation within species for genes not involved in adaptive divergence. In the case of bacteria, comprehensive comparisons of whole bacterial genomes have revealed that there is a similar cohesion acting within the broadly defined species taxa of bacteria, even though each species taxon is ecologically heterogeneous with a great diversity of ecotypes (Jain et al., 2018; Cohan, 2019). Early metagenomic studies indicated a genome-wide cohesion among closely related but ecologically distinct bacteria. This was suggested by early ‘tiling’ studies, in which the genome sequence of one isolate from an environment was compared to random sequence reads from the same environment. For example, when my colleagues and I compared the genomes of each of two Synechococcus isolates from a Yellowstone hot spring to the metagenome reads from the same hot spring, the environmental reads were either very similar to the isolate (> 95% identical) or much more divergent (usually < 80%) (Bhaya et al., 2007); other tiling experiments have yielded similar results (Caro-Quintero and Konstantinidis, 2012). These results suggested that some force acted cohesively on bacteria within 95% identity, perhaps slowing down their rate of divergence. Once lineages have diverged past 95% identity, it would seem that they are able to diverge freely at a faster rate (Cohan, 2019). The first study of ANI, based on a limited set of genomes in 2005, showed that 95% sequence identity has a special significance for bacterial taxonomy (Konstantinidis and Tiedje, 2005).
An ANI value of 95% appeared to be a universal molecular criterion of relatedness that yielded the metabolically delimited species taxa of mid-century systematics. More extensive studies of genomes and metagenomes corroborated that there is a special cohesion of some kind that extends down to 95% sequence identity. Konstantinidis and colleagues analysed the ANI between all pairs of the roughly 90,000 sequenced genomes that were available at the time (Jain et al., 2018). A more recent study, using both cultivated and uncultivated genomes, has confirmed a universal gap in ANI, where many close relatives span values from 95% to nearly 100%, and extremely few pairs of organisms are found between 83% and 95% ANI (Olm et al., 2019). The species taxa of bacterial systematics appear universally to be cohesive groups, with some force limiting divergence within them but not (or to a much lesser extent) between them (Cohan, 2019). It should surprise microbial ecologists that bacterial species taxa would be cohesive, given that they are each heterogeneous in their physiology, ecology and genome content (Cohan, 2016c). The species taxa each appear to be rife with ecotypes (Connor et al., 2010; Becraft et al., 2015), and each ecotype has its own force of cohesion through periodic selection events, which constrains the diversity within each of them (Bendall et al., 2016; Cohan, 2016a). Homologous recombination may be a force hindering divergence among relatives down to 95% ANI. One possibility is that homologous recombination is passing interchangeable sequence variants between the various ecotypes within a given species taxon. A recent study by Bobay and Ochman (2017) searched widely for recombination events between genome pairs from the same and different species taxa, and found that the rate of recombination was much higher within species taxa than between them. While their study was not framed in terms of ANI, we can note that recombination rates decreased by orders of magnitude in groups with less than 95% ANI. Olm and colleagues extended the study of recombination to metagenome-assembled genomes from diverse habitats, and found a very similar result. That is, recombination decreased by orders of magnitude as ANI declined towards 95%, at which point recombination rates became negligible (Olm et al., 2019). Why should divergence within any species taxon, from any walk of life, be constrained to the same extent by homologous recombination?
We should first note that 95% ANI represents only the lower bound of within-taxon identity (Jain et al., 2018). While pairs of Escherichia coli strains extend down to 95% ANI, pairs of strains within Mycobacterium species taxa are rarely less than 99% identical. We can allow that homogenization of genomes is occurring across different levels of relatedness in different groups.
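As a minimal sketch of how the ANI gap and the taxon-specific lower bounds might be tabulated, the following assumes a list of genome pairs annotated with the species taxon of each genome and a precomputed ANI value. The taxon labels, the ANI values and the function `summarize_ani` are hypothetical; the 95% and 83% boundaries are the ones quoted in the text.

```python
def summarize_ani(pairs, species_cutoff=95.0, gap_floor=83.0):
    """pairs: iterable of (taxon_of_genome_1, taxon_of_genome_2, ani_percent).

    Returns (counts, lowest_within), where counts bins every pair by ANI and
    lowest_within gives the smallest ANI seen between two genomes that carry
    the same taxon label.
    """
    counts = {"at_or_above_cutoff": 0, "in_gap": 0, "below_gap": 0}
    lowest_within = {}
    for taxon_1, taxon_2, ani in pairs:
        if ani >= species_cutoff:
            counts["at_or_above_cutoff"] += 1
        elif ani >= gap_floor:
            counts["in_gap"] += 1
        else:
            counts["below_gap"] += 1
        if taxon_1 == taxon_2:
            previous = lowest_within.get(taxon_1, 100.0)
            lowest_within[taxon_1] = min(previous, ani)
    return counts, lowest_within


# Hypothetical pairs, for illustration only.
pairs = [
    ("taxon_A", "taxon_A", 96.1),
    ("taxon_A", "taxon_A", 98.7),
    ("taxon_B", "taxon_B", 99.4),
    ("taxon_B", "taxon_B", 99.8),
    ("taxon_A", "taxon_C", 90.5),
    ("taxon_A", "taxon_B", 70.2),
]

counts, lowest_within = summarize_ani(pairs)
```

In a real survey, a sparsely populated `in_gap` bin would correspond to the gap reported between 83% and 95% ANI, and `lowest_within` would show how far above 95% the lower bound sits in a given taxon.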

One possibility is that homogenization of genomes with high ANI may be mostly passive, with recombination bringing functionally interchangeable sequence variants between the various ecotypes of a species taxon (Marttinen et al., 2015; Iranzo et al., 2019). Various forces can cause the rate of recombination to decrease with increasing divergence. It is challenging to see why all taxa should be subject to the same constraints on homologous recombination, such that homogenization of sequences would fade out by 95% ANI. First, consider that there is a molecular requirement of sequence identity between donor and recipient at one or both ends of a recombining segment (Shen and Huang, 1986). This yields an exponential decay of recombination with increasing sequence divergence (Majewski and Cohan, 1999a,b; Majewski, 2001; Costechareyre et al., 2009). Of all constraints on recombination, one can most easily imagine how this molecular constraint on recombination may act more or less uniformly across the conserved recombination apparatus in different bacterial phyla. On the other hand, it is difficult to see why ecological constraints on recombination would increase uniformly with sequence divergence across all bacterial groups. Recombination between lineages requires that they live in the same microhabitat (Matte-Tailliez et al., 2002), and it is not clear why the probability of inhabiting the same environment would decline uniformly with sequence divergence. Other constraints on homologous recombination include differences in restriction endonuclease systems and in the plasmids and phage that could carry host genes (Wiedenbeck and Cohan, 2011; Hanage, 2016). It is not clear why these aspects of sexual isolation would increase uniformly with sequence divergence, especially when sexual isolation does not always increase monotonically with evolutionary divergence (Stefanic et al., 2019). 
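The exponential decay of recombination with sequence divergence can be written as a one-line function. The sketch below is illustrative only: the decay constant is a hypothetical value, chosen so that every 5% of divergence lowers the relative recombination rate tenfold, and it is not an estimate from any of the cited experiments.

```python
import math


def divergence_from_ani(ani_percent):
    """Convert ANI in percent to fractional sequence divergence."""
    return 1.0 - ani_percent / 100.0


def relative_recombination_rate(divergence, decay_constant):
    """Recombination rate relative to identical sequences, assuming a
    purely exponential decay with divergence."""
    return math.exp(-decay_constant * divergence)


# Hypothetical decay constant: a tenfold drop per 5% divergence.
DECAY_CONSTANT = math.log(10.0) / 0.05

rates = {
    ani: relative_recombination_rate(divergence_from_ani(ani), DECAY_CONSTANT)
    for ani in (100.0, 99.0, 95.0, 90.0)
}
```

Under this assumed constant the relative rate is 1 at 100% ANI, about 0.1 at 95% and about 0.01 at 90%. A different constant would shift these numbers but not the exponential shape, which is why a molecular constraint of this kind could in principle act similarly across groups that share the recombination apparatus.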
An alternative explanation for homogenization of genomes with greater than 95% ANI (within a species taxon) could be that, above a particular level of relatedness, generally adaptive genes can pass from one ecotype to another and could thereby provide their adaptation to all the ecotypes within a species taxon (Cohan, 2016a, 2019). Each such sweep of an adaptive

gene would homogenize the entire taxon for that gene. However, it does not appear that this model could explain homogeneity at more than a few tens of thousands of base pairs (Arevalo et al., 2019). Instead, it appears that a passive homogenization of neutral sequence variants by homologous recombination is somehow responsible for blocking divergence within species taxa, down to a level of around 95% ANI, over all groups of bacteria. So we are faced with the convenient but not totally understood phenomenon that homologous recombination declines to near zero by 95% ANI in all bacterial species taxa. The resulting gap between 95% and 83% ANI, across all groups, can enable systematists to demarcate species taxa even when we know very little about a prospective species taxon. Demarcating bacterial species taxa by 95% ANI not only yields the familiar species of bacterial taxonomy; it also yields species taxa that have the species-like property of cohesion, which limits the genome-wide divergence among a set of ecologically differentiated ecotypes.
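A demarcation rule of this kind is simple to state operationally. The sketch below assumes that the ANI between a query genome and one reference genome per named species has already been computed; the species labels, the ANI values and the function `assign_species` are hypothetical and are meant only to show the logic of the 95% criterion, not a validated identification workflow.

```python
def assign_species(ani_to_references, cutoff=95.0):
    """ani_to_references maps a species name to the ANI (percent) between
    the query genome and that species' reference genome.

    Returns the best-matching species if its ANI reaches the cutoff,
    otherwise None (the query may represent an undescribed species taxon).
    """
    if not ani_to_references:
        return None
    best_species, best_ani = max(ani_to_references.items(), key=lambda item: item[1])
    return best_species if best_ani >= cutoff else None


# Hypothetical queries, for illustration only.
query_1 = {"species_1": 97.2, "species_2": 88.4}
query_2 = {"species_1": 84.0, "species_2": 81.5}

assignment_1 = assign_species(query_1)
assignment_2 = assign_species(query_2)
```

Here the first query would be placed in `species_1`, while the second falls below 95% ANI to every reference and would be flagged as a candidate new species taxon. Because few genome pairs are expected between 83% and 95% ANI, borderline calls should be uncommon if the reported gap holds.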

Conclusion

It turns out, unexpectedly, that bacterial species taxa share a species-like property with the species taxa of zoology and botany. While recombination within species taxa of all these groups fails to prevent diversification within species, recombination nevertheless appears to act universally as a force of cohesion within species taxa. That is, recurrent recombination within species limits neutral sequence divergence within species taxa of plants, animals, and bacteria; recombination also allows a sharing of generally adaptive genes across a species range. The 95% ANI criterion that demarcates the traditionally defined species taxa of bacteria fortuitously also yields groups of bacteria that are subject to the species-like property of cohesion, where recombination prevents neutral sequence divergence among ecotypes within a species. Use of the ANI criterion, then, not only provides an easily used algorithm for demarcating bacterial species; it also places bacterial demarcation on the same theory-based foundation as the species taxonomy of animals and plants.

References

Arevalo, P., VanInsberghe, D., Elsherbini, J., Gore, J. and Polz, M.F. (2019) A reverse ecology approach based on a biological definition of microbial populations. Cell 178, 820. https://doi.org/10.1016/j.cell.2019.06.033
Auch, A.F., von Jan, M., Klenk, H.P. and Goker, M. (2010) Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2, 117–134. https://doi.org/10.4056/sigs.531120
Barberán, A., Caceres Velazquez, H., Jones, S. and Fierer, N. (2017) Hiding in plain sight: mining bacterial species records for phenotypic trait information. mSphere 2, e00237. https://doi.org/10.1128/mSphere.00237-17
Barraclough, T.G., Hughes, M., Ashford-Hodges, N. and Fujisawa, T. (2009) Inferring evolutionarily significant units of bacterial diversity from broad environmental surveys of single-locus data. Biology Letters 5, 425–428. https://doi.org/10.1098/rsbl.2009.0091
Bhaya, D., Grossman, A.R., Steunou, A.S., Khuri, N., Cohan, F.M., Hamamura, N., Melendrez, M.C., Bateson, M.M., Ward, D.M. and Heidelberg, J.F. (2007) Population level functional diversity in a microbial community revealed by comparative genomic and metagenomic analyses. ISME Journal 1, 703–713. https://doi.org/10.1038/ismej.2007.46
Bobay, L.-M. and Ochman, H. (2017) Biological species are universal across Life's Domains. Genome Biology and Evolution 9, 491–501. https://doi.org/10.1093/gbe/evx026
Bobay, L.-M. and Ochman, H. (2018) Factors driving effective population size and pan-genome evolution in bacteria. BMC Evolutionary Biology 18, 153–153. https://doi.org/10.1186/s12862-018-1272-4
Becraft, E.D., Wood, J.M., Rusch, D.B., Kühl, M., Jensen, S.I., Bryant, D.A., Roberts, D.W., Cohan, F.M. and Ward, D.M. (2015) The molecular dimension of microbial species: 1. Ecological distinctions among, and homogeneity within, putative ecotypes of Synechococcus inhabiting the cyanobacterial mat of Mushroom Spring, Yellowstone National Park. Frontiers in Microbiology 6, 590. https://doi.org/10.3389/fmicb.2015.00590
Bendall, M.L., Stevens, S.L., Chan, L.K., Malfatti, S., Schwientek, P., Tremblay, J., Schackwitz, W., Martin, J., Pati, A., Bushnell, B. et al. (2016) Genome-wide selective sweeps and gene-specific sweeps in natural bacterial populations. ISME Journal. https://doi.org/10.1038/ismej.2015.241
Berger, S.A. (2005) GIDEON: a comprehensive Web-based resource for geographic medicine. International Journal of Health Geographics 4, 10. https://doi.org/10.1186/1476-072X-4-10
Blount, Z.D., Barrick, J.E., Davidson, C.J. and Lenski, R.E. (2012) Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature Advance Online Publication. https://doi.org/10.1038/nature11514
Cadillo-Quiroz, H., Didelot, X., Held, N.L., Herrera, A., Darling, A., Reno, M.L., Krause, D.J. and Whitaker, R.J. (2012) Patterns of gene flow define species of thermophilic Archaea. PLoS Biology 10, e1001265. https://doi.org/10.1371/journal.pbio.1001265
Caro-Quintero, A. and Konstantinidis, K.T. (2012) Bacterial species may exist, metagenomics reveal. Environmental Microbiology 14, 347–355. https://doi.org/10.1111/j.1462-2920.2011.02668.x
Clausen, J., Keck, D.D. and Hiesey, W.M. (1947) Heredity of geographically and ecologically isolated races. The American Naturalist 81, 114–133. https://doi.org/10.1086/281507
Cohan, F.M. (1994) The effects of rare but promiscuous genetic exchange on evolutionary divergence in prokaryotes. The American Naturalist 143, 965–986. https://doi.org/10.1086/285644
Cohan, F.M. (2002) What are bacterial species? Annual Review of Microbiology 56, 457–487. https://doi.org/10.1146/annurev.micro.56.012302.160634
Cohan, F.M. (2006) Toward a conceptual and operational union of bacterial systematics, ecology, and evolution. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 361, 1985–1996. https://doi.org/10.1098/rstb.2006.1918
Cohan, F.M. (2011a) Are species cohesive? A view from bacteriology. In: Walk, S. and Feng, P. (eds) Bacterial Population Genetics: A Tribute to Thomas S. Whittam. American Society for Microbiology Press, Washington, DC, pp. 43–65. https://doi.org/10.1128/9781555817114.ch5
Cohan, F.M. (2011b) Koufax's perfect game-the tale of the data. In: Los Angeles Times. Tribune Newspapers, Los Angeles, California.
Cohan, F.M. (2013) Species. In: Maloy, S. and Hughes, K. (eds) Brenner's Encyclopedia of Genetics, 2nd edn. Elsevier, Amsterdam, pp. 506–511. https://doi.org/10.1016/B978-0-12-374984-0.01454-6

Cohan, F.M. (2016a) Bacterial speciation: genetic sweeps in bacterial species. Current Biology 26, R112–R115. https://doi.org/10.1016/j.cub.2015.10.022
Cohan, F.M. (2016b) Prokaryotic species concepts. In: Kliman, R.M. (ed.) Encyclopedia of Evolutionary Biology. Elsevier, Amsterdam. https://doi.org/10.1016/B978-0-12-800049-6.00230-4
Cohan, F.M. (2016c) Bacterial species concepts. In: Kliman, R.M. (ed.) Encyclopedia of Evolutionary Biology, Volume 1. Academic Press, Oxford, UK, pp. 119–129. https://doi.org/10.1016/B978-0-12-800049-6.00230-4
Cohan, F.M. (2017a) Transmission in the origins of bacterial diversity, from ecotypes to phyla. Microbiology Spectrum 5. https://doi.org/10.1128/microbiolspec.MTBP-0014-2016
Cohan, F.M. (2017b) Species. In: Osmanaj, B. and Escalante-Santos, L. (eds) Reference Module in Life Sciences. Elsevier, Oxford, UK. https://doi.org/10.1016/B978-0-12-809633-8.07184-3
Cohan, F.M. (2019) Systematics: The cohesive nature of bacterial species taxa. Current Biology 29, R169–R172. https://doi.org/10.1016/j.cub.2019.01.033
Cohan, F.M. and Kopac, S.M. (2011) Microbial genomics: E. coli relatives out of doors and out of body. Current Biology 21, R587–589. https://doi.org/10.1016/j.cub.2011.06.011
Cohan, F.M. and Kopac, S.M. (2017) A theory-based pragmatism for discovering and classifying newly divergent species of bacterial pathogens. In: Tibayrenc, M. (ed.) Genetics and Evolution of Infectious Diseases, 2nd edn. Elsevier Inc., Oxford, UK, pp. 25–49. https://doi.org/10.1016/B978-0-12-799942-5.00002-0
Cohan, F.M. and Perry, E.B. (2007) A systematics for discovering the fundamental units of bacterial diversity. Current Biology 17, R373–R386. https://doi.org/10.1016/j.cub.2007.03.032
Collins, C. and Didelot, X. (2018) A phylogenetic method to perform genome-wide association studies in microbes that accounts for population structure and recombination. PLoS Computational Biology. https://doi.org/10.1371/journal.pcbi.1005958
Connor, N., Sikorski, J., Rooney, A.P., Kopac, S., Koeppel, A.F., Burger, A., Cole, S.G., Perry, E.B., Krizanc, D., Field, N.C. et al. (2010) The ecology of speciation in Bacillus. Applied and Environmental Microbiology 76, 1349–1358. https://doi.org/10.1128/AEM.01988-09
Costechareyre, D., Bertolla, F. and Nesme, X. (2009) Homologous recombination in Agrobacterium: potential implications for the genomic species concept in bacteria. Molecular Biology and Evolution 26, 167–176. https://doi.org/10.1093/molbev/msn236
Coyne, J.A. and Orr, H.A. (2004) Speciation. Sinauer Associates, Sunderland, Massachusetts.
Chase, A.B., Arevalo, P., Brodie, E.L., Polz, M.F., Karaoz, U. and Martiny, J.B.H. (2019) Maintenance of sympatric and allopatric populations in free-living terrestrial bacteria. mBio 10, e02361. https://doi.org/10.1128/mBio.02361-19
Denef, V.J., Kalnejais, L.H., Mueller, R.S., Wilmes, P., Baker, B.J., Thomas, B.C., VerBerkmoes, N.C., Hettich, R.L. and Banfield, J.F. (2010) Proteogenomic basis for ecological divergence of closely related bacteria in natural acidophilic microbial communities. Proceedings of the National Academy of Sciences of the United States of America 107, 2383–2390. https://doi.org/10.1073/pnas.0907041107
De Ley, J. (1970) Reexamination of the association between melting point, buoyant density, and chemical base composition of deoxyribonucleic acid. Journal of Bacteriology 101, 738–754. https://doi.org/10.1128/JB.101.3.738-754.1970
de Queiroz, K. (2005) Ernst Mayr and the modern concept of species. Proceedings of the National Academy of Sciences of the United States of America 102 Suppl 1, 6600–6607. https://doi.org/10.1073/pnas.0502030102
Diallo, K., MacLennan, J., Harrison, O.B., Msefula, C., Sow, S.O., Daugla, D.M., Johnson, E., Trotter, C., MacLennan, C.A., Parkhill, J. et al. (2019) Genomic characterization of novel Neisseria species. Scientific Reports 9, 13742–13742. https://doi.org/10.1038/s41598-019-50203-2
Dobzhansky, T. (1951) Genetics and the Origin of Species, 3rd edn. Columbia University Press, New York.
Doolittle, W.F. and Zhaxybayeva, O. (2009) On the origin of prokaryotic species. Genome Research 19, 744–756. https://doi.org/10.1101/gr.086645.108
Dutilh, B.E., Backus, L., Edwards, R.A., Wels, M., Bayjanov, J.R. and van Hijum, S.A.F.T. (2013) Explaining microbial phenotypes on a genomic scale: GWAS for microbes. Briefings in Functional Genomics 12, 366–380. https://doi.org/10.1093/bfgp/elt008
Dykhuizen, D.E. and Green, L. (1991) Recombination in Escherichia coli and the definition of biological species. Journal of Bacteriology 173, 7257–7268. https://doi.org/10.1128/JB.173.22.7257-7268.1991
Garrity, G.M. (2016) A new genomics-driven taxonomy of bacteria and archaea: are we there yet? Journal of Clinical Microbiology 54 (8), 1956–1963. https://doi.org/10.1128/JCM.00200-16

Garrity, G. (2019) International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 69, S1–S111.
Garrity, G. and Lyons, C. (2011) Systems and Methods For Resolving Ambiguity Between Names and Entities. Volume US 7,925,444 B2. (Board of Trustees of Michigan State University).
Gerbault, P., Liebert, A., Itan, Y., Powell, A., Currat, M., Burger, J., Swallow, D.M. and Thomas, M.G. (2011) Evolution of lactase persistence: an example of human niche construction. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 366, 863–877. https://doi.org/10.1098/rstb.2010.0268
Gevers, D., Cohan, F.M., Lawrence, J.G., Spratt, B.G., Coenye, T., Feil, E.J., Stackebrandt, E., Van de Peer, Y., Vandamme, P., Thompson, F.L. et al. (2005) Opinion: Re-evaluating prokaryotic species. Nature Reviews Microbiology 3, 733–739. https://doi.org/10.1038/nrmicro1236
Hahne, H., Mader, U., Otto, A., Bonn, F., Steil, L., Bremer, E., Hecker, M. and Becher, D. (2010) A comprehensive proteomics and transcriptomics analysis of Bacillus subtilis salt stress adaptation. Journal of Bacteriology 192, 870–882. https://doi.org/10.1128/JB.01106-09
Haldane, J.B.S. (1932) The Causes of Evolution. Longmans, Green, and Co., London.
Hanage, W.P. (2016) Not so simple after all: Bacteria, their population genetics, and recombination. Cold Spring Harbor Perspectives in Biology 8, a018069. https://doi.org/10.1101/cshperspect.a018069
He, X., Jin, Y., Ye, M., Chen, N., Zhu, J., Wang, J., Jiang, L. and Wu, R. (2017) Bacterial genetic architecture of ecological interactions in co-culture by GWAS-taking Escherichia coli and Staphylococcus aureus as an example. Frontiers in Microbiology 8, 2332–2332. https://doi.org/10.3389/fmicb.2017.02332
Hey, J. (2001) Genes, Categories, and Species: The Evolutionary and Cognitive Cause of the Species Problem. Oxford University Press, Oxford, UK.
Ho, S.Y. and Shapiro, B. (2011) Skyline-plot methods for estimating demographic history from nucleotide sequences. Molecular Ecology Resources 11, 423–434. https://doi.org/10.1111/j.1755-0998.2011.02988.x
Holmberg, K. and Nord, C.E. (1984) 14 Application of numerical taxonomy to the classification and identification of microaerophilic Actinomycetes. In: Bergan, T. (ed.) Methods in Microbiology, Volume 16. Academic Press, pp. 341–360. https://doi.org/10.1016/S0580-9517(08)70399-4
Huerta-Sánchez, E., Jin, X., Asan, Bianba, Z., Peter, B.M., Vinckenbosch, N., Liang, Y., Yi, X., He, M., Somel, M. et al. (2014) Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. Nature 512, 194–197. https://doi.org/10.1038/nature13408
Hunt, D.E., David, L.A., Gevers, D., Preheim, S.P., Alm, E.J. and Polz, M.F. (2008) Resource partitioning and sympatric differentiation among closely related bacterioplankton. Science 320, 1081–1085. https://doi.org/10.1126/science.1157890
Hutchinson, G.E. (1968) When are species necessary? In: Lewontin, R.C. (ed.) Population Biology and Evolution. Syracuse University Press, Syracuse, New York, pp. 177–186.
Iranzo, J., Wolf, Y.I., Koonin, E.V. and Sela, I. (2019) Gene gain and loss push prokaryotes beyond the homologous recombination barrier and accelerate genome sequence divergence. Nature Communications 10, 5376. https://doi.org/10.1038/s41467-019-13429-2
Jain, C., Rodriguez, R.L., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018) High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9, 5114. https://doi.org/10.1038/s41467-018-07641-9
Kashtan, N., Roggensack, S.E., Rodrigue, S., Thompson, J.W., Biller, S.J., Coe, A., Ding, H., Marttinen, P., Malmstrom, R.R., Stocker, R. et al. (2014) Single-cell genomics reveals hundreds of coexisting subpopulations in wild Prochlorococcus. Science 344, 416–420. https://doi.org/10.1126/science.1248575
Koch, H., Germscheid, N., Freese, H.M., Noriega-Ortega, B., Lücking, D., Berger, M., Qiu, G., Marzinelli, E.M., Campbell, A.H., Steinberg, P.D. et al. (2020) Genomic, metabolic and phenotypic variability shapes ecological differentiation and intraspecies interactions of Alteromonas macleodii. Scientific Reports 10, 809. https://doi.org/10.1038/s41598-020-57526-5
Koeppel, A., Perry, E.B., Sikorski, J., Krizanc, D., Warner, W.A., Ward, D.M., Rooney, A.P., Brambilla, E., Connor, N., Ratcliff, R.M. et al. (2008) Identifying the fundamental units of bacterial diversity: a paradigm shift to incorporate ecology into bacterial systematics. Proceedings of the National Academy of Sciences 105, 2504–2509. https://doi.org/10.1073/pnas.0712205105
Koeppel, A.F., Wertheim, J.O., Barone, L., Gentile, N., Krizanc, D. and Cohan, F.M. (2013) Speedy speciation in a bacterial microcosm: New species can arise as frequently as adaptations within a species. ISME Journal 7, 1080–1091. https://doi.org/10.1038/ismej.2013.3

Konstantinidis, K.T. and Tiedje, J.M. (2005) Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America 102, 2567–2572. https://doi.org/10.1073/pnas.0409727102 Konstantinidis, K.T., Ramette, A. and Tiedje, J.M. (2006) The bacterial species definition in the genomic era. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 361, 1929–1940. https://doi.org/10.1098/rstb.2006.1920 Konstantinidis, K.T., Rosselló-Móra, R. and Amann, R. (2017) Uncultivated microbes in need of their own taxonomy. ISME Journal 11, 2399–2406. https://doi.org/10.1038/ismej.2017.113 Kopac, S.M. and Cohan, F.M. (2012) Comment on ‘Population genomics of early events in the ecological differentiation of bacteria’. Science 336. Kopac, S., Wang, Z., Wiedenbeck, J., Sherry, J., Wu, M. and Cohan, F.M. (2014) Genomic heterogeneity and ecological speciation within one subspecies of Bacillus subtilis. Applied and Environmental Microbiology 80, 4842–4853. https://doi.org/10.1128/AEM.00576-14 Kumar, R., Verma, H., Haider, S., Bajaj, A., Sood, U., Ponnusamy, K., Nagar, S., Shakarad, M.N., Negi, R.K., Singh, Y. et al. (2017) Comparative genomic analysis reveals habitat-specific genes and regulatory hubs within the genus novosphingobium. mSystems 2, e00020-00017. https://doi.org/10.1128/ mSystems.00020-17 Lassalle, F., Campillo, T., Vial, L., Baude, J., Costechareyre, D., Chapulliot, D., Shams, M., Abrouk, D., Lavire, C., Oger-Desfeux, C. et al. (2011) Genomic species Are ecological species as revealed by comparative genomics in Agrobacterium tumefaciens. Genome Biology and Evolution 3, 762–781. https://doi.org/10.1093/gbe/evr070 Locey, K.J. and Lennon, J.T. (2016) Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences 113, 5970–5975. https://doi.org/10.1073/pnas.1521291113 Luo, C., Walk, S.T., Gordon, D.M., Feldgarden, M., Tiedje, J.M. 
and Konstantinidis, K.T. (2011) Genome sequencing of environmental Escherichia coli expands understanding of the ecology and speciation of the model bacterial species. Proceedings of the National Academy of Sciences of the United States of America 108, 7200–7205. https://doi.org/10.1073/pnas.1015622108 Majewski, J. (2001) Sexual isolation in bacteria. FEMS Microbiology Letters 199, 161–169. https://doi. org/10.1111/j.1574-6968.2001.tb10668.x Majewski, J. and Cohan, F.M. (1999a) Adapt globally, act locally: the effect of selective sweeps on bacterial sequence diversity. Genetics 152, 1459–1474. Majewski, J. and Cohan, F.M. (1999b) DNA sequence similarity requirements for interspecific recombination in Bacillus. Genetics 153, 1525–1533. Mallet, J. (1995) A species definition for the modern synthesis. Trends in Ecology and Evolution 10, 294– 299. https://doi.org/10.1016/0169-5347(95)90031-4 Mallet, J. (2008) Hybridization, ecological races and the nature of species: empirical evidence for the ease of speciation. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363, 2971–2986. https://doi.org/10.1098/rstb.2008.0081 Martiny, A.C., Tai, A.P., Veneziano, D., Primeau, F. and Chisholm, S.W. (2009) Taxonomic resolution, ecotypes and the biogeography of Prochlorococcus. Environmental Microbiology 11, 823–832. https://doi.org/10.1111/j.1462-2920.2008.01803.x Marttinen, P., Croucher, N.J., Gutmann, M.U., Corander, J. and Hanage, W.P. (2015) Recombination produces coherent bacterial species clusters in both core and accessory genomes. Microbial Genomics 1, e000038. https://doi.org/10.1099/mgen.0.000038 Mateo-Estrada, V., Graña, L., López-Leal, G. and Castillo-Ramírez, S. (2019) Phylogenomics reveals clear cases of misclassification and genus-wide phylogenetic markers for Acinetobacter. Genome Biology and Evolution. https://doi.org/10.1093/gbe/evz178 Matte-Tailliez, O., Brochier, C., Forterre, P. and Philippe, H. 
(2002) Archaeal phylogeny based on ribosomal proteins. Molecular Biology and Evolution 19, 631–639. https://doi.org/10.1093/oxfordjournals. molbev.a004122 Mayr, E. (1942) Systematics and the Origin of Species from the Viewpoint of a Zoologist. Columbia University, New York. Mayr, E. (1963) Animal Species and Evolution. Belknap Press of Harvard University Press, Cambridge. https://doi.org/10.4159/harvard.9780674865327 McDonald, D., Price, M.N., Goodrich, J., Nawrocki, E.P., DeSantis, T.Z., Probst, A., Andersen, G.L., Knight, R. and Hugenholtz, P. (2012) An improved Greengenes taxonomy with explicit ranks for

ecological and evolutionary analyses of bacteria and archaea. ISME Journal 6, 610–618. https://doi. org/10.1038/ismej.2011.139 Melendrez, M.C., Becraft, E.D., Wood, J.M., Olsen, M.T., Bryant, D.A., Heidelberg, J.F., Rusch, D.B., Cohan, F.M. and Ward, D.M. (2016) Recombination does not hinder formation or detection of ecological species of Synechococcus inhabiting a hot spring cyanobacterial mat. Frontiers in Microbiology 6, 1540. https://doi.org/10.3389/fmicb.2015.01540 Mirete, S., Mora-Ruiz, M.R., Lamprecht-Grandío, M., de Figueras, C.G., Rosselló-Móra, R. and González-Pastor, J.E. (2015) Salt resistance genes revealed by functional metagenomics from brines and moderate-salinity rhizosphere within a hypersaline environment. Frontiers in Microbiology 6, 1121. https://doi.org/10.3389/fmicb.2015.01121 Nelson, W.C., Maezato, Y., Wu, Y.-W., Romine, M.F. and Lindemann, S.R. (2016) Identification and resolution of microdiversity through metagenomic sequencing of parallel consortia. Applied and Environmental Microbiology 82, 255–267. https://doi.org/10.1128/AEM.02274-15 Olm, M.R., Crits-Christoph, A., Diamond, S., Lavy, A., Matheus Carnevali, P.B. and Banfield, J.F. (2019) Consistent metagenome-derived metrics verify and define bacterial species boundaries. bioRxiv 647511. https://doi.org/10.1101/647511 Palmer, M., Venter, S.N., Coetzee, M.P.A. and Steenkamp, E.T. (2019) Prokaryotic species are sui generis evolutionary units. Systematic and Applied Microbiology 42, 145–158. https://doi.org/10.1016/j. syapm.2018.10.002 Papke, R.T., Koenig, J.E., Rodriguez-Valera, F. and Doolittle, W.F. (2004) Frequent recombination in a saltern population of Halorubrum. Science 306, 1928–1929. Parker, C.T., Tindall, B.J. and Garrity, G.M. (2019) International Code of Nomenclature of Prokaryotes. International Journal of Systematic and Evolutionary Microbiology 69, S1–S111. https://doi. 
org/10.1099/ijsem.0.000778 Parks, D.H., Chuvochina, M., Waite, D.W., Rinke, C., Skarshewski, A., Chaumeil, P.-A. and Hugenholtz, P. (2018) A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36, 996. https://doi.org/10.1038/nbt.4229 Parks, D.H., Chuvochina, M., Chaumeil, P.-A., Rinke, C., Mussig, A.J. and Hugenholtz, P. (2019) Selection of representative genomes for 24,706 bacterial and archaeal species clusters provide a complete genome-based taxonomy. bioRxiv 771964. https://doi.org/10.1101/771964 Polz, M.F., Alm, E.J. and Hanage, W.P. (2013) Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends in Genetics 29, 170–175. https://doi.org/10.1016/j.tig. 2012.12.006 Rainey, P.B. and Travisano, M. (1998) Adaptive radiation in a heterogeneous environment. Nature 394, 69–72. https://doi.org/10.1038/27900 Ramírez, M., Moncada, R.N., Villegas-Escobar, V., Jackson, R.W. and Ramírez, C.A. (2020) Phylogenetic and pathogenic variability of strains of Ralstonia solanacearum causing moko disease in Colombia. Plant Pathology 69, 360–369. https://doi.org/10.1111/ppa.13121 Roberts, M.S. and Cohan, F.M. (1995) Recombination and migration rates in natural populations of Bacillus subtilis and Bacillus mojavensis. Evolution 49, 1081–1094. https://doi.org/10.1111/j.1558-5646.1995. tb04435.x Rosselló-Móra, R. and Amann, R. (2001) The species concept for prokaryotes. FEMS Microbiology Reviews 25, 39–67. https://doi.org/10.1111/j.1574-6976.2001.tb00571.x Shapiro, B.J., Friedman, J., Cordero, O.X., Preheim, S.P., Timberlake, S.C., Szabo, G., Polz, M.F. and Alm, E.J. (2012) Population genomics of early events in the ecological differentiation of bacteria. Science 336, 48–51. https://doi.org/10.1126/science.1218198 Shapiro, B.J. and Polz, M.F. (2014) Ordering microbial diversity into ecologically and genetically cohesive units. Trends in Microbiology 22, 235–247. 
https://doi.org/10.1016/j.tim.2014.02.006 Shapiro, B.J. and Polz, M.F. (2015) Microbial speciation. Cold Spring Harbor Perspectives in Biology 7, a018143. https://doi.org/10.1101/cshperspect.a018143 Shen, P. and Huang, H.V. (1986) Homologous recombination in Escherichia coli: dependence on substrate length and homology. Genetics 112, 441–457. Sikorski, J. (2008) Populations under microevolutionary scrutiny: what will we gain? Archives of Microbiology 189, 1–5. https://doi.org/10.1007/s00203-007-0294-x Sneath, P. and Sokal, R. (1973) Numerical Taxonomy: the Principles and Practice of Numerical Classification. W.H. Freeman, San Francisco, California.

Stackebrandt, E. and Ebers, J. (2006) Taxonomic parameters revisited: tarnished gold standards. Microbiology Today 33, 152–155. Stackebrandt, E. and Goebel, B.M. (1994) Taxonomic note: a place for DNA:DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44, 846–849. https://doi.org/10.1099/00207713-44-4-846 Stackebrandt, E., Fischer, A., Roggentin, T., Wehmeyer, U., Bomar, D. and Smida, J. (1988) A phylogenetic survey of budding, and/or prosthecate, non-phototrophic eubacteria: membership of Hyphomicrobium, Hyphomonas, Pedomicrobium, Filomicrobium, Caulobacter and ‘Dichotomicrobium’ to the alpha-subdivision of purple non-sulfur bacteria. Archives of Microbiology 149, 547–556. https://doi. org/10.1007/BF00446759 Staley, J.T. (2006) The bacterial species dilemma and the genomic-phylogenetic species concept. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 361, 1899–1909. https://doi.org/10.1098/rstb.2006.1914 Stefanic, P., Belcijan, K., Kraigher, B., Kostanjšek, R., Nesme, J., Madsen, J., Kovac, J., Sørensen, S., Vos, M. and Mandic-Mulec, I. (2019) Intra-species DNA exchange: Bacillus subtilis prefers sex with less related strains. bioRxiv 756569. https://doi.org/10.1101/756569 Templeton, A. (1989) The meaning of species and speciation: a genetic perspective. In: Otte, D. and Endler, J. (eds) Speciation and its Consequences, Sinauer Associates, Sunderland, Massachusetts, pp. 3–27. Thompson, A.W. and Kouba, K. (2019) Differential activity of coexisting Prochlorococcus ecotypes. Frontiers in Marine Science 6. https://doi.org/10.3389/fmars.2019.00701 Thompson, C.C., Amaral, G.R., Campeao, M., Edwards, R.A., Polz, M.F., Dutilh, B.E., Ussery, D.W., Sawabe, T., Swings, J. and Thompson, F.L. (2014) Microbial taxonomy in the post-genomic era: Rebuilding from scratch? Archives of Microbiology 197, 359–370. 
https://doi.org/10.1007/s00203-014-1071-2 Tindall, B.J., Rosselló-Móra, R., Busse, H.J., Ludwig, W. and Kampfer, P. (2010) Notes on the characterization of prokaryote strains for taxonomic purposes. International Journal of Systematic and Evolutionary Microbiology 60, 249–266. https://doi.org/10.1099/ijs.0.016949-0 Treves, D.S., Manning, S. and Adams, J. (1998) Repeated evolution of an acetate-crossfeeding polymorphism in long-term populations of Escherichia coli. Molecular Biology and Evolution 15, 789–797. https://doi.org/10.1093/oxfordjournals.molbev.a025984 Turesson, G. (1922) The species and the variety as ecological units. Hereditas 3, 100–113. https://doi. org/10.1111/j.1601-5223.1922.tb02727.x Vandamme, P. and Peeters, C. (2014) Time to revisit polyphasic taxonomy. Antonie Van Leeuwenhoek 106, 57–65. https://doi.org/10.1007/s10482-014-0148-x Vandamme, P., Pot, B., Gillis, M., de Vos, P., Kersters, K. and Swings, J. (1996) Polyphasic taxonomy, a consensus approach to bacterial systematics. Microbiological Reviews 60, 407–438. https://doi. org/10.1128/MMBR.60.2.407-438.1996 Varghese, N.J., Mukherjee, S., Ivanova, N., Konstantinidis, K.T., Mavrommatis, K., Kyrpides, N.C. and Pati, A. (2015) Microbial species delineation using whole genome sequences. Nucleic Acids Research 43, 6761–6771. https://doi.org/10.1093/nar/gkv657 Volz, E.M. (2012) Complex population dynamics and the coalescent under neutrality. Genetics 190, 187– 201. https://doi.org/10.1534/genetics.111.134627 Vos, M. (2011) A species concept for bacteria based on adaptive divergence. Trends in Microbiology 19, 1–7. https://doi.org/10.1016/j.tim.2010.10.003 Vos, M. and Didelot, X. (2009) A comparison of homologous recombination rates in bacteria and archaea. ISME Journal 3, 199–208. https://doi.org/10.1038/ismej.2008.93 Ward, D.M. (1998) A natural species concept for prokaryotes. Current Opinion in Microbiology 1, 271–277. 
https://doi.org/10.1016/S1369-5274(98)80029-5 Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. et al. (1987) Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463–464. https://doi.org/10.1099/00207713-37-4-463 Weimann, A., Mooren, K., Frank, J., Pope, P.B., Bremges, A. and McHardy, A.C. (2016) From genomes to phenotypes: Traitar, the microbial trait analyzer. mSystems 1, e00101-00116. https://doi.org/10.1128/ mSystems.00101-16 Wiedenbeck, J. and Cohan, F.M. (2011) Origins of bacterial diversity through horizontal gene transfer and adaptation to new ecological niches. FEMS Microbiology Reviews 35, 957–976. https://doi.org/10.1111/ j.1574-6976.2011.00292.x

Wilkins, J.S. (2009) Species: A History of the Idea. University of California, Berkeley, California. Wittouck, S., Wuyts, S., Meehan, C.J., van Noort, V. and Lebeer, S. (2019) A genome-based species taxonomy of the Lactobacillus genus complex. mSystems 4, e00264-00219. https://doi.org/10.1128/ mSystems.00264-19 Woese, C.R. (1987) Bacterial evolution. Microbiology Reviews 51, 221–271. https://doi.org/10.1128/ MMBR.51.2.221-271.1987 Wood, J.M., Becraft, E.M., Cohan, F., Krizanc, D. and Ward, D.M. (2020) Ecotype Simulation 2: An improved algorithm for efficiently demarcating microbial species from large sequence datasets. bioRxiv 2020.2002.2010.940734. https://doi.org/10.1101/2020.02.10.940734 Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.H., Whitman, W.B., Euzeby, J., Amann, R. and Rosselló-Móra, R. (2014) Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12, 635–645. https:// doi.org/10.1038/nrmicro3330 Yilmaz, P., Kottmann, R., Field, D., Knight, R., Cole, J.R., Amaral-Zettler, L., Gilbert, J.A., Karsch-Mizrachi, I., Johnston, A., Cochrane, G. et al. (2011) Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nature Biotechnology 29, 415–420. https://doi.org/10.1038/nbt.1823 Yoon, C.K. (2009) Naming Nature: The Clash between Instinct and Science. Norton, New York. Zawadzki, P. and Cohan, F.M. (1995) The size and continuity of DNA segments integrated in Bacillus transformation. Genetics 141, 1231–1243. Zhou, Y., Zheng, J., Wu, Y., Zhang, W. and Jin, J. (2020) A completeness-independent method for preselection of closely related genomes for species delineation in prokaryotes. BMC Genomics 21, 183. https://doi.org/10.1186/s12864-020-6597-x Zhu, Q., Mai, U., Pfeiffer, W., Janssen, S., Asnicar, F., Sanders, J.G., Belda-Ferre, P., Al-Ghalith, G.A., Kopylova, E., McDonald, D. et al. 
(2019) Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nature Communications 10, 5477. https://doi. org/10.1038/s41467-019-13443-4

17

Are Species Concepts Outdated for Fungi? Intraspecific Variation in Plant-pathogenic Fungi Illustrates the Need for Subspecific Categorization

Enrique Monte1,*, Rosa Hermosa1, María del Mar Jiménez-Gasco2 and Rafael M. Jiménez-Díaz3
1Spanish–Portuguese Institute for Agricultural Research (CIALE), University of Salamanca, Salamanca, Spain; 2Department of Plant Pathology and Environmental Microbiology, The Pennsylvania State University, University Park, PA, USA; 3College of Agriculture and Forestry, Universidad de Córdoba and Instituto de Agricultura Sostenible, CSIC, Córdoba, Spain

Introduction

Classification of living organisms is of paramount importance for identification and for studies on diversity and function. However, the major obstacle to a collective awareness in biology is the inability of scientists to arrive at a universal definition of the term ‘species’, or a general method of discovering species (Taylor, 2009). The typological (phenetic) species concept (TSC) advocated by the Greek philosopher Aristotle and the Swedish naturalist Linnaeus was based largely on morphological characters and assumed that the members of a given species can be recognized by their essential characters. At the beginning of the 20th century the biological species concept (BSC) was adopted by naturalists and was based on the imperatives of zoological reproduction. This concept was strengthened in the 1940s when the German evolutionary biologist Ernst Mayr defined species as ‘groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups’ (Mayr, 1942). This means that one biospecies inhabiting a particular niche in nature can be recognized as a group of organisms that can successfully interbreed and produce fertile offspring, but are incapable of successfully mating with other such groups (Bisby and Coddington, 1995). When these concepts were applied to fungi, mycologists describing new species assumed that morphological features were species specific, constant, inherited and could be determined in one specimen over one generation. A particularly difficult issue for fungi is that sexual reproduction may occur in nature but may be infrequent, or triggered by such rare conditions that the sexual stages of many species remain unknown, which limits the use of a BSC (see Chapter 2). A BSC is difficult to apply to fossil species, agamospecies (organisms where sexual reproduction does not occur, such as mitotic or anamorphic fungi) and sexually reproducing organisms with open

*[email protected] © CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)

mating systems that freely hybridize with others. A BSC is also complicated for allopatric populations; that is, different populations of the same species that are physically isolated from each other to an extent that prevents or interferes with gene flow. Consequently, the boundaries of agamospecies are often hard to define. In the absence of meiosis, agamospecies need alternative processes for ecological adaptation and may introduce some genetic variation into a population through DNA transformation, viral transduction or horizontal (lateral) gene transfer. Given the complications of the TSC and BSC, many older fungal ‘species’ have been found to be arbitrary aggregations of individuals that can be termed ‘classes’, ‘cryptic species’, ‘species groups’, ‘species complexes’ or ‘species aggregates’ hidden under one species name. Recognizing similar but distinct fungal species is important, but drawing dividing lines between these species can be inherently difficult (see Chapter 14).

Difficulties in Applying Species Concepts in Fungi

There are various problems in applying TSC and BSC to fungi. Some of these are:

• the meiotic (teleomorphic) stage is hard to find (e.g. Thanatephorus cucumeris) as the mitotic (anamorphic) stages are practically the sole form spotted in nature (i.e. Rhizoctonia solani);
• the occurrence of two anamorphs with widely differing morphologies (e.g. Epicoccum nigrum and Phoma epicoccina);
• the description of a teleomorphic stage (e.g. Hypocrea lixii) in a species aggregate that is not linked to all the anamorphic members of the aggregate (e.g. Trichoderma harzianum);
• some fungal structures (e.g. acervuli of Colletotrichum) are not produced in conventional laboratory cultures;
• anamorphic fungi described according to morphological criteria that are difficult to interpret or to distinguish (e.g. conidiogenesis types and conidial ontogeny);
• with pathogenic fungi, the use of host-specificity criteria depends not only on their pathogenic variation but also on the susceptibility of the host, giving rise to the application of subspecific categories (pathogenic races, pathotypes, formae speciales, vegetative compatibility groups (VCGs), etc.);
• parasexually produced teleomorphs (recombined haploid nuclei formation after anastomosis-generated plasmogamy and diploidization processes that does not need the development of sexual structures and occurs at a specified time or specified points in the life cycle of a fungus) (e.g. Indian isolates of Puccinia graminis f. sp. tritici); and
• a limited number of species and ex-type strains have been deposited in public culture collections (see Chapters 4 and 5).

Phylogenetic Species Concept and Molecular Data

As a result of the difficulties described above, a clearer and more precise idea of species is the phylogenetic species concept (PSC). A phylogenetic species is the smallest aggregation of (sexual) populations or (asexual) lineages whose members are descended from a common ancestor and possess a combination of certain defining, or derived, traits which are applicable across all kinds of living entities including sexual and asexual forms (Wheeler and Platnick, 2000). Such phylogenetic features allow differences to be set between individual species, making use of minute morphological and physiological details, and DNA-based methods such as molecular phylogenetics and DNA barcoding, although species delimitation on the basis of a single marker gene sequence should be avoided if a given locus could be affected by horizontal gene transfer (Dugan and Everhart, 2016). This approach, which provides information on an organism’s evolutionary relationships, is the most widely accepted species concept used in mycology (Taylor et al., 2000). Because of their evolutionary nature, molecular data can be used in various quantitative analyses to generate a phylogenetic tree where patterns of branching reflect how species or other groups evolved from a series of common ancestors. These groups can be further grouped to form clusters or clades (see Sneath and Sokal, 1973). As a result, with molecular data both the PSC and the cladistic classification define species as a cluster of individuals that is sufficiently differentiated by DNA sequences from other clusters. However, this definition needs to identify the PSC boundaries in fungi under the assumption that speciation is predicted by sequence divergence, and this is currently being done through two broadly defined forms: the strict genealogical concordance (SGC) and the coalescent-based species delimitation (CBD) approaches (Matute and Sepúlveda, 2019). SGC uses the probability that a pair of individuals will share a certain character, given that one of the pair has the character, and is especially practical for delimiting species in morphologically reduced fungi or fungi that only exhibit anamorphic stages. CBD attempts to show how several species are related by modelling the genealogical history of individuals back to a common ancestor (Fujita et al., 2012). This is modelled by estimating the likelihood of obtaining a given set of gene genealogies given a species tree, by exploring the full range of possible species tree topologies or by using the distinct branching patterns between divergence and intraspecific diversification to distinguish between species and populations (Matute and Sepúlveda, 2019). The International Code of Phylogenetic Nomenclature (PhyloCode, http://phylonames.org/code/, accessed 8 June 2020) was established to become the sole code governing the names of taxa. It can be distinguished from conventional hierarchic nomenclatural systems by being rankless, since in this system clades and species refer not to ranks, but to different kinds of biological entities (Robinson and Kommedahl, 2002). Although the PhyloCode facilitates the naming of new clades as they are discovered, this does not mean that all clades must be named (Cantino and Queiroz, 2010). In particular, subspecific groupings may require the use of wider integrated approaches to provide a correct identification.
The present chapter focuses on the controversial subspecific groupings used in plant pathology, as illustrated by Rhizoctonia solani, Colletotrichum spp., Fusarium oxysporum and Verticillium spp. The road ahead is long and hard, since species nomenclature is based on diverse and not always reproducible criteria. Furthermore, we should not put aside how a given clade has been named historically.

Structured Case Summaries

Rhizoctonia solani

Rhizoctonia solani (teleomorph: the basidiomycete Thanatephorus cucumeris) is a plant-pathogenic fungus with a wide host range and worldwide distribution. It is one of several causal agents of the condition known in agriculture as ‘damping off’, which results in the death of seedlings. R. solani does not produce conidia, hence it is identified only from mycelial characteristics. Its hyphal cells are multinucleate, and the hyphae are wide and tend to branch at right angles (Fig. 17.1). A septum near each hyphal branch and a slight constriction at the branch are diagnostic. Hyphae may also be differentiated according to the formation of filaments formed by swelling cells called moniliaceous hyphae. Moreover, the colonies can develop buff to dark-brown sclerotia. The sexual structures of T. cucumeris are normally produced on plant debris, and in the laboratory the fungus occurs in its asexual R. solani stage, making mating studies extremely difficult. The asexual species is subdivided into anastomosis groups (AGs) based on hyphal fusion between compatible strains. Every AG is composed of isolates that undergo a process of hyphal attraction and fusion. It is assumed that the capacity to fuse involves some degree of genetic relatedness, and that isolates belonging to different AGs would be less related. Although this is an occasionally imprecise method, the use of anastomosis reactions between hyphae from different R. solani isolates was widely accepted and quickly extended as a valid system to recognize groups at a subspecific level (Ogoshi, 1987). The results of hyphal interactions can be confusing due to the existence of four different reaction categories: C0 (no reaction), C1 (contact fusion), C2 (killing reactions) and C3 (perfect hyphal fusion). C2 and C3 reactions between isolates indicate genetic difference and genetic identity, respectively.
The fusion frequency percentage (%FF) can be determined using the formula: %FF = (A × 100)/B, where A is the sum of fusion points (in C1, C2 and C3 categories) in 15 microscope fields, and B is the sum of contact points in 15 fields (for all four categories) (Sneh et al., 1998). This has resulted in 14 different AGs

Fig. 17.1. Rhizoctonia solani AG2-2 growing on minimal medium. The colony is characterized by the absence of any sporulation and the hyphae are wide and tend to branch at right angles. Inset: the formation of a septum near each hyphal branch and a slight constriction at the branch are diagnostic.

being recognized (from AG-1 to AG-13 and AG-BI, although phylogenetic evidence suggests that the last one clusters together with AG-2 isolates). Eight of these groups (AG-1, -2, -3, -4, -6, -7, -8 and -9) are further divided into subgroups according to their morphological features, virulence, host range, nutrient requirements and molecular characteristics (Ogoshi, 1987; Carling et al., 2002; Monte and Suárez, 2010; Ajayi-Oyetunde and Bradley, 2018). AG2 is divided into nine subgroups depending on pathogenicity and nutritional requirements (1, t, Nt, 2IIIB, 2IV, 2LP, 2WB, 3 and 4). Similarly, AG4 is divided into three subgroups (HGI, HGII and HGIII) according to their DNA similarity. The most commonly occurring anastomosis group is AG4 and, together with the AG2-1 subgroup, it has the broadest distribution (Vilgalys and Cubeta, 1994). Analysis of nuclear ribosomal internal transcribed spacer (ITS) sequences provided evidence that genetically distinct groups from Rhizoctonia corresponded well with previously recognized AG or AG subgroups (González et al., 2001). It is not sufficiently clear, however, that the present taxonomy can accommodate the overlaps seen in the phylogenetic trees obtained from R. solani isolates from AGs exhibiting intragroup variation. Further complications are the sterility of certain

isolates that cannot be assigned to an AG and the lack of clamp connections that can help differentiate between homokaryons and heterokaryons. There is, therefore, a crucial need for a more precise R. solani taxonomic framework, as at this time we have no basis on which to determine whether the AGs and AG subgroups may constitute distinct species.
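
The fusion frequency calculation described above for anastomosis testing, %FF = (A × 100)/B, can be illustrated with a short sketch. It is not taken from Sneh et al. (1998): the class and function names are hypothetical, the counts are invented for illustration, and it assumes that every recorded hyphal contact point is scored in exactly one of the four categories (C0 to C3), so that B is the sum of all four counts.

```python
from dataclasses import dataclass


@dataclass
class FieldCounts:
    """Hyphal contact points scored in one microscope field (hypothetical layout)."""
    c0: int  # contact with no reaction
    c1: int  # contact fusion
    c2: int  # killing reaction
    c3: int  # perfect fusion


def fusion_frequency(fields):
    """Return %FF = (A x 100) / B over the supplied microscope fields."""
    # A: fusion points, i.e. contacts scored as C1, C2 or C3
    a = sum(f.c1 + f.c2 + f.c3 for f in fields)
    # B: all contact points, assumed here to be the sum over all four categories
    b = sum(f.c0 + f.c1 + f.c2 + f.c3 for f in fields)
    if b == 0:
        raise ValueError("no hyphal contact points were counted")
    return a * 100 / b


# Invented counts for the 15 fields mentioned in the text
fields = [FieldCounts(c0=2, c1=1, c2=0, c3=1) for _ in range(15)]
print(fusion_frequency(fields))  # 50.0
```

With these invented counts A = 30 and B = 60, so %FF is 50. How a given %FF value maps to AG membership is not specified here and would follow the criteria of the cited protocol.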

Colletotrichum

Colletotrichum is one of the most important genera of fungi and occupies eighth place in the worldwide ranking of plant pathogens based on their scientific/economic importance (Dean et al., 2012). Members of this genus can infect more than 30 plant genera, causing anthracnose disease and postharvest decay on a wide range of crops (Cannon et al., 2000; Damm et al., 2012a,b). Colletotrichum diseases are globally distributed, occurring in tropical, subtropical and temperate regions (Freeman and Shabi, 1996; Da Lio et al., 2018). Moreover, Colletotrichum species show various lifestyles, ranging from necrotrophy to hemibiotrophy, and from parasitism to endophytism (O’Connell et al., 2012; Baroncelli et al., 2017; also see Chapter 14).

The ascomycete fungus Colletotrichum (typified by Colletotrichum lineola) was introduced by Corda in 1831 as the sole genus in the family Glomerellaceae (Glomerellales: Sordariomycetes). It is an anamorphic stage, with a number of species linked to teleomorphs included in the genus Glomerella (Cannon et al., 2012). Initially, the taxonomy of Colletotrichum was largely based on host range and morphological characters (Sutton, 1992), which led to limited species resolution. Currently, Colletotrichum includes species complexes of which Colletotrichum gloeosporioides (Weir et al., 2012), Colletotrichum acutatum (Damm et al., 2012a), Colletotrichum boninense (Damm et al., 2012b), Colletotrichum truncatum (Cannon et al., 2012) and Colletotrichum destructivum (Damm et al., 2014) are considered the most important plant pathogens. Although morphology in combination with ITS sequences can be used to place isolates in species complexes within Colletotrichum, they are not reliable for precise species discrimination. Several authors have proposed combinations of phylogenetic, morphological, geographical and ecological information in order to resolve species boundaries in the genus (Cai et al., 2009; Cannon et al., 2012; Damm et al., 2012a,b). Recently, multi-locus phylogenies, often including a mix of nuclear, ribosomal and protein-coding genes (ITS, ACT, CAL, CHS-1, GAPDH, TUB2, HIS3), together with morphological characteristics, have been used to identify new species and to characterize Colletotrichum species from new hosts or countries (Sharma and Shenoy 2016; Guarnaccia et al., 2017; Da Lio et al., 2018; Damm et al., 2019; Fu et al., 2019). As a consequence, over 200 Colletotrichum species are currently recognized (Silva et al., 2019), including 14 species complexes and 15 newly identified species (Marin-Felix et al., 2017; Damm et al., 2019; Fu et al., 2019). 
However, species diversity may be underestimated, as several reports have been based on small sample numbers from restricted areas. Moreover, the degree of specificity observed in some of the species complexes has been linked to incomplete sampling and/or ambiguous species concepts (Cannon et al., 2012), and many studies continue showing the importance of cross-infection potential to different hosts by more than one species (Freeman and Shabi, 1996; Damm et al., 2012b; Weir et al., 2012). Species identification and diversity in Colletotrichum are particularly important owing to

the importance of this genus in many areas, including agriculture (plant pathogens and quarantine, biocontrol, plant breeding), understanding evolutionary history and whole-genome sequencing (O’Connell et al., 2012; Sharma et al., 2015; Gan et al., 2016; Han et al., 2016; Jayawardena et al., 2016; Liu et al., 2016; Baroncelli et al., 2017; Guarnaccia et al., 2017; Fu et al., 2019). Colletotrichum continues to be a taxonomically confusing genus and several ongoing problems in its classification are well known. These include the variability of names based on the sexual stage; the difficulties of standardizing highly variable morphological characters; the high variability of host range and pathogenicity; the shortage of type materials available for molecular studies; and the presence of numerous erroneous names attached to Colletotrichum sequences in public databases (Cai et al., 2009; Hyde et al., 2009; Sharma and Shenoy, 2016; Damm et al., 2019). Although phylogenetic analyses based on multi-locus DNA sequence data and the application of SGC approaches have provided enhanced species resolution (Liu et al., 2016; Da Lio et al., 2018; Damm et al., 2019), there are still outstanding nomenclatural problems with Colletotrichum species. Thus, although many milestones have been reached, there is no consensus among Colletotrichum researchers about the need for mating compatibility studies and pathogenicity tests, the minimal number of isolates, which gene markers to use or the type of phylogenetic analysis and biochemical tests that should be used in a study to define and delimit a species within different species complexes of Colletotrichum (Sharma and Shenoy, 2016; see also Chapter 14).

Fusarium oxysporum

Fusarium is regarded as one of the most adaptive, diverse and versatile genera in the Eumycota (Summerell et al., 2010; Geiser et al., 2013; Aoki et al., 2014). The genus contains agronomically important plant pathogens, mycotoxin producers and opportunistic human pathogens (Aoki et al., 2014). Two Fusarium species, Fusarium graminearum and F. oxysporum, have been listed as fourth and fifth, respectively, in a list of the top ten fungal plant pathogens (Dean et al., 2012).

F. oxysporum is recognized as one of the most important members of the genus (Gordon and Martyn, 1997). The genus Fusarium is a good example of the challenges associated with species definitions and concepts discussed earlier, and its classification has undergone significant changes since its inception (Aoki et al., 2014). Historically, the classification of this group was based on a morphological species concept derived solely from cultural characteristics, resulting in a confusing and unstable taxonomy that was affected by environmental factors and lack of resolution (Snyder and Hansen, 1945; Gerlach and Nirenberg, 1982; Nelson, 1991). Multi-locus phylogenetic studies have produced a more objective identification of species boundaries in Fusarium. Species-level phylogenetic recognition and identification in Fusarium is based on intron-rich portions of protein-coding genes, such as the translation elongation factor 1α (TEF) gene (Geiser et al., 2004). This gene, which encodes a key component of protein translation, has emerged as the barcoding region for Fusarium species identification with great phylogenetic value, as it is highly informative at the species level. The vast phylogenetic species diversity resulting from the application of molecular phylogenetics has led to the recognition of several species complexes, including the F. oxysporum species complex, a monophyletic group comprising multiple phylogenetic lineages (O’Donnell et al., 1998; Baayen et al., 2000; Geiser et al., 2013). Decades of research have resulted in members of the F. oxysporum species complex being widely recognized as fungal plant pathogens causing vascular wilts, and stem, bulb and root rots in more than 100 different plants, including important agricultural commodities such as banana, cotton and tomato (Michielse and Rep, 2009; Edel-Hermann and Lecomte, 2019). In addition, some strains of F. oxysporum are also efficient and ubiquitous soil inhabitants, root colonizers and recognized endophytes (Gordon and Martyn, 1997; Olivain and Alabouvette, 1997; Demers et al., 2015). Although this species complex displays a wide host range as a collective group, individual pathogenic isolates are recognized as being host specific, causing disease only at a plant species or genus level. Groups of such host-specific plant-pathogenic isolates are classified in the informal (non-taxonomic) category of ‘forma specialis’

(plural: formae speciales) (f. sp.) (Armstrong and Armstrong, 1978; Kistler, 1997), and some 106 formae speciales have been described to date (Edel-Hermann and Lecomte, 2019). Strains that cause disease in tomato (f. sp. lycopersici), cotton (f. sp. vasinfectum), bananas (f. sp. cubense), strawberry (f. sp. fragariae) and chickpea (f. sp. ciceris) are examples of pathogens that cause major economic losses globally (Skovgaard et al., 2001; Fourie et al., 2009; Michielse and Rep, 2009; Jiménez-Díaz et al., 2015; Koike and Gordon, 2015). Further pathogenic variability below the forma specialis level has been well documented. Variation in virulence among host cultivars can be found within strains of a given forma specialis, leading to the designation of pathogenic races. Pathogenic races have been described for many formae speciales (Correll, 1991; Gordon and Martyn, 1997), and one example is the three races (1, 2 and 3) described in f. sp. lycopersici affecting tomato (Lievens et al., 2009). Fusarium wilt of chickpea can be caused by one of eight races identified in f. sp. ciceris (races 0, 1A, 1B/C, 2, 3, 4, 5 and 6); these can affect different plant cultivars and can cause different symptoms such as yellowing and wilting (Jiménez-Díaz et al., 2015). Knowledge of these races is a key requirement for understanding the development of the disease and the deployment of resistant cultivars (Jiménez-Díaz et al., 2015). Banana production is affected by four races of F. oxysporum f. sp. cubense that are identified by their pathogenicity to specific clonal triploid banana hosts (races 1, 2, Subtropical Race 4 and Tropical Race 4), with the latter having the potential to cause catastrophic losses in the current monoculture-based banana industry (Ordonez et al., 2015; Magdama et al., 2019). The concept of forma specialis has been crucial to plant pathologists, but the concept does not necessarily reflect evolutionary relationships, and is not a formal taxonomic classification. 
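Whether a host-based group such as a forma specialis also forms an evolutionary lineage can be checked directly on a phylogenetic tree. The following Python sketch is purely illustrative: the toy tree and the isolate labels (lyc_A, cub_B and so on) are invented for this example and are not data from any study cited here. It treats a group as monophyletic only if its members are exactly the tips of one clade of a rooted tree; a group that fails the test is reported simply as not monophyletic, without distinguishing polyphyly from paraphyly.

```python
def leaf_sets(tree):
    """Yield the set of tip labels under every clade of a rooted tree.

    The tree is a nested tuple; a tip is a plain string. For any subtree,
    the last set yielded is the full tip set of that subtree.
    """
    if isinstance(tree, str):
        yield frozenset([tree])
        return
    below = frozenset()
    for child in tree:
        child_sets = list(leaf_sets(child))
        yield from child_sets
        below |= child_sets[-1]  # full tip set of this child
    yield below


def is_monophyletic(tree, tips):
    """Return True if `tips` are exactly the tips of one clade of `tree`."""
    return frozenset(tips) in set(leaf_sets(tree))


# Hypothetical rooted tree: isolates named by an invented forma specialis tag.
TOY_TREE = (("lyc_A", "cub_A"), ("lyc_B", ("cub_B", "nonpath_1")))

# The two invented tomato isolates sit in different clades.
tomato_group_is_clade = is_monophyletic(TOY_TREE, {"lyc_A", "lyc_B"})  # False
# These two tips are sisters and therefore form a clade.
sister_pair_is_clade = is_monophyletic(TOY_TREE, {"cub_B", "nonpath_1"})  # True
```

In this invented tree the host-based grouping of lyc_A and lyc_B fails the test, which is the pattern that makes identification of a forma specialis from phylogenetic markers alone unreliable.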
Most of the formae speciales described within the F. oxysporum species complex are polyphyletic; that is, isolates that cause disease on a particular host do not necessarily share a most recent common ancestor. They were initially thought to have evolved by convergent evolution (O’Donnell et al., 1998; Baayen et al., 2000; Skovgaard et al., 2001), but recent evidence now suggests that the genetic mechanisms
involved in host-specific pathogenicity in this group may be linked to the acquisition of adaptive functions associated with virulence factors, secondary metabolites, transposition of genomic regions or other events involving repetitive sequences (Ma et al., 2010, 2013). This makes pathogen identification with traditional phylogenetic markers difficult, as members of one forma specialis may be more closely related to those in other formae speciales than they are to members of their own. F. oxysporum has no known teleomorph, although molecular evidence indicates that cryptic sexual reproduction cannot be ruled out (Taylor et al., 1999) and strains carrying the mating type idiomorphs MAT1-1 and MAT1-2 have been observed (Fourie et al., 2009). Phylogenetic species boundaries and clades within F. oxysporum have been reassessed recently and two phylogenetic species (PS1 and PS2) have been described (Laurence et al., 2014). More recently Lombard et al. (2019) identified 21 cryptic species in the species complex, 15 of which they named at species level. Lombard et al. (2019) also pointed out that they had included only a small subset of the strains assigned to the species complex, and that further study may identify further diversity and new species. Recently, comparative genomics studies of several Fusarium species (F. graminearum, Fusarium verticillioides, Fusarium solani f. sp. pisi and F. oxysporum f. sp. lycopersici) have identified clear genomic compartmentalization at both structural and functional levels. There is a core component of the genome that encodes functions necessary for growth and survival, and this is shared among Fusarium species, and an accessory ‘lineage-specific’ genomic component (Ma et al., 2010, 2013). 
The accessory genome comprises a variable number of small (< 2 Mb), supernumerary or conditionally dispensable chromosomes that are enriched for transposable elements, and which carry apparently exogenous genes, some of which are known to confer host-specific pathogenicity. The hypothesis of horizontal gene transfer between genetically unrelated F. oxysporum strains is consistent with these observations, and such transfer has been demonstrated experimentally. Small chromosomes harbouring these genes were transferred between F. oxysporum strains under laboratory conditions, converting an isolate previously non-pathogenic
to tomato (the biocontrol isolate Fo47) to a tomato pathogen (Ma et al., 2010). In addition to horizontal chromosome transfer, forces involving transposition and other events involving repetitive sequences are strong candidates in the horizontal transfer of adaptive functions (Ma et al., 2013). The frequency and mechanisms of horizontal gene transfer events in nature remain unknown, but this is a strong indication that F. oxysporum strains can exchange entire chromosomes that can potentially lead to changes in pathogenicity. Much of our understanding of the genetic basis of pathogenicity and host specificity within F. oxysporum comes from studies of strains of the forma specialis lycopersici. Fourteen effector genes encoding small proteins secreted in tomato, known as ‘secreted in xylem’ or SIX genes, have been identified so far in strains of forma specialis lycopersici (Houterman et al., 2007; Ma et al., 2010; Schmidt et al., 2013). Three of these are avirulence genes, and their presence has been found to have a direct correlation to the race structure within F. oxysporum f. sp. lycopersici (Takken and Rep, 2010). The products of the avirulence genes interact with tomato resistance gene products in a gene-for-gene fashion, inducing plant defences, although the exact method of interaction between the products has yet to be discovered (Takken and Rep, 2010). SIX gene homologues have been found in other formae speciales and it has been hypothesized that host-specific plant pathogenicity may be determined by unique combinations of SIX genes (Lievens et al., 2009), although some SIX genes have also been found in non-pathogenic isolates (Rocha et al., 2016). In conclusion, the question as to whether F. oxysporum is one highly diverse species or a group of multiple, well-defined phylogenetic species is still subject to debate (see Chapter 14). 
There has been a significant bias towards plant-pathogenic strains in attempts to recognize species in this group, disregarding the vast non-pathogenic ecological, geographical and genetic diversity. This sampling bias raises questions about the validity of applying SGC and CBD approaches to this group, given that agriculture promotes the selection and expansion of highly successful clonal lineages. The diversity found in soil and endophytic F. oxysporum populations is remarkable (Laurence et al., 2012; Demers et al., 2015), and can significantly impact pathogen identification methods (Magdama et al., 2019). The plasticity of genomes and permeability of species boundaries enabling horizontal transfer and rearrangement of chromosomes have profound impacts on population biology, niche adaptation and function, which can lead to the emergence of new pathogens. From a phytopathological perspective, the key information resides mainly at the forma specialis and pathogenic variant levels, for which we still rely heavily on pathogenicity testing of isolates. This complexity turns Fusarium wilts into some of the most devastating and challenging diseases in agricultural production.

Verticillium

The genus Verticillium was erected for Verticillium tenerum, a saprophytic filamentous fungus from a stem of hollyhock in Germany more than 200 years ago. The name was based on the arrangement of conidiogenous cells in whorls (Latin: verticillus) on branched conidiophores (Fig. 17.2).

Subsequently the genus was characterized by having usually 1-celled, hyaline conidia in aculeate phialides inserted in a mesotonous to acrotonous position on verticillate conidiophores (Zare et al., 2004). Over the years, some 190 ecologically diverse species were added to the genus which, in addition to saprobes, also included plant and animal pathogens, and fungal parasites. More recently, molecular phylogenies have shown that the morphological definition of Verticillium was too vague, and that the genus comprised several distantly related groups. These groups were transferred from Verticillium to various other genera. The insect and fungal pathogens were placed in Lecanicillium, the nematode parasites were placed in Pochonia and Haptocillium, and the plant pathogens were placed in Gibellulopsis and Musicillium (Zare et al., 2001, 2007; Zare and Gams, 2001, 2008). As a consequence, Verticillium was redefined and reduced to only five plant-associated species: Verticillium albo-atrum, Verticillium dahliae, Verticillium longisporum, Verticillium nubilum and Verticillium tricorpus, with V. dahliae as the type species (Gams et al., 2005). The redefined genus Verticillium sensu stricto was placed in the Plectosphaerellaceae in the subclass Hypocreomycetidae in the Sordariomycetes (Zhang et al., 2006). Much of the interest in Verticillium species concerns their role as causal agents of vascular wilt diseases (viz. Verticillium wilts, VW). Verticillium wilts are among the most devastating and difficult to manage fungal diseases worldwide, affecting nearly 400 dicotyledonous plant species including horticultural and woody ornamentals, and crops, mainly in temperate regions (Klosterman et al., 2009). V. albo-atrum, V. dahliae and V. 
longisporum are by far the most damaging pathogens, causing yield losses of 50% or more in high-value crops including cotton, lettuce, olive, potato, rapeseed and strawberry (Pegg and Brady, 2002; Rowe and Powelson, 2002; Atallah et al., 2011; Jiménez-Díaz et al., 2012).

Redefinition of species in Verticillium

Fig. 17.2. Verticillate conidiophore characteristic of the genus Verticillium. Note phialides arranged in whorls and conidia at the tip of phialides.

Fig. 17.3. Microsclerotia of Verticillium dahliae formed in water agar and within a xylem vessel of an infected plant. Note different stages of microsclerotial development and morphology. Elongated microsclerotia are characteristic of the cotton- and olive-defoliating V. dahliae pathotype. (a) Initial and (b) mature stages in the development of elongated microsclerotia of the defoliating pathotype on water agar; (c) globular, irregular mature microsclerotia of the cotton-non-defoliating pathotype on water agar; (d) microsclerotia of the defoliating pathotype within a stem xylem vessel of olive cv. Picual.

Although the redefined genus Verticillium is placed with ascomycete fungi, the five species remaining in it are solely anamorphic, with no known sexual stage (Klosterman et al., 2009). These species form melanized resting structures (i.e. chlamydospores in short chains, microsclerotia or brown-pigmented mycelia, depending on the species; Fig. 17.3), which allow the fungus to survive for at least 14 years in either soil or infested plant debris (Klosterman et al., 2009). In addition to their important role in the ecology of Verticillium species, the resting structures have taxonomic relevance as they provide the primary morphological characteristics used to distinguish the five species. The definition of V. albo-atrum was based on the formation of dark-brown mycelium, that of V. dahliae and V. longisporum on the production of microsclerotia, and that of V. nubilum on the formation of short chains of chlamydospores; Verticillium tricorpus was the only species to produce all three resting structures (Isaac, 1949, 1953; Karapapa et al., 1997). Subsequent analyses have raised doubts about the validity of the resting structures as taxonomic characters, as molecular phylogenetic trees recovered four of the five species as an overlapping group (Zare et al., 2007). These doubts supported earlier studies that had shown important biological and phytopathological diversity within the morphologically defined species. For instance, strains of V. albo-atrum from lucerne are highly virulent on this plant but strains of V. albo-atrum from other hosts do not infect lucerne, or do so poorly (Barbara and Clewes, 2003). This separation was strongly supported by molecular markers and led to the recognition of two distinct groups, V. albo-atrum Grp1 and V. albo-atrum Grp2 (Mahuku and Platt, 2002). Similarly, in V. dahliae, many isolates may infect a wide range of hosts (Bhat and Subbarao, 1999), whereas others are pathogenically adapted to specific plant species (Resende et al., 1994). The significance of taxonomic inconsistencies in defining Verticillium species for the biology and management of VW diseases was considered by Subbarao and co-workers, who
proposed a more robust framework for Verticillium (Inderbitzin et al., 2011a; Inderbitzin and Subbarao, 2014). They recognized ten phylogenetic species in Verticillium from a multi-locus phylogenetic study (ACT, EF, GPD, TS and ITS) of 257 widely diverse Verticillium isolates that included comparisons to ex-type strains, herbarium material and the species descriptions in the literature (Inderbitzin et al., 2011a). These ten species included the five previously known and five new species (Verticillium alfalfae, Verticillium isaacii, Verticillium klebahnii, Verticillium nonalfalfae and Verticillium zaregamsianum). Nine of the ten species are monophyletic and haploid. The tenth species, V. longisporum, was found to be a hybrid diploid resulting from three hybridization events among four parental haploid lineages in three species. Two of the parents were identified as V. dahliae (lineages D2 and D3), and two were unknown species which were named Species A1 and Species D1 (Inderbitzin et al., 2011a,b). The V. albo-atrum isolates were split into three lineages, of which two were sister species, namely V. alfalfae and V. nonalfalfae. These species form melanized mycelium and are morphologically indistinguishable cryptic species, but differ in pathogenicity. V. alfalfae and V. nonalfalfae are phylogenetically related to V. dahliae and V. longisporum, and were previously referred to as the lucerne and non-lucerne pathotypes of the former V. albo-atrum, respectively (Barbara and Clewes, 2003). The name V. albo-atrum was retained for the third lineage, which was related to V. tricorpus, a species that produces both melanized mycelium and microsclerotia. Similarly, V. isaacii, V. klebahnii and V. zaregamsianum were also found to be phylogenetically related to V. tricorpus. The species V. isaacii, V. klebahnii and V. 
tricorpus are morphologically indistinguishable from each other and they all produce chlamydospores, melanized mycelium and microsclerotia, as well as yellow-pigmented hyphae, on potato-dextrose agar (PDA). V. zaregamsianum produces microsclerotia, as do V. dahliae and V. longisporum, but V. zaregamsianum differs from them as it forms yellow-pigmented hyphae on PDA. The marked differences in host range of the different Verticillium species mean that precise identification is very important for phytopathology studies. Of the ten species, V. dahliae has the widest host range worldwide and the greatest economic impact. V. dahliae has been recorded from more than 300 species in several
orders of seed plants, mainly dicotyledons but also some monocotyledons, which possibly constitutes one of the broadest host ranges of any fungal plant pathogen (EFSA Panel on Plant Health, 2014). Comparatively, V. albo-atrum, V. alfalfae and V. nonalfalfae have narrower host ranges and a more restricted global distribution in cooler and more humid areas than those favouring V. dahliae. Nevertheless, they are known to cause significant yield losses in potato (V. albo-atrum); lucerne (V. alfalfae); and celery, cotton, hops, petunia, potato, spinach, tomato and tree of heaven (V. nonalfalfae) (Inderbitzin and Subbarao, 2014). V. longisporum is a pathogen of crucifer crops, except for broccoli and sugar beet. Differences in pathogenicity occur between V. longisporum lineages, with lineage A1/D1 being the most virulent on oilseed rape in Europe and Japan where it causes the non-wilting disease Verticillium stem striping. Conversely, lineage A1/D3 is not pathogenic on this crop, and lineage A1/D2 was described as the most virulent lineage on horseradish (Novakazi et al., 2015; Depotter et al., 2016). V. klebahnii and V. zaregamsianum are pathogenic to artichoke and lettuce, and V. tricorpus to lettuce, potato and tomato. V. isaacii and V. nubilum are minor pathogens and only cause disease on artichoke and lettuce, or tomato and potato (Inderbitzin et al., 2011a; Usami et al., 2011; Gurung et al., 2015). The genomes of three strains, one each of V. alfalfae, V. dahliae and V. longisporum, were sequenced by Inderbitzin et al. (2014). They found that the genomes of the V. alfalfae and V. dahliae strains differed in four regions, each of approximately 300–350 kb, which were present in V. dahliae and absent in V. alfalfae. Hybridizations with probes from each of the regions and DNA from 13 strains of V. alfalfae, V. dahliae and V. longisporum suggested much variation between species and strains. 
These four regions, named lineage-specific (LS) regions 1 to 4, contain genes with putative or known functions in pathogenicity and virulence (Klosterman et al., 2011; de Jonge et al., 2013). The new taxonomic system in Verticillium exemplifies the concept that species names are carriers of information on plant pathogens that is of significant importance for the management of plant diseases (Rossman et al., 2008). Confusion in identifying Verticillium species under the previous taxonomy has given rise to regrettable consequences for
VW management, as discussed by Inderbitzin and Subbarao (2014). To avoid this, precise Verticillium species identification according to the new taxonomy is required to understand the etiology and management of VW diseases. Owing to the existence of cryptic Verticillium species differing in biological properties, and the instability of morphological traits in culture (e.g. formation of microsclerotia and yellow hyphal pigment), such identifications must be undertaken with appropriate molecular protocols (Inderbitzin et al., 2013; Tran et al., 2013).

Intraspecific diversity in Verticillium species and its phytopathological relevance

Although V. dahliae, V. longisporum and V. nonalfalfae are pathogenic on a number of crop plants, populations of these pathogens may harbour significant pathogenic variation. An understanding of the nature of such variation is of utmost importance, because without it the effectiveness of resistant cultivars for VW disease management is undermined. Pathogenic variation in Verticillium can be of two types: either isolates differ in the severity of symptom types that they induce in the host plant (i.e. virulence) and are called pathotypes, or they show differential interactions with resistance genes in their host plants and are called pathogenic races. Variation in virulence in Verticillium was first reported for V. albo-atrum from hops (now assigned to V. nonalfalfae) by Isaac and Keyworth (1948). These authors differentiated isolates that caused mild wilt symptoms (named the fluctuating pathotype, as the affected hop plants eventually recovered after some years) from those that were lethal to infected plants (named the progressive or lethal pathotype). Later on, lethal hop isolates of V. nonalfalfae were found to be genetically distinct from mild isolates (Radišek et al., 2006). Similarly, two genetically distinct groups of isolates have been described in V. 
longisporum lineage A1/D1 that correlate with the geographic distribution of the isolates. They were named ‘A1/D1 East’, which causes Verticillium stem striping in continental Europe, and ‘A1/D1 West’, which caused the sudden emergence of this disease in the UK (Depotter et al., 2017). In V. dahliae, pathogenic variation must be viewed in the context of the highly clonal structure
of its populations (Jiménez-Gasco et al., 2014; Milgroom et al., 2014). Clonality in this fungus was first associated with VCGs, which comprise isolates that can form stable heterokaryons through anastomosis. Five main VCGs (VCG1–4, plus VCG6) were identified in V. dahliae, of which VCG1, 2 and 4 were further divided into subgroups A and B based on the frequency, speed and vigour of complementation (Katan, 2000). These VCGs correlated almost perfectly with molecular genetic markers, with the major exception of VCG2B. This VCG was subdivided into genetically distinct subgroups that correlated with PCR markers of 334 bp (VCG2B334) or 824 bp (VCG2B824) (Collado-Romero et al., 2006; Jiménez-Gasco et al., 2014). Subsequently, clonality in V. dahliae was confirmed without ambiguity by genotyping through sequencing and single-nucleotide polymorphisms. Nine distinct clonal lineages were identified and named according to VCGs (i.e. lineages 1A, 1B, 2A, 2B334, 2B824, 4A, 4B, 6 and a newly identified recombinant lineage 2BR1), which were shown to have arisen originally by recombination (Milgroom et al., 2014). The current clonal structure in V. dahliae is probably a consequence of selection for adaptation and clonal expansion of fit genotypes on agricultural crops (Jiménez-Díaz et al., 2006; Korolev et al., 2008; Milgroom et al., 2016). The two types of pathogenic variation, pathotypes and pathogenic races, exist within lineages of V. dahliae. A distinct defoliating pathotype (D) was first described in California in the early 1960s (misidentified as V. albo-atrum). This pathogen was characterized by rapid and severe defoliation on cotton, okra and olive, but caused a range of leaf necrosis and wilting without defoliation in other plants (Schnathorst and Mathre, 1966; Jiménez-Díaz et al., 2006; Korolev et al., 2008). 
This highly virulent D pathotype overcame disease resistance in cotton and olive that was still effective against a previously existing non-defoliating (ND) pathotype. The D pathotype also caused severe disease in susceptible cultivars with much lower levels of inoculum in soil than the ND pathotype (Schnathorst and Mathre, 1966; Jiménez-Díaz et al., 2012). The D pathotype now occurs worldwide. Phylogenetic and genealogical analyses of a large number of isolates have inferred that the D pathotype arose once in the southwestern USA and was later introduced at least five times into
the Mediterranean basin (Milgroom et al., 2016). Migration of the D pathotype was most likely through infected cotton seed and, once introduced into a region, it became well established and was spread to large areas by infected planting stocks, infected plant parts and water (Jiménez-Díaz et al., 2012). The D pathotype occurs only within lineage 1A, whereas the ND pathotype is found in all other lineages (Milgroom et al., 2014, 2016). Recent research has associated the D phenotype with two genes, VdDf5 and VdDf6, which share homology with polyketide synthases involved in secondary metabolism (Zhang et al., 2019). These two genes are located in an LS region of the V. dahliae genome, which was previously shown to have been acquired through horizontal transfer from F. oxysporum f. sp. vasinfectum infecting cotton (Chen et al., 2018). In independent work, Li (2019) found that a single gene product coded for by a duplicate D pathotype-specific gene was responsible for the defoliating syndrome in cotton and olive. The second type of pathogenic variation in V. dahliae concerns pathogenic races 1 and 2. Race 1 is defined by the presence of the gene Ave1, which confers avirulence to cultivars of tomato or lettuce that carry the resistance gene Ve1 or its homologue Vr1, respectively (Kawchuk et al., 2001; Hayes et al., 2011). Ave1 in V. dahliae appears to have been acquired via horizontal transfer. Ave1 homologues are ubiquitous in plants and are bacterial virulence factors in the citrus canker pathogen Xanthomonas axonopodis pv. citri (Gottig et al., 2008). Ve1 and its homologues encode pattern-recognition receptors that recognize products encoded by Ave1, leading to a defence response against infection by race 1 (de Jonge et al., 2012). Conversely, race 2 is defined by the lack of Ave1, thus evading recognition by the Ve1-gene product and being potentially pathogenic on plants carrying the resistance gene (Short et al., 2014). 
Race 1 is restricted to lineage 2A and the ND pathotype. Conversely, race 2 occurs within seven lineages (1A, 1B, 2B334, 2B824, 2BR1, 4A and 4B), and belongs to either the D or ND pathotypes (Jiménez-Díaz et al., 2017). In summary, the significance of pathogenic variation in the most important VW diseases means that information provided at the species level is of insufficient value for their adequate understanding and management. In V. dahliae,
pathogenic variation is best explained in terms of clonal lineages, because it brings a biological interpretation to the study of this fungus, instead of a series of correlative species studies with results that vary depending on the specific population sampled. Identification of clonal lineages in V. dahliae is crucial for a better understanding of the role of endophytic infections of alternative hosts on the biology and ecology of the pathogen, and the epidemiology and management of Verticillium diseases.
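The relationships among clonal lineage, pathotype and race described above for V. dahliae can be summarized as a small lookup. The Python sketch below is only an illustration of those stated rules, under the assumption that they hold without exception: the function name describe_isolate and the returned field names are hypothetical, and the code is not a typing protocol. It encodes three statements from the text: the D pathotype occurs only in lineage 1A, race 1 is defined by the presence of Ave1 and is restricted to lineage 2A, and race 2 (lacking Ave1) is reported from seven lineages.

```python
# Lineages in which race 2 is reported in the text above.
RACE2_LINEAGES = {"1A", "1B", "2B334", "2B824", "2BR1", "4A", "4B"}
# The nine clonal lineages named in the text.
KNOWN_LINEAGES = RACE2_LINEAGES | {"2A", "6"}


def describe_isolate(lineage, has_ave1):
    """Summarize what the text implies for an isolate (hypothetical helper).

    lineage  -- clonal lineage label, e.g. "1A" or "2B334"
    has_ave1 -- True if the Ave1 gene is present in the isolate
    """
    if lineage not in KNOWN_LINEAGES:
        raise ValueError(f"unknown clonal lineage: {lineage!r}")
    # Defoliating (D) pathotype is confined to lineage 1A; all others are ND.
    pathotype = "D" if lineage == "1A" else "ND"
    # Race 1 carries Ave1; race 2 lacks it.
    race = 1 if has_ave1 else 2
    # Flag whether this lineage/race combination is one reported in the text.
    if race == 1:
        reported = lineage == "2A"
    else:
        reported = lineage in RACE2_LINEAGES
    return {
        "lineage": lineage,
        "pathotype": pathotype,
        "race": race,
        "reported_combination": reported,
    }
```

For example, describe_isolate("1A", False) returns pathotype "D" and race 2, whereas describe_isolate("1A", True) is flagged as a combination not reported in this section. In practice, lineage and Ave1 status would have to come from molecular typing, and pathogenicity testing remains the reference.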

Conclusion

The question posed in the title of this chapter is complex and possibly does not have a single – or simple – answer. Precise naming of a species is very important for phytopathogenic fungi because names may carry key information for the management of the fungal diseases. Naming fungal species based on morphological traits or biological properties is outdated and unreliable. Current thinking, as exemplified by Colletotrichum and Verticillium taxonomy, indicates that phylogenetic species recognition, based either on SGC or CBD approaches, is the species concept most widely applied to fungi. However, the validity of this concept may vary, depending upon the group of organisms that a species is aiming to encompass. For example, the use of phylogenetic relationships is questionable for identifying formae speciales of the F. oxysporum species complex. In this case the formae speciales concept is entirely based on plant genus- or plant species-specific pathogenicity that may have been acquired by horizontal gene transfer from evolutionarily unrelated donors. A further constraint to the use of PSC for recognizing species in phytopathogenic fungi is the variation in pathogenicity found in species or special forms. From a phytopathological point of view, the key information for identification lies in the pathogenicity properties associated with specific subspecific groupings, namely AGs (R. solani), pathotypes and pathogenic races (F. oxysporum, Verticillium species), for which we still rely heavily on pathogenicity testing or molecular typing of isolates. We expect that the availability of whole-genome sequencing will become widespread over time and will ultimately replace traditional phenotype and genetic
markers based on a given DNA sequence. Fungal plant pathogens, owing to their very nature, provide an example of the correct delimitation of species boundaries and provide insight into pathogenesis. However, we cannot ignore plant responses to the molecular interactions with ‘friends and foes’ which determine the existence or non-existence of pathogenic subspecific levels. Debate on species identification is no longer a question of being in favour of ‘splitters’ rather than of ‘lumpers’, but defining phytopathogenic species is particularly complicated and requires further consideration of subspecific categorizations (also see Chapters 14 and 18).

References

Ajayi-Oyetunde, O.O. and Bradley, C.A. (2018) Rhizoctonia solani: taxonomy, population biology and management of rhizoctonia seedling disease of soybean. Plant Pathology 67, 3–17. doi:10.1111/ppa.12733
Aoki, T., O’Donnell, K. and Geiser, D.M. (2014) Systematics of key phytopathogenic Fusarium species: current status and future challenges. Journal of General Plant Pathology 80, 189–201. doi:10.1007/s10327-014-0509-3
Armstrong, G.M. and Armstrong, G.K. (1978) Formae speciales and races of Fusarium oxysporum causing wilts of the cucurbitaceae. Phytopathology 68, 19–28. doi:10.1094/Phyto-68-19
Atallah, Z.K., Hayes, R.J. and Subbarao, K.V. (2011) Fifteen years of Verticillium wilt of lettuce in America’s salad bowl: A tale of immigration, subjugation, and abatement. Plant Disease 95, 784–792. doi:10.1094/PDIS-01-11-0075
Baayen, R.P., O’Donnell, K., Bonants, P.J.M., Cigelnik, E., Kroon, L.P.N.M., Roebroeck, E.J.A. et al. (2000) Gene genealogies and AFLP analyses in the Fusarium oxysporum complex identify monophyletic and nonmonophyletic formae speciales causing wilt and rot disease. Phytopathology 90, 891–900. doi:10.1094/PHYTO.2000.90.8.891
Barbara, D.J. and Clewes, E. (2003) Plant pathogenic Verticillium species: How many of them are there? Molecular Plant Pathology 4, 297–305. doi:10.1046/j.1364-3703.2003.00172.x
Baroncelli, R., Talhinhas, P., Pensec, F., Sukno, S.A., Le Floch, G. et al. (2017) The Colletotrichum acutatum species complex as a model system to study evolution and host specialization in plant pathogens. Frontiers in Microbiology 8, 2001. doi:10.3389/fmicb.2017.02001
Bhat, R.G. and Subbarao, K.V. (1999) Host range specificity in Verticillium dahliae. Phytopathology 89, 1218–1225. doi:10.1094/PHYTO.1999.89.12.1218
Bisby, F.A. and Coddington, J. (1995) Biodiversity from a taxonomic and evolutionary perspective. In: Heywood, V.H. and Watson, R.T. (eds) Global Biodiversity Assessment. Cambridge University Press, Cambridge, UK, pp. 27–56.
Cai, L., Hyde, K.D., Taylor, P.W.J., Weir, B.S. et al. (2009) A polyphasic approach for studying Colletotrichum. Fungal Diversity 39, 183–204.
Cannon, P.F., Bridge, P.D. and Monte, E. (2000) Linking the past, present, and future of Colletotrichum systematics. In: Prusky, D., Freeman, S. and Dickman, M. (eds) Colletotrichum: Host specificity, Pathology, and Host-pathogen interaction. APS Press, St Paul, Minnesota, pp. 1–20.
Cannon, P.F., Damm, U., Johnston, P.R. and Weir, B.S. (2012) Colletotrichum - current status and future directions. Studies in Mycology 73, 181–213. doi:10.3114/sim0014
Cantino, P.D. and de Queiroz, K. (2010) International Code of Phylogenetic Nomenclature Version 4c. Committee on Phylogenetic Nomenclature, Athens, Ohio.
Carling, D.E., Kuninaga, S. and Brainard, K.A. (2002) Hyphal anastomosis reactions, rDNA-Internal Transcribed Spacer sequences, and virulence levels among subsets of Rhizoctonia solani anastomosis group-2 (AG-2) and AG-BI. Phytopathology 92, 43–50. doi:10.1094/PHYTO.2002.92.1.43
Chen, J.Y., Liu, C., Gui, Y.J., Si, K.W., Zhang, D.D. et al. (2018) Comparative genomics reveals cotton-specific virulence factors in flexible genomic regions in Verticillium dahliae and evidence of horizontal gene transfer from Fusarium. New Phytologist 217, 756–770. doi:10.1111/nph.14861
Collado-Romero, M., Mercado-Blanco, J., Olivares-García, C., Valverde-Corredor, A. and Jiménez-Díaz, R.M. (2006) Molecular variability within and among Verticillium dahliae vegetative compatibility groups determined by fluorescent AFLP and PCR markers. Phytopathology 96, 485–495. doi:10.1094/PHYTO-96-0485
Correll, J. (1991) The relationship between formae speciales, races, and vegetative compatibility groups in Fusarium oxysporum. Phytopathology 81, 1061–1064.
Da Lio, D., Cobo-Díaz, J.F., Masson, C., Chalopin, M., Kebe, D. et al. (2018) Multi-locus approach for genetic characterization of Colletotrichum species associated with common walnut (Juglans regia) anthracnose in France. Scientific Reports 8, 10765. doi:10.1038/s41598-018-29027-z
Damm, U., Cannon, P.F., Woudenberg, J.H.C. and Crous, P.W. (2012a) The Colletotrichum acutatum species complex. Studies in Mycology 73, 37–113. doi:10.3114/sim0010
Damm, U., Cannon, P.F., Woudenberg, J.H.C., Johnston, P.R., Weir, B. et al. (2012b) The Colletotrichum boninense species complex. Studies in Mycology 73, 1–36. doi:10.3114/sim0002
Damm, U., O’Connell, R.J., Groenewald, J.Z. and Crous, P.W. (2014) The Colletotrichum destructivum species complex – hemibiotrophic pathogens of forage and field crops. Studies in Mycology 79, 49–84. doi:10.1016/j.simyco.2014.09.003
Damm, U., Sato, T., Alizadeh, A., Groenewald, J.Z. and Crous, P.W. (2019) The Colletotrichum dracaenophilum, C. magnum and C. orchidearum species complexes. Studies in Mycology 92, 1–46. doi:10.1016/j.simyco.2018.04.001
Dean, R., Van Kan, J.A.L., Pretorius, Z.A., Hammond-Kosack, K.E., Di Pietro, A. et al. (2012) The top 10 fungal pathogens in molecular plant pathology. Molecular Plant Pathology 13, 414–430. doi:10.1111/j.1364-3703.2011
Demers, J.E., Gugino, B.K. and Jiménez-Gasco, M.M. (2015) Highly diverse endophytic and soil Fusarium oxysporum populations associated with field-grown tomato plants. Applied and Environmental Microbiology 81, 81–90. doi:10.1128/AEM.02590-14
Depotter, J.R.L., Deketelaere, S., Inderbitzin, P., von Tiedemann, A., Höfte, M. et al. (2016) Verticillium longisporum, the invisible threat to oilseed rape and other brassicaceous plant hosts. Molecular Plant Pathology 17, 1004–1016. doi:10.1111/mpp.12350
Depotter, J.R.L., Seidl, M.F., van den Berg, G.C.M., Thomma, B.P.H.J. and Wood, T.A. (2017) A distinct and genetically diverse lineage of the hybrid fungal pathogen Verticillium longisporum population causes stem striping in British oilseed rape. Environmental Microbiology 19, 3997–4009. doi:10.1111/1462-2920.13801
de Jonge, R., van Esse, H.P., Maruthachalam, K., Bolton, M.D., Santhanam, P. et al. (2012) Tomato immune receptor Ve1 recognizes effector of multiple fungal pathogens uncovered by genome and RNA sequencing. Proceedings of the National Academy of Sciences USA 109, 5110–5115. doi:10.1073/pnas.1119623109
de Jonge, R., Bolton, M.D., Kombrink, A., van den Berg, G.C.M., Yadeta, K.A. et al. (2013) Extensive chromosomal reshuffling drives evolution of virulence in an asexual pathogen. Genome Research 23, 1271–1282. doi:10.1101/gr.152660.112
Dugan, F.M. and Everhart, S. (2016) Cryptic species: a leitmotif of contemporary mycology has challenges and benefits for plant pathologists. Plant Health Progress 17, 250–253. doi:10.1094/PHP-RV-16-0046
Edel-Hermann, V. and Lecomte, C. (2019) Current status of Fusarium oxysporum formae speciales and races. Phytopathology 109, 512–530. doi:10.1094/PHYTO-08-18-0320-RVW
EFSA Panel on Plant Health (2014) Scientific opinion on the pest categorization of Verticillium dahliae Kleb. EFSA Journal 12, 3928. doi:10.2903/j.efsa.2014.3928
Fourie, G., Steenkamp, E.T., Gordon, T.R. and Viljoen, A. (2009) Evolutionary relationships among the Fusarium oxysporum f.sp. cubense vegetative compatibility groups. Applied and Environmental Microbiology 75, 4770–4781. doi:10.1128/AEM.00370-09
Freeman, S. and Shabi, E. (1996) Cross-infection of subtropical and temperate fruits by Colletotrichum species from various hosts. Physiological and Molecular Plant Pathology 49, 395–404. doi:10.1006/pmpp.1996.0062
Fu, M., Crous, P.W., Bai, Q., Zhang, P.F., Xiang, J. et al. 
(2019) Colletotrichum species associated with anthracnose of Pyrus spp. in China. Persoonia 42, 1–35. doi:10.3767/persoonia.2019.42.01 Fujita, M.K., Leaché, A.D., Burbrink, F.T., McGuire, J.A. and Moritz, C. (2012) Coalescent-based species delimitation in an integrative taxonomy. Trends in Ecology and Evolution 27, 480–488. doi:org/10.1016/j. tree.2012.04.012 Gams, W., Zare, R. and Summerbell, R.C. (2005) Proposal to conserve the generic name Verticillium (anamorphic Ascomycetes) with a conserved type. Taxon 54, 179. doi:10.2307/25065318 Gan, P., Narusaka, M., Kumakura, N., Tsushima, A., Takano, Y. et al. (2016) Genus-wide comparative genome analyses of Colletotrichum species reveal specific gene family losses and gains during adaptation to specific infection lifestyles. Genome Biology and Evolution 8, 1467–1481. doi:10.1093/gbe/ evw089

Are Species Concepts Outdated for Fungi?

315

Geiser, D.M., Jiménez-Gasco, M.M., Kang, S., Makalowska, I., Zhang, N., Kuldau, G.A. and O’Donnell, K.L. (2004) FUSARIUM-ID v.1.0: A DNA sequence database for identifying Fusarium. European Journal of Plant Pathology 110, 473–479. doi:10.1023/B:EJPP.0000032386.75915.a0 Geiser, D.M., Aoki, T., Bacon, C.W., Baker, S.E., Bhattacharyya, M.K. Brandt, M.E. et al. (2013) One fungus, one name: Defining the genus Fusarium in a scientifically robust way that preserves longstanding use. Phytopathology 103, 400–408. doi:10.1094/PHYTO-07-12-0150-LE Gerlach, W. and Nirenberg, H. (1982) The genus Fusarium- a pictorial atlas. Mitteilungen der Biol. Bundesanstalt für Land- und Forstwirtschaft 209, 1–406. Gonzalez, D., Carling, D.E., Kuninaga, S., Vilgalys, R. and Cubeta, M.A. (2001) Ribosomal DNA systematics of Ceratobasidium and Thanatephorus with Rhizoctonia anamorphs. Mycologia 93, 1138–1150. doi:10.2307/3761674 Gordon, T.R. and Martyn, R.D. (1997) The evolutionary biology of Fusarium oxysporum. Annual Review of Phytopathology 35, 111–128. doi:10.1146/annurev.phyto.35.1.111 Gottig, N., Garavaglia, B.S., Daurelio, L.D., Valentine, A., Gehring, C. et al. (2008) Xanthomonas axonopodis pv. citri uses a plant natriuretic peptide-like protein to modify host homeostasis. Proceedings of the National Academy of Sciences USA 105, 18631–18636. doi:org/10.1073/pnas.0810107105 Guarnaccia, V., Groenewald, J.Z., Polizzi, G. and Crous, P.W. (2017) High species diversity in Colletotrichum associated with citrus diseases in Europe. Persoonia 39, 32–50. doi:10.3767/persoonia.2017.39.02 Gurung, S., Short, D.P.G., Hu, X., Sandoya, G.V., Hayes, R.J. et al. (2015) Host range of Verticillium isaacii and Verticillium klebahnii from artichoke, spinach, and lettuce. Plant Disease 99, 933–938. doi:10.1094/ PDIS-12-14-1307-RE Han, J.H., Chon, J.K., Ahn, J.H., Choi, I.Y., Lee, Y.H. et al. 
(2016) Whole genome sequence and genome annotation of Colletotrichum acutatum, causal agent of anthracnose in pepper plants in South Korea. Genome Data 8, 45–46. doi:10.1016/j.gdata.2016.03.007 Hayes, R.J., McHale, L.K., Vallad, G.E., Truco, M.J., Michelmore, R.W. et al. (2011) The inheritance of resistance to Verticillium wilt caused by race 1 isolates of Verticillium dahliae in the lettuce cultivar La Brillante. Theoretical and Applied Genetics 123, 509–517. doi:10.1007/s00122-011-1603-y Houterman, P.M., Speijer, D., Dekker, H.L., De Koster, C.G., Cornelissen, B.J.C. and Rep, M. (2007) The mixed xylem sap proteome of Fusarium oxysporum-infected tomato plants. Molecular Plant Pathology 8, 215–221. doi: 10.1111/j.1364-3703.2007.00384.x Hyde, K.D., Cai, L., McKenzie, E.H.C., Yang, Y.L., Zhang, J.Z., et al. (2009) Colletotrichum: a catalogue of confusion. Fungal Diversity 39, 1–17. Inderbitzin, P. and Subbarao, K.V. (2014) Verticillium systematics and evolution: How confusion impedes Verticillium wilt management and how to resolve it. Phytopathology 104, 564–574. doi:org/10.1094/ PHYTO-11-13-0315-IA Inderbitzin, P., Bostock, R.M., Davis, R.M., Usami, T., Platt, H.W. et al. (2011a) Phylogenetics and taxonomy of the fungal vascular wilt pathogen Verticillium, with the descriptions of five new species. PLoS ONE 6, e28341. doi:org/10.1371/journal.pone.0028341 Inderbitzin, P., Davis, R.M., Bostock, R.M. and Subbarao, K.V. (2011b) The ascomycete Verticillium longisporum is a hybrid and a plant pathogen with an expanded host range. PLoS ONE 6, e18260. doi:org/10.1371/journal.pone.0018260 Inderbitzin, P., Davis, R.M., Bostock, R.M. and Subbarao, K.V. (2013) Identification and differentiation of Verticillium species and V. longisporum lineages by simplex and multiplex PCR assays. PLoS ONE 8, e65990. doi:org/10.1371/journal.pone.0065990 Inderbitzin, P., Thomma, B.P.H.J., Klosterman, S.J. and Subbarao, K.V. (2014) Verticillium alfalfae and V. 
dahliae, agents of Verticillium wilt diseases. In: Dean, R.A. et al. (eds) Genomics of PlantAssociated Fungi and Oomycetes: Dicot Pathogens. Springer-Verlag, Berlin, Germany, pp. 65–97. doi:10.1007/978-3-662-44056-8_4 Isaac, I. (1949) A comparative study of pathogenic isolates of Verticillium. Transactions of the British Mycological Society 32, 137–157. doi:org/10.1016/S0007-1536(49)80002-7 Isaac, I. (1953) A further comparative study of pathogenic isolates of Verticillium: V. nubilum Pethybr and V. tricorpus sp. nov. Transactions of the British Mycological Society 36, 180–195. doi:org/10.1016/S00071536(53)80002-1 Isaac, I. and Keyworth, G.W. (1948) Verticillium wilt of the hop (Humulus lupulus) III. A study of the pathogenicity of the isolates from fluctuating and progressive outbreaks. Annals of Applied Biology 35, 243–249. doi.org/10.1111/j.1744-7348.1948.tb07365.x

316

E. Monte et al.

Jayawardena, R.S., Hyde, K.D., Damm, U., Cai, L., Liu, M. et al. (2016) Notes on currently accepted species of Colletotrichum. Mycosphere 7, 1192–1260. doi:10.5943/mycosphere/si/2c/9 Jiménez-Díaz, R.M., Olivares-García, C., Mercado-Blanco, J., Collado-Romero, M., Bejarano-Alcázar, J., et al. (2006) Genetic and virulence diversity in Verticillium dahliae populations infecting artichoke in eastern-central Spain. Phytopathology 96, 288–298. doi:10.1094/PHYTO-96-0288 Jiménez-Díaz, R.M., Cirulli, M., Bubici, G., Jiménez-Gasco, M.M., Antoniou, P.P. et al. (2012) Verticillium wilt, a major threat to olive production: Current status and future prospects for its management. Plant Disease 96, 304–329. doi:org/10.1094/PDIS-06-11-0496 Jiménez-Díaz, R.M., Castillo, P., Jiménez-Gasco, M.M., Landa, B.B. and Navas-Cortés, J.A. (2015) Fusarium wilt of chickpeas: Biology, ecology and management. Crop Protection 73, 16–27. doi:10.1016/j. cropro.2015.02.023 Jiménez-Díaz, R.M., Olivares-García, C., Trapero-Casas, J.L., Jiménez-Gasco, M.M., Navas-Cortés, J.A., Landa, B.B. and Milgroom, M.G. (2017) Variation of pathotypes and races and their correlations with clonal lineages in Verticillium dahliae. Plant Pathology 66, 651–666. doi:10.1111/ppa.12611 Jiménez-Gasco, M.M., Malcolm, G.M., Berbegal, M., Armengol, J. and Jiménez-Díaz, R.M. (2014) Complex molecular relationship between vegetative compatibility groups (VCGs) in Verticillium dahliae: VCGs do not always align with clonal lineages. Phytopathology 104, 650–659. doi:10.1094/PHYTO07-13-0180-R Katan, T. (2000) Vegetative compatibility in populations of Verticillium –An overview. In: Tjamos, E.C., Rowe, R.C., Heale, J.B. and Fravel, R.D. (eds) Advances in Verticillium Research and Disease Management. APS Press, St Paul, Minnesota, pp. 69–86. Karapapa, V.K., Bainbridge, B.W. and Heale, J.B. (1997) Morphological and molecular characterization of Verticillium longisporum comb. nov., pathogenic to oilseed rape. 
Mycological Research 101, 1281–1294. doi:org/10.1017/S0953756297003985 Kawchuk, L.M., Hachey, J., Lynch, D.R., Kulcsar, F., van Rooijenet, G. et al. (2001) Tomato Ve disease resistance genes encode cell surface-like receptors. Proceedings of the National Academy of Sciences USA 98, 6511–6515. doi:org/10.1073/pnas.091114198 Kistler, H.C. (1997) Genetic diversity in the plant-pathogenic fungus Fusarium oxysporum. Phytopathology 87, 474–479. doi:10.1094/PHYTO.1997.87.4.474 Klosterman, S.J., Atallah, Z.K., Vallad, G.E. and Subbarao, K.V. (2009) Diversity, pathogenicity, and management of Verticillium species. Annual Review of Phytopathology 47, 39–62. doi:org/10.1146/annurev-phyto-080508-081748 Klosterman, S.J., Subbarao, K.V., Kang, S., Veronese, P., Gold, S.E. et al. (2011) Comparative genomics yields insights into niche adaptation of plant vascular wilt pathogens. PLoS Pathogens 7, e1002137. doi:10.1371/journal.ppat.1002137 Koike, S.T. and Gordon, T.R. (2015) Management of Fusarium wilt of strawberry. Crop Protection 73, 67–72. doi:10.1016/j.cropro.2015.02.003 Korolev, N., Pérez-Artés, E., Mercado-Blanco, J., Bejarano-Alcázar, J., Rodríguez-Jurado, D. et al. (2008) Vegetative compatibility of cotton-defoliating Verticillium dahliae in Israel and its pathogenicity to various crop plants. European Journal of Plant Pathology 122, 603–617. doi:10.1007/s10658-008-9330-1 Laurence, M.H., Burgess, L.W., Summerell, B.A. and Liew, E.C.Y. (2012) High levels of diversity in Fusarium oxysporum from non-cultivated ecosystems in Australia. Fungal Biology 116, 289–297. doi:10.1016/j. funbio.2011.11.011 Laurence, M.H., Summerell, B.A., Burgess, L.W. and Liew, E.C.Y. (2014) Genealogical concordance phylogenetic species recognition in the Fusarium oxysporum species complex. Fungal Biology 118, 374–384. doi:10.1016/j.funbio.2014.02.002 Li, J. (2019) Identification of host-specific effectors mediating pathogenicity of the vascular wilt pathogen Verticillium dahliae. PhD Thesis. 
University of Wageningen, Wageningen, The Netherlands. Lievens, B., Houterman, P.M. and Rep, M. (2009) Effector gene screening allows unambiguous identification of Fusarium oxysporum f.sp. lycopersici races and discrimination from other formae speciales. FEMS Microbiology Letters 300, 201–215. doi:10.1111/j.1574-6968.2009.01783.x Liu, F., Wang, M., Damm, U., Crous, P.W. and Cai, L. (2016) Species boundaries in plant pathogenic fungi: a Colletotrichum case study. BMC Evolution Biology 16, 81. doi:10.1186/s12862-016-0649-5 Lombard, L., Sandoval-Denis, M., Lamprecht, S. and Crous, P. (2019) Epitypification of Fusarium oxysporum: clearing the taxonomic chaos. Persoonia 43, 1–47. doi:10.3767/persoonia.2019.43.01 Ma, L.-J., Geiser, D.M., Proctor, R.H., Rooney, A.P., O’Donnell, K., Trail, F. et al. (2013) Fusarium pathogenomics. Annual Review of Microbiology 67, 399–416. doi:10.1146/annurev-micro-092412-155650

Are Species Concepts Outdated for Fungi?

317

Ma, L.-J., van der Does, H.C., Borkovich, K.A., Coleman, J.J., Daboussi, M.-J., Di Pietro, A. et al. (2010) Comparative genomics reveals mobile pathogenicity chromosomes in Fusarium. Nature 464, 367–373. doi:10.1038/nature08850 Magdama, F., Monserrate-Maggi, L., Serrano, L., Sosa, D., Geiser, D.M. and Jiménez-Gasco, M.M. (2019) Comparative analysis uncovers the limitations of current molecular detection methods for Fusarium oxysporum f. sp. cubense race 4 strains. PLoS ONE 14, e0222727. doi: 10.1371/journal.pone.0222727 Mahuku, G.S. and Platt, H.W. (2002) Molecular evidence that Verticillium albo-atrum Grp 2 isolates are distinct from V. albo-atrum Grp 1 and V. tricorpus. Molecular Plant Pathology 3, 71–79. doi:10.1046/j. 1464-6722.2001.00096.x Marin-Felix, Y., Groenewald, J.Z., Cai, L., Chen, Q., Marincowitz, S. et al. (2017) Genera of phytopathogenic fungi: GOPHY 1. Studies in Mycology 86, 99–216. doi:10.1016/j.simyco.2017.04.002 Matute, D.R. and Sepúlveda, V.E. (2019) Fungal species boundaries in the genomics era. Fungal Genetics and Biology 131, 103249. doi:10.1016/j.fgb.2019.103249 Mayr, E. (1942) Systematics and the Origin of Species from the Viewpoint of a Zoologist. Columbia University Press, New York. Michielse, C.B. and Rep, M. (2009) Pathogen profile update: Fusarium oxysporum. Molecular Plant Pathology 10, 311–324. doi:10.1111/j.1364-3703.2009.00538.x Milgroom, M.G., Jiménez-Gasco, M.M, Olivares-García, C., Drott, M.T. and Jiménez-Díaz, R.M. (2014) Recombination between clonal lineages of the asexual fungus Verticillium dahliae detected by genotyping by sequencing. PLoS ONE 9, e106740. doi:org/10.1371/journal.pone.0106740 Milgroom, M.G., Jiménez-Gasco, M.M, Olivares-García, C. and Jiménez-Díaz, R.M. (2016) Clonal expansion and migration of a highly virulent, defoliating lineage of Verticillium dahliae. Phytopathology 106, 1038–1046. doi:10.1094/PHYTO-11-15-0300-R Monte, E. and Suárez, M.B. (2010) Muerte de plántulas. In: Jiménez Díaz, R.M. 
and Montesinos, E. (eds) Enfermedades de las Plantas causadas por Hongos y Oomicetos. Naturaleza y Control Integrado. Phytoma-España, Valencia, Spain, pp. 115–133. Nelson, P.E. (1991) History of Fusarium systematics. Phytopathology 81, 1045–1048. Novakazi, F., Inderbitzin, P., Sandoya, G., Hayes, R.J., Tiedemann, A.V. et al. (2015) The three lineages of the diploid hybrid Verticillium longisporum differ in virulence and pathogenicity. Phytopathology 105, 662–673. doi:org/10.1094/PHYTO-10-14-0265-R O’Connell, R.J., Thon, M.R., Hacquard, S., Amyotte, S.G., Kleemann, J. et al. (2012) Life-style transitions in plant pathogenic Colletotrichum fungi deciphered by genome and transcriptome analyses. Nature Genetics 44, 1060–1065. doi:10.1038/ng.2372 O’Donnell, K., Kistler, H.C., Cigelnik, E. and Ploetz, R.C. (1998) Multiple evolutionary origins of the fungus causing Panama disease of banana: concordant evidence from nuclear and mitochondrial gene genealogies. Proceedings of the National Academy of Sciences USA 95, 2044–2049. doi:10.1073/ pnas.95.5.2044 Ogoshi A. (1987) Ecology and pathogenicity of anastomosis and intraspecific groups of Rhizoctonia solani Kuhn. Annual Review of Phytopathology 25, 125–143. doi:org/10.1146/annurev.py.25.090187.001013 Olivain, C. and Alabouvette, C. (1997) Colonization of tomato root by a non-pathogenic strain of Fusarium oxysporum. New Phytologist 137, 481–494. doi:org/10.1046/j.1469-8137.1997.00855.x Ordonez, N., Seidl, M.F., Waalwijk, C., Drenth, A., Kilian, A., Thomma, B.P.H.J. et al. (2015) Worse comes to worst: Bananas and Panama disease—when plant and pathogen clones meet. PLoS Pathogens 11, e1005197. doi:org/10.1371/journal.ppat.1005197 Pegg, G.F. and Brady, B.L. (2002) Verticillium Wilts. CABI Publishing, Wallingford, UK, pp. 1–432. Radišek, S., Jakše, J. and Javornik, B. (2006) Genetic variability and virulence among Verticillium albo-atrum isolates from hop. European Journal of Plant Pathology 116, 301–314. 
doi:10.1007/s10658006-9061-0 Resende, M.L.V., Flood, J. and Cooper, R.M. (1994) Host specialization of Verticillium dahliae, with emphasis on isolates from cocoa (Theobroma cacao). Plant Pathology 43, 104–111. doi:org/10.1111/j. 1365-3059.1994.tb00559.x Rocha, L.O., Laurence, M.H., Ludowici, V.A., Puno, V.I., Lim, C.C., Tesoriero, L.A., Summerell, B.A. and Liew, E.C.Y. (2016) Putative effector genes detected in Fusarium oxysporum from natural ecosystems of Australia. Plant Pathology 65, 914–929. doi:10.1111/ppa.12472 Robinson, P. and Kommedahl, T. (2002) PhyloCode: a new system of nomenclature. Science Editor 25, 52. Rossman, A.Y. and Palm-Hernández, M.E. (2008) Systematics of plant pathogenic fungi: Why it matters. Plant Disease 92, 1376–1386. doi:org/10.1094/PDIS-92-10-1376

318

E. Monte et al.

Rowe, R.C. and Powelson, M.L. (2002) Potato early dying: Management challenges in a changing production environment. Plant Disease 86, 1184–1193. doi:org/10.1094/PDIS.2002.86.11.1184 Schmidt, S.M., Houterman, P.M., Schreiver, I., Ma, L., Amyotte, S., Chellappan, B. et al. (2013) MITEs in the promoters of effector genes allow prediction of novel virulence genes in Fusarium oxysporum. BMC Genomics 14, 113. doi:org/10.1186/1471-2164-14-119 Schnathorst, W.C. and Mathre, D.E. (1966) Host range and differentiation of a severe form of Verticillium albo-atrum in cotton. Phytopathology 56, 1155–1161. Sharma, G. and Shenoy, B.D. (2016) Colletotrichum systematics: Past, present and prospects. Mycosphere 7, 1093–1102. doi:10.5943/mycosphere/si/2c/2 Sharma, G., Pinnaka, A.K. and Shenoy, B.D. (2015) Resolving the Colletotrichum siamense species complex using ApMat marker. Fungal Diversity 71, 247–264. doi:10.1007/s13225-014-0312-7 Short, D.P.G., Gurung, S., Maruthachalam, K., Atallah, Z.K. and Subbarao, K.V. (2014) Verticillium dahliae race 2-specific PCR reveals a high frequency of race 2 strains in commercial spinach seed lots and delineates race structure. Phytopathology 104, 779–785. doi:10.1094/PHYTO-09-13-0253-R Silva, D.D., Groenewald, J.Z., Crous, P.W., Ades, P.K., Nasruddin, A. et al. (2019) Identification, prevalence and pathogenicity of Colletotrichum species causing anthacnose of Capsicum annuum in Asia. IMIA Fungus 10, 8. doi:10.1186/s43008-019-0001-y Skovgaard, K., Nirenberg, H.I., O’Donnell, K. and Rosendahl, S. (2001) Evolution of Fusarium oxysporum f. sp. vasinfectum races inferred from multigene genealogies. Phytopathology 91, 1231–1237. doi:10.1094/PHYTO.2001.91.12.1231 Sneath, P.H.A. and Sokal, R.R. (1973) Numerical Taxonomy. W.H. Freeman, San Francisco, California, pp. 1–573. Sneh, B., Burpee, L. and Ogoshi, A. (1998) Identification of Rhizoctonia Species. APS Press, St. Paul, Minnesota, pp. 1–133. Snyder, W.C. and Hansen, H.N. 
(1945) The Species concept in Fusarium with reference to discolor and other sections. American Journal of Botany 32, 657–666. doi:10.1002/j.1537-2197.1945.tb05172.x Summerell, B.A., Laurence, M.H., Liew, E.C.Y. and Leslie, J.F. (2010) Biogeography and phylogeography of Fusarium: A review. Fungal Diversity 44, 3–13. doi: 10.1007/s13225-010-0060-2 Sutton, B.C. (1992) The genus Glomerella and its anamorph Colletotrichum. In Bailey, J.A. and Jeger, M.J. (eds) Colletotrichum: Biology, Pathology and Control. CABI Publishing, Wallingford, UK, pp. 1–26. Takken, F. and Rep, M. (2010) The arms race between tomato and Fusarium oxysporum. Molecular Plant Pathology 11, 309–14. doi:10.1111/j.1364-3703.2009.00605.x Taylor, J., Jacobson, D. and Fisher, M. (1999) The evolution of asexual fungi: reproduction, speciation and classification. Annual Review of Phytopathology 37, 197–246. doi: 10.1146/annurev.phyto.37.1.197 Taylor, J.W., Jacobson, D.J., Kroken, S., Kasuga, T., Geiser, D.M., Hibbett, D.S. and Fisher, M.C. (2000) Phylogenetic species recognition and species concepts in fungi. Fungal Genetics and Biology 31, 21–32. doi:10.1006/fgbi.2000.1228 Taylor, P.J. (2009) Evolution and the species concept. In: Minelli, A. and Contrafatto, G. (eds) Biological Science Fundamentals and Systematics. Encyclopedia of Life Support Systems. Eolss Publishers, Oxford, UK, pp. 289–310. Tran, V., Braus-Stromeyer, S.A., Timpner, C. and Braus, G.H. (2013) Molecular diagnosis to discriminate pathogen and apathogen species of the hybrid Verticillium longisporum on the oilseed crop Brassica napus. Applied Microbiology and Biotechnology 97, 4467–4483. doi:10.1007/s00253-012-4530-1 Usami, T., Kanto, T., Inderbitzin, P., Itoh, M., Kisaki, G. et al. (2011) Verticillium tricorpus causing lettuce wilt in Japan differs genetically from California lettuce isolates. Journal of General Plant Pathology 77, 17–23. doi:10.1007/s10327-010-0282-x Vilgalys, R. and Cubeta, M.A. 
(1994) Molecular systematics and population biology of Rhizoctonia. Annual Review of Phytopathology 32, 135–155. doi:org/10.1146/annurev.py.32.090194.001031 Weir, B.S., Johnston, P.R. and Damm, U. (2012) The Colletotrichum gloeosporioides species complex. Studies in Mycology 73, 115–180. doi:org/10.3114/sim0011 Wheeler, Q.D. and Platnick, N.I. (2000) The phylogenetic species concept (sensu Wheeler and Platnick). In Wheeler, Q.D. and Meier, R. (eds) Species Concepts and Phylogenetic Theory: A Debate. Columbia University Press, New York, pp. 55–69. Zare, R. and Gams, W. (2001) A revision of Verticillium section Prostrata. VI. The genus Haptocillium. Nova Hedwigia 7, 271–292. Zare, R. and Gams, W. (2008) A revision of the Verticillium fungicola species complex and its affinity with the genus Lecanicillium. Mycological Research 112, 811–824. doi:org/10.1016/j.mycres.2008.01.019

Are Species Concepts Outdated for Fungi?

319

Zare, R., Gams, W. and Evans, H.C. (2001) A revision of Verticillium section Prostrata. V. The genus Pochonia, with notes on Rotiferophthora. Nova Hedwigia 73, 51–86. Zare, R., Gams, W. and Schroers, H.-J. (2004) The type species of Verticillium is not congeneric with the plant-pathogenic species placed in Verticillium and it is not the anamorph of ‘Nectria’ inventa. Mycological Research 108, 576–582. doi:10.1017/s0953756204009839 Zare, R., Gams, W., Starink-Willemse, M. and Summerbell, R.C. (2007) Gibellulopsis, a suitable genus for Verticillium nigrescens, and Musicillium, a new genus for V. theobromae. Nova Hedwigia 85, 463–489. doi:10.1127/0029-5035/2007/0085-0463 Zhang, N., Castlebury, L.A., Miller, A.N., Huhndorf, S.M., Schoch, C.L., et al. (2006) An overview of the systematics of the Sordariomycetes based on a four-gene phylogeny. Mycologia 98, 1076–1087. doi:10.3852/mycologia.98.6.1076 Zhang, D.-D., Wang, J., Wang, D., Kong, Z.-Q., Zhou, L. et al. (2019) Population genomics demystifies the defoliation phenotype in the plant pathogen Verticillium dahliae. New Phytologist 222, 1012–1029. doi:10.1111/nph.15672

18

Where to Now?

Paul Bridge1,*, Erko Stackebrandt2 and David Smith3
1Axminster, UK; 2Ampleben, Germany; 3CABI, Egham, UK

Introduction

At present there are various ongoing discussions about the future of microbial nomenclature, in particular the recommendation from some molecular-oriented systematists to apply Linnaean binomial nomenclature to genomes and individual DNA sequences. In this context, the editors considered it timely to gather the opinions of experts on current trends in microbial systematics. We are aware of the lack of articles from supporters of the status quo of the current taxonomy of prokaryotes. However, as it is not yet known to what extent such changes may affect the main statements as laid down in the articles of the two Codes of Nomenclature, we wanted to place emphasis on summarizing new developments and how these may alter the future of the systematics of microorganisms. (To avoid lengthy explanations of different interpretations of terms, we use here the definition of Webster’s Ninth New Collegiate Dictionary (1983), which treats ‘classification’, ‘taxonomy’ and ‘systematics’ as synonyms.) We also use the term ‘bacteriology’ in a relaxed way, to include all prokaryotic taxa.

Our original thoughts when planning this book were to summarize the main points of the individual chapters in a final ‘reconciliation’ summary. On reading the essays, however, it became very clear to us that the power of molecular data will definitely change the path of systematics, either in agreement with the authorities of nomenclature or in contrast to current practice. As the book comprises chapters on aspects of both prokaryotic and mycological systematics and identification, the concluding chapter is also divided into these two parts. The ways in which the two fields are dealing with the outcome of current trends differ too much to be treated together. It appears that the situation among prokaryotes will be resolved faster than in mycology, in which the complexity of the problems appears to be an order of magnitude higher owing to the great diversity of taxa, the paucity of reference material and the application of different taxonomic concepts (see Chapters 14 and 17).

In this volume certain reflections on general aspects of systematics are covered in chapters on only one group of microorganisms (domain/kingdom), while in practice they apply to both. For example, the polyphasic approach that is well established in bacteriology (see Chapters 1 and 15) has also become established for some genera in mycology (e.g. Yilmaz et al., 2014), and is considered a ‘gold standard’ (Lücking et al., 2018).



© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


Progress in Mycological Systematics

Systematics is a particularly challenging area in mycology. The fungi are believed to have diverged as a distinct kingdom around 1.5 billion years ago (Wang et al., 1999). There are a number of suggestions as to how many lineages are present in the kingdom, and precisely how they are related, and this is discussed in Chapter 2 (see also Jones et al., 2011; Naranjo-Ortiz and Gabaldón, 2019). With recent estimates of between 1.5 and 12 million species (see Chapter 2; Yahr et al., 2016; Wu et al., 2019) and growth forms ranging from simple yeast cells to large multinucleate networks covering many hectares, they are one of the most diverse groups of organisms on the planet. We have, therefore, selected contributions that are focused primarily on the identification of fungal species.

Phenotypic characters have remained very important in mycology, where an initial morphological examination may be required before an appropriate gene region can be selected for identification. In some instances, a morphological identification may be sufficient for presumptive diagnosis of plant and animal diseases, particularly when combined with host symptoms (see Chapters 2, 12 and 17). The wide-ranging phenotypic characterization provided by matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) methodology has been introduced for the identification of some fungal species. This approach is relatively new in mycology, with the first studies on fungi being undertaken in the early 2000s (see Chapter 8). The methodology is well established for bacteria (see Chapter 7) and appears to work well with many yeasts, but there seem to be some issues in both growth methods and sample preparation for highly pigmented and filamentous fungi (see Chapter 8). MALDI-TOF with fungi appears to show greater intraspecific variation than is seen with bacteria, perhaps not surprisingly for such versatile organisms, so species identification may require multiple spectra from a range of strains. This raises several questions, such as how many spectra are needed for a particular species, how they will be obtained and (perhaps most significantly) how they will be maintained and made available (see Chapter 8). It seems likely that the use of MALDI-TOF as a routine identification tool will remain rather constrained until these questions have been resolved.
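The point about needing several reference spectra per species can be pictured with a minimal Python sketch. It assumes, purely for illustration, that each spectrum has already been reduced to a vector of binned peak intensities and that cosine similarity is an acceptable match score. Neither assumption is taken from the chapters cited, and the function names, species labels and toy values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two binned spectra of equal length."""
    if len(u) != len(v):
        raise ValueError("spectra must use the same bins")
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("a spectrum with no signal cannot be scored")
    return dot / norm


def species_scores(
    query: Sequence[float], library: Dict[str, List[Sequence[float]]]
) -> Dict[str, float]:
    """Score each species by its best-matching reference strain."""
    return {
        species: max(cosine_similarity(query, ref) for ref in refs)
        for species, refs in library.items()
    }


# Hypothetical toy data: three intensity bins per spectrum.
query = [0.0, 2.0, 0.0]
one_strain = {"sp_x": [[1.0, 0.0, 0.0]], "sp_y": [[0.0, 0.0, 1.0]]}
two_strains = {
    "sp_x": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # second strain differs
    "sp_y": [[0.0, 0.0, 1.0]],
}

print(species_scores(query, one_strain)["sp_x"])   # 0.0
print(species_scores(query, two_strains)["sp_x"])  # 1.0
```

In this toy case the query matches the species only once a second, divergent strain is present in the reference library, which is the practical consequence of high intraspecific variation. Real spectra would need preprocessing and a validated scoring scheme that this sketch does not attempt.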


Molecular identification has been seen as the ultimate goal for determining fungal species (see Chapter 12). This area has developed considerably since the early use of G+C base composition values in the 1970s (e.g. De Bertoldi et al., 1973), through the introduction of ribosomal DNA sequences in the early 1990s (e.g. Bruns et al., 1991) to the proposal in 2012 to use ITS sequences for barcoding (Schoch et al., 2012). However, the use of ITS sequences has limitations for some fungal genera, including many that are commonly encountered in the environment. In genera such as Penicillium, Colletotrichum and Trichoderma, ITS sequences often do not contain sufficient variation to discriminate between closely related species, and in some groups, such as Fusarium and the Glomeromycota, multiple copies of the ITS may be present (see Chapters 5 and 12; see also Lindner and Banik, 2011; He et al., 2017). Some of the potential issues of using ITS as a ‘universal’ marker have been widely discussed in the past, and the suitability of other genes as single markers has been considered (e.g. Větrovský et al., 2016). Some reference to these is given in Chapters 5 and 12. Such limitations have led to the increased use of multilocus studies involving a growing number of gene regions, so that additional sequences are now needed to identify species definitively in many genera (see Chapters 5, 12 and 14). The above issues then raise questions when new ITS sequences are obtained from the ‘dark’ taxa. New ITS sequences may represent new taxa, or they may represent alternative ITS sequences from known taxa. Similarly, recognized ITS sequences may represent single known species or species complexes (see Chapters 2, 5 and 12; Lindner et al., 2013).
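The ambiguity described above, where one ITS sequence may correspond to a single species or to several members of a complex, can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes the sequences are already aligned to equal length, the 99% threshold is an arbitrary choice rather than a recommended cutoff, and the function names, species labels and toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def candidate_species(
    query: str, references: Dict[str, List[str]], threshold: float = 99.0
) -> List[Tuple[str, float]]:
    """Return every species with a reference sequence at or above the threshold."""
    hits = []
    for species, seqs in references.items():
        best = max(percent_identity(query, s) for s in seqs)
        if best >= threshold:
            hits.append((species, best))
    # Highest identity first; ties broken alphabetically for a stable order.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Hypothetical toy alignment: species_A and species_B share an identical ITS.
references = {
    "species_A": ["ACGTACGTACGTACGTACGT"],
    "species_B": ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGAACGT"],
    "species_C": ["ACGTTCGTACGAACGTTCGT"],
}
hits = candidate_species("ACGTACGTACGTACGTACGT", references)
print(hits)
if len(hits) > 1:
    print("ITS alone is ambiguous here; additional loci would be needed.")
```

When more than one species is returned, the single marker cannot separate them, which is the situation in which multilocus data are brought in.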

Species Concepts

The main species concepts that have been applied in mycology are reviewed in Chapters 14 and 17. Before considering the limitations of ITS as a species marker it may also be worth considering what is being expected from it. The concept of concerted evolution across sites in the rRNA gene cluster is well established and has been used extensively to determine lineages and date radiations (e.g. Pace, 1997; Berbee and Taylor,

P. Bridge et al.

2001). If one assumes that ITS sequences generally evolve in a concerted way, then different species with the same ITS sequences are potentially a more recent divergence than those with different ITS sequences (e.g. Réblová et al., 2013). Following on from this, species of different evolutionary ages could be expected to have different levels of intraspecific ITS sequence variation (see Lücking et al., 2018; Zamora et al., 2018). This has been demonstrated on a number of occasions, with at least one survey suggesting between 0% and 25% intraspecific variation across a wide range of genera (Nilsson et al., 2008). With the diverse range of forms, habitats, reproductive methods and life styles of fungi there are also various opportunities for horizontal transfer, as mentioned in Chapters 14 and 17. It is, then, not surprising that a single concerted molecular marker, such as ITS, is not appropriate for all species (Lücking et al., 2018; Zamora et al., 2018; see Chapters 5 and 12). The same logic could also be applied to fungal species concepts. Lücking and Hawksworth (2018) have rejected sequence similarity as an absolute measure for species recognition, in contrast to the average nucleotide identity (ANI) used in bacteriology (see Chapters 1, 13 and 15). Lücking et al. (2020) have recently suggested a framework for the identification of fungi, encouraging the approach of integrative (polyphasic) taxonomy for species delimitation. In Chapter 11 some consideration was given to using alternative species concepts with bacteria, but in mycology most species have been delineated using genealogical concordance phylogenetic species recognition (GCPSR; see Chapters 14 and 17). It may be worth considering whether different species concepts may be appropriate for different fungi (see Giraud et al., 2008; Taylor, 2014) depending on the particular fungal group being studied. 
We are not aware of any direct comparative studies in mycology, but comparisons have been used in zoology, where geographic and reproductive isolation may be factors for some species. In one example with giraffes, where there is controversy over the number of species and their delineation (Bercovitch et al., 2017), it has been shown that a combined multi-locus data set analysed with a number of different phylogeny reconstruction models can support between two and seven separate species depending on the assumptions and methodologies used, but in that case a three-species

model was the best supported and most likely (Petzold and Hassanin, 2020). Taylor (2014) has shown that more localized lineages can be detected in fungal populations through analysis of whole genomes, and it would be interesting to apply a similar approach to a wider range of genera such as Colletotrichum and Fusarium where an increasing number of loci and genomic sequences are becoming available (see also Chapter 14). In common with bacteriologists, mycologists have been involved in discussions about the use of DNA sequences as type specimens for names of fungi (see Chapter 2; Lücking and Hawksworth, 2018; Lücking et al., 2018; Thines et al., 2018; Zamora et al., 2018). A proposal to permit this was put forward to the 2018 International Mycological Congress and, although it was not accepted at that time, it has led to the establishment of a Special-purpose Committee for further consideration of the issue (see Crous and Boekhout, 2018; May et al., 2018). There is lively discussion about the future of naming prokaryotic material that does not consist of living cells that can be revived from a frozen state for the analysis of properties and expression of genes whenever the annotation of genomic properties reveals the likelihood of novel traits. The discussion between ‘sequence-based’ and ‘catch-all’ taxonomists is one of highly divergent opinions on what and how the prokaryotes and parts thereof should be named. It is interesting to note that the term ‘part thereof’, coined in early 2000 (e.g. DNA, plasmids), has been included in the definition of material to be deposited in biological resource centres (OECD, 2001). This discussion is mainly dominated on the one side by scientists, who are not considered to be traditional taxonomists, but are intrigued by the enormous wealth of as yet undescribed genomic diversity found in any environmental niche on the planet Earth.
On the other side are taxonomists who strongly believe in a comprehensive description of properties at the genetic and epigenetic level. Despite the two sides, those who want to abolish the way prokaryotic taxonomy has been done since the time of Cohn (1872) and Migula (1894) and those who cling to this concept, this does not mean that either group is on the wrong path. The ‘sequence-based’ scientists educated themselves along with the development of new and faster sequencing and analysis

Where to Now?

protocols that are now even able to reconstruct the genome of single cells or at least of highly related strain consortia (see Chapters 1, 10 and 13). The range of material analysed by the so-called ‘molecular taxonomists’ to name entities extends from non-cellular material (genomes such as metagenome-assembled genomes (MAGs) and single amplified genomes (SAGs)), via shortcuts through the traditional approach (such as culturomics; Lagier et al., 2018), to living cells, to which the Bacteriological Code (Parker et al., 2019) is so far restricted (see Chapter 3). At the moment the articles of the Code place the naming of solely culturable specimens as one of the major cornerstones in bacterial taxonomy, which also include the stability of registering names via the Validation and Notifications Lists (see Chapter 3). The ‘molecular taxonomists’ judge the 150-year-old way of circumscribing prokaryotes as being slow, failing to follow the tempo of the genomic revolution, generating data that are not easily transportable and which are of restricted value in identification. It is apparent from the literature that the majority of new species defined by the ‘polyphasic approach’ (Colwell, 1970), especially environmental and non-medical isolates, are based on the type strain only; while the properties of additional strains found later in other studies are not reported, and so the intraspecific properties of a species are not discernible. There are numerous initiatives to provide sequence data of the type species and, again, best coverage is seen in the bacteria where comprehensive databases are available (see Chapters 11 and 14). The small subunit (SSU) of ribosomal RNA (16S rRNA) is a standard for classification of bacteria (see Chapter 11). Whole-genome sequences are now being rapidly generated and there is a case for a new taxonomy built on this (see Chapter 13).
To support the move to genome-based taxonomy, 3300 archaeal and 320,000 bacterial sequences were made publicly available in May 2020 (https://gold.jgi.doe.gov/, accessed 6 August 2020; www.ncbi.nlm.nih.gov/genome/microbes/, accessed 6 August 2020).

Diverging Developments in Bacterial Classification

The traditional ‘catch-all’ taxonomy has shown its flexible nature (see Chapter 1) as new developments,

with the potential to differentiate between species and that are highly useful in identification, have regularly been incorporated into prokaryotic systematics. Approaches that lost their importance in one or another group of organisms were abolished. Nevertheless, the nature of the taxonomist’s mind appears to accept the addition of novel approaches more easily than to accept the annulling of a feature. This is clearly seen in the addition of a series of genomic properties such as DNA G+C mol%, DNA-DNA hybridization (and more recently by the ANI value determination), gene sequences (mainly of 16S rRNA and housekeeping genes, at least in certain genera) and draft or complete genome sequences. Interestingly, the addition of the latter as a mandatory trait was accepted almost immediately and can only be explained by the acceptance of genome sequences as an important feature, and the ease and low costs of DNA sequence-generation as a basis for bacterial classification. It had been hoped that this would happen, as already expressed in the publication of Wayne et al. (1987). Acceptance of the use of MALDI-TOF for species identification and differentiation (see Chapter 7) was similarly rapid, especially for clinical strains, for which a high number of strains of a species are available. Here the problems lie in the lack of common databases: some are in-house while others are commercial. Unfortunately, a mandatory deposition of spectra into public databases, such as those available for sequences, does not exist. In contrast to the ease with which molecular data on any DNA- and RNA-based genetic material are generated by sophisticated machines, and analysed and annotated by a plethora of algorithms, the polyphasic study of an organism seems to be light years away. 
Changes in growth conditions may alter gene expression and fatty acid composition; some techniques, such as DNA-DNA hybridization, belong to the ‘black-box’ category as the molecular interactions remain obscure; DNA patterns are not transportable; and some techniques, such as the determination of complex peptidoglycan structures, require absolute expert knowledge. Given the obvious advantages of the DNA-based taxonomy, why then are the phenotype-oriented taxonomists afraid to accept changes? First, there is lack of knowledge, as today most taxonomists are over 50 years old and were raised

in ‘schools’ around taxonomy champions not engaged in molecular biology and living in the USA, Germany, the UK, Belgium and France, and so on. Second, for the students of those taxonomy champions, nucleic acid sequences and their analysis were beyond their reach before the 1990s. Third, and probably the main reason, is the argument that the presence of gene sequences is no indication for gene expression (Bisgaard et al., 2019). This argument is not valid because, to identify a microorganism, the presence or absence of a chitinase gene (sequence), for example, is as significant as a phenetic trait for determining the expression of chitin hydrolysis in the laboratory. Fourth, there is the recurring argument that the majority of gene sequences cannot be unambiguously annotated, thus a huge fraction of genes cannot be used for microbial taxonomy. The progressing state of genome annotation is indeed encouraging. Two chapters outline these developments in two areas: the annotation of genes coding for moieties of chemotaxonomic traits (peptidoglycan and polar lipids; see Chapter 9) and of genes coding for metabolic properties (see Chapter 10). Both pillars of the phenetic classification of species have been challenged as outdated, superfluous and time consuming. Nevertheless, chemotaxonomic markers are a valuable support in defining boundaries (at least at the genus level), and metabolic traits provide, now as before, important diagnostic properties. The number of annotations is constantly increasing as more type strains and well-characterized strains are the subject of sequencing projects (see Chapters 15 and 16). Granted, more work should be directed towards the use of genetic approaches to determine gene function, and this work should engage both geneticists and culture collection staff, as the latter have a deep knowledge of metabolic traits of an organism. Reading through the chapters of this book a few issues became very clear. 
Chapter 15 discusses the notion that the polyphasic approach created a conflict in the classification of taxa. Over time it included the phylogenetic assessment of sequences by techniques such as neighbour joining, maximum parsimony and maximum likelihood within the original phenetic framework. ‘Phenetic and phylogenetic thinking still compete with each other regarding the classification of bacteria, with potentially conflicting

and confusing results’ (Colwell, 1970). The use of a pairwise gene sequence similarity value used in almost all recent species descriptions is actually not a phylogenetic but a phenetic approach (Meier-Kolthoff et al., 2014). Another issue raised in Chapter 16 refers to the mode of speciation. According to the idea outlined on the nature of the taxon ‘species’, ‘the species taxa of bacterial systematics do not represent the most newly divergent, ecologically distinct populations of bacteria. This attribute belongs to ecotypes, which are ecologically homogeneous bacterial populations that have the species-like property of cohesion’. However, studies on the presence of ‘ecotypes’, and consequently on the possibility of unravelling the mode of speciation, are not possible as long as the majority of species descriptions are restricted to a single strain, the type strain only (Christensen et al., 2001). The lack of effort by taxonomists to search for more than a single representative of a novel species hinders the credibility of the polyphasic approach. For example, the differentiation of related species based on physiological traits cannot be determined meaningfully as long as the intraspecific diversity remains undetermined. In those cases where multiple strains are available, the species boundary is defined so broadly that it contains multiple ecotypes, excluding ‘the opportunity for a full exploration of the metabolic, physiological, ecological and genomic diversity within the species’ (Chapter 16). Chapters 13 and 16 comment on the finding of molecular delineation of species with the average nucleotide identity (ANI) approach being able to confirm in most examples, a gap in ANI, corresponding to species-like discontinuities among prokaryotic genomes with most closely related species separated by ANI values above 95%. This finding indicates that homologous recombination declines to near zero by 95% ANI in the taxon species. 
On the other hand it shows that bacterial systematists of the mid-20th century fortuitously created a species-level systematics that actually fits an important universal theory of speciation (see Chapter 16). These clear gaps (i.e. sequence discontinuities) were not found to separate higher taxa (Konstantinidis and Tiedje, 2005), which points towards either a lack of understanding of the evolutionary mechanisms for the separation of

ranks between species and phyla, or it confirms the idea that higher taxa are man-made constructs trying to order nature.
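
The 95% ANI discontinuity discussed above can be illustrated with a short sketch. It assumes that pairwise ANI values have already been computed by some external tool; the genome names and ANI values below are hypothetical, and single-linkage grouping is only one possible way of turning a threshold into clusters, chosen here for simplicity rather than because any of the cited studies used it.

```python
from itertools import combinations

ANI_THRESHOLD = 95.0  # percent; the species-level gap discussed in the text


def species_clusters(genomes, ani, threshold=ANI_THRESHOLD):
    """Group genomes whose pairwise ANI meets the threshold (single linkage)."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        # ANI is stored once per pair, so look the pair up in either order.
        value = ani.get((a, b), ani.get((b, a)))
        if value is not None and value >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for g in genomes:
        clusters.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical genomes and ANI values, for illustration only.
genomes = ["strain_A", "strain_B", "strain_C", "SAG_1"]
ani = {
    ("strain_A", "strain_B"): 98.2,
    ("strain_A", "strain_C"): 88.5,
    ("strain_B", "strain_C"): 88.9,
    ("strain_A", "SAG_1"): 91.0,
    ("strain_B", "SAG_1"): 90.7,
    ("strain_C", "SAG_1"): 84.3,
}

clusters = species_clusters(genomes, ani)
for members in clusters:
    print(members)
```

In this made-up data set only strain_A and strain_B share more than 95% ANI, so they form one species-like group, while strain_C and SAG_1 each stand alone. Such a grouping says nothing about ranks above species, for which no comparable gap was found.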

Bacterial Nomenclature in the Future

The chapters covering recent developments in the systematics within the domains of Prokaryota did not conclude on a definite solution to the issue around the future of nomenclature and the consequences for the Code of Nomenclature. It is, however, obvious from several publications (Thompson et al., 2015; Whitman, 2016; Konstantinidis et al., 2017) and the discussions brought forward in the chapters of this book, that the argument for the expansion of type material for the uncultivated majority of prokaryotes (i.e. Candidatus taxa, genome sequences, sequences from metagenomics data) as vouchers for formalized names is convincing. This should be done in a way that is progressive, though not hasty, by starting to validly name the Candidatus status for incompletely described prokaryotes (Murray and Stackebrandt, 1995). As indicated in Chapter 3 the International Committee on Systematics of Prokaryotes (ICSP) is already discussing this option. For this taxon, which could possibly be labelled with a superscript such as ‘TC’ (type of Candidatus), the decision appears to be facilitated by the existence of a living organism which can be analysed for genome and gene sequences, as well as for phenetic properties. Among taxonomists for whom the Code is the ‘holy grail’, there is opposition to the naming of a Candidatus based exclusively on metagenomic data. But let us consider a SAG, separated from the genome sequence of its nearest cultivated species (or Candidatus) by less than 95% ANI, indicating quasi-species status. Why not label it (e.g. with ‘TG’ to indicate type genome), amplify and store its DNA in a public repository (as for a Candidatus and cultured strain) and consequently name it to denote its uniqueness? As long as the nature of the biological material and its origin is recorded, and the genome sequence is deposited and retrievable, research on any aspect at the genetic and epigenetic

level is feasible, thus not only allowing insights into evolution in general but also guiding research into its closest relatives. For one problem, however, the chapters did not provide a solution. The nomenclatural issue of prokaryotes and parts thereof remains; that is, how to expand the Code of Nomenclature to include the level of phyla and Candidatus taxa as well as that of DNA of isolated and environmental origin (MAGs and SAGs) remains unresolved. Alternative solutions have been indicated at the end of Chapter 13 and in a recent publication by a group of leading genome-oriented taxonomists (Murray et al., 2020).

Reference Materials for Mycology

A lack of reliable reference data has been a common concern in all of the fungal contributions. Unlike bacteriology, mycological nomenclature has historically been under the International Code of Nomenclature for Algae, Fungi and Plants (Turland et al., 2018) and so type material has usually been either a dried specimen or an illustration (see Chapter 2). The concept of using a preserved living culture as type material is relatively new and, although resources such as MycoBank and Index Fungorum are available, there is no fungal equivalent of the bacteriologists’ Approved List, and details of original type materials for many species may only be found through original publications and herbarium records. The shortage of reference DNA sequences for verified or type material for fungi was highlighted in Chapters 5 and 12 and recent studies report that less than 30% of species for which sequences are available include an ITS sequence linked to type or ex-type material (Yahr et al., 2016; Chapter 5). This situation is starting to change with a number of current projects. The NCBI collaboration with various mycological centres to obtain ITS barcodes for types has been mentioned (Schoch et al., 2014; Federhen, 2015; see Chapters 5 and 12) and other ITS-based initiatives have been established (see Schoch et al., 2014 and Yahr et al., 2016 for wider reviews). There are also major efforts to

generate whole-fungal-genome sequence data that include the 1000 Fungal Genomes and the Plant and Fungal Trees of Life programmes (Grigoriev et al., 2014; www.kew.org/read-and-watch/completing-tree-of-life-plants-fungi/, accessed 16 October 2020).

Herbarium resources

Given that most of the fungal types are dead voucher specimens, there have been several studies to extract DNA from the samples and sequence it (Thomas et al., 2005) and one such initiative is the Plant and Fungal Trees of Life (PAFTOL) programme at the Royal Botanic Gardens Kew (www.kew.org/read-and-watch/completing-tree-of-life-plants-fungi). The majority of type specimens in fungaria are quite old and their DNA is expected to be degraded. However, the Natural History Museum and herbarium specimens have yielded sequences of value in taxonomy (Bruns et al., 1990; Taylor and Swann, 1994; Forin et al., 2018). Kistenich et al. (2019) used comparative DNA sequence analysis for representative species of lichenized fungi (lichens) and obtained DNA sequence information for 54 of 56 specimens of up to 150 years old. They found that the Ion Torrent sequencing approach outperformed Sanger sequencing with regard to sequencing success and efficiency. Wang et al. (2017) compared different drying methods for recovery of mushroom DNA. Dried specimens from all tested methods yielded sufficient DNA for PCR amplification of the ITS region and a species-specific single-copy gene, although methods were species specific. Among these methods, oven drying at 70°C for 3–4 h seemed the most efficient for preserving field mushroom samples for subsequent molecular work. It would take some effort to extract good-quality DNA from the type species available. The Mycology Collections Portal (MyCoPortal) currently provides access to 3.6 million records from 84 institutions worldwide that hold 118,000 type specimens (mycoportal.org). However, it is uncertain whether good-quality DNA could be recovered from all samples; many researchers report it to be difficult, and often impossible (Hyde et al., 2010). The previous use of various substances as fixatives (such as formaldehyde or

methyl bromide) for treatment of specimens can impede successful DNA extraction, as can some chemicals previously used to control insects in herbaria, but improvements to the methodology are being made (Muñoz-Cadavid et al., 2010; Hykin et al., 2015). Hawksworth and Lücking (2017) reported that there were sequences from 35,000 of the 120,000 known species, so there is still a long way to go if their estimated 2.2–3.8 million species of fungi are to be sequenced. If this is to be achieved it will be essential that data are made more accessible to deal with the several millions of fungi that have yet to be cultured. A coordinated effort is needed to bring these data together, and to store the necessary reference strains and the DNA or environmental samples containing the yet to be cultured (‘dark’) taxa.

Curating the names

A further constraint to obtaining reliable reference data for identification is the continuing discovery of cryptic species among existing species complexes (see Chapter 2). Such changes can have a significant effect on isolations and effectively limit any matches in sequence or spectra databases to data obtained from the type/ex-type material, or from material studied in (or after) the taxonomic change (see Chapters 2 and 5). The clarification of species complexes has wider implications beyond systematics, as many taxonomic changes have occurred since the advent of whole-genome sequencing. Chapter 14 raised this issue in relation to the species names attached to genome sequences of Colletotrichum, but this is only one example. As mentioned in Chapter 14, the description of Penicillium rubens as a distinct species included the historical Wisconsin penicillin production strain, although whole-genome sequences from this strain remain labelled as Penicillium chrysogenum in many reference databases.
Name and species concept changes make it difficult for a non-specialist to determine which reference data or cultures may be appropriate for their particular requirements. Further examples of this are evident in selecting the strains or sequences needed to develop a reference database for an identification scheme, an eco-system reference or a spectral library for MALDI-TOF (as suggested in Chapter 8). In many, but not all,

common species, sequences are available from type material and ex-type strains. These, however, provide only a single representative and, for applications such as MALDI or ecological studies, many strains or sequences may be required. In recent years there have been significant changes in species concepts in many common genera such as Colletotrichum and Penicillium, and these make selecting reference materials difficult. As an example, a strain or sequence labelled as Colletotrichum gloeosporioides before the 2012 revision of that species (Weir et al., 2012) may be an authentic representative of that species, but it could also be a representative of one of the 21 other species identified in that complex, and so could be retrospectively re-labelled C. gloeosporioides complex. Such relabelling does not appear to have occurred to any great extent in strain and sequence databases. As a result, when attempting to select appropriate data for many species, there is an additional requirement on the user to determine, if possible, the date and taxonomy used to determine the species name (as recommended in Chapter 2). In many cases this can be a relatively complex procedure involving cross-referencing strain and sequence numbers in culture collection and sequence databases, together with recent publications, and may in some cases require further DNA sequencing if additional gene regions have been used in the new taxonomy. Such activities may be apparent to systematists, but may not be to other strain/data users. These issues then raise some concerns about the accuracy of fungal names in wider usage. Such examples illustrate the need for some form of oversight and curation of reference data when taxonomic changes are made (see Chapters 5 and 12). What is not clear, however, is how – or even if – this is possible in practice; and, if it could be undertaken, how it could then be replicated across the increasing number of data repositories.
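
The date check described above for records such as Colletotrichum gloeosporioides can be sketched in a few lines. This is an illustration only: the record structure, the accession strings and the revision table are hypothetical, no existing database is known to expose data in this form, and treating every record identified before the revision year as belonging to the complex is a deliberately conservative assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical revision table: species name -> (revision year, label for older records).
# The single entry reflects the 2012 revision cited in the text (Weir et al., 2012).
REVISIONS = {
    "Colletotrichum gloeosporioides": (
        2012,
        "Colletotrichum gloeosporioides complex",
    ),
}


@dataclass
class Record:
    accession: str                  # hypothetical strain or sequence identifier
    name: str                       # species name as stored in the database
    year_identified: Optional[int]  # year the name was applied, if known


def conservative_label(record, revisions=REVISIONS):
    """Return the name to trust for a record, given known taxonomic revisions."""
    revision = revisions.get(record.name)
    if revision is None:
        return record.name  # no revision recorded for this name
    year, complex_label = revision
    # Assumption: an unknown date, or a date before the revision year, means the
    # name may refer to any member of the complex.
    if record.year_identified is None or record.year_identified < year:
        return complex_label
    return record.name


records = [
    Record("EXAMPLE-001", "Colletotrichum gloeosporioides", 2005),
    Record("EXAMPLE-002", "Colletotrichum gloeosporioides", 2015),
    Record("EXAMPLE-003", "Colletotrichum gloeosporioides", None),
]
for record in records:
    print(record.accession, "->", conservative_label(record))
```

A real curation effort would still need the cross-referencing of collection numbers, sequence accessions and publications described above; the sketch only shows where the date of identification enters the decision.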
How does this all impact on our ability to identify microorganisms in our day-to-day life to diagnose disease, get to the bottom of microbial contamination or discover the risks involved in the presence of the unknown organism? Currently, there is a lack of data in the databases used for comparison in most automated systems. Only too often is there a lack of closely matching sequences. Contributing to this is that representative rDNA sequences are not available for all of the described

fungi. Where sequences are available, those from the additional genes that may be required for species level discrimination may be lacking. An additional problem is that, in mycology in particular, many of the reference strains from which the data were originally derived are no longer available for checking past data or for gathering new data. This sets a global challenge to microbiologists, their scientific communities and funders, and the managers of public service culture collections, to devise a systematic and coordinated approach to bridge the reference strain and data gaps. Several projects exist, some mentioned in previous chapters, to sequence the types. As mentioned by several authors here, however, full genome sequences are needed to resolve the new molecular taxonomy, and we need a broader representation of individual species (as well as of types) to determine intraspecific variation. More needs to be done and could be addressed through funded programmes of work and targeted isolation programmes. At its simplest, investments that are made to isolate samples for microbiome analysis should include provision for their characterization and storage, in order to maximize the future use of yet-to-be-discovered species.

Networking Microbial Strain Information

As mentioned earlier, almost all bacterial type strains are available from public service collections but only around 25% of fungal type strains can be accessed (Overmann and Smith, 2017). By far the majority of fungal types are available as dead, dried voucher specimens from the world’s fungaria (Miller and Bates, 2017) and ex-type strains, where available, are usually deposited in public service collections. The gaps in coverage are recognized by several groups, for example in Europe by the Microbial Resource Research Infrastructure (MIRRI), which is being established as a research infrastructure under the European Strategy Forum for Research Infrastructures (www.mirri.org/, accessed October 2020). In an open paper, MIRRI outlined its policy on accession, addressing the acquisition of novel microbial material by microbial domain biological resource centres (mBRCs). MIRRI mBRC partners agreed to a targeted

accession of biological material to broaden the range of genetic resources of high interest to bioindustry and bioscience (zenodo.org/record/47247#.XtDUu6bHzs0). Such coordinated initiatives are essential if a comprehensive coverage of strains and data is to be achieved to underpin taxonomy and research in general. MIRRI’s target is ambitious but focused on the strains of potential use; it will require targeted isolation programmes and the help of the mBRC network of depositors (www.oecd.org/science/emerging-tech/towardsaglobalbiologicalresourcecentrenetwork.htm; see Chapter 6). The Global Biological Resource Centre Network (GBRCN) programme envisaged similar actions to that of the European MIRRI initiative in regions around the world in order to create a global network to achieve common objectives (Fritze et al., 2012). Initiatives have also been established in Asia, with the Asian BRC Network (ABRCN; www.abrcn.net/aboutus.html) under the auspices of the Asian Consortium for the Conservation and Sustainable Use of Microbial Resources (ACM; www.acm-mrc.asia/). Activities in the USA have been funded by the National Science Foundation (NSF), which established the United States Culture Collection Network (USCCN; www.usccn.org/Pages/default.aspx/, all accessed 16 October 2020).

Systematics in the Post-Nagoya Era

Concern that the implementation of the Nagoya Protocol by nations would impede access to type strains caused Asian culture collections to take action. Under the ABRCN Task Force they adopted the Network of International Exchange of Microbes under the ACM (NIEMA) system (Ando et al., 2014). There are 23 member organizations in 13 countries (Cambodia, China, India, Indonesia, Japan, Korea, Laos, Malaysia, Mongolia, Myanmar, Philippines, Thailand and Vietnam) in the ACM. This system was proposed by the ABRCN Task Force to facilitate the exchange of microorganisms in compliance with the CBD and the Nagoya Protocol (www.acm-mrc.asia/TF/MMT.html/, accessed 16 October 2020). It developed the NIEMA Code of Conduct and established a NIEMA Clearing House. Microbial strains are transferred among NIEMA member collections under the Code and distributed to end-users for

non-commercial research use. In the 6 years since this system was proposed, most countries have given facilitated access to genetic resources for non-commercial uses such as taxonomic studies when implementing legislative, administrative or policy measures to enact the Nagoya Protocol. When type strains are accessed from ex situ culture collections they are provided for taxonomic purposes only unless agreed otherwise with the country of origin. The European regulatory guidance provides clarity on these issues, and information on activities that are in (or that are out) of scope of the regulation is given (eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.C_.2016.313.01.0001.01.ENG&toc=OJ:C:2016:313:TOC). If the genetic resource is to be used for testing purposes or as a reference tool, and is not the object of research (e.g. it is the type strain and is only used to confirm the desired features), it falls out of scope of Regulation (EU) No 511/2014. In addition, the handling and storing of biological material and describing its phenotype are also out of scope. Currently 124 countries have legislation and it is wise to check their individual requirements to discover if the work being carried out using type strains falls into scope of their regulations. In Europe, a register of collections has been established and this provides a list of collections that have carried out due diligence on the strains they supply. This ensures the recipient of cultures from them has satisfied access requirements under the regulations. Currently, two microbial collections appear on this register: the Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures, and the French Collection for Plant-associated Bacteria (CIRM-CFBP). The Microbial Resources Research Infrastructure has produced an Access and Benefit Sharing (ABS) Manual to help culture collections fulfil their responsibilities with regard to the Nagoya Protocol (Verkleij et al., 2016).
Of concern are the current discussions on digital sequence information (DSI) by the Conference of the Parties (COP) of the CBD and the Nagoya Protocol (www.cbd.int/dsi-gr/, accessed 16 October 2020). The question is whether DSI should be treated in the same way as genetic resources and be subject to the legal measures of a provider country that implements the Nagoya Protocol (see Chapter 6). There is currently no accepted common definition of DSI but it would include DNA and RNA sequences. COP has established an Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources (AHTEG) to consider any potential implications of the use of digital sequence information on genetic resources. Its findings were published in the document CBD/DSI/AHTEG/2020/1/7 Report of the Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources (www.cbd.int/dsi-gr/). The report proposed options for terminology to describe DSI, and considered the implications of implementing controls and capacity building. Experts acknowledged that some countries are currently regulating DSI, others may be waiting for international consensus on this issue under the Convention and in other forums, and still others have stated that they do not intend to regulate it at all. Nations hope to have a decision on this topic at COP 15 and COP-MOP 4 of the Nagoya Protocol. The meeting was originally scheduled for 15 October 2020 in Kunming, Yunnan, China, but will now be held at a later date. Some views on the significance of DSI and the Nagoya Protocol in lichen systematics have recently been reviewed (see Lücking, 2020), and it is imperative that scientists make sure their national delegations are aware of the implications that the inclusion of DSI could have on genomics and taxonomy.

Conclusion

The chapters of this book represent the status quo of molecular assessment of prokaryotes, fungi and yeast. They outline, then discuss in detail, the recent developments in sequencing approaches to assess the diversity of cultured and not-as-yet-cultured organisms and metagenomes, together with the most widely applied algorithms for gene annotation and the pros and cons of defining taxon ranks. The ideas expressed in these chapters should encourage traditional taxonomists to open their minds to this exciting progress, as well as stimulate molecular taxonomists to generate a platform for solving the problems of mutual discordance. Nevertheless, the current differences of opinion appear to be much stronger than those previously seen in microbial taxonomy. Genomic sequences entered microbial classification and taxonomy decades ago and will continue to shape the taxonomist’s view of systematics at all levels. For the bacteriologist, at least, the ball has been passed to the ICSP, and its members will decide how to deal with the molecular approach in differentiating organisms. After its decision the molecular systematists may need to determine their future. The future for mycology may be more problematic, but a lot can be learned from the bacteriologists. As scientists, we need more data, a coordinated approach, and common standards for generating those data. The editors trust that the appointed experts of all parties will have the wisdom and tolerance to listen to each other, to avoid confrontation and to have the conviction that an open mind will be to the benefit of microbial classification in its entirety.

References

Ando, K., Jin, T.E., Funabiki, R., Wu, L., Thoetkiattikul, H., Lee, J-S., Techapattaraporn, B. and Changthavorn, T. (2014) Network of International Exchange of Microbes under the ACM (NIEMA): A transfer and exchange system of microbes for microbial resource centres for non-commercial purposes according to the CBD and the Nagoya Protocol. Culture Collection 30, 85–96. http://www.jsmrs.jp/journal/No30_2/No30_2_85.pdf
Berbee, M.L. and Taylor, J.W. (2001) Fungal molecular evolution: Gene trees and geologic time. The Mycota: A Comprehensive Treatise on Fungi as Experimental Systems for Basic and Applied Research. Volume VII: Systematics and Evolution, Part B. Springer-Verlag, Berlin, pp. 229–245. https://link.springer.com/chapter/10.1007/978-3-662-10189-6_10
Bisgaard, M., Christensen, H., Clermont, D., Dijkshoorn, L., Janda, J.M., Moore, E.R.B., Nemec, A., Nørskov-Lauritsen, N., Overmann, J. and Reubsaet, F.A.G. (2019) The use of genomic DNA sequences as type material for valid publication of bacterial species names will have severe implications for clinical microbiology and related disciplines. Diagnostic Microbiology and Infectious Disease 95, 102–103. https://doi.org/10.1016/j.diagmicrobio.2019.03.007


P. Bridge et al.

Bruns, T.D., Fogel, R. and Taylor, J.W. (1990) Amplification and sequencing of DNA from fungal herbarium specimens. Mycologia 82, 175–184. https://doi.org/10.2307/3759846
Bruns, T., White, T. and Taylor, J. (1991) Fungal molecular systematics. Annual Review of Ecology and Systematics 22, 525–564. https://doi.org/10.1146/annurev.es.22.110191.002521
Bercovitch, F.B., Berry, P.S., Dagg, A., Deacon, F., Doherty, J.B., Lee, D.E. et al. (2017) How many species of giraffe are there? Current Biology 27, R136–R137. https://doi.org/10.1016/j.cub.2016.12.039
De Bertoldi, M., Lepidi, A.A. and Nuti, M. (1973) Significance of DNA base composition in classification of Humicola and related genera. Transactions of the British Mycological Society 60, 77–85. https://doi.org/10.1016/S0007-1536(73)80062-2
Christensen, H., Bisgaard, M., Frederiksen, W., Mutters, R., Kuhnert, P. and Olsen, J.E. (2001) Is characterization of a single isolate sufficient for valid publication of a new genus or species? Proposal to modify Recommendation 30b of the Bacteriological Code (1990 Revision). International Journal of Systematic and Evolutionary Microbiology 51, 2221–2225. https://doi.org/10.1099/00207713-51-6-2221
Colwell, R.R. (1970) Polyphasic taxonomy of the genus Vibrio: Numerical taxonomy of Vibrio cholerae, Vibrio parahaemolyticus, and related Vibrio species. Journal of Bacteriology 104, 410–433. https://doi.org/10.1128/JB.104.1.410-433.1970
Cohn, F. (1872) Untersuchungen über Bacterien. Beiträge zur Biologie der Pflanzen 1, 127–234.
Crous, P.W. and Boekhout, T. (2018) News. IMA Fungus 9, 47–51. https://doi.org/10.1007/BF03449437
Federhen, S. (2015) Type material in the NCBI Taxonomy Database. Nucleic Acids Research 43 (Database issue), D1086–D1098. https://doi.org/10.1093/nar/gku1127
Forin, N., Nigris, S., Voyron, S., Girlanda, M., Vizzini, A., Casadoro, G. and Baldan, B. (2018) Next generation sequencing of ancient fungal specimens: The Case of the Saccardo Mycological Herbarium. Frontiers in Ecology and Evolution 6. https://doi.org/10.3389/fevo.2018.00129
Fritze, D., Martin, D. and Smith, D. (2012) Final report on the GBRCN Demonstration Project. GBRCN Secretariat, Germany. ISBN 978-3-00-038121-8
Giraud, T., Refrégier, G., Le Gac, M., de Vienne, D.M. and Hood, M.E. (2008) Speciation in fungi. Fungal Genetics and Biology 45, 791–802. https://doi.org/10.1016/j.fgb.2008.02.001
Grigoriev, I.V., Nikitin, R., Haridas, S., Kuo, A., Ohm, R., Otillar, R., Riley, R., Salamov, A., Zhao, X., Korzeniewski, F., Smirnova, T., Nordberg, H., Dubchak, I. and Shabalov, I. (2014) MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic Acids Research 42 (Database issue), D699–D704. https://doi.org/10.1093/nar/gkt1183
Hawksworth, D. and Lücking, R. (2017) Fungal diversity revisited: 2.2 to 3.8 million species. In: Heitman, J., Howlett, B., Crous, P., Stukenbrock, E., James, T. and Gow, N. (eds) The Fungal Kingdom. ASM Press, Washington, DC, pp. 79–95. https://doi.org/10.1128/microbiolspec.FUNK-0052-2016
He, X., Li, Q., Peng, W. et al. (2017) Intra- and inter-isolate variation of ribosomal and protein-coding genes in Pleurotus: implications for molecular identification and phylogeny on fungal groups. BMC Microbiology 17, 139. https://doi.org/10.1186/s12866-017-1046-y
Hykin, S.M., Bi, K. and McGuire, J.A. (2015) Fixing formalin: A method to recover genomic-scale DNA sequence data from formalin-fixed museum specimens using high-throughput sequencing. PLOS ONE 10, e0141579. https://doi.org/10.1371/journal.pone.0141579
Hyde, K.D., Chomnunti, P., Crous, P.W., Groenewald, J.Z., Damm, U., Ko Ko, T.W., Shivas, R.G., Summerell, B.A. and Tan, Y.P. (2010) A case for re-inventory of Australia’s plant pathogens. Persoonia Molecular Phylogeny and Evolution of Fungi 25, 50–60. https://doi.org/10.3767/003158510X548668
Jones, M.D.M., Forn, I., Gadelha, C., Egan, M.J. et al. (2011) Discovery of novel intermediate forms redefines the fungal tree of life. Nature 474, 200–203. https://doi.org/10.1038/nature09984
Kistenich, S., Halvorsen, R., Schrøder-Nielsen, A., Thorbek, L., Timdal, E. and Bendiksby, M. (2019) DNA sequencing historical Lichen specimens. Frontiers in Ecology and Evolution 7. https://doi.org/10.3389/fevo.2019.00005
Konstantinidis, K.T. and Tiedje, J.M. (2005) Towards a genome-based taxonomy for prokaryotes. Journal of Bacteriology 187, 6258–6264. https://doi.org/10.1128/JB.187.18.6258-6264.2005
Konstantinidis, K.T., Rosselló-Móra, R. and Amann, R. (2017) Uncultivated microbes in need of their own taxonomy. The ISME Journal 11, 2399–2406. https://doi.org/10.1038/ismej.2017.113
Lagier, J-C., Dubourg, G., Million, M., Cadoret, F., Bilen, M., Fenollar, F., Levasseur, A., Rolain, J-M., Fournier, P-E. and Raoult, D. (2018) Culturing the human microbiota and culturomics. Nature Reviews Microbiology 16, 540–550. https://doi.org/10.1038/s41579-018-0041-0


Lindner, D.L. and Banik, M.T. (2011) Intragenomic variation in the ITS rDNA region obscures phylogenetic relationships and inflates estimates of operational taxonomic units in genus Laetiporus. Mycologia 103, 731–740. https://doi.org/10.3852/10-331
Lindner, D.L., Carlsen, T., Nilsson, R.H., Davey, M., Schumacher, T. and Kauserud, H. (2013) Employing 454 amplicon pyrosequencing to reveal intragenomic divergence in the internal transcribed spacer rDNA region in fungi. Ecology and Evolution 3, 1751–1764. https://doi.org/10.1002/ece3.586
Lücking, R. (2020) Three challenges to contemporaneous taxonomy from a licheno-mycological perspective. Megataxa 1, 78–103. https://doi.org/10.11646/megataxa.1.1.16
Lücking, R. and Hawksworth, D.L. (2018) Formal description of sequence-based voucherless Fungi: promises and pitfalls, and how to resolve them. IMA Fungus 9, 143–166. https://doi.org/10.5598/imafungus.2018.09.01.09
Lücking, R., Kirk, P.M. and Hawksworth, D.L. (2018) Sequence-based nomenclature: a reply to Thines et al. and Zamora et al. and provisions for an amended proposal “from the floor” to allow DNA sequences as types of names. IMA Fungus 9, 185–198. https://doi.org/10.5598/imafungus.2018.09.01.12
Lücking, R., Aime, M., Robbertse, B., Miller, A., Ariyawansa, H., Aoki, T., Cardinali, G., Crous, P., Druzhinina, I., Geiser, D., Hawksworth, D., Hyde, K., Irinyi, L., Jeewon, R., Johnston, P., Kirk, P., Malosso, E., May, T., Meyer, W. and Schoch, C. (2020) Unambiguous identification of fungi: where do we stand and how accurate and precise is fungal DNA barcoding? IMA Fungus 11, 14. https://doi.org/10.1186/s43008-020-00033-z
May, T.W., Redhead, S.A., Lombard, L. et al. (2018) XI International Mycological Congress: report of Congress action on nomenclature proposals relating to fungi. IMA Fungus 9, xxii–xxvii. https://doi.org/10.1007/BF03449448
Meier-Kolthoff, J.P., Klenk, H.P. and Göker, M. (2014) Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. International Journal of Systematic and Evolutionary Microbiology 64, 352–356. https://doi.org/10.1099/ijs.0.056994-0
Merriam-Webster (1983) Webster’s Ninth New Collegiate Dictionary. Merriam-Webster, Inc, Springfield, Massachusetts, pp. 1563.
Migula, W. (1894) Über ein neues System der Bakterien. In: Arbeiten des Bakteriologischen Institutes Karlsruhe 1, 235–238.
Miller, A.N. and Bates, S.T. (2017) The Mycology Collections Portal (MyCoPortal). IMA Fungus 8, 65–66. https://doi.org/10.1007/BF03449464
Muñoz-Cadavid, C., Rudd, S., Zaki, S.R., Patel, M., Moser, S.A., Brandt, M.E. and Gómez, B.L. (2010) Improving molecular detection of fungal DNA in formalin-fixed paraffin-embedded tissues: Comparison of five tissue DNA extraction methods using panfungal PCR. Journal of Clinical Microbiology 48, 2147–2153. https://doi.org/10.1128/JCM.00459-10
Murray, R.G. and Stackebrandt, E. (1995) Taxonomic note: implementation of the provisional status Candidatus for incompletely described procaryotes. International Journal of Systematic Bacteriology 45, 186–187. https://doi.org/10.1099/00207713-45-1-186
Murray, A.E., Freudenstein, J., Gribaldo, S., Hatzenpichler, R., Hugenholtz, P. et al. (2020) Roadmap for naming uncultivated Archaea and Bacteria. Nature Microbiology 5, 987–994. https://doi.org/10.1038/s41564-020-0733-x
Naranjo-Ortiz, M.A. and Gabaldón, T. (2019) Fungal evolution: diversity, taxonomy and phylogeny of the Fungi. Biological Reviews of the Cambridge Philosophical Society 94, 2101–2137. https://doi.org/10.1111/brv.12550
Nilsson, R.H., Kristiansson, E., Ryberg, M., Hallenberg, N. and Larsson, K.H. (2008) Intraspecific ITS variability in the kingdom fungi as expressed in the international sequence databases and its implications for molecular species identification. Evolutionary Bioinformatics Online 4, 193–201. https://doi.org/10.4137/EBO.S653
OECD (2001) Working Party on Biotechnology: Biological Resource Centres: Underpinning the Future of Life Sciences and Biotechnology.
Overmann, J. and Smith, D. (2017) Microbial Resource centers contribute to bioprospecting of bacteria and filamentous microfungi. In: Paterson, R. and Lima, N. (eds) Bioprospecting – Successes, Potential and Constraints. Springer, pp. 51–79. ISBN: 978-3-319-47933-0 (Print) 978-3-319-47935-4 (Online). http://link.springer.com/book/10.1007/978-3-319-47935-4
Pace, N.R. (1997) A molecular view of microbial diversity and the biosphere. Science 276, 734–740. https://doi.org/10.1126/science.276.5313.734 PMid:9115194


Parker, C.T., Tindall, B.J. and Garrity, G.M. (eds) (2019) International Code of Nomenclature of Prokaryotes. Prokaryotic Code (2008 Revision). International Journal of Systematic and Evolutionary Microbiology 69 (1A), S1–S111.
Petzold, A. and Hassanin, A. (2020) A comparative approach for species delimitation based on multiple methods of multi-locus DNA sequence analysis: A case study of the genus Giraffa (Mammalia, Cetartiodactyla). PLoS ONE 15, e0217956. https://doi.org/10.1371/journal.pone.0217956
Réblová, M., Untereiner, W.A. and Réblová, K. (2013) Novel evolutionary lineages revealed in the Chaetothyriales (Fungi) based on multigene phylogenetic analyses and comparison of ITS secondary structure. PLOS ONE 8, e63547. https://doi.org/10.1371/journal.pone.0063547
Schoch, C.L., Seifert, K.A., Huhndorf, S., Robert, V., Spouge, J.L., Levesque, C.A., Chen, W. and Fungal Barcoding Consortium (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences 109, 6241–6246. https://doi.org/10.1073/pnas.1117018109
Schoch, C.L., Robbertse, B., Robert, V., Vu, D., Cardinali, G., Irinyi, L., Meyer, W., Nilsson, R.H., Hughes, K., Miller, A.N. et al. (2014) Finding needles in haystacks: linking scientific names, reference specimens and molecular data for Fungi. Database: the Journal of Biological Databases and Curation 2014, bau061. https://doi.org/10.1093/database/bau061
Taylor, J.W. (2014) Evolutionary perspectives on human fungal pathogens. Cold Spring Harbor Perspectives in Medicine 5, a019588. https://doi.org/10.1101/cshperspect.a019588
Taylor, J.W. and Swann, E.C. (1994) DNA from herbarium specimens. In: Herrmann, B. and Hummel, S. (eds) Ancient DNA. Springer, New York. https://doi.org/10.1007/978-1-4612-4318-2_11
Thines, M., Crous, P.W., Aime, M.C., Aoki, T., Cai, L., Hyde, K.D., Miller, A.N., Zhang, N. and Stadler, M. (2018) Ten reasons why a sequence-based nomenclature is not useful for fungi anytime soon. IMA Fungus 9, 177–183. https://doi.org/10.5598/imafungus.2018.09.01.11
Thomas, M., Gilbert, P., Bandelt, H-J., Hofreiter, M. and Barnes, I. (2005) Assessing ancient DNA studies. Trends in Ecology and Evolution 20, 541–544. https://doi.org/10.1016/j.tree.2005.07.005
Thompson, C.C., Amaral, G.R., Campeão, M., Edwards, R.A., Polz, M.E. et al. (2015) Microbial taxonomy in the post-genomic era: Rebuilding from scratch? Archives of Microbiology 197, 359–370. https://doi.org/10.1007/s00203-014-1071-2
Turland, N.J., Wiersema, J.H., Barrie, F.R., Greuter, W., Hawksworth, D.L. et al. (eds) (2018) International Code of Nomenclature for algae, fungi, and plants (Shenzhen Code) adopted by the Nineteenth International Botanical Congress Shenzhen, China, July 2017. Regnum Vegetabile 159. Koeltz Botanical Books, Glashütten, Germany. https://doi.org/10.12705/Code.2018
Verkleij, G., Martin, D. and Smith, D. (2016) Microbial Resource Research Infrastructure Best Practice Manual on Access and Benefit Sharing, published online at MIRRI (May 2016) https://zenodo.org/record/284881. https://doi.org/10.5281/zenodo.284881 and ABSCH (November 2016): https://absch.cbd.int/api/v2013/documents/F1C80F1C-1EB7-F02A-CEED-E7D523F17079/attachments/MIRRI%20ABS%20Manual_web.pdf
Větrovský, T., Kolařík, M., Žifčáková, L., Zelenka, T. and Baldrian, P. (2016) The rpb2 gene represents a viable alternative molecular marker for the analysis of environmental fungal communities. Molecular Ecology Resources 16, 388–401. https://doi.org/10.1111/1755-0998.12456
Weir, B.S., Johnston, P.R. and Damm, U. (2012) The Colletotrichum gloeosporioides species complex. Studies in Mycology 73, 115–180. https://doi.org/10.3114/sim0011
Wang, D.Y.C., Kumar, S. and Hedges, S.B. (1999) Divergence time estimates for the early history of animal phyla and the origin of plants, animals and fungi. Proceedings of the Royal Society of London B 266, 163–171. https://doi.org/10.1098/rspb.1999.0617
Wang, S., Liu, Y. and Xu, J. (2017) Comparison of different drying methods for recovery of mushroom DNA. Scientific Reports 7, 3008. https://doi.org/10.1038/s41598-017-03570-7
Wayne, L., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H., Moore, W.E.C., Murray, R.G.E., Stackebrandt, E., Starr, M.P. and Trüper, H.G. (1987) International Committee on Systematic Bacteriology: Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. International Journal of Systematic Bacteriology 37, 463–464. https://doi.org/10.1099/00207713-37-4-463
Whitman, W.B. (2016) Modest proposals to expand the type material for naming of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 66, 2108–2112. https://doi.org/10.1099/ijsem.0.000980 PMid:26902077


Wu, B., Hussain, M., Zhang, W., Stadler, M., Liu, X. and Xiang, M. (2019) Current insights into fungal species diversity and perspective on naming the environmental DNA sequences of fungi. Mycology 10 (3), 127–140. https://doi.org/10.1080/21501203.2019.1614106
Yahr, R., Schoch, C.L. and Dentinger, B.T. (2016) Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 371, 20150336. https://doi.org/10.1098/rstb.2015.0336
Yilmaz, N., Visagie, C., Houbraken, J., Frisvad, J. and Samson, R. (2014) Polyphasic taxonomy of the genus Talaromyces. Studies in Mycology 78, 175–341. https://doi.org/10.1016/j.simyco.2014.08.001
Zamora, J.C., Svensson, M., Kirschner, R. et al. (2018) Considerations and consequences of allowing DNA sequence data as types of fungal taxa. IMA Fungus 9, 167–175. https://doi.org/10.5598/imafungus.2018.09.01.10

Appendix

Abbreviations and Acronyms List

Chapters 1–10

AAI, Amino acid identity
AMF, Arbuscular mycorrhizal fungi
ANI, Average nucleotide identity
BOLD, Barcode of Life Database
CFA, Cellular fatty acids
CHCA, α-cyano-4-hydroxy-cinnamic acid
CoL, Catalogue of Life
CPR, Candidate Phyla Radiation
DHB, di-hydroxy-benzoic acid
DMK, 2-demethylmenaquinone
DORA, Declaration on Research Assessment
DPD, Digital Protologue Database
DSI, Digital Sequence Information
eDNA, DNA extracted from environmental samples
EMbaRC, European Consortium of Microbial Resource Centres
EOL, Encyclopedia of Life
EOSC, European Open Science Cloud
FAIR, Findable, accessible, interoperable, and reusable
FT-ICR MS, Fourier-transform ion cyclotron resonance mass spectrometry
GBIF, Global Biodiversity Information Facility
GC, Gas chromatography
GEBA, Genomic Encyclopedia of Bacteria and Archaea
GGD, Genome-to-genome distance
GNA, Global Names Architecture
GODAN, Global Open Data for Agriculture and Nutrition
GOLD, Genomes OnLine Database
GSD, Global Species Databases
GTDB, Genome Taxonomy Database
GWAS-M, Genome-wide association study for microbes
HTS, High-throughput sequencing
iBOL, International Barcode of Life
ICBN, International Code of Botanical Nomenclature
ICN, International Code of Nomenclature for Algae, Fungi, and Plants
ICNB, International Code of Nomenclature of Bacteria
ICNP, International Code of Nomenclature of Prokaryotes
ICSB, International Committee on Systematic Bacteriology
ICSP, International Committee on Systematics of Prokaryotes
ICZN, International Code of Zoological Nomenclature
IEFPP, Isoelectric focusing protein profiling
IF, Index Fungorum
IGS, Intergenic spacer
IMA, International Mycological Association
IMC, International Mycological Congress
INSDC, International Nucleotide Sequence Database Collaboration
IPNI, International Plant Names Index

© CAB International 2021. Trends in the Systematics of Bacteria and Fungi (eds. P. Bridge, D. Smith and E. Stackebrandt)


IRCC, Internationally recognized certificate of compliance
IRMNG, Interim Register of Marine and Nonmarine Genera
ITIS, Integrated Taxonomic Information System
LBSN, List of Bacterial Names with Standing in Nomenclature
LC/MS, Liquid chromatography/mass spectrometry
LDA, Linear discriminant analysis
LIMS, Laboratory information management system
LPSN, List of Prokaryotic names with Standing in Nomenclature
LYS2, L-2-aminoadipate reductase
MA, Mycolic acid
MAFFT, Multiple alignment using fast Fourier transform
MAGs, Metagenome-assembled genomes
mBRC, Microbial biological resource centre
mOTU, Molecular operational taxonomic units
MSI, Mass Spectrometry Identification
MTA, Material transfer agreement
NIH, National Institutes of Health
NMR, Nuclear magnetic resonance
PoWo, Plants of the World Online
RFLP, Restriction Fragment Length Polymorphisms
RPB1, RNA polymerase II largest subunit
RPB2, RNA polymerase II second largest subunit
SAGs, Single Cell Amplified Genomes
SCG, Single cell genome
SNP, Single nucleotide polymorphism
SSSDs, Single-strain species descriptions
TEF-1α, Translation elongation factor 1α
TNRS, Taxon Name Resolution Service
TOPs, Technical operating procedures
TUB2, Partial β-tubulin 2
WGS, Whole-genome sequencing
WoRMS, World Register of Marine Species

Chapters 11–17

AFTOL, Assembling the Fungal Tree of Life
BSC, Biological species concept
ESC, Ecological species concept
GCPSR, Genealogical Concordance Phylogenetic Species Recognition
LSR, Lineage-specific regions
ME, Minimum evolution
MONA, Monothetic analysis
MSC, Morphological species concept
PSC, Phylogenetic species concept
SGC, Strict genealogical concordance
SGD, Saccharomyces Genome Database
SRA, Sequence Read Archive
TSC, Typological species concept
WES, Whole-exome sequencing

Index

Note: Page references in italics and bold represent figures and tables, respectively.

Access and Benefit-Sharing Clearing-House (ABSCH) database 90 Actinomycetales 144 α-cyano-4-hydroxy-cinnamic acid (CHCA) 123 Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources (AHTEG) 329 Allomyces 23 Amanita phalloides 26 American Type Culture Collection (ATCC) 35 Amoebozoa 21–22 AmpliconNoise 183 amplicon sequence variants (ASVs) DADA 184, 185 de novo OTU clustering 185, 186, 187 denoising 184 error estimation 187 MED 183, 184 meta-analysis 185 N. meningitidis 183 OTU 183, 185 soil samples 185 SSU rRNA 187 strain-level 183 UNOISE 184 Aphelidiomycota 22 arbuscular mycorrhizal fungi (AMF) 121 Archaeorhizomycetes 22 Archives of Microbiology and Current Microbiology 8, 12 Arthrobacter 99 Artificial neural networks 96 Ascomycota 23 asexual reproduction 23 Asian Biological Resource Center Network 57 Aspergillus 121

average amino acid identity (AAI) 169 average nucleotide identity (ANI) 152, 169, 324

Bacillus 31 bacterial classification catch-all taxonomy 323 chemotaxonomic markers 324 DNA-DNA hybridization 323 ecotypes 324 harmful pathogens 1 MALDI-TOF 323 metagenome and microbiome studies 1 microbiology 1 molecular systematists 1 pairwise gene sequence 324 bacterial genome analysis ceteris paribus 256 schools of taxonomy bootstrapping 259, 260 clustering/cluster analysis 257 evolutionary taxonomy 257 genealogies 256 genome-scale data 260 HGT 259, 260 Linnaean system 256 neighbour joining (NJ) algorithm 258 outgroup rooting 259 phenetic, phylogenetic systematics comparison 257 phylogenetic systematics 256 UPGMA 258 taxonomic classifications, conflicts Actinobacteria 264, 266


bacterial genome analysis (continued ) Bacteroidetes 264, 265 DDH 266, 267 G+C content 266 genome-scale data sets 266 Micromonospora 265 phenotypic and genotypic characters 265 phylogenetic analysis 265 polyphasic taxonomic classification 267 whole-genome sequences 255 bacterial nomenclature, future 325 bacterial population, recombination ecological divergence 288 Mayr’s brake 287 recombination rate 288 Synechococcus 288 bacterial species taxa 287 bacterial systematics acid-fast bacteria 94 DNA/DNA 94 gram stain 93 MALDI-TOF MS 94–95 protein 94 bacterial taxonomic assessment arithmetic growth 11 bacterial properties 4 Bergey’s Manual of Systematic Bacteriology 6–7 bioinformatics requirements 12 chemotaxonomy 5 cholera disease 4 classification 7 cultural properties 4 descriptions 7 early era 2, 3 electron microscopy 4 genomic content 13 genomic data 8 high-throughput approaches 11 influence 12 innovations AAI 9 ANI values 9 biotechnological applications 10 classification system 10 culturomics approach 10 DNA fragments 9 genomic taxonomy approach 9 genotyping 9 high-throughput sequencing technologies 9 in silico analyses 9 metabolomics 10 molecular ecology 9 molecular phylogenies 9 phylotaxonomy 10 prokaryotes 8

ribosomal proteins 10 sequencing approaches and bioinformatic tools 10 16S rRNA gene 8, 10 strains 10 microbial diversity 12 microbial systematics and taxonomy 4 molecular approaches 5 physiological properties and habitat 4 polyphasic taxonomy 5 prokaryotes 5–6 reconciliation 6 ribosomal RNA sequences 12 species descriptions 11 strains 4 transformation experiments 4 Vibrio-shaped bacterium 2 bacterial taxonomy demarcates 16S rRNA gene sequence 283 ANI 284 DNA sequencing 283 high-throughput sequencing 285 multilocus sequence analysis 284 polyphasic approach 284 umwelt 283 whole-genome sequencing 284 bacterial test standard (BTS) 125 Bacteriological Code 7, 11, 13 Bacteroides 93 Basidiomycota 23 Beauveria 71–72, 74, 78 Belgian Coordinated Collection of Microorganisms (BCCM) 35 Betaproteobacteria 153 Biological resource centres (BRC) 56 biological species concept (BSC) 301, 302 Biomarkers 103 BioNumerics 99 Blastocladiomycota 23 Bruker Biotyper systems 100, 133 Burkholderia cepacia 110

candidate phyla radiation (CPR) 168 Candidatus 12, 39 Carry forward/incomplete extension (CAFIE) errors 77 CC data management systems (CCMS) 85–86 cellular fatty acids (CFA) 146 cell wall components electron microscopy 142 microorganisms 142, 143 paper chromatography 142 peptidoglycan 143 ‘stem peptide’, 143 sugars 143

CheckM software 188 chemotaxonomy, 5 application 147–149 automated systems 157 biomarkers 157 cell constituents 141 chemical constituents 142 databases 157 DNA sequences 153 enzymes 155 fatty acid products 155 genes and biochemical pathways 157 genome sequence 153 genomic information 152 genomics 149 in silico analysis 157 interpretation of data 149 Intrasporangiaceae 149, 151 KEGG database 153 ligases 152 microbial diversity 141 microbial phenotype 142 microorganisms 149 molecular and genotypic methods 142 morphological and physiological characters 142 online tools 152 peptidoglycan 152 phosphatidylserine 153 phylogenetic tree 149, 150, 153, 154 polar lipid analysis 153, 155, 156 prokaryotes 142 properties 142 proteomics 149 16S rRNA gene sequences 152 systematics 141 taxonomy 141 traditional polyphasic approach 141 2D-TLC 149 Chrysogena 78 Chytridiomycota 23 Clostridium difficile 31, 97 Code of Nomenclature 325 coffee berry disease (CBD) 243, 244 Colletotrichum spp. 78, 128, 129 ascomycete fungus 305 classification problems 305 description 304 multi-locus phylogenies 305 species complexes 305 Common Access to Biological Resources and Information (CABRI) 59 Cordyceps 74 Corynebacterium 143, 157 cryopreservation 62 culture-free methods 13 Cutibacterium acnes 100, 102 cyanobacteria 40


Cyanobacteria Phylum 172–174, 173 cyanophyta 40

dark taxa cluster techniques 205 defined 204 DNA sequence data 205 ITS 205 Sequence Read Archive (SRA) 204 sequence-based species names 205 data resources CCMS 85–86 clients 83 collaboration and coordination mechanisms 84 collections 91 culture collections (CC) 83, 84, 84 fungal/bacterial culture 84 genomic and bioinformatic data 84 handling of data 84 herbaria 83 Index Fungorum 83 MycoBank 83 next-generation sequencing 84 reliable and useful data 86, 87 standards and open access 89, 89–90 strain 83 supporting fungal taxonomy 86, 88–89 WDCM 84 Declaration on Research Assessment (DORA) 12 Deutsche Sammlung von Mikroorganismen und Zellkulturen (DSMZ) 36 D-glutamic acid 143 diaminopimelic acid (DAP) 143 digital object identifier (DOI) 12 Digital Protologue Database (DPD) 12 Digital sequence information (DSI) 90 2,5-di-hydroxy-benzoic acid (DHB) matrix 123 divisive amplicon denoising algorithm (DADA) 184 DNA barcode, fungal identification definition 201 loci and primers 201, 202 translation elongation factor 1 α (TEF1 α) 201 DNA-DNA hybridization (DDH) 8, 218, 266, 283

ecotypes bacterial species taxon 289 bacterial systematics Escherichia coli strains 289 pitfalls 290 population size 290 species-like properties 291 microbial ecologists 289 species-like properties 289 electron microscopy 4 Enteroaggregative E. coli (EAEC) 111


Enterobacteriaceae 146, 153 Enterohemorrhagic (EHEC) 111 Escherichia coli 125 Eumycetozoa 21–22 European Strategy Forum on Research Infrastructures (ESFRI) 57 Exiguobacterium oxidotolerans 98 Exophiala 78

filamentous fungi 198 Findable accessible interoperable and reusable (FAIR) 134 Flavobacterium 98 Fluorescence In Situ Hybridization (FISH) 226 force of cohesion, bacteria adaptive introgression 291 periodic selection 288–289 recombination 291 species taxa, limitation ANI 292, 293 ecological constraints, recombination 293 genome-wide cohesion 292 homologous recombination 292, 293 microbial ecologists 292 rate of recombination 293 species-like property 293 Fouriertransform ion cyclotron resonance (FT-ICR MS) 10 fungal identification asexual reproduction 126 Basidiomycete species 121 biochemical features 119 biological material 121 biomarkers 120, 124 biomass 122 biotyper system 133 calibration 124–125 CHCA and DHB matrices 124 chemical compounds 125 classification 119 commercial database 132 conventional methods 119 cryptic and dimorphic 130, 131, 131 dark taxa 204–205 DNA barcode 200–201 DNA polymorphisms 126 DNA sequence-based 119 energy transfer 123 filamentous fungi 121 genetic variability 126 human pathogens 126 human skin 126 ITS region 199–200, 207 laser emission 123 laser pulse 124 limitations 121

applications 126 lipid fingerprinting 129, 130 lipids 128 photoexcitation/pooling mechanism 128 pigments 128 macro- and micro-morphology analyses 119 mass spectra 133 mass spectrometry technique 122 mass-to-charge ratio (m/z) scale 124 matrices 121 matrix functions 122 matrix solution 120, 123 modern microbial taxonomy 120 molecular analysis 126 molecular biology analysis 121 molecular mass 123, 124 molecular methods 120 mycelium 122 phenotypic analyses 125 polyphasic approaches 125 proteomes 126 rapid and reliable sequencing analysis 120 reference sequence libraries 201–204 TEF1 α locus 208 soft ionization 123 spectra 120 spores 122 strain 123 taxonomic schemes 125 yeast and filamentous 121 fungal sequence data Beauveria 71–72 clustering 74 complex species 78 cut-off values 76–77 duplicated sequences 77 fishing 72, 73 fungal DNA sequences 69 ITS 71, 75–76 limitations 75 microbial systematics 69 mislabelled sequences 75 molecular techniques 69 multi-gene sets 76 names and labels 77–78 outcome 74–75 rRNA gene 70 fungal species concepts DNA sequences 322 ITS sequences 322 molecular taxonomists 323 polyphasic approach 323 sequence-based scientists 322 whole-genome sequences 323 fungal systematics Aspergillus Aflatoxins 241

invasive pulmonary aspergillosis (IPA) 241 Colletotrichum anthracnose 243 C. siamense 243 coffee berry disease (CBD) 243, 244 pathogenicity-related genes 244 Fusarium core and accessory regions 243 F. oxysporum 242, 243 LS regions 242 species concepts 242 Penicillium 240, 241, 245 Saccharomyces 240 fungal taxonomy and nomenclature Aspergillus 207 Basidiomycota 206 cryptic species 206 Fusarium 207 ITS, LSU 206 reference sites 207 sequence-based fungal identification 205 species complex 206 fungi bioinformatic pipeline 25 challenges 21 classification of phyla 22 conservation 26 dichotomous keys 26–27 diversity 21, 25 factors 24 functional genes coding 22 geographic distribution 25 heterokont zoospores 21 hosts 26 isolates/collections 24 lineages 22 metabarcoding 25 microhabitats 26 morphological and biochemical characters 21, 22 morphological features 25 mycological community 27 mycoparasites 25 pathogenic fungi 27 phylogenetic classification 25 processes 24, 26 reference collection databases 26 sexual and asexual 25 tracking 26 trophic modes 23 type information 26 type specimen 24 unusual morphological characters 27 user groups 26 utility of sequence data 27 fungi genome sequencing comparative genomics GWAS 238

lineage-specific regions (LSR) 237 population genomics 237 QTL mapping 238 WGS 237 genetic association studies 238 GWASs 238 methodology (see genome sequencing methods) NGS 232–234 Penicillin 238, 245 fungi, species concepts difficulties 302 Fusarium description 305 F. guttiforme 123, 123 F. oxysporum 306 forma specialis 306, 307 forma specialis lycopersici 307 genomic compartmentalization 307 horizontal chromosome transfer 307 pathogenic races 306 pathogenicity testing 308 SIX genes 307 translation elongation factor 1- α (TEF) 306 Fusobacterium 93

Galleria mellonella 132 Gammaproteobacteria 153 GenBank 232, 245 Genealogical Concordance Phylogenetic Species Recognition (GCPSR) 233 genome-based classification advantages 218 Candidatus names 224, 226 limitations Blastn method 220 DNA level 220 genome-binning methods 220 standardization 220 uncultivated taxa Candidatus names 226 genome sequence 225 MAG/SAG 225 metagenomic methods 224 NanoSIMS 225 neotype strain 225 unknown query genome 226 genome-based species taxonomy genomes demarcating 285–286 MAG 285 novel species phenotype 286 type strain, genome sequence 285 genome classification resources AAI 222, 223, 224 ANI 222, 223, 224 Bacillus anthracis 221, 222 GTDB 221, 222 MiGA 221

genome classification resources (continued) ProGenomes 221 protein-coding genes 223 genome sequencing methods data analysis data generation 236 interpretation 236 quality control 236 visualization 236 sequencing technologies de novo sequencing 235 dideoxy chain-termination 234 epigenetics 235–236 genomic variation and mutation detection 236 RNA-Seq 235 third-generation sequencing 235 Genome sequence 111 genome taxonomy database (GTDB) 9, 188, 270 genome-to-genome distance (GGD) 169 Genome-wide association studies (GWAS) 286 genomic data 8 Genomic Encyclopaedia of Archaea and Bacteria (GEBA) 255 genomic microbial taxonomy bacterium 170 biodiversity 172 bioinformatics and analytical work 172 cell and colony morphology 172 clinical microbiology 169 Cyanobacteria Phylum 172–174, 173 definition 169 gene sequences 172 genome signatures 169 metagenomes 172 metagenomics 172 microbial ecology 169 open-access catalogue 171, 169 phylogenetic allocation 169 phylogenetic analysis 171 picocyanobacterial genomes 172 under-studied groups 172 whole-genome sequences 172 genomic plasticity 237 genomics 231 Global Biological Resource Centre Network (GBRCN) programme 328 Global Catalogue of Microorganisms (GCM) 83 Global Open Data for Agriculture and Nutrition (GODAN) model 89 Glomeromycota 23 glycolic acid 143 Golden Age of Microbiology 4

Haemolytic uremic syndrome (HUS) 111 Haloferax volcanii 38

high-containment laboratory 111 homoplasy 263, 264 Horizontal gene transfer (HGT) 259 Hutchinson’s aspiration 283 hyphae 23 Hyphochytriomycota 21 Hypocrea 25 Hypocreales 74

Internal Transcribed Spacer (ITS) 27, 70, 125 International Code for Botanical Nomenclature (ICBN) 40 International Code of Nomenclature of Prokaryotes (ICNP) 7, 30 International Committee on Systematics for Prokaryotes (ICSP) 13–14 International Journal of Systematic and Evolutionary Microbiology (IJSEM) 7 International Mycological Congress (IMC) 27 International Nucleotide Sequence Database Collaboration (INSDC) 12, 70 Isoelectric focusing protein profiling (IEF-PP) 94 ITS region, fungal identification definition 199 DNA barcoding 200 group-specific thresholds 200 intragenomic variation 200 species-rich genera 200 yeast identification 199

Kauffmann–White method 96

Laboratory information management system (LIMS) 85 Laboulbeniomycetes 23 Labyrinthulomycota 21 Lactobacillus 99, 100 Lawsonia 31 lichenicolous fungi 23 Lichenized and Non-Lichenized Ascomycetes (LIAS) 27 Linnaeus binomial nomenclature 320 lipids archaeal 146 bacteria 145, 146 chemotaxonomic markers 145 electron transport mechanisms 145 isoprenoid quinones 145 isoprenyl units 145 microorganisms 145 naphthoquinones 145 polar 145 polyunsaturated forms 146 quinone ring system 145 respiratory lipoquinones 145 substrate-level phosphorylation 145

liquid chromatography/mass spectrometry (LC/MS) 146 bacterial phylogenetics 109 bacterial species 110 data acquisition and bioinformatics pipeline 109–110 infectious agents 110 pathogenic E. coli 110 peptide biomarkers 110 peptide/protein analysis 110 taxonomic implications 110 List of Bacterial Names with Standing in Nomenclature (LBSN) 35–36 List of Prokaryotic Names with Standing in Nomenclature (LPSN) 11, 35 long-read sequencing protocols 193 lysine 143

MAGs taxonomy ANI 189 CheckM software 188 genetic distance 189 GTDB 188, 189 GTDB-Tk 190, 191 horizontal gene transfer (HGT) events 190 NCBI 189 RED score 190 single-copy marker genes 188 SSU rRNA 187 Malassezia 133 MASH 189 Mass Spectrometry Identification (MSI) Platform 134 mass spectrometry (MS) 146 material transfer agreement (MTA) 59 matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) 10, 61, 85, 104, 108, 321 advantage 106 applications 107 barriers 100, 96 biomarker 109 clinical laboratory 106 clinical microbiology application 95–96 Exiguobacterium aurantiacum 95 Gram-positive and Gram-negative cells 97 Gram stain 97 mass spectral database 97 microbiology 97 pathogens 97 PCR-based products 96 ProteinChips 96 Sequenom GmbH Hamburg 96 Staphylococcus aureus 95 Cutibacterium acnes 100, 102 dendrograms 127 design features 107

diversity 99 genetic variants 99 high-resolution forms 109 limitations 107 mass spectral typing system 99 MS companies 105–106 online database 107 proteotypes 102, 103, 104, 105 resolution 99 microbiology 94–95 non-clinical laboratory 97–99 role 97–99 whole-genome sequencing 104, 105 metabarcoding MAGs GTDB 192 long-read sequencing 192 retrotransposons 191 SILVA 192 single marker gene 192 SSU rRNA marker gene 190 OTU 180–181 sequenced-based clustering 180 SSU rRNA 180 metabolism 13 metabolomics 10 Metagenome assembled genomes (MAGs) 13, 180, 285 metagenomic approaches 13 metagenomics defined 239 DNA sequencing 239 ITS 239, 240 sequence-based 239 16S rRNA 239 metaproteomics 13 Metarhizium acridum 77 metatranscriptomics 13 Metazoa 21 Methanobacteriales 143 Methanopyrus 143 microbial diversity 12 microbial domain biological resource centres (mBRC) 55, 83, 327 microbial genomic taxonomy in silico phenotyping 170–171 nomenclatural rules 169 phenotypic characterization 168 polyphasic approach 168 prokaryotic taxonomy 168 rules 168 tree of life 168 Microbial Identification System (MIS) 146 Microbial Resource Research Infrastructure (MIRRI) 57, 83, 327 microbial systematics 14, 256 Micrococcus radiodurans 143

microorganisms 2 approved and rejected names 43 axenic cultures 32 Candidatus 39 cryptic diversity 43–44 CyanoDB 41 databases 44–45 data standards 44–45 digital resources 40–42 descriptive data 46–48 names 45–46 sequence-related databases 48–49 taxa 46 eDNA 44 electronic journals 32 fungi 30 ICN 42 IJSB/IJSEM 34–35 LBSN 35–36 MycoBank 47 NamesforLife 36–37, 37–38 names of fungi 40–42 nomenclatural acts 43 nomenclature 30, 31 pleomorphic life cycles 43 PNU 36 polyphasic approach 30 prokaryotes 30 quality control 63 rules of this Code 31–32 strains 33 taxonomic categories 32 minimum entropy decomposition (MED) 183 molecular methods 179 molecular operational taxonomic units (mOTUs) 25, 70 morphological identification 198 Mortierella 70 Mucoromycota 23 Multilocus enzyme electrophoresis (MLEE) 94 Multilocus sequence analysis (MLSA) 8, 125, 218 Multilocus sequence typing (MLST) 94, 204 Multiple alignment using fast Fourier transform (MAFFT) 72 Mutually agreed terms (MAT) 90 Mycobacterium smegmatis 144 Mycobacterium tuberculosis 110 Mycolic acids (MAs) 147 mycology reference material curating names C. gloeosporioides 327 eco-system reference 326 full-genome sequences 327 name and species concept 326 reference strains 327 herbarium resources 325 Index Fungorum 325 MycoBank 325

mycology systematics MALDI-TOF 321 Molecular identification 321 phenotypic characters 321 mycorrhizal fungi 23

N-acetylglucosamine acid 152 N-acetylmuramic acid 152 Nagoya Protocol ABRCN Task Force 328 Access and Benefit Sharing (ABS) Manual 328 AHTEG 329 digital sequence information (DSI) 328 genetic resource 328 Network of International Exchange of Microbes under the ACM (NIEMA) system 328 NamesforLife 36–37 National Centre for Biotechnology Information (NCBI) 70–71 National Institute of Health (NIH) 134 Neisseria meningitidis 183 Neocallimastigomycetes 23 New Microbes New Infections 12 next-generation sequencing (NGS) GCPSR 233 genomics 234 Pyricularia 234 species boundaries 233 species concept 232–233 species criteria 233 Nuclear magnetic resonance (NMR) 146 nucleic acid hybridization 5

Oomycota 21 operational taxonomic units (OTUs) 180 advanced Illumina sequencing 182 closed reference 181 de novo clustering 182 DNA-DNA hybridization (DDH) 181 PCR/sequencing errors 183 SSU rRNA 181, 182 Organisation for Economic Co-operation and Development (OECD) 56

Penicillium chrysogenum 25, 77, 78 Peptides 111 peptidoglycans 143, 144 percentages of conserved proteins (POCP) 269 phylogenetic species concept (PSC) coalescent-based species delimitation (CBD) 303 definition 302 molecular data 302 PhyloCode 303 strict genealogical concordance (SGC) 303

phylogenies 22 Phytophthora species 128 pineapple pathogen 128 Plantae 21 ploidy 23 polyamines 147 polyphasic approach 169, 282 polyphasic taxonomy character and phylogenetic tree 263 genotypic data 261 phenotypic apomorphies 261 phylogenetic data 262 phylogenetically coherent group 262 plesiomorphies 264 16S rRNA gene 261, 264 16S signatures 262 Turicella 263 weight, decrease 262 prior informed consent (PIC) 60 prokaryotes 5–6 Prokaryotic Nomenclature Up-to-Date (PNU) 36 Proteus 31 Pseudomonadaceae 4 Pseudomonas aeruginosa 99, 101, 110 Pseudomonas costantinii 98

quantitative trait locus (QTL) mapping 237, 238 quality control procedures 59

radio-frequency identification (RFID) 61 reference sequence libraries 27 Barcode of Life Database 202 Centraalbureau voor Schimmelcultures (CBS) 204 fungal DNA barcoding 201 INSDC ITS sequences 201 ISHAM-AM 204 ISHAM-ITS reference database 204 RefSeq Targeted Loci Project 202 sequence-based identification database 201, 203 web-based UNITE database 201 reference strains algae/fungi 55 application of techniques 56 bacteriology 55 biological material 56 collection/institution 55 crop production 56 DNA sample preparation 59 environment to laboratory 57–58 ‘ex-type species’ 55 food quality 56 MBRC management 64–66 molecular techniques 56 nomenclatural types 55 non-culture assessment 57

preservation techniques 61–63 prokaryotes 55–56 quality management 57 sample acquisition and authentication 59–61 storing 59 testing stability 63–64 type specimens 56 valid description 56 Rhizoctonia solani anastomosis groups (AGs) 303 description 303 hyphal cells 303, 304 internal transcribed spacer (ITS) sequences 304 Rozella allomycis 23

SARAMIS system 133 sequence-based classification 199 sequence-based identification 199 single cell amplified genomes (SAGs) 13 single-nucleotide polymorphism (SNP) 96 single-strain species descriptions (SSSD) 11 16S ribosomal RNA (rRNA) gene 108, 217, 218 small subunit (SSU) rRNA 179 sodium dodecyl sulphate-polyacrylamide gel electrophoresis (SDS-PAGE) 94 Spirillaceae 4 spitzenkörper 22 Sporothrix brasiliensis 132 Sporothrix species 120–121 standard operating procedures (SOPs) 64 Staphylococcus aureus 99, 102 Stenotrophomonas 98 Surface Enhanced Laser Desorption/Ionisation Time of Flight Mass Spectrometry (SELDI-TOF MS) 96, 105 Systematic and Applied Microbiology 8

Talaromyces marneffei 132 taxonomic ranks empirical values 270 fossil record 268 GTDB 270, 271 Linnaean system 268 Merriam-Webster Learner’s Dictionary 269 monotypic taxa 268 Mycobacterium 271 non-monophyletic taxa 269, 271 phylogenetic trees 268 POCP 269 synapomorphies 271 time banding 268 technical operating procedures (TOPs) 64 temperature measurement 62 Thermoanaerobaculum aquaticum 153 Thermo Fisher Orbitrap 103

threo-3-hydroxyglutamic acid 143 Top-down proteomics 112, 112 Traitar algorithm 286 Trichoderma asperellum 25, 126 Trichophyton rubrum 125 two-dimensional thin-layer chromatography (2D-TLC) 146 typological (phenetic) species concept (TSC) 301

unweighted pair group method by average (UPGMA) 258

variable number tandem repeat (VNTR) 99 Verticillium branched conidiophores 308 description 308 hyaline conidia 308 intraspecific diversity D pathotype 311, 312 non-defoliating (ND) pathotype 311, 312 pathogenic races 312 pathogenic variation 311 V. dahliae 311, 312 VCGs 311

VdDf5, VdDf6 genes 312 VW diseases 312 redefined genus ascomycete fungi 308 lineage-specific (LS) regions 310 microsclerotia 309 resting structures 309 V. albo-atrum 309, 310 V. dahliae 309, 309, 310 V. longisporum 310 VW diseases 311 Verticillium wilts 308 Vibrio cholerae 2 VITEK MS systems 133

Whole-genome sequencing (WGS) 93, 169 aligned fraction (AF) 220 ANI 219 average nucleotide identity (ANI) 219, 220 DDH 219 MiGA webserver 220 Mycobacterium tuberculosis 219 sequence gap 219 16S rRNA gene 218, 220 World Federation for Culture Collections (WFCC) 57