Viral Fitness and Evolution: Population Dynamics and Adaptive Mechanisms 3031156390, 9783031156397

This book unifies general concepts of plant and animal virus evolution and covers a broad range of topics related to the


English Pages 349 [350] Year 2023

Table of contents :
Foreword
Contents
Virus Evolution on Fitness Landscapes
1 Introduction
2 Landscape Models and Landscape Features
2.1 Classes of Landscapes
2.2 Abstract Rugged Landscapes
2.3 Additivity, Epistasis, and Elementary Landscapes
2.4 Kauffman's NK Model
2.5 Neutrality
2.6 Holey Landscapes
3 The RNA Model
3.1 RNA Secondary Structures
3.2 Counting Structures
3.3 Minimal Free Energy Structures
3.4 Neutral Networks and Compatible Sequences
3.5 Neutral Networks and Random Graphs
3.6 Evolution and Neutral Networks
3.7 Shape Space Covering
3.8 Suboptimal Structures and Compatibility
3.9 Kinetic Effects of RNA Folding
4 Topology of Sequence and Shape Spaces
4.1 Accessibility
4.2 Evolution in silico and Discontinuous Transitions
4.3 Accessibility and the Topology of Phenotype Space
4.4 Coarse Graining and Quotients
5 Quasispecies—Populations in Sequence Space
5.1 Replication and Selection
5.2 Replication, Mutation and Selection
5.3 Quasispecies on Uniform Class Landscapes
5.4 Simple Fitness Landscapes
5.5 Realistic Random Landscapes
5.6 Finite Population Size
6 Empirical Landscapes
6.1 Estimating Landscapes Parameters
6.2 Local Inference of Landscapes
7 Conclusions and Outlook
References
Viral Fitness Landscapes Based on Self-organizing Maps
1 Introduction
2 Fitness Landscapes
3 Self-organizing Maps
4 Real Fitness Landscapes
4.1 HIV-1 Biological Clones
4.2 SOM-Based 3D Fitness Landscape from Complete HIV-1 RNA Sequences, Using Experimentally Determined Fitness Values
4.3 SOM-Based 3D Fitness Landscape from HIV-1 RNA Sequences of the V1-V2 Region in Env Gene, Based on Experimentally Determined Fitness Values. Analysis of Related Variants
5 Fitness Landscapes Based on Haplotype Frequencies
5.1 HCV Populations. Ultra-Deep HCV RNA Sequencing
5.2 SOM-Based 3D Fitness Landscape Derived from Haplotype Frequencies
6 Discussion
References
Virus Evolution Faced to Multiple Host Targets: The Potyvirus—Pepper Case Study
1 Introduction
2 A Plethora of Recessive Resistances Mediated by eIF4E Alleles with Contrasted Spectrum of Action and Durability: A Model of Plant-Virus Co-evolution
3 Quantitative Resistance as a Mixture of Virus-Specific and Generic Effects
4 A Hypothetical Scenario for Evolution of Complex Resistance Systems
5 Conclusion
References
The Role of Extensive Recombination in the Evolution of Geminiviruses
1 Introduction
2 The Family Geminiviridae
2.1 General Features, Classification, Genome Organization and Replication
2.2 DNA Satellites Associated with Geminiviruses
3 Recombination in the Family Geminiviridae: Occurrence and Patterns
4 Experimental Evolution of Recombination
5 Recombination Events with Evolutionary and Pathological Implications
5.1 Tomato Yellow Leaf Curl Viruses in the Western Mediterranean Basin
5.2 Cassava Mosaic Viruses in Sub-Saharan Africa
5.3 Cotton Leaf Curl Viruses in Pakistan
6 Recombination Events Involving DNA Satellites
6.1 Recombination Between Geminiviruses and DNA Satellites
6.2 Recombination Between DNA Satellites
7 Conclusion
References
Plant Virus Adaptation to New Hosts: A Multi-scale Approach
1 The Multi-scale Nature of Virus Evolution
1.1 Challenges for Modeling Multi-scale Evolving Systems
2 The Pervasiveness of Fitness Trade-Offs
3 The Lowest Level: Evolution Within Cells
4 Cell-To-Cell Transmission and the Colonization of Different Tissues
4.1 The Cellular Contagion Rate and the Spread of Infection
4.2 MOI and the Effective Number of Infectious Units Per Cell
5 Beyond the Individual Host
5.1 Plant Virus Spread and the Basic Reproductive Number
5.2 Constraints Associated to Horizontal and Vertical Transmission Modes
5.3 A Trade-Off Between Within-Host Accumulation and Virulence
5.4 Transmission Bottlenecks Between Hosts
6 Evolutionary Consequences of Host Heterogeneity in the Landscape
6.1 The Complexity of Virus-Host Infection Networks
7 An Integrative New Paradigm: The Multilayer Network
8 Concluding Remarks
References
Viral Fitness, Population Complexity, Host Interactions, and Resistance to Antiviral Agents
1 Fitness Concept and Its Application to Viruses
2 Experimental Fitness Measurements in Cell Culture
3 Experimental Fitness Measurements in vivo
4 Limitations of Fitness Determinations
5 Overview of Fitness Landscapes for Viruses
5.1 Fitness and Virus Propagation Regimen
5.2 Intra-mutant Spectrum Fitness Profiles
6 Fitness in the Development of Antiviral Resistance
6.1 Fitness Cost of Escape Mutants
6.2 High Fitness as Promoter of Antiviral Resistance
7 Mechanisms of Antiviral Resistance Alternative to Direct Selection of Escape Mutants
8 Lessons for COVID-19
9 Conclusions and Prospects
References
Mechanisms and Consequences of Genetic Variation in Hepatitis C Virus (HCV)
1 HCV Introduction
1.1 HCV Impact on Human Health
1.2 Molecular Biology of HCV
1.3 HCV Phylogeny
2 Mechanisms of Genetic Diversity
2.1 Mutation
2.2 Recombination
2.3 Impact of Genetic Variability
3 Final Considerations
References
Mammarenavirus Genetic Diversity and Its Biological Implications
1 Arenaviridae: History and Current Taxonomy
2 Mammarenavirus Impact on Virology and Human Health
2.1 Mammarenavirus Impact on Human Health
2.2 Mammarenavirus as Highly Tractable Experimental Systems for the Investigation of Virus-Host Interactions
3 Molecular and Cell Biology of Mammarenaviruses
3.1 Mammarenavirus Genome Organization
3.2 Mammarenavirus Life Cycle
4 Mammarenavirus Phylogenetic Relationships
5 Mechanisms of Mammarenavirus Genetic Diversity
5.1 Mammarenavirus Mutation Frequencies
5.2 RNA Recombination in Mammarenaviruses
5.3 Mammarenavirus Genomic Reassortments
6 Mammarenavirus Origin and Geographic Distribution
7 Co-evolution of Mammarenaviruses and Their Natural Reservoirs: Intra- versus Inter-host Genetic Variation of Mammarenaviruses
8 Contribution of Genetic Variability to Mammarenavirus Pathogenesis
8.1 Contribution of Viral Quasispecies to Arenavirus Pathogenesis
8.2 Selection of Immunosuppressive Variants During LCMV Persistence
8.3 LCMV Variants and Growth Hormone Deficiency Syndrome
9 Lassa Virus: Origin, Evolution, and Contribution of Genetic Variability to Detection and Pathogenesis
10 Implications of Mammarenavirus Genetic Variability for the Development of Vaccines and Antiviral Drugs
10.1 Mammarenavirus Genetic Variation and Vaccine Development
10.2 Lethal Mutagenesis as a Novel Antiviral Strategy to Combat Mammarenavirus Infections
10.3 Novel Combination Drug Therapy to Combat Mammarenavirus Infections
References
Genome Structure, Life Cycle, and Taxonomy of Coronaviruses and the Evolution of SARS-CoV-2
1 An Introduction to Coronaviruses
2 Discontinuous transcription and subgenomic mRNAs
3 Coronaviruses beyond their sequence: RNA structures
4 SARS-CoV-2 Evolution and Epidemiological Dynamics
5 Disease Spread and Epidemiology
References
Epilogue: CTMI. 13.3.22



Author / Uploaded
Esteban Domingo
Peter Schuster
Santiago F. Elena
Celia Perales


Current Topics in Microbiology and Immunology

Esteban Domingo Peter Schuster Santiago F. Elena Celia Perales Editors

Viral Fitness and Evolution Population Dynamics and Adaptive Mechanisms

Current Topics in Microbiology and Immunology Volume 439

Series Editors
Rafi Ahmed, School of Medicine, Rollins Research Center, Emory University, Atlanta, GA, USA
Shizuo Akira, Immunology Frontier Research Center, Osaka University, Suita, Osaka, Japan
Arturo Casadevall, W. Harry Feinstone Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Jorge E. Galan, Boyer Center for Molecular Medicine, School of Medicine, Yale University, New Haven, CT, USA
Adolfo Garcia-Sastre, Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
Bernard Malissen, Parc Scientifique de Luminy, Centre d’Immunologie de Marseille-Luminy, Marseille, France
Rino Rappuoli, GSK Vaccines, Siena, Italy

The reviews series Current Topics in Microbiology and Immunology publishes cutting-edge syntheses of the latest advances in molecular immunology, medical microbiology, virology and biotechnology. Each volume of the series highlights a selected timely topic, is curated by a dedicated expert in the respective field, and contains a wealth of information on the featured subject by combining fundamental knowledge with latest research results in a unique manner. 2020 Impact Factor: 4.291, 5-Year Impact Factor: 5.110 2020 Eigenfactor Score: 0.00667, Article Influence Score: 1.480 2020 Cite Score: 7.7, h5-Index: 38


Editors
Esteban Domingo, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Madrid, Spain
Peter Schuster, Institut für Theoretische Chemie, Universität Wien, Vienna, Austria
Santiago F. Elena, Instituto de Biología Integrativa de Sistemas (CSIC-UV), Valencia, Spain
Celia Perales, Centro Nacional de Biotecnología (CSIC), Madrid, Spain

ISSN 0070-217X ISSN 2196-9965 (electronic) Current Topics in Microbiology and Immunology ISBN 978-3-031-15639-7 ISBN 978-3-031-15640-3 (eBook) https://doi.org/10.1007/978-3-031-15640-3 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

Foreword

Over the last few years, I used to deliver a lecture on viral transmission within a course on viral dynamics at Universitat Pompeu Fabra in Barcelona. It was a review of standard approaches to virus complexity, with a special focus on modelling how epidemic thresholds emerge from simple toy descriptions of interactions between viruses and their hosts. As with other standard lectures on the topic, some relevant examples were discussed: measles, hepatitis C virus, HIV and polio, with their particular idiosyncrasies and historical impact on our societies, as well as the major success of vaccination, with the eradication of smallpox as an outstanding victory of science. I also discussed the concept of emerging viruses and the threat posed by the novel infectious diseases associated with them. Along with Hanta, Lassa or the more recent Zika virus, the SARS-related coronavirus case study came easily as a reminder that new viruses are expected to emerge at any time. I used to discuss the 2002 outbreak in Southern China, which caused alarm due to its rapid spread (cases were reported on all continents) and high mortality. The virus was eventually contained thanks to a rapid public health response, and its zoonotic origins were established. We could say that we were safe: it seemed just another event taking place somewhere far away and just scratching at our doors. At the end of the lecture, I also cited several authors warning about the risks associated with our globalised world, where transport networks have virtually eliminated geographic boundaries and where humans are putting enormous pressure on nature. The last slide was a cover of TIME magazine: “Warning: we are not ready for the next pandemic”.

As I write this foreword, the world is emerging from 2 years of a devastating pandemic caused by another SARS-CoV strain that emerged at the end of 2019. In a few months, what appeared to be just another controllable outbreak led to a global spread. Millions of people have died from the COVID-19 pandemic, and the confinement measures required to control further spread had to be taken at great economic and social cost. It has been estimated that the total mass of viruses involved could fit within a soda can. This is a reminder of the power of these tiny molecular parasites, able to damage our civilisation despite their apparent simplicity. Along with the political and public health measures used to limit the spread, a historic research effort was deployed towards understanding the virus, from its


genome sequence and molecular structure to the underlying strategies it uses to find its way into tissues and organs. As a result, a whole picture of this new threat emerged in record time, along with the first vaccines that marked the start of its control (but perhaps not its demise). The success of this scientific challenge was largely due to a combination of economic expenditure and the accumulated knowledge of past studies in virology. Despite the uncertainties created by the new virus, it was only one member of the virosphere, i.e. the huge and always expanding universe of viruses. It was known that they were equipped with RNA genomes, exhibited high mutation rates (due to their error-prone polymerases) and were able to generate highly heterogeneous populations. Moreover, theoretical work initiated in the 1970s on the Darwinian evolution of viruses also offered a rationale for understanding the impact of their heterogeneity on adaptation. As discussed by different contributors to this book, viral populations (including both animal and plant viruses) are quasispecies: there is no single genome responsible for adaptation, but a whole cloud of sequences connected via mutations. These populations evolve and spread within space and time, but also across their astronomically vast fitness landscapes. Viruses are complex adaptive systems, always evolving and adapting to their hosts and sometimes causing major trouble. Over the last decades, the combination of novel deep sequencing techniques with improved models that can be compared with massive amounts of data has been sharpening our picture of virus dynamics. These advances have shown that it is possible to describe their landscapes and to develop new antiviral agents. This is not an easy task, since coevolutionary forces are always at work, as mutation (but also recombination, as illustrated by geminiviruses) can produce novelties responsible for new epidemic events.

As pointed out by Domingo and co-workers, viruses typically live in a non-equilibrium state. This makes it difficult (but not impossible) to search for and develop new resistance alleles to help create reliable crops, or to predict clinical outcomes. The COVID-19 pandemic has been the most recent scenario in which to use the many complementary approximations covered here by means of both general principles and well-defined case studies. The picture that emerges from the interdisciplinary nature of virology is a promising one, where the ecological and evolutionary components of adaptation (which display similar time scales when dealing with RNA viruses) are both integrated into a phylodynamics perspective. Similarly, the study of plant-virus adaptation suggests that a proper integration will be achieved by means of a multilayer network, where multiple scales (from cells and tissues to individuals and communities) would connect key concepts associated with replication modes, bottlenecks and path-dependent propagation events. Understanding viruses, as illustrated by the contributions collected here, will always need to combine the idiosyncrasies of each case study (there is plenty of room for variation, and no other entity illustrates this


so well) and general principles. There is a long and winding road to map the virosphere, from its molecular possibilities and evolutionary trees to its mathematical properties. Much work is still needed, but progress is fast and the whole picture is rapidly improving. We might not yet be able to predict the next pandemic, but hopefully we are becoming more aware and better prepared.

Barcelona, Spain

Ricard Solé

Contents

Virus Evolution on Fitness Landscapes
Peter Schuster and Peter F. Stadler

Viral Fitness Landscapes Based on Self-organizing Maps
M. Soledad Delgado, Cecilio López-Galíndez, and Federico Moran

Virus Evolution Faced to Multiple Host Targets: The Potyvirus–Pepper Case Study
Lucie Tamisier, Séverine Lacombe, Carole Caranta, Jean-Luc Gallois, and Benoît Moury

The Role of Extensive Recombination in the Evolution of Geminiviruses
Elvira Fiallo-Olivé and Jesús Navas-Castillo

Plant Virus Adaptation to New Hosts: A Multi-scale Approach
Santiago F. Elena and Fernando García-Arenal

Viral Fitness, Population Complexity, Host Interactions, and Resistance to Antiviral Agents
Esteban Domingo, Carlos García-Crespo, María Eugenia Soria, and Celia Perales

Mechanisms and Consequences of Genetic Variation in Hepatitis C Virus (HCV)
Andrea Galli and Jens Bukh

Mammarenavirus Genetic Diversity and Its Biological Implications
Manuela Sironi, Diego Forni, and Juan C. de la Torre

Genome Structure, Life Cycle, and Taxonomy of Coronaviruses and the Evolution of SARS-CoV-2
Kevin Lamkiewicz, Luis Roger Esquivel Gomez, Denise Kühnert, and Manja Marz

Epilogue: CTMI. 13.3.22

Virus Evolution on Fitness Landscapes
Peter Schuster and Peter F. Stadler

Abstract
The landscape paradigm is revisited in the light of evolution in simple systems. A brief overview of different classes of fitness landscapes is followed by a more detailed discussion of the RNA model, which is currently the only evolutionary model that allows for a comprehensive molecular analysis of a fitness landscape. Neutral networks of genotypes are indispensable for the success of evolution. Important insights into the evolutionary mechanism are gained by considering the topology of sequence and shape spaces. The dynamic concept of molecular quasispecies is viewed in the light of the landscape paradigm. The distribution of fitness values in state space is mirrored by the population structures of mutant distributions. Two classes of thresholds for replication errors or mutations are important: (i) the conventional genotypic error threshold, which separates ordered replication from random drift on neutral networks, and (ii) a phenotypic error threshold above which the molecular phenotype is lost. Empirical landscapes are reviewed and, finally, the implications of the landscape concept for virus evolution are discussed.

In: Viral Fitness and Evolution: Population Dynamics and Adaptive Mechanisms. Esteban Domingo, Peter Schuster, Santiago F. Elena and Celia Perales, Eds. Current Topics in Microbiology and Immunology, Volume xxx. Springer Nature Switzerland AG, Cham, CH 2021

P. Schuster (B)
Institut für Theoretische Chemie der Universität Wien, Währingerstraße 17, 1090 Wien, Austria
e-mail: [email protected]

P. F. Stadler
Institut für Informatik der Universität Leipzig, Härtelstraße 16-18, 04107 Leipzig, Germany
e-mail: [email protected]; [email protected]
The Santa Fe Institute, 1399 Hyde Park Road, Santa Fe NM 87501, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_1


1 Introduction

Evolutionary genetics in the twentieth century was shaped by the life-long controversy between Ronald Aylmer Fisher and Sewall Wright on the mechanism of evolution (Provine 1992), which did not end before both opponents had died. After Fisher’s death in 1962, Wright continued to put forward his view of evolution in many papers and in his extensive four-volume treatise on population genetics (Wright 1968, 1969, 1977, 1978). In a nutshell: (i) Fisher wanted to construct a theory of evolution similar to thermodynamics (Crow 2002; Ewens and Lessard 2015), whereas Wright favored the uniqueness of biology. Central to Fisher’s thinking was a reference state with independent genes, each of which was optimized for itself. Epistasis and pleiotropy, he thought, play the role of a perturbation (see Sect. 2.3). (ii) Wright’s view was the opposite in the sense that he emphasized the importance of gene interactions and thought that the successes of natural selection, as well as of animal and plant breeding, result from optimizing the entire system of interacting genes rather than single genes. A consequence of the different views is that Fisher considered evolution to be most effective in large populations, whereas Wright was thinking about partially isolated subgroups, sufficiently small to make random drift efficient and to allow for fixation, but large enough to prevent degeneration and extinction. In addition, Fisher, like most of his contemporaries involved in the development of the synthetic theory of evolution, was an extreme gradualist in the sense that he insisted that all evolutionary change is brought about by the succession of a large number of very small steps (see Fig. 22 in Sect. 4.1).

The concept of an adaptive fitness landscape is commonly attributed to Wright (1931, 1932, 1988), who introduced it as a metaphor underlying the illustration of adaptive evolution as a hill-climb or adaptive walk on a multi-peak fitness (hyper)surface (Fig. 1).¹ According to McCoy (1979), the idea of evolution as an adaptive process on a fitness landscape was first used by Janet (1895), much earlier than Wright’s seminal publication (Wright 1932), in order to explain the lack of intermediate forms of species in the fossil record. Although there is a certain similarity between the thoughts of Janet and Wright, there are also fundamental differences. Janet, for example, when searching for an explanation of the occurrence of gaps in the fossil record, assumed that species located in valleys are pulled down by selection, similar to gravitation acting on physical bodies. He thought

¹ The expression hypersurface points to the fact that fitness landscapes are surfaces in high-dimensional space. The notion of a hypersurface was first introduced in the quantum molecular sciences, where the motions of a molecule commonly have more than three degrees of freedom. Since we shall be dealing here almost exclusively with such high-dimensional objects, we drop the prefix ‘hyper’ from now on.


Fig. 1 Sketch of Sewall Wright’s fitness landscape. The figure sketches one-dimensional fitness landscapes where individual species $\Upsilon_j$ occupy local fitness maxima. Species are ordered by fitness: $f_1 \ge f_2 \ge f_3 \ldots$. Under constant environmental conditions evolution approaches the global optimum corresponding to the fittest species $\Upsilon_1$. In low-dimensional spaces substantial fitness drawdowns are required in order to cross valleys. Wright (1932) himself points out that real fitness landscapes are high-dimensional and that sketches with low-dimensional supports, in particular one-dimensional ones, may be misleading.

that populations occupy areas corresponding to their variability, so that strong selection and small areas correspond to narrow valleys and populations with little variability, whereas more variable species would be represented by broader valleys and larger areas. Dietrich and Skipper (2012) argue that ‘Wright’s discovery of the adaptive landscape was independent of Janet’s, and certainly more influential historically.’

The fitness landscape in the sense of Sewall Wright is a mapping of the space of genotypes, called genotype or sequence space,² onto non-negative real numbers representing fitness parameters, which enter the deterministic or stochastic dynamical systems describing the evolution of populations (Fig. 1). Wright’s illustration visualizes a genotype space with several alleles per locus. Genotype space then comprises all possible allele combinations, and for a few hundred genes with two alleles each the number of combinations is already hyper-astronomical: four hundred genes lead to $2^{400} \approx 2.58 \times 10^{120}$ different genotypes. As sketched in Fig. 1, species are situated on local maxima or peaks of the landscape. Evolution progresses through migration from peak to peak. Wright’s original fitness landscape contains many local fitness maxima and minima, and migrating populations have to pass through lower terrain, crossing intermediate saddle points or valleys (Wright 1932, pp. 358 and 361). Optimization through natural selection on fitness landscapes implies a stepwise or steady increase and encounters a problem, since naïve adaptive walks in the Darwinian sense cannot

² The genotype space Q will be called sequence space here when we understand genotypes as RNA or DNA sequences.


take downward steps. Wright accounts for this obvious problem with his shifting balance model of evolution, which combines Darwinian selection and genetic drift (Masel 2011). The model consists of three phases: (i) random genetic drift splitting the global population into subpopulations, (ii) selection within subpopulations, and (iii) selection between subpopulations. The mean fitness of the population is assumed to decrease during phase (i) and to increase during phases (ii) and (iii). Wright’s original fitness landscape is mapped upon a 2D model support of genotype space that contains several local maxima, saddle points, and valleys. Landscapes on low-dimensional supports are misleading, as a simple example demonstrates: on 1D landscapes all valleys have to be crossed at the bottom because there are no saddle points. Saddle points already exist on 2D supports, where it becomes possible to migrate from one peak to another without passing the lowest points. Generic landscapes on high-dimensional supports provide a multiplicity of paths leading from peak to peak along ridges and passing high-dimensional saddle points, so that crossing valleys at the bottom is no longer necessary (Fragata et al. 2019). Apart from high dimensionality, it is the existence of neutral networks (Grüner et al. 1996a, b; Reidys et al. 1997; Schuster et al. 1994) that enables adaptive walks on fitness landscapes (see also Sects. 2.5 and 3.6).

Despite its popularity as an illustration of Darwinian selection, Wright’s metaphor faced objections and was heavily debated in the following years (see, e.g., Provine 1986; Ruse 1996); for a more recent, well-founded analysis of Wright’s landscape concept we recommend Skipper (2004, 2012). Several authors find it questionable whether Wright’s fitness landscape is more than a didactically useful metaphor (Ruse 1996), and indeed Wright developed his adaptive landscape metaphor and its diagrams as a way to translate his shifting balance theory from the mathematics of population genetics into a more accessible idiom for general biologists.

The majority of studies dealing with fitness landscapes assume time-independent fitness values. This may be questionable, because evolution takes place in ecosystems and the adaptation of one species changes the environment for all other species. Stronger coupling and consequential time dependence are caused by the coevolution of species. In mathematical terms, time-dependent landscapes can be described in different ways: (i) by static landscapes, which are externally driven, (ii) by spatially extended dynamical systems, or (iii) by coevolutionary models (Richter and Engelbrecht 2014). Here we shall be concerned with static landscapes only.
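The remark in this section that a naive adaptive walk cannot take downward steps can be illustrated with a small simulation. The sketch below is not taken from the chapter: it assumes binary genotypes of eight loci, an uncorrelated random fitness table, and a walker that moves only to fitter one-mutant neighbors. All function names and parameters are hypothetical and chosen for illustration.

```python
import random
from itertools import product


def one_bit_neighbors(genotype):
    """All genotypes that differ from `genotype` at exactly one locus."""
    return [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]
            for i in range(len(genotype))]


def make_random_landscape(loci, seed):
    """Assumption: every binary genotype gets an independent uniform fitness."""
    rng = random.Random(seed)
    table = {g: rng.random() for g in product((0, 1), repeat=loci)}
    return table.__getitem__


def adaptive_walk(fitness, start, rng):
    """Move to a randomly chosen fitter neighbor until none exists."""
    path = [start]
    while True:
        current = path[-1]
        uphill = [g for g in one_bit_neighbors(current)
                  if fitness(g) > fitness(current)]
        if not uphill:
            return path  # local optimum: every single mutation is downhill
        path.append(rng.choice(uphill))


fitness = make_random_landscape(loci=8, seed=1)
path = adaptive_walk(fitness, start=(0,) * 8, rng=random.Random(2))
global_best = max(fitness(g) for g in product((0, 1), repeat=8))

print("steps taken:", len(path) - 1)
print("fitness at end of walk:", round(fitness(path[-1]), 3))
print("global maximum fitness:", round(global_best, 3))
```

On a rugged table of this kind the walk typically stops at a local peak below the global maximum, which is the situation that drift in the shifting balance model, or neutral paths in high-dimensional spaces, is invoked to resolve.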

2 Landscape Models and Landscape Features

A complete and comprehensive representation of the fitness landscape with all important properties is out of reach at present, even for the smallest organisms and for viruses, and even in the case of evolution in vitro (Joyce 2007). The maximal achievable population sizes are in the range of $N = 10^{15}$ molecules, and they are much too small to cover all possible variants. A practicable escape from this dilemma is the conception of models that focus on a given aspect of evolution on landscapes. This section therefore deals with different models and the properties they can describe or explain.

2.1 Classes of Landscapes

Adaptive landscapes for evolution are easier to discuss and analyze when three classes are distinguished: (i) genetic, (ii) phenotypic, and (iii) molecular landscapes (Dietrich and Skipper 2012). Sewall Wright’s landscape is a genetic landscape: fitness is assigned to genotypes, the collection of all possible genotypes together with a distance measure between pairs of genotypes constitutes the genotype space, and the fitness landscape results from plotting fitness on genotype space. Already in his first paper Wright (1932) makes clear that one fundamental problem is the dimensionality of the carrier upon which the landscape is built. Wright illustrates the problem by means of a fictitious mini-genome of five loci with two alleles each, giving rise to $2^5 = 32$ different haploid combinations and $3^5 = 243$ diploid genotypes. Viruses encode from one up to 2500 protein genes, prokaryotes from 182 up to 9400, and eukaryotes from 20 000 up to about 100 000 (Milo and Phillips 2016, p. 286). A second complication is the complexity of the relations between genotypes, phenotypes, and fitness, which involve a variety of still unsolved problems like protein folding, the development of multicellular organisms, and the prediction of biological function from molecular structure. The existence of a genetic landscape as a useful tool in modeling evolution is based on a number of assumptions, for example (i) constant environments are required, since the unfolding of the phenotype as well as its fitness depends on environmental factors, (ii) fitness is a property of the carrier of the genotype alone and does not depend on other individuals in the population, (iii) random events during unfolding of the genotype, which exert influence on phenotypes and/or fitness values, are excluded, and (iv) epigenetic effects are negligible. In reality assumptions (i) to (iv) are restrictive and not fulfilled in general.
In phenotypic landscapes the first part of the highly complex unfolding of genotypes is skipped and evolutionarily relevant properties like fitness are assigned directly to phenotypes. The assignment of fitness to phenotypes is certainly less difficult and easier to interpret than the relation to genotypes. The first definition and application of phenotypic landscapes is due to the paleontologist Simpson (1944, 1953). Among other things the phenotypic landscape has been applied to explain the evolution of horses. The phenotypic landscape became popular among paleontologists and received a mathematical model through the works of Lande (1976, 1979). A detailed description of phenotypic landscapes is found in Rice (2012).


P. Schuster and P. F. Stadler

The fast development of molecular biology in the second half of the twentieth century opened a new avenue to understanding, analyzing, and handling of fitness landscapes. The resulting model is often characterized as the molecular landscapes paradigm: Fitness is a function of genotypes through the expression in molecular structures and according to structural biology this allocation is based on two mappings

$$
\begin{array}{ccccc}
\text{genotype} & \overset{\Psi}{\Longrightarrow} & \text{phenotype} & \overset{\Phi}{\Longrightarrow} & \text{fitness},\\
\text{sequence} & \Longrightarrow & \text{structure} & \Longrightarrow & \text{function},\\
X_k & \Longrightarrow & S_k = \Psi(X_k) & \Longrightarrow & f_k = \Phi(S_k).
\end{array}
\tag{1}
$$

In other words, the structure $S_k$, understood as the phenotype, is a function of the genotype $X_k$, which is the nucleotide sequence of a DNA or an RNA molecule, and fitness $f_k$ is a function of the molecular structure. In the strict mathematical sense both mappings are considered to be unique in the forward direction: A given genotype $X_k$ determines the structure $S_k$ and the structure in turn determines the fitness $f_k$ of the genotype’s carrier. The first mapping, $S_k = \Psi(X_k)$, is the genotype-phenotype (GP) map and the second map, $f_k = \Phi(S_k)$, is called the structure-function relation. The obvious question for the introduction of a molecular fitness landscape concerns the choice of appropriate supports for the functions $\Psi(X)$ and $\Phi(S)$. Maynard Smith (1970) suggested a protein space as the basis for modeling protein evolution. The protein space is a point space: Every point represents one particular amino acid sequence and accordingly, for a twenty-letter alphabet, $\kappa = 20$, the number of points in protein space is hyper-astronomically large already for small oligopeptides. To give an example: The space of proteins of chain length $l$ is denoted by $Q_l^{(20)}$ and has the cardinality $|Q_l^{(20)}| = 20^l$. Thus the number of individual sequences of chain length $l = 18$ over the amino acid alphabet amounts to $|Q_{18}^{(20)}| = 20^{18} = 2.62 \times 10^{23}$, and this is more than one-third of Avogadro’s number or approximately one hundred times the number of protein molecules per liter at cellular concentration.$^{3}$ The protein space is not only huge, it is also quite complex as far as the manifestations of evolutionary moves are concerned (Dayhoff et al. 1978).$^{4}$ The primary moves occur at the polynucleotide level (DNA or RNA), and because of translation, amino acid replacements involve the genetic code and its redundancies. Empirical determination of the probabilities of particular amino acid replacements yields the so-called point accepted mutation matrices (PAM$_n$) (Böckenhauer and Bongartz 2007, pp. 95–97), where $n$ counts the number of exchanged amino acids per one hundred amino acid residues; common are PAM$_1$, PAM$_{100}$, and PAM$_{250}$. Margaret Dayhoff’s path-breaking contribution to computational biology was to develop methods for the calculation of the PAM-matrices and to relate them to evolutionary time scales.

$^{3}$ Milo (2013) provides a value of 2–4 × $10^{6}$ protein molecules per μm$^{3}$ cell volume.
$^{4}$ An evolutionary move is the change of a biopolymer sequence as a consequence of a mutation.
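The size comparison in the protein-space example can be checked with a few lines of arithmetic. The following is a minimal sketch; the function name and the unit conversion (1 liter = $10^{15}$ μm$^{3}$) are additions for illustration, and the protein density is the range quoted from Milo (2013) in the footnote.

```python
# Back-of-the-envelope check of the protein-space example (l = 18, kappa = 20).
AVOGADRO = 6.02214076e23          # particles per mole (SI definition)
CUBIC_MICROMETERS_PER_LITER = 1e15  # 1 L = (1e5 um)^3

def sequence_space_size(l: int, kappa: int) -> int:
    """Number of distinct sequences of length l over an alphabet of kappa letters."""
    return kappa ** l

size_18 = sequence_space_size(18, 20)
fraction_of_avogadro = size_18 / AVOGADRO

# Protein molecules per liter at the quoted density of 2-4 million per cubic micrometer.
proteins_per_liter_low = 2e6 * CUBIC_MICROMETERS_PER_LITER
proteins_per_liter_high = 4e6 * CUBIC_MICROMETERS_PER_LITER
ratio_low = size_18 / proteins_per_liter_high
ratio_high = size_18 / proteins_per_liter_low

if __name__ == "__main__":
    print(f"20^18 = {size_18:.3e}")
    print(f"fraction of Avogadro's number: {fraction_of_avogadro:.3f}")
    print(f"ratio to proteins per liter: {ratio_low:.0f} to {ratio_high:.0f}")
```

Under these assumptions the ratio brackets the "approximately one hundred times" stated in the text.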


Considering evolution at the level of DNA or RNA has the advantage that mutations, in particular point mutations, i.e., exchanges of single nucleotides, are straightforward to handle. The space of polynucleotide sequences is completely determined by the length of the sequences$^{5}$ $l$ and the size of the alphabet $\kappa$ (Fig. 2). For binary sequences, $\kappa = 2$, of constant chain length $l$ the sequence space is a hypercube of dimension $n = l$. The binary sequence space, $Q_l^{01}$, of sequences built from the two digits 0 and 1, was introduced by Richard Hamming at Bell Labs (Hamming 1950, 1986) in the context of error handling in communication theory. The binary sequence space is adequate for dealing with all sequences built from two nucleotides, $Q_l^{\mathrm{GC}}$ and $Q_l^{\mathrm{AU}}$, respectively. The extension to the natural four-letter alphabet, $Q_l^{\mathrm{AUGC}}$ with the four nucleotides (A, U (T), G, C), is straightforward although hard to illustrate properly, because projections of the high-dimensional objects onto a 2d plane commonly look rather confusing. In Fig. 2 we sketch the buildup of binary and four-letter sequence spaces: The sequence spaces for longer sequences can be derived by means of a recursive construction, $Q_l^{\mathrm{AUGC}} \to Q_{l+1}^{\mathrm{AUGC}}$ (Swetina and Schuster 1982). A sequence space is visualized as a graph where the nodes correspond to individual sequences and the edges connect all pairs of sequences with Hamming distance $d_H = 1$. The Hamming distance counts the number of positions in which two end-to-end aligned sequences of equal length $l$ differ. It induces a metric on sequence space and provides the basis for a formal mathematical analysis of sequence space and shape space topology (see Sect. 4). In Fig. 3 the sequence space is resolved into mutant classes. The class $\Gamma_k$ comprises all sequences that are at Hamming distance $d_H = k$ from the reference sequence $X_0$: $\Gamma_k = \{X_j \mid d_H(X_0, X_j) = k\}$.
In particular, $\Gamma_0$ contains only the reference sequence $X_0$ and $\nu_0 = 1$, $\Gamma_1$ all $\nu_1 = (\kappa - 1)\,l$ one-error mutants of $X_0$, $\Gamma_2$ all $\nu_2 = (\kappa - 1)^2\, l(l-1)/2$ two-error mutants, and $\Gamma_k$ all $\nu_k = (\kappa - 1)^k \binom{l}{k}$ mutants with $k$ errors. Binary sequences follow a binomial distribution: $|\Gamma_k| = \nu_k = \binom{l}{k}$. There is also a rich literature on this topic (Eigen 1985; Feistel and Ebeling 1982; Fontana and Schuster 1998b; Rechenberg 1973; Reidys et al. 1997; Reidys 1997; Reidys and Stadler 2002; Stadler et al. 2001; Strasser 2010). At the same time the Hamming metric represents the natural distance between genotypes in evolution, because it is well defined as the minimal number of point mutations converting two sequences of equal length into each other, and in addition, point mutations are most common in evolution. Based on the concept of sequence space, mutations at the molecular level can be easily classified and analyzed. The binary sequence space has a symmetry that is missing in the sequence spaces over all other alphabets: The numbers of sequences in class $\Gamma_k$, $\nu_k = \binom{l}{k}$, are the same as the numbers in class $\Gamma_{l-k}$ since $\nu_{l-k} = \binom{l}{l-k} = \binom{l}{k}$. In particular, for binary sequences there is only one sequence in class $\Gamma_l$ and this is the sequence that is complementary to the reference, whereas class $\Gamma_l$ contains the complementary sequence and $\nu_l - 1 = (\kappa - 1)^l - 1$ other sequences in all sequence spaces with $\kappa \neq 2$. This symmetry is nicely reflected by the appearance of probability density surfaces for structure distances (Fontana et al. 1993a, Fig. 12).

$^{5}$ For the sake of simplicity we consider here sequences of the same length $l$.
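The mutant class sizes $\nu_k = (\kappa - 1)^k \binom{l}{k}$ can be verified by brute-force enumeration for short chains. The sketch below is illustrative only; the function names are hypothetical and the enumeration is feasible only for small $l$.

```python
from itertools import product
from math import comb

def hamming(x: str, y: str) -> int:
    """Number of positions in which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))

def class_size(l: int, k: int, kappa: int) -> int:
    """Closed-form size of mutant class Gamma_k: (kappa - 1)^k * binom(l, k)."""
    return (kappa - 1) ** k * comb(l, k)

def mutant_classes(reference: str, alphabet: str) -> dict:
    """Group all sequences over `alphabet` by Hamming distance to `reference`."""
    classes = {}
    for letters in product(alphabet, repeat=len(reference)):
        seq = "".join(letters)
        classes.setdefault(hamming(reference, seq), []).append(seq)
    return classes

if __name__ == "__main__":
    # Binary GC space with l = 5 and reference 'CCCCC', as in Fig. 3.
    classes = mutant_classes("CCCCC", "CG")
    for k in sorted(classes):
        print(k, len(classes[k]), class_size(5, k, 2))
```

For the binary case the enumerated sizes are symmetric in $k$ and $l - k$, while for the four-letter alphabet the class $\Gamma_l$ holds $3^l$ sequences.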


Fig. 2 Buildup of binary (01) and four-letter (AUGC) sequence spaces. The sequence space $Q_{l+1}^{A}$ for strings with chain length $l + 1$ over an alphabet $A$ of size $|A| = \kappa$ is constructed from the sequence space $Q_l^{A}$ by adding one symbol, either 0 or 1 for binary sequences or A, U(T), G, or C for natural sequences, on the l.h.s. of the string (see, e.g., Schuster 2009). Joining all pairs of sequences with $d_H = 1$ by straight lines yields the sequence space $Q_{l+1}^{A}$. The upper part of the figure deals with binary sequences, $A_2 = 01$: The sequence space $Q_l^{01}$ is a hypercube of dimension $l$. The lower part of the figure presents the same construction for the natural nucleotide alphabet, $A_4 = \mathrm{AU(T)GC}$. The single-digit element, $Q_1^{01}$ or $Q_1^{\mathrm{AUGC}}$, is a straight line and one-dimensional for binary sequences or a tetrahedron and three-dimensional for the four-digit alphabet, respectively. The binary sequence space for two digits, $Q_2^{01}$, is a square and two-dimensional. For natural (AUGC)-strings of two letters ($l = 2$) the sequence space, $Q_2^{\mathrm{AUGC}}$, is a tetrahedron of tetrahedra (middle drawing). This is an object in six-dimensional space that looks quite complicated in the projection onto a (two-dimensional) plane (drawing on the r.h.s.)


Fig. 3 Sketch of the binary sequence space with $l = 5$. The sequence space $Q_5^{(2)} \equiv Q_5^{\mathrm{GC}}$ contains 32 sequences, which are indicated here by their equivalent decadic numbers. Nucleotides are assigned, for example ’0’ ≡ C and ’1’ ≡ G: 0 = ’00000’ ≡ ’CCCCC’, 1 = ’00001’ ≡ ’CCCCG’, 2 = ’00010’ ≡ ’CCCGC’, . . . , 31 = ’11111’ ≡ ’GGGGG’. Individual sequences are grouped in mutant classes $\Gamma_k$ that are defined by their Hamming distance to the reference sequence ’CCCCC’, $d_H(X_j, X_0) = k$. The numbers of binary sequences in each $\Gamma_k$ are given by the binomial distribution: $|\Gamma_k| = \nu_k = \binom{l}{k}$

The concept of evolution at the polynucleotide level was implemented by Eigen (1971) in his kinetic theory of self-organization of biological macromolecules. Implicitly in this theory it was assumed that evolution takes place by means of moves in a nucleic acid sequence space. These moves are mutations, most frequently point mutations—especially in in vitro and virus evolution—but other changes in the genetic message like deletions, insertions, and genome rearrangements occur as well although with lower frequency. In case the mutation rates are sufficiently high, continued reproduction and mutation lead to a distribution of sequences related by mutation, which may approach stationarity depending on population size, reproduction parameters, and environmental conditions. For now we mention three different scenarios: (i) at small mutation rates and not too large population sizes the populations are homogeneous and consist of single genotypes only, (ii) in the intermediate range mutations occur sufficiently frequently and mutants show up regularly in the genotype distributions, and (iii) at high mutation rates the supply of new mutations is sufficiently large in order to prevent the population from becoming stationary and hence it drifts randomly through sequence space. Evolution in scenario (i) is confined to successive fixation of (advantageous) mutants in the population and all previously selected sequences are lost in the transition to the next variant. Scenario (ii) allows for the simultaneous existence of several genotypes in the population and the genetic reservoir of the population comprises


the whole ensemble. The notion quasispecies has been coined for such an ensemble that has reached stationarity (Eigen and Schuster 1977). In Sect. 5 we shall derive an approximate quantitative expression for the mutation rates at which the first transition between the scenarios, (i)→(ii), occurs. The background picture of Eigen’s theory is a fitness landscape built upon a genotype space (Fig. 1), which consists of a complete set of polynucleotide sequences. There are many alternative concepts; as an example we mention the fitness-space model (Tsimring et al. 1996) that has been successfully applied to explain the experimental observation of two distinct stages in the growth of viral fitness (Novella et al. 1995, 1999).

2.2 Abstract Rugged Landscapes

The dynamics of evolution, and its efficiency as a search process on the genotype or phenotype space, depend crucially on the structure of landscapes. It is instructive, therefore, to study simple mathematical models of landscapes before delving in more detail into the molecular and empirical landscapes on which viruses evolve. Abstracting equation (1) further, a fitness landscape is simply a function $f(X)$ that assigns a (fitness) value to each object or configuration in a (finite) set $Q$, the configuration space. This set $Q$ is endowed with some structure that expresses nearness or accessibility. For the moment, this is captured by a move set that defines for $X \in Q$ the set $N(X)$ of neighbors of $X$. Assuming symmetry of the move set, i.e., $Y \in N(X)$ if and only if $X \in N(Y)$, we obtain, as above, an undirected graph as our configuration space. A detailed account of the formal aspects can be found in Reidys and Stadler (2002). Fitness landscapes are thus simply functions on the vertex set of a graph. At this level of abstraction, landscapes are a quite common model in many fields of science: (i) In physics, spin glasses (Garstecki et al. 1999) comprise sets of $N$ spins $s_i = \pm 1/2$ to which an energy $f(s) = \sum_{i<j} J_{ij} s_i s_j$ is assigned that models their pairwise interaction. The interaction strength $J_{ij}$ encapsulates details of the model. For instance, if the spins are arranged on a regular lattice, $J_{ij} = 0$ unless $i$ and $j$ are located at adjacent lattice positions. The neighbors of spin configuration $s = (s_1, s_2, \ldots, s_N)$ are obtained by flipping a single spin $s_i \to -s_i$. (ii) The folding of a given biopolymer (Onuchic et al. 1997; Mallamace et al. 2016) may be understood in terms of an energy landscape $f(x)$ where $x$ is a (discretized) spatial conformation (Flamm et al. 2000; Wolfinger et al. 2006), e.g., a particular contact matrix of the polypeptide chain or a list of nucleotide base pairs in a nucleic acid.
A natural notion of neighborhood in this case involves the opening or closing of a base pair or a contact between amino acids. Landscapes of this type are used to model the dynamics of biopolymer folding. (iii) In the setting of computer-aided drug design, $Q$ is, e.g., a set of small molecular ligands and their fitness $f(X)$ models their binding affinity to a given target


protein of interest (Vogt 2018). Neighborhood $N$ on $Q$ is defined by chemical similarity or similarity in syntheses of combinatorial and make-on-demand libraries (Saldívar-González et al. 2020). (iv) Landscapes also serve as abstract models of combinatorial optimization problems with the aim to design general purpose optimization schemes (Gendreau and Potvin 2010) such as genetic algorithms or simulated annealing. Here, the aim is to find neighborhood definitions that make the problem easier to solve. The neighborhood structure in the configuration space provides us with a notion of locality. A local optimum, then, is simply a configuration $\hat{X}$ for which $f(\hat{X}) \geq f(Y)$ for all $Y \in N(\hat{X})$, i.e., a configuration whose fitness is at least as large as that of all its neighbors. In the same vein, walks can be used to explore the landscape. An adaptive walk (Weinberger 1991b; Jain 2011; Neidhart and Krug 2011) is a sequence of configurations such that fitness increases in each step, ending when no further improvement can be found, i.e., when a local optimum is reached. A gradient walk is an adaptive walk in which in each step the neighbor $Y \in N(X)$ is chosen that maximizes the fitness among all neighbors of $X$. The definition of locality makes it possible to render the intuition of ruggedness precise. A landscape is smooth if it has few local optima and (on average) long adaptive and gradient walks. In such a landscape, it suffices to follow an up-hill direction to eventually reach one of the few optima. A landscape is rugged, on the other hand, if it has many local optima and correspondingly short adaptive walks that usually get stuck quickly in suboptimal configurations. A random walk on $Q$ is simply a sequence $R = (X_0, X_1, \ldots, X_t, \ldots)$ of configurations $X_i \in Q$ such that $X_{i+1} \in N(X_i)$ is a randomly chosen neighbor of its predecessor $X_i$.
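The definitions of local optimum, adaptive walk, and gradient walk translate directly into code. The following is a minimal sketch on the binary hypercube with single-position flips as the move set; the two toy landscapes (uncorrelated random values and an additive landscape with positive weights) and all function names are assumptions chosen for illustration, not models taken from the text.

```python
import random
from itertools import product

def neighbors(x):
    """Move set N(x): all configurations at Hamming distance 1 from the tuple x."""
    return [x[:i] + (1 - x[i],) + x[i + 1:] for i in range(len(x))]

def is_local_optimum(x, f):
    """True if f(x) >= f(y) for every neighbor y of x."""
    return all(f(x) >= f(y) for y in neighbors(x))

def adaptive_walk(x, f, rng):
    """Move to a randomly chosen fitter neighbor until none exists."""
    path = [x]
    while True:
        better = [y for y in neighbors(x) if f(y) > f(x)]
        if not better:
            return path
        x = rng.choice(better)
        path.append(x)

def gradient_walk(x, f):
    """Move to the fittest neighbor as long as it improves on the current fitness."""
    path = [x]
    while True:
        best = max(neighbors(x), key=f)
        if f(best) <= f(x):
            return path
        x = best
        path.append(x)

def random_landscape(n, rng):
    """Hypothetical rugged toy landscape: independent uniform fitness values."""
    table = {x: rng.random() for x in product((0, 1), repeat=n)}
    return table.__getitem__

def additive_landscape(n, rng):
    """Hypothetical smooth toy landscape: sum of strictly positive site weights."""
    weights = [1.0 + rng.random() for _ in range(n)]
    return lambda x: sum(w * xi for w, xi in zip(weights, x))

if __name__ == "__main__":
    rng = random.Random(0)
    start = (0,) * 10
    rugged, smooth = random_landscape(10, rng), additive_landscape(10, rng)
    print("adaptive walk length (rugged):", len(adaptive_walk(start, rugged, rng)) - 1)
    print("gradient walk length (smooth):", len(gradient_walk(start, smooth)) - 1)
```

On the additive landscape every walk from the all-zero configuration climbs to the single optimum, whereas on the uncorrelated landscape walks typically stop after a few steps, which is the contrast between smooth and rugged described above.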
Random walks provide a one-dimensional slice through the landscape if one considers the corresponding sequence of fitness values $f(R) = f(X_0), f(X_1), \ldots, f(X_t), \ldots$. In fact, in this manner one obtains a picture that looks very much like Fig. 1, except that the horizontal axis now represents time, i.e., the number of steps along the random walk (Weinberger 1990). Standard methods from time-series analysis can be applied to the pseudo-time-series $t \mapsto f(X_t)$ sampled along random walks. Of particular interest is the autocorrelation function

$$
\rho(\tau) = \lim_{t\to\infty} \frac{1}{t-\tau} \sum_{i=\tau}^{t} \tilde{f}(X_{i-\tau})\, \tilde{f}(X_i) \tag{2}
$$

where $\tilde{f}(X) = (f(X) - \bar{f})/s_f$ is the landscape normalized by subtracting the mean fitness value $\bar{f} = (1/|Q|) \sum_{X \in Q} f(X)$ and dividing by the square root of the variance $s_f^2 = (1/|Q|) \sum_{X \in Q} (f(X) - \bar{f})^2$. The autocorrelation function satisfies $\rho(0) = 1$ and eventually approaches $\lim_{\tau\to\infty} \rho(\tau) = 0$ as the random walk moves to unrelated regions of configuration space. The speed at which it drops measures how quickly the information about the local fitness is lost, which is fast in rugged landscapes and slow in smooth landscapes. The correlation length $\ell = \sum_{\tau=0}^{\infty} \rho(\tau)$ thus serves as a convenient measure of ruggedness that can easily and efficiently be estimated.
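The random-walk estimate of the autocorrelation function can be sketched as follows. This is an illustration under stated assumptions: the walk is finite, the mean and variance are estimated from the sampled series rather than from the whole landscape, the infinite sum in the correlation length is truncated at a user-chosen lag, and the two toy landscapes are hypothetical.

```python
import random
from statistics import fmean, pstdev

def random_walk_fitness(f, n, steps, rng):
    """Fitness values f(X_0), f(X_1), ... along a single-flip random walk on {0,1}^n."""
    x = [rng.randint(0, 1) for _ in range(n)]
    series = [f(tuple(x))]
    for _ in range(steps):
        i = rng.randrange(n)
        x[i] = 1 - x[i]          # flip one randomly chosen position
        series.append(f(tuple(x)))
    return series

def autocorrelation(series, tau):
    """Finite-sample estimate of rho(tau) from a normalized fitness series."""
    mean, sd = fmean(series), pstdev(series)
    z = [(v - mean) / sd for v in series]
    m = len(z) - tau
    return sum(z[i] * z[i + tau] for i in range(m)) / m

def correlation_length(series, max_tau):
    """Truncated sum of rho(tau) for tau = 0..max_tau."""
    return sum(autocorrelation(series, tau) for tau in range(max_tau + 1))

def additive_landscape(n, rng):
    """Hypothetical smooth landscape: weighted sum over sites."""
    weights = [1.0 + rng.random() for _ in range(n)]
    return lambda x: sum(w * xi for w, xi in zip(weights, x))

def uncorrelated_landscape(rng):
    """Hypothetical rugged landscape: independent random value per configuration."""
    table = {}
    def f(x):
        if x not in table:
            table[x] = rng.random()
        return table[x]
    return f

if __name__ == "__main__":
    n = 12
    smooth = random_walk_fitness(additive_landscape(n, random.Random(1)), n, 20000, random.Random(2))
    rugged = random_walk_fitness(uncorrelated_landscape(random.Random(3)), n, 20000, random.Random(4))
    print("rho(1) smooth:", round(autocorrelation(smooth, 1), 3))
    print("rho(1) rugged:", round(autocorrelation(rugged, 1), 3))
    print("truncated correlation length (smooth):", round(correlation_length(smooth, 30), 2))
```

In this sketch the additive landscape retains most of its correlation after one step, while the uncorrelated landscape loses it immediately.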


2.3 Additivity, Epistasis, and Elementary Landscapes

The sequence space $Q_l^{\kappa}$ with $\kappa = |A| = 2$ is a Boolean hypercube. Instead of the digits 0 and 1 or the nucleotides G and C we consider here the directions of spins, $x_k = \pm 1$; in other words, a configuration $X$ is a sequence of spins, $X = (x_1, \ldots, x_n)$, where the spin at each position is either $+1$ or $-1$. Then, every function of $X$ can be written in a power series expansion$^{6}$:

$$
f(X) = a_0 + \sum_{i} a_i x_i + \sum_{i<j} a_{ij} x_i x_j + \sum_{i<j<k} a_{ijk} x_i x_j x_k + \ldots
$$

[…] 200. We illustrate by means of an example:

$$
\begin{array}{lcl}
s_{10}^{(3,2)} = 14 & \text{and} & \hat{s}_{10}^{(3,2)} = 21.92,\\
s_{200}^{(3,2)} = 1.233 \times 10^{50} & \text{and} & \hat{s}_{200}^{(3,2)} = 1.270 \times 10^{50},\\
s_{1000}^{(3,2)} = 3.8637 \times 10^{262} & \text{and} & \hat{s}_{1000}^{(3,2)} = 3.8886 \times 10^{262},\ \text{and}\\
s_{10000}^{(3,2)} = 2.2401 \times 10^{2662} & \text{and} & \hat{s}_{10000}^{(3,2)} = 2.2536 \times 10^{2662},
\end{array}
$$

which shows rather slow convergence: the error for $l = 200$ is about 3% and for $l = 10\,000$ still 6‰. Enumeration of all shapes for the chain length $l = 6$ (Fig. 9) yields $|S_6^{H}| = s_6 = s_6^{(1,1)} = 17$, $s_6^{(3,1)} = 4$, $s_6^{(1,2)} = 4$, and $s_6^{(3,2)} = 1$. We remark that the open chain, ‘••••••’, plays the role of the unit element and has to be a member of every subset. Accordingly, the number of non-trivial or proper structures is three in both cases. Hence, loop restriction $n_{hl} \geq \mu = 3$ and stack restriction $n_{st} \geq \sigma = 2$ applied alone reduce $s_6 = 17$ to $s_6^{(3,1)} = 4$ or $s_6^{(1,2)} = 4$, respectively. Both restrictions applied together yield $s_6^{(3,2)} = 1$, which means that there is no proper structure fulfilling the restrictions simultaneously. Indeed the smallest shape with $n_{hl} = 3$ and $n_{st} = 2$ is ‘((•••))’ with $l = 7$. In Table 1 we compare the cardinalities of sequence spaces for binary and four-letter sequences, $|Q_l^{(01)}| = 2^l$ and $|Q_l^{(\mathrm{AUGC})}| = 4^l$, with the numbers of shapes calculated for different restrictions $(\mu, \sigma)$: Both the numbers of sequences and the numbers of shapes grow exponentially with chain length. In addition, the asymptotic expression (12) provides us with an accurate estimate of the ratio of sequences


Table 1 Comparison of the numbers of RNA sequences and structures as a function of the chain length l. Given are the numbers of sequences and the numbers of shapes computed from the recursions (10) and (11) as well as the parameters α and β of the asymptotic expressions (12) for different values of μ and σ (Hofacker et al. 1998)

          Number of sequences                 Number of structures
l         2^l             4^l                 s_l = s_l^(1,1)   s_l^(3,1)       s_l^(1,2)       s_l^(3,2)
10        1024            1.049 × 10^6        423               65              38              14
15        32 768          1.074 × 10^9        30 372            2481            708             174
20        1.049 × 10^6    1.100 × 10^12       2.516 × 10^6      106 633         14 409          2741
30        1.074 × 10^9    1.153 × 10^18       2.151 × 10^10     2.409 × 10^8    7.373 × 10^6    760 983
50        1.126 × 10^15   1.268 × 10^30       2.359 × 10^18     1.815 × 10^15   2.773 × 10^12   8.302 × 10^10
100       1.268 × 10^30   1.607 × 10^60       6.764 × 10^38     6.320 × 10^32   7.303 × 10^26   6.900 × 10^23
200       1.607 × 10^60   2.582 × 10^120      1.518 × 10^80     2.072 × 10^68   4.740 × 10^55   1.233 × 10^50
α_μ,σ                                         1.1044            0.7131          2.1641          1.4848
β_μ,σ                                         2.6178            2.2889          1.9681          1.8488

to shapes for large molecules. The number of unrestricted structures $s_l^{(1,1)}$ is larger than the number of binary sequences (apart from boundary or finite size effects for $l < 16$, where we find more sequences than shapes) but smaller than the number of four-letter sequences: $4^l > s_l^{(1,1)} > 2^l$. Essentially the same is true for the structures with loop size restriction, $s_l^{(3,1)}$, but the boundary effects are much stronger and reach up to $l = 45$. The introduction of a minimal stack size $\sigma = 2$ is more restrictive: For $(\mu, \sigma) = (1, 2)$ and $(\mu, \sigma) = (3, 2)$ we calculate more binary sequences than structures. The asymptotic expression (12) predicts more shapes than binary sequences for $s_l = s_l^{(1,1)}$ and $s_l = s_l^{(3,1)}$ since the exponents are larger than two: $\beta_{1,1} = 2.62$ and $\beta_{3,1} = 2.29$. In the case of the stack restriction the exponent is only a little smaller than two, $\beta_{1,2} = 1.97$, and we also find asymptotically more binary sequences than shapes. For the conventional case, $n_{hl} \geq 3$ and $n_{st} \geq 2$, the two parameter values are $\alpha_{3,2} = 1.4848$ and $\beta_{3,2} = 1.849$. The value of $\beta$ is significantly smaller than two, the base of the exponentially growing number of binary sequences, and hence the numbers of shapes in the conventional case grow more slowly with chain length $l$ than even the numbers of binary sequences ($\kappa = 2$, $\beta < 2$). The condition "more sequences than shapes" is fulfilled independently of the chain length $l$. The calculated data presented here demonstrate the great importance of physical constraints on sequence-structure relations, resulting in drastic restrictions of shape space cardinality.
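The shape counts quoted in the text and in Table 1 can be reproduced by a short dynamic-programming count. The recursion below is a sketch written for this purpose and is not the pair of recursions (10) and (11) referred to in the table caption; it assumes that a structure is a set of nested base pairs in which every hairpin loop has at least `min_loop` unpaired bases and every maximal stack has at least `min_stack` pairs, with the open chain counted as one structure. The function and variable names are hypothetical.

```python
def count_structures(l: int, min_loop: int = 3, min_stack: int = 2) -> int:
    """Count secondary structures on l bases with hairpin loops of at least
    min_loop unpaired bases and stacks of at least min_stack base pairs."""
    if l < 0 or min_loop < 0 or min_stack < 1:
        raise ValueError("need l >= 0, min_loop >= 0, min_stack >= 1")
    S = [0] * (l + 1)  # S[n]: all admissible structures on n bases (incl. open chain)
    P = [0] * (l + 1)  # P[n]: structures on n bases in which base 1 pairs with base n
    T = [0] * (l + 1)  # T[n]: admissible interiors enclosed by the innermost pair of a stack
    for n in range(l + 1):
        # A closed structure is a maximal stack of k >= min_stack pairs around an interior.
        P[n] = sum(T[n - 2 * k] for k in range(min_stack, n // 2 + 1))
        # First base unpaired, or paired with base j followed by an arbitrary rest.
        S[n] = (S[n - 1] if n > 0 else 1) + sum(P[j] * S[n - j] for j in range(2, n + 1))
        # Interior: must not extend the stack; an all-unpaired interior needs min_loop bases.
        T[n] = S[n] - P[n] - (1 if n < min_loop else 0)
    return S[l]

if __name__ == "__main__":
    for mu, sigma in [(1, 1), (3, 1), (1, 2), (3, 2)]:
        print((mu, sigma), [count_structures(l, mu, sigma) for l in (6, 10, 15, 20)])
```

Python integers have arbitrary precision, so the same function can be evaluated at chain lengths such as $l = 200$ without overflow.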

[Fig. 10 panel labels: stack, hairpin loop, bulge, internal loop, multiloop, joint, free ends]

Fig. 10 Modules of RNA secondary structures. The seven modules of RNA secondary structures in the string notation. The first three substructures (top row) occurred already in Fig. 6 and the names explain their appearance. A bulge is like an internal loop but the unpaired base(s) occur(s) only on one strand, a multiloop is like an internal loop too but it connects more than two stacks, a joint connects two shapes, and free ends are unpaired nucleotides occurring on one end or both ends of the string. Gray color is used to facilitate the assignment of the parentheses

3.3 Minimal Free Energy Structures

In order to make the assignment of structures to sequences unique and to put the notion of structure on a solid physical basis, further specification of the RNA structure under consideration is needed. A structure is the result of a folding process that is accompanied in general by lowering free energy, $\delta(\Delta G_0(t)) < 0$. Thus, the structure is the result of folding kinetics and depends on initial conditions and other temporary influences. Thermodynamics, however, demands that the final state of the folding process, after sufficiently long and eventually infinite time, is the unique structure of minimal free energy called the mfe-structure and defined by $\lim_{t\to\infty} \Delta G_0(t) = \min\{\Delta G_0(t)\} = \Delta G_0^0$ (see also Sect. 3.8). If we neglect kinetic effects or assign infinite time to the folding process, the thermodynamic structures with minimal free energy (mfe-structures) satisfy the uniqueness criterion.$^{14}$ To search for mfe-structures through free energy minimization clearly has the advantage of a precisely defined optimization objective, and almost all computational methods for RNA (secondary) structure prediction use free energy minimization as optimization criterion. The free energy of folding a sequence into a secondary mfe-structure or shape is calculated from additive terms consisting of stabilizing base pair stacking contributions, destabilizing hairpin loop contributions, or multiloop strains, among other free energy items. Figure 10 lists seven classes of structure modules: (i) stacks, (ii) hairpin loops, (iii) internal loops, (iv) bulges, (v) multiloops, (vi) joints, and (vii) free ends (see also Fig. 6). Because of the assumption of additivity, shapes with minimal free energy can be computed straightforwardly through optimization by means of dynamic programming algorithms (see, for example, Zuker and Stiegler 1981; Zuker and Sankoff 1984; Hofacker et al. 1994; Lorenz et al. 2011). The free energy parameters are derived empirically from thermodynamic and kinetic data of model compounds. The possibility to compute structures of minimal free energy with fairly simple algorithms opens a new avenue for the analysis of shape spaces derived from different nucleotide alphabets: exhaustive folding, enumeration, and statistical analysis (Fontana et al. 1991; Grüner et al. 1996a, b). The enormous sizes of sequence spaces, however, set a size limitation to exhaustive enumeration. Our computational facilities and computer programs allow for handling about $10^{10}$ sequences at a time, and this limits complete folding and enumeration of AUGC-sequences to $l \leq 16$ and GC-sequences to $l \leq 32$ (Grüner et al. 1996a). The shape spaces of sequences from different nucleotide alphabets, namely the natural alphabet AUGC and four restricted alphabets where one or two nucleotides are missing (AUG, UGC, AU, or GC),$^{15}$ have been studied by exhaustive enumeration (Fontana et al. 1991; Grüner et al. 1996a, b). The results are shown in Table 2: Not all acceptable shapes, $|S_l^{(3,2)}| = s_l^{(3,2)}$, can be realized as mfe-structures in the different alphabets, not even in the largest alphabet AUGC.$^{16}$ On the other hand, not a single mfe-structure with a loop size $n_{hl} < 3$ or a stack size $n_{st} < 2$ was found with these small RNA molecules, and this justifies a posteriori the choices of physical constraints made for structure counting. The comparison of the numbers of mfe-structures for the alphabets UGC and AUG versus GC and AU shows that the weaker nucleotide interaction, AU in comparison to GC, can sustain only fewer mfe-structures, and this is easily interpreted: The loop strain free energies being the same, weaker stacking makes it more difficult to form a stable hairpin loop.
For all nucleotide alphabets applied the number of sequences exceeds the number of conventional mfe-structures, in particular

$$
|Q_l^{\mathrm{GC}}| = 2^l > s_l^{(3,2)} > |S_{\mathrm{mfe},l}^{\mathrm{GC}}| = s_l^{\mathrm{GC}}, \quad \text{and} \quad
|Q_l^{\mathrm{AUGC}}| = 4^l > s_l^{(3,2)} > |S_{\mathrm{mfe},l}^{\mathrm{AUGC}}| = s_l^{\mathrm{AUGC}}.
$$

There are more sequences than structures and inevitably, there must exist structures that are formed as mfe-structures by more than one sequence, and we shall characterize this fact as neutrality (in the sequence-structure map). For the smallest four-letter sequences (AUGC-alphabet) the factor $\kappa^l / |S_l^{A}|$ is already larger than $10^4$, and for two-letter sequences (GC-alphabet) the factor is more than $10^3$ for $l > 20$ and increases exponentially with increasing $l$. The distribution of sequences forming the same or different mfe-structures is of primary importance in evolution since it determines whether progress by mutational moves is possible or not.

$^{14}$ In special situations two or more structures may have exactly the same free energy of folding and thus jeopardize uniqueness. Two equivalent structures related by symmetry represent such examples, but in the case of RNA they are rather rare and thermodynamics handles this problem by redefining the ground state.
$^{15}$ Restricted alphabets, of course, have fewer legal base pairs; UGC, for example, has four: U − G, G − U, G ≡ C, and C ≡ G. The two other possibilities with three letters, AUC and AGC, are less interesting, because they are just two-letter alphabets with a dummy nucleotide, which cannot pair.
$^{16}$ All smaller alphabets are implicitly contained in the larger alphabets and we have: $|S_l^{(3,2)}| \geq |S_l^{\mathrm{AUGC}}| \geq |S_l^{\mathrm{AUG}}| \geq |S_l^{\mathrm{AU}}|$ and $|S_l^{\mathrm{AUGC}}| \geq |S_l^{\mathrm{UGC}}| \geq |S_l^{\mathrm{GC}}|$. The results in Table 2 show that ‘≥’ can be replaced here by ‘>’.


Table 2 Numbers of sequences and structures in different nucleotide alphabets. The numbers of sequences, 2^l or 4^l, are compared with the minimal numbers of compatible sequences, Ĉ_l^A, and the cardinality of shape spaces, |S_l^(3,2)| = s_l^(3,2), with the numbers of mfe-structures, |S_mfe,l^A| = s_l^A with A = AUGC, AUG, UGC, AU and GC, derived for different chain lengths l and the five different nucleotide alphabets through exhaustive folding and enumeration (Schuster 2003)

      Numbers of sequences                                        Numbers of structures
l     2^l           Ĉ_l^GC     4^l             Ĉ_l^AUGC        s_l^(3,2)   s_l^GC    s_l^UGC   s_l^AUGC   s_l^AUG   s_l^AU
7     128           32         16 384          2304            2           1         1         1          1         1
8     256           64         65 536          9216            4           3         3         3          2         1
9     512           64         262 144         13 824          8           7         7         7          3         1
10    1024          128        1.05 × 10^6     55 296          14          13        13        13         5         3
12    4096          256        1.68 × 10^7     331 776         37          35        35        36         14        8
14    16 384        512        2.68 × 10^8     1.99 × 10^6     101         83        89        93         31        20
16    65 536        1024       4.29 × 10^9     1.19 × 10^7     304         214       246       260        72        44
18    262 144       2048       6.87 × 10^10    7.17 × 10^7     919         582       735                  180       96
20    1.05 × 10^6   4096       1.10 × 10^12    4.30 × 10^8     2741        1599      2146                 504       232
30    1.07 × 10^9   131 072    1.15 × 10^18    3.34 × 10^12    760 983     218 318                                  21 315

Alphabets with less than four letters are not only of theoretical interest, they may form perfect RNA catalysts, so-called ribozymes, as Gerald Joyce and coworkers have shown for the AUG and DU alphabet (Rogers and Joyce 1999; Reader and Joyce 2002).17 Joyce demonstrated successfully that simpler RNA molecules with fewer building blocks could have performed RNA catalysis under prebiotic conditions. The nearest neighbors of a given sequence of chain length l in sequence space are the 3l single point mutations. By means of a small RNA molecule of chain length l = 17 with the sequence X0 = AGCUUACUUAGUGCGCU and the structure S0 =(((•(((•••)))•))) as a study example we shall investigate the distribution of structures and their properties in the one-error sequence space neighborhood. The molecule has 17 × 3 = 51 nearest neighbors since every nucleotide in AUGCcan be replaced by three other nucleotides. Figure 11 shows sequence space, QAUGC l all 16 different mfe-structures, which are found in the one-error neighborhood. The numbers (frequency) of occurrence varies from 1 (1/51=0.0196) for six different 17

[17] ‘D’ stands for the modified nucleotide 2,6-diaminopurine, which is used instead of 6-aminopurine (adenine, ‘A’) because it makes stronger base pairs and more stable stacks, and thus sequences with D instead of A can form more mfe-structures.


P. Schuster and P. F. Stadler

Fig. 11 Structures of the one-error mutant spectrum of a small RNA molecule. The figure presents the structures of all 51 single point mutations of the sequence $X_0$ = AGCUUACUUAGUGCGCU (Fig. 6). In total 16 different structures $S_k$ with $k = 0, 1, \ldots, 15$ were obtained. Shape $S_0$ in the center is the structure of the reference sequence $X_0$; it is most frequent and occurs 15 times. The shapes on the periphery are ordered according to their appearance in the series of consecutive mutations (see Fig. 12). Inserted in the arrows pointing from $S_0$ to the individual structures $S_k$ are (i) the numbers of occurrence (color) and (ii) the base pair distance $d_S(S_0, S_k)$ (larger numbers in gray). The base pair distance is defined as the number of base pairs in which the two structures differ: $S_0$ and $S_9$, for example, differ in one base pair, $d_S(S_0, S_9) = 1$, and $S_0$ and $S_7$ differ in six base pairs, $d_S(S_0, S_7) = 6$, since three base pairs have to be opened and three other ones have to be closed. All drawings of structures begin at the 5'-end of the RNA, which is always the left end of the graph or string (in upright positioning); nucleotides are shown as beads and base pairs are connected by a colored thick line. Colors encode the numbers of base pairs: red = 7, black = 6, green = 5, blue = 4, pink = 3, and lavender = 2

Virus Evolution on Fitness Landscapes


Fig. 12 Free energy of folding of the one-error mutants of a small RNA. The plot shows the folding energies at 0 °C of the 51 one-error mutants of $X_0$. At each position 1 to 17 the sequence of mutants is N → A, N → U, N → G, and N → C, where the trivial replacement leaving the sequence unchanged is omitted (N ∈ {A, U, G, C}). The folding energy of the reference sequence is shown as a dotted line and the color code refers to the number of base pairs (red = 7, black = 6, green = 5, blue = 4, pink = 3, and lavender = 2; see caption of Fig. 11)

structures ($S_2$, $S_4$, $S_5$, $S_6$, $S_{11}$, and $S_{12}$) to 15 (15/51 = 0.2941) for the reference structure $S_0$. Similarities and differences between different structures can be expressed in terms of a distance measure between structures. One measure of distance is the base pair distance $d_S(S_i, S_j)$ between two structures in shape space $S_{\mathrm{mfe},l}^{H,\mathrm{AUGC}}$. It is calculated as the number of base pairs that have changed, i.e., the number of base pairs that have been removed plus the number of base pairs that were inserted. The base pair distance of the two structures $S_0$ and $S_7$, for example, is obtained from

$S_0$ = (((•(((•••)))•))) ... base pairs removed,
$S_7$ = •((((((•••)))•))) ... base pairs inserted,

as 3 + 3 = 6 base pairs. In Fig. 12 we consider the Gibbs free energies of folding, $\Delta G_0^0$, into all mfe-structures in the one-error neighborhood of $X_0$ in sequence space $Q_{17}^{\mathrm{AUGC}}$ as an example of a mapping from structures into function. The folding energy varies strongly and not regularly with the position and the nature of the mutation, and this irregularity provides the basis for the ruggedness of the structure-function map.
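The base pair distance between $S_0$ and $S_7$ can be checked with a few lines of code. The following is a minimal sketch, assuming the structures are written in dot-bracket notation with "." in place of the bullet used in the text; the function names `base_pairs` and `base_pair_distance` are illustrative and not taken from any particular RNA library.

```python
def base_pairs(structure: str) -> set[tuple[int, int]]:
    """Return the set of base pairs (i, j), 1-based, of a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(s_a: str, s_b: str) -> int:
    """Count pairs present in exactly one structure: removed plus inserted."""
    pairs_a, pairs_b = base_pairs(s_a), base_pairs(s_b)
    return len(pairs_a - pairs_b) + len(pairs_b - pairs_a)


# Structures S0 and S7 of the worked example, with "." for unpaired positions.
S0 = "(((.(((...))).)))"
S7 = ".((((((...))).)))"
print(base_pair_distance(S0, S7))  # 3 pairs removed + 3 pairs inserted
```

The two structures share the three inner pairs of the hairpin, so only the three outer pairs contribute on each side, in line with the count given in the text.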


3.4 Neutral Networks and Compatible Sequences

Neutrality brings a new issue into the relation between sequences and structure. The assignment of mfe-structures to sequences is still unique from sequence to structure but not in the opposite direction: $\Psi$ is a many-to-one mapping, and its inverse $\Psi^{-1}(S_k)$ is therefore not a function in the strict mathematical sense (Fig. 13). In general the ratio $\kappa^l / s_l^{\mathcal{A}}$ is large, often many orders of magnitude. The preimage in sequence space of a structure $S_k$ from shape space is called its neutral network (Schuster et al. 1994; Reidys 1995; Reidys et al. 1997; Grüner et al. 1996a, b):

$$\mathcal{G}(S_k) = \Psi^{-1}(S_k) \doteq \{ X_j \in Q_l^{(\mathcal{A})} \mid \Psi(X_j) = S_k \} . \tag{13}$$

The neutral network is a subset of sequences in sequence space viewed as a graph: the nodes correspond to individual sequences and edges are straight lines connecting pairs of sequences at Hamming distance $d_H = 1$ (Fig. 13). The size of the preimage of shape $S_k$, i.e., the number of sequences forming $S_k$ as mfe-structure, is denoted by $|\mathcal{G}(S_k)| = G(S_k)$. The degree of neutrality of a sequence $X_j$ that belongs to the neutral network $\mathcal{G}(S_k)$ is the number of neutral nearest neighbors $n_j^{(d_H=1)}(S_k)$, i.e., sequences at Hamming distance $d_H = 1$ from $X_j$ that fold into the same mfe-structure $S_k$, divided by the total number of nearest neighbors in sequence space,

$$\lambda_{X_j}(S_k) = \frac{n_j^{(d_H=1)}(S_k)}{(\kappa - 1)\, l} ; \quad j = 1, \ldots, G(S_k) ,$$

and the mean degree is the average taken over all sequences $X_j$ of $\mathcal{G}(S_k)$:

$$\lambda(S_k) = \frac{\sum_{X_j \in \mathcal{G}(S_k)} \lambda_{X_j}(S_k)}{G(S_k)} . \tag{14}$$
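The definition of the degree of neutrality translates directly into an enumeration of the one-error neighborhood. The sketch below assumes a folding map is passed in as a callable; `toy_fold` is a hypothetical stand-in for $\Psi$ that only inspects the two terminal nucleotides and is in no way a real folding algorithm. All function names are illustrative.

```python
from typing import Callable, Iterable, Iterator

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def one_error_neighbors(seq: str, alphabet: str = "AUGC") -> Iterator[str]:
    """Yield all (kappa - 1) * l single point mutants of seq."""
    for pos, old in enumerate(seq):
        for new in alphabet:
            if new != old:
                yield seq[:pos] + new + seq[pos + 1:]


def degree_of_neutrality(seq: str, fold: Callable[[str], str],
                         alphabet: str = "AUGC") -> float:
    """Fraction of one-error neighbors that fold into the same structure."""
    target = fold(seq)
    neighbors = list(one_error_neighbors(seq, alphabet))
    neutral = sum(1 for mutant in neighbors if fold(mutant) == target)
    return neutral / len(neighbors)


def mean_degree_of_neutrality(network: Iterable[str], fold: Callable[[str], str],
                              alphabet: str = "AUGC") -> float:
    """Average of the degree of neutrality over the sequences of a network."""
    values = [degree_of_neutrality(seq, fold, alphabet) for seq in network]
    return sum(values) / len(values)


def toy_fold(seq: str) -> str:
    """Hypothetical map: 'closed' if the two ends form a Watson-Crick pair."""
    return "closed" if (seq[0], seq[-1]) in WATSON_CRICK else "open"


print(degree_of_neutrality("GAAC", toy_fold))  # 6 of the 12 neighbors are neutral
```

With a real folding routine in place of `toy_fold`, the same enumeration applied to $X_0$ would reproduce the value 15/51 quoted for the reference structure $S_0$.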

$\lambda(S_k)$ represents the most important quantity for the characterization of the neutral network $\mathcal{G}(S_k)$. Neutral networks $\mathcal{G}(S_k)$ are embedded in larger networks of compatible sequences $\mathcal{C}(S_k)$. In other words, the neutral network of the structure $S_k$ fulfills $\mathcal{G}(S_k) \subseteq \mathcal{C}(S_k)$. A compatible sequence is a sequence that can form the structure $S_k$, although not necessarily as its mfe-structure. Formulated differently, a sequence $X_j$ is compatible with a structure $S_k$ if it has two nucleotides from the base pairing alphabet $\mathcal{B}(\mathcal{A})$ wherever there is a base pair in the structure (Fig. 14). Mapping structures into compatible sequences can also be formulated by means of a multiple-valued function $\hat{\Psi}^{-1}$:[18]

$$\mathcal{C}(S_k) = \hat{\Psi}^{-1}(S_k) \doteq \{ X_j \mid \hat{\Psi}^{-1}(X_j) = S_k \} . \tag{15}$$

[18] A so-called multiple-valued function is not a function in the strict mathematical sense since the outcome for a defined single input to a multiple-valued function is ambiguous.
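Compatibility of a sequence with a structure is a purely combinatorial test and needs no energy model. The following sketch assumes dot-bracket notation with "." for unpaired positions and, for the AUGC alphabet, the six allowed pairs AU, UA, GC, CG, GU and UG; the function name `is_compatible` is illustrative.

```python
# Base pairing alphabet assumed for AUGC: Watson-Crick pairs plus GU wobble.
ALLOWED_PAIRS = frozenset({"AU", "UA", "GC", "CG", "GU", "UG"})


def is_compatible(seq: str, structure: str, allowed=ALLOWED_PAIRS) -> bool:
    """True if every base pair of the structure joins two matching nucleotides."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure differ in length")
    stack = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner = stack.pop()
            if seq[partner] + seq[pos] not in allowed:
                return False
    if stack:
        raise ValueError("unbalanced structure")
    return True


X0 = "AGCUUACUUAGUGCGCU"
print(is_compatible(X0, "(((.(((...))).)))"))  # reference structure S0
print(is_compatible(X0, ".((((((...))).)))"))  # S7 would need a C-C pair
```

The reference sequence $X_0$ passes the test for its own structure $S_0$ but fails for $S_7$, where positions 3 and 16 would have to pair although both carry C. Every sequence is compatible with the open chain, which has no base pairs at all.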


Fig. 13 Relations between RNA sequences and shapes. A neutral network $\mathcal{G}(S_k)$ is a set of sequences $X_j$ in sequence space that form the structure $S_k$ as mfe-structure (upper panel; red). The neutral network is a graph in sequence space (nodes correspond to individual sequences, edges are straight lines connecting pairs of sequences with Hamming distance $d_H = 1$) and represents the preimage of $S_k$ in sequence space: $\mathcal{G}(S_k) = \Psi^{-1}(S_k)$. Application of random graph theory to neutral networks predicts that, above threshold $\lambda > \lambda_{cr}$, $\mathcal{G}(S_k)$ contains a single connected component (as shown here) or a sequence of components with a dominating giant component. A neutral network is embedded in the set of all compatible sequences $\mathcal{C}(S_k) \supseteq \mathcal{G}(S_k)$, which consists of all sequences that could form $S_k$ without violating a secondary structure rule. The lower panel sketches a mapping from all structures in shape space (blue), i.e., the mfe-structure (large blue circle) and the metastable suboptimal structures defined as local minima of the free energy surface with $|\Delta G^0_{\mathrm{mfe}}| > |\Delta G^0 < 0|$, onto the sequence $X_j$ in sequence space from which they are derived. The mfe-structure together with all suboptimal structures form the set $\hat{S}(X_j)$. This set is embedded in the set of all structures that are compatible with the sequence $X_j$: $\hat{C}(X_j) \supseteq \hat{S}(X_j)$ (see Sect. 3.8)


Fig. 14 Compatible sequences and partitioning of sequence space. A compatible sequence has two matching bases that could form a legal base pair wherever there is a base pair in the structure, which contains $n_u$ unpaired nucleotides and $n_p$ base pairs with $l = n_u + 2 n_p = 20$. On the r.h.s. of the sketch a partitioning of sequence space into a space of unpaired nucleotides and a space of base pairs is shown

It is useful to factorize the sequence space into a space of unpaired nucleotides and a space of $n_p$ base pairs: $Q_l^{\mathcal{A}} = U_{l_u}^{\mathcal{A}} \otimes P_{l_p}^{\mathcal{B}}$ with $l = n_u + 2 n_p = l_u + 2 l_p$ (Reidys et al. 1997).[19] The numbers of sequences that are compatible with a structure $S_k$ are easily counted. As indicated in Fig. 14, $\kappa$ nucleotides may occur at each unpaired position and $\delta$ base pairs at each position of a base pair:

$$|\mathcal{C}_l^{\mathcal{A},\mathcal{B}(\mathcal{A})}(S_k)| = C_l^{\mathcal{A},\mathcal{B}(\mathcal{A})}(S_k) = \kappa^{n_u} \cdot \delta^{n_p} = \kappa^l \left(\frac{\delta}{\kappa^2}\right)^{n_p} , \tag{16}$$

which leads to $C_l^{\mathrm{GC}} = 2^{l-n_p}$ for GC and $C_l^{\mathrm{AUGC}} = 4^l \left(\frac{3}{8}\right)^{n_p}$ for AUGC sequences. Considering compatible networks of structures with the same chain length $l$ we find that it is the unfolded chain $S_0$ that forms the largest network $\mathcal{C}(S_0)$ with $n_p = 0$ and $C_l^{\mathcal{A}}(S_0) = \kappa^l$. The smallest compatible network $\hat{\mathcal{C}}$ is formed by the structure with the longest possible stack $S_{\max}$, which for triloops as smallest acceptable hairpin has $n_p = \lfloor (l-3)/2 \rfloor$ base pairs, and hence the compatible networks have sizes $C_l^{\mathcal{A}}(S_k)$ in the range

$$\kappa^l \geq C_l^{\mathcal{A}}(S_k) \geq \hat{C}_l^{\mathcal{A}}(S_{\max}) \quad \text{with} \quad \hat{C}_l^{\mathcal{A}} = \kappa^l \left(\frac{\delta}{\kappa^2}\right)^{\lfloor (l-3)/2 \rfloor} .$$

As expected the cardinalities of compatible networks increase exponentially with chain length $l$. Values for $\hat{C}_l^{\mathrm{GC}}$ and $\hat{C}_l^{\mathrm{AUGC}}$ are included in Table 2 and illustrate the band width of the sizes of compatible networks for different structures.

[19] There is a subtle difference between $n_{(u,p)}$ and $l_{(u,p)}$: $n_{(u,p)}$ counts the numbers of nucleotides irrespective of their position along the sequence, whereas $l_{(u,p)}$ distinguishes different orderings in the two subspaces $U$ and $P$, respectively.
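The counting rule of Eq. (16) and the lower bound for the longest stack are simple enough to evaluate directly. The sketch below is an illustration only; the function names are made up for this example, and the alphabet parameters ($\kappa = 2$, $\delta = 2$ for GC and $\kappa = 4$, $\delta = 6$ for AUGC) are those implied by the formulas above.

```python
def n_compatible(l: int, n_p: int, kappa: int, delta: int) -> int:
    """Number of sequences compatible with a structure of n_p base pairs, Eq. (16)."""
    n_u = l - 2 * n_p  # unpaired positions
    if n_p < 0 or n_u < 0:
        raise ValueError("structure does not fit chain length")
    return kappa ** n_u * delta ** n_p


def min_compatible(l: int, kappa: int, delta: int) -> int:
    """Size of the smallest compatible network: longest stack closed by a triloop."""
    return n_compatible(l, (l - 3) // 2, kappa, delta)


for l in (7, 8, 9, 10, 12):
    print(l, min_compatible(l, 2, 2), min_compatible(l, 4, 6))
```

For $l = 7$ the longest stack has two base pairs, which gives $2^3 \cdot 2^2 = 32$ GC sequences and $4^3 \cdot 6^2 = 2304$ AUGC sequences, the entries listed in Table 2.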


Structures are properly divided into common and rare shapes with the mean number of sequences folding into an mfe-structure as borderline:

$$\text{common shape } S_k: \; G(S_k) \geq \frac{\kappa^l}{s_l^{\mathcal{A}}} \quad \text{and rare shape } S_k: \; G(S_k) < \frac{\kappa^l}{s_l^{\mathcal{A}}} .$$
… 1, then a graph in $G(v, p)$ will almost surely have a unique giant component containing a (positive) fraction of the vertices and no other component will contain more than $O(\ln v)$ vertices,
(iv) if $p < (1 - \varepsilon) \ln v / v$, then a graph in $G(v, p)$ will almost surely contain isolated vertices, and thus be disconnected,
(v) if $p > (1 + \varepsilon) \ln v / v$, then a graph in $G(v, p)$ will almost surely consist of one component only and hence be connected, and
(vi) the quotient $\ln v / v$ is a sharp threshold for the connectivity of $G(v, p)$.

In essence, we are dealing with two different scenarios in the Erdős-Rényi model: (i) at low probabilities of edges, $p < p_{cr}$, or low density $d_G$, a typical random graph is a patchwork consisting of many small components and (ii) at large probabilities, $p > p_{cr}$, we are almost certainly dealing with a one component connected graph. The connectivity threshold phenomenon is closely related to the percolation threshold in percolation theory (Sahimi 1994; Stauffer and Aharony 1994).
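The two regimes on either side of the threshold $\ln v / v$ can be made visible with a small simulation. The sketch below samples a graph from $G(v, p)$ and counts its connected components with a union-find structure; the function name, the graph size $v = 200$, the seed, and the two multiples of the threshold are arbitrary choices made for illustration.

```python
import math
import random


def count_components(v: int, p: float, rng: random.Random) -> int:
    """Sample a graph from G(v, p) and return its number of connected components."""
    parent = list(range(v))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    components = v
    for i in range(v):
        for j in range(i + 1, v):
            if rng.random() < p:  # edge (i, j) is present with probability p
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j
                    components -= 1
    return components


rng = random.Random(1)
v = 200
threshold = math.log(v) / v
for factor in (0.2, 4.0):
    print(factor, count_components(v, factor * threshold, rng))
```

Well below the threshold the sampled graph falls apart into many components, most of them isolated vertices, whereas well above it a single component is expected. Close to the threshold the outcome fluctuates from sample to sample, so a finite simulation illustrates the statements above rather than proving them.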


The theory of random graphs has been applied to neutral networks and their appearance in sequence space (Reidys et al. 1997; Reidys 1995). The best suited parameter replacing the critical probability of edges $p$ is the mean degree of neutrality $\lambda$ from Eq. (14), which varies over the same domain: $0 \leq \lambda \leq 1$, where $\lambda = 0$ implies absence of neutrality and $\lambda = 1$ characterizes a completely flat landscape with all sequences having identical fitness. The mean degree of neutrality exhibits a critical value called connectivity threshold,

$$\lambda_{cr}(\kappa) = 1 - \sqrt[\kappa-1]{\kappa^{-1}} , \tag{18}$$

which separates the two different regimes of network structures: (i) networks with a mean fraction of neutral neighbors below threshold, $\lambda < \lambda_{cr}$, are typically unconnected and consist of a (large) number of components, and (ii) networks with a mean fraction of neutral neighbors above the threshold value, $\lambda > \lambda_{cr}$, consist of a single large component characterized as giant component and none or a few isolated points or very small components. The analogy to random graphs of the Erdős-Rényi model is obvious.

In order to compare the connectivity threshold from graph theory with data from RNA secondary structures we adopt the factorized sequence space, $Q_l^{\mathcal{A}} = U_{l_u}^{\mathcal{A}} \otimes P_{l_p}^{\mathcal{B}}$, introduced in Fig. 14, and consider the mean degrees of neutrality separately for both subspaces $U_{l_u}^{\mathcal{A}}$ and $P_{l_p}^{\mathcal{B}}$:

$$\lambda_u > (\lambda_{cr})_u = 1 - \sqrt[\kappa-1]{\kappa^{-1}} \quad \text{and} \quad \lambda_p > (\lambda_{cr})_p = 1 - \sqrt[\delta-1]{\delta^{-1}} . \tag{19}$$
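The threshold of Eq. (18) is easy to evaluate for the alphabet sizes that occur in the RNA model. The sketch below computes $\lambda_{cr}$ and applies the two conditions of Eq. (19) to the asymptotic fractions of neutral neighbors reported in Table 3; the dictionary layout and the function names are choices made for this illustration.

```python
def connectivity_threshold(size: int) -> float:
    """lambda_cr = 1 - size**(-1/(size - 1)), Eq. (18), for kappa or delta."""
    if size < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - size ** (-1.0 / (size - 1))


# Values copied from Table 3: alphabet -> ((kappa, lambda_u), (delta, lambda_p))
TABLE_3 = {
    "GC": ((2, 0.271), (2, 0.436)),
    "AU": ((2, 0.352), (2, 0.495)),
    "GCXK": ((4, 0.479), (4, 0.509)),
    "AUGC": ((4, 0.495), (6, 0.455)),
}


def is_supercritical(alphabet: str) -> bool:
    """True if both subspaces exceed their connectivity threshold, Eq. (19)."""
    return all(lam > connectivity_threshold(size)
               for size, lam in TABLE_3[alphabet])


for size in (2, 4, 6):
    print(size, round(connectivity_threshold(size), 3))
for name in TABLE_3:
    print(name, "supercritical" if is_supercritical(name) else "subcritical")
```

The thresholds come out as 0.5, 0.370 and 0.301 for sizes 2, 4 and 6. Both two-letter alphabets stay below threshold in both subspaces, while GCXK and AUGC lie above it.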

The mean degrees of neutrality derived from secondary structures computed by RNA folding depend on the chain lengths $l$ but converge to constant values for sufficiently long molecules, $\lambda_{u,p} = \lim_{l\to\infty} \lambda_{u,p}(l)$. A comparison of the connectivity thresholds from random graphs with the calculated mean degrees of neutrality for actual RNA secondary structures extrapolated to infinite chain lengths, $\lambda_u$ and $\lambda_p$, is shown in Table 3. The RNA model provides cases for both scenarios, $(\lambda_u, \lambda_p) < \lambda_{cr}$ and $(\lambda_u, \lambda_p) > \lambda_{cr}$: The mean degrees of neutrality for structures from both two-letter alphabets, GC and AU, are below the connectivity threshold and hence their neutral networks are subcritical unconnected graphs, provided the sequences $X_j$ are sufficiently long and free from boundary edge effects. Shapes of sequences from the four-letter alphabets, AUGC and GCXK, are supercritical and hence their neutral networks are dense and their graphs are connected.

The extension of neutral networks can be made plausible through computer simulations of neutral paths (Schuster et al. 1994): A neutral path $\hat{\vartheta}_k(X_0)$ ($k = 1, 2, \ldots$) starts at a randomly chosen initial sequence $X_0$ and is constructed by selecting randomly and iteratively neutral nearest neighbors such that the Hamming distance from $X_0$ in $Q_l^{\mathcal{A}}$ increases with each step. The subscript ‘k’ distinguishes different sequences of random events. The simulation ends when no neutral neighbor that increases the distance to $X_0$ can be found at some final sequence $X_k$. The distance $\hat{d}_k(X_0) = |\hat{\vartheta}_k(X_0)| = d_H(X_k, X_0)$ is the length of the path. Interestingly, the density


Table 3 Connectivity threshold and asymptotic fractions of neutral neighbors. The connectivity thresholds from random graph theory for different alphabet sizes, $\lambda_{cr}(\kappa)$ from Eq. (18), are compared with the asymptotic fractions of neutral neighbors in the unpaired subspace $U_{l_u}^{\mathcal{A}}$ and the subspace of base pairs $P_{l_p}^{\mathcal{B}}$ (Reidys et al. 1997, p. 381) for the alphabets derived from natural nucleotides, AU, GC, AUGC, and the artificial synthetic alphabet GCXK (Piccirilli et al. 1990)

Random graphs       RNA folding data
α, δ    λ_cr        Alphabet    κ    λ_u      δ    λ_p
2       0.5         GC          2    0.271    2    0.436
2       0.5         AU          2    0.352    2    0.495
4       0.370       GCXK (a)    4    0.479    4    0.509
4       0.370       AUGC        4    0.495    –    –
6       0.301       AUGC        –    –        6    0.455

(a) Synthetic RNA molecules with two artificial nucleotides forming a new base pair: X = xanthine and K = 2,6-diamino-pyrimidine (Piccirilli et al. 1990). In the calculations the GC parameters were used for the XK base pair

distribution of the relative lengths of neutral paths, $\hat{d}_k(X_0)/l$, depends strongly on the size of the alphabet $\alpha$ but, apart from boundary or finite size effects, only little on chain length $l$. The vast extension of neutral networks in sequence space $Q_l^{\mathrm{AUGC}}$ is reflected by analyzing neutral paths: There are relatively few short neutral paths ($\hat{d}_k(X_0) < l/2$) but many long paths; for chain length $l = 100$ a fraction of 21.7% of all paths percolates the entire sequence space and ends in a sequence $X_k$ that has not a single nucleotide in common with the initial sequence $X_0$: $\hat{d}_k(X_0) = l = 100$. This result has an enormous consequence for evolution: Populations can travel by random drift throughout almost all sequence space without changing fitness. Neutral paths in the two-letter sequence spaces $Q_l^{\mathrm{GC}}$ and $Q_l^{\mathrm{AU}}$ terminate at much shorter Hamming distances. The density distribution of the path lengths $\hat{d}_k(X_0)$ is close to uniform in the range $1 < \hat{d}_k(X_0) < 90$ for GC-sequences with chain length $l = 100$. The critical degree of neutrality $\lambda_{cr}$ is closely related to a percolation threshold and this agrees well with the fact that the neutral networks of AUGC-sequences extend over the whole sequence space $Q_l^{\mathrm{AUGC}}$ and the distribution of the length of neutral paths has a maximum at $\hat{d}_k(X_0) = l$ (Schuster et al. 1994).

RNA folding data show that neutral networks may have two, three, or four large components and apparently contradict the predictions of random graph theory. The splitting of neutral networks into a small number of large components can be readily explained by anisotropies in the distribution of sequences forming the same structure in sequence space (Fig. 16). The Erdős-Rényi model assumes independence and analytical equality of the probability distribution function of edges. In case there is no or little base composition bias for forming structures the largest number of structures is found in the class $\Gamma_k$ ($k = 0, 1, \ldots, l$)[21] where most of the sequences are, and for the binary hypercube, $Q_l^{\{0,1\}}$, this is in the class $\Gamma_m$ in the middle of the

[21] Classes $\Gamma_k$ are defined by the Hamming distance from a reference sequence $X_0$: $X_l \in \Gamma_k \mid d_H(X_0, X_l) = k$ (see Fig. 3).


Fig. 16 Neutral networks with multiple large components. The figure sketches three types of structures involving a stable stacking region (a): In structures of type I the stack cannot be extended, structures of type II allow for an extension by an additional base pair on one side (red), and stacks in structures of type III can be extended by base pairs on both sides of the base paired region (blue and red). The probabilities of occurrence of structures of type I as a function of the base composition, $x_C$ ($x_G = 1 - x_C$), follow closely a binomial distribution with a maximum at equal amounts of C and G, $x_C = x_G = \frac{1}{2}$, and meet the prediction of random graph theory in the sense of forming a neutral network with a single giant component. Structures of type II and III are likely to form additional base pairs extending the stack whenever complementary nucleotides are in the appropriate positions, and then become other mfe-structures. Base compositions with equal amounts of C and G have the largest probability to be complementary, and hence more type II structures are formed at unequal nucleotide ratios, $x_C^{(1)} = \frac{1}{2} + \delta$ and $x_C^{(2)} = \frac{1}{2} - \delta$. These compositions correspond to two different components of the neutral network at opposite offsets from $x_C = \frac{1}{2}$. Type III structures can add two additional base pairs to the stack. Accordingly, we are dealing with two independent biases $\delta$ and $\varepsilon$, which lead to four maxima of the structure distributions at the positions: $x_C^{(1)} = \frac{1}{2} + \delta + \varepsilon$, $x_C^{(2)} = \frac{1}{2} - \delta + \varepsilon$, $x_C^{(3)} = \frac{1}{2} + \delta - \varepsilon$, and $x_C^{(4)} = \frac{1}{2} - \delta - \varepsilon$. The assumption $\varepsilon \approx \delta$ leads to partitioning into four equal components or three components in the size ratios 1:2:1 (Grüner et al. 1996b; Reidys et al. 1997).
(a) A single stacking region is chosen as an example. There are many other structural motifs with analogous behavior

hypercube,[22] since the distribution of sequences is binomial, $\nu_k = |\Gamma_k| = \binom{l}{k}$. We provide an explanation for the anisotropy by means of GC-sequences. The base composition of a sequence $X_k$ is measured in terms of (mole) fractions:

$$x_C(X_k) = \frac{\#C(X_k)}{\#C(X_k) + \#G(X_k)} = \frac{\#C(X_k)}{l} , \quad x_G(X_k) = \frac{\#G(X_k)}{\#C(X_k) + \#G(X_k)} = \frac{\#G(X_k)}{l} , \quad \text{and} \quad x_C(X_k) + x_G(X_k) = 1 .$$

[22] Precisely, we have one largest class $\Gamma_m$ with $m = l/2$ only for even chain lengths $l$. For odd $l$ we are dealing with two largest classes, $m_l = \lfloor l/2 \rfloor$ and $m_u = \lceil l/2 \rceil$.


In Fig. 16 a simple example from the two-letter sequence space $Q_l^{\mathrm{GC}}$ is presented for the purpose of illustration. The core structure is a stack of five base pairs. Structures of type I cannot extend the stack by additional base pairs; they are characterized by an unbiased binomial distribution in sequence space and form neutral networks with a single giant component as predicted by random graph theory. Structures of type II can be transformed to other structures through the extension of the stack by a base pair on one side, and this is most likely to occur at $x_C = x_G = \frac{1}{2}$. Therefore, the maximum number of structures is found in classes off the middle and we expect two distributions with maxima at $x_C^{(1,2)} = \frac{1}{2} \pm \delta$, which form two large components with $x_C > x_G$ and $x_C < x_G$. Structures of type III, finally, have two possibilities to change the structure by adding a base pair to the stack; this leads to two independent biases $\delta$ and $\varepsilon$, and four large components at the mole fractions $x_C^{(1,2,3,4)} = \frac{1}{2} \pm \delta \pm \varepsilon$. Three large components result from four components through merging of two close components and have the size ratios 1 : 2 : 1. The splitting of neutral networks into a small number of large components is observed occasionally and contradicts the predictions of random graph theory but can be readily explained by anisotropies in the distribution of sequences forming the same structure in sequence space as a consequence of the molecular details of the base pairing logic (see also Reidys et al. 1997, pp. 386–390).

3.6 Evolution and Neutral Networks

Finally, we visualize evolution within the RNA model. The key relation shaping the landscape for the evolutionary process is the genotype-phenotype map modeled by folding RNA sequences into secondary structures of minimal free energy, $S_k = \Psi(X_k)$. Although being only an oversimplified caricature of a genotype-phenotype map in the real world, the RNA sequence-structure relation shares important properties with empirical fitness landscapes. Four features that are relevant for evolution are listed here: (i) Ruggedness of landscapes follows inevitably from folding the sequences of a biopolymer and its most frequent mutations into structures (Figs. 11 and 12): A Hamming distance $d_H = 1$ neighborhood of an RNA sequence contains $(\kappa - 1)\, l$ single point mutants, which typically form the same, similar or entirely different secondary structures (Schuster 2016). The second mapping, $f_k = \Phi(S_k)$, translates structure into function, in particular into fitness that can be interpreted for simplicity as the reproduction rate of the molecular entity. It is suggestive to identify identical shapes with neutrality.[23] Within the RNA model neutral mutants form the same shape and are thought to have the same fitness. The distribution

[23] In reality forming the same structure is commonly not sufficient for having the same function, because there are parts of the molecules, for example, the active sites, where sequence and structure have to be conserved in order to retain function.


of fitness effects is a core issue of population genetics and has been investigated in great detail (Brajesh et al. 2019; Eyre-Walker and Keightley 2007) (see also Loewe and Hill 2010, and issue 1544 of Phil. Trans. Roy. Soc. B). A rule of thumb for quasi-stationary populations says that neutral and deleterious mutations occur with approximately the same frequency and beneficial mutations are rare. (ii) The enormous sizes of sequence spaces imply that the evolving populations can explore only tiny local patches of the space. In principle, optimization, or even reaching just one of the major fitness peaks, would be doomed to fail were it not for a generic property of the biopolymer sequence-structure map $\Psi$, shape space covering, which is discussed later in this section. (iii) Sequence spaces are not only enormous with respect to their cardinality, they are also high-dimensional. Landscapes upon high-dimensional supports are substantially different from the familiar three-dimensional landscapes: Saddle points, in particular those of higher degrees, allow for bridging of valleys without crossing the bottoms. (iv) Neutrality plays an important role in evolution, because it allows for a highly efficient combination of adaptation through natural selection and random drift of populations in sequence space.

Evolution is illustrated as an adaptive walk of a population on a fitness landscape. The evolving population chooses the next step arbitrarily from the set of all legal moves along which fitness is non-decreasing. Populations, even the largest ones, occupy only negligibly small regions in sequence space and would be unable to reach major fitness peaks in low-dimensional landscapes (see also Sect. 3.5). Two properties apart from the high-dimensionality of sequence spaces make up the apparent success of evolutionary optimization: (i) the existence of neutral networks and (ii) shape space covering.

A typical virus population consists of an optimal genotype called the master sequence and a cloud of mutants surrounding it in sequence space (see Sect. 5). Any sequence can be the starting point for the next move of evolution or, in other words, every sequence that exists in the population is an acceptable initial point for the next mutational jump. In case of favorable random events and if the target sequence of the jump has a sufficiently high fitness value the new genotype may become the new master sequence. In general a small high-dimensional patch in sequence space is covered by the population, which can bridge over narrow valleys whereby the bridgeable width depends on population size, mutation rate, and fitness landscape. Bridging valleys of low fitness by mutant distributions in the sense of Fig. 17 (upper panel; blue bridges) is not sufficient to reach major fitness peaks but neutral networks make it possible. Avoidance of low-lying terrain becomes especially effective in adaptive walks on landscapes with a sufficiently high degree of neutrality $\lambda$ (Fig. 17, lower panel). As indicated by the red bridges populations can pass valleys without taking downward steps by a sequence of events: (i) in an adaptive walk the populations reach a point where the adaptive walk ends, (ii) for sufficiently large $\lambda$ this point belongs


Fig. 17 Sketch of an adaptive walk without and with selective neutrality. Darwinian optimization is illustrated as hill climbing on rugged landscapes. Sufficiently large populations can bridge minor valleys through exploring sequence space with their mutant distributions or quasispecies (blue bridges). In the absence of neutrality populations are frequently trapped in local optima and cannot reach major fitness peaks (upper panel). Extensive neutral networks of genotypes, however, allow for an escape through a random walk in another dimension (red bridges). The figure is a modified version of Fig. 12 from Reidys et al. (1997)

almost certainly to neutral networks upon which the population migrates by random drift, and (iii) a sequence is found that allows for the continuation of the adaptive walk with increasing fitness. Neutral networks enable the bridging of low-lying parts in sequence space through escape in other dimensions (Reidys et al. 1997, p. 393) and this is demonstrated by the existence of neutral paths, which percolate whole sequence


space (Schuster et al. 1994; Schuster 1995). An evolutionary walk on a rugged fitness landscape is characterized as an interplay between adaptation and random drift. Vast neutrality as observed in the RNA model allows for maintaining the phenotype despite large changes in the nucleotide sequences of genotypes (Huynen et al. 1996; Huynen 1996). Bridging valleys through a bypass in higher dimension has also been discussed by Conrad (1990) as a basic process in adaptive evolution.

3.7 Shape Space Covering

Evolution is facilitated by the shape space covering property of common RNA secondary structures. The shape space covering conjecture suggests the existence of a neighborhood in the form of a (high-dimensional) ball with radius $r_{cov}$ around an arbitrary sequence in sequence space that contains sequences whose structures include almost all common shapes (Schuster et al. 1994; Grüner et al. 1996b; Reidys et al. 1997). Shape space covering results from the combination of two features: (i) relatively few common shapes are more or less randomly distributed in sequence space, and (ii) the local structural features around a randomly chosen reference sequence disappear at relatively short Hamming distances; computation yielded a correlation length of $l_{cor} \approx 8$ for the RNA secondary structures of AUGC sequences of chain length $l = 100$. Analytical estimates of lower and upper bounds for this example yielded $15 < r_{cov} < 20$ (Reidys et al. 1997), and this implies for the upper bound that only a fraction of $\approx 10^{-48}$ of $Q_{100}^{\mathrm{AUGC}}$ has to be searched in order to find, on the average, one example of a given common shape. For GC sequences the balls are larger, values for $r_{cov}$ in the range $19 < r_{cov} < 23$ were estimated, and the fraction to be searched amounts to $\approx 10^{-23}$ of $Q_{100}^{\mathrm{GC}}$. Numerical computations yield estimates for $r_{cov}$ close to the upper bound. Shape space covering, the existence of a ball with characteristic radius around any randomly chosen sequence within which almost all common shapes are found, is a robust phenomenon of the mapping from sequences into RNA secondary structures and, as expected, the radius $r_{cov}$ depends approximately linearly on the chain length $l$.

3.8 Suboptimal Structures and Compatibility

So far we have implicitly assumed that every RNA sequence $X_k$ gives rise to a single unique structure $S_k = \Psi(X_k)$, which is almost always true when the notion of structure is restricted to the outcome of a well defined kinetic or thermodynamic process as it was the case for mfe-structures indicated by $S_{k,\mathrm{mfe}} = S_k^{(0)} = \bar{S}_k = \Psi(X_k)$.[24]

[24] In this section we make use of three different notions of structures, which shall be marked by overline, overhat, and overtilde: mfe-structures, $\bar{S}$; suboptimal structures, $\hat{S}$; and compatible structures, $\tilde{S}$.


Fig. 18 RNA minimum free energy (mfe) and suboptimal structures. Considered is an RNA sequence with chain length $l = 33$, which folds into the hairpin $\bar{S}_0$ (red) as mfe-structure: GGCCCCUUUGGGGGCCAGACCCCUAAAGGGGUC. The mfe-structure $\bar{S}_0$ with $\Delta G^0 = -26.3$ kcal/mole is accompanied by a large number of suboptimal structures. The first suboptimal structure $\hat{S}_1$ (blue) is a double hairpin with a free folding energy of $\Delta G^0 = -25.3$ kcal/mole. The diagram in the middle shows the spectrum of suboptimal structures: The levels belonging to suboptimal structures related to $\bar{S}_0$ are shown in red, those related to $\hat{S}_1$ in blue. The diagram on the right-hand side presents the part of the barrier tree that concerns the transition from $\hat{S}_1$ to $\bar{S}_0$. Both structures, $\bar{S}_0$ and $\hat{S}_1$, have large basins of attraction and the barrier height from $\hat{S}_1$ to $\bar{S}_0$ is $\Delta G^0 = 19.7$ kcal/mole. At room temperature the two structures behave like separate molecules since transitions are too slow to be observed during an experiment

When the minimal free energy criterion is relaxed the one sequence-one structure paradigm becomes invalid because a given RNA sequence $X_k$ can form a large number of stable secondary structures $S_k^{(j)}$ ($j = 0, 1, 2, \ldots$) with a wide spectrum of free energies (Fig. 18). In the language of physical chemistry the different secondary structures are conformations of the molecule characterized by its primary structure or sequence $X_k$. The notion of stable structure will be used to indicate the existence of a local minimum on the conformational free energy (hyper)surface of the molecule. Stable structures, inasmuch as they are not mfe-structures, are called suboptimal structures.[25] The probability of occurrence of the molecular ground state conformation $S_k^{(0)}$ in the thermodynamic distribution, denoted by $p_0$, is derived from statistical physics:[26]

[25] For convenience saddle points of transitions between conformations are sometimes considered together with stable suboptimal structures in barrier trees (Fig. 20).

[26] For older energy units of cal·mol$^{-1}$ the exponential factor is $\exp\left(-\frac{E_i - E_0}{RT}\right)$ with $R = N_A \cdot k_B = 1.9872$ cal·K$^{-1}$·mol$^{-1}$. Details on the Boltzmann factors and the partition functions are found, for example, in Huang (2010); McQuarrie and Simon (1999).


$$p(\hat{S}_0) = \frac{\rho_0}{Z} \quad \text{with} \quad Z = \sum_i \rho_i , \quad \rho_i = e^{-\frac{\varepsilon_i - \varepsilon_0}{k_B T}} , \quad \text{and} \quad \sum_i p(\hat{S}_i) = 1 , \tag{20}$$

and the mean number of molecules in the state $\hat{\Sigma}_i$ with the suboptimal structure $\hat{S}_i$ is $N_i = N_0 \cdot \exp\left(-\frac{\varepsilon_i - \varepsilon_0}{k_B T}\right)$ where $k_B$ is Boltzmann’s constant. The sum of all Boltzmann factors $\rho_i$ is called the partition function $Z$ and the reference state of the energies is the ground state energy $\varepsilon_0$.[27] In order to provide an illustration of the contributions of suboptimal conformations to the thermodynamic ground state we consider as an example the long hairpin shown in Fig. 18. The free energy of folding of the mfe-structure $\bar{S}_0$ ($\equiv \hat{S}_0$) is $\Delta G_0^0 = -26.30$ kcal/mol. The free energies $\Delta G_j^0$ of the two suboptimal conformations $\hat{S}_j$; $j = 2, 3$ and the statistical weights considering the three conformations only are:

$\bar{S}_0$: $\Delta G_0^0 = -26.30 = \varepsilon_0$ kcal/mol with $p(\bar{S}_0) = 0.995965$,
$\hat{S}_2$: $\Delta G_2^0 = -23.00 = \varepsilon_0 + 3.3$ kcal/mol with $p(\hat{S}_2) = 0.003964$, and
$\hat{S}_3$: $\Delta G_3^0 = -20.60 = \varepsilon_0 + 5.7$ kcal/mol with $p(\hat{S}_3) = 0.000071$.
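The statistical weights of the three conformations follow from Eq. (20) once a temperature is chosen. The sketch below uses the molar form of the Boltzmann factor with the gas constant $R$ in kcal·K$^{-1}$·mol$^{-1}$. The temperature is not stated with the numbers above, so $T = 300$ K is an assumption made here for illustration, and the function name is likewise illustrative.

```python
import math

R_KCAL = 1.9872e-3  # gas constant in kcal K^-1 mol^-1


def boltzmann_probabilities(energies, temperature):
    """Return normalized Boltzmann weights for free energies given in kcal/mol."""
    e0 = min(energies)  # ground state energy as reference
    weights = [math.exp(-(e - e0) / (R_KCAL * temperature)) for e in energies]
    z = sum(weights)  # partition function restricted to the listed states
    return [w / z for w in weights]


# Free energies of the mfe-structure and the two suboptimal conformations.
energies = [-26.30, -23.00, -20.60]
probabilities = boltzmann_probabilities(energies, temperature=300.0)  # assumed T
for energy, p in zip(energies, probabilities):
    print(f"{energy:7.2f} kcal/mol  p = {p:.6f}")
```

At the assumed temperature the ground state carries more than 99.5% of the weight and the first suboptimal conformation roughly 0.4%, close to the values listed above; the small remaining differences depend on the exact temperature used.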

For almost all practical purposes the ground state can be approximated by $S_0$ since the share of suboptimal states is only 4 ‰. Efficient algorithms for the computation of suboptimal structures and free energies based on dynamic programming and the same parameter sets as the programs for mfe-structures are available (Wuchty et al. 1999; Zuker 1989). McCaskill (1990) proposed a method that calculates the partition function of the molecule directly.

The set of suboptimal structures $S_k^{(j)} = \hat{S}_j(X_k)$, which are formed through folding the sequence $X_k$, can be written in the form:

$$ \hat{\mathcal{S}}_l^{(A)}(X_k) = \{\hat{S}_j(X_k)\ \forall\, j \in \mathcal{S}_l^{(H)} \mid \hat{S}_j = \hat{\Psi}(X_k)\}\,, \tag{21} $$

where $\mathcal{S}_l^{(H)}$ is the shape space of all secondary structures of chain length $l$. The multiple-valued function $\hat{\Psi}(X_k)$ returns one of the suboptimal structures $\hat{S}_j(X_k)$ of the sequence $X_k$, including the mfe-structure $S_0 \equiv \hat{S}_0(X_k)$. For the mapping of sequence space onto structure space suboptimality plays a similar role as neutrality does but in the opposite direction: The assignment of a structure to a given sequence is not unique since many structures are stable solutions to the folding problem of a given sequence. An example of mapping suboptimals is sketched in the lower panel of Fig. 13 (blue lines). In Sect. 3.4 sequences that were compatible with a given mfe-structure $S_k$ were discussed as sets providing an appropriate embedding for neutral networks: $C(S_k) \supseteq G(S_k)$. The notion of compatibility is symmetric: If a secondary structure is compatible with a sequence, the sequence is compatible with the structure.

[27] It is also common to use the free energy of the unfolded chain as reference.


Therefore the notion of compatibility may be inverted and applied to sets of structures that are compatible with a given sequence $X_k$. Accordingly, we denote such compatible sets of structures by $C(X_k)$. In contrast to a suboptimal structure, a compatible structure need not fulfill any stability criterion: it need be neither a local minimum nor a major saddle point of the free energy surface, nor need it be stable in the sense of a negative free energy of folding. The only criteria the structure has to fulfill for $S_j = \tilde{\Psi}(X_k)$ concern base pair compatibility (Fig. 14) and the three rules listed in Sect. 3.1. For compatible structures the relation is given analogously to Eq. (21):

$$ \tilde{\mathcal{S}}_l^{(A)}(X_k) = \{S_j(X_k)\ \forall\, j \in \mathcal{S}_l^{(H)} \mid S_j = \tilde{\Psi}(X_k)\} = C(X_k)\,. \tag{22} $$

Clearly, there are more structures compatible with a sequence than there are suboptimal structures, $\tilde{\mathcal{S}}_l^{(A)}(X_k) = C(X_k) \supseteq \hat{\mathcal{S}}_l^{(A)}(X_k)$. The notion of compatible sets leads directly to the important intersection theorem of RNA secondary structures (Reidys et al. 1997) (Fig. 19): For any pair of arbitrary secondary structures, $S_i$ and $S_j$, with the sets $C(S_i)$ and $C(S_j)$ of sequences compatible with the structures, the statement

$$ C(S_i) \cap C(S_j) \neq \emptyset \tag{23} $$

is always true.
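The intersection theorem can be made concrete by constructing a sequence that is compatible with two given structures. The sketch below is an illustration under stated assumptions, not the proof of Reidys et al. (1997): structures are written in dot-bracket notation with `.` for unpaired bases, the function names are hypothetical, and only G and C are used. It relies on the fact that the union of two sets of base pairs is a graph in which every position has at most two partners, so that any cycle alternates between the two structures and has even length; such a graph can always be colored with two complementary letters.

```python
from collections import deque

# Base pairs allowed by the usual RNA pairing rules (Watson-Crick and GU).
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Map every paired position of a dot-bracket string to its partner."""
    stack, partner = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def common_compatible_sequence(s1, s2):
    """Return one G/C sequence compatible with both structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    tables = (pair_table(s1), pair_table(s2))
    seq = [None] * len(s1)
    for start in range(len(s1)):
        if seq[start] is not None:
            continue
        seq[start] = "G"  # arbitrary choice for each connected component
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for table in tables:
                j = table.get(i)
                if j is None:
                    continue
                wanted = "C" if seq[i] == "G" else "G"
                if seq[j] is None:
                    seq[j] = wanted
                    queue.append(j)
                elif seq[j] != wanted:
                    raise RuntimeError("odd cycle; cannot occur for two structures")
    return "".join(seq)


def is_compatible(seq, structure):
    """Check base pair compatibility of a sequence with a structure."""
    return all((seq[i], seq[j]) in ALLOWED_PAIRS for i, j in pair_table(structure).items())


hairpin, two_hairpins = "((((....))))", "((..))((..))"
bistable = common_compatible_sequence(hairpin, two_hairpins)
print(bistable, is_compatible(bistable, hairpin), is_compatible(bistable, two_hairpins))
```

With three structures the union graph may contain odd cycles, and the two-coloring step can then fail.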

Fig. 19 Intersection of compatible sets of two RNA structures. The figure sketches the neutral networks of two structures $S_1$ and $S_2$ in sequence space $Q_l^{(A)}$. The two sets $C(S_1)$ (green) and $C(S_2)$ (red) have an overlap $C(S_1) \cap C(S_2)$ (yellow) that contains (here two) RNA sequences, which can fold into both structures $S_1$ and $S_2$. The intersection theorem of compatible sets of RNA shapes (Reidys et al. 1997) states that the intersection of two compatible sets is always nonempty: $C(S_1) \cap C(S_2) \neq \emptyset$


A generalization of the intersection theorem to three different structures does not hold in general. The intersection theorem has an important consequence for molecular evolution: There are RNA molecules that can form two different structures with different (catalytic) properties. The result had been derived from pure theory (Reidys et al. 1997) and was verified in an excellent experiment (Schultes and Bartel 2000): One RNA sequence, which happens to be situated within an intersection of two compatible sets, forms two different RNA structures, which have different catalytic activities. Imagine now that evolution drives a molecule with defined function along a neutral network until it comes to a position in sequence space where structures with two different functions exist in an intersection of compatible sets. Then the molecule can switch without mutation from one structure to the other, and properties such as catalytic activity can undergo major changes. RNA molecules with two different regulatory functions occur in nature and are known as riboswitches (Edwards and Batey 2010; Garst et al. 2011; Mandal and Breaker 2004; Sinumvayo et al. 2018; Winkler and Breaker 2005). Commonly, riboswitches change conformation through binding of small regulatory molecules.

3.9 Kinetic Effects of RNA Folding

The relation between RNA suboptimal structures is understood better when transitions between different conformations are considered as well. A path in conformation space is a sequence of consecutive structures, which are related by three classes of moves (Flamm et al. 2000):

(i) ••(((••••••)))• → ••((((••••))))• base pair closure,
(ii) ••((((••••))))• → ••(((••••••)))• base pair opening, and
(iii) ((••(((•••))))) → (((••((•••))))) base pair shift.
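The move set can be illustrated by enumerating the structures that are one elementary step away from a given structure. The following sketch covers only base pair opening and base pair closure; the shift move is omitted for brevity. It is an assumption-laden illustration rather than the algorithm of Flamm et al. (2000): the function names are hypothetical, unpaired bases are written as `.` instead of •, base pairing rules of a concrete sequence are ignored, and a minimal hairpin loop size of three unpaired bases is assumed.

```python
def pair_table(structure):
    """Return a list with the partner index of each position, or -1 if unpaired."""
    table, stack = [-1] * len(structure), []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            table[i], table[j] = j, i
    return table


def open_close_neighbors(structure, min_loop=3):
    """List all structures reachable by opening or closing a single base pair."""
    table = pair_table(structure)
    n = len(structure)
    neighbors = []
    # Base pair opening: remove one existing pair (i, j).
    for i in range(n):
        j = table[i]
        if j > i:
            s = list(structure)
            s[i] = s[j] = "."
            neighbors.append("".join(s))
    # Base pair closure: pair two unpaired positions located in the same loop.
    for i in range(n):
        if table[i] != -1:
            continue
        depth = 0
        for j in range(i + 1, n):
            if structure[j] == "(":
                depth += 1
            elif structure[j] == ")":
                depth -= 1
                if depth < 0:  # left the loop that contains position i
                    break
            elif depth == 0 and j - i > min_loop:
                s = list(structure)
                s[i], s[j] = "(", ")"
                neighbors.append("".join(s))
    return neighbors


start = "..(((......)))."
moves = open_close_neighbors(start)
print(len(moves), "..((((....))))." in moves)
```

Restricting closure to positions within the same loop guarantees that the new pair does not cross an existing one, so every neighbor is again a valid secondary structure.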

The shift move is a specific combination of base pair opening and base pair closure, which was found empirically to be necessary for describing folding kinetics properly. In general there is a large number of paths leading from one structure, $S_j$, to another structure, $S_i$ (Fig. 20), passing different saddle points and having different weights in a kinetic ensemble. The barrier tree is a coarse-grained simplification of the conformational landscape of an RNA molecule. It discards all unstable structures and retains only major local minima corresponding to (meta)stable conformations and the lowest saddle points. Commonly, the path passing the lowest saddle point has a weight near one, similar to the weight of the ground state $S_0$ in the partition function, and the approximation by a barrier tree is straightforward. Figure 18 shows a concrete example of an RNA sequence of chain length $l = 33$, $X_k$ = GGCCCCUUUGGGGGCCAGACCCCUAAAGGGGUC, which forms two very different secondary structures, $S_0$ with one hairpin and $\hat{S}_1$ with two hairpins, with similar free energies of folding, $\Delta G^0(S_0) = -26.3$ kcal/mol and $\Delta G^0(\hat{S}_1) = -25.3$ kcal/mol, and a barrier height of $\Delta E_a = 19.7$ kcal/mol in the direction from $\hat{S}_1$ to $S_0$. The consequence of the high barrier between the basins of attraction of $S_0$ and $\hat{S}_1$ is that the two conformations act like independent molecules, and the take-home message is that a single RNA sequence can form two or more seemingly independent structures with their own characteristic properties.

Fig. 20 Barrier trees in RNA (re)folding. RNA secondary structures $S_l$ formed by one sequence $X_k$ fall into three classes: (i) local minima of the energy surface, which are surrounded exclusively by suboptimal structures with higher energies ($S_j$ and $S_i$; black), (ii) saddle points, which have at least two nearest neighbors in shape space belonging to two distinct basins (red), and (iii) unstable structures that are neither local minima nor saddle points (blue). The reaction coordinate is a path in shape space, which leads from one local minimum (conformation $S_j$) to another local minimum (conformation $S_i$) through a sequence of consecutive moves. The barrier tree is constructed by neglect of all structures except the local minima of the energy surface, $S_j$ and $S_i$, and the lowest saddle point, $T_{ij}$, connecting them (the energies of class (iii) structures are indicated for convenience as blue energy levels); a specific example is shown in Fig. 18

Stochastic RNA folding kinetics has been studied at elementary step resolution based on the three moves base pair closing, base pair opening, and base pair shift shown above (Flamm et al. 2000). Application of the algorithm to a relatively small RNA molecule of chain length $l = 115$, the SV-11 variant of the RNA of the Escherichia coli bacteriophage Qβ, helped to resolve a puzzle. Biebricher et al. (1982) studied SV-11 in a series of elegant experiments and got hints on the existence of (at least) four different conformations (Biebricher and Luce 1992; Zamora et al. 1995): (i) the unfolded chain $\hat{S}_{op}$, (ii) a very stable single stranded form $S_0$, which was inactive in replication, (iii) a replicating metastable conformation $\hat{S}_1$, and (iv) a double stranded dimer ($S^+ S^-$) that did not replicate either. The primary structure or sequence of SV-11 is

$X_k$: GGGCACCCCC|CUUCGGGGGG|UCACCUCGCG|UAGCUAGCUA|CGCGAGGGUU|AAAGCGCCUU|-|UCUCCCUCGC|GUAGCUAACC|ACGCGAGGUG|ACCCCCCGAA|AAGGGGGGUU|UCCCA

and the kinetic folding algorithm applied to $X_k$ revealed that the ground state conformation $S_0$ is reached by only 16% of the folding trajectories. The remaining 84% converge to a different dominant structure, the metastable conformation $\hat{S}_1$ (Fig. 21).[28] The free energies of folding of both conformations, $\Delta G^0(S_0) = -88.0$ kcal/mol and $\Delta G^0(\hat{S}_1) = -62.0$ kcal/mol, differ by 26 kcal/mol. The long hairpin of the ground state conformation prevents melting of the secondary structure that is required for replication. The metastable conformation in turn has shorter stacks and can be readily copied by the replicating enzyme Qβ-replicase. In chemical kinetics the phenomenon of two different reaction products of this kind is known as thermodynamic and kinetic control of products, one being the thermodynamic product $S_0$ and one a product under kinetic control, $\hat{S}_1$. This simple example shows that the yield of reproducible entities need not be 100% and that the active form of the product may be different from the most stable, thermodynamic conformation.

Fig. 21 The Qβ-variant SV-11. The figure shows the secondary structures of the metastable conformation $\hat{S}_1$ and the ground state $S_0$ of SV-11 (Flamm et al. 2000)

4 Topology of Sequence and Shape Spaces

Population genetics serves as a natural framework in which key concepts of evolutionary biology, such as phenotypic adaptation, the evolution of gene sequences, or the process of speciation, can be understood. Patterns of phenotypic evolution, however, such as the punctuated mode (the partially discontinuous nature) of evolutionary change, developmental constraints or constraints to variation, phenotypic stability and homology, certain aspects of directionality in evolution, and in particular innovations including the jump of a virus to a new host are not adequately described by population genetics models. This is a consequence of the fact that natural selection can determine the fate of a new phenotype only after it has been produced, i.e., accessed by means of the variational mechanisms of genotypic evolution. Theories that describe evolution at the level of phenotypes or phenotypic parameters therefore crucially depend on the genotype-phenotype (GP) map, which determines how phenotypes vary with changes of genotypes (Lewontin 1974; Wagner and Altenberg 1996; Fontana and Schuster 1998b). The key insight is that a researcher's idea of similarity between phenotypes may be entirely unrelated to their mutual accessibility. For instance, it is hard to conceive a direct transition between the fins of fishes and whales despite their functional similarity, while the ostensibly different C3 and C4 carbon fixation systems in plants are prone to transitions and can even coexist in the same organism (Niklaus and Kelly 2018). The success of evolution finally depends crucially on the repertoire of phenotypes that are instantaneously accessible. Accessibility, in turn, depends on the mechanism(s) that produce the underlying genetic diversity, i.e., on the accessibility among the genotypes. In the simplest case, of course, genotypic accessibility is given by (point) mutations and leads us back to sequence spaces.

[28] Both secondary structures, of course, do not reproduce the base pairing patterns exactly but the qualitative features used in the interpretation of the experimental data seem to be correct.


4.1 Accessibility

The properties of the GP-map of the RNA secondary structure model ensure that RNA evolution can be understood at an approximate level by means of RNA structures. This insight relies on several features that cannot be taken for granted in other settings. (i) The extensive neutral networks of a secondary structure facilitate efficient diffusion in sequence space without phenotypic change. (ii) Shape space covering ensures that many of the phenotypic transitions caused by a single mutation are similar for most of the sequences that belong to the neutral network of a given secondary structure. As a result, it makes sense to consider the frequency of encountering the structure $S_k$ as a point mutant of a sequence $X_i$ that folds into the shape $S_i$. More formally, the accessibility can be quantified as follows:

$$ a(S_k \leftarrow S_i) := \frac{1}{|\psi^{-1}(S_i)|} \sum_{X_i:\,\psi(X_i)=S_i} \frac{1}{|N(X_i)|} \sum_{X_j \in N(X_i)} \delta_{\psi(X_j),S_k} \tag{24} $$
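Equation (24) averages, over all sequences folding into $S_i$, the fraction of one-error mutants that fold into $S_k$. For a small genotype space this can be evaluated by exhaustive enumeration. The sketch below is purely illustrative: the function `accessibility` and the toy map `toy_fold` (binary genotypes of length four, phenotype = number of ones capped at two) are hypothetical stand-ins for RNA folding, chosen only so that phenotypes of different abundance arise.

```python
from itertools import product


def accessibility(fold, alphabet, length):
    """Return a dict a[(target, source)] following Eq. (24) by full enumeration."""
    preimage = {}
    for genotype in product(alphabet, repeat=length):
        preimage.setdefault(fold(genotype), []).append(genotype)
    a = {}
    for source, members in preimage.items():
        for g in members:
            # N(X_i): all point mutants of genotype g
            neighbors = [g[:pos] + (letter,) + g[pos + 1:]
                         for pos in range(length)
                         for letter in alphabet if letter != g[pos]]
            weight = 1.0 / (len(neighbors) * len(members))
            for mutant in neighbors:
                key = (fold(mutant), source)
                a[key] = a.get(key, 0.0) + weight
    return a


def toy_fold(genotype):
    """Hypothetical GP map: phenotype 0 is rare, phenotype 2 is abundant."""
    return min(sum(genotype), 2)


a = accessibility(toy_fold, alphabet=(0, 1), length=4)
print("a(1 <- 0) =", a[(1, 0)])
print("a(0 <- 1) =", a[(0, 1)])
```

In this toy map the rare phenotype 0 always mutates into phenotype 1, whereas phenotype 1 reaches phenotype 0 in only a quarter of its point mutants, which illustrates that accessibility need not be symmetric.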

Importantly, accessibility, in contrast to distance, is not a symmetric measure. This is most clearly seen when a threshold value is used and only the frequently accessible neighbors $S_k$ of $S_i$ with $a(S_k \leftarrow S_i) \geq a_0$ are considered. A simple analogy is formed by countries and their borders. Austria and Germany share a long border that accounts for a sizeable fraction of each country's total border length: thus they are mutually frequently accessible. On the other hand, Austria shares a minute fraction of its borders with Liechtenstein, which is a tiny country, and hence this short stretch of border is nearly half of Liechtenstein's border length. While Liechtenstein is not frequently accessible from Austria, Austria is frequently accessible from Liechtenstein. The same situation pertains to RNA structures and phenotypes in general. Rare phenotypes, in particular, are not frequently accessible from abundant ones and thus are usually overlooked by evolution and biotechnology (Fontana and Schuster 1998a, b). Let us define the accessible set of a phenotype as

$$ C(S_i) := \{S_k \mid a(S_k \leftarrow S_i) \geq a_0\} \tag{25} $$

Due to the high level of neutrality in RNA secondary structures, we may safely assume $S_j \in C(S_j)$. In the case of RNA secondary structures it is possible, as a further approximation, to relate changes in structure that are frequent, easily accessible, or continuous (C) to local structural motifs (Fig. 22). In particular, one frequently observes the elongation or the shortening of a stem and the opening of constrained stems, i.e., those that enclose multiloops. These simple rules (Fontana and Schuster 1998a, b) can then be used to replace the more quantitative rule of Eq. (25). In addition to the continuous transitions, major and less frequent structure-changing events are observed, which are classified as rare or discontinuous transitions (D). Some transitions of this class are listed in Fig. 23 and will be discussed in the next section on the basis of RNA evolution in computer simulation. The rare transitions involve simultaneous opening and closing of several base pairs, they occur with low probability, and they are inaccessible in the sense of accessibility defined in Eq. (24). Either way, accessibility renders the phenotype space a directed graph since transitions are no longer symmetric ($A \to B \not\equiv B \to A$).

Fig. 22 Continuous or frequent transitions. In the upper part transitions between phenotypes $A \leftrightarrow B$ are shown, which occur frequently in both directions, because they connect phenotypes within the overlap of the accessibility domains, $a \geq a_0$, of A and B. Adding single base pairs to stacks or removing single base pairs from (unconstrained) stacks are examples. A constrained stack, for example a stack ending in a multiloop, is easily opened because B lies in the accessibility range of A but hard to close since A is not accessible from B. This shows that the accessibility of transitions is not commutative. The inserts on the right-hand side show sketches of the ranges of accessibility of the two phenotypes (red and green) and their overlap (yellow). In the upper process both transitions $A \to B$ and $B \to A$ remain within the range of accessibility of A, are continuous (C), and occur frequently. The transition $B \to A$ in the second process involving constrained stacks leaves the region of accessibility of B and is a discontinuous or rare transition (D). The row at the bottom shows a cascade of continuous transitions (C), which replaces a discontinuous transition (D) with the help of four intermediates $A_i$ ($i = 1, \ldots, 4$)

Fig. 23 Discontinuous or rare transitions. The rare transitions sketched in the figure involve simultaneous changes of several base pairs and do not occur frequently. Nevertheless, they were encountered in a computer simulation of RNA evolution in the flow reactor (Fontana and Schuster 1998a, b). Both target shapes lie outside the range of accessibility from the initial structure

4.2 Evolution in silico and Discontinuous Transitions

A typical computer experiment modeling RNA evolution in the flow reactor (Fig. 25) is shown in Fig. 24 (Fontana and Schuster 1998a): A population $\Pi$ of $N$ RNA molecules replicates in the flow reactor, occasional replication errors create mutants, which compete with the molecules that are already present in the reactor, and randomly chosen molecules leave the reactor in order to compensate for population growth (see Sect. 5.2). The selection criterion is encoded in the fitness function on the RNA landscape, $f_k = \Phi(S_k) = \Phi\bigl(\Psi(X_k)\bigr)$, and in particular the replication rate of individual RNA sequences $X_k$ (28b) is assumed to increase with decreasing structure distance of the shape $S_k$ from the target shape $S_0$, i.e., $h_k\bigl(d_S(S_k, S_0)\bigr)$ is monotonically decreasing. Thus the computer simulation models evolution that is directed toward a predefined target structure. In order to trace the evolutionary process the temporal succession of shapes rather than individual sequences is considered and a relay series is constructed: The relay series is a reconstruction of the evolution in retrospect. The starting point is the target shape $S_0$ at the time when the population has reached the target. The parental shape $S_1$, the structure derived from the sequence $X_1$, is the precursor of $S_0$. Next comes $S_2$ from which $S_1$ was produced by mutation $S_2 \to S_1$, etc. The relay series ends with the last shape $S_m$ in the initial population. In the forward direction of increasing time the relay series is a sequence of $m$ transitions, $S_m \to S_{m-1}$, $S_{m-1} \to S_{m-2}$, ..., $S_1 \to S_0$, ending at the target shape. A typical relay series contains many continuous (C) and a few discontinuous (D) transitions (see Fig. 24, for example). In essence both classes of transitions are necessary: Discontinuous transitions are responsible for the major rearrangements of structure and continuous transitions do the fine tuning. The approach to target of the population as a whole, $\Pi(t)$, is recorded by a trajectory plotting the average structure distance from the target structure as a function of time $t$,

$$ d_S\bigl(\Pi(t)\bigr) = \frac{1}{N} \sum_{i=1}^{N} d_S(S_i, S_0)\,;\quad S_i = \Psi(X_i) \wedge X_i \in \Pi(t)\,. \tag{26} $$

Apart from a fast initial sequence of transitions, where shapes with higher fitness are readily accessible from most sequences, the approach to the target structure occurs in steps: Long periods of stasis with random fluctuations tantamount to random genetic drift are interrupted by short phases of major progress in the approach.
Each of the fast episodes is initiated by a discontinuous transition, which is often followed by a cascade of continuous transitions and a period of stasis. A straightforward interpretation of the static periods is random drift of the population on a neutral network until a position is found from where a discontinuous transition is less improbable (Schuster 2010). The synthetic theory of evolution, and in particular Ronald Fisher's view, based evolution in essence on a great number of very small changes, a concept known as gradualism. It had no room for discontinuous, large transitions and ridiculed Richard Goldschmidt's hopeful monsters (Goldschmidt 1940), which Goldschmidt considered necessary for an explanation of speciation. Evidence for the occurrence of large, discontinuous transitions, however, came from the paleontologists Niles Eldredge and Stephen Jay Gould in the form of the theory of punctuated equilibria (Eldredge and Gould 1972; McGowran 2005). The results of the described computer evolution experiment were nicely confirmed by the results of the long-term laboratory experiment over 50 000 generations with Escherichia coli by Richard Lenski and his coworkers (2012, 2020). Chouard (2010) describes the current situation brilliantly in his essay Revenge of the hopeful monsters:


Fig. 24 A typical trajectory of RNA evolution. The plot shows a computer simulation of RNA evolution in the flow reactor, which has been constrained stochastically to a population size $N$ (Fontana and Schuster 1998a, b) (the flow reactor is described in Sect. 5.1 and Fig. 25). A population $\Pi(t)$ of $N = 1000$ RNA molecules with constant chain length $l = 76$ evolves from the structure $S_{41}$ of some random sequence $X_{41}$ toward the structure of tRNA$^{phe}$ shown as $S_0$. The process is terminated when the target structure has appeared in the population. The progress of the process is monitored by the mean distance to target of the population, $d_S\bigl(\Pi(t)\bigr)$ (26) (black curve). The relay series is shown (with equal step heights) in green, six selected shapes of the series are shown on top, and the presence of the individual shapes in the reactor is indicated by thin red stretches. Discontinuous transitions are marked by D and classified according to Fig. 23, two selected continuous transitions in the relay series are indicated by C, and a shift transition around $t \approx 460$ that has no effect on fitness is addressed as silent shift (gray). The fitness is expressed here in terms of the replication rate constant of individual sequences $X_k$ ($k = 1, \ldots, N$), $h_k$ in Eq. (28b): $h_k = \frac{1}{\alpha + d_S(S_k, S_0)/l}$. Choice of parameters: $N = 1000$, $p = 0.001$; one computer run amounts to $\approx 10^7$ replications
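The replication rate constant of the caption and the population average of Eq. (26) are simple to compute once a structure distance is fixed. The sketch below rests on two assumptions that are not given in the text: the structure distance $d_S$ is taken to be the Hamming distance between dot-bracket strings, and the offset is set to $\alpha = 0.01$. All function names are hypothetical.

```python
def structure_distance(s, t):
    """ASSUMPTION: d_S is the Hamming distance between dot-bracket strings."""
    if len(s) != len(t):
        raise ValueError("structures must have equal length")
    return sum(a != b for a, b in zip(s, t))


def replication_rate(structure, target, alpha=0.01):
    """h_k = 1 / (alpha + d_S(S_k, S_0) / l); alpha = 0.01 is an assumed value."""
    return 1.0 / (alpha + structure_distance(structure, target) / len(target))


def mean_distance_to_target(structures, target):
    """Population average of the distance to target, as in Eq. (26)."""
    return sum(structure_distance(s, target) for s in structures) / len(structures)


target = "((((....))))"
population = ["((((....))))", "(((......)))", "............"]
print([round(replication_rate(s, target), 3) for s in population])
print(mean_distance_to_target(population, target))
```

The offset $\alpha$ keeps the rate constant finite when the target structure itself is reached.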


Experiments have revealed how single mutations can have huge effects. But small steps pave the way. Both kinds of transitions are indispensable for understanding evolution and the molecular approach provides a great variety of fascinating details. RNA virus evolution certainly follows the same principles as the RNA model.

4.3 Accessibility and the Topology of Phenotype Space

It has turned out to be even more fruitful, however, to view accessibility from a different mathematical perspective. Instead of considering each phenotype in isolation one may also consider accessibility for a set $A$ of phenotypes. If these evolve independently of each other, we simply have $C(A) = \bigcup_{S_i \in A} C(S_i)$, i.e., the union of their accessibilities. To a first approximation, this is the case for the evolution of many viruses. More generally, however, $C(A)$ can also account for recombination and reassortment, which are key factors, e.g., in the evolution of influenza (Pérez-Losada et al. 2015). Given a population of phenotypes $A$, the set $C(A)$ is simply the set of phenotypes that is easily generated as offspring. In general, one will assume that evolution is slow enough to ensure $A \subseteq C(A)$, i.e., that evolution allows us to retain the current phenotypes in addition to producing novelty, namely the phenotypes $C(A) \setminus A$. It stands to reason, finally, that making the population more diverse will provide access to more or at the very least to the same diversity of novel phenotypes. Therefore, $A \subseteq B$ implies $C(A) \subseteq C(B)$. This property, which in the case of independent evolution follows directly from the definition of $C(A)$, is called isotony. The mathematical field of topology (Čech 1966; Adams and Franzosa 2008) is concerned with the properties of objects that are preserved under a wide range of deformations, which do not require specific, quantitative measures of distances or similarities. In particular, it provides a formal basis for concepts such as boundaries and interiors of sets, and a distinction between continuous and discontinuous relationships. This is done by imposing topological structures on a set of objects. As it turns out, the notion of accessibility (24) captured in the function $C$ that maps sets of phenotypes to accessible sets of phenotypes has just the right basic properties for this purpose.
A function that is isotonic, satisfies $A \subseteq C(A)$, and adheres to the extra axiom $C(\emptyset) = \emptyset$, which forbids the creation of phenotypes out of nothing, is known as a closure function and turns the phenotype space into a so-called neighborhood space. The interior $I(A)$ of a set $A$ consists of those phenotypes that are not accessible from outside of $A$, i.e., $I(A) := X \setminus C(X \setminus A)$, and the boundary of the set is that which is accessible both from inside and outside, i.e., $\partial A := C(A) \cap C(X \setminus A)$. A central notion for the application of topology is continuity. Formally, a function $f: X \to Y$ between two neighborhood spaces is continuous if $C_X\bigl(f^{-1}(B)\bigr) \subseteq f^{-1}\bigl(C_Y(B)\bigr)$ holds for all sets $B$. In neighborhood spaces this is equivalent to closure preservation (CP), i.e., if $f(A) \subseteq B$ then $f\bigl(C_X(A)\bigr) \subseteq C_Y(B)$. For the GP-map $\psi$, this means that generating easily accessible variants of genotypes, for example, through point mutations, only produces easily accessible variants of phenotypes. In general, this will not be the case, since occasionally point mutations will also lead to phenotypes that are not frequently accessible, as discussed above (Fig. 23). Thus the GP-map is usually not continuous. One can also use this idea to consider trajectories, i.e., temporally ordered sequences of genotypes $X_1, X_2, \ldots, X_k$ and the corresponding sequences of phenotypes $S_i = \Psi(X_i)$. It turns out that a step along the trajectory is discontinuous if and only if $\Psi(X_{j+1}) \notin C\bigl(\Psi(X_j)\bigr)$, or in other words whenever the phenotype $S_{j+1}$ is not easily accessible from $S_j$ (Stadler et al. 2001). Of course, this matches our intuition all along (Fontana and Schuster 1998a, b). Nevertheless, it is reassuring to see that the distinction between continuous and discontinuous transitions fits seamlessly into a coherent mathematical framework.
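On a finite phenotype space the closure function, the interior, the boundary, and the discontinuity test for trajectories can be written down directly. The sketch below assumes independent evolution, so that $C(A)$ is the union of the accessible sets of the members of $A$; the dictionary `access` with the three phenotypes `"A"`, `"B"`, `"C"` is a made-up example, and the function names are hypothetical.

```python
def closure(access, subset):
    """C(A): union of the accessible sets C(S_i) of all phenotypes S_i in A."""
    result = set()
    for phenotype in subset:
        result |= access[phenotype]
    return result


def interior(access, space, subset):
    """I(A) = X minus C(X minus A): phenotypes not accessible from outside A."""
    return space - closure(access, space - subset)


def boundary(access, space, subset):
    """Boundary of A: accessible both from inside and from outside of A."""
    return closure(access, subset) & closure(access, space - subset)


def discontinuous_steps(access, phenotypes):
    """Indices j for which S_{j+1} is not in C(S_j) along a trajectory."""
    return [j for j in range(len(phenotypes) - 1)
            if phenotypes[j + 1] not in access[phenotypes[j]]]


# Hypothetical accessible sets: B is accessible from A, but A is not accessible from B.
access = {"A": {"A", "B"}, "B": {"B", "C"}, "C": {"C"}}
space = set(access)

print(closure(access, {"A"}))                      # {'A', 'B'} (set order may vary)
print(boundary(access, space, {"A"}))              # {'B'}
print(discontinuous_steps(access, ["A", "B", "A"]))  # [1]
```

The last line shows the asymmetry discussed in the text: the step from A to B is continuous, whereas the reverse step from B to A is flagged as discontinuous.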

4.4 Coarse Graining and Quotients

So far, we have considered the phenotype space, in particular the shape space of RNAs, at a particular resolution, namely that of the RNA secondary structure model. As alluded to before, this particular representation can be viewed as a coarse graining obtained by partitioning the much larger space of 3D conformations into subsets in which a specific set of base pairs is recognizably formed. The software tool RNApdbee (Zok et al. 2018), for instance, performs this task computationally by extracting the secondary structures from 3D models of the RNA molecules stored in the Protein Data Bank (PDB). In the setting of topology, this kind of coarse graining is captured by quotient spaces, which inherit their notion of accessibility from the more fine-grained models. An example of a quotient space of coarse-grained RNA secondary structures is given by the equivalence classes of secondary structures that share the same branching pattern. A coarse-grained structure class $Q_s$ is accessible from a class $Q_r$ if there is a structure $S_k \in Q_s$ that is frequently accessible from $S_i \in Q_r$, i.e., $a(S_k \leftarrow S_i) \geq a_0$. Models of RNA that are even coarser than secondary structures have played a role in understanding RNA biology as well. Examples include stem-centered models of RNA folding kinetics (Mironov and Lebedev 1993) in which the length of helical regions and interconnecting loops is disregarded. An entire hierarchy of coarse-grainings has been considered in the approach available as the computer program RNAshapes (Giegerich et al. 2004; Björn et al. 2006). Levels of interest include secondary structures with the same arrangement of stems, obtained by ignoring distinctions between the lengths of stems and loops, and the branching structure, where interior loops and bulges are ignored as well.
Practical applications of such coarse-grained models include the construction of statistical tests for the conservation of RNA structure in an evolutionary context (Gruber et al. 2008). Recently this approach has also been used to show that natural RNA phenotypes indeed conform to frequent shapes (Dingle et al. 2020), as predicted for landscapes dominated by neutral networks and shape space covering (Schuster et al. 1994).
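A coarse graining to the branching pattern can be sketched as a map from dot-bracket strings to shorter bracket strings. The function below is a simplified illustration written for this purpose and is not the RNAshapes program: it drops all unpaired positions and then merges directly nested base pairs, so that stem lengths, loop sizes, bulges, and interior loops are all ignored. The name `branching_pattern` and the output alphabet `[` and `]` are arbitrary choices.

```python
def branching_pattern(structure):
    """Map a dot-bracket structure to its branching pattern, e.g. '[[][]]'."""
    brackets = [symbol for symbol in structure if symbol in "()"]  # drop unpaired bases
    partner, stack = [0] * len(brackets), []
    for i, symbol in enumerate(brackets):
        if symbol == "(":
            stack.append(i)
        else:
            j = stack.pop()
            partner[i], partner[j] = j, i
    shape = []
    for i, symbol in enumerate(brackets):
        if symbol == "(":
            # Skip a pair that is directly enclosed by the preceding pair.
            enclosed = i > 0 and brackets[i - 1] == "(" and partner[i - 1] == partner[i] + 1
            if not enclosed:
                shape.append("[")
        else:
            enclosed = (i + 1 < len(brackets) and brackets[i + 1] == ")"
                        and partner[i + 1] == partner[i] - 1)
            if not enclosed:
                shape.append("]")
    return "".join(shape)


for s in ["(((...)))", "((..((...))..))", "((..))((..))", "((.((..)).((..)).))"]:
    print(s, "->", branching_pattern(s))
```

All structures with the same output string belong to the same class $Q$ of the quotient space; a hairpin with an interior loop, for example, falls into the same class as a plain hairpin.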


From a mathematical point of view, this kind of coarse graining preserves the continuity of transitions. That is, if an evolutionary trajectory is continuous at a certain level of description, it is also continuous at a coarser one. The converse, however, is not true, i.e., perceived continuity may be lost when the resolution is increased.

5 Quasispecies—Populations in Sequence Space

Evolution can be viewed as the temporal change of population structure in sequence space. In this setting, the natural focus is on the relation between the distribution of fitness values and population dynamics $\Pi(t)$: How is the fitness landscape reflected by the distribution of variants in the population? Populations of molecular species $X_i$, which are capable of reproduction and therefore often called replicators (for example RNA molecules or variants of a virus), change as a result of reproduction, mutation, and selection. A simple experimental setup for studying evolution in vitro is provided by a flow reactor (Fig. 25) that represents an open system with an inflow of stock solution containing the material required for reproduction, A, with a flow rate $r = dV/dt$. The outflow of solution from the reactor compensates the gain in volume $V$. Two issues are especially relevant for viruses: (i) the influence of mutation rates on population structures, and (ii) the role of population size in natural selection and evolution. Eigen (1971) conceived a theory of molecular evolution, which is based on chemical kinetics and molecular biology and which can provide answers to these questions. Here, we present quasispecies theory only in the form of a brief account (Schuster 2013a) and refer to the literature for more detailed presentations (Eigen and Schuster 1978; Eigen et al. 1989; Schuster 2016).

5.1 Replication and Selection

A population $\Pi = \{X_1, \ldots, X_n\}$ is an ensemble of $n$ different molecular species $X_i$, which are present in the copy numbers $N_i(t) = [X_i]$ ($i = 1, \ldots, n$). Vector representation in an $n$-dimensional state space is of the form

$$ \Pi(t) = \bigl(N_1(t), \ldots, N_n(t)\bigr) \ \text{ with } \ \sum_{i=1}^{n} N_i(t) = N(t) \quad\text{or}\quad \Pi(t) = N_1(t)\, e_1 + \ldots + N_n(t)\, e_n\,, \tag{27} $$

where $N(t)$ is the population size and $e_i$ a unit vector in state space pointing in the direction of species $X_i$. Without mutation the chemical rate equations are:


Fig. 25 A flow reactor for the evolution of RNA molecules. A stock solution containing all materials for RNA replication, $[A] = a_0$, including an RNA polymerase flows continuously into a well-stirred tank reactor (CSTR) at a flow rate $r = dV/dt$ and a sample of equal volume containing reaction mixture, $[S] = [A, B, X_i;\ i = 1, \ldots, n]$ with the concentrations $s = a, b, c_i = N_i/V$, leaves the reactor (for other setups see Watts and Schwarz 1997). A population of RNA molecules, $\Pi = \{X_1, X_2, \ldots, X_n\}$, present in numbers $N_1(t), N_2(t), \ldots, N_n(t)$ with $N(t) = \sum_{i=1}^{n} N_i(t)$, replicates and mutates in the reactor. The flow provides the selection pressure: If the effective reproduction rate of an RNA molecule is too small it is diluted out of the reactor. At constant environmental conditions the reactor approaches a stationary state at which the population fluctuates around a mean value, $N \pm \sqrt{N}$. As a counterpart to natural selection the fastest replicators prevail in the stationary state. The RNA flow reactor has also been used as an appropriate tool for computer simulations (Fontana and Schuster 1987, 1998a, b; Huynen et al. 1996). There, criteria other than selection for fast replication were applied too. For example, a fitness function may be defined that measures the distance to a predefined target structure and then mean fitness increases as the distance decreases during the approach toward the target (see Fontana and Schuster 1998a and Sect. 4.2)

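$$
\begin{aligned}
\ast \;&\xrightarrow{\;[A]_0 \cdot r\;}\; A , &&\text{(28a)}\\
A + X_i \;&\xrightarrow{\;h_i\;}\; 2\,X_i , &&\text{(28b)}\\
X_i \;&\xrightarrow{\;d_i\;}\; B , &&\text{(28c)}\\
A,\ B,\ X_i \;&\xrightarrow{\;r\;}\; \emptyset . &&\text{(28d)}
\end{aligned}
$$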


Herein $A$ represents the symbol for all materials required for reproduction, $[A]_0 = a_0$ is the concentration of $A$ in the stock solution, and $B$ stands for the degradation products. The overall rate constant for the multi-step replication process of the individual replicator $X_i$ is denoted by $h_i$, and $d_i$ is its degradation rate constant. The relation to biological evolution is made through the fitness values: $f(X_i) = f_i = h_i \cdot [A]$. The variables of the flow reactor model are the numbers of the individual replicators $X_i$ at time $t$, $N_i(t)$, or their concentrations $c_i(t) = N_i(t)/V$. The concentration of the building material is written as $a(t) = [A] = N_A/V$, and for $B$ no variable is required since it does not enter the kinetic equations of the steps (28a), (28b), and (28c). The kinetic equations of the $(n + 1)$ variables are

$$\frac{da}{dt} = -\Bigl(\sum_{i=1}^{n} h_i c_i + r\Bigr)\, a + r\, a_0 = -a \cdot \sum_{i=1}^{n} h_i c_i + r\,(a_0 - a) , \tag{29a}$$

$$\frac{dc_i}{dt} = h_i c_i \cdot a - r\, c_i = (h_i a - r)\, c_i ; \quad i = 1, \dots, n . \tag{29b}$$

The kinetic system sustains a conservation relation,

$$\frac{d}{dt}\Bigl(a + \sum_{i=1}^{n} c_i\Bigr) = r\,\Bigl(a_0 - a - \sum_{i=1}^{n} c_i\Bigr) ,$$

which at the stationary state defined by $d\bigl(a + \sum_{i=1}^{n} c_i\bigr)/dt = 0$ leads to $a + \sum_{i=1}^{n} c_i = a_0$. Normalized variables $\Pi/N = \mathbf{x} = \bigl(x_1(t), \dots, x_n(t)\bigr)$ with $x_i(t) = N_i(t)/N(t)$ and $\sum_{i=1}^{n} x_i(t) = 1$ leave $x_i(t)$ ($i = 1, \dots, n - 1$) and $N(t)$ as the $n$ independent variables. The introduction of an unspecific flow $\Phi(t)$, which allows for regulating the population size $N(t)$ instead of the flow control, renders the model more general: $r \Rightarrow \Phi(t)$. Finally, we assume the concentration of the building material to be buffered or available in excess, $[A] = \text{const}$, and with $f_i = h_i \cdot [A]$ obtain the selection equation:

$$\frac{dx_i}{dt} = f_i x_i - x_i \cdot \Phi(t) = \bigl(f_i - \Phi(t)\bigr)\, x_i ; \quad i = 1, 2, \dots, n . \tag{30}$$
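The flow reactor kinetics of Eqs. (29a) and (29b) and the conservation relation can be checked numerically. The following is a minimal sketch using forward-Euler integration; the function name, the rate constants, the flow rate, and the initial concentrations are hypothetical values chosen only for illustration and are not taken from the text.

```python
def simulate_flow_reactor(h, c0, a_start, a0, r, dt=0.01, steps=20000):
    """Forward-Euler integration of the flow reactor equations (29a), (29b).

    h       : replication rate constants h_i (illustrative values)
    c0      : initial replicator concentrations c_i(0)
    a_start : initial concentration of the building material A
    a0      : concentration of A in the stock solution
    r       : flow rate
    """
    a, c = a_start, list(c0)
    for _ in range(steps):
        uptake = sum(hi * ci for hi, ci in zip(h, c))      # sum_i h_i c_i
        da = -a * uptake + r * (a0 - a)                    # Eq. (29a)
        dc = [(hi * a - r) * ci for hi, ci in zip(h, c)]   # Eq. (29b)
        a += dt * da
        c = [ci + dt * dci for ci, dci in zip(c, dc)]
    return a, c


# Hypothetical parameters: three replicators, the third one replicates fastest.
h = [1.0, 1.5, 2.0]
a, c = simulate_flow_reactor(h, c0=[0.1, 0.1, 0.1], a_start=0.0, a0=1.0, r=0.5)
total = a + sum(c)   # conservation relation: a + sum_i c_i approaches a0

print("a =", a)
print("c =", c)
print("a + sum(c) =", total)
```

In this sketch the sum $a + \sum_i c_i$ relaxes to $a_0$, and only the replicator with the largest $h_i$ remains in the reactor, with $a$ settling at $r/h_{\max}$ so that its net growth rate $(h_i a - r)$ vanishes.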

The assumption of constant population size $N(t) = N = \text{const}$ requires a flow term of the form $\Phi(t) = \sum_{j=1}^{n} f_j x_j(t) = \bar{f}(t)$, which turns out to be the mean fitness of the population. The time dependence of the flow reveals an evolutionarily interesting property:

$$
\begin{aligned}
\frac{d\Phi}{dt} &= \sum_{i=1}^{n} f_i \frac{dx_i}{dt} = \sum_{i=1}^{n} f_i \Bigl( f_i x_i - x_i \sum_{j=1}^{n} f_j x_j \Bigr) \\
&= \sum_{i=1}^{n} f_i^2 x_i - \sum_{i=1}^{n} f_i x_i \sum_{j=1}^{n} f_j x_j = \overline{f^2} - \bigl(\bar{f}\,\bigr)^2 = \operatorname{var}\{f\} \ge 0 .
\end{aligned}
\tag{31}
$$

The flow or the mean fitness of the population is a non-decreasing function of time or, in other words, $\bar{f}(t)$ is optimized during evolution.²⁹ A general solution of Eq. (30) can be obtained by means of an integrating factor transformation (Zwillinger 1998, p. 322ff.) and is of the form:

$$x_i(t) = \frac{x_i(0) \cdot \exp(f_i t)}{\sum_{j=1}^{n} x_j(0) \cdot \exp(f_j t)} ; \quad i = 1, 2, \dots, n . \tag{32}$$

Under the assumption that the largest fitness value is non-degenerate, $\max\{f_i ;\ i = 1, 2, \dots, n\} = f_m > f_i \ \forall\, i \neq m$, every solution curve fulfilling the initial conditions $x_i(0) > 0 \ \forall\, i$ approaches the homogeneous population

$$\lim_{t \to \infty} x_m(t) = \bar{x}_m = 1 \quad\text{and}\quad \lim_{t \to \infty} x_i(t) = \bar{x}_i = 0 \ \ \forall\, i \neq m ,$$

the mean fitness of the population increases monotonically to its maximum value, $\lim_{t \to \infty} \bar{f}(t) = f_m$, and the fittest variant $X_m$ is selected. The mathematically simple model (30) of natural selection is based on ODEs in continuous variables $\mathbf{x}(t)$ and hence implicitly assumes infinite population size $N$. The scenario described by Eq. (30) is a good approximation only if $N \gg n$ or, in other words, if the number of different molecular species $n$ is negligibly small compared to the total population size.
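The closed-form solution (32) and the optimization principle (31) lend themselves to a quick numerical check. The sketch below evaluates Eq. (32) at a few time points and records the mean fitness; the fitness values and initial frequencies are hypothetical, and the exponents are shifted by the largest fitness value only to avoid floating-point overflow (the shift cancels in the ratio).

```python
import math


def selection_solution(x0, f, t):
    """Evaluate the solution (32) of the selection equation (30) at time t."""
    f_max = max(f)
    # Subtracting f_max*t from every exponent leaves the ratio unchanged.
    weights = [xi * math.exp((fi - f_max) * t) for xi, fi in zip(x0, f)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_fitness(x, f):
    """Mean fitness, which equals the flow Phi(t) at constant population size."""
    return sum(fi * xi for fi, xi in zip(f, x))


# Hypothetical example: the fittest variant starts as a rare species.
f = [1.0, 1.2, 1.5]
x0 = [0.90, 0.09, 0.01]
times = [0.0, 5.0, 10.0, 20.0, 40.0]
phis = [mean_fitness(selection_solution(x0, f, t), f) for t in times]
x_final = selection_solution(x0, f, times[-1])

for t, phi in zip(times, phis):
    print(f"t = {t:5.1f}   mean fitness = {phi:.6f}")
print("x(t_final) =", x_final)
```

With these values the mean fitness never decreases along the sampled times and approaches the largest fitness value, while the initially rare fittest variant takes over the population.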

²⁹ The derivation presented here is the naïve version of Fisher's fundamental theorem of natural selection (see Fisher 1930, p. 35 or Fisher 1958, p. 37) in a nutshell. For detailed discussions see Ewens (1989), Frank and Slatkin (1992), Lessard (1997), Price (1972).

5.2 Replication, Mutation and Selection

Eigen's theory of molecular evolution (Eigen 1971) considers correct replication and mutations as parallel chemical reactions (Fig. 26). Reproduction is a highly complex process even in the simplest case of enzymatic single strand replication of RNA and DNA (Biebricher et al. 1983). RNA replication is at the core of reproduction of conventional RNA viruses, where the enzyme, an RNA dependent RNA polymerase R, fulfills two tasks: (i) it catalyzes the synthesis of a strand that is complementary to the template, $A + X_i^{(+)} + R \to X_i^{(-)} + X_i^{(+)} + R$ or $A + X_i^{(-)} + R \to X_i^{(+)} + X_i^{(-)} + R$, and (ii) it takes care that formation of the double stranded RNA duplex $X_i^{(+)} \cdot X_i^{(-)}$ is avoided. The double strand, once formed, does not melt at the replication temperature and accordingly is no template for the replicase R. In the replication of the bacteriophage Qβ RNA the enzyme makes use of double helix complementarity at the active site but then separates plus and minus strand through secondary structure formation of the growing strand (Weissmann 1974). Single strand DNA is multiplied routinely by means of the polymerase chain reaction (PCR) technique. In this case the DNA dependent DNA polymerase (commonly a heat resistant enzyme of the bacterium Thermus aquaticus known as Taq polymerase) completes the single strand to a duplex, and double strand separation is achieved by raising the temperature. Apart from a transient phase the plus-minus ensemble $X_i^{(\pm)} = \{X_i^{(+)}, X_i^{(-)}\}$ grows like a single (simple) replicator with a fitness value $f_i^{(\pm)} = \sqrt{f_i^{(+)} \cdot f_i^{(-)}}$, and the stationary ratio of the strands in complementary replication fulfills $X_i^{(+)} / X_i^{(-)} = \sqrt{f_i^{(-)} / f_i^{(+)}}$ (Eigen 1971) (see also Stadler 1991; Schuster 2013a).

Fig. 26 Replication and mutation of single stranded polynucleotides. The figure sketches a replication mechanism for single strand RNA or DNA. The replicase molecule (blue) binds the template molecule $X_j$ (orange). The material required for reproduction is denoted by $A$; the various variants produced by correct replication and mutations are denoted by $X_i$ ($i = 1, \dots, n$) and characterized by different colors. Since the initiation of replication does not interfere with the copying process, the total replication rate of the template $X_j$ is $f_j$ for the production of correct copies or mutants. The factor $Q_{ij}$ measures the frequency at which $X_i$ is obtained as an error copy of $X_j$: $X_j \to X_i$ ($i = 1, \dots, n$) with $\sum_{i=1}^{n} Q_{ij} = 1$. An important task of the enzyme or the experimental setup is to prevent the formation of double stranded duplexes $X_j^{(+)} \cdot X_j^{(-)}$

Mutations are introduced into the selection equation (30) as replication errors. Then, the simple autocatalytic process $A + X_j \to 2X_j$ is modified to $n$ parallel reactions of the form $A + X_j \to X_i + X_j$ with $i = 1, \dots, n$ where, as shown in Fig. 26, $X_i$ is either a correct copy or an error copy (a mutant) of $X_j$. The initiation of the replication process is independent of an error occurring later during the copying process, and hence the total replication rate, correct copies and errors, is the same: $f_j$ for the template $X_j$. The frequency of a mutation $X_j \to X_i$ is denoted by $Q_{ij}$. All mutation frequencies are subsumed in the mutation matrix

$$
Q = \begin{pmatrix}
Q_{11} & Q_{12} & \dots & Q_{1n} \\
Q_{21} & Q_{22} & \dots & Q_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
Q_{n1} & Q_{n2} & \dots & Q_{nn}
\end{pmatrix}
= \{ Q_{ij} \ \forall\ i, j = 1, \dots, n \} . \tag{33}
$$

Since any replication event yields either a correct copy or a mutant, $Q$ is a stochastic matrix: $\sum_{i=1}^{n} Q_{ij} = 1$.³⁰ With $p$ denoting the error rate, $(1 - p)$ is the frequency at which the correct nucleotide is incorporated into the newly synthesized polynucleotide. Several expressions become simpler when the ratio of mutation and correct copying, $\varepsilon = p/(1 - p)$, is used. Introduction of the mutation term into Eq. (30) yields the mutation-selection equation

$$\frac{dx_i}{dt} = \sum_{j=1}^{n} Q_{ij} f_j x_j - x_i \cdot \Phi ; \quad i = 1, 2, \dots, n \quad\text{with}\quad \Phi = \sum_{i=1}^{n} f_i x_i = \bar{f} , \tag{34}$$

where again the restriction to constant total population size $N$ has been used:

$$\sum_{i=1}^{n} \sum_{j=1}^{n} Q_{ij} f_j x_j = \sum_{j=1}^{n} f_j x_j \sum_{i=1}^{n} Q_{ij} = \sum_{j=1}^{n} f_j x_j = \bar{f} = \Phi .$$

³⁰ In case one makes the (unrealistic) assumption of equal probabilities of $X_j \to X_i$ and $X_i \to X_j$, the matrix $Q$ is symmetric and hence a bistochastic matrix. A symmetric matrix $Q$ facilitates the construction of computer models for mutation but is not essential for the derivation of solutions.

Vector notation is particularly useful for the calculation of the solution of the differential equation (34): We define a matrix of fitness values, the diagonal matrix $F = \{F_{ij} = f_i \cdot \delta_{ij}\}$ or $F = \operatorname{diag}(f_1, \dots, f_n)$, and a value matrix $W = Q \cdot F = \{W_{ij} = Q_{ij} \cdot f_j\}$. Then the mutation-selection equation takes on the form

$$\frac{d\mathbf{x}^{t}}{dt} = (Q \cdot F - \Phi) \cdot \mathbf{x}^{t} , \tag{34'}$$

where $\mathbf{x}^{t}$ is again the column vector of the normalized variables. The exact solutions are derived in full analogy to the selection equation (30). In terms of the eigenvalues and eigenvectors of the matrix $W$ we find:

$$
x_i(t) = \frac{\sum_{k=1}^{n} l_{ik}\, h_k(0) \exp(\gamma_k t)}{\sum_{j=1}^{n} \sum_{k=1}^{n} l_{jk}\, h_k(0) \exp(\gamma_k t)}
= \frac{\sum_{k=1}^{n} l_{ik} \sum_{m=1}^{n} h_{km}\, x_m(0) \exp(\gamma_k t)}{\sum_{j=1}^{n} \sum_{k=1}^{n} l_{jk} \sum_{m=1}^{n} h_{km}\, x_m(0) \exp(\gamma_k t)} . \tag{35}
$$

In this expression the exponents $\gamma_k$ are the eigenvalues of the matrix $W$, which can be obtained by a linear transformation that brings $W$ to diagonal form: $\Gamma = L^{-1} \cdot W \cdot L$, with $W \cdot L = L \cdot \Gamma$, $H \cdot W = \Gamma \cdot H$ and $L^{-1} = H$. Herein the columns of $L$ are the right eigenvectors $\mathbf{l}_j = (l_{ij},\ i = 1, \dots, n)^{t}$, and the rows of $H$, $\mathbf{h}_k = (h_{ki},\ i = 1, \dots, n)$, the left eigenvectors of the value matrix $W$. The eigenvalues are assumed to be ordered³¹: $\gamma_1 > \gamma_2 \ge \gamma_3 \ge \dots \ge \gamma_n$. The initial conditions are encapsulated in the equation $h_k(0) = \sum_{m=1}^{n} h_{km}\, x_m(0)$. The stationary solution is obtained straightforwardly by taking the long-time limit, in which only the term belonging to the largest eigenvalue, $\gamma_1$, survives:

$$
\bar{x}_i = \lim_{t \to \infty} \frac{\sum_{k=1}^{n} l_{ik}\, h_k(0) \exp(\gamma_k t)}{\sum_{j=1}^{n} \sum_{k=1}^{n} l_{jk}\, h_k(0) \exp(\gamma_k t)}
= \frac{l_{i1}\, h_1(0) \exp(\gamma_1 t)}{\sum_{j=1}^{n} l_{j1}\, h_1(0) \exp(\gamma_1 t)}
= \frac{l_{i1}}{\sum_{j=1}^{n} l_{j1}} . \tag{36}
$$
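Equation (36) identifies the stationary distribution with the normalized dominant right eigenvector of $W = Q \cdot F$. As a minimal sketch, this eigenvector can be approximated by power iteration, which mirrors the long-time limit in which only the $\gamma_1$ term survives. The three-species matrix $Q$ and the fitness values below are hypothetical; power iteration is one of several possible numerical methods and is not claimed to be the one used by the authors.

```python
def quasispecies(Q, f, tol=1e-12, max_iter=100000):
    """Normalized dominant right eigenvector of W = Q*F by power iteration.

    Q[i][j] is the frequency at which template X_j yields copy X_i
    (columns sum to one); f[j] is the fitness of X_j.
    Returns (x_bar, gamma_1).
    """
    n = len(f)
    W = [[Q[i][j] * f[j] for j in range(n)] for i in range(n)]
    x = [1.0 / n] * n
    gamma = 0.0
    for _ in range(max_iter):
        y = [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]
        # With sum(x) = 1 and a column-stochastic Q, sum(W x) equals the mean
        # fitness Phi and converges to the largest eigenvalue gamma_1.
        gamma = sum(y)
        y = [yi / gamma for yi in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y, gamma
        x = y
    return x, gamma


# Hypothetical example: one master sequence and two equivalent mutants.
Q = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
f = [2.0, 1.0, 1.0]
x_bar, gamma1 = quasispecies(Q, f)

print("quasispecies:", x_bar)
print("gamma_1     :", gamma1)
```

At convergence the mean fitness $\sum_j f_j \bar{x}_j$ coincides with $\gamma_1$, which is the stationary value of the flow $\Phi$.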

The stationary solution is the normalized, dominant right-hand eigenvector of the value matrix $W$: $\bar{\Upsilon} = \hat{\mathbf{l}}_1 = (l_{11}, l_{21}, \dots, l_{n1})^{t} \big/ \sum_{j=1}^{n} l_{j1}$, and was called quasispecies (Eigen and Schuster 1977) in order to indicate that the stationary mutant distribution is the genetic reservoir for evolving prokaryotic populations. The properties of the stationary solution of Eq. (34) can be obtained directly from a theorem on non-negative primitive matrices $T$ derived by Frobenius (Seneta 1981). The four statements that are most important for virus evolution are: (i) the largest eigenvalue is real and positive, $\gamma_1 > 0$, (ii) a strictly positive right eigenvector $\mathbf{l}_1$ and a strictly positive left eigenvector $\mathbf{h}_1$ are associated with $\gamma_1$, (iii) $\gamma_1 > |\gamma_j|$ holds for all eigenvalues $\gamma_j \neq \gamma_1$, and (iv) the eigenvectors associated with $\gamma_1$ are unique up to constant factors. The theorem ensures that the quasispecies is unique and that all replicators are present at (positive) concentrations at the stationary state; the eigenvalue corresponding to the growth rate is positive and real, and this excludes oscillations. The conventional coordinates of the population can be replaced by variables referring to the contributions of the individual eigenvectors of $W$:

$$\frac{\Pi}{N} = x_1 \mathbf{e}_1 + \dots + x_n \mathbf{e}_n = \xi_1 \mathbf{l}_1 + \dots + \xi_n \mathbf{l}_n \quad\text{with}\quad \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} \xi_i = 1 .$$

³¹ The fact that the largest eigenvalue $\gamma_1$ is non-degenerate follows from the Perron-Frobenius theorem (see below).

The optimization principle derived for the selection equation (30) is readily generalized to the mutation-selection system since the equation in the new variables,

$$\frac{d\xi_i}{dt} = \bigl(\gamma_i - \Phi(t)\bigr)\, \xi_i ; \quad i = 1, \dots, n ,$$

has the same form as Eq. (30). The mean excess production or flow $\Phi(t)$ will increase, $\Phi(t) \to \gamma_1$, until the maximal value is reached. A simple but nevertheless fairly accurate approximation for the position of the error threshold has been derived from the condition that the genetic information stored in the quasispecies is stable, $p < p_{cr}$ (Eigen and Schuster 1977, pp. 549–554), with

$$p_{cr} = 1 - \sigma_m^{-1/l} \approx \frac{\ln \sigma_m}{l} , \tag{37}$$

where $\sigma_m$ is the superiority of the master sequence, which is defined as the fitness of the master sequence divided by the mean fitness of all other sequences in the population:

$$\sigma_m = \frac{f_m}{\bar{f}_{-m}} \quad\text{with}\quad \bar{f}_{-m} = \frac{\sum_{i, i \neq m}^{n} f_i \bar{x}_i}{\sum_{i, i \neq m}^{n} \bar{x}_i} = \frac{\sum_{i, i \neq m}^{n} f_i \bar{x}_i}{1 - \bar{x}_m} . \tag{38}$$

For a fittest sequence $X_m$ the superiority has to be larger than one, $\sigma_m > 1$, and accordingly its logarithm is positive: $\ln \sigma_m > 0$. The expression for the critical value $p_{cr}$ is obtained from a second-order perturbation expression for the largest eigenvalue of the matrix $W = Q \cdot F$ with the master sequence $X_m = X_1$ and $d_{ij} = d_H(X_i, X_j)$ (Eigen 1971):

$$\gamma_1 \approx W_{mm} + \sum_{i \neq m} \frac{W_{mi} W_{im}}{W_{mm} - W_{ii}} = (1 - p)^{l} \Bigl( f_m + \sum_{i \neq m} \varepsilon^{2 d_{im}}\, \frac{f_i f_m}{f_m - f_i} \Bigr) ,$$

where the second form follows from $W_{ij} = Q_{ij} f_j$ with $Q_{ij} = (1 - p)^{l} \varepsilon^{d_{ij}}$. Neglect of the second-order terms that constitute the sum on the r.h.s. leads to $W_{mm} > \bar{f}_{-m}$ or $(1 - p)^{l} > (1 - p_{cr})^{l} = \sigma_m^{-1}$ and Eq. (37).
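As a small worked example of Eq. (37), the sketch below evaluates the critical error rate and its logarithmic approximation. The function names are illustrative, and the parameter values ($\sigma_m = 2$, $l = 20$) are those of the single peak example used in this chapter.

```python
import math


def critical_error_rate(sigma, length):
    """Error threshold p_cr = 1 - sigma^(-1/l) from Eq. (37)."""
    return 1.0 - sigma ** (-1.0 / length)


def critical_error_rate_approx(sigma, length):
    """Approximation p_cr ~ ln(sigma)/l, also from Eq. (37)."""
    return math.log(sigma) / length


sigma_m, chain_length = 2.0, 20
p_cr = critical_error_rate(sigma_m, chain_length)
p_cr_approx = critical_error_rate_approx(sigma_m, chain_length)

print(f"p_cr (exact form)  = {p_cr:.6f}")
print(f"p_cr (ln approx.)  = {p_cr_approx:.6f}")
# At p = p_cr the quality factor (1 - p)^l just balances the superiority.
print(f"(1 - p_cr)^l * sigma = {(1.0 - p_cr) ** chain_length * sigma_m:.6f}")
```

The approximation slightly overestimates the value of the exact expression, and the difference shrinks as the chain length grows.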

5.3 Quasispecies on Uniform Class Landscapes

The influence of mutation rates on the selection process and the distribution of variants in the quasispecies is illustrated by means of a simple model (Swetina and Schuster 1982), which is based on four assumptions: (i) uniform class landscapes, where all sequences belonging to the same class have the same fitness value, $f_k$ for class $\Gamma_k$ ($k = 0, 1, \dots, l$), (ii) consideration of point mutations only, (iii) uniform error rates, implying mutation rates per site and replication, $p$, which are independent of the position on the sequence, and (iv) binary sequences. Mutants are ordered with respect to mutant classes $\Gamma_k$ (see, for example, Fig. 3), where $k = d_{0j} = d_H(X_0, X_j)$ ($k = 0, 1, \dots, l$ and $j = 0, 1, \dots, \kappa^{l}$) is the Hamming distance from the reference $X_0$, $\Gamma_k = \{X_j \mid d_H(X_0, X_j) = k\}$, and $\nu_k = (\kappa - 1)^{k} \binom{l}{k}$ is the number of sequences in the mutant class $\Gamma_k$. In other words, $\Gamma_0$ contains only the reference sequence and $\nu_0 = 1$, $\Gamma_1$ contains all one-error mutants of the reference sequence with $\nu_1 = (\kappa - 1)\, l$, $\Gamma_2$ all two-error mutants with $\nu_2 = (\kappa - 1)^{2} \binom{l}{2}$, etc. On a uniform class fitness landscape the largest fitness value is assigned to a fittest replicator, which is chosen as reference sequence $X_0 \equiv X_m$ and which will be called master sequence. The same fitness values are assigned to all sequences in a given class, and we assume that fitness is a non-increasing function of the Hamming distance $d_H$:

$$f_m = f_0 > f_1 \ge f_2 \ge \dots \ge f_l .$$

A closely related further simplification yields the single peak fitness landscape, where the fitness values for all classes except $\Gamma_0$ are the same: $f_m = f_0 > f_n$, $f_i = f_n = \bar{f}_{-m} \ \forall\, i \neq m$, and $\sigma_m = f_m / f_n$.

Assuming uniform error rates facilitates the calculation of the mutation matrix $Q$ enormously since its elements $Q_{ij}$ depend only on three parameters and have the simple form

$$Q_{ij} = p^{d_{ij}} \cdot \bigl(1 - (\kappa - 1)\, p\bigr)^{l - d_{ij}} , \tag{39}$$

where $p$ is the mutation rate per site and replication, $d_{ij} = d_H(X_i, X_j)$ the Hamming distance between the two sequences $X_i$ and $X_j$, and $l$ the chain length of the polynucleotide. For binary sequences, $\kappa = 2$, we obtain the simpler equation with $\varepsilon = p/(1 - p)$:

$$Q_{ij} = p^{d_{ij}} \cdot (1 - p)^{l - d_{ij}} = (1 - p)^{l} \cdot \varepsilon^{d_{ij}} = q^{l} \cdot \varepsilon^{d_{ij}} , \tag{39'}$$

where $q = 1 - p$ is the single digit accuracy of replication. Accordingly, the diagonal element of the mutation matrix, $Q_{ii} = q^{l}$, is the probability of correct replication of the entire polynucleotide sequence. It is a measure of the quality of replication and has therefore been called quality factor (Eigen 1971). Since the fitness value $f_k$ and the error rate $p$ are independent of the particular sequence in a class, it is possible to reduce the number of variables drastically and to use variables $y_k$ for entire mutant classes $\Gamma_k$:

$$y_k = \sum_{j,\, X_j \in \Gamma_k} x_j , \quad k = 0, 1, \dots, l \quad\text{with}\quad \sum_{k=0}^{l} y_k = 1 . \tag{40}$$
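The uniform error model for binary sequences, $Q_{ij} = p^{d_{ij}} (1 - p)^{l - d_{ij}}$, can be written down directly for a small sequence space. The following sketch enumerates all binary sequences of a given length and builds the full mutation matrix; the function names and the example values ($l = 4$, $p = 0.1$) are assumptions made for illustration, and the full enumeration is only practical for short chains since the matrix has $2^{l} \times 2^{l}$ entries.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two sequences of equal length."""
    return sum(x != y for x, y in zip(a, b))


def mutation_matrix(length, p):
    """Mutation matrix for all binary sequences under the uniform error model.

    Q[i][j] = p^d * (1 - p)^(length - d), d = Hamming distance of X_i and X_j.
    """
    seqs = list(product((0, 1), repeat=length))
    Q = []
    for target in seqs:            # row index i: the copy X_i
        row = []
        for template in seqs:      # column index j: the template X_j
            d = hamming(target, template)
            row.append(p ** d * (1.0 - p) ** (length - d))
        Q.append(row)
    return seqs, Q


length, p = 4, 0.1
seqs, Q = mutation_matrix(length, p)
column_sums = [sum(Q[i][j] for i in range(len(seqs))) for j in range(len(seqs))]

print("number of sequences:", len(seqs))
print("quality factor Q_ii:", Q[0][0])
print("column sums (min, max):", min(column_sums), max(column_sums))
```

Every column sums to one because each replication event yields exactly one of the $2^{l}$ sequences, and the matrix is symmetric because the Hamming distance is.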

The kinetic differential equations in the new variables $y_k$ look the same as Eq. (34),

$$\frac{dy_k}{dt} = \sum_{j=0}^{l} Q_{kj} f_j y_j - y_k \cdot \Phi ; \quad k = 0, 1, \dots, l \quad\text{with}\quad \Phi = \sum_{k=0}^{l} f_k y_k = \bar{f} , \tag{41}$$

with exception of the matrix elements $Q_{kj}$, which still have to be calculated. The stationary mutant distribution is the quasispecies $\bar{\Upsilon}$ written in the class variables: $\bar{y}_k = \sum_{j,\, X_j \in \Gamma_k} \bar{x}_j$. There is an important but not immediately obvious difference between two-letter and four-letter alphabets (Fontana et al. 1993a): Two-letter alphabets have a symmetry in the numbers of sequences belonging to classes that is lacking with four-letter sequences: $\nu_k = \nu_{l-k}$, resulting from a basic property of the binomial coefficients, $\binom{l}{k} = \binom{l}{l-k}$ (Figs. 3 and 27).

The calculation of the mutation matrix $Q$ for binary sequences is straightforward but becomes quite involved for four-letter sequences, as can be guessed from the structure of the four-letter sequence space (Fig. 2); this explains the restriction to binary sequences, the consequences of which will be discussed later. For binary sequences the matrix elements of the mutation matrix $Q$ are of the form

$$Q_{kd} = \sum_{i = d + k - l}^{\min(k, d)} \binom{k}{i} \binom{l - k}{d - i}\, p^{\,k + d - 2i} \cdot (1 - p)^{\,l - (k + d - 2i)} ; \quad k, d = 0, 1, \dots, l . \tag{42}$$

We remark that the mutation matrix is no longer symmetric, $Q_{kd} \neq Q_{dk}$, as follows from inspection of Eq. (42).

In Fig. 27 a quasispecies $\bar{\Upsilon}(p)$ of binary sequences with chain length $l = 20$ on a single peak fitness landscape with superiority $\sigma_m = 2$ is plotted as a function of the error rate $p$. Error free replication, $p = 0$, is described by the solution in Eq. (32) and leads to selection of the master sequence: $\bar{y}_0(0) = 1$ and $\bar{y}_i(0) = 0 \ \forall\, i = 1, \dots, l$ or $\bar{\Upsilon} = (1, 0, \dots, 0)$. With increasing $p$ the frequency of the master sequence decreases


Fig. 27 Quasispecies and mutation rates. The figures show the quasispecies, the stationary distribution of relative concentrations of mutant classes $\Gamma_k$ as functions of the error rate, $\bar{\Upsilon}(p) = \bigl(\bar{y}_0(p), \bar{y}_1(p), \dots, \bar{y}_l(p)\bigr)$, for binary sequences of chain length $l = 20$ on a single peak landscape with $f_0 = 2$ and $f_k = \bar{f}_{-m} = 1$ for $k = 1, \dots, 20$. The numerically computed solution curves show a sharp transition, known as error threshold, from the ordered regime to the domain of the uniform distribution at $p_{cr} \approx 0.0356$ (black vertical broken line). The uniform distribution is the exact solution at $p = \hat{p} = \tfrac{1}{2}$: $\bar{y}_k = \binom{l}{k}\, 2^{-l}$. The colors are chosen such that the error classes with the same frequency at $p = \hat{p}$, for example $\Gamma_0$ and $\Gamma_l$, $\Gamma_1$ and $\Gamma_{l-1}$, etc., share identical colors, and hence curves with the same color (almost) merge at $p = p_{cr}$. The lower panel is an enlargement of the upper panel


and the mutant classes come up sequentially with increasing $d$ (first the one-error class, second the two-error class, etc.) until the ordered regime changes abruptly to an (almost) uniform distribution. Then all sequences have (approximately) the same frequency $\bar{x} \approx \kappa^{-l}$, and for the classes we get $\bar{y}_k = \binom{l}{k}\, \bar{x} = \binom{l}{k}\, \kappa^{-l}$. With $p$ increasing further, the (almost) uniform distribution extends until the error rate $p = (1 - p) = \tfrac{1}{2}$ or $\varepsilon = 1$ is reached: Correct copying and error copies occur at the same frequency, a situation that is characterized best by the term random replication, and the uniform distribution becomes the exact solution. The (sharp) transition between the ordered regime and random replication occurs at a mutation rate $p = p_{cr}$, and it is easily diagnosed because the curves $\bar{y}_k(p)$ and $\bar{y}_{l-k}(p)$ (almost) merge there, as visualized by the black broken line in Fig. 27. This transition from ordered to random replication has been characterized as error threshold or error catastrophe, since inheritance breaks down because of error accumulation in the random replication domain. Two properties of this transition were easily verified by numerical computation: when the chain length $l$ is increased, the critical mutation rate (i) moves toward smaller error rates and (ii) the transition becomes sharper. In this respect the error threshold is reminiscent of a cooperative transition from random to ordered structures in biopolymers, proteins, or nucleic acids (Clark et al. 1994; Cornish-Bowden 2013; Engel and Schwarz 1970; Schwarz and Engel 1972).
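The class-based quasispecies on a single peak landscape can be sketched numerically along the lines of Eqs. (41) and (42). Two points in the sketch are assumptions of this illustration rather than statements of the text: the combinatorial sum of Eq. (42) is read as the probability that a template from one class yields a copy in another class, and it is arranged as `T[k][j]` (copy class `k`, template class `j`) so that each column sums to one; and the dominant eigenvector is approximated by power iteration. The parameters $l = 20$, $f_0 = 2$, $f_n = 1$ follow the example of Fig. 27.

```python
from math import comb


def class_mutation_matrix(l, p):
    """T[k][j]: probability that a template in class j yields a copy in class k.

    i counts the error positions of the template that stay unchanged, so the
    copy differs from the template at j + k - 2i positions.
    """
    T = [[0.0] * (l + 1) for _ in range(l + 1)]
    for j in range(l + 1):
        for k in range(l + 1):
            total = 0.0
            for i in range(max(0, j + k - l), min(j, k) + 1):
                flips = j + k - 2 * i
                total += (comb(j, i) * comb(l - j, k - i)
                          * p ** flips * (1.0 - p) ** (l - flips))
            T[k][j] = total
    return T


def class_quasispecies(l, p, fitness, tol=1e-13, max_iter=20000):
    """Stationary class distribution: dominant eigenvector of T*F (power iteration)."""
    T = class_mutation_matrix(l, p)
    y = [1.0 / (l + 1)] * (l + 1)
    for _ in range(max_iter):
        z = [sum(T[k][j] * fitness[j] * y[j] for j in range(l + 1))
             for k in range(l + 1)]
        s = sum(z)
        z = [zk / s for zk in z]
        if max(abs(a - b) for a, b in zip(y, z)) < tol:
            return z
        y = z
    return y


l = 20
single_peak = [2.0] + [1.0] * l          # f_0 = 2, f_k = 1 for k >= 1
y_low = class_quasispecies(l, 0.001, single_peak)
y_mid = class_quasispecies(l, 0.02, single_peak)
y_half = class_quasispecies(l, 0.5, single_peak)

for label, y in (("p = 0.001", y_low), ("p = 0.02", y_mid), ("p = 0.5", y_half)):
    print(label, " y_0 =", round(y[0], 6), " y_1 =", round(y[1], 6))
```

At small $p$ the master class dominates, and at $p = \tfrac{1}{2}$ the result reproduces the uniform distribution over sequences, $\bar{y}_k = \binom{l}{k}\, 2^{-l}$. Close to the threshold the power iteration converges slowly, so a dedicated eigenvalue solver may be preferable there.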

5.4 Simple Fitness Landscapes

Detailed analysis of several different simple fitness landscapes (Fig. 28) showed that the error threshold phenomenon depends strongly on the nature of the landscape. The first computations of error thresholds were performed on single peak landscapes (Swetina and Schuster 1982); a dependence of the existence of an error threshold on the distribution of fitness values was first reported by Wiehe (1997). Systematic computations of quasispecies on different simple model landscapes describing fitness by monotonic, non-increasing functions $f(k)$ of the Hamming distance from the master sequence, $d_H(X_0, X_k) = d = k$ (Schuster 2011, 2013a, 2016), revealed a direct relation between the occurrence of an error threshold and the steepness with which the fitness values fall with the Hamming distance from the master sequence: A sharp transition from the ordered to the random replication regime occurs for sufficiently steep fitness landscapes. Five different types of simple fitness landscapes were compared: (i) single peak landscapes, (ii) step linear landscapes, (iii) additive landscapes, (iv) multiplicative landscapes, and (v) hyperbolic landscapes. Analytical expressions for these landscapes are:

$$\text{single peak:}\quad f(k) = \begin{cases} f_0 & \text{if } k = 0 , \\ f_n & \text{if } k = 1, \dots, l , \end{cases} \tag{43a}$$


Fig. 28 Simple model fitness landscapes. The figure compares different landscapes with a discontinuity (see the note at the end of this caption) in the function for the fitness values with those having a continuous function. The three discontinuous landscapes in the upper panel are: (i) the single peak landscape (black), (ii) the step linear landscape with $h = 4$ (red), and (iii) the step linear landscape with $h = 6$ (blue). On the landscapes (i) and (ii) the quasispecies shows a sharp transition in the sense of an error threshold from the ordered regime directly to the uniform distribution, but on the third landscape (iii) the mutant distribution does not. The lower panel shows three 'continuous' landscapes: (iv) the additive landscape (green), (v) the multiplicative landscape (yellow), and (vi) the hyperbolic landscape (brown). The hyperbolic landscape (vi) shows a sharp transition but not directly to the uniform distribution; instead the sharp change leads to an intermediate distribution, which undergoes a gradual transformation into the uniform distribution. The other two smooth landscapes (iv) and (v) show no error threshold at all, and the ordered distribution becomes the uniform distribution at $p = 0.5$ through a smooth transformation. Note: A fitness landscape is, necessarily, a function on a discrete support, and this implies that in a strict sense it cannot be continuous because there are no genotypes at non-integer values of the Hamming distance. What is distinguished here are fitness values with a smooth or an angular shaped envelope


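$$\text{step linear:}\quad f(k) = \begin{cases} f_0 - (f_0 - f_n)\, \dfrac{k}{h} & \text{if } k = 0, \dots, h - 1 , \\[1ex] f_n & \text{if } k = h, \dots, l , \end{cases} \tag{43b, c}$$

$$\text{multiplicative:}\quad f(k) = f_0 \cdot \Bigl( \frac{f_n}{f_0} \Bigr)^{k/l} , \tag{43d}$$

$$\text{additive:}\quad f(k) = f_0 - \frac{(f_0 - f_n) \cdot k}{l} , \quad\text{and} \tag{43e}$$

$$\text{hyperbolic:}\quad f(k) = f_0 - \frac{(f_0 - f_n)(l + 1) \cdot k}{l\, (k + 1)} . \tag{43f}$$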

The step linear landscape comes in two versions with different properties concerning the error threshold: (i) the steeper version with $h = h_s$,

$$f(k) = \begin{cases} f_0 - (f_0 - f_n)\, \dfrac{k}{h_s} & \text{if } k = 0, \dots, h_s - 1 , \\[1ex] f_n & \text{if } k = h_s, \dots, l , \end{cases} \tag{43b}$$

and (ii) the flatter version with $h = h_f$,

$$f(k) = \begin{cases} f_0 - (f_0 - f_n)\, \dfrac{k}{h_f} & \text{if } k = 0, \dots, h_f - 1 , \\[1ex] f_n & \text{if } k = h_f, \dots, l . \end{cases} \tag{43c}$$

Obviously the function $f(k)$ for $h = 1$ is the single peak landscape. Examples of plots are shown in Fig. 28. The fitness functions in Eq. (43) are divided into two groups: (i) three landscapes with functions $f(k)$ that have a discontinuity in the first derivative $\partial f / \partial k = f'(k)$, Eqs. (43a), (43b), and (43c), and (ii) three landscapes with continuous first derivatives, Eqs. (43d), (43e), and (43f). Although the latter group with the smooth functions seems to be quite unrealistic from the molecular knowledge available, models with additive and multiplicative fitness are still popular in population genetics (see, e.g., Lobkovsky et al. 2019). The occurrence of error thresholds in the sense of sharp transitions, which become sharper with increasing chain length $l$, does not depend on a discontinuity in $f'(k)$: The flat step linear landscape (43c) has a kink at the error class $k = h_f$ (Fig. 28, upper panel) and, provided $h_f$ is large enough, does not exhibit a threshold phenomenon. The hyperbolic landscape (43f), on the other hand, is smooth and shows a sharp transition. A useful diagnostic for the existence or non-existence of an error threshold is the derivative $\partial f(k) / \partial k$ shown in Fig. 29: The value of the derivative decreases, stepwise or continuously, with increasing distance $d = k$ from the master sequence. If the absolute value of the slope $|f'|$ exceeds a certain limit, the landscape sustains a sharp transition, which leads to the (approximate) uniform distribution in case of the landscapes (43a) and (43b), and to some intermediate distribution for the landscape (43f).
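The landscape functions of Eq. (43) are simple enough to be written down directly. The sketch below is one possible transcription; the function names are illustrative, and the parameters $l = 20$, $f_0 = 10$, $f_n = 1$ are those quoted for Fig. 29, with $h = 4$ and $h = 6$ taken from the caption of Fig. 28.

```python
def single_peak(k, f0, fn, l):
    """Eq. (43a): only the master class k = 0 has fitness f0."""
    return f0 if k == 0 else fn


def step_linear(k, f0, fn, l, h):
    """Eqs. (43b, c): linear decrease up to class h - 1, then constant fn."""
    return f0 - (f0 - fn) * k / h if k < h else fn


def multiplicative(k, f0, fn, l):
    """Eq. (43d): each additional error multiplies fitness by (fn/f0)^(1/l)."""
    return f0 * (fn / f0) ** (k / l)


def additive(k, f0, fn, l):
    """Eq. (43e): each additional error subtracts (f0 - fn)/l."""
    return f0 - (f0 - fn) * k / l


def hyperbolic(k, f0, fn, l):
    """Eq. (43f): steep near the master sequence, flat far away."""
    return f0 - (f0 - fn) * (l + 1) * k / (l * (k + 1))


l, f0, fn = 20, 10.0, 1.0
landscapes = {
    "single peak": lambda k: single_peak(k, f0, fn, l),
    "step linear h=4": lambda k: step_linear(k, f0, fn, l, 4),
    "step linear h=6": lambda k: step_linear(k, f0, fn, l, 6),
    "multiplicative": lambda k: multiplicative(k, f0, fn, l),
    "additive": lambda k: additive(k, f0, fn, l),
    "hyperbolic": lambda k: hyperbolic(k, f0, fn, l),
}

for name, func in landscapes.items():
    values = [func(k) for k in range(l + 1)]
    first_step = values[1] - values[0]   # discrete slope at the master sequence
    print(f"{name:16s} f(0)={values[0]:5.2f}  f(1)-f(0)={first_step:7.3f}  f(l)={values[-1]:5.2f}")
```

All six functions start at $f_0$ for $k = 0$ and end at $f_n$ for $k = l$; they differ in how steeply fitness falls near the master sequence, which is the property the text relates to the existence of a sharp threshold.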


Fig. 29 Steepness of fitness landscapes and error threshold. Shown are the derivatives, $\partial f / \partial k = f'(k)$, of the six simple fitness landscapes (43a)–(43f). The three steeper landscapes, single peak (43a; black), steep step linear (43b; red), and hyperbolic (43f; brown), show sharp transitions from the ordered to the random quasispecies, whereas the three other landscapes, flat step linear (43c; blue), multiplicative (43d; yellow), and additive (43e; green), show a gradual broad transition that extends up to the point of exact random replication, $p = \tfrac{1}{2}$. The single peak and the steep step linear landscapes lead to direct transitions from the ordered quasispecies to the uniform distribution (Fig. 27), whereas the sharp transition on the hyperbolic landscape leads to an intermediate distribution, which changes to the uniform distribution in a broad band. Choice of parameters: $l = 20$, $f_0 = 10$, and $f_n = 1$. The initial values outside the plot area are $f'(0) = -9$ for (43a; black) and $f'(0) = -9.45$ for (43f; brown)

5.5 Realistic Random Landscapes

All simple landscapes discussed here were based on the assumption that the fitness values of individual sequences in a given mutant class are the same. There is no empirical evidence that this is the case in reality, and this raises the immediate question: How does the quasispecies distribution change when the condition of equal fitness within a given mutant class is relaxed? Instead of aiming at model calculations of fitness values, a band of randomly chosen values is introduced. The assumption of fitness values scattered within a band with tunable width seems to be more realistic than a uniform distribution, and we chose the notion realistic random fitness landscape (RRL) for this model (Schuster 2012, 2013b). Fitness values for individual sequences are obtained by means of a pseudorandom number generator. The width of the band of values is determined by a parameter $\vartheta$, $0 \le \vartheta \le 1$, and two fitness parameters, $f_0$ and $f_n$, which are the fitness of the master sequence and the mean fitness of all other $(\kappa^{l} - 1)$ sequences. Accordingly, the superiority of the master sequence is $\sigma_m \approx f_0 / f_n$. The analytical expression for such a realistic rugged landscape is:


Fig. 30 A realistic rugged fitness landscape model (RRL). The landscape for 1024 binary sequences of chain length l = 10 is constructed according to Eq. (44). The blue broken lines separate different mutant classes. The band width of the random scatter was chosen to be ϑ = 0.5 and a seed s = 919 was applied for the pseudorandom number generator Legacy of the software package Mathematica (Wolfram 2012). Choice of other parameters: f 0 = 1.1 and f n = 1.0

$$f(X_j) = f_j = \begin{cases} f_0 & \text{if } j = 1 , \\ f_n + 2 \vartheta\, (f_0 - f_n) \bigl( \eta_j^{(s)} - 0.5 \bigr) & \text{if } j = 2, \dots, \kappa^{l} , \end{cases} \tag{44}$$

where $\eta_j^{(s)}$ is the $j$-th output random number from a pseudorandom number generator with a uniform distribution in the range $0 \le \eta_j^{(s)} \le 1$ that was started with the seed $s$. An example with intermediate scatter of fitness values, $\vartheta = 0.5$, is shown in Fig. 30. Figure 31 illustrates the dependence of the quasispecies on the mutation rate $p$ for three different choices of the random scatter: (i) $\vartheta = 0$, no scatter, (ii) $\vartheta = 0.5$, bandwidth $\Delta f = f_0 - f_n$ with fitness values $f_j$ ($j > 1$) in the range from $f_n - (f_0 - f_n)/2$ to $f_n + (f_0 - f_n)/2$, and (iii) $\vartheta = 0.9375$, a bandwidth a little narrower than the full stretch $\Delta f = 2 (f_0 - f_n)$. Three properties of quasispecies on RRL landscapes follow directly from Fig. 31: (i) the introduction of different fitness values for the sequences in the classes has the obvious consequence that the sequences belonging to a given class form bands of a width that increases with increasing randomness $\vartheta$ until the bands overlap at sufficiently large parameter values, (ii) the quasispecies show perfect error thresholds despite the scatter of fitness values within classes, and (iii) with increasing random scatter $\vartheta$ the error thresholds move to smaller critical error rates $p_{cr}$. Item (i) is obvious, (ii) is a fact observed in numerical computations, and (iii) can be made plausible by a simple estimate: The contribution of a pair of sequences with fitness values $f_{h,l} = f_n \pm d$, which are equidistant to $f_n$ ('h' stands for high, and 'l' for low), to the superiority $\sigma_m$ is

$$\sigma_m(d) = \frac{f_0}{(1 - d^{2})\, f_n + d^{2} f_0} , \quad 0 \le d \le \vartheta ,$$


Fig. 31 Quasispecies on a realistic model landscape. The three panels show relative stationary concentrations of all sequences, x j ( p), in the mutant classes Γ0 (black), Γ1 (red), and Γ2 (yellow) on the ‘realistic’ model landscapes (RRL) with ϑ = 0 (top), ϑ = 0.5 (middle), and ϑ = 0.9375 (bottom). Pseudorandom number generator: Extended CA (Mathematica 10), seed s = 491. Choice of other parameters: l = 10, f 0 = 2.0 and f n = 1.0. The error threshold (37) is shown in the top panel (blue)


and this leads to an error threshold in the range $p_{cr}(0) = 1 - \sigma_m^{-1/l} > p_{cr}(\vartheta) \ge 1 - \sigma_m^{-1/l}(\delta)$, where $p_{cr}(0) = p_{cr}$ is the conventional error threshold. More details as well as an analysis of the role of neutrality in RRL have been described (Schuster 2012, 2013a, 2016). Transitions between quasispecies at certain critical error rates $p_{tr}$ can occur on multi-peak landscapes, which are landscapes with multiple local fitness optima (Schuster and Swetina 1988): At small mutation rates, $0 \le p < p_{tr}$, the sequence with the largest fitness value, $X_{m1}$, is the master sequence. It is replaced at $p_{tr}$ by a slightly less fit sequence $X_{m2}$, which then represents the master sequence in the range $p_{tr} < p < p_{cr}$. The nature of the transition between the two master sequences is easily explained: $X_{m2}$ is surrounded by mutants with higher fitness values than those around $X_{m1}$, and the backflow from mutants to $X_{m2}$ overcompensates the fitness difference between the two master sequences at mutation rates $p > p_{tr}$. A search for realistic random landscapes (RRL) with transitions demonstrated the existence of special cases with three and more consecutive transitions at large values of the scattering parameter, $\vartheta \approx 1$ (Schuster 2016). The existence of transitions between different quasispecies is relevant for virus evolution, because near a transition a small change in the mutation rate may destabilize the current virus population.
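The construction of a realistic random landscape according to Eq. (44) can be sketched in a few lines. The sketch uses Python's built-in Mersenne Twister generator, which is an assumption of this illustration: it is not the Mathematica generator quoted in the captions of Figs. 30 and 31, so the same seed yields different numbers and the figures are not reproduced value by value. The function name is likewise illustrative; the parameters follow the caption of Fig. 30.

```python
import random


def realistic_random_landscape(n_seq, f0, fn, theta, seed):
    """Fitness values f_1, ..., f_{n_seq} in the form of Eq. (44).

    The first entry is the master sequence with fitness f0; all others are
    scattered uniformly in a band of half-width theta*(f0 - fn) around fn.
    """
    rng = random.Random(seed)          # stand-in for the generator in the text
    fitness = [f0]
    for _ in range(n_seq - 1):
        eta = rng.random()             # uniform pseudorandom number in [0, 1)
        fitness.append(fn + 2.0 * theta * (f0 - fn) * (eta - 0.5))
    return fitness


# Parameters as quoted for Fig. 30: l = 10 binary sequences, theta = 0.5.
chain_length, f0, fn, theta, seed = 10, 1.1, 1.0, 0.5, 919
landscape = realistic_random_landscape(2 ** chain_length, f0, fn, theta, seed)

others = landscape[1:]
print("number of sequences :", len(landscape))
print("master fitness      :", landscape[0])
print("band of other values:", min(others), "to", max(others))
print("mean of other values:", sum(others) / len(others))
```

For $\vartheta = 0$ the construction reduces to the single peak landscape, and for $\vartheta$ close to one the fittest non-master sequence comes close to $f_0$, which is the situation the text associates with smaller critical error rates.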
The take-home messages of quasispecies on landscapes with scattered fitness values are: (i) The scatter broadens the bands for the sequences from a given class, $x_j(p)$ with $X_j \in \Gamma_k$, but does not change the nature of the error threshold. (ii) Increasing scatter moves thresholds toward smaller values of $p_{cr}$. This is readily explained, since the fitness gap between the master sequence and the fittest other sequence becomes smaller with increasing values of the band width $\vartheta$, $f(X_{opt}) = (1 - \vartheta)(f_0 - f_n)$, and the sequence with the smallest fitness difference to the master has the largest weight in the expression for the position of the threshold. (iii) Certain (not rare) distributions of fitness values in sequence space give rise to complex mutation-selection dynamics in the sense of one or more transitions between different quasispecies in the range of ordered replication.

The existence of extended neutral networks adds new features to the quasispecies theory. The evolution of an infinite population on a flat landscape can be described by a quasispecies equation, where every sequence has the same replication rate $f_j = f$. We assume a homogeneous initial distribution $x(0) = (1, 0, \ldots, 0)$ and a population size $N$. Mutations occur with an error rate $0 < p \leq 1$ and lead to a spreading of the population in sequence space that is tantamount to a diffusion process, which approaches a uniform distribution of sequences in sequence space in the limit $\lim_{t \to \infty} x_i = 1/\kappa^l$ for $N = \text{const}$. Huynen et al. (1996) estimate a diffusion coefficient

$$D_0 \approx \frac{5 f l p}{3 + 4 N p} \quad \text{and} \quad D(p) = D_0 \lambda$$

for small mutation rates.
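The approach to a uniform distribution on a flat landscape can be checked numerically in the deterministic (infinite population) limit. The sketch below is an illustration under stated assumptions: a binary alphabet ($\kappa = 2$), a short chain length, and the uniform error model in which each position mutates independently with probability $p$. The function names are hypothetical.

```python
import itertools

import numpy as np


def mutation_matrix(length: int, p: float) -> np.ndarray:
    """Uniform-error mutation matrix for binary sequences.

    Q[i, j] = p**d * (1 - p)**(length - d), where d is the Hamming
    distance between sequences i and j.
    """
    seqs = list(itertools.product((0, 1), repeat=length))
    n = len(seqs)
    q = np.empty((n, n))
    for i, a in enumerate(seqs):
        for j, b in enumerate(seqs):
            d = sum(x != y for x, y in zip(a, b))
            q[i, j] = p**d * (1 - p) ** (length - d)
    return q


def spread_on_flat_landscape(length=4, p=0.05, generations=500):
    """Iterate replication with mutation when all fitness values are equal."""
    q = mutation_matrix(length, p)
    x = np.zeros(2**length)
    x[0] = 1.0  # homogeneous start, x(0) = (1, 0, ..., 0)
    for _ in range(generations):
        x = q @ x      # equal fitness: only mutation redistributes mass
        x /= x.sum()   # keep relative concentrations normalized
    return x


x_final = spread_on_flat_landscape()
print(x_final.round(4))  # every entry approaches 1 / 2**4
```

Finite populations behave differently (drift and clonal fragmentation, as discussed in the text); this sketch only covers the deterministic limit.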


An approach based on the annealed random map model (Kistler 2015) has been applied to the flat landscape (Derrida and Peliti 1991) and yielded exact solutions for genealogy statistics and genetic variability of the populations. The evolutionarily most important result is the breakup of realistic—not hyper-astronomically large— populations into a number of clones, which drift independently through sequence space. Computer simulation of an RNA population in sequence space yields a similar picture but there is less fragmentation because of the network boundaries confining evolution on a smaller volume than the entire sequence space (Huynen et al. 1996). A result of these computer simulations, which is important for evolution, is the detection of a second error threshold at which the information on a dominant phenotype is lost. Evolution in the RNA model with increasing mutation rates p leads to increasing genetic diversity until the conventional or genotypic error threshold is reached beyond which the information on a dominant RNA sequence is lost. Still—because of extensive neutrality—the dominant phenotype is preserved. Further increase in the error rate eventually leads to a loss of information on the prevailing phenotype at a second critical error rate characterized as phenotypic error threshold. In a colloquial language the population, which was migrating by random drift at mutation rates above the genotypic error threshold falls off the neutral network when the mutation rate is further increased and exceeds the phenotypic error threshold. The existence of intermediate range between mutational instability of the genotype and mutational melting of the phenotype is of great importance, because in this range a population can explore large portions of sequence space without being jeopardized by running into a (phenotypic) error catastrophe. 
During quasi-stationary epochs of the phenotype, populations break up into subclusters, which spread in sequence space until an advantageous innovation with a fitness increase occurs in one of the subclusters. This subcluster then takes over further development, and all other subclusters die out (Schuster 2003, Fig. 11).

5.6 Finite Population Size

Quasispecies theory as described so far is formulated in terms of kinetic differential equations, uses continuous variables, and thus makes the implicit assumption of an infinite population size. In the stochastic approach the deterministic scenario is only enhanced by fluctuations as long as the (unrealistic) condition $N \gg \kappa^l$ is fulfilled. The size of realistic populations compared to the cardinalities of the smallest meaningful sequence and shape spaces, $|Q_l^{(AUGC)}|$, however, reveals a dilemma: In most mathematical models populations cover the whole sequence space, and concentrations of sequences at large distances from the master sequence may therefore be incredibly small. In reality very small particle numbers can only be zero or one, and hence real populations cover only tiny regions in sequence space. Saakian et al. (2009, 2011) truncate a single-peak landscape at a certain Hamming distance from the central peak in order to confine the population to a small area of sequence space. Although the model is interesting from the mathematical point of view, it fails when confronted with the real world. As we know from extensive empirical studies, structures and functions of protein and RNA molecules from very distant regions in sequence space may be very similar and even almost identical, as follows from the neutral theory of evolution of Kimura (1983). The RNA model supports these empirical results by means of a theoretical concept in the form of shape space covering. Similar results were obtained by the simple model of holey landscapes conceived by Gavrilets (1997a, 2004) (see also Sect. 2.6).

The position of the error threshold as a function of the population size $N$ has been calculated by means of a stochastic model based on a birth and death process (Nowak and Schuster 1989). The critical minimal accuracy of replication, $q_{cr} = 1 - p_{cr}$, can be expanded in a power series of the reciprocal square root of the population size and thus increases with $1/\sqrt{N}$ in sufficiently large populations. Accordingly, the critical accuracy is smallest in infinite populations, $q_{cr}(N) > q_{cr}(\infty) = \sigma^{-1}$:

$$q_{cr}(N) \approx q_{cr}(\infty)\left(1 + \frac{1}{l}\left(2\alpha + 2\alpha^2 + \alpha^3 + \ldots\right)\right), \qquad p_{cr}(N) \approx p_{cr}(\infty) - \frac{1}{l}\left(2\alpha + 2\alpha^2 + \alpha^3 + \ldots\right), \tag{45}$$

where $\alpha = \sqrt{(\sigma - 1)/N}$

$10^{11}$–$10^{12}$ copies of viral genomes per day per ml of body fluids), and high genetic mutation rates (around $10^{-5}$ substitutions per nucleotide) that result in the generation of innumerable variants, which are referred to as viral quasispecies. These characteristics support the use of viruses as models in evolutionary studies, in particular in "in vitro" studies. In these works, the application of an evolutionary perspective to the analysis of the evolution of the viral quasispecies offers a fruitful line of investigation. The viral evolutionary pathways determined in the fitness landscape are very useful for predicting the evolution of viral infections, which can help to anticipate potential epidemics and to implement intervention strategies and public health measures. Fitness landscapes associate viral genotypes with fitness and allow the depiction of evolutionary trajectories of variants that result from the interaction of selection and fitness. Understanding why populations evolve along one trajectory, or a set of trajectories, and not others requires an understanding of how positive and negative selection act on viral genomes and their encoded proteins. In general, the direct experimental determination of real fitness values in organisms is cumbersome and difficult (Schuster et al. 1994). This is why fitness determination is restricted to a limited number of alleles (Stadler and Schuster 1990). As a consequence, most fitness landscapes have been artificial, theoretical, or estimated (Biebricher and Eigen 2005), and very few fitness landscapes for viral infections have been established with real fitness data (Stadler and Schuster 1990). Fitness is not an absolute value; it is always referred to a standard reference and to a specific environment (i.e., in vitro cultures, cell type, method of viral quantification, ex vivo studies, etc.).
A fitness estimate in viruses requires face-to-face cultures where the virus or clone to be evaluated is compared in competing serial passages with a reference strain (Clarke et al. 1993). This strategy gives information on how and to what extent one variant outcompetes the reference one during serial passages (Yuste et al. 1999). This approach is laborious, and it is appropriate for a few variants within a population, but it is not feasible to multiple strains. The application of new technologies has permitted the determination of the fitness value of many individual variants (Dolan et al. 2018). Among these methods, we include the analysis of many individual host cells, or the detection of numerous individual virus particles and genomes with the help of all the “omics techniques”. The analysis of multiple variants could be approached by means of parallel evolutionary experiments with different mutants or distinct conditions that can be easily quantified by physico-chemical methods like the microfluidic analysis (Guo et al. 2012). Also the analysis of the folding stability of the viral capsid in bulk cultures results in biophysical fitness models (Rotem et al. 2018) and the study of the biophysical characteristics could predict fitness landscapes of drug resistance mutants (Rodrigues et al. 2016). These new technologies have been also used to study specific phenotypic transitions in viruses that underlie host transmission or pathogenesis that affect fitness (Fowler et al. 2010). It is important to make


M. S. Delgado et al.

clear that protein stability and/or enzyme function are particularly important determinants of mutational fitness effects (Wylie and Shakhnovich 2011; Acevedo et al. 2014). An accurate representation of the viral fitness of the gp160 protein as a function of its protein sequence (a fitness landscape) was obtained by a computational approach and was validated by comparison with experimental antibody binding affinity measurements (Louie et al. 2018). Computational models have also been used for the prediction of viral fitness in HIV-1 mutants (Barton et al. 2019). In addition to these new techniques, the replication capacity of a variant in a culture system has been extensively used as a surrogate to estimate fitness (Barton et al. 2019). This method is easy to perform and also permits the quantification of many variants at the same time. The advent of Next-Generation Sequencing (NGS) allowed the analysis of numerous variants within the population of a viral quasispecies (Acevedo et al. 2014; Dolan et al. 2018). The widespread application of NGS to many viruses in different settings ("in vitro" studies, molecular epidemiology analysis, or outbreak detection) provides valuable information on the evolution of viruses. In NGS studies, as thousands of sequences are obtained, it is impossible to study their fitness individually; however, the frequency of a variant within a quasispecies has been considered a surrogate of the fitness value of this variant within this population (Delgado et al. 2021).
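The use of variant frequency as a fitness surrogate can be illustrated with a few lines of code. This is a minimal sketch, assuming that each read has already been reduced to a haplotype string; the reads below are invented toy data and the function name `haplotype_frequencies` is hypothetical.

```python
from collections import Counter


def haplotype_frequencies(reads):
    """Return the relative frequency of each distinct haplotype.

    The frequency within the quasispecies is used here as a stand-in
    for the fitness of the variant within this population.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.most_common()}


# Toy reads (hypothetical, for illustration only).
reads = ["ACGT", "ACGT", "ACGA", "ACGT", "TCGT"]
freqs = haplotype_frequencies(reads)
for hap, freq in freqs.items():
    print(hap, freq)
```

Such frequencies are relative to one population and one sampling time, in line with the remark that fitness always refers to a specific reference and environment.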

3 Self-organizing Maps

Artificial neural networks (ANN) are models that attempt to reproduce the way the human brain works using computational techniques. In this context, they consist of neurons interconnected by synapses with an associated weight (a numerical value). The input information goes across the neural network, where several operations are performed, producing an output value. The knowledge of the network resides in the individual functioning of each neuron, in its interconnection system, and in the configuration of the weights of the synapses. As in the biological model, ANNs are based on the learning paradigm. In this process, input data are presented to the network iteratively and the weights of the synapses are adjusted in order to reduce the value of some error function. The architecture of the network (number of neurons, organization of the neurons in layers, types of connections between neurons, etc.) and the learning paradigm (supervised, unsupervised, or reinforcement) define the different models of artificial neural networks proposed in the literature.

The architecture of SOM (Kohonen 1982, 2001) consists of one input layer, where a vector of data is introduced to the network, and one output layer composed of a set of neurons (or nodes) organized in a structure that is normally two-dimensional: rows × columns. The resulting structure forms a grid that establishes the neighborhood connections between neurons (square or hexagonal). The input data to be processed by the SOM must be expressed as a set of vectors, all with the same dimension. Furthermore, each neuron (grid node) has an associated synaptic vector of the same nature and dimension as the input vectors.

Viral Fitness Landscapes Based on Self-organizing Maps


The SOM learning algorithm is based on the previously established Learning Vector Quantization (LVQ) algorithm (Makhoul et al. 1985; Merelo et al. 1991), which clusters a potentially infinite number of input samples around a pre-determined number of codebook vectors. SOM uses a competitive unsupervised learning algorithm that clusters the set of sample data by the "winner takes all" process, like the classical method of k-means (MacQueen 1967), using Euclidean distances to determine the best matching unit (bmu), that is, the neuron with minimal distance. The algorithm is described in Box 1.

Box 1

Method:

a) The set of "training" or sample data is organized in n-dimensional vectors $x_k$.

b) Create a two-dimensional map with a set of n-dimensional weight vectors $w_{ij}$ associated to each cell location (i,j). Initialize all $w_{ij}$ with random values similar to the sample data.

c) By turn, compare each sample data vector with each $w_{ij}$ by calculating the Euclidean distance.

d) Locate the bmu (best matching unit) as the one with minimal distance.

e) Change the bmu weight $w_{ij}$ and those in the neighborhood to approximate the sample vector $x_k$:

$$w_{ij}(t+1) = w_{ij}(t) + \gamma\, h(i,j)\, \left(x_k(t) - w_{ij}(t)\right)$$
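Steps a) to e) of Box 1 can be sketched in a few lines of NumPy. This is an illustrative sketch and not the authors' implementation: the Gaussian form of the neighborhood function $h(i,j)$, the constant learning rate `gamma`, the fixed `radius`, and the function names are all assumptions made for this example.

```python
import numpy as np


def train_som(samples, rows, cols, epochs=20, gamma=0.5, radius=1.0, seed=0):
    """Train a small self-organizing map following steps a) to e) of Box 1."""
    rng = np.random.default_rng(seed)
    # a) sample data as n-dimensional vectors x_k
    samples = np.asarray(samples, dtype=float)
    n = samples.shape[1]
    # b) weights w_ij initialized at random within the range of the samples
    lo, hi = samples.min(axis=0), samples.max(axis=0)
    w = rng.uniform(lo, hi, size=(rows, cols, n))
    # grid coordinates (i, j) of every node, used for the neighborhood
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for _ in range(epochs):
        for x in samples[rng.permutation(len(samples))]:
            # c) Euclidean distance between x_k and every w_ij
            dist = np.linalg.norm(w - x, axis=2)
            # d) best matching unit = node with minimal distance
            bmu = np.unravel_index(np.argmin(dist), dist.shape)
            # e) move the bmu and its neighbors toward x_k;
            #    h is an assumed Gaussian neighborhood on the grid
            grid_d2 = ((grid - np.array(bmu)) ** 2).sum(axis=2)
            h = np.exp(-grid_d2 / (2.0 * radius**2))
            w += gamma * h[..., None] * (x - w)
    return w


def best_matching_unit(w, x):
    """Grid position (i, j) of the node closest to the vector x."""
    dist = np.linalg.norm(w - np.asarray(x, dtype=float), axis=2)
    return np.unravel_index(np.argmin(dist), dist.shape)


# Toy data: two groups of two-dimensional vectors (hypothetical values).
data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
        [9.0, 10.0], [10.0, 9.0], [10.0, 10.0]]
weights = train_som(data, rows=2, cols=2)
print(best_matching_unit(weights, [0.0, 0.0]))
print(best_matching_unit(weights, [10.0, 10.0]))
```

In practice the learning rate and the neighborhood radius are usually decreased during training; they are kept constant here only to keep the sketch short.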

Where γ is the learning constant (

400 Capsicum spp. accessions. In the light of results obtained through mutagenesis in the closely related species tomato (Solanum lycopersicum), it can be expected that in such cases most potyviruses would be able to recruit the ortholog eIF4E2 gene, resulting in a lack of resistance or a very limited resistance spectrum (Gauffier et al. 2016). It therefore makes sense that such alleles have not been selected during pepper domestication or in the wild.

Evolutionary patterns of eIF4E1-mediated resistance in pepper and potyviruses

The large diversity amongst pepper eIF4E1 and potyvirus VPg allowed a thorough analysis of their evolutionary patterns, which made the pepper-potyvirus pathosystem a privileged model to understand plant-parasite interaction and coevolution. Both proteins show highly similar evolutionary patterns. They exhibit a large amino acid variability, but this variability is localized at a very limited number of positions in the proteins. Accordingly, strong positive selection is observed at the variable positions of eIF4E1 and PVY or TEV VPg, whereas the rest of the proteins is evolutionarily constrained (under negative selection) and quite invariable at the amino acid level. These "localized variability" patterns associated with positive selection in eIF4E1 and VPg have at least two important consequences. First, they strongly suggest that pepper and potyviruses have undergone coevolutionary dynamics, i.e., each partner's evolution was determined, at least in part, by the evolution of the other. To get deeper insights into this dynamic, it is possible to perform cross-inoculation experiments, where a set of parasite isolates is inoculated to a set of host genotypes, and to analyse the structure of the resulting interaction matrix.
Notably, a nested pattern in the matrix, i.e., when the host range of the more specialized parasite isolates is a subset of the host range of less specialized isolates, would be compatible with an “arms race” model of co-evolution, characterized by the successive fixation of beneficial mutations in the host and parasite populations. In contrast, a modular pattern in the matrix, i.e., when host and parasite genotypes can be grouped into modules, infection being frequent for host and parasite genotypes belonging to the same module but rare for those belonging to different modules, would be compatible with a “Red Queen” model of co-evolution characterized by fluctuating negative frequency-dependent selection in the host and parasite (Woolhouse et al. 2002; Råberg et al. 2014). Moreover, from a genetic point of view, nestedness corresponds rather to a gene-for-gene interaction model whereas modularity corresponds to a matching-allele model with a high degree of specificity between host and parasite genotypes (Moury et al. 2014b; Fig. 2). Infection matrices


L. Tamisier et al.

involving PVY isolates and pepper genotypes carrying various eIF4E1 alleles were neither nested nor modular, which reveals an intermediate model between the gene-for-gene and matching-allele models, and the potential for either arms race or Red Queen co-evolutionary dynamics or a mixture of both. Second, many mutations in eIF4E1 and VPg have pleiotropic effects. Functional characterization of the effect of VPg mutations in PVY showed that some of them affected the ability of the virus to overcome several eIF4E1-mediated resistances, conferred either by different allelic forms carried by several genotypes within the Capsicum annuum species or even by alleles carried by different plant genera within the family Solanaceae: pepper (Capsicum spp.), tomato (Solanum lycopersicum) and tobacco (Nicotiana tabacum) (Moury et al. 2014b). Regarding eIF4E1, we explored these pleiotropy effects thanks

[Figure 2 panels. Top: infection matrices (plant genotypes in rows, parasite genotypes in columns) for the gene-for-gene model (nested structure) and the matching-allele model (modular structure). Bottom: genotype frequency over time for the arms race dynamic and the Red Queen dynamic.]

Fig. 2 Plant-parasite genetic and co-evolutionary models. Two contrasted genetic models of interaction between plant (in rows) and parasite (in columns) genotypes correspond to different structures, modular (right) or nested (left), of infection matrices (top). In these matrices, black cells correspond to plant infection by the parasite and white cells to a lack of infection (plant resistance). These genetic models are expected to translate into contrasted co-evolutionary dynamics between plant (in blue) and parasite (in red) genotypes (bottom). Red queen dynamic (right) is characterized by a diversity of plant resistance alleles and genotypes and of parasite pathogenicity alleles and genotypes that is maintained by recurrent cycles of negative frequency-dependent selection whereas arms race dynamic (left) is characterized by successive positive selection of newly acquired resistance and pathogenicity alleles in the host and parasite populations, respectively
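The difference between a nested and a modular infection matrix can be made concrete with a small check. The sketch below is a simplified illustration that only tests for perfect nestedness (the host ranges form a chain of subsets); published analyses rely on statistical nestedness and modularity indices instead. The matrices are toy examples and the function names are hypothetical.

```python
def host_ranges(matrix):
    """For each parasite (column), return the set of plant rows it infects."""
    n_plants = len(matrix)
    n_parasites = len(matrix[0])
    return [
        {i for i in range(n_plants) if matrix[i][j]}
        for j in range(n_parasites)
    ]


def is_perfectly_nested(matrix):
    """True if every smaller host range is a subset of each larger one."""
    ranges = sorted(host_ranges(matrix), key=len)
    return all(a <= b for a, b in zip(ranges, ranges[1:]))


# Rows are plant genotypes, columns are parasite genotypes;
# 1 = infection, 0 = resistance (toy matrices for illustration).
gene_for_gene = [
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 1],
]
matching_allele = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

print(is_perfectly_nested(gene_for_gene))    # nested structure
print(is_perfectly_nested(matching_allele))  # modular structure
```

A matrix that fails this strict test is not necessarily modular; intermediate structures, such as the one reported for PVY and pepper, fall between the two extremes.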

Virus Evolution Faced to Multiple Host Targets: The Potyvirus …


to networks of mutations linking the different alleles, as there was too limited polymorphism to perform a phylogenetic analysis. We built a minimum-spanning network linking the two wild-type susceptibility alleles (pvr1+ and pvr2+) and 34 resistance alleles (Fig. 3). This network revealed a partial link with the taxonomy of the genus Capsicum. Indeed, all resistance alleles of C. annuum can be grouped together (and with a few C. frutescens alleles). The alleles of C. pubescens and C. chacoense also belonged to specific clusters. Finally, alleles of C. chinense, C. baccatum and a few C. frutescens alleles cluster together. Within C. annuum, there was no clear separation between eIF4E1 alleles from wild (C. annuum var. glabriusculum) and domesticated (C. annuum var. annuum) genotypes.

[Figure 3 network diagram. Nodes: eIF4E1 alleles pvr1+, pvr2+, pvr1 and the numbered pvr2 resistance alleles (1 to 33). Branch labels: amino acid substitutions between neighbouring alleles (for example T51A, N65D, P66T, P66R, A73D, A73T, A74D, L79R, G107R, E160D, E160G, D205G, D219N). Colour legend: C. annuum var. annuum, C. annuum var. glabriusculum, C. frutescens, C. chinense, C. baccatum, C. chacoense, C. pubescens.]

Fig. 3 Minimum spanning network (MSN) of eIF4E1 of 432 pepper genotypes belonging to six Capsicum species obtained with the median joining network algorithm implemented in the software network (version 10.2.0.0). Two groups of C. annuum genotypes are distinguished: wild accessions (C. annuum var. glabriusculum) and cultivars (C. annuum var. annuum). Each eIF4E1 allele is indicated by one node displayed as a circle that is connected by branches of the MSN. Circle areas are proportional to the number of sequenced individuals. The amino acid substitutions between neighbouring alleles are indicated on the branches, considering the susceptibility allele pvr1+ , which was observed in four different species, as the root of the tree. In some cases, the mutation direction could not be inferred and only the amino acid position is indicated. Almost all alleles except pvr1+ and pvr2+ confer potyvirus resistance or are found in potyvirus-resistant genotypes. The grey broken line groups all C. annuum genotypes with potyvirus resistance. Based on Charron et al. (2008), Ibiza et al. (2010), Poulicard et al. (2014) and Séverine Lacombe, Carole Caranta and Benoît Moury (unpublished data). The number of genotypes sequenced was 133 for C. annuum var. annuum, 109 for C. annuum var. glabriusculum, 67 for C. baccatum, 58 for C. chinense, 45 for C. frutescens, 18 for C. pubescens and two for C. chacoense
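The idea of linking alleles by the smallest number of amino acid differences can be sketched as follows. This is a plain minimum spanning tree built with Prim's algorithm on Hamming distances, not the median joining algorithm used for Fig. 3, and it keeps only one of several equally short alternative links. The allele names and four-residue sequences are invented for illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(alleles):
    """Prim's algorithm on pairwise Hamming distances.

    alleles: dict mapping allele name -> aligned amino acid sequence.
    Returns a list of edges (allele_in_tree, new_allele, distance).
    """
    names = list(alleles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # shortest link from the current tree to an allele outside it
        dist, u, v = min(
            (hamming(alleles[u], alleles[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical toy alleles: the same G->R change occurs on two backgrounds,
# mimicking a parallel mutation.
toy_alleles = {"wild": "APDG", "m1": "ATDG", "m2": "ATDR", "m3": "APDR"}
tree = minimum_spanning_tree(toy_alleles)
for u, v, dist in tree:
    print(u, "-", v, dist)
```

In the toy data the four alleles form a cycle of single substitutions, so several trees of the same total length exist, which is one reason why network methods are preferred over a single tree for such data.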


This eIF4E1 network revealed a very large number of parallel mutations (i.e., the same mutation arising several times independently, usually in different species) or different mutations affecting the same amino acid position. This is, for example, the case of the D40G, P66T/P66R and G107R mutations in C. annuum and C. chinense, the A73D/A73T and A74D mutations in C. annuum and C. baccatum, and the E160D/E160G mutations in C. chinense and C. pubescens. Mutations at positions 15 and 219 have even been observed in three different species (C. annuum, C. frutescens and C. chacoense or C. annuum, C. chinense and C. pubescens, respectively). Indeed, given the lack or low frequency of interspecific hybridization between these species, it is highly probable that the resistance alleles shared by multiple species are the results of independent and parallel evolution. It is interesting to note that many of those mutations resulted in amino acid changes altering the electrostatic potential at the surface of the eIF4E1 protein (Poulicard et al. 2014). Such changes are consistent with resistance resulting from impaired binding of eIF4E1 to the viral VPg. These data argue again in favour of (i) a rather limited set of amino acid positions and mutations that can confer potyvirus resistance and that are compatible with plant development, production and fitness and (ii) a strong positive selection on these specific mutations during pepper evolution in the wild, domestication and breeding. eIF4E1 alleles conferring the broadest spectrum of action towards potyviruses, like alleles pvr1, pvr2¹, pvr2², pvr2⁷ and pvr2²⁹, are frequently placed on tips of the network branches and usually carry mutations in different regions of eIF4E1 simultaneously (regions I and II or I and III). Some mutations induced resistance to multiple potyviruses simultaneously (Moury et al. 2014a).
Moreover, the network revealed that mutations that increased the resistance spectrum to a larger number of potyvirus species also increased the resistance durability to a given potyvirus species (Moury et al. 2014a).

From evolutionary patterns to understanding of resistance durability

All these results also provided important insights into resistance durability. Three eIF4E1 resistance alleles (pvr2¹, pvr2² and pvr2³) have been precisely characterized for their durability towards PVY. pvr2¹ and pvr2² have been used worldwide in pepper breeding programmes for PVY or TEV resistance since the 1950s. The PVY resistance conferred by pvr2² proved to be extremely durable, since no PVY isolate able to break that resistance was ever observed in the field. This is a notable counterexample to the general rule that plant monogenic resistance to parasites is usually poorly durable compared to polygenic resistance. In contrast, TEV isolates able to break the pvr2² resistance have been observed in some geographic areas. A few PVY variants able to break the pvr2² resistance have, however, been selected in laboratory conditions using serial virus passages. Compared to pvr2², the pvr2¹ resistance is far less durable, since resistance-breaking (RB) PVY isolates have been observed in many areas where they can reach a high prevalence (Ben Khalifa et al. 2009). Finally, pvr2³ confers a very poorly durable resistance to PVY and was not incorporated in breeding programmes for elite F1 hybrids of pepper. Indeed, pvr2³-breaking PVY isolates are widespread, even in areas where pepper cultivars do not harbour the pvr2³ resistance gene. This


is certainly due to the pleiotropic effects of RB mutations, some VPg mutations selected by the pvr2¹ resistance conferring to PVY the capacity to infect pepper cultivars carrying either pvr2¹ or pvr2³ (Ben Khalifa et al. 2009). In laboratory conditions, one month after inoculation with a non-adapted PVY isolate, up to 90% of pvr2³ plants can get infected because of the emergence of RB mutations in the virus population (Quenouille et al. 2014). In summary, in spite of being highly similar in sequence (they differ by one to three amino acid substitutions), the eIF4E1 proteins encoded by pvr2¹, pvr2² and pvr2³ confer highly contrasted resistance durability towards PVY. Two main factors explain these durability contrasts: (i) the mutational pathways in PVY VPg that are involved in the breakdown of these resistances and (ii) the fitness cost induced by some of these mutations. VPg mutations involved in the breakdown of pvr2¹ and pvr2³ do not seem to be costly for PVY (Ben Khalifa et al. 2009, 2012). Moreover, at least 10 different nucleotide substitutions were each shown to be sufficient for pvr2³ RB (Ayme et al. 2006; Montarry et al. 2011), which explains its low durability. Far fewer nucleotide substitutions were shown to confer pvr2¹ RB, which contributes to its better durability, although RB mutants can appear relatively easily and be selected by the pvr2¹ resistance. Finally, at least two mutations must be fixed in PVY VPg to break the pvr2² resistance. Moreover, the set of mutation combinations required is extremely small (two have been described), and most of these mutations induce a competitiveness cost to PVY in the absence of the pvr2² resistance. Collectively, these factors are responsible for the extremely high durability of pvr2².

eIF4Es also work in pairs

Besides resistance to PVY, TEV and PTV associated with eIF4E1, a digenic recessive resistance to the potyviruses pepper veinal mottle virus (PVMV) and chilli veinal mottle virus (ChiVMV) has been characterized in a few cultivars (Caranta et al. 1997). Genetic analysis showed that such resistance relied on the combination of the above-mentioned locus pvr2 (i.e., eIF4E1) and of the unlinked locus pvr6. A candidate-gene approach and functional complementation assays showed that pvr6 encodes a homolog of eIF4E named eIF(iso)4E (Ruffel et al. 2006). In contrast with eIF4E1 resistance alleles, which are characterized by non-synonymous substitutions, the pvr6-eIF(iso)4E allele that conferred the recessive resistance to PVMV and ChiVMV had a deletion in the coding sequence, causing a complete loss of function of the eIF(iso)4E gene. In accordance with the well-characterized functional redundancy amongst members of the translation initiation factors 4E, the inactivation of eIF(iso)4E did not cause any obvious plant development defect. The resistance to PVMV and ChiVMV based on the combination of eIF4E1 and eIF(iso)4E highlights the high versatility of potyviruses, and the fact that each potyvirus isolate or species can hijack specifically one member of the translation initiation 4E gene family or several. The latter case is however not as frequent, probably because of the structural differences between eIF4E and eIF(iso)4E proteins.


3 Quantitative Resistance as a Mixture of Virus-Specific and Generic Effects

Historically, two main types of plant resistance to parasites have been defined: qualitative and quantitative (Pilet-Nayel et al. 2017). Qualitative resistance produces two discrete phenotypic classes amongst the plant population: the individuals showing a complete or high level of resistance and the susceptible individuals. The underlying genetic architecture of this resistance type is generally composed of one or a few large-effect genes, as for the pvr2 resistance alleles or the digenic resistance combining pvr2 and pvr6. In contrast, quantitative resistance produces a continuous distribution of phenotypes amongst the plant population, comprising individuals showing partial levels of resistance. Quantitative resistance is usually controlled by alleles segregating at many quantitative trait loci (QTLs), each QTL conferring a small to moderate resistance effect. Since qualitative resistance confers a near complete protection to the plant and is also relatively easy to introduce into crop cultivars, it has been widely used to control plant parasites. However, the strong selective pressure imposed by qualitative resistance genes on the parasite populations has often resulted in the adaptation of the targeted pathogens and in RB. In the past decades, particular attention has been paid to quantitative resistance, which has been considered to be more durable than qualitative resistance.

Identification of quantitative resistance and tolerance loci to PVY

In the potyvirus/pepper pathosystem, the first approach used to map resistance QTLs in the pepper genome was to use a biparental progeny segregating for the level of quantitative resistance. The Indian Capsicum annuum inbred line 'Perennial' is particularly interesting, since it carries a broad-spectrum resistance towards several potyviruses, and this resistance has a high level and a good durability.
The inheritance of its resistance was studied using a doubled-haploid (DH) progeny composed of pepper lines obtained from the F1 hybrid between 'Perennial' and 'Yolo Wonder', a line susceptible to potyviruses. One of the first studies performed on this progeny identified 11 genomic regions (QTLs) involved in resistance to PVY and to an isolate of PVMV named "potyvirus E" (Caranta et al. 1997). These QTLs showed different spectra of action, some QTLs being associated with resistance to a single PVY isolate while others were associated with resistance to several PVY isolates, or even to both PVY and PVMV. Surprisingly, 'Perennial' also carries the eIF4E1 pvr2³ resistance allele, which was previously shown to confer a very poorly durable resistance to PVY. It was therefore interesting to understand why 'Perennial' had a durable resistance to PVY in spite of the presence of pvr2³. More recently, Quenouille et al. (2014) and Tamisier et al. (2017) mapped new resistance QTLs using a larger DH population obtained with the same pepper parental lines, together with additional genetic markers and a complete linkage map including the 12 pepper chromosomes (Fig. 4). The phenotyping of the plant resistance to PVY focused on three traits: (i) the size of the virus population at the inoculation step (SI) estimated with the number of primary infection foci on inoculated cotyledons

Virus Evolution Faced to Multiple Host Targets: The Potyvirus …


or leaves, which is related to the infection success, (ii) the kinetics of symptom expression in infected plants (AUDPC, for area under the disease progress curve) and (iii) the virus load within infected plants one month after inoculation (VL). The mapping analysis identified three QTLs with additive effects controlling SI, three controlling AUDPC, and two controlling VL. An epistatic interaction between QTLs on chromosomes 7 and 12 was also detected for SI. Later on, Tamisier et al. (2020) mapped the genetic regions associated with the same three resistance-associated traits using another approach, a genome-wide association study (GWAS). For this, a core-collection of 256 C. annuum accessions was used. With a sufficient density of genetic markers on the pepper genome map, the higher number of recombination events in the core-collection should allow better QTL mapping precision (i.e., reduced confidence intervals for the QTL location). Moreover, due to the much larger genetic basis of GWAS compared to biparental QTL mapping, we expect to identify more resistance genes and alleles. GWAS revealed six different QTLs associated with SI or VL (Fig. 4). These results are consistent with the previous ones (using the biparental progeny) since three of these QTLs co-localize with QTLs previously identified (on chromosomes 6 and 12) and two of these QTLs co-localize with the pvr2-eIF4E1 locus on chromosome 4, suggesting that different eIF4E1 alleles control either a qualitative or a quantitative resistance. A seventh QTL, at the bottom of chromosome 9 in Fig. 4, was shown to be associated with pepper tolerance to PVY. Tolerance is the plant's ability to reduce the damage caused by a parasite, regardless of the parasite load. It was measured as the slope (usually negative) of the linear regression of plant weight (relative to healthy controls) on VL (the lower the slope, the lower the tolerance) (Råberg 2014).
Logically, this tolerance QTL co-localizes with previously identified QTLs controlling AUDPC, since tolerance and symptomatology are partly related. Compared to the biparental mapping, GWAS on the pepper core-collection identified only one new QTL, affecting SI and located at the top of chromosome 9 in Fig. 4. The gene(s) responsible for resistance or tolerance variation at most of these QTLs are unknown, and no obvious candidates have been identified in their confidence intervals, which contain from 19 to 61 genes for GWAS. Only for QTLs located on chromosomes 3 and 4 were obvious candidates identified: gene pvr6 encoding the eIF(iso)4E isoform and gene pvr2-eIF4E1, respectively. QTL mapping provided interesting lessons concerning the genetic architecture of potyvirus quantitative immunity in pepper. The immunity of ‘Perennial’ was built from an assemblage of qualitative and quantitative resistance genes/loci and involved different mechanisms of action: an early resistance reducing the infection efficiency, a later resistance reducing the virus load in the whole plant, and tolerance effects. Some of these effects were specific to PVY, like tolerance, whereas others were more general. Notably, the two QTLs with a major effect on resistance to PVY infection (SI) on chromosomes 7 and 12 co-localized with QTLs controlling resistance to infection by another unrelated virus, Cucumber mosaic virus (CMV; genus Cucumovirus, family Bromoviridae) (Fig. 4). In contrast, the two QTLs located on chromosome 6 that determined resistance to virus infection were specific, one efficient against PVY and the other against CMV. Amongst the pepper progeny,
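The two derived measures used above, AUDPC and tolerance, can be illustrated with a minimal Python sketch. It assumes AUDPC is computed with the trapezoidal rule and tolerance with an ordinary least-squares slope; the function names and all numerical values are hypothetical and are not data from the studies cited.

```python
def audpc(days, severity):
    """Area under the disease progress curve, by the trapezoidal rule."""
    return sum((t1 - t0) * (s0 + s1) / 2
               for t0, t1, s0, s1 in zip(days, days[1:], severity, severity[1:]))


def tolerance_slope(virus_load, relative_weight):
    """Least-squares slope of relative plant weight regressed on virus load.

    A slope close to zero means little damage per unit of virus load
    (high tolerance); a strongly negative slope means low tolerance.
    """
    n = len(virus_load)
    mean_x = sum(virus_load) / n
    mean_y = sum(relative_weight) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(virus_load, relative_weight))
    sxx = sum((x - mean_x) ** 2 for x in virus_load)
    return sxy / sxx


# Hypothetical symptom scores recorded 7, 14, 21 and 28 days after inoculation.
print(audpc([7, 14, 21, 28], [0.0, 1.0, 2.0, 2.0]))

# Hypothetical virus loads (arbitrary units) and weights relative to healthy controls.
loads = [1.0, 2.0, 3.0, 4.0]
print(tolerance_slope(loads, [0.95, 0.90, 0.85, 0.80]))  # more tolerant accession
print(tolerance_slope(loads, [0.90, 0.70, 0.50, 0.30]))  # less tolerant accession
```

Under these assumptions, two accessions carrying the same virus load can differ in tolerance simply because their weight declines at different rates as VL increases.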


L. Tamisier et al.

[Figure 4: map of pepper chromosomes 1, 3, 4, 5, 6, 7, 9 and 12 showing the pvr2 and pvr6 loci and the QTLs detected for PVY and CMV in the biparental progeny (DH lines) and in the core-collection. Traits: resistance breakdown frequency (RBF), virus load (VL), area under disease progress curve (AUDPC), virus population size at inoculation (SI), and tolerance.]

Fig. 4 Pepper genetic map of resistance and tolerance QTLs identified using DH lines carrying the pvr2³ resistance allele (filled boxes) or a pepper core-collection (empty boxes). The detected QTLs controlled several traits linked to PVY resistance, such as the breakdown of pvr2³ resistance (RBF; in orange), the virus load within plants (VL; in blue), the area under disease progress curve (AUDPC; in green) and the size of the virus population at the inoculation step (SI) estimated with the number of primary infection foci (in black), as well as PVY tolerance (in grey). The number of infection foci in the DH lines was measured for both PVY and CMV, the virus being indicated in white in the black boxes corresponding to the QTLs. Lines indicate epistatic effects between loci. Shades of grey along the chromosomes indicate SNP marker density in the pepper core-collection. The pvr2 and pvr6 genes encode the eIF4E1 and eIF(iso)4E proteins, respectively


resistance to infection by PVY was also correlated with resistance to infection by a third virus, Tomato mosaic virus (ToMV; genus Tobamovirus, family Virgaviridae), strengthening the case for its genericity, although no genetic analysis of the resistance to this latter virus could be performed. The very large spectrum of action of the resistance to infection could be linked to the inoculation procedure used for the three viruses PVY, CMV and ToMV, i.e., manual inoculation by rubbing plant leaves or cotyledons with virus inocula. The diversity of genes and mechanisms involved in the ‘Perennial’ resistance is certainly also at the heart of its high resistance durability.

Unravelling the mechanisms behind polygenic resistance durability

One of the most intriguing findings concerning the C. annuum cultivar ‘Perennial’ is that its PVY resistance is highly durable, even under high inoculum doses as in graft-inoculation experiments (Quenouille et al. 2013), even though it carries the major-effect eIF4E1-pvr2³ allele, which was shown to be extremely poorly durable (Palloix et al. 2009). Consequently, the high durability of the PVY resistance of ‘Perennial’ could be due to the presence of additional resistance factors (probably some of the mapped resistance QTLs), possibly involving synergy effects between pvr2³ and QTLs. In order to explain the greater durability of polygenic (major gene pvr2³ plus QTLs) compared to monogenic (major gene pvr2³ alone) resistances, three possible evolution-related mechanisms have been examined. First, the QTLs could reduce the total virus population size in the infected plants. A lower within-plant virus load could be due to a reduced number of virus replications and, consequently, a lower probability of appearance of mutations in the PVY genome, including RB mutations.
At the phenotypic level, a strong positive correlation between resistance breakdown frequency (RBF) and virus load (VL) has been found across pepper lines of the ‘Perennial × Yolo Wonder’ DH progeny (Quenouille et al. 2014). Four QTLs controlling RBF were mapped in the pepper genome using that progeny, including one QTL with only an epistatic effect (on chromosome 6) (Fig. 4). Moreover, three of these QTLs, on chromosomes 1, 3, and 6, co-localize with QTLs controlling either VL or AUDPC. Interestingly, the effect of parental alleles was consistent between QTLs, the alleles decreasing RBF also decreasing VL and AUDPC. This strongly suggests a pleiotropic effect of the quantitative resistance factors. Interestingly, the QTL on chromosome 3 controlling RBF coincides with the pvr6-eIF(iso)4E gene. As we have seen before, pvr6 was shown to expand the resistance spectrum to additional potyviruses (PVMV and ChiVMV) when combined with some pvr2 resistance alleles. Surprisingly, the pvr6 allele responsible for PVMV/ChiVMV resistance, a natural KO allele, decreased the durability of the PVY resistance determined by the pvr2³ allele (higher RBF). This suggests a trade-off between the breadth of the resistance spectrum and resistance durability, opposite to what we have seen for the different pvr2 alleles (see above) or other resistance genes for which a positive correlation is generally observed (Janzac et al. 2009; Le Van et al. 2013). This unusual relationship could be due to the complex regulatory pathways that operate between the different eIF4E members in the plant. The destabilization of resistance durability conferred by pvr6 did not concern only the pvr2³ allele but also the pvr2⁴
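The first mechanism can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every genome replication independently produces a given RB mutation with the same probability; the function name, the mutation rate and the replication counts are hypothetical and are not estimates for PVY.

```python
import math


def prob_rb_mutant_appears(n_replications, mutation_rate):
    """Probability that at least one RB mutant arises in n_replications events.

    Equals 1 - (1 - mutation_rate) ** n_replications, evaluated with
    log1p/expm1 so that very small rates do not lose precision.
    """
    return -math.expm1(n_replications * math.log1p(-mutation_rate))


# Hypothetical per-replication probability of generating one specific RB substitution.
mu = 1e-5

# Hypothetical replication counts in a plant with low or high virus load.
for n in (10_000, 1_000_000):
    print(n, prob_rb_mutant_appears(n, mu))
```

With these illustrative numbers, a hundredfold reduction in the number of replications moves the appearance of an RB mutant from near certainty to roughly one chance in ten, which is the qualitative argument made in the text.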


allele (Quenouille et al. 2016) and the pvr2¹ and pvr2² alleles (unpublished data). Moreover, for pvr2², resistance to both PVY and TEV was destabilized, strengthening the generality of this effect.

Another mechanism that could explain the greater durability of polygenic resistance is the reduction of the virus effective population size. Within the plant, several evolutionary forces act on the evolution of the virus population and its ability to adapt to the plant resistance. The first one is selection, a deterministic force that increases the frequency of the most adapted virus variants over time. The second one is genetic drift, a stochastic force that randomly changes the frequencies of the virus variants from generation to generation, independently of their fitness. Both forces act jointly but can have different effects on virus evolution. Indeed, when within-plant genetic drift is strong, deleterious mutations may be randomly fixed or advantageous ones may be lost. The population genetics parameter used to quantify the impact of genetic drift on the virus population is the effective population size (Ne). Ne can be defined as the number of individuals that pass their genes to the next generation, and the strength of genetic drift is inversely proportional to it. During the infection process, the virus population is subject to several bottlenecks, which reduce Ne and, consequently, increase the effect of genetic drift. One of these bottlenecks occurs during the inoculation step, where only a small proportion of the virus population is transmitted to the plant. Therefore, the QTLs controlling the PVY population size at inoculation (the variable SI described above) directly control Ne at inoculation, as well as the effect of genetic drift.
For some pepper genotypes, another population bottleneck occurs at the onset of PVY systemic infection and could be due to virus loading into the plant vascular system (Rousseau et al. 2017). Interestingly, the QTL controlling SI on chromosome 6 co-localized with a QTL controlling VL and is also involved in an epistatic interaction with a QTL on chromosome 3 controlling RBF. The parental allele at the QTL on chromosome 6 that decreases Ne (SI) also decreases VL, while increasing RBF. Therefore, these results provide genetic evidence that decreasing Ne and VL, and, consequently, increasing genetic drift, could help to prevent the major resistance gene from breakdown (Rousseau et al. 2018).

Finally, the third mechanism that could explain the high durability of polygenic resistance is the lower selection pressure exerted by QTLs on the virus population, compared to the selection pressure exerted by qualitative resistance genes. A low selection pressure could slow down the emergence of adapted virus mutants, increasing resistance durability. To explore this mechanism, the competitiveness (relative fitness) of five PVY variants was assessed in pepper plants carrying a polygenic (the ‘Perennial’ cultivar) or a monogenic resistance (a pepper DH line with pvr2³ alone). These five PVY variants differed only by a few non-synonymous substitutions that conferred on them various degrees of adaptation to the major-effect resistance gene pvr2³. Results showed that the PVY variants with a better adaptation to pvr2³ were more competitive than the others in both pepper genotypes, but the difference in competitiveness between PVY variants was greater in plants with the monogenic resistance (Quenouille et al. 2013). Consequently, the selection exerted
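The interplay between selection and drift described above can be sketched with a haploid Wright-Fisher simulation. This is an illustrative toy model, not the method of the studies cited: it assumes discrete generations, a constant Ne, two competing variants and a fixed selective advantage, and every parameter value is hypothetical.

```python
import random


def wright_fisher_loss_fraction(ne, init_count, s, replicates, rng):
    """Fraction of replicate populations in which an advantageous variant is lost.

    ne         -- effective population size (genomes sampled each generation)
    init_count -- initial number of copies of the advantageous variant
    s          -- selective advantage of the variant (relative fitness 1 + s)
    """
    lost = 0
    for _ in range(replicates):
        count = init_count
        while 0 < count < ne:
            p = count / ne
            # Selection: deterministic shift in the expected frequency.
            p_sel = p * (1 + s) / (1 + p * s)
            # Drift: binomial sampling of ne genomes for the next generation.
            count = sum(rng.random() < p_sel for _ in range(ne))
        if count == 0:
            lost += 1
    return lost / replicates


rng = random.Random(1)
# Same starting frequency (10 %) and same advantage, but different Ne.
print("Ne = 10 :", wright_fisher_loss_fraction(10, 1, 0.2, 200, rng))
print("Ne = 200:", wright_fisher_loss_fraction(200, 20, 0.2, 200, rng))
```

In this sketch the advantageous variant starts at the same frequency in both cases, yet it is lost far more often in the small population, which is the sense in which a narrow bottleneck can shield a resistance gene from adapted mutants.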


by the polygenic resistance onto the PVY population is weaker than the selection exerted by the monogenic resistance, which is favourable to resistance durability. Combining major-effect resistance genes and resistance QTLs in the same plant cultivar was also shown to be a successful strategy to enhance the durability of resistance against other kinds of plant parasites, including a fungus and a nematode (Brun et al. 2010; Fournet et al. 2013).
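The link between weaker selection and slower emergence can be illustrated with the standard deterministic model of two competing haploid variants, in which the log-odds of the adapted variant increase by ln(w) each generation, w being its relative fitness. The sketch below assumes constant relative fitness, no drift and no new mutation; the fitness values and starting frequency are hypothetical and are not the competitiveness estimates of Quenouille et al. (2013).

```python
import math


def generations_to_majority(p0, relative_fitness):
    """Generations needed for an adapted variant to rise from frequency p0 to 50 %.

    Uses logit(p_t) = logit(p0) + t * ln(w), with logit(0.5) = 0.
    """
    return math.log((1 - p0) / p0) / math.log(relative_fitness)


p0 = 0.001  # hypothetical initial frequency of the adapted variant
print("strong selection (w = 2.0):", generations_to_majority(p0, 2.0))
print("weak selection   (w = 1.1):", generations_to_majority(p0, 1.1))
```

Under these assumptions the time to reach a majority scales with 1/ln(w), so halving the log-fitness advantage doubles the time the adapted variant needs to take over.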

4 A Hypothetical Scenario for Evolution of Complex Resistance Systems

Thanks to phenotyping of resistance spectrum, durability, and mechanisms, combined with genetic and functional analyses, a thorough picture of the complexity of potyvirus resistance in the Indian pepper cultivar ‘Perennial’ could be obtained, and a hypothetical scenario for the evolution of this complexity can be proposed (Fig. 5). The pvr2³ resistance allele was probably the first resistance factor incorporated, since it confers a high level of resistance against the worldwide-distributed potyvirus PVY. Indeed, the other resistance factors either work in epistasis with pvr2³ (like pvr6) or confer a rather small resistance effect (like QTLs). The pvr2³ allele confers resistance to other potyviruses like TEV or PTV, but those are mostly restricted to the American continent and did not exert a strong selection pressure on pepper production in India. However, PVY is not the only potyvirus in Asia, and ChiVMV is highly prevalent throughout the Asian continent, including India. No single major-effect gene was shown to be efficient against ChiVMV, but the combination of pvr6 and pvr2 alleles exerts a strong resistance effect against ChiVMV and PVMV, the latter being mostly prevalent in Africa. The second step in the scenario was therefore probably the acquisition of the pvr6 KO allele, which expanded resistance to ChiVMV. However, as we have seen before, the combination with pvr6 has a detrimental effect on pvr2³ in terms of resistance durability towards PVY. Three additional QTLs on chromosomes 1, 5, and 6 were consequently required to regain a good level of resistance durability. Hence, to achieve both a large-spectrum and highly durable potyvirus resistance, it was necessary to combine pvr2³, pvr6, and three resistance QTLs.

5 Conclusion

Overall, the pepper/potyvirus pathosystem and the wealth of variability amongst eIF4E1-pvr2 resistance alleles have been instrumental in understanding how resistances based on eIF4E can be developed. Firstly, the characterization of eIF4E1 as the basis for pvr2-driven resistance started a large body of work in many crops at the beginning of the millennium. Those works showed that most recessive resistances to potyviruses and related single-strand positive-sense RNA viruses that had been


[Figure 5 panels: (1) Resistance to PVY: pvr2³ (eIF4E1). (2) Enlargement of resistance spectrum to ChiVMV, decrease of PVY resistance durability: pvr2³ (eIF4E1) and pvr6 (eIF(iso)4E). (3) Combining large-spectrum and durable potyvirus resistance thanks to QTLs: pvr2³ (eIF4E1), pvr6 (eIF(iso)4E), and QTL1, QTL5 and QTL6 (encoded proteins unknown).]

Fig. 5 Hypothetical scenario of introgression of multiple resistance genes and QTLs in the Indian pepper cultivar ‘Perennial’ to achieve both large spectrum and high-durability potyvirus resistance. While the pvr2 and pvr6 genes were shown to encode different members of the eIF4E multigenic family, the genes and encoded proteins corresponding to QTLs have not been identified. The plus and minus signs indicate the effects of the pvr6 gene and of QTLs on the durability of the PVY resistance controlled by the pvr2³ gene

selected in crops resulted from similar types of mutations in eIF4Es (Robaglia and Caranta 2006). Besides, the very complex array of mutation combinations affecting eIF4E1 in pepper provides a blueprint that could help the development of new resistance alleles in crops for which resistance alleles have not been naturally selected. With the advance of genome editing technologies, a selection of mutations whose combination has been shown to be effective in terms of resistance spectrum and durability could be transferred using precision breeding in order to generate transgene-free resistant plants (Bastet et al. 2017; Veillet et al. 2020). The wide diversity of resistance genes and alleles in pepper, together with the diversity and evolutionary potential of potyviruses, provided insights into the factors that rule resistance durability: the complexity of the assemblage of resistance factors in the plants and their effect on the evolutionary forces acting on potyviruses are good predictors of resistance durability, and can be used to create new cultivars combining large-spectrum, highly efficient and highly durable resistance.

Acknowledgements This chapter is dedicated to Alain Palloix (GAFL unit, INRAE PACA) who was a pioneer in the study of pepper resistance durability. We thank the many PhD and Master students who contributed to the advancement of knowledge on pepper resistance durability in the GAFL and Pathologie Végétale units at INRAE PACA, Cécile Desbiez (INRAE PACA) for her help with the use of the network software and Sophie Ewert, Vincent Simon, Grégory Girardot, Pauline Millot, Ghislaine Nemouchi and Marion Szadkowski (INRAE PACA) for their help in the experiments. We thank Sarah De Colle–Guiheneux and William Billaud for their comments on a previous version of the manuscript.
We also acknowledge the CRB-Leg (www6.paca.inrae.fr/gafl/CRB-Legumes), which maintained the INRAE pepper germplasm, as well as the experimental facilities of the Pathologie Végétale (https://doi.org/10.15454/8DGF-QF70) and GAFL INRAE research units.


References

Ayme V, Souche S, Caranta C et al (2006) Different mutations in the VPg of Potato virus Y confer virulence on the pvr2³ resistance in pepper. Mol Plant-Microbe Interact 19:557–563
Bastet A, Robaglia C, Gallois J-L (2017) eIF4E resistance: natural variation should guide gene editing. Trends Plant Sci 22:411–419
Ben Khalifa M, Simon V, Marrakchi M et al (2009) Contribution of host plant resistance and geographic distance to the structure of Potato virus Y (PVY) populations in pepper in northern Tunisia. Plant Pathol 58:763–772
Ben Khalifa M, Simon V, Fakhfakh H et al (2012) Tunisian Potato virus Y isolates with unnecessary pathogenicity towards pepper: Support for the matching allele model in eIF4E resistance–potyvirus interactions. Plant Pathol 61:441–447
Brun H, Chèvre A-M, Fitt BDL et al (2010) Quantitative resistance increases the durability of qualitative resistance to Leptosphaeria maculans in Brassica napus. New Phytol 185:285–299
Caranta C, Lefebvre V, Palloix A (1997) Polygenic resistance of pepper to potyviruses consists of a combination of isolate-specific and broad-spectrum quantitative trait loci. Mol Plant-Microbe Interact 10:872–878
Charron C, Nicolaï M, Gallois J-L et al (2008) Natural variation and functional analyses provide evidence for co-evolution between plant eIF4E and potyviral VPg. Plant J 54:56–68
Fournet S, Kerlan M-C, Renault L et al (2013) Selection of nematodes by resistant plants has implications for local adaptation and cross-virulence. Plant Pathol 62:184–193
Gauffier C, Lebaron C, Moretti A et al (2016) A TILLING approach to generate broad-spectrum resistance to potyviruses in tomato is hampered by eIF4E gene redundancy. Plant J 85:717–729
Ibiza VP, Canizares J, Nuez F (2010) EcoTILLING in Capsicum species: searching for new virus resistances. BMC Genomics 11:631
Janzac B, Fabre M-F, Palloix A et al (2009) Phenotype and spectrum of action of the Pvr4 resistance in pepper against potyviruses, and selection for virulent variants. Plant Pathol 58:443–449
Kang B-C, Yeam I, Frantz DJ et al (2005) The pvr1 locus in pepper encodes a translation initiation factor eIF4E that interacts with Tobacco etch virus VPg. Plant J 42:392–405
Le Van A, Caffier V, Lasserre-Zuber P et al (2013) Differential selection pressures exerted by host resistance quantitative trait loci on a pathogen population: a case study in an apple Venturia inaequalis pathosystem. New Phytol 197:899–908
Montarry J, Doumayrou J, Simon V et al (2011) Genetic background matters: a plant–virus gene-for-gene interaction is strongly influenced by genetic contexts. Mol Plant Pathol 12:911–920
Moury B, Charron C, Janzac B et al (2014a) Evolution of plant eukaryotic initiation factor 4E (eIF4E) and potyvirus genome-linked protein (VPg): a game of mirrors impacting resistance spectrum and durability. Inf Genet Evol 27:472–480
Moury B, Janzac B, Ruellan Y et al (2014b) Interaction pattern between Potato virus Y and eIF4E-mediated recessive resistance in the Solanaceae. J Virol 88:9799–9807
Palloix A, Ayme V, Moury B (2009) Durability of plant major resistance genes to pathogens depends on the genetic background, experimental evidence and consequences for breeding strategies. New Phytol 183:190–199
Pilet-Nayel ML, Moury B, Caffier V et al (2017) Quantitative resistance to plant pathogens in pyramiding strategies for durable crop protection. Front Plant Sci 8:1838
Poulicard N, Pinel-Galzi A, Fargette D et al (2014) Alternative mutational pathways, outside the VPg, of rice yellow mottle virus to overcome eIF(iso)4G-mediated rice resistance under strong genetic constraints. J Gen Virol 95:219–224
Quenouille J, Montarry J, Palloix A et al (2013) Farther, slower, stronger: how the plant genetic background protects a major resistance gene from breakdown. Mol Plant Pathol 14:109–118
Quenouille J, Paulhiac E, Moury B et al (2014) Quantitative trait loci from the host genetic background modulate the durability of a resistance gene: a rational basis for sustainable resistance breeding in plants. Heredity 112:579–587
Quenouille J, Saint-Félix L, Moury B et al (2016) Diversity of genetic backgrounds modulating the durability of a major resistance gene. Analysis of a core collection of pepper landraces resistant to Potato virus Y. Mol Plant Pathol 17:296–302
Råberg L (2014) How to live with the enemy: understanding tolerance to parasites. PLoS Biol 12:e1001989
Råberg L, Alacid E, Garces E et al (2014) The potential for arms race and Red Queen coevolution in a protist host–parasite system. Ecol Evol 4:4775–4785
Robaglia C, Caranta C (2006) Translation initiation factors: a weak link in plant RNA virus infection. Trends Plant Sci 11:40–45
Rousseau E, Moury B, Mailleret L et al (2017) Estimating virus effective population size and selection without neutral markers. PLoS Pathog 13:e1006702
Rousseau E, Tamisier L, Fabre F et al (2018) Impact of genetic drift, selection and accumulation level on virus adaptation to its host plants. Mol Plant Pathol 19:2575–2589
Ruffel S, Dussault MH, Palloix A et al (2002) A natural recessive resistance gene against potato virus Y in pepper corresponds to the eukaryotic initiation factor 4E (eIF4E). Plant J 32:1067–1075
Ruffel S, Gallois J-L, Moury B et al (2006) Simultaneous mutations in translation initiation factors eIF4E and eIF(iso)4E are required to prevent pepper veinal mottle virus infection of pepper. J Gen Virol 87:2089–2098
Tamisier L, Rousseau E, Barraillé S et al (2017) Quantitative trait loci in pepper control the effective population size of two RNA viruses at inoculation. J Gen Virol 98:1923–1931
Tamisier L, Szadkowski M, Nemouchi G et al (2020) Genome-wide association mapping of QTLs implied in potato virus Y population sizes in pepper: evidence for widespread resistance QTL pyramiding. Mol Plant Pathol 21:3–16
van Schie CCN, Takken FLW (2014) Susceptibility genes 101: how to be a good host. Annu Rev Phytopathol 52:551–581
Veillet F, Durand M, Kroj T et al (2020) Precision breeding made real with CRISPR: illustration through genetic resistance to pathogens. Plant Commun 1:100102
Woolhouse MEJ, Webster JP, Domingo E et al (2002) Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat Genet 32:569–577

The Role of Extensive Recombination in the Evolution of Geminiviruses

Elvira Fiallo-Olivé and Jesús Navas-Castillo

Abstract Mutation, recombination and pseudo-recombination are the major forces driving the evolution of viruses by the generation of variants upon which natural selection, genetic drift and gene flow can act to shape the genetic structure of viral populations. Recombination between related virus genomes co-infecting the same cell usually occurs via template swapping during the replication process and produces a chimeric genome. The family Geminiviridae shows the highest evolutionary success among plant virus families, and the common presence of recombination signatures in their genomes reveals a key role in their evolution. This review describes the general characteristics of members of the family Geminiviridae and associated DNA satellites, as well as the extensive occurrence of recombination at all taxonomic levels, from strain to family. The review also presents an overview of the recombination patterns observed in nature that provide some clues regarding the mechanisms involved in the generation and emergence of recombinant genomes. Moreover, the results of experimental evolution studies that support some of the conclusions obtained in descriptive or in silico works are summarized. Finally, the review uses a number of case studies to illustrate those recombination events with evolutionary and pathological implications as well as recombination events in which DNA satellites are involved.

1 Introduction

Mutation, recombination and pseudo-recombination (reassortment) are the major forces driving the evolution of viruses. These three factors are essentially the

E. Fiallo-Olivé · J. Navas-Castillo (B) Instituto de Hortofruticultura Subtropical y Mediterránea “La Mayora” (IHSM-UMA-CSIC), Consejo Superior de Investigaciones Científicas, Avenida Dr. Wienberg s/n, 29750 Algarrobo-Costa, Málaga, Spain e-mail: [email protected]

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_4


E. Fiallo-Olivé and J. Navas-Castillo

same that drive the evolution of all living things, viral recombination being analogous to recombination during meiosis, and pseudo-recombination being analogous to chromosomal reassortment during sexual reproduction. These evolutionary forces generate variants in viral genomes upon which natural selection, genetic drift and gene flow can act to shape the genetic structure of viral populations (reviewed in Duffy et al. 2008; Roossinck 1997). Recombination between related virus genomes co-infecting the same cell usually occurs via template-switching during the replication process and produces a chimeric genome. Reassortment takes place in segmented/multipartite viruses and generates novel combinations of genomic components that are derived from two or more parental viruses. This process involves either co-encapsidation in segmented viruses or co-transmission in multipartite viruses, the latter being much more common in plant viruses than in animal viruses (Varsani et al. 2018). In the case of plant viruses, wild host plants have been shown to play a key role as reservoirs of viral diversity, thus contributing to viral evolution and the emergence of diseases affecting crops, with most of these studies having been carried out on geminiviruses (García-Arenal and Zerbini 2019). In both crops and wild plants, mixed viral infections are frequent; this has important implications for evolution, as these mixed infections are a prerequisite for recombination to occur (Alcaide et al. 2020). Agricultural intensification, along with an increased population of insect vectors of plant viruses, are additional significant factors driving their evolution (Seal et al. 2006). In this review, we briefly describe the general characteristics of members of the family Geminiviridae and associated DNA satellites, and examine the common presence of recombination signatures in their genomes that play a key role in their evolution. 
We also present an overview of the recombination patterns observed in nature that provide some clues about the mechanisms involved in the generation and emergence of recombinant genomes. Moreover, results of experimental evolution studies that support some of the conclusions reached in descriptive or in silico works are described. Finally, we review a number of case studies of recombination events with evolutionary and pathological implications as well as the recombination events in which DNA satellites are involved.

2 The Family Geminiviridae

2.1 General Features, Classification, Genome Organization and Replication

The family Geminiviridae includes plant viruses with circular single-stranded (ss) DNA genomes encapsidated in unique twinned (geminate) virions (Fiallo-Olivé et al. 2021; Navas-Castillo and Fiallo-Olivé 2021). Geminiviruses are the main group of plant viruses, comprising fourteen recognized genera and more than 500 species. Their members infect both dicot and monocot plant species and use insect vectors for transmission. A high number of geminiviruses are the cause of economically


important diseases among vegetable and fiber crops worldwide, mainly in tropical and subtropical regions of the world (Navas-Castillo et al. 2011). Examples of such highly destructive geminiviruses include bean golden mosaic virus and bean golden yellow mosaic virus, two of the greatest impediments to bean production in the Americas (Zerbini and Ribeiro 2021); maize streak virus, the causal agent of maize streak disease (Shepherd et al. 2010), which is the principal viral disease affecting maize in sub-Saharan Africa; tomato yellow leaf curl virus and other begomoviruses causing tomato yellow leaf curl disease, one the most important viral diseases affecting tomato plants worldwide (Yan et al. 2021); African cassava mosaic virus and related begomoviruses, the most significant constraints to cassava production in the African continent (Rey and Vanderchuren 2017); and cotton leaf curl Multan virus (CLCuMuV), which is the cause of the most significant viral disease affecting cotton crops in Pakistan and India (Sattar et al. 2013). In addition to infecting economically important crops, geminiviruses also infect weeds and other wild plants, which can act as viral reservoirs, having an important role to play in viral emergence (García-Arenal and Zerbini 2019). Depending on genome organization, insect vectors and host ranges, geminiviruses are currently classified into 14 genera: Becurtovirus, Begomovirus, Capulavirus, Citlodavirus, Curtovirus, Eragrovirus, Grablovirus, Maldovirus, Mastrevirus, Mulcrilevirus, Opunvirus, Topilevirus, Topocuvirus and Turncurtovirus (Fiallo-Olivé et al. 2021; Navas-Castillo and Fiallo-Olivé 2021; Roumagnac et al. 2021). Begomoviruses are transmitted by whiteflies (Hemiptera: Aleyrodidae) of the Bemisia tabaci complex; becurtoviruses, curtoviruses, mastreviruses, mulcrileviruses, and turncurtoviruses by leafhoppers; grabloviruses and topocuviruses by treehoppers; and capulaviruses by aphids (Fiallo-Olivé et al. 2020, 2021). 
The vectors of citlodaviruses, eragroviruses, maldoviruses, opunviruses and topileviruses are unknown. The genus Begomovirus contains the highest number of species (more than 440) among all viral genera (Fiallo-Olivé et al. 2021). Begomoviruses generally induce severe symptoms in their hosts and are the most important in the family in terms of the losses they cause (Navas-Castillo et al. 2011). In the family Geminiviridae, species demarcation criteria are based on genome-wide pairwise identity, but the specific percentage varies from genus to genus. In the case of begomoviruses, for example, a virus should be considered a member of a new species if the sequence has 0.1) remaining uninfected (González-Jara et al. 2009). However, a recent reanalysis of these data suggested a higher MOI, whilst the number of coinfected cells was low due to the spatial segregation of the TMV variants (Zwart et al. 2013). For the pararetrovirus cauliflower mosaic virus (CaMV), estimated MOI values varied from 2 to 13 over time, with most cells being infected (Gutiérrez et al. 2010). For Japanese soil-borne wheat mosaic virus, MOI was estimated for the first rounds of cellular infection in the inoculated foci, rendering a value in the range 5–6 (Miyashita and Kishino 2010). For TEV, the number of infected cells in systemic tissues early in infection depends on the number of primary infection foci, and the number of infected cells never increased to a frequency > 0.5, and MOI < 1.5 per cell (Tromas et al. 2014). Interestingly, Gutiérrez et al. (2015) showed that MOI for TuMV dramatically changed along infection, with a biphasic behavior: MOI < 0.02 near the primary infection sites generated when the virus exits the vasculature but it was > 21 when infection was spreading from cell-to-cell despite the likely onset of mechanisms inhibiting secondary infections by other genotypic lineages in the same leaf. 
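To see why MOI values matter for coinfection, and hence for the opportunity to recombine, it helps to relate MOI to the fraction of cells receiving zero, one or several genomes. The sketch below assumes, as a simplification not taken from the studies cited, that genomes enter cells independently, so that the number per cell is Poisson-distributed with mean equal to the MOI over all cells. Spatial segregation of variants and superinfection exclusion, both mentioned in the text, violate this assumption.

```python
import math


def poisson_cell_fractions(moi):
    """Fractions of cells with zero, exactly one, or several infecting genomes.

    Assumes a Poisson distribution of genomes per cell with mean `moi`.
    """
    uninfected = math.exp(-moi)
    single = moi * math.exp(-moi)
    multiple = 1.0 - uninfected - single
    return uninfected, single, multiple


# Illustrative MOI values spanning the low and high ends discussed in the text.
for moi in (0.1, 1.5, 4.0):
    uninfected, single, multiple = poisson_cell_fractions(moi)
    print(f"MOI {moi}: uninfected {uninfected:.3f}, "
          f"single {single:.3f}, multiple {multiple:.3f}")
```

Under this simplified model, fewer than 1 % of cells receive more than one genome at an MOI of 0.1, whereas about 90 % do at an MOI of 4, which illustrates why low MOI estimates sit uneasily with frequent recombination.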
In the case of citrus tristeza virus, MOI was also estimated to be very low as a consequence of mechanisms of superinfection exclusion (Bergua et al. 2014). Finally, as mentioned in Sect. 3 above, MOI for ToMV was estimated to be 4 per cell (Miyashita et al. 2015). Hence, MOI seems to be generally low, which raises doubts about how recombinant genotypes are so frequently found in nature (Simon-Loriere and Holmes 2011) and makes it difficult to explain the evolutionary stability of segmented viruses. Bottlenecks due to low MOI values would result in a heterogeneous spatial distribution of the genetic variability of the virus population within tissues. Bottlenecks occurring at the infection of new organs would extend this structuring to the organ level. The higher levels of biological organization, organs and tissues, were the subject of a great deal of study before the rise of the HTS era. Evidence for spatial structure in the kinetics of virus multiplication and/or genetic structure is abundant, and derives from the analysis of different plant-virus systems (e.g., Hall


S. F. Elena and F. García-Arenal

et al. 2001; Sacristán et al. 2003; Li and Roossinck 2004; Jridi et al. 2006; Hackett et al. 2009; Tromas et al. 2014; Weigel et al. 2015). In agreement with these observations, estimates of bottleneck size during the colonization of new leaves, or of organs such as tubers, are consistently small in the few systems analyzed (Sacristán et al. 2003; French and Stenger 2003; Zwart et al. 2011; Tromas et al. 2014; but see Fabre et al. 2014). This work was based on the extraction and sequencing of clones of different viruses from different organs and tissues. Fortunately, this sort of study has benefited tremendously from the advent of HTS techniques. For example, Kutnjak et al. (2015) characterized two different subpopulations of potato virus Y (PVY) (encapsidated genomes vs virus-derived small interfering RNAs), showing that they were mostly homogeneous in composition, with the major difference being that genomic RNAs showed a clear fingerprint of nonhomologous recombination. Unfortunately, most of the available HTS studies lack a temporal dimension and thus cannot address the evolution of the viral populations. Three very interesting studies have approached this temporal dimension. Fabre et al. (2012) used HTS to characterize the genetic stability and evolution of synthetic PVY populations within individual Capsicum annuum plants. These populations were made up by mixing four different PVY genotypes, and showed stochastic differences among plants (due to inoculation bottlenecks) and a relatively stable frequency of these genotypes over time. Interestingly, the genetic differentiation among plants increased at intermediate time points and declined afterwards. This pattern of evolution is compatible with competition among the four variants and was modeled using the stochastic Lotka-Volterra multi-species competition model. Cuevas et al. 
(2015) tracked changes in the genetic variability of TEV populations recurrently isolated from different compartments (leaves) and evolved by serial passages in the natural host N. tabacum and in an alternative one, C. annuum. It was found that directional selection shaped genetic variability in the novel host, with examples of selective sweeps that erased the variability of the mutant swarms after sequential fixation of beneficial mutations. By contrast, evolution in the original host was driven by diversifying selection or random drift (Cuevas et al. 2015). In a more recent study, time-sampled populations illustrated the balance between drift and selection during PVY adaptation to potato plants carrying different resistance genes, finding that selection operated upon epistatic groups of mutations in a host genotype-dependent manner (Kutnjak et al. 2017). Furthermore, it has recently been shown that PVY genetic variability depends on the mode of transmission, being larger for vertical tuber-transmitted quasispecies than for horizontal insect-transmitted ones (da Silva et al. 2020). HTS of ribosome-associated viral genomes showed temporal and spatial variation in the population of plum pox virus in plum trees, with buds and developing leaves supporting lower titers and higher genetic diversity than mature tissues, thus providing a reservoir of variants for infection of new organs (Tamukong et al. 2020). The genetic composition of the populations of different mature tissues or organs was also found to differ (Tamukong et al. 2020). Taken together, all these HTS studies draw a picture in which the mutant swarm is dominated by one or a few genomes and the remaining coexisting variants exist at low frequencies, which may to some extent depend on the particular cell type or tissue.

Plant Virus Adaptation to New Hosts: A Multi-scale Approach


5 Beyond the Individual Host

After replication in the initially infected cells and colonization of the individual host, a virus must, in order to survive, infect new individuals in the host population, which is the host at this and higher levels. It is generally accepted that virus titer in the infected host must reach a threshold for transmission to new hosts to occur (e.g., Alizon et al. 2009), which links processes at the cell and individual host levels to processes at the host population level.

5.1 Plant Virus Spread and the Basic Reproductive Number

At the population level, virus fitness relates to the capacity of the virus to infect new hosts. This capacity can be measured by the basic reproduction number, R0, defined as the number of new hosts infected by an infected host when newly introduced into a fully susceptible, uninfected population, that is, before density-dependent effects limit transmission. This definition can be rephrased as the generation-to-generation multiplication factor of the virus population, here of infected host individuals, when its density is infinitesimally small (van den Bosch et al. 2008), which underscores its relation with virus fitness. While R0 is widely used in the ecology and epidemiology of human and animal diseases, it is not much used by plant pathologists, despite its usefulness for establishing thresholds for epidemic growth, comparing epidemics or comparing disease control strategies. Calculating R0 is not a trivial task. In his seminal book on plant disease epidemiology, Vanderplank (1963) pointed out that R0 is equivalent to the product of the length of the infectious period and the corrected infection rate, Rc, of the disease progress curve (DPC). Expressions for calculating R0 from different epidemiological models, either for the DPC or for mean-field models in which the host population is modeled with SIR models, are given in Madden et al. (2007). Expressions for R0 have also been proposed for complex situations such as genetically heterogeneous host populations or uneven distributions of infections at the landscape level (van den Bosch et al. 2008). Also noteworthy is a derivation of R0 from a deterministic epidemiological model of vector-transmitted viruses, which considers susceptible-healthy, latent, infectious and removed classes of hosts, as well as differences in transmission mechanisms, given in Madden et al. (2000). 
As the epidemic progresses and incidence limits transmission to new hosts, virus fitness will decrease, which is reflected in the variation of the apparent infection rate, r, of the DPC, the infectious period being constant to a first approximation (Madden et al. 2007). Often, epidemiological data do not allow for the estimation of model parameters, including r, and thus R0. In these cases, a coarser estimate of virus fitness is the area under the disease progress stairs (AUDPS), which correlates with the incidence at the end of the epidemic and allows comparisons of the efficiency of host-to-host transmission (Simko and Piepho 2012).
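
As an illustration of how AUDPS can be obtained from incidence records, the following minimal sketch computes the trapezoidal area under the disease progress curve (AUDPC) and then adds the correction term that, as we understand Simko and Piepho (2012), gives the first and last observations the same weight as the intermediate ones. The function names and the example incidence values are hypothetical and serve only to show the arithmetic.

```python
def audpc(times, incidence):
    """Trapezoidal area under the disease progress curve."""
    return sum(
        (incidence[k] + incidence[k + 1]) / 2 * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )


def audps(times, incidence):
    """Area under the disease progress stairs.

    Assumed form: AUDPS = AUDPC + (y_first + y_last) / 2 * D / (n - 1),
    where D is the total duration and n the number of observations.
    """
    n = len(times)
    if n < 2 or n != len(incidence):
        raise ValueError("need at least two paired observations")
    mean_interval = (times[-1] - times[0]) / (n - 1)
    return audpc(times, incidence) + (incidence[0] + incidence[-1]) / 2 * mean_interval


# Hypothetical weekly incidence (fraction of infected plants) in one plot.
days = [0, 7, 14, 21]
fraction_infected = [0.0, 0.1, 0.4, 0.8]
print(audpc(days, fraction_infected), audps(days, fraction_infected))
```

With equally spaced observations, AUDPS reduces to the sum of the incidence values multiplied by the sampling interval, which is why it is convenient when only a few assessments are available.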


SIR models show that R0 is directly related to the transmission rate and inversely related to virulence (Anderson and May 1982), that is, the negative effect of infection on host fitness (Alizon et al. 2009), because virulence reduces the infectious period. Hence, virus fitness at the population level depends on two main viral traits, transmission and virulence. Theory considers that transmission and virulence are not independent traits, and that their relationship depends on the mode of transmission.
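
The dependence of R0 on transmission and virulence can be made explicit with the classical expression for a directly transmitted parasite, R0 = βN / (α + b + v), in which the transmission rate β appears in the numerator and virulence α (parasite-induced host mortality) in the denominator together with the background mortality b and the recovery rate v. The sketch below is only a numerical illustration of this textbook form; the parameter names and values are assumptions and are not taken from any of the plant virus systems discussed here.

```python
def sir_r0(beta, n_hosts, virulence, background_mortality, recovery):
    """R0 = beta * N / (alpha + b + v) for a simple SIR-type model.

    The denominator is the rate at which an infected host stops being
    infectious, so its inverse is the mean infectious period.
    """
    return beta * n_hosts / (virulence + background_mortality + recovery)


# Hypothetical parameter values, for illustration only.
baseline = sir_r0(beta=0.002, n_hosts=1000, virulence=0.5,
                  background_mortality=0.1, recovery=0.4)
more_transmissible = sir_r0(0.004, 1000, 0.5, 0.1, 0.4)
more_virulent = sir_r0(0.002, 1000, 1.5, 0.1, 0.4)
print(baseline, more_transmissible, more_virulent)
```

Doubling β doubles R0, whereas increasing α shortens the infectious period and lowers R0, which is the sense in which the two traits act in opposite directions.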

5.2 Constraints Associated to Horizontal and Vertical Transmission Modes

Transmission to new hosts may occur in two ways: horizontal transmission, i.e., between individuals of a population, and vertical transmission, i.e., from parent to offspring. Note that, strictly speaking, vertical transmission in plants occurs only through the seed. Transmission through other organs in vegetatively propagated plants (e.g., by tubers or bulbs), which is often considered as vertical in crops, is better understood as, and can be assimilated to, the colonization of new organs by the virus as discussed in the previous section, because virus infection in plants is always systemic and the organs for vegetative propagation are parts of an (infected) plant. While vertical transmission through the seed is widespread among plant virus taxa, with largely variable rates that depend on virus and host species and genotype, the mode of horizontal transmission is associated with taxonomy. Plant viruses in most taxa are vector-transmitted, mainly by homopterous insects (Fereres 2015; Whitfield et al. 2015; Gallet et al. 2018a, b), but there are also taxa in which horizontal transmission occurs through plant-to-plant contact or by pollen (King et al. 2011). For a particular plant-virus interaction, the efficiency of horizontal transmission is positively correlated with the virus concentration in the tissues of the infectious plant. This relation has been shown for many vectored viruses, including nonpersistently, persistently non-propagative and persistently propagative transmitted viruses (Amar 1975; Foxe and Rochow 1975; Escriu et al. 2000; Jiménez-Martínez and Bosque-Pérez 2004; Betancourt et al. 2011; Doumayrou et al. 2012), as well as for contact-transmitted (Sacristán et al. 2011; Alcaide and Aranda 2021) and pollen-transmitted (Isogai et al. 2020) viruses. 
For several persistent-propagative viruses, which multiply inside the insect vector, it has been shown that titer in the vector also correlates with transmission efficiency (Ammar et al. 1995; Rotenberg et al. 2009; Linak et al. 2020). The efficiency of seed transmission has also been related to the virus titer in the reproductive tissues of the plant, another important factor being the speed at which the virus colonizes the plant and reaches the reproductive structures (Wang et al. 1997; Amari et al. 2009; Cobos et al. 2019). Thus, no matter the mode of transmission, a positive correlation between virus titer and transmission seems to be general, in agreement with the assumption stated at the beginning of this section.


5.3 A Trade-Off Between Within-Host Accumulation and Virulence

Theory also considers virulence to be positively related to the within-host multiplication of parasites. However, since the higher the virulence the shorter the infectious period, a trade-off between within-host multiplication and between-host transmission is predicted that will optimize R0 and maintain virulence at intermediate levels (Alizon et al. 2009). This so-called trade-off hypothesis of virulence evolution (Fig. 2) is widely accepted, but its prediction of an intermediate optimum of virulence has been demonstrated for only a few pathogens, such as rodent malaria, rabbit myxomatosis and CaMV (Dwyer et al. 1990; Mackinnon and Read 1999; Bolker et al. 2010; Doumayrou et al. 2013). Also, the general validity of the hypothesis has been questioned on the basis of evidence from systems in which virulence is unlinked to transmission or parasite multiplication (Ebert and Bull 2003). A series of studies have analyzed whether the assumptions of the trade-off hypothesis hold for plant viruses, with contradictory results depending on the system analyzed and the approach of the study. For instance, virulence and within-host multiplication were unrelated when different genotypes of cucumber mosaic virus (CMV) were compared in hosts as varied as cucumber, tomato, bean, or Arabidopsis thaliana (Escriu et al. 2003; Sacristán et al. 2005; Pagán et al. 2007). However, for specific interactions between CMV and A. thaliana genotypes, virulence and within-host multiplication were correlated (Pagán et al. 2007), and the trade-off model adequately predicted the evolution of the virulence of CMV as observed in epidemics in tomato fields (Escriu et al. 2003). In experiments of horizontal or vertical transmission of barley stripe mosaic virus (BSMV) in barley, changes in virulence were not related to changes in virus concentration in infected tissues (Stewart et al. 2005). 
In maize streak virus (MSV), virulence, measured as the chlorotic area of the infected leaf, correlated with the total amount of viral DNA within the leaf (Martin et al. 2005), but comparison of MSV genotypes isolated over 100 years showed that while virulence has remained constant, or has decreased, the accumulation of the virus in the leaves has increased (Monjane et al. 2020). Comparison of nine isolates of CaMV infecting rapeseed showed two groups of isolates that differed largely in within-host accumulation; virulence and accumulation did not correlate over the whole set of isolates, and the low-accumulation group showed the higher virulence, but virulence and accumulation correlated within each of the two groups (Doumayrou et al. 2013). Finally, Agudelo-Romero et al. (2008) described a host-dependent positive correlation between virulence and within-host accumulation for TEV in the natural host N. tabacum and an alternative one, C. annuum. Furthermore, this correlation depended upon the degree of adaptation of the viral lineages to the particular host in which these two traits were measured. For TEV lineages that evolved in the novel host, a significant correlation was observed, while it was not so for lineages maintained in the natural host. In agreement, if all evolved lineages were tested in the alternative hosts, all correlations became significant, suggesting that long-term adaptation to a host may result in a decoupling between viral accumulation and symptom severity.


Fig. 2 Trade-offs involving virulence. A Illustration of the trade-off hypothesis between virulence and transmission rate. In green, the expected trade-off for a horizontally transmitted virus. An optimal virulence is expected at intermediate values that maximize transmission. In brown, the hypothetical case of a vertically transmitted virus: increasing virulence will always minimize vertical transmission rate by affecting the host fitness, especially for the case of castrating viruses. B The expected trade-off between within-host virus accumulation and virulence. As before, green lines represent the case of a horizontally transmitted virus. A positive association is expected. Different green lines illustrate that the actual trade-off depends on viral genotypes (as observed by Agudelo-Romero et al. (2008) and Doumayrou et al. (2012)). For the case of a vertically transmitted virus, the expected relationship has a negative slope
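
The intermediate optimum sketched in Fig. 2A can be reproduced with a toy model. The sketch below assumes, purely for illustration, that the transmission rate saturates with virulence, β(α) = β_max α / (α + k), and that R0 = β(α) N / (α + c), where c lumps background host mortality and recovery. Under these assumptions R0 is maximized at α* = sqrt(k c). The functional form, parameter names and values are hypothetical and are not those of any cited study.

```python
import math


def r0(alpha, k=1.0, clearance=0.25, beta_max=1.0, n_hosts=10.0):
    """R0 for virulence alpha under an assumed saturating trade-off."""
    beta = beta_max * alpha / (alpha + k)      # transmission rises with virulence
    return beta * n_hosts / (alpha + clearance)  # but the infectious period shrinks


def optimal_virulence(k=1.0, clearance=0.25, step=0.01, n_points=500):
    """Grid search for the virulence level that maximizes R0."""
    grid = [i * step for i in range(1, n_points + 1)]
    return max(grid, key=lambda a: r0(a, k, clearance))


numerical = optimal_virulence()
analytical = math.sqrt(1.0 * 0.25)  # alpha* = sqrt(k * clearance)
print(numerical, analytical)
```

A transmission function that increased linearly with virulence would not produce an interior optimum in this model, which is why the shape of the trade-off curve matters for the prediction.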

For vertically transmitted viruses, the relationship between within-host multiplication, between-host transmission and virulence is predicted not to follow the trade-off hypothesis (Fig. 2). In this case, the parasite’s fitness is linked to host reproductive potential, as host reproduction is necessary for infection of new individuals. Accordingly, theory proposes that parasites with a higher efficiency of vertical transmission will evolve to lower virulence and lower within-host multiplication. For parasites transmitted both vertically and horizontally, the “continuum hypothesis” proposes that the optimum virulence will vary along a continuum depending on the relative weight of each transmission mode on the parasite fitness (Ewald 1987; Lipsitch et al. 1996). A negative correlation between virulence and vertical transmission has been reported for bacterial and animal viruses (Bull et al. 1991; Messenger et al. 1999; Lambrechts and Scott 2009). For plant viruses, two studies based on serial passages of vertical and horizontal transmission have addressed the hypothesis, with results supporting it. For one strain of BSMV, passages of vertical transmission resulted in increased transmission and reduced virulence, unrelated to virus accumulation (Stewart et al. 2005). For three different strains of CMV, passages of vertical transmission increased the transmission rate, decreased virulence and decreased virus accumulation (Pagán et al. 2014). We are not aware of similar studies with other viruses. Another underexplored relationship is that between virulence and survival in the environment, another component of virus fitness at the population level. Survival


is particularly relevant for pathogens transmitted through both the environment and direct contact, as is the case for many plant viruses. In such parasites a trade-off between virulence and transmission is not predicted, because a sit-and-wait strategy, in which survival in the environment exceeds the infectious period of the host, may result in a positive correlation between virulence and survival (Walther and Ewald 2004), a hypothesis known as the “Curse of the Pharaoh” (Bonhoeffer et al. 1996). The relationships between survival and other life-history traits have seldom been tested for viruses (Goldhill and Turner 2014). This topic has been explored for pepper mild mottle virus, which is transmitted by plant-to-plant contact and through the soil. Analyses of coat protein mutants that affected particle stability and, hence, survival in the soil showed no correlation between survival and virulence or within-host multiplication (Fraile et al. 2014; Bera et al. 2017), although the system fulfilled the conditions predicted by theory for such correlations to occur (Bonhoeffer et al. 1996).

5.4 Transmission Bottlenecks Between Hosts

Transmission to a new host individual is associated with bottlenecks in the virus population. As is the case for mosquito-transmitted viruses infecting mammals (Weaver et al. 2021), vector transmission of plant viruses involves severe population bottlenecks. Early attempts to quantify these bottlenecks were based on quantifying the volume of a suspension of virus particles retained in the aphid stylets and establishing correlations with the success of transmission. Thus, using suspensions of 125I-labelled virions and calculating the retained volume from radioactivity counts, Pirone and Thornbury (1988) estimated that the minimal number of particles retained in aphid stylets needed for successful transmission of two potyviruses, TEV and tobacco vein mottling virus, was between 15 and 20. More recent estimates of the effective size of the founder populations after single-aphid transmission were based on analyses of the frequency of virus genotypes in the source and transmitted populations. Estimates for PVY transmitted by Myzus persicae (0.5–3.2; Moury et al. 2007) and CMV transmitted by Aphis gossypii (1–2; Betancourt et al. 2008) were similar, which might be surprising since PVY has a monopartite and CMV a tripartite genome. Effective numbers were also very small for the octopartite faba bean necrotic stunt virus, about 1 or 3–4 depending on the experimental and analytical approach (Gallet et al. 2018a, b), in sharp contrast with theoretical expectations of the bottleneck size required to ensure the transmission of each segment (Iranzo and Manrubia 2012). The effective size of founder PVY populations was also estimated by passages of transmission by M. persicae and HTS of the resulting population, yielding values of 10 or less according to the lineage (da Silva et al. 2020), which are compatible with the previous results, considering that the method of analysis did not allow estimating values below 10. 
Severe bottlenecks, with founder values of about 1.4 to 3.6 were estimated from the segregation of two TMV genotypes during contact transmission (Sacristán et al. 2011), with slightly higher values reported for mechanical inoculation (Sacristán et al. 2003). Founder effects at initiation of infection by mechanical


inoculation of TEV were dose-dependent, but compatible with the hypothesis that a single infectious unit may start an infection (Zwart et al. 2011). Much higher values, between 20 and 30, were estimated for PVY during mechanical inoculation by da Silva et al. (2020). Despite variation in the actual estimates of the effective founder populations, which may depend on the analyzed system as well as on the experimental and analytical approaches, it is clear that horizontal transmission of plant viruses results in severe bottlenecks that would introduce random effects on the genetic structure of the virus population. This may well be a general trend, as severe bottlenecks, in the range of those discussed here for plant viruses, have been reported during the transmission of different animal viruses, such as HIV-1, HCV, Venezuelan equine encephalitis virus, or influenza A virus (reviewed in Zwart and Elena 2015). Bottlenecks at transmission are expected to decrease the effects of selection for host adaptation and, hence, to affect the evolution of within-host multiplication, virulence and transmission.
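
To make the logic of frequency-based bottleneck estimates concrete, the sketch below uses a generic population-genetic approximation and not the specific estimators of the studies cited above. It assumes two genotypes, a known frequency p of one of them in the source plant, and recipient plants each founded by N genomes drawn at random, so that the variance of the genotype frequency among recipients is approximately p(1 - p)/N. Solving for N gives a rough effective founder number. The function name and the example data are hypothetical, and the approximation ignores selection and drift after the founding event.

```python
def effective_founder_number(source_freq, recipient_freqs):
    """Estimate N from the spread of genotype frequencies among recipients.

    Assumes binomial sampling of founders: Var(p') = p (1 - p) / N.
    """
    if not 0.0 < source_freq < 1.0:
        raise ValueError("the source population must contain both genotypes")
    if not recipient_freqs:
        raise ValueError("at least one recipient plant is required")
    # Mean squared deviation of recipient frequencies from the source frequency.
    msd = sum((f - source_freq) ** 2 for f in recipient_freqs) / len(recipient_freqs)
    if msd == 0.0:
        return float("inf")  # no detectable bottleneck in these data
    return source_freq * (1.0 - source_freq) / msd


# Hypothetical data: two recipients fixed for one genotype, two unchanged.
print(effective_founder_number(0.5, [0.0, 1.0, 0.5, 0.5]))
```

When most recipient plants carry a single genotype, the estimate approaches one, which is the pattern expected under the severe bottlenecks described in this section.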

6 Evolutionary Consequences of Host Heterogeneity in the Landscape

In nature, transmission dynamics involve multispecies interactions embedded in communities (persistent sets of species), hence the need to go beyond the more traditional focus on single-host, single-virus interactions (Woolhouse and Gowtage-Sequeria 2005). A virus interacts with plant communities harboring different potential hosts, as well as with virus communities, at all spatial scales from a single infected host to the landscape (McLeish et al. 2019a, b). Within this context, a main fitness component is the host range, defined as the number of hosts, species or genotypes, exploited by a pathogen (Woolhouse and Gowtage-Sequeria 2005; McLeish et al. 2018). The potential hosts of a virus may differ in susceptibility to virus infection, in competence for transmission and in their ability to sustain vector populations (LoGiudice et al. 2003; Cronin et al. 2010; Hily et al. 2014); thus, host range is a key determinant of virus transmission dynamics and survival. Although a simple metric, host range is not easy to determine and is not a fixed value, as the distribution and abundance of species determine the potential hosts that a virus may come into contact with. Host range, and its evolution, depends on genetic traits that determine virus fitness across hosts, and on ecological and epidemiological factors extrinsic to the virus (McLeish et al. 2018). Differences in susceptibility and competence among hosts translate into differences in host-associated selection that will result in genotype-by-environment (G × E) interactions. In other words, as some fitness components of the virus are host-dependent, a virus cannot maximize its fitness in all its hosts. As described in Sect. 
2 above, if adaptation to one or a few related hosts implies a decreased fitness in other hosts, an adaptive trade-off among hosts will be generated, which will favor the evolution of specialism (narrow host range) rather than generalism (broad host


range). Evidence for across-host fitness trade-offs is abundant (Elena et al. 2014; McLeish et al. 2018), and includes trade-offs between the two hosts, plant and insect, in which propagatively transmitted viruses must replicate (Zhao et al. 2018). Results of experimental evolution, as well as analyses of field isolates showing differential host adaptation, have shown that the mechanisms behind such G × E interactions include antagonistic pleiotropy (a mutation improves fitness in one environment while reducing fitness in another) (Lalić et al. 2011; Moreno-Pérez et al. 2016), epistatic interactions among mutations (G × G) (Lalić and Elena 2015a), and higher order G × G × E interactions (Bedhomme et al. 2015; Lalić and Elena 2015b) that result in rugged fitness landscapes (Cervera et al. 2016). However, the distribution of phenotypic effects of host range mutations may depend on demographic factors of the within-host virus population, associated with the host genotype (Rousseau et al. 2018). There are also reports of the fitness effects of host range mutations not differing among hosts (Bedhomme et al. 2012; Gallois et al. 2018). Virus encounters with diverse communities in heterogeneous environments may lead to new associations in which G × E interactions are not limited to the host plant as environment, as in the previous paragraph, but include factors extraneous to the host. In heterogeneous host populations, virus infection may result in a range of interactions, from antagonistic to mutualistic (González et al. 2020), that affect both virus and host fitness (Hily et al. 2014; Creissen et al. 2016). Plant viruses also affect the interaction of their hosts with the environment in ways that feed back on virus fitness, for instance by changing the chemical ecology of the host so that it attracts vectors (Mauck et al. 2018; Donnelly et al. 2019). 
The effects of environmental heterogeneity may be most relevant for generalist viruses, which make up a large fraction of plant viruses (Power and Flecker 2003). Generalism involves highly heterogeneous rates of transmission among hosts that differ in susceptibility and competence (Cronin et al. 2010). This may result in viruses behaving as “facultative” generalists that exploit narrow subsets of the available hosts in different communities, as shown by large variations in incidence over hosts in different communities in a study of the distribution of 11 generalist viruses across communities in a heterogeneous landscape (McLeish et al. 2017). Host adaptation is not necessarily required for the acquisition of new hosts, as a virus may acquire a phenotype fit for survival and transmission in a new host through phenotypic plasticity (De Fine Licht 2018; González et al. 2020). For example, increased disorder in the VPg protein of PVY conferred the capacity to overcome the resistance in pepper genotypes (Charon et al. 2018). Phenotypic plasticity is also behind ecological fitting, that is, the capacity to cope with novel conditions based on preexisting capacities. Ecological fitting has been proposed to play an important role in host range evolution (Brooks and Boeger 2019). In agreement with this hypothesis, it has been reported that a single haplotype of watermelon mosaic virus was able to infect 11 out of a total of 24 host plant species across communities in a heterogeneous landscape, indicating few genetic constraints on host species use (Peláez et al. 2020).


6.1 The Complexity of Virus-Host Infection Networks

At the community and landscape scales, bipartite networks provide powerful tools for the analysis of host–pathogen interactions and specificity. The structure of interaction networks was first analyzed for bacteria and phages. It was found to be both nested, with specialists interacting with subsets of the species with which generalists interacted, and modular, with subsets of specific interactions, depending on the geographical and taxonomic scales of the analysis. It was suggested that the nested structure represented the evolution of generalism, while the evolution of specialism resulted in modules (Flores et al. 2013; Weitz et al. 2013). A nested structure results from a hierarchical organization of interactions, which has been considered to be positively correlated with biodiversity and network resilience, and thus with coevolutionary processes among the interacting symbionts. However, it has been shown that a nested structure is a general feature of ecological networks, which can result from neutral processes without the involvement of selection (Valverde et al. 2018). For plant viruses, network analyses were first applied to an infection matrix of 37 viruses by 28 plant species, based on experimental host ranges (Moury et al. 2017). As in the phage studies, the network was nested and modular, with modules corresponding to subsets of interactions associated with host taxonomy. Coexistence of nestedness and modularity was also shown for virus-plant interactions in different communities of a heterogeneous landscape (Valverde et al. 2020). Nestedness in the global network across communities and seasons was dependent on neutral processes of community assembly, while modularity at local scales was contingent on local adaptation and competition. Spatial dependencies of plant-virus interactions were also demonstrated in McLeish et al. (2017) following a different approach. 
Viruses are also confronted with heterogeneous virus communities, as coinfection is frequent in nature (Malpica et al. 2006; Kamitani et al. 2016; McLeish et al. 2019a), which modifies within-host infection dynamics, host plant phenotypes and transmission dynamics (Gómez et al. 2009; Lacroix et al. 2017; Agüero et al. 2018). Coinfections have been shown to have a central role in structuring plant-virus interactions and infection distributions. Thus, when the interactions between 11 generalist viruses and 47 plant host species in four habitats were decomposed into coinfections and single infections, results showed that coinfection occurred more often than expected at random in one of the four habitats (McLeish et al. 2019b). The study also showed that the spatial distribution of infection at the ecosystem level was structured by specific virus-virus (and not only host-virus) interactions, and allowed the identification of a particular habitat as a reservoir community. The results of these network analyses are relevant because they show that plant-virus interactions have non-random patterns, which is a prerequisite for predicting infection risk and approaching the analysis of the underlying processes.
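
The notion of nestedness used in these analyses can be quantified in several ways. As a minimal sketch, and without implying that this is the metric used in the studies cited above, the code below computes the NODF index (nestedness based on overlap and decreasing fill) for a binary virus-by-host infection matrix. For each pair of rows, and each pair of columns, the index records the fraction of the interactions of the less connected member that are shared with the more connected one, and pairs with equal numbers of interactions contribute zero. The small matrices are hypothetical.

```python
def nodf(matrix):
    """NODF nestedness (0 to 100) of a binary presence/absence matrix."""

    def paired_overlap(vectors):
        total, pairs = 0.0, 0
        for i in range(len(vectors)):
            for j in range(i + 1, len(vectors)):
                pairs += 1
                deg_i, deg_j = sum(vectors[i]), sum(vectors[j])
                # Pairs with equal or zero fill do not count as nested.
                if deg_i == deg_j or min(deg_i, deg_j) == 0:
                    continue
                shared = sum(1 for a, b in zip(vectors[i], vectors[j]) if a and b)
                total += 100.0 * shared / min(deg_i, deg_j)
        return total, pairs

    rows = [list(r) for r in matrix]
    cols = [list(c) for c in zip(*matrix)]
    row_total, row_pairs = paired_overlap(rows)
    col_total, col_pairs = paired_overlap(cols)
    if row_pairs + col_pairs == 0:
        raise ValueError("matrix needs at least two rows or two columns")
    return (row_total + col_total) / (row_pairs + col_pairs)


# Rows: viruses; columns: host species (1 = infection observed).
perfectly_nested = [[1, 1, 1],
                    [1, 1, 0],
                    [1, 0, 0]]
fully_modular = [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]]
print(nodf(perfectly_nested), nodf(fully_modular))
```

In practice, the observed value is compared with the distribution obtained from randomized matrices, since, as noted above, nestedness can arise from neutral processes alone.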


7 An Integrative New Paradigm: The Multilayer Network

In the previous sections we have reviewed evidence of plant virus population dynamics at different levels of their hosts’ organization, from cells to single populations, and to communities and ecosystems. In principle, the mechanisms of evolution (mutation, selection, migration, and drift) should be the same at each level, though operating upon different units, which provides a first unifying conceptual framework. However, efforts to integrate phenomena that take place at different levels into the same predictive model have so far been scarce or have met with little success. Many processes in natural and artificial systems also occur, or can be represented, at multiple layers: from the movement of people among cities using different transport networks, or the different social media connecting people, to the change in species composition of an ecosystem through time (Boccaletti et al. 2014; Kivelä et al. 2014; De Domenico et al. 2016; Pilosof et al. 2017b). A new area of network physics and mathematics that has developed quickly over the last decade is the study of so-called multilayer networks. Put simply, multilayer networks are mathematical objects formed by two or more layers. Each layer contains nodes connected by intralayer edges that describe the rules of interaction between the nodes of this particular layer. In addition, dependencies across layers are represented by interlayer edges. Figure 3 shows a cartoon representation of a multilayer (in this case three-layer) network applied to viruses. The bottom layer represents the viral quasispecies generated within a cell. At this level, nodes represent genotypes and edges the mutational paths connecting those genotypes. The size of nodes is proportional to the abundance of each genotype. The equations describing the dynamics at this bottom level could be, for instance, Eigen’s quasispecies model. 
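
As an illustration of what the dynamics in the bottom layer might look like, the following minimal sketch iterates a discrete-generation analogue of Eigen’s quasispecies model on the complete set of binary genotypes of length three. Genotypes are the nodes, point mutations the edges, and the equilibrium frequencies correspond to node sizes. The single-peak fitness landscape, the per-site mutation rate and the function name are assumptions chosen only for the example.

```python
from itertools import product


def quasispecies_equilibrium(fitness, mu, iterations=500):
    """Iterate x' = W x / |W x| with W[dst][src] = Q(src -> dst) * f(src).

    `fitness` must map every binary string of a given length to its
    replication rate; `mu` is the per-site mutation probability.
    """
    genotypes = sorted(fitness)
    length = len(genotypes[0])

    def mutation_prob(src, dst):
        d = sum(a != b for a, b in zip(src, dst))  # Hamming distance
        return mu ** d * (1.0 - mu) ** (length - d)

    freqs = {g: 1.0 / len(genotypes) for g in genotypes}
    for _ in range(iterations):
        produced = {
            dst: sum(mutation_prob(src, dst) * fitness[src] * freqs[src]
                     for src in genotypes)
            for dst in genotypes
        }
        total = sum(produced.values())  # mean fitness, used to normalize
        freqs = {g: v / total for g, v in produced.items()}
    return freqs


# Hypothetical single-peak landscape: master sequence "000" replicates twice as fast.
all_genotypes = ["".join(bits) for bits in product("01", repeat=3)]
landscape = {g: 1.0 for g in all_genotypes}
landscape["000"] = 2.0
equilibrium = quasispecies_equilibrium(landscape, mu=0.01)
print(max(equilibrium, key=equilibrium.get), round(equilibrium["000"], 3))
```

At this low mutation rate the master sequence dominates the mutant swarm, while its one-step neighbors persist at low frequency through recurrent mutation.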
The middle layer represents local single-host populations. Each node corresponds to a particular host plant and the edges represent the contact network among hosts that determines transmission dynamics. The equations governing the virus’ dynamics at this middle level could be the well-known SIS, SIR or SEIR epidemiological models, and the network can have any topology (e.g., scale-free (Pastor-Satorras and Vespignani 2001)). The interlayer edges connecting the bottom and middle layers represent the probability that a given viral genotype is actually present in each infected host, which depends on its replicative advantage at the bottom level and on the mutation rate. Coinfections are allowed (two interlayer edges pointing towards the same individual). The upper layer represents the epidemiological level. In this layer, nodes represent, for example, communities, whereas edges represent the connectivity between these communities (e.g., intensity of airport traffic connections, vector flights, seed dispersal, etc.). At this level the dynamics can be modeled using phylogeography tools. The interlayer edges connecting the middle and upper layers would represent the probability that an infected individual moves from one community to another. This representation allows us to study the dynamics not only within each layer but also across the entire multilayer system, and to infer properties such as multilayer modularity, robustness to perturbation, or percolation.

S. F. Elena and F. García-Arenal

Fig. 3 A multilayer approach to integrative virology. The diagram illustrates a three-level case. The bottom layer represents the viral genotypic space. Each dot represents a viral genotype and edges represent the mutational steps connecting genotypes. Different colors may represent different fitness values. The middle layer represents individual hosts (in this example, tomato plants). Transmission among individuals in this example is mediated by vectors (red edges). Vertical blue edges connecting the bottom and middle networks indicate the prevalence of particular viral genotypes in different individuals. Finally, the upper network represents different plant communities, from wild forests and prairies (left) to tomato crops surrounded by trees (center) and other crops and marginal shrubs (right). Horizontal white edges in this network represent possible connections through visits by herbivore vectors moving from wild plants into crops and vice versa. Blue vertical edges connecting the middle and upper networks represent the introduction of individual tomato plants from nurseries into crop fields

Modularity in the multilayer context means that nodes belonging to a particular module in one layer may also belong to a module in the upper layer. This is known as the mutually connected giant component (MCGC) (Boccaletti et al. 2014). For instance, viral genotypes closely connected in the quasispecies bottom layer may be found in individuals in the middle layer that also form a transmission cluster (or module). Likewise, ecological multilayer networks (e.g., plant-aphid and plant-aphid-parasitoid networks (Pocock et al. 2012)) show nontrivial stability properties that yield quantitative predictions about persistence or extinction probabilities that other modeling approaches would not reveal (Pilosof et al. 2017b). Information between nodes in a particular layer can be transmitted via two different types of paths: those involving only intralayer edges and those involving both intra- and interlayer edges. This means that catastrophic failures in a particular layer can easily be transmitted to the remaining layers via the MCGC, resulting in percolation (Cellai et al. 2013; Boccaletti et al. 2014). Monolayer networks are
mathematically represented by an adjacency matrix, which encodes information about relationships among the entities in the network (e.g., the mutational coupling matrix in Eigen’s quasispecies model). Multilayer networks include multiple dimensions of connectivity (intra- and interlayer edges) that need to be considered simultaneously in a single mathematical object, the adjacency tensor (De Domenico et al. 2016). The epidemic spread of infectious diseases in multilayer networks has received special attention from physicists (Dickison et al. 2012; Saumell-Mendiola et al. 2013; Zhao et al. 2014a, b; Stella et al. 2017), though alas not from virologists. Two topics in particular have been studied extensively. The first is the interaction between two co-occurring infectious diseases (Funk and Jansen 2010; Sahneh and Scoglio 2014; Sanz et al. 2014; Zhao et al. 2014a, b). When multiple pathogens spread in a scale-free host network and compete for hosts, they influence each other’s dynamics. In such a situation, for example, the epidemic threshold derived for each disease differs substantially from that predicted by monolayer coinfection models, and the difference further depends on the infection dynamics (SIS, SIR or SEIR). Indeed, two new thresholds arise from these models: the survival and the absolute-dominance thresholds (Sahneh and Scoglio 2014). The survival threshold marks a continuous phase transition from extinction to persistence of a virus competing with the other. The absolute-dominance threshold denotes the critical point at which one of the viruses fully outcompetes the other. Between these two thresholds, coexistence is possible. Coexistence is an emergent property of the interconnected structure of the multilayer network. The second topic is the role of awareness in the spread of an infectious disease (Granell et al. 2013; Guo et al. 2015; Scatà et al. 2016; Pérez et al. 2020).
One can imagine two layers: the first consists of the physical contacts between hosts, through which the disease actually spreads, and the second consists of a communication network (e.g., information flow) that influences the behavior of individuals. In this model, individuals show herd-like behavior because they make decisions based on the actions of other individuals. As a consequence, local contagion is reduced, which results in a larger epidemic threshold, smaller epidemics, and more resilient nodes (Guo et al. 2015). Indeed, the epidemic threshold strongly depends on the topology of the communication network (Granell et al. 2013). In some sense, awareness, or the lack of it, is equivalent to having two host categories that differ in their susceptibility. Not surprisingly, the conclusions from models that include other types of heterogeneity in susceptibility are qualitatively similar (Pilosof et al. 2017a).
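
The effect of awareness on the epidemic threshold can be illustrated with a minimal sketch. It assumes a quenched mean-field SIS approximation, in which the critical infection rate equals the recovery rate divided by the largest eigenvalue of the contact matrix. It further assumes that each host has a fixed probability of being aware and that awareness multiplies its susceptibility by a factor `gamma`. These simplifications, the function names, and the five-host ring are illustrative assumptions; they are not the models of Granell et al. (2013) or Guo et al. (2015), in which awareness itself spreads dynamically on the communication layer.

```python
from typing import List

Matrix = List[List[float]]


def spectral_radius(m: Matrix, iters: int = 500) -> float:
    """Largest eigenvalue of a non-negative matrix.

    Power iteration is run on (m + I) to avoid oscillations on bipartite
    graphs; the shift of 1 is removed before returning.
    """
    n = len(m)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [v[i] + sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)              # v is kept normalised so that max(v) == 1
        v = [x / lam for x in w]
    return lam - 1.0


def epidemic_threshold(contact: Matrix, aware_prob: List[float],
                       gamma: float, mu: float) -> float:
    """Critical infection rate under the assumptions stated in the lead-in.

    contact    : adjacency matrix of the physical-contact layer
    aware_prob : assumed fixed probability that each host is aware
    gamma      : susceptibility multiplier for aware hosts (0 = fully protected)
    mu         : recovery rate
    Raises ZeroDivisionError if no transmission is possible at all.
    """
    n = len(contact)
    effective = [
        [(1.0 - (1.0 - gamma) * aware_prob[i]) * contact[i][j] for j in range(n)]
        for i in range(n)
    ]
    return mu / spectral_radius(effective)


# Hypothetical contact layer: five hosts arranged in a ring.
n = 5
ring = [[1.0 if abs(i - j) in (1, n - 1) else 0.0 for j in range(n)]
        for i in range(n)]
mu = 0.4

no_awareness = epidemic_threshold(ring, [0.0] * n, gamma=0.0, mu=mu)
half_aware = epidemic_threshold(ring, [0.5] * n, gamma=0.0, mu=mu)
print(no_awareness, half_aware)
```

With these made-up numbers the threshold is higher when hosts are aware half of the time than when nobody is aware, which agrees in direction with the larger epidemic threshold described above.
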

8 Concluding Remarks

Viral populations and their interplay with hosts, at each level of biological organization, represent a prototypical example of a complex adaptive system (Solé and Elena 2019). Complex systems are characterized by emergent properties that cannot be predicted from the sum of their constituent parts. Despite the difficulty of understanding such systems, here we show that some unifying principles and concepts
can be used to explain the behavior of viral populations at different levels of host organization. For example, the cellular contagion rate, used to describe the speed at which a viral infection spreads among plant cells in the same tissue (leaf), is homologous to the R0 used in epidemiological studies. Undoubtedly, one of the most pervasive concepts is that of trade-offs. Antagonistic pleiotropy generates fitness trade-offs at the level of cells and tissues (i.e., tissue tropism), just as it explains adaptation to local hosts at the cost of reduced fitness in alternative hosts (i.e., host range). Likewise, trade-offs also explain coevolution between viral traits such as transmission and virulence, or within-host accumulation and virulence. Properties at one level, say the epidemiological displacement of one viral lineage by another, could ultimately be considered as emerging from processes that took place at lower levels: a new mutant genotype arises within a single cell, spreads to other cells and tissues through a combination of chance and selection, is then transmitted to other individual hosts (also with a good dose of chance and selection), and from there reaches the entire population and community. Clearly, predicting the fate of a viral mutant genotype would require accumulating knowledge about its properties at different levels, but it would also require a theoretical framework that allows precise quantitative predictions. We propose that multilayer network theory is a potentially interesting theoretical framework to explore. As briefly reviewed in Sect. 7, this theory, whose mathematical foundations have been developed over the last decade or so, provides the tools to link dynamical processes at different levels, in such a way that the dynamics within each layer’s network are linked by interlayer edges, giving rise to a single multi-level network (or network of networks).
Processes within each layer are driven by their own rules, while the connections between nodes in different layers represent the feedback between the processes at each level. This mathematical formalism shows that dynamical systems connected in such a way present entirely new and unexpected properties. To finish this succinct review, the chapter can be summarized in a single sentence: a multi-level theory of virus evolution, one that provides a better understanding and predictability of the complex evolutionary dynamics of viruses, still needs to be developed.

Acknowledgements This work has been supported by grants PID2019-103998GB-I00 and RTI2018-094302-B-100 from Spain’s Agencia Estatal de Investigación–FEDER to S.F.E. and F.G.A., respectively, and grant PROMETEU2019/012 (Generalitat Valenciana) to S.F.E.

References

Agudelo-Romero P, de la Iglesia F, Elena SF (2008) The pleiotropic cost of host-specialization in Tobacco etch potyvirus. Infect Genet Evol 8:806–814
Agüero J, Gómez-Aix C, Sempere RN, García-Villalba J, García-Núñez J, Hernando Y, Aranda MA (2018) Stable and broad spectrum cross-protection against pepino mosaic virus attained by mixed infection. Front Plant Sci 871:1–12
Alcaide C, Aranda MA (2021) Determinants of persistent patterns of pepino mosaic virus mixed infections. Front Microbiol 12:694492

Alizon S, Hurford A, Mideo N, van Baalen M (2009) Virulence evolution and the trade-off hypothesis: history, current state of affairs and the future. J Evol Biol 22:245–259
Amari K, Burgos L, Pallas V, Sanchez-Pina MA (2009) Vertical transmission of prunus necrotic ringspot virus: hitch-hiking from gametes to seedling. J Gen Virol 90:1767–1774
Ammar ED (1975) Effect of European wheat striate mosaic, acquired transovarially, on the biology of its planthopper vector Javesella pellucida. Ann Appl Biol 79:203–213
Ammar ED, Gingery RE, Madden L (1995) Transmission efficiency of three isolates of maize stripe tenuivirus in relation to virus titre in the planthopper vector. Plant Pathol 44:239–243
Anderson R, May R (1982) Coevolution of hosts and parasites. Parasitology 85:411–426
Atkinson PH, Matthews REF (1970) On the origin of dark green tissue in tobacco leaves infected with tobacco mosaic virus. Virology 40:344–356
Balcan D, Vespignani A (2011) Phase transitions in contagion processes mediated by recurrent mobility patterns. Nat Phys 7:581–586
Barthélemy M, Barrat A, Pastor-Satorras R, Vespignani A (2005) Dynamical patterns of epidemic outbreaks in complex heterogeneous networks. J Theor Biol 235:275–288
Bera S, Fraile A, García-Arenal F (2018) Analysis of fitness trade-offs in the host range expansion of an RNA virus, tobacco mild green mosaic virus. J Virol 92:e01268-e1318
Bera S, Moreno-Pérez MG, García-Figuera S, Pagán I, Fraile A, Pacios LF, García-Arenal F (2017) Pleiotropic effects of resistance breaking mutations on particle stability provide insight into life history evolution of a plant RNA virus. J Virol 91:e00435-e517
Bergua M, Zwart MP, El-Mohtar C, Shilts T, Elena SF, Folimonova SY (2014) A viral protein mediates superinfection exclusion at the whole-organism level but is not required for exclusion at the cellular level. J Virol 88:11327–11338
Bedhomme S, Hillung J, Elena SF (2015) Emerging viruses: why they are not jacks of all trades? Curr Opin Virol 10:201203961–201203966
Bedhomme S, Lafforgue G, Elena SF (2012) Multihost experimental evolution of a plant RNA virus reveals local adaptation and host-specific mutations. Mol Biol Evol 29:1481–1492
Betancourt M, Fraile A, García-Arenal F (2011) Cucumber mosaic virus satellite RNAs that induce similar symptoms in melon plants show large differences in fitness. J Gen Virol 92:1930–1938
Betancourt M, Fereres A, Fraile A, García-Arenal F (2008) Estimation of the effective number of founders that initiate an infection after aphid transmission of a multipartite plant virus. J Virol 82:12416–12421
Boccaletti S, Bianconi G, Criado R, Del Genio CI, Gómez-Gardeñes J, Romance M, Sendiña-Nadal I, Wang Z, Zanin M (2014) The structure and dynamics of multilayer networks. Phys Rep 544:1–122
Bolker BM, Nanda A, Shah D (2010) Transient virulence of emerging pathogens. J R Soc Interface 7:811–822
Bonhoeffer S, Lenski RE, Ebert D (1996) The curse of the pharaoh: the evolution of virulence in pathogens with long living propagules. Proc R Soc B 263:715–721
Bosque G, Folch-Fortuny A, Picó J, Ferrer A, Elena SF (2014) Topology analysis and visualization of Potyvirus protein-protein interaction network. BMC Syst Biol 8:129
Bradwell K, Combe M, Domingo-Calap P, Sanjuán R (2013) Correlation between mutation rate and genome size in ribovirus: mutation rate of bacteriophage Qβ. Genetics 195:243–251
Brett TS, Rohani P (2020) Dynamical footprints enable detection of disease emergence. PLoS Biol 18:e3000697
Brooks DR, Boeger WA (2019) Climate change and emerging infectious diseases: evolutionary complexity in action. Curr Opin Syst Biol 13:75–81
Bull JJ, Molineux IJ, Rice WR (1991) Selection of benevolence in a host-parasite system. Evolution 45:875–882
Catalán P, Arias CF, Cuesta JA, Manrubia S (2017) Adaptive multiscapes: an up-to-date metaphor to visualize molecular adaptation. Biol Direct 12:7
Cellai D, López E, Zhou J, Gleeson JP, Bianconi G (2013) Percolation in multiplex networks with overlap. Phys Rev E 88:052811

Cervera H, Lalić J, Elena SF (2016) Effect of host species on topography of the fitness landscape for a plant RNA virus. J Virol 90:10160–10169
Chao L, Rang C, Wong L (2002) Distribution of spontaneous mutants and inferences about the replication mode of the RNA bacteriophage ϕ6. J Virol 76:3276–3281
Charon J, Barra A, Walter J, Millot P, Hébrard E, Moury B, Michon T (2018) First experimental assessment of protein intrinsic disorder involvement in an RNA virus natural adaptive process. Mol Biol Evol 35:38–49
Cobos A, Montes N, López-Herranz M, Gil-Valle M, Pagán I (2019) Within-host multiplication and speed of colonization as infection traits associated with plant virus vertical transmission. J Virol 93:e01078-e1119
Combe M, Garijo R, Geller R, Cuevas JM, Sanjuán R (2015) Single-cell analysis of RNA virus infection identifies multiple genetically diverse viral genomes within single infectious units. Cell Host Microbe 18:424–432
Creissen HE, Jorgensen TH, Brown JKM (2016) Impact of disease on diversity and productivity of plant populations. Funct Ecol 30:649–657
Cronin JP, Welsh ME, Dekkers MG, Abercrombie ST, Mitchell CE (2010) Host physiological phenotype explains pathogen reservoir potential. Ecol Lett 13:1221–1232
Cuevas JM, Willemsen A, Hillung J, Zwart MP, Elena SF (2015) Temporal dynamics of intrahost molecular evolution for a plant RNA virus. Mol Biol Evol 32:1132–1147
da Silva W, Kutnjak D, Xu Y, Xu Y, Giovannoni J, Elena SF, Gray S (2020) Transmission modes affect the population structure of potato virus Y in potato. PLoS Pathog 16:e1008608
De Domenico M, Granell C, Porter MA, Arenas A (2016) The physics of spreading processes in multilayer networks. Nat Phys 12:901–906
De Fine Licht HH (2018) Does pathogen plasticity facilitate host shifts? PLoS Pathog 14:e1006961
Denhardt D, Silver RB (1966) An analysis of the clone size distribution of ϕX174 mutants and recombinants. Virology 30:10–19
Dharmavaram S, Xie F, Klug W, Rudnick J, Bruinsma R (2017) Orientational phase transitions and the assembly of viral capsids. Phys Rev E 95:062402
Dickison M, Havlin S, Stanley HE (2012) Epidemics on interconnected networks. Phys Rev E 85:066109
Djidjou-Demasse R, Moury B, Fabre F (2017) Mosaics often outperform pyramids: insights from a model comparing strategies for the deployment of plant resistance genes against viruses in agricultural landscapes. New Phytol 216:239–253
Doekes HM, Fraser C, Lythgoe KA (2017) Effect of the latent reservoir on the evolution of HIV at the within- and between-host levels. PLoS Comput Biol 13:e1005228
Domingo E, Sheldon J, Perales C (2012) Viral quasispecies evolution. Microbiol Mol Biol Rev 76:159–216
Donnelly R, Cunniffe NJ, Carr JP, Gilligan CA (2019) Pathogenic modification of plants enhances long-distance dispersal of nonpersistently transmitted viruses to new hosts. Ecology 100:e02725
Doumayrou J, Avellan A, Froissart R, Michalakis Y (2012) An experimental test of the transmission-virulence trade-off hypothesis in a plant virus. Evolution 67:477–486
Dwyer G, Levin SA, Buttel L (1990) A simulation model of the population dynamics and evolution of myxomatosis. Ecol Monogr 60:423–447
Ebert D, Bull JJ (2003) Challenging the trade-off model for the evolution of virulence: is virulence management feasible? Trends Microbiol 11:15–20
Eigen M (1971) Self organization of matter and evolution of biological macromolecules. Naturwissenschaften 58:465–523
Elena SF, Fraile A, García-Arenal F (2014) Evolution and emergence of plant viruses. Adv Virus Res 88:161–191
Escriu F, Perry KL, García-Arenal F (2000) Transmissibility of cucumber mosaic virus by Aphis gossypii correlates with viral accumulation and is affected by the presence of its satellite RNA. Phytopathology 90:1068–1072

Escriu F, Fraile A, García-Arenal F (2003) The evolution of virulence in a plant virus. Evolution 57:755–765
Ewald PW (1987) Transmission modes and evolution of the parasitism mutualism continuum. Ann NY Acad Sci 503:295–306
Fabre F, Bruchou C, Palloix A, Moury B (2009) Key determinants of resistance durability to plant viruses: insights from a model linking within- and between-host dynamics. Virus Res 141:140–149
Fabre F, Montarry J, Coville J, Senoussi R, Simon V, Moury B (2012) Modelling the evolutionary dynamics of viruses within their hosts: a case study using high-throughput sequencing. PLoS Pathog 8:e1002654
Fabre F, Moury B, Johansen EI, Simon V, Jacquemond M, Senoussi R (2014) Narrow bottlenecks affect pea seedborne mosaic virus populations during vertical seed transmission but not during leaf colonization. PLoS Pathog 10:e1003833
Fereres A (2015) Insect vectors as drivers of plant virus emergence. Curr Opin Virol 10:42–46
Flores CO, Valverde S, Weitz JS (2013) Multi-scale structure and geographic drivers of cross-infection within marine bacteria and phages. ISME J 7:520–532
Foxe MJ, Rochow WF (1975) Importance of virus source leaves in vector specificity of barley yellow dwarf virus. Phytopathology 65:1124–1129
Fraile A, Hily JM, Pagán I, Pacios LF, García-Arenal F (2014) Host resistance selects for traits unrelated to resistance-breaking that affect fitness in a plant virus. Mol Biol Evol 31:928–939
French R, Stenger DC (2003) Evolution of wheat streak mosaic virus: dynamics of population growth within plants may explain limited variation. Annu Rev Phytopathol 41:199–214
Funk S, Jansen VAA (2010) Interacting epidemics on overlay networks. Phys Rev E 81:036118
Gallet R, Michalakis Y, Blanc S (2018a) Vector-transmission of plant viruses and constraints imposed by virus-vector interactions. Curr Opin Virol 33:144–150
Gallet R, Fabre F, Thébaud G, Sofonea MT, Sicard A, Blanc S, Michalakis Y (2018b) Small bottleneck size in a highly multipartite virus during a complete infection cycle. J Virol 92:e00139-e218
Gallois JL, Moury B, German-Retana S (2018) Role of the genetic background in resistance to plant viruses. Int J Mol Sci 19:2856
García-Arenal F, Fraile A (2013) Trade-offs in host range evolution of plant viruses. Plant Pathol 62:S2–S9
García-Villada L, Drake JW (2012) The three faces of riboviral spontaneous mutation: spectrum, mode of genome replication, and mutation rate. PLoS Genet 8:e1002832
Gauthier J, Drezen JM, Herniou EA (2018) The recurrent domestication of viruses: major evolutionary transitions in parasitic wasps. Parasitology 145:713–723
Gog JR, Pellis L, Wood JLN, McLean AR, Arinaminpathy N, Lloyd-Smith JO (2015) Seven challenges in modeling pathogen dynamics within-host and across scales. Epidemics 10:45–48
Goldhill DH, Turner PE (2014) The evolution of life history trade-offs in viruses. Curr Opin Virol 8:79–84
Gómez P, Sempere RN, Elena SF, Aranda MA (2009) Mixed infections of pepino mosaic virus strains modulate the evolutionary dynamics of this emergent virus. J Virol 83:12378–12387
González R, Butković A, Elena SF (2020) From foes to friends: viral infections expand the limits of host phenotypic plasticity. Adv Virus Res 106:85–121
González R, Butković A, Escaray FJ, Martínez-Latorre J, Melero Í, Pérez-Parets E, Gómez-Cadenas A, Carrasco P, Elena SF (2021) Plant virus evolution under strong drought conditions results in a transition from parasitism to mutualism. Proc Natl Acad Sci USA 118:e2020990118
González-Jara P, Fraile A, Cantó T, García-Arenal F (2009) The multiplicity of infection of a plant virus varies during colonization of its eukaryotic host. J Virol 83:7487–7494
Goodchild DJ, Cohen M, Wildman SG (1958) The specific activity of tobacco mosaic virus as a function of age of infection. Virology 5:561–566
Granell C, Gómez S, Arenas A (2013) Dynamical interplay between awareness and epidemic spreading in multiplex networks. Phys Rev Lett 111:128701

Guo Q, Jiang X, Lei Y, Li M, Ma Y, Zheng Z (2015) Two-stage effects of awareness cascade on epidemic spreading in multiplex networks. Phys Rev E 91:012822
Gutiérrez S, Pirolles E, Yvon M, Baecker V, Michalakis Y, Blanc S (2015) The multiplicity of cellular infection changes depending on the route of cell infection in a plant virus. J Virol 89:9665–9675
Gutiérrez S, Yvon M, Thébaud G, Monsion B, Michalakis Y, Blanc S (2010) Dynamics of the multiplicity of cellular infection in a plant virus. PLoS Pathog 6:e1001113
Hackett J, Muthukumar V, Wiley GB, Palmer MW, Roe BA, Melcher U (2009) Viruses in Oklahoma Euphorbia marginata. Proc Oklahoma Acad Sci 89:57–62
Hall JS, French R, Hein GL, Morris TJ, Stenger DC (2001) Three distinct mechanisms facilitate genetic isolation of sympatric wheat streak mosaic virus lineages. Virology 282:230–236
Hily JM, García A, Moreno A, Plaza M, Wilkinson MD, Fereres F, Fraile A, García-Arenal F (2014) The relationship between host lifespan and pathogen reservoir potential: an analysis in the system Arabidopsis thaliana-cucumber mosaic virus. PLoS Pathog 10:e1004492
Iranzo J, Manrubia SC (2012) Evolutionary dynamics of genome segmentation in multipartite viruses. Proc R Soc B 279:3812–3819
Isogai M, Matsudaira T, Miyoshi K, Shimura T, Torii S, Yoshikawa N (2020) The raspberry bushy dwarf virus 1b gene enables pollen grains to function efficiently in horizontal pollen transmission. Virology 542:28–33
Jimenez-Martinez ES, Bosque-Perez A (2004) Variation in barley yellow dwarf virus transmission efficiency by Rhopalosiphum padi (Homoptera: Aphididae) after acquisition from transgenic and nontransformed wheat genotypes. J Econ Entomol 97:109–127
Jenner CE, Wang X, Ponz F, Walsh JA (2002) A fitness cost for turnip mosaic virus to overcome host resistance. Virus Res 86:1–6
Kamitani M, Nagano AJ, Honjo MN, Kudoh H (2016) RNA-Seq reveals virus-virus and virus-plant interactions in nature. FEMS Microbiol Ecol 92:fiw176
King AM, Lefkowitz E, Adams M, Carstens EB (2011) Virus taxonomy: ninth report of the international committee on taxonomy of viruses. Elsevier Inc., San Diego, CA
Jridi C, Martin JF, Marie-Jeanne V, Labonne G, Blanc S (2006) Distinct viral populations differentiate and evolve independently in a single perennial host plant. J Virol 80:2349–2357
Kivelä M, Arenas A, Barthelemy M, Gleeson JP, Moreno Y, Porter MA (2014) Multilayer networks. J Complex Netw 2:203–271
Kutnjak D, Rupar M, Gutiérrez-Aguirre I, Curk T, Kreuze JF, Ravnikar M (2015) Deep sequencing of virus-derived small interfering RNAs and RNA from viral particles shows highly similar mutational landscapes of a plant virus population. J Virol 89:4760–4769
Kutnjak D, Elena SF, Ravnikar M (2017) Time-sampled population sequencing reveals the interplay of selection and genetic drift in experimental evolution of potato virus Y. J Virol 91:e00692-e717
Lacroix C, Seabloom EW, Borer ET (2017) Environmental nutrient supply directly alters plant traits but indirectly determines virus growth rate. Front Microbiol 8:2116
Lafforgue G, Tromas N, Elena SF, Zwart MP (2012) Dynamics of the establishment of systemic potyvirus infection: independent yet cumulative action of primary infection sites. J Virol 86:12912–12922
Lalić J, Cuevas JM, Elena SF (2011) Effect of host species on the distribution of mutational fitness effects for an RNA virus. PLoS Genet 11:e1002378
Lalić J, Elena SF (2015a) The impact of high-order epistasis in the within-host fitness of a positive-sense plant RNA virus. J Evol Biol 28:2236–2247
Lalić J, Elena SF (2015b) Epistasis between mutations is host-dependent for an RNA virus. Biol Lett 9:20120396
Lambrechts L, Scott TW (2009) Mode of transmission and the evolution of arbovirus virulence in mosquito vectors. Proc R Soc B 276:1369–1378
Lemey P, Rambaut A, Pybus OG (2006) HIV evolutionary dynamics within and among hosts. AIDS Rev 8:125–140
Li H, Roossinck MJ (2004) Genetic bottlenecks reduce population variation in an experimental RNA virus population. J Virol 78:10582–10587

Linak JA, Jacobson AL, Sit TL, Kennedy GG (2020) Relationships of virus titers and transmission rates among sympatric and allopatric virus isolates and thrips vectors support local adaptation. Sci Rep 10:7649
Lipsitch M, Siller S, Nowak MA (1996) The evolution of virulence in pathogens with vertical and horizontal transmission. Evolution 50:1729–1741
Liang XZ, Lee BTK, Wong SM (2002) Covariation in the capsid protein of hibiscus chlorotic ringspot virus induced by serial passaging in a host that restricts movement leads to avirulence in its systemic host. J Virol 76:1320–1324
Liu R, Li M, Liu ZP, Wu J, Chen L, Liu R, Liu ZP, Li M, Aihara K (2012) Identifying critical transitions and their leading biomolecular networks in complex diseases. Sci Rep 2:813
LoGiudice K, Ostfeld RS, Schmidt KA, Keesing F (2003) The ecology of infectious disease: effects of host diversity and community composition on Lyme disease risk. Proc Natl Acad Sci USA 100:567–571
Loverdo C, Park M, Schreiber SJ, Lloyd-Smith JO (2012) Influence of viral replication mechanisms on within-host evolutionary dynamics. Evolution 66:3462–3471
Luria S (1951) The frequency distribution of spontaneous bacteriophage mutants as evidence for the exponential rate of phage reproduction. Cold Spring Harbor Symp Quant Biol 16:463–470
MacArthur RH (1972) Geographical ecology: patterns in the distribution of species. Harper & Row, New York, NY
Mackinnon JM, Read AF (1999) Genetic relationships between parasite virulence and transmission in the rodent malaria Plasmodium chabaudi. Evolution 53:689–703
Madden LV, Hughes G, van den Bosch F (2007) The study of plant disease epidemics. American Phytopathological Society, Saint Paul, MN
Madden LV, Jeger MJ, van den Bosch F (2000) A theoretical assessment of the effects of vector-virus transmission mechanisms on plant virus disease epidemics. Phytopathology 90:576–594
Malpica JM, Sacristán S, Fraile A, García-Arenal F (2006) Association and host selectivity in multi-host pathogens. PLoS ONE 1:e41
Martin DP, van der Walt E, Posada D, Rybicki EP (2005) The evolutionary value of recombination is constrained by genome modularity. PLoS Genet 1:e51
Martínez F, Sardanyés J, Elena SF, Daròs JA (2011) Dynamics of a plant RNA virus intracellular accumulation: stamping machine vs. geometric replication. Genetics 188:637–646
Martinière A, Bak A, Macia JL, Lautredou N, Gargani D, Doumayrou J, Garzo E, Moreno A, Fereres A, Blanc S, Drucker M (2013) A virus responds instantly to the presence of the vector on the host and forms transmission morphs. eLife 2:e00183
Mauck KE, Chesnais Q, Shapiro LR (2018) Evolutionary determinants of host and vector manipulation by plant viruses. Adv Virus Res 101:189–250
McLeish M, Fraile A, García-Arenal F (2018) Ecological complexity in plant virus host range evolution. Adv Virus Res 101:293–339
McLeish M, Fraile A, García-Arenal F (2019a) Evolution of plant-virus interactions: host range and virus emergence. Curr Opin Virol 34:50–55
McLeish M, Sacristán S, Fraile A, García-Arenal F (2017) Scale dependencies and generalism in host use shape virus prevalence. Proc R Soc B 284:20172066
McLeish M, Sacristán S, Fraile A, García-Arenal F (2019b) Coinfection organizes epidemiological networks of viruses and hosts and reveals hubs of transmission. Phytopathology 109:1003–1010
Messenger SL, Molineux IJ, Bull JJ (1999) Virulence evolution in a virus obeys a trade-off. Proc R Soc B 266:397–404
Mi S, Lee X, Li XP, Veldman GM, Finnerty H, Racie L, LaVallie E, Tang XY, Edouard P, Howes S, Keith JC Jr, McCoy JM (2000) Syncytin is a captive retroviral envelope protein involved in human placental morphogenesis. Nature 403:785–789
Mideo N, Alizon S, Day T (2008) Linking within- and between-host dynamics in the evolutionary epidemiology of infectious diseases. Trends Ecol Evol 23:511–517
Miyashita S, Ishibashi K, Kishino H (2015) Viruses roll the dice: the stochastic behaviour of viral genome molecules accelerates viral adaptation at the cell and tissue levels. PLoS Biol 13:e1002094

Miyashita S, Kishino H (2010) Estimation of the size of genetic bottlenecks in cell-to-cell movement of soil-borne wheat mosaic virus and the possible role of bottlenecks in speeding up selection of variations in trans-acting genes or elements. J Virol 4:1828–1837
Monjane AL, Dellicour S, Hartnady P, Oyeniran KA, Owor BE, Bezuidenhout M, Linderme D, Syed RA, Donaldson L, Murray S, Rybicki EP, Kvarnheden A, Yazdkhasti E, Lefeuvre P, Froissart R, Roumagnac P, Shepherd DN, Harkins GW, Suchard MA, Lemey P, Varsani A, Martin DP (2020) Symptom evolution following the emergence of maize streak virus. eLife 9:e51984
Moore CJ, Sutherland PW, Forster RL, Gardner RC, MacDiarmid RM (2001) Dark green islands in plant virus infection are the result of posttranscriptional gene silencing. Mol Plant Microbe Interact 14:939–946
Moreno AB, López-Moya JJ (2020) When viruses play team sports: mixed infections in plants. Phytopathology 110:29–48
Moreno-Pérez MG, García-Luque I, Fraile A, García-Arenal F (2016) Mutations that determine resistance breaking in a plant RNA virus have pleiotropic effects on its fitness that depend on the host environment and on the type, single or mixed, of infection. J Virol 90:9128–9137
Moury B, Fabre F, Senoussi R (2007) Estimation of the number of virus particles transmitted by an insect vector. Proc Natl Acad Sci USA 104:17891–17896
Moury B, Fabre F, Hébrard E, Froissart R (2017) Determinants of host species range in plant viruses. J Gen Virol 98:862–873
Mustonen V, Lässig M (2009) From fitness landscapes to seascapes: non-equilibrium dynamics of selection and adaptation. Trends Genet 25:111–119
Nasir A, Romero-Severson E, Claverie JM (2020) Investigating the concept and origin of viruses. Trends Microbiol 28:959–967
Orton RJ, Wright CF, Morelli MJ, Juleff N, Thébaud G, Knowles NJ, Valdazo-González B, Paton DJ, King DP, Haydon DT (2012) Observing micro-evolutionary processes of viral populations at multiple scales. Phil Trans R Soc B 368:20120203
Pagán I, Alonso-Blanco C, García-Arenal F (2007) The relationship of within-host multiplication and virulence in a plant-virus system. PLoS ONE 2:2786
Pagán I, Montes N, Milgroom MG, García-Arenal F (2014) Vertical transmission selects for reduced virulence in a plant virus and for increased resistance in the host. PLoS Pathog 10:e1004293
Pastor-Satorras R, Vespignani A (2001) Epidemic spreading in scale-free networks. Phys Rev Lett 86:3200–3203
Peláez A, McLeish MJ, Paswan RR, Dubay B, Fraile A, García-Arenal F (2020) Ecological fitting is the forerunner to diversification in a plant virus with broad host range. J Evol Biol. https://doi.org/10.1111/jeb.13672
Pérez IA, Di Muro M, La Rocca CE (2020) Disease spreading with social distancing: a prevention strategy in disordered multiplex networks. Phys Rev E 102:022310
Pilosof S, Greenbaum G, Krasnov BR, Zelnik YR (2017a) Asymmetric disease dynamics in multihost interconnected networks. J Theor Biol 430:237–244
Pilosof S, Porter MA, Pascual M, Kéfi S (2017b) The multilayer nature of ecological networks. Nat Ecol Evol 1:0101
Pirone TP, Thornbury DW (1988) Quantity of virus required for aphid transmission of a Potyvirus. Phytopathology 78:104–107
Pocock MJO, Evans DM, Memmott J (2012) The robustness and restoration of a network of ecological networks. Science 335:973–977
Power AG, Flecker AS (2003) Virus specificity in disease systems: are species redundant? In: Kareiva P, Levin SA (eds) The importance of species: perspectives on expendability and triage. Princeton University Press, Princeton, NJ, pp 330–347
Poulicard N, Pinel-Galzi A, Hebrard E, Fargette D (2010) Why rice yellow mottle virus, a rapidly evolving RNA plant virus, is not efficient at breaking rymv1-2 resistance. Mol Plant Pathol 11:145–154
Rappaport I, Wildman SG (1957) A kinetic study of local lesion growth on Nicotiana glutinosa resulting from tobacco mosaic virus infection. Virology 4:265–174

Plant Virus Adaptation to New Hosts: A Multi-scale Approach

195

Rico P, Ivars P, Elena SF, Hernández C (2006) Insights into the selective pressures restricting pelargonium flower break virus genome variability: evidence for host adaptation. J Virol 80:8124– 8132 Rodrigo G, Zwart MP, Elena SF (2014) Onset of virus systemic infection in plants is determined by speed of cell-to-cell movement and number of primary infection foci. J R Soc Interface 11:20140555 Rotenberg D, Kumar NKK, Ullman DE, Montero-Astúa M, Willis DK, Whitfield GTL, AE, (2009) Variation in tomato spotted wilt virus titer in Frankliniella occidentalis and its association with frequency of transmission. Phytopathology 99:404–410 Rousseau E, Bonneault M, Fabre F, Moury B, Mailleret L, Grognard F (2019) Virus epidemics, plant-controlled population bottlenecks and the durability of plant resistance. Phil. Trans R Soc B 374:20180263 Rousseau E, Tamisier L, Fabre F, Simon V, Szadkowski M, Bouchez O, Zanchetta C, Girardot G, Mailleret L, Grognard F, Palloix A, Moury B (2018) Impact of genetic drift, selection and accumulation level on virus adaptation to its host plants. Mol Plant Pathol 19:2575–2589 Sacristán S, Malpica JM, Fraile A, García-Arenal F (2003) Estimation of population bottlenecks during systemic movement of tobacco mosaic virus in tobacco plants. J Virol 77:9906–9911 Sacristán S, Fraile A, Malpica JM, García-Arenal F (2005) An analysis of host adaptation and its relationship with virulence in cucumber mosaic virus. Phytopathology 95:827–833 Sacristán S, Díaz M, Fraile A, García-Arenal F (2011) Contact transmission of tobacco mosaic virus: a quantitative analysis of parameters relevant for virus evolution. J Virol 85:4974–4981 Sahneh FD, Scoglio C (2014) Competitive epidemic spreading over arbitrary multilayer networks. Phys Rev E 89:062817 Sanz J, Xia CY, Meloni S, Moreno Y (2014) Dynamics of interacting diseases. 
Phys Rev X 4:041005 Sardanyés J, Elena SF (2011) Quasispecies spatial models for RNA virus with different replication modes and infection strategies. PLoS ONE 6:e24884 Sardanyés J, Martínez F, Daròs JA, Elena SF (2012) Dynamics of alternative modes of RNA replication for positive-sense RNA viruses. J R Soc Interface 9:768–776 Sattentau Q (2008) Avoiding the void: cell-to-cell spread of human viruses. Nat Rev Microbiol 6:815–826 Saumell-Mendiola A, Seerrano MA, Boguñá M (2013) Epidemic spreading on interconnected networks. Phys Rev E 86:026106 Scatà M, Di Stefano A, Liò P, La Corte A (2016) The impact of heterogeneity and awareness in modeling epidemic spreading on multiplex networks. Sci Rep 6:37105 Schulte MB, Draghi JE, Plotkin JB, Andino R (2015) Experimentally guided models reveal replication principles that shape the mutation distribution of RNA viruses. eLife 4:e03753 Shapiro JW, Turner PE (2018) Evolution of mutualism from parasitism in experimental virus populations. Evolution 72:707–712 Simko I, Piepho H (2012) The area under the disease progress stairs: calculation, advantage, and application. Phytopathology 102:381–389 Simon-Loriere E, Holmes EC (2011) Why do RNA viruses recombine? Nat Rev Microbiol 9:617– 626 Sofonea MT, Alizon S, Michalakis Y (2015) From within-host interactions to epidemiological competition: a general model for multiple infections. Phil Trans R Soc B 370:20140303 Solé R, Elena SF (2019) Viruses as complex adaptive systems. Princeton University Press, Princeton, NJ Stella M, Andreazzi CS, Selakovic S, Goudarzi A, Antonioni A (2017) Parasite spreading in spatial ecological multiplex networks. J Complex Netw 5:486–511 Stewart AD, Logsdon JM, Kelley SE (2005) An empirical study of the evolution of virulence under both horizontal and vertical transmission. 
Evolution 59:730–739 Sulzinski MA, Zaitlin M (1982) Tobacco mosaic virus replication in resistant and susceptible plants: in some resistant species virus is confined to a small number of initially infected cells. Virology 121:12–19

196

S. F. Elena and F. García-Arenal

Syller J, Grupa A (2016) Antagonistic within-host interactions between plant viruses: molecular basis and impact on viral and host fitness. Mol Plant Pathol 17:769–782 Tamukong YB, Collum TD, Stone AL, Kappagantu M, Sherman DJ, Rogers EE, Dardick C, Culver JN (2020) Dynamic changes impact the plum pox virus population structure during leaf and bud development. Virology 548:192–199 Tromas N, Zwart MP, Lafforgue G, Elena SF (2014) Within-host spatiotemporal dynamics of plant virus infection at the cellular level. PLoS Genet 10:e1004186 Valverde S, Piñero J, Corominas-Murtra B, Montoya J, Joppa L, Solé R (2018) The architecture of mutualistic networks as an evolutionary spandrel. Nat Ecol Evol 2:94–99 Valverde S, Vidiella B, Montañez R, Fraile A, Sacristán S, García-Arenal F (2020) Coexistence of nestedness and modularity in host–pathogen infection networks. Nat Ecol Evol 4:568–577 Van den Bosch F, McRoberts N, van den Berg F, Madden LV (2008) The basic reproduction number of plant pathogens: matrix approaches to complex dynamics. Phytopathology 98:239–249 Van Oosten AR, McGill J, Legge KL (2010) Quantification of the frequency and multiplicity of infection of respiratory- and lymph node-resident dendritic cells during influenza virus infection. PLoS ONE 5:e12902 Vanderplank JE (1963) Plant diseases: epidemics and control. Academic Press, New York, NY Walther BA, Ewald PW (2004) Pathogen survival in the external environment and the evolution of virulence. Biol Rev 79:849–869 Wang D, Macfarlane SA, Maule AJ (1997) Viral determinants of pea early browning virus seed transmission in pea. Virology 234:112–117 Wang K, Lau TY, Morales M, Mont EK, Straus SE (2005) Laser-capture microdissection: refining estimates of the quantity and distribution of latent herpes simplex virus 1 and varicella-zoster virus DNA in human trigeminal ganglia at the single-cell level. 
J Virol 79:14079–14087 Weaver SC, Forrester NL, Liu JY, Vasilakis N (2021) Population bottlenecks and founder effects: implications for mosquito- borne arboviral emergence. Nat Rev Microbiol 19:184–195 Weigel K, Pohl JO, Wege C, Jeske H (2015) A population genetics perspective of Geminivirus infection. J Virol 89:11926–11934 Weitz JS, Poisot T, Meyer JR, Flores CO, Valverde S, Sullivan MB, Hochberg ME (2013) Phagebacteria infection networks. Trends Microbiol 21:82–91 Whitfield AE, Falk BW, Rotenberg D (2015) Insect vector-mediated transmission of plant viruses. Virology 479:278–289 Wolff G, Melia E, Snijder EJ, Bárcena M (2020) Double-membrane vesicles as platforms for viral replication. Trends Microbiol 28:1022–1032 Woolhouse ME, Gowtage-Sequeria S (2005) Host range and emerging and reemerging pathogens. Emerg Infect Dis 11:1842–1847 Zhao D, Li L, Peng H, Luo Q, Yang Y (2014a) Multiple routes transmitted epidemics on multiplex networks. Phys Lett A 378:770–776 Zhao L, Abbasi AB, Illingworth CJR (2019) Mutational load causes stochastic evolutionary outcomes in acute RNA viral infection. Virus Evol 5:vez008 Zhao Y, Zheng M, Liu Z (2014b) A unified framework of mutual influence between two pathogens in multiplex networks. Chaos 24:043129 Zhao W, Xu Z, Zhang X, Yang M, Kang L, Liu R, Cui F (2018) Genomic variations in the 3, -termini of rice stripe virus in the rotation between vector insect and host plant. New Phytol 219:1085–1096 Zwart MP, Daròs JA, Elena SF (2011) One is enough: In vivo effective population size is dosedependent for a plant RNA virus. PLoS Pathog 7:e1002122 Zwart MP, Elena SF (2015) Matters of size: Genetic bottlenecks in virus infection and their potential impact on evolution. Annu Rev Virol 2:161–179 Zwart MP, Tromas N, Elena SF (2013) Model-selection-based approach for calculating cellular multiplicity of infection during virus colonization of multi-cellular hosts. PLoS ONE 8:e64657

Viral Fitness, Population Complexity, Host Interactions, and Resistance to Antiviral Agents

Esteban Domingo, Carlos García-Crespo, María Eugenia Soria, and Celia Perales

Abstract Fitness of viruses has become a standard parameter to quantify their adaptation to a biological environment. Fitness determinations for RNA viruses (and some highly variable DNA viruses) meet with several uncertainties. Of particular interest are those that arise from mutant spectrum complexity, absence of population equilibrium, and internal interactions among components of a mutant spectrum. Here, concepts, fitness measurements, limitations, and current views on experimental viral fitness landscapes are discussed. The effect of viral fitness on resistance to antiviral agents is covered in some detail since it constitutes a widespread problem in antiviral pharmacology, and a challenge for the design of effective antiviral treatments. Recent evidence with hepatitis C virus suggests the operation of mechanisms of antiviral resistance additional to the standard selection of drug-escape mutants. The possibility that high replicative fitness may be the driver of such alternative mechanisms is considered. New broad-spectrum antiviral designs that target viral fitness may curtail the impact of drug-escape mutants in treatment failures. We consider to what extent fitness-related concepts apply to coronaviruses and how they may affect strategies for COVID-19 prevention and treatment.

E. Domingo (corresponding author) · C. García-Crespo · M. E. Soria · C. Perales
Centro de Biología Molecular “Severo Ochoa” (CSIC-UAM), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain
e-mail: [email protected]

Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, 28029 Madrid, Spain

M. E. Soria · C. Perales
Department of Clinical Microbiology, Instituto de Investigación Sanitaria-Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), Av. Reyes Católicos 2, 28040 Madrid, Spain

C. Perales
Department of Molecular and Cell Biology, Centro Nacional de Biotecnología (CNB-CSIC), Consejo Superior de Investigaciones Científicas (CSIC), Campus de Cantoblanco, 28049 Madrid, Spain

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_6


1 Fitness Concept and Its Application to Viruses

Fitness is a general concept in genetics whose origin can be traced to C. Darwin. Fitness quantifies the reproductive and survival efficiency of a biological entity, correlated with the rate of evolutionary change promoted by natural selection. Its use in evolution in the form of fitness landscape can be traced to A. Janet in 1895 and to S. Wright in 1931 (see article by P. Schuster and P. Stadler in this volume). Early studies by F.J. Ayala documented that diversified Drosophila populations exhibited higher fitness than their undiversified counterparts (Ayala 1965, 1969), thus giving experimental support to R.A. Fisher’s fundamental theorem of natural selection, which asserts that fitness increase is determined by population variance (Fisher 1930). The prominent population variance of viruses should facilitate fitness gains, as indeed documented in many experimental studies, despite confounding factors such as population size limitations or genetic drift. At the molecular level, fitness modifications may come about by the three major mechanisms of molecular variation of viruses: mutation, recombination, and genome segment reassortment, the latter in the case of viruses with segmented genomes.

Comparative fitness values have become an important tool in viral genetics since they capture the overall replicative capacity of a virus, independently of its molecular and functional basis. Given the intricacies and interconnections among viral and cellular components involved in virus-host interactions, it is extremely valuable to have at our disposal a parameter that quantitatively captures the end result, one that mirrors the capacity of the virus to survive. Indeed, a fitness value is the net result of the performance of a virus at each step of its life cycle: entry into the host cells (except in cases of intracellular steady-state persistence without significant virus release from cells), uncoating of the viral genome, viral gene expression, genome replication, transcription (retrotranscription in the case of retroviruses), particle assembly, exit from the cell, and stability in the extracellular environment (except for direct cell-to-cell transmission). Each of these steps comprises many sub-steps whose adequacy for infectious progeny production can conceivably be influenced by genome variations whose success is conditioned by trade-offs needed to sustain functional equilibrium [reviewed in (Domingo 2020)].

Viral fitness is also a determinant of the extent of host modifications in the course of an infection (Cervera et al. 2018; Correa et al. 2020). This effect bears on infection sequels in host cells and organisms when the infecting virus has been cleared. This is an old observation (for example, with the post-polio syndrome) that has recently been revived with clinical conditions remaining upon resolution (virus clearance according to current detection methods) of hepatitis C virus (HCV) and SARS-CoV-2 infections.

An example of identification of a step of a virus life cycle responsible for fitness increase was the demonstration that increased thermal stability of the virus particle was responsible for the fitness advantage of a segmented form of the foot-and-mouth disease virus (FMDV) genome. This unusual evolutionary transition, akin to genome segmentation thought to have occurred in nature during evolution of ancestors of present-day RNA viruses, arose in cell culture as a result of 260 serial,


high multiplicity of infection (MOI, the number of infectious particles per cell) passages in BHK-21 cells (García-Arriaza et al. 2004; Moreno et al. 2014; Ojosnegros et al. 2011). Cell entry, intracellular replication, and release of virus from cells were monitored during competitions between the standard monopartite and the segmented FMDV versions, leading to the identification of thermal stability as the critical trait. The selective advantage was probably conferred by the lower packaging density of shorter genomes inside the geometrically constrained picornaviral capsid (Ojosnegros et al. 2011).

In an in vivo study, genotypes of the fish virus infectious hematopoietic necrosis virus (IHNV) that differed in virulence were compared regarding entry into its host rainbow trout, within-host replication, within-host coinfection fitness, and virus shedding (Wargo and Kurath 2011). Although the overall fitness and virulence were influenced by each of the traits examined, the maximum impact on fitness was due to differences in within-host replication.

The high error rates during RNA replication [average of 10⁻³ to 10⁻⁶ mutations introduced per nucleotide copied (Drake and Holland 1999; Sanjuan et al. 2010)] are critical for interpreting fitness variations of RNA viruses, since the substrate for fitness gain consists of the multitudes of related variants blindly and unavoidably produced during replication. The resulting mutant spectra, distributions, clouds or swarms, termed viral quasispecies, are the true actors of evolutionary change (Domingo et al. 2021b; Domingo and Perales 2019; Domingo and Schuster 2016). The supportive theoretical background that explains the adaptive dynamics of mutant swarms has been strengthened by extensions of quasispecies theory to non-equilibrium conditions (Saakian and Hu 2006, 2016; Wagner et al. 2016).
Perturbations of equilibrium (meaning absence of a steady-state distribution of components of a mutant spectrum within the experimentally amenable observation times) characterize viral populations. This makes it difficult to establish a ranking of fitness among genomes of a mutant spectrum that have a transient existence (Sect. 5.1). The complexity of dynamic mutant spectra can be quantified by diversity indices that capture the number and abundance of different variants, as well as the average number of mutations per genome (Fuhrmann et al. 2021). The diversity attained by a viral population is influenced by several parameters such as the mutation rate, environmental stability, proximity of the viral population to a clonal origin, or bottlenecks and their intensity (defined as the number of individual genomes that separate from a mutant ensemble to initiate an evolutionary trajectory). An additional consideration is that mutant swarms are not mere aggregates of variant genomes, since internal interactions may be established among them or their expression products; intra-mutant spectrum interactions may modify the behavior of the ensemble and, therefore, the molecular pathways followed for fitness modification (Sects. 2 and 4). The genotypic and phenotypic plasticity of viral quasispecies renders fitness variations frequent and contingent upon environmental conditions.

Two definitions of viral fitness, which mirror preferences in general genetics, have been proposed and are reflected in the experimental protocols for its determination: (i) The capacity of contributing viral progeny to the immediate generations, for example, as measured in a single passage in cell culture or with constructs that limit replication to a single


replication cycle (Mansky 1998; Scholle et al. 2004). (ii) The contribution of progeny to more distant generations, as measured, for example, during several virus passages in cell culture. When the second design is chosen, it is often assumed that the fitness value will be maintained independently of the number of passages, provided the environment does not change. As discussed more amply in Sect. 4, despite the assumption being largely unavoidable with current experimental settings, it is weakened by uncertainties derived from viral dynamics. Viruses keep evolving in the course of fitness assays unless replication is restricted to a single cycle, which eliminates the opportunity of continued competition. An additional facet of fitness determinations in general—which is accentuated in the case of viral populations—is that adaptation involves coordinated changes in several phenotypic traits. Such changes depend on genomic residues, the resulting RNA structures, or the encoded viral proteins that are often multifunctional, and that may not have an identical amino acid sequence when expressed by individual genomes of the same mutant spectrum (Sect. 4). Different roles in which a single nucleotide may predictably be involved include RNA signals, the sequence of a protein that has two or more functions, or the sequence of two different proteins when the nucleotide is part of two overlapping reading frames. Disruption of genome-scale ordered RNA secondary structure (GORS) in murine norovirus resulted in mutants that replicated with similar kinetics to wild type virus in cell culture and primary macrophages in serial passages. In vivo, the mutant viruses established persistence but were rapidly eliminated in competition with the wild type virus. Thus, GORS may contribute to viral fitness and persistence (McFadden et al. 2013). In compact viral genomes, each nucleotide is likely to have an influence on multiple traits, with implications for fitness evolution. 
Some uncertainties may prevail over others, depending on the protocol used for a fitness determination, as discussed in the coming Sect. 2.
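The diversity indices mentioned in this section can be illustrated with a short calculation. The following is a minimal sketch, not the procedure of the cited studies: it assumes that a mutant spectrum is available as a list of aligned sequence reads of equal length, and it uses Shannon entropy as one possible index of the number and abundance of variants. The function names and the toy reads are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(haplotypes):
    """Shannon entropy (in nats) of the haplotype distribution.

    It is zero for a clonal sample and grows with the number of
    distinct variants and with the evenness of their abundances.
    """
    counts = Counter(haplotypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_mutations_per_genome(haplotypes, reference):
    """Average number of positions at which a read differs from the reference.

    Assumes that every read is aligned to, and as long as, the reference.
    """
    diffs = [sum(a != b for a, b in zip(h, reference)) for h in haplotypes]
    return sum(diffs) / len(diffs)


# Hypothetical toy mutant spectrum: four reads of a four-nucleotide region.
reads = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(shannon_entropy(reads))
print(mean_mutations_per_genome(reads, "ACGT"))
```

Both quantities describe a sample at one time point; because mutant spectra are not at equilibrium, they would need to be recomputed for each passage analyzed.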

2 Experimental Fitness Measurements in Cell Culture

There are two major designs to determine fitness of a viral population under a set of environmental conditions established in a cell culture system: (i) growth-competition experiments in which two viruses are confronted for entry into cells, intracellular replication, exit from the cells, and particle stability, and (ii) determination of performance in the different steps of an infectious cycle (notably replicative parameters) for the individual viruses in separate infections, using the same cell culture system. Each of these two designs may involve a single infection (one viral passage or single replication round) or two or more serial infections (multiple viral passages) (Fig. 1). Relevant variables are the cell culture medium composition, temperature, pH, and presence of externally added components (drugs, antibodies, etc.).

In growth-competition experiments, the two viruses compete for intracellular resources. Potentially limiting resources include rearranged membrane structures, cellular proteins that are assembled as co-factors in the replication complexes (with


the viral polymerase and other viral proteins), metabolites, in particular nucleotides as substrates for genome replication and transcription, and cellular components for protein synthesis (ribosomes, tRNA, translation initiation and termination factors), among others. For example, all HCV-coded proteins are either anchored in or in close contact with membranes, and some of them assemble at membrane sites to form replication factories (Shivaprasad and Sarnow 2021).

For both fitness determination designs, the MOI may influence the final outcome. In the standard cell culture protocols (where the infection is allowed to be completed to produce overt cytopathology in the case of cytolytic infections), a low MOI means that the two viruses will not compete for resources in the initial infection rounds but will do so at late times. Depending on the burst size (number of infectious progeny produced per infected cell), following the first round of infection, the remaining susceptible cells will be infected at random by multiple particles of the two competing viruses. This will establish various levels of competition for resources, and of intensity of cellular modifications due to the intracellular viral load. Superinfection permissiveness or exclusion may also affect the outcome of the competition; in a study with IHNV, it was observed that superinfection restriction was not influenced by the virulence level of the virus isolate (Kell et al. 2013).

The growth-competition design yields a fitness vector whose slope determines the relative proportion of the two competitors as a function of time (Fig. 1a). During the competition, two or more populations may coexist for several passages until one of them displaces the others and becomes dominant; this experimental observation is in agreement with the Red Queen hypothesis and the competitive exclusion principle of population genetics (Clarke et al. 1994) (Fig. 1).
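The slope of a fitness vector can be estimated in a few lines. The following is a minimal sketch, not the protocol of the cited studies: it assumes that the ratio of the two competitors has been normalized to the initial mixture, that the vector is the least-squares line of the natural logarithm of that ratio against passage number, and that relative fitness per passage is taken as the exponential of the slope. The function names and the data are hypothetical.

```python
import math


def fitness_vector(passages, ratios):
    """Least-squares fit of ln(ratio) against passage number.

    `ratios` are competitor-to-reference proportions, normalized to the
    initial mixture (an assumption of this sketch).
    Returns (slope, intercept).
    """
    y = [math.log(r) for r in ratios]
    n = len(passages)
    mean_x = sum(passages) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in passages)
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(passages, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_fitness(passages, ratios):
    """Relative fitness per passage, taken here as exp(slope)."""
    slope, _ = fitness_vector(passages, ratios)
    return math.exp(slope)


# Hypothetical data: the competitor doubles relative to the reference
# at every passage.
passages = [0, 1, 2, 3, 4]
ratios = [1.0, 2.0, 4.0, 8.0, 16.0]
print(relative_fitness(passages, ratios))
```

A slope close to zero corresponds to competitors of equal fitness, a positive slope to a winner, and a negative slope to a loser, as in the scheme of Fig. 1a.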
Competitions can be programmed directly between two different viral populations of interest, or, indirectly, by confronting each of the virus populations with a third, reference clone or population, arbitrarily given a fitness value of 1. In both cases, the measurements are equivalent to the determination of a selection coefficient (Maree et al. 2000), although in virology the relative nature of the fitness value (given by a ratio) is widely accepted. Since during fitness assays mutant spectra infect cells to produce successive rounds of mutant spectra with a new composition, not only the MOI is a relevant parameter but also the total number of cells and virus involved in each infection. Depending on the number of infecting viruses some minority genomes may or may not participate in the next infection (Fig. 2). That is, many variants will be lost by genetic drift. Unavoidably, each infection represents a bottleneck, although not necessarily severe. This is because in the usual infection designs it is not possible to use the entire virus yield from one passage to infect the corresponding larger number of cells to maintain the MOI in the next infection. A design of this sort, with an increasing number of infectious virus at each successive passage, is manageable only for a few passages with a non-pathogenic virus (and with minimal probability of becoming pathogenic, such as a bacterial virus). It is certainly not a feasible protocol with pathogenic viruses, for obvious safety considerations due not only to the handling of a large number of virus but also to the increasing numbers of variants with unknown pathogenic potential that would be produced. The mutation frequency

[Fig. 1 graphic not reproduced. Labels recoverable from the graphic: panels (a), (b), and (c); for the competition panels, the axes "Proportion relative to initial mixture", "Fraction of original ratio to wt", and "Passage number", with curves labeled A–H and Control; for panel (c), "Viral titer" versus "Time (h)" for extracellular and intracellular progeny, and "Growth rate" and "Maximum titer" (TCID50/ml/passage) for Populations 1, 2, and 3, with Fitness Population 1 < Fitness Population 2 < Fitness Population 3.]

Fig. 1 Fitness determinations. a Competition between two viral clones or populations. The scheme on the left depicts fitness vectors with a winner (orange arrow), a loser (green arrow), or without a winner (equal fitness, blue arrow). In the panel on the right, a real set of fitness vectors determined with vesicular stomatitis virus (VSV) clones (A–H) that had lost fitness upon plaque-to-plaque transfers (Muller’s ratchet; compare with Fig. 2) is shown. Fitness of a monoclonal antibody-resistant VSV mutant was also measured (Control). Vectors were determined by measuring the proportion of each VSV clone in competition with a reference VSV clone, in the course of the passages indicated in abscissa. Adapted from (Clarke et al. 1993), with permission from the American Society for Microbiology, Washington DC, USA. b Competitive exclusion principle. Top scheme: Two viral clones or populations can coexist for many passages until one may displace the other and become dominant (orange arrow) or be displaced and become a minority (green arrow). Bottom panel: experimental data obtained with VSV. Monoclonal antibody-resistant mutants D, H, G were competed separately with wild type virus. During the first 12–15 passages the two competing viruses coexisted and gained fitness (in agreement with the Red Queen hypothesis). At later passages mutants D and H outcompeted the wild type virus, while mutant G was outcompeted by the wild type (competitive exclusion). Reproduced from (Clarke et al. 1994) with permission from the National Academy of Sciences of the US. c Schematic representation of parameters indicative of replicative fitness of three individual viral populations. Top panels: viral titers in the course of a single infection in cell culture; the discontinuous horizontal line marks the limit of detection of virus infectivity. Bottom panels: average growth rates and maximum titer, calculated for several infections carried out at different MOI. Both infectivity and the amount of viral RNA can be measured with the same samples and experimental design. An example with HCV can be found in (Moreno et al. 2017); it is discussed in the text

(Mf; number of mutations counted relative to the total number of nucleotides that have been sequenced), commonly used as a diversity index, does not recapitulate quantitative aspects of variant availability. Mf is an intrinsic property, in the sense of being independent of the size of the sample taken from the population under study. In contrast, sample size is key for the probability that a specific type of variant genome is represented in a sample of that population (for example, the one used in an infection for a fitness determination) (Domingo and Perales 2012, 2021). Both the MOI and the total infective load are relevant to interpret the result of a fitness assay.

In growth-competition experiments, the two viruses should be distinguishable by a genetic or a phenotypic marker. Marker stability should be verified for correct identification of the competitors (each one consisting of a mutant spectrum) in the course of several passages. Genetic markers have included mutations, ideally multiple mutations to be monitored by Sanger or ultra-deep sequencing (UDS) (Seifert and Beerenwinkel 2016) in the initial and sequential progeny populations. Alternatively, oligonucleotide primers have been designed for specific RNA amplification of each of the competing viruses by reverse transcription-polymerase chain reaction (RT-PCR) (Wargo and Kurath 2011, 2012). Phenotypic markers have included plaque size or morphology, and drug or monoclonal antibody resistance. Again, control for stability is required (authentication of the marker through nucleotide sequencing, or verification of the absence of reversion when the virus is subjected to a larger number of passages than those involved in the fitness assay). Fitness can be measured with natural or engineered viruses or centered on a viral genetic region whose variant forms are introduced into an infectious clone; in this case, the genetic background

[Fig. 2 graphic not reproduced. Labels recoverable from the graphic: "Bottlenecks of different intensity"; boxed consequences: "Modification of mutant spectrum context", "Diversification", "Some minorities can become prominent, others are excluded from ensuing evolution".]

Fig. 2 Bottlenecks acting on a mutant spectrum. Bottlenecks of different intensity (here represented by arrows of different width) are permanent in the natural intra-host and inter-host evolution of viruses, and in experimental designs. Out of thousands (probably often millions) of mutant genomes (here represented as horizontal lines, with mutations as colored symbols) bottlenecks restrict the number of genomes that participate in subsequent evolutionary events. Some of the consequences are summarized in the box. Sequential bottlenecks have consequences for fitness maintenance or variation, as discussed in the text with literature references. Drawing adapted from Domingo and Perales (2019)
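The restriction imposed by a bottleneck can be illustrated numerically. The following is a minimal sketch under a strong simplifying assumption that is not taken from the text: the genomes that pass the bottleneck are drawn independently and at random from a very large mutant spectrum, with no selection acting during the transfer. Under that assumption, a variant present at frequency f is included among n transmitted genomes with probability 1 − (1 − f)^n. The function name and the example values are hypothetical.

```python
def prob_variant_sampled(frequency, bottleneck_size):
    """Probability that at least one copy of a variant passes a bottleneck.

    Assumes independent random draws from a very large population,
    so that P = 1 - (1 - f)^n.
    """
    return 1.0 - (1.0 - frequency) ** bottleneck_size


# Hypothetical minority variant present at 0.1% of the mutant spectrum,
# followed through bottlenecks of increasing width.
for n in (1, 10, 100, 1000, 10000):
    print(n, prob_variant_sampled(0.001, n))
```

The calculation shows why the number of genomes transferred, and not only a diversity index of the parental population, conditions which minority variants can take part in subsequent evolution.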

Viral Fitness, Population Complexity, Host Interactions …


where the regions of interest to be compared have been inserted may influence the competition outcome. An illustration of the concepts just summarized has been reported in studies with HCV in cell culture. Quantification of fitness gain and mutant spectrum analysis were carried out upon subjecting a clonal population of HCV (rescued by transfection of plasmid transcripts into susceptible cells) to 200 serial infections in human hepatoma (Huh-7.5) cells (Fig. 3). By passage 45 (population termed HCV p45), HCV had increased its fitness by 1.8-fold relative to the initial HCV p0, arbitrarily given the fitness value of 1, as determined by growth-competition experiments (Sheldon et al. 2014). The fitness of the populations at passages 100 and 200 (populations termed HCV p100 and HCV p200, respectively) was 2.3-fold the value of the initial population (Moreno et al. 2017) (Fig. 3). The observed leveling-off of fitness (no increase from passage 100 to passage 200) may be due to difficulty of attaining increased fitness through genome modifications, and to limitation of the viral population size whose expansion would be necessary to include the arising variants that would contribute to continuous fitness gain (Novella et al. 1995a, 1999). For individual clones or populations, fitness can be approximated by measuring replicative parameters during single or multiple passages. Growth competitions and individual parameters do not necessarily yield identical fitness rankings. In contrast to the result of the growth-competition experiments, when HCV p100 and HCV p200 were compared individually in a five serial passage experiment, the progeny virus that was shed into the extracellular medium had increased 17-fold for HCV p100 and 45-fold for HCV p200, relative to that of HCV p0. 
In the same experiment, the maximum level of extracellular virus exhibited a modest increase of 1.2-fold for both HCV p100 and HCV p200, relative to HCV p0, thus paralleling the leveling-off observed in the competition experiments (Fig. 3). These comparisons suggest that in assays in which individual viral clones or populations are tested separately, cellular resources may become limiting depending on the number of intracellular particles attained. They also emphasize that not only a clear definition of fitness, but also an accurate description of the protocol used for its measurement is paramount to the interpretation of results. When studying viruses individually, alternative replicative parameters may not show a parallel behavior [for example, compare the level of intracellular virus versus maximum progeny production in Huh-7.5 cells (Fig. 3)], thus requiring the measurements of several parameters to gain insight on the basis of a fitness advantage.
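As a rough illustration of how a growth-competition assay can be reduced to a single relative fitness value, the sketch below regresses the logarithm of the competitor-to-reference ratio on passage number and takes the exponential of the slope. This convention, the function name relative_fitness, and the input values are assumptions made for illustration; the text does not state the exact calculation used in the cited HCV studies.

```python
import math


def relative_fitness(passages, ratios):
    """Relative fitness per passage from a growth-competition series.

    passages -- passage numbers at which the mixture was analyzed
    ratios   -- competitor/reference ratio measured at each passage
    Returns exp(slope) of the least-squares line ln(ratio) vs passage.
    """
    n = len(passages)
    logs = [math.log(r) for r in ratios]
    mean_x = sum(passages) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in passages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(passages, logs))
    return math.exp(sxy / sxx)


if __name__ == "__main__":
    # Hypothetical ratios of a test virus to its reference over four passages.
    print(relative_fitness([0, 1, 2, 3], [1.0, 1.9, 4.2, 7.8]))
```

A value above 1 indicates that the test population outcompetes the reference under the assay conditions; the value says nothing about which replicative parameter is responsible, which is why the individual measurements discussed above remain necessary.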

3 Experimental Fitness Measurements in vivo

In vivo assays are also based on growth-competition experiments in which hosts (animals, plants) are infected with a mixture of the two viruses. Their relative proportion is analyzed after a single infection or two or more serial infections in naïve hosts. In addition, the efficiency of different steps of the virus life cycle can be quantified in the course of the competitions or in infections with the individual viruses.


[Fig. 3 graphic. Panel a: passage scheme of HCVcc from p0 to p200; plots of viral titer (TCID50/ml), viral RNA (molecules/ml), specific infectivity (TCID50/RNA molecules), and relative fitness versus passage number. Panel b: viral titer [Log (TCID50/ml)] versus passage number for HCV p0, HCV p100, and HCV p200 at MOI 0.03, 0.003, 0.0003, and 0.00003; growth rate and maximum titer (Log10 TCID50/ml/passage) for each HCV population; viral titer (TCID50/ml) versus time (min) at 45 ºC for HCV p0 and HCV p200.]


Fig. 3 Fitness variation of hepatitis C virus (HCV) in cell culture. a Scheme of a clonal (plasmid-derived) HCV p0 population subjected to 200 serial infections in human hepatoma Huh-7.5 cells. The three top panels give the amount of extracellular infectivity, viral RNA, and their ratio (specific infectivity), calculated at each passage. The bottom panel indicates fitness values determined by growth-competition experiments, using HCV p0 (arbitrarily given the fitness value of 1) as reference. b Growth rate and maximum titer attained by HCV p0, HCV p100, and HCV p200 subjected to five serial passages in Huh-7.5 cells, at a 1000-fold range of MOI. The top panels depict values for individual populations at different MOI, and the bottom panels compare the average values obtained at different MOI. The panel on the right shows a control to exclude that a different thermal stability of HCV p0 and HCV p200 infectivity might have altered the interpretation of the results. Statistical significances were determined by ANCOVA. Figure adapted from (Moreno et al. 2017), with permission from the American Society for Microbiology, Washington DC, USA

The in vivo assays introduce the additional complication of the complex, mosaic (fine-grained) environments scattered within an entire, differentiated live host. Even human hepatitis B virus, with a defined tropism for hepatocytes, encounters a heterogeneous environment due to metabolic zonation of the liver. Bottlenecks conducive to random drift of sequences are considered a hallmark of host-to-host virus transmission but they can take place also within a single infected individual during intra-host virus evolution (Bull et al. 2011; Pfeiffer and Kirkegaard 2006; Roossinck and Ali 2007). Intra-host bottlenecks contribute to virus diversification together with distinct selective forces acting in different host compartments (which include cell-to-cell differences and local physiological and immune responses). En bloc transmission of viruses through extracellular vesicles can also group variants (presumably in a random fashion) that together will reach a target cell in experimental fitness determinations (Altan-Bonnet et al. 2019; Chen et al. 2015; Sanjuan and Thoulouze 2019). FMDV genetic variants obtained during infection of pigs displayed different fitness when measured in competition with a reference FMDV clone in swine (Carrillo et al. 1998). In competitions between the parental virus used for the initial pig infection and variants found at low frequency in the infected animals, individual lesions (aphthae) harbored different proportions of the two FMDV competitors. The different composition observed in individual aphthae suggests a random component in the occupation of those host sites by viruses from the viremic pool. The subset of particles that reach aphthae is a major source of virus for transmission to a contact host individual. Another in vivo fitness assay consisted in the coinfection of juvenile rainbow trout by different IHNV isolates that belonged to common genetic subtypes circulating in a given geographical area.
The results did not reveal an average difference in relative fitness despite high levels of fish-to-fish variation in the ratio of two competing genotypes (Troyer et al. 2008). In yet another in vivo assay, a herpes simplex virus 1 mutant that was resistant to thiourea compounds (a group of inhibitors of viral DNA encapsidation) replicated very similarly to its parental virus following corneal inoculation of mice; the mutant retained in vivo fitness, and exhibited increased


virulence in mice (Pesola and Coen 2007). The study illustrates also that acquisition of drug resistance does not necessarily imply fitness cost (Sect. 6.1). Severe fitness losses were described for some CD8+ T cell escape mutants of simian immunodeficiency virus (SIVmac239). The decrease was such that it resulted in the direct reversion of the mutations in vivo (Mudd et al. 2011) rather than in incorporation of compensatory mutations. When a genetic lesion results in a strong fitness decrease but the virus can still replicate, an expected outcome is the direct reversion of the genetic lesion, which may also occur during fitness assays. Depending on the initial fitness and the population size in the course of fitness recovery passages, compensatory mutations rather than direct reversion of the lesion may mediate the gain (Sanjuan et al. 2005). Fitness effects being environment-dependent, reversion may occur only at some sites within an organism. Preferential colonization of organs by some variants over others has been reported in infected hosts [(Rueca et al. 2020; Sanz-Ramos et al. 2008; Tracy et al. 2015), among other studies]. Influenza virus (IV) variants that showed reduced susceptibility to different antiviral agents displayed a moderate reduction in fitness compared to wild type viruses (Lee et al. 2021; Lloren et al. 2019; Nogales et al. 2019). Examination of the tolerance to substitutions of a critical IV hemagglutinin H9 residue involved in receptor recognition indicated that some alternative amino acids conferred a selective advantage in cell culture and in vivo (Obadan et al. 2019). One of the first salient variants identified in SARS-CoV-2, with amino acid substitution D614G in spike protein S, showed increased competitive fitness in primary human airway epithelial cells as compared with its parental virus. Despite not reaching higher titers in vivo, it was transmitted significantly more efficiently than the wild type virus (Hou et al. 2020) (see also Sect. 8). Although the number of studies is still small, fitness variations and adaptive changes can be investigated using infection of tissue explants and organoids in culture. The approach offers the advantage of an environment more realistic than a cell culture, and faces the limitations inherent to its complexity.

4 Limitations of Fitness Determinations

In addition to possible differences in fitness values calculated in growth-competition experiments or deduced from replicative parameters of the individual viral populations (Sect. 2), other limitations should be considered for the experimental designs and interpretation of fitness differences. Intracellular replication may lead to complementation, cooperation, or interference among genomes of two competing mutant clouds (de la Torre and Holland 1990; García-Arriaza et al. 2004; Kirkegaard et al. 2016; Novella 2003; Perales et al. 2007, 2010; Shirogane et al. 2012, 2016, 2019). Removal of a subpopulation of viruses from the context of its mutant spectrum (as happens in each viral passage in cell culture or during transmission in vivo) may modify the intra-population interactions and relative genome dominances. Viral


fitness of a mutant spectrum is not a mere average of fitness of individual components, because the latter do not have an existence which is totally independent of their population companions. Furthermore, recent studies with HCV have indicated that a true population equilibrium (meaning a steady-state distribution of genomes within the mutant spectrum; see also Sect. 1) is very difficult to approach, at least with current serial passage designs (Delgado et al. 2021; Domingo et al. 2020; Gallego et al. 2020; Moreno et al. 2017). Either because each viral passage implies some population drift or because the changing mutant spectrum acts as part of the environment for the replicating virus (Kuppers 2016) (or a combination of both classes of influences) no steady mutant distribution was attained. If transient equilibrium steps occurred, they were not identified within the time frame of experimental observations. Thus, perturbations will take place during fitness assays, even under carefully controlled cell culture conditions, and using viruses well adapted to the host cells. Although obvious given the influence of the environment on fitness, it should be emphasized that a virus may (often will) display different fitness in two cell types or in different natural hosts (Ebel et al. 2011). Also, fitness in cell culture need not reflect fitness in vivo (Clementi and Lazzarin 2010; Domingo et al. 2019; Martinez-Picado and Martinez 2008; Quinones-Mateu et al. 2008; Quiñones-Mateu and Arts 2006). Subtle structure modifications away from the catalytic site of the RNA-dependent RNA polymerase (RdRp) of West Nile Virus (WNV) altered WNV fitness in a host cell-dependent manner (Van Slyke et al. 2012). Regarding fitness determinations in vivo, virus compartmentalization in different cells, tissues, and organs renders a fitness value dependent on random effects involved in the extent of colonization of different compartments by the competing viruses. 
Genetic or epigenetic heterogeneities within a cellular collectivity may exert still largely unexplored effects on virus replication. Adaptation of a virus to a new in vivo environment (new meaning in this case a different individual of the same host species) entails coordinated changes in several characters. This is particularly relevant for viruses with compact genomes in which virtually any nucleotide has a good probability of affecting functions at the RNA or protein level. Genetic correlations among phenotypic traits may arise from individual nucleotide residues, RNA, and protein high order structures, in addition to linkage, epistasis, or pleiotropy (Sect. 1). The multifunctional nature of most viral proteins adds to the complexity of adaptive adjustments and the trade-offs involved in accommodation of different traits for fitness maintenance or gain. As an example, influenza virus H9N2 antibody escape mutants that included deletions in the HA receptor binding site were successful in escaping polyclonal antiserum binding, and were able to bind human-like receptors. Yet, they displayed reduced pH and thermal stability. The latter change rendered unlikely their human-to-human transmission (Peacock et al. 2017). Much more attention has been paid to fitness modifications associated with mutation than with recombination. Yet, recent results suggest that recombination is a random and probably much more frequent event than previously thought, according to results with some positive strand RNA viruses (Bentley et al. 2021; Lowry et al. 2014). RNA virus recombination is increasingly viewed as intricately linked to genome


replication. Both recombinant and reassortant genomes may arise in the course of fitness determinations, in line with their participation in fitness gain; both may also participate in the compensation of fitness decrease that results from the operation of Muller’s ratchet (Sect. 5.1). After this series of limitations, the reader may wonder how viral fitness levels can ever be reliably determined. The answer is that some fitness differences of biological impact may be sufficiently large and robust not to be blurred by the effects outlined here, which, although possible, do not occur systematically in all the parallel assays that should be carried out for fitness determinations. As in any good experimental practice, quadruple parallel assays with initial samples of the same viral populations to be compared are required, together with a meticulous description of the infection and environmental conditions. Thanks to these practices, reproducible fitness variations and general trends have been documented, and they allow the descriptions that justify the writing of the present article.

5 Overview of Fitness Landscapes for Viruses

Fitness landscapes have been depicted as mountains (representing high fitness, or high potential to produce infectious progeny) and valleys (representing low fitness, or low potential to produce infectious progeny), following the Wrightian type of representation (Wright 1931). In these displays, fitness values are generally plotted in ordinate whereas position in sequence space or environmental change is represented in abscissa (Fig. 4). In the simplified way we depict fitness landscapes in two dimensions, the discontinuous line in the lower panel of Fig. 4 suggests that a given viral population may have the same fitness value in widely different positions of sequence space. This may be due to entirely different constellations of mutations (or other modifications) that happen to settle at an even fitness, or to epistatic effects (within an individual genome) and other interactions among different genomes. While maintaining some common mutations, these influences permit divergent mutant spectra to attain the same fitness. No measurements carried out to date allow an estimate of the probability of such occurrence, which may be much lower than naïvely visualized in a two-dimensional picture. The proposal is coherent with the evidence that low fitness virus populations and clones have available multiple fitness recovery trajectories (Arenas et al. 2016; Escarmís et al. 1999; Gallego et al. 2020; Nguyen et al. 2012). When considering the environment as the variable represented in the abscissa of Fig. 4, an identical fitness in two distant environments (large distance in the discontinuous line) bears on the probability of disease emergence, for example in humans from a virus present in an animal reservoir.
We know from the evidence of origin (emergence or re-emergence) of several human pathogenic viruses (influenza virus, human immunodeficiency virus, Ebola virus, or SARS-CoV-2, among others) that the probability of zoonotic virus transmission into humans is sufficient to permit about one human viral emergence per year. A not too distant fitness value (rather than an identical fitness) may suffice for host jumping, given the potential of mutant

[Fig. 4 graphic. Two panels. Ordinate: fitness. Abscissa: position in sequence space or environmental change.]

Fig. 4 Schematic fitness landscapes drawn in two dimensions. Circles with arrows represent mutational or environmental movements (abscissa) for fitness maintenance (along a flat surface), fitness gain (uphill from a valley), or fitness decrease (downhill from a peak). Current views are that fitness landscapes for viruses are rugged and dynamic, as discussed in the text. Figure taken from (Domingo 2020), with permission from Elsevier

swarms to find alternative pathways for fitness gain (Arenas et al. 2016; Escarmís et al. 1999; Gallego et al. 2020; Nguyen et al. 2012). Viral quasispecies as a contributing factor to virus disease emergence and re-emergence has been previously reviewed (Domingo 2010, 2020). Quasispecies dynamics pushes mutant spectra towards explorations of sequence space, driven either by selective gradients or random events. Not surprisingly, the current picture we have of fitness landscapes of viruses determined by independent procedures is that they are rugged and dynamic (Acevedo et al. 2014; de Visser and Krug 2014; Delgado et al. 2021; Eyre-Walker and Keightley 2007; Fernandez et al. 2007; Kouyos et al. 2012; Lorenzo-Redondo et al. 2011; Munoz-Moreno et al. 2019; Peris et al. 2010; Qi et al. 2014; Quadeer et al. 2020; Sanjuan et al. 2004; Seifert and Beerenwinkel 2016; Wylie and Shakhnovich 2011) (see also the article by Elena and García-Arenal in this volume). Landscape complexity appears to be a widespread feature. Even prolonged replication of HCV in cell culture in absence of external constraints did not simplify the fitness landscape of the virus (the experiment is described in Sect. 2). Rather, ruggedness and dynamism persisted even when adaptation to the cell culture environment reached a maximum under the experimental conditions used. The mutant spectrum kept exploring new portions of sequence space with intra-mutant spectrum fitness peaks—defined by self-organized maps derived from deep sequencing reads—that shifted their position in sequential samples (Delgado et al. 2021) (described in the article by Delgado et al. in the present volume).


5.1 Fitness and Virus Propagation Regimen

It follows from fitness landscape dynamism that the passage regime of a virus (for example, serial massive infections versus sequential bottlenecks) leads to remarkable fitness variations, with reasonably predictable trends. Large population passages lead to fitness gains (Escarmís et al. 1999; García-Arriaza et al. 2004; Novella et al. 1995a; Sheldon et al. 2014). In contrast, serial bottleneck events lead to profound fitness decreases attributed to the accumulation of mutations due to Muller’s ratchet. The process of fitness decrease was accompanied by dominance of unusual genetic lesions (never observed in natural isolates of the corresponding viruses), with functional consequences (Chao 1990; Clarke et al. 1993; de la Iglesia and Elena 2007; Duarte et al. 1992; Elena et al. 2001; Escarmís et al. 1996; Yuste et al. 1999, 2000). Individual clones that exhibit lower fitness than their parental uncloned populations, or clones that accumulate mutations by the action of Muller’s ratchet, will generally regain fitness when subjected to large population passages. The gain may require many recovery infection rounds, and its rate will depend on the initial fitness value, and the population size involved in the large population passages (Cervera and Elena 2016; Duarte et al. 1993; Escarmís et al. 1999). Thus, passage regimen exerts an important influence on fitness evolution (Domingo et al. 2019; Elena et al. 2000). Subjecting FMDV to serial pig-to-pig contact transfers resulted in fitness decrease, lower virus transmissibility, and a remarkable biological modification: the acquisition by the virus of the capacity to establish a carrier state in swine (Carrillo et al. 2007), a type of inapparent infection that for FMDV had been described only in ruminants. A transition from a cytopathic into persistent phenotype due to serial bottlenecks was described with FMDV clones in cell culture (Escarmis et al. 2008).
Important phenotypic changes have been reported upon large population passages of HCV concomitantly with its adaptation to human hepatoma cells. Some changes accentuated key virus-host interactions such as the capacity of the virus to shut off host protein synthesis (Moreno et al. 2017; Perales et al. 2013) (Fig. 5). Therefore, a passage regime may direct a viral population towards new areas of sequence space, with phenotypic modifications, including some that deviate from the accepted biological features of a virus group, according to current virus classifications. Such deviations from viral features that were once thought to be fixed reinforce the concept, well rooted in quasispecies theory, that viral quasispecies are rich reservoirs of alternative viral phenotypes. In viruses, alternative phenotypes can be frequently realized. The initial fitness of a viral population determines the population size that has to be transferred in subsequent infections for the population to maintain, increase or decrease its fitness. When the initial population has low fitness, fitness gain is more likely, and few particles per transfer are sufficient to increase fitness. In contrast, with a high initial fitness large populations are needed just to maintain fitness (Novella et al. 1995b) (Fig. 6). Fitness of the individual infectious genomes that populate a mutant spectrum does not reflect the fitness of the mutant swarm where they belong. This was observed


in a very early study with bacteriophage Qβ (Domingo et al. 1978), and also with vesicular stomatitis virus (Duarte et al. 1994), and other viruses (Cervera and Elena 2016). Fitness of a clone immersed in a mutant spectrum is modulated by the composition of the mutant spectrum, in particular by the number of neighbors in sequence space, as predicted by theoretical models and verified experimentally (de la Torre and Holland 1990; Domingo and Schuster 2016; Swetina and Schuster 1982). Fitness is a collective property of an entire virus population.
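The contrast between plaque-to-plaque transfers and large population passages can be explored with a toy simulation. The sketch below is a deliberately minimal model and not a description of any cited experiment: genomes are represented only by their number of deleterious mutations, beneficial and compensatory mutations are omitted (so the model can show fitness maintenance but not fitness gain), and the names serial_transfers, u, s, and progeny, as well as all parameter values, are hypothetical.

```python
import random


def serial_transfers(pool_size, n_transfers=50, u=0.2, s=0.2,
                     progeny=2000, seed=1):
    """Mean fitness of the founders after serial transfers (toy model).

    pool_size -- genomes transferred at each passage (1 = plaque-to-plaque)
    u         -- probability that a progeny genome gains one mutation
    s         -- fitness cost per mutation; fitness is (1 - s) ** k
    progeny   -- progeny genomes produced per passage (must be >= pool_size)
    """
    rng = random.Random(seed)
    founders = [0] * pool_size  # start from mutation-free genomes
    for _ in range(n_transfers):
        # Replication: parents contribute progeny in proportion to fitness.
        weights = [(1 - s) ** k for k in founders]
        parents = rng.choices(founders, weights=weights, k=progeny)
        # Mutation: each progeny genome gains one deleterious mutation
        # with probability u (deleterious mutations only, by assumption).
        offspring = [k + (rng.random() < u) for k in parents]
        # Bottleneck: only pool_size genomes found the next infection.
        founders = rng.sample(offspring, pool_size)
    return sum((1 - s) ** k for k in founders) / pool_size


if __name__ == "__main__":
    print("plaque-to-plaque:", serial_transfers(pool_size=1))
    print("large population:", serial_transfers(pool_size=1000))
```

With a pool size of one, selection cannot act among the founders and mutations accumulate irreversibly, which is the essence of Muller's ratchet; with a large pool, selection keeps the least mutated genomes in the population. Reproducing the fitness gains reported for large population passages would require adding beneficial mutations to the model.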

5.2 Intra-mutant Spectrum Fitness Profiles

An interesting problem arises when trying to describe experimentally the fitness distribution of the individual genomes that compose a viral mutant spectrum at a fixed time point. Even if fitness values could be determined for a sufficient number of individual clones retrieved from the population, their values would not be representative of the ensemble (Cervera and Elena 2016; Domingo et al. 1978; Duarte et al. 1994). UDS methods, which permit reliable determination of 10,000 to 100,000 sequences and their corresponding haplotypes from a mutant spectrum, have come to the rescue to approach this problem. A basic assumption is that the frequency of individual sequences (and deduced haplotypes) is a reflection of the replicative efficiency of the genomes that harbor them; in consequence, a distribution of haplotype frequencies is a valid approximation to the intra-population fitness profile. A UDS-based model was developed for the equilibrium quasispecies, using haplotype frequencies to quantify the selective advantage of an adaptive allele (Seifert and Beerenwinkel 2016; Seifert et al. 2015). An alternative procedure has been applied to HCV, and consists in the construction of two-dimensional Self-Organizing Maps (SOMs), with an ordering of sequences according to their relatedness; the third dimension of the landscape is given by the haplotype frequencies, calculated from UDS reads (Delgado et al. 2021) (see article by Delgado et al. in this volume). Application of this SOM procedure to the comparison of samples from HCV populations subjected to up to 210 passages in human hepatoma cells evidenced a dynamic variation of the fitness profile that did not subside even after extensive replication in the cell culture environment (Delgado et al. 2021).
A fundamental limitation of these procedures is that for haplotype frequencies to represent a fitness landscape, a population equilibrium, akin to a mutation-selection balance, is presupposed. No such equilibrium can ever exist for a replicating virus. Haplotypes have transient existences in the sense that they are dynamically replaced by others. When some persist, it occurs with frequency changes. In viral systems, an equilibrium is a tendency not a definite state. Persistent population disequilibrium complicates antiviral control strategies generally, including treatments with antiviral agents.
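The basic step shared by these procedures, the conversion of UDS reads into a ranked distribution of haplotype frequencies, can be sketched in a few lines. The example assumes that reads have already been filtered and trimmed to the same amplicon, so that identical strings correspond to identical haplotypes; the function name haplotype_profile and the toy reads are hypothetical. In line with the limitation just described, the resulting frequencies approximate a fitness profile only to the extent that the population is close to equilibrium.

```python
from collections import Counter


def haplotype_profile(reads):
    """Return haplotype frequencies, ordered from most to least abundant.

    reads -- iterable of sequence strings covering the same amplicon
    """
    counts = Counter(reads)
    total = sum(counts.values())
    # most_common() orders haplotypes by decreasing read count.
    return {hap: n / total for hap, n in counts.most_common()}


if __name__ == "__main__":
    # Toy data: four reads, two distinct haplotypes.
    toy_reads = ["AAC", "AAC", "AGC", "AAC"]
    for haplotype, freq in haplotype_profile(toy_reads).items():
        print(haplotype, freq)
```

Comparing such profiles across sequential samples of the same population is one simple way to visualize the replacement of haplotypes and the frequency changes of those that persist.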


[Fig. 5 graphic. Panel a: labeled cellular protein at 24, 48, and 72 h.p.i. for no virus, HCV p0, HCV p100, and HCV p200, with quantification as % actin. Panel b: viral titer (TCID50/ml) at 24, 48, and 72 h.p.i. Panel c: Western blots (WB) of NS5A and core, with actin as reference, and quantification as optical density.]


Fig. 5 The effect of hepatitis C virus (HCV) fitness on suppression of host cell protein synthesis. The viruses compared were HCV p0, HCV p100, and HCV p200 whose origin and fitness values are described in Fig. 3. a For each infection 4 × 10^5 Huh-7.5 cells were infected with 1.2 × 10^4 TCID50 of virus, and at the indicated times post-infection the cells were pulse-labeled with radioactive [35S]-Met/Cys and the cellular extracts analyzed electrophoretically. The total amount of labeled protein was quantified relative to actin (second panel). b Virus titer in the cell culture supernatants of the same infections. c Viral protein NS5A and core visualized in the cellular extract by Western blot; values normalized to actin are shown in the bottom panels. Statistical significances were determined using the unpaired t test. Further experimental details are described in (Moreno et al. 2017), from which the figure has been copied, with permission from the American Society for Microbiology, Washington DC, USA

[Fig. 6 graphic. Three panels. Ordinate: fitness (axis marks H and L). Abscissa: transfer number.]

Fig. 6 Pool size of viral clones required to maintain, increase, or decrease viral fitness during serial infections of vesicular stomatitis virus (Novella et al. 1995a). Top: single plaque-to-plaque transfers (the most severe form of bottleneck depicted in Fig. 2) lead to fitness decrease due to mutation accumulation (Muller’s ratchet). Middle: infection with a pool of many clones (dot array) is required to maintain fitness; when fitness is low, a smaller number of pooled clones is sufficient to gain fitness. Bottom: summary of the same concept with clone pool size depicted by spheres of different size. Note that stochastic fitness fluctuations have been reported at the extremes of the fitness scale. Figure copied from (Domingo 2020), with permission from Elsevier


6 Fitness in the Development of Antiviral Resistance

The influence of viral fitness on resistance to antiviral agents covers two complementary aspects: (i) the fitness cost of the mutations and amino acid substitutions that the viral genome has to acquire to express the resistance phenotype, and (ii) the decrease of sensitivity to antiviral agents due to high fitness of the virus population which is confronted with an antiviral agent. While mechanism (i) has a long history of observations, mechanism (ii) is very recent since its disclosure required having available closely related viral populations displaying different fitness, together with multiple effective drugs with different modes of action directed to the virus.

6.1 Fitness Cost of Escape Mutants

Resistance to antiviral agents due to the selection of drug-escape mutants is a general problem in antiviral therapy, with the first evidence going back to early molecular studies with picornaviruses in the 1960s (Eggers and Tamm 1961, 1965). Selection of drug-escape mutants has been a continued occurrence with virtually any virus for which a specific antiviral agent has been developed. For RNA viruses, high mutation rates and quasispecies dynamics provided a neat interpretation of the frequent selection of drug-escape mutants, and of the advantages that combination therapies offered to suppress them (Domingo 1989). It became also evident that mutation frequencies for HIV-1 in the field were so high that drug-resistance mutations were present in viruses that had never been exposed to antiretroviral agents (Mohri et al. 1993; Nájera et al. 1994, 1995). Escape mutants displaying various degrees of resistance to antiviral agents or to antibodies (in particular monoclonal antibodies) have been detected at high frequencies in populations of important pathogenic viruses such as HIV-1, hepatitis B virus, or influenza virus. Their selection is still today a major cause of treatment failures (Domingo 2020; Domingo and Perales 2019; Perales et al. 2012, 2017). The ease of acquisition of escape mutations depends on two types of barriers, termed genetic and phenotypic barriers. The genetic barrier is given by the number and type of mutations required to reach the resistance phenotype; typically, transitions are more frequent than transversions, and the probability of occurrence of two mutations in the same genome (assuming that they arise independently) is the product of the probabilities of occurrence of each of the mutations separately. The phenotypic barrier is given by the cost in fitness inflicted by the required number of mutations. Both escape mutations that impose severe fitness losses and others that have a negligible effect on fitness have been reported.
In a study with respiratory syncytial virus, two monoclonal antibody-resistant mutants were selected; one of them carried a fitness cost while the other displayed a competitive advantage relative to the parental strain (Zhao et al. 2006). HIV-1-infected patients generally harbor viruses that exhibit a broad range of fitness values (Goudsmit et al. 1997). Reconstructed HIV-1 variants selected from treated patients for resistance to protease inhibitors manifested reproducible fitness decreases that were attributed to multiple defects in the processing of Gag and Gag-Pol polyprotein precursors by the protease (Zennou et al. 1998). The current picture is that the fitness cost of escape mutations remains largely unpredictable, partly as a consequence of the multifunctionality of the proteins that incorporate the substitution responsible for the escape phenotype.

For HCV infections, despite treatment success with available direct-acting antiviral agents (DAAs), HCV resistance to one or multiple antiviral agents remains an issue, and the selection of resistance-associated substitutions (RAS) has been documented in several cohorts (Chen et al. 2020; Di Maio et al. 2017, 2021; Dietz et al. 2018; Parczewski et al. 2021). Here too, differences in fitness cost are evident. RAS selected in protein NS3 in patients treated with the protease inhibitors telaprevir and boceprevir became undetectable in the HCV populations weeks after treatment cessation (Susser et al. 2009, 2011). In contrast, the RAS in NS5A selected by treatment with daclatasvir persisted for years in the population (Sarrazin 2016; Yoshimi et al. 2015).
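The genetic barrier described in this section can be made concrete with a small calculation. The sketch below is illustrative only: the per-site probabilities for transitions and transversions are placeholder values rather than measurements reported in this chapter, and the function names are hypothetical. It assumes that mutations arise independently, so that the probability of finding all required mutations in one genome is the product of the individual probabilities.

```python
from math import prod

# Hypothetical per-site, per-replication probabilities (assumed for illustration).
P_TRANSITION = 1e-4      # purine<->purine or pyrimidine<->pyrimidine change
P_TRANSVERSION = 2e-5    # purine<->pyrimidine change, assumed to be rarer


def joint_probability(per_mutation_probs):
    """Probability that one genome carries all listed mutations,
    assuming that the mutations arise independently."""
    return prod(per_mutation_probs)


def expected_preexisting_mutants(per_mutation_probs, population_size):
    """Expected number of genomes that already carry every required mutation."""
    return joint_probability(per_mutation_probs) * population_size


# One transition versus one transition plus one transversion.
single = joint_probability([P_TRANSITION])
double = joint_probability([P_TRANSITION, P_TRANSVERSION])
```

Under these assumed rates, a resistance phenotype that needs two mutations is orders of magnitude less likely to pre-exist in a population than one that needs a single transition, which is the sense in which the number and type of required mutations set the height of the genetic barrier.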

6.2 High Fitness as Promoter of Antiviral Resistance

Although RAS selection is the main mediator of antiviral resistance in HCV and other viruses, our group has recently documented, in cell culture studies, a new resistance mechanism in HCV that is not associated with standard bona fide RAS. The first observations were made when a clonal HCV population termed HCV p0 (clonal meaning that it was the progeny of a plasmid transcript) was passaged 100 times in Huh-7.5 human hepatoma cells in the absence and presence of different interferon alpha (IFN-α) concentrations; the objective of the passage experiment was to select IFN-α-resistant mutants (Perales et al. 2013). We found that HCV p100 (the population passaged 100 times in the absence of IFN-α) displayed partial resistance to IFN-α when it was subjected to ten serial passages in the presence of the cytokine (Perales et al. 2013). This observation prompted us to examine possible resistance to other HCV inhibitors that target different viral or cellular functions; the result was similar (Sheldon et al. 2014). Indeed, ten serial passages of HCV p0 and HCV p100 in the presence of telaprevir, daclatasvir, ribavirin, or cyclosporine A resulted in inhibition of HCV p0 (with the exception of daclatasvir, which selected a resistance mutant) and in sustained replication of HCV p100, in assays carried out over a 1000-fold range of multiplicity of infection (MOI). In addition to differing in mutant spectrum composition, HCV p100 displayed a 2.3-fold higher fitness than HCV p0, according to growth-competition experiments in the same Huh-7.5 cell line (see also Sect. 2). Since individual biological clones retrieved from the populations maintained the antiviral resistance phenotype, and no RAS were identified by ultra-deep sequencing (UDS), we associated high fitness with the general antiviral resistance phenotype (Sheldon et al. 2014).
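A fold difference in fitness such as the one quoted above is typically derived from growth-competition data. The text here does not spell out the calculation, so the following is only a sketch of one common approach: the logarithm of the ratio between the test and the reference virus is regressed on passage number, and the relative fitness per passage is taken as the exponential of the slope. The function name and the data points are hypothetical; the made-up ratios were chosen so that the example returns 2.3.

```python
from math import exp, log


def relative_fitness(passages, ratios):
    """Estimate relative fitness per passage as exp(slope) of a least-squares
    line fitted to ln(test:reference ratio) versus passage number."""
    if len(passages) != len(ratios) or len(passages) < 2:
        raise ValueError("need at least two (passage, ratio) pairs")
    log_ratios = [log(r) for r in ratios]
    n = len(passages)
    mean_x = sum(passages) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in passages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(passages, log_ratios))
    return exp(sxy / sxx)


# Made-up data: the test:reference ratio grows 2.3-fold at every passage.
passages = [0, 1, 2, 3]
ratios = [2.3 ** p for p in passages]
fitness = relative_fitness(passages, ratios)
```

Working on the logarithmic scale turns exponential divergence of the two competitors into a straight line, so a single slope summarizes the whole competition series.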
Resistance was also documented with sofosbuvir, a drug widely used in current anti-HCV therapies and with a high barrier to resistance (Gallego et al. 2016). Thus, the initial fitness displayed by HCV is a general determinant of antiviral resistance that does not require the occurrence of specific resistance mutations. The analyses were extended to HCV p200, also a high-fitness population, obtained by 200 serial passages of HCV p0. Resistance was observed with the lethal mutagens favipiravir and ribavirin, and was also linked to a limited increase of mutant spectrum complexity, as calculated by several diversity indices (Gallego et al. 2018).

To explain the fitness-dependent antiviral resistance documented with inhibitors displaying different mechanisms of action, we proposed a competition model between replicative RNA and inhibitor molecules at the replication complexes. For the high-fitness HCV p100 and HCV p200, the amount of inhibitor per replication complex unit may be lower than for HCV p0, as predicted from the intracellular HCV RNA levels (Moreno et al. 2017). Competition may also operate at other steps of the HCV infectious cycle. In addition to a mass action effect, another contributing factor may be the delocalization in sequence space attained by high-fitness viruses, which expands the possibilities for continued viral multiplication starting from multiple, distinct genomes. Such delocalization has been termed broadly diversifying selection, a process that took place upon extensive replication of HCV in cell culture in the absence of external perturbations (Delgado et al. 2021; Domingo et al. 2020; Gallego et al. 2020). Open questions are whether fitness-dependent antiviral resistance also operates with other viruses, and what its relevance in vivo might be.
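Mutant spectrum complexity, mentioned above in connection with HCV p200, is usually summarized with diversity indices computed from haplotype counts. The section does not list which indices were used, so the sketch below simply shows two widely used ones, the Shannon entropy and the Gini-Simpson index, as an assumption about what such a calculation can look like. The read counts are invented for illustration.

```python
from math import log


def shannon_entropy(counts):
    """Shannon entropy (natural log) of haplotype frequencies."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def gini_simpson(counts):
    """Probability that two genomes drawn with replacement are different haplotypes."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Invented haplotype read counts for a narrow and a broader mutant spectrum.
narrow_spectrum = [97, 1, 1, 1]
broad_spectrum = [40, 20, 20, 10, 10]

narrow_entropy = shannon_entropy(narrow_spectrum)
broad_entropy = shannon_entropy(broad_spectrum)
```

Both indices are zero for a population made of a single haplotype and increase as the population spreads over more haplotypes at more even frequencies.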

7 Mechanisms of Antiviral Resistance Alternative to Direct Selection of Escape Mutants

An intriguing observation made with several cohorts of chronically HCV-infected patients in the course of antiviral treatments is that a sizeable proportion (around 20%) of patients who fail therapy harbor viruses that lack RAS (Chen et al. 2020; Di Maio et al. 2017; Dietz et al. 2018; Foster et al. 2015; Jacobson et al. 2013; Lawitz et al. 2013a, b; Sato et al. 2015; Stross et al. 2016; Sullivan et al. 2013; Svarovskaia et al. 2014). A recent study with a large cohort of 220 HCV-infected patients revealed that their escape viruses harbored one or more amino acid substitutions in NS3, NS5A, or NS5B that were not RAS. Since the same substitutions were present in many patients, irrespective of the antiviral agents they were treated with, they were termed highly represented substitutions (HRSs) (Fig. 7). HRSs belonged to more accepted classes of substitutions (according to the PAM 250 replacement matrix) than RAS, and they were also less disruptive of protein structures. They had not been previously associated with resistance to any anti-HCV agent, they were present in basal HCV samples (prior to treatment), and they were maintained after treatment failure (Soria et al. 2020). Some of the HRSs were also identified in patients from other cohorts who also failed therapy (Bellocchi et al. 2019; Hamano et al. 2005; Jiang et al. 2014; Nakamoto et al. 2014; Uchida et al. 2018). Specifically, one of the

[Figure: panel (a); axis label: Frequency (%)]

5, exoribonuclease that is critically involved in coronavirus RNA synthesis. Proc Natl Acad Sci USA 103:5108–5113 Mohri H, Singh MK, Ching WT, Ho DD (1993) Quantitation of zidovudine-resistant human immunodeficiency virus type 1 in the blood of treated and untreated patients. Proc Natl Acad Sci USA 90:25–29 Moreno E, Gallego I, Gregori J, Lucia-Sanz A, Soria ME, Castro V, Beach NM, Manrubia S, Quer J, Esteban JI, Rice CM, Gomez J, Gastaminza P, Domingo E, Perales C (2017) Internal disequilibria and phenotypic diversification during replication of hepatitis C virus in a noncoevolving cellular environment. J Virol 91:e02505-e2516 Moreno E, Ojosnegros S, Garcia-Arriaza J, Escarmis C, Domingo E, Perales C (2014) Exploration of sequence space as the basis of viral RNA genome segmentation. Proc Natl Acad Sci USA 111:6678–6683 Morga B, Jacquot M, Pelletier C, Chevignon G, Dégremont L, Biétry A, Pepin J-F, Heurtebise S, Escoubas J-M, Bean TP, Rosani U, Bai C-M, Renault T, Lamy J-B (2021) Genomic diversity of the ostreid herpesvirus type 1 across time and location and among host species. Front Microbiol 12 Mudd PA, Ericsen AJ, Walsh AD, Leon EJ, Wilson NA, Maness NJ, Friedrich TC, Watkins DI (2011) CD8+ T cell escape mutations in simian immunodeficiency virus SIVmac239 cause fitness defects in vivo, and many revert after transmission. J Virol 85:12804–12810 Munoz-Moreno R, Martinez-Romero C, Blanco-Melo D, Forst CV, Nachbagauer R, Benitez AA, Mena I, Aslam S, Balasubramaniam V, Lee I, Panis M, Ayllon J, Sachs D, Park MS, Krammer F, tenOever BR, Garcia-Sastre A (2019) Viral fitness landscapes in diverse host species reveal multiple evolutionary lines for the NS1 gene of influenza A viruses. Cell Rep 29(3997–4009):e3995 Nájera I, Holguín A, Quiñones-Mateu ME, Muñoz-Fernández MA, Nájera R, López-Galíndez C, Domingo E (1995) Pol gene quasispecies of human immunodeficiency virus: mutations associated with drug resistance in virus from patients undergoing no drug therapy. 
J Virol 69:23–31 Nájera I, Richman DD, Olivares I, Rojas JM, Peinado MA, Perucho M, Najera R, Lopez-Galindez C (1994) Natural occurrence of drug resistance mutations in the reverse transcriptase of human immunodeficiency virus type 1 isolates. AIDS Res Hum Retroviruses 10:1479–1488 Nakamoto S, Kanda T, Wu S, Shirasawa H, Yokosuka O (2014) Hepatitis C virus NS5A inhibitors and drug resistance mutations. World J Gastroenterol 20:2902–2912 Nguyen AH, Molineux IJ, Springman R, Bull JJ (2012) Multiple genetic pathways to similar fitness limits during viral adaptation to a new host. Evolution 66:363–374 Nogales A, Aydillo T, Avila-Perez G, Escalera A, Chiem K, Cadagan R, DeDiego ML, Li F, Garcia-Sastre A, Martinez-Sobrido L (2019) Functional characterization and direct comparison of influenza A, B, C, and D NS1 proteins in vitro and in vivo. Front Microbiol 10:2862 Novella IS (2003) Contributions of vesicular stomatitis virus to the understanding of RNA virus evolution. Curr Opin Microbiol 6:399–405 Novella IS, Duarte EA, Elena SF, Moya A, Domingo E, Holland JJ (1995a) Exponential increases of RNA virus fitness during large population transmissions. Proc Natl Acad Sci USA 92:5841–5844 Novella IS, Elena SF, Moya A, Domingo E, Holland JJ (1995b) Size of genetic bottlenecks leading to virus fitness loss is determined by mean initial population fitness. J Virol 69:2869–2872 Novella IS, Quer J, Domingo E, Holland JJ (1999) Exponential fitness gains of RNA virus populations are limited by bottleneck effects. J Virol 73:1668–1671


Obadan AO, Santos J, Ferreri L, Thompson AJ, Carnaccini S, Geiger G, Gonzalez Reiche AS, Rajao DS, Paulson JC, Perez DR (2019) Flexibility in vitro of amino acid 226 in the receptor-binding site of an H9 subtype influenza A virus and its effect in vivo on virus replication, tropism, and transmission. J Virol 93 Ogando NS, Zevenhoven-Dobbe JC, van der Meer Y, Bredenbeek PJ, Posthuma CC, Snijder EJ (2020) The enzymatic activity of the nsp14 exoribonuclease is critical for replication of MERSCoV and SARS-CoV-2. J Virol 94 Ojosnegros S, Garcia-Arriaza J, Escarmis C, Manrubia SC, Perales C, Arias A, Mateu MG, Domingo E (2011) Viral genome segmentation can result from a trade-off between genetic content and particle stability. PLoS Genet 7:e1001344 Parczewski M, Janczewska E, Pisula A, Dybowska D, Lojewski W, Witor A, WawrzynowiczSyczewska M, Socha L, Krygier R, Knysz B, Musialik J, Urbanska A, Scheibe K, Jaroszewicz J (2021) HCV resistance-associated substitutions following direct-acting antiviral therapy failure - Real-life data from Poland. Infect Genet Evol 93:104949 Park D, Huh HJ, Kim YJ, Son DS, Jeon HJ, Im EH, Kim JW, Lee NY, Kang ES, Kang CI, Chung DR, Ahn JH, Peck KR, Choi SS, Kim YJ, Ki CS, Park WY (2016) Analysis of intrapatient heterogeneity uncovers the microevolution of Middle East respiratory syndrome coronavirus. Cold Spring Harb Mol Case Stud 2:a001214 Parsons LR, Tafuri YR, Shreve JT, Bowen CD, Shipley MM, Enquist LW, Szpara ML (2015) Rapid genome assembly and comparison decode intrastrain variation in human alphaherpesviruses. mBio 6 Peacock TP, Benton DJ, James J, Sadeyen JR, Chang P, Sealy JE, Bryant JE, Martin SR, Shelton H, Barclay WS, Iqbal M (2017) Immune escape variants of H9N2 influenza viruses containing deletions at the hemagglutinin receptor binding site retain fitness in vivo and display enhanced zoonotic characteristics. 
J Virol 91 Perales C, Beach NM, Gallego I, Soria ME, Quer J, Esteban JI, Rice C, Domingo E, Sheldon J (2013) Response of hepatitis C virus to long-term passage in the presence of alpha interferon: multiple mutations and a common phenotype. J Virol 87:7593–7607 Perales C, Gallego I, de Avila AI, Soria ME, Gregori J, Quer J, Domingo E (2019) The increasing impact of lethal mutagenesis of viruses. Future Med Chem 11:1645–1657 Perales C, Iranzo J, Manrubia SC, Domingo E (2012) The impact of quasispecies dynamics on the use of therapeutics. Trends Microbiol 20:595–603 Perales C, Lorenzo-Redondo R, López-Galíndez C, Martínez MA, Domingo E (2010) Mutant spectra in virus behavior. Futur Virol 5:679–698 Perales C, Mateo R, Mateu MG, Domingo E (2007) Insights into RNA virus mutant spectrum and lethal mutagenesis events: replicative interference and complementation by multiple point mutants. J Mol Biol 369:985–1000 Perales C, Ortega-Prieto AM, Beach NM, Sheldon J, Menendez-Arias L, Domingo E (2017) Quasispecies and drug resistance. Handbook of antimicrobial resistance. Springer Science+Business Media New York Peris JB, Davis P, Cuevas JM, Nebot MR, Sanjuan R (2010) Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage f1. Genetics 185:603–609 Pesola JM, Coen DM (2007) In vivo fitness and virulence of a drug-resistant herpes simplex virus 1 mutant. J Gen Virol 88:1410–1414 Pfeiffer JK, Kirkegaard K (2006) Bottleneck-mediated quasispecies restriction during spread of an RNA virus from inoculation site to brain. Proc Natl Acad Sci USA 103:5520–5525 Qi H, Olson CA, Wu NC, Ke R, Loverdo C, Chu V, Truong S, Remenyi R, Chen Z, Du Y, Su SY, Al-Mawsawi LQ, Wu TT, Chen SH, Lin CY, Zhong W, Lloyd-Smith JO, Sun R (2014) A quantitative high-resolution genetic profile rapidly identifies sequence determinants of hepatitis C viral fitness and drug sensitivity. 
PLoS Pathog 10:e1004064 Quadeer AA, Barton JP, Chakraborty AK, McKay MR (2020) Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape. Nat Commun 11:377


Quinones-Mateu ME, Moore-Dudley DM, Jegede O, Weber J, E JA (2008) Viral drug resistance and fitness. Adv Pharmacol 56:257–296 Quiñones-Mateu ME, Arts E (2006) Virus fitness: concept, qunatification, and application to HIV population dynamics. Curr Top Microbiol Immunol 299:83–140 Renner DW, Szpara ML (2018) Impacts of genome-wide analyses on our understanding of human herpesvirus diversity and evolution. J Virol 92 Roossinck MJ, Ali A (2007) Mechanisms of plant virus evolution and identification of genetic bottlenecks: impact on disease management. In: Punja ZK, DeBoer SH, Sanfaçon H (eds) Biotechnology and plant disease management. CABI, Wallingford, pp 109–124 Rueca M, Bartolini B, Gruber CEM, Piralla A, Baldanti F, Giombini E, Messina F, Marchioni L, Ippolito G, Di Caro A, Capobianchi MR (2020) Compartmentalized replication of SARSCov-2 in upper vs. lower respiratory tract assessed by whole genome quasispecies analysis. Microorganisms 8:E1302 Saakian DB, Hu CK (2006) Exact solution of the Eigen model with general fitness functions and degradation rates. Proc Natl Acad Sci USA 103:4935–4939 Saakian DB, Hu CK (2016) Mathematical models of quasi-species theory and exact results for the dynamics. Curr Top Microbiol Immunol 392:121–139 Sanjuan R, Cuevas JM, Moya A, Elena SF (2005) Epistasis and the adaptability of an RNA virus. Genetics 170:1001–1008 Sanjuan R, Moya A, Elena SF (2004) The distribution of fitness effects caused by single-nucleotide substitutions in an RNA virus. Proc Natl Acad Sci USA 101:8396–8401 Sanjuan R, Nebot MR, Chirico N, Mansky LM, Belshaw R (2010) Viral mutation rates. J Virol 84:9733–9748 Sanjuan R, Thoulouze MI (2019) Why viruses sometimes disperse in groups?(dagger). Virus Evol 5:vez014 Sanz-Ramos M, Diaz-San Segundo F, Escarmis C, Domingo E, Sevilla N (2008) Hidden virulence determinants in a viral quasispecies in vivo. 
J Virol 82:10465–10476 Sarrazin C (2016) The importance of resistance to direct antiviral drugs in HCV infection in clinical practice. J Hepatol 64:486–504 Sato M, Maekawa S, Komatsu N, Tatsumi A, Miura M, Muraoka M, Suzuki Y, Amemiya F, Takano S, Fukasawa M, Nakayama Y, Yamaguchi T, Uetake T, Inoue T, Sato T, Sakamoto M, Yamashita A, Moriishi K, Enomoto N (2015) Deep sequencing and phylogenetic analysis of variants resistant to interferon-based protease inhibitor therapy in chronic hepatitis induced by genotype 1b hepatitis C virus. J Virol 89:6105–6116 Scholle F, Girard YA, Zhao Q, Higgs S, Mason PW (2004) trans-Packaged West Nile virus-like particles: infectious properties in vitro and in infected mosquito vectors. J Virol 78:11605–11614 Scholle MD, Liu C, Deval J, Gurard-Levin ZA (2021) Label-free screening of SARS-CoV-2 NSP14 exonuclease activity using SAMDI mass spectrometry. SLAS Discov:24725552211008854 Seifert D, Beerenwinkel N (2016) Estimating fitness of viral quasispecies from next-generation sequencing data. Curr Top Microbiol Immunol 392:181–200 Seifert D, Di Giallonardo F, Metzner KJ, Gunthard HF, Beerenwinkel N (2015) A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory. Genetics 199:191–203 Sender R, Bar-On YM, Gleizer S, Bernshtein B, Flamholz A, Phillips R, Milo R (2021) The total number and mass of SARS-CoV-2 virions. Proc Natl Acad Sci USA 118 Sheldon J, Beach NM, Moreno E, Gallego I, Pineiro D, Martinez-Salas E, Gregori J, Quer J, Esteban JI, Rice CM, Domingo E, Perales C (2014) Increased replicative fitness can lead to decreased drug sensitivity of hepatitis C virus. J Virol 88:12098–12111 Shirogane Y, Watanabe S, Yanagi Y (2012) Cooperation between different RNA virus genomes produces a new phenotype. Nat Commun 3:1235 Shirogane Y, Watanabe S, Yanagi Y (2016) Cooperative interaction within RNA virus mutant spectra. Curr Top Microbiol Immunol 392:219–229


Shirogane Y, Watanabe S, Yanagi Y (2019) Cooperation between different variants: a unique potential for virus evolution. Virus Res 264:68–73 Shivaprasad S, Sarnow P (2021) The tale of two flaviviruses: subversion of host pathways by RNA shapes in dengue and hepatitis C viral RNA genomes. Curr Opin Microbiol 59:79–85 Smith EC, Blanc H, Vignuzzi M, Denison MR (2013) Coronaviruses lacking exoribonuclease activity are susceptible to lethal mutagenesis: evidence for proofreading and potential therapeutics. PLoS Pathog 9:e1003565 Soria ME, Garcia-Crespo C, Martinez-Gonzalez B, Vazquez-Sirvent L, Lobo-Vega R, de Avila AI, Gallego I, Chen Q, Garcia-Cehic D, Llorens-Revull M, Briones C, Gomez J, Ferrer-Orta C, Verdaguer N, Gregori J, Rodriguez-Frias F, Buti M, Esteban JI, Domingo E, Quer J, Perales C (2020) Amino acid substitutions associated with treatment failure for hepatitis C virus infection. J Clin Microbiol 58 Stross C, Shimakami T, Haselow K, Ahmad MQ, Zeuzem S, Lange CM, Welsch C (2016) Natural HCV variants with increased replicative fitness due to NS3 helicase mutations in the C-terminal helix alpha18. Sci Rep 6:19526 Sullivan JC, De Meyer S, Bartels DJ, Dierynck I, Zhang EZ, Spanks J, Tigges AM, Ghys A, Dorrian J, Adda N, Martin EC, Beumont M, Jacobson IM, Sherman KE, Zeuzem S, Picchio G, Kieffer TL (2013) Evolution of treatment-emergent resistant variants in telaprevir phase 3 clinical trials. Clin Infect Dis 57:221–229 Sun F, Wang X, Tan S, Dan Y, Lu Y, Zhang J, Xu J, Tan Z, Xiang X, Zhou Y, He W, Wan X, Zhang W, Chen Y, Tan W, Deng G (2021) SARS-CoV-2 quasispecies provides an advantage mutation pool for the epidemic variants. Microbiol Spectr 9:e0026121 Susser S, Vermehren J, Forestier N, Welker MW, Grigorian N, Fuller C, Perner D, Zeuzem S, Sarrazin C (2011) Analysis of long-term persistence of resistance mutations within the hepatitis C virus NS3 protease after treatment with telaprevir or boceprevir. 
J Clin Virol 52:321–327 Susser S, Welsch C, Wang Y, Zettler M, Domingues FS, Karey U, Hughes E, Ralston R, Tong X, Herrmann E, Zeuzem S, Sarrazin C (2009) Characterization of resistance to the protease inhibitor boceprevir in hepatitis C virus-infected patients. Hepatology 50:1709–1718 Svarovskaia ES, Dvory-Sobol H, Parkin N, Hebner C, Gontcharova V, Martin R, Ouyang W, Han B, Xu S, Ku K, Chiu S, Gane E, Jacobson IM, Nelson DR, Lawitz E, Wyles DL, Bekele N, Brainard D, Symonds WT, McHutchison JG, Miller MD, Mo H (2014) Infrequent development of resistance in genotype 1–6 hepatitis C virus-infected subjects treated with sofosbuvir in phase 2 and 3 clinical trials. Clin Infect Dis 59:1666–1674 Swetina J, Schuster P (1982) Self-replication with errors. A model for polynucleotide replication. Biophys Chem 16:329–345 Tang JW, Cheung JL, Chu IM, Sung JJ, Peiris M, Chan PK (2006) The large 386-nt deletion in SARS-associated coronavirus: evidence for quasispecies? J Infect Dis 194:808–813 Tracy S, Smithee S, Alhazmi A, Chapman N (2015) Coxsackievirus can persist in murine pancreas by deletion of 5, terminal genomic sequences. J Med Virol 87:240–247 Troyer RM, Garver KA, Ranson JC, Wargo AR, Kurath G (2008) In vivo virus growth competition assays demonstrate equal fitness of fish rhabdovirus strains that co-circulate in aquaculture. Virus Res 137:179–188 Uchida Y, Nakamura S, Kouyama JI, Naiki K, Motoya D, Sugawara K, Inao M, Imai Y, Nakayama N, Tomiya T, Hedskog C, Brainard D, Mo H, Mochida S (2018) Significance of NS5B substitutions in genotype 1b hepatitis C virus evaluated by bioinformatics analysis. Sci Rep 8:8818 Van Slyke GA, Ciota AT, Willsey GG, Jaeger J, Shi PY, Kramer LD (2012) Point mutations in the West Nile virus (Flaviviridae; Flavivirus) RNA-dependent RNA polymerase alter viral fitness in a host-dependent manner in vitro and in vivo. Virology 427:18–24 Wagner N, Atsmon-Raz Y, Ashkenasy G (2016) Theoretical models of generalized quasispecies. 
Curr Top Microbiol Immunol 392:141–159 Wargo AR, Kurath G (2011) In vivo fitness associated with high virulence in a vertebrate virus is a complex trait regulated by host entry, replication, and shedding. J Virol 85:3959–3967


Wargo AR, Kurath G (2012) Viral fitness: definitions, measurement, and current insights. Curr Opin Virol 2:538–545 Wong YC, Lau SY, Wang To KK, Mok BWY, Li X, Wang P, Deng S, Woo KF, Du Z, Li C, Zhou J, Chan JFW, Yuen KY, Chen H, Chen Z (2021) Natural transmission of bat-like severe acute respiratory syndrome coronavirus 2 without proline-arginine-arginine-alanine variants in coronavirus disease 2019 patients. Clin Infect Dis 73:e437–e444 Wright S (1931) Evolution in Mendelian populations. Genetics 16:97–159 Wylie CS, Shakhnovich EI (2011) A biophysical protein folding model accounts for most mutational fitness effects in viruses. Proc Natl Acad Sci USA 108:9916–9921 Xu D, Zhang Z, Wang FS (2004) SARS-associated coronavirus quasispecies in individual patients. N Engl J Med 350:1366–1367 Yoshimi S, Imamura M, Murakami E, Hiraga N, Tsuge M, Kawakami Y, Aikata H, Abe H, Hayes CN, Sasaki T, Ochi H, Chayama K (2015) Long term persistence of NS5A inhibitor-resistant hepatitis C virus in patients who failed daclatasvir and asunaprevir therapy. J Med Virol Yuste E, López-Galíndez C, Domingo E (2000) Unusual distribution of mutations associated with serial bottleneck passages of human immunodeficiency virus type 1. J Virol 74:9546–9552 Yuste E, Sánchez-Palomino S, Casado C, Domingo E, López-Galíndez C (1999) Drastic fitness loss in human immunodeficiency virus type 1 upon serial bottleneck events. J Virol 73:2745–2751 Zennou V, Mammano F, Paulous S, Mathez D, Clavel F (1998) Loss of viral fitness associated with multiple Gag and Gag-Pol processing defects in human immunodeficiency virus type 1 variants selected for resistance to protease inhibitors in vivo. J Virol 72:3300–3306 Zhao X, Liu E, Chen FP, Sullender WM (2006) In vitro and in vivo fitness of respiratory syncytial virus monoclonal antibody escape mutants. J Virol 80:11651–11657

Mechanisms and Consequences of Genetic Variation in Hepatitis C Virus (HCV)

Andrea Galli and Jens Bukh

Abstract Chronic infection with hepatitis C virus (HCV) is an important contributor to the global incidence of liver diseases, including liver cirrhosis and hepatocellular carcinoma. Although genetic diversity is common among single-stranded RNA viruses, HCV displays a remarkably high level of it, produced primarily by the error-prone viral polymerase and host immune pressure. The high genetic heterogeneity of HCV has led to the evolution of several distinct genotypes and subtypes, with important consequences for pathogenesis and clinical outcomes. Genetic variability constitutes an evasion mechanism against immune control, allowing the virus to evolve epitope escape mutants that avoid immune recognition. Thus, heterogeneity and variability of the HCV genome represent a great hindrance for the development of vaccines against HCV. In addition, the high genetic plasticity of HCV allows the virus to rapidly develop antiviral resistance mutations, leading to treatment failure and potentially representing a major hindrance for the cure of chronic HCV patients. In this chapter, we present the central role that genetic diversity has in the viral life cycle and epidemiology of HCV. Incorporation errors and recombination, both the result of HCV polymerase activity, represent the main mechanisms of HCV evolution. The molecular details of both mechanisms have been only partially clarified and are presented in the following sections. Finally, we discuss the major consequences of HCV genetic diversity, namely its capacity to rapidly evolve antiviral and immunological escape variants that represent an important limitation for clearance of acute HCV, for treatment of chronic hepatitis C, and for broadly protective vaccines.

A. Galli · J. Bukh (✉)
Copenhagen Hepatitis C Program (CO-HEP), Department of Infectious Diseases, Copenhagen University Hospital - Amager and Hvidovre, Copenhagen, Denmark
e-mail: [email protected]
Copenhagen Hepatitis C Program (CO-HEP), Department of Immunology and Microbiology, University of Copenhagen, Copenhagen, Denmark

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023
E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_7



1 HCV Introduction 1.1 HCV Impact on Human Health Hepatitis C virus (HCV) is an important human pathogen being one of the primary causes of liver cirrhosis and hepatocellular carcinoma worldwide (Mortality and Causes of Death 2016; WHO 2017). Current data estimate that around 70 million people has developed chronic hepatitis C viral infection, leading to an estimated 400,000 HCV related deaths per year, globally (Mortality and Causes of Death 2016; Polaris Observatory 2017). The prognosis of HCV chronic infection has been traditionally very poor, due to the initial lack of proper therapeutic options (Pawlotsky et al. 2015). The first significant improvements in the care of HCV patients came in the 1990s, when combined therapy with ribavirin and pegylated interferon α was introduced (Feld et al. 2017; Pawlotsky et al. 2015). This approach however led to viral clearance in at best 50% of treated patients and, in addition, had serious side effects and was often incompatible with co-morbidities (Cuypers et al. 2016). In particular, the genotype of the infecting HCV strain had a major effect on treatment outcome, with some genotypes resulting in higher clearance rates than others (Fried and Hadziyannis 2004). Nonetheless, combination treatment with pegylated interferon α and ribavirin has been a stalwart of HCV treatment for decades, being often the only option available to patients with advanced liver disease (Oancea et al. 2020). Interestingly, some data indicated that two short regions within the NS5A protein might influence the efficacy of interferon-ribavirin treatment. Variability within the first region, termed interferon sensitivity-determining region (ISDR), has been suggested to affect the response to interferon (Enomoto et al. 1995; Witherell and Beineke 2001). Similarly, mutations within the variable region 3 (V3) of NS5A have been reported to modulate response to antiviral therapy in few studies (Duverlie et al. 1998; Inchauspe et al. 1991; Nousbaum et al. 
2000). These results provided early indications of how HCV genetic variability could influence treatment outcome and even result in failure to cure the infection. During the 2010s, highly efficient direct acting antiviral (DAAs) were developed and approved for treatment for the first time (D’Ambrosio et al. 2017; Egerman 2019; Oancea et al. 2020; Pawlotsky 2014, 2020). These molecules specifically inhibit HCV viral enzymes or proteins with high affinity, thus allowing better viral suppression and reduced side effects. The three classes of DAAs currently available target the viral polymerase (NS5B), protease (NS3), and NS5A, and are often used in combination to reduce viral escape and improve treatment outcome (D’Ambrosio et al. 2017; Thiagarajan and Ryder 2015). Current DAA drugs allow the cure of most HCV chronic infections in a single treatment round of 8–12 weeks, with failing patients often being cured by subsequent or extended treatment regimens (Ahmed et al. 2021; Feld et al. 2015; Hezode 2017; Jackson and Everson 2017; Liu and Hu 2021; Parlati et al. 2021; Pawlotsky 2019; Sulejmani and Jafri 2018). The preexistence or development of resistance mutations during HCV treatment is well described and can lead to failure of specific antiviral combinations (Bagaglio et al. 2017; Ceccherini-Silberstein et al.

Mechanisms and Consequences of Genetic Variation …

239

2018; Hayes et al. 2021; Oancea et al. 2020; Sarrazin 2021; Vermehren and Sarrazin 2012). The availability of multiple regimens currently allows the treatment and cure of patients using different treatment combinations, however the long-term impact of circulating resistance mutations remains to be determined, especially in low-resource settings (Hezode 2017; Oancea et al. 2020). Humoral and cellular immune responses play an important role during early infection with HCV. Early formation of neutralizing antibodies (nAbs) has been associated with spontaneous clearance of infection, whereas a delayed humoral immune response is correlated to chronicity (Osburn et al. 2014; Pestka et al. 2007). Additionally, in resolved infections the T-cell cellular immune responses contribute to clearance and confers protective immunity against subsequent infections (Shoukry et al. 2003). Immune responses rely on the recognition of viral proteins determinants termed epitopes, which are often located in regions with high variability within viral genomes. Epitope variability is an effective mechanism of immune evasion for viral infections, and HCV is no exception (Callendret et al. 2011; von Hahn et al. 2007). The major immune epitopes recognized by neutralizing antibodies during humoral immune response are located within the HCV envelope proteins E1 and E2, whose coding regions have the highest variability within the HCV genome (Argentini et al. 2009). The E2 glycoprotein, in particular, contains two regions with high genetic variability termed hypervariable region 1 and 2 (HVR1 and HVR2), which display very low homology among HCV isolates and overlap with several immune epitopes (Le Guillou-Guillemette et al. 2007). Variability in or close to epitopes recognized by T-cells involved in the cellular immune response can also hinder immune recognition of HCV, in particular by CD8+ T-cells (Salimi Alizei et al. 2021). 
Similarly, changes occurring within genomic sequences involved in antigen presentation to T-cells can interfere with T-cell activation and suppress cellular immune responses to HCV (Dustin and Rice 2007; Salimi Alizei et al. 2021). Genetic variability is thus the main driver of immune escape and a major hindrance to immune control of HCV infection, thereby favoring chronicity. The presence of variable regions containing immune epitopes across the HCV polyprotein and the wide genetic variability observed across genotypes are also major obstacles to the development of an effective protective vaccine against HCV infection.

1.2 Molecular Biology of HCV

HCV is an enveloped positive-sense, single-stranded RNA virus carrying a genome of about 9600 nucleotides. The genome encodes 3 structural (Core, and envelope E1 and E2) and 7 nonstructural (p7, NS2, NS3, NS4A and B, NS5A and B) proteins, which are translated as a single polyprotein (Webster et al. 2015). Viral polyproteins are co-translationally processed by viral and cellular proteases into individual proteins, which are located on the surface of the endoplasmic reticulum (ER) after synthesis and proteolytic cleavage (Hellen and Pestova 1999; Simmonds 1996). HCV infects
primarily hepatocytes, through interaction with specific cellular receptors such as CD81, scavenger receptor class B, type I (SR-BI), occludin, and claudin-1 (CLDN1) (Bartosch et al. 2003; Colpitts et al. 2020; Evans et al. 2007; Pileri et al. 1998; Ploss et al. 2009). During viral entry, the envelope-embedded viral glycoproteins E1 and E2 sequentially interface with the different cellular receptors, resulting in endocytosis of viral particles and release of the genomic RNA into the cell cytoplasm. The positive-sense RNA genome functions as messenger RNA, allowing initiation of translation of the viral polyprotein as soon as it is released into the cytoplasm (Hellen and Pestova 1999). Subsequently, the cleaved viral proteins organize into multiprotein complexes supporting viral replication, assembly, and release (Dubuisson 2007; Lohmann 2013). Genome replication is carried out in replication complexes (RCs), semi-spherical extrusions of the host cell ER, which are organized and maintained by several HCV proteins, including NS5A and B, NS4B, and NS3-4A (Lohmann 2013). Within the RCs, the positive-sense viral genome is first used as template for the synthesis of a negative-strand RNA by the viral polymerase, or NS5B (Li et al. 2021; Wang and Tai 2016). The same enzyme can then synthesize positive-sense RNA genomes by using the negative strand as template. The positive-sense RNA molecules are furthermore used for production of viral proteins and packaging of new virions. The viral NS5B polymerase plays a central role in viral replication, and at this stage it has the ability to introduce a number of genetic alterations into the HCV genome (Sanjuan and Domingo-Calap 2016). Importantly, all genetic alterations will persist within the viral population of an infected cell and will only undergo selection once they are packaged and infect a naïve host cell.
This allows the accumulation of multiple mutations within individual genomes and the reassortment of mutations between different genomes, resulting in the high genetic variability of HCV displayed both in individual patients and at the population level (Argentini et al. 2009). Once synthesized, HCV genomes destined for packaging into novel virions are carried to the surface of cellular lipid droplets, through interaction with core and NS5A, where the assembly of novel virions initiates (Colpitts et al. 2020; McLauchlan 2009). The nascent virion is then transported to the surface of the ER, where assembly proceeds further through interaction with several HCV proteins including NS2, NS3-4A, p7, and E1/E2. During this phase the virion translocates to the inner side of the ER, thus acquiring its lipid envelope with embedded E1/E2 complexes (Bley et al. 2020; Cosset et al. 2020; McLauchlan 2009). Novel virions proceed through the Golgi network, allowing maturation of the viral envelope glycoproteins, and are finally released from infected cells through the secretory pathway (Bley et al. 2020). The extensive network of viral protein–protein and protein–RNA interactions observed throughout the HCV replication cycle, and the multiple roles played by individual proteins, impose constraints on viral evolution. Several studies have identified viruses from clinical isolates carrying amino acid changes that require compensatory mutations at one or more positions in other regions of the genome or result in high fitness costs for the viral population (Barnard et al. 2016; Dultz et al. 2021; Neumann-Haefelin et al. 2011; Ruhl et al. 2012). Similarly, in vitro experiments with
engineered HCV mutants often demonstrated the need for compensatory mutations before becoming fully viable, underscoring the relevance of inter-molecular interactions in the HCV life cycle (Dazert et al. 2009; Gottwein et al. 2007, 2009; Han et al. 2009; Jensen et al. 2008; Scheel et al. 2011).

1.3 HCV Phylogeny

HCV is classified as a member of the Flaviviridae family, genus Hepacivirus, together with several mammalian viruses, such as the rodent hepacivirus (RHV), bat hepacivirus (BHV), and equine hepacivirus (NPEV) (Bukh 2016; Hartlage et al. 2016). Outside hepaciviruses, viruses belonging to the genus Pestivirus are considered among the most closely related to HCV, and research on pestiviruses has often served as the foundation for HCV studies (Smith et al. 2016). The high genetic heterogeneity of HCV has led to the evolution of different HCV genetic lineages, classified as genotypes and subtypes (Bukh et al. 1993; Bukh 2016; Simmonds et al. 2017). These are successful variants that carry specific sets of mutations that spread efficiently to a population subset, displaying a defined level of genetic diversity from other known genotypes or subtypes (Argentini et al. 2009). Although defined phylogenetically, genotypes differ from one another by about 30% at the nucleotide level, whereas subtypes are defined within genotypes and differ by at least 15% at the nucleotide level (Simmonds et al. 2017). HCV is currently classified into 8 genotypes, 6 of which are of epidemiological and clinical relevance (Borgia et al. 2018; Bukh et al. 1993; Smith et al. 2014). Each major genotype is further classified into several subtypes (Bukh 2016; Smith et al. 2016; Webster et al. 2015). Phylogenetic classification of HCV into genotypes and subtypes is relevant for molecular epidemiology and for understanding and tracking the global spread of the virus (Bukh et al. 1995; Messina et al. 2015). Different geographical regions often have specific circulating genotypes and subtypes, and defined risk groups or limited epidemics can be identified by their association with a specific strain (Guntipalli et al. 2021; Webster et al. 2000).
Pathogenesis and natural history of HCV infection can vary slightly across genotypes, but differ markedly only in genotype 3 infections (Goossens and Negro 2014; Jackowiak et al. 2014). This is the only genotype that has been associated with increased levels of liver steatosis and worse overall prognosis (Goossens and Negro 2014; Rubbia-Brandt et al. 2000). The HCV genotype had important consequences for optimizing diagnostic tests and for treatment outcome, especially in the pre-DAA era, with certain genotypes responding better to IFN-RBV therapy than others (Cuypers et al. 2016; Smith et al. 2014). The current DAA-based treatments are considered pan-genotypic and offer comparable cure rates across all major HCV genotypes (Hezode 2017; Thiagarajan and Ryder 2015). However, different genotypes can have different propensities to develop antiviral resistance, due to the presence of pre-existing polymorphisms in their genetic sequence (Bagaglio et al. 2017; Barnard et al. 2016; Ceccherini-Silberstein et al. 2018; Pham et al. 2018, 2022;
Ramirez et al. 2016; Vermehren and Sarrazin 2012). Such pre-existing mutations can either accelerate the development of additional resistance polymorphisms or themselves confer some level of antiviral resistance.
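
The divergence figures quoted in this section (about 30% between genotypes and at least 15% between subtypes, at the nucleotide level) can be illustrated with a small calculation. The Python sketch below computes the uncorrected pairwise distance between two aligned sequences and labels the pair using those approximate figures. The function names, the handling of alignment gaps, and the use of fixed cutoffs are assumptions made here purely for illustration; actual HCV classification is phylogenetic and is not reduced to a single pairwise threshold.

```python
def p_distance(seq_a, seq_b):
    """Fraction of differing sites between two aligned sequences (gap columns skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: alignment gaps are ignored
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def relatedness(seq_a, seq_b, genotype_cutoff=0.30, subtype_cutoff=0.15):
    """Label a pair of sequences using illustrative cutoffs taken from the text."""
    distance = p_distance(seq_a, seq_b)
    if distance >= genotype_cutoff:
        return "different genotypes"
    if distance >= subtype_cutoff:
        return "same genotype, different subtypes"
    return "same subtype"
```

In practice such a comparison would be run on long aligned coding regions rather than on short fragments, and borderline values would need a phylogenetic analysis to resolve.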

2 Mechanisms of Genetic Diversity

2.1 Mutation

Polymerases are one of the workhorses of evolution, since their capability to introduce random errors while synthesizing new genetic molecules is an essential feature to guarantee genetic flexibility to life forms (Sanjuan and Domingo-Calap 2016). However, the error rate must be controlled: too low and it would impair adaptation, too high and it would lead to lethal aberrations (Bordería et al. 2016; Peck and Lauring 2018). Complex organisms have thus evolved polymerases with higher fidelity and proof-reading mechanisms, to minimize deleterious effects of random mutations (Bebenek and Ziuzia-Graczyk 2018). Viruses, on the other hand, rely on high mutation rates for their adaptation and survival under selective pressure (Peck and Lauring 2018). The deleterious effects of high mutation rates are compensated by the extreme viral population sizes, high turnover, and short generation times. RNA viruses, in particular, have some of the highest polymerase error rates observed (Sanjuan and Domingo-Calap 2016). HCV is no exception, and with a viral turnover of around 10¹⁰–10¹² virions per day and a viral half-life of only around 3 h it has the capacity to produce a large array of genetic variants (Neumann et al. 1998). Thus, HCV displays a remarkably high level of genetic diversity, and novel mutants or variants are constantly being generated and selected during infection and transmission.

2.1.1 Generation of Single Nucleotide Polymorphisms (SNPs)

The virally encoded RNA-dependent RNA polymerase (RdRp), or NS5B, is the key enzyme implicated in HCV genome replication (Bartenschlager and Lohmann 2000). NS5B synthesizes both the negative RNA strand from the positive-stranded genomic template and novel HCV genomes templated on the negative RNA strand intermediate (Lohmann 2013). It shares the overall topology of other viral polymerases, with thumb, palm, and fingers domains recognizable in its three-dimensional structure (Shu and Gong 2016). These three domains are arranged around a central cavity containing the active site, to which the RNA template and incoming nucleotides have access (Biswal et al. 2005; Lesburg et al. 2000). Once an incoming nucleotide has been recognized by the enzyme and paired with its cognate on the template, the polymerase proceeds to extend the nascent RNA molecule with the following nucleotide. Incorporation errors occur at the recognition step, when the polymerase needs to determine whether a nucleotide is correctly paired with its template counterpart or not.

The polymerase's ability to correctly incorporate the incoming nucleotide is termed fidelity, and is key to the polymerase's capacity to produce genetic diversity (Sanjuan and Domingo-Calap 2016). In addition, as is common for RNA virus polymerases, HCV NS5B lacks proofreading capability in the form of a 3′-5′ exonuclease activity domain and is thus unable to remove mis-incorporated nucleotides once it has moved to the next position. When combined, the high turnover of HCV genomes, the lack of proofreading capability, and the low fidelity of the HCV RdRp result in an overall high incorporation error rate of NS5B, estimated at around 10⁻⁴ substitutions per nucleotide per generation (Echeverria et al. 2015). As a result, the activity of the viral RNA-dependent RNA polymerase leads to the production of genetically diverse viral quasispecies (Fig. 1), that is, a population of distinct but closely related viral variants within the infected host (Argentini et al. 2009; Domingo and Perales 2018). It has been estimated that HCV quasispecies carry every possible combination of single and double mutations at any given moment, in the absence of specific selective pressure (Rong et al. 2010). In addition, the HCV polymerase is considered capable of genomic strand-switching (Galli et al. 2022; Reiter et al. 2011; Scheel et al. 2013), that is, the ability to detach from the RNA template during synthesis of a nascent RNA molecule and attach to a distinct template before resuming synthesis. Strand-switching is a key requisite for replicative recombination (Perez-Losada et al. 2015). The HCV polymerase thus has a central role in the generation of viral diversity, being able both to introduce random polymorphisms in the viral genome by nucleotide incorporation errors and to reassort mutations into novel combinations by genetic recombination.
However, the distribution of mutations along the HCV genome is not uniform: some regions are more prone to mutate than others, due to the different selective pressures

Fig. 1 Generation of viral quasispecies. During HCV genome replication, the viral polymerase introduces random mutations across the viral genome. The figure depicts a simplified representation of HCV quasispecies, with viral genomes represented by black lines and mutations indicated by colored boxes. Each color represents a different mutation, whereas the box position on the line approximates the mutation location along the genome. At each successive generation additional mutations can emerge on each genome, expanding the number of mutational combinations in the viral population

acting on them (Argentini et al. 2009). Functionally important RNA regions and coded proteins have lower tolerance for mutations, as these can disrupt the function of key domains and motifs. The 5′- and 3′-UTRs contain RNA structures and sequences that are essential for viral replication, and are among the most conserved portions of the HCV genome across genotypes (Sagan et al. 2015). Similarly, the 3′-end portion of NS5B contains key structural elements involved in genome replication and also has relatively low variability (Shi and Lai 2006). The core region is highly conserved across genotypes, probably due to structural constraints related to the capsid assembly and the multiple interactions that this protein has with several other viral factors during virion assembly (Echeverria et al. 2015). The NS3 protease region contains several highly conserved domains, especially surrounding the catalytic site (Lodrini et al. 2003; Raney et al. 2010). Due to its high conservation, NS3 is often used to examine evolutionary relationships among flaviviruses, since the viral protease is essential for the life cycle of all members of the Flaviviridae family (Ryan et al. 1998). On the other hand, the E1 and E2 regions display the highest variability among genotypes and within quasispecies (Le Guillou-Guillemette et al. 2007). In particular, the E1 region has been shown in some cases to possess over 50% variability among isolates (Bukh et al. 1993). E2 contains the HVR1 and HVR2 domains, which show sequence homology as low as 50% between different isolates. These regions include important neutralizing epitopes but few functional constraints, so that the variability of HVRs contributes to HCV immune escape with limited fitness cost (Prentoe and Bukh 2018).
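
The numbers quoted in this chapter (a genome of about 9600 nucleotides, an error rate of around 10⁻⁴ substitutions per nucleotide per generation, and a turnover of 10¹⁰–10¹² virions per day) allow a rough plausibility check of the statement that quasispecies contain all single and double mutants. The Python sketch below is a back-of-the-envelope calculation, not a model taken from the cited studies. It assumes that errors are independent and uniform along the genome, that each of the three alternative bases is equally likely, that every virion is copied once from a single reference genome, and that no selection acts; all function names are hypothetical.

```python
from math import comb

GENOME_LENGTH = 9600   # nucleotides, approximate genome size given in the text
ERROR_RATE = 1e-4      # substitutions per nucleotide per generation (text)


def expected_mutations_per_genome(length=GENOME_LENGTH, mu=ERROR_RATE):
    """Mean number of substitutions introduced per genome copy."""
    return length * mu


def count_mutants(k, length=GENOME_LENGTH):
    """Number of distinct genomes differing from a reference at exactly k sites."""
    return comb(length, k) * 3 ** k


def expected_copies_of_specific_mutant(n_genomes, k,
                                       length=GENOME_LENGTH, mu=ERROR_RATE):
    """Expected number of new genomes carrying one particular k-site mutant.

    Assumes uniform, independent errors and no selection (see lead-in).
    """
    p = (mu / 3) ** k * (1 - mu) ** (length - k)
    return n_genomes * p


if __name__ == "__main__":
    print("mutations per genome copy:", expected_mutations_per_genome())
    print("distinct single mutants:", count_mutants(1))
    print("distinct double mutants:", count_mutants(2))
    for turnover in (1e10, 1e12):  # daily virion production range from the text
        copies = expected_copies_of_specific_mutant(turnover, 2)
        print(f"turnover {turnover:.0e}: ~{copies:.1f} copies of each double mutant per day")
```

Under these simplifying assumptions each genome copy carries roughly one new substitution, and even at the lower turnover estimate every specific double mutant would be expected to arise a few times per day. Selection against deleterious changes, which the sketch ignores, would lower these numbers for many variants.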

2.1.2 Polymerase Mutants

Of particular interest are mutations occurring within the NS5B region, as they can directly affect the polymerase functionality during HCV replication. Such polymorphisms can affect, for example, the polymerase processivity, fidelity, or accessibility, thus altering the enzyme’s own mutation rate or recombination capacity (Sanjuan and Domingo-Calap 2016). During the normal viral life cycle, the emergence of such mutations will be balanced against their impact on viral replication, reaching an equilibrium between genetic variability and viral fitness. Under selective pressure, however, mutations affecting the functionality of the polymerase itself might confer competitive advantages to particular viral variants and become fixed in the viral population. Polymerase mutants with altered fidelity, emerging primarily upon antiviral treatment with nucleotide analogs, have been described for several RNA viruses (Bordería et al. 2016). These mutants have an increased capacity to reject incorrectly paired nucleotides during RNA synthesis, thus reducing the overall error rate of the enzyme. In the context of treatment, mutations that increase polymerase fidelity thus represent one kind of resistance substitution, allowing the virus to dampen the effect of antiviral drugs. Analogous mutations have also been identified in HIV-1, in which they confer resistance by increasing the accuracy of the virally encoded reverse transcriptase (Menendez-Arias 2009). In HCV, results obtained from in vitro
treatment with ribavirin indicated the emergence of specific mutations conferring increased fidelity to the viral polymerase (Mejer et al. 2020b). Mutations within the polymerase region can affect other enzyme functions as well. Mutants affecting polymerase processivity and RNase H activity have been described in HIV-1. In this virus, polymerase processivity and RNase H activity are both related to the rate of strand-switching, thus affecting viral recombination frequency (Hwang et al. 2001). Although similar mutations affecting NS5B processivity have not yet been identified in HCV, it is arguable that mutants affecting recombination rates could arise under specific conditions. Interestingly, mutations affecting HCV polymerase processivity have been identified in NS5A, underlining once more the relevance of inter-protein interactions in HCV (Mani et al. 2015). The effect of such mutations on the rate of viral recombination has not been investigated. Polymerase variants with altered fidelity or processivity thus highlight the capacity of RNA viruses to adapt to exogenous selection pressures by altering their intrinsic mutation rates.

2.2 Recombination

Recombination is an evolutionary mechanism shared across all organisms, allowing the generation of novel genetic combinations from different parental genomes. In the context of viruses, recombination commonly occurs between different genomes of the same species but can more rarely occur between viral genomes and host cell genetic material, or between genomes originating from different viruses (Becher and Tautz 2011; Lai 1992; Li et al. 2011). Recombination thus allows the rapid acquisition of genetic features in a single round of replication, and, coupled with the introduction of single-point mutations by polymerase errors, can function as a “shortcut” in viral evolution (Worobey and Holmes 1999). Recombination can serve as a rescue mechanism for virus viability (Gallei et al. 2004; Li et al. 2011; Scheel et al. 2013), increase viral pathogenicity (Becher et al. 2001; Rousset et al. 2003), increase fitness (Gottwein et al. 2011b; Scheel et al. 2013), and accelerate the emergence of drug resistance (Rambaut et al. 2004). Both RNA and DNA viruses have been shown to recombine, although their estimated recombination rates can vary widely (Perez-Losada et al. 2015). Flaviviridae are considered a low-recombination viral group, compared to viruses such as Herpesvirus or HIV (Ward et al. 2013). The recombination rate of HCV has indeed been considered low for a long time, mostly based on the limited number of observed recombinant forms in infected individuals and early experiments in chimpanzees (Galli and Bukh 2014; Gao et al. 2007). Recent data, however, suggest that HCV can recombine efficiently in vitro, showing recombination rates much higher than previously estimated (Galli et al. 2022; Scheel et al. 2013). This apparent discrepancy could be due to the strong selective pressure acting against most recombinant forms, resulting in an apparently lower recombination frequency
especially observed in vivo. Additionally, challenges in the identification of recombination between related strains and limitations in the genotyping strategies used in HCV molecular epidemiology monitoring might have resulted in underestimation of the prevalence of HCV recombinant forms (Avo et al. 2013; Bukh et al. 1995). The extensive network of interactions between HCV viral proteins during replication results in an overall lower genetic plasticity for this virus, so that single point mutations in one genomic region often require multiple compensatory mutations in other regions to maintain virus viability or fitness (Barnard et al. 2016; Dazert et al. 2009; Gonzalez-Candelas et al. 2011; Gottwein et al. 2007; Neumann-Haefelin et al. 2011; Scheel et al. 2011). In this context, it is noteworthy that most intergenotypic recombinant forms identified in patients share a peculiar genetic structure with crossover breakpoints located within the NS2-NS3 genomic region (Galli and Bukh 2014; Gonzalez-Candelas et al. 2011). These observations suggest that recombination events occurring within this genomic region can more easily result in viable recombinants compared to crossovers located in other areas of the genome. This is also mirrored by in vitro generated recombinants, in which recombination breakpoints located within the NS2-NS3 area have a higher chance of producing viable viruses, compared to other recombination points (Gottwein and Bukh 2008; Scheel et al. 2013). Since most proteins in the NS3-NS5B genomic region interact heavily during HCV replication, whereas the E1-NS2 block is heavily involved in virion assembly and release, it could be argued that the exchange of these two functional blocks gives higher chances of producing viable recombinants than swapping of other regions and random sites along the genome. Recombination in HCV can occur through two main mechanisms, replicative dynamic copy–choice and non-replicative breakage–rejoining (Fig. 2) (Belling 1933; Janssens et al. 2012; Lai 1992). Both mechanisms can lead to the generation of either homologous or non-homologous recombinants, depending on the structure of the crossover site in the newly generated genome. Homologous recombinants share the same genetic structure as their parental genomes, whereas non-homologous recombinants have an aberrant genetic structure including insertions, deletions, or duplications. Non-homologous recombinants have a higher chance of being nonviable and are thus rarely observed both in vitro and in vivo.

2.2.1 Non-replicative Recombination

The non-replicative recombination model posits that genetic fragments originating by different means can be joined together by cellular machinery, thus generating genetic chimeras (Gmyl et al. 1999). Fragments of viral RNA genomes could be produced by mechanical stress, endoribonuclease activity, or cryptic ribozymes, resulting in molecular termini compatible with either self-ligation or ligase-mediated re-joining. Due to the inherent non-specificity of these mechanisms, molecular repair can occur between fragments of HCV genomes, but could also involve genetic fragments from other viruses or cellular RNA molecules, thus broadening the range of possible chimeric forms. Non-replicative recombination can in principle produce both homologous and

Fig. 2 Recombination mechanisms. Recombination of HCV genomes can occur by two different mechanisms, replicative and non-replicative. This figure exemplifies HCV recombination in the presence of two hypothetical genomes, represented by the black and red lines, carrying different mutations, illustrated by the colored boxes. In the case of replicative recombination, the polymerase initiates replication using one genome as template, then detaches from the template while remaining associated with the nascent RNA molecule (indicated by the blue line). The polymerase can then anneal with a different genome to resume synthesis, thus producing a chimeric genome containing parts of each parental template. In this case the generated recombinant molecule is overwhelmingly homologous, without insertions or deletions. Alternatively, genomes can become fragmented due to chemical or mechanical factors, a pre-requisite for non-replicative recombination. The genomic fragments originating from different genomes are cross compatible and can be re-ligated by cellular ligases resulting in chimeric genomes. In this case, the structure of the recombinant molecules is commonly non-homologous, due to the random nature of the occurring breakages, resulting in genomes with insertions or deletions. Subsequent recombination cycles can “optimize” such genomes by removing the insertions

non-homologous recombinants, although the frequency of non-homologous structures will be much higher due to the random nature of the ligation mechanism (Gallei et al. 2004; Scheel et al. 2013). Non-replicative recombination has been demonstrated for HCV, as shown by the generation of viable recombinant HCV genomes originating from non-replicating parental viruses (Li et al. 2011; Scheel et al. 2013). In addition, engineered HCV mutants have been shown to acquire structured RNA sequences from cellular RNAs to restore virus viability (Li et al. 2011). Although the appearance of viable recombinants was rare, breakpoints could be identified at different locations within the HCV genomes. These results suggest that a random breakage and joining mechanism is involved in non-replicative recombination of HCV. The number of observed recombinants was low, again suggesting that most recombination events resulted in non-viable forms that were eliminated from the viral population by natural selection. Indeed, most identified viable recombinants were non-homologous, supporting a random breakage-rejoining model. However, non-homologous recombinants are seldom observed in vivo. Data from HCV in vitro studies showed that non-homologous recombinants evolve rapidly into homologous recombinants, by eliminating genomic duplications through replicative recombination. These data incidentally suggest that non-homologous recombination in HCV might be more frequent than previously assumed, as non-homologous recombinants appear to be very difficult to detect due to both counter-selection and rapid evolution.

Studies in other RNA viruses had previously suggested the existence of cryptic ribozymes within viral genomes, leading to the generation of genetic fragments of specific length and structure (Gmyl et al. 1999). The variability in the observed crossover sites in the viable HCV recombinants argues against the presence of such cryptic sites in the HCV genome. On the other hand, poliovirus and the HCV-related pestiviruses have been shown to recombine by non-homologous recombination, with a mechanism involving re-ligation of genetic molecules with 3′-phosphate and 5′-hydroxyl termini, compatible with either endoribonucleolytic cleavage or mechanical breakage (Austermann-Busch and Becher 2012; Gallei et al. 2004). These data suggest that non-replicative recombination in HCV occurs through a breakage-rejoining mechanism, where genetic fragments produced by endoribonucleases or physical damage are randomly joined by cellular enzymes or through self-ligation. Supporting this hypothesis, a linear correlation between RNA concentration and frequency of recombination was observed in pestiviruses (Austermann-Busch and Becher 2012) and HCV (Scheel et al. 2013). The population of randomly generated non-homologous recombinants is heavily selected and only very few viable genomes can replicate and survive. Further evolution of the viable genomes results in the production of one or a few fit viral genomes with conventional genetic structure that eventually become dominant.

2.2.2 Replicative Recombination

Replicative recombination relies on the ability of most viral polymerases to change template during synthesis of a new genomic molecule, thus producing a chimeric genome that contains parts of different parental templates (Lai 1992). The underlying mechanism is the copy-choice model, according to which a nascent RNA molecule can dissociate from its template and bind to either a different one or a different region of the same template. After transfer, the replication complex resumes synthesis of the RNA genome, resulting in a chimera. As for non-replicative recombination, the replicative model can lead to either homologous or non-homologous recombinants; however, in the case of replicative recombination homologous crossovers are more common than non-homologous ones (Baroth et al. 2000; Bowman et al. 1998; Delviks et al. 1997). The level of sequence homology between templates will affect the ratio between homologous and non-homologous recombination: highly similar templates will produce primarily the first type, whereas more divergent parental genomes can result in non-homologous recombinants. HCV can produce homologous recombinants both directly through homologous recombination between viral genomes and indirectly through non-homologous recombination between non-homologous recombinant genomes (Galli et al. 2022; Scheel et al. 2013). The first scenario has been demonstrated in cell culture systems by detecting recombinant genomes carrying selected markers derived from different parental genomes (Galli et al. 2022; Reiter et al. 2011). Short generation times and single-cycle infections strongly supported the hypothesis that
the detected recombinants were generated directly through homologous replicative recombination. In the latter case, first-generation non-homologous recombinants, generated by either replicative or non-replicative recombination, produce homologous recombinant genomes over one or more subsequent recombination events (Scheel et al. 2013). This mechanism has been demonstrated in cell-culture systems, in which duplicated genomic regions generated by non-homologous non-replicative recombination were eliminated by replicative recombination in subsequent replication cycles (Scheel et al. 2013). Replicative recombination is possible due to the ability of viral polymerases to change template during replication, a phenomenon termed “strand switching”. Several viral polymerases have been demonstrated capable of strand switching in vitro (Kim and Kao 2001), although direct evidence of strand switching by the HCV polymerase is still lacking. Several studies have demonstrated the capacity of HCV to produce homologous recombinants during replication, strongly supporting the capacity of the HCV polymerase for strand switching (Galli et al. 2022; Reiter et al. 2011; Scheel et al. 2013). The identification of homologous recombinants produced by excision of duplicated regions is also a strong indication of polymerase strand switching. Additionally, a recent study demonstrated that homologous recombination frequencies in HCV increase with the distance over which recombination is observed, correlating with the increased chances of strand-switching over longer polymerization stretches (Galli et al. 2022). A similar relationship has been demonstrated for HIV, which encodes a polymerase capable of switching templates at high frequency (Rhodes et al. 2003).
The molecular mechanisms of HCV copy-choice recombination remain unclear; however, replicative recombination has been extensively studied in several RNA viruses (such as pestiviruses) and in retroviruses (such as HIV) (Becher and Tautz 2011; Delviks-Frankenberry et al. 2011; Fricke et al. 2001; Lund et al. 1999; Meyers et al. 1992; Thomson and Najera 2005). Notwithstanding important differences in their viral life cycles, several similarities in the molecular aspects of recombination can be seen in retroviruses and other RNA viruses. These principles can reasonably be considered valid for HCV, especially given its genetic closeness to pestiviruses. HCV recombinants generated by replicative recombination need to be viable to overcome natural selection. Although homologous recombinants have a higher chance of survival, given their correct genetic structure and lack of major genetic aberrations, their viability is not assured. The wide array of intermolecular interactions among HCV proteins can result in low fitness or lack of viability after recombination, as confirmed by the difficulty of generating viable engineered recombinants in vitro.
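
The reported increase of recombination frequency with the distance between observed sites is what a simple copy-choice model would predict. The Python sketch below is an illustration under stated assumptions and is not the analysis used in the cited studies: it assumes two parental templates, a constant template-switch rate per nucleotide, switches that occur independently (a Poisson process), and that every switch lands on the other parental genome. Two markers then appear recombinant whenever an odd number of switches falls between them. The function name and the example switch rate are hypothetical.

```python
import math


def recombinant_fraction(distance_nt, switch_rate_per_nt):
    """Probability that two markers end up derived from different templates.

    With Poisson-distributed switches of mean m = distance * rate, the
    probability of an odd number of switches is (1 - exp(-2m)) / 2.
    """
    if distance_nt < 0 or switch_rate_per_nt < 0:
        raise ValueError("distance and rate must be non-negative")
    expected_switches = distance_nt * switch_rate_per_nt
    return 0.5 * (1.0 - math.exp(-2.0 * expected_switches))


if __name__ == "__main__":
    rate = 1e-5  # hypothetical switches per nucleotide, chosen only for illustration
    for distance in (100, 1000, 5000, 9000):
        print(distance, round(recombinant_fraction(distance, rate), 4))
```

For short distances the fraction grows almost linearly with distance, and it levels off toward 0.5 once multiple switches between the two markers become likely, since repeated switches can restore the parental marker combination.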

2.3 Impact of Genetic Variability

The combination of mutations and recombination events introduced by the HCV polymerase during genomic replication potentially allows the virus to efficiently


A. Galli and J. Bukh

Fig. 3 Selective pressure. The large variety of mutated genomes represented in the HCV quasispecies allows the virus to respond to changes in the surrounding environment. These can result, for example, from immune pressure during infection or from antiviral treatment of chronic infection. In both cases, the applied selective pressure results in a population bottleneck that selects only viral genomes able to survive in the changed conditions. The ability to survive is typically associated with the presence of specific mutations in the HCV genomes. The figure illustrates how specific genetic variants in the viral quasispecies, depicted in a similar way as in Fig. 1, can survive a hypothetical selective pressure and become prevalent in the viral population. The mutation indicated by the black box confers resistance to the applied pressure and allows survival of all variants that carry it. Novel mutations can occur at any time, further promoting resistance.

explore the mutational landscape, generating large numbers of mutation combinations within the quasispecies. These pre-existing variants make it possible for HCV to rapidly respond to changes in its environment (Farci et al. 2000; Farci 2011). During infection, recognition of HCV epitopes by the humoral and cellular immune responses is critical for viral clearance (Dustin and Rice 2007; Farci et al. 1994). Thanks to the existing heterogeneity and the ability to accumulate additional novel mutations, HCV can develop escape mutant epitopes that effectively avoid recognition by the immune system. Emergence of escape epitopes is associated with progression to chronicity (Erickson et al. 2001). The presence of multiple genomic variants in HCV quasispecies is also relevant during treatment of chronic infection (Farci et al. 2000). In this case, mutations present in the viral population can provide partial or complete resistance to specific antivirals, thus allowing the virus to escape treatment or facilitating the accumulation of additional resistance mutations (Fig. 3) (Farci et al. 1994).

2.3.1 Viral Adaptation

The viral population infecting an individual is subject to natural selection and immune pressure. The first allows the evolution of viable viruses with high fitness, whereas the second selects genomes with escape epitopes that can avoid recognition by the immune system. The dynamic balance between error-driven mutations and selective pressure results in the existence within the infected host of one or a few primary viral

Mechanisms and Consequences of Genetic Variation …


populations, representing viruses with high fitness, surrounded by minor populations with different mutations and possibly lower fitness (Echeverria et al. 2015). The continuous introduction of novel polymorphisms in the quasispecies by the error-prone NS5B allows the establishment of a dynamic equilibrium between viral population and selective pressure, and permits the exploration of large numbers of mutations. This results in the continuous evolution of the HCV population within the host over time, with accumulation of mutations in the viral genome. The existence of quasispecies grants HCV enormous genetic plasticity, allowing the virus to adapt rapidly to changes in the surrounding environment (Farci and Purcell 2000). Although viral evolution can be monitored in patients by performing longitudinal observations and viral sequencing, the correlation between virus variability and fitness is better evaluated using in vitro cell culture systems. However, HCV isolates from infected patients are non-viable in cell culture, and specific constructs and cell lines had to be developed to allow culturing of HCV in vitro. Several HCV cell culture systems are now available for the study of the viral life cycle, representing multiple genotypes and isolates (Ramirez and Bukh 2018). In cell culture experiments, engineered HCV can evolve rapidly and become more fit by acquiring mutations that favor growth within a defined cell line (Li et al. 2012; Pham et al. 2018, 2022; Ramirez et al. 2014, 2016). In fact, most of the HCV cell culture systems used experimentally today have been developed and optimized by allowing HCV constructs to adapt to a specific cell line (Ramirez and Bukh 2018). This is even more evident when developing artificial HCV recombinants, combining different strains or inserting exogenous genes within the HCV genome (Gottwein et al. 2007, 2011b; Jensen et al. 2008; Scheel et al. 2008, 2011).
These chimeric HCV genomes are often sub-optimal or non-viable but can acquire compensatory, replication-enhancing mutations that result in viable and infectious viruses. Even recombinants that are generated intracellularly during in vitro infection can have sub-optimal fitness and can acquire higher fitness by sequentially accumulating favorable mutations (Scheel et al. 2013). Thus, genetic variability provides HCV with the capacity to rapidly adapt to changing conditions, supporting its ability to efficiently evolve and survive. The high evolutionary rate of HCV is of particular relevance for the immune response against acute HCV infection. Within the E1/E2 protein complex embedded in the virus envelope, E2 represents the major target for neutralizing antibodies and is the main receptor binding protein of HCV (Dustin and Rice 2007). E2 contains several variable regions, which contribute to its high variability and increase genetic diversity for immune evasion (Prentoe and Bukh 2018; Prentoe et al. 2019). The E2 glycoprotein also contains several glycosylation sites, which are correlated with immune escape (Prentoe et al. 2019). Neutralizing antibodies have been shown to be effective in protecting against HCV infection (Bukh et al. 2015; Meuleman et al. 2011). In addition, several vaccines currently in use against other viruses rely on the induction of neutralizing antibodies to prevent infection (Plotkin and Plotkin 2011). A region within E2, HVR1, is heavily entwined with neutralizing antibody recognition as it contains several epitopes and can affect the overall sensitivity of the virus to antibody neutralization (Augestad et al. 2021; Bailey et al. 2015). On the one hand, HVR1


represents a major target for neutralizing antibodies, especially in early infection, and its variability helps HCV escape immune suppression and establish chronic infection. On the other hand, HVR1 has been shown to protect epitopes located outside its region from immune recognition, through mechanisms not entirely understood (Augestad et al. 2021). Thus, evolution of the HVR1 region throughout infection provides HCV with escape mechanisms from neutralizing antibody recognition, both directly through epitope escape and indirectly through broader protection of other epitopes. Additionally, genetic variability in epitopes associated with the cellular immune response can further dampen immune recognition of HCV, in particular by cytotoxic T lymphocytes (CTLs), whose function is essential for viral clearance (Bowen and Walker 2005; Erickson et al. 2001; Salimi Alizei et al. 2021). Emergence of CTL escape epitopes is strongly associated with HCV progression to chronic infection. Variation in other areas of the HCV genome can further affect different steps involved in antigen presentation to T cells, thus reducing T-cell activation and suppressing the cellular immune response to HCV (Dustin and Rice 2007). Genetic variability is thus the main driver of immune escape and is a major hindrance to immune control of HCV infections (Farci et al. 2000). The extensive genetic heterogeneity of HCV, reflected in the vast number of circulating variants and the continuous intra-patient accumulation of escape mutations, has enormous implications for the development of a preventive vaccine against HCV infection. An effective HCV vaccine will likely need to elicit broad, cross-genotype, protective immunity through both neutralizing antibodies and cellular immunity (Shoukry 2018).

2.3.2 Antiviral Resistance

One of the most relevant real-world consequences of HCV genetic variability is the ability of the virus to acquire drug resistance mutations, thus hampering the efficacy of antiviral treatment. When antiviral treatment is applied to a virus population, existing mutations within the quasispecies that confer fitness advantages to the virus under treatment will be selected and become prevalent within the population. Further mutations in selected variants and recombination among them will lead to the emergence of viral strains with higher resistance levels and potentially higher fitness, resulting in viruses with reduced sensitivity to antiviral drugs and in treatment failure. Development of antiviral resistance represents a major problem for antiviral treatment and should be considered when planning therapeutic regimens. In the early days of HCV treatment, interferon was the only option to treat chronic HCV infection. The clearance rate after interferon treatment was exceedingly low, and in an attempt to identify genetic correlates of virological failure, early studies identified putative interferon resistance mutations in the ISDR in NS5A. Although the relevance of the ISDR for interferon sensitivity could not be definitively demonstrated, research continued to focus on identifying HCV genetic determinants of drug sensitivity (Pawlotsky et al. 1998).
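The selection step described at the start of this subsection, in which a pre-existing resistant variant becomes prevalent once treatment is applied, can be illustrated with a toy calculation. The Python sketch below is only a minimal deterministic model: the variant names, starting frequencies, and relative replication values under drug are hypothetical numbers chosen for illustration, not measurements from HCV, and the model ignores new mutations, recombination, and finite population size.

```python
def select_under_drug(freqs, fitness, generations):
    """Deterministic selection on a mixed population.

    Each generation, every variant's frequency is multiplied by its
    relative replication under drug and the result is renormalised.
    Returns the list of frequency dictionaries, one per generation,
    starting with the initial population.
    """
    current = dict(freqs)
    history = [dict(current)]
    for _ in range(generations):
        weighted = {v: current[v] * fitness[v] for v in current}
        total = sum(weighted.values())
        current = {v: w / total for v, w in weighted.items()}
        history.append(dict(current))
    return history


def first_majority(history, variant):
    """Index of the first generation in which `variant` exceeds 50%."""
    for generation, freqs in enumerate(history):
        if freqs[variant] > 0.5:
            return generation
    return None


# Hypothetical quasispecies: the resistant variant pre-exists at 0.1%.
start = {"wild_type": 0.999, "resistant": 0.001}
# Hypothetical relative replication in the presence of the drug.
fitness_on_drug = {"wild_type": 0.1, "resistant": 0.8}

history = select_under_drug(start, fitness_on_drug, generations=10)
final = history[-1]
takeover = first_majority(history, "resistant")
print(f"resistant variant exceeds 50% at generation {takeover}")
print(f"resistant frequency after 10 generations: {final['resistant']:.6f}")
```

With these illustrative values the resistant variant passes 50% of the population in the fourth generation, even though it started as a rare minority. The speed of replacement depends entirely on the assumed fitness ratio, which in practice would have to be measured.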


Later, once ribavirin and interferon treatment became the standard of care for HCV, the focus of antiviral resistance studies shifted to the viral polymerase, as this is the main target of the nucleoside analog ribavirin. The mechanism of action of this antiviral drug is still not completely understood, but research has shown that at least one of its effects is to increase the mutation rate of NS5B, thus leading to production of viruses with excess mutations and reduced viability (Asahina et al. 2005; Chevaliez et al. 2007; Dietz et al. 2013; Galli et al. 2018; Hofmann et al. 2007; Lutchman et al. 2007; Mejer et al. 2018; Saito et al. 2020; Young et al. 2003). Treatment with ribavirin results in relatively low clearance rates, even when paired with interferon, prompting the question of whether virally encoded drug resistance polymorphisms were involved. Putative ribavirin resistance mutations have been identified within the NS5B region, but their significance for in vivo drug resistance has been difficult to ascertain (Asahina et al. 2005; Chevaliez et al. 2007; Dietz et al. 2013; Hofmann et al. 2007; Lutchman et al. 2007; Mejer et al. 2020a; Wing et al. 2019; Young et al. 2003). Recently, mutations located within the HCV polymerase and identified in chronically infected patients treated with ribavirin monotherapy have been shown to affect viral susceptibility to ribavirin in in vitro cell culture systems (Mejer et al. 2020b). The mutations found to confer reduced sensitivity to ribavirin were positioned spatially close to the nucleotide entry site of the HCV polymerase, based on the three-dimensional structure of NS5B deduced for genotype 1a (Harrus et al. 2010; Mosley et al. 2012). These findings suggested that the observed polymorphisms could modulate the accessibility of NS5B to ribavirin, thus resulting in fewer misincorporations.
Alternatively, the mutations could increase the polymerase fidelity overall, resulting in lower error rates for all nucleotides. The effect of these polymorphisms on ribavirin treatment was comparatively mild, at least relative to other antiviral resistance mutations, but when considered in the context of the low efficacy of ribavirin they could play a key role in treatment response. Interestingly, some of the identified resistance mutations seem to affect ribavirin sensitivity by increasing polymerase fidelity overall, rather than, or in combination with, selectively excluding ribavirin from incorporation into the nascent RNA molecule. Additionally, recent data indicate that HCV sensitivity to ribavirin might be modulated by polymorphisms located outside the NS5B region, through mechanisms not yet understood (Mejer et al. 2020b). These mutations are thus not involved in bona fide antiviral resistance but could be related to viral fitness modulation or immune evasion. HCV genetic variability affects ribavirin sensitivity in yet another respect, as different genotypes display different responses to ribavirin treatment (Cuypers et al. 2016; Mejer et al. 2020a). Since polymorphisms defining genotype classification are not limited to the NS5B region, these mutations are also unlikely to directly affect polymerase fidelity or ribavirin recognition, but affect drug sensitivity through other mechanisms that are yet to be clarified. Taken together, these results highlight how the genetic plasticity of HCV allows the virus to respond in different ways to ribavirin treatment. After the advent of DAAs as the mainstay of HCV treatment, resistance mutations to all classes of antiviral drugs emerged and were characterized. The first-generation


protease and NS5A inhibitors displayed different efficacies against the diverse HCV genotypes due to their genetic background (Gottwein et al. 2011a; Li et al. 2014; Ramirez et al. 2014). Moreover, different HCV genotypes and subtypes can have different levels of pre-existing resistance due to specific polymorphisms, and some are thus more prone to treatment failure than others. Additionally, these early DAAs had a relatively low genetic barrier to resistance, that is, the number of nucleotide substitutions that their target protein needs to acquire to produce a resistant viral phenotype. In vitro studies using cell culture systems representing several different HCV genotypes identified and confirmed the relevance of several resistance mutations in the protease and NS5A regions (Gottwein et al. 2018; Jensen et al. 2015; Serre et al. 2016). Highly efficient NS5B inhibitors were soon approved for HCV treatment, also displaying differential efficacy against different genotypes (Ramirez et al. 2014). The ability of HCV to easily escape single-drug treatment led to the early introduction of multidrug treatment against HCV in patient care, resulting in markedly fewer treatment failures (Welzel et al. 2014). These clinical observations were confirmed by in vitro studies showing lower rates of viral escape under combination treatment (Gottwein et al. 2013). When multiple drugs targeting different viral components are combined, the selective pressure against the virus is higher, thus lowering the chance of resistance mutations emerging simultaneously. In this case, individual strains within the quasispecies could carry resistance mutations to different drugs, but they would remain sensitive to the other components of the combination treatment and undergo limited replication.
Thus, the emergence of a multidrug-resistant strain would rely on either recombination between partially resistant variants, or accumulation of multiple subsequent mutations, both low-frequency events in the context of viruses with limited replication capacity. The role of recombination in the emergence of single- and multidrug resistance has not been definitively demonstrated, although findings in other viral infections strongly indicate an important role of recombination in the evolution of strains with multiple resistance mutations (Kellam and Larder 1995; Moutouh et al. 1996). Mathematical models indicate that recombination could accelerate the development of multidrug-resistant variants by bringing together different resistance mutations carried by distinct genomes within the viral population (Carvajal-Rodriguez et al. 2007; Rambaut et al. 2004; Vijay et al. 2008). This is of particular relevance for multi-class treatment regimens, where resistance mutations to the different classes of antivirals are located in distinct regions of the genome and are thus more likely to recombine efficiently (Rhodes et al. 2003). Interestingly, recent evidence suggests that mutations located outside of the antiviral target regions can provide varying degrees of drug resistance, through mechanisms not completely understood (Fahnøe et al. 2021; Smith et al. 2021). These mutations are located far from the antiviral targets along the genome, so that recombination could play a significant role in their reassortment with genomes carrying mutations in the antiviral target regions, thus producing viruses with multiple resistance substitutions.
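The reasoning above about genetic barriers and combination treatment can be made concrete with a back-of-the-envelope calculation. The Python sketch below assumes that mutations arise independently at a single per-site rate, so that a genome carrying k specific substitutions appears with probability of roughly the rate raised to the power k, and it uses the standard two-locus population genetics approximation for the production of double mutants by recombination under random pairing of genomes. The error rate, population size, variant frequencies, and recombination rate are all hypothetical placeholders, and the function names are illustrative only.

```python
import math


def expected_preexisting(genomes, mu, substitutions):
    """Expected number of genomes carrying `substitutions` specific
    nucleotide changes, assuming independent mutations at per-site
    rate `mu` (a simplifying assumption)."""
    return genomes * mu ** substitutions


def double_mutant_input(f_a, f_b, mu, recomb_rate, barrier_a=1, barrier_b=1):
    """Per-generation production of genomes resistant to both drugs,
    expressed as a fraction of the population.

    f_a, f_b     frequencies of variants resistant only to drug A or B
    by mutation  each single-resistant variant acquires the substitutions
                 it lacks for the other drug
    by recombination  two-locus approximation r * f_a * f_b, assuming
                 random pairing of genomes and no double mutants yet
    """
    by_mutation = f_a * mu ** barrier_b + f_b * mu ** barrier_a
    by_recombination = recomb_rate * f_a * f_b
    return by_mutation, by_recombination


MU = 1e-4        # hypothetical per-site error rate per copied genome
GENOMES = 1e10   # hypothetical number of genomes in the population

# A combination of drugs requires the sum of the individual barriers.
single = expected_preexisting(GENOMES, MU, 1)
double = expected_preexisting(GENOMES, MU, 2)
triple = expected_preexisting(GENOMES, MU, 3)
print(f"barrier 1: {single:.3g}  barrier 2: {double:.3g}  barrier 3: {triple:.3g}")

by_mut, by_rec = double_mutant_input(f_a=0.01, f_b=0.01, mu=MU, recomb_rate=0.01)
print(f"double mutants per generation: mutation {by_mut:.3g}, recombination {by_rec:.3g}")
```

Under these assumptions each additional required substitution lowers the expected number of pre-existing resistant genomes by a factor of 1/mu, which is the quantitative intuition behind combining drugs with distinct targets. The recombination term grows with the product of the single-mutant frequencies, so its relative weight depends strongly on how far the partially resistant variants have already expanded.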


Subsequently, the development of improved molecules led to drugs with a higher genetic barrier and better activity against different genotypes (Gottwein et al. 2018; Jensen et al. 2015; Scotto et al. 2019; Serre et al. 2016). However, even the newer drugs remain susceptible to the development of antiviral resistance, especially when used alone (Gottwein et al. 2018; Pawlotsky 2016; Pham et al. 2018, 2019; Serre et al. 2016). The recent availability of pan-genotypic combination treatments offers high-efficacy therapeutic options with an extremely high barrier to resistance and low escape rates. Nonetheless, under certain conditions, HCV is able to escape these latest-generation drug combinations, thanks to its high genetic variability (Pham et al. 2022; Ramirez et al. 2016, 2020). Additionally, specific genotypes can carry treatment escape substitutions that provide resistance to specific drugs or facilitate the emergence of single- and multidrug resistance patterns (McPhee et al. 2019). Chronic HCV patients can also carry pre-existing resistance substitutions within the infecting quasispecies, possibly reducing the efficacy of subsequent treatments (Lindström et al. 2015). Finally, the continuous evolution of HCV globally may lead to the emergence of novel isolates carrying resistance-associated substitutions, hampering the deployment of effective curative therapies around the globe (Nguyen et al. 2020). Drug resistance thus remains a serious potential issue in the long term, when considering the potential impact of pre-existing resistance substitutions and the spread of partially resistant variants in areas with limited access to treatment.

3 Final Considerations

The uniquely high genetic variability of HCV, hardly matched by any other virus, has represented a formidable obstacle to the cure of chronic HCV infections and the development of effective vaccines. The existence of several highly divergent genotypes and subtypes, displaying different pathological outcomes and possessing varying levels of drug susceptibility and barriers to resistance, has further complicated the clinical approach to the HCV global epidemic. Only the advent of highly efficacious DAAs made it possible, for the first time, to cure a majority of patients with chronic hepatitis C. Despite their high efficacy, however, the effect of DAAs can be diminished by the emergence or pre-existence of resistance-associated mutations in the HCV genome. In addition, some DAAs exhibit different levels of efficacy when used against different HCV genotypes or variants. Thus, HCV genetic variability is still a formidable challenge even in the DAA era. Thanks to the availability of classes of DAAs targeting different viral proteins, and the presence of therapeutic options within each class, combination therapies today represent the recommended treatment option for most HCV infections. By targeting two or three viral components simultaneously, and by optimizing the drug combination according to genotype, barrier to resistance, and existing mutations, the impact of HCV genetic heterogeneity can be minimized, thus reducing the chance of resistance development. In recent years, the advent of pan-genotypic DAA treatments has further reduced the relevance of the infecting genotype, allowing more flexibility in the choice of combination treatments.


However, DAA treatments alone will not be sufficient to reach the WHO's ambitious goal of eliminating HCV as a public health threat by 2030 (WHO 2017). Treatment availability is still unevenly distributed across the globe, with the cost of more advanced treatment combinations constituting a limiting factor in resource-limited settings. Additionally, a large part of the HCV-infected population remains undiagnosed, and thus will not be treated and cured with DAA combinations. This unaware infected population perpetuates the chronic infection burden. The best option to eliminate HCV as a public health threat, worldwide and with low access barriers, is the development of a universal protective vaccine. HCV global heterogeneity and genetic variability pose tremendous challenges to this task, but much progress has already been made. The optimal vaccine candidate should most likely be able to elicit neutralizing antibodies and T-cell responses, with broad recognition so as to protect against multiple viral genotypes and variants. Broadly neutralizing antibodies have been identified, demonstrating that multi-strain recognition is effective and a feasible approach to vaccine design. Today, virtually all chronic HCV infections can be cured using DAA-based regimens. However, given the exceedingly high number of infected individuals worldwide and the continuous detection of new subtype variants of HCV, some with pre-existing resistance mutations, the emergence of multidrug-resistant strains remains a very real concern. The development of an HCV vaccine is thus not only a priority, but a necessity for the future control of HCV infection. Only by further clarifying the molecular mechanisms of HCV evolution and monitoring the circulation of new variants, including surveillance of resistance mutants, can we stay ahead of the virus and control the worldwide HCV epidemic.

References

Ahmed R et al (2021) Sofosbuvir/velpatasvir-a promising treatment for chronic hepatitis C virus infection. Cureus 13(8):e17237
Argentini C et al (2009) HCV genetic variability: from quasispecies evolution to genotype classification. Future Microbiol 4(3):359–373
Asahina Y et al (2005) Mutagenic effects of ribavirin and response to interferon/ribavirin combination therapy in chronic hepatitis C. J Hepatol 43(4):623–629
Augestad EH, Bukh J, Prentoe J (2021) Hepatitis C virus envelope protein dynamics and the link to hypervariable region 1. Curr Opin Virol 50:69–75
Austermann-Busch S, Becher P (2012) RNA structural elements determine frequency and sites of nonhomologous recombination in an animal plus-strand RNA virus. J Virol 86(13):7393–7402
Avo AP et al (2013) Hepatitis C virus subtyping based on sequencing of the C/E1 and NS5B genomic regions in comparison to a commercially available line probe assay. J Med Virol 85(5):815–822
Bagaglio S, Uberti-Foppa C, Morsica G (2017) Resistance mechanisms in hepatitis C virus: implications for direct-acting antiviral use. Drugs 77(10):1043–1055
Bailey JR et al (2015) Naturally selected hepatitis C virus polymorphisms confer broad neutralizing antibody resistance. J Clin Invest 125(1):437–447
Barnard R et al (2016) Primer ID ultra-deep sequencing reveals dynamics of drug resistance-associated variants in breakthrough hepatitis C viruses: relevance to treatment outcome and resistance screening. Antivir Ther 21(7):567–577


Baroth M et al (2000) Insertion of cellular NEDD8 coding sequences in a pestivirus. Virology 278(2):456–466
Bartenschlager R, Lohmann V (2000) Replication of hepatitis C virus. J Gen Virol 81(Pt 7):1631–1648
Bartosch B et al (2003) Cell entry of hepatitis C virus requires a set of co-receptors that include the CD81 tetraspanin and the SR-B1 scavenger receptor. J Biol Chem 278(43):41624–41630
Bebenek A, Ziuzia-Graczyk I (2018) Fidelity of DNA replication-a matter of proofreading. Curr Genet 64(5):985–996
Becher P, Tautz N (2011) RNA recombination in pestiviruses: cellular RNA sequences in viral genomes highlight the role of host factors for viral persistence and lethal disease. RNA Biol 8(2):216–224
Becher P, Orlich M, Thiel HJ (2001) RNA recombination between persisting pestivirus and a vaccine strain: generation of cytopathogenic virus and induction of lethal disease. J Virol 75(14):6256–6264
Belling J (1933) Crossing over and gene rearrangement in flowering plants. Genetics 18(4):388–413
Biswal BK et al (2005) Crystal structures of the RNA-dependent RNA polymerase genotype 2a of hepatitis C virus reveal two conformations and suggest mechanisms of inhibition by nonnucleoside inhibitors. J Biol Chem 280(18):18202–18210
Bley H, Schobel A, Herker E (2020) Whole Lotta lipids-from HCV RNA replication to the mature viral particle. Int J Mol Sci 21(8):2888
Bordería AV, Rozen-Gagnon K, Vignuzzi M (2016) Fidelity variants and RNA quasispecies. In: Domingo E, Schuster P (eds) Quasispecies: from theory to experimental systems. Springer International Publishing, Cham, pp 303–322
Borgia SM et al (2018) Identification of a novel hepatitis C virus genotype from Punjab, India: expanding classification of hepatitis C virus into 8 genotypes. J Infect Dis 218(11):1722–1729
Bowen DG, Walker CM (2005) Mutational escape from CD8+ T cell immunity: HCV evolution, from chimpanzees to man. J Exp Med 201(11):1709–1714
Bowman RR, Hu WS, Pathak VK (1998) Relative rates of retroviral reverse transcriptase template switching during RNA- and DNA-dependent DNA synthesis. J Virol 72(6):5198–5206
Bukh J (2016) The history of hepatitis C virus (HCV): basic research reveals unique features in phylogeny, evolution and the viral life cycle with new perspectives for epidemic control. J Hepatol 65(1 Suppl):S2–S21
Bukh J, Purcell RH, Miller RH (1993) At least 12 genotypes of hepatitis C virus predicted by sequence analysis of the putative E1 gene of isolates collected worldwide. Proc Natl Acad Sci USA 90(17):8234–8238
Bukh J, Miller RH, Purcell RH (1995) Genetic heterogeneity of hepatitis C virus: quasispecies and genotypes. Semin Liver Dis 15(1):41–63
Bukh J et al (2015) Immunoglobulin with high-titer in vitro cross-neutralizing hepatitis C virus antibodies passively protects chimpanzees from homologous, but not heterologous, challenge. J Virol 89(17):9128–9132
Callendret B et al (2011) Transmission of clonal hepatitis C virus genomes reveals the dominant but transitory role of CD8+ T cells in early viral evolution. J Virol 85(22):11833–11845
Carvajal-Rodriguez A, Crandall KA, Posada D (2007) Recombination favors the evolution of drug resistance in HIV-1 during antiretroviral therapy. Infect Genet Evol 7(4):476–483
Ceccherini-Silberstein F et al (2018) Viral resistance in HCV infection. Curr Opin Virol 32:115–127
Chevaliez S et al (2007) Analysis of ribavirin mutagenicity in human hepatitis C virus infection. J Virol 81(14):7732–7741
Colpitts CC, Tsai PL, Zeisel MB (2020) Hepatitis C virus entry: an intriguingly complex and highly regulated process. Int J Mol Sci 21(6):2091
Cosset FL et al (2020) HCV interplay with lipoproteins: inside or outside the cells? Viruses 12(4):434
Cuypers L et al (2016) Impact of HCV genotype on treatment regimens and drug resistance: a snapshot in time. Rev Med Virol 26(6):408–434


D’Ambrosio R et al (2017) Direct-acting antivirals: the endgame for hepatitis C? Curr Opin Virol 24:31–37
Dazert E et al (2009) Loss of viral fitness and cross-recognition by CD8+ T cells limit HCV escape from a protective HLA-B27-restricted human immune response. J Clin Invest 119(2):376–386
Delviks-Frankenberry K et al (2011) Mechanisms and factors that influence high frequency retroviral recombination. Viruses 3(9):1650–1680
Delviks KA, Hu WS, Pathak VK (1997) Psi- vectors: murine leukemia virus-based self-inactivating and self-activating retroviral vectors. J Virol 71(8):6218–6224
Dietz J et al (2013) Deep sequencing reveals mutagenic effects of ribavirin during monotherapy of hepatitis C virus genotype 1-infected patients. J Virol 87(11):6172–6181
Domingo E, Perales C (2018) Quasispecies and virus. Eur Biophys J 47(4):443–457
Dubuisson J (2007) Hepatitis C virus proteins. World J Gastroenterol 13(17):2406–2415
Dultz G et al (2021) Epistatic interactions promote persistence of NS3-Q80K in HCV infection by compensating for protein folding instability. J Biol Chem 297(3):101031
Dustin LB, Rice CM (2007) Flying under the radar: the immunobiology of hepatitis C. Ann Rev Immunol 25:71–99
Duverlie G et al (1998) Sequence analysis of the NS5A protein of European hepatitis C virus 1b isolates and relation to interferon sensitivity. J Gen Virol 79(Pt 6):1373–1381
Echeverria N et al (2015) Hepatitis C virus genetic variability and evolution. World J Hepatol 7(6):831–845
Egerman RS (2019) New antiviral agents for treatment of hepatitis C. Clin Obstet Gynecol 62(4):823–834
Enomoto N et al (1995) Comparison of full-length sequences of interferon-sensitive and resistant hepatitis C virus 1b. Sensitivity to interferon is conferred by amino acid substitutions in the NS5A region. J Clin Invest 96(1):224–230
Erickson AL et al (2001) The outcome of hepatitis C virus infection is predicted by escape mutations in epitopes targeted by cytotoxic T lymphocytes. Immunity 15(6):883–895
Evans MJ et al (2007) Claudin-1 is a hepatitis C virus co-receptor required for a late step in entry. Nature 446(7137):801–805
Fahnøe U et al (2021) Global evolutionary analysis of chronic hepatitis C patients revealed significant effect of baseline viral resistance, including novel non-target sites, for DAA-based treatment and retreatment outcome. J Viral Hepat 28(2):302–316
Farci P (2011) New insights into the HCV quasispecies and compartmentalization. Semin Liver Dis 31(4):356–374
Farci P, Purcell RH (2000) Clinical significance of hepatitis C virus genotypes and quasispecies. Semin Liver Dis 20(1):103–126
Farci P et al (1994) Prevention of hepatitis C virus infection in chimpanzees after antibody-mediated in vitro neutralization. Proc Natl Acad Sci USA 91(16):7792–7796
Farci P et al (2000) The outcome of acute hepatitis C predicted by the evolution of the viral quasispecies. Science 288(5464):339–344
Feld JJ et al (2017) Ribavirin revisited in the era of direct-acting antiviral therapy for hepatitis C virus infection. Liver Int 37(1):5–18
Feld JJ et al (2015) Sofosbuvir and velpatasvir for HCV genotype 1, 2, 4, 5, and 6 infection. N Engl J Med 373(27):2599–2607
Fricke J, Gunn M, Meyers G (2001) A family of closely related bovine viral diarrhea virus recombinants identified in an animal suffering from mucosal disease: new insights into the development of a lethal disease in cattle. Virology 291(1):77–90
Fried MW, Hadziyannis SJ (2004) Treatment of chronic hepatitis C infection with peginterferons plus ribavirin. Semin Liver Dis 24(Suppl 2):47–54
Gallei A et al (2004) RNA recombination in vivo in the absence of viral replication. J Virol 78(12):6271–6281
Galli A, Bukh J (2014) Comparative analysis of the molecular mechanisms of recombination in hepatitis C virus. Trends Microbiol 22(6):354–364


Galli A et al (2018) Antiviral effect of Ribavirin against HCV associated with increased frequency of G-to-A and C-to-U transitions in infectious cell culture model. Sci Rep 8(1):4619
Galli A, Fahnøe U, Bukh J (2022) High recombination rate of hepatitis C virus revealed by a green fluorescent protein reconstitution cell system. Virus Evol 8(1)
Gao F et al (2007) Recombinant hepatitis C virus in experimentally infected chimpanzees. J Gen Virol 88(Pt 1):143–147
Gmyl AP et al (1999) Nonreplicative RNA recombination in poliovirus. J Virol 73(11):8958–8965
Gonzalez-Candelas F, Lopez-Labrador FX, Bracho MA (2011) Recombination in hepatitis C virus. Viruses 3(10):2006–2024
Goossens N, Negro F (2014) Is genotype 3 of the hepatitis C virus the new villain? Hepatology 59(6):2403–2412
Gottwein JM, Bukh J (2008) Cutting the gordian knot-development and biological relevance of hepatitis C virus cell culture systems. Adv Virus Res 71:51–133
Gottwein JM et al (2011a) Differential efficacy of protease inhibitors against HCV genotypes 2a, 3a, 5a, and 6a NS3/4A protease recombinant viruses. Gastroenterology 141(3):1067–1079
Gottwein JM et al (2007) Robust hepatitis C genotype 3a cell culture releasing adapted intergenotypic 3a/2a (S52/JFH1) viruses. Gastroenterology 133(5):1614–1626
Gottwein JM et al (2009) Development and characterization of hepatitis C virus genotype 1–7 cell culture systems: role of CD81 and scavenger receptor class B type I and effect of antiviral drugs. Hepatology 49(2):364–377
Gottwein JM et al (2013) Combination treatment with hepatitis C virus protease and NS5A inhibitors is effective against recombinant genotype 1a, 2a, and 3a viruses. Antimicrob Agents Chemother 57(3):1291–1303
Gottwein JM et al (2018) Efficacy of NS5A inhibitors against hepatitis C virus genotypes 1–7 and escape variants. Gastroenterology 154(5):1435–1448
Gottwein JM et al (2011b) Development and application of hepatitis C reporter viruses with genotype 1 to 7 core-nonstructural protein 2 (NS2) expressing fluorescent proteins or luciferase in modified JFH1 NS5A. J Virol 85(17):8913–8928
Guntipalli P et al (2021) Worldwide prevalence, genotype distribution and management of hepatitis C. Acta Gastroenterol Belg 84(4):637–656
Han Q et al (2009) Compensatory mutations in NS3 and NS5A proteins enhance the virus production capability of hepatitis C reporter virus. Virus Res 145(1):63–73
Harrus D et al (2010) Further insights into the roles of GTP and the C terminus of the hepatitis C virus polymerase in the initiation of RNA synthesis. J Biol Chem 285(43):32906–32918
Hartlage AS, Cullen JM, Kapoor A (2016) The strange, expanding world of animal hepaciviruses. Ann Rev Virol 3(1):53–75
Hayes CN et al (2021) Road to elimination of HCV: clinical challenges in HCV management. Liver Int 42(9):1935–1944
Hellen CU, Pestova TV (1999) Translation of hepatitis C virus RNA. J Viral Hepat 6(2):79–87
Hezode C (2017) Pan-genotypic treatment regimens for hepatitis C virus: advantages and disadvantages in high- and low-income regions. J Viral Hepat 24(2):92–101
Hofmann WP et al (2007) Mutagenic effect of ribavirin on hepatitis C nonstructural 5B quasispecies in vitro and during antiviral therapy. Gastroenterology 132(3):921–930
Hwang CK, Svarovskaia ES, Pathak VK (2001) Dynamic copy choice: steady state between murine leukemia virus polymerase and polymerase-dependent RNase H activity determines frequency of in vivo template switching. Proc Natl Acad Sci USA 98(21):12209–12214
Inchauspe G et al (1991) Genomic structure of the human prototype strain H of hepatitis C virus: comparison with American and Japanese isolates. Proc Natl Acad Sci USA 88(22):10292–10296
Jackowiak P et al (2014) Phylogeny and molecular evolution of the hepatitis C virus. Infect Genet Evol 21:67–82
Jackson WE, Everson GT (2017) Sofosbuvir and velpatasvir for the treatment of hepatitis C. Expert Rev Gastroenterol Hepatol 11(6):501–505

260

A. Galli and J. Bukh

Janssens FA, Koszul R, Zickler D (2012) The chiasmatype theory: a new interpretation of the maturation divisions. 1909. Genetics 191(2):319–346 Jensen SB et al (2015) Substitutions at NS3 residue 155, 156, or 168 of hepatitis C virus genotypes 2 to 6 induce complex patterns of protease inhibitor resistance. Antimicrob Agents Chemother 59(12):7426–7436 Jensen TB et al (2008) Highly efficient JFH1-based cell-culture system for hepatitis C virus genotype 5a: failure of homologous neutralizing-antibody treatment to control infection. J Infect Dis 198(12):1756–1765 Kellam P, Larder BA (1995) Retroviral recombination can lead to linkage of reverse transcriptase mutations that confer increased zidovudine resistance. J Virol 69(2):669–674 Kim MJ, Kao C (2001) Factors regulating template switch in vitro by viral RNA-dependent RNA polymerases: implications for RNA-RNA recombination. Proc Natl Acad Sci USA 98(9):4972– 4977 Lai MM (1992) RNA recombination in animal and plant viruses. Microbiol Rev 56(1):61–79 Le Guillou-Guillemette H et al (2007) Genetic diversity of the hepatitis C virus: impact and issues in the antiviral therapy. World J Gastroenterol 13(17):2416–2426 Lesburg CA, Radfar R, Weber PC (2000) Recent advances in the analysis of HCV NS5B RNAdependent RNA polymerase. Curr Opin Investig Drugs 1(3):289–296 Li HC, Yang CH, Lo SY (2021) Hepatitis C viral replication complex. Viruses 13(3):520 Li YP et al (2011) MicroRNA-122 antagonism against hepatitis C virus genotypes 1–6 and reduced efficacy by host RNA insertion or mutations in the HCV 5, UTR. Proc Natl Acad Sci USA 108(12):4991–4996 Li YP et al (2012) Highly efficient full-length hepatitis C virus genotype 1 (strain TN) infectious culture system. Proc Natl Acad Sci USA 109(48):19757–19762 Li YP et al (2014) Differential sensitivity of 5, UTR-NS5A recombinants of hepatitis C virus genotypes 1–6 to protease and NS5A inhibitors. 
Gastroenterology 146(3):812–821 e4 Lindström I et al (2015) Prevalence of polymorphisms with significant resistance to NS5A inhibitors in treatment-naive patients with hepatitis C virus genotypes 1a and 3a in Sweden. Infect Dis (Lond) 47(8):555–562 Liu X, Hu P (2021) Efficacy and safety of glecaprevir/pibrentasvir in patients with chronic HCV infection. J Clin Transl Hepatol 9(1):125–132 Lodrini S et al (2003) The NS3 protease gene of HCV is highly conserved within the putative catalytic site region. J Hepatol 38:113–114 Lohmann V (2013) Hepatitis C virus RNA replication. Curr Top Microbiol Immunol 369:167–198 Lund AH et al (1999) The kissing-loop motif is a preferred site of 5, leader recombination during replication of SL3-3 murine leukemia viruses in mice. J Virol 73(11):9614–9618 Lutchman G et al (2007) Mutation rate of the hepatitis C virus NS5B in patients undergoing treatment with ribavirin monotherapy. Gastroenterology 132(5):1757–1766 Mani N et al (2015) Nonstructural protein 5A (NS5A) and human replication protein A increase the processivity of hepatitis C virus NS5B polymerase activity in vitro. J Virol 89(1):165–180 McLauchlan J (2009) Hepatitis C virus: viral proteins on the move. Biochem Soc Trans 37(Pt 5):986–990 McPhee F et al (2019) Impact of preexisting hepatitis C virus genotype 6 NS3, NS5A, and NS5B polymorphisms on the in vitro potency of direct-acting antiviral agents. Antimicrob Agents Chemother 63(4) Mejer N et al (2018) Ribavirin-induced mutagenesis across the complete open reading frame of hepatitis C virus genotypes 1a and 3a. J Gen Virol 99(8):1066–1077 Mejer N et al (2020a) Ribavirin inhibition of cell-culture infectious hepatitis C genotype 1–3 viruses is strain-dependent. Virology 540:132–140 Mejer N et al (2020b) Mutations identified in the hepatitis C virus (HCV) polymerase of patients with chronic HCV treated with ribavirin cause resistance and affect viral replication fidelity. Antimicrob Agents Chemother 64(12):e01417-e1420

Mechanisms and Consequences of Genetic Variation …

261

Menendez-Arias L (2009) Mutation rates and intrinsic fidelity of retroviral reverse transcriptases. Viruses 1(3):1137–1165 Messina JP et al (2015) Global distribution and prevalence of hepatitis C virus genotypes. Hepatology 61(1):77–87 Meuleman P et al (2011) In vivo evaluation of the cross-genotype neutralizing activity of polyclonal antibodies against hepatitis C virus. Hepatology 53(3):755–762 Meyers G et al (1992) Rearrangement of viral sequences in cytopathogenic pestiviruses. Virology 191(1):368–386 Mortality GBD, Causes of Death, Collaborators (2016) Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet 388(10053):1459–544 Mosley RT et al (2012) Structure of hepatitis C virus polymerase in complex with primer-template RNA. J Virol 86(12):6503–6511 Moutouh L, Corbeil J, Richman DD (1996) Recombination leads to the rapid emergence of HIV-1 dually resistant mutants under selective drug pressure. Proc Natl Acad Sci USA 93(12):6106–6111 Neumann AU et al (1998) Hepatitis C viral dynamics in vivo and the antiviral efficacy of interferonalpha therapy. Science 282(5386):103–107 Neumann-Haefelin C et al (2011) Human leukocyte antigen B27 selects for rare escape mutations that significantly impair hepatitis C virus replication and require compensatory mutations. Hepatology 54(4):1157–1166 Nguyen D et al (2020) Efficacy of NS5A inhibitors against unusual and potentially difficult-totreat HCV subtypes commonly found in sub-Saharan Africa and South East Asia. J Hepatol 73(4):794–799 Nousbaum J et al (2000) Prospective characterization of full-length hepatitis C virus NS5A quasispecies during induction and combination antiviral therapy. J Virol 74(19):9028–9038 Oancea CN et al (2020) Global hepatitis C elimination: history, evolution, revolutionary changes and barriers to overcome. 
Rom J Morphol Embryol 61(3):643–653 Osburn WO et al (2014) Clearance of hepatitis C infection is associated with the early appearance of broad neutralizing antibody responses. Hepatology 59(6):2140–2151 Parlati L, Hollande C, Pol S (2021) Treatment of hepatitis C virus infection. Clin Res Hepatol Gastroenterol 45(4):101578 Pawlotsky JM (2014) New hepatitis C therapies: the toolbox, strategies, and challenges. Gastroenterology 146(5):1176–1192 Pawlotsky JM (2016) Hepatitis C virus resistance to direct-acting antiviral drugs in interferon-free regimens. Gastroenterology 151(1):70–86 Pawlotsky JM (2019) Retreatment of hepatitis C virus-infected patients with direct-acting antiviral failures. Semin Liver Dis 39(3):354–368 Pawlotsky JM (2020) Interferon-free hepatitis C virus therapy. Cold Spring Harb Perspect Med 10(11):a036855 Pawlotsky JM et al (2015) From non-A, non-B hepatitis to hepatitis C virus cure. J Hepatol 62(1 Suppl):S87-99 Pawlotsky JM et al (1998) Interferon resistance of hepatitis C virus genotype 1b: relationship to nonstructural 5A gene quasispecies mutations. J Virol 72(4):2795–2805 Peck KM, Lauring AS (2018) Complexities of viral mutation rates. J Virol 92(14):e01031-e1117 Perez-Losada M et al (2015) Recombination in viruses: mechanisms, methods of study, and evolutionary consequences. Infect Genet Evol 30:296–307 Pestka JM et al (2007) Rapid induction of virus-neutralizing antibodies and viral clearance in a single-source outbreak of hepatitis C. Proc Natl Acad Sci USA 104(14):6025–6030 Pham LV et al (2018) HCV Genotype 6a escape from and resistance to velpatasvir, pibrentasvir, and sofosbuvir in robust infectious cell culture models. Gastroenterology 154(8):2194–208 e12 Pham LV et al (2019) HCV genotype 1–6 NS3 residue 80 substitutions impact protease inhibitor activity and promote viral escape. J Hepatol 70(3):388–397

262

A. Galli and J. Bukh

Pham LV et al (2022) HCV genome-wide analysis for development of efficient culture systems and unravelling of antiviral resistance in genotype 4. Gut 71(3):627–642 Pileri P et al (1998) Binding of hepatitis C virus to CD81. Science 282(5390):938–941 Ploss A et al (2009) Human occludin is a hepatitis C virus entry factor required for infection of mouse cells. Nature 457(7231):882–886 Plotkin SA, Plotkin SL (2011) The development of vaccines: how the past led to the future. Nat Rev Microbiol 9(12):889–893 Polaris Observatory, H. C. V. Collaborators (2017) Global prevalence and genotype distribution of hepatitis C virus infection in 2015: a modelling study. Lancet Gastroenterol Hepatol 2(3):161–176 Prentoe J, Bukh J (2018) Hypervariable region 1 in envelope protein 2 of hepatitis c virus: a Linchpin in neutralizing antibody evasion and viral entry. Front Immunol 9:2146 Prentoe J et al (2019) Hypervariable region 1 and N-linked glycans of hepatitis C regulate virion neutralization by modulating envelope conformations. Proc Natl Acad Sci USA 116(20):10039– 10047 Rambaut A et al (2004) The causes and consequences of HIV evolution. Nat Rev Genet 5(1):52–61 Ramirez S, Bukh J (2018) Current status and future development of infectious cell-culture models for the major genotypes of hepatitis C virus: essential tools in testing of antivirals and emerging vaccine strategies. Antiviral Res 158:264–287 Ramirez S et al (2014) Highly efficient infectious cell culture of three hepatitis C virus genotype 2b strains and sensitivity to lead protease, nonstructural protein 5A, and polymerase inhibitors. Hepatology 59(2):395–407 Ramirez S et al (2016) Robust HCV genotype 3a infectious cell culture system permits identification of escape variants with resistance to sofosbuvir. 
Gastroenterology 151(5):973–985 e2 Ramirez S et al (2020) Cell culture studies of the efficacy and barrier to resistance of sofosbuvirvelpatasvir and glecaprevir-pibrentasvir against hepatitis C virus genotypes 2a, 2b, and 2c. Antimicrob Agents Chemother 64(3):e01888-e1919 Raney KD et al (2010) Hepatitis C virus non-structural protein 3 (HCV NS3): a multifunctional antiviral target. J Biol Chem 285(30):22725–22731 Reiter J et al (2011) Hepatitis C virus RNA recombination in cell culture. J Hepatol 55(4):777–783 Rhodes T, Wargo H, Hu WS (2003) High rates of human immunodeficiency virus type 1 recombination: near-random segregation of markers one kilobase apart in one round of viral replication. J Virol 77(20):11193–11200 Rong L et al (2010) Rapid emergence of protease inhibitor resistance in hepatitis C virus. Sci Transl Med 2(30):30ra32 Rousset D et al (2003) Recombinant vaccine-derived poliovirus in Madagascar. Emerg Infect Dis 9(7):885–887 Rubbia-Brandt L et al (2000) Hepatocyte steatosis is a cytopathic effect of hepatitis C virus genotype 3. J Hepatol 33(1):106–115 Ruhl M et al (2012) Escape from a dominant HLA-B*15-restricted CD8+ T cell response against hepatitis C virus requires compensatory mutations outside the epitope. J Virol 86(2):991–1000 Ryan MD, Monaghan S, Flint M (1998) Virus-encoded proteinases of the Flaviviridae. J Gen Virol 79(5):947–959 Sagan SM, Chahal J, Sarnow P (2015) cis-Acting RNA elements in the hepatitis C virus RNA genome. Virus Res 206:90–98 Saito Y et al (2020) Ribavirin induces hepatitis C virus genome mutations in chronic hepatitis patients who failed to respond to prior daclatasvir plus asunaprevir therapy. J Med Virol 92(2):210–218 Salimi Alizei E et al (2021) Mutational escape from cellular immunity in viral hepatitis: variations on a theme. Curr Opin Virol 50:110–118 Sanjuan R, Domingo-Calap P (2016) Mechanisms of viral mutation. 
Cell Mol Life Sci 73(23):4433– 4448 Sarrazin C (2021) Treatment failure with DAA therapy: importance of resistance. J Hepatol 74(6):1472–1482

Mechanisms and Consequences of Genetic Variation …

263

Scheel TK et al (2008) Development of JFH1-based cell culture systems for hepatitis C virus genotype 4a and evidence for cross-genotype neutralization. Proc Natl Acad Sci USA 105(3):997– 1002 Scheel TK et al (2011) Efficient culture adaptation of hepatitis C virus recombinants with genotypespecific core-NS2 by using previously identified mutations. J Virol 85(6):2891–2906 Scheel TK et al (2013) Productive homologous and non-homologous recombination of hepatitis C virus in cell culture. PLoS Pathog 9(3):e1003228 Scotto R et al (2019) Real-world efficacy and safety of pangenotypic direct-acting antivirals against hepatitis C virus infection. Rev Recent Clin Trials 14(3):173–182 Serre SB et al (2016) Hepatitis C virus genotype 1 to 6 protease inhibitor escape variants: in vitro selection, fitness, and resistance patterns in the context of the infectious viral life cycle. Antimicrob Agents Chemother 60(6):3563–3578 Shi ST, Lai MMC (2006) ‘HCV 5’ and 3, UTR: when translation meets replication. In: Tan SL (ed) Hepatitis C viruses: genomes and molecular biology. Horizon Bioscience, Norfolk, UK Shoukry NH (2018) Hepatitis C vaccines, antibodies, and T cells. Front Immunol 9:1480 Shoukry NH et al (2003) Memory CD8+ T cells are required for protection from persistent hepatitis C virus infection. J Exp Med 197(12):1645–1655 Shu B, Gong P (2016) Structural basis of viral RNA-dependent RNA polymerase catalysis and translocation. Proc Natl Acad Sci U S A 113(28):E4005–E4014 Simmonds P (1996) Virology of hepatitis C virus. Clin Ther Suppl B 18:9–36 Simmonds P et al (2017) ICTV Virus taxonomy profile: Flaviviridae. J Gen Virol 98(1):2–3 Smith DB et al (2014) Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: updated criteria and genotype assignment web resource. Hepatology 59(1):318–327 Smith DB et al (2016) Proposed update to the taxonomy of the genera Hepacivirus and Pegivirus within the Flaviviridae family. 
J Gen Virol 97(11):2894–2907 Smith DA et al (2021) Viral genome wide association study identifies novel hepatitis C virus polymorphisms associated with sofosbuvir treatment failure. Nat Commun 12(1):6105 Sulejmani N, Jafri SM (2018) Grazoprevir/elbasvir for the treatment of adults with chronic hepatitis C: a short review on the clinical evidence and place in therapy. Hepat Med 10:33–42 Thiagarajan P, Ryder SD (2015) The hepatitis C revolution part 1: antiviral treatment options. Curr Opin Infect Dis 28(6):563–571 Thomson MM, Najera R (2005) Molecular epidemiology of HIV-1 variants in the global AIDS pandemic: an update. AIDS Rev 7(4):210–224 Vermehren J, Sarrazin C (2012) The role of resistance in HCV treatment. Best Pract Res Clin Gastroenterol 26(4):487–503 Vijay NNV et al (2008) Recombination increases human immunodeficiency virus fitness, but not necessarily diversity. J Gen Virol 89(Pt 6):1467–1477 von Hahn T et al (2007) Hepatitis C virus continuously escapes from neutralizing antibody and T-cell responses during chronic infection in vivo. Gastroenterology 132(2):667–678 Wang H, Tai AW (2016) Mechanisms of cellular membrane reorganization to support hepatitis C virus replication. Viruses 8(5):142 Ward MJ et al (2013) Estimating the rate of intersubtype recombination in early HIV-1 group M strains. J Virol 87(4):1967–1973 Webster DP, Klenerman P, Dusheiko GM (2015) Hepatitis C. Lancet 385(9973):1124–1135 Webster G et al (2000) HCV genotypes–role in pathogenesis of disease and response to therapy. Baillieres Best Pract Res Clin Gastroenterol 14(2):229–240 Welzel TM, Dultz G, Zeuzem S (2014) Interferon-free antiviral combination therapies without nucleosidic polymerase inhibitors. J Hepatol 61(1 Suppl):S98-s107 WHO (2017) Global Hepatitis Report (Global Hepatitis Report) Wing PAC et al (2019) Amino acid substitutions in genotype 3a hepatitis C virus polymerase protein affect responses to sofosbuvir. 
Gastroenterology 157(3):692–704 e9 Witherell GW, Beineke P (2001) Statistical analysis of combined substitutions in nonstructural 5A region of hepatitis C virus and interferon response. J Med Virol 63(1):8–16

264

A. Galli and J. Bukh

Worobey M, Holmes EC (1999) Evolutionary aspects of recombination in RNA viruses. J Gen Virol 80(Pt 10):2535–2543 Young KC et al (2003) Identification of a ribavirin-resistant NS5B mutation of hepatitis C virus during ribavirin monotherapy. Hepatology 38(4):869–878

Mammarenavirus Genetic Diversity and Its Biological Implications

Manuela Sironi, Diego Forni, and Juan C. de la Torre

Abstract Members of the family Arenaviridae are classified into four genera: Antennavirus, Hartmanivirus, Mammarenavirus, and Reptarenavirus. Reptarenaviruses and hartmaniviruses infect (captive) snakes and have been shown to cause boid inclusion body disease (BIBD). Antennaviruses have genomes consisting of three, rather than two, segments and were discovered in actinopterygian fish by next-generation sequencing, but no biological isolate has been reported yet. The hosts of mammarenaviruses are mainly rodents, and infections are generally asymptomatic. Current knowledge about the biology of reptarenaviruses, hartmaniviruses, and antennaviruses is very limited, and their zoonotic potential is unknown. In contrast, some mammarenaviruses are associated with zoonotic events that pose a threat to human health. This review will focus on mammarenavirus genetic diversity and its biological implications. Some mammarenaviruses, including lymphocytic choriomeningitis virus (LCMV), are excellent experimental model systems for the investigation of acute and persistent viral infections, whereas others, including Lassa (LASV) and Junin (JUNV) viruses, the causative agents of Lassa fever (LF) and Argentine hemorrhagic fever (AHF), respectively, are important human pathogens. Mammarenaviruses were thought to have a high degree of intra- and inter-species amino acid sequence identity, but recent evidence has revealed a high degree of mammarenavirus genetic diversity in the field. Moreover, closely related mammarenaviruses can display dramatic phenotypic differences in vivo. These findings support a role of genetic variability in mammarenavirus adaptability and pathogenesis. Here, we will review the molecular biology of mammarenaviruses, their phylogeny and evolution, as well as the quasispecies dynamics of mammarenavirus populations and their biological implications.

M. Sironi · D. Forni
Bioinformatics, Scientific Institute IRCCS E. MEDEA, 23842 Bosisio Parini, Italy

J. C. de la Torre (corresponding author)
Department Immunology and Microbiology IMM-6, Scripps Research, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_8

1 Arenaviridae: History and Current Taxonomy

Viruses in the family Arenaviridae (phylum Negarnaviricota, order Bunyavirales) have single-stranded RNA genomes consisting of 2–3 segments totaling about 10.5 kb (Fig. 1). Arenaviruses are currently classified into four genera (Antennavirus, Hartmanivirus, Mammarenavirus, and Reptarenavirus). However, from 1976 until 2012 the Arenaviridae family comprised a single genus of mammal-infecting viruses (now the Mammarenavirus genus).

The first arenavirus was isolated in 1933 by Armstrong and Lillie during their studies on an epidemic of encephalitis in St. Louis (Armstrong and Lillie 1934). They coined the name lymphocytic choriomeningitis virus (LCMV) to reflect the pathological features the new agent caused. Around the same time (1935), Traub (1935) recovered the virus from white mice, and in 1936 Rivers and Scott (1936) isolated the same agent from the cerebrospinal fluid of two men with lymphocytic meningitis (Scott 1936). By the late 1960s, the rodent viruses responsible for Bolivian hemorrhagic fever (HF) (Machupo virus, MACV) and Argentinian HF (Junin virus, JUNV) had been discovered (Johnson 1965; Parodi et al. 1958, 1959). Electron microscopy morphological similarities among LCMV, MACV, and Tacaribe virus (TCRV), isolated from fruit bats in Trinidad during a rabies surveillance survey (Downs et al. 1963), led to the proposal of the new virus family Arenaviridae (Murphy et al. 1969). Around the same time, Lassa fever was first described in the town of Lassa, Nigeria, and the causative agent was isolated and named Lassa virus (LASV) (Frame et al. 1970). LASV virions were morphologically indistinguishable from those of LCMV, MACV, and TCRV (Speir et al. 1970), and LASV was also found to be a rodent-associated virus (Monath et al. 1974). The family Arenaviridae was ratified by the International Committee on Taxonomy of Viruses (ICTV) in 1976 (Fenner 1976). It was also recognized that TCRV, JUNV, and MACV were antigenically similar, whereas LASV was more closely related to LCMV (Buckley and Casals 1970; Johnson et al. 1966). Based on antigenic cross-reactivity and geographic distribution, arenaviruses are classified into two main groups: the Old World (OW) or LCMV-LASV complex, and the New World (NW) or Tacaribe complex arenaviruses (Matthews 1979).

The use of next-generation sequencing (NGS) technologies in recent years has identified a number of novel mammarenaviruses, and the genus Mammarenavirus now includes 40 viral species (Kuhn et al. 2021) (https://talk.ictvonline.org/taxonomy/) (Fig. 2). Likewise, the implementation over the last decade of these novel genomic technologies resulted in the identification of arenaviruses in non-mammalian hosts and, consequently, the creation of additional genera in the Arenaviridae family. The presence of arenaviruses in reptiles was first documented in 2009 in samples from snakes affected by boid inclusion body disease (BIBD). Subsequently, a number of arenaviruses have been found to be associated with BIBD cases in snakes (Bodewes et al. 2013; Hetzel et al. 2013; Stenglein et al. 2012). The so-called “snake arenaviruses” are currently classified into two novel genera, Reptarenavirus and Hartmanivirus (Maes et al. 2018). At present, the Reptarenavirus and Hartmanivirus genera include 5 and 6 species, respectively (Kuhn et al. 2021) (https://talk.ictvonline.org/taxonomy/).

Fig. 1 Mammarenavirus genome organization and sequence similarity. Mammarenavirus genome organization is schematically shown together with information on sequence homology. Homologous regions are indicated in the same color. Gray indicates regions that display no homology to known proteins outside the genus

Fig. 2 Mammarenavirus phylogenetic relationships. Maximum Likelihood phylogenetic trees of 40 mammarenaviruses inferred from PRANK alignments (Loytynoja and Goldman 2008) of the NP (a) and L (b) genes. For both alignments, the best-fit model of protein evolution (LG + G) was selected using ProtTest 3 (v. 3.4.2) (Darriba et al. 2011). Maximum likelihood trees were produced using RAxML (v. 8) (Stamatakis 2014). NW mammarenaviruses are shaded in colors to denote clades

Mammarenaviruses, reptarenaviruses, and hartmaniviruses have bi-segmented genomes that use an ambisense coding strategy to produce their proteins. The small segment (S) encodes the glycoprotein precursor (GPC) and the nucleocapsid protein (NP); the large segment (L) codes for the RNA-dependent RNA polymerase (L) and, in the case of mammarenaviruses and reptarenaviruses, for the Z protein (Fig. 1). However, a bi-segmented genome is not a universal feature of arenaviruses: a large-scale metatranscriptomic study of RNA viruses in cold-blooded vertebrates identified two novel arenaviruses in frogfish (Antennarius striatus) whose genomes are tri-segmented (Shi et al. 2016), and they are now classified in the Antennavirus genus (Kuhn et al. 2021) (Fig. 1). This genus was recently expanded to include additional viruses sequenced from salmon (Oncorhynchus tshawytscha and Oncorhynchus nerka) (Mordecai et al. 2019). Interestingly, large-scale analyses also identified arenavirus-like sequences in invertebrates. Thus, Húběi myriapoda virus 5 (Shi et al. 2016) has sequence similarity to arenavirus L proteins and has a tri-segmented genome (Fig. 1). Húběi myriapoda virus 5 is presently classified in the Mypoviridae (order Bunyavirales) (Shi et al. 2016), and its relatedness to arenaviruses suggests that the latter might have originated in invertebrates.
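As a reading aid, the genome organization of the four genera described above can be collected in a small lookup structure. The following Python sketch is only an illustration under stated assumptions: the class name `GenusGenome`, the field names, and the helper `genera_with_segments` are hypothetical, and the `encodes_z` value for hartmaniviruses is inferred from the statement that Z is encoded by mammarenaviruses and reptarenaviruses; the text does not state whether antennaviruses encode Z, so that entry is left as `None`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenusGenome:
    """Hypothetical record summarizing one arenavirus genus."""
    genus: str
    segments: int               # number of genome segments
    encodes_z: Optional[bool]   # None = not stated in the text above


# Values taken from the paragraph above; encodes_z for Hartmanivirus is an
# inference from the wording, not an explicit statement in the text.
GENERA = (
    GenusGenome("Mammarenavirus", 2, True),
    GenusGenome("Reptarenavirus", 2, True),
    GenusGenome("Hartmanivirus", 2, False),
    GenusGenome("Antennavirus", 3, None),
)


def genera_with_segments(n: int) -> List[str]:
    """Return the names of genera whose genomes have n segments."""
    return [g.genus for g in GENERA if g.segments == n]


if __name__ == "__main__":
    print(genera_with_segments(2))
    print(genera_with_segments(3))
```

Keeping the unknown Z status as `None` rather than `False` avoids asserting something the text does not say.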

2 Mammarenavirus Impact on Virology and Human Health

2.1 Mammarenavirus Impact on Human Health

Mammarenavirus human infections usually occur via mucosal exposure to aerosols or infectious materials (Buchmeier et al. 2007). Manifestations of mammarenavirus infection in humans can vary from subclinical infections to infections associated with mild to severe disease symptoms. Several mammarenaviruses cause hemorrhagic fever (HF) disease in humans and represent an important public health problem in their endemic regions. Thus, the Old World (OW) mammarenavirus Lassa virus (LASV) is highly prevalent in Western Africa, where it is estimated to infect several hundred thousand individuals yearly, resulting in a high number of Lassa fever (LF) cases, an HF disease associated with high morbidity and mortality (Bray 2005; Geisbert and Jahrling 2004). The population at risk of LF might be as high as 200 million people (Sogoba et al. 2012), one of the highest among viral HF (Falzarano and Feldmann 2013). Notably, increased traveling has resulted in the importation of LF cases into non-endemic metropolitan areas around the globe (Freedman and Woodall 1999; Isaacson 2001). The New World (NW) mammarenavirus Junin virus (JUNV) causes Argentine HF (AHF), a disease endemic to the Argentinean Pampas with hemorrhagic and neurological manifestations and a case fatality rate of 15–30% (Peters and Oldstone 2002). The worldwide-distributed mammarenavirus LCMV does not cause viral HF disease, but mounting evidence indicates that LCMV is likely a neglected human pathogen of clinical significance (Barton and Mets 2001; Barton et al. 2002; Bonthius 2009, 2012; Bonthius and Perlman 2007; Jahrling and Peters 1992), and LCMV poses a special threat in immune-compromised individuals (Fischer et al. 2006; Peters 2006; Palacios et al. 2008). In addition, the significant seroprevalence of LCMV in urban populations across the world raises the issue of whether LCMV may contribute to cases of undiagnosed aseptic meningitis (Vilibic-Cavlek et al. 2021). LCMV is also highly ranked among wildlife viruses with the potential for zoonotic events leading to spread in humans (Grange et al. 2021).

Currently, there are no FDA-licensed vaccines or therapeutics to prevent or treat mammarenavirus human infections. Current anti-mammarenaviral therapy is limited to off-label use of ribavirin, whose efficacy remains controversial and whose use can result in significant side effects (Damonte and Coto 2002). The live-attenuated Candid#1 strain of JUNV has been approved in Argentina for use in populations at high risk of exposure to JUNV, and its deployment has resulted in a drastic reduction in AHF cases (Enria and Barrera Oro 2002; Grant et al. 2012).

2.2 Mammarenavirus as Highly Tractable Experimental Systems for the Investigation of Virus-Host Interactions

LCMV infection of the mouse is a widely used experimental system in the field of viral immunology and pathogenesis, and investigations using LCMV have provided the bases for the establishment of many important concepts in virology and immunology (Freeman et al. 2006; Oldstone 2002; Trautmann et al. 2006; Zinkernagel 2002). LCMV infection of the mouse can result in very different phenotypic manifestations depending on both viral and host factors (Oldstone 2002; Zinkernagel 2002), which provides investigators with a robust experimental system to examine both host immune mechanisms of virus control and viral counteracting strategies.

3 Molecular and Cell Biology of Mammarenaviruses

3.1 Mammarenavirus Genome Organization

Mammarenaviruses are enveloped viruses with a bi-segmented negative-stranded (NS) RNA genome (Buchmeier et al. 2007). Virions are pleomorphic, ranging in size from 40 to more than 200 nm in diameter. Both the L (ca 7.3 kb) and S (ca 3.5 kb) genome segments use an ambisense coding strategy to express two viral proteins in opposite orientation, separated by a non-coding intergenic region (IGR) that serves as a bona fide transcription termination signal (Fig. 3). The S and L IGRs differ in both sequence and predicted structure, but each is highly conserved among strains of the same mammarenavirus species. The S segment encodes the viral NP and the glycoprotein precursor (GPC), which is co- and post-translationally cleaved by the cellular signal peptidase and site 1 protease (S1P), respectively, to yield a 58 amino acid-long stable signal peptide (SSP) and the two mature virion glycoproteins GP1 and GP2 that, together with the SSP, form the spikes that decorate the virus surface (Beyer et al. 2003; Lenz et al. 2001; Pinschewer et al. 2003). GP1 mediates virus receptor recognition and cell entry via endocytosis, whereas GP2 mediates the pH-dependent fusion event required to complete the entry process, which results in the release of the virus ribonucleoprotein (vRNP) into the cytoplasm of infected cells. The L segment encodes the viral RNA-dependent RNA polymerase (RdRp, or L polymerase) and the matrix Z protein (Perez et al. 2003; Strecker et al. 2003). Mammarenavirus genome 3′-termini have a high degree of sequence conservation (17 out of 19 nt are identical), and both L and S segments exhibit terminal complementarity between their 5′- and 3′-ends, which are predicted to form panhandle structures (Buchmeier et al. 2007). Consistent with the formation of predicted panhandle structures involving the genome termini, electron microscopy (EM) studies revealed the presence of circular vRNPs within virions (Young and Howard 1983). EM studies also revealed, within mammarenavirus virions, electron-dense structures that have been proposed to be host ribosomes (Muller et al. 1983).
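The two sequence properties mentioned above (conservation of the 3′-termini and complementarity between the 5′- and 3′-ends of a segment) can be made concrete with a short Python sketch. This is a minimal illustration, not a method from the cited studies: the function names are hypothetical, only strict Watson-Crick pairs are counted (G·U wobble pairs are ignored), and the sequences in the usage section are toy strings invented for the example, not real mammarenavirus termini.

```python
# Watson-Crick partners for RNA bases (G-U wobble pairs are ignored here).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def count_identical(a: str, b: str) -> int:
    """Count positions at which two equal-length termini carry the same base."""
    if len(a) != len(b):
        raise ValueError("termini must have the same length")
    return sum(x == y for x, y in zip(a.upper(), b.upper()))


def panhandle_pairs(segment: str, n: int = 19) -> int:
    """Count Watson-Crick pairs between the first n nt (5' end) and the
    last n nt (3' end) of a segment when the two ends are aligned antiparallel."""
    seq = segment.upper()
    five_prime = seq[:n]
    three_prime = seq[-n:]
    # Base i from the 5' end faces base i counted inward from the 3' end.
    return sum(COMPLEMENT[x] == y for x, y in zip(five_prime, reversed(three_prime)))


if __name__ == "__main__":
    # Toy 19-nt terminus, invented for illustration only.
    toy_5p = "ACGUACGUACGUACGUACG"
    toy_segment = toy_5p + "AAAA" + reverse_complement(toy_5p)
    print(panhandle_pairs(toy_segment))        # perfectly complementary ends

    # Two toy 3' termini that differ at two of 19 positions.
    term_a = reverse_complement(toy_5p)
    term_b = term_a[:3] + "C" + term_a[4:10] + "A" + term_a[11:]
    print(count_identical(term_a, term_b))
```

Counting only strict Watson-Crick pairs keeps the sketch simple; a real analysis of panhandle formation would normally rely on an RNA secondary structure prediction tool rather than a position-by-position comparison.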

Fig. 3 Mammarenavirus genome organization and virion structure. Mammarenaviruses are enveloped viruses with a bi-segmented negative-strand RNA genome. Each genome segment, S (ca 3.5 kb) and L (ca 7.3 kb), uses an ambisense coding strategy to direct the synthesis of two viral proteins. The S segment encodes the glycoprotein precursor (GPC) and the nucleoprotein (NP). The L segment encodes the virus RNA-dependent RNA polymerase (L) and the matrix protein (Z)

Mammarenavirus Genetic Diversity and Its Biological Implications

3.2 Mammarenavirus Life Cycle

Despite differences in their impact on the infected host, all known mammarenaviruses execute a similar life cycle (Fig. 4) to produce infectious progeny.

Cell entry: Mammarenaviruses enter cells via receptor-mediated endocytosis, a process that is completed by a pH-dependent fusion event between viral and cellular membranes within the acidic environment of the late endosome (Rojek and Kunz 2008; Rojek et al. 2008a, b). The ubiquitously expressed alpha-dystroglycan (αDG), a cell surface receptor for extracellular matrix proteins, serves as the main receptor for LCMV, LASV, and several other mammarenaviruses (Kunz et al. 2002, 2004). Posttranslational modifications of αDG, including glycosylation by like-acetylglucosaminyltransferase (LARGE), are required for its role as a mammarenavirus receptor (Kunz et al. 2005). Cell entry of OW mammarenaviruses follows an unusual endocytic pathway without the participation of Rab5/EEA-1 positive early

Fig. 4 Mammarenavirus life cycle. Mammarenaviruses enter cells via receptor-mediated endocytosis, a process initiated by the interaction between GP1 and the virus receptor present at the cell surface. The acidic environment of the late endosome promotes fusion between viral and host cell membranes, a process directed by the GP2 subunit of the virus GPC complex. Following the pH-dependent membrane fusion event, the vRNP is released into the cytoplasm where it directs the biosynthetic processes of replication and transcription of the viral genome. Viral assembly takes place in the cell cytoplasm and virions bud from the plasma membrane, with Z playing critical roles in both of these processes

endosomes, but with the involvement of the multivesicular body (MVB) and an endosomal sorting complex required for transport (ESCRT)-dependent process (Pasqual et al. 2011). It remains to be determined whether this unusual endocytic pathway reflects the natural cellular trafficking of αDG, or a novel route of αDG trafficking triggered upon interaction with the virus GP1 ligand. It should be noted that many NW arenaviruses use a different cell entry pathway, in which the human transferrin receptor 1 is used to initiate cell entry via clathrin-mediated endocytosis (Rojek et al. 2008b).

Expression and replication of the viral genome: RNA replication and gene transcription of the mammarenavirus genome, processes directed by the vRNP, take place in the cytoplasm (Buchmeier et al. 2007). The NP and L mRNAs are transcribed from RNA genome species, whereas the GPC and Z mRNAs are transcribed from the corresponding antigenome RNA species. Arenavirus mRNAs are non-polyadenylated and contain a 5’-cap structure obtained from cellular mRNAs via a cap-snatching mechanism (Morin et al. 2010). Transcription termination of viral mRNAs was mapped to the distal side of the IGRs, which act as bona fide transcription termination signals for the virus polymerase (Meyer et al. 2002; Meyer and Southern 1994; Tortorici et al. 2001; Pinschewer et al. 2005). NP and L are the only viral proteins required for efficient arenavirus RNA synthesis, whereas Z exerts a dose-dependent inhibitory effect on the activity of the mammarenavirus polymerase complex (Cornu and Torre 2001, 2002). Mammarenavirus L proteins have the characteristic motifs conserved among RdRp of negative-strand RNA viruses (Poch et al. 1989). LASV L protein was found to be organized into three distinct structural domains, and the N- and C-terminal regions can functionally trans-complement each other (Brunotte et al. 2011).
A modular organization of mammarenavirus L protein is supported by the EM characterization of MACV L protein, which revealed a core ring-domain decorated by an appendage (Kranzusch and Whelan 2011). LASV and MACV L polymerases have an overall architecture similar to influenza virus and bunyavirus polymerases (Peng et al. 2020). However, LASV and MACV polymerases exhibit some unique local features, including an insertion domain that regulates the polymerase activity. Moreover, in contrast to influenza virus and bunyavirus polymerases (Gerlach et al. 2015; Pflug et al. 2014), the active sites of LASV and MACV polymerases were switched on without the requirement for allosteric activation by the 5’-end of the viral RNA genome. In addition, consistent with previously reported genetic and biochemical evidence, L dimerization facilitates polymerase activity. NP is the main structural component of the vRNP responsible for directing viral RNA synthesis. NP was also shown to be a potent anti-type I interferon (IFN-I) viral factor (Martinez-Sobrido et al. 2006, 2007, 2009). Crystal structures of LASV NP (Qi et al. 2010) identified distinct N- and C-terminal domains connected by a flexible linker. The C-terminal domain of NP contains a functional 3’-5’ exoribonuclease (ExoN) domain of the DEDDH family (Qi et al. 2010; Hastie et al. 2011). The ExoN of LASV NP was proposed to be critical for NP’s anti-IFN activity but dispensable for the role of NP in RNA synthesis directed by the vRNP (Qi et al. 2010; Hastie et al. 2011). However, mammarenaviruses with

mutations that affect NP ExoN activity exhibit decreased fitness during replication in IFN-deficient cells (Martinez-Sobrido et al. 2009), supporting an important role of NP ExoN activity in virus multiplication.

Assembly and budding: Both GPC and Z were shown to be required for the formation and release of infectious virus-like particles (VLPs). Co- and post-translational processing of GPC generates the SSP and the mature GP1 and GP2 subunits (Pinschewer et al. 2003; Perez et al. 2003; Strecker et al. 2003; Kunz et al. 2003; Urata et al. 2006). GP1 and GP2 together with SSP form the functional GP complex. SSP has been implicated in the trafficking and function of the viral GP complex (Saunders et al. 2007; York and Nunberg 2006, 2007; York et al. 2004). As with the matrix protein of many other negative-strand RNA viruses, the mammarenavirus matrix Z protein plays a critical role in the assembly and budding of mature infectious virions (Perez et al. 2003; Strecker et al. 2003; Urata et al. 2006). Z budding activity is mediated by the late (L) domain motifs PTAP and PPPY, similar to those known to direct budding of several other viruses via interaction with specific host cell proteins (Perez et al. 2003; Strecker et al. 2003; Urata et al. 2006). Z myristoylation at the G2 position is required for its association with the plasma membrane, the location of arenavirus budding (Perez et al. 2004; Strecker et al. 2006). Z has been shown to interact with a number of host cell proteins, including PML and eIF4E; these interactions are thought to contribute to the non-cytolytic strategy of LCMV multiplication and to interference with cap-dependent translation of some cellular mRNAs, respectively (Borden et al. 1998; Campbell Dwyer et al. 2000; Djavani et al. 2001; Volpon et al. 2010).

4 Mammarenavirus Phylogenetic Relationships

Reassortment and recombination can cause incongruences in phylogenetic relationships among different segments of viral genomes. Accordingly, different phylogenies may be observed, depending on which genomic region is analyzed (Fig. 2). In all cases, though, OW and NW viruses form two distinct, monophyletic clusters (Charrel et al. 2008; Wolff et al. 1978). The Old World group is composed of 20 viral species. LASV, Mopeia, and Mobala viruses cluster in a monophyletic clade with Gairo and Luna viruses, and they are distantly related to the LCMV and Ippy virus clades (Fig. 2). Most of the mammarenaviruses sampled in Asian rodents (i.e., Lìjiāng virus, Loei River virus, and Wēnzhōu virus) cluster together, suggesting a possible common ancestor, and they are more closely related to the LASV clade viruses than to other African viruses (Fig. 2) (Wang et al. 2019). An exception is Ryukyu virus, which does not cluster with the other Asian viruses, but with LCMV-related sequences (Wu et al. 2018a). Lujo virus (LUJV) (Briese et al. 2009) was discovered in 2008 as the causative agent of a cluster of viral HF cases in South Africa and, until the discovery of Alxa virus (ALXV) (Wu et al. 2018a), it was the most divergent OW mammarenavirus, with

some genomic regions similar to NW viruses. The detection of ALXV in Mongolian three-toed jerboas (Dipus sagitta, subfamily Dipodinae) and its placement in the OW mammarenavirus phylogenetic trees (Fig. 2) revealed that this group has a higher degree of genetic diversity than previously thought and, importantly, that the natural hosts of OW mammarenaviruses are not restricted to rodents in the Murinae subfamily (Wu et al. 2018b). This could be of great relevance in the surveillance for possible future spillovers. Based on their phylogenetic relationships, NW arenaviruses are usually divided into three or four major clades, A to D. Clade D is also referred to as RecA due to its possible origin following an ancient recombination event (Radoshitzky et al. 2015) (Fig. 2). Clade A is composed of viruses from both North America (e.g., Whitewater Arroyo virus and Bear Canyon virus) and South America (e.g., Pirital virus and Paraná virus). Clade B includes all known human pathogenic NW arenaviruses (i.e., Chapare virus, Junín virus, Guanarito virus, Machupo virus, and Sabiá virus). This represents a clear difference compared to the OW phylogeny, where pathogenic viruses (i.e., LCMV, LASV, and LUJV) are clearly separated on the tree and diverge from each other (Fig. 2). Nevertheless, clade B also includes non-pathogenic viruses (Fig. 2). Possibly, recombination shaped the evolution of NW mammarenaviruses in this clade. In fact, the nucleocapsid and glycoprotein genes of North American viruses have divergent phylogenetic histories. Based on the NP tree, these three viruses were placed into clade A, whereas GPC analysis placed them within clade B (Charrel et al. 2008). Finally, clade C is the smallest one and is formed by Latino virus and Oliveros virus. Recently, a new mammarenavirus was discovered in Musser’s bristly mouse (Neacomys musseri) in the Amazon region (Fernandes et al. 2018) and was named Xapuri virus (species Xapuri mammarenavirus, XAPV).
Phylogenetic analyses showed that this virus forms an independent clade closely related to NW clades C and B (Fernandes et al. 2018). XAPV is a sister group of clade C viruses when NP is used for reconstructing the phylogenetic tree, but a sister lineage of clade B viruses based on L gene tree topology (Fig. 2). These findings suggest that this virus is the result of a reassortment or recombination between viruses in the two clades (Fernandes et al. 2018).
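The kind of incongruence described for XAPV can be made concrete with a small sketch that lists the clades (sets of tips) implied by two gene trees and reports those found in only one of them. The nested-tuple tree encoding, the function names, and the toy topologies are assumptions for illustration; only the sister relationships of XAPV (clade C in the NP tree, clade B in the L tree) are taken from the text, and the remaining branching order is arbitrary.

```python
# Illustrative sketch: trees are nested tuples, tips are strings.
def clades(tree):
    """Return the set of tip sets (one frozenset per internal node)."""
    found = set()

    def tips(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(tips(child) for child in node))
        found.add(members)
        return members

    tips(tree)
    return found


def conflicting_clades(tree_a, tree_b):
    """Clades present in exactly one of the two trees."""
    return clades(tree_a) ^ clades(tree_b)


# Toy topologies (assumed): XAPV groups with clade C in NP, with clade B in L.
np_tree = ((("XAPV", "clade C"), "clade B"), "clade A")
l_tree = ((("XAPV", "clade B"), "clade C"), "clade A")

for clade in sorted(conflicting_clades(np_tree, l_tree), key=sorted):
    print(sorted(clade))
```

Here the two trees disagree only on the grouping that contains XAPV, which is the pattern that motivates a reassortment or recombination hypothesis. Real analyses rely on statistical support for each topology rather than on a simple comparison of clade sets.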

5 Mechanisms of Mammarenavirus Genetic Diversity

5.1 Mammarenavirus Mutation Frequencies

Early studies documented that LCMV variants resistant to neutralizing antibodies (Ciurea et al. 2000, 2001) and to CTL, both in vivo (Pircher et al. 1990) and in cell culture (Aebischer et al. 1991; Lewicki et al. 1995), could be readily selected, a finding consistent with the well-established high mutation rates of RNA viruses due to their error-prone replication machinery (Domingo 2007). Studies examining the

variability of LCMV populations during passages in BHK-21 cells found mutation frequencies in the range of 1.0 × 10⁻⁴ to 2.7 × 10⁻⁴ after one passage and 1.0 × 10⁻⁴ to 5.7 × 10⁻⁴ after nine passages (Grande-Perez et al. 2002). These studies revealed differences in mutation frequencies, as determined by Shannon entropy, among different regions of the viral genome (Domingo 2006). Similar mutation frequencies have been reported during LCMV persistence in BHK-21 cells (Grande-Perez et al. 2005a). These findings are consistent with the known genetic variability of RNA viruses (Domingo 2007) and support the view of a quasispecies genetic structure of mammarenavirus populations subjected to natural selection and genetic drift (Sevilla and Torre 2006).
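As an illustration of how such quantities are typically obtained from a set of aligned molecular clones, the sketch below computes a mutation frequency (mutations divided by total nucleotides sequenced, relative to a consensus) and a normalized Shannon entropy (the entropy of the haplotype distribution divided by the logarithm of the number of clones). These definitions are the ones commonly used in the quasispecies literature and are assumed here, since the text does not spell them out; the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def mutation_frequency(clones, consensus):
    """Mutations per nucleotide sequenced, relative to the consensus."""
    if any(len(clone) != len(consensus) for clone in clones):
        raise ValueError("clones must be aligned to the consensus length")
    mutations = sum(a != b for clone in clones for a, b in zip(clone, consensus))
    nucleotides = sum(len(clone) for clone in clones)
    return mutations / nucleotides


def normalized_shannon_entropy(clones):
    """-sum(p_i * ln p_i) / ln N, with p_i the haplotype frequencies
    and N the number of clones analyzed (0 = clonal, 1 = all distinct)."""
    n = len(clones)
    if n < 2:
        return 0.0
    counts = Counter(clones)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


# Invented 10-nt alignment: two consensus clones and two single mutants.
consensus = "ACGUACGUAC"
clones = ["ACGUACGUAC", "ACGUACGUAC", "ACGUACGUAG", "UCGUACGUAC"]
print(mutation_frequency(clones, consensus))
print(normalized_shannon_entropy(clones))
```

With real data, the reported frequencies of about 10⁻⁴ substitutions per nucleotide would require sequencing tens of thousands of nucleotides to observe even a handful of mutations, which is why clone numbers and sequenced lengths matter when comparing genomic regions.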

5.2 RNA Recombination in Mammarenaviruses

Replication of the genome of NS RNA viruses takes place in the context of the vRNP, which poses a barrier for “copy choice” recombination, where the RdRp switches from the donor template RNA to the acceptor template RNA. However, phylogenetic analysis based on GPC and NP sequences for clades A and B South American and North American mammarenaviruses revealed some discrepancies that were interpreted as evidence that a recombination event between members of clades A and B led to the emergence of the North American mammarenavirus A/Rec lineage (Archer and Rico-Hesse 2002; Charrel et al. 2001, 2002, 2003; Fulhorst et al. 2001). This interpretation should be considered cautiously, as GP1 is highly variable in clade B and the A/Rec lineage, which may introduce an important confounding factor into the phylogenetic analysis (Emonet et al. 2009). Nevertheless, the phylogenetic discrepancies remain when the analysis is done using only the most conserved GPC regions (Charrel et al. 2008; Emonet et al. 2009), but it is still plausible that the recombination signal in these phylogenetic studies was influenced by specific selection pressures (Cajimat et al. 2007). The lack of A/Rec isolates in South America, and of clade A or clade B isolates in North America, raises further questions about how this recombination event occurred. It cannot be ruled out that both parental mammarenaviruses were originally present in North America but disappeared. Recombination requires cells to be co-infected by two different mammarenaviruses. Co-infection with two different mammarenaviruses has been reported in cultured cells (Ellenberg et al. 2004, 2007; Lukashevich et al. 2005; Moreno et al. 2012a; Riviere et al. 1986), but its occurrence in nature may be highly restricted by the superinfection exclusion feature associated with mammarenavirus infection (Ellenberg et al. 2004, 2007; Lukashevich et al. 2005; Moreno et al. 2012a; Riviere et al. 1986).
Several mammarenaviruses, like Guanarito (GTOV) and Pirital (PIRV) viruses in Venezuela, share a common geographic localization (Fulhorst et al. 1999; Weaver et al. 2000). Likewise, different mammarenaviruses can infect the same rodent species (Weaver et al. 2000). However, recombinants between GTOV and PIRV have not been observed (Cajimat and Fulhorst 2004; Fulhorst et al. 2008),

which may reflect mammarenavirus superinfection exclusion (Ellenberg et al. 2004, 2007; Damonte et al. 1983). In summary, consistent with findings with other NS RNA viruses, recombination appears to be a very rare or even absent event in mammarenaviruses. The putative mammarenavirus recombination events documented in the literature should be carefully assessed to rule out laboratory contamination or flawed bioinformatics methods. Intriguingly, recombination appears to be a very common event among members of the genus Reptarenavirus, which includes recently discovered arenaviruses infecting snakes in captivity (Stenglein et al. 2012, 2015). Reptarenaviruses and mammarenaviruses have similar genome organization and coding capability. Although current knowledge about reptarenavirus molecular biology is limited, reptarenavirus RNA synthesis would be expected to be directed by a vRNP with characteristics similar to those of the mammarenavirus vRNP. This raises intriguing questions about the mechanisms by which the reptarenavirus, but not the mammarenavirus, polymerase appears to be able to switch RNA templates.

5.3 Mammarenavirus Genomic Reassortments

Generation of reassortant mammarenaviruses requires a coinfection event that can be experimentally studied in cultured cells (Ellenberg et al. 2004, 2007; Lukashevich et al. 2005; Moreno et al. 2012a; Riviere et al. 1986; Lukashevich 1992; Moshkoff et al. 2007; Riviere and Oldstone 1986). However, phylogenetic analysis of NW and OW mammarenaviruses has failed to uncover evidence of genomic reassortment events during their evolution (Charrel et al. 2003). A possible exception is a mammarenavirus identified in a cluster of transplant-associated fatal cases of LCM infection (Palacios et al. 2008). Phylogenetic analysis revealed a new arenavirus, in which the S segment was closest to the LE strain of LCMV, whereas the L segment had a closer relationship to Kodoko virus. Strain LE was isolated in France from an infected human fetus, whereas Kodoko virus was isolated from wild mice in Africa, raising intriguing questions about the generation of this putative reassortant mammarenavirus. The documented robust superinfection exclusion (Ellenberg et al. 2004, 2007; Moreno et al. 2012a) can contribute to the low reassortant frequency observed in mammarenaviruses. Moreover, the generation of viable reassortants requires compatibility within the complex network of signals, together with RNA–RNA and RNA–protein interactions, that mediate genome packaging. Therefore, strain-specific differences involving the sequences or structures of RNAs and packaging proteins of co-infecting parent viruses can pose strong restrictions to the generation of reassortant progeny during coinfection. In addition, the rise of a new reassortant mammarenavirus to levels that can be readily detected in the viral population would require the reassortant to be endowed with some degree of increased viral fitness.

Mammarenavirus genome segment reassortment in nature appears to be restricted to certain combinations of L and S segments (Riviere et al. 1986; Lukashevich 1992), and recombination between mammarenaviruses is extremely infrequent (Archer and Rico-Hesse 2002; Charrel et al. 2002). Therefore, mutations during genome replication are thought to be the main driving force of mammarenavirus genetic variability.

6 Mammarenavirus Origin and Geographic Distribution

Mammarenaviruses have distinct geographic distributions in the Old World and in the New World. The vast majority of Old World mammarenaviruses are limited to the African continent, with the exceptions of LCMV and the five known Asian mammarenaviruses. The global distribution of LCMV is thought to result from anthropogenic factors and from the commensal behavior of the house mouse, which facilitated LCMV dispersal during long-range and short-range human travel (Albarino et al. 2010; Forni et al. 2018). It is worth mentioning that LCMV has been identified in the African continent only recently (Nd et al. 2015). Phylogenetic analyses suggested that its introduction possibly occurred in North America, supporting the idea that human movements carried to Africa both the virus and its reservoir (Forni et al. 2018). The first Asian mammarenavirus (Wēnzhōu virus, species Wenzhou mammarenavirus, WENV) was discovered in 2014 in the Zhejiang region in China (Li et al. 2015). After that, related viruses were found in several rodent species in China and Cambodia (Wang et al. 2019, 2021; Blasdell et al. 2008). Cambodian viruses were also suspected to be associated with human infection and disease, but this possibility awaits confirmation (Blasdell et al. 2008). Together with the Cambodian WENV strains, another virus (Loei River virus, LORV) was discovered in Thailand in bandicoot rats (subfamily Murinae) (Blasdell et al. 2008), suggesting again a wider distribution of mammarenaviruses than previously thought, both in terms of geography and of host associations. Phylogeographic analysis suggests that WENV and LORV have an African origin and may have reached Asia in relatively recent times (Forni et al. 2018). These observations, as well as phylogeographic inference, indicated that OW mammarenaviruses originated in Africa (Forni et al. 2018).
RVKV and ALXV mammarenaviruses were discovered in 2018 in China, and ALXV falls basal to all OW mammarenaviruses, suggesting that at least some OW mammarenaviruses shared an Asian ancestor. The New World arenaviruses are present in North and South America, but Latin America is the area hosting the largest diversity of mammarenavirus species. To date, Brazil is the country with the highest number of known mammarenavirus species (Fernandes et al. 2018; Forni et al. 2018), with several of them (Amaparí virus, Cupixi virus, Flexal virus, Xapuri virus) detected in the Amazon Basin region. In line with these observations, phylogeographic analyses indicated Latin America as the most likely region where NW mammarenaviruses originated. However, phylogeographic

inference is very sensitive to the exclusion of un-sampled extant species, and the identification of novel mammarenaviruses in distinct geographic areas may challenge the inference of a South American origin for the NW group. Mammarenaviruses exhibit relatively narrow geographic ranges, and several mammarenaviruses tend to be confined to specific countries or even regions within a country (Pontremoli et al. 2019). For instance, Gairo, Morogoro, and Luna viruses have specific ranges within Tanzania (Cuypers et al. 2020), whereas LASV is confined to specific regions in West Africa, and Kitale virus is the only mammarenavirus identified in Kenya to date, despite the large rodent diversity in this country (Onyuok et al. 2019). Likewise, most NW mammarenavirus species are each found mainly in one country. This applies to species that cause disease in humans, including JUNV (Argentina), Chapare and Machupo viruses (Bolivia), Sabiá virus (Brazil), and Guanarito virus (Venezuela) (Forni et al. 2018). The reasons why these viruses tend to remain confined to specific regions remain to be fully investigated, as the rodent reservoirs often display broad distributions across countries and continents.

7 Co-evolution of Mammarenaviruses and Their Natural Reservoirs: Intra- versus Inter-host Genetic Variation of Mammarenaviruses

As with their geographic distribution, the host range of most mammarenaviruses is relatively narrow. Rodents of the subfamily Murinae are the main reservoirs of OW mammarenaviruses, and rodents of the subfamilies Sigmodontinae and Neotominae are the main reservoirs of NW mammarenaviruses (Gonzalez et al. 2007). TCRV is the only exception, as the virus has only been detected in bats. The specific association of mammarenaviruses with their rodent reservoirs has often been considered as evidence of co-divergence and co-speciation. However, cophylogenetic studies examining the agreement between the evolutionary histories of mammarenaviruses and their hosts have revealed that, for both OW and NW mammarenaviruses, co-divergence events do not represent the major determinant of virus-host associations, and host switches were found to be common (Forni et al. 2018; Irwin et al. 2012). It is thus possible that the geographic distribution of their hosts, rather than host genetic relatedness, explains the preferential associations of mammarenaviruses with rodent subfamilies. Geographic proximity and range overlap of rodent subfamilies may represent an important factor for host switches (Irwin et al. 2012). Nonetheless, field studies on African mammarenaviruses have indicated a potential association with specific subtaxa of the same host. Such an association was shown to occur irrespective of rodent taxa mixing in contact zones and, thus, of ecological opportunity for host switches (see also below). Whether these observations reflect very close and specific associations between mammarenaviruses and their hosts or some other processes remains to be determined.

8 Contribution of Genetic Variability to Mammarenavirus Pathogenesis

8.1 Contribution of Viral Quasispecies to Arenavirus Pathogenesis

Because of their low copy-fidelity replication machinery, RNA viruses replicate within infected cells as quasispecies (Domingo 1992; Domingo et al. 2001). Accordingly, studies with LCMV found mutation frequencies of (1 to 5.7) × 10⁻⁴ substitutions per nucleotide for all coding regions (Grande-Perez et al. 2002, 2005b; Martin et al. 2010; Moreno et al. 2011), which are consistent with findings reported for other mammarenaviruses (Fulhorst et al. 2001; Weaver et al. 2000, 2001; Bowen et al. 1996, 1997, 2000; Charrel and Lamballerie 2003; Garcia et al. 2000) and within the range commonly observed for riboviruses (Domingo et al. 1988; Drake and Holland 1999). The quasispecies genetic structure, and the underlying error-prone replication machinery, provide riboviruses with their capability for rapid evolution and adaptation to different environments (Novella et al. 1995). It should be noted that the quasispecies, rather than individual variants, is the target of selection (Perales et al. 2005), and cooperation and complementation between variants in the quasispecies can allow for the maintenance of low fitness variants within the population that can interfere with variants of higher fitness (Martin et al. 2010; Torre and Holland 1990; Gonzalez-Lopez et al. 2004; Mas et al. 2010; Perales et al. 2007; Zapata and Salvato 2013). Recombination (Charrel et al. 2001; Fulhorst et al. 1996) and genome reassortment (Riviere et al. 1986; Riviere and Oldstone 1986; Riviere 1987) can contribute to mammarenavirus genetic diversity. However, these two processes appear to occur very rarely in nature and to involve genetically closely related mammarenaviruses (Archer and Rico-Hesse 2002; Charrel et al. 2002; Emonet et al. 2006; Jay et al. 2005).
Mammarenavirus genetic diversity is also likely influenced by their ability to infect a wide variety of hosts (Zapata and Salvato 2013), and mammarenavirus host range and pathogenicity appear to reflect the genetic diversity within and between mammarenaviruses (Blasdell et al. 2008). Evidence indicates that differences in quasispecies structure contribute to biological properties and pathogenesis of mammarenaviruses (Sevilla et al. 2002), as well as virus spread, both within the host and between species (Sevilla et al. 2002; Oldstone and Campbell 2010). Viral variants present at low frequency in the quasispecies can rise to dominance under specific selective pressures, as documented for rabies virus (Morimoto et al. 1998), coxsackieviruses (Beaucourt et al. 2011; Domingo et al. 2008), poliovirus (PV) (Pfeiffer and Kirkegaard 2005; Vignuzzi et al. 2006), foot-and-mouth disease virus (FMDV) (Haydon et al. 2001; Martin and Domingo 2008) and human parainfluenza virus (Prince et al. 2001). Tissue-specific selection of LCMV variants with distinct pathogenic properties has been documented (Ahmed and Oldstone 1988; Ahmed et al. 1984). Likewise, serial

passages of LCMV in cultured cells can result in the emergence of variants with altered pathogenicity (Lukashevich 1992; Hotchin and Sikora 1973; Pulkkinen and Pfau 1970).
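How quickly a minority variant can rise to dominance under a constant selective pressure can be sketched with a deterministic two-variant competition model. This is a deliberate simplification and an assumption of this example, not the quasispecies equations: it ignores mutation, complementation, and interference between variants, and the starting frequency and relative fitness used below are invented values.

```python
# Illustrative sketch: deterministic competition between a minority variant
# and the rest of the population, with constant relative fitness w.
def variant_frequency(p0: float, w: float, passages: int) -> float:
    """Frequency of the variant after a number of passages.

    The odds p/(1-p) are multiplied by w at every passage.
    """
    odds = p0 / (1.0 - p0) * w ** passages
    return odds / (1.0 + odds)


def passages_to_dominance(p0: float, w: float, threshold: float = 0.5) -> int:
    """Smallest number of passages after which the variant exceeds threshold."""
    if not 0.0 < p0 < 1.0 or not 0.0 < threshold < 1.0:
        raise ValueError("p0 and threshold must lie strictly between 0 and 1")
    if w <= 1.0:
        raise ValueError("the variant needs a relative fitness above 1")
    passages = 0
    while variant_frequency(p0, w, passages) <= threshold:
        passages += 1
    return passages


# Hypothetical values: a variant at 0.1% with a twofold fitness advantage.
print(passages_to_dominance(p0=0.001, w=2.0))
```

Under these assumed values the variant becomes the majority within about ten passages, which illustrates why variants that are undetectable by consensus sequencing can still shape the outcome of selection.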

8.2 Selection of Immunosuppressive Variants During LCMV Persistence

Persistent infections can facilitate the emergence and selection over time of novel viral phenotypes that might have an impact on the course and outcome of infection (Ahmed et al. 1984). Both viral and host factors influence intra-host viral evolution that can result in the selection of tissue-specific virus populations (Ahmed and Oldstone 1988; Brown et al. 2011; Cabot et al. 2000; Cheng-Mayer et al. 1989; Deforges et al. 2004; Hall et al. 2001; Jelcic et al. 2004; Jridi et al. 2006; Sanjuan et al. 2004; Trivedi et al. 1994; Wright et al. 2011). LCMV infection of the mouse offers a robust experimental model to study viral population dynamics in vivo (Ahmed et al. 1988; Evans et al. 1994). Infection of adult mice with the Armstrong (ARM) strain of LCMV causes an acute infection that is cleared within days by a strong virus-specific CD8+ T cell response. In contrast, infection under similar conditions with the immunosuppressive clone 13 (Cl-13) variant of the LCMV ARM strain causes a persistent infection that is associated with generalized immune suppression (Ahmed and Oldstone 1988; Ahmed et al. 1984, 1991; Evans et al. 1994; Baranowski et al. 2000; Tishon et al. 1993; Wu-Hsieh et al. 1988). Phenotypic differences between ARM (CTL+/PI−) and Cl-13 (CTL−/PI+) correlate with differences in cell tropism at early times of infection. In the spleen, ARM targets macrophages of the red pulp, whereas Cl-13 preferentially infects dendritic cells (DCs) of the white pulp (Borrow et al. 1995; Sevilla et al. 2000). ARM and Cl-13 differ at three amino acid positions, one in the L polymerase [K(ARM)-1079-Q(Cl-13)] and two within GP1 [N(ARM)-176-D(Cl-13) and F(ARM)-260-L(Cl-13)]. Mutation F260L provided Cl-13 GP1 with a strong binding affinity for its receptor alpha-dystroglycan (αDG), which facilitates infection of DCs by Cl-13 (Kunz et al. 2002; Cao et al. 1998).
In addition to mutation F260L in GP1, mutation K1079Q in the L polymerase was found to play a critical role in the persistence phenotype of Cl-13 (Bergthaler et al. 2010; Sullivan et al. 2011). Notably, different L mutations in combination with mutation F260L in GP1 are able to support Cl-13 persistence, as the L mutation K1079Q was found in only 19% of viral isolates from persistently infected mice (Sevilla et al. 2000).

8.3 LCMV Variants and Growth Hormone Deficiency Syndrome

C3He mice infected at birth with certain strains of LCMV succumb within 15–20 days of infection due to a growth hormone deficiency syndrome (GHDS) manifested by growth retardation and hypoglycaemia (Oldstone 2002; Oldstone et al. 1984, 1985). Development of the GHDS correlated with the virus’s ability to infect and replicate in GH-producing cells in the anterior pituitary, which was associated with decreased levels of growth hormone (GH) mRNA and protein (Klavinskis and Oldstone 1989; Valsamakis et al. 1987). The use of reassortant viruses mapped the virus-induced GHDS to the S genome segment (Riviere 1987). As expected, clonal analysis of the WE strain of LCMV, which does not cause GHDS in C3He mice, showed that most (58/61) of the clones tested did not cause GHDS. However, three of the clones tested caused the characteristic GHDS in C3He mice (Buesa-Gomez et al. 1996). Notably, the WEc54 (GHDS+) and WEc2.5 (GHDS−) clonal populations differed in a single amino acid at position 153 in GP1. These findings illustrated that virulent (GHDS+) variants can be present within a non-virulent (GHDS−) viral population, and how minor changes in the virus genome may have a great impact on the biology of the infected host (Buesa-Gomez et al. 1996; Teng et al. 1996).

9 Lassa Virus: Origin, Evolution, and Contribution of Genetic Variability to Detection and Pathogenesis

LASV was first identified in 1969 (Frame et al. 1970). However, molecular dating approaches have indicated that the virus originated in Nigeria more than 1000 years ago and gradually moved westwards, probably afflicting human populations ever since (Forni et al. 2018; Andersen et al. 2015; Manning et al. 2015). At present, LASV is endemic in large areas of West Africa, including Nigeria, Guinea, Liberia, and Sierra Leone (https://www.cdc.gov/vhf/lassa/index.html) (Happi et al. 2019). Sporadic LF cases have been reported in nearby countries such as Benin and Togo (Salu et al. 2019; Whitmer et al. 2018). Phylogenetic and epidemiological data indicate that human-to-human transmission of LASV is rare and most human infections are transmitted by rodents (Pontremoli et al. 2019; Andersen et al. 2015; Charrel and Lamballerie 2010). Accordingly, phylogenetic analyses showed that human- and rodent-derived genomes are interspersed (Fig. 5a), and sequences sampled over the same time-period do not necessarily cluster together (Pontremoli et al. 2019; Andersen et al. 2015; Kafetzopoulou et al. 2019; Siddle et al. 2018). These observations are consistent with most human cases being the result of independent spillover events from the rodent reservoir. Thus, LASV genetic diversity and evolution are largely determined by the circulation in its natural rodent reservoirs. The Natal multimammate mouse (Mastomys natalensis), a commensal rodent, is the main natural reservoir of LASV (Bonwitt et al. 2017; Fichet-Calvet and Rogers


M. Sironi et al.

2009). However, other, non-commensal rodents, including the African wood mouse (Hylomyscus pamfi), the Guinea multimammate mouse (Mastomys erythroleucus), and the pygmy mouse (Mus baoulei), can serve as alternative hosts for LASV (Olayemi et al. 2016; Yadouleton et al. 2019). Natal multimammate mice are widespread throughout Sub-Saharan Africa (Colangelo et al. 2013). Nonetheless, LASV seems to be confined to West Africa, with no reports of LF cases east of Nigeria (Pontremoli

Mammarenavirus Genetic Diversity and Its Biological Implications


Fig. 5 Geographic clustering of LASV genomes. a A phylogeny of 280 LASV sequences (L segments) was constructed using RAxML (Stamatakis 2014), with the reference Mobala virus sequence (L: NC_007904) as the outgroup (red). Color codes denote the geographic origin (see map in panel b and legend). Lineage and sublineage nomenclature is in accordance with previous work (Ehichioya et al. 2019). Sequences sampled from rodents are indicated with an asterisk. b Discriminant analysis of principal components (DAPC) (Jombart et al. 2010) of LASV genomes. DAPC allows the identification of clusters of genetically related samples. Specifically, it searches for allele combinations that maximize the variation among clusters while minimizing the variation within clusters. In DAPC, data are first transformed using principal component analysis (PCA), the optimal number of clusters is identified, and then each sample is assigned to one of these clusters. In the figure, the first two discriminant functions, which explain the majority of variance in the data, are plotted. Distance on the plot is proportional to the genetic distance between clusters. Samples are color-coded based on their geographic origin, as shown in the map of West Africa. DAPC data are from previous work (Forni and Sironi 2020)

et al. 2019). This is thought to reflect a limited dispersal range of reservoir rodents, constrained by natural barriers such as rivers and ridges (Brouat et al. 2007; Russo et al. 2016). An alternative hypothesis is that, as with other OW mammarenaviruses (Cuypers et al. 2020; Gryseels et al. 2017), LASV is preferentially associated with specific host subtaxa, and distinct M. natalensis subtaxa are present east and west of Nigeria (Olayemi et al. 2016; Gryseels et al. 2017). This possibility, which implies that host-specific factors restrict the host range of LASV, is, however, at odds with the fact that the virus can infect rodents of species other than M. natalensis. Consistent with the relatively narrow geographic range of LASV, large-scale analyses have indicated that viral genetic diversity is geographically structured (Pontremoli et al. 2019; Bowen et al. 2000; Andersen et al. 2015; Kafetzopoulou et al. 2019; Siddle et al. 2018; Ehichioya et al. 2019, 2011; Forni and Sironi 2020; Wiley et al. 2019) (Fig. 5). To date, LASV isolates have been classified into seven lineages. LASV strains circulating in Nigeria belong to lineages I, II, and III, whereas members of lineage IV are present in Liberia, Guinea, and Sierra Leone (Bowen et al. 2000; Andersen et al. 2015; Kafetzopoulou et al. 2019; Siddle et al. 2018; Ehichioya et al. 2019, 2011; Forni and Sironi 2020; Wiley et al. 2019). Members of lineage V are present in Mali and the Ivory Coast (Manning et al. 2015). Lineage VI consists of a limited number of sequences identified in Nigeria (Olayemi et al. 2016; Ehichioya et al. 2019), whereas lineage VII presently includes only two sequences from Togo (Whitmer et al. 2018) (Fig. 5a). The higher LASV diversity in Nigeria, together with phylogeographic analyses, supports the view that LASV originated in Nigeria and moved westward (Bowen et al. 2000; Andersen et al. 2015; Manning et al. 2015).
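The criterion described in the legend of Fig. 5b, maximizing variation among clusters while minimizing variation within them, can be illustrated in one dimension. What follows is a minimal sketch under stated assumptions, not the DAPC procedure used to produce the figure: the function name `between_within_ratio`, the sample coordinates, and the "east"/"west" labels are all hypothetical, and a full analysis would first reduce the genotype matrix by PCA and then search for the projections that maximize this ratio.

```python
from collections import defaultdict


def between_within_ratio(values, labels):
    """Between-cluster sum of squares divided by within-cluster sum of squares.

    Works on 1-D coordinates; raises ZeroDivisionError if every cluster
    has zero internal spread.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    grand_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        between += len(members) * (mean - grand_mean) ** 2
        within += sum((x - mean) ** 2 for x in members)
    return between / within


# Hypothetical coordinates of six samples along two candidate axes.
labels = ["east", "east", "east", "west", "west", "west"]
axis_separating = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
axis_mixing = [0.0, 10.0, 2.0, 1.0, 11.0, 12.0]

ratio_separating = between_within_ratio(axis_separating, labels)
ratio_mixing = between_within_ratio(axis_mixing, labels)
```

An axis along which samples from the same region sit close together and the two regions sit far apart yields a much larger ratio than an axis along which the regions overlap, which is the sense in which a discriminant function "separates" clusters.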
LASV populations exhibit strong spatial clustering, with two major diversity gradients, East–West and North–South (Fig. 5b). Current LASV lineages appear to have diverged from an ancestral population present in Nigeria, but other locations cannot be ruled out. Genetic ancestry analysis supports a single, non-recent introduction of LASV into the Mano River Union, likely driven by westward rodent migration (Bowen et al. 2000; Andersen et al. 2015). The viral populations showing the strongest genetic drift are instead those transmitted


in Mali and the Ivory Coast (lineage V). In general, high levels of genetic drift are consistent with population bottlenecks and/or founder effects. These observations are thus consistent with molecular dating analyses suggesting that LASV was introduced into these areas during colonial times, most likely by the human-mediated transportation of commensal and/or semi-commensal species (Manning et al. 2015; Wiley et al. 2019). It is possible that anthropogenic changes, such as changes in the population sizes of multimammate mice during armed conflicts in Sierra Leone and Nigeria, have also influenced the spatial distribution and divergence of LASV lineages (Lalis et al. 2012).

The impact of LASV genetic divergence on pathogenicity and human disease remains poorly understood. Epidemiological data indicate that a low proportion (≤20%) of people infected with LASV develop LF, and among hospitalized confirmed cases of LF, the fatality rate is around 15% (https://www.who.int/news-room/fact-sheets/detail/lassa-fever). As with many other viral infections, both viral and host factors are likely to determine the outcome of LASV infection, including the severity of disease symptoms, but data are presently limited (Pontremoli et al. 2019; Duvignaud et al. 2021; Okokhere et al. 2018; Wauquier et al. 2019). It should be noted that the vast majority of human-derived LASV genomes that have been analyzed were obtained from patients who developed LF symptoms, and most of them were infected with strains from lineages II and IV. This prevents an accurate assessment of possible differences in pathogenic potential among members of different lineages. Moreover, disparities in time to diagnosis and treatment might influence disease outcomes. Nevertheless, an analysis of LF cases between 2008 and 2012 found a higher case fatality at Kenema Government Hospital (Sierra Leone) than at Irrua Specialist Teaching Hospital (Nigeria), despite similar available treatments (Andersen et al. 2015).
A high fatality rate was also documented in a cluster of LF cases that originated in Jalingo (Taraba State, Nigeria) and were treated at Irrua Specialist Teaching Hospital (Okokhere et al. 2018). These findings suggest that viral lineages and sublineages transmitted in specific areas might carry distinct virulence determinants.

LASV genetic diversity has important implications for the development of LASV detection assays and LF diagnostics, which are central to surveillance strategies and patient management. To date, the gold standard for LASV detection in clinical samples is RT-PCR (Nikisins et al. 2015). Early RT-PCR assays were designed with primer sequences based on the Josiah strain from Sierra Leone, but performed poorly with sequences from other countries (Demby et al. 1994; Olschlager et al. 2010; Trappier et al. 1993). Several RT-PCR assays have been developed, mainly for research purposes and with variable performance. In 2015, the European Network for Diagnostics of “Imported” Viral Diseases (ENIVD) released the data of a quality assessment study for molecular detection of LASV (Nikisins et al. 2015). The 24 participating laboratories received a panel of 13 samples containing various concentrations of inactivated LASV strains from different geographic origins: Josiah (Sierra Leone), Lib-1580/121 (Liberia), CSF (Nigeria), and AV (most likely Ivory Coast). The laboratories used different protocols, either published or in-house. Results indicated that 54% of participants had no false-negative results, but 46% had at least


one, with 29% of laboratories reporting 3–5 false-negative results. The strain most commonly undetected across laboratories was the one from Liberia, possibly because knowledge of LASV diversity in this country was very limited at the time of the quality assessment. In fact, a recent sequencing effort of LASV in Liberia (Wiley et al. 2019) revealed that circulating strains are genetically divergent from those sampled elsewhere, and indicated that a number of mismatches were located in the target sites of many published assays. In general, Wiley and coworkers noted that there was a correlation between the number of mismatches in the target region and the ability to detect dilutions of Liberian strains. Their analysis underscored the need to develop alternative assays and to use assay combinations to increase detection rates. Indeed, during a large-scale sequencing of LASV in Nigeria, two assays that did not give fully concordant results were combined to identify positive samples (Kafetzopoulou et al. 2019). In addition to the problems posed by LASV genetic diversity, RT-PCR is difficult to deploy in resource-limited areas because of its cost and relative complexity. Alternative detection systems have been developed, including a Pan-Lassa Antigen Rapid Test (Pan-Lassa RDT) and CRISPR-Cas13a-based assays (Barnes et al. 2020; Boisen et al. 2020). The Pan-Lassa RDT uses a mixture of polyclonal antibodies against the nucleoproteins of representative strains from the three most prevalent LASV lineages (II, III and IV), and it showed good performance and sensitivity when tested during the 2018 Lassa fever outbreak in Nigeria (Boisen et al. 2020). The CRISPR-Cas13a-based assays require only isothermal amplification (eliminating the need for thermal cycling) and were designed to detect lineage II and lineage IV viruses with high specificity (Barnes et al. 2020).
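The relationship noted by Wiley and coworkers, between mismatches in an assay target region and detection, rests on a simple count of primer-template differences. A minimal sketch of such a count is given here for illustration. The function name `count_mismatches`, the primer, and the three target sites are hypothetical placeholders, not sequences from any published LASV assay, and the sketch assumes that the primer and target site have already been aligned on the same strand. A realistic evaluation would also need to consider mismatch position and degenerate bases.

```python
def count_mismatches(primer, target_site):
    """Count positions where a primer differs from an aligned target site.

    Both sequences are written 5'->3' on the same strand and must have the
    same length; alignment and reverse-complementing are assumed to be done.
    """
    if len(primer) != len(target_site):
        raise ValueError("primer and target site must have equal length")
    return sum(1 for p, t in zip(primer.upper(), target_site.upper()) if p != t)


# Hypothetical primer and target sites (not taken from any real assay).
primer = "ACGTTGCAAGGTCCATGA"
target_sites = {
    "strain_A": "ACGTTGCAAGGTCCATGA",  # identical to the primer
    "strain_B": "ACGTTGCAAGGTCTATGA",  # one substitution
    "strain_C": "ACATTGCGAGGTCTATGG",  # four substitutions
}

mismatches = {
    name: count_mismatches(primer, site) for name, site in target_sites.items()
}
```

Tabulating such counts per strain and per assay is one simple way to flag which published primer sets are most likely to underperform on newly sequenced, divergent strains.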

10 Implications of Mammarenavirus Genetic Variability for the Development of Vaccines and Antiviral Drugs

10.1 Mammarenavirus Genetic Variation and Vaccine Development

JUNV in the Argentinean Pampas and LASV in Western Africa are the two mammarenaviruses with the highest impact on human health. The use of the live-attenuated Candid#1 strain of JUNV within populations at high risk of exposure to JUNV resulted in a very significant reduction in cases of AHF (Enria and Barrera Oro 2002; Grant et al. 2012). In contrast, to date none of the LASV vaccine candidates has entered a clinical trial. Nevertheless, several LASV LAV candidates based on the use of different viral vector-based platforms have shown promising results in animal models of LF (Falzarano and Feldmann 2013; Lukashevich 2012). The high degree of genetic diversity observed among members of the different LASV lineages has raised the question of whether it is feasible to develop a pan-LASV vaccine able to protect against infections by members of the different currently identified


lineages of LASV. LASV sequences cluster geographically, independently of whether they derive from a rodent or a human source (Bowen et al. 2000; Andersen et al. 2015), a finding supported by the results of genetic analysis of clinical LF samples collected in 2008–2013 (Andersen et al. 2015; Leski et al. 2015). LASV lineages I–III circulate in different geographic areas of Nigeria, whereas members of lineage IV are present in Guinea, Liberia, and Sierra Leone (SL). Isolates from Mali and Ivory Coast have been assigned to lineage V (Manning et al. 2015; Gunther et al. 2000), whereas an isolate from Togo exhibiting a mosaic genome structure has been proposed to represent a new lineage VII (Whitmer et al. 2018). Recombination has not yet been reported for LASV, and naturally occurring LASV reassortment associated with cases of LF seems to be an extremely rare event (Andersen et al. 2015), which may have been generated via co-infections in individuals traveling across different LASV endemic areas (Lalis and Wirth 2018). Evidence indicates that LASV-specific CD4+ and CD8+ T cell responses play critical roles in protection against LF, whereas neutralizing antibodies appear late after infection and at low titers (Fisher-Hoch et al. 2000; Fisher-Hoch and McCormick 2004; Hallam et al. 2018). However, passive antibody transfer has been shown to induce protection in animal models of LF (Jahrling 1983) and in limited human studies (Monath and Casals 1975). Only one of three predicted cross-reactive HLA-A2-restricted LASV GPC CD8+ T cell epitopes is shared between LASV/JOS (clade IV) and LASV/GA391 (clade III) (Botten et al. 2006). However, studies with the vaccine candidate reassortant ML29, carrying the L RNA from MOPV and the S RNA from LASV/JOS (Lukashevich 1992, 2012), have suggested the feasibility of a pan-LASV vaccine (Carrion et al. 2007).
Moreover, recent studies have identified several CD8+ T cell epitopes common to both Nigerian and Sierra Leonean LF survivors (Sakabe et al. 2020). However, the high overall variability in epitope-specific responses observed among LF survivors suggests that a pan-LASV vaccine should incorporate T cell epitopes able to elicit broad CD8+ T cell responses in diverse populations across Western Africa.

10.2 Lethal Mutagenesis as a Novel Antiviral Strategy to Combat Mammarenavirus Infections

The potential for rapid evolution and adaptation exhibited by RNA viruses is determined mainly by their low-fidelity replication machineries, which result in viral populations having a quasispecies genetic structure, constituted by a spectrum of closely related but non-identical genomes. Individual variants within the replicating viral quasispecies may exhibit distinct phenotypes, including increased resistance to a given therapeutic, that can be readily manifested under specific selective pressures. Mutation rates of replicating riboviruses are close to a threshold beyond which the number of mutations incorporated per round of replication is predicted to promote


viral extinction, a process termed entry into error catastrophe (Eigen 2002). Therefore, mutagenic agents are expected to exert antiviral activity by promoting viral entry into error catastrophe. This prediction has been experimentally confirmed in different viral systems (Dapp et al. 2009, 2012; Harris et al. 2005; Graci et al. 2007; Ortega-Prieto et al. 2013). Moreover, mutagen-induced mutations can promote the generation of dominant-negative viral mutants, or “defectors”, that inhibit virus multiplication. This process has been termed “lethal defection”, a model supported by experimental results and in silico predictions (Grande-Perez et al. 2005a; Gonzalez-Lopez et al. 2004; Perales et al. 2007; Ahmed and Oldstone 1988; Ahmed et al. 1984, 1991; Evans et al. 1994; Baranowski et al. 2000; Tishon et al. 1993; Wu-Hsieh et al. 1988; Manrubia et al. 2010; Moreno et al. 2012b). This model provides an explanation for why the randomization of the virus genome sequence predicted by quasispecies theory is not observed during viral extinction via lethal mutagenesis (Grande-Perez et al. 2005b). Virus extinction through lethal mutagenesis has been documented for mammarenaviruses in cultured cells (Grande-Perez et al. 2005a; Moreno et al. 2011, 2012a; Martin et al. 2010; Ruiz-Jarabo et al. 2003), a finding consistent with results from in vivo experiments using the mouse model of LCMV infection (Ruiz-Jarabo et al. 2003; Sanz-Ramos et al. 2012). The mutagen 5-FU had a dose-dependent effect on LCMV infectivity and caused systematic and rapid extinction of the virus in BHK-21 cell cultures. However, mutation frequencies did not correlate with infectivity values (Grande-Perez et al. 2002), and mutagenized pre-extinction LCMV populations retained their original consensus genome despite increased diversity (Grande-Perez et al. 2005b). Pre-extinction LCMV populations were enriched in DIG (Grande-Perez et al. 2005a, b) (Fig. 3).
5-FU-induced mutagenesis promoted an enrichment of “defectors” in pre-extinction LCMV populations (Grande-Perez et al. 2005a, b) (Fig. 3), which inhibited viral RNA synthesis, thus resulting in increased protection of the virus population against the mutagenic effect of 5-FU.
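The error threshold invoked in this section can be made concrete with the standard single-peak approximation from quasispecies theory. This is a textbook simplification offered as a sketch, not a calculation taken from this chapter. It assumes a master sequence of length L that replicates sigma times faster than its mutants, and a uniform per-nucleotide error rate mu. Under these assumptions the master sequence persists only while sigma × (1 − mu)^L > 1, so the largest tolerable error rate is mu_max = 1 − sigma^(−1/L), which is approximately ln(sigma)/L. The function name and the parameter values (a 10,000-nt genome and sigma = 10) are illustrative assumptions only.

```python
import math


def max_error_rate(genome_length, superiority):
    """Per-site error rate above which the master sequence is lost.

    Single-peak model: the master sequence is maintained while
    superiority * (1 - mu) ** genome_length > 1.
    """
    if genome_length <= 0 or superiority <= 1:
        raise ValueError("need genome_length > 0 and superiority > 1")
    return 1.0 - superiority ** (-1.0 / genome_length)


# Illustrative values only: a 10,000-nt genome and a 10-fold advantage.
L = 10_000
sigma = 10.0
mu_max = max_error_rate(L, sigma)
approx = math.log(sigma) / L  # first-order approximation ln(sigma) / L

# Expected number of mutations per genome per replication at the threshold.
mutations_per_genome = mu_max * L
```

With these illustrative numbers the threshold corresponds to roughly two mutations per genome per round of copying, which shows why a modest mutagen-driven increase in error rate could, in this idealized model, push a population past the threshold. The lethal defection results summarized above indicate that real extinction dynamics are more complex than this model suggests.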

10.3 Novel Combination Drug Therapy to Combat Mammarenavirus Infections

Current anti-mammarenavirus therapy is limited to an off-label use of the nucleoside analog ribavirin, which is only partially effective, has a narrow therapeutic window, and can cause side effects (Damonte and Coto 2002; Parker 2005). A number of compounds have been shown to exhibit anti-mammarenaviral activity in cell-based assays, but their safety and efficacy in vivo remain to be determined (Lee et al. 2011). Both favipiravir, a broad-spectrum antiviral (Gowen et al. 2013; Mendenhall et al. 2011a, b; Safronetz et al. 2015), and ST-193, a GP-mediated fusion inhibitor (Cashman et al. 2011), have shown promising results in animal models of mammarenavirus infection. Nevertheless, the development of additional anti-mammarenaviral drugs can facilitate the use of combination therapy to combat human pathogenic


mammarenavirus infection, an approach known to reduce the emergence of drug-resistant variants often observed with monotherapy strategies. However, the development of novel antiviral drugs is a costly and lengthy process due to the labor- and resource-intensive efforts involved in the preclinical optimization of the drug (Ashburn and Thor 2004; Pushpakom et al. 2019). Drug repurposing strategies can help to overcome some of these obstacles. Drug repurposing can also facilitate the generation of new knowledge about virus biology by identifying novel host cell factors and pathways contributing to different steps of the virus “life” cycle. In the case of mammarenaviruses, the use of this strategy is illustrated by the results of screening the Repurposing, Focused Rescue, and Accelerated Medchem [ReFRAME] library (Kim et al. 2019). This screen identified inhibitors of the enzymes catalyzing the rate-limiting steps of pyrimidine and purine biosynthesis, of the pro-viral protein MCL1 (MCL1 apoptosis regulator, BCL2 family member), and of mitochondrial electron transport complex III as having potent antiviral activity against LCMV. Importantly, these inhibitors also exhibited strong antiviral activity against LASV and JUNV. Combining antiviral drugs targeting specific steps of the virus life cycle with lethal mutagenesis provides a unique combination therapy approach that could diminish, or eliminate, the problem posed by the emergence of viral variants resistant to multiple antiviral drugs, as documented for HCV (Kai et al. 2017). This strategy is supported by findings showing that, for RNA viruses, combination therapy with inhibitors of virus multiplication and a mutagen is more effective and less toxic than classic combination therapy (Perales et al. 2012). Moreover, the mutagen-mediated increase in “defectors” further contributes to the inhibition of virus multiplication (Grande-Perez et al. 2005a).
This approach, however, needs to take into consideration theoretical predictions that sequential administration, first of an inhibitor of virus multiplication to reduce viral load and then of the mutagenic agent, can have a stronger antiviral effect than when both types of drugs are administered simultaneously or individually (Eigen 2002, 1971). These predictions have received experimental support from studies with LCMV showing that administration of ribavirin (Rib) at concentrations that favored its replication inhibition activity, followed by administration of the mutagen 5-FU, resulted in a stronger antiviral effect than when the drugs were administered simultaneously or individually (Moreno et al. 2012b; Perales et al. 2009).

Acknowledgements

M.S. acknowledges support from the Italian Ministry of Health (‘Ricerca Corrente 2019–2020’); D.F. acknowledges support from the Italian Ministry of Health (‘Ricerca Corrente 2018–2020’). JCT was supported by NIH/NIAID grants AI125626 and AI128556. This is the manuscript # XXX from The Scripps Research Institute.


References

Aebischer T, Moskophidis D, Rohrer UH, Zinkernagel RM, Hengartner H (1991) In vitro selection of lymphocytic choriomeningitis virus escape mutants by cytotoxic T lymphocytes. Proc Natl Acad Sci U S A 88:11047–11051 Ahmed R, Oldstone MB (1988) Organ-specific selection of viral variants during chronic infection. J Exp Med 167:1719–1724 Ahmed R, Salmi A, Butler LD, Chiller JM, Oldstone MB (1984) Selection of genetic variants of lymphocytic choriomeningitis virus in spleens of persistently infected mice. Role in suppression of cytotoxic T lymphocyte response and viral persistence. J Exp Med 160:521–540 Ahmed R, Simon RS, Matloubian M, Kolhekar SR, Southern PJ, Freedman DM (1988) Genetic analysis of in vivo-selected viral variants causing chronic infection: importance of mutation in the L RNA segment of lymphocytic choriomeningitis virus. J Virol 62:3301–3308 Ahmed R, Hahn CS, Somasundaram T, Villarete L, Matloubian M, Strauss JH (1991) Molecular basis of organ-specific selection of viral variants during chronic infection. J Virol 65:4242–4247 Albarino CG, Palacios G, Khristova ML, Erickson BR, Carroll SA, Comer JA, Hui J, Briese T, St George K, Ksiazek TG, Lipkin WI, Nichol ST (2010) High diversity and ancient common ancestry of lymphocytic choriomeningitis virus. Emerg Infect Dis 16:1093–1100 Andersen KG, Shapiro BJ, Matranga CB, Sealfon R, Lin AE, Moses LM, Folarin OA, Goba A, Odia I, Ehiane PE, Momoh M, England EM, Winnicki S, Branco LM, Gire SK, Phelan E, Tariyal R, Tewhey R, Omoniwa O, Fullah M, Fonnie R, Fonnie M, Kanneh L, Jalloh S, Gbakie M, Saffa S, Karbo K, Gladden AD, Qu J, Stremlau M, Nekoui M, Finucane HK, Tabrizi S, Vitti JJ, Birren B, Fitzgerald M, McCowan C, Ireland A, Berlin AM, Bochicchio J, Tazon-Vega B, Lennon NJ, Ryan EM, Bjornson Z, Milner DA Jr, Lukens AK, Broodie N, Rowland M, Heinrich M, Akdag M et al (2015) Clinical sequencing uncovers origins and evolution of Lassa virus.
Cell 162:738–750 Archer AM, Rico-Hesse R (2002) High genetic divergence and recombination in Arenaviruses from the Americas. Virology 304:274–281 Armstrong CL, Lillie RD (1934) Experimental lymphocytic choriomeningitis of monkeys and mice produced by a virus encountered in studies of the 1933 St. Louis encephalitis epidemic. Public Health Rep 1896–1970:1019–1027 Ashburn TT, Thor KB (2004) Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov 3:673–683 Baranowski E, Ruiz-Jarabo CM, Sevilla N, Andreu D, Beck E, Domingo E (2000) Cell recognition by foot-and-mouth disease virus that lacks the RGD integrin-binding motif: flexibility in aphthovirus receptor usage. J Virol 74:1641–1647 Barnes KG, Lachenauer AE, Nitido A, Siddiqui S, Gross R, Beitzel B, Siddle KJ, Freije CA, Dighero-Kemp B, Mehta SB, Carter A, Uwanibe J, Ajogbasile F, Olumade T, Odia I, Sandi JD, Momoh M, Metsky HC, Boehm CK, Lin AE, Kemball M, Park DJ, Branco L, Boisen M, Sullivan B, Amare MF, Tiamiyu AB, Parker ZF, Iroezindu M, Grant DS, Modjarrad K, Myhrvold C, Garry RF, Palacios G, Hensley LE, Schaffner SF, Happi CT, Colubri A, Sabeti PC (2020) Deployable CRISPR-Cas13a diagnostic tools to detect and report Ebola and Lassa virus cases in real-time. Nat Commun 11:4131 Barton LL, Mets MB (2001) Congenital lymphocytic choriomeningitis virus infection: decade of rediscovery. Clin Infect Dis 33:370–374 Barton LL, Mets MB, Beauchamp CL (2002) Lymphocytic choriomeningitis virus: emerging fetal teratogen. Am J Obstet Gynecol 187:1715–1716 Beaucourt S, Borderia AV, Coffey LL, Gnadig NF, Sanz-Ramos M, Beeharry Y, Vignuzzi M (2011) Isolation of fidelity variants of RNA viruses and characterization of virus mutation frequency. J Vis Exp 2953. 
https://doi.org/10.3791/2953 Bergthaler A, Flatz L, Hegazy AN, Johnson S, Horvath E, Lohning M, Pinschewer DD (2010) Viral replicative capacity is the primary determinant of lymphocytic choriomeningitis virus persistence and immunosuppression. Proc Natl Acad Sci U S A 107:21641–21646


Beyer WR, Popplau D, Garten W, von Laer D, Lenz O (2003) Endoproteolytic processing of the lymphocytic choriomeningitis virus glycoprotein by the subtilase SKI-1/S1P. J Virol 77:2866–2872 Blasdell KR, Becker SD, Hurst J, Begon M, Bennett M (2008) Host range and genetic diversity of arenaviruses in rodents, United Kingdom. Emerg Infect Dis 14:1455–1458 Bodewes R, Kik MJ, Raj VS, Schapendonk CM, Haagmans BL, Smits SL, Osterhaus AD (2013) Detection of novel divergent arenaviruses in boid snakes with inclusion body disease in The Netherlands. J Gen Virol 94:1206–1210 Boisen ML, Uyigue E, Aiyepada J, Siddle KJ, Oestereich L, Nelson DKS, Bush DJ, Rowland MM, Heinrich ML, Eromon P, Kayode AT, Odia I, Adomeh DI, Muoebonam EB, Akhilomen P, Okonofua G, Osiemi B, Omoregie O, Airende M, Agbukor J, Ehikhametalor S, Aire CO, Duraffour S, Pahlmann M, Bohm W, Barnes KG, Mehta S, Momoh M, Sandi JD, Goba A, Folarin OA, Ogbaini-Emovan E, Asogun DA, Tobin EA, Akpede GO, Okogbenin SA, Okokhere PO, Grant DS, Schieffelin JS, Sabeti PC, Gunther S, Happi CT, Branco LM, Garry RF (2020) Field evaluation of a Pan-Lassa rapid diagnostic test during the 2018 Nigerian Lassa fever outbreak. Sci Rep 10:8724 Bonthius DJ (2009) Lymphocytic choriomeningitis virus: a prenatal and postnatal threat. Adv Pediatr 56:75–86 Bonthius DJ (2012) Lymphocytic choriomeningitis virus: an underrecognized cause of neurologic disease in the fetus, child, and adult. Semin Pediatr Neurol 19:89–95 Bonthius DJ, Perlman S (2007) Congenital viral infections of the brain: lessons learned from lymphocytic choriomeningitis virus in the neonatal rat. PLoS Pathog 3:e149 Bonwitt J, Saez AM, Lamin J, Ansumana R, Dawson M, Buanie J, Lamin J, Sondufu D, Borchert M, Sahr F, Fichet-Calvet E, Brown H (2017) At home with Mastomys and Rattus: human-rodent interactions and potential for primary transmission of Lassa virus in domestic spaces.
Am J Trop Med Hyg 96:935–943 Borden KL, Campbell Dwyer EJ, Salvato MS (1998) An arenavirus RING (zinc-binding) protein binds the oncoprotein promyelocyte leukemia protein (PML) and relocates PML nuclear bodies to the cytoplasm. J Virol 72:758–766 Borrow P, Evans CF, Oldstone MB (1995) Virus-induced immunosuppression: immune system-mediated destruction of virus-infected dendritic cells results in generalized immune suppression. J Virol 69:1059–1070 Botten J, Alexander J, Pasquetto V, Sidney J, Barrowman P, Ting J, Peters B, Southwood S, Stewart B, Rodriguez-Carreno MP, Mothe B, Whitton JL, Sette A, Buchmeier MJ (2006) Identification of protective Lassa virus epitopes that are restricted by HLA-A2. J Virol 80:8351–8361 Bowen MD, Peters CJ, Nichol ST (1996) The phylogeny of New World (Tacaribe complex) arenaviruses. Virology 219:285–290 Bowen MD, Peters CJ, Nichol ST (1997) Phylogenetic analysis of the Arenaviridae: patterns of virus evolution and evidence for cospeciation between arenaviruses and their rodent hosts. Mol Phylogenet Evol 8:301–316 Bowen MD, Rollin PE, Ksiazek TG, Hustad HL, Bausch DG, Demby AH, Bajani MD, Peters CJ, Nichol ST (2000) Genetic diversity among Lassa virus strains. J Virol 74:6992–7004 Bray M (2005) Pathogenesis of viral hemorrhagic fever. Curr Opin Immunol 17:399–403 Briese T, Paweska JT, McMullan LK, Hutchison SK, Street C, Palacios G, Khristova ML, Weyer J, Swanepoel R, Egholm M, Nichol ST, Lipkin WI (2009) Genetic detection and characterization of Lujo virus, a new hemorrhagic fever-associated arenavirus from southern Africa. PLoS Pathog 5:e1000455 Brouat C, Loiseau A, Kane M, Ba K, Duplantier JM (2007) Population genetic structure of two ecologically distinct multimammate rats: the commensal Mastomys natalensis and the wild Mastomys erythroleucus in southeastern Senegal. Mol Ecol 16:2985–2997


Brown RJ, Peters PJ, Caron C, Gonzalez-Perez MP, Stones L, Ankghuambom C, Pondei K, McClure CP, Alemnji G, Taylor S, Sharp PM, Clapham PR, Ball JK (2011) Intercompartmental recombination of HIV-1 contributes to env intrahost diversity and modulates viral tropism and sensitivity to entry inhibitors. J Virol 85:6024–6037 Brunotte L, Lelke M, Hass M, Kleinsteuber K, Becker-Ziaja B, Gunther S (2011) Domain structure of Lassa virus L protein. J Virol 85:324–333 Buchmeier MJ, Peters CJ, de la Torre JC (2007) Arenaviridae: the viruses and their replication. In: Knipe DM, Howley PM (eds) Fields virology, 5th ed, vol 2, pp 1792–1827 Buckley SM, Casals J (1970) Lassa fever, a new virus disease of man from West Africa. 3. Isolation and characterization of the virus. Am J Trop Med Hyg 19:680–691 Buesa-Gomez J, Teng MN, Oldstone CE, Oldstone MB, de la Torre JC (1996) Variants able to cause growth hormone deficiency syndrome are present within the disease-nil WE strain of lymphocytic choriomeningitis virus. J Virol 70:8988–8992 Cabot B, Martell M, Esteban JI, Sauleda S, Otero T, Esteban R, Guardia J, Gomez J (2000) Nucleotide and amino acid complexity of hepatitis C virus quasispecies in serum and liver. J Virol 74:805–811 Cajimat MN, Fulhorst CF (2004) Phylogeny of the Venezuelan arenaviruses. Virus Res 102:199–206 Cajimat MN, Milazzo ML, Hess BD, Rood MP, Fulhorst CF (2007) Principal host relationships and evolutionary history of the North American arenaviruses. Virology 367:235–243 Campbell Dwyer EJ, Lai H, MacDonald RC, Salvato MS, Borden KL (2000) The lymphocytic choriomeningitis virus RING protein Z associates with eukaryotic initiation factor 4E and selectively represses translation in a RING-dependent manner. J Virol 74:3293–3300 Cao W, Henry MD, Borrow P, Yamada H, Elder JH, Ravkov EV, Nichol ST, Compans RW, Campbell KP, Oldstone MB (1998) Identification of alpha-dystroglycan as a receptor for lymphocytic choriomeningitis virus and Lassa fever virus.
Science 282:2079–2081 Carrion R Jr, Patterson JL, Johnson C, Gonzales M, Moreira CR, Ticer A, Brasky K, Hubbard GB, Moshkoff D, Zapata J, Salvato MS, Lukashevich IS (2007) A ML29 reassortant virus protects guinea pigs against a distantly related Nigerian strain of Lassa virus and can provide sterilizing immunity. Vaccine 25:4093–4102 Cashman KA, Smith MA, Twenhafel NA, Larson RA, Jones KF, Allen RD 3rd, Dai D, Chinsangaram J, Bolken TC, Hruby DE, Amberg SM, Hensley LE, Guttieri MC (2011) Evaluation of Lassa antiviral compound ST-193 in a guinea pig model. Antiviral Res 90:70–79 Charrel RN, de Lamballerie X (2003) Arenaviruses other than Lassa virus. Antiviral Res 57:89–100 Charrel RN, de Lamballerie X (2010) Zoonotic aspects of arenavirus infections. Vet Microbiol 140:213–220 Charrel RN, de Lamballerie X, Fulhorst CF (2001) The Whitewater Arroyo virus: natural evidence for genetic recombination among Tacaribe serocomplex viruses (family Arenaviridae). Virology 283:161–166 Charrel RN, Feldmann H, Fulhorst CF, Khelifa R, de Chesse R, de Lamballerie X (2002) Phylogeny of New World arenaviruses based on the complete coding sequences of the small genomic segment identified an evolutionary lineage produced by intrasegmental recombination. Biochem Biophys Res Commun 296:1118–1124 Charrel RN, Lemasson JJ, Garbutt M, Khelifa R, De Micco P, Feldmann H, de Lamballerie X (2003) New insights into the evolutionary relationships between arenaviruses provided by comparative analysis of small and large segment sequences. Virology 317:191–196 Charrel RN, de Lamballerie X, Emonet S (2008) Phylogeny of the genus Arenavirus. Curr Opin Microbiol 11:362–368 Cheng-Mayer C, Weiss C, Seto D, Levy JA (1989) Isolates of human immunodeficiency virus type 1 from the brain may constitute a special group of the AIDS virus. 
Proc Natl Acad Sci U S A 86:8575–8579 Ciurea A, Klenerman P, Hunziker L, Horvath E, Senn BM, Ochsenbein AF, Hengartner H, Zinkernagel RM (2000) Viral persistence in vivo through selection of neutralizing antibody-escape variants. Proc Natl Acad Sci U S A 97:2749–2754

M. Sironi et al.

Ciurea A, Hunziker L, Zinkernagel RM, Hengartner H (2001) Viral escape from the neutralizing antibody response: the lymphocytic choriomeningitis virus model. Immunogenetics 53:185–189 Colangelo P, Verheyen E, Leirs H, Tatard C, Denys C, Dobigny G, Duplantier J-M, Brouat C, Granjon L, Lecompte E (2013) A mitochondrial phylogeographic scenario for the most widespread African rodent, Mastomys natalensis. Biol J Linn Soc 108:901–916 Cornu TI, de la Torre JC (2001) RING finger Z protein of lymphocytic choriomeningitis virus (LCMV) inhibits transcription and RNA replication of an LCMV S-segment minigenome. J Virol 75:9415–9426 Cornu TI, de la Torre JC (2002) Characterization of the arenavirus RING finger Z protein regions required for Z-mediated inhibition of viral RNA synthesis. J Virol 76:6678–6688 Cuypers LN, Baird SJE, Hanova A, Locus T, Katakweba AS, Gryseels S, Bryja J, Leirs H, Gouy de Bellocq J (2020) Three arenaviruses in three subspecific natal multimammate mouse taxa in Tanzania: same host specificity, but different spatial genetic structure? Virus Evol 6:veaa039 Damonte EB, Coto CE (2002) Treatment of arenavirus infections: from basic studies to the challenge of antiviral therapy. Adv Virus Res 58:125–155 Damonte EB, Mersich SE, Coto CE (1983) Response of cells persistently infected with arenaviruses to superinfection with homotypic and heterotypic viruses. Virology 129:474–478 Dapp MJ, Clouser CL, Patterson S, Mansky LM (2009) 5-Azacytidine can induce lethal mutagenesis in human immunodeficiency virus type 1. J Virol 83:11950–11958 Dapp MJ, Holtz CM, Mansky LM (2012) Concomitant lethal mutagenesis of human immunodeficiency virus type 1. J Mol Biol 419:158–170 Darriba D, Taboada GL, Doallo R, Posada D (2011) ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27:1164–1165 de la Torre JC, Holland JJ (1990) RNA virus quasispecies populations can suppress vastly superior mutant progeny.
J Virol 64:6278–6281 Deforges S, Evlashev A, Perret M, Sodoyer M, Pouzol S, Scoazec JY, Bonnaud B, Diaz O, Paranhos-Baccala G, Lotteau V, Andre P (2004) Expression of hepatitis C virus proteins in epithelial intestinal cells in vivo. J Gen Virol 85:2515–2523 Demby AH, Chamberlain J, Brown DW, Clegg CS (1994) Early diagnosis of Lassa fever by reverse transcription-PCR. J Clin Microbiol 32:2898–2903 Djavani M, Rodas J, Lukashevich IS, Horejsh D, Pandolfi PP, Borden KL, Salvato MS (2001) Role of the promyelocytic leukemia protein PML in the interferon sensitivity of lymphocytic choriomeningitis virus. J Virol 75:6204–6208 Domingo E (1992) Genetic variation and quasi-species. Curr Opin Genet Dev 2:61–63 Domingo E, Mas A, Yuste E, Pariente N, Sierra S, Gutierrez-Riva M, Menendez-Arias L (2001) Virus population dynamics, fitness variations and the control of viral disease: an update. Prog Drug Res 57:77–115 Domingo E, Martin V, Perales C, Escarmis C (2008) Coxsackieviruses and quasispecies theory: evolution of enteroviruses. Curr Top Microbiol Immunol 323:3–32 Domingo E (2006) Quasispecies: concepts and implications for virology, vol 299. Springer Domingo E (2007) Virus evolution. In: Knipe DM, Howley PM (eds) Fields virology, 5th ed Domingo E, Holland JJ, Ahlquist P (1988) RNA Genetics, vol I, II, III. CRC Press, Boca Raton Downs WG, Anderson CR, Spence L, Aitken TH, Greenhall AH (1963) Tacaribe virus, a new agent isolated from Artibeus bats and mosquitoes in Trinidad, West Indies. Am J Trop Med Hyg 12:640–646 Drake JW, Holland JJ (1999) Mutation rates among RNA viruses.
Proc Natl Acad Sci U S A 96:13910–13913 Duvignaud A, Jaspard M, Etafo IC, Gabillard D, Serra B, Abejegah C, le Gal C, Abidoye AT, Doutchi M, Owhin S, Seri B, Vihundira JK, Bererd-Camara M, Schaeffer J, Danet N, Augier A, Ogbaini-Emovon E, Salam AP, Ahmed LA, Duraffour S, Horby P, Gunther S, Adedosu AN, Ayodeji OO, Anglaret X, Malvy D, LASCOPE Study Group (2021) Lassa fever outcomes and prognostic factors in Nigeria (LASCOPE): a prospective cohort study. Lancet Glob Health 9:e469–e478

Mammarenavirus Genetic Diversity and Its Biological Implications

Ehichioya DU, Hass M, Becker-Ziaja B, Ehimuan J, Asogun DA, Fichet-Calvet E, Kleinsteuber K, Lelke M, ter Meulen J, Akpede GO, Omilabu SA, Gunther S, Olschlager S (2011) Current molecular epidemiology of Lassa virus in Nigeria. J Clin Microbiol 49:1157–1161 Ehichioya DU, Dellicour S, Pahlmann M, Rieger T, Oestereich L, Becker-Ziaja B, Cadar D, Ighodalo Y, Olokor T, Omomoh E, Oyakhilome J, Omiunu R, Agbukor J, Ebo B, Aiyepada J, Ebhodaghe P, Osiemi B, Ehikhametalor S, Akhilomen P, Airende M, Esumeh R, Muoebonam E, Giwa R, Ekanem A, Igenegbale G, Odigie G, Okonofua G, Enigbe R, Omonegho Yerumoh E, Pallasch E, Bockholt S, Kafetzopoulou LE, Duraffour S, Okokhere PO, Akpede GO, Okogbenin SA, Odia I, Aire C, Akpede N, Tobin E, Ogbaini-Emovon E, Lemey P, Adomeh DI, Asogun DA, Gunther S (2019) Phylogeography of Lassa virus in Nigeria. J Virol 93 Eigen M (1971) Selforganization of matter and the evolution of biological macromolecules. Naturwissenschaften 58:465–523 Eigen M (2002) Error catastrophe and antiviral strategy. Proc Natl Acad Sci U S A 99:13374–13376 Ellenberg P, Edreira M, Scolaro L (2004) Resistance to superinfection of Vero cells persistently infected with Junin virus. Arch Virol 149:507–522 Ellenberg P, Linero FN, Scolaro LA (2007) Superinfection exclusion in BHK-21 cells persistently infected with Junin virus. J Gen Virol 88:2730–2739 Emonet S, Lemasson JJ, Gonzalez JP, de Lamballerie X, Charrel RN (2006) Phylogeny and evolution of old world arenaviruses. Virology 350:251–257 Emonet SF, de la Torre JC, Domingo E, Sevilla N (2009) Arenavirus genetic diversity and its biological implications. Infect Genet Evol 9:417–429 Enria DA, Barrera Oro JG (2002) Junin virus vaccines. Curr Top Microbiol Immunol 263:239–261 Evans CF, Borrow P, de la Torre JC, Oldstone MB (1994) Virus-induced immunosuppression: kinetic analysis of the selection of a mutation associated with viral persistence. 
J Virol 68:7367–7373 Falzarano D, Feldmann H (2013) Vaccines for viral hemorrhagic fevers–progress and shortcomings. Curr Opin Virol 3:343–351 Fenner F (1976) Classification and nomenclature of viruses. Second report of the International Committee on Taxonomy of Viruses. Intervirology 7:1–115 Fernandes J, Guterres A, de Oliveira RC, Chamberlain J, Lewandowski K, Teixeira BR, Coelho TA, Crisostomo CF, Bonvicino CR, D’Andrea PS, Hewson R, de Lemos ERS (2018) Xapuri virus, a novel mammarenavirus: natural reassortment and increased diversity between New World viruses. Emerg Microbes Infect 7:120 Fichet-Calvet E, Rogers DJ (2009) Risk maps of Lassa fever in West Africa. PLoS Negl Trop Dis 3:e388 Fischer SA, Graham MB, Kuehnert MJ, Kotton CN, Srinivasan A, Marty FM, Comer JA, Guarner J, Paddock CD, DeMeo DL, Shieh WJ, Erickson BR, Bandy U, DeMaria A Jr, Davis JP, Delmonico FL, Pavlin B, Likos A, Vincent MJ, Sealy TK, Goldsmith CS, Jernigan DB, Rollin PE, Packard MM, Patel M, Rowland C, Helfand RF, Nichol ST, Fishman JA, Ksiazek T, Zaki SR (2006) Transmission of lymphocytic choriomeningitis virus by organ transplantation. N Engl J Med 354:2235–2249 Fisher-Hoch SP, McCormick JB (2004) Lassa fever vaccine. Expert Rev Vaccines 3:189–197 Fisher-Hoch SP, Hutwagner L, Brown B, McCormick JB (2000) Effective vaccine for lassa fever. J Virol 74:6777–6783 Forni D, Pontremoli C, Pozzoli U, Clerici M, Cagliani R, Sironi M (2018) Ancient evolution of mammarenaviruses: adaptation via changes in the L protein and no evidence for host-virus codivergence. Genome Biol Evol 10:863–874 Forni D, Sironi M (2020) Population structure of Lassa Mammarenavirus in West Africa. Viruses 12 Frame JD, Baldwin JM Jr, Gocke DJ, Troup JM (1970) Lassa fever, a new virus disease of man from West Africa. I. Clinical description and pathological findings. Am J Trop Med Hyg 19:670–676 Freedman DO, Woodall J (1999) Emerging infectious diseases and risk to the traveler. Med Clin North Am 83:865–883

Freeman GJ, Wherry EJ, Ahmed R, Sharpe AH (2006) Reinvigorating exhausted HIV-specific T cells via PD-1-PD-1 ligand blockade. J Exp Med 203:2223–2227 Fulhorst CF, Bowen MD, Ksiazek TG, Rollin PE, Nichol ST, Kosoy MY, Peters CJ (1996) Isolation and characterization of Whitewater Arroyo virus, a novel North American arenavirus. Virology 224:114–120 Fulhorst CF, Bowen MD, Salas RA, Duno G, Utrera A, Ksiazek TG, De Manzione NM, De Miller E, Vasquez C, Peters CJ, Tesh RB (1999) Natural rodent host associations of Guanarito and pirital viruses (Family Arenaviridae) in central Venezuela. Am J Trop Med Hyg 61:325–330 Fulhorst CF, Charrel RN, Weaver SC, Ksiazek TG, Bradley RD, Milazzo ML, Tesh RB, Bowen MD (2001) Geographic distribution and genetic diversity of Whitewater Arroyo virus in the southwestern United States. Emerg Infect Dis 7:403–407 Fulhorst CF, Cajimat MN, Milazzo ML, Paredes H, de Manzione NM, Salas RA, Rollin PE, Ksiazek TG (2008) Genetic diversity between and within the arenavirus species indigenous to western Venezuela. Virology 378:205–213 Garcia JB, Morzunov SP, Levis S, Rowe J, Calderon G, Enria D, Sabattini M, Buchmeier MJ, Bowen MD, St Jeor SC (2000) Genetic diversity of the Junin virus in Argentina: geographic and temporal patterns. Virology 272:127–136 Geisbert TW, Jahrling PB (2004) Exotic emerging viral diseases: progress and challenges. Nat Med 10:S110–S121 Gerlach P, Malet H, Cusack S, Reguera J (2015) Structural insights into bunyavirus replication and its regulation by the vRNA promoter. Cell 161:1267–1279 Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, Moreno V (2007) SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23:644–645 Gonzalez-Lopez C, Arias A, Pariente N, Gomez-Mariano G, Domingo E (2004) Preextinction viral RNA can interfere with infectivity. 
J Virol 78:3319–3324 Gowen BB, Juelich TL, Sefing EJ, Brasel T, Smith JK, Zhang L, Tigabu B, Hill TE, Yun T, Pietzsch C, Furuta Y, Freiberg AN (2013) Favipiravir (T-705) inhibits Junin virus infection and reduces mortality in a guinea pig model of Argentine hemorrhagic fever. PLoS Negl Trop Dis 7:e2614 Graci JD, Harki DA, Korneeva VS, Edathil JP, Too K, Franco D, Smidansky ED, Paul AV, Peterson BR, Brown DM, Loakes D, Cameron CE (2007) Lethal mutagenesis of poliovirus mediated by a mutagenic pyrimidine analogue. J Virol 81:11256–11266 Grande-Perez A, Sierra S, Castro MG, Domingo E, Lowenstein PR (2002) Molecular indetermination in the transition to error catastrophe: systematic elimination of lymphocytic choriomeningitis virus through mutagenesis does not correlate linearly with large increases in mutant spectrum complexity. Proc Natl Acad Sci U S A 99:12938–12943 Grande-Perez A, Lazaro E, Lowenstein P, Domingo E, Manrubia SC (2005a) Suppression of viral infectivity through lethal defection. Proc Natl Acad Sci U S A 102:4448–4452 Grande-Perez A, Gomez-Mariano G, Lowenstein PR, Domingo E (2005b) Mutagenesis-induced, large fitness variations with an invariant arenavirus consensus genomic nucleotide sequence. J Virol 79:10451–10459 Grange ZL, Goldstein T, Johnson CK, Anthony S, Gilardi K, Daszak P, Olival KJ, O’Rourke T, Murray S, Olson SH, Togami E, Vidal G, Expert P, Consortium P, Mazet JAK, University of Edinburgh Epigroup members those who wish to remain (2021) Ranking the risk of animal-to-human spillover for newly discovered viruses. Proc Natl Acad Sci U S A 118:e2002324118 Grant A, Seregin A, Huang C, Kolokoltsova O, Brasier A, Peters C, Paessler S (2012) Junin virus pathogenesis and virus replication. Viruses 4:2317–2339 Gryseels S, Baird SJ, Borremans B, Makundi R, Leirs H, Gouy de Bellocq J (2017) When viruses don’t go viral: the importance of host phylogeographic structure in the spatial spread of arenaviruses.
PLoS Pathog 13:e1006073 Gunther S, Emmerich P, Laue T, Kuhle O, Asper M, Jung A, Grewing T, ter Meulen J, Schmitz H (2000) Imported lassa fever in Germany: molecular characterization of a new Lassa virus strain. Emerg Infect Dis 6:466–476

Hall JS, French R, Hein GL, Morris TJ, Stenger DC (2001) Three distinct mechanisms facilitate genetic isolation of sympatric wheat streak mosaic virus lineages. Virology 282:230–236 Hallam HJ, Hallam S, Rodriguez SE, Barrett ADT, Beasley DWC, Chua A, Ksiazek TG, Milligan GN, Sathiyamoorthy V, Reece LM (2018) Baseline mapping of Lassa fever virology, epidemiology and vaccine research and development. NPJ Vaccines 3:11 Happi AN, Happi CT, Schoepp RJ (2019) Lassa fever diagnostics: past, present, and future. Curr Opin Virol 37:132–138 Harris KS, Brabant W, Styrchak S, Gall A, Daifuku R (2005) KP-1212/1461, a nucleoside designed for the treatment of HIV by viral mutagenesis. Antiviral Res 67:1–9 Hastie KM, Kimberlin CR, Zandonatti MA, MacRae IJ, Saphire EO (2011) Structure of the Lassa virus nucleoprotein reveals a dsRNA-specific 3’ to 5’ exonuclease activity essential for immune suppression. Proc Natl Acad Sci U S A 108:2396–2401 Haydon DT, Bastos AD, Knowles NJ, Samuel AR (2001) Evidence for positive selection in foot-and-mouth disease virus capsid genes from field isolates. Genetics 157:7–15 Hetzel U, Sironen T, Laurinmaki P, Liljeroos L, Patjas A, Henttonen H, Vaheri A, Artelt A, Kipar A, Butcher SJ, Vapalahti O, Hepojoki J (2013) Isolation, identification, and characterization of novel arenaviruses, the etiological agents of boid inclusion body disease. J Virol 87:10918–10935 Hotchin J, Sikora E (1973) Low-pathogenicity variant of lymphocytic choriomeningitis virus. Infect Immun 7:825–826 Irwin NR, Bayerlova M, Missa O, Martinkova N (2012) Complex patterns of host switching in New World arenaviruses. Mol Ecol 21:4137–4150 Isaacson M (2001) Viral hemorrhagic fever hazards for travelers in Africa. Clin Infect Dis 33:1707–1712 Jahrling PB (1983) Protection of Lassa virus-infected guinea pigs with Lassa-immune plasma of guinea pig, primate, and human origin. J Med Virol 12:93–102 Jahrling PB, Peters CJ (1992) Lymphocytic choriomeningitis virus.
A neglected pathogen of man. Arch Pathol Lab Med 116:486–488 Jay MT, Glaser C, Fulhorst CF (2005) The arenaviruses. J Am Vet Med Assoc 227:904–915 Jelcic I, Hotz-Wagenblatt A, Hunziker A, Zur Hausen H, de Villiers EM (2004) Isolation of multiple TT virus genotypes from spleen biopsy tissue from a Hodgkin’s disease patient: genome reorganization and diversity in the hypervariable region. J Virol 78:7498–7507 Johnson KM (1965) Epidemiology of Machupo virus infection. 3. Significance of virological observations in man and animals. Am J Trop Med Hyg 14:816–818 Johnson KM, Kuns ML, Mackenzie RB, Webb PA, Yunker CE (1966) Isolation of Machupo virus from wild rodent Calomys callosus. Am J Trop Med Hyg 15:103–106 Jombart T, Devillard S, Balloux F (2010) Discriminant analysis of principal components: a new method for the analysis of genetically structured populations. BMC Genet 11:94 Jridi C, Martin JF, Marie-Jeanne V, Labonne G, Blanc S (2006) Distinct viral populations differentiate and evolve independently in a single perennial host plant. J Virol 80:2349–2357 Kafetzopoulou LE, Pullan ST, Lemey P, Suchard MA, Ehichioya DU, Pahlmann M, Thielebein A, Hinzmann J, Oestereich L, Wozniak DM, Efthymiadis K, Schachten D, Koenig F, Matjeschk J, Lorenzen S, Lumley S, Ighodalo Y, Adomeh DI, Olokor T, Omomoh E, Omiunu R, Agbukor J, Ebo B, Aiyepada J, Ebhodaghe P, Osiemi B, Ehikhametalor S, Akhilomen P, Airende M, Esumeh R, Muoebonam E, Giwa R, Ekanem A, Igenegbale G, Odigie G, Okonofua G, Enigbe R, Oyakhilome J, Yerumoh EO, Odia I, Aire C, Okonofua M, Atafo R, Tobin E, Asogun D, Akpede N, Okokhere PO, Rafiu MO, Iraoyah KO, Iruolagbe CO et al (2019) Metagenomic sequencing at the epicenter of the Nigeria 2018 Lassa fever outbreak. 
Science 363:74–77 Kai Y, Hikita H, Morishita N, Murai K, Nakabori T, Iio S, Hagiwara H, Imai Y, Tamura S, Tsutsui S, Naito M, Nishiuchi M, Kondo Y, Kato T, Suemizu H, Yamada R, Oze T, Yakushijin T, Hiramatsu N, Sakamori R, Tatsumi T, Takehara T (2017) Baseline quasispecies selection and novel mutations contribute to emerging resistance-associated substitutions in hepatitis C virus after direct-acting antiviral treatment. Sci Rep 7:41660

Kim YJ, Cubitt B, Chen E, Hull MV, Chatterjee AK, Cai Y, Kuhn JH, de la Torre JC (2019) The ReFRAME library as a comprehensive drug repurposing library to identify mammarenavirus inhibitors. Antiviral Res 169:104558 Klavinskis LS, Oldstone MB (1989) Lymphocytic choriomeningitis virus selectively alters differentiated but not housekeeping functions: block in expression of growth hormone gene is at the level of transcriptional initiation. Virology 168:232–235 Kranzusch PJ, Whelan SP (2011) Arenavirus Z protein controls viral RNA synthesis by locking a polymerase-promoter complex. Proc Natl Acad Sci U S A 108:19743–19748 Kuhn JH, Adkins S, Agwanda BR, Al Kubrusli R, Alkhovsky SV, Amarasinghe GK, Avsic-Zupanc T, Ayllon MA, Bahl J, Balkema-Buschmann A, Ballinger MJ, Basler CF, Bavari S, Beer M, Bejerman N, Bennett AJ, Bente DA, Bergeron E, Bird BH, Blair CD, Blasdell KR, Blystad DR, Bojko J, Borth WB, Bradfute S, Breyta R, Briese T, Brown PA, Brown JK, Buchholz UJ, Buchmeier MJ, Bukreyev A, Burt F, Buttner C, Calisher CH, Cao M, Casas I, Chandran K, Charrel RN, Cheng Q, Chiaki Y, Chiapello M, Choi IR, Ciuffo M, Clegg JCS, Crozier I, Dal Bo E, de la Torre JC, de Lamballerie X, de Swart RL et al (2021) 2021 Taxonomic update of phylum Negarnaviricota (Riboviria: Orthornavirae), including the large orders Bunyavirales and Mononegavirales. Arch Virol. https://doi.org/10.1007/s00705-021-05143-6 Kunz S, Borrow P, Oldstone MB (2002) Receptor structure, binding, and cell entry of arenaviruses. Curr Top Microbiol Immunol 262:111–137 Kunz S, Edelmann KH, de la Torre JC, Gorney R, Oldstone MB (2003) Mechanisms for lymphocytic choriomeningitis virus glycoprotein cleavage, transport, and incorporation into virions. Virology 314:168–178 Kunz S, Sevilla N, Rojek JM, Oldstone MB (2004) Use of alternative receptors different than alpha-dystroglycan by selected isolates of lymphocytic choriomeningitis virus.
Virology 325:432–445 Kunz S, Rojek JM, Kanagawa M, Spiropoulou CF, Barresi R, Campbell KP, Oldstone MB (2005) Posttranslational modification of alpha-dystroglycan, the cellular receptor for arenaviruses, by the glycosyltransferase LARGE is critical for virus binding. J Virol 79:14282–14296 Lalis A, Leblois R, Lecompte E, Denys C, Ter Meulen J, Wirth T (2012) The impact of human conflict on the genetics of Mastomys natalensis and Lassa virus in West Africa. PLoS ONE 7:e37068 Lalis A, Wirth T (2018) Mice and men: an evolutionary history of Lassa fever. Elsevier Ltd. https://doi.org/10.1016/B978-1-78548-277-9.50011-5 Lee AM, Pasquato A, Kunz S (2011) Novel approaches in anti-arenaviral drug development. Virology 411:163–169 Lenz O, ter Meulen J, Klenk HD, Seidah NG, Garten W (2001) The Lassa virus glycoprotein precursor GP-C is proteolytically processed by subtilase SKI-1/S1P. Proc Natl Acad Sci U S A 98:12701–12705 Leski TA, Stockelman MG, Moses LM, Park M, Stenger DA, Ansumana R, Bausch DG, Lin B (2015) Sequence variability and geographic distribution of Lassa virus, Sierra Leone. Emerg Infect Dis 21:609–618 Lewicki H, Tishon A, Borrow P, Evans CF, Gairin JE, Hahn KM, Jewell DA, Wilson IA, Oldstone MB (1995) CTL escape viral variants. I. Generation and molecular characterization. Virology 210:29–40 Li K, Lin XD, Wang W, Shi M, Guo WP, Zhang XH, Xing JG, He JR, Wang K, Li MH, Cao JH, Jiang ML, Holmes EC, Zhang YZ (2015) Isolation and characterization of a novel arenavirus harbored by Rodents and Shrews in Zhejiang province, China. Virology 476:37–42 Loytynoja A, Goldman N (2008) Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science 320:1632–1635 Lukashevich IS (1992) Generation of reassortants between African arenaviruses. Virology 188:600–605 Lukashevich IS (2012) Advanced vaccine candidates for Lassa fever. Viruses 4:2514–2557

Lukashevich IS, Patterson J, Carrion R, Moshkoff D, Ticer A, Zapata J, Brasky K, Geiger R, Hubbard GB, Bryant J, Salvato MS (2005) A live attenuated vaccine for Lassa fever made by reassortment of Lassa and Mopeia viruses. J Virol 79:13934–13942 Maes P, Alkhovsky SV, Bao Y, Beer M, Birkhead M, Briese T, Buchmeier MJ, Calisher CH, Charrel RN, Choi IR, Clegg CS, de la Torre JC, Delwart E, DeRisi JL, Di Bello PL, Di Serio F, Digiaro M, Dolja VV, Drosten C, Druciarek TZ, Du J, Ebihara H, Elbeaino T, Gergerich RC, Gillis AN, Gonzalez JJ, Haenni AL, Hepojoki J, Hetzel U, Ho T, Hong N, Jain RK, Jansen van Vuren P, Jin Q, Jonson MG, Junglen S, Keller KE, Kemp A, Kipar A, Kondov NO, Koonin EV, Kormelink R, Korzyukov Y, Krupovic M, Lambert AJ, Laney AG, LeBreton M, Lukashevich IS, Marklewitz M, Markotter W et al (2018) Taxonomy of the family Arenaviridae and the order Bunyavirales: update 2018. Arch Virol 163:2295–2310 Manning JT, Forrester N, Paessler S (2015) Lassa virus isolates from Mali and the Ivory Coast represent an emerging fifth lineage. Front Microbiol 6:1037 Manrubia SC, Domingo E, Lazaro E (2010) Pathways to extinction: beyond the error threshold. Philos Trans R Soc Lond B Biol Sci 365:1943–1952 Martin V, Domingo E (2008) Influence of the mutant spectrum in viral evolution: focused selection of antigenic variants in a reconstructed viral quasispecies. Mol Biol Evol 25:1544–1554 Martin V, Abia D, Domingo E, Grande-Perez A (2010) An interfering activity against lymphocytic choriomeningitis virus replication associated with enhanced mutagenesis. J Gen Virol 91:990–1003 Martinez-Sobrido L, Zuniga EI, Rosario D, Garcia-Sastre A, de la Torre JC (2006) Inhibition of the type I interferon response by the nucleoprotein of the prototypic arenavirus lymphocytic choriomeningitis virus. J Virol 80:9192–9199 Martinez-Sobrido L, Giannakas P, Cubitt B, Garcia-Sastre A, de la Torre JC (2007) Differential inhibition of type I interferon induction by arenavirus nucleoproteins.
J Virol 81:12696–12703 Martinez-Sobrido L, Emonet S, Giannakas P, Cubitt B, Garcia-Sastre A, de la Torre JC (2009) Identification of amino acid residues critical for the anti-interferon activity of the nucleoprotein of the prototypic arenavirus lymphocytic choriomeningitis virus. J Virol 83:11330–11340 Mas A, Lopez-Galindez C, Cacho I, Gomez J, Martinez MA (2010) Unfinished stories on viral quasispecies and Darwinian views of evolution. J Mol Biol 397:865–877 Matthews RE (1979) Third report of the International Committee on Taxonomy of Viruses. Classification and nomenclature of viruses. Intervirology 12:129–296 Mendenhall M, Russell A, Juelich T, Messina EL, Smee DF, Freiberg AN, Holbrook MR, Furuta Y, de la Torre JC, Nunberg JH, Gowen BB (2011a) T-705 (favipiravir) inhibition of arenavirus replication in cell culture. Antimicrob Agents Chemother 55:782–787 Mendenhall M, Russell A, Smee DF, Hall JO, Skirpstunas R, Furuta Y, Gowen BB (2011b) Effective oral favipiravir (T-705) therapy initiated after the onset of clinical disease in a model of arenavirus hemorrhagic Fever. PLoS Negl Trop Dis 5:e1342 Meyer BJ, Southern PJ (1994) Sequence heterogeneity in the termini of lymphocytic choriomeningitis virus genomic and antigenomic RNAs. J Virol 68:7659–7664 Meyer BJ, de la Torre JC, Southern PJ (2002) Arenaviruses: genomic RNAs, transcription, and replication. Curr Top Microbiol Immunol 262:139–149 Monath TP, Casals J (1975) Diagnosis of Lassa fever and the isolation and management of patients. Bull World Health Organ 52:707–715 Monath TP, Newhouse VF, Kemp GE, Setzer HW, Cacciapuoti A (1974) Lassa virus isolation from Mastomys natalensis rodents during an epidemic in Sierra Leone. Science 185:263–265 Mordecai GJ, Miller KM, Di Cicco E, Schulze AD, Kaukinen KH, Ming TJ, Li S, Tabata A, Teffer A, Patterson DA, Ferguson HW, Suttle CA (2019) Endangered wild salmon infected by newly discovered viruses. 
Elife 8 Moreno H, Gallego I, Sevilla N, de la Torre JC, Domingo E, Martin V (2011) Ribavirin can be mutagenic for arenaviruses. J Virol 85:7246–7255

Moreno H, Tejero H, de la Torre JC, Domingo E, Martin V (2012a) Mutagenesis-mediated virus extinction: virus-dependent effect of viral load on sensitivity to lethal defection. PLoS ONE 7:e32550 Moreno H, Grande-Perez A, Domingo E, Martin V (2012b) Arenaviruses and lethal mutagenesis. Prospects for new ribavirin-based interventions. Viruses 4:2786–2805 Morimoto K, Hooper DC, Carbaugh H, Fu ZF, Koprowski H, Dietzschold B (1998) Rabies virus quasispecies: implications for pathogenesis. Proc Natl Acad Sci U S A 95:3152–3156 Morin B, Coutard B, Lelke M, Ferron F, Kerber R, Jamal S, Frangeul A, Baronti C, Charrel R, de Lamballerie X, Vonrhein C, Lescar J, Bricogne G, Gunther S, Canard B (2010) The N-terminal domain of the arenavirus L protein is an RNA endonuclease essential in mRNA transcription. PLoS Pathog 6:e1001038 Moshkoff DA, Salvato MS, Lukashevich IS (2007) Molecular characterization of a reassortant virus derived from Lassa and Mopeia viruses. Virus Genes 34:169–176 Muller G, Bruns M, Martinez Peralta L, Lehmann-Grube F (1983) Lymphocytic choriomeningitis virus. IV. Electron microscopic investigation of the virion. Arch Virol 75:229–242 Murphy FA, Webb PA, Johnson KM, Whitfield SG (1969) Morphological comparison of Machupo with lymphocytic choriomeningitis virus: basis for a new taxonomic group. J Virol 4:535–541 Nd N, Berthet N, Rougeron V, Mangombi JB, Durand P, Maganga GD, Bouchier C, Schneider BS, Fair J, Renaud F, Leroy EM (2015) Evidence of lymphocytic choriomeningitis virus (LCMV) in domestic mice in Gabon: risk of emergence of LCMV encephalitis in Central Africa. J Virol 89:1456–1460 Nikisins S, Rieger T, Patel P, Muller R, Gunther S, Niedrig M (2015) International external quality assessment study for molecular detection of Lassa virus. PLoS Negl Trop Dis 9:e0003793 Novella IS, Duarte EA, Elena SF, Moya A, Domingo E, Holland JJ (1995) Exponential increases of RNA virus fitness during large population transmissions. 
Proc Natl Acad Sci U S A 92:5841–5844 Okokhere P, Colubri A, Azubike C, Iruolagbe C, Osazuwa O, Tabrizi S, Chin E, Asad S, Ediale E, Rafiu M, Adomeh D, Odia I, Atafo R, Aire C, Okogbenin S, Pahlman M, Becker-Ziaja B, Asogun D, Fradet T, Fry B, Schaffner SF, Happi C, Akpede G, Gunther S, Sabeti PC (2018) Clinical and laboratory predictors of Lassa fever outcome in a dedicated treatment facility in Nigeria: a retrospective, observational cohort study. Lancet Infect Dis 18:684–695 Olayemi A, Obadare A, Oyeyiola A, Igbokwe J, Fasogbon A, Igbahenah F, Ortsega D, Asogun D, Umeh P, Vakkai I, Abejegah C, Pahlman M, Becker-Ziaja B, Gunther S, Fichet-Calvet E (2016) Arenavirus diversity and phylogeography of Mastomys natalensis rodents, Nigeria. Emerg Infect Dis 22:694–697 Oldstone MB, Campbell KP (2010) Decoding arenavirus pathogenesis: essential roles for alpha-dystroglycan-virus interactions and the immune response. Virology 411:170–179 Oldstone MB, Rodriguez M, Daughaday WH, Lampert PW (1984) Viral perturbation of endocrine function: disordered cell function leads to disturbed homeostasis and disease. Nature 307:278–281 Oldstone MB, Ahmed R, Buchmeier MJ, Blount P, Tishon A (1985) Perturbation of differentiated functions during viral infection in vivo. I. Relationship of lymphocytic choriomeningitis virus and host strains to growth hormone deficiency. Virology 142:158–174 Oldstone MB (2002) Biology and pathogenesis of lymphocytic choriomeningitis virus infection. In: Oldstone MB (ed) Arenaviruses, vol 263, pp 83–118 Olschlager S, Lelke M, Emmerich P, Panning M, Drosten C, Hass M, Asogun D, Ehichioya D, Omilabu S, Gunther S (2010) Improved detection of Lassa virus by reverse transcription-PCR targeting the 5’ region of S RNA.
J Clin Microbiol 48:2009–2013 Onyuok SO, Hu B, Li B, Fan Y, Kering K, Ochola GO, Zheng XS, Obanda V, Ommeh S, Yang XL, Agwanda B, Shi ZL (2019) Molecular detection and genetic characterization of novel RNA viruses in wild and synanthropic rodents and shrews in Kenya. Front Microbiol 10:2696 Ortega-Prieto AM, Sheldon J, Grande-Perez A, Tejero H, Gregori J, Quer J, Esteban JI, Domingo E, Perales C (2013) Extinction of hepatitis C virus by ribavirin in hepatoma cells involves lethal mutagenesis. PLoS ONE 8:e71039

Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, Conlan S, Quan PL, Hui J, Marshall J, Simons JF, Egholm M, Paddock CD, Shieh WJ, Goldsmith CS, Zaki SR, Catton M, Lipkin WI (2008) A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med 358:991–998 Parker WB (2005) Metabolism and antiviral activity of ribavirin. Virus Res 107:165–171 Parodi AS, Rugiero HR, Frigerio M, de la Barrera JM, Mettler N et al (1958) Concerning the epidemic outbreak in Junin. Dia Med 30:2300–2301 Parodi AS, Rugiero HR, Greenway DJ, Mettler N, Martinez A, Boxaca M et al (1959) Isolation of the Junin virus (epidemic hemorrhagic fever) from the mites of the epidemic area (Echinolaelaps echidninus, Barlese). Prensa Med Argent 46:2242–2244 Pasqual G, Rojek JM, Masin M, Chatton JY, Kunz S (2011) Old world arenaviruses enter the host cell via the multivesicular body and depend on the endosomal sorting complex required for transport. PLoS Pathog 7:e1002232 Peng R, Xu X, Jing J, Wang M, Peng Q, Liu S, Wu Y, Bao X, Wang P, Qi J, Gao GF, Shi Y (2020) Structural insight into arenavirus replication machinery. Nature 579:615–619 Perales C, Martin V, Ruiz-Jarabo CM, Domingo E (2005) Monitoring sequence space as a test for the target of selection in viruses. J Mol Biol 345:451–459 Perales C, Mateo R, Mateu MG, Domingo E (2007) Insights into RNA virus mutant spectrum and lethal mutagenesis events: replicative interference and complementation by multiple point mutants. J Mol Biol 369:985–1000 Perales C, Agudo R, Tejero H, Manrubia SC, Domingo E (2009) Potential benefits of sequential inhibitor-mutagen treatments of RNA virus infections. PLoS Pathog 5:e1000658 Perales C, Iranzo J, Manrubia SC, Domingo E (2012) The impact of quasispecies dynamics on the use of therapeutics. Trends Microbiol 20:595–603 Perez M, Craven RC, de la Torre JC (2003) The small RING finger protein Z drives arenavirus budding: implications for antiviral strategies. 
Proc Natl Acad Sci U S A 100:12978–12983 Perez M, Greenwald DL, de la Torre JC (2004) Myristoylation of the RING finger Z protein is essential for arenavirus budding. J Virol 78:11443–11448 Peters CJ (2002) Human infection with arenaviruses in the Americas. In: Oldstone MB (ed) Arenaviruses I, vol 262. Springer, Berlin Heidelberg, pp 65–74 Peters CJ (2006) Lymphocytic choriomeningitis virus–an old enemy up to new tricks. N Engl J Med 354:2208–2211 Pfeiffer JK, Kirkegaard K (2005) Increased fidelity reduces poliovirus fitness and virulence under selective pressure in mice. PLoS Pathog 1:e11 Pflug A, Guilligay D, Reich S, Cusack S (2014) Structure of influenza A polymerase bound to the viral RNA promoter. Nature 516:355–360 Pinschewer DD, Perez M, Sanchez AB, de la Torre JC (2003) Recombinant lymphocytic choriomeningitis virus expressing vesicular stomatitis virus glycoprotein. Proc Natl Acad Sci U S A 100:7895–7900 Pinschewer DD, Perez M, de la Torre JC (2005) Dual role of the lymphocytic choriomeningitis virus intergenic region in transcription termination and virus propagation. J Virol 79:4519–4526 Pircher H, Moskophidis D, Rohrer U, Burki K, Hengartner H, Zinkernagel RM (1990) Viral escape by selection of cytotoxic T cell-resistant virus variants in vivo. Nature 346:629–633 Poch O, Sauvaget I, Delarue M, Tordo N (1989) Identification of four conserved motifs among the RNA-dependent polymerase encoding elements. EMBO J 8:3867–3874 Pontremoli C, Forni D, Sironi M (2019) Arenavirus genomics: novel insights into viral diversity, origin, and evolution. Curr Opin Virol 34:18–28 Prince GA, Ottolini MG, Moscona A (2001) Contribution of the human parainfluenza virus type 3 HN-receptor interaction to pathogenesis in vivo. J Virol 75:12446–12451 Pulkkinen AJ, Pfau CJ (1970) Plaque size heterogeneity: a genetic trait of lymphocytic choriomeningitis virus.
Appl Microbiol 20:123–128 Pushpakom S, Iorio F, Eyers PA, Escott KJ, Hopper S, Wells A, Doig A, Guilliams T, Latimer J, McNamee C, Norris A, Sanseau P, Cavalla D, Pirmohamed M (2019) Drug repurposing: progress, challenges and recommendations. Nat Rev Drug Discov 18:41–58

300

M. Sironi et al.

Qi X, Lan S, Wang W, Schelde LM, Dong H, Wallat GD, Ly H, Liang Y, Dong C (2010) Cap binding and immune evasion revealed by Lassa nucleoprotein structure. Nature 468:779–783 Radoshitzky SR, Bao Y, Buchmeier MJ, Charrel RN, Clawson AN, Clegg CS, DeRisi JL, Emonet S, Gonzalez JP, Kuhn JH, Lukashevich IS, Peters CJ, Romanowski V, Salvato MS, Stenglein MD, de la Torre JC (2015) Past, present, and future of arenavirus taxonomy. Arch Virol 160:1851–1874 Riviere Y (1987) Mapping arenavirus genes causing virulence. Curr Top Microbiol Immunol 133:59–65 Riviere Y, Oldstone MB (1986) Genetic reassortants of lymphocytic choriomeningitis virus: unexpected disease and mechanism of pathogenesis. J Virol 59:363–368 Riviere Y, Ahmed R, Oldstone MB (1986) The use of lymphocytic choriomeningitis virus reassortants to map viral genes causing virulence. Med Microbiol Immunol 175:191–192 Rojek JM, Kunz S (2008) Cell entry by human pathogenic arenaviruses. Cell Microbiol 10:828–835 Rojek JM, Perez M, Kunz S (2008a) Cellular entry of lymphocytic choriomeningitis virus. J Virol 82:1505–1517 Rojek JM, Sanchez AB, Thao NN, de la Torre JC, Kunz S (2008b) Different mechanisms of cell entry by human pathogenic Old World and New World arenaviruses. J Virol Ruiz-Jarabo CM, Ly C, Domingo E, de la Torre JC (2003) Lethal mutagenesis of the prototypic arenavirus lymphocytic choriomeningitis virus (LCMV). Virology 308:37–47 Russo IR, Sole CL, Barbato M, von Bramann U, Bruford MW (2016) Landscape determinants of fine-scale genetic structure of a small rodent in a heterogeneous landscape (Hluhluwe-iMfolozi Park, South Africa). Sci Rep 6:29168 Safronetz D, Rosenke K, Westover JB, Martellaro C, Okumura A, Furuta Y, Geisbert J, Saturday G, Komeno T, Geisbert TW, Feldmann H, Gowen BB (2015) The broad-spectrum antiviral favipiravir protects guinea pigs from lethal Lassa virus infection post-disease onset. 
Sci Rep 5:14775 Sakabe S, Hartnett JN, Ngo N, Goba A, Momoh M, Sandi JD, Kanneh L, Cubitt B, Garcia SD, Ware BC, Kotliar D, Robles-Sikisaka R, Gangavarapu K, Branco LM, Eromon P, Odia I, OgbainiEmovon E, Folarin O, Okogbenin S, Okokhere PO, Happi C, Sabeti PC, Andersen KG, Garry RF, de la Torre JC, Grant DS, Schieffelin JS, Oldstone MBA, Sullivan BM (2020) Identification of common CD8(+) T cell epitopes from Lassa fever survivors in Nigeria and Sierra Leone. J Virol 94 Salu OB, James AB, Bankole HS, Agbla JM, Da Silva M, Gbaguidi F, Loko CF, Omilabu SA (2019) Molecular confirmation and phylogeny of Lassa fever virus in Benin Republic 2014–2016. Afr J Lab Med 8:803 Sanjuan R, Codoner FM, Moya A, Elena SF (2004) Natural selection and the organ-specific differentiation of HIV-1 V3 hypervariable region. Evolution 58:1185–1194 Sanz-Ramos M, Rodriguez-Calvo T, Sevilla N (2012) Mutagenesis-mediated decrease of pathogenicity as a feature of the mutant spectrum of a viral population. PLoS ONE 7:e39941 Saunders AA, Ting JP, Meisner J, Neuman BW, Perez M, de la Torre JC, Buchmeier MJ (2007) Mapping the landscape of the lymphocytic choriomeningitis virus stable signal peptide reveals novel functional domains. J Virol 81:5649–5657 Scott TFRT (1936) Meningitis in man caused by a filterable virus: I. Two cases and the method of obtaining a virus from their spinal fluids. J Exp Med 69:397–414 Sevilla N, Kunz S, Holz A, Lewicki H, Homann D, Yamada H, Campbell KP, de La Torre JC, Oldstone MB (2000) Immunosuppression and resultant viral persistence by specific viral targeting of dendritic cells. J Exp Med 192:1249–1260 Sevilla N, Domingo E, de la Torre JC (2002) Contribution of LCMV towards deciphering biology of quasispecies in vivo. Curr Top Microbiol Immunol 263:197–220 Sevilla N, de la Torre JC (2006) Arenavirus diversity and evolution: quasispecies in vivo. 
In Domingo E (ed) Quasispecies: concepts and implications for virology, vol 299, pp 315–335 Shi M, Lin XD, Tian JH, Chen LJ, Chen X, Li CX, Qin XC, Li J, Cao JP, Eden JS, Buchmann J, Wang W, Xu J, Holmes EC, Zhang YZ (2016) Redefining the invertebrate RNA virosphere. Nature 540:539–543

Mammarenavirus Genetic Diversity and Its Biological Implications

301

Siddle KJ, Eromon P, Barnes KG, Mehta S, Oguzie JU, Odia I, Schaffner SF, Winnicki SM, Shah RR, Qu J, Wohl S, Brehio P, Iruolagbe C, Aiyepada J, Uyigue E, Akhilomen P, Okonofua G, Ye S, Kayode T, Ajogbasile F, Uwanibe J, Gaye A, Momoh M, Chak B, Kotliar D, Carter A, Gladden-Young A, Freije CA, Omoregie O, Osiemi B, Muoebonam EB, Airende M, Enigbe R, Ebo B, Nosamiefan I, Oluniyi P, Nekoui M, Ogbaini-Emovon E, Garry RF, Andersen KG, Park DJ, Yozwiak NL, Akpede G, Ihekweazu C, Tomori O, Okogbenin S, Folarin OA, Okokhere PO, MacInnis BL, Sabeti PC et al (2018) Genomic analysis of Lassa virus during an increase in cases in Nigeria in 2018. N Engl J Med 379:1745–1753 Sogoba N, Feldmann H, Safronetz D (2012) Lassa fever in West Africa: evidence for an expanded region of endemicity. Zoonoses Public Health 59(Suppl 2):43–47 Speir RW, Wood O, Liebhaber H, Buckley SM (1970) Lassa fever, a new virus disease of man from West Africa. IV. Electron microscopy of Vero cell cultures infected with Lassa virus. Am J Trop Med Hyg 19:692–694 Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30:1312–1313 Stenglein MD, Sanders C, Kistler AL, Ruby JG, Franco JY, Reavill DR, Dunker F, Derisi JL (2012) Identification, characterization, and in vitro culture of highly divergent arenaviruses from boa constrictors and annulated tree boas: candidate etiological agents for snake inclusion body disease. Mbio 3:e00180-e212 Stenglein MD, Jacobson ER, Chang LW, Sanders C, Hawkins MG, Guzman DS, Drazenovich T, Dunker F, Kamaka EK, Fisher D, Reavill DR, Meola LF, Levens G, DeRisi JL (2015) Widespread recombination, reassortment, and transmission of unbalanced compound viral genotypes in natural arenavirus infections. 
PLoS Pathog 11:e1004900 Strecker T, Eichler R, Meulen J, Weissenhorn W, Dieter Klenk H, Garten W, Lenz O (2003) Lassa virus Z protein is a matrix protein and sufficient for the release of virus-like particles [corrected]. J Virol 77:10700–10705 Strecker T, Maisa A, Daffis S, Eichler R, Lenz O, Garten W (2006) The role of myristoylation in the membrane association of the Lassa virus matrix protein Z. Virol J 3:93 Sullivan BM, Emonet SF, Welch MJ, Lee AM, Campbell KP, de la Torre JC, Oldstone MB (2011) Point mutation in the glycoprotein of lymphocytic choriomeningitis virus is necessary for receptor binding, dendritic cell infection, and long-term persistence. Proc Natl Acad Sci U S A 108:2969– 2974 Teng MN, Borrow P, Oldstone MB, de la Torre JC (1996) A single amino acid change in the glycoprotein of lymphocytic choriomeningitis virus is associated with the ability to cause growth hormone deficiency syndrome. J Virol 70:8438–8443 Tishon A, Eddleston M, de la Torre JC, Oldstone MB (1993) Cytotoxic T lymphocytes cleanse viral gene products from individually infected neurons and lymphocytes in mice persistently infected with lymphocytic choriomeningitis virus. Virology 197:463–467 Tortorici MA, Albarino CG, Posik DM, Ghiringhelli PD, Lozano ME, Rivera Pomar R, Romanowski V (2001) Arenavirus nucleocapsid protein displays a transcriptional antitermination activity in vivo. Virus Res 73:41–55 Trappier SG, Conaty AL, Farrar BB, Auperin DD, McCormick JB, Fisher-Hoch SP (1993) Evaluation of the polymerase chain reaction for diagnosis of Lassa virus infection. Am J Trop Med Hyg 49:214–221 Traub EA (1935) Filterable virus recovered from white mice. Science 298–299 Trautmann L, Janbazian L, Chomont N, Said EA, Gimmig S, Bessette B, Boulassel MR, Delwart E, Sepulveda H, Balderas RS, Routy JP, Haddad EK, Sekaly RP (2006) Upregulation of PD1 expression on HIV-specific CD8+ T cells leads to reversible immune dysfunction. 
Nat Med 12:1198–1202 Trivedi P, Meyer KK, Streblow DN, Preuninger BL, Schultz KT, Pauza CD (1994) Selective amplification of simian immunodeficiency virus genotypes after intrarectal inoculation of rhesus monkeys. J Virol 68:7649–7653

302

M. Sironi et al.

Urata S, Noda T, Kawaoka Y, Yokosawa H, Yasuda J (2006) Cellular factors required for Lassa virus budding. J Virol 80:4191–4195 Valsamakis A, Riviere Y, Oldstone MB (1987) Perturbation of differentiated functions in vivo during persistent viral infection. III. Decreased growth hormone mRNA. Virology 156:214–220 Vignuzzi M, Stone JK, Arnold JJ, Cameron CE, Andino R (2006) Quasispecies diversity determines pathogenesis through cooperative interactions in a viral population. Nature 439:344–348 Vilibic-Cavlek T, Savic V, Ferenc T, Mrzljak A, Barbic L, Bogdanic M, Stevanovic V, Tabain I, Ferencak I, Zidovec-Lepej S (2021) Lymphocytic choriomeningitis-emerging trends of a neglected virus: a narrative review. Trop Med Infect Dis 6 Volpon L, Osborne MJ, Capul AA, de la Torre JC, Borden KL (2010) Structural characterization of the Z RING-eIF4E complex reveals a distinct mode of control for eIF4E. Proc Natl Acad Sci U S A 107:5441–5446 Wang J, Yang X, Liu H, Wang L, Zhou J, Han X, Zhu Y, Yang W, Pan H, Zhang Y, Shi Z (2019) Prevalence of Wenzhou virus in small mammals in Yunnan Province. China. Plos Negl Trop Dis 13:e0007049 Wang N, Yang L, Li G, Zhang X, Shao J, Ma J, Chen S, Liu Q (2021) Molecular detection and genetic characterization of Wenzhou virus in rodents in Guangzhou, China. BMC Vet Res 17:301 Wauquier N, Petitdemange C, Tarantino N, Maucourant C, Coomber M, Lungay V, Bangura J, Debre P, Vieillard V (2019) HLA-C-restricted viral epitopes are associated with an escape mechanism from KIR2DL2(+) NK cells in Lassa virus infection. EBioMedicine 40:605–613 Weaver SC, Salas RA, de Manzione N, Fulhorst CF, Duno G, Utrera A, Mills JN, Ksiazek TG, Tovar D, Tesh RB (2000) Guanarito virus (Arenaviridae) isolates from endemic and outlying localities in Venezuela: sequence comparisons among and within strains isolated from Venezuelan hemorrhagic fever patients and rodents. 
Virology 266:189–195 Weaver SC, Salas RA, de Manzione N, Fulhorst CF, Travasos da Rosa AP, Duno G, Utrera A, Mills JN, Ksiazek TG, Tovar D, Guzman H, Kang W, Tesh RB (2001) Extreme genetic diversity among Pirital virus (Arenaviridae) isolates from western Venezuela. Virology 285:110–118 Whitmer SLTS, Daniel C, Hans-Peter D, Kelly F, Ketan P, Shelley MB, William GD, John DK, Pierre ER, Jonas S-C, Elisabeth F-C, Bernd N, Petra E, Toni R, Svenja W, Sarah Katharina F, Markus E, Jan Philipp M, Tilman S, Torsten H, William A, Kofi B, Juliana Naa Dedei A, Bruce R, Jay BV, Aneesh KM, Lyon GM, Gerrit K, De Philipp L, Gundolf S, Christoph S, Ulrike W, Jochen WUF, Matthias K, Colleen SK, Timo W, Stuart TN, Stephan B, Ute S, Stephan G (2018) New lineage of Lassa virus, Togo, 2016. Emerging Infectious Disease 24:599 Wiley MR, Fakoli L, Letizia AG, Welch SR, Ladner JT, Prieto K, Reyes D, Espy N, Chitty JA, Pratt CB, Di Paola N, Taweh F, Williams D, Saindon J, Davis WG, Patel K, Holland M, Negron D, Stroher U, Nichol ST, Sozhamannan S, Rollin PE, Dogba J, Nyenswah T, Bolay F, Albarino CG, Fallah M, Palacios G (2019) Lassa virus circulating in Liberia: a retrospective genomic characterisation. Lancet Infect Dis 19:1371–1378 Wolff H, Lange JV, Webb PA (1978) Interrelationships among arenaviruses measured by indirect immunofluorescence. Intervirology 9:344–350 Wright CF, Morelli MJ, Thebaud G, Knowles NJ, Herzyk P, Paton DJ, Haydon DT, King DP (2011) Beyond the consensus: dissecting within-host viral population diversity of foot-and-mouth disease virus by using next-generation genome sequencing. J Virol 85:2266–2275 Wu Z, Du J, Lu L, Yang L, Dong J, Sun L, Zhu Y, Liu Q, Jin Q (2018a) Detection of Hantaviruses and Arenaviruzses in three-toed jerboas from the Inner Mongolia Autonomous Region, China. 
Emerg Microbes Infect 7:35 Wu Z, Lu L, Du J, Yang L, Ren X, Liu B, Jiang J, Yang J, Dong J, Sun L, Zhu Y, Li Y, Zheng D, Zhang C, Su H, Zheng Y, Zhou H, Zhu G, Li H, Chmura A, Yang F, Daszak P, Wang J, Liu Q, Jin Q (2018b) Comparative analysis of rodent and small mammal viromes to better understand the wildlife origin of emerging infectious diseases. Microbiome 6:178 Wu-Hsieh B, Howard DH, Ahmed R (1988) Virus-induced immunosuppression: a murine model of susceptibility to opportunistic infection. J Infect Dis 158:232–235

Mammarenavirus Genetic Diversity and Its Biological Implications

303

Yadouleton A, Agolinou A, Kourouma F, Saizonou R, Pahlmann M, Bedie SK, Bankole H, BeckerZiaja B, Gbaguidi F, Thielebein A, Magassouba N, Duraffour S, Baptiste JP, Gunther S, FichetCalvet E (2019) Lassa virus in pygmy mice, benin, 2016–2017. Emerg Infect Dis 25:1977–1979 York J, Nunberg JH (2006) Role of the stable signal peptide of Junin arenavirus envelope glycoprotein in pH-dependent membrane fusion. J Virol 80:7775–7780 York J, Nunberg JH (2007) Distinct requirements for signal peptidase processing and function in the stable signal peptide subunit of the Junin virus envelope glycoprotein. Virology 359:72–81 York J, Romanowski V, Lu M, Nunberg JH (2004) The signal peptide of the Junin arenavirus envelope glycoprotein is myristoylated and forms an essential subunit of the mature G1–G2 complex. J Virol 78:10783–10792 Young PR, Howard CR (1983) Fine structure analysis of Pichinde virus nucleocapsids. J Gen Virol 64(Pt 4):833–842 Zapata JC, Salvato MS (2013) Arenavirus variations due to host-specific adaptation. Viruses 5:241– 278 Zinkernagel RM (2002) Lymphocytic choriomeningitis virus and immunology. Curr Top Microbiol Immunol 263:1–5

Genome Structure, Life Cycle, and Taxonomy of Coronaviruses and the Evolution of SARS-CoV-2

Kevin Lamkiewicz, Luis Roger Esquivel Gomez, Denise Kühnert, and Manja Marz

Abstract Coronaviruses have a broad host range and exhibit high zoonotic potential. In this chapter, we describe their genomic organization in terms of encoded proteins and provide an introduction to the peculiar discontinuous transcription mechanism. Further, we present evolutionarily conserved genomic RNA secondary structure features, which are involved in the complex replication mechanism. With a focus on computational methods, we review the emergence of SARS-CoV-2 starting with the 2019 strains. In that context, we also discuss the debated hypothesis of whether SARS-CoV-2 was created in a laboratory. We focus on the molecular evolution and the epidemiological dynamics of this recently emerged pathogen and we explain how variants of concern are detected and characterised. COVID-19, the disease caused by SARS-CoV-2, can spread through different transmission routes and also depends on a number of risk factors. We describe how current computational models of viral epidemiology, or more specifically, phylodynamics, have facilitated and will continue to enable a better understanding of the epidemic dynamics of SARS-CoV-2.

K. Lamkiewicz, L. R. E. Gomez, D. Kühnert, M. Marz: These authors contributed equally.

K. Lamkiewicz · M. Marz (corresponding author): RNA Bioinformatics and High-Throughput Analysis, Friedrich Schiller University Jena, Leutragraben 1, 07743 Jena, Germany

L. R. Esquivel Gomez · D. Kühnert: Transmission, Infection, Diversification and Evolution Group, Max Planck Institute for the Science of Human History, Kahlaische Straße 10, 07745 Jena, Germany

K. Lamkiewicz · D. Kühnert · M. Marz: European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany

K. Lamkiewicz · M. Marz: German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Puschstr. 4, 04103 Leipzig, Germany

M. Marz: FLI Leibniz Institute for Age Research, Beutenbergstraße 11, 07745 Jena, Germany

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023. E. Domingo et al. (eds.), Viral Fitness and Evolution, Current Topics in Microbiology and Immunology 439, https://doi.org/10.1007/978-3-031-15640-3_9


1 An Introduction to Coronaviruses

In this chapter, the focus is on the structure of coronavirus genomes. Aside from the genome organization and the taxonomy of coronaviruses, we introduce conserved RNA secondary structure elements present in the UTRs of the genome, which are vital for the virus life cycle. In the second part of the chapter, we discuss the evolution and epidemiology of the pandemic coronavirus SARS-CoV-2 and the application of phylodynamic methods to SARS-CoV-2 genomes.

Coronaviruses (CoVs) are enveloped positive-sense (+) single-stranded (ss) RNA viruses that belong to the family Coronaviridae in the order Nidovirales (Lefkowitz et al. 2018). The family Coronaviridae is divided into the newly established subfamilies Letovirinae and Orthocoronavirinae. The common word coronavirus refers to the four genera of the latter subfamily, termed Alpha-, Beta-, Gamma-, and Deltacoronavirus (Snijder et al. 2003; Gorbalenya et al. 2004; Woo et al. 2010, 2012; King et al. 2012). Coronaviruses have a broad host range, spanning several mammalian and avian hosts. Due to their zoonotic potential, which has been demonstrated in several recent epidemics and the most recent SARS-CoV-2 pandemic, CoVs are of economic and medical importance (Menachery et al. 2017; Vijay and Perlman 2016).

Coronavirus particles have a total size ranging from 60 to 160 nm, and their envelope embeds several different types of membrane proteins. The crown-shaped spikes on the surface, about 20 nm in size, consist of parts of the large, glycosylated S protein, which forms a membrane-anchored trimer. Inside the envelope, a presumably icosahedral capsid containing helical nucleoprotein complexes can be found. Figure 1 shows an EM picture of SARS-CoV-2 virions attached to the host cell surface. Members of the Coronaviridae have some of the largest RNA genomes, with a genome size of approximately 30 kb. The largest RNA genome known to date is 41.1 kb (Saberi et al. 2018).
In general, coronaviruses encode large polyproteins in the first half of their genome (ORF1a and ORF1b) and several smaller proteins in the second half (Fig. 2). The polyproteins translated from ORF1a and ORF1b are processed into several non-structural proteins such as the RNA-dependent RNA polymerase (RdRp). Structural proteins are encoded in the second half of the genome and vary in number between different species and genera. Transcription is carried out by a unique mechanism called “discontinuous extension” of minus strands. This mechanism produces a nested set of subgenomic (sg) mRNAs (Sawicki and Sawicki 1995, 1998). Each sg mRNA is 3’-coterminal with the genome and shares a 5’ leader sequence identical to part of the 5’ UTR of the viral genome (Spaan et al. 1983; Zuniga et al. 2004; Pasternak et al. 2006; Sawicki et al. 2007). Depending on the viral species, this leader sequence varies between 60 and 95 nucleotides in size. As a rule, only the 5’-most ORF of a sg mRNA is translated into a protein sequence (Nakagawa et al. 2016).

For SARS-CoV-2, a member of the genus Betacoronavirus (subgenus: Sarbecovirus), eight canonical sg mRNAs have been described, encoding a total of 13 ORFs. A simplified overview of the genome organization and the canonical sg mRNAs is given in Fig. 2. For mRNA 3, mRNA 7, and mRNA 9, leaky scanning is assumed to allow these sg mRNAs to encode several proteins. These numbers are similar to SARS-CoV, first described in 2003, which has eight sg mRNAs encoding 12 proteins. However, in contrast to SARS-CoV, SARS-CoV-2 harbors a (suboptimal) furin-cleavage site in the spike protein (mRNA 2, gene S). Such a furin-cleavage site is not uncommon in other human coronaviruses (common cold CoVs and MERS-CoV), but has not been described for sarbecoviruses previously (Peacock et al. 2021). It is hypothesized that the cleavage site facilitates transmission between humans (Coutard et al. 2020).

During infection, virions of SARS-CoV-2 enter the cell via endocytosis. The viral spike protein binds the human cell receptor ACE2, which initiates cell entry (Shang et al. 2020). Due to the furin-cleavage site in the spike protein, an alternative way of cell entry via membrane fusion is possible. After uncoating of the genome in the cytoplasm, translation of viral proteins and replication of the genome are established by exploiting the host machinery (V’kovski et al. 2021). Further, several non-structural proteins (nsp) are involved in shutting down the processing of cellular proteins, e.g. Nsp1 causes a host translation shutoff (Schubert et al. 2020). Viral protein levels are regulated via the generation of the nested sg mRNAs, as described in the next section. Virions are assembled in the cytoplasm and released into the extracellular space via exocytosis.

Fig. 1 SARS coronavirus-2 (SARS-CoV-2, isolate SARS-CoV-2/Italy-INMI1). Electron microscopy. Ultra-thin section through a Vero cell with virus particles on the surface. Scale: 100 nm. Source Tobias Hoffmann, Robert Koch Institute (RKI), 2020


Fig. 2 Genome organisation and overview of canonical sg mRNAs of SARS-CoV-2. Shown are the positive-oriented molecules resulting from transcription of the individual sg mRNAs. During negative-strand synthesis, the RdRp may perform a template switch, causing a discontinuous transcription of RNA. The template switch is facilitated by the sequence similarity of TRS sequences in the genome (upstream of each canonical ORF) and the 5’-UTR (downstream of the leader sequence present in all sg mRNAs). For each sg mRNA, only the ORF closest to the 5’ end is translated
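The nested, 3’-coterminal organisation of the sg mRNAs can be made concrete with a small toy model. The following Python sketch is only an illustration under stated assumptions: the "genome" is an artificial 60-nt string, the ORF stand-ins are invented, and the motif `ACGAAC` is used as an assumed TRS core sequence. The function names (`build_toy_genome`, `nested_mrnas`) are hypothetical and do not come from any published tool.

```python
"""Toy model of the nested set of coronavirus sg mRNAs (illustrative only)."""

TRS = "ACGAAC"  # assumed TRS core motif, used here purely for illustration


def build_toy_genome():
    """Assemble an artificial plus-strand genome with one TRS-L and two TRS-Bs."""
    leader = "AUUAAAGG"    # hypothetical 5' leader
    orf1 = "AUGGGCUUUUAA"  # stand-in for ORF1a/ORF1b
    orf_s = "AUGCCCUAA"    # stand-in for a structural gene
    orf_n = "AUGAAAUAG"    # stand-in for another structural gene
    utr3 = "GGCC"          # stand-in for the 3' UTR
    return leader + TRS + orf1 + TRS + orf_s + TRS + orf_n + utr3


def find_all(seq, motif):
    """Return the start positions of every occurrence of motif in seq."""
    positions, start = [], seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def nested_mrnas(genome, trs=TRS):
    """Join the 5' leader to the region 3' of each TRS-B and number by length."""
    sites = find_all(genome, trs)
    if not sites:
        raise ValueError("no TRS found in genome")
    trs_l, trs_bs = sites[0], sites[1:]
    leader = genome[: trs_l + len(trs)]  # leader up to and including the TRS-L
    mrnas = [genome]                     # mRNA 1 is the full-length genome
    for site in trs_bs:
        body = genome[site + len(trs):]  # everything 3' of this TRS-B
        mrnas.append(leader + body)
    mrnas.sort(key=len, reverse=True)    # longest first: mRNA 1, mRNA 2, ...
    return {f"mRNA {i}": m for i, m in enumerate(mrnas, start=1)}


if __name__ == "__main__":
    for name, rna in nested_mrnas(build_toy_genome()).items():
        print(f"{name:7s} {len(rna):3d} nt  {rna}")
```

Every molecule produced this way starts with the same leader and ends with the same 3’ terminus, and the first ORF after the leader differs between molecules, which mirrors why each sg mRNA typically yields a different protein.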

2 Discontinuous transcription and subgenomic mRNAs

One of the key regulatory elements that control coronavirus sg mRNA synthesis is the transcription-regulating sequence (TRS) (Sola et al. 2011, 2015). TRSs are located downstream of the 5’-leader on the genome (“leader TRS”, TRS-L) and upstream of each of the major ORFs present in the 3’-terminal genome region (“body TRS”, TRS-B). It is hypothesized that the elongation of negative-stranded RNA is attenuated when the replication-transcription complex (RTC) encounters a TRS-B. This attenuation may subsequently cause RNA synthesis to be continued from either the current template position (continuous RNA synthesis) or an upstream template position (discontinuous RNA synthesis). The discontinuous mode of negative-strand RNA synthesis is initiated with a template switch of the RTC. This switch is primarily guided by RNA-RNA base-pairing between the 3’ end of the nascent minus strand, which contains the reverse complement of the TRS-B (cTRS-B), and the TRS-L (Fig. 3). At this cTRS-B and TRS-L pairing site, the RTC resumes (and completes) minus-strand synthesis by transcribing the 5’-leader sequence of the template RNA (Zuniga et al. 2004; Sola et al. 2011). It was shown that the template switch occurs during negative-strand RNA synthesis (Zuniga et al. 2004; Sola et al. 2011). If no template switch occurs, the complete genome is copied, resulting in a complete negative-oriented template for genome replication. The molecule subsequently synthesized from this template is called mRNA 1, whereas the sg mRNAs are enumerated based on their length (i.e., the longest sg mRNA is called mRNA 2, etc.).

Fig. 3 During minus-strand synthesis the TRS-B sequence is transcribed into its reverse complement (cTRS-B). Given this complementarity the cTRS-B and the TRS-L can interact with each other. This interaction facilitates the template switch of the RdRp. Even though the exact mechanisms of the template switch (RdRp pausing, guidance of the cTRS-B to the proximity of the TRS-L, etc.) are not completely understood, it is assumed that several RNA-RNA and RNA–protein interactions are involved in this process

Due to several epidemics in the past, coronaviruses have been studied thoroughly compared to other viruses. The first human-infecting coronavirus was observed in 1961 and has been named B814 (Kendall et al. 1962). Only a few years later another common cold virus, Human coronavirus 229E (HCoV-229E), was characterized (Hamre and Procknow 1966). In the following decades, several coronaviruses have been identified. The ICTV currently recognizes 45 different coronavirus species, six of which are known to infect humans (representative human-infecting viruses for each species are named in brackets): (1) Human coronavirus 229E (HCoV-229E), (2) Human coronavirus NL63 (HCoV-NL63), (3) Betacoronavirus 1 (HCoV-OC43), (4) Human coronavirus HKU1 (HCoV-HKU1), (5) Middle East respiratory syndrome-related coronavirus (MERS-CoV), and (6) Severe acute respiratory syndrome-related coronavirus (SARS-CoV and SARS-CoV-2) (King et al. 2012). Unfortunately, the virus that B814 refers to was lost; it is not clear which of the presently known coronaviruses B814 was. Further, the first four viral species cause mild symptoms, comparable with the common cold viruses, whereas the three viruses of the latter two species (MERS-CoV, SARS-CoV, and SARS-CoV-2) cause potentially severe and lethal diseases, commonly called MERS, SARS, and COVID-19, respectively (Liu et al. 2021a, b, c).

Considering the taxonomy of Coronaviridae, HCoV-229E and HCoV-NL63 are alphacoronaviruses, whereas HCoV-HKU1, HCoV-OC43, SARS-CoV, SARS-CoV-2, and MERS-CoV are betacoronaviruses. A phylogenetic tree, annotated with the genus information and based on the complete genome sequences of NCBI RefSeq database entries, is given in Fig. 4 and includes coronaviruses representing all four genera of Orthocoronavirinae.
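The base-pairing that guides the template switch (Fig. 3) rests on simple reverse complementarity between the nascent minus strand and the TRS-L. A minimal Python sketch of that idea is given below. It assumes the hexamer `ACGAAC` as the TRS core, and the helper names (`reverse_complement`, `paired_positions`) are hypothetical; it counts Watson-Crick pairs only and ignores everything else that is thought to contribute to the switch (RdRp pausing, flanking sequence, RNA–protein contacts).

```python
"""Illustrative check of cTRS-B / TRS-L complementarity (toy example)."""

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def paired_positions(plus_site, minus_segment):
    """Count Watson-Crick pairs when both strands are aligned antiparallel."""
    return sum(
        COMPLEMENT[p] == m
        for p, m in zip(plus_site, reversed(minus_segment))
    )


if __name__ == "__main__":
    trs_b = "ACGAAC"                    # assumed TRS-B core on the plus strand
    ctrs_b = reverse_complement(trs_b)  # what the nascent minus strand carries
    trs_l = "ACGAAC"                    # identical leader TRS core
    mutated_trs_l = "ACGUAC"            # hypothetical single-point mutant

    print("cTRS-B:", ctrs_b)
    print("pairs with TRS-L:        ", paired_positions(trs_l, ctrs_b))
    print("pairs with mutated TRS-L:", paired_positions(mutated_trs_l, ctrs_b))
```

In this toy setting the wild-type TRS-L pairs along its full length with the cTRS-B, whereas a single substitution removes one pair; the sketch makes no claim about how many pairs are needed for a switch to occur.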

Fig. 4 Phylogenetic tree of selected coronaviruses representing their respective genus. The tree is based on the nucleotide sequence of each RdRp. Coronaviruses infecting humans are indicated with a red dot. The multiple sequence alignment has been calculated with MAFFT (v7.470; Katoh and Standley 2013), the phylogenetic tree with QuickTree (v1.1; Howe, Bateman, and Durbin 2002)

3 Coronaviruses beyond their sequence: RNA structures

RNA-RNA interactions are essential for viruses. Base pairing of reverse-complementary nucleotides allows RNA molecules to fold into (secondary) structures. Such RNA structures have already been described for viruses from all Baltimore classes and are often important regulators of fundamental molecular processes such as replication. It is assumed that untranslated regions (UTRs) of RNA molecules (and RNA viruses) are more structured than coding regions to enable rapid transcription and translation. In the case of coronaviruses, UTRs have also been studied for secondary structures of RNA (Yang and Leibowitz 2015; Madhugiri et al. 2014). The first 200 to 300 nucleotides of a coronavirus genome comprise the 5’-UTR, whereas the 3’-UTR has a length of about 300 to 500 nucleotides. For alpha- and betacoronaviruses, several studies have been published establishing general and specific RNA secondary structure models for the UTRs of these genera. For example, the 5’-UTR harbors RNA structures that are essential for viral replication, translation initiation, and packaging of new virions.

In the 5’-UTR of sarbecoviruses, and thus also of SARS-CoV-2, four conserved RNA structures are present. A fifth structure overlaps with ORF1a. Figure 5 shows all these RNA structures with their respective co-variation in the 5’-UTR of sarbecoviruses. SL1 is important for viral replication, presumably by playing a role during transcription.


Fig. 5 Overview of conserved structural RNA elements in the 5’-UTR of sarbecoviruses and the corresponding multiple sequence alignment. Base-pairs are colored according to their degree of conservation. Further, important sequence motifs are highlighted in the structure. The alignment and consensus structure were calculated with LocARNA (v.2.0.0RC8; Will et al. 2007)

It has been shown that mutations in the upper part of SL1 have a higher impact on viral replication levels, suggesting that the exposed regions might be of importance. SL2 is crucial for viral viability. A study by Madhugiri et al. suggests that the sequence itself is not as important as the structure, since (1) any disruption of G-C pairing has an impact on viral replication and (2) replacing the SL2 sequence of HCoV-229E with the SL2 sequence of SARS-CoV shows no effect on viral titer level (Madhugiri et al. 2018). In sarbecoviruses, an additional stem-loop is formed downstream of SL2. This stem-loop, named SL3, harbors the TRS-L sequence and is essential for sg mRNA synthesis, even though it has not been observed in other subgenera of Betacoronavirus (Sola et al. 2011).
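The observation that structure can matter more than sequence is the logic behind co-variation: a mutation that breaks a base pair can be rescued by a compensatory mutation in its pairing partner. The sketch below illustrates this with a hypothetical 12-nt hairpin in dot-bracket notation; the sequence is invented and is not the SL2 of any coronavirus. The function names are assumptions made for this example, and allowing G-U wobble pairs alongside Watson-Crick pairs is a common convention rather than something stated in this chapter.

```python
"""Illustrative test of whether a sequence is compatible with a stem-loop."""

# Watson-Crick pairs plus the G-U wobble pair
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def base_pairs(dot_bracket):
    """Return the (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], []
    for i, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_compatible(seq, dot_bracket):
    """True if every pair required by the structure can form in seq."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure differ in length")
    return all((seq[i], seq[j]) in CANONICAL for i, j in base_pairs(dot_bracket))


if __name__ == "__main__":
    structure = "((((....))))"        # hypothetical hairpin
    wild_type = "GCGCUUCGGCGC"        # invented sequence forming four G-C/C-G pairs
    single_mutant = "GAGCUUCGGCGC"    # C->A at position 2 breaks one pair
    double_mutant = "GAGCUUCGGCUC"    # compensatory G->U restores it as A-U

    for name, seq in [("wild type", wild_type),
                      ("single mutant", single_mutant),
                      ("double mutant", double_mutant)]:
        print(f"{name:14s} {seq}  compatible: {is_compatible(seq, structure)}")
```

Compatibility in this sense only says that the pairs could form; it does not predict whether the hairpin is the thermodynamically preferred fold, which would require an energy model.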


For SL4, it is hypothesized that it also plays a role during viral transcription. Further downstream of SL4, overlapping with the first ORF of the viral genome, lies SL5. This structural element forms a 3-multiloop that exposes the conserved nucleotide sequence 5’-UUCCGU-3’ in all of its hairpins. These conserved sequences are assumed to mediate the assembly of new virions by RNA–protein interactions (Chen et al. 2021a).

The 3’-UTR harbors two small hairpins, called PK-SL2, which can form an alternative conformation (PK-SL1). Downstream of this flexible region lies the hypervariable region (HVR), which (partially) exposes the conserved octanucleotide 5’-GGAAGAGC-3’. The structure of the HVR is supported by extensive co-variation across alpha- and betacoronaviruses; the octanucleotide, however, is highly conserved at the sequence level in these two genera. Functionally, the structures are vital for viral replication, although the exact mechanisms are not fully understood yet and merit further study.

Databases

Generally, no specific database for Coronaviridae exists. However, during the SARS-CoV-2 pandemic several databases have been established that provide genomic data, metadata, and specialized data for SARS-CoV-2. In the following, we give an incomplete overview of a few databases and broadly describe what kind of data the user finds in each case.

First, probably the best-known databases for genomic resources are the GISAID database (Shu and McCauley 2017), the COVID-19 Data Portal (managed by ENA) (Harrison et al. 2021), and the NCBI SARS-CoV-2 Resources. At the time of writing (31.09.2021), these databases contain nearly 4 million (GISAID), 5.3 million (ENA), and 1.7 million (NCBI) records, respectively. It is noteworthy that GISAID only provides consensus genomes, whereas ENA and NCBI also provide raw sequencing data. While ENA and NCBI are fully open access, a (free) registration is needed for GISAID. GISAID further provides comprehensive metadata about the origin and sequencing date of each individual sample. On the other hand, NCBI and ENA provide an API, facilitating automated workflows and analyses.

Additional metadata can be found in data resources such as Pangolin (O’Toole et al. 2021) or Nextstrain (Hadfield et al. 2018). In addition to the sequence data, which usually come from GISAID and ENA, these resources provide information derived from further processing. Nextstrain, for example, provides tracking of individual viral strains. Pangolin, on the other hand, is a lineage assignment tool that also serves as a database for the different lineages. Both tools allow the classification of new SARS-CoV-2 sequences based on known lineages or clades, which have already been described with and by the respective tool. Their nomenclature systems are described in the second half of this chapter.

Specialised databases, such as Pfam for protein families (Mistry et al. 2021) or Rfam for RNA families (Kalvari et al. 2021), provide dedicated resources for SARS-CoV-2 data. Additionally, many databases are available that provide genome sequences for a specific region or country.
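As an illustration of the kind of automated access such an API permits, the following Python sketch builds a request for the NCBI E-utilities `efetch` endpoint and parses a single FASTA record. It is a minimal sketch under stated assumptions: the accession `NC_045512.2` (the SARS-CoV-2 RefSeq entry) is chosen here as an example and is not mentioned in this chapter, the helper names are hypothetical, and a production workflow would additionally need error handling and adherence to NCBI's usage guidelines (request rate, API key).

```python
"""Sketch: retrieving one nucleotide record via NCBI E-utilities (efetch)."""

from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, db="nuccore", rettype="fasta"):
    """Build the efetch URL for a single accession."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_fasta(accession, timeout=30):
    """Download the record as FASTA text (requires network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("not a FASTA record")
    return lines[0][1:], "".join(lines[1:])


if __name__ == "__main__":
    # Example accession chosen for illustration only
    header, sequence = parse_fasta(fetch_fasta("NC_045512.2"))
    print(header)
    print(f"genome length: {len(sequence)} nt")
```

Keeping URL construction, download, and parsing in separate functions allows the parsing step to be tested offline and the download step to be swapped for another source, such as a locally stored file.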

Genome Structure, Life Cycle, and Taxonomy …


4 SARS-CoV-2 Evolution and Epidemiological Dynamics

Discovery and the beginning of the pandemic

In December 2019, a series of cases of “pneumonia of unknown etiology” was identified in the Chinese city of Wuhan, located in Hubei Province (Li et al. 2020; Zhou et al. 2021). The most common symptoms associated with this new disease ranged from fever, headaches and dry cough to breathing difficulties, alveolar damage and, in some cases, death. Furthermore, a decrease in the lymphocyte count, the lack of improvement of the symptoms after three days of antimicrobial treatment, and chest radiographs indicated that the disease was of viral origin (Zhou et al. 2021). After metagenomic sequencing of a sample of bronchoalveolar lavage fluid from a patient, the causal agent was identified as a new member of the family Coronaviridae and received the provisional names WH-Human 1 coronavirus and 2019-nCoV. Initial phylogenetic analysis placed the new virus in the subgenus Sarbecovirus (which also includes SARS-CoV) within the genus Betacoronavirus (Wu et al. 2020a, b). In February 2020, the World Health Organization officially named the disease COVID-19, and in March the official taxonomic classification of the new virus was published by the Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. It confirmed the placement of the virus as a sarbecovirus, clustered together with a group of bat coronaviruses of the species Severe acute respiratory syndrome-related coronavirus, and gave it the official name Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses 2020), the third coronavirus capable of causing severe respiratory disease in humans to emerge in the last 18 years. The first COVID-19 patient was hospitalized on December 12th and several more were admitted in the following weeks.
Most of these early cases had links to the Huanan Seafood Wholesale Market, a place known for selling birds, reptiles and small mammals in addition to the more traditional seafood (Wu et al. 2020a, b; Lu et al. 2020). This evidence initially pointed to the market as the place of origin of the disease and to exposure to exotic animals as the most likely route of transmission. However, of the first 41 confirmed cases only 66% had been exposed to the market, and the first patient, whose symptoms appeared on December 1st, was not one of them (Huang et al. 2020). Taking this into account, the first infections likely occurred no later than November, and the virus may have circulated around the city and connected regions without being detected until the first cluster of cases with a connection to the market was identified. Human-to-human transmission was later confirmed when cases started to be reported among medical staff and in family clusters without any connection to the market. By the end of January 2020 over 5,900 people were infected in China and cases started to be reported in other countries such as Thailand, Japan, South Korea, Malaysia, Singapore, and the USA (Lu et al. 2020). The virus continued to spread quickly to the rest of the world and is currently present in over 200 countries, with


K. Lamkiewicz et al.

over 230 million infections and almost 5 million deaths registered worldwide as of October 2021 (“Worldometer—Real Time World Statistics” n.d.).

Evolutionary origin

Since the beginning of the COVID-19 pandemic, research groups around the world have worked to elucidate the evolutionary origin of SARS-CoV-2. The placement of this virus within the Sarbecovirus phylogeny, using full genome sequences, suggests a zoonotic origin for SARS-CoV-2 following a spillover from bats to humans. Its closest known relatives are the bat coronaviruses RaTG13 and RmYN02, with which SARS-CoV-2 shares 96.1% and 93.3% of its genome at the nucleotide level, respectively (Zhou et al. 2020). While the high sequence similarity and well-supported phylogenies clearly argue in favor of a bat virus being the ancestor of SARS-CoV-2, coronaviruses recombine frequently, and the consequences of recombination can generate a more complex scenario for the emergence of SARS-CoV-2. Genetic recombination is a process by which fragments of genetic material are transferred from one DNA/RNA molecule to another, thus generating a new molecule containing genomic regions that come from different parental lineages (Nehra et al. 2017). In the particular case of non-segmented RNA viruses, recombination happens by template switching. This process can occur during viral replication when two viruses co-infect the same cell: the RNA polymerase jumps from one RNA molecule (donor) to a second one (acceptor) without stopping the synthesis of the new viral genome (Simon-Loriere and Holmes 2011). Recombination is an important source of genetic diversity and can have a significant impact on viral evolution, as recombinant viruses may be able to infect new types of hosts and may show increased virulence and resistance to antiviral treatments (Simon-Loriere and Holmes 2011; Pérez-Losada et al. 2015).
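The template-switching mechanism described above can be made concrete with a toy model. The following minimal sketch, which is not taken from any of the cited studies, builds a mosaic genome from two aligned parental sequences and then compares it with each parent in consecutive windows. All sequences, function names and the breakpoint position are hypothetical, and the model assumes equal-length, pre-aligned genomes and a single switch.

```python
def template_switch(donor: str, acceptor: str, breakpoint: int) -> str:
    """Copy the donor up to `breakpoint`, then continue on the acceptor.

    Toy model: one switch, aligned genomes of equal length.
    """
    if len(donor) != len(acceptor):
        raise ValueError("toy model assumes aligned, equal-length genomes")
    return donor[:breakpoint] + acceptor[breakpoint:]


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def windowed_identity(query: str, ref: str, window: int, step: int) -> list:
    """Identity between query and ref in consecutive windows."""
    return [
        identity(query[i:i + window], ref[i:i + window])
        for i in range(0, len(query) - window + 1, step)
    ]


# Hypothetical parents that differ at every fourth position (75% identity).
parent_a = "ACGT" * 10
parent_b = "ACGA" * 10

# The polymerase starts on parent_a and switches to parent_b at position 20.
recombinant = template_switch(parent_a, parent_b, breakpoint=20)

print(windowed_identity(recombinant, parent_a, window=20, step=20))  # [1.0, 0.75]
print(windowed_identity(recombinant, parent_b, window=20, step=20))  # [0.75, 1.0]
```

The closest parent changes from one window to the next, which is the kind of signal that similarity-plot approaches look for. Real data require a multiple sequence alignment, overlapping windows and statistical support for each breakpoint, none of which is modelled here.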
Recombination in coronaviruses can occur at a very high frequency and has been under study since the early 1990s, as they were the second group of viruses for which recombination was confirmed (Lai 1992). The emergence of human coronaviruses during the last two decades has caused a continuous increase in interest in the study of recombination, since the detection of possible recombination events can help to clarify the origin of a virus. The emergence of SARS-CoV-2 is no exception. Several studies have used recombination detection tools such as RDP4 (Martin et al. 2015), GARD (Pond et al. 2006), and Simplot (Lole et al. 1999) to analyze data sets of sarbecoviruses. The analyses detected recombination breakpoints in the S gene involving SARS-CoV-2, RaTG13 and GD410721, a sarbecovirus isolated from Malayan pangolins (Manis javanica) in Guangdong, China (Boni et al. 2020; Lam et al. 2020; Li et al. 2020; Liu et al. 2020; Wu et al. 2020a, b; Tagliamonte et al. 2020). While RaTG13 is the closest relative of SARS-CoV-2 considering the full S gene sequence, the receptor binding domains (RBD) of SARS-CoV-2 and GD410721 are highly conserved, with the two sharing six contact amino acid residues that confer a high affinity for the ACE2 receptors (Lam et al. 2020; P. Liu et al. 2020; Xiao et al. 2020). Phylogenetic trees made using different genomic regions of the S gene produce different topologies that reflect the change in sequence similarity


between SARS-CoV-2 and the other two viruses: trees built using the variable loop of the RBD, which is the region that contains the six contact residues, produce a topology that has the pangolin virus as the sister group of SARS-CoV-2, while trees using other regions of the S gene have RaTG13 as its closest relative (Boni et al. 2020). This raises the possibility that SARS-CoV-2 originated after a recombination event between sarbecoviruses from a bat (the direct ancestor of SARS-CoV-2) and a pangolin, acquiring the variable loop of the RBD from the latter in the process (Fig. 6a). Interestingly, a phylogenetic analysis of the RBD constructed with only synonymous sites recovered a topology with RaTG13 as the closest relative of SARS-CoV-2 (Lam et al. 2020). A different study found a relatively high synonymous divergence in the RBD region between SARS-CoV-2 and the pangolin virus (Wang et al. 2021a, b). These findings argue against a recombination event involving a pangolin virus and the ancestor of SARS-CoV-2. Furthermore, the same study also found a low similarity between RaTG13 and three related viruses (RmYN02, GD410721, and a pangolin virus from the Chinese province of Guangxi) in the variable loop of the RBD. This result supports an alternative hypothesis according to which the six contact amino

Fig. 6 Three scenarios to explain the presence/absence of the six contact residues in RBD of human, bat and pangolin viruses. a SARS-CoV-2 acquired the six amino acids after a recombination event (arrow) between unknown bat and pangolin viruses (in red). b RaTG13 lost five of the six amino acids after a recombination event with an unknown sarbecovirus. c SARS-CoV-2 and GD410721 acquired the six amino acids as a result of adaptation to recognize the same type of cellular receptors. The central tree represents the expected topology for the S gene


acid residues represent an ancestral state already present in the common ancestor of SARS-CoV-2, GD410721 and RaTG13 (Fig. 6b), and it was RaTG13 that acquired a more divergent RBD through recombination (Boni et al. 2020). A third explanation for the similarity between SARS-CoV-2 and GD410721 is the emergence of the key amino acids of the RBD as a result of convergent evolution (Fig. 6c) (Lam et al. 2020). Since both pangolin and human cells have ACE2 receptors (Wrobel et al. 2021), it is plausible that the contact residues are the result of the adaptation of the two viruses to similar environments. Under this scenario, SARS-CoV-2 originated as a bat virus that was transmitted to humans and circulated for some time without being detected, acquiring during this period not only the contact amino acids but also the polybasic cleavage site, which is a unique feature of the SARS-CoV-2 genome. If we consider other mammal species with ACE2 receptors similar to the ones found in humans and pangolins, SARS-CoV-2 and GD410721 could theoretically recognize the receptors of pigs, ferrets, cats, orangutans and monkeys (Wan et al. 2020). A case of reciprocal transmission of SARS-CoV-2 between humans and minks occurred in Denmark with 68% of the animals developing antibodies (Oude Munnink et al. 2021). Antibodies against SARS-CoV-2 have also been detected in cats and dogs (Patterson et al. 2020). With this in mind, SARS-CoV-2 could have originated from a generalized bat reservoir that developed the affinity for ACE2 receptors before its initial transmission to humans, with natural selection acting on the virus while it infected one or several intermediate hosts. An additional, highly debated hypothesis is that SARS-CoV-2 was created in a laboratory and acquired the contact amino acids during passages in cell cultures, since there is evidence of in-vitro evolution in the case of SARS-CoV (Sheahan et al. 2008).
This seems unlikely because of the presence of a similar RBD in pangolin viruses and because the in-vitro emergence of a completely unique polybasic cleavage site could require a very high number of passages (Andersen et al. 2020), as has been observed with influenza viruses (Ito et al. 2001). Finally, there is also the problem of the identity of the progenitor virus used for the cell cultures, since it would need to be a virus more similar to SARS-CoV-2 than any of the currently known sarbecoviruses (Andersen et al. 2020). In summary, there are three main hypotheses for the emergence of SARS-CoV-2: as (i) a zoonotic virus with a recombinant origin, (ii) a zoonotic virus naturally adapted to infect human cells, or (iii) a human-engineered virus that accidentally leaked from a lab. While there is evidence in favor of and against each of these scenarios, we still cannot say for sure which one is correct. Increased sampling of viruses in bats and other possible hosts could provide important evidence regarding the role of recombination in the emergence of SARS-CoV-2. On the other hand, analysis of stored human samples from biobanks could be useful in detecting cryptic transmissions of SARS-CoV-2-like viruses, which might elucidate the role of natural selection in the emergence of this virus (Andersen et al. 2020).


Molecular evolution

Selection pressures acting on key regions of the viral genome are one of the main drivers of molecular evolution and can lead to the appearance of mutations with a significant impact on viral fitness. Positively selected mutations can not only increase the virulence of a virus but also complicate the development of efficient treatments and vaccines. Hence, selection studies aiming to detect signals of diversifying (positive) selection in pathogens are highly important. Selection analyses usually rely on estimates of the rates of synonymous and non-synonymous substitutions, typically represented as dS and dN, respectively. The ratio dN/dS can be used to describe the type and strength of selection acting on the analyzed sequences: a ratio above one indicates diversifying (or positive) selection, a ratio of one represents neutral evolution and a ratio below one is an indication of purifying (or negative) selection (Li et al. 1985; Nei and Gojobori 1986; Yang and Bielawski 2000). Initial selection analysis of SARS-CoV-2 did not find evidence of diversifying selection, with signals of purifying selection (dN/dS < 0.1) being detected in different parts of the genome not only of SARS-CoV-2 but also of bat and pangolin viruses. Selection signals were particularly strong in the S2 subunit of the S gene (Li et al. 2020). A different study confirmed the action of purifying selection on the S gene, but also detected signals of strong diversifying selection on 14 codons. While 3 of those residues were located within the RBD, none of them was a contact residue (Tagliamonte et al. 2020). Purifying selection reduces genetic diversity by purging deleterious mutations from a population, and is stronger in genes with functions that are key for survival (Cvijović et al. 2018).
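The interpretation of dN/dS given above can be written down as a short sketch. The code below is an illustration under stated assumptions and does not reproduce the estimation procedures of the cited studies: it uses the standard genetic code, the function names and the `tolerance` parameter are hypothetical, and the dN and dS values are taken as given rather than estimated from an alignment (methods such as that of Nei and Gojobori additionally normalize substitution counts by the number of synonymous and non-synonymous sites).

```python
# Standard genetic code, enumerated in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_codon_change(before: str, after: str) -> str:
    """Label a codon substitution as synonymous or non-synonymous."""
    aa_before = CODON_TABLE[before.upper().replace("U", "T")]
    aa_after = CODON_TABLE[after.upper().replace("U", "T")]
    return "synonymous" if aa_before == aa_after else "non-synonymous"


def interpret_dn_ds(dn: float, ds: float, tolerance: float = 0.0) -> str:
    """Map a dN/dS ratio to the selection regime it suggests.

    `tolerance` is a hypothetical margin around 1 treated as neutral.
    """
    if ds == 0:
        raise ValueError("dN/dS is undefined when dS is zero")
    omega = dn / ds
    if omega > 1 + tolerance:
        return "diversifying (positive) selection"
    if omega < 1 - tolerance:
        return "purifying (negative) selection"
    return "neutral evolution"


# Illustrative codons only: a leucine-to-serine change and a silent change.
print(classify_codon_change("TTA", "TCA"))   # non-synonymous
print(classify_codon_change("CTT", "CTC"))   # synonymous
print(interpret_dn_ds(dn=0.05, ds=1.0))      # purifying (negative) selection
```

In practice, a point estimate of the ratio is accompanied by a statistical test, because a value close to one can arise from a mixture of sites under positive and purifying selection.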
Given the importance of the S gene in the infection of the host cell, it is not surprising that the molecular evolution of this gene is constrained by strong selection pressures. The lack of significant diversifying selection in the RBD indicates that the ability of the virus to adapt to a new host is also limited by selection, suggesting that SARS-CoV-2 was already capable of infecting human cells before jumping to humans. Signatures of positive selection in SARS-CoV-2 were detected relatively early in the pandemic in the orf3 and orf8 genes, linked to the emergence and spread of two particular mutations. The first one, G251V, is a G-to-T transversion at position 26 144 in orf3 and showed an increase in frequency of 13% within a month after it was detected. The other mutation, L84S, is a T-to-C substitution at position 28 144 in orf8. It was first detected on January 5th 2020 and in less than three weeks its frequency increased to 30% (Chaw et al. 2020). These two mutations later defined some of the early clades of SARS-CoV-2. It is important to note that the selection studies mentioned above used a limited number of SARS-CoV-2 sequences (