English Pages 252 [245] Year 2022
Zuben E. Sauna Chava Kimchi-Sarfaty Editors
Single Nucleotide Polymorphisms Human Variation and a Coming Revolution in Biology and Medicine
Editors Zuben E. Sauna Office of Tissues and Advanced Therapies Center for Biologics Evaluation and Research, U.S. Food and Drug Administration Silver Spring, MD, USA
Chava Kimchi-Sarfaty Office of Tissues and Advanced Therapies Center for Biologics Evaluation and Research, U.S. Food and Drug Administration Silver Spring, MD, USA
ISBN 978-3-031-05614-7 ISBN 978-3-031-05616-1 (eBook) https://doi.org/10.1007/978-3-031-05616-1 © Springer Nature Switzerland AG 2022, Corrected Publication 2022 Chapters 4 and 5 are licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter. This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Introduction and Overview: Single Nucleotide Polymorphisms, Human Variation, and a Coming Revolution in Biology and Medicine
Introduction
A germline substitution of a single nucleotide in the genome is defined as a single-nucleotide polymorphism (SNP). The term single-nucleotide variant (SNV) is more general and can refer to both germline and somatic substitutions. Sites that carry SNPs are rare: ~99.5% of the genome is identical across all humans (Auton et al. 2015). Nonetheless, the 0.5% of the genome that is different confers the extraordinary diversity that we observe among individuals. Equally important, SNPs drive individual differences in the susceptibility to disease and response to medications (King et al. 2019). Two types of SNPs occur in humans: (i) synonymous, that is, nucleotide substitutions that do not result in a change in the encoded amino acid, and (ii) non-synonymous, where the nucleotide substitutions result in a substitution of the encoded amino acid. As several chapters in this book point out, synonymous mutations are not necessarily benign (as the older nomenclature, “silent,” suggested). Recent evidence that synonymous SNPs can and do alter protein structure, expression (amount), and function is reinforced by studies in molecular evolution that show that both synonymous and non-synonymous mutations are under selective pressure. An additional complexity is that in humans exons represent ~2% of the genome and thus the vast majority of the SNPs are identified in the introns. Just as the term “silent” was once erroneously applied to synonymous SNPs, intragenic DNA was referred to as “junk DNA.” We have since learned that intronic SNPs cannot be ignored, as a multitude of mechanisms by which such mutations can affect biological outcomes have been reported (Brest et al. 2011; Cartegni et al. 2002; Nackley et al. 2006; Kudla et al. 2009; Plotkin and Kudla 2011; Kimchi-Sarfaty et al. 2007). Studies that identify disease- and other trait-associated SNPs have also found that most such SNPs occur in introns.
Despite the obvious importance of intronic SNPs, characterizing their functional effects remains a singular challenge.
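The distinction between synonymous and non-synonymous substitutions drawn above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and a single-base change within one in-frame codon; the function name `classify_substitution` is hypothetical and is not taken from any chapter of this book.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon, position, new_base):
    """Classify a single-base change inside one codon (illustrative helper).

    `position` is 0, 1, or 2. A change to or from a stop codon ("*") is
    reported as non-synonymous in this simplified two-class scheme.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "non-synonymous"


# CCG and CCA both encode proline; GAG encodes glutamate, GTG encodes valine.
print(classify_substitution("CCG", 2, "A"))
print(classify_substitution("GAG", 1, "T"))
```

The sketch classifies a change only by its effect on the encoded amino acid. As the text stresses, that label says nothing about whether the change is functionally benign.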
SNPs are important as they drive human variability. The genetics of SNPs helps us to better understand human biology, provides powerful tools in molecular evolution, and is key to establishing the relationship between genotype and phenotype. Moreover, SNPs are central to bringing the promise of personalized medicine to fruition. Genetic susceptibility to human diseases, responses (or lack thereof) to drugs, adverse events associated with drugs, and so on are all often determined by individual variation defined by SNPs (Tam et al. 2019).
Overview of the Book
The study of SNPs has expanded in scope over time. In this book, authorities from diverse fields describe the varied lenses through which scientists study SNPs in human genomes to gain insights into human evolution and human diversity. They also highlight some of the many emerging technological applications that leverage what research on SNPs has taught us. We have organized the book into five parts.
Part I: An Overview of Human Genome Sequencing and How to Access Information About SNVs
Sequencing efforts over several decades have identified over 1 billion SNVs (as of June 2021), and over 900 million of these records include population frequency data. To leverage this large dataset for basic and translational research it is critical that this information is widely available. It is also vital that the data is curated and that a common nomenclature is used to identify an SNV. The most important public repository for this information is the Single Nucleotide Polymorphism database (dbSNP). Chapter 1, by Phan, offers a comprehensive description of this archive. The chapter is also an excellent guide to this resource. Using examples and screen images as instructional aids for searching the dbSNP site, Phan demonstrates how information pertaining to the frequencies of SNPs in different human populations can be obtained. Most importantly, the chapter describes how SNP records are annotated and integrated with other resources.
Part II: A Broad Survey of SNPs, Their Classification into Synonymous and Non-synonymous, and the Undesirable Consequences of Using the Term “Silent” for Synonymous Changes
Once we understand the definition of SNPs (or more broadly SNVs) and can access information with respect to the occurrence of SNPs in different human populations, the next step involves understanding the consequences of these changes. Here, key concepts have undergone considerable changes over the last few decades. The degeneracy of the genetic code is at the heart of these deliberations. When the genetic code was elucidated, it showed that multiple codons coded for a single amino acid. Deciphering the genetic code naturally classified nucleotide substitutions into two classes. Some substitutions altered the primary sequence of the encoded protein while others did not; the former were termed non-synonymous and the latter synonymous. The potential role of non-synonymous substitutions in altering protein structure and function was self-evident. However, clear experimental evidence that synonymous mutations too altered the amount of protein generated as well as protein structure and function appeared only in 2006 and 2007 (Nackley et al. 2006; Kimchi-Sarfaty et al. 2007). That is not to say that there was no robust debate about the role of synonymous mutations. However, that debate and the evidence to change the dominant paradigm that synonymous nucleotide changes are necessarily silent came from evolutionary biology (Plotkin and Kudla 2011; Chamary et al. 2006). Chapter 2, by Agashe, provides both a historical and a conceptual overview of the recognition that synonymous nucleotide changes play an important role in biology. Agashe tells this story from the perspective of an evolutionary biologist. Early thinking perpetuated the view that synonymous mutations were neutral/silent.
The absence of an amino acid change suggested to evolutionary biologists that synonymous mutations would not face selection pressure. Cracks in the neutral theory first appeared with the finding that selection pressure appeared to act on genomes to optimize codon use and tRNA pools for translation. Subsequent studies have demonstrated fitness impacts of synonymous SNPs across organisms. Agashe suggests that given the current state of the art, an evolutionary framework should encompass the impacts of all mutations on various forms of information transmission. Overall, the critical mass of evidence from studies on evolutionary biology put synonymous mutations on par with non-synonymous mutations and motivated mechanistic studies to understand how synonymous mutations affect protein abundance, structure, and function. The challenge for biochemists seeking to find mechanistic explanations for how synonymous variants affect protein function was an elegant experiment by Anfinsen that led to his principle. This experiment showed that an active protein denatured to lose structure and activity could be renatured to regain the same structure and activity (Anfinsen and Haber 1961). Based on this experiment Anfinsen’s principle held
that the native conformation of a protein corresponds to its free-energy minimum, that is, the amino acid sequence (primary structure) of a protein contains all the information required for the correct folding of a protein. In Chap. 3, Rosenberg, Bronstein, and Marx document the development of the conceptual framework and experimental tools that bring us to our current understanding of how synonymous (and non-synonymous) nucleotide substitutions affect protein structure and function. Due to these advances, synonymous SNPs are no longer regarded as inconsequential cellular redundancies. In fact, the quest for mechanistic explanations of how synonymous mutations affect protein structure has informed and expanded our understanding of how protein folding occurs. These studies have demonstrated that synonymous SNPs operate via diverse mechanisms that include co-translational folding, changing mRNA structure, and altering the relative abundance of codon-specific tRNAs. Importantly, many of these mechanisms can operate in concert. Even where the mechanism is not completely clear, there are now numerous reports demonstrating dramatic effects on protein function resulting from even a single synonymous substitution (Simhadri et al. 2017; Hunt et al. 2014, 2019; Katneni et al. 2017, 2019; Alexaki et al. 2019a).
Part III: The Role of SNPs in Human Diseases
There has been considerable progress in recognizing that both non-synonymous and synonymous SNPs affect the fitness of an organism and in understanding the underlying molecular mechanisms by which SNPs affect protein function. Concomitantly, there have been advances in elucidating the association between SNPs and human diseases. Reference human genomes such as the ones described in Part I provide the underlying datasets for studies that find associations between SNPs and human diseases. The workhorses for identifying variants associated with diseases or disease risk are the genome-wide association studies (GWAS). A typical GWAS evaluates genetic variants across genomes of numerous individuals to identify associations between the genotype and the phenotype. The first GWAS identified SNPs associated with age-related macular degeneration in 2005 (Klein et al. 2005). Since then, GWAS have provided associations between tens of thousands of SNVs and human traits and diseases (Tam et al. 2019). Additionally, GWAS can identify genes of unknown function, leading to hypothesis-driven research that elucidates previously unknown mechanisms underlying diseases (Hirschhorn 2009). Although GWAS have revolutionized the study of human genetics, these discoveries represent, in some ways, low-hanging fruit. The findings are based on a limited sampling of populations and phenotypes that are relatively easy to measure. Consequently, the potential for more impactful findings based on larger and more representative (vis-à-vis geographical origin, race, etc.) samples, an expanded range of phenotypes, and novel study designs remains immense.
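The core statistical step of a GWAS, testing one variant for association between genotype and phenotype, can be sketched as follows. This is a minimal illustration that assumes a case-control design and a simple allelic Pearson chi-square test on a 2x2 table of allele counts. The function name and the counts are hypothetical, and the 5e-8 cutoff is a widely used convention rather than a figure from this book. Real studies add quality control, covariates, and correction for population structure.

```python
import math


def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on allele counts.

    Returns (chi2, p_value). Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the upper tail of chi-square equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Conventional genome-wide significance cutoff (an assumption, not from the text).
GENOME_WIDE_THRESHOLD = 5e-8

# Made-up allele counts for a single SNP in cases and controls.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(chi2, p, p < GENOME_WIDE_THRESHOLD)
```

In a genome-wide scan the same test is repeated for every variant, which is why such a stringent threshold is applied. A significant result indicates association only, not causation.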
The contributions of GWAS to basic and translational genetics are indisputable; nonetheless, these studies come with some inherent limitations. GWAS identify SNVs that may or may not be causal. Moreover, translating findings has proved to be immensely challenging. A recent criticism has been that GWAS are poised to identify too many loci, which could ultimately be of limited utility. Chapter 4, by Hettiarachchi and Komar, lays down the principles of GWAS, their limitations, and future trends. Their discussion of commercial applications of GWAS, particularly direct-to-consumer genotyping services, is timely and has important ramifications for public health. Findings from GWAS have had the most impact on identifying and understanding the genetic underpinnings of monogenic diseases. Interestingly, human cancers, which are among the most complex diseases, have been the subject of numerous studies seeking associations with SNPs and/or SNVs. This is because cancers involve transformation of normal cells into malignant cells. The so-called hallmarks of cancer (Hanahan and Weinberg 2011) involve coopting cellular pathways via the accumulation of genetic mutations. The ability to efficiently and inexpensively sequence genomes has led to an unprecedented increase in genetic and genomic typing of genes using samples obtained from cancer patients. The challenge with respect to making sense of this data in the context of human cancers is that the common phenotypic characteristics are driven by a heterogeneous set of genetic mutations. Nonetheless, progress has been achieved by the systematic search for driver mutations for human cancers (i.e., those that provide a selective advantage to the clone). For instance, the International Cancer Genome Consortium has had success in documenting mutations that drive common tumors (Campbell et al. 2020).
As with many genetic studies, the focus of studies to identify driver mutations in cancer was, until recently, on the coding regions of the genome. Newer studies come on the heels of an expanded understanding of the parts of the genome that do not affect the primary structure of proteins. These include the intronic regions of the genome as well as synonymous variants within the coding regions. These studies unequivocally demonstrate the role of non-coding RNAs and gene regulatory regions in human cancers (Diederichs et al. 2016). Even within the protein-coding region, it is only recently that synonymous SNVs have been systematically studied. Synonymous SNVs represent approximately a quarter of all mutations in human cancers. Recent progress in characterizing and validating synonymous mutations in human cancers has been exhaustively reviewed by Herreros, Janssens, Daniele, and Keersmacker in Chap. 5.
Part IV: An Examination of the Mechanisms by Which Synonymous Mutations Affect Protein Levels or Protein Folding, Which Affect Human Physiology and Response to Therapy
Non-synonymous SNVs alter the amino acid sequence of the translated protein. Thus, the potential effect on protein structure and function is self-evident. Amino acid substitutions have been the mainstay of mechanistic studies in biochemistry as well as in structural biology. We will not try to review the vast literature on this topic in this book. However, at first glance, synonymous mutations, which do not alter the amino acid sequence, appear to be benign. Thus, historically, such mutations were often referred to as “silent.” As we have seen above, synonymous mutations are under evolutionary selection (Chamary et al. 2006) and sequencing studies have demonstrated that a large fraction of all SNPs are synonymous. Finally, many human diseases are associated with synonymous mutations (Hunt et al. 2014; Sauna and Kimchi-Sarfaty 2011). It is no longer disputable that synonymous mutations are not silent. Thus, we believe it is important to provide an overview of the mechanisms by which synonymous mutations affect protein levels and protein folding. The topic has been reviewed by Zhang and Bebok in Chap. 6. Central to any discussion about the mechanisms by which synonymous mutations affect the translation of proteins is that the genetic code is redundant but not all synonymous codons are equivalent. Thus, there is a codon bias wherein some codons are used more frequently than others (Plotkin and Kudla 2011; Meyer et al. 2021; Kames et al. 2020; Alexaki et al. 2019b). It has been demonstrated experimentally that codons regulate the speed of translation and this influences the accuracy of translation (Plotkin and Kudla 2011). Patterns of codon clusters have been identified and these in turn have been demonstrated to govern protein secondary structure (Tuller et al. 2010; Subramaniam et al.
2014; Cambray et al. 2018; Simms et al. 2017). These studies suggest that codon usage is not arbitrary but has been optimized through evolution. Recent studies in protein folding show that most protein folding occurs concurrently with translation. Thus, co-translational folding is affected by the rhythm of translation (Buhr et al. 2016; O’Brien et al. 2014), where translation rates can control the correct folding of a protein. These concepts are not merely speculative and are now backed by strong experimental evidence that synonymous codons drive the translated proteins to different conformations. In addition to translational rate, the secondary structures of the mRNA can also sometimes play an important role in RNA stability and thus protein levels (Nackley et al. 2006). The successes in elucidating the mechanisms by which synonymous mutations affect protein structure and function have only been possible due to the development of novel methodologies or the refinement of extant technologies. These methods span new computational and bioinformatics tools and datasets as well as biophysical and biochemical techniques that can detect subtle (but functionally important) alterations in protein conformation. Chapter 7, by Lin, Jankowska, Meyer, and Katneni, provides a methodological overview of how the effects of synonymous mutations can be evaluated. This chapter describes the in silico and experimental
tools used to evaluate the effects of synonymous mutations on mRNA as well as proteins. Real-life examples of how these tools may be used have also been provided. Importantly, there is a growing need for in silico approaches that can rapidly predict what effects a synonymous mutation is likely to have. Such tools are necessary as synonymous mutations are increasingly being used in drug development, gene therapy, and so on.
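Codon bias, the unequal use of synonymous codons described above, is straightforward to quantify in silico. The sketch below is a simplified illustration and not one of the tools described in Chap. 7. It assumes an in-frame coding sequence written in the DNA alphabet and the standard genetic code, and the function name `synonymous_codon_fractions` is hypothetical. For each codon observed, it reports the fraction of that amino acid's occurrences encoded by that codon.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_fractions(cds):
    """Return {codon: fraction of its amino acid's codons} for one sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODON_TABLE[codon]] += n
    return {
        codon: n / per_amino_acid[CODON_TABLE[codon]]
        for codon, n in counts.items()
    }


# Toy sequence: four leucine codons (three CTG, one CTA) and one glutamate codon.
print(synonymous_codon_fractions("CTGCTGCTGCTAGAA"))
```

Published codon usage tables are built from many genes and, in some cases, from tissue-specific expression data. A single short sequence, as used here, gives only a rough picture.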
Part V: The Role of SNPs in Personalized Medicine and the Platform Technology of Codon Optimization
The dramatic decreases in the costs of genome sequencing have led to an explosion of data. Of particular importance in the context of personalized medicine has been the large numbers of individuals who have been fully genotyped. The general concept and desirability of developing personalized treatment strategies for individuals based on their genetics has been articulated for several decades. With the inexpensive access to genetic and genomic data from individuals and the availability of reference genomes, the delivery of personalized (sometimes called precision) medicine is now a realistic possibility. The primary motivation for personalized treatment regimens is the variability between patients vis-à-vis drug efficacy and safety (adverse events), where it has been estimated that ~80% of the variability is due to human genetic diversity (Cacabelos et al. 2019). In addition to genetic variations in the drug target, drug-metabolizing enzymes and other proteins that affect drug distribution and clearance can all provide pharmacogenomic biomarkers (Lauschke et al. 2017). There are currently over 200 GWAS demonstrating the association of specific genes to responses to medications (Giacomini et al. 2017). Numerous pharmacogenomic studies have led to the identification of genetic biomarkers that are clinically actionable. This is evidenced by the increase in drug labels that include pharmacogenetic information. Since 2000, 258 unique biomarker-drug pairs have been included on drug labels (Kim et al. 2021). In 2013 the US Food and Drug Administration issued a guidance document to assist the pharmaceutical industry in evaluating genetic variants that affect a drug’s efficacy or safety [U.S. Food and Drug Administration: Guidance for Industry. Clinical Pharmacogenomics: Premarket Evaluation in Early-Phase Clinical Studies and Recommendations for Labeling. http://www.fda.gov/downloads/drugs/guidancecomplianceregulatoryinformation/guidances/ucm337169.pdf]. Personalized treatment strategies that make use of a patient’s genetic data have made modest inroads into the practice of medicine. We are at a cusp where multiple factors promise rapid advances in personalized medicine. With accurate and inexpensive sequencing techniques, a critical mass of genomic data has been accumulated and is growing exponentially. Concomitantly, the incorporation of machine learning tools into the analysis of large genetic datasets has been successful in developing more accurate models for polygenic risks. Thus, a much larger fraction of diseases and/or drugs are likely to be amenable to pharmacogenetic approaches.
Chapter 8, by Ooi, Lim, Chong, and Lee, provides a glimpse into how many of the emerging technologies and machine learning algorithms could eventually fulfill the promise of precision medicine. Additionally, Peña-Llopis focuses on pathogenic synonymous mutations in prognosis and precision medicine in Chap. 9. While the clinical use of pharmacogenomic data is the most discussed application of SNPs and SNVs, it is by no means the only issue with technological ramifications. With the growing need for proteins in both research and clinical applications, increasing protein yield has become a significant goal and is economically desirable. We have seen above that the genetic code is redundant. Any protein sequence can thus be encoded by many different sets of codons. The so-called optimization of these codons generally refers to maximizing the amount of protein that can be synthesized and purified. In contrast to laboratory and industrial processes, which maximize protein yield, evolution has optimized (or biased) codon usage to maximize protein quality. Numerous algorithms, both proprietary and in the public domain, circumvent the features of the wild-type genetic code that limit protein production (Holcomb et al. 2021; Puigbò et al. 2007; Villalobos et al. 2006; Hoover and Lubkowski 2002). For the most part, the codon optimization software addresses limits in protein synthesis by using synonymous codon mutations to alter mRNA sequences. The underlying rationale for the large-scale adoption of codon optimization in research and industry was the incorrect premise that synonymous codons do not affect protein structure and function. Unfortunately, as several chapters in this volume and a large literature demonstrate, synonymous codon changes can alter protein conformation and function. Does the fact that replacement of codons with synonymous variants can compromise protein function mean that codon optimization should be universally avoided? Mauro, in Chap.
10, provides a more nuanced view. First, he makes a distinction between proteins being used as industrial enzymes and those used in therapeutic applications. He recognizes that codon optimization could adversely affect efficacy and even induce undesirable immune responses. He then suggests strategies to mitigate the risks associated with codon optimization. We have gained unprecedented learnings about how synonymous codons provide secondary information for protein folding. These granular insights, coupled with powerful computational approaches like machine learning and artificial intelligence, are powering a new generation of codon optimization strategies that mitigate the risks.
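To make the idea of yield-oriented codon optimization concrete, the sketch below implements the simplest possible strategy: every codon is replaced by the most frequently used synonymous codon in a usage table. This is an illustrative assumption and not the algorithm of any tool cited above. The function names and the usage values are hypothetical, and the input is assumed to be an in-frame coding sequence whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def naive_optimize(cds, usage):
    """Swap each codon for the highest-usage synonymous codon in `usage`.

    `usage` maps codons to relative frequencies. Amino acids with no entry
    in the table keep their original codon.
    """
    best = {}
    for codon, freq in usage.items():
        aa = CODON_TABLE[codon]
        if aa not in best or freq > usage[best[aa]]:
            best[aa] = codon
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(best.get(CODON_TABLE[c], c) for c in codons)


# Hypothetical usage values for two leucine and two glutamate codons.
usage = {"CTG": 0.40, "CTA": 0.07, "GAG": 0.58, "GAA": 0.42}
original = "CTAGAACTG"
optimized = naive_optimize(original, usage)
print(optimized, translate(original) == translate(optimized))
```

The encoded amino acid sequence is unchanged, which is exactly why such substitutions were once assumed to be harmless. The sketch deliberately ignores translation rhythm, mRNA structure, and co-translational folding, the effects that the risk-mitigation strategies discussed here are meant to address.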
Summary
In the past two decades, advances in DNA sequencing technologies have brought about immense decreases in the cost of genome sequencing. For instance, the National Human Genome Research Institute determined that sequencing the human genome cost $95,263,072 in 2001; by August 2021 the cost of sequencing the
human genome had dropped to $562. This roughly 169,000-fold decrease in cost has led to an exponential increase in genomic data from individuals and spurred advances in the bioinformatics tools and pipelines necessary to handle this data. The surge in data related to human genetic variation could have immense consequences for how drugs are developed, evaluated, and used in the clinic. For instance, personalized (or precision) medicine, which involves customized medical care per the genetic characteristics of the individual patient, is poised to remake clinical care. We already see early signs of this transformation as treatment strategies and product labels have started to utilize genetic information to maximize clinical effectiveness and/or limit side effects (Kim et al. 2021). The expanded data availability has also meant that genetic changes hitherto considered unimportant have been systematically studied. As an example, synonymous SNPs, considered silent until recently, are now well established to affect protein abundance, structure, and function (Kimchi-Sarfaty et al. 2007; Kim et al. 2015; Liu et al. 2021). The practical importance of this research is that synonymous SNPs have now been shown to be associated with human diseases and response to therapies (Sauna and Kimchi-Sarfaty 2011). The tools developed to study the consequences of synonymous mutations have also proved to be immensely useful. As a case in point, the platform technology of codon optimization sought to dramatically increase protein yield by substituting synonymous codons. An underlying premise of this technology, that synonymous codon changes are benign and do not affect protein structure and function, has been shown to be erroneous (Liu et al. 2021). As codon optimization is widely used in research and drug development, the tools developed to study the consequences of the synonymous substitutions can prove immensely useful in evaluating protein products of optimized constructs.
The advances described in this book form the bedrock of what we believe will be transformational changes in basic sciences, drug development and regulation, and the practice of medicine. Happy reading! Zuben E. Sauna Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies Chava Kimchi-Sarfaty Center for Biologics Evaluation & Research, US FDA, Silver Spring, MD, USA
References
Alexaki A, Hettiarachchi GK, Athey JC, Katneni UK, Simhadri V, Hamasaki-Katagiri N, Nanavaty P, Lin B, Takeda K, Freedberg D (2019a) Effects of codon optimization on coagulation factor IX translation and structure: implications for protein and gene therapies. Sci Rep 9:1–15
Alexaki A, Kames J, Holcomb DD, Athey J, Santana-Quintero LV, Lam PVN, Hamasaki-Katagiri N, Osipova E, Simonyan V, Bar H et al (2019b) Codon and codon-pair usage tables (CoCoPUTs): facilitating genetic variation analyses and recombinant gene design. J Mol Biol 431:2434–2441
Anfinsen CB, Haber E (1961) Studies on the reduction and re-formation of protein disulfide bonds. J Biol Chem 236:1361–1363
Auton A, Abecasis GR, Altshuler DM, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A, Clark AG, Donnelly P, Eichler EE et al (2015) A global reference for human genetic variation. Nature 526:68–74
Brest P, Lapaquette P, Souidi M, Lebrigand K, Cesaro A, Vouret-Craviari V, Mari B, Barbry P, Mosnier JF, Hébuterne X et al (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43:242–245
Buhr F, Jha S, Thommen M, Mittelstaet J, Kutz F, Schwalbe H, Rodnina MV, Komar AA (2016) Synonymous codons direct cotranslational folding toward different protein conformations. Mol Cell 61:341–351
Cacabelos R, Cacabelos N, Carril JC (2019) The role of pharmacogenomics in adverse drug reactions. Expert Rev Clin Pharmacol 12:407–442
Cambray G, Guimaraes JC, Arkin AP (2018) Evaluation of 244,000 synthetic sequences reveals design principles to optimize translation in Escherichia coli. Nat Biotechnol 36:1005–1015
Campbell PJ, Getz G, Korbel JO, Stuart JM, Jennings JL, Stein LD, Perry MD, Nahal-Bose HK, Ouellette BFF, Li CH et al (2020) Pan-cancer analysis of whole genomes. Nature 578:82–93
Cartegni L, Chew SL, Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298
Chamary JV, Parmley JL, Hurst LD (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 7:98–108
Diederichs S, Bartsch L, Berkmann JC, Fröse K, Heitmann J, Hoppe C, Iggena D, Jazmati D, Karschnia P, Linsenmeier M et al (2016) The dark matter of the cancer genome: aberrations in regulatory elements, untranslated regions, splice sites, non-coding RNA and synonymous mutations. EMBO Mol Med 8:442–457
Giacomini KM, Yee SW, Mushiroda T, Weinshilboum RM, Ratain MJ, Kubo M (2017) Genome-wide association studies of drug response and toxicity: an opportunity for genome medicine. Nat Rev Drug Discov 16:1
Hanahan D, Weinberg RA (2011) Hallmarks of cancer: the next generation. Cell 144:646–674
Hirschhorn JN (2009) Genomewide association studies--illuminating biologic pathways. N Engl J Med 360:1699–1701
Holcomb D, Hamasaki-Katagiri N, Laurie K, Katneni U, Kames J, Alexaki A, Bar H, Kimchi-Sarfaty C (2021) New approaches to predict the effect of co-occurring variants on protein characteristics. Am J Hum Genet 108:1502–1511
Hoover DM, Lubkowski J (2002) DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 30:e43
Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C (2014) Exposing synonymous mutations. Trends Genet 30:308–321
Hunt R, Hettiarachchi G, Katneni U, Hernandez N, Holcomb D, Kames J, Alnifaidy R, Lin B, Hamasaki-Katagiri N, Wesley A et al (2019) A single synonymous variant (c.354G>A [p.P118P]) in ADAMTS13 confers enhanced specific activity. Int J Mol Sci 20:5734
Kames J, Alexaki A, Holcomb DD, Santana-Quintero LV, Athey JC, Hamasaki-Katagiri N, Katneni U, Golikov A, Ibla JC, Bar H et al (2020) TissueCoCoPUTs: novel human tissue-specific codon and codon-pair usage tables based on differential tissue gene expression. J Mol Biol 432:3369–3378
Katneni UK, Hunt R, Hettiarachchi GK, Hamasaki-Katagiri N, Kimchi-Sarfaty C, Ibla JC (2017) Compounding variants rescue the effect of a deleterious ADAMTS13 mutation in a child with severe congenital heart disease. Thromb Res 158:98–101
Katneni UK, Liss A, Holcomb D, Katagiri NH, Hunt R, Bar H, Ismail A, Komar AA, Kimchi-Sarfaty C (2019) Splicing dysregulation contributes to the pathogenicity of several F9 exonic point variants. Mol Genet Genomic Med 7:e840
Kim SJ, Yoon JS, Shishido H, Yang Z, Rooney LA, Barral JM, Skach WR (2015) Translational tuning optimizes nascent protein folding in cells. Science 348:444–448
Kim JA, Ceccarelli R, Lu CY (2021) Pharmacogenomic biomarkers in US FDA-approved drug labels (2000–2020). J Pers Med 11:179
Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528
King EA, Davis JW, Degner JF (2019) Are drug targets with genetic support twice as likely to be approved? Revised estimates of the impact of genetic support for drug mechanisms on the probability of drug approval. PLoS Genet 15:e1008489
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK, SanGiovanni JP, Mane SM, Mayne ST et al (2005) Complement factor H polymorphism in age-related macular degeneration. Science 308:385–389
Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324:255–258
Lauschke VM, Milani L, Ingelman-Sundberg M (2017) Pharmacogenomic biomarkers for improved drug therapy-recent progress and future developments. AAPS J 20:4
Liu Y, Yang Q, Zhao F (2021) Synonymous but not silent: the codon usage code for gene expression and protein folding. Annu Rev Biochem 90:375–401
Meyer D, Kames J, Bar H, Komar AA, Alexaki A, Ibla J, Hunt RC, Santana-Quintero LV, Golikov A, DiCuccio M et al (2021) Distinct signatures of codon and codon pair usage in 32 primary tumor types in the novel database CancerCoCoPUTs for cancer-specific codon usage. Genome Med 13:122
Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L (2006) Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314:1930–1933
O’Brien EP, Ciryam P, Vendruscolo M, Dobson CM (2014) Understanding the influence of codon translation rates on cotranslational protein folding. Acc Chem Res 47:1536–1544
Plotkin JB, Kudla G (2011) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42
Puigbò P, Guzmán E, Romeu A, Garcia-Vallvé S (2007) OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res 35:W126–W131
Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691
Simhadri VL, Hamasaki-Katagiri N, Lin BC, Hunt R, Jha S, Tseng SC, Wu A, Bentley AA, Zichel R, Lu Q et al (2017) Single synonymous mutation in factor IX alters protein properties and underlies haemophilia B. J Med Genet 54:338–345
Simms CL, Yan LL, Zaher HS (2017) Ribosome collision is critical for quality control during no-go decay. Mol Cell 68:361–373.e365
Subramaniam AR, Zid BM, O’Shea EK (2014) An integrated approach reveals regulatory controls on bacterial translation elongation. Cell 159:1200–1211
Tam V, Patel N, Turcotte M, Bossé Y, Paré G, Meyre D (2019) Benefits and limitations of genome-wide association studies. Nat Rev Genet 20:467–484
Contents
Part I An Overview of Human Genome Sequencing and How to Access Information About SNVs
1 SNPs Classification and Terminology: dbSNP Reference SNP (rs) Gene and Consequence Annotation
Lon Phan
Part II A Broad Survey of SNPs, Their Classification into Synonymous and Non-synonymous and the Undesirable Consequences of Using the Term “Silent” for Synonymous Changes
2 Evolutionary Forces That Generate SNPs: The Evolutionary Impacts of Synonymous Mutations
Deepa Agashe
3 Recording Silence – Accurate Annotation of the Genetic Sequence Is Required to Better Understand How Synonymous Coding Affects Protein Structure and Disease
Aviv A. Rosenberg, Alex M. Bronstein, and Ailie Marx
Part III The Role of SNPs in Human Disease
4 GWAS to Identify SNPs Associated with Common Diseases and Individual Risk: Genome Wide Association Studies (GWAS) to Identify SNPs Associated with Common Diseases and Individual Risk
Gaya Hettiarachchi and Anton A. Komar
5 SNPs Ability to Influence Disease Risk: Breaking the Silence on Synonymous Mutations in Cancer
Eduardo Herreros, Xander Janssens, Daniele Pepe, and Kim De Keersmaecker
Part IV An Examination of the Mechanisms by Which Synonymous Mutations Affect Protein Levels or Protein Folding Which Affect Human Physiology and Response to Therapy
6 An Examination of Mechanisms by which Synonymous Mutations may Alter Protein Levels, Structure and Functions
Yiming Zhang and Zsuzsa Bebok
7 Methods to Evaluate the Effects of Synonymous Variants
Brian C. Lin, Katarzyna I. Jankowska, Douglas Meyer, and Upendra K. Katneni
Part V The Role of SNPs in Personalized Medicine and the Platform Technology of Codon Optimization
8 Using Genome Wide Studies to Generate and Test Hypotheses that Provide Mechanistic Details of How Synonymous Codons Affect Protein Structure and Function: Functional SNPs in the Age of Precision Medicine
Brandon N. S. Ooi, Ashley J. W. Lim, Samuel S. Chong, and Caroline G. L. Lee
9 SNPs and Personalized Medicine: Scrutinizing Pathogenic Synonymous Mutations for Precision Oncology
Samuel Peña-Llopis
10 Codon Optimization: Codon Optimization of Therapeutic Proteins: Suggested Criteria for Increased Efficacy and Safety
Vincent P. Mauro
Correction to: SNPs Ability to Influence Disease Risk: Breaking the Silence on Synonymous Mutations in Cancer
Index
Part I
An Overview of Human Genome Sequencing and How to Access Information About SNVs
Chapter 1
SNPs Classification and Terminology: dbSNP Reference SNP (rs) Gene and Consequence Annotation
Lon Phan
L. Phan (*) Information Engineering Branch, National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA e-mail: [email protected]
© Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_1
1.1 Introduction
This chapter introduces the NCBI Single Nucleotide Polymorphism database (dbSNP), an important resource for understanding human variation and molecular genetics (Sherry et al. 2001). The chapter describes how dbSNP variants are aligned to the human genome, how the locations of variants relative to annotated genes and mRNAs are identified, and how molecular functional classifications are assigned using the standard Sequence Ontology (Gene Ontology Consortium 2008), along with summary statistics. The chapter also provides instructions for searching dbSNP, with links, and includes screen images of search examples.
The NCBI dbSNP database houses human genetic variation and frequency data from large-scale projects including HapMap (Thorisson et al. 2005), 1000Genomes (Auton et al. 2015), GO-ESP (https://evs.gs.washington.edu/), GnomAD (Karczewski et al. 2020), and TOPMED (Taliun et al. 2021), as well as more focused studies such as locus-specific databases (LSDB) (Fokkema et al. 2011) and clinical sources such as ClinVar (Landrum et al. 2016). dbSNP aggregates genetic variations from multiple submitters and assigns stable Reference SNP (rs) identifiers. The rs identifiers can be cited in publications and used to integrate with other data sources. The rs records are annotated and integrated with other NCBI resources including ClinVar, Gene, RefSeq, and PubMed, and disseminated to the scientific community (Sayers et al. 2021).
As of June 2021, the database houses over 1 billion rs records in Build 155 (the dbSNP release version number) (Sayers et al. 2021). In addition, more than 900 million of the rs records include population frequency data from many different projects, including 1000Genomes, GnomAD, and the NCBI Allele Frequency Aggregator (ALFA) project (Phan et al. 2020), which provides aggregate allele frequencies from the Database of Genotype and Phenotype (dbGaP) studies (Mailman et al. 2007). Depending on the project, the population can range from local ethnic groups to large continental groups such as European, African, and Asian.
1.2 What Is dbSNP Used for?
dbSNP has served for over two decades as the premier reference variation database for many disciplines (Sherry et al. 2001; Sayers et al. 2021). It is used worldwide in personal genomics and medical genetics, and for managing, annotating, and analyzing genetic variations (Sherry et al. 2000). Single nucleotide variations (SNV), including common single nucleotide polymorphisms (SNP) and rare variations, are the most common type of genetic variation in dbSNP. These genetic differences are important in the study of human health and serve as genetic markers to locate genes and genomic loci that are associated with diseases and common phenotypes or that directly affect a gene’s function. SNVs are also used to track the inheritance of disease genes within families.
dbSNP aggregates and annotates genetic variation data from multiple submitters. Identical variants from multiple sources are clustered together to produce a unique exemplar variant, which is assigned a dbSNP Reference SNP (rs or RefSNP) number (Sherry et al. 1999). The RefSNP catalog is a non-redundant collection of submitted variants that have been clustered, integrated, and annotated. The RefSNP number is a stable accession regardless of differences between genomic assemblies. RefSNP numbers provide a stable variant notation for mutation and polymorphism analysis, annotation, reporting, data mining, and data integration. dbSNP RefSNPs have been cited in over 60,000 publications. Besides being integrated with other NCBI resources (see above), dbSNP data is integrated into various bioinformatic pipelines and used in both open-source and commercial variation tools and software (Ng and Henikoff 2003; Karchin et al. 2005; Hu et al. 2013). The data is also used as a benchmark for variant discovery projects, to assess data quality, to enrich annotation, and for variant interpretation and prioritization.
An important and common use case is finding a subset of variants by RefSNP number, or by matching the same start and end positions and the same observed alleles (Bhagwat 2010). Variants annotated in dbSNP can be further filtered by gene of interest and by functional class or consequence (discussed below in Computing Molecular Consequences in dbSNP), and classified as common or rare using population allele frequency to facilitate prioritization and interpretation. In addition, dbSNP has over 20 different attributes for searching and filtering variants, including ClinVar clinical significance, somatic variants, and publications.
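The matching use case described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the `rs_demo_*` identifiers, the coordinates, and the frequencies are invented placeholders rather than dbSNP data, and the function names are hypothetical. The 0.01 threshold for separating common from rare variants is the MAF cut-off given in the captions of Tables 1.1 and 1.2.

```python
from typing import NamedTuple, Optional


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    alleles: frozenset  # observed alleles; a frozenset makes the order irrelevant


# Hypothetical reference records: rs number -> (variant, minor allele frequency).
# All identifiers, positions and frequencies are made up for illustration.
REFERENCE = {
    "rs_demo_1": (Variant("chr1", 1000, 1000, frozenset({"A", "G"})), 0.25),
    "rs_demo_2": (Variant("chr1", 2000, 2002, frozenset({"CTT", "C"})), 0.002),
    "rs_demo_3": (Variant("chr2", 500, 500, frozenset({"C", "T"})), None),
}

# Secondary index keyed on position and alleles, for queries without an rs number.
BY_POSITION = {variant: rs for rs, (variant, _) in REFERENCE.items()}


def find_rs(query: Variant, rs: Optional[str] = None) -> Optional[str]:
    """Match by rs number if one is supplied and known, else by start, end and alleles."""
    if rs is not None and rs in REFERENCE:
        return rs
    return BY_POSITION.get(query)


def frequency_class(maf: Optional[float], threshold: float = 0.01) -> str:
    """Label a variant as common (MAF >= threshold) or rare (MAF < threshold)."""
    if maf is None:
        return "no frequency data"
    return "common" if maf >= threshold else "rare"


if __name__ == "__main__":
    query = Variant("chr1", 1000, 1000, frozenset({"G", "A"}))
    matched = find_rs(query)
    print(matched, frequency_class(REFERENCE[matched][1]))
```

Keying the lookup on a set of alleles rather than an ordered pair is a design choice made here so that a query listing the alleles in a different order still matches; a production pipeline would also need to normalize strand and indel representation, which this sketch does not attempt.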
1.3 dbSNP Molecular Consequences (AKA Function Class)
Variations that occur in functional regions of genes or in conserved non-coding regions might cause significant changes in the complement of transcribed sequences (Kumar et al. 2009). This can lead to changes in protein expression and/or structure that can affect aspects of the phenotype such as metabolism or cell signaling. The consequences annotated from the position of the variant relative to identifiable features of a specific transcript of a gene have historically been defined in dbSNP as the Function class (missense, synonymous, etc.).
1.4 Computed Molecular vs. Observed Functional Consequences
dbSNP computes the molecular consequence of a variant: the possible molecular effect inferred from the location of the variant relative to annotated sequence features such as upstream open reading frames (uORFs), the coding sequence (CDS), and others (Yang and Wang 2015; McLaren et al. 2016; Li and Wang 2017). A functional consequence, by contrast, is the observed effect of a sequence variation determined experimentally (Datta et al. 2015; Delio et al. 2015; Li and Wang 2017; Zhang et al. 2020; Thompson et al. 2020; Gergics et al. 2021). The database does not annotate variants with functional consequences that have been determined experimentally or described in publications. However, dbSNP does provide links to a subset of ClinVar variants with functional consequence information and clinical significance (Landrum et al. 2016). For historical reasons, the molecular consequences in dbSNP are also sometimes described as function class, a term that is being phased out.
1.5 Computing Molecular Consequences in dbSNP
dbSNP computes and annotates the possible molecular consequences of the RefSNP as described in the NCBI Variation Resource glossary (https://www.ncbi.nlm.nih.gov/variation/docs/glossary/). The glossary also describes the international standard Sequence Ontology (SO) terms (http://www.sequenceontology.org/) used to describe how the variation alters mRNA transcription and translation. SO is developed by a consortium of genomic and molecular databases to provide a structured controlled vocabulary for describing the features of biological sequences, including genetic variant consequences. dbSNP assigns both the description and the SO identifier for the corresponding consequences.
Molecular consequences are computed using Reference Sequence (RefSeq) alignments of the genomic sequence to mRNAs and proteins. The RefSNP genomic position is remapped, or projected, onto the mRNA sequence through the sequence alignment. The RefSNP molecular consequence is then assigned based on its position and relationship with local gene features or transcripts. When a RefSNP is near a transcript or in a transcript interval but not in the coding region, dbSNP assigns the consequence from the position of the variation relative to the structure of the aligned transcript. In other words, a variation may be near a gene (locus region), in a UTR (mRNA-utr), in an intron (intron), or in a splice site (Table 1.2). If the RefSNP is in a coding region (CDS), then the assignment depends on how each allele may affect the translated peptide sequence (Table 1.1). There are four possible outcomes when a RefSNP is in the CDS. The corresponding SO accessions for the consequences are also reported below.
1. The RefSNP variant allele is different from the reference sequence and yields a new codon that encodes the same amino acid. This is termed a synonymous variant (SO:0001819).
2. The RefSNP variant allele is different from the reference sequence and yields a new codon that encodes a different amino acid. This is termed a nonsynonymous variant (SO:0001992). dbSNP reports nonsynonymous variants arising from nucleotide substitutions as missense variant (SO:0001583), with special cases for substitutions that result in premature termination or elongation of translation: stop gained (SO:0001587) and stop lost (SO:0001578), which may result in a shortened or elongated polypeptide, respectively.
3. The RefSNP variant allele inserts or deletes a number of nucleotides that is not a multiple of three and thus disrupts the translational reading frame. This is termed a frameshift variant (SO:0001589). There are also specific classes for length changes that preserve the reading frame: inframe insertion (SO:0001821), inframe deletion (SO:0001822), and inframe indel (SO:0001820), arising from insertion, deletion, or combined insertion and deletion (indel) variant types, respectively.
4. The RefSNP maps to a partial transcript in which the annotated coding region features cannot be determined. The variation class is then noted as coding sequence variant (SO:0001580), based solely on position.
Molecular functional classification is defined by positional and sequence parameters. Consequently, two facts emerge: (a) if a gene has multiple transcripts because of alternative splicing, then a variation may have several different functional relationships to the gene; and (b) if multiple genes are densely packed in a genomic region, then a variation at a single location in the genome may have multiple, potentially different, relationships to its local overlapping gene neighbors and transcripts. Since consequences are assigned per coding transcript, a single variant may have more than one associated molecular consequence, one for each overlapping feature, and all relevant SO terms are reported. Likewise, for protein annotation, for each RNA in which the variant overlaps a coding region in part or completely, dbSNP assigns one of the molecular consequences listed in Table 1.1 as a computed effect of a sequence change on a particular protein product.
Table 1.1 CDS variant consequence assignment for effect on protein products per transcript
Term                 SO ID        Total       ClinVar   With frequency   Common    Rare
stop_lost            SO:0001578   22,070      443       20,354           265       20,089
inframe_indel        SO:0001820   20,170      937       13,966           74        13,892
inframe_insertion    SO:0001821   92,001      2703      82,616           2120      80,496
inframe_deletion     SO:0001822   177,245     6066      167,207          2350      164,857
stop_gained          SO:0001587   443,531     29,500    385,971          4851      381,120
frameshift_variant   SO:0001589   633,216     36,693    528,266          2183      526,083
synonymous_variant   SO:0001819   4,633,280   160,233   4,365,655        89,653    4,276,002
missense_variant     SO:0001583   9,628,782   261,099   8,994,410        102,484   8,891,926
For each RNA in which the variant coincides in part or completely with a coding region, dbSNP assigns one of the molecular consequences above as a computed effect of a sequence change on a particular protein product. Counts are totals of dbSNP Reference SNPs (RS) in CDS. The columns are (1) Sequence Ontology (SO) term; (2) SO ID; (3) the total number of RS; (4) RS that are in ClinVar; (5) RS with population allele frequency, and the subsets of those that are (6) common (MAF >= 0.01) and (7) rare (MAF < 0.01). The total counts are for unique RS within the term. A single variant may have more than one associated molecular consequence and is counted for each.
Table 1.2 Non-CDS variant location-based consequence assignment per transcript
Term                                 SO ID        Total         ClinVar   With frequency   Common       Rare
terminator_codon_variant             SO:0001590   35,122        626       32,489           455          32,034
initiator_codon_variant              SO:0001582   52,789        1426      48,839           440          48,399
splice_acceptor_variant              SO:0001574   182,261       6476      159,293          1115         158,178
splice_donor_variant                 SO:0001575   234,992       8028      212,761          1881         210,880
5_prime_UTR_variant                  SO:0001623   6,850,124     42,844    6,511,627        110,208      6,401,419
downstream_transcript_variant        SO:0001634   10,434,787    15,416    9,953,890        240,692      9,713,198
3_prime_UTR_variant                  SO:0001624   15,647,249    44,376    14,914,114       346,245      14,567,869
coding_sequence_variant              SO:0001580   14,952,100    460,828   13,884,940       178,680      13,706,260
non_coding_transcript_variant        SO:0001619   21,333,831    149,220   20,248,062       418,682      19,829,380
genic_downstream_transcript_variant  SO:0002152   107,020,260   112,366   102,099,065      2,449,470    99,649,595
genic_upstream_transcript_variant    SO:0002153   163,642,947   67,765    156,236,915      3,736,732    152,500,183
intron_variant                       SO:0001627   563,200,771   171,578   537,639,744      12,920,857   524,718,887
Location-based ontology terms are assigned to a variant whenever any part of its allele sequence overlaps one of the Gene, RNA Feature or Coding regions (see Fig. 1.1). If the variant overlaps more than one region, or if multiple transcripts are involved, all relevant SO terms are reported. The counts are for unique RS within the term. The columns are (1) Sequence Ontology (SO) term; (2) SO ID; (3) the total number of RS; (4) RS that are in ClinVar; (5) RS with population allele frequency, and the subsets of those that are (6) common (MAF >= 0.01) and (7) rare (MAF < 0.01).
Fig. 1.1 Gene, RNA and Coding Sequence (CDS) feature locations used for assignment of consequences and SO terms
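As a rough illustration of the CDS outcomes listed in Sect. 1.5, the following Python sketch assigns an SO term to a single-codon substitution or to a length-changing variant. It is a simplified sketch and not dbSNP's annotation pipeline: it assumes the standard genetic code, assumes the reference and variant codons are already known for one transcript, and uses hypothetical function names. The SO identifiers are the ones quoted in the text.

```python
# Standard genetic code laid out in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_consequence(ref_codon: str, alt_codon: str) -> tuple:
    """Classify a substitution by comparing the reference and variant codons."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return ("synonymous_variant", "SO:0001819")
    if alt_aa == "*":
        # A sense codon became a stop codon: shortened polypeptide.
        return ("stop_gained", "SO:0001587")
    if ref_aa == "*":
        # A stop codon became a sense codon: elongated polypeptide.
        return ("stop_lost", "SO:0001578")
    return ("missense_variant", "SO:0001583")


def length_change_consequence(n_inserted: int, n_deleted: int) -> tuple:
    """Classify an insertion, deletion or indel by its net effect on the reading frame."""
    if (n_inserted - n_deleted) % 3 != 0:
        return ("frameshift_variant", "SO:0001589")
    if n_deleted == 0:
        return ("inframe_insertion", "SO:0001821")
    if n_inserted == 0:
        return ("inframe_deletion", "SO:0001822")
    return ("inframe_indel", "SO:0001820")


if __name__ == "__main__":
    print(substitution_consequence("GAA", "GAG"))  # Glu -> Glu
    print(substitution_consequence("CAG", "TAG"))  # Gln -> stop
    print(length_change_consequence(1, 0))         # one inserted base
```

Because consequences are assigned per transcript, a caller would run such a function once for every transcript that the variant overlaps and report all resulting terms.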
1.6 Splicing Variants
RefSNP variant alleles mapped to splice acceptor or donor sites are assigned the molecular consequences splice_acceptor_variant (SO:0001574) and splice_donor_variant (SO:0001575), respectively (Table 1.2).
1.7 Other Non-CDS Variants
RefSNPs mapped to non-CDS regions or outside the mRNA transcripts are assigned molecular consequences based on their positions relative to the mRNA feature or neighboring transcript (upstream or downstream). The location-based consequences are listed in Table 1.2 and illustrated in Fig. 1.1.
1.8 Searching dbSNP by Variant Consequences
Users can search dbSNP using the terms in Tables 1.1 and 1.2 with the “Function Class” qualifier, alone or in combination with other search terms (https://www.ncbi.nlm.nih.gov/snp/docs/entrez_help/). For example, the search for missense variants is “missense variant”[Function Class] (https://www.ncbi.nlm.nih.gov/snp/?term=%22missense%20variant%22%5BFunction%20Class%5D). The consequences are reported in the search results (Fig. 1.2) and on the RefSNP detail report page (Fig. 1.3).
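The encoded link in the example above can be produced programmatically. The short Python sketch below builds the same search URL from a plain-text term; the helper names are hypothetical, and the only facts taken from the text are the base URL and the “Function Class” qualifier. It assumes the term is written with spaces, as in the search example, rather than with the underscores used in the tables.

```python
from urllib.parse import quote

# Base of the dbSNP web search URL quoted in the text.
DBSNP_SEARCH = "https://www.ncbi.nlm.nih.gov/snp/?term="


def function_class_query(term: str) -> str:
    """Wrap a consequence term in quotes and add the Function Class qualifier."""
    return f'"{term}"[Function Class]'


def dbsnp_search_url(query: str) -> str:
    """Percent-encode the whole query so quotes, spaces and brackets survive in a URL."""
    return DBSNP_SEARCH + quote(query, safe="")


if __name__ == "__main__":
    print(dbsnp_search_url(function_class_query("missense variant")))
```

Passing `safe=""` to `quote` forces every reserved character to be encoded, which reproduces the `%22`, `%20`, `%5B` and `%5D` sequences seen in the link above.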
Fig. 1.2 RefSNP search and filtering results. (a) The gene and functional consequence are reported for each RS in the results; (b) users can also limit results by functional consequence by selecting one or more “Function Class” filters, shown on the left side alongside the search results
Fig. 1.3 RefSNP consequence reporting. The variant annotation, including the gene and consequences, is reported on the dbSNP RefSNP page. Shown is an example RefSNP report for rs429358, which maps to the APOE gene and is associated with the risk for Alzheimer’s disease; (A) the gene and the most severe consequence are reported in the summary header, (B) the nucleotide and protein changes and the SO terms for each mRNA transcript variant are shown in the Variant Details tab, and (C) shows rs429358 in a genomic sequence viewer. All five APOE mRNA transcript variants from alternative splicing are shown
Acknowledgement
This work was supported by the National Center for Biotechnology Information of the National Library of Medicine (NLM), National Institutes of Health.
References
Auton A, Brooks LD et al (2015) A global reference for human genetic variation. Nature 526:68–74
Bhagwat M (2010) Searching NCBI’s dbSNP database. Curr Protoc Bioinformatics Chapter 1:Unit 1.19
Datta A, Mazumder MHH, Chowdhury AS, Hasan MA (2015) Functional and structural consequences of damaging single nucleotide polymorphisms in human prostate cancer predisposition gene RNASEL. Biomed Res Int 2015:271458
Delio M, Patel K et al (2015) Development of a targeted multi-disorder high-throughput sequencing assay for the effective identification of disease-causing variants. PLoS ONE 10:e0133742
Fokkema IFAC, Taschner PEM et al (2011) LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 32:557–563
Gene Ontology Consortium (2008) The Gene Ontology project in 2008. Nucleic Acids Res 36:440–444
Gergics P, Smith C et al (2021) High-throughput splicing assays identify missense and silent splice-disruptive POU1F1 variants underlying pituitary hormone deficiency. Am J Hum Genet 108:1526–1539
Hu H, Huff CD et al (2013) VAAST 2.0: improved variant classification and disease-gene identification using a conservation-controlled amino acid substitution matrix. Genet Epidemiol 37:622–634
Karchin R, Diekhans M et al (2005) LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics 21:2814–2820
Karczewski KJ, Francioli LC et al (2020) The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581:434–443
Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4:1073–1081
Landrum MJ, Lee JM et al (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44:862–868
Li Q, Wang K (2017) InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP Guidelines. Am J Hum Genet 100:267–280
Mailman MD, Feolo M et al (2007) The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 39:1181–1186
McLaren W, Gil L et al (2016) The Ensembl Variant Effect Predictor. Genome Biol 17:122
Ng PC, Henikoff S (2003) SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 31:3812–3814
Phan L, Jin Y, Zhang H et al (2020) ALFA: Allele Frequency Aggregator, US National Library of Medicine 10. National Center for Biotechnology Information. (Manuscript in preparation)
Sayers EW, Beck J et al (2021) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res 49:10–17
Sherry ST, Ward M, Sirotkin K (1999) dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res 9:677–679
Sherry ST, Ward M, Sirotkin K (2000) Use of molecular variation in the NCBI dbSNP database. Hum Mutat 15:68–75
Sherry ST, Ward MH et al (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29:308–311
Taliun D, Harris DN et al (2021) Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590:290–299
Thompson BA, Walters R et al (2020) Contribution of mRNA splicing to mismatch repair gene sequence variant interpretation. Front Genet 11:798
Thorisson GA, Smith AV, Krishnan L, Stein LD (2005) The International HapMap Project Web site. Genome Res 15:1592–1593
Yang H, Wang K (2015) Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc 10:1556–1566
Zhang J, Yao Y, He H, Shen J (2020) Clinical interpretation of sequence variants. Curr Protoc Hum Genet 106:98
Part II
A Broad Survey of SNPs, Their Classification into Synonymous and Non-synonymous and the Undesirable Consequences of Using the Term “Silent” for Synonymous Changes
Chapter 2
Evolutionary Forces That Generate SNPs: The Evolutionary Impacts of Synonymous Mutations
Deepa Agashe
D. Agashe (*) National Centre for Biological Sciences, Bangalore, India e-mail: [email protected]
© Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_2
2.1 Introduction
As I hope to have shown, consideration of natural selection has been a good guide in exposing the amount, the nature, and the consequences of degeneracy of the code, even leading to the revelation of some of its features that add both to its beauty and, in a minor way, to the corpus of genetic concepts.
– T M Sonneborn (1965)
Ever since the degeneracy of the genetic code was revealed, synonymous mutations have played an important role in the development of molecular as well as evolutionary biology. The fact that multiple codons code for a single amino acid immediately implicated synonymous codons as key players during the evolution of the genetic code (Crick et al. 1961; Sonneborn 1965; Woese 1965). This perspective, emphasizing the importance of codon assignment during the evolution of life, remains relevant to date (Koonin and Novozhilov 2016). In the decades that followed, synonymous codon changes have featured in many important developments in evolutionary biology. These include the neutral and nearly neutral theories of molecular evolution, the molecular clock, the organization, composition and exchange of genetic material, the evolution of gene regulation, robustness of information transfer, and methods to identify signatures of selection.
Much of the evolutionary perspective on synonymous mutations has revolved around synonymous codon changes in the context of translation (i.e. with respect to the information contained within a nucleotide triplet) (Sharp and Li 1986b; Andersson and Kurland 1990; Hershberg and Petrov 2008; Sharp et al. 2010; Plotkin and Kudla 2010). This is because codons read by more abundant tRNAs are often translated faster and with greater accuracy (Precup and Parker 1987; Sørensen et al. 1989; Gardin et al. 2014). However, the role of synonymous changes as carriers and influencers of many different layers of information embedded in the genetic code is increasingly apparent (Chamary et al. 2006; Bailey et al. 2021). These include impacts that are related to translation (Liu 2020), as well as those that do not directly affect translation (Callens et al. 2021). The diverse mechanisms responsible for the consequences of synonymous mutations are described in detail in other chapters in this book. Here, I focus on the trajectory of the evolutionary perspective on synonymous mutations over the course of the past 50 years. Although I cannot do justice to the many kinds of evidence and developments that drove this arc, I have attempted to give an overview of the major insights that changed the course of the narrative from neutrality to pervasive selection.
2.2 Evidence for the Evolutionary Impacts of Synonymous Mutations
I first provide a broad introduction to different types of evidence that can be used to infer the nature and strength of selection acting on a focal set of mutations. While these methods can be applied to any class of mutations, below I discuss them in the specific context of synonymous changes, to make it easier to follow the discussion in subsequent sections.
2.2.1 Indirect Evidence
Results from bioinformatics analyses can provide important evidence suggesting the role of genetic drift vs. selection in driving observed patterns of genetic variation. The core idea is that the incidence or frequency of neutral mutations should be consistent with the action of genetic drift. On the other hand, selection either for or against a category of mutations should result in their enrichment or depletion in specific contexts that reflect the source of selection. In the case of synonymous mutations, selection for rapid or accurate translation is expected to drive a positive correlation between relative codon use and abundance of cognate tRNAs across taxa (Fig. 2.1a). Within a species, genes that are more important for cell function should evolve under stronger selection. Hence, deleterious synonymous mutations should be removed more effectively from highly expressed genes, reducing the overall synonymous substitution rate (Fig. 2.1b). Although the figure illustrates simple scenarios, more complex and multi-dimensional analyses can allow us to investigate the impact of many factors at once.
Extending this logic to specific mutations, we can ask whether synonymous allele frequency in naturally or experimentally evolved populations is consistent with evolution under selection. Such analyses come in many flavours. Typically, the site frequency spectrum (SFS), a histogram of derived allele frequencies in a population, is skewed towards low frequencies, so that only a few mutations are observed at high frequency and most are rare (Fig. 2.1c). Deviations from the expected SFS under neutral evolution are thus indicative of natural selection; e.g. the frequency of mutations at the far right of the distribution in Fig. 2.1c is higher than expected. Such deviations at synonymous sites would suggest selection favouring these mutations. Next, genome wide association studies (GWAS) ask whether a particular phenotype (e.g. a disease) is consistently correlated with the presence (or absence) of specific alleles, including synonymous polymorphisms (Fig. 2.1d). In SFS and GWAS analyses, evolution has already occurred, and we are asking whether the outcomes (at the level of patterns of genetic polymorphism) are consistent with neutrality or selection. Similarly, an increase in synonymous allele frequency during selection experiments, especially if it occurs repeatedly across independently evolving populations, suggests that the allele may be adaptive (Fig. 2.1e).
A major limitation of these analyses is that it is difficult to control for the effect of multiple known and unknown covariates, and to disentangle complex interactions between them. Ultimately, although these analyses can be powerful, they can only show correlation and not causation. Hence, more direct experimental evidence becomes necessary.
Fig. 2.1 Different types of evidence for the evolutionary impacts of synonymous mutations. Schematics illustrate various kinds of comparative and experimental studies that can demonstrate the role of synonymous mutations in evolution. Panel E shows three independently evolved populations
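To make the SFS comparison described in this section concrete, the following Python sketch tabulates an observed spectrum and a neutral reference spectrum for the same number of sites. It rests on an assumption that is not stated in the chapter: under the standard neutral model with constant population size, the expected number of segregating sites at which the derived allele appears in i of n sampled chromosomes is proportional to 1/i. The function names and the input counts are illustrative only.

```python
from collections import Counter


def site_frequency_spectrum(derived_counts, n):
    """Return a list whose entry i-1 is the number of sites where the derived
    allele is seen in exactly i of the n sampled chromosomes (1 <= i <= n-1).
    Sites that are not polymorphic in the sample (count 0 or n) are ignored."""
    tally = Counter(c for c in derived_counts if 0 < c < n)
    return [tally.get(i, 0) for i in range(1, n)]


def neutral_expectation(total_sites, n):
    """Spread total_sites over the classes 1..n-1 in proportion to 1/i
    (assumed standard neutral model, constant population size)."""
    weights = [1.0 / i for i in range(1, n)]
    norm = sum(weights)
    return [total_sites * w / norm for w in weights]


if __name__ == "__main__":
    # Made-up derived allele counts at seven sites in a sample of n = 4 chromosomes.
    observed = site_frequency_spectrum([1, 1, 1, 2, 3, 0, 4], n=4)
    expected = neutral_expectation(sum(observed), n=4)
    for i, (obs, exp) in enumerate(zip(observed, expected), start=1):
        print(f"derived count {i}: observed {obs}, neutral expectation {exp:.2f}")
```

An excess of observed sites in the high-count classes relative to the reference would correspond to the deviation at the far right of the distribution described in the text; in practice such comparisons also have to account for demography and for errors in inferring the ancestral state, which this sketch ignores.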
D. Agashe
2.2.2 Direct Evidence

As with the methods discussed above, direct experimental analysis can focus on a class of mutations or specific mutations of interest. A commonly used strategy involves random or directed mutagenesis of a focal gene, followed by identifying the mutation(s) and measuring their impact on phenotype or fitness (Fig. 2.1f). Such datasets can be used to measure the selection coefficient (s), which indicates the fitness impact of a mutation relative to the ancestor. The coefficients can then be used to estimate the general statistical distribution of fitness effects (DFE) of mutations (Fig. 2.1g), which is important for predicting adaptive evolution and the fate of new mutations. For instance, a distribution with fat tails would indicate that many mutations have very large fitness effects and will be rapidly eliminated or fixed by selection. Finally, the most conclusive evidence comes from experimental evolution studies that directly observe the occurrence of adaptive mutations. Such studies effectively extend the paradigm shown in Fig. 2.1e by engineering the evolved putatively beneficial allele on the ancestral genomic background. If the engineered strain has higher fitness, it is clear that the allele is beneficial and evolved under positive selection (Fig. 2.1h). Whereas the experiments described earlier (Fig. 2.1f, g) show that synonymous mutations could face selection, studies of the sort exemplified in Fig. 2.1h directly demonstrate that specific synonymous mutations do evolve under selection, and are adaptive. This is crucial because various cellular or population level phenomena may buffer or alter the longer-term impacts of mutations, preventing selection from effectively acting on them in proportion to their observed short-term phenotypic effects.
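The step from measured fitness to selection coefficients and then to a DFE can be sketched as follows. This is a minimal illustration that assumes one common convention, s = (mutant fitness / ancestor fitness) − 1. The function names, the threshold of |s| ≥ 0.1 for a "large" effect, and the fitness values are all hypothetical.

```python
def selection_coefficient(mutant_fitness, ancestor_fitness):
    """Selection coefficient under the (assumed) convention
    s = w_mutant / w_ancestor - 1, so s > 0 is beneficial."""
    return mutant_fitness / ancestor_fitness - 1.0

def summarize_dfe(coefficients, large_effect=0.1):
    """Summarize a list of selection coefficients as simple fractions.
    The large_effect threshold is an arbitrary illustrative choice."""
    n = len(coefficients)
    return {
        "beneficial": sum(1 for s in coefficients if s > 0) / n,
        "deleterious": sum(1 for s in coefficients if s < 0) / n,
        "large_effect": sum(1 for s in coefficients
                            if abs(s) >= large_effect) / n,
    }

# Hypothetical relative fitness values for eight mutants of one gene.
ANCESTOR = 1.00
mutant_fitness = [1.00, 0.98, 1.02, 0.50, 1.25, 0.99, 1.00, 0.75]
coeffs = [selection_coefficient(w, ANCESTOR) for w in mutant_fitness]
summary = summarize_dfe(coeffs)
print(summary)
```

A real analysis would fit a statistical distribution to the coefficients and account for measurement error; the fractions above only indicate how heavy the tails of the empirical distribution are.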
2.3 Changing Evolutionary Perspectives on Synonymous Mutations

I now provide an overview of the transformation that occurred in evolutionary biology, regarding the role of synonymous changes. As with all transformations, the prevalent thinking changed gradually as evidence accumulated, and perspectives did not fall into neat mutually exclusive bins of time. Nonetheless, for ease of organization and narrative, I have split the timeline into three phases that approximately capture the progression of thought in the field. The timeline is summarized in Fig. 2.2 along with examples of the earliest comprehensive evidence for specific inferences about the evolutionary effects of synonymous changes. Below, I discuss the changing perspectives in more detail, including a larger portion of the body of supporting work.
Fig. 2.2 Timeline of changing evolutionary perspectives on synonymous mutations. The first (or first comprehensive) report of a particular outcome relevant to our understanding of the evolutionary role of synonymous changes is shown here, color coded by the type of evidence (see Fig. 2.1). Subsequent relevant work showing additional support is discussed in the text. CUB = codon usage bias
2.3.1 Phase I: Mostly Neutral

The years immediately following the discovery of the nature of the genetic code (Crick et al. 1961; Nirenberg and Matthaei 1961) were heady. Even as biochemists painstakingly assigned meaning to each codon, everyone tried to divine the logic
behind the peculiar structure of the codon table. The degeneracy of the code suggested neutrality. Because synonymous changes retained the amino acid identity, they would not face strong natural selection, and were thus christened “silent” (Sonneborn 1965). It also became apparent that when single step non-synonymous mutations did occur, they often encoded amino acids with similar biochemical or biophysical properties. Hence, the structure of the genetic code appeared to have been optimized to reduce catastrophic errors during protein production, and synonymous mutations, due to their neutrality, played a major role in this optimization (Crick et al. 1961; Woese 1965; Crick 1967; Woese 1969). The silent nature of synonymous mutations was also important in the development of the neutral theory of molecular evolution, which posited that most mutations are effectively neutral and evolve under genetic drift rather than selection (Kimura 1968; King and Jukes 1969; Kimura 1977). Intriguingly, in his papers Kimura cautioned that not all synonymous mutations may be neutral; though he clearly thought that this was a very small proportion and did not discuss it further. Soon after, new comparative and experimental evidence showing biased codon usage began undermining the neutrality of synonymous codons (as outlined in Figs. 2.1a–c). If synonymous codons were indeed silent, one would expect different synonyms to be used in approximately equal proportions. In contrast, organisms seemed to prefer some synonyms over others, suggesting a deviation from neutrality. Early data on the predicted structures of mRNA sequences indicated potential selection for specific degenerate codons to optimize mRNA structure and stability (Adams et al. 1969; Jou et al. 1971; Hasegawa et al. 1979). 
In addition, comparison of globin genes across a small number of species showed that synonymous substitutions were significantly reduced compared to neutral expectation; since they could not have altered amino acid sequences, their scarcity was proposed to reflect selection on mRNA structure (Kafatos et al. 1977). Further molecular studies continued to find diverse impacts of synonymous codon changes on mRNA structure (Sørensen et al. 1989), codon-anticodon binding affinity (Grosjean and Fiers 1982), and translation rate (Pedersen 1984), though codon usage was suggested to have a stronger effect than mRNA structure on translation (Sørensen et al. 1989). Thus, there was extensive discussion of the idea that synonymous changes could influence the transmission of multiple kinds of information encoded in the genome (not only amino acid sequence but also mRNA and protein structure and stability), and that it was important to tease apart their relative influence on translation (Grosjean et al. 1978; Bennetzen and Hall 1982; Sørensen et al. 1989). However, in evolutionary biology, the narrative shifted away from selection on mRNA structure specifically, towards broader selection on translation. For instance, mRNA structure could face selection if it allowed cells to regulate translation (Yamamoto et al. 1985), and codon use itself may also evolve under selection due to its direct effects on translation. More sequencing data supported and expanded the view that biases in the use of particular codons were extensive (Fiers et al. 1976; Berger 1978; Grantham 1978; Grantham et al. 1980). At the same time, comparative analyses uncovered a striking correlation between the abundance of specific codons and the availability of tRNAs with matching anticodons, indicating evolutionary
optimization to ensure rapid and/or accurate translation (Chavancy et al. 1979; Post et al. 1979; Ikemura 1981, 1985). Consistent with the idea of selection driving codon usage bias in proportion to cellular needs, highly expressed genes showed stronger codon preferences (Grantham et al. 1981; Ikemura 1981; Bennetzen and Hall 1982; Gouy and Gautier 1982). Similar correlations across different species suggested that a consistent, common selection pressure acted on genomes to optimize codon use and tRNA pools for translation (Andersson and Kurland 1990). Nonetheless, synonymous codon changes were thought to be largely neutral relative to non-synonymous changes, because the latter could influence both the amino acid sequence as well as mRNA properties (Li et al. 1985). Indeed, early comparative evidence indicated that synonymous substitution rates are constant across genes and genomes, and are higher than rates of non-synonymous substitutions (Miyata et al. 1980), but lower than substitution rates in pseudogenes evolving under minimal functional constraint (Miyata and Hayashida 1981). Since substitution rates should be inversely proportional to the strength of selection, these data indicated that synonymous mutations evolve under substantially weaker selection than non-synonymous mutations. Population genetic analyses also shed light on the problem of whether and how selection would favor specific codons. Extending the neutral theory, Kimura’s model showed that biased codon usage could also evolve under genetic drift (Kimura 1981). Hence, the existence of codon bias alone was insufficient evidence for selection. The key insight here was that the evolutionary fate of mutations is determined both by their fitness effect (selection coefficient) and the effective population size. 
Even if the overall codon usage of highly expressed genes had large fitness consequences, an individual synonymous codon change would likely face only very weak selection, because in most cases it would not impede translation significantly (Sørensen et al. 1989). In large populations (such as for bacteria), such weak selective effects are sufficient for the removal of deleterious mutations; whereas in small populations (such as for mammals), most mutations should behave as if they were neutral. Hence, codon bias would evolve under strong selection only in very large populations, where selection could effectively counter the continuous influx of mutations to suboptimal codons (Bulmer 1991). Thus, the dichotomy of neutral synonymous mutations vs. selected non-synonymous mutations continued to shape evolutionary thinking.
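The interplay between selection coefficient and effective population size described above can be made concrete with the standard diffusion approximation for the fixation probability of a new mutation. The sketch below is illustrative only. It assumes a diploid population in which census and effective size are equal, a mutation starting at frequency 1/(2N), and the textbook formula P = (1 − e^(−2s)) / (1 − e^(−4Ns)), none of which is stated in this chapter. The function name and parameter values are hypothetical.

```python
import math

def fixation_probability(s, n_e):
    """Diffusion approximation for the fixation probability of a single new
    mutation with selection coefficient s in a diploid population of
    effective size n_e (assumes census size equals effective size)."""
    if s == 0:
        return 1.0 / (2 * n_e)          # neutral expectation
    exponent = -4.0 * n_e * s
    if exponent > 700:                   # exp() would overflow: strongly
        return 0.0                       # deleterious, effectively never fixes
    # (1 - e^(-2s)) / (1 - e^(-4Ns)), written with expm1 for accuracy
    return math.expm1(-2.0 * s) / math.expm1(exponent)

weak_cost = -1e-5                        # hypothetical weakly deleterious change
for n_e in (1e4, 1e8):                   # "small" vs. "large" population
    ratio = fixation_probability(weak_cost, n_e) / fixation_probability(0, n_e)
    print(f"Ne = {n_e:.0e}: fixation probability relative to neutral = {ratio:.3g}")
```

Under these assumptions, the same weakly deleterious change fixes almost as often as a neutral one when the population is small, but is essentially never fixed when the population is very large, which is the argument made in the text for bacteria versus mammals.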
2.3.2 Phase II: Weak Translational Selection

Gradually, analysis of the causes and consequences of codon usage bias developed into a subfield of its own. Part of the motivation here was to explain the large variation in codon choice and strength of codon usage bias observed across genes in a given genome, as well as across genomes (species). Early models suggested that cellular tRNA pools must rapidly adapt to amino acid usage in synthesized proteins (Garel 1974); over long evolutionary periods, codon usage and tRNA pools could
thus co-evolve to optimize translation. Important direct support came from the observation that synonymous codons had distinct rates of translation in vivo (Robinson et al. 1984; Varenne et al. 1984; Sørensen et al. 1989; Curran and Yarus 1989; Sørensen and Pedersen 1991), as well as differential accuracy (Precup and Parker 1987; Parker 1989). An important development was the codon adaptation index (CAI), which quantified the observed codon bias in highly expressed reference genes (such as ribosomal proteins) relative to the maximum possible codon bias (Sharp and Li 1987a). The expectation was that stronger selection to match the cellular tRNA pool would lead to stronger codon bias and hence higher values of CAI. This metric was a significant improvement over previous ones, because it readily allowed comparisons across genes as well as genomes, while accounting for confounding factors such as gene length, amino acid composition, and distinct codon tables (Sharp and Li 1987a). As a result, the index allowed identification of genes and genomes that had evolved under stronger translational selection relative to others. The CAI also had practical utility in designing heterologous genes for enhanced expression in different hosts (Zolotukhin et al. 1996; Kim et al. 1997). While the impact of translational selection on highly expressed genes was generally well accepted, there was some debate about why genes with low expression level showed weak codon bias. Two main evolutionary hypotheses were proposed to explain this pattern: (1) rare codons serve to suppress expression of these genes (and therefore evolve under positive selection) (Ames and Hartman 1963; Itano 1963; Grosjean and Fiers 1982; Konigsberg and Godson 1983) and (2) genes with low expression level face weaker selection, allowing rare codons to arise and persist due to a combination of mutations and genetic drift (Sharp and Li 1986a, b). 
Though the first hypothesis was favoured by those trying to understand gene regulation, translation elongation rates did not seem to be dramatically reduced by rare codons, due to their low frequency and overall matching between codon usage and the tRNA pool (Holm 1986). Therefore, rare codons would have to occur in large numbers for translation elongation to be significantly altered. Furthermore, genes with low expression level are not enriched in rare codons (Sharp and Li 1986b), even at the beginning of the gene where codon bias is typically weaker (Eyre-Walker and Bulmer 1993). Therefore, while codon bias generally seemed to evolve under selection for smooth translation, rare codons did not appear to evolve under positive selection. Thus, differential strength of selection across genes was better supported as a general explanation for the positive correlation between the strength of codon usage bias and gene expression (Bulmer 1991). Through the 1980s and 1990s, synonymous mutations continued to be used as exemplars of neutrality (especially at four-fold degenerate sites) as a baseline against which non-synonymous substitutions could be compared to identify genomic regions under selection. Initially this involved simple comparisons between synonymous and non-synonymous substitution rates (Miyata and Yasunaga 1980). As evidence for synonymous rate variation across genes (Li et al. 1985) and correlations with codon use (Sharp and Li 1987b) emerged, these were incorporated to account for variation in important parameters such as unequal rates of codon and nucleotide changes (Li et al. 1985; Nei and Gojobori 1986). These methods were used in
interesting ways. For instance, an analysis of enterovirus isolates associated with a conjunctivitis outbreak found identical substitutions in independently evolving lineages at both nonsynonymous and synonymous sites, suggesting that both were similarly constrained by selection (Takeda et al. 1994). Eventually, acknowledging local sequence variation, newer phylogenetic methods included the ability to estimate key evolutionary parameters from sequence data in a maximum likelihood framework (Goldman and Yang 1994; Yang and Nielsen 2000). Nonetheless, tests for signatures of selection included comparing substitution and polymorphism rates at non-synonymous sites (putatively selected) vs. synonymous sites (neutral, evolving under drift) (McDonald and Kreitman 1991). In 1987, the first experimental report of the impact of rare codons on cellular phenotype was published, showing that increasing the proportion of rare codons towards the 5′ end of a highly expressed yeast gene reduced mRNA levels by up to three-fold, and protein levels up to ten-fold (Hoekema et al. 1987). Though a subsequent study found variable impacts of altering or adding rare codons at different positions in a focal gene (Chen and Inouye 1990), it was clear that synonymous changes could have substantial effects on gene and protein expression. A few years later, the results of Hoekema et al.’s yeast study were corroborated in flies. Increasing the number of rare codons in the adh gene of Drosophila melanogaster larvae successively reduced production of the highly expressed enzyme alcohol dehydrogenase (Carlini and Stephan 2003). Importantly, the reduction in enzyme levels reduced larval tolerance for ethanol by ~2% (Carlini 2004). Since the species encounters high levels of ethanol in its environment, these results demonstrated the potential for selection to directly act on local synonymous codon changes. 
Direct evidence for the global effects of mismatched codon usage and tRNA pools would come much later: replacing a frequently used codon with a rare codon in multiple highly expressed Escherichia coli genes increased mistranslation and reduced growth rate, effects that were rescued by overexpression of the cognate (rare) tRNA (Frumkin et al. 2018). Meanwhile, indirect evidence for the impact of codon usage on translation as well as on protein and sequence evolution continued to accumulate. In Drosophila, evolutionarily conserved amino acids showed stronger codon bias compared to sites that were more labile (Akashi 1994). Selection can also be inferred by comparing patterns of variation in closely related species: greater divergence (fixed substitutions) between species relative to within-species polymorphism indicates that divergence occurred due to selection. Indeed, across closely related species, compared to rare codons, preferred codons showed significantly more divergence relative to polymorphism, suggesting that selection on synonymous changes was prevalent (Akashi 1995). Importantly, mutational biases could not explain the observed patterns of codon usage bias within and across species (Akashi 1994; Duret and Mouchiroud 1999; Eyre-Walker 1999); though in the human genome, mutational biases appeared to better explain codon usage patterns (Urrutia and Hurst 2001). Incorporating mutation-selection-drift parameters into phylogenetic analyses of selection, McVean and Vieira also demonstrated species-specific patterns of
selection at synonymous sites (McVean and Vieira 2001), clarifying conditions under which codon preferences could diverge across species. Subsequent analyses of larger datasets also supported the idea of co-evolved tRNA pools and codon usage bias driven by translational selection for highly expressed genes (Duret and Mouchiroud 1999; Kanaya et al. 1999; Supek 2015), especially in fast-growing species (Rocha 2004; Higgs and Ran 2008; Vieira-Silva and Rocha 2010; Ran and Higgs 2010). Mathematical models and comparative analyses further suggested that different combinations of codon use, tRNA pools, and associated systems (such as tRNA modifying enzymes, which modulate wobble pairing) could evolve under selection (Higgs and Ran 2008; Novoa and Pouplana 2012; Diwan and Agashe 2018). Changes in codon preference could thus be facilitated by periods of genetic drift (Shields 1990), though such weakening of selection is not strictly necessary (Hershberg and Petrov 2009; Sun et al. 2017). Thus, the same global selection pressure could explain diverse patterns of variation in the translation machinery. A broad comparative analysis across model organisms suggested that both synonymous and missense mutations evolve under strong selection to avoid mistranslation, reinforcing the idea of a common selective pressure acting on multiple aspects of protein evolution (Drummond and Wilke 2008). Interestingly, experimental evolution studies with microbes found repeated instances of synonymous mutations rising to high frequency (as outlined in Fig. 2.1e) (Bull et al. 1997, 1998; Holder and Bull 2001; Cuevas et al. 2002; Novella et al. 2004). Although this was rarely discussed seriously, the parallelism suggested that synonymous mutations could impact adaptive evolution. However, direct evidence for the broader evolutionary role of synonymous mutations was still missing.
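The codon adaptation index introduced earlier in this phase (Sharp and Li 1987a) can be sketched in a few lines. The version below is a simplified illustration rather than a reference implementation. It assumes the standard genetic code, excludes Met, Trp, and stop codons, and substitutes a pseudocount of 0.5 for codons absent from the reference set. The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any remainder."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight of each codon = its count in the highly expressed reference
    genes divided by the count of the most used synonym for that amino acid."""
    counts = Counter(c for seq in reference_seqs for c in split_codons(seq))
    weights = {}
    for aa in set(AMINO) - set("*MW"):        # skip stops and single-codon families
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Unobserved codons get the pseudocount (an assumption of this sketch).
        fam_counts = {c: counts.get(c, 0) or pseudocount for c in family}
        top = max(fam_counts.values())
        for codon, k in fam_counts.items():
            weights[codon] = k / top
    return weights

def cai(seq, weights):
    """Geometric mean of codon weights over all scorable codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no scorable codons")
    return math.exp(sum(logs) / len(logs))

# Toy reference set: Ala is mostly GCT, Lys is mostly AAA.
reference = ["GCTGCTGCTGCCAAAAAAAAG"]
weights = relative_adaptiveness(reference)
optimal = cai("GCTAAA", weights)
suboptimal = cai("GCCAAG", weights)
print(optimal, suboptimal)
```

A gene built only from the preferred codons of the reference set scores 1, and each less-used synonym pulls the geometric mean down, which is why the index can be compared across genes of different length and amino acid composition.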
2.3.3 Phase III: Pervasive (Sometimes Strong) Selection, Diverse Mechanisms

The period from 2005 onwards would prove to be pivotal, as the number of studies exploded and evidence for major phenotypic and fitness impacts of synonymous changes began to accumulate in various contexts. For instance, selection at synonymous sites became clearer in groups such as mammals (Chamary et al. 2006) and specifically in humans (Comeron 2006), where earlier evidence was weaker. In fact, many synonymous mutations were now shown to be strongly associated with human diseases (Cartegni et al. 2002; Sauna and Kimchi-Sarfaty 2011; Supek et al. 2014; Hunt et al. 2014; Sharma et al. 2019). Hence, a general consensus began to form that across taxa, synonymous changes evolved under significant selection, with multiple underlying mechanisms (Hershberg and Petrov 2008; Sharp et al. 2010; Plotkin and Kudla 2010; Novoa and Pouplana 2012). Even so, much of this discussion was focused on selection on codon bias and other direct impacts on translation. Gradually, as analyses began to include synonymous mutations outside of the codon context, the broader evolutionary implications of synonymous mutations emerged. For
example, mutagenesis of reporter genes expressed in E. coli showed large impacts of synonymous mutations on gene expression (as outlined in Fig. 2.1f), with the effects better explained by mRNA structure rather than codon bias per se (Kudla et al. 2009; Goodman et al. 2013). Many studies demonstrated that the phenotypic impacts of synonymous changes also translate into fitness consequences, via mechanisms that are not always directly related to translation. For instance, in vivo protein folding and function were altered by synonymous SNPs implicated in genetic association studies of drug-resistant cancers (Kimchi-Sarfaty et al. 2007). Beneficial as well as deleterious synonymous mutations were reported in multiple studies, often with microbes (Kudla et al. 2009; Lind et al. 2010a; Cuevas et al. 2012; Agashe et al. 2013; Bailey et al. 2014; Kershner et al. 2016; Brandis and Hughes 2016; Agashe et al. 2016; Canale et al. 2018; Mochizuki et al. 2018; Zwart et al. 2018; Lebeuf-Taylor et al. 2019). Importantly, these reports include many instances of strong fitness consequences (i.e. high s values) of single synonymous mutations (Lind et al. 2010a; Firnberg et al. 2014; Agashe et al. 2016; Lind et al. 2017; Kristofich et al. 2018; Sane et al. 2020). These large-effect synonymous mutations should certainly be visible to natural selection, even in relatively small populations. A handful of experimental evolution studies with bacteria now provide direct evidence that synonymous mutations can indeed evolve under selection (Bailey et al. 2014; Kershner et al. 2016; Knöppel et al. 2016; Agashe et al. 2016; Kristofich et al. 2018). In these studies, evolved synonymous substitutions were observed repeatedly, and shown to be highly beneficial when engineered into the ancestral genomic background (as outlined in Fig. 2.1h). 
For instance, Methylobacterium strains carrying recoded versions of a highly expressed gene initially suffered a large decrease in fitness relative to the wild type ancestor (Agashe et al. 2013). In fact, an allele carrying only preferred codons was nearly lethal because of the severe reduction in enzyme production. Subsequently, all nine independently evolved replicate populations of this strain rapidly fixed the same synonymous substitution, which elevated fitness to near-wild type levels (Agashe et al. 2016). Similarly, strains carrying other recoded alleles also adapted via synonymous beneficial mutations. Recent studies further demonstrate other characteristics of synonymous mutations that are consistent with observations for non-synonymous mutations. For instance, evolution of phage with synonymously recoded genes demonstrated patterns of epistasis that mimic those observed for non-synonymous mutations (Leuven et al. 2020). Evolutionary adaptation to deleterious synonymous mutations also occurs rapidly via similar mechanisms observed for deleterious nonsynonymous mutations, including regulatory or compensatory mutations in cis or trans (Lind et al. 2010b; Amoros-Moya et al. 2010; Bull et al. 2012; Lind and Andersson 2013; Knöppel et al. 2016; Agashe et al. 2016; Mittal et al. 2018; Mochizuki et al. 2018; Knöppel et al. 2020). Hence, evolutionary adaptation following deleterious synonymous changes does not appear to be special. Another important line of evidence for the evolutionary impacts of synonymous changes comes from attempts to engineer attenuated viruses that have reduced chances of reversion to virulence, and are hence suitable for vaccine production.
Many (though not all) viruses use codons that are favored by their host (Lucks et al. 2008; Carbone 2008; Chithambaram et al. 2014), indicating strong selection on viral genomes for optimizing translation. Due to the expectation of combined large fitness effects but individually weak impacts of synonymous codon changes, it was proposed that introducing hundreds of suboptimal synonymous codons in a viral genome would be an effective attenuation strategy. Indeed, experimental studies showed that synonymous mutations reduced viral activity in human cell culture (Burns et al. 2006) and mouse models (Coleman et al. 2008). However, subsequent work also showed that evolutionary changes and reversions are possible and not rare (Bull et al. 2012; Leuven et al. 2020; Nouën et al. 2021). Hence, understanding the mechanisms through which codon-attenuated viruses recover fitness is important to predict the outcomes of codon deoptimization as an attenuation strategy (Leuven et al. 2020). A broader evolutionary implication is that interactions between organisms can be influenced by synonymous mutations, and this aspect deserves more attention. Together, recent experimental evidence clearly shows that specific synonymous mutations may evolve under strong selection. But are such mutations with large impacts frequent enough to have significant evolutionary consequences? Important clues come from empirically estimated statistical distributions of fitness effects (DFEs) of synonymous mutations (see Fig. 2.1g). The first such analysis was conducted for two ribosomal protein genes of Salmonella typhimurium, finding that the DFEs for synonymous and non-synonymous mutations were remarkably similar (Lind et al. 2010a). An analysis of all possible codon substitutions at specific positions in the Hsp90 protein of Saccharomyces cerevisiae also found that most codon changes had very small impacts on fitness, but a few had very large effects (Fragata et al. 2018). 
The resulting topography of the fitness landscape – which describes fitness peaks and valleys traversed by evolving populations – suggests that synonymous mutations create new fitness peaks, and can therefore alter the course of adaptation. A recent meta-analysis of other DFE studies also shows that large-effect synonymous mutations are not rare (Bailey et al. 2021). There are some notable differences in the DFEs of synonymous and nonsynonymous mutations: the former is unimodal compared to the typically bimodal DFEs observed for nonsynonymous mutations; and overall, synonymous mutations have weaker fitness effects. However, the beneficial part of the DFE – which is most relevant for adaptation – is similar for both classes of mutations, suggesting that they would evolve similarly under positive selection. These patterns for single gene DFEs are also corroborated by genome-wide DFEs described for E. coli (Sane et al. 2020). Recent bioinformatics analyses corroborate these results, showing large variability in the strength of selection acting on synonymous mutations, with a substantial fraction evolving under strong selection. For instance, in some tracts in mammalian genomes, synonymous substitutions are extremely depleted compared to nonsynonymous variation, indicating that the former face strong purifying selection (Schattner and Diekhans 2006). In Drosophila, over 20% of four-fold synonymous sites show signatures of evolution under strong purifying selection, and very few sites show signatures consistent with weak selection (Lawrie et al. 2013). This
pattern is especially stark when compared to neighboring intronic regions (Machado et al. 2020). Similarly, in human coding sequences, synonymous mutations at 20–30% of exonic splice regulatory elements show signatures of strong purifying selection (Savisaar and Hurst 2018). Note that these analyses of selection strength lump codon changes across various contexts, and therefore give an overall picture of selection on synonymous mutations regardless of the underlying mechanisms.
2.4 Summary and Future Directions

The overview above gives a glimpse of gradually shifting narratives in a field grappling with some of the most fundamental questions in biology. The two parallel lines of inquiry focusing on translational and non-translational impacts of synonymous mutations have effectively merged in the past few years, allowing a cohesive view of the general fitness impacts of synonymous mutations. Multiple initially puzzling patterns of synonymous codon use are now clarified. For instance, variation in codon choice and strength of codon bias across genomes is largely explained by divergent mutational biases (which may themselves evolve under selection). In prokaryotes, variable strength of selection for translation speed also plays a role. Variation in the magnitude of codon bias across genes in the same genome arises from differing strength of selection, approximated by gene expression level. Importantly, it is clear that codon usage is only one of many factors that affect gene regulation in cells. Finally, variation in codon use across different regions of a gene is governed by many molecular mechanisms that underlie information processing and transfer within and between cells. At this scale – of local DNA or mRNA sequence – the translational impacts of codon use per se are less important, and the impact of synonymous mutations can be generalized. Below, I highlight ways in which this new understanding of synonymous changes can be leveraged to address outstanding questions in evolutionary biology.
2.4.1 Building a Cohesive Evolutionary Framework

Much like non-synonymous mutations, synonymous mutations appear to evolve under widely variable strengths of selection, depending on the specific context in which they occur. As an illustration, consider that a single synonymous mutation in a functionally important gene may be deleterious because it generates a very rare codon that cannot be efficiently translated. However, given cellular buffering mechanisms (e.g. tRNA modifying enzymes, which may allow other tRNAs to decode the rare codon), the fitness impact of this single rare codon may be very weak. In another position, the same synonymous mutation might be deleterious because it generates or destroys a specific regulatory motif encoded as a secondary layer of information in the mRNA sequence. As a result, it may have a larger fitness impact,
depending on the local sequence context. Indeed, in E. coli, identical codon changes in adjacent positions in the same gene had opposite fitness effects (Hauber et al. 2016). Conceptually, such context-specific fitness consequences are already well established and rigorously analyzed for non-synonymous mutations; hence, the separation between synonymous and non-synonymous mutational classes may be quantitative rather than qualitative. We therefore need to build a single inclusive framework for evolutionary analysis that incorporates the context-specific effects of all mutations, encompassing diverse underlying molecular mechanisms.
2.4.2 Predicting the Evolutionary Fate of Synonymous Mutations

As noted recently (Canale et al. 2018; Bailey et al. 2021), there is a surprising mismatch between the short- and long-term evolutionary effects of synonymous mutations, which makes it difficult to predict their evolutionary fate. While many synonymous mutations have very large immediate fitness benefits, there are relatively few cases of beneficial synonymous mutations observed in experimental evolution studies and natural populations. A simple explanation for this pattern is that there are fewer possible synonymous (compared to nonsynonymous) mutations. In addition, in microbial populations, clonal interference (i.e. competition between independent lineages carrying distinct beneficial mutations) can reduce the probability of fixation of beneficial mutations (Gerrish and Lenski 1998). Population genetic simulations show that this effect is stronger for synonymous mutations, because they have slightly weaker fitness effects and are therefore more likely to be outcompeted by non-synonymous beneficial mutations (Bailey et al. 2021). Thus, a combination of sampling bias and weaker fitness effects may limit the evolutionary impacts of synonymous mutations. We now need more extensive theoretical and experimental analyses of the direct and indirect fitness effects of synonymous mutations. The resulting understanding would be important not only in evolutionary biology, but also for practical applications of evolutionary predictions in the context of codon-attenuated viral vaccines.
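The logic of clonal interference can be illustrated with a deliberately simple deterministic model of competing asexual lineages. This is a toy sketch, not a reconstruction of the simulations in Bailey et al. (2021): it ignores genetic drift and new mutations, and the starting frequencies, selection coefficients, and function name are hypothetical.

```python
def compete(freqs, coeffs, generations):
    """Deterministic haploid selection among asexual lineages.
    Each generation, lineage i is reweighted by (1 + s_i) and the
    frequencies are renormalized. Returns the full frequency history."""
    history = [list(freqs)]
    for _ in range(generations):
        weighted = [x * (1.0 + s) for x, s in zip(history[-1], coeffs)]
        total = sum(weighted)
        history.append([w / total for w in weighted])
    return history

# Lineages: ancestor, a beneficial synonymous mutant (s = 0.05),
# and a slightly fitter non-synonymous mutant (s = 0.08).
start = [0.998, 0.001, 0.001]
traj = compete(start, [0.0, 0.05, 0.08], generations=600)

syn = [gen[1] for gen in traj]
peak_generation = max(range(len(syn)), key=syn.__getitem__)
print(f"synonymous lineage peaks at generation {peak_generation} "
      f"with frequency {max(syn):.3f}, then declines to {syn[-1]:.2e}")
```

In this toy setting the synonymous lineage rises at first because it outcompetes the ancestor, but it is then displaced by the fitter lineage and never approaches fixation, even though it is clearly beneficial on its own.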
2.4.3 Developing New Methods to Identify Signatures of Selection
As discussed earlier, most of our current methods to identify genomic signatures of selection rely on comparisons with “neutral” sites, typically exemplified by synonymous sites. However, a growing body of evidence suggests that accounting for selection on synonymous mutations is necessary to obtain accurate estimates of selection. For instance, Matsumoto and colleagues used simulations to show that weak
selection on codon bias can inflate the inferred fraction of adaptive amino acid substitutions (Matsumoto et al. 2016). More generally, commonly used methods to identify sites under selection in focal taxa assume invariant rates of evolution at synonymous sites. However, selection on synonymous sites or local mutation biases can introduce substantial variation in synonymous substitution rates across genes as well as genomes (Rubinstein et al. 2011; Dimitrieva and Anisimova 2014). Assuming constant synonymous substitution rates causes substantial inflation of the number of sites inferred to be under selection and leads to erroneous evolutionary conclusions (Rubinstein et al. 2011; Davydov et al. 2019; Wisotsky et al. 2020). Therefore, we need new methods to identify selection that are independent of the oft-violated assumption of weak selection at synonymous sites. Pseudogenes or short introns may be better suited as neutral benchmarks, though their identification and use involve additional challenges that need to be resolved.
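A small numerical sketch may help show why the choice of neutral benchmark matters. It assumes the familiar ratio of non-synonymous to synonymous substitution rates (dN/dS) as the selection statistic. All rate values and variable names below are hypothetical and are not taken from any of the cited studies.

```python
def omega(d_n, d_reference):
    """Ratio of the non-synonymous rate to a reference (putatively neutral) rate."""
    if d_reference <= 0:
        raise ValueError("reference rate must be positive")
    return d_n / d_reference


# Hypothetical per-site substitution rates, chosen only for illustration.
d_n = 0.020           # non-synonymous sites
d_s_observed = 0.015  # synonymous sites, depressed by purifying selection on codon usage
d_neutral = 0.050     # an assumed neutral benchmark, e.g. short introns or pseudogenes

naive = omega(d_n, d_s_observed)     # treats synonymous sites as neutral
benchmarked = omega(d_n, d_neutral)  # uses the independent benchmark instead

print(f"dN/dS with synonymous sites as the reference: {naive:.2f}")
print(f"dN relative to the neutral benchmark:         {benchmarked:.2f}")
```

With these assumed numbers the naive ratio exceeds 1 and would be read as positive selection on the protein, whereas the same non-synonymous rate compared with the neutral benchmark falls well below 1. The example only illustrates the direction of the bias; real analyses rely on codon models fitted to sequence alignments.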
2.4.4 Determining the Evolutionary History of Mechanisms Driving Selection at Synonymous Sites
Over the years, the focus on identifying mechanisms underlying the consequences of synonymous mutations has intensified (Hunt et al. 2014). It is now clear that the functional impacts of various kinds of mutations arise via shared mechanisms such as changes in nucleic acid and protein structure, regulation, translation rate and accuracy, and cellular targeting. While understanding mechanisms is undoubtedly important, this does not help us understand the selective pressures that drive evolution. For instance, demonstrating that altered codon bias changes a particular phenotypic outcome (such as intracellular localization) does not mean that codon bias evolved under selection to optimize localization. Instead, across clades, different mechanisms may have driven selection on synonymous changes at various points during the evolutionary history of life. Hence, we must trace how and when different layers of information were encoded by the primary nucleotide sequence. Although it is challenging to decipher these (presumably) deep and rare evolutionary events, such analyses promise new insights into the evolution of the genetic code in conjunction with the intricate mechanisms of information processing.

Acknowledgements
I thank Saurabh Mahajan, Pratyush MR, Parth Raval, and Mrudula Sane for discussion, and acknowledge funding and support from the National Centre for Biological Sciences (NCBS-TIFR) and the Department of Atomic Energy, Government of India (Project Identification No. RTI 4006).
D. Agashe
References
Adams JM, Jeppesen PGN, Sanger F, Barrell BG (1969) Nucleotide sequence from the coat protein cistron of R17 bacteriophage RNA. Nature 223:1009–1014. https://doi.org/10.1038/2231009a0
Agashe D, Martinez-Gomez NC, Drummond DA, Marx CJ (2013) Good codons, bad transcript: large reductions in gene expression and fitness arising from synonymous mutations in a key enzyme. Mol Biol Evol 30:549–560. https://doi.org/10.1093/molbev/mss273
Agashe D, Sane M, Phalnikar K, Diwan GD, Habibullah A, Martinez-Gomez NC, Sahasrabuddhe V, Polachek W, Wang J, Chubiz LM, Marx CJ (2016) Large-effect beneficial synonymous mutations mediate rapid and parallel adaptation in a bacterium. Mol Biol Evol 33:1542–1553. https://doi.org/10.1093/molbev/msw035
Akashi H (1994) Synonymous codon usage in Drosophila melanogaster – natural selection and translational accuracy. Genetics 136:927–935
Akashi H (1995) Inferring weak selection from patterns of polymorphism and divergence at “silent” sites in Drosophila DNA. Genetics 139:1067–1076
Ames BN, Hartman PE (1963) The histidine operon. Cold Spring Harb Sym 28:349–356. https://doi.org/10.1101/sqb.1963.028.01.049
Amoros-Moya D, Bedhomme S, Hermann M, Bravo IG (2010) Evolution in regulatory regions rapidly compensates the cost of nonoptimal codon usage. Mol Biol Evol 27:2141–2151. https://doi.org/10.1093/molbev/msq103
Andersson SGE, Kurland CG (1990) Codon preferences in free-living microorganisms. Microbiol Rev 54:198–210
Bailey SF, Hinz A, Kassen R (2014) Adaptive synonymous mutations in an experimentally evolved Pseudomonas fluorescens population. Nat Commun 5:1–7. https://doi.org/10.1038/ncomms5076
Bailey SF, Morales LAA, Kassen R (2021) Effects of synonymous mutations beyond codon bias: the evidence for adaptive synonymous substitutions from microbial evolution experiments. Genome Biol Evol. https://doi.org/10.1093/gbe/evab141
Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257:3026–3031
Berger EM (1978) Pattern and chance in the use of the genetic code. J Mol Evol 10:319–323. https://doi.org/10.1007/bf01734221
Brandis G, Hughes D (2016) The selective advantage of synonymous codon usage bias in Salmonella. PLoS Genet 12:e1005926. https://doi.org/10.1371/journal.pgen.1005926
Bull JJ, Badgett MR, Wichman HA, Huelsenbeck JP, Hillis DM, Gulati A, Ho C, Molineux IJ (1997) Exceptional convergent evolution in a virus. Genetics 147:1497–1507. https://doi.org/10.1093/genetics/147.4.1497
Bull JJ, Jacobson A, Badgett MR, Molineux IJ (1998) Viral escape from antisense RNA. Mol Microbiol 28:835–846. https://doi.org/10.1046/j.1365-2958.1998.00847.x
Bull JJ, Molineux IJ, Wilke CO (2012) Slow fitness recovery in a codon-modified viral genome. Mol Biol Evol 29:2997–3004. https://doi.org/10.1093/molbev/mss119
Bulmer M (1991) The selection-mutation-drift theory of synonymous codon usage. Genetics 129:897
Burns CC, Shaw J, Campagnoli R, Jorba J, Vincent A, Quay J, Kew O (2006) Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region. J Virol 80:3259–3272. https://doi.org/10.1128/jvi.80.7.3259-3272.2006
Callens M, Pradier L, Finnegan M, Rose C, Bedhomme S (2021) Read between the lines: diversity of non-translational selection pressures on local codon usage. Genome Biol Evol 13:evab097. https://doi.org/10.1093/gbe/evab097
Canale AS, Venev SV, Whitfield TW, Caffrey DR, Marasco WA, Schiffer CA, Kowalik TF, Jensen JD, Finberg RW, Zeldovich KB, Wang JP, Bolon DNA (2018) Synonymous mutations at the beginning of the influenza A virus hemagglutinin gene impact experimental fitness. J Mol Biol 430:1098–1115. https://doi.org/10.1016/j.jmb.2018.02.009
Carbone A (2008) Codon bias is a major factor explaining phage evolution in translationally biased hosts. J Mol Evol 66:210–223. https://doi.org/10.1007/s00239-008-9068-6
Carlini DB (2004) Experimental reduction of codon bias in the Drosophila alcohol dehydrogenase gene results in decreased ethanol tolerance of adult flies. J Evol Biol 17:779–785. https://doi.org/10.1111/j.1420-9101.2004.00725.x
Carlini DB, Stephan W (2003) In vivo introduction of unpreferred synonymous codons into the Drosophila Adh gene results in reduced levels of ADH protein. Genetics 163:239–243. https://doi.org/10.1093/genetics/163.1.239
Cartegni L, Chew SL, Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298. https://doi.org/10.1038/nrg775
Chamary JV, Parmley JL, Hurst LD (2006) Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 7:98–108. https://doi.org/10.1038/nrg1770
Chavancy G, Chevallier A, Fournier A, Garel J-P (1979) Adaptation of iso-tRNA concentration to mRNA codon frequency in the eukaryote cell. Biochimie 61:71–78. https://doi.org/10.1016/s0300-9084(79)80314-4
Chen G-FT, Inouye M (1990) Suppression of the negative effect of minor arginine codons on gene expression; preferential usage of minor codons within the first 25 codons of the Escherichia coli genes. Nucleic Acids Res 18:1465–1473. https://doi.org/10.1093/nar/18.6.1465
Chithambaram S, Prabhakaran R, Xia X (2014) Differential codon adaptation between dsDNA and ssDNA phages in Escherichia coli. Mol Biol Evol:msu087. https://doi.org/10.1093/molbev/msu087
Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S (2008) Virus attenuation by genome-scale changes in codon pair bias. Science 320:1784–1787. https://doi.org/10.1126/science.1155761
Comeron JM (2006) Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci U S A 103:6940–6945
Crick FHC (1967) Origin of the genetic code. Nature 213:119–119. https://doi.org/10.1038/213119d0
Crick FHC, Barnett L, Brenner S, Watts-Tobin RJ (1961) General nature of the genetic code for proteins. Nature 192:1227–1232. https://doi.org/10.1038/1921227a0
Cuevas JM, Elena SF, Moya A (2002) Molecular basis of adaptive convergence in experimental populations of RNA viruses. Genetics 162:533–542
Cuevas JM, Domingo-Calap P, Sanjuán R (2012) The fitness effects of synonymous mutations in DNA and RNA viruses. Mol Biol Evol 29:17–20. https://doi.org/10.1093/molbev/msr179
Curran JF, Yarus M (1989) Rates of aminoacyl-tRNA selection at 29 sense codons in vivo. J Mol Biol 209:65–77
Davydov II, Salamin N, Robinson-Rechavi M (2019) Large-scale comparative analysis of codon models accounting for protein and nucleotide selection. Mol Biol Evol 36:msz048. https://doi.org/10.1093/molbev/msz048
Dimitrieva S, Anisimova M (2014) Unraveling patterns of site-to-site synonymous rates variation and associated gene properties of protein domains and families. PLoS One 9:e95034. https://doi.org/10.1371/journal.pone.0095034
Diwan GD, Agashe D (2018) Wobbling forth and drifting back: the evolutionary history and impact of bacterial tRNA modifications. Mol Biol Evol 35:2046–2059. https://doi.org/10.1093/molbev/msy110
Drummond DA, Wilke CO (2008) Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell 134:341–352. https://doi.org/10.1016/j.cell.2008.05.042
Duret L, Mouchiroud D (1999) Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci U S A 96:4482–4487. https://doi.org/10.1073/pnas.96.8.4482
Eyre-Walker A (1999) Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics 152:675–683. https://doi.org/10.1093/genetics/152.2.675
Eyre-Walker A, Bulmer M (1993) Reduced synonymous substitution rate at the start of Enterobacterial genes. Nucleic Acids Res 21:4599–4603
Fiers W, Contreras R, Duerinck F, Haegeman G, Iserentant D, Merregaert J, Jou WM, Molemans F, Raeymaekers A, den Berghe AV, Volckaert G, Ysebaert M (1976) Complete nucleotide sequence of bacteriophage MS2 RNA: primary and secondary structure of the replicase gene. Nature 260:500–507. https://doi.org/10.1038/260500a0
Firnberg E, Labonte JW, Gray JJ, Ostermeier M (2014) A comprehensive, high-resolution map of a gene’s fitness landscape. Mol Biol Evol 31:1581–1592. https://doi.org/10.1093/molbev/msu081
Fragata I, Matuszewski S, Schmitz MA, Bataillon T, Jensen JD, Bank C (2018) The fitness landscape of the codon space across environments. Heredity 121:422–437. https://doi.org/10.1038/s41437-018-0125-7
Frumkin I, Lajoie MJ, Gregg CJ, Hornung G, Church GM, Pilpel Y (2018) Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci U S A 115:E4940–E4949. https://doi.org/10.1073/pnas.1719375115
Gardin J, Yeasmin R, Yurovsky A, Cai Y, Skiena S, Futcher B (2014) Measurement of average decoding rates of the 61 sense codons in vivo. eLife 3. https://doi.org/10.7554/elife.03735
Garel J-P (1974) Functional adaptation of tRNA population. J Theor Biol 43:211–225. https://doi.org/10.1016/s0022-5193(74)80054-8
Gerrish PJ, Lenski RE (1998) The fate of competing beneficial mutations in an asexual population. Genetica 102–103:127. https://doi.org/10.1023/a:1017067816551
Goldman N, Yang Z (1994) A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol Biol Evol. https://doi.org/10.1093/oxfordjournals.molbev.a040153
Goodman DB, Church GM, Kosuri S (2013) Causes and effects of N-terminal codon bias in bacterial genes. Science 342:475–479. https://doi.org/10.1126/science.1241934
Gouy M, Gautier C (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res 10:7055–7074. https://doi.org/10.1093/nar/10.22.7055
Grantham R (1978) Viral, prokaryote and eukaryote genes contrasted by mRNA sequence indexes. FEBS Lett 95:1–11. https://doi.org/10.1016/0014-5793(78)80041-6
Grantham R, Gautier C, Gouy M, Mercier R, Pavé A (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res 8:197–197. https://doi.org/10.1093/nar/8.1.197-c
Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R (1981) Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 9:43–74
Grosjean H, Fiers W (1982) Preferential codon usage in prokaryotic genes: the optimal codon-anticodon interaction energy and the selective codon usage in efficiently expressed genes. Gene 18:199–209. https://doi.org/10.1016/0378-1119(82)90157-3
Grosjean H, Sankoff D, Jou WM, Fiers W, Cedergren RJ (1978) Bacteriophage MS2 RNA: a correlation between the stability of the codon: anticodon interaction and the choice of code words. J Mol Evol 12:113–119. https://doi.org/10.1007/bf01733262
Hasegawa M, Yasunaga T, Miyata T (1979) Secondary structure of MS2 phage RNA and bias in code word usage. Nucleic Acids Res 7:2073–2079. https://doi.org/10.1093/nar/7.7.2073
Hauber DJ, Grogan DW, DeBry RW (2016) Mutations to less-preferred synonymous codons in a highly expressed gene of Escherichia coli: fitness and epistatic interactions. PLoS One 11:e0146375. https://doi.org/10.1371/journal.pone.0146375
Hershberg R, Petrov DA (2008) Selection on codon bias. Annu Rev Genet 42:287–299. https://doi.org/10.1146/annurev.genet.42.110807.091442
Hershberg R, Petrov DA (2009) General rules for optimal codon choice. PLoS Genet 5:e1000556. https://doi.org/10.1371/journal.pgen.1000556
Higgs PG, Ran W (2008) Coevolution of codon usage and tRNA genes leads to alternative stable states of biased codon usage. Mol Biol Evol 25:2279–2291. https://doi.org/10.1093/molbev/msn173
Hoekema A, Kastelein RA, Vasser M, de Boer HA (1987) Codon replacement in the PGK1 gene of Saccharomyces cerevisiae: experimental approach to study the role of biased codon usage in gene expression. Mol Cell Biol 7:2914–2924. https://doi.org/10.1128/mcb.7.8.2914
Holder KK, Bull JJ (2001) Profiles of adaptation in two similar viruses. Genetics 159:1393–1404. https://doi.org/10.1093/genetics/159.4.1393
Holm L (1986) Codon usage and gene expression. Nucleic Acids Res 14:3075–3087. https://doi.org/10.1093/nar/14.7.3075
Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C (2014) Exposing synonymous mutations. Trends Genet 30:308–321. https://doi.org/10.1016/j.tig.2014.04.006
Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151:389–409
Ikemura T (1985) Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 2:13–34
Itano HA (1963) The synthesis and structure of normal and abnormal hemoglobins (Ibadan, Nigeria). In: Symposium on abnormal haemoglobins. Blackwell, Oxford, Ibadan
Jou WM, Haegeman G, Fiers W (1971) Studies on the bacteriophage MS2. Nucleotide fragments from the coat protein cistron. FEBS Lett 13:105–109. https://doi.org/10.1016/0014-5793(71)80210-7
Kafatos FC, Efstratiadis A, Forget BG, Weissman SM (1977) Molecular evolution of human and rabbit beta-globin mRNAs. Proc Natl Acad Sci U S A 74:5618–5622. https://doi.org/10.1073/pnas.74.12.5618
Kanaya S, Yamada Y, Kudo Y, Ikemura T (1999) Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 238:143–155. https://doi.org/10.1016/s0378-1119(99)00225-5
Kershner JP, McLoughlin SY, Kim J, Morgenthaler A, Ebmeier CC, Old WM, Copley SD (2016) A synonymous mutation upstream of the gene encoding a weak-link enzyme causes an ultrasensitive response in growth rate. J Bacteriol 198:2853–2863. https://doi.org/10.1128/jb.00262-16
Kim CH, Oh Y, Lee TH (1997) Codon optimization for high-level expression of human erythropoietin (EPO) in mammalian cells. Gene 199:293–301
Kimchi-Sarfaty C, Oh JM, Kim I-W, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528. https://doi.org/10.1126/science.1135308
Kimura M (1968) Evolutionary rate at the molecular level. Nature 217:624–626. https://doi.org/10.1038/217624a0
Kimura M (1977) Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276. https://doi.org/10.1038/267275a0
Kimura M (1981) Possibility of extensive neutral evolution under stabilizing selection with special reference to nonrandom usage of synonymous codons. Proc Natl Acad Sci U S A 78:5773–5777
King JL, Jukes TH (1969) Non-Darwinian evolution. Science 164:788–798
Knöppel A, Näsvall J, Andersson DI (2016) Compensating the fitness costs of synonymous mutations. Mol Biol Evol 33:1461–1477. https://doi.org/10.1093/molbev/msw028
Knöppel A, Andersson DI, Näsvall J (2020) Synonymous mutations in rpsT lead to ribosomal assembly defects that can be compensated by mutations in fis and rpoA. Front Microbiol 11:340. https://doi.org/10.3389/fmicb.2020.00340
Konigsberg W, Godson GN (1983) Evidence for use of rare codons in the dnaG gene and other regulatory genes of Escherichia coli. Proc Natl Acad Sci U S A 80:687–691. https://doi.org/10.1073/pnas.80.3.687
Koonin EV, Novozhilov AS (2016) Origin and evolution of the universal genetic code. Annu Rev Genet 51:1–18. https://doi.org/10.1146/annurev-genet-120116-024713
Kristofich J, Morgenthaler AB, Kinney WR, Ebmeier CC, Snyder DJ, Old WM, Cooper VS, Copley SD (2018) Synonymous mutations make dramatic contributions to fitness when growth is limited by a weak-link enzyme. PLoS Genet 14:e1007615. https://doi.org/10.1371/journal.pgen.1007615
Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324:255–258. https://doi.org/10.1126/science.1170160
Lawrie DS, Messer PW, Hershberg R, Petrov DA (2013) Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet 9:e1003527. https://doi.org/10.1371/journal.pgen.1003527
Lebeuf-Taylor E, McCloskey N, Bailey SF, Hinz A, Kassen R (2019) The distribution of fitness effects among synonymous mutations in a gene under directional selection. eLife 8:e45952. https://doi.org/10.7554/elife.45952
Leuven JTV, Ederer MM, Burleigh K, Scott L, Hughes RA, Codrea V, Ellington AD, Wichman H, Miller C (2020) ΦX174 attenuation by whole genome codon deoptimization. Genome Biol Evol 13:evaa214. https://doi.org/10.1093/gbe/evaa214
Li W-H, Wu C-I, Luo C-C (1985) A new method for estimating synonymous and nonsynonymous rates of nucleotide substitution considering the relative likelihood of nucleotide and codon changes. Mol Biol Evol 2:150–174. https://doi.org/10.1093/oxfordjournals.molbev.a040343
Lind PA, Andersson DI (2013) Fitness costs of synonymous mutations in the rpsT gene can be compensated by restoring mRNA base pairing. PLoS One 8:e63373. https://doi.org/10.1371/journal.pone.0063373
Lind PA, Berg OG, Andersson DI (2010a) Mutational robustness of ribosomal protein genes. Science 330:825–827. https://doi.org/10.1126/science.1194617
Lind PA, Tobin C, Berg OG, Kurland CG, Andersson DI (2010b) Compensatory gene amplification restores fitness after inter-species gene replacements. Mol Microbiol 75:1078–1089. https://doi.org/10.1111/j.1365-2958.2009.07030.x
Lind PA, Arvidsson L, Berg OG, Andersson DI (2017) Variation in mutational robustness between different proteins and the predictability of fitness effects. Mol Biol Evol 34:408–418. https://doi.org/10.1093/molbev/msw239
Liu Y (2020) A code within the genetic code: codon usage regulates co-translational protein folding. Cell Commun Signal 18:145. https://doi.org/10.1186/s12964-020-00642-6
Lucks JB, Nelson DR, Kudla GR, Plotkin JB (2008) Genome landscapes and bacteriophage codon usage. PLoS Comput Biol 4:e1000001. https://doi.org/10.1371/journal.pcbi.1000001
Machado HE, Lawrie DS, Petrov DA (2020) Pervasive strong selection at the level of codon usage bias in Drosophila melanogaster. Genetics 214:511–528. https://doi.org/10.1534/genetics.119.302542
Matsumoto T, John A, Baeza-Centurion P, Li B, Akashi H (2016) Codon usage selection can bias estimation of the fraction of adaptive amino acid fixations. Mol Biol Evol 33:1580–1589. https://doi.org/10.1093/molbev/msw027
McDonald JH, Kreitman M (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351:652–654. https://doi.org/10.1038/351652a0
McVean GAT, Vieira J (2001) Inferring parameters of mutation, selection and demography from patterns of synonymous site evolution in Drosophila. Genetics 157:245–257. https://doi.org/10.1093/genetics/157.1.245
Mittal P, Brindle J, Stephen J, Plotkin JB, Kudla G (2018) Codon usage influences fitness through RNA toxicity. Proc Natl Acad Sci U S A 115:201810022. https://doi.org/10.1073/pnas.1810022115
Miyata T, Hayashida H (1981) Extraordinarily high evolutionary rate of pseudogenes: evidence for the presence of selective pressure against changes between synonymous codons. Proc Natl Acad Sci U S A 78:5739–5743. https://doi.org/10.2307/11543
Miyata T, Yasunaga T (1980) Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J Mol Evol 16:23–36. https://doi.org/10.1007/bf01732067
Miyata T, Yasunaga T, Nishida T (1980) Nucleotide sequence divergence and functional constraint in mRNA evolution. Proc Natl Acad Sci U S A 77:7328–7332. https://doi.org/10.1073/pnas.77.12.7328
Mochizuki T, Ohara R, Roossinck MJ (2018) Large-scale synonymous substitutions in cucumber mosaic virus RNA 3 facilitate amino acid mutations in the coat protein. J Virol 92:e01007–e01018. https://doi.org/10.1128/jvi.01007-18
Nei M, Gojobori T (1986) Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3:418–426. https://doi.org/10.1093/oxfordjournals.molbev.a040410
Nirenberg MW, Matthaei JH (1961) The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc Natl Acad Sci U S A 47:1588–1602. https://doi.org/10.1073/pnas.47.10.1588
Nouën CL, McCarty T, Yang L, Brown M, Wimmer E, Collins PL, Buchholz UJ (2021) Rescue of codon-pair deoptimized respiratory syncytial virus by the emergence of genomes with very large internal deletions that complemented replication. Proc Natl Acad Sci U S A 118:e2020969118. https://doi.org/10.1073/pnas.2020969118
Novella IS, Zárate S, Metzgar D, Ebendick-Corpus BE (2004) Positive selection of synonymous mutations in vesicular stomatitis virus. J Mol Biol 342:1415–1421. https://doi.org/10.1016/j.jmb.2004.08.003
Novoa EM, de Pouplana LR (2012) Speeding with control: codon usage, tRNAs, and ribosomes. Trends Genet 28:574–581. https://doi.org/10.1016/j.tig.2012.07.006
Parker J (1989) Errors and alternatives in reading the universal genetic code. Microbiol Rev 53:273–298
Pedersen S (1984) Escherichia coli ribosomes translate in vivo with variable rate. EMBO J 3:2895–2898
Plotkin JB, Kudla G (2010) Synonymous but not the same: the causes and consequences of codon bias. Nat Rev Genet 12:32–42. https://doi.org/10.1038/nrg2899
Post LE, Strycharz GD, Nomura M, Lewis H, Dennis PP (1979) Nucleotide sequence of the ribosomal protein gene cluster adjacent to the gene for RNA polymerase subunit beta in Escherichia coli. Proc Natl Acad Sci U S A 76:1697–1701. https://doi.org/10.1073/pnas.76.4.1697
Precup J, Parker J (1987) Missense misreading of asparagine codons as a function of codon identity and context. J Biol Chem 262:11351–11355
Ran W, Higgs PG (2010) The influence of anticodon-codon interactions and modified bases on codon usage bias in bacteria. Mol Biol Evol 27:2129–2140. https://doi.org/10.1093/molbev/msq102
Robinson M, Lilley R, Little S, Emtage JS, Yarranton G, Stephens P, Millican A, Eaton M, Humphreys G (1984) Codon usage can affect efficiency of translation of genes in Escherichia coli. Nucleic Acids Res 12:6663–6671. https://doi.org/10.1093/nar/12.17.6663
Rocha EPC (2004) Codon usage bias from tRNA’s point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res 14:2279–2286. https://doi.org/10.1101/gr.2896904
Rubinstein ND, Doron-Faigenboim A, Mayrose I, Pupko T (2011) Evolutionary models accounting for layers of selection in protein-coding genes and their impact on the inference of positive selection. Mol Biol Evol 28:3297–3308. https://doi.org/10.1093/molbev/msr162
Sane M, Diwan GD, Bhat BA, Wahl LM, Agashe D (2020) Shifts in mutation spectra enhance access to beneficial mutations. bioRxiv 2020.09.05.284158. https://doi.org/10.1101/2020.09.05.284158
Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691. https://doi.org/10.1038/nrg3051
Savisaar R, Hurst LD (2018) Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 28:1442–1454. https://doi.org/10.1101/gr.233999.117
Schattner P, Diekhans M (2006) Regions of extreme synonymous codon selection in mammalian genes. Nucleic Acids Res 34:1700–1710. https://doi.org/10.1093/nar/gkl095
Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, Groß M, Backofen R, Diederichs S (2019) A pan-cancer analysis of synonymous mutations. Nat Commun 10:2569. https://doi.org/10.1038/s41467-019-10489-2
Sharp PM, Li W-H (1986a) Codon usage in regulatory genes in Escherichia coli does not reflect selection for ‘rare’ codons. Nucleic Acids Res 14:7737–7749. https://doi.org/10.1093/nar/14.19.7737
Sharp PM, Li W-H (1986b) An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24:28–38. https://doi.org/10.1007/bf02099948
Sharp PM, Li W-H (1987a) The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
Sharp PM, Li W-H (1987b) The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol 4:222–230. https://doi.org/10.1093/oxfordjournals.molbev.a040443
Sharp PM, Emery LR, Zeng K (2010) Forces that influence the evolution of codon bias. Philos Trans R Soc B: Biol Sci 365:1203–1212. https://doi.org/10.1098/rstb.2009.0305
Shields DC (1990) Switches in species-specific codon preferences: the influence of mutation biases. J Mol Evol 31:71–80. https://doi.org/10.1007/bf02109476
Sonneborn TM (1965) Degeneracy of the genetic code: extent, nature, and genetic implications. In: Bryson V, Vogel HJ (eds) Evolving Genes and Proteins. Academic, New York, pp 377–397
Sørensen MA, Pedersen S (1991) Absolute in vivo translation rates of individual codons in Escherichia coli. The two glutamic acid codons GAA and GAG are translated with a threefold difference in rate. J Mol Biol 222:265–280
Sørensen MA, Kurland CG, Pedersen S (1989) Codon usage determines translation rate in Escherichia coli. J Mol Biol 207:365–377
Sun Y, Tamarit D, Andersson SGE (2017) Switches in genomic GC content drive shifts of optimal codons under sustained selection on synonymous sites. Genome Biol Evol 9:2560–2579. https://doi.org/10.1093/gbe/evw201
Supek F (2015) The code of silence: widespread associations between synonymous codon biases and gene function. J Mol Evol:1–9. https://doi.org/10.1007/s00239-015-9714-8
Supek F, Miñana B, Valcárcel J, Gabaldón T, Lehner B (2014) Synonymous mutations frequently act as driver mutations in human cancers. Cell 156:1324–1335. https://doi.org/10.1016/j.cell.2014.01.051
Takeda N, Tanimura M, Miyamura K (1994) Molecular evolution of the major capsid protein VP1 of enterovirus 70. J Virol 68:854–862. https://doi.org/10.1128/jvi.68.2.854-862.1994
Urrutia AO, Hurst LD (2001) Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159:1191–1199
Varenne S, Buc J, Lloubes R, Lazdunski C (1984) Translation is a non-uniform process: effect of tRNA availability on the rate of elongation of nascent polypeptide chains. J Mol Biol 180:549–576. https://doi.org/10.1016/0022-2836(84)90027-5
Vieira-Silva S, Rocha EPC (2010) The systemic imprint of growth and its uses in ecological (meta)genomics. PLoS Genet 6:e1000808. https://doi.org/10.1371/journal.pgen.1000808
Wisotsky SR, Pond SLK, Shank SD, Muse SV (2020) Synonymous site-to-site substitution rate variation dramatically inflates false positive rates of selection analyses: ignore at your own peril. Mol Biol Evol 37:2430–2439. https://doi.org/10.1093/molbev/msaa037
Woese CR (1965) On the evolution of the genetic code. Proc Natl Acad Sci U S A 54:1546–1552. https://doi.org/10.1073/pnas.54.6.1546
Woese C (1969) Models for the evolution of codon assignments. J Mol Biol 43:235–240. https://doi.org/10.1016/0022-2836(69)90095-3
Yamamoto T, Suyama A, Mori N, Yokota T, Wada A (1985) Gene expression in the polycistronic operons of Escherichia coli heat-labile toxin and cholera toxin: a new model of translational control. FEBS Lett 181:377–380. https://doi.org/10.1016/0014-5793(85)80296-9
Yang Z, Nielsen R (2000) Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol 17:32–43. https://doi.org/10.1093/oxfordjournals.molbev.a026236
Zolotukhin S, Potter M, Hauswirth WW, Guy J, Muzyczka N (1996) A “humanized” green fluorescent protein cDNA adapted for high-level expression in mammalian cells. J Virol 70:4646–4654
Zwart MP, Schenk MF, Hwang S, Koopmanschap B, de Lange N, van de Pol L, Nga TTT, Szendro IG, Krug J, de Visser JAGM (2018) Unraveling the causes of adaptive benefits of synonymous mutations in TEM-1 β-lactamase. Heredity 121:406–421. https://doi.org/10.1038/s41437-018-0104-z
Chapter 3
Recording Silence – Accurate Annotation of the Genetic Sequence Is Required to Better Understand How Synonymous Coding Affects Protein Structure and Disease
Aviv A. Rosenberg, Alex M. Bronstein, and Ailie Marx
A. A. Rosenberg · A. M. Bronstein (*) · A. Marx (*)
Computer Science, Technion – Israel Institute of Technology, Haifa, Israel
e-mail: [email protected]; [email protected]
© Springer Nature Switzerland AG 2022
Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_3

3.1 Synonymous But Not Silent
The term silent mutation is commonly used to describe (1) a change in the DNA sequence that does not result in an observable effect on the organism’s phenotype; and (2) a synonymous mutation, where the nucleotide change leaves the translated amino acid sequence unchanged. When Christian Anfinsen showed that a folded and active protein could be denatured to lose structure and activity and then renatured to regain the same structure and activity, it appeared that the native, thermodynamically stable structure of a protein depends only on the amino acid sequence and solution conditions (Anfinsen and Haber 1961). This experiment suggested that, once translated, proteins carry no memory of the genetic sequence, and it led to one of the most erroneous assumptions in modern science: synonymous codon changes were long considered silent, that is, mutations of the type that has no effect on an organism’s phenotype. The error, of course, did not lie in the conclusions drawn from experiments conducted and interpreted with respect to the wisdom of the time. Indeed, these elegant experiments, which opened up the field of study into protein folding, were awarded the Nobel Prize in 1972. To quote Anfinsen himself, albeit commenting on another finding, “an entirely acceptable conclusion can be reached that is entirely wrong because of the paucity of knowledge at that particular time” (Anfinsen 1989). It is worth noting that the ribonuclease refolding experiments were reported only a few years after the first 3D protein structure was solved (Kendrew et al. 1958) and tRNA was first discovered (Hoagland et al. 1958). The regrettable error, which has hindered and continues to enormously hinder a more complete understanding of protein folding and disease, was the long-lasting acceptance of the term “silent”, which falsely equated
37
38
A. A. Rosenberg et al.
synonymous coding to an inconsequential cellular redundancy. It is worth noting that cellular redundancies, catering for the fragility of life by providing an alternative system element when another is compromised by mutation, are rarely inconsequential. If such redundancies were completely equivalent, they would be evolutionarily unstable and destined to be lost (Wang and Zhang 2009). Redundancy can be preserved indefinitely only if the redundant elements contribute asymmetrically to their shared function, creating evolutionary pressure for the conservation of both (Krakauer and Plotkin 2002). Indeed, literature presenting evidence that synonymous codons have evolved under selective pressure, both at the RNA and protein levels (Shabalina et al. 2013; Plotkin and Kudla 2011), contributed to “silencing” the idea that synonymous coding is an inert degeneracy of the translation machinery (Sharp et al. 2010). A wealth of knowledge has accumulated over the last 50 years implicating synonymous mutations in various human diseases and has convincingly demonstrated that such mutations do affect cellular function and fitness (Walsh et al. 2020; Sauna and Kimchi-Sarfaty 2011). The first concrete hypothesis that synonymous codon usage could alter protein structure was posed by Purvis et al. (Purvis et al. 1987) and it took another two decades before experimental studies would show that synonymous mutations can modify protein structure by altering the relative abundance of codon-specific tRNAs (e.g., MDR1, Kimchi-Sarfaty et al. 2007) or changing mRNA secondary structure, (e.g., ΔF508 CFTR, Bartoszewski et al. 2010; COMT, Nackley et al. 2006; DRD2, Duan et al. 2003). The literature is riddled with reports of synonymous mutations having a striking impact on protein function. Just a few examples include the demonstration that an L22L synonymous mutation prevents MDM2 aided phosphorylation of the nascent p53 protein (Karakostis et al. 
2019), an A82A mutation on the ω subunit of RNA polymerase changes the helicity of the isolated subunit and renders the complex inactive (Rajeshbhai Patel et al. 2019), a V107V mutation to human Factor IX is implicated in Hemophilia B (Simhadri et al. 2017), 16 synonymous mutations in CAT3 reduces protein activity by 20% (Komar et al. 1999), and T6T together with P8P mutations in the Salmonella flgM gene produces a 600-fold change in FlgM activity (Chevance and Hughes 2017). Today there is a large volume of evidence suggesting that codon usage affects the rate and rhythm of translation and can alter co-translational folding and the formation of folding intermediates that lead to the native global protein structure (Liu 2020; Zhao et al. 2017; Spencer et al. 2012; Siller et al. 2010; Buhr et al. 2016). Still, it seems that we are yet to appreciate the full scope of associations between synonymous coding and function and the fundamental mechanisms behind these associations. Intrigued by the potential power of single synonymous mutations to significantly affect protein function, we have recently started exploring the novel idea that synonymous mutations may be directly shaping the local backbone of protein structures (Fig. 3.1).
Fig. 3.1 Is there an association between synonymous codon use and local protein structure? The central dogma of biology defines a stepwise transfer of information from the genetic sequence to the amino acid sequence to protein structure. Yet, mounting evidence supports the notion that synonymously coded genes, having identical amino acid sequences, can result in globally altered protein structures. Our recent research is investigating the possibility that the local structure of the protein backbone is associated with synonymous coding (described and shown in Fig. 3.3). We hypothesize that via a codon-dependent mechanism, amino acids at the entrance to the ribosome exit tunnel adopt different, codon-specific, initial backbone orientations (a), and suggest that at some very specific locations, when these slightly different initial conditions sit on a local energy ridge between two bistable modes (b), the subtle difference in initial conditions could direct synonymously coded local regions of the backbone into different energy wells (c), amplifying and preserving a trace of these different initial conditions (d). We found numerous examples of matching locations in homologous proteins that exhibit bi-modal behavior of the backbone conformation (e). Belonging to one of the modes is correlated to the choice of a specific synonymous codon by over 80%
3.2 An Ironic Oversight in Structural Biology Reflecting the increasing awareness of the functional significance of synonymous mutations, there are a growing number of databases dedicated to documenting the association between synonymous codon usage and cellular (mis)function. Among them, the SynMIC database details more than 650,000 synonymous mutations found in various human cancers (Sharma et al. 2019) and the Database of Deleterious Synonymous mutation (dbDSM) catalogs disease related synonymous mutations from different databases (Wen et al. 2016). Accompanying the increasing availability of data pertaining to the effects of synonymous mutations, computational methods capable of predicting the effect of synonymous single nucleotide variations are emerging (Zhang and Bromberg 2019). A prominent exception to this trend, however, is the field of structural biology where the effects of synonymous mutations are still largely unrecognized, and
worse, data on codon usage are not even recorded alongside structural data. The Protein Data Bank (PDB) has been the leading global repository for open access atomic-level data on biological macromolecules for 50 years, today containing more than 150,000 protein structures. Although the PDB contains a rich amount of data beyond atomic coordinates (Abriata 2017), it remains silent in reporting the exact genetic template used for production of the protein – there is not even an optional field available to report this information when depositing structures. There are numerous structural databases dedicated to specific research domains; however, since the structural information in these resources is very often carved out of the PDB, they are no more insightful in providing specific data on synonymous codon usage and the resulting protein structure. Beyond the high-resolution molecular maps contained in the PDB there are databases collecting other structural information: the Protein Circular Dichroism Data Bank (PCDDB) is a public repository that archives and freely distributes circular dichroism (CD) and synchrotron radiation CD (SRCD) spectral data and their associated experimental metadata (https://pcddb.cryst.bbk.ac.uk/, Whitmore et al. 2017). There is an irony in the failure of structural databases to report codon usage. The requirement for large amounts of pure protein has led structural biologists to exploit E. coli as a protein production factory, so much so that the vast majority of the high-resolution structures deposited in the PDB were expressed in E. coli. However, it is well accepted that codon usage can affect heterologous expression of proteins, leading to misfolding and aggregation. To overcome this challenge, codon optimization and harmonization to better match codon usage in the host organism is now very common practice (Angov et al. 2008). 
Herein lies the irony – although it is known that codon usage, in the context of the host, is essential in many cases for correct protein folding, we are failing to record accurate information on the genetic template of structures which could eventually be used to better characterize the associations between structure and coding. It is, perhaps, of note that the term “silent”, which has somewhat deafened us to fully perceiving the functional importance of synonymous mutations, is not the only term which may be misleading science and hindering a more complete understanding of protein folding and disease. When we talk about codon “optimization” it seems fair to ask if we are even close to understanding what we are optimizing for and how to achieve it. That numerous methods exist for codon optimization, and that the practice remains as much an art as a science, in the sense that it is often difficult to predict which method will work well, if at all (Ranaghan et al. 2021), simply underscores that we are far from understanding the rules that govern the associations between coding and structure. Practically, it also means that even if it is known or suspected that a gene was codon optimized, it is impossible to automatically assign a codon optimized sequence – the only way of accurately assigning codons to structural data is if this is reported within the databases themselves (Fig. 3.2). The data collection gap extends beyond documentation of the genetic code used in specific protein production. Alternative methods to codon optimization have been used to overcome the difficulties which codon bias introduces into heterologous expression. Since codon bias is tightly associated with tRNA abundance, which in
Fig. 3.2 Unannotated steps in the experimental pipeline from gene to protein structure. E. coli is by far the most commonly used expression system for producing the large amounts of protein required for protein crystallography. The process from gene isolation to protein purification often involves (i) codon optimization or the introduction of synonymous mutations during cloning, (ii) the use of differently engineered E. coli strains to facilitate expression (e.g., strains having extra copies of rare tRNA genes) and highly variant expression conditions (e.g., temperature, media, inducer concentration). Synonymous codon changes are not reported in the PDB, and the protein expression conditions are rarely reported accurately. The implication of this is that the assignment of the codons from the reference genetic sequence to a PDB structure is prone to error
turn affects the rhythm of translation, E. coli strains engineered to express extra copies of rare tRNAs have been developed for better heterologous expression. Consequently, accurate reporting of the exact strain of expression cells seems imperative for providing the pieces of the puzzle which are needed to eventually unveil an accurate picture of how synonymous coding is contributing to shaping protein structure. Other parameters of the expression system which could imaginably affect the expression and formation of proteins are plasmid type, expression media, temperature, inducer concentration, etc. All these parameters are readily known at the time of experiment and could be easily reported should necessary provision be made for the information in the submission format of the repositories.
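The point that a codon-optimized sequence cannot be recovered from the amino acid sequence alone can be made concrete by counting how many distinct coding sequences encode the same protein. The following is a minimal sketch, assuming the standard genetic code; the function name `count_encodings` and the example peptides are illustrative and are not taken from the chapter.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, written in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons available for each amino acid (1 to 6).
DEGENERACY = Counter(CODON_TABLE.values())


def count_encodings(protein: str) -> int:
    """Return how many distinct coding sequences translate to `protein`."""
    return prod(DEGENERACY[aa] for aa in protein.upper())


# Hypothetical peptides, chosen only to illustrate the growth in ambiguity.
print(count_encodings("MW"))   # Met and Trp each have a single codon
print(count_encodings("LSR"))  # Leu, Ser and Arg each have six codons
```

Because the count is a product over residues, it grows exponentially with protein length, which is why the coding sequence has to be reported rather than inferred.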
3.3 Lost Opportunities in the Machine Learning Revolution We have recently postulated and shown a direct, local association between the genetic sequence and protein structure, suggesting that the conformation of an amino acid is associated with the specific synonymous codon from which it was translated (Rosenberg et al. 2022). This concept is distinct from the idea that codon usage affects the co-translational folding of a distant, global region of the protein sequence (namely, the region folding outside the ribosome exit tunnel simultaneous to continued mRNA translation). To explore this hypothesis, and in the absence of details on the genetic source code of structures deposited in the PDB, we developed a method for large scale assignment of codons to PDB protein structures. A major limitation of this method is that information about alterations made to the native
coding sequence during the experimental process is not annotated (as visualized in Fig. 3.2). Despite these intrinsic errors, which we attempted to minimize by restricting our data set to E. coli proteins expressed in E. coli, our results demonstrate that when comparing codon-specific Ramachandran plots, some synonymous codons have significantly distinct distributions (Fig. 3.3).
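The bookkeeping behind a codon-specific Ramachandran plot can be sketched as follows: once a coding sequence has been assigned to a structure, each residue's (φ, ψ) pair is filed under the codon that encoded it. This is only an illustrative sketch, not the authors' pipeline; the function name `angles_by_codon`, the input layout, and the example values are assumptions made for the illustration.

```python
from collections import defaultdict


def angles_by_codon(coding_sequence, dihedrals):
    """Group per-residue (phi, psi) angles by the codon encoding each residue.

    `coding_sequence` must contain exactly one codon per entry of `dihedrals`.
    """
    if len(coding_sequence) != 3 * len(dihedrals):
        raise ValueError("coding sequence must supply exactly one codon per residue")
    groups = defaultdict(list)
    for i, angles in enumerate(dihedrals):
        codon = coding_sequence[3 * i : 3 * i + 3].upper()
        groups[codon].append(angles)
    return dict(groups)


# Hypothetical three-residue stretch: Val-Val-Val encoded as GTT, GTC, GTT.
example = angles_by_codon(
    "GTTGTCGTT",
    [(-120.0, 130.0), (-110.0, 125.0), (-118.0, 140.0)],
)
print(example)
```

Comparing the resulting per-codon angle collections for synonymous codons is then a separate statistical step, which this sketch does not attempt.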
Fig. 3.3 Some synonymous codons have significantly different φ, ψ backbone dihedral angle distributions. Panels: Val, Phe, Thr, Ser, Cys, Ile. Contour plots of the codon-specific φ, ψ dihedral angle distributions of amino acids occurring within β-strands. Synonymous codons are overlaid. Whilst the distributions of some synonymous codons are indistinguishable (shown here: all valine codons and the ATT-ATC pair of isoleucine; p-value > 0.05), many synonymous pairs of codons are clearly different (shown here: the codons of cysteine, phenylalanine and threonine, the AGG-TCC pair of serine, and the ATA-ATT and ATA-ATC pairs of isoleucine; p-value < 0.02). The depicted level lines contain 10%, 50% and 90% of the probability density; the shaded regions represent 10–90% confidence. Methodological details. Querying the PDB for high resolution (…
E. Herreros et al.
Table 5.1
Molecular mechanism | Gene affected | Mutation | Type of mutation | Cancer type | Reference
 | APC | c.1869G > T, p.R623R | | | Montera et al. (2001)
 | BRCA1 | c.5073A > T, p.T1691T | | | Coppa et al. (2018) and Minucci et al. (2018)
 | BRCA2 | c.378A > G, p.Q126Q | | | Fraile-Bethencourt et al. (2019)
 | BRCA2 | c.222G > A, p.L74L | | | Fraile-Bethencourt et al. (2019)
 | BRCA2 | c.441A > G, p.Q147Q | | | Fraile-Bethencourt et al. (2019)
 | BRCA2 | c.744G > A, p.K172K | Congenital | Breast and ovarian cancer | Hansen et al. (2010)
Splicing (Exon truncation) | PARP1 | c.2817C > T, p.S939S | Somatic | Lung squamous cell carcinoma | Jayasinghe et al. (2018)
Splicing (Exon truncation) | RAD51C | c.234A > G, p.T78T | Somatic | Lung squamous cell carcinoma | Jayasinghe et al. (2018)
Splicing (Intron retention) | ARID1A | c.4101G > A, p.Q1367Q | Somatic | Lung cancer | Jung et al. (2015)
Splicing (Intron retention) | CDH1 | c.1008G > A, p.E336E | Somatic | Breast cancer | Jung et al. (2015)
Splicing (Intron retention) | TP53 | c.375C > G, p.T125T | Somatic | Breast cancer | Jung et al. (2015)
Splicing (Intron retention) | TP53 | c.672G > A, p.E224E | Somatic | TCGA study | Supek et al. (2014)
Splicing (Intron retention) | TP53 | c.993G > A, p.Q331Q | Somatic | TCGA study | Supek et al. (2014)
Splicing (Not specified) | APC | c.5883G > A, p.P1961P | Somatic | Brain metastasis | Pecina-Slaus et al. (2010)
Splicing (Not specified) | GATA2 | c.351C > G, p.T117T | Congenital | Myelodysplastic syndromes | Kozyra et al. (2020)
Splicing (Not specified) | GATA2 | c.981G > A, p.G327G | Congenital | Myelodysplastic syndromes | Kozyra et al. (2020)
Splicing (Not specified) | GATA2 | c.1023C > T, p.A341A | Congenital | Myelodysplastic syndromes | Kozyra et al. (2020)
RNA structure (RNA stability) | KRAS | c.30A > C, p.G10G | Somatic | TCGA study | Sharma et al. (2019)
RNA structure (RBP interaction) | DAB2 | c.858C > T, p.286F > F | Somatic | TCGA study | Teng et al. (2020)
RNA structure (RBP interaction) | ZFHX3 | c.9942C > T, p.3313G > G | Somatic | TCGA study | Teng et al. (2020)
RNA structure (RBP interaction) | USP9X | c.7242G > A, p.2414K > K | Somatic | TCGA study | Teng et al. (2020)
RNA structure (Translation efficiency) | ZFP36 | c.309C > T, p.R103R | Somatic | Breast cancer cell line | Griseri et al. (2011)
RNA structure (miRNA binding site) | BCL2L12 | c.51C > T, p.F17F | Somatic | Melanoma | Gartner et al. (2013)
Protein stability | TP53 | c.66A > G, p.L22L | Somatic | Lung cancer cell line | Karakostis et al. (2019)
Unknown | EGFR | c.2361G > A, p.Q787Q | Somatic | Renal cell carcinoma | Grepin et al. (2020) and Tan et al. (2017)
In two recent pan-cancer studies, exome and RNA sequencing data from 8656 (study by Jayasinghe et al.) and 1812 tumors (study by Jung et al.) were integrated with the aim to identify pathogenic cancer mutations that affect splicing. In addition to many non-synonymous mutations that alter splicing, these studies identified 239 synonymous mutations with a detectable impact on splicing in the associated tumor RNA sequencing data. Interestingly, only 33 of these synonymous mutations occurred in genes that have previously been linked to cancer pathogenesis (Jayasinghe et al. 2018; Jung et al. 2015). The study by Jayasinghe et al. developed MiSplice (mutation-induced splicing), a bioinformatic tool to identify mutations that create splice sites. A couple of the synonymous mutations that were picked up in their study were further tested in splicing minigene assays. In such an assay, a minigene sequence consisting of the exonic and intronic regions of interest is tested for effective splicing (Stoss et al. 1999; Cooper 2005; Desviat et al. 2012). These minigene assays support that the c.2817C > T (p.S939S) mutation in the Poly ADP-ribose polymerase 1 (PARP1) gene, which encodes an enzyme involved in cellular DNA repair, generates a novel splice donor site. This then results in a 10 amino acid deletion within the PARP1 catalytic domain in a lung squamous cell carcinoma (LUSC) patient. The c.234A > G (p.T78T) mutation in RAD51C, which encodes a protein involved in DNA double-strand break repair, was also validated to cause aberrant splicing in splicing minigene assays (Jayasinghe et al. 2018). Jung et al. analyzed data from six cancer types to identify somatic SNVs that cause intron retention. They characterized two synonymous mutations in breast cancer that caused intron retention, in TP53 (c.375C > G, p.T125T) and CDH1 (c.1008G > A, p.E336E), and one in ARID1A (c.4101G > A, p.Q1367Q) in lung cancer. 
Intron retention was validated using minigene splicing assays corroborating the synonymous mutation computational analysis results (Jung et al. 2015).
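The paired notation used for these variants (for example c.2817C > T, p.S939S) can be read arithmetically: in coding DNA numbering, where position 1 is the A of the ATG start codon, a nucleotide position maps to a codon number and to a position within that codon. The sketch below illustrates this mapping under that numbering assumption; the function name `locate_in_codon` and the simplified pattern, which handles only single-base substitutions, are illustrative and not part of any cited tool.

```python
import re

# Simplified pattern for a single-nucleotide substitution such as "c.2817C>T".
SUBSTITUTION = re.compile(r"c\.(\d+)([ACGT])>([ACGT])")


def locate_in_codon(c_notation: str) -> tuple[int, int]:
    """Return (codon number, position within codon) for a coding substitution.

    Assumes numbering in which c.1 is the first base of the start codon.
    """
    match = SUBSTITUTION.fullmatch(c_notation.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a simple coding substitution: {c_notation!r}")
    position = int(match.group(1))
    codon_number = (position - 1) // 3 + 1
    position_in_codon = (position - 1) % 3 + 1
    return codon_number, position_in_codon


# Variants quoted in the text above.
print(locate_in_codon("c.2817C > T"))  # PARP1, reported as p.S939S
print(locate_in_codon("c.234A > G"))   # RAD51C, reported as p.T78T
```

For both variants the substitution falls on the third position of the reported codon, the position at which most synonymous changes occur.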
5 SNPs Ability to Influence Disease Risk: Breaking the Silence on Synonymous…
Fig. 5.2 Splicing dysregulation by single nucleotide variants. This figure illustrates a non-exhaustive set of examples by which synonymous mutations (indicated in red) alter the splicing of exons (indicated as rectangles)
Synonymous mutations can also influence pre-mRNA splicing by affecting exonic splicing enhancer (ESE) and exonic splicing silencer (ESS) motifs. ESE or ESS DNA sequence motifs consist of 4–18 bases within an exon that enhance or inhibit splicing. ESEs execute this function by assisting in the recruitment of splicing factors to the adjacent intron. On the other hand, ESSs recruit proteins that negatively affect the splicing machinery. Zhang and Xia et al. explored the role of somatic synonymous mutations in melanoma samples from The Cancer Genome Atlas (TCGA), reporting 402 ESEs and 316 ESSs synonymous mutations, and conclude that pathogenic synonymous mutations are enriched in regions with a role in splicing regulation (Zhang et al. 2020). Supek et al. described that synonymous mutations are 1.75 times enriched within 30 bp of an exon boundary in oncogenes as compared to control genes, and that this is not the case for tumor suppressor genes. Analysis of RNA-sequencing data from more than 2000 cancer patients from which also DNA mutation data were available could document detectable splicing aberrations in the oncogenes that showed the strongest enrichment for synonymous mutations. Their data support that the most frequent mechanism by which synonymous mutations affect the splicing of oncogene encoding RNAs is by creating ESE or by destroying ESS motifs. These results were further validated by analyzing the effects of 12 synonymous mutations in five oncogenes in splicing minigene assays. Splicing changes due to reduced exon skipping by ESS loss or ESE gain were seen for half of the tested mutations (Supek et al. 2014).
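The enrichment reported near exon boundaries rests on a simple positional test: whether a mutation lies within a fixed window of either end of its exon. A minimal sketch of such a test follows, assuming 1-based inclusive exon coordinates and counting a position as near a boundary when it is among the first or last 30 bases of the exon; the function name, the coordinates, and this exact windowing rule are assumptions for illustration and may differ from the definition used by Supek et al.

```python
def near_exon_boundary(position: int, exon_start: int, exon_end: int,
                       window: int = 30) -> bool:
    """Return True if `position` is among the first or last `window` bases of the exon.

    Coordinates are 1-based and inclusive; the position must lie inside the exon.
    """
    if not exon_start <= position <= exon_end:
        raise ValueError("position lies outside the exon")
    distance_to_edge = min(position - exon_start, exon_end - position)
    return distance_to_edge < window


# Hypothetical exon spanning positions 100 to 300.
print(near_exon_boundary(110, 100, 300))  # close to the 5' edge of the exon
print(near_exon_boundary(200, 100, 300))  # in the middle of the exon
```

Applying such a flag to every synonymous mutation in a gene set, and comparing the flagged fraction between oncogenes and control genes, yields the kind of fold enrichment quoted above.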
Synonymous mutations affecting splicing have been described in major tumor suppressor genes such as APC, BRCA1/2 and TP53. The TP53 gene encodes the p53 tumor suppressor protein which is considered to be the “Guardian of the genome” because of its role in conserving DNA stability by sensing and activating the DNA repair mechanisms. The first time that the pathogenic synonymous mutation c. 375G > A (p.T125T) was detected in the TP53 gene was in patients with Li-Fraumeni, which is an autosomal dominant syndrome with an elevated risk for a variety of cancers. This mutation affects the splice donor site of exon 4, provoking an intron retention (Warneford et al. 1992; Varley et al. 1998). Synonymous mutations affecting the same nucleotide have also been recurrently detected as somatically acquired mutations in cancer patients and in a human T-cell leukemia cell line (Supek et al. 2014; Soudon et al. 1991). Overall, TP53 is strongly enriched for synonymous mutations that mainly cluster in nucleotides adjacent to splice sites (75% of synonymous mutations in TP53). Also the 3′ terminal nucleotide of exon 6 is recurrently hit by synonymous (e.g. c.672G > A (p.E224E)) and non-synonymous mutations. Data from a minigene splicing assay support that the c.672G > A (p.E224E) mutation activates a cryptic splice site, resulting in an mRNA with a shifted open reading frame. Yet another synonymous mutation in TP53 (c.993G > A (p.Q331Q)) in the 3′ terminal nucleotide of exon 9 was described with similar properties (Supek et al. 2014). The APC protein is a negative regulator of the WNT signaling cascade that is involved in cell adhesion. In patients with familial adenomatous polyposis, an autosomal dominant disorder characterized by the development of colorectal adenomas, a c.1869G > T (p.R623R) mutation in APC was shown to cause exon skipping (Montera et al. 2001). 
Another study in a patient who primarily developed lung carcinoma and later brain metastasis, described a c.5883G > A (p.P1961P) nucleotide substitution in the APC gene that results in an aberrantly spliced mRNA. Interestingly, this mutation was present in the metastasis but not in the primary lung tumor, suggesting that this mutation may have played a role in tumor metastasis (Pecina-Slaus et al. 2010). The BRCA1 and BRCA2 genes encode tumor suppressor proteins involved in the repair pathway of DNA double-strand breaks, and their inactivation results in an elevated risk to develop breast and ovarian cancer. Around 5–10% of women with breast or ovarian cancer have a congenital mutation in the BRCA1 or BRCA2 gene, and more than 50% of breast tumors and 7% of ovarian tumors display acquired mutations in these genes (O’Donovan et al. 2010). Disruption of BRCA1 and BRCA2 function by non-synonymous mutations has been well established, but a few studies also indicate a role for synonymous mutations in this context. In BRCA1, the c.5073A > T (p.T1691T) mutation was reported by two studies to affect splicing in families with breast and ovarian cancer. Using the NNSPLICE splicing prediction tool, (Reese et al. 1997) this mutation was predicted to promote skipping of exon 17. These results were further consolidated by RT-PCR detection of the aberrant transcript (Coppa et al. 2018; Minucci et al. 2018). Regarding BRCA2, Hansen et al. described the c.744G > A (p.K172K) synonymous mutation at the last base in exon 6, next to the splice donor site, in a Danish family with breast and ovarian
cancer. Utilizing exon trapping analysis, they showed that the mutation results in skipping of exon 6 or of both exons 5 and 6 (Hansen et al. 2010). Recently, one group described mutations that could affect splicing between exons 2–9 of the BRCA2 gene. From the mutation databases BIC and UMD (Beroud et al. 2016), they pulled out 302 different variants. After in silico analysis utilizing NNSPLICE (Reese et al. 1997) and Human Splicing Finder (Desmet et al. 2009), 84 variants were selected for splicing minigene assays. Fragment analysis by capillary electrophoresis showed that 53 mutations (63.8%) impaired splicing: 27 exonic (18 missense, 3 synonymous, 2 nonsense, and 4 frameshift) and 26 intronic changes. One of these synonymous mutations (c.378A > G (p.Q126Q)) was reported to disrupt an ESE motif, whereas the synonymous mutations c.222G > A (p.L74L) and c.441A > G (p.Q147Q) create an ESS motif. Interestingly, the c.441A > G (p.Q147Q) mutation was previously catalogued as a variant of uncertain clinical significance (VUS) and was now reclassified as pathogenic or likely pathogenic (Fraile-Bethencourt et al. 2019). Finally, we want to highlight a recent study on synonymous mutations in the GATA2 gene. A variety of germline mutations driving deficiency for the GATA2 transcription factor lead to a complex multi-system disorder that can present with many manifestations including cytopenias, bone marrow failure, myelodysplastic syndrome/acute myeloid leukemia (MDS/AML), and severe immunodeficiency (Crispino et al. 2017; Hirabayashi et al. 2017). Kozyra et al. analyzed a cohort of 911 individuals with the cancer predisposing disease MDS. In this cohort, 110 patients carried at least one mutation in GATA2, of which nine patients carried synonymous GATA2 mutations: c.351C > G (p.T117T) (n = 4 patients); c.649C > T (p.L217L); c.981G > A (p.G327G); c.1023C > T (p.A341A) and c.1416G > A (p.P472P) (n = 2). 
Among these five different synonymous mutations, three were predicted in silico to affect splicing. c.351C > G (p.T117T) and c.981G > A (p.G327G) were predicted to create a novel ESS motif, whereas c.1023C > T (p.A341A) disrupts an ESE motif. Splicing analysis by RNA sequencing and RT-PCR on patient material confirmed the aberrant splicing associated with the most recurrent c.351C > G (p.T117T) variant, but not for the other variants. However, for these other mutations, a complete lack or substantial reduction in RNA levels of the mutant allele could be shown (Kozyra et al. 2020).
5.4.2 mRNA Structure The secondary structure of an mRNA molecule, and by extension its three-dimensional folding, is dictated by base pairing between complementary nucleotides. The structure of an mRNA molecule is essential for its functioning. First, mRNA structure determines its stability (Nowakowski et al. 1997), and a mutation that alters mRNA structure may thus affect cellular levels of the protein that it encodes by altering mRNA stability. The RNA structure also determines the accessibility of the mRNA for post-transcriptional modifications such as
methylation, and can profoundly impact mRNA function (Anreiter et al. 2020). Furthermore, the mRNA structure dictates the efficiency and speed by which a ribosome can translate an mRNA into protein. In this context, it has been well established that setting the correct translation initiation speed is essential, and that this initiation speed is heavily determined by mRNA structures in the 5′ UTR region (Livingstone et al. 2010). The impact of an altered folding of the open reading frame of an mRNA on protein translation efficiency by the ribosome is less well understood. However, it is hard to imagine that a significant change in its 3D structure due to a nucleotide change would not affect ribosomal translation speed. Finally, mRNA structure also dictates interaction of an mRNA with RNA binding proteins (RBPs) and with microRNAs, which again will affect RNA stability and translation efficiency. Some nucleotides have an essential role in maintaining more complex elements of mRNA structure such as hairpin or pseudoknot structures, and mutation of such a nucleotide can thus have far reaching consequences. Interestingly, while all positions in a codon are important to maintain secondary mRNA structure, synonymous codon positions contribute most heavily to mRNA stability, and base pairing at the third codon position is significantly higher as compared to other codon sites in mammalian transcriptomes (Shabalina et al. 2006). Hence, the impact of synonymous nucleotide substitutions on secondary mRNA structure is not to be underestimated. A first example of a synonymous mutation in cancer that may promote oncogenesis by altering mRNA structure is the c.30A > C (p.G10G) mutation in the KRAS oncogene. The RAS family proteins are small GTPases involved in transmitting incoming signals at the cell surface in order to activate genes involved in cell differentiation, survival and growth. 
Mutations in KRAS, HRAS or NRAS are observed in around 25% of all tumors, and these mutations promote cancer by causing hyperactive cell signaling (Downward 2003). The c.30A > C (p.G10G) mutation in KRAS was picked up in the pan-cancer study by the Diederichs group because its SynMICdb score ranked in the 99.9th percentile of all analyzed synonymous mutations and because of its high score with two different algorithms to predict mutation-induced changes in RNA structure (remuRNA (Salari et al. 2013) and RNAsnp (Sabarinathan et al. 2013)). Additionally, in silico RNA structure prediction tools as well as chemical probing experiments by SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension) confirmed the impact of the c.30A > C mutation on KRAS mRNA structure. Upon overexpression of a cDNA with this mutation in HEK293 cells, a minor (±20%) induction of KRAS expression was observed. Addition of the endogenous 5′ UTR to this overexpression construct may be needed to see a more profound effect on KRAS expression and mRNA structure (Sharma et al. 2019). A second interesting example is the c.309C > T (p.R103R) variant in ZFP36. ZFP36 encodes an RNA binding protein that binds adenylate and uridylate (AU)-rich elements (ARE) in the 3′ UTR of mRNAs involved in inflammation, cell cycle regulation and angiogenesis, leading to the degradation of these mRNAs. This function of ZFP36, together with the reduced ZFP36 mRNA and protein levels in a
variety of cancer types, points towards a tumor suppressor role for this protein (Sanduja et al. 2012). The c.309C > T (p.R103R) variant in ZFP36 corresponds to a known single nucleotide polymorphism (SNP; rs3746083) that has been linked to rheumatoid arthritis and that occurs in the healthy population at a frequency of 3.5% (Carrick et al. 2006). The c.309C > T variant in ZFP36 is present in an aggressive breast cancer cell line with detectable ZFP36 mRNA levels, in which ZFP36 protein expression could not be shown. In vitro transcription and translation assays as well as cDNA transfection experiments in HEK293 cells could confirm that introduction of the c.309C > T mutation in the ZFP36 mRNA drastically impairs ZFP36 protein expression. Whereas this mutation does not significantly alter the ZFP36 mRNA stability, it is associated with a profound reduction in association with heavy polysome fractions in polysome profiling assays. These data support that this variant reduces ZFP36 translation efficiency, which may be caused by a more stable secondary structure of the c.309C > T mutant ZFP36 mRNA as compared to wild type, as indicated by in silico predictions of mRNA folding (Griseri et al. 2011). The International Cancer Genome Consortium (ICGC; https://icgc.org/) reports the c.309C > T mutation in ZFP36 in acute myeloid leukemia and colon adenocarcinoma. Interestingly, the incidence of this variant is also 5 times higher in breast tumors that are resistant to treatment with the anti-HER2 antibody Herceptin (or Trastuzumab) as compared to Herceptin responsive breast cancers (Griseri et al. 2011). This observation supports that synonymous mutations may not only be relevant for the oncogenic transformation process of cells, but also for determining the response of cancer cells to therapy. The idea that an altered mRNA structure due to synonymous mutations may impact affinity for RNA binding proteins (RBPs) was recently explored at pan-cancer level using PIVar. 
PIVar is a computational pipeline that was developed to mine TCGA mutation data for SNVs that overlap with RBP-binding sites identified by crosslinking and immunoprecipitation sequencing (CLIP-seq). To filter SNVs that may have a functional impact, PIVar subsequently analyzes these SNVs for impact on RNA expression, secondary RNA structure and RBP binding. Using PIVar, almost 23,000 synonymous SNVs across 22 cancer types and targeting 2042 genes were predicted to affect RBP affinity. Depending on the cancer type, this corresponds to 5–15% of the synonymous mutations in that cancer type. The binding motif of 35 RBPs was overrepresented in the RBP binding disrupting synonymous mutations identified by PIVar. Network analyses revealed genes related to phosphoinositide 3-kinase (PI3K), NOTCH and mTOR signaling to be significantly affected by RBP affinity disrupting synonymous mutations. Interestingly, besides prominent known cancer genes, this study identifies many genes that have previously not been linked to cancer as host genes for RBP disrupting synonymous mutations. Electrophoretic mobility shift assays (EMSAs) provided experimental support for disruption of RBP binding upon introduction of synonymous mutations in DAB2, ZFHX3 and USP9X (Teng et al. 2020). Further experiments are however required to test the functional relevance of these findings in cancer pathogenesis. MicroRNAs (miRNAs) are a family of small non-coding RNAs of approximately 18–28 nucleotides long. Their primary role is post-transcriptional regulation of the
expression of proteins involved in a wide variety of biological processes. Many miRNAs have been identified as ‘oncomirs’. These oncogenic miRNAs are misexpressed in cancer and promote cancer by modulating oncogenes or tumor suppressors (Calin et al. 2006; Peng et al. 2016). Misexpression of these miRNAs in cancer originates from a variety of sources such as copy number changes, epigenetic changes in DNA methylation and histone deacetylation, and alterations in miRNA transcription or miRNA processing factors (Calin et al. 2004; Mavrakis et al. 2010; Scott et al. 2006; Saito et al. 2006; Chang et al. 2008; He et al. 2007; Thomson et al. 2006; Karube et al. 2005). Besides misexpression, the function of a miRNA can be abrogated by mutating its binding region in the target mRNA. This mechanism has not been studied extensively, but several interesting examples have been described (Yuan et al. 2019). One such example is the c.51C > T (p.F17F) synonymous mutation in the BCL2L12 gene. This is a highly recurrent mutation that is present in 4% of melanoma tumors and that results in higher BCL2L12 transcript and protein levels. Two different miRNA target prediction programs predicted that miRNA hsa-miR-671-5p binds the BCL2L12 wild type, but not the c.51C > T BCL2L12 mutant sequence. In agreement with this, transfection of this miRNA into cells did not affect c.51C > T BCL2L12 mutant RNA levels, whereas a significant mRNA reduction was obtained for the wild type BCL2L12 mRNA. In other words, this synonymous mutation protects BCL2L12 from hsa-miR-671-5p mediated reduction of BCL2L12 transcript levels (Gartner et al. 2013). BCL2L12 belongs to the Bcl-2 protein family of apoptotic regulators and has been previously linked to tumorigenesis. In the majority of glioblastomas, BCL2L12 is upregulated, resulting in resistance to apoptosis by binding TP53 (Stegh et al. 2010).
Whereas the synonymous BCL2L12 mutation in melanoma does not affect TP53 binding, the mutation was shown to protect melanoma cells from UV-induced apoptosis and is associated with reduced p53 target gene transcription (Gartner et al. 2013).
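The logic behind such a binding-site prediction can be illustrated with a deliberately simplified sketch. It assumes that binding requires a perfect match between the mRNA and miRNA nucleotides 2 to 8 (the "seed"), which is only one of several criteria that real target prediction programs use. The miRNA and mRNA sequences below are hypothetical toy sequences, not those of hsa-miR-671-5p or BCL2L12, and the function names are introduced here for illustration only.

```python
# Toy check of whether a synonymous change removes a perfect miRNA seed match.
# All sequences are hypothetical and chosen only for illustration.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna: str) -> str:
    """mRNA sequence (5'->3') that pairs perfectly with miRNA nucleotides 2-8."""
    seed = mirna[1:8]  # nucleotides 2-8 expressed as a 0-based slice
    return reverse_complement(seed)


def has_seed_match(mrna: str, mirna: str) -> bool:
    """True if the mRNA contains a perfect match to the miRNA seed."""
    return seed_site(mirna) in mrna


def substitute(seq: str, pos: int, ref: str, alt: str) -> str:
    """Apply a 1-based single-nucleotide substitution after checking the reference base."""
    if seq[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {seq[pos - 1]}")
    return seq[:pos - 1] + alt + seq[pos:]


toy_mirna = "ACGUUCGAUGCAAGCUUACGGA"  # hypothetical miRNA
toy_wild_type = "AUGGCUUCGAACGGAUAA"  # hypothetical ORF: Met-Ala-Ser-Asn-Gly-stop
# Position 9 is the third base of codon 3: UCG (Ser) -> UCA (Ser), a synonymous change.
toy_mutant = substitute(toy_wild_type, 9, "G", "A")

print("seed site:", seed_site(toy_mirna))
print("wild type has seed match:", has_seed_match(toy_wild_type, toy_mirna))
print("mutant has seed match:", has_seed_match(toy_mutant, toy_mirna))
```

In this toy case the wild type sequence contains the seed site and the synonymous mutant does not, mirroring the loss of predicted miRNA binding described above. Experimental follow-up, such as the miRNA transfection experiments cited in the text, remains necessary to confirm any such prediction.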
5.4.3 Codon Usage
The protein coding region of human DNA is read as triplets of nucleotides (codons). Of the 64 possible codons, 61 encode the 20 different amino acids and 3 are stop codons. Most amino acids can thus be encoded by two to six synonymous codons. The particular codon that is used for an amino acid, however, has implications for the translation speed of the protein, as not every tRNA is present in equal abundance in each cell or tissue (Kames et al. 2020). A synonymous nucleotide change can thus result in a codon for which the tRNA abundance is different, which in turn affects translation speed, resulting in altered protein levels and/or protein misfolding and degradation (Tsai et al. 2008; Lampson et al. 2013; Hanson et al. 2018). Codon frequency has been shown to correlate well with expression of the corresponding tRNAs (Mahlab et al. 2012). Therefore, potential effects on translation speed can be scored by evaluating the difference in abundance between the wild type and mutant codon in the human protein coding genome. In addition to codon frequency, two other types of codon
usage bias exist: codon pair bias and codon co-occurrence bias. Codon pair bias refers to the non-random distribution of nucleotides neighboring a particular codon, that is, to the non-random pairing of adjacent codons. This mechanism is thought to influence the efficiency of the translation process through interaction of the tRNAs in the A and P sites of the ribosome (Buchan et al. 2006). Codon co-occurrence bias refers to the clustering of synonymous codons that are recognized by the same tRNA. This type of codon bias involves both frequent and rare codons and is most prominent in highly expressed genes that must be rapidly induced (Cannarozzi et al. 2010). Chu et al. calculated the impact of synonymous mutations on codon bias and codon frequency. For this purpose, they assigned a codon bias value between −1 and 1, where a positive value indicates a preferred codon. They found that in cancer related genes, synonymous codons ending in C/G were preferred over codons ending in A/T. Furthermore, cancer related genes were significantly positively correlated with codon bias changes, implying that the synonymous SNPs in cancer-related genes tend to gain a more frequently used codon or a preferred codon (Chu et al. 2019). A recent study used different statistical measures to analyze the possible role of codon bias in thyroid carcinoma genes and proposed some synonymous mutations that could significantly affect codon usage (Pepe et al. 2020). Optimization of codon usage has been reported to benefit cellular translation and cell growth (Qian et al. 2012; Yang et al. 2014). For the RAS family of oncogenes, codon bias is an established mechanism of gene expression regulation. Whereas the three RAS family proteins have a very similar amino acid composition, their codon usage is different. The nucleotide sequence of KRAS is enriched in rare codons (decoded by low-abundant tRNAs) in comparison to HRAS, which contains many common codons.
This explains the poor cellular translation efficiency and low protein expression level of KRAS as compared to HRAS in the human body. In agreement with this, expression of an adeno-associated viral vector in HCT116 colon cancer cells in which the rare KRAS codons were replaced by more common codons resulted in fivefold higher KRAS protein and twofold higher KRAS mRNA expression as compared to the original KRAS sequence (Lampson et al. 2013). In other cancer cell lines as well, higher protein expression of KRAS was observed when expressing a cDNA in which the rare KRAS codons were replaced by common codons. Similarly, HRAS protein expression is reduced by replacing its common codons with rare codons. Interestingly, these codon replacements affected drug sensitivity of the cancer cell lines, and the higher KRAS expression and reduced HRAS expression were associated with resistance to kinase inhibitors such as vemurafenib, gefitinib and sunitinib (Ali et al. 2017).
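The codon arithmetic and the frequency-based scoring described in this section can be made concrete with a short sketch. It builds the standard genetic code, counts the synonymous codons per amino acid, and scores a synonymous change as the log2 ratio of the mutant to the wild type codon's relative frequency within its synonymous family. The log-ratio score, the pseudocount, the function names and the toy reference sequence are assumptions made for illustration; they are not the scoring scheme of any study cited here, and a real analysis would use genome-wide or tissue-specific codon usage tables.

```python
from collections import Counter
from itertools import product
from math import log2

# Standard genetic code in TCAG order; '*' marks the three stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons per amino acid (stop codons excluded).
DEGENERACY = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def codon_counts(coding_sequences):
    """Count codons over a collection of in-frame coding sequences."""
    counts = Counter()
    for cds in coding_sequences:
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    return counts


def relative_frequency(codon, counts):
    """Frequency of a codon among all observed codons for the same amino acid."""
    amino_acid = CODON_TABLE[codon]
    family_total = sum(n for c, n in counts.items() if CODON_TABLE.get(c) == amino_acid)
    return counts[codon] / family_total if family_total else 0.0


def frequency_shift(wild_type, mutant, counts, pseudocount=1e-6):
    """Illustrative score: log2 ratio of mutant to wild type relative frequency."""
    if CODON_TABLE[wild_type] != CODON_TABLE[mutant]:
        raise ValueError("codons are not synonymous")
    wt = relative_frequency(wild_type, counts) + pseudocount
    mt = relative_frequency(mutant, counts) + pseudocount
    return log2(mt / wt)


# Hypothetical reference in which the Leu codon CTG is used three times and CTA once.
toy_counts = codon_counts(["ATGCTGCTGCTACTGTAA"])

print("sense codons:", sum(DEGENERACY.values()))
print("synonymous family sizes:", sorted(set(DEGENERACY.values())))
print("CTA -> CTG shift:", frequency_shift("CTA", "CTG", toy_counts))
```

A positive score indicates a change towards a more frequently used codon in the reference set and a negative score a change towards a rarer one. Because codon frequency is only a proxy for tRNA abundance, such a score suggests a possible effect on translation speed rather than demonstrating one.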
5.4.4 Protein Stability
In the context of TP53, the synonymous c.66A > G (p.L22L) variant was shown to affect TP53 protein stability. In non-stressed cellular conditions, the ubiquitin ligase MDM2 induces continuous TP53 ubiquitination followed by proteasomal TP53
degradation. When DNA double-strand breaks occur in the cell, TP53 accumulates because of phosphorylation of MDM2 by the ATM kinase. This phosphorylation switches MDM2 from a negative to a positive regulator of TP53 by causing MDM2 to bind the TP53 mRNA and to promote its translation. ATM also phosphorylates residue S15 on the nascent TP53 protein, and this phosphorylation prevents the newly produced TP53 protein from being immediately ubiquitinated by MDM2 and degraded by the proteasome. Interestingly, the synonymous c.66A > G nucleotide change in codon L22 of TP53 prevents this phosphorylation of the nascent protein, leading to TP53 degradation (Karakostis et al. 2019). Whereas this synonymous variant clearly affects TP53 function, it is very rare in cancer. The mutation has been reported only once, in a patient with chronic lymphocytic leukemia (CLL) (Oscier et al. 2002), and the variant also corresponds to a rare SNP (rs748527030).
5.4.5 Other Mechanisms
For the synonymous mutations described above, a (potential) mechanism by which these variants affect the protein expression level of the mutated gene has been proposed or experimentally demonstrated. However, the mechanism is not always clear. An interesting example is the c.2361G > A (p.Q787Q) polymorphism (SNP rs1050171) in the epidermal growth factor receptor (EGFR). This SNP has been linked to higher EGFR translation and reduced sensitivity to EGFR targeting drugs in the context of renal cell carcinoma (Grepin et al. 2020). In complete contrast, this SNP induces higher sensitivity to EGFR targeting drugs in head and neck cancer, where it alters EGFR splicing because of reduced expression of the EGFR-AS1 long noncoding RNA (Tan et al. 2017). In fact, a number of synonymous SNPs in EGFR have been linked to drug sensitivity and patient outcome, but mechanistic details on these effects are missing (Toomey et al. 2016). The variety of molecular mechanisms by which synonymous mutations may affect gene expression is very large, and it is to be expected that synonymous mutations for which mechanistic insights are currently lacking act through less characterized mechanisms such as exonic DNA methylation, transcription regulatory sequences, or post-transcriptional RNA modifications.
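The variant names used throughout this section pair a coding DNA position with a codon number, and the relationship between the two is simple arithmetic. The sketch below assumes the usual convention that coding position 1 is the A of the ATG start codon, and it uses the standard genetic code to list which codon changes are consistent with each annotated synonymous variant. The function names are introduced here for illustration, and the output shows what the annotation implies under these assumptions rather than a lookup of the actual gene sequences.

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_location(cdna_pos):
    """Map a 1-based coding position to (codon number, position within the codon)."""
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


def consistent_codon_changes(cdna_pos, ref, alt, amino_acid):
    """List (wild type codon, mutant codon) pairs compatible with a synonymous change."""
    _, offset = codon_location(cdna_pos)
    pairs = []
    for codon, aa in CODON_TABLE.items():
        if aa != amino_acid or codon[offset - 1] != ref:
            continue
        mutant = codon[:offset - 1] + alt + codon[offset:]
        if CODON_TABLE[mutant] == amino_acid:
            pairs.append((codon, mutant))
    return pairs


# Variants as named in the text: (coding position, reference base, new base, amino acid).
variants = {
    "ZFP36 c.309C>T (p.R103R)": (309, "C", "T", "R"),
    "BCL2L12 c.51C>T (p.F17F)": (51, "C", "T", "F"),
    "TP53 c.66A>G (p.L22L)": (66, "A", "G", "L"),
    "EGFR c.2361G>A (p.Q787Q)": (2361, "G", "A", "Q"),
}

for name, (pos, ref, alt, aa) in variants.items():
    codon_number, offset = codon_location(pos)
    changes = consistent_codon_changes(pos, ref, alt, aa)
    print(f"{name}: codon {codon_number}, base {offset}, consistent changes {changes}")
```

All four variants fall on the third base of their codon, which is where most of the redundancy of the genetic code resides. For the leucine variant two codon changes are compatible with the annotation, so the codon actually present can only be determined from the gene sequence itself.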
5.5 What Is Needed to Entirely Break the Silence on Synonymous Mutations?
In the past 15 years, next generation sequencing analysis of large cohorts of cancer samples has resulted in a detailed atlas of mutations. Significant efforts have gone towards characterizing the functional impact of non-synonymous mutations in protein coding regions and recently also of non-coding mutations. Relatively little
attention has been dedicated to the characterization of synonymous mutations, despite the fact that they represent 23% of the point mutations in the coding region (Sharma et al. 2019). Based on biostatistical analyses, it is estimated that somatic synonymous mutations represent 6–8% of all cancer driver mutations due to single nucleotide substitutions (Supek et al. 2014). This is, however, only a rough estimate, as the number of somatic synonymous mutations that have been experimentally tested is highly limited. Furthermore, this experimental testing is often restricted to showing an effect of the mutation on the mRNA and/or protein expression level of the mutant gene. In order to draw conclusions on cancer driver activity, cell transformation assays and/or tumor acceleration studies in animal models would be required. Transformation of a normal cell into a malignant one typically requires accumulation of a series of mutations. It is unclear whether synonymous mutations act as early mutations that bring the cell into a pre-malignant state, or as late mutations that release the brake on full-blown cell transformation. Single cell sequencing studies can shed light on this mutational order. As described in this text, a number of synonymous SNPs have been shown to affect the protein expression level of cancer genes, resulting in altered drug sensitivity. It will be of interest to see whether somatic synonymous mutations can also be linked to therapy failure and disease reappearance in relapse samples.
Acknowledgements Research on synonymous mutations in the lab of Prof. De Keersmaecker is funded by an ERC consolidator grant (grant no 862246). Xander Janssens is funded by an FWO aspirant mandate fundamental research with number 1174021N.
References
Adzhubei IA et al (2010) A method and server for predicting damaging missense mutations. Nat Methods 7:248–249 Alexandrov LB et al (2013) Signatures of mutational processes in human cancer. Nature 500:415–421 Ali M et al (2017) Codon bias imposes a targetable limitation on KRAS-driven therapeutic resistance. Nat Commun 8:15617 Anreiter I et al (2020) New twists in detecting mRNA modification dynamics. Trends Biotechnol 39(1):72–89. https://doi.org/10.1016/j.tibtech.2020.06.002 Beroud C et al (2016) BRCA share: a collection of clinical BRCA gene variants. Hum Mutat 37:1318–1328 Buchan JR et al (2006) tRNA properties help shape codon pair preferences in open reading frames. Nucleic Acids Res 34:1015–1027 Buske OJ et al (2013) Identification of deleterious synonymous variants in human genomes. Bioinformatics 29:1843–1850 Calin GA et al (2004) Human microRNA genes are frequently located at fragile sites and genomic regions involved in cancers. Proc Natl Acad Sci U S A 101:2999–3004 Calin GA et al (2006) MicroRNA signatures in human cancers. Nat Rev Cancer 6:857–866 Cannarozzi G et al (2010) A role for codon order in translation dynamics. Cell 141:355–367
Carrick DM et al (2006) Genetic variations in ZFP36 and their possible relationship to autoimmune diseases. J Autoimmun 26:182–196 Cartegni L et al (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298 Cerami E et al (2012) The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov 2:401–404 Chang T-C et al (2008) Widespread microRNA repression by Myc contributes to tumorigenesis. Nat Genet 40:43–50 Chu D et al (2019) Nonsynonymous, synonymous and nonsense mutations in human cancer- related genes undergo stronger purifying selections than expectation. BMC Cancer 19:359 Consortium ITP-CAoWG (2020) Pan-cancer analysis of whole genomes. Nature 578:82–93 Cooper TA (2005) Use of minigene systems to dissect alternative splicing elements. Methods (San Diego, Calif) 37:331–340 Coppa A et al (2018) Optimizing the identification of risk-relevant mutations by multigene panel testing in selected hereditary breast/ovarian cancer families. Cancer Med 7:46–55 Crispino JD et al (2017) GATA factor mutations in hematologic disease. Blood 129:2103–2110 Desmet F-O et al (2009) Human splicing finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 37:e67 Desviat LR et al (2012) Minigenes to confirm exon skipping mutations. Methods Mol Biol (Clifton, NJ) 867:37–47 Diederichs S et al (2016) The dark matter of the cancer genome: aberrations in regulatory elements, untranslated regions, splice sites, non-coding RNA and synonymous mutations. EMBO Mol Med 8:442–457 Ding L et al (2008) Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455:1069–1075 Downward J (2003) Targeting RAS signalling pathways in cancer therapy. Nat Rev Cancer 3:11–22 Fraile-Bethencourt E et al (2019) Mis-splicing in breast cancer: identification of pathogenic BRCA2 variants by systematic minigene assays. 
J Pathol 248:409–420 Gartner JJ et al (2013) Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proc Natl Acad Sci U S A 110:13481–13486 Gonzalez-Perez A et al (2012) Improving the prediction of the functional impact of cancer mutations by baseline tolerance transformation. Genome Med 4:89 Grepin R et al (2020) The combination of bevacizumab/Avastin and erlotinib/Tarceva is relevant for the treatment of metastatic renal cell carcinoma: the role of a synonymous mutation of the EGFR receptor. Theranostics 10:1107–1121 Griseri P et al (2011) A synonymous polymorphism of the Tristetraprolin (TTP) gene, an AU-rich mRNA-binding protein, affects translation efficiency and response to Herceptin treatment in breast cancer patients. Hum Mol Genet 20:4556–4568 Hanahan D et al (2000) The hallmarks of cancer. Cell 100:57–70 Hanahan D et al (2011) Hallmarks of cancer: the next generation. Cell 144:646–674 Hansen TVO et al (2010) The silent mutation nucleotide 744 G --> A, Lys172Lys, in exon 6 of BRCA2 results in exon skipping. Breast Cancer Res Treat 119:547–550 Hanson G et al (2018) Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol 19:20–30 He L et al (2007) A microRNA component of the p53 tumour suppressor network. Nature 447:1130–1134 Hirabayashi S et al (2017) Heterogeneity of GATA2-related myeloid neoplasms. Int J Hematol 106:175–182 Jayasinghe RG et al (2018) Systematic analysis of splice-site-creating mutations in cancer. Cell Rep 23:270–281 Jung H et al (2015) Intron retention is a widespread mechanism of tumor-suppressor inactivation. Nat Genet 47:1242–1248
Kames J et al (2020) TissueCoCoPUTs: novel human tissue-specific codon and codon-pair usage tables based on differential tissue gene expression. J Mol Biol 432:3369–3378 Karakostis K et al (2019) A single synonymous mutation determines the phosphorylation and stability of the nascent protein. J Mol Cell Biol 11:187–199 Karube Y et al (2005) Reduced expression of Dicer associated with poor prognosis in lung cancer patients. Cancer Sci 96:111–115 Kozyra EJ et al (2020) Synonymous GATA2 mutations result in selective loss of mutated RNA and are common in patients with GATA2 deficiency. Leukemia 34:2673–2687 Lampson BL et al (2013) Rare codons regulate KRas oncogenesis. Curr Biol 23:70–75 Lawrence MS et al (2013) Mutational heterogeneity in cancer and the search for new cancer- associated genes. Nature 499:214–218 Livingstone M et al (2010) Mechanisms governing the control of mRNA translation. Phys Biol 7(2):021001. https://doi.org/10.1088/1478-3975/7/2/021001 Mahlab S et al (2012) Conservation of the relative tRNA composition in healthy and cancerous tissues. RNA 18:640–652 Martincorena I et al (2017) Universal patterns of selection in cancer and somatic tissues. Cell 171:1029–1041 Mavrakis KJ et al (2010) Genome-wide RNA-mediated interference screen identifies miR-19 targets in Notch-induced T-cell acute lymphoblastic leukaemia. Nat Cell Biol 12(4):372–379. https://doi.org/10.1038/ncb2037 Minucci A et al (2018) Preliminary molecular evidence associating a novel BRCA1 synonymous variant with hereditary ovarian cancer syndrome. Hum Genome Var 5:2 Montera M et al (2001) A silent mutation in exon 14 of the APC gene is associated with exon skipping in a FAP family. J Med Genet 38:863–867 Ng PC et al (2003) SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 31:3812–3814 Nowakowski J et al (1997) RNA structure and stability. 
Semin Virol 8:153–165 O’Donovan PJ et al (2010) BRCA1 and BRCA2: breast/ovarian cancer susceptibility gene products and participants in DNA double-strand break repair. Carcinogenesis 31:961–967 Oscier DG et al (2002) Multivariate analysis of prognostic factors in CLL: clinical stage, IGVH gene mutational status, and loss or mutation of the p53 gene are independent prognostic factors. Blood 100:1177–1184 Pecina-Slaus N et al (2010) Report on mutation in exon 15 of the APC gene in a case of brain metastasis. J Neuro-Oncol 97:143–148 Peifer M et al (2012) Integrative genome analyses identify key somatic driver mutations of small- cell lung cancer. Nat Genet 44:1104–1110 Peng Y et al (2016) The role of microRNAs in human cancer. Signal Transduct Target Ther 1:15004 Pepe D et al (2020) Codon bias analyses on thyroid carcinoma genes. Minerva Endocrinol 45(4):295–305. https://doi.org/10.23736/S0391-1977.20.03252-6 Qian W et al (2012) Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet 8:e1002603 Reese MG et al (1997) Improved splice site detection in Genie. J Comput Biol 4:311–323 Rheinbay E et al (2020) Analyses of non-coding somatic drivers in 2,658 cancer whole genomes. Nature 578:102–111 Sabarinathan R et al (2013) RNAsnp: efficient detection of local RNA secondary structure changes induced by SNPs. Hum Mutat 34:546–556 Saito Y et al (2006) Specific activation of microRNA-127 with downregulation of the proto- oncogene BCL6 by chromatin-modifying drugs in human cancer cells. Cancer Cell 9:435–443 Salari R et al (2013) Sensitive measurement of single-nucleotide polymorphism-induced changes of RNA conformation: application to disease studies. Nucleic Acids Res 41:44–53 Sanduja S et al (2012) The role of tristetraprolin in cancer and inflammation. Front Biosci (Landmark Edition) 17:174–188
Sauna ZE et al (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691 Scott GK et al (2006) Rapid alteration of microRNA levels by histone deacetylase inhibition. Cancer Res 66:1277–1281 Shabalina SA et al (2006) A periodic pattern of mRNA secondary structure created by the genetic code. Nucleic Acids Res 34:2428–2437 Sharma Y et al (2019) A pan-cancer analysis of synonymous mutations. Nat Commun 10:2569 Sondka Z et al (2018) The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat Rev Cancer 18:696–705 Soudon J et al (1991) Inactivation of the p53 gene-expression by a splice donor site mutation in a human T-cell leukemia cell line. Leukemia 5:917–920 Soussi T et al (2017) Synonymous somatic variants in human cancer are not infamous: a plea for full disclosure in databases and publications. Hum Mutat 38:339–342 Stegh AH et al (2010) Glioma oncoprotein Bcl2L12 inhibits the p53 tumor suppressor. Genes Dev 24:2194–2204 Stoss O et al (1999) The in vivo minigene approach to analyze tissue-specific splicing. Brain Res Brain Res Protoc 4:383–394 Supek F et al (2014) Synonymous mutations frequently act as driver mutations in human cancers. Cell 156:1324–1335 Tan DSW et al (2017) Long noncoding RNA EGFR-AS1 mediates epidermal growth factor receptor addiction and modulates treatment response in squamous cell carcinoma. Nat Med 23:1167–1175 Teng H et al (2020) Prevalence and architecture of posttranscriptionally impaired synonymous mutations in 8,320 genomes across 22 cancer types. Nucleic Acids Res 48:1192–1205 Thomson JM et al (2006) Extensive post-transcriptional regulation of microRNAs and its implications for cancer. Genes Dev 20:2202–2207 Toomey S et al (2016) The impact of ERBB-family germline single nucleotide polymorphisms on survival response to adjuvant trastuzumab treatment in HER2-positive breast cancer. 
Oncotarget 7:75518–75525 Tsai C-J et al (2008) Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J Mol Biol 383:281–291 Varley JM et al (1998) Genetic and functional studies of a germline TP53 splicing mutation in a Li-Fraumeni-like family. Oncogene 16:3291–3298 Warneford SG et al (1992) Germ-line splicing mutation of the p53 gene in a cancer-prone family. Cell Growth Differ 3:839–846 Yang J-R et al (2014) Codon-by-codon modulation of translational speed and accuracy via mRNA folding. PLoS Biol 12:e1001910 Yuan Y et al (2019) Functional microRNA binding site variants. Mol Oncol 13:4–8 Zhang D et al (2020) Somatic synonymous mutations in regulatory elements contribute to the genetic aetiology of melanoma. BMC Med Genet 13:43
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Part IV
An Examination of the Mechanisms by Which Synonymous Mutations Affect Protein Levels or Protein Folding Which Affect Human Physiology and Response to Therapy
Chapter 6
An Examination of Mechanisms by which Synonymous Mutations may Alter Protein Levels, Structure and Functions
Yiming Zhang and Zsuzsa Bebok
6.1 Introduction
In this chapter, we review mechanisms by which synonymous codon variations influence gene expression at the mRNA or protein levels. These changes may alter the structure and function of the encoded proteins, leading to disease, altering disease phenotype, or interfering with protein-drug interactions. To appreciate the significance of synonymous mutations, we first look at the concept of codon redundancy and at the steps of protein biogenesis that can be influenced by synonymous codon variations.
Concepts of Codon Redundancy and Codon Bias. In mRNA, 61 possible three-letter codons encode 20 amino acids. Therefore, most amino acids, except methionine and tryptophan, are encoded by multiple codons, a phenomenon termed codon redundancy (Dufton 1983). The frequency of synonymous codons in the genome is not random. Some codons are used more frequently than others, and codon bias refers to this preferred synonymous codon usage in each organism or gene (Sharp and Li 1986).
Synonymous Codons Carry Protein Structure Information. It has been demonstrated that synonymous codons carry protein structural information (Saunders and Deane 2010; Saunders et al. 2011; Liu et al. 2021; Hanson and Coller 2018), and our understanding of the mechanisms by which this occurs is rapidly advancing.
Y. Zhang · Z. Bebok (*) Department of Cell, Developmental and Integrative Biology (CDIB), The University of Alabama at Birmingham, Birmingham, AL, USA e-mail: [email protected] © Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_6
Novel experimental approaches allow scientists to test how synonymous codon substitutions modify cotranslational protein folding in vivo (Kim et al. 2015; Walsh et al. 2020). The results from these studies corroborate the idea that the choice of synonymous codons has a significant influence on protein conformation (Buhr et al. 2016) and cellular fitness (Buhr et al. 2016; O’Brien et al. 2014). These discoveries support the theory that the redundancy of the genetic code enabled evolutionary processes to optimize gene expression in organisms (Tuller 2014; Tuller et al. 2010a).
Biological Examples Suggesting That Codon Usage Influences Protein Biogenesis and Structure. Pechmann and colleagues have identified universal patterns of conserved optimal and non-optimal codons and codon clusters that govern the secondary structures of the encoded proteins; these patterns are independent of protein expression levels. The authors argued that codon usage evolved to fine-tune the rhythm of translation, which optimizes cotranslational folding rather than enhancing protein expression levels (Pechmann and Frydman 2013). Similar conclusions were drawn by Dhindsa and colleagues following the analysis of codon usage in ~200,000 individuals (Dhindsa et al. 2020). It has been determined experimentally that synonymous codons may direct proteins towards a different conformation and consequently a different function, supporting the idea that synonymous codons provide a secondary “code” for protein folding (Buhr et al. 2016). These findings suggest that protein folding is influenced by both thermodynamic and kinetic laws (Whitesides and Grzybowski 2002). This principle can be illustrated by multi-domain membrane proteins. In addition to their amino acid compositions, their folding depends on the dynamics of translation involving both cotranslational folding and posttranslational domain assembly events (see Schwarz and Beck 2019 and Marinko et al. 2019 for outstanding recent reviews on membrane protein folding). As we discuss in this chapter, the majority of proteins that are sensitive to synonymous codon changes are enzymes (Nielsen et al. 2007), integral membrane proteins such as receptors (Duan et al. 2003; Nackley et al. 2006), and transporters (Kimchi-Sarfaty et al. 2007; Bartoszewski et al. 2010; Fung et al. 2014; Rauscher et al. 2021; Rauscher and Ignatova 2018), all of which require intricate cotranslational and posttranslational folding-assembly processes for proper physiological function. An in vivo yeast study investigating translation dynamics of membrane proteins proposed that codon usage at the beginning of the mRNA sequences promotes the attachment of the signal recognition particle (SRP), which subsequently targets the mRNA-protein complex to the ER membrane. These mRNAs contain 35–40 non-optimal codons downstream of the SRP-binding site to slow down translation (Pechmann et al. 2014). Indeed, the “ramp theory” of mRNA translation indicates that rare codons are preferred for the first fifty amino acids. They moderate translational kinetics and avoid ribosome crowding (Tuller et al. 2010a; Fredrick and Ibba 2010; Tuller et al. 2010b). Rare codons, which are translated slowly (Marin 2008; Wu et al. 2004), often define specific secondary structures to delineate boundaries of protein domains. They also contribute to the accuracy of cotranslational protein
folding (Adzhubei et al. 1996; Komar and Jaenicke 1995; Zhang et al. 2005; Tsai et al. 2008). In general, codon usage is considered an important determinant of translational kinetics because rare codons are translated at a slower rate than frequent codons (Yu et al. 2015).
Synonymous Mutations Affecting Cellular Fitness, Disease Penetrance and Severity. Specific synonymous codon usage patterns define organisms; therefore, synonymous mutations that switch a codon to a synonym are likely to have significant consequences on protein biogenesis. Advancements in sequencing technology have led to the identification of a large number of synonymous SNPs (sSNPs) that are associated with disease penetrance and severity (Rauscher et al. 2021; Bampi et al. 2020; McCarthy et al. 2017). Dhindsa and colleagues found strong evidence that natural selection optimized codon content in the human genome and discovered that dosage-sensitive, DNA damage-response, and cell cycle regulation genes are the most intolerant to synonymous changes (Dhindsa et al. 2020). These findings are supported by another study demonstrating rare codon-mediated regulation of translation during the cell cycle (Guimaraes et al. 2020). These developments emphasize the significance of evolutionarily selected synonymous codon usage in the human genome and strongly suggest that synonymous mutations that alter this pattern may lead to changes in gene expression. Most importantly, these discoveries underline the still underappreciated significance of synonymous mutations in human variation, disease severity and penetrance.
Predicting the Consequences of Synonymous Mutations. As shown by the examples in this chapter, experimental confirmation of the consequences of synonymous mutations is often difficult, and the results can be limited by the experimental systems applied. Fortunately, modern computing technologies have greatly augmented our ability to predict and then experimentally confirm the effects of genetic variations.
They help us understand the molecular underpinnings of human disorders as well as genomic evolution. It is now well accepted that controlled translation dynamics evolved to fine-tune protein folding, expression, and function. Alongside biological experimentation, rapid progress in computational science has produced tools to predict the consequences of genetic variations. To annotate human genetic variations, some tools use a single type of information, such as the RNA sequence to predict mRNA secondary structures (Zuker 2003; Sabarinathan et al. 2013), codon usage comparisons (Sharp and Li 1987), or tRNA abundance (Zhang and Ignatova 2009), while others integrate diverse annotations into one score for each variant, such as Combined Annotation-Dependent Depletion (CADD; Kircher et al. 2014; Rentzsch et al. 2021; Rentzsch et al. 2019). As for protein structure predictions, computational molecular physics (CMP) can predict and determine detailed conformational possibilities of proteins in space and time. In a recent review, Brini et al. summarized how CMP helps to decipher the driving forces of conformations and protein folding in high resolution, so each protein can tell its own story (Brini et al. 2020). In summary, the incorporation of biological
Y. Zhang and Z. Bebok
experimental tools and computational approaches helps us understand the consequences of synonymous mutations. Protein Synthesis and Folding Protein synthesis begins with the attachment of mRNAs to ribosomes. Actively translated mRNAs unfold and pass through the ribosomes unidirectionally, one codon at a time. Once translation begins on the ribosomes, the consecutive step in protein biogenesis is the selection of the tRNA carrying the corresponding amino acid and formation of the peptide bond (Rodnina and Wintermeyer 2016). Subsequently, the translated nascent polypeptide extrudes from the tunnel of the large ribosomal subunit. As domains of proteins exit the ribosome, they form secondary and tertiary structures and begin to fold into their native structures cotranslationally, with the help of chaperones. Secretory proteins enter the ER lumen, and membrane proteins are inserted into the ER membrane through the translocon complex and traffic through the secretory pathway in a highly regulated manner (see Schwarz and Beck 2019; Cassaignau et al. 2020 for review). Proper protein folding is necessary to reach the desired conformation that enables the physiological function of the particular protein during its lifespan. Because synonymous codons may affect the folding process, the functions of the proteins can be altered by such synonymous mutations.
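As a minimal sketch of what "synonymous" means at the codon level, the following Python snippet checks whether two codons encode the same amino acid under the standard genetic code. The codon pairs are the substitutions discussed in this chapter; the function name `is_synonymous` and the table construction are illustrative assumptions, not a tool used by any of the cited studies.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(ref_codon: str, alt_codon: str) -> bool:
    """Return True if the two codons differ but encode the same amino acid."""
    ref, alt = ref_codon.upper(), alt_codon.upper()
    return ref != alt and CODON_TABLE[ref] == CODON_TABLE[alt]


# Codon changes mentioned in this chapter (reference codon, variant codon).
CHAPTER_EXAMPLES = [
    ("ATC", "ATT"),  # MDR1 C3435T, isoleucine
    ("ACT", "ACG"),  # CFTR c.2562T>G, threonine
    ("GAG", "GAA"),  # CFTR c.1584G>A, glutamic acid
    ("GGG", "GGT"),  # CFTR c.2811G>T, glycine
    ("CCG", "CCA"),  # ADAMTS13 c.354G>A, proline
    ("TTC", "TTT"),  # HGPRT exon 8, phenylalanine
]

for ref, alt in CHAPTER_EXAMPLES:
    print(ref, "->", alt, CODON_TABLE[ref], is_synonymous(ref, alt))
```

Such a check only confirms that the amino acid sequence is unchanged; it says nothing about the splicing, mRNA structure, or translation kinetics effects reviewed in this chapter.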
6.2 Proposed Mechanisms by Which Synonymous Mutations Alter Translation and Protein Folding We are just beginning to decipher the complexity of mechanisms by which synonymous mutations influence protein synthesis. Studies analyzing the role of synonymous codon substitutions in protein expression and function provide varying hypotheses for the primary cause by which synonymous codon variants affect mRNA levels, translation initiation, elongation, translational dynamics, efficiency, and protein folding. Here we review the different theories, including pre-mRNA processing, change in codon optimality and tRNA abundance, alterations in mRNA stability (half-life), and mRNA secondary structures (Fig. 6.1). We review synonymous mutations in genes that are linked to human disorders and how investigators analyzed the mechanisms by which these mutations contribute to disease development and changes in phenotype. These reports suggest that synonymous codon changes affect protein folding and function by multiple mechanisms. With a growing database of synonymous mutations and predictions of their consequences, experimental approaches can link them to phenotypic alterations. As our understanding of the mechanisms by which synonymous codon usage contributes to disease development and severity expands, studies analyzing the consequences of synonymous codon changes also demonstrate the extreme complexity of such mutations.
6 An Examination of Mechanisms by which Synonymous Mutations may Alter…
Fig. 6.1 Steps and mechanisms of protein biogenesis affected by synonymous mutations. Synonymous mutations may affect: (1) Pre-mRNA splicing; (2) mRNA structure; (3) mRNA thermodynamic stability; (4) mRNA degradation; (5) translation initiation; (6) binding of coding sequence specific miRNAs; (7) switch to a different abundance tRNA; (8) protein elongation; (9) cotranslational protein folding; (10) chaperone binding; (11) proteasomal degradation and aggregation
6.2.1 Synonymous Mutations Affecting Pre-mRNA Processing Mammalian pre-mRNAs typically consist of protein-coding and non-coding regions, referred to as exons and introns, respectively. During mRNA processing, a 5’ guanine cap is added, introns are spliced out, and poly-adenine tails are added. Mature mRNAs contain 5’ and 3’ untranslated regions (UTR), and in the coding region, exons are precisely linked together. Improper identification of exon-intron boundaries during splicing may generate defective mRNAs and low-level or dysfunctional proteins (for a historical review on pre-mRNA processing please refer to Scherrer 2018). In higher eukaryotes, exon-intron junctions are defined by intronic and exonic cis-elements that are necessary for proper splice-site recognition. Intronic cis-elements are highly conserved and localized to both 5’ and 3’ splice sites, with an A residue in the center of the intron serving as branch point site. Exonic cis-elements are referred to as splicing enhancers (ESE) and silencers (ESS)
Fig. 6.2 The consequences of synonymous mutations on pre-mRNA processing. Left image (Gene) shows correct splicing, which yields a stable mRNA and functional protein; right image (Gene + sSNP) depicts incorrect splicing, which yields an unstable mRNA and dysfunctional protein. Figure labels: 7-methylguanosine cap; spliceosome. CBC: cap binding complex; ESE: exonic splicing enhancer; ESS: exonic splicing suppressor
and may promote or repress mRNA splicing by binding proteins. Here we concentrate on sSNPs that localize to ESE and ESS regions and lead to splicing defects (Fig. 6.2). Excellent reviews on RNA processing have been published elsewhere (Canson et al. 2020; Carrocci and Neugebauer 2019; Tellier et al. 2020; Xu et al. 2021). ESEs are prevalent sites of mutations (Liu et al. 1998). Up to 15% of all disease-causing point mutations may disrupt mRNA splicing, an estimate supported by the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk). Furthermore, sSNPs near exon-intron junctions might be under-reported if they are assumed to be neutral. As discussed in the following paragraphs, synonymous mutations close to intron-exon junctions may lead to a variety of human disorders. 6.2.1.1 Examples of sSNP-Associated Pre-mRNA Processing Defects X-linked metabolic disorder, or Leigh's encephalomyelopathy: This disorder is caused by pyruvate dehydrogenase (PDH)-E1α deficiency. A silent mutation, denoted as 660 A>G in the PDH-E1 gene, has been reported to cause a severe case of familial Leigh’s syndrome through aberrant splicing that results in the skipping of exon 6. Because exon 6 contains a thiamine pyrophosphate-binding site that is necessary for the proper function of the enzyme, skipping this exon results in a dysfunctional protein (De Meirleir et al. 1994; Cardozo et al. 2000). Juvenile gout, Lesch–Nyhan, or Kelley-Seegmiller syndrome (see details at the portal for rare disorders: https://www.orpha.net/consor/cgi-bin/OC_Exp.php?lng=EN&Expert=79233): The synonymous TTC>TTT mutation in exon 8 of the human X-linked hypoxanthine-guanine phosphoribosyltransferase (HGPRT) gene causes extensive exon 8 skipping, leading to juvenile gout. The authors
proposed that the mutation triggered mRNA secondary structure changes leading to inefficient exposure of the proper splicing sites (Steingrimsdottir et al. 1992). This example indicates that mRNA structures, discussed later in this chapter, may influence protein biogenesis by multiple mechanisms. Phenylketonuria: The synonymous c.1197A>T mutation in the human phenylalanine hydroxylase gene, in the codon for valine at position 399 (V399V), lies near the exon-intron junction at the 3’ end of exon 11 and leads to skipping of exon 11. The consequence is phenylketonuria, a metabolic disorder characterized by seizures and intellectual disability (Chao et al. 2001; Wang et al. 2019). Proximal spinal muscular atrophy (SMA): SMA is a lethal autosomal recessive neurodegenerative disorder caused by the loss of function of the survival motor neuron gene product 1 (SMN1) (https://rarediseases.info.nih.gov/diseases/4531/proximal-spinal-muscular-atrophy). Interestingly, a second copy, named SMN2, localizes to a duplicated region of chromosome 5q13, but the protein encoded by this gene has only minimal function. This dysfunction is caused by a synonymous single-nucleotide change (C→T) at position +6 of exon 7 (C6T), which causes the exclusion of exon 7 from the SMN2 mRNA and the translation of an unstable protein. The underlying molecular mechanism has been attributed to the failure of splicing factors to recognize the ESE (Monani et al. 1999). Treacher Collins syndrome (TCS): TCS is an autosomal-dominant disorder with severe craniofacial morphogenesis defects (Grzanka and Piekielko-Witkowska 2021). Interestingly, a de novo synonymous mutation (c.3612A>C) was observed in a patient who also carried a benign TCS-associated variant (c.122C>T) that alone did not cause TCS in other family members.
Analysis of the mRNA of the TCS-associated gene TCOF1, which encodes the 1488-amino-acid nucleolar phosphoprotein treacle, in the patient with the synonymous mutation (c.3612A>C) showed a transcript missing exon 22 (Macaya et al. 2009). These results indicate that TCS in this patient is caused by the de novo synonymous c.3612A>C mutation, which results in exon skipping. Polycystic kidney disease: Polycystic kidney disease is the most common monogenic human disorder. Many disease-causing mutations affect the PKD1 gene, and many of them cause splicing defects (Claverie-Martin et al. 2015). An interesting study applied bioinformatics tools to select mutations with potential effects on pre-mRNA splicing and tested them using minigene assays. Two synonymous variants, c.327A>T (p.G109G) and c.11257C>A (p.R3753R), have been shown to induce splicing defects by introducing strong donor splice sites in exons 3 and 39, respectively, resulting in aberrant mRNAs (Claverie-Martin et al. 2015; Gonzalez-Paredes et al. 2014). A subsequent study identified the c.1716G>A (p.K572K) variant, which leads to exon 7 skipping. Cystic fibrosis: Mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene are responsible for cystic fibrosis (CF), an autosomal recessive disorder with severe digestive and respiratory symptoms that is frequent in people of European descent. CFTR encodes a complex transporter protein with two membrane-spanning domains, two nucleotide-binding domains, and a regulatory domain that serves as a chloride and bicarbonate transporter (for review: Dickinson and Collaco 2021). Over
2000 mutations have been identified in CFTR (http://www.genet.sickkids.on.ca/app), resulting in a highly variable phenotype and penetrance (Cutting 2015). Significant efforts have been directed towards functional classification of the mutations (Veit et al. 2016), and the cystic fibrosis field has made major progress in identifying highly abundant sSNPs in CFTR (Hill et al. 2014; Tsui and Dorfman 2013). This led to investigations of their molecular and functional consequences (Bartoszewski et al. 2010; Rauscher et al. 2021; Rauscher and Ignatova 2018; Bampi et al. 2020; Lazrak et al. 2013; Bali et al. 2016a, b; Bartoszewski et al. 2016; Kirchner et al. 2017; Polte et al. 2019). We review these developments under each proposed mechanism by which the synonymous change may alter CFTR function. We begin with the effects of synonymous substitutions on the splicing of CFTR pre-mRNA. Using splicing of exon 12 of CFTR as the model, Pagani and colleagues concentrated on synonymous substitutions that are likely to affect pre-mRNA processing (Pagani et al. 2005). They demonstrated that ~25% of random synonymous variations resulted in exon skipping. The suggested underlying mechanism is interference with the composite exonic regulatory element of splicing (CERES). The results implied that CERES sequence variations likely represent a frequent disease-causing mechanism and can also be responsible for phenotypic variability (Pagani et al. 2003). Importantly, the authors stressed the significance of accurate exon polymorphism reporting that includes synonymous changes. Interestingly, multiple random synonymous variations in exon 12 induced splicing defects, suggesting suboptimal splicing efficiency of the human CFTR gene. They demonstrated that the splicing effects of single synonymous substitutions were dependent on the exonic context.
These results implied that synonymous substitutions may alter splicing efficiency, leading to variable levels of exon loss (Raponi et al. 2007). As discussed previously for juvenile gout (Steingrimsdottir et al. 1992), synonymous mutations may alter mRNA structure and lead to splicing defects. Consistent with this idea, an analysis of the conservation of eukaryotic pre-mRNA and mRNA secondary structures, including the wild type and 29 synonymous variants of exon 12 of CFTR, concluded that the local structures of the mRNA may predict splicing efficiencies (Meyer and Miklos 2005). Cystic Fibrosis Caused by the c.2811G>T Synonymous Mutation The c.2811G>T synonymous mutation changes the GGG glycine codon at position 893 of the CFTR protein to the synonymous GGT codon. This mutation was observed in a patient with mild cystic fibrosis. Interestingly, the mutation created a new 5’ splice site within exon 15 and consequently a shorter exon 15. Although the downstream sequences of exon 16 were not affected, the partial exon exclusion resulted in dysfunctional CFTR protein (Faa et al. 2010). The c.1584G>A CFTR Variant with Multiple Consequences, Including Aberrant Splicing The c.1584G>A synonymous mutation changes the GAG glutamic acid codon to GAA. The synonymous change localizes to the last nucleotide of exon 11 and is therefore suspected of affecting pre-mRNA splicing. This mutation
was identified in an atypical form of cystic fibrosis in association with other synonymous mutations (Bampi et al. 2020). For the purposes of this section, we note that this mutation has been confirmed to cause exon 11 skipping as well as retention of a sequence from the downstream intron, but only at low frequency. In the upcoming sections we will review how this mutation, together with other sSNPs, led to cystic fibrosis. Cancer: An interesting study compiled a large number (>650,000) of synonymous mutations found in different human cancers, which can contribute to cancer development through multiple mechanisms (Sharma et al. 2019). Here we provide examples of cancer-associated synonymous mutations that cause splicing defects. TP53: Mutations in the gene of the DNA-binding p53 protein, including synonymous changes in TP53, are known risk factors for the development of a variety of cancers (Joruiz and Bourdon 2016). Interestingly, a de novo synonymous germline mutation, c.672G>A, in TP53 was discovered in a young patient with multiple sarcomas. This mutation created a cryptic 5’ splice site, resulting in a frameshift and a truncated protein with a disrupted DNA-binding domain (Austin et al. 2017). BRCA1: BRCA1 and BRCA2 are human tumor suppressor genes in which mutations account for a large percentage of hereditary breast and ovarian cancers (for review: https://www.cdc.gov/genomics/disease/breast_ovarian_cancer/genes_hboc.htm). Here we provide examples of synonymous mutations in the BRCA1 and BRCA2 genes that were identified in patients and caused splicing defects. The c.516G>A (Lys172Lys) mutation in exon 6 of BRCA2 was discovered in a family with ovarian and breast cancers. Analysis of blood samples from the patients verified that the mutation causes skipping of exons 5 and 6 (Hansen et al. 2010). More recently, the BRCA1 c.5073A>T variant was identified in a family with hereditary ovarian cancer (Minucci et al. 2018).
These examples support the idea that synonymous mutations can cause pre-mRNA processing defects, leading to multiple types of human disorders.
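Several of the examples above involve synonymous changes at or near exon boundaries (for instance, c.1584G>A at the last nucleotide of CFTR exon 11). A minimal Python sketch of how such variants could be flagged for closer inspection follows. The `Exon` class, the function names, the three-nucleotide window, and the exon start coordinate are illustrative assumptions; only the exon 11 end position (c.1584) is taken from the text, and real annotation pipelines use curated transcript models and splicing predictors.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    number: int
    start: int  # first cDNA position of the exon (1-based, inclusive)
    end: int    # last cDNA position of the exon (inclusive)


def junction_distance(pos: int, exons: List[Exon]) -> Tuple[int, int, int]:
    """Return (exon number, nucleotides from exon start, nucleotides to exon end)."""
    for exon in exons:
        if exon.start <= pos <= exon.end:
            return exon.number, pos - exon.start, exon.end - pos
    raise ValueError(f"position c.{pos} is not inside any listed exon")


def is_near_junction(pos: int, exons: List[Exon], window: int = 3) -> bool:
    """True if the position lies within `window` nucleotides of either exon edge."""
    _, from_start, to_end = junction_distance(pos, exons)
    return min(from_start, to_end) < window


# Hypothetical exon model: the end (c.1584) follows the text,
# the start (c.1393) is an assumed value used only for illustration.
cftr_exons = [Exon(number=11, start=1393, end=1584)]

print(junction_distance(1584, cftr_exons))  # last nucleotide of the exon
print(is_near_junction(1584, cftr_exons))
print(is_near_junction(1500, cftr_exons))
```

A positional flag of this kind can only prioritize candidates; as the PKD1 and CFTR exon 12 examples show, synonymous changes deeper inside an exon can also disrupt splicing.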
6.2.2 Altered Binding of Micro-RNAs Targeting Protein-Coding Regions (ORFs) MicroRNAs (miRNAs) are regulatory RNAs expressed in most organisms. The functions of miRNAs have been studied extensively during the past 20 years, and many excellent reviews on this topic have been published (Bartel 2009, 2018). In mammalian cells, miRNAs were initially shown to regulate gene expression through the 3’ untranslated regions (3’UTRs) of mRNAs by initiating their degradation (Guo et al. 2010). Subsequently, they have also been shown to regulate gene expression through 5’UTRs (Barrett et al. 2012; Zhou and Rigoutsos 2014) and by binding to
conserved coding regions of mammalian mRNAs and repressing translation (Forman et al. 2008; Forman and Coller 2010; Reczko et al. 2012; Zhang et al. 2018). The significance of coding sequence-specific miRNAs is illustrated by their interactions with the mRNAs of embryonic stem cell differentiation factors such as Nanog, Oct4, and Sox2. More importantly, the introduction of synonymous mutations into the miRNA targeting sites abolishes miRNA activity and delays differentiation (Tay et al. 2008a, b). This example suggests that, in addition to affecting physiological processes such as stem cell differentiation, synonymous mutations may alter coding sequence-specific miRNA binding and lead to pathological events, as we discuss here. 6.2.2.1 Examples of Human Disorders with sSNPs Altering Coding Sequence Targets of miRNAs Crohn’s disease: Crohn’s disease is an inflammatory bowel disease, and multiple types of genetic factors have been implicated in the development and severity of the disease (https://www.crohnscolitisfoundation.org/what-is-crohns-disease). One of the genes involved in Crohn’s disease encodes the autophagy-associated, immunity-related GTPase family M protein (IRGM) (Parkes et al. 2007). The susceptibility locus has been defined as a synonymous mutation, c.313T>C, and the neighboring region is a binding site for miR-196. The mutation inhibits miR-196 binding, leading to enhanced IRGM expression and misprocessing of Crohn’s disease-associated pathogens (Brest et al. 2011). Cancer: The sSNP (F17F) in the BCL2L12 gene has been associated with malignant melanoma, a skin cancer that develops from melanocytes, the pigment-producing cells of the skin. The BCL2L12 gene encodes the Bcl2L12 protein, which is crucial in the regulation of apoptosis by inhibiting the action of caspases (Stegh et al. 2007). The synonymous codon change inhibits the binding of miR-671-5p, leading to increased BCL2L12 mRNA and protein levels and, consequently, inhibition of apoptosis (Gartner et al. 2013).
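How a synonymous change can remove a miRNA target site may be sketched with a simple seed-match check in Python. The sketch assumes the common convention that the miRNA seed comprises nucleotides 2 to 8 and that a target site is the reverse complement of the seed in the mRNA. The miRNA and coding sequences below are invented for illustration and do not correspond to miR-196, miR-671-5p, IRGM, or BCL2L12; real target prediction also weighs site context, pairing outside the seed, and conservation.

```python
# RNA complement table for reverse-complementing the seed.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str) -> str:
    """Return the mRNA sequence (5'->3') that pairs with miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(COMPLEMENT)[::-1]


def has_seed_match(mrna: str, mirna: str) -> bool:
    """True if the mRNA contains a perfect match to the miRNA seed site."""
    return seed_site(mirna) in mrna.upper().replace("T", "U")


# Hypothetical miRNA and coding fragments (Met-Ala-Arg-Ser-Glu in both cases).
hypothetical_mirna = "UGACCUAGGCAUUCGAUCGGAU"
wild_type_cds = "AUGGCUAGGUCAGAA"  # Arg encoded by AGG
variant_cds = "AUGGCUAGAUCAGAA"    # synonymous AGG>AGA change

print(seed_site(hypothetical_mirna))
print(has_seed_match(wild_type_cds, hypothetical_mirna))
print(has_seed_match(variant_cds, hypothetical_mirna))
```

In this toy case a single synonymous nucleotide change removes the perfect seed match while leaving the encoded peptide unchanged, which mirrors the logic of the IRGM and BCL2L12 examples.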
6.2.3 Codon Optimality and tRNA Abundance Codon optimality refers to the use of codons with more abundant cognate tRNA species in contrast to non-optimal or rare codons with low tRNA abundance (Hanson and Coller 2018). However, the role of codon optimality and tRNA abundance in protein folding is extremely complex. Although the fundamental role of tRNAs in protein biogenesis was discovered over 60 years ago (Hoagland et al. 1958), their significance in the regulation of multiple cellular processes is still under investigation (for review see Schimmel 2018). It has been established that tRNA pools are organism (Fujishima and Kanai 2014), tissue (Dittmar et al. 2006; Torres et al. 2019), and cell type (Guimaraes et al. 2020; Kirchner and Ignatova 2015) specific. tRNA pools are also modified by multiple cellular conditions such
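as stress (Pechmann 2018; Torrent et al. 2018) and cancer (Santos et al. 2019). In addition, tRNA composition differs between proliferating and differentiated cells (Guimaraes et al. 2020). Correspondingly, mRNAs that are important during proliferation could be repressed in differentiated cells, suggesting a well-tuned coordination between transcription and translation (Gingold et al. 2014). Developments in tRNA transcriptome (tRNome) analysis established an expanded genetic diversity of human tRNAs with various sequence variations (isoacceptors, isodecoders), as well as diverse tRNA base modifications, making the tRNome extremely complex (Schimmel 2018; Berg et al. 2019). These studies suggest that highly controlled transcriptional and posttranscriptional regulatory processes define tRNA abundance in cells and tissues. In bacteria, in contrast to higher organisms, readily available tRNAs support fast protein synthesis and correspondingly rapid growth (Ikemura 1981; Hauber et al. 2016). Conversely, low-level gene expression is associated in most cases with an increased number of rare codons and low tRNA abundance (Berg and Kurland 1997). As mentioned earlier, determining whether tRNA abundance plays a role in protein biogenesis in multi-cellular organisms, and especially in mammals, is more complicated. Nonetheless, it has been established that, in general, rare codons are associated with low corresponding tRNA levels. Therefore, tRNA availability can affect the speed of protein synthesis, with consequent folding alterations and functional differences. Here we review how synonymous changes that switch a codon in the translated mRNA to one with a lower- or higher-abundance tRNA decoder may contribute to disease development by deregulating the naturally attained translation and protein folding profile (Schimmel 2018; Rak et al. 2018). 6.2.3.1 Switch from a Frequent to a Rare Codon with Low tRNA Abundance Multidrug resistance in cancer: The multidrug resistance 1 (MDR1) gene product, P-glycoprotein (P-gp), is an adenosine triphosphate (ATP)-binding cassette (ABC) transporter responsible for the efflux of chemotherapeutics from cancer cells. Therefore, functional changes in P-gp contribute to multidrug resistance of cancer cells (Gottesman et al. 2002). Although multiple SNPs have been identified in this gene, Kimchi-Sarfaty and colleagues demonstrated that one of the synonymous mutations in exon 26 (denoted as C3435T, or I1145-ATC>ATT) leads to the synthesis of a P-gp variant with altered drug and inhibitor binding properties (Kimchi-Sarfaty et al. 2007). Importantly, this mutation changes the isoleucine codon from ATC to its synonym ATT without altering the amino acid sequence of the protein. To investigate the mechanism by which the synonymous variant affected the function of the protein, they analyzed mRNA levels, mRNA half-life, protein levels, and protein structure by trypsin sensitivity assay. Although mRNA and protein levels of the I1145-ATC and I1145-ATT variants were similar, the structures and functions of the proteins were significantly different, leading to altered substrate and inhibitor binding. Following the analysis of synonymous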
6.2.3.1 Switch from a Frequent to a Rare Codon with Low tRNA Abundance Multidrug resistance in cancer: The multidrug resistance one gene (MDR1) gene product, P-glycoptotein (P-gp), is an adenosine triphosphate (ATP)-binding cassette (ABC) transporter, responsible for the efflux of chemotherapeutics from cancer cells. Therefore, functional changes in P-gp contribute to multidrug resistance of cancer cells (Gottesman et al. 2002). Although multiple SNPs have been identified in this gene, Kimchi-Sarfaty and colleagues demonstrated that one of the synonymous mutations in exon 26 (denoted as C3435T, or I1145-ATC>ATT), leads to the synthesis of a P-gp variant with altered drug and inhibitor binding properties (Kimchi-Sarfaty et al. 2007). Importantly, this mutation changes the isoleucine codon from ATC to its synonym ATT without altering the amino acid sequence of the protein. To investigate the mechanism by which the synonymous variant affected the function of the protein, they analyzed mRNA levels, mRNA half-life, protein levels, and structure by trypsin sensitivity assay. Although, mRNA and protein levels of the I1145 ATC and I1145ATT variants were similar, the structures and functions of the proteins were significantly different, leading to altered substrate and inhibitor binding. Following the analysis of synonymous
codon frequency for isoleucine in the MDR1 gene, the authors concluded that the mutation switched the isoleucine codon (ATC) to a rare codon (ATT) with low levels of tRNA decoder. Translation slows down at rare codons, which alters the timing of translation and cotranslational folding of P-gp and results in structural changes in the substrate and inhibitor binding sites. A subsequent study in a polarized epithelial monolayer demonstrated that, in addition to substrate specificity, the recycling time of P-gp was also affected by the sSNP (Fung et al. 2014), strongly supporting the idea that sSNPs may have multiple consequences during the life of the encoded protein. 6.2.3.2 Switch from a High to Very Low Abundance tRNA Decoder Investigating cellular tRNA abundance as a regulator of translation and protein folding, the Ignatova group made significant contributions to our understanding of how synonymous mutations may alter protein biogenesis by switching to codons with different tRNA availabilities (Rauscher and Ignatova 2018; Zhang and Ignatova 2009; Kirchner and Ignatova 2015; Czech et al. 2010; Gorochowski et al. 2015). In collaboration with prominent scientists from the cystic fibrosis field (Kleizen et al. 2000, 2005; Mijnders et al. 2017; Li et al. 2004; Sheppard 2004, 2011; Sheppard et al. 1995), they broadly analyzed the consequences of known sSNPs in CFTR (http://www.genet.sickkids.on.ca/app) on mRNA and protein levels in bronchial epithelial and HeLa cells (Kirchner et al. 2017). From these studies, we focus on the c.2562T>G sSNP. This mutation leads to a codon with a very low abundance tRNA decoder. We note that c.2562T>G refers to the location of the nucleotide change in the CFTR cDNA (http://www.genet.sickkids.on.ca/cftr/MutationDetailPage.external?sp=2142). The CFTR c.2562T>G sSNP: This sSNP switches the threonine-854 ACT codon to ACG in CFTR.
Analyzing tRNA abundance in airway epithelial cells where CFTR expression is highly important, the authors found that while the tRNA for threonine decoding ACT was moderately abundant, the tRNA to decode the ACG codon (tRNAthr(CGU)) was extremely rare. The authors concluded that the lack of tRNA to decode the ACG codon slows down translation and alters the cotranslational folding of c.2562T>G CFTR (Kirchner et al. 2017). Although the c.2562T>G sSNP alone does not cause cystic fibrosis, it is frequently associated with other, non-synonymous, disease-causing mutations. Moreover, it reduced CFTR protein expression in both cell lines tested, when compared to wild type CFTR. Reduced mRNA levels or alternative splicing were excluded as causes of reduced c.2562T>G CFTR protein levels. Analysis of CFTR maturation (trafficking of CFTR through the secretory pathway) did not yield significant differences between the c.2562T>G and wild type CFTR, indicating that the reduction in protein levels was not caused by enhanced ERAD of the c.2562T>G CFTR. As the logical next step, they measured the stabilities of the cell surface membrane localized c.2562T>G and wild type CFTRs. They found only subtle differences.
However, when the structures of the c.2562T>G and wild type CFTR were analyzed by limited proteolysis, the structural differences became apparent. Consequently, the altered topology of the protein in the membrane resulted in functional changes that were measured by physiological assays (patch clamp electrophysiology). One of the most important experimental results of this work, especially in relation to our review, is that when levels of the CGU-decoder tRNA (tRNAthr(CGU)) were increased in the cells, the consequences of the c.2562T>G sSNP on CFTR protein biogenesis and function were eliminated (Kirchner et al. 2017). These studies provide evidence that the availability of the appropriate decoder tRNA is necessary for proper protein folding and function. Association of the c.2562T>G Synonymous Mutation with Disease-Causing Missense Mutations Which Result in Codons with Low tRNA Abundance As described earlier, the c.2562T>G (p.T854T) sSNP slows down translation at this codon and results in protein structural defects because of the low abundance of tRNAs that decode ACG (Kirchner et al. 2017). Adding to the complexity of synonymous mutation-mediated protein folding defects, a recent study investigated how the c.2562T>G sSNP affects the biogenesis of known disease-causing missense CFTR mutants (Rauscher et al. 2021). Interestingly, the authors observed that the c.2562T>G sSNP augmented the biogenesis of multiple missense mutants located upstream (5’) of the c.2562T>G sSNP. The most significant positive effects (higher CFTR expression and function) were seen with missense mutations that affected translation speed by switching to codons with markedly lower tRNA decoder levels. Therefore, the missense mutation that caused an amino acid change also slowed down translation.
The authors speculate that the synergistic ribosomal speed alterations (slowing at the ACG) allow more time for the cotranslational folding of CFTR and for the association of important binding factors regulating function (Rauscher et al. 2021). This type of genetic interaction between mutations (in this case a synonymous and a non-synonymous mutation) is denoted as positive epistasis between intragenic mutations (mutations occurring in the same gene) of different types (Domingo et al. 2019). However, this example only underscores the complexity of mechanisms by which non-synonymous or synonymous codon usage alters protein folding and function. Slowing or enhancing translation speed can lead to either detrimental or positive consequences on protein levels and function. 6.2.3.3 Switch from a Rare to a More Frequent Codon with High tRNA Abundance Autosomal recessive congenital thrombotic thrombocytopenic purpura (cTTP): cTTP is a rare inherited disorder caused by mutations in the ADAMTS13 gene, encoding the multi-domain ADAMTS13 metalloprotease enzyme that cleaves multimers of the von Willebrand Factor (VWF) from the vasculature in a highly regulated manner. In the absence of the enzyme, VWF aggregates attach to the endothelium and sequester thrombocytes, resulting in a profound reduction of thrombocytes (thrombocytopenia), with consequent organ failures of variable severity (for review see Joly et al. 2019). Investigators identified multiple mutations, including synonymous single nucleotide variations, in ADAMTS13 (Tseng and Kimchi-Sarfaty 2011; Edwards et al. 2012; Kim et al. 2014) and used multiple in silico tools to predict the phenotypic manifestations of cTTP (Hing et al. 2013). The c.354G>A synonymous variant in ADAMTS13: The c.354G>A synonymous mutation changes the proline codon at position 118 of the ADAMTS13 protein from CCG to CCA. CCA is a more frequent codon for proline with correspondingly higher abundance tRNA decoders and is therefore predicted to be translated faster. This synonymous change causes enhanced function of the enzyme (Edwards et al. 2012). In a cell-free assay, the translation of the c.354G>A variant was more efficient than that of the wild type. Using a limited proteolysis assay, the investigators could not confirm conformational differences between the wild type and the synonymous variant biochemically (Hunt et al. 2019). However, computer modeling of possible interactions between the ADAMTS13 enzyme and the VWF substrate predicted different interactions for the c.354G>A variant compared to the wild type. The authors concluded that the change from a rare codon to a more frequent codon causes sufficient changes in translational dynamics and consequent protein conformation to influence the binding of the enzyme and substrate, but the difference could not be confirmed with presently available biochemical assays (Hunt et al. 2019). The c.1584G>A synonymous mutation in CFTR results in a codon with corresponding high tRNA abundance (Polte et al. 2019): The c.1584G>A CFTR variant, in which the glutamic acid codon GAG is exchanged for the synonymous GAA codon, is of particular interest.
The Glu residue is localized in the nucleotide binding domain 1 (NBD1) of CFTR, where folding changes can result in severe functional defects (Du and Lukacs 2009; He et al. 2015; Hoelen et al. 2010; Kim and Skach 2012; Shishido et al. 2020). Furthermore, this synonymous mutation alters the last nucleotide of exon 11; therefore, it could also affect splicing. However, when expression from cDNA is analyzed, the possibility of splicing defects is eliminated. Nevertheless, the expression of CFTR was not significantly different from wild type in bronchial epithelial cells (Kirchner et al. 2017). When the tRNomes (tRNA expression patterns) of the cells, including bronchial epithelial cells, were analyzed, the results indicated that the levels of GAA-decoding tRNAs were 2-fold higher than those of the GAG decoders. Therefore, the GAA codon was predicted to increase local translation speed by the RiboTempo algorithm (Polte et al. 2019), which uses tRNA concentration to estimate translation speed (Zhang and Ignatova 2009; Zhang et al. 2009). In short, this algorithm is based on the expectation that the translation speed of each codon depends on the availability of the cognate tRNA: tRNAs present at higher concentrations are more likely to encounter the ribosome than those of low abundance (Zhang et al. 2010). To justify the use of this tool, the authors verified it by predicting the translation rates of multiple other CFTR and ADAMTS13 synonymous variants whose effects on protein levels had been identified previously (Polte et al. 2019). Notably, increased translation speed may not always be beneficial, especially for multi-domain membrane proteins (Kramer et al. 2019). It is possible that lowering the translation rate in a functional domain allows for more efficient folding, and that speeding up translation causes misfolding and early degradation, or vice versa (for review: Collart and Weiss 2020). Therefore, while the prediction methods are interesting and important, we should remember that the dynamics of translation, not speed alone, determine protein structure. As we discuss later, the association of the c.1584G>A sSNP in CFTR with other sSNPs causes mild cystic fibrosis (Bampi et al. 2020), implying that the consequences of synonymous mutations are difficult to predict. In addition, the consequences of synonymous mutations studied in cDNA may be different from their effect as sSNPs in the entire coding region. Proof of functional consequences requires detailed experimental analysis at the levels of protein biogenesis and function. At this stage of our knowledge, it is important to utilize multiple prediction tools and compare the results to biochemical and functional data in order to understand the protein biogenesis changes caused by synonymous mutations.
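The expectation that local translation speed follows cognate tRNA availability can be illustrated with a toy calculation. The sketch below is not the RiboTempo algorithm; it only assumes that the relative decoding time of a codon is inversely proportional to the abundance of its decoding tRNA. The abundance values are hypothetical placeholders in arbitrary units, chosen so that GAA decoders are 2-fold more abundant than GAG decoders, as reported for the c.1584G>A variant.

```python
# Toy model, assumed for illustration only:
# decoding time of a codon is proportional to 1 / (cognate tRNA abundance).
# The abundances below are hypothetical values in arbitrary units.
HYPOTHETICAL_TRNA_ABUNDANCE = {"GAG": 1.0, "GAA": 2.0}


def relative_decoding_time(codon, abundance):
    """Relative time needed to decode one codon under the toy model."""
    return 1.0 / abundance[codon]


def speed_ratio(variant_codon, wild_type_codon, abundance):
    """How many times faster the variant codon is decoded than the wild type codon."""
    return (relative_decoding_time(wild_type_codon, abundance)
            / relative_decoding_time(variant_codon, abundance))


ratio = speed_ratio("GAA", "GAG", HYPOTHETICAL_TRNA_ABUNDANCE)
print(f"GAA is decoded {ratio:.1f}x faster than GAG in this toy model")
```

Such a per-codon ratio says nothing about folding. As noted above, a locally faster codon can be beneficial or harmful depending on where it sits in the protein.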
6.2.4 Codon Optimality, Synonymous Codon Usage, and mRNA Half-Life
Optimal codon content has been shown to account for the similar stabilities of mRNAs that encode functionally related proteins, such as the large ribosomal proteins, the small ribosomal proteins, and glycolysis enzymes (Presnyak et al. 2015). More recently, Mauger and colleagues combined computational sequencing and modified nucleotide substitutions to study the influence of mRNA primary sequence and structural stability on protein expression levels. Using a fluorescent (eGFP) mRNA panel, they studied the effects of codon usage and mRNA secondary structures. They found a relationship between codon optimality, mRNA secondary structure, and mRNA functional half-life. More specifically, they reported that the structure of the mRNA in the protein-coding region regulates protein expression by influencing the functional half-life of the mRNA, providing important evidence that the secondary structure of the mRNA may regulate half-life (Mauger et al. 2019). However, at present, confirming that an mRNA structure conversion is the primary cause of alterations in protein folding is not always feasible, especially for large mRNAs. The reason is that RNA structures are altered by interactions with proteins, other RNAs, and metabolites (Antunes et al. 2017). In addition, during translation, dynamic unwinding of the mRNA results not only in local but also in global structural changes (Wen et al. 2008). Here we show examples of synonymous mutations that were experimentally confirmed or predicted to alter mRNA stability or structure, the formation of RNA-protein (RNP) complexes, or the binding of coding-sequence-specific miRNAs.
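Comparisons of mRNA half-lives, such as those discussed in this section, are commonly made by following transcript levels over time after transcription is blocked. The sketch below assumes simple first-order decay, level(t) = level(0) · exp(−k·t), and estimates the half-life from a log-linear least-squares fit. The function name and the time courses are hypothetical, noise-free illustrations and are not data from any of the cited studies.

```python
import math


def half_life_from_decay(times, levels):
    """Estimate half-life assuming first-order decay.

    Fits ln(level) = ln(level0) - k * t by ordinary least squares
    and returns ln(2) / k, in the same units as `times`.
    """
    logs = [math.log(v) for v in levels]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    decay_rate = -slope
    return math.log(2) / decay_rate


# Hypothetical, noise-free time courses (hours, percent of starting mRNA).
times = [0, 2, 4, 8]
wild_type = [100 * 0.5 ** (t / 4.0) for t in times]  # built with a 4 h half-life
variant = [100 * 0.5 ** (t / 2.0) for t in times]    # built with a 2 h half-life

print(half_life_from_decay(times, wild_type))
print(half_life_from_decay(times, variant))
```

Real decay curves are noisy and may not follow a single exponential, so a fit of this kind is only a first approximation.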
6.2.4.1 Synonymous Mutation Resulting in Reduced mRNA Stability
Neurological disorders associated with synonymous mutations in the dopamine receptor: Dopamine receptors are involved in multiple neurological functions and are therapeutic targets in schizophrenia, Parkinson’s disease, Tourette’s syndrome, tardive dyskinesia, and Huntington’s disease (Duan et al. 2003; Martel and Gatti McArthur 2020). Therefore, understanding their biogenesis, structure, and function is of great interest. Duan and colleagues analyzed naturally occurring synonymous mutations (C132T, G423A, T765C, C939T, C957T, and G1101A) in the dopamine receptor 2 gene (DRD2). Although all these mutations resulted in a non-optimal codon, only the C957T mutation reduced protein levels significantly (50%) in in vitro translation assays. They also showed that when the C957T mutation was combined with G1101A, translation levels were restored, indicating that the second mutation reversed the effect of the first. To understand the mechanism by which the C957T mutation reduced protein synthesis, they compared the half-lives of the wild type and C957T mutant mRNAs and found a significant reduction in the stability of the mRNA transcribed from the C957T synonymous variant. Although dopamine can stabilize the wild type DRD2 mRNA, it failed to stabilize the DRD2 mRNA transcribed from the C957T mutant. Theoretical models of the mRNAs predicted changes in the structure of the mRNA, and an RNase protection assay confirmed the changes (Duan et al. 2003). Therefore, these studies strongly support the idea that mRNA secondary structure, influenced by codon usage, plays an important role in protein expression by altering the functional half-life of the mRNA. Interestingly, while the C957T mutation leads to reduced mRNA stability, a later study showed that the affinity of the C957T receptor for dopamine was higher than that of the wild type.
This suggests that, in addition to reducing protein levels, the synonymous mutation also altered the conformation of the protein (Hirvonen et al. 2009). Since Duan and colleagues demonstrated that the C957T synonymous mutation reduced mRNA stability and led to reduced translation (Duan et al. 2003), this mutation has been associated with altered availability of the receptor in different brain segments (Hirvonen et al. 2004), schizophrenia (Hoenicka et al. 2006), nicotine addiction (Jacobsen et al. 2006), sugar consumption (Eny et al. 2009), visual recognition tasks (Golimbet et al. 2017), improved verbal learning (Yue et al. 2017), cognition (Karalija et al. 2019), and executive brain function across the adult lifespan (Miranda et al. 2021). These observations indicate the wide range of consequences that are associated with synonymous mutations.
Amyotrophic Lateral Sclerosis (ALS) and Synonymous Mutations in the Superoxide Dismutase 1 (SOD1) Gene
ALS, or Lou Gehrig’s disease, is a fatal neurodegenerative disorder, and ~15% of the inherited cases are associated with mutations in SOD1 (Bruijn et al. 1997; Cleveland and Rothstein 2001). While the wild type SOD1 mRNA forms ribonucleoprotein (RNP) complexes with neuronal tissue specific proteins by binding the proteins at the 3’UTR of the SOD1 mRNA,
ALS-associated SOD1 mutant mRNAs, including multiple synonymous variants, failed to form RNP complexes and resulted in reduced mRNA stability (Kirchner and Ignatova 2015). In a more recent study the consequences of SOD1 mutations on ALS severity were linked to the location of the mutations in conserved domains (Pal et al. 2020). Of particular interest, five (L85L-253T>C; L85L-255G>A; N132N-396T>C; N140N-420C>T; A141A-423T>A) out of six synonymous mutations that were linked to ALS are localized to conserved domains of the SOD1 protein. Since SOD1 has been shown to have gene expression regulatory function with DNA binding capacity through its transcription factor (TF) region (Li et al. 2019), even minimal conformational changes can lead to modulation of gene expression. These examples suggest that a synonymous mutation may alter the function of the encoded protein through multiple mechanisms.
6.2.5 Synonymous Mutations Altering mRNA Secondary Structures Without Changing mRNA Half-Life
During translation, mRNAs unfold and enter the ribosomes to be translated. Each codon is recognized by a cognate tRNA, which carries an amino acid. Peptide bonds are formed in the ribosome, and the nascent protein chain is extruded from the ribosomal tunnel (for a recent review: Cassaignau et al. 2020). Translation is not continuous, but rather it proceeds through subsequent translocation-pause-translocation cycles (Wen et al. 2008). The length of the pauses varies widely, from a fraction of a second to 1–2 minutes, but the translocation cycles are short (ATT) (Cheng et al. 1990). (We call this mutation I507ATT F508del in this review.) This mutation causes misfolding and endoplasmic reticulum associated degradation (ERAD) of the F508del CFTR protein (Ward and Kopito 1994; Ward et al. 1995). It has been studied extensively for its trafficking and functional defects (Lukacs et al. 1994; Okiyoneda et al. 2018; Sharma et al. 2001; Veit et al. 2014; Zhang et al. 1998). In keeping with the focus of this chapter, we concentrate on the studies that investigated the effects of the synonymous codon change on mRNA secondary structure (Bartoszewski et al. 2010) and on CFTR protein folding defects and function (Lazrak et al. 2013; Bali et al. 2016a, b). In the first study, the authors used mRNA structure prediction tools (Nackley et al. 2006; Zuker 2003) to compare the structures of the wild type and the I507ATT F508del CFTR mRNAs. They noticed structural differences. Interestingly, restoration of the I507ATT to ATC in the F508del CFTR corrected the structure of the mRNA in the specified region. Circular dichroism (CD) of the full-length CFTR mRNAs showed that the physical properties of the I507ATT F508del and the restored I507ATC F508del full-length CFTR mRNAs were different.
They also performed biochemical mRNA folding/RNase protection assays on small fragments of the CFTR mRNA from the region of the mutation and demonstrated the altered mRNA folding pattern by sequencing the RNase-treated fragments (Bartoszewski et al. 2010). Importantly, restoration of I507ATT to ATC in the F508del CFTR eliminated the mRNA structure differences and, in this particular case, mRNA half-life was unaffected (Bartoszewski et al. 2010). To determine the consequences of the observed mRNA structural changes, they measured in vitro translation rates and found that the rate for the I507ATT F508del CFTR was lower than that for the I507ATC F508del (Bartoszewski et al. 2010). While the mRNA half-life was unaffected, they demonstrated that the I507ATC F508del CFTR protein had a longer half-life than the I507ATT F508del (Lazrak et al. 2013). Because the F508del CFTR mutation is the most severe mutation in cystic fibrosis that results in altered chloride transport by the mutant CFTR (Sheppard et al. 1995), the authors analyzed the functional consequences of the synonymous codon change by
electrophysiological approaches such as whole-cell and single-channel patch clamp. They demonstrated that the I507 ATC>ATT synonymous codon change conferred more severe functional consequences than the loss of phenylalanine alone (Lazrak et al. 2013). Later, when the cells expressing the synonymous variants were treated with experimental drugs that assist in protein folding, the authors showed that the natural variant, for which the drugs were originally selected, was more sensitive to drug-mediated rescue than the I507ATC variant. These studies supported the highly specific nature of the experimental drugs (Bali et al. 2016b). Notably, during early studies of CFTR biogenesis and function, CFTR expression vectors were optimized for high yield plasmid propagation (U.S. patent 5863770). To eliminate the possible consequences of codon optimization and ensure that the only genetic difference between the I507ATC and I507ATT F508del constructs was the C>T single nucleotide change, the authors constructed their expression vectors using the natural CFTR sequence (GenBank sequence NM_000492) to develop the cell lines used in their subsequent studies (Bartoszewski et al. 2010; Lazrak et al. 2013). After analyzing a number of synonymous substitutions in CFTR constructs by Mfold (Zuker 2003), the authors concluded that the I507 ATC>ATT substitution in F508del CFTR has the most significant effect on mRNA structure by creating a large single stranded loop in the vicinity of the mutation (Lazrak et al. 2013). In this “hypothetical” case, since “restored” I507ATC F508del CFTR does not exist in CF patients, the mRNA structure alteration caused by the ATC>ATT codon change has been experimentally confirmed and was proposed as the possible cause of the protein folding defect (Bartoszewski et al. 2010). This conclusion is consistent with the idea that this type of mRNA structural change can alter translation dynamics (Wen et al. 2008).
The authors also analyzed codon frequency for isoleucine, which has three possible synonymous codons, ATC, ATT and ATA. Although the ATC codon is the most frequent, the ATT codon is also used at medium frequency. As for the tRNA decoders, ATC (AUC in mRNA) and ATT (AUU in mRNA) are decoded by the same tRNA. The levels of the isoleucine decoder tRNAs are low in all models tested, including the cell line HEK293 (Polte et al. 2019) that was used in the studies by Bartoszewski et al. Therefore, we can conclude that the mRNA structure, rather than codon frequency, was the cause of the more severe protein folding defects in the I507ATT F508del than in the I507ATC F508del CFTR (Bartoszewski et al. 2010; Lazrak et al. 2013; Bali et al. 2016a, b). However, analyzing the role of mRNA structures in translation and protein folding in living cells has proved to be difficult. The mRNA secondary and tertiary structures are extremely dynamic as they associate with proteins, other RNAs, and the ribosome (Antunes et al. 2017). This is even more difficult when minimal changes are caused by synonymous codon changes in large mRNAs (Bartoszewski et al. 2010).
Hemophilia: The role of synonymous variation-dependent mRNA structural changes in protein expression has also been analyzed in hemophilia-associated genes by assessing the thermodynamic stabilities of the mRNAs of blood clotting factors 8 and 9 (F8, F9). In the first study, five mRNA structure prediction tools were used to differentiate between neutral and disease-associated synonymous mutations (Hamasaki-Katagiri et al. 2017). Remarkably, in F8, the disease-associated synonymous mutations occurred in structurally stable mRNA regions (represented by low minimum free energy levels). The technically important conclusion from these studies is that a fragment length of 101–151 nucleotides was optimal for predicting the structural changes in mRNAs. The results of these studies suggested that mRNA thermodynamic stability may be applied as a predictive characteristic in searching for disease-causing mutations. Indeed, the sSNP in F9 (c.459G>A) was predicted to reduce the thermodynamic stability of the mRNA, and it causes mild hemophilia because it leads to reduced levels of F9. With assays to identify protein folding intermediates, such as conformation specific antibodies and limited trypsin digestion, the authors assessed the biogenesis of F9 encoded by the c.459G>A variant. They determined that the c.459G>A mutation, which reduces the thermodynamic stability of the mRNA, leads to slower translation and consequent protein folding defects, resulting in low levels of secreted F9 (Simhadri et al. 2017). These thorough experiments support the idea that mRNA structural changes can significantly alter translation and protein folding without affecting the half-life of the mRNA. The significance of the studies on the consequences of synonymous codon changes and mutations in blood clotting factor genes is underlined by the necessity to produce blood clotting factors, referred to as biologicals, for therapeutic purposes for patients with hemophilia. For the generation of these biologicals, “codon optimized” constructs are used to promote high peptide yield, but codon optimization may affect biological function.
Therefore, the thermodynamic stability of the mRNAs encoding these therapeutic peptides can be used to predict the possible consequences of codon optimization strategies and to optimize for function over peptide yield (Hamasaki-Katagiri et al. 2017).
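The fragment-based comparison described above can be prepared with a few lines of code. The sketch below only extracts a reference fragment and a variant fragment of a chosen odd length (151 nucleotides here, the upper end of the range reported as optimal), centered on the variant position. The function name, the toy sequence, and the variant are hypothetical. Computing the minimum free energy of each fragment would require a separate RNA folding tool and is not shown.

```python
def variant_window(cds, cdna_pos, alt, length=151):
    """Return (reference_fragment, variant_fragment) centered on a variant.

    cds      -- coding sequence as an uppercase string
    cdna_pos -- 1-based position of the substituted nucleotide
    alt      -- the substituted nucleotide
    length   -- odd fragment length; truncated at the sequence ends
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the variant sits at the center")
    if not 1 <= cdna_pos <= len(cds):
        raise ValueError("cdna_pos is outside the sequence")
    index = cdna_pos - 1
    half = length // 2
    start = max(0, index - half)
    end = min(len(cds), index + half + 1)
    reference_fragment = cds[start:end]
    variant_fragment = cds[start:index] + alt + cds[index + 1:end]
    return reference_fragment, variant_fragment


# Hypothetical toy sequence and variant, for illustration only.
toy_cds = "ACGT" * 100
ref_frag, alt_frag = variant_window(toy_cds, cdna_pos=200, alt="A")
print(len(ref_frag), ref_frag[75], alt_frag[75])
```

The two fragments differ at a single central position, so any difference in their predicted folding energies can be attributed to the substitution within that sequence context.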
6.2.6 Predicting the Consequences of Synonymous Mutations
In the previous sections, we analyzed mechanisms by which synonymous mutations can alter codon optimality, by switching to a codon with lower or higher levels of tRNA decoders, or change the structure and/or stability of the mRNA. Although each of these steps has been shown to contribute to protein folding and function, it is extremely laborious to identify the consequences of each synonymous mutation on protein structure and function. Nevertheless, these examples support the idea that synonymous mutations have significant consequences for protein biogenesis, either by enhancing or by reducing the levels and functions of the gene product. In the next section, we provide examples of investigations which classified synonymous mutations based on computational predictions of their consequences. These studies provide a template to test the possible consequences of individual mutations on protein structure and function.
6.2.6.1 In Silico Analysis of sSNPs in CFTR
In silico tools were applied to classify synonymous variants in CFTR based on the predicted mechanisms by which they may alter CFTR protein biogenesis (Bartoszewski et al. 2016). The authors applied multiple computational methods to classify synonymous mutations in the CFTR gene.
Relative synonymous codon usage (RSCU): RSCU calculation is designed to identify slow and fast translated regions based on codon usage frequency. RSCU is defined as the ratio of the observed frequency of a codon to the frequency expected if all synonymous codons for that amino acid were used equally (Sharp and Li 1987). Using this tool, the authors predicted slow translating regions in CFTR primarily in membrane spanning regions, referred to as membrane spanning domain 1 (MSD1) and MSD2, each containing six transmembrane helices. The codons which were predicted to translate at the slowest rates were found in the last membrane spanning helix of each domain (Bartoszewski et al. 2016).
mRNA structure: Known sSNPs in CFTR were also analyzed in silico for mRNA structures using the RNAsnp predictor (Sabarinathan et al. 2013). Eight sSNPs showed potential to change mRNA structure, but the consequences of these predicted changes for protein structure and function have not been confirmed (Bartoszewski et al. 2016). To conclude this section, prediction of the consequences of sequence variations on mRNA structure is extremely complicated. In an excellent review, Susan Schroeder summarized novel approaches to generate RNA structural ensembles. These new approaches are intended to identify both functional and structural features. The review highlights the challenges of predicting structure and function from the sequence alone (Schroeder 2018). Therefore, if the biological function of a gene product is altered as the result of synonymous mutations, multiple mechanisms affecting protein biogenesis are likely to be involved.
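The RSCU definition given above can be written out as a short calculation. The following is a minimal sketch that computes RSCU from the codons of a single in-frame coding sequence using the standard genetic code. It is an illustration of the definition, not the pipeline used in the cited study, and the toy sequence is hypothetical. Applications typically derive RSCU from a large reference set of genes rather than from one sequence.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds):
    """RSCU per codon: observed count / (amino acid count / number of synonymous codons)."""
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, amino_acid in CODON_TABLE.items():
        families.setdefault(amino_acid, []).append(codon)
    values = {}
    for amino_acid, family in families.items():
        total = sum(counts[codon] for codon in family)
        if amino_acid == "*" or total == 0:
            continue  # skip stop codons and amino acids absent from the sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: three GAG, one GAA, one CCG, one CCA.
toy = "GAGGAGGAGGAACCGCCA"
values = rscu(toy)
print(values["GAG"], values["GAA"], values["CCA"], values["CCT"])
```

An RSCU of 1 means a codon is used as often as expected under equal usage, values above 1 indicate preferred codons, and values below 1 indicate avoided codons.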
6.2.7 Multiple Synonymous Mutations and Their Consequences
Multiple sSNPs in CFTR cause mild cystic fibrosis: Although experimental analysis of sSNPs provides important mechanistic clues concerning the significance of such mutations, the ultimate proof that synonymous mutations are important comes from their disease association. A very interesting example is provided by a patient diagnosed at an advanced age with mild cystic fibrosis symptoms, without known disease-causing mutations in the CFTR gene (Bampi et al. 2020). This observation led to the sequencing of the whole CFTR gene locus, which revealed homozygosity for the earlier described c.1584G>A synonymous mutation. This mutation changes the glutamic acid codon from GAG to GAA at
the last nucleotide of exon 11, which encodes the NBD1 domain, and it is suspected of causing splicing defects (Dork et al. 1997). Based on analysis of protein expression from the cDNA in human bronchial epithelial cells, this sSNP was proposed to enhance translation speed by switching to a codon with an abundant tRNA decoder. However, in the first studies, the investigators did not observe an increase in CFTR protein levels from the cDNA with the c.1584G>A synonymous mutation in bronchial epithelial cells (Kirchner et al. 2017). In contrast, results in the more recent case report indicated that the expression of the c.1584G>A cDNA resulted in slightly higher protein expression (Bampi et al. 2020). These results suggest that the effects of the c.1584G>A mutation on CFTR protein expression from a cDNA expression vector are minimal and may result from differences in cell culture conditions or other factors. Most importantly, it is unlikely that the minimal increase in CFTR protein expression has any physiological significance. Nevertheless, the case report had additional exciting new findings. RNA sequencing revealed that the c.1584G>A sSNP causes exon skipping at low frequency, with retention of a segment from the downstream intron (Bampi et al. 2020). Moreover, sequencing of the entire CFTR locus identified two additional homozygous sSNPs (c.2562T>G and c.4389G>A) in this patient’s CFTR gene. As we reviewed earlier, the c.2562T>G sSNP switches from a codon with a highly abundant tRNA pool to one with a low-abundance tRNA decoder (Kirchner et al. 2017). In order to understand the composite effects of the three sSNPs, all three mutations were introduced into the CFTR cDNA, which was then expressed in bronchial epithelial cells. The composite sSNP construct resulted in increased CFTR protein expression. RNA sequencing revealed higher CFTR transcript (mRNA) levels in the patient samples compared to healthy subjects.
When they looked at global gene expression changes in the patient, compared to healthy individuals, only slight changes that were not specific to cystic fibrosis were observed. Notably, CFTR protein levels were not measured in patient-derived samples, nor was the function of the CFTR protein expressed from the cDNA carrying the composite mutations assessed. Therefore, it is not clear what caused the mild cystic fibrosis in the patient. Considering the advanced age of the patient, it is possible that the combination of the splicing defect, unknown functional defects of the composite mutant CFTR protein, and post-translational modifications of mutant CFTR (including co-translational ubiquitination (Sato et al. 1998), SUMOylation (Ahner et al. 2013), S-nitrosylation (Zeitlin 2006), nitration (Bebok et al. 2002), and palmitoylation (McClure et al. 2014)), or altered post-translational modification sites (Pankow et al. 2019), contribute to the functional deficiency and result in cystic fibrosis symptoms.
Multiple synonymous mutations in CFTR constructs applied in studies of CFTR biogenesis: As mentioned above, CFTR expression vectors with synonymous substitutions have been widely used to analyze CFTR expression and function (Bartoszewski et al. 2010). To address this issue, Shah and colleagues analyzed the expression of a number of such CFTR cDNAs encoding the open
reading frame of CFTR (Shah et al. 2015). Their results showed that codon usage influenced CFTR expression levels. Though constructs containing the “optimized” synonymous codons were predicted to have higher protein expression, the native codons resulted in higher expression of wild type CFTR that efficiently trafficked through the secretory pathway. In contrast, when “codon optimized” F508del CFTR was expressed, a greater percentage of the mutant protein escaped ERAD and trafficked to the cell surface, showing channel activity. To understand these differences, we note that F508del CFTR is subjected to ERAD (Ward et al. 1995), but when overexpressed, some of the mutant protein can escape ERAD, travel to the cell surface, and show functional activity (chloride transport) (Cheng et al. 1990). Therefore, if the “codon optimization” caused higher translation, some of the mutant protein likely escaped ERAD. In contrast, “codon optimization”-dependent higher translation efficiency of the wild type CFTR resulted in the misfolding of more protein than the native sequence, and the misfolded portion was subjected to ERAD. These hypotheses are supported by studies which investigated the translation, processing (maturation efficiency), and functional levels of CFTR (Varga et al. 2004; Rab et al. 2007). CFTR biogenesis is influenced not only by translation efficiency but also by other factors such as cell type, natural or overexpression systems (Varga et al. 2004), culture conditions (Bebok et al. 2001; Blount et al. 2011; Guimbellot et al. 2008), cellular stress conditions (Rab et al. 2007; Bebok and Lianwu 2018), and post-translational modifications (Sato et al. 1998; Ahner et al. 2013; Zeitlin 2006; Bebok et al. 2002; McClure et al. 2014; Pankow et al. 2019).
Although the studies by Shah et al. did not investigate each synonymous substitution and mechanism individually, these results are significant and support the idea that synonymous codon changes affect protein expression and function at multiple levels.
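The three sSNPs discussed in this section are given in coding DNA coordinates, where position 1 is the A of the ATG start codon. A short calculation maps each position to its codon number and to its position within that codon. The sketch below uses a hypothetical helper name, and it also reproduces codon 118 for the ADAMTS13 c.354G>A variant described earlier.

```python
def codon_location(cdna_pos):
    """Map a 1-based coding DNA position to (codon number, position within codon)."""
    if cdna_pos < 1:
        raise ValueError("coding positions start at 1")
    codon_number = (cdna_pos - 1) // 3 + 1
    position_in_codon = (cdna_pos - 1) % 3 + 1
    return codon_number, position_in_codon


for variant, position in [("c.1584G>A", 1584), ("c.2562T>G", 2562),
                          ("c.4389G>A", 4389), ("c.354G>A", 354)]:
    codon_number, position_in_codon = codon_location(position)
    print(variant, "codon", codon_number, "position", position_in_codon)
```

All four positions are exact multiples of three, so each substitution falls on the third nucleotide of its codon, the position at which most synonymous changes occur.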
6.3 Prospective
Synonymous mutations are receiving more attention because advances in sequencing methods allow the identification of these sequence variants in the genome and bioinformatic analysis can link them to human diseases and altered disease phenotypes. Analyzing how synonymous codon usage affects protein expression expands our understanding of the mechanisms by which codon usage bias helps to sculpt different organisms. As the database of synonymous mutation-associated diseases grows, novel computational methods are being developed to predict the cause and consequences of synonymous mutations. Biological experiments, such as determining the tRNA composition of cells, characterizing mRNA structural changes, and measuring the efficiency of pre-mRNA processing and the consequences of synonymous mutations on protein folding, structure, and function, are extremely laborious and may be influenced by multiple factors in cells and tissues. Nevertheless, linking synonymous mutations with protein expression profiles (proteome) and analyzing protein folding in real time, followed by functional assays to determine the functions of the proteins, are necessary to confirm the consequences of synonymous mutations. Based on our current knowledge and available experimental data, we conclude that synonymous mutations may have multiple downstream effects on mRNA half-life and translation efficiency, protein stability, function, and response to drugs. Therefore, we argue that the term “silent mutation” does not do justice to the nature of synonymous mutations. Many synonymous mutations are not silent and can alter the function of proteins by multiple mechanisms without altering amino acid sequence. The mechanisms by which synonymous mutations affect protein levels and function are summarized in Fig. 6.3.
Fig. 6.3 A summary of mechanisms by which synonymous mutations may lead to human disorders and change in phenotype
References
Adzhubei AA, Adzhubei IA, Krasheninnikov IA, Neidle S (1996) Non-random usage of ‘degenerate’ codons is related to protein three-dimensional structure. FEBS Lett 399:78–82 Ahner A, Gong X, Frizzell RA (2013) Cystic fibrosis transmembrane conductance regulator degradation: cross-talk between the ubiquitylation and SUMOylation pathways. FEBS J 280:4430–4438
Antunes D, Jorge NAN, Caffarena ER, Passetti F (2017) Using RNA sequence and structure for the prediction of riboswitch aptamer: a comprehensive review of available software and tools. Front Genet 8:231 Austin F, Oyarbide U, Massey G, Grimes M, Corey SJ (2017) Synonymous mutation in TP53 results in a cryptic splice site affecting its DNA-binding site in an adolescent with two primary sarcomas. Pediatr Blood Cancer 64 Bali V, Lazrak A, Guroji P, Fu L, Matalon S, Bebok Z (2016a) A synonymous codon change alters the drug sensitivity of DeltaF508 cystic fibrosis transmembrane conductance regulator. FASEB J 30:201–213 Bali V, Lazrak A, Guroji P, Matalon S, Bebok Z (2016b) Mechanistic approaches to improve correction of the most common disease-causing mutation in cystic fibrosis. PLoS One 11:e0155882 Bampi GB, Ramalho AS, Santos LA, Wagner J, Dupont L, Cuppens H, De Boeck K, Ignatova Z (2020) The effect of synonymous single-nucleotide polymorphisms on an atypical cystic fibrosis clinical presentation. Life (Basel) 11 Barrett LW, Fletcher S, Wilton SD (2012) Regulation of eukaryotic gene expression by the untranslated gene regions and other non-coding elements. Cell Mol Life Sci 69:3613–3634 Bartel DP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136:215–233 Bartel DP (2018) Metazoan microRNAs. Cell 173:20–51 Bartoszewski RA, Jablonsky M, Bartoszewska S, Stevenson L, Dai Q, Kappes J, Collawn JF, Bebok Z (2010) A synonymous single nucleotide polymorphism in DeltaF508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein. J Biol Chem 285:28741–28748 Bartoszewski R, Kroliczewski J, Piotrowski A, Jasiecka AJ, Bartoszewska S, Vecchio-Pagan B, Fu L, Sobolewska A, Matalon S, Cutting GR, Rowe SM, Collawn JF (2016) Codon bias and the folding dynamics of the cystic fibrosis transmembrane conductance regulator. Cell Mol Biol Lett 21:23 Bebok Z, Lianwu F (2018) Stressors and Stress responses in Cystic Fibrosis. 
Cell Pathology 5:11–29. https://doi.org/10.1515/ersc-2018-0002 Bebok Z, Tousson A, Schwiebert LM, Venglarik CJ (2001) Improved oxygenation promotes CFTR maturation and trafficking in MDCK monolayers. Am J Physiol Cell Physiol 280:C135–C145 Bebok Z, Varga K, Hicks JK, Venglarik CJ, Kovacs T, Chen L, Hardiman KM, Collawn JF, Sorscher EJ, Matalon S (2002) Reactive oxygen nitrogen species decrease cystic fibrosis transmembrane conductance regulator expression and cAMP-mediated Cl- secretion in airway epithelia. J Biol Chem 277:43041–43049 Berg OG, Kurland CG (1997) Growth rate-optimised tRNA abundance and codon usage. J Mol Biol 270:544–550 Berg MD, Giguere DJ, Dron JS, Lant JT, Genereaux J, Liao C, Wang J, Robinson JF, Gloor GB, Hegele RA, O’Donoghue P, Brandl CJ (2019) Targeted sequencing reveals expanded genetic diversity of human transfer RNAs. RNA Biol 16:1574–1585 Blount A, Zhang S, Chestnut M, Hixon B, Skinner D, Sorscher EJ, Woodworth BA (2011) Transepithelial ion transport is suppressed in hypoxic sinonasal epithelium. Laryngoscope 121:1929–1934 Brest P, Lapaquette P, Souidi M, Lebrigand K, Cesaro A, Vouret-Craviari V, Mari B, Barbry P, Mosnier JF, Hebuterne X, Harel-Bellan A, Mograbi B, Darfeuille-Michaud A, Hofman P (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43:242–245 Brini E, Simmerling C, Dill K (2020) Protein storytelling through physics. Science 370 Bruijn LI, Becher MW, Lee MK, Anderson KL, Jenkins NA, Copeland NG, Sisodia SS, Rothstein JD, Borchelt DR, Price DL, Cleveland DW (1997) ALS-linked SOD1 mutant G85R mediates damage to astrocytes and promotes rapidly progressive disease with SOD1-containing inclusions. Neuron 18:327–338
Buhr F, Jha S, Thommen M, Mittelstaet J, Kutz F, Schwalbe H, Rodnina MV, Komar AA (2016) Synonymous Codons Direct Cotranslational Folding toward Different Protein Conformations. Mol Cell 61:341–351 Canson D, Glubb D, Spurdle AB (2020) Variant effect on splicing regulatory elements, branchpoint usage, and pseudoexonization: strategies to enhance bioinformatic prediction using hereditary cancer genes as exemplars. Hum Mutat 41 Cardozo AK, De Meirleir L, Liebaers I, Lissens W (2000) Analysis of exonic mutations leading to exon skipping in patients with pyruvate dehydrogenase E1 alpha deficiency. Pediatr Res 48:748–753 Carrocci TJ, Neugebauer KM (2019) Pre-mRNA Splicing in the Nuclear Landscape. Cold Spring Harb Symp Quant Biol 84:11–20 Cassaignau AME, Cabrita LD, Christodoulou J (2020) How does the ribosome fold the proteome? Annu Rev Biochem 89:389–415 Chao HK, Hsiao KJ, Su TS (2001) A silent mutation induces exon skipping in the phenylalanine hydroxylase gene in phenylketonuria. Hum Genet 108:14–19 Cheng SH, Gregory RJ, Marshall J, Paul S, Souza DW, White GA, O’Riordan CR, Smith AE (1990) Defective intracellular transport and processing of CFTR is the molecular basis of most cystic fibrosis. Cell 63:827–834 Claverie-Martin F, Gonzalez-Paredes FJ, Ramos-Trujillo E (2015) Splicing defects caused by exonic mutations in PKD1 as a new mechanism of pathogenesis in autosomal dominant polycystic kidney disease. RNA Biol 12:369–374 Cleveland DW, Rothstein JD (2001) From Charcot to Lou Gehrig: deciphering selective motor neuron death in ALS. Nat Rev Neurosci 2:806–819 Collart MA, Weiss B (2020) Ribosome pausing, a dangerous necessity for co-translational events. Nucleic Acids Res 48:1043–1055 Cutting GR (2015) Cystic fibrosis genetics: from molecular understanding to clinical application. Nat Rev Genet 16:45–56 Czech A, Fedyunin I, Zhang G, Ignatova Z (2010) Silent mutations in sight: co-variations in tRNA abundance as a key to unravel consequences of silent mutations. 
Mol Biosyst 6:1767–1772 De Meirleir L, Lissens W, Benelli C, Ponsot G, Desguerre I, Marsac C, Rodriguez D, Saudubray JM, Poggi F, Liebaers I (1994) Aberrant splicing of exon 6 in the pyruvate dehydrogenase-E1 alpha mRNA linked to a silent mutation in a large family with Leigh’s encephalomyelopathy. Pediatr Res 36:707–712 Dhindsa RS, Copeland BR, Mustoe AM, Goldstein DB (2020) Natural selection shapes codon usage in the human genome. Am J Hum Genet 107:83–95 Diatchenko L, Slade GD, Nackley AG, Bhalang K, Sigurdsson A, Belfer I, Goldman D, Xu K, Shabalina SA, Shagin D, Max MB, Makarov SS, Maixner W (2005) Genetic basis for individual variations in pain perception and the development of a chronic pain condition. Hum Mol Genet 14:135–143 Dickinson KM, Collaco JM (2021) Cystic fibrosis. Pediatr Rev 42:55–67 Dittmar KA, Goodenbour JM, Pan T (2006) Tissue-specific differences in human transfer RNA expression. PLoS Genet 2:e221 Domingo J, Baeza-Centurion P, Lehner B (2019) The Causes and Consequences of Genetic Interactions (Epistasis). Annu Rev Genomics Hum Genet 20:433–460 Dork T, Dworniczak B, Aulehla-Scholz C, Wieczorek D, Bohm I, Mayerova A, Seydewitz HH, Nieschlag E, Meschede D, Horst J, Pander HJ, Sperling H, Ratjen F, Passarge E, Schmidtke J, Stuhrmann M (1997) Distinct spectrum of CFTR gene mutations in congenital absence of vas deferens. Hum Genet 100:365–377 Du K, Lukacs GL (2009) Cooperative assembly and misfolding of CFTR domains in vivo. Mol Biol Cell 20:1903–1915 Duan J, Wainwright MS, Comeron JM, Saitou N, Sanders AR, Gelernter J, Gejman PV (2003) Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet 12:205–216
6 An Examination of Mechanisms by which Synonymous Mutations may Alter…
Dufton MJ (1983) The significance of redundancy in the genetic code. J Theor Biol 102:521–526 Edwards NC, Hing ZA, Perry A, Blaisdell A, Kopelman DB, Fathke R, Plum W, Newell J, Allen CESG, Shapiro A, Okunji C, Kosti I, Shomron N, Grigoryan V, Przytycka TM, Sauna ZE, Salari R, Mandel-Gutfreund Y, Komar AA, Kimchi-Sarfaty C (2012) Characterization of coding synonymous and non-synonymous variants in ADAMTS13 using ex vivo and in silico approaches. PLoS One 7:e38864 Eny KM, Corey PN, El-Sohemy A (2009) Dopamine D2 receptor genotype (C957T) and habitual consumption of sugars in a free-living population of men and women. J Nutrigenet Nutrigenomics 2:235–242 Faa V, Coiana A, Incani F, Costantino L, Cao A, Rosatelli MC (2010) A synonymous mutation in the CFTR gene causes aberrant splicing in an italian patient affected by a mild form of cystic fibrosis. J Mol Diagn 12:380–383 Forman JJ, Coller HA (2010) The code within the code: microRNAs target coding regions. Cell Cycle 9:1533–1541 Forman JJ, Legesse-Miller A, Coller HA (2008) A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence. Proc Natl Acad Sci U S A 105:14879–14884 Fredrick K, Ibba M (2010) How the sequence of a gene can tune its translation. Cell 141:227–229 Fujishima K, Kanai A (2014) tRNA gene diversity in the three domains of life. Front Genet 5:142 Fung KL, Pan J, Ohnuma S, Lund PE, Pixley JN, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM (2014) MDR1 synonymous polymorphisms alter transporter specificity and protein stability in a stable epithelial monolayer. 
Cancer Res 74:598–608 Gartner JJ, Parker SC, Prickett TD, Dutton-Regester K, Stitzel ML, Lin JC, Davis S, Simhadri VL, Jha S, Katagiri N, Gotea V, Teer JK, Wei X, Morken MA, Bhanot UK, Program NCS, Chen G, Elnitski LL, Davies MA, Gershenwald JE, Carter H, Karchin R, Robinson W, Robinson S, Rosenberg SA, Collins FS, Parmigiani G, Komar AA, Kimchi-Sarfaty C, Hayward NK, Margulies EH, Samuels Y (2013) Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proc Natl Acad Sci U S A 110:13481–13486 Gingold H, Tehler D, Christoffersen NR, Nielsen MM, Asmar F, Kooistra SM, Christophersen NS, Christensen LL, Borre M, Sorensen KD, Andersen LD, Andersen CL, Hulleman E, Wurdinger T, Ralfkiaer E, Helin K, Gronbaek K, Orntoft T, Waszak SM, Dahan O, Pedersen JS, Lund AH, Pilpel Y (2014) A dual program for translation regulation in cellular proliferation and differentiation. Cell 158:1281–1292 Golimbet VE, Garakh ZV, Zaytseva Y, Alfimova MV, Lezheiko TV, Kondratiev NV, Shmukler AB, Gurovich IY, Strelets VB (2017) The dopamine receptor D2 C957T polymorphism modulates early components of event-related potentials in visual word recognition task. Neuropsychobiology 76:143–150 Gonzalez-Paredes FJ, Ramos-Trujillo E, Claverie-Martin F (2014) Defective pre-mRNA splicing in PKD1 due to presumed missense and synonymous mutations causing autosomal dominant polycystic disease. Gene 546:243–249 Gorochowski TE, Ignatova Z, Bovenberg RA, Roubos JA (2015) Trade-offs between tRNA abundance and mRNA secondary structure support smoothing of translation elongation rate. Nucleic Acids Res 43:3022–3032 Gottesman MM, Fojo T, Bates SE (2002) Multidrug resistance in cancer: role of ATP-dependent transporters. Nat Rev Cancer 2:48–58 Grzanka M, Piekielko-Witkowska A (2021) The role of TCOF1 gene in health and disease: beyond treacher collins syndrome. 
Int J Mol Sci 22 Guimaraes JC, Mittal N, Gnann A, Jedlinski D, Riba A, Buczak K, Schmidt A, Zavolan M (2020) A rare codon-based translational program of cell proliferation. Genome Biol 21:44 Guimbellot JS, Fortenberry JA, Siegal GP, Moore B, Wen H, Venglarik C, Chen YF, Oparil S, Sorscher EJ, Hong JS (2008) Role of oxygen availability in CFTR expression and function. Am J Respir Cell Mol Biol 39:514–521
Guo H, Ingolia NT, Weissman JS, Bartel DP (2010) Mammalian microRNAs predominantly act to decrease target mRNA levels. Nature 466:835–840 Hamasaki-Katagiri N, Lin BC, Simon J, Hunt RC, Schiller T, Russek-Cohen E, Komar AA, Bar H, Kimchi-Sarfaty C (2017) The importance of mRNA structure in determining the pathogenicity of synonymous and non-synonymous mutations in haemophilia. Haemophilia 23:e8–e17 Hansen TV, Steffensen AY, Jonson L, Andersen MK, Ejlertsen B, Nielsen FC (2010) The silent mutation nucleotide 744 G → A, Lys172Lys, in exon 6 of BRCA2 results in exon skipping. Breast Cancer Res Treat 119:547–550 Hanson G, Coller J (2018) Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol 19:20–30 Hauber DJ, Grogan DW, DeBry RW (2016) Mutations to less-preferred synonymous codons in a highly expressed gene of escherichia coli: fitness and epistatic interactions. PLoS One 11:e0146375 He L, Aleksandrov AA, An J, Cui L, Yang Z, Brouillette CG, Riordan JR (2015) Restoration of NBD1 thermal stability is necessary and sufficient to correct F508 CFTR folding and assembly. J Mol Biol 427:106–120 Hill AE, Plyler ZE, Tiwari H, Patki A, Tully JP, McAtee CW, Moseley LA, Sorscher EJ (2014) Longevity and plasticity of CFTR provide an argument for noncanonical SNP organization in hominid DNA. PLoS One 9:e109186 Hing ZA, Schiller T, Wu A, Hamasaki-Katagiri N, Struble EB, Russek-Cohen E, Kimchi-Sarfaty C (2013) Multiple in silico tools predict phenotypic manifestations in congenital thrombotic thrombocytopenic purpura. Br J Haematol 160:825–837 Hirvonen M, Laakso A, Nagren K, Rinne JO, Pohjalainen T, Hietala J (2004) C957T polymorphism of the dopamine D2 receptor (DRD2) gene affects striatal DRD2 availability in vivo. Mol Psychiatry 9:1060–1061 Hirvonen MM, Laakso A, Nagren K, Rinne JO, Pohjalainen T, Hietala J (2009) C957T polymorphism of dopamine D2 receptor gene affects striatal DRD2 in vivo availability by changing the receptor affinity. 
Synapse 63:907–912 Hoagland MB, Stephenson ML, Scott JF, Hecht LI, Zamecnik PC (1958) A soluble ribonucleic acid intermediate in protein synthesis. J Biol Chem 231:241–257 Hoelen H, Kleizen B, Schmidt A, Richardson J, Charitou P, Thomas PJ, Braakman I (2010) The primary folding defect and rescue of DeltaF508 CFTR emerge during translation of the mutant domain. PLoS One 5:e15458 Hoenicka J, Aragues M, Rodriguez-Jimenez R, Ponce G, Martinez I, Rubio G, Jimenez-Arriero MA, Palomo T, Psychosis, and Addictions Research, G. (2006) C957T DRD2 polymorphism is associated with schizophrenia in Spanish patients. Acta Psychiatr Scand 114:435–438 Hunt R, Hettiarachchi G, Katneni U, Hernandez N, Holcomb D, Kames J, Alnifaidy R, Lin B, Hamasaki-Katagiri N, Wesley A, Kafri T, Morris C, Bouche L, Panico M, Schiller T, Ibla J, Bar H, Ismail A, Morris H, Komar A, Kimchi-Sarfaty C (2019) A single synonymous variant (c.354G>A [p.P118P]) in ADAMTS13 confers enhanced specific activity. Int J Mol Sci 20 Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151:389–409 Jacobsen LK, Pugh KR, Mencl WE, Gelernter J (2006) C957T polymorphism of the dopamine D2 receptor gene modulates the effect of nicotine on working memory performance and cortical processing efficiency. Psychopharmacology (Berl) 188:530–540 Joly BS, Coppo P, Veyradier A (2019) An update on pathogenesis and diagnosis of thrombotic thrombocytopenic purpura. Expert Rev Hematol 12:383–395 Joruiz SM, Bourdon JC (2016) p53 Isoforms: key regulators of the cell fate decision. Cold Spring Harb Perspect Med 6 Karalija N, Papenberg G, Wahlin A, Johansson J, Andersson M, Axelsson J, Riklund K, Lovden M, Lindenberger U, Backman L, Nyberg L (2019) C957T-mediated Variation in Ligand Affinity
Affects the Association between (11)C-raclopride Binding Potential and Cognition. J Cogn Neurosci 31:314–325 Kerem B, Rommens JM, Buchanan JA, Markiewicz D, Cox TK, Chakravarti A, Buchwald M, Tsui LC (1989) Identification of the cystic fibrosis gene: genetic analysis. Science 245:1073–1080 Kim SJ, Skach WR (2012) Mechanisms of CFTR folding at the endoplasmic reticulum. Front Pharmacol 3:201 Kim B, Hing ZA, Wu A, Schiller T, Struble EB, Liuwantara D, Kempert PH, Broxham EJ, Edwards NC, Marder VJ, Simhadri VL, Sauna ZE, Howard TE, Kimchi-Sarfaty C (2014) Single-nucleotide variations defining previously unreported ADAMTS13 haplotypes are associated with differential expression and activity of the VWF-cleaving protease in a Salvadoran congenital thrombotic thrombocytopenic purpura family. Br J Haematol 165:154–158 Kim SJ, Yoon JS, Shishido H, Yang Z, Rooney LA, Barral JM, Skach WR (2015) Protein folding. Translational tuning optimizes nascent protein folding in cells. Science 348:444–448 Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528 Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46:310–315 Kirchner S, Ignatova Z (2015) Emerging roles of tRNA in adaptive translation, signalling dynamics and disease. Nat Rev Genet 16:98–112 Kirchner S, Cai Z, Rauscher R, Kastelic N, Anding M, Czech A, Kleizen B, Ostedgaard LS, Braakman I, Sheppard DN, Ignatova Z (2017) Alteration of protein function by a silent polymorphism linked to tRNA abundance. PLoS Biol 15:e2000779 Kleizen B, Braakman I, de Jonge HR (2000) Regulated trafficking of the CFTR chloride channel. Eur J Cell Biol 79:544–556 Kleizen B, van Vlijmen T, de Jonge HR, Braakman I (2005) Folding of CFTR is predominantly cotranslational. 
Mol Cell 20:277–287 Komar AA, Jaenicke R (1995) Kinetics of translation of gamma B crystallin and its circularly permutated variant in an in vitro cell-free system: possible relations to codon distribution and protein folding. FEBS Lett 376:195–198 Kramer G, Shiber A, Bukau B (2019) Mechanisms of cotranslational maturation of newly synthesized proteins. Annu Rev Biochem 88:337–364 Lazrak A, Fu L, Bali V, Bartoszewski R, Rab A, Havasi V, Keiles S, Kappes J, Kumar R, Lefkowitz E, Sorscher EJ, Matalon S, Collawn JF, Bebok Z (2013) The silent codon change I507-ATC>ATT contributes to the severity of the DeltaF508 CFTR channel dysfunction. FASEB J 27:4630–4645 Li H, Sheppard DN, Hug MJ (2004) Transepithelial electrical measurements with the Ussing chamber. J Cyst Fibros 3(Suppl 2):123–126 Li X, Qiu S, Shi J, Wang S, Wang M, Xu Y, Nie Z, Liu C, Liu C (2019) A new function of copper zinc superoxide dismutase: as a regulatory DNA-binding protein in gene expression in response to intracellular hydrogen peroxide. Nucleic Acids Res 47:5074–5085 Liu HX, Zhang M, Krainer AR (1998) Identification of functional exonic splicing enhancer motifs recognized by individual SR proteins. Genes Dev 12:1998–2012 Liu Y, Yang Q, Zhao F (2021) Synonymous but not silent: the codon usage code for gene expression and protein folding. Annu Rev Biochem 90:375 Lukacs GL, Mohamed A, Kartner N, Chang XB, Riordan JR, Grinstein S (1994) Conformational maturation of CFTR but not its mutant counterpart (delta F508) occurs in the endoplasmic reticulum and requires ATP. EMBO J 13:6076–6086 Macaya D, Katsanis SH, Hefferon TW, Audlin S, Mendelsohn NJ, Roggenbuck J, Cutting GR (2009) A synonymous mutation in TCOF1 causes Treacher Collins syndrome due to mis- splicing of a constitutive exon. Am J Med Genet A 149A:1624–1627
Mannisto PT, Kaakkola S (1999) Catechol-O-methyltransferase (COMT): biochemistry, molecular biology, pharmacology, and clinical efficacy of the new selective COMT inhibitors. Pharmacol Rev 51:593–628 Marin M (2008) Folding at the rhythm of the rare codon beat. Biotechnol J 3:1047–1057 Marinko JT, Huang H, Penn WD, Capra JA, Schlebach JP, Sanders CR (2019) Folding and misfolding of human membrane proteins in health and disease: from single molecules to cellular proteostasis. Chem Rev 119:5537–5606 Martel JC, Gatti McArthur S (2020) Dopamine receptor subtypes, physiology and pharmacology: new ligands and concepts in schizophrenia. Front Pharmacol 11:1003 Mauger DM, Cabral BJ, Presnyak V, Su SV, Reid DW, Goodman B, Link K, Khatwani N, Reynders J, Moore MJ, McFadyen IJ (2019) mRNA structure regulates protein expression through changes in functional half-life. Proc Natl Acad Sci U S A 116:24075–24083 McCarthy C, Carrea A, Diambra L (2017) Bicodon bias can determine the role of synonymous SNPs in human diseases. BMC Genomics 18:227 McClure ML, Wen H, Fortenberry J, Hong JS, Sorscher EJ (2014) S-palmitoylation regulates biogenesis of core glycosylated wild-type and F508del CFTR in a post-ER compartment. Biochem J 459:417–425 Meyer IM, Miklos I (2005) Statistical evidence for conserved, local secondary structure in the coding regions of eukaryotic mRNAs and pre-mRNAs. Nucleic Acids Res 33:6338–6348 Mijnders M, Kleizen B, Braakman I (2017) Correcting CFTR folding defects by small-molecule correctors to cure cystic fibrosis. Curr Opin Pharmacol 34:83–90 Minucci A, Concolino P, De Bonis M, Costella A, Paris I, Scambia G, Capoluongo E (2018) Preliminary molecular evidence associating a novel BRCA1 synonymous variant with hereditary ovarian cancer syndrome. Hum Genome Var 5:2 Miranda GG, Rodrigue KM, Kennedy KM (2021) Cortical thickness mediates the relationship between DRD2 C957T polymorphism and executive function across the adult lifespan. 
Brain Struct Funct 226:121–136 Monani UR, Lorson CL, Parsons DW, Prior TW, Androphy EJ, Burghes AH, McPherson JD (1999) A single nucleotide difference that alters splicing patterns distinguishes the SMA gene SMN1 from the copy gene SMN2. Hum Mol Genet 8:1177–1183 Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L (2006) Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314:1930–1933 Nielsen KB, Sorensen S, Cartegni L, Corydon TJ, Doktor TK, Schroeder LD, Reinert LS, Elpeleg O, Krainer AR, Gregersen N, Kjems J, Andresen BS (2007) Seemingly neutral polymorphic variants may confer immunity to splicing-inactivating mutations: a synonymous SNP in exon 5 of MCAD protects from deleterious mutations in a flanking exonic splicing enhancer. Am J Hum Genet 80:416–432 O’Brien EP, Ciryam P, Vendruscolo M, Dobson CM (2014) Understanding the influence of codon translation rates on cotranslational protein folding. Acc Chem Res 47:1536–1544 Okiyoneda T, Veit G, Sakai R, Aki M, Fujihara T, Higashi M, Susuki-Miyata S, Miyata M, Fukuda N, Yoshida A, Xu H, Apaja PM, Lukacs GL (2018) Chaperone-independent peripheral quality control of CFTR by RFFL E3 ligase. Dev Cell 44(694-708):e697 Pagani F, Stuani C, Tzetis M, Kanavakis E, Efthymiadou A, Doudounakis S, Casals T, Baralle FE (2003) New type of disease causing mutations: the example of the composite exonic regulatory elements of splicing in CFTR exon 12. Hum Mol Genet 12:1111–1120 Pagani F, Raponi M, Baralle FE (2005) Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc Natl Acad Sci U S A 102:6368–6372 Pal S, Tiwari A, Sharma K, Sharma SK (2020) Does conserved domain SOD1 mutation has any role in ALS severity and therapeutic outcome? BMC Neurosci 21:42 Pankow S, Bamberger C, Yates JR 3rd. 
(2019) A posttranslational modification code for CFTR maturation is altered in cystic fibrosis. Sci Signal 12
Parkes M, Barrett JC, Prescott NJ, Tremelling M, Anderson CA, Fisher SA, Roberts RG, Nimmo ER, Cummings FR, Soars D, Drummond H, Lees CW, Khawaja SA, Bagnall R, Burke DA, Todhunter CE, Ahmad T, Onnie CM, McArdle W, Strachan D, Bethel G, Bryan C, Lewis CM, Deloukas P, Forbes A, Sanderson J, Jewell DP, Satsangi J, Mansfield JC, Wellcome Trust Case Control C, Cardon L, Mathew CG (2007) Sequence variants in the autophagy gene IRGM and multiple other replicating loci contribute to Crohn's disease susceptibility. Nat Genet 39:830–832 Pechmann S (2018) Coping with stress by regulating tRNAs. Sci Signal 11 Pechmann S, Frydman J (2013) Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat Struct Mol Biol 20:237–243 Pechmann S, Chartron JW, Frydman J (2014) Local slowdown of translation by nonoptimal codons promotes nascent-chain recognition by SRP in vivo. Nat Struct Mol Biol 21:1100–1105 Polte C, Wedemeyer D, Oliver KE, Wagner J, Bijvelds MJC, Mahoney J, de Jonge HR, Sorscher EJ, Ignatova Z (2019) Assessing cell-specific effects of genetic variations using tRNA microarrays. BMC Genomics 20:549 Presnyak V, Alhusaini N, Chen YH, Martin S, Morris N, Kline N, Olson S, Weinberg D, Baker KE, Graveley BR, Coller J (2015) Codon optimality is a major determinant of mRNA stability. Cell 160:1111–1124 Rab A, Bartoszewski R, Jurkuvenaite A, Wakefield J, Collawn JF, Bebok Z (2007) Endoplasmic reticulum stress and the unfolded protein response regulate genomic cystic fibrosis transmembrane conductance regulator expression. Am J Physiol Cell Physiol 292:C756–C766 Rak R, Dahan O, Pilpel Y (2018) Repertoires of tRNAs: the couplers of genomics and proteomics. Annu Rev Cell Dev Biol 34:239–264 Raponi M, Baralle FE, Pagani F (2007) Reduced splicing efficiency induced by synonymous substitutions may generate a substrate for natural selection of new splicing isoforms: the case of CFTR exon 12. 
Nucleic Acids Res 35:606–613 Rauscher R, Ignatova Z (2018) Timing during translation matters: synonymous mutations in human pathologies influence protein folding and function. Biochem Soc Trans 46:937–944 Rauscher R, Bampi GB, Guevara-Ferrer M, Santos LA, Joshi D, Mark D, Strug LJ, Rommens JM, Ballmann M, Sorscher EJ, Oliver KE, Ignatova Z (2021) Positive epistasis between disease- causing missense mutations and silent polymorphism with effect on mRNA translation velocity. Proc Natl Acad Sci U S A 118 Reczko M, Maragkakis M, Alexiou P, Grosse I, Hatzigeorgiou AG (2012) Functional microRNA targets in protein coding sequences. Bioinformatics 28:771–776 Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M (2019) CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res 47:D886–D894 Rentzsch P, Schubach M, Shendure J, Kircher M (2021) CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 13:31 Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, Zielenski J, Lok S, Plavsic N, Chou JL et al (1989) Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 245:1066–1073 Rodnina MV, Wintermeyer W (2016) Protein elongation, co-translational folding and targeting. J Mol Biol 428:2165–2185 Rommens JM, Iannuzzi MC, Kerem B, Drumm ML, Melmer G, Dean M, Rozmahel R, Cole JL, Kennedy D, Hidaka N et al (1989) Identification of the cystic fibrosis gene: chromosome walking and jumping. Science 245:1059–1065 Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J (2013) The RNAsnp web server: predicting SNP effects on local RNA secondary structure. Nucleic Acids Res 41:W475–W479 Santos M, Fidalgo A, Varanda AS, Oliveira C, Santos MAS (2019) tRNA deregulation and its consequences in cancer. 
Trends Mol Med 25:853–865 Sato S, Ward CL, Kopito RR (1998) Cotranslational ubiquitination of cystic fibrosis transmembrane conductance regulator in vitro. J Biol Chem 273:7189–7192
Saunders R, Deane CM (2010) Synonymous codon usage influences the local protein structure observed. Nucleic Acids Res 38:6719–6728 Saunders R, Mann M, Deane CM (2011) Signatures of co-translational folding. Biotechnol J 6:742–751 Scherrer K (2018) Primary transcripts: from the discovery of RNA processing to current concepts of gene expression – review. Exp Cell Res 373:1–33 Schimmel P (2018) The emerging complexity of the tRNA world: mammalian tRNAs beyond protein synthesis. Nat Rev Mol Cell Biol 19:45–58 Schroeder SJ (2018) Challenges and approaches to predicting RNA with multiple functional structures. RNA 24:1615–1624 Schwarz A, Beck M (2019) The benefits of cotranslational assembly: a structural perspective. Trends Cell Biol 29:791–803 Shah K, Cheng Y, Hahn B, Bridges R, Bradbury NA, Mueller DM (2015) Synonymous codon usage affects the expression of wild type and F508del CFTR. J Mol Biol 427:1464–1479 Sharma M, Benharouga M, Hu W, Lukacs GL (2001) Conformational and temperature-sensitive stability defects of the delta F508 cystic fibrosis transmembrane conductance regulator in post- endoplasmic reticulum compartments. J Biol Chem 276:8942–8950 Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, Gross M, Backofen R, Diederichs S (2019) A pan-cancer analysis of synonymous mutations. Nat Commun 10:2569 Sharp PM, Li WH (1986) An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol 24:28–38 Sharp PM, Li WH (1987) The codon Adaptation Index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295 Sheppard DN (2004) CFTR channel pharmacology: novel pore blockers identified by high- throughput screening. J Gen Physiol 124:109–113 Sheppard DN (2011) Cystic fibrosis: CFTR correctors to the rescue. 
Chem Biol 18:145–147 Sheppard DN, Ostedgaard LS, Winter MC, Welsh MJ (1995) Mechanism of dysfunction of two nucleotide binding domain mutations in cystic fibrosis transmembrane conductance regulator that are associated with pancreatic sufficiency. EMBO J 14:876–883 Shishido H, Yoon JS, Yang Z, Skach WR (2020) CFTR trafficking mutations disrupt cotranslational protein folding by targeting biosynthetic intermediates. Nat Commun 11:4258 Simhadri VL, Hamasaki-Katagiri N, Lin BC, Hunt R, Jha S, Tseng SC, Wu A, Bentley AA, Zichel R, Lu Q, Zhu L, Freedberg DI, Monroe DM, Sauna ZE, Peters R, Komar AA, Kimchi-Sarfaty C (2017) Single synonymous mutation in factor IX alters protein properties and underlies haemophilia B. J Med Genet 54:338–345 Stegh AH, Kim H, Bachoo RM, Forloney KL, Zhang J, Schulze H, Park K, Hannon GJ, Yuan J, Louis DN, DePinho RA, Chin L (2007) Bcl2L12 inhibits post-mitochondrial apoptosis signaling in glioblastoma. Genes Dev 21:98–111 Steingrimsdottir H, Rowley G, Dorado G, Cole J, Lehmann AR (1992) Mutations which alter splicing in the human hypoxanthine-guanine phosphoribosyltransferase gene. Nucleic Acids Res 20:1201–1208 Tay Y, Zhang J, Thomson AM, Lim B, Rigoutsos I (2008a) MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature 455:1124–1128 Tay YM, Tam WL, Ang YS, Gaughwin PM, Yang H, Wang W, Liu R, George J, Ng HH, Perera RJ, Lufkin T, Rigoutsos I, Thomson AM, Lim B (2008b) MicroRNA-134 modulates the differentiation of mouse embryonic stem cells, where it causes post-transcriptional attenuation of Nanog and LRH1. Stem Cells 26:17–29 Tellier M, Maudlin I, Murphy S (2020) Transcription and splicing: a two-way street. Wiley Interdiscip Rev RNA 11:e1593 Torrent M, Chalancon G, de Groot NS, Wuster A, Madan Babu M (2018) Cells alter their tRNA abundance to selectively regulate protein synthesis during stress conditions. Sci Signal 11
Torres AG, Reina O, Stephan-Otto Attolini C, Ribas de Pouplana L (2019) Differential expression of human tRNA genes drives the abundance of tRNA-derived fragments. Proc Natl Acad Sci U S A 116:8451–8456 Tsai CJ, Sauna ZE, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM, Nussinov R (2008) Synonymous mutations and ribosome stalling can lead to altered folding pathways and distinct minima. J Mol Biol 383:281–291 Tsao D, Shabalina SA, Gauthier J, Dokholyan NV, Diatchenko L (2011) Disruptive mRNA folding increases translational efficiency of catechol-O-methyltransferase variant. Nucleic Acids Res 39:6201–6212 Tseng SC, Kimchi-Sarfaty C (2011) SNPs in ADAMTS13. Pharmacogenomics 12:1147–1160 Tsui LC, Dorfman R (2013) The cystic fibrosis gene: a molecular genetic perspective. Cold Spring Harb Perspect Med 3:a009472 Tuller T (2014) Challenges and obstacles related to solving the codon bias riddles. Biochem Soc Trans 42:155–159 Tuller T, Waldman YY, Kupiec M, Ruppin E (2010a) Translation efficiency is determined by both codon bias and folding energy. Proc Natl Acad Sci U S A 107:3645–3650 Tuller T, Carmi A, Vestsigian K, Navon S, Dorfan Y, Zaborske J, Pan T, Dahan O, Furman I, Pilpel Y (2010b) An evolutionarily conserved mechanism for controlling the efficiency of protein translation. Cell 141:344–354 Varga K, Jurkuvenaite A, Wakefield J, Hong JS, Guimbellot JS, Venglarik CJ, Niraj A, Mazur M, Sorscher EJ, Collawn JF, Bebok Z (2004) Efficient intracellular processing of the endogenous cystic fibrosis transmembrane conductance regulator in epithelial cell lines. J Biol Chem 279:22578–22584 Veit G, Avramescu RG, Perdomo D, Phuan PW, Bagdany M, Apaja PM, Borot F, Szollosi D, Wu YS, Finkbeiner WE, Hegedus T, Verkman AS, Lukacs GL (2014) Some gating potentiators, including VX-770, diminish DeltaF508-CFTR functional expression. 
Sci Transl Med 6:246ra297 Veit G, Avramescu RG, Chiang AN, Houck SA, Cai Z, Peters KW, Hong JS, Pollard HB, Guggino WB, Balch WE, Skach WR, Cutting GR, Frizzell RA, Sheppard DN, Cyr DM, Sorscher EJ, Brodsky JL, Lukacs GL (2016) From CFTR biology toward combinatorial pharmacotherapy: expanded classification of cystic fibrosis mutations. Mol Biol Cell 27:424–433 Walsh IM, Bowman MA, Soto Santarriaga IF, Rodriguez A, Clark PL (2020) Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc Natl Acad Sci U S A 117:3528–3534 Wang X, He Y, Jiang Y, Feng X, Zhang G, Xia Z, Zhou Y (2019) Screening and mutation analysis of hyperphenylalaninemia in newborns from Xiamen, China. Clin Chim Acta 498:161–166 Ward CL, Kopito RR (1994) Intracellular turnover of cystic fibrosis transmembrane conductance regulator. Inefficient processing and rapid degradation of wild-type and mutant proteins. J Biol Chem 269:25710–25718 Ward CL, Omura S, Kopito RR (1995) Degradation of CFTR by the ubiquitin-proteasome pathway. Cell 83:121–127 Wen JD, Lancaster L, Hodges C, Zeri AC, Yoshimura SH, Noller HF, Bustamante C, Tinoco I (2008) Following translation by single ribosomes one codon at a time. Nature 452:598–603 Whitesides GM, Grzybowski B (2002) Self-assembly at all scales. Science 295:2418–2421 Wu X, Jornvall H, Berndt KD, Oppermann U (2004) Codon optimization reveals critical factors for high level expression of two rare codon genes in Escherichia coli: RNA stability and secondary structure but not tRNA abundance. Biochem Biophys Res Commun 313:89–96 Xu B, Meng Y, Jin Y (2021) RNA structures in alternative splicing and back-splicing. Wiley Interdiscip Rev RNA 12:e1626 Yu CH, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, Liu Y (2015) Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. 
Mol Cell 59:744–754 Yue JK, Winkler EA, Rick JW, Burke JF, McAllister TW, Oh SS, Burchard EG, Hu D, Rosand J, Temkin NR, Korley FK, Sorani MD, Ferguson AR, Lingsma HF, Sharma S, Robinson CK, Yuh
EL, Tarapore PE, Wang KK, Puccio AM, Mukherjee P, Diaz-Arrastia R, Gordon WA, Valadka AB, Okonkwo DO, Manley GT, Investigators T-T (2017) DRD2 C957T polymorphism is associated with improved 6-month verbal learning following traumatic brain injury. Neurogenetics 18:29–38 Zeitlin PL (2006) Is it go or NO go for S-nitrosylation modification-based therapies of cystic fibrosis transmembrane regulator trafficking? Mol Pharmacol 70:1155–1158 Zhang G, Ignatova Z (2009) Generic algorithm to predict the speed of translational elongation: implications for protein biogenesis. PLoS One 4:e5036 Zhang F, Kartner N, Lukacs GL (1998) Limited proteolysis as a probe for arrested conformational maturation of delta F508 CFTR. Nat Struct Biol 5:180–183 Zhang J, Long M, Li L (2005) Translational effects of differential codon usage among intragenic domains of new genes in Drosophila. Biochim Biophys Acta 1728:135–142 Zhang G, Hubalewska M, Ignatova Z (2009) Transient ribosomal attenuation coordinates protein synthesis and co-translational folding. Nat Struct Mol Biol 16:274–280 Zhang G, Fedyunin I, Miekley O, Valleriani A, Moura A, Ignatova Z (2010) Global and local depletion of ternary complex limits translational elongation. Nucleic Acids Res 38:4778–4787 Zhang K, Zhang X, Cai Z, Zhou J, Cao R, Zhao Y, Chen Z, Wang D, Ruan W, Zhao Q, Liu G, Xue Y, Qin Y, Zhou B, Wu L, Nilsen T, Zhou Y, Fu XD (2018) A novel class of microRNA-recognition elements that function only within open reading frames. Nat Struct Mol Biol 25:1019–1027 Zhou H, Rigoutsos I (2014) MiR-103a-3p targets the 5’ UTR of GPRC5A in pancreatic cells. RNA 20:1431–1439 Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
Chapter 7
Methods to Evaluate the Effects of Synonymous Variants
Brian C. Lin, Katarzyna I. Jankowska, Douglas Meyer, and Upendra K. Katneni
7.1 Introduction Degeneracy of the genetic code enables a majority of amino acids (with the exceptions of methionine and tryptophan) to be encoded by multiple codons, called synonymous codons (Athey et al. 2017). Synonymous codons of an amino acid are not used at the same frequency, and in many organisms, there is a selective bias towards some codons versus others, a phenomenon called codon usage bias (CUB). Synonymous variants are single nucleotide changes that result in synonymous codon substitutions without changing the underlying amino acid sequence. A majority of synonymous variants typically introduce synonymous codon substitutions by replacement of the third nucleotide in a codon, but base substitutions at the first position of a leucine or arginine codon can also be synonymous. For a long time, synonymous variants, with the exception of those that are located in consensus splice site sequences at the exon-intron boundaries, were assumed to be functionally neutral in view of the lack of amino acid changes and were commonly referred to as “silent variants” (Hunt et al. 2014). Under the same assumption, it was considered safe to employ synonymous variants in recombinant proteins and gene therapy designs using a technique called codon optimization to primarily improve protein expression. However, there is a critical mass of evidence in the last two decades showing that synonymous variants are not always silent and can affect encoded protein’s expression and quality through multiple mechanisms including altered CUB, RNA structure, RNA stability, splicing, miRNA binding, translation efficiency and
B. C. Lin · K. I. Jankowska · D. Meyer · U. K. Katneni (*) Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation & Research, US FDA, Silver Spring, MD, USA e-mail: [email protected] © Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_7
B. C. Lin et al.
co-translational folding (Sauna and Kimchi-Sarfaty 2011; Bali and Bebok 2015). A large number of synonymous variants have been implicated in several diseases and cancers, as well as in altered drug responses (Sauna et al. 2007; Sauna and Kimchi-Sarfaty 2013; Sharma et al. 2019). Misclassification of synonymous variants as neutral or silent could result in failure to recognize disease-causing variants or failure to properly regulate recombinant therapeutic products harboring synonymous variants. Therefore, a careful evaluation of the functional effects of synonymous variants is critical. A plethora of in-vitro, ex-vivo and in-silico tools are currently available to assess functional effects of one or more co-occurring synonymous variants (Hunt et al. 2014). As expected, predicting or studying the effects of synonymous variants is not straightforward, as the predictions of in-silico tools may not always be accurate and in-vitro evaluation of variants sometimes lacks relevant biological context. Also, synonymous variants may exert their functional effects through multiple mechanisms, in some instances requiring the use of multiple methods for a comprehensive evaluation. Conversely, because assays rest on different underlying principles, not all of them will detect a given functional effect, and only one or a select few methods may be successful. The current chapter reviews methods frequently employed to study functional effects of synonymous variants (Fig. 7.1).
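The definition of a synonymous variant given in the introduction can be illustrated with a short sketch that enumerates every single-nucleotide change preserving the encoded amino acid. It assumes the standard genetic code and DNA (T rather than U) notation; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order (assumption:
# the standard nuclear code applies to the gene of interest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_neighbors(codon):
    """Return (codon_position, new_codon) for each single-base change that keeps the amino acid."""
    aa = CODON_TABLE[codon]
    hits = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            new = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[new] == aa:
                hits.append((pos + 1, new))
    return hits


def synonymous_variants(cds):
    """List (1-based nucleotide position, old codon, new codon, amino acid) for an in-frame CDS."""
    variants = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        for pos, new in synonymous_neighbors(codon):
            variants.append((i + pos, codon, new, CODON_TABLE[codon]))
    return variants


# Amino acids that admit a synonymous change at the first codon position.
first_position = {
    CODON_TABLE[c]
    for c in CODON_TABLE
    if c not in ("TAA", "TAG", "TGA")
    for pos, _ in synonymous_neighbors(c)
    if pos == 1
}
```

Under these assumptions, `first_position` contains only leucine and arginine, and the methionine (ATG) and tryptophan (TGG) codons have no synonymous neighbors, consistent with the text.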
Fig. 7.1 A graphical illustration showing the frequently employed methods to predict the effects of synonymous variants at mRNA and protein levels with a combination of in-silico and experimental tools. Created with Biorender.com
7 Methods to Evaluate the Effects of Synonymous Variants
7.2 Exploring the Effects of Synonymous Variants on mRNA
7.2.1 Fitness, Codon Usage Bias, and mRNA Transcription
Synonymous variants, once deemed inconsequential, are now widely recognized to leave imprints in the genetic code that can impact cellular fitness and shape the gene expression landscape across various species (Lebeuf-Taylor et al. 2019). Though once believed to be unaffected by “natural selection” due to the absence of a direct change to the amino acid sequence, synonymous variants occur at frequencies that vary among species and genes and are therefore now recognized as a platform for adaptive evolution (Bailey et al. 2014; Horton et al. 2021). Both weak and strong purifying selection have been found at many synonymous sites (Keightley and Halligan 2011; Zeng and Charlesworth 2009; Lawrie et al. 2013; Savisaar and Hurst 2018; Seoighe et al. 2020), and mounting evidence has emerged pinpointing a role for synonymous variants in disease (Hunt et al. 2014; Sauna and Kimchi-Sarfaty 2011; Kimchi-Sarfaty et al. 2010; Simhadri et al. 2017). These studies often use comparative genomics and construct matrix models, which consider a variety of codon usage parameters, to measure selection intensities. Zeng and Charlesworth (2009) applied matrix analysis to a large Drosophila data set and found that there was mutational bias for rare codons, and that more highly expressed genes with a greater proportion of frequent codons are under stronger selection. Indeed, CUB, which describes the relative frequencies of synonymous codons, varies among species, and more highly expressed genes tend to have a bias towards optimal codons (Frumkin et al. 2018). CUB reflects natural evolutionary drift and correlates with a multitude of factors, such as tRNA abundance, GC content, codon placement, and mRNA folding (Parvathy et al. 2022).
Synonymous variants are evaluated for whether they conform to codon usage biases, and CUB may be evaluated through many different metrics, which can be used interchangeably. One such metric is the codon adaptation index (CAI), which calculates directional synonymous codon usage bias and is useful for predicting whether synonymous variants may drive changes in expression (Sharp and Li 1987). Relative synonymous codon usage (RSCU) is a metric that compares the observed frequency of a codon with the frequency expected if all synonymous codons were used equally. Codon usage frequencies for each species are recorded in codon usage tables, which are regularly updated to integrate the continuous influx of new genetic data (Athey et al. 2017; Alexaki et al. 2019a). Alternatively, the codon bias index (CBI) is also commonly used to estimate the preference of genes for specific codons (Fox and Erill 2010). We recommend the comprehensive review by Bahiri-Elitzur and Tuller (2021) on codon-based indices for a thorough analysis of the advantages, disadvantages, and central uses of these metrics along with many others. Furthermore, as pairs of consecutive codons (bicodons) are also subject to evolutionary pressure and have significantly different usage frequencies when coding for lowly or highly abundant proteins (Diambra 2017), the codon pair bias (bicodon
usage) in coding sequences is often evaluated by relative synonymous bi-codon usage (RSBCU) (Coleman et al. 2008) or codon pair scores (CPS) (Kunec and Osterrieder 2016). A species-specific CPS indicates whether a given codon pair is present more or less frequently than statistically predicted. In addition, rare codons may cause ribosome pausing and slow translation elongation, which in turn can inhibit translational initiation, promote mRNA degradation, and affect protein folding (Peterson et al. 2020). Thus, rare and common codon clustering is often computed using %minmax (Rodriguez et al. 2018) or rare codon enrichment (RC) (Jacobs and Shakhnovich 2017; Clarke and Clark 2008). Aside from in-silico tools, a reporter system for rare codon-dependent expression evaluation in mammals was also developed (Peterson et al. 2020). Expression vectors containing a Green Fluorescent Protein (GFP) cDNA with a codon bias towards rare mammalian codons or optimized for common mammalian codons were expressed in human HEK293T cells and then subjected to Fluorescence-Activated Cell Sorting (FACS) analysis to separate cells based on the level of their GFP fluorescence. The cells expressing GFP with rare mammalian codons exhibited an approximately 100-fold lower mean fluorescence intensity when compared to cells that expressed GFP optimized for common mammalian codons. These metrics to evaluate CUB, along with many other studies on synonymous variants, have laid the foundation for a developing field of research on “codon optimization”, with the idea that by altering the frequencies of codons, gene expression can be beneficially tuned (Alexaki et al. 2019a; Mauro and Chappell 2014). Many in-silico tools have been used to selectively optimize the CUB of a particular gene of interest, such as Optimizer (Puigbò et al. 2007), Gene Designer (Villalobos et al. 2006) and DNA Works (Hoover and Lubkowski 2002).
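Two of the codon usage metrics described above, RSCU and CAI, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the standard genetic code, a user-supplied list of reference coding sequences, and a simplified CAI in which codons absent from the reference set are skipped. The function names are hypothetical and do not correspond to any of the cited tools.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group codons into synonymous families, one family per amino acid.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)


def codons(cds):
    """Split an in-frame coding sequence into codons, ignoring any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(reference_cds_list):
    """Observed codon count divided by the count expected under equal synonymous usage."""
    counts = Counter(c for cds in reference_cds_list for c in codons(cds))
    values = {}
    for family in SYNONYMS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not observed in the reference set
        expected = total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


def relative_adaptiveness(rscu_values):
    """Weight of each codon relative to the most used codon of its family."""
    weights = {}
    for aa, family in SYNONYMS.items():
        if aa in "MW*":
            continue  # single-codon families and stop codons carry no information
        present = [rscu_values[c] for c in family if c in rscu_values]
        if not present:
            continue
        best = max(present)
        for c in family:
            # Simplification (assumption): codons never seen in the reference are skipped.
            if rscu_values.get(c, 0) > 0:
                weights[c] = rscu_values[c] / best
    return weights


def cai(cds, weights):
    """Geometric mean of the relative adaptiveness values over the scored codons."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

Comparing `cai(wild_type, weights)` with `cai(variant, weights)` gives a first, coarse indication of whether a synonymous substitution moves a sequence towards or away from the codon preferences of the chosen reference set.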
These tools require the user to input the genetic sequence, which the software then codon-optimizes based on specialized parameters like CAI and species-specific codon usage bias. However, when introducing synonymous variants into a gene, it is important to check for unintended effects. One question often posed is whether synonymous variants in the genetic sequence challenge cellular fitness, and if so, whether the effects stem from transcription, translation and/or altered gene regulation. To study fitness effects of synonymous variants, two experimental techniques have generally been used. The first is to evaluate the distribution of fitness effects (DFE) for synonymous variants; this method relies on targeted mutagenesis to generate a series of variants, which can then be tested experimentally for differences in growth rates and competitive fitness (Lebeuf-Taylor et al. 2019; Peris et al. 2010). For example, Cuevas et al. used this one-step DFE approach to show that RNA viruses are more vulnerable to synonymous variants than DNA viruses and that 5% of random synonymous substitutions are lethal to RNA viruses (Cuevas et al. 2012). In addition, the same approach was used by Domingo-Calap and colleagues to perform a similar study on DNA and RNA bacteriophages (Domingo-Calap et al. 2009). However, when using a DFE approach, one must bear in mind that these mutations are generally evaluated in an unnatural environment, which may exaggerate their effects, as any expression
changes may be under greater selection pressures. Alternatively, one can assess the role of synonymous mutations in driving adaptive evolution through an experimental evolution (EE) approach. This second experimental method relies on constructing replicate populations of a species and allowing them to evolve over time in a controlled environment. Through deep sequencing, populations that develop and adapt through synonymous variants can be evaluated for changes in a variety of fitness metrics and codon usage biases. Using the EE approach, Knöppel et al. compared the fitness costs of synonymous variants in Salmonella enterica and saw that the synonymous mutants had reduced growth rates, which could be corrected through adaptive mutations that evolved over time to increase transcription rates (Knöppel et al. 2016). Both approaches have been used extensively, with the DFE approach providing characterization of synonymous variants and their effects on fitness, and the EE approach providing a method to characterize the impact of adaptive synonymous variants on evolved populations. Based on previous observations (Zeng and Charlesworth 2009; Cuevas et al. 2012; Domingo-Calap et al. 2009; Knöppel et al. 2016), replacing rare codons with frequent codons can be inferred to improve fitness. However, this is not necessarily true, as many cases have shown decreased fitness upon introduction of synonymous variants (Lebeuf-Taylor et al. 2019). While it is often mentioned that codon usage bias underlies fitness changes, effects of CUB are often very small when assessing only a few variants, and not easily detectable (Rahman et al. 2021). Instead, there is greater experimental evidence that observed changes in fitness may stem from changes to transcriptional processes.
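As a rough illustration of how competitive fitness measurements might be turned into a distribution of fitness effects, the sketch below computes a per-generation selection coefficient from a pairwise competition assay and bins a set of coefficients into classes. The log-ratio formula, the neutral band of 0.01, and all function names are assumptions made for this example and are not taken from the studies cited above.

```python
import math


def selection_coefficient(variant_0, reference_0, variant_t, reference_t, generations):
    """Per-generation selection coefficient from variant and reference abundances
    measured at the start and end of a competition (assumed log-ratio model)."""
    ratio_start = variant_0 / reference_0
    ratio_end = variant_t / reference_t
    return math.log(ratio_end / ratio_start) / generations


def summarize_dfe(coefficients, neutral_band=0.01):
    """Count deleterious, neutral, and beneficial variants.

    neutral_band is a hypothetical threshold below which effects are treated
    as indistinguishable from zero.
    """
    summary = {"deleterious": 0, "neutral": 0, "beneficial": 0}
    for s in coefficients:
        if s < -neutral_band:
            summary["deleterious"] += 1
        elif s > neutral_band:
            summary["beneficial"] += 1
        else:
            summary["neutral"] += 1
    return summary
```

In practice the neutral band would be set from the measurement noise of replicate competitions, which is one reason small CUB effects can be hard to detect.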
Synonymous variants can affect transcription initiation through the formation or strengthening of promoter and transcription factor binding sites on the same or separate genes, which can lead to increased transcription (Lebeuf-Taylor et al. 2019; Kershner et al. 2016; Stergachis et al. 2013). Ando and colleagues found that in Mycobacterium tuberculosis (Mtb), a silent mutation (c.609G>A) in the mycolic acid biosynthesis A (mabA) gene resulted in the formation of an alternative promoter site, which increased transcription of inhA, thereby providing isoniazid resistance for the bacterium (Ando et al. 2014) (Table 7.1). The group used RNA sequencing (RNAseq) on RNA isolated from Mtb strains expressing the synonymous variant to show that transcription was enhanced. More importantly, through RNAseq analysis, they identified two additional transcripts of inhA, which suggested the presence of an alternative promoter site. They confirmed the promoter site by using a reporter assay to show a lack of enhanced transcription in the absence of the alternative promoter site, and used a promoter prediction tool (GENETYX-MAC) to confirm its presence. Since this study, other prediction tools have been developed to predict promoter sites in eukaryotes with improved accuracy, including DeePromoter (Oubounyt et al. 2019) and DeeReCT-PromID (Umarov et al. 2019). This experimental approach provides a useful strategy for assessing changes in RNA transcription that can be easily replicated in future studies of synonymous variants. Transcription can be easily assessed by measuring total mRNA abundance through standard laboratory techniques, such as northern blotting and RT-qPCR (reverse transcription-quantitative polymerase chain reaction), or by
Table 7.1 Examples of studies investigating synonymous variants by multiple methods

| Synonymous variant | Methods used in study | Description of results | References |
| --- | --- | --- | --- |
| mabA (c.609G>A) | Northern blotting, RT-qPCR, RNAseq, promoter prediction software | One of the first studies to show that synonymous variants can produce an alternative promoter site. Synonymous mutation in mabA of Mycobacterium tuberculosis causes the formation of an alternative promoter site adjacent to the mutation position, which increases transcription of inhA, providing isoniazid resistance. | Ando et al. (2014) and Betel et al. (2010) |
| hERG (codon-modified version) | Immunoblotting, RT-qPCR, pulse-labeling with 4-thiouridine, RNAfold mRNA structure prediction, circular dichroism (CD), actinomycin-D half-life decay analysis | A thorough analysis of the effects of codon optimization on transcription/translation. Codon modified hERG-CM (reduced GC content, rare codons, and mRNA structure) was found to confer lower protein expression due to lower transcription rate, decreased mRNA stability, and reduced translation. | Bertalovitz et al. (2018) |
| TLR7 (codon optimized version) | CAI, immunoblotting, ribosome profiling, polysome profile analysis, RT-qPCR, northern blotting, actinomycin-D RNA half-life analysis, 4-thiouracil RNA labeling | At the time, many believed that translation was the likely mechanism underlying codon optimization; this study showed that in TLR genes, transcription was the main factor. Codon optimization of Tlr7 leads to increased protein levels, which were determined to be largely due to enhanced transcription versus translation. | Newman et al. (2016) and Kertesz et al. (2007) |
| ΔF508 CFTR (c.1520_1522delTCT) | In-vitro transcription/translation, CD, mRNA folding assays, RT-qPCR, immunoblotting, mFold RNA structure prediction | One of the earliest studies to show that a synonymous variant can alter mRNA structure in a disease-related protein. Synonymous site within the ΔF508 CFTR leads to altered mRNA structure and expression of mutant protein. Removing the synonymous variant (ATC) reverts structure to that of wild-type mRNA. | Krek et al. (2005) and Bartoszewski et al. (2010) |
| COMT (3 haplotypes with synonymous variations [c.198A>G, c.186C>T, c.408C>G]) | mFold RNA structure prediction, mRNA folding analysis, enzymatic activity assay, immunoblotting | Study highlighted a functional effect on pain sensitivity that is derived from protein expression differences due to single synonymous variants. Most stable COMT haplotype via mRNA structure corresponds to haplotype with lowest activity and protein levels. Synonymous variants underlie differences in pain sensitivity based on differences in protein levels. | Coronnello and Benos (2013) and Nackley et al. (2006) |
| IRGM (c.313C>T) | SnipMir, RegRNA and Patrocles miRNA prediction software, tandem affinity purification of RNA, RT-qPCR, immunoblotting | Study pointed out that a synonymous mutation can displace a miRNA binding site and lead to increased Crohn’s disease risk. Synonymous mutation in IRGM was shown to remove a miRNA binding site via in-silico miRNA binding site prediction, followed by experimental validation. | Bandyopadhyay and Mitra (2009) |
| Factor IX (c.459G>A) | mRNA structure prediction tools, immunoblotting with conformation-sensitive antibodies, trypsin digestion, activity assays | Study identified that a disease-linked synonymous mutation in factor IX causes reduced expression due to conformational changes and reduced translation. Synonymous mutation in factor IX alters mRNA structure as determined by mRNA prediction tools and causes reduced expression due to slowing translation. | Simhadri et al. (2017) |
| PAH (c.30C>G) | ESE finder 3.0 splicing prediction tool, minigene assay for splicing, RNA affinity purification, RT-qPCR | Study identified a synonymous mutation that introduces an exonic splicing silencer, which may underlie the disease, PKU. Synonymous mutation in PAH, which was once deemed neutral, is found to cause the formation of a splicing silencer via splicing prediction tools and RNA assessment. | Dobrowolski et al. (2010) |
| MDR1 (c.3435C>T) | FACS, RT-qPCR, immunoblotting with conformation-sensitive antibodies, trypsin digestion | One of the earliest studies to identify a synonymous mutation affecting co-translational protein folding and substrate specificity. Synonymous mutation in MDR1 alters drug and inhibitor interactions as determined by codon usage bias, functional inhibitory assays, and co-translational protein folding studies. | Kimchi-Sarfaty et al. (2007) |
high-throughput techniques like RNAseq. However, any changes to mRNA abundance should be further evaluated by investigating changes to RNA synthesis and RNA decay through high-resolution gene expression profiling and metabolic labeling techniques (Dölken et al. 2008; Newman et al. 2016). Bertalovitz and colleagues used these experimental tools to evaluate how synonymous nucleotide
modifications affect the human ether-a-go-go-related gene (hERG) ion channel (Table 7.1). The group used RT-qPCR to show that mRNA transcript levels of the potassium voltage-gated channel subfamily H member 2 (KCNH2), which encodes hERG, were lower in a synonymous nucleotide modified KCNH2, and they employed a pulse-experiment with 4-thiouridine exposure to determine that the rate of synthesis of wild type KCNH2 is sixfold higher than the modified version (Bertalovitz et al. 2018). Similarly, Newman and colleagues used these experimental methods to investigate how codon bias and GC content altered expression of toll-like receptors (TLR). They performed northern blotting, RT-qPCR, and thiouracil metabolic labeling to show that optimizing synonymous nucleotides of TLR7 enhances its transcription rate and low GC content modifications limit the transcription rates of many TLR genes (Newman et al. 2016).
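The separation of mRNA abundance into synthesis and decay components can be sketched numerically. The example below assumes simple first-order decay after transcriptional arrest (for instance, an actinomycin-D time course as listed in Table 7.1) and a steady state in which synthesis balances decay; both are modeling assumptions for illustration, and the function names are hypothetical.

```python
import math


def decay_rate(times_h, abundances):
    """Decay constant k (per hour) from a least-squares fit of ln(abundance) against time.

    Assumes first-order decay: abundance(t) = abundance(0) * exp(-k * t).
    """
    logs = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return -sxy / sxx


def half_life(k):
    """Half-life implied by a first-order decay constant."""
    return math.log(2) / k


def synthesis_rate(steady_state_abundance, k):
    """At steady state, synthesis balances decay: synthesis = k * abundance."""
    return k * steady_state_abundance
```

Fitting the wild-type and variant transcripts separately indicates whether a difference in steady-state mRNA level is better explained by a change in decay constant or by a change in the inferred synthesis rate.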
7.2.2 Evaluation of mRNA Structure and Stability
Synonymous variants, by introducing nucleotide changes within the mRNA transcript, can significantly alter mRNA structure and stability, with significant consequences in health and disease (Gaither et al. 2021; Wu et al. 2019). The single-stranded nature of mRNA allows it to adopt a variety of unique folds and structures based on its sequence characteristics. mRNA structures can be highly dynamic, and their structural integrity can determine translational efficiency and susceptibility to RNA degradation (Ding et al. 2014). The stability of distinct regions within the mRNA sequence, which correlates with factors such as GC content and codon usage biases, can influence the expression levels of individual genes (Kudla et al. 2009). Highly expressed genes tend to have synonymous variants that yield greater mRNA instability at the 5′ regions of the gene, near the start codons, which is considered more favorable for promoting translation initiation (Babendure et al. 2006; Gu et al. 2010). There is a growing number of examples of synonymous variants that have been found to alter mRNA structure and stability. For example, a synonymous mutation in the DRD2 dopamine receptor was found to alter mRNA stability and lead to decreased expression of the receptor through faster mRNA decay (Duan et al. 2003). In addition, a single synonymous variant introduces additional loop structures in the mRNA of the mutant ΔF508 cystic fibrosis transmembrane conductance regulator (CFTR); interestingly, without this variant, the mRNA folds into the wild-type CFTR mRNA structure (Bartoszewski et al. 2010) (Table 7.2). Notably, haplotypes of the human catechol-O-methyltransferase (COMT) gene, characterized by unique synonymous changes, led to differential changes in pain sensitivity due to alterations in COMT enzymatic activity and protein levels (Nackley et al. 2006).
These studies utilize a combination of in-silico tools to predict whether the synonymous variant induces changes to mRNA structure, and subsequently employ a variety of experimental techniques to further assess the consequences of the synonymous variant on mRNA structure, folding, and/or expression of the encoded protein (Table 7.2). Many of these techniques and
Table 7.2 In-silico tools for predicting mRNA structure/stability

| Tool | Prediction parameters | URL | References |
| --- | --- | --- | --- |
| CoFold | Co-transcriptional folding (accurate for longer sequences >1000 nt) | https://e-rna.org/cofold/ | Proctor and Meyer (2013) |
| CycleFold | Noncanonical base-pairing | http://rna.urmc.rochester.edu | Sloma and Mathews (2017) |
| DMfold | Machine-learning, pseudoknots | https://github.com/linyuwangPHD/RNA-Secondary-Structure-Database | Wang et al. (2019) |
| IPKnot | Pseudoknots | http://rtips.dna.bio.keio.ac.jp/ipknot/ | Sato and Kato (2021) |
| Kinefold | Pseudoknots | http://kinefold.curie.fr/ | Xayaphoummine et al. (2005) |
| Knotty | Pseudoknots | https://github.com/HosnaJabbari/Knotty | Jabbari et al. (2018) |
| MC-fold-DP | Noncanonical base-pairing | https://hackage.haskell.org/package/MC-Fold-DP | zu Siederdissen et al. (2011) |
| ProbKnot | Pseudoknots | https://rna.urmc.rochester.edu/RNAstructureWeb/Servers/ProbKnot/ProbKnot.html | Wu et al. (2019) |
| remuRNA | Relative entropy between Boltzmann ensembles | https://github.com/bgruening/galaxytools/tree/master/tools/rna_tools/remurna | Salari et al. (2013) |
| RNAfold | Minimum free energy, partition function calculations | http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi | Gruber et al. (2008) |
| SPOT-RNA | Machine-learning (neural network), pseudoknots, noncanonical base-pairing | https://sparks-lab.org/server/spot-rna/ | Singh et al. (2021) and Singh et al. (2019) |
| UNAFold | Minimum free energy | http://www.unafold.org/ | Markham and Zuker (2008) |
approaches to evaluate mRNA secondary structure will be explained in detail within this section. To determine the effect of synonymous variants on mRNA structure, most methods rely on the initial use of in-silico computational tools to predict mRNA structure. While X-ray crystallography and nuclear magnetic resonance are the state-of-the-art techniques for uncovering mRNA structure at single-nucleotide resolution, the high cost and low-throughput nature of these techniques limit their suitability for assessing large volumes of synonymous variants. In addition, these techniques are limited to revealing only the in-vitro conformation of mRNA, which can differ from its in-vivo structure. Due to the growing understanding of the importance of mRNA secondary structure, many in-silico tools have been created and are now widely used to characterize the effects of synonymous variants on mRNA structure (Table 7.2). Over the last few decades, the most widely adopted RNA structure prediction approaches have relied on score-based methods to find an optimal structure that accurately matches the empirical structure. The inaugural prediction tool was based on the simple theory that the optimal structure is the one that pairs the highest number of bases (Zuker and Stiegler 1981). Subsequently, more precise models have been developed that incorporate thermodynamics by inferring that the native structure would be the one with the lowest free energy (Turner and Mathews 2010; Tinoco Jr. et al. 1971). Computational methods, such as mFold (Zuker 2003), now called UNAFold (Markham and Zuker 2008), Kinefold (Xayaphoummine et al. 2005), remuRNA (Salari et al. 2013), CoFold (Proctor and Meyer 2013), and RNAfold (Gruber et al. 2008), utilize this thermodynamic principle and are widely used today to predict RNA folding energy and the optimal mRNA structure with minimum free energy.
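The earliest scoring idea mentioned above, choosing the structure with the most paired bases, can be illustrated with a Nussinov-style dynamic program. This is a stand-in written for illustration, not the implementation of any tool cited in this chapter; the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three nucleotides are assumptions. Thermodynamic tools replace the simple pair count with free-energy terms.

```python
# Allowed base pairs (assumption: Watson-Crick pairs plus G-U wobble).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a hairpin.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: base j is left unpaired.
            score = best[i][j - 1]
            # Case 2: base j pairs with some base k, splitting the interval in two.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

Running the function on a wild-type and a variant sequence shows whether a single substitution changes the maximum achievable pairing, although real tools weigh pairs by stacking energies and loop penalties.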
For example, Simhadri and colleagues used three separate prediction programs to show that a synonymous variant (c.459G>A) in F9 (Factor IX) alters mRNA structure and thermodynamic mRNA stability, and leads to reduced extracellular expression (Simhadri et al. 2017). Furthermore, Hamasaki-Katagiri et al. compared multiple RNA structure prediction tools and used them to evaluate F8 (Factor VIII) mutation datasets; they discovered that in F8 mRNAs, synonymous variants that are present in more stable mRNA regions are more likely to cause disease (Simhadri et al. 2017; Hamasaki-Katagiri et al. 2017). Over the last decade, many variations of these programs have been newly developed or adapted from previous models to consider additional folding features, such as pseudoknots, which form from intramolecular base pairing patterns like stems and stem-loops, or noncanonical base pairings. Examples of such software include ProbKnot (Bellaousov and Mathews 2010), IPKnot (Sato and Kato 2021), and Knotty (Jabbari et al. 2018) for predicting pseudoknot structures and MC-Fold-DP (zu Siederdissen et al. 2011) and CycleFold (Sloma and Mathews 2017) for predicting structures with noncanonical base pairs. However, although these methods provide reliable means to predict potential RNA structures, issues persist with these prediction strategies. In many cases, especially when attempting to model either short or very long RNA sequences, the number of predicted structures can be large, and the absence of a method to filter
these structures poses a significant challenge. Fortunately, advances in the computational and biological fields have included the development of machine learning approaches, which have recently been used to improve RNA secondary structure prediction (Chen et al. 2020; Calonaci et al. 2020; Lu et al. 2019; Zhang et al. 2019). Though still early in development, machine learning approaches can incorporate many more features into their modeling, including sequence patterns and evolutionary information. When paired with classification parameters and trained with the growing number of publicly available RNA structure datasets and databases, including the Nearest Neighbor Database (NNDB), which pairs its RNA data with thermodynamic information, machine learning provides significant opportunities to improve upon currently available RNA structure prediction tools (Turner and Mathews 2010; Singh et al. 2021). Although significant limitations remain in RNA structure prediction modeling, such as overfitting of machine learning models and the scarcity of high-accuracy RNA training sets, new machine learning tools, such as DMfold (Wang et al. 2019) and SPOT-RNA (Singh et al. 2019, 2021), have already shown the capability to outperform other prediction methods (Zhao et al. 2021). Using an ensemble of these prediction tools to assess mRNA sequences, with and without the synonymous variant, is a useful strategy to mitigate biases from individual tools when characterizing the effect of synonymous variants on mRNA structure. Subsequently, predicted structural changes can be validated through experimental approaches to evaluate potential consequences of the altered mRNA structure, such as its effects on co-translational folding, protein expression, and mRNA abundance.
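The ensemble strategy described above can be sketched as a comparison of wild-type and variant structures across several predictors. The example assumes that each tool's output has already been exported as a dot-bracket string; it does not call any prediction tool, ignores pseudoknot bracket types for simplicity, and uses hypothetical tool labels and function names.

```python
def base_pairs(dot_bracket):
    """Parse a dot-bracket string into a set of (i, j) paired positions.

    Only round brackets are interpreted; other symbols are treated as unpaired.
    """
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            pairs.add((stack.pop(), j))
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


def ensemble_summary(predictions, threshold=1):
    """Summarize structural change across tools.

    predictions maps a tool label to (wild_type_structure, variant_structure).
    Returns the per-tool distances and the fraction of tools whose distance
    reaches the (assumed) threshold.
    """
    distances = {
        tool: base_pair_distance(wild_type, variant)
        for tool, (wild_type, variant) in predictions.items()
    }
    changed = sum(1 for d in distances.values() if d >= threshold)
    return distances, changed / len(distances)
```

A variant flagged by most tools is a stronger candidate for experimental follow-up than one flagged by a single predictor.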
For example, circular dichroism (CD), mRNA folding assays, endoribonuclease digestion, and RT-qPCR techniques were used to demonstrate that the synonymous mutation in the ΔF508 mutant form of CFTR introduced loop structures into its mRNA structure that may have altered its co-translational folding (Bartoszewski et al. 2010). In addition, targeted mutagenesis is a common approach to generate a reporter construct with the desired synonymous variant. The effects of the variant can then be studied when expressed in a desired host cell, and its RNA, once extracted, can be evaluated through northern blot analysis and RT-qPCR. Nackley and colleagues, who evaluated COMT haplotypes with synonymous variants, used the mFold in-silico prediction program to compare mRNA stability among the multiple COMT haplotypes (Nackley et al. 2006). Subsequently, they transiently expressed each of the haplotypes in a rat adrenal cell line (PC12) and used a COMT enzymatic assay and RT-qPCR to show that the most stable haplotype had both the least amount of protein and the lowest activity levels.
7.2.3 Study of Pre-mRNA Splicing
Splicing is the most widely appreciated mechanism of synonymous variant-induced phenotype changes. A study using modified phylogenetic comparison reported that as many as 45% of synonymous variants affect pre-mRNA splicing (Mueller et al. 2015).
Synonymous variants in the coding sequence can dysregulate the splicing of pre-mRNAs either by disrupting native splice sites and splicing regulatory element (SRE) sequences or by strengthening cryptic splice sites and introducing SREs. This splicing dysregulation could manifest in a variety of effects including intron retention, mis-splicing, exon skipping and leaky splicing (Katneni et al. 2019). Splicing dysregulation features prominently among the characterized mechanisms underlying phenotypic changes induced by synonymous variants (Hunt et al. 2014; Bali and Bebok 2015); however, this could reflect a historical bias towards examining primarily splicing-related changes, since many of the other mechanisms were identified and reported more recently, over the last two decades. The potential impact of synonymous variants on pre-mRNA splicing can be assessed by ex-vivo, in-vitro and in-silico approaches. Ex-vivo assessment comprises characterization of transcript isoforms in RNA isolated from primary cells. RT-qPCR is a highly sensitive and simple technique for ex-vivo assessment of a synonymous variant’s effect on pre-mRNA splicing. Splicing analysis of blood RNA is frequently performed to identify a variant’s effect on splicing (Wai et al. 2020; Apetrei et al. 2021). Northern blotting is an alternative technique with higher specificity relative to RT-qPCR, but it is less sensitive and more cumbersome to perform. High-throughput techniques like microarrays and RNA sequencing can be used for simultaneous assessment of the effects of multiple expressed genetic variants on splicing. A major limitation of ex-vivo assessment is that isolation of RNA from primary cells expressing the target gene is not always feasible. For example, F9 is exclusively expressed in hepatocytes, and collection of liver cells to study splicing effects of F9 genetic variants is not practical. In such instances, in-vitro minigene assays and in-silico prediction tools can be employed.
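When transcript isoforms are characterized by RNA sequencing or RT-PCR, exon skipping is often summarized as the percentage of transcripts that include the exon. The sketch below assumes that reads (or band intensities) supporting exon inclusion and exon skipping have already been counted; the "percent spliced in" summary and the function names are illustrative choices and are not taken from the studies cited above.

```python
def percent_spliced_in(inclusion_reads, skipping_reads):
    """Percentage of informative reads that support inclusion of the exon."""
    total = inclusion_reads + skipping_reads
    if total == 0:
        raise ValueError("no informative reads for this exon")
    return 100.0 * inclusion_reads / total


def delta_psi(reference, variant):
    """Change in exon inclusion between variant and reference samples.

    Each argument is an (inclusion_reads, skipping_reads) pair.
    """
    return percent_spliced_in(*variant) - percent_spliced_in(*reference)
```

A strongly negative value for a sample carrying the synonymous variant would be consistent with variant-induced exon skipping, subject to replicate measurements and appropriate controls.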
In in-vitro minigene assays, exon sequences encoding the native sequence or the genetic variant, together with flanking intron sequences, are cloned into reporter plasmids and transfected into cells, and the expressed mRNA is evaluated for splicing alterations. While the minigene experimental system is not a perfect substitute for ex-vivo mRNA evaluation in primary cells due to potential differences in cell-type specific splicing factors (Wu et al. 2018), selecting a cell line model that closely matches the primary cells in which the target gene is expressed can partially address the issue and minimize the discrepancy. Also, this system examines genetic variants in the context of only neighboring exons, potentially not capturing the effects of distantly located regulatory elements in the transcript. Despite these limitations, the minigene experimental system was successfully used to demonstrate the effects of synonymous variants on splicing in several genes (Katneni et al. 2019; Pagani et al. 2005; Zhou et al. 2021). Finally, in-silico splicing analysis tools are frequently relied upon by researchers and are used either alone or in conjunction with ex-vivo or in-vitro assessment tools when feasible. The available in-silico splicing analysis tools primarily search for variants’ effects on disruption and/or activation of splice sites and SREs and predict their effect on splicing. The in-silico splicing analysis tools employ variable methods for these prediction purposes (Table 7.3). For example, NNSPLICE (Reese et al. 1997) and MaxEntScan (Yeo and Burge 2004), two frequently employed
B. C. Lin et al.
Table 7.3 In-silico tools for predicting splicing effects

| Tool | Algorithm/prediction method | Description | References |
|---|---|---|---|
| ESEfinder | SELEX (systematic evolution of ligands by exponential enrichment) screen | Predicts exonic splicing enhancers (ESEs). Employed an in-vitro SELEX screen to identify putative ESEs responsive to SR proteins involved in splicing | Smith et al. (2006) |
| ESRseq | Minigene experimental assessment | An ESRseq score was assigned to all possible hexamer sequences based on their positional effect on splicing in minigene experiments. Changes in the ESRseq score can be used to predict the effect of variants on SREs | Ke et al. (2011) |
| EX-SKIP | Combines predictions of several splicing regulatory element (SRE) prediction tools | Compares ESE and exonic splicing silencer (ESS) profile of native and mutated sequences and predicts probability of exon skipping | Raponi et al. (2011) |
| Genscan | Maximal dependence decomposition | A decision tree method capturing significant position dependencies from an aligned set of signal sequences | Burge and Karlin (1997) |
| HEXplorer | Computational and RESCUE-type approach | Developed hexamer based “HEXplorer score”, which provides a quantitative measure of variant’s effect on splicing enhancing and silencing properties | Erkelenz et al. (2014) |
| MaxEntScan | Maximum entropy principle | MaxEntScan is based on modeling short sequence motifs involved in splicing while simultaneously considering adjacent and nonadjacent position dependencies | Yeo and Burge (2004) |
| NetGene2 | Neural networks | Works in a two-step prediction scheme: (1) detection of coding sequence and (2) prediction of splice sites | Hebsgaard et al. (1996) |
| NNSplice | Neural networks | Employs neural networks trained on consensus splice sites. NNSPLICE also considers dinucleotide frequencies due to the strong correlation between neighboring nucleotides in splice site consensus sequences | Reese et al. (1997) |
| RESCUE-ESE | Computational approach in conjunction with experimental validation | Predicts the effect of genetic variants on ESEs. By using the computational RESCUE-ESE (Relative Enhancer and Silencer Classification by Unanimous Enrichment) approach, 238 hexanucleotide ESE sequences (of possible 4096 hexamers) were identified | Fairbrother et al. (2004) |
| SpliceAI | Deep residual neural network | Employs a 32-layer deep neural network trained on splicing determinants in primary pre-mRNA sequences | Jaganathan et al. (2019) |
| Spliceview | Position weight matrix | Employs position weight matrix and is further improved by considering position dependent dependencies between nucleotides | Rogozin and Milanesi (1997) |
7 Methods to Evaluate the Effects of Synonymous Variants
splice site prediction tools, employ “neural networks” and the “maximum entropy principle”, respectively, for their predictions. HEXplorer (Erkelenz et al. 2014) and ESRseq (Ke et al. 2011), two frequently used SRE prediction tools, predict a variant’s effect on SREs based on changes in the scores of hexamer sequences overlapping the genetic variant. For calculation of these score changes, ESRseq uses both experimental and computational data, whereas HEXplorer employs only computationally derived data. On the other hand, some tools like HSF (Desmet et al. 2009) and EX-SKIP (Raponi et al. 2011) perform a combined analysis, using multiple methods of prediction or combining the predictions of different tools (Katneni et al. 2019). For a detailed list of frequently used tools and their methodology, readers are advised to refer to a recent review article by Riolo et al. (2021). Because different tools employ different methods, splicing predictions for individual genetic variants tend to vary between tools. In our experience, predictions were highly accurate and showed good consensus between tools for genetic variants located within native canonical splice sites, whereas predictions for deep exonic variants were variable and of low accuracy. Furthermore, using a combination of tools that employ different methods and/or focus on different features of splicing (e.g., a combination of splice site and SRE prediction tools) could potentially provide a better prediction of the effects of deep exonic variants on splicing (Katneni et al. 2019). It should be underlined that in-silico tools are only predictive, and further validation by ex-vivo and in-vitro assessment methods is essential for an accurate understanding of variants’ effects on splicing.
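The hexamer-based scoring idea described above for SRE prediction tools can be sketched as follows. This is a minimal illustration only: the score table is a hypothetical placeholder rather than the published ESRseq or HEXplorer values, and the function names are assumptions. The sketch sums the scores of all hexamers overlapping a variant position in the reference and variant sequences and reports the difference.

```python
def hexamers_overlapping(seq: str, pos: int, k: int = 6):
    """Yield every k-mer of seq that covers the 0-based position pos."""
    first_start = max(0, pos - k + 1)
    last_start = min(pos, len(seq) - k)
    for start in range(first_start, last_start + 1):
        yield seq[start:start + k]


def score_change(ref_seq: str, pos: int, alt_base: str, scores: dict, k: int = 6) -> float:
    """Total hexamer score of the variant minus that of the reference.

    scores maps hexamer -> score; hexamers absent from the table count as 0.
    """
    if ref_seq[pos] == alt_base:
        raise ValueError("alt_base equals the reference base")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_total = sum(scores.get(h, 0.0) for h in hexamers_overlapping(ref_seq, pos, k))
    alt_total = sum(scores.get(h, 0.0) for h in hexamers_overlapping(alt_seq, pos, k))
    return alt_total - ref_total


# Hypothetical score table: one enhancer-like hexamer with a positive score.
toy_scores = {"GAAGAA": 1.0}
print(score_change("AAGAAGAAGAAG", 5, "C", toy_scores))  # negative: enhancer-like hexamers lost
```

A negative change in this toy example corresponds to loss of enhancer-like hexamers; with a real score table, the sign conventions and thresholds of the specific tool would apply.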
7.2.4 Detecting Changes in miRNA Binding
MicroRNAs (miRNAs) are endogenous non-coding RNA molecules that can post-transcriptionally regulate expression of genes through binding target mRNA sequences (Gebert and MacRae 2019; Jonas and Izaurralde 2015; Bartel 2009). Though only ~22 nucleotides in length, miRNAs can target multiple genes within the same pathway and have been shown to affect many biological systems, including those important for cellular growth, differentiation, development, and apoptosis (Tüfekci et al. 2014). As a result, interactions between miRNA and mRNA transcripts are important mechanisms that cells rely on to maintain gene expression homeostasis. While miRNAs mostly regulate target genes by binding to 3′ untranslated regions (3′ UTRs) of target mRNAs via one or multiple binding sites, it has been shown that a large fraction of miRNA binding sites can also be found in coding sequences (Forman et al. 2008; Fang and Rajewsky 2011). Even single synonymous changes within the coding region may alter miRNA-mRNA interaction, cause loss or gain of a miRNA binding site, and thus affect gene expression. Though current studies suggest that the effects of miRNA binding within coding regions are weaker than those at sites in the 3′ UTR, miRNA interactions in the coding region may still enhance post-transcriptional regulation independently of miRNA binding in the 3′ UTR (Fang and Rajewsky 2011; Forman and Coller 2010), and disruption of target recognition at these sites may be implicated in disease development (Shabalina et al. 2013).
While the minimum requirements for miRNA-mRNA pairing have not been fully established, synonymous SNPs that fall within miRNA binding sites can have profound consequences for miRNA-mRNA interactions. A large-scale analysis performed with TargetScan, a widely used in-silico prediction tool for identifying miRNA binding sites, found that in 45.9% of cases in which a synonymous SNP fell within a miRNA binding site, miRNA regulation was removed or added (Wang et al. 2015). Experimental evidence has also corroborated the principle that synonymous variants can alter miRNA binding. The most notable evidence was derived from the observation that a synonymous mutation (c.313C>T) disrupted miR-196 regulation of the autophagy regulatory gene immunity-related GTPase family M protein (IRGM) and resulted in functional changes to autophagy and increased risk of Crohn’s disease. A separate study found that a somatic synonymous mutation (c.51C>T) in BCL2L12 led to differential targeting by hsa-miR-671-5p and increased BCL2L12 expression, while reducing expression of p53, a pro-apoptotic protein (Gartner et al. 2013) (Table 7.1). Although complementarity between specific miRNA sequences and mRNA sequence motifs is likely the most important factor that determines binding, miRNA binding to mRNAs is largely an imperfect and complex system (Bartel 2009). The first eight nucleotides at the 5′ end of the miRNA, called the miRNA “seed region”, have been thought to be the most important, but non-canonical interactions through centered pairing or 3′ compensatory pairing have been discovered as alternative mechanisms of interaction (Shin et al. 2010; Agarwal et al. 2015). The effect of miRNA-mRNA interactions is stronger for conserved target sites of length 7–8 nt in coding regions than for non-conserved sites (Fang and Rajewsky 2011).
Binding through these sites may guide distinct regulatory functions, such as favoring mRNA cleavage as opposed to gene repression or degradation (Lewis et al. 2005). Numerous in-silico prediction tools and high-throughput experimental techniques have been developed that have made it substantially more feasible to assess the effects of synonymous variants on miRNA binding (Šulc et al. 2015; Riffo-Campos et al. 2016). In-silico tools have generally been the first tools employed for identifying potential miRNA binding sites within mRNA sequences of the gene of interest. These in-silico tools consider both sequence characteristics and conserved sequence motifs to identify specific miRNA binding partners. With the advancement of machine learning techniques and predictive strategies over the last decade, many diverse algorithms have been used to weigh various parameters, in hopes of discovering an ideal method to identify authentic miRNA-mRNA pairs. While a preeminent tool has not yet been established, there is now a large, comprehensive toolbox of prediction tools, each implementing unique approaches, guiding assumptions, and principles to identify miRNA-mRNA pairs (Table 7.4) (reviewed in Riolo et al. 2020). Because miRNAs regulate genes mainly through binding to the 3′ UTR, most current algorithms are designed and optimized to investigate miRNA-mRNA interactions in the 3′ UTR, and only a few in-silico tools allow miRNA binding prediction within coding regions. MiRDB (http://mirdb.org/custom.html) is a target prediction database that provides flexible searching of miRNA binding sites in UTR and
Table 7.4 In-silico tools for identifying miRNA-mRNA binding sites

| Tool | Algorithm/prediction method | Description | Target prediction (a) | Year | References |
|---|---|---|---|---|---|
| ComiR | Machine learning (support vector machines) | Provides probability calculations for a set of miRNAs to bind based on set binding energy score parameters derived from 4 different prediction resources (miRanda, PITA, TargetScan, mirSVR algorithms) | CDS + 3′ UTR | 2015 (updated in 2020 to include coding regions) | Coronnello and Benos (2013) and Bertolazzi et al. (2020) |
| Diana-microT | microT-CDS algorithm | Designed to identify targets in coding sequence and 3′ untranslated region; updated in 2013 to v5.0 to support meta-analysis statistics and pathway enrichment analysis with results from next generation sequencing | CDS + 3′ UTR | 2009 (updated in 2013) | Paraskevopoulou et al. (2013) |
| MinoTar | Sequence alignment and conservation scoring | Model considers sequence alignments and calculates conservation scores based on empirical conservation rates of all codons in sequence | CDS + 3′ UTR | 2010 | Schnall-Levin et al. (2010) |
| miRDB (MirTarget) | Machine learning (support vector machine [SVM]) | Model integrates miRNA binding and target expression data | CDS + 3′ UTR | 2020 | Wong and Wang (2014) and Chen and Wang (2019) |
| Paccmit-CDS | Ranking based on Markov model and sequence alignment | Model assesses precomputed miRNA-mRNA pairs and ranks pairs based on false discovery rates | CDS + 3′ UTR | 2015 | Šulc et al. (2015) and Marín et al. (2013) |
| TargetScan and TargetScanS | Sequence alignment | Model relies on sequence alignments and contains specific programs based on species type | CDS + 3′ UTR | 2005 | Lewis et al. (2005) and Lewis et al. (2003) |
| DeepMirTar | Machine learning (stacked denoising autoencoder [SdA]) | Model considers canonical and non-canonical seed sites | 3′ UTR | 2018 | Wen et al. (2018) |
| DeepTarget | Machine learning (recurrent neural network [RNN]) | Model based on deep recurrent neural networks-based auto encoding and sequence-sequence interaction learning | 3′ UTR | 2016 | Lee et al. (2016) |
| MBSTar | Machine learning (random forest) | Model considers wobble base pairing at seed sites and ± 30 nucleotides flanking regions | 3′ UTR | 2015 | Bandyopadhyay et al. (2015) |
| miRanda-mirSVR | Support vector regression (SVR) | Model considers both thermodynamics and dynamic-programming alignment to determine miRNA targets | 3′ UTR | 2010 | Betel et al. (2010) |
| miRror | Ranked list - comparison of multiple miRNA-target prediction resources | Model incorporates predictive experimental information derived from miRNA profiling, mass spec proteomics and gene expression resources; allows for targeted analysis based on preferred tissue or cell type | 3′ UTR | 2010 | Friedman et al. (2010) |
| NbmiRTar | Machine learning (Naïve Bayes) | Model based on sequence and miRNA:mRNA duplex information from previously validated targets to train ML classifier | 3′ UTR | 2007 | Yousef et al. (2007) |
| Patrocles | N/A | Database compilation of DNA sequence polymorphisms that disrupt miRNA gene regulation | 3′ UTR | 2010 | Hiard et al. (2010) |
| PicTar | Statistical tests using genome-wide alignment | Multi-step algorithm that incorporates multiple alignments of sequences with microRNA sequences; predicts target sites based on optimal free energy prediction and determines ideal structure based on maximum likelihood statistical fit | 3′ UTR | 2005 | Krek et al. (2005) |
| PITA | Parameter-free model | Model that calculates the difference in free energy gained from pairing versus unpairing of miRNA with mRNA sequences | 3′ UTR | 2007 | Kertesz et al. (2007) |
| TargetBoost | Machine learning (genetic programming and boosting) | Model relies on creating weighted sequence motifs and classifiers rather than free-energy scores | 3′ UTR | 2005 | Saetrom et al. (2005) |
| TargetMiner | Machine learning (support vector machines [SVM]) | Model involves considerations of expression profiling, structural interactions, and seed-site conservation | 3′ UTR | 2009 | Bandyopadhyay and Mitra (2009) |

(a) Tools that can be used to evaluate synonymous variants within the coding region are listed first in the table (CDS + 3′ UTR)
coding regions of a gene and within a custom sequence (Wong and Wang 2014; Chen and Wang 2019). In addition, TargetScan (Lewis et al. 2003), MinoTar (Schnall-Levin et al. 2010), and Paccmit-CDS (Šulc et al. 2015; Marín et al. 2013) provide algorithms that can be applied to search for miRNA binding sites within native and synonymous sequences of a gene. By comparing the prediction results from a synonymous sequence to those of the wild type sequence, the miRNAs that gain or lose binding sites upon synonymous variation can be identified. As one miRNA can bind to multiple binding sites within the gene and multiple miRNAs can bind to an mRNA, the prediction results often contain a list of hundreds of miRNAs that have a sequence complementary to the studied mRNA. Thus, to evaluate the effects of synonymous variants on miRNA binding, multiple prediction tools should be used. This will allow for more precision in predicting the potential binding sites in the absence and presence of the synonymous variant and will help in determining possible changes to miRNA regulation at these sites. In addition, concurrent predictions from multiple tools based on different underlying algorithms could identify a narrower list of candidate miRNAs affected by synonymous variants for experimental validation. While each tool requires careful consideration of its foundational principles when interpreting results, these in-silico prediction tools, when paired with high-throughput mapping and experimental characterization of miRNA binding sites, can be used to further our understanding of miRNAs and the potential impact of synonymous variants on miRNA regulation (Chi et al. 2009; Ben Or and Veksler-Lublinsky 2021). The results from these prediction tools should be paired with experimental validation that demonstrates the expression of both miRNA and mRNA pairs and their direct interaction in synonymous variant and control wild type sequences.
The predicted miRNA should ultimately affect the expression levels of the mRNA-encoded protein and may further exert a biological effect on the cellular system. Using a host cell that ideally does not endogenously express the in-silico predicted miRNA, the miRNA of interest and a reporter plasmid encoding the gene of interest fused to GFP or luciferase can be transiently transfected. Subsequently, expression levels of the gene of interest, in the presence vs absence of the miRNA, should be measured through standard experimental techniques, such as Northern blotting and RT-qPCR, or via luciferase activity or GFP fluorescence of the reporter. Alternatively, if the chosen host cell expresses the miRNA of interest, mutation constructs of the miRNA sequence and/or miRNA binding sites can be generated and evaluated for specificity of binding and changes in mRNA expression levels. RNAseq can also be performed to provide a global transcriptomic view of miRNA changes. Following the identification of mRNA expression changes, tandem affinity purification (TAP-Tar) and/or CLIP-seq techniques can be used to show the direct interaction and co-expression of mRNA and miRNA pairs. For example, Brest and colleagues (Brest et al. 2011) used TAP-Tar to show that a protective synonymous variant of IRGM bound miR-196 far more strongly than the Crohn’s disease risk-associated IRGM allele (c.313C>T) (Table 7.1). To assess further biological effects on the cell, standard protein quantitation through western blotting, ELISA, and in-vivo cellular assays specific to the encoded protein of interest can be used in the presence and absence of the miRNA to determine whether there is a functional biological effect.
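The comparison of wild type and synonymous sequences for gained or lost miRNA binding sites can be illustrated with a simple seed-match search. This is a minimal sketch, not the algorithm of any tool in Table 7.4: it assumes perfect complementarity to miRNA nucleotides 2 to 8 (one commonly used seed convention, adjustable through the `start` and `end` parameters), ignores conservation, site context and non-canonical pairing, and uses a made-up miRNA sequence named `toy-miR`.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna: str, start: int = 2, end: int = 8) -> str:
    """mRNA sequence (5'->3') that pairs with miRNA nucleotides start..end (1-based)."""
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    return seed.translate(COMPLEMENT)[::-1]  # reverse complement


def find_sites(mrna: str, site: str) -> list:
    """0-based start positions of every exact occurrence of site in mrna."""
    mrna = mrna.upper().replace("T", "U")
    return [i for i in range(len(mrna) - len(site) + 1) if mrna[i:i + len(site)] == site]


def site_changes(wild_type: str, variant: str, mirnas: dict) -> dict:
    """Report miRNAs whose seed-match positions differ between the two sequences."""
    changes = {}
    for name, sequence in mirnas.items():
        site = seed_site(sequence)
        wt_sites = set(find_sites(wild_type, site))
        var_sites = set(find_sites(variant, site))
        if wt_sites != var_sites:
            changes[name] = {
                "lost": sorted(wt_sites - var_sites),
                "gained": sorted(var_sites - wt_sites),
            }
    return changes


# Hypothetical miRNA and toy coding sequences differing by one nucleotide.
toy_mirnas = {"toy-miR": "UACGGAUCCAAGUCGAUCGAUC"}
print(site_changes("AUGGAUCCGUAA", "AUGGAUCUGUAA", toy_mirnas))
```

In practice, running the same comparison with several prediction tools and intersecting the resulting lists, as suggested above, would narrow the candidates for experimental validation.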
7.3 Exploring the Effects of Synonymous Variants on Proteins
7.3.1 Monitoring Translation Kinetics
Synonymous variants could potentially change the translation efficiency of the encoded protein, primarily through changes in codon usage characteristics or altered secondary structure of the mRNA. Codon usage bias (CUB) correlates with local cognate tRNA levels and affects translation elongation speed (Liu 2020). Frequently used (preferred) codons were reported to have higher translation rates than rarely used non-optimal codons (Yu et al. 2015). The codon usage differences introduced by synonymous codon substitutions, whether by natural genetic variants or by techniques like codon optimization, thus have the potential to alter local translation kinetics and overall translation efficiency of the encoded gene (Kimchi-Sarfaty et al. 2007; Alexaki et al. 2019b). mRNA secondary structure changes introduced by synonymous variants can affect ribosome movement during elongation and alter translation efficiency (Nackley et al. 2006). Several methods are currently available to measure effects of synonymous variants on protein translation efficiency (Dermit et al. 2017; Iwasaki and Ingolia 2017). The in-vitro translation assay is frequently employed to demonstrate translation efficiency differences induced by synonymous variants (Katneni et al. 2019; Buhr et al. 2016; Hunt et al. 2019; Sroubek et al. 2013). In this assay, mRNAs encoding genetic variants are translated under in-vitro conditions using cell-free extracts with complete functional translational machinery. The resulting translational products, collected at periodic intervals, can be resolved by SDS-PAGE to identify translation differences. Polysome profiling is another frequently used technique to assess translation efficiency (Sroubek et al. 2013; Griseri et al. 2011; Mordstein et al. 2020).
Polysome profiling involves separation of translating mRNAs that are bound to ribosomes, called polysomes, using sucrose density gradient ultracentrifugation (Chassé et al. 2016). mRNA that is bound to ribosomes will be heavier and settle towards the bottom, whereas ribosome-free mRNA stays at the top. Separation of polysomes based on ribosome density can be achieved to some extent as well. Different fractions of the polysome gradient profile are then collected, RNA is extracted, and the purified RNA is tested for target mRNA using techniques like RT-qPCR, northern blotting, microarray or RNA sequencing (Chassé et al. 2016). By comparing the target mRNA profile across fractions and the total mRNA pool, the effect of genetic variants on translation efficiency of the encoded protein can be revealed (Griseri et al. 2011). A major disadvantage of polysome profiling is that this technique does not differentiate between ribosomes bound to the open reading frame (ORF) and those bound to other sequence features, like 5′ upstream ORFs (uORFs), within the mRNA. This limitation can be addressed by using ribosome profiling, which provides codon-level resolution of ribosome locations on the mRNA (Ingolia et al. 2009). In this technique, the mRNA fragments that are protected from RNase digestion by translating ribosomes are isolated, size selected and sequenced by deep-sequencing techniques. Like
polysome profiling, comparison of the ribosome protected fragments that map to the ORF of the target gene with the fraction of the target transcript in total mRNA will reveal the effect of genetic variants on translation efficiency. Ribosome profiling has been widely used since its inception to assess translational kinetics (Alexaki et al. 2019b; Hunt et al. 2019; Li et al. 2021). Recently, techniques employing a similar principle, namely TRAP-seq (Translating Ribosome Affinity Purification-Sequencing) (Heiman et al. 2014), which employs epitope-tagged ribosomal proteins, and Proximity Specific Ribosome Profiling (Jan et al. 2014), were developed for cell-type specific and spatial analysis of translation, respectively. Overall, these newly developed ribosome profiling based techniques provide better sensitivity and resolution of mRNA translation; however, they have the disadvantage of being costly (Dermit et al. 2017). Proteomic based methods are not commonly employed to study translational kinetics but can be useful to assess expression level effects. Several techniques like p-SILAC (Pulsed Stable Isotope Labeling by/with Amino acids in Cell culture) (Ong et al. 2002), BONCAT (BioOrthogonal Non-Canonical Amino Acid Tagging) (Dieterich et al. 2006), QuanCAT (Quantitative Non-canonical Amino Acid Tagging) (Howden et al. 2013) and PUNCH-P (Puromycin-associated Nascent Chain Proteomics) (Aviner et al. 2014) are available. These techniques rely on labelling nascent proteins with stable isotopes (p-SILAC and QuanCAT), noncanonical amino acids (BONCAT and QuanCAT) or puromycin (PUNCH-P) and sequencing of proteins by mass spectrometry for unbiased quantitative comparison of translational products (Iwasaki and Ingolia 2017). Expression level effects can also be studied by employing less complex techniques like immunoblotting, ELISA and flow cytometry (Iwasaki and Ingolia 2017).
Finally, several imaging-based translation assessment techniques like TRICK (Translating RNA Imaging by Coat protein Knockoff) (Halstead et al. 2015), SINAPS (single-molecule imaging of nascent peptides) (Wu et al. 2016), NCT (nascent chain tracking) (Morisaki et al. 2016) and FUNCAT (Fluorescent Non-Canonical Amino acid Tagging) (Dieterich et al. 2010) are also available (Iwasaki and Ingolia 2017; Chekulaeva and Landthaler 2016), details of which can be found elsewhere (Iwasaki and Ingolia 2017).
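The comparison underlying ribosome profiling based estimates of translation efficiency can be sketched as a simple ratio. This is a minimal illustration under stated assumptions: the inputs are hypothetical, already normalized read densities (for example, reads per kilobase per million) for ribosome protected fragments mapped to the ORF and for total mRNA, and the function names are not taken from any published pipeline.

```python
import math


def translation_efficiency(rpf_density: float, mrna_density: float) -> float:
    """Ribosome footprint density on the ORF divided by mRNA abundance."""
    if mrna_density <= 0:
        raise ValueError("mRNA density must be positive")
    return rpf_density / mrna_density


def log2_te_change(wild_type: tuple, variant: tuple) -> float:
    """log2 ratio of variant to wild type translation efficiency.

    Each argument is a (rpf_density, mrna_density) pair; both footprint
    densities must be non-zero for the logarithm to be defined.
    """
    return math.log2(translation_efficiency(*variant) / translation_efficiency(*wild_type))


# Toy values: equal mRNA abundance, half the footprint density for the variant.
print(log2_te_change((40.0, 20.0), (20.0, 20.0)))
```

A lower footprint density at equal mRNA abundance gives a negative value in this sketch; replicate measurements and appropriate statistics would be needed before drawing conclusions from real data.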
7.3.2 Analysis of Subtle Structural Changes
Nascent polypeptides undergo folding as they start to exit ribosomes, a phenomenon called co-translational folding. As noted in the earlier section, codon usage changes and/or mRNA structural differences induced by synonymous variants can affect translational elongation rates. These altered translational kinetics can perturb the dynamics of co-translational folding and lead to conformational differences in the final protein structure (Walsh et al. 2020). A multitude of methods are available to assess conformational differences induced by genetic variants in the encoded protein (Table 7.5). However, it should be noted that the underlying principles of
Table 7.5 Frequently used experimental methods to identify conformational differences

| Assay | Principle |
|---|---|
| Native gel electrophoresis | Altered conformation could lead to differential gel migration of proteins and binding to conformation sensitive antibodies |
| Inhibitory antibody assay | Conformational differences could result in differential access and inhibition of functional activity by inhibitory antibodies |
| Aptamer binding assay | Structural changes in protein could lead to altered binding affinity to aptamers |
| Limited proteolysis | Altered conformation of a substrate protein could result in differential access to the cleavage sites and thereby alter its digestion profile |
| Circular dichroism (CD) | Structural changes could lead to changes in the absorbance of right- and left-circularly polarized light |
| Hydrogen-deuterium exchange mass spectrometry (HDX-MS) | Changes in structure of the protein could result in differential access of hydrogen for exchange with deuterium, which can be identified through mass spectrometry analysis |
each of the assays discussed below are different and, depending on the nature and extent of the conformational differences, only one or a subset of assays might be able to detect the differences. Accordingly, a comprehensive analysis employing multiple techniques is required to reliably discern the conformational differences. The folding energy of the nascent protein chain can be estimated using an in-silico coarse-grained co-translational folding energy model (Jacobs and Shakhnovich 2017). Larger energy drops imply that more co-translational folding is occurring, which may be facilitated by slower translation and more rare codons downstream. Synonymous variants or codon optimization that disrupt these rare codons may impact co-translational folding. Nuclear Magnetic Resonance (NMR) and X-ray crystallography provide high-resolution analysis of protein structures; however, these techniques are not suitable for all proteins and require special equipment and technical expertise. They are therefore not suitable for routine lab use and are very rarely used in the study of synonymous variants (Buhr et al. 2016). Western blot probing of proteins in native conditions employing conformation sensitive antibodies is a straightforward approach to identify local structural differences (Simhadri et al. 2017). In the event of conformational differences in the native tertiary structure, the accessibility of the epitope sites for antibody binding is altered and will manifest as differences in band intensities upon imaging. Additionally, extreme differences in the structure and shape of the proteins under comparison could result in differential migration in native electrophoresis. While these assays are straightforward, identification of conformation sensitive antibodies, generally monoclonal antibodies targeting specific epitopes, requires screening and testing of multiple antibodies.
Conformational differences in proteins with measurable functional activity can also be assessed by using inhibitory antibodies. These inhibitory antibody assays rely on the principle that conformational differences will result in differential access and inhibition of functional activity by these antibodies (Alexaki et al. 2019b).
Recently, aptamers have emerged as alternatives to antibodies. Aptamers are small single stranded DNA or RNA molecules (20–60 nucleotides) that can be selected for binding to their target molecules, including proteins, with high specificity and affinity. The affinity with which aptamers bind to their target molecules can be measured by a variety of techniques including surface plasmon resonance (Jing and Bowser 2011). Conformational changes in a protein will result in altered binding affinity of the aptamers, and this principle has been exploited to study structural changes induced by synonymous variants (Alexaki et al. 2019b; Zichel et al. 2012). Limited proteolysis is a simple biochemical method commonly employed to identify structural changes induced by synonymous variants (Simhadri et al. 2017; Kimchi-Sarfaty et al. 2007; Yu et al. 2015; Friedrich et al. 2015; Zhao et al. 2017; Fu et al. 2016; Kirchner et al. 2017). Limited proteolysis relies on the principle that, upon limited and controlled exposure to the enzyme, altered conformation of a substrate protein results in differential access to the cleavage sites and thereby alters the digestion profile. While trypsin is the most commonly used enzyme for these experiments, webtools that provide putative digestion profiles based on enzyme recognition sites can be used for selection of optimal proteases (e.g., PeptideCutter (https://web.expasy.org/peptide_cutter/) and PROSPER (https://prosper.erc.monash.edu.au/)). The digestion profiles are generally screened by western blot experiments (Kimchi-Sarfaty et al. 2007) or can be coupled with mass spectrometry for global analysis (Feng et al. 2014). Circular dichroism (CD) is a light absorption spectroscopy method that is also widely used to study the conformation and stability of proteins (Simhadri et al. 2017; Hunt et al. 2019; Kelly and Price 2000).
In this assay, differences in the absorption of right- and left-circularly polarized light are measured to assess conformation of the protein. Secondary and tertiary structural conformation can be analyzed by studying far-UV CD spectra between 260–180 nm and near-UV CD spectra between 320–260 nm, respectively. Since some buffers can absorb below 200 nm, a careful selection of buffer is required for measurements below this wavelength. Key advantages of the assay are that it can be performed relatively quickly (in a few hours) and requires only a small quantity of sample (20 μg or less) (Greenfield 2006a). Hydrogen-deuterium exchange mass spectrometry (HDX-MS) is another technique that is gaining popularity to study conformational features of proteins (Ozohanics and Ambrus 2020; Trabjerg et al. 2018). In this assay, proteins suspended in an aqueous buffer are exposed to heavy water (deuterium oxide, D2O). Over time, amide hydrogens in the proteins will exchange with deuterium. Altered conformation of the protein caused by genetic variants will result in differential access of amide hydrogens for exchange with deuterium, and these differences can be identified by measuring mass changes with mass spectrometry. Several additional techniques like analytical ultracentrifugation, size-exclusion chromatography and chemical probes can also be used to assess synonymous variant induced protein conformational changes (Lundblad 2009).
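The mass-change comparison at the heart of HDX-MS can be sketched as follows. This is a minimal illustration only: the peptide names, masses and the 0.5 Da threshold are hypothetical, and real analyses account for back-exchange, multiple labelling time points and replicate variability.

```python
def deuterium_uptake(unlabelled_mass: float, labelled_mass: float) -> float:
    """Mass increase (Da) of a peptide after deuterium labelling."""
    return labelled_mass - unlabelled_mass


def uptake_differences(wild_type: dict, variant: dict, threshold: float = 0.5) -> dict:
    """Peptides whose deuterium uptake differs between the two proteins.

    Both inputs map peptide -> (unlabelled_mass, labelled_mass) in Da.
    threshold is a hypothetical cut-off on the absolute uptake difference.
    """
    flagged = {}
    for peptide in sorted(wild_type.keys() & variant.keys()):
        diff = deuterium_uptake(*variant[peptide]) - deuterium_uptake(*wild_type[peptide])
        if abs(diff) > threshold:
            flagged[peptide] = round(diff, 3)
    return flagged


# Toy data: pepB takes up 2 Da more deuterium in the variant protein,
# suggesting that this region is more solvent-accessible.
wt = {"pepA": (1000.0, 1004.0), "pepB": (1500.0, 1503.0)}
var = {"pepA": (1000.0, 1004.1), "pepB": (1500.0, 1505.0)}
print(uptake_differences(wt, var))
```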
7.3.3 Assessment of Stability Changes
Altered translational kinetics and/or co-translational folding-dependent conformational changes could result in altered post-translational modifications, activity, and stability of the encoded protein. Post-translational modification changes induced by synonymous variants can be demonstrated either by high-throughput mass spectrometry (Honarmand Ebrahimi et al. 2014) or by employing assays targeted against specific modifications like glycosylation (Zucchelli et al. 2017). Assessment of function/activity is generally protein-specific, and thus discussion of all relevant methods is beyond the scope of this chapter. Instead, general techniques for assessing protein stability changes will be discussed within this section. Protein stability is frequently assessed by techniques like differential scanning calorimetry (DSC), CD, and differential scanning fluorimetry (DSF). These techniques generally analyze thermodynamic unfolding of proteins, employing varying underlying principles to assess protein stability. DSC is one of the most frequently used techniques and has very high sensitivity (Johnson 2013; Gill et al. 2010). DSC measures the heat required to denature proteins, and various metrics calculated from the data, such as the transition midpoint (Tm, the temperature at which 50% of the protein is denatured), can be used to compare samples. Generally, higher Tm values indicate higher stability of the protein (Gill et al. 2010). CD, as noted in the earlier section, measures the difference in absorbance of right- and left-circularly polarized light. Proteins subjected to a gradual increase in temperature will unfold, and this loss of higher order structure results in changes to the CD spectra (Greenfield 2006b). The resulting CD data are used to calculate various parameters, including Tm, and to compare the stability of samples.
DSF employs hydrophobic extrinsic fluorescent dyes, such as SYPRO Orange, that remain quenched in the aqueous solvents in which proteins are suspended but emit fluorescence when bound to hydrophobic regions of proteins (Gao et al. 2020). As the temperature increases, proteins gradually unfold and expose their hydrophobic core, to which the dye binds, producing an increased fluorescent signal. The resulting changes in fluorescence emission with temperature can be compared between samples to identify stability differences. A key advantage of the DSF technique is that it can be performed with standard RT-qPCR instruments (Gao et al. 2020; Niesen et al. 2007). NanoDSF is an alternative form of DSF that relies on intrinsic fluorescence from tryptophan residues within the protein rather than on external dyes (Ghisaidoobe and Chung 2014). The fluorescence emission from tryptophan is sensitive to local conformation and generally correlates with solvent exposure (Vivian and Callis 2001). Thermal unfolding alters the fluorescence emission from tryptophan (the effect increasing with the degree of denaturation). Accordingly, altered protein stability will affect thermal unfolding and thus manifest as differences in the fluorescence emission profiles observed in the nanoDSF assay. In addition to the methods discussed above, techniques such as pulse-chase and hydrogen-deuterium exchange can also be used to assess protein stability. However, it should be noted that while the above discussed assays are commonly employed in the
interrogation of recombinant therapeutic proteins and proteins with amino acid substitutions, their application to the assessment of stability changes induced by synonymous variants is limited and not yet widely adopted. Finally, several in-silico tools have been developed to predict protein stability, but these rely on amino acid sequence changes and are not suitable for the assessment of synonymous variants (Sanavia et al. 2020).
7.3.4 Evaluation of Immunogenicity Risk

Immune responses to exogenous proteins are initiated upon presentation of foreign peptides to the immune system by major histocompatibility complex class II (MHC-II) molecules on the surface of antigen presenting cells (APCs) or, in the case of endogenously expressed proteins, by MHC class I (MHC-I) molecules (Blum et al. 2013). Since synonymous variants can change protein structure and post-translational modifications, and thus alter access to the cleavage sites of endogenous proteases involved in antigen processing and presentation, synonymous variants may also change immunogenicity risk (Bali and Bebok 2015). Currently, in-silico, in-vitro, ex-vivo and in-vivo assays are available to assess immunogenicity risk. The in-silico tools predict immunogenicity based on amino acid changes, while in-vitro assessments, such as peptide-MHC binding assays (Salvat et al. 2014), employ synthetic peptides to compare immunogenicity risk profiles; neither approach is therefore likely to be useful for studying synonymous variants. The MHC-associated Peptide Proteomics (MAPPS) assay (Karle 2020) is an ex-vivo immunogenicity assessment assay that has recently emerged as a superior choice because it directly identifies peptides presented by APCs. In the MAPPS assay, monocytes isolated from healthy donor blood samples are differentiated into immature dendritic cells and are then matured in the presence of the test protein. In the process, the dendritic cells take up the protein, process it and present linear peptides on MHC-II molecules. These peptide fragments are then isolated and sequenced by mass spectrometry. By analyzing these MHC-II presented peptide profiles, potential neo-epitopes arising from the effects of synonymous variants can be identified.
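The comparison of MHC-II presented peptide profiles can be sketched as a set operation. This is an illustrative simplification, not the analysis used in any cited study: the function names, the donor identifiers, the peptide sequences, and the rule of requiring a peptide in at least two donors are all assumptions made here.

```python
from collections import Counter


def variant_only_peptides(reference_peptides, variant_peptides):
    """Peptides presented with the variant protein but not the reference."""
    return set(variant_peptides) - set(reference_peptides)


def recurrent_candidates(donors, min_donors=2):
    """Count, across donors, peptides seen only with the variant protein.

    `donors` maps donor ID -> (reference peptide list, variant peptide list).
    Returns peptides that are variant-only in at least `min_donors` donors.
    """
    counts = Counter()
    for reference, variant in donors.values():
        counts.update(variant_only_peptides(reference, variant))
    return {pep: n for pep, n in counts.items() if n >= min_donors}


# Hypothetical peptide sequences; not taken from any real dataset.
donors = {
    "donor_1": (["AAAKLV", "GGSTEY"], ["AAAKLV", "LLPQRW"]),
    "donor_2": (["AAAKLV"], ["LLPQRW", "MMNDEF"]),
}
candidates = recurrent_candidates(donors)
```

Because a synonymous variant leaves the amino acid sequence unchanged, any peptide flagged this way would reflect a difference in which regions are processed and presented, not a new sequence. Real MAPPS data usually contain nested peptides of varying length, so in practice peptides are often grouped by overlapping region before such a comparison.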
The immunogenicity of the identified neo-epitopes can be confirmed with assays such as the T-cell proliferation assay, ELISPOT assay, and MHC-tetramer assay, which measure activation and/or proliferation of T-cells in response to neo-epitopes (Pratt 2018). A key advantage of the MAPPS assay is that it identifies peptides that are processed from the test protein in its native state and presented by MHC-II molecules of ex-vivo APCs, rather than in-silico predicted or synthetic peptides. However, one limitation of the assay is that in-vivo immune system interactions are very complex, and a MAPPS assay performed under ex-vivo conditions may not be representative of in-vivo conditions. The MAPPS assay has demonstrated differential presentation of T-cell epitopes by plasma-derived and recombinant factor VIII proteins, which showed different immunogenicity risk profiles despite sharing similar amino acid sequences (Jankowski et al. 2019). In-vivo immunogenicity risk assessment is
performed by employing animal models, generally with mice as the model organism. As expected, human proteins are immunogenic in animal models and elicit immune responses against the protein. However, differences in the observed responses (e.g., anti-protein antibodies) could potentially indicate conformational differences. The applicability of data from mouse models can be further improved by using HLA transgenic mice expressing human MHC (Sønderstrup et al. 1999), transgenic mice that express the human protein of interest and are tolerant to the native protein (Hermeling et al. 2005), or immune system-humanized mice (McDermott et al. 2010). It should be noted that animal models have yet to be adopted in the study of synonymous variants.
7.4 In-Silico Tools for Predicting Comprehensive Effects of Synonymous Variants

The in-silico prediction of deleterious or pathogenic synonymous variants is unsurprisingly complex, and consequently very few such tools have been developed. These tools include SilVA (Silent Variant Analyzer) (Buske et al. 2013), DDIG-SN (Detecting Disease-causing Genetic SynoNymous variants) (Livingstone et al. 2017), IDSV (Identification of Deleterious Synonymous Variants) (Shi et al. 2019), TraP (Transcript-inferred Pathogenicity) (Gelfman et al. 2017) and usDSM (Deleterious Synonymous Mutation Prediction using Undersampling Scheme) (Tang et al. 2021). These tools essentially use training datasets consisting of pathogenic and neutral variants, annotate the variants with features considered important for distinguishing the pathogenicity of synonymous variants, and employ different methods to generate a high-accuracy model that incorporates the selected features. The SilVA, IDSV, TraP and usDSM tools primarily considered nucleotide sequence-based features related to conservation, splicing, and RNA folding, and employed random forest models to predict variant impact. DDIG-SN, on the other hand, employed a support vector machine model. While it initially considered protein-based features related to protein conservation, protein structure and global sequence properties, these features were not used in the final DDIG-SN model. These synonymous variant prediction tools are reasonably successful in predicting deleterious synonymous variants; however, there is room for further improvement (Tang et al. 2021). A major impediment to the development of synonymous variant prediction tools is the lack of large, high-quality training datasets. Historically, synonymous variants have been undercharacterized, and even the available datasets are of low quality.
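Training sets of this kind typically contain far fewer pathogenic than neutral variants. As a hedged illustration of the general idea behind an undersampling scheme, the sketch below performs plain random undersampling of the majority class. It is not the specific procedure implemented in usDSM or any other tool named above; the function name, the record layout, and the example variants are hypothetical.

```python
import random


def undersample(variants, label_key="label", seed=0):
    """Randomly drop records until every class matches the smallest class in size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_label = {}
    for v in variants:
        by_label.setdefault(v[label_key], []).append(v)
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


# Hypothetical records: 3 pathogenic and 20 neutral synonymous variants.
variants = (
    [{"id": f"p{i}", "label": "pathogenic"} for i in range(3)]
    + [{"id": f"n{i}", "label": "neutral"} for i in range(20)]
)
balanced = undersample(variants)
```

The balanced set could then be annotated with features and passed to a classifier. Balancing in this way discards data and does nothing to remedy the scarcity and uneven quality of well-characterized synonymous variants.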
To overcome this limitation, recent efforts have been made to generate large artificial datasets that can serve as training sets (Zeng and Bromberg 2019). Hopefully, these datasets will soon prove useful for the development and enhancement of prediction tools to characterize synonymous variants.

Disclaimer Our comments/our contributions are an informal communication and represent our own best judgement. These comments do not bind or obligate FDA.
References Agarwal V, Bell GW, Nam JW, Bartel DP (2015) Predicting effective microRNA target sites in mammalian mRNAs. elife 4:e05005 Alexaki A, Kames J, Holcomb DD, Athey J, Santana-Quintero LV, Lam PVN, Hamasaki- Katagiri N, Osipova E, Simonyan V, Bar H et al (2019a) Codon and codon-pair usage tables (CoCoPUTs): facilitating genetic variation analyses and recombinant gene design. J Mol Biol 431:2434–2441 Alexaki A, Hettiarachchi GK, Athey JC, Katneni UK, Simhadri V, Hamasaki-Katagiri N, Nanavaty P, Lin B, Takeda K, Freedberg D et al (2019b) Effects of codon optimization on coagulation factor IX translation and structure: implications for protein and gene therapies. Sci Rep 9:15449 Ando H, Miyoshi-Akiyama T, Watanabe S, Kirikae T (2014) A silent mutation in mabA confers isoniazid resistance on Mycobacterium tuberculosis. Mol Microbiol 91:538–547 Apetrei A, Molin A, Gruchy N, Godin M, Bracquemart C, Resbeut A, Rey G, Nadeau G, Richard N (2021) A novel synonymous variant in exon 1 of GNAS gene results in a cryptic splice site and causes pseudohypoparathyroidism type 1A and pseudo-pseudohypoparathyroidism in a French family. Bone Rep 14:101073 Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C (2017) A new and updated resource for codon usage tables. BMC Bioinform 18:391 Aviner R, Geiger T, Elroy-Stein O (2014) Genome-wide identification and quantification of protein synthesis in cultured cells and whole tissues by puromycin-associated nascent chain proteomics (PUNCH-P). Nat Protoc 9:751–760 Babendure JR, Babendure JL, Ding J-H, Tsien RY (2006) Control of mammalian translation by mRNA structure near caps. RNA (New York, NY) 12:851–861 Bahiri-Elitzur S, Tuller T (2021) Codon-based indices for modeling gene expression and transcript evolution. Comput Struct Biotechnol J 19:2646–2663 Bailey SF, Hinz A, Kassen R (2014) Adaptive synonymous mutations in an experimentally evolved Pseudomonas fluorescens population. 
Nat Commun 5:4076 Bali V, Bebok Z (2015) Decoding mechanisms by which silent codon changes influence protein biogenesis and function. Int J Biochem Cell Biol 64:58–74 Bandyopadhyay S, Mitra R (2009) TargetMiner: microRNA target prediction with systematic identification of tissue-specific negative examples. Bioinformatics 25:2625–2631 Bandyopadhyay S, Ghosh D, Mitra R, Zhao Z (2015) MBSTAR: multiple instance learning for predicting specific functional binding sites in microRNA targets. Sci Rep 5:8004 Bartel DP (2009) MicroRNAs: target recognition and regulatory functions. Cell 136:215–233 Bartoszewski RA, Jablonsky M, Bartoszewska S, Stevenson L, Dai Q, Kappes J, Collawn JF, Bebok Z (2010) A synonymous single nucleotide polymorphism in DeltaF508 CFTR alters the secondary structure of the mRNA and the expression of the mutant protein. J Biol Chem 285:28741–28748 Bellaousov S, Mathews DH (2010) ProbKnot: fast prediction of RNA secondary structure including pseudoknots. RNA 16:1870–1880 Ben Or G, Veksler-Lublinsky I (2021) Comprehensive machine-learning-based analysis of microRNA–target interactions reveals variable transferability of interaction rules across species. BMC Bioinform 22:264 Bertalovitz AC, Badhey MLO, McDonald TV (2018) Synonymous nucleotide modification of the KCNH2 gene affects both mRNA characteristics and translation of the encoded hERG ion channel. J Biol Chem 293:12120–12136 Bertolazzi G, Benos PV, Tumminello M, Coronnello C (2020) An improvement of ComiR algorithm for microRNA target prediction by exploiting coding region sequences of mRNAs. BMC Bioinform 21:201 Betel D, Koppal A, Agius P, Sander C, Leslie C (2010) Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol 11:R90
Blum JS, Wearsch PA, Cresswell P (2013) Pathways of antigen processing. Annu Rev Immunol 31:443–473 Brest P, Lapaquette P, Souidi M, Lebrigand K, Cesaro A, Vouret-Craviari V, Mari B, Barbry P, Mosnier JF, Hébuterne X et al (2011) A synonymous variant in IRGM alters a binding site for miR-196 and causes deregulation of IRGM-dependent xenophagy in Crohn’s disease. Nat Genet 43:242–245 Buhr F, Jha S, Thommen M, Mittelstaet J, Kutz F, Schwalbe H, Rodnina MV, Komar AA (2016) Synonymous codons direct cotranslational folding toward different protein conformations. Mol Cell 61:341–351 Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA11Edited by F. E. Cohen. J Mol Biol 268:78–94 Buske OJ, Manickaraj A, Mital S, Ray PN, Brudno M (2013) Identification of deleterious synonymous variants in human genomes. Bioinformatics 29:1843–1850 Calonaci N, Jones A, Cuturello F, Sattler M, Bussi G (2020) Machine learning a model for RNA structure prediction. NAR Genom Bioinform 2:lqaa090 Chassé H, Boulben S, Costache V, Cormier P, Morales J (2016) Analysis of translation using polysome profiling. Nucleic Acids Res 45:e15–e15 Chekulaeva M, Landthaler M (2016) Eyes on translation. Mol Cell 63:918–925 Chen Y, Wang X (2019) miRDB: an online database for prediction of functional microRNA targets. Nucleic Acids Res 48:D127–D131 Chen X, Li Y, Umarov R, Gao X, Song L (2020) RNA secondary structure prediction by learning unrolled algorithms. arXiv preprint arXiv:200205810 Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA–mRNA interaction maps. Nature 460:479–486 Clarke TFIV, Clark PL (2008) Rare codons cluster. PLoS One 3:e3412 Coleman JR, Papamichail D, Skiena S, Futcher B, Wimmer E, Mueller S (2008) Virus attenuation by genome-scale changes in codon pair bias. Science 320:1784–1787 Coronnello C, Benos PV (2013) ComiR: combinatorial microRNA target prediction tool. 
Nucleic Acids Res 41:W159–W164 Cuevas JM, Domingo-Calap P, Sanjuán R (2012) The fitness effects of synonymous mutations in DNA and RNA viruses. Mol Biol Evol 29:17–20 Dermit M, Dodel M, Mardakheh FK (2017) Methods for monitoring and measurement of protein translation in time and space. Mol BioSyst 13:2477–2488 Desmet F-O, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C (2009) Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 37:e67–e67 Diambra LA (2017) Differential bicodon usage in lowly and highly abundant proteins. PeerJ 5:e3081 Dieterich DC, Link AJ, Graumann J, Tirrell DA, Schuman EM (2006) Selective identification of newly synthesized proteins in mammalian cells using bioorthogonal noncanonical amino acid tagging (BONCAT). Proc Natl Acad Sci 103:9482–9487 Dieterich DC, Hodas JJL, Gouzer G, Shadrin IY, Ngo JT, Triller A, Tirrell DA, Schuman EM (2010) In situ visualization and dynamics of newly synthesized proteins in rat hippocampal neurons. Nat Neurosci 13:897–905 Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM (2014) In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features. Nature 505:696–700 Dobrowolski SF, Andersen HS, Doktor TK, Andresen BS (2010) The phenylalanine hydroxylase c.30C>G synonymous variation (p.G10G) creates a common exonic splicing silencer. Mol Genet Metab 100:316–323 Dölken L, Ruzsics Z, Rädle B, Friedel CC, Zimmer R, Mages J, Hoffmann R, Dickinson P, Forster T, Ghazal P et al (2008) High-resolution gene expression profiling for simultaneous kinetic parameter analysis of RNA synthesis and decay. RNA 14:1959–1972 Domingo-Calap P, Cuevas JM, Sanjuán R (2009) The fitness effects of random mutations in single- stranded DNA and RNA bacteriophages. PLoS Genet 5:e1000742
Duan J, Wainwright MS, Comeron JM, Saitou N, Sanders AR, Gelernter J, Gejman PV (2003) Synonymous mutations in the human dopamine receptor D2 (DRD2) affect mRNA stability and synthesis of the receptor. Hum Mol Genet 12:205–216 Erkelenz S, Theiss S, Otte M, Widera M, Peter JO, Schaal H (2014) Genomic HEXploring allows landscaping of novel potential splicing regulatory elements. Nucleic Acids Res 42:10681–10697 Fairbrother WG, Yeo GW, Yeh R, Goldstein P, Mawson M, Sharp PA, Burge CB (2004) RESCUE- ESE identifies candidate exonic splicing enhancers in vertebrate exons. Nucleic Acids Res 32:W187–W190 Fang Z, Rajewsky N (2011) The impact of miRNA target sites in coding sequences and in 3′ UTRs. PLoS One 6:e18067 Feng Y, De Franceschi G, Kahraman A, Soste M, Melnik A, Boersema PJ, de Laureto PP, Nikolaev Y, Oliveira AP, Picotti P (2014) Global analysis of protein structural changes in complex proteomes. Nat Biotechnol 32:1036–1044 Forman JJ, Coller HA (2010) The code within the code: microRNAs target coding regions. Cell Cycle 9:1533–1541 Forman JJ, Legesse-Miller A, Coller HA (2008) A search for conserved sequences in coding regions reveals that the let-7 microRNA targets Dicer within its coding sequence. Proc Natl Acad Sci 105:14879–14884 Fox JM, Erill I (2010) Relative codon adaptation: a generic codon bias index for prediction of gene expression. DNA Res 17:185–196 Friedman Y, Naamati G, Linial M (2010) MiRror: a combinatorial analysis web tool for ensembles of microRNAs and their targets. Bioinformatics 26:1920–1921 Friedrich U, Datta S, Schubert T, Plössl K, Schneider M, Grassmann F, Fuchshofer R, Tiefenbach K-J, Längst G, Weber BHF (2015) Synonymous variants in HTRA1 implicated in AMD susceptibility impair its capacity to regulate TGF-β signaling. Hum Mol Genet 24:6361–6373 Frumkin I, Lajoie MJ, Gregg CJ, Hornung G, Church GM, Pilpel Y (2018) Codon usage of highly expressed genes affects proteome-wide translation efficiency. 
Proc Natl Acad Sci 115:E4940–E4949 Fu J, Murphy KA, Zhou M, Li YH, Lam VH, Tabuloc CA, Chiu JC, Liu Y (2016) Codon usage affects the structure and function of the Drosophila circadian clock protein PERIOD. Genes Dev 30:1761–1775 Gaither JBS, Lammi GE, Li JL, Gordon DM, Kuck HC, Kelly BJ, Fitch JR, White P (2021) Synonymous variants that disrupt messenger RNA structure are significantly constrained in the human population. Gigascience 10:giab023 Gao K, Oerlemans R, Groves MR (2020) Theory and applications of differential scanning fluorimetry in early-stage drug discovery. Biophys Rev 12:85–104 Gartner JJ, Parker SCJ, Prickett TD, Dutton-Regester K, Stitzel ML, Lin JC, Davis S, Simhadri VL, Jha S, Katagiri N et al (2013) Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proc Natl Acad Sci 110:13481–13486 Gebert LFR, MacRae IJ (2019) Regulation of microRNA function in animals. Nat Rev Mol Cell Biol 20:21–37 Gelfman S, Wang Q, McSweeney KM, Ren Z, La Carpia F, Halvorsen M, Schoch K, Ratzon F, Heinzen EL, Boland MJ et al (2017) Annotating pathogenic non-coding variants in genic regions. Nat Commun 8:236–236 Ghisaidoobe ABT, Chung SJ (2014) Intrinsic tryptophan fluorescence in the detection and analysis of proteins: a focus on Förster resonance energy transfer techniques. Int J Mol Sci 15:22518–22538 Gill P, Moghadam TT, Ranjbar B (2010) Differential scanning calorimetry techniques: applications in biology and nanoscience. J Biomol Tech 21:167–193 Greenfield NJ (2006a) Using circular dichroism spectra to estimate protein secondary structure. Nat Protoc 1:2876–2890 Greenfield NJ (2006b) Using circular dichroism collected as a function of temperature to determine the thermodynamics of protein unfolding and binding interactions. Nat Protoc 1:2527–2535
Griseri P, Bourcier C, Hieblot C, Essafi-Benkhadir K, Chamorey E, Touriol C, Pagès G (2011) A synonymous polymorphism of the Tristetraprolin (TTP) gene, an AU-rich mRNA-binding protein, affects translation efficiency and response to Herceptin treatment in breast cancer patients. Hum Mol Genet 20:4556–4568 Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL (2008) The Vienna RNA websuite. Nucleic Acids Res 36:W70–W74 Gu W, Zhou T, Wilke CO (2010) A universal trend of reduced mRNA stability near the translation- initiation site in prokaryotes and eukaryotes. PLoS Comput Biol 6:e1000664 Halstead JM, Lionnet T, Wilbertz JH, Wippich F, Ephrussi A, Singer RH, Chao JA (2015) An RNA biosensor for imaging the first round of translation from single cells to living animals. Science 347:1367–1671 Hamasaki-Katagiri N, Lin BC, Simon J, Hunt RC, Schiller T, Russek-Cohen E, Komar AA, Bar H, Kimchi-Sarfaty C (2017) The importance of mRNA structure in determining the pathogenicity of synonymous and non-synonymous mutations in haemophilia. Haemophilia 23:e8–e17 Hebsgaard SM, Korning PG, Tolstrup N, Engelbrecht J, Rouzé P, Brunak S (1996) Splice site prediction in arabidopsis thaliana pre-mRNA by combining local and global sequence information. Nucleic Acids Res 24:3439–3452 Heiman M, Kulicke R, Fenster RJ, Greengard P, Heintz N (2014) Cell type-specific mRNA purification by translating ribosome affinity purification (TRAP). Nat Protoc 9:1282–1291 Hermeling S, Jiskoot W, Crommelin D, Bornæs C, Schellekens H (2005) Development of a transgenic mouse model immune tolerant for human interferon beta. Pharm Res 22:847–851 Hiard S, Charlier C, Coppieters W, Georges M, Baurain D (2010) Patrocles: a database of polymorphic miRNA-mediated gene regulation in vertebrates. Nucleic Acids Res 38:D640–D651 Honarmand Ebrahimi K, West GM, Flefil R (2014) Mass spectrometry approach and ELISA reveal the effect of codon optimization on N-linked glycosylation of HIV-1 gp120. 
J Proteome Res 13:5801–5811 Hoover DM, Lubkowski J (2002) DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 30:e43 Horton JS, Flanagan LM, Jackson RW, Priest NK, Taylor TB (2021) A mutational hotspot that determines highly repeatable evolution can be built and broken by silent genetic changes. Nat Commun 12:6092 Howden AJM, Geoghegan V, Katsch K, Efstathiou G, Bhushan B, Boutureira O, Thomas B, Trudgian DC, Kessler BM, Dieterich DC et al (2013) QuaNCAT: quantitating proteome dynamics in primary cells. Nat Methods 10:343–346 Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C (2014) Exposing synonymous mutations. Trends Genet 30:308–321 Hunt R, Hettiarachchi G, Katneni U, Hernandez N, Holcomb D, Kames J, Alnifaidy R, Lin B, Hamasaki-Katagiri N, Wesley A et al (2019) A single synonymous variant (c.354G>A [p.P118P]) in ADAMTS13 confers enhanced specific activity. Int J Mol Sci 20:5734 Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science (New York, NY) 324:218–223 Iwasaki S, Ingolia NT (2017) The growing toolbox for protein synthesis studies. Trends Biochem Sci 42:612–624 Jabbari H, Wark I, Montemagno C, Will S (2018) Knotty: efficient and accurate prediction of complex RNA pseudoknot structures. Bioinformatics 34:3849–3856 Jacobs WM, Shakhnovich EI (2017) Evidence of evolutionary selection for cotranslational folding. Proc Natl Acad Sci 114:11434–11439 Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB et al (2019) Predicting splicing from primary sequence with deep learning. Cell 176:535–548.e524 Jan CH, Williams CC, Weissman JS (2014) Principles of ER cotranslational translocation revealed by proximity-specific ribosome profiling. Science (New York, NY) 346:1257521–1257521
Jankowski W, Park Y, McGill J, Maraskovsky E, Hofmann M, Diego VP, Luu BW, Howard TE, Kellerman R, Key NS et al (2019) Peptides identified on monocyte-derived dendritic cells: a marker for clinical immunogenicity to FVIII products. Blood Adv 3:1429–1440 Jing M, Bowser MT (2011) Methods for measuring aptamer-protein equilibria: a review. Anal Chim Acta 686:9–18 Johnson CM (2013) Differential scanning calorimetry as a tool for protein folding and stability. Arch Biochem Biophys 531:100–109 Jonas S, Izaurralde E (2015) Towards a molecular understanding of microRNA-mediated gene silencing. Nat Rev Genet 16:421–433 Karle AC (2020) Applying MAPPs assays to assess drug immunogenicity. Front Immunol 11 Article 698 Katneni UK, Liss A, Holcomb D, Katagiri NH, Hunt R, Bar H, Ismail A, Komar AA, Kimchi- Sarfaty C (2019) Splicing dysregulation contributes to the pathogenicity of several F9 exonic point variants. Mol Genet Genomic Med 7:e840 Ke S, Shang S, Kalachikov SM, Morozova I, Yu L, Russo JJ, Ju J, Chasin LA (2011) Quantitative evaluation of all hexamers as exonic splicing elements. Genome Res 21:1360–1374 Keightley PD, Halligan DL (2011) Inference of site frequency spectra from high-throughput sequence data: quantification of selection on nonsynonymous and synonymous sites in humans. Genetics 188:931–940 Kelly MS, Price CN (2000) The use of circular dichroism in the investigation of protein structure and function. Curr Protein Pept Sci 1:349–384 Kershner JP, Yu McLoughlin S, Kim J, Morgenthaler A, Ebmeier CC, Old WM, Copley SD (2016) A synonymous mutation upstream of the gene encoding a weak-link enzyme causes an ultrasensitive response in growth rate. J Bacteriol 198:2853–2863 Kertesz M, Iovino N, Unnerstall U, Gaul U, Segal E (2007) The role of site accessibility in microRNA target recognition. 
Nat Genet 39:1278–1284 Kimchi-Sarfaty C, Oh JM, Kim I-W, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A “silent” polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528 Kimchi-Sarfaty C, Simhadri VL, Kopelman D, Friedman A, Edwards N, Javaid A, Okunji C, Komar A, Sauna Z, Katagiri N (2010) The synonymous V107V mutation in factor IX is not so silent and may cause hemophilia B in patients. Blood 116:2197–2197 Kirchner S, Cai Z, Rauscher R, Kastelic N, Anding M, Czech A, Kleizen B, Ostedgaard LS, Braakman I, Sheppard DN et al (2017) Alteration of protein function by a silent polymorphism linked to tRNA abundance. PLoS Biol 15:e2000779–e2000779 Knöppel A, Näsvall J, Andersson DI (2016) Compensating the fitness costs of synonymous mutations. Mol Biol Evol 33:1461–1477 Krek A, Grün D, Poy MN, Wolf R, Rosenberg L, Epstein EJ, MacMenamin P, da Piedade I, Gunsalus KC, Stoffel M et al (2005) Combinatorial microRNA target predictions. Nat Genet 37:495–500 Kudla G, Murray AW, Tollervey D, Plotkin JB (2009) Coding-sequence determinants of gene expression in Escherichia coli. Science 324:255–258 Kunec D, Osterrieder N (2016) Codon pair bias is a direct consequence of dinucleotide bias. Cell Rep 14:55–67 Lawrie DS, Messer PW, Hershberg R, Petrov DA (2013) Strong purifying selection at synonymous sites in D. melanogaster. PLoS Genet 9:e1003527 Lebeuf-Taylor E, McCloskey N, Bailey SF, Hinz A, Kassen R (2019) The distribution of fitness effects among synonymous mutations in a gene under directional selection. elife 8:1 Lee B, Baek J, Park S, Yoon S (2016) deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. In: Proceedings of the 7th ACM international conference on bioinformatics, computational biology, and health informatics (Seattle, WA, USA, Association for Computing Machinery), pp 434–442
Lewis BP, Shih IH, Jones-Rhoades MW, Bartel DP, Burge CB (2003) Prediction of mammalian microRNA targets. Cell 115:787–798 Lewis BP, Burge CB, Bartel DP (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell 120:15–20 Li Q, Li J, Yu C-p, Chang S, Xie L-l, Wang S (2021) Synonymous mutations that regulate translation speed might play a non-negligible role in liver cancer development. BMC Cancer 21:388 Liu Y (2020) A code within the genetic code: codon usage regulates co-translational protein folding. Cell Commun Signal 18:145 Livingstone M, Folkman L, Yang Y, Zhang P, Mort M, Cooper DN, Liu Y, Stantic B, Zhou Y (2017) Investigating DNA-, RNA-, and protein-based features as a means to discriminate pathogenic synonymous variants. Hum Mutat 38:1336–1347 Lu W, Tang Y, Wu H, Huang H, Fu Q, Qiu J, Li H (2019) Predicting RNA secondary structure via adaptive deep recurrent neural networks with energy-based filter. BMC Bioinform 20:684 Lundblad RL (2009) Approaches to the conformational analysis of biopharmaceuticals. Chapman and Hall/CRC, New York Marín RM, Sulc M, Vanícek J (2013) Searching the coding region for microRNA targets. RNA 19:467–474 Markham NR, Zuker M (2008) UNAFold: software for nucleic acid folding and hybridization. Methods Mol Biol 453:3–31 Mauro VP, Chappell SA (2014) A critical analysis of codon optimization in human therapeutics. Trends Mol Med 20:604–613 McDermott SP, Eppert K, Lechman ER, Doedens M, Dick JE (2010) Comparison of human cord blood engraftment between immunocompromised mouse strains. Blood 116:193–200 Mordstein C, Savisaar R, Young RS, Bazile J, Talmane L, Luft J, Liss M, Taylor MS, Hurst LD, Kudla G (2020) Codon usage and splicing jointly influence mRNA localization. 
Cell Syst 10:351–362.e358 Morisaki T, Lyon K, DeLuca KF, DeLuca JG, English BP, Zhang Z, Lavis LD, Grimm JB, Viswanathan S, Looger LL et al (2016) Real-time quantification of single RNA translation dynamics in living cells. Science 352:1425–1429 Mueller WF, Larsen LSZ, Garibaldi A, Hatfield GW, Hertel KJ (2015) The silent sway of splicing by synonymous substitutions*. J Biol Chem 290:27700–27711 Nackley AG, Shabalina SA, Tchivileva IE, Satterfield K, Korchynskyi O, Makarov SS, Maixner W, Diatchenko L (2006) Human catechol-O-methyltransferase haplotypes modulate protein expression by altering mRNA secondary structure. Science 314:1930–1933 Newman ZR, Young JM, Ingolia NT, Barton GM (2016) Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9. Proc Natl Acad Sci U S A 113:E1362–E1371 Niesen FH, Berglund H, Vedadi M (2007) The use of differential scanning fluorimetry to detect ligand interactions that promote protein stability. Nat Protoc 2:2212–2221 Ong S-E, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics*. Mol Cell Proteomics 1:376–386 Oubounyt M, Louadi Z, Tayara H, Chong KT (2019) DeePromoter: robust promoter predictor using deep learning. Front Genet 10:286 Ozohanics O, Ambrus A (2020) Hydrogen-deuterium exchange mass spectrometry: a novel structural biology approach to structure, dynamics and interactions of proteins and their complexes. Life 10:286 Pagani F, Raponi M, Baralle FE (2005) Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution. Proc Natl Acad Sci U S A 102:6368–6372 Paraskevopoulou MD, Georgakilas G, Kostoulas N, Vlachos IS, Vergoulis T, Reczko M, Filippidis C, Dalamagas T, Hatzigeorgiou AG (2013) DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. 
Nucleic Acids Res 41:W169–W173 Parvathy ST, Udayasuriyan V, Bhadana V (2022) Codon usage bias. Mol Biol Rep 49:539–565
Peris JB, Davis P, Cuevas JM, Nebot MR, Sanjuán R (2010) Distribution of fitness effects caused by single-nucleotide substitutions in bacteriophage f1. Genetics 185:603–609
Peterson J, Li S, Kaltenbrun E, Erdogan O, Counter CM (2020) Expression of transgenes enriched in rare codons is enhanced by the MAPK pathway. Sci Rep 10:22166
Pratt KP (2018) Anti-drug antibodies: emerging approaches to predict, reduce or reverse biotherapeutic immunogenicity. Antibodies (Basel) 7:19
Proctor JR, Meyer IM (2013) COFOLD: an RNA secondary structure prediction method that takes co-transcriptional folding into account. Nucleic Acids Res 41:e102
Puigbò P, Guzmán E, Romeu A, Garcia-Vallvé S (2007) OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res 35:W126–W131
Rahman S, Kosakovsky Pond SL, Webb A, Hey J (2021) Weak selection on synonymous codons substantially inflates dN/dS estimates in bacteria. Proc Natl Acad Sci 118:e2023575118
Raponi M, Kralovicova J, Copson E, Divina P, Eccles D, Johnson P, Baralle D, Vorechovsky I (2011) Prediction of single-nucleotide substitutions that result in exon skipping: identification of a splicing silencer in BRCA1 exon 6. Hum Mutat 32:436–444
Reese MG, Eeckman FH, Kulp D, Haussler D (1997) Improved splice site detection in genie. J Comput Biol 4:311–323
Riffo-Campos ÁL, Riquelme I, Brebi-Mieville P (2016) Tools for sequence-based miRNA target prediction: what to choose? Int J Mol Sci 17:1987
Riolo G, Cantara S, Marzocchi C, Ricci C (2020) miRNA targets: from prediction tools to experimental validation. Methods Protoc 4(1)
Riolo G, Cantara S, Ricci C (2021) What’s wrong in a jump? Prediction and validation of splice site variants. Methods Protoc 4:62
Rodriguez A, Wright G, Emrich S, Clark PL (2018) %MinMax: a versatile tool for calculating and comparing synonymous codon usage and its impact on protein folding. Protein Sci 27:356–362
Rogozin IB, Milanesi L (1997) Analysis of donor splice sites in different eukaryotic organisms. J Mol Evol 45:50–59
Saetrom O, Snøve O Jr, Saetrom P (2005) Weighted sequence motifs as an improved seeding step in microRNA target prediction algorithms. RNA 11:995–1003
Salari R, Kimchi-Sarfaty C, Gottesman MM, Przytycka TM (2013) Sensitive measurement of single-nucleotide polymorphism-induced changes of RNA conformation: application to disease studies. Nucleic Acids Res 41:44–53
Salvat R, Moise L, Bailey-Kellogg C, Griswold KE (2014) A high throughput MHC II binding assay for quantitative analysis of peptide epitopes. J Vis Exp 85:51308
Sanavia T, Birolo G, Montanucci L, Turina P, Capriotti E, Fariselli P (2020) Limitations and challenges in protein stability prediction upon genome variations: towards future applications in precision medicine. Comput Struct Biotechnol J 18:1968–1979
Sato K, Kato Y (2021) Prediction of RNA secondary structure including pseudoknots for long sequences. Brief Bioinform 23:1–9
Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691
Sauna ZE, Kimchi-Sarfaty C (2013) Synonymous mutations as a cause of human genetic disease. In: eLS. Wiley
Sauna ZE, Kimchi-Sarfaty C, Ambudkar SV, Gottesman MM (2007) Silent polymorphisms speak: how they affect pharmacogenomics and the treatment of cancer. Cancer Res 67:9609–9612
Savisaar R, Hurst LD (2018) Exonic splice regulation imposes strong selection at synonymous sites. Genome Res 28:1442–1454
Schnall-Levin M, Zhao Y, Perrimon N, Berger B (2010) Conserved microRNA targeting in Drosophila is as widespread in coding regions as in 3′ UTRs. Proc Natl Acad Sci 107:15751–15756
Seoighe C, Kiniry SJ, Peters A, Baranov PV, Yang H (2020) Selection shapes synonymous stop codon use in mammals. J Mol Evol 88:549–561
Shabalina SA, Spiridonov NA, Kashina A (2013) Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res 41:2073–2094
Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, Groß M, Backofen R, Diederichs S (2019) A pan-cancer analysis of synonymous mutations. Nat Commun 10:2569
Sharp PM, Li WH (1987) The codon adaptation index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15:1281–1295
Shi F, Yao Y, Bin Y, Zheng C-H, Xia J (2019) Computational identification of deleterious synonymous variants in human genomes using a feature-based approach. BMC Med Genet 12:12
Shin C, Nam JW, Farh KK, Chiang HR, Shkumatava A, Bartel DP (2010) Expanding the microRNA targeting code: functional sites with centered pairing. Mol Cell 38:789–802
Simhadri VL, Hamasaki-Katagiri N, Lin BC, Hunt R, Jha S, Tseng SC, Wu A, Bentley AA, Zichel R, Lu Q et al (2017) Single synonymous mutation in factor IX alters protein properties and underlies haemophilia B. J Med Genet 54:338–345
Singh J, Hanson J, Paliwal K, Zhou Y (2019) RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 10:5407
Singh J, Paliwal K, Zhang T, Singh J, Litfin T, Zhou Y (2021) Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning. Bioinformatics 37:2589–2600
Sloma MF, Mathews DH (2017) Base pair probability estimates improve the prediction accuracy of RNA non-canonical base pairs. PLoS Comput Biol 13:e1005827
Smith PJ, Zhang C, Wang J, Chew SL, Zhang MQ, Krainer AR (2006) An increased specificity score matrix for the prediction of SF2/ASF-specific exonic splicing enhancers. Hum Mol Genet 15:2490–2508
Sønderstrup G, Cope AP, Patel S, Congia M, Hain N, Hall FC, Parry SL, Fugger LH, Michie S, McDevitt HO (1999) HLA class II transgenic mice: models of the human CD4+ T-cell immune response. Immunol Rev 172:335–343
Sroubek J, Krishnan Y, McDonald TV (2013) Sequence and structure-specific elements of HERG mRNA determine channel synthesis and trafficking efficiency. FASEB J 27:3039–3053
Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM et al (2013) Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342:1367–1372
Šulc M, Marín RM, Robins HS, Vaníček J (2015) PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3′ UTRs and coding sequences. Nucleic Acids Res 43:W474–W479
Tang X, Zhang T, Cheng N, Wang H, Zheng C-H, Xia J, Zhang T (2021) usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme. Brief Bioinform 22:5416
Tinoco I Jr, Uhlenbeck OC, Levine MD (1971) Estimation of secondary structure in ribonucleic acids. Nature 230:362–367
Trabjerg E, Nazari ZE, Rand KD (2018) Conformational analysis of complex protein states by hydrogen/deuterium exchange mass spectrometry (HDX-MS): challenges and emerging solutions. TrAC Trends Anal Chem 106:125–138
Tüfekci KU, Meuwissen RL, Genç S (2014) The role of microRNAs in biological processes. Methods Mol Biol 1107:15–31
Turner DH, Mathews DH (2010) NNDB: the nearest neighbor parameter database for predicting stability of nucleic acid secondary structure. Nucleic Acids Res 38:D280–D282
Umarov R, Kuwahara H, Li Y, Gao X, Solovyev V (2019) Promoter analysis and prediction in the human genome using sequence-based deep learning models. Bioinformatics 35:2730–2737
Villalobos A, Ness JE, Gustafsson C, Minshull J, Govindarajan S (2006) Gene designer: a synthetic biology tool for constructing artificial DNA segments. BMC Bioinform 7:1–8
Vivian JT, Callis PR (2001) Mechanisms of tryptophan fluorescence shifts in proteins. Biophys J 80:2093–2109
Wai HA, Lord J, Lyon M, Gunning A, Kelly H, Cibin P, Seaby EG, Spiers-Fitzgerald K, Lye J, Ellard S et al (2020) Blood RNA analysis can increase clinical diagnostic rate and resolve variants of uncertain significance. Genet Med 22:1005–1014
Walsh IM, Bowman MA, Soto Santarriaga IF, Rodriguez A, Clark PL (2020) Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc Natl Acad Sci 117:3528–3534
Wang Y, Qiu C, Cui Q (2015) A large-scale analysis of the relationship of synonymous SNPs changing microRNA regulation with functionality and disease. Int J Mol Sci 16:23545–23555
Wang L, Liu Y, Zhong X, Liu H, Lu C, Li C, Zhang H (2019) DMfold: a novel method to predict RNA secondary structure with pseudoknots based on deep learning and improved base pair maximization principle. Front Genet 10:143
Wen M, Cong P, Zhang Z, Lu H, Li T (2018) DeepMirTar: a deep-learning approach for predicting human miRNA targets. Bioinformatics 34:3781–3787
Wong N, Wang X (2014) miRDB: an online resource for microRNA target prediction and functional annotations. Nucleic Acids Res 43:D146–D152
Wu B, Eliscovich C, Yoon YJ, Singer RH (2016) Translation dynamics of single mRNAs in live cells and neurons. Science 352:1430–1435
Wu P, Zhou D, Lin W, Li Y, Wei H, Qian X, Jiang Y, He F (2018) Cell-type-resolved alternative splicing patterns in mouse liver. DNA Res 25:265–275
Wu Q, Medina SG, Kushawah G, DeVore ML, Castellano LA, Hand JM, Wright M, Bazzini AA (2019) Translation affects mRNA stability in a codon-dependent manner in human cells. elife 8:e45396
Xayaphoummine A, Bucher T, Isambert H (2005) Kinefold web server for RNA/DNA folding path and structure prediction including pseudoknots and knots. Nucleic Acids Res 33:W605–W610
Yeo G, Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11:377–394
Yousef M, Jung S, Kossenkov AV, Showe LC, Showe MK (2007) Naïve Bayes for microRNA target predictions-machine learning for microRNA targets. Bioinformatics 23:2987–2992
Yu C-H, Dang Y, Zhou Z, Wu C, Zhao F, Sachs MS, Liu Y (2015) Codon usage influences the local rate of translation elongation to regulate co-translational protein folding. Mol Cell 59:744–754
Zeng Z, Bromberg Y (2019) Predicting functional effects of synonymous variants: a systematic review and perspectives. Front Genet 10 Article 914
Zeng K, Charlesworth B (2009) Estimating selection intensity on synonymous codon usage in a nonequilibrium population. Genetics 183:651–662, 651si–623si
Zhang H, Zhang C, Li Z, Li C, Wei X, Zhang B, Liu Y (2019) A new method of RNA secondary structure prediction based on convolutional neural network and dynamic programming. Front Genet 10:467
Zhao F, Yu C-H, Liu Y (2017) Codon usage regulates protein structure and function by affecting translation elongation speed in Drosophila cells. Nucleic Acids Res 45:8484–8492
Zhao Q, Zhao Z, Fan X, Yuan Z, Mao Q, Yao Y (2021) Review of machine learning methods for RNA secondary structure prediction. PLoS Comput Biol 17:e1009291
Zhou X, Zhou W, Wang C, Wang L, Jin Y, Jia Z, Liu Z, Zheng B (2021) A comprehensive analysis and splicing characterization of naturally occurring synonymous variants in the ATP7B gene. Front Genet 11:592611–592611
Zichel R, Chearwae W, Pandey GS, Golding B, Sauna ZE (2012) Aptamers as a sensitive tool to detect subtle modifications in therapeutic proteins. PLoS One 7:e31948–e31948
zu Siederdissen CH, Bernhart SH, Stadler PF, Hofacker IL (2011) A folding algorithm for extended RNA secondary structures. Bioinformatics 27:i129–i136
Zucchelli E, Pema M, Stornaiuolo A, Piovan C, Scavullo C, Giuliani E, Bossi S, Corna S, Asperti C, Bordignon C et al (2017) Codon optimization leads to functional impairment of RD114-TR envelope glycoprotein. Mol Ther Methods Clin Dev 4:102–114
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
Zuker M, Stiegler P (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res 9:133–148
Part V
The Role of SNPs in Personalized Medicine and the Platform Technology of Codon Optimization
Chapter 8
Using Genome Wide Studies to Generate and Test Hypotheses that Provide Mechanistic Details of How Synonymous Codons Affect Protein Structure and Function: Functional SNPs in the Age of Precision Medicine
Brandon N. S. Ooi, Ashley J. W. Lim, Samuel S. Chong, and Caroline G. L. Lee
Authors Brandon N. S. Ooi and Ashley J. W. Lim contributed equally to this chapter. The authors declare no potential conflict of interest.
B. N. S. Ooi · A. J. W. Lim
Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
S. S. Chong
Department of Pediatrics, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
C. G. L. Lee (*)
Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Division of Cellular & Molecular Research, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, Singapore, Singapore
Duke-NUS Medical School, Singapore, Singapore
NUS Graduate School, National University of Singapore, Singapore, Singapore
e-mail: [email protected]
© Springer Nature Switzerland AG 2022
Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_8
8.1 Genetics in Precision Medicine
Over the years, pharmaceuticals have gradually shifted from a universal drug development approach towards a more individualised approach, commonly known as precision medicine. Precision medicine considers an individual’s genetic profile,
environment, and lifestyle choices with the aim of establishing the most suitable medical care. Interest in precision medicine is growing with advances in technologies for the collection, storage, and analysis of such personal health information. In genetics, the reduced cost and increased efficiency of sequencing an individual’s genome have made obtaining such information more practical and affordable. In addition, the development of large-scale biological databases and of computational tools for analysing big datasets that encompass data from different parts of the world has allowed scrutiny of the differences between populations and individuals that may account for differences in disease predisposition and drug response (Gu and Jiang 2016). For example, Bachtiar et al. leveraged publicly available genome data from the 1000 Genomes Project to establish a collection of single nucleotide polymorphisms (SNPs) that are differentiated across populations. The team further found such SNPs to be enriched in the pathways of drugs that have previously been reported to display differences in response across populations (Bachtiar et al. 2019). Database resources such as the Pharmacogenomics Knowledge Base (PharmGKB) have also greatly eased access to knowledge about clinically actionable gene-drug associations and genotype-phenotype relationships (Whirl-Carrillo et al. 2012). Such databases actively collect and curate new knowledge regarding associations between genetic variations and drug responses. The usefulness of the information presented in PharmGKB is evident in the Clinical Pharmacogenetics Implementation Consortium (CPIC), a joint project between PharmGKB and the National Institutes of Health (NIH) that provides curated guidelines for adjusting drug therapy based on patients’ genetic variations (Relling et al. 2020).
Hence, clinicians will be able to use these resources to make more informed decisions about drug therapy based on the patient’s genetic background. In addition, electronic health record systems and the availability of wearable health trackers provide day-to-day information on the environment and lifestyle of an individual, further enabling the practice of precision medicine (Alzu’bi et al. 2019). A comprehensive ‘abstract text mining’ of the NCBI PubMed database using related keywords (e.g. pharmacogenetics, pharmacogenomics, polymorphisms, SNPs, precision/personalized medicine) revealed approximately 4400 publications dating back as early as the 1980s. The number of publications per year reflects the increasing research attention being paid to the field of pharmacogenetics, with an average of about 300 related publications per year over the past 4 years (Fig. 8.1). The abstracts of these publications were further investigated using ScispaCy, a natural language processing tool that specialises in processing biomedical text (Neumann et al. 2019), allowing the identification of the medical conditions and the drugs/chemicals that are commonly studied in pharmacogenetics. Based on the abstracts, adverse drug reactions and drug-induced toxicity are among the more commonly discussed medical conditions in publications on the topic (Fig. 8.2a). In general, such pharmacogenetics studies focus on understanding how inter-individual genetic variability results in specific
Fig. 8.1 Distribution of the number of publications from PubMed, on topics related to precision medicine. Keywords such as precision/personalised medicine, polymorphisms/SNPs, and pharmacogenomics/pharmacogenetics were used to query for related publications and abstracts were extracted for analyses. Number of related publications was plotted against the year in which they were published
groups of individuals being susceptible to adverse drug reactions (ADRs) or toxicity, with the aim of improving patient care by prescribing alternative treatments (Nakamura 2008). For example, statins, a class of drugs commonly used in the treatment of cardiovascular diseases, have been associated with skeletal muscle toxicity or myopathy in patients who carry polymorphisms in the SLCO1B1 gene (Camerino et al. 2021; Kee et al. 2020). As such, patients with the associated high-risk polymorphisms are offered either an adjusted dosage of statins or an alternative treatment. Besides ADRs, another well-studied area of pharmacogenetics is variation in drug metabolism, which may similarly influence the onset of ADRs and, more importantly, drug response (Daly 2017; Fuselli 2019). Variation in drug metabolism commonly involves polymorphisms within genes encoding drug-metabolising enzymes, in particular the cytochrome P450 gene family (Daly 2017). For example, rs5030656 within the CYP2D6 locus corresponds to an in-frame deletion of a codon encoding lysine, which confers a poor metabolizer phenotype (Tyndale et al. 1991). A different variant, rs3892097, may give rise to a non-functional CYP2D6 protein as a result of splicing defects (Gough et al. 1990). These examples are among many others summarised in the PharmGKB database, highlighting how polymorphic genes such as CYP2D6 may
Fig. 8.2 Word clouds of medical conditions and drugs based on abstracts of publications related to precision medicine and the use of genetics. (a) Word cloud of medical conditions extracted from abstracts of related publications. (b) Word cloud of drugs extracted from abstracts of related publications. The size of each word represents the frequency with which it appears in the abstracts. Abstracts were broken down and analysed using ScispaCy models built for processing biomedical, scientific or clinical text. Each scientific entity/term identified by the ScispaCy model was tabulated by its frequency of mention across the identified publications and then represented in the form of a word cloud
lead to variations in drug metabolism. CYP2D6, CYP2C9, and CYP2C19 are among the genes with the most gene-drug associations in CPIC, with CYP2D6 having as many as 73 associations (Fig. 8.3a), further suggesting a central role for these genes in drug metabolism. Common drugs/chemicals discussed in related publications are similarly summarised in Fig. 8.2b. While some drugs, such as warfarin and antidepressants, are more prominent in publications, the variety of drugs represented highlights the widespread applicability of, and interest in, pharmacogenetics. Consistent with the trend observed in publications, CPIC records guidelines for a variety of drugs that are significantly associated with SNPs in pertinent genes (Fig. 8.3b). Overall, pharmacogenetics is applied across a plethora of medical conditions and drugs, underscoring the value of understanding the role of genetics in explaining variations in drug response. One approach to furthering this understanding is to identify the potential functions of these genetic polymorphisms by investigating the changes in gene expression, amino acid sequence, and protein structure that result from them.
Fig. 8.3 Distribution of the number of gene-drug relationships for each gene and drug presented by the Clinical Pharmacogenetics Implementation Consortium (CPIC). (a) Number of gene-drug relationships per gene presented by CPIC. (b) Number of gene-drug relationships per drug presented by CPIC. Data were obtained from CPIC’s gene-drug guideline database, and each gene/drug was plotted against its number of associated relationships. Only drugs with more than two gene-drug relationships are presented
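The tallying described in the caption of Fig. 8.3 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' actual pipeline: the function name `count_relationships`, the `min_per_drug` parameter and the short list of gene-drug pairs are assumptions made for the example, and the pairs are a hand-picked illustrative subset, not an export of the CPIC database.

```python
from collections import Counter

# Illustrative (gene, drug) pairs only; NOT a CPIC export.
pairs = [
    ("CYP2D6", "codeine"),
    ("CYP2D6", "amitriptyline"),
    ("CYP2C19", "amitriptyline"),
    ("CYP2C19", "clopidogrel"),
    ("CYP2C9", "warfarin"),
    ("VKORC1", "warfarin"),
    ("CYP4F2", "warfarin"),
    ("SLCO1B1", "simvastatin"),
]

def count_relationships(pairs, min_per_drug=3):
    """Count gene-drug relationships per gene and per drug.

    Duplicate pairs are collapsed first. Drugs are kept only if they have
    at least `min_per_drug` relationships (3 = "more than two").
    """
    unique = set(pairs)
    per_gene = Counter(gene for gene, _ in unique)
    per_drug = Counter(drug for _, drug in unique)
    per_drug_shown = {d: n for d, n in per_drug.items() if n >= min_per_drug}
    return per_gene, per_drug_shown

per_gene, per_drug_shown = count_relationships(pairs)
print(per_gene.most_common(3))
print(per_drug_shown)
```

Collapsing duplicates before counting is a design choice for this sketch; a real export may list the same pair under several guidelines, and whether those should count once or several times depends on the question being asked.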
8.2 Functional Role of Coding and Non-coding SNPs
Understanding or inferring the function of the various SNPs will allow a better understanding of the mechanisms that lead to differences in drug response, and will assist in the more accurate identification of clinically relevant variants. One such effort integrated several SNP annotation databases into a comprehensive, well-annotated resource that establishes the potential functionality of SNPs from previously published reports and infers potential functionality from genetic approaches and sequence motifs (Wang et al. 2010). With the growing interest in precision medicine over the years, it is important to have a general idea of how different categories of SNPs may exert their function. Traditionally, researchers have focused mainly on SNPs that reside within protein-coding regions of the genome. This is especially the case for non-synonymous SNPs, which exert a direct effect by altering a particular amino acid in the protein, potentially leading to structural and chemical changes or even to truncation of the protein. Less attention has been paid to synonymous SNPs, which lie within the coding region but were previously assumed to be silent owing to the degeneracy of the genetic code. Increasingly, synonymous SNPs have been shown to also affect protein functionality through a variety of mechanisms (Hunt et al. 2014; Zeng and Bromberg 2019). Aside from SNPs in the coding regions, there is also growing evidence that non-coding SNPs play important functional roles in disease or drug response, as a large number of
genome-wide association studies (GWAS) have revealed that >88% of variants associated with diseases are found in non-coding regions (Edwards et al. 2013; Ward and Kellis 2012). To study the function of the different SNPs effectively, separate approaches must be employed for each type of SNP. For example, the function of non-synonymous SNPs can be investigated by identifying the changes in amino acid sequence and then studying how these changes influence the function of the encoded protein in terms of protein folding and protein-protein interactions. Determining the function of synonymous and non-coding SNPs is often more challenging, as these polymorphisms may influence phenotypes in a multitude of ways. A key theme, however, is that functional synonymous and/or non-coding SNPs exert their effect by influencing the regulation of gene expression levels, or by affecting the structure of the encoded protein itself or of other factors that affect the stability of the encoded protein (Hunt et al. 2014; Rojano et al. 2019). One approach to distinguishing functional from non-functional variants is to use contextual information about these SNPs, for example whether a SNP lies within, and is predicted to alter, a transcription factor binding site or other regulatory element (Rojano et al. 2019). Such information helps to increase confidence in the variant’s regulatory potential. Predictors that use machine learning techniques to evaluate synonymous, non-synonymous, regulatory and other kinds of variants, such as CADD (Kircher et al. 2014), DANN (Quang et al. 2015) and MutationTaster2 (Schwarz et al. 2014), have been developed, although there is currently a lack of gold-standard, experimentally validated data for proper evaluation of these predictors.
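The distinction drawn above between synonymous and non-synonymous coding SNPs follows directly from the degeneracy of the genetic code, and can be illustrated with a short Python sketch. The function name `classify_snp` and its arguments are hypothetical and introduced only for this example; the lookup uses the standard genetic code, and the sketch deliberately ignores the splicing, mRNA structure and translation-rate effects discussed in this chapter, so a "synonymous" label here does not imply that the variant is functionally silent.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_snp(codon, pos, alt):
    """Classify a single-base change within one codon.

    codon: reference codon (DNA, coding strand), e.g. "GAG"
    pos:   0-based position of the SNP within the codon (0, 1 or 2)
    alt:   alternative base at that position
    """
    codon, alt = codon.upper(), alt.upper()
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError("codon must be three of T, C, A, G")
    if pos not in (0, 1, 2) or alt not in BASES or codon[pos] == alt:
        raise ValueError("invalid position or alternative base")
    mutant = codon[:pos] + alt + codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon, truncated protein
    if ref_aa == "*":
        return "stop-lost"
    return "missense"

print(classify_snp("GGT", 2, "C"))  # GGT and GGC both encode glycine
print(classify_snp("GAG", 1, "T"))  # glutamate to valine
print(classify_snp("TGG", 2, "A"))  # tryptophan to stop
```

In practice this kind of annotation is done per transcript by dedicated tools, since the same genomic position can fall in different reading frames, or outside the coding sequence altogether, depending on the isoform.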
Functional synonymous SNPs in coding regions have been described to act through four mechanisms: disruption of splicing signals, alteration of messenger RNA (mRNA) secondary structure, alteration of transcription factor or microRNA (miRNA) binding sites, and changes in the ribosome-mediated translational attenuation program (Hunt et al. 2014; McCarthy et al. 2017). Briefly, these SNPs either indirectly influence protein folding and structure by disrupting the signals that contribute to these processes, or affect the levels of mRNA transcripts by disrupting the binding of transcription factors, miRNAs or other complexes (Mora et al. 2016). For polymorphisms in non-coding regions, gene expression could be affected if the SNP lies in the promoter region of a gene. Alternatively, non-coding SNPs could affect the expression or alter the structure of non-coding RNAs such as miRNAs, long non-coding RNAs (lncRNAs) or circular RNAs (circRNAs). Recent studies have shown that these non-coding RNA molecules modulate the expression of mRNA; changes in their structure and expression would therefore affect their binding to mRNA, resulting in altered gene expression (Guo et al. 2016; Liu et al. 2014). An important resource for identifying and studying the role of regulatory SNPs is the availability of expression quantitative trait loci (eQTL) databases (Porcu et al. 2019; Shan et al. 2019). eQTLs are identified in large-scale GWAS that associate mRNA expression profiles with genome-wide SNP data to find SNPs that affect the expression of gene transcripts. An example of this is the
Genotype-Tissue Expression (GTEx) project, which attempts to collate both eQTLs and splicing quantitative trait loci (sQTLs) by performing genetic associations with gene expression and alternatively spliced transcripts (Lonsdale et al. 2013). However, expression arrays may not detect all splice variants and, more importantly, do not detect SNPs that affect translation. Other detection methods, such as measurement of ribosome-bound mRNA (Ingolia et al. 2009) or of protein levels, would be required to identify these SNPs. An alternative strategy for detecting functional SNPs is to determine the allelic ratio of the transcripts. An imbalance in this ratio, such that one allelic transcript is expressed at a higher or lower level than the other, suggests that cis-regulatory factors are present for that gene (Johnson et al. 2008). An analysis of variant-drug combinations from PharmGKB gives an idea of the types of variants associated with differences in drug response (Barbarino et al. 2018). Notably, PharmGKB categorises variant-drug associations by the level of evidence available, from high (level 1) to preliminary (level 4). Analysis of variants with a high level of associative evidence (level 1) reveals that a majority of these variants are non-synonymous, with a large number being missense variants (Fig. 8.4). However, the total number of synonymous and non-coding SNPs (114) was similar to the total number of non-synonymous SNPs (123). Furthermore, when the evidence criteria were extended to include levels 2 and 3, most of the SNPs with pharmacogenomic effects were found to be located in intronic regions, and synonymous and non-coding SNPs far outnumbered non-synonymous variants (Fig. 8.5).
These results suggest that greater emphasis should be placed on understanding the functionality of these regulatory and structural variants in order to facilitate the progress and advancement of pharmacogenomics and personalized healthcare.
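The allelic-ratio strategy mentioned earlier in this section can be made concrete with a simple statistical sketch. In a sample heterozygous for a transcribed SNP, reads carrying each allele would be expected in roughly equal numbers if there were no cis-regulatory effect. The Python below applies an exact two-sided binomial test against a 1:1 ratio; the choice of this particular test, the function name `allelic_imbalance_p` and the read counts are assumptions made for illustration and are not taken from the cited studies.

```python
from math import comb

def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a departure from a 1:1 allelic ratio.

    Sums the probabilities of all outcomes that are no more likely than the
    observed one under Binomial(n, 0.5).
    """
    n = ref_reads + alt_reads
    if ref_reads < 0 or alt_reads < 0 or n == 0:
        raise ValueError("read counts must be non-negative and not both zero")
    pmf = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance guards against floating-point ties.
    p = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical read counts at one heterozygous transcribed SNP.
print(allelic_imbalance_p(50, 50))   # balanced expression
print(allelic_imbalance_p(80, 20))   # strongly skewed expression
```

A small p-value is only suggestive. Real allele-specific expression analyses also have to deal with reference-allele mapping bias, overdispersion of read counts and multiple testing across many SNPs, none of which this sketch addresses.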
8.3 Future Directions and Challenges
Despite the accumulating evidence relating genetics to drug response, the implementation of pharmacogenetic testing in the clinic has not been as rapid as expected. Most clinicians agree that genetic variants affect patients’ response to drug therapy (Peterson et al. 2016), but factors such as unfamiliarity with pharmacogenomic findings, cost of testing, time constraints and lack of easily accessible tests have been cited as barriers to adoption (Frigon et al. 2019). The first of these barriers is perhaps the most challenging. For the individual clinician, processing the deluge of pharmacogenomic information (often with varying levels of detail) can be overwhelming. Hence, the curation and conversion of high-confidence evidence into clinically actionable recommendations is crucial. To that end, resources such as PharmGKB help summarize gene-drug interactions by ranking them according to the level of evidence available, from high (level 1) to preliminary (level 4), although level 1a is the only level likely to be of clinical significance. Other organizations such as the Clinical Pharmacogenetics Implementation
Fig. 8.4 Distribution of the types of SNPs having a high level of associative evidence (Level 1) for variant-drug combinations from PharmGKB. A total of 237 SNPs were identified, with the majority being non-synonymous SNPs [black bars] (n = 123) as compared to synonymous and non-coding SNPs [white bars] (n = 114). SNPs with valid rsIDs were downloaded from the PharmGKB website, annotated for all functionalities with the Variant Effect Predictor (VEP) and filtered for those having Level 1 evidence
Consortium (CPIC) and the Dutch Pharmacogenetics Working Group (DPWG) publish guidelines and recommendations for the implementation of pharmacogenetic testing (Relling and Klein 2011; Swen et al. 2011). Unfortunately, the number of such guidelines is currently small owing to the lack of evidence warranting a change in clinical practice. It is therefore imperative for variants to be validated in multiple studies and/or ethnicities to fully justify their inclusion in clinical practice. In this regard, much remains to be done. There is, however, much cause for optimism. Scientific progress and computational innovations are bound to have a positive impact on the pharmacogenomics landscape of the future. A deeper understanding of the functions and properties of non-coding RNA, intronic and regulatory regions, as well as of synonymous SNPs, can help to distinguish variants that are truly functional from those that are associated by chance. These discoveries will spur the development of more accurate
Fig. 8.5 Distribution of the types of SNPs having Level 1–3 evidence for variant-drug combinations from PharmGKB. A total of 9160 SNPs were identified, with the majority being synonymous and non-coding SNPs [white bars] (n = 7879) as compared to non-synonymous SNPs [black bars] (n = 1281). SNPs with valid rsIDs were downloaded from the PharmGKB website, annotated for all functionalities with the Variant Effect Predictor (VEP) and filtered for those with levels of evidence from 1 to 3
predictive algorithms that use state-of-the-art artificial intelligence approaches such as machine learning and deep learning. This in turn will help to reduce false positives and variants of uncertain significance, increasing overall confidence in the SNPs found from
pharmacogenomic studies. Furthermore, as more large-scale sequencing projects involving multiple populations are completed, population-specific genetic information could also be used to assess pharmacogenomic effects in patients of different ethnicities (Bachtiar et al. 2019). Other commonly cited challenges to the implementation of pharmacogenomics are more surmountable. Multigene panels that use next generation sequencing (NGS) technologies reduce costs by allowing many individuals and gene loci to be tested simultaneously (Tzvetkov and Von Ahsen 2012). While NGS requires additional infrastructure and technical expertise for the processing and interpretation of results, sequencing costs have fallen greatly and standardized pipelines exist to improve the accessibility of these technologies to the clinic. Furthermore, pharmacogenomics clinical decision support tools with the potential to be integrated with electronic health records have been developed, with at least one system being implemented as a trial in seven European countries (Blagec et al. 2018; Mukerjee et al. 2018). Larger-scale implementation of these systems may follow in the near future. Genomic awareness is also increasing among healthcare professionals and patients, both through formal education and in the public space. In a recent survey, Owusu Obeng et al. found that physicians with five or fewer years of practice were more prepared and confident to adopt genetic testing than their senior colleagues (Obeng et al. 2018). Additionally, direct-to-consumer genetic tests offered by companies such as 23andMe and CircleDNA have captured public interest; Selkirk et al. reported an increasing number of patients consulting their healthcare providers about the interpretation of these tests (Selkirk et al. 2013). All these developments could help accelerate the adoption of pharmacogenomic testing in the clinic.
Using SNPs to predict drug response is an important component of pharmacogenomics. Association studies used in the discovery of pharmacogenomic SNPs have traditionally focused on the effects of single SNPs, which can be limited in their predictive potential. Models incorporating multiple SNPs (as well as other epidemiological factors), such as polygenic risk scoring and machine learning, have been moderately successful (Ooi et al. 2021), although performance is often hampered by the size and quality of the training datasets (Ho et al. 2019). With the completion of multiple large-scale studies and the generation of higher-quality datasets due to technological advancement, these predictive models could see even greater success. Furthermore, future developments in artificial intelligence, such as in feature engineering, could further increase the utility of these models.
The full impact of SNPs in precision medicine, and in particular pharmacogenomics, has not yet been realized. Reproducible studies, deepening genomic insight and intelligent technologies are some of the factors that are crucial for the success of this project. Nevertheless, the committed efforts of governments, clinicians, scientists and other relevant parties in tackling these issues are positive signs that patient-tailored treatment may not be too far off.
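To make the idea of a multi-SNP model concrete, the following is a minimal sketch of a polygenic risk score computed as a weighted sum of effect-allele dosages. The SNP identifiers, the weights and the function name are hypothetical and chosen only for illustration; in practice the weights would come from an association study, and missing genotypes would need more careful handling than the simplification used here.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, int], weights: Dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per SNP)."""
    score = 0.0
    for snp, weight in weights.items():
        # Simplifying assumption: a missing genotype counts as 0 copies.
        dosage = dosages.get(snp, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2")
        score += weight * dosage
    return score


# Hypothetical SNP identifiers and per-allele effect sizes.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 1.0}
patient = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(patient, weights)
print(score)
```

A single-SNP association corresponds to the special case in which only one weight is non-zero, which is one way to see why multi-SNP models can capture more of the variation in drug response.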
8 Using Genome Wide Studies to Generate and Test Hypotheses that Provide…
References
Alzu’bi AA, Zhou L, Watzlaf VJM (2019) Genetic variations and precision medicine. Perspect Health Inf Manag 16(Spring). NLM (Medline)
Bachtiar M, Ooi BNS, Wang J, Jin Y, Tan TW, Chong SS, Lee CGL (2019) Towards precision medicine: interrogating the human genome to identify drug pathways associated with potentially functional, population-differentiated polymorphisms. Pharmacogenomics J 19(6):516–527. https://doi.org/10.1038/s41397-019-0096-y
Barbarino JM, Whirl-Carrillo M, Altman RB, Klein TE (2018) PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip Rev Syst Biol Med 10(4). Wiley-Blackwell. https://doi.org/10.1002/wsbm.1417
Blagec K, Koopmann R, Crommentuijn-Van Rhenen M, Holsappel I, Van Der Wouden CH, Konta L, Xu H, Steinberger D, Just E, Swen JJ, Guchelaar HJ, Samwald M (2018) Implementing pharmacogenomics decision support across seven European countries: the Ubiquitous Pharmacogenomics (U-PGx) project. J Am Med Inform Assoc 25(7):893–898. https://doi.org/10.1093/jamia/ocy005
Camerino GM, Tarantino N, Canfora I, De Bellis M, Musumeci O, Pierno S (2021) Statin-induced myopathy: translational studies from preclinical to clinical evidence. Int J Mol Sci 22(4):1–18. MDPI AG. https://doi.org/10.3390/ijms22042070
Daly AK (2017) Pharmacogenetics: a general review on progress to date. Br Med Bull 124(1):1–15. https://doi.org/10.1093/bmb/ldx035
Edwards SL, Beesley J, French JD, Dunning M (2013) Beyond GWASs: illuminating the dark road from association to function. Am J Hum Genet 93(5):779–797. Cell Press. https://doi.org/10.1016/j.ajhg.2013.10.012
Frigon MP, Blackburn MÈ, Dubois-Bouchard C, Gagnon AL, Tardif S, Tremblay K (2019) Pharmacogenetic testing in primary care practice: opinions of physicians, pharmacists and patients. Pharmacogenomics 20(8):589–598. https://doi.org/10.2217/pgs-2019-0004
Fuselli S (2019) Beyond drugs: the evolution of genes involved in human response to medications. Proc R Soc B Biol Sci 286(1913). Royal Society Publishing. https://doi.org/10.1098/rspb.2019.1716
Gough AC, Miles JS, Spurr NK, Moss JE, Gaedigk A, Eichelbaum M, Wolf CR (1990) Identification of the primary gene defect at the cytochrome P450 CYP2D locus. Nature 347(6295):773–776. https://doi.org/10.1038/347773a0
Gu W, Jiang J (2016) Genetic polymorphisms and trauma precision medicine. In: Advanced trauma and surgery. Springer, Singapore, pp 189–209. https://doi.org/10.1007/978-981-10-2425-2_13
Guo X, Gao L, Wang Y, Chiu DKY, Wang T, Deng Y (2016) Advances in long noncoding RNAs: identification, structure prediction and function annotation. Brief Funct Genomics 15(1):38–46. https://doi.org/10.1093/bfgp/elv022
Ho DSW, Schierding W, Wake M, Saffery R, O’Sullivan J (2019) Machine learning SNP based prediction for precision medicine. Front Genet 10(MAR):267. Frontiers Media S.A. https://doi.org/10.3389/fgene.2019.00267
Hunt RC, Simhadri VL, Iandoli M, Sauna ZE, Kimchi-Sarfaty C (2014) Exposing synonymous mutations. Trends Genet 30(7):308–321. Elsevier Ltd. https://doi.org/10.1016/j.tig.2014.04.006
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS (2009) Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324(5924):218–223. https://doi.org/10.1126/science.1168978
Johnson AD, Zhang Y, Papp AC, Pinsonneault JK, Lim JE, Saffen D, Dai Z, Wang D, Sadée W (2008) Polymorphisms affecting gene transcription and mRNA processing in pharmacogenetic candidate genes: detection through allelic expression imbalance in human target tissues. Pharmacogenet Genomics 18(9):781–791. https://doi.org/10.1097/FPC.0b013e3283050107
Kee PS, Chin PKL, Kennedy MA, Maggo SDS (2020) Pharmacogenetics of statin-induced myotoxicity. Front Genet 11. Frontiers Media S.A. https://doi.org/10.3389/fgene.2020.575678
Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46(3):310–315. https://doi.org/10.1038/ng.2892
Liu B, Li J, Cairns MJ (2014) Identifying miRNAs, targets and functions. Brief Bioinform 15(1):1–19. https://doi.org/10.1093/bib/bbs075
Lonsdale J, Thomas J, Salvatore M, Phillips R, Lo E, Shad S, Hasz R, Walters G, Garcia F, Young N, Foster B, Moser M, Karasik E, Gillard B, Ramsey K, Sullivan S, Bridge J, Magazine H, Syron J et al (2013) The Genotype-Tissue Expression (GTEx) project. Nat Genet 45(6):580–585. https://doi.org/10.1038/ng.2653
McCarthy C, Carrea A, Diambra L (2017) Bicodon bias can determine the role of synonymous SNPs in human diseases. BMC Genomics 18(1):227. https://doi.org/10.1186/s12864-017-3609-6
Mora A, Sandve GK, Gabrielsen OS, Eskeland R (2016) In the loop: promoter-enhancer interactions and bioinformatics. Brief Bioinform 17(6):980–995. Oxford Academic. https://doi.org/10.1093/bib/bbv097
Mukerjee G, Huston A, Kabakchiev B, Piquette-Miller M, van Schaik R, Dorfman R (2018) User considerations in assessing pharmacogenomic tests and their clinical support tools. NPJ Genom Med 3(1):26. Nature Publishing Group. https://doi.org/10.1038/s41525-018-0065-4
Nakamura Y (2008) Pharmacogenomics and drug toxicity. N Engl J Med 359(8):856–858. https://doi.org/10.1056/NEJMe0805136
Neumann M, King D, Beltagy I, Ammar W (2019) ScispaCy: fast and robust models for biomedical natural language processing
Obeng AO, Fei K, Levy KD, Elsey AR, Pollin TI, Ramirez AH, Weitzel KW, Horowitz CR (2018) Physician-reported benefits and barriers to clinical implementation of genomic medicine: a multi-site ignite-network survey. J Personal Med 8(3). https://doi.org/10.3390/jpm8030024
Ooi BNS, Raechell, Ying AF, Koh YZ, Jin Y, Yee SWL, Lee JHS, Chong SS, Tan JWC, Liu J, Lee CG, Drum CL (2021) Robust performance of potentially functional SNPs in machine learning models for the prediction of atorvastatin-induced myalgia. Front Pharmacol 12:605764. https://doi.org/10.3389/fphar.2021.605764
Peterson JF, Field JR, Shi Y, Schildcrout JS, Denny JC, McGregor TL, Van Driest SL, Pulley JM, Lubin IM, Laposata M, Roden DM, Clayton EW (2016) Attitudes of clinicians following large-scale pharmacogenomics implementation. Pharmacogenomics J 16(4):393–398. https://doi.org/10.1038/tpj.2015.57
Porcu E, Rüeger S, Lepik K, Agbessi M, Ahsan H, Alves I, Andiappan A, Arindrarto W, Awadalla P, Battle A, Beutner F, Jan Bonder M, Boomsma D, Christiansen M, Claringbould A, Deelen P, Esko T, Favé MJ, Franke L et al (2019) Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat Commun 10(1). https://doi.org/10.1038/s41467-019-10936-0
Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31(5):761–763. https://doi.org/10.1093/bioinformatics/btu703
Relling MV, Klein TE (2011) CPIC: clinical pharmacogenetics implementation consortium of the pharmacogenomics research network. Clin Pharmacol Ther 89(3):464–467. https://doi.org/10.1038/clpt.2010.279
Relling MV, Klein TE, Gammal RS, Whirl-Carrillo M, Hoffman JM, Caudle KE (2020) The clinical pharmacogenetics implementation consortium: 10 years later. Clin Pharmacol Ther 107(1):171–175. Nature Publishing Group. https://doi.org/10.1002/cpt.1651
Rojano E, Seoane P, Ranea JAG, Perkins JR (2019) Regulatory variants: from detection to predicting impact. Brief Bioinform 20(5):1639–1654. https://doi.org/10.1093/bib/bby039
Schwarz JM, Cooper DN, Schuelke M, Seelow D (2014) Mutationtaster2: mutation prediction for the deep-sequencing age. Nat Methods 11(4):361–362. Nature Publishing Group. https://doi.org/10.1038/nmeth.2890
Selkirk CG, Weissman SM, Anderson A, Hulick PJ (2013) Physicians’ preparedness for integration of genomic and pharmacogenetic testing into practice within a major healthcare system. Genet Test Mol Biomarkers 17(3):219–225. https://doi.org/10.1089/gtmb.2012.0165
Shan N, Wang Z, Hou L (2019) Identification of trans-eQTLs using mediation analysis with multiple mediators. BMC Bioinformatics 20(S3):126. https://doi.org/10.1186/s12859-019-2651-6
Swen JJ, Nijenhuis M, De Boer A, Grandia L, Maitland-Van Der Zee AH, Mulder H, Rongen GAPJM, Van Schaik RHN, Schalekamp T, Touw DJ, Van Der Weide J, Wilffert B, Deneer VHM, Guchelaar HJ (2011) Pharmacogenetics: from bench to byte an update of guidelines. Clin Pharmacol Ther 89(5):662–673. https://doi.org/10.1038/clpt.2011.34
Tyndale R, Aoyama T, Broly F, Matsunaga T, Inaba T, Kalow W, Gelboin HV, Meyer UA, Gonzalez FJ (1991) Identification of a new variant CYP2D6 allele lacking the codon encoding Lys-281: possible association with the poor metabolizer phenotype. Pharmacogenetics 1(1):26–32. https://doi.org/10.1097/00008571-199110000-00005
Tzvetkov M, Von Ahsen N (2012) Pharmacogenetic screening for drug therapy: from single gene markers to decision making in the next generation sequencing era. Pathology 44(2):166–180. Lippincott Williams and Wilkins. https://doi.org/10.1097/PAT.0b013e32834f4d69
Wang J, Ronaghi M, Chong SS, Lee CGL (2010) pfSNP: an integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses. https://doi.org/10.1002/humu.21331
Ward LD, Kellis M (2012) Interpreting noncoding genetic variation in complex traits and human disease. Nat Biotechnol 30(11):1095–1106. https://doi.org/10.1038/nbt.2422
Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Altman RB, Klein TE (2012) Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther 92(4):414–417. NIH Public Access. https://doi.org/10.1038/clpt.2012.96
Zeng Z, Bromberg Y (2019) Predicting functional effects of synonymous variants: a systematic review and perspectives. Front Genet 10:914. Frontiers Media S.A. https://doi.org/10.3389/fgene.2019.00914
Chapter 9
SNPs and Personalized Medicine: Scrutinizing Pathogenic Synonymous Mutations for Precision Oncology
Samuel Peña-Llopis
9.1 Introduction
Until recently, synonymous mutations were not considered to be potentially pathogenic or deleterious, since the encoded amino acid remains the same. Initial studies considered that this type of mutation was not under selective pressure (Kimura 1977). Thus, synonymous mutations have been systematically excluded from most genomic analyses. However, pioneering studies showed that not all synonymous mutations are indeed ‘silent’ (Kimchi-Sarfaty et al. 2007). Further studies supported the concept that synonymous mutations can be pathogenic through multiple mechanisms and result in the activation of oncogenes or inactivation of tumor suppressor genes, driving tumorigenesis and contributing significantly to human cancer (Supek et al. 2014).
In the era of personalized medicine, it is crucial to identify and validate reliable biomarkers that allow patient stratification and predict positive drug responses. Excluding synonymous mutations from genomic analyses might have detrimental consequences for patient stratification and precision medicine, since the disregarded synonymous mutations may be drivers of tumorigenesis or other human diseases. For this reason, numerous algorithms are now available to predict pathogenic or deleterious synonymous mutations from genomic analyses. This wealth of information will pave the way for more precise risk stratification of patients and more personalized treatments.
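As a concrete illustration of what "the encoded amino acid remains the same" means, the sketch below classifies a single-base substitution within a codon using the standard genetic code. The function name and the example codons are illustrative assumptions and do not correspond to any specific variant discussed in this chapter.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon: str, position: int, new_base: str) -> str:
    """Classify a single-base change at a 0-based position within a codon."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"
    return "nonsense" if after == "*" else "missense"


print(classify_substitution("ATC", 2, "T"))  # Ile -> Ile
print(classify_substitution("GGC", 0, "A"))  # Gly -> Ser
print(classify_substitution("TAC", 2, "A"))  # Tyr -> stop
```

A classification of this kind looks only at the protein sequence, which is why it cannot by itself detect the splicing, RNA structure or translation effects through which a synonymous change can still be pathogenic.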
S. Peña-Llopis (*) Translational Genomics in Solid Tumors, German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ) at the University Hospital Essen, Essen, Germany
© Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_9
9.2 Biomarkers and Next-Generation Sequencing for Personalized Medicine
Every individual is unique and unrepeatable. However, treatments for human diseases are generally designed to be effective in most individuals of a given population, and may have no effect or even be toxic in others. The development of novel technologies is leading to a progressive personalization of the treatment of human diseases. Since the completion of the human genome sequence early this century, DNA sequencing costs have dropped drastically, which facilitates the generation of vast amounts of genomic information from individual patients. Nevertheless, genomic big data analyses remain challenging. Current challenges include the development of precise and reliable variant calling pipelines, since different mutations can be identified in the same whole genome depending on the algorithm used (O’Rawe et al. 2013). As a consequence, the main drivers of the disease might be missed.
One of the main bottlenecks in personalized medicine is the identification of reliable biomarkers for patient stratification (Fig. 9.1). There are several examples of very successful predictive biomarkers for positive drug responses. For instance, mutations, amplifications and overexpression of human epidermal growth factor receptor 2 (HER2) in about 25% of breast cancer patients predict the response to anti-HER2 antibodies, such as trastuzumab (Hudis 2007). Also, most chronic myeloid leukemia (CML) patients show a characteristic Philadelphia chromosome, produced by a translocation between chromosomes 9 and 22, that creates the BCR–ABL fusion protein, which acts as a constitutively active tyrosine kinase and predicts responses to the tyrosine kinase inhibitor imatinib (Deininger and Druker 2003). Similarly, KIT mutations in metastatic gastrointestinal stromal tumors (GIST) predict responses to imatinib (Verweij et al. 2004). In addition, activating mutations in epidermal growth factor receptor (EGFR) occur in about 10% of non-small cell lung cancer (NSCLC), which is the leading cause of cancer mortality, as well as in other cancer entities, including colorectal cancers, and predict positive responses to treatment with the tyrosine kinase inhibitors gefitinib and erlotinib (Paez et al. 2004; Ciardiello and Tortora 2008). Approximately 3–5% of NSCLC patients harbor ALK translocations, which are mutually exclusive with EGFR mutations (Sasaki et al. 2010) and predict responses to the ALK inhibitor crizotinib (Butrynski et al. 2010). About 60% of melanomas display a hotspot p.V600E activating mutation in the serine/threonine kinase proto-oncogene BRAF (Miller and Mihm 2006), which predicts a remarkable positive response to the BRAF inhibitor vemurafenib (Flaherty et al. 2010).
Fig. 9.1 Diagram illustrating ideal patient stratification of a certain disease based on predictive biomarkers of drug response. (Figure labels: Patient Population; Patient Stratification by Biomarker 1, Biomarker 2 and Biomarker 3; Specific Treatments with Drug 1, Drug 2 and Drug 3)
The development of molecularly targeted therapies against oncogenic kinases has revolutionized the treatment of various types of cancer in the last decade (Sawyers 2004). Unfortunately, such therapies are not suitable for targeting tumor suppressor genes, whose mutations generally result in a loss of function of the target protein. However, mutant tumor suppressor genes, which are currently considered ‘undruggable’, may nevertheless be exploited for therapeutic purposes through ‘druggable’ synthetic lethal interactions (Hartwell et al. 1997; Kaelin 2005). One approach is to perform high-throughput chemical screens for compounds that selectively kill tumor cells harboring specific mutations; however, additional studies are necessary to characterize the responsible gene/protein targets.
An alternative strategy is to perform genetic screens employing, for example, large-scale RNA interference (RNAi) with a pooled library of short hairpin RNAs (shRNAs) or a CRISPR-Cas9 screen, which can identify synthetic lethal interactors that are pharmacologically tractable (Sawyers 2009). For instance, about 5% of breast cancers harbor inactivating mutations in either of the breast cancer susceptibility genes BRCA1 and BRCA2, which are implicated in the homologous recombination (HR) DNA-repair pathway. Because HR is lost, these cells become particularly sensitive to blockade of DNA single-strand break repair, a pathway that is otherwise non-essential because HR can compensate, through inhibition of the enzyme poly(ADP) ribose polymerase (PARP) with olaparib (Ashworth 2008). Thus, BRCA1 or BRCA2 deficiency in tumors predicts positive responses to PARP inhibitors, alone or in combination with cytotoxic agents (Ratnam and Low 2007).
Other biomarkers of interest in molecular diagnostics are prognostic biomarkers, which are not associated with drug responses but indicate good or poor patient outcome when a gene is mutated or otherwise genetically altered. Patients can then be stratified, which helps to decide whether further therapy is required. For example, mutations in TP53 are associated with poor survival in chronic lymphocytic leukemia (CLL) (Gonzalez et al. 2011). The Oncotype DX assay is a 21-gene expression test that predicts the probability of breast cancers progressing after surgery (Paik et al. 2004; Cronin et al. 2007), which was simplified by immunohistochemistry (IHC) of four markers (Cuzick et al. 2011).
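The stratification idea of Fig. 9.1 can be sketched as a simple lookup from detected alterations to candidate drugs. The biomarker and drug pairs below are the predictive examples named earlier in this section; the dictionary layout, the function name and the patient identifiers are illustrative assumptions, and the sketch is in no way a clinical decision tool.

```python
from typing import Dict, List

# Predictive biomarker -> drug pairs taken from the examples in this section.
PREDICTIVE_BIOMARKERS: Dict[str, List[str]] = {
    "HER2": ["trastuzumab"],
    "BCR-ABL": ["imatinib"],
    "KIT": ["imatinib"],
    "EGFR": ["gefitinib", "erlotinib"],
    "ALK": ["crizotinib"],
    "BRAF V600E": ["vemurafenib"],
    "BRCA1": ["olaparib"],
    "BRCA2": ["olaparib"],
}


def stratify(patients: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each patient to the candidate drugs suggested by detected biomarkers."""
    result: Dict[str, List[str]] = {}
    for patient_id, alterations in patients.items():
        drugs: List[str] = []
        for alteration in alterations:
            for drug in PREDICTIVE_BIOMARKERS.get(alteration, []):
                if drug not in drugs:  # avoid listing the same drug twice
                    drugs.append(drug)
        result[patient_id] = drugs
    return result


# Hypothetical cohort: TP53 is prognostic here, not predictive, so P3 gets no match.
cohort = {"P1": ["EGFR"], "P2": ["KIT", "BCR-ABL"], "P3": ["TP53"]}
groups = stratify(cohort)
print(groups)
```

Patient P3 illustrates the distinction drawn above: a prognostic alteration informs outcome but does not by itself point to a matched drug.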
Inactivating mutations of the epigenetic modifier BRCA1 associated protein 1 (BAP1) are mutually exclusive with mutations in polybromo 1 (PBRM1) in clear-cell renal cell carcinoma (ccRCC) (Peña-Llopis et al. 2012, 2013) and are associated with higher tumor aggressiveness, poor prognosis (Peña-Llopis et al. 2012) and poor patient survival (Kapur et al. 2013). These discoveries set the foundation for the first molecular genetic classification of ccRCCs based on gene mutations/inactivations, providing a rationale for subtype-specific treatments. Other attempts to classify ccRCCs are based on multiple parameters, such as gene expression signatures (Brannon et al. 2010; Motzer et al. 2020), non-coding RNAs (Malouf et al. 2015) or both (Creighton et al. 2013), and are thus less likely to be used in the clinic. IHC for nuclear BAP1 and PBRM1 is extremely sensitive and specific for the mutation status of these tumor suppressor genes (Peña-Llopis et al. 2012) and is currently being used for patient stratification in many hospitals.
There are currently excellent collaborative network initiatives for precision oncology, such as the MASTER trial for adult patients with rare tumors and the INFORM registry in pediatric cancers. Recently, the MASTER program used whole-genome or whole-exome DNA sequencing and RNA sequencing to identify actionable drivers of tumorigenesis, providing recommended therapies for about 32% of adult cancer patients across a large variety of tumor entities. These recommended therapies showed an improvement of 24% in the overall response rate and 55% in the disease control rate compared to previous treatments (Horak et al. 2021). The INFORM registry used low-coverage whole-genome sequencing (lcWGS), whole-exome sequencing, RNA-Seq and methylation arrays in children with relapsed, progressive or high-risk pediatric cancer, observing an almost two-fold increase in the progression-free survival of patients with the highest target priority level who received matched targeted treatments (van Tilburg et al. 2021). These studies highlight the advantage of applying molecular stratification in rare cancers to treat these patient groups more efficiently by encouraging clinical trials and the approval of personalized therapies.
9.3 The Impact of Pathogenic Synonymous Mutations in Prognosis and Precision Medicine
Most synonymous mutations are silent and non-pathogenic, but certain ones can be implicated in the development of several human diseases, such as pulmonary sarcoidosis, haemophilia B, adult and child attention deficit/hyperactivity disorder (ADHD) and some cancers (Sauna and Kimchi-Sarfaty 2011). In fact, synonymous mutations can activate oncogenes or inactivate tumor suppressor genes and drive tumorigenesis (Supek et al. 2014). About 6–8% of all driver mutations are estimated to be due to synonymous mutations, which are enriched in oncogenes and slightly depleted in tumor suppressor genes, indicating that they are under selective pressure in a tissue-specific manner (Supek et al. 2014).
Pathogenic synonymous mutations can activate oncogenes and autosomal disorder genes through a variety of molecular mechanisms by affecting, for instance, the rate of translation elongation, the RNA structure, protein folding or splicing (Sauna and Kimchi-Sarfaty 2011). About half of the driver synonymous mutations in oncogenes affect alternative splicing through the gain of exonic splicing enhancer (ESE) motifs and the loss of exonic splicing silencer (ESS) motifs (Supek et al. 2014), which are highly conserved in evolution (Parmley et al. 2006). Examples are the pathogenic synonymous mutations in the multidrug resistance 1 (MDR1) gene (Kimchi-Sarfaty et al. 2007), the RET proto-oncogene in Hirschsprung disease (Auricchio et al. 1999), BCL2L12 in melanoma (Gartner et al. 2013) and EGFR in squamous cell carcinoma (Tan et al. 2017).
On the other hand, pathogenic synonymous mutations in tumor suppressors and genes implicated in autosomal disorders typically cause the splicing machinery to skip the exon carrying the synonymous mutation or activate new cryptic splice sites (Cartegni et al. 2002), which most likely causes a frameshift and truncation of the protein, leading to its inactivation. Examples of exon skipping and protein inactivation by synonymous mutations were identified in ATM in ataxia-telangiectasia (Teraoka et al. 1999), NF1 in neurofibromatosis type 1 (Ars et al. 2000), APC in familial adenomatous polyposis (Montera et al. 2001), TP53 in a large variety of cancers (Cartegni et al. 2002), BRCA1 in breast and ovarian cancer families (Anczuków et al. 2008) and the von Hippel-Lindau (VHL) gene in familial erythrocytosis or von Hippel-Lindau disease (Lenglet et al. 2018).
Synonymous polymorphisms can be correlated with clinical response. A germline SNP in an exon of the proto-oncogene MET was found to be independently associated with recurrence-free survival in renal cell carcinoma (Schutz et al. 2013). Exonic SNPs in EGFR can predict clinical outcome in patients with advanced non-small-cell lung cancer (NSCLC) treated with gefitinib (Ma et al. 2009; Zhang et al. 2013). A synonymous polymorphism in Tristetraprolin (TTP) was correlated with a lack of response to Herceptin/Trastuzumab in HER2-positive breast cancer patients (Griseri et al. 2011).
Synonymous mutations can also affect patient prognosis. A somatic synonymous mutation in the tumor suppressor BAP1 in a renal cell carcinoma patient from The Cancer Genome Atlas (TCGA) was found to result in exon skipping, frameshift, premature termination of translation, degradation of mRNA and protein and, finally, loss of function (Niersch et al. 2021) (Fig. 9.2). This previously unacknowledged synonymous mutation reduced the expected survival four-fold, turning a good prognosis (expected because of a PBRM1 truncating mutation) into a very poor one through inactivation of both BAP1 and PBRM1 (Niersch et al. 2021) (Fig. 9.3). This study highlights the importance of inspecting synonymous mutations near acceptor splice sites of cancer genes to identify drivers of tumorigenesis that might change patient prognosis and stratification and, ultimately, make personalized medicine more precise.
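The chain of events described above (exon skipping, frameshift, premature stop codon) can be illustrated with a toy transcript. In this minimal sketch the exon sequences are invented and are not BAP1 sequence, and the function names are assumptions; the point is only that skipping an exon whose length is not a multiple of three shifts the downstream reading frame and can expose an early stop codon.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def skip_exon(exons: List[str], index: int) -> str:
    """Return the spliced coding sequence with one exon left out."""
    return "".join(seq for i, seq in enumerate(exons) if i != index)


def causes_frameshift(exons: List[str], index: int) -> bool:
    """Skipping shifts the downstream frame unless the exon length is a multiple of 3."""
    return len(exons[index]) % 3 != 0


def first_stop(cds: str) -> Optional[int]:
    """0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Invented three-exon coding sequence; the middle exon is 5 bases long.
exons = ["ATGGCT", "GAAGT", "CTGTAACCTGTGA"]
normal = "".join(exons)        # ATG GCT GAA GTC TGT AAC CTG TGA
skipped = skip_exon(exons, 1)  # ATG GCT CTG TAA ...

print(causes_frameshift(exons, 1))
print(first_stop(normal), first_stop(skipped))
```

In the toy example the normal transcript reaches its stop codon at the eighth codon, whereas the transcript lacking the middle exon terminates at the fourth, mirroring the truncation mechanism described for tumor suppressors.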
Fig. 9.2 Diagram depicting a pathogenic synonymous mutation near the acceptor splice site of exon 11 of BAP1 that leads to exon skipping, frameshift and premature stop codon, inducing its mRNA and protein degradation and loss of function. (Reproduced from Niersch et al. 2021) (Figure labels: exons 10–17, with the mutation marked by an asterisk; BAP1 exon 11 skipping; frameshift and early stop codon; BAP1 protein; mRNA and protein degradation; loss of function)
Fig. 9.3 Kaplan-Meier curves for the indicated groups from the renal cell carcinoma TCGA dataset. The green arrow indicates the index patient with the synonymous mutation. The blue arrow depicts the worsening of the median overall survival from tumors with PBRM1 mutations to tumors with mutations in both BAP1 and PBRM1 (B/P). (Reproduced from Niersch et al. 2021)
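Kaplan-Meier curves such as those in Fig. 9.3 are built from the product-limit estimate, in which the survival probability is multiplied by (1 − deaths / number at risk) at each observed event time. The sketch below is a minimal illustration with invented follow-up times; the function names are assumptions, and the data are not taken from the TCGA cohort.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the patient died at times[i] and 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which estimated survival falls to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Invented follow-up times (e.g. months); one patient is censored at time 4.
curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 1])
print(curve)
print(median_survival(curve))
```

Comparing the median survival of two such curves is what the blue arrow in Fig. 9.3 summarizes for the PBRM1-mutant and the BAP1/PBRM1 double-mutant groups.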
9.4 Algorithms for Predicting Pathogenic Synonymous Mutations
Several algorithms and computational tools are now available that can predict, with high confidence, synonymous mutations that are likely to be pathogenic or deleterious. The Synonymous Mutations in Cancer database (SynMICdb, http://SynMICdb.dkfz.de) is a comprehensive compendium of over 650,000 synonymous mutations in human cancer with information about their predicted pathogenicity (Sharma et al. 2019). Synonymous mutations are ranked by their likelihood of having a functional impact, which is scored on the basis of the mutation frequency, the mutation load, the mutational signature, the evolutionary conservation, the annotation as a known cancer gene or as a SNP, the mutation positions (Dayem Ullah et al. 2018), the impact on splicing (derived from the RegRNA 2.0 (Chang et al. 2013) and SpliceAid-F (Giulietti et al. 2013) databases) and the RNA secondary structure change (Sharma et al. 2019). Mueller and colleagues used a synonymous position mutation library to evaluate the impact of mutations and identify those synonymous mutations that might alter exon inclusion (Mueller et al. 2015). Other algorithms, such as the undersampling scheme-based method for deleterious synonymous mutation prediction (usDSM), use 14-dimensional biological features and a random forest classifier to identify pathogenic synonymous mutations (Tang et al. 2021). The authors report higher performance in finding deleterious synonymous mutations than previous algorithms, including CADD (Kircher et al. 2014), DANN (Quang et al. 2015), FATHMM-MKL (Shihab et al. 2015), PhD-SNPg (Capriotti and Fariselli 2017), PrDSM (Cheng et al. 2020), SilVA (Buske et al. 2013) and TraP (Gelfman et al. 2017). However, independent benchmark tests comparing the different computational algorithms would be necessary to evaluate their specificity and sensitivity for predicting pathogenic synonymous mutations.
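The sensitivity and specificity mentioned above can be computed from a benchmark set in which the true pathogenicity of each synonymous mutation is known. The following is a minimal sketch with invented labels and predictions; it does not reproduce the evaluation of any of the tools cited here, and the function name is an assumption.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("benchmark needs both pathogenic and benign variants")
    return tp / (tp + fn), tn / (tn + fp)


# Invented benchmark: 4 pathogenic (True) and 8 benign (False) variants.
truth = [True] * 4 + [False] * 8
predicted = [True, True, True, False, True] + [False] * 7

sens, spec = sensitivity_specificity(truth, predicted)
print(sens, spec)
```

Because benign synonymous variants vastly outnumber pathogenic ones in real data, reporting both measures on an independent benchmark gives a fairer picture than overall accuracy alone.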
9.5 Concluding Remarks and Future Perspectives
Clinical practice is benefiting from new technologies and high-throughput approaches, such as next-generation sequencing, SNP analysis, transcriptomic profiling and RNAi/CRISPR screens, for the discovery of novel predictive biomarkers, which can be game changers in their respective diseases. Unfortunately, only a few predictive biomarkers have been sufficiently validated to be used in the clinic. Therefore, the number of actionable driver genes in genomic analyses, that is, genes for which an effective drug exists when they are mutated or deregulated, is currently very limited. It is envisioned that novel technologies and the development of artificial intelligence will boost the discovery and validation of new predictive biomarkers for a large number of diseases. At the same time, algorithms for predicting pathogenic synonymous mutations with excellent specificity and sensitivity will be routinely implemented in genomic analyses to avoid missing significant drivers.
References
Anczuków O, Buisson M, Salles MJ, Triboulet S, Longy M, Lidereau R, Sinilnikova OM, Mazoyer S (2008) Unclassified variants identified in BRCA1 exon 11: consequences on splicing. Genes Chromosomes Cancer 47:418–426. https://doi.org/10.1002/gcc.20546
Ars E, Serra E, Garcia J, Kruyer H, Gaona A, Lazaro C, Estivill X (2000) Mutations affecting mRNA splicing are the most common molecular defects in patients with neurofibromatosis type 1. Hum Mol Genet 9:237–247. https://doi.org/10.1093/hmg/9.2.237
Ashworth A (2008) A synthetic lethal therapeutic approach: poly(ADP) ribose polymerase inhibitors for the treatment of cancers deficient in DNA double-strand break repair. J Clin Oncol 26:3785–3790. https://doi.org/10.1200/JCO.2008.16.0812
Auricchio A, Griseri P, Carpentieri ML, Betsos N, Staiano A, Tozzi A, Priolo M, Thompson H, Bocciardi R, Romeo G et al (1999) Double heterozygosity for a RET substitution interfering with splicing and an EDNRB missense mutation in Hirschsprung disease. Am J Hum Genet 64:1216–1221. https://doi.org/10.1086/302329
Brannon AR, Reddy A, Seiler M, Arreola A, Moore DT, Pruthi RS, Wallen EM, Nielsen ME, Liu H, Nathanson KL et al (2010) Molecular stratification of clear cell renal cell carcinoma by consensus clustering reveals distinct subtypes and survival patterns. Genes Cancer 1:152–163. https://doi.org/10.1177/1947601909359929
Buske OJ, Manickaraj A, Mital S, Ray PN, Brudno M (2013) Identification of deleterious synonymous variants in human genomes. Bioinformatics 29:1843–1850. https://doi.org/10.1093/bioinformatics/btt308
Butrynski JE, D’Adamo DR, Hornick JL, Dal Cin P, Antonescu CR, Jhanwar SC, Ladanyi M, Capelletti M, Rodig SJ, Ramaiya N et al (2010) Crizotinib in ALK-rearranged inflammatory myofibroblastic tumor. N Engl J Med 363:1727–1733. https://doi.org/10.1056/NEJMoa1007056
Capriotti E, Fariselli P (2017) PhD-SNPg: a webserver and lightweight tool for scoring single nucleotide variants. Nucleic Acids Res 45:W247–W252. https://doi.org/10.1093/nar/gkx369
Cartegni L, Chew SL, Krainer AR (2002) Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 3:285–298. https://doi.org/10.1038/nrg775
Chang TH, Huang HY, Hsu JB, Weng SL, Horng JT, Huang HD (2013) An enhanced computational platform for investigating the roles of regulatory RNA and for identifying functional RNA motifs. BMC Bioinformatics 14(Suppl 2):S4. https://doi.org/10.1186/1471-2105-14-S2-S4
Cheng N, Li M, Zhao L, Zhang B, Yang Y, Zheng CH, Xia J (2020) Comparison and integration of computational methods for deleterious synonymous mutation prediction. Brief Bioinform 21:970–981. https://doi.org/10.1093/bib/bbz047
Ciardiello F, Tortora G (2008) EGFR antagonists in cancer treatment. N Engl J Med 358:1160–1174. https://doi.org/10.1056/NEJMra0707704
Creighton CJ, Morgan M, Gunaratne PH, Wheeler DA, Gibbs RA, Gordon Robertson A, Chu A, Beroukhim R, Cibulskis K, Signoretti S et al (2013) Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499:43–49. https://doi.org/10.1038/nature12222
Cronin M, Sangli C, Liu ML, Pho M, Dutta D, Nguyen A, Jeong J, Wu J, Langone KC, Watson D (2007) Analytical validation of the Oncotype DX genomic diagnostic test for recurrence prognosis and therapeutic response prediction in node-negative, estrogen receptor-positive breast cancer. Clin Chem 53:1084–1091. https://doi.org/10.1373/clinchem.2006.076497
Cuzick J, Dowsett M, Pineda S, Wale C, Salter J, Quinn E, Zabaglo L, Mallon E, Green AR, Ellis IO et al (2011) Prognostic value of a combined estrogen receptor, progesterone receptor, Ki-67, and human epidermal growth factor receptor 2 immunohistochemical score and comparison with the Genomic Health recurrence score in early breast cancer. J Clin Oncol 29:4273–4278. https://doi.org/10.1200/JCO.2010.31.2835
Dayem Ullah AZ, Oscanoa J, Wang J, Nagano A, Lemoine NR, Chelala C (2018) SNPnexus: assessing the functional relevance of genetic variation to facilitate the promise of precision medicine. Nucleic Acids Res 46:W109–W113. https://doi.org/10.1093/nar/gky399
9 SNPs and Personalized Medicine: Scrutinizing Pathogenic Synonymous Mutations…
193
Deininger MW, Druker BJ (2003) Specific targeted therapy of chronic myelogenous leukemia with imatinib. Pharmacol Rev 55:401–423. https://doi.org/10.1124/pr.55.3.4 Flaherty KT, Puzanov I, Kim KB, Ribas A, McArthur GA, Sosman JA, O’Dwyer PJ, Lee RJ, Grippo JF, Nolop K et al (2010) Inhibition of mutated, activated BRAF in metastatic melanoma. N Engl J Med 363:809–819. https://doi.org/10.1056/NEJMoa1002011 Gartner JJ, Parker SC, Prickett TD, Dutton-Regester K, Stitzel ML, Lin JC, Davis S, Simhadri VL, Jha S, Katagiri N et al (2013) Whole-genome sequencing identifies a recurrent functional synonymous mutation in melanoma. Proc Natl Acad Sci U S A 110:13481–13486. https://doi. org/10.1073/pnas.1304227110 Gelfman S, Wang Q, McSweeney KM, Ren Z, La Carpia F, Halvorsen M, Schoch K, Ratzon F, Heinzen EL, Boland MJ et al (2017) Annotating pathogenic non-coding variants in genic regions. Nat Commun 8:236. https://doi.org/10.1038/s41467-017-00141-2 Giulietti M, Piva F, D’Antonio M, D’Onorio De Meo P, Paoletti D, Castrignano T, D’Erchia AM, Picardi E, Zambelli F, Principato G et al (2013) SpliceAid-F: a database of human splicing factors and their RNA-binding sites. Nucleic Acids Res 41:D125–D131. https://doi. org/10.1093/nar/gks997 Gonzalez D, Martinez P, Wade R, Hockley S, Oscier D, Matutes E, Dearden CE, Richards SM, Catovsky D, Morgan GJ (2011) Mutational status of the TP53 gene as a predictor of response and survival in patients with chronic lymphocytic leukemia: results from the LRF CLL4 trial. J Clin Oncol 29:2223–2229. https://doi.org/10.1200/JCO.2010.32.0838 Griseri P, Bourcier C, Hieblot C, Essafi-Benkhadir K, Chamorey E, Touriol C, Pages G (2011) A synonymous polymorphism of the Tristetraprolin (TTP) gene, an AU-rich mRNA-binding protein, affects translation efficiency and response to Herceptin treatment in breast cancer patients. Hum Mol Genet 20:4556–4568. 
https://doi.org/10.1093/hmg/ddr390 Hartwell LH, Szankasi P, Roberts CJ, Murray AW, Friend SH (1997) Integrating genetic approaches into the discovery of anticancer drugs. Science 278:1064–1068 Horak P, Heining C, Kreutzfeldt S, Hutter B, Mock A, Hullein J, Frohlich M, Uhrig S, Jahn A, Rump A et al (2021) Comprehensive genomic and transcriptomic analysis for guiding therapeutic decisions in patients with rare cancers. Cancer Discov 11:2780–2795. https://doi. org/10.1158/2159-8290.CD-21-0126 Hudis CA (2007) Trastuzumab – mechanism of action and use in clinical practice. N Engl J Med 357:39–51. https://doi.org/10.1056/NEJMra043186 Kaelin WG Jr (2005) The concept of synthetic lethality in the context of anticancer therapy. Nat Rev Cancer 5:689–698. https://doi.org/10.1038/nrc1691 Kapur P, Peña-Llopis S, Christie A, Zhrebker L, Pavia-Jimenez A, Rathmell WK, Xie XJ, Brugarolas J (2013) Effects on survival of BAP1 and PBRM1 mutations in sporadic clear- cell renal-cell carcinoma: a retrospective analysis with independent validation. Lancet Oncol 14:159–167. https://doi.org/10.1016/S1470-2045(12)70584-3 Kimchi-Sarfaty C, Oh JM, Kim IW, Sauna ZE, Calcagno AM, Ambudkar SV, Gottesman MM (2007) A "silent" polymorphism in the MDR1 gene changes substrate specificity. Science 315:525–528. https://doi.org/10.1126/science.1135308 Kimura M (1977) Preponderance of synonymous changes as evidence for the neutral theory of molecular evolution. Nature 267:275–276. https://doi.org/10.1038/267275a0 Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 46:310–315. https://doi.org/10.1038/ng.2892 Lenglet M, Robriquet F, Schwarz K, Camps C, Couturier A, Hoogewijs D, Buffet A, Knight SJL, Gad S, Couve S et al (2018) Identification of a new VHL exon and complex splicing alterations in familial erythrocytosis or von Hippel-Lindau disease. Blood 132:469–483. https://doi. 
org/10.1182/blood-2018-03-838235 Ma F, Sun T, Shi Y, Yu D, Tan W, Yang M, Wu C, Chu D, Sun Y, Xu B et al (2009) Polymorphisms of EGFR predict clinical outcome in advanced non-small-cell lung cancer patients treated with Gefitinib. Lung Cancer 66:114–119. https://doi.org/10.1016/j.lungcan.2008.12.025
194
S. Peña-Llopis
Malouf GG, Zhang J, Yuan Y, Comperat E, Roupret M, Cussenot O, Chen Y, Thompson EJ, Tannir NM, Weinstein JN et al (2015) Characterization of long non-coding RNA transcriptome in clear-cell renal cell carcinoma by next-generation deep sequencing. Mol Oncol 9:32–43. https://doi.org/10.1016/j.molonc.2014.07.007 Miller AJ, Mihm MCJ (2006) Melanoma. N. Engl J Med 355:51–65. https://doi.org/10.1056/ NEJMra052166 Montera M, Piaggio F, Marchese C, Gismondi V, Stella A, Resta N, Varesco L, Guanti G, Mareni C (2001) A silent mutation in exon 14 of the APC gene is associated with exon skipping in a FAP family. J Med Genet 38:863–867. https://doi.org/10.1136/jmg.38.12.863 Motzer RJ, Banchereau R, Hamidi H, Powles T, McDermott D, Atkins MB, Escudier B, Liu LF, Leng N, Abbas AR et al (2020) Molecular subsets in renal cancer determine outcome to checkpoint and angiogenesis blockade. Cancer Cell 38:803–817. e804. https://doi. org/10.1016/j.ccell.2020.10.011 Mueller WF, Larsen LS, Garibaldi A, Hatfield GW, Hertel KJ (2015) The silent sway of splicing by synonymous substitutions. J Biol Chem 290:27700–27711. https://doi.org/10.1074/jbc. M115.684035 Niersch J, Vega-Rubin-de-Celis S, Bazarna A, Mergener S, Jendrossek V, Siveke JT, Peña-Llopis S (2021) A BAP1 synonymous mutation results in exon skipping, loss of function and worse patient prognosis. iScience 24:102173. https://doi.org/10.1016/j.isci.2021.102173 O’Rawe J, Jiang T, Sun G, Wu Y, Wang W, Hu J, Bodily P, Tian L, Hakonarson H, Johnson WE et al (2013) Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 5:28. https://doi.org/10.1186/gm432 Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, Herman P, Kaye FJ, Lindeman N, Boggon TJ et al (2004) EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science 304:1497–1500. 
https://doi.org/10.1126/science.1099314 Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T et al (2004) A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351:2817–2826. https://doi.org/10.1056/NEJMoa041588 Parmley JL, Chamary JV, Hurst LD (2006) Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol Biol Evol 23:301–309. https://doi. org/10.1093/molbev/msj035 Peña-Llopis S, Vega-Rubín-de-Celis S, Liao A, Leng N, Pavía-Jiménez A, Wang S, Yamasaki T, Zhrebker L, Sivanand S, Spence P et al (2012) BAP1 loss defines a new class of renal cell carcinoma. Nat Genet 44:751–759. https://doi.org/10.1038/ng.2323 Peña-Llopis S, Christie A, Xie XJ, Brugarolas J (2013) Cooperation and antagonism among cancer genes: the renal cancer paradigm. Cancer Res 73:4173–4179. https://doi. org/10.1158/0008-5472.CAN-13-0360 Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics 31:761–763. https://doi.org/10.1093/bioinformatics/btu703 Ratnam K, Low JA (2007) Current development of clinical inhibitors of poly(ADP-ribose) polymerase in oncology. Clin Cancer Res 13:1383–1388. https://doi.org/10.1158/1078-0432. CCR-06-2260 Sasaki T, Rodig SJ, Chirieac LR, Janne PA (2010) The biology and treatment of EML4-ALK non- small cell lung cancer. Eur J Cancer 46:1773–1780. https://doi.org/10.1016/j.ejca.2010.04.002 Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12:683–691. https://doi.org/10.1038/nrg3051 Sawyers C (2004) Targeted cancer therapy. Nature 432:294–297. https://doi.org/10.1038/ nature03095 Sawyers CL (2009) Finding and drugging the vulnerabilities of RAS-dependent cancers. Cell 137:796–798. 
https://doi.org/10.1016/j.cell.2009.05.011 Schutz FA, Pomerantz MM, Gray KP, Atkins MB, Rosenberg JE, Hirsch MS, McDermott DF, Lampron ME, Lee GS, Signoretti S et al (2013) Single nucleotide polymorphisms and risk of recurrence of renal-cell carcinoma: a cohort study. Lancet Oncol 14:81–87. https://doi. org/10.1016/S1470-2045(12)70517-X
9 SNPs and Personalized Medicine: Scrutinizing Pathogenic Synonymous Mutations…
195
Sharma Y, Miladi M, Dukare S, Boulay K, Caudron-Herger M, Gross M, Backofen R, Diederichs S (2019) A pan-cancer analysis of synonymous mutations. Nat Commun 10:2569. https://doi. org/10.1038/s41467-019-10489-2 Shihab HA, Rogers MF, Gough J, Mort M, Cooper DN, Day IN, Gaunt TR, Campbell C (2015) An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31:1536–1543. https://doi.org/10.1093/bioinformatics/btv009 Supek F, Minana B, Valcarcel J, Gabaldon T, Lehner B (2014) Synonymous mutations frequently act as driver mutations in human cancers. Cell 156:1324–1335. https://doi.org/10.1016/j. cell.2014.01.051 Tan DSW, Chong FT, Leong HS, Toh SY, Lau DP, Kwang XL, Zhang X, Sundaram GM, Tan GS, Chang MM et al (2017) Long noncoding RNA EGFR-AS1 mediates epidermal growth factor receptor addiction and modulates treatment response in squamous cell carcinoma. Nat Med 23:1167–1175. https://doi.org/10.1038/nm.4401 Tang X, Zhang T, Cheng N, Wang H, Zheng CH, Xia J, Zhang T (2021) usDSM: a novel method for deleterious synonymous mutation prediction using undersampling scheme. Brief Bioinform 22. https://doi.org/10.1093/bib/bbab123 Teraoka SN, Telatar M, Becker-Catania S, Liang T, Onengut S, Tolun A, Chessa L, Sanal O, Bernatowska E, Gatti RA et al (1999) Splicing defects in the ataxia-telangiectasia gene, ATM: underlying mutations and consequences. Am J Hum Genet 64:1617–1631. https://doi. org/10.1086/302418 van Tilburg CM, Pfaff E, Pajtler KW, Langenberg KPS, Fiesel P, Jones BC, Balasubramanian GP, Stark S, Johann PD, Blattner-Johnson M et al (2021) The pediatric precision oncology INFORM registry: clinical outcome and benefit for patients with very high-evidence targets. Cancer Discov 11:2764–2779. 
https://doi.org/10.1158/2159-8290.CD-21-0094 Verweij J, Casali PG, Zalcberg J, LeCesne A, Reichardt P, Blay JY, Issels R, van Oosterom A, Hogendoorn PC, Van Glabbeke M et al (2004) Progression-free survival in gastrointestinal stromal tumours with high-dose imatinib: randomised trial. Lancet 364:1127–1134. https://doi. org/10.1016/S0140-6736(04)17098-0 Zhang L, Yuan X, Chen Y, Du XJ, Yu S, Yang M (2013) Role of EGFR SNPs in survival of advanced lung adenocarcinoma patients treated with Gefitinib. Gene 517:60–64. https://doi. org/10.1016/j.gene.2012.12.087
Chapter 10
Codon Optimization of Therapeutic Proteins: Suggested Criteria for Increased Efficacy and Safety
Vincent P. Mauro
10.1 Introduction
Proteins are some of the most useful molecules in nature and are tremendously valuable for a broad range of applications. For instance, large numbers of industrial enzymes are used in both commercial and consumer applications, including textiles, pharmaceuticals, and detergents. Numerous proteins also have therapeutic applications, including monoclonal antibodies, coagulation factors, enzymes, fusion proteins, hormones, growth factors, and plasma proteins (Lagasse et al. 2017). Some proteins, including the hormone insulin, can be purified from natural sources, but in many cases, this is not possible as protein levels are too low (Mauro 2018). Recombinant proteins can, however, be produced in a wide range of expression systems that include cell-free lysates, microbial expression systems, mammalian cell culture, as well as transgenic plants and animals. Although many of these systems allow large-scale production, some proteins are still expressed at levels too low to be economically feasible. To overcome features of mRNAs that restrict protein expression, various codon optimization approaches have been developed to exploit the ability of synonymous codon mutations to alter mRNA sequences without disrupting the encoded polypeptides. These approaches are commonly used to increase protein production by optimizing codon usage and removing negative features of mRNAs, including cryptic splice sites and mRNA secondary structures. Codon optimization can also facilitate cloning and gene synthesis, e.g., by introducing or removing restriction sites, and by optimizing oligonucleotide design.
V. P. Mauro (*) Promosome, LLC, New York, NY, USA © Springer Nature Switzerland AG 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_10
10.2 Protein Synthesis Primer
10.2.1 Translation Initiation
mRNAs are ribonucleic acid templates that encode polypeptides and are decoded (translated) in a mechanism that requires ribosomes and various other factors. To initiate translation, two processes are required: (1) recruitment of the translation machinery by mRNAs and (2) identification of the initiation codon (Fig. 10.1), where the efficiency of both processes affects levels of protein expression. In eukaryotes, mRNAs can recruit the small (40S) ribosomal subunit at the 5′ ends of mRNAs, at a 7-methylguanosine (m7G)-cap-structure in a process that requires a set of eukaryotic initiation factors known as the eIF4F complex (Pelletier and Sonenberg 2019). Eukaryotic mRNAs can also recruit ribosomal subunits at internal sites, termed internal ribosome entry sites (IRESes) (Yang and Wang 2019). Binding to internal sites occurs by a degenerate set of mechanisms that include direct base-pairing between complementary segments of mRNA and the 18S rRNA
Fig. 10.1 Translation initiation involves 2 major processes: ribosomal recruitment and start site selection. Ribosome recruitment includes mechanisms that enable mRNAs and ribosomes to interact. In eukaryotes, recruitment can occur at different locations within mRNAs: at the 5′ ends of mRNAs and at internal sites. mRNA is indicated schematically as a blue line. The coding region is indicated as a thicker line initiating with an AUG codon. Recruitment at the 5′ ends of mRNAs involves the cap-structure, which is an m7G post-transcriptional modification, indicated in red. Recruitment at the cap-structure is mediated by the eIF4F complex of initiation factors (indicated as brown objects). The eIF4F complex interacts with the cap-structure and the 40S ribosomal subunit. Recruitment at internal sites involves specific mRNA sequences termed IRESes, which can recruit ribosomes by multiple different mechanisms. One mechanism shown in the figure involves direct binding of a short IRES-element (red rectangle) to the 40S ribosomal subunit, indicated as an orange oval. Other IRES mechanisms require some or all the same initiation factors required by the cap and may also require non-canonical factors. In eukaryotes, start site selection involves movement of the 40S subunit to the initiation codon. The initiation codon is recognized by the initiator Met-tRNA, which is associated with the 40S subunit as part of a ternary complex. The initiator Met-tRNA base pairs to the initiation codon, whereupon the large ribosomal subunit (60S) attaches to form an 80S ribosomal complex. The 80S complex synthesizes the polypeptide encoded by the mRNA in the elongation phase of protein synthesis. When the ribosomal complex reaches a termination codon, polypeptide synthesis stops, and the ribosome and polypeptide dissociate from the mRNA
of 40S subunits (Mauro and Edelman 1997; Mauro and Edelman 2002; Dresios et al. 2006; Chappell et al. 2006a; Matsuda and Mauro 2014). Binding to mRNAs at internal sites can also involve interactions with initiation factors or non-canonical factors (Godet et al. 2019). For both cap-dependent and cap-independent initiation, the 40S subunit is part of a 43S pre-initiation complex, which contains eukaryotic initiation factors eIF1, eIF1A, eIF3, eIF5, and a ternary complex consisting of eIF2, GTP, and the initiator Methionine tRNA (Met-tRNA) (Pelletier and Sonenberg 2019). The ability of the 43S complex to initiate translation requires base pairing of the initiator Met-tRNA to an initiation codon. For most eukaryotic mRNAs, ribosomal recruitment does not occur immediately adjacent to the initiation codon; in some cases, mRNA 5′ leaders can be hundreds or even thousands of nucleotides in length (Steinberger et al. 2020), suggesting that ribosomal recruitment can occur at such distances. For initiation to occur, the ribosomal subunit must move from the recruitment site to the initiation codon. Both linear scanning and non-linear mechanisms, which include ribosomal shunting, tethering, and clustering, have been postulated as possible mechanisms for movement (Chappell et al. 2006a, b; Kozak 1978, 1989). The scanning hypothesis suggests that ribosomal subunits inspect the 5′ leader in a processive 5′ to 3′ direction until the first AUG codon is identified. Scanning was proposed in 1978; however, over time, key assumptions of this hypothesis have been invalidated, including the notion that eukaryotic mRNAs are monocistronic (Kochetov 2006; Starck and Shastri 2016) and that translation of most mRNAs is cap-dependent (Yang and Wang 2019; Keiper 2019). In addition, there are numerous falsifying observations regarding the first AUG rule and the effects of obstacles in 5′ leader sequences (Matsuda and Mauro 2010; Wethmar et al. 2014; Delcourt et al. 2018).
Moreover, scanning cannot easily explain various observations, including competition between closely spaced AUG codons, bypassing of obstacles to scanning, and the lack of an apparent energy requirement (Matsuda and Mauro 2010; Pestova and Kolupaeva 2002; Matsuda and Dreher 2006). Furthermore, scanning ribosomal subunits have never been visualized, but have only been inferred on the basis of end-point assays, such as autoradiograms from primer extension reactions, which cannot distinguish between different mechanisms of ribosomal movement. Non-linear mechanisms of ribosomal movement suggest that ribosomes bypass some or all of the intervening sequences between the ribosomal recruitment site and the initiation codon. Mechanisms that involve ribosome shunting, tethering, and clustering are supported experimentally and have been shown to be predictive (Chappell et al. 2006a; Jang and Paek 2016). In eukaryotes, translation initiation appears to be most easily explained by two general principles: (1) recruitment of the translation machinery by mRNAs is mediated by a degenerate set of binding interactions and (2) recognition of the translation start site involves base pairing of the ribosome-associated initiator Met-tRNA to the initiation codon. Factors including distance between the recruitment site and initiation codon, and the relative accessibilities of potential initiation codons affect where translation initiates, which for most mRNAs is at multiple sites (Delcourt et al. 2018; Wethmar 2014). These same fundamental principles also appear to underlie bacterial translation initiation.
10.2.2 Polypeptide Synthesis
Translation initiation positions the ribosome at the initiation codon and establishes the reading frame. After the 43S ribosomal complex recognizes an initiation codon, the large (60S) ribosomal subunit attaches and the elongation phase of protein synthesis commences, at which time the polypeptide encoded by the mRNA is synthesized (Dever et al. 2018; Wang et al. 2020). During this process, the ribosome mediates interactions between two adjacent codons and their corresponding aminoacyl-tRNAs (aa-tRNAs, charged-tRNAs), which are tRNAs covalently linked to their cognate amino acids. The ribosome catalyzes a peptide bond between the two amino acids, shifts downstream on the mRNA and positions the next codon for binding to an aa-tRNA. This movement enables the upstream tRNA, which is now uncharged, to detach. These steps repeat in a processive 5′ to 3′ direction along the mRNA to synthesize the encoded polypeptide. Polypeptide synthesis terminates when the ribosome encounters a stop codon (UAA, UAG, UGA), a process that involves release factors, GTP hydrolysis, release of the polypeptide, and disassembly of the ribosomal complex (Hellen 2018). Most amino acids are encoded by 2–6 synonymous codons, which enables mRNA coding sequences to be modified without altering the encoded polypeptide sequences. The only exceptions are methionine and tryptophan, which are each encoded by single codons: AUG and UGG, respectively. Because of this degeneracy in the genetic code, many different mRNAs can encode the same protein (Mauro and Chappell 2014), and mRNAs can be generated by using only an amino acid sequence as input (Itakura et al. 1977). Welch et al. (2009a, b) illustrated this principle by calculating that a 300 amino acid protein of average amino acid composition could be encoded by more than 10^100 different mRNAs (Welch et al. 2009a).
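The scale of this degeneracy can be checked with a short calculation. The following Python sketch is illustrative and is not taken from the cited work: it multiplies together the number of synonymous codons available for each residue, using the codon counts of the standard genetic code. The function name `count_synonymous_mrnas` and the example sequences are assumptions introduced here for illustration.

```python
from math import log10, prod

# Number of codons per amino acid in the standard genetic code
# (one-letter amino acid codes; the 61 sense codons in total).
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_synonymous_mrnas(protein: str) -> int:
    """Return how many distinct coding sequences encode the given protein.

    Only the sense codons are counted; the choice of stop codon is ignored.
    """
    return prod(CODONS_PER_AMINO_ACID[residue] for residue in protein.upper())


# Met and Trp have a single codon each, so this dipeptide has one encoding.
print(count_synonymous_mrnas("MW"))   # 1
# Met (1) x Lys (2) x Leu (6)
print(count_synonymous_mrnas("MKL"))  # 12

# Hypothetical 300-residue protein: each of the 20 amino acids, 15 times.
hypothetical_protein = "ACDEFGHIKLMNPQRSTVWY" * 15
print(log10(count_synonymous_mrnas(hypothetical_protein)))
```

For the hypothetical equal-composition protein used here, the count exceeds 10^100, which is in line with the estimate attributed to Welch et al. for a protein of average composition.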
10.3 Codon Optimization
10.3.1 Codon Usage and Other Features of mRNA That Affect Protein Expression
mRNAs contain numerous features that can limit protein expression, including codon usage. Investigations of codon usage across different organisms including bacteria, yeast, and higher eukaryotes have revealed that synonymous codons are not used randomly. Some codons are used frequently, while others are used rarely. Codon usage has also been found to vary between different organisms, and even within organisms, between various tissue types and during development (Hanson and Coller 2018). Over the past ≈40 years, a number of variables have been considered and many approaches have been used to evaluate codon usage; Table 10.1 provides a snapshot of these approaches.
Table 10.1 Analysis of codon usage. Methods that have been used to evaluate codon usage over the past ≈40 years

| Year | Approach | Description | Ref |
|---|---|---|---|
| 1981 | Frequency of usage of optimal codons | Used to quantify codon usage in E. coli. Strong positive correlation found between tRNA levels and codon usage in a set of highly expressed genes. Supports the notion that codon usage is related to tRNA availability. | Ikemura (1981) |
| 1982 | Codon bias index | Calculated the extent to which yeast mRNAs use the same 22 codons that are found preferentially in a set of highly expressed genes. | Bennetzen and Hall (1982) |
| 1984 | Codon preference statistic | Determined the likelihood of finding a specific codon in a highly expressed gene and compared this to the likelihood of finding the same codon in the other 2 reading frames of the gene, which were considered to be random sequences of the same base composition. | Gribskov et al. (1984) |
| 1986 | Relative synonymous codon usage | Analyzed 110 genes in yeast by scoring the number of occurrences of a codon divided by expected usage, if usage was unbiased. Identified a group of highly expressed genes with more extreme codon bias than genes expressed at moderate/low levels. The highly expressed genes used a set of preferred synonymous codons, which correlated with tRNA abundance. | Sharp et al. (1986) |
| 1987 | Codon adaptation index (CAI) | CAI is one of the most widely referenced measures of codon usage and is similar to the codon preference statistic except that the codon usage of a gene is compared to that of a reference set of highly expressed genes. This approach was taken to facilitate comparisons within and between species. | Sharp and Li (1987) |
| 1989 | Codon-pair bias | This study showed that the occurrence of codon pairs is not random, and that various codon pairs correlate with gene expression. For example, in highly expressed genes, highly underrepresented codon pairs are used more frequently (≈2X) than in poorly expressed genes. | Gutman and Hatfield (1989) |
| 1990 | Effective number of codons | Looked at codon usage by quantifying the extent to which codon usage in a gene departs from the unbiased use of synonymous codons. | Wright (1990) |
| 1994 | Intrinsic codon deviation index | Method to estimate codon bias of a yeast species for which optimal codons were unknown, using optimal codons from S. cerevisiae as a reference. | Freire-Picos et al. (1994) |
| 2001 | Maximum likelihood codon bias | Evaluated codon usage in 2396 human genes by correcting for background nucleotide content. Although most genes exhibited codon bias, this study found no evidence for selection on codon usage in humans. | Urrutia and Hurst (2001) |
| 2004 | Orderliness of synonymous codon usage | Studied the relationship between codon usage bias and GC composition in bacteria and archaea. Found that the GC content of the third nucleotide of codons affects synonymous codon usage, independent of species. | Wan et al. (2004) |
| 2009 | Relative codon bias | Quantified codon usage in E. coli and yeast without using a reference set of genes, based on the hypothesis that highly expressed genes are strongly biased with regard to codon usage. | Roymondal et al. (2009) and Das et al. (2009) |
| 2010 | Relative codon adaptation index | Based on the codon adaptation index, which uses a set of highly expressed genes as a reference; however, it subtracts background codon usage from the two noncoding reading frames of highly expressed genes. | Lee et al. (2010) |
| 2019 | Codon usage similarity index | Evaluated codon usage preferences against a reference set of genes and normalized the results over the null hypothesis of random codon usage. Analysis of genes from various species suggested this approach might identify differences better than CAI, particularly for human and chicken. | Bourret et al. (2019) |
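Two of the measures in Table 10.1 can be written compactly. Relative synonymous codon usage (RSCU) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The codon adaptation index (CAI) is the geometric mean, over the codons of a gene, of each codon's usage relative to the most used synonymous codon in a reference set. The Python sketch below is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the function names, the toy reference sequence, the `floor` value applied to codons absent from the reference, and the choice to treat amino acids missing from the reference as unbiased are all introduced here for illustration.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families keyed by amino acid (stop codons excluded).
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split a coding sequence (DNA or RNA) into codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu(reference_genes):
    """Observed codon count divided by the count expected under unbiased synonymous usage."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # Assumption: amino acids absent from the reference are treated as unbiased.
            values[c] = counts[c] * len(family) / total if total else 1.0
    return values


def cai(gene, reference_genes, floor=0.01):
    """Geometric mean of codon usage relative to the best synonymous codon in the reference."""
    values = rscu(reference_genes)
    log_weights = []
    for codon in split_codons(gene):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*" or len(FAMILIES[aa]) == 1:
            continue  # skip ambiguous codons, stop codons, Met, and Trp
        best = max(values[c] for c in FAMILIES[aa])
        # Assumption: unseen codons get a small floor so the logarithm is defined.
        log_weights.append(log(max(values[codon] / best, floor)))
    return exp(sum(log_weights) / len(log_weights)) if log_weights else float("nan")


# Toy reference "gene": leucine encoded three times by CTG and once by CTA.
reference = ["CTGCTGCTGCTA"]
print(rscu(reference)["CTG"])    # 4.5, i.e., 3 observed / (4 leucines / 6 synonymous codons)
print(cai("CTGCTG", reference))  # 1.0, only the preferred codon is used
print(cai("CTACTA", reference))  # about one third, the ratio of CTA to CTG usage
```

A real analysis would use a curated reference set of highly expressed genes from the host organism. The single toy sequence here only makes the arithmetic easy to follow.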
In addition to codon usage, many mRNAs contain other features that can limit protein production. The fact that some mRNAs are expressed poorly is not surprising inasmuch as many valuable proteins are only required at low levels in vivo and may cause problems at higher levels, e.g., coagulation Factor VIII (Gouse et al. 2014). In addition, some mRNAs may have evolved features or accumulated mutations that are incompatible with high levels of protein expression. Cryptic splice sites provide such an illustrative example of an mRNA feature that can restrict protein production. Many eukaryotic genes contain introns, which are removed from precursor RNAs in a post-transcriptional process termed splicing. Splicing removes introns and ligates together the remaining segments of mRNA, termed exons. Splicing requires a 5′ splice site (donor site), which contains a conserved GU dinucleotide at the 5′ end of the intron, a branch point sequence in the intron, and a 3′ splice site (acceptor site), which contains a conserved AG dinucleotide at the 3′ end of the intron (exon⋅GU—intron—AG⋅exon) (Wan et al. 2019). Sequences flanking these sites determine the quality of the splice sites. For some mRNAs, splicing can occur between multiple, different splice donor and acceptor sites within an mRNA (Dvinge 2018). In these situations, a precursor mRNA population can give rise to an ensemble of alternatively spliced mature mRNAs, some of which do not encode full-length protein, which effectively lowers production of the encoded protein. Another feature of mRNAs that can affect protein expression is secondary structure. mRNA secondary structures can affect mRNA stability, translation initiation, e.g., by altering initiation codon accessibilities, as well as the movement of ribosomes during the elongation phase of polypeptide synthesis (Nieuwkoop et al. 2020). 
In addition, mRNA secondary structures are known to function as essential components of some IRESes that are required for ribosome recruitment and translation initiation (Lozano et al. 2018).
10.3.2 Codon Optimization Approaches
Codon optimization uses synonymous codons to improve codon usage, remove features of mRNAs that might inhibit protein production, and facilitate gene synthesis and cloning, e.g., by modifying genes to alter restriction sites and optimize oligonucleotide design. A goal of most codon optimization programs is to maximize protein production per unit mRNA over a given time period. Two general assumptions are that protein synthesis is constrained because of negative features in mRNAs and that protein production can be increased by changing codon usage to make translation elongation more efficient. A general feature of codon optimization methods is to avoid rare codons, which is based largely on data from bacteria and yeast. These studies showed that frequently used codons are often found in mRNAs with high levels of protein expression and are correlated with tRNA levels (Mauro and Chappell 2014; Ikemura 1981; Sharp et al. 1986). These findings suggested that rare codons are limiting for protein production, and that production might be enhanced by replacing rare codons with synonymous codons that occur frequently in high expressing mRNAs. In addition to codon bias, different codon optimization approaches consider a range of other variables, which are highlighted in Table 10.2. Most companies that perform gene synthesis offer codon optimization as a service. In general, they consider rare codons and different subsets of the variables discussed in Table 10.2. These companies include ThermoFisher (https://www.thermofisher.com), which developed the GeneArt™ GeneOptimizer™ software discussed above, GenScript (https://www.genscript.com), Eurofins Genomics (https://eurofinsgenomics.com) through their Blue Heron acquisition, IDT (Integrated DNA Technologies) (https://www.idtdna.com), GENEWIZ (https://www.genewiz.com), and ATUM (https://atum.bio), which was formerly DNA2.0.
ATUM’s approach differs from the others because they use machine learning to develop algorithms for different hosts using expression data from sets of systematically modified synthetic gene variants.
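The rare-codon replacement strategy described in this section can be sketched in a few lines. The Python example below is a hedged illustration of "full optimization" only: every codon is replaced by the synonymous codon used most often in a reference set, and the encoded protein is checked to be unchanged. It is not the algorithm of any program or company named in this chapter. The function names, the toy reference sequence, and the tie-breaking rule (the first codon in table order wins) are assumptions made for this example, and the sketch says nothing about whether such changes increase expression in a given host.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def split_codons(seq):
    """Split a coding sequence (DNA or RNA) into codons, dropping a trailing partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def translate(seq):
    """Translate a coding sequence using the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def preferred_codons(reference_genes):
    """Map each amino acid (and stop) to its most frequent codon in the reference set."""
    counts = Counter(c for gene in reference_genes for c in split_codons(gene))
    preferred = {}
    for codon, aa in CODON_TABLE.items():
        # Ties are resolved in favor of the first codon in table order (an arbitrary choice).
        if aa not in preferred or counts[codon] > counts[preferred[aa]]:
            preferred[aa] = codon
    return preferred


def fully_optimize(cds, reference_genes):
    """Replace every codon with the preferred synonymous codon from the reference set."""
    preferred = preferred_codons(reference_genes)
    return "".join(preferred[CODON_TABLE[c]] for c in split_codons(cds))


# Toy reference: Leu is encoded by CTG, Lys mostly by AAG, and the stop codon is TAA.
reference = ["ATGCTGCTGAAGAAGAAATAA"]
original = "ATGTTAAAATGA"  # Met-Leu-Lys-stop, written with codons rare in the reference
optimized = fully_optimize(original, reference)

print(optimized)  # ATGCTGAAGTAA
# Synonymous changes must leave the encoded polypeptide unchanged.
assert translate(optimized) == translate(original)
```

Selective or probabilistic schemes, such as those listed in Table 10.2, differ mainly in which codons they replace and how the replacement is chosen. The translation check at the end applies equally to any of them.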
10.3.3 Widespread Acceptance of Codon Optimization
Codon-optimized mRNAs are used broadly for recombinant protein expression in applications from bioproduction to nucleic acid therapeutics. Despite strong evidence that the foundational principles of codon optimization are unfounded in higher eukaryotes (Mauro and Chappell 2014), codon-optimized mRNAs are often used in an uncritical manner. One potential reason is easy access and the fact that the cost to codon-optimize a gene is often minimal. Indeed, most gene synthesis companies offer codon-optimization as a free service. In addition, numerous programs are available online at no cost. Codon optimization may be inexpensive to access because, in general, there are no performance criteria, i.e., codon-optimized
Table 10.2 Overview of the approaches and variables that have been considered and implemented over time by different codon-optimization programs

| Year | Program | Description | Ref |
|---|---|---|---|
| 1989 | Gene.Design | Part of a set of computer programs developed for generating DNA sequences from amino acid input sequences using user-specified codon usage, e.g., for different host organisms. This program also enables restriction sites to be introduced into coding sequences. | Weiner and Scheraga (1989) |
| 1994 | GMAP | This program is similar to Gene.Design but also redesigns genes for "cassette" mutagenesis. | Raghava and Sahni (1994) |
| 2002 | DNAWorks | Automates optimized oligonucleotide design for gene synthesis and specifies the use of high frequency codons. | Hoover and Lubkowski (2002) |
| 2003 | Codon optimizer | Replaces codons with the most frequently used synonymous codons from a set of highly expressed genes. | Fuglsang (2003) |
| 2003 | Pattern matching algorithm for COa | Alters CpG-motifs to be immuno-stimulatory for DNA vaccines, or immuno-inhibitory or -suppressive (methylated) for gene therapy applications. | Satya et al. (2003) |
| 2004 | UpGene | Uses codons based on their optimal frequency in human genome, removes AU-rich mRNA instability elements, adds a Kozak consensus at the initiation codon, and designs primers for PCR based gene assembly. | Gao et al. (2004) |
| 2005 | JCat (Java codon adaptation tool) | Uses codons with the highest relative adaptiveness for each amino acid based on genomic information of genes with the highest codon bias, removes unwanted cleavage sites, and avoids rho-independent transcription terminators. | Grote et al. (2005) |
| 2005 | GeMS (gene morphing system) | Uses codon harmonization to remove codons below a certain cut-off frequency and adjusts the use of the remaining codons so that high abundance codons are used in proportion to those in the codon usage table. | Jayaraj et al. (2005) |
| 2006 | Gene designer | From DNA2.0, optimizes codon usage based on probabilities obtained from a codon usage table derived from genomic information, while excluding codons below a threshold frequency value. This program also avoids DNA repeats, secondary structures, methylation sites that overlap methylation sensitive restriction sites, splice sites, and other user-defined motifs. | Villalobos et al. (2006) |
| 2006 | GeneDesign | Three optimization approaches that replace each codon with: (1) the synonymous codon having the highest relative usage (full optimization); (2) the most optimal codon that is not the original codon, or (3) the most optimal codon that is most dissimilar to the original, e.g., with transversions at the tRNA wobble position. | Richardson et al. (2006) |

(continued)
Table 10.2 (continued)
Year | Program | Description | Ref
2006 | Synthetic gene designer | Three approaches: (1) full optimization; (2) selective optimization, which only replaces rare-codons, and (3) probabilistic optimization, which allows user to explore strategies between full and selective optimization, using codons based on observed frequencies in a reference set of genes. | Wu et al. (2006)
2007 | OPTIMIZER | Three approaches: (1) full optimization; (2) a guided random method where codons are selected at random based on frequencies of use in a reference set, and (3) a user-specified method to maximize codon use with the fewest changes, starting with rare codons. Can also use mean codon usage of ribosomal proteins or tRNA gene-copy numbers as references. | Puigbo et al. (2007)
2007 | CODA^b | Combines gene synthesis with codon optimization and accounts for codon-pair-usage by avoiding chance occurrences of overrepresented codon pairs. | Hatfield and Roth (2007)
2008 | Codon harmonization | For expression of heterologous genes in E. coli. Tries to identify and maintain regions of slow translation thought to be important for protein folding. | Angov et al. (2008)
2009 | Gene composer | Contains a gene design module that removes cryptic Shine-Dalgarno sequences, introduces out-of-frame stop codons, and removes MazF cleavage sites, which enables single protein production in E. coli strains with an inducible MazF gene. Also contains an alignment viewer to visualize sequence conservation in light of known protein properties, including conformation, as well as ligand, water, and crystal contacts. | Lorimer et al. (2009)
2010 | GeneOptimizer | From GeneArt, a sliding window approach that considers DNA motifs, including restriction sites, promoter recognition sequences, IRESes, and intragenic poly(A) sites. GC content is considered to facilitate PCR, sequencing, gene synthesis, and to avoid genetic instability of constructs. Homologies to reference sequences are also reduced, e.g., for DNA vaccines to avoid recombination between vaccine and wild-type virus, and to generate siRNA resistant genes with reduced homology to the wild-type. | Raab et al. (2010) and Fath et al. (2011)
2012 | Balanced codon usage | Adjusts codon usage to be proportional to the concentrations of cognate tRNAs in yeast. | Qian et al. (2012)
2014 | Condition-specific codon optimization | An approach for yeast that uses codon usage information from different expression conditions, e.g., stationary phase, in order to optimize genes for expression in those growth conditions. | Lanza et al. (2014)
2014 | COOL^c | This approach supports different codon usage parameters, including CAI, individual codon usage, and codon pairing (codon context). | Chin et al. (2014)
(continued)
Table 10.2 (continued)
Year | Program | Description | Ref
2018 | Presyncodon | This program utilizes evolutionary information from 3 microbial expression hosts to predict synonymous codon usage for a heterologous sequence. | Tian et al. (2018)
2019 | CodonWizard | This program considers codon usage in E. coli grown under different culture medium growth conditions, e.g., rich vs minimal medium. | Rehbein et al. (2019)
2019 | COSEM^d | Tries to maximize codon-specific elongation rates by modeling the process of mRNA translation. This approach considers sub-steps of codon-specific elongation, mutual ribosomal exclusion, organism-specific initiation rates, and ribosome drop-off rates. | Trösemeier et al. (2019)
2020 | COSMO^e | Allows removal of user-prescribed forbidden sequence motifs, such as polynucleotide tracts, and includes position-specific optimization of out-of-frame hidden stop codons. | Taneda and Asai (2020)
2020 | Codon optimization with deep learning | Employs machine learning techniques to obtain information on the optimal distribution of codons. | Fu et al. (2020a)
Note that because the more recent approaches tend to contain many of the same features as earlier methods, e.g., avoidance of rare codons, the individual program descriptions are not necessarily exhaustive.
^a Codon Optimization; ^b Computationally Optimized DNA Assembly; ^c Codon Optimization OnLine; ^d Codon-Specific Elongation Model; ^e Codon Optimization Strategy with Multiple Objectives.
genes obtained from gene synthesis companies are generally provided as-is. Performance criteria would require gene synthesis companies to generate both unmodified and codon-optimized variants of a gene and to test expression in a relevant host to determine whether the codon-optimized version expresses more protein than the unmodified sequence. In general, codon optimization presupposes an increase in protein expression. Unfortunately, without performance criteria there is a risk that some codon-optimized variants will express more poorly than the unmodified natural genes.
Besides cost, a second potential reason for the popularity of codon optimization is that it provides the potential for increased protein levels, which is generally viewed as a good thing. The fact that codon optimization can result in higher levels of recombinant protein production supports a “what do we have to lose” approach to expression construct development. For some applications, including industrial enzymes, codon optimization is worth trying as an approach to increase expression/activity because it could reduce the cost of goods with potentially little or no downside. However, for therapeutic applications, increased expression alone should never be the final objective, as proteins produced from codon-optimized mRNAs may be altered in ways that can decrease drug efficacy and harm patients.
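Two of the strategies listed in Table 10.2, full optimization and the guided random method, can be illustrated with a minimal Python sketch. The codon usage values in `USAGE` are invented for illustration only and do not come from any real organism or from the programs in the table; the function names are likewise hypothetical.

```python
import random

# Hypothetical codon usage fractions per amino acid (illustrative values only).
USAGE = {
    "L": {"CUG": 0.50, "CUC": 0.20, "UUA": 0.10, "UUG": 0.10, "CUU": 0.07, "CUA": 0.03},
    "K": {"AAA": 0.75, "AAG": 0.25},
    "R": {"CGU": 0.42, "CGC": 0.38, "AGA": 0.12, "AGG": 0.08},
}

def full_optimization(protein):
    """Use the single most frequent codon for every amino acid."""
    return "".join(max(USAGE[aa], key=USAGE[aa].get) for aa in protein)

def guided_random(protein, rng):
    """Sample each codon at random, weighted by its usage fraction."""
    chosen = []
    for aa in protein:
        codons = list(USAGE[aa])
        weights = [USAGE[aa][c] for c in codons]
        chosen.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(chosen)

print(full_optimization("LKR"))                 # always the same sequence
print(guided_random("LKR", random.Random(1)))   # depends on the random seed
```

The sketch makes one point from the text concrete: the same amino acid sequence yields different mRNA sequences depending on the strategy (and, for the random method, on the draw), so "codon optimized" does not identify a unique sequence.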
10.4 Potential Problems with Codon Optimization for Therapeutics
10.4.1 Overlapping Information
Although different codon optimization methods provide an opportunity for increased protein production, potential issues may arise when codon optimization is used for therapeutic applications. The review of codon optimization approaches above illustrates a complication, which is that codon optimization is not a single approach. Not only is codon usage evaluated using various metrics (Table 10.1), but there are also numerous codon optimization methods that employ different approaches to optimize codon usage (Table 10.2). Different algorithms and gene synthesis companies consider different sets of variables, which yield different codon-optimized variants of the same gene. The fact that codon optimization alters a large proportion of nucleotides in a gene is worrisome because even individual synonymous mutations can have dramatic effects and cause disease (Sauna and Kimchi-Sarfaty 2011; Simhadri et al. 2017; Dhindsa et al. 2020).
A serious complication associated with codon optimization is that besides encoding amino acid sequences, mRNAs often contain other layers of information that can be disrupted by codon optimization. Disrupting this overlapping information can have unintended consequences, for example, altering protein conformation and function, and increasing immunogenicity. These possibilities raise serious and legitimate safety concerns for human use. During codon optimization, some types of overlapping information are considered, including cryptic splice sites, codon context, and mRNA secondary structures. However, it is not clear that all types of overlapping information are fully understood or considered, for example, binding sites for other mRNAs, long non-coding RNAs, and proteins (Banerjee et al. 2020). Synonymous codons themselves comprise an important category of overlapping information because different synonymous codons are translated at different rates.
Synonymous codons are not completely exchangeable for numerous reasons, including the fact that they are decoded by different aa-tRNAs (Mauro and Chappell 2014; Marín et al. 2017). Some synonymous codons are decoded exclusively by cognate aa-tRNAs; some are decoded exclusively by non-cognate aa-tRNAs, and others are decoded by both cognate and non-cognate aa-tRNAs. Interactions between codons and cognate aa-tRNAs involve standard A-U and G-C Watson-Crick base pairing; for each codon, 3 nucleotides base pair to a complementary anti-codon sequence in tRNA. By contrast, non-cognate interactions use wobble base pairing interactions. In these cases, the first 2 nucleotides base pair by Watson-Crick interactions, but the third nucleotide base pairs via a weaker interaction, which includes G-U, U-G, U-I, C-I, and A-I (codon-tRNA) base pairing. Synonymous codon usage can affect the rate of translation because wobble codons are translated at a slower rate than codons that use standard base pairing (Stadler and Fire 2011; Wang et al. 2017). The aa-tRNA levels for different synonymous codons may also vary due to different numbers of gene copies, different regulatory
control, and the effects of specific di-codon pairs (Mauro and Chappell 2014). The pattern of codon usage in natural genes has been suggested to set up an elongation rhythm, which may be important for co-translational folding of polypeptides (Rosenblum et al. 2013; Liu 2020). The role of codon usage in protein folding was studied in E. coli using codon optimized constructs encoding human small GTPases (Konczal et al. 2019). The results showed that the codon optimized constructs expressed more protein than the unmodified native sequence, but that much of the protein was insoluble. The accumulation of insoluble protein appeared to be unrelated to high protein expression levels as the ratio of soluble to insoluble protein was not increased when total protein expression was decreased by either degrading the promoter or the translation start sites. However, when a small number of rare codons were specifically introduced into the codon-optimized mRNA, there was a significant shift of material into the soluble fraction, suggesting that slowing the translation rate increased soluble yields by allowing for more productive co-translational folding. In some cases, mRNAs have evolved sites at which ribosomes slow down or pause, which can provide time for the nascent peptide to fold correctly and prevent alternative protein conformations from forming (Liu 2020). Although ribosome pausing is important for protein folding, there is also evidence that some pausing sites may have evolved for other reasons. For example, in hepatitis C virus, ribosome pausing at rare, inefficient wobble codons was shown to be required for viral RNA replication, but not for protein synthesis (Gerresheim et al. 2020).
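The distinction drawn in this section between standard (Watson-Crick) decoding and wobble decoding at the third codon position can be written out as a small Python sketch. The pairing sets follow the pairs listed in the text (codon base first, tRNA base second, with I standing for inosine); the function name and the example anticodons are illustrative assumptions.

```python
# Watson-Crick pairs, written as (codon base, anticodon base).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# Wobble pairs at the third codon position, as listed in the text (codon-tRNA).
WOBBLE = {("G", "U"), ("U", "G"), ("U", "I"), ("C", "I"), ("A", "I")}

def classify_decoding(codon, anticodon):
    """Classify how an anticodon (5'->3') reads a codon (5'->3')."""
    # The strands pair antiparallel: codon positions 1, 2, 3 face
    # anticodon positions 3, 2, 1.
    first_two = [(codon[0], anticodon[2]), (codon[1], anticodon[1])]
    if not all(pair in WATSON_CRICK for pair in first_two):
        return "no pairing"
    third = (codon[2], anticodon[0])
    if third in WATSON_CRICK:
        return "watson-crick"
    if third in WOBBLE:
        return "wobble"
    return "no pairing"

# One hypothetical inosine-containing anticodon reading several alanine codons.
for codon in ("GCU", "GCC", "GCA", "GCG"):
    print(codon, classify_decoding(codon, "IGC"))
```

Under these rules a single anticodon can read several synonymous codons, but only by wobble pairing, which is the type of interaction the text associates with slower translation.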
10.4.2 Questionable Assumptions in Higher Eukaryotes
A primary approach used by most codon optimization programs involves replacing rare codons with more frequently used codons to increase protein production. Rare codons are typically defined as underrepresented in coding regions at the genomic level. The underlying assumption is that rare codons decrease protein production by slowing translation elongation, which increases the total time required to synthesize a polypeptide. However, a critical analysis of this approach has revealed numerous problems (Mauro and Chappell 2014; Sharma et al. 2018), which are discussed further below.
Evidence that rare codons are limiting for protein production is based largely on studies in bacteria and yeast, where highly expressed genes tend to contain more frequently used codons. In addition, studies in prokaryotes have suggested that codon usage became progressively more adapted to tRNA pools and that tRNAs for frequently used codons are present at higher levels, indicating that translation of these codons is not restricted (López et al. 2020). These observations support the notion that protein levels for highly expressed prokaryotic genes are determined in part by higher rates of polypeptide synthesis (reviewed in Mauro 2018). However, even in prokaryotes, the situation is more complex. For example, in E. coli it was found that high levels of protein expression were correlated with a set of codons that
are decoded by tRNAs that are the most highly charged during amino acid starvation, not those that are most frequently used in highly expressed proteins (Welch et al. 2009b). In addition, a study in Saccharomyces cerevisiae found that preferentially used codons were not translated faster than rare codons, but that codon usage was proportional to cognate tRNA concentrations (Qian et al. 2012). Comparable results were also found in other eukaryotes examined, including fruit flies, mice, and humans.
Based on codon usage of highly expressed genes in microbes, most codon optimization approaches avoid or replace rare codons. However, a problem arises in higher eukaryotes, where there is little evidence that highly expressed mRNAs are optimized for translation efficiency (Mauro and Chappell 2014). For higher eukaryotes, the largest variable affecting codon usage appears to be genomic nucleotide composition, as seen in GC-rich chromosomal regions known as isochores (Pouyet et al. 2017). Furthermore, in mammals, ribosome profiling studies have shown that elongation rates are largely independent of codon usage, with no evidence of pausing at rare codons (Ingolia et al. 2011; Park et al. 2017).
Another issue with measuring codon usage, particularly in higher eukaryotes, is that it is typically defined at the genomic level, and does not consider variables that might affect the codon composition of cells, including the expression levels of various mRNAs in different cell types (Athey et al. 2017; Kames et al. 2020). Whether particular codons are limiting for expression also depends on the number of active tRNA genes and the number of different aa-tRNAs (cognate and wobble) available for different codons. These and other variables affect codon usage, which in higher eukaryotes varies even between different cell types, but they are not reflected at the gene level.
It is important to recognize that for higher eukaryotes, even when codon optimization leads to increased protein expression, there is little evidence that increased expression is due to more rapid elongation of the polypeptide chain. In some cases, codon optimization has been shown to enhance protein production by elevating mRNA levels, not by increasing translation efficiency. Several studies have demonstrated that codon optimization results in increased transcription, which appears to be related to increased GC content (Kudla et al. 2006; Krinner et al. 2014; Zhou et al. 2016; Newman et al. 2016; Alexaki et al. 2019). A natural example consistent with this phenomenon comes from two members of the RAS family of GTPases: KRAS and HRAS. Although these proteins are highly homologous at the amino acid level, they differ dramatically in codon usage, and it was demonstrated that these differences affected protein expression by altering both transcription and translation in a cell-type specific manner (Fu et al. 2018). Although codon optimization of the KRAS gene increased its expression, the conformation of the protein was altered, leading the authors to suggest that codon optimization changed the co-translational folding of this protein.
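Because the studies cited above relate increased transcription to increased GC content, a simple first check on any codon-optimized variant is to compare its overall GC content and its GC content at third codon positions (often called GC3) with those of the native sequence. The following Python sketch assumes an in-frame coding sequence; the two short sequences are invented and both encode Met-Lys-Leu-Arg.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    return gc_fraction(cds.upper()[2::3])

# Toy sequences (invented for illustration); both encode Met-Lys-Leu-Arg.
native = "AUGAAAUUAAGA"
recoded = "AUGAAGCUGCGC"

for name, seq in (("native", native), ("recoded", recoded)):
    print(name, round(gc_fraction(seq), 2), round(gc3_fraction(seq), 2))
```

A large shift in either value indicates that the recoded construct differs from the native one in nucleotide composition, not only in codon frequency, which is relevant when interpreting any change in expression.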
10.4.3 Disruption of Protein Conformation
In microbes, various studies have shown that clusters of rare codons and other factors including low abundance aa-tRNAs and specific bi-codons can slow elongation or cause ribosome pausing, which may be important for correct protein folding (Rodnina 2016; Samatova et al. 2020). Similar correlations have been more difficult to establish in higher eukaryotes, but there is some evidence supporting the effects of codon usage on protein structure (Liu 2020; Liu et al. 2021). In addition, other features of mRNA primary sequences may achieve similar effects, for instance, mRNA secondary structures, interactions of mRNA sequences with ribosomes, folding catalysts, chaperones (Yang 2017; Komar 2018), and miRNAs (Sako et al. 2020). It is expected that disruption of these types of features by codon optimization may also disrupt co-translational protein folding.
In Neurospora crassa it was shown that codon optimization of the Cross Pathway Control Protein 1 (CPC-1) gene increased CPC-1 protein expression but changed its degradation rate and disrupted its function. These effects appeared to be due to conformational alterations, which were determined by altered sensitivity in a limited trypsin digestion assay (Lyu and Liu 2020). The effects of codon optimization on protein conformation were also reported in a study expressing recombinant vasoinhibin, which is normally derived by proteolytic cleavage of the hormone prolactin (Moreno-Carranza et al. 2019). These studies demonstrated that codon optimization of prolactin cDNA enhanced both mRNA and protein levels, although to different extents, in a human embryonic kidney cell line. By contrast, codon optimization of a cDNA encoding only vasoinhibin increased mRNA levels six-fold, but did not change intracellular vasoinhibin protein levels, and actually decreased its secretion.
The elevated mRNA levels did not lead to increased protein yields because codon optimized vasoinhibin activated the unfolded protein response and endoplasmic reticulum-associated degradation. These events indicated that the conformation of vasoinhibin was disrupted by codon optimization of its mRNA.
The effects of codon optimization on both protein conformation and functional activity were analyzed in a comprehensive investigation of coagulation factor IX (Alexaki et al. 2019). Codon optimization increased expression of this protein by up to approximately five-fold in HEK293T cells; however, the increase was attributable to higher levels of factor IX mRNA, not increased translation efficiency. To investigate potential effects of codon optimization on factor IX protein conformation, studies were performed using protein purified from clonal cell lines that expressed either wild-type or codon optimized factor IX at similar levels. The wild-type and codon optimized variants had comparable levels of specific activity, but differences were observed using various techniques, including antibody-mediated inhibition of activity, aptamer binding, and limited proteolysis. Finally, ribosome profiling studies on these factor IX mRNAs showed that codon optimization altered the ribosomal distribution pattern compared to that of the wild-type transcript, supporting the notion
that codon optimization altered local rates of translation and affected factor IX protein folding.
10.4.4 Immunogenicity
Protein drugs should be inherently safer than chemical drugs because they are often based on natural proteins, e.g., monoclonal antibodies, or used to supplement proteins that are already expressed in the body, including insulin, growth factors, and various blood factors. However, as described above, therapeutic proteins can differ from native proteins for a variety of reasons, which can lead to immunogenicity and safety issues. Factors that can cause recombinant proteins to differ from native proteins include alterations in protein conformation and differences in post-translational modifications (Fu et al. 2020b). Conformational changes can be caused by codon optimization or introduced, e.g., during the purification process (Majorek et al. 2014). Differences in post-translational modifications may occur, for example, when recombinant proteins are expressed in non-human cell lines or transgenic organisms (Fu et al. 2020b). Genetic variation is another potential source of variation that can cause immunogenicity. For example, first generation monoclonal antibodies were derived from mouse genes and were highly immunogenic; however, newer humanized and fully human antibodies are less immunogenic (Fu et al. 2020b).
An additional concern with codon optimization is that it can lead to conformational alterations of protein drugs that trigger the production of anti-drug antibodies (ADAs). ADAs can decrease a drug’s effectiveness over time by either neutralizing its activity or decreasing its concentration (Fu et al. 2020b; Kromminga and Schellekens 2005; Sauna et al. 2018; Faraji et al. 2018; Tourdot et al. 2020). In addition, ADAs can sometimes cause adverse events, most commonly hypersensitivity reactions that include anaphylactoid reactions (Garcês and Demengeot 2018).
Moreover, ADAs that recognize both the therapeutic and endogenous protein may potentially neutralize or decrease levels of one or both proteins, which might be lethal if the protein is essential (Pratt 2018). Examples include recombinant interferon-β, which triggered ADAs that recognized both the recombinant and native interferon-β, causing the endogenous protein to be partially neutralized (Sethu et al. 2013). In another example, pegylated recombinant human megakaryocyte growth and development factor (PEG-rHuMGDF) triggered ADAs in healthy volunteers that cross-reacted with endogenous thrombopoietin and caused thrombocytopenia, characterized by low platelet counts (Li et al. 2001). In a third example, recombinant erythropoietin (Epo) triggered ADAs that recognized native Epo and caused some patients to develop pure red-cell aplasia, a type of anemia in which the bone marrow no longer produces red blood cells (Casadevall et al. 2002; Cournoyer et al. 2004). Although it was not reported whether the recombinant proteins in these examples were codon optimized, they illustrate the types of problems that can arise if the structure of a therapeutic protein is altered, which can occur through codon optimization.
10.5 Potential Benefits of Codon Optimization
As discussed above, synonymous codon mutations introduced during codon optimization can alter the conformation and activity of a recombinant protein. However, these changes can sometimes be exploited to improve or otherwise modify protein function (see Mauro 2018). A recent study in yeast screened 27 fluorescent proteins in vivo for brightness, photostability, photochromicity, and pH-sensitivity (Botman et al. 2019). The best performing proteins based on brightness and other criteria were found to express poorly. Six of these genes were codon optimized to increase expression and, although the expression of the proteins was increased, brightness was altered. Three of the proteins showed a significant increase in brightness, while the other 3 had decreased brightness, up to ≈4 times less bright. The increased brightness of 3 of the fluorescent proteins after codon optimization highlights the potential of codon optimization to improve protein function. Although the authors did not directly demonstrate that changes in brightness were caused by changes in protein conformation, it is likely to be the case as fluorescent proteins are known to be sensitive to conformational alterations (Chudakov et al. 2010).
Codon optimization can also be exploited in situations where expressing a gene in a heterologous host causes problems because of differences in codon bias between the organisms. For example, attempts to express the human RioK2 gene in E. coli led to the production of a truncated protein (Correddu et al. 2020). This problem was caused by the occurrence of an AGG-AGA codon pair in the human gene, which caused a premature stop that appeared to be the result of a −1 frameshift. The AGG and AGA codons, which both encode arginine, are the rarest codons in E. coli.
The authors were able to overcome this translational problem by several approaches, including codon optimization, codon harmonization, and by replacing the AGG-AGA codon pair with more frequently used arginine codons. However, the complexity of codon usage was suggested by the finding that the AGG-AGA consecutive codons could be translated correctly when they were present in a codon harmonized sequence. For heterologous expression, e.g., of human genes in prokaryotic hosts, greater versatility between expression systems can be obtained by having increased confidence in pause sites, which can be obtained experimentally by ribosome profiling (Mauro and Chappell 2018). For example, ribosome profiling of an unmodified human gene can be performed in a human cell line to identify pause sites, which can potentially be recreated in a heterologous host, such as E. coli, by using rare codons, and confirmed in this host by ribosome profiling.
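The targeted fix described for RioK2, replacing an in-frame AGG-AGA codon pair with more frequently used arginine codons, can be sketched in Python as follows. This is an illustration, not the procedure used in the cited study: the function names are hypothetical, the replacement codon CGU is an assumed default chosen here as a commonly used E. coli arginine codon, and the example sequences are invented.

```python
PROBLEM_PAIR = ("AGG", "AGA")  # consecutive rare arginine codons named in the text

def split_codons(cds):
    """Split an in-frame coding sequence into codons (a short tail is kept as-is)."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]

def find_problem_pairs(cds):
    """Return codon indices at which an in-frame AGG-AGA pair begins."""
    codons = split_codons(cds)
    return [i for i in range(len(codons) - 1)
            if (codons[i], codons[i + 1]) == PROBLEM_PAIR]

def replace_problem_pairs(cds, replacement="CGU"):
    """Swap both codons of each AGG-AGA pair for the replacement arginine codon.

    The default replacement codon is an assumption made for this sketch.
    """
    codons = split_codons(cds)
    for i in find_problem_pairs(cds):
        codons[i] = codons[i + 1] = replacement
    return "".join(codons)

print(find_problem_pairs("AUGAGGAGAGCUUAA"))     # pair starts at codon index 1
print(replace_problem_pairs("AUGAGGAGAGCUUAA"))
```

The scan works codon by codon, so an AGGAGA hexamer that straddles the reading frame is not flagged. As the text notes, sequence context also matters, so a scan of this kind can only nominate candidate sites for experimental testing.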
10.6 Potential Risks of Codon Optimization for Gene Therapy and mRNA Therapeutics
In addition to the issues discussed above regarding limitations of codon optimization when used to produce therapeutic proteins, there are additional concerns for in vivo applications including gene therapy and mRNA therapeutics. For example, codon-optimized constructs have the potential to produce novel out-of-frame peptides and proteins (Fig. 10.2; Mauro and Chappell 2018). For therapeutic proteins produced in a bioreactor, these cryptic products are removed during purification. However, for gene therapy and mRNA therapeutic applications, these novel products, which are produced in the patient, cannot be removed and may have unanticipated effects.
mRNA therapies may provide important benefits over DNA or protein-based therapeutics (Martini and Guey 2019; Huang et al. 2020; Linares-Fernández et al. 2020); however, they present additional complications regarding codon usage. A problem with mRNA constructs is that mRNA itself can trigger an immune response that involves induction of the type I interferon pathway (Linares-Fernández et al. 2020). This induction can lead to decreased translation and mRNA decay, resulting in lower levels of protein production. An important component of immune activation by in vitro-transcribed mRNA was shown to be double stranded RNA (Karikó et al. 2011). For some applications, such as vaccines, this inherent adjuvant property of mRNA is useful. By contrast, for therapeutic proteins, activation of the immune system by mRNA transcripts is undesirable. Approaches that have been shown to reduce immune activation include the use of modified nucleotides, including pseudouridine, 2-thiouridine, 5-methyluridine, 5-methylcytidine, N6-methyladenosine, and N1-methyl-pseudouridine (Sahin et al. 2014; Nelson et al. 2020).
Alternative approaches to evade the immune system involve altering the codon composition of mRNAs to reduce the number of uridine residues, e.g., by using the most GC rich codons for each amino acid (Thess et al. 2015). mRNA constructs containing modified nucleotides are frequently codon optimized; therefore, these constructs, and those with altered nucleotide compositions, face the same potential problems as codon optimized DNA constructs. Moreover, for mRNA constructs with modified nucleotides, an additional conceivable issue, which will not be explored here, is whether the modified nucleotides themselves alter elongation and affect protein folding.
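The uridine-reduction strategy mentioned above, choosing the most GC-rich synonymous codon for each amino acid, can be sketched in Python using the standard genetic code. This is a minimal illustration under stated assumptions, not the method of the cited study: ties between equally GC-rich codons are broken arbitrarily (first in table order), stop codons are left unchanged, and the input is assumed to be an uppercase, in-frame RNA coding sequence. The example sequence is invented.

```python
from itertools import product

# Standard genetic code, written in the conventional UCAG table order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def gc_count(codon):
    return sum(base in "GC" for base in codon)

# For each amino acid, keep the synonymous codon with the most G/C bases.
# Ties are broken arbitrarily: the first codon in table order wins.
GC_RICHEST = {}
for codon, aa in CODON_TO_AA.items():
    best = GC_RICHEST.get(aa)
    if best is None or gc_count(codon) > gc_count(best):
        GC_RICHEST[aa] = codon

def translate(cds):
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))

def recode_gc_rich(cds):
    """Recode an in-frame RNA coding sequence; stop codons are left unchanged."""
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TO_AA[codon]
        out.append(codon if aa == "*" else GC_RICHEST[aa])
    return "".join(out)

native = "AUGUUUAAAGCUUAA"          # invented example: Met-Phe-Lys-Ala-stop
recoded = recode_gc_rich(native)
print(recoded, native.count("U"), recoded.count("U"))
```

The protein sequence is unchanged while the uridine count falls. Every recoded position is nevertheless a synonymous substitution of the kind discussed throughout this chapter, so the same cautions apply.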
Fig. 10.2 Codon optimization of human EPO mRNA. (a) Partial sequence of human EPO mRNA encoding the amino-terminal amino acids is shown. The initiation codon AUG is indicated by the dark blue arrow and underlined. The native mRNA sequence contains numerous potential alternative translation start sites (indicated by arrows and underlined), including both canonical (AUG) and non-canonical (CUG, GUG, ACG, UUG) sites. Light blue arrows point to potential start sites that are in-frame with the EPO coding sequence. The red and green arrows point to potential start sites in reading frames 2 and 3, respectively. Truncated peptide sequences arising from these start sites are indicated schematically. In-frame peptides are shown in blue and out-of-frame peptides in red and green, corresponding to the arrows. Amino acid sequences for the peptide from reading frame 2 and the longest peptide from reading frame 3 are shown in single-letter amino acid code. Sequences for the shorter polypeptides are not repeated and correspond to those of the longer sequences in each case. (b) Codon-optimized variant of human EPO mRNA. This codon-optimized sequence was obtained from a patent for modified therapeutic mRNAs (United States Patent 10772975). Note that codon optimization altered the alternative start sites compared to the native sequence, such that there are 2 new in-frame peptides, indicated by *. In addition, there is one disrupted site. The peptide sequences from reading frames 2 and 3 are all novel and are indicated by **.
10.7 Alternative Technologies for Increased Protein Production
The problem of poor protein expression can be addressed at different levels, for example, at the level of transcription to increase mRNA levels. For DNA constructs, various approaches have been used to increase recombinant mRNA levels, including the use of synthetic promoters, e.g., for high expression in specific cell types (Liu et al. 2019; Wu et al. 2019; Jüttner et al. 2019; Johari et al. 2019). For bioproduction in mammalian cell culture, increased numbers of gene copies have also been used to increase expression (Lee et al. 2020). However, increasing mRNA levels has limitations and can sometimes cause other problems, for example, by diverting resources away from routine activities within cells, which can affect cell growth in culture (Mauro 2018; Fomina-Yadlin et al. 2015). For gene therapy constructs, e.g., in
adeno-associated virus (AAV) vectors, recombinant mRNA levels can be increased by increasing the vector dose, but this approach is limited by potential dose-dependent toxicities (Wilson and Flotte 2020; Ronzitti et al. 2020).
Codon optimization targets the elongation phase of translation to increase protein production; however, translation can also be enhanced by targeting the initiation phase of translation. Different mRNAs initiate translation with different relative efficiencies, for example, some virus mRNAs can efficiently out-compete cellular mRNAs when cap-dependent translation is inhibited during infection (Kaufmann et al. 1976; Fernández-García et al. 2020). The translation efficiency of mRNAs is determined in large part by 5′ leader sequences and the nucleotide context of the initiation codon (Hernández et al. 2019; Akirtava and McManus 2020). Some 5′ leaders contain sequence elements that can facilitate ribosomal recruitment and translation initiation. So-called translation enhancer elements (TEEs) have been identified in both natural mRNAs and selection studies (Chappell et al. 2000; Wellensiek et al. 2013). Synthetic 5′ leaders containing TEEs can increase the translation efficiency of mRNAs by increasing the amount of protein produced per unit mRNA, an approach that avoids the limitations associated with high mRNA levels. It is expected that this type of approach is unlikely to affect the structure or function of the encoded protein as the coding sequences are unaltered.
In our own studies at Promosome LLC, we have found that alternative translation start sites in the coding regions of genes, proximal to the initiation codon, can have dramatic negative effects on levels of protein expression. Alternative start sites include both in-frame and out-of-frame AUG codons, as well as non-canonical codons, including CUG, ACG, and GUG (Fig. 10.2; Peabody 1989; Xu and Zhang 2020).
Promiscuous initiation of this type appears to be widespread and is most strongly supported by high-throughput ribosome profiling studies that have mapped translation initiation sites in mRNAs. Synonymous mutations that disrupt these alternative start sites can often increase protein expression, apparently by minimizing competition with the authentic initiation codon. Restricting synonymous mutations to the signal peptide leaves the region encoding the mature protein unaltered, minimizing or eliminating any conformational alterations to the mature protein.
Another segment of the mRNA that has not yet received extensive attention, but that provides an additional avenue for mRNA optimization, is the 3′ untranslated region (UTR). It is known that 3′ UTRs can affect the expression and function of the encoded protein and contain specific sequences that can alter mRNA stability, localization, and translation (reviewed in Mayr 2017; Mayr 2019). One of the major regulatory mechanisms mediated by 3′ UTRs involves miRNAs, which bind to target sites frequently found in 3′ UTRs (Bartel 2018). In general, binding of miRNAs to target sites in 3′ UTRs represses expression, predominantly through mRNA destabilization and translational repression.
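To make the idea of miRNA target sites in a 3′ UTR concrete, the sketch below searches a UTR for perfect matches to the reverse complement of a miRNA seed. It assumes the commonly used convention that the seed spans miRNA nucleotides 2 to 8; the function names and both sequences are invented for illustration, and real target prediction considers considerably more than a perfect seed match.

```python
# Minimal seed-match search. The seed definition (miRNA nucleotides 2-8) is a
# common convention assumed here; all sequences are made up.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def seed_match_sites(utr, mirna, seed_start=2, seed_end=8):
    """Return 0-based UTR positions that pair perfectly with the miRNA seed.

    `seed_start` and `seed_end` are 1-based, inclusive positions in the miRNA.
    Overlapping matches are reported.
    """
    utr = utr.upper().replace("T", "U")
    mirna = mirna.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement_rna(seed)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)
    return positions


# Hypothetical miRNA and 3' UTR fragment, for demonstration only.
mirna = "UACGGAUCCAUGGCAUUCGAGC"
utr = "AAGAUCCGUUUCCGAUCCGUAA"
print(seed_match_sites(utr, mirna))
```

The same search applied to a coding region before and after synonymous recoding would show whether a seed-match site has been created or removed.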
V. P. Mauro
10.8 Recommendations for Therapeutic Applications
Codon optimization is an approach used to increase protein production that involves extensive mutation of mRNA coding regions. Some modifications, such as removing cryptic splice sites, can be easily verified experimentally, e.g., by performing reverse transcriptase-PCR reactions (Bachman 2013; Zhang et al. 2020). However, the notion that rare codons are limiting for protein production, which is a basic premise of most codon optimization approaches, is not supported experimentally in higher eukaryotes (Mauro and Chappell 2014). Nevertheless, codon optimization can result in increased protein production, presumably by different mechanisms. As mentioned above, several studies have shown that codon optimization can increase protein expression by increasing the rate of transcription, without altering translation efficiency (Kudla et al. 2006; Krinner et al. 2014; Zhou et al. 2016; Newman et al. 2016; Alexaki et al. 2019). The inadvertent disruption of microRNA seed-matching sequences that would otherwise decrease mRNA stability or translation efficiency is another likely effect of codon optimization and is expected to increase protein expression.
For some therapeutic proteins, the ability to increase protein levels is essential, e.g., to enable commercial viability of a protein drug produced in a bioreactor. For in vivo applications, higher protein expression levels could also make lower doses possible, which is usually desirable to minimize side effects. In addition, lower doses in AAV gene therapy would minimize toxicity associated with the AAV vector (Wilson and Flotte 2020; Ronzitti et al. 2020). However, in most cases, the effects of codon optimization are not assessed experimentally, and a codon-optimized variant is used as the default.
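Before any experimental comparison, the extent of sequence change introduced by codon optimization can be quantified directly. The following sketch, which assumes the standard genetic code and two coding sequences of equal length, checks that an optimized sequence still encodes the same protein and reports the fraction of codons and nucleotides that differ. The function names and the toy sequences are hypothetical, and the check says nothing about expression, folding, or activity.

```python
# Compare a native and a codon-optimized coding sequence (standard genetic
# code assumed). Sequences below are toy examples, not real genes.
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons are generated in TCAG order, matching the amino acid string above.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence (DNA or RNA alphabet) into one-letter codes."""
    cds = cds.upper().replace("U", "T")
    if len(cds) == 0 or len(cds) % 3:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def compare_variants(native, optimized):
    """Summarize differences between two coding sequences of equal length."""
    native = native.upper().replace("U", "T")
    optimized = optimized.upper().replace("U", "T")
    if len(native) != len(optimized):
        raise ValueError("sequences must have the same length")
    same_protein = translate(native) == translate(optimized)
    n_codons = len(native) // 3
    codons_changed = sum(
        native[i:i + 3] != optimized[i:i + 3] for i in range(0, len(native), 3)
    )
    nucleotides_changed = sum(a != b for a, b in zip(native, optimized))
    return {
        "synonymous": same_protein,
        "codon_fraction_changed": codons_changed / n_codons,
        "nucleotide_fraction_changed": nucleotides_changed / len(native),
    }


native = "ATGGCTCTTAGATAA"     # toy sequence encoding M-A-L-R-stop
optimized = "ATGGCCCTGCGGTAA"  # same protein, different codons
print(compare_variants(native, optimized))
```

A report of this kind only documents how far the optimized sequence departs from the native one; whether those changes matter still has to be established experimentally.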
Given that even one synonymous mutation can have dramatic effects on mRNA splicing, secondary structure, stability, translation, protein conformation, and activity (Simhadri et al. 2017; Katneni et al. 2019; Hunt et al. 2019), there are serious concerns about whether modifying up to ≈80% of the nucleotides in an mRNA will yield a recombinant protein identical to the native protein. An important consideration with regard to codon optimization is the protein itself: although some proteins may adopt their correct structure regardless of codon usage, in many cases important co-translational processes depend on codon usage (Liu 2020; Liu et al. 2021). It is clear that codon optimization can alter protein conformation and affect activity. For therapeutic proteins, these effects can lead to increased immunogenicity, with decreased efficacy and safety. It is therefore recommended that recombinant proteins intended for use in patients meet the following criteria if they are expressed from constructs that are codon optimized or contain synonymous codon mutations, as occur, for example, in some mRNA therapeutic constructs (Thess et al. 2015):
1. Protein expression from a codon-optimized gene (or mRNA) should be compared with that of the unmodified natural gene sequence to establish that codon optimization increases expression. Using a codon-optimized variant of a therapeutic gene without first establishing that the synonymous mutations increase protein production assumes unnecessary risk. If protein expression is enhanced, potential effects of codon optimization on activity and protein conformation should then be assessed.
2. The effects of codon optimization on protein function should be evaluated to determine whether codon optimization has altered the activity of the encoded protein, which might indicate disrupted protein conformation or post-translational processing.
3. Efforts should be undertaken to evaluate the effects of codon optimization on protein conformation (Bali and Bebok 2015), because an altered conformation may cause a protein to become immunogenic. Numerous methods may be useful for comparing purified proteins from codon-optimized and unmodified constructs (e.g., Alexaki et al. 2019; Chance et al. 2020; Haridhasapavalan et al. 2020). Antibody-binding and aptamer-binding assays are also sensitive to conformational differences (Goldberg 1991; Wildner et al. 2019; Jankowski et al. 2020).
For in vivo applications, including gene therapy and mRNA therapeutics, it would be preferable to compare protein structures from codon-modified and natural constructs in vivo. Although there have been some advances in this area, including in-cell NMR and FRET-based approaches, these methods are not yet well established (Okamoto and Sako 2017; Ikeya et al. 2019). However, a suitable proxy for assessing protein conformation in vivo is ribosome profiling, which is a more accessible approach and can be performed in cell or animal model systems (Ingolia et al. 2019).
Acknowledgements I thank Dr. Stephen Chappell for valuable comments and critical reading of the chapter.
References
Akirtava C, McManus CJ (2020) Control of translation by eukaryotic mRNA transcript leaders – Insights from high-throughput assays and computational modeling. Wiley Interdiscip Rev RNA:e1623 Alexaki A et al (2019) Effects of codon optimization on coagulation factor IX translation and structure: implications for protein and gene therapies. Sci Rep 9(1):15449 Angov E et al (2008) Heterologous protein expression is enhanced by harmonizing the codon usage frequencies of the target gene with those of the expression host. PLoS One 3(5):e2189 Athey J et al (2017) A new and updated resource for codon usage tables. BMC Bioinformatics 18(1):391 Bachman J (2013) Reverse-transcription PCR (RT-PCR). Methods Enzymol 530:67–74 Bali V, Bebok Z (2015) Decoding mechanisms by which silent codon changes influence protein biogenesis and function. Int J Biochem Cell Biol 64:58–74 Banerjee S, Kalyani Yabalooru SR, Karunagaran D (2020) Identification of mRNA and non-coding RNA hubs using network analysis in organ tropism regulated triple negative breast cancer metastasis. Comput Biol Med 127:104076 Bartel DP (2018) Metazoan microRNAs. Cell 173(1):20–51 Bennetzen JL, Hall BD (1982) Codon selection in yeast. J Biol Chem 257(6):3026–3031
Botman D et al (2019) In vivo characterisation of fluorescent proteins in budding yeast. Sci Rep 9(1):2234 Bourret J, Alizon S, Bravo IG (2019) COUSIN (COdon Usage Similarity INdex): a normalized measure of codon usage preferences. Genome Biol Evol 11(12):3523–3528 Casadevall N et al (2002) Pure red-cell aplasia and antierythropoietin antibodies in patients treated with recombinant erythropoietin. N Engl J Med 346(7):469–475 Chance MR et al (2020) Protein footprinting: auxiliary engine to power the structural biology revolution. J Mol Biol 432(9):2973–2984 Chappell SA, Edelman GM, Mauro VP (2000) A 9-nt segment of a cellular mRNA can function as an internal ribosome entry site (IRES) and when present in linked multiple copies greatly enhances IRES activity. Proc Natl Acad Sci U S A 97:1536–1541 Chappell SA, Edelman GM, Mauro VP (2006a) Ribosomal tethering and clustering as mechanisms for translation initiation. Proc Natl Acad Sci U S A 103(48):18077–18082 Chappell SA et al (2006b) Ribosomal shunting mediated by a translational enhancer element that base pairs to 18S rRNA. Proc Natl Acad Sci U S A 103(25):9488–9493 Chin JX, Chung BK, Lee DY (2014) Codon Optimization OnLine (COOL): a web-based multi- objective optimization platform for synthetic gene design. Bioinformatics 30(15):2210–2212 Chudakov DM et al (2010) Fluorescent proteins and their applications in imaging living cells and tissues. Physiol Rev 90(3):1103–1163 Correddu D et al (2020) Effect of consecutive rare codons on the recombinant production of human proteins in Escherichia coli. IUBMB Life 72(2):266–274 Cournoyer D et al (2004) Anti-erythropoietin antibody-mediated pure red cell aplasia after treatment with recombinant erythropoietin products: recommendations for minimization of risk. J Am Soc Nephrol 15(10):2728–2734 Das S, Roymondal U, Sahoo S (2009) Analyzing gene expression from relative codon usage bias in yeast genome: a statistical significance and biological relevance. 
Gene 443(1–2):121–131 Delcourt V et al (2018) Small proteins encoded by unannotated ORFs are rising stars of the proteome, confirming shortcomings in genome annotations and current vision of an mRNA. Proteomics 18(10):e1700058 Dever TE, Dinman JD, Green R (2018) Translation elongation and recoding in eukaryotes. Cold Spring Harb Perspect Biol 10(8) Dhindsa RS et al (2020) Natural selection shapes codon usage in the human genome. Am J Hum Genet 107(1):83–95 Dresios J et al (2006) An mRNA-rRNA base-pairing mechanism for translation initiation in eukaryotes. Nat Struct Mol Biol 13(1):30–34 Dvinge H (2018) Regulation of alternative mRNA splicing: old players and new perspectives. FEBS Lett 592(17):2987–3006 Faraji F et al (2018) Challenges related to the immunogenicity of parenteral recombinant proteins: underlying mechanisms and new approaches to overcome it. Int Rev Immunol 37(6):301–315 Fath S et al (2011) Multiparameter RNA and codon optimization: a standardized tool to assess and enhance autologous mammalian gene expression. PLoS One 6(3):e17596 Fernández-García L et al (2020) The internal ribosome entry site of the dengue virus mRNA is active when cap-dependent translation initiation is inhibited. J Virol Fomina-Yadlin D et al (2015) Transcriptome analysis of a CHO cell line expressing a recombinant therapeutic protein treated with inducers of protein expression. J Biotechnol 212:106–115 Freire-Picos MA et al (1994) Codon usage in Kluyveromyces lactis and in yeast cytochrome c-encoding genes. Gene 139(1):43–49 Fu J et al (2018) Codon usage regulates human KRAS expression at both transcriptional and translational levels. J Biol Chem 293(46):17929–17940 Fu H et al (2020a) Codon optimization with deep learning to enhance protein expression. Sci Rep 10(1):17617 Fu K et al (2020b) Immunogenicity of protein therapeutics: a lymph node perspective. Front Immunol 11:791
Fuglsang A (2003) Codon optimizer: a freeware tool for codon optimization. Protein Expr Purif 31(2):247–249 Gao W et al (2004) UpGene: application of a web-based DNA codon optimization algorithm. Biotechnol Prog 20(2):443–448 Garcês S, Demengeot J (2018) The immunogenicity of biologic therapies. Curr Probl Dermatol 53:37–48 Gerresheim GK et al (2020) Ribosome pausing at inefficient codons at the end of the replicase coding region is important for hepatitis C virus genome replication. Int J Mol Sci 21(18) Godet AC et al (2019) IRES trans-acting factors, key actors of the stress response. Int J Mol Sci 20(4) Goldberg ME (1991) Investigating protein conformation, dynamics and folding with monoclonal antibodies. Trends Biochem Sci 16(10):358–362 Gouse BM et al (2014) New thrombotic events in ischemic stroke patients with elevated factor VIII. Thrombosis 2014:302861 Gribskov M, Devereux J, Burgess RR (1984) The codon preference plot: graphic analysis of protein coding sequences and prediction of gene expression. Nucleic Acids Res 12(1 Pt 2):539–549 Grote A et al (2005) JCat: a novel tool to adapt codon usage of a target gene to its potential expression host. Nucleic Acids Res 33(Web Server issue):W526–W531 Gutman GA, Hatfield GW (1989) Nonrandom utilization of codon pairs in Escherichia coli. Proc Natl Acad Sci U S A 86(10):3699–3703 Hanson G, Coller J (2018) Codon optimality, bias and usage in translation and mRNA decay. Nat Rev Mol Cell Biol 19(1):20–30 Haridhasapavalan KK, Sundaravadivelu PK, Thummer RP (2020) Codon optimization, cloning, expression, purification, and secondary structure determination of human ETS2 transcription factor. Mol Biotechnol 62(10):485–494 Hatfield GW, Roth DA (2007) Optimizing scaleup yield for protein production: computationally optimized DNA assembly (CODA) and translation engineering. Biotechnol Annu Rev 13:27–42 Hellen CUT (2018) Translation termination and ribosome recycling in eukaryotes. 
Cold Spring Harb Perspect Biol 10(10) Hernández G, Osnaya VG, Pérez-Martínez X (2019) Conservation and variability of the AUG initiation codon context in eukaryotes. Trends Biochem Sci 44(12):1009–1021 Hoover DM, Lubkowski J (2002) DNAWorks: an automated method for designing oligonucleotides for PCR-based gene synthesis. Nucleic Acids Res 30(10):e43 Huang L et al (2020) Advances in development of mRNA-based therapeutics. Curr Top Microbiol Immunol Hunt R et al (2019) A single synonymous variant (c.354G>A [p.P118P]) in ADAMTS13 confers enhanced specific activity. Int J Mol Sci 20(22) Ikemura T (1981) Correlation between the abundance of Escherichia coli transfer RNAs and the occurrence of the respective codons in its protein genes: a proposal for a synonymous codon choice that is optimal for the E. coli translational system. J Mol Biol 151(3):389–409 Ikeya T, Güntert P, Ito Y (2019) Protein structure determination in living cells. Int J Mol Sci 20(10) Ingolia NT, Lareau LF, Weissman JS (2011) Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. Cell 147(4):789–802 Ingolia NT, Hussmann JA, Weissman JS (2019) Ribosome profiling: global views of translation. Cold Spring Harb Perspect Biol 11(5) Itakura K et al (1977) Expression in Escherichia coli of a chemically synthesized gene for the hormone somatostatin. Science 198(4321):1056–1063 Jang SK, Paek KY (2016) Cap-dependent translation is mediated by ‘RNA looping’ rather than ‘ribosome scanning’. RNA Biol 13(1):1–5 Jankowski W et al (2020) Modified aptamers as reagents to characterize recombinant human erythropoietin products. Sci Rep 10(1):18593
Jayaraj S, Reid R, Santi DV (2005) GeMS: an advanced software package for designing synthetic genes. Nucleic Acids Res 33(9):3011–3016 Johari YB et al (2019) CHO genome mining for synthetic promoter design. J Biotechnol 294:1–13 Jüttner J et al (2019) Targeting neuronal and glial cell types with synthetic promoter AAVs in mice, non-human primates and humans. Nat Neurosci 22(8):1345–1356 Kames J et al (2020) TissueCoCoPUTs: novel human tissue-specific codon and codon-pair usage tables based on differential tissue gene expression. J Mol Biol 432(11):3369–3378 Karikó K et al (2011) Generating the optimal mRNA for therapy: HPLC purification eliminates immune activation and improves translation of nucleoside-modified, protein-encoding mRNA. Nucleic Acids Res 39(21):e142 Katneni UK et al (2019) Splicing dysregulation contributes to the pathogenicity of several F9 exonic point variants. Mol Genet Genomic Med 7(8):e840 Kaufmann Y, Goldstein E, Penman S (1976) Poliovirus-induced inhibition of polypeptide initiation in vitro on native polyribosomes. Proc Natl Acad Sci U S A 73(6):1834–1838 Keiper BD (2019) Cap-independent mRNA translation in germ cells. Int J Mol Sci 20(1) Kochetov AV (2006) Alternative translation start sites and their significance for eukaryotic proteomes. Mol Biol 40:705–712 Komar AA (2018) Unraveling co-translational protein folding: concepts and methods. Methods 137:71–81 Konczal J, Bower J, Gray CH (2019) Re-introducing non-optimal synonymous codons into codon- optimized constructs enhances soluble recovery of recombinant proteins from Escherichia coli. PLoS One 14(4):e0215892 Kozak M (1978) How do eucaryotic ribosomes select initiation regions in messenger RNA? Cell 15(4):1109–1123 Kozak M (1989) The scanning model for translation: an update. J Cell Biol 108:229–241 Krinner S et al (2014) CpG domains downstream of TSSs promote high levels of gene expression. 
Nucleic Acids Res 42(6):3551–3564 Kromminga A, Schellekens H (2005) Antibodies against erythropoietin and other protein-based therapeutics: an overview. Ann N Y Acad Sci 1050:257–265 Kudla G et al (2006) High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol 4(6):e180 Lagasse HA et al (2017) Recent advances in (therapeutic protein) drug development. F1000Res 6:113 Lanza AM et al (2014) A condition-specific codon optimization approach for improved heterologous gene expression in Saccharomyces cerevisiae. BMC Syst Biol 8:33 Lee S et al (2010) Relative codon adaptation index, a sensitive measure of codon usage bias. Evol Bioinformatics Online 6:47–55 Lee SY, Baek M, Lee GM (2020) Comprehensive characterization of dihydrofolate reductase- mediated gene amplification for the establishment of recombinant human embryonic kidney 293 cells producing monoclonal antibodies. Biotechnol J:e2000351 Li J et al (2001) Thrombocytopenia caused by the development of antibodies to thrombopoietin. Blood 98(12):3241–3248 Linares-Fernández S et al (2020) Tailoring mRNA vaccine to balance innate/adaptive immune response. Trends Mol Med 26(3):311–323 Liu Y (2020) A code within the genetic code: codon usage regulates co-translational protein folding. Cell Commun Signal 18(1):145 Liu Y et al (2019) Synthetic promoter for efficient and muscle-specific expression of exogenous genes. Plasmid 106:102441 Liu Y, Yang Q, Zhao F (2021) Synonymous but not silent: the codon usage code for gene expression and protein folding. Annu Rev Biochem López JL et al (2020) Codon usage optimization in the Prokaryotic tree of life: how synonymous codons are differentially selected in sequence domains with different expression levels and degrees of conservation. mBio 11(4)
Lorimer D et al (2009) Gene composer: database software for protein construct design, codon engineering, and gene synthesis. BMC Biotechnol 9:36 Lozano G, Francisco-Velilla R, Martinez-Salas E (2018) Deconstructing internal ribosome entry site elements: an update of structural motifs and functional divergences. Open Biol 8(11) Lyu X, Liu Y (2020) Nonoptimal codon usage is critical for protein structure and function of the master general amino acid control regulator CPC-1. mBio 11(5) Majorek KA et al (2014) Double trouble-Buffer selection and His-tag presence may be responsible for nonreproducibility of biomedical experiments. Protein Sci 23(10):1359–1368 Marín M, Fernández-Calero T, Ehrlich R (2017) Protein folding and tRNA biology. Biophys Rev 9(5):573–588 Martini PGV, Guey LT (2019) A new era for rare genetic diseases: messenger RNA therapy. Hum Gene Ther 30(10):1180–1189 Matsuda D, Dreher TW (2006) Close spacing of AUG initiation codons confers dicistronic character on a eukaryotic mRNA. RNA 12(7):1338–1349 Matsuda D, Mauro VP (2010) Determinants of initiation codon selection during translation in mammalian cells. PLoS One 5:e15057 Matsuda D, Mauro VP (2014) Base pairing between hepatitis C virus RNA and 18S rRNA is required for IRES-dependent translation initiation in vivo. Proc Natl Acad Sci U S A 111(43):15385–15389 Mauro VP (2018) Codon optimization in the production of recombinant biotherapeutics: potential risks and considerations. BioDrugs 32(1):69–81 Mauro VP, Chappell SA (2014) A critical analysis of codon optimization in human therapeutics. Trends Mol Med 20(11):604–613 Mauro VP, Chappell SA (2018) Considerations in the use of codon optimization for recombinant protein expression. Methods Mol Biol 1850:275–288 Mauro VP, Edelman GM (1997) rRNA-like sequences occur in diverse primary transcripts: implications for the control of gene expression. Proc Natl Acad Sci U S A 94:422–427 Mauro VP, Edelman GM (2002) The ribosome filter hypothesis. 
Proc Natl Acad Sci U S A 99(19):12031–12036 Mayr C (2017) Regulation by 3′-untranslated regions. Annu Rev Genet 51:171–194 Mayr C (2019) What are 3′ UTRs doing? Cold Spring Harb Perspect Biol 11(10) Moreno-Carranza B et al (2019) Sequence optimization and glycosylation of vasoinhibin: pitfalls of recombinant production. Protein Expr Purif 161:49–56 Nelson J et al (2020) Impact of mRNA chemistry and manufacturing process on innate immune activation. Sci Adv 6(26):eaaz6893 Newman ZR et al (2016) Differences in codon bias and GC content contribute to the balanced expression of TLR7 and TLR9. Proc Natl Acad Sci U S A 113(10):E1362–E1371 Nieuwkoop T et al (2020) The ongoing quest to crack the genetic code for protein production. Mol Cell 80(2):193–209 Okamoto K, Sako Y (2017) Recent advances in FRET for the study of protein interactions and dynamics. Curr Opin Struct Biol 46:16–23 Park JH et al (2017) Preferential use of minor codons in the translation initiation region of human genes. Hum Genet 136(1):67–74 Peabody DS (1989) Translation initiation at non-AUG triplets in mammalian cells. J Biol Chem 264:5031–5035 Pelletier J, Sonenberg N (2019) The organizing principles of eukaryotic ribosome recruitment. Annu Rev Biochem 88:307–335 Pestova TV, Kolupaeva VG (2002) The roles of individual eukaryotic translation initiation factors in ribosomal scanning and initiation codon selection. Genes Dev 16(22):2906–2922 Pouyet F et al (2017) Recombination, meiotic expression and human codon usage. eLife 6 Pratt KP (2018) Anti-drug antibodies: emerging approaches to predict, reduce or reverse biotherapeutic immunogenicity. Antibodies (Basel) 7(2)
Puigbo P et al (2007) OPTIMIZER: a web server for optimizing the codon usage of DNA sequences. Nucleic Acids Res 35(Web Server issue):W126–W131 Qian W et al (2012) Balanced codon usage optimizes eukaryotic translational efficiency. PLoS Genet 8(3):e1002603 Raab D et al (2010) The GeneOptimizer algorithm: using a sliding window approach to cope with the vast sequence space in multiparameter DNA sequence optimization. Syst Synth Biol 4(3):215–225 Raghava GP, Sahni G (1994) GMAP: a multi-purpose computer program to aid synthetic gene design, cassette mutagenesis and the introduction of potential restriction sites into DNA sequences. BioTechniques 16(6):1116–1123 Rehbein P et al (2019) “CodonWizard” – an intuitive software tool with graphical user interface for customizable codon optimization in protein expression efforts. Protein Expr Purif 160:84–93 Richardson SM et al (2006) GeneDesign: rapid, automated design of multikilobase synthetic genes. Genome Res 16(4):550–556 Rodnina MV (2016) The ribosome in action: tuning of translational efficiency and protein folding. Protein Sci 25(8):1390–1406 Ronzitti G, Gross DA, Mingozzi F (2020) Human immune responses to adeno-associated virus (AAV) vectors. Front Immunol 11:670 Rosenblum G et al (2013) Quantifying elongation rhythm during full-length protein synthesis. J Am Chem Soc 135(30):11322–11329 Roymondal U, Das S, Sahoo S (2009) Predicting gene expression level from relative codon usage bias: an application to Escherichia coli genome. DNA Res 16(1):13–30 Sahin U, Karikó K, Türeci Ö (2014) mRNA-based therapeutics--developing a new class of drugs. Nat Rev Drug Discov 13(10):759–780 Sako H et al (2020) microRNAs slow translating ribosomes to prevent protein misfolding. bioRxiv preprint Samatova E et al (2020) Translational control by ribosome pausing in bacteria: how a non-uniform pace of translation affects protein production and folding. 
Front Microbiol 11:619430 Satya RV, Mukherjee A, Ranga U (2003) A pattern matching algorithm for codon optimization and CpG motif-engineering in DNA expression vectors. Proc IEEE Comput Soc Bioinform Conf 2:294–305 Sauna ZE, Kimchi-Sarfaty C (2011) Understanding the contribution of synonymous mutations to human disease. Nat Rev Genet 12(10):683–691 Sauna ZE et al (2018) Evaluating and mitigating the immunogenicity of therapeutic proteins. Trends Biotechnol 36(10):1068–1084 Sethu S et al (2013) Immunoglobulin G1 and immunoglobulin G4 antibodies in multiple sclerosis patients treated with IFNβ interact with the endogenous cytokine and activate complement. Clin Immunol 148(2):177–185 Sharma AK, Ahmed N, O’Brien EP (2018) Determinants of translation speed are randomly distributed across transcripts resulting in a universal scaling of protein synthesis times. Phys Rev E 97(2–1):022409 Sharp PM, Li WH (1987) The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15(3):1281–1295 Sharp PM, Tuohy TM, Mosurski KR (1986) Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res 14(13):5125–5143 Simhadri VL et al (2017) Single synonymous mutation in factor IX alters protein properties and underlies haemophilia B. J Med Genet 54(5):338–345 Stadler M, Fire A (2011) Wobble base-pairing slows in vivo translation elongation in metazoans. RNA 17(12):2063–2073 Starck SR, Shastri N (2016) Nowhere to hide: unconventional translation yields cryptic peptides for immune surveillance. Immunol Rev 272(1):8–16
Steinberger J et al (2020) Identification and characterization of hippuristanol-resistant mutants reveals eIF4A1 dependencies within mRNA 5′ leader regions. Nucleic Acids Res 48(17):9521–9537 Taneda A, Asai K (2020) COSMO: a dynamic programming algorithm for multicriteria codon optimization. Comput Struct Biotechnol J 18:1811–1818 Thess A et al (2015) Sequence-engineered mRNA without chemical nucleoside modifications enables an effective protein therapy in large animals. Mol Ther 23(9):1456–1464 Tian J et al (2018) Presyncodon, a web server for gene design with the evolutionary information of the expression hosts. Int J Mol Sci 19(12) Tourdot S et al (2020) 10(th) European immunogenicity platform open symposium on immunogenicity of biopharmaceuticals. MAbs 12(1):1725369 Trösemeier JH et al (2019) Optimizing the dynamics of protein expression. Sci Rep 9(1):7511 Urrutia AO, Hurst LD (2001) Codon usage bias covaries with expression breadth and the rate of synonymous evolution in humans, but this is not evidence for selection. Genetics 159(3):1191–1199 Villalobos A et al (2006) Gene designer: a synthetic biology tool for constructing artificial DNA segments. BMC Bioinformatics 7:285 Wan XF et al (2004) Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes. BMC Evol Biol 4:19 Wan R, Bai R, Shi Y (2019) Molecular choreography of pre-mRNA splicing by the spliceosome. Curr Opin Struct Biol 59:124–133 Wang H, McManus J, Kingsford C (2017) Accurate recovery of ribosome positions reveals slow translation of wobble-pairing codons in yeast. J Comput Biol 24(6):486–500 Wang J et al (2020) Structural basis for the transition from translation initiation to elongation by an 80S-eIF5B complex. Nat Commun 11(1):5003 Weiner MP, Scheraga HA (1989) A set of Macintosh computer programs for the design and analysis of synthetic genes. 
Comput Appl Biosci 5(3):191–198 Welch M et al (2009a) You’re one in a googol: optimizing genes for protein expression. J R Soc Interface 6(Suppl 4):S467–S476 Welch M et al (2009b) Design parameters to control synthetic gene expression in Escherichia coli. PLoS One 4(9):e7002 Wellensiek BP et al (2013) Genome-wide profiling of human cap-independent translation- enhancing elements. Nat Methods 10(8):747–750 Wethmar K (2014) The regulatory potential of upstream open reading frames in eukaryotic gene expression. Wiley Interdiscip Rev RNA 5(6):765–778 Wethmar K et al (2014) uORFdb--a comprehensive literature database on eukaryotic uORF biology. Nucleic Acids Res 42(Database issue):D60–D67 Wildner S et al (2019) Aptamers as quality control tool for production, storage and biosimilarity of the anti-CD20 biopharmaceutical rituximab. Sci Rep 9(1):1111 Wilson JM, Flotte TR (2020) Moving forward after two deaths in a gene therapy trial of myotubular myopathy. Hum Gene Ther 31(13–14):695–696 Wright F (1990) The ‘effective number of codons’ used in a gene. Gene 87(1):23–29 Wu G, Bashir-Bello N, Freeland SJ (2006) The synthetic gene designer: a flexible web platform to explore sequence manipulation for heterologous expression. Protein Expr Purif 47(2):441–445 Wu MR et al (2019) A high-throughput screening and computation platform for identifying synthetic promoters with enhanced cell-state specificity (SPECS). Nat Commun 10(1):2880 Xu C, Zhang J (2020) Mammalian alternative translation initiation is mostly nonadaptive. Mol Biol Evol 37(7):2015–2028 Yang JR (2017) Does mRNA structure contain genetic information for regulating co-translational protein folding? Zool Res 38(1):36–43 Yang Y, Wang Z (2019) IRES-mediated cap-independent translation, a path leading to hidden proteome. J Mol Cell Biol 11(10):911–919
Zhang X et al (2020) RT-PCR analysis of mRNA revealed the splice-altering effect of rare intronic variants in monogenic disorders. Ann Hum Genet 84(6):456–462 Zhou Z et al (2016) Codon usage is an important determinant of gene expression levels largely through its effects on transcription. Proc Natl Acad Sci U S A 113(41):E6117–E6125
Correction to: SNPs Ability to Influence Disease Risk: Breaking the Silence on Synonymous Mutations in Cancer Eduardo Herreros, Xander Janssens, Daniele Pepe, and Kim De Keersmaecker
Correction to: Chapter 5 in: Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_5
Chapter 5, “SNPs Ability to Influence Disease Risk: Breaking the Silence on Synonymous Mutations in Cancer”, was previously published non-open access. It has now been changed to open access under a CC BY 4.0 license and the copyright holder updated to ‘The Author(s)’. The book has also been updated with this change.
The updated original version of this chapter can be found at https://doi.org/10.1007/978-3-031-05616-1_5
© The Author(s) 2022 Z. E. Sauna, C. Kimchi-Sarfaty (eds.), Single Nucleotide Polymorphisms, https://doi.org/10.1007/978-3-031-05616-1_11
Index
A A82A mutation, 38 aa-tRNAs, 207, 209 Abundant tRNAs, 15 Acute myeloid leukemia, 89 Adaptive mutations, 18 Additive model, 56, 57 Adeno-associated virus (AAV), 215 Adenomatous polyposis, 189 Adverse drug reactions (ADR), 173 Age-related macular degeneration, 59 AGG-AGA codon, 212 Aggregation-prone proteins, 43, 45 ALK translocations, 187 Alleles, 52 Alternative protein conformations, 208 Alzheimer’s diseases, 10, 45 Amino acid, 21, 37, 41, 207 Aminoacyl-tRNAs (aa-tRNAs), 200 Amyotrophic lateral sclerosis (ALS), 114 Anaphylactoid reactions, 211 Ankylosing spondylitis (AS), 60 Anti-drug antibodies (ADAs), 211 Antigen presenting cells (APCs), 86, 158 Anti-HER2 antibody herceptin, 89 Aptamers, 156 Area under the curve (AUC), 58 Artificial intelligence models, 179 Artificial intelligence space, 180 Association-based studies, 63 Association model, 55–57 Ataxia telangiectasia, 78 Attention deficit/hyperactivity disorder (ADHD), 188
Autosomal dominant disorder, 86 Autosomal recessive congenital thrombotic thrombocytopenic purpura (cTTP), 111, 112 B Background mutation rate (BMR), 79, 80 Bayesian approaches, 69 BCL2L12 gene, 90, 108 BCR–ABL fusion protein, 186 Biallelic markers, 54 BIC and UMD, 87 Bioinformatic pipelines, 4 Bioinformatics analyses, 26 Biological mechanisms, 51, 58 Bioproduction, 214 Biostatistical methods, 78, 79 Bipolar disorder, 51, 60 Bone marrow failure, 87 BRCA1, 107 BRCA1 and BRCA2 genes, 86 BRCA1 associated protein 1 (BAP1), 188 BRCA2, 107 Breast cancer, 60 Broad sense heritability model, 63 C Cancer, 51, 108 amino acid substitution/premature STOP codon, 78 BMR, 79 BRCA1, 107
codon usage, 90, 91 driver mutations, 80, 81 geneticists and biologists, 78 hallmarks, 78 INDELs, 78 machine-learning based bioinformatic tool, 79 malignant cells, 77 mRNA structure, 87, 88, 90 multidrug resistance, 109, 110 MutSigCV, 79 next generation sequencing technology, 77 non-coding genome, 78 protein coding genome, 78 protein expression level, 92 protein stability, 91, 92 screening, 80, 81 sequential accumulation of genetic mutations, 77 skin cancer (melanoma), 77 SNVs, 77, 79 splicing, 81, 84–87 toxins/environmental factors, 78 TP53, 107 Cardiovascular disease, 51, 62 Case-control GWASs, 56 Catechol-O-methyltransferase (COMT), 115, 116, 141 Causative alleles, 52 Cellular redundancies, 38 Cellular tRNA pools, 21 CFTR biogenesis, 120, 121 CFTR expression vectors, 117 Chromosome, 52 Chronic lymphocytic leukemia (CLL), 187 Chronic myeloid leukemia (CML), 186 Cigarette smoke, 78 Circular dichroism (CD), 40, 144, 156 Circular RNAs (circRNAs), 176 Classifier model, 57–59 Clinical Pharmacogenetics Implementation Consortium (CPIC), 172, 177–178 Clinical risk factors, 58 ClinVar, 3–7 Clonal interference, 28 Coagulation factor IX, 210 Cochran-Armitage test, 69 Coding sequence (CDS), 5 Codon adaptation index (CAI), 22, 135 Codon-anticodon binding affinity, 20 Codon bias, 99 Codon bias index (CBI), 135
Codon optimality and tRNA abundance codons, 108 frequent codon with high tRNA abundance cTTP, 111, 112 the c.1584G>A synonymous mutation in CFTR results, 112, 113 the c.354G>A synonymous variant in ADAMTS13, 112 high to very low abundance tRNA decoder CFTR, 110 regulator of translation, 110 the CFTR c.2562T>G sSNP, 110, 111 protein biogenesis, 108, 109 protein folding, 108 protein synthesis, 109 rare codon with low tRNA abundance multidrug resistance, cancer, 109, 110 tRNome, 109 Codon optimization, 40, 41, 43, 44, 117, 118, 121, 133, 136, 153, 155 approaches, 203–206 EPO, 214 potential advantages, 212 potential risks, 213 therapeutic applications, 216, 217 usage and features, 200, 202 widespread acceptance, 203, 206 Codon optimization problems immunogenicity, 211 overlapping information, 207, 208 protein conformation disruption, 210, 211 questionable assumptions, higher eukaryotes, 208, 209 Codon-optimized genes, 203–206, 216 Codon redundancy, 99 Codon usage, 38–41, 44, 45, 100, 101, 135, 153 approaches, 200–202 metrics, 207 mRNAs, 202 organisms, 200 protein expression, 202 protein folding, 208 splicing, 202 Codon usage bias (CUB), 133, 135–137, 141 Codons, 90, 91 Colon adenocarcinoma, 89 Commercialization of genetic testing, 62 Common disease-common variant hypothesis (CDCV), 63 Common SNPs, 52 Composite exonic regulatory element of splicing (CERES), 106
Computational methods, 143 Computational molecular physics (CMP), 101 Computed molecular vs. observed functional consequences, 5 Computing molecular consequences in dbSNP, 5, 6, 8 Conformational changes, 211 Confounding/interacting variables, 69 Copy number variants (CNVs), 64, 65 Coronary artery disease, 60 Co-segregation, 52 Costimulatory molecules, 60 Co-translational folding, 134, 144, 154, 155 Co-translational processes, 216 CRISPR-Cas9 screen, 187 Crohn’s disease, 57, 60, 108 Cross Pathway Control Protein 1 (CPC-1) gene, 210 Cystic fibrosis (CF), 51, 66, 78 CERES, 106 CFTR, 105, 106 eukaryotic pre-mRNA and mRNA secondary structures, 106 single synonymous substitutions, 106 the c.1584G>A CFTR variant with multiple consequences, 106, 107 the c.2811G>T synonymous mutation, 106 Cystic fibrosis transmembrane conductance regulator (CFTR), 105, 106, 110–112, 116, 117, 119–121, 141 Cytochrome P450 gene family, 173 Cytokine pathway, 60 Cytopenias, 87
D Data collection, 40 Data meta-analysis, 70 Data quality, 68 Database of Deleterious Synonymous mutation (dbDSM), 39, 102, 121 Database of Genotype and Phenotype (dbGaP) studies, 4 Data validation, 70 Degeneracy of genetic code, 133 Deleterious synonymous mutations, 191 Detecting changes in miRNA binding in-silico prediction tools, 148, 152 MiRDB, 148 miRNA-mRNA interactions, 148 miRNA seed region, 148 standard experimental techniques, 152 synonymous SNPs, 148
Detecting Disease-causing Genetic SynoNymous variants (DDIG-SN), 159 Differential scanning calorimetry (DSC), 157 Differential scanning fluorimetry (DSF), 157 Digestion profiles, 156 Direct associations, 55 Direct-to-consumer personal genotyping, 62 Disease-associated loci, 66, 70 Disease-causing variants, 63 Disease manifestation origins and mechanisms, 59–61 Disease pathogenesis, 78 Disease penetrance, 101 Disease predisposition, 172 Disease risk prediction, 57–59 Distribution of fitness effects (DFE), 18, 26, 136 DNA double-strand breaks, 86 DNA sequencing, 186 Dominant model, 56 Dopamine receptors, 114 Dosage sensitivity, 64 DRD2 dopamine receptor, 141 Driver mutations for oncogenesis, 80, 81 Drosophila melanogaster, 23 Drug response, 62, 172 ‘Druggable’ synthetic lethal interactions, 187 Drug-induced toxicity, 172 Drugs, 174 Drug-targeted receptors, 62 Dutch Pharmacogenetics Working Group (DPWG), 178
E Edifice of life, 44, 45 eIF4F complex, 198 Electronic health record systems, 172 Electrophoretic mobility shift assays (EMSAs), 89 Endogenous thrombopoietin, 211 End-point assays, 199 Environmental contributions, 58 Enzyme poly(ADP) ribose polymerase (PARP), 187 Epidermal growth factor receptor (EGFR), 92, 187 Erythrocytosis/von Hippel-Lindau disease, 189 Erythropoietin (EPO), 211, 214 ESE motif, 87 ESRseq, 147 Eukaryotes, 209
Eukaryotic mRNAs, 198, 199 Evolutionary biology development of molecular, 15 mRNA structure, 20 role of synonymous changes, 18 synonymous mutations (see Synonymous mutations) Exonic splicing enhancer (ESE), 85, 189 Exonic splicing silencer (ESS), 85, 189 Experimental evolution (EE) approach, 137 Expression quantitative trait loci (eQTLs) databases, 176 Ex-vivo assessment, 145 Ex-vivo immunogenicity assessment assay, 158 Ex-vivo mRNA, 145 F F9, 145 False positive report probability (FPRP), 69 Familial Leigh’s syndrome, 104 Fitness effects, 21, 26, 28 Fluorescence-Activated Cell Sorting (FACS) analysis, 136 Fluorescence emission, 157 Functional synonymous SNPs, 176 G Gastrointestinal stromal tumors (GIST), 187 GeneArt™ GeneOptimizer™ software, 203 Gene-drug relationships, 175 Gene expression regulation, 91 Gene mutations/inactivations, 188 Genetic architecture, 58, 63 Genetic code, 15, 19, 20, 29, 100, 133 Genetic drift vs. selection, 16 Genetic markers, 54 Genetic polymorphisms, 174 Genetic sequence, 37–45 Genetic variation, 64, 211 Genome wide association studies (GWAS), 17 applications (see GWAS applications) data interpretation, 55–59 follow-up analysis, 71 genetic markers, 52 hypothesis-free approach, 52 LD, 52–55 limitations and challenges, 66 clinical setting, 65 disease-associated loci, 66
Mendelian disorders, 66 missing heritability, 63–65 noncoding/regulatory regions, 65 linkage analysis, 51 SNP-associations data analysis, 68, 69 data meta-analysis, 70 data quality, 68 data validation, 70 population sampling, 67 technology, 68 SNPs, 52–55 Genomes Project, 172 Genome-wide association studies (GWAS), 176 Genomic instability, 78 Genotype and allele frequencies, 68 direct-to-consumer personal, 62 models and disease risk, 56, 57 populations, 55 and sequencing, 52 and SNP, 58 technology, 51 Genotype–phenotype associations, 55–57 Genotype-Tissue Expression (GTEx), 177 GnomAD, 3 Green Fluorescent Protein (GFP), 136 GTPases, 209 Guardian of genome, 86 GWAS applications commercialization of genetic testing, 62 individual risk assessment, 61, 62 origins and mechanisms of disease manifestation, 59–61 patient care, 61, 62 personalized medicine, 61, 62 therapeutic targets, 61, 62 GWAS data analysis, 68, 69 GWAS data interpretation association model, 55–57 classifier model, 57–59 genotype–phenotype associations, 55–57 H Haemophilia, 117, 118 Hairpin, 88 Haplotype frequencies, 54 HapMap, 3, 55 Hardy-Weinberg Equilibrium (HWE), 68 HEK293T cells, 210 Heritability, 63–65 Heterologous expression, 212
High-risk polymorphisms, 173 High-throughput techniques, 145 Homologous recombination (HR), 187 Hot spots, 53 Hsp90 protein, 26 Human diseases, 24, 38, 186 Human epidermal growth factor receptor 2 (HER2), 186 Human genetics, 66 Human Genome Diversity Project, 51 Human genome sequencing, 186 Human Leukocyte Antigen (HLA) locus, 60 Human Splicing Finder, 87 Huntington’s disease, 51 Hydrogen-deuterium exchange mass spectrometry (HDX-MS), 156 Hypertension, 60 Hypothesis-free approach, 52 I Imaging-based translation assessment techniques, 154 Immune activation, 213 Immunogenicity, 211 Immunogenicity risk, 158, 159 Immunohistochemistry (IHC), 187 Immunology, 44 In silico analysis of sSNPs, CFTR mRNA structure, 119 RSCU, 119 In silico tools, 79 In vitro-transcribed mRNA, 213 In vivo applications, 213, 217 In vivo protein folding and function, 25 Indirect associations, 55 Individual risk assessment, 61, 62 Industrial enzymes, 197 Infinitesimal model, 63 Inflammatory bowel disease, 60 INFORM registry, 188 In-silico splicing analysis tools, 145 In-silico tools, 141, 143, 148, 158 In-silico tools predict immunogenicity, 158 Insoluble protein, 208 Intelligent technologies, 180 Internal ribosome entry sites (IRESes), 198 International HapMap Project, 54 Intronic sequences, 81 In-vitro minigene assays, 145 In-vitro translation assay, 153 In-vivo immunogenicity risk assessment, 158 Isochores, 209
J Juvenile gout, 104, 105 K Kelley-Seegmiller syndrome, 104, 105 Kernel density estimation (KDE), 42 Kimura’s model, 21 L Large fitness effects, 18 Large-scale RNA interference (RNAi), 187 Leigh's encephalomyelopathy, 104 Lesch–Nyhan syndrome, 104, 105 Limited proteolysis, 156 Linkage disequilibrium (LD), 52–55 Locus-specific databases (LSDB), 3 Logistic regression models, 69 Long non-coding RNAs (lncRNAs), 176 Low-coverage whole-genome sequencing (lcWGS), 188 Lung squamous cell carcinoma (LUSC), 84 M Machine learning, 41–44, 144 Machine learning approaches, 144 Machine-learning based bioinformatic tool, 79 Machine learning techniques, 176 Machine learning tools, 144 Major histocompatibility complex class II (MHC-II), 158 Major histocompatibility complex (MHC) locus, 61 Malignant cells, 77 Mammalian transcriptomes, 88 Maturity-onset diabetes of the young (MODY), 62 Mendelian diseases, 51 Mendelian disorders, 66 Messenger RNA (mRNA), 81, 176 Metabolism/cell signaling, 5 Methionine tRNA (Met-tRNA), 199 Methylobacterium, 25 MFold in-silico prediction program, 144 MHC-associated Peptide Proteomics (MAPPS) assay, 158 Microarrays, 145 Microbial populations, 28 MicroRNAs (miRNAs), 88–90, 107, 147
Micro-RNAs targeting protein-coding regions (ORFs) coding sequence specific miRNAs, 108 functions, 107 human disorders with sSNPs altering coding sequence targets of miRNAs cancer, 108 Crohn’s disease, 108 mammalian cells, 107 Minigene sequence, 84 Minimal functional constraint, 21 MiRDB, 148 miRNA-mRNA interactions, 148 MiSplice (mutation-induced splicing), 84 Molecular consequences, 5 Molecularly targeted therapies, 187 Monitoring translation kinetics, 153–154 Monocistronic, 199 mRNA, 99 stability, 141–144 structure, 141–144 mRNA constructs, 213 mRNA primary sequences, 210 mRNA sequences, 20, 100 mRNA stability, 102, 113–115 mRNA structure, 20, 87–90, 119 mRNA, synonymous variants bicodon, 135 CAI, 135 CBI, 135 codon optimization, 136 codon usage biases, 135 codon usage frequencies, 135 comparative genomics, 135 construct matrix models, 135 CUB, 135, 136 detecting changes, miRNA binding, 147–152 DFE, 136 EE approach, 137 fitness effects, 136 gene, 136 in-silico tools, 136 methods use, study, 138–140 mRNA structure and stability, 141–144 Mtb, 137 natural selection, 135 predicting splicing effects, 146, 147 pre-mRNA splicing study, 144, 145, 147 rare codons, 136 replace rare codons with frequent codons, 137 RNA decay, 140 RNAseq analysis, 137
RSCU, 135 RT-qPCR, 141 Salmonella enterica species, 137 transcription, 137 transcription initiation, 137 mRNAs encoding genetic variants, 153 mRNA-type vaccines, 44 Multidrug resistance one gene (MDR1) gene, 109 Multiple sclerosis (MS), 60 Multiplicative model, 56 Mutant tumor suppressor genes, 187 MutSigCV, 79 Mycobacterium tuberculosis (Mtb), 137 Myelodysplastic syndrome/acute myeloid leukemia (MDS/AML), 87 Myocardial infarction, 60 N NanoDSF, 157 Nascent polypeptides, 154 National Institutes of Health (NIH), 172 NCBI Allele Frequency Aggregator (ALFA) project, 3–4 NCBI PubMed database, 172 NCBI Single Nucleotide Polymorphism database (dbSNP) bioinformatic pipelines, 4 computed molecular vs. observed functional consequences, 5 computing molecular consequences, 5, 6, 8 genetic variations, 4 GnomAD, 3 GO-ESP, 3 HapMap, 3 molecular consequences, 5 RefSNP number, 4 rs identifiers, 3 searching, 9 SNV, 4 SO, 3 splicing variants, 8 TOPMED, 3 Nearest Neighbor Database (NNDB), 144 Neural networks, 147 Neurospora crassa, 210 Neutral sites, 28 Neutral theory of molecular evolution, 15, 20, 21 Neutrality, 22 Next generation sequencing (NGS), 180 Next generation sequencing analysis, 92 Next generation sequencing technology, 77
Next-generation sequencing, 191 NNSPLICE splicing prediction tool, 86 Non-CDS variants, 7, 9 Non-coding genome, 78 Non-coding SNPs, 175, 176, 179 Non-functional CYP2D6 protein, 173 Non-linear mechanisms, 199 Non-small cell lung cancer (NSCLC), 187, 189 Non-synonymous mutations, 20, 25, 44, 86 Non-synonymous SNPs, 178 Nuclear magnetic resonance (NMR), 155 Nucleotides, 88 O Odds ratios, 56 Oncomirs, 90 Oncotype DX assay, 187 Open reading frame (ORF), 153 Over-dominant model, 56 P p53 protein, 38 p53 tumor suppressor protein, 86 Pain receptor, 115, 116 Parkinson’s diseases, 45 Passengers, 78 Pathogenic cancer mutations, 84 Pathogenic synonymous mutations human diseases, 188 Kaplan-Meier curves, 190 oncogenes and autosomal disorder genes activation, 189 prediction algorithms, 191 synonymous polymorphisms, 189 TCGA, 189 tumor suppressors and genes, 189 Pathogenic synonymous variants, 159 Patient care, 61, 62 Pegylated recombinant human megakaryocyte growth and development factor (PEG-rHuMGDF), 211 Personalized medicine, 61, 62, 71, 185, 186, 189 Pfizer/BioNTech’s BNT-162b2, 45 P-glycoprotein (P-gp), 109 Pharmaceuticals, 171 Pharmacogenetics, 172, 174 Pharmacogenetics testing barriers, 177 CPIC and DPWG, 178
direct-to-consumer genetic tests, 180 factors, 177 implementation, 177, 180 larger scale implementation, 180 optimism, 178 predictive algorithms, 179 sequencing costs, 180 SNPs, 179 Pharmacogenomic SNPs, 180 Pharmacogenomics, 62 Pharmacogenomics Knowledge Base (PharmGKB), 172, 177 Phenotype association, 57 defined, 67 in GWASs, 64 non-disease, 58 type, 55 Phenylketonuria, 105 Photochromicity, 212 Photostability, 212 pH-sensitivity, 212 Phylogenetic methods, 23 PIVar, 89 Pleiotropy, 60 Poly ADP-ribose polymerase 1 (PARP1) gene, 84 Polybromo 1 (PBRM1), 188 Polycystic kidney disease, 105 Polygenic risk scoring, 180 Polygenic scoring, 57 Polymorphic genes, 173 Polymorphism, 23, 59 Polypeptide synthesis, 200, 208 PolyPhen2, 79 Polysome profiling, 153, 154 Polysomes, 153 Population genetics, 21, 28 Post-transcriptional modifications, 87 Post-translational modification, 157 Precision medicine database resources, 172 individual’s genetic profile, 171 PubMed, 173 SNPs, 175 technological advancement, 172 Precision oncology, 188 Predictive biomarkers, 186, 191 Pre-mRNA processing, 103, 104 Pre-mRNA splicing, 105, 144, 145, 147 Prognostic biomarkers, 187 Prokaryotes, 27 Promiscuous initiation, 215
Prostate cancer, 60 Protein Circular Dichroism Data Bank (PCDDB), 40 Protein coding genome, 78, 90 Protein Data Bank (PDB), 40–42 Protein expression technologies, 213, 215 Protein folding, 102 Proteins applications, 197 expression, 197, 216 hormone insulin, 197 Protein stability, 91, 92, 157 Protein structures, 38, 39, 41, 44 Protein synthesis, 102 Protein synthesis primer polypeptide synthesis, 200 translation initiation, 198–199 Proteins, synonymous variants experimental methods to identify conformational differences, 155 immunogenicity risk evaluation, 158, 159 monitoring translation kinetics, 153–154 stability changes assessment, 157, 158 subtle structural changes analysis, 154–156 Proteomic based methods, 154 Pseudoknot structures, 88 Pyruvate dehydrogenase (PDH)-E1α deficiency, 104 R Ramp theory, 100 Rare allele model, 63 Rare codons, 136 Receiver operating characteristic (ROC) curves, 58, 59 Recessive model, 56 Recombination events, 53 Reference Sequence (RefSeq) alignments, 6 Reference SNP (RefSNP) in CDS, 6, 8 consequence reporting, 10 genomic position, 6 identifiers, 3 maps, 8 molecular consequence, 6 non-CDS variants, 7, 9 number, 4 search and filtering, 9 variant allele, 6, 8 Relative codon use, 16 Relative synonymous codon usage (RSCU), 119, 135 RET proto-oncogene, 189 Rheumatoid arthritis (RA), 60
Ribosome-associated initiator Met-tRNA, 199 Ribosomal proteins, 22 Ribosomal recruitment, 199 Ribosome profiling, 154, 212, 217 RiboTempo algorithm, 112 Risk prediction models, 62 RNA binding proteins (RBPs), 88, 89 RNA sequencing (RNAseq), 85, 87, 137, 145, 188 RNA stability, 133 RNA structure, 133, 143, 144 RNA structure prediction approaches, 143 RT-PCR, 87 RT-qPCR, 141, 145, 153 S Saccharomyces cerevisiae, 26, 209 Salmonella typhimurium, 26 SARS-CoV-2, 44 Schizophrenia, 51 ScispaCy models, 172, 174 Screening, 80, 81 Secretory proteins, 102 Sequence ontology (SO), 3, 5–8, 10 Serine/threonine kinase proto-oncogene BRAF, 187 Severe immunodeficiency, 87 Short hairpin RNAs (shRNAs), 187 SIFT, 79 Signal recognition particle (SRP), 100 Signal-transduction molecules, 60 Silent, 40 Silent mutation, 37, 185, 188 Silent variants, 133 Single cell sequencing studies, 93 Single nucleotide polymorphisms (SNPs) collection, 172 dbSNP (see NCBI Single Nucleotide Polymorphism database (dbSNP)) detection methods, 177 drug response, 175 drugs, 174 eQTLs databases, 176 functional and non-functional variants, 176 and LD, 52–55 mRNA secondary structure, 176 non-coding, 177 non-coding RNA, 176 non-synonymous, 175, 176 PharmGKB, 177 potential functionality, 175 protein folding and structure, 176 synonymous, 175 Single nucleotide variants (SNVs), 4, 77
Single-chain anti-IgE (scFv), 44 Site frequency spectrum (SFS), 16, 17 Skin cancer (melanoma), 77 SLCO1B1 gene, 173 Small GTPases, 88 Small insertions and deletions (INDELs), 78 SNP annotation databases, 175 SNP associations, 58 Somatic synonymous mutation, 93, 189 Spinal muscular atrophy (SMA), 105 Splicing, 81, 84–87, 106, 107, 110, 120, 144, 145, 147, 202 Splicing dysregulation features, 145 Splicing enhancers, 103, 104 Splicing quantitative trait loci (sQTLs), 177 Splicing regulatory element (SRE), 145 Spurious associations, 55 sSNP-associated pre-mRNA processing defects cancer, 107 cystic fibrosis, 105–107 Juvenile gout, 104, 105 Kelley-Seegmiller syndrome, 104, 105 Leigh's encephalomyelopathy, 104 Lesch–Nyhan syndrome, 104, 105 phenylketonuria, 105 polycystic kidney disease, 105 SMA, 105 TCS, 105 X-linked metabolic disorder, 104 Structural biology, 41 Subtle structural changes analysis, 154–156 Superoxide dismutase 1 (SOD1) gene, 114, 115 Synchrotron radiation CD (SRCD), 40 SynMIC database, 39 SynMICdb, 191 SynMICdb score, 81, 88 Synonymous and non-synonymous substitution rates, 22 Synonymous codons, 38, 39, 207 carry protein structure information, 99, 100 frequency, 99 secondary “code” for protein folding, 100 usage patterns, 101 variations, 99 Synonymous codon usage, 121 Synonymous codon usage, and mRNA half-life mRNA structure conversion, 113 neurological disorders associated with synonymous mutations, dopamine receptor, 114, 115 optimal codon content, 113
Synonymous mutation-mediated protein folding defects, 111 Synonymous mutations, 44 abundant tRNAs, 15 cellular fitness, 101 codon, 20 consequences, 101, 102, 122 deleterious, 185 description, 81, 83–84 direct evidence, 17, 18 disease penetrance, 101 diverse mechanisms, 24–27 evolutionary framework, 27, 28 evolutionary perspectives, 18, 19 gene expression, 25 genetic code, 15 genomic analyses, 185 genomic signatures of selection, 28 in cancer (see Cancer) indirect evidence, 16, 17 mechanisms, 122 mechanisms driving selection, 29 molecular mechanisms, 82 neutral, 19–21 pathogenic, 185 pervasive (strong) selection, 24–27 pioneering studies, 185 pre-mRNA processing, 104 protein biogenesis mechanisms, 103 severity, 101 short- and long-term evolutionary effects, 28 silent nature, 20 translational and non-translational impacts, 27 weak translational selection, 21–24 Synonymous mutations alter translation and protein folding, mechanisms altered binding of micro-RNAs targeting protein-coding regions (ORFs) coding sequence specific miRNAs, 108 human disorders with sSNPs altering coding sequence targets, 108 mammalian cells, 107 codon optimality and tRNA abundance (see Codon optimality and tRNA abundance) consequences, 102, 118, 119 genes, 102 mRNA secondary structures without changing mRNA half-life (see Synonymous mutations altering mRNA secondary structures without changing mRNA half-life)
multiple synonymous mutations and consequences CFTR biogenesis, 120, 121 multiple sSNP, CFTR cause mild cystic fibrosis, 119, 120 optimal codon content, 113 pre-mRNA processing ESEs, 104 exonic cis-elements, 103 protein-coding, 103 sSNP-associated, 104–107 role, 102 synonymous codon usage, and mRNA half-life, 113–115 Synonymous mutations altering mRNA secondary structures without changing mRNA half-life CFTR, 116, 117 COMT, 115, 116 haemophilia, 117, 118 pain receptor, 115, 116 translation, 115 Synonymous polymorphisms, 189 Synonymous variants diseases and cancers, 134 functional effects, 134 in-silico tools, 134 methods, 134 mRNA (see mRNA, synonymous variants) predicting comprehensive effects, in-silico tools, 159 proteins (see Proteins, synonymous variants) silent variants, 133 single nucleotide changes, 133 synonymous codon substitutions, 133 T Tandem affinity purification (TAP-Tar), 152 Targeted mutagenesis, 144 TargetScan, 148 Therapeutic targets, 61, 62 Thermodynamic mRNA stability, 143 ThermoFisher, 203 Toll-like receptors (TLR), 141 TOPMED, 3 TP53, 80, 86, 91, 107 Translating Ribosome Affinity Purification-Sequencing (TRAP-seq), 154 Translation efficiency, 133, 153, 154 Translation elongation, 208 Translation enhancer elements (TEEs), 215
Translation initiation cap-dependent and cap-independent, 199 Met-tRNA, 199 mRNAs, 199 non-linear mechanisms, 199 principles, 199 processes, 198 scanning hypothesis, 199 Treacher Collins syndrome (TCS), 105 Tristetraprolin (TTP), 189 tRNA abundance, 37, 38, 40, 41, 45 tRNA composition, 109 tRNA pools, 24 tRNA transcriptome (tRNome), 109, 112 Trypsin, 156 Type 1 diabetes, 60 Type 1 diabetes mellitus (T1D), 61 Type 2 diabetes, 60, 70 U Undersampling scheme-based method for deleterious synonymous mutation (usDSM), 191 Upstream open reading frames (uORFs), 5 UV-irradiation, 78 V V107V mutation, 38 Vaccine production, 25 Variant Effect Predictor (VEP), 178, 179 Variant of uncertain clinical significance (VUS), 87 Variant-drug associations, 177 Vasoinhibin, 210 von Hippel-Lindau (VHL) gene, 189 von Willebrand Factor (VWF), 111 W Wellcome Trust Case Control Consortium (WTCCC), 60 Whole-exome DNA sequencing, 188 WNT signaling, 86 X X-linked metabolic disorder, 104 X-ray crystallography, 155 Z ZFP36, 88, 89