Clinical DNA Variant Interpretation: Theory and Practice 0128205199, 9780128205198

Clinical DNA Variant Interpretation: Theory and Practice, a new volume in the Translational and Applied Genomics series,

English Pages 436 [411] Year 2021

Table of contents :
Clinical DNA Variant Interpretation: Theory and Practice
Copyright
Dedication
Conxi Lázaro
Jordan Lerner-Ellis
Amanda Spurdle
Contributors
Foreword: the challenge of variant interpretation
About the editors
1. Introduction: the challenge of genomic DNA interpretation
2. General considerations: terminology and standards
Introduction
Genetic variation
Types of DNA sequence changes
Types of RNA sequence changes
Types of protein sequence changes
Variant consequences by location
Promoter region
5′ untranslated region
Start codon
Protein-coding region
Splice region, splice sites, and introns
Stop codon
3′ untranslated region and the polyadenylation signal
Other variation
Standards on describing genetic variation
Gene symbols
Reference sequences
Describing variants
The Variant Call Format
The Human Genome Variation Society nomenclature
Variant classification
Functional classification
Clinical classification
Standards on reporting disorders and phenotypes
Challenges and considerations
Conclusions
References
3. International consensus guidelines for constitutional sequence variant interpretation
Historical variant interpretation approaches
Current variant classification practices: the 2015 ACMG/AMP guideline for sequence variant interpretation
Background and scope
Variant classification terminology
Evidence criteria and application
Variant classification and case interpretation
Ongoing and future adaptations of the ACMG/AMP guidelines
Specifications from ClinGen: the clinical genome resource
Gene-specific versus general criteria
Qualitative versus quantitative/Bayesian approaches
Summary
References
4. Quantitative modeling: multifactorial integration of data
Overview of quantitative modeling for variant interpretation
Derivation of likelihood ratios
Proportions of categorical data
Likelihood ratios for complex categorical data
Calibration of continuous variables
Combining likelihood ratios
Components of quantitative models
Prior probability of pathogenicity
Bioinformatic predictions
Cosegregation
Functional assays
Complete in vitro mismatch repair activity assay
TP53 assays
BRCA1/2 assays
Personal and family history
BRCA1/2
TP53
Tumor characteristics
MMR tumor characteristics
BRCA1/2 breast cancer histopathology
TP53 breast cancer histopathology
TP53 somatic/germ line ratio
Co-occurrence with a pathogenic variant
Population-based data
Population frequency
Healthy adult individuals
Case–control data
Caveats and considerations
References
5. Clinical and genetic evidence and population evidence
Introduction
Phenotype description
Medical pedigree
Population genetic resources
Fitness—reproductive success
Hardy–Weinberg equilibrium
Population ethnic background
Prevalence of disease
Expected variant frequency
Ascertainment
Ascertainment bias
Ascertainment of “healthy” individuals
Ascertainment of individuals with disease
Matched controls in genetics studies
Population allele frequency
Allele frequency thresholds
MAF thresholds
Thresholds used for benign evidence criteria
Thresholds used for pathogenic evidence criteria
Population size
Family history
Inheritance patterns
Autosomal dominant (AD)
Autosomal recessive
X-linked recessive
X-linked dominant
Y-linked
Mitochondrial Inheritance
Inheritance analysis
Cosegregation
Cosegregation phenotyping
Cosegregation limitations
Molecular pathology
Hereditary cancer predisposition
Tumor first sequencing
Molecular pathology markers in hereditary colorectal cancers
Molecular pathology markers in hereditary breast and ovarian cancer
Molecular pathology markers in congenital disorders
Molecular pathology markers in newborn screening
Mosaicism
Somatic versus germ line mosaicism
Testing strategies in mosaicism
Identification of mosaicism using next-generation sequencing
Mosaic presentations
Example 1: mosaic neurofibromatosis
Example 2: mosaic polycystic kidney disease
Example 3: Li–Fraumeni syndrome
Conclusion
References
Further reading
6. The computational approach to variant interpretation: principles, results, and applicability
Pathogenicity predictors for amino acid sequence variants
The molecular impact of amino acid variants: a biophysical view
Protein stability changes upon mutation
The effect of variants on protein interactions
The applicability of biophysical models
Bioinformatic pathogenicity predictors: principles and present situation
Development of a bioinformatic predictor
Training datasets
The discriminant features
The classifier
The validation process
The performance of bioinformatic pathogenicity predictors
The variability of performance estimates
Future developments and challenges
Computational predictors for variants affecting splicing
RNA splicing factors
Mis-RNA splicing and disease
Bioinformatic approaches to predict variant effect on splicing
Future developments and challenges
Acknowledgments
References
7. Functional evidence (I) transcripts and RNA-splicing outline
Introduction
Splicing, alternative splicing events, and splicing isoforms: the splicing profile
“Reference” transcript
Spliceogenic variants overlap cis-acting determinants of alternative splicing: short sequence motifs and long-range sequenc ...
Trans-acting and epigenetic determinants of alternative splicing
Roles of alternative splicing
Alternative splicing profile is dynamic
Spliceogenic variants: alternative splicing informs on the prior probability of being pathogenic
Splicing analyses: determining the spliceogenic impact of a genetic variant
Conclusion
References
8. Functional evidence (II) protein and enzyme function
Historical background
The challenge of variants of uncertain significance
Assessment of variant pathogenicity
Prediction of variant effects: in silico tools
Functional assays
Validation and calibration
Example: BRCA1 and BRCA2
Example: DNA mismatch repair genes
Example: BLM
Example: RHO
Example: CFTR
High-throughput assays
In vivo assays
Conclusion
Conflict of interest statement
References
9. Somatic data usage for classification of germ line variants
Introduction
Data sources
Somatic data resources
Other databases with limited somatic data
Control database for comparison
Laboratory practices utilizing somatic data
Principles and rationale for utilizing somatic data for classifying germ line variants in cancer predisposition genes
Loss of heterozygosity, determining biallelic inactivation, and cancer hot spots
Loss of heterozygosity
Copy-neutral LOH
Determining biallelic inactivation
Mutational hot spots
RNA-seq tumor data
Tumor signatures
Germ line risk and variant pathogenicity informed from tumor signatures
Breast cancer
CHEK2 variants
Other considerations for integrating germ line and somatic data
Biomarker considerations (immunohistochemistry and hormone status)
Determining pathogenicity of alleles in genes with recessive and dominant phenotypes integrating population, somatic, and g ...
Recognizing clonal evolution and specific somatic mutations in the context of predisposition
Leukemia predisposition genes
Identifying candidate predisposition genes
References
10. Pharmacogenetics and personalized medicine
Introduction to pharmacogenetics and personalized medicine
Variant nomenclature in pharmacogenetics
Star allele nomenclature
HLA nomenclature
Other pharmacogenetic nomenclatures
Technologies for pharmacogenetic testing
Databases/resources for pharmacogenetics
PharmGKB
PharmVar
Clinical guidelines and decision support tools in pharmacogenetics
Clinical guidelines from PGx consortia
Clinical annotation tools
Complete PGx annotation tools
CYP2D6 annotation software
Pharmacogenetics examples in clinical practice
Psychiatry: carbamazepine/oxcarbazepine and HLA-A/B
Cardiology: clopidogrel and CYP2C19
Oncology: fluoropyrimidines and DPYD
Gastroenterology: thiopurines and TPMT/NUDT15
Organ transplant: tacrolimus and CYP3A5
Pain relief: codeine and CYP2D6
Antiretroviral therapy: abacavir and HLA-B
Implementation of pharmacogenetic testing in clinical practice
Future perspectives of personalized medicine
References
11. Data sharing and gene variant databases
Introduction
General databases
Focused databases
HGMD
ClinVar and GV shared LOVD
ClinVar
Global Variome shared LOVD
Other databases
Final considerations
References
Internet resources
12. Approaches to the comprehensive interpretation of genome-scale sequencing
Clinical applications of GS
Diagnostics
Screening
Research applications of GS
Analysis of GS results for various applications
Variant annotation and filtration
Basic gene and variant-level data
Population frequency data
Publication data and phenotype associations
Inheritance patterns
Filtration approaches using available annotations
Criteria used for returning results of GS
Return of diagnostic findings in GS
Return of secondary and screening findings in GS
Findings related to risk for Mendelian disease risk
Predictive capacity for disease risk
Medical actionability
Age of the patient population
Patient preferences
Other types of findings
Carrier status for recessive disease
Pharmacogenetic variants
Variants with low penetrance that may be considered as risk factors
Conclusion
References
13. Phenotype evaluation and clinical context: application of case-level data in genomic variant interpretation
Introduction
Genetic testing in clinical practice
History of clinical genetics services
The role of the clinical geneticist
The purposes of genetic consultations and genomic testing
The evolving knowledgebase underpinning clinical diagnostic testing
Technological advances
Understanding the genomic architecture of disease
Large-scale data generation
Evolution in clinical diagnostic variant interpretation
Historic empirical disease-based interpretation
International coordination in variant data sharing
Emergence of international frameworks
Application of clinical and phenotypic information to variant interpretation and classification
Sources of clinical data
Contribution of the patient under investigation
Cases from clinical networks
Publicly available clinical evidence
Scientific literature
Repositories of variant information and locus-specific databases
Phenotype assessment
Incorporation of clinical data in variant interpretation
Reliability and robustness of phenotypic data under evaluation
Completeness of the available information and active inclusion/exclusion of clinical features
Presence of other valid explanations for the clinical features observed in the proband
Absence of classical high-sensitivity features in the proband(s)
Specificity of the observed phenotypic feature(s) for the genetic form of disease
Genetic heterogeneity: number of genes associated with the genetic form of the disease
Frequency information in genes with rare variation in the general population
Composition of the type of established pathogenic variation within a gene
Frequency of variation observed in cases with disease compared to the control population
Mode of inheritance and segregation of disease
Management of the patient based on the genomic data
Genomic findings of uncertain significance
The “negative” genetic result: when no causative variants are found
Management for a pathogenic variant
Individualized risk estimation
General risk estimates
Contextualizing risk estimation based on pattern of disease and family history
Hypomorphic variants
Moderate risk genes
Other genetic factors
Oligogenic modifier variants
Polygenic modifiers
Nongenetic factors
Individualizing patient management based on genomic information
Conclusions
References
14. Inherited cardiomyopathies
Introduction
Inherited heart diseases
Inherited cardiomyopathies
Hypertrophic cardiomyopathy
Dilated cardiomyopathy
Arrhythmogenic cardiomyopathy
Other cardiomyopathies
Restrictive cardiomyopathy
The role of genetic testing in cardiomyopathies
Value of genetic testing in cardiomyopathies
Identification of at-risk relatives and targeting of clinical screening
Emerging gene-directed treatments
Common issues in interpreting cardiomyopathy variants
Incomplete penetrance, age- and sex-related penetrance, and additional genetic variants
Case 1: Lack of segregation in family
Incomplete phenotype information or variable expression
Case 2: variable expression
Insufficient evidence for variant pathogenicity
Case 3: insufficient variant information
Future directions
Improved phenotyping, experimental evidence, and functional data for genetic variants
Tackling secondary findings of cardiac variants
Increased genetic screening of cardiac patients
Summary
Acknowledgment
References
15. Phenylketonuria
Introduction
History of phenylketonuria
Clinical features
Clinical symptoms
Newborn screening
Diagnosis
Classification of PKU
Incidence of PKU
Genetic counseling
Management
Maternal PKU
Evolution of genotyping
Practical genotype–phenotype correlation
Case 1
Case 2
Case 3
References
16. Hearing loss
Introduction
Genetic tests for hearing loss
Disease sections: practical examples that highlight the main challenges of the molecular diagnosis of hearing loss
Apparent non-syndromic hearing loss
Large families with more than one gene involved
The importance of molecular karyotyping in the analysis of hearing loss
Cases negative for known deafness genes: what to do?
Conclusions
References
17. Familial hypercholesterolemia
Variant interpretation in FH
Functional studies
LDLR
APOB
PCSK9
Cosegregation
In silico prediction algorithms
Laboratory genetic testing for FH
Case presentations
Case A
Presentation of the case
Laboratory results
Variant interpretation
Case B
Presentation of the case
Laboratory results
Variant interpretation
Case C
Presentation of the case
Laboratory results
Variant interpretation
Case D
Presentation of the case
Laboratory results
Variant interpretation
Case E
Presentation of the case
Laboratory results
Variant interpretation
Case F
Presentation of the case
Laboratory results
Variant interpretation
Case G
Presentation of the case
Laboratory results
Variant interpretation
Case H
Presentation of the case
Laboratory results
Variant interpretation
Main final conclusion
References
18. Classification of genetic variants in hereditary cancer genes
Introduction
BRCA1/2-associated hereditary breast and ovarian cancer syndrome
ATM-associated susceptibility to breast cancer
Lynch syndrome
BRCA2 c.9976A>T p.(Lys3326Ter)
Presentation of the case
Variant information: BRCA2 c.9976A>T p.(Lys3326Ter)
Pathogenicity assessment of the variant
Population data
BRCA1/BRCA2 allele frequency thresholds
Allele frequency
Population frequencies
Coverage of exon
Computational and predictive data
Functional data
Assay—Homology-directed repair assay
Experimental data from Mesman et al. [43]
Segregation data
Cosegregation analysis
Data from Wu et al. [44]
De novo data
Allelic data
Other database
Other data
Other data not considered in ACMG/AMP classification
Case–control analysis
Data from Meeks et al. [46]
Summary of evidence and final classification (Box 18.12)
Biological and clinical interpretation
BRCA2 c.9117G>A
Presentation of the case
Pathogenicity assessment of the variant
Population data
BRCA1/BRCA2 allele frequency thresholds
Allele frequency
Population frequencies
Coverage of exon
Case–control data
Case–control association study—Momozawa et al. [50]
Computational and predictive data
Splice predictors
Functional data
Assay 1—Patient mRNA splicing assay
Experimental data from Colombo et al. [52]—Results extracted from Table 2
Assay 2—Construct-based assay
Experimental data from Acedo et al. [51]
Segregation data
De novo data
Allelic data
Other database
Other data
Other data not considered in ACMG/AMP classification
Multifactorial data from Lindor et al. [56]—Results extracted from Table 6
Summary of evidence and final classification (Box 18.23)
Biological and clinical interpretation
ATM c.9007_9034del
Presentation of the case
Pathogenicity assessment of the variant
Population data
Allele frequency
Population frequencies
Coverage of exon
Computational and predictive
Splice predictors
Functional data
Assays article 1
Carranza et al. [58]
Assays article 2
Fievet et al. [59]
Segregation data
De novo data
Allelic data
Other database
Other data
Summary of evidence and final classification (Box 18.34)
Biological and clinical interpretation
MLH1 c.2041G>A
Presentation of the case
Pathogenicity assessment of the variant
Population data
Allele frequency
Coverage of exon
Summary of evidence
Computational and predictive data
Splice predictors
Protein predictors
Functional data
Segregation data
De novo data
Allelic data
Other database
Other data
Other data not considered in ACMG/AMP classification
Summary of evidence and final classification (Box 18.46)
Biological and clinical interpretation
References
19. RASopathies
Introduction
Classification of variants associated with a RASopathy
General evidence criteria
Gene-specific evidence criteria
Case-level evidence criteria
Case examples
Noonan syndrome (Table 19.2)
Cardio-facio-cutaneous Syndrome (Table 19.3)
Costello syndrome (Table 19.4)
Unknown RASopathy diagnosis (Tables 19.5 and 19.6)
Summary
References
20. Summary and conclusions
Future directions
Index
A
B
C
D
E
F
G
H
I
L
M
N
O
P
Q
R
S
T
U
V
W

Translational and Applied Genomics Series

Clinical DNA Variant Interpretation: Theory and Practice

Edited by Conxi Lázaro, Jordan Lerner-Ellis, Amanda Spurdle

Series Editor George P. Patrinos

Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom

Copyright © 2021 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility. To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-820519-8

For information on all Academic Press publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Andre Gerhard Wolff
Acquisitions Editor: Peter B. Linsley
Editorial Project Manager: Kristi L. Anderson
Production Project Manager: Stalin Viswanathan
Cover Designer: Matt Limbert
Typeset by TNQ Technologies

Dedication

We dedicate this book to all of our colleagues in the field of variant interpretation whose perseverance and dedication have provided essential scientific knowledge to inform methods for improved interpretation of DNA variants, and thereby the use of genetic data in the context of the diagnosis of hereditary disorders, and predictive and personalized medicine. The editors would like to thank the Elsevier team for offering us the possibility of writing this book. A special recognition to Peter B. Linsley for his invitation and to Kristi L. Anderson for her technical assistance.

Conxi Lázaro
I wish to dedicate this work to my mentors in the field of human and cancer genetics (Drs. Xavier Estivill, Virginia Nunes, and Gabriel Capellá) because they have been an inspiration to me throughout my scientific and professional career. I also dedicate it to all former and present members of my team for being a constant inspiration and for their work and enthusiasm. This book was planned and designed during my sabbatical year in Toronto. I would like to thank ICO-IDIBELL, my home Institutes in Barcelona; Mount Sinai Hospital, Sinai Health, and Women’s College in Toronto, my host Institutes, as well as the Spanish Government of Health and Education for making this amazing sabbatical year possible. Finally, I would like to thank all the public and private agencies from which we have obtained funding as well as the patients’ associations who always encourage and support our research and make us aware of their needs and concerns.

Jordan Lerner-Ellis
I would like to dedicate this work to the broader community of passionate clinicians and researchers who have devoted their time to the interpretation of DNA variation. I thank my numerous mentors for their guidance and inspiration which have directed my interests in the field of human genetics. To my colleagues with whom I share countless hours working on clinical variant interpretation and research study. Finally to Conxi Lázaro who has been the driving force behind this book!

Amanda Spurdle
I dedicate this work to my mentor David Goldgar, who continues to inspire and question all I do in relation to variant interpretation methodology and implementation. The latter has directed my evidence-based approach to variant interpretation methods. I thank my many colleagues for their ongoing input and discussion, and thereby influences on my research on interpretation of genetic variants, and on improving approaches to disseminate such information for clinical benefit.

Contributors

Ana Catarina Alves
Cardiovascular Research Group, R&D Unit, Dept of Health Promotion and Prevention of Non-Communicable Diseases, National Institute of Health, Portugal; BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Portugal

Christina Anne Austin-Tse
Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States

Mafalda Bourbon
Cardiovascular Research Group, R&D Unit, Dept of Health Promotion and Prevention of Non-Communicable Diseases, National Institute of Health, Portugal; BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Portugal

Marcelo A. Carvalho
Divisão de Pesquisa Clínica, Instituto Nacional de Câncer, Rio de Janeiro, Brazil; Instituto Federal do Rio de Janeiro - IFRJ, Rio de Janeiro, Brazil

Ozge Ceyhan-Birsoy
Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, United States

George S. Charames
Pathology and Lab Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada; Lab Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada

Joana Rita Chora
Cardiovascular Research Group, R&D Unit, Dept of Health Promotion and Prevention of Non-Communicable Diseases, National Institute of Health, Portugal; BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Portugal

Mara Colombo
Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research Fondazione IRCCS Istituto Nazionale Dei Tumori, Milano, Italy

Xavier de la Cruz
Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Johan T. den Dunnen
Department of Human Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands; Department of Clinical Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands

Niels de Wind
Leiden University Medical Center, Leiden, the Netherlands

Orland Diez
Hereditary Cancer Genetics Group, Vall d’Hebron Institute of Oncology (VHIO), Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain; Area of Clinical and Molecular Genetics, Hospital Universitari Vall d’Hebron, Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain

Anna B.R. Elias
Divisão de Pesquisa Clínica, Instituto Nacional de Câncer, Rio de Janeiro, Brazil

D Gareth Evans
Clinical Genetics Service, Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Manchester, United Kingdom; Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom

Lidia Feliubadaló
Molecular Diagnostics Unit, Hereditary Cancer Program, Catalan Institute of Oncology (ICO), Institut d’Investigació Biomèdica de Bellvitge (IDIBELL), ONCOBELL Program, Barcelona, Spain

Vanessa C. Fernandes
Divisão de Pesquisa Clínica, Instituto Nacional de Câncer, Rio de Janeiro, Brazil

Ivo F.A.C. Fokkema
Department of Human Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands

Cristina Fortuno
Genetics and Computational Division, QIMR Berghofer Medical Research Institute, Herston, QLD, Australia

Alice Garrett
Division of Genetics and Epidemiology at the Institute of Cancer Research, London, United Kingdom; Cancer Genetics Unit at the Royal Marsden Hospital, London, United Kingdom

Paolo Gasparini
Medical Genetics Unit, Institute for Maternal and Child Health – IRCCS, Burlo Garofolo, Trieste, Italy; Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy

Giorgia Girotto
Medical Genetics Unit, Institute for Maternal and Child Health – IRCCS, Burlo Garofolo, Trieste, Italy; Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy

Anna González-Neira
Human Genotyping Unit–Spanish National Genotyping Centre (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

Karen W. Gripp
Division of Medical Genetics, A. I. duPont Hospital for Children, Wilmington, DE, United States

Sara Gutiérrez-Enríquez
Hereditary Cancer Genetics Group, Vall d’Hebron Institute of Oncology (VHIO), Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain

Steven M. Harrison
Broad Institute of MIT and Harvard, Cambridge, MA, United States

Miguel de la Hoya
Molecular Oncology Laboratory, Oncology Department, Instituto de Investigación Sanitaria San Carlos, Hospital Clínico San Carlos, Madrid, Spain

Jodie Ingles
Cardio Genomics Program at Centenary Institute, The University of Sydney, Sydney, NSW, Australia; Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia; Department of Cardiology, Royal Prince Alfred Hospital, Sydney, NSW, Australia

Renee Johnson
Victor Chang Cardiac Research Institute, Sydney, NSW, Australia

Jordan Lerner-Ellis
Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada; Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada; Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada

Harvey Levy
Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States

Conxi Lázaro
Molecular Diagnostic Laboratory, Hereditary Cancer Program, Institut Català d’Oncologia (ICO-IDIBELL-ONCOBELL-CIBERONC), Barcelona, Spain; Institut d’Investigació Biomèdica de Bellvitge, Barcelona, Spain

Heather Mason-Suares
Partners Healthcare, Laboratory for Molecular Medicine, Cambridge, MA, United States

Ana Margarida Medeiros
Cardiovascular Research Group, R&D Unit, Dept of Health Promotion and Prevention of Non-Communicable Diseases, National Institute of Health, Portugal; BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Portugal

Jessica L. Mester
GeneDx, Gaithersburg, MD, United States

Alejandro Moles-Fernández
Hereditary Cancer Genetics Group, Vall d’Hebron Institute of Oncology (VHIO), Vall d’Hebron Barcelona Hospital Campus, Barcelona, Spain

Alvaro N.A. Monteiro
Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States

Anna Morgan
Medical Genetics Unit, Institute for Maternal and Child Health – IRCCS, Burlo Garofolo, Trieste, Italy

Thales C. Nepomuceno
Divisão de Pesquisa Clínica, Instituto Nacional de Câncer, Rio de Janeiro, Brazil

Rocío Núñez-Torres
Human Genotyping Unit–Spanish National Genotyping Centre (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

Selen Özkan
Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain

Natàlia Padilla
Research Unit in Clinical and Translational Bioinformatics, Vall d’Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain

Michael T. Parsons
Genetics & Computational Biology Department, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia

Tina F. Pesaran
Ambry Genetics, Aliso Viejo, CA, United States

Marta Pineda
Molecular Diagnostics Unit, Hereditary Cancer Program, Catalan Institute of Oncology (ICO), Institut d’Investigació Biomèdica de Bellvitge (IDIBELL), ONCOBELL Program, Barcelona, Spain

Paolo Radice
Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research Fondazione IRCCS Istituto Nazionale Dei Tumori, Milano, Italy

Farrah Rajabi
Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States

Ebony Richardson
Cardio Genomics Program at Centenary Institute, The University of Sydney, Sydney, NSW, Australia

Peter Sabatini
Department of Clinical Laboratory Genetics, University Health Network, Toronto, Ontario, Canada; Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada

Stephanie Sacharow
Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States; Harvard Medical School, Boston, MA, United States

Amanda Spurdle
QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia

Contributors

Bryony A. Thompson Department of Pathology, Royal Melbourne Hospital, Department of Clinical Pathology, University of Melbourne, Parkville, VIC, Australia Emma Tudini Genetics & Computational Biology Department, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia Clare Turnbull Division of Genetics and Epidemiology at the Institute of Cancer Research, London, United Kingdom; Cancer Genetics Unit at the Royal Marsden Hospital, London, United Kingdom Lisa M. Vincent Division of Pathology & Laboratory Medicine, Children’s National Health System, Washington, DC, United States Michael F. Walsh Memorial Sloan Kettering Cancer Center, New York, NY, United States Nicholas Watkins Pathology and Lab Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada; Hereditary Kidney Disease Clinic, Department of Nephrology, Princess Margaret Hospital, University Health Network; Molecular Genetics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada


Foreword: the challenge of variant interpretation

This textbook represents a highly up-to-date resource for clinicians and molecular scientists on variant interpretation. The book will also be a great starting point for a broad audience, from graduate and master's students to the interested general public, who want or need to learn more about interpretation of DNA variants and how they can be classified as disease associated or not. Until the last 5 years (and even occasionally now!) variants were often reported inaccurately as being pathogenic in published manuscripts based on fairly flimsy evidence, ranging from in silico analysis to splicing prediction tools that have a substantial inaccuracy rate. This is not a trivial matter. Wrongly classifying a variant as pathogenic or likely pathogenic can result in drastic action by those who use the test to predict disease. An apparent splicing variant in the breast/ovarian cancer predisposition gene BRCA1 was wrongly reported to be pathogenic based on it being at the (-2) position in the canonical splicing region. This led to women carrying the variant having risk-reducing mastectomies they did not require, as carrying the variant did not put them at high risk. Although variant interpretation can seem very complicated on the surface, because it draws on a range of different information sources, the book gives a clear guide across human disease, particularly for inherited constitutional disorders. The importance of population frequencies, made available by resources ranging from gnomAD to case-control studies, is a lesson for all. A comprehensive book like this one is a great resource. Ranging from how to use computational and functional evidence to interpretation of somatic data, particularly for cancer predisposition, it covers all the bases. It highlights the importance of clinical phenotypes and how a rare variant found in a number of families with the same rare monogenic syndromic disorder is a useful tool. Additionally, it highlights the importance of studying RNA to interpret potential splicing variants and the potential disruption of the resultant protein product. It even provides an important chapter on pharmacogenetics and personalized medicine, an area that is vital in saving lives and preventing disease. Having built the framework for variant interpretation, the book then provides chapters on how specific examples of variants can be classified in different disease areas. These range from heart disease to inborn errors of metabolism (phenylketonuria), hearing loss disorders, hypercholesterolemia, and cancer predisposition, finishing with the rasopathies. Overall, I would strongly recommend this book for those who have even slightly more than a passing acquaintance with DNA and how the variation can be important!

D. Gareth Evans1,2
1Clinical Genetics Service, Manchester Centre for Genomic Medicine, Manchester University Hospitals NHS Foundation Trust, Manchester, United Kingdom; 2Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester Academic Health Science Centre, Manchester, United Kingdom


About the editors

Conxi Lázaro PhD, Head of the Molecular Diagnostic Service, Hereditary Cancer Program. Catalan Institute of Oncology. Program in Molecular Mechanisms and Experimental Therapy in Oncology (Oncobell), IDIBELL. Centro de Investigación Biomédica en Red de Cáncer (CIBERONC). Hospitalet de Llobregat, Barcelona, Spain; Institut d’Investigació Biomèdica de Bellvitge, Barcelona, Spain.

Dr. Lázaro is a molecular geneticist with more than 25 years of experience in the field of human genetics. She did her PhD in Human Genetics at University of Barcelona. She has worked in several clinical hospitals in Barcelona. She was an invited professor at Massachusetts General Hospital Cancer Center at Boston in 2003/04 and did a sabbatical stay at Mount Sinai Hospital and at Women’s College Hospital in Toronto in 2018/19. In the last 10 years, she has been involved in several projects aimed at using Next-Generation Sequencing (NGS) for genetic testing purposes. Her field of expertise is Hereditary Cancer although she has worked on other genetic disorders. Of relevance was her pivotal research in the genetic basis of Neurofibromatosis type 1 (NF1) since the gene was discovered and her current work on the development of new therapeutic strategies for malignant tumors associated with NF1. She is a member of several reputable international consortia and associations such as CIMBA, ENIGMA, CTF, and GENTURIS and has been a member of the Scientific Program Committee of the ESHG as well as treasurer of the Spanish association of human genetics (AEGH).

Jordan Lerner-Ellis, PhD, FACMG, Associate Professor, Laboratory Medicine & Pathobiology, University of Toronto; Director & Head of Advanced Molecular Diagnostics, Pathology & Laboratory Medicine, Mount Sinai Hospital, Sinai Health, Toronto, Ontario, Canada.

Dr. Jordan Lerner-Ellis has 20 years of experience in molecular genetics and diagnostics.
He is Director & Head of Advanced Molecular Diagnostics in the department of Pathology and Laboratory Medicine at Toronto’s Mount Sinai Hospital, Sinai Health System; Associate Professor at the University of Toronto, Laboratory Medicine & Pathobiology; and Clinician Scientist at the Lunenfeld-Tanenbaum Research Institute. His laboratory provides clinical diagnostic services for hereditary breast, ovarian, and colon cancer, and other genetic testing areas, for Toronto and the province of Ontario. Dr. Lerner-Ellis


completed his PhD in human genetics at McGill University. He continued his studies at the Children’s Hospital in Basel, Switzerland, before moving on to a postdoctoral fellowship in Molecular Biology at Harvard University, the Massachusetts General Hospital, and in Medical and Population Genetics at the Broad Institute. Following his postdoctoral studies, Dr. Lerner-Ellis completed the Clinical Molecular Genetics training program at Harvard Medical School, Brigham and Women’s Hospital and is certified as a diplomate of the American Board of Medical Genetics. Dr. Lerner-Ellis’ core interest is in molecular diagnostics as currently applied to breast and colon cancer. His research is focused on improving genetic testing through greater reliance on new sequencing technologies. A concurrent aim of his research is to integrate genome sequencing into the general practice of medicine. Dr. Lerner-Ellis is active in national and international data sharing, and variant interpretation efforts aimed at improving our understanding of the relationship between DNA variants and disease.

Amanda Spurdle, PhD, Associate Professor and Group Leader, Molecular Cancer Epidemiology, QIMR Berghofer Medical Research Institute, Brisbane, Australia.

Dr. Spurdle has more than 20 years of experience in the field of molecular genetic epidemiology of hormone-related cancers. She developed the first model to classify variants in the colorectal–endometrial cancer mismatch repair genes, and led the effort by the InSiGHT consortium (International Society for Gastrointestinal Hereditary Tumours) to standardize the clinical interpretation of mismatch repair gene variants in the InSiGHT database.
She co-founded and now leads the ENIGMA international consortium (Evidence-based Network for Interpretation of Germline Mutant Alleles), which aims to develop statistical and laboratory methods to evaluate variants of uncertain clinical significance in known and suspected breast cancer predisposition genes, and she is recognized by the ClinGen consortium as an expert panel for BRCA1/2 variant classification for ClinVar. She coordinates the variant interpretation activities of the BRCA Challenge project initiated by the Global Alliance for Genomics & Health. She also contributes to activities of multiple ClinGen Variant Curation Expert Panels focused on hereditary cancer.

CHAPTER 1

Introduction: the challenge of genomic DNA interpretation

Jordan Lerner-Ellis1,2,3, Amanda Spurdle4, Conxi Lázaro5,6

1Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada; 2Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada; 3Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada; 4QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia; 5Molecular Diagnostic Laboratory, Hereditary Cancer Program, Institut Català d’Oncologia (ICO-IDIBELL-ONCOBELL-CIBERONC), Barcelona, Spain; 6Institut d’Investigació Biomèdica de Bellvitge, Barcelona, Spain

Medical genetics is a field that has rapidly evolved in the last three decades and includes multiple subspecialties such as clinical genetics, genetic counseling, molecular genetics, cytogenetics, and biochemical genetics. All of these are being applied across multiple different disease specialty areas. The Human Genome Project was completed in 2003 and new technologies can now sequence the entire genome for less than $1000; this has ushered in a new age of genetic testing that includes panels and exome and genome sequencing. Both common and rare variation exist. Unfortunately, the speed at which we can now sequence the human genome has outpaced our ability to interpret it: this is the big challenge for the future. The aim of the book is to provide a comprehensive theoretical and practical understanding of how DNA variants are interpreted and classified for clinical applications, with a focus on germline or constitutive disease. It is designed for a broad audience, from graduate and master's students to clinicians, investigators, or industry employees who wish to learn how to carry out variant interpretation, including what approaches are used today and how they are applied. Practical examples are provided from experts in the field to outline considerations for specific disease areas and clinical scenarios for learning. The Human Genome Project estimated that humans have around 21,000 protein-coding genes. New gene-disease associations continue to be discovered, and as of August 28, 2020, the Online Mendelian Inheritance in Man (OMIM), a comprehensive compendium of human genes and genetic phenotypes, registered a total of 4316 genes with phenotype-causing gene alterations, and these tend to be the ones that are included in diagnostic panels or clinical exomes.
However, with close to 3 billion base pairs per genome and over 5 million common and rare DNA variants per individual, understanding how this variation contributes to human phenotypes is still a colossal enterprise. Different modes of inheritance have been described. They include autosomal dominant or recessive inheritance, X-linked dominant or recessive, or mitochondrial. However, these patterns of transmission can vary by disorder due to other molecular mechanisms such as imprinting (an epigenetic phenomenon that causes genes to be expressed in a parent-of-origin-specific manner) or anticipation (increasing severity of disease from generation to generation, in repeat expansion disorders). A one-to-one correlation to disease does not always exist and some genes can be implicated in multiple different diseases, with variability even within the same family. For example, variants in the LMNA gene cause over 13 different and distinct phenotypes. While much of human genetic variation is thought to be benign, of the known variation that contributes to disease-related phenotypes, disease risk effect sizes can range from low through moderate to high. Disease penetrance or risk of getting a disease, often measured by way of relative risk (or the risk in relation to the population baseline risk) using cohort studies (for rare diseases), is one way in which geneticists have categorized disease severity. Notably, penetrance is also closely tied to age-related risk, which can be variable. For common complex diseases such as type 2 diabetes, large case-control studies called genome-wide association studies (GWAS) have been employed to look at individuals for disease outcome in relation to the presence of a common genetic variant. Thousands of risk-associated variants have been documented (GWAS catalog: https://www.ebi.ac.uk/gwas/). In recent years, combining disease-associated variation identified from across the genome has led to the development of polygenic risk scores. However, establishing which genes and which variants to test for in order to make clinical decisions based on clinical utility and health economics is a continuing area of research. For instance, if a pathogenic variant is identified: Is there a treatment available, can the disease be managed differently to improve patient outcome, and are there facilities available where such treatment can be obtained? Is the disease of sufficient severity or frequency to warrant a clinical test? Will the result inform family planning? Interpreting genetic information and establishing how best to apply genetic testing in practice remains a formidable challenge.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00015-6 Copyright © 2021 Elsevier Inc. All rights reserved.
As mentioned above, the ultimate goal of variant assessment is to provide an interpretation of the clinical significance of a variant that results in clear and accurate reporting to the requesting physician. This work must be done with thoroughness, as the interpretation will guide decisions and will often determine patient management strategies such as with surgical, chemotherapeutic, or other treatment decisions and will be also used for patients and their relatives in family planning decisions. The general information needed to properly interpret a variant may include the following: the type of change and location; a summary of the literature and if the variant has been previously observed with associated phenotypical information; database(s) where the variant is identified and if previously detected by the lab; description of relevant data; number of carrier probands (out of how many tested); presence or absence in healthy control datasets; population frequencies; segregation; co-occurrence with pathogenic variants in the same or other disease-relevant genes; nature of predicted molecular change and consequences; conservation and in silico analyses; functional data if available; conflicting information and reconciliation if possible; and resulting classification. The geneticist carrying out an analysis may add summary sentences stating major reasons for the variant classification. Such assertions must be reconciled with any applicable patient phenotypes, and additional supporting evidence may be added if variants in a gene have not been previously reported. The job of the medical geneticist is to subsequently take all the information at hand and make clinical correlations, follow up testing if appropriate, and follow up treatment plans and referrals to other specialists. 
The typical decision trees for classifying variants are extremely complex and some rules do not always apply; for example, silent variants can be pathogenic, and predicted loss-of-function variants can be benign. The understanding of variants in the context of phenotypic consequences adds further complexity, and rules may also change to reflect varying degrees of heterogeneity or penetrance. Variants identified during testing for dominant hereditary cancers may be treated or classified differently than those discovered during tumor testing or testing of a recessive condition. Until recently, much of this work was carried out using data and expertise housed within individual


laboratories. However, as laboratories expand their testing menus to include more extensive panels and exome and genome sequencing, variants associated with phenotypes that lie outside areas of disease expertise will be detected with regularity, a situation which has created a need to share data more broadly. Long-standing differences in how laboratories, researchers, and clinicians have described DNA, RNA, and protein products have led to challenges in communicating or sharing genetic findings more broadly. Thus, the international community has settled on terminology or nomenclature, developed by the Human Genome Variation Society (HGVS), to allow for a “common language” for describing DNA variation. Describing variants in terms of both the location in the DNA and the type of variation that occurs, as well as how variants are classified in relation to disease, has allowed for a better understanding of the DNA code. This is the subject of Chapter 2, relating to terminology and standards. Consensus guidelines from the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) have been instrumental in creating a framework, now used in multiple countries around the world, for making genomic data more applicable in the identification, treatment, and management of disease (see Chapter 3). For instance, a variant may be identified in the context of diagnostic evaluation of an individual affected with disease, or during screening of an asymptomatic individual who could be a carrier. Understanding how information is reported or interpreted in the context of case-level evidence is an important consideration when determining what a particular genetic variant might mean for any given individual.
The ACMG/AMP guidelines broke clinical variant assessment down into qualitatively distinct evidence types, e.g., functional, population, or in silico, stratified the evidence into categories, and then combined criteria to arrive at a semiquantitative categorical variant assessment. However, the ACMG/AMP guidelines have been shown to fit into a Bayesian framework, which provides a mathematical foundation for criteria that are largely qualitative, and this quantitative framework may help to automate variant pathogenicity assessments. This subject is described in Chapter 3 and discussed also in Chapter 4, which describes true quantitative models which have been developed for specific gene-disease associations. Unlike in semiquantitative methods, different evidence types are calibrated against known reference sets of pathogenic variant carriers and noncarrier or benign variant controls. This method has the advantage that it can be tailored to incorporate any evidence type considered relevant for that gene, and analyses of large datasets can be individualized to account for dataset-specific differences, such as in ascertainment criteria for testing. One commonly used evidence type for variant interpretation is population frequency, a topic covered in Chapters 4 and 5. Large-scale sequencing projects in the last decade have led to a much more detailed view of common variation within human populations. Although the populations represented in these databases still represent a small smattering of individuals from around the world, with lack of representation from many different geographical groups, this approach has allowed us to more quickly use the frequency of DNA variants to help make correlations with disease and to help classify variation as benign or pathogenic.
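As a rough illustration of how the Bayesian reading of the ACMG/AMP criteria mentioned above can be turned into arithmetic, the sketch below combines counts of criteria at each evidence strength into odds of pathogenicity and then into a posterior probability. The prior of 0.10, the odds of 350 for a single "very strong" criterion, the halving of the exponent per strength level, and the probability thresholds are assumptions taken from the published Bayesian adaptation of the guidelines rather than from this chapter; all function and variable names are hypothetical. Chapter 3 should be consulted for the authoritative treatment.

```python
# Minimal sketch of a Bayesian combination of ACMG/AMP-style evidence.
# All constants below are assumptions (see lead-in), not values from this chapter.

PRIOR = 0.10               # assumed prior probability of pathogenicity
ODDS_VERY_STRONG = 350.0   # assumed odds of pathogenicity for one very strong criterion

# Assumed exponent contributed by one criterion at each strength level.
WEIGHTS = {"supporting": 1 / 8, "moderate": 1 / 4, "strong": 1 / 2, "very_strong": 1.0}


def combined_odds(pathogenic: dict, benign: dict) -> float:
    """Combine counts of criteria per strength level into odds of pathogenicity."""
    exponent = sum(WEIGHTS[level] * n for level, n in pathogenic.items())
    exponent -= sum(WEIGHTS[level] * n for level, n in benign.items())
    return ODDS_VERY_STRONG ** exponent


def posterior_probability(odds: float, prior: float = PRIOR) -> float:
    """Convert combined odds and a prior into a posterior probability."""
    return odds * prior / ((odds - 1) * prior + 1)


def classify(posterior: float) -> str:
    """Map a posterior probability onto the five-tier classification (assumed cut-offs)."""
    if posterior > 0.99:
        return "Pathogenic"
    if posterior >= 0.90:
        return "Likely pathogenic"
    if posterior < 0.001:
        return "Benign"
    if posterior < 0.10:
        return "Likely benign"
    return "Uncertain significance"


if __name__ == "__main__":
    odds = combined_odds({"very_strong": 1, "strong": 1}, {})
    print(classify(posterior_probability(odds)))
```

With no evidence at all the combined odds are 1 and the posterior equals the prior, so the variant stays of uncertain significance; each additional criterion multiplies the odds up or down by a fixed factor, which is what makes the scheme easy to automate.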
In silico approaches to interpreting variants tend to use existing information about variants such as biophysical properties of the protein including what is known about the crystal structure of the protein, the change in amino acid and accompanying differences in the properties of these substitutions and if


they are predicted to result in conformational changes, and whether or not variants occur within conserved or functional domains of the protein. These approaches take into consideration multiple physical lines of evidence. In silico approaches have provided much value in predicting whether or not variants can have a deleterious effect on a protein product. Some limitations exist in that multiple in silico approaches do not always agree with each other and rely on algorithms which may not always truly replicate the environmental context. Bioinformatic prediction tools are also a key factor in delineating variants with potential to alter mRNA splicing profiles. From both the protein and mRNA prediction standpoints, in silico approaches play a key role in providing supportive evidence when classifying variants. This is the subject of Chapter 6. Another important role for such predictions is their use in prioritizing variants for downstream laboratory assays of mRNA and protein function (see below). Segregation or association studies are also important approaches used in variant interpretation. However, the majority of sequence variants causing highly penetrant severe disorders are rare, which limits the ability to study these variants through these approaches. For this reason, there has been much emphasis placed on developing, and assessing the validity and utility of, assays of variant impact on mRNA transcripts (Chapter 7) and protein function (Chapter 8). In the cancer predisposition field, tumor data (somatic mutations or somatic signatures) can be used when assessing the pathogenicity of germline variants (Chapter 9). Pharmacogenomic testing is another area of genetic testing that is quickly evolving. The use of pharmacogenomic variant data to determine the metabolism of certain drugs and dose response is still not in routine clinical use, but there are a number of well-studied examples of the application of this information in clinical practice.
An array of databases is now available and professional organizations have come together to develop guidelines and recommendations for applying this information clinically. Some genes such as CYP2D6 have been implicated in the efficacy or toxicity of many drugs. For this gene, practice guidelines cover at least 15 widely used medications, from codeine to tamoxifen, tricyclic antidepressants, and serotonin reuptake inhibitors. Translating this information into metabolizer phenotype or status based on genotype is critical to enable consistent clinical implementation. Additional complexity exists because multiple genes may interact in the metabolism of a given medication and so the decision trees may vary. The PharmGKB resource, which has incorporated guidelines from the Clinical Pharmacogenetics Implementation Consortium (CPIC) and/or Dutch Pharmacogenetics Working Group (DPWG), continues to provide guidance on defining genotype-phenotype correlations. Current systems to translate this information rely on the star (*) allele nomenclature system, which defines haplotypes or variants that define a particular allele and the resulting phenotype of that allele. The topic of Pharmacogenomics and Personalized Medicine is covered in Chapter 10. Many individual international databases offer a rich store of data on DNA variants. However, most of these resources have been of limited utility for clinical laboratories due to a number of serious shortcomings, including a lack of clinically related information on phenotypic consequences. Moreover, many public-access genetic databases are limited in their scope (e.g., limited to locus-specific data), lack clinically approved interpretations, and/or are hampered by clinical and technical false positives and negatives.
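The translation from star-allele genotype to metabolizer phenotype described above can be sketched as a lookup followed by a sum. The example below is a minimal sketch for CYP2D6 in which each allele carries an activity value and the diplotype's total is binned into a phenotype. The activity values and cut-offs are illustrative assumptions modelled on activity-score schemes of the kind used by CPIC and DPWG and must be checked against the current published tables; the function names are hypothetical, and gene duplications or hybrid alleles are deliberately not handled.

```python
# Minimal sketch: star-allele diplotype -> activity score -> metabolizer phenotype.
# ASSUMPTION: the activity values and cut-offs are illustrative only; verify them
# against the current CPIC/DPWG tables before any real use.

ALLELE_ACTIVITY = {
    "*1": 1.0,    # assumed normal-function allele
    "*2": 1.0,    # assumed normal-function allele
    "*41": 0.5,   # assumed decreased-function allele
    "*10": 0.25,  # assumed decreased-function allele
    "*4": 0.0,    # assumed no-function allele
    "*5": 0.0,    # assumed no-function allele (gene deletion)
}


def activity_score(diplotype: str) -> float:
    """Sum the activity values of the two alleles in a diplotype such as '*1/*4'."""
    first, second = diplotype.split("/")
    return ALLELE_ACTIVITY[first] + ALLELE_ACTIVITY[second]


def metabolizer_phenotype(score: float) -> str:
    """Bin an activity score into a metabolizer phenotype (assumed cut-offs)."""
    if score == 0:
        return "Poor metabolizer"
    if score <= 1.0:
        return "Intermediate metabolizer"
    if score <= 2.25:
        return "Normal metabolizer"
    return "Ultrarapid metabolizer"


if __name__ == "__main__":
    for diplotype in ("*1/*1", "*1/*4", "*4/*10", "*4/*4"):
        print(diplotype, metabolizer_phenotype(activity_score(diplotype)))
```

Keeping the allele table separate from the binning rule mirrors how guideline updates tend to arrive: an allele's function assignment or a phenotype cut-off can change independently of the other.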
The Clinical Genome Resource (ClinGen) is one effort aimed at sharing and evaluating genomic variants and disease associations (http://clinicalgenome.org; http://www.nih.gov/news/health/sep2013/nhgri-25.htm). The ClinGen project includes the ClinVar database (http://www.ncbi.nlm.nih.gov/clinvar), operated by the National Center for Biotechnology Information as their depository of record. To date, over 1.1 million records with interpretations and 842,050 unique


variation records have been deposited into ClinVar by 1658 submitters (August 10, 2020). Other organizations such as the Human Variome Project (HVP) are focused on bringing together “local” variant databases by creating standards and guidelines for genetic interpretations globally. The HVP puts emphasis on the standardization of genetic variant interpretation across different laboratories, but at the international level. Many national projects have taken a grassroots approach by first standardizing variant information at the national level, in order to better facilitate the entry of genetic information into the international community (e.g., the Canadian Open Genetics Repository). The overall goal is to amass existing information spread across multiple sources and combine it in a single common resource or centralized repository so that all the scientific and clinical information contained therein can be shared with all potential users. Challenges include lack of a standardized variant classification system and differences in clinical reporting protocols. By pooling variant information currently stored in individual clinical laboratories, the interpretation of human genetic variants can be made more clinically useful. The topic of data sharing and databases is discussed in Chapter 11. Lastly, it is important to understand variant interpretation in the context of different disease-gene relationships and how underlying genetic variation presents clinically. The expertise and experience of the molecular geneticist is crucial but it should be merged with the clinical phenotype provided by clinical geneticists in a more holistic approach (Chapters 12 and 13).
The manifestation of genetic disorders is specific to certain genes and diseases, and for this reason, the last part of the book is composed of six chapters dealing with examples in different genetic conditions such as Hereditary Cancer, Inherited Heart Diseases, Phenylketonuria, Hearing loss, Familial hypercholesterolemia, and RASopathies. These examples illustrate the complexity of variant assessment in different disease contexts and the importance of multidisciplinary approaches and teams to better achieve the most accurate variant classification for clinical use (Chapters 14 to 19). As indicated above, the role of the clinical molecular geneticist or cytogeneticist is to interpret the clinical significance of DNA variation and to communicate this information in clear language to the physician or patient in order to provide guidance in diagnosing or managing a particular disease. The tools in our armamentarium are many and continue to expand with new approaches. Applying these tools to interpret the clinical significance of DNA variation and inform medical decisions is the subject of this book.

CHAPTER 2

General considerations: terminology and standards

Ivo F.A.C. Fokkema1, Johan T. den Dunnen1,2

1Department of Human Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands; 2Department of Clinical Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands

Introduction

The human genome, the collection of our full DNA sequence, is often referred to as “the book of life.” Decades of research have brought us releases of the human genome complete enough to allow for various applications, including clinical diagnoses. The latest of such releases, the Genome Reference Consortium’s GRCh38 reference genome build, was completed in December 2013 and consists of over 3 billion letters (nucleotides). In our body, all DNA is present in two copies: one maternal and one paternal copy. Every time a cell in our body divides, both copies need to be duplicated (replicated), a process that is very precise but not without errors. These changes will be copied, passed on to the next generation, and over time the human DNA sequence slowly changes. The speed at which this happens, the mutation rate, is estimated to be around 1.5 nucleotides per year [1] with estimates ranging between 36 and 63 changes (variants) passed on to the next generation [2,3]. Most of these variants do not cause disease but become part of the naturally existing variation in the population. Nowadays we are able to determine the sequence of a human individual within a few days. Since, especially on a global scale, the natural variation in the human DNA is high, comparing a person’s DNA to a standard reference sequence is not without problems. Compared to the reference, an average human genome contains about 4 million variants, while an average exome analysis (i.e., analysis of all protein coding sequences) returns some 40,000 variants. In the case of a genetic disorder, of these variants usually only one or two are associated with the presented condition. Finding these variants in such large datasets is truly like searching for a needle in a haystack.
Being able to see the difference between the vast majority of benign variants and the few disease-causing (pathogenic) variants requires a good understanding of the different types of variants and the possible consequences these variants have on the function of the genes they affect. This chapter describes the types of genetic variation and their possible consequences as well as various standards and their importance related to describing, interpreting, and reporting genetic variants and phenotypes. Some important points to consider when classifying variants are discussed, as well as general challenges and considerations to keep in mind when performing sequencing analysis.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00009-0 Copyright © 2021 Elsevier Inc. All rights reserved.


Genetic variation

Types of DNA sequence changes

DNA variants can be characterized by the type of variation that occurs on the DNA level as well as by their consequences on the RNA or protein level. To prevent those consequences from getting mixed up, it is best to strictly separate and report each level individually (DNA, RNA, and protein). As variant screening is mostly based on DNA analysis, detected variants are primarily described on the DNA level. In addition, the (predicted) consequences on the RNA and protein level can be given. In general, current short-read high-throughput sequencing technologies cannot easily detect all DNA variant types. To detect all variant types, either special analysis pipelines are required or long-read sequencing technologies need to be applied. The basic DNA sequence variant types identified are listed in Table 2.1.

• Substitutions are variants where one single DNA nucleotide is replaced by another single DNA nucleotide. This is by far the most common type of DNA sequence variant, accounting for roughly 80% of all reported DNA variation.

• Deletions are variants where one or more nucleotides have been removed from the original DNA sequence. This is the next most common variant type. When a deletion spans one or more exons of a gene or more than 1000 nucleotides, it is referred to as a copy number variant (CNV).

• Insertions are the reverse of deletions and occur when one or more nucleotides are added to the original sequence. When the inserted sequence is a tandem copy of the original DNA sequence, it is called a duplication. Both duplications and deletions frequently occur where the DNA contains repeated copies of a small sequence. When a duplication spans one or more exons of a gene or more than 1000 nucleotides, it is referred to as a CNV.

• Deletion-insertions are a combination of a deletion and an insertion in the same location in the DNA (excluding substitutions). One or more nucleotides are replaced by one or more other nucleotides.

• Inversions are variants where a stretch of DNA turns around (inverts); the inserted sequence is the exact reverse complement of the deleted sequence. Inversions have a minimum length of two nucleotides; one-nucleotide inversions are classified as simple substitutions.

• Structural variation is a term for various large chromosomal changes such as translocations and transpositions. Note that these are usually not picked up by short-read sequencing methods and require additional tests to be detected. If the structural changes are large enough, they can be seen using optical mapping technologies or microscopy (karyotyping).

Table 2.1 DNA sequence variant types. Each type is explained by an example: the original DNA sequence and the changed DNA sequence in which the variant occurred. Hyphens mark alignment gaps.

Variant type | Sequence (original → changed) | Description
Substitution | AACGTT → AACCTT | One nucleotide has been replaced by another.
Deletion | AACGTT → AAC-TT | One or more nucleotides have been removed.
Insertion | AAC-GTT → AACAGTT | One or more nucleotides have been inserted.
Duplication | AAC-GTT → AACCGTT | One or more nucleotides have been duplicated in tandem.
Deletion-insertion | AACGTT → AATATT | One or more nucleotides have been removed and replaced by one or more other nucleotides, other than a substitution.
Inversion | AACGTT → ACGTTT | More than one nucleotide has been inverted into their reverse complement sequence.
Structural variation | (no sequence example) | A variation where large parts of chromosomes have rearranged.
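The definitions of the basic variant types can be made concrete with a small sketch. The function below is a hypothetical illustration, not part of any standard tool: it strips the flanking sequence that the original and changed sequences share, then names the variant type from what remains, using the rules given above (a one-nucleotide change is a substitution, a tandem copy is a duplication, a reverse complement of two or more nucleotides is an inversion). It assumes short, uppercase sequences without gap characters and does not attempt to recognize structural variation.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def classify(original, changed):
    """Name the basic variant type that turns `original` into `changed`."""
    shortest = min(len(original), len(changed))
    # Count the bases shared at the start of both sequences...
    start = 0
    while start < shortest and original[start] == changed[start]:
        start += 1
    # ...and at the end, without overlapping the shared start.
    end = 0
    while end < shortest - start and original[-1 - end] == changed[-1 - end]:
        end += 1
    removed = original[start:len(original) - end]
    added = changed[start:len(changed) - end]

    if not removed and not added:
        return "no change"
    if len(removed) == 1 and len(added) == 1:
        return "substitution"
    if not added:
        return "deletion"
    if not removed:
        # A duplication is an insertion that copies the bases just before it.
        is_copy = start >= len(added) and original[start - len(added):start] == added
        return "duplication" if is_copy else "insertion"
    if added == reverse_complement(removed):
        return "inversion"
    return "deletion-insertion"
```

Applied to the example sequences of Table 2.1 (with the gap characters removed), this sketch returns the variant type listed in each row.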

Types of RNA sequence changes

Variants on the RNA level include those detected on the DNA level, i.e., substitution, deletion, insertion, duplication, deletion-insertion, and inversion, as well as variants affecting RNA processing. RNA processing includes transcription initiation (promoter and locus control regions), RNA capping, RNA splicing, RNA modification (editing), polyadenylation, and transcription termination. In addition, variants may indirectly influence the RNA, altering its folding, stability, and degradation, and thereby its quantity in the cell. An exceptional case is RNA fusion transcripts, where parts of two different genes get fused into one transcript. RNA fusion transcripts usually occur after a translocation or a deletion removing the 3′ end of a gene.

Types of protein sequence changes

Variants on the protein level include those detected on the DNA and RNA level, i.e., substitution, deletion, insertion, duplication, and deletion-insertion. Although effectively part of these categories, some variants affect protein translation and are treated separately, i.e., frameshift and extension variants. Just as for RNA, variants may affect protein processing, translation initiation, translation termination, protein modification, protein folding, and protein-protein interaction. In addition, variants may influence the stability and degradation of the protein molecule and thereby its quantity in the cell. An exceptional case, as for RNA, is fusion proteins translated from RNA fusion transcripts.

Variant consequences by location

In the literature, rather than by the genomic DNA change, variants are often described by their location in the gene or by the effect of the variant on the protein level. Such an effect can be deleterious in two ways: loss of function or gain of function. A single-nucleotide substitution in a crucial part of the gene can have a far more devastating effect than a deletion or insertion in, for instance, an intron. Fig. 2.1 shows the different locations where a DNA variant can influence the function of a gene. Below is a brief overview of the main categories, based on the basic genetic unit: a gene, and the RNA and protein it encodes.


FIGURE 2.1 Locations in which variants can influence a gene’s function.

Promoter region

The promoter region regulates the gene’s transcription. Variants in this region may affect the transcription factor binding sites that are present and may increase, lower, or even abolish the gene’s expression. As such, variants in this region may render the gene dysfunctional. Since it is quite difficult to determine the exact location of the promoter for each gene, it is a challenge to distinguish variants in the promoter region from intergenic variants that have no effect on the gene’s expression. Furthermore, for many genes, the timing of expression (when during development and in which tissue the gene is expressed) is controlled by far distant sequences, such as locus control regions, enhancers, topologically associating domains, etc.

5′ untranslated region

The 5′ untranslated region (5′ UTR), before the initiation (start) codon, contains regulatory elements such as internal ribosome entry sites (IRESs) and upstream open reading frames (uORFs) that control the translation [4]. Variants in these sequences mainly influence translation initiation and affect translation levels. As with the promoter region, the annotation of relevant active sites within the 5′ UTR is usually lacking, and functionally relevant variants cannot easily be distinguished from nonrelevant 5′ UTR variants.

Start codon

Variants in the start codon that alter the ATG sequence block translation initiation and usually have serious consequences. When the ATG is affected, initiation may move to another initiation site, either up- or downstream, and only when that site is in frame with the normal protein sequence may the translation product be (partially) functional. The consequences of duplications involving the ATG motif are difficult to predict. Although they leave a normal sequence, at the same time they introduce a new competing upstream initiation site. The sequence surrounding the start codon, known as the Kozak sequence, also shows conservation and is sensitive to variants that can change the level of translation [5].

Protein-coding region

The consequences of variants in the protein-coding region are, in general, more severe when a large segment of the protein is altered. When a deletion removes the entire gene or the start of a gene, no protein is made, and it depends on the activity of the copy on the other chromosome whether this loss can be compensated or not. Note that deletions on the X-chromosome in males cannot be compensated since there is no second gene copy. Duplications of an entire gene may increase the amount of RNA/protein produced, and it depends on the gene whether this has deleterious consequences. Some genes are “dosage-sensitive” while others are not (see https://clinicalgenome.org/curation-activities/dosage-sensitivity/ for a list). Furthermore, missing one copy is often tolerated better than having one copy with an altered protein sequence that disturbs normal cellular processes.

Most variants in the protein-coding region lead to the production of an altered protein. The resulting protein may not be functional at all, function only partially, or even gain an additional or completely new function. The consequences of missense variants, which replace one amino acid with another, vary depending on the change in size, charge, and hydrophilicity of the affected amino acid, as well as its position relative to the functional domains of the protein. Nonsense variants replace an amino acid codon with a translation stop codon, therefore causing premature translation termination of the protein. This usually has deleterious consequences. The same holds true for frameshift variants, which not only truncate normal translation but, in addition, add a completely new C-terminal tail to the protein which can be of considerable size. This new protein tail may be either shorter or longer than the original and may have undesired functional consequences (gain of function), interfering with and disturbing normal cellular processes. When nonsense or frameshift variants occur near the end of the protein and the length of the C-terminal tail is small, normal protein function may be unaffected.

In-frame variants (deletions, insertions, and duplications) do not disturb the reading frame and may have less severe consequences.
A well-known example is the DMD gene, where in-frame deletions or duplications, even when spanning many exons, cause a relatively mild phenotype compared to truncating variants (nonsense or frameshift variants) [6]. The effect of in-frame variants mainly depends on the function of the protein and the size of the segment of the protein affected. In general, variants affecting larger stretches have a more significant impact. Still, like the p.Phe508del variant in the CFTR gene causing cystic fibrosis, even the deletion of a single amino acid may already have serious deleterious consequences [7]. When DNA variants in the coding region do not lead to a (predicted) change in the amino acid sequence, they are referred to as silent or synonymous variants. It should be noted, however, that such variants may still influence RNA stability or alter binding sites and RNA processing, in particular splicing, which will then have a significant effect on the gene’s function.
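The distinctions between synonymous, missense, nonsense, in-frame, and frameshift variants follow mechanically from the genetic code. The sketch below illustrates this for a toy coding sequence; the function names and the labels they return are hypothetical choices made for this example, and the code assumes the standard genetic code, an uppercase coding sequence that starts at the ATG, and 1-based coding positions. It predicts the consequence on the protein sequence only and says nothing about effects on splicing or RNA stability.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def substitution_consequence(cds, pos, alt):
    """Predict the protein-level consequence of a substitution.

    `cds` is the coding sequence from ATG to stop codon, `pos` the 1-based
    coding position, and `alt` the new nucleotide.
    """
    codon_start = (pos - 1) // 3 * 3
    offset = (pos - 1) % 3
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if codon_start == 0 and alt_codon != "ATG":
        return "start lost"
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


def length_change_consequence(removed, inserted):
    """Tell whether a change in coding length keeps the reading frame."""
    return "in-frame" if (inserted - removed) % 3 == 0 else "frameshift"
```

For the toy sequence ATGGAATTTTGA (Met-Glu-Phe-stop), changing position 4 from G to T creates a TAA stop codon and is reported as nonsense, whereas removing three nucleotides, as in a single amino acid deletion, is reported as in-frame.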

Splice region, splice sites, and introns

After transcription, the RNA molecule undergoes a range of steps before the mature RNA is ready. The 5′ end of the transcript is capped, a step which is important to protect the RNA from degradation. Most genes are spliced, a process whereby some parts of a gene (the exons, mostly protein-coding) are fused together after the removal of other sequences (the introns). Finally, many transcripts are processed at the 3′ end by cleavage and the addition of a polyA tail, again protecting the RNA from degradation.

The splicing process is rather complex and involves many sequences. While there is a clear and almost completely invariable DNA sequence motif spanning the first and last two nucleotides of the intron (GT and AG, respectively, see Fig. 2.1), the surrounding sequence is also important yet less well conserved, making predictions of the effect of variants in this region difficult. Changes in the first and last two nucleotides of the intron nearly always result in a disruption of normal splicing. Additionally, on the 5′ side, the splice donor site, variants in the last nucleotide of the exon and nucleotides +3 to +6 often affect splicing. On the 3′ side, the splice acceptor site, especially variants creating a close-by AG dinucleotide cause problems [8]. Some variant effect prediction tools take a more cautious approach and extend the region that possibly affects splicing to the first and last eight nucleotides of the intron, including the first and last three nucleotides of the exon [9].

The intron also contains the branch point, a small region close to the 3′ end of the intron, containing a single strongly conserved adenine nucleotide. The branch point initiates the formation of the loop structure (lariat) that is formed when the intron is spliced out. A variant in the branch point will disrupt the intron’s splicing completely [10]. Finally, variants in intronic and exonic splice enhancer and silencer motifs (ISE, ISS, ESE, and ESS) also influence splicing, but since their sequence is less conserved, their position is rarely known and their involvement is usually not considered.

Disruption of splicing can also occur through the creation of a new splice site or the activation of a cryptic splice site (CSS). CSSs are normally dormant sites that are silenced (suppressed) by stronger, nearby canonical splice sites. Activation occurs when a sequence change strengthens the cryptic site or weakens the canonical site. Upon activation of the CSS, the canonical splice site is no longer or not fully used and normal splicing is wholly or partially disrupted.

Disruption of splicing has a range of different consequences on the RNA level. Note that to correctly determine the effect of a variant on splicing, RNA analysis is essential (see also Chapter 7). Variants affecting splicing frequently lead to multiple transcripts being produced, with the overall effect depending on the relative abundance of each of these transcripts. Possible effects include:

• The deletion of an exon or part of an exon. When a splice site is damaged, an exon might not be recognized at all (deleted) or splicing may shift to a new site in the exon. The resulting deletion can be in-frame or out-of-frame. Out-of-frame deletions result in a frameshift and have a more devastating effect on the resulting protein than in-frame deletions.

• The insertion of intronic sequences. When a splice site is nonfunctional, an intron may not be removed at all (inserted), splicing may shift to a new site in the intron thus elongating the exon, or a new exon (pseudoexon) may be inserted. The inserted sequence may contain a translation stop codon or contain an open reading frame that fuses in-frame or out-of-frame with the remainder of the encoded protein sequence. Truncating insertions have a stronger negative effect on the resulting protein than in-frame insertions.
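The intronic positions singled out above can be summarized in a small lookup. The following sketch is a hypothetical helper, not taken from any prediction tool: it takes the signed offset of an intronic nucleotide from the nearest exon (+1 is the first nucleotide of the intron, -1 the last) and returns the zone it falls in, using only the boundaries mentioned in the text (the first and last two nucleotides, donor positions +3 to +6, and the cautious window of eight nucleotides). The zone labels are illustrative, and exonic positions, the branch point, and enhancer or silencer motifs are deliberately left out.

```python
def splice_zone(offset):
    """Name the splice-relevant zone of an intronic position.

    `offset` is the signed distance from the nearest exon: positive values
    count into the intron from the donor site, negative values count back
    from the acceptor site.
    """
    if offset == 0:
        raise ValueError("offset 0 is not an intronic position")
    distance = abs(offset)
    side = "donor" if offset > 0 else "acceptor"
    if distance <= 2:
        # The nearly invariant GT (donor) and AG (acceptor) dinucleotides.
        return f"canonical {side} site"
    if offset > 0 and distance <= 6:
        return "donor region (+3 to +6)"
    if distance <= 8:
        # The wider window that some prediction tools flag.
        return f"extended {side} region"
    return "deeper intronic"
```

A zone only indicates how likely a position is to matter; as stated above, RNA analysis remains essential to determine the actual effect on splicing.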

Stop codon

Changes in the stop codon that prevent the stop codon from being recognized lead to the elongation of the protein sequence. The effect of the additional C-terminal tail on the function of the protein is difficult to predict. In general, a longer tail will have more serious consequences and most extensions negatively influence protein folding, function, and stability.

3′ untranslated region and the polyadenylation signal

The 3′ untranslated region (3′ UTR), the sequence after the translation termination (stop) codon, contains several regulatory elements, such as binding sites for miRNAs and RNA-binding proteins, and the polyadenylation signal and addition site. Together, these directly or indirectly influence RNA stability, folding, transport, localization, and translation efficiency and consequently RNA and protein levels [4]. As the functional annotation of these elements (except for the polyadenylation signal) is largely lacking, variants in this region are rarely considered as having deleterious consequences.

Other variation

A specific class of diseases is caused by repeat expansions. In these disorders, a short repetitive sequence may increase in length to up to many kilobases. When this sequence is translated (e.g., the CAG repeat in Huntington’s disease), it directly affects protein function. When it is located in an intron or the UTR of a transcript, RNA processing (splicing) and stability are affected and transcription may be silenced (e.g., methylation in the fragile X syndrome).

Epigenetic variation, i.e., variation in the methylation of the DNA sequence, may influence the density of the chromatin structure, thereby affecting transcription. Methylation changes can cause disease by inappropriately silencing or activating gene expression [11]. Methylation cannot be measured by most sequencing protocols unless specific sample preparation steps are included. Also, RNA editing, the process where the RNA sequence is altered and becomes different from the genomic DNA template, is not detected by standard sequencing protocols but can be involved in disease [12]. Although disease through these mechanisms is rare, they should not be overlooked.

Standards on describing genetic variation

To implement DNA sequence variant analysis in a clinical setting, it is essential to apply universal standards. These standards are required to remove ambiguity, prevent false-negative or false-positive results, and ensure there is no misunderstanding of what has been found and what the associated consequences for the health of the individual are. The standards required include naming genes, accepted reference sequences for the human genome and the encoded transcripts, the file formats to exchange sequence information, the description of sequence variants identified, the description of the phenotype of the individual studied, standards to classify the variants detected, and standards to store the information in gene variant and phenotype databases.

Even when there is a universal standard, this does not mean it is applied correctly. An analysis published in 2016 of how diagnostic laboratories applied the same standard, the HGVS variant nomenclature, showed that 31% of the reports checked described the variant incorrectly and in such a way that it could not be reconstructed to the variant actually observed [13]. Other reports also described the variant identified incorrectly, but in such a way that it could be recognized and corrected. In both situations, however, incorrect variant descriptions cause inconsistencies and mismatches when comparing reports or searching databases and the literature.

Querying external sources for variants identified is an essential part of variant interpretation and classification. Data from large population studies yield allele frequencies that provide evidence on a variant’s pathogenicity, with increasing frequencies making it less likely that a variant has serious deleterious consequences (also see Chapter 5).
Gene variant databases contain data from the literature and from unpublished cases and provide detailed information on variants and phenotypes and the likelihood they are causally linked, i.e., variant pathogenicity (also see Chapter 11). Given the multiple steps in the process, to maximize efficacy and reduce the chance mistakes are made during variant interpretation, it is essential the same standards are used by everyone involved.


Gene symbols

Genes are widely identified by their symbols, abbreviations mainly based on their function. The HUGO Gene Nomenclature Committee (HGNC) is the official international organization approving gene names and symbols [14]. Although genes can be identified by their unique numerical ID that does not change, e.g., HGNC:1100, people prefer symbols that are more easily remembered, e.g., BRCA1. Gene symbols rarely change, but when they do, the HGNC keeps track of a gene’s previously used symbols such that expired gene symbols referenced in reports can still be identified. The HGNC is currently actively renaming genes whose symbols do not refer to their function (e.g., FAM#, KIAA#, c#orf#, etc.).

Reference sequences

As variants are defined as differences between the sample DNA sequence determined and a reference sequence, the main requirement for any variant description is to clearly define which reference sequence was used. Reference sequences all have unique identifiers, referring to their respective entries in the reference sequence databases. Reference sequences can be obtained from several databases, including GenBank at the National Center for Biotechnology Information (NCBI) in the United States [15], Ensembl at the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) in the United Kingdom [16], and the shared NCBI/EBI Locus Reference Genomic (LRG) project [17]. A reference sequence identifier should be stable and the sequence it contains should not change over time. When the sequence does change, this is indicated by the addition of a version number in the identifier. For LRG sequences, there is no version number and a new ID should be issued. More information can be found on the HGVS nomenclature website (http://varnomen.hgvs.org/bg-material/refseq/).

For genomic variants detected by next-generation sequencing, the reference sequence will most likely be the human genome reference sequence. The first human genome reference sequence was published in 2000. Over many years, this reference sequence has improved and the latest version of the human genome is build 38. Each genome build contains reference sequences per chromosome: chromosomes 1-22, X, Y, and the mitochondrial genome (mtDNA), together representing an entire genome. For whole-chromosome genomic reference sequences, only the NCBI’s GenBank has identifiers available. Examples are NC_000015.9 for chromosome 15 of genome build GRCh37 (also known as hg19), and NC_000023.11 for chromosome X on genome build GRCh38 (also known as hg38).
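Because the version number is part of the identifier, software that handles variant descriptions should treat the accession and its version as separate pieces of information and refuse identifiers that lack a version. The sketch below shows one way to do this with a regular expression. The pattern is a simplifying assumption that covers identifiers shaped like the examples above (two capital letters, an underscore, digits, a period, and a version); the function name is hypothetical, and LRG identifiers, which carry no version, are intentionally rejected.

```python
import re

# Assumed pattern: a two-letter prefix, an underscore, digits, then ".version".
VERSIONED_ACCESSION = re.compile(r"^(?P<accession>[A-Z]{2}_\d+)\.(?P<version>\d+)$")


def split_accession(identifier):
    """Split a versioned identifier such as NC_000015.9 into its two parts."""
    match = VERSIONED_ACCESSION.match(identifier)
    if match is None:
        raise ValueError(f"not a versioned reference sequence: {identifier!r}")
    return match["accession"], int(match["version"])
```

With this helper, NC_000015.9 is split into the accession NC_000015 and version 9, which makes it straightforward to detect that two descriptions of the same chromosome were written against different versions of its sequence.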

Describing variants

After sequence analysis and variant calling, variants are generally stored in the Variant Call Format (VCF) file format, developed for the 1000 Genomes Project and since then adopted by many other large-scale sequencing projects [18]. A VCF file does not necessarily state which reference genome was used, as the header lines that can record it are optional. The file has a tabular format indicating the chromosome, a genomic position, the reference sequence at that position, the variant (alternate) sequence identified in the sample(s), and optionally various details on the sequencing quality, such as coverage, genotype quality, etc.


In publications and in databases, variants are described using the “Recommendations for the description of sequence variants” from the Human Genome Variation Society (HGVS). These standards were first published in 2000 and are regularly updated [19] (https://varnomen.HGVS.org). The HGVS nomenclature facilitates variant descriptions based on a genomic reference sequence (g. descriptions), based on a transcript (c./n. descriptions) and descriptions for variants on the RNA and protein level (r. and p. descriptions). Any reference sequence can be used, as long as the residues altered (nucleotides, amino acids) are located within the reference sequence. An overview of the main features of both formats can be found in Table 2.2.

Table 2.2 Main features of the HGVS nomenclature and VCF files.

Feature | HGVS nomenclature | VCF files
In use since | 2000 | 2010
Intended use | Human-readable format; databases, literature, reporting. | Machine-readable format; large-scale genome sequencing projects.
Type of variation | Any type of variant, on any reference sequence. | Any genomic variant, excluding complex genomic rearrangements. It can contain transcript variant descriptions in HGVS format in its annotation.
Variant disambiguation | 3′ (right) aligned; in principle, each variant has only one correct description.(a) However, the normalization rules are not always followed by its users. | 5′ (left) aligned; has a nearly infinite number of possible descriptions for each variant. Requires specialized tools to compare variant sets.

(a) Even though each variant only has one correct description in HGVS, there are no rules defined regarding when two neighboring variants should be considered as one.

The Variant Call Format

The VCF was developed for the 1000 Genomes Project [18] and has been designed to be machine-readable for faster processing of large genomic variant datasets. It has become the most often used file format for storing and exchanging large-scale genomic variant data. It supports multiple samples within one file, rich annotation which can include mapping on transcripts and predicted protein change, and a method to indicate the absence of variants in a certain region (gVCF). Structural variation can also be stored within a VCF file. The VCF standard is maintained by the GA4GH (https://www.ga4gh.org/) and is currently at version 4.3 (https://github.com/samtools/hts-specs).

VCF files begin with a header section, in which metadata is stored. For instance, the values and contents of some configurable fields that can be found in the VCF file’s body are defined and explained in the header of the file. After the header, a single line defines the order of fields and the names of the samples that are stored in the file. What follows is the file’s body, consisting of a list of genomic positions and the relevant sequences. The basic format of each line in the file’s body is the chromosome (CHROM), position (POS), external database ID (ID), the sequence of the reference (REF), the sequence found at this position (ALT), and several fields containing annotations. Variant annotations stored in the VCF file can be sample-dependent or sample-independent. Sample-dependent annotations include genotype, genotype quality, and read depth. Examples of sample-independent annotations are gene symbol, mappings on transcripts, and protein change predictions. See Table 2.4 for examples of how variants can be represented in a VCF file.

The VCF standard recommends variants to be described using the “simplest representation possible.” However, this is not a requirement and such an open recommendation is bound to give different implementations. Also, users are encouraged to use the lowest coordinate for a variant, therefore shifting the variant as far 5′ as possible. Unfortunately, this is in contrast with the existing HGVS standard (see below), which requires variants to be shifted as far 3′ as possible. These issues make a direct comparison of variants between different VCF files, or between a VCF file and a list of HGVS formatted variants, quite difficult.

Also, it is common for variants around the same location to be merged into one line in VCF files. In this case, the ALT column will contain multiple values. With a POS of 1 and a REF value of “AG,” it is possible to have “TG,AGG” as an ALT value, meaning an A to T substitution on position 1 and a duplication of the G on position 2. This can also be the case for larger variants, causing a large sequence overlap between the REF and ALT columns. In cases like these, the variant description differs quite significantly from the simplest form of describing the variant, and a simple comparison of variants is not possible.

The heterogeneity of variant descriptions in a VCF file is known to cause problems, even within individual diagnostic laboratories [20]. However, several tools exist to normalize and compare VCF files or convert VCF files into lists of HGVS formatted variants which can then be compared more easily.
Examples of VCF normalization and comparison tools are vcfeval, part of RTG Tools [21], vt normalize [22], and Best Alignment Normalisation (BAN) [23]. Tools that can convert VCF files into HGVS variant descriptions include the hgvs python module [24] and VariantValidator [25]. It should be noted that none of these tools are perfect, partly because the current HGVS recommendations for describing more complex variants are not unequivocal. In the VCF format, there are no strict rules regarding when to describe variants independently and when as one variant. As variant callers in use in NGS software pipelines rarely call deletion-insertion events but instead prefer calling multiple consecutive variants, VCF files often contain variants found directly next to each other or in very close proximity to each other (also see HGVS below).
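The two normalization problems described above, merged ALT values and the direction of alignment, can be illustrated with a minimal sketch. It is not a replacement for the tools listed; the function names are hypothetical and the code handles only plain nucleotide alleles. The first function splits a merged record and trims each allele to its shortest form while keeping at least one nucleotide in REF and ALT. The second slides a deleted interval through a repeat, either as far 3′ as possible (the HGVS convention) or as far 5′ as possible (the usual VCF convention); it uses 0-based, half-open coordinates, which is an assumption of this example rather than of either standard.

```python
def decompose(pos, ref, alts):
    """Split a merged VCF record into trimmed (POS, REF, ALT) tuples."""
    records = []
    for alt in alts.split(","):
        p, r, a = pos, ref, alt
        # Remove shared trailing nucleotides, keeping one on each side.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        # Remove shared leading nucleotides, moving the position along.
        while len(r) > 1 and len(a) > 1 and r[0] == a[0]:
            r, a, p = r[1:], a[1:], p + 1
        records.append((p, r, a))
    return records


def shift_deletion(seq, start, end, direction):
    """Slide the deleted interval seq[start:end] through a repeat.

    `direction` is "3'" for the HGVS placement or "5'" for the VCF placement.
    """
    if direction == "3'":
        while end < len(seq) and seq[start] == seq[end]:
            start, end = start + 1, end + 1
    else:
        while start > 0 and seq[start - 1] == seq[end - 1]:
            start, end = start - 1, end - 1
    return start, end
```

For the merged record with POS 1, REF “AG,” and ALT “TG,AGG,” decompose returns a substitution (1, A, T) and an insertion of a G after position 1 (1, A, AG), the latter being the duplication of the G described above. In the sequence ACGTTTC, deleting any one of the three T nucleotides gives the same result; shift_deletion places the deletion on the last T when shifting 3′ and on the first T when shifting 5′.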

The Human Genome Variation Society nomenclature

Since the HGVS nomenclature was first described in 2000, it has been widely adopted as the human-readable standard for genetic variation. The HGVS nomenclature aims to remove ambiguity in variant descriptions to improve variant reporting in databases, literature, and genetic test reports. The nomenclature defines detailed rules for describing variants on DNA level (genomic and transcript), RNA level, and protein level. Recommendations include complex cases such as RNA fusion transcripts, chromosome translocations, and how to describe variants that have not been determined exactly down to the sequence level. The latest version of the HGVS nomenclature can be found online at http://varnomen.hgvs.org/.

The basic structure of an HGVS variant description is (reference sequence):(numbering scheme).(variant). Reference sequence identifiers should always include version numbers where available, and the numbering scheme indicates the type of reference sequence used (e.g., “g.” or “c.”). For example, NC_000015.9:g.40699841G>C describes a substitution of a G to C at genomic position 40699841 of reference sequence NC_000015.9. The numbering schemes that are allowed depend on the reference sequence given. For an overview of the HGVS numbering schemes, see Table 2.3. For an overview of the most common variant type descriptions, see Table 2.4. Note that we will not go into detail here on, for example, how to describe variants relative to a coding DNA reference sequence in 5′ or 3′ UTRs, exons and introns, or on the protein level. For this, please consult the HGVS variant nomenclature website.

Table 2.3 Overview of the HGVS numbering schemes.

Indicator | Usage | Example
g. | Genomic, noncircular reference sequences. Counting starts at the first nucleotide. | NC_000015.9:g.40699841G>C
m. | Genomic, circular reference sequences. Counting starts at the first nucleotide. | NC_012920.1:m.3243A>G
n. | Noncoding transcript reference sequences. Counting starts at the first nucleotide. | NR_002725.2:n.1725_1726insA
c. | Coding transcript reference sequences. Counting starts at the first nucleotide of the translation initiation codon (ATG). | NM_002225.3:c.158G>C
r. | RNA reference sequences. Counting starts at the first nucleotide for noncoding RNA, and at the first nucleotide of the translation initiation codon (AUG) for coding RNA. | NM_002225.3:r.154_243del
p. | Protein reference sequences. Counting starts at the first amino acid. | NP_002216.2:p.Leu52_Arg81del

Table 2.4 Examples of the most common DNA variant descriptions using the HGVS nomenclature and the VCF file format. Note that the HGVS rules for variants on DNA level are different from the rules for variants on RNA and protein levels.

Variant type | Sequence (original → changed) | HGVS description | VCF file format (POS REF ALT)(a)
Substitution | AACGTT → AACCTT | g.4G>C | 4 G C
Deletion | AACGTT → AAC-TT | g.4del | 3 CG C
Insertion | AAC-GTT → AACAGTT | g.3_4insA | 3 C CA
Duplication | AAC-GTT → AACCGTT | g.3dup | 3 C CC
Deletion-insertion | AACGTT → AATATT | g.3_4delinsTA | 3 CG TA
Inversion | AACGTT → ACGTTT | g.2_4inv | 2 ACG CGT

(a) This column shows the POS, REF, and ALT fields, respectively. Other fields like the CHROM field (storing the chromosome) and the ID field (where a possible dbSNP identifier can be stored) are removed for simplification. Also, the simplest representation of the variant is chosen; in reality, the VCF file format can describe the same variant in many ways.
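The relation between the last two columns of Table 2.4 can be sketched in code. The function below is a hypothetical, deliberately limited illustration of how the POS, REF, and ALT fields map onto the variant part of a g. description; it is not how the hgvs module or VariantValidator work internally. It assumes a short reference sequence held in memory, 1-based positions, and simple alleles, it omits the reference sequence identifier, and it does not shift variants in repeated sequences, so its output is only a valid HGVS description when no shifting is needed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hgvs_from_vcf(seq, pos, ref, alt):
    """Build the variant part of a g. description from POS, REF and ALT."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    # Drop shared leading nucleotides (the VCF anchor), then trailing ones.
    while ref and alt and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    last = pos + len(ref) - 1
    span = str(pos) if len(ref) == 1 else f"{pos}_{last}"

    if len(ref) == 1 and len(alt) == 1:
        return f"g.{pos}{ref}>{alt}"
    if not alt:
        return f"g.{span}del"
    if not ref:
        # An insertion that copies the preceding nucleotides is a duplication.
        copy_start = pos - len(alt)
        if copy_start >= 1 and seq[copy_start - 1:pos - 1] == alt:
            dup = str(copy_start) if len(alt) == 1 else f"{copy_start}_{pos - 1}"
            return f"g.{dup}dup"
        return f"g.{pos - 1}_{pos}ins{alt}"
    if len(ref) > 1 and alt == ref.translate(COMPLEMENT)[::-1]:
        return f"g.{span}inv"
    return f"g.{span}delins{alt}"
```

Given the reference sequence AACGTT, this sketch reproduces the HGVS description in each row of Table 2.4 from the corresponding VCF fields.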


In HGVS, there is only one correct way to describe a variant. To remove ambiguity in the description of variants in repeated sequences, the HGVS nomenclature uses the so-called 3′ rule, which states that any variant should be described at its most 3′ position possible. If a stretch of identical nucleotides is shortened by one, the 3′ rule states that the variant is described as if the last nucleotide has been deleted. In addition, HGVS nomenclature uses strict definitions per variant type as well as prioritization rules when several options would be possible. For instance, prioritization defines that a T to A change is described as a substitution and not as an inversion or a deletion-insertion. Unfortunately, the HGVS nomenclature guidelines are not used without error. Frequently observed errors include not applying the 3′ rule and incorrectly describing duplications as insertions [13]. Both partially derive from NGS pipelines where deletions and insertions are largely 5′ aligned, and where duplications cannot be defined in the most commonly used file format. Fortunately, several computational tools exist that can help describe variants correctly: the hgvs software package [24] for direct integration into bioinformatics projects written in the Python programming language, and the online Mutalyzer [26] and VariantValidator [25] tools, both of which can also be installed locally. The latter two tools provide a website interface for verifying variants one by one, a batch interface to verify a file with variants, and Application Programming Interfaces (APIs) that allow software to communicate with these tools. Although the HGVS nomenclature aims to provide one valid description for each variant, not all areas have yet been covered in great detail.
For instance, changing the sequence ACTG to TC can, following the current recommendations, be described as a deletioneinsertion event of the entire sequence (1_4delinsTC), or as a substitution of the first base followed by a deletion of the last two bases ([1A>T;3_4del]). Although describing this change as one variant seems obvious, variant callers in use in NGS software pipelines often choose for the latter and define two single variants. When data are then shared without allelic information, it is no longer clear whether these two variants are in cis or trans, and they can no longer be merged into one variant. In addition, when the consequences of such variants on protein level are reported, serious errors may occur [27]. When encountering two closely spaced variants, it is recommended to check whether they are on the same allele. If so, check for the combination in external sources like population frequency databases and gene variant databases.
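The 3′ rule and the ACTG to TC example can be made concrete with a small sketch. This is not the hgvs package, Mutalyzer, or VariantValidator; it is a simplified illustration that works on a plain string, ignores reference sequence identifiers, strands, and variant types other than deletions and deletion-insertions, and uses hypothetical function names.

```python
from __future__ import annotations


def shift_deletion_3prime(seq: str, start: int, end: int) -> tuple[int, int]:
    """Move a deletion of seq[start..end] (1-based, inclusive) to its most 3' position."""
    # The deletion can slide one base in the 3' direction whenever the first
    # deleted base equals the base just after the deleted stretch: deleting
    # either window yields the same resulting sequence.
    while end < len(seq) and seq[start - 1] == seq[end]:
        start += 1
        end += 1
    return start, end


def describe_deletion(start: int, end: int) -> str:
    """Format positions in an HGVS-like style (reference and prefix omitted)."""
    return f"{start}del" if start == end else f"{start}_{end}del"


def apply_delins(seq: str, start: int, end: int, inserted: str = "") -> str:
    """Replace seq[start..end] (1-based, inclusive) by `inserted`."""
    return seq[: start - 1] + inserted + seq[end:]


# One T removed from the T-stretch at positions 4-7 of GCATTTTGA.
reference = "GCATTTTGA"
print(describe_deletion(*shift_deletion_3prime(reference, 4, 4)))  # 7del

# ACTG -> TC described as one deletion-insertion ...
as_one_variant = apply_delins("ACTG", 1, 4, "TC")
# ... or as [1A>T;3_4del], applied from 3' to 5' so positions stay valid.
as_two_variants = apply_delins(apply_delins("ACTG", 3, 4), 1, 1, "T")
print(as_one_variant, as_two_variants)  # TC TC
```

Both descriptions produce the same sequence only when the two changes are applied to the same allele, which is why the allelic (cis/trans) information matters once the event has been split.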

Variant classification

To reach a clinical classification of a variant, i.e., to draw a conclusion regarding the effect of the variant on the health of an individual, one has to combine all available knowledge. This knowledge has two major components: all observations of the variant in individuals with or without the associated phenotype, and the interpretation of the (predicted) consequences of the variant for the function of the gene (the functional, or molecular, classification). To discriminate between the effect of a variant on the function of a gene and its consequences for the individual carrying the variant, the HGVS recommends clearly separating the functional classification from the clinical classification.

Functional classification

Functional classification of a variant can only be done in an animal model or by performing a functional assay, in which the function of the gene with the variant is compared with that of the wild-type form of the gene. A very simple, semifunctional assay that is usually neglected is the analysis of an RNA sample from the patient. This analysis can provide valuable insights into the possible effects of a variant on RNA processing, especially splicing, and therefore into the probable effect of the variant on the gene’s function.

Actual functional assays are often difficult and costly to perform. Firstly, a clear idea of the function of the gene is required. Secondly, cell types must be available in which the gene is expressed and in which the consequences of variants can be measured. For several relatively common diseases, e.g., breast cancer [28] or colon cancer [29], functional assays have been developed (see Chapter 8 for more details); these can be applied when a new variant is encountered and no data from other studies are available. Although functional assays cannot give direct evidence regarding the consequences of a variant in a patient, they do provide evidence for a weighted clinical classification.

For functional classifications, there is currently no standard that is broadly followed across different areas of research. Assays measuring the function of genes affected by certain variants commonly report relative efficiency, expressed as a percentage of the wild-type gene’s activity [28,30]. For pharmacogenetics, a four-class system describing the efficiency of drug metabolism is now common: Poor, Intermediate, Extensive/Normal, and Ultrarapid [31,32]. The HGVS proposed a 5-tier functional classification, implemented by the LOVD databases: Affects function, Probably affects function, Effect unknown, Probably does not affect function, and Does not affect function. It should be noted that “affects function” includes “improved function” (e.g., increased enzymatic activity), which may give a protective effect, a feature most classification systems are not able to cope with.

Clinical classification

Genome diagnostic laboratories and researchers have broadly accepted the use of a standard 5-tier scheme for classifying variants [33]: Benign, Likely benign, Variant of uncertain significance (VUS), Likely pathogenic, and Pathogenic, also described as Class 1 through 5, respectively. Although this system standardized the naming of the different variant classifications, it did not cover what evidence would be required to place a variant in each category. This issue was tackled by the ACMG/AMP classification guidelines, published in 2015 [34], which give detailed recommendations on how to build up the evidence to classify a variant into one of these five categories. The recommendations clearly fulfilled a need and were quickly adopted, greatly improving the comparability of classifications made by laboratories worldwide (see Chapter 3). Several improvements to the ACMG/AMP guidelines have been published since, as some of the original specifications were open to different interpretations. Additional modifications were sometimes required to apply the guidelines to certain genes and/or diseases [35–38]. When clinically classifying variants, it is highly recommended to use the ACMG/AMP guidelines.

Although a one-to-one relationship between a functional and a clinical classification seems obvious, there are many exceptions. It is clear that when a variant does not alter the function of a gene, the health of the individual will not be affected either. The opposite, however, is not always true. Whether a “pathogenic” variant will cause a disease depends first on the mode of inheritance, including dominant and recessive, autosomal and X/Y-linked, mitochondrial, imprinting, etc. In some cases, variants are “pathogenic” only in a compound heterozygous state, while they have no effect in a homozygous state. The effect of a variant on the function of a gene is also rarely “all or nothing.” In many cases the function is decreased, but not to zero. In pharmacogenomics, variants are cataloged that increase or decrease enzyme function, thereby affecting the extent to which an individual is able to metabolize chemicals (medicine), and therefore how effective a certain drug dosage is or whether the drug is effective at all. A variant increasing enzyme activity, e.g., a full gene duplication, is “protective” (compensating) when combined with a variant decreasing enzyme activity. The same variant should, however, be considered “deleterious” in a homozygous state when the enzyme has to metabolize the drug to generate the active substance or when it removes the substance from the body too fast.

Another reason for discordance between functional and clinical classifications of a variant is penetrance. A variant may clearly affect the function of a gene, but may cause disease only in a subset of the individuals in which the change has been found. One example is variants in the BRCA1 or BRCA2 genes, each increasing the risk of developing breast cancer before a certain age, yet some conferring a much higher risk than others, e.g., the low-penetrance BRCA1 p.Arg1699Gln variant [39]. A recent study [20] shows that when clearly pathogenic variants with reduced penetrance are classified by different labs, some will classify them as Class 5 (Pathogenic) while others classify them as Class 3 (VUS). Another example is the many nondisease phenotypes, including eye color, the ability to taste bitter, or blood group types. While any variant in the ABO gene would be classified as benign (Class 1), the variant is clearly of medical relevance when the individual needs a blood transfusion. Another problem is the gray zone between a disease and a trait, where the same variant would be clinically classified as Class 5 for a disease but as Class 1 for a trait.

There are currently no guidelines on how to deal with reduced penetrance in clinical classification. However, the ClinGen project has established a Low Penetrance/Risk Allele working group that has begun investigating researchers’ and clinicians’ opinions on the matter, and the ENIGMA consortium, which classifies breast cancer variants, has published recommendations [40] that include always reporting variant-associated risks as absolute measurements. To make the overall classification more informative, the “Global Variome shared LOVD” [41] has started to work with classifications including “pathogenic (dominant),” “pathogenic (recessive),” “pathogenic (maternal),” “pathogenic (paternal),” “pathogenic (!),” “association,” “VUS,” “VUS (!),” “benign (dominant),” “benign (recessive),” “benign (!),” etc. Dominant, recessive, maternal, and paternal are used to indicate the mode of inheritance. The “!” is used to warn of exceptional features such as reduced penetrance, a protective effect, no effect in a homozygous state, etc. Benign (dominant) and benign (recessive) are used to indicate associations with nondisease phenotypes.

Standards on reporting disorders and phenotypes

Describing and classifying a variant is only relevant in the context of a certain phenotype (disease). As such, standards for describing phenotypes are as important as standards for describing genetic variants. A clear description of the characteristic features observed in the individual investigated, following a standardized ontology, is crucial for elucidating gene-phenotype relationships, developing gene panels, and recruiting patients for clinical trials. Most genetic diseases that remain unresolved today are rare to ultrarare, with only a few patients known worldwide. A critical step in establishing a causative disease-gene link is always to identify more cases where variants in a gene give a similar phenotype.

The most widely used phenotype ontology is the Human Phenotype Ontology (HPO) [42]. HPO was developed specifically to facilitate automated phenotype matching. For this, a nested tree structure was defined in which deeper terms give an increasing level of detail on each specific phenotypic feature. HPO is actively updated, and community efforts have been initiated to translate HPO terms into different languages, an important step to further increase its value. One element of the very successful GeneMatcher initiative [43], built to identify patients with similar gene-phenotype properties, is the use of HPO-based phenotype matching. Finally, HPO allows phenotype matching across species, facilitating correlations between human disease and observations in animal models (e.g., mouse).

Another important standard is provided by OMIM (Online Mendelian Inheritance in Man) [44], which provides standardized disease names and descriptions of the main disease features observed. While HPO defines individually identifiable phenotypic features, OMIM focuses on disorders (diseases), in which these features are found in specific combinations. For instance, OMIM defines “Duchenne muscular dystrophy” as associated with pathogenic variants in the DMD gene, while HPO defines abnormalities such as “muscle weakness,” “difficulty climbing stairs,” and “scoliosis,” which among others make up the complete phenotype of Duchenne muscular dystrophy. As such, OMIM is suitable for diagnoses, while HPO is suitable for anamneses and automated phenotype matching.

Several tools are available for searching and collecting HPO terms and for suggesting disorders that match the given terms. Examples are Phenomizer [45] and PhenoTips [46], both web-based systems. When the underlying disorder is unknown, HPO terms can still be used for identifying genes for gene-panel-based exome data analysis, as with PanelApp [47], and for matching phenotypes when variants identified in patients are also found in external databases. When a diagnosis is possible, adding a disease identifier from OMIM supports submission to external databases such as LOVD or ClinVar and facilitates disease-oriented queries.
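How a nested term tree supports automated phenotype matching can be sketched as follows. The ontology below is a toy: its labels and parent links are illustrative assumptions and are not taken from HPO, and the Jaccard overlap used here is a deliberately simple stand-in for the semantic similarity measures that real tools apply.

```python
# Toy ontology, child -> parent. Labels and structure are hypothetical and
# only mimic the idea of deeper terms being more specific; this is not HPO.
TOY_PARENTS = {
    "phenotypic abnormality": None,
    "muscle abnormality": "phenotypic abnormality",
    "muscle weakness": "muscle abnormality",
    "proximal muscle weakness": "muscle weakness",
    "skeletal abnormality": "phenotypic abnormality",
    "scoliosis": "skeletal abnormality",
}


def with_ancestors(term):
    """Return the term together with all of its ancestors up to the root."""
    closure = set()
    while term is not None:
        closure.add(term)
        term = TOY_PARENTS[term]
    return closure


def profile_closure(terms):
    """Expand a patient's term list with every ancestor of every term."""
    expanded = set()
    for term in terms:
        expanded |= with_ancestors(term)
    return expanded


def jaccard_similarity(terms_a, terms_b):
    """Share of expanded terms that two phenotype profiles have in common."""
    a, b = profile_closure(terms_a), profile_closure(terms_b)
    return len(a & b) / len(a | b)


patient_a = ["proximal muscle weakness", "scoliosis"]
patient_b = ["muscle weakness"]
patient_c = ["scoliosis"]
print(jaccard_similarity(patient_a, patient_b))  # 0.5
print(jaccard_similarity(patient_b, patient_c))  # 0.2
```

Because every term is expanded with its ancestors, a patient annotated with a specific term still matches one annotated with a more general term on the same branch, which is the practical benefit of the tree structure.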

Challenges and considerations

Although NGS has successfully been implemented into the clinical workflow and the technology has continuously been improved since, challenges remain, and there are important caveats to consider. There are both technical and biological reasons for false-negative and false-positive results, and it is important to keep these in mind when analyzing NGS data.

The most apparent difference between analyzing data from NGS studies and from techniques such as Sanger sequencing is the sheer size of the genomic region covered. Naturally, technology has to do part of the data analysis for us, as it is not humanly possible to check every sequence read that is generated. However, there is also a major difference in the way results are presented, which is often overlooked. With Sanger sequencing, not having a proper sequence returned would simply mean a failed analysis. With today’s NGS results, a lack of variants found in a genomic region may mean that there are no variants, that the pipeline is not able to determine whether there are any variants, or that the region is missing (deleted) in the sample.

There are several reasons why the sequencing pipeline software may generate a false-negative result. The most common is a lack of coverage. Where there are not enough sequencing reads covering a certain region, the variant caller software can only produce low-confidence genotypes, or will not produce genotypes at all. It is good practice to prevent false positives by selecting only good-quality variant calls for analysis, by setting a threshold on the minimum number of reads required to report a variant. During this step, it is often overlooked that the expected number of reads on the sex chromosomes in males is only half of that in females, so flexible thresholds are required to prevent false negatives. A lack of coverage can also be the consequence of how the sequencing was performed, the quality of the sample, an incomplete or incorrect reference sequence, or reads mapping to multiple regions in the genome. Reads are often discarded when they map to multiple regions in the genome, such as the telomeres and centromeres, (large) repetitive sequences, and segmental duplications.

Finally, variants not present in all cells (mosaic variants), including heteroplasmic mitochondrial variants, are hard to detect because either the tissue sample sequenced does not harbor the variant, or the allele fraction is too low. When a variant is represented in only a small subset of reads, it is considered a likely sequencing error and is ignored or called with only low confidence [48].

Variants that are too large to be contained in one sequencing read are also often missed. Large CNVs, such as whole-exon or whole-gene deletions and duplications, can be observed as, respectively, a drop or a rise in the coverage of the affected region. They are nevertheless often left undetected by common variant calling methods, because the reads spanning the variant breakpoints do not align well to the reference genome and end up discarded, or because such reads are absent from the data altogether, as can be the case with whole-exome sequencing. Specialized tools have become available that specifically detect such large changes [49,50]. When using long-range single-molecule sequencing techniques, it is much easier to detect CNVs.
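The point about flexible read-depth thresholds, and about distinguishing "no variant" from "not callable", can be sketched as follows. The threshold of 20 reads, the function names, and the status labels are illustrative assumptions rather than recommended values, and the sketch ignores the pseudoautosomal regions, which remain diploid in males.

```python
SEX_CHROMOSOMES = {"X", "Y"}


def min_depth(chrom: str, sex: str, diploid_min: int = 20) -> int:
    """Minimum read depth required before a region is considered callable.

    `diploid_min` is a hypothetical threshold for regions present in two copies.
    """
    name = chrom[3:] if chrom.startswith("chr") else chrom
    # Males carry one copy of X and Y, so about half the reads are expected.
    if sex == "male" and name in SEX_CHROMOSOMES:
        return max(1, diploid_min // 2)
    return diploid_min


def region_status(chrom: str, sex: str, mean_depth: float, n_variants: int) -> str:
    """Separate 'nothing found' from 'nothing could have been found'."""
    if mean_depth < min_depth(chrom, sex):
        return "insufficient coverage"
    return "variant(s) called" if n_variants else "no variant detected"


# The same depth of 12 reads on chrX is adequate in a male sample
# but below the threshold in a female sample.
print(region_status("chrX", "male", 12, 0))    # no variant detected
print(region_status("chrX", "female", 12, 0))  # insufficient coverage
```

Reporting an explicit "insufficient coverage" status, instead of silently returning no variants, makes the difference with a true negative visible to the person interpreting the result.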
False-positive results can occur due to reads aligning to pseudogenes or duplicated regions such as the pseudoautosomal regions (PARs) on the X and Y chromosomes. It is not uncommon for variants to be detected on the Y chromosome in female individuals, or for heterozygous variants to be detected on the X chromosome in male individuals. Although rare genetic disorders can explain such findings, a far more common cause is sequencing reads aligning to the wrong chromosome in these PARs. The same holds for pseudogenes; variants detected in a gene could very well be variants belonging to the homologous region in the pseudogene. Variants derived from gene conversion, where the sequence of a pseudogene gets copied to the normal gene, are especially problematic since all variant reads will map to the pseudogene. In such cases, designing gene-specific primers and confirming variants by Sanger sequencing is essential to rule out false positives [48]. Another source of error is nonnormalized variant calling. When variants are not normalized before annotation is loaded, a common variant with a high frequency in the population might not be annotated as such because it has been described differently. Conversely, a causative variant might not immediately be recognized as such because its description does not match that of its record in gene variant databases. Also note that NGS analysis pipelines often split deletion-insertion events into multiple variants, possibly causing similar problems.
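Normalization of VCF-style records can be sketched as below. The function follows the trimming and left-alignment procedure described by Tan et al. [22] for a single alternative allele; the function name and the toy reference string are assumptions, and production pipelines would rely on an established tool and a real reference genome. Note that this normalization shifts variants to the 5′ side, the opposite of the HGVS 3′ rule.

```python
from __future__ import annotations


def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim and left-align one REF/ALT pair; `pos` is the 1-based POS of REF."""
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    changed = True
    while changed:
        changed = False
        # Drop a shared last base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An allele became empty: extend both with the preceding reference base.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "GCATTTTGA"
# Two different descriptions of the same single-T deletion ...
print(normalize(reference, 6, "TT", "T"))        # (3, 'AT', 'A')
print(normalize(reference, 3, "ATTTT", "ATTT"))  # (3, 'AT', 'A')
# ... and a padded substitution reduced to a single base.
print(normalize(reference, 2, "CAT", "CGT"))     # (3, 'A', 'G')
```

Once every record has been reduced to the same canonical form, the (position, REF, ALT) triple can serve as a lookup key against population frequency or gene variant databases that were normalized in the same way.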

Conclusions

In this chapter, we introduced genetic variation and how NGS analyses measure it. We discussed the different kinds of variants that exist on the DNA, RNA, and protein levels, their possible locations, and how they can influence a gene’s function at many different levels, including transcription, RNA processing, translation, protein processing, and protein modification. We listed the relevant standards for describing, interpreting, and reporting genetic variants and phenotypes, described the relationships and differences between these standards, explained their importance, their current status, and where improvement is needed, and mentioned some caveats to keep in mind when describing, classifying, and reporting variants and phenotypes. Finally, we explained which technical and biological factors can lead to false-negative and false-positive results and suggested some solutions to these problems.

References

[1] Scally A. The mutation rate in human evolution and demographic inference. Curr Opin Genet Dev 2016;41:36–43. https://doi.org/10.1016/j.gde.2016.07.008.
[2] Kong A, Frigge ML, Masson G, et al. Rate of de novo mutations and the importance of father's age to disease risk. Nature 2012;488(7412):471–5. https://doi.org/10.1038/nature11396.
[3] Francioli LC, Menelaou A, Pulit SL, et al. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014;46(8):818–25. https://doi.org/10.1038/ng.3021.
[4] Steri M, Idda ML, Whalen MB, Orrù V. Genetic variants in mRNA untranslated regions. Wiley Interdiscip Rev RNA 2018;9(4). https://doi.org/10.1002/wrna.1474.
[5] De Angioletti M, Lacerra G, Sabato V, Carestia C. β+45 G→C: a novel silent β-thalassaemia mutation, the first in the Kozak sequence. Br J Haematol 2004;124(2):224–31. https://doi.org/10.1046/j.1365-2141.2003.04754.x.
[6] Aartsma-Rus A, Van Deutekom JCT, Fokkema IF, Van Ommen GJB, Den Dunnen JT. Entries in the Leiden Duchenne muscular dystrophy mutation database: an overview of mutation types and paradoxical cases that confirm the reading-frame rule. Muscle Nerve 2006;34(2):135–44. https://doi.org/10.1002/mus.20586.
[7] Gisler FM, Von Kanel T, Kraemer R, Schaller A, Gallati S. Identification of SNPs in the cystic fibrosis interactome influencing pulmonary progression in cystic fibrosis. Eur J Hum Genet 2013;21(4):397–403. https://doi.org/10.1038/ejhg.2012.181.
[8] Anna A, Monika G. Splicing mutations in human genetic disorders: examples, detection, and confirmation. J Appl Genet 2018;59(3):253–68. https://doi.org/10.1007/s13353-018-0444-7.
[9] McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F. Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics 2010;26(16):2069–70. https://doi.org/10.1093/bioinformatics/btq330.
[10] Di Leo E, Panico F, Tarugi P, Battisti C, Federico A, Calandra S. A point mutation in the lariat branch point of intron 6 of NPC1 as the cause of abnormal pre-mRNA splicing in Niemann-Pick type C disease. Hum Mutat 2004. https://doi.org/10.1002/humu.9287.
[11] Zoghbi HY, Beaudet AL. Epigenetics and human disease. Cold Spring Harb Perspect Biol 2016. https://doi.org/10.1101/cshperspect.a019497.
[12] Kung C-P, Maggi LB, Weber JD. The role of RNA editing in cancer development and metabolic disorders. Front Endocrinol 2018;9. https://doi.org/10.3389/fendo.2018.00762.
[13] Deans ZC, Fairley JA, den Dunnen JT, Clark C. HGVS nomenclature in practice: an example from the United Kingdom national external quality assessment scheme. Hum Mutat 2016;37(6):576–8. https://doi.org/10.1002/humu.22978.
[14] Braschi B, Denny P, Gray K, et al. Genenames.org: the HGNC and VGNC resources in 2019. Nucleic Acids Res 2019;47(D1):D786–92. https://doi.org/10.1093/nar/gky930.
[15] Sayers EW, Cavanaugh M, Clark K, Ostell J, Pruitt KD, Karsch-Mizrachi I. GenBank. Nucleic Acids Res 2019. https://doi.org/10.1093/nar/gky989.

[16] Yates AD, Achuthan P, Akanni W, et al. Ensembl 2020. Nucleic Acids Res 2020. https://doi.org/10.1093/nar/gkz966.
[17] MacArthur JAL, Morales J, Tully RE, et al. Locus reference genomic: reference sequences for the reporting of clinically relevant sequence variants. Nucleic Acids Res 2014. https://doi.org/10.1093/nar/gkt1198.
[18] Danecek P, Auton A, Abecasis G, et al. The variant call format and VCFtools. Bioinformatics 2011;27(15):2156–8. https://doi.org/10.1093/bioinformatics/btr330.
[19] den Dunnen JT, Dalgleish R, Maglott DR, et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum Mutat 2016;37(6):564–9. https://doi.org/10.1002/humu.22981.
[20] Fokkema IFAC, van der Velde KJ, Slofstra MK, et al. Dutch genome diagnostic laboratories accelerated and improved variant interpretation and increased accuracy by sharing data. Hum Mutat 2019. https://doi.org/10.1002/humu.23896.
[21] Cleary JG, Braithwaite R, Gaastra K, et al. Comparing variant call files for performance benchmarking of next-generation sequencing variant calling pipelines. Cold Spring Harbor Labs J 2015. https://doi.org/10.1101/023754.
[22] Tan A, Abecasis GR, Kang HM. Unified representation of genetic variants. Bioinformatics 2015. https://doi.org/10.1093/bioinformatics/btv112.
[23] Bayat A, Gaëta B, Ignjatovic A, Parameswaran S. Improved VCF normalization for accurate VCF comparison. Bioinformatics 2017. https://doi.org/10.1093/bioinformatics/btw748.
[24] Wang M, Callenberg KM, Dalgleish R, et al. hgvs: a Python package for manipulating sequence variants using HGVS nomenclature: 2018 update. Hum Mutat 2018;39(12):1803–13. https://doi.org/10.1002/humu.23615.
[25] Freeman PJ, Hart RK, Gretton LJ, Brookes AJ, Dalgleish R. VariantValidator: accurate validation, mapping, and formatting of sequence variation descriptions. Hum Mutat 2018;39(1):61–8. https://doi.org/10.1002/humu.23348.
[26] Wildeman M, van Ophuizen E, den Dunnen JT, Taschner PEM. Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker. Hum Mutat 2008;29(1):6–13. https://doi.org/10.1002/humu.20654.
[27] Spooner W, McLaren W, Slidel T, et al. Haplosaurus computes protein haplotypes for use in precision drug design. Nat Commun 2018;9(1). https://doi.org/10.1038/s41467-018-06542-1.
[28] Millot GA, Carvalho MA, Caputo SM, et al. A guide for functional analysis of BRCA1 variants of uncertain significance. Hum Mutat 2012;33(11):1526–37. https://doi.org/10.1002/humu.22150.
[29] Drost M, Tiersma Y, Thompson BA, et al. A functional assay-based procedure to classify mismatch repair gene variants in Lynch syndrome. Genet Med 2019;21(7):1486–96. https://doi.org/10.1038/s41436-018-0372-2.
[30] Boonen RACM, Rodrigue A, Stoepker C, et al. Functional analysis of genetic variants in the high-risk breast cancer susceptibility gene PALB2. Nat Commun 2019;10(1):5296. https://doi.org/10.1038/s41467-019-13194-2.
[31] Daly AK. Pharmacogenetics: a general review on progress to date. Br Med Bull 2017;124(1):65–79. https://doi.org/10.1093/bmb/ldx035.
[32] Wendt FR, Sajantila A, Moura-Neto RS, Woerner AE, Budowle B. Full-gene haplotypes refine CYP2D6 metabolizer phenotype inferences. Int J Leg Med 2018;132(4):1007–24. https://doi.org/10.1007/s00414-017-1709-0.
[33] Plon SE, Eccles DM, Easton D, et al. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat 2008;29(11):1282–91. https://doi.org/10.1002/humu.20880.

[34] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24. https://doi.org/10.1038/gim.2015.30.
[35] Nykamp K, Anderson M, Powers M, et al. Sherloc: a comprehensive refinement of the ACMG-AMP variant classification criteria. Genet Med 2017;19(10):1105–17. https://doi.org/10.1038/gim.2017.37.
[36] Gelb BD, Cave H, Dillon MW, et al. ClinGen’s RASopathy expert panel consensus methods for variant interpretation. Genet Med 2018;20(11):1334–45. https://doi.org/10.1038/gim.2018.3.
[37] Kelly MA, Caleshu C, Morales A, et al. Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s inherited cardiomyopathy expert panel. Genet Med 2018;20(3):351–9. https://doi.org/10.1038/gim.2017.218.
[38] Romanet P, Odou M-F, North M-O, et al. Proposition of adjustments to the ACMG-AMP framework for the interpretation of MEN1 missense variants. Hum Mutat 2019;40(6):661–74. https://doi.org/10.1002/humu.23746.
[39] Moghadasi S, Meeks HD, Vreeswijk MP, et al. The BRCA1 c.5096G>A p.Arg1699Gln (R1699Q) intermediate risk variant: breast and ovarian cancer risk estimation and recommendations for clinical management from the ENIGMA consortium. J Med Genet 2018;55(1):15–20. https://doi.org/10.1136/jmedgenet-2017-104560.
[40] Spurdle AB, Greville-Heygate S, Antoniou AC, et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J Med Genet 2019;56(6):347–57. https://doi.org/10.1136/jmedgenet-2018-105872.
[41] Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 2011;32(5):557–63. https://doi.org/10.1002/humu.21438.
[42] Köhler S, Carmody L, Vasilevsky N, et al. Expansion of the human phenotype ontology (HPO) knowledge base and resources. Nucleic Acids Res 2019;47(D1):D1018–27. https://doi.org/10.1093/nar/gky1105.
[43] Bruel A-L, Vitobello A, Mau-Them FT, et al. 2.5 years’ experience of GeneMatcher data-sharing: a powerful tool for identifying new genes responsible for rare diseases. Genet Med 2019;21(7):1657–61. https://doi.org/10.1038/s41436-018-0383-z.
[44] Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res 2019;47(D1):D1038–43. https://doi.org/10.1093/nar/gky1151.
[45] Köhler S, Schulz MH, Krawitz P, et al. Clinical diagnostics in human genetics with semantic similarity searches in ontologies. Am J Hum Genet 2009;85(4):457–64. https://doi.org/10.1016/j.ajhg.2009.09.003.
[46] Girdea M, Dumitriu S, Fiume M, et al. PhenoTips: patient phenotyping software for clinical and research use. Hum Mutat 2013;34(8):1057–65. https://doi.org/10.1002/humu.22347.
[47] Martin AR, Williams E, Foulger RE, et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels. Nat Genet 2019;51(11):1560–5. https://doi.org/10.1038/s41588-019-0528-2.
[48] Santani A, Murrell J, Funke B, et al. Development and validation of targeted next-generation sequencing panels for detection of germline variants in inherited diseases. Arch Pathol Lab Med 2017;141(6):787–97. https://doi.org/10.5858/arpa.2016-0517-RA.
[49] Hehir-Kwa JY, Tops BBJ, Kemmeren P. The clinical implementation of copy number detection in the age of next-generation sequencing. Expert Rev Mol Diagn 2018;18(10):907–15. https://doi.org/10.1080/14737159.2018.1523723.
[50] Roca I, González-Castro L, Fernández H, Couce ML, Fernández-Marmiesse A. Free-access copy-number variant detection tools for targeted next-generation sequencing data. Mutat Res n.d.;779:114–25. https://doi.org/10.1016/j.mrrev.2019.02.005.

CHAPTER 3

International consensus guidelines for constitutional sequence variant interpretation

Steven M. Harrison1, Tina F. Pesaran2, Jessica L. Mester3

1 Broad Institute of MIT and Harvard, Cambridge, MA, United States; 2 Ambry Genetics, Aliso Viejo, CA, United States; 3 GeneDx, Gaithersburg, MD, United States

Historical variant interpretation approaches Testing individuals for hereditary diseases first became feasible primarily via researchers who focused on one disease of interest. A small number of individuals with a distinct and severe phenotype were typically among the first selected for testing, many of whom were members of families that had participated in linkage studies to help identify the specific gene being interrogated. Thus the a priori risk to identify the disease-causing germ line variant was high, leading researchers to safely assume in most cases that an identified variant was pathogenic. Early guidance recommended a small number (by today’s standards) of healthy controlsd50 individuals (equaling 100 chromosomes)dalso be sequenced to aid in the identification of common “polymorphic” variants not associated with disease [1]. Missense variants were soon recognized as being especially problematic [2], and Cotton et al. [1] delineated several types of evidence that could prove helpful in understanding a missense variant’s role in disease causation, including segregation analysis, nature of the amino acid substitution, and functional assays. When genetic testing moved from research settings to being offered as a fee-based service, the population of individuals undergoing testing broadened to include patients with less-specific phenotypes and unaffected individuals with a family history of disease. Testing a larger and lower-risk population led to the identification of novel variants with less certain pathogenic status, placing greater importance on having a thoughtful process by which to classify variants. As the population of individuals undergoing testing evolved, so too did the terminology used to describe the impact of identified variants. 
Originally, “mutation” was understood to mean disease-causing, and the term “polymorphism” was used to describe a variant present in at least 1% of the general population and thus believed to not be disease-causing due to its high prevalence [1]. However, some disease-causing variants are present at high frequencies in specific populations, muddying these definitions [3]. Laboratories also took to developing their own terminology for variant classification, leading to multiple terms being used to describe the same variant. The same disease-causing variant might have been called “deleterious,” “pathogenic,” “mutation,” or “positive”; a harmless variant might have been defined as “benign,” “polymorphism,” or “neutral”; and variants without enough evidence to classify in Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00005-3 Copyright © 2021 Elsevier Inc. All rights reserved.

29

30

Chapter 3 International consensus guidelines

either direction might have been referred to as “unknown,” “uncertain,” or “unclassified” [4]. Just as the terminology used to categorize variants might differ among laboratories, in the absence of unifying standards, so too did the degree of certainty or amount of evidence required to classify a variant as disease-causing or harmless. The American College of Medical Genetics (ACMG) provided recommendations for sequence variant interpretation based foremost on the type of variant identified and whether the variant had been previously reported as a “recognized” cause of disease, and highlighted the utility of follow-up family testing and functional studies to identify additional helpful evidence toward classification [5,6]. However, these guidelines did not define what degree of certainty was required to classify a variant as disease-causing or harmless, how much weight should be assigned to different types of evidence, or how to combine different pieces of evidence to arrive at a classification. The first effort toward combining evidence types into a multifactorial likelihood model was published in 2004 by Goldgar et al. [7], whose model incorporated several clinical data points together with conservation and functional data to arrive at likelihood ratios the authors deemed high enough to support (1000:1) or refute (100:1) causality for the BRCA1 and BRCA2 hereditary breast and ovarian cancer susceptibility genes. A major step forward in unifying variant classification terminology occurred in 2008 with the publication of a 5-tier classification system developed by the IARC Unclassified Genetic Variants Working Group [8]. Developed within a framework of hereditary cancer susceptibility testing, this system not only proposed unifying terminology for variant classification, but also defined probabilities of pathogenicity necessary to achieve each level of classification. Slight shifts from the previous degrees of certainty proposed by Goldgar et al. 
to support causality (from 99.9% to 99%) or refute causality (from 99% to 99.9%) were adopted, with probabilities of pathogenicity added for the “likely pathogenic” and “likely not pathogenic” categories. These quantitative methods for variant categorization and assessment were adopted by the ENIGMA (Evidence-based Network for the Interpretation of Germline Mutant Alleles) consortium, an international group of researchers and clinicians initiated in 2009 to further variant classification efforts for BRCA1 and BRCA2, the genes associated with Hereditary Breast and Ovarian Cancer syndrome (HBOC) [9]. InSiGHT (International Society for Gastrointestinal Hereditary Tumors) also applied quantitative strategies to classify variants in the mismatch repair genes associated with Lynch syndrome, along with a qualitative system that can be used in the absence of data supporting a quantitative, multifactorial analysis [10]. Although gene-specific quantitative approaches to variant classification are ideal, the expertise and amount of data necessary for their development are rate-limiting factors for thousands of other rarer and newly described genetic disorders. This chapter will provide an overview of current variant classification practices and ongoing efforts to enable gene-specific, data-driven classification strategies.

Current variant classification practices: the 2015 ACMG/AMP guideline for sequence variant interpretation

Background and scope

As noted above, previous ACMG recommendations for sequence variant interpretation provided interpretative categories of sequence variants (such as sequence variation is previously unreported and is a recognized cause of the disorder), but did not define terminology able to succinctly describe each category or provide detailed variant classification guidance [5,6]. Because evolving sequencing technology increased the number of variants requiring interpretation, and because variant interpretation varied between laboratories, ACMG partnered with the Association for Molecular Pathology (AMP) to revise and publish the 2015 guideline [11]. This updated guideline recommends specific standard terminology for classifying sequence variants and provides a process for determining the appropriate classification term. The process and framework defined in the 2015 ACMG/AMP guideline are intended for determining the clinical significance of a sequence variant in a gene with an established role in a Mendelian disorder and are applicable to all sequence variants regardless of testing methodology (i.e., genotyping, single gene, gene panel, exome, or genome). The 2015 ACMG/AMP guideline specifically notes that the framework is not intended for the interpretation of somatic variants, pharmacogenomic variants, or variants associated with non-Mendelian disorders. Additionally, the guideline is not intended for variants in candidate genes or genes that have no known association with human disease. The guideline was subsequently adopted for use by the Association for Clinical Genetic Science (United Kingdom, https://www.acgs.uk.com/media/10847/acgs_consensus_statement_on_adoption_of_acmg_guidelines__1_.pdf) and is widely used by other international laboratories [12].

Variant classification terminology

With regard to classification terminology for sequence variants, the 2015 ACMG/AMP guideline recommends a 5-tier classification system for variants relevant to Mendelian disease: pathogenic (P), likely pathogenic (LP), uncertain significance (VUS), likely benign (LB), and benign (B). The 2015 ACMG/AMP guideline recommends that terms such as “mutation” and “polymorphism” not be used, as the terms may incorrectly be assumed to correlate with pathogenic and benign, respectively. Instead, the guideline recommends use of the term “variant” accompanied by one of the five classification terms listed above. If desired, laboratories may also create additional sub-tiers to further differentiate classification terms, such as “variant of uncertain significance–favor pathogenic” or “variant of uncertain significance–favor benign” as subclassifications for variants of uncertain significance. Given the range of confidence that could be described by the term “likely,” the 2015 guideline proposes “that the terms ‘likely pathogenic’ and ‘likely benign’ be used to mean greater than 90% certainty of a variant either being disease causing or benign to provide laboratories with a common, albeit arbitrary, definition” [11]. This 90% threshold for likely pathogenic is lower than the 95% threshold proposed in the IARC guideline because the ACMG/AMP committee felt that 90% confidence in pathogenicity was high enough to warrant medical management changes based on results and to avoid frequent reclassifications from LP to VUS [8,11]. The 2015 guideline does not specify pathogenicity confidence for pathogenic or benign classifications; however, as seen in Table 3.1, the genetics community has assumed the pathogenicity confidence thresholds for pathogenic and benign classifications are > 0.99 and < 0.001, respectively [8,13].
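The thresholds described above can be sketched as a small lookup. This is a minimal illustration, not part of either guideline: the function name `classify` and its parameters are hypothetical, and how values lying exactly on a cutoff are handled is an assumption of the sketch. The defaults follow the ACMG/AMP 90% definition of “likely”; passing 0.95 and 0.05 gives the IARC cutoffs instead.

```python
def classify(probability, likely_pathogenic=0.90, likely_benign=0.10):
    """Map a probability of pathogenicity to a five-tier class.

    Defaults follow the ACMG/AMP 90% definition of "likely"; the outer
    0.99 and 0.001 cutoffs are the community-assumed values cited in the
    text. Boundary handling (> versus >=) is an assumption of this sketch.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability > 0.99:
        return "pathogenic"
    if probability >= likely_pathogenic:
        return "likely pathogenic"
    if probability < 0.001:
        return "benign"
    if probability < likely_benign:
        return "likely benign"
    return "uncertain significance"


if __name__ == "__main__":
    # The same probability lands in different tiers under the two schemes.
    print(classify(0.93))              # ACMG/AMP cutoffs
    print(classify(0.93, 0.95, 0.05))  # IARC cutoffs
```

A probability of 0.93 illustrates the practical difference between the two definitions: it is likely pathogenic under the 90% cutoff but remains of uncertain significance under the 95% cutoff.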

Evidence criteria and application

To standardize variant classification, the 2015 ACMG/AMP guideline outlines 28 types of evidence, or criteria, that are encountered during variant assessment. Each criterion is assigned a direction, either pathogenic (P) or benign (B), and a relative strength: very strong (VS), strong (S), moderate (M), supporting (P), or stand-alone (A). Combination of the direction and relative strength creates an


Table 3.1 ACMG/AMP variant classification terminology and probabilities of pathogenicity.

Classification terminology    Probability of pathogenicity
Pathogenic                    >99%
Likely pathogenic             90%–99%
Uncertain                     10%–90%
Likely benign                 0.1%–10%
Benign                        <0.1%

0.99

Yes

4: Likely pathogenic    0.95–0.99 (0.90–0.99)
3: Uncertain significance    0.05–0.949 (0.10–0.89)

Yes

2: Likely benign/Likely little clinical significance    0.001–0.049 (0.001–0.09)

No

1: Benign/Little clinical significance

1 denotes a positive predictor for pathogenicity, and 0.1 and 10 illustrates the magnitude of the likelihood ratio against or for pathogenicity, respectively. BC, breast cancer; CIMRA, complete in vitro mismatch repair activity; CRC, colorectal cancer; DCIS, ductal carcinoma in situ; EC, endometrial cancer; ER, estrogen receptor; GVGD, Grantham variation and Grantham deviation; HER2, human epidermal growth factor receptor 2; IHC, immunohistochemistry; LOH, loss of heterozygosity; LR, likelihood ratio; MMR, mismatch repair; MSI, microsatellite instability; OC, ovarian cancer; TN, triple negative; VRF, variant allele read fraction; yrs, age at diagnosis in years.

where Data is a type of observational data. An LR of 1 is uninformative; an LR > 1 favors a pathogenic interpretation, and an LR < 1 favors benign status. The formulas for converting between odds (LR) and probabilities (P) of pathogenicity are:

\[ \mathrm{LR} = \frac{P}{1 - P}, \qquad P = \frac{\mathrm{LR}}{1 + \mathrm{LR}} \]
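The two conversions can be written directly as code. This is a minimal sketch; the function names are illustrative and do not come from any published tool.

```python
def probability_to_odds(p):
    """Convert a probability of pathogenicity P to odds: P / (1 - P)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds (LR) to a probability: LR / (1 + LR)."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Odds of 1000:1 in favor of causality correspond to 1000/1001.
    print(odds_to_probability(1000.0))
    # A probability of 0.5 corresponds to even odds.
    print(probability_to_odds(0.5))
```

The two functions are inverses of each other, so converting a probability to odds and back returns the starting value up to floating-point rounding.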

Derivation of likelihood ratios


To estimate LRs, a well-characterized reference set of variants or other quantitative evidence is used to calibrate each component. “Clinical” calibration involves collecting a reference set of variants that are (assumed) pathogenic or (assumed) benign. The evidence used to classify the variants in the calibration reference set must be independent of the component being calibrated. For example, functional assay evidence cannot be used in the classification of variants that were included in a calibration set for deriving a functional assay LR. Additionally, the calibration set cannot include variants that were classified based on evidence that was not measured by the component being calibrated. For example, a variant that is pathogenic due to a splicing aberration cannot be included in the reference set for the calibration of an assay for protein function only. It is possible to apply an LR for a given evidence type to variants included in the reference sets by using a jackknife estimation approach: the distributions informing the LR estimation are recalculated by excluding one variant at a time, and the LR derived is then applied to the variant that was excluded. There are multiple approaches for LR derivation depending on the variables (categorical vs. continuous) and reference datasets used. These approaches include calculating LRs from the


Chapter 4 Quantitative modeling: multifactorial integration


proportions of variant carriers in categories or modeling the probabilities using regression analysis. Ascertainment of the reference dataset and the method of derivation can also determine whether an LR is applicable to any variant or is dataset-specific. The approaches are described in more detail in this section and a decision tree for the different approaches is shown in Fig. 4.2. For all of these approaches, a predictor must be statistically significant in order to be used in the model.

Proportions of categorical data

When the observational data are categorical, a straightforward method of LR estimation using proportions of pathogenic variant carriers (PVC) and noncarriers (NC) can be used. For these purposes the LR formula is equivalent to:

\[ \mathrm{LR} = \frac{pvc/PVC}{nc/NC} \]

where pvc is the number of pathogenic variant carriers in a given category, PVC is the total number of carriers with measured data over all categories, nc is the number of noncarriers within the same category as pvc, and NC is the total number of noncarriers with measured data over all categories. LR estimates can be further stratified to account for confounding using the Mantel–Haenszel approach [10] or strata-specific LRs can be used; for example, by age group (10) being measured in the observational data used as the reference, there are many possibilities for categorizing the data. There could be a multitude of possible categories that could end up with very small numbers; thus proportional LR derivation is impractical. In this situation logistic regression can be used to fit a model to the dataset and then identify the best predictors (statistically significantly associated with pathogenic variant carrier status) to use [11]. The output odds ratio (OR) estimates for each parameter need to be converted to an LR using the following formula:

\[ \mathrm{LR} = \mathrm{OR}(\text{parameter}) \times \mathrm{OR}(\text{constant}) \times \frac{1 - P}{P} \]

where the constant is the constant/intercept term from the logistic regression model, and P is the proportion of pathogenic variant carriers in the dataset used to build the model. P is calculated from the reference dataset used because differences in ascertainment of data affect the weight of clinical parameters as predictors of pathogenicity [12]. It is for this reason that LRs estimated through this approach are dataset-specific and can only be applied to variants identified within that dataset. Additional data from independent datasets allow rederivation and reapplication of LRs to more variants based on evidence pertinent to each dataset.
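The proportional and regression-based formulas above can be sketched as two short functions. The function and parameter names are illustrative, and the counts in the usage example are invented for demonstration, not taken from any dataset.

```python
def proportional_lr(pvc, pvc_total, nc, nc_total):
    """LR for one category: (pvc / PVC) / (nc / NC)."""
    if min(pvc_total, nc, nc_total) <= 0:
        raise ValueError("totals and the noncarrier count must be positive")
    return (pvc / pvc_total) / (nc / nc_total)


def logistic_lr(or_parameter, or_constant, carrier_proportion):
    """LR from logistic output: OR(parameter) * OR(constant) * (1 - P) / P."""
    if not 0.0 < carrier_proportion < 1.0:
        raise ValueError("carrier_proportion must lie strictly between 0 and 1")
    return (or_parameter * or_constant
            * (1.0 - carrier_proportion) / carrier_proportion)


if __name__ == "__main__":
    # Invented counts: 30 of 100 carriers and 10 of 200 noncarriers
    # fall into the category of interest.
    print(proportional_lr(30, 100, 10, 200))

    # The same data seen as a one-predictor logistic model:
    # odds of being a carrier outside the category (the constant) are 70/190,
    # and the category multiplies those odds up to 30/10.
    or_constant = 70 / 190
    or_parameter = (30 / 10) / or_constant
    print(logistic_lr(or_parameter, or_constant, 100 / 300))
```

For a single binary predictor, the two routes give the same LR, because the odds within the category divided by the overall carrier odds equals the ratio of the two proportions. The regression route becomes useful when many predictors make category counts too small.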

Calibration of continuous variables

When the data are a continuous numeric variable, they are calibrated against a reference set of variants classified with confidence using other types of data. The reference sets can be binary (pathogenic versus benign) or a continuous variable of previously computed odds toward pathogenicity. The position on the distribution curve for pathogenic versus benign variants is plotted, and the LR toward pathogenicity can be calculated based on the ratio of the probability of the component output falling in the pathogenic reference distribution versus the benign reference distribution, normalized if necessary [4]. When linear regression is used to estimate LRs, the distribution of the regression residuals for the pathogenic and benign variant correlations can be used to determine the probability that the observed value falls within either of the two expected distributions. To constrain the resulting regression equations to produce probabilities between 0.00 and 1.00, the variable can be log transformed [13,14]. Before using this approach for any component, it must be determined whether there is a significant correlation between the data output and pathogenicity.
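A minimal sketch of the density-ratio idea follows, combined with the jackknife (leave-one-out) approach described earlier for applying an LR to the reference variants themselves. Summarizing each reference set with a fitted normal distribution is an assumption made here for brevity; the published calibrations may use other distributions or regression residuals. The scores are invented.

```python
from statistics import NormalDist


def density_ratio_lr(value, pathogenic_scores, benign_scores):
    """LR toward pathogenicity for one continuous score.

    Assumption of this sketch: each reference set is summarized by a
    normal distribution fitted to its scores (needs >= 2 scores per set).
    """
    pathogenic = NormalDist.from_samples(pathogenic_scores)
    benign = NormalDist.from_samples(benign_scores)
    return pathogenic.pdf(value) / benign.pdf(value)


def jackknife_lrs(pathogenic_scores, benign_scores):
    """Leave-one-out LRs: refit without each variant, then score it."""
    results = []
    for i, score in enumerate(pathogenic_scores):
        rest = pathogenic_scores[:i] + pathogenic_scores[i + 1:]
        results.append(("pathogenic", score,
                        density_ratio_lr(score, rest, benign_scores)))
    for i, score in enumerate(benign_scores):
        rest = benign_scores[:i] + benign_scores[i + 1:]
        results.append(("benign", score,
                        density_ratio_lr(score, pathogenic_scores, rest)))
    return results


# Invented assay activities (fraction of wild type), for illustration only.
PATHOGENIC_REFERENCE = [0.05, 0.10, 0.12, 0.20, 0.08]
BENIGN_REFERENCE = [0.80, 0.90, 0.85, 1.00, 0.95]

if __name__ == "__main__":
    for label, score, lr in jackknife_lrs(PATHOGENIC_REFERENCE,
                                          BENIGN_REFERENCE):
        print(f"{label:10s} score={score:.2f} LR={lr:.3g}")
```

Because each reference variant is scored against distributions that were fitted without it, its LR is not inflated by its own contribution to the calibration.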

Combining likelihood ratios

Estimation of an overall odds for pathogenicity integrates different lines of evidence used in variant interpretation by multiplying the LRs derived for each independent component, for example:

\[ \text{Odds in favor of pathogenicity} = \mathrm{LR}_1 \times \mathrm{LR}_2 \times \mathrm{LR}_3 \times \cdots \]

LRs are also combined within the same independent component through multiplication when there are multiple pieces of evidence for a given variant (e.g., multiple informative families with the same variant). This implicitly assumes that there is no allelic heterogeneity for a given variant within the same gene. LRs for pathogenicity for the hereditary cancer genes BRCA1/2, the MMR genes, and TP53 are estimated from statistical analyses of observational data such as cosegregation of the variant in families, personal or family history, pathological characteristics, and in vitro studies, whereas the nonspecific Mendelian disease models and CDKN2A use a semiquantitative approach to LR estimation. For the ACMG/AMP Bayesian framework, the relative strength of the ordered evidence categories (i.e., very strong, strong, moderate, supporting) is scaled to the power of two [5], and the Multifactorial Variant Prediction model scales to the IARC probability criteria cutoffs [6]. Similarly, the CDKN2A model uses some LRs scaled to the probability cutoffs, while others are based on the calculated positive and negative predictive values of functional assays or bioinformatic tools [2].
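The multiplication rule can be sketched as follows. The component names and LR values in the example are invented. Grouping LRs by component in a dictionary is a design choice of this sketch, made so that several LRs within one component (such as several families) are multiplied in the same way as LRs across components.

```python
def combined_odds(components):
    """Multiply all LRs, within and across independent components.

    `components` maps a component name to a list of LRs; a component
    may hold several LRs (e.g., one per informative family).
    """
    odds = 1.0
    for name, lrs in components.items():
        for lr in lrs:
            if lr <= 0.0:
                raise ValueError(f"non-positive LR in component {name!r}")
            odds *= lr
    return odds


if __name__ == "__main__":
    # Invented values for illustration only.
    evidence = {
        "cosegregation": [3.0, 2.0],  # two informative families
        "tumor_characteristics": [4.0],
        "co_occurrence": [0.5],
    }
    print(combined_odds(evidence))
```

An LR below 1 pulls the combined odds down, so evidence against pathogenicity is handled by the same product as evidence in favor.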

Components of quantitative models

This section describes components of the multifactorial likelihood models that have been published or are under development. Fig. 4.1 provides an overview of the components of the BRCA1/2 (Fig. 4.1A), MMR gene (Fig. 4.1B), and TP53 (Fig. 4.1C) models.

Prior probability of pathogenicity

The nonspecific Mendelian disease and gene/pathway-specific models use different starting points (prior probabilities) for variant interpretation. The TP53 and Multifactorial Variant Prediction models both use a null/naive probability that makes no prior assumptions about the variant, that is, a prior probability of 0.5 [4,6]. For the ACMG/AMP Bayesian framework a prior probability of 0.1 was set because it is the lower probability of pathogenicity cutoff for a variant of uncertain significance [5]. For BRCA1/2, prior probabilities are currently based on a sequence-based analysis that includes bioinformatic predictions of variant effect on protein sequence or mRNA splicing. These values were originally estimated through heterogeneity tests of specific variants from a Myriad Genetics dataset in groups defined by bioinformatic scores [11]. Align-GVGD classes are used to assign probabilities for the effects of missense variants based on conservation and changes in amino acid physicochemical properties within functionally important domains. Outside these domains a variant is unlikely to be pathogenic unless it causes a splicing aberration [15]. Probabilities for de novo splice site creation and loss of wild-type canonical splice sites are also categorical variables and were computed using a training set of experimentally validated variants to calibrate the mRNA splicing predictions from MaxEntScan [16]. Further, results from recent studies involving BRCA1/2 [17–19] suggest a need to reassess the use of bioinformatic prediction information for these genes, considering new bioinformatic tools or combinations of tools, determining predictions for BRCA1 and BRCA2 separately, and examining predictive capacity across multiple clinical datasets. Unlike any of the other models, CDKN2A uses various forms of prior knowledge for the prior odds in favor of pathogenicity for variants under assessment [2]. These include variant frequency in cases and/or controls, the ratio of pathogenic to likely neutral variants in locus-specific databases, and previously published information about a specific variant. For the MMR genes, a regression-based calibration of a combined MAPP + PolyPhen-2.1 (retrained without MMR gene data) output expressed as a continuous variable was used to provide probabilities based on the effect of missense substitutions on protein function for every possible substitution [13]. The accuracy of this tool has recently been replicated in a clinical laboratory reference set [20]. To ensure a variant begins quantitative assessment as a VUS, the bioinformatic prior probability for the MMR genes is capped between 0.1 and 0.9. The BRCA1/2 and MLH1/MSH2/MSH6 protein sequence and mRNA splicing prior probabilities for every single-nucleotide substitution are available on the HCI Database of Prior Probabilities of Pathogenicity (http://priors.hci.utah.edu/PRIORS/).
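A short sketch of how a prior probability and a combined LR yield a posterior probability follows. It uses the standard Bayesian update (posterior odds equal prior odds multiplied by the LR) and the odds and probability conversions given earlier in the chapter. The function names are hypothetical. The capping helper mirrors the 0.1 to 0.9 cap described for the MMR gene priors.

```python
def capped_prior(prior, low=0.1, high=0.9):
    """Clamp a bioinformatic prior to [low, high], as for the MMR genes."""
    return min(max(prior, low), high)


def posterior_probability(prior, combined_lr):
    """Bayesian update: posterior odds = prior odds * combined LR."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if combined_lr < 0.0:
        raise ValueError("combined_lr must be non-negative")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * combined_lr
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # With the naive prior of 0.5 the prior odds are 1, so the posterior
    # odds equal the combined LR itself.
    print(posterior_probability(0.5, 20.0))
    # A raw prior of 0.97 is capped to 0.9 before any evidence is applied.
    print(posterior_probability(capped_prior(0.97), 1.0))
```

The choice of prior matters most when little other evidence is available: with a combined LR of 1, the posterior simply equals the prior.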


Bioinformatic predictions

For the TP53 multifactorial likelihood model, in contrast to the BRCA1/2 and MMR gene models, bioinformatic predictions are not used as prior probabilities but as component LRs [4]. This allows multiple tools to be used as separate components. The included tools, Align-GVGD and BayesDel, were selected because they outperformed other tools in predicting the pathogenicity of p53 missense variants [21]. Instead of using categorical outputs of the possible Align-GVGD classes or a binary prediction from BayesDel separated by a single cutoff, each individual bioinformatic score (on a continuous scale) is converted to a calibrated LR toward pathogenicity. For Align-GVGD, Grantham deviation scores are used as a measure of pathogenic impact, while Grantham variation scores are used as a measure of benign impact. The LR-converted scores have been made available for every possible missense substitution variant [4].

Cosegregation

The quantitative cosegregation analysis method considers both likelihood of causality and likelihood of noncausality of the variant in relation to age-specific penetrance of disease phenotypes, given the background population prevalence of these disease types [22]. Age-specific penetrance of disease phenotypes relevant for BRCA1/2 [23–25], MLH1 and MSH2 [26,27], MSH6 [28], or PMS2 [29] is currently used. The LR directly measures the strength of evidence across all available family evidence, including probabilistic information from untested individuals, and accounts for ascertainment due to proband cancer status. The cosegregation online server is able to calculate LRs for multiple genes, including BRCA1/2 and the MMR genes (http://fengbj-laboratory.org/coseg/analysis.html) [30].

Functional assays

Complete in vitro mismatch repair activity assay

A cell-free complete in vitro mismatch repair activity (CIMRA) assay has been calibrated for functional assessment of MLH1, MSH2, and MSH6 missense substitutions [14,31]. Based on the calibration using linear regression of log-transformed odds for pathogenicity, CIMRA assay activity can be converted to an LR for pathogenicity. Ideally, both variant-level and clinical data are included in variant classification. Thus, thresholds of Class 4, likely pathogenic or Class 2, likely benign are implemented for a given variant when the CIMRA LR is the only component LR. This assay was chosen based on the feasibility of establishing its use in a clinical diagnostic setting. Currently only MMR activities generated by this assay can be converted to LRs for pathogenicity. Other assays would need to be calibrated before implementation, to account for experimental differences between the assays [32].

TP53 assays

A systematic assay measuring loss-of-function in human cancer cell lines has been published for p53 [33], and results can be incorporated into the TP53 multifactorial model. This assay is preferable to earlier systematic assays for two reasons: transactivation assays in yeast [34] were already used to define reference sets for other components of the model; and other recent loss-of-function assays in human cells [35] only examined variants in a single domain. The continuous output of the selected loss-of-function assay [33] can be clinically calibrated, and the LR toward pathogenicity for a given missense variant calculated based on the ratio of probability of each assay output in the pathogenic versus benign reference distribution. As for the bioinformatic component, the LR-converted outputs can be estimated for every possible missense substitution variant.

BRCA1/2 assays

For BRCA1/2, proportional LRs have been calculated for categorical scoring of combined information on protein functional data and mRNA splicing results, demonstrating the potential of including an interpretation of the combined information as additional components in future models [36]. However, there are an increasing number of publications using different assay types, which raises a number of questions. Recommendations on how to measure the robustness of results from different functional assay publications/sources have been put forward by a ClinGen Working Group, and this approach incorporates categorization of odds for pathogenicity for results from a given assay [37]. Another issue is the possibility of double counting: when assays from different publications measure exactly the same function, or the output of different assays is correlated, it would be inappropriate to combine LRs from different publications. Yet another issue arises when impact on one aspect of protein activity/function confers pathogenicity, but is not measured in all assays. In this instance it would seem illogical to combine a negative and a positive LR for different assay types. These are issues to be considered in future iterations of multifactorial modeling, irrespective of gene. The caveat to use of any functional assay data in multifactorial modeling is that it is important to consider results in the context of alternative mechanisms by which a variant may impact gene function, the most obvious being impact on mRNA splicing. Use of a generic decision tree, as has been developed for MMR genes [32], can be helpful to establish the appropriate application of LRs based on functional data.

Personal and family history

BRCA1/2

The rationale behind the inclusion of this component of the model is that key features of an individual’s personal and family history (e.g., disease status, type of cancer, age of onset, and number and age of relatives with breast or ovarian cancer) can predict the probability of identifying a disease-causing variant. For BRCA1/2, this type of data has been modeled using logistic regression (due to its complexity), comparing the personal and family histories of individuals with known pathogenic variants to noncarriers in two large diagnostic laboratory datasets from Myriad Genetics [11] and Ambry Genetics [19]. The historic (Myriad) and more recent (Ambry) cohorts differ in several respects. Namely, testing was considerably more expensive in 2007, leading to stricter testing criteria. Thus, the prevalence of pathogenic variants in the Ambry dataset was nearly half that in the Myriad dataset. Furthermore, for the Ambry dataset it was possible to exclude individuals with pathogenic variants in several other genes from the noncarrier group and to include additional positive predictors (e.g., triple negative breast cancer status) [19]. Due to the differences in ascertainment, the LR estimates derived for this data type are dataset-specific and should only be applied to variants based on clinical data provided by the original dataset used for derivation (Myriad vs. Ambry in this instance).


TP53

An approach similar to that used for BRCA1/2 has also been incorporated in the TP53 multifactorial likelihood model. It was developed using clinical data from Ambry Genetics and provides LRs toward pathogenicity based on a number of parameters including personal history of Li–Fraumeni syndrome–associated malignancies. Analysis of personal history in single gene testing versus multiple gene testing highlighted issues around ascertainment affecting the weight of clinical parameters as predictors of pathogenicity [12]. Clinical features were far less predictive for variants identified in the highly ascertained single gene dataset. Further development of this component to use both personal and family history data, in particular for cases ascertained for multiple gene testing, will likely advance use of clinical presentation information in the TP53 model.

Tumor characteristics

MMR tumor characteristics

MMR deficiency in tumors is caused by germ line pathogenic MMR gene variants or somatic inactivation through MLH1 promoter hypermethylation or double somatic mutations in MMR genes. MMR deficiency in tumors is measured using microsatellite instability (MSI) and/or MMR protein expression using immunohistochemistry (IHC) [38,39]. Using Colon Cancer Family Registry (CCFR) and Ambry Genetics datasets, LRs based on MMR tumor status have been estimated by dividing the proportion of MMR gene carriers with abnormal MSI/IHC (deficient) and normal MSI/IHC (proficient) tumors by that of noncarriers, respectively [3,20]. BRAF V600E status in MMR-deficient colorectal tumors is also incorporated as a negative predictor, due to its high correlation with MLH1 promoter methylation [40]. The Ambry dataset provides updated LRs (including additions of IHC status, endometrial tumors, and age of diagnosis) from a less biased dataset compared to the CCFR dataset because all individuals underwent full screening of the MMR genes using multigene panel testing. These tumor characteristics have been incorporated into the MMR multifactorial likelihood model as positive and negative predictors of pathogenicity depending on MSI/IHC status, cancer type (colorectal and endometrial), age of diagnosis, and BRAF V600E status, with MLH1 promoter hypermethylation status soon to be included. As mentioned earlier in this section, acquired somatic mutations in the MMR genes can cause tumor MMR deficiency similar to that observed for inherited pathogenic variants [38,41]. Based on this principle, and in order to utilize the increasing amount of tumor sequencing data generated, LRs have been derived for MMR gene tumor variant read allele fraction (VRF) by dividing the proportion of pathogenic variant carriers (equivalent to driver mutations) above a VRF cutoff in MMR-deficient tumors with/without loss of heterozygosity (LOH), and the proportion below the VRF cutoff in MMR-deficient tumors with/without LOH, by that of intronic assumed benign variants (passenger mutations), respectively [42].

BRCA1/2 breast cancer histopathology

Breast tumors from BRCA1 pathogenic variant carriers are more likely to be high grade and triple negative (negative for estrogen receptor, ER; progesterone receptor; and human epidermal growth factor receptor 2, HER2) [43]. While less distinctive than those for BRCA1, breast tumors from BRCA2 pathogenic variant carriers are more likely to be ER positive and high grade [44]. Using large pathology datasets accrued by the Consortium of Investigators of Modifiers of BRCA1/2 (CIMBA) and the Breast Cancer Association Consortium (BCAC), country-stratified LRs for pathogenicity based on breast tumor histopathology have been derived using the Mantel–Haenszel approach [45]. Breast tumor ER status, grade, and triple negative status by age of diagnosis and gene have been incorporated into the BRCA1/2 multifactorial likelihood model as positive and negative predictors.

TP53 breast cancer histopathology

There is evidence in the literature that breast tumors from TP53 pathogenic variant carriers are more likely to be HER2+ than those from noncarriers [46–54]. Using data from Ambry Genetics and GeneDx, dataset-specific LRs toward pathogenicity for HER2 breast tumor status diagnosed at different age ranges have been calculated for TP53, by dividing the proportion of TP53 carriers with HER2+ and HER2− breast tumors by that of noncarriers, respectively [55]. HER2 status in women with breast cancer diagnosed before age 40 was a significant predictor of pathogenicity. This evidence type can be easily incorporated into the TP53 multifactorial likelihood model as a positive or negative predictor of pathogenicity depending on the HER2 status.

TP53 somatic/germ line ratio

Somatic data can be especially informative for TP53 variant classification, since it is one of the most commonly somatically mutated genes in human cancers [56]. Specifically, the relative rate of somatic variation for a TP53 variant in comparison to germ line variation can be an important piece of evidence informing pathogenicity, given that pathogenic variants should occur at both the germ line and somatic level at rates that reflect the nucleotide-specific mutation rate, and so show a strong positive correlation. On the other hand, benign variants that become fixed in a population should occur only by chance in tumors and lack this correlation [4]. This component was calibrated using reference sets based on clinical and strict functional evidence, and the LR for a given missense variant is calculated based on the distribution of regression residuals by dividing the probability that the observed value falls within each expected distribution.

Co-occurrence with a pathogenic variant

For dominantly inherited diseases, co-occurrence with a pathogenic variant in the same gene can be a negative predictor for pathogenicity. This data type has been incorporated into the BRCA1/2 multifactorial model as a dataset-specific LR. It is derived by dividing the probability of being a compound heterozygote for two pathogenic mutations by the overall frequency of pathogenic mutations in the dataset, given the total number of times the variant was observed and the number of those that were in trans with a known pathogenic variant. For example, a single observation of a BRCA1/2 VUS co-occurring in trans with a pathogenic variant in the same gene equates to a co-occurrence LR of 0.04, assuming that 2.5% of the tested individuals have a pathogenic variant in that gene [1].
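The worked example above can be reproduced with a short sketch. Two things here are assumptions rather than statements from the text: the binomial form of the likelihood, and the compound heterozygote probability of 0.001, which is chosen only because it reproduces the quoted LR of 0.04 when 2.5% of tested individuals carry a pathogenic variant. The function and parameter names are hypothetical.

```python
def cooccurrence_lr(n_observed, n_in_trans, pathogenic_freq,
                    compound_het_prob=0.001):
    """Co-occurrence LR under an assumed binomial likelihood.

    n_observed        -- times the variant was observed
    n_in_trans        -- observations in trans with a pathogenic variant
    pathogenic_freq   -- overall frequency of pathogenic variants in the dataset
    compound_het_prob -- ASSUMED probability of co-occurrence if the variant
                         is itself pathogenic (0.001 is illustrative only)
    """
    if not 0 <= n_in_trans <= n_observed:
        raise ValueError("need 0 <= n_in_trans <= n_observed")
    k, n = n_in_trans, n_observed
    if_pathogenic = compound_het_prob ** k * (1 - compound_het_prob) ** (n - k)
    if_benign = pathogenic_freq ** k * (1 - pathogenic_freq) ** (n - k)
    return if_pathogenic / if_benign


if __name__ == "__main__":
    # One observation, in trans, with 2.5% pathogenic variant frequency.
    print(cooccurrence_lr(1, 1, 0.025))
```

Under this form, each additional in trans observation multiplies the LR by the same small factor, while observations without co-occurrence change it only slightly.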

Population-based data

Population frequency

High frequency in healthy controls (e.g., >5%) is considered a key piece of evidence in favor of benign impact for many high-risk genes [9]. This information can be converted to LRs toward pathogenicity through categorical and continuous approaches. There is potential to include population frequency data through the calculation of categorical proportional LRs [36]. On the other hand, any given frequency can be converted to a continuous variable to be incorporated into the multifactorial likelihood model using the approach described in the Calibration of continuous variables section. To do so, the LR toward pathogenicity for a given variant is calculated by comparing its frequency against the distribution of known pathogenic and benign variants, where surrogate reference sets (e.g., truncating vs. silent or intronic variants) might be used when there are insufficient variants with known pathogenicity in the population dataset. To avoid sequencing errors an allele count cutoff may be recommended. Furthermore, the possibility of excluding small sample sets due to founder effects should be considered [57,58]. Absence in a control database can also be converted to an LR toward pathogenicity by conservatively assuming an allele count of 1.

Healthy adult individuals

According to the original ACMG/AMP BS2 criterion, observing a variant in a healthy adult individual is strong evidence of benign impact [9]. This evidence type can also be incorporated into the TP53 multifactorial model, providing an LR toward pathogenicity depending on the number of healthy adult individuals (considered here as females who reached the age of 60 without personal history of cancer, given the expected early onset of disease associated with TP53). Dataset-specific LRs using data from diagnostic testing laboratories can be calculated by dividing the proportion of TP53 carriers with no cancer by age 60 by that of noncarriers, as previously calculated [59]. The resulting LRs toward pathogenicity will provide a negative predictor that can utilize datasets of known cancer-free controls, e.g., the public database FLOSSIES (https://whi.color.com/), which presents variant-level data from women who reached 70 years without cancer.

Case–control data As an additional population-based component, case–control data have been incorporated into the BRCA1/2 multifactorial likelihood model. The model uses data generated for case–control analyses, but applies a likelihood-based approach that considers the presentation of each individual variant carrier. The LRs can be derived by comparing the distributions of variant carriers among cases and controls under the hypothesis that the variant has the same age-specific relative risks as the “average” pathogenic variant, compared with the hypothesis that it is not associated with any increased breast cancer risk [60].

Caveats and considerations The strength of the multifactorial likelihood model is its ability to utilize multiple sources of data readily available in the clinical setting. However, several assumptions underlie the basic model: that the individual components are mutually independent; that all disease-causing variants have similar penetrance or associated tumor phenotypes (i.e., frameshift/nonsense vs. missense); and that there is not another unidentified causal sequence alteration in cis with the variant being analyzed. The most important principle behind using this method is that at least two independent sources of data should be used for variant classification. Preferably, at least one of the sources addresses functionality of the variant and at least one links the human allele to the disease. To ensure robust variant classification, the following caveats and recommendations should be noted before finalizing variant classification. Only independent lines of evidence should be included. Tumor pathology information for a proband cannot be considered if the proband was selected for testing on the basis of said tumor pathology criteria, as it would upwardly bias the probability output in favor of pathogenicity. For the models that use prior probabilities, further investigation is necessary for any variant with extreme prior probability and minimal additional evidence, and it is currently recommended that clinical or laboratory evidence should contribute an LR of at least 2.0 (to reach final Class 4 likely pathogenic or Class 5 pathogenic). A variant that displays an obvious discordance between the predicted prior probability and additional clinical or laboratory evidence should be reinvestigated to: consider alternative molecular mechanisms not yet, or not adequately, captured by the bioinformatic tools of choice, e.g., exonic splice enhancer alterations; establish the veracity of clinical or laboratory evidence; assess the possibility that the variant may be a hypomorph exhibiting reduced penetrance relative to the high risk observed for classical pathogenic variants that encode a premature termination codon (nonsense or frameshift); and/or exclude the very low probability that there is a pathogenic variant in cis. For variants where no adequate prior has been estimated, e.g., UTR variants or intronic variants located at positions between +6 of one exon and −20 of the next exon, a prior of 0.02 may be applied; this assumes very conservatively that 2/100 of such variants might be associated with a high risk of disease. In summary, it is recognized that the data types and LR estimations used in multifactorial models will evolve over time, and relevant changes will be incorporated into updated multifactorial likelihood classifications as necessary.
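To make the role of the prior concrete, the sketch below combines a prior probability with one or more LRs using Bayes' rule in odds form, which is the standard way independent likelihood ratios are multiplied into a posterior probability. The function name and the example values are illustrative assumptions, not the chapter's own implementation, and the calculation is only valid if the LRs come from independent lines of evidence.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Posterior probability of pathogenicity from a prior and independent LRs.

    posterior odds = prior odds x product of LRs (Bayes' rule in odds form).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Illustration: the conservative prior of 0.02 with a single LR of 2.0.
example = posterior_probability(0.02, [2.0])
```

In this illustration a single LR of 2.0 only moves a prior of 0.02 to a posterior of roughly 0.04, which shows why a variant with a low prior needs substantial independent evidence before it can approach the pathogenic classes.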

References
[1] Goldgar DE, Easton DF, Deffenbaugh AM, Monteiro AN, Tavtigian SV, Couch FJ. Integrated evaluation of DNA sequence variants of unknown clinical significance: application to BRCA1 and BRCA2. Am J Hum Genet 2004;75(4):535–44.
[2] Miller PJ, Duraisamy S, Newell JA, et al. Classifying variants of CDKN2A using computational and laboratory studies. Hum Mutat 2011;32(8):900–11.
[3] Thompson BA, Goldgar DE, Paterson C, et al. A multifactorial likelihood model for MMR gene variant classification incorporating probabilities based on sequence bioinformatics and tumor characteristics: a report from the Colon Cancer Family Registry. Hum Mutat 2013;34(1):200–9.
[4] Fortuno C, Cipponi A, Ballinger ML, et al. A quantitative model to predict pathogenicity of missense variants in the TP53 gene. Hum Mutat 2019;40(6):788–800.
[5] Tavtigian SV, Greenblatt MS, Harrison SM, et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet Med 2018;20(9):1054–60.
[6] Qian D, Li S, Tian Y, et al. A Bayesian framework for efficient and accurate variant prediction. PLoS One 2018;13(9):e0203553.
[7] Spurdle AB, Greville-Heygate S, Antoniou AC, et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J Med Genet 2019;56(6):347–57.
[8] Plon SE, Eccles DM, Easton D, et al. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat 2008;29(11):1282–91.
[9] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24.
[10] Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959;22(4):719–48.


[11] Easton DF, Deffenbaugh AM, Pruss D, et al. A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am J Hum Genet 2007;81(5):873–83.
[12] Fortuno C, Pesaran T, Dolinsky J, et al. Differences in patient ascertainment affect the use of gene-specified ACMG/AMP phenotype-related variant classification criteria: evidence for TP53. Hum Mutat 2020.
[13] Thompson BA, Greenblatt MS, Vallee MP, et al. Calibration of multiple in silico tools for predicting pathogenicity of mismatch repair gene missense substitutions. Hum Mutat 2013;34(1):255–65.
[14] Drost M, Tiersma Y, Thompson BA, et al. A functional assay-based procedure to classify mismatch repair gene variants in Lynch syndrome. Genet Med 2019;21(7):1486–96.
[15] Tavtigian SV, Byrnes GB, Goldgar DE, Thomas A. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications. Hum Mutat 2008;29(11):1342–54.
[16] Vallee MP, Di Sera TL, Nix DA, et al. Adding in silico assessment of potential splice aberration to the integrated evaluation of BRCA gene unclassified variants. Hum Mutat 2016;37(7):627–39.
[17] Leman R, Gaildrat P, Gac GL, et al. Novel diagnostic tool for prediction of variant spliceogenicity derived from a set of 395 combined in silico/in vitro studies: an international collaborative effort. Nucleic Acids Res 2018;48(3):1600–1.
[18] Hart SN, Hoskin T, Shimelis H, et al. Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models. Genet Med 2019;21(1):71–80.
[19] Li H, LaDuca H, Pesaran T, et al. Classification of variants of uncertain significance in BRCA1 and BRCA2 using personal and family history of cancer from individuals in a large hereditary cancer multigene panel testing cohort. Genet Med 2019. https://doi.org/10.1038/s41436-41019-4072941431.
[20] Li S, Qian D, Thompson BA, et al. Tumour characteristics provide evidence for germline mismatch repair missense variant pathogenicity. J Med Genet 2020;57(1):62–9.
[21] Fortuno C, James PA, Young EL, et al. Improved, ACMG-compliant, in silico prediction of pathogenicity for missense substitutions encoded by TP53 variants. Hum Mutat 2018.
[22] Thompson D, Easton DF, Goldgar DE. A full-likelihood method for the evaluation of causality of sequence variants from family data. Am J Hum Genet 2003;73(3):652–5.
[23] Kuchenbaecker KB, Hopper JL, Barnes DR, et al. Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. J Am Med Assoc 2017;317(23):2402–16.
[24] Antoniou AC, Cunningham AP, Peto J, et al. The BOADICEA model of genetic susceptibility to breast and ovarian cancers: updates and extensions. Br J Cancer 2008;98(8):1457–66.
[25] Mocci E, Milne RL, Méndez-Villamil EY, et al. Risk of pancreatic cancer in breast cancer families from the breast cancer family registry. Cancer Epidemiol Biomarkers Prev 2013;22(5):803–11.
[26] Jenkins MA, Dowty JG, Ait Ouakrim D, et al. Short-term risk of colorectal cancer in individuals with Lynch syndrome: a meta-analysis. J Clin Oncol 2015;33(4):326–31.
[27] Dowty JG, Win AK, Buchanan DD, et al. Cancer risks for MLH1 and MSH2 mutation carriers. Hum Mutat 2013;34(3):490–7.
[28] Baglietto L, Lindor NM, Dowty JG, et al. Risks of Lynch syndrome cancers for MSH6 mutation carriers. J Natl Cancer Inst 2010;102(3):193–201.
[29] ten Broeke SW, Brohet RM, Tops CM, et al. Lynch syndrome caused by germline PMS2 mutations: delineating the cancer risk. J Clin Oncol 2015;33(4):319–25.
[30] Belman S, Parsons MT, Spurdle AB, Goldgar DE, Feng BJ. Considerations in assessing germline variant pathogenicity using cosegregation analysis. Genet Med 2020;22(12):2052–9.


[31] Drost M, Tiersma Y, Glubb D, et al. Two integrated and highly predictive functional analysis-based procedures for the classification of MSH6 variants in Lynch syndrome. Genet Med 2020. https://doi.org/10.1038/s41436-41019-40736-41432.
[32] Thompson BA, Walters R, Parsons MT, et al. Contribution of mRNA splicing to mismatch repair gene sequence variant interpretation. Front Genet 2020;11(798).
[33] Giacomelli AO, Yang X, Lintner RE, et al. Mutational processes shape the landscape of TP53 mutations in human cancer. Nat Genet 2018;50(10):1381–7.
[34] Kato S, Han SY, Liu W, et al. Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci U S A 2003;100(14):8424–9.
[35] Kotler E, Shani O, Goldfeld G, et al. A systematic p53 mutation library links differential functional impact to cancer mutation pattern and evolutionary conservation. Mol Cell 2018;71(1):178–90.e178.
[36] Parsons MT, Tudini E, Li H, et al. Large scale multifactorial likelihood quantitative analysis of BRCA1 and BRCA2 variants: an ENIGMA resource to support clinical variant classification. Hum Mutat 2019.
[37] Brnich SE, Abou Tayoun AN, Couch FJ, et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med 2019;12(1):3.
[38] Mensenkamp AR, Vogelaar IP, van Zelst-Stams WA, et al. Somatic mutations in MLH1 and MSH2 are a frequent cause of mismatch-repair deficiency in Lynch syndrome-like tumors. Gastroenterology 2014;146(3):643–6.e648.
[39] Richman S. Deficient mismatch repair: read all about it (Review). Int J Oncol 2015;47(4):1189–202.
[40] Parsons MT, Buchanan DD, Thompson B, Young JP, Spurdle AB. Correlation of tumour BRAF mutations and MLH1 methylation with germline mismatch repair (MMR) gene mutation status: a literature review assessing utility of tumour features for MMR variant classification. J Med Genet 2012;49(3):151–7.
[41] Haraldsdottir S, Hampel H, Tomsic J, et al. Colon and endometrial cancers with mismatch repair deficiency can arise from somatic, rather than germline, mutations. Gastroenterology 2014;147(6):1308–16.
[42] Shirts BH, Konnick EQ, Upham S, et al. Using somatic mutations from tumors to classify variants in mismatch repair genes. Am J Hum Genet 2018;103(1):19–29.
[43] Lakhani SR, Van De Vijver MJ, Jacquemier J, et al. The pathology of familial breast cancer: predictive value of immunohistochemical markers estrogen receptor, progesterone receptor, HER-2, and p53 in patients with mutations in BRCA1 and BRCA2. J Clin Oncol 2002;20(9):2310–8.
[44] Bane AL, Beck JC, Bleiweiss I, et al. BRCA2 mutation-associated breast cancers exhibit a distinguishing phenotype based on morphology and molecular profiles from tissue microarrays. Am J Surg Pathol 2007;31(1):121–8.
[45] Spurdle AB, Couch FJ, Parsons MT, et al. Refined histopathological predictors of BRCA1 and BRCA2 mutation status: a large-scale analysis of breast cancer characteristics from the BCAC, CIMBA, and ENIGMA consortia. Breast Cancer Res 2014;16(6):3419.
[46] Wilson JR, Bateman AC, Hanson H, et al. A novel HER2-positive breast cancer phenotype arising from germline TP53 mutations. J Med Genet 2010;47(11):771–4.
[47] Rath MG, Masciari S, Gelman R, et al. Prevalence of germline TP53 mutations in HER2+ breast cancer patients. Breast Cancer Res Treat 2013;139(1):193–8.
[48] Masciari S, Dillon DA, Rath M, et al. Breast cancer phenotype in women with TP53 germline mutations: a Li-Fraumeni syndrome consortium effort. Breast Cancer Res Treat 2012;133(3):1125–30.
[49] Eccles DM, Li N, Handwerker R, et al. Genetic testing in a cohort of young patients with HER2-amplified breast cancer. Ann Oncol 2016;27(3):467–73.
[50] Bougeard G, Renaux-Petel M, Flaman JM, et al. Revisiting Li-Fraumeni syndrome from TP53 mutation carriers. J Clin Oncol 2015;33(21):2345–52.


[51] Slavin TP, Maxwell KN, Lilyquist J, et al. The contribution of pathogenic variants in breast cancer susceptibility genes to familial breast cancer risk. NPJ Breast Cancer 2017;3:22.
[52] Packwood K, Martland G, Sommerlad M, et al. Breast cancer in patients with germline TP53 pathogenic variants have typical tumour characteristics: the Cohort study of TP53 carrier early onset breast cancer (COPE study). J Pathol Clin Res 2019;5(3):189–98.
[53] Khincha PP, Best AF, Fraumeni Jr JF, Loud JT, Savage SA, Achatz MI. Reproductive factors associated with breast cancer risk in Li-Fraumeni syndrome. Eur J Cancer 2019;116:199–206.
[54] Melhem-Bertrandt A, Bojadzieva J, Ready KJ, et al. Early onset HER2-positive breast cancer is associated with germline TP53 mutations. Cancer 2012;118(4):908–13.
[55] Fortuno C, Mester J, Pesaran T, et al. Suggested application of HER2+ breast tumor phenotype for germline TP53 variant classification within ACMG/AMP guidelines. Hum Mutat 2020.
[56] Olivier M, Hollstein M, Hainaut P. TP53 mutations in human cancers: origins, consequences, and clinical use. Cold Spring Harb Perspect Biol 2010;2(1):a001008.
[57] Shi L, Webb BD, Birch AH, et al. Comprehensive population screening in the Ashkenazi Jewish population for recurrent disease-causing variants. Clin Genet 2017;91(4):599–604.
[58] Kaariainen H, Muilu J, Perola M, Kristiansson K. Genetics in an isolated population like Finland: a different basis for genomic medicine? J Community Genet 2017;8(4):319–26.
[59] Fortuno C, Lee K, Olivier M, et al. Specifications of the ACMG/AMP variant interpretation guidelines for germline TP53 variants. Hum Mutat 2020. https://doi.org/10.1002/humu.24152. In press.
[60] de la Hoya M, Soukarieh O, López-Perolio I, et al. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms. Hum Mol Genet 2016;25(11):2256–68.

Chapter 5 Clinical and genetic evidence and population evidence

George S. Charames (1,2,3), Peter Sabatini (4,5), Nicholas Watkins (1,6,7)

1 Pathology and Lab Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada
2 Lab Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada
3 Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
4 Department of Clinical Laboratory Genetics, University Health Network, Toronto, Ontario, Canada
5 Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
6 Hereditary Kidney Disease Clinic, Department of Nephrology, Princess Margaret Hospital, University Health Network
7 Molecular Genetics, Faculty of Medicine, University of Toronto, Toronto, Ontario, Canada

Introduction Phenotype description Phenotype is important in clinical genetics, genetic testing, and variant analysis. We consider the phenotype to be physical traits that are observable and measurable, including appearance, development, and behavior. These can be characteristics such as weight and height or results of laboratory tests such as blood type or estimated kidney function. The phenotype is a presentation of the genotype and is typically influenced by environmental or other factors. Prior to the availability of genetic testing, a patient’s phenotype was the primary information a clinician had available to reach a diagnosis. Many, if not most, genetic conditions are established solely on the basis of a clinical diagnosis determined by phenotypic traits, for example, tuberous sclerosis, neurofibromatosis type 1, or Marfan syndrome. In some cases, a clinician will forgo ordering genetic testing when they have already established a clinical diagnosis and genetic test results would not be useful for other family members or for clarification of the disease.

Medical pedigree In addition to the patient’s medical history, clinicians have also relied on family history information reported by their patient. In medicine a pedigree is a graphical presentation of a family’s health history. Recording and interpreting the family history, as part of working up a patient, is considered by many to be a standard in medical genetics practice [1]. The pedigree can be used to identify patterns associated with the mode of inheritance of a disease. Family history data are difficult to obtain accurately, and clinicians work hard at understanding what type of information can be trusted. Patients can misreport, misunderstand, or simply be unaware of their family members’ medical histories. However, family history can be a powerful resource used to establish a genetic diagnosis.
Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00021-1 Copyright © 2021 Elsevier Inc. All rights reserved.


The power of medical and family history data is being increasingly recognized. For instance, genetic testing might identify many variants and the phenotypic information and family history can sometimes help to interpret the result. Deep phenotyping is a new term found in the literature associated with a comprehensive review of the patient’s phenotype. It usually implies some type of computational analysis and indeed programs have been developed to integrate patient charts to better describe the phenotype. This can be advantageous over the generic and partial descriptions that can be found in consult notes and submitted on genetic test requisitions. We may find that over time the source of big data will be driven more so by the phenotype rather than the genotype. For a more detailed description of nuances involved in phenotyping, see Chapter 17. Throughout this chapter we examine how medical history and family history information are considered when assessing a variant. Beginning with population-based data followed by pedigree analysis, molecular pathology, and mosaicism we illustrate how clinical information can be used to guide analysis of DNA variants.

Population genetic resources One of the current sources of big data comes from genetic test results from large groups of people. Databases such as the Genome Aggregation Database (gnomAD) and the Database of Genomic Variants collect data from healthy controls and make them available to the public. Although the data have many caveats, which are discussed throughout this chapter, these databases reflect the current status of our knowledge of the human genome. Several well-described genetic phenomena have helped shape the human genome and can be observed in these databases.

Fitness: reproductive success Charles Darwin may not have come up with the term “survival of the fittest”; however, he was the first to describe the process of natural selection. He identified that survival and, more importantly, reproduction can depend on differences in traits that are heritable. We now understand that these traits are coded by our DNA; we call the traits phenotypes and the underlying DNA changes genotypes. Variations in DNA are rarely advantageous; in medical genetics, we are generally interested in variants that are deleterious. Often deleterious variations will lead to reduced survival and reproduction. Fitness is the reproductive success of a specific genotype, not to be confused with physical fitness. People with genotypes that cause disease will be less healthy and therefore less likely to pass on disease-causing variants. Fitness is a probability that applies to a group of individuals with a particular genotype. It is the chance that individuals with the genotype will have offspring, or rather the average number of offspring that an individual with the genotype will have. A genotype associated with high fitness contributes to a high average number of offspring, which leads to more individuals in the population with that genotype. Conversely, a genotype associated with low fitness contributes to a low average number of offspring and a low number of individuals with that genotype. DNA variants that are pathogenic are more often seen in populations of patients with disease than in populations of healthy people. Therefore, one of the most common types of data used in variant interpretation is the frequency with which a variant is observed. The general understanding is that variants that are pathogenic for monogenic disorders will be rare and variants that are benign will be common. This is a vast oversimplification, but the concept is the foundation for understanding how to use population genetic studies in variant interpretation.

Hardy–Weinberg equilibrium Fitness, however, is not the only pressure that affects the genotypes and allele frequencies that we observe in a population. The Hardy–Weinberg equilibrium (HWE) is the principle that allele frequencies within a population will stay constant unless certain disturbances affect the population [2]. Several factors affect allele frequencies in the population. These include genetic drift, immigration and emigration (gene flow), population bottlenecks, and nonrandom mating. A population that is not being affected by these factors is said to be in HWE, and we can use the HWE equation to calculate genotype frequencies from allele frequencies or vice versa. The equation assumes there are two different alleles at a locus, a and b, with frequencies p and q. Since it is assumed that these are the only alleles at this locus, p and q will always equal 1 when added together.

$$p + q = 1$$

The expected genotype frequencies must also add up to 1 and are represented by $p^2$, $2pq$, and $q^2$:
• $p^2$: individuals homozygous for allele a, typically the reference allele or wild type
• $2pq$: heterozygous individuals, who have allele a and allele b
• $q^2$: individuals homozygous for allele b

$$p^2 + 2pq + q^2 = 1$$

This will hold true when the population is in equilibrium. The equation can be used practically for variant interpretation. By observing allele frequencies one can calculate the expected genotype frequencies and therefore the expected number of affected individuals. An example would be a recessive condition. The expected number of homozygous observations of an allele can be calculated and compared to the observed number in the general population. If the variant is causative, the expectation is that no homozygous observations will be made. However, homozygous observations may be missing simply because the allele frequency is low and not because the variant is causative of disease. The gene CFTR is associated with cystic fibrosis, a condition with childhood onset that is typically identified on newborn screening. One well-described known pathogenic variant is often referred to as delta F508 or, in HGVS nomenclature, NM_000492.3(CFTR):c.1521_1523delCTT (p.Phe508delPhe). The variant is observed in gnomAD at a frequency of 0.007 and in the European population at 0.01, which might be considered benign evidence because the allele frequency is greater than expected for the disorder (ACMG criterion BS1). Below is the HWE calculation for this allele, where q is the allele frequency of the variant.

$$q = 0.01$$
$$q^2 = (0.01)^2 = 0.0001$$

The gnomAD European population comprises 129,034 alleles, or 64,517 individuals.

$$64{,}517 \times 0.0001 \approx 6.45$$


Therefore we expect six to seven individuals in gnomAD to be homozygous for this allele based on the allele frequency; however, gnomAD has only one individual listed. This is evidence that homozygous individuals would be screened out of the healthy population as they are affected with disease. The one observation might be indicative of an individual with mild presentation, the inclusion of an individual with CF in the database, or sequencing error; regardless one observation is low. As the number of individuals sequenced in the gnomAD database increases these calculations will become more powerful.
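The CFTR calculation above can be reproduced with a short Python sketch. The function name is hypothetical, and the calculation assumes a diploid autosomal locus in HWE, so that the number of individuals is half the number of sequenced alleles.

```python
def expected_homozygotes(allele_frequency, allele_number):
    """Expected count of homozygous individuals under HWE.

    allele_frequency: frequency q of the variant allele.
    allele_number: total alleles sequenced (two per individual at an
    autosomal locus, an assumption of this sketch).
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    n_individuals = allele_number / 2
    return allele_frequency ** 2 * n_individuals


# Values quoted in the text for the gnomAD European population.
expected = expected_homozygotes(0.01, 129_034)
observed = 1
```

Comparing `expected` (about 6.45) with `observed` (1) restates the argument in the text: fewer homozygotes are seen among apparently healthy individuals than the allele frequency alone would predict.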

Population ethnic background The HWE is an important concept for understanding allele frequencies. It reminds us that allele frequencies are not static and that real-world populations are not in equilibrium. Populations across the planet have grown apart genetically and therefore their allele frequencies differ. This highlights the importance of control databases separating entries by ethnic background. The ACMG/AMP benign evidence criteria BA1 and BS1 can be applied not just to the entire dataset but to a single population. If a variant’s minor allele frequency (MAF) exceeds the cutoff in one population, then it is reasonable to believe that it is not causing disease, unless that population is known to have a higher incidence of the condition in question. When deciding if a variant is common, the ethnic background of the patient in whom the variant was identified does not need to be considered, as genetic variants are not known to have separate effects based on ethnic background; they can simply be more common in one population than another. The basic allele frequency concepts of genetic drift, the founder effect, and selective advantage can explain why genetic variations are more common in one population than in another. However, it is important to use race-matched controls when determining that a variant is rare.
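A per-population frequency check of the kind described here might be sketched as follows. The function name, population labels, and counts are hypothetical. The default cutoff of 0.05 corresponds to the BA1 threshold in the 2015 ACMG/AMP guideline, but it is only a placeholder, since gene- or disease-specific specifications often use different values.

```python
def populations_over_cutoff(allele_counts, cutoff=0.05):
    """Return populations whose allele frequency exceeds the cutoff.

    allele_counts maps a population label to (allele_count, allele_number).
    Populations with no sequenced alleles are skipped.
    """
    flagged = {}
    for population, (allele_count, allele_number) in allele_counts.items():
        if allele_number == 0:
            continue  # no data for this population
        frequency = allele_count / allele_number
        if frequency > cutoff:
            flagged[population] = frequency
    return flagged


# Hypothetical counts, for illustration only.
counts = {
    "population_A": (30, 1_000),    # 0.03, below the cutoff
    "population_B": (600, 10_000),  # 0.06, above the cutoff
    "population_C": (0, 0),         # not sequenced
}
flagged = populations_over_cutoff(counts)
```

A variant flagged in even one population would, following the reasoning above, be a candidate for the benign frequency criteria, provided that population is not known to have a higher incidence of the condition and is large enough for the frequency estimate to be reliable.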

Prevalence of disease Knowing the prevalence or incidence of a monogenic disorder is helpful when assessing whether or not the frequency of a variant in the population is too high to be pathogenic. The prevalence of disease is the proportion of individuals in a population that have a specified disease. Prevalence is similar to but distinct from incidence, which is the number of individuals with the disorder that are born in a year. The allele frequencies obtained from databases like gnomAD are essentially prevalences of the alleles and are suitable for comparison with the prevalence of a disease.

$$\text{Prevalence} = \frac{\text{Number of individuals with a disease}}{\text{Number of individuals in the population}}$$

Expected variant frequency Since most monogenic disorders have a variety of causative variants due to both genetic heterogeneity and allelic heterogeneity, the frequency of any one pathogenic variant in a population will usually be less than the prevalence of the disorder that it causes. Genetic heterogeneity is observed when a disorder is caused by pathogenic variants at more than one locus or gene, whereas allelic heterogeneity is observed when different variants at the same locus or gene cause the same disorder.
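One way to turn this reasoning into a rough number is the back-of-the-envelope bound sketched below. It is not a method given in the text: it assumes a rare, autosomal dominant disorder, so that carriers of a variant make up about twice its allele frequency, and it takes as inputs the fraction of cases attributable to the gene, the fraction of that gene's cases attributable to the single variant, and the penetrance. All names and values are hypothetical.

```python
def max_single_variant_af(prevalence, gene_fraction, allele_fraction, penetrance):
    """Rough upper bound on the allele frequency of one pathogenic variant.

    Assumes a rare dominant disorder: affected carriers of the variant
    (about 2 x AF x penetrance) cannot exceed the share of prevalence
    attributed to the variant (prevalence x gene_fraction x allele_fraction).
    """
    for name, value in (("prevalence", prevalence),
                        ("gene_fraction", gene_fraction),
                        ("allele_fraction", allele_fraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if not 0.0 < penetrance <= 1.0:
        raise ValueError("penetrance must be in (0, 1]")
    return prevalence * gene_fraction * allele_fraction / (2 * penetrance)


# Hypothetical disorder: 1 in 10,000 prevalence, half of cases due to this gene,
# at most 10% of those due to any single variant, 50% penetrance.
bound = max_single_variant_af(1 / 10_000, 0.5, 0.1, 0.5)
```

With these illustrative inputs the bound is 5 in a million, far below the prevalence itself, which matches the point that any single pathogenic variant is usually much rarer than the disorder it causes.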


There are many factors that can confound variant frequency data when comparing it to prevalence data, including:
• incomplete (reduced) penetrance: some individuals who carry a pathogenic variant will not display symptoms of the disease, which will increase the variant frequency in the general population
• incomplete phenotyping: some individuals will have phenotypic traits of a disorder that were overlooked or not detected, which will also increase the variant frequency in the general population
• genetic heterogeneity: pathogenic variants at two or more genes (loci) can cause the same disease, which can lower the variant frequency
• non-genetic causes/phenocopies: these increase the apparent prevalence of the disease
Consider the disease in question when considering if the frequency of a variant is too high to be causative of that disease. For example, the 2015 ACMG/AMP guidelines on sequence variant interpretation refer to the following evidence under their rule PS4: “The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls” [3]. They note the difficulty of assessing rare variants where case–control evidence may not reach statistical significance and suggest that multiple observations in individuals with disease be used as evidence in place of a statistical difference. In a follow-up guideline on the use of ACMG/AMP guidelines tailored specifically to the gene CDH1, the authors further describe the use of clinical observations [4]. CDH1 pathogenic variants cause hereditary diffuse gastric cancer (HDGC, OMIM#137215), an extremely rare cancer syndrome. Due to the rarity of the condition, cases would almost never be numerous enough to reach significance, so the authors suggest simply compiling cases where patients meet clinical criteria for HDGC [5]. Since the condition is not fully penetrant, observations in apparently healthy controls are counted against cases, with a required ratio of 1:3 cases to controls in order for the cases to be counted as evidence.

Ascertainment When utilizing variant frequency data to assess a variant it is important to understand how the data were collected or the ascertainment of the data. The characteristics of the population including sex, age, and geographical location can all affect the frequency of disease-causing variants. A control population will often have individuals with disease that either may not yet have manifested or were simply undetected depending on the level of scrutiny applied to the study participants. For example, a healthy participant in a study of cancer may have a cardiac disease that was not screened for.

Ascertainment bias An ascertainment bias, sometimes referred to as a sampling bias, is essentially a distortion of the true frequency due to a bias in the way that the sample was collected leading to some of the intended population being less likely to be sampled than others.


One example was a recent study that identified that Alzheimer’s disease risk was lower among those exposed to high levels of statins [6]. However, in response researchers pointed out that statin use is likely related to subsequent physician visits and that an Alzheimer’s disease diagnosis requires visits to the doctor’s office. It is possible that people with Alzheimer’s disease were simply more likely to be put on statins (albeit appropriately) because they were regularly attending their physicians’ offices, compared with individuals who do not seek medical care regularly but might also benefit from statins. Therefore, adjusting for doctor visits may be appropriate before analyzing the data [7]. All data have some type of bias, so adjusting for ascertainment bias is needed to avoid misunderstanding data. The Genome Aggregation Database (gnomAD) is an excellent source of data for assessing variants. In a tremendous effort the production team at gnomAD has collected exome and genome data from various projects and has made it publicly available [8]. The data are considered to be a representation of healthy individuals, as the organizers have removed data from individuals known to be affected by severe pediatric disease, as well as their first-degree relatives. However, when reviewing the data we should be aware that many individuals with adult-onset monogenic disorders or milder presentations of disease will be present in the dataset.

Ascertainment of “healthy” individuals

Population databases such as gnomAD are a useful tool but should be interpreted with caution. Autosomal dominant polycystic kidney disease (ADPKD) is an excellent example of a condition for which affected individuals would be present in the gnomAD dataset. The condition is silent in affected individuals until much later in life: even though cysts may be present at a young age, kidney function is normal until middle age. Upon reviewing the gnomAD dataset, researchers identified many pathogenic variants in the genes responsible for ADPKD, PKD1 and PKD2. In their analysis, one group identified 25 truncating variants and 393 previously reported disease-causing variants, including 54 different variants previously classified as disease-causing that were observed in 5 patients [9]. Because the apparently healthy individuals in gnomAD were ascertained in a way that does not exclude late-onset conditions, the dataset is likely to include individuals with ADPKD or other adult-onset disorders. When we interpret variant frequency information from gnomAD we have to keep this example in mind. If the condition we are investigating can go undiagnosed, we should expect some pathogenic variants in gnomAD or other databases of apparently healthy individuals. Understanding this ascertainment bias allows us to observe a variant in a dataset of healthy individuals without taking it as absolute evidence that the variant is benign. Caution should be taken when invoking ACMG criteria related to observations of the variant in healthy individuals.

Ascertainment of individuals with disease

A similar situation arises when a variant is observed more commonly in a population of individuals with disease than in healthy individuals. It is tempting to say that the variant is disease-causing, but without large cohorts of healthy individuals or good data from family studies it is a leap to say that such variants cause disease.


For example, multiple recurrent deletions have been identified in chromosome region 15q11 due to low copy repeats that mediate various copy number variations. Prader-Willi syndrome (PWS) and Angelman syndrome (AS), both well-described conditions, can be caused by deletions in this region. However, a small recurrent deletion of about 700 kb adjacent to the critical region for PWS/AS is less well understood. This deletion, typically referred to as “15q11 BP1-BP2,” is often identified in cases of developmental delay, motor delay, and speech and language delay, and was implicated as causative for an emerging syndrome [10]. In addition, genes within the region, although not known to cause human disease, are known to be expressed in the nervous system. However, a closer look at the phenotypes revealed that penetrance estimates were based on inconsistent phenotypes [11]. When the authors considered how often unaffected parents carried the deletion and how often the neurodevelopmental phenotypes were distinct from each other, they suggested that this finding does not define a distinct syndrome and should be considered a VUS. The deletion was identified in an ascertainment group of patients referred for clinical testing, almost all of whom have some type of neurodevelopmental disorder. The finding is still generally reported as a VUS, and most consider it likely to be a susceptibility allele with mild impact rather than a straightforward pathogenic variant.

Matched controls in genetics studies

Matched controls are generally considered an excellent level of data in the medical literature. In genetic studies ethnicity is typically matched; however, depending on the study, different variables may be important to match between participants and controls. At the level of individual genetic variants, matched controls must be used carefully when trying to demonstrate the pathogenicity of a variant. Most variants, whether pathogenic or benign, are very rare and many are even unique. The absence of a variant from a matched control cohort is therefore not strong evidence of pathogenicity. Databases such as gnomAD provide a better source of variant data, but even with an impressively large number of individuals the data are only one part of the assessment used to determine the pathogenicity of a variant. The ClinGen sequence variant interpretation (SVI) working group has recently recommended using the absence of a variant from control data as only supporting evidence, as opposed to the originally proposed moderate level of evidence.

The history of genetic testing in hypertrophic cardiomyopathy (HCM) is particularly interesting and provides a lesson on using matched controls. In a recent review of cardiac gene panels, 33 HCM genes were commonly present on panels, of which 22 were determined to have little or no evidence to support an association with HCM [12]. So how did these genes end up on these panels? The authors note that the studies identifying these associations took a candidate gene approach and observed rare variants in cases, which they then showed were not present in matched controls. We now understand that this level of data on its own is not sufficient to assign pathogenicity.

Population allele frequency

A reference sequence is needed in order to identify variation: variants are described relative to a particular control sequence, the “reference sequence.” Many types of normal or benign variation are identified when searching for disease-causing variants. The rate at which these variants are identified in the general population is referred to as the minor allele frequency (MAF), not to be confused with the allele fraction of a variant. Some variants are even identified at MAFs over 50%, meaning they are more common than the reference allele at that site and could very well prompt a change to the reference sequence in future genome builds. Regardless, some variations are rare and some are more common. In general, we refer to common variations as polymorphisms, typically meaning a variant that is present in the general population at a frequency greater than 1%, although different cutoffs have been used over the years.

Allele frequency thresholds

Identifying the frequency of a variant in the general population is extremely useful in assessing the pathogenicity of a DNA variant. There are various ways to establish the MAF in the general population, and publicly available databases provide a rich source of information. The 1000 Genomes Project and the Exome Variant Server are some of the earlier datasets; however, the most commonly used source is the more recently released gnomAD, containing the sequences from >120,000 exomes and genomes of healthy controls in the general population. For larger copy number variants, the Database of Genomic Variants is a rich source of data, as is the gnomAD structural variants dataset. In addition, race-matched control data in a study can be a source of allele frequency data, although the control population in a single study is often far too small to be informative.

The general concept behind using allele frequency is that a variant that causes rare disease should not be found commonly in healthy people. Therefore, in the context of a rare monogenic (Mendelian) disorder, a high allele frequency in the general population is considered strong evidence for a benign interpretation. This concept, like the current ACMG guidelines for assessing sequence-level variants, applies best to severe childhood-onset diseases. There are therefore many caveats, and careful consideration of the source of the data along with the disease being considered is essential for proper interpretation of the allele frequency.

MAF thresholds

When assessing an allele frequency, how common does a variant have to be to be greater than expected for the disorder? It is straightforward to establish the extremes; often there is no question that a variant is too common to warrant further consideration. However, the gray area can be much more nuanced. For benign evidence, the ACMG/AMP guidelines set a stand-alone benign criterion (BA1) at an allele frequency of >5%. The strong benign criterion (BS1), which requires that the MAF be greater than expected for the disorder, is slightly more subjective. Generally, the pathogenic variants identified at frequencies anywhere near the BA1 cutoff are associated with recessive disorders. Even for common pathogenic variants the ACMG stand-alone cutoff is conservative (although some exceptions have been identified), as it should be for a piece of evidence that carries this much weight. However, in an effort to make this evidence apply to all conditions, it becomes far too conservative for many rare diseases. For autosomal dominant conditions, there are no genetic diseases affecting 1/20 people, and therefore a 5% allele frequency cutoff is far higher than the prevalence of the disease, let alone the prevalence of any one variant, which exemplifies how conservative it is. The criterion is still very useful in allowing laboratories to easily call polymorphisms benign, but it stops short of being useful for more difficult assessments.


Thresholds used for benign evidence criteria

Being too conservative with benign evidence can make it difficult to invoke the criteria needed to move truly benign variants to a likely benign or benign interpretation. Typically, variants interpreted as benign or likely benign are not reported, whereas VUSs are often reported and can be stressful for patients. Being conservative with variant interpretations is therefore not always in the best interest of the patient. We can safely set gene- and disorder-specific MAF cutoffs slightly lower and still avoid applying benign evidence to truly pathogenic variants. Both the known inheritance pattern and the prevalence of a disorder can be used to establish reasonable MAF cutoffs. Starting with autosomal dominant disorders, we can assume that the combined frequency of pathogenic variants is approximately equal to half the prevalence of the disease. In basic terms, if 1/100 people have a dominant condition then 1/200 alleles carry a variant causing that condition (assuming full penetrance and ascertainment). Some pathogenic variants will contribute to multiple cases of the disease, within a family as well as in different families. At the same time, there are no described autosomal dominant genetic conditions in which a single pathogenic variant causes all cases of disease, although some conditions such as achondroplasia come close. Therefore, the frequency of any one pathogenic variant is actually much lower than the prevalence of the disease. This makes setting the benign evidence allele frequency cutoff at half the prevalence of the disorder a very conservative approach as well. The SVI group at ClinGen has considered the importance of interpreting variants in the context of the associated disease and is working to establish specifications of the ACMG/AMP guidelines for specific diseases.
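The arithmetic behind the autosomal dominant example can be sketched in a few lines of Python. This is only an illustration of the reasoning in the text, assuming full penetrance and complete ascertainment; the function name `max_pathogenic_allele_frequency_ad` is hypothetical, and the result is a conservative upper bound on the combined pathogenic allele frequency, not a recommended cutoff for any particular gene.

```python
from fractions import Fraction


def max_pathogenic_allele_frequency_ad(prevalence):
    """Upper bound on the combined pathogenic allele frequency for an
    autosomal dominant disorder.

    Assumes full penetrance and complete ascertainment: each affected
    person carries one pathogenic allele out of two, so the allele
    frequency is half the disease prevalence.
    """
    prevalence = Fraction(prevalence)
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be between 0 and 1")
    return prevalence / 2


# Example from the text: 1/100 people affected gives 1/200 alleles.
bound = max_pathogenic_allele_frequency_ad(Fraction(1, 100))
print(bound)         # 1/200
print(float(bound))  # 0.005
```

Because no single variant accounts for every case, the frequency expected for any one pathogenic variant lies below this bound.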
The ACMG/AMP guidelines were designed to apply to all genes and diseases associated with Mendelian disorders and therefore need to be adapted to each disease; the SVI group is taking on the task of publishing the specifications. Experts in multiple disease groups have identified that a BA1 cutoff of 0.05 is high and have lowered it based on the condition in question. With this in mind, the ClinGen SVI group has recommended that disease experts define gene-specific BA1 criteria based on what is known about the prevalence, penetrance, and genetic and allelic heterogeneity of the associated conditions [13]. In the same publication the authors also discuss the reverse of this caveat: in some cases a BA1 MAF cutoff of >0.05 can lead to pathogenic variants being incorrectly assigned as benign. The group identified variants that have a MAF above 0.05 and are listed as pathogenic in ClinVar but are not of concern when filtered out, because they fall under one or more of the following categories:

(a) the variant was better considered a common susceptibility allele or modifier
(b) the gene-disease association was judged to be unproven
(c) the phenotype was better considered a trait, instead of a disease
(d) the variant had very limited evidence that was scored as insufficient by an expert reviewer
(e) the variant was only seen somatically
(f) the gene is noncoding

However, the authors also identified variants at high frequency (>0.05) that were listed as pathogenic in ClinVar and that they agreed were not benign for Mendelian disorders (5 VUS, 4 pathogenic); these would be missed with a 0.05 MAF filter applied broadly to all variants. ClinGen committed to creating a list of BA1 exception variants, which should be referenced when applying a 0.05 MAF filter to a variant caller. Laboratories are encouraged to maintain their own internal list of these high-frequency variants. These variants will differ between populations due to departures from Hardy-Weinberg equilibrium (HWE).
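A frequency filter that respects an exception list might be sketched as follows. This is a minimal illustration rather than any laboratory's actual pipeline: the variant identifiers in `BA1_EXCEPTIONS` are made-up placeholders, and in practice the list would be drawn from the ClinGen BA1 exception list together with a laboratory's own curated entries.

```python
BA1_THRESHOLD = 0.05  # ACMG/AMP stand-alone benign cutoff (allele frequency > 5%)

# Hypothetical placeholder identifiers, not real exception variants.
BA1_EXCEPTIONS = {"GENE1:c.100A>G", "GENE2:c.200del"}


def meets_ba1(variant_id, allele_frequency,
              threshold=BA1_THRESHOLD, exceptions=BA1_EXCEPTIONS):
    """Return True if BA1 may be applied to the variant.

    A variant on the exception list is never filtered on frequency alone,
    however common it is; it must be assessed with the other criteria.
    """
    if variant_id in exceptions:
        return False
    return allele_frequency > threshold


print(meets_ba1("GENE3:c.5C>T", 0.12))    # True: common, not on the list
print(meets_ba1("GENE1:c.100A>G", 0.12))  # False: common, but an exception
print(meets_ba1("GENE3:c.5C>T", 0.01))    # False: below the cutoff
```

Passing `threshold` as a parameter leaves room for the gene-specific BA1 cutoffs that disease experts are encouraged to define.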

Thresholds used for pathogenic evidence criteria

For evidence in favor of pathogenicity, the MAF does not play nearly as large a role. The observation that a variant is rare is not strong evidence that it is pathogenic. As mentioned previously, de novo variants are present in everyone, likely about 50 per person including 1–2 in coding sequence. These are often rare or even unique. Moreover, variants may be inherited but still unique to a family, as de novo variants are acquired over the generations. A unique de novo variant that first arose in your patient’s great-grandparent is still unlikely to be present in any public dataset. Therefore, it is not unusual to identify a unique or extremely rare variant in any patient. The ACMG/AMP guidelines suggest applying supporting evidence of pathogenicity if a variant is absent from controls (PM2_Supporting).

Additional care should be taken when the variant is not associated with the disease in question. In the context of an incidental finding, or of a preventative genetic screen in a healthy person, a rare variant in a disease-causing gene is even less suspicious. The rationale is that the degree of suspicion for a variant is raised when it is identified in a gene that was targeted for sequencing based on the patient’s phenotype. Without this prior association the level of suspicion is lower overall. In the context of monogenic disorders, MAFs can inform the assessment based on the hypothesis that a single DNA variant is contributing to the disease, leading to the idea that pathogenic variants are generally expected to be rare and common variants are generally expected to be benign. We can refer to this as the “Rare Disease, Rare Variant” hypothesis.

Population size

When discussing a MAF we should recognize that a frequency is basically a fraction. This means that there is a numerator and a denominator, being the number of alleles positive for the variant and the total number of alleles screened, respectively:

$$\mathrm{MAF} = \frac{\text{Alleles positive for the variant}}{\text{Total alleles screened}}$$

When the denominator is too low the frequency becomes less reliable. ClinGen SVI recommends that a dataset have a minimum of 2000 alleles screened in order for its MAFs to be used for variant interpretation. This may seem low, as gnomAD has approximately 120,000 cases or 240,000 alleles. However, certain factors can drastically lower that number and cause the total number of alleles to fall below the cutoff of 2000. First, we may use the MAF from just one ethnic population; a subpopulation will of course have far fewer alleles than the total. Second, certain regions of the genome will be of lower quality due to genomic content or regions of homology and will have many cases filtered out by quality control. Lastly, most cases are from exome sequencing; therefore intronic regions will have much lower allele totals until more whole-genome sequencing cases are added. It is therefore always important to note the denominator and not the MAF alone. Some groups have taken the approach of changing the weight of the evidence based on the number of alleles in the population, with higher weight being given when the allele population is >15,000 and lower weight when it is <15,000 [14]. In either approach, the population size is an important consideration when invoking any criterion that utilizes an MAF.
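The role of the denominator can be made concrete with a short sketch. The function name and the tier labels below are hypothetical; the 2000-allele minimum and the 15,000-allele boundary are the figures quoted in the text, and treating a count of exactly 15,000 as the lower tier is an assumption made here because the text does not specify it.

```python
MIN_ALLELES = 2000           # minimum alleles screened, as cited in the text
HIGH_WEIGHT_ALLELES = 15000  # weighting boundary described in the text [14]


def summarize_frequency(allele_count, allele_number):
    """Return (MAF, tier) for a variant in one dataset or subpopulation.

    allele_count  -- alleles positive for the variant (numerator)
    allele_number -- total alleles screened at that site (denominator)
    """
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    if not 0 <= allele_count <= allele_number:
        raise ValueError("allele_count must lie between 0 and allele_number")

    maf = allele_count / allele_number
    if allele_number < MIN_ALLELES:
        tier = "insufficient"   # too few alleles to rely on the MAF
    elif allele_number > HIGH_WEIGHT_ALLELES:
        tier = "higher weight"
    else:
        tier = "lower weight"   # assumption: exactly 15,000 falls here
    return maf, tier


print(summarize_frequency(3, 240000))  # whole dataset
print(summarize_frequency(1, 1500))    # small subpopulation or poorly covered site
```

Reporting the tier alongside the MAF keeps the denominator visible whenever a frequency-based criterion is invoked.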

Family history

Family history data can be incredibly powerful and should be leveraged when appropriate. Clinical genetics laboratories typically have room on their requisitions for family information and they will often reach out to referring clinicians when clarification would assist in variant interpretation. Evaluating inheritance patterns and cosegregation in families is a powerful way to use family information for variant interpretation.

Inheritance patterns

The family history recorded in the form of a pedigree can display a recognizable pattern. There are various inheritance patterns of monogenic disorders. Determining the correct inheritance pattern can focus the analysis on variants in specific genes with known modes of inheritance.

Autosomal dominant (AD)
• Affected person usually has an affected parent (multiple generations affected)
• Both males and females affected
• Transmitted by both males and females
• 50% recurrence risk

AD conditions are caused by a pathogenic variant in one allele of a gene pair. The allele with the pathogenic variant is said to dominate over the other allele. In each pregnancy there is a 50% chance to pass on the allele with the variant and a 50% chance to pass on the allele without the variant (Fig. 5.1).

FIGURE 5.1 Autosomal dominant inheritance.

Autosomal recessive (AR)
• Affected people born to unaffected (carrier) parents
• Increased incidence of parental consanguinity
• Both males and females affected
• Disease may skip generations
• 25% recurrence risk
• Often only one affected person in the family

Autosomal recessive (AR) conditions are caused by pathogenic variants in both alleles of a gene pair, often referred to as biallelic. These can be the same variant, which is referred to as homozygous, or different variants, referred to as compound heterozygous. Individuals with only one allele with a pathogenic variant are referred to as carriers and are typically unaffected (Fig. 5.2).

FIGURE 5.2 Autosomal recessive inheritance.

X-linked recessive (XLR)
• Affects mainly males
• Females affected more mildly and variably
• No male-to-male transmission
• Affected males related through carrier females
• Recurrence risk, 50% for male children

X-linked recessive (XLR) conditions are caused by a pathogenic variant in a gene on the X chromosome. Female carriers have two copies of the X chromosome, and therefore their unaffected allele of an XLR gene can compensate for the allele with the pathogenic variant. Males have only one X chromosome; therefore, if they have a pathogenic variant in an XLR gene they will be affected (Fig. 5.3). Female carriers can have mild features of the disease. Typically only one of their X chromosomes is expressed in each cell due to a process called X-inactivation. Because it is generally random which X is inactivated, the X with the variant and the X without the variant tend to be inactivated in about equal proportions. However, sometimes X-inactivation is skewed and the X without the variant is inactivated more often than the X with the variant, leading to a female carrier having symptoms associated with the XLR disorder.


FIGURE 5.3 X-linked recessive inheritance.

X-linked dominant (XLD)
• Affected males have no affected sons and no unaffected daughters
• Both male and female offspring of female carriers have a 50% risk of being affected; the pedigree pattern is similar to autosomal dominant inheritance
• Affected females tend to have a milder disease phenotype
• Some XLD disorders are male lethal, resulting in only affected females and a deficiency of male offspring

XLD conditions are similar to XLR conditions in that they are caused by a pathogenic variant in a gene on the X chromosome. However, with these conditions a female’s unaffected allele does not compensate enough to avoid symptoms, although it typically provides some compensation (Fig. 5.4).

Y-linked
• Only male-to-male transmission, only affected males
• Very few male traits, few truly Y-linked conditions
• Mutant Y copy of pseudoautosomal gene can appear as Y-linked trait (e.g., Léri-Weill dyschondrosteosis)

In this case the pathogenic variant is on the Y chromosome and therefore females cannot be affected. Most conditions caused by pathogenic variants on the Y chromosome affect fertility and as such are not often passed on to progeny. Conditions caused by genes in the pseudoautosomal region of the Y chromosome can be subject to recombination with the X chromosome and could be passed on to females in rare cases (Fig. 5.5).

Mitochondrial inheritance
• Affected females pass on the condition to most children
• Affected males do not pass on the condition to anyone nor do their progeny
• Variable expressivity is a hallmark feature

FIGURE 5.4 X-linked dominant inheritance.

FIGURE 5.5 Y-linked inheritance.

This inheritance pattern applies to conditions caused by pathogenic variants in mitochondrial DNA, which is found in the mitochondria. There are conditions that affect mitochondria but are associated with nuclear genes; these do not follow mitochondrial inheritance patterns. Only females pass on their mitochondria, and thus their mitochondrial DNA, to their progeny. A pathogenic variant in mitochondrial DNA is not always present in all of an individual’s mitochondria; this is referred to as heteroplasmy. Therefore, it is not certain that all of an affected female’s children will inherit a mitochondrial DNA variant. In addition, the conditions can display a large degree of variable expressivity depending on the proportion of affected mitochondria that are inherited (Fig. 5.6).

Incomplete penetrance and atypical presentations of disease can complicate pedigree analysis and should be considered based on the disease in question. Some inheritance patterns are more complex, including imprinting, digenic, polygenic, and multifactorial inheritance. Consider these when a pedigree is not explained by one of the inheritance patterns above. The ACMG/AMP guidelines for interpreting sequence variants apply to monogenic disorders, and as such the guideline criteria may not work well with more complex inheritance patterns. These complex patterns can layer on top of the inheritance patterns above. For example, in digenic inheritance a second locus can influence the more significant locus, which can act as either recessive or dominant.

FIGURE 5.6 Mitochondrial inheritance.


Inheritance analysis

The ACMG/AMP guidelines include various levels of evidence where family history can be leveraged to invoke the criteria. Starting with PP4, this criterion is invoked when the patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. The use of the family history here is to establish that the condition is being inherited as one would expect for the disorder.

Take, for example, a referral for a clinical suspicion of Alport syndrome or thin basement membrane nephropathy (TBMN). This condition can be inherited in an autosomal dominant, autosomal recessive, or, most classically, X-linked fashion associated with the gene COL4A5. The condition presents with proteinuria, hematuria, sensorineural hearing loss, and ophthalmological findings. The patient presents with kidney function decline, hearing is normal, and focal segmental glomerulosclerosis (FSGS) was identified on kidney biopsy. There is a family history of end-stage renal disease affecting only males in the family. The submitted pedigree is displayed in Fig. 5.7. The symptoms of the proband and the family history are in keeping with TBMN, or essentially a nonclassical form of Alport syndrome. Although the biopsy findings are nonspecific, next-generation sequencing identifies a variant in COL4A5, which is located on the X chromosome.

At a quick glance the family could look suspicious for an X-linked condition: males are more severely affected and females less so. However, a proper analysis of the family history rules out the possibility of an X-linked condition. The X chromosome is not passed from males to males, and there are two instances in the pedigree where an affected male has a son who is also affected. This assessment of the family history informs the variant assessment. The variant in an X-linked gene is much less suspicious since it does not explain the family history. This family history is indicative of an autosomal dominant condition with variable expressivity and not an X-linked condition. In this case we could invoke the ACMG/AMP criterion BS4 (lack of segregation in affected members of a family), as we know that this variant will not segregate with disease in this family.
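The pedigree check described above, looking for affected fathers with affected sons, can be sketched in code. The data structure and the toy family below are hypothetical and are not a transcription of Fig. 5.7; the sketch only flags father-to-son transmission of the phenotype, which argues against X-linked inheritance, and does not account for phenocopies or a second variant in the family.

```python
def male_to_male_transmissions(pedigree):
    """Return (father_id, son_id) pairs where an affected father has an
    affected son.

    `pedigree` is a hypothetical structure: a dict mapping each person's ID
    to a dict with keys 'sex' ('M' or 'F'), 'affected' (bool), and 'father'
    (the father's ID, or None if he is not recorded in the pedigree).
    """
    pairs = []
    for person_id, person in pedigree.items():
        father_id = person["father"]
        father = pedigree.get(father_id)
        if father is None:
            continue  # founder, or father not recorded
        if person["sex"] == "M" and person["affected"] and father["affected"]:
            pairs.append((father_id, person_id))
    return pairs


# Toy three-generation family (illustrative only).
family = {
    "I-1":   {"sex": "M", "affected": True,  "father": None},
    "II-1":  {"sex": "M", "affected": True,  "father": "I-1"},
    "II-2":  {"sex": "F", "affected": False, "father": "I-1"},
    "III-1": {"sex": "M", "affected": True,  "father": "II-1"},
}

print(male_to_male_transmissions(family))
# [('I-1', 'II-1'), ('II-1', 'III-1')]
```

Any pair returned is a reason to question an X-linked explanation for the family history before weighing a variant in an X-linked gene.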

FIGURE 5.7 Pedigree displaying a family with a positive history of kidney function decline and hematuria. Inheritance pattern can be utilized to assist in variant interpretation.


Cosegregation

Cosegregation in clinical genetics typically refers to a pathogenic variant and the disease that it causes. The disease will be present in individuals with the variant and vice versa. When a disease is present in a family we can use this concept to investigate a variant. However, cosegregation makes some assumptions that will not always hold and we should be aware of these:

Assumption | Exception
Every family member positive for the variant will have the disease | Incomplete penetrance and variable expressivity leading to mild or no phenotype
Every affected family member has the disease due to the same DNA variant | Two different pathogenic DNA variants in the family; phenocopies (environmental cause); nonpaternity
Family history is reported correctly | Family members withhold medical information

The ACMG/AMP guidelines refer to cosegregation of a variant in criteria PP1 and in BS4. These criteria refer to when a variant segregates with disease in a family or does not segregate with disease in a family. When faced with a variant of uncertain significance one of the tools available to a clinician is a cosegregation analysis. This is the process of testing various family members for the variant and collecting their medical information in order to establish if a genetic finding is always found with a particular phenotype in a family.

ACMG/AMP guidelines pertaining to cosegregation
PP1: Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease
BS4: Lack of segregation in affected members of a family

A cosegregation analysis can often be found in the literature and can be very helpful in adding evidence to a variant assessment. There are hallmarks of a good cosegregation analysis, and when assessing a variant we should be aware of the caveats associated with the methods of the analysis.

Features of a thorough cosegregation analysis:
1. Complete phenotyping of all individuals tested: type of test and age when tested.
2. Assessment of factors associated with phenocopies.
3. Comprehensive sequencing for multiple/all affected family members.
4. An understanding of the inheritance of the disease including penetrance and presentation.

One of the most basic and common forms of a cosegregation analysis is the testing of parents to identify if a variant is inherited or de novo, meaning a new mutation. Classically there would be a child with a severe disorder such as developmental delay and multiple congenital anomalies. A variant is identified; however, it is not clear if the variant is causing disease. Parental testing may help to clarify things, raising the suspicion that the variant is causative if it is a de novo variant, while lowering the suspicion that the variant is causing disease if it is found to be present in one of the healthy parents.


Following a VUS identified in an affected child:

Genetic test result | Phenotype | Evidence
Both parents negative | Both parents healthy | De novo variant, suggests pathogenicity
One parent negative, one parent positive | Both parents healthy | Inherited variant, suggests variant is benign
Both parents negative | One parent has similar condition to child | De novo variant, suggests variant is benign
One parent negative, one parent positive | Positive parent has similar condition to child | Inherited variant, suggests pathogenicity

The presence of a de novo variant does not guarantee that a variant is disease-causing. Various studies on de novo variants and estimates of mutation rates have established that each person has roughly 50 de novo variants, about one or two of which are in coding regions. Likewise, the identification of an inherited variant does not guarantee that a variant is benign, considering variable expressivity, reduced penetrance, tissue-specific mosaicism, sex bias, and other rare phenomena.
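The four parental testing scenarios for a VUS in an affected child reduce to one question: does the variant track with the phenotype across the trio? The sketch below encodes only that lookup. The function name and its two boolean inputs are hypothetical, and in the inherited case `parent_affected` is assumed to describe the parent who carries the variant. The output is a direction of evidence, subject to all the caveats just listed, not a classification.

```python
def parental_testing_evidence(variant_inherited, parent_affected):
    """Direction of evidence after parental testing of an affected child.

    variant_inherited -- True if one parent carries the variant,
                         False if both parents test negative (de novo)
    parent_affected   -- True if a parent has a condition similar to the
                         child's (for an inherited variant, the carrier parent)
    """
    origin = "Inherited variant" if variant_inherited else "De novo variant"
    # The variant tracks with the phenotype when it is de novo in the child
    # of healthy parents, or inherited from a similarly affected parent.
    tracks_with_phenotype = variant_inherited == parent_affected
    if tracks_with_phenotype:
        direction = "suggests pathogenicity"
    else:
        direction = "suggests variant is benign"
    return f"{origin}, {direction}"


for inherited in (False, True):
    for affected in (False, True):
        print(inherited, affected, "->",
              parental_testing_evidence(inherited, affected))
```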

Cosegregation phenotyping

The “co” in cosegregation implies that two factors are part of the analysis: the presence or absence of the variant, and the phenotype. Simply testing family members for the presence of a variant is not enough to establish cosegregation. Each family member tested should be properly phenotyped. This may include a thorough medical history, imaging, blood or urine tests, a physician exam, or retrieving records, but usually does not include anything potentially harmful such as a biopsy unless otherwise indicated. As most genetic conditions display variable expressivity, apparently unaffected individuals should be phenotyped with the disease features and age of onset in mind before it is assumed that they are unaffected. The quality of phenotyping required will depend highly on the disease. Severe and fully penetrant disorders may require little documentation, while milder and variable conditions require much more.

For instance, when assessing a variant associated with Beckwith-Wiedemann syndrome (BWS), a mother is found to carry the suspected disease-causing variant in the gene CDKN1C. Characteristic features of BWS include omphalocele, macroglossia, visceromegaly, overgrowth, and a predisposition to childhood cancers. The mother, an adult, is of normal height and has no tumor history, suggesting that she is unaffected. An understanding of BWS will help establish how informative her phenotypic information is. First of all, people with BWS are of above-average height in childhood but this normalizes in adulthood, so the mother’s current height is of little consequence. The risk of childhood cancer is estimated to be as high as 21%, meaning that the absence of a cancer history is not unusual in someone with BWS [15]. Another important factor is that this condition is associated with imprinting; disease-causing variants in CDKN1C are maternally inherited. Therefore, inheritance from the mother is consistent with the variant being disease-causing. However, the mother being unaffected is not informative, as she may have inherited the variant from her father and could be an unaffected carrier of a disease-causing variant.


Cosegregation limitations

When a cosegregation analysis shows that the phenotype and variant segregate together in a family, it does not prove that the variant is causative for the phenotype. Although the variant may be causative, the evidence shows only that the causative variant in the family is linked to the locus being tested.

Another limitation of cosegregation analysis is chance. In autosomal dominant disease, a variant can seem to be linked to a phenotype when it is only incidentally present and is not causative of the disorder. Each additional family member added to the analysis decreases this possibility. The analysis has to take into account the possibility that the variant is benign and is present in the affected family members by chance. The chance that any one allele is passed on from a parent to a child is 50%. Therefore, if cosegregation is observed in an affected parent and an affected child, it could be due to chance. A second affected child also has a 50% chance of inheriting the variant; however, there is only a 25% chance that both children inherit any one benign variant from the parent. As the number of meioses in which a variant segregates with disease increases, the likelihood that the observation is due to chance decreases (Table 5.1). The number of observations required depends on the condition, its penetrance, and what is known about the phenotype, including how specific and how well described it is.

These limitations make it clear why cosegregation analysis is most informative when done in multiple separate families, especially if those families are from different ethnic backgrounds, which helps to establish that the families do not share a common origin. Overall, given the difficulty of performing a cosegregation analysis properly, this type of evidence is rarely available but is powerful; it can count as strong evidence under the ACMG/AMP guidelines.
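The halving argument above can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple dominant model in which each informative meiosis independently transmits a given benign allele with probability 1/2; the function name `chance_cosegregation_odds` is hypothetical and not part of any guideline or library.

```python
from fractions import Fraction


def chance_cosegregation_odds(n_segregations: int) -> Fraction:
    """Probability that a benign variant tracks with disease purely by chance.

    Assumes each observed dominant segregation (informative meiosis) is
    independent and transmits the allele with probability 1/2.
    """
    if n_segregations < 0:
        raise ValueError("number of segregations cannot be negative")
    return Fraction(1, 2) ** n_segregations


# Reproduce the segregation counts listed in Table 5.1.
for n in (10, 7, 5, 4, 3, 2, 1):
    print(n, chance_cosegregation_odds(n))
```

Exact fractions are used rather than floating-point values so that the output can be compared directly with the odds given in Table 5.1. The sketch ignores penetrance, phenocopies, and linkage to a different causative variant, all of which the text identifies as separate limitations.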

Table 5.1 Odds that a cosegregation analysis shows phenotype and genotype segregating together by chance. The odds are based on the number of dominant segregations observed.

Number of dominant segregations | Odds
10 | 1/1024
7 | 1/128
5 | 1/32
4 | 1/16
3 | 1/8
2 | 1/4
1 | 1/2

Molecular pathology

Family history and phenotypic presentation are part of almost every indication for a clinical genetics assessment; however, pathological findings are also used to triage those requiring further clinical genetics assessments. Molecular pathology is the examination of molecules (also called biomarkers) used to understand, diagnose, and treat disease. Choosing the correct specimen type and molecular target are critical aspects of molecular pathology studies. Biological specimens may include tissue biopsies or resections, peripheral blood and plasma, bone marrow, urine, cerebrospinal fluid, and cytology extractions. The molecular targets include protein expression, biochemical analytes, and genomic alterations. In cancer, molecular biomarkers predict therapeutic response to targeted therapy, assist with tumor classification, predict survival outcomes, and can suggest a hereditary predisposition to cancer. In congenital disorders, molecular biomarkers are used in the diagnosis of many types of disease, such as inborn errors of metabolism, muscular dystrophies, and focal segmental glomerulosclerosis (FSGS). This section outlines the use of molecular pathology biomarkers in cancer, congenital disorders, and newborn screening in order to understand how they apply to the clinical relevance of genetic aberrations found in the same patient.

Hereditary cancer predisposition

Cancer is caused by both extrinsic (environmental exposures) and intrinsic (genetic) factors. A familial predisposition to cancer can imply a common environmental exposure within the family that causes cancer, but can also suggest a common genetic variant that places the family members who inherit it at a higher risk of developing certain cancer types. A genetic assessment for familial cancer predisposition includes decisions about when and whom to test, which genes are relevant, and how to interpret negative genetic findings. Molecular pathology studies can identify patients who may have a hereditary predisposition to cancer rather than a sporadic occurrence. The distinction is important since a hereditary predisposition implies that other family members are at risk and may require regular screening or prophylactic treatment.

Tumor first sequencing

Falling sequencing costs and improved computational tools have facilitated genetic testing on tumors. The concept of using less costly methods such as immunohistochemistry (IHC) to predict the presence of a pathogenic germ line mutation is now being challenged by direct tumor genetic testing. Tumor first genetic testing cannot conclusively differentiate an acquired mutation from a germ line mutation; therefore, appropriate referral for germ line follow-up testing on peripheral blood is still necessary. When to refer for germ line testing and what constitutes appropriate pretest counseling for genetic testing on tumors are specific challenges raised by this change. ESMO practice guidelines recommend that any variant found in a tumor at an allele fraction above 20% be referred to genetic counseling [16]. Comprehensive sequencing of colorectal cancers for Lynch syndrome genes, MSI loci, and somatic driver genes (BRAF, KRAS, and NRAS) showed superior sensitivity in identifying Lynch syndrome compared to the traditional BRAF + IHC or MSI + IHC approaches [17]. When tumors were sequenced first for cancer predisposition genes, about 17% of patients received a genetics referral and 10% had germ line variants that would not otherwise have been detected by routine clinical practice guidelines for genetic testing of inherited cancers [18]. These studies highlight the potential for improved screening for familial predisposition by upfront genetic testing of tumor samples.

Molecular pathology markers in hereditary colorectal cancers

Lynch syndrome is uniquely caused by germ line mutations in genes involved in the DNA mismatch repair (MMR) process. The specific genes are mutL homologue 1 (MLH1), postmeiotic segregation increased 2 (PMS2), mutS homologue 2 (MSH2), and mutS homologue 6 (MSH6) [19]. EPCAM deletions are also prevalent in Lynch syndrome carriers; they silence the promoter region of MSH2, which lies downstream of EPCAM, causing loss of MSH2 expression [20,21]. The mismatch repair system identifies DNA mismatches following replication and replaces the incorrect base pair with a correct one. MSH2 and MSH6 dimerize and recruit MLH1 and PMS2 (which also form a dimer) to the site of the DNA mismatch. Loss of function or loss of expression of MMR proteins promotes the accumulation of DNA mismatches, especially in genomic regions such as microsatellites and homopolymer stretches where replication errors are more likely to occur. The accumulation of these mismatches specifically in microsatellite regions is known as microsatellite instability (MSI). As such, MSI and loss of expression of the MMR proteins by IHC are molecular features of Lynch syndrome tumors; however, these molecular biomarkers are not exclusive to Lynch syndrome and also occur sporadically to promote colorectal carcinogenesis. Directly testing genomic loci for high MSI and observing a loss of expression of MLH1, PMS2, MSH2, or MSH6 by IHC are 100% sensitive in identifying Lynch syndrome and are the National Comprehensive Cancer Network recommendations for familial risk assessment of colorectal tumors [22,23]. The correlation between loss of expression of a specific MMR protein and finding a germ line variant in the corresponding gene is not perfect because the activities of the proteins are dependent on each other. A loss-of-function variant in one MMR gene may affect the expression of another. Genetic testing for Lynch syndrome therefore requires a comprehensive gene panel that includes all MMR genes and EPCAM deletions, and may also include other genes related to the differential diagnosis of familial colorectal cancer [22].
Biallelic mismatch repair deficiency (BMMRD, also called constitutional MMR deficiency) is a more penetrant autosomal recessive (AR) form that can also be identified by MSI and IHC of MMR proteins in the tumor. Since no functional copy of one of the MMR genes is present in patients with BMMRD, IHC staining is lost in both normal and tumor tissue. Although not currently the standard of practice, upfront sequencing of tumor specimens for Lynch-related genes and other oncogenic driver genes in colorectal cancers has also been recommended and improves the diagnostic yield of Lynch syndrome screening in a universal tumor testing model [17,24].

Molecular pathology is also used to exclude a Lynch syndrome-based etiology in universal tumor testing. BRAF is a RAS pathway signaling molecule that is frequently mutated in many cancer types. A constitutively activating mutation in the kinase domain of BRAF (p.Val600Glu) is most commonly observed [24]. Colorectal cancers from Lynch syndrome do not contain BRAF mutations, and colorectal tumors with this recurrent BRAF mutation are considered sporadic; therefore, no further familial follow-up is necessarily required. Endometrial cancers do not typically contain BRAF mutations, and therefore BRAF testing is not routinely performed in these tumors. Loss of MLH1 expression can result from a germ line loss-of-function variant in MLH1; however, MLH1 promoter hypermethylation also causes a loss of expression. The hypermethylation also occurs sporadically, and in such cases usually no further follow-up is pursued. Some rare cases of inherited hypermethylation of the MLH1 promoter have been documented. These rare cases of inherited MLH1 epimutations highlight the importance of correlating the molecular pathology findings with a patient’s personal and family history of the disease, as some cases may still warrant germ line follow-up [25].
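The triage reasoning described for colorectal tumors (MMR deficiency by IHC or MSI, then BRAF p.Val600Glu or MLH1 promoter hypermethylation as markers of sporadic disease, with history able to override) can be written down as a small decision function. This is only an illustrative sketch of the logic in the text, not a clinical tool or a published algorithm; the class, field, and constant names are hypothetical.

```python
from dataclasses import dataclass

GERMLINE = "refer for germ line MMR/EPCAM panel testing"
SPORADIC = "likely sporadic; correlate with personal and family history"
NO_SIGNAL = "no MMR deficiency signal in tumor"


@dataclass(frozen=True)
class ColorectalTumorFindings:
    ihc_lost: frozenset          # MMR proteins with lost staining, e.g. {"MLH1", "PMS2"}
    msi_high: bool               # microsatellite instability detected
    braf_v600e: bool = False     # recurrent BRAF p.Val600Glu found in tumor
    mlh1_promoter_hypermethylated: bool = False
    suggestive_history: bool = False  # personal/family history still points to Lynch


def suggest_follow_up(f: ColorectalTumorFindings) -> str:
    """Map tumor findings to a follow-up suggestion, following the text's logic."""
    if not f.ihc_lost and not f.msi_high:
        return NO_SIGNAL
    # BRAF p.Val600Glu, or hypermethylation explaining MLH1 loss, suggests sporadic disease.
    sporadic_marker = f.braf_v600e or (
        "MLH1" in f.ihc_lost and f.mlh1_promoter_hypermethylated
    )
    # Rare inherited MLH1 epimutations mean history can still warrant germ line follow-up.
    if sporadic_marker and not f.suggestive_history:
        return SPORADIC
    return GERMLINE


print(suggest_follow_up(ColorectalTumorFindings(frozenset({"MSH2", "MSH6"}), True)))
```

The history flag is deliberately allowed to override the sporadic markers, mirroring the point about inherited MLH1 epimutations. BRAF is modeled for colorectal tumors only, since the text notes it is not routinely tested in endometrial cancers.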

Molecular pathology markers in hereditary breast and ovarian cancer

IHC for BRCA1 and BRCA2 is not used for the detection of loss-of-function mutations in breast and ovarian tumors due to its limitations. Rather, direct sequencing of BRCA1 and BRCA2 is the standard for determining BRCA1 and BRCA2 status in the tumor and predicting response to platinum analogues or PARP inhibitors. Since direct tumor sequencing cannot differentiate an acquired variant found only in the tumor from a constitutional germ line variant present in all tissues, these practices have raised important questions about when to refer a patient with a BRCA1 or BRCA2 mutation found in the tumor for genetic counseling and confirmation of the variant in the germ line. Up to 30% of women with serous ovarian carcinoma have germ line mutations in BRCA1 or BRCA2. ESMO practice guidelines recommend that any variant found at an allele fraction above 20% be referred to genetic counseling for consideration of germ line sequencing [16].

Hormone receptors are also highly active in driving oncogenesis in breast cancer. Targeted treatments are available for breast tumors that highly express estrogen or progesterone receptors. IHC is used as a molecular pathology marker to observe overexpression of these receptors and direct treatment strategies. IHC and fluorescence in situ hybridization for human epidermal growth factor receptor 2 (also known as ERBB2 or Her-2) are also used to assess Her-2 status: IHC visualizes increased expression of the Her-2 protein, and fluorescence in situ hybridization detects amplification of the gene. These receptors are encoded by oncogenes and signal the breast cells to proliferate. Patients with breast tumors that do not overexpress estrogen receptors, progesterone receptors, and Her-2 (also called triple-negative breast cancer) have an increased likelihood of carrying a germ line pathogenic variant in BRCA1 or BRCA2. Germ line pathogenic variants in BRCA1 are far more likely than those in BRCA2 in triple-negative breast tumors [26,27] (Table 5.2).

Molecular pathology markers in congenital disorders

Constitutional disorders also rely on molecular pathology markers. A classic example is Duchenne muscular dystrophy (DMD), a congenital muscular dystrophy characterized by progressive muscle degeneration. DMD is an X-linked condition, usually diagnosed in males by clinical presentation and detection of a loss-of-function variant, deletion, or partial duplication in the dystrophin gene. Missense changes and in-frame deletions or duplications can cause a less severe disease known as Becker muscular dystrophy (BMD). Immunohistochemical staining for the dystrophin protein shows a complete loss of expression in patients with DMD. Partial loss is observed in patients with BMD due to reduced expression of the gene. Creatine kinase (CK) levels in plasma are highly elevated and are also used as a clinical criterion to screen individuals suspected of having DMD or BMD.

Table 5.2 Molecular pathology informs the appropriate germ line test and can provide functional evidence to support variant interpretation.

Biomarker | Cancer histology | Molecular pathology test | Molecular pathology results | Follow-up germ line test
MLH1, PMS2 | Colorectal | IHC | Deficient | MLH1, PMS2
MSH6, MSH2 | Colorectal | IHC | Deficient | MSH6, MSH2
ER, PR, ERBB2 | Breast | IHC | Negative | BRCA1, BRCA2
BAP1 | Uveal melanoma | IHC | Lost | BAP1
BRCA1 and BRCA2 | Ovarian serous carcinoma | Sequencing | Loss-of-function variant | BRCA1 and BRCA2
Microsatellite instability | Colorectal, endometrial | STR analysis | Microsatellite instability | MLH1, PMS2, MSH6, MSH2

Molecular pathology markers in newborn screening

Newborn screening is standard practice for detecting a range of congenital disorders, including inborn errors of metabolism, in neonates. Biochemical testing (also a form of molecular pathology) of peripheral blood is performed to screen for abnormal levels of molecular targets that are aberrant in these conditions. One example is classical phenylketonuria (PKU), in which metabolism of the amino acid phenylalanine is decreased due to autosomal recessive (AR) loss-of-function mutations in the phenylalanine hydroxylase (PAH) gene. PAH is an enzyme that metabolizes phenylalanine, and aberrant accumulation of phenylalanine leads to cellular toxicity resulting in the clinical presentation of PKU. Intellectual disability, seizures, behavioral problems, and mental disorders are typical findings in patients with untreated PKU. Molecular pathology as part of newborn screening programs can detect PKU at birth. Tandem mass spectrometry is typically used to detect elevated phenylalanine levels relative to its metabolite tyrosine. These findings warrant genetic testing for pathogenic variants in the PAH gene in the newborn, and also in the parents to check for carrier status.

Mosaicism

Mosaicism is the presence of at least two different cell lines in one individual. Unlike in chimerism, in mosaicism the two cell lines are derived from a single zygote. The two cell lines can differ in multiple ways, ranging from a single nucleotide variant to a whole chromosome aneuploidy. In clinical genetics the interest is typically in a mosaic cell line with a pathogenic variant, or with a variant of interest for some other reason such as clinical actionability. Therefore, the level of mosaicism typically refers to the proportion of cells detected in the test that are abnormal or carry the clinically significant variant. Many laboratory tests can raise a suspicion of mosaicism based on their results. The level of mosaicism detected by the test does not necessarily reflect the level of mosaicism in the individual but may reflect the level of mosaicism in the tissue type tested. There are various categories of mosaicism that can help to conceptualize the extent and implications of the findings.

Somatic versus germ line mosaicism

The timing of the postzygotic event that causes the variant influences the distribution of the newly formed cell line in the individual. For instance, a variant that arises during the very first mitotic division would be expected to be present in about half of the individual’s cells. If a variant cell is formed prior to left-right patterning, then the variant will be present on both sides of the individual. Variant cell lines that are formed later in embryonic development can be confined to only one side or limited to certain tissue types. One tissue type of specific importance is the germ line; a finding is referred to as germ line mosaicism when the germ cells contain the identified variant and the remainder of the individual’s tissues are generally unaffected, although in practice this is difficult to confirm as it would require sampling an array of tissue types. The practical implication is that the individual can pass the variant on to their progeny but is not affected by the disease that the variant confers. If their offspring inherits the variant, the offspring would be nonmosaic and have the variant present in every cell; this is typically referred to as a constitutional variant.

In contrast, somatic mosaicism refers to a variant cell line that is not present at all in the germ line, meaning the individual cannot pass on the disease. These individuals may be affected, but typically to a lesser degree than someone with a constitutional variant. These concepts can be used to understand test results and, combined with medical and family history, to assess the pathogenicity of a mosaic variant.

Testing strategies in mosaicism

The general recommendation for confirming mosaicism is to use an orthogonal method (e.g., next-generation sequencing (NGS) vs. Sanger sequencing) to rule out a testing artifact. In addition, testing children or testing different sample types from the patient may clarify the extent and significance of the mosaicism while confirming the result. Once detected, mosaicism is difficult to rule out, as any negative result could reflect the tissue type tested and would not rule out the presence of the variant in other tissue types. It is important to choose tissue types with regard to their embryonic origin and cell type; blood and skin are good options, as blood derives from the mesoderm while skin derives from the ectoderm. A common mistake is to use both a blood sample and a saliva sample; the main cell type in both of these samples is leukocytes, which highlights the importance of choosing the correct tissue type.

Identification of mosaicism using next-generation sequencing

The proportion of sequencing reads in which the variant is present is typically referred to as the allele fraction. The initial identification of a variant that appears to be at a mosaic level can be challenging. Next-generation sequencing relies largely on read depth to make reliable variant calls, and the allele fraction generally becomes increasingly reliable as the read depth increases. For example, a variant at 25% allele fraction could be present in 1 of 4 reads or in 100 of 400 reads, the latter being much more reliable than the former. A constitutional variant would typically appear in about half of the reads when heterozygous and in 100% of reads when homozygous.

Testing artifacts can arise for various reasons. Misaligned reads can be a source of what appears to be low-level mosaicism. Pseudogene regions can also cause artifacts, as it is challenging to align reads back to the correct region. However, NGS test-based artifacts will typically be resolved when a separate testing method is used. If the variant is clinically reportable, a Sanger sequencing (dideoxy-based sequencing) approach will often clarify the presence of the variant. Sanger sequencing is generally reliable down to about 20% mosaicism; however, when the variant is specifically targeted, a lower level can be identified. Sanger sequencing is also subject to test-based artifacts such as allele dropout; however, these will not typically affect NGS results, and thus combining the methods can clarify the result.
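The read-depth example above (1 of 4 reads versus 100 of 400 reads) can be made concrete by attaching an uncertainty interval to the allele fraction. The sketch below uses a Wilson score interval for a binomial proportion; this is one common choice and an assumption of this example, not a method prescribed by the text, and the function names are hypothetical. It also treats reads as independent draws, which ignores alignment artifacts and other biases.

```python
import math


def allele_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a position that carry the variant allele."""
    if depth <= 0 or not 0 <= alt_reads <= depth:
        raise ValueError("need depth > 0 and 0 <= alt_reads <= depth")
    return alt_reads / depth


def wilson_interval(alt_reads: int, depth: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for the allele fraction."""
    p = allele_fraction(alt_reads, depth)
    denom = 1 + z * z / depth
    centre = (p + z * z / (2 * depth)) / denom
    half = z * math.sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Same 25% allele fraction, very different certainty.
for alt, depth in ((1, 4), (100, 400)):
    low, high = wilson_interval(alt, depth)
    print(f"{alt}/{depth}: AF={allele_fraction(alt, depth):.2f}, "
          f"interval=({low:.2f}, {high:.2f})")
```

Under these assumptions, the interval for 1 of 4 reads is wide enough to include 50%, so a heterozygous constitutional variant cannot be excluded, whereas the interval for 100 of 400 reads lies well below 50%.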

Mosaic presentations

There can be a clinical suspicion of mosaicism prior to any genetic testing. In these cases, a laboratory result showing a low-frequency variant is expected and is therefore less likely to be regarded as a testing artifact. Indication of such a suspicion on the requisition by the referring clinician can therefore assist in the reporting of low-fraction variants. Clinically, an individual with a mosaic disease typically has a milder presentation and, depending on the condition, can have segmental disease as well. With regard to family history, they will typically be the first affected family member.

Example 1: mosaic neurofibromatosis

Neurofibromatosis type 1 (NF1) is an autosomal dominant condition with full penetrance and a range of clinical manifestations including café au lait spots, axillary freckling, optic gliomas, Lisch nodules, and sphenoid dysplasia. A subset of patients have mosaic NF1, and their phenotype can be strikingly different from that of nonmosaic NF1. The main contributor to this stark difference is the skin findings. Mosaic NF1 presents with skin findings that are either localized to one segment or dermatome of the body or generalized to more than one segment [34]. These segments can have clear and sharp demarcations, which can alert examiners to the potential of a mosaic case and guide the variant interpretation of a low-frequency finding. Genetic testing of various tissue types in these individuals provides an example of results consistent with mosaicism. In this case series, results included negative findings, positive findings, and mosaic-level findings, and these varied with tissue type between the cases.

Example 2: mosaic polycystic kidney disease

Autosomal dominant polycystic kidney disease (ADPKD) can also present in a manner that gives the clinician a suspicion of mosaicism. ADPKD typically presents later in life with declining kidney function; on imaging, bilaterally enlarged, multicystic kidneys are typically discovered. However, the cystic disease begins much earlier in life and can be detected prior to kidney function decline. Events such as pain due to a ruptured cyst, unexplained high blood pressure, or incidental identification on imaging performed for another reason may all lead to the discovery of bilateral multicystic kidneys. Most ADPKD cases are clear cut; there are well-defined criteria for making a diagnosis based on cyst burden and family history, and pathogenic variants are identified in approximately 90% of cases with a clinical diagnosis [28,29]. However, with mosaic ADPKD we expect difficulty in making a diagnosis, as there would be no family history of disease before the proband and the disease will be mild. A milder cystic burden, a less drastic increase in kidney size, and a later onset of kidney function decline are expected; this is consistent with an overall milder presentation of mosaic disease. In a few cases the cystic disease can be segmental, unilateral, or asymmetric, all suggesting mosaicism [30]. In one study, mosaic pathogenic variants at low frequencies on NGS were identified in most cases with a clinical suspicion of mosaic ADPKD [30].

Example 3: Li-Fraumeni syndrome

Li-Fraumeni syndrome (LFS) provides another example of mosaicism and the power of case data. LFS is associated with multiple primary cancers (brain, breast, leukemia, soft-tissue and bone sarcomas), rare cancers such as adrenocortical carcinoma, choroid plexus carcinoma, or rhabdomyosarcoma, and a positive family history. The gene associated with LFS is TP53, which is also recognized as the most frequently altered gene in human cancer [31]. In addition, researchers have identified that hematopoietic stem/progenitor cells can acquire pathogenic variants in TP53 (and in other genes), and these can lead to a survival advantage and clonal expansion of the mutated cells [33]. NGS technologies are sensitive to mosaicism, and as such NGS panels have been identifying TP53 variants with allele frequencies that resemble mosaicism. Therefore, in order to properly assess the nature of a TP53 variant, especially when the allele frequency is not clearly at 50%, case data can be incorporated to help reconcile the findings. Figs. 5.8 and 5.9 describe the approach that a clinician may take [32].

FIGURE 5.8 Follow-up testing once a clinically significant TP53 variant is identified in a proband at a reduced allele frequency.

CHAPTER 7

Functional evidence (I) transcripts and RNA-splicing outline

Mara Colombo (1), Paolo Radice (1), Miguel de la Hoya (2)

(1) Unit of Molecular Bases of Genetic Risk and Genetic Testing, Department of Research Fondazione IRCCS Istituto Nazionale Dei Tumori, Milano, Italy; (2) Molecular Oncology Laboratory, Oncology Department, Instituto de Investigación Sanitaria San Carlos, Hospital Clínico San Carlos, Madrid, Spain

Introduction

As previously outlined (see chapter 2), a large number of clinically relevant DNA variants in disease-associated genes lead to their functional inactivation (i.e., defective alleles cause a disease, or are associated with increased risk of its development). A significant proportion of these loss-of-function alleles affect the splicing process (spliceogenic variants). However, not all spliceogenic variants are necessarily pathogenic (or associated with the same level of risk as a prototypical loss-of-function variant in that gene). In this chapter we will review different methodologies and best practices to assess the spliceogenic effect (if any) of a suspected variant and, equally relevant, to perform an accurate clinical interpretation of the spliceogenic findings.

This chapter will also deal with a related but different topic. Virtually all human genes undergo alternative splicing, and this fact may affect genetic testing at several critical steps, including: (i) selection of target sequences to be tested, (ii) unambiguous designation of the genetic variants identified, and (iii) functional interpretation of the findings. Further, we will show that alternative splicing often provides predictive information on the likely outcome of a spliceogenic variant, information that might be relevant when designing in vitro splicing assays. The relevance of alternative splicing in genetic testing (traditionally neglected) has been widely recognized by the American College of Medical Genetics and Genomics and the Association for Molecular Pathology (ACMG/AMP) in recent standards, guidelines, and recommendations for the clinical interpretation of sequence variants [1,2], as will be discussed. In brief, alternative splicing should be taken into consideration when performing the clinical interpretation of DNA variants, regardless of whether these variants are truncating, missense, and/or spliceogenic. A corollary is that the better we understand the alternative splicing profile of a gene (its true scope and functional relevance), the better we will perform annotation and clinical interpretation of DNA variants in that gene.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00004-1 Copyright © 2021 Elsevier Inc. All rights reserved.


Splicing, alternative splicing events, and splicing isoforms: the splicing profile Splicing is an essential and highly regulated gene expression mechanism whereby some sequences (introns) are excised from pre-mRNAs, and others (exons) are preserved and consecutively joined to produce mature mRNAs. The molecular mechanisms that define how gene sequences are selected into introns and exons are not fully understood but display a high level of flexibility in sequence usage [3]. As a result, the splicing machinery may process pre-mRNAs (i.e., recognize intron and exon sequences) in different alternative manners (alternative splicing) to produce different mRNA species (alternatively spliced mRNA species, or splicing isoforms) from a single gene [3,4]. Ultimately, the alternative splicing profile of a given gene (i.e., the exact identity of all splicing isoforms expressed, and the relative contribution of each individual isoform to the overall expression) is the result of a complex interplay between different cis- and trans-acting factors, as will be detailed in next sections. According to GENCODE estimates, human protein-coding loci express on average 6e8 splicing isoforms [5], and loci with >20 annotated isoforms are very rare [6]. However, some loci (e.g., TP53, HOX) are apparent outliers, expressing hundreds of different splicing isoforms [7]. Remarkably, genome-wide analyses indicate that high-level alternative splicing genes often code for proteins with intrinsically disordered domains (IDDs). IDD proteins often have Hub properties, and Hub proteins are often disease related [8e10]. The latter implies that clinically relevant genes (i.e., genes where accurate DNA variant interpretation becomes critical) are expected to display a high level of alternative splicing. Cancer susceptibility genes TP53 and BRCA1 represent paradigms of this association [7,11]. 
This reported association between clinical relevance and expected high levels of alternative splicing should not be disregarded. As we will see in the next sections, alternative splicing can be viewed as a confounding factor that adds complexity to the clinical interpretation of DNA variants. Compared against a “reference transcript,” splicing isoforms may differ in untranslated regions (UTRs), coding sequences, or both. Because experimental validation is technically demanding and time-consuming (i.e., it requires full-length cloning and sequencing, or direct full-length sequencing with novel long-read RNA sequencing (RNA-seq) technologies), the exact structure of most splicing isoforms has not been experimentally validated, but bioinformatically assembled (e.g., Cufflinks transcript assembly) from conventional short-read RNA-seq data [12]. For that reason, it is often more appropriate to refer to alternative splicing events rather than to splicing isoforms. Alternative splicing events are defined against a “reference” transcript and catalogued into “biotypes” according to their structural changes and coding potential (see Fig. 7.1A for further details). Genome-wide analyses suggest that some alternative splicing biotypes are much more common than others (e.g., cassette events are the commonest alternative splicing biotype, while intron retentions are rare) [6,13]. Once alternative splicing events are described, splicing isoforms can be viewed as mRNAs harboring unique combinations of alternative splicing events. A corollary is that the coding annotation of alternative splicing events might be misleading (e.g., an alternative splicing event annotated as in-frame might be present in mRNAs that also harbor a splicing event introducing a premature termination codon (PTC); see Fig. 7.1B for further details).


FIGURE 7.1A Alternative splicing events are defined by novel exon–exon junctions not present in a reference transcript (top) and are catalogued into biotypes according to their structure. Cassette events refer to specific exons (non-constitutive exons) that are excluded or retained in the mRNA, singly or in combination (multi-cassette). Shift events refer to the use of alternative donor and/or acceptor sites that shift the boundaries between introns and exons. Intron retention refers to the retention of a full-length intron (e.g., intron 3) in the mRNA (note that these events are defined not by novel exon–exon junctions, but by the lack of a reference exon–exon junction). Intronization refers to the process whereby a reference exon (e.g., exon 2) is split into two shorter exons with a novel intron in between. Mutually exclusive exons refer to two consecutive and non-constitutive exons (e.g., exons 4 and 4A) that are never identified together in the same mRNA. Splicing events are also catalogued into biotypes according to their coding potential. For instance, exon 1A cassette retention (31 nt insertion, not divisible by three) would be catalogued as introducing a PTC, while the acceptor shift (33 nt deletion, divisible by three) would be catalogued as no frameshift (no-FS). Of note, despite being annotated as no-FS, the possibility of a de novo STOP codon at the novel exon–exon junction must be considered for proper functional annotation. The figure reflects the fact that most RNA technologies (e.g., short-read RNA-seq) identify alternative splicing events (novel exon–exon junctions), but do not provide direct information on the exact structure of unique splicing isoforms (i.e., unique mRNA molecules that, compared with the reference transcript, may include one or more alternative splicing events). TSS, transcription start site; TTS, transcription termination site.

FIGURE 7.1B Alternative splicing events and splicing isoforms. The top panel shows a reference transcript and two alternative splicing cassette events identified by conventional methodologies (e.g., short-read RNA-seq). Accordingly, the gene might express up to three different splicing isoforms: splicing isoform 1 (a reference transcript with an exon 1A insertion), splicing isoform 2 (a reference transcript with exon 3 skipped), and/or splicing isoform 3 (a reference transcript with both an exon 1A insertion and exon 3 skipped). Very often, actual splicing isoforms are not cloned and/or sequenced but assembled using bioinformatic approaches. Thus, assemblies might not capture reality properly. For instance, it might be difficult for a bioinformatic assembly approach to distinguish between these two scenarios: (i) in addition to the reference transcript, only similarly low levels of isoforms 1 and 2 are expressed (each representing 5% of the overall expression), and (ii) in addition to the reference transcript, only isoform 3 is expressed (representing 10% of the overall expression). Often, transcript assembly algorithms are designed to generate parsimonious expression models, and would therefore favor the isoform 3 only expression model (which might indeed be true but is not necessarily true). This is in turn relevant for coding potential annotation. As indicated in the top panel, exon 3 skipping is no-FS. In scenario (i) this coding annotation is meaningful. By contrast, in scenario (ii), the coding annotation is misleading, as exon 3 skipping is only present in transcripts containing an upstream PTC-NMD alteration (exon 1A inclusion). TSS, transcription start site; TTS, transcription termination site.
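The frame-based annotation of Fig. 7.1A and the isoform ambiguity of Fig. 7.1B can be sketched in a few lines of Python. This is a minimal illustration, not an annotation pipeline: the function names are hypothetical, the 31 nt and 33 nt sizes come from the Fig. 7.1A caption, and the 99 nt size of exon 3 is an assumption chosen only so that its skipping is no-FS, as in Fig. 7.1B. The rule checks divisibility by three only; a de novo STOP codon at the novel junction would have to be assessed separately.

```python
from itertools import combinations


def coding_annotation(delta_nt: int) -> str:
    """Annotate an event from its net change in mRNA length (nt)."""
    # Only the reading frame is checked; a de novo STOP codon at the
    # new junction is NOT detected by this simple rule.
    return "no-FS" if delta_nt % 3 == 0 else "FS (PTC expected)"


def candidate_isoforms(event_names):
    """Every non-empty combination of events is a possible isoform."""
    names = list(event_names)
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


# Event sizes from the Fig. 7.1A caption (+ insertion, - deletion).
fig_a_events = {"exon 1A cassette retention": +31, "acceptor shift": -33}
fig_a_annotations = {name: coding_annotation(delta)
                     for name, delta in fig_a_events.items()}

# Fig. 7.1B events; the exon 3 size (99 nt) is a hypothetical value.
fig_b_events = {"exon 1A insertion": +31, "exon 3 skipping": -99}
fig_b_isoforms = candidate_isoforms(fig_b_events)

for combo in fig_b_isoforms:
    # An in-frame event is misleading when it shares the mRNA with a
    # frameshifting event, as in isoform 3 of Fig. 7.1B.
    carries_fs = any(fig_b_events[name] % 3 != 0 for name in combo)
    print(combo, "-> contains a frameshifting event:", carries_fs)
```

Two events already yield three candidate isoforms, which is why event-level data alone cannot tell which combinations are actually expressed.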


“Reference” transcript

Splicing isoforms and/or alternative splicing events are described and annotated by comparison with a reference transcript. If the reference transcript changes, the description and annotation of alternative splicing events will change accordingly (this is also true for genetic variants annotated according to c.HGVS and p.HGVS standards; varnomen.hgvs.org). Therefore, sharing a consensus reference transcript is of paramount importance to facilitate communication and exchange of data among scientists and diagnostic laboratories alike (see Chapter 2). Yet, selecting a reference transcript for each gene of interest is far from obvious. Ideally, the splicing structure of a reference transcript should be well supported by experimental data (i.e., the full-length transcript has been cloned) and should represent the biology of the gene (i.e., the principal functional isoform). Yet, experimental demonstration of the latter is often difficult to achieve, and at any rate time-consuming. For that reason, several bioinformatics approaches aimed at identifying the principal functional isoform of a gene (and therefore, an ideal reference transcript for annotations) have been proposed [14–16]. Yet, all these approaches rely on assumptions that are not always true, among them: (i) prioritizing isoforms displaying the most cross-species conservation, (ii) prioritizing isoforms expressed in more tissues and/or in more developmental stages, and, last but not least, (iii) assuming that each individual gene has only one principal functional isoform [14–16]. Historically, the consensus reference transcript has been selected not on the basis of comprehensive functional data or bioinformatic algorithms, but simply on the pragmatic criterion of selecting the longest known transcript expressed by a given gene (for that reason, “reference” and “full-length” transcript are often interchangeable terms in the scientific literature).
Far from ideal, the “full-length transcript” approach does not necessarily identify the principal functional isoform (if any) of that gene, nor the longest transcript (once the reference has been established, longer splicing isoforms might be identified). In brief, reference transcripts are often selected arbitrarily. Consequently, reference transcripts are not necessarily the most relevant transcripts produced by a gene (and this might be critical for variant classification), but at the same time, the absence of a widely accepted reference transcript (or the presence of two or more competing reference transcripts in the literature and databases) may lead to miscommunication between scientists, diagnostic laboratories, and clinicians working in the field, and ultimately, to errors of clinical relevance (see Fig. 7.2 for further details).
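The dependence of variant numbering on the chosen reference can be illustrated with a small sketch. Everything below is hypothetical (the gene, the coordinates, and the helper `cds_position`); it assumes a plus-strand gene and ignores UTRs, intronic offsets, and the other conventions of the HGVS nomenclature. It only shows that one genomic change receives two different coding positions when two competing reference transcripts differ in an upstream exon.

```python
def cds_position(genomic_pos, coding_exons):
    """Return the 1-based coding position of genomic_pos, or None.

    coding_exons lists (start, end) inclusive coordinates of the coding
    exons of one transcript on the plus strand, in 5'-to-3' order.
    """
    preceding = 0  # coding nucleotides in the exons already passed
    for start, end in coding_exons:
        if start <= genomic_pos <= end:
            return preceding + (genomic_pos - start) + 1
        preceding += end - start + 1
    return None  # intronic or outside the transcript


# Hypothetical coordinates: reference B lacks the middle exon of A.
reference_a = [(100, 199), (300, 399), (500, 599)]
reference_b = [(100, 199), (500, 599)]

variant_pos = 520  # one genomic change, located in the last exon
pos_a = cds_position(variant_pos, reference_a)
pos_b = cds_position(variant_pos, reference_b)
print(f"reference A: c.{pos_a}   reference B: c.{pos_b}")
```

In this toy case the same variant would be reported at two coding positions that differ by the length of the exon missing from reference B, which is the kind of ambiguity depicted in the bottom panel of Fig. 7.2.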

Spliceogenic variants overlap cis-acting determinants of alternative splicing: short sequence motifs and long-range sequence features

Most cis-acting determinants of splicing are believed to be pre-mRNA short sequence motifs, including: (i) sequence motifs (of variable strength) that are recognized by core spliceosome components to define intron–exon boundaries (3′ consensus splice site [3′ss], or acceptor site) and exon–intron boundaries (5′ consensus splice site [5′ss], or donor site), (ii) the branch point and polypyrimidine tract (PPT) located upstream of the 3′ss, and (iii) additional intronic and exonic cis-regulatory short sequence motifs that favor (splicing enhancers) or impair (splicing silencers) exon recognition [17,18]. Of note, intronic cis-regulatory motifs are often, but not necessarily, in close proximity to exons. Genome-wide phylogenetic analyses have suggested that evolutionarily conserved deep intronic regions contribute to splicing regulation, probably explaining the mechanism of action of some disease-causing deep intronic variants [4]. Computational studies of exon recognition suggest that hundreds of cis-acting sequence motifs contribute to the regulation of splicing [19]. In support of this, recent estimates indicate that an unexpectedly large fraction of genetic diseases (ranging from 15% to >60%) are caused by substitution or small indel variants that disrupt the splicing process [4,20,21].

FIGURE 7.2 Alternative splicing and reference transcripts. The top panel represents a clinically relevant gene expressing three alternatively spliced isoforms. Isoform 1 is widely accepted by the genetic community as the “reference” (probably, isoform 1 would be referred to as full-length, even though isoform 3 has a longer open reading frame). The “reference” might be supported by a biological rationale, but this is not necessarily the case. Often, legacy reasons (e.g., isoform 2 is shorter than isoform 1, and isoform 3 was discovered much later) are also relevant. Whatever the reason, a widely accepted reference guarantees uniform annotation of genetic variants (i.e., c.HGVS/p.HGVS guidelines) in scientific papers and databases. The bottom panel represents a clinically relevant gene expressing two alternatively spliced isoforms. However, in this case two competing “reference” transcripts exist. Depending on the reference transcript used, genetic variants in the 3′ end of the gene (last two exons) would be annotated differently in scientific papers and databases, despite being identical. Similarly, references to exons 3 and 4 (for instance, the description of a large genomic rearrangement (LGR) deleting exon 3) would be ambiguous. This might lead to relevant errors in the clinical interpretation of variants. TSS, transcription start site; TTS, transcription termination site.


Most spliceogenic variants are believed to overlap known (or unknown!) cis-acting splicing motifs, impairing (or strengthening) their functionality. Others create de novo functional splicing motifs (since these motifs are relatively short and degenerate, the probability of a genetic variant creating a de novo site or activating a cryptic one is not negligible). In principle, any substitution or indel sequence variant within splicing motifs might be spliceogenic, regardless of its annotation in the scientific literature and/or databases as truncating, missense, synonymous, or intronic. This notion is particularly important, since ignoring it might lead to misconceptions in the functional and clinical interpretation of variants. For instance, c.211A > G is a clear-cut BRCA1 pathogenic variant. The variant, often reported as p.(Arg71Gly) in the scientific literature, is predicted to target a well-established N-terminal functional domain of BRCA1 (RING domain) that displays ubiquitin (Ub) protein ligase (E3) activity. Yet, a functional assay published in 2001 demonstrated that this particular missense change does not impact E3 activity [22]. Remarkably, another study published in the very same year demonstrated that c.211A > G is not a “true” missense variant, but a spliceogenic variant that destroys the naturally occurring exon 5 donor site, which is replaced by an upstream alternative donor site, resulting in the loss of 22 nucleotides at the 3′ end of the exon. This alteration introduces a PTC in the mature transcript [23]. Despite this fact, the variant is still described as missense in clinically relevant databases (ClinVar accession VCV000017693.7). In addition to these short sequence motifs, long-range sequence features (e.g., the relative size of exons and flanking introns) are also critical in determining the alternative splicing profile [24,25].
These poorly understood long-range determinants may represent a challenge for splicing analyses based on reporter minigene constructs (see below). As explained below, these constructs often include only a small fraction of the intronic sequences (i.e., long-range determinants are lost) and, probably for that reason, do not accurately reproduce the alternative splicing profile observed in vivo. Further, modification of long-range determinants of splicing might be relevant to the correct interpretation of large genomic rearrangements (LGRs) identified using copy number variation (CNV) analyses (e.g., MLPA) as a proxy. For instance, a particular BRCA2 LGR-carrying allele lacking only exon 6 produces mRNAs that skip both exons 5 and 6, and a BRCA2 LGR-carrying allele lacking only exon 7 produces two different mRNAs, one skipping exons 6 and 7, and another skipping exons 3 to 7 [26]. For that reason, we recommend evaluating LGRs at the cDNA level before performing a clinical interpretation (see Fig. 7.3).
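The practical consequence of evaluating an LGR at the cDNA level can be sketched as follows. The exon lengths and the helper `skipping_consequence` are hypothetical, and the sketch assumes that every skipped exon is entirely coding. It contrasts the outcome assumed from CNV data alone (loss of the deleted exon only) with an outcome observed at the cDNA level (loss of an additional neighboring exon).

```python
# Hypothetical coding exon lengths (nt) for a four-exon gene.
exon_lengths = {1: 120, 2: 90, 3: 100, 4: 150}


def skipping_consequence(skipped_exons, lengths):
    """Return (nucleotides lost, frame status) for a set of skipped exons."""
    lost = sum(lengths[exon] for exon in skipped_exons)
    status = "in-frame" if lost % 3 == 0 else "out-of-frame"
    return lost, status


# Outcome assumed from the CNV result alone: only exon 2 is lost.
assumed = skipping_consequence([2], exon_lengths)

# Outcome observed at the cDNA level in this toy case: exons 2 and 3.
observed = skipping_consequence([2, 3], exon_lengths)

print("assumed from CNV:", assumed)
print("observed in cDNA:", observed)
```

With these invented lengths the assumed outcome is in-frame while the observed one is out-of-frame, so the two would be weighed very differently during classification.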

Trans-acting and epigenetic determinants of alternative splicing

Alternative splicing is regulated by the activity of numerous trans-acting splicing factors (RNA-binding proteins) that bind to splicing enhancers and silencers. Splicing factors are exemplified by the serine/arginine-rich proteins (SR proteins) and heterogeneous nuclear ribonucleoproteins (hnRNPs). SR proteins contain one or two copies of an RNA recognition motif domain at the N-terminus that provides RNA-binding specificity and a C-terminal arginine/serine-rich domain contributing to protein–protein interactions. Similarly, hnRNPs contain both RNA-binding domains and relatively unstructured domains that likely contribute to protein–protein interactions [19]. Many other RNA-binding proteins regulate splicing (e.g., CUG-BP- and ETR-3-like (CELF), Muscleblind-like (MBNL), RBFOX, Signal transduction and activation of RNA (STAR), and NOVA proteins).


FIGURE 7.3 Long-range sequence determinants of alternative splicing: Clinical classification of large genomic rearrangements (LGRs). The top panel represents a hypothetical carrier of a monoallelic intragenic LGR in a gene of clinical relevance. The altered allele (LGR allele) has been detected by copy number variation (CNV) analysis (e.g., MLPA). The analysis has shown two copies of exons 1, 3, and 4, but only one copy of exon 2. As is often the case in the diagnostic setting, the exact genomic coordinates of the deletion breakpoints have not been characterized. Often, this alteration would be clinically interpreted based on the assumption that exon 2 (and only exon 2) expression is lost. Therefore, clinical classification would be defined based on whether exon 2 loss is out-of-frame or in-frame, and in the latter case, whether exon 2 codes for functionally relevant protein domains or not. However, the figure shows that, in addition to exon 2, putative long-range genomic determinants of splicing (e.g., relative size of intron 1, exon 2, and intron 2) and/or deep intronic short sequence motifs (e.g., an intronic splicing enhancer in intron 2) have been lost as well. As a result (bottom panel), exon 3, despite still being present in the genomic sequence, is no longer recognized as an exon by the splicing machinery. Therefore, the detected LGR causes not only the loss of exon 2 in the mRNA but the loss of exons 2 and 3. It is therefore this mRNA alteration that must be considered for clinical interpretation. Note that the actual genomic alteration harbored by another exon 2 deletion carrier may be different (i.e., different breakpoints), so the splicing outcome is not necessarily identical (the latter is a logical inference, but we are not aware of specific LGR carriers identical by CNV analysis but different at the cDNA level). ISE, intronic splicing enhancer; TSS, transcription start site; TTS, transcription termination site.

Alternative splicing is also regulated by RNA polymerase II (RNAPII) elongation rates [27]. This is because splicing (and other pre-mRNA processing steps) occurs cotranscriptionally on nascent RNAs, and couples to transcription through core spliceosome components and splicing factors interacting with the carboxy-terminal domain of RNAPII [28]. It has now become evident that there is a complex interplay between epigenetic processes and splicing, as several epigenetic features (e.g., nucleosome occupancy, specific histone methylation and/or acetylation modifications, and/or CpG methylation) differentially mark introns, intron–exon boundaries, and constitutive and nonconstitutive exons. For instance, nucleosome occupancy is higher in exonic than in intronic regions, and histone acetylation on the exon–intron junctions usually promotes skipping of cassette exons, whereas deacetylation has the opposite effect [3].


Furthermore, highlighting the complexity of splicing regulation, alternative splicing is also regulated by the availability (i.e., changing levels) of core spliceosome components. Mass spectrometry studies indicate that the spliceosome is associated with >170 proteins, and computational studies of exon recognition suggest that hundreds of sequence motifs contribute to the regulation of splicing [19].

Roles of alternative splicing

Alternative splicing occurs in all metazoan organisms, with increasing prevalence according to phenotypic complexity. In fact, 25%, 60%, and >95% of all multi-exon loci undergo alternative splicing in Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens, respectively [6,29,30]. However, most alternative splicing isoforms are only known through sequence comparison, and their exact biological function is unknown [4]. Splicing was discovered in 1977 [31,32]. Immediately, it was proposed that the splicing mechanism had evolved to allow a large number of genes to provide instructions for producing an even larger number of proteins via alternative splicing (i.e., the role of alternative splicing is to increase protein diversity) [33]. Indeed, many examples of genes producing various protein isoforms with different biological functions and/or cellular localizations have been reported. For instance, the skipping of exon 2 of the RIPK2 (receptor-interacting serine-threonine kinase 2) gene leads to the production of a protein lacking the kinase domain and apoptosis activity, and alternative splicing of the NOXO1 (NADPH oxidase organizer 1) gene gives rise to four protein isoforms: two are within the intracellular membrane, one in the plasma membrane, and one in the nucleus [4]. Remarkably, recent large-scale proteomic studies suggest that many predicted alternative transcripts are not translated into proteins, so that the actual contribution of alternative splicing to protein diversity is currently under dispute [6,30]. On top of that, some authors have suggested a role for alternative splicing in buffering mutational consequences [33], and mounting evidence indicates that alternative splicing is a major posttranscriptional regulator of gene expression [33].
Indeed, the abundance of mRNAs may be regulated by alternative splicing coupled to nonsense-mediated decay (NMD), a physiological quality control mechanism that prevents cells from synthesizing nonfunctional truncated proteins by degrading PTC-containing transcripts [17,34,35]. Recent genome-wide studies have identified thousands of alternative exons and retained introns targeting mRNAs to NMD [36].

The DNA damage response (DDR) regulates gene expression of DNA repair genes through alternative splicing, nicely illustrating many of the aspects that we have just reviewed. Environmental and metabolic byproduct genotoxins induce a wide variety of harmful DNA lesions. Cells have evolved mechanisms that translate stochastic DNA damage into a coherent and organized DDR, including activation of repair systems, cell cycle checkpoints, and apoptotic programs. DDR is coordinated by signaling networks that utilize posttranslational modifications and protein–protein interactions to elicit the initial stages of the cellular response. Later, DDR depends largely on modulation of alternative splicing through posttranslational modifications (e.g., PARylation, ubiquitylation, phosphorylation) of several splicing factors (e.g., hnRNP A1/A2, NONO, SRSF1, BCLAF1, THRAP3). DDR-induced splicing changes influence the cellular proteome either by regulating misspliced, rapidly degraded transcripts, or via selective utilization of alternative exons encoding divergent protein domains [37,38]. For instance, DDR-induced splicing changes regulate the expression of some apoptosis (e.g., BCL2L1, CASP8, CASP9), cell-cycle control (e.g., CHEK1, CHEK2, AURKAB, CDC25B), and DNA repair (e.g., EXO1, BRCA1, MLH3, FANCA) genes. This regulation is exerted through the combined action of splicing factors SRSF10, hnRNP A1/A2, and Sam68 [39]. At least SRSF10 action has been shown to be ATM/CHEK2 dependent [40]. Further, RNA-processing factors BCLAF1 and THRAP3 are phosphorylated, respectively, by ATM and ATR in response to DNA damage. Both BCLAF1 and THRAP3 are involved in upregulating a subset of DNA repair genes in response to DNA damage through an increased rate of pre-mRNA splicing and nuclear export. The subset of upregulated DNA repair genes includes ATRIP, and several Fanconi anemia (FA)/BRCA double-strand repair-related genes such as ATM, BRCA2 (FANCD1), BRIP1 (FANCJ), EXO1, FANCD2, FANCL, RAD51, and possibly PALB2 (FANCN) [41–43]. Interestingly, BRCA1 plays an essential role in BCLAF1 upregulation of DNA repair genes. BRCA1, constitutively bound to a large subset of gene promoters, does not regulate the expression of most of these genes in unperturbed cells, but upregulates expression upon DDR induction [44]. Mechanistically, this is mediated through BRCA1 phosphoserine-1423 recruitment of BCLAF1 together with core splicing factors (Prp8, U2AF65, U2AF35, SF3B1) at target genes, promoting mRNA splicing and transcript production/stability [43]. This DDR-induced and ATM/BRCA1/BCLAF1-mediated upregulation pathway has been confirmed for the DNA repair genes ATRIP, BRIP1 (FANCJ), and EXO1. BRCA1 binds constitutively to the promoter of these three genes, but also to exons and exon/intron boundaries [43].

Alternative splicing profile is dynamic

The alternative splicing profile is highly dynamic, contributing to transcriptome (and proteome) diversity in different tissues and/or development stages, as well as in response to multiple endogenous (e.g., DNA damage), exogenous (e.g., physiological endocrine communication), and environmental signals [29]. A well-known example of the latter is the fact that stress provoked by exams in medical students causes overexpression of a phosphatidylinositol 3-kinase-related protein kinase (SMG-1) splicing isoform lacking exon 61 [45]. RNA-sequencing studies have analyzed alternative splicing dynamics in development and cell differentiation, revealing that: (i) splicing transitions occur in multiple genes at the same time (splicing coordination); (ii) specific RNA-binding proteins (splicing factors) contribute to this coordination; (iii) genes that are regulated by alternative splicing mechanisms are not modulated at their overall expression levels (up- or downregulation); and (iv) splicing dynamics are cell type- and/or region-specific within different tissues (e.g., differences are observed between different neuronal cell types, between cardiomyocytes and cardiac fibroblasts, and between hepatocytes and nonparenchymal cells in the liver) [27]. This splicing profile (see Fig. 7.4) and/or the splicing profile dynamics (see Fig. 7.5) might be relevant for accurate variant classification and are consequently acknowledged in ACMG/AMP guidelines [1,2]. For instance, when classifying apparently clear-cut pathogenic variants (e.g., a PTC-NMD variant, i.e., a variant introducing a PTC that is predicted to induce NMD), these guidelines recommend caution, considering the presence of alternative gene transcripts, which transcripts are biologically relevant, and in which tissues they are expressed.
FIGURE 7.4 Alternative splicing profile and clinical classification of PTC-NMD variants. The cartoon represents a clinically relevant gene (loss-of-function variants are disease-causing) expressing three alternative splicing isoforms: a reference isoform 1 representing 70% of the mRNA transcripts, and isoforms 2 and 3, each representing 15% of the total mRNA transcripts. The top panel represents the splicing profile generated by an allele carrying a truncating variant (red cross) in exon 2. The bottom panel represents the splicing profile generated by an allele carrying a truncating variant in exon 3 (red cross). Both variants would be annotated against the reference isoform 1 as equivalent PTC-NMD variants, and therefore, probably clinically interpreted as pathogenic alike. Yet, if we take into consideration the actual splicing profile of the gene, we realize that the two variants are not equivalent. The exon 2 truncating variant would affect (and therefore inactivate) all 3 isoforms (red crosses). By contrast, the exon 3 variant would affect (inactivate) the reference isoform (red cross), but not isoforms 2 and 3 (blue crosses). Therefore, the loss-of-function status of this variant is not self-evident. It would depend on the coding potential of isoforms 2 and/or 3 (i.e., whether they are able to provide haplo-sufficiency or not). Expressed in other terms, the prior probability of being pathogenic is higher for the variant in exon 2 than for the variant in exon 3. If genetic studies (case/control and/or segregation analysis) demonstrate that the exon 3 truncating variant is not disease-causing (or causes a mild phenotype), we could define isoforms 2 and/or 3 as “rescue transcripts” for truncating variants in exon 3. TSS, transcription start site; TTS, transcription termination site.

FIGURE 7.5 Alternative splicing profile dynamics and clinical classification of PTC-NMD variants. The cartoon represents a hypothetical clinically relevant gene (loss-of-function variants are disease-causing). No splicing isoforms other than the reference have been described. Yet, this is due to a lack of knowledge on the alternative splicing dynamics of that gene. In fact, there is an induced splicing isoform (e.g., tissue- and/or developmental stage- and/or stress-specific) critical to explain the clinical relevance of the gene. The upper panel shows a carrier of an apparent PTC-NMD variant (red cross). Yet, the variant would not affect the induced splicing isoform (blue cross), and therefore would likely be misclassified as pathogenic. The bottom panel shows a carrier of a true PTC-NMD variant (red cross) that would probably be missed by genetic testing (the variant is in a nonannotated exon). TSS, transcription start site; TTS, transcription termination site.

Specifically, the guidelines highlight the fact that one must be cautious in overinterpreting the functional impact of DNA variants confined to only a subset of mRNA isoforms. For instance, PTC-NMD variants located in an alternatively spliced portion of APC (adenomatous polyposis coli) exon 9 are associated with attenuated Familial Adenomatous Polyposis (AFAP, G; 641A > G] variant haplotype does not produce detectable levels of full-length transcripts, due to modification of an exon 10 splicing regulatory element caused by the c.641A > G substitution, and generates an mRNA profile composed of 70%–80% truncating transcripts lacking exon 10 and 20%–30% in-frame transcripts lacking exons 9 and 10 [52]. BRCA1 exons 9 and 10 code for a part of the intrinsically disordered central portion of the protein, which has been hypothesized to act as a long flexible scaffold for intermolecular interactions, but with no apparent functional activity [52]. Therefore, BRCA1 transcripts lacking exons 9 and 10 are predicted to encode a protein maintaining tumor suppression properties. Consistently, by integrating case–control analysis with segregation and pathology information, it was possible to demonstrate that the c.[594-2A > G; 641A > G] variant, although spliceogenic, was not associated with a high cancer risk. Similarly, the BRCA1 Δ(E9,10) isoform explains the lack of high cancer risk associated with the spliceogenic variant BRCA1 c.591C > T [53]. Further, the major outcome of PALB2 c.49-1G > A (damaging the exon 2 acceptor site) is not exon 2 skipping (an in-frame alteration that targets the coiled-coil domain critical for PALB2 interaction with BRCA1), but upregulation of an alternative isoform lacking “only” the first 6 nt of exon 2. This outcome was also anticipated by the fact that this Δ(E2p6) acceptor shift is a PALB2 alternative splicing event [54] (see Fig. 7.6). The actual cancer risk associated with the PALB2 c.49-1G > A variant is currently unknown, but the finding that the variant does not completely delete the coiled-coil domain, but only two residues of unknown functional relevance [p.(Leu17_Lys18del)], certainly reduces the prior probability of the variant being pathogenic. The above variants exemplify one of the mechanisms that may circumvent the potential pathogenicity of spliceogenic variants, i.e., the occurrence of functionally proficient alternative transcripts that may compensate for the loss of the reference (full-length) mRNA when a circumscribed portion of the coding region is missing without altering the reading frame. As mentioned in the previous section, the same “rescue” mechanism can counteract the deleterious effect of nonsense and frameshift variants introducing a PTC in particular exons.
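The rescue reasoning for truncating variants (Fig. 7.4) can be made concrete with a short sketch. The isoform shares (70%, 15%, 15%) are those of the Fig. 7.4 caption; the exon content assigned to isoforms 2 and 3, the variable names, and the helper `unaffected_share` are assumptions for illustration. The sketch only counts transcripts that do not carry the variant; whether those transcripts encode a functional protein is a separate question that it does not address.

```python
# Isoform shares (% of transcripts) follow Fig. 7.4. The exon content of
# isoforms 2 and 3 is an assumption: the caption only implies that both
# contain exon 2 and lack exon 3.
profile = {
    "isoform 1 (reference)": (70, {1, 2, 3, 4}),
    "isoform 2": (15, {1, 2, 4}),
    "isoform 3": (15, {1, 2}),
}


def unaffected_share(isoform_profile, variant_exon):
    """Percentage of transcripts that do not contain the variant exon."""
    return sum(share
               for share, exons in isoform_profile.values()
               if variant_exon not in exons)


exon2_unaffected = unaffected_share(profile, 2)
exon3_unaffected = unaffected_share(profile, 3)

print("truncating variant in exon 2, unaffected transcripts (%):",
      exon2_unaffected)
print("truncating variant in exon 3, unaffected transcripts (%):",
      exon3_unaffected)
```

Both variants would receive the same annotation against the reference isoform, yet under these assumptions only the exon 3 variant leaves a fraction of transcripts untouched, which is the basis for treating isoforms 2 and 3 as candidate rescue transcripts.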
It has to be noted that the efficiency of such a rescue mechanism most likely depends on the overall cellular level of transcripts coding for functional proteins that occur in the presence of a given variant and, therefore, may differ according to the variant type (i.e., protein truncating vs. spliceogenic). In fact, spliceogenic variants inducing the in-frame skipping of “nonessential” exons usually enhance the expression of the corresponding exon-deficient functional isoform(s), thus counteracting the loss of the normal (reference) isoform. By contrast, protein-truncating variants cause a decrease of the cellular content of functionally proficient RNAs, since the loss of full-length transcripts is not compensated by an increase of alternatively spliced functional isoforms. These different behaviors may have different consequences. For example, variants introducing a PTC in the distal portion of BRCA1 exon 11, which undergoes alternative splicing generating the functionally active Δ(E11q) isoform [11], are usually considered high-risk variants. Consistently, carriers of biallelic protein truncation variants in BRCA1 exon 11q exhibit severe cancer phenotypes [55]. Conversely, a phenotypically normal individual has been reported carrying a homozygous variant at the BRCA1 exon 11 donor splice site (c.4096+3A > G) [56]. This variant completely abolishes the expression of the full-length mRNA but, at the same time, increases that of Δ(E11q) [57]. The alternative mechanism that prevents the potential pathogenic effect of spliceogenic variants is the incomplete effect that some of them have on normal mRNA splicing. These variants, termed “leaky” splicing variants, in addition to giving rise to aberrant transcripts and/or quantitatively altering transcript patterns, maintain the ability to produce a certain amount of functionally proficient mRNA.
In some instances, this residual functional mRNA is sufficient to maintain the functional activity of the allele carrying the variant. A paradigmatic example is provided by BRCA2 c.68-7T>A. Several authors reported that the variant leads to an increase of naturally occurring transcripts lacking exon 3 [Δ(E3)] [58–62]. These isoforms are predicted to code for a nonfunctional (or partially functional) protein, given that exon 3 codes for a portion of the BRCA2 domain interacting with the PALB2 protein, whose binding is required for stabilization and correct nuclear


FIGURE 7.6 Alternative splicing may predict the outcome of a spliceogenic variant prior probability of pathogenicity and design of splicing assays. The upper panel shows a gene expressing two alternative splicing isoforms, one of them lacking exon 2. A DNA variant targeting exon 2 acceptor site (red cross) impairs isoform 1 (reference) expression but is compatible with isoform 2 expression. Based on that rationale, the most likely outcome of intervening sequence 1 variants targeting the exon 2 consensus acceptor site (IVS1-1,-2 variants) is the


Chapter 7 Functional evidence (I) transcripts

localization of BRCA2 [63]. However, the clinical relevance of c.68-7T>A long remained controversial until a large collaborative study showed by allele-specific expression analysis that the variant induces a 4.5-fold higher expression of Δ(E3) without completely abolishing the expression of the normal (full-length) mRNA, and that it is not associated with a high risk of breast cancer [64]. Overall, leaky splicing variants remain particularly resistant to clinical classification: while some of them were ascertained to be benign, as in the above example, others were classified as pathogenic. As an example, Table 7.1 reports spliceogenic variants in the BRCA1 and BRCA2 genes that, as of April 2020, had been reported to have a leaky effect by at least one study and for which a clinical classification was assessed based on multifactorial likelihood analyses [80]. Of the 18 listed variants, 13 (72%) were classified as pathogenic or likely pathogenic, whereas 5 (28%) were classified as benign or likely benign. It is possible that this heterogeneity depends, at least in part, on flaws in experimental analyses. In fact, not all studies that investigated the same variant were consistent in observing a leaky splicing effect. However, it is also conceivable that the pathogenicity of a leaky splicing variant depends on the ratio between functional and nonfunctional transcripts derived from the variant allele. Accurate quantitative analyses of the transcript profiles associated with leaky splicing variants are expected to provide clues on this issue and, possibly, to contribute to the clinical classification of this class of variants. In addition, it has been reported that leaky splicing variants might display reduced penetrance [81].

Splicing analyses: determining the spliceogenic impact of a genetic variant

The most immediate and simplest approach for determining the spliceogenic effect of a genetic variant is to analyze the RNA obtained directly from variant carriers, thus preserving the native genetic background. However, this approach is normally limited to genes that are expressed in tissues that can be obtained without invasive procedures. Typically, the RNA is extracted from peripheral blood mononuclear cells (PBMCs) isolated from freshly drawn blood or from RNA-stabilized whole blood samples. In the presence of splicing variants causing aberrant transcripts containing a PTC, this approach is hampered mainly by the low levels of such aberrant transcripts due to NMD. This can be

expression of isoform 2 only, and the prior probability of being pathogenic would depend on the functional consequences of exon 2 skipping. If exon 2 skipping is a PTC-NMD alteration (or an in-frame alteration targeting residues critical for proper protein folding/activity), the prior probability will be extremely high. The bottom panel shows a gene expressing three alternative splicing isoforms. A DNA variant targeting exon 2 acceptor site (red cross) impairs isoform 1 (reference) expression but is compatible with isoforms 2 and 3 expression. Based on that rationale, the most likely outcome of IVS1-1,-2 DNA variants is expression of isoform 2 and/or isoform 3 only (i.e., exons 2 plus 3 skipping and/or activation of a cryptic acceptor site in exon 2). If any of these predictions is in-frame not targeting critical residues, the prior probability of being pathogenic will be low. These predictions may inform as well on the best approach to design an in vitro splicing assay for the variant of interest. TSS, transcription start site; TTS, transcription termination site.


Table 7.1 Clinically classified leaky splicing variants in BRCA genes (a).

| Gene | Variant | Classification (b) | References (RNA analysis) | References (multifactorial likelihood model) |
|---|---|---|---|---|
| BRCA1 | c.4675+1G>A | Pathogenic | [65] | [66] |
| BRCA1 | c.4675+3A>T | Pathogenic | [67] | [68] |
| BRCA1 | c.4868C>G | Pathogenic | [69] | [69] |
| BRCA1 | c.5074+6C>G | Benign | [65] | [68] |
| BRCA1 | c.5123C>A | Pathogenic | [70] | [66] |
| BRCA2 | c.68-7T>A | Benign | [58,60,62,64] | [64] |
| BRCA2 | c.440A>T | Likely benign | [71] | [68] |
| BRCA2 | c.476-2A>G | Pathogenic | [72,73] | [68] |
| BRCA2 | c.7007G>A | Pathogenic | [58] | [69] |
| BRCA2 | c.7975A>G | Pathogenic | [58,74] | [68] |
| BRCA2 | c.7988A>T | Pathogenic | [60,69,74] | [66] |
| BRCA2 | c.8009C>T | Likely pathogenic | [74] | [68] |
| BRCA2 | c.8035G>T | Likely pathogenic | [74] | [68] |
| BRCA2 | c.8168A>G | Likely pathogenic | [60,61,69,74] | [66] |
| BRCA2 | c.8331+1G>A | Likely pathogenic | [75] | [68] |
| BRCA2 | c.8755-1G>A | Pathogenic | [73] | [68] |
| BRCA2 | c.9501+3A>T | Benign | [76–78] | [66] |
| BRCA2 | c.9502-12T>G | Benign | [77] | [68] |

(a) Leaky variants causing the occurrence of novel (predicted) functional mRNA isoforms, and/or increasing the relative level of naturally occurring functional isoforms, were not considered.
(b) According to IARC guidelines [79].
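As a cross-check of the proportions quoted in the text for Table 7.1, the classification column can be tallied directly. This is only a small arithmetic sketch; the list below transcribes the table's classification column in row order, and the variable names are illustrative.

```python
from collections import Counter

# Classification column of Table 7.1, in row order (18 variants).
classifications = [
    "Pathogenic", "Pathogenic", "Pathogenic", "Benign", "Pathogenic",      # BRCA1
    "Benign", "Likely benign", "Pathogenic", "Pathogenic", "Pathogenic",   # BRCA2
    "Pathogenic", "Likely pathogenic", "Likely pathogenic",
    "Likely pathogenic", "Likely pathogenic", "Pathogenic",
    "Benign", "Benign",
]

counts = Counter(classifications)
total = len(classifications)
pathogenic_like = counts["Pathogenic"] + counts["Likely pathogenic"]
benign_like = counts["Benign"] + counts["Likely benign"]

# Percentages rounded to whole numbers, as reported in the text.
pathogenic_pct = round(100 * pathogenic_like / total)
benign_pct = round(100 * benign_like / total)

print(f"{pathogenic_like}/{total} ({pathogenic_pct}%) pathogenic or likely pathogenic")
print(f"{benign_like}/{total} ({benign_pct}%) benign or likely benign")
```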

prevented in vitro with appropriate inhibitors of protein synthesis, mainly puromycin [82] and cycloheximide [83]. These compounds, which were observed to be functionally equivalent [84], can be used to treat cultured cells such as fibroblasts derived from skin biopsies [62,85], short-term lymphocyte cultures stimulated to divide by phytohemagglutinin [86,87], and stable lymphoblastoid cell lines (LCLs) obtained by Epstein-Barr virus (EBV)-mediated lymphocyte transformation [73,88]. LCLs have been reported to present substantial similarities in mRNA transcript profiles compared to parent lymphocytes [89].

Alternatively, when biological samples from variant carriers are not available, the spliceogenic effect can be investigated using a minigene assay. This methodology is based on the transient transfection into appropriate cultured cells of expression vectors containing the genomic portion encompassing the exon affected by the variant and the flanking intronic regions, obtained either by amplifying the DNA of variant carriers [90,91] or by mutagenizing wild-type genomic DNA [74]. However, it must be remembered that a minigene is an artificial construct containing a partial sequence of the gene of interest, which is cloned into intrinsic vector sequences and spliced in a heterologous cellular system. Thus, as already anticipated, the splicing efficiency may differ from that in the native context, and the resulting transcript profile may not fully reflect that occurring in carriers [75]. While this limitation may be circumvented by cloning an insert corresponding to a large genomic region encompassing the complete sequence of multiple exons and introns [74], assay-related discrepancies in the outcome of splicing analyses are particularly relevant for the assessment of leaky variants, which appears to depend on the experimental approach used. For example, BRCA2 c.9501+3A>T was reported to completely abolish the expression of the normal transcript following the investigation of carrier mRNA, while a partial expression of the normal RNA by the mutant allele was detected by minigene analysis [76–78]. Conversely, the opposite was observed for BRCA2 c.8331+2T>C [74,75].

The conventional molecular procedure to ascertain the presence of aberrant transcripts is reverse transcription followed by PCR amplification (RT-PCR), performed using sequence-specific primers designed to anneal to exons flanking the gene region potentially affected by the variant under study. The amplification products are typically analyzed by agarose gel electrophoresis and characterized by sequencing. The aberrant transcripts are identified by comparing the variant splicing profile with the reference derived from normal controls [84]. Moreover, the allele-specific expression of the normal transcript, which can uncover a possible incomplete spliceogenic effect of the variant (leaky splicing variant), may be ascertained by direct sequencing of the cDNA region encompassing either the variant, if exonic, or, if intronic, an informative exonic polymorphism (tag-SNP) in cis with the variant itself [73]. In the absence of aberrant products, the complete degradation of PTC-containing transcripts is excluded by NMD inhibition and can be further verified by allele-specific analysis of the normal transcripts. When tag-SNP analyses cannot be performed, the minigene assay, being an allele-specific approach, represents a valuable alternative tool.
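The logic of the allele-specific analysis can be illustrated with a minimal Python sketch. It assumes that the signal of each tag-SNP allele in the full-length (normal) transcript has already been measured, for example as sequencing read counts or peak heights; the function name and the input values are hypothetical and serve only to show how complete, partial (leaky), and absent spliceogenic effects differ in the share of full-length transcript contributed by the variant allele.

```python
def variant_allele_share(variant_signal: float, wildtype_signal: float) -> float:
    """Fraction of the full-length transcript signal derived from the variant allele."""
    if variant_signal < 0 or wildtype_signal < 0:
        raise ValueError("signals must be non-negative")
    total = variant_signal + wildtype_signal
    if total == 0:
        raise ValueError("no full-length transcript signal detected")
    return variant_signal / total


# Hypothetical measurements (arbitrary units) at a heterozygous tag-SNP:
complete_effect = variant_allele_share(0, 1200)   # variant allele yields no normal transcript
partial_effect = variant_allele_share(300, 900)   # leaky: some normal transcript remains
no_effect = variant_allele_share(600, 600)        # both alleles contribute equally
```

A share of 0 is compatible with a complete spliceogenic effect, intermediate values with a leaky variant, and a share near 0.5 with no effect on the normal transcript. Any cutoff separating these categories would have to be established experimentally and is not implied here.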
Unfortunately, end-point PCR-based approaches preclude an accurate quantitative analysis, which has been highlighted as crucial for the thorough characterization of a genetic variant, especially in the presence of naturally occurring alternative transcripts [78,84]. A semiquantitative analysis of transcripts is possible using fluorescently labeled primers in nonsaturating PCR conditions and analyzing the amplification products by capillary electrophoresis (CE), a detection method with greater sensitivity and resolution than conventional gel electrophoresis [78,84]. The actual impact of spliceogenic variants on target transcript levels can be assessed by quantitative PCR (qPCR) and digital PCR (dPCR). qPCR enables both an absolute quantification based on the standard curve method and a relative quantification in comparison to a control sample, after normalization to a reference gene [75]. dPCR enables exclusively absolute quantification of target transcripts by partitioning the PCR reaction into many independent subreactions and measuring the fraction of amplification-positive partitions [92]. The experimental design of PCR-based assays has been shown to be crucial to optimize the detection of aberrant transcripts and, ultimately, to characterize spliceogenic variants [84]. In particular, prior knowledge of naturally occurring alternative splicing events is critical for the correct design of RT-PCR primers, which justifies the relevance of recent multicenter studies aimed at cataloguing the naturally occurring splicing patterns of several cancer susceptibility genes [93–96]. In contrast, RNA-seq technologies allow the identification not only of preannotated transcripts but also of novel splicing events, be they natural or aberrant, thus helping the characterization of potential spliceogenic variants [95,97,98] and improving clinical diagnosis [99].
Notably, long-read nanopore-based RNA-seq may generate entire mRNA sequences, so that the assessment of the spliceogenic impact of the genetic variants under study is not limited to a partial transcript region, and each individual alternative transcript generated by the combination of different splicing events can be characterized [100]. In addition, RNA-seq allows the splicing profiles of several genes to be characterized simultaneously [94,95].
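The qPCR and dPCR quantification strategies mentioned earlier in this section can be illustrated numerically. The sketch below is an example under stated assumptions rather than a protocol from the cited studies: relative qPCR quantification is shown with the widely used 2^(-ΔΔCt) calculation, which assumes near-100% amplification efficiency for both target and reference gene, and dPCR quantification with the standard Poisson correction of the positive-partition fraction. Function names, parameters, and input values are hypothetical.

```python
import math


def qpcr_relative_expression(ct_target_sample: float, ct_ref_sample: float,
                             ct_target_control: float, ct_ref_control: float,
                             efficiency: float = 2.0) -> float:
    """Fold change of a target transcript in a sample versus a control.

    Each Ct is first normalized to the reference gene (delta Ct), then the
    sample is compared with the control (delta-delta Ct). `efficiency` = 2.0
    assumes perfect doubling per cycle (an idealization).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return efficiency ** (-delta_delta_ct)


def dpcr_copies_per_partition(positive: int, total: int) -> float:
    """Mean number of target copies per partition from the positive fraction.

    A partition may contain more than one copy, so the positive fraction p is
    corrected with Poisson statistics: lambda = -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total (saturated runs are not quantifiable)")
    return -math.log(1.0 - positive / total)


# Hypothetical isoform assay: target Ct is 2 cycles earlier in the carrier
# than in the control, with identical reference-gene Ct values.
fold_change = qpcr_relative_expression(24.0, 18.0, 26.0, 18.0)

# Hypothetical dPCR run: half of 20,000 partitions are positive.
copies = dpcr_copies_per_partition(10_000, 20_000)
```

Multiplying the copies per partition by the number of partitions and dividing by the analyzed volume would give an absolute concentration; the partition volume depends on the platform and is therefore not fixed in this sketch.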


Conclusion

While probably self-evident, it is worth highlighting here that the better we understand the genetics/biology of a clinically relevant gene (and the mechanisms underlying pathogenicity), the more accurate the clinical interpretation of DNA variants in that gene will be. In relation to splicing, some relevant questions are: Do we have an accurate description of the alternative splicing profile? To what extent is this splicing profile dynamic (tissue-, developmental stage-, and/or stress-specific)? Are all alternative splicing isoforms relevant for pathogenicity, or just a subset? The more we are able to expand our knowledge on these questions, the better the investigation of mRNA transcripts will contribute to gene-specific annotation and clinical interpretation of DNA variants.

References

[1] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med Off J Am Coll Med Genet 2015;17(5):405–24.
[2] Abou Tayoun AN, Pesaran T, DiStefano MT, et al. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat 2018;39(11):1517–24.
[3] Ramanouskaya TV, Grinev VV. The determinants of alternative RNA splicing in human cells. Mol Genet Genomics MGG 2017;292(6):1175–95.
[4] Kelemen O, Convertini P, Zhang Z, et al. Function of alternative splicing. Gene 2013;514(1):1–30.
[5] Frankish A, Mudge JM, Thomas M, Harrow J. The importance of identifying alternative splicing in vertebrate genome annotation. Database 2012;2012:bas014.
[6] Djebali S, Davis CA, Merkel A, et al. Landscape of transcription in human cells. Nature 2012;489(7414):101–8.
[7] Mercer TR, Gerhardt DJ, Dinger ME, et al. Targeted RNA sequencing reveals the deep complexity of the human transcriptome. Nat Biotechnol 2012;30(1):99–104.
[8] Buljan M, Chalancon G, Eustermann S, et al. Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol Cell 2012;46(6):871–83.
[9] Cortese MS, Uversky VN, Keith Dunker A. Intrinsic disorder in scaffold proteins: getting more from less. Prog Biophys Mol Biol 2008;98(1):85–106.
[10] Uversky VN, Oldfield CJ, Midic U, et al. Unfoldomics of human diseases: linking protein intrinsic disorder with diseases. BMC Genom 2009;10(Suppl. 1):S7.
[11] Colombo M, Blok MJ, Whiley P, et al. Comprehensive annotation of splice junctions supports pervasive alternative splicing at the BRCA1 locus: a report from the ENIGMA consortium. Hum Mol Genet 2014;23(14):3666–80.
[12] Trapnell C, Roberts A, Goff L, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc 2012;7(3):562–78.
[13] Mudge JM, Frankish A, Fernandez-Banet J, et al. The origins, evolution, and functional potential of alternative splicing in vertebrates. Mol Biol Evol 2011;28(10):2949–59.
[14] Tress ML, Wesselink J-J, Frankish A, et al. Determination and validation of principal gene products. Bioinforma Oxf Engl 2008;24(1):11–7.
[15] Rodriguez JM, Rodriguez-Rivas J, Di Domenico T, Vázquez J, Valencia A, Tress ML. APPRIS 2017: principal isoforms for multiple gene sets. Nucleic Acids Res 2018;46(D1):D213–7.


[16] Matched Annotation from NCBI and EMBL-EBI (MANE). Accessed February 25, 2020. https://www.ncbi.nlm.nih.gov/refseq/MANE/.
[17] Cartegni L, Chew SL, Krainer AR. Listening to silence and understanding nonsense: exonic mutations that affect splicing. Nat Rev Genet 2002;3(4):285–98.
[18] Wang Z, Burge CB. Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 2008;14(5):802–13.
[19] Dvinge H, Kim E, Abdel-Wahab O, Bradley RK. RNA splicing factors as oncoproteins and tumour suppressors. Nat Rev Canc 2016;16(7):413–30.
[20] Wang G-S, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet 2007;8(10):749–61.
[21] Ward AJ, Cooper TA. The pathobiology of splicing. J Pathol 2010;220(2):152–63.
[22] Ruffner H, Joazeiro CA, Hemmati D, Hunter T, Verma IM. Cancer-predisposing mutations within the RING domain of BRCA1: loss of ubiquitin protein ligase activity and protection from radiation hypersensitivity. Proc Natl Acad Sci U S A 2001;98(9):5134–9.
[23] Vega A, Campos B, Bressac-De-Paillerets B, et al. The R71G BRCA1 is a founder Spanish mutation and leads to aberrant splicing of the transcript. Hum Mutat 2001;17(6):520–1.
[24] Bao S, Moakley DF, Zhang C. The splicing code goes deep. Cell 2019;176(3):414–6.
[25] Gelfman S, Burstein D, Penn O, et al. Changes in exon-intron structure during vertebrate evolution affect the splicing pattern of exons. Genome Res 2012;22(1):35–50.
[26] Mesman RLS, Calléja FMGR, de la Hoya M, et al. Alternative mRNA splicing can attenuate the pathogenicity of presumed loss-of-function variants in BRCA2. Genet Med Off J Am Coll Med Genet 2020. https://doi.org/10.1038/s41436-020-0814-5. Published online May 13.
[27] Baralle FE, Giudice J. Alternative splicing as a regulator of development and tissue identity. Nat Rev Mol Cell Biol 2017;18(7):437–51.
[28] Saldi T, Cortazar MA, Sheridan RM, Bentley DL. Coupling of RNA polymerase II transcription elongation with pre-mRNA splicing. J Mol Biol 2016;428(12):2623–35.
[29] Xu Y, Zhao W, Olson SD, Prabhakara KS, Zhou X. Alternative splicing links histone modifications to stem cell fate decision. Genome Biol 2018;19(1):133.
[30] Kim E, Magen A, Ast G. Different levels of alternative splicing among eukaryotes. Nucleic Acids Res 2007;35(1):125–31.
[31] Berget SM, Moore C, Sharp PA. Spliced segments at the 5' terminus of adenovirus 2 late mRNA. Proc Natl Acad Sci U S A 1977;74(8):3171–5.
[32] Chow LT, Roberts JM, Lewis JB, Broker TR. A map of cytoplasmic RNA transcripts from lytic adenovirus type 2, determined by electron microscopy of RNA:DNA hybrids. Cell 1977;11(4):819–36.
[33] Niklas KJ, Bondos SE, Dunker AK, Newman SA. Rethinking gene regulatory networks in light of alternative splicing, intrinsically disordered protein domains, and post-translational modifications. Front Cell Dev Biol 2015;3:8.
[34] Perrin-Vidoz L, Sinilnikova OM, Stoppa-Lyonnet D, Lenoir GM, Mazoyer S. The nonsense-mediated mRNA decay pathway triggers degradation of most BRCA1 mRNAs bearing premature termination codons. Hum Mol Genet 2002;11(23):2805–14.
[35] Baker KE, Parker R. Nonsense-mediated mRNA decay: terminating erroneous gene expression. Curr Opin Cell Biol 2004;16(3):293–9.
[36] Mauger O, Scheiffele P. Beyond proteome diversity: alternative splicing as a regulator of neuronal transcript dynamics. Curr Opin Neurobiol 2017;45:162–8.
[37] Tresini M, Marteijn JA, Vermeulen W. Bidirectional coupling of splicing and ATM signaling in response to transcription-blocking DNA damage. RNA Biol 2016;13(3):272–8.
[38] Shkreta L, Chabot B. The RNA splicing response to DNA damage. Biomolecules 2015;5(4):2935–77.


[39] Cloutier A, Shkreta L, Toutant J, Durand M, Thibault P, Chabot B. hnRNP A1/A2 and Sam68 collaborate with SRSF10 to control the alternative splicing response to oxaliplatin-mediated DNA damage. Sci Rep 2018;8(1):2206.
[40] Shkreta L, Toutant J, Durand M, Manley JL, Chabot B. SRSF10 connects DNA damage to the alternative splicing of transcripts encoding apoptosis, cell-cycle control, and DNA repair factors. Cell Rep 2016;17(8):1990–2003.
[41] Vohhodina J, Barros EM, Savage AL, et al. The RNA processing factors THRAP3 and BCLAF1 promote the DNA damage response through selective mRNA splicing and nuclear export. Nucleic Acids Res 2017;45(22):12816–33.
[42] Beli P, Lukashchuk N, Wagner SA, et al. Proteomic investigations reveal a role for RNA processing factor THRAP3 in the DNA damage response. Mol Cell 2012;46(2):212–25.
[43] Savage KI, Gorski JJ, Barros EM, et al. Identification of a BRCA1-mRNA splicing complex required for efficient DNA repair and maintenance of genomic stability. Mol Cell 2014;54(3):445–59.
[44] Gorski JJ, Savage KI, Mulligan JM, et al. Profiling of the BRCA1 transcriptome through microarray and ChIP-chip analysis. Nucleic Acids Res 2011;39(22):9536–48.
[45] Kurokawa K, Kuwano Y, Tominaga K, et al. Brief naturalistic stress induces an alternative splice variant of SMG-1 lacking exon 63 in peripheral leukocytes. Neurosci Lett 2010;484(2):128–32.
[46] Nieuwenhuis MH, Vasen HFA. Correlations between mutation site in APC and phenotype of familial adenomatous polyposis (FAP): a review of the literature. Crit Rev Oncol Hematol 2007;61(2):153–61.
[47] Cartegni L, Wang J, Zhu Z, Zhang MQ, Krainer AR. ESEfinder: a web resource to identify exonic splicing enhancers. Nucleic Acids Res 2003;31(13):3568–71.
[48] Desmet F-O, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C. Human Splicing Finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009;37(9):e67.
[49] Brunak S, Engelbrecht J, Knudsen S. Prediction of human mRNA donor and acceptor sites from the DNA sequence. J Mol Biol 1991;220(1):49–65.
[50] Caminsky N, Mucaki EJ, Rogan PK. Interpretation of mRNA splicing mutations in genetic disease: review of the literature and guidelines for information-theoretical analysis. F1000Res 2014;3:282.
[51] Rowlands CF, Baralle D, Ellingford JM. Machine learning approaches for the prioritization of genomic variants impacting pre-mRNA splicing. Cells 2019;8(12).
[52] Mark W-Y, Liao JCC, Lu Y, et al. Characterization of segments from the central region of BRCA1: an intrinsically disordered scaffold for multiple protein-protein and protein-DNA interactions? J Mol Biol 2005;345(2):275–87.
[53] de la Hoya M, Soukarieh O, López-Perolio I, et al. Combined genetic and splicing analysis of BRCA1 c.[594-2A>C; 641A>G] highlights the relevance of naturally occurring in-frame transcripts for developing disease gene variant classification algorithms. Hum Mol Genet 2016;25(11):2256–68.
[54] Lopez-Perolio I, Leman R, Behar R, et al. Alternative splicing and ACMG-AMP-2015-based classification of PALB2 genetic variants: an ENIGMA report. J Med Genet 2019;56(7):453–60.
[55] Seo A, Steinberg-Shemer O, Unal S, et al. Mechanism for survival of homozygous nonsense mutations in the tumor suppressor gene BRCA1. Proc Natl Acad Sci U S A 2018;115(20):5241–6.
[56] Byrjalsen A, Steffensen AY, Hansen TVO, Wadt K, Gerdes A-M. Classification of the spliceogenic BRCA1 c.4096+3A>G variant as likely benign based on cosegregation data and identification of a healthy homozygous carrier. Clin Case Rep 2017;5(6):876–9.
[57] Wappenschmidt B, Becker AA, Hauke J, et al. Analysis of 30 putative BRCA1 splicing mutations in hereditary breast and ovarian cancer families identifies exonic splice site mutations that escape in silico prediction. PloS One 2012;7(12):e50800.


[58] Houdayer C, Caux-Moncoutier V, Krieger S, et al. Guidelines for splicing analysis in molecular diagnosis derived from a set of 327 combined in silico/in vitro studies on BRCA1 and BRCA2 variants. Hum Mutat 2012;33(8):1228–38.
[59] Jarhelle E, Riise Stensland HMF, Mæhle L, Van Ghelue M. Characterization of BRCA1 and BRCA2 variants found in a Norwegian breast or ovarian cancer cohort. Fam Cancer 2017;16(1):1–16.
[60] Sanz DJ, Acedo A, Infante M, et al. A high proportion of DNA variants of BRCA1 and BRCA2 is associated with aberrant splicing in breast/ovarian cancer patients. Clin Cancer Res Off J Am Assoc Cancer Res 2010;16(6):1957–67.
[61] Théry JC, Krieger S, Gaildrat P, et al. Contribution of bioinformatics predictions and functional splicing assays to the interpretation of unclassified variants of the BRCA genes. Eur J Hum Genet EJHG 2011;19(10):1052–8.
[62] Vreeswijk MPG, Kraan JN, van der Klift HM, et al. Intronic variants in BRCA1 and BRCA2 that affect RNA splicing can be reliably selected by splice-site prediction programs. Hum Mutat 2009;30(1):107–14.
[63] Xia B, Sheng Q, Nakanishi K, et al. Control of BRCA2 cellular and clinical functions by a nuclear partner, PALB2. Mol Cell 2006;22(6):719–29.
[64] Colombo M, López-Perolio I, Meeks HD, et al. The BRCA2 c.68-7T>A variant is not pathogenic: a model for clinical calibration of spliceogenicity. Hum Mutat 2018;39(5):729–41.
[65] Steffensen AY, Dandanell M, Jønson L, et al. Functional characterization of BRCA1 gene variants by minigene splicing assay. Eur J Hum Genet EJHG 2014;22(12):1362–8.
[66] Easton DF, Deffenbaugh AM, Pruss D, et al. A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am J Hum Genet 2007;81(5):873–83.
[67] Baert A, Machackova E, Coene I, et al. Thorough in silico and in vitro cDNA analysis of 21 putative BRCA1 and BRCA2 splice variants and a complex tandem duplication in BRCA2 allowing the identification of activated cryptic splice donor sites in BRCA2 exon 11. Hum Mutat 2018;39(4):515–26.
[68] Parsons MT, Tudini E, Li H, et al. Large scale multifactorial likelihood quantitative analysis of BRCA1 and BRCA2 variants: an ENIGMA resource to support clinical variant classification. Hum Mutat 2019;40(9):1557–78.
[69] Walker LC, Whiley PJ, Couch FJ, et al. Detection of splicing aberrations caused by BRCA1 and BRCA2 sequence variants encoding missense substitutions: implications for prediction of pathogenicity. Hum Mutat 2010;31(6):E1484–505.
[70] Millevoi S, Bernat S, Telly D, et al. The c.5242C>A BRCA1 missense variant induces exon skipping by increasing splicing repressors binding. Breast Canc Res Treat 2010;120(2):391–9.
[71] Fraile-Bethencourt E, Valenzuela-Palomo A, Díez-Gómez B, et al. Mis-splicing in breast cancer: identification of pathogenic BRCA2 variants by systematic minigene assays. J Pathol 2019;248(4):409–20.
[72] Machackova E, Foretova L, Lukesova M, et al. Spectrum and characterisation of BRCA1 and BRCA2 deleterious mutations in high-risk Czech patients with breast and/or ovarian cancer. BMC Canc 2008;8:140.
[73] Colombo M, De Vecchi G, Caleca L, et al. Comparative in vitro and in silico analyses of variants in splicing regions of BRCA1 and BRCA2 genes and characterization of novel pathogenic mutations. PloS One 2013;8(2):e57173.
[74] Fraile-Bethencourt E, Díez-Gómez B, Velásquez-Zapata V, Acedo A, Sanz DJ, Velasco EA. Functional classification of DNA variants by hybrid minigenes: identification of 30 spliceogenic variants of BRCA2 exons 17 and 18. PLoS Genet 2017;13(3):e1006691.
[75] Gelli E, Colombo M, Pinto AM, et al. Usefulness and limitations of comprehensive characterization of mRNA splicing profiles in the definition of the clinical relevance of BRCA1/2 variants of uncertain significance. Cancers 2019;11(3).


[76] Bonnet C, Krieger S, Vezain M, et al. Screening BRCA1 and BRCA2 unclassified variants for splicing mutations using reverse transcription PCR on patient RNA and an ex vivo assay based on a splicing reporter minigene. J Med Genet 2008;45(7):438–46.
[77] Acedo A, Hernández-Moro C, Curiel-García Á, Díez-Gómez B, Velasco EA. Functional classification of BRCA2 DNA variants by splicing assays in a large minigene with 9 exons. Hum Mutat 2015;36(2):210–21.
[78] Montalban G, Bonache S, Moles-Fernández A, et al. Incorporation of semi-quantitative analysis of splicing alterations for the clinical interpretation of variants in BRCA1 and BRCA2 genes. Hum Mutat 2019;40(12):2296–317.
[79] Plon SE, Eccles DM, Easton D, et al. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat 2008;29(11):1282–91.
[80] Goldgar DE, Easton DF, Byrnes GB, et al. Genetic evidence and integration of various data sources for classifying uncertain variants into a single model. Hum Mutat 2008;29(11):1265–72.
[81] Høberg-Vetti H, Ognedal E, Buisson A, et al. The intronic BRCA1 c.5407-25T>A variant causing partly skipping of exon 23-a likely pathogenic variant with reduced penetrance? Eur J Hum Genet EJHG 2020. Published online March 20.
[82] Andreutti-Zaugg C, Scott RJ, Iggo R. Inhibition of nonsense-mediated messenger RNA decay in clinical samples facilitates detection of human MSH2 mutations with an in vivo fusion protein assay and conventional techniques. Canc Res 1997;57(15):3288–93.
[83] Bateman JF, Freddi S, Lamandé SR, et al. Reliable and sensitive detection of premature termination mutations using a protein truncation test designed to overcome problems of nonsense-mediated mRNA instability. Hum Mutat 1999;13(4):311–7.
[84] Whiley P, de la Hoya M, Thomassen M, et al. Comparison of mRNA splicing assay protocols across multiple laboratories: recommendations for best practice in standardized clinical testing. Clin Chem 2014;60(2):341–52.
[85] Santamaria R, Vilageliu L, Grinberg D. SR proteins and the nonsense-mediated decay mechanism are involved in human GLB1 gene alternative splicing. BMC Res Notes 2008;1:137.
[86] Fisher GH, Rosenberg FJ, Straus SE, et al. Dominant interfering Fas gene mutations impair apoptosis in a human autoimmune lymphoproliferative syndrome. Cell 1995;81(6):935–46.
[87] Etzler J, Peyrl A, Zatkova A, et al. RNA-based mutation analysis identifies an unusual MSH6 splicing defect and circumvents PMS2 pseudogene interference. Hum Mutat 2008;29(2):299–305.
[88] Sugden B, Yates J, Mark W. Transforming functions associated with Epstein-Barr virus. J Invest Dermatol 1984;83(1 Suppl.):82s–7s.
[89] Hussain T, Mulherkar R. Lymphoblastoid cell lines: a continuous in vitro source of cells to study carcinogen sensitivity and DNA repair. Int J Mol Cell Med 2012;1(2):75–87.
[90] Baralle D, Baralle M. Splicing in action: assessing disease causing sequence changes. J Med Genet 2005;42(10):737–48.
[91] Sharma N, Sosnay PR, Ramalho AS, et al. Experimental assessment of splicing variants using expression minigenes and comparison with in silico predictions. Hum Mutat 2014;35(10):1249–59.
[92] Quan P-L, Sauzade M, Brouzes E. dPCR: a technology review. Sensors 2018;18(4).
[93] Fackenthal JD, Yoshimatsu T, Zhang B, et al. Naturally occurring BRCA2 alternative mRNA splicing events in clinically relevant samples. J Med Genet 2016;53(8):548–58.
[94] Davy G, Rousselin A, Goardon N, et al. Detecting splicing patterns in genes involved in hereditary breast and ovarian cancer. Eur J Hum Genet EJHG 2017;25(10):1147–54.
[95] Brandão RD, Mensaert K, López-Perolio I, et al. Targeted RNA-seq successfully identifies normal and pathogenic splicing events in breast/ovarian cancer susceptibility and Lynch syndrome genes. Int J Canc 2019;145(2):401–14.


[96] Walker LC, Lattimore VL, Kvist A, et al. Comprehensive assessment of BARD1 messenger ribonucleic acid splicing with implications for variant classification. Front Genet 2019;10:1139.
[97] Cummings BB, Marshall JL, Tukiainen T, et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci Transl Med 2017;9(386).
[98] Kremer LS, Bader DM, Mertes C, et al. Genetic diagnosis of Mendelian disorders via RNA sequencing. Nat Commun 2017;8:15824.
[99] Marco-Puche G, Lois S, Benítez J, Trivino JC. RNA-seq perspectives to improve clinical diagnosis. Front Genet 2019;10:1152.
[100] de Jong LC, Cree S, Lattimore V, et al. Nanopore sequencing of full-length BRCA1 mRNA transcripts reveals co-occurrence of known exon skipping events. Breast Cancer Res 2017;19(1):127.

CHAPTER 8

Functional evidence (II) protein and enzyme function

Alvaro N.A. Monteiro1, Thales C. Nepomuceno2, Niels de Wind3, Vanessa C. Fernandes2, Anna B.R. Elias2, Marcelo A. Carvalho2,4

1 Cancer Epidemiology Program, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, United States; 2 Divisão de Pesquisa Clínica, Instituto Nacional de Câncer, Rio de Janeiro, Brazil; 3 Leiden University Medical Center, Leiden, the Netherlands; 4 Instituto Federal do Rio de Janeiro - IFRJ, Rio de Janeiro, Brazil

Abbreviations:

ACMG    American College of Medical Genetics and Genomics
AMP     Association for Molecular Pathology
DBD     DNA-binding domain
DGGE    denaturing gradient gel electrophoresis
EMPF1   encephalopathy due to defective mitochondrial and peroxisomal fission-1
EMSA    electrophoretic mobility shift assay
HBOC    hereditary breast and ovarian cancer
HR      homologous recombination
LS      Lynch syndrome
MLPA    multiplex ligation-dependent probe amplification
MMR     mismatch repair
PTT     protein truncation test
SSCP    single-stranded conformation polymorphism
SSS     sick sinus syndrome
VUS     variants of uncertain significance

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00014-4 Copyright © 2021 Elsevier Inc. All rights reserved.

Historical background

The use of protein activity as a surrogate marker to detect disease or disease risk has a long history in biomedicine. Isozymes, protein isoforms that catalyze the same reaction but are coded by different genes, were initially used in clinical practice to detect a disease state [1,2]. For example, heart muscle-specific lactate dehydrogenases, when detected in serum or plasma, reflected damage to cardiac tissues after myocardial infarction [2,3] (Fig. 8.1A). Allozymes, protein isoforms produced by different alleles of the same gene, provided an early example of a diagnostic surrogate marker for retinoblastoma predisposition. In this case, the gene coding for esterase D mapped to chromosome band 13q14 in strong linkage (i.e., in close proximity) to the RB gene. Esterase D served as a proxy for the early identification of patients with a constitutional deletion at the locus (Fig. 8.1B) [4–6]. It was also used to detect loss of heterozygosity in the tumor, demonstrating that RB was a tumor suppressor gene conforming to the two-hit hypothesis (Fig. 8.1C) [7,8]. Detection of isozymes and allozymes by chromogenic substrates based on protein activity was eventually superseded by the use of specific antibodies (for western blot detection) and hybridization probes (for Southern blot detection), obviating the need to measure protein activity as a proxy for susceptibility to disease.

FIGURE 8.1 Examples of early usage of protein activity as a surrogate for disease states or disease susceptibility. (A) Detection of lactate dehydrogenase isoforms (red, green, and blue ribbons) coded by different genes (red, green, and blue boxes, respectively) in different tissues using a chromogenic substrate may reveal a disease state with the appearance in serum of an isoform normally restricted to Tissue 1 (red arrow). (B) Detection of esterase D allozymes (red and green ribbons) coded by different alleles of the same gene (red and green boxes, respectively) is used as a surrogate marker for RB alleles (dark and light blue boxes). (C) Diagram showing the use of esterase D activity (aEstD) as surrogate marker for RB alleles (RB and rb) to demonstrate loss of heterozygosity (LOH) in tumor tissue.

In the 1990s, family-based linkage analysis was used to map and clone high-penetrance alleles in disease genes such as HTT (a.k.a. Huntingtin, associated with Huntington disease; OMIM #143100), CFTR (associated with cystic fibrosis; OMIM #219700), TP53 (associated with Li-Fraumeni syndrome; OMIM #151623), BRCA1 (associated with hereditary breast and ovarian cancer; OMIM #604370), and the Lynch syndrome-associated genes MSH2, MSH6, MLH1, and PMS2 (LS; OMIM #120435) [9–14]. Following this wave of mapping and gene cloning, tests to identify carriers were primarily DNA based and included techniques such as single-stranded conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), and multiplex ligation-dependent probe amplification (MLPA) [15–17]. These developments allowed carriers of pathogenic alleles to make more informed decisions about increased surveillance, risk-reducing options, and treatment.

Although DNA-based techniques improved accuracy, they were limited to the detection of alleles with alterations that could be differentiated after electrophoresis, mainly alleles with large insertions or deletions. SSCP allowed for discrimination of small changes, but it did not reveal the specific nucleotide alteration. Alternative protein-based tests, such as the protein truncation test (PTT), suffered from similar limitations [18].

The advent of direct DNA sequencing, first through dye terminator (Sanger) sequencing and then through massively parallel sequencing (a.k.a. next-generation sequencing), allowed for the detection of a wider spectrum of alleles, including point mutations and small insertions and deletions [19,20]. Discriminating which alleles were likely to cause disease depended on identifying, in the sequencing results, telltale signs of gene disruption. Rearrangements, out-of-frame deletions and insertions, frameshifts, stop gain changes, and changes to splice acceptor and donor sites can be readily inferred to be disruptive and to compromise gene or protein function. These inferred loss-of-function variants can be classified as pathogenic for clinical purposes. However, the effect of small in-frame insertions and deletions, missense changes, and intronic changes cannot be directly inferred from sequencing data. For these so-called variants of uncertain significance (VUS), disease association cannot be established from sequencing data alone.

The challenge of variants of uncertain significance

With the transition to direct sequencing and the widespread use of panel testing in clinical settings, the number of VUS has exploded [21]. The vast majority of VUS are predicted missense variants (a nucleotide sequence change leading to a change in amino acid) and predicted splicing variants (a nucleotide change leading to defective or illegitimate splicing of the mRNA). Splicing variants can generally be discriminated from the wild-type sequence by western blot (Fig. 8.2A). However, splicing and missense variants cannot be discriminated when ectopically expressed from a cDNA construct. The current chapter focuses on amino acid substitutions only. For splicing effects the reader is referred to Chapter 7 (Functional evidence (I) transcripts).

FIGURE 8.2 (A) Diagram of a hypothetical gene illustrating locations (numbers 1–6) of different classes of variants in several gene features (enhancer, promoter, exons, and introns): regulatory site variants (1), missense variants (2), splice site variants causing exon skipping (3) or use of an alternative exon (4), and nonsense and frameshift variants leading to premature termination (5). The predicted effects on RNA and protein are shown as well as how the changes would be reflected on western blots for the protein (bottom panel). (B) Subway chart showing how the different classes of variants (numbers 1–6 from A) can be classified using population, family-based (red), and functional data (green). Decision points in the chart: loss of function inferred from genetic code? allele frequency in population >1%? informative families for segregation analysis? changes in transcript levels or ratio? promoter and enhancer assays (luciferase, EMSA); protein-based functional assays.

When a proband undergoes genetic testing, the likelihood of receiving a VUS result varies according to several factors, including how many genes are being interrogated and the information available for variants in those genes. Increased availability of information about individual alleles leads to a decrease in the VUS rate (the percentage of probands tested for a gene that receive a test report of VUS), as VUS are progressively classified as benign or pathogenic. For example, from 2002 to 2013, the VUS rate at Myriad Genetics declined from 12.8% of all BRCA1/2 reports to 2.1%, partly due to the accumulation of genetic and clinical information about alleles driven by increased testing [22]. In contrast, 20%–30% of genetic tests may find VUS in Lynch syndrome-associated genes [23].

Assessment of variant pathogenicity

Historically, the assessment of pathogenicity for missense VUS has relied on population, clinical, and family-based data. Typically, variants that are commonly (frequency > 1%) found in unselected populations of unaffected individuals can be classified as benign (or likely benign) under the assumption that disease-associated alleles tend to be removed from the population through selection (Fig. 8.2B). Classification of less common or rare variants depends on segregation data to assess their association with disease (Fig. 8.2B). However, often few informative families carrying a specific variant are available to conduct segregation analysis. In some cases, co-occurrence in trans (on different alleles) with a known pathogenic variant (e.g., a truncating variant) can be used to classify a variant as benign (or likely benign) if the co-occurrence of two pathogenic variants leads to embryonic lethality or to a clinical phenotype such as Fanconi anemia [24–26].

Variants that remain unclassified due to lack of data from the sources discussed above can be assessed by in silico predictors and by molecular biology and biochemical tests, including RNA transcript assays to determine splicing changes, and luciferase and electrophoretic mobility shift assays to identify regulatory defects (Fig. 8.2B). Most missense variants, however, will be assessed only using in silico predictors and protein-based assays (Fig. 8.2B), as clinical data such as family history and tumor phenotype are typically sparse and the evidence insufficient for classification. Truncating variants and in-frame alternative splicing variants may also benefit from functional assays if nothing is known about whether the missing protein regions are required in the final protein product (Fig. 8.2B).
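The triage logic summarized in Fig. 8.2B can be sketched in a few lines of Python. This is only an illustrative sketch: the `Variant` record, its field names, the function name, and the order of the checks are assumptions made for this example based on a reading of the flowchart. The 1% frequency cutoff is the one stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical record; field names are illustrative, not a standard schema."""
    name: str
    lof_inferred_from_sequence: bool  # e.g., frameshift or stop gain
    population_frequency: float       # fraction of alleles in unselected, unaffected individuals
    informative_families: bool        # enough families for segregation analysis?
    affects_transcript: bool          # change in transcript level or isoform ratio?
    in_regulatory_region: bool        # promoter or enhancer variant?


def next_evidence_source(v: Variant, common_cutoff: float = 0.01) -> str:
    """Return the evidence source suggested by the flowchart for one variant."""
    if v.lof_inferred_from_sequence:
        return "loss of function inferred from sequence"
    if v.population_frequency > common_cutoff:
        return "population data: benign or likely benign"
    if v.informative_families:
        return "segregation analysis"
    if v.affects_transcript:
        return "RNA transcript assays"
    if v.in_regulatory_region:
        return "promoter and enhancer assays (luciferase, EMSA)"
    # Default route for most missense variants, as described in the text.
    return "in silico predictors and protein-based functional assays"


# A rare missense variant with no family data (hypothetical values).
rare_missense = Variant("hypothetical missense", False, 0.0001, False, False, False)
print(next_evidence_source(rare_missense))
```

In practice the evidence sources are combined rather than consulted one at a time; the early returns here only mirror the order in which the chart presents them.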

Prediction of variant effects: in silico tools

In silico tools to predict pathogenicity of missense variants are classifiers that use the variant DNA and protein sequence to predict its impact on protein structure and function. The main underlying assumption is that predicted impact on protein function can be extended to predict pathogenicity. This assumption has been consistently supported by observation and is reflected in the high accuracy (>80%) achieved by many predictors. In the clinic, in silico classifiers are used as one of the evidence criteria recommended for variant classification by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) [27]. In silico tools are also used to prioritize variants for functional analysis [28].


There are currently over 40 published predictors and metapredictors (or “ensemble predictors”) to assess amino acid substitutions [29]. At their core, most popular predictors and metapredictors (which combine classifiers) [30–38] rely on multiple protein sequence alignments to predict the effect of an amino acid substitution on protein function (Fig. 8.3A). Predictions assess the conservation of the amino acid at a specific residue position when orthologous proteins from different species are aligned, together with the physicochemical differences between the reference and variant amino acid residues [39]. For example, if orthologs present only hydrophobic amino acids in a certain position, then only a change to another hydrophobic amino acid would be predicted to be tolerated (Fig. 8.3A). In addition to clinical and population data, predictors may also incorporate information on structural features and predicted structural changes using data from X-ray crystallography models, signal peptide, transmembrane topology, posttranslational modification, catalytic activity, macromolecular binding, metal binding, and allostery [40] (Fig. 8.3B). The available data are adequate to derive general rules, but gene- and protein-specific data are still needed to refine predictions [40,41]. Benchmark studies comparing predictors have revealed, perhaps not surprisingly, that their performance varies according to gene/protein, suggesting that the use of predictors for assay prioritization should first identify the predictor that performs best in the gene of choice [39,42–44].
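The conservation component of such predictors can be illustrated with a small Python sketch. It uses the six-residue BRCA1 segment shown in Fig. 8.3A; the pairing of species with sequences is an assumption based on the order in which they appear in the figure, and the function names are hypothetical. It scores each alignment column by Shannon entropy and checks whether a substituting residue is already seen in an ortholog. It does not model physicochemical similarity or any of the structural features that published predictors add, so it is an illustration of the principle rather than a usable predictor.

```python
import math
from collections import Counter

# Six-residue segment read from Fig. 8.3A; the species-to-sequence pairing
# is an assumption based on the order shown in the figure.
ALIGNMENT = {
    "Homo sapiens": "NMPTDQ",
    "Pan troglodytes": "NMPTDQ",
    "Canis familiaris": "NMPTDQ",
    "Rattus norvegicus": "NMPKDE",
    "Mus musculus": "NMPKDD",
    "Gallus gallus": "DMTTGH",
}


def column(alignment, index):
    """Residues observed at one alignment position, one per species."""
    return [seq[index] for seq in alignment.values()]


def shannon_entropy(residues):
    """Entropy in bits: 0 for an invariant column, higher for variable ones."""
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def seen_in_orthologs(alignment, index, substitute):
    """True if the substituting residue already occurs at this position."""
    return substitute in column(alignment, index)


for i in range(6):
    col = column(ALIGNMENT, i)
    print(i, "".join(col), round(shannon_entropy(col), 2))
```

The invariant M (index 1) has zero entropy, so any substitution there would be flagged as unlikely to be tolerated, whereas the variable last position (index 5) already carries E, D, and H in other species.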

Functional assays

In the previous sections, we discussed the basic principles and the rationale for the use of a specific protein function as a proxy for disease risk and examined the role of in silico prediction in the assessment of VUS. In this section, we discuss the principles of functional assays: how to perform and interpret a functional assay, using select examples of diseases in which functional tests have been instrumental in identifying individuals at risk. One recurrent theme is that although there is a common underlying framework, several aspects are gene/protein-specific. Knowledge about protein function, its biological roles, and which function is implicated in disease (the trait in question) is critical for the development of functional assays that can discriminate benign from pathogenic variants. For example, a variant in the major breast and ovarian cancer susceptibility gene BRCA1 is evaluated by assessing its effect on DNA damage sensitivity or homologous recombination, established functions of BRCA1 linked to the development of breast and ovarian cancer. Along similar lines, a missense variant in RHO, which codes for rhodopsin and may be associated with retinitis pigmentosa (OMIM #268000), can be assessed through its effects on surface expression, subcellular localization, or biochemical properties [45–47].

Functional assays are experiments that evaluate the impact of a protein variant in in vivo or in vitro contexts. Here we define in vivo assays as those in which the read-out of the assay is the effect of the variant in the context of an animal model such as the fruit fly (Drosophila melanogaster), zebrafish (Danio rerio), frog (Xenopus laevis), or mouse (Mus musculus). In vivo assays are typically more costly, more labor intensive, and less amenable to scaling up to test large numbers of variants. In vitro assays include cell-free assays (e.g., enzymatic assays, protein binding, etc.) and cell-based assays (e.g., viability, subcellular localization, etc.) that use single-cell organisms (e.g., yeast) or animal cells in tissue culture.

FIGURE 8.3 (A) Multiple sequence alignment for BRCA1 protein orthologs in six species showing invariable positions (M, red arrow, conserved position) and variable ones (Q, blue arrow, nonconserved position). At the conserved position, substitutions are unlikely to be tolerated; at the variable position, substitutions (e.g., E, D, and H) are likely to be tolerated.

Homo sapiens        ...PFT NMPTDQ...
Pan troglodytes     ...PFT NMPTDQ...
Canis familiaris    ...PFT NMPTDQ...
Rattus norvegicus   ...PFT NMPKDE...
Mus musculus        ...PFT NMPKDD...
Gallus gallus       ...PFT DMTTGH...

(B) Subway chart illustrating different sources of data that can contribute to assess pathogenicity of amino acid substitutions in in silico predictors: multiple sequence alignments, evolutionary conservation, physicochemical characteristics, structure-based information, other protein-based information (posttranslational modification sites, interaction surfaces, signal peptide), allele frequency in normal population, annotation of variants observed clinically, and integration with other predictors.

Validation and calibration

In most cases, functional assays are conducted in research laboratories and only rarely in a clinical-grade environment with periodic certification, approved standard operating procedures, and a well-established set of negative and positive controls. Therefore, caution is warranted during the interpretation of results because errors in sequence and assembly of cDNA constructs, swapped samples, and clerical mistakes can happen. To minimize misclassification, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) established standards and guidelines for the interpretation of sequence variants [27] (see Chapter 3: International Consensus Guidelines). ACMG/AMP criteria provide a framework for variant classification based on independent sources of evidence such as allele frequency, in silico predictions, segregation data, and functional assays, highlighting the importance of data integration.

Functional assays should be properly validated by testing a number of known pathogenic and benign variants to estimate specificity and sensitivity. It is unclear what the correct threshold should be for the functional information to be used in variant classification for clinical purposes. It has been suggested that 80% specificity and sensitivity is a minimal threshold, given that common medical tests achieve these levels of sensitivity and specificity [48]. Another approach is to specify which strength of evidence (supporting, moderate, or strong) an assay can assign to a variant given the odds of pathogenicity in a series of pathogenic and benign controls [49].

Another common theme apparent throughout the study of missense variants in different genes is that variants may have a functional impact on the protein ranging from mild to severe. Even in the commonly used five-tier classification schemes (benign, likely benign, uncertain, likely pathogenic, pathogenic), functional results are essentially treated as binary (loss of function vs. normal function). Variants with well-established relative risks are therefore critical to calibrate the threshold at which a certain impact on function is considered “loss of function” for the purposes of classification.

In summary, the interpretation of functional results depends on context and the available knowledge about the function of a gene. As there are many levels at which gene function can be disrupted (e.g., transcript processing and stability, protein folding, and binding to other proteins), it is useful to develop decision trees for the recommended course of action given the observed functional changes [50]. Below, we explore a few examples of functional assays for genes implicated in disease.
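The validation step described above reduces to a simple calculation once an assay has been run on control variants of known class. The Python sketch below is a minimal illustration: the function names, the class labels, and the control counts are hypothetical, and the 80% cutoff is the minimal threshold suggested in the text [48]. Translating assay performance into an evidence strength via odds of pathogenicity [49] is not attempted here.

```python
def assay_performance(controls):
    """Sensitivity and specificity from (known_class, assay_call) pairs.

    known_class is "pathogenic" or "benign"; assay_call is
    "loss_of_function" or "normal". Assumes at least one control of each class.
    """
    tp = sum(1 for k, c in controls if k == "pathogenic" and c == "loss_of_function")
    fn = sum(1 for k, c in controls if k == "pathogenic" and c == "normal")
    tn = sum(1 for k, c in controls if k == "benign" and c == "normal")
    fp = sum(1 for k, c in controls if k == "benign" and c == "loss_of_function")
    sensitivity = tp / (tp + fn)  # pathogenic controls scored as loss of function
    specificity = tn / (tn + fp)  # benign controls scored as normal
    return sensitivity, specificity


def meets_minimal_threshold(sensitivity, specificity, threshold=0.80):
    """Check both metrics against the suggested minimal threshold."""
    return sensitivity >= threshold and specificity >= threshold


# Hypothetical control set: 10 known pathogenic and 10 known benign variants.
controls = (
    [("pathogenic", "loss_of_function")] * 9
    + [("pathogenic", "normal")] * 1
    + [("benign", "normal")] * 8
    + [("benign", "loss_of_function")] * 2
)
sens, spec = assay_performance(controls)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
print("meets 80% threshold:", meets_minimal_threshold(sens, spec))
```

With small control sets the estimates carry wide uncertainty, which is one reason the number of validated controls matters as much as the point estimates.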

Example: BRCA1 and BRCA2

The hereditary breast and ovarian cancer (HBOC) syndrome is mainly attributed to pathogenic variants in BRCA1 or BRCA2 [9,51]. Proteins coded by both genes are key players in DNA damage repair by homologous recombination (HR) but have also been linked to a wide variety of distinct molecular functions [52–54]. Several classes of applicable functional assays are described in the literature for both proteins [47,55–57]. A RING finger at its amino terminus, a coiled-coil in its central region, and a pair of BRCT domains at its carboxyl terminus are known functional protein domains in BRCA1 and are therefore more likely to harbor amino acid substitutions that might impact function [57]. Although the evidence for a role of the coiled-coil is still not definitive, the integrity of the RING and BRCT domains is critical for BRCA1-mediated tumor suppression [58].


The RING finger confers E3 ubiquitin ligase activity on BRCA1 and also mediates its interaction with BARD1 [59–61]. The observation that the cancer-associated variant p.Cys61Gly abrogates both E3 ligase activity and BARD1 binding prompted the development of RING-based assays to measure the interaction of BRCA1 with BARD1 and with its E2 ubiquitin-conjugating enzyme (UbcH5a), as well as its intrinsic E3 ligase activity [61–63]. Protein-protein interaction assays are performed in cell-based models (yeast two-hybrid), whereas E3 ubiquitin ligase enzymatic activity is measured in a cell-free system. UbcH5a binding is sensitive to variants throughout the studied region (the first 100 amino acid residues of BRCA1) and correlates strongly with loss of E3 activity. In contrast, the BARD1 interaction is sensitive only to variants in a restricted region, but variants that abrogate BARD1 binding also impair the association with UbcH5a. Further analysis showed that variants with consistent impairment across different functional assays result in structural alterations that cannot sustain the RING finger functions and interactions [63]. These assays have been replicated and represent a robust method for functional evaluation of BRCA1 [64,65]. These observations highlight a collateral benefit of functional assays: insight into how domains are structured and the basis of their functions.

The same extrapolation is true for variants in the tandem BRCT domain. The BRCA1 C-terminal region, including the tandem BRCT, can activate transcription of a reporter gene when fused to a heterologous DNA-binding domain [66]. Known pathogenic variants abrogate this transcription activation, allowing the development of a cell-based (in yeast and human cell lines) functional assay to interrogate variants in this BRCA1 carboxyl-terminal region [67,68]. As observed for the RING finger domain, structural alterations within the tandem BRCT are directly associated with impaired transcription activation [69]. Currently, more than 300 variants have been interrogated by the BRCA1 transcription activation assay [70].

Domain-specific approaches are limited because the functional interpretation is restricted to the interrogated region [63,70,71]. This problem is circumvented by the mouse embryonic stem cell-based assay, in which a mouse conditional allele is complemented by full-length human BRCA1. This model interrogates the impact of variants on DNA repair by homologous recombination (HR) and the consequent sensitivity to cisplatin or PARP1 inhibitor treatments [72]. Importantly, results from this approach reinforce the notion that BRCA1 impairment is primarily linked to its RING and tandem BRCT domains.

BRCA2 is a 3418 amino acid residue protein that is essential to normal cells and plays a role in DNA damage repair by HR [73–77]. Therefore, many functional assays that assess a variant’s impact on BRCA2 functions focus on the evaluation of HR repair efficiency, sensitivity to genotoxic drugs (e.g., cisplatin), and the rescue of its depletion-associated lethality [56]. BRCA2 contains eight BRC repeats in its central region, which mediate its interaction with the HR effector RAD51 [78], and a DNA-binding domain (DBD) at its carboxy terminus, a critical region for its role in HR repair [79]. Two cell-based complementation assays evaluate these functions and demonstrate that BRCA2 seems to be less tolerant of missense variants located in the DBD [80,81] and the PALB2-interaction region, highlighting the critical role of the PALB2/BRCA2 complex in HR repair [80,82,83].

In addition to its role in DNA repair, BRCA2 has been implicated in centrosome duplication through its interaction with nucleophosmin and ROCK2 [84]. BRCA2 dysfunction leads to centrosome amplification and abnormal chromosome segregation. BRCA2 tumor suppressor function is also associated with its roles in regulating centrosome duplication and stabilizing replication forks [85,86]. Known pathogenic variants in the DBD lead to centrosome amplification, and BRCA2-associated centrosome amplification correlates strongly with cancer predisposition [81,87,88].

Functional assays for BRCA1 and BRCA2 have analyzed several hundred missense variants and have been used for many years in support of clinical classification of VUS [47,57].

Example: DNA mismatch repair genes

Lynch syndrome (LS) is a prevalent, dominantly inherited cancer predisposition caused by a heterozygous defect in one of four DNA mismatch repair (MMR) genes: MSH2, MSH6, MLH1, or PMS2. MMR is a canonical DNA repair mechanism that excises nucleotides misincorporated during replication of the genome by DNA polymerases. Thus, MMR prevents the spontaneous accumulation of mutations [14]. In carriers of a pathogenic variant in any of these four genes, the wild-type allele of that gene is inadvertently lost in a fraction of the cells, which results in a spontaneous mutator phenotype. Hence, LS predisposes to early onset of cancer of the colon, endometrium, and other proliferative visceral organs [89]. LS patients who carry a pathogenic MMR gene mutation may benefit from lifestyle alterations and enrollment in surveillance programs, whereas LS-associated cancers may benefit from targeted treatment [89–91].

In many individuals suspected of LS, VUS are found rather than evidently inactivating MMR gene mutations; 4000 different VUS have been identified to date (https://www.insight-group.org/variants/databases and https://www.ncbi.nlm.nih.gov/clinvar) [23]. Despite intensive efforts, only a minority of these VUS could be classified as (likely) benign or (likely) pathogenic, mostly by using clinical criteria [92]. If an individual carries a variant that remains unclassified, his or her first-degree relatives will not be tested for the presence of the VUS and, consequently, personalized healthcare cannot be implemented for these families.

For decades, the biochemistry and genetics of MMR have been the subject of intensive research. The phenotypic and biochemical consequences of MMR deficiency for the cell, and also the causality of loss of MMR activity for LS, have been very well established [91,93]. Therefore, calibrated and validated in cellulo and cell-free assays that assess the biochemical and phenotypic consequences of loss of MMR activity can be appropriate tools for the classification of VUS identified in suspected LS patients. Such assays have been developed independently by many laboratories. Unfortunately, most assays have not been properly calibrated and validated, which interferes with their use as diagnostic tools. Below we concisely summarize these assays (Table 8.1).

The MMR pathway and its components have been conserved during evolution. This has prompted the development of assays that express human MMR gene VUS in lower eukaryotes, or even in bacteria, followed by phenotypic analysis, to investigate their molecular and phenotypic effects. One such assay relies on the overexpression of a variant in the yeast Saccharomyces cerevisiae under the premise that pathogenic variants have lost their ability to interfere in a dominant-negative fashion with yeast MMR [94]. Another assay relies on the phenotypic characterization of Escherichia coli cells expressing human variants in the homologous E. coli MMR protein [95]. Although these assays have unveiled defects in some VUS, artifacts owing to species-specific idiosyncrasies cannot be excluded. For this reason, they cannot be used for diagnostic classification of human VUS in the absence of more direct functional evidence.

Table 8.1 Assays to analyze VUS in MMR genes.

Assay | References | Advantages | Disadvantages
In cellulo assays
Escherichia coli or yeast-based models | [94,95] | | Species-specific effects; limited validation
Transient transfection-based | [94–99] | | Aberrant expression; only exonic VUS; limited validation
Genomic | [100–103] | Bona fide in cellulo | May not be applicable to all variants; limited validation
Cell-free assays
MMR activity assays | [104–107,109] | Rapid and cost-effective; calibrated and validated | Only exonic VUS; possibility of false negatives
In silico analysis | [97,108] | Rapid and cost-effective; calibrated and validated | Relatively low sensitivity and specificity

The caveat of species-specific artifacts does not apply to functional assays using mammalian cells. Several such assays have been developed, initially depending on the ectopic expression, by transfection, of the MMR gene containing a VUS in cells that are deficient for that gene [95–98]. This allows the study of in cellulo phenotypes of the VUS proteins, such as (loss of) stability or of dimerization capacity, altered intracellular localization, and other parameters. In a modified approach, MMR activity is assessed in extracts of such transfected cells [99]. Nevertheless, ectopic expression of variants may not fully mimic endogenous expression, and therefore artifactual results cannot be excluded. In contrast, introduction of the VUS at its genomic locus does mimic the level and regulation of expression in LS-associated cancer cells that carry the VUS. For these reasons, this represents a bona fide approach to investigate the functionality of the VUS by phenotypic analysis of the resulting cell lines. These genomic variants can be generated by various approaches, including CRISPR/Cas9-mediated gene editing [100] or transfection with mutagenic oligonucleotides [101,102]. In a separate approach, using a random mutagenesis screen, a large number of deleterious MMR gene variants have been obtained, analyzed, and catalogued. Such catalogues of deleterious (i.e., pathogenic) variants can be used for the identification of pathogenic human VUS in the absence of additional functional assays [103,104] (Drost et al., in preparation).

Cell-free assays can rapidly quantify the functional activity of VUS in an MMR gene. Such assays are performed by incubating an in vitro expressed VUS protein with a nuclear extract devoid of that protein and a mismatched substrate [105–107]. Alternatively, an extract from cells transfected with the VUS-containing gene of interest can be used [99]. Of note, cell-free assays can only be used to test exonic VUS and might not identify pathogenic variants in which in vivo properties, such as intracellular localization, are perturbed. Nevertheless, Bayesian integration of their calibrated, quantitative output with that of in silico analysis of the VUS [108] enables the rapid classification of VUS with high sensitivity and specificity [104,109]. This indicates that such a two-component cell-free analysis may be a feasible approach for the classification of the large majority of all MMR gene VUS. A caveat that applies to all in vitro functional assays is that pathogenic variants affecting functions not assessed by the assay may not score as pathogenic.
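The Bayesian integration mentioned above follows the standard odds form of Bayes' rule: a prior probability of pathogenicity (for example, from in silico analysis) is converted to odds, multiplied by the likelihood ratio derived from the calibrated assay, and converted back to a probability. The Python sketch below shows only this arithmetic. The function name and the numerical values are hypothetical; the actual priors and likelihood ratios come from the calibration described in the cited studies [104,108,109].

```python
def posterior_probability(prior, likelihood_ratio):
    """Combine a prior probability of pathogenicity with an assay likelihood ratio.

    posterior odds = prior odds x likelihood ratio
    """
    if not 0 < prior < 1:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood ratio must be positive")
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical values: an uninformative in silico prior and an assay result
# that is nine times more likely for a pathogenic than for a benign variant.
in_silico_prior = 0.5
assay_likelihood_ratio = 9.0
print(round(posterior_probability(in_silico_prior, assay_likelihood_ratio), 3))
```

A likelihood ratio of 1 leaves the prior unchanged, which is the expected behavior for an assay result that does not discriminate between pathogenic and benign variants.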

Example: BLM

Bloom syndrome is a rare autosomal recessive disease caused by loss-of-function variants in the BLM gene (OMIM #210900). Individuals affected by Bloom syndrome are prone to develop different cancer types. The BLM gene product is a member of the RecQ helicase family and is involved in genome integrity maintenance by recognizing and resolving stalled replication forks [110,111]. BLM encodes a 1417 amino acid protein that acts as an ATP-dependent 3′–5′ DNA helicase, preferentially unwinding complex structures such as D-loops and Holliday junctions [112].

Cell-free and cell-based assays have been used to characterize rare variants in BLM. Cell-free assays are based on the canonical role of BLM as a helicase and measure helicase and ATPase activities and the ability of the protein to bind ATP, double-stranded DNA, and single-stranded DNA. Notably, some variants abrogate the enzymatic activities of BLM while retaining DNA-binding properties [113]. An independent approach relies on the expression of a BLM chimera composed of the amino terminus of the Saccharomyces cerevisiae orthologue (Sgs1) and the carboxy terminus of human BLM. This “humanized model” is used to interrogate the impact of human BLM missense variants on sensitivity to hydroxyurea treatment [114]. It is noteworthy that yeast cells expressing truncated Sgs1 are sensitive to hydroxyurea treatment, but human BLM-dependent sensitivity varies with cell type and depletion method [114–116]. This chimeric cell-based model was used to evaluate 27 missense variants and identified 9 that induced hypersensitivity to hydroxyurea treatment and are therefore likely to cause Bloom syndrome. As exemplified here, functional data are critical for rare genetic conditions and their clinical management.

Example: RHO Retinitis pigmentosa is a rare condition but one of the most frequent inherited retinopathies (OMIM # 268000). It is often associated with autosomal dominant pathogenic variants in rhodopsin (RHO), and rarely with a recessive genotype [117]. RHO pathogenic variants lead to classical retinitis pigmentosa, which is characterized by the death of rod cells (photoreceptors), leading to progressive vision loss.

RHO is a transmembrane receptor with a luminal amino-terminal region and a cytosolic carboxy-terminal end. The RHO carboxy terminus is coupled to the transducin complex, in which light stimuli promote GTP–GDP exchange and signal transduction [117]. Studies suggest that RHO oligomerization is important for its functions, but the exact mechanism is not clear [118,119]. RHO variants are divided into seven classes according to the Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/): (1) post-Golgi trafficking and outer segment targeting; (2) misfolding, endoplasmic reticulum retention, and stability; (3) disrupted vesicular traffic and endocytosis; (4) altered posttranslational modifications and reduced stability; (5) impaired transducin activation; (6) constitutive activation; and (7) dimerization deficiency [117]. The seven classes are the basis of RHO functional evaluation. Most of the cell-based in vitro functional assays available for RHO focus on its localization to the membrane of human cells [46,120–125]. Some cell-free assays evaluate the absorbance spectrum of purified RHO [121,125] and RHO-dependent transducin activation [121,123]. To improve screening, a high-throughput method was developed to functionally evaluate a large number of variants at once. The assay is based on flow cytometric sorting of cells according to the membrane localization of ectopically expressed RHO [46]. The sorted cells can be sequenced and the variants classified according to whether RHO localizes correctly. The well-characterized p.Pro23His pathogenic variant does not localize to the cell membrane and is used as a loss-of-function negative control [46,126]. This approach was used to classify 210 RHO variants. It is important to stress that variants that fail to localize to the cell membrane are pathogenic; however, pathogenic variants that abrogate other functions but are correctly targeted may not be scored as pathogenic in this assay.

Example: CFTR Cystic fibrosis (CF) is a rare autosomal recessive genetic condition caused by mutations in the CFTR (cystic fibrosis transmembrane conductance regulator) gene (OMIM # 219700) [127]. CFTR is an ATP-binding cassette transporter that works as a chloride channel controlling the absorption of ions and water in epithelial cells, and its dysfunction leads to mucus accumulation. Over time, the mucus dehydrates and becomes thicker, leading to the obstruction of several ducts, including the airways [127]. The CFTR p.Phe508del variant is the most common pathogenic variant associated with CF development and has been estimated to account for approximately 70% of all CF cases. On the other hand, more than 2000 missense variants have been identified in CFTR. There are six classes of CFTR variants: (I-A) absence of mRNA expression; (I-B) absence of protein expression; (II) dysfunctional trafficking; (III) impaired channel opening; (IV) decreased conductance; and (V) reduced protein levels [128]. Accordingly, the available functional assays focus on the impact of variants on these six phenotypes. Most of the effort is focused on measuring protein levels/stability in cell-based assays, as an outcome of four different classes (I-A, I-B, II, and V), and chloride conductance [129–132]. Raraigh et al. systematically evaluated 42 CFTR missense variants by several approaches [132]. The mRNA expression of 48 stable CFTR-expressing cell lines was used to normalize the protein levels of CFTR for chloride conductance measurement assays. Approximately 90% (26 out of 29) of the CF-associated variants evaluated showed less than 25% of wild-type activity. Importantly, functional data can form the basis of precision medicine protocols for patients carrying characterized class III mutants, who can benefit from specific CFTR modulators [130,133].
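The comparison with wild type described above amounts to simple arithmetic, sketched below. This is an illustration under stated assumptions, not the analysis pipeline of ref [132]. It assumes that function is summarized as chloride conductance per unit of CFTR mRNA and then expressed as a percentage of the wild-type value. The 25% cut-off and the 26-of-29 count come from the text; the function names, variant labels, and measurement values are hypothetical.

```python
LOW_FUNCTION_THRESHOLD = 25.0  # percent of wild type, the cut-off quoted in the text


def normalized_function(conductance: float, mrna_level: float) -> float:
    """Chloride conductance per unit of CFTR mRNA (arbitrary units)."""
    if mrna_level <= 0:
        raise ValueError("mRNA level must be positive")
    return conductance / mrna_level


def percent_of_wild_type(variant_value: float, wild_type_value: float) -> float:
    """Express a normalized variant measurement as a percentage of wild type."""
    if wild_type_value <= 0:
        raise ValueError("wild-type value must be positive")
    return 100.0 * variant_value / wild_type_value


# Hypothetical measurements: (chloride conductance, relative mRNA level).
measurements = {
    "wild_type": (80.0, 1.0),
    "variant_A": (5.0, 0.5),
    "variant_B": (60.0, 1.5),
}

wild_type = normalized_function(*measurements["wild_type"])
results = {
    name: percent_of_wild_type(normalized_function(*values), wild_type)
    for name, values in measurements.items()
    if name != "wild_type"
}
low_function = sorted(
    name for name, pct in results.items() if pct < LOW_FUNCTION_THRESHOLD
)

# The proportion reported in the text: 26 of 29 CF-associated variants.
share_low = 100.0 * 26 / 29

print(results, low_function, round(share_low, 1))
```

Normalizing to mRNA before comparing with wild type keeps differences in expression between cell lines from being mistaken for differences in channel function.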

Chapter 8 Functional evidence (II) protein and enzyme

High-throughput assays The functional assessment of variants tends to follow a course in which reported variants suspected of being pathogenic are interrogated first. This biased strategy is justified because, when resources are limited, it provides the greatest benefit. With the widespread application of sequencing-based clinical tests, the number of VUS has outpaced the capacity to conduct functional assays. On the other hand, advances in sequencing have also facilitated the use of high-throughput assays based on saturation mutagenesis (i.e., where all possible single-nucleotide changes that lead to unique amino acid changes are tested). Such an unbiased approach can provide a comprehensive view of protein regions that are less tolerant to changes, as was discussed above for RHO. Recent high-throughput approaches in other genes have also provided functional information for much larger numbers of variants. Taking advantage of the fact that BRCA1 depletion in human haploid HAP1 cells leads to lethality, a saturation mutagenesis-based study evaluated 3893 single-nucleotide variants (SNVs) in the BRCA1 RING and tandem BRCT domains with high sensitivity and specificity. Interestingly, as this approach uses genome editing techniques, it is able to evaluate the impact of SNVs in splicing regions, which are usually considered challenging for functional interpretation [134]. The total number of SNVs tested accounts for approximately 97% of all possible variants within these coding regions. The same strategy can be expanded to other essential genes in HAP1 cells (e.g., BRCA2, PALB2, and BARD1) [134], and saturation mutagenesis approaches are not limited to essential genes in haploid cells. The tumor suppressor protein PTEN is a phosphatase involved in apoptosis, cell migration, and angiogenic processes. Carriers of PTEN pathogenic variants are prone to cancer development and can present developmental delay (OMIM #601728).
PTEN has been implicated in the modulation of the phosphoinositide 3-kinase (PI3K) signaling pathway, which is stimulated by growth factors and leads to cell cycle progression, cell proliferation, and escape from apoptosis [135]. Mechanistically, PI3K phosphorylates phosphatidylinositol (4,5)-bisphosphate (PIP2), producing PIP3, which recruits AKT and promotes its activation. PTEN counteracts these events by dephosphorylating PIP3, thus inhibiting the PI3K pathway [136]. PTEN is a challenging target for functional evaluation because many missense variants behave as hypomorphs (i.e., retain partial function). To gain more insight into the impact of variants on PTEN activity, a yeast cell-based assay has been developed. Because yeast cells do not use PIP3-dependent signaling pathways, the activity of PTEN on its preferential substrate can be assessed in isolation: expression of human PI3K induces a toxic production of PIP3 and depletion of PIP2 pools in these cells [137], and cells are rescued from lethality in a PTEN-dependent manner. Using a saturation mutagenesis library, this functional assay evaluated 7244 SNVs in PTEN (representing 95% of all intended mutations) [138]. This study shows the power of a high-throughput approach when combined with a sensitive assay.
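To make the notion of saturation mutagenesis concrete, the sketch below enumerates every possible single-nucleotide change in a coding sequence and collects the distinct amino acid changes they produce, which is the denominator behind coverage figures such as the 97% and 95% quoted above. It is a minimal illustration, not the library design used in refs [134,138]. It assumes an in-frame DNA sequence made only of A, C, G, and T, uses the standard genetic code, and counts changes to a stop codon (written "*") as amino acid changes. The function names and the toy sequence are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def enumerate_snvs(coding_sequence: str):
    """Yield (position, ref_base, alt_base, ref_aa, alt_aa) for every
    possible single-nucleotide change; positions are 1-based."""
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for index, ref_base in enumerate(seq):
        offset = index % 3
        codon = seq[index - offset:index - offset + 3]
        for alt_base in BASES:
            if alt_base == ref_base:
                continue
            alt_codon = codon[:offset] + alt_base + codon[offset + 1:]
            yield index + 1, ref_base, alt_base, CODON_TABLE[codon], CODON_TABLE[alt_codon]


def unique_amino_acid_changes(coding_sequence: str) -> set:
    """Distinct (codon_number, ref_aa, alt_aa) changes reachable by one SNV."""
    changes = set()
    for position, _, _, ref_aa, alt_aa in enumerate_snvs(coding_sequence):
        if alt_aa != ref_aa:
            changes.add(((position - 1) // 3 + 1, ref_aa, alt_aa))
    return changes


def coverage(tested: int, possible: int) -> float:
    """Fraction of the possible variants that were actually assayed."""
    return tested / possible


toy_sequence = "ATGGCC"  # hypothetical two-codon sequence (Met-Ala)
print(len(list(enumerate_snvs(toy_sequence))))
print(len(unique_amino_acid_changes(toy_sequence)))
```

Each base has three alternatives, so a sequence of n bases has 3n possible SNVs, but several of them are synonymous or converge on the same substitution, which is why the number of unique amino acid changes is smaller.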

In vivo assays While large high-throughput experiments can provide functional information on thousands of variants for one or a few molecular or cellular functions, they are not suited to assessing the organismal impact of variants on disease development. Several animal models have been developed to that end. The appropriateness of an animal model may depend on the conservation of sequence and function of the gene under study (see Chapter 7: Functional Evidence: model organisms).

Classic genetic studies have used the fruit fly (D. melanogaster) as a model for more than 100 years [139]. Its short life cycle allows for genetic experiments over several generations and for large sample sizes. Studies to model human disease are based on complementation after depletion of the D. melanogaster orthologue and therefore depend on the conservation between humans and flies. Functional reconstitution can be performed using human or even D. melanogaster-derived sequences (when assessing evolutionarily conserved regions). In some cases, depletion leads to lethality and complementation with human sequences is not sufficient to rescue the phenotype. In these cases, the human cDNA is overexpressed instead, which usually leads to aberrant development that can be used as a proxy for functional evaluation of the human protein [140]. For example, pathogenic variants of the DNM1L gene are associated with encephalopathy due to defective mitochondrial and peroxisomal fission 1 (EMPF1), with outcomes that vary from developmental delay to death. Studies using D. melanogaster as a model have shown that depletion of the DNM1L orthologue in flies leads to lethality that can be rescued by the expression of human wild-type DNM1L, but not by a sequence containing a pathogenic variant [141,142]. Mechanistically, this phenotype is explained by the observation that DNM1L mutation leads to a decrease in the number of muscular mitochondria and salivary gland peroxisomes [141,142]. An alternative approach was implemented to assess the role of MARK3 in visual impairment. A missense variant in MARK3 was identified as the putative cause of visual impairment and progressive phthisis bulbi. The p.Arg570Gly variant segregated with disease in three affected members of the same family. In D. melanogaster, depletion of the MARK3 orthologue (Par-1) leads to impaired eye development. Flies expressing Par-1 mutated at the conserved arginine residue (p.Arg792Gly) also present impaired eye function and development [143].

Like the fruit fly, the zebrafish (D. rerio) is a widely used animal model. Its short life cycle, large number of offspring, and genetic similarity to humans (orthologues for more than 80% of human disease-associated genes) make the zebrafish an important tool for genetic studies. It has also been used as an in vivo model for variant evaluation. For example, the role of HCN4 variants in congenital sick sinus syndrome (SSS) is replicated in a zebrafish model. Knockdown of D. rerio hcn4 leads to the same phenotypes observed in human SSS. An in vivo embryonic assay identified known pathogenic and benign variants in hcn4 and assessed the behavior of four variants (two benign and two hypomorphic) [144].

In addition to D. melanogaster and D. rerio, the mouse (M. musculus) is a well-established model that is closely related to humans and is well suited to variant analysis. Brca1-null embryos die before E7.5, and this lethal phenotype can be used as a proxy for pathogenicity. The same phenotype is observed for the known pathogenic p.Cys64Gly variant and for p.Ala1708Glu, establishing their impact on in vivo function [145]. Over the years, several mouse and rat models have been developed to functionally assess the impact of Rho variants on photoreceptor degeneration, as heterozygous Rho mice mimic the development of RP in humans [146]. They have been used to propose alternative treatments, such as the use of antisense oligonucleotides to target mutated Rho [147], and to interrogate the impact of variants on the development of retinitis pigmentosa [120,123].

[Figure 8.4: schematic of the spectrum of functional assays, running from saturation and large-scale mutagenesis (more variants tested per assay) to animal models (more functions interrogated per variant).]

FIGURE 8.4 The spectrum of functional assays for disease genes. A variety of functional assays have been developed to assess the impact of missense variants. At one end of the spectrum there are assays based on large-scale mutagenesis that assess the function of a large number of missense variants for one or a few biochemical functions. At the other end, animal models assess the function of one or a few missense variants but provide in-depth analysis of the impact of the variant on organismal development.

Conclusion Functional assays for the evaluation of missense variants in disease genes span a spectrum from saturation mutagenesis studies, which assess a large number of variants for one or a few cellular or molecular functions, to complex animal models, in which the impact of a few variants is assessed in an organismal context (Fig. 8.4). To understand the effect of variation on disease and provide accurate risk estimates, information from independent assays at both ends of this spectrum will be necessary. Approaches that integrate functional results with other sources of data, such as clinical data, will be critical to the implementation of precision medicine [148] (Iversen et al., submitted).

Conflict of interest statement The authors state that they have no conflict of interest to disclose.

References
[1] Goodfriend TL, Kaplan NO. Isoenzymes in clinical diagnosis. Circulation 1965;32:1010–9.
[2] Wroblewski F, Gregory KF. Lactic dehydrogenase isozymes and their distribution in normal tissues and plasma and in disease states. Ann N Y Acad Sci 1961;94:912–32.
[3] Latner AL, Skillen AW. Clinical applications of dehydrogenase isoenzymes. Lancet 1961;2:2186–8.
[4] Mukai S, Rapaport JM, Shields JA, Augsburger JJ, Dryja TP. Linkage of genes for human esterase D and hereditary retinoblastoma. Am J Ophthalmol 1984;97:681–5.

[5] Lee EY, Lee WH. Molecular cloning of the human esterase D gene, a genetic marker of retinoblastoma. Proc Natl Acad Sci USA 1986;83:6337–41.
[6] Hopkinson DA, Mestriner MA, Cortner J, Harris H. Esterase D: a new human polymorphism. Ann Hum Genet 1973;37:119–37.
[7] Benedict WF, Murphree AL, Banerjee A, Spina CA, Sparkes MC, Sparkes RS. Patient with 13 chromosome deletion: evidence that the retinoblastoma gene is a recessive cancer gene. Science 1983;219:973–5.
[8] Knudson Jr AG. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci USA 1971;68:820–3.
[9] Miki Y, Swensen J, Shattuck-Eidens D, Futreal PA, Harshman K, Tavtigian S, Liu Q, Cochran C, Bennett LM, Ding W. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science 1994;266:66–71.
[10] Hall JM, Lee MK, Newman B, Morrow JE, Anderson LA, Huey B, King MC. Linkage of early-onset familial breast cancer to chromosome 17q21. Science 1990;250:1684–9.
[11] Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, Zielenski J, Lok S, Plavsic N, Chou JL, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 1989;245:1066–73.
[12] The Huntington’s Disease Collaborative Research Group. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. Cell 1993;72:971–83.
[13] Malkin D, Li FP, Strong LC, Fraumeni JJF, Nelson CE, Kim DH, Kassel J, Gryka MA, Bischoff FZ, Tainsky MA, et al. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas and other neoplasms. Science 1990;250:1233–8.
[14] Cerretelli G, Ager A, Arends MJ, Frayling IM. Molecular pathology of Lynch syndrome. J Pathol 2020;250(5):518–31.
[15] Orita M, Iwahana H, Kanazawa H, Hayashi K, Sekiya T. Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphisms. Proc Natl Acad Sci USA 1989;86:2766–70.
[16] Fischer SG, Lerman LS. Length-independent separation of DNA restriction fragments in two-dimensional gel electrophoresis. Cell 1979;16:191–200.
[17] Yau SC, Bobrow M, Mathew CG, Abbs SJ. Accurate diagnosis of carriers of deletions and duplications in Duchenne/Becker muscular dystrophy by fluorescent dosage analysis. J Med Genet 1996;33:550–8.
[18] Roest PA, Roberts RG, Sugino S, van Ommen GJ, den Dunnen JT. Protein truncation test (PTT) for rapid detection of translation-terminating mutations. Hum Mol Genet 1993;2:1719–21.
[19] Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977;74:5463–7.
[20] Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135–45.
[21] Starita LM, Ahituv N, Dunham MJ, Kitzman JO, Roth FP, Seelig G, Shendure J, Fowler DM. Variant interpretation: functional assays to the rescue. Am J Hum Genet 2017;101:315–25.
[22] Eggington JM, Bowles KR, Moyes K, Manley S, Esterling L, Sizemore S, Rosenthal E, Theisen A, Saam J, Arnell C, et al. A comprehensive laboratory-based program for classification of variants of uncertain significance in hereditary cancer genes. Clin Genet 2014;86:229–37.
[23] Wright M, Menon V, Taylor L, Shashidharan M, Westercamp T, Ternent CA. Factors predicting reclassification of variants of unknown significance. Am J Surg 2018;216:1148–54.
[24] Liu CY, Flesken-Nikitin A, Li S, Zeng Y, Lee WH. Inactivation of the mouse Brca1 gene leads to failure in the morphogenesis of the egg cylinder in early postimplantation development. Genes Dev 1996;10:1835–43.
[25] Struewing JP, Hartge P, Wacholder S, Baker SM, Berlin M, McAdams M, Timmerman MM, Brody LC, Tucker MA. The risk of cancer associated with specific mutations of BRCA1 and BRCA2 among Ashkenazi Jews. N Engl J Med 1997;336:1401–8.

[26] Sawyer SL, Tian L, Kahkonen M, Schwartzentruber J, Kircher M, University of Washington Centre for Mendelian G, Consortium FC, Majewski J, Dyment DA, Innes AM, et al. Biallelic mutations in BRCA1 cause a new Fanconi anemia subtype. Canc Discov 2015;5:135–42.
[27] Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–24.
[28] Ghosh R, Oak N, Plon SE. Evaluation of in silico algorithms for use with ACMG/AMP clinical variant interpretation guidelines. Genome Biol 2017;18:225.
[29] Hart SN, Hoskin T, Shimelis H, Moore RM, Feng B, Thomas A, et al. Comprehensive annotation of BRCA1 and BRCA2 missense variants by functionally validated sequence-based computational prediction models. Genet Med 2019;21:71–80.
[30] Ng PC, Henikoff S. Predicting deleterious amino acid substitutions. Genome Res 2001;11:863–74.
[31] Sim NL, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 2012;40:W452–7.
[32] Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR. A method and server for predicting damaging missense mutations. Nat Methods 2010;7:248–9.
[33] Tavtigian SV, Byrnes GB, Goldgar DE, Thomas A. Classification of rare missense substitutions, using risk surfaces, with genetic- and molecular-epidemiology applications. Hum Mutat 2008;29:1342–54.
[34] Li B, Krishnan VG, Mort ME, Xin F, Kamati KK, Cooper DN, Mooney SD, Radivojac P. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics 2009;25:2744–50.
[35] Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310–5.
[36] Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics 2015;31:2745–7.
[37] Feng BJ. PERCH: a unified framework for disease gene prioritization. Hum Mutat 2017;38:243–51.
[38] Stone EA, Sidow A. Physicochemical constraint violation by missense substitutions mediates impairment of protein function and disease severity. Genome Res 2005;15:978–86.
[39] Tavtigian SV, Greenblatt MS, Lesueur F, Byrnes GB, Group IUGVW. In silico analysis of missense substitutions using sequence-alignment based methods. Hum Mutat 2008;29:1327–36.
[40] Karchin R, Monteiro AN, Tavtigian SV, Carvalho MA, Sali A. Functional impact of missense variants in BRCA1 predicted by supervised learning. PLoS Comput Biol 2007;3:e26.
[41] Mirkovic N, Marti-Renom MA, Weber BL, Sali A, Monteiro AN. Structure-based assessment of missense mutations in human BRCA1: implications for breast and ovarian cancer predisposition. Canc Res 2004;64:3790–7.
[42] Martelotto LG, Ng CK, De Filippo MR, Zhang Y, Piscuoglio S, Lim RS, Shen R, Norton L, Reis-Filho JS, Weigelt B. Benchmarking mutation effect prediction algorithms using functionally validated cancer-related missense mutations. Genome Biol 2014;15:484.
[43] Ernst C, Hahnen E, Engel C, Nothnagel M, Weber J, Schmutzler RK, Hauke J. Performance of in silico prediction tools for the classification of rare BRCA1/2 missense variants in clinical diagnostics. BMC Med Gen 2018;11:35.
[44] Hicks S, Wheeler DA, Plon SE, Kimmel M. Prediction of missense mutation functionality depends on both the algorithm and sequence alignment employed. Hum Mutat 2011;32:661–8.
[45] Toland AE, Andreassen PR. DNA repair-related functional assays for the classification of BRCA1 and BRCA2 variants: a critical review and needs assessment. J Med Genet 2017;54:721–31.
[46] Wan A, Place E, Pierce EA, Comander J. Characterizing variants of unknown significance in rhodopsin: a functional genomics approach. Hum Mutat 2019;40:1127–44.

[47] Monteiro AN, Bouwman P, Kousholt AN, Eccles DM, Millot GA, Masson JY, et al. Variants of uncertain clinical significance in hereditary breast and ovarian cancer genes: best practices in functional analysis for clinical annotation. J Med Genet 2020;57:509–18.
[48] Plon SE, Eccles DM, Easton D, Foulkes WD, Genuardi M, Greenblatt MS, Hogervorst FB, Hoogerbrugge N, Spurdle AB, Tavtigian SV. Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat 2008;29:1282–91.
[49] Brnich SE, Abou Tayoun AN, Couch FJ, Cutting GR, Greenblatt MS, Heinen CD, Kanavy DM, Luo X, McNulty SM, Starita LM, et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med 2019;12:3. https://doi.org/10.1186/s13073-019-0690-2.
[50] Thompson BA, Walters R, Parsons MT, Dumenil T, Drost M, Tiersma Y, Lindor NM, Tavtigian SV, de Wind N, Spurdle AB, et al. Contribution of mRNA splicing to mismatch repair gene sequence variant interpretation. Front Genet 2020;11.
[51] Wooster R, Bignell G, Lancaster J, Swift S, Seal S, Mangion J, Collins N, Gregory S, Gumbs C, Micklem G. Identification of the breast cancer susceptibility gene BRCA2. Nature 1995;378:789–92.
[52] Venkitaraman AR. Functions of BRCA1 and BRCA2 in the biological response to DNA damage. J Cell Sci 2001;114:3591–8.
[53] Venkitaraman AR. Linking the cellular functions of BRCA genes to cancer pathogenesis and treatment. Annu Rev Pathol 2009;4:461–87.
[54] Narod SA, Foulkes WD. BRCA1 and BRCA2: 1994 and beyond. Nat Rev Cancer 2004;4:665–76.
[55] Millot GA, Carvalho MA, Caputo SM, Vreeswijk MP, Brown MA, Webb M, Rouleau E, Neuhausen SL, Hansen T, Galli A, et al. A guide for functional analysis of BRCA1 variants of uncertain significance. Hum Mutat 2012;33:1526–37.
[56] Guidugli L, Carreira A, Caputo SM, Ehlen A, Galli A, Monteiro AN, Neuhausen SL, Hansen TV, Couch FJ, Vreeswijk MP, et al. Functional assays for analysis of variants of uncertain significance in BRCA2. Hum Mutat 2014;35:151–64.
[57] Lyra Jr PC, Paulo C, Nepomuceno TC, de Souza ML, Machado GF, Veloso MF, et al. Integration of functional assay data results provides strong evidence for classification of hundreds of BRCA1 variants of uncertain significance. Genet Med 2020. https://doi.org/10.1038/s41436-020-00991-0. Online ahead of print.
[58] Parsons MT, Tudini E, Li H, Hahnen E, Wappenschmidt B, Feliubadalo L, Aalfs CM, Agata S, Aittomaki K, Alducci E, et al. Large scale multifactorial likelihood quantitative analysis of BRCA1 and BRCA2 variants: an ENIGMA resource to support clinical variant classification. Hum Mutat 2019;40:1557–78.
[59] Wu LC, Wang ZW, Tsan JT, Spillman MA, Phung A, Xu XL, Yang MC, Hwang LY, Bowcock AM, Baer R. Identification of a RING protein that can interact in vivo with the BRCA1 gene product. Nat Genet 1996;14:430–40.
[60] Wu-Baer F, Lagrazon K, Yuan W, Baer R. The BRCA1/BARD1 heterodimer assembles polyubiquitin chains through an unconventional linkage involving lysine residue K6 of ubiquitin. J Biol Chem 2003;278:34743–6.
[61] Ruffner H, Joazeiro CA, Hemmati D, Hunter T, Verma IM. Cancer-predisposing mutations within the RING domain of BRCA1: loss of ubiquitin protein ligase activity and protection from radiation hypersensitivity. Proc Natl Acad Sci USA 2001;98:5134–9.
[62] Hashizume R, Fukuda M, Maeda I, Nishikawa H, Oyake D, Yabuki Y, Ogata H, Ohta T. The RING heterodimer BRCA1-BARD1 is a ubiquitin ligase inactivated by a breast cancer-derived mutation. J Biol Chem 2001;276:14537–40.

[63] Morris JR, Pangon L, Boutell C, Katagiri T, Keep NH, Solomon E. Genetic analysis of BRCA1 ubiquitin ligase activity and its relationship to breast cancer susceptibility. Hum Mol Genet 2006;15:599–606.
[64] Starita LM, Young DL, Islam M, Kitzman JO, Gullingsrud J, Hause RJ, Fowler DM, Parvin JD, Shendure J, Fields S. Massively parallel functional analysis of BRCA1 RING domain variants. Genetics 2015;200:413–22.
[65] Caleca L, Colombo M, van Overeem Hansen T, Lazaro C, Manoukian S, Parsons MT, Spurdle AB, Radice P. GFP-fragment reassembly screens for the functional characterization of variants of uncertain significance in protein interaction domains of the BRCA1 and BRCA2 genes. Cancers 2019;11.
[66] Monteiro AN, August A, Hanafusa H. Evidence for a transcriptional activation function of BRCA1 C-terminal region. Proc Natl Acad Sci USA 1996;93:13595–9.
[67] Hayes F, Cayanan C, Barilla D, Monteiro AN. Functional assay for BRCA1: mutagenesis of the COOH-terminal region reveals critical residues for transcription activation. Canc Res 2000;60:2411–8.
[68] Carvalho MA, Marsillac SM, Karchin R, Manoukian S, Grist S, Swaby RF, Urmenyi TP, Rondinelli E, Silva R, Gayol L, et al. Determination of cancer risk associated with germ line BRCA1 missense variants by functional analysis. Canc Res 2007;67:1494–501.
[69] Lee MS, Green R, Marsillac SM, Coquelle N, Williams RS, Yeung T, Foo D, Hau DD, Hui B, Monteiro AN, et al. Comprehensive analysis of missense variations in the BRCT domain of BRCA1 by structural and functional assays. Canc Res 2010;70:4880–90.
[70] Fernandes VC, Golubeva VA, Di Pietro G, Shields C, Amankwah K, Nepomuceno TC, de Gregoriis G, Abreu RBV, Harro C, Gomes TT, et al. Impact of amino acid substitutions at secondary structures in the BRCT domains of the tumor suppressor BRCA1: implications for clinical annotation. J Biol Chem 2019;294:5980–92.
[71] Woods NT, Baskin R, Golubeva V, Jhuraney A, De-Gregoriis G, Vaclova T, Goldgar DE, Couch FJ, Carvalho MA, Iversen ES, et al. Functional assays provide a robust tool for the clinical annotation of genetic variants of uncertain significance. NPJ Genom Med 2016;1.
[72] Bouwman P, van der Gulden H, van der Heijden I, Drost R, Klijn CN, Prasetyanti P, Pieterse M, Wientjens E, Seibler J, Hogervorst FB, et al. A high-throughput functional complementation assay for classification of BRCA1 missense variants. Canc Discov 2013;3:1142–55.
[73] Moynahan ME, Pierce AJ, Jasin M. BRCA2 is required for homology-directed repair of chromosomal breaks. Mol Cell 2001;7:263–72.
[74] Venkitaraman AR. The breast cancer susceptibility gene, BRCA2: at the crossroads between DNA replication and recombination? Philos Trans R Soc Lond B Biol Sci 2000;355:191–8.
[75] Davies AA, Masson JY, McIlwraith MJ, Stasiak AZ, Stasiak A, Venkitaraman AR, West SC. Role of BRCA2 in control of the RAD51 recombination and DNA repair protein. Mol Cell 2001;7:273–82.
[76] Scully R, Livingston D. In search of the tumour-suppressor functions of BRCA1 and BRCA2. Nature 2000;408:429–32.
[77] Tutt A, Bertwistle D, Valentine J, Gabriel A, Swift S, Ross G, Griffin C, Thacker J, Ashworth A. Mutation in Brca2 stimulates error-prone homology-directed repair of DNA double-strand breaks occurring between repeated sequences. EMBO J 2001;20:4704–16.
[78] Carreira A, Hilario J, Amitani I, Baskin RJ, Shivji MK, Venkitaraman AR, Kowalczykowski SC. The BRC repeats of BRCA2 modulate the DNA-binding selectivity of RAD51. Cell 2009;136:1032–43.
[79] Yang HJ, Jeffrey PD, Miller J, Kinnucan E, Sun YT, Thoma NH, Zheng N, Chen PL, Lee WH, Pavletich NP. BRCA2 function in DNA binding and recombination from a BRCA2-DSS1-ssDNA structure. Science 2002;297:1837–48.
[80] Mesman RLS, Calleja F, Hendriks G, Morolli B, Misovic B, Devilee P, et al. The functional impact of variants of uncertain significance in BRCA2. Genet Med 2019;21:293–302.

[81] Guidugli L, Pankratz VS, Singh N, Thompson J, Erding CA, Engel C, Schmutzler R, Domchek S, Nathanson K, Radice P, et al. A classification model for BRCA2 DNA binding domain missense variants based on homology-directed repair activity. Canc Res 2013;73:265–75.
[82] Xia B, Sheng Q, Nakanishi K, Ohashi A, Wu J, Christ N, Liu X, Jasin M, Couch FJ, Livingston DM. Control of BRCA2 cellular and clinical functions by a nuclear partner, PALB2. Mol Cell 2006;22:719–29.
[83] Biswas K, Das R, Eggington JM, Qiao H, North SL, Stauffer S, Burkett SS, Martin BK, Southon E, Sizemore SC, et al. Functional evaluation of BRCA2 variants mapping to the PALB2-binding and C-terminal DNA-binding domains using a mouse ES cell-based assay. Hum Mol Genet 2012;21:3993–4006.
[84] Wang HF, Takenaka K, Nakanishi A, Miki Y. BRCA2 and nucleophosmin coregulate centrosome amplification and form a complex with the Rho effector kinase ROCK2. Canc Res 2011;71:68–77.
[85] Tutt A, Gabriel A, Bertwistle D, Connor F, Paterson H, Peacock J, Ross G, Ashworth A. Absence of Brca2 causes genome instability by chromosome breakage and loss associated with centrosome amplification. Curr Biol 1999;9:1107–10.
[86] Mijic S, Zellweger R, Chappidi N, Berti M, Jacobs K, Mutreja K, Ursich S, Ray Chaudhuri A, Nussenzweig A, Janscak P, et al. Replication fork reversal triggers fork degradation in BRCA2-defective cells. Nat Commun 2017;8:859.
[87] Farrugia DJ, Agarwal MK, Pankratz VS, Deffenbaugh AM, Pruss D, Frye C, Wadum L, Johnson K, Mentlick J, Tavtigian SV, et al. Functional assays for classification of BRCA2 variants of uncertain significance. Canc Res 2008;68:3523–31.
[88] Wu K, Hinson SR, Ohashi A, Farrugia D, Wendt P, Tavtigian SV, Deffenbaugh A, Goldgar D, Couch FJ. Functional evaluation and cancer risk assessment of BRCA2 unclassified variants. Canc Res 2005;65:417–26.
[89] Ryan NAJ, Morris J, Green K, Lalloo F, Woodward ER, Hill J, Crosbie EJ, Evans DG. Association of mismatch repair mutation with age at cancer onset in Lynch syndrome: implications for stratified surveillance strategies. JAMA Oncol 2017;3:1702–6.
[90] Coletta AM, Peterson SK, Gatus LA, Krause KJ, Schembre SM, Gilchrist SC, Pande M, Vilar E, You YN, Rodriguez-Bigas MA, et al. Energy balance related lifestyle factors and risk of endometrial and colorectal cancer among individuals with Lynch syndrome: a systematic review. Fam Cancer 2019;18:399–420.
[91] Cohen SA, Pritchard CC, Jarvik GP. Lynch syndrome: from screening to diagnosis to treatment in the era of modern molecular oncology. Annu Rev Genom Hum Genet 2019;20:293–307.
[92] Thompson BA, Spurdle AB, Plazzer JP, Greenblatt MS, Akagi K, Al-Mulla F, Bapat B, Bernstein I, Capella G, den Dunnen JT, et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet 2014;46:107–15.
[93] Kunkel TA, Erie DA. Eukaryotic mismatch repair in relation to DNA replication. Annu Rev Genet 2015;49:291–313.
[94] Takahashi M, Shimodaira H, Andreutti-Zaugg C, Iggo R, Kolodner RD, Ishioka C. Functional analysis of human MLH1 variants using yeast and in vitro mismatch repair assays. Canc Res 2007;67:4595–604.
[95] Lutzen A, de Wind N, Georgijevic D, Nielsen FC, Rasmussen LJ. Functional analysis of HNPCC-related missense mutations in MSH2. Mutat Res 2008;645:44–55.
[96] Andersen SD, Liberti SE, Lutzen A, Drost M, Bernstein I, Nilbert M, Dominguez M, Nystrom M, Hansen TV, Christoffersen JW, et al. Functional characterization of MLH1 missense variants identified in Lynch syndrome patients. Hum Mutat 2012;33:1647–55.
[97] Arora S, Huwe PJ, Sikder R, Shah M, Browne AJ, Lesh R, Nicolas E, Deshpande S, Hall MJ, Dunbrack Jr RL, et al. Functional analysis of rare variants in mismatch repair proteins augments results from computation-based predictive methods. Canc Biol Ther 2017;18:519–33.

Chapter 8 Functional evidence (II) protein and enzyme

[98] Bouvet D, Bodo S, Munier A, Guillerm E, Bertrand R, Colas C, Duval A, Coulet F, Muleris M. Methylation tolerance-based functional assay to assess variants of unknown significance in the MLH1 and MSH2 genes and identify patients with lynch syndrome. Gastroenterology 2019;157:421e31. [99] Gonzalez-Acosta M, Hinrichsen I, Fernandez A, Lazaro C, Pineda M, Plotz G, Capella G. Validation of an in vitro mismatch repair assay used in the functional characterization of mismatch repair variants. J Mol Diagn 2020;22:376e85. [100] Rath A, Mishra A, Ferreira VD, Hu C, Omerza G, Kelly K, Hesse A, Reddi HV, Grady JP, Heinen CD. Functional interrogation of Lynch syndrome-associated MSH2 missense variants via CRISPR-Cas9 gene editing in human embryonic stem cells. Hum Mutat 2019;40:2044e56. [101] Houlleberghs H, Dekker M, Lantermans H, Kleinendorst R, Dubbink HJ, Hofstra RM, Verhoef S, Te Riele H. Oligonucleotide-directed mutagenesis screen to identify pathogenic Lynch syndromeassociated MSH2 DNA mismatch repair gene variants. Proc Natl Acad Sci USA 2016;113: 4128e33. [102] Houlleberghs H, Dekker M, Lusseveld J, Pieters W, van Ravesteyn T, Verhoef S, Hofstra RMW, Te Riele H. Three-step site-directed mutagenesis screen identifies pathogenic MLH1 variants associated with Lynch syndrome. J Med Genet 2020;57:308e15. [103] Drost M, Lutzen A, van Hees S, Ferreira D, Calleja F, Zonneveld JB, Nielsen FC, Rasmussen LJ, de Wind N. Genetic screens to identify pathogenic gene variants in the common cancer predisposition Lynch syndrome. Proc Natl Acad Sci USA 2013;110:9403e8. [104] Drost M, Tiersma Y, Glubb D, Kathe S, van Hees S, Calleja F, et al. Two integrated and highly predictive functional analysis-based procedures for the classification of MSH6 variants in Lynch syndrome. Genet Med 2020;22:847e56. [105] Drost M, Koppejan H, de Wind N. Inactivation of DNA mismatch repair by variants of uncertain significance in the PMS2 gene. Hum Mutat 2013;34:1477e80. 
[106] Drost M, Zonneveld JB, van Hees S, Rasmussen LJ, Hofstra RM, de Wind N. A rapid and cell-free assay to test the activity of lynch syndrome-associated MSH2 and MSH6 missense variants. Hum Mutat 2012;33: 488e94. [107] Drost M, Zonneveld J, van Dijk L, Morreau H, Tops CM, Vasen HF, Wijnen JT, de Wind N. A cell-free assay for the functional analysis of variants of the mismatch repair protein MLH1. Hum Mutat 2010;31: 247e53. [108] Thompson BA, Greenblatt MS, Vallee MP, Herkert JC, Tessereau C, Young EL, Adzhubey IA, Li B, Bell R, Feng B, et al. Calibration of multiple in silico tools for predicting pathogenicity of mismatch repair gene missense substitutions. Hum Mutat 2013;34:255e65. [109] Drost M, Tiersma Y, Thompson BA, Frederiksen JH, Keijzers G, Glubb D, Kathe S, Osinga J, Westers H, Pappas L, et al. A functional assay-based procedure to classify mismatch repair gene variants in Lynch syndrome. Genet Med 2019;21:1486e96. [110] Tikoo S, Sengupta S. Time to bloom. Genome Integr 2010;1:14. [111] Ellis NA, German J. Molecular genetics of Bloom’s syndrome. Hum Mol Genet 1996;5:1457e63. Spec No. [112] Manthei KA, Keck JL. The BLM dissolvasome in DNA replication and repair. Cell Mol Life Sci 2013;70: 4067e84. [113] Guo RB, Rigolet P, Ren H, Zhang B, Zhang XD, Dou SX, Wang PY, Amor-Gueret M, Xi XG. Structural and functional analyses of disease-causing missense mutations in Bloom syndrome protein. Nucleic Acids Res 2007;35:6297e310. [114] Mirzaei H, Schmidt KH. Non-Bloom syndrome-associated partial and total loss-of-function variants of BLM helicase. Proc Natl Acad Sci USA 2012;109:19357e62. [115] Davies SL, North PS, Dart A, Lakin ND, Hickson ID. Phosphorylation of the Bloom’s syndrome helicase and its role in recovery from S-phase arrest. Mol Cell Biol 2004;24:1279e91.

[116] Lahkim Bennani-Belhaj K, Buhagiar-Labarchede G, Jmari N, Onclercq-Delic R, Amor-Gueret M. BLM deficiency is not associated with sensitivity to hydroxyurea-induced replication stress. J Nucleic Acids 2010;2010. [117] Athanasiou D, Aguila M, Bellingham J, Li W, McCulley C, Reeves PJ, Cheetham ME. The molecular and cellular basis of rhodopsin retinitis pigmentosa reveals potential strategies for therapy. Prog Retin Eye Res 2018;62:1e23. [118] Jastrzebska B, Chen Y, Orban T, Jin H, Hofmann L, Palczewski K. Disruption of rhodopsin dimerization with synthetic peptides targeting an interaction interface. J Biol Chem 2015;290:25728e44. [119] Ploier B, Caro LN, Morizumi T, Pandey K, Pearring JN, Goren MA, Finnemann SC, Graumann J, Arshavsky VY, Dittman JS, et al. Dimerization deficiency of enigmatic retinitis pigmentosa-linked rhodopsin mutants. Nat Commun 2016;7:12832. [120] Chuang JZ, Vega C, Jun W, Sung CH. Structural and functional impairment of endocytic pathways by retinitis pigmentosa mutant rhodopsin-arrestin complexes. J Clin Invest 2004;114:131e40. [121] Toledo D, Ramon E, Aguila M, Cordomi A, Perez JJ, Mendes HF, Cheetham ME, Garriga P. Molecular mechanisms of disease for mutations at Gly-90 in rhodopsin. J Biol Chem 2011;286: 39993e40001. [122] Davies WI, Downes SM, Fu JK, Shanks ME, Copley RR, Lise S, Ramsden SC, Black GC, Gibson K, Foster RG, et al. Next-generation sequencing in health-care delivery: lessons from the functional analysis of rhodopsin. Genet Med 2012;14:891e9. [123] Hollingsworth TJ, Gross AK. The severe autosomal dominant retinitis pigmentosa rhodopsin mutant Ter349Glu mislocalizes and induces rapid rod cell death. J Biol Chem 2013;288:29047e55. [124] McKeone R, Wikstrom M, Kiel C, Rakoczy PE. Assessing the correlation between mutant rhodopsin stability and the severity of retinitis pigmentosa. Mol Vis 2014;20:183e99. [125] Liu MY, Liu J, Mehrotra D, Liu Y, Guo Y, Baldera-Aguayo PA, Mooney VL, Nour AM, Yan EC. 
Thermal stability of rhodopsin and progression of retinitis pigmentosa: comparison of S186W and D190N rhodopsin mutants. J Biol Chem 2013;288:17698e712. [126] Dryja TP, McGee TL, Hahn LB, Cowley GS, Olsson JE, Reichel E, Sandberg MA, Berson EL. Mutations within the rhodopsin gene in patients with autosomal dominant retinitis pigmentosa. N Engl J Med 1990; 323:1302e7. [127] Rafeeq MM, Murad HAS. Cystic fibrosis: current therapeutic targets and future approaches. J Transl Med 2017;15:84. [128] Bombieri C, Seia M, Castellani C. Genotypes and phenotypes in cystic fibrosis and cystic fibrosis transmembrane regulator-related disorders. Semin Respir Crit Care Med 2015;36:180e93. [129] Maitra R, Sivashanmugam P, Warner K. A rapid membrane potential assay to monitor CFTR function and inhibition. J Biomol Screen 2013;18:1132e7. [130] Van Goor F, Yu H, Burton B, Hoffman BJ. Effect of ivacaftor on CFTR forms with missense mutations associated with defects in protein processing or function. J Cyst Fibros 2014;13:29e36. [131] Eckford PD, Li C, Bear CE. Functional reconstitution and channel activity measurements of purified wildtype and mutant CFTR protein. J Vis Exp 2015;(97):52427. [132] Raraigh KS, Han ST, Davis E, Evans TA, Pellicore MJ, McCague AF, Joynt AT, Lu Z, Atalar M, Sharma N, et al. Functional assays are essential for interpretation of missense variants associated with variable expressivity. Am J Hum Genet 2018;102:1062e77. [133] Martiniano SL, Sagel SD, Zemanick ET. Cystic fibrosis: a model system for precision medicine. Curr Opin Pediatr 2016;28:312e7. [134] Findlay GM, Daza RM, Martin B, Zhang MD, Leith AP, Gasperini M, Janizek JD, Huang X, Starita LM, Shendure J. Accurate classification of BRCA1 variants with saturation genome editing. Nature 2018;562: 217e22.

[135] Fruman DA, Chiu H, Hopkins BD, Bagrodia S, Cantley LC, Abraham RT. The PI3K pathway in human disease. Cell 2017;170:605e35. [136] Noorolyai S, Shajari N, Baghbani E, Sadreddini S, Baradaran B. The relation between PI3K/AKT signalling pathway and cancer. Gene 2019;698:120e8. [137] Rodriguez-Escudero I, Oliver MD, Andres-Pons A, Molina M, Cid VJ, Pulido R. A comprehensive functional analysis of PTEN mutations: implications in tumor- and autism-related syndromes. Hum Mol Genet 2011;20:4132e42. [138] Mighell TL, Evans-Dutson S, O’Roak BJ. A saturation mutagenesis approach to understanding PTEN lipid phosphatase activity and genotype-phenotype relationships. Am J Hum Genet 2018;102:943e55. [139] Morgan TH. Sex limited inheritance in Drosophila. Science 1910;32:120e2. [140] Bellen HJ, Wangler MF, Yamamoto S. The fruit fly at the interface of diagnosis and pathogenic mechanisms of rare and common human diseases. Hum Mol Genet 2019;28:R207e14. [141] Assia Batzir N, Bhagwat PK, Eble TN, Liu P, Eng CM, Elsea SH, Robak LA, Scaglia F, Goldman AM, Dhar SU, et al. De novo missense variant in the GTPase effector domain (GED) of DNM1L leads to static encephalopathy and seizures. Cold Spring Harb Mol Case Stud 2019;5. [142] Chao YH, Robak LA, Xia F, Koenig MK, Adesina A, Bacino CA, Scaglia F, Bellen HJ, Wangler MF. Missense variants in the middle domain of DNM1L in cases of infantile encephalopathy alter peroxisomes and mitochondria when assayed in Drosophila. Hum Mol Genet 2016;25:1846e56. [143] Ansar M, Chung H, Waryah YM, Makrythanasis P, Falconnet E, Rao AR, Guipponi M, Narsani AK, Fingerhut R, Santoni FA, et al. Visual impairment and progressive phthisis bulbi caused by recessive pathogenic variant in MARK3. Hum Mol Genet 2018;27:2703e11. [144] Jou CJ, Arrington CB, Barnett S, Shen J, Cho S, Sheng X, McCullagh PC, Bowles NE, Pribble CM, Saarel EV, et al. A functional assay for sick sinus syndrome genetic variants. Cell Physiol Biochem 2017; 42:2021e9. 
[145] Chang S, Biswas K, Martin BK, Stauffer S, Sharan SK. Expression of human BRCA1 variants in mouse ES cells allows functional analysis of BRCA1 mutations. J Clin Invest 2009;119:3160e71. [146] Naash MI, Hollyfield JG, al-Ubaidi MR, Baehr W. Simulation of human autosomal dominant retinitis pigmentosa in transgenic mice expressing a mutated murine opsin gene. Proc Natl Acad Sci USA 1993;90: 5499e503. [147] Murray AR, Vuong L, Brobst D, Fliesler SJ, Peachey NS, Gorbatyuk MS, Naash MI, Al-Ubaidi MR. Glycosylation of rhodopsin is necessary for its stability and incorporation into photoreceptor outer segment discs. Hum Mol Genet 2015;24:2709e23. [148] Mighell TL, Thacker S, Fombonne E, Eng C, O’Roak BJ. An integrated deep-mutational-scanning approach provides clinical insights on PTEN genotype-phenotype relationships. Am J Hum Genet 2020; 106:818e29.

CHAPTER 9

Somatic data usage for classification of germ line variants

Michael F. Walsh

Memorial Sloan Kettering Cancer Center, New York, NY, United States

Introduction

Somatic and germ line data are both desirable for establishing causality and association at the gene and variant levels. A critical amount of data is needed to classify a genetic variant as harmful or harmless. The human genome is made up of 6.4 billion letters, with correspondingly vast combinations of variability [1]. Determining significance at both the gene and variant level can be challenging. One definition of an essential gene is a gene whose loss of function compromises the viability of the individual or results in a profound loss of fitness [2]. A number of genes are defined as cancer predisposing: variants leading to their dysfunction are associated with specific cancers (Table 9.1) [3]. The field of cancer genetics benefits from tumor-normal comparative sequencing when attempting to determine the significance and role of germ line variants in tumorigenesis. Oncologists and cancer geneticists use sequencing data to direct care [4,5]. The Human Genome Project and The Cancer Genome Atlas, along with sequencing efforts deployed across academic and commercial ventures, generate data toward these ends [1,6]. Increasingly, cancer centers are sequencing both tumor and matched normal genomes for comprehensive cancer risk assessment [7–9]. Determining the significance of genomic alterations detected by sequencing both constitutional and tumor genomes requires classification schemes that translate to clinical practice [10,11].

The vast possibilities of genomic alterations make it desirable to use data generated from both constitutional (germ line) sequencing and tumor sequencing when classifying variants. A framework proposed by the ClinGen Germline/Somatic Variant Subcommittee (GSVS) for integrating these data was put forth to the NIH Standard Variant Interpretation Committee and accepted for molecular laboratories to consider in variant classification [12]. The GSVS designed a survey targeting laboratory usage of evidence for classifying germ line variants. Areas for consideration include loss of heterozygosity (LOH) or second hit mutations, mutational hot spots, RNA sequencing data, and tumor phenotypic characteristics (Fig. 9.1 and Table 9.2). Peer-reviewed, publicly accessible resources that could be used for germ line variant classification were reviewed, the limitations of specific datasets were considered, and best practices for dataset use were then recommended after a series of exercises assessing the appropriate application of evidence codes (Figs. 9.2 and 9.3 and Table 9.3). For ease of use, existing ACMG/AMP evidence codes were expanded upon for application and incorporation of somatic data where possible.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00022-3 Copyright © 2021 Elsevier Inc. All rights reserved.

Table 9.1 A subset of cancer predisposing genes statistically significant in association with subtypes of cancer.

Genes associated with specific cancer types

Breast: ATM, BRCA1, BRCA2, CHEK2, PALB2, PTEN, TP53, RAD51C, RAD51D

Ovarian: ATM, BRCA1, BRCA2, BRIP1, CHEK2, EPCAM, MLH1, MSH2, MSH6, PALB2, PTEN, PMS2, RAD51C, RAD51D, TP53

Colon: APC, EPCAM, GREM1, MLH1, MSH2, MSH6, MUTYH, PMS2, POLD1, POLE, TP53

Pancreas: ATM, BRCA1, BRCA2, CDK4, CDKN2A, EPCAM, MLH1, MSH2, MSH6, PALB2, PMS2, TP53

Renal: BAP1, EPCAM, FH, FLCN, MET, MITF, MLH1, MSH2, MSH6, PMS2, PTEN, SDHB, SDHC, SDHD, TP53, TSC1, TSC2, VHL

Investment in sequencing human tumors and parallel germ line samples has provided a rich repository of data to mine for a better understanding of cancer predisposition and variants of uncertain significance (VUS) (Fig. 9.3) [6–9,13–80]. In some instances, characterization of tumors by sequencing, expression data, and integration with additional biomarker data has provided evidence for the pathogenicity of germ line variants (Table 9.4) [9,27,77,81–90]. Advances in bioinformatic tools and the integration of genomic assays provide a means to identify cryptic genomic lesions and alterations missed by single assays [44,91–93]. Analysis of tumors using mathematical modeling that accounts for gene size and the frequency of codon-specific alterations provides a basis for determining cancer hot spot mutations, and additional data to consider when the same alterations are detected in the germ line [94,95]. Conversely, a rare germ line variant that potentially contributes to oncogenesis, found in a tumor with loss of the other allele, contributes evidence toward classifying germ line VUS [96]. Additional complexity arises in interpreting constitutional copy number variants affecting cancer predisposition genes, for which standards have recently been proposed [97].

FIGURE 9.1 Points to consider in attempting to integrate germ line and somatic data for variant classification. CMMRD, constitutional mismatch repair deficiency; MSI, microsatellite instability.

Table 9.2 Limitations in classifying germ line variants.

Obstacles to variant interpretation
· Genetic variability between individuals
· Genetic variability between populations
· Nonuniform genetic testing assays used in different studies
· Variable thresholds and pipelines for variant filtration
· Functional validation of novel or rare variants detected
· Nonuniform application of classification rules leading to conflicting interpretation
· Ability to segregate variations in families
· Access to variant databases

Data sources

Somatic data resources

Somatic data resources in the public domain are heterogeneous. Data sources can include variant data from tumor/normal sequencing, institutional resources such as case registries, gene- or disease-focused registries, public or institutional data, and proprietary datasets. Given the variability in these datasets, the GSVS, comprising oncologists, geneticists, molecular pathologists, molecular geneticists, bioinformatic specialists, computational biologists, and laboratory directors, reviewed and compiled a list of peer-reviewed resources that are publicly available (Fig. 9.3 and Table 9.3).

COSMIC (Catalogue of Somatic Mutations in Cancer) is the world’s largest source of somatic mutation information related to human cancers. Two main types of data are aggregated in COSMIC from tumors and cell lines. The database houses metadata from over 32,000 genomes, drawn from peer-reviewed large-scale genome screening data and from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC), and provides unbiased, genome-level profiling of diseases. COSMIC provides extensive coverage of the cancer genomic landscape from a somatic perspective. The database receives a major update four times a year [98–107].

The cBioPortal for Cancer Genomics is a data repository originally developed at Memorial Sloan Kettering Cancer Center (MSK). The public cBioPortal site is hosted by the Center for Molecular Oncology at MSK. Continuous development of the software, which aggregates data from cancer sequencing studies, involves MSK, the Dana Farber Cancer Institute, Princess Margaret Cancer Center in Toronto, Children’s Hospital of Philadelphia, The Hyve in the Netherlands, and Bilkent University in Ankara, Turkey. The repository stores somatic variants from 295 studies including TCGA projects, the Pediatric Cancer Genome Project, and the National Moonshot Cancer initiative project GENIE (Genomics Evidence Neoplasia Information Exchange) Consortium. cBioPortal is an open access, open-source resource for interactive exploration of multidimensional cancer genomic datasets. Its goal is to improve collaboration between cancer researchers by allowing rapid, intuitive, and high-quality access to molecular profiles and clinical attributes from large-scale cancer genomics studies, and thereby to support a better understanding of the biology that drives tumors and its clinical translation [108,109].

Cancer Hotspots is a resource linked with the cBioPortal. It provides a systematic computational, experimental, and clinical analysis of hot spot mutations in close to 25,000 human cancers, including over 10,000 prospectively sequenced patients with advanced disease. The analysis identified 1165 statistically significant hot spot mutations, of which 80% arose in 1 in 1000 or fewer patients [94].

PeCan (PECAN.stjude.org) is an interactive web tool developed by researchers at St. Jude Children’s Research Hospital (SJCRH). The interface provides visualizations of pediatric cancer mutations across various projects with SJCRH and national and international collaborators. The associated database includes samples from over 5000 pediatric cancers representing 4877 unique patients and 23 diagnoses, and annotates over 88,000 mutations in 18,395 genes [110].

CIViC is a community-edited forum for the discussion of peer-reviewed publications pertaining to the clinical relevance of variants (or biomarkers) in cancer. Interpretations include associations between

FIGURE 9.2 Effort and process for considering the use of somatic variant data in classifying germ line variants in cancer predisposition genes.

FIGURE 9.3 Data repositories and resources for somatic and germ line cancer data.

Table 9.3 Data repositories considered for somatic and germ line cancer integration.

Database | Host | Link
cBioPortal | Memorial Sloan Kettering Cancer Center | https://www.cbioportal.org/
Genomic Data Commons | National Institutes of Health | https://gdc.cancer.gov/
PeCan | St. Jude Children’s Research Hospital | https://pecan.stjude.cloud/
Online Mendelian Inheritance in Man (OMIM) | Johns Hopkins | https://www.omim.org/
CIViC | Washington University in St. Louis | https://civicdb.org/home
ClinVar | National Institutes of Health | https://www.ncbi.nlm.nih.gov/clinvar/
COSMIC | Cambridge, Sanger, UK | https://cancer.sanger.ac.uk/cosmic
gnomAD | The Broad | https://gnomad.broadinstitute.org/

Table 9.4 Characterization of germ line variants integrating germ line and somatic data.

Gene | Variant | Evidence codes without somatic data | Classification | Evidence codes with somatic data | Final variant classification
TP53 | p.T125M | PS3 supporting, PM2, PP3, and PP4 | VUS | PS3 supporting, PM2, PP3, PP4, and PM1 | Likely pathogenic
VHL | p.Ile151Thr | PP3, PP4, PM2 | VUS | PP3, PP4, PM2, PM1 supporting | VUS
PTEN | p.Gly132Asn | PP2, PP3, PP4, PM2 | VUS | PP2, PP3, PP4, PM2, PM1 supporting | Likely pathogenic
ATM | p.Arg337Cys | PM2, PP3 | VUS | PM2, PP3, PM1 | VUS
TP53 | p.Arg196Leu | PP2, PP3, PM2 | VUS | PP2, PP3, PP4, PM2, PM1 supporting | Likely pathogenic

American College of Medical Genetics and Genomics abbreviations for coding variant evidence with scale 1–6. PM, pathogenic moderate; PP, pathogenic supporting; PS, pathogenic strong; PVS, pathogenic very strong; VUS, variants of uncertain significance. https://www.nature.com/articles/gim201530.
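The reclassifications in Table 9.4 follow from the ACMG/AMP rules for combining criteria. As a minimal sketch, the Python below counts pathogenic evidence by strength and applies the published combining rules for the pathogenic side only. Benign criteria are ignored, and the function names and the convention of writing a modified strength after the code (for example "PM1 supporting") are assumptions made for illustration rather than part of any laboratory pipeline.

```python
from collections import Counter

# Default strength implied by each ACMG/AMP pathogenic code prefix.
DEFAULT_STRENGTH = {"PVS": "very_strong", "PS": "strong",
                    "PM": "moderate", "PP": "supporting"}


def strength(code):
    """Return the evidence strength for a code such as 'PM2' or 'PM1 supporting'."""
    name, *modifier = code.split()
    if modifier:  # an explicit modifier overrides the default strength
        return "_".join(word.lower() for word in modifier)
    for prefix in ("PVS", "PS", "PM", "PP"):
        if name.startswith(prefix):
            return DEFAULT_STRENGTH[prefix]
    raise ValueError(f"unrecognized evidence code: {code}")


def classify(codes):
    """Combine pathogenic evidence codes (benign criteria are not modeled)."""
    n = Counter(strength(code) for code in codes)
    vs, s, m, p = n["very_strong"], n["strong"], n["moderate"], n["supporting"]
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s >= 1 and (m >= 3 or (m >= 2 and p >= 2) or (m >= 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m >= 1)
        or (s >= 1 and (m >= 1 or p >= 2))
        or m >= 3
        or (m >= 2 and p >= 2)
        or (m >= 1 and p >= 4)
    )
    if pathogenic:
        return "Pathogenic"
    if likely_pathogenic:
        return "Likely pathogenic"
    return "VUS"


# TP53 p.T125M from Table 9.4, without and with the somatic hot spot code PM1.
without_somatic = ["PS3 supporting", "PM2", "PP3", "PP4"]
print(classify(without_somatic))            # VUS
print(classify(without_somatic + ["PM1"]))  # Likely pathogenic
```

In this sketch the added moderate-strength PM1 code moves the TP53 variant from one moderate plus three supporting criteria to two moderate plus three supporting criteria, which is enough to reach likely pathogenic.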

molecular alterations and one or more drugs, diagnoses, prognoses, or treatment decisions. Interpretations of clinical significance are intended for research purposes, and the effort makes no guarantee of clinical benefit from use of the site [111].

Genomic Data Commons (GDC) provides the cancer research community with a unified data repository that enables data sharing across cancer genomic studies in support of precision medicine. The GDC supports programs of the National Cancer Institute (NCI) Center for Genomics, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) [112].

Other databases with limited somatic data

ClinVar is an open access, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar facilitates access to and communication about the relationships asserted between human variation and observed health status, and the history of that interpretation. ClinVar curates and processes submissions reporting variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other relevant data. The alleles described are mapped to reference sequences and reported in accordance with the HGVS standard. In addition, ClinVar presents the data for interactive users as well as for real-time use. The level of confidence in the accuracy of an assertion of clinical significance depends on the supporting evidence, so this information is collected and made available when possible [113,114].

OMIM is a comprehensive, authoritative compendium of human genes and genetic phenotypes that is freely available and updated daily. The full-text, referenced overviews in OMIM contain information on all known Mendelian disorders and over 15,000 genes. OMIM focuses on the relationship between phenotype and genotype [115].

Control database for comparison

gnomAD is a resource developed by an international consortium with the goal of aggregating exome and genome sequencing data from a broad spectrum of sequencing projects and making summary data available to the scientific community. ExAC data are available in this aggregation database, along with data from 50 additional projects using next-generation sequencing and over 100,000 exomes [116].

Laboratory practices utilizing somatic data

The survey of 20 major academic and commercial laboratories performing both somatic and germ line sequencing showed variable approaches to, and perceived value of, classifying variants revealed by germ line and somatic sequencing, as well as a desire to integrate data sources more formally. The majority of responding laboratories, 72%, reported that they would use somatic data for classifying germ line variants if possible. Data types polled for use included immunohistochemistry, telomere lengths, chromosomal breakage studies, RNA sequencing data, and xenograft and cell line responses to targeted therapies involving relevant biologic pathways.

Principles and rationale for utilizing somatic data for classifying germ line variants in cancer predisposition genes

Mutations leading to biological changes are sought, and a comprehensive understanding of the molecular mechanisms leading to their consequences is desired, to better understand cancer, Mendelian diseases, drug sensitivity, and drug resistance. By some estimates, close to 80% of mutations deemed associated with disease are, in some instances, incorrectly assumed to be causative because the evidence provided to support such conclusions is limited [117]. Areas considered feasible for somatic data integration to classify variants in germ line cancer predisposition genes include LOH or second hit mutations, mutational hot spots, RNA sequencing data, and tumor phenotypic characteristics (Fig. 9.4). Use of these data elements is not without limitations and caveats.

Loss of heterozygosity, determining biallelic inactivation, and cancer hot spots

Loss of heterozygosity

LOH is a form of allelic imbalance in which a heterozygous somatic cell becomes homozygous because one of the two alleles is lost [118]. It is a guiding principle in cancer genetics that stems from Alfred Knudson’s two-hit hypothesis, based on the observation in retinoblastoma patients that retinoblastoma is a cancer caused by two mutational events. In the dominant heritable type, one mutation is inherited from the germinal cells and the other occurs in the somatic cells (Fig. 9.5). In the nonhereditary form, both mutations occur in somatic cells [119]. Considering the molecular mechanism proposed in the two-hit model, if a variant is detected in the germ line and a second hit is revealed in the tumor in a known tumor suppressor gene (TSG), this might provide strong data for consideration in classifying the germ line variant. Notably, tumor suppressor genes have been implicated in tumorigenesis in most sporadic tumors. Another example illustrating Knudson’s two-hit model is the Vogelstein model of familial colon cancer progression. In this model, a germ line mutation in the APC gene is inherited as the first hit and a somatic mutation in the second copy of the APC gene is the second hit [120]. With the second hit, both copies of the gene are lost and oncogenesis is initiated. Point mutations have become easier to identify in the era of next-generation sequencing, in situ hybridization assays, PCR-based LOH assays, and microdissection instruments, whereas historically they were more difficult to measure.

Several caveats and factors can affect LOH analysis. One is contamination of the tumor sample: LOH can be masked by heavy contamination with normal DNA. This issue can be minimized by microdissecting the tumor away from the contaminating tissues. If polymorphic repeat markers are used, it is important to recognize that these markers are often only a surrogate for the gene of interest, and the distance between the TSG and the polymorphic markers affects detection. Additional concerns include low DNA concentration and lack of a normal control sample. With a low concentration of DNA, one allele may be preferentially amplified over another, and inadequate amplification of one allele can lead to allelic dropout. When this occurs, the PCR product allele ratio does not reflect the original starting amount of DNA of each allele, which can lead to false-positive results for LOH. Running duplicate samples provides quality control for this: when DNA concentration is low, the problem manifests as inconsistent allele ratios between duplicates.
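The duplicate-sample quality control described above can be sketched in Python. The example computes the ratio of allele signal in tumor relative to matched normal for a heterozygous marker and calls LOH only when the replicates agree. The function names, the 0.5 and 2.0 ratio cutoffs, and the 20% replicate tolerance are illustrative assumptions, not values taken from this chapter; a laboratory would calibrate its own.

```python
def allelic_imbalance_ratio(tumor_a, tumor_b, normal_a, normal_b):
    """Allele A to allele B signal in tumor, normalized to the matched normal."""
    if min(tumor_a, tumor_b, normal_a, normal_b) <= 0:
        raise ValueError("all allele signals must be positive")
    return (tumor_a / tumor_b) / (normal_a / normal_b)


def call_loh(replicates, low=0.5, high=2.0, max_relative_spread=0.2):
    """Call LOH from replicate (tumor_a, tumor_b, normal_a, normal_b) measurements.

    Cutoffs are hypothetical. Replicates whose ratios disagree are reported as
    unreliable, since inconsistent ratios suggest allelic dropout.
    """
    ratios = [allelic_imbalance_ratio(*rep) for rep in replicates]
    mean_ratio = sum(ratios) / len(ratios)
    spread = (max(ratios) - min(ratios)) / mean_ratio
    if spread > max_relative_spread:
        return "unreliable"
    if mean_ratio < low or mean_ratio > high:
        return "LOH"
    return "retained"


# Two PCR replicates of the same marker (peak heights in arbitrary units).
print(call_loh([(1000, 300, 900, 950), (980, 310, 910, 940)]))  # LOH
print(call_loh([(1000, 300, 900, 950), (600, 580, 910, 940)]))  # unreliable
```

Normalizing to the matched normal sample is what makes the ratio interpretable, which is why the lack of a normal control is listed among the caveats.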

FIGURE 9.4 Proposal of the GSVS ClinGen Working Group for integrating somatic and germ line data. American College of Medical Genetics and Genomics abbreviations for coding variant evidence with scale 1–6. PM, pathogenic moderate; PP, pathogenic supporting; PS, pathogenic strong; PVS, pathogenic very strong; VUS, variants of uncertain significance. https://www.nature.com/articles/gim201530.

FIGURE 9.5 Two-hit hypothesis. In the hereditary form, one mutation is inherited from the germinal cells and the other occurs in the somatic cells. In the nonhereditary form, both mutations occur in somatic cells.

Copy-neutral LOH

Copy-neutral LOH in cancer can be equivalent to the second hit in Knudson’s two-hit model [121]. Acquired uniparental disomy, the mechanism that produces copy-neutral LOH, occurs in both solid and liquid tumors [122–124]. Virtual karyotypes using SNP-based arrays can inform genome-wide copy number and LOH status, including copy-neutral LOH.
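As a minimal sketch of how allele-specific copy number from a SNP-array virtual karyotype separates these cases, the Python below labels a genomic segment from its total copy number and the copy number of its minor (less abundant) allele. The function name and category labels are assumptions chosen for illustration.

```python
def classify_segment(total_cn, minor_cn):
    """Label a segment from total and minor-allele copy number (integers)."""
    if not 0 <= minor_cn <= total_cn / 2:
        raise ValueError("minor_cn must lie between 0 and half of total_cn")
    if total_cn == 0:
        return "homozygous deletion"
    if minor_cn > 0:
        return "heterozygous"          # both parental alleles retained
    if total_cn == 2:
        return "copy-neutral LOH"      # two copies, both from one parent
    if total_cn < 2:
        return "LOH with copy loss"
    return "LOH with copy gain"


print(classify_segment(2, 1))  # heterozygous
print(classify_segment(2, 0))  # copy-neutral LOH
print(classify_segment(1, 0))  # LOH with copy loss
```

The point of the sketch is that total copy number alone cannot reveal copy-neutral LOH: a segment with two copies looks normal unless the allele-specific count shows that one parental allele is absent.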

Determining biallelic inactivation

Interpreting germ line variants requires substantial evidence to deem a variant pathogenic or disease-causing [11]. Classifying a variant as harmless or harmful, as in the instance of BRCA1 and BRCA2, translates clinically to a recommendation for or against organ removal [4]. Sequencing the human genome provided a means to identify cancer-prone individuals in a broader manner [1]. However, genetic variability, the number of individuals who have undergone testing, nonuniform testing approaches, and data access are rate-limiting factors in resolving VUS into harmless or harmful genetic variation (Table 9.2). The roles of BRCA1/2 in predisposition to breast, ovarian, pancreatic, and prostate cancer have been studied [4,73,78,125–129]. More recently, in patients with advanced cancer, tumors of BRCA-associated types arising in germ line BRCA1/2 carriers were shown to exhibit selective pressure for biallelic inactivation, zygosity-dependent phenotype penetrance, and sensitivity to PARP inhibition, whereas in carriers with non-BRCA-associated cancer types, tumor pathogenesis in most cases proceeded without selective pressure for biallelic inactivation [84].

To determine this difference, the tumor-specific zygosity of both germ line pathogenic variants and somatic mutations was assessed by integrating the read support for the mutant allele with total coverage and the estimated locus-specific total and allele-specific copy number. Using this approach, it was determined whether the observed variant allele frequencies (VAFs) for clonal events in the tumor were consistent with the expected VAF given the tumor purity and local copy number. The expected VAF was defined for germ line variants as

$$\mathrm{VAF}_{\text{germ line}} = \frac{F\,n + (1 - F)}{F\,N + 2\,(1 - F)}$$

and for somatic variants as

$$\mathrm{VAF}_{\text{somatic}} = \frac{F\,n}{F\,N + 2\,(1 - F)},$$

in which F is the tumor purity, and N and n are the locus-specific and allele-specific copy number, respectively. For variants with allelic imbalance in favor of the mutant allele, LOH was considered present if the somatic VAF was within the 95% binomial confidence interval of the VAF expected when the lesser allele has a copy number of zero. LOH favoring the mutant allele was called for variants whose observed VAF exceeded the lower bound of the 95% confidence interval of the expected VAF; conversely, loss of the mutant allele (favoring the reference allele) was called in the reverse situation. For somatic loss-of-function BRCA1/2 mutations, zygosity inference was limited to clonal mutations. The zygosity of a given variant was considered indeterminate if no tumor VAF was estimated, the variant was homozygous in the normal, or the tumor depth at the variant site was less than 50 [84].
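The expected-VAF test described above can be illustrated with a short Python sketch. It evaluates the germ line and somatic expressions, with purity F, total copy number N, and mutant-allele copy number n, and then checks whether an observed read count clears the lower bound of a 95% binomial interval around the VAF expected under loss of the wild-type allele. The function names, the way the interval is built (a central interval of read counts at the observed depth), and the reduction to a single lower-bound test are simplifying assumptions, not the published pipeline.

```python
import math


def expected_vaf(purity, total_cn, mutant_cn, germline):
    """Expected tumor VAF. A heterozygous germ line variant also contributes
    one mutant copy (of two) from the admixed normal cells."""
    normal_mutant_copies = 1 if germline else 0
    numerator = purity * mutant_cn + (1 - purity) * normal_mutant_copies
    return numerator / (purity * total_cn + 2 * (1 - purity))


def binomial_interval(depth, p, level=0.95):
    """Central interval of mutant read counts for Binomial(depth, p)."""
    if p <= 0:
        return 0, 0
    if p >= 1:
        return depth, depth
    tail = (1 - level) / 2
    lower, cumulative = None, 0.0
    for k in range(depth + 1):
        log_pmf = (math.lgamma(depth + 1) - math.lgamma(k + 1)
                   - math.lgamma(depth - k + 1)
                   + k * math.log(p) + (depth - k) * math.log1p(-p))
        cumulative += math.exp(log_pmf)
        if lower is None and cumulative >= tail:
            lower = k
        if cumulative >= 1 - tail:
            return lower, k
    return lower, depth


def germline_loh_call(mutant_reads, depth, purity, total_cn, min_depth=50):
    """Compare observed reads with the VAF expected when every tumor copy is mutant."""
    if depth < min_depth:
        return "indeterminate"  # mirrors the depth < 50 rule in the text
    expected = expected_vaf(purity, total_cn, total_cn, germline=True)
    low, _ = binomial_interval(depth, expected)
    return "LOH" if mutant_reads >= low else "no LOH"


# Tumor purity 0.6 with one remaining (mutant) copy: expected VAF = 1.0 / 1.4.
print(round(expected_vaf(0.6, 1, 1, germline=True), 3))  # 0.714
print(germline_loh_call(72, 100, 0.6, 1))                 # LOH
print(germline_loh_call(50, 100, 0.6, 1))                 # no LOH
```

The worked numbers show why purity matters: a heterozygous germ line variant without LOH sits at a VAF of 0.5 regardless of purity, whereas with loss of the wild-type allele at 60% purity the expectation rises only to about 0.71, so normal-cell admixture narrows the gap the test has to detect.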

Mutational hot spots

The aggregated genomic landscape of cancer has shown that most mutations in cancer are rare [94]. Analysis of more than 25,000 sequenced cancers collected in the cBioPortal has provided a data repository for identifying statistically significant recurrent, or hot spot, mutations [94]. More specifically, retrospective mutational data were gathered from The Cancer Genome Atlas, the International Cancer Genome Consortium, and independently published sequencing projects [95]. Statistical significance of codon hot spots was determined in 32 separate cancer types. Every codon was assessed with a truncated binomial probability model in which the expected probability incorporates underlying features of gene-specific mutation rates, including gene length, gene- and position-specific mutability, and the mutational burden of the gene. This model assesses the significance of a single mutant allele relative to the background of all mutations in the gene in which it emerges rather than across genes [94]. This exercise identified 1165 statistically significant hot spot mutations, the vast majority of which arose in 1 in 1000 or fewer individuals. It is important to note that cancer genes exhibit different rates of hot spot discovery with increasing sample size. As additional molecular cancer landscapes are aggregated, a more refined understanding of hot spots will be possible. The critical mass of data analyzed so far provides a starting point for integrating hot spots in cancer and for deciding how they might be considered if detected in constitutional samples from individuals with associated cancer predisposition syndromes.

While the American College of Medical Genetics and Genomics (ACMG)/Association for Molecular Pathology (AMP) guidelines did not address how to use tumor data when assessing the pathogenicity of germ line variants, they do provide evidence codes amenable to somatic as well as germ line data. National and international consortia have since focused on gathering data to facilitate variant classification that integrates somatic and germ line data [130]. A framework for integrating somatic variant data and biomarkers for germ line variant classification in cancer predisposition genes has been proposed with recognized caveats (Table 9.3 and Fig. 9.3) [12]. The ACMG rules describe usage of the hot spot evidence code (PM1) for germ line variants: “located in mutational hotspot and/or critical and well established functional domain without benign variation.” This provides a means to integrate cancer mutational hot spots, defined as sites that are statistically significant at a q-value < 0.1, i.e., a false discovery rate below 10% [94,95]. This approach is also in line with the odds of pathogenicity described for moderate evidence (4.3:1) and strong evidence (18.7:1) [131]. Cancer context is important for meaningful application of this rule [3]. Somatic hot spot data are recommended for use when a germ line variant is being interpreted with regard to cancer predisposition, because some cancer susceptibility genes are also associated with noncancer syndromes. Evidence should not be counted twice: if an amino acid residue is a mutational hot spot at both the germ line and somatic level, then only the germ line information should be used to fulfill the evidence code (PM1 should not be used twice). When the somatic evidence is deemed stronger, e.g., PM1 moderate for somatic and PM1 supporting for germ line, the germ line evidence should still be prioritized for germ line interpretation.

Another point of consideration is the variant type, together with the database composition and the cancer types for which hot spots have been defined. For statistical significance, recurrent hot spots in cancer have been defined using the cohort of tumors described in cancerhotspots.org and the cBioPortal. Thus, the cancers contributing to a hot spot residue may be incomplete and may not fully reflect all cancers in the spectrum of a patient’s hereditary cancer predisposition syndrome. This approach has been illustrated with the tumor suppressor genes TP53, VHL, and PTEN. Using the hot spot rule requires additional considerations (Table 9.4). Other somatic data types that may help to classify germ line variants include tumor signatures, chromothripsis, mutational burden, microsatellite instability, RNA-seq data, LOH in the tumor, immunohistochemistry, functional molecular assays such as flow-FISH, and cytogenetic testing such as chromosomal breakage analysis.
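
The q-value threshold used above to define somatic hot spots can be illustrated with a deliberately simplified sketch. It replaces the published truncated binomial model, which also accounts for gene- and position-specific mutability, with a plain binomial test under a uniform per-codon background, and it uses the Benjamini-Hochberg procedure to obtain q-values. Both simplifications, the function names, and the toy mutation counts are assumptions made here for illustration only.

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg q-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


def hotspot_codons(codon_counts, q_threshold=0.1):
    """1-based codon positions whose mutation count is significant at q < q_threshold."""
    total = sum(codon_counts)
    background = 1.0 / len(codon_counts)  # uniform per-codon probability (assumption)
    p_values = [binomial_upper_tail(k, total, background) for k in codon_counts]
    q_values = benjamini_hochberg(p_values)
    return [pos + 1 for pos, q in enumerate(q_values) if q < q_threshold]


# Toy gene of 300 codons carrying 34 mutations, 30 of them at codon 12.
counts = [0] * 300
counts[11], counts[44], counts[99], counts[199] = 30, 2, 1, 1
print(hotspot_codons(counts))
```

In this toy gene only the heavily recurrent codon passes the 10% false discovery threshold; the codon mutated twice does not, because the correction accounts for all 300 codons tested. A real analysis would estimate the background from covariates such as those listed in the text rather than assume uniformity.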

RNA-seq tumor data

RNA-seq is a transcriptome profiling technique based on deep-sequencing technologies. It measures transcripts more precisely than tiling arrays or cDNA and EST sequencing. As a high-throughput sequencing modality, RNA-seq offers single-base resolution, has relatively low background noise, simultaneously maps transcribed regions and gene expression levels, distinguishes between isoforms and allelic expression, requires only a small amount of RNA, and maps the transcriptomes of large genomes at relatively low cost [132].


The ACMG rules provide a means to use RNA data when obtained from blood. Specifically, RNA-seq might be used to establish the PVS1 evidence code, which applies to a “null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multi-exon deletion) in a gene where loss of function is a known mechanism of disease.” There are many examples of this in the literature for cancer predisposing genes; two in breast cancer predisposition genes involve PALB2 and BRCA1. In the first, a PALB2 synonymous variant, c.18G>T (p.Gly6=), identified in a family with pancreatic and breast cancers was evaluated. RT-PCR and subsequent cloning were performed to investigate whether this variant affects normal splicing. The variant was found to completely disrupt normal splicing and to produce several abnormal transcripts, presumably leading to premature protein truncation. In the second, a BRCA1 deletion affecting the +4 position of a splice donor site was identified in an individual with early-onset breast cancer. The effect of the BRCA1 c.5332+4delA variant on RNA splicing was evaluated by amplifying regions of BRCA1 from patient-derived cDNA, and the proportion of abnormal transcript among the total transcripts was quantified. LOH in tumor tissue was investigated using Sanger sequencing and fragment analysis. BRCA1 c.5332+4delA caused skipping of exon 21 in patient-derived samples. Semiquantitative analysis indicated that the aberrant RT-PCR product accounted for about 40% of total transcript levels, and LOH was observed in the patient’s tumor tissue. These results indicated that the BRCA1 c.5332+4delA variant contributes to cancer predisposition through disruption of normal mRNA splicing [82,133–135]. In a study reported by Ambry Genetics over a 2-year period, RNA testing resolved a substantial portion of VUS in a cohort of individuals previously tested for cancer predisposition by DNA genetic testing. RNA genetic testing clarified the interpretation of 49 of 56 inconclusive cases (88%) studied; 26 (47%) were reclassified as clinically actionable and 23 (41%) were clarified as benign.

Tayoun et al. have formalized the application of the PVS1 evidence code [131]. Tumor RNA-seq data should be considered in a similar way when a germ line VUS potentially leads to aberrant splicing. Analysis of RNA-seq data from tumors and blood might provide evidence for the application of codes such as PVS1 and should be considered by molecular tumor boards. For example, tumor-derived RNA sequencing data can be used to determine whether variants at canonical or noncanonical predicted splice sites produce abnormal isoforms. Variants occurring at splice sites or intron/exon junctions can disrupt splicing and lead to a truncated protein or nonsense-mediated decay. RNA-seq tumor data have been used in this way in pediatric integrative studies; e.g., tumor RNA-seq supported a likely functional germ line splicing disruption caused by a predicted splice variant in ATM, as the RNA-seq displayed marked loss of read counts in the exons 3′ of the splice site [9].
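
The loss of read counts downstream of a disrupted splice site, as described above for ATM, can be made concrete with a small sketch. The per-exon read counts, the function names, and the idea of normalizing the tumor ratio to a control sample are hypothetical choices made here for illustration; the cited study does not prescribe this calculation, and no threshold for calling a splicing defect is implied.

```python
def downstream_ratio(exon_counts, last_upstream_exon):
    """Mean read count of exons 3' of the splice site divided by the mean of the exons up to it."""
    upstream = exon_counts[: last_upstream_exon + 1]
    downstream = exon_counts[last_upstream_exon + 1 :]
    if not upstream or not downstream:
        raise ValueError("need at least one exon on each side of the splice site")
    upstream_mean = sum(upstream) / len(upstream)
    if upstream_mean == 0:
        raise ValueError("no reads upstream of the splice site")
    return (sum(downstream) / len(downstream)) / upstream_mean


def relative_downstream_coverage(tumor_counts, control_counts, last_upstream_exon):
    """Tumor 3'/5' coverage ratio normalized to the same ratio in a control sample."""
    control_ratio = downstream_ratio(control_counts, last_upstream_exon)
    if control_ratio == 0:
        raise ValueError("control sample has no reads downstream of the splice site")
    return downstream_ratio(tumor_counts, last_upstream_exon) / control_ratio


# Hypothetical per-exon read counts; the splice variant follows the third exon (index 2).
tumor = [100, 110, 90, 20, 25, 15]
control = [100, 100, 100, 95, 105, 100]
print(relative_downstream_coverage(tumor, control, last_upstream_exon=2))
```

A value well below 1, as in this invented example, would correspond to the kind of downstream coverage loss that supported a splicing defect in the text; whether such a drop is meaningful depends on expression level, tumor purity, and nonsense-mediated decay, none of which this sketch models.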

Tumor signatures

Somatic alterations or variants in cancer stem from multiple mutational processes, which may include intrinsic genomic instability, environmental exposures, viral integration, metabolic derangements, aberrant telomere maintenance, constitutive activation of oncogenes, and more [77,84,136–140]. It is now well established that different mutational signatures are associated with oncogenesis in a tissue-specific manner [86,140]. Examples of established mutational signatures linked to specific processes include Signature 1 with aging, Signature 4 with smoking exposure, Signature 18 with neuroblastoma, and Signature 24 with aflatoxin [86,141]. Two notable signatures provide evidence for an inherited predisposition to cancer: Signature 3 with BRCA1/2 germ line mutations and Signature 6 with mismatch repair deficiency (Table 9.5 and Fig. 9.6) [49,84]. Mutations in BRCA1/2 predispose to breast cancer, ovarian cancer, pancreatic cancer, and prostate cancer. Tumorigenesis in BRCA1/2-associated tumors is linked to biallelic inactivation and, more rarely, haploinsufficiency. Another characteristic of BRCA1/2-related tumors is deficient repair of double-strand breaks by homologous recombination, which is reflected by Signature 3; the absence of this signature should prompt consideration that the pathogenesis of the tumor is driven by an alternative mechanism [84]. This phenotypic difference can be used as supporting phenotypic evidence (PP4) when a germ line BRCA1/2 VUS is found, tumor signature data are available, and the variant is accompanied by a second hit in the tumor. An important consideration when using a tumor signature as a somatic phenotypic characteristic is the molecular assay performed, as whole-genome sequencing reveals thousands of somatic single-nucleotide variants in common cancers that are not revealed by whole-exome sequencing [86].

Germ line risk and variant pathogenicity informed from tumor signatures

Breast cancer CHEK2 variants

Pathogenic germ line variants in checkpoint kinase 2 (CHEK2), a gene involved in the DNA damage response and cell cycle regulation, confer an increased risk for breast cancer. Characterization of breast cancers from 33 germ line CHEK2 mutation carriers has revealed phenotypic differences between patients with high-risk alleles and those with the low-risk allele (p.Ile157Thr). Tumors from individuals with the higher-risk germ line variants (which still confer moderate risk) were enriched for hormone receptor-positive disease and exhibited LOH of the CHEK2 wild-type allele, whereas CHEK2-associated breast cancers from carriers of the low-risk p.Ile157Thr variant displayed less frequent LOH and higher levels of protein expression. CHEK2-associated breast cancers in this study did not frequently exhibit a dominant Signature 3, a genomic feature of homologous recombination DNA repair deficiency (HRD). These findings mirror those of a similar analysis of ATM-associated breast cancers [88,90]. Overall, ATM-associated breast cancers often harbor biallelic inactivation of ATM, are phenotypically distinct from BRCA1/2-associated breast cancers, lack homologous recombination deficiency and HRD-related signatures, and carry TP53 and ATM genetic alterations that are likely epistatic (Table 9.5) [90].

Table 9.5 Tumor signature characteristics of germ line mutation carriers.

| Breast cancer | LOH | Signature | Hormone status | Other |
|---|---|---|---|---|
| BRCA1/2 | | 3 | | Reduced DNA methylation, more ancestral divisions, and elevated rates of structural variation that tend to disrupt highly expressed protein-coding genes and known tumor suppressors |
| CHEK2 (high-risk allele) | Yes | No HRD signature | Positive | CHEK2 I157T less evidence as driver |
| ATM | Yes | No HRD signature | Positive | Does not co-occur with somatic TP53 mutations |
| MMR genes | Yes | 6 | n/a | Tumors typically with microsatellite instability, and immunohistochemistry absent |

HRD, homologous recombination deficiency; LOH, loss of heterozygosity; MMR, mismatch repair.

FIGURE 9.6 Tumor signatures provide phenotypic data for germ line predisposition. Signature 3 and Signature 6 are associated with BRCA1/2 predisposition and mismatch repair (MMR) deficiency predisposition, respectively (Image credit: Max Levine and Elli Papaemmanuil).

Other considerations for integrating germ line and somatic data

Biomarker considerations (immunohistochemistry and hormone status)

Hormone status and immunohistochemistry are frequently assessed as predictors in both breast and colon cancer and aid in the interpretation of variants [142–145]. Some biomarkers sway the evidence toward a germ line predisposition, whereas others point toward sporadic tumors. BRAF-mutated, MLH1-hypermethylated colorectal tumors are typically not associated with Lynch syndrome, and these features can serve in most instances as an indicator of non-Lynch tumors. In endometrial cancer, by contrast, MLH1 hypermethylation, but not BRAF mutation, can serve as a negative predictor of Lynch syndrome based on current evidence [148]. MLH1 hypermethylation alone may also serve as a negative predictor for Lynch-associated tumors; however, there are some examples of MLH1 epimutations detected in peripheral blood that represent a subtype of cancer predisposition [146,147]. Microsatellite instability in the absence of hypermethylation in colorectal and endometrial cancer is a strong predictor of Lynch syndrome. HER2+ breast cancers are enriched in Li–Fraumeni syndrome. Knowledge of these associations can be considered PP4 supporting evidence as a phenotypic characteristic of the tumor [85,142,145]. Furthermore, in breast cancer, age at diagnosis and grade are more informative than ER status for predicting BRCA2 mutation carrier status. These estimates will improve BRCA1 and BRCA2 variant classification and inform patient mutation testing and clinical management. Additional phenotypic evidence needing exploration includes specific somatic mutations that occur in specific contexts and could serve as biomarkers to aid in classifying germ line variants. For example, it is recognized that specific FGFR3 somatic mutations are seen more often in urothelial tumors from Lynch syndrome patients than in sporadic urothelial tumors [149].

Determining pathogenicity of alleles in genes with recessive and dominant phenotypes integrating population, somatic, and germ line data

The explosion of tumor sequencing provides a means to reconsider prior classifications of germ line variants considered pathogenic and to better define gene-to-cancer predisposition associations. Fumarate hydratase (FH) mutations underpin the autosomal recessive syndrome FH deficiency and the autosomal dominant syndrome hereditary leiomyomatosis and renal cell carcinoma (HLRCC). The FH c.1431_1433dupAAA (p.Lys477dup) alteration has been conclusively shown to contribute to FH deficiency when it occurs with another FH germ line alteration. However, a sufficiently large dataset has been lacking to determine conclusively its clinical significance for HLRCC cancer predisposition in the heterozygous state. We reviewed a series of 7571 patients with cancer who received germ line results through MSK-IMPACT testing at Memorial Sloan Kettering (MSK). The FH c.1431_1433dupAAA (p.Lys477dup) variant was detected in 24 individuals, none of whom was affected with renal cancer. In contrast, 11 of the 372 patients with renal cancer carried other pathogenic FH variants that are associated with HLRCC [135].

Recognizing clonal evolution and specific somatic mutations in the context of predisposition

Leukemia predisposition genes

As additional somatic data are gathered in tandem with normal control samples for genes predisposing to bone marrow failure (TERT, TERC, NOP10, DKC1, WRAP53, RTEL1) and leukemia (RUNX1, CEBPA), the possibility of classifying VUS will increase. Assessment of the somatic landscape may reveal profiles consistent with somatic versus heritable leukemia, and data are beginning to be considered in this way [150,151].


Identifying candidate predisposition genes

Tumor-normal sequencing enables the identification of cancer predisposing genes. As a first step in identifying associations arising from mutations in cancer predisposition genes, uncertain variants detected in these genes can be considered for pathogenicity. Two examples in which integrative somatic and germ line analysis has led to a broader understanding of the clinical syndromes associated with cancer predisposition genes are germ line TP53 mutations with hypodiploid ALL and germ line SDHA mutations with neuroblastoma [27,77]. Integrative studies in breast cancer, ovarian cancer, and prostate cancer have revealed biallelic mutations in several known genes and in some genes that now require further investigation. As the spectrum of associations for predisposition genes becomes better defined, germ line VUS in these genes will be more likely to be classified through integration of somatic and germ line data. In one integrative analysis of triple-negative breast cancers, FANCM was raised as a candidate gene. In ovarian cancer, a study of germ line–somatic interaction combined with extensive bioinformatic annotation revealed over 200 candidate functional germ line truncations and missense variants, and investigators identified significantly altered pathways, including the Fanconi, MAPK, and MLL pathways. Finally, integrative analysis in prostate cancer links susceptibility to oncogenesis and establishes functional interplay between somatic and germ line variation. While molecular networks including the PDGF, P53, MYC, IGF-1, PTEN, and AR signaling pathways appear to be enriched for germ line and somatic mutations, the evidence linking these genes as cancer predisposition alleles for this specific cancer type has been insufficient and requires further proof [75,152].

References

[1] Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science 2001;291:1304–51.
[2] Bartha I, di Iulio J, Venter JC, Telenti A. Human gene essentiality. Nat Rev Genet 2018;19:51–62.
[3] LaDuca H, Polley EC, Yussuf A, et al. A clinical guide to hereditary cancer panel testing: evaluation of gene-specific cancer associations and sensitivity of genetic testing criteria in a cohort of 165,000 high-risk patients. Genet Med 2020;22:407–15.
[4] Kauff ND, Satagopan JM, Robson ME, et al. Risk-reducing salpingo-oophorectomy in women with a BRCA1 or BRCA2 mutation. N Engl J Med 2002;346:1609–15.
[5] Robson M, Im SA, Senkus E, et al. Olaparib for metastatic breast cancer in patients with a germline BRCA mutation. N Engl J Med 2017;377:523–33.
[6] Cancer Genome Atlas Research Network, Ley TJ, Miller C, et al. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. N Engl J Med 2013;368:2059–74.
[7] Mandelker D, Zhang L, Kemel Y, et al. Mutation detection in patients with advanced cancer by universal sequencing of cancer-related genes in tumor and normal DNA vs guideline-based germline testing. J Am Med Assoc 2017;318:825–35.
[8] Zehir A, Benayed R, Shah RH, et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat Med 2017;23:703–13.
[9] Zhang J, Walsh MF, Wu G, et al. Germline mutations in predisposition genes in pediatric cancer. N Engl J Med 2015;373:2336–46.
[10] Chakravarty D, Gao J, Phillips SM, et al. OncoKB: a precision oncology knowledge base. JCO Precis Oncol 2017;2017.


[11] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–24.
[12] Walsh MF, Ritter DI, Kesserwan C, et al. Integrating somatic variant data and biomarkers for germline variant classification in cancer predisposition genes. Hum Mutat 2018;39:1542–52.
[13] Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008;455:1061–8.
[14] Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 2011;474:609–15.
[15] Cancer Genome Atlas Network. Comprehensive molecular characterization of human colon and rectal cancer. Nature 2012;487:330–7.
[16] Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 2012;490:61–70.
[17] Cancer Genome Atlas Research Network. Comprehensive genomic characterization of squamous cell lung cancers. Nature 2012;489:519–25.
[18] Cheung NK, Zhang J, Lu C, et al. Association of age at diagnosis and genetic mutations in patients with neuroblastoma. J Am Med Assoc 2012;307:1062–71.
[19] Gruber TA, Larson Gedman A, Zhang J, et al. An Inv(16)(p13.3q24.3)-encoded CBFA2T3-GLIS2 fusion protein defines an aggressive subtype of pediatric acute megakaryoblastic leukemia. Cancer Cell 2012;22:683–97.
[20] Robinson G, Parker M, Kranenburg TA, et al. Novel mutations target distinct subgroups of medulloblastoma. Nature 2012;488:43–8.
[21] Wu G, Broniscer A, McEachron TA, et al. Somatic histone H3 alterations in pediatric diffuse intrinsic pontine gliomas and non-brainstem glioblastomas. Nat Genet 2012;44:251–3.
[22] Zhang J, Benavente CA, McEvoy J, et al. A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 2012;481:329–34.
[23] Zhang J, Ding L, Holmfeldt L, et al. The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 2012;481:157–63.
[24] Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 2013;499:43–9.
[25] Cancer Genome Atlas Research Network, Kandoth C, Schultz N, et al. Integrated genomic characterization of endometrial carcinoma. Nature 2013;497:67–73.
[26] Chen X, Stewart E, Shelat AA, et al. Targeting oxidative stress in embryonal rhabdomyosarcoma. Cancer Cell 2013;24:710–24.
[27] Holmfeldt L, Wei L, Diaz-Flores E, et al. The genomic landscape of hypodiploid acute lymphoblastic leukemia. Nat Genet 2013;45:242–52.
[28] Zhang J, Wu G, Miller CP, et al. Whole-genome sequencing identifies genetic alterations in pediatric low-grade gliomas. Nat Genet 2013;45:602–12.
[29] Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 2014;507:315–22.
[30] Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 2014;511:543–50.
[31] Cancer Genome Atlas Research Network. Comprehensive molecular characterization of gastric adenocarcinoma. Nature 2014;513:202–9.
[32] Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 2014;159:676–90.


[33] Chen X, Bahrami A, Pappo A, et al. Recurrent somatic structural variations contribute to tumorigenesis in pediatric osteosarcoma. Cell Rep 2014;7:104–12.
[34] Davis CF, Ricketts CJ, Wang M, et al. The somatic genomic landscape of chromophobe renal cell carcinoma. Cancer Cell 2014;26:319–30.
[35] Hoadley KA, Yau C, Wolf DM, et al. Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 2014;158:929–44.
[36] Parker M, Mohankumar KM, Punchihewa C, et al. C11orf95-RELA fusions drive oncogenic NF-kappaB signalling in ependymoma. Nature 2014;506:451–5.
[37] Roberts KG, Li Y, Payne-Turner D, et al. Targetable kinase-activating lesions in Ph-like acute lymphoblastic leukemia. N Engl J Med 2014;371:1005–15.
[38] Tirode F, Surdez D, Ma X, et al. Genomic landscape of Ewing sarcoma defines an aggressive subtype with co-association of STAG2 and TP53 mutations. Cancer Discov 2014;4:1342–53.
[39] Wu G, Diaz AK, Paugh BS, et al. The genomic landscape of diffuse intrinsic pontine glioma and pediatric non-brainstem high-grade glioma. Nat Genet 2014;46:444–50.
[40] Andersson AK, Ma J, Wang J, et al. The landscape of somatic mutations in infant MLL-rearranged acute lymphoblastic leukemias. Nat Genet 2015;47:330–7.
[41] Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 2015;517:576–82.
[42] Cancer Genome Atlas Network. Genomic classification of cutaneous melanoma. Cell 2015;161:1681–96.
[43] Cancer Genome Atlas Research Network, Brat DJ, Verhaak RG, et al. Comprehensive, integrative genomic analysis of diffuse lower-grade gliomas. N Engl J Med 2015;372:2481–98.
[44] Chen X, Gupta P, Wang J, et al. CONSERTING: integrating copy-number analysis with structural-variation detection. Nat Methods 2015;12:527–30.
[45] Ciriello G, Gatza ML, Beck AH, et al. Comprehensive molecular portraits of invasive lobular breast cancer. Cell 2015;163:506–19.
[46] Lu C, Zhang J, Nagahawatte P, et al. The genomic landscape of childhood and adolescent melanoma. J Invest Dermatol 2015;135:816–23.
[47] Ma X, Edmonson M, Yergeau D, et al. Rise and fall of subclones from diagnosis to relapse in pediatric B-acute lymphoblastic leukaemia. Nat Commun 2015;6:6604.
[48] Pinto EM, Chen X, Easton J, et al. Genomic landscape of paediatric adrenocortical tumours. Nat Commun 2015;6:6302.
[49] Campbell JD, Alexandrov A, Kim J, et al. Distinct patterns of somatic genome alterations in lung adenocarcinomas and squamous cell carcinomas. Nat Genet 2016;48:607–16.
[50] Cancer Genome Atlas Research Network, Linehan WM, Spellman PT, et al. Comprehensive molecular characterization of papillary renal-cell carcinoma. N Engl J Med 2016;374:135–45.
[51] Faber ZJ, Chen X, Gedman AL, et al. The genomic landscape of core-binding factor acute myeloid leukemias. Nat Genet 2016;48:1551–6.
[52] Qaddoumi I, Orisme W, Wen J, et al. Genetic alterations in uncommon low-grade neuroepithelial tumors: BRAF, FGFR1, and MYB mutations occur at high frequency and align with morphology. Acta Neuropathol 2016;131:833–45.
[53] Zhang J, McCastlain K, Yoshihara H, et al. Deregulation of DUX4 and ERG in acute lymphoblastic leukemia. Nat Genet 2016;48:1481–9.
[54] Zheng S, Cherniack AD, Dewal N, et al. Comprehensive pan-genomic characterization of adrenocortical carcinoma. Cancer Cell 2016;29:723–36.
[55] Aldiri I, Xu B, Wang L, et al. The dynamic epigenetic landscape of the retina during development, reprogramming, and tumorigenesis. Neuron 2017;94:550–68.e10.


[56] Cancer Genome Atlas Research Network, Albert Einstein College of Medicine, Analytical Biological S, et al. Integrated genomic and molecular characterization of cervical cancer. Nature 2017;543:378–84.
[57] Cancer Genome Atlas Research Network, Analysis Working Group, Asan U, Agency BCC, et al. Integrated genomic characterization of oesophageal carcinoma. Nature 2017;541:169–75.
[58] Cancer Genome Atlas Research Network. Integrated genomic characterization of pancreatic ductal adenocarcinoma. Cancer Cell 2017;32:185–203.e13.
[59] Cancer Genome Atlas Research Network. Comprehensive and integrated genomic characterization of adult soft tissue sarcomas. Cell 2017;171:950–65.e28.
[60] Cancer Genome Atlas Research Network. Comprehensive and integrative genomic characterization of hepatocellular carcinoma. Cell 2017;169:1327–41.e23.
[61] Cherniack AD, Shen H, Walter V, et al. Integrated molecular characterization of uterine carcinosarcoma. Cancer Cell 2017;31:411–23.
[62] Evans DGR, Salvador H, Chang VY, et al. Cancer and central nervous system tumor surveillance in pediatric neurofibromatosis 2 and related disorders. Clin Cancer Res 2017;23:e54–61.
[63] Farshidfar F, Zheng S, Gingras MC, et al. Integrative genomic analysis of cholangiocarcinoma identifies distinct IDH-mutant molecular profiles. Cell Rep 2017;19:2878–80.
[64] Fishbein L, Leshchiner I, Walter V, et al. Comprehensive molecular characterization of pheochromocytoma and paraganglioma. Cancer Cell 2017;31:181–93.
[65] Hmeljak J, Sanchez-Vega F, Hoadley KA, et al. Integrative molecular characterization of malignant pleural mesothelioma. Cancer Discov 2018;8:1548–65.
[66] Kehrer-Sawatzki H, Kluwe L, Friedrich RE, et al. Phenotypic and genotypic overlap between mosaic NF2 and schwannomatosis in patients with multiple non-intradermal schwannomas. Hum Genet 2018;137:543–52.
[67] Radovich M, Pickering CR, Felau I, et al. The integrated genomic landscape of thymic epithelial tumors. Cancer Cell 2018;33:244–58.e10.
[68] Robertson AG, Kim J, Al-Ahmadie H, et al. Comprehensive molecular characterization of muscle-invasive bladder cancer. Cell 2018;174:1033.
[69] Robertson AG, Shih J, Yau C, et al. Integrative analysis identifies four molecular and clinical subsets in uveal melanoma. Cancer Cell 2018;33:151.
[70] Shen H, Shih J, Hollern DP, et al. Integrated molecular characterization of testicular germ cell tumors. Cell Rep 2018;23:3392–406.
[71] Stewart E, McEvoy J, Wang H, et al. Identification of therapeutic targets in rhabdomyosarcoma through integrated genomic, epigenomic, and proteomic analyses. Cancer Cell 2018;34:411–26.e19.
[72] Cheng DT, Mitchell TN, Zehir A, et al. Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J Mol Diagn 2015;17:251–64.
[73] Pritchard CC, Mateo J, Walsh MF, et al. Inherited DNA-repair gene mutations in men with metastatic prostate cancer. N Engl J Med 2016;375:443–53.
[74] Schrader KA, Cheng DT, Joseph V, et al. Germline variants in targeted tumor sequencing using matched normal DNA. JAMA Oncol 2016;2:104–11.
[75] Abida W, Armenia J, Gopalan A, et al. Prospective genomic profiling of prostate cancer across disease states reveals germline and somatic alterations that may affect clinical decision making. JCO Precis Oncol 2017;2017.
[76] Chang TC, Carter RA, Li Y, et al. The neoepitope landscape in pediatric cancers. Genome Med 2017;9:78.
[77] Dubard Gault M, Mandelker D, DeLair D, et al. Germline SDHA mutations in children and adults with cancer. Cold Spring Harb Mol Case Stud 2018;4.


[78] Lowery MA, Ptashkin R, Jordan E, et al. Comprehensive molecular profiling of intrahepatic and extrahepatic cholangiocarcinomas: potential targets for intervention. Clin Cancer Res 2018;24:4154–61.
[79] Mody RJ, Wu YM, Lonigro RJ, et al. Integrative clinical sequencing in the management of refractory or relapsed cancer in youth. J Am Med Assoc 2015;314:913–25.
[80] Parsons DW, Roy A, Yang Y, et al. Diagnostic yield of clinical tumor and germline whole-exome sequencing for children with solid tumors. JAMA Oncol 2016;2:616–24.
[81] Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer. Nature 2013;500:415–21.
[82] Yang C, Arnold AG, Trottier M, et al. Characterization of a novel germline PALB2 duplication in a hereditary breast and ovarian cancer family. Breast Cancer Res Treat 2016;160:447–56.
[83] Yang C, Jairam S, Amoroso KA, Robson ME, Walsh MF, Zhang L. Characterization of a novel germline BRCA1 splice variant, c.5332+4delA. Breast Cancer Res Treat 2018;168:543–50.
[84] Jonsson P, Bandlamudi C, Cheng ML, et al. Tumour lineage shapes BRCA-mediated phenotypes. Nature 2019;571:576–9.
[85] Latham A, Srinivasan P, Kemel Y, et al. Microsatellite instability is associated with the presence of Lynch syndrome pan-cancer. J Clin Oncol 2019;37:286–95.
[86] Alexandrov LB, Kim J, Haradhvala NJ, et al. The repertoire of mutational signatures in human cancer. Nature 2020;578:94–101.
[87] Hofstra RM, Spurdle AB, Eccles D, et al. Tumor characteristics as an analytic tool for classifying genetic variants of uncertain clinical significance. Hum Mutat 2008;29:1292–303.
[88] Mandelker D, Kumar R, Pei X, et al. The landscape of somatic genetic alterations in breast cancers from CHEK2 germline mutation carriers. JNCI Cancer Spectr 2019;3:pkz027.
[89] Ashley CW, Da Cruz Paula A, Kumar R, et al. Analysis of mutational signatures in primary and metastatic endometrial cancer reveals distinct patterns of DNA repair defects and shifts during tumor progression. Gynecol Oncol 2019;152:11–9.
[90] Weigelt B, Bi R, Kumar R, et al. The landscape of somatic genetic alterations in breast cancers from ATM germline mutation carriers. J Natl Cancer Inst 2018;110:1030–4.
[91] Rusch M, Nakitandwe J, Shurtleff S, et al. Clinical cancer genomic profiling by three-platform sequencing of whole genome, whole exome and transcriptome. Nat Commun 2018;9:3962.
[92] Li J, Dai H, Feng Y, et al. A comprehensive strategy for accurate mutation detection of the highly homologous PMS2. J Mol Diagn 2015;17:545–53.
[93] Au KF, Underwood JG, Lee L, Wong WH. Improving PacBio long read accuracy by short read alignment. PLoS One 2012;7:e46679.
[94] Chang MT, Bhattarai TS, Schram AM, et al. Accelerating discovery of functional mutant alleles in cancer. Cancer Discov 2018;8:174–83.
[95] Chang MT, Asthana S, Gao SP, et al. Identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. Nat Biotechnol 2016;34:155–63.
[96] ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 2020;578:82–93.
[97] Riggs ER, Andersen EF, Cherry AM, et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet Med 2020;22:245–57.
[98] Bamford S, Dawson E, Forbes S, et al. The COSMIC (catalogue of somatic mutations in cancer) database and website. Br J Cancer 2004;91:355–8.
[99] Forbes S, Clements J, Dawson E, et al. COSMIC 2005. Br J Cancer 2006;94:318–22.
[100] Forbes SA, Beare D, Bindal N, et al. COSMIC: high-resolution cancer genetics using the catalogue of somatic mutations in cancer. Curr Protoc Hum Genet 2016;91:10.11.1–10.11.37.







CHAPTER 10

Pharmacogenetics and personalized medicine

Rocío Núñez-Torres, Anna González-Neira

Human Genotyping Unit-Spanish National Genotyping Centre (CEGEN), Human Cancer Genetics Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain

Introduction to pharmacogenetics and personalized medicine

Patients vary in their response to drugs: doses effective in some patients will inevitably be ineffective or cause adverse drug reactions (ADRs) in others. The two major concerns about drug treatment are therefore lack of response and the development of toxicities. Interindividual differences in drug pharmacokinetics and pharmacodynamics can cause undesirable consequences for patients. In Europe, almost 4% of hospital admissions were due to ADRs and 10% of patients developed side effects during their in-patient stay [1]. A recent report suggests that the annual cost of preventable adverse events to the National Health Service is likely to exceed £1 billion and could be as high as £2.5 billion [2]. Genetic factors can account for 20%–95% of the variability in drug disposition and effects [3]. In particular, variation in genes that encode transporters and enzymes involved in drug absorption, distribution, metabolism, and excretion (ADME genes) can explain a large part of these interindividual differences in drug action.

The term “pharmacogenetics” (PGx) was coined in the 1950s and captures the idea that DNA variants contribute significantly to the drug response of an individual; it is mainly used in relation to ADME or drug target genes. More recently, the term “pharmacogenomics” was introduced as a wider term. Although the two terms are often used interchangeably, pharmacogenomics involves the study of genes on all chromosomes, whereas PGx is the study of specific variants of genes with known functions that are probably connected to drug response.
Once the pharmacovariants that influence drug response are identified, they can be used to stratify patients into subgroups to “give the right treatment to the right patient, at the right dose and the right time.” PGx has therefore been widely recognized as a fundamental step toward “personalized medicine,” also known as “precision medicine.” Many advances in this field have been made since the late 1950s, when PGx became a recognized science and genetic variations in some enzymes (e.g., N-acetyltransferase, G6PD) were discovered. Early studies investigated the pharmacological consequences of single gene variations [4]. However, it became apparent that most differences in drug response are caused by the altered function of numerous genes and by environmental factors, often interacting [5]. With the increased number of studies during the last 10 years, many pharmacogenomic biomarkers have been discovered and implemented in the clinic. These studies uncovered a vast collection of genetic variants with high impact on drug metabolism, some of which have been effectively translated into clinical practice. There are currently 431 Food and Drug Administration (FDA) drug labels that refer to pharmacogenomic biomarkers of drug safety or efficacy, including not only germ line variants but also somatic gene variants, functional deficiencies, gene expression differences, and chromosomal abnormalities (https://www.fda.gov/drugs/science-and-research-drugs/table-pharmacogenomic-biomarkers-drug-labeling). Moreover, a large number of specific guidelines on how to adjust medications on the basis of PGx tests have been published by the Clinical Pharmacogenetics Implementation Consortium (CPIC) with the objective of facilitating the translation of PGx knowledge from bench to bedside. It is also very important to standardize the way PGx variants are described and reported. To this end, the International Nomenclature Workgroup, an international workgroup formed to increase the transparency and standardization of PGx result reporting, provides recommendations for stakeholders, including the clinical laboratories and researchers who generate and report the results of PGx testing [6].

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00010-7 Copyright © 2021 Elsevier Inc. All rights reserved.

Variant nomenclature in pharmacogenetics

Star allele nomenclature

Star allele nomenclature was proposed in 1996 as a systematic way to catalog allelic variants of cytochrome P450 2D6 (CYP2D6) [7]. This nomenclature was maintained and updated in the Human Cytochrome P450 (CYP) Allele Nomenclature Database [8]. Recently, this database was moved to the Pharmacogene Variation (PharmVar) Consortium [9], which covers more than 20 CYP genes and other PGx genes.

Star allele nomenclature is usually based on a common sequence defined as the reference (or wild type) and designated *1 (e.g., CYP2D6*1 or TPMT*1). This definition is usually based on the subpopulation in which the gene was initially studied, so *1 is not necessarily the most common allele in all populations. In some cases, *1 is not the reference allele; for example, NAT2*4 is the reference allele for the NAT2 gene, as it is the most common functional allele across human populations [6] (see Box 10.1 for information about the reference sequences used). All subsequent allelic variants are compared to the *1 reference and assigned their own star (*) allele designation. Each star (*) allele is therefore defined by the presence of one or more specific sequence variations relative to the *1 allele. In this context, the terms haplotype, allele, and allelic variation are often used interchangeably. New star (*) alleles are currently designated only for variations with functional consequences, such as amino acid substitutions, translation terminations, splice defects, or differential transcription rates [10]. When a haplotype also carries genetic variants with noncausative effects, it is designated a suballele and receives a letter after the star allele number (e.g., CYP2B6*4A, CYP2B6*4B). Therefore, all alleles with the same star allele number share the defective variant but may differ in less severe variants [10]. Distinguishing suballeles is usually not necessary for phenotype prediction, as all alleles under a star number are believed to be functionally equivalent.

Patient genotypes are usually reported as diplotypes, which define the inherited paternal and maternal alleles (e.g., *1/*2). Combining genetic variants into diplotypes facilitates their association with predicted phenotypes. Diplotypes are typically assigned from the genotypes of the variants tested, and a default assignment may be applied depending on whether these variants are detected. Thus, *1 alleles are assigned when none of the variants studied is detected, even though other, untested dysfunctional alleles may be present. The probability that a *1 designation is correct increases with the number of relevant alleles tested [6].

BOX 10.1 PHARMVAR REFERENCE SEQUENCES [9]
Most star alleles described are based on publicly available reference sequences (GRCh37-NC_000022.10 and GRCh38-NC_000022.11), but there are some exceptions, such as M33388 for CYP2D6. In these cases, the genome reference used is defined. PharmVar provides variant coordinates using two different numbering conventions:
- Sequence Start numbering counts from the start of the sequence, with the first nucleotide of the sequence being position 1 and incrementing sequentially to the end of the sequence.
- ATG Start numbering uses the A in the start codon of the gene as position 1 and increments sequentially in the 3′ direction. Positions before the ATG start are listed as negative numbers. In this scheme, there is no position 0; the base immediately preceding the ATG start is numbered −1. ATG Start numbering is provided for locus sequences only (i.e., not for chromosomal sequences).
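The two numbering conventions described in Box 10.1 differ only by an offset and by the absence of a position 0 in ATG Start numbering. The following Python sketch illustrates the conversion under that description; the function names and the example coordinate are hypothetical and are not part of PharmVar.

```python
def sequence_to_atg_numbering(seq_pos: int, atg_seq_pos: int) -> int:
    """Convert a Sequence Start position to ATG Start numbering.

    seq_pos     -- 1-based position counted from the first nucleotide of the sequence
    atg_seq_pos -- 1-based Sequence Start position of the A in the ATG start codon
    """
    if seq_pos < 1 or atg_seq_pos < 1:
        raise ValueError("Sequence Start positions are 1-based")
    offset = seq_pos - atg_seq_pos
    # At or downstream of the A: the A itself is position 1, so shift by one.
    # Upstream: the base just before the A is -1, so 0 is never produced.
    return offset + 1 if offset >= 0 else offset


def atg_to_sequence_numbering(atg_pos: int, atg_seq_pos: int) -> int:
    """Convert an ATG Start position back to Sequence Start numbering."""
    if atg_pos == 0:
        raise ValueError("ATG Start numbering has no position 0")
    if atg_pos > 0:
        return atg_seq_pos + atg_pos - 1
    return atg_seq_pos + atg_pos


# Hypothetical locus sequence in which the A of the ATG is nucleotide 1001.
ATG_AT = 1001
for pos in (1000, 1001, 1002):
    print(pos, "->", sequence_to_atg_numbering(pos, ATG_AT))
```

Skipping zero is the only subtlety: a naive subtraction would label the base upstream of the start codon as 0 rather than −1.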

HLA nomenclature

The nomenclature of HLA alleles differs significantly from that of pharmacogenes classified by star alleles, such as the CYPs, mainly because of the high number of alleles detected (around 20,000 class I alleles and more than 5000 class II alleles). HLA allele names include up to four sets of digits separated by colons (e.g., HLA-A*02:101:01:02N). The first set of digits defines the “type,” which often corresponds to the serological antigen carried by an allotype. The second set defines the “subtype”; these numbers are assigned in the order in which the DNA sequences were determined. Together, the first and second sets of digits describe every HLA allele carrying a nucleotide polymorphism that changes the amino acid sequence of the protein. The third set distinguishes alleles that differ only by synonymous nucleotide substitutions (also called silent or noncoding substitutions) within the coding sequence. The fourth set distinguishes alleles that differ only by sequence polymorphisms in the introns or in the 5′ or 3′ untranslated regions that flank the exons and introns [11]. Additionally, an allele name may include a letter suffix that describes the expression of the protein. If no suffix is given, the protein is assumed to be expressed normally. The following suffixes are used: “N” for “Null” alleles; “L” for “Low” cell surface expression; “S” for “Secreted” molecules, which are soluble proteins not present on the cell surface; “C” for proteins present in the “Cytoplasm” and not on the cell surface; “A” for “Aberrant” expression, where there is some doubt as to whether a protein is actually expressed; and “Q” for alleles with “Questionable” expression, given that the mutation seen in the allele has been shown to affect normal expression levels in other alleles [11].
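The field structure described above can be illustrated with a small parser. This is a minimal Python sketch that assumes names are written in the form HLA-gene*field1:field2:field3:field4 with an optional expression suffix; the class, field, and function names are illustrative only and do not come from any HLA database software.

```python
import re
from typing import NamedTuple, Optional

# Up to four colon-separated digit fields, then an optional expression suffix.
_HLA_PATTERN = re.compile(
    r"^HLA-(?P<gene>[A-Z0-9]+)\*"
    r"(?P<f1>\d+)"
    r"(?::(?P<f2>\d+))?"
    r"(?::(?P<f3>\d+))?"
    r"(?::(?P<f4>\d+))?"
    r"(?P<suffix>[NLSCAQ])?$"
)

SUFFIX_MEANING = {
    "N": "Null",
    "L": "Low",
    "S": "Secreted",
    "C": "Cytoplasm",
    "A": "Aberrant",
    "Q": "Questionable",
}


class HLAAllele(NamedTuple):
    gene: str
    allele_type: str            # first field: "type"
    subtype: Optional[str]      # second field: protein-level subtype
    synonymous: Optional[str]   # third field: synonymous coding differences
    noncoding: Optional[str]    # fourth field: intron/UTR differences
    expression: str             # "Normal" when no suffix is present


def parse_hla_allele(name: str) -> HLAAllele:
    match = _HLA_PATTERN.match(name.strip())
    if match is None:
        raise ValueError(f"Not a recognizable HLA allele name: {name!r}")
    return HLAAllele(
        gene=match["gene"],
        allele_type=match["f1"],
        subtype=match["f2"],
        synonymous=match["f3"],
        noncoding=match["f4"],
        expression=SUFFIX_MEANING.get(match["suffix"], "Normal"),
    )


print(parse_hla_allele("HLA-A*02:101:01:02N"))
print(parse_hla_allele("HLA-B*57:01"))
```

Keeping the fields as strings preserves leading zeros, which are part of the allele name.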

Other pharmacogenetic nomenclatures

Not all PGx genes use star allele nomenclature. In these cases, alleles are usually named using the Human Genome Variation Society (HGVS) nomenclature or rsIDs. The use of these conventions instead of star allele nomenclature may be due to the low frequency and high number of the variants described, or to the large size of the gene, which makes haplotype designation unreliable (e.g., DPYD). Nevertheless, star allele nomenclature is still used for some allelic variants of these genes, mainly the more relevant variants.


Technologies for pharmacogenetic testing

Given the long list of pharmacogenetic variants already established as clinically relevant, large-scale genotyping seems to be a good strategy to analyze them rapidly, accurately, and cost-effectively. In the last 15 years, genotyping technology has developed from single-assay methods based on restriction fragment length polymorphism analysis, single-base extension with fluorescence detection, and homogeneous solution hybridization (such as TaqMan and molecular beacon genotyping) to multiplex genotyping methods such as mass spectrometry and DNA chip-based microarray technologies. Many laboratories conducting PGx testing use targeted genotyping technologies to screen for well-characterized pharmacovariants, mainly single nucleotide variants (SNVs). However, recent studies suggest that a considerable number of novel rare variants in pharmacogenes likely contribute to the still unexplained fraction of the observed interindividual variability. A study of genetic variation in phase I and II metabolic enzymes, drug transporters, and nuclear receptors that combined SNV data from the Exome Sequencing Project (6503 individuals) and the 1000 Genomes Project (1092 individuals) indicates that the great majority of all variants in coding regions are rare (93%; minor allele frequency (MAF) < 1%) or very rare (83%; MAF […]

Table 10.1 Examples of the main pharmacogenetic drug-gene pairs studied in clinical practice and therapy recommendations based on their phenotype.

| Drug | Gene | Phenotype | Alleles | Clinical recommendation |
| --- | --- | --- | --- | --- |
| Thiopurines | TPMT | Normal metabolizer | *1/*1 | Normal starting dose |
| Thiopurines | TPMT | Intermediate metabolizer | *1/*2, *1/*3A, *1/*3B, *1/*3C, *1/*4 | Reduction of the starting dose (30%–80%) |
| Thiopurines | TPMT | Poor metabolizer | *3A/*3A, *2/*3A, *3A/*3C, *3C/*4, *2/*3C, *3A/*4 | Reduce the starting dose 10-fold and reduce frequency to thrice weekly instead of daily. For nonmalignant conditions, consider alternative nonthiopurine immunosuppressant therapy |
| Thiopurines | NUDT15 | Normal metabolizer | *1/*1 | Normal starting dose |
| Thiopurines | NUDT15 | Intermediate metabolizer | *1/*2, *1/*3 | Reduction of the starting dose (30%–80%) |
| Thiopurines | NUDT15 | Poor metabolizer | *2/*2, *2/*3, *3/*3 | Reduction of the starting dose at 10 mg/m²/day. For nonmalignant conditions, consider alternative nonthiopurine immunosuppressant therapy |
| Codeine | CYP2D6 | Ultrarapid metabolizer | Activity score > 2.0; *1/*1xN, *1/*2xN | Avoid codeine use due to potential toxicity |
| Codeine | CYP2D6 | Extensive metabolizer | Activity score = 1.0–2.0; *1/*1, *1/*2, *2/*2, *1/*41, *1/*4, *2/*5, *1/*10 | Use label-recommended age- or weight-specific dosing |
| Codeine | CYP2D6 | Intermediate metabolizer | Activity score = 0.5; *4/*10, *5/*41 | Use label-recommended age- or weight-specific dosing. If no response, consider alternative analgesics |
| Codeine | CYP2D6 | Poor metabolizer | Activity score = 0; *4/*4, *4/*5, *5/*5, *4/*6 | Avoid codeine use due to lack of efficacy |
| Abacavir | HLA-B | Very low risk of hypersensitivity | *X/*Xᵃ | Use abacavir per standard dosing guidelines |
| Abacavir | HLA-B | High risk of hypersensitivity | *57:01/*X, *57:01/*57:01 | Abacavir is not recommended. Select an alternative agent |
| Fluoropyrimidines | DPYD | DPYD normal metabolizer | Activity score = 2.0; c.[=];[=], c.[85T>C];[=], c.[1627A>G];[=] | Use label-recommended dosage and administration |
| Fluoropyrimidines | DPYD | DPYD intermediate metabolizer | Activity score = 1.0 or 1.5; c.[1905+1G>A];[=], c.[1679T>G];[=], c.[2846A>T];[=], c.[1129-5923C>G];[=], c.[1129-5923C>G];[1129-5923C>G], c.[2846A>T];[2846A>T] | Reduce starting dose based on activity score, followed by titration of dose based on toxicity (activity score 1: reduce dose by 50%; activity score 1.5: reduce dose by 25%–50%) |
| Fluoropyrimidines | DPYD | DPYD poor metabolizer | Activity score = 0 or 0.5; c.[1905+1G>A];[1905+1G>A], c.[1679T>G];[1679T>G], c.[1905+1G>A];[2846A>T], c.[1905+1G>A];[1129-5923C>G] | Activity score 0.5: avoid use of 5-fluorouracil or 5-fluorouracil prodrug regimens; if no alternative, reduce dose strongly. Activity score 0: avoid use of 5-fluorouracil or 5-fluorouracil prodrug regimens |
| Tacrolimus | CYP3A5 | Extensive metabolizer (CYP3A5 expresser) | *1/*1 | Increase starting dose 1.5–2 times recommended starting dose |
| Tacrolimus | CYP3A5 | Intermediate metabolizer (CYP3A5 expresser) | *1/*3, *1/*6, *1/*7 | Increase starting dose 1.5–2 times recommended starting dose |
| Tacrolimus | CYP3A5 | Poor metabolizer (CYP3A5 nonexpresser) | *3/*3, *6/*6, *7/*7, *3/*6, *3/*7, *6/*7 | Initiate therapy with standard recommended dose |
| Clopidogrel | CYP2C19 | Ultrarapid metabolizer | *1/*17, *17/*17 | Clopidogrel label-recommended dosage and administration |
| Clopidogrel | CYP2C19 | Extensive metabolizer | *1/*1 | Clopidogrel label-recommended dosage and administration |
| Clopidogrel | CYP2C19 | Intermediate metabolizer | *1/*2, *1/*3, *2/*17 | Alternative antiplatelet therapy (if no contraindication), e.g., prasugrel, ticagrelor |
| Clopidogrel | CYP2C19 | Poor metabolizer | *2/*2, *2/*3, *3/*3 | Alternative antiplatelet therapy (if no contraindication), e.g., prasugrel, ticagrelor |
| Carbamazepine | HLA-A | HLA-A*31:01-negative | *Y/*Yᵃ | HLA-B*15:02 negative: use standard dosing guidelines; HLA-B*15:02 positive: if alternative agents are available, do not use carbamazepine |
| Carbamazepine | HLA-A | HLA-A*31:01-positive | *31:01/*Yᵃ, *31:01/*31:01 | If alternative agents are available, do not use carbamazepine |
| Carbamazepine | HLA-B | HLA-B*15:02-negative | *X/*Xᵃ | HLA-A*31:01 negative: use standard dosing guidelines; HLA-A*31:01 positive: if alternative agents are available, do not use carbamazepine |
| Carbamazepine | HLA-B | HLA-B*15:02-positive | *15:02/*Xᵃ, *15:02/*15:02 | If alternative agents are available, do not use carbamazepine |
| Oxcarbazepine | HLA-B | HLA-B*15:02-negative | *X/*Xᵃ | Use standard dosing guidelines |
| Oxcarbazepine | HLA-B | HLA-B*15:02-positive | *15:02/*Xᵃ, *15:02/*15:02 | If alternative agents are available, do not use oxcarbazepine |

Pharmacogenetics examples in clinical practice

[…] T; rs12248560) leads to an increased CYP2C19 activity by enhancing its expression [44]. This allele has a multiethnic frequency of 3%–21%. CYP2C19*17 is in linkage disequilibrium with the nonfunctional allele CYP2C19*2, and when both alleles are present CYP2C19*17 is unable to fully compensate for CYP2C19*2, resulting in intermediate activity of the enzyme [45]. Novel CYP2C19 variants of “unknown significance” or classified as “likely benign” should not be assumed to mimic the biological consequences of known CYP2C19 loss-of-function alleles or their effect on clopidogrel response [42]. CYP2C19 phenotypes are assigned according to the number of nonfunctional alleles harbored: patients with two wild-type alleles (*1/*1) are considered CYP2C19 extensive metabolizers; patients carrying one nonfunctional allele (*1/*2, *1/*3, *2/*17) are categorized as CYP2C19 intermediate metabolizers (IMs); patients with two nonfunctional alleles (*2/*2, *2/*3, *3/*3) are considered CYP2C19 poor metabolizers (PMs); and patients harboring at least one CYP2C19*17 increased-activity allele and no nonfunctional allele (*1/*17, *17/*17) are categorized as ultrarapid metabolizers (UMs) (Table 10.1). Although clopidogrel is widely used as an antiplatelet drug, PGx testing is only done in patients with ACS undergoing PCI and is not applicable to other settings. The main clinical recommendation is to use the label-recommended dosage and administration in CYP2C19 UMs and extensive metabolizers, whereas an alternative antiplatelet therapy such as prasugrel or ticagrelor is recommended for CYP2C19 IMs and PMs [42].
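The CYP2C19 assignment rule described above can be written as a short lookup. The Python sketch below is illustrative only: it covers just the *1, *2, *3, and *17 alleles named in the text, the function names are hypothetical, and it is not a clinical decision tool.

```python
NONFUNCTIONAL = {"*2", "*3"}   # loss-of-function alleles named in the text
INCREASED = {"*17"}            # increased-activity allele
NORMAL = {"*1"}                # wild-type allele
KNOWN = NONFUNCTIONAL | INCREASED | NORMAL


def cyp2c19_phenotype(diplotype: str) -> str:
    """Assign a CYP2C19 metabolizer phenotype to a diplotype such as '*1/*2'."""
    alleles = [a.strip() for a in diplotype.split("/")]
    if len(alleles) != 2:
        raise ValueError(f"Expected two alleles, got: {diplotype!r}")
    for allele in alleles:
        if allele not in KNOWN:
            # Novel or untested alleles must not be mapped to a known function.
            raise ValueError(f"Allele {allele} is not covered by this sketch")
    nonfunctional = sum(a in NONFUNCTIONAL for a in alleles)
    if nonfunctional == 2:
        return "poor metabolizer"
    if nonfunctional == 1:
        # Includes *2/*17: *17 does not fully compensate for *2.
        return "intermediate metabolizer"
    if any(a in INCREASED for a in alleles):
        return "ultrarapid metabolizer"
    return "extensive metabolizer"


def clopidogrel_recommendation(phenotype: str) -> str:
    """Map the phenotype to the recommendation summarized in the text."""
    if phenotype in ("ultrarapid metabolizer", "extensive metabolizer"):
        return "label-recommended dosage and administration"
    if phenotype in ("intermediate metabolizer", "poor metabolizer"):
        return "alternative antiplatelet therapy (e.g., prasugrel, ticagrelor)"
    raise ValueError(f"Unknown phenotype: {phenotype!r}")


for d in ("*1/*1", "*1/*17", "*2/*17", "*2/*3"):
    p = cyp2c19_phenotype(d)
    print(d, "->", p, "->", clopidogrel_recommendation(p))
```

Counting nonfunctional alleles before checking for *17 is what places *2/*17 in the intermediate group rather than the ultrarapid group.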

Oncology: fluoropyrimidines and DPYD

Fluoropyrimidines are a family of chemotherapeutic agents widely used in the treatment of colon cancer, metastatic colorectal cancer, and metastatic breast cancer. They belong to the class of antimetabolites, which act by inhibiting thymidylate synthase, thereby impairing DNA synthesis. The most widely used fluoropyrimidines are 5-fluorouracil (5-FU) and capecitabine, a 5-FU prodrug [46]. Approximately 10%–40% of fluoropyrimidine-treated patients develop severe adverse effects such as neutropenia, vomiting, nausea, severe diarrhea, stomatitis, mucositis, or hand-foot syndrome [47,48], which are sometimes life-threatening or fatal [49,50]. DPYD encodes dihydropyrimidine dehydrogenase (DPD), the main enzyme responsible for 5-FU clearance. Around 100 variants have been identified in DPYD, some of which reduce enzyme function and lead to increased toxicity in fluoropyrimidine-treated patients. In contrast to other pharmacogenes such as CYP450 family members, DPYD alleles do not follow star allele nomenclature but rather Human Genome Variation Society (HGVS) nomenclature or rsIDs, because the size of the DPYD gene and the low frequency of the variants make haplotype designation unreliable [48]. Some DPYD alleles are referred to as star (*) alleles, but this refers only to the presence of a specific genetic variant, not the whole haplotype. Of all variants identified, four are commonly tested because of their frequency in the population and their effect of decreasing DPD activity: c.1905+1G>A (rs3918290, DPYD*2A, DPYD:IVS14+1G>A), c.1679T>G (rs55886062, DPYD*13, p.I560S), c.2846A>T (rs67376798, p.D949V), and c.1129-5923C>G (rs75017182, HapB3) [51]. Around 7% of Europeans carry at least one of these alleles; HapB3 is the most frequent variant, with a frequency of 4.7%. In other populations, such as African populations, other decreased-function variants such as c.557A>G (rs115232898, p.Y186C) are common (3%–5%) [48].


DPYD variants may abolish DPD activity, decrease it, or not affect it at all. Accordingly, DPYD variants are assigned an activity score of 0, 0.5, or 1, respectively. DPYD c.1905+1G>A and DPYD*13 are the main nonfunctional variants (activity score = 0), whereas c.2846A>T and c.1129-5923C>G result in decreased activity (activity score = 0.5). Phenotypes are assigned from a global activity score, i.e., the sum of the activity scores of the two DPYD alleles (Table 10.1). DPYD normal metabolizers carry two functional alleles (activity score 2); DPYD IMs carry one nonfunctional or decreased-function allele, or two decreased-function alleles (activity score 1–1.5); and DPYD PMs carry two nonfunctional alleles, or one nonfunctional and one decreased-function allele (activity score 0 or 0.5). When two nonfunctional or decreased-function variants are detected, they are assumed to be present on different chromosomes [48] (Table 10.1).

Clinical recommendations are based on the DPYD phenotypes assigned to fluoropyrimidine-treated patients. In DPYD IMs, a reduction of the starting dose by 50% is recommended for patients with an activity score of 1, and by 25%–50% for those with an activity score of 1.5. Because of the high variability described, close monitoring is recommended in these patients. For DPYD PMs, the recommendation is to avoid the use of fluoropyrimidines and to use an alternative chemotherapeutic agent [48,51] (Table 10.1).
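The activity-score arithmetic can be made concrete with a short example. This Python sketch takes the per-variant scores for the four commonly tested variants from the text and the phenotype boundaries from Table 10.1; the function names and the use of "=" for a reference allele are assumptions made for illustration, and the output is not clinical advice.

```python
# Per-allele activity scores as described in the text; "=" denotes an allele
# carrying none of the tested variants (fully functional).
ACTIVITY_SCORE = {
    "=": 1.0,
    "c.1905+1G>A": 0.0,     # DPYD*2A, nonfunctional
    "c.1679T>G": 0.0,       # DPYD*13, nonfunctional
    "c.2846A>T": 0.5,       # decreased function
    "c.1129-5923C>G": 0.5,  # HapB3, decreased function
}


def dpyd_activity_score(allele1: str, allele2: str) -> float:
    """Sum the scores of the two alleles, assumed to lie on different chromosomes."""
    return ACTIVITY_SCORE[allele1] + ACTIVITY_SCORE[allele2]


def dpyd_phenotype(score: float) -> str:
    """Phenotype boundaries as listed in Table 10.1."""
    if score == 2.0:
        return "normal metabolizer"
    if score in (1.0, 1.5):
        return "intermediate metabolizer"
    if score in (0.0, 0.5):
        return "poor metabolizer"
    raise ValueError(f"Unexpected activity score: {score}")


def starting_dose_advice(score: float) -> str:
    """Starting-dose guidance summarized from the text and Table 10.1."""
    phenotype = dpyd_phenotype(score)
    if phenotype == "normal metabolizer":
        return "label-recommended dosage and administration"
    if phenotype == "intermediate metabolizer":
        reduction = "50%" if score == 1.0 else "25%-50%"
        return f"reduce starting dose by {reduction}, then titrate on toxicity"
    return "avoid fluoropyrimidines; use an alternative agent"


for pair in [("=", "="), ("c.2846A>T", "="), ("c.1905+1G>A", "="),
             ("c.1905+1G>A", "c.2846A>T")]:
    s = dpyd_activity_score(*pair)
    print(pair, s, dpyd_phenotype(s), "|", starting_dose_advice(s))
```

All scores are multiples of 0.5, so the floating-point sums are exact and the equality comparisons are safe here.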

Gastroenterology: thiopurines and TPMT/NUDT15

Members of the thiopurine family (mainly 6-mercaptopurine (6-MP), azathioprine (AZA), a 6-MP prodrug, and 6-thioguanine) are used as immunosuppressants in the treatment of childhood acute lymphoblastic leukemia, organ transplantation, and autoimmune diseases such as inflammatory bowel disease. The metabolism of these drugs is complex and involves several genes, but two key genes, TPMT and NUDT15, explain around 90% of the individual response to thiopurines.

TPMT (thiopurine methyltransferase) is the predominant enzyme involved in the inactivation of thiopurines in hematopoietic cells. A deficiency in TPMT activity leads to an accumulation of active 6-thioguanine nucleotides (TGNs) in blood cells, causing hematopoietic toxicity [4]. Patients with severe TPMT deficiency receiving standard doses of thiopurines have a greatly increased risk of life-threatening drug-induced myelosuppression [52]. TPMT activity can be determined by measuring levels of the enzyme in red blood cells or by genotyping known functional alleles.

Up to 41 TPMT alleles have been described, 11 of which lead to a nonfunctional enzyme (TPMT*2, *3A, *3B, *3C, *4, *11, *14, *15, *23, *29, and *41) [53] (see Table 10.1). However, most of these alleles are very rare: three TPMT genetic variants, which define four TPMT alleles (TPMT*2, *3A, *3B, and *3C), are responsible for 80%–95% of deficient TPMT activity [54]. The most frequent deficient allele in the Caucasian population is the *3A allele, defined by two inactivating SNPs in very strong linkage disequilibrium (Ala154Thr (rs1800460; c.460G>A) and Tyr240Cys (rs1142345; c.719A>G)). These SNPs are usually inherited together, so if both are found in the heterozygous state, the variant alleles are assumed to be in cis (on the same chromosome) and the diplotype call is *1/*3A. Sometimes these SNPs appear alone and define the *3B (Ala154Thr (rs1800460; c.460G>A)) and *3C (Tyr240Cys (rs1142345; c.719A>G)) alleles. Given the frequency of both variants in the different populations, a compound heterozygous diplotype (*3B/*3C) is very unlikely, and it is disputed whether a *3B/*3C individual has ever been identified [53]. To distinguish a *1/*3A from a *3B/*3C individual, phenotyping tests can be performed. TPMT*2 (Ala80Pro (rs1800462; c.238G>C)) is also detected in the Caucasian population, although at a low frequency. Hence, TPMT genotyping tests usually analyze the TPMT*2, *3A, *3B, and *3C alleles and assign TPMT*1 (wild type) when none of these variants is detected.

Individuals are categorized according to their TPMT status as TPMT normal metabolizers (*1/*1), TPMT IMs (if one loss-of-function (LOF) allele is present, e.g., *1/*3A), and TPMT PMs (if two LOF alleles are present, e.g., *2/*3A). Determination of TPMT status prior to thiopurine prescription is recommended. Approximately 10% of the European population are TPMT IMs, and around 0.3% are PMs. Clinical recommendations are available for these patients, with dose reductions of 30%–80% for IMs and drastically reduced doses (10-fold reduction, given thrice weekly instead of daily) or even nonthiopurine immunosuppressant therapy for PMs [53] (Table 10.1).

Recently, NUDT15 was identified in a genome-wide association study (GWAS) as a key gene in thiopurine tolerance in the Asian population [55]. NUDT15 deficiency leads to accumulation of cytotoxic thioguanine triphosphate, causing DNA damage and apoptosis. To date, nine NUDT15 alleles have been defined, two of which have been described as nonfunctional (NUDT15*2 and *3). The SNP p.R139C (rs116855232, c.415C>T) was the first variant identified in the GWAS and can be found alone (NUDT15*3) or in combination with p.V18_V19dup (rs869320766; c.50_55dup) (NUDT15*2). Patients carrying this variant show severe myelosuppression, and homozygous patients tolerate only 8% of the standard dose of thiopurines [56]. A reduction of thiopurine doses of 30%–80% is recommended for heterozygous individuals and a drastic reduction (10-fold, thrice weekly instead of daily) or even nonthiopurine immunosuppressant therapy for homozygous individuals [53] (Table 10.1). NUDT15 PMs are more frequent in the Asian population than TPMT PMs are in the European population, and therefore NUDT15 genotyping is clinically relevant in certain populations.
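The TPMT diplotype call described in this section, including the assumption that the two *3A-defining SNPs are in cis when both are heterozygous, can be sketched in Python as follows. The function names and the input convention (number of variant copies, 0 to 2, at each of the three positions) are hypothetical choices made for illustration. The sketch covers only TPMT*2, *3A, *3B, and *3C and cannot replace phenotyping when phase is in doubt.

```python
def call_tpmt_diplotype(n238, n460, n719):
    """Call a TPMT diplotype from unphased variant counts.

    n238, n460, n719: copies (0, 1, or 2) of c.238G>C, c.460G>A, and c.719A>G.
    Assumption from the text: when c.460G>A and c.719A>G occur together they
    are taken to be in cis, so they are paired into *3A wherever possible.
    """
    for n in (n238, n460, n719):
        if n not in (0, 1, 2):
            raise ValueError("variant counts must be 0, 1, or 2")
    n3a = min(n460, n719)
    alleles = (
        ["*2"] * n238
        + ["*3A"] * n3a
        + ["*3B"] * (n460 - n3a)  # c.460G>A without c.719A>G
        + ["*3C"] * (n719 - n3a)  # c.719A>G without c.460G>A
    )
    if len(alleles) > 2:
        # More variant alleles than chromosomes: phase cannot be inferred here.
        raise ValueError("genotype cannot be resolved without phasing")
    alleles += ["*1"] * (2 - len(alleles))  # absence of tested variants = *1
    return "/".join(sorted(alleles))


def tpmt_phenotype(diplotype):
    """Count loss-of-function alleles (anything other than *1)."""
    lof = sum(allele != "*1" for allele in diplotype.split("/"))
    return ("normal metabolizer", "intermediate metabolizer", "poor metabolizer")[lof]


diplotype = call_tpmt_diplotype(n238=0, n460=1, n719=1)
print(diplotype, tpmt_phenotype(diplotype))
```

Because the double heterozygote is always resolved as *1/*3A, a true *3B/*3C individual would be reported as an IM instead of a PM by this logic, which is the ambiguity that phenotyping is meant to settle.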

Organ transplant: tacrolimus and CYP3A5

Tacrolimus is one of the most widely prescribed immunosuppressants in solid organ transplantation, and it is also used for the prevention and treatment of graft rejection in allogeneic hematopoietic stem cell transplants [57]. Tacrolimus binds to the intracellular protein FKBP-12 to form a complex that inhibits calcineurin, ultimately inhibiting T-lymphocyte activation and proliferation [58]. Tacrolimus has a narrow therapeutic index and wide interindividual variability in pharmacokinetics. Tacrolimus toxicity occurs at concentrations slightly above or even within the recommended range (>20 ng/mL), causing nephrotoxicity, infection, hypertension, hyperkalemia, hypomagnesemia, hyperglycemia, diabetes, tremor, and other neurotoxic effects. On the other extreme, subtherapeutic concentrations (2.0 are


considered to be UMs. The incidence of PMs and UMs varies greatly among populations (0%–10% and 0%–29%, respectively) [65], highlighting the importance of CYP2D6 genotyping in populations such as Asian populations, where the reduced-activity allele CYP2D6*10 is the most common allele (with an allele frequency >50%), or populations of sub-Saharan African descent, where the reduced-activity allele CYP2D6*17 is detected in almost 30% of individuals [63]. These alleles, however, have a considerably lower prevalence, or are even absent, in other ethnic groups such as Caucasians of European ancestry. Accordingly, PMs are mainly found in Europe and UMs in North Africa and Oceania, whereas IMs are predominantly found in Asia [66].

The association of the CYP2D6 metabolizer phenotype with the formation of morphine from codeine is well documented [65]. PMs have been shown to have lower levels of morphine and hence decreased analgesia [67], whereas UMs show toxic systemic morphine levels even at low codeine doses, leading in some cases to severe and life-threatening side effects [68]. One case report describes the death of a breastfed infant whose mother, a CYP2D6 UM, had been prescribed codeine, resulting in lethal concentrations of morphine in the breast milk [69]. Therefore, in patients who are CYP2D6 PMs or UMs, the use of analgesics other than codeine is recommended, avoiding other opioids that are also metabolized by CYP2D6, such as tramadol, hydrocodone, and oxycodone. Patients categorized as IMs should start with standard doses, but closer monitoring is recommended [65].

Finally, the CYP2D6 metabolizer phenotype may be altered in patients who are taking drugs that inhibit CYP2D6 activity. In addition, SNPs located in an enhancer region over 100 kb downstream of the CYP2D6 gene were recently described that seem to increase CYP2D6 transcription levels up to 2.5-fold [70]. Although this effect needs to be confirmed and explored in other populations, adjusting activity score values based on these SNPs may fine-tune phenotype prediction in the future.

Antiretroviral therapy: abacavir and HLA-B

Abacavir is a nucleoside reverse transcriptase inhibitor used as adjuvant treatment in human immunodeficiency virus (HIV)-infected patients. Abacavir inhibits HIV reverse transcriptase, thus preventing the conversion of viral RNA to DNA and its insertion into the host’s genome. Around 5%–8% of abacavir-treated patients develop hypersensitivity reactions (HSRs) during the first 6 weeks of treatment and require immediate cessation of the therapy [71]. HSRs comprise the following symptoms: fever, rash, fatigue, gastrointestinal symptoms (nausea, vomiting, or diarrhea), and respiratory symptoms (dyspnea, cough, or pharyngitis). Retreatment with abacavir after discontinuation can lead to life-threatening reactions such as anaphylaxis [72].

Regarding HLA-B, allele HLA-B*07:02:01 has been established as the reference sequence because of its high prevalence in Caucasians. The presence of allele HLA-B*57:01 has been associated with the appearance of HSRs in abacavir-treated patients [73,74], with a negative predictive value of more than 99% [75]. These observations led the EMA and the FDA to recommend HLA-B*57:01 genetic testing prior to abacavir use. Since other HLA-B alleles are not informative, only allele HLA-B*57:01 is recommended for screening [76]. HLA-B*57:01 differs from the reference allele (HLA-B*07:02:01) by a significant number of nucleotides, which result in numerous amino acid changes. For this reason, several approaches to detect the HLA-B*57:01 genotype can be employed, such as whole-genome sequence-based typing or allele-specific PCR. Another option is to test for HLA-B*57:01 using the presence of rs2395029, which is in linkage disequilibrium with the allele and has been shown to correlate significantly with the presence of HLA-B*57:01 in Caucasians and Hispanics, with a sensitivity of 100% and a positive predictive value of approximately 94% [77].

HLA-B*57:01 status affects the likelihood that HSRs appear in an abacavir-treated patient. Since HLA-B is codominantly expressed, patients can be categorized as HLA-B*57:01 positive if they harbor at least one HLA-B*57:01 allele, or as HLA-B*57:01 negative if no HLA-B*57:01 allele is detected (see Table 10.1). HLA-B*57:01 is present at low frequency in African and Asian populations, and has a frequency of around 6% in the Caucasian population and of up to 20% in the South-West Asian population [76]. The main recommendation is to avoid abacavir prescription for HLA-B*57:01 carriers who are naive to abacavir and to use another antiretroviral therapy (Table 10.1). The recommendation for patients harboring HLA-B*57:01 who have been under treatment with abacavir for more than 6 weeks is controversial [76].
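The positive/negative categorization described above reduces to checking whether either typed allele is HLA-B*57:01. The following Python sketch shows one way to express that rule. The function names are hypothetical, and matching on the first two colon-separated fields (so that a higher-resolution result such as HLA-B*57:01:01 also counts) is an assumption about how typing results are reported, not something stated in the text.

```python
def parse_hla_allele(name):
    """Split an allele name such as 'HLA-B*57:01' into (gene, list of fields)."""
    name = name.strip()
    if name.upper().startswith("HLA-"):
        name = name[4:]  # drop the optional 'HLA-' prefix
    gene, _, fields = name.partition("*")
    return gene.upper(), fields.split(":")


def is_b5701(allele):
    """True if the allele is HLA-B*57:01 at two-field resolution or finer."""
    gene, fields = parse_hla_allele(allele)
    return gene == "B" and fields[:2] == ["57", "01"]


def hla_b5701_status(allele1, allele2):
    """One HLA-B*57:01 allele is enough to classify the patient as positive."""
    if is_b5701(allele1) or is_b5701(allele2):
        return "positive"
    return "negative"


print(hla_b5701_status("HLA-B*57:01", "HLA-B*07:02:01"))
print(hla_b5701_status("HLA-B*07:02", "HLA-B*08:01"))
```

A result derived from the tag SNP rs2395029 would need separate handling, since the text reports a positive predictive value of approximately 94% for that proxy, not 100%.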

Implementation of pharmacogenetic testing in clinical practice

The promise of PGx is that an individual’s genetic information can help to predict drug response and guide optimal drug and dose selection, enabling safer, more effective, and more cost-effective treatment. Many hospitals and health centers have implemented PGx reactively, ordering a pharmacogenetic test for a specific variant or gene when a high-risk drug is about to be prescribed, to ensure that the optimal treatment is selected. However, this strategy is costly and has a slow turnaround time, an important problem when a correct drug prescription is urgently required. Because of these limitations, it is increasingly recognized that the best way to implement PGx in the clinical setting is a preemptive strategy that generates variant data for multiple pharmacogenes, ideally before any target drug is prescribed, embeds these data in EHRs, and accompanies them with clinical decision support that gives advice or dose recommendations when a drug is prescribed.

Implementing preemptive tests requires technical expertise to support laboratory testing and reporting of well-curated genetic data, extensive understanding of predicted consequences and clinical evidence, designation of predicted metabolizer phenotype status (e.g., normal metabolizers, PMs), and advice regarding drug prescribing and alternatives for patients with a particular genetic risk variant. Fortunately, many of these needs are now being met by, among others, PharmGKB, the CPIC, and the DPWG. However, in spite of all these efforts, there is still a lack of consensus between the regulatory and advisory agencies regarding pharmacogenomic drug labels and the impact of pharmacogenomic variants, which makes the translation of pharmacogenomic guidelines into clinical practice very difficult. A recent study compared pharmacogenomic labels by the FDA and the EMA, as well as the guidance provided by the CPIC and the DPWG. Out of 54 drugs with an actionable gene–drug interaction in the CPIC and DPWG guidelines, only 50% had actionable pharmacogenomic information in the summaries of product characteristics, and the concordance between the agencies was only 18% [78]. This inconsistency arises because there are no firm criteria for the level of evidence that is necessary to translate a drug–gene interaction observed in clinical studies into the different categories of a drug label. In order to obtain consensus regarding the pharmacogenomic information on drug labels, the basis for these decisions must be improved. To rectify this regulatory deficiency, Koutsilieri et al. suggest a long-term concerted effort engaging all stakeholders and experts active in pharmacogenomics with the aim of harmonizing existing guidelines and pharmacogenomic information on drug labels [79]. Such harmonization should result in a differentiated list in which actionable drug labels with proven high clinical relevance are separated from labels for which more information is required and from labels for informational purposes only.

For a broader implementation of PGx, it is essential to demonstrate the value and cost-effectiveness of testing to key decision-makers [80]. Several initiatives for pharmacogenetic implementation have been launched, and evaluations of the cost-effectiveness of preemptive strategies are being performed. In the United States, the Vanderbilt Pharmacogenomic Resource for Enhanced Decisions in Care and Treatment (PREDICT) program [81] has preemptively genotyped more than 10,000 patients using a panel-based approach [82] and showed the benefit of panel-based testing over single-gene testing: costs were saved by reducing the number of single tests by 60%. The Electronic Medical Records and Genomics (eMERGE) Network has set up the eMERGE-PGx project together with the PGRN, with the aim of testing more than 80 very important pharmacogenes by targeted sequencing [83]. In Europe, the EU-funded Ubiquitous Pharmacogenomics (U-PGx) Consortium, a network of European experts, aims to provide evidence of the clinical utility of implementing a panel of PGx markers in routine care. The consortium is conducting a prospective, block-randomized, controlled clinical study (PREemptive Pharmacogenomic testing for prevention of Adverse drug REactions [PREPARE]) of preemptive genotyping using a panel of clinically relevant PGx markers across healthcare institutions in seven European countries. The impact on patient outcomes and cost-effectiveness will be investigated, and the trial is aiming to report by the end of 2020 [84].

Another study reviewed 44 economic evaluations relating to 10 drugs; 30% of these evaluations concluded that PGx testing was cost-effective and 27% concluded that it was even cost saving. Thus, PGx-guided treatment can be a cost-effective and even a cost-saving strategy, which would make more genetic tests economically worthwhile [85].
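The preemptive strategy described in this section, in which stored pharmacogenetic results trigger advice at the moment a drug is prescribed, can be sketched as a simple lookup. Everything in this Python example is hypothetical: the record structure, the function name, and the advice table, whose three entries merely paraphrase recommendations summarized earlier in this chapter. A real clinical decision support system would draw on curated guideline content and the institution's EHR.

```python
from dataclasses import dataclass, field

# Hypothetical advice table keyed by (drug, gene, stored phenotype).
ADVICE = {
    ("fluorouracil", "DPYD", "poor metabolizer"):
        "Avoid fluoropyrimidines; use an alternative agent.",
    ("azathioprine", "TPMT", "intermediate metabolizer"):
        "Reduce the starting dose by 30%-80%.",
    ("abacavir", "HLA-B", "*57:01 positive"):
        "Avoid abacavir; use another antiretroviral therapy.",
}


@dataclass
class PatientRecord:
    """Minimal stand-in for preemptive results embedded in an EHR."""
    patient_id: str
    phenotypes: dict = field(default_factory=dict)  # gene -> stored phenotype


def check_prescription(record, drug):
    """Return the alerts raised when `drug` is prescribed for this patient."""
    alerts = []
    for (advice_drug, gene, phenotype), advice in ADVICE.items():
        if advice_drug == drug.lower() and record.phenotypes.get(gene) == phenotype:
            alerts.append(f"{gene} {phenotype}: {advice}")
    return alerts


patient = PatientRecord("example-001", {"DPYD": "poor metabolizer"})
for alert in check_prescription(patient, "Fluorouracil"):
    print(alert)
```

The point of the design is that genotyping happens once, ahead of time, and each later prescription costs only a lookup; no new test has to be ordered when the drug is needed urgently.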

Future perspectives of personalized medicine

Thanks to the development of PGx in recent years, personalized medicine is becoming increasingly available in the clinic. However, not all differences among individuals in the response to treatments can be explained by DNA variants alone. Other fields such as transcriptomics (mRNA and microRNA expression profiles), epigenomics (DNA methylation profiles), microbiota research, and metabolomics could provide important insights into drug response, and the integration of different data types has been shown to improve the prediction of drug response [86].

The development of high-content “omics” technologies in the last decade, such as transcriptomics (genome-wide gene-expression profiling using microarray technology or RNA-seq), has generated, particularly in cancer, a wealth of data of different types whose analysis has helped to identify signatures of cellular sensitivity or resistance to hundreds of chemical compounds [87,88]. One example is the Cancer Cell Line Encyclopedia (CCLE), a compilation of gene expression, chromosomal copy number, and massively parallel sequencing data from 947 human cancer cell lines [89]. Not only can specific features be discovered, such as biomarkers for resistance or sensitivity to a particular drug, but combinations of those same genomic and molecular features can also be used to predict the effect of a drug on a patient [90].


In the clinical setting, a number of gene expression signatures for treatment response have been identified as potential predictive markers, and their use could be progressively implemented for drug selection, thus allowing more rational and individualized treatments. This strategy may guide treatment decision-making and improve therapy response not only in oncology [91–93] but also in other disciplines such as psychiatry [94,95] and cardiology [96–98].

Epigenetics provides another layer of information that could help to develop personalized therapy and optimize treatment. In particular, pharmacoepigenetics studies changes in the expression of pharmacogenes that are not due to changes in DNA sequence [99]. The study of pharmacoepigenetics at a genome-wide level is referred to as pharmacoepigenomics [100]. The three main classes of epigenetic marks are DNA methylation, modification of histone tails, and noncoding RNAs. DNA methylation and histone modifications have been demonstrated to participate in the regulation of ~60 human ADME genes [101]. A quick review of the studies supporting a role for epigenetic factors in the regulation of ADME genes reveals that the majority were conducted in tumor cell lines. Tumor cells have aberrant DNA methylation patterns, characterized by global hypomethylation and promoter-associated hypermethylation; hypermethylation of ADME gene promoters, mainly in CpG islands, represses their expression, but in some tumors promoters are hypomethylated and ADME gene expression levels are increased (e.g., NAT1 in breast cancer and CYP1B1 in prostate cancer). Beyond cancer, a recent genome-wide integrative analysis of noncancerous human tissues and hepatoma cells confirmed that ADME genes are transcriptionally regulated by DNA methylation, resulting in different mRNA expression levels among individuals [102]. Moreover, environmental factors can influence ADME gene expression via epigenetic modifications. Examples are chemotherapeutic drug exposure [103], tobacco exposure [104], and antiepileptic drug exposure [105]. NGS and methylation arrays have improved epigenetic biomarker identification, which will ultimately contribute to personalized medicine [106]. Furthermore, identifying highly sensitive, specific, and easily accessible epigenetic biomarkers and applying them along with genetic biomarkers is a key step toward successful personalized treatment. Ultimately, both PGx and pharmacoepigenetics must be taken into account in order to improve and individualize drug therapy.

Pharmacometabolomics (PMx) studies use the information contained in metabolic profiles (the metabolome) to predict how a subject will respond to drug treatment. The metabolome represents the entire repertoire of small molecules (metabolites) present in cells, tissues, or body fluids, and is analyzed predominantly by mass spectrometry and nuclear magnetic resonance spectroscopy. Genome, age, health status, sex, gut microbiome, nutrition, and other factors can influence the metabolic profile of an individual. Some of these factors are known to influence the individual response to a particular drug. As such, metabolomic profiles obtained prior to, during, and after drug treatment could provide insights into the mechanism of action of drugs and variations in response to treatment [107–109]. The application of metabolomics to studying drug effects was established by the Pharmacometabolomics Research Network [110] (http://pharmacometabolomics.duhs.duke.edu/). This network has the goal of integrating the rapidly evolving science of metabolomics with molecular pharmacology and pharmacogenomics to move toward “individualized” drug therapy and the subclassification of diseases based on treatment outcomes. An early study in humans investigated the effects of three antipsychotics in schizophrenia patients, compared their effects on metabolism, and defined a lipid signature at baseline associated with treatment outcomes [111]. Since then, many other studies have shown that metabolic profiles provide insights into variations in response to antipsychotics, statins, antidepressants, antihypertensives, and antiplatelet therapies, and into the development of side effects of treatments [112–116]. Several studies have also integrated PGx and pharmacometabolomics data, establishing that pharmacometabolomics data can inform and complement pharmacogenomics data [117–119].

Finally, pharmacomicrobiomics is an emerging field that investigates the interplay of microbiome variation and drug response. The gut microbiota forms a third dimension in drug metabolism, providing a nonoverlapping enzymatic capacity that generates metabolites distinct from host enzymatic products and may also shape drug pharmacokinetics [120]. The gut microbiota alters drugs by various mechanisms: degradation, activation, and modulation of drug-metabolizing host enzymes [120,121]. In cancer in particular, gut bacteria affect the response to chemo-, radio-, and immunotherapeutic drugs by modifying either efficacy or toxicity [122–124]. In addition, intratumor bacteria could modulate chemotherapy response, and at the same time anticancer treatments could affect the microbiota composition, disrupting homeostasis and aggravating patient discomfort [125]. This interaction between microbiota and anticancer drugs is attracting growing interest, as are interventions aimed at shaping the microbiota to optimize drug efficacy and reduce side effects [126,127].

Taken together, understanding a drug response phenotype requires a thorough understanding of the complex mechanisms of host–microbiota interactions, which may be elucidated by integrating multi-omic host data (such as gene expression, epigenetics, SNVs, and metabolomics) with microbiome data (such as strain variation, gene expression, proteomics, and metabolomics). The development and integration of all these new fields together with PGx will be critical for understanding the heterogeneity of treatment outcomes in order to fully implement personalized medicine in the clinic, a major goal of 21st century healthcare.

References

[1] Bouvy JC, De Bruin ML, Koopmanschap MA. Epidemiology of adverse drug reactions in Europe: a review of recent observational studies. Drug Saf 2015;38(5):437–53. https://doi.org/10.1007/s40264-015-0281-0.
[2] Frontier Economics. Exploring the costs of unsafe care in the NHS. 2014.
[3] Evans WE, McLeod HL. Pharmacogenomics–drug disposition, drug targets, and side effects. N Engl J Med 2003;348(6):538–49.
[4] Evans WE. Pharmacogenetics of thiopurine S-methyltransferase and thiopurine therapy. Ther Drug Monit 2004;26(2):186–91. https://doi.org/10.1097/00007691-200404000-00018.
[5] Wadelius M, Pirmohamed M. Pharmacogenetics of warfarin: current status and future challenges. Pharmacogenomics J 2007;7(2):99–111.
[6] Kalman LV, Agúndez J, Appell ML, et al. Pharmacogenetic allele nomenclature: international workgroup recommendations for test result reporting. Clin Pharmacol Ther 2016;99(2):172–85. https://doi.org/10.1002/cpt.280.
[7] Daly AK, Brockmöller J, Broly F, et al. Nomenclature for human CYP2D6 alleles. Pharmacogenetics 1996;6(3):193–201. https://doi.org/10.1097/00008571-199606000-00001.
[8] Oscarson M, Ingelman-Sundberg M. CYPalleles: a web page for nomenclature of human cytochrome P450 alleles. Drug Metabol Pharmacokinet 2002;17(6):491–5. https://doi.org/10.2133/dmpk.17.491.
[9] Gaedigk A, Ingelman-Sundberg M, Miller NA, et al. The pharmacogene variation (PharmVar) consortium: incorporation of the human cytochrome P450 (CYP) allele nomenclature database. Clin Pharmacol Ther 2018;103(3):399–401. https://doi.org/10.1002/cpt.910.


[10] Sim SC, Ingelman-Sundberg M. The human cytochrome P450 (CYP) Allele Nomenclature website: a peer-reviewed database of CYP variants and their associated effects. Hum Genom 2010;4(4):278–81. https://doi.org/10.1186/1479-7364-4-4-278.
[11] Marsh SGE, Albert ED, Bodmer WF, et al. Nomenclature for factors of the HLA system. Tissue Antigens 2010;75(4):291–455. https://doi.org/10.1111/j.1399-0039.2010.01466.x.
[12] Kozyra M, Ingelman-Sundberg M, Lauschke VM. Rare genetic variants in cellular transporters, metabolic enzymes, and nuclear receptors can be important determinants of interindividual differences in drug response. Genet Med 2017;19(1):20–9. https://doi.org/10.1038/gim.2016.33.
[13] Nelson MR, Wegmann D, Ehm MG, et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 2012;337(6090):100–4. https://doi.org/10.1126/science.1217876.
[14] Gordon AS, Fulton RS, Qin X, Mardis ER, Nickerson DA, Scherer S. PGRNseq: a targeted capture sequencing panel for pharmacogenetic research and implementation. Pharmacogenet Genom 2016;26(4):161–8. https://doi.org/10.1097/FPC.0000000000000202.
[15] Han S, Park J, Lee J, et al. Targeted next-generation sequencing for comprehensive genetic profiling of pharmacogenes. Clin Pharmacol Ther 2017;101(3):396–405. https://doi.org/10.1002/cpt.532.
[16] Chua EW, Cree SL, Ton KNT, et al. Cross-comparison of exome analysis, next-generation sequencing of amplicons, and the iPLEX® ADME PGx panel for pharmacogenomic profiling. Front Pharmacol 2016;7. https://doi.org/10.3389/fphar.2016.00001.
[17] Yang W, Wu G, Broeckel U, et al. Comparison of genome sequencing and clinical genotyping for pharmacogenes. Clin Pharmacol Ther 2016;100(4):380–8. https://doi.org/10.1002/cpt.411.
[18] Barbarino JM, Whirl-Carrillo M, Altman RB, Klein TE. PharmGKB: a worldwide resource for pharmacogenomic information. Wiley Interdiscip Rev Syst Biol Med 2018;10(4):e1417. https://doi.org/10.1002/wsbm.1417.
[19] Thorn CF, Klein TE, Altman RB. Pharmacogenomics and bioinformatics: PharmGKB. Pharmacogenomics 2010;11(4):501–5. https://doi.org/10.2217/pgs.10.15.
[20] Whirl-Carrillo M, McDonagh EM, Hebert JM, et al. Pharmacogenomics knowledge for personalized medicine. Clin Pharmacol Ther 2012;92(4):414–7. https://doi.org/10.1038/clpt.2012.96.
[21] Gaedigk A, Sangkuhl K, Whirl-Carrillo M, et al. The evolution of PharmVar. Clin Pharmacol Ther 2019;105(1):29–32. https://doi.org/10.1002/cpt.1275.
[22] Relling MV, Klein TE. CPIC: clinical pharmacogenetics implementation consortium of the pharmacogenomics research network. Clin Pharmacol Ther 2011;89(3):464–7. https://doi.org/10.1038/clpt.2010.279.
[23] Relling MV, Klein TE, Gammal RS, Whirl-Carrillo M, Hoffman JM, Caudle KE. The clinical pharmacogenetics implementation consortium: 10 years later. Clin Pharmacol Ther 2020;107(1):171–5. https://doi.org/10.1002/cpt.1651.
[24] Swen JJ, Wilting I, de Goede AL, et al. Pharmacogenetics: from bench to byte. Clin Pharmacol Ther 2008;83(5):781–7. https://doi.org/10.1038/sj.clpt.6100507.
[25] Swen JJ, Nijenhuis M, de Boer A, et al. Pharmacogenetics: from bench to byte – an update of guidelines. Clin Pharmacol Ther 2011;89(5):662–73. https://doi.org/10.1038/clpt.2011.34.
[26] Ross CJD, Visscher H, Sistonen J, et al. The Canadian pharmacogenomics network for drug safety: a model for safety pharmacology. Thyroid 2010;20(7):681–7. https://doi.org/10.1089/thy.2010.1642.
[27] Klein TE, Ritchie MD. PharmCAT: a pharmacogenomics clinical annotation tool. Clin Pharmacol Ther 2018;104(1):19–22. https://doi.org/10.1002/cpt.928.
[28] Sangkuhl K, Whirl-Carrillo M, Whaley RM, et al. Pharmacogenomics clinical annotation tool (PharmCAT). Clin Pharmacol Ther 2020;107(1):203–10. https://doi.org/10.1002/cpt.1568.
[29] Numanagic I, Malikic S, Ford M, et al. Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes. Nat Commun 2018;9(1):828. https://doi.org/10.1038/s41467-018-03273-1.


[30] Lee S-B, Wheeler MM, Patterson K, et al. Stargazer: a software tool for calling star alleles from next-generation sequencing data using CYP2D6 as a model. Genet Med 2019;21(2):361–72. https://doi.org/10.1038/s41436-018-0054-0.
[31] Twist GP, Gaedigk A, Miller NA, et al. Constellation: a tool for rapid, automated phenotype assignment of a highly polymorphic pharmacogene, CYP2D6, from whole-genome sequences. NPJ Genom Med 2016;1:15007. https://doi.org/10.1038/npjgenmed.2015.7.
[32] Numanagic I, Malikic S, Pratt VM, Skaar TC, Flockhart DA, Sahinalp SC. Cypiripi: exact genotyping of CYP2D6 using high-throughput sequencing data. Bioinformatics 2015;31(12):i27–34. https://doi.org/10.1093/bioinformatics/btv232.
[33] Marson AG, Al-Kharusi AM, Alwaidh M, et al. The SANAD study of effectiveness of carbamazepine, gabapentin, lamotrigine, oxcarbazepine, or topiramate for treatment of partial epilepsy: an unblinded randomised controlled trial. Lancet 2007;369(9566):1000–15. https://doi.org/10.1016/S0140-6736(07)60460-7.
[34] Phillips EJ, Sukasem C, Whirl-Carrillo M, et al. Clinical pharmacogenetics implementation consortium guideline for HLA genotype and use of carbamazepine and oxcarbazepine: 2017 update. Clin Pharmacol Ther 2018;103(4):574–81. https://doi.org/10.1002/cpt.1004.
[35] Mullan KA, Anderson A, Illing PT, Kwan P, Purcell AW, Mifsud NA. HLA-associated antiepileptic drug-induced cutaneous adverse reactions. HLA 2019;93(6):417–35. https://doi.org/10.1111/tan.13530.
[36] Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet 2009;54(1):15–39. https://doi.org/10.1038/jhg.2008.5.
[37] Chung W-H, Hung S-I, Hong H-S, et al. Medical genetics: a marker for Stevens-Johnson syndrome. Nature 2004;428(6982):486. https://doi.org/10.1038/428486a.
[38] Ozeki T, Mushiroda T, Yowang A, et al. Genome-wide association study identifies HLA-A*3101 allele as a genetic risk factor for carbamazepine-induced cutaneous adverse drug reactions in Japanese population. Hum Mol Genet 2011;20(5):1034–41. https://doi.org/10.1093/hmg/ddq537.
[39] McCormack M, Alfirevic A, Bourgeois S, et al. HLA-A*3101 and carbamazepine-induced hypersensitivity reactions in Europeans. N Engl J Med 2011;364(12):1134–43. https://doi.org/10.1056/NEJMoa1013297.
[40] Shuldiner AR, O’Connell JR, Bliden KP, et al. Association of cytochrome P450 2C19 genotype with the antiplatelet effect and clinical efficacy of clopidogrel therapy. J Am Med Assoc 2009;302(8):849–57. https://doi.org/10.1001/jama.2009.1232.
[41] Kazui M, Nishiya Y, Ishizuka T, et al. Identification of the human cytochrome P450 enzymes involved in the two oxidative steps in the bioactivation of clopidogrel to its pharmacologically active metabolite. Drug Metab Dispos 2010;38(1):92–9. https://doi.org/10.1124/dmd.109.029132.
[42] Scott SA, Sangkuhl K, Stein CM, et al. Clinical Pharmacogenetics Implementation Consortium guidelines for CYP2C19 genotype and clopidogrel therapy: 2013 update. Clin Pharmacol Ther 2013;94(3):317–23. https://doi.org/10.1038/clpt.2013.105.
[43] Jiang X-L, Samant S, Lesko LJ, Schmidt S. Clinical pharmacokinetics and pharmacodynamics of clopidogrel. Clin Pharmacokinet 2015;54(2):147–66. https://doi.org/10.1007/s40262-014-0230-6.
[44] Scott SA, Sangkuhl K, Shuldiner AR, et al. PharmGKB summary: very important pharmacogene information for cytochrome P450, family 2, subfamily C, polypeptide 19. Pharmacogenet Genom 2012;22(2):159–65. https://doi.org/10.1097/FPC.0b013e32834d4962.
[45] Sibbing D, Gebhard D, Koch W, et al. Isolated and interactive impact of common CYP2C19 genetic variants on the antiplatelet effect of chronic clopidogrel therapy. J Thromb Haemostasis 2010;8(8):1685–93. https://doi.org/10.1111/j.1538-7836.2010.03921.x.
[46] Wigle TJ, Tsvetkova EV, Welch SA, Kim RB. DPYD and fluorouracil-based chemotherapy: mini review and case report. Pharmaceutics 2019;11(5). https://doi.org/10.3390/pharmaceutics11050199.

References

215

[47] Lee AM, Shi Q, Pavey E, et al. DPYD variants as predictors of 5-fluorouracil toxicity in adjuvant colon cancer treatment (NCCTG N0147). J Natl Cancer Inst 2014;106(12). https://doi.org/10.1093/jnci/dju298. [48] Amstutz U, Henricks LM, Offer SM, et al. Clinical pharmacogenetics implementation consortium (CPIC) guideline for dihydropyrimidine dehydrogenase genotype and fluoropyrimidine dosing: 2017 update. Clin Pharmacol Ther 2018;103(2):210e6. https://doi.org/10.1002/cpt.911. [49] Fidai SS, Sharma AE, Johnson DN, Segal JP, Lastra RR. Dihydropyrimidine dehydrogenase deficiency as a cause of fatal 5-Fluorouracil toxicity. Autops Case Rep 2018;8(4):e2018049. https://doi.org/10.4322/ acr.2018.049. [50] Tong CC, Lam CW, Lam KO, Lee VHF, Luk M-Y. A novel DPYD variant associated with severe toxicity of fluoropyrimidines: role of pre-emptive DPYD genotype screening. Front Oncol 2018;8:279. https://doi.org/ 10.3389/fonc.2018.00279. [51] Lunenburg CATC, van der Wouden CH, Nijenhuis M, et al. Dutch Pharmacogenetics Working Group (DPWG) guideline for the gene-drug interaction of DPYD and fluoropyrimidines. Eur J Hum Genet 2019. https://doi.org/10.1038/s41431-019-0540-0. [52] Weinshilboum R. Inheritance and drug response. N Engl J Med 2003;348(6):529e37. https://doi.org/ 10.1056/NEJMra020021. [53] Relling MV, Schwab M, Whirl-Carrillo M, et al. Clinical pharmacogenetics implementation consortium guideline for thiopurine dosing based on TPMT and NUDT 15 genotypes: 2018 update. Clin Pharmacol Ther 2019;105(5):1095e105. https://doi.org/10.1002/cpt.1304. [54] Ford LT, Berg JD. Thiopurine S-methyltransferase (TPMT) assessment prior to starting thiopurine drug treatment; a pharmacogenomic test whose time has come. J Clin Pathol 2010;63(4):288e95. https://doi.org/ 10.1136/jcp.2009.069252. [55] Yang S-K, Hong M, Baek J, et al. A common missense variant in NUDT15 confers susceptibility to thiopurine-induced leukopenia. Nat Genet 2014;46(9):1017e20. https://doi.org/10.1038/ng.3060. 
[56] Yang JJ, Landier W, Yang W, et al. Inherited NUDT15 variant is a genetic determinant of mercaptopurine intolerance in children with acute lymphoblastic leukemia. J Clin Oncol 2015;33(11):1235e42. https:// doi.org/10.1200/JCO.2014.59.4671. [57] Pasternak AL, Zhang L, Hertz DL. CYP3A pharmacogenetic association with tacrolimus pharmacokinetics differs based on route of drug administration. Pharmacogenomics 2018;19(6):563e76. https://doi.org/ 10.2217/pgs-2018-0003. [58] Bowman LJ, Brennan DC. The role of tacrolimus in renal transplantation. Expet Opin Pharmacother 2008; 9(4):635e43. https://doi.org/10.1517/14656566.9.4.635. [59] Chen L, Prasad GVR. CYP3A5 polymorphisms in renal transplant recipients: influence on tacrolimus treatment. Pharmgenom Pers Med 2018;11:23e33. https://doi.org/10.2147/PGPM.S107710. ˚ sberg A, et al. Therapeutic drug monitoring of tacrolimus-personalized therapy: [60] Brunet M, van Gelder T, A second consensus report. Ther Drug Monit 2019;41(3):261e307. https://doi.org/10.1097/FTD.00000 00000000640. [61] Birdwell KA, Decker B, Barbarino JM, et al. Clinical pharmacogenetics implementation consortium (CPIC) guidelines for CYP3A5 genotype and tacrolimus dosing. Clin Pharmacol Ther 2015;98(1):19e24. https://doi.org/10.1002/cpt.113. [62] Thorn CF, Klein TE, Altman RB. Codeine and morphine pathway. Pharmacogenet Genom 2009;19(7): 556e8. https://doi.org/10.1097/FPC.0b013e32832e0eac. [63] Ingelman-Sundberg M. Genetic polymorphisms of cytochrome P450 2D6 (CYP2D6): clinical consequences, evolutionary aspects and functional diversity. Pharmacogenomics J 2005;5(1):6e13. https:// doi.org/10.1038/sj.tpj.6500285. [64] Gaedigk A. Complexities of CYP2D6 gene analysis and interpretation. Int Rev Psychiatr 2013;25(5): 534e53. https://doi.org/10.3109/09540261.2013.825581.

216

Chapter 10 Pharmacogenetics and personalized medicine

[65] Crews KR, Gaedigk A, Dunnenberger HM, et al. Clinical pharmacogenetics implementation consortium guidelines for cytochrome P450 2D6 genotype and codeine therapy: 2014 update. Clin Pharmacol Ther 2014;95(4):376e82. https://doi.org/10.1038/clpt.2013.254. [66] Ingelman-Sundberg M, Sim SC, Gomez A, Rodriguez-Antona C. Influence of cytochrome P450 polymorphisms on drug therapies: pharmacogenetic, pharmacoepigenetic and clinical aspects. Pharmacol Ther 2007;116(3):496e526. https://doi.org/10.1016/j.pharmthera.2007.09.004. [67] Lo¨tsch J, Rohrbacher M, Schmidt H, Doehring A, Brockmo¨ller J, Geisslinger G. Can extremely low or high morphine formation from codeine be predicted prior to therapy initiation? Pain 2009;144(1e2):119e24. https://doi.org/10.1016/j.pain.2009.03.023. [68] Ciszkowski C, Madadi P, Phillips MS, Lauwers AE, Koren G. Codeine, ultrarapid-metabolism genotype, and postoperative death. N Engl J Med 2009;361(8):827e8. https://doi.org/10.1056/NEJMc0904266. [69] Koren G, Cairns J, Chitayat D, Gaedigk A, Leeder SJ. Pharmacogenetics of morphine poisoning in a breastfed neonate of a codeine-prescribed mother. Lancet 2006;368(9536):704. https://doi.org/10.1016/ S0140-6736(06)69255-6. [70] Wang D, Poi MJ, Sun X, Gaedigk A, Leeder JS, Sadee W. Common CYP2D6 polymorphisms affecting alternative splicing and transcription: long-range haplotypes with two regulatory variants modulate CYP2D6 activity. Hum Mol Genet 2014;23(1):268e78. https://doi.org/10.1093/hmg/ddt417. [71] Fan W-L, Shiao M-S, Hui RC-Y, et al. HLA association with drug-induced adverse reactions. J Immunol Res 2017;2017:3186328. https://doi.org/10.1155/2017/3186328. [72] Hetherington S, McGuirk S, Powell G, et al. Hypersensitivity reactions during therapy with the nucleoside reverse transcriptase inhibitor abacavir. Clin Therapeut 2001;23(10):1603e14. https://doi.org/10.1016/ s0149-2918(01)80132-6. [73] Hetherington S, Hughes AR, Mosteller M, et al. 
Genetic variations in HLA-B region and hypersensitivity reactions to abacavir. Lancet 2002;359(9312):1121e2. https://doi.org/10.1016/S01406736(02)08158-8. [74] Mallal S, Nolan D, Witt C, et al. Association between presence of HLA-B5701, HLA-DR7, and HLA-DQ3 and hypersensitivity to HIV-1 reverse-transcriptase inhibitor abacavir. Lancet 2002;359(9308):727e32. https://doi.org/10.1016/s0140-6736(02)07873-x. [75] Mallal S, Phillips E, Carosi G, et al. HLA-B5701 screening for hypersensitivity to abacavir. N Engl J Med 2008;358(6):568e79. https://doi.org/10.1056/NEJMoa0706135. [76] Martin MA, Hoffman JM, Freimuth RR, et al. Clinical pharmacogenetics implementation consortium guidelines for HLA-B genotype and abacavir dosing: 2014 update. Clin Pharmacol Ther 2014;95(5): 499e500. https://doi.org/10.1038/clpt.2014.38. [77] Colombo S, Rauch A, Rotger M, et al. The HCP5 single-nucleotide polymorphism: a simple screening tool for prediction of hypersensitivity reaction to abacavir. J Infect Dis 2008;198(6):864e7. https://doi.org/ 10.1086/591184. [78] Shekhani R, Steinacher L, Swen JJ, Ingelman-Sundberg M. Evaluation of current regulation and guidelines of pharmacogenomic drug labels: opportunities for improvements. Clin Pharmacol Ther 2020;107(5): 1240e55. https://doi.org/10.1002/cpt.1720. [79] Koutsilieri S, Tzioufa F, Sismanoglou D-C, Patrinos GP. Unveiling the guidance heterogeneity for genomeinformed drug treatment interventions among regulatory bodies and research consortia. Pharmacol Res 2020;153:104590. https://doi.org/10.1016/j.phrs.2019.104590. [80] Patrinos G, Mitropoulou C. Measuring the value of pharmacogenomics evidence. Clin Pharmacol Ther 2017;102(5):739e41. https://doi.org/10.1002/cpt.743. [81] Pulley JM, Denny JC, Peterson JF, et al. Operational implementation of prospective genotyping for personalized medicine: the design of the Vanderbilt PREDICT project. Clin Pharmacol Ther 2012;92(1): 87e95. https://doi.org/10.1038/clpt.2011.371.

References

217

[82] Van Driest S, Shi Y, Bowton E, et al. Clinically actionable genotypes among 10,000 patients with preemptive pharmacogenomic testing. Clin Pharmacol Ther 2014;95(4):423e31. https://doi.org/10.1038/ clpt.2013.229. [83] Gottesman O, Kuivaniemi H, Tromp G, et al. The electronic medical records and genomics (eMERGE) network: past, present, and future. Genet Med 2013;15(10):761e71. https://doi.org/10.1038/gim. 2013.72. [84] van der Wouden C, Cambon-Thomsen A, Cecchin E, et al. Implementing pharmacogenomics in Europe: design and implementation strategy of the ubiquitous pharmacogenomics consortium. Clin Pharmacol Ther 2017;101(3):341e58. https://doi.org/10.1002/cpt.602. [85] Verbelen M, Weale ME, Lewis CM. Cost-effectiveness of pharmacogenetic-guided treatment: are we there yet? Pharmacogenomics J 2017;17(5):395e402. https://doi.org/10.1038/tpj.2017.21. [86] Iorio F, Knijnenburg TA, Vis DJ, et al. A landscape of pharmacogenomic interactions in cancer. Cell 2016; 166(3):740e54. https://doi.org/10.1016/j.cell.2016.06.017. [87] McLeod HL. Cancer pharmacogenomics: early promise, but concerted effort needed. Science 2013; 339(6127):1563e6. https://doi.org/10.1126/science.1234139. [88] Parca L, Pepe G, Pietrosanto M, et al. Modeling cancer drug response through drug-specific informative genes. Sci Rep 2019;9(1):15222. https://doi.org/10.1038/s41598-019-50720-0. [89] Barretina J, Caponigro G, Stransky N, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603e7. https://doi.org/10.1038/nature11003. [90] Azuaje F. Computational models for predicting drug responses in cancer research. Briefings Bioinf 2016. https://doi.org/10.1093/bib/bbw065. [91] D’Costa NM, Cina D, Shrestha R, et al. Identification of gene signature for treatment response to guide precision oncology in clear-cell renal cell carcinoma. Sci Rep 2020;10(1):2026. https://doi.org/10.1038/ s41598-020-58804-y. 
[92] Smyth EC, Nyamundanda G, Cunningham D, et al. A seven-gene signature assay improves prognostic risk stratification of perioperative chemotherapy treated gastroesophageal cancer patients from the MAGIC trial. Ann Oncol 2018;29(12):2356e62. https://doi.org/10.1093/annonc/mdy407. [93] Shee K, Wells JD, Jiang A, Miller TW. Integrated pan-cancer gene expression and drug sensitivity analysis reveals SLFN11 mRNA as a solid tumor biomarker predictive of sensitivity to DNA-damaging chemotherapy. PLoS One 2019;14(11):e0224267. https://doi.org/10.1371/journal.pone.0224267. [94] Sainz J, Prieto C, Ruso-Julve F, Crespo-Facorro B. Blood gene expression profile predicts response to antipsychotics. Front Mol Neurosci 2018;11:73. https://doi.org/10.3389/fnmol.2018.00073. [95] Bradshaw NJ, Ukkola-Vuoti L, Pankakoski M, et al. The NDE1 genomic locus can affect treatment of psychiatric illness through gene expression changes related to microRNA-484. Open Biol 2017;7(11): 170153. https://doi.org/10.1098/rsob.170153. [96] Margerie D, Lefebvre P, Raverdy V, et al. Hepatic transcriptomic signatures of statin treatment are associated with impaired glucose homeostasis in severely obese patients. BMC Med Genom 2019;12(1):80. https://doi.org/10.1186/s12920-019-0536-1. [97] Theusch E, Kim K, Stevens K, et al. Statin-induced expression change of INSIG1 in lymphoblastoid cell lines correlates with plasma triglyceride statin response in a sex-specific manner. Pharmacogenomics J 2017;17(3):222e9. https://doi.org/10.1038/tpj.2016.12. [98] Johnson KW, Glicksberg BS, Shameer K, et al. A transcriptomic model to predict increase in fibrous cap thickness in response to high-dose statin treatment: validation by serial intracoronary OCT imaging. EBioMedicine 2019;44:41e9. https://doi.org/10.1016/j.ebiom.2019.05.007. [99] Kim I-W, Han N, Burckart GJ, Oh JM. Epigenetic changes in gene expression for drug-metabolizing enzymes and transporters. Pharmacotherapy 2014;34(2):140e50. 
https://doi.org/10.1002/phar.1362. [100] Tollefsbol T. Personalized epigenetics. Elsevier; 2015. https://doi.org/10.1016/C2013-0-13457-5.

218

Chapter 10 Pharmacogenetics and personalized medicine

[101] Kacevska M, Ivanov M, Ingelman-Sundberg M. Perspectives on epigenetics and its relevance to adverse drug reactions. Clin Pharmacol Ther 2011;89(6):902e7. https://doi.org/10.1038/clpt.2011.21. [102] Habano W, Kawamura K, Iizuka N, Terashima J, Sugai T, Ozawa S. Analysis of DNA methylation landscape reveals the roles of DNA methylation in the regulation of drug metabolizing enzymes. Clin Epigenet 2015;7(1):105. https://doi.org/10.1186/s13148-015-0136-7. [103] Baker EK, Johnstone RW, Zalcberg JR, El-Osta A. Epigenetic changes to the MDR1 locus in response to chemotherapeutic drugs. Oncogene 2005;24(54):8061e75. https://doi.org/10.1038/sj.onc.1208955. [104] Tekpli X, Zienolddiny S, Skaug V, Stangeland L, Haugen A, Mollerup S. DNA methylation of the CYP1A1 enhancer is associated with smoking-induced genetic alterations in human lung. Int J Cancer 2012;131(7): 1509e16. https://doi.org/10.1002/ijc.27421. [105] Wu Y, Shi X, Liu Y, et al. Histone deacetylase 1 is required for Carbamazepine-induced CYP3A4 expression. J Pharm Biomed Anal 2012;58:78e82. https://doi.org/10.1016/j.jpba.2011.09.017. [106] Gampenrieder SP, Rinnerthaler G, Hackl H, et al. DNA methylation signatures predicting bevacizumab efficacy in metastatic breast cancer. Theranostics 2018;8(8):2278e88. https://doi.org/10.7150/thno.23544. [107] Andrew Clayton T, Lindon JC, Cloarec O, et al. Pharmaco-metabonomic phenotyping and personalized drug treatment. Nature 2006;440(7087):1073e7. https://doi.org/10.1038/nature04648. [108] Kaddurah-Daouk R, Weinshilboum RM. Pharmacometabolomics: implications for clinical pharmacology and systems pharmacology. Clin Pharmacol Ther 2014;95(2):154e67. https://doi.org/10.1038/clpt. 2013.217. [109] Beger RD, Schmidt MA, Kaddurah-Daouk R. Current concepts in pharmacometabolomics, biomarker discovery, and precision medicine. Metabolites 2020;10(4):129. https://doi.org/10.3390/metabo 10040129. [110] Kaddurah-Daouk R, Weinshilboum R. 
Metabolomic signatures for drug response phenotypes: pharmacometabolomics enables precision medicine. Clin Pharmacol Ther 2015;98(1):71e5. https://doi.org/10.1002/ cpt.134. [111] Kaddurah-Daouk R, McEvoy J, Baillie RA, et al. Metabolomic mapping of atypical antipsychotic effects in schizophrenia. Mol Psychiatr 2007;12(10):934e45. https://doi.org/10.1038/sj.mp.4002000. [112] Krauss RM, Zhu H, Kaddurah-Daouk R. Pharmacometabolomics of statin response. Clin Pharmacol Ther 2013;94(5):562e5. https://doi.org/10.1038/clpt.2013.164. [113] McEvoy J, Baillie RA, Zhu H, et al. Lipidomics reveals early metabolic changes in subjects with schizophrenia: effects of atypical antipsychotics. PLoS One 2013;8(7):e68717. https://doi.org/10.1371/ journal.pone.0068717. [114] Zhu H, Bogdanov MB, Boyle SH, et al. Pharmacometabolomics of response to sertraline and to placebo in major depressive disorder e possible role for methoxyindole pathway. PLoS One 2013;8(7):e68283. https://doi.org/10.1371/journal.pone.0068283. [115] Backshall A, Sharma R, Clarke SJ, Keun HC. Pharmacometabonomic profiling as a predictor of toxicity in patients with inoperable colorectal cancer treated with capecitabine. Clin Cancer Res 2011;17(9):3019e28. https://doi.org/10.1158/1078-0432.CCR-10-2474. [116] Amin AM, Sheau Chin L, Teh C-H, et al. Pharmacometabolomics analysis of plasma to phenotype clopidogrel high on treatment platelets reactivity in coronary artery disease patients. Eur J Pharmaceut Sci 2018;117:351e61. https://doi.org/10.1016/j.ejps.2018.03.011. [117] Yerges-Armstrong LM, Ellero-Simatos S, Georgiades A, et al. Purine pathway implicated in mechanism of resistance to aspirin therapy: pharmacometabolomics-informed pharmacogenomics. Clin Pharmacol Ther 2013;94(4):525e32. https://doi.org/10.1038/clpt.2013.119. [118] Amin AM, Sheau Chin L, Azri Mohamed Noor D, SK Abdul Kader MA, Kah Hay Y, Ibrahim B. 
The personalization of clopidogrel antiplatelet therapy: the role of integrative pharmacogenetics and pharmacometabolomics. Cardiol Res Pract 2017;2017:1e17. https://doi.org/10.1155/2017/8062796.

References

219

[119] Gupta M, Neavin D, Liu D, et al. TSPAN5, ERICH3 and selective serotonin reuptake inhibitors in major depressive disorder: pharmacometabolomics-informed pharmacogenomics. Mol Psychiatr 2016;21(12): 1717e25. https://doi.org/10.1038/mp.2016.6. [120] Saad R, Rizkallah MR, Aziz RK. Gut pharmacomicrobiomics: the tip of an iceberg of complex interactions between drugs and gut-associated microbes. Gut Pathog 2012;4(1):16. https://doi.org/10.1186/1757-4749-4-16. [121] ElRakaiby M, Dutilh BE, Rizkallah MR, Boleij A, Cole JN, Aziz RK. Pharmacomicrobiomics: the impact of human microbiome variations on systems pharmacology and personalized therapeutics. OMICS A J Integr Biol 2014;18(7):402e14. https://doi.org/10.1089/omi.2014.0018. [122] Matson V, Fessler J, Bao R, et al. The commensal microbiome is associated with antiePD-1 efficacy in metastatic melanoma patients. Science 2018;359(6371):104e8. https://doi.org/10.1126/science.aao3290. [123] Barker HE, Paget JTE, Khan AA, Harrington KJ. The tumour microenvironment after radiotherapy: mechanisms of resistance and recurrence. Nat Rev Cancer 2015;15(7):409e25. https://doi.org/10.1038/ nrc3958. [124] Brandi G. Intestinal microflora and digestive toxicity of irinotecan in mice. Clin Cancer Res 2006;12(4): 1299e307. https://doi.org/10.1158/1078-0432.CCR-05-0750. [125] Dubin K, Callahan MK, Ren B, et al. Intestinal microbiome analyses identify melanoma patients at risk for checkpoint-blockade-induced colitis. Nat Commun 2016;7(1):10391. https://doi.org/10.1038/ncomms 10391. [126] Wallace BD, Wang H, Lane KT, et al. Alleviating cancer drug toxicity by inhibiting a bacterial enzyme. Science 2010;330(6005):831e5. https://doi.org/10.1126/science.1191175. [127] Pitt JM, Vetizou M, Waldschmitt N, et al. Fine-tuning cancer immunotherapy: optimizing the gut microbiome. Cancer Res 2016;76(16):4602e7. https://doi.org/10.1158/0008-5472.CAN-16-0448.

CHAPTER 11

Data sharing and gene variant databases

Johan T. den Dunnen 1,2, Ivo F.A.C. Fokkema 1

1 Department of Human Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands; 2 Department of Clinical Genetics, Leiden University Medical Center, Leiden, South Holland, the Netherlands

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00002-8 Copyright © 2021 Elsevier Inc. All rights reserved.

Introduction

The most obvious use of DNA variant databases in relation to this book is their application in DNA diagnostics. DNA diagnostics is based on sharing information on genes, variants, and phenotypes. Without sharing, DNA diagnostics would not be possible. When we do not share our findings, we do not offer optimal care to patients and their families. Furthermore, to arrive at reliable, evidence-based variant classification every observation counts: each one strengthens the evidence we have (disease-associated or not) and ultimately allows accurate risk estimates. Given these evident basic requirements, it is astonishing that for a long time sharing was the exception rather than the standard, and that stable financial support for the databases is still largely lacking. Sharing is by far the cheapest way to classify variants. A large fraction of the so-called Variants of Unknown Significance would be resolved immediately if we all shared all our data.

The ideal daily routine in a clinical diagnostics laboratory is as follows: a sample comes in and is analyzed (sequenced), variants are called, the variants are stored in a local database and automatically shared with a public international repository, and the repository is queried to determine the population frequency of the variant and the associated phenotypic consequences (clinical variant classification). When necessary, and where possible, additional analyses are performed, e.g., RNA analysis to check possible consequences for RNA processing (splicing, see Chapter 7), immunohistochemical analysis to study protein amount and localization (see Chapter 8), and bioinformatic analysis to predict possible variant consequences (see Chapter 6). Finally, a conclusion is drawn, a diagnostic report is written, and the data are published.

Unfortunately, the step missing from this daily routine is the automatic sharing with a public international repository. At best, data sharing is done after the last step, publication, although this step is rarely reached: it is additional work, and journals are not very likely to publish largely confirmatory observations. This has serious consequences. Most importantly, the available data we all use for accurate variant classification are far from complete; in the end, probably less than 10% of all observations are published. Many variants therefore have to be classified as VUS [16], a “Variant of Unknown Significance,” in fact a “Variant of Unsufficient Sharing.” Furthermore, even when variants are published, this is no guarantee that they are uploaded to one of the variant databases, where the observation would be easy to find. Most databases do not have the resources to check the literature and upload all published data. Since even published reports can go unnoticed, a growing number of journals now demand database submission before accepting a manuscript for publication.

The main aim of variant databases is to catalogue all variants encountered, collect all available information, and display the collected data on the database website. Depending on the focus of the database, the information collected will differ. Some focus on (monogenic) gene-disease links, others on variant (population) frequencies, functional predictions, or associated polygenic phenotypes. The variants collected are stored and displayed using a standard description, mostly based on the HGVS recommendations to describe variants in DNA, RNA, and protein sequences [1] (varnomen.HGVS.org, Chapter 2). Although databases generally store variant descriptions in several coordinate systems, each chooses one for its main display depending on its focus: chromosomal genomic coordinates (“g.” description), a gene (“c.”), or a protein (“p.”). All major databases link to each other.

The focus of this chapter is how the variant databases are used to support clinical diagnosis. Besides the diagnostic application, databases have much wider applications. First, they support research in several ways. The variants, the variant type, their location in the gene/protein, and the consequences reported give important insights into the function of the gene, the cellular processes it is involved in, and how this affects an organism. Variants for which too little information is available to classify them reliably are obvious candidates to be tested in functional assays, in vitro gene splicing experiments, or animal model systems. Known variant effects can be used to train and benchmark predictive computational tools.
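The three display choices named above (“g.”, “c.”, and “p.”) can be told apart by their prefix alone. The following is a minimal sketch, not an HGVS validator: the function name `hgvs_level`, the labels in `PREFIX_LEVELS`, and the example descriptions (including the placeholder reference sequence `NM_000000.0`) are assumptions made for illustration only.

```python
# Illustrative only: map the prefix of an HGVS-style description to the
# coordinate system it refers to. Names and labels are hypothetical.
PREFIX_LEVELS = {
    "g.": "genomic DNA",
    "c.": "coding DNA (gene)",
    "p.": "protein",
}


def hgvs_level(description: str) -> str:
    """Return the coordinate system implied by the description's prefix.

    Accepts descriptions with or without a reference sequence,
    e.g. "NM_000000.0:c.123A>G" (placeholder accession) or "c.123A>G".
    """
    # Drop the reference sequence (everything up to the last colon), if any.
    variant_part = description.rsplit(":", 1)[-1].strip()
    for prefix, level in PREFIX_LEVELS.items():
        if variant_part.startswith(prefix):
            return level
    return "unknown"
```

A database that stores all three descriptions for one variant could use such a check to decide which one to show in its main display.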
Further research is directed by the aspects that are not understood, i.e., variants with unexpected or unexplained effects and genetic modifiers that influence disease outcome. When the correlation between genotype and phenotype is understood, this can be used to design a rational approach toward a possible treatment. Possible treatments may target specific variant types, e.g., exon skipping or stop-codon read-through, where the database can be used to guide a design based on the observed frequencies [2]. Testing treatment success is guided by the collected data on disease progression, which point out important outcome measures and allow these to be compared between treated and untreated individuals.

Another database function is to promote interaction between scientists, between scientists and patients, and between patients. Scientists find information on open questions that need further research, and on colleagues who may have useful biological samples to support such studies. Using the database, scientists find leads to contact reporting labs for collaboration and biological samples, and to reach patients for further research or participation in clinical trials. Patients use the database to see what information is available regarding their condition, and to find patients with similar or identical genotypes (“a patient like me”). Ultimately, when a treatment becomes available, the database is a source for finding patients who may benefit from it.
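The ideal daily routine described in this Introduction (store locally, share automatically, then query the repository) can be sketched in a few lines. This is a toy model under stated assumptions: `Repository` is an in-memory stand-in rather than the interface of any real database, and the lab names and variant descriptions used with it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """In-memory stand-in for a public variant repository (hypothetical)."""
    observations: dict = field(default_factory=dict)

    def submit(self, variant: str, lab: str) -> None:
        # Every observation counts, so each submission is appended.
        self.observations.setdefault(variant, []).append(lab)

    def query(self, variant: str) -> list:
        # Return a copy of all labs that have reported this variant.
        return list(self.observations.get(variant, []))


def process_sample(lab, called_variants, local_db, repository):
    """Run the store, share, and query steps for each called variant."""
    report = {}
    for variant in called_variants:
        local_db.append(variant)                      # 1. store locally
        repository.submit(variant, lab)               # 2. share automatically
        report[variant] = repository.query(variant)   # 3. query shared evidence
    return report
```

The point of the sketch is the position of step 2: sharing happens inside the routine for every variant, not as an optional step after publication.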

General databases

There are two main types of variant databases: general and focused databases. The general databases are “a mile wide and an inch deep” while the focused databases are “an inch wide and a mile deep.” The general databases try to catalogue all published variants and direct users to focused databases for further details. The focused gene or disease variant databases try to collect all information from all available cases, published and unpublished. It is impossible to discuss all available databases in this chapter, so we focus on those used most frequently in clinical diagnostics. The genome browsers, e.g., UCSC (genome.ucsc.edu) and Ensembl (www.ensembl.org), serve an important intermediate function by offering tracks in their display that link to many available datasets.

The OMIM database (Online Mendelian Inheritance in Man, www.omim.org) focuses on the relation between genes and phenotypes [3]. Although the focus is on genomic disorders (genetic disease), all genetic phenotypes are covered. OMIM is a freely available, authoritative compendium that is updated daily. The database has two types of records: phenotype records (Fig. 11.1A) and gene records (Fig. 11.1B). The records link to each other, but not all phenotypes have been linked to a gene, nor have all genes been linked to a phenotype. Phenotype records include a short historic perspective and give details on the clinical features, the modes of inheritance observed, disease diagnosis, and how the disease was mapped and linked to a specific gene or genomic region. Gene records give details on the gene, the encoded protein, its (suggested) function, and the existence of animal models. When a gene has been linked to one or more phenotypes, details are provided and a range of phenotype-associated variants is listed. The number of variants listed differs greatly; the focus is on first gene-phenotype reports. In addition, a series of variants may be described, representing both typical and exceptional cases, together giving a comprehensive overview of the genotype-phenotype relations observed. Variants reported are linked to dbSNP, ClinVar, and gnomAD. Links to other sources are provided through an “External Links” menu.
OMIM does not yet use HPO terms (Human Phenotype Ontology) [4] when describing phenotypes, but for frequently observed features these can be obtained from, e.g., the HPO website (hpo.jax.org). Clinical labs mainly use OMIM identifiers and disease names when reporting individual phenotypes and the variants identified.

FIGURE 11.1 The OMIM database. (A) OMIM disease record for Duchenne Muscular Dystrophy (#310200). The record gives summary data on the disease and, when known, the inheritance pattern(s) observed and the genomic location and the gene(s) involved. Every record has a specific format containing the topics discussed listed on the left, while links to other resources are listed on the right. (B) OMIM gene record for the dystrophin (DMD) gene (#300377). The record gives summary data on the gene and its location and, when known, the phenotypes it is known to be involved in. Every record has a specific format containing the topics discussed listed on the left, while links to other resources are listed on the right (the Variation menu has been opened).

While OMIM focuses on monogenic disorders, it does store information on all genetic phenotypes, including polygenic traits as revealed by genome-wide association studies (GWAS). Specific GWAS databases, e.g., the GWAS catalog (www.ebi.ac.uk/gwas) and GWAS Central (www.gwascentral.org), focus on such studies in more detail and contain large amounts of information on variants which, through large population studies, have been linked to a specific phenotype. While these databases are rarely used in clinical genetics, GWAS signals mapped to a genomic region may add evidence to establish new gene-phenotype links.

dbSNP (www.ncbi.nlm.nih.gov/snp) and EVA (European Variation Archive, www.ebi.ac.uk/eva) are databases which try to catalog all small human DNA variants, including single-nucleotide variants, microsatellites, and small insertions and deletions. Every variant is assigned an identifier, an rs number (e.g., rs104894790), and details are given regarding its genomic location and the transcripts that may cover the variant position. In addition, information is provided about the populations in which the variant has been identified and its frequency. Both databases link to a range of other sites containing additional information.

Although the gnomAD database [5] (gnomad.broadinstitute.org, Fig. 11.2) and its predecessor ExAC in essence collect similar information, they clearly differ from dbSNP/EVA, which collect data from many different sources. gnomAD is unique in that it is based on data obtained from 125,748 exomes and 15,708 genomes from unrelated individuals, analyzed by one group using a standard analysis pipeline and calling variants with the same thresholds. The individuals analyzed are from different populations, but population data are far from complete. Given the high quality of the data and the large number of samples analyzed, the gnomAD database can be used to study several aspects of gene function. Using a mutation rate model, gnomAD classified all protein-coding genes from mutation tolerant to mutation intolerant. Over 3000 genes were shown to be largely devoid of protein-truncating variants, more than half of which have currently not been linked to a human disease phenotype. In other genes, high frequencies of predicted truncating variants were found, even in homozygous cases, making it highly unlikely that these genes are associated with human disease phenotypes.

FIGURE 11.2 The gnomAD database. Shown is the record for the DMD gene, its exon/intron structure, and the mean coverage obtained in exome and genome sequencing studies. The panel on the right (Constraint) shows the number of variants in the DMD gene as observed and expected for three categories: synonymous, missense, and pLoF (predicted loss-of-function). The number of pLoF variants observed is significantly below the number expected, indicating the gene has an essential function.

Most labs use databases like gnomAD and dbSNP/EVA as indispensable sources of variant frequencies, where a high frequency reduces the chance that a variant has serious consequences. For variants in genes which have not yet been linked to a disease phenotype, the overall mutation rate available from gnomAD gives an indication of how likely it is that the gene has a critical function.
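Two simple quantities underlie the uses described above: the frequency of a variant (its allele count divided by the number of alleles sequenced) and the ratio of observed to expected variants in a gene. The sketch below is illustrative only; the function names are made up for this example, and the 1% default threshold is an arbitrary placeholder, not a recommended cutoff.

```python
def allele_frequency(allele_count: int, allele_number: int) -> float:
    """Fraction of sequenced alleles that carry the variant."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number


def is_too_common(frequency: float, threshold: float = 0.01) -> bool:
    """True when the variant is more frequent than the chosen threshold.

    The default threshold is a placeholder; an appropriate value depends
    on the disease and inheritance pattern and is not given in the text.
    """
    return frequency > threshold


def observed_expected_ratio(observed: int, expected: float) -> float:
    """Observed/expected variant count for one category (e.g. pLoF).

    Values well below 1 mean fewer variants were seen than the mutation
    model predicts, which suggests the gene tolerates them poorly.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected
```

With made-up numbers, a variant seen on 5 of 1000 alleles has a frequency of 0.005, and a gene with 10 observed against 100 expected pLoF variants has a ratio of 0.1.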


Focused databases Historically, focused variant databases were started by experts working in, or associated with, DNA diagnostics. These so-called Locus-Specific Databases (LSDBs) focused mostly on a gene or on a disease caused by variants in several genes. The databases differed widely, using different platforms/ database software, collecting details on different phenotypic features, and displaying variant data in many different formats. Initially, there was little collaboration and often several databases were started for the same major disease genes, e.g., TP53, BRCA1/2, colon cancer genes, etc. Stimulated by the HUGO Mutation Database initiative [6] LSDBs joined forces, developed shared international standards [7], and gradually merged into larger consortia. The development of freely available gene variant database software packages like LOVD [8] (www.LOVD.nl) and UMD [9] (www.umd.be) boosted standardization and collaboration. Ultimately, the LOVD version 3 software, facilitating collection and display of genomewide data, persuaded many independent database installations to join efforts and merge into a central repository, the “Global Variome shared LOVD.” The GV shared LOVD is now by far the largest collaborative effort to share data on genes, variants, and phenotypes. The database works under the auspices of “Global Variome,” a UK charity linked to the “Human Variome Project” (HVP), a nongovernmental organization (NGO) recognized by UNESCO (United Nations Educational, Scientific and Cultural Organization). Different LOVD installations still exist, but most public instances share basic information with a central LOVD server. The information shared is used to automatically update the LSDB gene variant database list (lsdb.variome.org) and to offer a centralized variant query service, redirecting positive hits to the LOVD installation containing the data (www. lovd.nl/3.0/search). 
Currently, besides the GV shared LOVD, there are two other major human gene variant database initiatives covering all human genes, HGMD [10] and ClinVar [11], each with their own focus. All three are discussed in detail below.

HGMD

While the GV shared LOVD and ClinVar are freely accessible public archives, access to HGMD (the Human Gene Mutation Database, www.hgmd.cf.ac.uk) [10] is restricted, with recent data requiring a paid subscription. HGMD only collects published data, whereas ClinVar and the GV shared LOVD contain a significant amount of unpublished data, submitted to these databases directly. While ClinVar and the GV shared LOVD may contain detailed phenotypic data, HGMD only stores a general phenotype (disease name). HGMD reports variant classifications as published; ClinVar and the GV shared LOVD collaborate with international gene/disease experts and consortia, sharing their overall classification (e.g., ENIGMA, InSiGHT). HGMD focuses on phenotype-associated variants; ClinVar and LOVD store all variants following a 5-tier classification system. More specifically, HGMD covers variants causative of or associated with inherited human disease, as well as disease-associated/functional variants. It lists only the first report of a variant, unless an additional report extends the original entry, e.g., based on functional studies. Data stored cover all variant types within the coding regions, splicing, and regulatory regions of human nuclear genes. Somatic variants and variants in the mitochondrial genome are not included. Variants which do not alter the encoded amino acid sequence are not recorded unless they have been shown to affect mRNA


splicing or gene expression, or have been reported as associated with disease. Unless they have some clinical relevance, variants lacking obvious phenotypic consequences are not collected. When other sites, e.g., a genome browser, display HGMD data, they give only positional information and do not display the variant itself.

ClinVar and GV shared LOVD

Unlike HGMD, ClinVar and the GV shared LOVD archive submitted information and add data, identifiers, and links that may be available about a variant or phenotype from other public resources. ClinVar and LOVD will, where possible, check submitted data for consistency (e.g., variant description using correct HGVS nomenclature). Submissions not following the minimal standards of the database are not shown until potential issues have been resolved. In LOVD, a curator may ask a submitter to check certain details of the submission and, when there are doubts, the curator has the right to express these in the “Remarks” field. ClinVar and the GV shared LOVD will not modify records, but display them as they are submitted, and will therefore show conflicting interpretations, i.e., variants classified differently by different submitters. Neither database has the capacity to review published literature and add such data; both depend fully on submissions from external sources. While ClinVar focuses on variants and their classification, LOVD prefers full case-level submissions containing data about the individual, the phenotype, the analysis performed, the variant(s) detected, and their clinical classification. LOVD accepts simple variant-level data as well, e.g., reporting a variant and its interpretation, or functional data from studying the effect of a variant in a model system. The main goal of both databases is to offer access to all available data to support the clinical evaluation of the possible consequences of a variant in relation to the health of the individual carrying the variant.

ClinVar

ClinVar [11] is an NIH-funded public repository reporting the relationships between variants and an individual’s health status, with supporting evidence to facilitate access to and communication about the relationships and the history of that interpretation. ClinVar partners with the ClinGen project, providing data for evaluation and archiving the results of interpretation by recognized expert panels and providers of practice guidelines. ClinVar archives and versions submissions; when submitters update their records, the previous version is retained. When submitters register, they need to provide details regarding their institute, the diagnostic methods used, and the process used to evaluate and ultimately classify variants. Based on these data, ClinVar rates submitting labs. A large fraction of the data available from ClinVar comes from the major DNA diagnostics labs in the United States, sharing their variant data and classification. The standard entry point into ClinVar is by using a gene symbol. The main display obtained is a listing of all variants linked to that gene (e.g., www.ncbi.nlm.nih.gov/clinvar/?term=DMD), ordered based on chromosomal position (so from 3′ to 5′ for genes on the minus strand, Fig. 11.3). Zooming in on specific subsets of the data is achieved using a menu offering categories like “clinical significance,” “molecular consequence,” “variation type,” “variant length,” “allele origin,” “review status,” etc., each with a set of predefined choices (e.g., for “molecular consequence”: frameshift, missense, nonsense, splice site, ncRNA, UTR, and near gene). Another option is to use the “Search builder” at the top of the


FIGURE 11.3 The ClinVar database. The database was queried for variants in the DMD gene (top middle) and next for all missense variants (selected left under “Molecular Consequences”). Four variants are shown. The “Variation Location” column gives a variant description based on a coding DNA reference sequence (c.) and the location of the variant relative to two different genome builds, GRCh37 and GRCh38. The “c.” variant description is shown based on different reference transcripts (here NM_000109.4 and NM_004006.2). The “Protein change” column gives predicted protein variant descriptions using the one-letter amino acid code and based on a range of different transcripts (up to eight are listed). Note that HGVS nomenclature recommends descriptions with the format p.(Gln2937Arg). Subsequent columns “Condition(s),” “Clinical significance,” and “Review status” show summary data for all variants submitted more than once. For variant c.8767G>T conflicting interpretations were shared; opening the record lists all individual records and interpretations.

screen which can, for example, be used to query for a specific variant. Variant descriptions given follow HGVS nomenclature and show both a transcript-based (c.) and a genomic chromosomal description (g.), the latter for genome builds GRCh37 and GRCh38. When the record for a specific variant is selected (e.g., c.10108C>T for DMD), the new variant display (www.ncbi.nlm.nih.gov/clinvar/variation/11213/) shows all independent reports of that variant, split over submitted interpretations (7 for c.10108C>T, February 2020) and published cases (10 for c.10108C>T, linked to PubMed). The confidence level and accuracy of variant classifications depend largely on the number of observations and supporting evidence available. All data, both consensus and conflicting, are displayed. To support users, each variant record includes a star-rated “Review status” summarizing the current status of the variant’s classification, for c.10108C>T a


two-star rating meaning “criteria provided, multiple submitters, no conflicts.” Submitted variant entries provide an “Evidence details” link showing the submitter’s comments regarding the clinical significance of the variant.
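How a set of submissions collapses into a summary status can be illustrated with a short Python sketch. It is a deliberate simplification, not ClinVar's actual aggregation logic: the function name is hypothetical, every submission is assumed to come with assertion criteria, expert-panel and practice-guideline records are ignored, and a conflict is assumed to mean that submissions fall into more than one of three broad groups (pathogenic, uncertain, benign). The star counts follow the public review-status scheme as commonly documented and should be checked against the current ClinVar documentation.

```python
# Assumed grouping of five-tier classifications into three broad groups.
GROUP = {
    "pathogenic": "pathogenic",
    "likely pathogenic": "pathogenic",
    "uncertain significance": "uncertain",
    "likely benign": "benign",
    "benign": "benign",
}


def summarize(submissions: list[tuple[str, str]]) -> tuple[str, int]:
    """Return (review status, stars) for a list of (submitter, classification) pairs."""
    if not submissions:
        raise ValueError("at least one submission is required")
    submitters = {name for name, _ in submissions}
    groups = {GROUP[classification.lower()] for _, classification in submissions}
    if len(groups) > 1:
        return "criteria provided, conflicting interpretations", 1
    if len(submitters) > 1:
        return "criteria provided, multiple submitters, no conflicts", 2
    return "criteria provided, single submitter", 1


# Invented submissions for illustration only.
agreeing = [("Lab A", "Pathogenic"), ("Lab B", "Likely pathogenic")]
conflicting = [("Lab A", "Pathogenic"), ("Lab B", "Uncertain significance")]
print(summarize(agreeing))
print(summarize(conflicting))
```

Under these assumptions the first call yields the two-star status quoted above, while the second yields a one-star conflicting status that should prompt the user to open the individual submissions.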

Global Variome shared LOVD

While ClinVar and the “Global Variome shared LOVD” have largely overlapping goals, the origin of the latter is quite different, being an unfunded community-driven initiative. Database design of the GV shared LOVD, defined by the LOVD software, follows international standards [6,7]. The historic roots of the initiative are evident from the displays shown, which include a limited number of custom options. These customization options, which allow database curators to personalize their database and acknowledge their home institution, funding agencies, and others, were essential to secure collaboration from the diverse LSDB community. LOVD-powered databases collect four basic types of data: the Individual investigated, the Phenotype observed, the Screening performed, and the Variants detected. While in the GV shared LOVD most curators simply use the preset standard data fields, some use the “custom columns” option available for the Phenotype and Variant displays. These columns contain data considered essential by the expert curator and are displayed in addition to the standard columns. Individuals and Screenings can also have custom columns, but these are system-wide and cannot be set by curators. The standard entry point into the database is by using a gene symbol (e.g., www.LOVD.nl/DMD). The main display obtained is a gene variant homepage giving important details on the database. Shown are the name of the responsible curator, specific database details, links to other relevant sources, and a range of data display options, including links to view the data using the main genome browsers. The gene homepage also links to general data summaries (“Graphs”) and specific tools like the “Reading Frame Checker,” which predicts the consequences on protein level of multi-exon deletions or duplications.
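The idea behind such a reading-frame check can be sketched as follows. This is not the LOVD tool itself: the function and the exon lengths below are hypothetical, and the only rule applied is that a deletion or duplication of whole exons preserves the reading frame when the total number of coding nucleotides involved is a multiple of three.

```python
def frame_effect(exon_lengths: dict[int, int], first_exon: int, last_exon: int) -> str:
    """Classify a deletion or duplication of exons first_exon..last_exon (inclusive).

    Simplification: lengths are coding nucleotides only, and rearrangements
    that remove the start or stop codon are not handled.
    """
    if first_exon > last_exon:
        raise ValueError("first_exon must not exceed last_exon")
    total = sum(exon_lengths[n] for n in range(first_exon, last_exon + 1))
    return "in-frame" if total % 3 == 0 else "out-of-frame"


# Hypothetical coding lengths (in nucleotides) for four consecutive exons.
example_lengths = {10: 120, 11: 95, 12: 151, 13: 102}

print(frame_effect(example_lengths, 10, 10))  # 120 nt removed
print(frame_effect(example_lengths, 11, 11))  # 95 nt removed
print(frame_effect(example_lengths, 11, 12))  # 95 + 151 = 246 nt removed
```

With these invented lengths, deleting exon 11 alone disrupts the frame, whereas deleting exons 11 and 12 together restores it, which is the kind of distinction that matters when predicting the severity of multi-exon rearrangements.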
The gene homepage display shows a series of tabs (“Individuals,” “Diseases,” “Screenings,” “Variants,” “Genes,” and “Transcripts”) linking to the data collected (Fig. 11.4A). Per tab, a mouseover menu shows options to more specific subsets of the data (Fig. 11.4B). Selecting the “Variants” tab lists all variants, ordered based on their position in the active transcript (from 5′ to 3′). Variant descriptions given follow HGVS nomenclature and offer a choice between a transcript-based (c.) and a genomic chromosomal description (g.), the latter for genome builds GRCh37 and GRCh38 (Fig. 11.5A). Zooming in on specific subsets of the data is achieved using the query boxes at the top of each data column. Complex queries are supported by querying per column and by using Boolean operators for AND (space), OR (“|”), and NOT (“!”). An overview of all variants in the database, displayed on one of the major genome browsers, can be obtained from the gene homepage (Fig. 11.5B). A local genome browser view around a selected variant can be obtained from the Variant record. Compared to ClinVar, LOVD has some unique features. Given the structure of the database and the data collected, it is rather easy to get from a Variant to an Individual and the associated Phenotype. In cases where a gene has an active curator, often a lot of detailed phenotypic data will have been collected and displayed. For a specific gene, several different transcripts can be linked to show variant listings (e.g., www.LOVD.nl/CDKN2A). In an LOVD database there is a mandatory “RNA” column showing whether the consequences of the variant on RNA level have been investigated and, when yes, how they affected the transcript, described using HGVS recommendations. When a variant affects


FIGURE 11.4 The Global Variome shared LOVD. Standard LOVD display after selecting the DMD gene. (A) The database offers displays based on “Genes,” “Transcripts,” “Variants,” “Individuals,” “Phenotypes” (diseases), and “Screenings.” Clicking a tab will display the information selected. (B) Per tab, specific menu options become available through a mouse-over. The Variant tab offers display of all genomic variants, all variants affecting a transcript or variants affecting a specific gene (here DMD). (C) The database offers a range of special record types to label specific records. These labels can be used to select specific records (see text).

RNA processing, the consequences on protein level are indicated in the “Protein” column. Other databases, including ClinVar, have no RNA field and go directly from DNA to (predicted) protein consequences, thereby leaving available experimental data unreported. Unlike other databases, LOVD shows the combination of variants identified in an individual, in one gene or in different genes. LOVD therefore makes it possible to see in which combinations variants have been found in recessive or digenically inherited diseases and whether this has consequences for disease severity. For variants in the CYP genes (see www.LOVD.nl/CYP2D6), this allows the detailed display of all variants on a specific allele (e.g., CYP2D6*56A, databases.lovd.nl/shared/individuals/00074484) as well as listing the two alleles in one individual (see databases.lovd.nl/shared/individuals/00046493). Besides standard Variant records, linked to an Individual and Phenotype, the GV shared LOVD contains a set of specifically labeled records (Fig. 11.4C). Each variant can have one “SUMMARY record,” when available showing the opinion of the curator(s) or an international expert panel regarding the classification of the variant (e.g., ENIGMA for BRCA1/BRCA2 and InSiGHT for the colon cancer genes). “CLASSIFICATION records” are used when labs are not able to share Individual and Phenotype data but are willing to share their classification of the variant, an option pioneered by all Dutch diagnostic labs [12]. “In vitro (cloned)” records are used to show data resulting from assays testing the consequences of the variant on the function of the gene/protein. “In silico” records show

FIGURE 11.5 The Global Variome shared LOVD variant query. (A) The database was queried for variants in the DMD gene and next for all unique variants (selected from “Variants” tab). Using the “Exon” column query box (top of column) variants in exon 59 were selected and using the “DNA change” (cDNA) column, variants starting with “c.89”. A range of variants have been retrieved, each column showing summary data. The “Effect” column shows the reported consequence(s) for the variant on gene function, the “Reported” column the number of independent variant submissions, and the “ClassClinical” column the clinical classification(s) submitted. Variant descriptions given are shown at DNA, RNA, and protein level, based on one transcript reference sequence (NM_004006.2, listed above the table), and two genome builds (only hg19 shown), all following HGVS recommendations. The “RNA change” column shows whether RNA was analyzed and, when yes, what was observed (for variants c.8914C>T and c.8938-9T>A splicing was affected). (B) UCSC genome browser view showing the location of all CAPN3 variants in the GV shared LOVD. Variants are concentrated around the exons. Large deletions are shown as blue lines, and duplications as orange lines.


consequences as predicted by bioinformatic tools, “animal model” records show data available from other organisms, and “artifact” records are used to warn of false-positive variant calls. Since these records are labeled specifically, users can either zoom in specifically on these records or exclude one or more types from their display. Detailed phenotype queries can be performed using the “Diseases” tab. Selecting a specific phenotype opens a display showing all Individuals linked to that phenotype. Subsequently selecting “Phenotype entries for this disease” opens a page facilitating a detailed comparison of all Individuals linked to that phenotype. Linking “Variants” and “Phenotypes” can be achieved using the “Full data view for gene” available in the drop-down menu under the Variants tab. LOVD databases offer a range of APIs that exchange information with the central LOVD server, respond to variant queries, and accept data submissions. The submission API is very powerful, allowing submitters to directly link their hospital LIMS system to automatically submit their data. Other databases (including ClinVar) rarely have this advanced option; their submission procedures demand human intervention, with e-mail steps and specific file formats. The submission API, pioneered by two German labs, is used by a growing number of submitters. Access to LOVD databases is supported by a range of short URLs. An HGNC-approved gene symbol can be used to go to the list of known databases for a gene (DMD.variome.org) or the database for that gene in the GV shared LOVD installation (www.LOVD.nl/DMD). From the GV shared LOVD, based on the two-letter country code, a URL like mx.LOVD.org will retrieve a list of all Variants linked to Individuals from that country (in this example, Mexico) and a list of all Variants shared by submitters from that country.
A link based on the database ID (www.LOVD.nl/DMD_000007 for DMD_000007) immediately displays all records in the GV shared LOVD for that variant. Data linked to specific publications, when referenced in the submitted data, can be retrieved using their PubMed ID or DOI (databases.lovd.nl/shared/references/PMID:23900271, databases.lovd.nl/shared/references/DOI:10.1038/ejhg.2013.169). Variant queries across all LOVD installations are offered through a central API and the LOVD website (www.lovd.nl/3.0/search), facilitating queries using a specific genomic position as well as using a (short) range of positions. As mentioned, LOVD installations facilitate access using an API and they participate in the GA4GH Beacon project (beacon-network.org). When the major databases only contain a few observations of a variant, or do not contain any variant reports, the Beacon project can be very helpful to quickly check whether data may be available from other, less frequently used resources.
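The short-URL conventions described above are regular enough to be generated programmatically. The helper functions below are hypothetical and only assemble strings following the patterns quoted in the text; they make no network requests, and whether a given URL resolves should be verified against the live service.

```python
def gene_database_list_url(gene_symbol: str) -> str:
    """List of known databases for a gene, following the DMD.variome.org pattern."""
    return f"{gene_symbol}.variome.org"


def shared_lovd_gene_url(gene_symbol: str) -> str:
    """Gene homepage in the GV shared LOVD, following the www.LOVD.nl/DMD pattern."""
    return f"www.LOVD.nl/{gene_symbol}"


def country_url(country_code: str) -> str:
    """Variants per country, following the mx.LOVD.org pattern (two-letter code)."""
    if len(country_code) != 2:
        raise ValueError("expected a two-letter country code")
    return f"{country_code.lower()}.LOVD.org"


def variant_url(database_id: str) -> str:
    """All records for one variant, following the www.LOVD.nl/DMD_000007 pattern."""
    return f"www.LOVD.nl/{database_id}"


def reference_url(kind: str, identifier: str) -> str:
    """Data linked to a publication, addressed by PubMed ID or DOI."""
    if kind not in ("PMID", "DOI"):
        raise ValueError("kind must be 'PMID' or 'DOI'")
    return f"databases.lovd.nl/shared/references/{kind}:{identifier}"


print(gene_database_list_url("DMD"))
print(shared_lovd_gene_url("DMD"))
print(country_url("MX"))
print(variant_url("DMD_000007"))
print(reference_url("PMID", "23900271"))
print(reference_url("DOI", "10.1038/ejhg.2013.169"))
```

Each call reproduces one of the example addresses given in the text, which makes the functions easy to check by eye before they are used in, for example, a report template.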

Other databases

While their main focus is on small variants, both the GV shared LOVD and ClinVar databases include large rearrangements and genomic structural variation (multi-gene deletions and duplications, translocations, deletion-insertions, transpositions, etc.) as well. There are, however, other databases which have focused on structural variation specifically, e.g., NCBI’s database of human genomic Structural Variation (dbVar, www.ncbi.nih.gov/dbvar) and EBI’s Database of Genomic Variants archive (DGVa, www.ebi.ac.uk/dgva), currently transitioning its data to the European Variation Archive (EVA). The Database of Genomic Variants (DGV, dgv.tcag.ca) includes selected high-quality datasets from dbVar and DGVa, which have been further curated for accuracy and validity. The DECIPHER database (decipher.sanger.ac.uk) was initiated as a platform to store, analyze, and share plausibly pathogenic structural variants from well-phenotyped patients suffering from genetic disorders. While it initially


contained data from large structural variants only, when whole-exome sequencing became available, it started to include small variant data as well. The focus of DECIPHER is not to catalog variants but to offer a platform to establish new gene-disease links by providing tools for variant analysis and the identification of other patients exhibiting similar genotype-phenotype characteristics. Another important source of genetic variation is the Catalogue Of Somatic Mutations In Cancer (COSMIC, cancer.sanger.ac.uk/cosmic). COSMIC is the world’s largest resource containing somatic variants and their impact in human cancer and can, for example, be used to determine cancer-specific mutation profiles. COSMIC is manually curated and updated four times each year. Since it is often difficult to discriminate between somatic and germline variants, COSMIC will contain many germline variants as well. As for DNA, there are hundreds of databases focusing solely on proteins and their many different features [13]. Most informative in relation to clinical diagnosis are those focusing on protein structure and collecting information on the consequences of variants on protein level. We would like to mention two specifically: the Protein Data Bank (PDB, www.rcsb.org) containing information about the 3D shapes of proteins and UniProt (www.uniprot.org) containing sequences and annotations for over 120 million proteins across all branches of life. These databases facilitate the analysis of evolutionary conservation of a specific protein, highlighting evolutionarily conserved and nonconserved residues, as well as the conservation of specific functional protein domains, e.g., an ATPase or DNA-binding domain, and their conservation across different proteins.

Final considerations

Most gene variant databases do not have the capacity to review published literature and manually add the data reported. Consequently, getting data published in scientific literature does not guarantee that these data will be incorporated in the major central repositories and thereby become available for automated API-based queries from exome and genome sequencing annotation pipelines. A growing number of journals have therefore started to demand database submission as an inherent step of the process to review and accept manuscripts for publication. Although database submission is obviously in the interest of authors, because their data will be easier to find, they mostly consider it additional work. However, a lot is gained when data are submitted to a database first [14]. The author will get a free consistency check, improving overall data quality. In addition, database submission gives the option to link to the database for tabular overviews (e.g., the GV shared LOVD DOI link mentioned above), obviating the need to add supplemental data to the manuscript submitted. Before submitting data, authors should first check the options available, read the manual or follow explanatory presentations/videos offered, and browse the database to get an idea of the data collected and the format used to store them. It is also wise to compare the options offered. While most databases have the option to submit data individually through web forms, others may offer a batch upload using a standard format or assistance when data have been stored in other formats. The GV shared LOVD even supports automated submission by linking to a hospital LIMS system. It is important to note that, while databases do check data quality before data are uploaded, they only display the information as submitted by users. In general, they do not rate this information or, for example, classify variants.
It is therefore not correct to state that according to the GV shared LOVD or ClinVar the variant is classified as pathogenic (class 5). Classifying a variant is the responsibility of the


investigator; the databases’ aim is to support this classification by displaying relevant data. For a specific variant this may mean “conflicting data” is shown, i.e., different variant classifications submitted by more than one submitter. Conflicting classifications often stem from a lack of initial information, i.e., when the variant was reported for the first time (older submissions) it could not be accurately classified. Over time, more information will be collected and a more specific classification will become possible (toward benign or pathogenic). Unfortunately, submitters rarely come back to update older submissions. In other cases, conflicting classifications show that current opinions have not yet converged on a consensus classification. When performing variant analysis, an excellent starting point is a genome browser. Using the browser, the gene, transcript, and protein involved, its genomic location, evolutionary conservation, etc., can be visualized. In addition, when the proper tracks have been activated, one can immediately see what variants and variant types have been identified before and where more detailed information can be found. Informative tracks to activate include dbSNP/EVA, gnomAD (ExAC, EVS), LOVD, ClinVar, COSMIC, HGMD, DGV, DECIPHER, OMIM, UniProt, GWAS, and Publications. The browser will, when showing that a variant is present in any of these resources, provide a direct link to this information. A problem that cannot be solved easily is that there are far too many databases [15]. While the intention behind starting a new database is generally good, in the end a new database just worsens the problem. Instead of starting a new database, the way forward is to find a database that comes close but lacks the dimensions you consider essential, join efforts, and add the missing features.
Unfortunately, such collaborations are not supported by funding agencies, where it is often easier to get support to build a new database than to extend and thereby sustain an existing database. An important task is performed by the database curator. In general, a curator is an unpaid, voluntary expert willing to spend some free time on the database. A conscientious database curator, actively collecting information and contacting colleagues to convince them to share unpublished information, will find that the efforts invested are well appreciated. The curator will receive many compliments and invitations to present the database work, be considered a world-leading expert on the gene/disease, get many opportunities for collaborative research, and have the option to publish database updates. Given the number of genes in the human genome and the number of gene variant databases that still lack an active curator, anybody working in the field of clinical diagnostics should feel obliged to become involved and volunteer for at least one gene!

References

[1] den Dunnen JT, Dalgleish R, Maglott DR, et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum Mutat 2016;37(6):564–9. https://doi.org/10.1002/humu.22981.
[2] Aartsma-Rus A, Fokkema I, Verschuuren J, et al. Theoretic applicability of antisense-mediated exon skipping for Duchenne muscular dystrophy mutations. Hum Mutat 2009;30:293–9. https://doi.org/10.1002/humu.20918.
[3] Amberger JS, Bocchini CA, Schiettecatte F, Scott AF, Hamosh A. OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res 2015;43:D789–98. https://doi.org/10.1093/nar/gku1205.
[4] Köhler S, Carmody L, Vasilevsky N, et al. Expansion of the human phenotype ontology (HPO) knowledge base and resources. Nucleic Acids Res 2019;47:D1018–27. https://doi.org/10.1093/nar/gky1105.
[5] Karczewski KJ, Francioli LC, Tiao G, et al. Variation across 141,456 human exomes and genomes reveals the spectrum of loss-of-function intolerance across human protein-coding genes. BioRxiv 2019. https://doi.org/10.1101/531210.
[6] Cotton RG, McKusick V, Scriver CR. The HUGO mutation database initiative. Science 1998;279:10–1. https://doi.org/10.1126/science.279.5347.10c.
[7] Vihinen M, den Dunnen JT, Dalgleish R, Cotton RG. Guidelines for establishing locus specific databases. Hum Mutat 2012;33:298–305. https://doi.org/10.1002/humu.21646.
[8] Fokkema IF, Taschner PE, Schaafsma GC, et al. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 2011;32:557–63. https://doi.org/10.1002/humu.21438.
[9] Béroud C, Hamroun D, Collod-Béroud G, Boileau C, Soussi T, Claustres M. UMD (universal mutation database): 2005 update. Hum Mutat 2005;26:184–91. https://doi.org/10.1002/humu.20210.
[10] Stenson PD, Mort M, Ball EV, et al. The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 2017;136:665–77. https://doi.org/10.1007/s00439-017-1779-6.
[11] Landrum MJ, Kattman BL. ClinVar at five years: delivering on the promise. Hum Mutat 2018;39:1623–30. https://doi.org/10.1002/humu.23641.
[12] Fokkema IFAC, van der Velde KJ, Slofstra MK, et al. Dutch genome diagnostic laboratories accelerated and improved variant interpretation and increased accuracy by sharing data. Hum Mutat 2019;40:2230–8. https://doi.org/10.1002/humu.23896.
[13] Xu D. Protein databases on the internet. Curr Protein Pept Sci 2012;Chapter 2:Unit 2.6. https://doi.org/10.1002/0471140864.ps0206s70.
[14] den Dunnen JT. Efficient variant data preparation for human mutation manuscripts: variants and phenotypes. Hum Mutat 2019;40:1009. https://doi.org/10.1002/humu.23830.
[15] den Dunnen JT. Yet another database? Hum Mutat 2018;39:755. https://doi.org/10.1002/humu.23429.
[16] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24. https://doi.org/10.1038/gim.2015.30.

Internet resources

COSMIC cancer.sanger.ac.uk/cosmic
ClinVar www.ncbi.nlm.nih.gov/clinvar
dbVar www.ncbi.nih.gov/dbvar
dbSNP www.ncbi.nlm.nih.gov/snp
DECIPHER decipher.sanger.ac.uk
DGVa www.ebi.ac.uk/dgva
DGV dgv.tcag.ca
Ensembl www.ensembl.org
EVA www.ebi.ac.uk/eva
gnomAD gnomad.broadinstitute.org
GWAS catalog www.ebi.ac.uk/gwas
GWAS Central www.gwascentral.org
GV shared LOVD databases.lovd.nl/shared
HGMD www.hgmd.cf.ac.uk
HPO hpo.jax.org
LOVD www.LOVD.nl
LOVD www.LOVD.nl/3.0/search
LOVD mx.LOVD.org
LSDB list lsdb.variome.org
PDB www.rcsb.org
OMIM www.omim.org
UCSC genome.ucsc.edu
UniProt www.uniprot.org
UMD www.umd.be

CHAPTER 12

Approaches to the comprehensive interpretation of genome-scale sequencing

Christina Anne Austin-Tse¹, Ozge Ceyhan-Birsoy²

¹Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, United States; ²Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, United States

With rapidly decreasing costs, whole-exome sequencing (WES) and whole-genome sequencing (WGS), here referred to as genome-scale sequencing (GS), are increasingly appealing testing approaches in both clinical and research settings. Both settings benefit from the application of a single test that can provide information on virtually all known genes, although each will harness the data in different ways.

Clinical applications of GS

Clinical applications of GS have attracted significant attention in recent years. While all clinical GS tests are focused on the subset of genes associated with human disease (~5000), clinical testing approaches can be further subdivided into disease-targeted diagnostic testing and screening of apparently healthy individuals.

Diagnostics

By far the largest patient group for which clinical diagnostic GS has been applied is infant and pediatric patients [1]. This is partly attributable to the fact that individuals presenting with early-onset disease are more likely to have a genetic etiology for their symptoms. Additionally, there is significant motivation to achieve a correct diagnosis as quickly as possible in the pediatric setting as it may allow for management or treatment of the underlying disorder prior to the onset of irreversible damage. Although the analysis of GS data is a technologically demanding and labor-intensive process, recent advances have shown that sequencing, data processing, and preliminary analysis can be completed in as little as 26 h [2]. Such rapid turnaround time means that this technology can now be considered as a diagnostic approach in highly time-sensitive settings, such as a critically ill infant in the NICU or prenatal diagnosis of a fetus with anomalous findings on ultrasound [3,4]. Although diagnostic rates in the adult population are typically lower than in the pediatric population [5], GS has proved useful in a variety of adult-onset diseases such as ataxia, cardiovascular disease, and kidney disease [6–9]. One advantage of GS, which can survey virtually all known human disease genes, is that it can eliminate the possibility of selecting an overly targeted genetic testing strategy that may miss a relevant diagnosis. Secondly, GS can allow for investigation of the most recently discovered disease

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00012-0 Copyright © 2021 Elsevier Inc. All rights reserved.

237

238

Chapter 12 Approaches to the comprehensive interpretation

genes without additional development efforts that would be needed to update a multigene panel [10,11]. Finally, even after a test has been completed, the broad scale of genomic data captured in GS provides the ability to reanalyze variants in light of new data, which has been demonstrated to further improve diagnostic yield without the need for additional testing [12e15].

Screening Outside of a diagnostic setting, clinical GS has also been applied as a screening tool in apparently healthy individuals. Here, GS represents a single test that can simultaneously identify multiple classes of potentially relevant findings, including risk for disease, carrier status for recessive disorders, pharmacogenomic variants, and even blood typing [16]. In contrast to the diagnostic applications described above, GS for screening purposes is much more commonly applied in adult patients due to numerous ethical issues associated with screening minors for adult-onset disease [17,18], or is carried out as part of clinical research studies in pediatric populations.

Research applications of GS GS can be an invaluable tool in research and has revolutionized novel gene discovery as well as understanding the genetic basis of complex disorders in recent years [19,20]. One of its widely used applications is identifying the causative gene for a rare disorder. As the cost of sequencing continues to decrease, GS has largely replaced the traditional approach of multipoint linkage analysis in affected pedigrees to identify disease genes [21,22]. In the absence of large pedigrees available to study, GS allows the rapid identification of common defective genes in different unrelated patients. Additionally, parent–child trio analysis allows the discovery of de novo variants that underlie a significant portion of genetic disorders. Researchers may also utilize GS to rapidly screen hundreds of genes in particular functional pathways of interest. Methods for filtering variants vary widely depending on the goal of the study, and variants that are classified as having uncertain significance in clinical testing may be considered as candidates for further investigation using in vitro assays or animal models.

Analysis of GS results for various applications Variant annotation and filtration In a standard WES assay, approximately 100,000 positions differing from the reference sequence will be identified. Accordingly, WGS can identify millions of variants in a single individual [23]. The vast amount of variation detected by these methods necessitates a structured approach to narrowing the scope of data to be reviewed. The process is commonly known as data filtration. While filtration methods allow for the interpretation of GS data in a reasonable timeframe, they also have a significant impact on the type of results the test will identify. As a result, it is important to critically evaluate the filtration methods employed in any analysis of GS data. The overall goal of genomic data filtration is to prioritize variation that is most likely to be relevant to the test subject while minimizing the number of nonsignificant variants to be reviewed. In order to achieve this goal, the genomic variant calls generated from the sequencing data must first be annotated with information that provides evidence toward their potential significance. Filtrations are subsequently applied to prioritize annotations relevant to the goals of the sequencing assay. Therefore, in order to adequately discuss filtration strategies, one must also consider the available annotations. Here, we will review some common categories of annotation in the context of how that information can be synthesized into a useful filtration strategy.

Basic gene and variant-level data Some annotations provide basic information about how a genomic change may impact RNA or protein. Since many filtration approaches will aim to return variants in a specific subset of the ~20,000 known human genes, it is clearly important to identify the gene in which a given variant resides. It is also helpful to know the precise region of the gene that is impacted. While many tools exist to evaluate the significance of variants within protein-coding regions of genes, variants that fall in the noncoding regions (i.e., introns outside of the splice consensus region, 5′ and 3′ untranslated regions, and promoters) are often uninterpretable, particularly in a clinical setting. As a result, clinical filtrations may choose to remove variants called in noncoding regions that do not have an established association with human disease. When considering the variants that fall within (or near) coding regions, it is also desirable to know how a genetic variant is predicted to impact the protein. Certain classes of variants, most notably silent variants that do not alter the amino acid sequence, are more likely to be benign changes (although important exceptions exist when such variants may impact splicing). As a result, some filtration strategies may aim to isolate variants that are most likely to impact protein function, such as protein-truncating variants that are predicted to cause loss-of-function (LOF).
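The gene-, region-, and consequence-based filters described above can be sketched in a few lines of Python. This is a minimal illustration only: the record type `AnnotatedVariant`, the region and consequence labels, and the rule for retaining silent variants are all assumptions made for the example, not the vocabulary of any particular annotation tool or laboratory pipeline.

```python
from dataclasses import dataclass

# Hypothetical labels chosen for illustration; real annotators use their own terms.
LOF_CONSEQUENCES = {"stop_gained", "frameshift", "splice_donor", "splice_acceptor"}
NONCODING_REGIONS = {"intron", "5_prime_utr", "3_prime_utr", "promoter"}


@dataclass(frozen=True)
class AnnotatedVariant:
    variant_id: str
    gene: str
    region: str          # e.g. "coding", "splice_region", "intron"
    consequence: str     # e.g. "missense", "synonymous", "stop_gained"
    known_disease_association: bool = False


def keep_variant(v: AnnotatedVariant, gene_list: set) -> bool:
    """Return True if the variant survives the basic filters."""
    if v.gene not in gene_list:
        return False
    # Noncoding variants are kept only with an established disease association.
    if v.region in NONCODING_REGIONS and not v.known_disease_association:
        return False
    # Silent variants are dropped unless they may affect splicing or are already
    # associated with disease (an assumed rule for this sketch).
    if (v.consequence == "synonymous"
            and v.region != "splice_region"
            and not v.known_disease_association):
        return False
    return True


def filter_variants(variants, gene_list):
    """Apply keep_variant to an iterable, preserving input order."""
    return [v for v in variants if keep_variant(v, gene_list)]


def is_predicted_lof(v: AnnotatedVariant) -> bool:
    """Flag protein-truncating consequences predicted to cause loss of function."""
    return v.consequence in LOF_CONSEQUENCES
```

In practice the retained set would be passed on to frequency- and phenotype-based filters, and the labels would come from whichever annotation resource the laboratory uses.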

Population frequency data Another critical piece of information used to filter out variation that is unlikely to be disease causing is the frequency at which the variant is seen in the general population. The availability of exome- and genome-wide variant frequency data through platforms such as the Broad Institute’s Genome Aggregation Database (gnomAD, https://gnomad.broadinstitute.org/) has significantly improved our understanding of the complexity of human genetic variation. These databases have revealed that many variants that were suspected to be disease-causing based on data from small cohort studies are, in fact, seen in the general population at minor allele frequencies (MAFs) that are higher than the prevalence of the associated disease [24]. With appropriate considerations, many of these variants can be ruled out as causative for highly penetrant Mendelian disease. Laboratories may also leverage internal sequencing datasets to identify and exclude high-frequency calls that represent artifacts of their specific sequencing or variant calling technologies. Filtration strategies designed to remove likely non-disease-causing variation depend on defining an MAF cutoff that applies to either: (1) a specific disease of interest or (2) all highly penetrant Mendelian disorders. When calculating such cutoffs, care must be taken to account for disease penetrance, prevalence, genetic and allelic heterogeneity, and inheritance pattern. Guidelines for the determination of filtering allele frequencies, along with a useful web-based calculator tool (https://www.cardiodb.org/allelefrequencyapp/), have recently been published by Whiffin et al. [25]. Use of population frequency data is also discussed in Chapters 4 and 5.
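To make the dependence of an MAF cutoff on prevalence, penetrance, heterogeneity, and inheritance pattern concrete, the following sketch computes a maximum credible allele frequency. The formulas are one simplified formulation assumed here for illustration; they are not taken from this chapter, and the published guidelines and calculator [25] should be consulted for authoritative definitions. The function and parameter names are hypothetical.

```python
import math


def max_credible_af(prevalence, penetrance, max_allelic_contribution,
                    max_genetic_contribution=1.0, inheritance="monoallelic"):
    """Estimate the highest population allele frequency compatible with a
    variant causing the disease (simplified, assumed formulation).

    prevalence               -- disease prevalence, e.g. 1/500
    penetrance               -- probability that a carrier is affected (0-1]
    max_allelic_contribution -- largest share of cases in the gene due to one variant
    max_genetic_contribution -- largest share of all cases due to the gene
    """
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    if inheritance == "monoallelic":
        # Factor 0.5: each individual carries two alleles.
        return (0.5 * prevalence * max_genetic_contribution
                * max_allelic_contribution / penetrance)
    if inheritance == "biallelic":
        # Affected genotype frequency scales with the square of allele frequency.
        return (math.sqrt(prevalence * max_genetic_contribution / penetrance)
                * max_allelic_contribution)
    raise ValueError("inheritance must be 'monoallelic' or 'biallelic'")


def passes_frequency_filter(observed_af, cutoff):
    """Keep a variant only if its population frequency does not exceed the cutoff."""
    return observed_af <= cutoff
```

For example, a dominant disorder with a prevalence of 1 in 500, penetrance of 0.5, and a single variant accounting for at most 2% of cases gives a cutoff of about 4 in 100,000 under these assumptions. A conservative analysis would additionally compare the cutoff against a confidence bound on the observed frequency rather than the point estimate.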

Publication data and phenotype associations Depending on the purpose of the analysis, one may be interested in variants that have been associated with a particular phenotype or group of phenotypes (when the goal is to identify the cause of disease in an affected individual) or all variants that have been associated with disease in the literature. Valuable sources of variant–disease relationships for use in GS filtration include ClinVar (https://www.ncbi.nlm.nih.gov/clinvar), an open access database in which clinical laboratories and researchers alike are able to submit genetic variation they have detected along with the associated phenotypes and interpretations, the Human Gene Mutation Database (HGMD; http://www.hgmd.org), and locus-specific databases such as LOVD (http://www.lovd.nl/).

Inheritance patterns In many cases, a specific Mendelian inheritance pattern is suspected when GS is applied to identify the genetic etiology of a phenotype. While this information can be used to guide any filtration strategy, it has the greatest potential to improve detection rates when multiple members of a family are sequenced [21,26,27]. For instance, in cases where recessive disease is suspected, sequencing can be performed on a trio consisting of the proband and both parents. The proband’s data can then be filtered to identify genes containing two rare variants, each of which was inherited from a different parent. Of course, it is not strictly necessary to have parental data to highlight suspicious variants for recessive traits: A filtration designed to identify all genes in which two rare variants are found can produce the same results [26]. However, in the absence of parental data, a significant amount of follow-up testing may be needed to confirm that the variants of interest reside on different copies of the gene. On the other hand, trios are required for the identification of de novo variation (variants that occurred for the first time in the proband). Here, filtration strategies would identify variants present in the proband but absent from both parents. Finally, when dominantly inherited traits are suspected based on a family history of disease, sequencing of multiple family members can be used to prioritize the variation that is shared among affected individuals and absent from those who are unaffected.
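The trio-based filters described above reduce to simple set operations once each individual's variant calls are available. The sketch below is illustrative only: it assumes each person is represented by a set of variant identifiers they carry, that the proband's list has already been restricted to rare variants, and that a hypothetical `gene_of` mapping supplies the gene for each variant. It ignores homozygous calls, genotype quality, and coverage, all of which matter in a real analysis.

```python
from collections import defaultdict


def de_novo_candidates(proband, mother, father):
    """Variants present in the proband but absent from both parents."""
    return {v for v in proband if v not in mother and v not in father}


def compound_het_candidates(proband_rare, mother, father, gene_of):
    """Genes in which the proband carries at least one rare variant inherited
    only from the mother and at least one inherited only from the father."""
    by_gene = defaultdict(lambda: {"maternal": set(), "paternal": set()})
    for v in proband_rare:
        in_mother, in_father = v in mother, v in father
        if in_mother and not in_father:
            by_gene[gene_of[v]]["maternal"].add(v)
        elif in_father and not in_mother:
            by_gene[gene_of[v]]["paternal"].add(v)
        # Variants seen in both parents or in neither cannot be phased this way.
    return {gene: sides for gene, sides in by_gene.items()
            if sides["maternal"] and sides["paternal"]}


def two_rare_variants_without_parents(proband_rare, gene_of):
    """Fallback when parental data are unavailable: genes with two or more rare
    variants. Phase is unknown, so follow-up testing is still needed."""
    counts = defaultdict(set)
    for v in proband_rare:
        counts[gene_of[v]].add(v)
    return {gene: vs for gene, vs in counts.items() if len(vs) >= 2}
```

The last function corresponds to the parent-free approach mentioned in the text; it returns a superset of the trio-based result, including pairs of variants that may lie on the same copy of the gene.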

Filtration approaches using available annotations Standardized guidelines for the filtration and interpretation of GS data have yet to be established. However, effective approaches have been suggested for a variety of applications [28–31]. As mentioned above, the filtration approach used is highly dependent on the goal of the sequencing test. Here, we will outline two potential filtration approaches that are often adopted in clinical and research settings. To begin, many testing approaches may focus solely on identifying variants in a selected set of genes. These may be expansive lists of all known disease genes, genes known to be relevant to a specific patient phenotype, medically actionable genes used in a population screening, or candidate genes to be tested in a research study. Selecting an appropriate set of genes is often the most difficult step of GS data analysis, particularly given that the pace of genetic discovery means that these lists are constantly evolving. There are a wide range of approaches that scientists and clinicians have taken for this task. If the purpose is to screen for all potentially clinically significant variants in the genome, one may choose to focus on genes that have been associated with disease in affected individuals. Of the ~20,000 known human genes, only ~5000 have been implicated in human disease so far, and this subset of genes is commonly referred to as the “medical exome” [32]. In order to define the “medical exome,” information must be compiled from multiple sources of gene–disease relationships, including OMIM (https://omim.org/), ClinVar, HGMD, and UniProt (https://www.uniprot.org/), and updated frequently to capture recently discovered disease genes. For phenotype-specific gene lists, the most basic method involves a thorough review of the medical and research literature by an expert individual, with the help of databases containing information about human disease genes, such as OMIM and Orphanet (https://www.orpha.net/). However, this is a manual and very time-consuming process. To ease this burden, several data sharing efforts have been developed to distribute curation efforts between laboratories and other experts in the field. Gene–disease relationship evaluations are available from experts in multiple disease areas via the Clinical Genome Resource (ClinGen, http://clinicalgenome.org/). ClinGen has also developed guidelines on how to evaluate gene–disease relationships [33]. In addition, Genomics England’s PanelApp [34] features crowd-sourced gene panels for more than 300 conditions or phenotypes. Alternatively, many tools have been developed to computationally mine through published information about gene–variant relationships with disease phenotypes, such as Phenomizer [35], Phevor [36], and Phive [37]. Many of these tools use Human Phenotype Ontology (HPO, https://hpo.jax.org/app/) [38] terms as standard phenotype descriptions. Additionally, tools such as ClinPhen [39] may be useful to help extract patient phenotypes from electronic medical records and convert them into HPO terms. In a gene discovery setting, scientists may expand their scope beyond known disease genes to genes that encode proteins functioning in a particular pathway of interest, and those that appear as candidates based on in vitro or animal model studies. The number of variants that will be returned is directly proportional to the size of the gene list generated.
In some settings, it may be most efficient to take a tiered approach to gene-based analyses. For example, one could initially screen a restrictive list of well-established disease genes, expanding the search space to include candidate genes only if clearly causative variants are not identified in the initial analysis. For these expanded gene lists, it will often be necessary to limit the results further. Annotations such as population frequency and variant impact may be used to remove variation that is unlikely to have clinical significance (Fig. 12.1). However, a major strength of the GS approach is the ability to search for pathogenic variation in an unbiased manner, which is useful in screening tests but also invaluable in a diagnostic setting where it can enable the detection of atypical disease presentation, phenotypic expansion, or the presence of multiple relevant diagnoses. Since these outcomes may not be captured using the gene list–based analysis described above, some applications of GS may incorporate a more expanded filtration strategy like the one illustrated in Fig. 12.1. This two-tiered filtration is designed to capture clinically significant variants, regardless of the associated phenotype, through the selection of two (potentially overlapping) classes of highly suspicious variants introduced above: 1) variants that are predicted to cause LOF and 2) variants that have been previously identified in affected individuals. The first tier returns variants classified as disease-causing in public databases such as ClinVar and HGMD. However, because these databases contain some assertions that were made prior to the availability of large population databases such as gnomAD, it is often helpful to reduce the number of spurious assertions of pathogenicity by removing the variants that are now known to be common in the general population (here a conservative cutoff of MAF1.2. 
Commensurate with these modest effect sizes has been the revelation that hundreds or even thousands of associated common variants will underlie each complex disease.

Chapter 13 Phenotype evaluation and clinical context

Genomic influences for all diseases are emerging as more complicated than originally anticipated; this includes genomic influence on the phenotype spectrum and severity in classic Mendelian and chromosomal disorders such as cystic fibrosis and Down syndrome. Disease penetrance/spectrum is also subject to gene–gene interaction and gene–environment interaction, with epigenetic modification and regulation at the level of transcription and protein expression [16]. Evolution in our understanding of the genomic architecture of disease has dramatically improved clinical interpretation of genomic data and risk estimation, but much still remains to be understood.

Large-scale data generation When capacity and cost were limited, testing strategies were typically a gene at a time and heavily driven by detailed upfront phenotypic evaluation. As test cost and capacity have improved, it is now more typical that large gene panels or even WES/WGS is requested to simultaneously investigate a broad range of genes, with more detailed consideration of genotype–phenotype post hoc on receipt of the genomic data [10]. These developments have led to an exponential increase in the volume of data generated, both in terms of data per individual and number of individuals. Where previously the vast majority of time and resources were focused on “wet” laboratory costs, the costs of data processing, interpretation, and storage now increasingly predominate. Proposed future models of genetic testing include a “sequencing for life” approach, whereby extensive sequencing data previously generated from an individual are re-interrogated as the understanding of both genomics and the patient’s phenotype progresses [17].

Evolution in clinical diagnostic variant interpretation Historic empirical disease-based interpretation Limited early availability of sufficiently large control series resulted in poor appreciation of the frequency among the “normal population” of genomic variation. Thus, a previously unseen rare variant in a gene associated with the disease under investigation was frequently assumed to be pathogenic, with little additional evidence. This led to the initial erroneous assignation of many variants as pathogenic, with subsequent clinical mismanagement, potentially amplified through family cascade testing [18,19]. As the results of large-scale population sequencing projects became widely available, it became apparent that many of these variants occurred at an appreciable frequency in controls and could not be the high-penetrance pathogenic variants initially ascribed. More conservative approaches ensued, with much more widespread assignation of the category of “variants of uncertain significance” (VUS/VOUS). However, assignation as VUS also creates problems, with inability to definitively rule in or out diagnosis/risk, inconsistent clinical management, and both patients and clinicians grappling with increased uncertainty and misunderstanding [20]. Inconsistent classifications between laboratories were leading to discrepancies between members of the same family being managed at different centers.

International coordination in variant data sharing Sharing of variant data with related clinical details and local interpretation has been sequentially improving. BIC and LOVD were early exemplar initiatives of Locus-Specific Databases (LSDs) for sharing variant-level information but these largely relied on pro bono curation, inevitably leading to an appreciable frequency of erroneous entries, incorrect variant nomenclature, and uncorrected historic classifications [18,21]. With sizable and sustained funding from the National Institutes of Health (NIH), ClinVar was publicly launched in 2013 and has emerged as the de facto international repository for submission of clinical variant classifications [22,23]. Classifications are submitted by clinical and research laboratories, with substantial central infrastructure for evidence grading, cross-evaluation, and resolution of inter-submitter discordance.

Emergence of international frameworks Various expert groups attempted to integrate evidence more systematically to reduce subjectivity in clinical variant classification [24]. For example, in 2008, the International Agency for Research on Cancer (IARC) published a 5-point scale for classifying variant pathogenicity, with the categories of “Benign,” “Likely Benign,” “Uncertain,” “Likely Pathogenic,” and “Pathogenic” defined by probability of pathogenicity [25]. Research collaborations assembled large diagnostic testing datasets for BRCA1/BRCA2 and mismatch repair genes, using them as the substrate for more rigorous, predefined multifactorial analyses incorporating familial phenotype, tumor phenotype, segregation, and co-occurrence in trans [26–28]. However, these approaches relied on the amalgamation of large international datasets and so were not implementable by individual diagnostic laboratories encountering a variant for the first time. In 2015, the American College of Medical Genetics and Genomics (ACMG) published a framework defining a structure by which laboratories should combine evidence of different types (clinical, functional, allelic, population, etc.) [29]. The framework has been widely adopted internationally, resulting in increased interlaboratory concordance in approach and classification [22]. Various organizations have further developed the guidance for use in different clinical contexts and disease areas [30–33].
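A probability-based 5-point scale of the kind described for the IARC scheme amounts to a threshold lookup. The sketch below uses the category names given in the text; the numerical cut-points (0.99, 0.95, 0.05, and 0.001) are the values commonly cited for the IARC scheme and are an assumption here, since the chapter does not state them. The function name is hypothetical.

```python
def iarc_class(probability_of_pathogenicity: float) -> str:
    """Map a posterior probability of pathogenicity to a 5-point category.

    Thresholds are assumed for illustration and should be checked against
    the published scheme before any real use.
    """
    p = probability_of_pathogenicity
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if p > 0.99:
        return "Pathogenic"
    if p >= 0.95:
        return "Likely Pathogenic"
    if p >= 0.05:
        return "Uncertain"
    if p >= 0.001:
        return "Likely Benign"
    return "Benign"
```

The wide "Uncertain" band reflects the conservative intent of such schemes: a variant moves to an actionable category only when the combined evidence is strong in one direction.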

Application of clinical and phenotypic information to variant interpretation and classification As explored in other chapters, the ultimate purpose of diagnostic variant interpretation is to establish which variants identified in a patient are disease-associated, and to determine the strength of association and for which elements of the observed disease/phenotype the variant accounts. Depending on ethnicity and analytical approach, evaluation of an individual’s genome on comparison to a reference genome will yield ~3.5 million single-nucleotide variants (SNVs), >350,000 small insertions/deletions, and >1000 larger structural variants [34,35]. Our ability to discern which of these variants are disease-associated remains imperfect, but over the last three to four decades, significant progress has been made in understanding the types of data that are informative, and how to combine and weight them. Data informative to variant interpretation can be classified into:
1. Nonclinical
   • For example, functional, predictive/computational, and control population data
2. Clinical
   • For example, phenotypic, segregation, and de novo data

FIGURE 13.3 Flowchart of the process of clinical variant interpretation. Elements shown: publicly available clinical data; clinical data from clinical networks; variant-level clinical data; variant-level non-clinical data (functional, population, predictive); variant pathogenicity assessment; expert body guidance; clinical context of family; individualised risk assessment; individualised management plan; patient; clinician; clinical judgement.

The “nonclinical” data are generally collated and interpreted by diagnostic laboratory scientists, and the data assembled are likely to be similar across different diagnostic laboratories (albeit that their interpretation may vary). The available clinical data with which to interpret a variant can vary widely between individual laboratories. The overall approach to case-level clinical variant interpretation and management is summarized in Fig. 13.3.

Sources of clinical data Clinical data sources can be subdivided into:
1. Presenting patient/family
   • From the family in whom the test has been requested
2. Unpublished cases from clinical networks
   • Networks may be disease/phenotype specific or general
   • Networks may be regional, national, or international
3. Publicly available
   • From the scientific literature, e.g., case reports/case series
   • From LSDs

All available clinical evidence must be assembled and subsequently combined with nonclinical evidence to achieve the most informed interpretation of pathogenicity for the variant.

Contribution of the patient under investigation Often, a variant detected on clinical testing of a proband affected with disease may be very rare, with no prior cases or classifications reported. In rare pediatric syndromes in particular, the clinical and genetic features of that very proband/family may provide critical weight in establishing whether the variant is likely to be pathogenic, particularly if the familial phenotype is especially notable or the variant is de novo in an isolated case. Thus, there is a highly dynamic interplay between what the specific patient/family under investigation can add to the growing body of knowledge for that variant/gene and best application of that body of knowledge in the management of the patient/family under investigation.

Cases from clinical networks The variant under evaluation may have been seen previously in another patient tested in that diagnostic laboratory, or in a different laboratory [36]. In many countries, regional/national multidisciplinary networks have emerged which enable direct sharing of laboratory data and clinical details within secure clinical networks. An example of this is the CanVIG-UK (Cancer Variant Interpretation Group UK), a national multidisciplinary group including all 25 UK NHS regional clinical and diagnostic genetics services. The group meets monthly, and has an active email network and a secure online variant-level database for sharing clinical information [31,33]. Not only does this type of group enable ready “pooling” of clinical evidence available from different genetics services, but it also allows discussion of evidence application and independent evaluation of the clinical fit [19]. The DECIPHER database and Matchmaker Exchange are examples of technological facilitation of international networks for sharing of clinical information [34,37]. Using GA4GH beacon technologies, international clinical laboratories can interrogate each other for whether a particular variant has been reported before across the network. If there is a “match,” the respective groups can compare phenotypic data [37].

Publicly available clinical evidence Scientific literature Case reports and case series may contain clinical details about individuals and families in whom the variant under evaluation has been detected. In a large case series, often only very superficial phenotypic information is available, e.g., “affected with cancer.” In contrast, individual case reports or small case series can contain very comprehensive “deep phenotyping” information [38]. This may include radiographic images and reports, clinical photographs or biopsy results, and a detailed pedigree containing disease and genotypic statuses. In order to incorporate this evidence into the clinical interpretation of the variant, the clinician must evaluate its strength. This can be challenging if features are described using vague terms such as “significant” or “several.” The criteria used by authors can be unclear or highly subjective. Definitions may differ widely between both organizations and individuals, for example, categories for the degree of intellectual disability. Transcripts and variant nomenclature may change and should be closely checked.

Best practice would be to contact authors to provide updated or more detailed information, but usually this is not possible. Caution should be exercised if the majority of evidence for variant pathogenicity comes from a single published paper.

Repositories of variant information and locus-specific databases As described above, ClinVar is one of the most widely used repositories of variant classifications [23]. In contrast, there are hundreds of LSDs focusing on specific genes or phenotypes: RettBASE is an example of a disease-specific LSD which only curates variant information pertaining to Rett syndrome and related disorders [39]. Limited clinical information about cases in whom the variant has been seen may be provided by LSD submitters. Additional information about phase and the presence of other variants in the same individual is available through some LSD platforms. Further information can sometimes be obtained by contacting the submitting laboratory. However, laboratories may not consider it appropriate to share detailed clinical information with organizations with whom they do not have a formal arrangement and where explicit patient consent has not been obtained [19]. Obtaining further information is likely to be more feasible if there is overlap between clinical networks and LSD submitters. The cases reported on LSDs are associated with many of the same limitations as those in the scientific literature. In particular, clinical information can be extremely scanty. Care must be taken to avoid “double counting” of a case also described in the scientific literature or reported by another laboratory within a clinical network.

Phenotype assessment Phenotype assessment involves detailed evaluation of all aspects of the patient to inform the differential diagnosis and its potential genetic origins. Clinical Geneticists undertake direct phenotypic assessment of the patient/family under review in clinic, but the same principles should be applied on scrutinizing others’ phenotypic assessments of case data reported in the literature or on databases. Detailed clinical history is essential: where relevant, this should include details of pre- and perinatal events and exposures, attainment of developmental milestones, and a full systems review to elicit any additional events or pathologies. Detailed family history evaluation is also critical to elicit whether family members have overt or occult features suggestive of a common genetic etiology. To clarify reported diagnoses and features in the proband or family members, historic medical records, death certificates, and/or registry confirmations should be sought. A patient-reported diagnosis of “stomach cancer” could turn out to be ovarian cancer or lung cancer with liver metastases after consultation with a cancer registry [40]. Thorough physical examination of the proband is required to evaluate phenotype and in particular to establish the presence/absence of additional syndromic features. It is often informative for other people in the family in whom a diagnosis is being considered to be assessed and examined in order to establish directly the presence/absence of phenotypic features. However, the relevant family members may be deceased, abroad, or unwilling, limiting feasibility. Some aspects of physical examination are somewhat subjective, in particular whether dysmorphic features are abnormal and the severity thereof. There is anticipation that emergent 3D facial photography will improve consistency in cataloguing of variation in facial morphology [41]. Other aspects of a physical examination are relatively objective and binary. Continuous measures such as growth and anthropometric parameters can be objectively quantified. Similar principles apply to the interpretation of investigations. While some features can be objectively quantified, others are subjective and opinions may differ between interpreting specialists. Examples of objective and subjective findings on examination and investigation are shown in Table 13.1.

Table 13.1 Examples of objective and subjective findings on examination and investigation.

Examination finding
   Objective (binary): Single palmar crease; Single midline incisor
   Objective (continuous): Head circumference; Armspan/height ratio; Weight
   Subjective (binary/categorical): Broad thumbs; Coarse facial features; Deep palmar creases; Hyperreflexia; Ichthyosis severity; Retinal vascular abnormalities

Investigation result
   Objective (binary): Third-degree heart block on ECG
   Objective (continuous): Length of QT interval on ECG
   Subjective (binary/categorical): Loss of MMR proteins (tumor IHC); Hydrocephalus (neonatal cranial ultrasound); Tumor histopathological subtype; Signs of skeletal dysplasia (skeletal survey); Neuronal migration defect (MRI brain)

Incorporation of clinical data in variant interpretation When considering the weight of information provided by the observed clinical phenotypes toward assignation of the variant as pathogenic, the following factors must be taken into consideration. We highlight where there is particular relevance to evidence categories/criteria defined in the 2015 ACMG framework [29].

Reliability and robustness of phenotypic data under evaluation As previously discussed, descriptive dysmorphic features can be highly subjective and their presence or significance may be equivocal. Thus, so-called “soft” dysmorphic features on examination (e.g., mild skin syndactyly) or “weak” entities on histopathology (e.g., faint ER staining) should not be weighted heavily in the assessment. Similarly, data on published clinical cases in older or lower profile journals may be deemed to be less robust.


Chapter 13 Phenotype evaluation and clinical context

Completeness of the available information and active inclusion/exclusion of clinical features

Some phenotypic features can be clinically overt; others may be subtle or require specific equipment or expertise. Comprehensiveness of evaluation is critical when assessing the information provided by phenotypic assessment, i.e., whether a feature has been actively ruled out rather than not looked for. For example, an iris coloboma, suggestive of CHARGE syndrome, may be missed on general examination unless specifically looked for. Axillary and inguinal freckling, suggestive of neurofibromatosis type 1 (NF1), will only be identified if the patient’s entire skin surface is examined. The characteristic hypopigmented macules of tuberous sclerosis will only be identifiable on diligent examination with a Wood’s lamp in a darkened room. In many conditions involving an eye phenotype, clinical examination by an experienced ophthalmologist is essential to capture more subtle pathological signs. Additional investigations, including imaging, biochemical tests, or examination of tumor material, may be required to rule specific features in or out. On occasion, specialist investigations (e.g., electrophysiological studies for neurological disorders, audiograms for hearing impairments) may also be required. Some phenotypic features may only be identified via specific invasive tests, such as skin or muscle biopsy.

Presence of other valid explanations for the clinical features observed in the proband

Observation of encephalopathic changes on neonatal neuroimaging may be suggestive of particular genetic disorders, but is less informative in the context of significant birth trauma. IQ and head circumference vary widely among the general population due to polygenic variation. A proband whose educational attainment or head circumference is markedly out of keeping with that of other family members is more likely to have a genetic syndrome. Body mass index (BMI) is driven by polygenic and environmental causes, which are thus likely to cluster in families: severe childhood obesity in a child from a nonobese family is more predictive of an underlying genetic disease.

Absence of classical high-sensitivity features in the proband(s)

If a particular feature is ubiquitous or highly penetrant in specific genetic form(s) of disease, absence of this feature may be informative to evaluation of the “fit”. For example, absence of colorectal cancers within a family exhibiting an excess of gynecological cancers would weigh against pathogenicity of a rare variant in MLH1 or MSH2 (pathogenic variants in which are associated with Lynch syndrome with high penetrance for colorectal cancer). Leiomyomata emerge with age and are present in >75% of patients with molecular diagnoses of hereditary leiomyomatosis and renal cell carcinoma (HLRCC): complete absence of this feature in older patients reduces the likelihood of this diagnosis [42].

Specificity of the observed phenotypic feature(s) for the genetic form of disease

Clinical features that are “exclusive” to pathogenic variants in a specific gene (i.e., have 100% specificity) are termed “pathognomonic.” These features are extremely rare but, where present, such a feature has an extremely high positive predictive value for the genetic form of disease (e.g., the ocular finding of anterior lenticonus in an individual with Alport syndrome [43]). Conversely, while café au lait macules can be a sign of NF1, 10%–20% of adults have at least one café au lait macule, so this finding in isolation may not be significant [35]. In the ACMG framework, the PP4 criterion is assigned for highly specific phenotypic features [29].
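The contrast between a pathognomonic sign and a common, nonspecific sign can be made concrete with Bayes' theorem. The sketch below is illustrative only: the function name is ours, and the sensitivity and NF1 prevalence values are hypothetical placeholders rather than figures from this chapter. Only the 10%–20% background rate of café au lait macules comes from the text (its midpoint is used here).

```python
def positive_predictive_value(sensitivity, false_positive_rate, prevalence):
    """Return P(genetic disease | feature present) using Bayes' theorem.

    sensitivity          P(feature | disease)
    false_positive_rate  P(feature | no disease), i.e., 1 - specificity
    prevalence           P(disease) in the population being examined
    """
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Single cafe au lait macule in an unselected adult.
# ASSUMPTIONS: sensitivity 0.95 and prevalence 1/3000 are hypothetical inputs;
# 0.15 is the midpoint of the 10%-20% background rate quoted in the text.
cal_ppv = positive_predictive_value(
    sensitivity=0.95, false_positive_rate=0.15, prevalence=1 / 3000
)

# A pathognomonic feature never occurs without the disease (false-positive rate 0),
# so its positive predictive value is 1 regardless of how rarely it is seen.
pathognomonic_ppv = positive_predictive_value(
    sensitivity=0.5, false_positive_rate=0.0, prevalence=1 / 3000
)

print(f"cafe au lait macule PPV: {cal_ppv:.4f}")
print(f"pathognomonic feature PPV: {pathognomonic_ppv:.1f}")
```

Under these assumed inputs, a feature that is common in unaffected individuals contributes very little on its own, which is the intuition behind reserving PP4 for highly specific phenotypes.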

Genetic heterogeneity: number of genes associated with the genetic form of the disease

While some clinical/phenotypic features may be well associated with disease, there may be multiple associated genes. For example, chromosomal breakage in response to mitomycin C is characteristic of Fanconi anemia, a condition associated with pathogenic variants in over 20 different genes [44]. Retinitis pigmentosa is a distinctive ocular phenotype, but there are >80 genes associated with nonsyndromic forms of the condition. When evaluating the significance of detecting a variant in one gene relative to a particular disease, it is important to consider whether all potentially relevant genes have been investigated. Some phenotypic subelements may be more specific to one or another gene within a group of genes associated with a broader phenotype. For example, ectopia lentis, high arched palate, and osteopenia are seen in individuals with primary congenital glaucoma caused by pathogenic variants in LTBP2, but not CYP1B1 or TEK [45].

Frequency information in genes with rare variation in the general population

For the gene under consideration, it is useful to gauge the frequency in the population of benign rare variants of a given type (e.g., missense) or in a given region of the gene (e.g., a particular domain). This can be compared to the frequency of pathogenic variants of equivalent type in cases with disease. Essentially, this is addressing the question “How remarkable is it that a rare variant of this type has been observed in this gene in this/these probands/families with this/these particular (unusual) clinical feature(s)?”. In the ACMG framework, the PP2 and PM1 criteria incorporate information about rates of pathogenic and benign variation in cases versus the population. PP2 is defined as “missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease” and PM1 is stated as “located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation” [29].

Composition of the type of established pathogenic variation within a gene

In certain diseases, particular types of variants within a gene may be entirely or predominantly responsible for disease causation, reflecting the underlying disease mechanism. For instance, within a cardiomyopathy phenotype, pathogenic variants in MYH7 are predominantly missense, often presumed to be acting through a dominant negative disease mechanism. Loss-of-function variants in MYH7 may modify disease phenotypes, but are not generally considered disease causing in isolation [46,47]. In contrast, pathogenic variants in MYBPC3 causing cardiomyopathy tend to be loss-of-function variants causing disease through haploinsufficiency [48].


Frequency of variation observed in cases with disease compared to the control population

If a specific variant has been observed multiple times in cases of a particular disease, this gives additional weight to its implication in disease. However, the weight of this evidence is predicated on (i) the frequency of the variant in the general population and (ii) the number of cases tested from which these positive cases have been sampled. If the cases have been assembled from a combination of local laboratories, LSDs, and the literature, it is impossible to accurately estimate the total number of tests represented. Furthermore, when comparing observations in cases with those in controls, it is important that the ethnicity of the population control data matches that of the case data. The sex and age of the control series must also be taken into account, particularly when a disease may be late onset or have reduced penetrance. In the ACMG framework, the PS4 criterion has been assigned to represent the contribution to variant interpretation of such “case–control” evidence, stated as “the prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls” [29].
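As a rough illustration of how such case–control evidence might be quantified, the sketch below computes an odds ratio and a one-sided Fisher exact p-value from a 2 × 2 table of carrier counts. It is a minimal example using only the Python standard library. The function names and the counts are hypothetical, and it deliberately ignores the ancestry, sex, and age matching discussed above, all of which would need to be addressed before any real PS4 assessment.

```python
from math import comb


def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Odds of carrying the variant in cases divided by the odds in controls."""
    a = case_carriers
    b = case_total - case_carriers
    c = control_carriers
    d = control_total - control_carriers
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction: add 0.5 to every cell to avoid dividing by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def fisher_one_sided_p(case_carriers, case_total, control_carriers, control_total):
    """P(at least this many carriers fall among cases) under the hypergeometric null."""
    total = case_total + control_total
    carriers = case_carriers + control_carriers
    most_extreme = min(case_total, carriers)
    numerator = sum(
        comb(carriers, x) * comb(total - carriers, case_total - x)
        for x in range(case_carriers, most_extreme + 1)
    )
    return numerator / comb(total, case_total)


# HYPOTHETICAL counts: 8 carriers among 1,000 tested cases,
# 10 carriers among 50,000 ancestry-matched controls.
or_estimate = odds_ratio(8, 1000, 10, 50000)
p_value = fisher_one_sided_p(8, 1000, 10, 50000)
print(f"odds ratio = {or_estimate:.1f}, one-sided p = {p_value:.2e}")
```

The calculation is only as good as its denominators: if the number of cases tested is unknown, as when observations are pooled from laboratories, databases, and the literature, the `case_total` argument cannot be filled in honestly.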

Mode of inheritance and segregation of disease

The degree to which patterns of disease match the pattern of variant inheritance within a family can be highly informative in assessing a variant’s likelihood of pathogenicity. Confirmation that a variant has arisen de novo when a child demonstrates an unusual phenotype and their parents do not is strongly suggestive of that variant being causative of the phenotype in the child and thus pathogenic. Likewise, demonstration that the variant segregates with disease over multiple meioses in an extensive multicase family lends substantial weight to the variant being causative of disease in the family. However, inference of the weight of evidence derived from segregation data is highly contingent on the specificity of the phenotype for the gene in question, its age-related penetrance, and the robustness with which genotype/phenotype correlations have been ascertained and excluded in the family. Many phenotypes, such as breast cancer, are relatively common. The presence of phenocopies can lead to an apparent lack of segregation. Reduced and age-related penetrance can also make the interpretation of segregation data more challenging [49]. Determining the affected status of parents of a proband in order to interpret potential de novo evidence may not be straightforward, particularly when there is wide variation in the population of features of a condition, such as increased head circumference in Cowden syndrome. The possibility of nonpaternity must be taken into account, and parental mosaicism can also confuse any assessments made.

In the ACMG framework, the PP1 criterion has been assigned to represent the contribution to variant interpretation of strength of segregation (“cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease”). The number of informative meioses, individuals, and families affects the recommended PP1 evidence weights. The PS2 and PM6 criteria represent the contribution of de novo observations of the variant (PS2: “de novo (both maternity and paternity confirmed) in a patient with the disease and no family history,” PM6: “assumed de novo, but without confirmation of paternity and maternity”). Building on the original ACMG framework, several groups have recommended that de novo and segregation evidence can potentially be used at an evidence strength level of up to “very strong” where the phenotype is highly specific [31,49,50].
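The reason the number of informative meioses matters can be seen from a deliberately simplified calculation. If the variant were unrelated to disease, each informative meiosis would have a one-in-two chance of transmitting it to an affected relative, so perfect cosegregation across n meioses would arise by chance with probability (1/2)^n. The sketch below encodes only this idealized case. It assumes full penetrance, no phenocopies, and confirmed relationships, which are exactly the complications described above, and the function names are ours.

```python
from math import log10


def chance_cosegregation_probability(informative_meioses):
    """Probability of perfect cosegregation by chance alone: (1/2)^n.

    ASSUMPTIONS: full penetrance, no phenocopies, and every counted
    meiosis is genuinely informative.
    """
    if informative_meioses < 0:
        raise ValueError("informative_meioses must be non-negative")
    return 0.5 ** informative_meioses


def simplified_lod(informative_meioses):
    """Idealized LOD score at zero recombination: n * log10(2)."""
    return informative_meioses * log10(2)


for n in (1, 3, 5, 10):
    print(
        f"{n:>2} meioses: chance probability = "
        f"{chance_cosegregation_probability(n):.4f}, "
        f"simplified LOD = {simplified_lod(n):.2f}"
    )
```

A single affected relative carrying the variant is therefore weak evidence (probability 0.5 by chance), whereas cosegregation across many meioses in a large family is much harder to explain by chance. Real analyses need penetrance-aware models rather than this shortcut.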


Management of the patient based on the genomic data

Once the totality of available information has been assembled, any potentially relevant rare variant(s) detected are assigned classification(s), namely as being (likely) pathogenic, uncertain, or (likely) benign. If it is concluded that a detected variant is (likely) pathogenic, the clinician must next make a judgment about whether it is likely to explain the observed phenotype, wholly, predominantly, or in part. If it is deemed largely irrelevant to the phenotype, the variant is considered an “incidental” finding but may still have clinical implications.

Genomic findings of uncertain significance

Having gathered all currently available clinical and nonclinical evidence, the consensus classification for a variant may be as “of uncertain significance,” for which management should be equivalent to that of a negative test [51]. In some clinical circumstances, additional evidence may be readily obtainable which could lead to variants being up- or downgraded. Testing of parents and relatives to obtain phase, de novo, or segregation data may enable definitive classification as (likely) pathogenic, or demotion to (likely) benign. More in-depth phenotyping of probands or relatives may also add key information. Identification of other phenotypically similar families with the variant may be possible via the local clinical network or endeavors such as Matchmaker Exchange/DECIPHER [34,37]. Functional assays performed on the patient under evaluation may be informative, in particular assays of splicing activity. However, for many variants, no such additional immediate evidence is attainable. Practice varies widely, but recent pan-UK laboratory guidance recommends that VUS should only be included on reports if additional evidence might change the classification or an MDT considers there to be clinical utility in reporting the variant [31].

According to the ACMG framework, the category of “variants of uncertain significance” (VUS/VOUS) encompasses all variants with a likelihood of pathogenicity of 10%–90% [29]. This is clearly a wide bracket, with some variants being much more likely than others to be pathogenic. Some groups have sought to provide gradation for this uncertainty and would recommend that variants in the highest bracket of uncertainty also be included on the clinical report and prioritized for active monitoring for additional evidence [31,52,53].

Regular proactive review of all catalogued variants for newly available evidence would constitute best practice but may not be feasible for all diagnostic laboratories, particularly those with more limited resources. Responsibility for variant review is an ongoing ethical and practical debate. The local approach to variant review should be clearly communicated to the patient in order to give realistic expectations and advice regarding whom to follow up with if questions arise. The approach to variant review will depend on the extent of the evidence currently available, the clinical context, and the impact that variant reclassification would have. If variant classification changes as a result of re-review, decisions must be made on a case-by-case basis regarding whether to reissue all patient reports pertaining to the variant and how best to disseminate reclassification information among clinical-laboratory networks. Specific policies and approaches may vary between laboratories, depending on resources and service organization, but the main options available for VUS management are:

1. High suspicion VUS for which additional evidence is likely to be available in the future: to be reviewed by the clinical/laboratory team at specified intervals
2. Intermediate VUS for which evidence for or against pathogenicity may or may not become available in the future: patients to recontact the genetics unit if additional information becomes available or in several years’ time
3. VUS which is unlikely to be pathogenic: not for routine review unless triggered by additional information from the patient or clinical/laboratory community, or reidentified in subsequent patient samples

The “negative” genetic result: when no causative variants are found

When a test is negative or the results do not adequately explain the phenotype, the next assessment is whether additional testing is likely to be beneficial. Crucial to this decision is appreciation of the scope and limitations of the first test as well as understanding of the genomic architecture of the phenotype in question. Additional tests may examine alternative mutational mechanisms for the same gene(s), for example, MLPA for dosage abnormalities or triplet-primed PCR for triplet repeat expansions [54]. Alternatively, further testing could involve broadening the gene set analyzed from a limited gene panel to a clinical exome. If a single pathogenic sequence variant is identified in a relevant gene where recessive inheritance is expected, this substantially raises the probability that the other allele has a pathogenic variant. Extended sequencing into the intron or targeted microarray for a copy number variant (CNV) may be indicated. If standard diagnostic testing has been exhausted, recruitment to a research study may be possible if a high index of suspicion for an underlying molecular diagnosis remains. These decisions are guided by factors such as the level of clinical suspicion, the genetic heterogeneity of the phenotype, and the impact that a molecular diagnosis might have on management. If the phenotype is fairly common and polygenic and/or nongenetic mechanisms are thought likely, further investigations for a monogenic cause may not be appropriate. Clinical management recommendations (including any screening, surgery, or chemoprophylaxis) may be given on the basis of existing clinical and familial information. This should include guidance on when and whether families should come back for up-to-date information on testing developments.

Management for a pathogenic variant

Individualized risk estimation

General risk estimates

Once a (likely) pathogenic variant has been detected in the patient, the associated disease risks must be estimated. These may relate to elements of disease not yet manifest in the affected proband in whom the variant was detected, or the risks of disease in family members also identified as carrying the variant. Such risk estimates are typically derived from penetrance analyses in large case series, accessible from the scientific literature or expert body guidance. However, these estimates are highly subject to upward inflation due to ascertainment bias, since cases/families with more severe phenotypes are more likely to have presented clinically and been offered genetic testing. Risk estimation is particularly challenging for rare conditions where the clinical features of few affected patients have been described in detail. Risk estimation is often based on gene-level information, but risks may also vary by variant type, gene region, or even specific variant (see Table 13.2).


Table 13.2 Examples of phenotypic variation by variant type, gene region and individual variant.

Specificity level: Variant type
Example: Truncating TP53 variants generally have a lower penetrance than pathogenic missense variants in the DNA-binding domain [55]

Specificity level: Gene region
Examples: Pathogenic variants in exons 24–32 of FBN1 may cause atypically severe, early-onset Marfan disease [53]; Disease risks vary widely between the ovarian cancer (OCCR) and breast cancer (BCCR) cluster regions of BRCA1 and BRCA2 [56]

Specificity level: Specific variant
Example: Case–control data suggest ATM c.7271T>G has substantially higher penetrance for breast cancer than other disease-associated variants in ATM [57]

Risk estimates derived from cases/families ascertained on phenotype are likely to be upwardly biased, even when based on prospective data (e.g., the prospective Lynch Syndrome Database [58]). Furthermore, the risk estimates quoted for pathogenic variants in a particular gene may encompass a relatively wide range, even when supported by large datasets. Distilling these risks into an individualized risk for a specific patient is challenging as the clinician is empirically seeking to take into account the context of ascertainment, the family history, and the additional risk factors. For some conditions, tools based on modeling of large datasets can assist the clinician with this complex data integration. For example, BOADICEA is a programme incorporating single gene variant data, polygenic risk scores, family history, and/or nongenetic factors in order to enable estimation of residual lifetime risk of breast and ovarian cancer [59]. While such tools may be useful, implementation may be limited by missingness of data and uncertain personal and family information.

Contextualizing risk estimation based on pattern of disease and family history

The clinical and family context is key when interpreting results and giving risk guidance. While in most genes, there will be a range of phenotypes associated with pathogenic variants, specific phenotypic features, severity, or early-onset disease may appear to “cluster” within certain families. While chance and shared environmental factors will account for some of the observed clustering, additional oligogenic and polygenic modifiers are largely responsible for such patterns (see the following sections). Estimating risks for family members based on inheritance patterns may also be complex. For example, pathogenic CLCN1 variants can cause myotonia congenita through either autosomal dominant or autosomal recessive inheritance patterns. They may also lead to “semidominant” inheritance patterns, whereby homozygotes are more severely affected than heterozygotes [60]. In conditions such as Huntington’s disease and myotonic dystrophy, phenotype severity can increase from one generation to the next, as the triplet repeat expansions causing disease may increase in length during transmission from a parent to a child, a phenomenon called “anticipation” [35].

Hypomorphic variants

Binary characterization of variants as either “high-penetrance pathogenic” or “wholly benign” is increasingly recognized as being oversimplistic. The relationship between variant pathogenicity and disease penetrance is likely a more nuanced continuum. There are well-documented examples of variants that are disease-associated but hypomorphic or of reduced penetrance [61]. Compared to more “classic variants,” these hypomorphic variants are associated with a less severe phenotype, lower penetrance, and/or later onset of disease. There is increased challenge both in establishing accurate risk of disease and in defining optimal patient management for such hypomorphic variants.

Moderate risk genes

In the field of susceptibility to common cancers in particular, large gene discovery experiments have enabled identification of “moderate penetrance” genes such as CHEK2 and ATM, for which pathogenic variants have a relative risk (RR) of disease of two- to threefold. Testing segregation of a pathogenic variant in a “high-penetrance” cancer susceptibility gene (CSG) such as BRCA1 or BRCA2 will dichotomize risk in family members as high or low; testing for the presence or absence of a CHEK2 pathogenic variant of RR = 2 detected in a family with multiple cases of breast cancer offers a less clear distinction. Much of the familial cancer risk in such a family is thus likely attributable to factors other than the CHEK2 variant, namely, other oligogenic and polygenic mechanisms (see the following sections). Therefore, a negative predictive test for the familial CHEK2 variant may not necessarily indicate that an individual has a significantly lower cancer risk. Again, optimal patient management for variants in moderate penetrance genes is not well agreed [62].
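The arithmetic behind this point can be sketched with a crude multiplicative model. Everything below is a teaching illustration, not a validated risk model. The baseline lifetime risk and the "residual familial" relative risk are hypothetical values chosen for round numbers, the assumption that risks multiply independently is a simplification, and only the RR of 2 for the familial variant is taken from the text.

```python
def absolute_risk(baseline_risk, *relative_risks):
    """Multiply a baseline absolute risk by each relative risk, capped at 1.0.

    ASSUMPTION: risk factors act multiplicatively and independently,
    which is a simplification used here for illustration only.
    """
    risk = baseline_risk
    for rr in relative_risks:
        risk *= rr
    return min(risk, 1.0)


POPULATION_LIFETIME_RISK = 0.10   # hypothetical baseline
RESIDUAL_FAMILIAL_RR = 1.8        # hypothetical: shared polygenic/other factors in this family
MODERATE_VARIANT_RR = 2.0         # RR = 2, as quoted in the text for the familial variant

noncarrier_risk = absolute_risk(POPULATION_LIFETIME_RISK, RESIDUAL_FAMILIAL_RR)
carrier_risk = absolute_risk(
    POPULATION_LIFETIME_RISK, RESIDUAL_FAMILIAL_RR, MODERATE_VARIANT_RR
)

print(f"population:            {POPULATION_LIFETIME_RISK:.2f}")
print(f"relative, non-carrier: {noncarrier_risk:.2f}")
print(f"relative, carrier:     {carrier_risk:.2f}")
```

Under these assumed inputs the non-carrier relative remains above population risk, which is why a negative predictive test for a moderate-risk variant does not simply return the individual to baseline.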

Other genetic factors

In many conditions, phenotypes can be highly variable, even between those within a family with the same variant [63]. For example, pathogenic variants in LMNA may be associated with a phenotypic spectrum ranging from severe congenital muscular dystrophy with cardiac involvement to late-onset, subclinical muscular dystrophy [64,65]. The c.1169T>G variant in BBS1 has been seen in cases of both isolated retinitis pigmentosa and Bardet–Biedl syndrome, a syndrome comprising severe intellectual disability, postaxial polydactyly, and obesity, in addition to retinal abnormalities [66]. Oligogenic and polygenic factors and random monoallelic expression are likely contributory genetic mechanisms to such phenotypic variability, although for most conditions the specific genetic factors acting as modifiers have not been identified [67].

Oligogenic modifier variants. When interpreting the clinical evidence for variant pathogenicity, clinicians compare the phenotype in the patient and their family to what would be expected for a typical family in whom that variant is found. If the phenotype is much more severe than would be expected, or the variant appears to explain some aspects of the phenotype but not others, alternative or additional causes may be considered. For certain phenotypes, the co-occurrence of multiple pathogenic variants is well established and it may be standard practice to look for this during clinical testing. For example, it is estimated that 5% of patients with hypertrophic cardiomyopathy have two or more dominant variants in sarcomeric genes [68]. Those with multiple pathogenic variants may have a more severe or earlier onset phenotype compared to those carrying a single pathogenic variant, demonstrating the utility of performing diagnostic testing on the most severely affected individual within a family [68]. For other phenotypes, such as hearing loss, the contribution of additional pathogenic or modifying variants is likely to be significant but is poorly understood [69].


In families with multiple generations of consanguinity, the presence of multiple recessive disorders in a single individual has been reported. The combination of features from multiple different genetic syndromes can lead to a “blended phenotype” that may be challenging to recognize [70].

Polygenic modifiers. Diseases typically occur on a background of polygenic variation which can influence the phenotype. For many traits/diseases, the common variants identified from GWAS have been combined into polygenic risk scores, by which background risks can be quantified (e.g., breast cancer, BMI, intellectual function [71]). However, for most phenotypes, the common variants identified to date account for only a modest proportion of the total common variant risk heritability (typically

C (p.M34T), in a UK population study. BMJ Open 2012;2(4):e001238.


[70] Baldridge D, Heeley J, Vineyard M, et al. The exome clinic and the role of medical genetics expertise in the interpretation of exome sequencing results. Genet Med 2017;19(9):1040–8.
[71] Mavaddat N, Michailidou K, Dennis J, et al. Polygenic risk scores for prediction of breast cancer and breast cancer subtypes. Am J Hum Genet 2019;104(1):21–34.
[72] Friebel TM, Domchek SM, Rebbeck TR. Modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers: a systematic review and meta-analysis. J Natl Cancer Inst 2014;106(6).
[73] Kim S, Wang M, Tyrer JP, et al. A comprehensive gene-environment interaction analysis in ovarian cancer using genome-wide significant common variants. Int J Cancer 2019;144(9):2192–205.
[74] Daly MB, Dresher CW, Yates MS, et al. Salpingectomy as a means to reduce ovarian cancer risk. Canc Prev Res 2015;8(5):342–8.
[75] Ashley EA, Butte AJ, Wheeler MT, et al. Clinical assessment incorporating a personal genome. Lancet 2010;375(9725):1525–35.

CHAPTER 14

Inherited cardiomyopathies

Ebony Richardson1, Renee Johnson2, Jodie Ingles1,3,4

1Cardio Genomics Program at Centenary Institute, The University of Sydney, Sydney, NSW, Australia; 2Victor Chang Cardiac Research Institute, Sydney, NSW, Australia; 3Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia; 4Department of Cardiology, Royal Prince Alfred Hospital, Sydney, NSW, Australia

Introduction

Inherited heart diseases constitute a heterogeneous group of conditions that affect the structural and/or rhythmic functioning of the heart. This can result in a constellation of symptoms that may include palpitations, shortness of breath, syncope (blackouts), heart failure, and sudden cardiac arrest or death. The cumulative prevalence of inherited heart diseases is estimated to be at least 1 in 200–500 in the general population. While there are common features among many inherited heart diseases, their etiology is broad, and in many cases, still incompletely understood. The identification of causative genes for inherited heart disease has grown exponentially over the past 30 years. The advent of genomic sequencing further accelerated the identification of new candidate genes for research, though many still have limited evidence for a gene–disease association. Despite this, a proportion of families with an inherited heart disease do not have the underlying genetic cause identified by available testing methodologies. This chapter will explore our current understanding of inherited cardiomyopathies, address the key considerations for genetic testing in these conditions, and discuss some of the challenges in variant interpretation and how these translate into the clinic.

Inherited heart diseases

Inherited cardiomyopathies

The inherited cardiomyopathies are a group of cardiac muscle conditions that are characterized by abnormal ventricular morphology or function. They are defined by the American Heart Association and European Society of Cardiology, respectively, as “a disease of the myocardium associated with mechanical and/or electrical dysfunction” and “myocardial disease characterized by structurally and functionally abnormal heart muscle and absence of other diseases sufficient to cause the observed myocardial abnormality” [1,2]. They include hypertrophic cardiomyopathy (HCM), dilated cardiomyopathy (DCM), arrhythmogenic right ventricular cardiomyopathy (ARVC), restrictive cardiomyopathy (RCM), and left ventricular noncompaction (LVNC). While traditionally diagnosed based on specific changes in the ventricular muscle and its function, there can be genetic and

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00008-9 Copyright © 2021 Elsevier Inc. All rights reserved.


Chapter 14 Inherited cardiomyopathies

Table 14.1 Genes with robust evidence for inherited cardiomyopathies.

Phenotype: Hypertrophic cardiomyopathy
Key genes: ACTC1, MYBPC3, MYH7, MYL2, MYL3, TNNI3, TNNT2, TPM1
Diagnostic yield: 30%–50%

Phenotype: Isolated LV hypertrophy
Key genes: ACTN2, FLNC, GLA, LAMP2, PLN, PRKAG2, TTR

Phenotype: Dilated cardiomyopathy
Key genes: BAG3, DES, DMD, DSP, EYA4, FLNC, LMNA, MYH7, PLN, RBM20, SCN5A, TNNC1, TNNI3, TNNT2, TTN, TPM1, VCL

Phenotype: Arrhythmogenic right ventricular cardiomyopathy
Key genes: DSC2, DSG2, DSP, JUP, PKP2, PLN, TMEM43

A; p.Ala1379Asp) in the family. MYH7 is a well-described gene for HCM, DCM, and RCM, though these phenotypes do not typically all occur within one family, and overall the mixed phenotypes in the family were a challenge to interpreting the variant. The variant was shown to segregate with disease in affected individuals within the family.

FIGURE 14.2 Family history of Case 2.


Specifically, the variant is seen once in ClinVar in a patient with HCM and classified as VUS–favor pathogenic, although this patient carries another variant affecting the same amino acid residue (NM_000257.3: c.4135G>A; p.Ala1379Thr) that has previously been reported as pathogenic in HCM. Using the MYH7-modified ACMG framework, the p.Ala1379Asp variant seen in PC and his family is absent from controls (PM2), is a missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before (PM5), cosegregates with disease in multiple affected family members (PP1), and is predicted by multiple in silico programs to have a deleterious effect (PP3). If all the phenotypes were HCM, this would have been a straightforward classification of likely pathogenic (two moderate and two supporting evidence items), but the mixed phenotypes prompted important discussion by the multidisciplinary team about whether this variant is causative of the DCM/RCM phenotypes, and we cannot rule out that another variant may be involved in these presentations within the family.
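The step from a tally of evidence codes to a classification follows fixed combining rules. The sketch below encodes the pathogenic-side combining rules of the generic 2015 ACMG/AMP framework, read monotonically (more evidence never lowers the classification). It is an illustration under stated assumptions: it is not the MYH7-specific modification used for this case, it ignores benign evidence and conflicting criteria entirely, and the function name is ours.

```python
def classify_pathogenic_evidence(very_strong=0, strong=0, moderate=0, supporting=0):
    """Combine counts of pathogenic evidence into a classification.

    ASSUMPTIONS: generic 2015 ACMG/AMP combining rules only; benign and
    conflicting evidence are not modeled; gene-specific adaptations are ignored.
    """
    pvs, ps, pm, pp = very_strong, strong, moderate, supporting

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    if pathogenic:
        return "pathogenic"

    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    if likely_pathogenic:
        return "likely pathogenic"

    return "uncertain significance"


# Case 2 tally from the text: PM2 + PM5 (two moderate), PP1 + PP3 (two supporting).
print(classify_pathogenic_evidence(moderate=2, supporting=2))

# One moderate and one supporting criterion alone does not reach likely pathogenic.
print(classify_pathogenic_evidence(moderate=1, supporting=1))
```

The code reproduces the tally-to-class step only. The multidisciplinary team's concern about the DCM/RCM phenotypes is a judgment about whether the criteria apply at all, which no combining rule can settle.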

Insufficient evidence for variant pathogenicity

Insufficient information or evidence of a link between a variant and disease is perhaps the most common challenge facing cardiac variant curators. In silico programs are often used to provide information on the predicted impact of a variant on gene function, but in the current ACMG framework, unless the variant has been seen in other similarly affected, unrelated individuals, or there are clear functional data supporting disease causation, the variant is often relegated to VUS status, as seen in Case 3.

Case 3: insufficient variant information

MJ was a 44-year-old woman who presented with severe shortness of breath and suspected pneumonia, but was subsequently diagnosed with DCM and severe heart failure and was listed for cardiac transplantation (Fig. 14.3). MJ was otherwise healthy and well and had three children. During her work-up for transplantation, genetic testing was discussed and it was clarified that a family member had been tested overseas. MJ’s sister had been diagnosed with DCM at 38 years and required heart transplantation at 44 years.

FIGURE 14.3 Family history of Case 3.

Genetic testing had identified a missense variant in PLN (NM_002667.3: c.28T>A, p.Ser10Thr), initially classified as VUS. Reevaluation using the ACMG criteria was performed. The variant was not reported in the literature or in population databases including gnomAD, ExAC, dbSNP, and 1000G (PM2), and no other patients have been reported. The variant affects a highly conserved amino acid residue, and in silico prediction programs (4/5 programs) indicate a deleterious effect (PP3). However, these predictions have not been confirmed by published functional studies and their clinical significance is uncertain. In ClinVar, a variant at the same amino acid position, though with a different change (NM_002667.3: c.29C>T, p.Ser10Leu), has recently been reported, but it was classified as VUS and therefore could not be used as PM5 evidence. Family segregation studies identified the variant in both sisters and an unaffected brother, which was not considered sufficient to provide PP1 cosegregation evidence. This is a similar story for many variants where insufficient evidence is available to determine the impact of the variant on the function of the gene. The variant identified in this family is clinically suspicious and located next to a pathogenic PLN missense variant, p.Arg9Cys, that was identified in association with DCM in an American family who presented with refractory congestive heart failure [44]. Early identification of at-risk family members allows for close clinical monitoring and may result in earlier consideration of intervention, including heart transplantation. Current advice suggests that VUS should not be used for cascade testing, so despite the potential benefits of early identification of at-risk family members and a high degree of clinical suspicion about this variant, there is little choice but to wait and reanalyze the variant over time; as new evidence emerges, reclassification may be possible.
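The reason PM5 was available for the MYH7 p.Ala1379Asp variant discussed earlier, but not for PLN p.Ser10Thr, comes down to a simple check: is there a different missense change at the same residue that is itself classified as pathogenic? The sketch below illustrates that check in Python. The helper names are hypothetical, the classification table is a hand-written stand-in for a curated database lookup, and only simple three-letter missense descriptions are handled.

```python
import re
from typing import Mapping, NamedTuple

# Matches simple three-letter missense descriptions such as "p.Ser10Thr".
_MISSENSE = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


class Missense(NamedTuple):
    ref: str
    position: int
    alt: str


def parse_missense(hgvs_p: str) -> Missense:
    """Split a protein-level description into reference, position, and change."""
    match = _MISSENSE.match(hgvs_p.strip())
    if match is None:
        raise ValueError(f"not a simple missense description: {hgvs_p!r}")
    ref, position, alt = match.groups()
    return Missense(ref, int(position), alt)


def pm5_applies(query: str, known: Mapping[str, str]) -> bool:
    """True if a different missense change at the same residue is pathogenic.

    `known` maps protein-level descriptions to a classification string
    (hypothetical structure standing in for a curated database).
    """
    q = parse_missense(query)
    for description, classification in known.items():
        k = parse_missense(description)
        same_residue = (k.ref, k.position) == (q.ref, q.position)
        if same_residue and k.alt != q.alt and classification == "Pathogenic":
            return True
    return False


# MYH7: a different change at residue 1379 is reported as pathogenic.
print(pm5_applies("p.Ala1379Asp", {"p.Ala1379Thr": "Pathogenic"}))  # True

# PLN: the same-residue variant is a VUS; p.Arg9Cys is at a neighbouring residue.
pln_known = {"p.Ser10Leu": "Uncertain significance", "p.Arg9Cys": "Pathogenic"}
print(pm5_applies("p.Ser10Thr", pln_known))  # False
```

A pathogenic variant at an adjacent residue, such as p.Arg9Cys, raises clinical suspicion but does not satisfy this check, which is why the PLN variant remains a VUS.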

Future directions

Improved phenotyping, experimental evidence, and functional data for genetic variants

A move to comprehensive, accessible databases for genetic variants, the development of high-throughput functional assays, and detailed phenotyping efforts that are linked to genotypes and made available in variant databases will all help to improve classification of genetic variants for cardiomyopathies [45]. Gene variant matching initiatives such as Matchmaker Exchange, which bring together a range of groups and databases that would previously have been inaccessible, together with an improved willingness to report in clinical and population databases, will assist in classification of variants [46]. Work currently being undertaken by ClinGen to systematically classify both gene-disease and individual variant associations will also provide important information on the strength of gene-disease relationships and is a useful resource for variant classification [47]. Methods for high-throughput functional analysis will be an important component in clarifying VUS. Animal models have long been the gold standard for functional analyses, but in general they are costly and time-consuming. Newer cardiac models such as zebrafish may provide a high-throughput option for modeling cardiac disease [45]. There is growing recognition of the importance of background genetic variation in patients and the impact this may have on the function of a variant, and how this is assessed in current animal models or in human samples from other individuals should be considered [48]. Newer systems for modeling genetic variants include the use of induced pluripotent stem cells (iPSCs) and organoid bodies. These are derived from the patients themselves, so contain the background genetic variation present in an individual. While much success has been seen in the cardiac field in assessing variants that alter the electrical properties of cells [49], the growing field using heart tissue organoid bodies, which provide a matrix more closely resembling the heart for the study of structural and developmental variants, has exciting potential for the future [50].

Tackling secondary findings of cardiac variants

Traditional cardiac genetic testing has taken place in individuals with a diagnosed heart condition and a high likelihood of identification of a genetic variant. A secondary finding is a genetic result (usually a likely pathogenic or pathogenic variant) in a gene for which the individual does not have a personal history of disease and which was not the primary indication for testing [51]. As greater numbers of healthy people take part in sequencing projects, the likelihood of identifying secondary cardiac variants increases, and for these populations the prognostic significance of these variants is unclear. There is reasonable consensus that these findings, where participants have given appropriate consent, should be returned. However, how and what to return is still under debate [52]. There has been a general move to use the term secondary findings rather than incidental findings, as these genes are being deliberately examined as part of the ACMG recommendations [53]. Among the 59 genes on this list, 30 are cardiac disease genes, and a further 16 are genes that may include a cardiac phenotype [53]. It is likely that over time this list will continue to be adapted and changed, resulting in more individuals with potential secondary cardiac variants. At present, the medical consequences of secondary cardiac findings, specifically the requirements for long-term clinical screening and the responsibility for periodic reclassification of variants, have not been elucidated, though there is some evidence to suggest that return of secondary cardiac findings can be cost-effective [54].

Increased genetic screening of cardiac patients

Access to genetic testing for cardiac disease is increasing around the world as evidence of its utility grows. Analysis of more genotyped patients not only provides additional information that can assist in curation of genetic variants, but also opens up greater potential for important research findings, including improved patient management and treatment opportunities. Large-scale studies of genotyped patients can help improve our understanding of penetrance and clinical outcomes, better define clinical screening recommendations for at-risk family members, improve identification of early disease in individuals, and in the future may lead to improved treatments and provision of appropriate medical services. For genotype-positive individuals, additional benefits include targeted clinical screening, which may allow early disease to be detected and treated more effectively. Increasing the pool of genotyped patients also helps to drive innovation, with new gene-directed treatments for cardiac conditions emerging to the benefit of patients.

Summary

It is an exciting time in the field of inherited cardiomyopathies, and genetics is increasingly allowing us to better manage and care for affected families. Classification of cardiac variants is nuanced, and partnership with those with specialized expertise in this area will likely prove useful. While many findings straddle the research and clinical spheres, the benefit of a genetic diagnosis in a family should not be underestimated.


Acknowledgment

JI is a recipient of a National Health and Medical Research Council (NHMRC) Career Development Fellowship.

Disclosures

JI receives research grant support from Myokardia Inc.

References

[1] Maron BJ, Towbin JA, Thiene G, et al. Contemporary definitions and classification of the cardiomyopathies: an American Heart Association scientific statement from the Council on clinical cardiology, heart failure and transplantation committee; Quality of Care and Outcomes Research and Functional Genomics and Translational Biology Interdisciplinary Working Groups; and Council on Epidemiology and Prevention. Circulation 2006;113(14):1807-16.
[2] Elliott P, Andersson B, Arbustini E, et al. Classification of the cardiomyopathies: a position statement from the European Society of Cardiology working group on myocardial and pericardial diseases. Eur Heart J 2008;29(2):270-6.
[3] Elliott PM, Anastasakis A, Borger MA, et al. ESC guidelines on diagnosis and management of hypertrophic cardiomyopathy: the task force for the diagnosis and management of hypertrophic cardiomyopathy of the European Society of Cardiology (ESC). Eur Heart J 2014;35(39):2733-79.
[4] Ingles J, Burns C, Bagnall RD, et al. Non-familial hypertrophic cardiomyopathy: prevalence, natural history, and clinical implications. Circ Cardiovasc Genet 2017;10(2).
[5] Ko C, Arscott P, Concannon M, et al. Genetic testing impacts the utility of prospective familial screening in hypertrophic cardiomyopathy through identification of a nonfamilial subgroup. Genet Med 2018;29(1):69-75.
[6] Ingles J, Goldstein J, Thaxton C, et al. Evaluating the clinical validity of hypertrophic cardiomyopathy genes. Circ Genom Precis Med 2019;12(2):e002460.
[7] Watkins H, Ashrafian H, Redwood C. Inherited cardiomyopathies. N Engl J Med 2011;364(17):1643-56.
[8] Rosenbaum AN, Agre KE, Pereira NL. Genetics of dilated cardiomyopathy: practical implications for heart failure management. Nat Rev Cardiol 2019;17(5):286-97.
[9] Atherton JJ, Sindone A, De Pasquale CG, et al. National Heart Foundation of Australia and Cardiac Society of Australia and New Zealand: Australian clinical guidelines for the management of heart failure 2018. Med J Aust 2018;209(8):363-9.
[10] Sahle BW, Owen AJ, Mutowo MP, Krum H, Reid CM. Prevalence of heart failure in Australia: a systematic review. BMC Cardiovasc Disord 2016;16:32.
[11] Hershberger RE, Hedges DJ, Morales A. Dilated cardiomyopathy: the complexity of a diverse genetic architecture. Nat Rev Cardiol 2013;10(9):531-47.
[12] Fatkin D, Johnson R, McGaughran J, Weintraub RG, Atherton JJ, Group CGCW. Position statement on the diagnosis and management of familial dilated cardiomyopathy. Heart Lung Circ 2017;26(11):1127-32.
[13] Ingles J, Macciocca I, Morales A, Thomson K. Genetic testing in inherited heart diseases. Heart Lung Circ 2020;29(4):505-11. https://doi.org/10.1016/j.hlc.2019.10.014.
[14] McNally EM, Mestroni L. Dilated cardiomyopathy: genetic determinants and mechanisms. Circ Res 2017;121(7):731-48.
[15] Fatkin D, Huttner IG. Titin-truncating mutations in dilated cardiomyopathy: the long and short of it. Curr Opin Cardiol 2017;32(3):232-8.
[16] Burke MA, Cook SA, Seidman JG, Seidman CE. Clinical and mechanistic insights into the genetics of cardiomyopathy. J Am Coll Cardiol 2016;68(25):2871-86.


[17] Horvat C, Johnson R, Lam L, et al. A gene-centric strategy for identifying disease-causing rare variants in dilated cardiomyopathy. Genet Med 2019;21(1):133-43.
[18] Strande NT, Riggs ER, Buchanan AH, et al. Evaluating the clinical validity of gene-disease associations: an evidence-based framework developed by the clinical genome resource. Am J Hum Genet 2017;100(6):895-906.
[19] Herman DS, Lam L, Taylor MR, et al. Truncations of titin causing dilated cardiomyopathy. N Engl J Med 2012;366(7):619-28.
[20] Roberts AM, Ware JS, Herman DS, et al. Integrated allelic, transcriptional, and phenomic dissection of the cardiac effects of titin truncations in health and disease. Sci Transl Med 2015;7(270).
[21] Fatkin D, MacRae C, Sasaki T, et al. Missense mutations in the rod domain of the lamin A/C gene as causes of dilated cardiomyopathy and conduction-system disease. N Engl J Med 1999;341(23):1715-24.
[22] Towbin JA, McKenna WJ, Abrams DJ, et al. HRS expert consensus statement on evaluation, risk stratification, and management of arrhythmogenic cardiomyopathy. Heart Rhythm 2019;16(11):e301-72.
[23] Bagnall RD, Weintraub RG, Ingles J, et al. A prospective study of sudden cardiac death among children and young adults. N Engl J Med 2016;374(25):2441-52.
[24] Murray B. Arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C): a review of molecular and clinical literature. J Genet Counsel 2012;21(4):494-504.
[25] Groeneweg JA, Bhonsale A, James CA, et al. Clinical presentation, long-term follow-up, and outcomes of 1001 arrhythmogenic right ventricular dysplasia/cardiomyopathy patients and family members. Circ Cardiovasc Genet 2015;8(3):437-46.
[26] Marcus FI, McKenna WJ, Sherrill D, et al. Diagnosis of arrhythmogenic right ventricular cardiomyopathy/dysplasia: proposed modification of the Task Force Criteria. Eur Heart J 2010;31(7):806-14.
[27] Wilcox JE, Hershberger RE. Genetic cardiomyopathies. Curr Opin Cardiol 2018;33(3):354-62.
[28] McKoy G, Protonotarios N, Crosby A, et al. Identification of a deletion in plakoglobin in arrhythmogenic right ventricular cardiomyopathy with palmoplantar keratoderma and woolly hair (Naxos disease). Lancet 2000;355(9221):2119-24.
[29] Arbustini E, Favalli V, Narula N, Serio A, Grasso M. Left ventricular noncompaction: a distinct genetic cardiomyopathy? J Am Coll Cardiol 2016;68(9):949-66.
[30] Hershberger RE, Givertz MM, Ho CY, et al. Genetic evaluation of cardiomyopathy: a clinical practice resource of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2018;20(9):899-909.
[31] Miller EM, Hinton RB, Czosek R, et al. Genetic testing in pediatric left ventricular noncompaction. Circ Cardiovasc Genet 2017;10(6).
[32] Muchtar E, Blauwet LA, Gertz MA. Restrictive cardiomyopathy: genetics, pathogenesis, clinical manifestations, diagnosis, and therapy. Circ Res 2017;121(7):819-37.
[33] Kostareva A, Kiselev A, Gudkova A, et al. Genetic spectrum of idiopathic restrictive cardiomyopathy uncovered by next-generation sequencing. PLoS One 2016;11(9):e0163362.
[34] Ingles J, Zodgekar PR, Yeates L, et al. Guidelines for genetic testing of inherited cardiac disorders. Heart Lung Circ 2011;20(11):681-7.
[35] Ackerman MJ, Priori SG, Willems S, et al. HRS/EHRA expert consensus statement on the state of genetic testing for the channelopathies and cardiomyopathies: this document was developed as a partnership between the Heart Rhythm Society (HRS) and the European Heart Rhythm Association (EHRA). Heart Rhythm 2011;8(8):1308-39.
[36] Ingles J, Semsarian C. The value of cardiac genetic testing. Trends Cardiovasc Med 2014;24(6):217-24.
[37] Ingles J, McGaughran J, Scuffham PA, Atherton J, Semsarian C. A cost-effectiveness model of genetic testing for the evaluation of families with hypertrophic cardiomyopathy. Heart 2012;98(8):625-30.
[38] Catchpool M, Ramchand J, Martyn M, et al. A cost-effectiveness model of genetic testing and periodical clinical screening for the evaluation of families with dilated cardiomyopathy. Genet Med 2019;21(12):2815-22.


[39] Wordsworth S, Leal J, Blair E, et al. DNA testing for hypertrophic cardiomyopathy: a cost-effectiveness model. Eur Heart J 2010;31(8):926-35.
[40] Mann SA, Castro ML, Ohanian M, et al. R222Q SCN5A mutation is associated with reversible ventricular ectopy and dilated cardiomyopathy. J Am Coll Cardiol 2012;60(16):1566-73.
[41] Martin S, Ingles J, Hunyor I, Bagnall RD, Puranik R, Semsarian C. LAMP2 shines a light on cardiomyopathy in an athlete. HeartRhythm Case Rep 2017;3(3):172-6.
[42] Verdonschot JAJ, Hazebroek MR, Ware JS, Prasad SK, Heymans SRB. Role of targeted therapy in dilated cardiomyopathy: the challenging road toward a personalized approach. J Am Heart Assoc 2019;8(11):e012514.
[43] Kelly M, Caleshu C, Morales A, et al. Adaptation and validation of the ACMG/AMP variant classification framework for MYH7-associated inherited cardiomyopathies: recommendations by ClinGen’s inherited cardiomyopathy expert panel. Genet Med 2017;20(3):351-9.
[44] Schmitt JP, Kamisago M, Asahi M, et al. Dilated cardiomyopathy and heart failure caused by a mutation in phospholamban. Science 2003;299(5611):1410-3.
[45] Fatkin D, Huttner IG, Kovacic JC, Seidman JG, Seidman CE. Precision medicine in the management of dilated cardiomyopathy: JACC state-of-the-art review. J Am Coll Cardiol 2019;74(23):2921-38.
[46] Philippakis AA, Azzariti DR, Beltran S, et al. The Matchmaker Exchange: a platform for rare disease gene discovery. Hum Mutat 2015;36(10):915-21.
[47] Rehm HL, Berg JS, Brooks LD, et al. ClinGen–the clinical genome resource. N Engl J Med 2015;372(23):2235-42.
[48] Rouhani F, Kumasaka N, de Brito MC, Bradley A, Vallier L, Gaffney D. Genetic background drives transcriptional variation in human induced pluripotent stem cells. PLoS Genet 2014;10(6):e1004432.
[49] Wu JC, Garg P, Yoshida Y, et al. Towards precision medicine with human iPSCs for cardiac channelopathies. Circ Res 2019;125(6):653-8.
[50] Mills RJ, Hudson JE. Bioengineering adult human heart tissue: how close are we? APL Bioeng 2019;3(1):010901.
[51] Green RC, Berg JS, Grody WW, et al. ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genet Med 2013;15(7):565-74.
[52] Mackley MP, Fletcher B, Parker M, Watkins H, Ormondroyd E. Stakeholder views on secondary findings in whole-genome and whole-exome sequencing: a systematic review of quantitative and qualitative studies. Genet Med 2017;19(3):283-93.
[53] Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet Med 2017;19(2):249-55.
[54] Bennette CS, Gallego CJ, Burke W, Jarvik GP, Veenstra DL. The cost-effectiveness of returning incidental findings from next-generation genomic sequencing. Genet Med 2015;17(7):587-95.

CHAPTER 15

Phenylketonuria

Stephanie Sacharow1,2, Farrah Rajabi1,2, Harvey Levy1,2

1Division of Genetics and Genomics, Boston Children’s Hospital, Boston, MA, United States; 2Harvard Medical School, Boston, MA, United States

Introduction

Phenylketonuria (PKU) is the traditional generic term for phenylalanine hydroxylase (PAH) deficiency. It is an autosomal recessive inborn error of metabolism which causes intellectual disability in the untreated state. Pathogenic variants in the PAH gene result in a defective PAH enzyme, impairing the conversion of phenylalanine (Phe) to tyrosine (Tyr) and leading to excess Phe in the blood and brain as well as reduced Tyr (Fig. 15.1). PKU is considered the model for an inborn error of metabolism: it was one of the first discovered, it is one of the most prevalent of the rare inborn errors of metabolism, it is virtually always identified by newborn screening, and there is an effective treatment that together with newborn identification prevents the severe morbidities. In fact, PKU has been one of the most successful examples of prevention of intellectual disability, and was the first condition for which newborn screening was successfully implemented [1,2]. The gene for PAH is located on chromosome 12q23.1, and more than 1000 pathogenic variants have been described. Most variants are missense, and most individuals with PKU are compound heterozygotes, carrying a different variant on each allele. Generally, disease-causing variants result in abnormal folding and increased turnover of the enzyme, with decreased activity. Genotype is a strong predictor of severity [3,4]. PAH requires the cofactor tetrahydrobiopterin (BH4), and certain variants lead to cofactor-responsive PKU, in which pharmacological amounts of the cofactor (actually sapropterin dihydrochloride, a synthetic form of the cofactor that provides essentially the same stimulation as BH4) may increase residual PAH activity. These variants have become recognized as a subgroup of genotype-phenotype correlation. The differential diagnosis for elevated phenylalanine includes several other conditions involving BH4 synthesis and recycling.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00006-5 Copyright © 2021 Elsevier Inc. All rights reserved.

FIGURE 15.1 Phenylalanine hydroxylase pathway. This pathway for the conversion of phenylalanine to tyrosine is disrupted in patients with PKU. Note there is a cofactor, tetrahydrobiopterin (BH4), which is regenerated through a series of 3 enzymatic reactions.

History of phenylketonuria

Asbjørn Følling, a Norwegian physician and biochemist born in 1888, was the first to describe PKU (which he originally termed “imbecillitas phenylpyruvica”). While working as a chemist in 1934, he was approached by a couple who had two children with global developmental delay and a peculiar odor to their urine. Testing their urine, he discovered that adding a few drops of ferric chloride reagent, which he was using at the time to study ketones, turned the urine a deep green color. Laborious studies of their urine led to identification of the ferric chloride-reacting substance as a specific ketone, phenylpyruvic acid [5]. He initially hypothesized that the defect was in the further degradation of phenylpyruvic acid, but soon thereafter he and others realized that the defect was in the conversion of Phe to Tyr, leading to Phe accumulation, and that phenylpyruvic acid was a deamination byproduct of the increased Phe. Følling identified additional patients within institutions in Norway, including other sibling pairs. Discovery of sets of affected siblings supported heritability and, more specifically, autosomal recessive inheritance. On testing large numbers of individuals institutionalized because of severe intellectual disability, approximately 1 in 50 was found to be affected [6]. Furthermore, Dr. Følling was able to detect carriers by administration of phenylalanine and testing urine with ferric chloride [7].

Lionel Penrose of England recognized the significance of Følling’s discoveries very soon after Følling published his findings, identified more individuals in institutions affected with PKU, and made an early attempt at dietary management. In addition, Penrose introduced the name phenylketonuria because of the presence of phenylpyruvic acid, a ketone, in the urine, although now the condition is best identified and followed on the basis of Phe levels in the blood. Penrose also aided in establishing autosomal recessive inheritance and noted multigenerational occurrences in a consanguineous family [5,7,8].

In 1950, Pearl S. Buck, who had been awarded the Nobel Prize in Literature, published a book entitled The Child Who Never Grew, which described her intellectually disabled daughter Caroline, who had untreated PKU. At that time, it was not standard to speak openly about disabilities, and the book helped to raise awareness of PKU and the plight of families with disabled children [6].

In the 1940s, George Jervis, a refugee from Germany, performed studies in the United States which delineated the causative reaction to be an error in the hydroxylation of Phe. He correctly hypothesized that there was a block in the conversion of Phe into Tyr, a hypothesis he proved in 1953 by measuring Phe hydroxylation in biopsied liver [9-11].

At the same time, Horst Bickel, a German physician, developed the first “workable” diet for PKU. This occurred after he encountered a female toddler with global developmental delay and diagnosed her with untreated PKU. For this diet Bickel successfully removed Phe from a casein hydrolysate to make an essentially phenylalanine-free protein substitute. He instructed the mother to discontinue dietary protein and instead to give her daughter only this formulation. Within a day the musty smell in the child dissipated, the blood Phe levels normalized, urinary phenylpyruvic acid became undetectable, and the ferric chloride reaction in the urine became negative. The child showed dramatic improvement in her development. When Phe was added to the formulation, the biochemical and clinical features of PKU returned, proving that the formulation worked as a therapy for PKU [10]. Subsequent work by others demonstrated that, with this treatment beginning in the neonatal period, it was possible to prevent irreversible neurological damage [12].
A low protein diet with the Phe-free protein replacement became commercially available in the 1950s and thereafter became the standard of care for PKU.

Dr. Robert Guthrie, a physician and microbiologist in Buffalo, NY, had a son with intellectual disability and, as a result, became involved with the National Association for Retarded Children (NARC). He learned about PKU through this organization and was challenged with the task of creating a better method for measuring blood Phe to monitor patients with PKU who had been placed on the diet. At the time, quantification of blood Phe for disease monitoring was burdensome and required a large volume of blood. Within 3 days of being presented with the problem, Guthrie developed a “bacterial inhibition assay” to semiquantitatively measure Phe using venous blood which he impregnated into filter paper. The principle of the assay was competitive inhibition between an inhibitor of Bacillus subtilis growth (β-2-thienylalanine) in the agar gel and Phe that eluted into the gel from strips of the filter paper. Increased Phe counteracted the inhibitor and allowed bacterial growth around the filter paper strips from patients with PKU on the agar plate. The size of the bacterial growth indicated the blood Phe level.

In 1961, Guthrie was informed that his wife’s niece, who was institutionalized for profound intellectual disability, had been diagnosed with PKU. Guthrie then learned that, had she been diagnosed as a neonate and placed on dietary treatment, the disability would have been prevented. This was the impetus for screening newborns, and soon Guthrie found that the bacterial inhibition assay worked on dried blood that had been impregnated into filter paper directly from a heel stick of the newborn. This was dubbed the “Guthrie Test.” Testing individuals with intellectual disability at a state residential school, Guthrie found that some with PKU were missed by the urine ferric chloride test but were readily detectable by his bacterial assay. Subsequently, a trial of his assay in New York State successfully identified presymptomatic newborns. Massachusetts became the first state to routinely test all newborn infants before they were discharged from the hospital nursery and later to mandate newborn screening for PKU. By 1966, newborn screening was mandatory in most states [13,14].


Clinical features

Clinical symptoms

PKU in the untreated state was noted early on to cause an unusual musty body odor and light coloring of the hair and eyes. Intellectual disability may be severe in the untreated or late-treated state, and affected individuals may be nonambulatory and nonverbal. Many behavioral characteristics have been described, including autistic features and skin picking, the latter of which may be attributed to PKU-related eczematous changes. Adults may exhibit Parkinson-like features. In patients who were identified early but incompletely treated over time, there may be significant learning disabilities or intellectual disability, often in the mild or borderline range. Those treated in childhood but not following treatment recommendations in adulthood are likely to experience difficulties with executive function and attention, and may have psychiatric problems including anxiety and/or depression. Even with treatment, people may exhibit subtle differences in cognition, neuropsychiatric symptoms, and physical findings such as hyperreflexia and tremor [4]. Neuroimaging with brain MRI will often demonstrate white matter changes, which may be more severe with a longer period of nonadherence to treatment recommendations. Blood Phe levels during critical periods of development, from 0 to 12 years, are significantly correlated with intelligence quotient (IQ). Meta-analysis demonstrated that each 100 µmol/L increase in blood Phe predicted a 1.3 to 3.1 point reduction in IQ. Untreated classical PKU may result in severe intellectual disability (IQ < 30), seizure disorder, and abnormal behaviors [15]. Psychological testing is part of many PKU clinical programs. Mental health monitoring is recommended and neuropsychological testing should be considered as part of the treatment plan [15].
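To give a sense of scale for the meta-analysis figure quoted above, the short Python sketch below converts a given rise in blood Phe into the corresponding range of predicted IQ reduction. It assumes that the quoted per-100 µmol/L range can be scaled linearly, which is a simplification made here for illustration and not a clinical prediction tool; the function name is hypothetical.

```python
def predicted_iq_reduction(phe_increase_umol_l, low_per_100=1.3, high_per_100=3.1):
    """Return the (low, high) predicted IQ point reduction for a rise in blood Phe.

    Assumption: the range reported per 100 umol/L scales linearly.
    """
    if phe_increase_umol_l < 0:
        raise ValueError("Phe increase must be non-negative")
    units = phe_increase_umol_l / 100.0
    return (units * low_per_100, units * high_per_100)


low, high = predicted_iq_reduction(300)
print(f"{low:.1f} to {high:.1f} IQ points")  # 3.9 to 9.3 IQ points
```

Under this assumption, a sustained rise of 300 µmol/L during the critical period corresponds to a predicted reduction of roughly 4 to 9 IQ points.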

Newborn screening

Newborn screening was initially performed by bacterial inhibition assay but is currently performed by tandem mass spectrometry (MS/MS). In the United States, newborn dried blood spot samples are obtained at 24-48 h of life and generally flagged for any Phe level above 120 µmol/L. False positives can occur in sick neonates who need parenteral nutrition or blood transfusion, and false negatives can result from early screening prior to 24 h of age. Once elevated Phe levels are identified on a newborn screen, a sequence of biochemical testing ensues. If the elevated Phe level is confirmed, this pretreatment blood concentration is used as a diagnostic parameter for classification of the PKU phenotype (Table 15.1) [16,17]. Phe screening is a major part of newborn screening in all industrialized countries, so PKU is now almost always identified in the newborn and the initial biochemical phenotype is evident prior to the confirmatory genotype.

Diagnosis

Plasma amino acid testing is the standard for confirming the presence of high Phe in newborns with a positive newborn screen. Diagnostic criteria include a plasma Phe greater than 120 µmol/L (normal 60-70 µmol/L) with an increased ratio of Phe to Tyr (due to both high Phe and reduced Tyr, the product of PAH) and, subsequently, the finding of biallelic pathogenic variants in the PAH gene.


Table 15.1 Biochemical classification of PKU.

Phe-related disorder | Pretreatment Phe level | PAH genotype | Likelihood of BH4 response
Classic PKU | >1200 µmol/L | 2 classic variants (often null) | Low
Moderate PKU | 900-1200 µmol/L | Classic + moderate or 2 moderate variants | Low
Mild PKU | 600-900 µmol/L | Classic, moderate, or mild variant + 1 mild HPA variant | Medium
Mild HPA-gray zone | 360-600 µmol/L | Classic, moderate, or mild variant + 1 mild HPA variant | High
Mild HPA | 120-360 µmol/L | Classic, moderate, or mild variant + 1 mild HPA variant | No treatment required
Tetrahydrobiopterin deficiencies | Normal to elevated | Not applicable | Very high

BH4 cofactor defects also produce hyperphenylalaninemia (HPA) and can therefore mimic PKU on initial screening. They may produce an even more severe neurological disease than PKU and require a very different treatment regimen, so they should be evaluated during the confirmatory process and ruled out through urine pterin analysis and erythrocyte DHPR testing [18,19]. Since the deficiency of PAH activity in these entities is secondary rather than a primary PAH deficiency, they are not included in this chapter.

Classification of PKU

Classification schemes have varied, but classical PKU may be defined as Phe >1200 µmol/L, mild to moderate PKU as Phe 600-1200 µmol/L, and HPA as blood Phe 120-599 µmol/L [15]. However, PAH deficiency occurs on a continuum and the degree of HPA cannot always be clearly defined. The higher Phe levels in the untreated state (usually 1200 µmol/L or greater) are associated with more severe cognitive impairment and require treatment, while benign HPA (120-360 µmol/L) in the untreated state seems not to cause symptoms. Between these two extremes, mild and moderate PKU produce varying degrees of cognitive and psychological impairment and require treatment.
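The three-band scheme quoted above can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, the cut-offs are those stated in this passage rather than the finer bands of Table 15.1, and, because PAH deficiency lies on a continuum, values near a boundary should not be over-interpreted.

```python
def classify_pretreatment_phe(phe_umol_l):
    """Map a pretreatment blood Phe level (umol/L) to the bands quoted in the text.

    Bands: classical PKU > 1200; mild to moderate PKU 600-1200; HPA 120-599.
    Values below 120 are reported here as "not elevated" (an assumption).
    """
    if phe_umol_l < 0:
        raise ValueError("Phe concentration must be non-negative")
    if phe_umol_l > 1200:
        return "classical PKU"
    if phe_umol_l >= 600:
        return "mild to moderate PKU"
    if phe_umol_l >= 120:
        return "HPA"
    return "not elevated"


for level in (65, 250, 800, 1500):
    print(level, classify_pretreatment_phe(level))
# 65 not elevated
# 250 HPA
# 800 mild to moderate PKU
# 1500 classical PKU
```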

Incidence of PKU

The incidence is approximately 1 in 10,000 in North America, with an estimated 200–400 babies born with PKU each year [20]. Incidence varies strongly with region of origin, with some of the highest rates in Ireland (1:4500) and Turkey (1:2600) [4].
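For an autosomal recessive disorder, a birth incidence can be turned into an approximate carrier frequency with the Hardy–Weinberg relationship (incidence = q², carrier frequency = 2pq). The sketch below assumes Hardy–Weinberg equilibrium (random mating, no selection), which is a simplification not stated in the text; the function name is hypothetical.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Estimate carrier frequency from the incidence of a recessive disorder.

    Assumes Hardy-Weinberg equilibrium: incidence = q**2, carriers = 2*p*q.
    """
    if not 0 < incidence < 1:
        raise ValueError("incidence must be a proportion between 0 and 1")
    q = math.sqrt(incidence)  # frequency of pathogenic alleles
    p = 1.0 - q               # frequency of normal alleles
    return 2.0 * p * q


if __name__ == "__main__":
    # Incidence figures quoted in the text
    for label, incidence in (("North America", 1 / 10_000),
                             ("Ireland", 1 / 4_500),
                             ("Turkey", 1 / 2_600)):
        freq = carrier_frequency(incidence)
        print(f"{label}: carrier frequency ~ 1 in {1 / freq:.0f}")
```

With an incidence of 1 in 10,000, q is 0.01 and the estimated carrier frequency is roughly 1 in 50.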


Genetic counseling

Phenylalanine hydroxylase deficiency and most of the cofactor defects are inherited in an autosomal recessive manner. Affected individuals have two pathogenic variants, one inherited from each parent. Because both parents are carriers, each pregnancy carries a 25% recurrence risk, and an unaffected sibling has a 2/3 chance of being a carrier. Genetic counseling should be offered to families of affected children and to individuals with PKU at an appropriate age, as well as to individuals with PKU and their relatives who are making reproductive decisions. Carrier testing is most definitively conducted by targeted genotyping of the PAH gene to evaluate for known familial variants. Families should be given the option of this testing for members of the immediate as well as the extended family, especially those of reproductive age. The authors are aware of families with affected cousins as well as families with one affected parent and affected children. Prenatal diagnosis by chorionic villus sampling or amniocentesis, as well as preconception testing through preimplantation genetic diagnosis (PGD), is possible if the familial variants are known.
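The 25% recurrence risk and the 2/3 carrier risk for unaffected siblings both follow from enumerating the four equally likely allele combinations when both parents are carriers. A minimal sketch, using hypothetical allele labels "A" (normal) and "a" (pathogenic):

```python
from fractions import Fraction
from itertools import product


def offspring_risks():
    """Enumerate offspring genotypes for two carrier (A/a) parents.

    Returns (risk of being affected, carrier risk among unaffected offspring).
    """
    parent_alleles = ("A", "a")  # each carrier parent transmits A or a
    genotypes = [tuple(sorted(pair))
                 for pair in product(parent_alleles, parent_alleles)]
    affected = sum(g == ("a", "a") for g in genotypes)
    carriers = sum(g == ("A", "a") for g in genotypes)
    unaffected = len(genotypes) - affected
    return Fraction(affected, len(genotypes)), Fraction(carriers, unaffected)


if __name__ == "__main__":
    recurrence, sibling_carrier = offspring_risks()
    print("Recurrence risk:", recurrence)
    print("Carrier risk in an unaffected sibling:", sibling_carrier)
```

The carrier risk is 2/3 rather than 1/2 because the affected genotype has already been excluded for a sibling known to be unaffected, leaving three possible outcomes, two of which are carriers.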

Management

A comprehensive management framework has been proposed that includes individualization of therapy; optimization of neurological, psychosocial, and nutritional outcomes and quality of life; and multidisciplinary care [5]. Dietary therapy has long been the focal point of PKU management. Children identified by newborn screening with blood Phe >360 µmol/L on confirmatory testing should have dietary therapy initiated within the first 10 days of life. The foundation of management is to limit dietary Phe intake by severely restricting natural protein and replacing it with a medical food that contains very little or no Phe. Patients should be managed by an experienced metabolic physician and dietitian, and frequent blood Phe monitoring is required during infancy and childhood because Phe tolerance changes at different stages of growth and metabolic demand. The American College of Medical Genetics (ACMG) practice guideline recommends lifelong treatment, with the goal of maintaining blood Phe in the range of 120–360 µmol/L throughout the lifespan [19]. For many patients this means that as little as 10% of their daily protein can come from natural foods [21]. Unfortunately, because the diet is extremely difficult, many patients, especially adults, do not strictly adhere to it [22].

Additional therapeutic options have become available more recently, to be used in conjunction with the PKU diet or even potentially to replace it. Pharmacological doses of sapropterin, a synthetic form of BH4, have been available and used in PKU treatment since 2007. Sapropterin has been effective in stimulating PAH activity in patients with residual PAH activity, i.e., those with less than classic PKU. In the majority of those who respond to this drug, natural protein in the diet may be increased, and an occasional patient who is unusually responsive may be controlled on sapropterin as monotherapy and eat an unrestricted diet. Between 25% and 50% of patients with PAH deficiency are responsive to sapropterin. It is recommended that patients with PAH deficiency be offered a trial of sapropterin unless they have biallelic null mutations [19].

Imbalances of large neutral amino acids along with Phe elevation may contribute to the changes in the brain in PKU [23]. Large neutral amino acids (LNAA), which may partially block Phe uptake at the blood–brain barrier and thereby lower brain Phe, have been used as adjunctive therapy [24]. LNAA are avoided in pregnancy and in children because of unknown effects on development [19].


Pegvaliase is an injectable treatment composed of the pegylated nonhuman enzyme phenylalanine ammonia lyase (PAL). This drug, approved in the United States in 2018 for the treatment of PKU, has demonstrated efficacy in lowering Phe in responsive patients, many of whom may transition to an unrestricted diet without medical food while maintaining normal blood Phe levels. The enzyme is immunogenic and patients experience side effects, so therapy requires careful titration toward a therapeutic dose while side effects are managed. In some patients efficacy takes a prolonged time, up to a year or more, and occasional patients never achieve efficacy. Responsiveness is genotype-independent because PAL metabolizes Phe through a different pathway from PAH [25].

Maternal PKU

An elevated blood Phe level in a pregnant woman can have teratogenic effects on the developing fetus. This was first brought to attention in 1956 by Dr. Charles Dent, who discussed a mother with PKU who had three children without PKU who were intellectually disabled [25]. Further studies by Charlton Mabry showed a high incidence of cognitive deficiency in the offspring of mothers with PKU [26]. In 1980, a survey of the offspring of mothers with PKU characterized the teratogenic effects, and the subsequent prospective international Maternal PKU Collaborative Study (MPKUCS) has become the definitive study of the treatment of maternal PKU. Common outcomes of untreated or very late-treated maternal PKU pregnancies are microcephaly, intellectual disability, congenital heart disease, and prenatal as well as postnatal growth deficiency [27]. In a retrospective survey of 423 offspring of mothers with Phe >20 mg/dL (1200 µmol/L), 92% were intellectually disabled, 73% were microcephalic, and 17% had congenital heart defects [28]. It is now standard of care for women with PKU to attain strict Phe control prior to conception and to continue close monitoring and adjustment of the diet in response to Phe tolerance, which changes over the course of a pregnancy. Early reproductive counseling is vital [19]. Women not known to have PKU who have children with features of maternal PKU should have their blood Phe measured.
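Phe levels are reported in both mg/dL and µmol/L, as in the survey threshold quoted above. The conversion is a short calculation using the molar mass of phenylalanine, about 165.19 g/mol (a value supplied here, not given in the text). The sketch below uses hypothetical function names and shows that 20 mg/dL corresponds to roughly 1211 µmol/L, which the text rounds to 1200 µmol/L.

```python
# Molar mass of phenylalanine (C9H11NO2); supplied here as an assumption
PHE_MOLAR_MASS_G_PER_MOL = 165.19


def phe_mg_dl_to_umol_l(mg_per_dl: float) -> float:
    """Convert a Phe concentration from mg/dL to umol/L."""
    mg_per_l = mg_per_dl * 10.0                        # 1 dL = 0.1 L
    mmol_per_l = mg_per_l / PHE_MOLAR_MASS_G_PER_MOL   # mg/L over g/mol = mmol/L
    return mmol_per_l * 1000.0


def phe_umol_l_to_mg_dl(umol_per_l: float) -> float:
    """Convert a Phe concentration from umol/L to mg/dL."""
    mg_per_l = umol_per_l / 1000.0 * PHE_MOLAR_MASS_G_PER_MOL
    return mg_per_l / 10.0


if __name__ == "__main__":
    print(f"20 mg/dL = {phe_mg_dl_to_umol_l(20):.0f} umol/L")
    print(f"360 umol/L = {phe_umol_l_to_mg_dl(360):.1f} mg/dL")
```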

Evolution of genotyping

The relationship between the PAH gene and HPA has important implications for PKU. Finding two pathogenic variants in the PAH gene establishes a diagnosis of PKU and can guide the next steps of management [29]. Thus, the molecular relationship of the PAH gene to the clinical characteristics of HPA and PKU has become one of the most important developments in the inborn errors of metabolism. This understanding was made possible by the cloning and characterization of the PAH gene. The background that led to the cloning of the gene for PAH and of other metabolic genes is of interest. In 1969, Pardue and Gall demonstrated molecular hybridization, showing that the nucleotides of one strand of DNA could bind to the nucleotides of a related strand [30]. A few years later Jackson et al. and Cohen et al., among others, demonstrated molecular recombination, essentially the insertion of a DNA fragment from one species into the DNA of another species [31,32]. Thus, molecular cloning and the use of the clone to capture related DNA through hybridization could be used to isolate and study human genes and determine the variants in these genes, providing a molecular explanation for the phenotypes of genetic disorders.


By the early 1980s investigators had become interested in using this technology to examine the molecular basis of the inborn errors of metabolism. Prominent among the inborn errors are the urea cycle disorders, and in 1981 Su et al. cloned the human cDNA for argininosuccinate synthetase, followed in 1984 by Horwich et al., who cloned the human cDNA for ornithine transcarbamylase [33,34]. During this period Savio Woo at Baylor College of Medicine developed a laboratory devoted to cloning the human cDNA of PAH as a means of understanding PKU on a molecular basis and for diagnostic purposes as well.

Dr. Woo had been interested in examining the biochemical and molecular basis of PKU for a number of years. His interest began in the early 1970s, when he emigrated from Hong Kong to British Columbia to study in the laboratory of Dr. Louis Woolf, one of the pioneers in PKU research. Dr. Woolf was investigating the possibility that the dilute lethal mouse, a genetic mutant with the combination of profound neurological disease and a very light-colored coat, might be a model for PKU. Dr. Woolf had determined that the blood of this mouse had a normal concentration of Phe, but he felt that further investigation might still show the mutant to be a suitable model for PKU. Thus, when Dr. Woo joined his laboratory as a fellow, Dr. Woolf assigned him the project of measuring PAH activity in the mouse. Dr. Woo isolated PAH from the liver of this mouse and found PAH activity, as well as other aspects of PAH, to be normal [35]. Although this eliminated the possibility that the dilute lethal mouse has PKU, it stimulated Dr. Woo to investigate PAH further and its relationship to human PKU.

When Dr. Woo moved to Baylor, he continued his work on PKU with the intention of cloning and characterizing the PAH gene. The first step in this process was to clone rat PAH cDNA [36]. Using the rat PAH cDNA as a specific hybridization probe, Dr. Woo and his colleagues isolated human PAH cDNA from a human liver cDNA library. With this human PAH cDNA as a probe for genomic DNA, they found that the PAH gene was present, not deleted, in two PKU cell lines and was organizationally identical to the normal PAH gene [37]. Subsequently, his laboratory localized the PAH gene to chromosome 12 and regionally mapped the locus to 12q22-24.1 [38–40]. They then investigated the function of the gene and showed that human PAH cDNA contained all the genetic information required for the transcription and translation of fully active PAH [41]. Finally, the Woo laboratory determined and reported the nucleotide sequence of human PAH cDNA and the derived amino acid sequence of PAH. All of this work enabled the characterization of the PAH gene, its haplotypes, and the variants in the PAH gene associated with PKU [42,43].

Practical genotype–phenotype correlation

PKU genotype–phenotype correlation has proven invaluable in the clinical setting for guiding expectations about Phe tolerance and potential response to BH4 treatment. Identifying the PAH variants in a newborn with newly confirmed HPA also gives the clinician, genetic counselor, and parents additional information for estimating prognosis (see Table 15.1). The majority of PAH variants are missense, and the majority of patients are compound heterozygotes. Guldberg et al. noted in 1998 that categorizing PAH variants by the phenotypes of functionally hemizygous patients could predict the observed phenotype in 79% of patients [44].

PAH, whose transcript is approximately 2.6 kb, is an enzyme that exists predominantly as a tetramer. It has an absolute requirement for ferrous iron, the BH4 cofactor, and O2 for the hydroxylation reaction. The enzyme consists of three domains: the regulatory domain (residues 1–142), the catalytic domain (residues 143–410), and a short tetramerization domain (residues 411–452) [45]. The initial in vitro studies of variant PAH were performed primarily on variants identified in European populations with residual PAH activity and revealed variable correlation with phenotype. However, they did reveal correspondence with a milder phenotype than severe classic PKU and a greater likelihood of higher activity in the presence of pharmacological amounts of BH4 [51]. BH4 appears to function as a molecular chaperone for PAH that protects the protein from misfolding during synthesis and promotes reconstitution of the correct three-dimensional structure in the cytosol [46]. Interallelic complementation occurs in PAH when a hybrid protein assembled from subunits encoded by two different alleles has higher or lower catalytic activity than predicted from the sum of the residual activities of the individual alleles [47]. In addition, some variants are believed to be kinetic variants with a lower binding affinity for the BH4 cofactor. These variants also respond to pharmacological amounts of BH4 with enhanced PAH activity. Kinetic variants are more often located in cofactor-binding regions [CBR #1 (residues 245–266), CBR #2 (residues 280–283), CBR #3 (residues 322–326), and CBR #4 (residues 377–379)] or at residues in regions that interact with the secondary structural elements involved in cofactor binding [48].

BioPKU has organized the largest database of PAH variants, with 1184 unique variants identified as of November 10, 2019 [49]. The open-access PAHvdb and BIOPKU databases provide the reported phenotype percentages of 16,244 PKU patients. With analysis of this large and comprehensive database, which combines input from publications worldwide, improved tools to predict phenotype are being developed (Table 15.2).
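The domain and cofactor-binding-region boundaries quoted above lend themselves to a simple residue lookup, which can help when first orienting to where a missense variant falls in the protein. The sketch below uses only the residue ranges given in the text; the function and constant names are hypothetical, and location alone does not establish whether a variant is kinetic or BH4-responsive.

```python
# Residue ranges as quoted in the text (inclusive)
PAH_DOMAINS = [
    ("regulatory", 1, 142),
    ("catalytic", 143, 410),
    ("tetramerization", 411, 452),
]
COFACTOR_BINDING_REGIONS = [
    ("CBR #1", 245, 266),
    ("CBR #2", 280, 283),
    ("CBR #3", 322, 326),
    ("CBR #4", 377, 379),
]


def locate_residue(position: int):
    """Return (domain, cofactor-binding region or None) for a PAH residue."""
    if not 1 <= position <= 452:
        raise ValueError("PAH residue positions run from 1 to 452")
    domain = next(name for name, start, end in PAH_DOMAINS
                  if start <= position <= end)
    cbr = next((name for name, start, end in COFACTOR_BINDING_REGIONS
                if start <= position <= end), None)
    return domain, cbr


if __name__ == "__main__":
    # Residue numbers taken from variants named in this chapter
    for variant, residue in (("p.Ile65Thr", 65), ("p.Arg261Gln", 261),
                             ("p.Arg408Trp", 408), ("p.Tyr414Cys", 414)):
        print(variant, locate_residue(residue))
```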
A quantitative method for predicting the genotype–phenotype correlation has been developed that assigns an allelic phenotype value (APV) to each allele and a genotypic phenotype value (GPV) that combines the information from both alleles, with values ranging from 0 for classic PKU to 10 for mild hyperphenylalaninemia (MHP) [50]. However, genotype–phenotype discrepancies may be seen clinically. Biochemical phenotypes in PKU form a spectrum, and patients may fall between categories. Several variants, when paired with different null variants, show variability in biochemical phenotypic expression ranging from classic PKU to mild HPA [29]. Discordant sapropterin responsiveness may also be observed. Some of the phenotypic variation is thought to be explained by interallelic complementation [51,52], but other influences on phenotype may also play a role in expression. These might include such epigenetic factors as absorption of Phe from the gastrointestinal tract, transport of Phe into the hepatocytes, and transport of Phe across the blood–brain barrier.
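The text does not state how the two APVs are combined into a GPV. The sketch below assumes that the allele with the higher APV (the milder allele) determines the GPV; this is our reading of the APV model and should be checked against the original publication [50] before any use. The function name is hypothetical.

```python
def genotypic_phenotype_value(apv_allele_1: float, apv_allele_2: float) -> float:
    """Combine two allelic phenotype values (APVs) into a GPV.

    Assumption (not stated in the text): the milder allele, i.e. the one
    with the higher APV, is taken to determine the genotype's value.
    APVs run from 0 (classic PKU) to 10 (MHP).
    """
    for apv in (apv_allele_1, apv_allele_2):
        if not 0 <= apv <= 10:
            raise ValueError("APV must lie between 0 and 10")
    return max(apv_allele_1, apv_allele_2)


if __name__ == "__main__":
    # APVs quoted for the two alleles in Case 1 of this chapter
    print(genotypic_phenotype_value(0, 1.3))
```

Under this assumption a genotype with one null allele (APV 0) takes its value entirely from the second allele, which mirrors the use of functionally hemizygous patients to characterize individual variants.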

Case 1

A term female infant (41 2/7 weeks gestational age) presents at 6 days of life after an abnormal newborn screen demonstrated a Phe elevation of 460 µmol/L (normal […] T (p.Arg408Trp) and c.782G>A (p.Arg261Gln). The p.Arg408Trp variant is the most common classic variant in European populations and confers no residual activity (null variant). The p.Arg261Gln variant is a moderate PKU variant conferring slight residual activity. On the BioPKU database, the majority of individuals with the same two variants presented with classic PKU (N = 102), although some presented with a mild or moderate phenotype (N = 25). The majority of these individuals challenged with pharmacological amounts of sapropterin (synthetic BH4) were not responsive (N = 34), although a few did respond with a reduction in blood Phe (N = 5) [49]. Based on the PAHvdb database, the first variant, p.Arg408Trp, has an APV of 0 and the second variant has an APV of 1.3 [50]. Although her predicted phenotype is classic PKU, her actual phenotype is slightly milder, corresponding to moderate PKU. After discussion with the family, and based on the previously available data, the patient has not attempted BH4 supplementation.

Table 15.2 Common variants worldwide.

Region
North America a,b
South America c,d
Europe e,f,g,h
East Asia i,j,k
Middle East l,m,n
Oceania o,p

Classic PKU
Mild-moderate PKU

p.R261Q p.R408W c.60+5G>T c.1066-11G>A c.1315+1G>A p.R261Q c.913-7A>G c.1066-11G>A p.Ex5del p.R261Q p.P281L p.R408W c.1066-11G>A c.1315+1G>A p.R111 p.P281L p.Y356 p.V399V c.442-1G>A p.EX6-96A>G p.F39del p.R261Q p.P281L p.Ex3del c.843-5T>C c.969+5G>A c.1066-11G>A p.R261Q p.R408W c.1315+1G>A

p.I65T p.V388M

HPA

p.I65T p.V388M
p.V388M p.Y414C
p.I174N p.R243Q p.R413P
p.R53H p.R241C
p.L48S
p.A300S
p.F39L p.I65T p.Y414C

a Rajabi F, Rohr F, Wessel A et al. Phenylalanine hydroxylase genotype-phenotype associations in the United States: A single center study. Mol Genet Metab. 2019;128(4):415–421. https://doi.org/10.1016/j.ymgme.2019.09.004.
b Vela-Amieva M, Abreu-González M, González-del Angel A et al. Phenylalanine hydroxylase deficiency in Mexico: Genotype-phenotype correlations, BH4 responsiveness and evidence of a founder effect. Clin Genet. 2014;52(April 2014):62–67. https://doi.org/10.1111/cge.12444.
c Hamilton V, Santa María L, Fuenzalida K et al. Characterization of phenyalanine hydroxylase gene mutations in Chilean PKU patients. JIMD Rep. 2018;42:71–77. https://doi.org/10.1007/8904_2017_85.
d Vieira Neto E, Laranjeira F, Quelhas D et al. Mutation analysis of the PAH gene in phenylketonuria patients from Rio de Janeiro, Southeast Brazil. Mol Genet genomic Med. May 2018. https://doi.org/10.1002/mgg3.408.
e Aldámiz-Echevarría L, Llarena M, Bueno MA et al. Molecular epidemiology, genotype–phenotype correlation and BH4 responsiveness in Spanish patients with phenylketonuria. J Hum Genet. 2016;61:731–744. https://doi.org/10.1038/jhg.2016.38.
f Bayat A, Yasmeen S, Lund A, Nielsen JB, Møller LB. Mutational and phenotypical spectrum of phenylalanine hydroxylase deficiency in Denmark. Clin Genet. 2015:1–5. https://doi.org/10.1111/cge.12692.
g Trunzo R, Santacroce R, D’Andrea G et al. Phenylalanine hydroxylase deficiency in south Italy: Genotype-phenotype correlations, identification of a novel mutant PAH allele and prediction of BH4 responsiveness. Clin Chim Acta. 2015;450:51–55. https://doi.org/10.1016/j.cca.2015.07.014.
h Gundorova P, Stepanova AA, Kuznetsova IA, Kutsev SI, Polyakov AV. Genotypes of 2579 patients with phenylketonuria reveal a high rate of BH4 nonresponders in Russia. PLoS One. 2019;14(1):e0211048. https://doi.org/10.1371/journal.pone.0211048.
i Li N, He C, Li J et al. Analysis of the genotype-phenotype correlation in patients with phenylketonuria in mainland China. Sci Rep. 2018;8(1):11251. https://doi.org/10.1038/s41598-018-29640-y.
j Dateki S, Watanabe S, Nakatomi A et al. Genetic background of hyperphenylalaninemia in Nagasaki, Japan. Pediatr Int. 2016;58(5):431–433. https://doi.org/10.1111/ped.12924.
k Lee YW, Lee DH, Kim ND et al. Mutation analysis of PAH gene and characterization of a recurrent deletion mutation in Korean patients with phenylketonuria. Exp Mol Med. 2008;40(5):533–540. https://doi.org/10.3858/emm.2008.40.5.533.
l Biglari A, Saffari F, Rashvand Z, Alizadeh S, Najafipour R, Sahmani M. Mutations of the phenylalanine hydroxylase gene in Iranian patients with phenylketonuria. Springerplus. 2015;4(2):542. https://doi.org/10.1186/s40064-015-1309-8.
m Bercovich D, Elimelech A, Yardeni T et al. A mutation analysis of the phenylalanine hydroxylase (PAH) gene in the Israeli population. Ann Hum Genet. 2008;72(Pt 3):305–309. https://doi.org/10.1111/j.1469-1809.2007.00425.x.
n Esfahani MS, Vallian S. A comprehensive study of phenylalanine hydroxylase gene mutations in the Iranian phenylketonuria patients. Eur J Med Genet. 2019;62(9):103559. https://doi.org/10.1016/j.ejmg.2018.10.011.
o Ho G, Alexander I, Bhattacharya K et al. The Molecular Bases of Phenylketonuria (PKU) in New South Wales, Australia: Mutation Profile and Correlation with Tetrahydrobiopterin (BH4) Responsiveness. JIMD Rep. 2014;14(5939):55–65. https://doi.org/10.1007/8904_2013_284.
p Ramus SJ, Treacy EP, Cotton RG. Characterization of phenylalanine hydroxylase alleles in untreated phenylketonuria patients from Victoria, Australia: origin of alleles and haplotypes. Am J Hum Genet. 1995;56(5):1034–1041. http://www.ncbi.nlm.nih.gov/pubmed/7726156.

Case 2

A preterm male infant (33 weeks gestational age) presents at 4 days of life for consultation from the neonatal intensive care unit after the newborn screen demonstrated a Phe elevation of 296 µmol/L (normal […] C (p.Ile65Thr) and c.1315+1G>A (IVS12+1G>A). On the BioPKU database, individuals with these two variants seemed equally likely to present with classic PKU (N = 6) or with a mild or moderate form (N = 8) [49]. Of the five individuals tested with sapropterin treatment, three were responsive and one was slowly responsive. Based on the PAHvdb database, the first variant, p.Ile65Thr, has an APV of 1.5 and the second variant has an APV of 0. His predicted phenotype is classic to moderate PKU [50]. His actual phenotype is closer to moderate PKU. The family has trialed BH4 treatment, and he was found to be responsive.

Case 3

A term 15-day-old infant presents after two abnormal newborn screens, the initial showing a Phe level of 180 µmol/L (normal […] T (p.Arg221) pathogenic variant consistent with HPA due not to PKU (PAH deficiency) but to the BH4 deficiency (pterin disorder) known as dihydropteridine reductase deficiency. Treatment was initiated with sapropterin supplementation as well as neurotransmitter precursor supplements (L-dopa and 5-hydroxytryptophan).

References

[1] Scriver CR, Clow CL. Phenylketonuria: epitome of human biochemical genetics (first of two parts). N Engl J Med 1980;303(23):1336–42. https://doi.org/10.1056/NEJM198012043032305.
[2] Scriver CR, Clow CL. Phenylketonuria: epitome of human biochemical genetics (second of two parts). N Engl J Med 1980;303(24):1394–400. https://doi.org/10.1056/NEJM198012113032404.
[3] Kayaalp E, Treacy E, Waters PJ, Byck S, Nowacki P, Scriver CR. Human phenylalanine hydroxylase mutations and hyperphenylalaninemia phenotypes: a metanalysis of genotype-phenotype correlations. Am J Hum Genet 1997;61(6):1309–17. https://doi.org/10.1086/301638.
[4] National Institutes of Health Consensus Development Panel. National Institutes of Health Consensus Development Conference Statement: phenylketonuria: screening and management, October 16–18, 2000. Pediatrics 2001;108(4):972–82. https://doi.org/10.1542/peds.108.4.972.
[5] Boneh A. In: Blau N, editor. Phenylketonuria and BH4 deficiencies. 3rd ed., vol. 40; 2016.
[6] Centerwall SA, Centerwall WR. The discovery of phenylketonuria: the story of a young couple, two retarded children, and a scientist. Pediatrics 2000;105(1 Pt 1):89–103. https://doi.org/10.1542/peds.105.1.89.
[7] Centerwall WR, Centerwall SA. Phenylketonuria (Følling’s disease). The story of its discovery. J Hist Med Allied Sci 1961;16:292–6.
[8] Penrose LS. Inheritance of phenylpyruvic amentia (phenylketonuria). Lancet 1935;226(5839):192–4. https://doi.org/10.1016/S0140-6736(01)04897-8.
[9] Jervis GA. Studies on phenylpyruvic oligophrenia; the position of the metabolic error. J Biol Chem 1947;169(3):651–6.
[10] Bickel H, Gerrard J, Hickmans E. Influence of phenylalanine intake on phenylketonuria. Lancet 1953;265(6790):812–3.
[11] Jervis GA. Phenylpyruvic oligophrenia deficiency of phenylalanine-oxidizing system. Proc Soc Exp Biol Med 1953;82(3):514–5.
[12] Woolf LI, Griffiths R, Moncrieff A. Treatment of phenylketonuria with a diet low in phenylalanine. Br Med J 1955;1(4905):57–64. https://doi.org/10.1136/bmj.1.4905.57.
[13] Scriver CC. A simple phenylalanine method for detecting phenylketonuria in large populations of newborn infants, by Robert Guthrie and Ada Susi, Pediatrics, 1963;32:318–343. Pediatrics 1998;102(1 Pt 2):236–7.
[14] Guthrie P, Life happens, science follows.
[15] Waisbren SE, Noel K, Fahrbach K, et al. Phenylalanine blood levels and clinical outcomes in phenylketonuria: a systematic literature review and meta-analysis. Mol Genet Metabol 2007;92(1–2):63–70. https://doi.org/10.1016/j.ymgme.2007.05.006.
[16] Blau N, Hennermann JB, Langenbeck U, Lichter-Konecki U. Diagnosis, classification, and genetics of phenylketonuria and tetrahydrobiopterin (BH4) deficiencies. Mol Genet Metabol 2011;104(Suppl. l):S2–9. https://doi.org/10.1016/j.ymgme.2011.08.017.
[17] Camp KM, Parisi MA, Acosta PB, et al. Phenylketonuria Scientific Review Conference: state of the science and future research needs. Mol Genet Metabol 2014;112(2):87–122. https://doi.org/10.1016/j.ymgme.2014.02.013.
[18] Regier DS, Greene CL. Phenylalanine hydroxylase deficiency. 1993.


[19] Vockley J, Andersson HC, Antshel KM, et al. Phenylalanine hydroxylase deficiency: diagnosis and management guideline. Genet Med 2014;16(2):188–200. https://doi.org/10.1038/gim.2013.157.
[20] Resta R. Generation n + 1: projected numbers of babies born to women with PKU compared to babies with PKU in the United States in 2009. Am J Med Genet A 2012;158A(5):1118–23. https://doi.org/10.1002/ajmg.a.35312.
[21] Levy HL, Sarkissian CN, Scriver CR. Phenylalanine ammonia lyase (PAL): from discovery to enzyme substitution therapy for phenylketonuria. Mol Genet Metabol 2018;124(4):223–9. https://doi.org/10.1016/j.ymgme.2018.06.002.
[22] Jurecki ER, Cederbaum S, Kopesky J, et al. Adherence to clinic recommendations among patients with phenylketonuria in the United States. Mol Genet Metabol 2017;120(3):190–7. https://doi.org/10.1016/j.ymgme.2017.01.001.
[23] Andersen AE, Avins L. Lowering brain phenylalanine levels by giving other large neutral amino acids. A new experimental therapeutic approach to phenylketonuria. Arch Neurol 1976;33(10):684–6. https://doi.org/10.1001/archneur.1976.00500100018008.
[24] Pietz J, Kreis R, Rupp A, et al. Large neutral amino acids block phenylalanine transport into brain tissue in patients with phenylketonuria. J Clin Invest 1999;103(8):1169–78. https://doi.org/10.1172/JCI5017.
[25] Thomas J, Levy H, Amato S, et al. Pegvaliase for the treatment of phenylketonuria: results of a long-term phase 3 clinical trial program (PRISM). Mol Genet Metabol 2018;124(1):27–38. https://doi.org/10.1016/j.ymgme.2018.03.006.
[26] Mabry CC, Denniston JC, Nelson TL, Son CD. Maternal phenylketonuria. A cause of mental retardation in children without the metabolic defect. N Engl J Med 1963;269:1404–8. https://doi.org/10.1056/NEJM196312262692604.
[27] Levy HL. Historical background for the maternal PKU syndrome. Pediatrics 2003;112(6 Pt 2):1516–8.
[28] Lenke RR, Levy HL. Maternal phenylketonuria and hyperphenylalaninemia. An international survey of the outcome of untreated and treated pregnancies. N Engl J Med 1980;303(21):1202–8. https://doi.org/10.1056/NEJM198011203032104.
[29] Rajabi F, Rohr F, Wessel A, et al. Phenylalanine hydroxylase genotype-phenotype associations in the United States: a single center study. Mol Genet Metabol 2019;128(4):415–21. https://doi.org/10.1016/j.ymgme.2019.09.004.
[30] Pardue ML, Gall JG. Molecular hybridization of radioactive DNA to the DNA of cytological preparations. Proc Natl Acad Sci U S A 1969;64(2):600–4. https://doi.org/10.1073/pnas.64.2.600.
[31] Jackson DA, Symons RH, Berg P. Biochemical method for inserting new genetic information into DNA of Simian virus 40: circular SV40 DNA molecules containing lambda phage genes and the galactose operon of Escherichia coli. Proc Natl Acad Sci U S A 1972;69(10):2904–9. https://doi.org/10.1073/pnas.69.10.2904.
[32] Cohen SN, Chang AC, Hsu L. Nonchromosomal antibiotic resistance in bacteria: genetic transformation of Escherichia coli by R-factor DNA. Proc Natl Acad Sci U S A 1972;69(8):2110–4. https://doi.org/10.1073/pnas.69.8.2110.
[33] Su TS, Bock HG, O’Brien WE, Beaudet AL. Cloning of cDNA for argininosuccinate synthetase mRNA and study of enzyme overproduction in a human cell line. J Biol Chem 1981;256(22):11826–31.
[34] Horwich AL, Fenton WA, Williams KR, et al. Structure and expression of a complementary DNA for the nuclear coded precursor of human mitochondrial ornithine transcarbamylase. Science 1984;224(4653):1068–74. https://doi.org/10.1126/science.6372096.
[35] Gillam SS, Woo SL, Woolf LI. The isolation and properties of phenylalanine hydroxylase from rat liver. Biochem J 1974;139(3):731–9. https://doi.org/10.1042/bj1390731.
[36] Robson KJ, Chandra T, MacGillivray RT, Woo SL. Polysome immunoprecipitation of phenylalanine hydroxylase mRNA from rat liver and cloning of its cDNA. Proc Natl Acad Sci U S A 1982;79(15):4701–5. https://doi.org/10.1073/pnas.79.15.4701.


[37] Woo SL, Lidsky AS, Güttler F, Chandra T, Robson KJ. Cloned human phenylalanine hydroxylase gene allows prenatal diagnosis and carrier detection of classical phenylketonuria. Nature 1983;306(5939):151–5. https://doi.org/10.1017/CBO9781107415324.004.
[38] Lidsky AS, Ledley FD, DiLella AG, et al. Extensive restriction site polymorphism at the human phenylalanine hydroxylase locus and application in prenatal diagnosis of phenylketonuria. Am J Hum Genet 1985;37(4):619–34.
[39] Woo SL, Lidsky AS, Güttler F, Thirumalachary C, Robson KJ. Prenatal diagnosis of classical phenylketonuria by gene mapping. J Am Med Assoc 1984;251(15):1998–2002.
[40] Lidsky AS, Law ML, Morse HG, et al. Regional mapping of the phenylalanine hydroxylase gene and the phenylketonuria locus in the human genome. Proc Natl Acad Sci U S A 1985;82(18):6221–5. https://doi.org/10.1073/pnas.82.18.6221.
[41] Ledley FD, Grenett HE, DiLella AG, Kwok SC, Woo SL. Gene transfer and expression of human phenylalanine hydroxylase. Science 1985;228(4695):77–9. https://doi.org/10.1126/science.3856322.
[42] Ledley FD, Levy HL, Woo SL. Molecular analysis of the inheritance of phenylketonuria and mild hyperphenylalaninemia in families with both disorders. N Engl J Med 1986;314(20):1276–80. https://doi.org/10.1056/NEJM198605153142002.
[43] DiLella AG, Kwok SC, Ledley FD, Marvit J, Woo SL. Molecular structure and polymorphic map of the human phenylalanine hydroxylase gene. Biochemistry 1986;25(4):743–9. https://doi.org/10.1021/bi00352a001.
[44] Guldberg P, Rey F, Zschocke J, et al. A European multicenter study of phenylalanine hydroxylase deficiency: classification of 105 mutations and a general system for genotype-based prediction of metabolic phenotype. Am J Hum Genet 1998;63(1):71–9. https://doi.org/10.1086/301920.
[45] Erlandsen H, Patch MG, Gamez A, Straub M, Stevens RC. Structural studies on phenylalanine hydroxylase and implications toward understanding and treating phenylketonuria. Pediatrics 2003;112(6 Pt 2):1557–65.
[46] Flydal MI, Alcorlo-Pagés M, Johannessen FG, et al. Structure of full-length human phenylalanine hydroxylase in complex with tetrahydrobiopterin. Proc Natl Acad Sci U S A 2019;116(23):11229–34. https://doi.org/10.1073/pnas.1902639116.
[47] Shen N, Heintz C, Thiel C, Okun JG, Hoffmann GF, Blau N. Co-expression of phenylalanine hydroxylase variants and effects of interallelic complementation on in vitro enzyme activity and genotype-phenotype correlation. Mol Genet Metab 2016. https://doi.org/10.1016/j.ymgme.2016.01.004.
[48] Erlandsen H, Stevens RC. A structural hypothesis for BH4 responsiveness in patients with mild forms of hyperphenylalaninaemia and phenylketonuria. J Inherit Metab Dis 2001;24(2):213–30. https://doi.org/10.1023/a:1010371002631.
[49] PAHvdb. http://www.biopku.org.
[50] Garbade SF, Shen N, Himmelreich N, et al. Allelic phenotype values: a model for genotype-based phenotype prediction in phenylketonuria. Genet Med 2019;21(3):580–90. https://doi.org/10.1038/s41436-018-0081-x.
[51] Leandro J, Nascimento C, de Almeida IT, Leandro P. Co-expression of different subunits of human phenylalanine hydroxylase: evidence of negative interallelic complementation. Biochim Biophys Acta 2006;1762(5):544–50. https://doi.org/10.1016/j.bbadis.2006.02.001.
[52] Shen N, Heintz C, Thiel C, Okun JG, Hoffmann GF, Blau N. Co-expression of phenylalanine hydroxylase variants and effects of interallelic complementation on in vitro enzyme activity and genotype-phenotype correlation. Mol Genet Metabol 2016;117(3):328–35. https://doi.org/10.1016/j.ymgme.2016.01.004.

CHAPTER 16

Hearing loss

Anna Morgan1, Paolo Gasparini1,2, Giorgia Girotto1,2
1Medical Genetics Unit, Institute for Maternal and Child Health – IRCCS, Burlo Garofolo, Trieste, Italy; 2Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy

Introduction

Hearing loss (HL) is the most prevalent sensory impairment in both childhood and adulthood [1]. According to the World Health Organization (WHO), more than 5% of the world's population (~466 million people) has disabling HL, and this number is estimated to almost double by 2050, with over 900 million people affected (https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss). HL is classified into many categories based on (1) the degree and configuration of loss, (2) the location of the damage causing the deficit, and (3) the age of onset.

In terms of (1) the degree of loss, we can distinguish between mild HL (hearing threshold between 26 and 40 dB), moderate HL (hearing threshold between 41 and 55 dB), moderately severe HL (hearing threshold between 56 and 70 dB), severe HL (hearing threshold between 71 and 90 dB), and profound HL (hearing threshold above 90 dB) [2]. Furthermore, HL can be categorized according to the audiometric configuration, i.e., the shape or pattern of the audiogram across the frequency spectrum. When the HL does not vary by more than 20 dB across all frequencies, the audiogram is defined as flat. Conversely, a sloping configuration is characterized by better hearing at the lower frequencies and poorer hearing at the higher ones. A rare type of audiogram is the rising configuration, in which high-frequency sounds are heard better than low-frequency ones. Another peculiar type of HL is the cookie-bite or U-shaped configuration, in which subjects have difficulty hearing mid-frequency sounds while maintaining the ability to hear high- and low-frequency sounds. Finally, a noise-notched configuration indicates a loss mostly between 3 and 6 kHz, with lower and higher frequencies unaffected, and is usually a consequence of noise exposure [3].
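The degree categories and the definition of a flat audiogram given above can be written down as a short Python sketch. The function names are illustrative, and the label returned for thresholds below 26 dB is an assumption, since the text does not name that range.

```python
from typing import Sequence


def hl_degree(threshold_db: float) -> str:
    """Map a hearing threshold (dB) to the degree of HL listed in the text."""
    if threshold_db > 90:
        return "profound"
    if threshold_db > 70:
        return "severe"
    if threshold_db > 55:
        return "moderately severe"
    if threshold_db > 40:
        return "moderate"
    if threshold_db > 25:
        return "mild"
    # Assumption: the chapter does not name the range below 26 dB.
    return "below the mild range"


def is_flat(thresholds_db: Sequence[float]) -> bool:
    """Flat audiogram: thresholds vary by no more than 20 dB across frequencies."""
    return max(thresholds_db) - min(thresholds_db) <= 20
```

The other configurations (sloping, rising, cookie-bite, noise-notched) are described only qualitatively in the text, so they are not encoded here.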
Depending on (2) the location of the lesion causing the problem, HL can be classified as: conductive HL, which results from abnormalities of the external ear and/or the ossicles of the middle ear that prevent sound from being conducted to the inner ear; sensorineural HL, when the damage is in the inner ear (i.e., a lesion affecting the cochlea or the auditory nerve); mixed HL, defined as a combination of conductive and sensorineural HL; and central auditory dysfunction, resulting from damage or dysfunction at the level of the eighth cranial nerve, auditory brain stem, or cerebral cortex [4]. Based on (3) the age of onset, it is possible to distinguish between prelingual HL, present before speech develops, and postlingual HL, which occurs after the development of normal speech [2].

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00016-8 Copyright © 2021 Elsevier Inc. All rights reserved.

It has been shown that at least 50% of congenital or childhood HL is attributable to genetic causes, leading to the so-called hereditary hearing loss (HHL) [5]. HHL can be syndromic (SHL) (about 30%), in which deafness is accompanied by other signs and/or symptoms, or non-syndromic (NSHL) (about 70%), in which there are no additional abnormalities [6]. NSHL is inherited as an autosomal recessive trait (DFNB) in 75%-80% of cases and as an autosomal dominant trait (DFNA) in 20%, while X-linked (DFNX) or mitochondrial inheritance accounts for a small proportion of cases (1%-5%) [7]. Considering the clinical complexity of HHL, it should come as no surprise that this phenotype is also extremely heterogeneous from the genetic point of view. To date, about 170 NSHL loci (67 DFNA loci, 93 DFNB loci, 6 X-linked loci, 2 modifier loci, 1 Y-linked locus, and 1 locus for auditory neuropathy) and 117 genes (36 DFNA genes, 65 DFNB genes, 11 DFNA/DFNB genes, and 5 X-linked genes) have been reported as causative (Hereditary Hearing Loss homepage; http://hereditaryhearingloss.org/), and more than 400 syndromes associated with HL and other symptoms have been described [8]. The introduction of next-generation sequencing (NGS) technologies, such as targeted resequencing (TRS) and whole-exome sequencing (WES), has dramatically increased the diagnostic rate of HHL, leading to the identification of several mutations in known deafness genes as well as to the discovery of new disease genes [9-12].
In fact, apart from the relevant contribution of variants in the GJB2 and GJB6 genes, which are responsible for approximately 50% of all autosomal recessive cases in some populations [13-15], and of STRC deletions, which have been estimated to account for 1%-5% of HL cases [16], no other major players have been identified worldwide. In this light, the possibility of simultaneously screening a large number of genes is essential for reaching the molecular diagnosis of such a heterogeneous disease. In addition to TRS and WES, whole-genome sequencing (WGS) is also entering the diagnostic workup of HL, even though its cost-benefit ratio is still highly unbalanced. The identification of the genetic cause of HHL is particularly difficult for NSHL since, in addition to the genetic heterogeneity, the same pathogenic variant may differ among patients in the severity of HL or in the overall clinical presentation [17], and predicting the specific causative gene from the audiological phenotype is possible only in a few cases (e.g., WFS1 (OMIM 606201), COCH (OMIM 603196), and TECTA (OMIM 602574)) [18]. An additional level of complexity is that certain SHL genes can mimic NSHL, since some syndromic features may appear later in life, such as retinitis pigmentosa (RP) in Usher syndrome or delayed puberty in Kallmann syndrome. In other cases patients may have only subtle phenotypic features of a syndrome, e.g., mild facial features of CHARGE or Noonan syndrome, which can mislead clinicians [6]. The study of the genetics of HHL is therefore far from trivial, and requires close collaboration between otolaryngologists, geneticists, ophthalmologists, and audiologists. Here we will review the standard practices for the molecular analysis of HHL and describe a series of clinical cases that highlight frequent problems encountered during the study of this common disease.


Genetic tests for hearing loss

Prior to any genetic test it is necessary to collect a comprehensive history and to perform a physical examination and an audiological evaluation of the patient. This first stage of evaluation can help to detect those cases in which the HL is caused by prenatal infections (e.g., toxoplasmosis or cytomegalovirus) [19] and does not require genetic investigation, and to identify clinical features related to SHL. If a syndrome is identified, genetic testing can be limited to the set of genes known to cause that specific syndrome. Conversely, when a patient presents with NSHL that appears to be recessive, sequencing of the GJB2 gene, which encodes the Connexin 26 protein, is recommended, based on the frequency of its mutations [20]. For the same reason, together with the analysis of GJB2, mutations and deletions in the GJB6 gene (i.e., delGJB6-D13S1830; delGJB6-D13S1854), which encodes Connexin 30, and mutations in the mitochondrial 12S rRNA are also screened [21,22]. The yield of this test depends on several factors, such as the type of HL (i.e., NSHL due to GJB2 mutations is usually congenital and severe/profound) [23], the ethnicity (i.e., it is well known that the prevalence of GJB2 and GJB6 mutations may vary considerably among populations) [24,25], and the exposure to aminoglycosides (i.e., HL due to mutations in the mtDNA is usually triggered by specific antibiotics that, in the presence of the mutations, have an ototoxic effect) [26]. Even so, it is a well-accepted first-level screening for patients affected by NSHL of unknown etiology. If the tests for GJB2, GJB6, and mtDNA mutations are negative, the second step in identifying the molecular cause of NSHL is the use of NGS technologies, such as TRS of a specific subset of genes, or WES for the analysis of the entire coding region of the genome.
This approach, allowing the simultaneous analysis of all the genes already described as causative for NSHL, is the most appropriate for facing the genetic heterogeneity of this disorder and is suitable for any type of inheritance (i.e., autosomal recessive, autosomal dominant, X-linked). The choice between TRS and WES must take several aspects into account, since both approaches have pros and cons. With TRS only a specific subset of genes is analyzed, and therefore the mean coverage (i.e., the average number of reads that align to the known reference bases) is usually much higher than with WES, ensuring better sensitivity (a lower false-negative rate) and specificity (a lower false-positive rate). Furthermore, when testing a limited panel, the data storage requirements are smaller, more patients can be analyzed in parallel, which reduces the cost of the test, and the data analysis is usually faster. The analysis is faster because the number of variants to evaluate is smaller than with WES and, above all, because establishing the pathogenicity of a variant that affects a gene already known to cause HHL is facilitated by several databases, such as the Deafness Variation Database (DVD) [27], the Human Gene Mutation Database (HGMD) [28], the Leiden Open Variation Database 3.0 (LOVD v.3.0) [29], and ClinVar [30]. These databases are repositories of thousands of variants already classified as pathogenic, likely pathogenic, benign, or as variants of unknown significance (VUS), and are extremely valuable for geneticists and bioinformaticians during sequencing data interpretation. Of course, these resources are not helpful for novel variants, which need different tools to aid in the analysis. In this case, the workflow for variant interpretation is the same as for WES, the only difference being that the overall number of nucleotide changes to evaluate is considerably smaller for TRS.
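The coverage and accuracy terms used above can be made concrete with a minimal Python sketch. The function names and all numbers are hypothetical and only illustrate the definitions: mean coverage as the average read depth over the targeted bases, sensitivity as the fraction of true variants that are called, and specificity as the fraction of non-variant positions that are correctly left uncalled.

```python
from typing import Sequence


def mean_coverage(per_base_depth: Sequence[int]) -> float:
    """Average number of aligned reads per targeted reference base."""
    return sum(per_base_depth) / len(per_base_depth)


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were detected (1 - false-negative rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-variant sites correctly left uncalled (1 - false-positive rate)."""
    return true_neg / (true_neg + false_pos)


# Hypothetical depths for three targeted bases, for illustration only.
panel_depth = [100, 200, 300]
panel_mean = mean_coverage(panel_depth)
```

In practice per-base depth would come from an alignment file rather than a hand-written list; the sketch only shows what the quantities mean.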
The standard pipeline for variant interpretation includes the following steps.

(1) The analysis of the variant's allele frequency (AF) in public databases, such as the Database of Short Genetic Variations (dbSNP) [31], the Exome Aggregation Consortium (ExAC) [32], the Exome Variant Server (EVS) (NHLBI Exome Sequencing Project (ESP), Seattle, WA; http://evs.gs.washington.edu/EVS/), and the Genome Aggregation Database (gnomAD) (https://gnomad.broadinstitute.org/). A recent work that systematically reviewed gene variants linked to HL suggested an AF filter of 0.6% and 0.1% for recessive and dominant genes, respectively [33].

(2) In silico pathogenicity prediction using different tools, such as MutationTaster [34], PolyPhen-2 [62], Sorting Intolerant from Tolerant (SIFT) [35], Combined Annotation-Dependent Depletion (CADD) [36], Human Splicing Finder (HSF) [37], PhyloP [38], and the GERP++ score [39]. The best way to assess the pathogenic role of a variant is to validate it through functional studies; however, this does not fit the timeframe of the standard diagnostic routine. In this light, the use of different software tools that predict the effect of a variant on the protein structure, or on the splicing process, is the only suitable alternative.

(3) The application of the American College of Medical Genetics and Genomics (ACMG) guidelines for variant interpretation [40]. These recommendations describe the ideal process for classifying variants into five categories (i.e., "pathogenic," "likely pathogenic," "uncertain significance," "likely benign," and "benign") based on specific criteria (i.e., population data, case-control analyses, functional data, computational predictions, allelic data, segregation studies, and de novo observations), and they are considered the most reliable worldwide. In particular, the ACMG defined 28 criteria, each associated with a specific weight (stand-alone, very strong, strong, moderate, or supporting) and direction (benign or pathogenic), that are combined to reach the final classification of a variant. In some cases, the strength of an individual criterion can be changed at the discretion of the curator, and the overall classification can be modified with expert judgment.

For HL, a recent paper by Oza et al. [63] provided a comprehensive illustration of the ACMG rules for variant interpretation, revising them to take into account the high heterogeneity of this disease. Briefly, among the 28 criteria, 4 have been classified as "not applicable" (i.e., PP2 and PP5, supporting evidence of pathogenicity, and BP1 and BP6, supporting evidence of benign impact), 21 had gene/disease-based or strength-level specifications (including PS3, PM1, PM2, PP4, BA1, BS4, BP2, PVS1, PS2, PM3, PM5, PM6, PP1, BS3, PS4, and BS1), and the last 3 remained unchanged (i.e., PM4, BP3, and BP7). Moreover, two additional changes to the ACMG recommendations have been introduced: (1) null variants absent from controls (i.e., PM2) affecting autosomal recessive genes where loss-of-function is a known mechanism of disease (i.e., PVS1) should be classified as likely pathogenic; (2) variants with an AF greater than expected for HL (i.e., BS1), without valid conflicting evidence that would suggest a pathogenic role, should be classified as likely benign. More detailed information and technical examples can be found in the abovementioned paper.

When applying TRS, the pool of variants that need to be categorized affects only a subset of genes strictly related to the disease; thus there is no risk of secondary findings, i.e., findings that are clinically significant to the patient but unrelated to the primary indication. To date, the ACMG recommends reporting incidental findings in a minimum list of 59 medically actionable genes [41], provided that the patient agreed in the pretest consent to be informed about them. While testing a panel of genes has a series of advantages, which are exactly the cons of WES, its main limitation is that the panel must be constantly updated to include newly identified genes.
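Returning to steps (1) and (3) of the pipeline, the following Python sketch shows how the suggested AF cutoffs and the two HL-specific classification shortcuts could be applied. It is a simplified illustration, not the published specification: the function names are hypothetical, the treatment of a variant exactly at the cutoff is an assumption, and all remaining ACMG combining rules are deliberately left out.

```python
from typing import Iterable, Optional

# AF cutoffs suggested in the text: 0.6% for recessive, 0.1% for dominant genes.
AF_CUTOFF = {"recessive": 0.006, "dominant": 0.001}


def passes_af_filter(allele_freq: Optional[float], inheritance: str) -> bool:
    """Keep a variant whose population AF does not exceed the cutoff.

    Assumptions: a variant absent from the databases (None) is kept, and a
    variant exactly at the cutoff is kept.
    """
    if allele_freq is None:
        return True
    return allele_freq <= AF_CUTOFF[inheritance]


def hl_specific_call(criteria: Iterable[str], recessive_gene: bool) -> Optional[str]:
    """Apply only the two HL-specific shortcuts described in the text.

    Returns None when neither shortcut applies, in which case the full set of
    ACMG combining rules (not modelled here) would be needed.
    """
    met = set(criteria)
    if recessive_gene and {"PVS1", "PM2"} <= met:
        return "likely pathogenic"
    # Pathogenic criteria codes start with "P" (PVS, PS, PM, PP).
    has_pathogenic_evidence = any(code.startswith("P") for code in met)
    if "BS1" in met and not has_pathogenic_evidence:
        return "likely benign"
    return None
```

For example, a null variant in a recessive gene with PVS1 and PM2 met would return "likely pathogenic", while a variant meeting BS1 together with PS3 would return None and need full evaluation.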
To keep gene panels current, a specific working group of the Clinical Genome Resource (an NIH-funded resource dedicated to defining the clinical relevance of genes and variants for use in precision medicine and research [42]) performed an evidence-based curation of 142 genes associated with both SHL and NSHL, classifying them as "Definitive," "Strong," "Moderate," "Limited," "Disputed," or "Refuted," and providing helpful recommendations for the choice of the genes to be included in a TRS panel [43]. Moreover, the group pointed out the need to include some SHL genes in panels dedicated to NSHL, given the possibility of missing a syndromic diagnosis because of delayed onset, variable expressivity, or subtle presentation of non-HL features. In fact, if separate panels are used for NSHL and SHL, many NSHL-mimic syndromes may escape molecular diagnosis, lengthening the time required for a proper diagnosis and thus delaying the correct clinical management of the patient. Furthermore, with TRS it is not possible to identify any new disease-causing gene. Needless to say, the cons of TRS are the strengths of WES.

The choice of one technique over the other should therefore be made case by case. In particular, if a specific syndrome is suspected after the clinical evaluation of the patient, the best approach is TRS of the panel of genes causative of that SHL, since it is a cost- and time-saving solution that will readily lead to a conclusive molecular diagnosis. Conversely, for an apparent NSHL, for which it is nearly impossible to establish a prediagnostic hypothesis based only on clinical examination and audiological findings, WES should be performed. It is important to consider that neither TRS nor WES can detect deep intronic variants, which are involved in several HHL cases [44], and that neither is reliable for the detection of copy number variations (CNVs), especially those in the heterozygous state. Thus, considering the increasing detection of large CNVs in HHL genes [9,45], a comprehensive molecular test for HHL should include SNP array analysis or similar technologies. As an example, the STRC gene (the second most frequent cause of HHL in several populations) has a deletion carrier rate of ~1.6% in the European population, which is similar to the carrier rate of the c.35delG mutation of the GJB2 gene in the same population (1.89%) [46]. Here we will present some practical examples of both NSHL and SHL that highlight some of the most common problems and difficulties that geneticists and clinicians can encounter during the molecular analysis of HL.

Disease sections: practical examples that highlight the main challenges of the molecular diagnosis of hearing loss

Apparent non-syndromic hearing loss

An emerging problem in the molecular analysis of HL is the inability to distinguish between SHL and NSHL. While in most cases the differences between these two scenarios are evident, in other cases the clinical features of a syndrome might be subtle, or even absent until later in life. This is the case for Usher syndrome, a rare disease characterized by sensorineural HL, with or without vestibular problems, and RP. Clinicians have identified three types of Usher syndrome, which differ from one another in the age at which the symptoms appear and in their severity. In particular, Usher syndrome 1 (USH1), which accounts for 70% of all Usher cases, is usually characterized by profound congenital sensorineural HL, absent vestibular function, and RP that appears by the age of 10; Usher syndrome 2 (USH2) (~26% of all Usher cases) presents with moderate to severe congenital sensorineural HL, normal vestibular function, and RP with a delayed onset compared to USH1 (i.e., by 20 years of age); Usher syndrome 3 (USH3), which represents about 4% of all Usher cases, is characterized by progressive sensorineural HL, with vestibular problems in half of the cases, and RP that develops in the second decade of life [47]. Even with an accurate clinical assessment, RP would not be diagnosed during early childhood, especially in USH2 and USH3 patients, leading clinicians to an incorrect diagnosis of NSHL.


This is something that geneticists need to keep in mind when the molecular diagnosis of HHL relies on TRS panels that include only genes responsible for NSHL. A clear example is Family 1, a nonconsanguineous, three-generation Italian family with five individuals (Fig. 16.1A) that came to genetic counseling with a diagnosis of sensorineural HL only. The proband (II:1), a 14-year-old female, was diagnosed with moderate to severe down-sloping bilateral sensorineural HL at the age of 2. Over the next few years her HL remained stable, and no vestibular or visual impairment was ever diagnosed. The proband's brother (II:3), 8 years old, displayed an overlapping audioprofile with a similar age of onset, as shown in Fig. 16.1. No one else in the family, neither the parents (I:1, I:2) nor the proband's sister (II:2), presented with HL (Fig. 16.1B). As a first step, the family was tested for mutations in the GJB2 and GJB6 genes and in the MT-RNR1 mitochondrial gene, with negative results. Secondly, a TRS panel of 96 NSHL genes [48] was performed. Data analysis did not detect pathogenic mutations in any of the genes under investigation, and for this reason the family was analyzed by WES, which revealed two likely pathogenic variants in the ADGRV1 gene (NM_032119.3), c.3917G>A, p.(G1306E), and c.12802C>T, p.(R4268), in the compound heterozygous state in the affected patients. This gene encodes a member of the G-protein-coupled receptor superfamily, and its mutations are associated with USH2 [49]. The identification of these two variants changed the clinical diagnosis of the patients from NSHL to Usher syndrome type 2, with extremely relevant implications for the clinical management of the patients and their relatives.

Periodic ophthalmological check-ups have been proposed to the family, in order to monitor for and rapidly detect the onset of RP, and the family was put in contact with specific associations that could help with the practical and psychological management of the disorder. This example highlights two important aspects: (1) when dealing with NSHL, especially in young patients, it is necessary to consider that some clinical features might not yet be present and that the phenotype might not be 100% certain; thus more comprehensive gene panels that also include SHL genes, or WES, are needed to achieve a correct molecular diagnosis; (2) the identification of the genetic cause of the disease can be extremely important, since in some cases it leads to a presymptomatic diagnosis, avoids further unnecessary testing, and provides useful prognostic and heritability information.
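The compound heterozygous finding in Family 1 can be illustrated with a small Python sketch that checks whether two variants lie on different parental alleles (in trans) and whether carrying both variants tracks with the disease. The parental origin of each variant and the genotype of the unaffected sister are not reported in the text; the values below are hypothetical placeholders, as are the function names.

```python
from typing import Dict, Iterable, Set

PAIR = {"c.3917G>A", "c.12802C>T"}  # the two ADGRV1 variants named in the text


def in_trans(carriers_of_first: Iterable[str], carriers_of_second: Iterable[str]) -> bool:
    """True when each variant is carried by a different, single parent."""
    first, second = set(carriers_of_first), set(carriers_of_second)
    return len(first) == 1 and len(second) == 1 and first != second


def segregates_recessive(genotypes: Dict[str, Set[str]], affected: Set[str]) -> bool:
    """True when exactly the affected individuals carry both variants."""
    carry_both = {person for person, variants in genotypes.items() if PAIR <= variants}
    return carry_both == affected


# Hypothetical genotypes: parental origin and II:2 status are assumptions.
family1 = {
    "I:1": {"c.3917G>A"},
    "I:2": {"c.12802C>T"},
    "II:1": {"c.3917G>A", "c.12802C>T"},
    "II:2": {"c.3917G>A"},
    "II:3": {"c.3917G>A", "c.12802C>T"},
}
family1_segregates = segregates_recessive(family1, {"II:1", "II:3"})
```

A check of this kind only summarizes genotype data that have already been confirmed, for example by Sanger sequencing of the parents and siblings.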

FIGURE 16.1 Pedigree, genetic, and audiological data of Family 1. (A) Pedigree of Family 1. Filled symbols represent affected individuals. The proband is indicated with an arrow. Individuals with Roman numeric labels were analyzed in this study. (B) Audiometric features of all the family members are displayed as audiograms (air conduction). The thresholds of the right and left ears are shown.

Large families with more than one gene involved

Considering the high incidence of HL in the general population, it is not uncommon for large families to include at least one phenocopy that could hamper the molecular diagnosis. A phenocopy is a phenotypic trait that resembles the trait expressed by a specific genotype, but in a subject who does not carry that genotype. In the case of HL, it can be due to several factors, such as aging, lifestyle, noise exposure, or a mutation in a different gene. Unfortunately, since it is extremely difficult to draw any clear phenotype-genotype correlation in HHL, identifying phenocopies among the affected individuals of a family is far from easy. An interesting example is Family 2, a five-generation nonconsanguineous Italian family (Fig. 16.2A) that came to genetic counseling with a diagnosis of HL. The proband (IV:3), a 53-year-old male, displayed postlingual profound bilateral high-frequency HL, following a ski-slope profile (Fig. 16.2B). His HL had not progressed over the years, and no additional clinical signs were ever diagnosed. The family history suggested a likely autosomal dominant HHL, and the family was therefore enrolled for genetic screening. As usual, the proband was first tested for mutations in GJB2, deletions in GJB6, and the A1555G mitochondrial mutation, with negative results. As a second step, the proband (IV:3), his daughter (V:4), his wife (IV:4), and one affected sibling (IV:1) were tested with a TRS panel of 96 NSHL genes. Data analysis led to the identification of a novel deletion in the GSDME gene (NM_001127454.1), c.666_669del, p.(Y223Sfs49). GSDME, also known as DFNA5, encodes the DFNA5 protein, which belongs to a family of proteins involved in apoptosis and necrotic activity, and is known to cause autosomal dominant NSHL. Interestingly, the audiograms of the proband and of the other affected individuals selected for TRS (Fig. 16.2B) were similar to those of other DFNA-mutated patients [50], and for this reason the deletion was selected for segregation analysis within the family. The DNA samples of the other family members, as well as their audiograms, were collected, and Sanger sequencing was performed. The variant appeared to segregate correctly within the family, with the only exception of individuals V:5 and V:6, two siblings and nephews of the proband. After a careful clinical evaluation, the audiograms of the two siblings, as well as the age of onset of their HL, appeared different from those of the other relatives, since they displayed congenital severe-to-profound HL across all frequencies (Fig. 16.2C). In this light, a different genetic cause was hypothesized, and the siblings were screened for mutations in the GJB2 gene (NM_004004.5). Sanger sequencing revealed two known mutations in the compound heterozygous state, c.101T>C, p.(M34T), and c.109G>A, p.(V37I), the first inherited from the father and the second from the mother, who also carried the DFNA5 deletion.

FIGURE 16.2 Pedigree, genetic, and audiological data of Family 2. (A) Pedigree of Family 2. Filled symbols represent affected individuals. The proband is indicated with an arrow. Individuals with Roman numeric labels were analyzed in this study. (B) Audiograms (air conduction) of the individuals selected for TRS. The thresholds of the right and left ears are shown. (C) Audiograms (air conduction) of the patients carrying GJB2 mutations. The thresholds of the right and left ears are shown.
This is not the only example of multiple genes being involved in a single family [51]. Especially when dealing with large pedigrees, the apparently incorrect segregation of a mutation does not always indicate a wrong diagnosis but could be attributable to the presence of phenocopies. In this light, even though we have already highlighted how difficult it is to draw any genotype-phenotype correlation for HHL, if almost all the patients of one family display a similar phenotype, except for a few individuals who do not share the same genotype, the presence of other possible causes of HL should be taken into consideration.
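The reasoning applied to Family 2 can be sketched in Python as a segregation check that lists affected relatives who do not carry the familial variant, i.e., the candidates for a phenocopy or a second genetic cause. The function name is hypothetical and the sets contain only the individuals explicitly mentioned in the text; the carrier status of the remaining relatives is not modelled.

```python
from typing import Dict, List, Set


def segregation_report(affected: Set[str], carriers: Set[str]) -> Dict[str, List[str]]:
    """Compare affection status with carrier status for a dominant variant."""
    return {
        # Affected but not carrying the variant: possible phenocopies.
        "phenocopy_candidates": sorted(affected - carriers),
        # Carrying the variant but unaffected: e.g. not yet symptomatic.
        "unaffected_carriers": sorted(carriers - affected),
    }


# Illustrative subset of Family 2: IV:1 and IV:3 carry the GSDME deletion,
# while V:5 and V:6 are affected but do not carry it.
family2 = segregation_report(
    affected={"IV:1", "IV:3", "V:5", "V:6"},
    carriers={"IV:1", "IV:3"},
)
```

Individuals flagged as phenocopy candidates are the ones whose audiograms and age of onset deserve a second look, as was done for V:5 and V:6.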

The importance of molecular karyotyping in the analysis of hearing loss

During the last few years several studies have highlighted the importance of CNVs and complex rearrangements in the etiopathogenesis of HHL [9,45,52,53]. Despite the huge improvement of both TRS/WES protocols and data analysis software, the reliability of CNV calling from sequencing data is still controversial. In fact, it strictly depends on high depth and uniformity of coverage across all target sites, and also on the size of the genomic regions of interest. In this regard, WGS offers the potential to detect all genetic variation, including CNVs and structural changes such as inversions and translocations. However, because of the high costs and the significant bioinformatics challenges arising from the quantity and complexity of the data produced, it has not yet been adopted as a standard diagnostic assay. In contrast to WGS, WES reduces the cost and complexity of data analysis; however, because of differences in probe hybridization and efficiency, which affect the uniformity of coverage across targets, CNV calling from WES data can be really challenging, especially for CNVs in the heterozygous state. The same applies to TRS data, for which the uniformity of coverage can fluctuate even more. It appears clear that additional techniques, such as comparative genomic hybridization (CGH) array or SNP array, should be applied to detect or confirm CNVs in HHL patients.
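To illustrate why depth and uniformity of coverage matter for CNV calling, the following Python sketch uses a generic read-depth approach: the log2 ratio between a sample and a reference profile, each scaled by its own median depth. This is a swapped-in, simplified technique for illustration, not the method of any specific tool cited in the chapter; the function names, the cutoff, and the depth values are all assumptions.

```python
import math
from statistics import median
from typing import List, Sequence


def log2_ratios(sample_depth: Sequence[float], reference_depth: Sequence[float]) -> List[float]:
    """Per-target log2 ratio after scaling each profile by its median depth.

    Assumes every reference depth is positive; a sample depth of zero
    (no reads at all) is reported as minus infinity.
    """
    s_med, r_med = median(sample_depth), median(reference_depth)
    ratios = []
    for s, r in zip(sample_depth, reference_depth):
        norm_s, norm_r = s / s_med, r / r_med
        ratios.append(math.log2(norm_s / norm_r) if norm_s > 0 else float("-inf"))
    return ratios


def het_deletion_candidates(ratios: Sequence[float], cutoff: float = -0.7) -> List[int]:
    """Indices of targets whose ratio is at or below a hypothetical cutoff.

    A heterozygous deletion halves the depth, i.e. an ideal log2 ratio of -1.
    """
    return [i for i, value in enumerate(ratios) if value <= cutoff]
```

With perfectly uniform coverage a heterozygous deletion stands out clearly at a ratio of -1, but uneven capture efficiency adds noise of comparable size, which is why array-based confirmation is recommended in the text.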


One of the genes most prone to deletion is STRC, which encodes a highly conserved protein called stereocilin, necessary for proper hair cell function, and is causative of autosomal recessive HHL [54]. Deletions of the STRC gene have been described mainly in association with mild-to-moderate HL [55], together with infertility in males when the deletion also involves the contiguous gene CATSPER2 [56]. Recent data have shown that mutations and deletions affecting the STRC gene represent the second most frequent cause of HHL, especially in Caucasians and Hispanics [57], highlighting the importance of implementing CNV analysis for HHL patients. Besides STRC, many other HHL genes can be affected by CNVs [9,45]; thus their detection must be included in comprehensive genetic testing for HL. Interestingly, genotyping data analysis can also reveal peculiar cases in which the genetic cause of HHL is quite unexpected. One example is Family 3, which came to genetic counseling with a diagnosis of early-onset bilateral symmetric severe to profound NSHL (Fig. 16.3A). Audiometric data of the proband's parents and sister did not reveal any hearing defects; thus an autosomal recessive HHL was hypothesized. The proband was tested for mutations in GJB2 and GJB6 and for the A1555G mitochondrial mutation, with negative results. Later, the family trio was tested with a TRS panel of 96 NSHL genes. Data analysis revealed a novel homozygous variant in LOXHD1 (NM_144612.6), c.3071A>G, p.(Y1024C), apparently inherited only from the father. At this point a deletion on the other allele was suspected, and the proband and her parents were analyzed by SNP array. SNP array analysis identified one run of homozygosity larger than 8 Mb, spanning the LOXHD1 gene on chromosome 18. Analysis of informative SNPs in the parental samples and their comparison with the patient's genotype revealed a paternal uniparental disomy (UPD).

UPD occurs when two copies of a chromosome are inherited from one parent and nothing is inherited from the other. We can distinguish between (a) uniparental isodisomy, when both chromosomes from one parent are identical because of a meiosis II error or postzygotic duplication of a chromosome, and (b) uniparental heterodisomy, when the two chromosomes represent different copies of the same chromosome, due to a meiosis I error. In a UPD the chromosome content is not changed; thus it may have no clinical consequences, except when the disomy contains imprinted genes or, as happened in our family, a mutation in a recessive gene. A deep analysis of SNPs on the whole of chromosome 18 revealed both a small isodisomy segment spanning the LOXHD1 gene and heterodisomy on the remaining parts of chromosome 18 (Fig. 16.3B). This is just one example of a molecular diagnosis that could never have been reached without molecular karyotyping, confirming its usefulness in the diagnosis of HHL.
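The comparison of informative SNPs between the proband and her parents can be sketched in Python as follows. This is a minimal illustration under stated assumptions: genotypes are biallelic and coded as "AA", "AB", or "BB", the function names are hypothetical, and the summary is meant to be applied to one chromosomal segment at a time. The same genotype pattern could also arise from a deletion of the maternal allele, so copy number (e.g., the log R ratio mentioned in the legend of Fig. 16.3) must be checked separately.

```python
from collections import Counter
from typing import Iterable, Tuple


def marker_evidence(child: str, father: str, mother: str) -> str:
    """Classify one biallelic SNP from a trio."""
    if child[0] != child[1]:
        return "heterozygous"  # rules out isodisomy at this marker
    allele = child[0]
    if allele not in mother and allele in father:
        return "paternal_only"  # no maternal allele was transmitted
    if allele not in father and allele in mother:
        return "maternal_only"  # no paternal allele was transmitted
    return "uninformative"


def summarize(segment: Iterable[Tuple[str, str, str]]) -> str:
    """Summarize (child, father, mother) genotypes for one segment."""
    counts = Counter(marker_evidence(c, f, m) for c, f, m in segment)
    if counts["paternal_only"] and not counts["maternal_only"]:
        kind = "heterodisomy" if counts["heterozygous"] else "isodisomy"
        return f"paternal {kind} suggested"
    return "no paternal UPD pattern"
```

A segment with only homozygous, paternal-only markers corresponds to the isodisomy described around LOXHD1, whereas a segment that lacks maternal alleles but retains heterozygous calls corresponds to heterodisomy.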

Cases negative for known deafness genes: what to do?

In the last decade, the application of NGS technologies (i.e., TRS and WES) and of molecular karyotyping has increased the diagnostic rate of HHL. Nevertheless, it is estimated that a large percentage of cases still remain without a clear molecular diagnosis, in particular for NSHL (~35%-40%). This result may be explained in different ways: (1) the causative mutation lies in regions not covered by the enrichment kit, or in regions with low coverage; (2) the causative mutation lies in exons that are missing from the reference assembly used; (3) the causative mutation is located in deep intronic regions; (4) the phenotype is due to a CNV not detectable by either NGS or molecular karyotyping (e.g., a CNV with a size below the limit of resolution, or low-confidence calling due to a low density of probes); (5) a new gene is involved.

FIGURE 16.3 Pedigree, clinical, and audiological data of Family 3. (A) Pedigree of the family investigated in the present study. Filled symbols represent affected individuals. The proband is indicated with an arrow. Individuals with Roman numeric labels were analyzed in this study. (B) Audiometric features of the proband are displayed as audiograms (air conduction). The thresholds of the right and left ears are shown. (C) SNP array analysis of the LOXHD1 gene. Results of SNP array on the proband showing a UPD with a Run of Homozygosity (ROH) region (reported in green) suggesting a recombination event originated in meiosis I. The log R ratio of the analysis is consistent with normal copy number. Some of the markers in both the telomeric regions 18p and 18q plus in the ROH region are also indicated.

The identification of a new gene is extremely valuable, since it helps to dissect the biology and functioning of the hearing system and provides new possible therapeutic targets for future medical treatments. Validating the role of a new gene requires considerable effort, including in silico approaches (e.g., protein modeling, conservation scores) and in vitro and in vivo functional studies. Another helpful strategy is the identification of additional families carrying a mutation in the same gene. In this regard, data sharing between research groups and the use of databases such as GeneMatcher [58] are essential for identifying more patients. One example that clearly illustrates the efficacy of this workflow is the recent identification of a new gene, PLS1, in an Italian NSHL family (Family 4) [12] (Fig. 16.4A). Pure-tone audiometry of the proband (III:1) displayed bilateral, symmetric, mild-to-severe, medium-high frequency HL, first detected at the age of 8. The proband’s mother (II:4) showed a similar down-sloping audiometric pattern that was first discovered around the age of 30, and the proband’s uncle (II:3) reported medium-high frequency HL identified in adulthood (~30 years old) (Fig. 16.4D). All patients were negative for mutations in GJB2, GJB6, and MTRNR1 and in 96 HL genes, and for this reason the family was investigated by WES. Data analysis revealed a novel missense variant in the PLS1 gene (NM_002670.2), c.805G>A, p.(E269K), predicted as damaging by all the in silico tools. Although at the time the gene had never been associated with deafness in humans, it immediately seemed an ideal candidate, considering that PLS1 encodes plastin-1, one of the most abundant actin-bundling proteins of the stereocilia, and is known to cause HL in mice [59].

Using the GeneMatcher database it was possible to identify two additional families (Fig. 16.4B and C), one from the United States (Family 5) and one from France (Family 6), both affected by autosomal dominant NSHL and carrying novel PLS1 variants (c.713T>G, p.(L238R) and c.383T>C, p.(F128S), respectively). At this point a series of in silico studies was performed to confirm the pathogenic role of all the identified variants. We also demonstrated that PLS1 shows signatures of natural selection, in addition to a low observed/expected ratio for both missense and loss-of-function variants. Moreover, in silico protein modeling demonstrated that all variants affect the actin-binding domain 1 (ABD1), a domain that binds one actin monomer in the filament, and predicted an overall destabilization of the protein structure. In particular, the modeling suggested a perturbation of the structural stability of the whole ABD1 via disruption of an essential electrostatic interaction (p.(E269K)) or of the hydrophobic core network (p.(F128S) and p.(L238R)), which may reduce the ability of the protein to bind F-actin. It was possible to speculate that these changes may eventually cause abnormal stereocilia formation (as in the mouse model), leading to the hearing defect identified in all patients. These results, i.e., genomic data from three independent autosomal dominant NSHL families, the Pls1-/- mouse phenotype described in the literature, the demonstration that PLS1 is under selection, and the protein modeling results, provided compelling evidence that PLS1 is required for normal hearing and that its alteration in humans leads to dominant NSHL. This is just one example of the many novel deafness genes identified during the last decade [11,60,61], and it highlights the importance of a multidisciplinary approach for both the discovery and the validation of a novel disease-associated gene.
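A simple sanity check that can be applied to paired cDNA and protein descriptions such as those reported for PLS1 is to confirm that the coding position falls in the stated codon. The Python sketch below is illustrative only: it assumes a simple coding-region substitution numbered from the A of the ATG start codon (no intronic offsets, no UTR positions), and the function names and regular expressions are hypothetical helpers, not part of any HGVS library.

```python
import re

# Simple substitutions only, tolerating the spaces sometimes left by typesetting.
CDNA_RE = re.compile(r"c\.(\d+)\s*([ACGT])\s*>\s*([ACGT])")
# One- or three-letter amino acid codes, with or without parentheses.
PROT_RE = re.compile(r"p\.\(?([A-Za-z]{1,3})(\d+)([A-Za-z]{1,3})\)?")


def codon_number(cdna_pos: int) -> int:
    """Codon containing a coding position (position 1 is the A of ATG)."""
    return (cdna_pos - 1) // 3 + 1


def position_in_codon(cdna_pos: int) -> int:
    """1, 2, or 3: which base of the codon is changed."""
    return (cdna_pos - 1) % 3 + 1


def is_consistent(cdna: str, protein: str) -> bool:
    """True if the cDNA position maps to the residue number in the protein change."""
    c_match = CDNA_RE.search(cdna)
    p_match = PROT_RE.search(protein)
    if c_match is None or p_match is None:
        raise ValueError("unsupported variant description")
    return codon_number(int(c_match.group(1))) == int(p_match.group(2))


# The three PLS1 variants described in the text.
pls1_variants = [
    ("c.805G>A", "p.(E269K)"),
    ("c.713T>G", "p.(L238R)"),
    ("c.383T>C", "p.(F128S)"),
]
pls1_checks = [is_consistent(c, p) for c, p in pls1_variants]
```

A check of this kind only catches numbering slips; confirming the amino acid change itself requires the reference transcript sequence.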

FIGURE 16.4 Pedigree, genetic, and audiological data of the families carrying novel variants in the PLS1 gene. (A) Pedigree of Family 4. (B) Pedigree of Family 5. (C) Pedigree of Family 6. Filled symbols represent affected individuals. Probands are indicated with an arrow. Individuals with Roman numeric labels were analyzed in this study. (D) Audiometric features of the subjects II:4, II:5, III:1 (Family 5), IV:0 (Family 5), I:1, II:1, III:1 (Family 6) are displayed as audiograms (air conduction). The thresholds of the right and left ears are shown.

Conclusions

All the examples reported above highlight the complexity behind HHL (both syndromic and nonsyndromic). When dealing with this phenotype it is important to be aware of the difficulties that may be encountered, in order to choose the most effective approach for reaching a proper molecular diagnosis. What we have learnt after years of experience in this field and thousands of patients analyzed is that strategies need to be tailored at the level of both individual patients and population cohorts. The development and application of a multistep integrated strategy based on TRS, SNP array, and WES has proven to be an extremely powerful tool for the molecular diagnosis of HHL patients. For example, in the case of genetically well-characterized syndromes (e.g., Usher, Alport, and Pendred syndromes) the use of TRS panels that include the subset of genes of interest is highly recommended, since it represents an optimal balance between costs, time, and probability of a positive outcome. In contrast, in the case of NSHL, after excluding the involvement of the GJB2 gene, the high genetic and clinical heterogeneity of the disease, as well as the genetic characteristics of a specific population, require an extended "whole exome/genome" analysis. For this reason, the most efficient approach is the use of an integrated pipeline that includes WES and molecular karyotyping. WES allows (a) early diagnosis of syndromes that do not yet show all the clinical signs or symptoms, (b) analysis of sequencing data for the new deafness genes described in the literature, and (c) identification of new candidates. Molecular karyotyping, in turn, makes it possible to detect CNVs and other complex rearrangements that represent an increasingly frequent cause of HHL.

Of course, this strategy has pitfalls: platform bias (only a limited number of genes is covered, some of them with large differences in coverage) and analytic bias (neglecting rearrangements such as deletions and duplications in the analysis). Thus, these tailored protocols need to be performed and analyzed by a multidisciplinary team of geneticists, clinicians, and otolaryngologists who have extensive knowledge of the clinical features of HL diseases and of the genes involved. As for other genetic disorders, this multidisciplinary team needs to focus on case selection, including the extent and interpretation of clinical and family history information, which are required for more precise data analysis, proper medical care, and calculation of the recurrence risk for future generations. Moreover, balancing the use of TRS, WES, and SNP arrays to identify the specific genetic targets of a patient population will be essential to predict the disease course, to establish genotype-phenotype correlations, and to describe the epidemiological picture of a specific population. As for other disorders, knowledge of the genetic background responsible for the clinical phenotype is extremely valuable for the scientific community, helping to elucidate the biology of the hearing system, but it also has practical outcomes, moving toward the development of future personalized therapeutic approaches that take into account the prevalence of some genes in different populations and that could be employed in large numbers of patients. Overall, the study of HHL is an everyday challenge for those who are involved in this field, and requires a collective effort to reach the final goal of a correct molecular diagnosis, which will ultimately improve the life of the patients.

References

[1] Koffler T, Ushakov K, Avraham KB. Genetics of hearing loss: syndromic. Otolaryngol Clin N Am 2015;48(6):1041-61. https://doi.org/10.1016/j.otc.2015.07.007.
[2] Shearer AE, Hildebrand MS, Smith RJ. Hereditary hearing loss and deafness overview. Seattle: University of Washington; 1993. http://www.ncbi.nlm.nih.gov/pubmed/20301607.
[3] Alshuaib WB, Al-Kandari JM, Hasan SM. Classification of hearing loss. In: Update on hearing loss. InTech; 2015. https://doi.org/10.5772/61835.
[4] Zahnert T. Differenzialdiagnose der Schwerhörigkeit. Dtsch Ärztebl 2011;108(25):433-44. https://doi.org/10.3238/arztebl.2011.0433.
[5] Yang T, Guo L, Wang L, Yu X. Diagnosis, intervention, and prevention of genetic hearing loss. In: Advances in experimental medicine and biology, vol. 1130. Springer New York LLC; 2019. p. 73-92. https://doi.org/10.1007/978-981-13-6123-4_5.
[6] Bademci G, Cengiz FB, Foster J, et al. Variations in multiple syndromic deafness genes mimic nonsyndromic hearing loss. Sci Rep 2016;6. https://doi.org/10.1038/srep31622.
[7] Vona B, Nanda I, Hofrichter MAH, Shehata-Dieler W, Haaf T. Non-syndromic hearing loss gene identification: a brief history and glimpse into the future. Mol Cell Probes 2015;29(5):260-70. https://doi.org/10.1016/j.mcp.2015.03.008.
[8] Ideura M, Nishio S ya, Moteki H, et al. Comprehensive analysis of syndromic hearing loss patients in Japan. Sci Rep 2019;9(1). https://doi.org/10.1038/s41598-019-47141-4.
[9] Morgan A, Lenarduzzi S, Cappellani S, et al. Genomic studies in a large cohort of hearing impaired Italian patients revealed several new alleles, a rare case of uniparental disomy (UPD) and the importance to search for copy number variations. Front Genet 2018;9:681. https://doi.org/10.3389/fgene.2018.00681.
[10] Cabanillas R, Diñeiro M, Cifuentes GA, et al. Comprehensive genomic diagnosis of non-syndromic and syndromic hereditary hearing loss in Spanish patients. BMC Med Genom 2018;11(1). https://doi.org/10.1186/s12920-018-0375-5.
[11] Azaiez H, Decker AR, Booth KT, et al. HOMER2, a stereociliary scaffolding protein, is essential for normal hearing in humans and mice. PLoS Genet 2015;11(3):e1005137. https://doi.org/10.1371/journal.pgen.1005137.
[12] Morgan A, Koboldt DC, Barrie ES, et al. Mutations in PLS1, encoding fimbrin, cause autosomal dominant nonsyndromic hearing loss. Hum Mutat 2019. https://doi.org/10.1002/humu.23891.
[13] Del Castillo I, Moreno-Pelayo MA, Del Castillo FJ, et al. Prevalence and evolutionary origins of the del(GJB6-D13S1830) mutation in the DFNB1 locus in hearing-impaired subjects: a multicenter study. Am J Hum Genet 2003;73(6):1452-8. https://doi.org/10.1086/380205.
[14] Cama E, Melchionda S, Palladino T, et al. Hearing loss features in GJB2 biallelic mutations and GJB2/GJB6 digenic inheritance in a large Italian cohort. Int J Audiol 2009;48(1):12-7. https://doi.org/10.1080/14992020802400654.
[15] Morton CC, Nance WE. Newborn hearing screening - a silent revolution. N Engl J Med 2006;354(20):2151-64. https://doi.org/10.1056/NEJMra050700.
[16] Yokota Y, Moteki H, Nishio S ya, et al. Frequency and clinical features of hearing loss caused by STRC deletions. Sci Rep 2019;9(1). https://doi.org/10.1038/s41598-019-40586-7.
[17] Miyagawa M, Nishio SY, Usami SI, et al. Mutation spectrum and genotype-phenotype correlation of hearing loss patients caused by SLC26A4 mutations in the Japanese: a large cohort study. J Hum Genet 2014;59(5):262-8. https://doi.org/10.1038/jhg.2014.12.
[18] Hilgert N, Smith RJH, Van Camp G. Function and expression pattern of nonsyndromic deafness genes. Curr Mol Med 2009;9(5):546-64. http://www.ncbi.nlm.nih.gov/pubmed/19601806.
[19] Kenna MA. Acquired hearing loss in children. Otolaryngol Clin N Am 2015;48(6):933-53. https://doi.org/10.1016/j.otc.2015.07.011.


[20] Kenneson A, Van Naarden Braun K, Boyle C. GJB2 (connexin 26) variants and nonsyndromic sensorineural hearing loss: a HuGE review. Genet Med 2002;4(4):258-74. https://doi.org/10.1097/00125817-200207000-00004.
[21] Del Castillo FJ, Rodríguez-Ballesteros M, Álvarez A, et al. A novel deletion involving the connexin-30 gene, del(GJB6-d13s1854), found in trans with mutations in the GJB2 gene (connexin-26) in subjects with DFNB1 non-syndromic hearing impairment. J Med Genet 2005;42(7):588-94. https://doi.org/10.1136/jmg.2004.028324.
[22] Guaran V, Astolfi L, Castiglione A, et al. Association between idiopathic hearing loss and mitochondrial DNA mutations: a study on 169 hearing-impaired subjects. Int J Mol Med 2013;32(4):785-94. https://doi.org/10.3892/ijmm.2013.1470.
[23] Kenna MA, Feldman HA, Neault MW, et al. Audiologic phenotype and progression in GJB2 (connexin 26) hearing loss. Arch Otolaryngol Head Neck Surg 2010;136(1):81-7. https://doi.org/10.1001/archoto.2009.202.
[24] Del Castillo FJ, Del Castillo I. DFNB1 non-syndromic hearing impairment: diversity of mutations and associated phenotypes. Front Mol Neurosci 2017;10. https://doi.org/10.3389/fnmol.2017.00428.
[25] Khalifa Alkowari M, Girotto G, Abdulhadi K, et al. GJB2 and GJB6 genes and the A1555G mitochondrial mutation are only minor causes of nonsyndromic hearing loss in the Qatari population. Int J Audiol 2012;51(3):181-5. https://doi.org/10.3109/14992027.2011.625983.
[26] Nguyen T, Jeyakumar A. Genetic susceptibility to aminoglycoside ototoxicity. Int J Pediatr Otorhinolaryngol 2019;120:15-9. https://doi.org/10.1016/j.ijporl.2019.02.002.
[27] Azaiez H, Booth KT, Ephraim SS, et al. Genomic landscape and mutational signatures of deafness-associated genes. Am J Hum Genet 2018;103(4):484-97. https://doi.org/10.1016/j.ajhg.2018.08.006.
[28] Stenson PD, Mort M, Ball EV, et al. The human gene mutation database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 2017;136(6):665-77. https://doi.org/10.1007/s00439-017-1779-6.
[29] Fokkema IFAC, Taschner PEM, Schaafsma GCP, Celli J, Laros JFJ, den Dunnen JT. LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 2011;32(5):557-63. https://doi.org/10.1002/humu.21438.
[30] Landrum MJ, Lee JM, Benson M, et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 2018;46(D1):D1062-7. https://doi.org/10.1093/nar/gkx1153.
[31] Sherry ST, Ward MH, Kholodov M, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29(1):308-11. http://www.ncbi.nlm.nih.gov/pubmed/11125122.
[32] Karczewski KJ, Weisburd B, Thomas B, et al. The ExAC browser: displaying reference data information from over 60 000 exomes. Nucleic Acids Res 2017;45(D1):D840-5. https://doi.org/10.1093/nar/gkw971.
[33] Rim JH, Lee JS, Jung J, et al. Systematic evaluation of gene variants linked to hearing loss based on allele frequency threshold and filtering allele frequency. Sci Rep 2019;9(1). https://doi.org/10.1038/s41598-019-41068-6.
[34] Schwarz JM, Rödelsperger C, Schuelke M, Seelow D. MutationTaster evaluates disease-causing potential of sequence alterations. Nat Methods 2010;7(8):575-6. https://doi.org/10.1038/nmeth0810-575.
[35] Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31(13):3812-4. http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=168916&tool=pmcentrez&rendertype=abstract.
[36] Kircher M, Witten DM, Jain P, O’Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46(3):310-5. https://doi.org/10.1038/ng.2892.
[37] Desmet F-O, Hamroun D, Lalande M, Collod-Béroud G, Claustres M, Béroud C. Human splicing finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 2009;37(9):e67. https://doi.org/10.1093/nar/gkp215.


[38] Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2010;20(1):110-21. https://doi.org/10.1101/gr.097857.109.
[39] Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005;15(7):901-13. https://doi.org/10.1101/gr.3577405.
[40] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405-23. https://doi.org/10.1038/gim.2015.30.
[41] Kalia SS, Adelman K, Bale SJ, et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SF v2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet Med 2017;19(2):249-55. https://doi.org/10.1038/gim.2016.190.
[42] Rehm HL, Berg JS, Brooks LD, et al. ClinGen - the clinical genome resource. N Engl J Med 2015;372(23):2235-42. https://doi.org/10.1056/NEJMsr1406261.
[43] DiStefano MT, Hemphill SE, Oza AM, et al. ClinGen expert clinical validity curation of 164 hearing loss gene-disease pairs. Genet Med 2019;21(10):2239-47. https://doi.org/10.1038/s41436-019-0487-0.
[44] Khan AO, Becirovic E, Betz C, et al. A deep intronic CLRN1 (USH3A) founder mutation generates an aberrant exon and underlies severe usher syndrome on the Arabian Peninsula. Sci Rep 2017;7(1). https://doi.org/10.1038/s41598-017-01577-8.
[45] Shearer AE, Kolbe DL, Azaiez H, et al. Copy number variants are a common cause of non-syndromic hearing loss. Genome Med 2014;6(5):37. https://doi.org/10.1186/gm554.
[46] Vona B, Müller M, Dofek S, Holderried M, Löwenheim H, Tropitzsch A. A big data perspective on the genomics of hearing loss. Laryngo-Rhino-Otol 2019;98:S1-26. https://doi.org/10.1055/a-0803-6149.
[47] Tsang SH, Aycinena ARP, Sharma T. Ciliopathy: usher syndrome. In: Advances in experimental medicine and biology, vol. 1085. Springer New York LLC; 2018. p. 167-70. https://doi.org/10.1007/978-3-319-95046-4_32.
[48] Vozzi D, Morgan A, Vuckovic D, et al. Hereditary hearing loss: a 96 gene targeted sequencing protocol reveals novel alleles in a series of Italian and Qatari patients. Gene 2014;542(2):209-16. https://doi.org/10.1016/j.gene.2014.03.033.
[49] Jouret G, Poirsier C, Spodenkiewicz M, et al. Genetics of usher syndrome: new insights from a meta-analysis. Otol Neurotol 2019;40(1):121-9. https://doi.org/10.1097/MAO.0000000000002054.
[50] Booth KT, Azaiez H, Kahrizi K, et al. Exonic mutations and exon skipping: lessons learned from DFNA5. Hum Mutat 2018;39(3):433-40. https://doi.org/10.1002/humu.23384.
[51] Wang HY, Zhao YL, Liu Q, et al. Identification of two disease-causing genes TJP2 and GJB2 in a Chinese family with unconditional autosomal dominant nonsyndromic hereditary hearing impairment. Chin Med J 2015;128(24):3345-51. https://doi.org/10.4103/0366-6999.171440.
[52] Francey LJ, Conlin LK, Kadesch HE, et al. Genome-wide SNP genotyping identifies the Stereocilin (STRC) gene as a major contributor to pediatric bilateral sensorineural hearing impairment. Am J Med Genet A 2012;158A(2):298-308. https://doi.org/10.1002/ajmg.a.34391.
[53] Fontana P, Morgutti M, Pecile V, et al. A novel OTOA mutation in an Italian family with hearing loss. Gene Rep 2017;9:111-4. https://doi.org/10.1016/J.GENREP.2017.10.002.
[54] Marková SP, Brožková DŠ, Laššuthová P, et al. STRC gene mutations, mainly large deletions, are a very important cause of early-onset hereditary hearing loss in the Czech population. Genet Test Mol Biomarkers 2018;22(2):127-34. https://doi.org/10.1089/gtmb.2017.0155.
[55] Čada Z, Šafka Brožková D, Balatková Z, et al. Moderate sensorineural hearing loss is typical for DFNB16 caused by various types of mutations affecting the STRC gene. Eur Arch Oto-Rhino-Laryngol 2019;276(12):3353-8. https://doi.org/10.1007/s00405-019-05649-5.
[56] Zhang Y, Malekpour M, Al-Madani N, et al. Sensorineural deafness and male infertility: a contiguous gene deletion syndrome. BMJ Case Rep 2009. https://doi.org/10.1136/bcr.08.2008.0645.


[57] Sloan-Heggen CM, Bierer AO, Shearer AE, et al. Comprehensive genetic testing in the clinical evaluation of 1119 patients with hearing loss. Hum Genet 2016;135(4):441-50. https://doi.org/10.1007/s00439-016-1648-8.
[58] Sobreira N, Schiettecatte F, Valle D, Hamosh A. GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum Mutat 2015;36(10):928-30. https://doi.org/10.1002/humu.22844.
[59] Taylor R, Bullen A, Johnson SL, et al. Absence of plastin 1 causes abnormal maintenance of hair cell stereocilia and a moderate form of hearing loss in mice. Hum Mol Genet 2015;24(1):37-49. https://doi.org/10.1093/hmg/ddu417.
[60] Girotto G, Scheffer DI, Morgan A, et al. PSIP1/LEDGF: a new gene likely involved in sensorineural progressive hearing loss. Sci Rep 2015;5:18568. https://doi.org/10.1038/srep18568.
[61] Morgan A, Vuckovic D, Krishnamoorthy N, et al. Next-generation sequencing identified SPATC1L as a possible candidate gene for both early-onset and age-related hearing loss. Eur J Hum Genet 2018;27(1). https://doi.org/10.1038/s41431-018-0229-9.
[62] Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet 2013. https://doi.org/10.1002/0471142905.hg0720s76. Chapter 7: Unit7.20.
[63] Oza AM, DiStefano MT, Hemphill SE, et al. Expert specification of the ACMG/AMP variant interpretation guidelines for genetic hearing loss. Hum Mutat 2018;39(11):1593-613. https://doi.org/10.1002/humu.23630.

CHAPTER 17

Familial hypercholesterolemia

Joana Rita Chora1,2, Ana Margarida Medeiros1,2, Ana Catarina Alves1,2, Mafalda Bourbon1,2

1 Cardiovascular Research Group, R&D Unit, Dept of Health Promotion and Prevention of Non-Communicable Diseases, National Institute of Health, Portugal; 2 BioISI - Biosystems & Integrative Sciences Institute, Faculty of Sciences, University of Lisboa, Portugal

Familial hypercholesterolemia (FH) is a common autosomal dominant disorder of lipid metabolism, with a heterozygous frequency of 1/250-1/500 in most European countries [1]. Clinically, FH is characterized by elevated concentrations of plasma cholesterol, which accumulates in arteries and tendons from birth, leading to premature coronary heart disease (pCHD) [2]. The primary defect in FH is in cholesterol clearance, and all genes causing FH identified so far have an important role in the LDL receptor (LDLR) pathway, in which the LDLR has the main role (Fig. 17.1). Loss-of-function mutations in the LDLR account for more than 90% of the cases [4]. Mutations in the apolipoprotein B gene (APOB), which encodes the ligand for the LDLR [5-7], or gain-of-function mutations in PCSK9, a new posttranscriptional controller of LDLR [8,9], are also associated with FH, but are found less frequently in FH patients (APOB 5%-10%, PCSK9 […] 3000 unique variants identified in LDLR, APOB, and PCSK9 in ClinVar [11], but for the majority of these variants, important information such as functional validation is lacking [11]. For about 50%-70% of clinical FH patients a causative variant is found in LDLR, APOB, or PCSK9, but in 30%-50% no variant is found in one of the known FH genes [12-16]. In a small percentage of these patients a causative variant is found in FH phenocopy genes, such as ABCG5/8, causing sitosterolemia [17]; LIPA, causing lysosomal acid lipase deficiency [18,19]; LDLRAP1, causing autosomal recessive hypercholesterolemia [20]; or APOE, associated with dysbetalipoproteinemia [21]. In the remaining patients, novel mechanisms or novel genes must exist to explain these cases.

In FH, early diagnosis is critical for timely intervention and treatment. A recent study showed that the presence of a pathogenic variant increases the cardiovascular risk up to 16 times compared with non-FH patients with similar LDL values [22,23]. However, there is a huge diagnostic gap concerning the classification of the pathogenicity of a variant. Only about 10% of all LDLR variants found worldwide in clinical FH patients (n = 2300, submitted to the ClinVar database) [11] have functional proof of affecting LDLR function. Additionally, and as a consequence, only about 60% of these variants are classified as likely pathogenic or pathogenic under an FH adaptation [24] of the most recent guidelines of the American College of Medical Genetics and Genomics (ACMG) [25]; these are the only classes of variants that are clinically actionable. For APOB and PCSK9 the scenario is even worse: most variants do not have functional studies, so the great majority are classified as variants of unknown significance (VUS). This provides inconclusive results for many families, posing a problem for FH diagnosis.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00019-3 Copyright © 2021 Elsevier Inc. All rights reserved.


FIGURE 17.1 The LDLR pathway. The LDL receptor is synthesized in the ER, undergoes extensive glycosylation in the Golgi apparatus, and is transported to the cell surface. The LDL receptor specifically binds ApoB in LDL particles (A), and the LDL:ApoB complex is internalized by endocytosis (B and C). Inside the endosome the complex dissociates (D); the receptor is recycled to the cell surface (E), whereas the LDL:ApoB complex is degraded in the lysosome (F). From Alves AC. 2014. Base Genética da Hipercolesterolemia Familiar. Thesis. pp. 1-215 [3].

Variant interpretation in FH

The interpretation of the clinical significance of a variant is the most challenging task in the current era, in which gene sequencing is becoming commonplace. The Clinical Genome Resource (ClinGen) and ClinVar, two NIH-based efforts, have formed a critical partnership to improve our knowledge of clinically relevant genomic variation. This partnership includes significant efforts in data sharing, data archiving, and collaborative curation to characterize and disseminate the clinical relevance of genomic variation. ClinGen and its procedures were recently recognized by the Food and Drug Administration (FDA) as a valid genetic database to be used in in vitro diagnosis (https://clinicalgenome.org/about/fda-recognition/).

Since the ACMG guidelines are very general, ClinGen groups are developing disease-specific algorithms. The ClinGen FH variant curation expert panel (VCEP) is composed of members from all over the world, including clinicians and clinical and molecular geneticists (https://clinicalgenome.org/working-groups/clinical-domain/cardiovascular/), and has developed an ACMG-based FH algorithm for LDLR variants that will be released soon (personal communication). This will allow FH variants to be classified in the same way in every lab, in every country. The FH VCEP is now initiating the large-scale curation of all LDLR variants submitted to ClinVar, and the classifications made by this group will reach a 3-star level, meaning that, following FDA recognition, they can be used in in vitro genetic diagnosis. APOB and PCSK9 variants will be tackled afterward by the FH VCEP; for now the general ACMG guideline has to be applied. Tables 17.1 and 17.2 list the variants in these two genes, APOB and PCSK9, respectively, that have a likely pathogenic or pathogenic classification [24,25], as well as the variants that do not reach these two classifications but have functional studies demonstrating that they affect LDL receptor activity.

Several types of data are useful for variant interpretation. These data can be produced by specialized labs and then shared with other labs, or they can be internal laboratory information, which should be shared by all labs in a public database such as ClinVar. The most important are described in the following sections.

Functional studies

Functional studies are one type of evidence that can provide invaluable information for variant interpretation and should be performed, whenever possible, to improve variant classification. ClinGen has published general guidelines for developing valid functional assays [29].

LDLR

For LDLR, the most informative functional studies are those in which the three main steps of the LDLR cycle are studied: expression, binding, and uptake. Briefly, studies of LDLR function can be divided into three main classes:

(i) Studies in heterologous cells: LDLR-deficient cells, usually CHO-A7 ldlr-/-, are transfected with a mutant plasmid; fluorescently labeled LDL (e.g., with FITC) is added and monitored by flow cytometry to determine binding and uptake. Expression is evaluated by flow cytometry with specific antibodies against the LDLR [30].
(ii) Studies in homozygous patient cells: the same processes as in (i), with labeled LDL monitored by flow cytometry (or, in older studies, radiolabeled LDL/ApoB [31]), using lymphocytes, fibroblasts, or lymphoblasts isolated from a genetically diagnosed homozygous FH patient [32]. Since the activity of a specific variant can be determined only in cells from true homozygous FH patients, only these studies are recommended as valid evidence for ACMG classification [24].
(iii) Studies in heterozygous patient cells: the same as described in (ii), except that one functional (wild-type) LDLR allele has to be taken into account. For example, an overall result of 60% LDL receptor activity in a heterozygous patient cell corresponds to a contribution of roughly 50% from the wild-type allele and 10% from the allele carrying the variant under study [33].
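The arithmetic in (iii) can be written out as a small Python sketch. It assumes, as the example in the text does, that the wild-type allele contributes about half of normal activity and that the two alleles contribute additively; the function name and the "relative to one wild-type allele" figure are illustrative additions, not values taken from the cited study.

```python
def variant_allele_activity(overall_pct: float, wild_type_share_pct: float = 50.0):
    """Split overall LDLR activity measured in heterozygous cells.

    Returns a tuple:
      (points contributed by the variant allele,
       that contribution as a percentage of one wild-type allele).
    Assumes additive allele contributions and a fixed wild-type share.
    """
    if wild_type_share_pct <= 0:
        raise ValueError("wild_type_share_pct must be positive")
    variant_points = overall_pct - wild_type_share_pct
    relative_to_wild_type = 100.0 * variant_points / wild_type_share_pct
    return variant_points, relative_to_wild_type


# The example from the text: 60% overall activity in heterozygous cells.
points, relative = variant_allele_activity(60.0)
```

Under these assumptions, 60% overall activity leaves 10 percentage points for the variant allele, that is, about one-fifth of what a single wild-type allele would contribute. Real measurements carry assay variability, so a value this small may not be distinguishable from zero.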

Table 17.1 Variants in APOB with a pathogenic/likely pathogenic classification [24], or with a functional study.

| Gene | Nucleotide change | Protein | Functional assays by | ACMG classification | Evidence / ACMG criterion |
|---|---|---|---|---|---|
| APOB | c.148C>T | p.(Arg50Trp) | [61] | VUS | Mutant APOB accumulates in circulation / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.3491G>C | p.(Arg1164Thr) | [5] | Likely pathogenic | 50%-60% binding and uptake in wt lymphocytes and HepG2 cells with heterozygous patient LDL / PS3; Identified in two unrelated cases / PM8 (modified PS4); In individual with FH-specific phenotype / PP4 |
| APOB | c.9175C>T | p.(Arg3059Cys) | [7] | VUS | In critical region / PM1; 70% uptake in HepG2 cells with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.10182G>T | p.(Lys3394Asn) | [7] | Likely pathogenic | In critical region / PM1; Absent from gnomAD / PM2; 50% uptake in HepG2 cells with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.10519C>T | p.(Arg3507Trp) | [53] | VUS | 70% binding in wt fibroblasts with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.10579C>T | p.(Arg3527Trp) | [56] | VUS | Same codon as pathogenic variant, c.10580G>A, p.(Arg3527Gln) / PM5; 50% proliferation in U397 cells with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.10580G>A | p.(Arg3527Gln) | [59] | Pathogenic | 40%-50% binding and uptake in HepG2 cells with heterozygous patient LDL / PS3; Same codon as pathogenic variant, c.10579C>T, p.(Arg3527Trp) / PM5; Identified in 12 unrelated cases / PM8 (modified PS4); Cosegregates with phenotype in several families / PP1; In individual with FH-specific phenotype / PP4 |
| APOB | c.10629C>G | p.(Asn3543Lys) | [57] | VUS | Absent from gnomAD / PM2; 60% proliferation in U397 cells with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.10672C>T | p.(Arg3558Cys) | [26] | VUS | 70% proliferation in U397 cells with heterozygous patient LDL / PM7 (modified PS3); In individual with FH-specific phenotype / PP4 |
| APOB | c.11477C>T | p.(Thr3826Met) | [51] | Likely pathogenic | 50%-60% binding and uptake in wt lymphocytes and HepG2 cells with heterozygous patient LDL / PS3; Identified in two unrelated cases / PM8 (modified PS4); In individual with FH-specific phenotype / PP4 |
| APOB | c.13480_13482del | p.(Gln4494del) | [5] | Likely pathogenic | 50%-60% binding and uptake in wt lymphocytes and HepG2 cells with heterozygous patient LDL / PS3; In critical region / PM1; Identified in two unrelated cases / PM8 (modified PS4); In individual with FH-specific phenotype / PP4 |

Table 17.2 Variants in PCSK9 with a pathogenic/likely pathogenic classification [24], or with a functional study.

Gene | Nucleotide change | Protein | Functional assays by | ACMG classification | Evidence / ACMG criterion
PCSK9 | c.185C > A | p.(Ala62Asp) | [27] | Likely pathogenic | 50% cell surface LDLR, 65% uptake in HepG2 cells / PS3; Absent from gnomAD / PM2; In individual with FH-specific phenotype / PP4
PCSK9 | c.381T > A | p.Ser127Arg | [28,52,55,58] | Likely pathogenic | No secreted PCSK9, 50% cell surface LDLR, 70% binding in Huh-7 cells / PS3; Absent from gnomAD / PM2; In individual with FH-specific phenotype / PP4
PCSK9 | c.386A > G | p.Asp129Gly | [58] | Likely pathogenic | No secreted PCSK9, 50% cell surface LDLR, and 70% binding in Huh-7 cells / PS3; Absent from gnomAD / PM2; In individual with FH-specific phenotype / PP4
PCSK9 | c.706G > A | p.Gly236Ser | [54] | VUS | Normal autocatalytic activity, 5%–10% secreted, retained in the ER in HEK cells / PS3; In individual with FH-specific phenotype / PP4
PCSK9 | c.1061A > T | p.Asn354Ile | [54] | Likely pathogenic | 5% autocatalytic activity and […]
PCSK9 | c.1120G > T | p.Asp374Tyr | [28,52,55,60] | Likely pathogenic | […] C, p.(Asp374His) / PM5; In individual with FH-specific phenotype / PP4
PCSK9 | c.1120G > C | p.(Asp374His) | [28] | Likely pathogenic | 35% LDLR activity in HEK cells / PS3; Absent from gnomAD / PM2; Same codon as pathogenic variant, c.1120G > T, p.Asp374Tyr / PM5; In individual with FH-specific phenotype / PP4
PCSK9 | c.1399C > G | p.(Pro467Ala) | [27] | VUS | 40% cell surface LDLR, 60% uptake in HepG2 cells / PS3; In individual with FH-specific phenotype / PP4

Other types of studies involve: (iv) RNA sequencing, with or without transcript quantification, used to evaluate splicing variants [34]. (v) Luciferase studies, used to evaluate promoter variants with a reporter gene construct by measuring luminescence intensity [35]. Note, however, that not all variants shown by a functional study to affect LDLR function are automatically classified as likely pathogenic or pathogenic using ACMG guidelines.
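The point that functional evidence alone does not settle a classification can be illustrated with a minimal sketch of how pathogenic-side criteria are combined. It assumes the combining rules of the original ACMG/AMP guidelines [25] and treats the adapted codes PM7 and PM8 as moderate evidence purely on the basis of their prefix; the adapted specification in Ref. [24] may differ in detail, and benign criteria are ignored here. The function names are hypothetical.

```python
from collections import Counter

def strength(code: str) -> str:
    """Read the evidence strength from the criterion prefix (PVS, PS, PM, PP).

    Assumption: adapted codes such as PM7 and PM8 count as moderate evidence.
    """
    for prefix in ("PVS", "PS", "PM", "PP"):
        if code.upper().startswith(prefix):
            return prefix
    raise ValueError(f"not a pathogenic-side criterion: {code}")

def classify(criteria) -> str:
    """Combine pathogenic-side criteria (2015 ACMG/AMP combining rules only)."""
    n = Counter(strength(c) for c in set(criteria))
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    if pathogenic:
        return "Pathogenic"

    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    return "Likely pathogenic" if likely_pathogenic else "VUS"

# Criteria combinations as listed for variants in Tables 17.1 and 17.2
print(classify(["PS3", "PP4"]))                       # functional study + phenotype only
print(classify(["PS3", "PM2", "PP4"]))
print(classify(["PS3", "PM5", "PM8", "PP1", "PP4"]))
```

Under these assumptions, a strong functional result (PS3) together with a specific phenotype (PP4) still yields a VUS; at least one moderate criterion is needed to reach likely pathogenic.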

APOB

As the ApoB molecules are part of LDL particles, functional studies for APOB variants involve isolating the LDL band from the serum of a patient with the variant, labeling this LDL with FITC, and then using it in experiments with a wild-type heterologous cell line (e.g., HepG2) or lymphocytes from a wild-type individual. In this way the ability of the mutant ApoB to bind to LDL receptors is tested by cytometry, as described by Alves et al. [5]. Alternatively, the isolated LDL particles can be added to U937 cells, a cholesterol-dependent cell line, which will proliferate according to the mutant ApoB’s ability to bind to LDL receptors [5,26].

PCSK9

PCSK9 is involved in LDLR recycling and, as such, functional studies of PCSK9 variants test the change in the amount of LDL receptors at the cell surface: LDLR binding and uptake. They are performed in wild-type HepG2 or HEK cell lines transfected with mutant PCSK9 transcripts, using the same methodology presented before for LDLR variants (i) [27,28].

Cosegregation

Cosegregation studies are very important because they represent an in vivo functional assay. However, an individual laboratory may not have access to a large number of family members, so cosegregation data for each variant should be collected and shared in public databases such as ClinVar to allow a definite diagnosis to be reached. For cosegregation analysis in FH, an affected individual has been defined as a person with untreated total cholesterol (TC) or LDL-C above the 75th percentile adjusted for age and sex, after alternative causes of high lipid values, such as diabetes mellitus or untreated hypothyroidism, are excluded. If no untreated values are available, they can be estimated from the specific medication and dose [36] or by using general correction factors of 0.8 and 0.7, corresponding to an estimated 20% TC and 30% LDL-C reduction on treatment, respectively [37,38].

There is evidence of cosegregation when affected individuals carry the variant in question, and lack of cosegregation when either unaffected individuals carry the variant or affected individuals in the family do not carry it (if another dyslipidemia may be present in the family, these individuals should not be counted for segregation). As mentioned before, FH has dominant transmission; as such, in a family with a heterozygous index case, only one branch of the family should be considered. For example, in Fig. 17.2, FH is transmitted from the index case’s mother’s side; individual II:1 should not be counted as a lack of segregation. For a large number of variants, cosegregation is vital to reach a likely pathogenic or likely benign classification, and all efforts to share this information are important.
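The two operations described above, back-calculating untreated lipid values and counting relatives for or against segregation, can be sketched as follows. This is only an illustration: it assumes that the correction factors are applied by dividing the on-treatment value by the factor, the field names of each relative record are hypothetical, and the lipid values are invented.

```python
# General correction factors quoted in the text:
# on-treatment value = untreated value x factor
TC_FACTOR = 0.8    # about 20% total cholesterol reduction on treatment
LDL_FACTOR = 0.7   # about 30% LDL-C reduction on treatment

def estimate_untreated(treated_mg_dl: float, factor: float) -> float:
    """Back-calculate an untreated lipid value from an on-treatment value."""
    return treated_mg_dl / factor

def tally_segregation(relatives):
    """Count relatives that support or contradict cosegregation.

    Each relative is a dict with the boolean keys 'affected', 'carrier',
    'informative_branch' and 'other_dyslipidemia' (hypothetical names).
    Relatives outside the transmitting branch, or with a possible second
    dyslipidemia, are skipped, as recommended in the text.
    """
    supports = against = 0
    for r in relatives:
        if not r["informative_branch"] or r["other_dyslipidemia"]:
            continue
        if r["affected"] and r["carrier"]:
            supports += 1          # affected carrier: evidence of cosegregation
        elif r["affected"] != r["carrier"]:
            against += 1           # unaffected carrier or affected non-carrier
    return supports, against

# Invented on-treatment values, in mg/dL
print(round(estimate_untreated(240, TC_FACTOR), 1))   # 300.0
print(round(estimate_untreated(175, LDL_FACTOR), 1))  # 250.0

family = [
    {"affected": True,  "carrier": True,  "informative_branch": True,  "other_dyslipidemia": False},
    {"affected": True,  "carrier": True,  "informative_branch": True,  "other_dyslipidemia": False},
    {"affected": True,  "carrier": False, "informative_branch": False, "other_dyslipidemia": False},
    {"affected": False, "carrier": False, "informative_branch": True,  "other_dyslipidemia": False},
]
print(tally_segregation(family))  # (2, 0)
```

The third relative is affected without carrying the variant but belongs to the other branch of the family, so it is not counted as a lack of segregation.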


FIGURE 17.2 Pedigree of an FH family. The proband is indicated by an arrow. Half-filled symbols represent heterozygous individuals.

In silico prediction algorithms

In silico prediction is a valuable resource for variant curation, especially if no functional studies have been performed. Several available programs evaluate different variant characteristics, such as evolutionary nucleotide or amino acid conservation, protein structure and function, and splicing effects. For overall missense prediction, an ensemble method, REVEL, was developed that combines predictions from 13 commonly used individual computational tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons [39]. For splicing changes, the most commonly used programs are MaxEntScan and Human Splicing Finder or Splice Site Finder. These programs are recommended by ACMG. New in silico methods are being developed and validated by ClinGen groups.

Structural or catalytic proteins like ApoB and PCSK9 should not be assessed by in silico programs, since they are complex proteins that have not been completely crystallized and, as such, their complete structure is not yet known. Additionally, some PCSK9 and APOB variants have also been associated with hypocholesterolemia, and in silico algorithms only evaluate whether the variant affects protein function, not whether the effect will cause hyper- or hypocholesterolemia [5,40].

At present, following the original ACMG guidelines, about half of the variants do not have enough evidence to be classified as either pathogenic or benign and remain as VUS [24]. Using the specific ClinGen FH guideline under development can improve these numbers, but only if data sharing in ClinVar becomes general practice among different groups, so that enough evidence on a variant’s pathogenicity is gathered to reach a conclusive diagnosis. Important information to be shared is: (1) the number of independent families identified in each country where each variant has been reported, (2) the index case phenotype, and (3) cosegregation data. Submission of valid functional studies [29] to ClinVar is also important, since it saves curators from having to search for all published functional assays.

Laboratory genetic testing for FH

Until now, the genetic diagnosis of FH has been performed by PCR amplification and direct Sanger sequencing of fragments of the LDLR promoter, coding, and splicing regions, fragments of APOB exons 26 and 29, and, in some labs, the promoter, coding, and splicing regions of PCSK9. Large rearrangements in LDLR have been readily studied since 2006 by multiplex ligation-dependent probe amplification (MLPA). More recently, labs are introducing next-generation sequencing (NGS) panels for FH or general dyslipidemia with a range of 4–50 genes. The 2018 consensus paper on the genetic diagnosis of FH [41] recommends using an eight-gene panel: the three FH genes, LDLR, APOB, and PCSK9, and the FH phenocopy genes ABCG5, ABCG8, LIPA, LDLRAP1, and APOE.

Some examples are presented below, with a discussion of the evidence for variant interpretation and, where they exist, the reasons for conflicting interpretations. The variant classification was performed using an adapted version [24] of the original ACMG guidelines [25]. These classifications can change when the new ClinGen FH algorithm is approved and applied to LDLR variants, and whenever new evidence is published, especially new functional studies and cosegregation data.

Case presentations

Case A

Presentation of the case

Patient A was referred for genetic testing because his fasting lipid profile showed high levels of total plasma cholesterol (326 mg/dL) and LDL cholesterol (221 mg/dL). He also has a family history of hypercholesterolemia (mother and maternal uncle).

Laboratory results

The laboratory used polymerase chain reaction (PCR) amplification followed by direct sequencing of DNA fragments of the LDLR promoter, exons plus flanking intron regions, and two fragments of exons 26 and 29 of APOB, and MLPA to search for large rearrangements. The patient was identified with the heterozygous LDLR variant c.1322T > C, a substitution in exon 9, which is predicted to result in the amino acid change p.(Ile441Thr) in the EGF precursor homology domain (Fig. 17.3B). The result was confirmed in a second, independently extracted DNA sample. The study of large rearrangements in LDLR was negative.


FIGURE 17.3 (A) Pedigree of case A. The proband is indicated by an arrow. Half-filled black symbols represent LDLR c.1322T > C/p.(Ile441Thr) heterozygous individuals. (B) Nucleotide sequence of a fragment of exon 9 of the LDLR gene showing the c.1322T > C/p.(Ile441Thr) variant. (C) ACMG classification [24] for LDLR c.1322T > C/p.(Ile441Thr).

Using a cascade screening approach, the same variant was identified in the mother and in the maternal uncle (Fig. 17.3A).

Variant interpretation

This variant was described before in FH patients [42]. Functional studies to assess the pathogenicity of the variant showed that it decreases LDLR protein activity (7% of LDLR at the cell surface, 5% of LDL-LDLR binding, and 10% of LDL-LDLR internalization in heterologous cells) [43]. According to the guidelines of the American College of Medical Genetics and Genomics (ACMG) [24,25], this variant is considered to be pathogenic (Fig. 17.3C).


Case B

Presentation of the case

Patient B was referred for genetic testing by an endocrinologist because her fasting lipid profile showed high levels of plasma total and LDL cholesterol, which remained elevated after 3 months of dietary treatment (TC 291 mg/dL and LDL cholesterol 212 mg/dL). Further examination identified a family history of hypercholesterolemia (father, uncles, and cousins).

Laboratory results

The laboratory used an NGS panel of eight genes (LDLR, APOB, PCSK9, ABCG5, ABCG8, LIPA, LDLRAP1, and APOE). The NGS panel included the promoter and the coding regions of each gene plus 50 bp of intronic flanking regions. The study of large rearrangements in LDLR by MLPA was negative. The patient was identified with the heterozygous LDLR variant c.2096C > T, a substitution in exon 14, which is predicted to result in the amino acid change p.(Pro699Leu) in the EGF precursor homology domain. The laboratory confirmed the presence of the variant by Sanger sequencing (Fig. 17.4B). The result was also confirmed in a second, independently extracted DNA sample. Using a cascade screening approach, this variant was also identified in the father, a paternal uncle, and a first cousin (Fig. 17.4A).

Variant interpretation

This variant was described before in FH patients [44]. Functional studies to assess the pathogenicity of this variant have never been performed. According to the guidelines of the ACMG [24,25], this variant is considered to be of uncertain significance (VUS) due to the lack of available evidence regarding its pathogenicity (Fig. 17.4C).

Case C

Presentation of the case

Patient C was referred for genetic testing by his pediatrician because his fasting lipid profile showed high levels of plasma cholesterol (TC 352 mg/dL, LDL cholesterol 263 mg/dL, HDL cholesterol 61 mg/dL, and triglycerides 140 mg/dL) before initiation of therapy. He also has a family history of hypercholesterolemia (father and sister).

Laboratory results

The laboratory used an NGS panel of eight genes (LDLR, APOB, PCSK9, ABCG5, ABCG8, LIPA, LDLRAP1, and APOE). The NGS panel included the promoter and the coding regions of each gene plus 50 bp of intronic flanking regions. The study of large rearrangements in LDLR by MLPA was negative. The patient was identified with the heterozygous LDLR variant c.1186+5G > A, a substitution in the fifth position of intron 8. The laboratory confirmed the presence of the variant by Sanger sequencing, and the result was also confirmed in a second, independently extracted DNA sample. Cascade screening identified the presence of this variant in the father and sister (Fig. 17.5A).
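Whether an intronic variant such as c.1186+5G > A falls inside the 50 bp of flanking intron covered by the panel can be read directly from its HGVS description. The sketch below is a minimal illustration, not part of any laboratory pipeline: the regular expression handles only simple coding substitutions (no UTR positions, deletions, or duplications), and the function names are hypothetical.

```python
import re

# Simple coding substitutions such as c.1322T>C, c.1186+5G>A, or c.818-2A>G.
HGVS_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")

def parse_substitution(hgvs: str):
    """Return (exon_position, intron_offset, ref, alt) for a simple c. substitution.

    Spaces around '>' (as printed in this chapter) are removed before matching.
    The intron offset is 0 for exonic variants.
    """
    match = HGVS_SUBSTITUTION.match(hgvs.replace(" ", ""))
    if match is None:
        raise ValueError(f"unsupported HGVS description: {hgvs}")
    position, offset, ref, alt = match.groups()
    return int(position), int(offset or 0), ref, alt

def covered_by_panel(hgvs: str, flank_bp: int = 50) -> bool:
    """True if the variant is exonic or lies within flank_bp of an exon boundary."""
    _, offset, _, _ = parse_substitution(hgvs)
    return abs(offset) <= flank_bp

print(parse_substitution("c.1186+5G > A"))   # (1186, 5, 'G', 'A')
print(covered_by_panel("c.1186+5G > A"))     # True
print(covered_by_panel("c.1186+75G > A"))    # False (hypothetical deep intronic variant)
```

A deep intronic variant beyond the flanking window would not be captured by such a panel and would need a different approach to be detected.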


FIGURE 17.4 (A) Pedigree of case B. The proband is indicated by an arrow. Half-filled black symbols represent LDLR c.2096C > T/p.(Pro699Leu) heterozygous individuals. N represents individuals negative for the LDLR variant. Symbol ? represents individuals with clinical diagnosis of FH but not tested. Symbol crossed by a line represents a deceased individual. (B) Nucleotide sequence of a fragment of exon 14 of the LDLR gene showing the c.2096C > T/p.(Pro699Leu) variant. (C) ACMG classification [24] for LDLR c.2096C > T/p.(Pro699Leu).


FIGURE 17.5 (A) Pedigree of case C. Half-filled black symbols represent LDLR c.1186+5G > A heterozygous individuals. Symbol ? represents individuals with clinical diagnosis of FH but not tested. (B) ACMG classification [24] for LDLR c.1186+5G > A.

Variant interpretation

This variant was described before in FH patients [45]. Functional studies to assess the pathogenicity of this variant were reported in Ref. [33], using lymphocytes from heterozygous patients. The study showed that the c.1186+5G > A variant destroys the 5′ donor splice site of exon 8 and causes intron 8 inclusion in the mature mRNA. The retention of part of intron 8 results in a premature stop codon (p.G396fs26). The aberrant transcript showed an LDLR protein activity of 50%–60% (cell surface LDLR, LDL-LDLR binding, and internalization assays) [33]. According to the guidelines of the ACMG [24,25], this variant is considered to be likely pathogenic (Fig. 17.5B).

Case D

Presentation of the case

Patient D was admitted to the intensive care unit of the hospital for an acute anterior wall myocardial infarction. She is a smoker, and physical examination revealed tendinous xanthomas (TC 374 mg/dL and LDL cholesterol 259 mg/dL at admission). She also has a family history of hypercholesterolemia and premature coronary heart disease (Fig. 17.6A).


FIGURE 17.6 (A) Pedigree of case D. The proband is indicated by an arrow. Half-filled black symbol represents LDLR c.313+1G > A heterozygous individual. Symbol ? represents individuals with clinical diagnosis of FH but not tested. Symbol crossed by a line represents deceased individuals. (B) Nucleotide sequence of a fragment of exon 3 and intron 3 of the LDLR gene showing the c.313+1G > A variant. (C) ACMG classification [24] for LDLR c.313+1G > A.


Laboratory results

The laboratory used an NGS panel of eight genes (LDLR, APOB, PCSK9, ABCG5, ABCG8, LIPA, LDLRAP1, and APOE). The NGS panel included the promoter and the coding regions of each gene plus 50 bp of intronic flanking regions. The study of large rearrangements in LDLR by MLPA was negative. The patient was identified with the heterozygous LDLR variant c.313+1G > A, a substitution in the first position of intron 3. The laboratory confirmed the presence of the variant by Sanger sequencing (Fig. 17.6B). The result was also confirmed in a second, independently extracted DNA sample.

Variant interpretation

This variant was described before in FH patients [46]. Functional studies to assess the pathogenicity of this variant were reported in Ref. [47], using lymphocytes from heterozygous and homozygous patients. The study showed that the c.313+1G > A variant disrupts the donor splice site of intron 3, resulting in alternative splicing: one aberrant transcript has an in-frame skipping of exon 3 (p.Leu64_Pro105delinsSer) and the other has inclusion of intron 3 (p.Pro105_Ala860delinsArgLysCysGlyProAlaPheAlaIleGluProIle). The transcript that results from the skipping of exon 3 will produce a receptor in which the 41 residues of repeat 2 of the ligand-binding domain have been deleted. The transcript containing the inclusion of intron 3 will not produce a functional protein. The study in homozygous patients showed an LDLR activity of 30% [47]. According to the guidelines of the ACMG [24,25], this variant is considered to be pathogenic (Fig. 17.6C).
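The statement that skipping exon 3 is in-frame and removes 41 residues can be checked with simple arithmetic on the coordinates quoted in this chapter. The sketch assumes that exon 2 ends at c.190 (as implied by the case F deletion c.[1-?_190+?del;...]) and that exon 3 ends at c.313 (c.313+1 being the first base of intron 3); the variable and function names are illustrative only.

```python
EXON2_END = 190   # last coding base of exon 2, taken from c.1-?_190+?del (case F)
EXON3_END = 313   # last coding base of exon 3; c.313+1 is the first intronic base

def codon_of(c_position: int) -> int:
    """Codon number that contains a given coding-sequence position (c.1 = codon 1)."""
    return (c_position + 2) // 3

exon3_length = EXON3_END - EXON2_END         # bases c.191 to c.313 inclusive
in_frame = exon3_length % 3 == 0             # a multiple of 3 keeps the reading frame

first_codon = codon_of(EXON2_END + 1)        # codon interrupted at the start of exon 3
last_codon = codon_of(EXON3_END)             # codon interrupted at the end of exon 3
residues_removed = last_codon - first_codon + 1
net_loss = residues_removed - 1              # delins: one new residue (Ser) is inserted

print(exon3_length, in_frame)                # 123 True
print(first_codon, last_codon, net_loss)     # 64 105 41
```

Under these assumptions the result agrees with p.Leu64_Pro105delinsSer: 42 residues (64 to 105) are replaced by a single serine, a net loss of 41 residues, matching the 123 bases (41 codons) of exon 3.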

Case E

Presentation of the case

Patient E’s fasting lipid profile showed high levels of plasma cholesterol: TC 405 mg/dL before initiation of therapy, and TC 388 mg/dL, LDL cholesterol 285 mg/dL, HDL cholesterol 73 mg/dL, and triglycerides 150 mg/dL under treatment with a statin. He has a family history of hypercholesterolemia, with son and brother both affected.

Laboratory results

The laboratory used PCR amplification followed by direct sequencing of DNA fragments of the LDLR promoter, exons plus flanking intron regions, and two fragments of exons 26 and 29 of APOB, and MLPA to search for large rearrangements in LDLR. The study of large rearrangements by MLPA was negative. The patient was identified with the heterozygous LDLR variant c.806G > A, a substitution in exon 5, which is predicted to result in the amino acid change p.(Gly269Asp) in the ligand-binding domain. A second heterozygous LDLR variant was also identified: c.818-2A > G. Using a cascade screening approach, these variants were shown to be on different alleles (Fig. 17.7A).

Variant interpretation

The variant c.806G > A/p.(Gly269Asp) was described before in FH patients [48]. Functional studies to assess the pathogenicity of the variant revealed that it presents normal LDLR activity in heterozygous patients’ lymphocytes [33]. The variant c.818-2A > G has been described before [49], and the functional study showed that this variant retains 10 nucleotides of intron 5, leading to a premature stop codon (p.Val273Glufs31) [14]. According to the guidelines of the ACMG [24,25], the variant c.806G > A/p.(Gly269Asp) is considered to be likely benign (Fig. 17.7B), and the cause of FH in patient E is the c.818-2A > G/p.Val273Glufs31 variant, classified as pathogenic following ACMG [24,25] (Fig. 17.7C).

FIGURE 17.7 (A) Pedigree of case E. The proband is indicated by an arrow. Half-filled black symbols represent LDLR c.818-2A > G heterozygous individuals. Half-filled gray symbols represent LDLR c.806G > A/p.(Gly269Asp) heterozygous individuals. N represents individuals negative for the LDLR variant. Symbol ? represents individuals with clinical diagnosis of FH but not tested. (B) ACMG classification [24] for LDLR c.806G > A/p.(Gly269Asp). (C) ACMG classification [24] for LDLR c.818-2A > G/p.Val273Glufs31.

Case F

Presentation of the case

Patient F came to the emergency department of the hospital complaining of anginal chest pain. The electrocardiogram was consistent with acute anterior myocardial infarction. His fasting lipid profile showed high levels of plasma cholesterol: TC 542 mg/dL, LDL cholesterol 472 mg/dL, HDL cholesterol 49 mg/dL, and triglycerides 105 mg/dL. He has also had arcus cornealis from age 42, and he had been treated for hypercholesterolemia with a statin, but this was discontinued due to adverse side effects.

Laboratory results

The laboratory used PCR amplification followed by direct sequencing of DNA fragments of the promoter, exons, and flanking intron regions of the LDLR and PCSK9 genes and of two APOB fragments (from exons 26 and 29), which did not reveal any pathogenic variant (SNP). Large rearrangements in LDLR were studied using the MLPA technique (Fig. 17.8B). The patient was identified with the heterozygous LDLR CNV c.[1-?_190+?del;1061-?_1845+?del], a large deletion including the promoter region to exon 2 and exon 8 to exon 12 (Pr_ex2+ex8_12del). Large rearrangements including several exons of a gene will not produce a functional protein. Cascade screening identified the presence of this variant in both parents, son, sister, and niece (Fig. 17.8A). The family history included a brother who died suddenly at the age of 15 and who was probably a homozygous patient.

Variant interpretation

This CNV has been described before in FH patients [49]. According to the guidelines of the ACMG [24,25], this variant is considered to be pathogenic (Fig. 17.8C).
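The way an MLPA result such as that of case F is read, with a probe ratio of about 0.5 indicating a heterozygous deletion, can be sketched as follows. The cut-offs of 0.7 and 1.3, the probe labels, and the ratios are illustrative assumptions; they are not taken from this chapter or from the Coffalyser.Net software, and a validated laboratory protocol should define the real thresholds.

```python
from itertools import groupby

LOW, HIGH = 0.7, 1.3   # illustrative cut-offs (assumption, not from the chapter)

def call_probe(ratio: float) -> str:
    """Classify one normalized probe ratio as deletion, duplication, or normal."""
    if ratio < LOW:
        return "del"
    if ratio > HIGH:
        return "dup"
    return "normal"

def deleted_runs(probes):
    """Return (first_label, last_label) for each run of consecutive deleted probes.

    probes: list of (label, ratio) tuples ordered by chromosomal location.
    """
    runs = []
    for call, group in groupby(probes, key=lambda probe: call_probe(probe[1])):
        labels = [label for label, _ in group]
        if call == "del":
            runs.append((labels[0], labels[-1]))
    return runs

# Hypothetical ratios mimicking the Pr_ex2+ex8_12del pattern in the 18-exon LDLR gene
labels = ["Pr"] + [f"ex{i}" for i in range(1, 19)]
deleted = {"Pr", "ex1", "ex2", "ex8", "ex9", "ex10", "ex11", "ex12"}
probes = [(label, 0.5 if label in deleted else 1.0) for label in labels]

print(deleted_runs(probes))  # [('Pr', 'ex2'), ('ex8', 'ex12')]
```

Sorting probes by chromosomal location before grouping is what makes the two separate deleted segments visible as distinct runs.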

Case G

Presentation of the case

Patient G was referred for genetic testing by his internist because his fasting lipid profile showed high levels of plasma cholesterol: TC 317 mg/dL, LDL cholesterol 232 mg/dL, triglycerides 55 mg/dL, and HDL cholesterol 74 mg/dL before initiation of therapy. He also has a family history of hypercholesterolemia (mother and brothers) (Fig. 17.9A).


FIGURE 17.8 (A) Pedigree of case F. The proband is indicated by an arrow. Half-filled black symbols represent LDLR Pr_ex2+ex8_12del heterozygous individuals. N represents individuals negative for the LDLR Pr_ex2+ex8_12del variant. Symbol ? represents individuals with clinical diagnosis of FH but not tested. Symbol crossed by a line represents a deceased individual. (B) MLPA result of the LDLR gene showing the Pr_ex2+ex8_12del variant. On the left, electropherogram of patient F. On the right, calculated probe ratio of patient F sample (top) normalized to the reference sample (bottom), as displayed by Coffalyser.Net software; displaying probes by chromosomal location reveals a heterozygous deletion (Pr_ex2+ex8_12del), probe ratio 0.5, in the patient sample (red dots). (C) ACMG classification [24] for LDLR c.[1-?_190+?del;1061-?_1845+?del].

FIGURE 17.9 (A) Pedigree of case G. The proband is indicated by an arrow. Half-filled black symbols represent APOB exon 26: c.10580G > A/p.(Arg3527Gln) heterozygous individuals. N represents individuals negative for the APOB exon 26: c.10580G > A/p.(Arg3527Gln) variant. Symbol ? represents individuals with clinical diagnosis of FH but not tested. Symbol crossed by a line represents deceased individuals. (B) Nucleotide sequence of a fragment of part of the exon 26 of the APOB gene showing the c.10580G > A/p.(Arg3527Gln) variant. (C) ACMG classification [24] for APOB c.10580G > A/p.(Arg3527Gln).

Laboratory results

The laboratory used an NGS panel of eight genes (LDLR, APOB, PCSK9, ABCG5, ABCG8, LIPA, LDLRAP1, and APOE). The NGS panel included the promoter and the coding regions of each gene plus 50 bp of intronic flanking regions. The study of large rearrangements in LDLR by MLPA was negative. The patient was identified with the heterozygous APOB variant c.10580G > A, a substitution in exon 26, which is predicted to result in the amino acid change p.(Arg3527Gln) in domain 6 (Fig. 17.9B). The result was confirmed by Sanger sequencing in a second, independently extracted DNA sample.

Variant interpretation

This variant was the first to be described in the APOB gene [50], and it is the most common APOB variant worldwide. Different functional studies have been performed to test its pathogenicity. Innerarity et al. in 1990 used U937 cells, a cholesterol-dependent cell line [59], and later studies using heterologous cells, in which the entire LDLR cycle was studied, were also performed [5]. According to the ACMG guidelines [24,25], this variant is considered to be pathogenic (Fig. 17.9C).

Case H

Presentation of the case

Patient H was referred for genetic testing by an endocrinologist because her fasting lipid profile showed high levels of plasma cholesterol: TC 408 mg/dL and LDL cholesterol 315 mg/dL before initiation of therapy, and TC 262 mg/dL, triglycerides 131 mg/dL, HDL cholesterol 42 mg/dL, and LDL cholesterol 179 mg/dL under treatment with rosuvastatin 20 + ezetimibe 10. She has a family history of hypercholesterolemia, with brother and son both affected; her son died at the age of 39 years of a myocardial infarction (Fig. 17.10A).

Laboratory results

The laboratory used an NGS panel of eight genes (LDLR, APOB, PCSK9, ABCG5, ABCG8, LIPA, LDLRAP1, and APOE). The NGS panel included the promoter and the coding regions of each gene plus 50 bp of intronic flanking regions. The study of large rearrangements in LDLR by MLPA was negative. The patient was identified with the heterozygous PCSK9 variant c.1120G > C, a substitution in exon 7, which is predicted to result in the amino acid change p.(Asp374His) in the catalytic domain (Fig. 17.10B). The result was confirmed by Sanger sequencing and also in a second, independently extracted DNA sample.

Variant interpretation

This variant has been described in only one population to date [49], although in two unrelated individuals. Functional studies to assess the pathogenicity of the variant showed that it presents a 77% reduction compared to wild-type and is one of the most potent variants in reducing LDLR [28]. According to the ACMG guidelines [24,25], this variant is considered to be pathogenic (Fig. 17.10C).

FIGURE 17.10 (A) Pedigree of case H. The proband is indicated by an arrow. Half-filled black symbols represent the c.1120G > C/p.(Asp374His) variant in PCSK9 gene. N represents individuals negative for the PCSK9 c.1120G > C/p.(Asp374His) variant. Symbol ? represents individuals with clinical diagnosis of FH but not tested. Symbols crossed by a line represent deceased individuals. (B) Nucleotide sequence of a fragment of exon 7 of the PCSK9 gene showing the c.1120G > C/p.(Asp374His) variant. (C) ACMG classification [24] for PCSK9 c.1120G > C, p.(Asp374His).

Main final conclusion

FH is an excellent model for personalized medicine, and accurate variant interpretation is very important for FH diagnosis. A correct FH diagnosis also allows better cardiovascular risk stratification of each patient in order to improve patient prognosis. Joint efforts by different teams worldwide to share data will improve variant classification. The ACMG algorithm and ClinGen specifications are excellent tools for variant interpretation and are being improved continually.

References

[1] Nordestgaard BG, Chapman MJ, Humphries SE, Ginsberg HN, Masana L, Descamps OS, Wiklund O, Hegele RA, Raal FJ, Defesche JC, Wiegman A, Santos RD, Watts GF, Parhofer KG, Hovingh GK, Kovanen PT, Boileau C, Averna M, Borén J, Bruckert E, Catapano AL, Kuivenhoven JA, Pajukanta P, Ray K, Stalenhoef AFH, Stroes E, Taskinen M-R, Tybjærg-Hansen A. Familial hypercholesterolaemia is underdiagnosed and undertreated in the general population: guidance for clinicians to prevent coronary heart disease: consensus statement of the European Atherosclerosis Society. Eur Heart J 2013;34:3478–90.
[2] Marks D, Thorogood M, Neil HAW, Humphries SE. A review on the diagnosis, natural history, and treatment of familial hypercholesterolaemia. Atherosclerosis 2003;168:1–14.
[3] Alves AC. Base Genética da Hipercolesterolemia Familiar. Thesis. 2014. p. 1–215.
[4] Brown MS, Goldstein JL. A receptor-mediated pathway for cholesterol homeostasis. Science 1986;232:34–47.
[5] Alves AC, Etxebarria A, Soutar AK, Martin C, Bourbon M. Novel functional APOB mutations outside LDL-binding region causing familial hypercholesterolaemia. Hum Mol Genet 2014;23:1817–28.
[6] Innerarity TL, Weisgraber KH, Arnold KS, Mahley RW, Krauss RM, Vega GL, Grundy SM. Familial defective apolipoprotein B-100: low density lipoproteins with abnormal receptor binding. Proc Natl Acad Sci USA 1987;84:6919–23.
[7] Motazacker MM, Pirruccello J, Huijgen R, Do R, Gabriel S, Peter J, Kuivenhoven JA, Defesche JC, Kastelein JJP, Hovingh GK, Zelcer N, Kathiresan S, Fouchier SW. Advances in genetics show the need for extending screening strategies for autosomal dominant hypercholesterolaemia. Eur Heart J 2012;33:1360–6.
[8] Abifadel M, Varret M, Rabes JP, Allard D, Ouguerram K, Devillers M, Cauaud C, Benjannet S, Wickham L, Erlich D, Derré A, Villéger L, Farnier M, Beucler I, Bruckert E, Chambaz J, Chanu B, Lecerf JM, Luc G, Moulin P, Weissenbach J, Prat A, Krempf M, Junien C, Seidah NG, Boileau C. Mutations in PCSK9 cause autosomal dominant hypercholesterolemia. Nat Genet 2003;34:154–6.
[9] Varret M, Abifadel M, Rabès J-P, Boileau C. Genetic heterogeneity of autosomal dominant hypercholesterolemia. Clin Genet 2008;73:1–13.
[10] Soutar AK, Naoumova RP. Mechanisms of disease: genetic causes of familial hypercholesterolemia. Nat Clin Pract Cardiovasc Med 2007;4:214–25.
[11] Iacocca MA, Chora JR, Carrié A, Freiberger T, Leigh SE, Defesche JC, Kurtz CL, DiStefano MT, Santos RD, Humphries SE, Mata P, Jannes CE, Hooper AJ, Wilemon KA, Benlian P, O’Connor R, Garcia J, Wand H, Tichy L, Sijbrands EJ, Hegele RA, Bourbon M, Knowles JW, ClinGen FH Variant Curation Expert Panel. ClinVar database of global familial hypercholesterolemia-associated DNA variants. Hum Mutat 2018;39:1631–40.
[12] Bourbon M, Alves AC, Alonso R, Mata N, Aguiar P, Padró T, Mata P. Mutational analysis and genotype-phenotype relation in familial hypercholesterolemia: the SAFEHEART registry. Atherosclerosis 2017;262:8–13.
[13] Fouchier SW, Kastelein JJP, Defesche JC. Update of the molecular basis of familial hypercholesterolemia in The Netherlands. Hum Mutat 2005;26:550–6.
[14] Medeiros AM, Alves AC, Bourbon M. Mutational analysis of a cohort with clinical diagnosis of familial hypercholesterolemia: considerations for genetic diagnosis improvement. Genet Med 2016;18:316–24.
[15] Santos RD, Bourbon M, Alonso R, Cuevas A, Vásquez-Cárdenas A, Pereira AC, Merchan A, Alves AC, Medeiros AM, Jannes CE, Krieger JE, Schreier L, de Isla LP, Magaña-Torres MT, Stoll M, Mata N, Dell Oca N, Corral P, Asenjo S, Bañares VG, Reyes X, Mata P. Clinical and molecular aspects of familial hypercholesterolemia in Ibero-American countries. J Clin Lipidol 2016;0.
[16] Talmud PJ, Shah S, Whittall R, Futema M, Howard P, Cooper JA, Harrison SC, Li K, Drenos F, Karpe F, Neil HAW, Descamps OS, Langenberg C, Lench N, Kivimaki M, Whittaker J, Hingorani AD, Kumari M, Humphries SE. Use of low-density lipoprotein cholesterol gene score to distinguish patients with polygenic and monogenic familial hypercholesterolaemia: a case-control study. Lancet 2013;381:1293–301.
[17] Rios J, Stein E, Shendure J, Hobbs HH, Cohen JC. Identification by whole-genome resequencing of gene defect responsible for severe hypercholesterolemia. Hum Mol Genet 2010;19:4313–8.
[18] Chora JR, Alves AC, Medeiros AM, Mariano C, Lobarinhas G, Guerra A, et al. Lysosomal acid lipase deficiency: a hidden disease among cohorts of familial hypercholesterolaemia? J Clin Lipidol 2017;11(2):477–484.e2. https://doi.org/10.1016/j.jacl.2016.11.002.
[19] Stitziel NO, Peloso GM, Abifadel M, Cefalù AB, Fouchier S, Motazacker MM, Tada H, Larach DB, Awan Z, Haller JF, Pullinger CR, Varret M, Rabès J-P, Noto D, Tarugi P, Kawashiri M-A, Nohara A, Yamagishi M, Risman M, Deo R, Ruel I, Shendure J, Nickerson DA, Wilson JG, Rich SS, Gupta N, Farlow DN, Neale BM, Daly MJ, Kane JP, Freeman MW, Genest J, Rader DJ, Mabuchi H, Kastelein JJP, Hovingh GK, Averna MR, Gabriel S, Boileau C, Kathiresan S. Exome sequencing in suspected monogenic dyslipidemias. Circ Cardiovasc Genet 2015;8:343–50.
[20] Johansen CT, Dube JB, Loyzer MN, Macdonald A, Carter DE, McIntyre AD, Cao H, Wang J, Robinson JF, Hegele RA. LipidSeq: a next-generation clinical resequencing panel for monogenic dyslipidemias. J Lipid Res 2014.
[21] Marduel M, Ouguerram K, Serre V, Bonnefont-Rousselot D, Marques-Pinheiro A, Erik Berge K, Devillers M, Luc G, Lecerf J-M, Tosolini L, Erlich D, Peloso GM, Stitziel N, Nitchké P, Jaïs J-P, Abifadel M, Kathiresan S, Leren TP, Rabès J-P, Boileau C, Varret M. Description of a large family with autosomal dominant hypercholesterolemia associated with the APOE p.Leu167del mutation. Hum Mutat 2013;34:83–7.
[22] Khera AV, Won H-H, Peloso GM, Lawson KS, Bartz TM, Deng X, van Leeuwen EM, Natarajan P, Emdin CA, Bick AG, Morrison AC, Brody JA, Gupta N, Nomura A, Kessler T, Duga S, Bis JC, van Duijn CM, Cupples LA, Psaty B, Rader DJ, Danesh J, Schunkert H, McPherson R, Farrall M, Watkins H, Lander E, Wilson JG, Correa A, Boerwinkle E, Merlini PA, Ardissino D, Saleheen D, Gabriel S, Kathiresan S. Diagnostic yield and clinical utility of sequencing familial hypercholesterolemia genes in patients with severe hypercholesterolemia. J Am Coll Cardiol 2016;67:2578–89.
[23] Villa G, Wong B, Kutikova L, Ray KK, Mata P, Bruckert E. Prediction of cardiovascular risk in patients with familial hypercholesterolaemia. Eur Heart J Qual Care Clin Outcomes 2017;3:274–80.
[24] Chora JR, Medeiros AM, Alves AC, Bourbon M. Analysis of publicly available LDLR, APOB, and PCSK9 variants associated with familial hypercholesterolemia: application of ACMG guidelines and implications for familial hypercholesterolemia diagnosis. Genet Med 2018;20:591–8.
[25] Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, Voelkerding K, Rehm HL. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–23.
[26] Pullinger CR, Gaffney D, Gutierrez MM, Malloy MJ, Schumaker VN, Packard CJ, Kane JP. The apolipoprotein B R3531C mutation. Characteristics of 24 subjects from 9 kindreds. J Lipid Res 1999;40:318–27.
[27] Alves AC, Etxebarria A, Medeiros AM, Benito-Vicente A, Thedrez A, Passard M, Croyal M, Martin C, Lambert G, Bourbon M. Characterization of the first PCSK9 gain of function homozygote. J Am Coll Cardiol 2015;66:2152–4.
[28] Fasano T, Sun X-M, Patel DD, Soutar AK. Degradation of LDLR protein mediated by “gain of function” PCSK9 mutants in normal and ARH cells. Atherosclerosis 2009;203:166–71.
[29] Kanavy DM, McNulty SM, Jairath MK, Brnich SE, Bizon C, Powell BC, Berg JS. Comparative analysis of functional assay evidence use by ClinGen variant curation expert panels. Genome Med 2019;11.
[30] Etxebarria A, Benito-Vicente A, Alves AC, Ostolaza H, Bourbon M, Martin C. Advantages and versatility of fluorescence-based methodology to characterize the functionality of LDLR and class mutation assignment. PLoS One 2014;9:e112677.
[31] Hobbs, et al. Molecular genetics of the LDL receptor gene in familial hypercholesterolemia. Hum Mutat 1992;1:445–66.
[32] Banerjee P, Chan KC, Tarabocchia M, Benito-Vicente A, Alves AC, Uribe KB, Bourbon M, Skiba PJ, Pordy R, Gipe DA, Gaudet D, Martin C. Functional analysis of LDLR (low-density lipoprotein receptor) variants in patient lymphocytes to assess the effect of evinacumab in homozygous familial hypercholesterolemia patients with a spectrum of LDLR activity. Arterioscler Thromb Vasc Biol 2019;39:2248–60.
[33] Etxebarria A, Palacios L, Stef M, Tejedor D, Uribe KB, Oleaga A, Irigoyen L, Torres B, Ostolaza H, Martin C. Functional characterization of splicing and ligand-binding domain variants in the LDL receptor. Hum Mutat 2012;33:232–43.
[34] Bourbon M, Duarte MA, Alves AC, Medeiros AM, Marques L, Soutar AK. Genetic diagnosis of familial hypercholesterolaemia: the importance of functional analysis of potential splice-site mutations.
J Med Genet 2009;46:352e7. [35] Khamis A, Palmen J, Lench N, Taylor A, Badmus E, Leigh S, Humphries SE. Functional analysis of four LDLR 5’UTR and promoter variants in patients with familial hypercholesterolaemia. Eur J Hum Genet 2015;23:790e5. [36] Brunham LR, Ruel I, Aljenedil S, Rivie`re JB, Baass A, Tu JV, Mancini GBJ, Raggi P, Gupta M, Couture P, Pearson GJ, Bergeron J, Francis GA, McCrindle BW, Morrison K, St-Pierre J, Henderson M, Hegele RA, Genest J, Goguen J, Gaudet D, Pare´ G, Romney J, Ransom T, Bernard S, Katz P, Joy TR, Bewick D, Brophy J. Canadian cardiovascular society position statement on familial hypercholesterolemia: update 2018. Can J Cardiol 2018;34:1553e63. [37] Baigent C, Keech A, Kearney PM, Blackwell L, Buck G, Pollicino C, Kirby A, Sourjina T, Peto R, Collins R, Simes R, CTT Collaborators. Efficacy and safety of cholesterol-lowering treatment: prospective metaanalysis of data from 90,056 participants in 14 randomised trials of statins. Lancet 2005;366:1267e78. [38] Benn M, Watts GF, Tybjaerg-Hansen A, Nordestgaard BG. Mutations causative of familial hypercholesterolaemia: screening of 98 098 individuals from the Copenhagen General Population Study estimated a prevalence of 1 in 217. Eur Heart J 2016;37:1384e94. [39] Ioannidis NM, Rothstein JH, Pejaver V, Middha S, McDonnell SK, Baheti S, Musolf A, Li Q, Holzinger E, Karyadi D, Cannon-Albright LA, Teerlink CC, Stanford JL, Isaacs WB, Xu J, Cooney KA, Lange EM, Schleutker J, Carpten JD, Powell IJ, Cussenot O, Cancel-Tassin G, Giles GG, MacInnis RJ, Maier C, Hsieh C-L, Wiklund F, Catalona WJ, Foulkes WD, Mandal D, Eeles RA, Kote-Jarai Z, Bustamante CD, Schaid DJ, Hastie T, Ostrander EA, Bailey-Wilson JE, Radivojac P, Thibodeau SN, Whittemore AS, Sieh W. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet 2016;99:877e85. [40] Krisko A, Etchebest C. Theoretical model of human apolipoprotein B100 tertiary structure. Proteins 2007; 66:342e58.

References

347

[41] Sturm AC, Knowles JW, Gidding SS, Ahmad ZS, Ahmed CD, Ballantyne CM, Baum SJ, Bourbon M, Carrie´ A, Cuchel M, de Ferranti SD, Defesche JC, Freiberger T, Hershberger RE, Hovingh GK, Karayan L, Kastelein JJP, Kindt I, Lane SR, Leigh SE, Linton MF, Mata P, Neal WA, Nordestgaard BG, Santos RD, Harada-Shiba M, Sijbrands EJ, Stitziel NO, Yamashita S, Wilemon KA, Ledbetter DH, Rader DJ, Convened by the Familial Hypercholesterolemia Foundation. clinical genetic testing for familial hypercholesterolemia. J Am Coll Cardiol 2018;72:662e80. [42] Fouchier SW, Defesche JC, Umans-Eckenhausen MW, Kastelein JP. The molecular basis of familial hypercholesterolemia in The Netherlands. Hum Genet 2001;109:602e15. [43] Benito-Vicente A, Alves AC, Etxebarria A, Medeiros AM, Martin C, Bourbon M. The importance of an integrated analysis of clinical, molecular, and functional data for the genetic diagnosis of familial hypercholesterolemia. Genet Med 2015;17:980e8. [44] Schuster H, Keller C, Wolfram G, Zo¨llner N. Ten LDL receptor mutants explain one third of familial hypercholesterolemia in a German sample. Arterioscler Thromb Vasc Biol 1995;15:2176e80. [45] Amsellem S, Briffaut D, Carrie´ A, Rabe`s JP, Girardet JP, Fredenrich A, Moulin P, Krempf M, Reznik Y, Vialettes B, de Gennes JL, Brukert E, Benlian P. Intronic mutations outside of Alu-repeat-rich domains of the LDL receptor gene are a cause of familial hypercholesterolemia. Hum Genet 2002;111:501e10. [46] Leren TP, Solberg K, Rødningen OK, Tonstad S, Ose L. Two founder mutations in the LDL receptor gene in Norwegian familial hypercholesterolemia subjects. Atherosclerosis 1994;111:175e82. [47] Cameron J, Holla ØL, Kulseth MA, Leren TP, Berge KE. Splice-site mutation c.313þ1, G>A in intron 3 of the LDL receptor gene results in transcripts with skipping of exon 3 and inclusion of intron 3. Clin Chim Acta 2009;403:131e5. 
[48] Mozas P, Castillo S, Tejedor D, Reyes G, Alonso R, Franco M, Saenz P, Fuentes F, Almagro F, Mata P, Pocovı´ M. Molecular characterization of familial hypercholesterolemia in Spain: identification of 39 novel and 77 recurrent mutations in LDLR. Hum Mutat 2004;24:187. [49] Bourbon M, Alves AC, Medeiros AM, Silva S, Soutar AK. Familial hypercholesterolaemia in Portugal. Atherosclerosis 2008;196:633e42. [50] Soria LF, Ludwing EH, Clarke HRG, Vega GL, Grundy SM, McCarthy BJ. Association between a specific apolipoprotein B mutation and familial defective apolipoprotein B-100. Proc Natl Acad Sci U S A 1989;86: 587e91. [51] Alves AC, Benito-Vicente A, Medeiros AM, Reeves K, Martin C, Bourbon M. Further evidence of novel APOB mutations as a cause of familial hypercholesterolaemia. Atherosclerosis 2018;277:448e56. https:// doi.org/10.1016/j.atherosclerosis.2018.06.819. [52] Benjannet S, Rhainds D, Essalmani R, Mayne J, Wickham L, Jin W, et al. NARC-1/PCSK9 and its natural mutants: Zymogen cleavage and effects on the low density lipoprotein (LDL) receptor and LDL cholesterol. J Biol Chem 2004;279(47):48865e75. https://doi.org/10.1074/jbc.M409699200. [53] Boren J, Ekstrom U, Agren B, Nilsson-Ehle P, Innerarity TL. The molecular mechanism for the genetic disorder familial defective apolipoprotein B100. J Biol Chem 2001;276(12):9214e8. https://doi.org/ 10.1074/jbc.M008890200M008890200. [54] Cameron J, Holla OL, Laerdahl JK, Kulseth MA, Ranheim T, Rognes T, et al. Characterization of novel mutations in the catalytic domain of the PCSK9 gene. J Intern Med 2008;263(4):420e31. https://doi.org/ 10.1111/j.1365-2796.2007.01915.x. [55] Cameron J, Holla ØL, Ranheim T, Kulseth MA, Berge KE, Leren TP. Effect of mutations in the PCSK9 gene on the cell surface LDL receptors. Hum Mol Genet 2006;15(9):1551e8. https://doi.org/10.1093/hmg/ ddl077. [56] Gaffney D, Reid JM, Cameron IM, Vass K, Caslake MJ, Shepherd J, et al. 
Independent mutations at codon 3500 of the apolipoprotein B gene are associated with hyperlipidemia. Arterioscler Thromb Vasc Biol 1995; 15(8):1025e9.

348

Chapter 17 Familial hypercholesterolemia

[57] Gaffney Dairena, Pullinger Clive R, Reilly Denis St J, Hoffs Michael S, Cameron I, Vass J Keith, et al. Influence of an asparagine to lysine mutation at amino acid 3516 of apolipoprotein B on low-density lipoprotein receptor binding. Clinica Chimica Acta 2002;321:113e21. [58] Homer VM, Marais AD, Charlton F, Laurie AD, Hurndell N, Scott R, et al. Identification and characterization of two non-secreted PCSK9 mutants associated with familial hypercholesterolemia in cohorts from New Zealand and South Africa. Atherosclerosis 2008;196(2):659e66. https://doi.org/10.1016/j.athero sclerosis.2007.07.022. [59] Innerarity TL, Mahley RW, Weisgraber KH, Bersot TP, Krauss RM, Vega GL, et al. Familial defective apolipoprotein B-100: a mutation of apolipoprotein B that causes hypercholesterolemia. J Lipid Res 1990; 31:1337e49. [60] Sun XM, Eden ER, Tosi I, Neuwirth CK, Wile D, Naoumova RP, et al. Evidence for effect of mutant PCSK9 on apolipoprotein B secretion as the cause of unusually severe dominant hypercholesterolaemia. Hum Mol Genet 2005;14(9):1161e9. https://doi.org/10.1093/hmg/ddi128. [61] Thomas ERA, Atanur SS, Norsworthy PJ, Encheva V, Snijders AP, Game L, et al. Identification and biochemical analysis of a novel APOB mutation that causes autosomal dominant hypercholesterolemia. Mol Genet Genomic Med 2013;1(3):155e61. https://doi.org/10.1002/mgg3.17.

CHAPTER 18

Classification of genetic variants in hereditary cancer genes

Lidia Feliubadaló (1), Michael T. Parsons (2), Marta Pineda (1), Emma Tudini (2)

(1) Molecular Diagnostics Unit, Hereditary Cancer Program, Catalan Institute of Oncology (ICO), Institut d’Investigació Biomèdica de Bellvitge (IDIBELL), ONCOBELL Program, Barcelona, Spain; (2) Genetics & Computational Biology Department, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia

Introduction

BRCA1/2-associated hereditary breast and ovarian cancer syndrome

The BRCA1 (OMIM #604370) and BRCA2 (OMIM #612555) genes (BRCA1/2) are critical members of the homologous recombination repair pathway [1,2] and two of the best-established high-risk cancer predisposition genes. Individuals with a heterozygous pathogenic variant in either gene have an increased risk of early-onset breast and ovarian cancer, often referred to as hereditary breast and ovarian cancer (HBOC). Risk estimates vary with population and study, but a recent prospective study estimated that female carriers of a pathogenic variant have a cumulative risk of breast cancer to age 80 years of 72% (BRCA1) and 69% (BRCA2) [3]. Their cumulative risk of ovarian cancer is 44% (BRCA1) and 17% (BRCA2) [3]. While these greatly increased cancer risks are the signature of HBOC, there are also increased risks of pancreatic cancer (males and females), male breast cancer, and prostate cancer. These additional cancer risks are known to be lower than the female breast and ovarian cancer risks [4,5]; however, larger studies are required to generate more accurate risk estimates.

The other hallmark of BRCA1/2-associated HBOC is the early onset of disease. The greatest increase in risk is seen between 30 and 60 years of age, after which it declines to be roughly equivalent to population risk by age 80 years [3]. Thus, identification of pathogenic variants is crucial for patient management and for screening of family members. The challenge for clinical genetics services is the accurate classification of variants identified in BRCA1/2, as benign variation is also observed in these genes.

Importantly, homozygous or compound heterozygous carriers of pathogenic variants are expected to have phenotypes that differ from those of heterozygous carriers. For BRCA2, biallelic pathogenic variants lead to Fanconi anemia [6]. Additionally, although biallelic BRCA1 inactivation was thought to be embryonically lethal, individuals with biallelic BRCA1 variants have recently been shown to present with a Fanconi anemia-like phenotype [7–10].

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00003-X Copyright © 2021 Elsevier Inc. All rights reserved.


ATM-associated susceptibility to breast cancer

Only about 5% of hereditary breast cancer is explained by pathogenic variants in highly penetrant genes with an autosomal dominant inheritance pattern [11] (e.g., BRCA1, BRCA2). Another portion of hereditary breast cancer is explained by moderate-penetrance genes, including ATM (OMIM #607585). Germline monoallelic ATM pathogenic variants increase the risk of cancer, particularly breast cancer (relative risk = 2.4, 95% CI: 1.5–3.8) [12], and have also been associated with colorectal, prostate, and pancreatic cancer predisposition [13–15].

Additionally, biallelic loss-of-function variants in ATM cause the highly penetrant disease ataxia-telangiectasia, mainly characterized by ataxia, conjunctival telangiectasias, immune deficiency, hypersensitivity to ionizing radiation, chromosomal breakage, sterility, and an increased risk of malignancies (lifetime risk of 30%–40%), particularly leukemia and lymphoma [16,17]. Progressive cerebellar ataxia begins between one and four years of age, and complete penetrance is expected by 10 years. There are rare atypical cases with later onset or diagnosis, or with milder disease progression. In atypical patients, one of the two pathogenic variants is often a missense or leaky splicing variant that retains some kinase activity [18].

The ATM gene on chromosome 11 has 63 exons, with the initiation codon in exon 2, and a 9.2-kb coding sequence. ATM is a tumor suppressor gene that encodes a serine/threonine kinase from the PIKK (phosphatidylinositol-3′ kinase-related protein kinase) family that plays a critical role in DNA double-strand break repair [19,20] and in cell cycle control [21]. ATM is a ubiquitous and constitutively expressed large (3056 amino acid) soluble protein that is kept inactive in a homodimeric or multimeric state until it is activated by genomic insults. ATM is then acetylated, autophosphorylated, and released as active kinase monomers [22] that phosphorylate more than 700 substrates, such as P53 and checkpoint kinase CHK2, which are involved in cell cycle control, DNA repair, cell survival, and other cellular processes [23]. Cells lacking ATM activity become sensitive to ionizing radiation and genomically unstable [24]. The C-terminal region of the protein contains the conserved FAT/kinase domain/FATC domain structure [25].

Lynch syndrome

Lynch syndrome (LS; OMIM #120435) is the most prevalent hereditary colorectal and endometrial cancer syndrome [26]. It is also characterized by an increased risk of other cancers, including those of the ovary, stomach, small intestine, hepatobiliary tract, urinary tract, brain, and skin [26]. It is an autosomal dominant cancer-susceptibility disease caused by inactivating heterozygous germline variants in mismatch repair (MMR) genes (MLH1, MSH2, MSH6, and PMS2). Moreover, constitutional epigenetic alterations in MLH1 and MSH2 are occasionally responsible for the MMR-deficient phenotype in LS patients [27]. Somatic inactivation of the MMR wild-type allele initiates an accumulation of errors, mainly in repetitive sequences. Consequently, LS-associated tumors are MMR-deficient, characterized by microsatellite instability (MSI), loss of expression of MMR proteins, and hypermutation status (>10 mutations/Mb) [28].

Constitutional MMR deficiency (CMMRD; OMIM #276300) is the biallelic counterpart of LS. It is a rare, devastating cancer syndrome caused by biallelic germline pathogenic variants in the same genes and mainly characterized by the development of hematological, brain, and colorectal tumors during childhood and adolescence [28]. The germline inactivation of both MMR alleles, together with somatic polymerase exonuclease domain mutations, leads to ultra-hypermutated tumors (>100 mutations/Mb) [30]. Loss of MMR protein expression in both tumor and normal tissue, as well as the presence of low-level MSI in blood, are diagnostic hallmarks of CMMRD [29–31].


Detection of germline MMR gene pathogenic variants allows the diagnosis of MMR deficiency-associated cancer syndromes and the appropriate management of patients and their families [32,33]. However, in routine testing, MMR variants of uncertain significance (VUS) are often identified, representing up to 30% of all the identified variants [34] and precluding diagnosis for carriers and their relatives. Moreover, the number of MMR VUS identified is increasing with the implementation of LS population screening of colorectal cancer and the use of multigene panel testing [35].

The MLH1 gene is located on chromosome 3, MSH2 and MSH6 on chromosome 2, and PMS2 on chromosome 7. The encoded proteins belong to the MMR system, which corrects base-base mismatches and small insertions and deletions mainly introduced by DNA polymerases during replication, as well as mispairs formed during recombination or by chemically modified bases [36,37]. In humans, base-base MMR is initiated when the heterodimer formed by MSH2/MSH6 recognizes a mismatch. The MLH1/PMS2 protein complex is then recruited and binds other proteins, such as PCNA and RFC. PMS2 introduces a nick in the daughter strand, ExoI degrades the sequence containing the error, and the strand is finally resynthesized.

In the following sections, four mock clinical cases of suspected hereditary cancer are presented, in each of which a germline variant in one of the genes above has been detected. The different types of evidence available for each variant are discussed in the context of the ACMG/AMP variant classification guidelines (Chapter 3) and, where available, other evidence not considered by ACMG/AMP, such as multifactorial likelihood analysis (Chapter 11).

BRCA2 c.9976A>T p.(Lys3326Ter)

Presentation of the case

Fifty-four-year-old female patient diagnosed with carcinoma of the left breast at age 52 (ER+, PR+, and HER2+). Her mother is cancer-free at 80 years, and she has two healthy sisters aged 49 and 60. The patient reports that her maternal grandmother had breast cancer at 72 (unconfirmed). There is no other family history of cancer. A breast cancer panel was requested, which identified the germline BRCA2 (Box 18.1) heterozygous variant c.9976A>T.

BOX 18.1 Gene information: BRCA2 gene

Phenotypes: Hereditary breast and ovarian cancer
Penetrance (by age 80): Breast cancer: 0.69 [3]; Ovarian cancer: 0.17 [3]
Gene–disease validity: Definitive (https://search.clinicalgenome.org/kb/genes/HGNC:1101)
Mode of inheritance: Autosomal dominant
Mechanism of disease: Loss-of-function


Variant information: BRCA2 c.9976A>T p.(Lys3326Ter)

See Box 18.2.

BOX 18.2 Variant information: BRCA2 c.9976A>T p.(Lys3326Ter)

Genomic location: chr13:32398489A>T (GRCh38); chr13:32972626A>T (GRCh37)
HGVS coding: NM_000059.3:c.9976A>T
HGVS protein: NP_000050.2:p.(Lys3326Ter)
Exon: 27 (of 27)
Variant type: Stop gained
Zygosity: Heterozygous
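As a quick consistency check between the coding and protein descriptions in Box 18.2, the codon number can be derived from the c. position. The sketch below is illustrative only: the function names are hypothetical, the parser handles nothing beyond simple substitutions, and the lysine and stop codons come from the standard genetic code rather than from the transcript sequence itself.

```python
import re

# Lysine codons and stop codons of the standard genetic code (DNA alphabet).
LYS_CODONS = {"AAA", "AAG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def parse_coding_substitution(hgvs_c):
    """Split a simple 'transcript:c.<pos><ref>><alt>' substitution into its parts."""
    match = re.fullmatch(r"([^:]+):c\.(\d+)([ACGT])>([ACGT])", hgvs_c)
    if match is None:
        raise ValueError(f"not a simple coding substitution: {hgvs_c}")
    transcript, pos, ref, alt = match.groups()
    return transcript, int(pos), ref, alt

def codon_of(c_pos):
    """Return (codon number, 0-based offset within the codon) for a c. position."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3

transcript, pos, ref, alt = parse_coding_substitution("NM_000059.3:c.9976A>T")
codon, offset = codon_of(pos)

# Apply the substitution to every lysine codon that carries the reference base
# at that offset, then check whether each result is a stop codon.
mutated = {c[:offset] + alt + c[offset + 1:] for c in LYS_CODONS if c[offset] == ref}
print(transcript, codon, offset, sorted(mutated), mutated <= STOP_CODONS)
```

Position 9976 falls on the first base of codon 3326, and an A>T change there converts either lysine codon into a stop codon, which agrees with p.(Lys3326Ter).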

Pathogenicity assessment of the variant

Here we present the reasoning by which each criterion is fulfilled or not, grouped by the type of evidence found for this variant.

Population data

BRCA1/BRCA2 allele frequency thresholds

- Gene-specific thresholds applied for BRCA1/BRCA2 as per recommendations (ENIGMA BRCA1/2 Gene Variant Classification Criteria v2.5.1 June 29, 2017; https://enigmaconsortium.org/library/general-documents/enigma-classification-criteria/)
  • Allele frequency ≥1% in outbred population regarded as stand-alone evidence for Benign

Allele frequency

Dataset: gnomAD v2.1.1 (non-cancer)

Population frequencies

- Allele frequency assessed for each outbred subpopulation in gnomAD non-cancer dataset (Fig. 18.1)
  • High frequency (>1%) in European Finnish (not outbred population)
  • High frequency (0.8%) in European non-Finnish (outbred population)


FIGURE 18.1 BRCA2 c.9976A>T: Screenshot from gnomAD v2.1.1 non-cancer dataset population allele frequencies. Website: https://gnomad.broadinstitute.org/variant/13-32972626-A-T?dataset=gnomad_r2_1_non_cancer.

Coverage of exon

- Adequate coverage observed for exon (>20X) (Fig. 18.2).
- There is limited guidance on what qualifies as “adequate coverage”; however, expert panel recommendations for RUNX1 state that the mean coverage of sequencing data for the gene should be a minimum of 20X (Luo et al. [88]: PMID:31648317).

Population criterion BS1 is met (Box 18.3)
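The stand-alone (BA1) part of the population assessment above can be sketched as a small check. This is a minimal illustration, not the ENIGMA procedure: the dictionary layout and function names are hypothetical, the Finnish frequency and the coverage value are placeholders (the text gives only ">1%" and ">20X"), and the BS1 threshold is left out because it is not stated here.

```python
BA1_THRESHOLD = 0.01        # 1% allele frequency in an outbred population (from the text)
MIN_MEAN_COVERAGE = 20      # 20X, borrowed from the RUNX1 recommendation cited in the text
NON_OUTBRED = {"European (Finnish)"}  # treated as not outbred, as in the text

def max_outbred_frequency(frequencies):
    """Return (population, frequency) for the highest frequency among outbred groups."""
    outbred = {pop: af for pop, af in frequencies.items() if pop not in NON_OUTBRED}
    pop = max(outbred, key=outbred.get)
    return pop, outbred[pop]

def ba1_met(frequencies, mean_coverage):
    """BA1 requires adequate coverage and an outbred frequency at or above the threshold."""
    _, af = max_outbred_frequency(frequencies)
    return mean_coverage >= MIN_MEAN_COVERAGE and af >= BA1_THRESHOLD

frequencies = {
    "European (Finnish)": 0.011,      # placeholder value: the text states only ">1%"
    "European (non-Finnish)": 0.008,  # 0.8%, as stated in the text
}
print(max_outbred_frequency(frequencies))
print(ba1_met(frequencies, mean_coverage=30))  # 30X is a placeholder above the 20X minimum
```

Excluding the Finnish population leaves 0.8% as the highest outbred frequency, which is below the 1% stand-alone threshold, so BA1 is not met and the frequency is weighed under BS1 instead.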


FIGURE 18.2 BRCA2 c.9976A>T: Screenshot from gnomAD v2.1.1 non-cancer dataset coverage of exon. Website: https://gnomad.broadinstitute.org/region/13-32972063-32973184?dataset=gnomad_r2_1_non_cancer.

BOX 18.3 Summary of population data evidence

ACMG/AMP criteria | Met/not met | Evidence
BA1 | Not met | Frequency below threshold (1%)
BS1 | Met | Frequency of 0.8% in non-Finnish Europeans (gnomAD v2.1.1 non-cancer)
BS2 | Not met | No data available
PM2 | Not met | Not applicable: BS1 applied
PS4 | Not met | Not applicable

Computational and predictive data

- Variant is predicted to encode a premature termination codon
  • Variant is located at the 3′ end of the gene and not predicted to result in nonsense-mediated mRNA decay
  • Does not disrupt a clinically important domain (see Table 4 of ENIGMA BRCA1/2 Gene Variant Classification Criteria v2.5.1 June 29, 2017; https://enigmaconsortium.org/library/general-documents/enigma-classification-criteria/)
  • Richards et al. [38] recommend caution when interpreting variants in the most 3′ exon of a gene, with further considerations outlined by Tayoun et al. [39]. According to the ENIGMA BRCA1/2 Gene Variant Classification Criteria, the last clinically relevant amino acid is between p.3309 and p.3325, and amino acid changes downstream of 3325 are unlikely to be clinically important. Therefore, PVS1 cannot be applied to this variant.
- Variant is predicted to encode a premature termination codon; therefore, bioinformatic predictions based on amino acid changes are not applicable
- SpliceAI [40], MaxEntScan [41], and NNSplice [42] do not predict any splice sites at the variant position; thus, it is not predicted to create a de novo acceptor site.

No computational criteria are met (Box 18.4).

BOX 18.4 Summary of computational and predictive data

ACMG/AMP criteria | Met/not met | Evidence
BP1 | Not met | Not applicable for BRCA2
BP3 | Not met | Not applicable
BP4 | Not met | Not applicable for premature termination codon variants
BP7 | Not met | Not applicable
PP3 | Not met | Not applicable for premature termination codon variants
PM4 | Not met | Not applicable
PM5 | Not met | Not applicable
PS1 | Not met | Not applicable
PVS1 | Not met | Premature termination codon at 3′ end of the gene and does not disrupt a clinically important domain

Functional data

Assay: Homology-directed repair assay

Experimental data from Mesman et al. [43]

- Homology-directed repair (HDR) assay shows function similar to BRCA2 wild-type (104% HDR capacity relative to wild-type (WT) BRCA2-expressing cells)
  • HDR assay is appropriate to assess BRCA2 function as assay replicates role of BRCA2 in DNA repair
  • Assay assesses multiple (>10) variants with established Benign/Likely Benign (n = 20) and Likely Pathogenic/Pathogenic (n = 15) variants classified using alternative clinical data to determine specificity and thresholds.

Functional criterion BS3 is met (Box 18.5)


BOX 18.5 Summary of functional data

ACMG/AMP criteria | Met/not met | Evidence
BS3 | Met | Homology-directed repair assay shows function similar to BRCA2 wild-type [43]
PP2 | Not met | Not applicable
PM1 | Not met | Not in clinically important domain (see Table 4 of ENIGMA BRCA1/2 Gene Variant Classification Criteria v2.5.1 June 29, 2017; https://enigmaconsortium.org/library/general-documents/enigma-classification-criteria/)
PS3 | Not met | Not applicable: BS3 applied

Segregation data

Cosegregation analysis

Data from Wu et al. [44]

- Nonsegregation with disease in >300 families (Cosegregation likelihood ratio 4) or moderate risk (Odds ratio 2–4) (Box 18.11) [47]

BOX 18.11 Summary of non-ACMG/AMP evidence

- BRCA2 c.9976A>T is a low-penetrance allele (below two-fold increased risk) [47]

Summary of evidence and final classification (Box 18.12)

BOX 18.12 Summary of evidence and final classification of BRCA2 c.9976A>T

Category | ACMG/AMP criteria applied [38] | Evidence
Population data | BS1 | Allele frequency 0.8% in non-Finnish Europeans (gnomAD v2.1.1 non-cancer)
Computational and predictive data | Not met | PVS1 not applied as variant located at 3′ end of gene
Functional data | BS3 | Homology-directed repair assay shows function similar to BRCA2 wild-type [43]
Segregation data | BS4 | Nonsegregation with disease in >300 families [44]
De novo data | Not met | No data available
Allelic data | Not met | No data available
Other database | N/A | Not applicable, as per current recommendations [45]
Other data | Not met | No data available

Final classification (ACMG/AMP): Benign

Other data not in ACMG/AMP: Case–control study reports odds ratio consistent with low-risk variant. This evidence supports the ACMG/AMP Benign classification.
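The step from the criteria met (BS1, BS3, BS4) to a Benign classification follows the benign-side combining rules of Richards et al. [38]. The sketch below encodes only those rules; the function name is hypothetical, and it deliberately ignores pathogenic criteria, so it does not handle conflicting evidence.

```python
def classify_benign_side(criteria_met):
    """Apply the benign-side ACMG/AMP combining rules to a list of met criteria codes."""
    stand_alone = sum(1 for c in criteria_met if c == "BA1")
    strong = sum(1 for c in criteria_met if c.startswith("BS"))
    supporting = sum(1 for c in criteria_met if c.startswith("BP"))

    # Benign: one stand-alone criterion, or at least two strong criteria.
    if stand_alone >= 1 or strong >= 2:
        return "Benign"
    # Likely benign: one strong plus one supporting, or at least two supporting.
    if (strong == 1 and supporting >= 1) or supporting >= 2:
        return "Likely benign"
    return "No benign classification reached"

print(classify_benign_side(["BS1", "BS3", "BS4"]))
```

With three strong benign criteria met, the rule for two or more strong criteria is satisfied, which matches the final classification in Box 18.12.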

Biological and clinical interpretation

The patient is a germline heterozygous carrier of a benign BRCA2 variant, c.9976A>T p.(Lys3326Ter). This variant should not be used to alter patient management. Family members of the patient do not require screening for this variant.


BRCA2 c.9117G>A

Presentation of the case

Thirty-nine-year-old female patient diagnosed with carcinoma of the right breast at age 39 (ER+, PR+, and HER2+). Her mother was diagnosed with breast cancer at 52 and a maternal uncle had prostate cancer at 55. One sister was also diagnosed with breast cancer at 42. Her two other sisters are healthy, aged 32 and 35. Sequencing of BRCA1 and BRCA2 was requested, which identified a germline BRCA2 (Box 18.1) heterozygous variant c.9117G>A (Box 18.13).

BOX 18.13 Variant information: BRCA2 c.9117G>A p.(Pro3039=)

Genomic location: chr13:32379913G>A (GRCh38); chr13:32954050G>A (GRCh37)
HGVS coding: NM_000059.3:c.9117G>A
HGVS protein: NP_000050.2:p.(Pro3039=)
Exon: 23 (of 27)
Variant type: Synonymous, splice region variant
Zygosity: Heterozygous

Pathogenicity assessment of the variant

Here we present the reasoning by which each criterion is fulfilled or not, grouped by the type of evidence found for this variant.

Population data

BRCA1/BRCA2 allele frequency thresholds

- Gene-specific thresholds applied for BRCA1/BRCA2 as per recommendations (ENIGMA BRCA1/2 Gene Variant Classification Criteria v2.5.1 June 29, 2017; https://enigmaconsortium.org/library/general-documents/enigma-classification-criteria/)
  • Allele frequency ≥1% in outbred population regarded as stand-alone evidence for Benign

Allele frequency

Dataset: gnomAD v2.1.1 (non-cancer)


Population frequencies

- Allele frequency assessed for each outbred subpopulation in gnomAD non-cancer dataset (Fig. 18.3)
  • Absent in African, Latino, East Asian, and South Asian populations
  • Present in one non-Finnish European individual
- ACMG/AMP guidelines state that variants absent, or at very low frequency, in large outbred population datasets provide moderate evidence toward pathogenicity [38]. There is no guidance on what qualifies as “very low frequency”; however, expert panels have published on other high-risk cancer genes, CDH1 [48] and PTEN [49], suggesting that less than 1 in 100,000 alleles is sufficient. As no similar analysis is published for BRCA1 or BRCA2, we have opted to use this cutoff for the PM2 ACMG/AMP code.

FIGURE 18.3 BRCA2 c.9117G>A: Screenshot from gnomAD v2.1.1 non-cancer dataset population allele frequencies. Website: https://gnomad.broadinstitute.org/variant/13-32954050-G-A?dataset=gnomad_r2_1_non_cancer.
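The PM2 cut-off adopted here (fewer than 1 in 100,000 alleles) amounts to a one-line comparison of allele count against allele number. In the sketch below the function name is hypothetical and the allele number of 200,000 is a made-up round figure for illustration; the actual denominator should be read from the gnomAD record.

```python
PM2_MAX_FREQUENCY = 1 / 100_000  # "less than 1 in 100,000 alleles", the cut-off used in the text

def pm2_met(allele_count, allele_number):
    """Return True when the variant is absent or rarer than the PM2 cut-off."""
    if allele_number <= 0:
        raise ValueError("allele_number must be positive")
    return allele_count / allele_number < PM2_MAX_FREQUENCY

# One heterozygous individual contributes one variant allele.
# 200_000 is a hypothetical allele number, not the value from gnomAD.
print(pm2_met(allele_count=1, allele_number=200_000))
```

A single allele satisfies the cut-off only when more than 100,000 alleles were successfully genotyped at the site, which is one reason the coverage check matters for PM2.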


Coverage of exon

- Adequate coverage observed for exon (>20X) (Fig. 18.4)
- There is limited guidance on what qualifies as “adequate coverage”; however, expert panel recommendations for RUNX1 state that the mean coverage of sequencing data for the gene should be a minimum of 20X (Luo et al. [88]: PMID:31648317).

FIGURE 18.4 BRCA2 c.9117G>A: Screenshot from gnomAD v2.1.1 non-cancer dataset coverage of exon. Website: https://gnomad.broadinstitute.org/region/13-32953798-32954101?dataset=gnomad_r2_1_non_cancer.

Case–control data

Case–control association study: Momozawa et al. [50]

- Case–control association study in 7051 unselected breast cancer patients and 11,241 female controls of Japanese ancestry

Results from supplementary data 1:

Gene | HGVS.c | HGVS.p | Carrier frequency in 7051 cases | Carrier frequency in 11,241 controls | P-value | OR (95% CI) | Final clinical significance
BRCA2 | c.9117G>A | p.Pro3039Pro | 0.00071 | 0 | 8.51E-03 | Inf (1.5–Inf) | Pathogenic

- Statistically significant case–control study with the lower bound of the 95% confidence interval above one
- The odds ratio and the upper bound of the 95% confidence interval are reported as infinite because the carrier frequency in controls is zero.
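The reported figures can be checked approximately from the carrier frequencies. The sketch below assumes a two-sided Fisher exact test, which is an assumption since the test used in the study is not stated here; the function names are hypothetical and the carrier count is recovered by rounding frequency times cohort size.

```python
from math import comb

def fisher_exact_two_sided(case_carriers, n_cases, control_carriers, n_controls):
    """Two-sided Fisher exact p-value for a 2x2 carrier table (hypergeometric sum)."""
    carriers = case_carriers + control_carriers
    denom = comb(n_cases + n_controls, carriers)

    def pmf(k):
        # Probability that exactly k of the carriers fall among the cases.
        return comb(n_cases, k) * comb(n_controls, carriers - k) / denom

    p_observed = pmf(case_carriers)
    lowest = max(0, carriers - n_controls)
    highest = min(carriers, n_cases)
    # Sum every table that is as unlikely as, or less likely than, the observed one.
    return sum(pmf(k) for k in range(lowest, highest + 1)
               if pmf(k) <= p_observed * (1 + 1e-9))

def odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Sample odds ratio; infinite when no control carries the variant."""
    if control_carriers == 0:
        return float("inf")
    case_odds = case_carriers / (n_cases - case_carriers)
    control_odds = control_carriers / (n_controls - control_carriers)
    return case_odds / control_odds

n_cases, n_controls = 7051, 11241
case_carriers = round(0.00071 * n_cases)   # carrier frequency in cases, from the table
control_carriers = 0                        # carrier frequency in controls is zero

p_value = fisher_exact_two_sided(case_carriers, n_cases, control_carriers, n_controls)
print(case_carriers, odds_ratio(case_carriers, n_cases, control_carriers, n_controls), p_value)
```

Under these assumptions the frequency corresponds to five carriers among the cases, and the p-value comes out close to the 8.51E-03 in the table. The infinite odds ratio is a direct result of dividing by zero control odds, so the lower confidence bound is the informative number.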

Population criteria PM2 and PS4 are met (Box 18.14).

BOX 18.14 Summary of population data evidence

ACMG/AMP criteria | Met/not met | Evidence
BA1 | Not met | Allele frequency below threshold (1%)
BS1 | Not met | Allele frequency below threshold
BS2 | Not met | No data available
PM2 | Met | Very low allele frequency in gnomAD v2.1.1 non-cancer as per recommendations [48,49]
PS4 | Met | Prevalence is statistically increased in affected individuals over controls [50]

Computational and predictive data

- Synonymous change; therefore, bioinformatic predictions based on amino acid changes are not applicable

Splice predictors

Prediction tool | Score reference | Score variant
MaxEntScan [41] | 4.28 | 4.94
NNSplice [42] | 0.57 | 0
SpliceAI [40] | | Δ score 0.89

- Bioinformatically predicted that the reference site is a weak donor
- Variant is predicted to reduce the strength of the donor site by 3/3 predictors.

Computational criterion PP3 is met (Box 18.15).


BOX 18.15 Summary of computational and predictive data

ACMG/AMP criteria | Met/not met | Evidence
BP1 | Not met | Not applicable for BRCA2
BP3 | Not met | Not applicable for synonymous variants
BP4 | Not met | Not applicable: PP3 applied
BP7 | Not met | Not applicable: synonymous variant but predicted splice impact
PP3 | Met | Variant is predicted to impact splicing
PM4 | Not met | Not applicable for synonymous variants
PM5 | Not met | Not applicable for synonymous or splicing variants
PS1 | Not met | See PM5
PVS1 | Not met | Not applicable

Functional data

Assay 1: Patient mRNA splicing assay

Experimental data from Colombo et al. [52]. Results extracted from Table 2.

HGVS nomenclature | mRNA change observed: description | mRNA change observed: HGVS nomenclature | Allelic expression of normal transcript(s) | Predicted protein change: description | Predicted protein change: HGVS nomenclature
c.9117G>A | Skipping of exon 23 | r.[8954_9117del] | Monoallelic | Stop at codon 2988 | p.Val2985GlyfsX4

- Patient mRNA splicing assay shows skipping of exon 23 [r.8954_9117del]
- This transcript is predicted to encode a premature termination codon [p.Val2985GlyfsX4]
- Assessed for allele-specific expression of transcripts produced
  • As stated in Table 2, full-length transcript is only produced by one allele
  • Therefore, it is assumed that the variant allele does not produce any full-length transcript
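The link between the exon 23 deletion listed in the table (r.[8954_9117del]) and the predicted protein change can be checked with simple coordinate arithmetic. The function name below is hypothetical, and the sketch uses only the coordinates and the "fsX4" notation given above; it does not translate any sequence.

```python
def deletion_consequence(first_nt, last_nt):
    """Return (deleted length, frameshift flag, first affected codon) for a coding deletion."""
    length = last_nt - first_nt + 1
    frameshift = length % 3 != 0          # a length not divisible by 3 shifts the reading frame
    first_codon = (first_nt - 1) // 3 + 1  # codon containing the first deleted nucleotide
    return length, frameshift, first_codon

length, frameshift, first_codon = deletion_consequence(8954, 9117)

# "fsX4" places the stop at the 4th codon, counting the first changed codon as 1.
stop_codon = first_codon + 4 - 1
print(length, frameshift, first_codon, stop_codon)
```

The deletion removes 164 nucleotides, which is not a multiple of three, so the frame shifts at codon 2985 and the new stop falls at codon 2988, in agreement with p.Val2985GlyfsX4 and "Stop at codon 2988".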

Assay 2: Construct-based assay
Experimental data from Acedo et al. [51]
- mRNA splicing assay identified a transcript with complete skipping of exon 23 [r.8954_9117del]
- Important aspects of this construct-based assay:
  • Only assesses variant allele
  • No other transcripts are present in electropherograms
  • This transcript is not present in the wild-type construct (Wt-MGBR2_ex19-27)
  • Construct includes multiple exons either side of exon 23

Functional criterion PS3 is met (Box 18.16).

BOX 18.16 Summary of functional data
ACMG/AMP criteria | Met/not met | Evidence
BS3 | Not met | Not applicable (PS3 applied)
PP2 | Not met | Not applicable
PM1 | Not met | Not applicable
PS3 | Met | Both patient mRNA assay and construct-based assay are in agreement, showing no full length transcript from the variant allele [51,52]. The variant allele is only shown to produce transcripts lacking exon 23, which is predicted to result in a premature termination codon and therefore undergo nonsense-mediated mRNA decay.

Segregation data
No segregation criteria are met (Box 18.17).

BOX 18.17 Summary of segregation data evidence
ACMG/AMP criteria | Met/not met | Evidence
BS4 | Not met | No data available
PP1 | Not met | No data available

De novo data
No de novo criteria are met (Box 18.18).

BOX 18.18 Summary of de novo data evidence
ACMG/AMP criteria | Met/not met | Evidence
PM6 | Not met | No data available
PS2 | Not met | No data available

Allelic data
No allelic criteria are met (Box 18.19).

BOX 18.19 Summary of allelic data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP2 | Not met | No data available
PM3 | Not met | No data available

Other database
No other database criteria are met (Box 18.20).

BOX 18.20 Summary of other database evidence
ACMG/AMP criteria | Met/not met | Evidence
BP6 | N/A | Not applicable as per current recommendations [45]
PP5 | N/A | Not applicable as per current recommendations [45]

Other data
No other data criteria are met (Box 18.21).

BOX 18.21 Summary of other data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP5 | Not met | No data available
PP4 | N/A | Not applicable for BRCA2

Other data not considered in ACMG/AMP classification
Multifactorial likelihood analysis has been developed for specific genes, including BRCA1 and BRCA2 [53–55], but is not yet considered in the ACMG/AMP guidelines [38].

Multifactorial data from Lindor et al. [56]. Results extracted from Table 6.

Gene | HGVS: DNA level | HGVS: protein level | Prior probability of being deleterious | Odds in favor of causality | Posterior probability of being deleterious | IARC class | Reference
BRCA2 | c.9117G>A | p.Pro3039Pro | 0.96 | 146 | 1 | 5 | Easton et al. [53]

- Odds in favor of causality from Easton et al. [53] are:
  • Family History Likelihood Ratio: 64.57
  • Co-occurrence Likelihood Ratio: 2.24

Multifactorial data support pathogenicity of the variant (Box 18.22).

BOX 18.22 Summary of non-ACMG/AMP evidence (multifactorial likelihood analysis)
- High prior probability (0.96) is due to bioinformatic prediction of donor abrogation
- Additional clinical evidence is from family history and co-occurrence data, and both are in favor of pathogenicity
- Final posterior probability = 1 (Pathogenic)
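The posterior probability in the multifactorial table follows from combining the prior with the odds in favor of causality. The sketch below illustrates that Bayesian step only; it is not the published model. The function name `posterior_probability` is hypothetical, and it assumes the likelihood ratios are independent and can simply be multiplied.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior probability with likelihood ratios (odds for causality).

    Assumption: the likelihood ratios are independent, so they multiply.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Values reported above for BRCA2 c.9117G>A
combined_lr = prod([64.57, 2.24])            # about 144.6; the table reports 146
posterior = posterior_probability(0.96, [146])
print(round(combined_lr, 1), round(posterior, 4))
```

With a prior of 0.96 (prior odds of 24) and odds of 146, the posterior is about 0.9997, which rounds to the value of 1 shown in the table. The product of the two listed likelihood ratios is slightly below the reported 146, presumably because the published components are rounded.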


Summary of evidence and final classification (Box 18.23)

BOX 18.23 Summary of evidence and final classification of BRCA2 c.9117G>A
Category | ACMG/AMP criteria applied [38] | Evidence
Population data | PS4, PM2 | Significant case–control odds ratio [50]; very low allele frequency in gnomAD v2.1.1 non-cancer as per recommendations [48,49]
Computational and predictive data | PP3 | Predicted to disrupt splice donor site
Functional data | PS3 | Patient mRNA [52] and construct-based [51] assays show complete exon 23 skipping
Segregation data | Not met | No data available
De novo data | Not met | No data available
Allelic data | Not met | No data available
Other database | N/A | Not applicable as per current recommendations [45]
Other data | Not met | No data available
Final classification (ACMG/AMP) | Pathogenic |
Other data not in ACMG/AMP | | Published multifactorial likelihood analysis supports the ACMG/AMP pathogenic classification

Biological and clinical interpretation
The patient is a germline heterozygous carrier of the pathogenic BRCA2 variant c.9117G>A. Heterozygous BRCA2 pathogenic variants are associated with an increased risk of breast and ovarian cancer.

ATM c.9007_9034del

Presentation of the case
Forty-six-year-old female patient diagnosed with carcinoma of the right breast at age 29 (ER, PR+, and HER2 status unknown) and then carcinoma of the left breast at age 47 (ER+, PR+, and HER2). She has two healthy sisters, but her mother was diagnosed with breast cancer at age 50. The patient reports that her maternal grandmother had leukemia at age 80 (unconfirmed) and her maternal cousin had breast cancer at age 65. A breast cancer panel was requested, which identified the germline heterozygous ATM (Box 18.24) variant c.9007_9034del (Box 18.25).

BOX 18.24 Gene information: ATM gene
Phenotypes | Ataxia–telangiectasia (A-T; OMIM #208900) and susceptibility to breast cancer (SBC; OMIM #114480)
Penetrance | Breast cancer: 6% by age 50 and 33% by age 80 in monoallelic carriers [86]. A-T: almost 100% at adulthood. Classic A-T has a childhood onset and rapid progression, but milder symptoms and progression are common in atypical A-T [87]
Gene–disease validity | Definitive (https://search.clinicalgenome.org/kb/gene-validity/9908)
Mode of inheritance | Autosomal recessive for A-T, autosomal dominant for SBC
Mechanism of disease | Loss-of-function

BOX 18.25 Variant information: ATM c.9007_9034del
Genomic location | chr11:g.108365344_108365371del (GRCh38); chr11:g.108236071_108236098del (GRCh37)
HGVS coding | NM_000051.3:c.9007_9034del
HGVS protein | NP_000042.3:p.(Asn3003Aspfs*6)
Exon | 63 (of 63)
Variant type | Frameshift deletion
Zygosity | Heterozygous

Pathogenicity assessment of the variant
Here we present the reasoning by which each criterion is fulfilled or not, grouped by the type of evidence found for this variant.

Population data
Allele frequency
Dataset: gnomAD v2.1.1 (non-cancer)

Population frequencies
- The variant does not appear in gnomAD. Although other variants have been found in the deleted region (see Fig. 18.5), no allele with c.9007_9034del has been found among more than 236,000 sequenced alleles.
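As an illustration of what absence from this many alleles implies, the sketch below computes a one-sided upper confidence bound on the allele frequency when zero carriers are observed. This calculation is not part of the ACMG/AMP criteria or of the case assessment; the function name is hypothetical, and it assumes alleles are sampled independently (a simple binomial model).

```python
def upper_bound_allele_frequency(n_alleles: int, confidence: float = 0.95) -> float:
    """Upper confidence bound on allele frequency when 0 of n_alleles carry the variant.

    Assumption: independent sampling of alleles (binomial model). Solves
    (1 - p) ** n_alleles = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_alleles)


bound = upper_bound_allele_frequency(236_000)
print(f"{bound:.2e}")
```

For 236,000 alleles the bound is close to the familiar "rule of three" approximation, 3/n, that is, roughly 1.3 in 100,000 alleles.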

FIGURE 18.5 ATM c.9007_9034del: Screenshot from a portion of the table of ATM variants, frequencies and sequenced alleles in the gnomAD v2.1.1 non-cancer dataset, showing the absence of variant c.9007_9034del. Website: https://gnomad.broadinstitute.org/gene/ENSG00000149311?dataset=gnomad_r2_1_non_cancer.

Coverage of exon
- Adequate coverage observed for exon (>20X), as seen in Fig. 18.6.
- There is limited guidance on what qualifies as “adequate coverage”; however, expert panel recommendations for RUNX1 state that the mean coverage of sequencing data for the gene should be a minimum of 20X (Luo et al. [88]: PMID:31648317).

Population criterion PM2 is met (Box 18.26).

FIGURE 18.6 ATM c.9007_9034del: Screenshot from coverage of exon in the gnomAD v2.1.1 non-cancer dataset. Website: https://gnomad.broadinstitute.org/region/11-108235990-108236290?dataset=gnomad_r2_1_non_cancer.

BOX 18.26 Summary of population data evidence
ACMG/AMP criteria | Met/not met | Evidence
BA1 | Not met | Not applicable (PM2 applied)
BS1 | Not met | Not applicable (PM2 applied)
BS2 | Not met | No data available
PM2 | Met | Absent from large population database (gnomAD v2.1.1 non-cancer)
PS4 | Not met | No case–control data available

Computational and predictive data
- The variant is a deletion of 28 nucleotides in the last exon of the gene. It is predicted to produce a frameshift but does not trigger nonsense-mediated mRNA decay (NMD).
- Truncation of the protein after residue 3003 deletes the FATC domain (aa 3024–3056). This domain is critical for ATM activation, as its interaction with the Tip60 histone acetyltransferase allows ATM acetylation and then autophosphorylation. Some missense variants in the FATC domain prevent ATM association with Tip60, ATM acetylation, ATM autophosphorylation, and cell survival after irradiation [57]. According to ClinGen’s recommendation, a frameshift variant not predicted to undergo NMD but altering a region critical to protein function computes as PVS1_Strong (PVS1 evidence type, but with a lower strength) [39].
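The reasoning above can be written as a small decision sketch. This is a deliberately simplified illustration, not the full ClinGen PVS1 decision tree [39]: the class and function names are hypothetical, NMD escape is approximated by the last-exon rule only, and the "PVS1" outcome for NMD-predicted variants is an assumption that ignores further checks such as transcript relevance.

```python
from dataclasses import dataclass


@dataclass
class TruncatingVariant:
    exon: int                      # exon in which the premature stop arises
    total_exons: int               # number of exons in the transcript
    alters_critical_region: bool   # e.g. loss of a functionally critical domain


def predicted_to_escape_nmd(v: TruncatingVariant) -> bool:
    # Simplification: only the "last exon" rule mentioned in the text is modelled.
    return v.exon == v.total_exons


def pvs1_strength(v: TruncatingVariant) -> str:
    if not predicted_to_escape_nmd(v):
        # Assumption: NMD-predicted truncations receive full-strength PVS1.
        return "PVS1"
    if v.alters_critical_region:
        # Branch described in the text: no NMD, but a critical region is altered.
        return "PVS1_Strong"
    # Remaining branches of the published decision tree are not modelled here.
    return "not resolved by this sketch"


atm_variant = TruncatingVariant(exon=63, total_exons=63, alters_critical_region=True)
print(pvs1_strength(atm_variant))
```

For ATM c.9007_9034del (exon 63 of 63, loss of the FATC domain), the sketch returns PVS1_Strong, matching the strength applied in this assessment.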

Splice predictors
- The natural splice acceptor site located 19 bp upstream of the variant (position c.8988) is correctly recognized by the predictors shown in the table below, and the scores are not reduced by the variant. Furthermore, the variant is not predicted to create a new splice site.

Prediction tool | Score reference | Score variant
MaxEntScan [41] | 10.46 | 10.46
NNSplice [42] | 0.87 | 0.88
SpliceAI [40] | | Δ score 0

Summary of evidence
- Frameshift deletion not predicted to trigger NMD but altering a critical region
- No change in splicing predicted

Computational criterion PVS1_Strong is met (Box 18.27).

BOX 18.27 Summary of computational and predictive data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP1 | Not met | Not applicable
BP3 | Not met | Not applicable
BP4 | Not met | Not applicable (PVS1 applied)
BP7 | Not met | Not applicable
PP3 | Not met | Not applicable (PVS1 applied)
PM4 | Not met | Not applicable
PM5 | Not met | Not applicable
PS1 | Not met | Not applicable
PVS1_Strong | Met | Frameshift variant not predicted to undergo NMD altering a region critical to protein function

Functional data
Assays article 1
Carranza et al. [58]. Variant found in five A-T patients (four in a homozygous and one in a heterozygous state together with the ATM variant c.9007A>G, p.(Asn3003Asp), all sharing a haplotype). T-cell lines from the five patients showed null or trace amounts of ATM protein as assessed by Western blot. Ser1981 ATM autophosphorylation was assayed in T-cell lines from one homozygous patient and showed no activity. T-cell lines from two homozygous patients showed less than 21% survival after irradiation with 1 Gy (radiosensitive), and the T-cells from one homozygous and the heterozygous patient showed survival fractions between 21% and 35% (intermediate radiosensitivity). Negative controls used for these assays were cells from healthy individuals, and positive controls included cells from seven patients with biallelic NMD-triggering truncating variants. As these assays directly assess ATM function in A-T patient cells, there is a slight possibility that an undetected variant in cis with c.9007_9034del is responsible for the phenotype. Observing the variant in several independent patients would decrease this probability, but this is not the case for the patients in this article, since they share a common haplotype.

Assays article 2
Fievet et al. [59]. Variant found in a homozygous classic A-T patient. A lymphoblastoid cell line (LCL) derived from this patient underwent a series of studies to assess ATM function, together with many other patient and control LCLs: (1) Western blot of total ATM protein showed a reduced expression of the variant ATM (a value lower than the authors' cutoff of three standard deviations (3SD) below the mean from healthy controls); (2) Western blot of nuclear and cytoplasm extracts showed that less than 5% of ATM was localized in the nucleus (lower than the 3SD cutoff); (3) Phosphorylation assay of CHK2 and KAP1 after treatment with the Topoisomerase 1 inhibitor camptothecin (CPT), which induces double-strand breaks (DSB) during S-phase, indicated reduced phosphorylation of both substrates below the 3SD cutoff; (4) ATM-dependent KAP1 phosphorylation, defined as the p-KAP1 that disappears after exposure to an ATM-inhibitor, was almost completely absent; (5) G2-M phase accumulation after CPT treatment, a feature of ATM-deficient cells, was higher than the +3SD cutoff.

All assays were performed with biological replicates in cells from 36 A-T patients and four healthy control individuals. Variants tested represented many variant types, with different amounts of protein and ATM activities, from classic and atypical A-T patients. The authors calculate the sensitivity of each assay according to its ability to separate A-T LCL values from healthy control values, using 3SD from the mean of healthy control values as the cutoff for all assays. They find sensitivities of 0.81, 0.91, 0.69, 1, 0.94, and 0.95 for ATM protein expression, ATM nuclear fraction, CHK2 phosphorylation, KAP1 phosphorylation, G2-M accumulation at 24 h, and G2-M accumulation at 48 h, respectively. They do not use a set of variants that have been classified as pathogenic according to the ACMG/AMP recommendations (excluding functional criteria) as positive controls. However, the variant can be regarded as functionally abnormal, as this was consistently shown across four assays with a sensitivity >90% that included a large number of NMD-triggering truncating variants.

Summary of functional evidence
- Several assays on A-T patient cell lines from two different laboratories show low ATM protein expression and abnormal function. The assays had replicates, wild-type controls, and several NMD-triggering truncating variant carrier cells that could be regarded as abnormal-function controls.

Functional criterion PS3 is met (Box 18.28).
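The 3SD cutoff and the per-assay sensitivity described for the second article can be sketched as follows. All measurement values below are hypothetical and chosen only for illustration; the function names are also hypothetical, and the use of the sample standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, stdev


def lower_cutoff(control_values: list[float], n_sd: float = 3.0) -> float:
    """Cutoff placed n_sd sample standard deviations below the control mean."""
    return mean(control_values) - n_sd * stdev(control_values)


def sensitivity(patient_values: list[float], cutoff: float) -> float:
    """Fraction of patient cell lines whose value falls below the cutoff."""
    abnormal = sum(1 for value in patient_values if value < cutoff)
    return abnormal / len(patient_values)


# Hypothetical readouts (% of reference signal); not taken from the cited study.
healthy_controls = [100.0, 95.0, 105.0, 98.0]
patient_lines = [5.0, 12.0, 40.0, 90.0]

cutoff = lower_cutoff(healthy_controls)
print(round(cutoff, 2), sensitivity(patient_lines, cutoff))
```

For assays in which abnormal cells give higher values than controls, such as G2-M accumulation, the same idea applies with the cutoff placed above the control mean and the comparison reversed.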

BOX 18.28 Summary of functional data evidence
ACMG/AMP criteria | Met/not met | Evidence
BS3 | Not met | Not applicable (PS3 applied)
PP2 | Not met | Not applicable
PM1 | Not met | Not applicable
PS3 | Met | Several assays show low ATM protein expression and abnormal function [58,59]

Segregation data
No segregation criteria are met (Box 18.29).

BOX 18.29 Summary of segregation data evidence
ACMG/AMP criteria | Met/not met | Evidence
BS4 | Not met | No data available
PP1 | Not met | No data available

De novo data
No de novo criteria are met (Box 18.30).

BOX 18.30 Summary of de novo data evidence
ACMG/AMP criteria | Met/not met | Evidence
PM6 | Not met | No data available
PS2 | Not met | No data available

Allelic data
The variant was initially described in an A-T patient [60] homozygous for the variant. In Carranza et al. [58], the variant was found in five A-T patients (four in a homozygous and one in a heterozygous state together with the ATM VUS c.9007A>G, p.(Asn3003Asp)). The nine alleles carrying the variant shared a common haplotype. Finally, in Fievet et al. [59], a seventh A-T patient was described, also a homozygous carrier of the variant (haplotype unknown). To score these data for PM3 according to the latest ClinGen recommendations (ClinGen Sequence Variant Interpretation Recommendation for in trans Criterion (PM3) Version 1.0 May 2, 2019; https://clinicalgenome.org/site/assets/files/3717/svi_proposal_for_pm3_criterion_-_version_1.pdf), 0.5 points can be added for each homozygous carrier A-T patient, to a maximum of 1.0 (see Fig. 18.7, Table 1). The homozygotes from Carranza et al. [58] should be counted as one case as they all share a common haplotype, but there are also two homozygotes from the García-Pérez [60] and Fievet [59] papers. The heterozygote described in Carranza et al. [58] can be considered as confirmed in trans, since variants c.9007_9034del and c.9007A>G overlap (they cannot be on the same allele). As c.9007A>G is classified as VUS, c.9007_9034del will score 0.25 points according to ClinGen recommendations.

FIGURE 18.7 ATM c.9007_9034del: Screenshot of 2 tables of the ClinGen recommendations document. Website: https://clinicalgenome.org/site/assets/files/3717/svi_proposal_for_pm3_criterion_-_version_1.pdf.

ATM c.9007A>G was classified using the following information:
• Not present in the gnomAD v2.1.1 non-cancer dataset (PM2)
• Missense variant where in silico protein algorithms predict the variant as deleterious (PP3)
• Functional assays published in Carranza et al. [58] with A

MLH1 c.2041G>A

Presentation of the case
Forty-three-year-old patient recently diagnosed with a right-sided colon carcinoma. The tumor showed loss of MLH1/PMS2 protein expression in the absence of MLH1 promoter methylation. His mother died of an abdominal cancer at age 31. Variant analysis of the MLH1 gene (Box 18.35) was requested and identified the germline heterozygous variant c.2041G>A (Box 18.36).

BOX 18.35 Gene information: MLH1 gene
Phenotypes | Lynch syndrome (OMIM #120435) and constitutional mismatch repair deficiency (CMMRD, OMIM #276300)
Penetrance | Cumulative incidence by age 75 in monoallelic carriers [61]:
  - Any cancer: 81.0% and 71.4% for females and males, respectively
  - Colorectal cancer: 48.3% and 57.1% for females and males, respectively
  - Endometrial cancer: 37.0%
  Biallelic condition: Development of hematological, brain, and colorectal tumors during childhood and adolescence. Penetrance is extremely high, reaching more than 90% at age 20 [62].
Gene–disease validity | Definitive (https://search.clinicalgenome.org/kb/genes/HGNC:7127)
Mode of inheritance | Autosomal dominant (Lynch syndrome) and recessive (CMMRD)
Mechanism of disease | Loss-of-function

BOX 18.36 Variant information: MLH1 c.2041G>A p.(Ala681Thr)
Genomic location | chr3:37048955G>A (GRCh38); chr3:37090446G>A (GRCh37)
HGVS coding | NM_000249.3:c.2041G>A
HGVS protein | NP_000240.1:p.(Ala681Thr)
Exon | 18 (of 19)
Variant type | Missense
Zygosity | Heterozygous

Pathogenicity assessment of the variant
Here we present the reasoning by which each criterion is fulfilled or not, grouped by the type of evidence found for this variant.

Population data
Allele frequency
The MLH1 variant c.2041G>A is very rare in reference populations. According to the gnomAD v2.1.1 (non-cancer) dataset:
• Global frequency: 0.0004225% (1/236,712)
• Max frequency: 0.005657% in East Asian population (1/17,676) (Fig. 18.8)

FIGURE 18.8 MLH1 c.2041G>A: Screenshot from gnomAD v2.1.1 non-cancer dataset population allele frequencies. Website: https://gnomad.broadinstitute.org/variant/3-37090448-T-A?dataset=gnomad_r2_1_non_cancer.

Coverage of exon
- Adequate coverage observed for exon (>20X)
- There is limited guidance on what qualifies as “adequate coverage”; however, expert panel recommendations for RUNX1 state that the mean coverage of sequencing data for the gene should be a minimum of 20X (Luo et al. [88]: PMID:31648317).

Summary of evidence
- Extremely low allele frequency in the gnomAD v2.1.1 non-cancer dataset

Population criterion PM2 is met (Box 18.37).

BOX 18.37 Summary of population data evidence
ACMG/AMP criteria | Met/not met | Evidence
BA1 | Not met | Not applicable (PM2 applied)
BS1 | Not met | Not applicable (PM2 applied)
BS2 | Not met | No data available
PM2 | Met | Extremely low allele frequency from large population database (gnomAD v2.1.1 non-cancer dataset)
PS4 | Not met | No case–control data available

Computational and predictive data
Splice predictors
- SpliceAI [40], MaxEntScan [41], and NNSplice [42] do not predict any splice sites at the variant position.

Protein predictors
- The MLH1 variant c.2041G>A is predicted to encode the missense variant p.(Ala681Thr). The alanine 681 residue is weakly conserved and there is a small physicochemical difference between alanine and threonine. The variant is located within the PMS2 interaction domain of the MLH1 protein. Several bioinformatic predictors (GERP [63], FATHMM [64], LRT [65], MetaLR [66], MetaSVM [66], MutationTaster [67]) predict a damaging effect for variant p.(Ala681Thr) (vs. benign predictions from PROVEAN [68] and Align-GVGD [69]).

Summary of predictive evidence
- Missense variant predicted to impair protein function
- No change in splicing predicted

Computational criterion PP3 is met (Box 18.38).

BOX 18.38 Summary of computational and predictive data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP1 | Not met | Not applicable
BP3 | Not met | Not applicable
BP4 | Not met | Not applicable (PP3 applied)
BP7 | Not met | Not applicable
PP3 | Met | Missense variant predicted to affect protein function
PM4 | Not met | Not applicable
PM5 | Not met | Not applicable
PS1 | Not met | Not applicable
PVS1 | Not met | Not applicable

Functional data
MLH1 c.2041G>A (p.Ala681Thr) has been functionally evaluated in multiple studies, most of them assessing the MMR capacity and protein expression (see the table below). Interestingly, whereas MMR activity has been reported as intermediate to normal, most of the studies reported decreased protein expression of the variant.

Functional assessment
MLH1 variant | Reference | MMR activity (% normalized to WT) | MLH1 expression (% normalized to WT)
c.2041G>A (p.Ala681Thr) | Takahashi [70] | 69.8% | 75%
c.2041G>A (p.Ala681Thr) | Raevaara [71] | 115% | Slightly decreased
c.2041G>A (p.Ala681Thr) | Hinrichsen [72] | 99% | 51%
c.2041G>A (p.Ala681Thr) | Hardt [73] | NA | 41%
c.2041G>A (p.Ala681Thr) | Drost [74] | 73.6% | NA
c.2041G>A (p.Ala681Thr) | González-Acosta [75] | 54.9% | 33.3%

Summary of functional evidence
- The obtained results suggest that MLH1 c.2041G>A is probably associated with decreased protein expression, but not with loss of MMR capacity. Since the experimental models used are based on overexpression of the variant by transient transfection in heterologous systems, they are not optimal to assess expression defects. Therefore, functional results have not been taken into account for the classification of this variant.
- The apparently conflicting results could be partially explained by the effect of the variant on protein stability and the different experimental approaches used [75].

No functional criteria are met (Box 18.39).

BOX 18.39 Summary of functional data evidence
ACMG/AMP criteria | Met/not met | Evidence
BS3 | Not met | Apparently conflicting results
PP2 | Not met | Not applicable
PM1 | Not met | Not applicable
PS3 | Not met | Apparently conflicting results

Segregation data
MLH1 c.2041G>A has been reported in several individuals affected with colorectal cancer or LS-related tumors [76–81]. In two of these families, the variant was reported to segregate with disease in a total of 16 affected family members ([78,79], unpublished data). Since the number of cosegregating meioses is more than seven, PP1_Strong evidence has been considered according to current recommendations [82].

Summary of segregation evidence
- Cosegregation of the variant in 16 affected family members from two families has been reported

Segregation criterion PP1_Strong is met (Box 18.40).

BOX 18.40 Summary of segregation data evidence
ACMG/AMP criteria | Met/not met | Evidence
BS4 | Not met | Not applicable (PP1 applied)
PP1_Strong | Met | Cosegregation in 16 affected members from two families

De novo data
No de novo criteria are met (Box 18.41).

BOX 18.41 Summary of de novo data evidence
ACMG/AMP criteria | Met/not met | Evidence
PM6 | Not met | No data available
PS2 | Not met | No data available

Allelic data
No allelic criteria are met (Box 18.42).

BOX 18.42 Summary of allelic data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP2 | Not met | No data available
PM3 | Not met | No data available

Other database
No other database criteria are met (Box 18.43).

BOX 18.43 Summary of other database evidence
ACMG/AMP criteria | Met/not met | Evidence
BP6 | N/A | Not applicable as per current recommendations [45]
PP5 | N/A | Not applicable as per current recommendations [45]

Other data
Tumor phenotypic features such as MSI and loss of MMR protein expression are considered supporting evidence of pathogenicity for MMR variants [83]. For MLH1 c.2041G>A, tumors from more than 20 carriers have shown high MSI and/or absence of MLH1 protein expression by immunohistochemistry (https://www.insight-database.org/classifications/index.html?gene=MLH1&variant=c.2041G%3EA&protein=). In most of the carriers, the presence of MLH1 promoter methylation has been tested and discarded.

Other data criterion PP4_Strong is met (Box 18.44).

BOX 18.44 Summary of other data evidence
ACMG/AMP criteria | Met/not met | Evidence
BP5 | Not met | No data available
PP4_Strong | Met | More than 20 tumors showing high microsatellite instability and loss of expression of MLH1 protein

Other data not considered in ACMG/AMP classification
Multifactorial likelihood analysis has been developed for specific genes, including MLH1 [84,85]. It can be found in the table below, from the website: https://www.insight-database.org/classifications/mmr_integrative_eval.html?gene=MLH1&variant=c.2041G%3EA.

Gene | HGVS: DNA level | Prior probability (used) | Total MSI/IHC odds in favor of causality | Total segregation odds in favor of causality | Odds in favor of causality | Posterior odds | Posterior probability of being deleterious | IARC class
MLH1 | c.2041G>A | 0.027 (0.1) | 700.72 | 1.50 | 1051.09 | 116.79 | 0.992 | 5

Summary of non-ACMG/AMP evidence (multifactorial likelihood analysis)
- Molecular evidence from tumor data and segregation data are in favor of pathogenicity
- Final posterior probability = 0.992 (Pathogenic)
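The figures in the multifactorial table can be retraced step by step. The worked calculation below is a sketch that assumes the prior actually used is 0.1 (the value in parentheses) and that the two odds components are multiplied; small differences from the tabulated values are presumably due to rounding of the published components.

```latex
\begin{align*}
\text{Odds in favor of causality} &= 700.72 \times 1.50 \approx 1051.1 \\
\text{Prior odds} &= \frac{0.1}{1 - 0.1} \approx 0.1111 \\
\text{Posterior odds} &= 1051.09 \times 0.1111 \approx 116.79 \\
\text{Posterior probability} &= \frac{116.79}{1 + 116.79} \approx 0.992
\end{align*}
```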


Summary of evidence and final classification (Box 18.46)

BOX 18.46 Summary of evidence and final classification of MLH1 c.2041G>A
Category | ACMG/AMP criteria applied [38] | Evidence
Population data | PM2 | Very rare in population databases
Computational and predictive data | PP3 | Missense variant predicted to affect protein function
Functional data | Not met | Apparently conflicting results
Segregation data | PP1_Strong | Cosegregation in 16 affected members from two families
De novo data | Not met | No data available
Allelic data | Not met | No data available
Other database | N/A | Not applicable as per current recommendations [45]
Other data | PP4_Strong | More than 20 tumors showing high microsatellite instability and loss of expression of MLH1 protein
Final classification (ACMG/AMP) | Pathogenic |
Other data not in ACMG/AMP | | Published InSiGHT multifactorial likelihood analysis supports the ACMG/AMP pathogenic classification

Biological and clinical interpretation
The analysis performed identifies the pathogenic MLH1 variant c.2041G>A as responsible for LS in the patient.

References
[1] Moynahan ME, Chiu JW, Koller BH, Jasin M. BRCA1 controls homology-directed DNA repair. Mol Cell 1999;4(4):511–8.
[2] Moynahan ME, Pierce AJ, Jasin M. BRCA2 is required for homology-directed repair of chromosomal breaks. Mol Cell 2001;7(2):263–72.
[3] Kuchenbaecker KB, Hopper JL, Barnes DR, et al. Risks of breast, ovarian, and contralateral breast cancer for BRCA1 and BRCA2 mutation carriers. J Am Med Assoc 2017;317(23):2402–16.
[4] Antoniou AC, Cunningham AP, Peto J, et al. The BOADICEA model of genetic susceptibility to breast and ovarian cancers: updates and extensions. Br J Canc 2008;98(8):1457–66.
[5] Mocci E, Milne RL, Mendez-Villamil EY, et al. Risk of pancreatic cancer in breast cancer families from the breast cancer family registry. Canc Epidemiol Biomarkers Prev 2013;22(5):803–11.

[6] Howlett NG, Taniguchi T, Olson S, et al. Biallelic inactivation of BRCA2 in Fanconi anemia. Science 2002;297(5581):606–9.
[7] Domchek SM, Tang J, Stopfer J, et al. Biallelic deleterious BRCA1 mutations in a woman with early-onset ovarian cancer. Canc Discov 2013;3(4):399–405.
[8] Freire BL, Homma TK, Funari MFA, et al. Homozygous loss of function BRCA1 variant causing a Fanconi anemia-like phenotype, a clinical report and review of previous patients. Eur J Med Genet 2018;61(3):130–3.
[9] Seo A, Steinberg-Shemer O, Unal S, et al. Mechanism for survival of homozygous nonsense mutations in the tumor suppressor gene BRCA1. Proc Natl Acad Sci USA 2018;115(20):5241–6.
[10] Keupp K, Hampp S, Hubbel A, et al. Biallelic germline BRCA1 mutations in a patient with early onset breast cancer, mild Fanconi anemia-like phenotype, and no chromosome fragility. Mol Genet Genom Med 2019;7(9):e863.
[11] Apostolou P, Fostira F. Hereditary breast cancer: the era of new susceptibility genes. BioMed Res Int 2013;2013:747318.
[12] Renwick A, Thompson D, Seal S, et al. ATM mutations that cause ataxia-telangiectasia are breast cancer susceptibility alleles. Nat Genet 2006;38(8):873–5.
[13] Pritchard CC, Offit K, Nelson PS. DNA-repair gene mutations in metastatic prostate cancer. N Engl J Med 2016;375(18):1804–5.
[14] Roberts NJ, Jiao Y, Yu J, et al. ATM mutations in patients with hereditary pancreatic cancer. Canc Discov 2012;2(1):41–6.
[15] Thompson D, Duedal S, Kirner J, et al. Cancer risks and mortality in heterozygous ATM mutation carriers. J Natl Cancer Inst 2005;97(11):813–22.
[16] Peterson RD, Funkhouser JD, Tuck-Muller CM, Gatti RA. Cancer susceptibility in ataxia-telangiectasia. Leukemia 1992;6(Suppl. 1):8–13.
[17] Khalil HS, Tummala H, Zhelev N. ATM in focus: a damage sensor and cancer target. BioDiscovery 2012;5.
[18] Schon K, van Os NJH, Oscroft N, et al. Genotype, extrapyramidal features, and severity of variant ataxia-telangiectasia. Ann Neurol 2019;85(2):170–80.
[19] Kurz EU, Lees-Miller SP. DNA damage-induced activation of ATM and ATM-dependent signaling pathways. DNA Repair 2004;3(8–9):889–900.
[20] Lee JH, Paull TT. Activation and regulation of ATM kinase activity in response to DNA double-strand breaks. Oncogene 2007;26(56):7741–8.
[21] Beamish H, Lavin MF. Radiosensitivity in ataxia-telangiectasia: anomalies in radiation-induced cell cycle delay. Int J Radiat Biol 1994;65(2):175–84.
[22] Bakkenist CJ, Kastan MB. DNA damage activates ATM through intermolecular autophosphorylation and dimer dissociation. Nature 2003;421(6922):499–506.
[23] Shiloh Y, Ziv Y. The ATM protein kinase: regulating the cellular response to genotoxic stress, and more. Nat Rev Mol Cell Biol 2013;14(4):197–210.
[24] Lavin MF. Ataxia-telangiectasia: from a rare disorder to a paradigm for cell signalling and cancer. Nat Rev Mol Cell Biol 2008;9(10):759–69.
[25] Baretic D, Pollard HK, Fisher DI, et al. Structures of closed and open conformations of dimeric human ATM. Sci Adv 2017;3(5):e1700933.
[26] Lynch HT, Snyder CL, Shaw TG, Heinen CD, Hitchins MP. Milestones of Lynch syndrome: 1895–2015. Nat Rev Canc 2015;15(3):181–94.
[27] Hitchins MP. Constitutional epimutation as a mechanism for cancer causality and heritability? Nat Rev Canc 2015;15(10):625–34.

[28] Wimmer K, Kratz CP, Vasen HF, et al. Diagnostic criteria for constitutional mismatch repair deficiency syndrome: suggestions of the European consortium 'care for CMMRD' (C4CMMRD). J Med Genet 2014;51(6):355–65.
[29] Suerink M, Ripperger T, Messiaen L, et al. Constitutional mismatch repair deficiency as a differential diagnosis of neurofibromatosis type 1: consensus guidelines for testing a child without malignancy. J Med Genet 2019;56(2):53–62.
[30] Gallon R, Muhlegger B, Wenzel SS, et al. A sensitive and scalable microsatellite instability assay to diagnose constitutional mismatch repair deficiency by sequencing of peripheral blood leukocytes. Hum Mutat 2019;40(5):649–55.
[31] Gonzalez-Acosta M, Marin F, Puliafito B, et al. High-sensitivity microsatellite instability assessment for the detection of mismatch repair defects in normal tissue of biallelic germline mismatch repair mutation carriers. J Med Genet 2020;57(4):269–73.
[32] Vasen HF, Blanco I, Aktan-Collan K, et al. Revised guidelines for the clinical management of Lynch syndrome (HNPCC): recommendations by a group of European experts. Gut 2013;62(6):812–23.
[33] Vasen HF, Ghorbanoghli Z, Bourdeaut F, et al. Guidelines for surveillance of individuals with constitutional mismatch repair-deficiency proposed by the European Consortium "Care for CMMR-D" (C4CMMR-D). J Med Genet 2014;51(5):283–93.
[34] Thompson BA, Spurdle AB, Plazzer JP, et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet 2014;46(2):107–15.
[35] Susswein LR, Marshall ML, Nusbaum R, et al. Pathogenic and likely pathogenic variant prevalence among the first 10,000 patients referred for next-generation cancer panel testing. Genet Med 2016;18(8):823–32.
[36] Reyes GX, Schmidt TT, Kolodner RD, Hombauer H. New insights into the mechanism of DNA mismatch repair. Chromosoma 2015;124(4):443–62.
[37] Jiricny J. Postreplicative mismatch repair. Cold Spring Harb Perspect Biol 2013;5(4):a012633.
[38] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24.
[39] Abou Tayoun AN, Pesaran T, DiStefano MT, et al. Recommendations for interpreting the loss of function PVS1 ACMG/AMP variant criterion. Hum Mutat 2018;39(11):1517–24.
[40] Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, et al. Predicting splicing from primary sequence with deep learning. Cell 2019;176(3):535–548.e24.
[41] Yeo G, Burge CB. Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 2004;11(2–3):377–94.
[42] Reese MG, Eeckman FH, Kulp D, Haussler D. Improved splice site detection in Genie. J Comput Biol 1997;4(3):311–23.
[43] Mesman RLS, Calleja F, Hendriks G, et al. The functional impact of variants of uncertain significance in BRCA2. Genet Med 2019;21(2):293–302.
[44] Wu K, Hinson SR, Ohashi A, et al. Functional evaluation and cancer risk assessment of BRCA2 unclassified variants. Canc Res 2005;65(2):417–26.
[45] Biesecker LG, Harrison SM, ClinGen Sequence Variant Interpretation Working Group. The ACMG/AMP reputable source criteria for the interpretation of sequence variants. Genet Med 2018;20(12):1687–8.
[46] Meeks HD, Song H, Michailidou K, et al. BRCA2 polymorphic stop codon K3326X and the risk of breast, prostate, and ovarian cancers. J Natl Cancer Inst 2016;108(2).
[47] Spurdle AB, Greville-Heygate S, Antoniou AC, et al. Towards controlled terminology for reporting germline cancer susceptibility variants: an ENIGMA report. J Med Genet 2019;56(6):347–57.

386

Chapter 18 Classification of genetic variants in hereditary

[48] Lee K, Krempely K, Roberts ME, et al. Specifications of the ACMG/AMP variant curation guidelines for the analysis of germline CDH1 sequence variants. Hum Mutat 2018;39(11):1553e68. [49] Mester JL, Ghosh R, Pesaran T, et al. Gene-specific criteria for PTEN variant curation: recommendations from the ClinGen PTEN expert panel. Hum Mutat 2018;39(11):1581e92. [50] Momozawa Y, Iwasaki Y, Parsons MT, et al. Germline pathogenic variants of 11 breast cancer genes in 7,051 Japanese patients and 11,241 controls. Nat Commun 2018;9(1):4083. [51] Acedo A, Hernandez-Moro C, Curiel-Garcia A, Diez-Gomez B, Velasco EA. Functional classification of BRCA2 DNA variants by splicing assays in a large minigene with 9 exons. Hum Mutat 2015;36(2):210e21. [52] Colombo M, De Vecchi G, Caleca L, et al. Comparative in vitro and in silico analyses of variants in splicing regions of BRCA1 and BRCA2 genes and characterization of novel pathogenic mutations. PLoS One 2013; 8(2):e57173. [53] Easton DF, Deffenbaugh AM, Pruss D, et al. A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am J Hum Genet 2007;81(5):873e83. [54] Goldgar DE, Easton DF, Deffenbaugh AM, et al. Integrated evaluation of DNA sequence variants of unknown clinical significance: application to BRCA1 and BRCA2. Am J Hum Genet 2004;75(4):535e44. [55] Thompson D, Easton DF, Goldgar DE. A full-likelihood method for the evaluation of causality of sequence variants from family data. Am J Hum Genet 2003;73(3):652e5. [56] Lindor NM, Guidugli L, Wang X, et al. A review of a multifactorial probability-based model for classification of BRCA1 and BRCA2 variants of uncertain significance (VUS). Hum Mutat 2012;33(1):8e21. [57] Sun Y, Jiang X, Chen S, Fernandes N, Price BD. A role for the Tip60 histone acetyltransferase in the acetylation and activation of ATM. Proc Natl Acad Sci USA 2005;102(37):13182e7. 
[58] Carranza D, Vega AK, Torres-Rusillo S, et al. Molecular and functional characterization of a cohort of Spanish patients with ataxia-telangiectasia. Neuromol Med 2017;19(1):161e74. [59] Fievet A, Bellanger D, Rieunier G, et al. Functional classification of ATM variants in ataxia-telangiectasia patients. Hum Mutat 2019;40(10):1713e30. [60] Garcia-Perez MA, Allende LM, Corell A, et al. Novel mutations and defective protein kinase C activation of T-lymphocytes in ataxia telangiectasia. Clin Exp Immunol 2001;123(3):472e80. [61] Dominguez-Valentin M, Sampson JR, Seppala TT, et al. Cancer risks by gene, age, and gender in 6350 carriers of pathogenic mismatch repair variants: findings from the Prospective Lynch Syndrome Database. Genet Med 2020;22(1):15e25. [62] Lavoine N, Colas C, Muleris M, et al. Constitutional mismatch repair deficiency syndrome: clinical description in a French cohort. J Med Genet 2015;52(11):770e8. [63] Cooper GM, Stone EA, Asimenos G, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res 2005;15(7):901e13. [64] Shihab HA, Gough J, Cooper DN, et al. Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models. Hum Mutat 2013;34(1):57e65. [65] Chun S, Fay JC. Identification of deleterious mutations within three human genomes. Genome Res 2009; 19(9):1553e61. [66] Dong C, Wei P, Jian X, et al. Comparison and integration of deleteriousness prediction methods for nonsynonymous SNVs in whole exome sequencing studies. Hum Mol Genet 2015;24(8):2125e37. [67] Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deepsequencing age. Nat Methods 2014;11(4):361e2. [68] Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS One 2012;7(10):e46688.

References

387

[69] Tavtigian SV, Deffenbaugh AM, Yin L, et al. Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J Med Genet 2006;43(4): 295e305. [70] Takahashi M, Shimodaira H, Andreutti-Zaugg C, Iggo R, Kolodner RD, Ishioka C. Functional analysis of human MLH1 variants using yeast and in vitro mismatch repair assays. Canc Res 2007;67(10):4595e604. [71] Raevaara TE, Korhonen MK, Lohi H, et al. Functional significance and clinical phenotype of nontruncating mismatch repair variants of MLH1. Gastroenterology 2005;129(2):537e49. [72] Hinrichsen I, Schafer D, Langer D, et al. Functional testing strategy for coding genetic variants of unclear significance in MLH1 in Lynch syndrome diagnosis. Carcinogenesis 2015;36(2):202e11. [73] Hardt K, Heick SB, Betz B, et al. Missense variants in hMLH1 identified in patients from the German HNPCC consortium and functional studies. Fam Cancer 2011;10(2):273e84. [74] Drost M, Tiersma Y, Thompson BA, et al. A functional assay-based procedure to classify mismatch repair gene variants in Lynch syndrome. Genet Med 2019;21(7):1486e96. [75] Gonzalez-Acosta M, Hinrichsen I, Fernandez A, et al. Validation of an in vitro mismatch repair assay used in the functional characterization of mismatch repair variants. J Mol Diagn 2019;22(3):376e85. [76] Rodriguez-Soler M, Perez-Carbonell L, Guarinos C, et al. Risk of cancer in cases of suspected lynch syndrome without germline mutation. Gastroenterology 2013;144(5):926e32. e921; quiz e913e924. [77] Kurzawski G, Suchy J, Kladny J, et al. Germline MSH2 and MLH1 mutational spectrum in HNPCC families from Poland and the Baltic States. J Med Genet 2002;39(10):E65. [78] Barnetson RA, Cartwright N, van Vliet A, et al. Classification of ambiguous mutations in DNA mismatch repair genes identified in a population-based study of colorectal cancer. Hum Mutat 2008;29(3):367e74. [79] Froggatt NJ, Brassett C, Koch DJ, et al. 
Mutation screening of MSH2 and MLH1 mRNA in hereditary nonpolyposis colon cancer syndrome. J Med Genet 1996;33(9):726e30. [80] Bonadona V, Bonaiti B, Olschwang S, et al. Cancer risks associated with germline mutations in MLH1, MSH2, and MSH6 genes in Lynch syndrome. J Am Med Assoc 2011;305(22):2304e10. [81] Borras E, Pineda M, Brieger A, et al. Comprehensive functional assessment of MLH1 variants of unknown significance. Hum Mutat 2012;33(11):1576e88. [82] Jarvik GP, Browning BL. Consideration of cosegregation in the pathogenicity classification of genomic variants. Am J Hum Genet 2016;98(6):1077e81. [83] Walsh MF, Ritter DI, Kesserwan C, et al. Integrating somatic variant data and biomarkers for germline variant classification in cancer predisposition genes. Hum Mutat 2018;39(11):1542e52. [84] Thompson BA, Greenblatt MS, Vallee MP, et al. Calibration of multiple in silico tools for predicting pathogenicity of mismatch repair gene missense substitutions. Hum Mutat 2013;34(1):255e65. [85] Li S, Qian D, Thompson BA, et al. Tumour characteristics provide evidence for germline mismatch repair missense variant pathogenicity. J Med Genet 2020;57(1):62e9. [86] Marabelli M, Cheng SC, Parmigiani G. Penetrance of ATM Gene mutations in breast cancer: a meta-analysis of different measures of risk. Genet Epidemiol 2016;40(5):425e31. [87] Taylor AM, Lam Z, Last JI, Byrd PJ. Ataxia telangiectasia: more variation at clinical and cellular levels. Clin Genet 2015;87(3):199e208. [88] Luo X, Feurstein S, Mohan S, et al. ClinGen myeloid malignancy variant curation expert panel recommendations for germline RUNX1 variants. Blood Adv. 2019;20(3):2962e79.

CHAPTER

19

RASopathies

Lisa M. Vincent1, Karen W. Gripp2, Heather Mason-Suares3

1Division of Pathology & Laboratory Medicine, Children’s National Health System, Washington, DC, United States; 2Division of Medical Genetics, A. I. duPont Hospital for Children, Wilmington, DE, United States; 3Partners Healthcare, Laboratory for Molecular Medicine, Cambridge, MA, United States

Introduction

The RASopathies are a group of phenotypically related developmental disorders including Noonan syndrome (NS; MIM:163950), CBL syndrome (MIM:613563), cardio-facio-cutaneous syndrome (CFC; MIM:115150), Costello syndrome (MIM:218040), Noonan syndrome with multiple lentigines (NSML; MIM:151100), Noonan syndrome-like disorder with loose anagen hair (MIM:607721), neurofibromatosis type 1 (NF1; MIM:162200), and Legius syndrome (LS; MIM:611431). Together, these typically autosomal dominant disorders have a cumulative incidence of about 1:1000, with NS being the most common [1]. Clinically, the RASopathies often present with dysmorphic craniofacial anomalies, cardiac abnormalities, poor growth, short stature, and musculoskeletal abnormalities. Other variable features include neurodevelopmental deficits, ectodermal anomalies, endocrine/metabolic imbalances, and tumor predisposition [2–5]. While generally considered fully penetrant, these disorders exhibit variable expressivity and severity even among individuals and families with the same pathogenic variant [1,5,6]. However, strong clinical associations still exist between each specific syndrome and its corresponding gene(s) and/or pathogenic variants within a gene.

The RASopathies are caused by pathogenic germline variants in genes encoding key signaling proteins that belong to or regulate the Ras/mitogen-activated protein kinase (Ras/MAPK) pathway. Pathogenic variants in BRAF, CBL, HRAS, KRAS, NRAS, MAP2K1, MAP2K2, PTPN11, RAF1, RIT1, SHOC2, SOS1, SOS2, and other genes primarily result in a gain of function, altered activity, or abnormal response to effectors regulating signaling activity. In contrast, pathogenic variants in NF1 and SPRED1 result in loss of function of the regulators of the Ras/MAPK pathway (for review, see Refs. [3,5]). Given the phenotypic overlap and genetic heterogeneity in the RASopathies, genetic testing can provide clinicians with a specific molecular diagnosis that confirms a clinical diagnosis in their patients.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00011-9 Copyright © 2021 Elsevier Inc. All rights reserved.

Classification of variants associated with a RASopathy

The primary disease mechanism of pathogenic variants observed to cause the RASopathies is gain of function. Gain-of-function pathogenic variants are mainly single nucleotide changes that result in missense substitutions within key functional residues or domains. For example, a large majority of pathogenic missense variants in PTPN11 occur in the SH2 and protein tyrosine phosphatase domains and ultimately increase its phosphatase activity [7]. Pathogenic variants in the RAS GTPase genes (e.g., HRAS, KRAS, etc.) often impair GAP-assisted GTP-to-GDP hydrolysis, thereby favoring the active, GTP-bound conformation [8,9]. Pathogenic variants in MAP2K1 and MAP2K2 typically occur in the catalytic core domain or the negative regulatory region, causing increased kinase activity [10–13]. All of these variants result in hyperactive or dysregulated signaling of the Ras/MAPK pathway.

Only a subset of ACMG/AMP criteria can be used when classifying gain-of-function variants. In addition, some criteria are not applicable to autosomal dominant diseases. The most common ACMG/AMP criteria used in classifying gain-of-function missense variants in the autosomal dominant RASopathies are listed in Table 19.1. The type and amount of evidence observed for a variant may warrant either upgrading or downgrading the strength level of criteria [14]. For example, PS4 (i.e., variant prevalence in affected) may be downgraded to a supporting level if only two RASopathy probands are observed. To this end, efforts have been directed at establishing gene-disease specifications of ACMG/AMP criteria to further improve and refine variant classifications. The RASopathy Clinical Domain Working Group within the Clinical Genome Resource (www.clinicalgenome.org) released refined criteria specifications to aid in variant interpretation [15]. These gene-disease specifications can be used as a guide to provide the most accurate classification of a variant observed in association with a RASopathy due to a gain-of-function mechanism of disease.

Table 19.1 Common ACMG/AMP evidence used in classifying variants in the RASopathies.

Evidence | ACMG/AMP criteria
Frequency in the general population (e.g., ExAC, gnomAD) | BA1, BS1, PM2
Inheritance of variant in affected patients (e.g., de novo occurrences and segregation) | PS2, PM6, PP1
Observations in unaffected individuals or family members | BS2, BS4
Functional studies of variant | PS3, BS3
Number of affected patients (probands) with variant | PS4
Important functional domains and mutational hot spots | PM1, PM5
Disease mechanism and computational or predictive effects | PP2, PP3, BP4
Other case-level data | BP2, BP5

General evidence criteria

As genetic testing becomes more prevalent, it is important to know how common genetic variation is in the general population and in different ethnic groups. Although these disorders are considered fully penetrant, their variable expressivity can complicate the assessment of apparently unaffected individuals. This means that individuals with pathogenic variants may inadvertently be represented in large population databases. To address this possibility, a retrospective review of the most common pathogenic variants observed in the RASopathies showed that these variants were not observed at significant frequencies [15]. Thus, the frequency thresholds based on the incidence of 1:1000 for the RASopathies can be reduced to observation levels of 0.05% for BA1 and 0.025% for BS1 in the general population without risk of misclassification [15]. Given that these are autosomal dominant disorders with the most common pathogenic variants well described in the literature, a variant should be absent in the general population to use PM2. Criteria like BS2 (i.e., observed in healthy adult) should be used with caution, and apparently unaffected individuals should be well phenotyped before classifying a variant toward the benign spectrum, given the small number of individuals (3) needed to apply such criteria and the potential risk of not ascertaining mild clinical presentations.
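The frequency rules above can be sketched as a small lookup. This is a minimal illustration, not the expert panel's implementation: the function name, the use of fractions rather than percentages, and the inclusive (greater-than-or-equal) comparison are assumptions made here.

```python
from typing import Optional

# Thresholds quoted in the text for the RASopathies, as fractions (not percent).
BA1_THRESHOLD = 0.0005   # 0.05%
BS1_THRESHOLD = 0.00025  # 0.025%


def frequency_criterion(allele_frequency: Optional[float]) -> Optional[str]:
    """Suggest a population-frequency criterion for a variant.

    allele_frequency is a fraction (0.00037 means 0.037%).
    None or 0 means the variant is absent from the population database.
    Assumption: thresholds are treated as inclusive lower bounds.
    """
    if allele_frequency is None or allele_frequency == 0:
        return "PM2"   # absent in the general population
    if allele_frequency >= BA1_THRESHOLD:
        return "BA1"   # stand-alone benign
    if allele_frequency >= BS1_THRESHOLD:
        return "BS1"   # strong benign
    return None        # present but rare: no frequency criterion applies
```

Under these assumptions, a filtering allele frequency of 0.037% falls between the two thresholds and maps to BS1, while a variant that is present at a negligible frequency receives neither a benign frequency criterion nor PM2.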

Gene-specific evidence criteria

Many genes in the Ras/MAPK pathway are highly conserved, especially in functionally important regions, and intolerant of variation, as reflected in their respective missense constraint Z-scores available in gnomAD (https://gnomad.broadinstitute.org/ [16]). This supports the use of PP2 (i.e., missense variant in a gene with low rate of polymorphisms) for classifying missense variants relative to the gain-of-function mechanism for the RASopathies. Moreover, specific functional domains and known hot spot residues have been defined for use of PM1 (i.e., located in hot spot or established functional domain) [15]. When other pathogenic variants have been observed at the same location, PS1 (i.e., same amino acid change as pathogenic variant) or PM5 (i.e., same codon as pathogenic variant) should be used. If a variant is observed alongside another known pathogenic variant, then BP2 (if in same gene) or BP5 (if in different gene) should be applied unless the patient has a significantly more severe phenotype. Observation of two pathogenic variants in a patient with typical features of a RASopathy is highly unlikely; therefore, the variant in question is likely not the cause of the disorder. An exception may occur with two pathogenic variants in LZTR1, which is known to cause both autosomal dominant and autosomal recessive NS.

The functional effect of pathogenic variants can be assessed by measuring the hyperactivation or dysregulation of downstream signaling. The approved functional assays for applying PS3 or BS3 have been defined, with measurement of the phosphorylation ratios pMEK/MEK or pERK/ERK being the most common [15]. As with all in vitro assays, results for some pathogenic variants may not accurately reflect in vivo effects; therefore, this evidence should not outweigh or supersede case-level or general evidence accrued for a variant. In addition, criteria for in silico prediction algorithms for missense changes (PP3, BP4) can be used as originally advised in Richards et al. [14], but should not undermine case-level data. Note that pathogenic splicing variants are rare in the RASopathies, and splicing prediction algorithm results must be evaluated to determine whether the effect would result in the gain of protein function expected for these disorders.
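The same-location rule (PS1 versus PM5) can be sketched as follows. This is an illustration under stated assumptions: the tuple representation, the function name, and the example set of known pathogenic changes are hypothetical, and the sketch compares changes at the protein level only, so it assumes that a PS1 match arises from a different nucleotide change than the previously reported variant.

```python
from typing import Optional, Set, Tuple

# A missense change as (codon position, reference residue, alternate residue),
# e.g. p.(Asn308Asp) -> (308, "Asn", "Asp"). Hypothetical representation.
ProteinChange = Tuple[int, str, str]


def same_location_criterion(
    query: ProteinChange, known_pathogenic: Set[ProteinChange]
) -> Optional[str]:
    """Return PS1, PM5, or None for a missense change in a single gene."""
    position, _, alt = query
    if query in known_pathogenic:
        return "PS1"  # same amino acid change as a known pathogenic variant
    if any(pos == position and known_alt != alt
           for pos, _, known_alt in known_pathogenic):
        return "PM5"  # same codon, different amino acid change
    return None


# Illustrative use with the PTPN11 change from the case example below.
known = {(308, "Asn", "Asp")}
print(same_location_criterion((308, "Asn", "Ser"), known))  # hypothetical query
```

Keeping one set of known changes per gene avoids accidental matches between residues that share a position number in different proteins.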

Case-level evidence criteria

Comparing the genetic evidence of different individuals with a RASopathy, including their family members, provides important information for classifying variants. Identifying a variant in multiple affected individuals supports its association with a RASopathy. Given the genetic heterogeneity, full penetrance, and the rarity of these disorders, use of PS4 (i.e., variant prevalence in affected) is preferred over PP4 (i.e., highly specific features of disease in proband) when assessing the number of affected individuals with the same variant. As these disorders are fully penetrant, at least 5 probands are required to reach the strong criteria level, 3–4 for moderate, and 1–2 for supporting [15].
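The proband counts for PS4 translate directly into a tiered lookup. The sketch below is illustrative only; the function name and the `_Moderate`/`_Supporting` label suffixes are assumptions modeled on the modifier style used elsewhere in this chapter (e.g., PS4_Supporting).

```python
from typing import Optional


def ps4_strength(n_probands: int) -> Optional[str]:
    """Map the number of affected probands carrying a variant to a PS4 level."""
    if n_probands >= 5:
        return "PS4"             # strong (default strength)
    if n_probands >= 3:
        return "PS4_Moderate"    # 3-4 probands
    if n_probands >= 1:
        return "PS4_Supporting"  # 1-2 probands
    return None                  # no affected probands: PS4 does not apply
```

For instance, a variant seen only in the current proband would receive PS4_Supporting under this mapping.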


RASopathies often present sporadically. Confirming that the variant is de novo (i.e., not inherited from an unaffected parent) is one of the most common criteria (PS2, PM6) supporting pathogenicity. As the number of de novo occurrences increases, the criteria strength increases. The combination of the total number of cases and whether parentage was confirmed dictates the applied criteria strength. Generally, two occurrences of either PS2 (i.e., de novo with confirmed parentage) or PM6 (i.e., de novo without confirmed parentage) warrant increasing the strength of the criterion by one level (e.g., PM6 to PM6_Strong) [15]. Additionally, if a variant is observed in multiple affected family members, then segregation criteria (PP1) are applied based on the number of meioses. Strong criteria application requires at least 7 meioses, moderate requires 5–6, and supporting requires 3–4 [15]. Conversely, if the variant fails to segregate with affected individuals, then BS4 can be applied with only one meiosis. The possibility of phenocopies should always be considered when assessing clinical indications for an affected status, given the variable expressivity of these disorders.
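The de novo and segregation rules can be sketched together. This is a simplified illustration, not the expert panel's full scheme: it assumes all de novo occurrences share the same parentage status, applies only the single one-level upgrade described above for two or more occurrences, and uses hypothetical function names and label suffixes.

```python
from typing import Optional

# Ordered strength levels used to step a criterion up by one level.
STRENGTH_ORDER = ["Supporting", "Moderate", "Strong", "VeryStrong"]


def de_novo_criterion(occurrences: int, parentage_confirmed: bool) -> Optional[str]:
    """Return the PS2/PM6 label for a number of de novo occurrences."""
    if occurrences < 1:
        return None
    # PS2 defaults to strong, PM6 to moderate.
    code, base = ("PS2", "Strong") if parentage_confirmed else ("PM6", "Moderate")
    if occurrences < 2:
        return code
    upgraded = STRENGTH_ORDER[STRENGTH_ORDER.index(base) + 1]
    return f"{code}_{upgraded}"


def pp1_strength(meioses: int) -> Optional[str]:
    """Map the number of informative meioses to a PP1 level."""
    if meioses >= 7:
        return "PP1_Strong"
    if meioses >= 5:
        return "PP1_Moderate"
    if meioses >= 3:
        return "PP1"  # supporting (default strength)
    return None
```

With these assumptions, two confirmed de novo occurrences yield PS2_VeryStrong and two unconfirmed occurrences yield PM6_Strong, matching the example given in the text.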

Case examples

Noonan syndrome (Table 19.2)

A case presents with a clinical diagnosis of NS. Review of the patient’s clinical features indicated pulmonary stenosis and characteristic facial features of NS, including hypertelorism with downslanting palpebral fissures, low-set, posteriorly angulated ears with a thickened helix, a deeply grooved philtrum, and a short, webbed neck. Genetic testing identified the heterozygous c.922A>G; p.(Asn308Asp) variant in PTPN11 (NM_002834.5). The PTPN11 (NM_002834.5) c.922A>G; p.(Asn308Asp) variant is classified as pathogenic due to the extensive case-level data in the literature. Moreover, functional studies showed increased phosphatase activity, supporting a gain-of-function effect.

Cardio-facio-cutaneous syndrome (Table 19.3)

A case presents with a clinical diagnosis of CFC. Review of the patient’s clinical features indicated a ventricular septal defect, a history of seizures, and palmoplantar hyperkeratosis. Genetic testing identified the heterozygous c.389A>G; p.(Tyr130Cys) variant in MAP2K1 (NM_002755.4). The MAP2K1 (NM_002755.4) c.389A>G; p.(Tyr130Cys) variant is classified as pathogenic due to the extensive case-level data. Moreover, functional studies show that it leads to increased MEK1 kinase activity, supporting a gain of function.

Costello syndrome (Table 19.4)

A patient presents with a clinical diagnosis of Costello syndrome. Review of the patient’s clinical features indicated hypertrophic cardiomyopathy, coarse facial features, deep palmar and plantar creases, moderate intellectual disability, soft and loose skin, short stature, and growth hormone deficiency. Molecular testing identified the heterozygous c.34G>A; p.(Gly12Ser) variant in HRAS (NM_005343.4). This variant is classified as pathogenic based on the extensive case-level data. Moreover, functional studies show that it leads to increased levels of the active GTP-bound HRAS, supporting a gain of function.

Table 19.2 Case example PTPN11 (NM_002834.5) c.922A>G; p.(Asn308Asp).

Evidence type | Evidence review | Applied ACMG/AMP criteria | References
Frequency in the general population | Negligible, but not absent in gnomAD | N/A |
Inheritance of variant in affected patients (e.g., de novo occurrences or segregation) | Multiple confirmed and unconfirmed de novo occurrences; cosegregated with disease in more than seven affected family members | PS2_VeryStrong, PP1_Strong | [16–18]
Functional studies of variant | Supports abnormal function | PS3 | [19–22]
Number of affected patients (probands) with variant | Identified in >5 probands | PS4 | [23–26]
Important functional domains and mutational hot spots | Known hot spot residue | PM1 | [15]
Disease mechanism and computational or predictive effects | In silico predicts this variant is deleterious to protein function; missense variant | PP2, PP3 | [15,27]

Final classification: PATHOGENIC (PS2_VeryStrong, PS3, PS4, PP1_Strong, PM1, PP2, PP3)

Table 19.3 Case example MAP2K1 (NM_002755.4) c.389A>G; p.(Tyr130Cys).

Evidence type | Evidence review | Applied ACMG/AMP criteria | References
Frequency in the general population | Absent in gnomAD | PM2 |
Inheritance of variant in affected patients (e.g., de novo occurrences or segregation) | Confirmed de novo in at least two patients | PS2_VeryStrong | [12,28,29]
Functional studies of variant | Supports abnormal function | PS3 | [30–32]
Number of affected patients (probands) with variant | Identified in >5 probands | PS4 | [33–35]
Important functional domains and mutational hot spots | Known hot spot residue | PM1 | [15]
Disease mechanism and computational or predictive effects | In silico predicts this variant is deleterious to protein function; missense variant | PP2, PP3 | [15,27]

Final classification: PATHOGENIC (PS2_VeryStrong, PS3, PS4, PM1, PM2, PP2, PP3)


Table 19.4 Case example HRAS (NM_005343.4) c.34G>A; p.(Gly12Ser).

Evidence type | Evidence review | Applied ACMG/AMP criteria | References
Frequency in the general population | Absent in gnomAD | PM2 |
Inheritance of variant in affected patients (e.g., de novo occurrences or segregation) | Confirmed de novo in at least two patients | PS2_VeryStrong | [36–41]
Functional studies of variant | Supports abnormal function | PS3 | [9]
Number of affected patients (probands) with variant | Identified in >5 probands | PS4 | [42–47]
Important functional domains and mutational hot spots | Known hot spot residue | PM1 | [15]
Disease mechanism and computational or predictive effects | In silico predicts this variant is deleterious to protein function; missense variant | PP2, PP3 | [15,27]

Final classification: PATHOGENIC (PS2_VeryStrong, PS3, PS4, PM1, PM2, PP2, PP3)

Unknown RASopathy diagnosis (Tables 19.5 and 19.6)

A patient presents with a suspected RASopathy. Review of the patient’s clinical features indicated pulmonary valve stenosis, mild intellectual disability, and mild dysmorphic features including hypertelorism with downslanting palpebral fissures and short neck. Molecular testing identified two variants: c.698A>G; p.(Asn233Ser) in SOS1 (NM_005633.3) and c.995T>C; p.(Val332Ala) in RAF1 (NM_001354689.3). The SOS1 (NM_005633.3) c.698A>G; p.(Asn233Ser) variant is classified as likely benign due to its frequency in the general population. Population frequency, even at very low levels such as this, is extremely informative for the RASopathies and other rare dominant disorders with full penetrance. The RAF1 (NM_001354689.3) c.995T>C; p.(Val332Ala) variant is classified as uncertain due to the lack of case-level and gene-level knowledge for this residue and change. This case would benefit from parental testing for the variant to determine whether it was inherited or arose de novo. It is important that the parents are well phenotyped in order to accurately inform the application of the de novo or segregation criteria. If the variant was determined to be de novo, then the variant could be upgraded to likely pathogenic.
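The final classifications in the case examples follow from combining the applied criteria. The sketch below encodes the combining rules of the 2015 ACMG/AMP guidelines [14] as commonly summarized; it is an illustration rather than a validated tool, the function names and label parsing are assumptions, and the exact rule wording should be checked against the original guideline before any real use.

```python
from collections import Counter
from typing import Iterable, Tuple

# Default strength implied by each criterion prefix when no modifier is given.
DEFAULT_STRENGTH = {
    "PVS": "verystrong", "PS": "strong", "PM": "moderate", "PP": "supporting",
    "BA": "standalone", "BS": "strong", "BP": "supporting",
}


def parse_criterion(label: str) -> Tuple[str, str]:
    """Split e.g. 'PS2_VeryStrong' into ('pathogenic', 'verystrong')."""
    code, _, modifier = label.strip().partition("_")
    prefix = code.rstrip("0123456789")
    direction = "pathogenic" if prefix.startswith("P") else "benign"
    strength = modifier.replace(" ", "").lower() or DEFAULT_STRENGTH[prefix]
    return direction, strength


def classify(criteria: Iterable[str]) -> str:
    """Combine applied criteria into a five-tier classification."""
    counts = Counter(parse_criterion(c) for c in criteria)
    vs = counts[("pathogenic", "verystrong")]
    s = counts[("pathogenic", "strong")]
    m = counts[("pathogenic", "moderate")]
    p = counts[("pathogenic", "supporting")]
    ba = counts[("benign", "standalone")]
    bs = counts[("benign", "strong")]
    bp = counts[("benign", "supporting")]

    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m == 1 and p == 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    likely_pathogenic = (
        (vs >= 1 and m == 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs == 1 and bp >= 1) or bp >= 2

    path_call = ("Pathogenic" if pathogenic
                 else "Likely pathogenic" if likely_pathogenic else None)
    benign_call = ("Benign" if benign
                   else "Likely benign" if likely_benign else None)
    if path_call and benign_call:
        return "Uncertain significance"  # contradictory evidence
    return path_call or benign_call or "Uncertain significance"


# The RAF1 variant before and after a hypothetical confirmed de novo finding.
print(classify(["PM2", "PS4_Supporting", "PP2", "BP4"]))
print(classify(["PM2", "PS4_Supporting", "PP2", "BP4", "PS2"]))
```

Treating the pathogenic and benign evidence separately and returning uncertain significance only when both sides reach a call is one reading of the guideline's handling of conflicting evidence; a single supporting benign criterion such as BP4 does not by itself block a likely pathogenic call in this sketch.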

Table 19.5 Case example SOS1 (NM_005633.3) c.698A>G; p.(Asn233Ser).

Evidence type | Evidence review | Applied ACMG/AMP criteria | References
Frequency in the general population | gnomAD filtering allele frequency = 0.037% | BS1 | [15]
Inheritance of variant in affected patients (e.g., de novo occurrences or segregation) | Unknown | N/A |
Functional studies of variant | None | N/A |
Number of affected patients (probands) with variant | Current proband | N/A |
Important functional domains and mutational hot spots | Not in a known hot spot | N/A | [15]
Disease mechanism and computational or predictive effects | In silico predicts this variant does not impact protein function | BP4 | [15,27]

Final classification: LIKELY BENIGN (BS1, BP4)

Table 19.6 Case example RAF1 (NM_001354689.3) c.995T>C; p.(Val332Ala).

Evidence type | Evidence review | Applied ACMG/AMP criteria | References
Frequency in the general population | Absent in gnomAD | PM2 |
Inheritance of variant in affected patients (e.g., de novo occurrences or segregation) | Unknown | N/A |
Functional studies of variant | None | N/A |
Number of affected patients (probands) with variant | Current proband | PS4_Supporting |
Important functional domains and mutational hot spots | Not in a known hot spot | N/A | [15]
Disease mechanism and computational or predictive effects | In silico predicts this variant does not impact protein function; missense variant | PP2, BP4 | [15,27]

Final classification: UNCERTAIN SIGNIFICANCE (PM2, PS4_Supporting, PP2, BP4)

Summary

The specifications described in this chapter provide examples of optimizing the ACMG/AMP guidelines [14] for sequence variant interpretation in RASopathy disorders. The ACMG/AMP guidelines were established as general guidelines and, therefore, needed to be broad enough to encompass all types of inheritance patterns and disease mechanisms. Inevitably, only a subset of criteria were applicable to the RASopathy disorders, and those that were applicable needed further refinement to resolve potentially ambiguous use. As illustrated in the provided case examples, these specifications can be used to provide consistent and accurate interpretation of variation in RASopathy-associated genes.

The described RASopathy specifications may be extended, with minor modifications, to other autosomal dominant disorders. The refinements to the general and case-level evidence criteria may apply to other rare, fully penetrant autosomal dominant disorders, while certain gene-specific evidence criteria can apply to those having a gain-of-function disease mechanism. However, the described specifications should be extended with caution. For example, if the autosomal dominant disorder is due to haploinsufficiency, then PVS1 (i.e., null variant in gene with a loss-of-function mechanism) would predominate and BP1 (i.e., missense variant in gene with a loss-of-function mechanism) may replace the use of PP2 (i.e., missense variant in a gene with low rate of polymorphisms). We recommend pairing implementation of the described specifications with expert curation of gene information, with ongoing review and refinement, to ensure accurate and uniform variant interpretation.

References

[1] Allanson JE, Roberts AE. Noonan syndrome. In: Adam MP, Ardinger HH, Pagon RA, editors. GeneReviews®. Seattle, WA: University of Washington, Seattle; 2001 [Updated Aug 8, 2019]. Available from: https://www.ncbi.nlm.nih.gov/books/NBK1124/.
[2] Nava C, Hanna N, Michot C, et al. Cardio-facio-cutaneous and Noonan syndromes due to mutations in the RAS/MAPK signalling pathway: genotype-phenotype relationships and overlap with Costello syndrome. J Med Genet 2007;44(12):763–71.
[3] Rauen KA. The RASopathies. Annu Rev Genom Hum Genet 2013;14:355–69.
[4] Pierpont ME, Magoulas PL, Adi S, et al. Cardio-facio-cutaneous syndrome: clinical features, diagnosis, and management guidelines. Pediatrics 2014;134(4):e1149–62.
[5] Tajan M, Paccoud R, Branka S, Edouard T, Yart A. The RASopathy family: consequences of germline activation of the RAS/MAPK pathway. Endocr Rev 2018;39(5):676–700.
[6] Tartaglia M, Kalidas K, Shaw A, et al. PTPN11 mutations in Noonan syndrome: molecular spectrum, genotype-phenotype correlation, and phenotypic heterogeneity. Am J Hum Genet 2002;70(6):1555–63.
[7] Tajan M, de Rocca Serra A, Valet P, Edouard T, Yart A. SHP2 sails from physiology to pathology. Eur J Med Genet 2015;58(10):509–25.
[8] Gripp KW, Lin AE, Stabley DL, et al. HRAS mutation analysis in Costello syndrome: genotype and phenotype correlation. Am J Med Genet 2006;140(1):1–7.
[9] van der Burgt I, Kupsky W, Stassou S, et al. Myopathy caused by HRAS germline mutations: implications for disturbed myogenic differentiation in the presence of constitutive HRas activation. J Med Genet 2007;44(7):459–62.
[10] Dentici ML, Sarkozy A, Pantaleoni F, et al. Spectrum of MEK1 and MEK2 gene mutations in cardio-facio-cutaneous syndrome and genotype-phenotype correlations. Eur J Hum Genet 2009;17(6):733–40.
[11] Anastasaki C, Estep AL, Marais R, Rauen KA, Patton EE. Kinase-activating and kinase-impaired cardio-facio-cutaneous syndrome alleles have activity during zebrafish development and are sensitive to small molecule inhibitors. Hum Mol Genet 2009;18(14):2543–54.
[12] Rodriguez-Viciana P, Tetsu O, Tidyman WE, et al. Germline mutations in genes within the MAPK pathway cause cardio-facio-cutaneous syndrome. Science 2006;311(5765):1287–90.


[13] Bromberg-White JL, Andersen NJ, Duesbery NS. MEK genomics in development and disease. Brief Funct Genom 2012;11(4):300–10.
[14] Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17(5):405–24.
[15] Gelb BD, Cave H, Dillon MW, et al. ClinGen’s RASopathy Expert Panel consensus methods for variant interpretation. Genet Med 2018;20(11):1334–45.
[16] Elalaoui SC, Kraoua L, Liger C, Ratbi I, Cave H, Sefiani A. Germinal mosaicism in Noonan syndrome: a family with two affected siblings of normal parents. Am J Med Genet 2010;152A(11):2850–3.
[17] Tartaglia M, Mehler EL, Goldberg R, et al. Mutations in PTPN11, encoding the protein tyrosine phosphatase SHP-2, cause Noonan syndrome. Nat Genet 2001;29(4):465–8.
[18] Ezquieta B, Santome JL, Carcavilla A, et al. Alterations in RAS-MAPK genes in 200 Spanish patients with Noonan and other neuro-cardio-facio-cutaneous syndromes. Genotype and cardiopathy. Rev Esp Cardiol 2012;65(5):447–55.
[19] Fragale A, Tartaglia M, Wu J, Gelb BD. Noonan syndrome-associated SHP2/PTPN11 mutants cause EGF-dependent prolonged GAB1 binding and sustained ERK2/MAPK1 activation. Hum Mutat 2004;23(3):267–77.
[20] Keilhack H, David FS, McGregor M, Cantley LC, Neel BG. Diverse biochemical properties of Shp2 mutants. Implications for disease phenotypes. J Biol Chem 2005;280(35):30984–93.
[21] Zhang W, Chan RJ, Chen H, et al. Negative regulation of Stat3 by activating PTPN11 mutants contributes to the pathogenesis of Noonan syndrome and juvenile myelomonocytic leukemia. J Biol Chem 2009;284(33):22353–63.
[22] Edouard T, Combier JP, Nedelec A, et al. Functional effects of PTPN11 (SHP2) mutations causing LEOPARD syndrome on epidermal growth factor-induced phosphoinositide 3-kinase/AKT/glycogen synthase kinase 3beta signaling. Mol Cell Biol 2010;30(10):2498–507.
[23] Yoshida R, Ogata T, Masawa N, Nagai T. Hepatoblastoma in a Noonan syndrome patient with a PTPN11 mutation. Pediatr Blood Canc 2008;50(6):1274–6.
[24] Riccardi F, Rivolta GF, Uliana V, et al. Cryptic 13q34 and 4q35.2 deletions in an Italian family. Cytogenet Genome Res 2015;147(1):24–30.
[25] Tafazoli A, Eshraghi P, Pantaleoni F, et al. Novel mutations and their genotype-phenotype correlations in patients with Noonan syndrome, using next-generation sequencing. Adv Med Sci 2018;63(1):87–93.
[26] Ueda K, Yaoita M, Niihori T, Aoki Y, Okamoto N. Craniosynostosis in patients with RASopathies: accumulating clinical evidence for expanding the phenotype. Am J Med Genet 2017;173(9):2346–52.
[27] Ioannidis NM, Rothstein JH, Pejaver V, et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am J Hum Genet 2016;99(4):877–85.
[28] Gripp KW, Lin AE, Nicholson L, et al. Further delineation of the phenotype resulting from BRAF or MEK1 germline mutations helps differentiate cardio-facio-cutaneous syndrome from Costello syndrome. Am J Med Genet 2007;143A(13):1472–80.
[29] Schulz AL, Albrecht B, Arici C, et al. Mutation and phenotypic spectrum in patients with cardio-facio-cutaneous and Costello syndrome. Clin Genet 2008;73(1):62–70.
[30] Rodriguez-Viciana P, Rauen KA. Biochemical characterization of novel germline BRAF and MEK mutations in cardio-facio-cutaneous syndrome. Methods Enzymol 2008;438:277–89.
[31] Cheng TM, Goehring L, Jeffery L, et al. A structural systems biology approach for quantifying the systemic consequences of missense mutations in proteins. PLoS Comput Biol 2012;8(10):e1002738.
[32] Senawong T, Phuchareon J, Ohara O, McCormick F, Rauen KA, Tetsu O. Germline mutations of MEK in cardio-facio-cutaneous syndrome are sensitive to MEK and RAF inhibition: implications for therapeutic options. Hum Mol Genet 2008;17(3):419–30.



CHAPTER 20

Summary and conclusions

Jordan Lerner-Ellis1,2,3, Amanda Spurdle4, Conxi Lázaro5,6

1 Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada; 2 Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada; 3 Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada; 4 QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia; 5 Molecular Diagnostic Laboratory, Hereditary Cancer Program, Institut Català d’Oncologia (ICO-IDIBELL-ONCOBELL-CIBERONC), Barcelona, Spain; 6 Institut d’Investigació Biomèdica de Bellvitge, Barcelona, Spain

One of the biggest challenges in genetic research and medicine today is our limited ability to collect large amounts of data for real-time monitoring of the relationship between genetic status and disease. Integrating genetics into routine medical practice has begun to facilitate this process and will improve our ability to interpret DNA variation. The human genome has ~3 billion base pairs and over 15 million common variants. Since the completion of the Human Genome Project, the elucidation of the entire DNA code, and the generation of increasing amounts of genetic data, a broader picture and numerous applications have emerged. This is a testament to how far we have come in the field, but also to how much more work remains to be done.

In recent years, enormous efforts have been devoted to understanding the various components of the human genome and how they relate to human disease. Implicit in these projects is the need to translate research discoveries into guidelines for clinical practice and related areas such as population health impact, areas that have traditionally been underfunded.

Clinical and academic laboratories today face the onset of next-generation sequencing with the attendant need to sort and interpret large numbers of variants. Firstly, the raw sequence of a single genome can produce upward of 60 Gb of data, and as much as 400 Gb with alignment and other annotation files. Researchers and clinicians worldwide are grappling with developing new tools, in a standardized format, for automation and annotation to filter and triage variants of clinical significance. There is an increasing need to search, share, and access information relating to disease–gene relationships, genotype–phenotype correlation, and downstream approaches to applying this information to personalized medicine.
Secondly, with whole-genome sequencing (WGS) comes increased resolution and the need for improved bioinformatics tools that give us the ability to identify and interpret different types of variation. New predictive and interpretive tools present new challenges and risks, such as overestimation of positive predictive values and ascertainment bias. Sequencing more people and larger cohorts, and incorporating large-scale population datasets into our interpretive algorithms, will enable better interpretation and reporting. In this way, newer applications such as polygenic risk scores (PRS), together with the incorporation of other risk factors, will give us better resolution of high, moderate, and low disease-related risk, and of how to interpret and report this information in a clear and understandable way to referring physicians and patients.

Clinical DNA Variant Interpretation. https://doi.org/10.1016/B978-0-12-820519-8.00023-5 Copyright © 2021 Elsevier Inc. All rights reserved.
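As a rough illustration of the polygenic risk scores mentioned above, a PRS is commonly computed as a weighted sum of risk-allele dosages, with each weight taken from an association study (for example, a per-allele log odds ratio). The sketch below is a minimal illustration under stated assumptions, not a clinical implementation: the variant identifiers, weights, and cutoffs are hypothetical, and treating an ungenotyped variant as zero copies is a simplification chosen only to keep the example short.

```python
from typing import Mapping


def polygenic_risk_score(dosages: Mapping[str, float],
                         weights: Mapping[str, float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: copies of the risk allele per variant (0, 1, or 2).
    weights: per-allele effect sizes, e.g. log odds ratios.
    """
    score = 0.0
    for variant_id, weight in weights.items():
        # Assumption: a variant absent from the genotype counts as 0 copies.
        score += weight * dosages.get(variant_id, 0.0)
    return score


def risk_band(score: float, low_cutoff: float, high_cutoff: float) -> str:
    """Map a score onto low / moderate / high using caller-supplied cutoffs."""
    if score < low_cutoff:
        return "low"
    if score < high_cutoff:
        return "moderate"
    return "high"


# Hypothetical weights and genotype, for illustration only.
weights = {"variant_A": 0.10, "variant_B": 0.25, "variant_C": -0.05}
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_risk_score(dosages, weights)
band = risk_band(score, low_cutoff=0.2, high_cutoff=0.4)
print(f"PRS = {score:.2f}, band = {band}")
```

In practice the raw score is usually standardized against a reference population before any banding, and the cutoffs would need to be calibrated and validated for the population being tested, which is part of the clinical-utility question the chapter raises.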


A specific alteration in the DNA code does not always have a direct one-to-one correlation with disease; it may behave differently in different genetic backgrounds or in the context of environmental exposures. This is particularly true for DNA variants that have a modest effect on disease development or progression. Much study has been devoted to common variants associated with modest risk of common diseases, identified in recent years through large-scale genome-wide association studies (GWAS). While PRS are now being included in genetic test reporting for some diseases, e.g., breast cancer, results from formal clinical trials are needed to establish the clinical utility of PRS in risk prediction at the individual level. Perhaps more tangible is the potential to use this information for selecting and applying population health strategies for disease prevention and early detection.

Our understanding of the DNA code continues to evolve. Some types of DNA alterations have predictable outcomes, such as pathogenic variants in genes that cause characteristic clinical features, which allow us to identify or distinguish them solely on the basis of their physical manifestations. These types of disorders tend to be rare, and so are the DNA variants that cause them. Some genetic variants masquerade as different disease entities and require more in-depth clinical or biochemical work to assess their relationship with one or more diseases. The full spectrum of genetic and clinical heterogeneity has yet to be uncovered. Yet another issue of increasing importance as sequencing moves to WGS is the challenge of how to develop and benchmark reliable in silico and functional methods to pinpoint variants in regulatory regions that are important for Mendelian disease, given that there are few proven pathogenic variants in these regions. Projects such as ENCODE (https://www.encodeproject.org/) are building comprehensive lists of functional elements that act at the protein and RNA levels, as well as regulatory elements that control cells and the triggers that determine when genes are active. Only by studying these disease–gene–variant relationships in millions of people will we obtain a clearer understanding of the role of human genetic variation in disease, and this endeavor will continue for years to come.

In some cases, just knowing the cause of disease in an individual can provide resolution to the diagnostic odyssey in a family, and this alone can be enough. In other cases, new disease–gene relationships can empower us to find and develop new treatments to prevent, treat, and even cure disease, and this is owed, in part, to our ability to interpret the human genome.

In recent years, direct-to-consumer (DTC) genetic testing has given the population at large access to genetic information, and for some people it may offer hope of improving themselves, living up to their potential, or leaving their mark on the world. The DTC tests available on the market are largely recreational and return information on ancestry and the propensity for certain physical attributes, but DTC testing has also turned out to be quite useful for research.

As we learn more about the human genome it is increasingly important that we manage public expectations. While genetic testing has many examples of proven clinical utility for diagnosis, prognosis, and therapy, many questions still need to be answered. Major advances have unfolded with each passing decade, and our understanding will continue to improve with the integration of genetics into routine clinical practice and the recognition by government funding agencies of the importance of studying human genetics in relation to health and disease. Today patients are being offered targeted genetic testing for specific clinical applications or indications.
New technologies such as genome sequencing are being used mostly for research-based identification of disease-related genes, but wide-scale sequencing is now also being introduced as a routine method for testing and screening. As field experts and professionals we have increasing responsibility to raise awareness of the impact of genetic information as well as its limitations, and to manage expectations about the capabilities of genetic testing. We are currently faced with the Sisyphean and tantalizing work of determining the mechanisms that underpin human inheritance and the role of genetic factors in health and disease. It is also our responsibility to inform the public about where and how to obtain information on the applications of genetics.

Rapid processing and new technologies will not necessarily solve the key issues of clinical utility, but access to genome data may allow physicians to make more informed choices. Success in using genetic information will depend on improved understanding and communication. With increased sharing and collaboration among laboratories worldwide in the development of tools and methods that leverage laboratory data in diagnosing, managing, and treating genetic diseases, knowledge will improve and ultimately lead to better patient care.

Future directions

Several important topics were not covered in this book. For instance, there are no chapters on copy number analysis or somatic variant interpretation, which are critical areas of genomic research and diagnostic testing. Progress has been made in these domains, but they are not as evolved as other areas of variant interpretation and warrant special attention in the future.

Index Note: ‘Page numbers followed by “f ” indicate figures , “t” indicate tables and “b” indicate boxes’.

A Aldy, 199 Allozymes, 145e147 Alternative splicing events, 122 bioinformatics programs, 133 BRCA genes, 137t conventional molecular procedure, 138 dynamic profile, 130e132, 131f epigenetic determinants, 127e129 intrinsically disordered domains (IDDs), 122 long-range sequence features, 125e127 pathogenic, 132e136 profile, 122 reference transcript, 122, 125 roles, 129e130 short sequence motifs, 125e127 spliceogenic variants, 132e136 overlap cis-acting determinants, 125e127 splicing isoforms, 125 trans-acting, 127e129 untranslated regions (UTRs), 122 American College of Medical Genetics and Genomics/ Association for Molecular Pathology (ACMG/AMP), 3 missense variants, 29 ongoing/future adaptations ClinGen, 35 gene-specific versus general criteria, 36 qualitative versus quantitative/Bayesian approaches, 37 Variant Curation Expert panels (VCEPs), 35 sequence variant interpretation, 34b application, 31e34 background, 30e31, 32te33t case interpretation, 34 evidence criteria, 31e34 scope, 30e31 variant classification, 31, 34 Amino acid sequence variants, pathogenicity predictors for, 90e110 biophysical models applicability, 94 molecular impact, 90e94 protein interactions, 92e94 protein stability changes upon mutation, 90e92, 91f APOB, 329 Apparent non-syndromic hearing loss, 309e310 Application Programming Interfaces (APIs), 20

Arrhythmogenic right ventricular cardiomyopathy (ARVC), 277e278, 280 Astrolabe, 200 ATM c.90009034del allele frequency, 368e369 allelic data, 373e374 biological and clinical interpretation, 375 case presentation, 367 computational and predictive, 370e371 coverage of exon, 369 de novo data, 373 evidence and final classification, 375 functional data, 371e375 pathogenicity assessment, 368e371 population data, 368e369 population frequencies, 369 segregation data, 372 splice predictors, 369e371

B Bayesian framework, 3 Best Alignment Normalisation (BAN), 18 Biallelic inactivation, 178e179 Bioinformatic approaches, 111e113 Bioinformatic pathogenicity predictors classifier, 96 discriminant features, 96, 97te99t, 101te103t metapredictors, 100f performance, 100e104, 104f estimation variability, 105e109, 106f, 107te108t principles, 94e109 training datasets, 95, 96f validation process, 98e100 Bioinformatic predictions, 50 Bloom syndrome, 156 BRCA1, 152e154 BRCA2, 152e154 BRCA1/2 assays, 51 BRCA1/2 breast cancer histopathology, 52e53 BRCA2 c.9976A>T p.(Lys3326Ter) allele frequency, 352 allelic data, 357 Assay-Homology-directed repair assay, 355 biological and clinical interpretation, 358 computational and predictive data, 353e355 cosegregation analysis, 356

403

404

Index

BRCA2 c.9976A>T p.(Lys3326Ter) (Continued ) coverage of exon, 352 de novo data, 356 evidence and final classification, 358 functional data, 355 homology-directed repair (HDR) assay, 355 pathogenicity assessment, 352e358 population data, 352 frequencies, 352 segregation data, 356 BRCA2 c.9117G>A allele frequency, 359e361 allelic data, 366 biological and clinical interpretation, 367 case-control association study, 361e363 case presentation, 359 computational and predictive data, 363 coverage of exon, 361 de novo data, 365 evidence and final classification, 367, 367b functional data, 364e365 pathogenicity assessment, 359e367 population data, 359e363 frequencies, 360 segregation data, 365 splice predictors, 363 Breast cancer, 182e183

C Calibration, 152 Cancer hotspots, 172 Candidate predisposition genes, 185 Cardiac patients genetic screening, 287 Cardiac variants secondary findings, 287 Cardio-facio-cutaneous syndrome, 392 Case-level evidence criteria, 391e392 Catalogue of somatic mutations in cancer (COSMIC), 171 Caveats, 54e55 CBioPortal, 171e172 Cell-free assays, 156 Central auditory dysfunction, 305e306 CHEK2 variants, 182e183 CiVIC, 172e175 Clinical Genome Resource (ClinGen), 4e5, 35 Clinical Pharmacogenetics Implementation Consortium (CPIC), 4 ClinVar, 175 Complete in vitro mismatch repair activity (CIMRA), 50 Computational approach, variant interpretation

amino acid sequence variants, pathogenicity predictors for, 90e110 biophysical models applicability, 94 molecular impact, 90e94 protein interactions, 92e94 protein stability changes upon mutation, 90e92, 91f bioinformatic pathogenicity predictors classifier, 96 discriminant features, 96, 97te99t, 101te103t metapredictors, 100f performance estimation variability, 105e109, 106f, 107te108t performance of, 100e104, 104f principles, 94e109 training datasets, 95, 96f validation process, 98e100 challenges, 109e110 future developments, 109e110 pathogenicity prediction problem, 90 Computational predictors variants affecting splicing bioinformatic approaches, 111e113 challenges, 112e113 future developments, 112e113 Mis-RNA splicing, 110e111 RNA splicing factors, 110 Conductive hearing loss, 305e306 Copy-neutral LOH, 178 Cosegregation, 50, 329, 330f Costello syndrome, 392 CYP2D6 annotation software, 199e200 Cypiripi, 200 Cystic fibrosis transmembraneconductance regulator (CFTR), 157

D Data sharing/gene variant databases classification records, 230e232 ClinVar, 227e229, 228f European Variation Archive (EVA), 232e233 final considerations, 233e234 focused databases, 226e233 general databases, 222e225 genome-wide association studies (GWAS), 223 Global Variome shared LOVD, 229e232, 230fe231f GV shared LOVD, 227 Human Gene Mutation Database (HGMD), 226e227 locus-specific databases (LSDBs), 226 Online Mendelian Inheritance in Man (OMIM database), 223, 224f other databases, 232e233 Deafness genes, 314e316

Index

Dilated cardiomyopathy (DCM), 277e279 Direct-to-consumer (DTC) genetic testing, 400 DNA mismatch repair genes, 154e156 DNA sequence changes, 10e11 deletion-insertions, 11 deletions, 10 insertions, 10e11 inversions, 11 structural variation, 11 substitutions, 10 Dominant phenotypesintegrating population, 184 Duplicated regions, 24 Duplication, 10e11 Dutch Pharmacogenetics Working Group (DPWG), 4

E Epigenetic variation, 15 Escherichia coli, 154 Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA), 30

F Familial hypercholesterolemia (FH) APOB, 329 cases presentations, 331e342 cosegregation, 329, 330f functional studies, 325e329 laboratory genetic testing, 331 LDLR, 325e329 PCSK9, 329 in silico prediction algorithms, 330e331 variant interpretation, 324e331, 326te328t Functional evidence (II) protein bloom syndrome, 156 BRCA1, 152e154 BRCA2, 152e154 calibration, 152 cell-free assays, 156 cystic fibrosis transmembraneconductance regulator (CFTR), 157 DNA mismatch repair genes, 154e156 Escherichia coli, 154 functional assays, 150 high-throughput assays, 158 historical background, 145e147, 146f Lynch syndrome (LS), 154 Rhodopsin (RHO), 156e157 in silico tools, variant effects, 149e150 validation, 152 variant pathogenicity assessment, 149 variants of uncertain significance (VUS), 147e149, 148f, 155t in vivo assays, 158e159

G Gene-specific evidence criteria, 391 Gene symbols, 16 Genetic counseling, 296 Genetic disorders, 5 Genetic testing, 281 additional genetic variants, 282 age- and sex-related penetrance, 282 at-risk relatives identification, 281e282 clinical screening, 281e282 gene-directed treatments, 282 incomplete penetrance, 282 incomplete phenotype information, 283e285 segregation in family, 282e283 variable expression, 284e285 variant pathogenicity insufficient evidence, 285e286 Genetic variation describing variants, 16e20 DNA sequence changes, 10e11 deletion-insertions, 11 deletions, 10 insertions, 10e11 inversions, 11 structural variation, 11 substitutions, 10 gene symbols, 16 Human Genome Variation Society nomenclature, 18e20, 19t in-frame variants, 13 introns, 13e14 nonsense variants, 13 other variation, 15 promoter region, 12 protein-coding region, 12e13 protein sequence changes, 11 reference sequences, 16 RNA sequence changes, 11 splice region, 13e14 splice sites, 13e14 standards on, 15e20 start codon, 12 stop codon, 14 30 untranslated region (30 UTR), 14e15 50 untranslated region (50 UTR), 12 variant call format (VCF), 17e18 variant consequences by location, 11e15 Genome-scale sequencing (GS) available annotations, 240e242 basic gene, 239 clinical applications, 237e238 criteria, 242e246 diagnostic findings returned, 243

405

406

Index

Genome-scale sequencing (GS) (Continued ) diagnostics, 237e238 filtration approaches, 240e242 inheritance patterns, 240 Mendelian disease risk, 244e245 low penetrance, 246 medical actionability, 245 patient population, 245 patient preferences, 245 pharmacogenetic variants, 246 predictive capacity, 244 recessive disease, carrier status, 245e246 risk factors, 246 minor allele frequencies (MAFs), 239 phenotype associations, 240 population frequency data, 239 publication data, 240 research applications, 238 screening, 238 secondary and screening findings returned, 243e246, 244f variant annotation/filtration, 238e240 variant-level data, 239 Genome-wide association studies (GWAS), 1e2, 399e400 Genomic data commons (GDC), 175 Genomic variant interpretation active inclusion/exclusion, 262 available information completeness, 262 causative variants, 266 classical high-sensitivity features, 262 clinical and phenotypic information, 257e264 clinical data sources, 258e260 cases from clinical networks, 259 patient under investigation, 259 publicly available clinical evidence, 259e260 clinical diagnostic testing, 254e256 clinical diagnostic variant interpretation historic empirical disease-based interpretation, 256 international coordination, variant data sharing, 256e257 international frameworks, 257 clinical geneticist, 251e253 clinical genetics services, 251 frequency information, 263 frequency of variation, 264 genetic consultations, 253e254 genetic heterogeneity, 263 genomic architecture of disease, 254e256 genomic testing, 253e254 knowledgebase underpinning, 254e256 large-scale data generation, 256 locus-specific databases, 260 management based, 269e270

mode of inheritance, 264 observed phenotypic feature(s), 262e263 pathogenic variant, 266e269 contextualizing risk estimation, 267 hypomorphic variants, 267e268 individualized risk estimation, 266e269 moderate risk genes, 268 nongenetic factors, 269 oligogenic modifier variants, 268e269 other genetic factors, 268e269 polygenic modifiers, 269 risk estimates, 266e267 pathogenic variation within a gene, 263 patient based genomic data, 265e270 phenotype assessment, 260e261 phenotypic data under evaluation, 261 scientific literature, 259e260 segregation of disease, 264 technological advances, 254 uncertain significance, 265e266 variant information repositories, 260 variant interpretation clinical data, 261 Genotyping, 297e298 Germline/Somatic Variant Subcommittee (GSVS), 169 Gnomad, 175

H Hearing loss (HL) central auditory dysfunction, 305e306 conductive hearing loss, 305e306 cookie-bite, 305 genetic tests, 307e309 mild HL, 305 mixed HL, 305e306 moderate HL, 305 moderately severe HL, 305 molecular diagnosis, 309e316 apparent non-syndromic hearing loss, 309e310 deafness genes, 314e316 molecular karyotyping, 313e314 more than one gene involved, large families with, 310e313 noise-notched configuration, 305 non-syndromic (NSHL), 306 postlingual HL, 306 prelingual HL, 306 profound HL, 305 rising configuration, 305 sensorineural HL, 305e306 severe HL, 305 sloping configuration, 305 syndromic HL, 306

Index

U-shaped configuration, 305 Hereditary Breast and Ovarian Cancer syndrome (HBOC), 30 Hereditary cancer genes, genetic variants in ATM-associated susceptibility to breast cancer, 350 BRCA1/2-associated hereditary breast and ovarian cancer syndrome, 349 case presentation, 351e352 Lynch syndrome, 350e351 pathogenicity assessment of the variant, 352e358 variant information, 352 High-throughput assays, 158 Human Genome Variation Society (HGVS), 3, 18e20, 19t Human Phenotype Ontology (HPO), 22e23 Human Variome Project (HVP), 4e5 Hypertrophiccardiomyopathy (HCM), 277e278

I In-frame variants, 13 Inherited cardiomyopathies, 277e278 arrhythmogenic right ventricular cardiomyopathy (ARVC), 277e278, 280 cardiac patients genetic screening, 287 cardiac variants secondary findings, 287 dilated cardiomyopathy (DCM), 277e279 experimental evidence, 286e287 functional data, 286e287 genetic testing, 281 additional genetic variants, 282 age- and sex-related penetrance, 282 at-risk relatives identification, 281e282 clinical screening, 281e282 gene-directed treatments, 282 incomplete penetrance, 282 incomplete phenotype information, 283e285 segregation in family, 282e283 variable expression, 284e285 variant pathogenicity insufficient evidence, 285e286 hypertrophiccardiomyopathy (HCM), 277e278 inherited cardiomyopathies, 277e278 inherited heart diseases, 277e287 insufficient variant information, 285e286 left ventricular noncompaction (LVNC), 277e278, 281 phenotyping, 286e287 restrictive cardiomyopathy (RCM), 277e278, 281 Inherited heart diseases, 277e287 In silico prediction algorithms, 330e331 In silico tools, variant effects, 149e150 Insufficient variant information, 285e286 Intrinsically disordered domains (IDDs), 122 Introns, 13e14, 123fe124f retention, 123fe124f In vivo assays, 158e159

L Large-scale sequencing projects, 3 LDLR, 325e329 Left ventricular noncompaction (LVNC), 277e278, 281 Leukemia predisposition genes, 184 Likelihood ratios (LR), 43e48, 47f categorical data proportions, 46 combining likelihood ratios, 48 complex categorical data, 47e48 continuous variables calibration, 48 Long-range sequence features, 125e127 Loss of heterozygosity (LOH), 176 Lynch syndrome (LS), 154

M Maternal PKU, 297 Medical genetics, 1 Medical pedigree, 59e60 Mendelian disease risk, 244e245 low penetrance, 246 medical actionability, 245 patient population, 245 patient preferences, 245 pharmacogenetic variants, 246 predictive capacity, 244 recessive disease, carrier status, 245e246 risk factors, 246 Mild hearing loss, 305 Minor allele frequencies (MAFs), 239 Mis-RNA splicing, 110e111 Missense variants, 13 Mitochondrial genome (mtDNA), 16 Mixed hearing loss, 305e306 MLH1 c.2041G>A allele frequency, 377 allelic data, 381 biological and clinical interpretation, 383 case presentation, 376 computational and predictive data, 379 coverage of exon, 377 evidence and final classification, 382 functional data, 379e380 multifactorial likelihood analysis, 351 non-ACMG/AMP data, 382 pathogenicity assessment, 377e382 population data, 377 segregation data, 380 MMR tumor characteristics, 52 Moderate hearing loss, 305 Moderately severe hearing loss, 305 Molecular karyotyping, 313e314 Molecular pathology, 77e81

407

408

Index

Molecular pathology (Continued ) hereditary cancer predisposition, 78e80 breast cancer, 79e80 molecular pathology marker, 78e79 newborn screening, 80e81 ovarian cancer, 79e80 tumor first sequencing, 78 Mosaicism, 80e81 Li-Fraumeni syndrome (LFS), 83e84 mosaic polycystic kidney disease, 83 neurofibromatosis type 1 (NF1), 83 next-generation sequencing, 82 presentations, 82e84 somatic vs. germ line mosaicism, 81e82 testing strategies, 82 Multiplex ligation-dependent probe amplification (MLPA), 147 Mutational hot spots, 179e180 Mutually exclusive exons, 123fe124f

N National Center for Biotechnology Information (NCBI), 16 Newborn screening, 294 Nonnormalized variant calling, 24 Nonsense variants, 13 Non-syndromic hearing loss, 306 Noonan syndrome, 392

O Online Mendelian Inheritance in Man (OMIM), 1

P Pathogenicity of alleles, 184 Pathogenicity prior probability, 49 Pathogenic variant co-occurrence with, 53 PCSK9, 329 PECAN.stjude.org, 172 Pharmacogenetics abacavir, 208e209 adverse drug reactions(ADRs), 193 antiretroviral therapy, 208e209 Canadian Pharmacogenomics Network for Drug Safety (CPNDS), 199 carbamazepine, 200e201 clinical guidelines, 198e200 clinical pharmacogenetics implementation consortium (CPIC), 198e199 clinical practice, 209e210 clopidogrel, 201e204 codeine, 207e208 complete PGx annotation tools, 199

CYP3A5, 206e207 CYP2C19, 201e204 CYP2D6, 207e208 databases/resources, 196e198 PGx level, 197b PharmGKB, 196e198, 197b decision support tools, 198e200 DPYD, 204e205 Dutch Pharmacogenetics Working Group (DPWG), 199 fluoropyrimidines, 204e205 HLA-B, 208e209 maculopapular exanthema (MPE), 200 major histocompatibility complex (MHC), 201 NUDT15, 205e206 oxcarbazepine, 200e201 pain relief, 207e208 personalized medicine, 193e194 personalized medicine, future perspectives of, 210e212 PGx consortia, clinical guidelines from, 198e199 pharmacogene variation (PharmVar), 198 pharmacometabolomics (PMx), 211e212 tacrolimus, 206e207 technologies, 196 thiopurine methyltransferase, 205e206 toxic epidermal necrolysis (TEN), 200 variant nomenclature HLA nomenclature, 195 Human GenomeVariation Society (HGVS), 195 PharmVar reference sequences, 194b star allele nomenclature, 194e195 Pharmacogenomic testing, 4 PharmCat, 199 Phenotype description, 59 Phenylalanine hydroxylase (PAH), 291 Phenylketonuria (PKU) biochemical classification, 295t case studies, 299e302 classification of, 295 clinical symptoms, 294 definition, 291 diagnosis, 294e295 genetic counseling, 296 genotyping, 297e298 history, 291e293 incidence, 295 management, 296e297 maternal PKU, 297 newborn screening, 294 phenylalanine hydroxylase (PAH), 291 practical genotype-phenotype correlation, 298e302 Polygenic risk scores (PRS), 399 Population allele frequency

Index

allele frequency thresholds, 66 autosomal dominant (AD), 69, 69f autosomal recessive, 69e70, 70f benign evidence criteria, thresholds used for, 67e68 cosegregation, 75e76 limitations, 77 phenotyping, 76 family history, 69e77 inheritance analysis, 74, 74f inheritance patterns, 69e73 MAF thresholds, 66e68 mitochondrial inheritance, 71e73, 73f pathogenic evidence criteria, thresholds used for, 68 population size, 68e69 X-linked dominant, 71, 72f X-linked recessive, 70, 71f Y-linked, 71, 73f Population-based data, 53e54 case-control data, 54 healthy adult individuals, 54 population frequency, 53e54 Population frequency data, 239 Population genetic resources, 60e65 ascertainment, 63 ascertainment bias, 63e64 expected variant frequency, 62e63 fitness-reproductive success, 60e61 genetics studies, matched controls in, 65 Hardy-Weinberg equilibrium, 61e62 healthy individuals ascertainment, 64 individuals with disease ascertainment, 64e65 population ethnic background, 62 prevalence of disease, 62 Postlingual hearing loss, 306 Prelingual hearing loss, 306 Profound hearing loss, 305 Protein truncation test (PTT), 147 Pseudogenes, 24

Q Quantitative modeling BRCA1/2 assays, 51 caveats, 54–55 complete in vitro mismatch repair activity (CIMRA), 50 components of, 49–54 bioinformatic predictions, 50 cosegregation, 50 pathogenicity prior probability, 49 considerations, 54–55 likelihood ratios (LR), 43–48, 47f categorical data proportions, 46


combining likelihood ratios, 48 complex categorical data, 47–48 continuous variables calibration, 48 pathogenic variant co-occurrence with, 53 population-based data, 53–54 case-control data, 54 healthy adult individuals, 54 population frequency, 53–54 TP53, 50–52 tumor characteristics, 52–53 BRCA1/2 breast cancer histopathology, 52–53 MMR tumor characteristics, 52 TP53 breast cancer histopathology, 53 TP53 somatic/germ line ratio, 53 variant interpretation, 41–42

R RASopathies Cardio-facio-cutaneous syndrome, 392 case-level evidence criteria, 391–392 classification of variants, 389–390 Costello syndrome, 392 definition, 389 general evidence criteria, 390–391 gene-specific evidence criteria, 391 Noonan syndrome, 392 unknown RASopathy diagnosis, 394 Reference transcript, 122, 125 Restrictive cardiomyopathy (RCM), 277–278, 281 Rhodopsin (RHO), 156–157 RNA-seq tumor data, 180–181 RNA splicing factors, 110

S Severe hearing loss, 305 Shift events, 123f–124f Short sequence motifs, 125–127 Single-nucleotide substitution, 11 Somatic data resources, 170–175 Somatic/germ line data biallelic inactivation, 178–179 biomarker considerations, 183–184 cancer hotspots, 172 catalogue of somatic mutations in cancer (COSMIC), 171 cBioPortal, 171–172 characterization, 174t CIViC, 172–175 ClinVar, 175 control database for comparison, 175 copy-neutral LOH, 178 data repositories considered for, 174t


Somatic/germ line data (Continued) dominant phenotypes integrating population, 184 genomic data commons (GDC), 175 gnomAD, 175 laboratory practices, 175 limitations, 172t loss of heterozygosity (LOH), 169, 176 mutational hot spots, 179–180 OMIM, 175 pathogenicity of alleles, 184 PECAN.stjude.org, 172 predisposition, 184–185 candidate predisposition genes, 185 leukemia predisposition genes, 184 principles, 176 recognizing clonal evolution, 184–185 RNA-seq tumor data, 180–181 somatic data resources, 170–175 tumor signatures, 181–182, 182t breast cancer, 182–183 CHEK2 variants, 182–183 variants of uncertain significance (VUS), 170 Spliceogenic variants, 122, 125–127 Splicing isoforms, 122, 125 Stargazer, 199–200 Syndromic hearing loss, 306 Synonymous variants, 13

T TP53, 50–52 breast cancer histopathology, 53 somatic/germ line ratio, 53 Trans-acting, 127–129

U Unknown RASopathy diagnosis, 394 3′ Untranslated region (3′ UTR), 14–15 5′ Untranslated region (5′ UTR), 12 Untranslated regions (UTRs), 122 U-shaped configuration, 305

V Validation, 152 Variant call format (VCF), 17–18 Variant classification challenges and considerations, 23–24 clinical classification, 21–22 disorders and phenotypes, 22–23 variant classification, 20–21 Variant Curation Expert panels (VCEPs), 35 Variant interpretation, 41–42, 324–331, 326t–328t Variant-level data, 239 Variant of uncertain significance (VUS), 21, 147–149, 148f, 155t, 170 challenge of, 147–149 Variant pathogenicity assessment, 149

W Whole-genome sequencing (WGS), 399