English Pages 431 [413] Year 2018
Methods in Molecular Biology 1834
Bernhard H. F. Weber Thomas Langmann Editors
Retinal Degeneration Methods and Protocols Second Edition
Methods in Molecular Biology
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, AL10 9AB, UK
For further volumes: http://www.springer.com/series/7651
Retinal Degeneration Methods and Protocols Second Edition
Edited by
Bernhard H. F. Weber Institute of Human Genetics, University of Regensburg, Regensburg, Germany
Thomas Langmann Department of Ophthalmology, University of Cologne, Cologne, Germany
ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-4939-8668-2 ISBN 978-1-4939-8669-9 (eBook) https://doi.org/10.1007/978-1-4939-8669-9 Library of Congress Control Number: 2018956140 © Springer Science+Business Media, LLC, part of Springer Nature 2019 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana Press imprint is published by the registered company Springer Science+Business Media, LLC part of Springer Nature. The registered company address is: 233 Spring Street, New York, NY 10013, U.S.A.
Preface
The first edition of Retinal Degeneration: Methods and Protocols was published in 2013, at the time providing a comprehensive step-by-step guide to relevant and state-of-the-art methods for studying retinal homeostasis and disease. Since then, retinal research has witnessed profound and far-reaching methodological developments, not only in basic but also in clinical approaches to understanding retinal function and treating disease pathology. To many experts in the field, such significant and rapid progress may not have come unexpectedly; it was clearly the result of a number of groundbreaking technological advances in recent times. First and foremost, there were radical innovations in high-throughput technologies, most notably in large-scale sequence analysis of genomes and transcriptomes, allowing new strategic approaches from disease gene identification to understanding cellular processes to designing novel therapeutic targets for retinal diseases. Inevitably, such advances produce high-dimensional and complex data, and their contextual evaluation is unthinkable without parallel developments in computational capabilities. Other areas of major advances concern the field of stem cell research, specifically the possibility of generating induced pluripotent stem cells from differentiated adult cells, and importantly from patient cells, as well as the most recent applications of genome editing in mammalian cells. Alone and in combination, these advances have found widespread application in retinal research within the last few years. Volume 2 of Retinal Degeneration: Methods and Protocols responds to these advancements by providing key updates for a number of earlier chapters and by including novel chapters addressing the most recent technological developments and their applications in retinal research.
Volume 2 of Retinal Degeneration: Methods and Protocols has grouped related topics into five main parts. Part I Molecular Genetics and Tools provides an introductory chapter with an updated overview of gene identification approaches in human retinal disease followed by newly added method chapters describing state-of-the-art methodologies such as the analysis of differential gene expression using high-throughput transcriptome sequencing, molecular-based screening techniques for retinal degeneration mutations in laboratory mouse strains, CRISPR/Cas9 gene editing, and monitoring surface reactions by analyzing complement factors in the retina. Part II Cell/Tissue Culture Models comprises detailed protocols to generate functional retinal pigment epithelium cells from human induced pluripotent stem cells, to analyze photoreceptor outer segment phagocytosis, to establish porcine RPE/choroidal explant cultures, to generate 3D retinal tissue from mouse embryonic stem cells, to analyze cell death in retinal cultures, and to fate map microglia in retinal disease. In Part III Animal Models methodological details are given for the mouse retina as a model for light damage and oxygen-induced retinopathy. Other animal models in retinal research include Xenopus laevis, zebrafish, and the fruit fly Drosophila melanogaster, each one reflecting defined aspects of retinal biology and disease. Part IV Retinal Imaging covers advanced fundus imaging and angiography as well as optical coherence tomography (OCT) suited to comprehensively phenotype the mammalian retina. For immunofluorescence microscopy, cell-specific protein markers are characterized in depth
and the immuno-TEM/STEM system is introduced for immuno-EM, a method that can determine protein localization at the ultrastructural level. Finally, two-photon excitation microscopy allows visualization of membrane structures and imaging deep into the retina. Lastly, in Part V Therapies an introductory chapter offers a comprehensive overview of cell-based treatment approaches. Additional chapters explore various transport vehicles for retinal gene therapy such as AAV vectors or nanoparticles, the latter specifically targeting retinal and choroidal capillaries, and optimized techniques to deliver such vehicles into the subretinal space in the mouse eye.
We are most grateful to all expert contributors of chapters in this volume, whose detailed and comprehensive descriptions of methods and approaches ensure that the reader and experimenter will be able to translate even complex techniques directly into their own line of work. We hope that the manifold tips and additional suggestions will spare you the failure and frustration of ill-working or ill-designed experiments and instead help you to successfully establish an efficient new experimental approach in your laboratory. Often it is the minor methodological detail not generally provided in the regular literature that makes the difference for an experiment to work to full satisfaction and to yield meaningful results. We are deeply indebted to John Walker, who once again entrusted us with the task of guest editing a further edition of Retinal Degeneration: Methods and Protocols. As before, he has greatly supported these efforts by sharing his knowledge and his many insightful experiences in preparing this edition.
Bernhard H. F. Weber, Regensburg, Germany
Thomas Langmann, Cologne, Germany
Contents
Preface
Contributors
Part I Molecular Genetics and Tools
1 Identification and Analysis of Genes Associated with Inherited Retinal Diseases
   Mubeen Khan, Zeinab Fadaie, Stéphanie S. Cornelis, Frans P. M. Cremers, and Susanne Roosing
2 Conduct and Quality Control of Differential Gene Expression Analysis Using High-Throughput Transcriptome Sequencing (RNASeq)
   Felix Grassmann
3 Testing for Known Retinal Degeneration Mutants in Mouse Strains
   Khalid Rashid, Katharina Dannhausen, and Thomas Langmann
4 CRISPR/Cas9 Gene Editing In Vitro and in Retinal Cells In Vivo
   Daniela Benati, Valeria Marigo, and Alessandra Recchia
5 Monitoring Surface Reactions by Combined Western Blot-ELISA Analysis
   Yuchen Lin, Sarah Irmscher, and Christine Skerka
Part II Cell/Tissue Culture Models
6 Generation of Functional Retinal Pigment Epithelium from Human Induced Pluripotent Stem Cells
   Caroline Brandl
7 Advanced Analysis of Photoreceptor Outer Segment Phagocytosis by RPE Cells in Culture
   Francesca Mazzoni, Yingyu Mao, and Silvia C. Finnemann
8 Porcine RPE/Choroidal Explant Cultures
   Alexa Klettner and Yoko Miura
9 The Mouse Retinal Organoid Trisection Recipe: Efficient Generation of 3D Retinal Tissue from Mouse Embryonic Stem Cells
   Manuela Völkner, Thomas Kurth, and Mike O. Karl
10 Cell Death Analysis in Retinal Cultures
   Sarah L. Roche, Ana M. Ruiz-Lopez, and Thomas G. Cotter
11 Fate Mapping In Vivo to Distinguish Bona Fide Microglia Versus Recruited Monocyte-Derived Macrophages in Retinal Disease
   Nancy J. Reyes, Rose Mathew, and Daniel R. Saban
Part III Animal Models
12 Light Damage Models of Retinal Degeneration
   Christian Grimm and Charlotte E. Remé
13 Induction and Readout of Oxygen-Induced Retinopathy
   Raffael Liegl, Claudia Priglinger, and Andreas Ohlmann
14 Generation and Analysis of Xenopus laevis Models of Retinal Degeneration Using CRISPR/Cas9
   Joanna M. Feehan, Paloma Stanar, Beatrice M. Tam, Colette Chiu, and Orson L. Moritz
15 Gene Knockdown in Zebrafish (Danio rerio) as a Tool to Model Photoreceptor Diseases
   Holger Dill and Utz Fischer
16 Drosophila melanogaster: A Valuable Genetic Model Organism to Elucidate the Biology of Retinitis Pigmentosa
   Malte Lehmann, Elisabeth Knust, and Sarita Hebbar
Part IV Retinal Imaging
17 Retinal Fundus Imaging in Mouse Models of Retinal Diseases
   Anne F. Alex, Maged Alnawaiseh, Peter Heiduschka, and Nicole Eter
18 Phenotyping of Mouse Models with OCT
   G. Alex Ochakovski and M. Dominik Fischer
19 Cell-Specific Markers for the Identification of Retinal Cells and Subcellular Organelles by Immunofluorescence Microscopy
   Laurie L. Molday, Christiana L. Cheng, and Robert S. Molday
20 Immuno-TEM/STEM in Retinal Research
   Sanae Sakami and Krzysztof Palczewski
21 Noninvasive Two-Photon Microscopy Imaging of Mouse Retina and Retinal Pigment Epithelium
   Grazyna Palczewska, Timothy S. Kern, and Krzysztof Palczewski
22 Analysis of the Drosophila Compound Eye with Light and Electron Microscopy
   Monalisa Mishra and Elisabeth Knust
Part V Therapies
23 Cell-Based Therapy for Retinal Disease: The New Frontier
   Marco Zarbin
24 In Vitro Evaluation of AAV Vectors for Retinal Gene Therapy
   Johanna E. Wagner, Christian Schön, Elvir Becirovic, Martin Biel, and Stylianos Michalakis
25 Nanoparticles Targeting Retinal and Choroidal Capillaries In Vivo
   Alexandra Haunberger and Achim Goepferich
26 Optimized Subretinal Injection Technique for Gene Therapy Approaches
   Regine Mühlfriedel, Stylianos Michalakis, Marina Garcia Garrido, Vithiyanjali Sothilingam, Christian Schön, Martin Biel, and Mathias W. Seeliger
Index
Contributors
Anne F. Alex • Department of Ophthalmology, University of Muenster Medical School, Muenster, Germany
Maged Alnawaiseh • Department of Ophthalmology, University of Muenster Medical School, Muenster, Germany
Elvir Becirovic • Department of Pharmacy—Center for Drug Research, Center for Integrated Protein Science Munich (CiPSM), Ludwig-Maximilians-Universität München, Munich, Germany
Daniela Benati • Department of Life Sciences, Centre for Regenerative Medicine, University of Modena and Reggio Emilia, Modena, Italy
Martin Biel • Department of Pharmacy—Center for Drug Research, Center for Integrated Protein Science Munich (CiPSM), Ludwig-Maximilians-Universität München, Munich, Germany
Caroline Brandl • Department of Ophthalmology, University Hospital Regensburg, Regensburg, Germany; Institute of Human Genetics, University of Regensburg, Regensburg, Germany; Department of Genetic Epidemiology, University of Regensburg, Regensburg, Germany
Christiana L. Cheng • Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
Colette Chiu • Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, BC, Canada
Stéphanie S. Cornelis • Department of Human Genetics, Donders Institute for Brain Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
Thomas G. Cotter • Tumour Biology Laboratory, School of Biochemistry and Cell Biology, Bioscience Research Institute, University College Cork, Cork, Ireland
Frans P. M. Cremers • Department of Human Genetics, Donders Institute for Brain Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
Katharina Dannhausen • Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
Holger Dill • Department of Biochemistry, Biocentre, University of Wuerzburg, Würzburg, Germany
Nicole Eter • Department of Ophthalmology, University of Muenster Medical School, Muenster, Germany
Zeinab Fadaie • Department of Human Genetics, Donders Institute for Brain Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
Joanna M. Feehan • Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, BC, Canada; The Sainsbury Laboratory, Colney Ln, Norwich Research Park, Norwich, Norfolk, UK
Silvia C. Finnemann • Department of Biological Sciences, Center for Cancer, Genetic Diseases and Gene Regulation, Larkin Hall, Fordham University, Bronx, NY, USA
M. Dominik Fischer • Centre for Ophthalmology, University Eye Hospital, University of Tuebingen, Tuebingen, Germany
Utz Fischer • Department of Biochemistry, Biocentre, University of Wuerzburg, Würzburg, Germany
Marina Garcia Garrido • Division of Ocular Neurodegeneration, Institute for Ophthalmic Research, Centre for Ophthalmology, Eberhard Karls Universität Tübingen, Tübingen, Germany
Achim Goepferich • Department of Pharmaceutical Technology, University of Regensburg, Regensburg, Germany
Felix Grassmann • Institute of Human Genetics, University of Regensburg, Regensburg, Germany; Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
Christian Grimm • Lab for Retinal Cell Biology, Department of Ophthalmology, University of Zürich, Zürich, Switzerland
Alexandra Haunberger • Department of Pharmaceutical Technology, University of Regensburg, Regensburg, Germany
Sarita Hebbar • Max-Planck-Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Peter Heiduschka • Department of Ophthalmology, University of Muenster Medical School, Muenster, Germany
Sarah Irmscher • Department of Infection Biology, Leibniz Institute for Natural Product Research and Infection Biology, Jena, Germany
Mike O. Karl • German Center for Neurodegenerative Diseases Dresden (DZNE), Dresden, Germany; Technische Universität Dresden, Center for Regenerative Therapies Dresden (CRTD), Dresden, Germany
Timothy S. Kern • Department of Pharmacology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Mubeen Khan • Department of Human Genetics, Donders Institute for Brain Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
Alexa Klettner • Department of Ophthalmology, University Medical Center, University of Kiel, Kiel, Germany
Elisabeth Knust • Max-Planck-Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Thomas Kurth • Technische Universität Dresden, Center for Regenerative Therapies Dresden (CRTD), Dresden, Germany
Thomas Langmann • Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
Malte Lehmann • Max-Planck-Institute of Molecular Cell Biology and Genetics, Dresden, Germany
Raffael Liegl • Department of Ophthalmology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
Yuchen Lin • Department of Infection Biology, Leibniz Institute for Natural Product Research and Infection Biology, Jena, Germany
Yingyu Mao • Department of Biological Sciences, Center for Cancer, Genetic Diseases and Gene Regulation, Larkin Hall, Fordham University, Bronx, NY, USA
Valeria Marigo • Department of Life Sciences, University of Modena and Reggio Emilia, Modena, Italy
Rose Mathew • Department of Ophthalmology, Duke University School of Medicine, Durham, NC, USA
Francesca Mazzoni • Department of Biological Sciences, Center for Cancer, Genetic Diseases and Gene Regulation, Larkin Hall, Fordham University, Bronx, NY, USA
Stylianos Michalakis • Department of Pharmacy—Center for Drug Research, Center for Integrated Protein Science Munich (CiPSM), Ludwig-Maximilians-Universität München, Munich, Germany
Monalisa Mishra • National Institute of Technology Rourkela (NITR), Rourkela, Odisha, India
Yoko Miura • Institute of Biomedical Optics, University of Lübeck, Lübeck, Germany; Department of Ophthalmology, University of Lübeck, Lübeck, Germany
Laurie L. Molday • Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
Robert S. Molday • Department of Biochemistry and Molecular Biology, University of British Columbia, Vancouver, BC, Canada
Orson L. Moritz • Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, BC, Canada
Regine Mühlfriedel • Division of Ocular Neurodegeneration, Institute for Ophthalmic Research, Centre for Ophthalmology, Eberhard Karls Universität Tübingen, Tübingen, Germany
G. Alex Ochakovski • Centre for Ophthalmology, University Eye Hospital, University of Tuebingen, Tuebingen, Germany
Andreas Ohlmann • Department of Ophthalmology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
Grazyna Palczewska • Polgenix Inc., Cleveland, OH, USA
Krzysztof Palczewski • Department of Pharmacology, Cleveland Center for Membrane and Structural Biology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Claudia Priglinger • Department of Ophthalmology, University Hospital, Ludwig-Maximilians-University Munich, Munich, Germany
Khalid Rashid • Laboratory for Experimental Immunology of the Eye, Department of Ophthalmology, University of Cologne, Cologne, Germany
Alessandra Recchia • Department of Life Sciences, Centre for Regenerative Medicine, University of Modena and Reggio Emilia, Modena, Italy
Charlotte E. Remé • University Zürich, Zürich, Switzerland
Nancy J. Reyes • Department of Ophthalmology, Duke University School of Medicine, Durham, NC, USA
Sarah L. Roche • Tumour Biology Laboratory, School of Biochemistry and Cell Biology, Bioscience Research Institute, University College Cork, Cork, Ireland
Susanne Roosing • Department of Human Genetics, Donders Institute for Brain Cognition and Behaviour, Radboud University Medical Center, Nijmegen, The Netherlands
Ana M. Ruiz-Lopez • Tumour Biology Laboratory, School of Biochemistry and Cell Biology, Bioscience Research Institute, University College Cork, Cork, Ireland
Daniel R. Saban • Department of Ophthalmology, Duke University School of Medicine, Durham, NC, USA
Sanae Sakami • Department of Pharmacology, Cleveland Center for Membrane and Structural Biology, Case Western Reserve University School of Medicine, Cleveland, OH, USA
Christian Schön • Department of Pharmacy—Center for Drug Research, Center for Integrated Protein Science Munich (CiPSM), Ludwig-Maximilians-Universität München, Munich, Germany
Mathias W. Seeliger • Division of Ocular Neurodegeneration, Institute for Ophthalmic Research, Centre for Ophthalmology, Eberhard Karls Universität Tübingen, Tübingen, Germany
Christine Skerka • Department of Infection Biology, Leibniz Institute for Natural Product Research and Infection Biology, Jena, Germany
Vithiyanjali Sothilingam • Division of Ocular Neurodegeneration, Institute for Ophthalmic Research, Centre for Ophthalmology, Eberhard Karls Universität Tübingen, Tübingen, Germany
Paloma Stanar • Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, BC, Canada
Beatrice M. Tam • Department of Ophthalmology and Visual Sciences, University of British Columbia, Vancouver, BC, Canada
Manuela Völkner • German Center for Neurodegenerative Diseases Dresden (DZNE), Dresden, Germany
Johanna E. Wagner • Department of Pharmacy—Center for Drug Research, Center for Integrated Protein Science Munich (CiPSM), Ludwig-Maximilians-Universität München, Munich, Germany
Marco Zarbin • Institute of Ophthalmology and Visual Science, Rutgers-New Jersey Medical School, Rutgers University, Newark, NJ, USA
Part I Molecular Genetics and Tools
Chapter 1
Identification and Analysis of Genes Associated with Inherited Retinal Diseases
Mubeen Khan, Zeinab Fadaie, Stéphanie S. Cornelis, Frans P. M. Cremers, and Susanne Roosing
Abstract
Inherited retinal diseases (IRDs) display a very high degree of clinical and genetic heterogeneity, which poses challenges in finding the underlying defects in known IRD-associated genes and in identifying novel IRD-associated genes. Knowledge of the molecular and clinical aspects of IRDs has increased tremendously in the last decade. Here, we outline the state-of-the-art techniques to find the causative genetic variants, with special attention to next-generation sequencing, which can combine molecular diagnostics and retinal disease gene identification. An important aspect is the functional assessment of rare variants whose RNA and protein effects can only be predicted in silico. We therefore describe the in vitro assessment of putative splice defects in human embryonic kidney cells. In addition, we outline the use of stem cell technology to generate photoreceptor precursor cells from patients' somatic cells, which can subsequently be used for RNA and protein studies. Finally, we outline the in silico methods to interpret the causality of variants associated with inherited retinal disease and the registry of these variants.
Key words Inherited retinal diseases, Genome sequencing, RNA splicing assays, Variant interpretation, Variant registry
1 Introduction
1.1 Spectrum of Inherited Retinal Diseases
Inherited retinal diseases (IRDs) represent a clinically and genetically heterogeneous group of disorders affecting the retina. These diseases can be classified clinically based on whether they predominantly affect the rods (e.g., retinitis pigmentosa, RP) or the cones (e.g., cone and cone-rod dystrophies, CD/CRD) or cause a more generalized photoreceptor disease (e.g., Leber congenital amaurosis, LCA) [1]. Most IRDs are associated with a gradual deterioration throughout life, while some appear nonprogressive (e.g., congenital stationary night blindness, achromatopsia, some forms of LCA). The differential diagnosis between various IRDs can sometimes be complicated, as the clinical features of some IRDs can be very similar, both at early and late stages (Fig. 1).
Bernhard H. F. Weber and Thomas Langmann (eds.), Retinal Degeneration: Methods and Protocols, Methods in Molecular Biology, vol. 1834, https://doi.org/10.1007/978-1-4939-8669-9_1, © Springer Science+Business Media, LLC, part of Springer Nature 2019
Fig. 1 Phenotypic overlap between autosomal recessive retinal diseases. Patients with achromatopsia (ACHM) display a virtually stationary disease course in which cones are principally defective. At end stages, cone dystrophy (CD) can hardly be distinguished from cone-rod dystrophy (CRD). Patients with Stargardt disease (STGD1) later in life show mid-peripheral defects similar to CRD patients. Patients with retinitis pigmentosa (RP) initially display night blindness, followed by tunnel vision due to rod defects which very often progresses to complete blindness when the cones are also afflicted. In patients with Leber congenital amaurosis (LCA), the defects can occur in both types of photoreceptors, or in Müller or RPE cells, and therefore both clinical and molecular genetic overlap with CD, CRD, or RP can be expected. Patients with congenital stationary night blindness (CSNB) show a rod-specific defect
IRDs can be inherited in an autosomal recessive, autosomal dominant, or X-linked mode (Fig. 2). For isolated males and females with RP (Fig. 2b, c, h, i), all inheritance patterns have been observed, which is partially explained by the observation that de novo mutations are a significant cause of dominant non-familial disease [2, 3]. Although some diseases are caused by mutations in a relatively small number of genes, the most prevalent IRDs are genetically highly heterogeneous with many causative genes (Fig. 3). The extreme example is RP, which has currently been associated with mutations in 84 different genes. To date, mutations in 261 genes have been identified in patients with non-syndromic and syndromic IRDs, and it is estimated that these genes account for ~80% of the genetic disease load [4, 5]. As the most recently identified novel genetic defects in IRDs are found in single cases or families, it is difficult to predict how many IRD-associated genes are yet to be identified. All known IRD-associated genes and the corresponding modes of inheritance can be found at http://www.sph.uth.tmc.edu/RetNet/.
[Fig. 2 pedigree panels and the modes of inheritance labeled above them: A: AD; B: AR>XL>AD*; C: AR>AD*>XL; D: AD, MI; E: XL>AD#>AR; F: AD>ARPD; G: AR>XL; H: AR>XL>AD*; I: AR>AD*>XL; J: AR]
Fig. 2 Examples of the different modes of inheritance observed in retinal diseases. Illustration of inheritance models based on the occurrence and gender of affected individuals and their position in the pedigree. In bold above the pedigrees, the most likely modes of inheritance are given, followed by the less likely modes of inheritance. AD autosomal dominant, AR autosomal recessive, MI mitochondrial inheritance, PD pseudodominant (autosomal recessive) inheritance, XL X-linked recessive, *de novo mutation, #non-penetrant individual
1.2 The Changing Landscape of Retinal Disease Gene Identification
The methods and tools available for gene identification have continuously evolved in the last three decades. The first retinal disease-associated gene identified was the ornithine aminotransferase (OAT) gene involved in gyrate atrophy. An enzymatic defect of ornithine aminotransferase activity was measured in patients' cells in 1977 [6], and 11 years later the OAT gene was cloned and the first mutation was identified [7]. In 1990, mutations in the rhodopsin gene were identified in patients with autosomal dominant retinitis pigmentosa using a candidate gene approach [8], after linkage analysis in a large Irish adRP family had pointed toward a genomic region encompassing this gene [9]. In the same year, the choroideremia (CHM) gene was identified using a positional cloning approach by mapping deletions in patients with syndromic and non-syndromic choroideremia [10].
Fig. 3 Genetic overlap between non-syndromic monogenic retinal diseases. Clinical diagnoses are indicated by colored circles. In the overlapping areas, we provide the number of genes implicated in different phenotypes. Colored numbers outside the balloons represent the total number of genes associated with these phenotypes. RP retinitis pigmentosa, CSNB congenital stationary night blindness, LCA Leber congenital amaurosis, CD/CRD cone and cone-rod dystrophies, MD macular degeneration, EVR exudative vitreoretinopathies
The candidate gene approach, i.e., the search for IRD-associated variants in genes encoding proteins with known crucial functions in the retina, has been very successful. The identification of IRD-associated genes through their genomic position (i.e., positional cloning) as determined by linkage analysis (see Subheading 2.8) has been used effectively, though this generally requires the availability of large families or a large set of families in which the same locus is involved. In the early years, linkage analysis using polymorphic microsatellite markers was labor-intensive, but it became a fast, genome-wide approach with the development of microarray technology allowing rapid genotyping of thousands of single nucleotide polymorphisms (SNPs) spread across the genome. SNP microarrays have also proven very valuable for homozygosity and identity-by-descent (IBD) mapping of recessive disease genes (see Subheading 2.5), not only in consanguineous families but also in small families and single patients of non-consanguineous marriages [11].
We are witnessing a new era in disease gene identification with the introduction of next-generation sequencing, allowing the analysis of all genes in a defined linkage interval, all exons in the genome (whole exome sequencing, WES), or even the entire genomic sequence (whole genome sequencing, WGS). This also brings new challenges, such as data analysis and interpretation of genomic variants. Given the huge number of variants present in a patient's genome, positional information on where the causative gene may be localized (e.g., by linkage analysis and/or homozygosity mapping) remains very helpful to pinpoint the genetic defect. Employing WGS, thousands of rare single nucleotide variants (SNVs) and structural variations (SVs) are found in every individual, and it remains very challenging to identify the causal variant(s). A functional readout is required to identify the culprit variant(s). Gene-specific mRNA analysis or genome-wide mRNA analysis (transcriptome analysis) may identify quantitative or structural defects in mRNAs.
1.3 Importance of Molecular Diagnostics
Receiving a molecular diagnosis is becoming increasingly important with the development of (gene) therapy for IRDs [4, 5]. Up to 9 years ago, it was not possible to slow down, stabilize, or treat the vision impairment in patients with IRDs. This changed for a small group of patients with RPE65 mutations, as gene augmentation was successfully and safely applied through subretinal injections of recombinant adeno-associated viruses (rAAVs) in Phase 1/2 trials [12–14]. rAAVs transduce the retinal pigment epithelium (RPE) cells, upon which the viruses are shuttled to the nucleus, and the rAAV vector remains a stable extrachromosomal element. In the meantime, many more patients have been treated in two centers in Philadelphia and one in London. Vision improvement was variable and generally modest, and treatment appears to be more effective in younger patients. A Phase 3 trial was conducted in one center using an improved rAAV vector, which resulted in increased subjective and objective vision in the treated eye versus the untreated eye [15]. Gene augmentation targeting photoreceptors and the RPE was also successfully performed in a Phase 1/2 trial in choroideremia patients [16, 17]. In addition, an oral 9-cis retinoid supplementation therapy seems effective in patients with RPE65 and LRAT mutations [18, 19]. Several therapies that will be developed in the next years will be gene- or even mutation-specific, which emphasizes how important it is for patients to receive a molecular diagnosis. An overview of all ongoing gene therapy trials can be found at http://clinicaltrials.gov.
To provide a more accurate prognosis, and to determine which forms of IRD would most likely benefit from (gene) therapy, patients should be thoroughly clinically examined using standardized protocols. Optical coherence tomography (OCT) studies have shown that certain forms of IRDs are likely less suitable for therapy, while in other forms photoreceptors remain viable for a prolonged period [20]. IRDs are sometimes the first sign of syndromic disease, such as Senior-Løken syndrome, which involves renal failure. Since the ocular phenotype precedes the manifestation of kidney abnormalities, there is often a delay in the diagnosis of nephronophthisis. This carries a risk of sudden death from fluid and electrolyte imbalance. An early molecular diagnosis allows physicians to monitor patients carrying mutations in genes associated with a syndrome more closely and to provide better healthcare for kidney disease and other systemic features. Nephronophthisis patients who receive kidney transplants have excellent outcomes compared with the general pediatric transplant population.
2 Techniques
2.1 Sanger Sequencing
Sanger sequencing is still the gold standard of DNA sequencing and mutation identification. Sanger sequencing is based on the incorporation of deoxynucleotides and fluorochrome-labeled dideoxynucleotides using DNA polymerase, the latter of which abrogate the replication of a DNA fragment at random positions. In this way, a mixture of DNA fragments is synthesized and size separated through capillary electrophoresis. The most widely used apparatus (Applied Biosystems) can analyze up to 96 samples in parallel. Sanger sequencing is preceded by PCR amplification of a DNA fragment of interest and is the most widely used sequencing technique for a limited number of exons or amplicons. Its advantages are its accuracy, flexibility, and speed. The costs are relatively low as long as the number of amplicons is limited, i.e., fewer than 20. If NGS-based techniques are available, genes with more exons or amplicons are better sequenced using NGS-based approaches (see below). Sanger sequencing is, however, the preferred method to study the segregation of variants in family members of the IRD proband.
2.2 Next-Generation Sequencing
Although Sanger sequencing is considered the gold standard for sequencing, it is not appropriate for the identification of disease-associated genes and variants involved in genetically heterogeneous disorders [2, 21]. In the last decade, next-generation sequencing (NGS) or massively parallel sequencing has been developed and continuously improved in terms of accuracy and throughput. This revolutionized all aspects of biological sciences and healthcare
through the identification of an enormous number of genomic variants and their role in disease etiology [22, 23]. These technological advancements sparked the identification of not only novel causative variants but also new causal genes for IRDs. Since the NGS-based discovery of TSPAN12 as the gene underlying familial exudative vitreoretinopathy, 53 new IRD-associated genes have been identified using this technology [24].
2.3 Targeted Next-Generation Sequencing
With the emergence of high-throughput sequencing technologies, many human genomes have been completely sequenced. However, due to the relatively high costs and the complexity of data analysis, it is not economically feasible to screen many individuals with genetically heterogeneous disorders. Therefore, to overcome these issues, genomic regions of interest are often selectively enriched and sequenced using NGS, commonly known as targeted NGS or panel-based sequencing [25]. Targeted approaches have several advantages over the holistic approach, i.e., lower costs in research settings, deeper coverage, and faster generation of data [26]. Targeted approaches have been widely used to investigate novel variants in a large group of known genes associated with genetic disorders as well as to study other specific genomic regions such as CpG islands and regulatory elements [27]. This approach has therefore provided a better understanding of the genetic etiology of heterogeneous disorders such as IRDs [3]. To enrich targeted regions, various methods can be used depending on the aims of the study. The most widely used enrichment strategies include hybridization approaches, i.e., multiplexed array hybridization (NimbleGen) or in-solution hybridization (Agilent SureSelect Target Enrichment System, NimbleGen SeqCap EZ), highly multiplexed PCR (molecular inversion probes), and targeted circularization (HaloPlex, Agilent Technologies) [28–33]. Of all these enrichment approaches, the molecular inversion probe-based multiplex PCR enrichment strategy has shown several advantages over other capture methods in terms of cost, ease of use, sensitivity, and specificity. It can be employed for simultaneous analysis of hundreds of patients, with costs per case dropping as the number of analyzed cases increases ([34], FPM Cremers, personal communication).
2.4 Whole Exome Sequencing
More than 20 different NGS platforms are commercially available. However, the most widely used platforms are Life Technologies Systems (Ion Torrent, Proton, and SOLiD 5500xl) and the Illumina platforms (e.g., NextSeq 500, MiSeq, Genome Analyzer IIx). Although all these platforms have a much higher throughput than conventional Sanger sequencing, they still differ significantly. Important differences concern the read length and number of reads produced in a specific run (e.g., SOLiD 5500xl, up to 75 bp (+35 bp) reads; HiSeq 2000, up to 2 × 100 bp reads) [35].
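Read length and the number of reads per run together determine how deeply a target can be sequenced. The following is a minimal sketch of the idealized relation depth = (number of reads × read length) / target size. It ignores duplicate reads, off-target reads, and uneven capture, and the 50 Mb target size and read counts are illustrative assumptions, not platform specifications.

```python
import math


def mean_depth(n_reads: int, read_length_bp: int, target_size_bp: int) -> float:
    """Idealized mean depth: total sequenced bases divided by target size."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return n_reads * read_length_bp / target_size_bp


def reads_needed(depth: float, read_length_bp: int, target_size_bp: int) -> int:
    """Number of reads required to reach an idealized mean depth (rounded up)."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(depth * target_size_bp / read_length_bp)


# Hypothetical example: a 50 Mb capture target sequenced with 2 x 100 bp
# read pairs, counting each pair as 200 sequenced bases.
TARGET_BP = 50_000_000
depth = mean_depth(n_reads=12_500_000, read_length_bp=200, target_size_bp=TARGET_BP)
pairs = reads_needed(depth=50, read_length_bp=200, target_size_bp=TARGET_BP)
print(f"idealized mean depth: {depth:.0f}x, read pairs for 50x: {pairs}")
```

In practice the achieved depth is lower and less uniform than this estimate, which is why a fraction of the target remains poorly covered even at an acceptable median coverage.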
As these technologies focus on the protein-coding elements of human genes, they rely on the enrichment of small fragments. This enrichment is hampered by sequences with low or high GC content. The latter are frequently observed in 5′ regions of genes. A median coverage of 50x is considered acceptable but generally will leave up to 5% of coding regions uncovered or poorly covered [36]. As described below, better coverage is achieved using newly emerging NGS platforms including Pacific Biosciences (RS II and Sequel) and Oxford Nanopore (MK 1 MinION and PromethION), using a single-molecule real-time sequencing approach that generates longer reads, ranging between 8 and 200 kb [37, 38].
2.5 Whole Genome Sequencing: Short-Read Versus Long-Read Techniques (Pros and Cons)
WES is not able to detect deep-intronic variants or SVs that might affect RNA splicing or transcription [21]. WGS is a more comprehensive method to detect genetic defects [39] as it enables identification of SNVs and SVs such as complex genomic rearrangements, large deletions, and insertions [40–42]. The costs of WGS are continuously decreasing, which makes the technology accessible to more laboratories, while the analysis of huge amounts of genomic variant data remains challenging [43]. First, approximately three million variants are detected in each sample. Furthermore, determining the effect of regulatory and intronic variants is still complex, even using currently available prediction programs [21, 44]. Generally, there are two different methods for performing WGS, long reads versus short reads. The sequencing read length depends on the purpose of the experiment and also the properties of the instruments. In regular short-read sequencing, the read length is between 25 and 100 bp, whereas 10–15 kb is usually sequenced in the long-read approach [45–47]. Long-read sequencing is often performed when the aim is to identify SVs in the genome, such as CNVs, and it provides a high level of completeness in detecting these, especially for regions with repetitive nucleotides or hairpin-like structures. For example, WES typically will not enrich for RPGR ORF15 sequences, a hotspot for mutations causing X-linked RP and CRD, due to their repetitive nature. The disadvantages of long-read sequencing, compared to short-read sequencing, are that it is ~5 times more expensive ($1000 versus $200 per GB) [38] and that it has a 15% error rate, mostly indels [48]. Reducing the error rate is crucial for the use of these reads in, e.g., de novo genome assembly [49, 50]. Transposable elements also complicate genome reconstruction due to their high sequence identity, high copy number, or complexity in genomic rearrangements [51].
Short-read sequencing is the preferred method for the detection of single nucleotide variations (SNVs) due to its deeper coverage, accuracy [52], and low cost. For this purpose, the reads should align sufficiently and uniquely with the reference sequence, which can be achieved in general with a 25–100 bp read length [45].
2.6 Homozygosity Mapping
In the past decades, microsatellite markers and short tandem repeat polymorphisms were used to detect the disease locus using linkage analysis and homozygosity mapping. These multi-allelic markers were replaced by SNP arrays that contain hundreds of thousands of biallelic markers, which can be analyzed rapidly [53, 54]. WES and WGS data contain SNP data which can now be used to perform homozygosity mapping [55]. This not only can pinpoint the culprit gene or variant in consanguineous families in which the offspring of first-cousin marriages on average show homozygosity in 10% of their genome [56] but also can be informative to identify the causal variant, and sometimes a novel causal gene, in non-consanguineous families [11, 57].
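The core of homozygosity mapping, finding long stretches of consecutive homozygous SNP calls, can be sketched as follows. This is a simplified illustration and not the algorithm of any particular tool: the function name, the thresholds (`min_snps`, `min_length_bp`), and the toy genotypes are assumptions, and real implementations usually tolerate occasional heterozygous or missing calls within a run.

```python
from typing import List, Tuple


def runs_of_homozygosity(
    positions: List[int],
    genotypes: List[str],
    min_snps: int = 3,
    min_length_bp: int = 1_000_000,
) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for each run of homozygous SNP calls.

    `positions` and `genotypes` describe SNPs on one chromosome, sorted by
    position. A genotype is a two-letter call such as "AA" or "AG"; any
    heterozygous or missing call (e.g. "--") ends the current run.
    """
    if len(positions) != len(genotypes):
        raise ValueError("positions and genotypes must have the same length")

    runs: List[Tuple[int, int, int]] = []
    run_start = None  # index of the first SNP in the current run
    # Iterate one step past the end so that a run reaching the last SNP is closed.
    for i in range(len(genotypes) + 1):
        is_hom = False
        if i < len(genotypes):
            gt = genotypes[i]
            is_hom = len(gt) == 2 and gt[0] == gt[1] and gt[0] in "ACGT"
        if is_hom:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            n_snps = i - run_start
            length = positions[i - 1] - positions[run_start]
            if n_snps >= min_snps and length >= min_length_bp:
                runs.append((positions[run_start], positions[i - 1], n_snps))
            run_start = None
    return runs


# Toy data (hypothetical positions and calls on a single chromosome).
positions = [100_000, 600_000, 1_200_000, 1_900_000, 2_500_000, 2_600_000, 2_700_000]
genotypes = ["AG", "AA", "CC", "TT", "GG", "CT", "AA"]
print(runs_of_homozygosity(positions, genotypes))
```

Candidate variants can then be prioritized when they are homozygous and fall inside one of the reported intervals.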
2.7 Copy Number Variation Mapping
Copy number variations (CNVs) are among the most significant SVs in the human genome and involve DNA fragments typically longer than 50 bp [58], whereas smaller elements are categorized as small insertions or deletions (indels) [59]. Depending on the stringency of the map, 4.8–9.5% of the human genome contributes to CNVs [58], and every healthy individual carries almost one thousand CNVs [60, 61]. CNVs play an important role in human diversity, where they are estimated to account for a 1.2% difference from the reference human genome, as well as in disease susceptibility [60, 62]. The phenotypic effect of CNVs can vary from evolutionary changes to embryonic lethality, although the adaptive traits can differ between environments [63, 64]. Pathogenic CNVs can be associated with inherited (monogenic) diseases, such as IRDs, as well as with multifactorial diseases. WES analysis genetically solves 55–60% of these cases [65, 66]. The hidden genetic variation may consist of unrecognized CNVs and deep-intronic variants, which can be identified by WGS or gene-specific locus sequencing. CNVs can explain 18% of previously unsolved cases [67, 68]. Copy number, content, and positional information are the three genomic features that should be considered in order to identify CNVs precisely [69]. Accordingly, CNV maps based on healthy populations of different ethnic groups were designed, which can be used to accurately assess the variability of the human genome [58]. The map consists of microscopic (>3 Mb) and submicroscopic (50 bp to 3 Mb) variations [58]. The phenotype-first research approach has uncovered more deletions than duplications, as gain-of-function variants are generally associated with milder phenotypes. The consequence of a milder phenotype is a lower selective pressure, which becomes visible when employing genotype-first research [64]. There are two different approaches for studying CNVs: array-based and NGS-based. Overall, NGS-based approaches have a higher sensitivity and resolution than array-based approaches and are able to generate more precise sequence-level breakpoint resolution [62, 70]. However, duplications are more likely to be detected by array CGH than by SNP-based array or by NGS techniques [71]. In addition, WES can be used for CNV analysis in order to combine the detection of small and large variants [69]. Nevertheless, the orientation and location of duplicated sequences are difficult to determine, and some SVs such as inversions cannot be identified by WES analysis [72]. Consequently, WGS is a more accurate and reliable technique for studying CNVs. Furthermore, it gives us the opportunity to analyze not only the exonic regions but also the noncoding elements of the genome such as promoters, untranslated regions, and intronic sequences, as well as enhancers and insulators. It allows all types of SVs in the human genome to be recognized and is the ultimate goal of genetic testing in diagnostic laboratories, although it will still take a few years before it becomes the standard diagnostic method [58, 69, 73].
2.8 Linkage Analysis
Although WES and WGS potentially can identify causal defects in single patients, in some families with multiple affected cases, the causal defect cannot be identified in the absence of a functional readout. In this situation, linkage and haplotype analysis can narrow down the search for the causal defect. Linkage analysis, following genome-wide SNP genotyping, can be performed to determine the chromosomal region that segregates with a trait. The logarithm of the odds (LOD) score is the log10 ratio of the likelihood that the disease locus and a given genomic marker (e.g., SNP) are linked versus the likelihood that they are unlinked and is generally used as an outcome measure in linkage calculations. In order to reach statistically significant locus assignment by genome-wide genotyping, a minimum LOD score of 3.3 has to be obtained, whereas a LOD score of 1.86 is suggestive of linkage [74]. The more individuals (both affected and unaffected) are genotyped in the linkage analysis, the higher the final LOD score can be. Generally, linkage analysis is only performed if a LOD score of >2 can be obtained with the available relatives, which can be calculated by a linkage simulation prior to the actual genotyping. Issues that occasionally can interfere with linkage analysis are the occurrence of phenocopies (i.e., affected relatives with the same phenotype but a different (genetic) cause) or non-penetrance (i.e., the occurrence of individuals who carry the same causative mutation but do not, or hardly, display the clinical phenotype). Especially in some dominant retinal diseases, e.g., familial exudative vitreoretinopathy [75] and adRP caused by mutations in
PRPF31 [76], non-penetrance is frequently observed. The actual linkage calculations can be performed by freely available software programs, like LINKAGE, Allegro, Genehunter, or SimWalk2. Graphical user interfaces for linkage analysis software on Microsoft Windows-based operating systems, like easyLINKAGE [77] or Alohomora [78], allow each of these programs to be used with manually adjustable settings for the mode of inheritance, ethnic origin of the family, disease prevalence, and penetrance, among others. A more detailed description of linkage analysis is provided in several textbooks [79, 80]. Next, we review how to interpret the causality of variants associated with inherited retinal disease and the registries of these variants.
2.9 In Vitro RNA Splice Assays
Variants that have a potential effect on RNA splicing can be tested directly by reverse transcription PCR of mRNA extracted from accessible human tissues such as lymphoblasts or fibroblasts. As mRNA carrying a protein-truncating mutation will generally undergo nonsense-mediated decay, unless the mutation lies in the last exon or in the last 50–55 nucleotides of the penultimate exon, it is preferred to suppress nonsense-mediated decay using, e.g., cycloheximide in the last phase of culturing cells. The majority of IRD-associated genes are, however, not or only poorly expressed in these non-ocular tissues. Alternatively, in vitro splice assays can be performed in human embryonic kidney (HEK) cells. Classically, the exonic or intronic segment to be analyzed is amplified from genomic DNA from a patient carrying the potential splice defect and cloned into a minigene splicing vector that contains a ubiquitously expressed promoter, a transcriptional start site, and at least two exons of another gene that flank the cloning site. Minigenes carrying the wild-type sequence and minigenes containing the variant are transfected in parallel into HEK293T cells, and after 48 h, cells are collected for RNA extraction. RT-PCR using primers annealing to the flanking vector exons is then performed to analyze the effect of the putative splice variant [81, 82]. We recently found that when using small minigenes that lack the proper genomic context, i.e., flanking exons and cis-acting elements that influence exon recognition, in vitro results do not always correlate with splice defects observed in patient cells and also are inadequate to rigorously test the effect of deep-intronic variants. We therefore designed an alternative strategy to generate sizeable multi-exonic splice vectors by employing bacterial artificial chromosome (BAC) DNA spanning the entire gene of interest.
In this manner, vectors can be designed in which each exon is flanked by at least one downstream and upstream exon and their corresponding introns allowing variants to be tested in their genomic context without the influence of the splice site of flanking exons used in the original vector. BAC clones facilitated the generation of
large, multi-exon wild-type splice vectors. This approach can readily be applied to generate minigenes from all human genes known to be involved in IRDs or other diseases. In this way, deep-intronic, noncanonical splice site or even coding variants with unclear functional effects can be assessed for their effect on splicing, as BAC clones are available for this purpose [83, 84].
2.10 In Vivo Splice Studies Using Stem Cell Technology
In vitro splice assays in HEK293T cells have one disadvantage, i.e., retina-specific splicing patterns may be missed. As shown for the consequences of a deep-intronic variant in CEP290, defective splicing (in this case pseudo-exon insertion) can be more prominent in retina-like cells than in fibroblasts [85]. We and others used stem cell technology to generate induced pluripotent stem cells (iPSCs) from patients’ fibroblasts and subsequently differentiated the iPSCs into photoreceptor precursor cells (PPCs) [85–89]. In this way we were able to reveal the causative effect of an apparently mild noncanonical splice site variant, c.5461-10T>C, which turned out to result in the skipping of exon 39 or exons 39 and 40, rendering this the most frequent severe ABCA4 variant in Stargardt disease [86]. PPCs also have been used to pinpoint the splicing defects due to many other deep-intronic variants (S. Albert, A. Garanto, R. Sangermano, M. Khan, R. Collin, F. Cremers, unpublished data).
2.11 Variant Data Interpretation
After NGS has been used to sequence a patient’s exome or genome, an enormous number of variants needs to be analyzed and filtered to find possible disease-causing variants. Even after filtering, a number of putative pathogenic variants often remain. To find out which variant is most likely pathogenic, the American College of Medical Genetics and Genomics (ACMG) guidelines can be used. When a likely pathogenic variant is found in a gene that is not yet known to cause disease, online tools like GeneMatcher (https://genematcher.org/) can be used.
2.11.1 American College of Medical Genetics and Genomics
To prioritize novel rare variants according to their functional impact, the American College of Medical Genetics and Genomics (ACMG) provided a system to classify variants along a five-tier gradient from benign, likely benign, uncertain significance, likely pathogenic, to pathogenic [44]. The system was developed by a collaboration of several experts in the field and made use of 11 classification protocols from different diagnostic groups. Furthermore, it was tested by classifying variants of which the pathogenicity was already known. For each variant one should check, among others, if the variant or the gene is already reported to be involved in the disease, whether the prediction of a truncating variant is reliable, whether
the variant segregates, what the allele frequency of the variant is, and what in silico pathogenicity prediction programs predict for this variant. In silico prediction programs mainly focus on the effect of missense variants, such as PolyPhen [90], SIFT [91], and MutationTaster [92], or on the effect on splicing, such as Human Splicing Finder [93], GeneSplicer [94], and MaxEntScan [95]. Each pathogenicity criterion that is met is assigned a weight: PVS (very strong), PS (strong), PM (moderate), or PP (supporting). When all the criteria are checked, the guidelines provide rules to combine these weights into the five-tier classification. For example, few strong criteria need to be met to classify a variant as pathogenic, whereas three or more moderate criteria on their own classify a variant only as likely pathogenic. A disadvantage of the ACMG guidelines is that any proof or indication against pathogenicity will overrule the pathogenicity classification. This means that one incorrect study outweighs multiple other studies. The guidelines clearly show how complex the pathogenicity classification of a variant is and clearly mention that even when this strict system is followed, there still is a chance that the classification is incorrect due to unforeseen factors that could play a role [44].
2.11.2 GeneMatcher
Likely pathogenic variants that match the genetic inheritance pattern can be found in genes of which the function is unknown. This gene then becomes a candidate gene for the disease. Especially for candidate disease genes of patients with a rare phenotype, the online web tool GeneMatcher (https://genematcher.org/) was developed. This tool allows researchers and clinicians to upload a gene with information about the identified genetic variants and the phenotype found in the patient that carries this variant or combination of variants. When a new gene is submitted to GeneMatcher and this gene is already present in GeneMatcher, submitters of this gene automatically receive an email and can get in touch [96].
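The matching principle described above (a new submission for a gene that is already present puts the submitters in contact) can be illustrated with a toy in-memory model. This is only a sketch of the idea: the class `GeneRegistry`, its `submit` method, and the example gene symbol and addresses are hypothetical and do not reflect GeneMatcher's actual implementation or interface.

```python
from collections import defaultdict
from typing import Dict, List


class GeneRegistry:
    """Toy model of gene-based matching between submitters."""

    def __init__(self) -> None:
        # Maps an upper-cased gene symbol to the submitters who entered it.
        self._by_gene: Dict[str, List[str]] = defaultdict(list)

    def submit(self, gene: str, submitter: str) -> List[str]:
        """Register a gene for a submitter and return earlier submitters of
        the same gene, i.e. the people who would be put in contact."""
        key = gene.strip().upper()
        earlier = [s for s in self._by_gene[key] if s != submitter]
        if submitter not in self._by_gene[key]:
            self._by_gene[key].append(submitter)
        return earlier


registry = GeneRegistry()
first = registry.submit("GENE1", "lab_a@example.org")    # no match yet
second = registry.submit("gene1 ", "lab_b@example.org")  # matches lab A
print(first, second)
```

Matching on the gene symbol alone keeps the sketch short; phenotype and variant details, which submitters also provide, would be compared by the researchers once they are in touch.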
2.11.3 European Retinal Disease Consortium
Specifically for IRD candidate genes, the European Retinal Disease Consortium (ERDC) was set up. The ERDC consists of 20 European research groups, 1 Canadian group, and 1 group from the USA, who collaborate to unravel the genetics of rare IRDs (http://www.erdc.info/).
2.12 Sequence Variant Registries and Databases
Several databases provide variant information per gene, which can help to classify a variant as pathogenic. Databases that provide the phenotypic information that accompanies variants can be especially useful. ClinVar [97] and Leiden Open (source) Variation Databases (LOVDs) [98] are examples of databases that provide this type of genotype-phenotype information. Most databases make use of the Human Genome Variation Society (HGVS) nomenclature (http://varnomen.hgvs.org/) to describe variants in a consistent manner.
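To illustrate why a consistent nomenclature matters for automated comparison of variants between databases, the sketch below parses one simple class of HGVS descriptions, namely coding DNA substitutions such as the ABCA4 variant c.5461-10T>C discussed in Subheading 2.10. The regular expression is a deliberately simplified assumption that covers only this subset; UTR positions (c.-N, c.*N), deletions, duplications, insertions, and reference sequence prefixes are not handled.

```python
import re
from typing import NamedTuple, Optional


class CodingSubstitution(NamedTuple):
    position: int  # nearest coding nucleotide
    offset: int    # intronic offset relative to that nucleotide (0 if exonic)
    ref: str       # reference nucleotide
    alt: str       # variant nucleotide


# Simplified pattern: "c." + position + optional +/- intronic offset + REF>ALT
_SUBSTITUTION = re.compile(r"^c\.(\d+)([+-]\d+)?([ACGT])>([ACGT])$")


def parse_substitution(description: str) -> Optional[CodingSubstitution]:
    """Parse a simple coding DNA substitution; return None if unsupported."""
    match = _SUBSTITUTION.match(description.strip())
    if match is None:
        return None
    position, offset, ref, alt = match.groups()
    return CodingSubstitution(int(position), int(offset) if offset else 0, ref, alt)


variant = parse_substitution("c.5461-10T>C")
print(variant)
print(parse_substitution("c.5461_5462del"))  # outside the supported subset
```

For real data, a maintained HGVS parsing and validation library is preferable to a hand-written pattern of this kind.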
2.12.1 ClinVar
One of the databases that aims at improving the translation of genetic information into clinical healthcare is ClinVar, located at http://www.ncbi.nlm.nih.gov/clinvar, which is part of the NCBI’s Entrez system. A genetic variant with an accompanying phenotype can be uploaded, as long as the submission is based on clinical testing, research, or literature curation. Each submitted variant gets a record ID, to which also the submitter and the phenotype of the patient are added, as well as the curation of the variant being benign or pathogenic, as described by the submitter. Results from functional studies, e.g., in vitro studies, are not accepted as evidence for pathogenicity. In 2015, ClinVar added a star system to improve the reliability of the pathogenicity classification:
● Zero star: no assertion criteria were provided by the submitter.
● One star: criteria are met by one submitter.
● Two stars: multiple single submitters meet the criteria, and their interpretation is similar.
● Three stars: criteria are met and reviewed by an expert panel.
● Four stars: practice guidelines were met, meaning on top of the met criteria and revision by an expert panel, a rating system as described on their website is used and external revision took place (https://www.ncbi.nlm.nih.gov/clinvar/docs/assertion_criteria/).
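The star levels listed above can be used to filter records by review status before variant classifications are relied upon. The sketch below is a minimal illustration: the status strings are assumed to follow ClinVar's review status wording and should be checked against the current ClinVar documentation, and the record layout (`id`, `review_status`) and example records are hypothetical.

```python
from typing import Dict, List

# Assumed review status wording mapped to the star levels described above.
STARS: Dict[str, int] = {
    "no assertion criteria provided": 0,
    "criteria provided, single submitter": 1,
    "criteria provided, multiple submitters, no conflicts": 2,
    "reviewed by expert panel": 3,
    "practice guideline": 4,
}


def review_stars(status: str) -> int:
    """Return the star level for a review status; unknown statuses count as 0."""
    return STARS.get(status.strip().lower(), 0)


def with_min_stars(records: List[dict], min_stars: int) -> List[dict]:
    """Keep only records whose review status reaches `min_stars`."""
    return [r for r in records if review_stars(r["review_status"]) >= min_stars]


# Hypothetical records for illustration only.
records = [
    {"id": "rec1", "review_status": "criteria provided, single submitter"},
    {"id": "rec2", "review_status": "reviewed by expert panel"},
    {"id": "rec3", "review_status": "no assertion criteria provided"},
]
print([r["id"] for r in with_min_stars(records, 2)])
```

Treating unknown statuses as zero stars is a conservative design choice, so that unrecognized wording never raises the apparent reliability of a record.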
The classification of the variant’s pathogenicity is thus based on the interpretation of the submitter and is therefore not consistent, although ClinVar encourages the use of the ACMG guidelines. Furthermore, the submitter is able to add “evidence,” a description of how variants were called and some context about the variant and its inheritance. The strength of ClinVar is also its downside: on the one hand, because everyone can submit their data, it is difficult to judge the reliability of a ClinVar record. On the other hand, because everyone can submit their data, ClinVar contains over 158,000 variant interpretations [97].
2.12.2 Leiden Open (Source) Variation Database
The Leiden Open (source) Variation Database (LOVD) is in many ways similar to ClinVar and contains over 155,000 variants identified in over 22,000 genes (July 31, 2017), including ~20 IRD-associated genes [99]. The database contains variant information related to human phenotypes. The main difference between LOVDs and ClinVar is that the variants in an LOVD are curated by an expert who is responsible for checking all variant records for a certain gene before they are uploaded to the database. Moreover, the variants are uploaded per patient, which allows one to look at the frequency of a variant within a patient group. Furthermore, there is space for the phenotype description and family information, and a reference needs to be given so that the user can look up more information [98].
Comprehensive gene-specific databases were set up for the Norrie disease pseudoglioma (NDP) gene, associated with X-linked Norrie disease and vitreoretinopathy (http://www.medmolgen.uzh.ch/research/eyediseases/norriedisease/Norrinmutations.html), and CEP290, associated with LCA, early-onset RP, and several syndromic retinopathies (http://www.LOVD.nl/CEP290; https://cep290base.cmgg.be/index.php). A large proportion of variants in Bardet-Biedl syndrome (BBS)-associated genes can also be accessed online (https://lovd.euro-wabb.org/status.php). Comprehensive Leiden Open (source) Variation Databases (LOVDs) have been developed for nine genes implicated in Usher syndrome (https://grenada.lumc.nl/LOVD2/Usher_montpellier/USHbases.html) [100]. Comprehensive LOVDs have thus far been created for only ten other IRD-associated genes (ABCA4, www.LOVD.nl/ABCA4 [101]; AIPL1, http://www.LOVD.nl/AIPL1; CHM, https://grenada.lumc.nl/LOVD2/Usher_montpellier/home.php?select_db=CHM; CRB1, http://www.LOVD.nl/CRB1 [102]; EYS, http://www.LOVD.nl/EYS [103]; LCA5, http://www.LOVD.nl/LCA5 [104]; RDH5, http://www.LOVD.nl/RDH5; RPE65, http://www.LOVD.nl/RPE65; SEMA4A, http://www.LOVD.nl/SEM4A4; TULP1, http://www.LOVD.nl/TULP1). LOVDs for any gene can be searched using GeneSymbol.LOVD.nl.
2.12.3 Human Gene Mutation Database
The Human Gene Mutation Database (HGMD) is another database that contains variants that are found in patients and are likely disease causing. All the variants in the database have been published in the literature and are manually curated. A subset of the variant data, i.e., 141,000 variants, is freely available for academic institutions and nonprofit organizations, although registration is needed. The pathogenicity classification system of HGMD is also five-tier based:
● Disease-causing mutations (DM), when the author of the paper reporting the variant has established that the reported variant is involved in the phenotype
● Possible/probable disease-causing mutations (DM?), when the author, the curators, or other literature indicates that there is doubt about the pathogenicity of the variant
● Disease-associated polymorphism (DP), when a significant association has been reported between the polymorphism and the disease
● Functional polymorphism (FP), when a functional effect of the variant has been shown in research, but no disease association has been reported so far
● Disease-associated polymorphisms with supporting functional evidence (DFP), when the polymorphism is both associated with disease and has been shown to have a functional effect
There is also a commercial version of HGMD, which contains over 203,000 variants. The information in HGMD gives an overview per variant with minimal information, including a reference to the literature. Interestingly, in 2013, Cassa et al. published a study indicating that HGMD likely contains variants that are erroneously described as pathogenic: 4.6% of the variants in HGMD have an allele frequency >0.01, and 3.5% of the variants even have an allele frequency of >0.05. Furthermore, 8.5% of a large set of pathogenic variants tested were found in asymptomatic individuals [105]. These variants erroneously classified as pathogenic are also likely to be present in ClinVar and LOVD, as those also contain variants from the literature. Also, Abouelhoda et al. categorized many HGMD variants as benign by the use of a Saudi Arabian dataset of healthy consanguineous individuals, who show many homozygous rare variants [106]. HGMD has since removed ~1000 “retired records,” which were erroneously included in HGMD. It is unclear whether these were removed from the public or the commercial version of HGMD [107]. Multiple studies have shown that variant databases often do not agree on the classification of variants or report classifications based on too little supportive evidence. In 2013, Vail et al. reviewed the pathogenicity classification of 2017 variants in BRCA1 and BRCA2 by five different variant databases, including ClinVar, LOVD, and HGMD. Among variants present in two of the five databases, over 30% of those classified as pathogenic in one database were not classified as such in the other; the same was true for ~50% of those classified as a variant of unknown significance in one of the two databases and for ~90% of those classified as benign in one of the two. When variants were present in more than two databases, the percentage of discrepancy in most cases increased even more.
Of ClinVar, LOVD, and HGMD, LOVD was the database whose pathogenic classifications, assessed for a subset of the variants, were most often based on appropriate literature (40% of the cases) [108].
2.12.4 iEYE
When no pathogenic variants have been found after whole exome sequencing (WES), the patient's variant data are often not used further. iEYE is a private database (K. Cisarova and C. Rivolta, unpublished data) in which WES or WGS variant data can be scrutinized for the relevance of novel rare variants and compared between patients. ERDC groups have uploaded WES or WGS variant data of patients with inherited retinal diseases to iEYE. In this way, all submitters can analyze both solved and unsolved genetic data of IRD patients to check whether their newly found candidate genes or candidate variants also show abnormalities in other IRD patients.
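The kind of cohort lookup described above can be illustrated with a short Python sketch. iEYE is a private database and its interface is not described here, so everything below is an assumption made for illustration: the in-memory cohort layout, the function name `carriers_of_rare_variants`, the placeholder gene names, and the allele-frequency cutoff of 0.005.

```python
def carriers_of_rare_variants(cohort, gene, max_allele_frequency=0.005,
                              unsolved_only=False):
    """Return {patient_id: [variant, ...]} for patients carrying at least
    one rare variant in `gene`. All field names are hypothetical."""
    hits = {}
    for patient in cohort:
        if unsolved_only and patient["solved"]:
            continue
        rare = [
            v["variant"]
            for v in patient["variants"]
            if v["gene"] == gene
            and v["allele_frequency"] <= max_allele_frequency
        ]
        if rare:
            hits[patient["id"]] = rare
    return hits


# Invented cohort: GENE_X and GENE_Y are placeholders, not real gene symbols.
cohort = [
    {"id": "P1", "solved": False, "variants": [
        {"gene": "GENE_X", "variant": "c.100A>G", "allele_frequency": 0.0001},
        {"gene": "GENE_Y", "variant": "c.5C>T", "allele_frequency": 0.2},
    ]},
    {"id": "P2", "solved": True, "variants": [
        {"gene": "GENE_X", "variant": "c.7del", "allele_frequency": 0.04},
    ]},
    {"id": "P3", "solved": True, "variants": [
        {"gene": "GENE_X", "variant": "c.250C>T", "allele_frequency": 0.0},
    ]},
]

print(carriers_of_rare_variants(cohort, "GENE_X"))
# {'P1': ['c.100A>G'], 'P3': ['c.250C>T']}
print(carriers_of_rare_variants(cohort, "GENE_X", unsolved_only=True))
# {'P1': ['c.100A>G']}
```

Restricting the search to unsolved cases is one possible design choice: a rare variant in a candidate gene is more informative when the carrier has no other genetic explanation for the disease.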
Identification and Analysis of Genes Associated with Inherited Retinal Diseases
3 Future Outlook
The identification of SNVs and SVs in IRD patients will become more straightforward as sequencing technologies keep improving in both sensitivity and cost-effectiveness. Long-read WGS represents the most attractive technology. Data storage costs and variant data interpretation will become more important.
Novel candidate IRD-associated genes are identified continuously, but an increasing number have been implicated in only a single IRD patient or family. Within the ERDC, 130 IRD candidate genes have been identified, a small subset of which is publicly available (www.erdc.info) (S. Roosing and FPM Cremers, unpublished data). Worldwide sharing of genotype data, however, would increase the pace at which additional families with variants in the same gene are found, thereby strengthening the candidacy of these genes.
Modeling the identified mutations in animal models is also important to provide proof for the causality of gene defects. The advent of CRISPR/Cas has opened great opportunities for IRD research [109]. Apart from the modeling of gene defects in animals, novel functional assays need to be developed to assess the effect of variants at the RNA and protein levels. As mentioned above, in the absence of patients’ somatic cells that express the gene of interest, robust in vitro RNA splice assays can be set up for every human gene. When retina-specific splice defects may play a role, photoreceptor precursor cells can be derived from iPSCs generated from blood cells or fibroblasts. This technology has been used successfully to analyze the effect of known coding and noncoding variants in IRDs. Whether it can also be used successfully to identify elusive causal variants in large genomic regions or in entire genomes remains to be seen.
Only a few examples of digenic inheritance and modifier genes for IRDs have been reported [110–115]. Nevertheless, there are many examples of significant differences between phenotypes (e.g., age at onset) in IRD cases that carry the same mutation(s), both within and between families. Reduced penetrance of variants might explain several autosomal dominant conditions, but as yet we have few clues regarding the genetic and possibly nongenetic modifiers. To study the mechanisms of variable expression and non-penetrance, large case/control cohorts and genome-wide analysis techniques such as WES and WGS are required.
Acknowledgments
The work of M.K. is supported by the Rotterdamse Stichting Blindenbelangen, the Stichting Blindenhulp, the Stichting tot Verbetering van het Lot der Blinden, and the Stichting Blinden-Penning (to F.P.M.C. and S.R.). The work of Z.F. is supported by the Foundation Fighting Blindness USA Project Program Award grant no. PPA-0517-0717-RAD (to F.P.M.C. and S.R.). The work of M.K. and S.C. is supported by the RP Fighting Blindness, UK, grant no. GR591 (to F.P.M.C.). The work of S.C. is supported by the Fighting Blindness, Ireland (to F.P.M.C. and S.R.).
References
1. Berger W, Kloeckener-Gruissem B, Neidhardt J (2010) The molecular basis of human retinal and vitreoretinal diseases. Prog Retin Eye Res 29(5):335–375. https://doi.org/10.1016/j.preteyeres.2010.03.004
2. Neveling K, den Hollander AI, Cremers FP, Collin RW (2013) Identification and analysis of inherited retinal disease genes. Methods Mol Biol 935:3–23. https://doi.org/10.1007/978-1-62703-080-9_1
3. Glockle N, Kohl S, Mohr J, Scheurenbrand T, Sprecher A, Weisschuh N, Bernd A, Rudolph G, Schubach M, Poloschek C, Zrenner E, Biskup S, Berger W, Wissinger B, Neidhardt J (2014) Panel-based next generation sequencing as a reliable and efficient technique to detect mutations in unselected patients with retinal dystrophies. Eur J Hum Genet 22(1):99–104. https://doi.org/10.1038/ejhg.2013.72
4. den Hollander AI, Black A, Bennett J, Cremers FP (2010) Lighting a candle in the dark: advances in genetics and gene therapy of recessive retinal dystrophies. J Clin Invest 120(9):3042–3053. https://doi.org/10.1172/JCI42258
5. Roosing S, Thiadens AA, Hoyng CB, Klaver CC, den Hollander AI, Cremers FP (2014) Causes and consequences of inherited cone disorders. Prog Retin Eye Res 42:1–26. https://doi.org/10.1016/j.preteyeres.2014.05.001
6. Valle D, Kaiser-Kupfer MI, Del Valle LA (1977) Gyrate atrophy of the choroid and retina: deficiency of ornithine aminotransferase in transformed lymphocytes. Proc Natl Acad Sci U S A 74(11):5159–5161
7. Mitchell GA, Brody LC, Looney J, Steel G, Suchanek M, Dowling C, Der Kaloustian V, Kaiser-Kupfer M, Valle D (1988) An initiator codon mutation in ornithine-delta-aminotransferase causing gyrate atrophy of the choroid and retina. J Clin Invest 81(2):630–633. https://doi.org/10.1172/jci113365
8. Dryja TP, McGee TL, Reichel E, Hahn LB, Cowley GS, Yandell DW, Sandberg MA, Berson EL (1990) A point mutation of the rhodopsin gene in one form of retinitis pigmentosa. Nature 343(6256):364–366. https://doi.org/10.1038/343364a0
9. McWilliam P, Farrar GJ, Kenna P, Bradley DG, Humphries MM, Sharp EM, McConnell DJ, Lawler M, Sheils D, Ryan C et al (1989) Autosomal dominant retinitis pigmentosa (ADRP): localization of an ADRP gene to the long arm of chromosome 3. Genomics 5(3):619–622
10. Cremers FP, van de Pol DJ, van Kerkhoff LP, Wieringa B, Ropers HH (1990) Cloning of a gene that is rearranged in patients with choroideraemia. Nature 347(6294):674–677. https://doi.org/10.1038/347674a0
11. Collin RW, van den Born LI, Klevering BJ, de Castro-Miro M, Littink KW, Arimadyo K, Azam M, Yazar V, Zonneveld MN, Paun CC, Siemiatkowska AM, Strom TM, Hehir-Kwa JY, Kroes HY, de Faber JT, van Schooneveld MJ, Heckenlively JR, Hoyng CB, den Hollander AI, Cremers FP (2011) High-resolution homozygosity mapping is a powerful tool to detect novel mutations causative of autosomal recessive RP in the Dutch population. Invest Ophthalmol Vis Sci 52(5):2227–2239. https://doi.org/10.1167/iovs.10-6185
12. Bainbridge JW, Smith AJ, Barker SS, Robbie S, Henderson R, Balaggan K, Viswanathan A, Holder GE, Stockman A, Tyler N, Petersen-Jones S, Bhattacharya SS, Thrasher AJ, Fitzke FW, Carter BJ, Rubin GS, Moore AT, Ali RR (2008) Effect of gene therapy on visual function in Leber’s congenital amaurosis. N Engl J Med 358(21):2231–2239. https://doi.org/10.1056/NEJMoa0802268
13. Hauswirth WW, Aleman TS, Kaushal S, Cideciyan AV, Schwartz SB, Wang L, Conlon TJ, Boye SL, Flotte TR, Byrne BJ, Jacobson SG (2008) Treatment of Leber congenital amaurosis due to RPE65 mutations by ocular subretinal injection of adeno-associated virus gene vector: short-term results of a phase I trial. Hum Gene Ther 19(10):979–990. https://doi.org/10.1089/hum.2008.107
14. Maguire AM, Simonelli F, Pierce EA, Pugh EN Jr, Mingozzi F, Bennicelli J, Banfi S, Marshall KA, Testa F, Surace EM, Rossi S, Lyubarsky A, Arruda VR, Konkle B, Stone E, Sun J, Jacobs J, Dell'Osso L, Hertle R, Ma JX, Redmond TM, Zhu X, Hauck B, Zelenaia O, Shindler KS, Maguire MG, Wright JF, Volpe NJ, McDonnell JW, Auricchio A, High KA, Bennett J (2008) Safety and efficacy of gene transfer for Leber’s congenital amaurosis. N Engl J Med 358(21):2240–2248. https://doi.org/10.1056/NEJMoa0802315
15. Russell S, Bennett J, Wellman JA, Chung DC, Yu ZF, Tillman A, Wittes J, Pappas J, Elci O, McCague S, Cross D, Marshall KA, Walshire J, Kehoe TL, Reichert H, Davis M, Raffini L, George LA, Hudson FP, Dingfield L, Zhu X, Haller JA, Sohn EH, Mahajan VB, Pfeifer W, Weckmann M, Johnson C, Gewaily D, Drack A, Stone E, Wachtel K, Simonelli F, Leroy BP, Wright JF, High KA, Maguire AM (2017) Efficacy and safety of voretigene neparvovec (AAV2-hRPE65v2) in patients with RPE65-mediated inherited retinal dystrophy: a randomised, controlled, open-label, phase 3 trial. Lancet 390(10097):849–860. https://doi.org/10.1016/S0140-6736(17)31868-8
16. Edwards TL, Jolly JK, Groppe M, Barnard AR, Cottriall CL, Tolmachova T, Black GC, Webster AR, Lotery AJ, Holder GE, Xue K, Downes SM, Simunovic MP, Seabra MC, MacLaren RE (2016) Visual acuity after retinal gene therapy for choroideremia. N Engl J Med 374(20):1996–1998. https://doi.org/10.1056/NEJMc1509501
17. MacLaren RE, Groppe M, Barnard AR, Cottriall CL, Tolmachova T, Seymour L, Clark KR, During MJ, Cremers FP, Black GC, Lotery AJ, Downes SM, Webster AR, Seabra MC (2014) Retinal gene therapy in patients with choroideremia: initial findings from a phase 1/2 clinical trial. Lancet 383(9923):1129–1137. https://doi.org/10.1016/S0140-6736(13)62117-0
18. Scholl HP, Moore AT, Koenekoop RK, Wen Y, Fishman GA, van den Born LI, Bittner A, Bowles K, Fletcher EC, Collison FT, Dagnelie G, Degli Eposti S, Michaelides M, Saperstein DA, Schuchard RA, Barnes C, Zein W, Zobor D, Birch DG, Mendola JD, Zrenner E, Group RIS (2015) Safety and Proof-of-Concept Study of Oral QLT091001 in Retinitis Pigmentosa Due to Inherited Deficiencies of Retinal Pigment Epithelial 65 Protein (RPE65) or Lecithin: Retinol Acyltransferase (LRAT). PLoS One 10(12):e0143846. https://doi.org/10.1371/journal.pone.0143846
19. Koenekoop RK, Sui R, Sallum J, van den Born LI, Ajlan R, Khan A, den Hollander AI, Cremers FP, Mendola JD, Bittner AK, Dagnelie G, Schuchard RA, Saperstein DA (2014) Oral 9-cis retinoid for childhood blindness due to Leber congenital amaurosis caused by RPE65 or LRAT mutations: an open-label phase 1b trial. Lancet 384(9953):1513–1520. https://doi.org/10.1016/S0140-6736(14)60153-7
20. Pasadhika S, Fishman GA, Stone EM, Lindeman M, Zelkha R, Lopez I, Koenekoop RK, Shahidi M (2010) Differential macular morphology in patients with RPE65-, CEP290-, GUCY2D-, and AIPL1-related Leber congenital amaurosis. Invest Ophthalmol Vis Sci 51(5):2608–2614. https://doi.org/10.1167/iovs.09-3734
21. Siemiatkowska AM, Collin RW, den Hollander AI, Cremers FP (2014) Genomic approaches for the discovery of genes mutated in inherited retinal degeneration. Cold Spring Harb Perspect Med 4(8). https://doi.org/10.1101/cshperspect.a017137
22. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC (2007) The diploid genome sequence of an individual human. PLoS Biol 5(10):e254. https://doi.org/10.1371/journal.pbio.0050254
23. Myllykangas S, Natsoulis G, Bell JM, Ji HP (2011) Targeted sequencing library preparation by genomic DNA circularization. BMC Biotechnol 11:122. https://doi.org/10.1186/1472-6750-11-122
24. Broadgate S, Yu J, Downes SM, Halford S (2017) Unravelling the genetics of inherited retinal dystrophies: past, present and future. Prog Retin Eye Res 59:53–96. https://doi.org/10.1016/j.preteyeres.2017.03.003
25. Tewhey R, Warner JB, Nakano M, Libby B, Medkova M, David PH, Kotsopoulos SK, Samuels ML, Hutchison JB, Larson JW, Topol EJ, Weiner MP, Harismendy O, Olson J, Link DR, Frazer KA (2009) Microdroplet-based PCR enrichment for large-scale targeted sequencing. Nat Biotechnol 27(11):1025–1031. https://doi.org/10.1038/nbt.1583
26. Lin X, Tang W, Ahmad S, Lu J, Colby CC, Zhu J, Yu Q (2012) Applications of targeted gene capture and next-generation sequencing technologies in studies of human deafness and other genetic disabilities. Hear Res 288(1):67–76
27. Head SR, Komori HK, LaMere SA, Whisenant T, Van Nieuwerburgh F, Salomon DR, Ordoukhanian P (2014) Library construction for next-generation sequencing: overviews and challenges. BioTechniques 56(2):61–64, 66, 68, passim. https://doi.org/10.2144/000114133
28. Absalan F, Ronaghi M (2007) Molecular inversion probe assay. Methods Mol Biol 396:315–330. https://doi.org/10.1007/978-1-59745-515-2_20
29. Jacob CO, Reiff A, Armstrong DL, Myones BL, Silverman E, Klein-Gitelman M, McCurdy D, Wagner-Weiner L, Nocton JJ, Solomon A, Zidovetzki R (2007) Identification of novel susceptibility genes in childhood-onset systemic lupus erythematosus using a uniquely designed candidate gene pathway platform. Arthritis Rheum 56(12):4164–4173. https://doi.org/10.1002/art.23060
30. Turner EH, Lee C, Ng SB, Nickerson DA, Shendure J (2009) Massively parallel exon capture and library-free resequencing across 16 genomes. Nat Methods 6(5):315–316. https://doi.org/10.1038/nmeth.f.248
31. Igartua C, Turner EH, Ng SB, Hodges E, Hannon GJ, Bhattacharjee A, Rieder MJ, Nickerson DA, Shendure J (2010) Targeted enrichment of specific regions in the human genome by array hybridization. Curr Protoc Hum Genet Chapter 18:Unit 18 13. https://doi.org/10.1002/0471142905.hg1803s66
32. Teer JK, Bonnycastle LL, Chines PS, Hansen NF, Aoyama N, Swift AJ, Abaan HO, Albert TJ, Program NCS, Margulies EH, Green ED, Collins FS, Mullikin JC, Biesecker LG (2010) Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing. Genome Res 20(10):1420–1431. https://doi.org/10.1101/gr.106716.110
33. O'Roak BJ, Vives L, Fu W, Egertson JD, Stanaway IB, Phelps IG, Carvill G, Kumar A, Lee C, Ankenman K, Munson J, Hiatt JB, Turner EH, Levy R, O’Day DR, Krumm N, Coe BP, Martin BK, Borenstein E, Nickerson DA, Mefford HC, Doherty D, Akey JM, Bernier R, Eichler EE, Shendure J (2012) Multiplex targeted sequencing identifies recurrently mutated genes in autism spectrum disorders. Science 338(6114):1619–1622. https://doi.org/10.1126/science.1227764
34. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J, Turner DJ (2010) Target-enrichment strategies for next-generation sequencing. Nat Methods 7(2):111–118. https://doi.org/10.1038/nmeth.1419
35. Mardis ER (2011) A decade's perspective on DNA sequencing technology. Nature 470(7333):198–203. https://doi.org/10.1038/nature09796
36. Fukunaga R, Matsumoto T, Aoyagi Y, Matsuda D, Tanaka S, Okadome J, Morisaki K, Maehara Y (2014) Thoracic stent graft with distal fenestration for the superior mesenteric artery for treatment of thoracic aortic aneurysm. Ann Vasc Dis 7(2):152–155. https://doi.org/10.3400/avd.cr.13-00119
37. Levy SE, Myers RM (2016) Advancements in next-generation sequencing. Annu Rev Genomics Hum Genet 17:95–115. https://doi.org/10.1146/annurev-genom-083115-022413
38. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17(6):333–351. https://doi.org/10.1038/nrg.2016.49
39. Meienberg J, Bruggmann R, Oexle K, Matyas G (2016) Clinical sequencing: is WGS the better WES? Hum Genet 135(3):359–362. https://doi.org/10.1007/s00439-015-1631-9
40. Knoppers BM, Zawati MH, Senecal K (2015) Return of genetic testing results in the era of whole-genome sequencing. Nat Rev Genet 16(9):553–559. https://doi.org/10.1038/nrg3960
41. Belkadi A, Bolze A, Itan Y, Cobat A, Vincent QB, Antipenko A, Shang L, Boisson B, Casanova JL, Abel L (2015) Whole-genome sequencing is more powerful than whole-exome sequencing for detecting exome variants. Proc Natl Acad Sci U S A 112(17):5473–5478. https://doi.org/10.1073/pnas.1418631112
42. Chesworth BM, Hamilton CB, Walton DM, Benoit M, Blake TA, Bredy H, Burns C, Chan L, Frey E, Gillies G, Gravelle T, Ho R, Holmes R, Lavallee RL, MacKinnon M, Merchant AJ, Sherman T, Spears K, Yardley D (2014) Reliability and validity of two versions of the upper extremity functional index. Physiother Can 66(3):243–253. https://doi.org/10.3138/ptc.2013-45
43. Genome of the Netherlands Consortium (2014) Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 46(8):818–825. https://doi.org/10.1038/ng.3021
44. Richards S, Aziz N, Bale S, Bick D, Das S, Gastier-Foster J, Grody WW, Hegde M, Lyon E, Spector E, Voelkerding K, Rehm HL, Committee ALQA (2015) Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 17(5):405–424. https://doi.org/10.1038/gim.2015.30
45. Bentley DR (2006) Whole-genome re-sequencing. Curr Opin Genet Dev 16(6):545–552. https://doi.org/10.1016/j.gde.2006.10.009
46. Whiteford N, Haslam N, Weber G, Prugel-Bennett A, Essex JW, Roach PL, Bradley M, Neylon C (2005) An analysis of the feasibility of short read sequencing. Nucleic Acids Res 33(19):e171. https://doi.org/10.1093/nar/gni170
47. Shendure J, Mitra RD, Varma C, Church GM (2004) Advanced sequencing technologies: methods and goals. Nat Rev Genet 5(5):335–344. https://doi.org/10.1038/nrg1325
48. Carneiro MO, Russ C, Ross MG, Gabriel SB, Nusbaum C, DePristo MA (2012) Pacific biosciences sequencing technology for genotyping and variation discovery in human data. BMC Genomics 13:375. https://doi.org/10.1186/1471-2164-13-375
49. Salmela L, Walve R, Rivals E, Ukkonen E (2017) Accurate self-correction of errors in long reads using de Bruijn graphs. Bioinformatics 33(6):799–806. https://doi.org/10.1093/bioinformatics/btw321
50. Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods 10(6):563–569. https://doi.org/10.1038/nmeth.2474
51. McCoy RC, Taylor RW, Blauwkamp TA, Kelley JL, Kertesz M, Pushkarev D, Petrov DA, Fiston-Lavier AS (2014) Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One 9(9):e106689. https://doi.org/10.1371/journal.pone.0106689
52. Ajay SS, Parker SC, Abaan HO, Fajardo KV, Margulies EH (2011) Accurate and comprehensive sequencing of personal genomes. Genome Res 21(9):1498–1505. https://doi.org/10.1101/gr.123638.111
53. Alkuraya FS (2013) The application of next-generation sequencing in the autozygosity mapping of human recessive diseases. Hum Genet 132(11):1197–1211. https://doi.org/10.1007/s00439-013-1344-x
54. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575. https://doi.org/10.1086/519795
55. Abu Safieh L, Aldahmesh MA, Shamseldin H, Hashem M, Shaheen R, Alkuraya H, Al Hazzaa SA, Al-Rajhi A, Alkuraya FS (2010) Clinical and molecular characterisation of Bardet-Biedl syndrome in consanguineous populations: the power of homozygosity mapping. J Med Genet 47(4):236–241. https://doi.org/10.1136/jmg.2009.070755
56. Woods CG, Cox J, Springell K, Hampshire DJ, Mohamed MD, McKibbin M, Stern R, Raymond FL, Sandford R, Malik Sharif S, Karbani G, Ahmed M, Bond J, Clayton D, Inglehearn CF (2006) Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. Am J Hum Genet 78(5):889–896. https://doi.org/10.1086/503875
57. Collin RW, Littink KW, Klevering BJ, van den Born LI, Koenekoop RK, Zonneveld MN, Blokland EA, Strom TM, Hoyng CB, den Hollander AI, Cremers FP (2008) Identification of a 2 Mb human ortholog of Drosophila eyes shut/spacemaker that is mutated in patients with retinitis pigmentosa. Am J Hum Genet 83(5):594–603. https://doi.org/10.1016/j.ajhg.2008.10.014
58. Zarrei M, MacDonald JR, Merico D, Scherer SW (2015) A copy number variation map of the human genome. Nat Rev Genet 16(3):172–183. https://doi.org/10.1038/nrg3871
59. Conrad DF, Hurles ME (2007) The population genetics of structural variation. Nat Genet 39(7 Suppl):S30–S36. https://doi.org/10.1038/ng2042
60. Pirooznia M, Goes FS, Zandi PP (2015) Whole-genome CNV analysis: advances in computational approaches. Front Genet 6:138. https://doi.org/10.3389/fgene.2015.00138
61. Chen W, Hayward C, Wright AF, Hicks AA, Vitart V, Knott S, Wild SH, Pramstaller PP, Wilson JF, Rudan I, Porteous DJ (2011) Copy number variation across European populations. PLoS One 6(8):e23087. https://doi.org/10.1371/journal.pone.0023087
62. Pang AW, MacDonald JR, Pinto D, Wei J, Rafiq MA, Conrad DF, Park H, Hurles ME, Lee C, Venter JC, Kirkness EF, Levy S, Feuk L, Scherer SW (2010) Towards a comprehensive structural variation map of an individual human genome. Genome Biol 11(5):R52. https://doi.org/10.1186/gb-2010-11-5-r52
63. Beckmann JS, Estivill X, Antonarakis SE (2007) Copy number variants and genetic traits: closer to the resolution of phenotypic to genotypic variability. Nat Rev Genet 8(8):639–646. https://doi.org/10.1038/nrg2149
64. Buchanan JA, Scherer SW (2008) Contemplating effects of genomic structural variation. Genet Med 10(9):639–647. https://doi.org/10.1097/GIM.0b013e318183f848
65. Haer-Wigman L, van Zelst-Stams WA, Pfundt R, van den Born LI, Klaver CC, Verheij JB, Hoyng CB, Breuning MH, Boon CJ, Kievit AJ, Verhoeven VJ, Pott JW, Sallevelt SC, van Hagen JM, Plomp AS, Kroes HY, Lelieveld SH, Hehir-Kwa JY, Castelein S, Nelen M, Scheffer H, Lugtenberg D, Cremers FP, Hoefsloot L, Yntema HG (2017) Diagnostic exome sequencing in 266 Dutch patients with visual impairment. Eur J Hum Genet 25(5):591–599. https://doi.org/10.1038/ejhg.2017.9
66. Combs R, McAllister M, Payne K, Lowndes J, Devery S, Webster AR, Downes SM, Moore AT, Ramsden S, Black G, Hall G (2013) Understanding the impact of genetic testing for inherited retinal dystrophy. Eur J Hum Genet 21(11):1209–1213. https://doi.org/10.1038/ejhg.2013.19
67. Bujakowska KM, Fernandez-Godino R, Place E, Consugar M, Navarro-Gomez D, White J, Bedoukian EC, Zhu X, Xie HM, Gai X, Leroy BP, Pierce EA (2017) Copy-number variation is an important contributor to the genetic causality of inherited retinal degenerations. Genet Med 19(6):643–651. https://doi.org/10.1038/gim.2016.158
68. Eisenberger T, Neuhaus C, Khan AO, Decker C, Preising MN, Friedburg C, Bieg A, Gliem M, Charbel Issa P, Holz FG, Baig SM, Hellenbroich Y, Galvez A, Platzer K, Wollnik B, Laddach N, Ghaffari SR, Rafati M, Botzenhart E, Tinschert S, Borger D, Bohring A, Schreml J, Kortge-Jung S, Schell-Apacik C, Bakur K, Al-Aama JY, Neuhann T, Herkenrath P, Nurnberg G, Nurnberg P, Davis JS, Gal A, Bergmann C, Lorenz B, Bolz HJ (2013) Increasing the yield in targeted next-generation sequencing by implicating CNV analysis, non-coding exons and the overall variant load: the example of retinal dystrophies. PLoS One 8(11):e78496. https://doi.org/10.1371/journal.pone.0078496
69. Hehir-Kwa JY, Pfundt R, Veltman JA (2015) Exome sequencing and whole genome sequencing for the detection of copy number variation. Expert Rev Mol Diagn 15(8):1023–1032. https://doi.org/10.1586/14737159.2015.1053467
70. Pang AW, Macdonald JR, Yuen RK, Hayes VM, Scherer SW (2014) Performance of high-throughput sequencing for the discovery of genetic variation across the complete size spectrum. G3 (Bethesda) 4(1):63–65. https://doi.org/10.1534/g3.113.008797
71. Pinto D, Darvishi K, Shi X, Rajan D, Rigler D, Fitzgerald T, Lionel AC, Thiruvahindrapuram B, Macdonald JR, Mills R, Prasad A, Noonan K, Gribble S, Prigmore E, Donahoe PK, Smith RS, Park JH, Hurles ME, Carter NP, Lee C, Scherer SW, Feuk L (2011) Comprehensive assessment of array-based platforms and calling algorithms for detection of copy number variants. Nat Biotechnol 29(6):512–520. https://doi.org/10.1038/nbt.1852
72. Newman S, Hermetz KE, Weckselblatt B, Rudd MK (2015) Next-generation sequencing of duplication CNVs reveals that most are tandem and some create fusion genes at breakpoints. Am J Hum Genet 96(2):208–220. https://doi.org/10.1016/j.ajhg.2014.12.017
73. Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm CH, Kristiansson K, Macarthur DG, Macdonald JR, Onyiah I, Pang AW, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter NP, Lee C, Scherer SW, Hurles ME (2010) Origins and functional impact of copy number variation in the human genome. Nature 464(7289):704–712. https://doi.org/10.1038/nature08516
74. Lander E, Kruglyak L (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nat Genet 11(3):241–247. https://doi.org/10.1038/ng1195-241
75. Boonstra FN, van Nouhuys CE, Schuil J, de Wijs IJ, van der Donk KP, Nikopoulos K, Mukhopadhyay A, Scheffer H, Tilanus MA, Cremers FP, Hoefsloot LH (2009) Clinical and molecular evaluation of probands and family members with familial exudative vitreoretinopathy. Invest Ophthalmol Vis Sci 50(9):4379–4385. https://doi.org/10.1167/iovs.08-3320
76. Al-Maghtheh M, Vithana E, Tarttelin E, Jay M, Evans K, Moore T, Bhattacharya S, Inglehearn CF (1996) Evidence for a major retinitis pigmentosa locus on 19q13.4 (RP11) and association with a unique bimodal expressivity phenotype. Am J Hum Genet 59(4):864–871
77. Hoffmann K, Lindner TH (2005) easyLINKAGE-plus – automated linkage analyses using large-scale SNP data. Bioinformatics 21(17):3565–3567. https://doi.org/10.1093/bioinformatics/bti571
78. Ruschendorf F, Nurnberg P (2005) ALOHOMORA: a tool for linkage analysis using 10K SNP array data. Bioinformatics 21(9):2123–2125. https://doi.org/10.1093/bioinformatics/bti264
79. Terwillinger D, Ott J (1994) Handbook for human genetic linkage. Johns Hopkins University Press, Baltimore
80. Nyholt D (2008) Statistical genetics: gene mapping through linkage and association. In: Neale BM, Ferreira M, Medland SE, Posthuma D (eds) Principles of linkage analysis. Taylor & Francis Group, New York, pp 113–134
81. Movassat M, Mueller WF, Hertel KJ (2014) In vitro assay of pre-mRNA splicing in mammalian nuclear extract. Methods Mol Biol 1126:151–160. https://doi.org/10.1007/978-1-62703-980-2_11
82. Hicks MJ, Lam BJ, Hertel KJ (2005) Analyzing mechanisms of alternative pre-mRNA splicing using in vitro splicing assays. Methods (San Diego, Calif) 37(4):306–313. https://doi.org/10.1016/j.ymeth.2005.07.012
83. Osoegawa K, de Jong PJ (2004) BAC library construction. Methods Mol Biol 255:1–46. https://doi.org/10.1385/1-59259-752-1:001
84. Sangermano R, Khan M, Cornelis SS, Richelle V, Albert S, Garanto A, Elmelik D, Qamar R, Lugtenberg D, van den Born LI, Collin RWJ, Cremers FPM (2018) ABCA4 midigenes reveal the full splice spectrum of all reported noncanonical splice site variants in Stargardt disease. Genome Res 28:100–110. PMID: 29162642
85. Parfitt DA, Lane A, Ramsden C, Jovanovic K, Coffey PJ, Hardcastle AJ, Cheetham ME (2016) Using induced pluripotent stem cells to understand retinal ciliopathy disease mechanisms and develop therapies. Biochem Soc Trans 44(5):1245–1251. https://doi.org/10.1042/BST20160156
86. Sangermano R, Bax NM, Bauwens M, van den Born LI, De Baere E, Garanto A, Collin RW, Goercharn-Ramlal AS, den Engelsman-van Dijk AH, Rohrschneider K, Hoyng CB, Cremers FP, Albert S (2016) Photoreceptor progenitor mRNA analysis reveals exon skipping resulting from the ABCA4 c.5461-10T->C mutation in Stargardt disease. Ophthalmology 123(6):1375–1385. https://doi.org/10.1016/j.ophtha.2016.01.053
87. Lukovic D, Artero Castro A, Delgado AB, Bernal Mde L, Luna Pelaez N, Diez Lloret A, Perez Espejo R, Kamenarova K, Fernandez Sanchez L, Cuenca N, Corton M, Avila Fernandez A, Sorkio A, Skottman H, Ayuso C, Erceg S, Bhattacharya SS (2015) Human iPSC derived disease model of MERTK-associated retinitis pigmentosa. Sci Rep 5:12910. https://doi.org/10.1038/srep12910
88. Yoshida T, Ozawa Y, Suzuki K, Yuki K, Ohyama M, Akamatsu W, Matsuzaki Y, Shimmura S, Mitani K, Tsubota K, Okano H (2014) The use of induced pluripotent stem cells to reveal pathogenic gene mutations and explore treatments for retinitis pigmentosa. Mol Brain 7:45. https://doi.org/10.1186/1756-6606-7-45
89. Tucker BA, Mullins RF, Streb LM, Anfinson K, Eyestone ME, Kaalberg E, Riker MJ, Drack AV, Braun TA, Stone EM (2013) Patient-specific iPSC-derived photoreceptor precursor cells as a means to investigate retinitis pigmentosa. eLife 2:e00824. https://doi.org/10.7554/eLife.00824
90. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010) A method and server for predicting damaging missense mutations. Nat Methods 7(4):248–249. https://doi.org/10.1038/nmeth0410-248
91. Kumar P, Henikoff S, Ng PC (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4(7):1073–1081. https://doi.org/10.1038/nprot.2009.86
92. Schwarz JM, Cooper DN, Schuelke M, Seelow D (2014) MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 11(4):361–362. https://doi.org/10.1038/nmeth.2890
93. Desmet FO, Hamroun D, Lalande M, Collod-Beroud G, Claustres M, Beroud C (2009) Human splicing finder: an online bioinformatics tool to predict splicing signals. Nucleic Acids Res 37(9):e67. https://doi.org/10.1093/nar/gkp215
94. Pertea M, Lin X, Salzberg SL (2001) GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Res 29(5):1185–1190
95. Yeo G, Burge CB (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol 11(2–3):377–394. https://doi.org/10.1089/1066527041410418
96. Sobreira N, Schiettecatte F, Valle D, Hamosh A (2015) GeneMatcher: a matching tool for connecting investigators with an interest in the same gene. Hum Mutat 36(10):928–930. https://doi.org/10.1002/humu.22844
97. Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J, Jang W, Katz K, Ovetsky M, Riley G, Sethi A, Tully R, Villamarin-Salomon R, Rubinstein W, Maglott DR (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res 44(D1):D862–D868. https://doi.org/10.1093/nar/gkv1222
98. Fokkema IF, Taschner PE, Schaafsma GC, Celli J, Laros JF, den Dunnen JT (2011) LOVD v.2.0: the next generation in gene variant databases. Hum Mutat 32(5):557–563. https://doi.org/10.1002/humu.21438
99. Cremers FP, den Dunnen JT, Ajmal M, Hussain A, Preising MN, Daiger SP, Qamar R (2014) Comprehensive registration of DNA sequence variants associated with inherited retinal diseases in Leiden open variation databases. Hum Mutat 35(1):147–148. https://doi.org/10.1002/humu.22458
100. Baux D, Blanchet C, Hamel C, Meunier I, Larrieu L, Faugere V, Vache C, Castorina P, Puech B, Bonneau D, Malcolm S, Claustres M, Roux AF (2014) Enrichment of LOVD-USH bases with 152 USH2A genotypes defines an extensive mutational spectrum and highlights missense hotspots. Hum Mutat 35(10):1179–1186. https://doi.org/10.1002/humu.22608
101. Cornelis SS, Bax NM, Zernant J, Allikmets R, Fritsche LG, den Dunnen JT, Ajmal M, Hoyng CB, Cremers FP (2017) In Silico functional meta-analysis of 5,962 ABCA4 variants in 3,928 retinal dystrophy cases. Hum Mutat 38(4):400–408. https://doi.org/10.1002/humu.23165
102. Bujakowska K, Audo I, Mohand-Said S, Lancelot ME, Antonio A, Germain A, Leveillard T, Letexier M, Saraiva JP, Lonjou C, Carpentier W, Sahel JA, Bhattacharya SS, Zeitz C (2012) CRB1 mutations in inherited retinal dystrophies. Hum Mutat 33(2):306–315. https://doi.org/10.1002/humu.21653
103. Messchaert M, Haer-Wigman L, Khan MI, Cremers FPM, Collin RWJ (2018) EYS mutation update: in silico assessment of 271 reported and 26 novel variants in patients with retinitis pigmentosa. Hum Mutat 39(2):177–186
104. Mackay DS, Borman AD, Sui R, van den Born LI, Berson EL, Ocaka LA, Davidson AE, Heckenlively JR, Branham K, Ren H, Lopez I, Maria M, Azam M, Henkes A, Blokland E, Qamar R, Webster AR, Cremers FPM, Moore AT, Koenekoop RK, Andreasson S, de Baere E, Bennett J, Chader GJ, Berger W, Golovleva I, Greenberg J, den Hollander AI, Klaver CCW, Klevering BJ, Lorenz B, Preising MN, Ramsear R, Roberts L, Roepman R, Rohrschneider K, Wissinger B (2013) Screening of a large cohort of Leber congenital amaurosis and retinitis pigmentosa patients identifies novel LCA5 mutations and new genotype-phenotype correlations. Hum Mutat 34(11):1537–1546. https://doi.org/10.1002/humu.22398
105. Cassa CA, Tong MY, Jordan DM (2013) Large numbers of genetic variants considered to be pathogenic are common in asymptomatic individuals. Hum Mutat 34(9):1216–1220. https://doi.org/10.1002/humu.22375
106. Abouelhoda M, Faquih T, El-Kalioby M, Alkuraya FS (2016) Revisiting the morbid genome of Mendelian disorders. Genome Biol 17(1):235. https://doi.org/10.1186/s13059-016-1102-1
107. Stenson PD, Mort M, Ball EV, Evans K, Hayden M, Heywood S, Hussain M, Phillips AD, Cooper DN (2017) The human gene mutation database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet 136(6):665–677. https://doi.org/10.1007/s00439-017-1779-6
108. Vail PJ, Morris B, van Kan A, Burdett BC, Moyes K, Theisen A, Kerr ID, Wenstrup RJ, Eggington JM (2015) Comparison of locus-specific databases for BRCA1 and BRCA2 variants reveals disparity in variant classification within and among databases. J Community Genet 6(4):351–359. https://doi.org/10.1007/s12687-015-0220-x
109. Peng YQ, Tang LS, Yoshida S, Zhou YD (2017) Applications of CRISPR/Cas9 in retinal degenerative diseases. Int J Ophthalmol 10(4):646–651. https://doi.org/10.18240/ijo.2017.04.23
110. Badano JL, Leitch CC, Ansley SJ, May-Simera H, Lawson S, Lewis RA, Beales PL, Dietz HC, Fisher S, Katsanis N (2006) Dissection of epistasis in oligogenic Bardet-Biedl syndrome. Nature 439(7074):326–330. https://doi.org/10.1038/nature04370
111. Beales PL, Badano JL, Ross AJ, Ansley SJ, Hoskins BE, Kirsten B, Mein CA, Froguel P, Scambler PJ, Lewis RA, Lupski JR, Katsanis N (2003) Genetic interaction of BBS1 mutations with alleles at other BBS loci can result in non-Mendelian Bardet-Biedl syndrome. Am J Hum Genet 72(5):1187–1199. https://doi.org/10.1086/375178
27
112. Kajiwara K, Berson EL, Dryja TP (1994) Digenic retinitis pigmentosa due to mutations at the unlinked peripherin/RDS and ROM1 loci. Science 264(5165):1604–1608 113. Liu YP, Bosch DG, Siemiatkowska AM, Rendtorff ND, Boonstra FN, Moller C, Tranebjaerg L, Katsanis N, Cremers FP (2017) Putative digenic inheritance of heterozygous RP1L1 and C2orf71 null mutations in syndromic retinal dystrophy. Ophthalmic Genet 38(2):127–132. https://doi.org/10.3109/13816810.201 6.1151898 114. Vithana EN, Abu-Safieh L, Pelosini L, Winchester E, Hornan D, Bird AC, Hunt DM, Bustin SA, Bhattacharya SS (2003) Expression of PRPF31 mRNA in patients with autosomal dominant retinitis pigmentosa: a molecular clue for incomplete penetrance? Invest Ophthalmol Vis Sci 44(10):4204–4209 115. Venturini G, Rose AM, Shah AZ, Bhattacharya SS, Rivolta C (2012) CNOT3 is a modifier of PRPF31 mutations in retinitis pigmentosa with incomplete penetrance. PLoS Genet 8(11):e1003040. https://doi.org/10.1371/ journal.pgen.1003040
Chapter 2
Conduct and Quality Control of Differential Gene Expression Analysis Using High-Throughput Transcriptome Sequencing (RNASeq)
Felix Grassmann

Abstract
High-throughput transcriptome sequencing (RNASeq) represents one of the most comprehensive and scalable methods to analyze global gene expression. It allows for absolute quantification of gene expression and also enables the discovery of novel transcripts and alternatively spliced isoforms. This chapter provides hands-on tools and a step-by-step procedure to analyze RNASeq data from punctures of two different retinal tissues (retina and RPE-choroid-sclera) at two different locations (periphery and macular region) from eight individuals. The procedure described in this chapter uses various programs from the free, open-source Tuxedo Suite software package to analyze sequencing data and to ascertain genes that are differentially expressed between retina and RPE-choroid-sclera.

Key words Transcriptome, Global gene expression, Alternative splicing, Differential expression, RNASeq, Tuxedo Suite/software package

Bernhard H. F. Weber and Thomas Langmann (eds.), Retinal Degeneration: Methods and Protocols, Methods in Molecular Biology, vol. 1834, https://doi.org/10.1007/978-1-4939-8669-9_2, © Springer Science+Business Media, LLC, part of Springer Nature 2019

1 Introduction
In recent years, transcript discovery and global transcript quantification have been revolutionized by high-throughput next-generation sequencing methods (RNASeq). In order to sequence the mRNA of a specimen, high-quality RNA has to be extracted from tissue or cell culture, and ribosomal RNA (rRNA) has to be removed either by enrichment of poly-A-containing mRNA or by depletion of rRNA. Next, the enriched mRNA is reverse transcribed into a cDNA fragment library fused to the sequencing adapters of the respective sequencing platform. A typical RNASeq experiment can yield millions of short sequencing reads, which can be used to measure gene abundance and thus allow the identification of genes that are differentially expressed between tissues, treatments, or time points.

The analysis of differential gene expression based on RNASeq data generally follows four steps: (1) basic quality control and read preprocessing (such as adapter removal and exclusion of short and/or low-quality reads), (2) alignment of short reads to the genome and the transcriptome, (3) quantification of individual transcript or total gene expression abundance, and (4) statistical analysis of differential gene/transcript expression. Several software packages, both commercial and open-source, are available to perform these tasks (see Note 1). Among the most widely used open-source software packages is the so-called Tuxedo Suite [1], which enables researchers to perform the required steps effectively and accurately.

In recent years, an ever-increasing number of RNASeq data sets has been deposited in publicly available databases, where they can be accessed freely or through a defined application process. This allows researchers to answer individual research questions without the financial burden of performing the actual RNASeq experiments.

In this chapter, a method is described for analyzing RNASeq data from two different eye tissues (retina and RPE-choroid-sclera) from eight individuals (GEO accession ID GSE94437) [2]. The RNA was isolated from tissue punctures prepared from either the center of the macula or the periphery of the eye, yielding a total of 32 samples (eight biological replicates for each combination of tissue and location; Table 1). This chapter focuses on the identification of genes that are differentially expressed between retina and RPE-choroid-sclera punctures. However, additional analyses, such as gene expression lookup or identification of genes differentially expressed in the macula vs. the periphery of the retina, can easily be performed with the quantified reads.
2 Material
1. Hardware [64-bit computer running Linux (preferably Ubuntu); 4 GB of RAM (8 GB preferred)].
2. SRAToolkit software (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/, version 2.2.8 or later).
3. HISAT2 software (http://github.com/infphilo/hisat2, version 2.1.0 or later).
4. StringTie software (https://github.com/gpertea/stringtie, version 1.3.3 or later).
5. SAMtools software (http://samtools.sourceforge.net/, version 0.1.19 or later).
6. FastQC software (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/, version 0.11.5 or later).
7. cutadapt software (https://github.com/marcelm/cutadapt, version 1.14 or later).
8. R software (https://www.r-project.org/, version 3.2.3 or later) and the ballgown and genefilter libraries.
9. Data: raw sequencing reads in .fastq format (retrieved from a database or provided by your trusted sequencing center). In case the reads are stored in a different format, the files need to be converted to .fastq format (see Note 2).

Table 1 Detailed phenotypes of the samples included in GSE94437

SRA Dataset  ID       Tissue              Location    Sex     Age  Color
SRR5225763   11-1516  Retina              Macular     Male    79   Firebrick
SRR5225767   11-1556  Retina              Macular     Female  62   Firebrick
SRR5225771   11-1614  Retina              Macular     Male    63   Firebrick
SRR5225775   11-1624  Retina              Macular     Male    62   Firebrick
SRR5225779   11-1648  Retina              Macular     Female  88   Firebrick
SRR5225783   11-1833  Retina              Macular     Female  95   Firebrick
SRR5225787   11-1875  Retina              Macular     Male    72   Firebrick
SRR5225791   11-2043  Retina              Macular     Male    70   Firebrick
SRR5225761   11-1516  Retina              Peripheral  Male    79   Darkorange2
SRR5225765   11-1556  Retina              Peripheral  Female  62   Darkorange2
SRR5225769   11-1614  Retina              Peripheral  Male    63   Darkorange2
SRR5225773   11-1624  Retina              Peripheral  Male    62   Darkorange2
SRR5225777   11-1648  Retina              Peripheral  Female  88   Darkorange2
SRR5225781   11-1833  Retina              Peripheral  Female  95   Darkorange2
SRR5225785   11-1875  Retina              Peripheral  Male    72   Darkorange2
SRR5225789   11-2043  Retina              Peripheral  Male    70   Darkorange2
SRR5225764   11-1516  RPE-choroid-sclera  Macular     Male    79   Dodgerblue4
SRR5225768   11-1556  RPE-choroid-sclera  Macular     Female  62   Dodgerblue4
SRR5225772   11-1614  RPE-choroid-sclera  Macular     Male    63   Dodgerblue4
SRR5225776   11-1624  RPE-choroid-sclera  Macular     Male    62   Dodgerblue4
SRR5225780   11-1648  RPE-choroid-sclera  Macular     Female  88   Dodgerblue4
SRR5225784   11-1833  RPE-choroid-sclera  Macular     Female  95   Dodgerblue4
SRR5225788   11-1875  RPE-choroid-sclera  Macular     Male    72   Dodgerblue4
SRR5225792   11-2043  RPE-choroid-sclera  Macular     Male    70   Dodgerblue4
SRR5225762   11-1516  RPE-choroid-sclera  Peripheral  Male    79   Cornflowerblue
SRR5225766   11-1556  RPE-choroid-sclera  Peripheral  Female  62   Cornflowerblue
SRR5225770   11-1614  RPE-choroid-sclera  Peripheral  Male    63   Cornflowerblue
SRR5225774   11-1624  RPE-choroid-sclera  Peripheral  Male    62   Cornflowerblue
SRR5225778   11-1648  RPE-choroid-sclera  Peripheral  Female  88   Cornflowerblue
SRR5225782   11-1833  RPE-choroid-sclera  Peripheral  Female  95   Cornflowerblue
SRR5225786   11-1875  RPE-choroid-sclera  Peripheral  Male    72   Cornflowerblue
SRR5225790   11-2043  RPE-choroid-sclera  Peripheral  Male    70   Cornflowerblue
3 Installation
Executing the following commands will install the open-source programs required to analyze RNASeq data. Lines starting with a “$” denote commands that should be typed into the terminal of the Linux machine, while lines starting with a “>” denote commands that should be entered after the program R has been launched.

Create a new directory in your home folder called “bin”, which will contain the installation folders of the software used in this chapter.
$ mkdir $HOME/bin
$ cd $HOME/bin

Install the SRA Toolkit from NCBI.
$ mkdir SRAToolkit
$ cd SRAToolkit
$ wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/current/sratoolkit.current-ubuntu64.tar.gz
$ tar xvfz sratoolkit.current-ubuntu64.tar.gz

Install HISAT2 and download the reference genome and transcriptome data. The HISAT2 download is a .zip archive and is therefore unpacked with unzip.
$ cd $HOME/bin
$ mkdir hisat2
$ cd hisat2
$ wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/downloads/hisat2-2.1.0-Linux_x86_64.zip
$ unzip hisat2-2.1.0-Linux_x86_64.zip
$ mkdir genomes
$ cd genomes
$ wget ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grch38_tran.tar.gz
$ tar xvfz grch38_tran.tar.gz
$ cd grch38_tran
$ wget ftp://ftp.ensembl.org/pub/release-84/gtf/homo_sapiens/Homo_sapiens.GRCh38.84.gtf.gz
$ gzip -d Homo_sapiens.GRCh38.84.gtf.gz

Install StringTie, SAMtools, and FastQC. The SAMtools archive has to be extracted with tar before it can be compiled, and the fastqc wrapper script may need to be made executable after unpacking.
$ cd $HOME/bin
$ mkdir stringtie
$ cd stringtie
$ wget http://ccb.jhu.edu/software/stringtie/dl/stringtie-1.3.3b.Linux_x86_64.tar.gz
$ tar xvfz stringtie-1.3.3b.Linux_x86_64.tar.gz
$ cd $HOME/bin
$ wget https://github.com/samtools/samtools/releases/download/1.5/samtools-1.5.tar.bz2
$ tar xvfj samtools-1.5.tar.bz2
$ cd samtools-1.5
$ make
$ cd $HOME/bin
$ mkdir fastqc
$ cd fastqc
$ wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5.zip
$ unzip fastqc_v0.11.5.zip
$ chmod 755 FastQC/fastqc

Install cutadapt, which is implemented in Python, with the pip command.
$ pip install --user cutadapt

Install the R program using apt-get.
$ sudo apt-get install r-base

Open R and install the required libraries from the Bioconductor [3] repository.
$ R
> source("https://bioconductor.org/biocLite.R")
> biocLite("ballgown")
> biocLite("genefilter")

Close R with the q() command and select “no” when prompted to save the workspace.
> q()

Set the $PATH environmental variable to include the necessary executables by either executing these commands or by manually adding them to the .profile file in the home directory. The SRA Toolkit folder name contains the version number of the downloaded release; adjust the path below if a release other than 2.8.2 was unpacked.
$ export PATH=$PATH:$HOME/bin/SRAToolkit/sratoolkit.2.8.2-ubuntu64/bin/
$ export PATH=$PATH:$HOME/bin/hisat2/hisat2-2.1.0/
$ export HISAT2_INDEXES=$HOME/bin/hisat2/genomes/grch38_tran/
$ export PATH=$PATH:$HOME/bin/stringtie/stringtie-1.3.3b.Linux_x86_64/
$ export PATH=$PATH:$HOME/bin/samtools-1.5/
$ export PATH=$PATH:$HOME/bin/fastqc/FastQC/
$ export PATH=$PATH:$HOME/.local/bin/
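Before moving on, it can help to confirm that the installed executables are actually reachable through $PATH. The following is a minimal sketch and not part of the original protocol: the function name missing_tools and the list of command names are illustrative assumptions (the chapter itself calls fastq-dump.2.8.2; the unversioned name fastq-dump is assumed to exist in the same folder).

```shell
# Report which of the given commands cannot be found on the current $PATH.
# Prints one missing command per line; prints nothing if all are found.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Command names as assumed for this chapter; adjust to your installation.
CHAPTER_TOOLS="fastq-dump hisat2 stringtie samtools fastqc cutadapt R"

# List the tools that still need attention.
missing_tools $CHAPTER_TOOLS
```

If a tool is listed, the corresponding export line has most likely not been executed in the current terminal session or points to a folder with a different version number.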
4 Methods
4.1 Download the Raw Reads from the GEO Database with the SRAToolkit
The following commands will download the first 1,000,000 paired-end reads of each of the 32 samples (Table 1) from the Gene Expression Omnibus (GEO) database [4]. The reads are in the .fastq format and are saved in a newly created folder called data. For each sample, two files will be downloaded, one containing the forward reads (files ending in _1.fastq) and one containing the reverse reads (files ending in _2.fastq) of the same library fragment. The commands are highly repetitive and identical for all SRA Dataset identifiers. Therefore, for each operation in this chapter, the general command structure is shown first, followed by one example command.
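Because the same call has to be repeated for all 32 accessions, the per-sample command shown next can be wrapped in a loop. The sketch below is an illustration under stated assumptions rather than part of the original protocol: the function name download_reads and the RUN switch are hypothetical, and by default the function only prints the commands (dry run) instead of executing them.

```shell
# Print (or, with RUN=1, execute) one fastq-dump call per SRA accession.
# Accessions are read from standard input, one per line; blank lines are skipped.
download_reads() {
    local n_reads="$1" acc
    while IFS= read -r acc; do
        [ -n "$acc" ] || continue
        if [ "${RUN:-0}" = "1" ]; then
            # </dev/null keeps the tool from consuming the accession list on stdin.
            fastq-dump.2.8.2 -X "$n_reads" --split-files "$acc" -O . </dev/null
        else
            echo "fastq-dump.2.8.2 -X $n_reads --split-files $acc -O ."
        fi
    done
}

# Dry run for the first two macular retina samples of Table 1.
printf '%s\n' SRR5225763 SRR5225767 | download_reads 1000000
```

With the 32 accessions of Table 1 saved one per line in a text file (for example accessions.txt, a hypothetical name), the downloads could be started with RUN=1 download_reads 1000000 < accessions.txt from within the data folder.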
$ mkdir $HOME/data
$ cd $HOME/data
$ fastq-dump.2.8.2 -X 1000000 --split-files [SRA Dataset identifier] -O .
$ fastq-dump.2.8.2 -X 1000000 --split-files SRR5225763 -O .

4.2 Prepare Reads and Conduct Basic Quality Control Steps
In the next step, Illumina adapter sequences are removed from the reads to ensure correct alignment in later steps. This step has often already been performed by the sequencing center, but cutadapt can still be run to remove any remaining adapter sequences or other highly repetitive elements from the reads. The adapter sequences must be supplied to cutadapt with the -a parameter for adapters to be removed from the first (forward) read and with the -A parameter for adapters to be removed from the second (reverse) read. Different sequencing platforms may use different adapter sequences.
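Since the two adapter sequences are long and identical for every sample, it can be convenient to keep them in variables and derive the file names from the accession. The following is only a sketch: the variable names ADAPTER_FWD and ADAPTER_REV and the function trim_cmd are hypothetical, the adapter sequences are the ones used in this chapter, and the function merely prints the cutadapt call so that it can be inspected before running.

```shell
# Illumina adapter sequences as given in this chapter; other platforms may differ.
ADAPTER_FWD="AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ADAPTER_REV="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# Print the paired-end cutadapt call for one accession (dry run only).
trim_cmd() {
    local acc="$1"
    echo "cutadapt -a $ADAPTER_FWD -A $ADAPTER_REV" \
         "-o ${acc}_1-trimmed.fastq -p ${acc}_2-trimmed.fastq" \
         "${acc}_1.fastq ${acc}_2.fastq"
}

# Show the commands for two samples of Table 1.
for acc in SRR5225763 SRR5225767; do
    trim_cmd "$acc"
done
```

Piping the printed lines to sh (for example trim_cmd SRR5225763 | sh) would execute them; this assumes that cutadapt is on $PATH and that the .fastq files are in the current folder.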
$ cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o [SRA Dataset identifier]_1-trimmed.fastq -p [SRA Dataset identifier]_2-trimmed.fastq [SRA Dataset identifier]_1.fastq [SRA Dataset identifier]_2.fastq
$ cutadapt -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o SRR5225763_1-trimmed.fastq -p SRR5225763_2-trimmed.fastq SRR5225763_1.fastq SRR5225763_2.fastq

Next, execute the FastQC program in order to generate a comprehensive quality control report of the adapter-trimmed reads. A new folder called QC needs to be created, which will contain the quality control reports generated by FastQC.
$ mkdir QC
$ fastqc -o ./QC [SRA Dataset identifier]_[1/2]-trimmed.fastq
$ fastqc -o ./QC SRR5225763_1-trimmed.fastq

A quick glance over the quality reports indicates no major issues with the reads. As expected for RNASeq libraries, there is an enrichment of nonrandom K-mers at the beginning of the reads due to the nonrandom priming of the random hexamer primers during the reverse transcription step in library construction [5].

4.3 Run HISAT2 to Align the Reads to the Human Genome and Transcriptome
HISAT2 takes the trimmed reads as input and aligns them to a reference genome/transcriptome. The previously downloaded reference database (grch38_tran) should be supplied to HISAT2 with the -x command line parameter. In case the index cannot be found by HISAT2, see Note 3. The -p command line argument instructs HISAT2 to use 12 processor threads, which greatly reduces computation time.
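The value of 12 threads presumes a machine with at least that many cores. As a hedged sketch, the thread count could instead be derived from the machine at hand; the function name pick_threads and the variable THREADS are hypothetical, and nproc is assumed to be available, as it is on standard Ubuntu installations.

```shell
# Pick a thread count: use all available cores, but never more than a given cap.
pick_threads() {
    local cap="$1" cores
    cores=$(nproc 2>/dev/null || echo 1)   # fall back to 1 if nproc is unavailable
    if [ "$cores" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$cores"
    fi
}

THREADS=$(pick_threads 12)

# Print the alignment command with the adjusted thread count (dry run only).
echo "hisat2 -p $THREADS --dta -x genome_tran -1 SRR5225763_1-trimmed.fastq -2 SRR5225763_2-trimmed.fastq -S SRR5225763.sam"
```

The same value can be reused for the -@ argument of SAMtools and the -p argument of StringTie in the later steps.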
$ hisat2 -p 12 --dta -x genome_tran -1 [SRA Dataset identifier]_1-trimmed.fastq -2 [SRA Dataset identifier]_2-trimmed.fastq -S [SRA Dataset identifier].sam
$ hisat2 -p 12 --dta -x genome_tran -1 SRR5225763_1-trimmed.fastq -2 SRR5225763_2-trimmed.fastq -S SRR5225763.sam

The aligned reads are written in the human-readable .sam file format. Subsequent steps, however, require the reads to be in the binary (compressed) .bam file format and to be sorted according to their genomic location. Convert between the formats and sort the reads with SAMtools, specifying the -@ command line argument to use 12 processor threads for this task.
$ samtools sort -@ 12 -o [SRA Dataset identifier].bam [SRA Dataset identifier].sam
$ samtools sort -@ 12 -o SRR5225763.bam SRR5225763.sam

4.4 Quantify Transcript Abundance and Discover Novel Transcripts and Isoforms with the StringTie Program
Next, run the StringTie program to quantify the abundance of transcripts. StringTie requires the sorted and aligned reads in the .bam format (as generated above) and also a reference transcript database in the .gtf format. This step will not only quantify known transcripts but also identify novel transcripts as well as novel isoforms.
$ stringtie [SRA Dataset identifier].bam -p 12 -G $HOME/bin/hisat2/genomes/grch38_tran/Homo_sapiens.GRCh38.84.gtf -o [SRA Dataset identifier].gtf -l [SRA Dataset identifier]
$ stringtie SRR5225763.bam -p 12 -G $HOME/bin/hisat2/genomes/grch38_tran/Homo_sapiens.GRCh38.84.gtf -o SRR5225763.gtf -l SRR5225763

Some novel transcripts might be discovered in only a subset of the samples, although it is desirable to quantify them in all samples. Therefore, all potential transcripts are merged into a new reference transcript database in the .gtf format: run StringTie again to merge the transcripts from all .gtf files created in the previous step and to generate a new reference transcript database called Retina_RPE-Choroid_stringtie_merged.gtf.
$ stringtie --merge -p 12 -G $HOME/bin/hisat2/genomes/grch38_tran/Homo_sapiens.GRCh38.84.gtf -o Retina_RPE-Choroid_stringtie_merged.gtf *.gtf

Finally, to quantify the abundance of all transcripts (novel and known) using the newly generated reference transcriptome database, run StringTie once more and supply the sorted and aligned reads in .bam format as well as the newly created reference transcript database. The results of the quantification will be saved in a new folder called ballgown, which contains a subfolder for each sequenced sample.
$ stringtie -e -B -p 12 -G Retina_RPE-Choroid_stringtie_merged.gtf -o ./ballgown/[SRA Dataset identifier]/[SRA Dataset identifier].gtf [SRA Dataset identifier].bam
$ stringtie -e -B -p 12 -G Retina_RPE-Choroid_stringtie_merged.gtf -o ./ballgown/SRR5225763/SRR5225763.gtf SRR5225763.bam

4.5 Analyze the RNASeq Data with the Ballgown Library Implemented in R and Visualize the Results
Start the program R and load the required libraries:
$ R
> library(ballgown)
> library(genefilter)

Next, load a table containing the phenotypes of each sample. This table is identical to Table 1, which can be copied into a text file and saved as a tab-delimited text file named phenotypes.csv in the ballgown folder created above ($HOME/data/ballgown/).
> setwd(paste(path.expand("~"), "/data/ballgown/", sep=""))
> pheno_data = read.csv("phenotypes.csv", sep = "\t", stringsAsFactors = FALSE)

Subsequently, run the ballgown() function to read the quantification results from the files created by StringTie and create a new object (bg.data), which contains the relevant data. ballgown() requires the names of the samples in order to match the gene/transcript expression data to the phenotypes of each sample.
> bg.data = ballgown(samples = pheno_data$SRA.Dataset, pData = pheno_data)

Many transcripts will have a very low transcript count, usually zero across all samples, and it can be useful to exclude those transcripts from downstream analyses by filtering bg.data with the subset() function. The following command retains only transcripts whose expression variance across all samples is greater than 1.
> bg.data = subset(bg.data, "rowVars(texpr(bg.data))>1")

This chapter focuses on the analysis of expression levels of known (annotated) genes. Therefore, extract the gene expression data from bg.data with the gexpr() function and then name the columns according to the tissue of origin (RPE-choroid-sclera or Retina), the location from which the puncture was taken (Macular or Peripheral), and the ID of the individual.
> gene.expr.data = gexpr(bg.data)
> colnames(gene.expr.data) = apply(pData(bg.data)[,c("Tissue", "Location", "ID")], 1, paste, collapse="_")

The first six rows of the gene expression data can be displayed with the following command:
> head(gene.expr.data)

It is also useful to plot the distribution of gene expression levels of each sample, since major differences in library preparation or quantification of library concentration might lead to significant differences between samples (Fig. 1). Since the y-axis will be plotted on a logarithmic scale, a small constant value needs to be added to all gene expression values, because non-expressed genes with a value of 0 cannot be log-transformed.
> boxplot(gene.expr.data + 0.01, col = pData(bg.data)$Color, log= "y", las = 2, names = pData(bg.data)$ID, ylim = c(0.01, 1E9))
> legend("topright", legend = unique(apply(pData(bg.data)[,c("Tissue", "Location")], 1, paste, collapse=" ")), pt.bg = unique(pData(bg.data)$Color), pch = 21, cex = 1.2)

The expression of all annotated genes is similarly distributed in all samples, indicating that further normalization is not necessary at this point. In case there are severe differences in the distribution of gene expression, additional normalization strategies are necessary (see Note 4).

In the next steps, an unsupervised cluster algorithm will be used to group the samples according to their expression data.
This step is generally useful to identify potential outlier samples as well as potential confounding factors that are not accounted for (e.g., different library preparation or sequencing batches; see Note 5). First, run a principal component analysis by transposing the gene expression matrix and calling the function prcomp().
> pc = prcomp(t(gene.expr.data), center = TRUE, scale = TRUE)

Fig. 1 Distribution of gene expression data across all samples. The expression of all genes of each sample is plotted in a boxplot representation. The gene expression distribution is uniform across all samples, and further normalization is not required

Second, plot the samples according to the first two principal components. In order to distinguish different samples according to their tissue and location of origin, supply the plot() function with the Color column from phenotypes.csv (Table 1). The legend() function creates a legend at the top right of the plot.
> plot(pc$x[,1:2], pch = 21, bg = pData(bg.data)$Color, cex = 2, ylim = c(-100, 100), xlim = c(-100, 100), xlab = "PC1", ylab = "PC2", cex.lab = 1.4, cex.axis = 1.3)
> legend("topright", legend = unique(apply(pData(bg.data)[,c("Tissue", "Location")], 1, paste, collapse=" ")), pt.bg = unique(pData(bg.data)$Color), pch = 21, cex = 1.5)

The unsupervised cluster algorithm groups the samples by tissue, while the differences between locations within each tissue are not pronounced (Fig. 2).

Fig. 2 Unsupervised cluster analysis of the gene expression data. A principal component analysis (PCA) was performed on the gene expression data, and the samples were projected onto a plane defined by the first two principal components (PC1 and PC2). Samples from retinal tissue cluster distinctly from samples derived from RPE-choroid-sclera. Within each tissue, the gene expression in different locations (peripheral vs. macular) does not vary strongly

Next, run the function stattest() in order to identify genes that are (significantly) differentially expressed between RPE and retina (coded in the column Tissue in phenotypes.csv) while accounting for differences between samples such as age, location of puncture, and sex. The following command further instructs stattest() to use the measure fragments per kilobase of transcript per million mapped reads (FPKM) and to report the fold change (FC) observed between the tissues.
> results_genes = stattest(bg.data, feature = "gene", covariate = "Tissue", adjustvars = c("Location","Age", "Sex"), getFC = TRUE, meas = "FPKM")

The table returned by stattest() is not sorted by significance. The six most significant genes can be displayed by ordering the rows by P-value within the head() function, which leaves the row order of results_genes itself unchanged:
> head(results_genes[order(results_genes$pval), ])

In addition to calculating the raw P-values, the stattest() function also automatically calculates the false discovery rate (FDR) of differential expression of each gene. In case the RNASeq experiment did not yield any significantly differentially expressed genes (i.e., no gene showed differential expression with an FDR smaller than 5%), the experiment was either confounded by known or unknown batch effects, or it was underpowered due to a low number of biological replicates (see Note 6).
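For orientation, the FPKM measure passed to stattest() can be written out explicitly. The symbols below are introduced here for illustration and follow the standard definition of FPKM; they do not appear in the original protocol.

```latex
% C_g : number of fragments assigned to gene (or transcript) g
% L_g : length of g in base pairs
% N   : total number of mapped fragments in the sample
\[
  \mathrm{FPKM}_g \;=\; \frac{10^{9}\, C_g}{N \cdot L_g}
\]
% Worked example: C_g = 500, L_g = 2000 bp, N = 10^6 mapped fragments
\[
  \mathrm{FPKM}_g \;=\; \frac{10^{9} \cdot 500}{10^{6} \cdot 2000} \;=\; 250
\]
```

Because the measure is scaled by both library size and feature length, values are comparable across samples of different sequencing depth, which is relevant here since only the first 1,000,000 read pairs per sample are analyzed.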
The results of the differential gene expression analysis can be visualized in a volcano plot by plotting the logarithmic P-value against the logarithmic observed fold change for each gene. This will highlight genes that are significantly differentially expressed between tissues (FDR smaller than 0.05 in this analysis) and, in addition, genes that also have a large fold change (FC > 10.0 or FC < 0.10, corresponding to an absolute log2 fold change greater than about 3.3) between those groups (Fig. 3).
> plot(-log10(pval) ~ log2(fc), data = results_genes, pch = 21, main = "Volcano plot of differentially expressed genes", xlab = "log2 Fold Change", ylab = "-log10(P-Value)", bg = "darkgrey", xlim = c(-8, 8), ylim=c(0, 19), cex.lab = 1.3, cex.axis = 1.3)
> points(-log10(pval) ~ log2(fc), data = results_genes[results_genes$qval < 0.05,], bg = "orange", pch = 21)
> points(-log10(pval) ~ log2(fc), data = results_genes[results_genes$qval < 0.05 & abs(log2(results_genes$fc)) > 3.3,], bg = "green", pch = 21)
> legend("topright", legend = c("FDR > 0.05","FDR < 0.05 and 0.1 < FC < 10", "FDR < 0.05 and FC > 10 or FC < 0.1"), pt.bg = c("darkgrey", "orange", "green"), pch = 21, cex = 1.0)

Fig. 3 Volcano plot representation of differentially expressed genes. The logarithmic P-values of each gene are plotted against the respective logarithmic fold change. With the commands given in this chapter, gray dots indicate genes that do not meet the significance threshold (FDR smaller than 0.05), while orange dots indicate genes that meet the significance threshold but have a small fold change between 0.10 and 10.0. Green dots represent genes that are significantly differentially expressed between RPE-choroid-sclera and retina with a strong effect (fold change greater than 10.0 or lower than 0.10)

Lastly, create a large table that contains the results from stattest(), as well as the names of the genes and the expression values in each sample, and write this table to a file that can be viewed with a text editor or Microsoft Excel®.
> results_genes = data.frame(GeneName = ballgown::geneNames(bg.data)[match(results_genes$id, ballgown::geneIDs(bg.data))], results_genes, gene.expr.data)
> write.csv(results_genes, "Retina_RPE-Choroid_genes_results.csv", row.names=FALSE)
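The exported table can also be filtered on the command line, for instance to extract only the genes that would appear as strong hits in the volcano plot. The sketch below is an illustration, not part of the original protocol: the function name filter_strong_hits is hypothetical, the columns fc and qval are assumed to be present as written by stattest(), and the simple comma splitting assumes that no field (such as a gene name) contains a comma.

```shell
# Keep the header plus all genes with qval < max_q and fc > min_fc or fc < 1/min_fc.
# Arguments: CSV file, FDR threshold (default 0.05), fold change threshold (default 10).
filter_strong_hits() {
    local csv="$1" max_q="${2:-0.05}" min_fc="${3:-10}"
    awk -F',' -v max_q="$max_q" -v min_fc="$min_fc" '
        NR == 1 {
            # Locate the fc and qval columns by name (write.csv quotes the header).
            for (i = 1; i <= NF; i++) {
                name = $i
                gsub(/"/, "", name)
                if (name == "fc")   fc_col = i
                if (name == "qval") q_col  = i
            }
            print
            next
        }
        # Skip rows with missing values, then apply both thresholds.
        fc_col && q_col && $q_col != "NA" && $fc_col != "NA" {
            q = $q_col + 0
            fc = $fc_col + 0
            if (q < max_q && (fc > min_fc || fc < 1 / min_fc)) print
        }
    ' "$csv"
}
```

A call such as filter_strong_hits Retina_RPE-Choroid_genes_results.csv 0.05 10 > strong_hits.csv (the output file name is hypothetical) would write the filtered rows to a new file.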
5 Notes
1. There are commercially available programs to analyze RNASeq data that provide a graphical user interface, which might be more suitable for beginners. The programs used in this chapter are free for academic use and only require an affordable computer setup.
2. RNASeq reads might be delivered in different formats and may require additional cleaning steps such as trimming the 3’ part of reads not meeting a certain quality threshold [6]. All of these operations can be performed by free, open-source tools, such as SAMtools or Seqtk (https://github.com/lh3/seqtk/).
3. HISAT2 will search for the appropriate transcriptome/genome reference using the $HISAT2_INDEXES environmental variable. In case this variable has not been set or does not work, the absolute path to the index must be supplied manually on the command line. In case the references have been downloaded with the commands in this chapter, the reference can be supplied to HISAT2 with the following command line argument: -x $HOME/bin/hisat2/genomes/grch38_tran/genome_tran
4. The RNASeq data used in this project are relatively uniform in their expression profile across all samples. For other data sets, additional normalization may be required and can be achieved by taking the logarithm of all expression values and/or by using normalization techniques such as quantile normalization or TMM (trimmed mean of M-values normalization) [7]. Care has to be taken not to eliminate relevant biological variance between samples.
5. Ideally, all libraries should be constructed in the same batch from biological replicates ascertained under the same conditions and also sequenced on the same sequencing lane(s) of the sequencer. However, in many cases this might not be practical, and confounding batch effects can result in a loss of power to detect differentially expressed genes. Technical batch effects from library preparation and sequencing as well as biological batch effects (e.g., different litters of animals or varying tissue ascertainment conditions) can be accounted for by different methods, for instance, by including a reference sample in each prep/run and by ensuring that each batch includes enough samples from each treatment. In addition, the program ComBat implemented in R [8] can be used to correct for known batch effects and other confounders, which has been shown to increase the power to detect differential gene expression.
6. Importantly, RNASeq will analyze the expression of thousands of genes and thus warrants stringent adjustment for multiple testing to prevent false-positive findings. In most cases, the raw P-values of differential gene expression can be adjusted according to the false discovery rate (FDR). The FDR is reported by the stattest() function in R, and FDR values below 5% are usually considered to be statistically significant. A low number of biological replicates can result in a lack of significant findings after adjustment for multiple testing. Conduct a power analysis before preparing the libraries to find out how many biological replicates are necessary to identify significantly differentially expressed genes, given an expected effect size. More samples might be required than expected.
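The lookup described in Note 3 can be checked before starting an alignment. The following sketch is an illustration under stated assumptions: the function names hisat2_index_exists and resolve_index are hypothetical, and the check assumes the usual HISAT2 naming scheme in which an index with a given prefix consists of files named [prefix].1.ht2, [prefix].2.ht2, and so on.

```shell
# Succeed if the first index file (<prefix>.1.ht2) exists for the given prefix.
hisat2_index_exists() {
    [ -f "$1.1.ht2" ]
}

# Print the index prefix to pass to -x: look in $HISAT2_INDEXES first,
# then in a fallback folder. Returns 1 if the index is found in neither.
resolve_index() {
    local name="$1" fallback_dir="$2" dir
    for dir in "${HISAT2_INDEXES:-}" "$fallback_dir"; do
        [ -n "$dir" ] || continue
        if hisat2_index_exists "${dir%/}/$name"; then
            echo "${dir%/}/$name"
            return 0
        fi
    done
    return 1
}
```

For the folder layout used in this chapter, resolve_index genome_tran "$HOME/bin/hisat2/genomes/grch38_tran" would be expected to print the absolute index prefix that can then be supplied to HISAT2 with -x.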
Acknowledgment
This work was supported in part by a grant from the Deutsche Forschungsgemeinschaft (GR 5065/1-1) and by the institutional budget for Research and Teaching from the Free State of Bavaria (Titel 73).

References
1. Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL (2016) Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc 11:1650–1667
2. Tian L, Kazmierkiewicz KL, Bowman AS, Li M, Curcio CA, Stambolian DE (2015) Transcriptome of the human retina, retinal pigmented epithelium and choroid. Genomics 105:253–264
3. Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, Bravo HC, Davis S, Gatto L, Girke T et al (2015) Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12:115–121
4. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M et al (2013) NCBI GEO: archive for functional genomics data sets-update. Nucleic Acids Res 41:D991–D995
5. Hansen KD, Brenner SE, Dudoit S (2010) Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Res 38:e131–e131
6. Williams CR, Baccarella A, Parrish JZ, Kim CC (2016) Trimming of sequence reads alters RNA-Seq gene expression estimates. BMC Bioinformatics 17:103
7. Robinson MD, Oshlack A (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol 11:R25
8. Johnson WE, Li C, Rabinovic A (2007) Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8:118–127
Chapter 3
Testing for Known Retinal Degeneration Mutants in Mouse Strains
Khalid Rashid, Katharina Dannhausen, and Thomas Langmann

Abstract
Approximately 93 years ago at the zoological laboratories of Harvard University, Keeler, a medical geneticist, discovered a retina from a male albino mouse that was completely devoid of visual cells (rods). This rodless mouse was to be the first ever reported murine model of retinal degeneration. Over the years, naturally occurring retinal degeneration mouse mutants have been identified in several common laboratory inbred lines including FVB/NJ (Pde6brd1) and C57BL/6N (Crb1rd8). It is therefore imperative that vision researchers employing other genetically induced retinal degeneration models and experimental models such as laser-induced choroidal neovascularization (CNV) or bright white-light exposure screen for such naturally occurring mutations to prevent costly misinterpretations. In this regard, we describe herein simple molecular-based techniques for screening for the presence of some commonly encountered rd mutations (Pde6brd1, Crb1rd8, Pde6brd10, and Rpe65rd12).

Key words Mice, Molecular techniques, Mutations, Retinal degeneration
1 Introduction
Retinal degenerative diseases (RDDs) are a multitude of debilitating diseases which result in impaired vision or incurable blindness due to photoreceptor loss or dysfunction [1]. RDDs can be classified broadly into monogenic diseases, inherited in a classic Mendelian fashion, or multifactorial diseases [2]. Some of the most common human forms of RDDs include Leber congenital amaurosis, retinitis pigmentosa, macular degeneration, cone or cone-rod dystrophies, congenital stationary night blindness, as well as syndromic ciliopathies such as Usher’s syndrome [1, 2]. Tremendous progress made in molecular diagnostic techniques, such as the advent of next-generation sequencing, has seen around 316 mutations in various genes and loci reported to cause inherited retinal diseases in humans (RetNet: https://sph.uth.edu/retnet/).
Bernhard H. F. Weber and Thomas Langmann (eds.), Retinal Degeneration: Methods and Protocols, Methods in Molecular Biology, vol. 1834, https://doi.org/10.1007/978-1-4939-8669-9_3, © Springer Science+Business Media, LLC, part of Springer Nature 2019
Spontaneously occurring or genetically engineered murine models of RDDs display pathophysiological phenotypes that mimic the clinical manifestations of the corresponding human diseases. For instance, the Pde6brd1 mouse carrying a nonsense mutation in the Pde6b gene that codes for the β-subunit of cGMP phosphodiesterase (PDE) bears a great deal of phenotypic resemblance to human patients suffering from autosomal recessive retinitis pigmentosa, which also results from cGMP-PDE gene mutations (OMIM 180072) [3, 4]. The shared phenotypic characteristics include degeneration of photoreceptors and their nuclei in the outer nuclear layer, retinal vessel attenuation, pigment patches in the fundus as well as diminished electroretinogram (ERG) amplitudes [3–5]. As such, these models have contributed immensely to our understanding of the underlying neuropathological processes and disease mechanisms. These murine models have in addition been fundamental in the design, testing, optimization, as well as refinement of therapeutic approaches [6].
Vision researchers developing new genetic mouse models or using normal strains in experimental models such as light- or laser-induced damage should screen their mice to exclude contamination with the different retinal degeneration (rd) alleles. The importance of such a practice is illustrated by a classic example from 2012, in which researchers crossing induced ocular mutants onto the C57BL/6N background noted the same ocular phenotype in their littermate controls [7]. A PCR screen later uncovered that the C57BL/6N mice were homozygous for the rd8 mutation [7]. A further example comes from a study utilizing Ccl2−/−/Cx3cr1−/− double knockout mice, where the early onset of retinal degeneration characterized by the accumulation of drusen and bloated macrophages in the subretinal space was a result of the homozygous rd8 mutation rather than the synergistic effect of the combined knockout [8, 9].
Therefore, to prevent such misinterpretation due to the unnoticed presence of the rd mutations, we outline the steps to be undertaken in genotyping some commonly encountered rd mutations (Pde6brd1, Crb1rd8, Pde6brd10, and Rpe65rd12) in many common laboratory mouse strains. Our article complements other valuable resources for typing rd mutations [7, 10, 11] (Fig. 1).
2 Materials
2.1 Animals
Ear tags for DNA extraction were obtained from C57BL/6J controls, homozygous mutants FVB/NJ (Pde6brd1), C57BL/6N (Crb1rd8), C57BL/6J (Pde6brd10), C57BL/6J (Rpe65rd12), as well as their respective heterozygous hybrids (crossed with wild-type C57BL/6J). All experimental protocols and procedures involving the use of mice as experimental animals adhered to rules and regulations set out by the Association for Research in Vision and Ophthalmology (ARVO).
Fig. 1 Schematic representation of the various steps involved in the genotyping of some commonly encountered retinal degeneration mutations in mice (Pde6brd1, Crb1rd8, Pde6brd10, and Rpe65rd12)

2.2 Equipment
1. Nanodrop Spectrophotometer.
2. Thermocycler.
3. Gel electrophoresis chamber and power supply.
4. Gel documentation station (UV-transilluminator, computer, capture software, and a printer).
5. Gel casting trays and combs.
6. Dry block heaters/Water bath.
7. Microwave.
8. Microcentrifuge.
9. Vortex mixer.
10. Analytical balance.
11. Ice boxes.
12. Adjustable pipets.
13. PCR strips.
14. Microcentrifuge tubes (0.5–1.5 mL).
2.3 Reagents
1. Tissue samples as source of genomic DNA (tail tip 1–2 mm or ear tissue).
2. Nuclease-free water (ddH2O).
3. Alkaline lysis solution (25 mM NaOH, 0.2 mM EDTA in ddH2O).
4. Neutralization solution (40 mM Tris–HCl, pH 5).
5. Laird’s buffer (200 mM NaCl, 100 mM Tris–HCl pH 8.3, 5 mM EDTA, proteinase K, 20% SDS, isopropanol, ethanol, ddH2O).
6. Reverse and forward primers (see Table 1).
7. DNA loading dye.
8. Agarose.
9. Tris/borate/EDTA (TBE) buffer (see Note 1).
10. Ethidium bromide.
11. 1 kb DNA ladder.
12. NarI restriction enzyme + 10× restriction buffer.
13. PCR Kit.
14. DNA purification Kit.

Table 1 PCR thermal profiles for genotyping the Pde6brd1 allele
Cycle step             Temperature (°C)   Time     Cycles
Initial denaturation   94                 5 min    1
Denaturation           94                 40 s     35
Annealing              65                 44 s     35
Extension              72                 2 min    35
Final extension        72                 10 min   1
Hold                   4–10
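For planning purposes, the thermal profile in Table 1 can be written down as a small data structure and used to estimate how long the thermocycler is occupied. The following is only a sketch in Python: the variable and function names are illustrative, the temperatures and times are transcribed from Table 1, and the estimate covers programmed hold times only, since ramping between temperatures depends on the instrument.

```python
# Thermal profile for the Pde6brd1 PCR, transcribed from Table 1.
# Each entry: (step name, temperature in deg C, hold time in seconds).
INITIAL = [("initial denaturation", 94, 5 * 60)]
CYCLE = [
    ("denaturation", 94, 40),
    ("annealing", 65, 44),
    ("extension", 72, 2 * 60),
]
FINAL = [("final extension", 72, 10 * 60)]
N_CYCLES = 35


def programmed_hold_seconds(initial, cycle, final, n_cycles):
    """Sum all programmed hold times; ramp times are not included."""
    once = sum(t for _, _, t in initial) + sum(t for _, _, t in final)
    per_cycle = sum(t for _, _, t in cycle)
    return once + n_cycles * per_cycle


total_s = programmed_hold_seconds(INITIAL, CYCLE, FINAL, N_CYCLES)
total_min = total_s / 60
print(f"Programmed hold time: {total_min:.0f} min (ramping not included)")
```

With the values from Table 1 this gives 134 min of programmed hold time, so the actual run will take somewhat longer once ramping is added.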
3 Methods
3.1 DNA Extraction Hotshot Procedure
1. Add 75 μL lysis solution to the fresh or thawed tissue samples. Make sure that the sample is covered in lysis solution and avoid air bubbles. Air bubbles sticking to the sample can lower the amount of resulting DNA.
2. Place the samples in a heating block at 95 °C for 15–20 min.
3. Following the incubation, vortex the samples, and allow them to cool down for 10–15 min on ice or by placing in a −20 °C freezer.
4. Then add 75 μL of neutralization solution, mix thoroughly by flipping, and allow undigested tissue debris to settle.
5. Collect the supernatant in a fresh microcentrifuge tube. This can be used immediately in a PCR reaction mixture.
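The lysis and neutralization solutions used in this procedure (see Subheading 2.3) are simple dilutions of concentrated stocks, so the required volumes follow from C1 × V1 = C2 × V2. The sketch below works through this in Python. The stock concentrations (1 M NaOH, 0.5 M EDTA, 1 M Tris–HCl) and the 50 mL batch size are assumptions chosen for illustration and are not specified in the protocol; only the final concentrations are taken from the reagent list. Adjusting the pH of the neutralization solution is not covered.

```python
def stock_volume_ml(stock_mM, final_mM, final_volume_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2."""
    if final_mM > stock_mM:
        raise ValueError("final concentration exceeds stock concentration")
    return final_mM * final_volume_ml / stock_mM


FINAL_VOLUME_ML = 50.0  # assumed batch size, not from the protocol

# Alkaline lysis solution: 25 mM NaOH, 0.2 mM EDTA (final concentrations
# from the reagent list; stock concentrations are assumptions).
naoh_ml = stock_volume_ml(stock_mM=1000.0, final_mM=25.0,
                          final_volume_ml=FINAL_VOLUME_ML)
edta_ml = stock_volume_ml(stock_mM=500.0, final_mM=0.2,
                          final_volume_ml=FINAL_VOLUME_ML)
lysis_water_ml = FINAL_VOLUME_ML - naoh_ml - edta_ml

# Neutralization solution: 40 mM Tris-HCl (assumed 1 M stock).
tris_ml = stock_volume_ml(stock_mM=1000.0, final_mM=40.0,
                          final_volume_ml=FINAL_VOLUME_ML)
neutral_water_ml = FINAL_VOLUME_ML - tris_ml

print(f"Lysis: {naoh_ml:.2f} mL NaOH, {edta_ml * 1000:.0f} uL EDTA, "
      f"{lysis_water_ml:.2f} mL water")
print(f"Neutralization: {tris_ml:.2f} mL Tris-HCl, "
      f"{neutral_water_ml:.2f} mL water")
```

Under these assumptions, 50 mL of lysis solution needs 1.25 mL of NaOH stock and 20 μL of EDTA stock, and 50 mL of neutralization solution needs 2 mL of Tris–HCl stock.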
3.2 DNA Extraction Modified “Laird’s” Procedure
1. Cut tissue into small pieces, add 500 μL of Laird’s buffer, 20 μL 20% SDS, and 20 μL proteinase K, and incubate for 3–4 h at 56 °C until the tissue is completely dissolved.
2. Centrifuge at 11,000 × g for 10 min, and pipet the supernatant into a new 1.5 mL microcentrifuge tube to get rid of the remaining fur.
3. Add 500 μL of isopropanol, mix thoroughly until the DNA becomes visible as a thin white string, and then centrifuge again at 11,000 × g for 10 min.
4. Discard the supernatant, and wash the DNA using 70% ethanol before centrifuging for 10 min at 11,000 × g.
5. Discard the supernatant and leave the pellet to dry. Dissolve in 20–50 μL of ddH2O based on the size of the pellet.
3.3 Genotyping Procedures
3.3.1 Retinal Degeneration 1
The Pde6brd1 mutation is caused by the integration of an 8.5 kb xenotropic murine leukemia virus (Xmv-28) element into intron 1 of the Pde6b gene encoding the β-subunit of cGMP phosphodiesterase [12]. This integration occurs 1511 bp downstream of the exon-intron boundary and interferes with the normal transcription of this gene [4, 12]. Consequently, the rd1 model is characterized by a rapid rod photoreceptor degeneration, leaving the outer nuclear layer with only a single layer of cone photoreceptors by 4 weeks of age [6]. Other associated phenotypic features include vessel attenuation and pigment patches in the fundus [4]. The PCR method described herein was developed by Giménez and Montoliu [10] and makes use of two different combinations of oligonucleotides (Pde6brd1_F1/Pde6brd1_R and Pde6brd1_F2/Pde6brd1_R) to identify mice homozygous, heterozygous, or wild type for the rd1 mutation (Fig. 2) (see Note 2).

Fig. 2 Schematic representation of the Pde6brd1 mutation. The scheme shows insertion of an 8.5 kb xenotropic murine leukemia virus (Xmv-28) element into intron 1 of the Pde6b gene encoding the β-subunit of cGMP phosphodiesterase. The respective positions of the genotyping primers used are also indicated

Fig. 3 Agarose gel showing resolution of wild-type (400 bp), rd1 homozygous (550 bp), and rd1 heterozygous genotypes using primer pairs Pde6brd1_F1/Pde6brd1_R (RD1 WT assay) and Pde6brd1_F2/Pde6brd1_R (RD1 mut assay), respectively

Table 2 PCR thermal profiles for genotyping the Crb1rd8 allele
Cycle step             Temperature (°C)   Time     Cycles
Initial denaturation   94                 5 min    1
Denaturation           94                 40 s     35
Annealing              56                 44 s     35
Extension              72                 1 min    35
Final extension        72                 10 min   1
Hold                   4–10

1. Primer pair Pde6brd1_F1/Pde6brd1_R amplifies a 0.40 kb PCR product from the wild-type allele, whereas pair Pde6brd1_F2/Pde6brd1_R amplifies a 0.55 kb PCR product from the Pde6brd1 mutant allele. No PCR product should be obtained from the rd1 mutants using the wild-type primer pair and vice versa, while both PCR bands should be present in heterozygous animals (Fig. 3).
2. We recommend using a High-Fidelity DNA Polymerase PCR Kit. In the current protocol, we have used the Q5® High-Fidelity DNA Polymerase products for preparing a master mix containing 5 μL of the 5× Q5 reaction buffer, 0.5 μL of 10 mM dNTPs, 1.25 μL forward and 1.25 μL reverse primers (10 μM primer stock solution), 0.25 μL Q5 High-Fidelity DNA Polymerase, and 5 μL of 5× Q5 High GC Enhancer (optional); fill up to a total volume of 25 μL with nuclease-free water.
3. Pipet the appropriate volume of the PCR master mix solution per tube and afterward each sample DNA (