Long Non-Coding RNAs: Methods and Protocols (Methods in Molecular Biology, 2372) 107161696X, 9781071616963

This second edition provides a broad spectrum of methods used in long non-coding RNAs (lncRNA) research, ranging from co


English Pages 321 [313] Year 2021


Table of contents :
Preface
Contents
Contributors
Chapter 1: LncRNA Pulldown Combined with Mass Spectrometry to Identify the Novel lncRNA-Associated Proteins
1 Introduction
2 Materials and Reagents
2.1 Reagents
2.1.1 For Preparation of In Vitro Transcribed RNA (IVT)
2.1.2 For Preparation of Cell Lysate
2.1.3 For lncRNA Pulldown
2.1.4 For In-Solution Tryptic Digestion
3 Methods
3.1 IVT of Biotin-Labeled lncRNA
3.1.1 Cloning the lncRNA Gene Sequence into pGEM-3Z Vector
3.1.2 Linearization of the pGEM-3Z-lncRNA Vector
3.1.3 IVT Reaction
3.1.4 Purification and Size/Yield Check of IVT RNA
3.2 Preparation of Cell Lysate
3.3 Activation of BcMag Monomer Avidin Magnetic Beads
3.4 lncRNA Pulldown
3.4.1 Pre-clearing the Lysate with Activated Beads to Reduce Non-specific Binding
3.4.2 Folding of IVT RNA
3.4.3 LncRNA Pulldown
3.5 MS Analysis
4 Notes
References
Chapter 2: Crosslinking Immunoprecipitation and qPCR (CLIP-qPCR) Analysis to Map Interactions of Long Noncoding RNAs with Cano...
1 Introduction
2 Materials
2.1 Mammalian Cell Culture and RNA Labeling
2.2 UVA Crosslinking and Cell Harvesting
2.3 RNase Digestion and Immunoprecipitation
2.4 Reverse Transcription and qPCR
3 Methods
3.1 Mammalian Cell Culture and RNA Labeling
3.2 UVA Crosslinking and Harvesting Cells
3.3 RNase Digestion and Immunoprecipitation
3.4 Reverse Transcription and qPCR
4 Notes
References
Chapter 3: Characterization of Long Non-coding RNA Associated Proteins by RNA-Immunoprecipitation
1 Introduction
2 Materials
3 Methods
3.1 Whole Cell Lysate Preparation (See Note 2)
3.2 Cell Harvest and Nuclei Lysate Preparation (See Note 2)
3.3 RNA Immune-Precipitation and Purification
4 Notes
References
Chapter 4: Isolation of Protein Complexes Associated with Long Non-coding RNAs
1 Introduction
2 Materials
2.1 Bacterial Lysis Buffer
2.2 MBP Column Buffer
2.3 Wash Buffer 1
2.4 Wash Buffer 2
3 Methods
3.1 MS2-MBP Fusion Protein Expression and Immobilization
3.2 Transfection and Cytoplasmic Protein Extraction
3.3 Pull-Down Assay and Protein Separation
4 Notes
References
Chapter 5: Detection of Long Noncoding RNA Expression by Real-Time PCR
1 Introduction
2 Materials
2.1 List of Materials
2.2 Primer and Probe Design
2.3 cDNA Template
2.4 Real-Time PCR Mix
3 Methods
3.1 Reaction Preparation
3.2 Temperature Cycling
3.3 Detection Methods
3.3.1 SYBR Green
3.3.2 TaqMan Probe
3.3.3 Molecular Beacon
3.4 Data Analysis
4 Notes
References
Chapter 6: Profiling Long Non-coding RNA expression Using Custom-Designed Microarray
1 Introduction
2 Materials
2.1 RNA Purification
2.2 RNA polyA Tailing and cDNA Synthesis
2.3 cDNA Labeling
2.4 DNA Hybridization
2.5 Equipment
3 Methods
3.1 DNase Treatment
3.2 Poly (A) Tailing
3.3 First Strand cDNA Synthesis
3.4 Second Strand Synthesis
3.5 Digestion of RNA
3.6 QC of cDNA
3.7 Labeling
3.8 Hybridization
4 Notes
References
Chapter 7: Long Non-coding RNA Expression Profiling Using Arraystar LncRNA Microarrays
1 Introduction
2 Materials
3 Methods
3.1 RNA Samples
3.2 RNA Sample QC
3.3 Spike-in Control
3.4 cRNA Synthesis and Labeling
3.4.1 Double-Strand cDNA Synthesis
3.4.2 In Vitro Transcription of cRNA
3.4.3 cRNA Purification
3.4.4 Quantification of the cRNA
3.5 Microarray Hybridization
3.6 Microarray Washing
3.7 Microarray Scanning and Data Extraction
3.7.1 Agilent Scanner Settings
3.7.2 GenePix Scanner Settings
3.7.3 Data Extraction
3.8 Data Analysis
3.8.1 Set Up Raw Data Import
3.8.2 Create New Project for Data Analysis
3.9 Standard Analyses
4 Notes
References
Chapter 8: Nuclear RNA Isolation and Sequencing
1 Introduction
2 Materials
2.1 Components for Isolation and Testing Intact Nuclei
2.2 Components for Nuclear RNA Isolation
2.3 Required Equipment
3 Methods
3.1 Isolating Intact Nuclei and Testing the Purity of the Nuclear Fraction
3.1.1 Isolation of Intact Nuclei
3.1.2 Testing the Purity of Nuclear Fraction
3.2 Trizol Extraction of Nuclear RNA
3.3 Generation of Illumina Paired End Library for Sequencing
4 Notes
References
Chapter 9: Capturing Endogenous Long Noncoding RNAs and Their Binding Proteins Using Chromatin Isolation by RNA Purification
1 Introduction
2 Materials
2.1 Biotinylated DNA Probes
2.2 Buffers for ChIRP
2.3 Reagents and Supplies
2.4 Instruments
3 Methods
3.1 Preparation of Tissue Samples
3.2 Cross-Linking of the lncRNA-Protein Complex
3.3 Tissue Lysis
3.4 Pre-clearing of the Tissue Lysate
3.5 Hybridization of the Endogenous lncRNA in the Tissue Lysate with the Probes
3.6 Wash the Protein-lncRNA-Probe-Streptavidin Bead Complex
3.7 Protein Elution for Mass Spectrometry or Immunoblotting
3.8 RNA Isolation for Real-Time Quantitative PCR
3.9 Real-Time Quantitative PCR to Validate RNA Pulldown
4 Notes
References
Chapter 10: Purification and Structural Characterization of the Long Noncoding RNAs
1 Introduction
2 Materials
2.1 Transcription
2.2 Native Purification
2.3 Structure Probing and Library Preparation
3 Methods
3.1 Transcription
3.2 Native Purification
3.3 Filtration and Buffer Exchange
3.4 Size Exclusion Chromatography
3.5 Magnesium Titration for Determining the lncRNA Folding Conditions
3.6 Secondary Structure Probing
3.6.1 SHAPE Probing and Reverse Transcription
3.6.2 DMS-MaP Probing and Reverse Transcription
3.7 cDNA Amplification
3.8 Library Preparation
3.9 Data Analysis
3.9.1 Structure Determination
3.9.2 Shotgun Secondary Structure (3S) Analysis
3.9.3 Confidence Estimation
3.9.4 Validate the Model Using DMS Probing Data
4 Notes
References
Chapter 11: Simultaneous RNA-DNA FISH
1 Introduction
2 Materials
2.1 Equipment
2.2 Commercial Reagents
2.3 Buffers and Solutions
3 Methods
3.1 Probe Preparation
3.1.1 Synthesis of a Single Oligonucleotide Probe Dually Labeled with Fluorescein and Amino Group
3.1.2 Synthesis of Cy5 and Amino Dually Labeled Probe Set for SMRF
3.1.3 Nick Translation
3.2 Slide Preparation
3.2.1 Cytospin
3.2.2 Cell Culture on Coverslips
3.3 FISH
3.3.1 Single Molecule RNA FISH (SMRF)
3.3.2 RNA FISH
3.3.3 DNA FISH
4 Notes
References
Chapter 12: Highly Resolved Detection of Long Non-coding RNAs In Situ
1 Introduction
2 Materials
2.1 Sample Preparation: Cultured Cells
2.2 Sample Preparation: Mouse Embryos/Embryo Fragments
2.3 Immunofluorescence (IF)
2.4 RNA and DNA FISH: Probe Labeling for Double-Stranded Probes (See Note 7)
2.5 RNA FISH: Probe Labeling for Strand-Specific Probes (See Note 7)
2.6 RNA and DNA FISH: Probe Precipitation
2.7 Allele-Specific RNA FISH: Guide Probe Precipitation
2.8 RNA FISH: Sample Hybridization
2.9 DNA FISH: Sample Hybridization
2.10 Allele-Specific RNA FISH Protocol: Sample Hybridization (See Note 13)
3 Methods
3.1 Sample Preparation: Cultured Cells (See Note 16)
3.2 Sample Preparation: Mouse Embryos
3.3 Immunofluorescence (IF)
3.4 RNA FISH: Probe Labeling for Double-Stranded Probes
3.5 RNA FISH: Probe Labeling for Strand-Specific Probes
3.6 RNA FISH: Probe Precipitation
3.7 Allele-Specific RNA FISH: Guide Probe Precipitation
3.8 RNA FISH: Sample Hybridization
3.9 DNA FISH: Sample Hybridization
3.10 Allele-Specific RNA FISH: Sample Hybridization
4 Notes
References
Chapter 13: Detection of Long Non-coding RNA Expression by Non-radioactive Northern Blots
1 Introduction
2 Materials
2.1 Materials for DIG-Labeled RNA Probe Synthesis
2.2 Materials for Separating RNA by Electrophoresis
2.3 Materials for Transferring RNA to the Membrane
2.4 Materials for Probe-RNA Hybridization
2.5 Materials for Detection of Probe-RNA Hybrids
3 Methods (See Note 2)
3.1 DIG-Labeled RNA Probe Synthesis by In Vitro Transcription
3.2 Separating RNA Samples by Electrophoresis
3.2.1 Gel Setup
3.2.2 RNA Electrophoresis
3.3 Transfer RNA from Agarose Gel to the Membrane
3.4 Hybridization of DIG-Labeled Probes to the Membrane (See Note 13)
3.4.1 Prehybridization
3.4.2 Hybridization
3.5 Detection of DIG-Probe/Target RNA Hybrids
3.5.1 Localizing the Probe-Target Hybrid with Anti-DIG Antibody
3.5.2 Visualizing Probe-Target Hybrids Using Chromogenic or Chemiluminescent Method (See Note 18)
4 Notes
References
Chapter 14: Assessment of In Vivo siRNA Delivery in Cancer Mouse Models
1 Introduction
2 Materials
2.1 Commercial Reagents
2.2 Equipment
3 Methods
3.1 Preparation of siRNA-Chitosan Nanoparticles (siRNA/CH-NPs)
3.2 Delivery of siRNA/CH-NPs into Ovarian Cancer Cells
3.3 Development of Orthotopic In Vivo Models of Ovarian Cancer
3.4 Determination of Uptake of Fluorescent siRNA/CH-NPs in Ovarian Tumors
3.5 RNA Isolation from Blood, Plasma, and Tissue
3.6 siRNA Quantification in Blood and Tissue by Stem-Loop qRT-PCR
3.7 Fluorescence-Based Biodistribution Study
3.8 Quantifying Levels of Gene Knockdown Using Quantitative Reverse Transcription PCR (qRT-PCR)
4 Notes
References
Chapter 15: Genetic Editing of Long Noncoding RNA Using CRISPR/Cas9 Technology
1 Introduction
2 Materials
2.1 CRISPR/cas9 to lncRNA Construction
2.2 Transient Transfection of CRISPR/cas9 and Selection
2.3 T7EN1 Analysis and DNA Sequencing
2.4 RNA Isolation and SYBR-Based qRT-PCR
3 Methods
3.1 CRISPR/cas9 to lncRNA Construction and Transfection
3.1.1 Design for gRNA Sequences of CRISPR/cas9 Against lncRNAs
3.1.2 Cloning Specific lncRNA Target Sequence into Lenti-CRISPRv2
3.2 T7EN1 Analysis and DNA Sequencing
3.3 RNA Isolation and SYBR-Based qRT-PCR
3.3.1 RNA Isolation
3.3.2 SYBR-Based qRT-PCR Analysis
4 Notes
References
Chapter 16: Characterization of Circular RNAs
1 Introduction
2 Materials
2.1 RNA Fractionations
2.2 RNase R Digestion
2.3 Validation of Circular RNAs
2.3.1 RNA Deep-Sequencing and Genome-Wide Analysis of Circular RNAs
2.3.2 Denaturing Urea-Polyacrylamide Gel Electrophoresis (Urea PAGE)
2.3.3 Divergent RT-PCR
3 Methods
3.1 Fractionation of Nonpolyadenylated RNAs from Mammalian Cells
3.2 RNase R Digestion (See Note 2)
3.3 Validation of Circular RNAs
3.3.1 High-Throughput Sequencing Analysis of Circular RNAs
3.3.2 Northern Blots with Denaturing Polyacrylamide Gel Electrophoresis (PAGE)
3.3.3 RT-PCR with Divergent Primers
4 Notes
References
Chapter 17: Identification of circRNA-Interacting Proteins by Affinity Pulldown
1 Introduction
2 Materials
2.1 General Reagents
2.2 Preparation of Biotinylated Antisense Oligomer (ASO)
2.3 Cell Lysis and Binding
2.4 Pulldown of circRNAs and Elution of Proteins
2.5 RNA Isolation, Reverse Transcription, and qPCR
3 Methods
3.1 Preparation of Cell Lysate
3.2 Binding of circRNPs to Biotinylated SO/ASO (All Steps on Ice)
3.3 Preparation of Streptavidin Magnetic Beads and circRNP Purification
3.4 RNA Isolation, Reverse Transcription and qPCR
3.5 Preparation of Protein for MS Analysis and Immunoblotting Validation
4 Notes
References
Chapter 18: Exploration of Circular RNA Interactomes by RNA Pull-Down Method
1 Introduction
2 Materials
3 Methods
3.1 Preparation of Biotin-Labeled DNA Probes
3.2 Preparation of Streptavidin Magnetic Beads
3.3 Binding of Biotin-Labeled DNA Probes to Streptavidin Magnetic Beads
3.4 Preparation of Cell Lysate
3.5 RNA Pull-Down for circRNA Interacting Partners
4 Notes
References
Chapter 19: Methods for Characterization of Alternative RNA Splicing
1 Introduction
2 Materials
2.1 RNA Source Material
2.2 RNA Extraction and RT
2.3 qPCR
2.4 Semiquantitative PCR
2.5 Cotransfection of Splicing Minigene and Splicing Factors
2.6 Equipment
3 Methods
3.1 RNA Extraction and Purification
3.2 RT Reaction
3.3 Primer Design for qPCR
3.4 Quantification of Splicing Via qPCR
3.5 Primer Design for Detection of Splice Isoforms Via Semiquantitative PCR
3.6 Performing and Analyzing Semi-Quantitative PCR
3.7 Generating a Splicing Minigene to Analyze Regulation of Alternative Splicing of a Variable Exon
3.8 Investigating the Regulation of Alternative Splicing Via Cotransfection of a Splicing Minigene and Splicing Factors
4 Notes
References
Chapter 20: Computational Analysis of lncRNA from cDNA Sequences
1 Introduction
2 Materials
2.1 Hardware Requirements
2.2 Software Requirements
3 Methods
3.1 Size Selection of Long Noncoding RNAs
3.2 Identification of Potential lncRNAs Using the Coding Potential Calculator
3.3 Filtering Known miRNA Precursors
3.4 Small RNA Precursor Prediction Via Mapping of Available Small RNA Sequences
3.5 Localization of lncRNAs Within the Genome, Genic Vs Intergenic
3.6 Sub-Localization of Genic lncRNAs Within Gene Models
4 Notes
References
Chapter 21: Characterizing miRNA-lncRNA Interplay
Abbreviations
1 Introduction
2 Materials
3 Methods
3.1 Experimentally Verified miRNA-lncRNA Interactions
3.2 Experimental Methodologies
3.3 DIANA-LncBase
3.3.1 Analysis of AGO-CLIP-Seq Datasets
3.3.2 Preprocessing of Deep Sequencing Data
3.3.3 Identification of AGO-CLIP-Guided miRNA Targets with microCLIP
3.4 Hands-on Guide for LncBase v3 Utilization
3.4.1 LncBase Module of miRNA-lncRNA Interactions
3.4.2 LncBase Module of lncRNA Expression Profiles
3.4.3 Visualization Plots
3.5 Download of LncBase Content
4 Notes
References
Chapter 22: Illuminating lncRNA Function Through Target Prediction
1 Long Noncoding RNAs Are Ubiquitous Players in Cellular Processes
1.1 Cataloging lncRNAs
1.2 Estimating lncRNA Expression Profiles in Normal and Disease Cells
1.3 lncRNAs Regulate Key Genes and Pathways in Normal and Disease Cells
2 Systems Biology Approaches to Predict lncRNA Function
2.1 Systems Biology Approaches to Evaluate the Phenotypic Effects of lncRNAs
2.2 Systems Biology Approaches to Identify lncRNA Targets
3 Computational Inference to Predict lncRNA Function
3.1 Predicting Transcriptional Interactions
3.2 Predicting Posttranscriptional Interactions
3.3 Integrative Computational Methods
3.4 Total RNA-Seq Helps Improve lncRNA Interaction Prediction
3.5 Subcellular Localization Predictions
3.6 Gene Set Enrichment Analysis Helps Identify Functional Roles of lncRNAs
4 Challenges and Future Perspectives
References
Chapter 23: Long Noncoding RNAs: An Overview
1 Introduction
2 Characteristics of LncRNAs
3 Physiological Functions of LncRNAs
3.1 Roles in Developmental Processes
3.1.1 X Chromosome Inactivation
3.1.2 Stem Cell Pluripotency and Somatic Cell Reprogramming
3.1.3 Lineage Specification
3.2 Functions in Human Diseases
3.3 LncRNAs and Imprinting
4 Mechanism for lncRNA Function
References
Index

Methods in Molecular Biology 2372

Lin Zhang Xiaowen Hu Editors

Long Non-Coding RNAs Methods and Protocols Second Edition

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily-reproducible step-by-step fashion, opening with an introductory overview, a list of the materials and reagents needed to complete the experiment, and followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Long Non-Coding RNAs Methods and Protocols Second Edition

Edited by

Lin Zhang and Xiaowen Hu Department of Obstetrics and Gynecology, University of Pennsylvania, Philadelphia, PA, USA

Editors Lin Zhang Department of Obstetrics and Gynecology University of Pennsylvania Philadelphia, PA, USA

Xiaowen Hu Department of Obstetrics and Gynecology University of Pennsylvania Philadelphia, PA, USA

ISSN 1064-3745 ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-1696-3 ISBN 978-1-0716-1697-0 (eBook)
https://doi.org/10.1007/978-1-0716-1697-0

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

The advent of next generation sequencing revealed that the majority of the human genome is transcribed to non-coding RNAs. Recent studies on long non-coding RNAs (lncRNAs) suggest that they may represent another class of important, nonprotein regulators of various biological processes, including cell proliferation, differentiation, migration, apoptosis, and transformation. In this volume of the Methods in Molecular Biology series, we assemble a broad spectrum of methods used in lncRNA research, ranging from computational annotation of lncRNA genes to molecular and cellular analyses of the function of individual lncRNA. Methods used to study circular RNAs and RNA splicing also are included. Given that methods for lncRNA research are developed rapidly, we also add chapters describing the applications of new technologies such as CRISPR in the lncRNA field.

We would like to make this book a must-have for anyone who conducts lncRNA research. The intended audience includes molecular biologists, cell and developmental biologists, specialists who conduct disease-oriented research, and bioinformatics experts who seek a better understanding on lncRNA expression and function by computational analysis of the massive sequencing data that are rapidly accumulating in recent years. It is our hope this Long Non-coding RNA Protocol will stimulate the reader to explore diverse ways to understand the mechanisms by which lncRNAs facilitate the molecular aspects of biomedical research.

We would like to acknowledge and thank all authors for their valuable contributions, particularly for sharing with readers all hints, tips, and observations that one learns from using a method regularly. We also thank Dr. John Walker for his technical guidance and support.

Philadelphia, PA, USA
Lin Zhang
Xiaowen Hu

Contents

Preface
Contributors

1 LncRNA Pulldown Combined with Mass Spectrometry to Identify the Novel lncRNA-Associated Proteins
Zhen Xing, Sergey Egranov, Chunru Lin, and Liuqing Yang

2 Crosslinking Immunoprecipitation and qPCR (CLIP-qPCR) Analysis to Map Interactions of Long Noncoding RNAs with Canonical and Non-canonical RNA-Binding Proteins
Jung-Hyun Cho, Ji Won Lee, Je-Hyun Yoon, and Kyung-Won Min

3 Characterization of Long Non-coding RNA Associated Proteins by RNA-Immunoprecipitation
Junjie Jiang, Tianli Zhang, Yutian Pan, Zhongyi Hu, Jiao Yuan, Xiaowen Hu, Lin Zhang, and Youyou Zhang

4 Isolation of Protein Complexes Associated with Long Non-coding RNAs
Min Li, Ying Xue, Ruiping He, and Qihong Huang

5 Detection of Long Noncoding RNA Expression by Real-Time PCR
Rui Huang, Yihao Wang, Yaqi Deng, and Jianfeng Shen

6 Profiling Long Non-coding RNA expression Using Custom-Designed Microarray
Xinna Zhang and George A. Calin

7 Long Non-coding RNA Expression Profiling Using Arraystar LncRNA Microarrays
Yanggu Shi and Jindong Shang

8 Nuclear RNA Isolation and Sequencing
Navroop K. Dhaliwal and Jennifer A. Mitchell

9 Capturing Endogenous Long Noncoding RNAs and Their Binding Proteins Using Chromatin Isolation by RNA Purification
Jongchan Kim and Li Ma

10 Purification and Structural Characterization of the Long Noncoding RNAs
Allison Yankey, Sean C. Clark, Michael C. Owens, and Srinivas Somarowthu

11 Simultaneous RNA-DNA FISH
Lan-Tian Lai, Zhenyu Meng, Fangwei Shao, and Li-Feng Zhang

12 Highly Resolved Detection of Long Non-coding RNAs In Situ
Megan Trotter, Clair Harris, Marissa Cloutier, Milan Samanta, and Sundeep Kalantry

13 Detection of Long Non-coding RNA Expression by Non-radioactive Northern Blots
Yueying Wang, Mu Xu, Jiao Yuan, Zhongyi Hu, Youyou Zhang, Lin Zhang, and Xiaowen Hu

14 Assessment of In Vivo siRNA Delivery in Cancer Mouse Models
Lingegowda S. Mangala, Cristian Rodriguez-Aguayo, Emine Bayraktar, Nicholas B. Jennings, Gabriel Lopez-Berestein, and Anil K. Sood

15 Genetic Editing of Long Noncoding RNA Using CRISPR/Cas9 Technology
Kristina Larter, Bin Yi, and Yaguang Xi

16 Characterization of Circular RNAs
Yang Zhang, Li Yang, and Ling-Ling Chen

17 Identification of circRNA-Interacting Proteins by Affinity Pulldown
Jen-Hao Yang, Poonam R. Pandey, and Myriam Gorospe

18 Exploration of Circular RNA Interactomes by RNA Pull-Down Method
Mengshi Wu, Dan Peng, and Xiaomin Zhong

19 Methods for Characterization of Alternative RNA Splicing
Samuel E. Harvey, Jingyi Lyu, and Chonghui Cheng

20 Computational Analysis of lncRNA from cDNA Sequences
Susan Boerner and Karen M. McGinnis

21 Characterizing miRNA–lncRNA Interplay
Dimitra Karagkouni, Anna Karavangeli, Maria D. Paraskevopoulou, and Artemis G. Hatzigeorgiou

22 Illuminating lncRNA Function Through Target Prediction
Hua-Sheng Chiu, Sonal Somvanshi, Ting-Wen Chen, and Pavel Sumazin

23 Long Noncoding RNAs: An Overview
Dongmei Zhang, Mengshi Wu, Minmin Xiong, Congjian Xu, Peng Xiang, and Xiaomin Zhong

Index

Contributors EMINE BAYRAKTAR • Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center (MDACC), Houston, TX, USA; Center for RNA Interference and Non-Coding RNA, MDACC, Houston, TX, USA SUSAN BOERNER • Department of Biological Science, Florida State University, Tallahassee, FL, USA GEORGE A. CALIN • Center for RNA Interference and Non-coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Experimental Therapeutics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA LING-LING CHEN • School of Life Science and Technology, ShanghaiTech University, Shanghai, China; State Key Laboratory of Molecular Biology, Shanghai Key Laboratory of Molecular Andrology, CAS Center for Excellence in Molecular Cell Science, Shanghai Institute of Biochemistry and Cell Biology, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China TING-WEN CHEN • National Yang Ming Chiao Tung University, Hsinchu City, Taiwan CHONGHUI CHENG • Lester & Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA HUA-SHENG CHIU • Texas Children’s Cancer Center, Baylor College of Medicine, Houston, TX, USA JUNG-HYUN CHO • Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA SEAN C. CLARK • Biochemistry and Molecular Biology, Drexel University College of Medicine, Philadelphia, PA, USA MARISSA CLOUTIER • Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA YAQI DENG • Department of Ophthalmology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, China NAVROOP K. 
DHALIWAL • Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada SERGEY EGRANOV • Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA MYRIAM GOROSPE • Laboratory of Genetics and Genomics, Biomedical Research Center, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA CLAIR HARRIS • Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA SAMUEL E. HARVEY • Lester & Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA ARTEMIS G. HATZIGEORGIOU • DIANA-Lab, Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece RUIPING HE • Fudan University Zhongshan Hospital, Shanghai, P. R. China

ix

x

Contributors

XIAOWEN HU • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA ZHONGYI HU • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA QIHONG HUANG • Fudan University Zhongshan Hospital, Shanghai, P. R. China RUI HUANG • Department of Ophthalmology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, China NICHOLAS B. JENNINGS • Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center (MDACC), Houston, TX, USA JUNJIE JIANG • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA SUNDEEP KALANTRY • Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA DIMITRA KARAGKOUNI • DIANA-Lab, Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece ANNA KARAVANGELI • DIANA-Lab, Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece; Hellenic Pasteur Institute, Athens, Greece JONGCHAN KIM • Department of Life Sciences, School of Natural Sciences, Sogang University, Seoul, Republic of Korea LAN-TIAN LAI • School of Biological Sciences, Nanyang Technological University, Singapore, Singapore KRISTINA LARTER • Department of Genetics and Stanley S. 
Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, USA JI WON LEE • Department of Biology, College of Natural Sciences, Gangneung-Wonju National University, Gangneung, Gangwon-do, Republic of Korea MIN LI • Fudan University Zhongshan Hospital, Shanghai, P. R. China CHUNRU LIN • Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Cancer Biology Program, The University of Texas Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX, USA GABRIEL LOPEZ-BERESTEIN • Center for RNA Interference and Non-Coding RNA, MDACC, Houston, TX, USA; Department of Experimental Therapeutics, MDACC, Houston, TX, USA JINGYI LYU • Lester & Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA LI MA • Department of Experimental Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA LINGEGOWDA S. MANGALA • Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center (MDACC), Houston, TX, USA; Center for RNA Interference and Non-Coding RNA, MDACC, Houston, TX, USA KAREN M. MCGINNIS • Department of Biological Science, Florida State University, Tallahassee, FL, USA

Contributors

xi

ZHENYU MENG • ZJU-UIUC Institute, International Campus, Zhejiang University, Haining, Zhejiang, China KYUNG-WON MIN • Department of Biology, College of Natural Sciences, Gangneung-Wonju National University, Gangneung, Gangwon-do, Republic of Korea JENNIFER A. MITCHELL • Department of Cell and Systems Biology, University of Toronto, Toronto, ON, Canada MICHAEL C. OWENS • Biochemistry and Molecular Biology, Drexel University College of Medicine, Philadelphia, PA, USA YUTIAN PAN • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA POONAM R. PANDEY • Laboratory of Genetics and Genomics, Biomedical Research Center, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA MARIA D. PARASKEVOPOULOU • DIANA-Lab, Department of Computer Science and Biomedical Informatics, University of Thessaly, Lamia, Greece DAN PENG • Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Center for Stem Cell Biology and Tissue Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, China CRISTIAN RODRIGUEZ-AGUAYO • Center for RNA Interference and Non-Coding RNA, MDACC, Houston, TX, USA; Department of Experimental Therapeutics, MDACC, Houston, TX, USA MILAN SAMANTA • Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA JINDONG SHANG • Arraystar Inc., Rockville, MD, USA FANGWEI SHAO • ZJU-UIUC Institute, International Campus, Zhejiang University, Haining, Zhejiang, China JIANFENG SHEN • Department of Ophthalmology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, China YANGGU SHI • Arraystar Inc., Rockville, MD, USA SRINIVAS SOMAROWTHU • Biochemistry and Molecular Biology, Drexel University College of Medicine, Philadelphia, PA, USA SONAL SOMVANSHI • Texas Children’s Cancer Center, Baylor College 
of Medicine, Houston, TX, USA ANIL K. SOOD • Department of Gynecologic Oncology and Reproductive Medicine, The University of Texas MD Anderson Cancer Center (MDACC), Houston, TX, USA; Center for RNA Interference and Non-Coding RNA, MDACC, Houston, TX, USA; Department of Cancer Biology, MDACC, Houston, TX, USA PAVEL SUMAZIN • Texas Children’s Cancer Center, Baylor College of Medicine, Houston, TX, USA MEGAN TROTTER • Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI, USA YIHAO WANG • Department of Ophthalmology, Ninth People’s Hospital, Shanghai JiaoTong University School of Medicine, Shanghai, China; Shanghai Key Laboratory of Orbital Diseases and Ocular Oncology, Shanghai, China YUEYING WANG • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA


MENGSHI WU • Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Center for Stem Cell Biology and Tissue Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, People’s Republic of China YAGUANG XI • Department of Genetics and Stanley S. Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, USA PENG XIANG • Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Center for Stem Cell Biology and Tissue Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, People’s Republic of China ZHEN XING • Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA MINMIN XIONG • Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Center for Stem Cell Biology and Tissue Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, People’s Republic of China CONGJIAN XU • Shanghai Key Laboratory of Female Reproductive Endocrine Related Diseases, Obstetrics and Gynecology Hospital of Fudan University, Shanghai, People’s Republic of China MU XU • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA YING XUE • Fudan University Zhongshan Hospital, Shanghai, P. R. 
China JEN-HAO YANG • Laboratory of Genetics and Genomics, Biomedical Research Center, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA LI YANG • CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China; School of Life Science and Technology, ShanghaiTech University, Shanghai, China LIUQING YANG • Department of Molecular and Cellular Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Cancer Biology Program, The University of Texas Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; The Center for RNA Interference and NonCoding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX, USA ALLISON YANKEY • Biochemistry and Molecular Biology, Drexel University College of Medicine, Philadelphia, PA, USA BIN YI • Department of Genetics and Stanley S. 
Scott Cancer Center, Louisiana State University Health Sciences Center, New Orleans, LA, USA JE-HYUN YOON • Department of Biochemistry and Molecular Biology, Medical University of South Carolina, Charleston, SC, USA JIAO YUAN • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA DONGMEI ZHANG • State Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy, West China Hospital, Sichuan University, Chengdu, People’s Republic of China LI-FENG ZHANG • School of Biological Sciences, Nanyang Technological University, Singapore, Singapore LIN ZHANG • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA TIANLI ZHANG • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA


XINNA ZHANG • Center for RNA Interference and Non-coding RNAs, The University of Texas MD Anderson Cancer Center, Houston, TX, USA; Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA YANG ZHANG • Department of Medicine and Pathology, Cancer Research Institute, Beth Israel Deaconess Cancer Center, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA YOUYOU ZHANG • Center for Research on Reproduction & Women’s Health, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Department of Obstetrics and Gynecology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA XIAOMIN ZHONG • Key Laboratory for Stem Cells and Tissue Engineering, Ministry of Education, Center for Stem Cell Biology and Tissue Engineering, Zhongshan School of Medicine, Sun Yat-Sen University, Guangzhou, People’s Republic of China

Chapter 1

LncRNA Pulldown Combined with Mass Spectrometry to Identify the Novel lncRNA-Associated Proteins

Zhen Xing, Sergey Egranov, Chunru Lin, and Liuqing Yang

Abstract

Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts longer than 200 nucleotides. Recent studies have revealed that nearly 80% of transcripts in human cells are lncRNA species. Based on their genomic location, most lncRNAs can be classified as large intergenic non-coding RNAs, natural antisense transcripts, pseudogenes, or long intronic ncRNAs, as well as other divergent transcripts. However, despite mounting evidence suggesting that many lncRNAs are likely to be functional, only a small proportion has been demonstrated to be biologically and physiologically relevant, owing to their lower expression levels and current technical limitations. Thus, there is a great need to design and develop new assays to investigate the function of lncRNAs in depth in various systems. Indeed, several methods such as genome-wide chromatin immunoprecipitation-sequencing (ChIP-seq) and RNA immunoprecipitation followed by sequencing (RIP-seq) have been developed to examine the genomic localization of lncRNAs and their interacting proteins in cells. Here, we describe an open-ended method, the lncRNA pulldown assay, which has been frequently used to identify lncRNA-interacting protein partners in a cellular context. We provide a detailed protocol for this assay with hands-on tips based on our own experience working in the lncRNA field.

Key words Long non-coding RNAs, RNA–protein interaction, Mass spectrometry

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

1 Introduction

Increasing evidence indicates that long non-coding RNAs (lncRNAs) are a new class of players associated with the development and progression of cancer, but the mechanisms by which they act are still unclear [1–4]. Several well-studied lncRNAs have provided important clues about the biology and human disease relevance of these molecules [5–8], and a few key functional and mechanistic themes have begun to be understood [9–13]. It is believed that lncRNAs interact with proteins, such as epigenetic modifiers, transcription factors/coactivators, and RNP complexes, to regulate related biological processes [14]. Of note, specific lncRNA–protein interactions could also be mediated by increasingly prevalent RNA-binding domains [15–17]. Here, we summarize an unbiased and open-ended lncRNA pulldown assay for better understanding lncRNA-associated protein partners. We believe this methodology will advance the study of lncRNA functions and their related mechanisms.

We use lncRNA pulldown assays to selectively extract a protein–RNA complex from a cell lysate based on the physical forces between proteins and RNAs. These forces, including dipolar interactions, electrostatic interactions, entropic effects, and dispersion forces, contribute to different degrees when a protein binds in a sequence-specific manner (tight). RNA secondary and tertiary structures also play critical roles in protein recognition of, and binding to, certain nucleic acid sequences (loose). Typically, the pulldown assay uses an RNA probe labeled with a high-affinity biotin tag that allows the probe to be recovered. After incubation, the biotinylated RNA probe binds a protein or protein complex in the cell lysate, and the complex is then purified using magnetic streptavidin beads. The proteins are then eluted from the RNA and detected by western blot or mass spectrometry. This assay has the advantages of enriching low-abundance protein targets, isolating intact protein complexes, and being compatible with immunoblotting and mass spectrometry analysis. With this technology, lncRNAs have been shown to regulate chromatin states by recruiting the chromatin-modifying complex PRC2 [7]; control transcription programs through binding to androgen receptors [8]; direct cooperative epigenetic regulation downstream of chemokine signals [10]; and modulate antigen presentation machinery [13].

In this chapter, we describe a hands-on protocol for conducting a lncRNA pulldown assay, with some tips gained from our own direct experience.

2 Materials and Reagents

Because RNase contamination is potentially widespread in the experimental environment, all standard precautions should be taken throughout this assay to minimize it. All reagents, glassware, and plasticware should be purchased RNase-free or treated with 0.1% DEPC to ensure that they are RNase-free (see Note 1).

2.1 Reagents

2.1.1 For Preparation of In Vitro Transcribed RNA (IVT)

1. pGEM-3Z Vector (Promega, P2151).
2. Biotin RNA Labeling Mix.
3. T7 and SP6 RNA polymerase.
4. TURBO DNase.
5. Anti-RNase.
6. RNA Clean & Concentrator (Zymo Research, R1015).
7. Nuclease-free H2O.
8. UltraPure Agarose (Life Technology, 16500-100).
9. Formaldehyde.
10. 10× MOPS Buffer (Lonza, 50876).
11. Ethidium Bromide.
12. RNA loading dye.
13. RNA marker.

2.1.2 For Preparation of Cell Lysate

1. ProteaPrep Zwitterionic Cell Lysis Kit, Mass Spec Grade (Protea, SP-816).
2. Protease/Phosphatase Inhibitor Cocktail.
3. Panobinostat.
4. Methylstat.
5. 1× Phosphate-buffered saline (PBS) buffer.

2.1.3 For lncRNA Pulldown

1. BcMag™ Monomer Avidin Magnetic Beads (Bioclone, MMI101).
2. RNA structure buffer: 10 mM Tris-HCl (pH 7.0), 0.1 M KCl, 10 mM MgCl2.
3. RNA capture buffer: 20 mM Tris-HCl (pH 7.5), 1 M NaCl, 1 mM EDTA.
4. NT2 buffer: 50 mM Tris-HCl (pH 7.4), 150 mM NaCl, 1 mM MgCl2, 0.05% NP-40.
5. Wash buffer: NT2 buffer supplemented with sequentially increased NaCl (500 mM to 1 M) or KSCN (750 mM).
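The high-salt wash buffers in item 5 can be made by spiking NT2 buffer with a concentrated NaCl stock. The following is a minimal sketch of the mixing arithmetic, assuming a 5 M NaCl stock (a hypothetical choice; the protocol does not specify how the wash buffers are prepared). The function name and defaults are illustrative only. Note that spiking slightly dilutes the other NT2 components.

```python
def spike_volume_ml(final_ml, base_mM, target_mM, stock_mM=5000.0):
    """Volume of NaCl stock to combine with NT2 buffer to reach target_mM.

    Solves stock_mM*x + base_mM*(final_ml - x) = target_mM*final_ml for x.
    stock_mM = 5000 (5 M NaCl) is an assumption, not part of the protocol.
    """
    if not base_mM <= target_mM < stock_mM:
        raise ValueError("target must lie between base and stock concentration")
    return final_ml * (target_mM - base_mM) / (stock_mM - base_mM)


# NT2 buffer already contains 150 mM NaCl; prepare 10 ml of each wash buffer.
for target in (500.0, 1000.0):
    stock = spike_volume_ml(10.0, 150.0, target)
    print(f"{target:.0f} mM NaCl: {stock:.2f} ml stock + {10.0 - stock:.2f} ml NT2")
```

If the dilution of Tris, MgCl2, and NP-40 matters for a given application, the buffer can instead be assembled from individual stocks at the final concentrations.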

2.1.4 For In-Solution Tryptic Digestion

1. Digestion buffer, Ammonium Bicarbonate (50 mM).
2. Reducing buffer, DTT (100 mM).
3. Alkylation buffer, Iodoacetamide in NH4HCO3 (100 mM).
4. Trifluoroacetic acid (TFA), 10%.
5. Acetonitrile.
6. Peptide recovery buffer, 1 ml: 400 μl acetonitrile, 20 μl 10% TFA, 580 μl 50 mM NH4HCO3.
7. Immobilized Trypsin.

3 Methods

The following protocol is based on a previously described method, with some modifications drawn from our own experience (Fig. 1).

3.1 IVT of Biotin-Labeled lncRNA

3.1.1 Cloning the lncRNA Gene Sequence into pGEM-3Z Vector

The gene sequence should be ligated into the multiple cloning site located between the T7 promoter and the SP6 promoter.

T7 promoter: 5′-TAA TAC GAC TCA CTA TAG GG-3′
SP6 promoter: 5′-AAT TTA GGT GAC ACT ATA GAA-3′
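Because the insert can ligate in either direction (see Note 3), it helps to confirm from the sequenced construct which polymerase yields the sense transcript. The sketch below is one hypothetical way to do this in plain Python. It assumes the plasmid sequence is supplied as the strand on which the T7 promoter reads forward, so that the SP6 promoter appears downstream as its reverse complement; the function names and the toy sequences are illustrative and not part of the protocol.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"
SP6_PROMOTER = "AATTTAGGTGACACTATAGAA"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def sense_polymerase(top_strand, sense_insert):
    """Return 'T7' or 'SP6', whichever transcribes the sense strand.

    Assumption: top_strand is the strand on which the T7 promoter reads
    forward, so the SP6 promoter is found as its reverse complement.
    """
    top = top_strand.upper()
    insert = sense_insert.upper()
    t7 = top.find(T7_PROMOTER)
    sp6 = top.find(revcomp(SP6_PROMOTER))
    if t7 < 0 or sp6 < 0 or sp6 < t7:
        raise ValueError("promoters not found in the expected orientation")
    region = top[t7 + len(T7_PROMOTER):sp6]
    if insert in region:
        return "T7"
    if revcomp(insert) in region:
        return "SP6"
    raise ValueError("insert not found between the promoters")


# Toy construct: a made-up 21-nt "insert" cloned in the T7 orientation.
toy_insert = "ATGGCCATTGTAATGGGCCGC"
toy_plasmid = "GG" + T7_PROMOTER + toy_insert + revcomp(SP6_PROMOTER) + "CC"
print(sense_polymerase(toy_plasmid, toy_insert))
```

The other polymerase then produces the antisense transcript, which Note 3 recommends as a negative control.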

3.1.2 Linearization of the pGEM-3Z-lncRNA Vector

Set up the restriction endonuclease (RE) digestion of the plasmid:

Midi-prepared plasmid    20 μg
10× RE buffer            5 μl
RE enzyme                5 units
ddH2O                    to 50 μl

Incubate at 37 °C for 2 h to overnight. Purify the linearized plasmid using the QIAquick Gel Purification Kit in accordance with the manufacturer's protocol (see Note 2).

3.1.3 IVT Reaction

This reaction is used for the synthesis of "run-off" transcripts.

Reaction components:

Linearized plasmid              1 μg
Biotin RNA labeling mix, 10×    2 μl
10× transcription buffer        2 μl
T7/SP6 RNA polymerase           2 μl
Nuclease-free H2O               to 20 μl

Add the above to a microfuge tube on ice and bring the total volume to 20 μl with nuclease-free H2O; mix and centrifuge briefly. Incubate for 2 h at 37 °C. Add 2 μl DNase and incubate for 15 min at 37 °C to remove the template DNA. Stop the reaction by adding 2 μl 0.2 M EDTA (pH 8.0) (see Note 3).
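The only variable volumes in the reaction above are the template and the water. As a minimal sketch (the function name and the example concentration are assumptions for illustration), the pipetting volumes can be derived from the measured concentration of the linearized plasmid:

```python
def ivt_reaction(template_ng_per_ul, total_ul=20.0):
    """Pipetting volumes (in microliters) for one IVT reaction.

    template_ng_per_ul is the measured concentration of the linearized
    plasmid; the reaction uses 1 microgram (1000 ng) of template.
    """
    if template_ng_per_ul <= 0:
        raise ValueError("template concentration must be positive")
    template_ul = 1000.0 / template_ng_per_ul
    fixed = {
        "10x biotin RNA labeling mix": 2.0,
        "10x transcription buffer": 2.0,
        "T7 or SP6 RNA polymerase": 2.0,
    }
    water_ul = total_ul - template_ul - sum(fixed.values())
    if water_ul < 0:
        raise ValueError("template too dilute: 1 ug does not fit in the reaction")
    return {"linearized plasmid": template_ul, **fixed,
            "nuclease-free H2O": water_ul}


# Example with a hypothetical template at 250 ng/ul.
for component, volume in ivt_reaction(250.0).items():
    print(f"{component:30s} {volume:5.1f} ul")
```

A template below about 72 ng/μl cannot supply 1 μg within the 14 μl left after the fixed components, which is one practical reason Note 2 warns against a dilute template.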

Fig. 1 Schematic diagram showing lncRNA pulldown

3.1.4 Purification and Size/Yield Check of IVT RNA

1. An RNA Clean & Concentrator kit can be used to recover IVT RNA following DNase treatment and termination of the above reaction. Typically, up to 10 μg of IVT RNA can be obtained, eluted into 10–20 μl RNase-free water.
2. Prepare a denaturing RNA gel to confirm that the RNA is transcribed at the right size.
   Make 100 ml of denaturing RNA gel:
   1 g agarose in 72 ml water, heated until dissolved and then cooled to 60 °C.
   10 ml 10× MOPS running buffer.
   18 ml 37% formaldehyde.
   Prepare the RNA sample and run the electrophoresis:
   Take 1 μl RNA sample and add 5 μl Formaldehyde Loading dye and 1 μl Ethidium Bromide (10 μg/ml).
   Heat-denature the samples at 65–70 °C for 5–15 min.
   Load the gel and electrophorese at 5–6 V/cm until the dye has migrated two-thirds of the length of the gel.
   Check the RNA size under UV light. One sharp band of the correct size is a good indication that the RNA was efficiently transcribed. A smeared appearance may result from the enzyme failing to bind the promoter efficiently or dropping off the template halfway (see Note 4).

3.2 Preparation of Cell Lysate

Culture and collect sufficient cells (at least 10 × 10-cm plates of cells for each pulldown); suspend the cell pellet in 1 ml of ProteaPrep Zwitterionic Cell Lysis buffer supplemented with Protease and Phosphatase Inhibitor Cocktails (1:100), anti-RNase (1 U/μl), Panobinostat (1:100), and Methylstat (1:100); incubate the tube on ice for 40 min, vortexing every 10 min; centrifuge the mixture at 14,000 × g for 15 min at 4 °C; transfer the supernatant to a new tube for further use (see Note 5).

3.3 Activation of BcMag™ Monomer Avidin Magnetic Beads

Prior to use, equilibrate all reagents in the kit to room temperature and prepare 1× working solutions with ddH2O. Follow the manufacturer's instructions to prepare the activated beads, briefly summarized as:

1. Gently shake the bottle containing the beads until they are completely suspended. Transfer 50 μl beads to a fresh tube.
2. Place the tube on a magnetic separator for 1 min; remove the supernatant. Note: While the tube should remain on the separator, you may remove the tube from the separator for 1 min to prevent bead loss.
3. Wash the beads with 200 μl of ddH2O.
4. Wash the beads with 200 μl of 1× PBS buffer.
5. Block the beads with 150 μl of 1× Blocking/Elution Buffer at room temperature for 5 min. Remove the buffer.
6. Regenerate the beads by adding 300 μl of 1× Regeneration Buffer. Remove the buffer.
7. Wash the beads with 200 μl of 1× PBS Buffer. The beads are ready for use (see Note 6).

3.4 lncRNA Pulldown

3.4.1 Pre-clearing the Lysate with Activated Beads to Reduce Non-specific Binding

1. Remove the PBS from the activated avidin beads.
2. Add 1 ml of cell lysate to the beads and mix well.
3. Incubate for 1 h at 4 °C with gentle end-to-end rotation.
4. Place the tube on a magnetic separator and keep the supernatant for further use. Discard the bead pellet.

3.4.2 Folding of IVT RNA

1. Heat 20 μg of biotinylated RNA to 90 °C for 2 min.
2. Chill on ice for 2 min and centrifuge briefly.
3. Bring the total volume to 100 μl with RNA structure buffer.
4. Incubate at room temperature for 20 min to allow proper secondary structure formation.

3.4.3 LncRNA Pulldown

1. Prepare 50 μl of activated Avidin Magnetic Beads.
2. Immediately incubate the beads with the folded RNA (20 μg) in RNA capture buffer for 30 min at room temperature with gentle agitation.
3. Remove the RNA capture buffer. Wash the RNA-captured beads once with NT2 buffer and incubate with 30 mg of pre-cleared cell lysate for 2 h at 4 °C with gentle rotation.
4. Briefly centrifuge the tube and place it on a magnetic separator. Remove the supernatant from the beads and discard it. The complex of interest should now be bound to the beads. The following steps remove non-specific binding to the beads.
5. Wash the RNA-binding protein complexes with NT2 buffer twice.
6. Wash the RNA-binding protein complexes with NT2 high salt buffer (500 mM NaCl) twice.
7. Wash the RNA-binding protein complexes with NT2 high salt buffer (1 M NaCl) once.
8. Wash the RNA-binding protein complexes with NT2 high salt buffer (750 mM KSCN) once.
9. Wash the RNA-binding protein complexes with PBS twice (see Note 7).
10. Elute the bound complexes by incubating the beads in elution buffer for 20 min at 4 °C with frequent agitation. Collect the eluate.
11. Repeat the elution and pool the eluates.
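The wash series in steps 5–9 of Subheading 3.4.3 is easy to miscount at the bench. The sketch below is one hypothetical way to write it down as a checklist; the 5-min duration per wash is taken from Note 7, and the total covers rotation time only, not the magnetic separations. The variable and function names are illustrative.

```python
# (buffer, number of washes), in the order given in steps 5-9.
WASHES = [
    ("NT2 buffer", 2),
    ("NT2 + 500 mM NaCl", 2),
    ("NT2 + 1 M NaCl", 1),
    ("NT2 + 750 mM KSCN", 1),
    ("PBS", 2),
]
MINUTES_PER_WASH = 5  # Note 7: gentle rotation at 4 degrees C for 5 min


def expand(washes):
    """Flatten (buffer, count) pairs into one entry per individual wash."""
    return [buffer for buffer, count in washes for _ in range(count)]


steps = expand(WASHES)
for number, buffer in enumerate(steps, start=1):
    print(f"wash {number}: {buffer}, {MINUTES_PER_WASH} min")
print(f"total rotation time: {len(steps) * MINUTES_PER_WASH} min")
```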

3.5 MS Analysis

Mass spectrometry (MS) has become a powerful and frequently used choice for protein detection, identification, and quantitation. Owing to the complexity of the whole workflow, there is no fixed method for sample preparation, instrumentation, or software analysis. Here, we summarize a detailed protocol for in-solution digestion with hands-on tips, because we believe that the quality of the digested samples significantly affects the MS results. The resulting peptides can then be subjected to MS analysis. We recommend regular liquid chromatography-mass spectrometry (LC-MS), which can be handled by most institutional proteomics facilities (Fig. 2).

Fig. 2 Schematic diagram showing sample preparation for MS analysis: in-solution digestion (reduction, alkylation, digestion), peptide enrichment/clean-up, and MS analysis

Procedure for In-Solution Digestion

1. Concentrate and dry the eluted samples with a Speedy Vacuum system.
2. Dissolve the dried eluate in 10 μl ddH2O and prepare the reduction reaction:
   Digestion buffer           15 μl
   Dissolved eluted sample    10 μl
   Reducing buffer            1.5 μl
   Bring the total volume to 27 μl with ddH2O and incubate the solution at 95 °C for 5 min. Allow the samples to cool to room temperature.
3. Perform the alkylation reaction by adding 3 μl alkylation buffer to the above tube and incubating in the dark at room temperature for 20 min.
4. Perform the trypsin digestion overnight with the immobilized trypsin kit according to the manufacturer's instructions.
5. Subject the recovered digested peptides to LC-MS analysis.
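As a quick sanity check on steps 2 and 3, the working concentrations of DTT and iodoacetamide follow from the stock concentrations listed in Subheading 2.1.4. The sketch below assumes that the 100 mM stated for the alkylation buffer refers to iodoacetamide (the materials list is ambiguous on this point); the helper name is illustrative.

```python
def final_mM(stock_mM, added_ul, total_ul):
    """Concentration after diluting added_ul of stock into total_ul."""
    return stock_mM * added_ul / total_ul


reduction_ul = 27.0                    # step 2: final volume
water_ul = reduction_ul - (15.0 + 10.0 + 1.5)
dtt_mM = final_mM(100.0, 1.5, reduction_ul)

alkylation_ul = reduction_ul + 3.0     # step 3: add 3 ul alkylation buffer
# Assumption: the alkylation buffer contains 100 mM iodoacetamide.
iaa_mM = final_mM(100.0, 3.0, alkylation_ul)
dtt_after_mM = final_mM(100.0, 1.5, alkylation_ul)

print(f"ddH2O to add in step 2: {water_ul:.1f} ul")
print(f"DTT during reduction:   {dtt_mM:.2f} mM")
print(f"IAA during alkylation:  {iaa_mM:.2f} mM")
print(f"IAA:DTT molar ratio:    {iaa_mM / dtt_after_mM:.1f}")
```

Under that assumption, iodoacetamide is in molar excess over DTT during alkylation, which is what lets it alkylate the reduced cysteines rather than being fully consumed by residual DTT.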

4 Notes

1. Because RNA is susceptible to degradation, and because RNases are found in all cell types and are widespread in the air, on most surfaces, and in dust, we strongly recommend that all materials, reagents, and solutions be prepared carefully or purchased commercially certified for RNA work.
2. Start with at least 10 μg of plasmid. Since most commercial Gel Purification Kits have a recovery efficiency of less than 50%, too low a template concentration will significantly reduce the transcription efficiency. We recommend adding BSA to the restriction enzyme digestion reaction, since this will avoid star activity.
3. Validate the insertion direction of the lncRNA gene fragment in the vector, and use T7 or SP6 RNA polymerase to transcribe both strands of the cloned DNA (sense and antisense transcripts). Antisense transcripts and beads alone can be used as two negative controls for the pulldown experiment, which will be critical for interpreting the final MS results.
4. The quality and yield of IVT RNA are critical factors for a successful lncRNA pulldown experiment. A sharp, pure band without any smearing is recommended before continuing the experiment.
5. The Protease/Phosphatase Inhibitor Cocktail protects proteins from degradation by endogenous proteases and phosphatases released during protein extraction and purification. Panobinostat is a deacetylase inhibitor, and Methylstat is a selective inhibitor of the Jumonji C domain-containing histone demethylases. Supplementation with these inhibitors provides a higher chance of identifying novel modification sites related to lncRNA-associated proteins or protein complexes.
6. These beads are 1 μm, silica-based superparamagnetic beads coated with a high density of ultrapure (97%) avidin subunit monomer on the surface. The bound biotinylated molecules can be easily eluted from the beads under mild elution conditions, such as a buffer containing 2 mM biotin. It is critical that the beads are used immediately after activation; otherwise, the binding capacity will be dramatically reduced. Other forms of avidin beads may also be used, in which case the final elution condition needs to be titrated.
7. For each wash, mix the beads gently with the wash buffer and then subject them to gentle rotation at 4 °C for 5 min. Make sure to carefully remove as much wash buffer as possible from the beads while preventing bead loss.

References

1. Efroni S et al (2008) Global transcription in pluripotent embryonic stem cells. Cell Stem Cell 2(5):437–447
2. Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43(6):904–914
3. Fatica A, Bozzoni I (2014) Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet 15(1):7–21
4. St Laurent G, Wahlestedt C, Kapranov P (2015) The landscape of long noncoding RNA classification. Trends Genet 31(5):239–251
5. Gabory A et al (2009) H19 acts as a trans regulator of the imprinted gene network controlling growth in mice. Development 136(20):3413–3421
6. Wang KC et al (2011) A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 472(7341):120–124
7. Yang L et al (2011) ncRNA- and Pc2 methylation-dependent gene relocation between nuclear structures mediates gene activation programs. Cell 147(4):773–788
8. Yang L et al (2013) lncRNA-dependent mechanisms of androgen-receptor-regulated gene activation programs. Nature 500(7464):598–602
9. Wang P et al (2014) The STAT3-binding long noncoding RNA lnc-DC controls human dendritic cell differentiation. Science 344(6181):310–313
10. Xing Z et al (2014) lncRNA directs cooperative epigenetic regulation downstream of chemokine signals. Cell 159(5):1110–1125
11. Liu B et al (2015) A cytoplasmic NF-kappaB interacting long noncoding RNA blocks IkappaB phosphorylation and suppresses breast cancer metastasis. Cancer Cell 27(3):370–381
12. Li C et al (2017) A ROR1-HER3-lncRNA signalling axis modulates the Hippo-YAP pathway to regulate bone metastasis. Nat Cell Biol 19(2):106–119
13. Hu Q et al (2019) Oncogenic lncRNA downregulates cancer cell antigen presentation and intrinsic tumor suppression. Nat Immunol 20(7):835–851
14. Lin C, Yang L (2018) Long noncoding RNA in cancer: wiring signaling circuitry. Trends Cell Biol 28(4):287–301
15. Huarte M (2015) The emerging role of lncRNAs in cancer. Nat Med 21(11):1253–1261
16. Lin A et al (2017) The LINK-A lncRNA interacts with PtdIns(3,4,5)P3 to hyperactivate AKT and confer resistance to AKT inhibitors. Nat Cell Biol 19(3):238–251
17. Hu Q et al (2019) LncRNAs-directed PTEN enzymatic switch governs epithelial-mesenchymal transition. Cell Res 29(4):286–304

Chapter 2

Crosslinking Immunoprecipitation and qPCR (CLIP-qPCR) Analysis to Map Interactions of Long Noncoding RNAs with Canonical and Non-canonical RNA-Binding Proteins

Jung-Hyun Cho, Ji Won Lee, Je-Hyun Yoon, and Kyung-Won Min

Abstract

Mammalian cells express a wide range of transcripts, some of them protein-coding RNAs (mRNAs) and many of them noncoding (nc) RNAs. Long (l)ncRNAs can modulate protein expression patterns by regulating gene transcription, pre-mRNA splicing, mRNA export, mRNA degradation, protein translation, and protein ubiquitination. Given the growing recognition that lncRNAs have a robust impact upon gene expression, there is rising interest in elucidating the levels and regulation of lncRNAs. A number of high-throughput methods have been developed recently to map the interaction of lncRNAs and RNA-binding proteins (RBPs). However, few of these approaches are suitable for mapping and quantifying RBP–lncRNA interactions. Here, we describe the recently developed method CLIP-qPCR (crosslinking and immunoprecipitation followed by reverse transcription and quantitative PCR) for mapping and quantifying interactions of lncRNAs with canonical and non-canonical RBPs.

Key words CLIP, lncRNA, RBP, Ribonucleoprotein complexes, qPCR

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

1 Introduction

Gene expression programs are influenced transcriptionally via processes such as chromatin remodeling and transcription factor mobilization, and posttranscriptionally through processes like pre-mRNA splicing, 5′ capping, 3′ polyadenylation, editing, mRNA export, localization, and translation. These processes are governed by DNA-binding proteins, RNA-binding proteins, and noncoding RNAs [1–4]. In recent years, lncRNAs have emerged as major regulators of transcription, mRNA fate, and protein stability [4–6]. Although they can perform these gene regulatory functions directly via sequence complementarity with target DNA and RNA sequences, most lncRNA functions identified to date involve their interaction with partner DNA-binding proteins and RNA-binding proteins (RBPs) [4, 6].

RBP–RNA complexes have been studied using a variety of methods [7]. Interactions of specific RBPs with specific RNAs have been examined using traditional procedures such as the analysis of ribonucleoprotein immunoprecipitation (RIP) complexes, biotinylated RNA pulldown complexes, and RNA electrophoretic mobility shift assays [8]. Global analyses of mRNAs interacting with RBPs were developed using RIP followed by cDNA microarray analysis [9] or by RNA sequencing (RNA-seq) [10]. Subsequently, RBP crosslinking using UVC light (254 nm) and immunoprecipitation (CLIP) was developed, initially combined with Sanger sequencing [11] and later with high-throughput sequencing (RNA-seq) of the CLIP cDNA library [12, 13]. Further advances were achieved by photoactivatable ribonucleoside-enhanced (PAR)-CLIP analysis, which uses nucleotide analogs such as 4-thiouridine (4-SU) or 6-thioguanosine (6-SG) followed by exposure to UVA light (365 nm) [14]. Alternative CLIP methods include individual-nucleotide resolution CLIP (iCLIP) [15], which exploits reverse transcriptase stalling, and crosslinking, ligation, and sequencing of hybrids (CLASH), which monitors inter-RNA interactions within tripartite complexes [16]. More recently, further improved derivatives of the CLIP method were developed. For example, RNA hybrid and individual-nucleotide resolution UV crosslinking and immunoprecipitation (hiCLIP) was derived from the CLASH method by modifying the proximity ligation step to identify inter- and intramolecular RNA duplexes bound by a specific RBP [17]. Infrared-CLIP (irCLIP), enhanced CLIP (eCLIP), and fast ligation of RNA after some sort of affinity purification for high-throughput sequencing (FLASH) are more efficient and faster than the iCLIP method and do not require radioactive substances [18–21].

Despite such enormous advances in RBP–RNA detection, high-throughput analyses require the generation of a small RNA library, and thus a significant amount of effort, cost, time, and optimization. In addition, a substantial amount of bioinformatic analysis is required to map the sequence reads and identify the binding sites. Therefore, we have developed a protocol that combines PAR-CLIP with qPCR analysis to map RBP binding sites at 100-nt to 300-nt intervals within a lncRNA of interest. This method is based on the novel introduction of partial RNase digestion and on detection by serial, progressive scanning of the target RNA using reverse transcription and qPCR (Fig. 1).

[Figure 1, panels a–d. Panel a, workflow labels: incubate cells in 4-thiouridine (4SU); UVA, 365 nm; lysis, RNase T1 digestion, immunoprecipitation; proteinase K treatment, DNase digestion, phenol extraction; lncRNA fragments (100–300 nt); PCR amplification. Panel b: HOTAIR (positions 1–2370), HuR IP / IgG IP, relative enrichment (HOTAIR / GAPDH mRNA), primer pairs 1–23; Yoon et al., 2013 Nat. Commun. (Figure 2D). Panel c: NEAT1 (positions 1–3729), ENO1 IP / IgG IP, relative enrichment (NEAT1 / 18S rRNA), primer pairs 1–36. Panel d: Malat1 (positions 1–6983), HA-TEAD1 IP / IgG IP, relative enrichment (Malat1 / GAPDH mRNA), primer pairs 1–69; Kim et al., 2018 Nat. Genet. (Figure 4F).]

Fig. 1 LncRNAs interact with canonical and non-canonical RNA-binding proteins. (a) Schematic diagram of the CLIP-qPCR assay to identify lncRNA segments associated with a protein of interest. Immunoprecipitated lncRNA-protein complexes are digested with RNase T1, and protein-protected lncRNA segments are subjected to qPCR analysis with overlapping 200-bp amplicons. (b) The segments of human HOTAIR lncRNA bound by HuR were detected using 23 pairs of primers for mapping of HuR-binding sites on HOTAIR at 100- to 300-nucleotide intervals [22]. (c) The segments of human NEAT1 lncRNA bound by ENO1 were detected using 36 pairs of primers for mapping of ENO1-binding sites on NEAT1 at 100- to 300-nucleotide intervals (unpublished data). (d) The segments of mouse Malat1 lncRNA bound by TEAD1 were detected using 69 pairs of primers for mapping of TEAD1-binding sites on Malat1 at 100- to 300-nucleotide intervals [23]
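The scanning design described in the legend, overlapping 200-bp amplicons spaced along the transcript, can be laid out programmatically before primers are designed. The sketch below is a hypothetical starting point rather than the authors' design procedure: the 100-nt step is an assumed default, the function name is illustrative, and real primer placement will shift windows to satisfy primer design constraints.

```python
def tile_amplicons(transcript_len, amplicon_len=200, step=100):
    """Return 1-based inclusive (start, end) windows tiling a transcript.

    Windows start every `step` nt; a final window is added, if needed,
    so that the 3' end of the transcript is covered.
    """
    if not 0 < step <= amplicon_len <= transcript_len:
        raise ValueError("require 0 < step <= amplicon_len <= transcript_len")
    last_start = transcript_len - amplicon_len + 1
    starts = list(range(1, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(start, start + amplicon_len - 1) for start in starts]


# Example: a 2370-nt transcript, the length shown for HOTAIR in Fig. 1b.
windows = tile_amplicons(2370)
print(len(windows), "windows; first", windows[0], "last", windows[-1])
```

With these assumed defaults, a 2370-nt transcript yields 23 windows, the same order as the number of primer pairs used for HOTAIR; a larger step gives coarser mapping with fewer qPCR reactions.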

2 Materials

2.1 Mammalian Cell Culture and RNA Labeling

1. DMEM, 10% FBS, 2 mM L-Glutamine, 100 U/ml Penicillin/streptomycin.
2. 1 M 4-thiouridine (4-SU) in DMSO.

2.2 UVA Crosslinking and Cell Harvesting

1. UV crosslinker.
2. Stratalinker 1800, 365 nm bulb.
3. Ice-cold phosphate-buffered saline (PBS).


2.3 RNase Digestion and Immunoprecipitation

1. RNase T1.
2. Protein A or G Sepharose beads.
3. Antibodies to detect an endogenous protein of interest.
4. Normalized IgG.
5. Proteinase K.
6. DNase I.
7. Acidic phenol.
8. Glycoblue.
9. NP-40 lysis buffer: 20 mM Tris–HCl at pH 7.5, 100 mM KCl, 5 mM MgCl2, and 0.5% NP-40.
10. NT2 buffer: 50 mM Tris–HCl at pH 7.5, 150 mM NaCl, 1 mM MgCl2, and 0.05% NP-40.

2.4 Reverse Transcription and qPCR

1. dNTP mix (10 mM).
2. Random hexamer (150 ng/μl).
3. Reverse transcriptase.
4. Gene-specific primer sets.
5. SYBR mix.

3 Methods

3.1 Mammalian Cell Culture and RNA Labeling

1. Expand human embryonic kidney (HEK), human cervical cancer (HeLa), or human breast cancer (MCF7) cells in 150-mm culture plates in DMEM supplemented with 10% (v/v) fetal bovine serum and antibiotics. Grow cells to 80% confluence.
2. Sixteen hours before UVA exposure, add 4-SU (1 M stock solution in DMSO) to the culture medium to a final concentration of 100 μM. Alternatively, 100 μM 6-SG can be used, but the crosslinking efficiency is somewhat lower.

3.2 UVA Crosslinking and Harvesting Cells

1. Aspirate the culture medium, wash cells once with 15 ml ice-cold PBS, and aspirate PBS completely.
2. Set up a tray containing ice and place the plate on the ice.
3. Uncover the plate and irradiate it with 150 mJ/cm² of UVA (365 nm) in a Stratalinker or similar device (see Notes 1 and 2).
4. Scrape cells in 5 ml PBS, transfer to a 50 ml conical tube, centrifuge at 2000 × g at 4 °C for 5 min, and aspirate the supernatant.
5. Cell pellets can be used immediately for lysis or stored at −80 °C for later use.

CLIP-qPCR of Long Noncoding RNAs

3.3 RNase Digestion and Immunoprecipitation


1. Resuspend cell pellets by adding three volumes of NP-40 lysis buffer supplemented with protease inhibitors and 1 mM DTT.
2. Incubate on ice for 10 min and centrifuge at 10,000 × g for 15 min at 4 °C.
3. Collect supernatants, add RNase T1 to a final concentration of 1 U/μl, and incubate at 22 °C for 2, 4, 6, 8, 10, and 15 min (see Note 3).
4. Set aside 100 μl of lysate, add 400 μl water and 500 μl acidic phenol, vortex for 1 min, and centrifuge at 10,000 × g for 20 min at 4 °C.
5. Collect 400 μl of supernatant, add 800 μl 100% ethanol, 40 μl 3 M sodium acetate, and 1 μl glycoblue. Incubate at −80 °C for 1 h (or overnight).
6. Centrifuge at 10,000 × g for 20 min at 4 °C, aspirate the supernatant, and add 500 μl of 70% ethanol.
7. Centrifuge at 10,000 × g for 10 min at 4 °C, aspirate the supernatant, dry the pellet at room temperature, and dissolve the pellet in RNase-free water.
8. Run the RNA samples on a 1.5% formaldehyde agarose gel to verify that RNAs are digested to the 100- to 300-nt range.
9. Select the samples in which RNAs are partially digested to the 100- to 300-nt range.
10. To prepare sepharose beads, wash beads with ice-cold PBS three times and resuspend them in an equal volume of ice-cold PBS to create a 50% slurry.
11. Incubate 40 μl of the bead slurry with 10 μg of normalized IgG or an antibody against the RBP of interest for 2 h at 4 °C in NT2 buffer. Examples are the canonical RBP human antigen R (HuR; Fig. 1b) and the non-canonical RBPs enolase 1 (ENO1) and TEA domain transcription factor 1 (TEAD1) (Fig. 1c, d).
12. Centrifuge the beads at 2000 × g for 1 min at 4 °C and wash three times with NT2 buffer.
13. Add 1 ml of cell lysate to the antibody-coated sepharose beads and incubate for 3 h at 4 °C.
14. After centrifugation at 2000 × g for 1 min at 4 °C, wash beads three times with NT2 buffer.
15. Incubate the pellets with 20 units of RNase-free DNase I in 100 μl NT2 buffer for 15 min at 37 °C.
16. Add 700 μl of NT2 buffer and centrifuge at 2000 × g for 1 min at 4 °C.
17. Incubate the pellets with 500 μl of NT2 buffer supplemented with 0.1% SDS and 0.5 mg/ml Proteinase K for 15 min at 55 °C.


18. Collect the supernatant after centrifugation at 10,000 × g at 4 °C for 5 min.
19. Add 500 μl of RNase-free water and 500 μl of acidic phenol, then vortex for 5 min.
20. Centrifuge at 10,000 × g, 4 °C for 20 min; collect 400 μl supernatant; add 800 μl 100% ethanol, 40 μl 3 M sodium acetate, and 1 μl glycoblue; and incubate at −80 °C for 1 h (or overnight).
21. Centrifuge at 10,000 × g for 20 min at 4 °C, remove the supernatant, and add 500 μl of 70% ethanol.
22. Centrifuge at 10,000 × g for 10 min at 4 °C, remove the supernatant, dry the pellet at room temperature, and dissolve the pellet in 12 μl RNase-free water.

3.4 Reverse Transcription and qPCR

1. Mix 1 μl dNTP mix (10 mM) and 1 μl random hexamer (150 ng/μl) with 12 μl purified RNA.
2. Incubate at 65 °C for 5 min and at 4 °C for 5 min using a thermocycler.
3. Add 1 μl reverse transcriptase (200 U/μl), 1 μl RNase inhibitor (40 U/μl), and 4 μl 5× reaction buffer.
4. Incubate samples at 25 °C for 10 min, at 50 °C for 30 min, and at 85 °C for 5 min using a thermocycler.
5. Mix 2.5 μl of cDNA, 2.5 μl of forward and reverse gene-specific primers (2.5–10 μM) designed to amplify PCR products at 200-nt intervals, and 5 μl of SYBR green master mix (see Notes 4 and 5).
6. After completion of qPCR, calculate Ct values of the IgG IP and the specific antibody IP, each normalized to the Ct values of housekeeping transcripts such as GAPDH, ACTB, UBC, and SDHA mRNAs or 18S rRNA (see Notes 6–8).
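The normalization in step 6 can be sketched as a 2^-ΔΔCt calculation. This is one common convention and an assumption here, since the protocol does not prescribe a formula. The sketch also assumes roughly 100% amplification efficiency; the function name and all Ct values are hypothetical.

```python
def fold_enrichment(ct_ip, ct_ip_ref, ct_igg, ct_igg_ref):
    """Return enrichment of the specific-antibody IP over the IgG IP (2^-ddCt).

    ct_ip / ct_igg: Ct of one lncRNA amplicon in the antibody IP and the IgG IP.
    ct_ip_ref / ct_igg_ref: Ct of a housekeeping transcript in the same samples.
    Assumes ~100% primer efficiency (one cycle equals a twofold difference).
    """
    d_ct_ip = ct_ip - ct_ip_ref      # normalize the antibody IP
    d_ct_igg = ct_igg - ct_igg_ref   # normalize the IgG IP
    return 2.0 ** -(d_ct_ip - d_ct_igg)


# Hypothetical Ct values, for illustration only.
reference = {"ip": 24.0, "igg": 24.0}
amplicons = {
    "1-200": {"ip": 27.0, "igg": 30.0},
    "101-300": {"ip": 29.5, "igg": 30.0},
    "201-400": {"ip": 30.0, "igg": 30.0},
}

enrichment = {
    name: fold_enrichment(ct["ip"], reference["ip"], ct["igg"], reference["igg"])
    for name, ct in amplicons.items()
}
for name, value in enrichment.items():
    print(f"{name}: {value:.2f}-fold over IgG")
```

Plotting the per-amplicon values along the transcript gives a profile of the kind shown in Fig. 1b–d, where peaks mark segments protected by the protein of interest.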

4 Notes

1. If a method without RNA labeling is preferred, UVC at 254 nm can be used instead of UVA at 365 nm, and preincubation with 4-SU or 6-SG can be omitted.
2. If non-canonical RBPs are of interest, crosslinking of the RBP to RNA may not succeed upon UV exposure. In this case, formaldehyde crosslinking can be used instead. However, our representative data (Fig. 1) were generated with UVA (365 nm) crosslinking even though ENO1 and TEAD1 are not canonical RBPs.


3. It is critical to titrate the amount of RNase T1 (or other RNase) and the time of incubation. Optimizing these parameters to obtain 100–300-nt RNA fragments (mainly from 18S and 28S ribosomal RNAs) is a key step in CLIP-qPCR analysis. For highly abundant RNAs (e.g., MALAT1 and NEAT1), a higher amount of RNase and a longer incubation can be used.
4. For primer design, divide the RNA of interest into overlapping 200-nt intervals (e.g., spanning positions 1–200, 101–300, 201–400, 301–500, etc.). In this way, the gene-specific primer sets together cover the full-length transcript. If the full-length target RNA is not in a public database, primer extension or rapid amplification of cDNA ends (RACE) can be done first.
5. Depending on the RBP and on the cellular compartment in which the RNAs are localized, NP-40 lysis buffer can be modified by increasing the concentration of NP-40 (e.g., from 0.5% to 2%) or by adding more stringent detergents such as Triton X-100 and SDS.
6. If DNA contamination is suspected, RT-minus PCR amplification reactions can be performed. If amplification occurs, genomic DNA may have been carried over during immunoprecipitation. In this case, an increased amount of DNase can be used with a longer incubation time.
7. This method can also be used to map the binding of canonical or non-canonical RBPs to mRNAs. Note that mRNAs may be extensively engaged with polyribosomes, so the coding region may not be available to RBPs and PCR amplification of certain regions may be challenging.
8. This method was recently used in studies published in high-profile journals (Fig. 1) [22, 23], supporting its reliability and feasibility for mapping and quantifying RBP–lncRNA interactions.
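The interval scheme in Note 4 can be sketched as a short tiling routine. The function name, the default window and step, and the 550-nt example length are assumptions made for illustration. Primers still have to be designed within each window by the reader's preferred method.

```python
def tile_amplicons(transcript_length, window=200, step=100):
    """Return 1-based inclusive (start, end) windows that tile a transcript.

    Consecutive windows overlap by (window - step) nucleotides. The last
    window is clipped at the 3' end of the transcript.
    """
    if transcript_length < 1 or window < 1 or step < 1:
        raise ValueError("length, window and step must be positive")
    windows = []
    start = 1
    while True:
        end = min(start + window - 1, transcript_length)
        windows.append((start, end))
        if end == transcript_length:
            break
        start += step
    return windows


# Hypothetical 550-nt transcript, for illustration only.
windows = tile_amplicons(550)
for start, end in windows:
    print(f"{start}-{end}")
```

For the 550-nt example this yields 1–200, 101–300, 201–400, 301–500, and a clipped final window of 401–550.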

Acknowledgments

J.-H.C. was supported by the Hollings Cancer Center’s Fellowship Program sponsored by the Abney Foundation at the Medical University of South Carolina. J.W.L. and K.-Y.M. were supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2019R1C1C1002886). J.H.Y. was supported by the Medical University of South Carolina.


References

1. Berger SL (2002) Histone modifications in transcriptional regulation. Curr Opin Genet Dev 12:142–148
2. Glisovic T, Bachorik JL, Yong J, Dreyfuss G (2008) RNA-binding proteins and posttranscriptional gene regulation. FEBS Lett 582:1977–1986
3. Filipowicz W, Bhattacharyya SN, Sonenberg N (2008) Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet 9:102–114
4. Yoon JH, Abdelmohsen K, Gorospe M (2013) Posttranscriptional gene regulation by long noncoding RNA. J Mol Biol 425:3723–3730
5. Yoon JH, Abdelmohsen K, Gorospe M (2014) Functional interactions among microRNAs and long noncoding RNAs. Semin Cell Dev Biol 34:9–14
6. Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81:145–166
7. Riley KJ, Steitz JA (2013) The “Observer Effect” in genome-wide surveys of protein-RNA interactions. Mol Cell 49:601–604
8. Lin RJ (2008) RNA-protein interaction protocols. Methods Mol Biol 488:v–vii
9. Tenenbaum SA, Carson CC, Lager PJ, Keene JD (2000) Identifying mRNA subsets in messenger ribonucleoprotein complexes by using cDNA arrays. Proc Natl Acad Sci U S A 97:14085–14090
10. Mukherjee N, Corcoran DL, Nusbaum JD, Reid DW, Georgiev S, Hafner M, Ascano M Jr, Tuschl T, Ohler U, Keene JD (2011) Integrative regulatory mapping indicates that the RNA-binding protein HuR couples pre-mRNA processing and mRNA stability. Mol Cell 43:327–339
11. Ule J, Jensen KB, Ruggiu M, Mele A, Ule A, Darnell RB (2003) CLIP identifies Nova-regulated RNA networks in the brain. Science 302:1212–1215
12. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X, Darnell JC, Darnell RB (2009) HITS-CLIP yields genome-wide insights into brain alternative RNA processing. Nature 456:464–469
13. Chi SW, Zang JB, Mele A, Darnell RB (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460:479–486
14. Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M Jr, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141:129–141
15. König J, Zarnack K, Rot G, Curk T, Kayikci M, Zupan B, Turner DJ, Luscombe NM, Ule J (2010) iCLIP reveals the function of hnRNP particles in splicing at individual nucleotide resolution. Nat Struct Mol Biol 17:909–915
16. Kudla G, Granneman S, Hahn D, Beggs JD, Tollervey D (2011) Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proc Natl Acad Sci U S A 108:10010–10015
17. Sugimoto Y, Chakrabarti AM, Luscombe NM, Ule J (2017) Using hiCLIP to identify RNA duplexes that interact with a specific RNA-binding protein. Nat Protoc 12:611–637
18. Van Nostrand EL, Pratt GA, Shishkin AA, Gelboin-Burkhart C, Fang MY, Sundararaman B, Blue SM, Nguyen TB, Surka C, Elkins K, Stanton R, Rigo F, Guttman M, Yeo GW (2016) Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13:508–514
19. Zarnegar BJ, Flynn RA, Shen Y, Do BT, Chang HY, Khavari PA (2016) irCLIP platform for efficient characterization of protein-RNA interactions. Nat Methods 13:489–492
20. Martin G, Zavolan M (2016) Redesigning CLIP for efficiency, accuracy and speed. Nat Methods 13:482–483
21. Ilik IA, Aktas T, Maticzka D, Backofen R, Akhtar A (2020) FLASH: ultra-fast protocol to identify RNA-protein interactions in cells. Nucleic Acids Res 20:e15
22. Yoon JH, Abdelmohsen K, Kim J, Yang X, Martindale JL, Tominaga-Yamanaka K, White EJ, Orjalo AV, Rinn JL, Kreft SG, Wilson GM, Gorospe M (2013) Scaffold function of long non-coding RNA HOTAIR in protein ubiquitination. Nat Commun 4:2939
23. Kim J, Piao HL, Kim BJ, Yao F, Han Z, Wang Y, Xiao Z, Siverly AN, Lawhon SE, Ton BN, Lee H, Zhou Z, Gan B, Nakagawa S, Ellis MJ, Liang H, Hung MC, You MJ, Sun Y, Ma L (2018) Long noncoding RNA MALAT1 suppresses breast cancer metastasis. Nat Genet 50:1705–1715

Chapter 3

Characterization of Long Non-coding RNA Associated Proteins by RNA-Immunoprecipitation

Junjie Jiang, Tianli Zhang, Yutian Pan, Zhongyi Hu, Jiao Yuan, Xiaowen Hu, Lin Zhang, and Youyou Zhang

Abstract

With the advances in sequencing technology and transcriptome analysis, it is estimated that up to 75% of the human genome is transcribed into RNAs. This finding prompted intensive investigations on the biological functions of non-coding RNAs and led to very exciting discoveries of microRNAs as important players in disease pathogenesis and therapeutic applications. Research on long non-coding RNAs (lncRNAs) is in its infancy, yet a broad spectrum of biological regulations has been attributed to lncRNAs. RNA-immunoprecipitation (RNA-IP) is a technique of detecting the association of individual proteins with specific RNA molecules in vivo. It can be used to investigate lncRNA–protein interaction and identify lncRNAs that bind to a protein of interest. Here we describe the protocol of this assay with detailed materials and methods.

Key words Long non-coding RNA, RNA-binding protein, RNA-immunoprecipitation

1 Introduction

lncRNAs are operationally defined as RNA transcripts larger than 200 nt that do not appear to have coding potential [1–5]. Given that up to 75% of the human genome is transcribed to RNA, while only a small portion of the transcripts encodes proteins [6], the number of lncRNA genes can be large. After the initial cloning of functional lncRNAs such as H19 [7, 8] and XIST [9] from cDNA libraries, two independent studies using high-density tiling array reported that the number of lncRNA genes is at least comparable to that of protein-coding genes [10, 11]. Recent advances in tiling array [10–13], chromatin signature [14, 15], computational analysis of cDNA libraries [16, 17], and next-generation sequencing (RNA-seq) [18–21] have revealed that thousands of lncRNA

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Junjie Jiang et al.

genes are abundantly expressed with exquisite cell-type and tissue specificity in humans. In fact, the GENCODE consortium within the framework of the ENCODE project recently reported 14,880 manually annotated and evidence-based lncRNA transcripts originating from 9277 gene loci in humans [6, 21], including 9518 intergenic lncRNAs (also called lincRNAs) and 5362 genic lncRNAs [14, 15, 20]. These studies indicate that (1) lncRNAs are independent transcriptional units; (2) lncRNAs are spliced with fewer exons than protein-coding transcripts and utilize the canonical splice sites; (3) lncRNAs are under weaker selective constraints during evolution and many are primate specific; (4) lncRNA genes carry typical histone modifications like those of protein-coding genes; and (5) the expression of lncRNAs is relatively low and strikingly cell-type or tissue-specific.

The discovery of lncRNAs has provided an important new perspective on the centrality of RNA in gene expression regulation. lncRNAs can regulate the transcriptional activity of a chromosomal region or a particular gene by recruiting epigenetic modification complexes in either a cis- or trans-regulatory manner. For example, Xist, a 17 kb X chromosome-specific non-coding transcript, initiates X chromosome inactivation by targeting and tethering polycomb-repressive complexes (PRC) to the X chromosome in cis [22–24]. HOTAIR regulates the HoxD cluster genes in trans by serving as a scaffold which enables RNA-mediated assembly of PRC2 and LSD1 and coordinates the binding of PRC2 and LSD1 to chromatin [12, 25].

Based on the knowledge obtained from studies on a limited number of lncRNAs, at least two working models have been proposed. First, lncRNAs can function as scaffolds. lncRNAs contain discrete protein-interacting domains that can bring specific protein components into proximity with each other, resulting in the formation of unique functional complexes [25–27]. These RNA-mediated complexes can also extend to RNA–DNA and RNA–RNA interactions. Second, lncRNAs can act as guides to recruit proteins [24, 28, 29], such as chromatin modification complexes, to chromosomes [24, 29]. This may occur through RNA–DNA interactions [29] or through RNA interaction with a DNA-binding protein [24]. In addition, lncRNAs have been proposed to serve as decoys that bind to DNA-binding proteins [30], transcription factors [31], splicing factors [32–34], or miRNAs [35].

Some studies have also identified lncRNAs transcribed from the enhancer regions [36–38] or neighboring loci [18, 39] of certain genes. Given that their expression correlated with the activities of the corresponding enhancers, it was proposed that these RNAs (termed enhancer RNA/eRNA [36–38] or ncRNA-activating/ncRNA-a [18, 39]) may regulate gene transcription.

RNA-immunoprecipitation (RNA-IP) is a technique of detecting the association of individual proteins with specific RNA molecules in vivo. It can be used to investigate lncRNA–protein

Characterization of Long Non-coding RNA Associated Proteins by RNA. . .


interaction and identify lncRNAs that bind to a protein of interest. Here we describe the protocol of this assay with detailed materials and methods.

2 Materials

Prepare all solutions using ultra-pure RNase-free water and analytical grade reagents. Contamination of the solutions with RNase can result in RNA degradation. Use filtration and/or autoclave sterilization to ensure that all reagents and supplies used in this section are RNase-free. Use RNase ZAP to clean all equipment and work surfaces.

1. Sucrose.
2. 1 M Tris–HCl (pH 7.4).
3. 1 M MgCl2.
4. Triton X-100.
5. 1 M KCl.
6. 0.5 M EDTA.
7. NP-40.
8. 1 M Dithiothreitol (DTT).
9. 10× Phosphate-Buffered Saline (PBS, Invitrogen, AM9625): to make 1× PBS, mix one part of 10× PBS with nine parts of RNase-free water. Store at 4 °C.
10. Protein A/G beads (Sigma, P9424).
11. RNase inhibitor (Invitrogen, 10777-019).
12. Protease inhibitor cocktail (Sigma, P8340).
13. TRIzol RNA extraction reagent (Invitrogen).
14. 1 mL Dounce homogenizer (Fisher Scientific, FB56687).
15. Nuclear Isolation Buffer: 1.28 M sucrose, 40 mM Tris–HCl (pH 7.4), 20 mM MgCl2, 4% Triton X-100. Put 40 mL RNase-free water in a beaker with a stir bar and dissolve 21.9 g sucrose in the beaker. Add 2 mL 1 M Tris–HCl (pH 7.4), 1 mL 1 M MgCl2, and 2 mL Triton X-100, and mix well. Make up to a final volume of 50 mL with RNase-free water and store at 4 °C.
16. RNA Immunoprecipitation (RIP) Buffer: 150 mM KCl, 25 mM Tris (pH 7.4), 5 mM EDTA, 0.5% NP-40. Mix 7.5 mL 1 M KCl, 1.25 mL 1 M Tris–HCl (pH 7.4), 500 μL 0.5 M EDTA, and 250 μL NP-40 and make up to a final volume of 48 mL with RNase-free water. Store at 4 °C. Right before use, add DTT (0.5 mM final concentration), RNase inhibitor (100 U/mL final concentration), and protease inhibitor cocktail (1× final concentration).
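The stock volumes in the two buffer recipes follow from C1 × V1 = C2 × V2, which the sketch below uses as a quick arithmetic check. The 50 mL batch volume and the function name are assumptions. At 50 mL the listed RIP buffer stock volumes give the stated concentrations exactly, whereas the recipe makes the buffer up to 48 mL, possibly to leave room for the additives added before use.

```python
def stock_volume_ml(stock, final, total_ml):
    """Volume of stock (mL) needed to reach `final` in `total_ml`.

    `stock` and `final` must share units (both in M, or both in % v/v).
    Uses C1 * V1 = C2 * V2.
    """
    if final > stock:
        raise ValueError("final concentration exceeds stock concentration")
    return final * total_ml / stock


TOTAL_ML = 50.0  # assumed nominal batch volume, not the 48 mL in the recipe

# RIP buffer: (stock, final) in M, except NP-40 in % (v/v) from a 100% stock.
rip_buffer = {
    "1 M KCl": (1.0, 0.150),
    "1 M Tris-HCl pH 7.4": (1.0, 0.025),
    "0.5 M EDTA": (0.5, 0.005),
    "NP-40": (100.0, 0.5),
}
rip_volumes = {
    name: stock_volume_ml(stock, final, TOTAL_ML)
    for name, (stock, final) in rip_buffer.items()
}

# Sucrose is weighed as a solid: mass = molarity * volume (L) * molar mass.
SUCROSE_G_PER_MOL = 342.3
sucrose_g = 1.28 * (TOTAL_ML / 1000.0) * SUCROSE_G_PER_MOL

for name, ml in rip_volumes.items():
    print(f"{name}: {ml:.2f} mL")
print(f"sucrose for nuclear isolation buffer: {sucrose_g:.1f} g")
```

The same function can be reused to rescale either buffer to a different batch size by changing `TOTAL_ML`.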

3 Methods

The procedures must be performed in an RNase-free environment. Use filtered tips and RNase-free tubes, and clean all equipment and work surfaces with RNase ZAP before starting the experiment. lncRNA-IP aims to identify lncRNA species that bind to a protein of interest. The protocol includes two parts: (1) preparing protein lysate from target cells and (2) immunoprecipitating the protein of interest and extracting protein-bound RNAs. It is up to the readers to decide the subsequent analysis on the isolated RNAs. Before harvesting cells, pre-cool 1× PBS, RNase-free water, nuclear isolation buffer, and RIP buffer on ice; estimate the amount of RIP buffer needed and add RNase inhibitor and protease inhibitor cocktail to the buffer accordingly (see Note 1).

3.1 Whole Cell Lysate Preparation (See Note 2)

If nuclear RNA–protein interaction is the focus of the research, skip this subsection and go directly to Subheading 3.2 for nuclear lysate preparation.

1. Harvest cells using regular trypsinization technique and count the cell number.
2. Wash cells in ice-cold 1× PBS once and resuspend the cell pellet (1.0 × 10⁷ cells) in 1 mL ice-cold RIP buffer containing RNase and protease inhibitors.
3. Shear the cells on ice using a Dounce homogenizer with 15 to 20 strokes.
4. Centrifuge at 15,000 × g for 15 min at 4 °C and transfer the supernatant into a clean tube. This supernatant is the whole cell lysate.

3.2 Cell Harvest and Nuclei Lysate Preparation (See Note 2)

1. Harvest cells using regular trypsinization technique and count the cell number.
2. Wash cells in ice-cold 1× PBS three times and resuspend 1.0 × 10⁷ cells in 2 mL ice-cold PBS (see Note 3).
3. Keep the cell suspension in 1× PBS on ice, add 2 mL ice-cold nuclear isolation buffer and 6 mL ice-cold RNase-free water to the tube, and mix well. Incubate the cells on ice for 20 min with intermittent mixing (four to five times).
4. Harvest nuclei by spinning the tube at 2500 × g for 15 min at 4 °C. The pellet contains the purified nuclei.
5. Resuspend the nuclei pellet in 1 mL freshly prepared ice-cold RIP buffer containing DTT, RNase inhibitor, and protease inhibitors.
6. Shear the nuclei on ice with 15 to 20 strokes using a Dounce homogenizer.
7. Pellet nuclear membrane and debris by centrifugation at 16,000 × g for 10 min at 4 °C.


8. Carefully transfer the clear supernatant into a new tube. This supernatant is the nuclear lysate.

3.3 RNA Immune-Precipitation and Purification

1. Wash 40 μL protein A/G beads with 500 μL ice-cold RIP buffer three times. After the wash, spin down the beads at 600 × g for 30 s at 4 °C, remove the RIP buffer, and add 40 μL RIP buffer to resuspend the beads.
2. Add the prewashed beads and 5–10 μg IgG to the whole cell lysate from Subheading 3.1 or the nuclear lysate from Subheading 3.2.
3. Incubate the lysate with IgG and beads at 4 °C with gentle rotation for 1 h. Pellet the IgG with beads by centrifugation at 16,000 × g for 5 min.
4. Carefully transfer the supernatant (pre-cleared lysate) into a new tube. At this point, the lysate can be divided into multiple portions of equal volume for different antibodies and corresponding controls. Take 50 μL lysate and set aside on ice as input control.
5. Add the antibody of interest to the pre-cleared lysate (see Note 4) and incubate the lysate and antibody overnight at 4 °C with gentle rotation.
6. The next day, add 40 μL prewashed protein A/G beads and incubate at 4 °C for 1 h with gentle rotation.
7. Pellet the beads by spinning at 600 × g for 30 s at 4 °C and remove the supernatant.
8. Wash the beads with 500 μL ice-cold RIP buffer three times. Invert five to ten times during each wash and pellet the beads by spinning at 600 × g for 30 s at 4 °C.
9. Wash the beads with 500 μL ice-cold PBS, pellet the beads by spinning at 600 × g for 30 s at 4 °C, and use a fine needle or tip to remove as much PBS as possible without disturbing the beads.
10. Resuspend beads in 1 mL TRIzol RNA extraction reagent and isolate co-precipitated RNA according to the manufacturer’s instructions.
11. Dissolve RNA in nuclease-free water and store the RNA at −80 °C for further application (see Note 5).

4 Notes

1. It is critically important that the experiment described above is conducted with extra precaution to avoid RNA degradation. All materials have to be RNase-free and the buffers need to be precooled on ice.


2. To ensure result reproducibility, the cells need to be maintained consistently.
3. The abundance of target proteins and lncRNAs may vary from cell line to cell line; therefore, the amount of lysate input needs to be empirically determined for each assay. We found that 1.0 × 10⁷ cells is a good starting point. If more cells are needed, scale up the amount of buffer used to ensure high nuclear lysing efficiency.
4. The amount of antibody used for each experiment needs to be empirically determined. Our suggestion is to start at around 1–2 μg antibody per million cells.
5. The amount of nuclease-free water used to dissolve the RNA is determined by several factors, including the type of downstream analysis, the amount of lncRNA bound to the target protein, and the cell type. We recommend that researchers start at 20 μL and adjust according to their specific situations.
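The starting amounts in Notes 3 and 4 can be scaled to a given cell number as in the sketch below. The function names are hypothetical. Scaling the RIP buffer linearly from 1 mL per 1.0 × 10⁷ cells is an assumption, since Note 3 only says to scale up the buffer.

```python
def antibody_range_ug(cell_count, low_per_million=1.0, high_per_million=2.0):
    """Return the (low, high) starting antibody amount in micrograms.

    Defaults follow the 1-2 ug per million cells suggested in Note 4.
    """
    millions = cell_count / 1e6
    return (low_per_million * millions, high_per_million * millions)


def rip_buffer_ml(cell_count, ml_per_reference=1.0, reference_cells=1.0e7):
    """Return the RIP buffer volume in mL, scaled linearly with cell number.

    Linear scaling from 1 mL per 1.0e7 cells is an assumption.
    """
    return ml_per_reference * cell_count / reference_cells


# Hypothetical experiment with 2.5e7 cells, for illustration only.
cells = 2.5e7
low_ug, high_ug = antibody_range_ug(cells)
print(f"antibody: {low_ug:.0f}-{high_ug:.0f} ug")
print(f"RIP buffer: {rip_buffer_ml(cells):.1f} mL")
```

These values are only starting points. Both notes stress that the final amounts have to be determined empirically for each assay.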

Acknowledgments

This work was supported, in whole or in part, by the NIH (R01CA142776, R01CA190415, R01CA225929, R01CA262070), and the Foundation for Women’s Cancer (YZ).

References

1. Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81:145–166
2. Lee JT (2012) Epigenetic regulation by long noncoding RNAs. Science 338:1435–1439
3. Prensner JR, Chinnaiyan AM (2011) The emergence of lncRNAs in cancer biology. Cancer Discov 1:391–407
4. Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs. Nature 482:339–346
5. Spizzo R, Almeida MI, Colombatti A, Calin GA (2012) Long non-coding RNAs and cancer: a new frontier of translational research? Oncogene 31:4577–4587
6. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A et al (2012) Landscape of transcription in human cells. Nature 489:101–108
7. Pachnis V, Brannan CI, Tilghman SM (1988) The structure and expression of a novel gene activated in early mouse embryogenesis. EMBO J 7:673–681
8. Bartolomei MS, Zemel S, Tilghman SM (1991) Parental imprinting of the mouse H19 gene. Nature 351:153–155
9. Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R et al (1991) A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349:38–44
10. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP et al (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science 296:916–919
11. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S et al (2003) The transcriptional activity of human chromosome 22. Genes Dev 17:529–540
12. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA et al (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129:1311–1323
13. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ et al (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464:1071–1076
14. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D et al (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458:223–227
15. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D et al (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 106:11667–11672
16. Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engstrom PG et al (2006) Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet 2:e62
17. Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, Lipovich L (2010) Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16:1478–1487
18. Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G et al (2010) Long noncoding RNAs with enhancer-like function in human cells. Cell 143:46–58
19. Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC et al (2011) Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol 29:742–749
20. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A et al (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25:1915–1927
21. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789
22. Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J et al (1992) The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71:527–542
23. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322:750–756
24. Jeon Y, Lee JT (2011) YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146:119–133
25. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F et al (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329:689–693
26. Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S et al (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol Cell 38:662–674
27. Kotake Y, Nakagawa T, Kitagawa K, Suzuki S, Liu N, Kitagawa M et al (2011) Long non-coding RNA ANRIL is required for the PRC2 recruitment to and silencing of p15(INK4B) tumor suppressor gene. Oncogene 30:1956–1962
28. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D et al (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142:409–419
29. Grote P, Wittler L, Hendrix D, Koch F, Wahrisch S, Beisaw A et al (2013) The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell 24:206–214
30. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3:ra8
31. Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD et al (2011) Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet 43:621–629
32. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT et al (2010) The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell 39:925–938
33. Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z et al (2010) A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J 29:3082–3093
34. Tripathi V, Shen Z, Chakraborty A, Giri S, Freier SM, Wu X et al (2013) Long noncoding RNA MALAT1 controls cell cycle progression by regulating the expression of oncogenic transcription factor B-MYB. PLoS Genet 9:e1003368
35. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP (2011) A ceRNA hypothesis: the Rosetta stone of a hidden RNA language? Cell 146:353–358
36. Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J et al (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465:182–187
37. Wang D, Garcia-Bassets I, Benner C, Li W, Su X, Zhou Y et al (2011) Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474:390–394
38. Melo CA, Drost J, Wijchers PJ, van de Werken H, de Wit E, Oude Vrielink JA et al (2013) eRNAs are required for p53-dependent enhancer activity and gene transcription. Mol Cell 49:524–535
39. Lai F, Orom UA, Cesaroni M, Beringer M, Taatjes DJ, Blobel GA et al (2013) Activating RNAs associate with mediator to enhance chromatin architecture and transcription. Nature 494:497–501

Chapter 4

Isolation of Protein Complexes Associated with Long Non-coding RNAs

Min Li, Ying Xue, Ruiping He, and Qihong Huang

Abstract

Long non-coding RNAs (lncRNAs) are a new class of regulatory genes that play critical roles in various processes ranging from normal development to human diseases. Recent studies have shown that protein complexes are required for the functions of lncRNAs. The identification of these proteins which are associated with lncRNAs is critical for the understanding of molecular mechanisms of lncRNAs in gene regulation and their functions. In this chapter, we describe a method to isolate proteins associated with lncRNAs. This procedure involves fusion protein Maltose-Binding Protein (MBP) fused to MS2-binding protein to pull down the proteins associated with lncRNA and the identification of these proteins by mass spectrometry.

Key words Long non-coding RNA, Protein complex, MS2

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_4, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

1 Introduction

Human genome sequencing studies have shown that only 2% of the human genome consists of protein-coding sequences [1]. They have also shown that up to 90% of the human genome can be transcribed [2]. The transcripts from non-protein coding regions include non-coding RNAs. Long non-coding RNAs (lncRNAs) are a novel class of non-coding RNAs. lncRNAs have been shown to have important functions in development, differentiation, X-chromosome inactivation, genomic imprinting, and cellular processes including cell cycle and apoptosis [3–7]. They have also been implicated in human diseases including cancer, Parkinson’s disease, Huntington’s disease, amyotrophic lateral sclerosis (ALS), and Alzheimer’s disease [8–20]. Recent studies have shown that lncRNAs function through gene regulation in almost all steps of gene expression including chromatin remodeling, transcription, splicing, RNA decay, translation, enhancer function, and epigenetic regulation [21–39]. LncRNAs regulate gene expression by binding to DNA, RNA, miRNAs, and proteins [7, 40–44]. Interaction with protein complexes is a common mechanism for the functions of lncRNAs. Thus, identification of proteins associated with lncRNAs is critical for the understanding of molecular mechanisms and functions of lncRNAs.

Immunoprecipitation is commonly used for the isolation of protein complexes associated with a protein of interest. However, this method is not available for lncRNAs because antibodies generally do not recognize RNAs. This chapter describes a method to isolate protein complexes associated with lncRNAs using a fusion of Maltose-Binding Protein (MBP) and MS2-binding protein. MS2-binding protein binds to the MS2 RNA sequence. The MS2 sequence needs to be added to either the 5′ or 3′ end of the lncRNA of interest. Adding the MS2 sequence to the end of an lncRNA may sometimes change the structure of the lncRNA and cause the loss of its functions. The function of lncRNA-MS2 therefore needs to be confirmed by assays to ensure that the fusion lncRNA-MS2 RNA has the same function as the wild-type lncRNA.

The MBP-MS2 pull-down assay consists of the following steps. The first step involves the expression of Maltose-Binding Protein fused to MS2-binding protein (MBP-MS2) and immobilization of the fusion protein on amylose beads. This fusion protein is ~59 kDa, with MBP N-terminal to the MS2-binding protein, which carries a double mutation (V75Q and A81G) that prevents oligomerization and increases its affinity for RNA. The second step consists of the construction of plasmids and their expression in mammalian cells. A plasmid with the MS2 sequence alone, without any non-coding RNA, is used as a control. Finally, the cell line of interest is transfected with the MS2 control or lncRNA-MS2 plasmid and the cells are lysed. The cytoplasmic or nuclear lysate is incubated with MBP-MS2-bound amylose beads, and the beads are washed and used for further analysis. PAGE and mass spectrometry are used to identify the bound proteins.

2 Materials

All solutions are prepared using ultrapure water from a Millipore water system and analytical grade reagents.

2.1 Bacterial Lysis Buffer

20 mM HEPES pH 7.9, 200 mM KCl, 1 mM EDTA, 0.05% NP-40, 1 mM PMSF. The protease inhibitor PMSF is added to the lysis buffer fresh before each use (see Note 1).

2.2 MBP Column Buffer

200 mM NaCl, 20 mM Tris–HCl, 1 mM EDTA, 1 mM DTT, pH 7.4. DTT is added fresh before each use (see Note 2).
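The buffer recipes above give final concentrations only. As a minimal sketch of how the stock volumes could be worked out with C1 × V1 = C2 × V2, the following example computes a 50 mL batch of bacterial lysis buffer. The stock concentrations (1 M HEPES, 3 M KCl, 0.5 M EDTA, 10% NP-40) and the function names are assumptions made for illustration; only the 1 M PMSF stock comes from Note 1.

```python
def stock_volume(final_conc, stock_conc, final_volume_ml):
    """Volume of stock (mL) from C1 * V1 = C2 * V2.

    final_conc and stock_conc must be given in the same unit.
    """
    return final_conc * final_volume_ml / stock_conc


# (final concentration, stock concentration) in the same unit per component.
# All stock concentrations are assumed values except PMSF (1 M stock, Note 1).
LYSIS_BUFFER = {
    "HEPES pH 7.9 (mM)": (20, 1000),
    "KCl (mM)": (200, 3000),
    "EDTA (mM)": (1, 500),
    "NP-40 (%)": (0.05, 10),
    "PMSF (mM)": (1, 1000),  # added fresh before each use
}


def recipe(components, final_volume_ml):
    """Return the volume (mL) of each stock plus the water needed to reach the final volume."""
    volumes = {
        name: stock_volume(final, stock, final_volume_ml)
        for name, (final, stock) in components.items()
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes


if __name__ == "__main__":
    for name, ml in recipe(LYSIS_BUFFER, 50).items():
        print(f"{name}: {ml:.3f} mL")
```

The same helper can be reused for the MBP column buffer and the wash buffers by swapping in their components and whatever stock concentrations are available in the laboratory.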

2.3 Wash Buffer 1

20 mM HEPES pH 7.9, 200 mM KCl, and 1 mM EDTA.

2.4 Wash Buffer 2

20 mM HEPES pH 7.9, 20 mM KCl, and 1 mM EDTA.

3 Methods

3.1 MS2-MBP Fusion Protein Expression and Immobilization

1. Inoculate 5 mL Luria Broth containing 100 μg/mL ampicillin with a single bacterial colony of BL21 cells transformed with a plasmid expressing MS2-MBP and grow overnight at 37 °C, 220 RPM. The next morning, inoculate 1 L of Terrific Broth containing 100 μg/mL ampicillin with 5 mL of overnight culture and grow the cells at 37 °C, 220 RPM for 1 h. Check the OD at 600 nm. The OD600 should be around 0.6–0.8; if it is lower than 0.6, check the OD every 20 min.
2. Once the OD600 is above 0.6, induce protein expression by adding 100 μL of 1 M IPTG (to a final concentration of 0.1 mM) and continue to grow the cells at 16 °C, 220 RPM overnight.
3. Harvest the cells by centrifugation at 600 RPM for 10 min. All steps should be performed on ice or at 4 °C from this point on.
4. Discard the supernatant and either begin lysis or store the pellet at −80 °C.
5. Suspend the bacterial pellet in 5 mL chilled lysis buffer (20 mM HEPES pH 7.9, 200 mM KCl, 1 mM EDTA, 0.05% NP-40) containing 1 mM PMSF for every 100 mL culture (for 1 L culture, add 50 mL lysis buffer). Break open the cells by sonication at an output of 15% for 10 s (1 s on/1 s off), then put the tube back on ice. Repeat the sonication one more time.
6. Centrifuge the lysate at 15,000 RPM (21,000 × g) at 4 °C for 30 min. Collect the supernatant.
7. Take 100 μL of amylose magnetic bead suspension, add 500 μL of MBP column buffer (200 mM NaCl, 20 mM Tris–HCl, 1 mM EDTA, 1 mM DTT, pH 7.4), and vortex. Place the tube on a magnet for 1 min to pull the beads to the side of the tube. Discard the supernatant and repeat the wash one more time.
8. Remove the tube from the magnet and add 2 mL of lysate from step 6 to the washed amylose beads, mix thoroughly, and incubate at 4 °C for 1 h with gentle rotation.
9. Place the tube on a magnet for 2–3 min and discard the supernatant.
10. Wash the coated beads 3 times with wash buffer.
11. The MS2-MBP fusion protein beads are ready to use in the next step.
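The quantities in this subsection follow simple proportional rules, which can be checked with a short calculation. The sketch below reproduces the IPTG dilution from step 2 and the lysis buffer scaling from step 5, and relates RPM to relative centrifugal force using the standard relation RCF = 1.118 × 10⁻⁵ × r(cm) × RPM². The function names are illustrative, and the rotor radius is not stated in the protocol, so it is back-calculated here from the 15,000 RPM and 21,000 × g pair in step 6.

```python
def final_concentration_mM(stock_mM, added_uL, culture_mL):
    """Final concentration after adding a small stock volume (added volume neglected)."""
    return stock_mM * (added_uL / 1000.0) / culture_mL


def lysis_buffer_mL(culture_mL, buffer_mL_per_100_mL=5.0):
    """Lysis buffer volume scaled to the culture volume (5 mL per 100 mL culture)."""
    return culture_mL / 100.0 * buffer_mL_per_100_mL


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def radius_for_rcf(rcf, rpm):
    """Rotor radius (cm) implied by a stated RPM / RCF pair."""
    return rcf / (1.118e-5 * rpm ** 2)


if __name__ == "__main__":
    # 100 uL of 1 M IPTG in a 1 L culture
    print(final_concentration_mM(1000, 100, 1000), "mM IPTG")
    # 1 L culture
    print(lysis_buffer_mL(1000), "mL lysis buffer")
    # Radius implied by 15,000 RPM = 21,000 x g
    print(round(radius_for_rcf(21000, 15000), 2), "cm rotor radius")
```

Because RCF depends on the rotor radius, the RPM values in the protocol apply only to a rotor of comparable size; on a different centrifuge it is safer to set the speed from the × g value.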


3.2 Transfection and Cytoplasmic Protein Extraction

1. Seed 1 × 10⁶ HeLa cells per 60 mm culture dish in DMEM supplemented with 10% fetal bovine serum (FBS).
2. In a sterile 1.5 mL tube, add 3 μL Lipofectamine™ 2000 per 1 μg of DNA to 250 μL of Opti-MEM® I Medium without serum. Mix gently and incubate for 5 min at room temperature.
3. After 5 min of incubation, add 1 μg of lncRNA-MS2 (see Note 3) or control MS2 plasmid and incubate at room temperature for 20 min.
4. Add the transfection mix drop-wise to each plate of cells. Mix gently by rocking the plate back and forth. Incubate the cells at 37 °C in a CO2 incubator.
5. After 6 h, remove the medium containing the transfection mix and replace it with DMEM containing 10% FBS.
6. 48 h later, wash the cells twice with cold PBS, harvest with trypsin-EDTA, and centrifuge at 1000 × g for 5 min.
7. Carefully discard the supernatant, leaving the cell pellet as dry as possible, and extract the cytoplasmic protein.
8. Add 100 μL ice-cold cytoplasmic extraction reagent I (CER I) (Thermo Scientific) per 10 μL of packed cell volume to the cell pellet. Vortex the tube vigorously on the highest setting for 15 s to fully suspend the cell pellet. Incubate on ice for 10 min.
9. Add 5.5 μL of ice-cold cytoplasmic extraction reagent II (CER II) (Thermo Scientific) to the tube, vortex the tube, and incubate for 1 min on ice.
10. Centrifuge at 16,000 × g for 5 min and transfer the supernatant (cytoplasmic extract) to a pre-chilled tube. Place the tube on ice until use or store at −80 °C.
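The reagent amounts in steps 2, 8, and 9 are stated for 1 μg of DNA and 10 μL of packed cell volume. A minimal sketch of scaling them to other amounts is given below. It assumes that the CER II volume scales in proportion to the packed cell volume, which the protocol does not state explicitly (it gives only 5.5 μL); the function names are illustrative.

```python
def transfection_mix(dna_ug, lipofectamine_uL_per_ug=3.0):
    """Lipofectamine 2000 volume for a given DNA amount (3 uL per 1 ug DNA, step 2)."""
    return {"DNA_ug": dna_ug, "Lipofectamine_2000_uL": dna_ug * lipofectamine_uL_per_ug}


def cer_volumes(packed_cell_volume_uL):
    """CER I and CER II volumes for a given packed cell volume.

    100 uL CER I per 10 uL packed cells is from step 8. Scaling CER II
    (5.5 uL per 10 uL packed cells) proportionally is an assumption.
    """
    scale = packed_cell_volume_uL / 10.0
    return {"CER_I_uL": 100.0 * scale, "CER_II_uL": 5.5 * scale}


if __name__ == "__main__":
    print(transfection_mix(1.0))
    print(cer_volumes(20.0))
```

The reagent manufacturer's instructions should take precedence over this proportional estimate when the pellet size differs substantially from 10 μL.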

3.3 Pull-Down Assay and Protein Separation

1. Incubate the supernatant collected in step 10 of Subheading 3.2 with the MS2-MBP fusion protein beads from step 11 of Subheading 3.1 for 3 h at 4 °C.
2. Place the tube on a magnet for 3 min and discard the supernatant.
3. Wash the beads 3 times with wash buffer 1 (20 mM HEPES pH 7.9, 200 mM KCl, and 1 mM EDTA), twice with wash buffer 2 (20 mM HEPES pH 7.9, 20 mM KCl, and 1 mM EDTA) (see Note 4), and once with ice-cold PBS.
4. Dissociate the bound proteins by boiling with 1× SDS sample buffer for 5 min.
5. Resolve the bound proteins by 15% SDS-polyacrylamide gel electrophoresis (SDS-PAGE) and stain with colloidal blue stain.
6. Analyze the gel by mass spectrometry.


Proteins that have higher mass-spectrometry counts in cells transfected with lncRNA-MS2 than control MS2 plasmid are selected as potential candidates. Knockdown of these protein candidates with short hairpin RNAs (shRNAs) is used to confirm whether any of the candidates are required for the functions of lncRNA of interest. Further validation of the binding of protein candidates to lncRNA of interest is required to confirm their association.
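The selection rule described above (higher mass-spectrometry counts in the lncRNA-MS2 sample than in the MS2 control) can be sketched as a simple filter. The protocol does not specify a numerical cutoff, so the minimum fold change and the pseudocount used below are assumptions, and the protein names and counts are placeholders rather than real data.

```python
def select_candidates(lnc_counts, ctrl_counts, min_fold=2.0, pseudocount=1.0):
    """Rank proteins enriched in the lncRNA-MS2 pull-down over the MS2 control.

    lnc_counts / ctrl_counts map protein name -> spectral count.
    min_fold and pseudocount are illustrative choices, not part of the protocol.
    """
    candidates = []
    for protein, lnc in lnc_counts.items():
        ctrl = ctrl_counts.get(protein, 0)
        fold = (lnc + pseudocount) / (ctrl + pseudocount)
        if lnc > ctrl and fold >= min_fold:
            candidates.append((protein, lnc, ctrl, fold))
    # Most strongly enriched proteins first
    return sorted(candidates, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Placeholder counts for illustration only
    lnc = {"protein_A": 30, "protein_B": 5, "protein_C": 12}
    ctrl = {"protein_A": 2, "protein_B": 6, "protein_C": 11}
    for protein, n_lnc, n_ctrl, fold in select_candidates(lnc, ctrl):
        print(f"{protein}: {n_lnc} vs {n_ctrl} counts, fold {fold:.1f}")
```

A ranking of this kind only prioritizes candidates; the knockdown and binding validation described above are still needed to confirm them.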

4 Notes

1. PMSF is from a 1 M stock solution.
2. Pre-measured, dried DTT is purchased from Pierce.
3. The MS2 tag can be constructed at the 5′ or 3′ end of lncRNAs. Determining the function(s) of the MS2-tagged lncRNA of interest before the pull-down assay is critical to ensure that the MS2 tag does not affect the function(s) of the lncRNA and possibly its interactions with protein complex(es).
4. We find it useful to wash the beads with the stringent buffer first and then with the less stringent buffer. This procedure ensures that the beads are washed thoroughly while protein interactions are less perturbed.

Acknowledgments

This work was supported by National Natural Science Foundation of China grant 81872358.

References

1. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945
2. Kapranov P, Cheng J, Dike S, Nix DA, Duttagupta R, Willingham AT et al (2007) RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316:1484–1488
3. Guttman M, Donaghey J, Carey BW, Garber M, Grenier JK, Munson G et al (2011) lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477:295–300
4. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D et al (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142:409–419
5. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3:ra8
6. Mancini-Dinardo D, Steele SJ, Levorse JM, Ingram RS, Tilghman SM (2006) Elongation of the Kcnq1ot1 transcript is required for genomic imprinting of neighboring genes. Genes Dev 20:1268–1282
7. Tian D, Sun S, Lee JT (2010) The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation. Cell 143:390–403
8. Beltran M, Puig I, Pena C, Garcia JM, Alvarez AB, Pena R et al (2008) A natural antisense transcript regulates Zeb2/Sip1 gene expression during Snail1-induced epithelial-mesenchymal transition. Genes Dev 22:756–769
9. Benetatos L, Hatzimichael E, Dasoula A, Dranitsaris G, Tsiara S, Syrrou M et al (2010) CpG methylation analysis of the MEG3 and SNRPN imprinted genes in acute myeloid leukemia and myelodysplastic syndromes. Leuk Res 34:148–153
10. Calin GA, Liu CG, Ferracin M, Hyslop T, Spizzo R, Sevignani C et al (2007) Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell 12:215–229
11. Faghihi MA, Modarresi F, Khalil AM, Wood DE, Sahagan BG, Morgan TE et al (2008) Expression of a noncoding RNA is elevated in Alzheimer’s disease and drives rapid feed-forward regulation of beta-secretase. Nat Med 14:723–730
12. Harismendy O, Notani D, Song X, Rahim NG, Tanasa B, Heintzman N et al (2011) 9p21 DNA variants associated with coronary artery disease impair interferon-gamma signalling response. Nature 470:264–268
13. Li L, Feng T, Lian Y, Zhang G, Garen A, Song X (2009) Role of human noncoding RNAs in the control of tumorigenesis. Proc Natl Acad Sci U S A 106:12956–12961
14. Mourtada-Maarabouni M, Pickard MR, Hedge VL, Farzaneh F, Williams GT (2009) GAS5, a non-protein-coding RNA, controls apoptosis and is downregulated in breast cancer. Oncogene 28:195–208
15. Perez DS, Hoage TR, Pritchett JR, Ducharme-Smith AL, Halling ML, Ganapathiraju SC et al (2008) Long, abundantly expressed non-coding transcripts are altered in cancer. Hum Mol Genet 17:642–655
16. Petrovics G, Zhang W, Makarem M, Street JP, Connelly R, Sun L et al (2004) Elevated expression of PCGEM1, a prostate-specific gene with cell growth-promoting function, is associated with high-risk prostate cancer patients. Oncogene 23:605–611
17. Wang X, Arai S, Song X, Reichart D, Du K, Pascual G et al (2008) Induced ncRNAs allosterically modify RNA-binding proteins in cis to inhibit transcription. Nature 454:126–130
18. Carrieri C, Cimatti L, Biagioli M, Beugnet A, Zucchelli S, Fedele S et al (2012) Long non-coding antisense RNA controls Uchl1 translation through an embedded SINEB2 repeat. Nature 491:454–457
19. Carrieri C, Forrest AR, Santoro C, Persichetti F, Carninci P, Zucchelli S et al (2015) Expression analysis of the long non-coding RNA antisense to Uchl1 (AS Uchl1) during dopaminergic cell’s differentiation in vitro and in neurochemical models of Parkinson’s disease. Front Cell Neurosci 9:114
20. Chanda K, Das S, Chakraborty J, Bucha S, Maitra A, Chatterjee R et al (2018) Altered level of long ncRNAs Meg3 and Neat1 in cell and animal models of Huntington’s disease. RNA Biol 15:1348–1363
21. Azzalin CM, Reichenbach P, Khoriauli L, Giulotto E, Lingner J (2007) Telomeric repeat containing RNA and RNA surveillance factors at mammalian chromosome ends. Science 318:798–801
22. Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z et al (2010) A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J 29:3082–3093
23. Flynn RL, Centore RC, O’Sullivan RJ, Rai R, Tse A, Songyang Z et al (2011) TERRA and hnRNPA1 orchestrate an RPA-to-POT1 switch on telomeric single-stranded DNA. Nature 471:532–536
24. Gong C, Maquat LE (2011) lncRNAs transactivate STAU1-mediated mRNA decay by duplexing with 3′ UTRs via Alu elements. Nature 470:284–288
25. Gumireddy K, Li A, Yan J, Setoyama T, Johannes GJ, Orom UA et al (2013) Identification of a long non-coding RNA-associated RNP complex regulating metastasis at the translational step. EMBO J 32:2672–2684
26. Heo JB, Sung S (2011) Vernalization-mediated epigenetic silencing by a long intronic noncoding RNA. Science 331:76–79
27. Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J et al (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465:182–187
28. Kotake Y, Nakagawa T, Kitagawa K, Suzuki S, Liu N, Kitagawa M et al (2011) Long non-coding RNA ANRIL is required for the PRC2 recruitment to and silencing of p15(INK4B) tumor suppressor gene. Oncogene 30:1956–1962
29. Liu F, Marquardt S, Lister C, Swiezewski S, Dean C (2010) Targeted 3′ processing of antisense transcripts triggers Arabidopsis FLC chromatin silencing. Science 327:94–97
30. Maison C, Bailly D, Roche D, Montes de Oca R, Probst AV, Vassias I et al (2011) SUMOylation promotes de novo targeting of HP1alpha to pericentric heterochromatin. Nat Genet 43:220–227
31. Martianov I, Ramadass A, Serra Barros A, Chow N, Akoulitchev A (2007) Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript. Nature 445:666–670
32. Nagano T, Mitchell JA, Sanz LA, Pauler FM, Ferguson-Smith AC, Feil R et al (2008) The Air noncoding RNA epigenetically silences transcription by targeting G9a to chromatin. Science 322:1717–1720
33. Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G et al (2010) Long noncoding RNAs with enhancer-like function in human cells. Cell 143:46–58
34. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA et al (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129:1311–1323
35. Schmitz KM, Mayer C, Postepska A, Grummt I (2010) Interaction of noncoding RNA with the rDNA promoter mediates recruitment of DNMT3b and silencing of rRNA genes. Genes Dev 24:2264–2269
36. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT et al (2010) The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell 39:925–938
37. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F et al (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329:689–693
38. Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43:904–914
39. Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S et al (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol Cell 38:662–674
40. Kong Y, Geng C, Dong Q (2019) LncRNA PAPAS may promote triple-negative breast cancer downregulating miR-34a. J Int Med Res 47:3709–3718
41. Bierhoff H, Dammert MA, Brocks D, Dambacher S, Schotta G, Grummt I (2014) Quiescence-induced LncRNAs trigger H4K20 trimethylation and transcriptional silencing. Mol Cell 54:675–682
42. Sun Y, Zhou Y, Bai Y, Wang Q, Bao J, Luo Y et al (2017) A long non-coding RNA HOTTIP expression is associated with disease progression and predicts outcome in small cell lung cancer patients. Mol Cancer 16:162
43. Zhou Y, Zhong Y, Wang Y, Zhang X, Batista DL, Gejman R et al (2007) Activation of p53 by MEG3 non-coding RNA. J Biol Chem 282:24731–24742
44. Wang Y, Xu Z, Jiang J, Xu C, Kang J, Xiao L et al (2013) Endogenous miRNA sponge lincRNA-RoR regulates Oct4, Nanog, and Sox2 in human embryonic stem cell self-renewal. Dev Cell 25:69–80

Chapter 5

Detection of Long Noncoding RNA Expression by Real-Time PCR

Rui Huang, Yihao Wang, Yaqi Deng, and Jianfeng Shen

Abstract

Recent advances in sequencing technology and bioinformatic analysis have promoted the identification of long noncoding RNA (lncRNA), a novel form of untranslated RNA transcript. Long noncoding RNA has been extensively investigated for its fundamental biological functions in a broad spectrum of pathogenetic and therapeutic studies. The level of lncRNA expression varies among different tissue types and contexts. Therefore, a quantitative method to accurately measure lncRNA expression is highly desirable. Real-time PCR analysis makes it possible to determine the identity and abundance of lncRNA conveniently and in a time-efficient manner. Here, we introduce how to use real-time PCR to analyze and quantify a defined lncRNA.

Key words Long noncoding RNA, RNA expression, Real-time PCR

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_5, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

1 Introduction

Noncoding transcripts were long considered “transcriptional noise” lacking function [1]. However, recent genome research projects, such as ENCODE and FANTOM, have revolutionized our understanding of the human genome. These studies have demonstrated that more than 90% of the genome is in fact transcribed into noncoding RNAs (ncRNAs), while coding RNAs constitute less than 2% of total transcripts [2, 3]. Based on sequence length, ncRNA can be classified as short noncoding RNA (sncRNA) or long noncoding RNA (lncRNA, more than 200 nucleotides) [4]. Recent genome-wide analysis has identified more than 58,000 lncRNAs, accounting for 68% of the total human transcripts [5]. Increasing evidence suggests that, rather than being nonfunctional, lncRNA plays an essential role in the modulation of key biological processes [6, 7]. The major functions of lncRNA include: (a) transcriptional regulation in a cis- or trans-manner; (b) modulation of mRNA processes; and (c) nuclear domain organization [8, 9].

To date, a number of lncRNAs have been extensively studied and their biological roles have been revealed. For instance, XIST (X-inactive specific transcript) and HOTAIR (HOX transcript antisense RNA) are well known for their functions in X-chromosome inactivation and positional identity, respectively [10]. Nevertheless, the majority of lncRNAs have not yet been studied and their biological functions remain to be elucidated. Notably, most lncRNAs have relatively low expression levels and exhibit tissue- and cell-type-specific patterns [11]. In some cases, the expression level of an lncRNA can be extremely low, at less than one copy per cell [12]. As such, accurate and specific measurements of lncRNA expression levels are in high demand for further dissection of the biological features of lncRNA in diverse types of cells and tissues.

To date, several detection and quantification methods have been used to examine the identity and expression of lncRNA, among which the real-time PCR approach is widely acknowledged for its high efficiency and real-time nature [13]. Real-time PCR is a technique used in a “closed tube” system to amplify a target sequence and simultaneously determine the quantity of a given PCR product by measuring the intensity of fluorescence generated in the reaction [14]. It has become a standard method for studying lncRNA and is commonly used in daily practice. Here, we introduce a general protocol that describes how to use real-time PCR to investigate and/or quantify a given lncRNA. The technique offers either absolute or relative quantification; however, the relative quantification method is more frequently used and is the focus of the following protocol.

2 Materials

2.1 List of Materials

DEPC-treated water. 10 μM primers. 2× Real-time PCR Mix. cDNA template. Real-time thermal cycler. Microcentrifuge tubes. Thin-walled PCR tubes. 96- or 384-well reaction plate. Vortex mixer. Quick-spin microfuge.

2.2 Primer and Probe Design

The application of optimal primer and probe sequences is critical in a real-time PCR assay. Sequence alignment programs and primer design software are popular tools for designing optimal primer and probe sequences. The design parameters of primers or probes are expected to ensure both high specificity and high efficiency of real-time PCR [15] (Table 1).

Table 1 Recommended parameters for primer and probe design

Parameters                          Primers/Probes
GC content                          40–60%
Length                              15–30 bp
Length of amplification products    50–150 bp
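The parameters in Table 1 can be checked programmatically before ordering oligonucleotides. The following is a minimal sketch that tests only the three criteria listed in the table (GC content, primer length, and amplicon length); it does not assess melting temperature, secondary structure, or specificity, for which dedicated primer design software is needed. The function names and the example sequence are placeholders, not part of the protocol.

```python
def gc_content(seq):
    """GC content of a sequence in percent."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def check_primer(seq, gc_range=(40.0, 60.0), length_range=(15, 30)):
    """Compare a primer or probe with the GC and length ranges of Table 1."""
    gc = gc_content(seq)
    return {
        "length": len(seq),
        "gc_percent": gc,
        "length_ok": length_range[0] <= len(seq) <= length_range[1],
        "gc_ok": gc_range[0] <= gc <= gc_range[1],
    }


def amplicon_ok(amplicon_length, length_range=(50, 150)):
    """Check the amplicon length against the 50-150 bp range of Table 1."""
    return length_range[0] <= amplicon_length <= length_range[1]


if __name__ == "__main__":
    # Placeholder sequence for illustration only, not a real lncRNA primer
    print(check_primer("ATGCATGCATGCATGCATGC"))
    print(amplicon_ok(120))
```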

2.3 cDNA Template

The cDNA template for lncRNA measurement is generated by reverse transcription. Generally, 100 ng of cDNA template per reaction is recommended, but the amount should be optimized based on cell and tissue types.

2.4 Real-Time PCR Mix

The real-time PCR mix contains 0.05 units/mL Taq DNA Polymerase, 4 mM MgCl2 (see Note 1), and 0.4 mM dNTPs (dATP, dGTP, dCTP, dTTP) (see Note 2).

3 Methods

3.1 Reaction Preparation

Prepare the reaction mixture by adding the following to an individual well of a 96- or 384-well plate: 0.5 μL of 10 μM primers (forward and reverse), 1 μL of cDNA template, 12.5 μL of real-time PCR Mix, and 11 μL of DEPC-treated water. Vortex the plate thoroughly and briefly spin it down in a pre-chilled plate centrifuge; then immediately load the plate onto the thermal cycler for the reaction (Table 2).
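Because the per-well volumes are small, the components shared by all wells are usually combined into a master mix (see Note 2). The sketch below scales the per-well volumes given above to a number of reactions. The 10% overage for pipetting loss and the choice to add the cDNA template to each well separately are assumptions made for illustration, not part of the protocol.

```python
# Per-well volumes (uL) from the reaction mixture described above
PER_WELL_UL = {
    "10 uM primers": 0.5,
    "cDNA template": 1.0,
    "real-time PCR mix": 12.5,
    "DEPC-treated water": 11.0,
}


def master_mix(n_reactions, overage=0.10, per_well=PER_WELL_UL):
    """Volumes (uL) of shared components for n reactions plus a pipetting overage.

    The cDNA template is left out because it differs between wells.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in per_well.items()
        if name != "cDNA template"
    }


if __name__ == "__main__":
    print("Total per well:", sum(PER_WELL_UL.values()), "uL")
    for name, volume in master_mix(10).items():
        print(f"{name}: {volume} uL")
    # Dispense 24 uL of master mix per well, then add 1 uL of template.
```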

3.2 Temperature Cycling

The recommended thermal cycling conditions are as follows: 1 cycle of polymerase activation at 95 °C (2–5 min), followed by 40 cycles of denaturation at 95 °C (10 s) and annealing/extension at 60 °C (20–50 s) (Table 3).
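The cycling conditions can be written down as a small data structure, which makes it easy to estimate the run time before booking the instrument. The sketch below sums the hold times only; ramp times between temperatures and any melting curve stage are instrument-dependent and are not included. The structure and names are illustrative and do not correspond to any particular thermal cycler software.

```python
# Each segment: (temperature in deg C, shortest hold in s, longest hold in s)
PROGRAM = [
    {"step": "polymerase activation", "cycles": 1, "segments": [(95, 120, 300)]},
    {"step": "amplification", "cycles": 40, "segments": [(95, 10, 10), (60, 20, 50)]},
]


def hold_time_range_s(program):
    """Return (shortest, longest) total hold time in seconds, excluding ramping."""
    shortest = sum(
        stage["cycles"] * sum(seg[1] for seg in stage["segments"]) for stage in program
    )
    longest = sum(
        stage["cycles"] * sum(seg[2] for seg in stage["segments"]) for stage in program
    )
    return shortest, longest


if __name__ == "__main__":
    low, high = hold_time_range_s(PROGRAM)
    print(f"Hold time: {low / 60:.0f} to {high / 60:.0f} min (ramping not included)")
```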

3.3 Detection Methods

Several methods for detecting real-time PCR products have been described. Based on the fluorescent agent used and the specifics of the PCR detection, these methods can be divided into two groups. One group employs DNA intercalating agents such as SYBR Green and EvaGreen. The other group uses fluorophores attached to oligonucleotides. In practice, SYBR Green, TaqMan, and molecular beacon probes are the main tools used for the detection and quantification of fluorescence in real-time PCR [16].

Table 2 Representative real-time PCR reaction mixture. Left: required reagents; right: volume of each reagent

Reagents              Volume
10 μM primers         0.5 μL
Template              1 μL
Real-time PCR mix     12.5 μL
DEPC-treated water    11 μL

Table 3 Recommended thermal cycling conditions for real-time PCR

Step                Temperature    Time            Cycles
Pre-denaturation    95 °C          2–5 min         1
PCR                 95/60 °C       10 s/20–50 s    40

3.3.1 SYBR Green

When the SYBR Green dye binds to the minor groove of double-stranded DNA, its fluorescent signal can be measured during the extension phase of each cycle of real-time PCR [17]. Because no fluorescent signal is generated unless SYBR Green is bound to DNA, the amount of fluorescent signal is proportional to the level of PCR products. Nonspecific products and primer-dimers may sometimes develop, so a melting curve analysis is used to check the specificity of the amplified products, based on the observation that the melting temperatures of primer-dimers and nonspecific products are significantly lower than that of the specific product [18] (see Note 3).
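The melting curve check described above is usually read from the negative first derivative of fluorescence with respect to temperature (−dF/dT), in which each product appears as a peak at its melting temperature. The following is a minimal sketch of that idea using central differences on synthetic data. The curve shape, the melting temperatures of 75 °C and 85 °C, and the peak threshold are assumptions made for illustration; instrument software applies its own smoothing and peak calling.

```python
import math


def neg_derivative(temps, fluor):
    """-dF/dT at interior points, estimated by central differences."""
    return [
        -(fluor[i + 1] - fluor[i - 1]) / (temps[i + 1] - temps[i - 1])
        for i in range(1, len(temps) - 1)
    ]


def melt_peaks(temps, fluor, min_fraction=0.2):
    """Temperatures of local maxima of -dF/dT above a fraction of the tallest peak."""
    d = neg_derivative(temps, fluor)
    interior_temps = temps[1:-1]
    tallest = max(d)
    peaks = []
    for i in range(1, len(d) - 1):
        if d[i] > d[i - 1] and d[i] >= d[i + 1] and d[i] >= min_fraction * tallest:
            peaks.append(interior_temps[i])
    return peaks


def synthetic_melt(temps, products):
    """Synthetic melt curve: each (amplitude, Tm) product loses signal around its Tm."""
    return [
        sum(amp / (1.0 + math.exp((t - tm) / 0.8)) for amp, tm in products)
        for t in temps
    ]


if __name__ == "__main__":
    temps = [60 + 0.5 * i for i in range(71)]  # 60 to 95 deg C in 0.5 deg steps
    specific_only = synthetic_melt(temps, [(1000, 85.0)])
    with_dimer = synthetic_melt(temps, [(1000, 85.0), (300, 75.0)])
    print("Specific product only:", melt_peaks(temps, specific_only))
    print("With primer-dimer:", melt_peaks(temps, with_dimer))
```

A single peak is consistent with one amplification product, whereas an additional lower-temperature peak suggests primer-dimers or nonspecific products (see Note 4).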

3.3.2 TaqMan Probe

The TaqMan probes are oligonucleotides that contain a donor fluorescent moiety at the 5′-end and an acceptor fluorescent moiety at the 3′-end. In solution, the fluorescent signal from the donor at the 5′-end is suppressed by the acceptor at the 3′-end. During the extension phase, the bound probe is cleaved by the 5′–3′ exonuclease activity of DNA polymerase [16], which releases the donor and generates fluorescence. The fluorescent signal is then detected by the real-time thermal cycler. Furthermore, the acceptor blocks the 3′-end of the probe, which prevents nonspecific extension of the probe [19].

3.3.3 Molecular Beacon

Molecular beacons are hairpin-shaped, single-stranded oligonucleotide probes [20]. During the annealing process, the molecular beacon probe unfolds and binds to the target, and then emits a fluorescent signal. This signal is proportional to the amount of PCR product [21]. The fluorescent signal is emitted only when the molecular beacon is perfectly complementary to the target, because the hairpin structure serves as a hybridization blocker [16].

3.4 Data Analysis

The output of real-time PCR forms an amplification curve, which shows the changes of fluorescence over the course of amplification (see Note 4). A greater initial concentration of the target leads to an earlier significant increase of the fluorescent signal, so fewer cycles are needed to reach the threshold. Therefore, the initial concentration of the target can be indicated by the number of cycles required to reach the preset threshold. This value is known as the cycle threshold (Ct) [22]. For absolute quantification, a standard curve is generated from standard samples with known starting copy numbers; the abscissa of the curve represents the logarithm of the starting copy number and the ordinate refers to the Ct value. Once the Ct value of the target sample is obtained, its initial copy number can be read from the standard curve. In practice, relative quantification, which determines the expression level of the target gene by comparing its value to that of an internal control or reference sample, is most commonly used. “Housekeeping” genes that have constant expression levels under various conditions usually serve as controls. The relative quantification can be performed based on the following formula [23]:

$$\text{Relative expression} = 2^{-x}, \qquad x = \Delta\Delta Ct = \Delta Ct(\mathrm{ES}) - \Delta Ct(\mathrm{CS}) = Ct(\mathrm{ES} - \mathrm{HS}) - Ct(\mathrm{CS} - \mathrm{HS})$$

ES: experimental sample; CS: control sample; HS: housekeeping sample (Fig. 1). Here Ct(ES − HS) denotes the Ct of the target gene minus the Ct of the housekeeping gene in the experimental sample, and Ct(CS − HS) denotes the same difference in the control sample.
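The two quantification approaches described in this subsection can be sketched in a few lines. The first function applies the relative quantification formula, which assumes that the target and the housekeeping gene amplify with similar efficiency, close to a doubling per cycle. The other two functions fit a standard curve (Ct against log10 of the starting copy number) by least squares and use it to convert a Ct value into a copy number. The function names and all numerical values are illustrative, not measured data.

```python
import math


def relative_expression(ct_target_es, ct_hk_es, ct_target_cs, ct_hk_cs):
    """Fold change of the target in the experimental sample (ES) versus the control sample (CS)."""
    delta_ct_es = ct_target_es - ct_hk_es  # normalize ES to its housekeeping gene
    delta_ct_cs = ct_target_cs - ct_hk_cs  # normalize CS to its housekeeping gene
    delta_delta_ct = delta_ct_es - delta_ct_cs
    return 2.0 ** (-delta_delta_ct)


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(cts) / len(cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Starting copy number read from the standard curve."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Illustrative Ct values: target and housekeeping gene in ES and CS
    print("Fold change:", relative_expression(24.0, 18.0, 26.0, 18.0))

    # Illustrative tenfold dilution series of a standard
    slope, intercept = fit_standard_curve([1e2, 1e3, 1e4, 1e5], [33.0, 29.7, 26.4, 23.1])
    efficiency = 10 ** (-1.0 / slope) - 1.0
    print(f"slope {slope:.2f}, intercept {intercept:.2f}, efficiency {efficiency:.1%}")
    print("Copies at Ct 26.4:", round(copies_from_ct(26.4, slope, intercept)))
```

A slope near −3.3 corresponds to an amplification efficiency close to 100%; when the efficiency deviates markedly from this, the simple 2^(−ΔΔCt) calculation becomes less reliable.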

Fig. 1 The workflow of real-time PCR detection of lncRNAs. Steps shown: (1) extract RNA; (2) RT PCR; (3) get cDNA; (4) combine components (nucleotides, primer, probe); (5) add samples to the plate; (6) run real-time PCR; (7) data analysis (amplification plot, fluorescence against cycles 0–40)

4 Notes

1. An appropriate concentration of MgCl2 is essential for Taq polymerase activity. If it is too low, optimal activity of the polymerase cannot be ensured. In contrast, the likelihood of primer-dimer formation may be significantly increased when the MgCl2 concentration is too high. Furthermore, the optimal MgCl2 concentration is required to maintain a high level of fluorescent signal intensity as well as a good amplification plot in data analysis.
2. The loading amount of an individual reaction mix is so small that even a tiny loading disparity may significantly affect the final results. Therefore, it is recommended to prepare a reaction master mixture and then aliquot it to each well to improve assay accuracy.
3. Although SYBR Green is commonly used, it presents some drawbacks, including limited dye stability.
4. If the melting curve shows more than one peak, primer-dimers or nonspecific products may be present in the PCR. If the melting curve of the negative control shows the same peak as the sample, this indicates the presence of contamination in the system.

Acknowledgments

This work was supported, in whole or in part, by The Program for Professor of Special Appointment (Eastern Scholar) at Shanghai Institutions of Higher Learning (No. TP2018046 to J.S.), Shanghai Municipal Education Commission-Two Hundred Talents (No. 20191817 to J.S.), and General Program of National Natural Science Foundation of China (No. 81972667 to J.S.).

References

1. Yang X, Meng T (2019) Long noncoding RNA in preeclampsia: transcriptional noise or innovative indicators? Biomed Res Int 2019:5437621. https://doi.org/10.1155/2019/5437621

2. ENCODE Project Consortium, Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng Z, Snyder M, Dermitzakis ET, Thurman RE, Kuehn MS, Taylor CM, Neph S, Koch CM, Asthana S,

Malhotra A, Adzhubei I, Greenbaum JA, Andrews RM, Flicek P, Boyle PJ, Cao H, Carter NP, Clelland GK, Davis S, Day N, Dhami P, Dillon SC, Dorschner MO, Fiegler H, Giresi PG, Goldy J, Hawrylycz M, Haydock A, Humbert R, James KD, Johnson BE, Johnson EM, Frum TT, Rosenzweig ER, Karnani N, Lee K, Lefebvre GC, Navas PA, Neri F, Parker SC, Sabo PJ, Sandstrom R, Shafer A, Vetrie D, Weaver M, Wilcox S, Yu M, Collins FS, Dekker J, Lieb JD, Tullius TD, Crawford GE, Sunyaev S, Noble WS, Dunham I, Denoeud F, Reymond A, Kapranov P, Rozowsky J, Zheng D, Castelo R, Frankish A, Harrow J, Ghosh S, Sandelin A, Hofacker IL, Baertsch R, Keefe D, Dike S, Cheng J, Hirsch HA, Sekinger EA, Lagarde J, Abril JF, Shahab A, Flamm C, Fried C, Hackermuller J, Hertel J, Lindemeyer M, Missal K, Tanzer A, Washietl S, Korbel J, Emanuelsson O, Pedersen JS, Holroyd N, Taylor R, Swarbreck D, Matthews N, Dickson MC, Thomas DJ, Weirauch MT, Gilbert J, Drenkow J, Bell I, Zhao X, Srinivasan KG, Sung WK, Ooi HS, Chiu KP, Foissac S, Alioto T, Brent M, Pachter L, Tress ML, Valencia A, Choo SW, Choo CY, Ucla C, Manzano C, Wyss C, Cheung E, Clark TG, Brown JB, Ganesh M, Patel S, Tammana H, Chrast J, Henrichsen CN, Kai C, Kawai J, Nagalakshmi U, Wu J, Lian Z, Lian J, Newburger P, Zhang X, Bickel P, Mattick JS, Carninci P, Hayashizaki Y, Weissman S, Hubbard T, Myers RM, Rogers J, Stadler PF, Lowe TM, Wei CL, Ruan Y, Struhl K, Gerstein M, Antonarakis SE, Fu Y, Green ED, Karaoz U, Siepel A, Taylor J, Liefer LA, Wetterstrand KA, Good PJ, Feingold EA, Guyer MS, Cooper GM, Asimenos G, Dewey CN, Hou M, Nikolaev S, Montoya-Burgos JI, Loytynoja A, Whelan S, Pardi F, Massingham T, Huang H, Zhang NR, Holmes I, Mullikin JC, Ureta-Vidal A, Paten B, Seringhaus M, Church D, Rosenbloom K, Kent WJ, Stone EA, Program NCS, Baylor College of Medicine Human Genome Sequencing Center, Washington University Genome Sequencing Center, Broad Institute, Children’s Hospital Oakland Research Institute,
Batzoglou S, Goldman N, Hardison RC, Haussler D, Miller W, Sidow A, Trinklein ND, Zhang ZD, Barrera L, Stuart R, King DC, Ameur A, Enroth S, Bieda MC, Kim J, Bhinge AA, Jiang N, Liu J, Yao F, Vega VB, Lee CW, Ng P, Shahab A, Yang A, Moqtaderi Z, Zhu Z, Xu X, Squazzo S, Oberley MJ, Inman D, Singer MA, Richmond TA, Munn KJ, Rada-Iglesias A, Wallerman O, Komorowski J, Fowler JC, Couttet P, Bruce AW, Dovey OM, Ellis PD, Langford CF, Nix DA, Euskirchen G, Hartman S, Urban AE,

Kraus P, Van Calcar S, Heintzman N, Kim TH, Wang K, Qu C, Hon G, Luna R, Glass CK, Rosenfeld MG, Aldred SF, Cooper SJ, Halees A, Lin JM, Shulha HP, Zhang X, Xu M, Haidar JN, Yu Y, Ruan Y, Iyer VR, Green RD, Wadelius C, Farnham PJ, Ren B, Harte RA, Hinrichs AS, Trumbower H, Clawson H, Hillman-Jackson J, Zweig AS, Smith K, Thakkapallayil A, Barber G, Kuhn RM, Karolchik D, Armengol L, Bird CP, de Bakker PI, Kern AD, Lopez-Bigas N, Martin JD, Stranger BE, Woodroffe A, Davydov E, Dimas A, Eyras E, Hallgrimsdottir IB, Huppert J, Zody MC, Abecasis GR, Estivill X, Bouffard GG, Guan X, Hansen NF, Idol JR, Maduro VV, Maskeri B, McDowell JC, Park M, Thomas PJ, Young AC, Blakesley RW, Muzny DM, Sodergren E, Wheeler DA, Worley KC, Jiang H, Weinstock GM, Gibbs RA, Graves T, Fulton R, Mardis ER, Wilson RK, Clamp M, Cuff J, Gnerre S, Jaffe DB, Chang JL, Lindblad-Toh K, Lander ES, Koriabine M, Nefedov M, Osoegawa K, Yoshinaga Y, Zhu B, de Jong PJ (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447(7146):799–816. https://doi.org/ 10.1038/nature05874 3. 
Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Roder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Falconnet E, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena H, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Luo OJ, Park E, Persaud K, Preall JB, Ribeca P, Risk B, Robyr D, Sammeth M, Schaffer L, See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Ruan X, Hayashizaki Y, Harrow J, Gerstein M, Hubbard T, Reymond A, Antonarakis SE, Hannon G, Giddings MC, Ruan Y, Wold B, Carninci P, Guigo R, Gingeras TR (2012) Landscape of transcription in human cells. Nature 489(7414):101–108. https://doi.org/10.1038/nature11233
4. Mercer TR, Dinger ME, Mattick JS (2009) Long non-coding RNAs: insights into functions. Nat Rev Genet 10(3):155–159. https://doi.org/10.1038/nrg2521

5. Iyer MK, Niknafs YS, Malik R, Singhal U, Sahu A, Hosono Y, Barrette TR, Prensner JR, Evans JR, Zhao S, Poliakov A, Cao X, Dhanasekaran SM, Wu YM, Robinson DR, Beer DG, Feng FY, Iyer HK, Chinnaiyan AM (2015) The landscape of long noncoding RNAs in the human transcriptome. Nat Genet 47(3):199–208. https://doi.org/10.1038/ng.3192
6. Colak D, Zaninovic N, Cohen MS, Rosenwaks Z, Yang WY, Gerhardt J, Disney MD, Jaffrey SR (2014) Promoter-bound trinucleotide repeat mRNA drives epigenetic silencing in fragile X syndrome. Science 343(6174):1002–1005. https://doi.org/10.1126/science.1245831
7. Mondal T, Rasmussen M, Pandey GK, Isaksson A, Kanduri C (2010) Characterization of the RNA content of chromatin. Genome Res 20(7):899–907. https://doi.org/10.1101/gr.103473.109
8. Geisler S, Coller J (2013) RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol 14(11):699–712. https://doi.org/10.1038/nrm3679
9. Ulitsky I, Bartel DP (2013) lincRNAs: genomics, evolution, and mechanisms. Cell 154(1):26–46. https://doi.org/10.1016/j.cell.2013.06.020
10. Zhang M, Wang F, Xiang Z, Huang T, Zhou WB (2020) LncRNA XIST promotes chemoresistance of breast cancer cells to doxorubicin by sponging miR-200c-3p to upregulate ANLN. Clin Exp Pharmacol Physiol 47(8):1464–1472. https://doi.org/10.1111/1440-1681.13307
11. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigo R (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22(9):1775–1789. https://doi.org/10.1101/gr.132159.111
12. de Hoon M, Shin JW, Carninci P (2015) Paradigm shifts in genomics through the FANTOM projects. Mamm Genome 26(9–10):391–402. https://doi.org/10.1007/s00335-015-9593-8
13. Bustin SA, Benes V, Garson JA, Hellemans J, Huggett J, Kubista M, Mueller R, Nolan T, Pfaffl MW, Shipley GL, Vandesompele J, Wittwer CT (2009) The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55(4):611–622. https://doi.org/10.1373/clinchem.2008.112797
14. Jothikumar P, Hill V, Narayanan J (2009) Design of FRET-TaqMan probes for multiplex real-time PCR using an internal positive control. BioTechniques 46(7):519–524. https://doi.org/10.2144/000113127
15. Basu C (2015) PCR primer design. In: Methods in molecular biology, vol 1275, 2nd edn. Humana Press, New York
16. Navarro E, Serrano-Heras G, Castano MJ, Solera J (2015) Real-time PCR detection chemistry. Clin Chim Acta 439:231–250. https://doi.org/10.1016/j.cca.2014.10.017
17. Shahzamani K, Sabahi F, Merat S, Sadeghizadeh M, Lashkarian HE, Rezvan H, Samiee SM, Arzanani MK, Jabbari H, Malekzadeh R (2011) Rapid low-cost detection of hepatitis C virus RNA in HCV-infected patients by real-time RT-PCR using SYBR Green I. Arch Iran Med 14(6):396–400
18. Rodriguez A, Rodriguez M, Cordoba JJ, Andrade MJ (2015) Design of primers and probes for quantitative real-time PCR methods. Methods Mol Biol 1275:31–56. https://doi.org/10.1007/978-1-4939-2365-6_3
19. Luthra R, Singh RR, Patel KP (2016) Clinical applications of PCR. In: Methods in molecular biology. Springer, New York
20. Ying Y, Mao S, Krueger CJ, Chen AK (2019) Live-cell imaging of long noncoding RNAs using molecular beacons. Methods Mol Biol 2038:21–33. https://doi.org/10.1007/978-1-4939-9674-2_2
21. Menendez-Gil P, Caballero CJ, Solano C, Toledo-Arana A (2020) Fluorescent molecular beacons mimicking RNA secondary structures to study RNA chaperone activity. Methods Mol Biol 2106:41–58. https://doi.org/10.1007/978-1-0716-0231-7_3
22. Green MR, Sambrook J (2018) Analysis and normalization of real-time polymerase chain reaction (PCR) experimental data. Cold Spring Harb Protoc 2018(10). https://doi.org/10.1101/pdb.top095000
23. Green MR, Sambrook J (2018) Constructing a standard curve for real-time polymerase chain reaction (PCR) experiments. Cold Spring Harb Protoc 2018(10). https://doi.org/10.1101/pdb.prot095026

Chapter 6
Profiling Long Non-coding RNA Expression Using Custom-Designed Microarray
Xinna Zhang and George A. Calin

Abstract
Long non-coding RNAs (lncRNAs) are an important class of pervasive genes involved in a variety of biological functions. The abnormal expression of lncRNAs has been implicated in a wide range of human diseases, including cancer. To date, only a small number of functional lncRNAs have been well characterized. lncRNA expression profiling may help to identify useful molecular biomarkers and targets for novel therapeutic approaches. In this chapter, we describe a highly efficient lncRNA expression profiling method using a custom-designed microarray.

Key words lncRNA expression, Microarray profiling, cDNA synthesis, cDNA labeling

1 Introduction

The most well-studied sequences in the human genome are those of protein-coding genes. However, the coding exons of these genes account for only 1.5% of the genome [1, 2]. In recent years, it has become increasingly apparent that the non-protein-coding portion of the genome is of crucial functional importance in the regulation of multiple biological processes including development, differentiation, and metabolism [3, 4]. MicroRNAs (miRNAs) are the most widely studied class of ncRNAs. It has been shown that epigenetic and genetic defects in miRNAs and their processing machinery are a common hallmark of disease, such as in diabetes and cancer [5, 6]. However, other ncRNAs, such as transcribed ultraconserved regions (T-UCRs), small nucleolar RNAs (snoRNAs), PIWI-interacting RNAs (piRNAs), and large intergenic non-coding RNAs (lincRNAs), along with the heterogeneous group of lncRNAs, may also contribute to the development of many different human disorders [7].

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_6, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

Non-coding RNAs are grouped into two major classes based on transcript size: small ncRNAs and long ncRNAs. Small ncRNAs include the well-documented miRNAs, siRNAs, and piRNAs. lncRNAs are a heterogeneous group of non-coding transcripts that range in length from 200 nucleotides to ~100 kilobases and lack significant open reading frames. This class of ncRNAs makes up the largest portion of the mammalian non-coding transcriptome. The expression levels of lncRNAs appear to be lower than those of protein-coding genes, and some lncRNAs are preferentially expressed in specific tissues [8].

Various mechanisms of transcriptional regulation of gene expression by lncRNAs have been proposed. Early discoveries support a paradigm in which lncRNAs regulate transcription via chromatin modulation. They can bind directly to RNA or DNA, provide scaffolds for multi-modular complexes controlling gene expression, or exert guide functions to target proteins to specific genomic locations [9]. Recent evidence indicates that lncRNAs function in various cellular contexts, including post-transcriptional regulation, post-translational regulation of protein activity, organization of protein complexes, cell-cell signaling, and recombination [10].

lncRNA expression is specific to developmental stage and tissue, and lncRNAs have been associated with a spectrum of biological processes. The abnormal expression of lncRNAs has been implicated in a wide range of human diseases, including cancer, ischemic heart disease, and Alzheimer’s disease. The altered expression of lncRNAs is a feature of many types of cancers and has been shown to promote the development, invasion, and metastasis of tumors by a variety of mechanisms. For example, HOTAIR has been found to be involved in human neoplasia [11]. In epithelial cancer cells, HOTAIR overexpression causes polycomb to be retargeted across the genome. The invasive capacity of these cells and their propensity to metastasize are also increased, mediated by polycomb repressive complex 2 (PRC2). By contrast, loss of HOTAIR expression decreases cancer invasiveness, especially in cells with higher than usual levels of PRC2 activity. As such, HOTAIR might have an active role in modulating the cancer epigenome and mediating cell transformation. A similar function has been postulated for some other lincRNAs, such as lincRNA-p21, which functions as a repressor in the p53-dependent transcriptional response [12].

An investigation of the differentially expressed lncRNAs in human diseases may help identify useful biomarkers for diagnosis and prognosis. In addition, there is currently great interest in the emerging opportunities for targeting lncRNAs using novel therapeutic approaches. Numerous techniques are available for gaining information about lncRNA signatures. Frequently used strategies include northern blot analysis [13], microarray analysis [14], quantitative real-time polymerase chain reaction (qRT-PCR) [13], deep sequencing [15], ChIP-seq [16], and in situ hybridization [17]. RNA microarray is a commonly used high-throughput technology based on nucleic acid hybridization between a mixture of labeled RNA targets and their complementary probes. In order to profile lncRNA expression in human cancer and identify novel biomarkers involved in cancer initiation and progression, we have developed a highly efficient lncRNA microarray profiling method (Fig. 1) using a custom microarray platform containing 22 K probes for pyknons, UCRs, and other lncRNAs.

Fig. 1 Schematic of cDNA synthesis and labeling procedure

2 Materials

2.1 RNA Purification

1. DNA-free kit (Life Technologies, AM1906).

2.2 RNA polyA Tailing and cDNA Synthesis

1. Poly(A) Tailing Kit (Life Technologies, AM1350).
2. cDNA Synthesis System (Roche, 11117831001).
3. 5 mg/ml Glycogen (Life Technologies, AM9510).
4. 7.5 M Ammonium Acetate (Sigma, A2706).
5. Absolute Ethanol (Sigma, E702-3).
6. Agilent RNA 6000 Nano Kit (Agilent, 5067-1511).
7. Isopropanol (Sigma, I-9516).
8. Phenol:chloroform:isoamyl alcohol (25:24:1) (Ambion, 9730).

2.3 cDNA Labeling

1. Cy3 Random Nonamers (Trilink, N46-0001).
2. Klenow Fragment (NEB, M0212M).
3. 1 M Tris–HCl pH 7.4 (Sigma, T-2663).
4. 1 M MgCl2 (Sigma, M-1028).
5. β-Mercaptoethanol (Sigma, M3148).
6. 5 M NaCl (Sigma, 71386).
7. 100 mM dNTPs (Life Technologies, 10297-018).
8. 0.5 M EDTA (Sigma, E-7889).

2.4 DNA Hybridization

1. Gene Expression Hybridization Kit (Agilent, 5188-5242).

2.5 Equipment

1. A thermocycler, such as SureCycler 8800 (Agilent Technologies).
2. A desktop refrigerated centrifuge such as the Eppendorf 5430 (Eppendorf).
3. A spectrophotometer such as the NanoDrop 2000 (Thermo Scientific).
4. A Bioanalyzer 2100 (Agilent).
5. A heat block.
6. A microarray scanner such as SureScan Microarray Scanner (Agilent).
7. Hybridization Oven (Agilent).

3 Methods

3.1 DNase Treatment

1. In most cases, 2–5 μg of total RNA is suitable as starting material (see Note 1). To remove contaminating DNA, incubate 3 μg of RNA with 1 μl rDNase I (2 U) in a 15 μl reaction for 30 min at 37 °C (see Note 2).
2. Add 2 μl of DNase Inactivation Reagent to stop the reaction and incubate for 2 min at room temperature, mixing occasionally.
3. Centrifuge at 10,000 × g for 1.5 min and transfer the RNA to a fresh tube (see Notes 3 and 4).
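
The scaling rule in step 1 and Note 2 can be sketched in a few lines of Python. This is only an illustration, not part of the kit instructions: the function names are hypothetical, and scaling the rDNase I and DNase Inactivation Reagent volumes in proportion to the reaction volume is an assumption made here for convenience.

```python
# Sketch: size a DNase reaction so that the RNA concentration stays at or
# below the 0.2 μg/μl limit given in Note 2. All names are hypothetical.

MAX_RNA_UG_PER_UL = 0.2  # upper limit from Note 2

# Reference reaction from step 1: 3 μg RNA in 15 μl with 1 μl rDNase I,
# stopped with 2 μl DNase Inactivation Reagent (step 2).
BASE_VOLUME_UL = 15.0
BASE_RDNASE_UL = 1.0
BASE_INACTIVATION_UL = 2.0


def min_reaction_volume_ul(rna_ug):
    """Smallest volume (μl) that keeps the RNA at or below 0.2 μg/μl."""
    if rna_ug <= 0:
        raise ValueError("RNA amount must be positive")
    return rna_ug / MAX_RNA_UG_PER_UL


def plan_dnase_reaction(rna_ug):
    """Return reaction volumes for a given RNA input (μg)."""
    # Never go below the 15 μl reference reaction.
    volume = max(BASE_VOLUME_UL, min_reaction_volume_ul(rna_ug))
    # Assumption: enzyme and inactivation reagent scale linearly with volume.
    scale = volume / BASE_VOLUME_UL
    return {
        "reaction_volume_ul": round(volume, 2),
        "rdnase_ul": round(BASE_RDNASE_UL * scale, 2),
        "inactivation_reagent_ul": round(BASE_INACTIVATION_UL * scale, 2),
        "rna_ug_per_ul": round(rna_ug / volume, 3),
    }


if __name__ == "__main__":
    for rna in (2.0, 3.0, 5.0):
        print(rna, plan_dnase_reaction(rna))
```

For the 3 μg input used in step 1, the sketch reproduces the 15 μl reaction; larger inputs are diluted so that the concentration limit is respected.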

3.2 Poly(A) Tailing

1. Add 4 μl 5× Reaction Buffer, 2 μl 25 mM MnCl2, 1 μl 1 mM ATP, and 1 μl poly A polymerase to the RNA (12 μl from the last step), adjust the final volume to 20 μl, mix well, then incubate the tube at 37 °C for 30 min (see Notes 5–7).

3.3 First Strand cDNA Synthesis

1. Add 2 μl oligo dT primer to the poly(A)-tailed RNA from the last step, incubate at 70 °C for 5 min (see Note 8), then place the tube immediately on ice.
2. Add the following components and mix well:

   Component                             Volume   Final concentration
   RT buffer, 5× conc.                   8 μl     1×
   DTT, 0.1 M                            4 μl     10 mM
   AMV, 25 U/μl                          2 μl     50 U
   Protector RNase inhibitor, 25 U/μl    1 μl     25 U
   dNTP, 10 mM                           3 μl

   Incubate the samples at 42 °C for 60 min. Then place the tube on ice.

3.4 Second Strand Synthesis

1. Add the following components to the first strand reaction(s) in the indicated order on ice:

   Component                                  Volume
   cDNA from RT-reaction                      40 μl
   Second strand synthesis buffer, 5× conc.   30 μl
   dNTP mix, 10 mM                            2.5 μl
   Second strand enzyme blend                 6.5 μl
   Water, PCR grade                           71 μl
   Total volume                               150 μl

   Incubate at +16 °C for 2 h.
2. Add 20 μl of 5 U/μl T4 DNA polymerase to each reaction. Incubate at +16 °C for an additional 5 min. Do not allow the reaction temperature to exceed +16 °C during this step.
3. Stop the reaction by adding 17 μl of 0.2 M EDTA.

3.5 Digestion of RNA

1. Add 1.5 μl of 15 U RNase I to the tubes from the step above and digest for 30 min at 37 °C.
2. Add 190 μl of phenol:chloroform:isoamyl alcohol to stop the reaction. Vortex well, then centrifuge at 12,000 × g for 5 min. Transfer the upper, aqueous layer to a clean, labeled 1.5 ml tube.
3. Add 19 μl of 7.5 M ammonium acetate, then add 8 μl of 5 mg/ml glycogen to the samples, mixing by repeated inversion at each step. Add 380 μl of ice-cold absolute ethanol to the samples. Centrifuge at 12,000 × g for 20 min.
4. Discard the supernatant, wash the pellet once with 500 μl of ice-cold 80% ethanol (v/v), and dissolve the DNA pellet in 20 μl of VWR water (see Note 9).
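
As a cross-check on the pipetting schemes in Subheadings 3.2–3.5, the reaction volumes and final concentrations can be tallied with the dilution relation C1·V1 = C2·V2. The sketch below uses only the volumes stated in the protocol; the dictionary and function names are hypothetical, and the final concentrations it derives for MnCl2, ATP, and dNTP are computed here rather than quoted from the kit manuals.

```python
# Sketch: volume bookkeeping for Subheadings 3.2-3.5 (all volumes in μl).
# Names are hypothetical; the numbers are the volumes listed in the protocol.

def final_concentration(stock, added_ul, total_ul):
    """C1*V1 = C2*V2 solved for C2; the unit follows the stock."""
    return stock * added_ul / total_ul


def total_ul(components):
    """Sum the component volumes of one reaction."""
    return sum(components.values())


TAILING = {"RNA": 12.0, "5x reaction buffer": 4.0, "25 mM MnCl2": 2.0,
           "1 mM ATP": 1.0, "poly A polymerase": 1.0}

FIRST_STRAND = {"tailed RNA": 20.0, "oligo dT primer": 2.0,
                "5x RT buffer": 8.0, "0.1 M DTT": 4.0, "AMV": 2.0,
                "RNase inhibitor": 1.0, "10 mM dNTP": 3.0}

SECOND_STRAND = {"first strand reaction": 40.0, "5x buffer": 30.0,
                 "10 mM dNTP mix": 2.5, "enzyme blend": 6.5, "water": 71.0}

# Added to the second strand reaction before the phenol extraction.
ADDITIONS = {"T4 DNA polymerase": 20.0, "0.2 M EDTA": 17.0, "RNase I": 1.5}


def summary():
    tailing = total_ul(TAILING)
    first = total_ul(FIRST_STRAND)
    second = total_ul(SECOND_STRAND)
    aqueous = second + total_ul(ADDITIONS)
    return {
        "tailing_ul": tailing,
        "first_strand_ul": first,
        "second_strand_ul": second,
        "aqueous_before_extraction_ul": aqueous,
        "MnCl2_mM": final_concentration(25.0, 2.0, tailing),
        "ATP_mM": final_concentration(1.0, 1.0, tailing),
        "DTT_mM": final_concentration(100.0, 4.0, first),
        "first_strand_dNTP_mM": final_concentration(10.0, 3.0, first),
        "RT_buffer_x": final_concentration(5.0, 8.0, first),
        "second_strand_buffer_x": final_concentration(5.0, 30.0, second),
    }


if __name__ == "__main__":
    for key, value in summary().items():
        print(f"{key}: {value:g}")
```

The tally gives 20 μl, 40 μl, and 150 μl for the three reactions, matching the totals in the tables, and about 188.5 μl of aqueous phase before extraction, which is consistent with the 190 μl of phenol:chloroform:isoamyl alcohol added in Subheading 3.5.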

3.6 QC of cDNA

1. Measure the concentration of cDNA by NanoDrop, and verify that all samples meet the following requirements: concentration ≥ 100 ng/μl; A260/A280 ≥ 1.8; A260/A230 ≥ 1.8. Analyze the samples using the Agilent Bioanalyzer and RNA 6000 Nano Kit, and verify that all samples meet the following requirement for acceptance: median size ≥ 400 bp when compared to a DNA ladder.
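
The acceptance criteria above lend themselves to a simple automated check when many samples are processed. The following is a minimal sketch, assuming that each threshold is a lower bound ("greater than or equal to"); the class and function names are hypothetical and the example readings are invented for illustration.

```python
# Sketch: flag cDNA samples that miss the QC criteria of Subheading 3.6.
# Assumption: every threshold is a lower bound (>=). Names are hypothetical.
from dataclasses import dataclass
from typing import List

MIN_CONC_NG_PER_UL = 100.0
MIN_A260_A280 = 1.8
MIN_A260_A230 = 1.8
MIN_MEDIAN_SIZE_BP = 400.0


@dataclass
class CdnaSample:
    name: str
    conc_ng_per_ul: float   # NanoDrop concentration
    a260_a280: float        # NanoDrop purity ratio
    a260_a230: float        # NanoDrop purity ratio
    median_size_bp: float   # from the Bioanalyzer trace


def qc_failures(sample: CdnaSample) -> List[str]:
    """Return the list of criteria a sample fails (empty list = pass)."""
    failures = []
    if sample.conc_ng_per_ul < MIN_CONC_NG_PER_UL:
        failures.append("concentration")
    if sample.a260_a280 < MIN_A260_A280:
        failures.append("A260/A280")
    if sample.a260_a230 < MIN_A260_A230:
        failures.append("A260/A230")
    if sample.median_size_bp < MIN_MEDIAN_SIZE_BP:
        failures.append("median size")
    return failures


if __name__ == "__main__":
    # Invented readings, for illustration only.
    samples = [
        CdnaSample("tumor_1", 250.0, 1.85, 2.0, 650.0),
        CdnaSample("tumor_2", 80.0, 1.9, 1.5, 700.0),
    ]
    for s in samples:
        print(s.name, qc_failures(s) or "pass")
```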

3.7 Labeling

1. Prepare Random 9mer Buffer (see Note 10) as follows:

   Component               Volume
   VWR deionized water     8.6 ml
   1 M Tris–HCl            1.25 ml
   1 M MgCl2               125 μl
   β-Mercaptoethanol       17.5 μl
   Total                   10 ml

2. Dilute Cy3 dye-labeled 9mers to 1 O.D./42 μl Random 9mer Buffer. Aliquot 40 μl in 0.2 ml thin-walled PCR tubes and store at −20 °C (see Notes 10 and 11).
3. Assemble the following components in separate 0.2 ml thin-walled PCR tubes:

   Component            Volume
   cDNA                 1 μg
   Cy3-9mer primers     40 μl
   VWR water            To volume
   Total                80 μl

4. Heat-denature samples at 98 °C for 10 min. Quick-chill in an ice-water bath for 2 min (see Note 12).
5. Prepare the following dNTP/Klenow Master Mix:

   Component               Volume
   dNTP mix                10 μl
   VWR deionized water     8 μl
   Klenow (50 U/μl)        2 μl
   Total                   20 μl

6. Add 20 μl of dNTP/Klenow Master Mix to the denatured samples from step 4. Mix well, then incubate at 37 °C for 2 h.
7. Stop the reaction by addition of 10 μl 0.5 M EDTA.

8. To precipitate the cDNA, add 11.5 μl 5 M NaCl and 110 μl of isopropanol, then incubate 10 min on ice.
9. Centrifuge at 12,000 × g for 10 min. Discard the supernatant. Wash the pellet with 500 μl of ice-cold 80% ethanol.
10. Air dry the pellet, then rehydrate it in 25 μl VWR deionized water.
11. Determine the concentration of each sample (see Note 13).

3.8 Hybridization

1. Prepare the hybridization mix as follows (see Note 14):

   Component                         Volume
   Labeled cDNA sample               5 μg
   10× GE blocking agent             11 μl
   2× Hi-RPM hybridization buffer    55 μl
   Total volume                      110 μl

   Incubate at 100 °C for 5 min. Immediately transfer to an ice water bath for 5 min.
2. Quickly spin in a centrifuge to collect any condensation at the bottom of the tube. Immediately proceed to hybridization.
3. Hybridize at 55 °C for 18 h (see Note 15).
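
Because the hybridization table specifies the labeled cDNA by mass (5 μg) rather than volume, the sample volume has to be derived from the concentration measured in Subheading 3.7. The sketch below shows the arithmetic. It assumes that the remainder of the 110 μl is made up with nuclease-free water, which the table does not state explicitly; the function name is hypothetical.

```python
# Sketch: volumes for one 110 μl hybridization mix (Subheading 3.8).
# Assumption: the volume not taken up by sample, blocking agent, and buffer
# is filled with nuclease-free water. Names are hypothetical.

TOTAL_UL = 110.0
BLOCKING_AGENT_10X_UL = 11.0
HYB_BUFFER_2X_UL = 55.0
CDNA_MASS_UG = 5.0


def hybridization_mix(cdna_ug_per_ul):
    """Return component volumes (μl) for a labeled cDNA stock concentration."""
    if cdna_ug_per_ul <= 0:
        raise ValueError("concentration must be positive")
    cdna_ul = CDNA_MASS_UG / cdna_ug_per_ul
    water_ul = TOTAL_UL - BLOCKING_AGENT_10X_UL - HYB_BUFFER_2X_UL - cdna_ul
    if water_ul < 0:
        # 5 μg of sample would not fit into the remaining volume.
        raise ValueError("labeled cDNA is too dilute for a 110 μl mix")
    return {
        "labeled_cdna_ul": round(cdna_ul, 2),
        "blocking_agent_10x_ul": BLOCKING_AGENT_10X_UL,
        "hyb_buffer_2x_ul": HYB_BUFFER_2X_UL,
        "water_ul": round(water_ul, 2),
    }


if __name__ == "__main__":
    # Note 13 quotes yields of 25-50 μg in 25 μl, i.e. roughly 1-2 μg/μl.
    for conc in (1.0, 2.0):
        print(conc, hybridization_mix(conc))
```

With 11 μl of 10× blocking agent and 55 μl of 2× buffer in 110 μl, both reagents end up at 1×, so at most 44 μl remains for the sample; a stock more dilute than about 0.114 μg/μl would need to be concentrated first.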

4 Notes

1. High-quality RNA (free of interfering substances and with high integrity) is required for optimal cDNA synthesis yield and cDNA labeling for microarray hybridization. Total RNA can be prepared with TRIzol Reagent or other commercially available kits.
2. The reaction can be scaled up, but the RNA concentration should not exceed 0.2 μg/μl.
3. When transferring the RNA-containing supernatant to a fresh tube, avoid carrying over the DNase Inactivation Reagent, because it can sequester divalent cations and change the buffer conditions.
4. Analyze samples using the Agilent Bioanalyzer and RNA 6000 Nano Kit. Degraded samples appear as significantly lower intensity traces with the main peak area shifted to the left and typically exhibit much more noise in the trace. Verify that all samples meet the following requirements: A260/A280 ≥ 1.8; A260/A230 ≥ 1.8.

5. Enzymes (e.g., poly A polymerase) should be mixed gently without generating bubbles. Pipet the enzymes carefully and slowly; otherwise, the viscosity of the 50% glycerol in the buffer can lead to pipetting errors.
6. If a high concentration of RNA is used, the concentration of ATP should be increased as well.
7. The no-PAP control cDNA can be prepared from a polyadenylation reaction in which the poly A polymerase is omitted.
8. Heat-denature the tailed RNA for no longer than 5 min, since the divalent cations from the tailing reaction can cause degradation of RNA at high temperature. The polyadenylated RNA can also be purified with a phenol-chloroform extraction and ethanol precipitation before the cDNA synthesis step.
9. cDNA is stable at −20 °C for 1 month, but avoid multiple freeze-thaw cycles.
10. Cy3-9mer dilution buffer should be prepared fresh each time the primers are resuspended. Diluted primers can be stored at −20 °C for 4 months.
11. Cyanine dyes (Cy) are ozone sensitive. It is important to regularly monitor ozone levels in the lab environment and take the necessary precautions to maintain atmospheric ozone levels below 5 ppb (parts per billion).
12. Snap-chilling after denaturation is critical for high-efficiency labeling.
13. Use a NanoDrop spectrophotometer to measure the concentration of labeled cDNA. Typical yields range from 25 μg to 50 μg per reaction.
14. To prepare the 10× Blocking Agent, add 500 μl of nuclease-free water to the vial containing lyophilized 10× Gene Expression Blocking Agent supplied with the Gene Expression Hybridization Kit and mix gently on a vortex mixer. If the pellet does not go into solution completely, heat the mix for 4–5 min at 37 °C. It can then be stored at −20 °C for 2 months.
15. For the microarray wash and scan, we follow the instructions of the Agilent miRNA microarray protocol. The conditions can be further optimized based on the design of the custom arrays.

References

1. Ezkurdia I et al (2014) Multiple evidence strands suggest that there may be as few as 19,000 human protein-coding genes. Hum Mol Genet 23:5866–5878. https://doi.org/10.1093/hmg/ddu309
2. Warner KD, Hajdin CE, Weeks KM (2018) Principles for targeting RNA with drug-like small molecules. Nat Rev Drug Discov 17:547–558. https://doi.org/10.1038/nrd.2018.93
3. Fabbri M, Girnita L, Varani G, Calin GA (2019) Decrypting noncoding RNA interactions, structures, and functional networks. Genome Res 29:1377–1388. https://doi.org/10.1101/gr.247239.118
4. Redis RS, Calin GA (2017) SnapShot: non-coding RNAs and metabolism. Cell Metab 25:220–220.e1. https://doi.org/10.1016/j.cmet.2016.12.012
5. Gao ZJ, Yuan WD, Yuan JQ, Yuan K, Wang Y (2018) miR-486-5p functions as an oncogene by targeting PTEN in non-small cell lung cancer. Pathol Res Pract 214:700–705. https://doi.org/10.1016/j.prp.2018.03.013
6. Zhu H, Leung SW (2015) Identification of microRNA biomarkers in type 2 diabetes: a meta-analysis of controlled profiling studies. Diabetologia 58:900–911. https://doi.org/10.1007/s00125-015-3510-2
7. Esteller M (2011) Non-coding RNAs in human disease. Nat Rev Genet 12:861–874. https://doi.org/10.1038/nrg3074
8. Yao RW, Wang Y, Chen LL (2019) Cellular functions of long noncoding RNAs. Nat Cell Biol 21:542–551. https://doi.org/10.1038/s41556-019-0311-8
9. Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43:904–914. https://doi.org/10.1016/j.molcel.2011.08.018
10. Geisler S, Coller J (2013) RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol 14:699–712. https://doi.org/10.1038/nrm3679
11. Gupta RA et al (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464:1071–1076. https://doi.org/10.1038/nature08975
12. Huarte M et al (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142:409–419. https://doi.org/10.1016/j.cell.2010.06.040
13. Feng Y et al (2014) Methods for the study of long noncoding RNA in cancer cell signaling. Methods Mol Biol 1165:115–143. https://doi.org/10.1007/978-1-4939-0856-1_10
14. Xu G et al (2014) Long noncoding RNA expression profiles of lung adenocarcinoma ascertained by microarray analysis. PLoS One 9:e104044. https://doi.org/10.1371/journal.pone.0104044
15. Malouf GG et al (2015) Characterization of long non-coding RNA transcriptome in clear-cell renal cell carcinoma by next-generation deep sequencing. Mol Oncol 9:32–43. https://doi.org/10.1016/j.molonc.2014.07.007
16. Chakravarty D et al (2014) The oestrogen receptor alpha-regulated lncRNA NEAT1 is a critical modulator of prostate cancer. Nat Commun 5:5383. https://doi.org/10.1038/ncomms6383
17. Ling H et al (2013) CCAT2, a novel noncoding RNA mapping to 8q24, underlies metastatic progression and chromosomal instability in colon cancer. Genome Res 23:1446–1461. https://doi.org/10.1101/gr.152942.112

Chapter 7
Long Non-coding RNA Expression Profiling Using Arraystar LncRNA Microarrays
Yanggu Shi and Jindong Shang

Abstract
Arraystar LncRNA microarrays are designed for transcriptome-wide gene expression profiling of both lncRNAs and mRNAs on the same array. The array contents feature comprehensive collections of lncRNAs and include all canonical known coding mRNAs. Each lncRNA transcript is detected by a splice junction specific probe or a unique exon sequence, such that the alternatively spliced transcript isoforms or variants are reliably and accurately detected. The highly optimized experimental protocols and efficient workflow ensure sensitive, robust, and accurate microarray data generation. Standard data analyses are provided for microarray raw data processing, data quality control, gene expression clustering heat map visualization, differentially expressed lncRNAs and mRNAs, lncRNA subcategories, regulatory relationships of lncRNAs with the target mRNAs, gene ontology and pathway analyses. The LncRNA microarrays are a powerful tool for studying lncRNAs in biology and disease, with broad applications in gene expression profiling, gene regulatory mechanism research, lncRNA functional discovery, and biomarker development.

Key words Long non-coding RNA, lncRNA, lincRNA, ceRNA, Microarray, Gene expression, Gene regulation, Expression profiling, Epigenetics, Biomarker, Arraystar

1 Introduction

lncRNAs have diverse regulatory roles in gene expression [1–7] and are involved in many biological processes and diseases [3, 8–21]. Expression profiling of lncRNAs has become increasingly essential to unravel how genes are expressed and regulated. Also, lncRNA expression patterns during biological development or pathogenesis are often more specific than those of mRNAs, which can be desirable for better specificity in biomarker applications [2, 22]. In a sense, past gene expression profiling studies that missed the lncRNAs may benefit from revisiting lncRNA expression to gain new insights.

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_7, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

For gene expression profiling that includes lncRNAs, microarray is a preferred platform compared to RNA-seq [23]. Typically, lncRNAs are expressed at much lower abundance (~1/10 of the median mRNA level) [1, 2, 24–28], so in RNA sequencing they can be overwhelmed by the highly abundant mRNAs, which impedes accurate quantification [29–31]. In fact, the RNA-seq performance metrics of Limit of Quantification (LoQ) to measure the transcript abundance, Limit of Detection of Ratio (LoDR) to detect differential expression, and Limit of Assembly (LoA) to assemble RNA transcript isoforms from short NGS reads are poor for lncRNAs [32]. That is, less than 5% of lncRNAs can be quantified with sufficient accuracy and only a tiny fraction of lncRNAs (<1%) can be detected for differential expression down to a twofold change by RNA-seq (Fig. 5). For microarrays, on the other hand, different RNA sequences are independently hybridized to the probes on the microarray. lncRNAs at lower abundance levels are relatively unaffected by the presence of highly abundant but unrelated RNA molecules [23, 33]. Also, a significant population of lncRNAs (50–70%) are transcribed as natural antisense RNAs (NAT), the strand directions of which can be readily and natively determined by the complementarity of the array probes.

Like mRNAs, lncRNAs undergo alternative splicing, which may produce functionally diverse or even opposite transcripts without the constraints of encoding protein. Due to the difficulty of transcript reconstruction from short RNA-sequencing reads, less than 35% of all RNA transcript isoforms can be correctly reconstructed by top RNA-seq assembly algorithms [34]. For lncRNAs at lower expression levels and poorly covered by sequencing reads, the problem is worse: the fraction of lncRNAs above the Limit of Assembly (LoA) is less than 1% [35]. However, with Arraystar LncRNA microarrays, the splice junction specific or unique exon specific probes unambiguously and reliably detect RNAs at the transcript level.

The superior performance of microarray over RNA-seq for lncRNA profiling was demonstrated by using a publicly available RNA-seq dataset (BioProject PRJNA393626) and Arraystar LncRNA microarray data to compare the same metastatic colorectal cancer cell line SW620 and primary cancer cell line SW480 [36]. Even with RNA-seq coverage at 10G, much deeper than the 6G typical of mRNA-seq, fewer than a hundred lncRNAs were detected for differential expression by RNA-seq, whereas the Arraystar LncRNA microarray sensitively detected close to 2000 differentially expressed lncRNAs under the same stringency cutoffs of |FC| ≥ 3 and FDR ≤ 0.05. That is, the microarray was an order of magnitude better than RNA-seq for lncRNA.

The content of Arraystar LncRNA microarrays is curated from various data sources and scientific publications to give a comprehensive lncRNA collection. On the same microarray, all known canonical protein coding mRNAs in Refseq, UCSC, GENCODE, and others are entirely covered. Importantly, the stringent computational pipeline ensures the lncRNA collections are both extensive and of high confidence.
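
The stringency cutoffs quoted above (|FC| ≥ 3 and FDR ≤ 0.05) can be applied to any table of expression ratios and p-values. The sketch below is a generic illustration, not Arraystar's analysis pipeline: it assumes that FDR means Benjamini-Hochberg adjusted p-values and that fold change is expressed as a positive ratio between the two conditions, neither of which is specified in the text. The function names and the example values are hypothetical.

```python
# Sketch: call differentially expressed transcripts at |FC| >= 3, FDR <= 0.05.
# Assumptions: FDR = Benjamini-Hochberg adjusted p-value; fold change is a
# positive expression ratio (condition A / condition B). Names hypothetical.

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for offset, idx in enumerate(reversed(order)):
        rank = n - offset
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def is_differential(ratio, fdr, fc_cutoff=3.0, fdr_cutoff=0.05):
    """True if the change is at least fc_cutoff-fold in either direction."""
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    fold = max(ratio, 1.0 / ratio)  # magnitude, up or down
    return fold >= fc_cutoff and fdr <= fdr_cutoff


if __name__ == "__main__":
    # Invented example values, for illustration only.
    names = ["lnc_A", "lnc_B", "lnc_C", "lnc_D"]
    ratios = [4.2, 0.25, 2.9, 6.0]
    pvals = [0.001, 0.004, 0.0005, 0.2]
    fdrs = benjamini_hochberg(pvals)
    for name, ratio, fdr in zip(names, ratios, fdrs):
        print(name, is_differential(ratio, fdr))
```

Taking the larger of the ratio and its reciprocal treats a threefold decrease the same way as a threefold increase, which is one common reading of an absolute fold-change cutoff.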

Unlike protein coding genes, publicly available lncRNAs are often scantily annotated, partial in scope, and scattered in collection. Large numbers of lncRNAs deposited in public databases are actually RNA degradation fragments or artifacts. Arraystar maintains high quality proprietary transcriptome and lncRNA databases to extensively collect lncRNAs through lncRNA discovery pipelines, external data sources, and knowledge-based mining of scientific publications. The lncRNAs in the “Gold standard” collection tier are well annotated, experimentally supported, and described in at least one PubMed publication. They are complete with information on lncRNA transcription units, transcript isoforms, functional molecular mechanisms, and subcellular localizations. The “Reliable lncRNAs” are the remaining lncRNA sequences from databases and publications, merged into transcription units (TU). One best representative lncRNA is selected from each TU based on the transcript source, length, and other available information. This tier is a comprehensive yet high confidence collection of lncRNAs, even though they have not yet been well studied.

Along with the expression data, the standard analyses are furnished with detailed annotation. lncRNAs are broadly classified in the epigenomic context as promoter driven lncRNAs (p-lncRNA) and enhancer driven lncRNAs (e-lncRNA). Based on the genomic arrangements and the cis-regulatory potentials on the nearby protein coding genes, lncRNAs are systematically and concisely classified into subgroups: intergenic, bidirectional, intronic, antisense, and sense-overlapping. The abundance levels, differential expression, cell/tissue specific expression, subcellular localization, degree of evolutionary conservation, small peptide coding potential, associated coding genes, non-coding and coding relationships, associated diseases and cancers, clustering heat maps, Volcano plots, and gene ontology and pathway analyses are covered in depth to bring out the lncRNA biology. Important subcategories of lncRNAs, such as lincRNAs as potential enhancer lncRNAs and antisense lncRNAs partnering with their mRNAs, are specially analyzed in subsets.

At a practical level, the microarray procedures are robust, efficient, and comparatively simple. Microarray RNA targets are generated by strong T7 promoter driven linear amplification in vitro, which is better at preserving the fidelity of native RNA abundance levels and avoids the distortions and biases introduced by exponential PCR amplification during the RNA-seq steps [37–40]. The RNA labeling system is formulated for both lncRNA and mRNA. Cy3-labeled cRNAs are generated along the entire length of the transcript without 3′-end bias. The procedure enhances the detection of RNAs with limited quantities, degraded RNAs, or lncRNAs without a poly(A) tail. The highly optimized protocols and well-established workflow maximize the success of producing sensitive, accurate, and reproducible results.


Yanggu Shi and Jindong Shang

2 Materials

1. LncRNA Expression Microarrays
Arraystar LncRNA Expression Microarrays are available for human, mouse, and rat species (Arraystar Inc., Maryland, USA). Due to the low phylogenetic conservation of lncRNA sequences [2], an LncRNA array designed for one species cannot be used for other species. The technical specifications are listed in Table 1.

2. Arraystar Flash RNA Labeling Kit, One Color
Arraystar Flash RNA Labeling Kit, One Color is designed to prepare fluorophore-labeled RNA targets from an input RNA amount between 1.5 and 3 μg. The kit reverse transcribes RNA to double stranded cDNA using oligo(dT) and random primers, both containing a T7 promoter. The target RNAs are then linearly amplified from the T7 promoter by T7 polymerase in an in vitro transcription (IVT) reaction. The reaction simultaneously amplifies cRNA and incorporates Cy3- or Cy5-UTP substrate. The IVT linear amplification preserves the native RNA abundance levels much better than PCR-based exponential amplification.

3. Microarray Hybridization.
• Hybridization oven, such as Agilent Microarray Hybridization Oven G2545A or Tecan HS Pro Hybridization Station.
• Agilent SureHyb Microarray Hybridization Chamber G2534A.
• Agilent Hybridization Gasket Slide Kit.
• Rectangular slide staining dishes and slide rack.

4. Compatible Microarray Scanners for standard 1″ × 3″ slide format.
• Agilent Microarray Scanner, Model SureScan.
• Agilent Microarray Scanner, Model G2565B.
• GenePix® 4000A Microarray Scanner (Molecular Devices).
• GenePix® 4000B Microarray Scanner (Molecular Devices).

5. Agilent 2100 BioAnalyzer.

6. NanoDrop spectrophotometer, equipped with Microarray module (Thermo Scientific).

7. PCR Thermal cycler.

8. Software packages.
• Agilent Feature Extraction Software.
• Agilent GeneSpring GX Software Package.

LncRNA Microarrays


Table 1 Arraystar LncRNA Expression Microarray specifications

| | Arraystar Human LncRNA Expression Microarray V5.0 | Arraystar Mouse LncRNA Expression Microarray V4.0 | Arraystar Rat LncRNA Expression Microarray V3.0 |
|---|---|---|---|
| GEO platform accession | GPL26963 | GPL26962 | GPL26961 |
| Species | Human, Homo sapiens | Mouse, Mus musculus | Rat, Rattus norvegicus |
| Total number of distinct probes | 60,491 | 60,641 | 38,620 |
| Probe length | 60 nt | 60 nt | 60 nt |
| Probe region | Unique exon or splice junction specific probes along the entire length of the transcript | Unique exon or splice junction specific probes along the entire length of the transcript | Unique exon or splice junction specific probes along the entire length of the transcript |
| Probe specificity | Transcript-specific | Transcript-specific | Transcript-specific |
| Protein coding mRNAs | 21,174 | 22,692 | 28,287 |
| LncRNAs | 39,317 (8393 gold standard and 30,924 reliable LncRNAs) | 37,949 | 10,333 |
| LncRNA sources | Arraystar LncRNA collection pipelines: lncRNAs from all major databases and literatures up to 2018. “Canonical” or “longest” priority assigned to the transcript for each lncRNA gene. External databases (current in 2018): FANTOM5 CAT (v1), GENCODE (v29), RefSeq (updated to 2018.11), BIGTranscriptome (v1), knownGene (updated to 2018.11), LncRNAdb, LncRNAWiki, RNAdb, NRED, CLS FL, NONCODE (v5), MiTranscriptome (v2). Literatures: Scientific publications up to 2018 | Arraystar LncRNA collection pipelines: lncRNAs from all major databases and literatures up to 2018. “Canonical” or “longest” priority assigned to the transcript for each lncRNA gene. External databases (current in 2018): GENCODE (vM19), RefSeq, KnownGene, GenBank. Literatures: Scientific publications up to 2018 | Arraystar LncRNA collection pipelines: lncRNAs from all major databases and literatures up to 2018. “Canonical” or “longest” priority assigned to the transcript for each lncRNA gene. External databases (current in 2018): RefSeq, Ensembl. Literatures: Scientific publications up to 2018 |
| mRNA sources | RefSeq, UCSC, GENCODE, FANTOM5 CAT | RefSeq, known gene, GENCODE | RefSeq, Ensembl |
| Array format | 8 × 60 k | 4 × 44 k | 8 × 60 k |
| Technology | In situ oligonucleotide | In situ oligonucleotide | In situ oligonucleotide |

9. Additional equipment.
• Water bath/heating block.
• Powder-free gloves.
• Clean, blunt forceps.
• Micropipettors.
• Sterilized and nuclease-free pipette tips.
• Sterilized and nuclease-free microcentrifuge tubes.
• High-speed microcentrifuge.
• Low-speed tabletop microcentrifuge with slide holder attachment.
• Vortex mixer.

10. Additional reagents.
• Arraystar Spike-In RNA kit.
• Agilent Gene Expression Wash Buffer Kit.
• De-ionized nuclease-free water.
• TRIzol™ Reagent (Thermo Fisher Scientific).
• QIAGEN RNeasy® Mini Kit (Qiagen).

3 Methods

An LncRNA Expression Microarray experiment consists of several major steps in a workflow shown in Fig. 1.

3.1 RNA Samples

Plan the samples in biological replicates and set up comparison groups based on the experimental objective and design (see Note 1). The amount of total RNA per sample for a microarray experiment is typically 2 μg (Table 2). A lower amount may place significant constraints on reliable handling, storage, recovery, sample QC, experiment repeat, success rate, and data quality. As a good practice, reserve an aliquot of the RNA from the same preparation for future qPCR confirmation of the microarray result.

Total RNA can be extracted and purified in sufficient amount from 2 × 10⁶ cells or 10–25 mg tissue as the starting material. We recommend using TRIzol® Reagent (Life Technologies) or RNeasy® Mini Kit (Qiagen). Please refer to the manufacturer’s instructions accordingly.

For RNA extraction from blood or plasma, heparin should be avoided as the anticoagulant. Heparin inhibits many enzymatic reactions and is difficult to remove. EDTA- or citrate-based anticoagulants can be used instead.

Fig. 1 Overview of lncRNA microarray experiment. Workflow steps: Total RNA isolation and sample QC → Spike-in control → Synthesis of fluorescently labeled cRNA → Microarray hybridization → Microarray washing → Microarray scanning and data extraction → Data analysis and report

Table 2 Input RNA amount per sample per array

| Recommendation | Input RNA amount |
|---|---|
| Optimal | 2 μg |
| Minimum | 1.5 μg |
| Maximum | 3 μg |

3.2 RNA Sample QC

The quality and quantity of the RNA sample are critical to the success of a microarray experiment and should be evaluated before proceeding. The concentration of RNA can be measured by UV absorbance at 260 nm with a NanoDrop spectrophotometer. The A260/A280 and A260/A230 ratios are indicators of the presence of impurities, and the values should be close to 2.0 (1.9 to 2.1 are acceptable). The RNA integrity can be evaluated by one of the following methods:

• Agilent 2100 Bioanalyzer, with an RNA Integrity Number (RIN) score greater than 7.
• Denaturing gel electrophoresis, showing sharp and intense 28S and 18S rRNA bands, with a 28S:18S rRNA band intensity ratio close to 2:1.
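The acceptance criteria above can be written as a small check. This is a minimal sketch, assuming the readings are entered by hand; the class and function names are hypothetical and are not part of any instrument software.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RnaSampleQc:
    """NanoDrop and Bioanalyzer readings for one RNA sample."""
    a260_a280: float
    a260_a230: float
    rin: Optional[float] = None  # None if integrity was checked on a gel instead


def qc_issues(sample: RnaSampleQc) -> List[str]:
    """Return the reasons a sample misses the stated QC criteria."""
    issues = []
    # Purity ratios: 1.9 to 2.1 are acceptable.
    for name, value in (("A260/A280", sample.a260_a280),
                        ("A260/A230", sample.a260_a230)):
        if not 1.9 <= value <= 2.1:
            issues.append(f"{name} = {value:.2f} is outside 1.9-2.1")
    # Integrity: a RIN greater than 7 when a Bioanalyzer reading exists.
    if sample.rin is not None and sample.rin <= 7.0:
        issues.append(f"RIN = {sample.rin:.1f} is not greater than 7")
    return issues
```

An empty list means the sample meets the numeric criteria. Gel-based integrity, such as the 28S:18S band ratio, still has to be judged by eye.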

RNAs extracted from formalin-fixed, paraffin-embedded (FFPE) tissues are heavily degraded. Because it may be difficult to obtain a reliable BioAnalyzer RIN for such samples, gel electrophoresis is used instead.

3.3 Spike-in Control

RNA samples are spiked with Spike-In RNA Mix as an internal control prior to RNA labeling. The spiked-in Control RNA is then fluorescently labeled and hybridized identically to the sample RNA throughout the entire process. The control serves as the embedded reference for RNA labeling, scanner calibration, data normalization, sensitivity, and variability, which greatly improves the data quality assessment.

1. The Spike-In Mix is used in proportion to the amount of sample RNA in the labeling reaction. Decide on an input RNA amount to use (Table 2). Prepare a fresh Spike-In Control dilution according to Table 3. Diluted Spike-In Mix is for single use only and should not be saved for re-use.

2. Add the volume of the diluted Spike-In Mix to the RNA sample (see Tables 2 and 3).
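The amounts listed in Table 3 follow a simple rule: 1 μl of the 1:100 diluted Spike-In Mix per μg of sample RNA. A minimal sketch of that rule is shown below. The function name is hypothetical, and applying the rule to amounts between the tabulated rows is an assumption, not a statement from the kit manual.

```python
def spike_in_volume_ul(rna_ug: float) -> float:
    """Volume (ul) of 1:100 diluted Spike-In Mix for a given RNA input (ug).

    Table 3 lists 1, 1.5, 2, and 3 ug; values in between are
    interpolated linearly, which is an assumption of this sketch.
    """
    if not 1.0 <= rna_ug <= 3.0:
        raise ValueError("Table 3 only covers 1-3 ug of sample RNA")
    ul_per_ug = 1.0  # 1 ul of diluted mix per ug of RNA
    return rna_ug * ul_per_ug
```

For the optimal 2 μg input from Table 2, the function returns 2.0 μl.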

Table 3 Spike-in Mix dilution and amount to use

| Amount of sample RNA | Dilution of Spike-In control | Amount of Spike-In control |
|---|---|---|
| 1 μg | 1:100 | 1 μl |
| 1.5 μg | 1:100 | 1.5 μl |
| 2 μg | 1:100 | 2 μl |
| 3 μg | 1:100 | 3 μl |

3.4 cRNA Synthesis and Labeling

Fluorescently labeled target RNA is synthesized using the Arraystar Flash RNA Labeling Kit, specially formulated for use with Arraystar LncRNA Expression Microarrays (Fig. 2). The first-strand cDNA synthesis uses a mixture of oligo(dT) and random RT primers tagged with a T7 promoter, allowing the copying of mRNAs and lncRNAs either with or without polyadenylation. The mixed primers also reduce 3′-biased reverse transcription from the poly(A) tail. After the cDNA synthesis, the cRNA is transcribed in vitro from the strong T7 promoter by T7 RNA polymerase, using fluorescently labeled nucleoside triphosphate substrate.

Fig. 2 Linear amplification and Cy-3 labeling of RNA targets. The schematic shows an mRNA or lncRNA with a poly(A) tail (oligo(dT) priming and random priming) and an lncRNA lacking a poly(A) tail or a degraded mRNA (random priming), each reverse transcribed to cDNA with T7-tagged primers, followed by 2nd strand cDNA synthesis and in vitro transcription by T7 polymerase with Cy3-UTP substrate to give Cy3 labeled cRNA

3.4.1 Double-Strand cDNA Synthesis

1. Anneal T7-RT Primer Mix to the RNA sample in Annealing Mix. Incubate at 65 °C for 5 min, then immediately chill on ice for 1–2 min.

Annealing Mix

| Volume | Component |
|---|---|
| x μl | RNase-Free Water |
| y μl | Total RNA sample (2 μg) |
| 1 μl | T7-RT primer mix |
| 5.2 μl | Total reaction volume |

2. Prepare cDNA Synthesis Master Mix.

cDNA Synthesis Master Mix

| Volume | Component |
|---|---|
| 2.0 μl | 5× cDNA synthesis buffer |
| 1 μl | DTT |
| 0.5 μl | 10 mM dNTP |
| 1 μl | MuLV RT |
| 0.3 μl | RNase inhibitor |
| 4.8 μl | Total reaction volume |

3. Combine the cDNA Synthesis Master Mix with the T7-RT Primer annealed RNA sample. Incubate at 40 °C for 120 min, then at 65 °C for 10 min to inactivate the MuLV RT.

3.4.2 In Vitro Transcription of cRNA

1. Prepare In Vitro Transcription Master Mix on ice.

In Vitro Transcription Master Mix

| Volume | Component |
|---|---|
| 8 μl | T7 Transcription Buffer |
| 0.5 μl | Cy3-UTP |
| 3.8 μl | IVT enzyme mix |
| 12.3 μl | Total reaction volume |

2. Add 12.3 μl of the In Vitro Transcription Master Mix to each synthesized cDNA reaction. Incubate at 42 °C for 3.5–4 h in the dark.
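The reaction volumes in Subheadings 3.4.1 and 3.4.2 can be worked out per sample and scaled for several samples. The sketch below takes the per-reaction volumes from the tables in those subheadings. The function names and the 10% pipetting overage are assumptions of this example and are not specified by the protocol.

```python
from typing import Dict

ANNEALING_TOTAL_UL = 5.2  # total Annealing Mix volume
PRIMER_MIX_UL = 1.0       # T7-RT primer mix

# Per-reaction volumes (ul) from the master mix tables.
CDNA_MASTER_MIX_UL = {
    "5x cDNA synthesis buffer": 2.0,
    "DTT": 1.0,
    "10 mM dNTP": 0.5,
    "MuLV RT": 1.0,
    "RNase inhibitor": 0.3,
}
IVT_MASTER_MIX_UL = {
    "T7 Transcription Buffer": 8.0,
    "Cy3-UTP": 0.5,
    "IVT enzyme mix": 3.8,
}


def annealing_mix(rna_ng_per_ul: float, rna_ug: float = 2.0) -> Dict[str, float]:
    """Solve for y (RNA volume) and x (water volume) in the Annealing Mix."""
    rna_ul = rna_ug * 1000.0 / rna_ng_per_ul
    water_ul = ANNEALING_TOTAL_UL - PRIMER_MIX_UL - rna_ul
    if water_ul < 0:
        raise ValueError("RNA is too dilute to fit into 5.2 ul; concentrate it first")
    return {
        "RNase-Free Water": round(water_ul, 2),
        "Total RNA sample": round(rna_ul, 2),
        "T7-RT primer mix": PRIMER_MIX_UL,
    }


def scale_master_mix(recipe: Dict[str, float], n_samples: int,
                     overage: float = 0.10) -> Dict[str, float]:
    """Scale a per-reaction recipe; the 10% overage is an assumed habit."""
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in recipe.items()}
```

For example, `annealing_mix(1000.0)` gives 2 μl of RNA and 2.2 μl of water for a 1000 ng/μl sample, and `scale_master_mix(IVT_MASTER_MIX_UL, 4)` gives the volumes for four reactions with overage.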


3.4.3 cRNA Purification

Add 500μl TRIzol Reagent to the mix. Proceed with the cRNA extraction according to the manufacturer’s instructions (Life Technologies).

3.4.4 Quantification of the cRNA

To verify the RNA yield and dye incorporation, we recommend using a NanoDrop ND-1000 Spectrophotometer equipped with the Microarray module. Select the Cy3 dye setting in the Microarray module. Please consult the NanoDrop manual for operating details.

1. Determine the cRNA yield from the A260 reading. Select the “RNA” type for an extinction coefficient of 40 in the measurement setting.

\[ \text{cRNA amount } (\mu\text{g}) = \frac{\text{cRNA concentration } (\text{ng}/\mu\text{l}) \times \text{Volume } (\mu\text{l})}{1000} \]

A typical cRNA yield is >15 μg, which is sufficient for hybridization of at least two slides.

2. Calculate the specific activity of the labeled cRNA (pmoles of dye per μg of cRNA):

\[ \text{Specific activity} = \frac{\text{Dye } (\text{pmol}/\mu\text{l})}{\text{cRNA } (\mu\text{g}/\mu\text{l})} \]
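The two formulas can be sketched in code as follows. The function names and the example readings are hypothetical. The only conversion added here is from ng/μl, as reported by the NanoDrop, to μg/μl for the specific activity.

```python
def crna_yield_ug(conc_ng_per_ul: float, volume_ul: float) -> float:
    """cRNA amount (ug) = concentration (ng/ul) x volume (ul) / 1000."""
    return conc_ng_per_ul * volume_ul / 1000.0


def specific_activity(dye_pmol_per_ul: float, conc_ng_per_ul: float) -> float:
    """pmol of dye per ug of cRNA = dye (pmol/ul) / cRNA (ug/ul)."""
    conc_ug_per_ul = conc_ng_per_ul / 1000.0
    return dye_pmol_per_ul / conc_ug_per_ul


# Hypothetical reading: 600 ng/ul cRNA, 9 pmol/ul Cy3, 30 ul eluate.
yield_ug = crna_yield_ug(600.0, 30.0)     # 18 ug
activity = specific_activity(9.0, 600.0)  # about 15 pmol Cy3 per ug cRNA
```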

If the cRNA yield is Start Extracting. 8. After the extraction is completed successfully, examine the QC report for each extraction set by double-clicking the QC Report link in the Summary Report tab (Fig. 3). Refer to Agilent Feature Extraction Software Reference Guide for detailed explanations.


Fig. 3 An example QC Report for an 8 × 60K microarray, generated by the Feature Extraction Software

3.8 Data Analysis

GeneSpring GX (Agilent, USA) software is used for microarray data analysis. It features an extensive graphical user interface and guided microarray analyses on raw or processed data. Please refer to the User Manual for instructions. Many software packages are also freely available for microarray data analysis (see Note 2). The procedures to import Arraystar LncRNA microarray raw data into GeneSpring are described below.

3.8.1 Set Up Raw Data Import

To read LncRNA microarray raw data into GeneSpring GX, set up the import for the first time use.

1. Select Annotations/Create Technology/Custom from File menu.
2. Select Technology Type: “Single color”; Technology name: desired_technology_name; Organism: select_one; Choose a sample data file: a_rawdatafile; Number of samples in single data file: “One Sample.”
3. The raw data will be displayed in a spreadsheet in the “Format data file” step.
4. Define the data rows to be imported in the “Select Row Scope for Import.” Keep the rows starting with the column headings to the end row.
5. Select Identifier: “ProbeName”; BG Corrected Signal: “gProcessedSignal.”

3.8.2 Create New Project for Data Analysis

1. Select Project/New experiment menu.
2. Select Experiment type: “Generic Single Color” and Workflow type: “Advanced Analysis.”
3. In the Load Data step, select the Technology: desired_technology_name you created previously.
4. Choose the raw data files.
5. Uncheck “Please select if your data is in log scale,” as the intensity values in raw data are not log2 transformed. Threshold raw signals to: “5.0”. Select Normalization Algorithm: “Quantile.”
6. In the Preprocess Baseline Options step, select “Do not perform baseline transformation.”
7. Once the data are imported, proceed with microarray data processing, analysis, interpretation, graphics, and result reporting using the functionalities provided by the software suite.
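For readers working outside GeneSpring, the preprocessing settings chosen in step 5 can be approximated in a few lines of code. The sketch below is an illustration only and does not reproduce GeneSpring's internal implementation. The function names are hypothetical, ties are broken by position rather than averaged, and the order of operations (threshold, quantile normalization, then log2) follows the description in Note 4.

```python
import math
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Give every sample (column) the same intensity distribution.

    Each value is replaced by the mean, across samples, of the values
    that share its rank.
    """
    n_probes = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[i] for col in sorted_cols) / len(columns)
                  for i in range(n_probes)]
    normalized = []
    for col in columns:
        order = sorted(range(n_probes), key=lambda i: col[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized


def preprocess(columns: List[List[float]], floor: float = 5.0) -> List[List[float]]:
    """Threshold raw signals to `floor`, quantile normalize, then log2."""
    thresholded = [[max(v, floor) for v in col] for col in columns]
    normalized = quantile_normalize(thresholded)
    return [[math.log2(v) for v in col] for col in normalized]
```

Here each inner list holds the raw gProcessedSignal values of one sample, with probes in the same order in every sample.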

3.9 Standard Analyses

The standard analyses provided by the Arraystar LncRNA Expression Microarray service include common microarray data analyses as well as specialized annotations and analyses for lncRNAs.

• Microarray scan images.
• Feature extraction and raw intensity values.
• Array QC metrics.
• Data quality and low intensity filtering/flagging (see Note 3).
• Data normalization (see Note 4).
• Box plots of intensity distribution.
• LncRNA Expression Profiling Data for raw and normalized intensities.
• mRNA Expression Profiling Data for raw and normalized intensities.
• Scatter plots of intensities between groups or samples.
• Differentially expressed lncRNAs (see Note 5).
• Differentially expressed mRNAs (see Note 5).
• lncRNA annotations (Fig. 4) including lncRNA genomic information, full length completeness status, enhancer or promoter driven e-lncRNA/p-lncRNA, Gold/Reliable collection tier, cancer/disease association, cell/tissue expression specificity, subcellular localization, transcription initiation region (TIR) conservation, exonic sequence conservation, association with phenotypical trait by GWAS, expression quantitative trait loci (eQTL) linked lncRNA-mRNA pair, small peptide encoding potential, bidirectional/antisense/exon overlapping/intronic/intergenic relationship with the closest coding mRNA gene, and the genomic information of the associated target mRNA gene (see Note 6).
• Special lncRNA subgroups of long intergenic non-coding RNAs (lincRNA) and antisense lncRNAs, and their associated/nearby protein coding mRNA gene expression data.
• mRNA annotations including protein coding gene genomic information, database source, sequence accession, genomic coordinates, transcription start/stop, probe cross-hybridization with others, probe name and sequence, associated cancer/disease, whether it is a transcription factor (TF), cell/tissue expression specificity, subcellular localization, transcription initiation region (TIR) conservation, exonic sequence conservation, association with phenotypical trait by GWAS, expression quantitative trait loci (eQTL), and protein product.
• Hierarchical clustering heat maps (see Note 7).
• Volcano plots to visualize differentially expressed and unchanged lncRNAs or mRNAs.
• Gene ontology analysis of differentially expressed mRNAs (see Note 8).
• Pathway analysis of differentially expressed mRNAs (see Note 9).
• Gene Expression Omnibus (GEO) repository support, with compliance to Minimum Information About a Microarray Experiment (MIAME) (see Note 10).

Fig. 4 Specialized lncRNA microarray data annotations and analyses systematically collate vital lncRNA information and properties to facilitate the interpretation and understanding of regulatory functions, biological roles, disease implications, and follow-up study for lncRNAs

Fig. 5 An illustration of Limit of Quantification (LoQ) when measuring RNA abundance and Limit of Detection of Ratio (LoDR) when detecting differential gene expression in RNA-seq. The fraction of lncRNAs above LoQ is less than 5% (the yellow and red areas under the lncRNA curve). The fraction of lncRNAs above LoDR for differential expression down to twofold change is less than 1% (the red area under the lncRNA curve). The data were based on [32]

4 Notes

1. In a basic experiment design to compare gene expression between conditions and to screen differentially expressed genes, one group of samples is designated for each condition for comparison. At least three samples per group are needed as biological replicates. More biological replicates are strongly recommended. The effect of replication on gene expression microarray experiments has been discussed previously [41]. The variability in clinical samples is typically much larger than in cell line samples, requiring more biological replicates. Due to the high quality manufacturing process and excellent reproducibility of the microarrays, technical replicates, i.e., the same RNA sample on replicated arrays, are not needed.

2. Open source or free software packages are also available for data analysis, for example, GenePattern (www.broadinstitute.org/cancer/software/genepattern), MultiExperiment Viewer MeV (www.tm4.org/mev.html), Chipster (chipster.csc.fi), and R/Bioconductor packages (www.bioconductor.org).

3. Each microarray feature (spot) is evaluated and flagged as “Present” or “Absent” based on whether the signal is positive and significant, uniform, above the background, oversaturated, or a population outlier. Additionally, a feature may be flagged as “Marginal” if the background is not uniform or the background reading is a population outlier.

4. The raw intensities need to be “normalized” first to be comparable. Quantile normalization is commonly used in microarray data normalization [42]. The method does not rely on housekeeping genes as the reference. The normalized intensity values are log2 transformed.

5. When comparing differential gene expression, for example, Group2 vs. Group1, the magnitude of change is expressed as fold change (FC), which can be obtained by:


\[ \mathrm{FC} = 2^{\,|\mathrm{Norm2} - \mathrm{Norm1}|} \]

where Norm2 and Norm1 are the arithmetic means of the log2 transformed normalized intensities for samples in Group2 and Group1, respectively. Conventionally, a decrease in gene expression (downregulation) is indicated by a negative sign of the FC:

\[ \mathrm{FC} = -|\mathrm{FC}| \quad \text{if } \mathrm{Norm2} < \mathrm{Norm1}, \text{ for downregulation} \]

For example, a gene is downregulated by 13.93-fold when comparing Group2 vs. Group1:

| Group | Samples | Normalized intensities (log2) | Average |
|---|---|---|---|
| Group1 | sample1, sample2, sample3 | 6.4413, 7.2712, 7.8067 | Norm1 = 7.1731 |
| Group2 | sample4, sample5, sample6 | 3.6496, 3.2023, 3.2670 | Norm2 = 3.3729 |

\[ \mathrm{FC} = -2^{\,|3.3729 - 7.1731|} = -13.93 \]

The p-value for a gene expression comparison is calculated by t-test. As many t-tests are performed for the genes on the array, the p-values are adjusted for multiple testing as False Discovery Rates (FDR), using the Benjamini–Hochberg procedure [43]. Other statistical analyses have been devised to assess differential gene expression more robustly and accurately. For example, the R/Bioconductor limma package implements linear models and empirical Bayes methods to perform differential expression analysis [44].

The differentially expressed genes (DEG) are initially selected by using minimally stringent cutoffs of p ≤ 0.05 and FC ≥ 2. These DEGs are displayed on Volcano plots along the −log10(p) and log2(FC) axes. The DEGs on the list can be further ranked by FC and filtered with more stringent p-value (or FDR) cutoffs using the Excel Data/Filter functionality.

When selecting differentially expressed genes for confirmation, the fold change magnitudes, p-values (adjusted), and raw signal intensities should be considered. Genes showing significant FC and p-values but having very low raw intensities may still not be confirmed by qPCR. Empirically, the raw intensity of at least one of the groups must be greater than 200–500.

To confirm the differentially expressed genes, either SYBR® Green qPCR or Taqman® assay can be used. It is a good practice to perform qPCR on the same RNA prep used for the array and follow all the qPCR design guidelines by the qPCR manufacturer. The qPCR primers should be situated as close as possible to the array probe location. If the lncRNA is a natural antisense lncRNA, a strand specific primer, instead of oligo(dT) primer, should be used for first-strand cDNA synthesis. qPCR has a typical limit of detection equivalent to 0.5 ΔΔCt (or FC ≥ 1.5) in differential expression analysis.

There are cases where (a) no or very few individual genes meet the threshold for DEGs, because the relevant biological differences are modest; (b) too many genes show up as DEGs; (c) single-gene analysis may miss important effects on pathways. Biological processes often affect sets of genes acting in concert. Consistent changes in all genes in a pathway may have a more significant effect than a large change in a single gene. To overcome these challenges, Gene Set Enrichment Analysis (GSEA), gene ontology analysis, or pathway analysis can be considered to prioritize the genes for further studies [45].

6. The lncRNAs on the array were collected from multiple data sources. To obtain the sequences and to access the genomic records, use the corresponding databases and accession numbers/IDs as indicated. In the report, a probe may have multiple annotation entries for the same lncRNA, due to multiple relationships with the protein coding gene(s) or transcript(s).

7. Correct sample grouping by unsupervised clustering is an indication of successful classification by gene expression patterns. However, if the groups are identical or too similar in gene expression, samples may appear incorrectly clustered with other groups. This is not necessarily an indication of a microarray data problem. Due to space limitations, gene names are not displayed on the heat maps. To identify individual genes on the heat map, interactive software is needed. Alternatively, limit the number of genes on the heat map by including only a small subset of genes of interest, for example, DEGs at high stringency in a pathway, to allow the gene names to be printed.

8. Gene ontology analysis is performed for protein coding genes only. The GO terms are hyperlinked to and described at geneontology.org.

9. Pathway analysis is performed for protein coding genes only. The default pathway database is KEGG: Kyoto Encyclopedia of Genes and Genomes (http://www.genome.jp/kegg/).

10. GEO is a public functional genomics data repository supporting MIAME-compliant data submissions. Many journals require microarray data to be deposited and the associated GEO accession number obtained prior to publication. The GEO platform records for Arraystar LncRNA Expression Microarrays have been created by Arraystar (Table 1), which can be referenced for the preparation of GEO submissions.
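The fold change convention and the multiple testing adjustment described in Note 5 can be sketched as follows. This is a minimal illustration in plain Python, not the code used by the Arraystar service or by limma. The function names are hypothetical, and the inputs are assumed to be log2 transformed normalized intensities and raw (unadjusted) p-values.

```python
from statistics import mean
from typing import List, Sequence


def signed_fold_change(group1_log2: Sequence[float],
                       group2_log2: Sequence[float]) -> float:
    """FC for Group2 vs. Group1, negative when Group2 is lower."""
    norm1, norm2 = mean(group1_log2), mean(group2_log2)
    fc = 2.0 ** abs(norm2 - norm1)
    return -fc if norm2 < norm1 else fc


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def is_deg(fc: float, p: float, fc_cutoff: float = 2.0,
           p_cutoff: float = 0.05) -> bool:
    """Initial DEG selection with the minimally stringent cutoffs."""
    return abs(fc) >= fc_cutoff and p <= p_cutoff


# Worked example from Note 5.
fc = signed_fold_change([6.4413, 7.2712, 7.8067], [3.6496, 3.2023, 3.2670])
```

With the intensities from the worked example, `round(fc, 2)` gives -13.93, matching the value in Note 5.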


References

1. Cabili MN, Trapnell C, Goff L et al (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25:1915–1927
2. Derrien T, Johnson R, Bussotti G et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789
3. Batista PJ, Chang HY (2013) Long noncoding RNAs: cellular address codes in development and disease. Cell 152:1298–1307
4. Kung JT, Colognori D, Lee JT (2013) Long noncoding RNAs: past, present, and future. Genetics 193:651–669
5. Mercer TR, Dinger ME, Mattick JS (2009) Long non-coding RNAs: insights into functions. Nat Rev Genet 10:155–159
6. Amaral PP, Clark MB, Gascoigne DK et al (2011) lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 39:D146–D151
7. Fatica A, Bozzoni I (2014) Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet 15:7–21
8. Kornienko AE, Guenzl PM, Barlow DP et al (2013) Gene regulation by the act of long non-coding RNA transcription. BMC Biol 11:59
9. Nie L, Wu HJ, Hsu JM et al (2012) Long non-coding RNAs: versatile master regulators of gene expression and crucial players in cancer. Am J Transl Res 4:127–150
10. Lee JT (2012) Epigenetic regulation by long noncoding RNAs. Science 338:1435–1439
11. Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43:904–914
12. Wapinski O, Chang HY (2011) Long noncoding RNAs and human disease. Trends Cell Biol 21:354–361
13. Chen G, Wang Z, Wang D et al (2013) LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res 41:D983–D986
14. Taft RJ, Pang KC, Mercer TR et al (2010) Non-coding RNAs: regulators of disease. J Pathol 220:126–139
15. Broadbent HM, Peden JF, Lorkowski S et al (2008) Susceptibility to coronary artery disease and diabetes is encoded by distinct, tightly linked SNPs in the ANRIL locus on chromosome 9p. Hum Mol Genet 17:806–814
16. Scheuermann JC, Boyer LA (2013) Getting to the heart of the matter: long non-coding RNAs in cardiac development and disease. EMBO J 32:1805–1816
17. Gomez JA, Wapinski OL, Yang YW et al (2013) The NeST long ncRNA controls microbial susceptibility and epigenetic activation of the interferon-gamma locus. Cell 152:743–754
18. Cabianca DS, Casa V, Bodega B et al (2012) A long ncRNA links copy number variation to a polycomb/trithorax epigenetic switch in FSHD muscular dystrophy. Cell 149:819–831
19. Sana J, Faltejskova P, Svoboda M et al (2012) Novel classes of non-coding RNAs and cancer. J Transl Med 10:103
20. Gutschner T, Diederichs S (2012) The hallmarks of cancer: a long non-coding RNA point of view. RNA Biol 9:703–719
21. Han P, Li W, Lin CH et al (2014) A long noncoding RNA protects the heart from pathological hypertrophy. Nature 514:102–106
22. Kumarswamy R, Bauters C, Volkmann I et al (2014) Circulating long noncoding RNA, LIPCAR, predicts survival in patients with heart failure. Circ Res 114:1569–1575
23. Xu W, Seok J, Mindrinos MN et al (2011) Human transcriptome array for high-throughput clinical studies. Proc Natl Acad Sci U S A 108:3707–3712
24. Kampa D, Cheng J, Kapranov P et al (2004) Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res 14:331–342
25. Cawley S, Bekiranov S, Ng HH et al (2004) Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell 116:499–509
26. Ravasi T, Suzuki H, Pang KC et al (2006) Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome. Genome Res 16:11–19
27. Guttman M, Garber M, Levin JZ et al (2010) Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nat Biotechnol 28:503–510
28. Yan L, Yang M, Guo H et al (2013) Single-cell RNA-Seq profiling of human preimplantation embryos and embryonic stem cells. Nat Struct Mol Biol 20(9):1131–1139
29. Jiang L, Schlesinger F, Davis CA et al (2011) Synthetic spike-in standards for RNA-seq experiments. Genome Res 21:1543–1551
30. Labaj PP, Leparc GG, Linggi BE et al (2011) Characterization and improvement of RNA-Seq precision in quantitative transcript expression profiling. Bioinformatics 27:i383–i391
31. Toung JM, Morley M, Li M et al (2011) RNA-sequence analysis of human B-cells. Genome Res 21:991–998
32. Hardwick SA, Chen WY, Wong T et al (2016) Spliced synthetic genes as internal controls in RNA sequencing experiments. Nat Methods 13:792–798
33. Kretz M, Webster DE, Flockhart RJ et al (2012) Suppression of progenitor differentiation requires the long noncoding RNA ANCR. Genes Dev 26:338–343
34. Steijger T, Abril JF, Engstrom PG et al (2013) Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 10:1177–1184
35. Deveson IW, Hardwick SA, Mercer TR et al (2017) The dimensions, dynamics, and relevance of the mammalian noncoding transcriptome. Trends Genet 33:464–478
36. Li Y, Zeng C, Hu J et al (2018) Long non-coding RNA-SNHG7 acts as a target of miR-34a to increase GALNT7 level and regulate PI3K/Akt/mTOR pathway in colorectal cancer progression. J Hematol Oncol 11:89
37. King C, Guo N, Frampton GM et al (2005) Reliability and reproducibility of gene expression measurements using amplified RNA from laser-microdissected primary breast tissue with oligonucleotide arrays. J Mol Diagn 7:57–64
38. Li L, Roden J, Shapiro BE et al (2005) Reproducibility, fidelity, and discriminant validity of mRNA amplification for microarray analysis from primary hematopoietic cells. J Mol Diagn 7:48–56
39. Klur S, Toy K, Williams MP et al (2004) Evaluation of procedures for amplification of small-size samples for hybridization on microarrays. Genomics 83:508–517
40. Wilson CL, Pepper SD, Hey Y et al (2004) Amplification protocols introduce systematic but reproducible errors into gene expression studies. BioTechniques 36:498–506
41. Pavlidis P, Li Q, Noble WS (2003) The effect of replication on gene expression microarray experiments. Bioinformatics 19:1620–1627
42. Bolstad BM, Irizarry RA, Astrand M et al (2003) A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19:185–193
43. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B 57:289–300
44. Smyth GK (2004) Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 3:Article3
45. Subramanian A, Tamayo P, Mootha VK et al (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102:15545–15550

Chapter 8 Nuclear RNA Isolation and Sequencing Navroop K. Dhaliwal and Jennifer A. Mitchell Abstract Most transcriptome studies involve sequencing and quantification of steady-state mRNA by isolating and sequencing poly (A) RNA. Although this type of sequencing data is informative to determine steady-state mRNA levels, it does not provide information on transcriptional output and thus may not always reflect changes in transcriptional regulation of gene expression. Furthermore, sequencing poly (A) RNA may miss transcribed regions of the genome not usually modified by polyadenylation which includes many long non-coding RNAs including enhancer RNA (eRNA). Here, we describe nuclear RNA sequencing (nucRNA-seq) which investigates the transcriptional landscape through sequencing and quantification of nuclear RNAs which are both unspliced and spliced transcripts for protein-coding genes and nuclearretained long non-coding RNAs. Key words Transcriptome profiling, LncRNA, Nuclear RNA, Cellular fractionation, Massively parallel sequencing

1

Introduction RNA sequencing (RNA-seq) is a technique that uses next generation sequencing in order to investigate the dynamic transcriptome of the cell [1]. Most transcriptomic studies are based on RNA-seq that involves generation of poly (A) positive RNA libraries from total cellular RNA [2]. For mammalian genomes, a large proportion of the transcribed genome is non-coding and transcribed along with protein-coding genes, contributing to the complexity of mammalian transcriptomes [3]. Nuclear RNA sequencing (nucRNAseq) characterizes genome-wide transcriptional output by performing deep sequencing of a cDNA library generated from nuclear RNA isolated from purified intact nuclei [4]. The quality of nucRNA-seq data depends on isolation of pure intact nuclei from single cells, purification of RNA from the nuclear fraction, and generation of the double stranded cDNA library. Isolation of intact nuclei is the first critical step and must be optimized for different cell types to obtain pure nuclei. Here, we

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_8, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


describe an optimized nuclear isolation protocol for mouse embryonic stem (ES) cells. The optimal protocol for erythroid cells has been described by Mitchell et al. [4] and was used to develop the mouse ES cell protocol described here and used in Dhaliwal et al. (2018, 2019) [5, 6].

2

Materials

2.1 Components for Isolation and Testing Intact Nuclei

Prepare all solutions using molecular biology grade water and reagents that are certified RNase-free. Work on ice for all steps unless otherwise indicated. Chill all buffers completely on ice prior to use.
1. Molecular biology grade Sigma water: Sigma-Aldrich, W4502.
2. 1 M Sucrose: Weigh 17.115 g sucrose, transfer to a 100 ml bottle and dissolve in Sigma water to a final volume of 50 ml.
3. 1× PBS: Sigma-Aldrich.
4. 10× RSB: Mix 2.5 ml of 1 M Tris–HCl pH 7.5, 0.5 ml of 5 M NaCl, 750 μl of 1 M MgCl2, and 21.25 ml Sigma water to make final concentrations of 100 mM Tris–HCl, 100 mM NaCl, and 30 mM MgCl2.
5. 1× RSB: 1/10 dilution of 10× RSB with Sigma water.
6. Buffer A: Mix 0.5 ml of 10× RSB, 0.5 ml of 1% Triton X-100, 2.5 μl of 1 M DTT, 0.5 ml of 1 M sucrose stock and adjust final volume to 5 ml with Sigma water, generating final concentrations of 1× RSB, 0.2% Triton X-100 (see Note 1), 0.5 mM DTT, 0.1 M Sucrose.
7. Buffer B: Mix 0.5 ml of 10× RSB, 0.5 ml of 1% Triton X-100, 2.5 μl of 1 M DTT, 1.25 ml of 1 M sucrose and adjust final volume to 5 ml with Sigma water, generating working concentrations of 1× RSB, 0.1% Triton X-100, 0.5 mM DTT, 0.25 M Sucrose.
8. Buffer C: Mix 16.7 ml of 1 M sucrose, 250 μl of 1 M MgCl2, 500 μl of 1 M Tris–HCl (pH 8.0), 25 μl 1 M DTT and adjust final volume to 50 ml with Sigma water, generating working concentrations of 5 mM MgCl2, 10 mM Tris–HCl, 0.5 mM DTT, 0.33 M Sucrose.
9. 10× RIPA (Cell Signaling Technology, 9806), protease inhibitor cocktail (Roche 04693159001).
10. Antibodies used to test the purity of the nuclear fraction by western blotting: Anti-Ubf1 rabbit polyclonal (Santa Cruz Biotechnology, sc-9131), anti-Cyclophilin A mouse monoclonal (Abcam, ab58144), Goat anti-rabbit conjugated with HRP (Bio-Rad, 166-2408EDU), Goat anti-mouse conjugated with HRP (Bio-Rad, 172-1011).
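The buffer recipes in Subheading 2.1 can be cross-checked with the dilution rule C1·V1 = C2·V2. The following is a minimal sketch of that arithmetic, not part of the published protocol; the function name `final_conc` and the dictionary layout are illustrative assumptions, and the sucrose molar mass (342.30 g/mol) is the standard value. Note that the listed 0.5 ml of 1% Triton X-100 in a 5 ml final volume works out to 0.1% for both Buffer A and Buffer B, so the 0.2% stated for Buffer A should be checked against the stock concentration or volume actually used.

```python
SUCROSE_G_PER_MOL = 342.30  # molar mass of sucrose (C12H22O11)


def final_conc(stock_conc, stock_vol_ml, total_vol_ml):
    """Dilution rule C1*V1 = C2*V2, solved for C2 (same units as stock_conc)."""
    return round(stock_conc * stock_vol_ml / total_vol_ml, 4)


# 1 M sucrose stock: 17.115 g dissolved to a final volume of 50 ml
sucrose_stock_M = round(17.115 / SUCROSE_G_PER_MOL / 0.050, 3)

# 10x RSB: the listed component volumes add up to the total volume (25 ml)
rsb_total_ml = 2.5 + 0.5 + 0.75 + 21.25
rsb_10x = {
    "Tris-HCl (mM)": final_conc(1000, 2.5, rsb_total_ml),
    "NaCl (mM)": final_conc(5000, 0.5, rsb_total_ml),
    "MgCl2 (mM)": final_conc(1000, 0.75, rsb_total_ml),
}

# Buffers A and B: 5 ml final volume each
buffer_a = {
    "Triton X-100 (%)": final_conc(1.0, 0.5, 5.0),
    "DTT (mM)": final_conc(1000, 0.0025, 5.0),
    "Sucrose (M)": final_conc(1.0, 0.5, 5.0),
}
buffer_b = {
    "Triton X-100 (%)": final_conc(1.0, 0.5, 5.0),
    "DTT (mM)": final_conc(1000, 0.0025, 5.0),
    "Sucrose (M)": final_conc(1.0, 1.25, 5.0),
}

# Buffer C: 50 ml final volume
buffer_c = {
    "Sucrose (M)": final_conc(1.0, 16.7, 50.0),
    "MgCl2 (mM)": final_conc(1000, 0.25, 50.0),
    "Tris-HCl (mM)": final_conc(1000, 0.5, 50.0),
    "DTT (mM)": final_conc(1000, 0.025, 50.0),
}

for name, recipe in [("10x RSB", rsb_10x), ("Buffer A", buffer_a),
                     ("Buffer B", buffer_b), ("Buffer C", buffer_c)]:
    print(name, recipe)
```

The same helper can be reused when the detergent concentration is varied during optimization (see Note 1).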


2.2 Components for Nuclear RNA Isolation


1. Trizol LS Reagent: Life Technologies, 15596-026.
2. Chloroform: Sigma-Aldrich 472476.
3. 100% Isopropanol: Sigma-Aldrich I9516.
4. 75% Ethanol: Mix 37.5 ml 100% EtOH with 12.5 ml Sigma water.
5. DNase I, RNase-free: MBI Fermentas, EN0525, 1 U/μl.
6. Acid Phenol/Chloroform: Ambion, AM9720.
7. Ribolock: Thermo Fisher, E00382, 40 U/μl.

2.3 Required Equipment

1. Polypropylene round bottom tubes: 14.5 ml, Sarstedt, 62.515.006.
2. Dounce homogenizer: B-type (tight) 7 ml, VWR, 71000-518.
3. RNase Zap: for cleaning the Dounce homogenizers between each use, Life Technologies, AM9780.
4. Centrifuge: to pellet the nuclei the centrifuge should ideally be equipped with a swinging bucket rotor able to accommodate the 14.5 ml tubes.

3

Methods

3.1 Isolating Intact Nuclei and Testing the Purity of the Nuclear Fraction

In order to perform optimal nuclear RNA sequencing, it is critical to isolate pure intact nuclei.

3.1.1 Isolation of Intact Nuclei

Note: Before starting make fresh 10× and 1× RSB, 1 M sucrose and buffers A, B, and C. Place all buffers on ice. Chill centrifuge to 4 °C.
1. Pour an 8 ml sucrose cushion of Buffer C in the 14.5 ml round bottom tube for each sample and chill on ice.
2. Collect 1–5 × 10⁷ dissociated single cells in cold 1× PBS. Centrifuge cells at 120 × g for 4 min at 4 °C.
3. Resuspend single cells in 1 ml ice cold Buffer A (see Note 2).
4. Dounce gently with 10–12 strokes in a chilled B-type (tight) 7 ml Dounce homogenizer (see Note 3).
5. Dilute with equal volume, 1 ml, of ice cold Buffer B. Wash walls of Dounce as you add Buffer B (see Note 4). Pipette gently to mix the two buffers.
6. Pipette the 2 ml of Buffer A/B suspension gently onto the wall of the tube immediately above the sucrose cushion provided by Buffer C. There should be minimal mixing of the cell suspension with the sucrose cushion (Fig. 1).


Fig. 1 (a) This cartoon demonstrates that dounced material should be placed over the sucrose cushion by tilting the round bottom tube to an angle and adding the dounced material along the walls of the tube close to the interface with the sucrose cushion. (b) This demonstrates the position for cytosolic and nuclear fraction. Nuclei will be pelleted at the bottom of the tube, whereas the top 1 ml fraction will be the cytosolic fraction

7. Centrifuge at 300 × g for 5 min at 4 °C.
8. If the cytoplasmic fraction is to be retained, remove 1 ml of the cytosolic fraction from the top layer, avoiding the interface with the sucrose cushion, and place on ice in a fresh 14.5 ml tube. Pipette off and dispose of the mixed interface of remaining cytoplasm-contaminated solution and about half of the sucrose cushion. Pour off remaining sucrose cushion and resuspend the nuclear pellet in 1 ml ice cold 1× RSB.

3.1.2 Testing the Purity of Nuclear Fraction

1. Remove 50 μl each of the suspended nuclei and the cytoplasmic fraction, dilute each with 50 μl 2× RIPA buffer containing protease inhibitors, mix by pipetting and incubate on ice for 5 min. Centrifuge at 10,000 × g for 10 min at 4 °C.
2. Nuclear and cytoplasmic fractions can be tested using a standard western blotting protocol. After transfer, the membrane can be incubated with both rabbit polyclonal anti-Ubf1 antibody (nuclear protein) at a 1:1000 dilution and mouse monoclonal anti-Cyclophilin A antibody (cytoplasmic protein) at a 1:1000 dilution for 2 h at room temperature, washed, incubated with HRP-conjugated goat anti-rabbit and goat anti-mouse secondary antibodies (both at 1:1000 dilution), washed again, and developed using the Bio-Rad Clarity Western ECL kit (Fig. 2).

3.2 Trizol Extraction of Nuclear RNA

1. Working in the 14.5 ml round bottom tubes, add 3 ml Trizol LS to each 1 ml nuclear or cytoplasmic fraction and mix well by pipetting to homogenize the sample. If necessary, snap freeze on dry ice and store at −80 °C for later processing while checking the purity of the nuclear and cytoplasmic fractions.
2. When ready, thaw samples completely on ice.


Fig. 2 Immunoblot showing the purity of three different ES cell cytoplasmic (CE1, CE2, CE3) and nuclear extractions (NE1, NE2, NE3). Cyclophilin A (cytoplasmic protein loading control) and UBF-1 (nuclear protein loading control) were detected simultaneously in all fractions on one immunoblot. Only UBF-1 is detected in the nuclear fractions and only Cyclophilin A is detected in the cytoplasmic fraction indicating the fractions are pure

3. Incubate at room temperature for 5 min.
4. Add 0.8 ml chloroform, cap tightly and mix by inverting 10 times.
5. Incubate at room temperature for 2–15 min.
6. Centrifuge at 12,000 × g for 15 min at 4 °C.
7. Remove the upper aqueous phase to a new tube, avoiding the interphase, which contains genomic DNA.
8. Add 2 ml of 100% isopropanol to the aqueous phase. Cap tightly, mix and incubate for 10 min at room temperature.
9. Centrifuge at 12,000 × g for 10 min at 4 °C.
10. Remove the supernatant from the tube, leaving the RNA pellet.
11. Wash pellet with 4 ml of 75% EtOH.
12. Centrifuge at 7500 × g for 5 min at 4 °C.
13. Mark the location of the RNA pellet on the outside of the tube with an EtOH resistant marker and then carefully pour off the supernatant.
14. Centrifuge briefly at 7500 × g at 4 °C to collect any remaining 75% EtOH, which should be carefully removed with a pipette, avoiding touching the RNA pellet.


15. Air dry the pellet on ice for about 5 min but do not allow it to dry completely or the RNA will become less soluble.
16. Resuspend the pellet in 50 μl RNase-free Sigma water and transfer to a clear 1.5 ml tube. Nanodrop samples to determine yield and purity.
17. Working on ice, for each sample take 10 μg of RNA and add Sigma water to a final volume of 79 μl.
18. To each sample add: 10 μl 10× DNase I Buffer containing MgCl2, 1 μl Ribolock, 10 μl DNase I. Mix by flicking gently and give the tubes a quick spin.
19. Incubate for 30 min at 37 °C.
20. Add 100 μl acid phenol/chloroform (see Note 5). Mix by inverting 10 times and incubate at room temperature for 2 min.
21. Centrifuge at 12,000 × g for 15 min at 4 °C.
22. Remove the upper aqueous phase to a fresh tube, avoiding the interphase. Store this tube on ice during steps 23–25.
23. Add 150 μl Sigma water to the remaining acid phenol. Mix by inverting 10 times and incubate at room temperature for 2 min.
24. Centrifuge at 12,000 × g for 15 min at 4 °C.
25. Remove the upper aqueous phase, avoiding the interphase, and combine it with the first aqueous fraction.
26. Add 1/10 volume 3 M NaAc (pH 5.2) and 2.5 volumes of ice cold 100% ethanol to precipitate the RNA.
27. Centrifuge at 12,000 × g for 10 min at 4 °C.
28. Remove the supernatant from the tube, leaving the RNA pellet.
29. Wash pellet with 1 ml of 75% EtOH.
30. Centrifuge at 7500 × g for 5 min at 4 °C.
31. Mark the location of the RNA pellet on the outside of the tube with an EtOH resistant marker and then carefully pour off the supernatant.
32. Centrifuge briefly at 7500 × g at 4 °C to collect any remaining 75% EtOH, which should be carefully removed with a pipette, avoiding touching the RNA pellet.
33. Air dry the pellet on ice for about 5 min but do not allow it to dry completely or the RNA will become less soluble.
34. Resuspend the pellet in 20–50 μl RNase-free Sigma water. Nanodrop samples to determine yield and purity.
35. Run the sample on a Bioanalyzer to determine RNA integrity and quantity (see Note 6).
36. RT-qPCR may be used to determine the location of specific transcripts within the cell (see Note 7).
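The DNase I reaction in steps 17 and 18 of Subheading 3.2 is a fixed 100 μl reaction (79 μl RNA plus water, 10 μl buffer, 1 μl Ribolock, 10 μl DNase I), so the only sample-specific value is how much RNA solution carries 10 μg. The sketch below illustrates that calculation from a Nanodrop reading; the function name and the example concentration of 500 ng/μl are hypothetical and chosen only for illustration.

```python
def dnase_reaction(rna_conc_ng_per_ul, rna_mass_ug=10.0, rna_plus_water_ul=79.0):
    """Volumes (in microliters) for the 100 ul DNase I reaction of steps 17-18.

    rna_conc_ng_per_ul is the measured RNA concentration (e.g. from Nanodrop).
    """
    rna_ul = rna_mass_ug * 1000.0 / rna_conc_ng_per_ul
    if rna_ul > rna_plus_water_ul:
        # The RNA is too dilute for 10 ug to fit into 79 ul.
        raise ValueError("RNA concentration too low for this reaction volume")
    return {
        "RNA (ul)": round(rna_ul, 1),
        "Sigma water (ul)": round(rna_plus_water_ul - rna_ul, 1),
        "10x DNase I buffer (ul)": 10.0,
        "Ribolock (ul)": 1.0,
        "DNase I (ul)": 10.0,
    }


# Hypothetical sample measured at 500 ng/ul
mix = dnase_reaction(500.0)
print(mix)
print("Total volume (ul):", sum(mix.values()))
```

With these assumptions, a sample must be at least about 127 ng/μl (10,000 ng in 79 μl) for 10 μg to fit; more dilute samples would need to be concentrated or the reaction scaled, which the protocol does not describe.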


3.3 Generation of Illumina Paired End Library for Sequencing

In Mitchell et al. (2012), library preparation was performed according to the Illumina PE protocol, incorporating improvements suggested in Quail et al. (2008), with reactions scaled according to starting DNA quantity [7]. Other methods that preserve RNA strand information could also be used, for example the Illumina TruSeq Stranded Total RNA Sample Preparation approach.

4

Notes

1. Triton X-100 is a mild non-ionic detergent used to solubilize cytoplasmic membrane proteins while leaving the nuclear membrane intact; vary the concentration of detergent to optimize the protocol for different cell types. We routinely test 0.1%, 0.2%, and 0.5%, although some cells may require more or less detergent for optimal results.
2. To obtain a more concentrated cell suspension, a smaller volume of Buffers A and B can be used, for example 0.5 ml of each.
3. Gentle douncing is required to prevent bursting the nuclei. Nuclei can be observed using an inverted microscope; nuclei should be round and intact with few cytoplasmic shreds. The Dounce homogenizer should be cleaned between each use with RNase Zap (Life Technologies, AM9780) followed by Sigma water.
4. Use 14.5 ml polypropylene round bottom tubes (Sarstedt, 62.515.006) for sucrose cushions. Pipette solution gently and slowly onto the wall of the tube immediately above the sucrose cushion. Tilting the tube will reduce mixing of the cell fraction with the sucrose cushion (Fig. 1).
5. Acid phenol/chloroform is used to separate RNA from DNA. The RNA will partition into the upper aqueous phase while the DNA will remain at the interphase with the phenol solution. Avoid contaminating the RNA with excessive DNA by leaving the last 100 μl of the aqueous phase.
6. This step may be done for you by a sequencing facility prior to library construction.
7. Perform reverse transcription using random primers because many nuclear RNAs are not polyadenylated. For protein-coding genes containing intron sequences, introns will be preferentially found in the nuclear fraction as introns are co-transcriptionally spliced [8], whereas exon sequences will usually be equally partitioned between the nuclear and cytoplasmic fractions. Many lncRNAs are retained in the nucleus, for example Airn, and will be found almost exclusively in the nuclear fraction [4]. Primers that can be used to detect intron,

Table 1 Primers used to assess the purity of the nuclear and cytoplasmic RNA fractions

Gene and feature  Fraction             Forward primer            Reverse primer
Slc4a1 exon       Nuclear/cytoplasmic  AGTTGGGAGCTCAGCCAGT       GGTCCTTCGGGAAGTCCT
Slc4a1 intron     Nuclear              GATCTGAGGCCCAGCCAGAA      CCCCACCTCCTTCACCTTCC
Airn (lncRNA)     Nuclear              ACTTTGACAGAACAATCGGCTCAG  GAACATTTGCAAAGGACAGTCGAG
Malat1 (lncRNA)   Nuclear              CACTTGTGGGGAGACCTTGT      GTTACCAGCCCAAACCTCAA

Fig. 3 Relative quantities of each sequence in the nuclear (NucRNA) and cytoplasmic (CytoRNA) fractions. [Bar chart: y-axis, relative transcript level/ng RNA (0–100); bars for Slc4a1 intron, Air, Slc4a1 exon, and Malat1.]

exon, and non-coding transcripts in the mouse genome are provided in Table 1. Note that these primers work well for qPCR using iQ™ SYBR Green Supermix (Bio-Rad, 170-8880). Absolute quantification (using the standard curve method) per ng of RNA used in the reverse transcription reaction should be applied, because no reference gene is used to normalize expression between the nuclear and cytoplasmic fractions (Fig. 3).
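The standard curve method mentioned in Note 7 can be sketched as follows. This is an illustrative outline rather than the authors' analysis code: a dilution series of a template of known quantity is fitted as Cq = slope × log10(quantity) + intercept, sample Cq values are converted back to quantities, and each quantity is divided by the ng of RNA that went into the reverse transcription reaction. The standard quantities, Cq values, and the 50 ng RNA input below are made-up numbers, and all function names are hypothetical.

```python
import math


def fit_standard_curve(quantities, cqs):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_cq(cq, slope, intercept):
    """Invert the standard curve to get the template quantity for a sample Cq."""
    return 10 ** ((cq - intercept) / slope)


def quantity_per_ng_rna(cq, slope, intercept, ng_rna):
    """Absolute quantity normalized to the ng of RNA used for reverse transcription."""
    return quantity_from_cq(cq, slope, intercept) / ng_rna


# Hypothetical 10-fold dilution series (arbitrary units) and measured Cq values
standards = [1e5, 1e4, 1e3, 1e2]
standard_cqs = [15.00, 18.32, 21.64, 24.96]
slope, intercept = fit_standard_curve(standards, standard_cqs)

# Amplification efficiency implied by the slope (1.0 corresponds to 100%)
efficiency = 10 ** (-1.0 / slope) - 1.0

# Hypothetical nuclear-fraction sample: Cq 21.64 from 50 ng RNA
nuc_level = quantity_per_ng_rna(21.64, slope, intercept, ng_rna=50.0)
print(round(slope, 2), round(intercept, 2), round(efficiency, 3), round(nuc_level, 2))
```

Running the same calculation on the nuclear and cytoplasmic samples for each primer pair in Table 1 gives values that can be compared directly between fractions without a reference gene.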


Acknowledgments

This work was supported by the Canadian Institutes of Health Research, the Canada Foundation for Innovation, and the Ontario Ministry of Research and Innovation (operating and infrastructure grants held by JAM). Studentship funding held by NKD was provided by Ontario Graduate Scholarships.

References

1. Chu Y, Corey DR (2012) RNA sequencing: platform selection, experimental design, and data interpretation. Nucleic Acid Ther 22(4):271–274
2. Morozova O, Hirst M, Marra MA (2009) Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 10:135–151
3. The FANTOM Consortium and RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group) (2005) The transcriptional landscape of the mammalian genome. Science 309(5740):1559–1563
4. Mitchell JA, Clay I, Umlauf D, Chen CY, Moir CA, Eskiw CH, Schoenfelder S, Chakalova L, Nagano T, Fraser P (2012) Nuclear RNA sequencing of the mouse erythroid cell transcriptome. PLoS One 7(11):e49274
5. Dhaliwal NK, Miri K, Davidson S, Tamim El Jarkass H, Mitchell JA (2018) KLF4 nuclear export requires ERK activation and initiates exit from naive pluripotency. Stem Cell Rep 10(4):1308–1323
6. Dhaliwal NK, Abatti LE, Mitchell JA (2019) KLF4 protein stability regulated by interaction with pluripotency transcription factors overrides transcriptional control. Genes Dev 33(15–16):1069–1082
7. Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, Durbin R, Swerdlow H, Turner DJ (2008) A large genome center's improvements to the Illumina sequencing system. Nat Methods 5(12):1005–1010
8. Tilgner H, Knowles DG, Johnson R, Davis CA, Chakrabortty S, Djebali S, Curado J, Snyder M, Gingeras TR, Guigó R (2012) Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22(9):1616–1625

Chapter 9 Capturing Endogenous Long Noncoding RNAs and Their Binding Proteins Using Chromatin Isolation by RNA Purification

Jongchan Kim and Li Ma

Abstract Long noncoding RNAs (lncRNAs) exert their functions through binding to other RNA, genomic DNA, or proteins. The identification of proteins that associate with the lncRNA of interest sheds light on the molecular basis of its biological functions. This can be achieved by tagging the lncRNA with chemically modified ribonucleotides, or by using in vitro transcribed lncRNA to retrieve proteins from cell lysates. Alternatively, endogenous lncRNAs can be pulled down from cells or tissues with biotinylated antisense DNA oligonucleotides, which may overcome the potential pitfalls of using tagged lncRNAs, such as artifacts caused by tagging or non-physiological interactions. Here we describe the detailed protocol for chromatin isolation by RNA purification (ChIRP) from mammalian cell lines and mouse tissues, which captures endogenous lncRNAs and enables subsequent identification of their physiologically relevant binding partners.

Key words lncRNA, ChIRP, RNA-binding protein, Mass spectrometry

1

Introduction

Long noncoding RNAs (lncRNAs) are transcripts that are longer than 200 nucleotides and lack protein-coding potential. The reported modes of action of lncRNAs include regulation of gene transcription in cis or in trans through recruitment of proteins or molecular complexes to specific loci [1, 2], the scaffolding of nuclear structures [3, 4], and titration of RNA-binding proteins or microRNAs to trigger post-transcriptional regulation [5–8]. In most cases, lncRNAs associate with proteins to exert their biological functions, and thus identifying the interacting proteins of lncRNAs enables the understanding of the molecular mechanisms of action [9].

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_9, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


In recent years, various methods have been developed to capture lncRNA–protein interactions. For the identification of RNA-binding proteins (RBPs), a lncRNA of interest can be synthesized with modified ribonucleotide tags containing biotin, fluorescent dyes, or aptamers, and subjected to association with proteins in vitro by mixing the tagged synthetic lncRNA with cell lysates. Representative methods include tandem RNA affinity purification (TRAP) [10], RNA affinity in tandem (RAT) [11], RNA purification and identification (RaPID) [12], RiboTrap [13], and RNase-assisted RNA chromatography [14]. These in vitro methods are relatively easy and flexible but have disadvantages including high background, potential perturbation of RNA folding, the formation of non-physiological RNA–protein interactions, and the requirement for a large amount of starting material [15].

For capturing the in vivo interaction of a lncRNA with its binding proteins, either the tagged lncRNA is expressed in host cells by transfection, or the endogenous lncRNA is pulled down with biotinylated antisense DNA oligonucleotides. In both scenarios, the pulldown of the lncRNA of interest can be coupled with mass spectrometry (for protein identification) or with immunoblotting (for validation). Methods for capturing endogenous RNA–protein complexes include chromatin isolation by RNA purification (ChIRP) [16] and capture hybridization analysis of RNA targets (CHART) [17], while a tagged lncRNA can be transfected into cells and captured by using methods such as MS2 in vivo biotin tagged RNA affinity purification (MS2-BioTRAP) [18]. ChIRP and CHART are designed to identify the binding partners of lncRNAs at physiological levels, although they require more time, cost, and starting material than MS2-BioTRAP [15]. In this chapter, we describe the detailed protocol (Fig. 1) for the ChIRP method, which we have used to pull down an endogenous lncRNA from mammalian cell lines and mouse tumor tissues, followed by profiling of its physiological binding proteins [19].

2

Materials

2.1 Biotinylated DNA Probes

1. The sequences of DNA probes can be designed by using the ChIRP Probe Designer at https://www.biosearchtech.com. Probes are biotinylated at the 3′ end.
2. Probe mix (100 μM in total): mix multiple probes that target different regions of the lncRNA of interest to reach a total concentration of 100 μM. For example, if five probes targeting five different regions of the lncRNA of interest are mixed, then the final concentration of each probe should be 20 μM.

Fig. 1 Workflow of the ChIRP procedure using animal tissues. [Flowchart: tissue collection and snap-freezing in liquid nitrogen; tissue shattering with a tissue pulverizer or pestle and mortar; cross-linking of the pulverized tissue in 4% formaldehyde; homogenization and sonication; addition of biotinylated antisense DNA oligonucleotides; capture with streptavidin beads; RNA isolation followed by one-step RT-qPCR, and protein elution of RNA binding proteins followed by SDS-PAGE and mass spectrometry or immunoblotting.]

2.2 Buffers for ChIRP

1. Lysis buffer: 50 mM Tris-Cl (pH 7.0), 10 mM EDTA, and 1% SDS. Add 1 mM PMSF, protease inhibitor, phosphatase inhibitor, and RNase inhibitor (0.1 U/μl Superase-in, Thermo Fisher, #AM2694) fresh before use.
2. Hybridization buffer: 750 mM NaCl, 1% SDS, 50 mM Tris-Cl (pH 7.0), 1 mM EDTA, and 15% formamide. Heat briefly at 37 °C to completely dissolve. Add 1 mM PMSF, protease inhibitor, phosphatase inhibitor, and RNase inhibitor (0.1 U/μl Superase-in, Thermo Fisher, #AM2694) fresh before use.
3. Wash buffer: dilute 20× SSC buffer (Thermo Fisher, #AM9765) in nuclease-free water to 2× and add 0.5% SDS. Vortex and heat briefly at 37 °C. Add 1 mM PMSF fresh before use.
4. Proteinase K buffer: 100 mM NaCl, 10 mM Tris-Cl (pH 7.0), 1 mM EDTA, and 0.5% SDS. Add 1 mg/ml proteinase K (Thermo Fisher, #AM2546) fresh before use.
5. 4× LDS buffer: Thermo Fisher, #NP0007.
6. RIPA buffer: 50 mM Tris-Cl (pH 7.4), 150 mM NaCl, 0.25% deoxycholic acid, 1% NP-40, and 1 mM EDTA.

2.3 Reagents and Supplies

1. Liquid nitrogen.
2. Formaldehyde.
3. Glycine.
4. Magnetic streptavidin beads (Thermo Fisher, #65001).
5. Trizol.
6. Chloroform.
7. 2-propanol.
8. RNA isolation kit (Qiagen, #217004).
9. One-step RT-qPCR kit (Bio-Rad, #172-5150).

2.4 Instruments

1. Sample pulverizer and tissue bags.
2. Mortar and pestle.
3. Rotating hybridization oven.

3

Methods

3.1 Preparation of Tissue Samples

1. Snap-freeze ~300 mg animal tissues in liquid nitrogen and store them at −80 °C (for preparation of samples from cultured cell lines, see Note 1).
2. Grind the frozen tissues using a mortar and pestle (a) or break them using a sample pulverizer (b).
(a) With a mortar and pestle: place a piece of frozen tissue in the mortar cooled by dry ice and grind it with the pre-cooled pestle.
(b) With a sample pulverizer: process the frozen tissue in the tissue bag according to the manufacturer's instruction.

3.2 Cross-Linking of the lncRNA-Protein Complex

1. Cross-link the processed tissues in 4% formaldehyde (diluted in PBS) at room temperature for 30–60 min.
2. Quench the cross-linking reaction with 1/10 volume of 1.25 M glycine (0.125 M final concentration) at room temperature for 5–10 min.
3. Centrifuge the sample at 2000 × g at 4 °C for 5 min.
4. Aspirate the supernatant and wash the pellet with chilled PBS twice.
5. Aspirate the supernatant as completely as possible. Proceed to the next step or snap-freeze the pellet in liquid nitrogen and store at −80 °C.

3.3 Tissue Lysis

(Thaw the samples at room temperature if frozen)
1. Place the sample in a 50-ml conical tube and add lysis buffer (1 ml buffer per 100 mg tissue or cell pellet).
2. Break the tissue completely with a tissue homogenizer (see Note 2).
3. Sonicate the sample to make the tissue lysate with the following cycle: 30 s on and 30 s off at 4 °C for 30 min (or until the lysate is clear). The sonication step may take longer if tissue chunks are still visible (see Note 3).
4. Centrifuge at 16,000 × g at 4 °C for 30 min and transfer the supernatant to a 15-ml conical tube. Proceed to the next step or snap-freeze the lysate in liquid nitrogen and store at −80 °C.

3.4 Pre-clearing of the Tissue Lysate

(Thaw the tissue lysates at room temperature if frozen)
1. Pre-clear the lysate with lysis-buffer-washed magnetic streptavidin beads (30 μl beads per 1 ml lysate) at 37 °C for 30 min in a rotating hybridization oven. Then repeat this step once.
2. Transfer the supernatant to a new tube using the magnetic separator rack (see Note 4).
3. Take 1% of the pre-cleared lysate for RNA input and store it at −80 °C for RNA extraction later (as described in Subheading 3.8).

3.5 Hybridization of the Endogenous lncRNA in the Tissue Lysate with the Probes

1. Add 1 μl of the probe or probe mixture (100 μM stock) per 1 ml lysate, and add 2 ml hybridization buffer per 1 ml lysate. Incubate the sample at 37 °C overnight in a rotating hybridization oven. Include two negative controls (see Note 5): (1) probe-free conditions; (2) a probe for GFP or nuclear RNA U1.
2. Add lysis-buffer-washed streptavidin beads to the hybridization reaction (100 μl beads per 100 pmol probes). Mix well and incubate at 37 °C for 30–60 min in a rotating hybridization oven.

3.6 Wash the Protein-lncRNA-Probe-Streptavidin Bead Complex

1. Aspirate the supernatant using the magnetic separator rack, and wash the beads with 1 ml wash buffer five times (10 min each time) at 37 °C in a rotating hybridization oven.
2. After five washes, resuspend the beads in wash buffer. Transfer 1/10 volume to a new tube for RNA isolation (as described in Subheading 3.8) and use the rest (9/10 volume) for protein elution (as described in Subheading 3.7).

3.7 Protein Elution for Mass Spectrometry or Immunoblotting

1. Aspirate the wash buffer using the magnetic separator rack.
2. Add 35 μl of 2× LDS buffer (diluted in RIPA buffer) to the tube, mix well, and boil for 30 min. Vortex for 20 s. Then store at −80 °C or proceed to the next step.
3. Place the tube on the magnetic separator rack and transfer the supernatant for SDS-PAGE.
4. Process the gel for mass spectrometry or immunoblotting.
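The reagent amounts in Subheadings 2.1, 3.4, and 3.5 all scale with the number of probes and the lysate volume, so they can be tabulated before starting. The sketch below is an illustrative calculation only; the function name `chirp_setup` and the example of 3 ml lysate with five probes are assumptions. It uses the ratios given in the protocol (30 μl pre-clearing beads per ml lysate per round, 1% RNA input, 1 μl of 100 μM probe mix and 2 ml hybridization buffer per ml lysate, 100 μl capture beads per 100 pmol probe) and, for simplicity, ignores the small volume removed as RNA input.

```python
def chirp_setup(lysate_ml, n_probes, probe_mix_uM=100.0):
    """Scale ChIRP reagents to the lysate volume, using the per-ml ratios in the text."""
    probe_ul = 1.0 * lysate_ml                # 1 ul probe mix per ml lysate
    probe_pmol = probe_ul * probe_mix_uM      # ul x uM (umol/l) = pmol
    return {
        "each probe in mix (uM)": probe_mix_uM / n_probes,
        "pre-clearing beads per round (ul)": 30.0 * lysate_ml,
        "RNA input, 1% of lysate (ul)": 0.01 * lysate_ml * 1000.0,
        "probe mix (ul)": probe_ul,
        "probe amount (pmol)": probe_pmol,
        "hybridization buffer (ml)": 2.0 * lysate_ml,
        "capture beads (ul)": probe_pmol,     # 100 ul beads per 100 pmol probe
    }


# Hypothetical experiment: 3 ml lysate (about 300 mg tissue), five probes
plan = chirp_setup(lysate_ml=3.0, n_probes=5)
for item, amount in plan.items():
    print(f"{item}: {amount:g}")
```

Because 1 μl of a 100 μM stock contains 100 pmol of probe, the capture bead volume in μl equals the probe amount in pmol under these ratios.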

3.8 RNA Isolation for Real-Time Quantitative PCR

(Thaw the samples for RNA input from Subheading 3.4 and the samples for real-time quantitative PCR from Subheading 3.6)
1. Add proteinase K buffer (pH 7.0) containing 1 mg/ml proteinase K to the RNA input (total 100 μl) and the streptavidin-bound sample (total 200 μl).
2. Incubate at 50 °C for 45 min in a rotating hybridization oven.
3. Briefly spin down the sample and boil it at 95 °C for 10 min.
4. Add 500 μl Trizol and vortex for 10 s. Then store at −80 °C or immediately proceed to the next step for RNA isolation.
5. Add 300 μl chloroform to the sample and vortex for ~20 s. Incubate at room temperature for 10 min.
6. Centrifuge at 12,000 × g at 4 °C for 15 min. Transfer the top, aqueous phase to a new tube. Add 1 volume of 2-propanol and mix thoroughly.
7. Load the mixture onto the column from the miRNeasy kit (Qiagen, #217004) and follow the RNA isolation instruction provided by the manufacturer.

3.9 Real-Time Quantitative PCR to Validate RNA Pulldown

Perform real-time quantitative PCR using the one-step RT-qPCR kit (Bio-Rad, #172-5150). 2 μl eluted RNA from each group is amplified (see Note 6).
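One common way to summarize this validation, not spelled out in the protocol, is to express the lncRNA recovered in the pulldown as a percentage of the input. The sketch below assumes that the input represents 1% of the pre-cleared lysate (Subheading 3.4), that the RNA sample represents 1/10 of the beads (Subheading 3.6), that input and pulldown RNA are eluted in equal volumes with equal volumes amplified, and that amplification efficiency is 100%. The function name and Cq values are hypothetical.

```python
def percent_of_input(cq_input, cq_pulldown, input_fraction=0.01,
                     pulldown_fraction=0.10, efficiency=1.0):
    """Percentage of the input lncRNA retrieved by the pulldown.

    The Cq difference gives the fold difference between the two samples;
    the fractions correct for how much of the lysate each sample represents.
    """
    fold = (1.0 + efficiency) ** (cq_input - cq_pulldown)
    return 100.0 * fold * input_fraction / pulldown_fraction


# Hypothetical Cq values for the target lncRNA and for a probe-free control
target = percent_of_input(cq_input=25.0, cq_pulldown=24.0)
no_probe = percent_of_input(cq_input=25.0, cq_pulldown=32.0)
print(f"target probes: {target:.2f}% of input")
print(f"probe-free control: {no_probe:.3f}% of input")
```

Comparing the value obtained with the target probes against the negative controls (see Note 5), and against an unrelated transcript, indicates whether the enrichment is specific.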

4

Notes

1. Cultured adherent mammalian cells are trypsinized, spun down, and processed as follows: (1) add PBS to resuspend and wash the cell pellet; (2) centrifuge and aspirate the supernatant; (3) cross-link the cells in 4% formaldehyde as described in Subheading 3.2. Sub-confluent mammalian cells from five to six 15-cm tissue culture plates typically yield ~300 mg, although this should be determined by the investigator.
2. This step is not necessary if cultured cells are used.
3. This step should be shorter if cultured cells are used.
4. It is always better to briefly spin down the tube before placing it on the magnetic separator rack.
5. If available, lncRNA-knockout cell lines or tissues from lncRNA-knockout mice can serve as an additional negative control. Alternatively, a control group treated with RNase A can be included as follows: (1) add RNase A (10 μg/ml) to the lysate and incubate the sample at 37 °C for 30 min; (2) add hybridization buffer and the probe or probe mixture.
6. One-step RT-qPCR is preferred because the amount of purified lncRNA may not be enough for both RNA concentration measurement and cDNA synthesis.

Acknowledgments

L.M. is supported by the US National Cancer Institute grants R01CA166051 and R01CA181029 and a Cancer Prevention and Research Institute of Texas grant RP190029. J.K. is supported by the Sogang University Research Grant of 2019 (grant number: 201910004.01) and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT; grant number: 2019R1F1A1060705 and 2020R1F1A1065643).

References

1. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA, Goodnough LH, Helms JA, Farnham PJ, Segal E, Chang HY (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129(7):1311–1323. https://doi.org/10.1016/j.cell.2007.05.022
2. da Rocha ST, Heard E (2017) Novel players in X inactivation: insights into Xist-mediated gene silencing and chromosome conformation. Nat Struct Mol Biol 24(3):197–204. https://doi.org/10.1038/nsmb.3370
3. Miyagawa R, Tano K, Mizuno R, Nakamura Y, Ijiri K, Rakwal R, Shibato J, Masuo Y, Mayeda A, Hirose T, Akimitsu N (2012) Identification of cis- and trans-acting factors involved in the localization of MALAT-1 noncoding RNA to nuclear speckles. RNA 18(4):738–751. https://doi.org/10.1261/rna.028639.111
4. Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB, Chess A (2007) A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics 8:39. https://doi.org/10.1186/1471-2164-8-39
5. Thomson DW, Dinger ME (2016) Endogenous microRNA sponges: evidence and controversy. Nat Rev Genet 17(5):272–283. https://doi.org/10.1038/nrg.2016.20
6. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M, Loewer A, Ziebold U, Landthaler M, Kocks C, le Noble F, Rajewsky N (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441):333–338. https://doi.org/10.1038/nature11928
7. Lee S, Kopp F, Chang TC, Sataluri A, Chen B, Sivakumar S, Yu H, Xie Y, Mendell JT (2016) Noncoding RNA NORAD regulates genomic stability by sequestering PUMILIO proteins. Cell 164(1–2):69–80. https://doi.org/10.1016/j.cell.2015.12.017
8. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495(7441):384–388. https://doi.org/10.1038/nature11993
9. Zhang H, Liang Y, Han S, Peng C, Li Y (2019) Long noncoding RNA and protein interactions: from experimental results to computational models based on network methods. Int J Mol Sci 20(6):1284. https://doi.org/10.3390/ijms20061284
10. Krause H (2006) Trap-tagging: a novel method for the identification and purification of RNA-protein complexes. US Patent App 10/531,095
11. Hogg JR, Collins K (2007) RNA-based affinity purification reveals 7SK RNPs with distinct composition and regulation. RNA 13(6):868–880. https://doi.org/10.1261/rna.565207
12. Slobodin B, Gerst JE (2011) RaPID: an aptamer-based mRNA affinity purification technique for the identification of RNA and protein factors present in ribonucleoprotein complexes. Methods Mol Biol 714:387–406. https://doi.org/10.1007/978-1-61779-005-8_24
13. Beach DL, Keene JD (2008) Ribotrap: targeted purification of RNA-specific RNPs from cell lysates through immunoaffinity precipitation to identify regulatory proteins and RNAs. Methods Mol Biol 419:69–91. https://doi.org/10.1007/978-1-59745-033-1_5
14. Michlewski G, Caceres JF (2010) RNase-assisted RNA chromatography. RNA 16(8):1673–1678. https://doi.org/10.1261/rna.2136010
15. Marchese D, de Groot NS, Lorenzo Gotor N, Livi CM, Tartaglia GG (2016) Advances in the characterization of RNA-binding proteins. Wiley Interdiscip Rev RNA 7(6):793–810. https://doi.org/10.1002/wrna.1378
16. Chu C, Quinn J, Chang HY (2012) Chromatin isolation by RNA purification (ChIRP). J Vis Exp (61):3912. https://doi.org/10.3791/3912
17. Simon MD (2013) Capture hybridization analysis of RNA targets (CHART). Curr Protoc Mol Biol Chapter 21:Unit 21.25. https://doi.org/10.1002/0471142727.mb2125s101
18. Tsai BP, Wang X, Huang L, Waterman ML (2011) Quantitative profiling of in vivo-assembled RNA-protein complexes using a novel integrated proteomic approach. Mol Cell Proteomics 10(4):M110.007385. https://doi.org/10.1074/mcp.M110.007385
19. Kim J, Piao HL, Kim BJ, Yao F, Han Z, Wang Y, Xiao Z, Siverly AN, Lawhon SE, Ton BN, Lee H, Zhou Z, Gan B, Nakagawa S, Ellis MJ, Liang H, Hung MC, You MJ, Sun Y, Ma L (2018) Long noncoding RNA MALAT1 suppresses breast cancer metastasis. Nat Genet 50(12):1705–1715. https://doi.org/10.1038/s41588-018-0252-3

Chapter 10

Purification and Structural Characterization of the Long Noncoding RNAs

Allison Yankey, Sean C. Clark, Michael C. Owens, and Srinivas Somarowthu

Abstract

Long noncoding RNAs (lncRNAs) are now accepted as key players in diverse cellular functions, yet the structure–function relationships of these novel RNAs remain mostly unknown. Homogeneous purification of lncRNAs is a necessary first step for downstream structural studies. The large size of lncRNAs (often more than 1 kb) presents many unique challenges during the purification process. Here, we detail the purification of lncRNAs, including strategies to identify proper folding conditions of the target lncRNA. Next, we discuss two recently developed RNA structure probing techniques, SHAPE-MaP (SHAPE probing followed by mutational profiling) and DMS-MaP (DMS probing followed by mutational profiling). These techniques couple traditional RNA chemical probing methods with next-generation sequencing and allow high-throughput determination of RNA structures. Using the datasets resulting from these orthogonal probing experiments, we lay out the steps to determine and validate the secondary structure of the target lncRNA. Overall, this chapter details an adaptable protocol that can lead to a better understanding of the structure–function relationships of lncRNAs.

Key words Native purification, lncRNA structure, Chemical probing, DMS (dimethyl sulfate), SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension), 1-Methyl-7-nitroisatoic anhydride (1M7), Mutational profiling

1 Introduction

lncRNAs have recently emerged as multifaceted biomolecules with roles in a variety of cellular processes [1]. For example, the lncRNA X-inactive specific transcript (XIST) is involved in X-chromosome inactivation, and the lncRNA Fendrr is involved in mouse fetal heart development [2, 3]. Despite their critical role in cell biology, only a small fraction of lncRNAs have been functionally characterized. Currently, there is a need for more functional studies to illuminate the molecular mechanisms of lncRNAs [4, 5].

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_10, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


lncRNAs achieve their function by acting as scaffolds for regulatory proteins or by interacting with DNA and RNAs [6]. Emerging studies show that lncRNAs are highly structured and form independent domains [7–10]. For example, the lncRNA HOTAIR possesses four domains, two of which have been shown to be involved in recruiting chromatin remodeling complexes [8, 11]. The lncRNA Braveheart directs cardiomyocyte lineage via a specific interaction between a guanine-rich motif and its protein partner CNBP [12]. Identifying such structural and functional motifs in lncRNAs will eventually allow us to develop novel lncRNA-based therapies. Some lncRNAs have already been shown to be potential drug targets, opening up new therapeutic avenues via lncRNA structure targeting [13]. A recent study characterized the structure of the lncRNA SLNCR1 and identified its protein binding motifs [14]. Further, targeting these motifs with oligonucleotides was shown to block the interaction with protein partners and inhibit melanoma invasion [14].

Analyzing lncRNA structures requires both homogeneous purification of the lncRNA and careful attention to its folded state [15, 16]. In vitro purification is an appealing way to prepare lncRNAs for downstream structural probing techniques, but the large size of lncRNAs (0.2 kb–100 kb) presents several challenges. Here, we describe a non-denaturing method that preserves the lncRNA structure throughout the purification protocol [16]. The folding condition could be unique for each lncRNA. Therefore, we also detail the protocol for establishing folding conditions using size exclusion chromatography.

After purification, the folded lncRNA can be probed using a variety of chemical or enzymatic methods [17]. While different chemical probing reagents have traditionally been employed independently, recent studies have shown the robustness of employing multiple orthogonal probes [7, 8]. DMS (dimethyl sulfate) methylates single-stranded adenine and cytosine nucleotides on the Watson–Crick base pairing face [18]. SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Primer Extension) reagents modify the backbone of all nucleotides when they are not base-paired [19]. Here we describe protocols that combine these chemical probing methods with next-generation sequencing and allow high-throughput determination of RNA structures [20, 21]. Using the structural datasets from both DMS and SHAPE probing experiments, we outline a method for robust determination of lncRNA structures [20, 21].

2 Materials

All buffers and stock solutions must be made with RNase-free water. For general applications, we use ultrapure water purified by Elga PURELAB flex (Veolia Water Solutions). For sensitive applications such as DNA library preparation for next-generation sequencing, we prepare solutions using HyClone HyPure water (Cytiva #SH30538). We filter all the solutions using Nalgene filter units or syringe filters. Whenever possible, purchase high-purity reagents.

2.1 Transcription

1. T7 RNA polymerase: We use an in-house prepared T7 RNA polymerase (a gift from the Pyle laboratory, Yale University), typically at concentrations of ~2 mg/mL and stored at −80 °C. Expression and purification of the T7 RNA polymerase are performed as described before [16].
2. Magnesium chloride (99.995% pure): Make 1 M stocks and store at −20 °C. We use high-purity magnesium chloride from Sigma (Sigma #255777) because trace amounts of transition metals may lead to RNA degradation.
3. rNTPs: ATP (Cytiva #27100603), CTP (Cytiva #27120004), GTP (Cytiva #27200004), UTP (Alfa Aesar #J63427-MD): Prepare 30 mM solutions of NTPs and adjust the pH to 7.0 for long-term storage at −20 °C. Alternatively, premade solutions can be purchased.
4. DTT (Fisher BioReagents #BP172–5): Make 1 M stock and store at −20 °C.
5. Transcription Buffer (10×): 400 mM Tris–HCl pH 8.0, 20 mM spermidine, 100 mM NaCl, 120 mM MgCl2, 0.1% Triton X-100. Make 1 mL aliquots and store at −20 °C.
6. DNA plasmid: Clone the target lncRNA into a suitable vector such that it is flanked by the T7 promoter sequence on the 5′ end and by a unique restriction site on the 3′ end.
7. Restriction enzyme: The restriction site must be unique and cut efficiently. The example presented here uses BamHI-HF (New England Biolabs #R3136T).
8. RNase Inhibitor: RNaseOUT (Invitrogen #10777019).

2.2 Native Purification

1. DNase and Buffer: TURBO DNA-free™ Kit (Invitrogen™ #AM1907).
2. Proteinase K (Sigma Aldrich #3115828001): Make 30 mg/mL stocks in a buffer containing 10 mM Tris–HCl pH 7.5, 1 mM CaCl2, and 40% glycerol, and store at −80 °C.
3. Amicon Filtration Units (MilliporeSigma, #UFC510096 #UFC5050BK): 500 μL volume; the size cutoff used depends on the molecular weight of the target lncRNA.
4. Folding Buffers: The components of the folding buffer depend on which structure probes are being used and which magnesium concentration is determined to be best for the lncRNA folding.
   (a) SHAPE Folding Buffer: 25 mM HEPES pH 7.4, 150 mM KCl, 1 mM EDTA, various concentrations of MgCl2 (see Note 1).
   (b) DMS Folding Buffer: 25 mM Cacodylic Acid pH 7.0, 150 mM KCl, 0.1 mM EDTA, various concentrations of MgCl2 (see Note 1).
5. FPLC: This protocol requires a fast protein liquid chromatography system, ideally equipped with a UV absorbance detector at wavelength 260 nm (see Note 2). We use the ÄKTA Pure FPLC system (GE Healthcare), dedicated to RNA purifications and operated at room temperature.
6. Chromatography columns and resins:
   (a) Superdex 200: We recommend Superdex 200 for lncRNAs in the size range of 200–700 nts. A prepacked column of Superdex 200 is available from Cytiva #28990944.
   (b) Sephacryl resins: Sephacryl S-400 HR (Cytiva #17060901), Sephacryl S-500 HR (Cytiva #17061301), Sephacryl S-1000 HR (Cytiva #17047601). For lncRNAs of size >700 nts, we use Sephacryl resin (see Note 3). The bead size of the Sephacryl resin will vary based on the size of the lncRNA. We use Tricorn 10/300 empty columns (Cytiva #28406418) to pack Sephacryl resins at room temperature following the manufacturer's instructions.

2.3 Structure Probing and Library Preparation

1. SHAPE reagents: 1M7 (AstaTech #F51360). Prepare a fresh stock (100 mM in DMSO) on the day of the experiment (see Note 4).
2. DMS (Sigma #D186309): DMS is highly toxic; therefore, all steps involving DMS must be carried out in a fume hood. Prepare a fresh stock in 100% ethanol on the day of the experiment. The final concentration of DMS might vary by lncRNA. Typically, we titrate in the range of 1 to 15 mM final DMS concentration.
3. SuperScript™ II Reverse Transcriptase (Invitrogen #18064014) (see Note 5). This specific reverse transcriptase is essential for SHAPE-MaP probing. In the presence of manganese, SuperScript™ II incorporates a noncomplementary nucleotide into the cDNA at the sites of SHAPE reagent-modified nucleotides [22].
4. TGIRT™-III enzyme (InGex #TGIRT50). This enzyme is specific for DMS-MaP probing.
5. 1 M Manganese chloride solution (Fisher Bioreagents #BP541–100).
6. dNTPs: 10 mM mix (Invitrogen #18427013).
7. 5× pre-SHAPE-MaP Buffer: 250 mM Tris (pH 8.0), 375 mM KCl, and 50 mM DTT. Make 1 mL aliquots and store at −20 °C.
8. 5× pre-DMS-MaP Buffer: 250 mM Tris–HCl (pH 8.3), 375 mM KCl, 15 mM MgCl2. Alternatively, use 5× first-strand synthesis buffer (Invitrogen, SuperScript II reverse transcriptase buffer). Make 1 mL aliquots and store at −20 °C.
9. Q5 Hot Start High-Fidelity DNA Polymerase (New England Biolabs #M0493S).
10. RNA cleanup kit: RNA Clean & Concentrator (Zymo #R1013).
11. Primers for Reverse Transcription: The number of reverse transcription primers necessary depends on the length of the lncRNA and the processivity of the reverse transcriptase. SuperScript II is processive to the extent of ~700 nts. By tiling the lncRNA with primer sets every 700 nts, the cDNA created represents the entire lncRNA with no gaps. Prepare reverse transcription primers in water to a final concentration of 2 μM and store at −20 °C.
12. PCR Primers: Prepare 10 μM forward and reverse primers in water and store at −20 °C.
13. PCR cleanup kit: DNA Clean & Concentrator (Zymo #D4013).
14. Library Preparation Kit: Nextera XT DNA Library Prep kit (Illumina #FC-131-1024) and Nextera Index primers (Illumina #FC-131-1002).
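Item 11 describes tiling the lncRNA with reverse transcription primers roughly every 700 nts. A minimal sketch of how the tile boundaries might be computed is given below. The function name `tile_amplicons`, the 50-nt overlap between neighboring tiles, and the 1-based coordinates are illustrative assumptions rather than part of the protocol.

```python
def tile_amplicons(rna_length, max_tile=700, overlap=50):
    """Return 1-based inclusive (start, end) windows that cover the RNA.

    max_tile reflects the ~700-nt processivity quoted for SuperScript II.
    overlap is a hypothetical margin so that neighboring tiles share a few
    nucleotides and no position is left uncovered.
    """
    if rna_length <= 0:
        raise ValueError("rna_length must be positive")
    if not 0 <= overlap < max_tile:
        raise ValueError("overlap must be non-negative and smaller than max_tile")

    tiles = []
    start = 1
    while True:
        end = min(start + max_tile - 1, rna_length)
        tiles.append((start, end))
        if end == rna_length:
            break
        # Step forward, keeping `overlap` nucleotides shared with the previous tile.
        start = end - overlap + 1
    return tiles


if __name__ == "__main__":
    for start, end in tile_amplicons(1500):
        print(f"tile {start}-{end} ({end - start + 1} nt)")
```

In such a layout, the reverse transcription primer for each tile would anneal at the tile's 3′ boundary (`end`) and the forward PCR primer near `start`; actual primer positions still need to be chosen by sequence.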

3 Methods

Maintain an RNase-free environment and carry out all experiments using RNase-free glassware and plasticware. An overview of our purification protocol is shown in Fig. 1a.

3.1 Transcription

1. Prepare the DNA template by digesting the lncRNA plasmid using an appropriate restriction enzyme and following the manufacturer's instructions. Purify the linearized plasmid using a DNA clean and concentrator kit (Zymo #D4013). The purified DNA template can be stored at −20 °C for a week.
2. Thaw the transcription reagents on ice, including 10× transcription buffer, NTPs, 1 M DTT, and the DNA template.
3. Add 20 μL of 10× Transcription buffer, 2 μL of 1 M DTT, 25 μL of each 30 mM NTP, 2 μL of RNase Inhibitor, 5 μg of DNA template, 10 μL of 2 mg/mL T7 RNA polymerase, and water to a final volume of 200 μL.


Fig. 1 Purification and folding of lncRNAs. (a) Overview of our native purification protocol. (b) and (c) lncTCF7 purified with S200 and S400 Sephacryl columns, respectively. (This figure is reproduced from [9])

4. Incubate at 37 °C for 2–3 h. While a longer incubation might increase the RNA yield, we do not recommend incubating for longer than 4 h.

3.2 Native Purification

1. During transcription, thaw TURBO™ DNase buffer on ice.
2. After transcription, digest the DNA template by adding 20 μL of 10× TURBO™ DNase buffer and 2 μL of TURBO™ DNase (2 Units/μL) to the transcription reaction and incubate at 37 °C for 30 min.
3. After DNA template digestion, digest proteins by adding 2 μL of 30 mg/mL proteinase K to the reaction mix and incubate at 37 °C for 30 min.
4. During the proteinase K digestion step, prepare the Amicon filtration unit. Add the folding buffer and centrifuge for 5 min at 4000 × g to equilibrate the filter membrane with folding buffer.

3.3 Filtration and Buffer Exchange

1. Prior to filtration, prewarm 5 mL of the appropriate folding buffer at 37 °C to maintain a consistent temperature throughout transcription and filtration (see Note 6).
2. Add the proteinase K-digested sample to the Amicon filtration unit.
3. Fill the remaining volume of the Amicon unit with the prewarmed folding buffer and centrifuge at 4000 × g for 2–5 min (see Note 7). Repeat this step 3–5 times.
4. Transfer the sample from the filter unit into a clean microcentrifuge tube and dilute to 500 μL total volume.
5. Incubate the sample at 37 °C for 30 min to equilibrate and allow folding to occur.

3.4 Size Exclusion Chromatography

Unless noted otherwise, all size exclusion chromatography steps are performed at room temperature. Example chromatograms showing the native purification of the lncRNA lncTCF7 using both Sephacryl S400 and Superdex 200 columns are shown in Fig. 1b, c.

1. Run and maintain the FPLC as recommended by the manufacturer. We recommend regular RNase decontamination by washing the system and columns with 5 column volumes of 10% RNaseZap (Life Technologies #AM9784) in ultrapure water.
2. Prior to RNA purification, equilibrate the column with 3 column volumes of the appropriate folding buffer.
3. Load the RNA sample and run the appropriate SEC method as recommended by the manufacturer. Collect 0.5 mL fractions.
4. Pool the eluted fractions with high absorbance and incubate at 37 °C for 30 min before initiating the probing experiments.
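Step 4 pools the fractions with high absorbance. One way to make that choice reproducible is to take the contiguous run of fractions around the A260 maximum that stay above a chosen fraction of the peak height. The sketch below assumes the per-fraction A260 values have been exported as a plain list; the function name and the 50% cutoff are illustrative assumptions and should be adjusted after inspecting the chromatogram.

```python
def fractions_to_pool(a260, cutoff=0.5):
    """Return 0-based indices of the contiguous fractions around the peak
    whose absorbance is at least cutoff * peak height.

    a260   : list of per-fraction absorbance readings at 260 nm
    cutoff : hypothetical fraction of the peak height used as threshold
    """
    if not a260:
        raise ValueError("no fractions supplied")

    peak = max(range(len(a260)), key=lambda i: a260[i])
    threshold = cutoff * a260[peak]

    # Walk outward from the peak while fractions stay above the threshold.
    left = peak
    while left > 0 and a260[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(a260) - 1 and a260[right + 1] >= threshold:
        right += 1
    return list(range(left, right + 1))


if __name__ == "__main__":
    readings = [2, 5, 40, 90, 100, 60, 10, 55]
    pooled = fractions_to_pool(readings)
    # With 0.5 mL fractions, the pooled volume is 0.5 mL per selected fraction.
    print(pooled, f"{0.5 * len(pooled):.1f} mL")
```

Restricting the selection to a contiguous window keeps a separate later-eluting peak (the last reading in the example) out of the pool.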

3.5 Magnesium Titration for Determining the lncRNA Folding Conditions

To study the magnesium dependence of folding for the target lncRNA and to determine KMG accurately, sedimentation velocity analytical ultracentrifugation (SV-AUC) should be employed as previously described [16]. However, given the instrument cost and the sensitive nature of lncRNAs, SV-AUC may not be easily accessible. Here, we describe an alternative method using analytical SEC to determine the folding conditions of the target lncRNA.

1. Perform a series of purification and SEC steps as outlined above while varying the Mg2+ ion concentration in the folding buffer.
2. Monitor the elution peak while varying the Mg2+ concentration. An example of titration curves is shown in Fig. 2a. Typically, with an increase in Mg2+ concentration, we notice a decrease in absorbance and a rightward shift in the RNA elution volume, both indicating RNA folding and compaction.
3. Identify the Mg2+ concentration beyond which there is no further shift in the elution peak and at which no aggregation peak appears. Choose at least two of the higher Mg2+ concentrations (at least twofold apart, such as 12 mM and 25 mM) and probe the lncRNA structure in both conditions following the SHAPE probing protocol below. By probing at two concentrations, we ensure that a saturating amount of Mg2+ is present and that the secondary structure is not further affected by adding more Mg2+.

Fig. 2 (a) SEC profiles of lncTCF7 purified at various magnesium ion concentrations; lncRNAs can be purified and folded to homogeneity over a variety of magnesium ion concentrations. (b) SHAPE reactivity correlation between lncTCF7 purified at 12 and 25 mM Mg2+. The high correlation between SHAPE reactivities at 12 mM and 25 mM Mg2+ indicates that there are no significant secondary structural changes beyond 12 mM. (This figure is reproduced from [9])

3.6 Secondary Structure Probing

3.6.1 SHAPE Probing and Reverse Transcription

1. Prepare a fresh stock of 100 mM 1M7 SHAPE reagent in DMSO.
2. Split the pooled sample into two 900 μL aliquots. One will be the (+) 1M7 sample and the other will be the (−) control.
3. Add 100 μL of 100 mM 1M7 to the (+) sample and 100 μL of DMSO to the (−) sample.
4. Incubate samples at 37 °C for 10 min.
5. Purify the probed RNA using an RNA cleanup kit and measure the concentration.
6. Split 2 picomoles of the (+) and (−) RNA samples into separate tubes based on the total number of reverse transcription primer pairs.
7. Perform an annealing reaction for each reverse transcription reaction by adding 1 μL of 2 μM reverse primer to 2 picomoles of each RNA sample and diluting with water to a final volume of 11 μL. Incubate at 65 °C for 5 min, then immediately cool on ice.
8. Freshly prepare 100 μL of SHAPE-MaP RT master mix: 50 μL of 5× pre-SHAPE-MaP buffer, 1.5 μL of 1 M MnCl2, 12.5 μL of 10 mM dNTPs, and 36 μL of water. This master mix is sufficient for 12 reactions; scale up or down as necessary.
9. Add 8 μL of SHAPE-MaP RT master mix to each sample and mix thoroughly.
10. Add 1 μL of SuperScript™ II enzyme and incubate at 42 °C for 3 h.
11. Inactivate the RT reaction by incubating the samples at 70 °C for 15 min.
12. Purify cDNA samples using G-25 columns.
13. cDNA can be stored at −20 °C for up to 6 months.

3.6.2 DMS-MaP Probing and Reverse Transcription

1. Prepare a fresh stock of 100 mM DMS reagent in 100% ethanol.
2. Split the pooled sample into two 900 μL aliquots: one for the (+) DMS sample and the second for the (−) control.
3. Add 100 μL of 100 mM DMS to the (+) sample and 100 μL of 100% ethanol to the (−) sample.
4. Incubate samples at 37 °C for 10 min.
5. Purify the probed RNA using an RNA cleanup kit and measure the concentration.
6. Split 2 picomoles of the (+) and (−) RNA samples into separate tubes based on the total number of reverse transcription primer pairs.
7. Perform an annealing reaction for each reverse transcription reaction by adding 1 μL of 2 μM reverse primer to 2 picomoles of each RNA sample and diluting with water to a final volume of 11 μL. Incubate at 65 °C for 5 min, then immediately cool on ice.
8. Freshly prepare 100 μL of DMS-MaP RT master mix: 50 μL of 5× pre-DMS-MaP buffer, 12.5 μL of 10 mM dNTPs, 12.5 μL of 0.1 M DTT, 6.25 μL of RNaseOUT, and 18.75 μL of water. This master mix is sufficient for 12 reactions; scale up or down as necessary.
9. Add 8 μL of the DMS-MaP RT master mix to each sample and mix thoroughly.
10. Add 1 μL of TGIRT-III enzyme and incubate at 57 °C for 2 h.
11. Purify cDNA samples using G-25 columns.
12. cDNA can be stored at −20 °C for up to 6 months.
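Both RT master mixes are specified as 100 μL batches sufficient for 12 reactions, to be scaled as necessary. The sketch below shows one way to do that scaling in Python. The component volumes are copied from step 8 of Subheadings 3.6.1 and 3.6.2; the dictionary and function names are illustrative assumptions.

```python
# Component volumes (in microliters) for one 100 uL batch, as listed in step 8.
SHAPE_MAP_MIX_UL = {
    "5x pre-SHAPE-MaP buffer": 50.0,
    "1 M MnCl2": 1.5,
    "10 mM dNTPs": 12.5,
    "water": 36.0,
}
DMS_MAP_MIX_UL = {
    "5x pre-DMS-MaP buffer": 50.0,
    "10 mM dNTPs": 12.5,
    "0.1 M DTT": 12.5,
    "RNaseOUT": 6.25,
    "water": 18.75,
}


def scale_master_mix(recipe_ul, n_reactions, reactions_per_batch=12):
    """Scale a 12-reaction batch recipe to n_reactions.

    The protocol states that 100 uL serves 12 reactions at 8 uL each, so the
    small built-in excess (100 uL vs. 96 uL) is preserved when scaling.
    """
    if n_reactions <= 0:
        raise ValueError("n_reactions must be positive")
    factor = n_reactions / reactions_per_batch
    return {name: round(volume * factor, 2) for name, volume in recipe_ul.items()}


if __name__ == "__main__":
    for name, volume in scale_master_mix(SHAPE_MAP_MIX_UL, 24).items():
        print(f"{name}: {volume} uL")
```

By the same arithmetic, 8 μL of the SHAPE-MaP mix (15 mM MnCl2) in a 20 μL reaction (11 μL annealing mix, 8 μL master mix, 1 μL enzyme) gives 6 mM MnCl2 in the final reaction.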

3.7 cDNA Amplification

1. For each reverse transcription reaction, set up a PCR reaction using 5 μL of the cDNA. The annealing temperature, extension time, and cycle number must all be optimized for each primer set. Save the remaining cDNA at −20 °C in order to repeat the PCR reaction if necessary.
2. Clean up PCR reactions using DNA cleanup kits (Zymo #D4013).
3. Run the samples on a 1.5% TAE agarose gel to check for the correct amplicons. If necessary, perform gel extraction using a DNA gel extraction kit such as the Zymoclean Gel DNA Recovery Kit (Zymo #D4007).
4. Create a 1:5 dilution of each amplified DNA to measure the concentration using the Qubit system.

3.8 Library Preparation

At this point, the library preparation method of choice can be used to prepare the cDNA for sequencing. We recommend the Nextera XT Library prep protocol (Illumina, cat. no. FC-131-1024).

1. Make 0.2 ng/μL stocks of each amplified DNA sample and store at −20 °C.
2. Thaw the Tagment DNA buffer, DNA, and Amplicon Tagment mix on ice.
3. In a 0.2 mL PCR tube, add 10 μL of Tagment DNA buffer, 5 μL of 0.2 ng/μL DNA, and 5 μL of Amplicon Tagment mix.
4. Using the heated lid option on a thermocycler, incubate at 55 °C for 5 min and hold at 10 °C.
5. Immediately after reaching 10 °C, add 5 μL of Neutralize Tagment buffer to each reaction and incubate at room temperature for 5 min. The total volume is now 25 μL, all of which can be used for the Nextera PCR step.
6. Thaw the Nextera PCR master mix on ice. Thaw the index primers at room temperature and mix by inverting. Use unique barcodes for (+) and (−) samples and record them for demultiplexing after the sequencing run.
7. Set up the Nextera PCR reaction: In a 0.2 mL PCR tube, add 25 μL of tagmented sample, 15 μL of Nextera PCR master mix, 5 μL of Index 1 primer, and 5 μL of Index 2 primer, and mix thoroughly.
8. Using the heated lid option, run the following program on a thermocycler:
   Step 1: 72 °C for 3 min.
   Step 2: 95 °C for 30 s.
   Step 3: 12 cycles of 95 °C for 10 s, 55 °C for 30 s, and 72 °C for 30 s.
   Step 4: 72 °C for 5 min.
   Step 5: Hold at 10 °C.
9. After PCR, samples can be stored at 4 °C overnight.
10. Warm AMPure XP beads (Beckman, #A63880) to room temperature and vortex briefly before use.
11. Add 45 μL of AMPure XP beads to each PCR reaction and transfer the contents to a 1.7 mL microcentrifuge tube. Mix thoroughly and incubate at room temperature for 5 min.
12. Place the tubes on a magnetic stand and let the supernatant clear for 2 min. Carefully remove the supernatant.
13. Add 200 μL of freshly prepared 80% ethanol to each tube, leave on the magnetic stand for 30 s, then remove the ethanol. Repeat this step two more times.
14. Air-dry the beads on the magnetic stand for 15 min.
15. Resuspend the beads in 20 μL of pure water and leave on the bench at room temperature for 2 min.
16. Place the tubes on a magnetic stand and let the supernatant clear for 2 min. Carefully pipet 17 μL of the eluted DNA into a new microcentrifuge tube.
17. Measure the concentration of each library using a high-sensitivity assay such as the Qubit fluorometer, following the manufacturer's instructions.
18. Send samples for sequencing, following the instructions from the sequencing facility. The resulting compressed ".fastq.gz" file can be directly used for data analysis.

3.9 Data Analysis

3.9.1 Structure Determination

SHAPE-MaP and DMS-MaP data analysis requires a 64-bit Linux system with at least 8 GB RAM. Depending on the number of sequencing reads, read depth, and lncRNA size, a typical run might take more than 12 h. We recommend performing data analysis on a high-performance computer cluster.

1. Download the shapemapper software from the GitHub site (https://github.com/Weeks-UNC/shapemapper2) and install it according to the instructions provided in the readme file [23].
2. Run the shapemapper analysis using the default parameters. Check the shapemapper documentation for a full explanation of all the available options. An example command is shown below.

   shapemapper --name --target --out --modified --R1 --R2 --untreated --R1

[...] 0.8) are more likely to contain an independently folded sub-domain.

8. Eliminate the secondary structure models that do not contain these sub-domains and identify the correct secondary structure model.

3.9.3 Confidence Estimation

We use jackknife resampling methods to estimate the confidence of each base pair in the final secondary structure model [26].

1. Generate mock datasets by randomly removing 10% of the SHAPE-MaP reactivities in the ".shape" file and labeling them as "-999" (no data). We use MATLAB to create these datasets. Alternatively, Python or other programming languages can be employed.
2. Predict secondary structures using each mock dataset as input for RNAstructure (see Subheading 3.9.1, step 4).
3. Estimate the confidence for each base pair in the final model by counting its occurrence among all the secondary structure models generated using the mock datasets. We use MATLAB to estimate the confidence of each base pair. Alternatively, other programming languages such as Python can be employed.
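Steps 1 and 3 can be sketched in Python, one of the alternatives to MATLAB mentioned above. The sketch assumes that the reactivities have already been read from the ".shape" file into a list and that the structures predicted from each mock dataset (step 2, run separately in RNAstructure) are available as dot-bracket strings without pseudoknots. All function names are illustrative, and masking 10% of the positions that carry data (rather than 10% of all positions) is an assumption.

```python
import random

NO_DATA = -999.0  # label used for missing reactivities in the ".shape" file


def make_mock_dataset(reactivities, fraction=0.10, rng=None):
    """Return a copy of the reactivities with a random `fraction` of the
    measured positions relabeled as NO_DATA."""
    rng = rng or random.Random()
    measured = [i for i, r in enumerate(reactivities) if r != NO_DATA]
    n_remove = round(fraction * len(measured))
    masked = set(rng.sample(measured, n_remove))
    return [NO_DATA if i in masked else r for i, r in enumerate(reactivities)]


def pairs_from_dot_bracket(structure):
    """Return the set of 1-based (i, j) base pairs in a dot-bracket string."""
    stack, pairs = [], set()
    for i, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def base_pair_confidence(final_structure, mock_structures):
    """For each base pair in the final model, return the fraction of
    mock-dataset models that contain the same pair."""
    if not mock_structures:
        raise ValueError("at least one mock structure is required")
    mock_pair_sets = [pairs_from_dot_bracket(s) for s in mock_structures]
    return {
        pair: sum(pair in pairs for pairs in mock_pair_sets) / len(mock_pair_sets)
        for pair in sorted(pairs_from_dot_bracket(final_structure))
    }


if __name__ == "__main__":
    mock = make_mock_dataset([0.1, 0.8, NO_DATA, 1.2, 0.0] * 4, rng=random.Random(0))
    print(mock.count(NO_DATA), "positions without data")
    print(base_pair_confidence("((..))", ["((..))", "(....)", "......"]))
```

Passing a seeded `random.Random` instance makes the mock datasets reproducible, which helps when the structure predictions have to be rerun.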

3.9.4 Validate the Model Using DMS Probing Data

DMS methylates single-stranded adenosines and cytidines on the Watson–Crick base pairing face [18]. Therefore, we use DMS probing data as an orthogonal approach to validate our SHAPE-directed secondary structure model. Although developed for analyzing SHAPE-MaP data, shapemapper can also be employed to analyze DMS-MaP data [23]. We tested this by performing DMS-MaP on the D135 construct of the ai5γ group IIB intron and analyzing the data using shapemapper. We mapped the DMS reactivities onto the known secondary structure of ai5γ (Fig. 4). As expected, we found that almost all of the highly reactive adenines and cytosines are in the loops or helix terminal regions.

1. Perform shapemapper analysis on the DMS-MaP data following the steps outlined in Subheading 3.9.1, step 2.
2. Using VARNA (http://varna.lri.fr/) or similar RNA structure visualization software, color the SHAPE-directed secondary structure model using the DMS reactivities [27].
3. Identify the regions in disagreement with the DMS data and calculate the overall percentage of regions in agreement with the DMS data.
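Step 3 can be approximated numerically. The sketch below counts the adenines and cytosines with high DMS reactivity and reports the percentage that are unpaired in the SHAPE-directed model, together with the positions that disagree. The function name, the dot-bracket input, and the reactivity cutoff of 0.7 for "high" are assumptions made for illustration; this is one possible scoring scheme, not the authors' procedure.

```python
NO_DATA = -999.0  # label used for positions without reactivity data


def dms_agreement(sequence, structure, reactivities, high_cutoff=0.7):
    """Return (percent_agreement, disagreeing_positions).

    A highly reactive A or C agrees with the model when it is unpaired
    ('.' in the dot-bracket string). Positions are 1-based.
    high_cutoff is a hypothetical threshold for "highly reactive".
    """
    if not (len(sequence) == len(structure) == len(reactivities)):
        raise ValueError("sequence, structure, and reactivities differ in length")

    reactive, disagreeing = [], []
    for pos, (nt, ss, r) in enumerate(
        zip(sequence.upper(), structure, reactivities), start=1
    ):
        # DMS reports on A and C only; skip other nucleotides and missing data.
        if nt not in "AC" or r == NO_DATA or r < high_cutoff:
            continue
        reactive.append(pos)
        if ss != ".":
            disagreeing.append(pos)

    if not reactive:
        raise ValueError("no highly reactive A or C positions found")
    percent = 100.0 * (len(reactive) - len(disagreeing)) / len(reactive)
    return percent, disagreeing


if __name__ == "__main__":
    percent, flagged = dms_agreement(
        "GCAAAAGC",
        "((....))",
        [NO_DATA, 0.9, 1.2, 0.8, 0.1, 1.5, 0.2, 0.05],
    )
    print(f"{percent:.1f}% agreement; inspect positions {flagged}")
```

Flagged positions call for inspection rather than automatic rejection, since reactive nucleotides at helix termini are expected, as noted above.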

Fig. 4 DMS reactivity of the ai5γ group IIB intron calculated using shapemapper. We use the well-characterized D135 construct of the ai5γ group IIB intron as a control for in vitro probing experiments. Here, we performed DMS-MaP and analyzed the data using shapemapper. The resulting DMS reactivities are then mapped onto the known secondary structure. Nucleotides with high DMS reactivity are highlighted in red, nucleotides with medium reactivity are highlighted in yellow, and nucleotides with "no data" are highlighted in gray. As expected, almost all of the adenines and cytosines with high DMS reactivity are in the single-stranded regions. Note that not all loops in D135 are reactive because they are involved in long-range kissing loop interactions

4 Notes

1. Typically, we titrate in the range of 0–50 mM Mg2+ to determine which concentration best supports the folding of the lncRNA while preventing aggregation.
2. While it is ideal that the FPLC system is equipped with a UV absorbance detector at wavelength 260 nm, it is possible to use machines that detect at 280 nm instead.
3. Sephacryl resins are typically used to purify proteins. The manufacturer's recommendations on the choice of resin are based on the overall shape of proteins. However, it is possible that the target lncRNA does not form a globular shape. Therefore, we recommend testing various Sephacryl resins for purification.
4. 1M7 is a well-established SHAPE reagent for in vitro RNA probing applications. However, there are many other SHAPE reagents available [28, 29]. Slight modifications to this method may be required to accommodate other SHAPE reagents.
5. While this method utilizes SuperScript II as the reverse transcriptase for SHAPE-MaP probing, the Pyle Laboratory very recently developed Marathon RT and characterized its application for both SHAPE-MaP and DMS-MaP [30]. Alterations to the reverse transcription step of this protocol can be made to accommodate the use of Marathon RT or other enzymes.
6. Since SHAPE and DMS probing reagents require different folding buffers, the filtration and buffer exchange steps must use the correct buffer for the intended downstream application. While a titration is required to determine the concentration of magnesium ions necessary for proper folding, 12 mM Mg2+, which is the same concentration found in the transcription reaction, is a good starting point.
7. Centrifugation of the filtration unit may lead to RNA aggregation if run for too long or at too high a speed. Opt for a low speed for 2–3 min first, then check the remaining volume in the column. Aim to consistently retain between 100 μL and 200 μL after each spin. At that volume, you can proceed to the next spin.
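Note 6 states that 12 mM Mg2+ matches the transcription reaction. That value, and the other working concentrations, follow from the stock concentrations and volumes listed in Subheading 3.1, step 3 through the dilution relation C1·V1 = C2·V2. The short sketch below reproduces the arithmetic; the function and variable names are illustrative assumptions.

```python
def final_conc(stock_conc, stock_volume_ul, total_volume_ul):
    """Dilution relation C2 = C1 * V1 / V2 (result in the units of stock_conc)."""
    if total_volume_ul <= 0:
        raise ValueError("total volume must be positive")
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 200.0  # final volume of the transcription reaction

# Stock concentrations and volumes are taken from Subheadings 2.1 and 3.1.
transcription_reaction = {
    # 20 uL of 10x buffer containing 120 mM MgCl2
    "MgCl2 (mM)": final_conc(120.0, 20.0, TOTAL_UL),
    # 20 uL of 10x buffer containing 400 mM Tris-HCl
    "Tris-HCl (mM)": final_conc(400.0, 20.0, TOTAL_UL),
    # 2 uL of 1 M (1000 mM) DTT
    "DTT (mM)": final_conc(1000.0, 2.0, TOTAL_UL),
    # 25 uL of each 30 mM NTP
    "each NTP (mM)": final_conc(30.0, 25.0, TOTAL_UL),
    # 10 uL of 2 mg/mL T7 RNA polymerase
    "T7 RNA polymerase (mg/mL)": final_conc(2.0, 10.0, TOTAL_UL),
}

if __name__ == "__main__":
    for component, conc in transcription_reaction.items():
        print(f"{component}: {conc:g}")
```

The same helper can be reused to check the final probe concentration in Subheading 3.6, where 100 μL of a 100 mM stock is added to 900 μL of sample.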

Acknowledgments

This research is partially funded by a CURE grant (SAP 4100079710) from the Pennsylvania Department of Health to S. S. and by start-up funds from Drexel University College of Medicine.

References

1. Cech TR, Steitz JA (2014) The noncoding RNA revolution-trashing old rules to forge new ones. Cell 157(1):77–94. https://doi.org/10.1016/j.cell.2014.03.008
2. Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R et al (1991) A gene from the region of the human x-inactivation center is expressed exclusively from the inactive x-chromosome. Nature 349(6304):38–44. https://doi.org/10.1038/349038a0
3. Sauvageau M, Goff LA, Lodato S, Bonev B, Groff AF, Gerhardinger C et al (2013) Multiple knockout mouse models reveal lincRNAs are required for life and brain development. Elife 2:e01749. https://doi.org/10.7554/eLife.01749
4. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22(9):1775–1789. https://doi.org/10.1101/gr.132159.111
5. Fang S, Zhang L, Guo J, Niu Y, Wu Y, Li H et al (2018) NONCODEV5: a comprehensive annotation database for long non-coding RNAs. Nucleic Acids Res 46(D1):D308–D314. https://doi.org/10.1093/nar/gkx1107
6. Sun Q, Hao Q, Prasanth KV (2018) Nuclear long noncoding RNAs: key regulators of gene expression. Trends Genet 34(2):142–157. https://doi.org/10.1016/j.tig.2017.11.005
7. Novikova IV, Hennelly SP, Sanbonmatsu KY (2012) Structural architecture of the human long non-coding RNA, steroid receptor RNA activator. Nucleic Acids Res 40(11):5034–5051. https://doi.org/10.1093/nar/gks071
8. Somarowthu S, Legiewicz M, Chillon I, Marcia M, Liu F, Pyle AM (2015) HOTAIR forms an intricate and modular secondary structure. Mol Cell 58(2):353–361. https://doi.org/10.1016/j.molcel.2015.03.006
9. Owens MC, Clark SC, Yankey A, Somarowthu S (2019) Identifying structural domains and conserved regions in the long non-coding RNA lncTCF7. Int J Mol Sci 20(19):4770. https://doi.org/10.3390/ijms20194770
10. Qian X, Zhao J, Yeung PY, Zhang QC, Kwok CK (2019) Revealing lncRNA structures and interactions by sequencing-based approaches. Trends Biochem Sci 44(1):33–52. https://doi.org/10.1016/j.tibs.2018.09.012
11. Tsai M-C, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F et al (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329(5992):689–693. https://doi.org/10.1126/science.1192002
12. Kim DN, Thiel BC, Mrozowich T, Hennelly SP, Hofacker IL, Patel TR et al (2020) Zinc-finger protein CNBP alters the 3-D structure of lncRNA Braveheart in solution. Nat Commun 11(1):13. https://doi.org/10.1038/s41467-019-13942-4
13. Donlic A, Hargrove AE (2018) Targeting RNA in mammalian systems with small molecules. Wiley Interdiscip Rev RNA 9(4):21. https://doi.org/10.1002/wrna.1477
14. Schmidt K, Weidmann CA, Hilimire TA, Yee E, Hatfield BM, Schneekloth JS et al (2020) Targeting the oncogenic long non-coding RNA SLNCR1 by blocking its sequence-specific binding to the androgen receptor. Cell Rep 30(2):541–554.e5. https://doi.org/10.1016/j.celrep.2019.12.011
15. Batey RT (2014) Advances in methods for native expression and purification of RNA for structural studies. Curr Opin Struct Biol 26:1–8. https://doi.org/10.1016/j.sbi.2014.01.014
16. Chillon I, Marcia M, Legiewicz M, Liu F, Somarowthu S, Pyle AM (2015) Native purification and analysis of long RNAs. Methods Enzymol 558:3–37. https://doi.org/10.1016/bs.mie.2015.01.008
17. Novikova IV, Hennelly SP, Sanbonmatsu KY (2013) Tackling structures of long noncoding RNAs. Int J Mol Sci 14(12):23672–23684. https://doi.org/10.3390/ijms141223672
18. Tijerina P, Mohr S, Russell R (2007) DMS footprinting of structured RNAs and RNA-protein complexes. Nat Protoc 2(10):2608–2623. https://doi.org/10.1038/nprot.2007.380
19. Merino EJ, Wilkinson KA, Coughlan JL, Weeks KM (2005) RNA structure analysis at single nucleotide resolution by selective 2′-hydroxyl acylation and primer extension (SHAPE). J Am Chem Soc 127(12):4223–4231. https://doi.org/10.1021/ja043822v
20. Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM (2015) Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. Nat Protoc 10(11):1643–1669. https://doi.org/10.1038/nprot.2015.103
21. Zubradt M, Gupta P, Persad S, Lambowitz AM, Weissman JS, Rouskin S (2017) DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo. Nat Methods 14(1):75–82. https://doi.org/10.1038/nmeth.4057
22. Siegfried NA, Busan S, Rice GM, Nelson JA, Weeks KM (2014) RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP). Nat Methods 11(9):959–965. https://doi.org/10.1038/nmeth.3029
23. Busan S, Weeks KM (2018) Accurate detection of chemical modifications in RNA by mutational profiling (MaP) with ShapeMapper 2. RNA 24(2):143–148. https://doi.org/10.1261/rna.061945.117
24. Hajdin CE, Bellaousov S, Huggins W, Leonard CW, Mathews DH, Weeks KM (2013) Accurate SHAPE-directed RNA secondary structure modeling, including pseudoknots. Proc Natl Acad Sci U S A 110(14):5498–5503. https://doi.org/10.1073/pnas.1219988110
25. Xu ZZ, Mathews DH (2016) Secondary structure prediction of single sequences using RNAstructure. Methods Mol Biol 1490:15–34. https://doi.org/10.1007/978-1-4939-6433-8_2
26. Ramachandran S, Ding F, Weeks KM, Dokholyan NV (2013) Statistical analysis of SHAPE-directed RNA secondary structure modeling. Biochemistry 52(4):596–599. https://doi.org/10.1021/bi300756s
27. Darty K, Denise A, Ponty Y (2009) VARNA: interactive drawing and editing of the RNA secondary structure. Bioinformatics 25(15):1974–1975. https://doi.org/10.1093/bioinformatics/btp250
28. Busan S, Weidmann CA, Sengupta A, Weeks KM (2019) Guidelines for SHAPE reagent choice and detection strategy for RNA structure probing studies. Biochemistry 58(23):2655–2664. https://doi.org/10.1021/acs.biochem.8b01218
29. Flynn RA, Zhang QC, Spitale RC, Lee B, Mumbach MR, Chang HY (2016) Transcriptome-wide interrogation of RNA secondary structure in living cells with icSHAPE. Nat Protoc 11(2):273–290. https://doi.org/10.1038/nprot.2016.011
30. Guo LT, Adams RL, Wan H, Huston NC, Potapova O, Olson S et al (2020) Sequencing and structure probing of long RNAs using MarathonRT: a next-generation reverse transcriptase. J Mol Biol 432(10):3338–3352. https://doi.org/10.1016/j.jmb.2020.03.022

Chapter 11

Simultaneous RNA-DNA FISH

Lan-Tian Lai, Zhenyu Meng, Fangwei Shao, and Li-Feng Zhang

Abstract

Simultaneous RNA-DNA FISH is a highly useful tool for studying lncRNAs because it reveals the localization and abundance of RNA and DNA in their cellular context. However, simply combining RNA FISH with DNA FISH often gives disappointing results, because the fragile RNA signals are damaged by the harsh conditions used to denature the DNA in DNA FISH. Here, we describe a robust and simple RNA-DNA FISH protocol in which amino-labeled nucleic acid probes are used for RNA FISH. The method is suitable for detecting single RNA molecules simultaneously with DNA.

Key words Fluorescence in situ hybridization (FISH), Single molecule RNA FISH, Amino-labeled probes, Simultaneous RNA-DNA FISH, lncRNA

1 Introduction

Long non-coding RNAs (lncRNAs) are non-protein-coding transcripts longer than 200 nucleotides. A growing list of lncRNAs has been identified [1]. Many lncRNAs are thought to be components of nuclear architecture and players in epigenetic regulation. It is therefore important to study how lncRNAs interact with their DNA targets in the cellular context. Simultaneous RNA-DNA fluorescence in situ hybridization (RNA-DNA FISH) is a highly useful technique for addressing this question. FISH not only reveals the subcellular localization of nucleic acids but also provides quantitative information. For example, single molecule RNA FISH (SMRF) can detect single mRNA molecules in a single cell [2]. Furthermore, the DNA FISH signal intensity of telomeres can be used to measure individual telomere length in a given cell [3]. However, simultaneous RNA-DNA FISH has been technically challenging. In a simple combination of RNA FISH and DNA FISH, the RNA signals are often damaged in DNA FISH

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_11, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


by the harsh conditions (high temperature and low pH) used for denaturing the genomic DNA [4, 5]. We developed a robust method to protect the RNA signals in RNA-DNA FISH by introducing multiple protein components (immunostains) into the signal detection steps of RNA FISH [4]. This, followed by a formaldehyde fixation step, provides robust protection of the RNA signals during the subsequent DNA FISH. Introducing immunostains into RNA FISH is critical because formaldehyde acts as a fixative in the cellular context mainly by crosslinking the amino groups of lysine residues in the protein matrix. Although multiple layers of immunostains robustly protect the RNA signal, they make the protocol lengthy and tedious. They also unnecessarily amplify the RNA signals and raise the background noise. Therefore, we further improved the method by adding amino groups directly onto the nucleic acid probes used in RNA FISH [6]. The improved method is simple and robust. Using amino-labeled RNA FISH probes, we successfully detected single mRNA molecules simultaneously with DNA. Here, we describe the protocol in detail.

2 Materials

2.1 Equipment

1. Cytospin centrifuge.
2. Reverse phase HPLC equipped with a C18 column (10 mm × 250 mm).
3. Automated DNA synthesizer (Mermade 4, BioAutomation).

2.2 Commercial Reagents

1. Molecular trap pack.
2. Phosphoramidites and reagents used in oligonucleotide synthesis: dT-CE phosphoramidite, dmf-dG-CE phosphoramidite, dA-CE phosphoramidite, Ac-dC-CE phosphoramidite, Activator, Cap mix A, Cap mix B, Oxidizing solution, 3% DCA/DCM Deblocking Mix.
3. 200 nmole size of CPG resin in Mermade column.
4. Nick translation kit.
5. Cy3-dUTP.
6. Aminoallyl-dUTP (Jena Bioscience, Cat# NU-803S).
7. Cytology funnels.
8. Superfrost/Plus microscopic slides.
9. 18 × 18 mm #1.5 coverslips.
10. DAPI-containing mounting media with antifade.
11. Transparent nail polish.
12. Rubber cement.
13. Coplin jars.


2.3 Buffers and Solutions


All solutions for RNA FISH are prepared with RNase-free water.

1. Phosphoramidite solutions: prepare solutions of the four natural nucleotides, amino-dT (NH2T), terminal amino linker (NH2-C6), and fluorescein phosphoramidite (FAM). To each phosphoramidite, add 1.492 mL of anhydrous acetonitrile to directly prepare a 0.067 M solution.
2. Deprotection solution I: 40% (w/w) methylamine/water solution. Dissolve 5.7 mL of methylamine in 6 mL of water.
3. Deprotection solution II: 28% ammonium hydroxide solution.
4. HPLC buffer: 0.1 M TEAA (triethylammonium acetate) buffer, pH 7.0. Dissolve 5.72 mL of acetic acid in 500 mL of water. Add 10.12 g of triethylamine to the acetic acid solution. Adjust the pH to 7.0 by adding either triethylamine or acetic acid. Make up the solution to 1 L and filter it through a 0.45 μm filter unit.
5. Cy5 coupling solution: Dissolve 1 mg of Cy5 NHS ester in 20 μL of anhydrous DMSO. Dilute the Cy5/DMSO solution with 2 mL of acetonitrile. Add 400 μL of N,N-diisopropylethylamine right before the solution is applied to the DNA resin.
6. Hybridization buffer: 50% formamide, 2× SSC, 2 mg/mL BSA, 10% dextran sulfate-500K.
7. CSK buffer: 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 10 mM PIPES (pH 6.8). The solution is autoclaved and stored at 4 °C.
8. 4% (w/v) paraformaldehyde (PFA) in 1× PBS: Dissolve paraformaldehyde in 4 mM NaOH, add 10× PBS, adjust the pH to 7.4, and bring the solution to the final volume with water. Store the solution in the dark at 4 °C for no more than 3 weeks.
9. 2% PFA: dilute from 4% PFA using PBS.
10. 70%, 80%, 90%, and 100% ethanol (v/v in water).
11. 20× SSC.
12. 10% formamide, 2× SSC with 4 mM vanadyl ribonucleoside.
13. PBS with 0.5% Triton X-100.
14. PBS with 0.2% Tween-20.
15. 70% formamide, 2× SSC.
16. 50% formamide, 2× SSC.
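The quantities given for the phosphoramidite solutions (item 1) and the TEAA buffer (item 4) follow from simple molarity arithmetic. The sketch below reproduces them. The molar masses and the density of glacial acetic acid are standard handbook values that are not given in the text, and the 100 μmol amount of phosphoramidite per bottle is an assumption inferred from the stated volume and concentration.

```python
# Molarity arithmetic behind the TEAA buffer and the phosphoramidite solutions.
# Assumed constants (not stated in the protocol): handbook molar masses and
# the density of glacial acetic acid.
ACETIC_ACID_G_PER_MOL = 60.05
ACETIC_ACID_G_PER_ML = 1.049
TRIETHYLAMINE_G_PER_MOL = 101.19


def teaa_recipe(molarity=0.1, final_volume_l=1.0):
    """Return (mL acetic acid, g triethylamine) for an equimolar TEAA buffer."""
    moles = molarity * final_volume_l
    acetic_acid_ml = moles * ACETIC_ACID_G_PER_MOL / ACETIC_ACID_G_PER_ML
    triethylamine_g = moles * TRIETHYLAMINE_G_PER_MOL
    return acetic_acid_ml, triethylamine_g


def molarity_from_amount(micromoles, volume_ml):
    """Concentration (mol/L) of `micromoles` of solute dissolved in `volume_ml`."""
    return (micromoles * 1e-6) / (volume_ml * 1e-3)


acid_ml, tea_g = teaa_recipe()
print(f"acetic acid: {acid_ml:.2f} mL, triethylamine: {tea_g:.2f} g")
# Hypothetical 100 umol of phosphoramidite dissolved in 1.492 mL of acetonitrile
print(f"phosphoramidite: {molarity_from_amount(100, 1.492):.3f} M")
```

The same two functions can be reused to scale the buffer, for example `teaa_recipe(final_volume_l=0.5)` for a 500 mL batch.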

3 Methods

3.1 Probe Preparation

3.1.1 Synthesis of a Single Oligonucleotide Probe Dually Labeled with Fluorescein and Amino Group

A single fluorescently labeled oligo probe can be used in RNA FISH to detect RNA transcripts with repetitive sequences. To illustrate the probe synthesis process, this section describes the synthesis of an oligo probe dually labeled with fluorescein and amino groups, with the sequence 5′-FITC-(NH2TAACCC)6-3′. This probe was used to detect the telomeric repeat-containing RNA (Terra) [6].

1. Install the NH2T phosphoramidite solution on the DNA synthesizer as the first unnatural nucleobase (see Note 1). Enter the sequence of the Terra probe in the synthesizer software with NH2T as 1. Start a 200 nmole scale DNA synthesis via standard solid phase phosphoramidite chemistry (see Notes 2 and 3). Leave the probe on the CPG resin for FAM labeling in the next step.
2. Install the FAM phosphoramidite solution on the DNA synthesizer as the second unnatural nucleobase. Enter the DNA sequence as 5′-2T-3′ in the synthesizer software. Start the synthesis and follow the standard synthetic procedure (see Note 4). Here, T only stands for the oligonucleotide already on the CPG resin; in this step, only FAM is added to the probe, as a single nucleobase.
3. To cleave the DNA probe from the CPG resin and deprotect it, load 250 μL of deprotection solution I onto the resin twice. Leave the solution on the resin for 10 min each time. Combine the two eluates and let the solution stand for 2 h at room temperature; during this time, the protecting groups on all nucleobases are also removed.
4. Dry the solution from step 3 in a vacuum concentrator to yield a slightly yellow pellet.
5. Dissolve the crude probe from step 4 in 400 μL of water to prepare the sample for HPLC purification.
6. Inject 200 μL of the solution from step 5 into the reverse phase HPLC. Elute the desired probe with a gradient of acetonitrile in TEAA buffer (5–35% over 30 min). Collect the peak that absorbs at both 260 nm and 490 nm.
7. Repeat step 6 to purify the remaining half of the crude probe.
8. Combine the two collections and remove the solvent by lyophilization. Dissolve the probe in hybridization buffer and store it at −20 °C. The working concentration of the probe is 1 μM.
9. For quality control, the dually labeled probe is characterized by ESI-MS and quantified by UV-vis absorption of either DNA (260 nm) or fluorescein dye (490 nm).
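As a quick sanity check on the probe design, the sketch below builds the 36-nt (TAACCC)6 sequence and confirms that the RNA it base-pairs with lies within a run of UUAGGG repeats, the telomeric repeat unit found in Terra. Modeling the target as pure tandem repeats, treating NH2T as an ordinary T for base pairing, and all function names are assumptions made for illustration; they are not part of the protocol.

```python
# Sanity check: the (TAACCC)6 probe should be antisense to UUAGGG repeats.
# Assumptions: the target is modeled as tandem UUAGGG repeats, and amino-dT
# (NH2T) pairs like an ordinary T.

COMPLEMENT = {"A": "U", "C": "G", "G": "C", "T": "A"}


def probe_sequence(unit="TAACCC", repeats=6):
    """DNA probe written 5'->3'."""
    return unit * repeats


def target_rna(probe_dna):
    """RNA sequence (5'->3') that base-pairs with the full DNA probe."""
    return "".join(COMPLEMENT[base] for base in reversed(probe_dna))


probe = probe_sequence()
site = target_rna(probe)
terra_like = "UUAGGG" * 8  # hypothetical stretch of repeat RNA

print(len(probe))          # 36
print(site in terra_like)  # True
```

Because the target is repetitive, one probe molecule can bind at many positions along a single transcript, which is why a single oligo is sufficient here while SMRF of a non-repetitive mRNA needs a probe set.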

3.1.2 Synthesis of Cy5 and Amino Dually Labeled Probe Set for SMRF

A set of 20 to 40 fluorescently labeled oligo probes is required in SMRF to detect single RNA molecules. This section describes the synthesis of a set of 20 oligo probes dually labeled with Cy5 and amino groups, which was used to detect single molecules of EGFP mRNA [6].

1. Synthesize the twenty DNA probes with the sequences listed in Table 1 on a 200 nmole scale via standard solid phase synthesis. Use the NH2T phosphoramidite solution to incorporate NH2T at the indicated positions (Table 1), following the procedure in Subheading 3.1.1, step 1.
2. Adding an amino C6-alkyl chain: Install the NH2-C6 phosphoramidite solution on the DNA synthesizer as the second unnatural nucleobase. Add the NH2-C6 chain to the 5′-terminus of each probe via phosphoramidite chemistry, following the procedure in Subheading 3.1.1, step 2. Remove the MMT protecting group from the terminal amine (see Note 5).
3. Transfer the CPG resin of each probe from step 2 into a 2 mL Eppendorf tube. Swell the resin in acetonitrile (ACN) for 20 min at room temperature. Centrifuge to pellet the resin at the bottom of the tube and remove the ACN completely.
4. Add 60 μL of Cy5 coupling solution to the resin from step 3. Vortex the mixture vigorously at room temperature for 2 h to conjugate the Cy5 dye to the terminal alkyl amine of the DNA probes (see Note 6).
5. Centrifuge to pellet the resin at the bottom of the tube and remove the Cy5 coupling solution.
6. Repeat steps 4 and 5 (see Note 7).
7. Cleave the Cy5-labeled probes by incubating the resin in 200 μL of deprotection solution II. Shake the solution gently for 1 h to cleave the probes from the resin. After the resin is removed by filtration, keep the blue solution in the dark at room temperature for 24 h to complete the deprotection of the nucleobases.
8. Gently blow nitrogen over the surface of the solution to remove most of the ammonium hydroxide.
9. Dilute the solution from step 8 with 200 μL of water to prepare the sample for HPLC purification.
10. Inject the solution from step 9 into the reverse phase HPLC. Elute the desired probes with a gradient of ACN in TEAA buffer (5–50% over 30 min). Collect the peak that absorbs at both 260 nm and 630 nm. Remove the solvent by lyophilization. Dissolve the probes in hybridization buffer at the desired concentrations (see Note 8) and store them at −20 °C.

Table 1 Sequences of the amino-labeled probe set for single molecule RNA FISH of EGFP mRNA

Probe  Sequence (a)            Probe  Sequence (a)
1      GTCCAGCTCGACCAGGATGG    11     CGATGCCCTTCAGCTCGATG
2      CTGAACTTGTGGCCGTTTAC    12     ATGTTGCCGTCCTCCTTGAA
3      AGCTTGCCGTAGGTGGCATC    13     GTAGTTGTACTCCAGCTTGT
4      GTGGTGCAGATGAACTTCAG    14     CATGATATAGACGTTGTGGC
5      CACTGCACGCCGTAGGTCAG    15     ATGCCGTTCTTCTGCTTGTC
6      GACTTGAAGAAGTCGTGCTG    16     GCGGATCTTGAAGTTCACCT
7      TGGACGTAGCCTTCGGGCAT    17     CGCTGCCGTCCTCGATGTTG
8      CTTGAAGAAGATGGTGCGCT    18     TGTGATCGCGCTTCTCGTTG
9      CGGGTCTTGTAGTTGCCGTC    19     GTCACGAACTCCAGCAGGAC
10     GTTCACCAGGGTGTCGCCCT    20     GTCCATGCCGAGAGTGATCC

(a) Cy5 fluorophore is attached to the 5′-end of each probe. Thymines highlighted in red in the original table are NH2T.

11. For quality control, the probes are characterized by ESI-MS and quantified by UV-vis absorption of both DNA (260 nm) and Cy5 dye (630 nm).

3.1.3 Nick Translation

In general, probes for RNA FISH or DNA FISH can be prepared by nick translation. The template DNA used in the nick translation reaction can be an appropriate BAC clone. For example, if a BAC clone contains no transcribed regions other than the target RNA, the BAC can serve as the template DNA in nick translation to generate RNA FISH probes for that RNA target.

1. Add the following components (Nick Translation Kit) into a 0.2 mL tube and adjust the reaction volume to 50 μL with water:

   2.5 μg    Template DNA
   7.5 μL    dATP (0.4 mM)
   7.5 μL    dCTP (0.4 mM)
   7.5 μL    dGTP (0.4 mM)
   1.5 μL    Cy3-dUTP (1 mM)
   1.5 μL    Aminoallyl-dUTP (1 mM) (see Note 9)
   5 μL      10× Nick translation buffer
   5 μL      Enzyme mix

2. Incubate the reaction at 15 °C for 2 h and then at 65 °C for 10 min.


3. Add 25 μg of Cot-1 DNA into the reaction mix.
4. Precipitate the DNA in the reaction mix with ethanol.
5. Resuspend the DNA pellet in 50 μL of hybridization buffer.
6. Store the probe in the dark at −20 °C.
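The reaction mix set up in step 1 can be tabulated to find how much water is needed and what the final nucleotide concentrations are. This is a minimal sketch: the template stock concentration (1 μg/μL) is a hypothetical value, and the buffer and enzyme mix are treated as plain volumes.

```python
# Nick translation reaction (50 uL): water needed and final dNTP concentrations.
# Assumption: template DNA stock at 1 ug/uL (hypothetical; use your own value).

FINAL_VOLUME_UL = 50.0

# name -> (volume in uL, stock concentration in mM, or None if not a nucleotide)
components = {
    "dATP": (7.5, 0.4),
    "dCTP": (7.5, 0.4),
    "dGTP": (7.5, 0.4),
    "Cy3-dUTP": (1.5, 1.0),
    "Aminoallyl-dUTP": (1.5, 1.0),
    "10x buffer": (5.0, None),
    "Enzyme mix": (5.0, None),
}


def template_volume_ul(mass_ug=2.5, stock_ug_per_ul=1.0):
    """Volume of template stock that delivers `mass_ug` of DNA."""
    return mass_ug / stock_ug_per_ul


def water_volume_ul(template_ul):
    """Water needed to bring the reaction to the final volume."""
    used = template_ul + sum(vol for vol, _ in components.values())
    return FINAL_VOLUME_UL - used


def final_concentrations_um():
    """Final concentration (uM) of each nucleotide in the 50 uL reaction."""
    return {
        name: stock_mm * vol / FINAL_VOLUME_UL * 1000
        for name, (vol, stock_mm) in components.items()
        if stock_mm is not None
    }


template_ul = template_volume_ul()
print(f"water: {water_volume_ul(template_ul):.1f} uL")
for name, conc in final_concentrations_um().items():
    print(f"{name}: {conc:.0f} uM")
```

With these volumes, the two dUTP analogs together (30 μM each) match the 60 μM final concentration of each unlabeled dNTP.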

3.2 Slide Preparation

3.2.1 Cytospin

The cytospin method deposits cells cultured in suspension onto microscope slides. It also helps to separate cell clusters, for example colonies of embryonic stem cells, into single cells.

1. Trypsinize the cells and resuspend them at a concentration of 8 × 10^5 cells per mL in PBS (see Note 10).
2. Cytospin 100 μL of cells onto microscope slides using cytology funnels at 1500 rpm for 10 min.
3. Air-dry the slides for 2–3 min.
4. Rinse the slides in ice-cold PBS for 5 min in a Coplin jar.
5. Treat the slides with CSK buffer for 3 min (optional, see Note 11).
6. Fix the sample in 2% PFA for 10 min at room temperature (see Note 12).
7. Store the slides in 70% ethanol at 4 °C until use.
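Two numbers are useful when adapting the cytospin steps above to another cell type or instrument: the number of cells deposited per slide and the relative centrifugal force that corresponds to 1500 rpm. The sketch below computes both. The rotor radius is a hypothetical value, since it is not given in the protocol and depends on the instrument.

```python
# Cells deposited per cytospin slide, and rpm -> relative centrifugal force.
# Assumption: the 8 cm rotor radius is hypothetical; check the instrument manual.

def cells_per_slide(cells_per_ml=8e5, volume_ul=100):
    """Number of cells loaded into one cytology funnel."""
    return cells_per_ml * volume_ul / 1000.0


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g), RCF = 1.118e-5 * r[cm] * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


print(f"{cells_per_slide():.0f} cells per slide")
print(f"{rcf_from_rpm(1500, radius_cm=8.0):.0f} x g at 1500 rpm (r = 8 cm, assumed)")
```

Reporting the force in × g rather than rpm makes the spin conditions transferable between instruments with different rotors.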

3.2.2 Cell Culture on Coverslips

Cells can be cultured directly on coverslips before FISH experiments. Cellular components and structures are better preserved this way, because the cells are fixed directly on the coverslips without trypsinization or other physical manipulation.

1. Sterilize the coverslips and lay them on the bottom of a tissue culture container. Culture the cells directly on the coverslips until the culture reaches the desired cell density.
2. Briefly rinse the coverslips in PBS twice.
3. Fix the cells in 2% PFA for 10 min at room temperature (see Note 12).
4. Store the coverslips in 70% ethanol at 4 °C until use.

3.3 FISH

3.3.1 Single Molecule RNA FISH (SMRF)

The SMRF protocol is adapted from the method described by Raj and colleagues [2]. A detailed protocol has been published [7]. This section describes how to handle cells cultured directly on coverslips. Readers are referred to Subheading 3.3.2 for details of how to handle cells deposited onto microscope slides by cytospin.

1. Warm the probe to room temperature (see Note 8).
2. Aspirate the 70% ethanol off the coverslip.
3. Wash the coverslip three times with 2 mL of wash buffer (10% formamide, 2× SSC with 4 mM vanadyl ribonucleoside) each.


4. Pipette 10 μL of the probe onto a new microscope slide and place the coverslip cell-side down onto the probe.
5. Incubate the slide in a dark, humidified chamber at 30 °C overnight.
6. Slowly remove the coverslip and wash it with 2 mL of wash buffer (10% formamide, 2× SSC with 4 mM vanadyl ribonucleoside) at 30 °C for 30 min.
7. Aspirate the wash buffer.
8. Fix the coverslip in 4% PFA at room temperature for 10 min.
9. Briefly rinse the coverslip twice with PBS. The sample is ready for the subsequent DNA FISH.

3.3.2 RNA FISH

This section describes how to handle cytospin slides. Readers are referred to Subheading 3.3.1 for details of how to handle cells cultured directly on coverslips.

1. Dehydrate the slide through 80%, 90%, and 100% ethanol sequentially for 2 min each.
2. Air-dry the slide at room temperature for 2–3 min.
3. Denature the probe at 75 °C for 10 min, then pre-anneal the probe at 42 °C for 10–60 min (see Note 13). It is important to coordinate steps 2 and 3, so that the probe is ready to be applied as soon as the slide is dry.
4. Deposit 6 μL of denatured probe onto the slide. The probe should be applied directly onto the area of cells. After the probe is added, lay a coverslip on top of the area.
5. Hybridize the probe at 42 °C in a dark and humid environment overnight.
6. Remove the coverslip gently.
7. Wash the slide three times (5 min for each wash) in 50% formamide, 2× SSC at 45 °C with slight agitation.
8. Wash the slide three times (5 min for each wash) in 2× SSC at 45 °C with slight agitation.
9. Rinse the slide briefly in PBS.
10. Fix the slide in 4% PFA at room temperature for 10 min.
11. Rinse the slide again briefly in PBS. The sample is ready for the subsequent DNA FISH.

3.3.3 DNA FISH

1. Permeabilize the sample with PBS containing 0.5% Triton X-100 for 15 min (optional).
2. For DNA FISH using chromosome paint, treat the slides with freshly prepared 0.1 M HCl containing 0.25% Tween-20 for 1 min and wash with 2× SSC for 5 min (optional, see Note 14).
3. Denature the sample in 70% formamide, 2× SSC at 75 °C for 10 min.
4. Dehydrate the sample through ice-cold 80%, 90%, and 100% ethanol sequentially for 2 min each.
5. Air-dry the slides at room temperature for 2–3 min.
6. Denature the probe at 75 °C for 10 min, then pre-anneal the probe at 42 °C for 10–60 min (see Note 13). It is important to coordinate steps 5 and 6, so that the probe is ready to be applied as soon as the slide is dry.
7. Add 6 μL of probe, lay the coverslip on top of the probe, and seal the four edges of the coverslip with rubber cement.
8. Hybridize the probe overnight at 42 °C in a dark and humid environment.
9. Carefully remove the rubber cement and then the coverslip.
10. Wash the sample three times (5 min for each wash) in 50% formamide, 2× SSC at 45 °C with slight agitation.
11. Wash the sample three times (5 min for each wash) in 2× SSC at 45 °C with slight agitation.
12. Rinse briefly in PBS with 0.2% Tween-20.
13. Briefly dry the sample for less than 1 min.
14. Apply mounting media with DAPI. Lay the coverslip.
15. Seal the four edges of the coverslip with a minimal amount of nail polish. The sample is ready to be examined under a microscope.
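For planning purposes, the timed DNA FISH steps above can be written down as data and summed. The sketch below is only a scheduling aid, not part of the protocol: it excludes the overnight hybridization, uses the upper bound where a range is given, and assumes that probe denaturation and pre-annealing run in parallel with sample preparation.

```python
# Bench-time estimate for the DNA FISH steps, excluding overnight hybridization.
# Durations come from the steps above; for air-drying (2-3 min) the upper bound
# is used. Probe denaturation and pre-annealing are assumed to run in parallel
# with sample preparation and are therefore not counted.

STEPS = [
    # (description, minutes, optional)
    ("Permeabilize in PBS/Triton X-100", 15, True),
    ("HCl/Tween-20 treatment", 1, True),
    ("2x SSC wash after HCl", 5, True),
    ("Denature sample in 70% formamide", 10, False),
    ("Dehydrate in 80/90/100% ethanol", 3 * 2, False),
    ("Air-dry", 3, False),
    ("Post-hybridization washes, 50% formamide", 3 * 5, False),
    ("Post-hybridization washes, 2x SSC", 3 * 5, False),
]


def bench_minutes(steps, include_optional=True):
    """Total timed minutes, with or without the optional steps."""
    return sum(m for _, m, opt in steps if include_optional or not opt)


print(bench_minutes(STEPS, include_optional=False))  # 49
print(bench_minutes(STEPS))                          # 70
```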

4 Notes

1. Solutions of phosphoramidites, activator, and acetonitrile are dried over molecular trap packs 2 days before synthesis.
2. The 200 nmole size of CPG resin in a Mermade column is used for solid phase synthesis unless indicated otherwise.
3. NH2T and natural nucleoside phosphoramidites are incorporated with the standard coupling time, 60 s.
4. 5′-Fluorescein phosphoramidite is incorporated with a 3-min coupling time.
5. Turn on the "DMT-off" function during the initial setup of the DNA sequence on the synthesizer. The 4-monomethoxytrityl group on the terminal amine is removed by the deblocking mix while the probes are still attached to the CPG resin.
6. Vigorous vortexing of the resin is highly recommended to increase the coupling yield. Cover the Eppendorf tube with aluminum foil to protect the resin from light.


7. After Cy5 coupling, the resin turns deep blue.
8. In SMRF, it is important to test a wide range of probe concentrations (5–50 nM) to determine the optimal probe concentration for each experiment [7].
9. Not all amino-labeled dNTPs are compatible substrates for DNA polymerase I in the nick translation reaction. If a different amino-labeled dNTP is chosen, it is important to confirm that it is a compatible enzyme substrate in the nick translation reaction.
10. The concentration needs to be optimized for different cell types. 8 × 10^5 cells per mL is suitable for mouse ES cells and fibroblasts.
11. CSK treatment permeabilizes cells and improves probe penetration in FISH. This step is necessary for detecting nuclear RNA signals. The treatment time needs to be optimized for individual experiments.
12. In RNA-DNA FISH experiments, the sample is fixed twice in PFA solutions, once before RNA FISH (pre-fixation) and once after RNA FISH (post-fixation). To make sure that the amino probes used in RNA FISH can be fixed efficiently to the matrix of cellular proteins in the post-fixation step, we pre-fix the cells under a milder condition (2% PFA).
13. If double-stranded DNA is used as the probe for FISH, it is necessary to denature the probe. Probe denaturation also helps to reduce the secondary structure of single-stranded probes. In some experiments, unlabeled Cot-1 DNA (a fraction of repetitive DNA fragments isolated from genomic DNA) is mixed with the probe. After probe denaturation, the probe is pre-annealed with Cot-1 DNA at 42 °C to block nonspecific probe hybridization during FISH. The time for the pre-annealing step needs to be optimized for individual experiments.
14. For some difficult DNA targets, briefly subjecting the cells to 0.1 M HCl containing 0.25% Tween-20 enhances the accessibility of the DNA target to the probe.
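The probe titration recommended in Note 8 can be planned as a short dilution series. The sketch below spaces five test concentrations evenly on a log scale between 5 and 50 nM and applies C1·V1 = C2·V2. The 1 μM stock concentration, the number of test points, and the 10 μL final volume per coverslip are assumptions chosen for illustration.

```python
# Probe titration for SMRF (Note 8): concentrations spanning 5-50 nM.
# Assumptions: a hypothetical 1 uM probe stock in hybridization buffer, five
# log-spaced test points, and 10 uL of diluted probe per coverslip.

def log_spaced(low, high, points):
    """Return `points` values from low to high, evenly spaced on a log scale."""
    ratio = (high / low) ** (1.0 / (points - 1))
    return [low * ratio ** i for i in range(points)]


def stock_volume_ul(target_nm, final_ul=10.0, stock_nm=1000.0):
    """Volume of stock to dilute into `final_ul` (C1*V1 = C2*V2)."""
    return target_nm * final_ul / stock_nm


for conc in log_spaced(5.0, 50.0, 5):
    vol = stock_volume_ul(conc)
    print(f"{conc:5.1f} nM: {vol:.2f} uL stock + {10.0 - vol:.2f} uL buffer")
```

Sub-microliter volumes like these are hard to pipette accurately, so in practice the series would be prepared from an intermediate dilution or in a larger final volume.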

Acknowledgments

L-FZ is supported by the Singapore National Research Foundation under its Cooperative Basic Research Grant administered by the Singapore Ministry of Health's National Medical Research Council. FS is supported by Nanyang Assistant Professor Fellowship (M4080531) and Ministry of Education Tier 2 Research Grant (M4020163). The authors would also like to acknowledge the Nanyang Technological University College of Science Collaborative Research Award.

References

1. Wilusz JE, Sunwoo H, Spector DL (2009) Long noncoding RNAs: functional surprises from the RNA world. Genes Dev 23:1494–1504
2. Raj A, van den Bogaard P, Rifkin SA et al (2008) Imaging individual mRNA molecules using multiple singly labeled probes. Nat Methods 5:877–879
3. Lansdorp PM, Verwoerd NP, van de Rijke FM et al (1996) Heterogeneity in telomere length of human chromosomes. Hum Mol Genet 5(5):685–691
4. Lai L-T, Lee PJ, Zhang L-F (2013) Immunofluorescence protects RNA signals in simultaneous RNA-DNA FISH. Exp Cell Res 319:46–55
5. Zhang L-F, Huynh KD, Lee JT (2007) Perinucleolar targeting of the inactive X during S phase: evidence for a role in the maintenance of silencing. Cell 129:693–706
6. Basu R, Lai L-T, Meng Z et al (2014) Using amino-labeled nucleotide probes for simultaneous single molecule RNA-DNA FISH. PLoS One 9(9):e107425
7. Raj A, Tyagi S (2010) Detection of individual endogenous RNA transcripts in situ using multiple singly labeled probes. Methods Enzymol 472:365–386

Chapter 12

Highly Resolved Detection of Long Non-coding RNAs In Situ

Megan Trotter, Clair Harris, Marissa Cloutier, Milan Samanta, and Sundeep Kalantry

Abstract

Long non-coding RNAs (lncRNAs) have been postulated to function in a number of DNA-based processes, most notably transcription. The detection of lncRNAs in situ can offer insights into their function. Fluorescence in situ hybridization (FISH) enables the detection of specific nucleic acid sequences, including lncRNAs, within individual cells. Current RNA FISH techniques can inform both the localization and expression level of RNA transcripts. Together with advances in microscopy, these in situ techniques now allow for visualization and quantification of even lowly expressed or unstable lncRNAs. When combined with detection of associated proteins and chromatin modifications by immunofluorescence, RNA FISH can lend essential insights into lncRNA function. Here, we describe an integrated set of protocols to detect, individually or in combination, specific RNAs, DNAs, proteins, and histone modifications in single cells at high sensitivity using conventional fluorescence microscopy.

Key words Long non-coding RNAs, Fluorescence in situ hybridization, RNA FISH, DNA FISH, Immunofluorescence, Chromatin, Histone modifications

1 Introduction

lncRNAs can recruit proteins, including chromatin modifiers, and alter gene expression states [1, 2]. Some lncRNAs are hypothesized to regulate expression of specific loci [3]. Other lncRNAs are postulated to regulate gene expression across broader domains, up to an entire chromosome, as in the case of the Xist lncRNA and the inactive X-chromosome [1, 4]. Refinements of fluorescence in situ hybridization (FISH) techniques have made it possible to visualize lncRNAs at their sites of synthesis on the chromosome or sites of function in the genome at single-cell resolution, in cultured cell lines as well as in tissues [5]. More recently, single-molecule and allele-specific RNA FISH procedures have permitted

Megan Trotter, Clair Harris, Marissa Cloutier, and Milan Samanta are equal authors. Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_12, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


imaging and quantifying RNA molecules at the single allele resolution throughout the cell [6]. Combined with immunofluorescence (IF) detection of chromatin modifying factors and histone modifications, FISH techniques can permit the formulation of specific hypotheses of lncRNA function. For example, colocalization of chromatin modifiers or histone modifications with lncRNA transcripts can suggest the recruitment of the specific chromatin modifying factors to discrete loci in the genome by these transcripts. Here, we describe the detection via FISH of lncRNAs with probes that do not distinguish the DNA strand of synthesis, with strand-specific probes, and also with allele-specific probes. We additionally describe methods for IF-based detection of proteins and histone modifications and a DNA FISH procedure to mark specific DNA loci of interest in cultured and embryonic cells. These protocols can be used individually or in combination. We provide examples of cells profiled by IF and RNA FISH, RNA FISH followed by DNA FISH, and allele-specific RNA FISH interrogating the X-linked Xist and Tsix lncRNAs that are expressed from and function in cis on the X-chromosome along with histone H3 lysine 27 trimethylation (H3K27me3), which, like the Xist lncRNA, marks the inactive X-chromosome.

2 Materials

2.1 Sample Preparation: Cultured Cells

1. Sterile, gelatinized glass coverslips (see Note 1).
2. UltraPure DNase/RNase-free water.
3. 0.2% gelatin (Sigma #G2500): 0.2 g gelatin per 100 ml of UltraPure DNase/RNase-free water. Autoclave to dissolve and sterilize.
4. 6-well tissue culture dish.
5. Cytoskeletal buffer (CSK): 100 mM NaCl, 300 mM sucrose, 3 mM magnesium chloride, 10 mM PIPES (C8H18N2O6S2, pH 6.8; be sure to avoid using the sodium-salt version of PIPES). For PIPES, make a 0.25–0.5 M stock; bring it into solution by adding NaOH while measuring the pH.
6. CSK buffer with 0.4% Triton X-100. Make a stock of 10–20% Triton X-100 solution in UltraPure DNase/RNase-free water before adding it to CSK.
7. 4% paraformaldehyde (PFA; Electron Microscopy Science 16% paraformaldehyde, diluted in UltraPure DNase/RNase-free water with a final concentration of 1× phosphate-buffered saline [PBS]).
8. 70% ethanol made with filtered ddH2O.
9. Plate sealer tape.

2.2 Sample Preparation: Mouse Embryos/Embryo Fragments

1. UltraPure DNase/RNase-free water.
2. 0.01% poly-L-lysine (PLL; Sigma #P4707, 0.1% diluted 1:10 in DNase/RNase-free water).
3. PLL-coated glass coverslips (see Note 2).
4. Pipette for plating embryos or embryo fragments: a glass Pasteur pipette pulled to a width only slightly larger than the embryos/embryo fragments.
5. 1× PBS with 6 mg/ml of bovine serum albumin (BSA). To make 50 ml, combine 4 ml BSA (Invitrogen, 7.5 g/100 ml, #15260037), 5 ml 10× PBS, and 41 ml UltraPure DNase/RNase-free water.
6. Fixation and Permeabilization Solution: 1% PFA with 0.05% tergitol in a final concentration of 1× PBS.
7. 1% PFA (Electron Microscopy Science 16% paraformaldehyde in aqueous solution, diluted in UltraPure DNase/RNase-free water with a final concentration of 1× phosphate-buffered saline [PBS]).
8. 70% ethanol made with filtered ddH2O.
9. 6-well dish, or similar container for storage (see Note 3).
10. Plate sealer.
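The PBS/BSA recipe in item 5 can be checked with the dilution relation C1·V1 = C2·V2. The sketch below uses only the numbers given in the recipe; the function name is illustrative.

```python
# Check the PBS/BSA recipe in item 5 with C1*V1 = C2*V2.
# All numbers come from the recipe; only the function name is illustrative.

def final_concentration(stock_conc, stock_volume_ml, total_volume_ml):
    """Concentration after diluting `stock_volume_ml` of stock to `total_volume_ml`."""
    return stock_conc * stock_volume_ml / total_volume_ml


bsa_ml, pbs_ml, water_ml = 4.0, 5.0, 41.0
total_ml = bsa_ml + pbs_ml + water_ml

bsa_stock_mg_per_ml = 7.5 * 1000 / 100  # 7.5 g per 100 ml = 75 mg/ml
bsa_final = final_concentration(bsa_stock_mg_per_ml, bsa_ml, total_ml)
pbs_final = final_concentration(10.0, pbs_ml, total_ml)  # 10x stock

print(f"total volume: {total_ml:.0f} ml")
print(f"BSA: {bsa_final:.1f} mg/ml, PBS: {pbs_final:.1f}x")
```

The same function applies to the PFA solutions in this section, for example `final_concentration(16.0, 1.0, 16.0)` for 1% PFA from the 16% stock.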

2.3 Immunofluorescence (IF)

1. UltraPure DNase/RNase-free water.
2. 1× PBS.
3. 6-well dish, or similar chamber to be used for washing coverslips (see Note 3).
4. Blocking Buffer: 1× PBS with 0.5 mg/ml BSA, 50 μg/ml tRNA, 80 units/ml RNase inhibitor (such as RNaseOUT, Invitrogen #10777019), and 0.2% Tween-20 (make and use a 10% Tween-20 stock). Pre-warm to 37 °C.
5. Primary antibody against antigen of choice.
6. Small glass plate (see Note 4).
7. Parafilm.
8. Forceps.
9. IF chamber: a small humid chamber for incubating slides, humidity provided by 1× PBS (see Note 5).
10. Incubator set to 37 °C.
11. 1× PBS with 0.2% Tween-20.
12. Fluorescently conjugated secondary antibody. AlexaFluor (Invitrogen) secondary antibodies work well with this protocol; AF488, AF555, and AF647 have absorption and emission spectra very similar to those of the Fluorescein-12, Cy3, and Cy5 dyes used for fluorescence in situ hybridization (FISH), respectively. These antibodies can be used in conjunction with FISH probes for multi-color imaging with the same set of fluorescence microscope filters.
13. 4′,6-diamidino-2-phenylindole, dihydrochloride (DAPI) (Invitrogen, 10 mg reconstituted in ddH2O to 5 mg/ml) (for IF without subsequent RNA FISH).
14. Mounting medium (for IF without subsequent RNA FISH, see Note 6).
15. Microscope slides (for IF without subsequent RNA FISH).
16. 2% PFA (for IF followed by RNA FISH only).

2.4 RNA and DNA FISH: Probe Labeling for Double-Stranded Probes (See Note 7)

1. BioPrime DNA Labeling System (Invitrogen, #18094011) (see Note 8). 2. UltraPure DNase/RNase-free water. 3. TE Buffer: 10 mM Tris–HCl, 1 mM EDTA, pH 8 (see Note 9). 4. DNA template for labeling. Large templates, such as fosmids or BACs, work best for labeling by random priming. BACs are recommended for DNA FISH probes, as the larger template size will maximize the signal despite low copy number of DNA compared to transcribed RNA in the cell. Ensure that the template preparation is extremely pure; contaminating bacterial DNA will lead to high levels of background fluorescent signal. 5. Custom dNTP mixture: (a) For use with Cy3 or Cy5 fluorescently labeled dCTP (see below): 2 mM dATP, 2 mM dGTP, 2 mM dTTP, 1 mM dCTP. (b) For use with Fluorescein-labeled dUTP (Fluorescein-12dUTP; see below): 2 mM dCTP, 2 mM dATP, 2 mM dGTP, 1 mM dTTP. 6. 1 mM fluorescently labeled nucleotide: Fluorescein-12-dUTP (Roche) labeled probes excite maximally at 495 nm and emit maximally at 521 nm. Cy3-labeled dCTP (GE Healthcare) labeled probes excite maximally at 550 nm and emit maximally at 570 nm, while Cy5-labeled dCTP (GE Healthcare) labeled probes absorb maximally in the far-red end of the spectrum, at 649 nm, and emit maximally at 670 nm. The spectra for these dyes do not overlap significantly and labeled probes can be combined with other FISH probes and used with DAPI, which absorbs maximally at 358 nm and emits maximally at 461 nm when bound to double-stranded DNA, or fluorescently conjugated antibodies for multi-color imaging. 7. G-50 ProbeQuant Micro Columns (GE Healthcare). 8. Tabletop centrifuge.

Detecting lncRNAs in situ

9. 20 mg/ml yeast tRNA: reconstitute lyophilized yeast tRNA (Invitrogen, #15401) in UltraPure DNase/RNase-free water. Aliquot and store at −20 °C.
10. 3 M sodium acetate, pH 5.2 (see Note 9).
11. Molecular biology grade 100% ethanol.
12. Heat block set to 37 °C.

2.5 RNA FISH: Probe Labeling for Strand-Specific Probes (See Note 7)

1. MAXIscript T3/T7 Kit (Ambion, AM1326).
2. Linear DNA template, PCR amplified with a T3 or T7 promoter sequence upstream of the sequence to be labeled. To detect RNA transcribed from the desired strand, the probes should be complementary to the target transcript; the T3 or T7 RNA polymerase promoter sequences must therefore be incorporated at the appropriate ends of the template DNA. See MAXIscript kit instructions for details on template preparation.
3. Custom NTP mixture (see Note 10):
   (a) For use with Cy3 or Cy5 fluorescently labeled CTP (see below): 2 mM ATP, 2 mM GTP, 2 mM UTP, 1 mM CTP.
   (b) For use with Fluorescein-labeled UTP (Fluorescein-12-UTP, see below): 2 mM CTP, 2 mM ATP, 2 mM GTP, 1 mM UTP.
4. 100 nmol fluorescently labeled nucleotide: Fluorescein-12-UTP (Roche), Cy3-CTP (GE Healthcare), or Cy5-CTP (GE Healthcare) (see notes on emission spectra in Subheading 2.4).
5. 0.5 M EDTA (see Note 9).
6. 20 mg/ml yeast tRNA: reconstitute lyophilized yeast tRNA in UltraPure DNase/RNase-free water. Aliquot and store at −20 °C.
7. 5 M ammonium acetate (see Note 9).
8. 100% molecular biology grade ethanol.
9. UltraPure DNase/RNase-free water.
10. Heat block set to 37 °C.
11. Mini Quick Spin RNA Columns (Roche, #11814427001).
12. Tabletop centrifuge.

2.6 RNA and DNA FISH: Probe Precipitation

1. Fluorescently labeled probes in ethanol (see Subheadings 3.4 and 3.5). A combination of double-stranded and strand-specific probes may be used.
2. 20 mg/ml yeast tRNA: reconstitute lyophilized yeast tRNA in UltraPure DNase/RNase-free water. Aliquot and store at −20 °C.

Megan Trotter et al.

3. 1 mg/ml COT-1 DNA (Invitrogen) (Optional, see Note 11).
4. 10 mg/ml salmon sperm DNA.
5. 5 M ammonium acetate (see Note 9).
6. UltraPure DNase/RNase-free water.
7. 100% molecular biology grade ethanol.
8. Tabletop centrifuge.
9. 70% ethanol made with filtered ddH2O.
10. Deionized formamide (Amresco).
11. 50% dextran sulfate (Millipore).
12. 20× saline sodium citrate (SSC) (Invitrogen).
13. 2× hybridization solution: 200 μl UltraPure DNase/RNase-free water, 100 μl 20× SSC, 200 μl 50% dextran sulfate (see Note 12).
14. Vacuum centrifuge.
15. Heat block set to 90 °C.

2.7 Allele-Specific RNA FISH: Guide Probe Precipitation

1. Fluorescein-12-UTP labeled strand-specific probe in ethanol (see Subheading 3.5).
2. UltraPure DNase/RNase-free water.
3. 5 M ammonium acetate.
4. Molecular biology grade 100% ethanol.
5. Tabletop centrifuge.
6. 70% ethanol made with filtered ddH2O.
7. Vacuum centrifuge.
8. Deionized formamide (Amresco).
9. 50% dextran sulfate (Millipore).
10. 20× saline sodium citrate (SSC) (Invitrogen).
11. Heat block set to 90 °C.
12. Allele-specific hybridization solution: 300 μl UltraPure DNase/RNase-free water, 50 μl 20× SSC, 50 μl deionized formamide, 100 μl 50% dextran sulfate (see Note 12).

2.8 RNA FISH: Sample Hybridization

1. 100% molecular biology grade ethanol. Use filtered ddH2O to make stocks of 70%, 85%, and 95% ethanol.
2. Fluorescently labeled probes (see Subheadings 3.4, 3.5, and 3.6).
3. 6-well dish, or similar chamber to be used for dehydration and for washing coverslips (see Note 3).
4. Small glass plate (see Note 4).
5. Parafilm.

6. Forceps.
7. Hybridization chamber: a small humid chamber for incubating slides, humidity provided by 2× SSC/50% deionized formamide (see Note 5).
8. Incubator set to 37 °C, for overnight incubation.
9. 2× SSC/50% deionized formamide: 5 ml 20× SSC, 25 ml deionized formamide, 20 ml filtered ddH2O.
10. 2× SSC: 5 ml 20× SSC, 45 ml filtered ddH2O.
11. 1× SSC: 2.5 ml 20× SSC, 47.5 ml filtered ddH2O.
12. DAPI (Invitrogen, 10 mg reconstituted in ddH2O to 5 mg/ml).
13. Incubator set to 39 °C, for washes.
14. Mounting medium (see Note 6).
15. Microscope slides.

2.9 DNA FISH: Sample Hybridization

1. 1× PBS.
2. 1% PFA with 0.5% Tergitol and 0.5% Triton X-100 (for DNA FISH following RNA FISH).
3. 100% molecular biology grade ethanol. Use filtered ddH2O to make stocks of 70%, 85%, and 95% ethanol.
4. 6-well dish, or similar chamber to be used for dehydration and for washing coverslips (see Note 3).
5. RNase A solution: 1.25 μg/μl RNase A, diluted in 2× SSC.
6. 2× SSC/70% deionized formamide: 10 μl 20× SSC, 70 μl deionized formamide, 20 μl filtered ddH2O.
7. Heat block set to 95 °C.
8. Fluorescently labeled double-stranded probe (see Subheadings 3.4 and 3.6).
9. Small glass plate (see Note 4).
10. Parafilm.
11. Forceps.
12. Hybridization chamber: a small humid chamber for incubating slides, humidity provided by 2× SSC/50% deionized formamide (see Note 5).
13. Incubator set to 37 °C, for overnight incubation.
14. 2× SSC/50% deionized formamide: 5 ml 20× SSC, 25 ml deionized formamide, 20 ml filtered ddH2O.
15. 2× SSC: 5 ml 20× SSC, 45 ml filtered ddH2O.
16. 1× SSC: 2.5 ml 20× SSC, 47.5 ml filtered ddH2O.
17. DAPI (Invitrogen, 10 mg reconstituted in ddH2O to 5 mg/ml).

18. Incubator set to 37 °C, for washes.
19. Mounting medium (see Note 6).
20. Microscope slides.

2.10 Allele-Specific RNA FISH Protocol: Sample Hybridization (See Note 13)

1. 6-well dish, or similar chamber to be used for dehydration and for washing coverslips (see Note 3).
2. Forceps.
3. 100% molecular biology grade ethanol. Use filtered ddH2O to make stocks of 70%, 85%, and 95% ethanol.
4. Single-stranded Fluorescein-12-UTP labeled probe prepared for allele-specific FISH (see Subheading 3.7), detecting the same transcript as the Quasar dye labeled probe mixes in items 5 and 6.
5. 0.4 μM probe mix for the reference allele: equal mixture of 5 short oligonucleotide probes overlapping a SNP specific to one allele of the hybrid samples being analyzed. Each oligonucleotide is labeled with Quasar 570, which excites maximally at 548 nm and emits maximally at 566 nm. Quasar 570 labeled probes will appear in the same channel as Cy3 and cannot be used in conjunction with a Cy3-labeled probe. Use at a final concentration of 8 nM for hybridization (Biosearch Technologies, see Note 14).
6. 0.4 μM probe mix for the alternate allele: equal mixture of 5 short oligonucleotide probes overlapping a SNP specific to one allele of the hybrid samples being analyzed. Each oligonucleotide is labeled with Quasar 670, which excites maximally at 647 nm and emits maximally at 670 nm. Quasar 670 labeled probes will appear in the same channel as Cy5 and cannot be used in conjunction with a Cy5-labeled probe. Use at a final concentration of 8 nM for hybridization (Biosearch Technologies, see Note 14).
7. 1.2 μM mask probe mix. Use at a final concentration of 24 nM for hybridization (Biosearch Technologies, see Note 15).
8. Incubator set to 37 °C.
9. Hybridization chamber (see Note 5).
10. Parafilm.
11. Small glass plate (see Note 4).
12. 2× SSC/10% deionized formamide: 5 ml 20× SSC, 5 ml deionized formamide, 40 ml filtered ddH2O.
13. 2× SSC (Invitrogen).
14. DAPI (Invitrogen, 10 mg reconstituted in ddH2O to 5 mg/ml).
15. Mounting medium (see Note 6).

16. Clear nail polish.
17. Microscope slides.
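The excitation and emission maxima quoted in Subheadings 2.4 and 2.10 can be tabulated to check which labels may share a sample before probes are ordered or labeled. The following is a minimal sketch only: the peak values are those given in the text, but the function name and the 25 nm emission-gap cutoff are assumptions chosen for illustration, and actual separability depends on the filter sets of the microscope in use.

```python
from itertools import combinations

# Excitation/emission maxima (nm) as quoted in Subheadings 2.4 and 2.10.
DYES = {
    "DAPI": (358, 461),
    "Fluorescein": (495, 521),
    "Cy3": (550, 570),
    "Quasar 570": (548, 566),
    "Cy5": (649, 670),
    "Quasar 670": (647, 670),
}

# Hypothetical cutoff (not from the protocol): dyes whose emission maxima
# differ by less than this are treated as occupying the same channel.
MIN_EMISSION_GAP_NM = 25


def channel_conflicts(dyes):
    """Return sorted pairs of dye names whose emission maxima are too close."""
    conflicts = []
    for a, b in combinations(dyes, 2):
        if abs(DYES[a][1] - DYES[b][1]) < MIN_EMISSION_GAP_NM:
            conflicts.append(tuple(sorted((a, b))))
    return sorted(conflicts)


if __name__ == "__main__":
    # Label set used for allele-specific RNA FISH in this chapter.
    print(channel_conflicts(["DAPI", "Fluorescein", "Quasar 570", "Quasar 670"]))
    # A combination the text warns against.
    print(channel_conflicts(["Cy3", "Quasar 570", "Cy5"]))
```

With this cutoff, the allele-specific label set (DAPI, Fluorescein, Quasar 570, Quasar 670) reports no conflicts, whereas Cy3 with Quasar 570 and Cy5 with Quasar 670 are flagged, in line with the restrictions stated in items 5 and 6.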

3 Methods

3.1 Sample Preparation: Cultured Cells (See Note 16)

1. Grow cells to the desired confluency on sterile, gelatinized glass coverslips (see Note 1) in the bottom of a 6-well tissue culture dish.
2. Prepare reagents for fixation and permeabilization: make and sterile filter CSK and CSK with 0.4% Triton X-100 buffers and chill to 4 °C. Prepare 4% PFA. All subsequent steps should be performed in the tissue culture hood. The CSK and CSK + 0.4% Triton X-100 buffers should be kept chilled on ice during the procedure.
3. Aspirate media from all six wells of a 6-well dish.
4. Pipette 2 ml of ice-cold CSK buffer into each well of the 6-well dish.
5. After 30 s, aspirate the CSK buffer. By the time CSK buffer has been pipetted into all six wells, the first well has incubated in the CSK buffer for ~30 s, so simply pipette CSK buffer into the six wells and then aspirate immediately, starting from the first well.
6. Pipette 2 ml of ice-cold CSK + 0.4% Triton X-100 buffer into each well of the 6-well dish.
7. After 30 s, aspirate the CSK + 0.4% Triton X-100. See details in step 5.
8. Pipette 2 ml of ice-cold CSK buffer into each well of the 6-well dish.
9. After 30 s, aspirate the CSK buffer. See details in step 5.
10. Pipette 2 ml of 4% PFA into each well of the 6-well dish. Incubate the cells in PFA for 10 min.
11. Aspirate the PFA and pipette 5 ml of cold 70% ethanol into each well. Perform a total of 3 ethanol washes to remove all traces of PFA.
12. Seal the dish with plate sealer tape and store in 5 ml 70% ethanol at −20 °C (see Note 17).

3.2 Sample Preparation: Mouse Embryos

1. Plate embryos or dissected embryo fragments on sterile, PLL-coated glass coverslips (22 × 22 mm square coverslips, see Note 2) in 1× PBS containing 6 mg/ml BSA using a finely pulled Pasteur pipette.
2. Aspirate excess 1× PBS containing 6 mg/ml BSA and let dry for about 20–30 min.

3. Fix and permeabilize with 50 μl of 1% PFA in 1× PBS with 0.05% Tergitol for 5 min (to remove excess solution from the coverslip after the 5 min permeabilization/fixation, simply tap the solution off onto a paper towel).
4. Fix again with 50 μl of 1% PFA (no Tergitol) in 1× PBS for 5 min. Drain the excess solution onto a paper towel.
5. Place coverslips with plated embryos in a 6-well dish. Rinse with 3 changes of 70% ethanol.
6. Seal the dish with plate sealer tape and store in 5 ml 70% ethanol at −20 °C until use (see Note 17).

3.3 Immunofluorescence (IF)

1. Begin with fixed, permeabilized cells or embryo samples, plated on gelatinized or PLL-coated glass coverslips and stored in 70% ethanol (see Subheadings 3.1 and 3.2).
2. Make blocking buffer and warm to 37 °C.
3. Place the sample coverslip in a 6-well dish that contains 2 ml of 1× PBS in each well (see Note 3).
4. Wash briefly with 3 changes of 1× PBS to remove ethanol.
5. Wash with 1× PBS 3 times, 3 min each on a rocker.
6. Wrap a glass plate tightly with parafilm for incubating coverslips in subsequent steps.
7. Block samples for 30 min at 37 °C in 50 μl pre-warmed blocking buffer in a humid chamber: place a 50 μl drop of blocking buffer on the parafilm-wrapped glass plate and invert the coverslip, sample side down, into the blocking buffer. Place the parafilm-wrapped plate in the humid chamber and incubate for 30 min at 37 °C. All incubations in blocking buffer, primary antibody, or secondary antibody should be set up in this manner.
8. Carefully lift the coverslip from the blocking buffer with forceps and place it sample side down into a 50 μl droplet of primary antibody diluted in pre-warmed blocking buffer (dilution based on primary antibody of choice) on a parafilm-wrapped plate in a humid chamber at 37 °C for 1 h.
9. Remove the coverslip from the primary antibody solution and place it, sample side up, in a 6-well dish. Wash 3 times with 1× PBS/0.2% Tween-20 for 3 min each on a rocker.
10. Incubate in 50 μl pre-warmed blocking buffer on a parafilm-wrapped plate in a humid chamber for 5 min at 37 °C.

11. Incubate with 50 μl secondary antibody diluted in pre-warmed blocking buffer in a humid chamber at 37 °C for 30 min. Antibody dilution depends on the secondary antibody used. Alexa Fluor-conjugated secondary antibodies should be used at a 1:300 dilution.
12. Remove the coverslip from the secondary antibody and wash 3 times with 1× PBS/0.2% Tween-20 for 3 min each on a rocker.
   (a) If processing samples only for IF, the first wash of step 12 should contain a 1:100,000 to 1:200,000 dilution of DAPI (see Note 18). Then, rinse once briefly with 1× PBS/0.2% Tween-20 and wash 2 more times for 5–7 min each while rocking to remove excess DAPI. Remove the coverslip from the dish and tap off excess liquid, then mount on a slide, sample side down, in mounting medium. Image samples or store at −20 °C for later imaging.
   (b) If samples will be stained by RNA FISH probes following IF, pipette a 100 μl drop of 2% PFA onto a glass plate wrapped in parafilm. After the washes, place the coverslip sample side down in the PFA and place in the humid chamber. Incubate the sample for 10 min at room temperature, then proceed to RNA FISH hybridization (Subheading 3.8).

3.4 RNA FISH: Probe Labeling for Double-Stranded Probes

1. Dilute 50–100 ng DNA template in TE buffer in a final volume of 19 μl.
2. On ice, add 20 μl 2.5× Random Primers Solution (supplied with the BioPrime labeling kit).
3. Denature DNA by heating for 5 min in a 90 °C heat block.
4. Immediately cool on ice for 5 min.
5. Add 5 μl of custom dNTP mixture.
6. Add 5 μl of 1 mM fluorescently labeled nucleotide.
7. Add 2 μl (80 units) Klenow fragment (supplied with the BioPrime Labeling Kit, see Note 19).
8. Incubate at 37 °C overnight. Cover the tube with aluminum foil to protect the probe from light during this step.
9. Add 5 μl of Stop Buffer.
10. Prepare the G-50 column: vortex briefly to mix the Sephadex beads. Twist off the bottom and place the column in one of the provided collection tubes. Spin at 750 × g for 1 min.
11. Move the column to a new microcentrifuge tube and pipette the probe into the center of the column (see Note 20). Spin at 750 × g for 1 min. Measure the eluted volume by pipette.
12. Add 10 μl of 20 mg/ml yeast tRNA.
13. Add 3 M sodium acetate to a final concentration of 0.3 M.

14. Add 2.5 volumes of 100% ethanol.
15. Spin at 4 °C for 20 min at top speed, ~21,100 × g.
16. Carefully aspirate the supernatant using the vacuum apparatus (see Note 21).
17. Resuspend the pellet in 360 μl of UltraPure DNase/RNase-free ddH2O. Make sure the pellet is fully dissolved before moving to the next step.
18. Add 40 μl 3 M sodium acetate.
19. Add 1 ml 100% ethanol.
20. Store labeled probe at −20 °C in the dark (see Note 22).

3.5 RNA FISH: Probe Labeling for Strand-Specific Probes

1. Start with 1 μg template DNA.
2. Add DNase/RNase-free UltraPure ddH2O up to 9 μl, so that the total volume at the end of step 6 will be 20 μl.
3. Add 2 μl 10× Transcription Buffer (supplied with MAXIscript kit).
4. Add 5 μl NTP mixture (supplied with MAXIscript kit, see Note 10).
5. Add 2 μl of fluorescently labeled NTP.
6. Add 2 μl T3 (60 units) or T7 (30 units) enzyme mix (supplied with MAXIscript kit, see Note 10).
7. Mix thoroughly.
8. Incubate for 2 h at 37 °C.
9. Add 1 μl of TURBO DNase (supplied with MAXIscript kit) and mix well.
10. Add 1 μl of 0.5 M EDTA.
11. Prepare the Quick Spin RNA Column: vortex briefly to mix the Sephadex beads. Remove the cap and twist off the bottom tip, then place the column in a microcentrifuge tube. Spin at 1000 × g for 1 min.
12. Move the column to a new microcentrifuge tube and pipette the probe into the center of the column (see Note 20). Spin at 1000 × g for 1 min. Measure the new volume by pipette.
13. Add 10 μl of 20 mg/ml yeast tRNA.
14. Add 5 M ammonium acetate to a final concentration of 0.5 M.
15. Add 2.5 volumes of 100% ethanol.
16. Spin for 20 min at 4 °C at top speed, ~21,100 × g.
17. Carefully aspirate the supernatant using the vacuum apparatus (see Note 21).

18. Resuspend the pellet in 360 μl DNase/RNase-free UltraPure ddH2O. Make sure the pellet is fully dissolved before moving to the next step.
19. Add 40 μl of 5 M ammonium acetate.
20. Add 1 ml of 100% ethanol.
21. Store labeled probe at −20 °C in the dark (see Note 22).

3.6 RNA FISH: Probe Precipitation

All spin steps should be completed at top speed, ~21,100 × g.
1. Aliquot 100 μl of fluorescently labeled probe in ethanol. A combination of double-stranded and strand-specific probes may be precipitated together (see Note 23). Vortex all probe stocks before aliquoting.
2. Add 15 μl of 20 mg/ml tRNA (300 μg) (see Note 24). Mix thoroughly by pipetting.
3. Add 15 μl of 1 mg/ml mouse COT-1 DNA (15 μg). Mix thoroughly by pipetting. (Optional, see Notes 11 and 24. If not using COT-1 DNA, add 15 μl of UltraPure DNase/RNase-free water.)
4. Add 15 μl of 10 mg/ml sheared, boiled salmon sperm DNA (150 μg) (see Note 24). Mix thoroughly by pipetting.
5. Add 20 μl 5 M ammonium acetate (see Note 25). Vortex briefly to mix.
6. Add 45 μl of DNase/RNase-free UltraPure water (see Note 25). Vortex briefly to mix.
7. Add 250 μl of 100% ethanol (see Note 25). Vortex briefly to mix.
8. Spin for 20–30 min at 4 °C to precipitate the probe.
9. Carefully aspirate the supernatant using a vacuum apparatus (see Note 21).
10. Add 500 μl of 70% ethanol. Vortex thoroughly. Spin at room temperature for 4 min.
11. Carefully aspirate the supernatant using a vacuum apparatus (see Note 21).
12. Add 500 μl of 100% ethanol. Vortex thoroughly. Spin at room temperature for 4 min.
13. Carefully aspirate the supernatant using a vacuum apparatus (see Note 21).
14. Dry in a speed vacuum without heat for 5 min, or alternatively, air-dry. No ethanol should remain in the tube for the next step.
15. Add 50 μl of 100% deionized formamide to the tube. Make sure that the pellet is submerged in the formamide.

16. Denature in a heat block at 80–90 °C for 10 min. At ~5 min, pipette the formamide up and down to make sure that the pellet goes into solution. Place back into the heat block.
17. Immediately cool on ice for 5 min.
18. Add 50 μl 2× hybridization solution. Mix thoroughly by pipetting repeatedly (see Note 26).
19. Pre-anneal at 37 °C for 1–1.5 h. (Optional, see Note 11. Pre-annealing is not required for strand-specific probes or for double-stranded probes precipitated without COT-1 DNA.)
20. Store probe at −20 °C in the dark.

3.7 Allele-Specific RNA FISH: Guide Probe Precipitation

1. Aliquot 100 μl of the fluorescently labeled probe into a clear 1.5 ml tube.
2. Add 260 μl DNase/RNase-free water. Vortex briefly to mix.
3. Add 40 μl of 5 M ammonium acetate. Vortex briefly to mix.
4. Add 1 ml of 100% ethanol. Vortex briefly to mix.
5. Spin to precipitate the probe mix at top speed for 20–30 min at 4 °C (see Note 27).
6. Aspirate the supernatant using the vacuum apparatus (see Note 21).
7. Add 500 μl of 70% ethanol and spin at top speed for 4 min at room temperature.
8. Aspirate the supernatant using the vacuum apparatus (see Note 21).
9. Add 500 μl of 100% ethanol and spin at top speed for 4 min at room temperature.
10. Aspirate the supernatant using the vacuum apparatus (see Note 21).
11. Dry in a speed vacuum without heat for 5 min, or alternatively, air-dry. No ethanol should remain in the tube for the next step.
12. Resuspend the pellet in 10 μl of 100% deionized formamide.
13. Denature at 90 °C for 8 min in a heat block. At ~4 min, pipette the formamide up and down to make sure that the pellet goes into solution. Place back into the heat block.
14. Immediately cool on ice for 5 min.
15. Add 90 μl of allele-specific hybridization solution. Pipette to mix (see Note 28).
16. Store precipitated probe at −20 °C.
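The composition of the final guide probe mixture follows from steps 12 and 15 and the recipe in Subheading 2.7 by volume-weighted averaging. The sketch below works through that arithmetic; the function and variable names are illustrative, and it assumes ideal mixing and ignores the volume of the pellet itself.

```python
def mix(parts):
    """Volume-weighted composition of a mixture.

    parts: list of (volume_ul, {component: concentration}) tuples.
    A component missing from a part is taken as zero in that part.
    """
    total = sum(volume for volume, _ in parts)
    names = {name for _, comp in parts for name in comp}
    return {
        name: sum(volume * comp.get(name, 0.0) for volume, comp in parts) / total
        for name in names
    }


# Stock reagents listed in Subheading 2.7 (SSC as fold-concentration, others in %).
WATER = {}
SSC_20X = {"SSC_x": 20.0}
FORMAMIDE = {"formamide_pct": 100.0}
DEXTRAN_50 = {"dextran_pct": 50.0}

# Allele-specific hybridization solution: 300/50/50/100 ul recipe.
allele_solution = mix(
    [(300, WATER), (50, SSC_20X), (50, FORMAMIDE), (100, DEXTRAN_50)]
)

# Steps 12 and 15: 10 ul formamide plus 90 ul hybridization solution.
final_mix = mix([(10, FORMAMIDE), (90, allele_solution)])

if __name__ == "__main__":
    for name, value in sorted(final_mix.items()):
        print(f"{name}: {value:.1f}")
```

Under these assumptions the stock solution works out to 2× SSC, 10% formamide, and 10% dextran sulfate, and the final mixture to about 1.8× SSC, 19% formamide, and 9% dextran sulfate. Applying the same function to the 2× hybridization solution of Subheading 2.6 mixed with an equal volume of formamide gives the composition stated in Note 26.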

3.8 RNA FISH: Sample Hybridization

1. Start with permeabilized and fixed samples plated on coverslips and stored in 70% ethanol (see Subheadings 3.1 or 3.2), or samples previously processed for IF (see Subheading 3.3).

2. Dehydrate the coverslips by moving them through a room-temperature ethanol series (85%, 95%, and 100% ethanol) for 2 min each.
3. Remove the coverslips from the well and air-dry at room temperature for 15 min (see Note 29).
4. Set up the FISH hybridization. Use 8–10 μl of probe per 22 × 22 mm coverslip of cells and 12 μl of probe per coverslip of embryos. Pipette the probe onto a parafilm-wrapped glass plate (see Note 4), invert the coverslip onto the droplet of probe, and tap lightly with forceps to help the probe spread across the coverslip.
5. Hybridize overnight at 37 °C in a humid chamber (humidity provided by 2× SSC/50% formamide) (see Note 5).
6. Make all wash solutions (2× SSC/50% formamide, 2× SSC, and 1× SSC), and warm the solutions to 39 °C.
7. Carefully peel the coverslip off of the parafilm, re-invert, and place in a well containing pre-warmed 2× SSC/50% formamide. Be sure the coverslip goes into the well with the sample side up.
8. Wash with pre-warmed 2× SSC/50% formamide at 39 °C, 3 times for 7 min each.
9. Wash with pre-warmed 2× SSC at 39 °C, 3 times for 7 min each. Add DAPI into the last 2× SSC wash at a 1:100,000–1:200,000 dilution (see Note 18).
10. Rinse once quickly with pre-warmed 1× SSC.
11. Wash with pre-warmed 1× SSC at 39 °C, 2 times for 7 min each.
12. Use mounting medium to mount the coverslip onto a labeled microscope slide. Invert the slide onto a paper towel and press gently but firmly to remove excess mounting medium from under the coverslip.
13. Seal the coverslip and let dry thoroughly before viewing under a microscope.
14. Slides can be stored in the dark at −20 °C to preserve the fluorescent signal.

3.9 DNA FISH: Sample Hybridization

1. Begin with permeabilized and fixed samples, plated on coverslips and stored in 70% ethanol (see Subheadings 3.1 and 3.2) or with samples previously stained for RNA FISH (see Subheading 3.8). For samples previously stained by RNA FISH, RNA FISH signals will be degraded during DNA FISH, so samples should be imaged prior to beginning this protocol. Take note of the visual fields imaged, so that RNA FISH images can be aligned with DNA stains later.

2. If using samples previously stained for RNA FISH, use a razor blade to cut away the nail polish used to seal the coverslip to the slide. Then submerge the entire slide with the coverslip still attached in a solution of 2× SSC. While the sample is still submerged, gently peel off the coverslip by letting the 2× SSC infiltrate between the coverslip and the slide. For samples not previously processed for RNA FISH, proceed to step 5.
3. Wash the coverslip with 1× PBS three times quickly and then incubate in 1× PBS for 5 min at room temperature.
4. Re-fix the cells/embryos with 1% PFA containing 0.5% Tergitol and 0.5% Triton X-100. Fix for 10 min at room temperature.
5. Dehydrate the coverslips by moving them through a room-temperature ethanol series (70%, 85%, and 100% ethanol) for 2 min each.
6. Remove the coverslips from the well and air-dry at room temperature for 15 min (see Note 29).
7. RNase treatment: treat with 1.25 μg/μl RNase A in 2× SSC. Invert coverslips onto a 100 μl drop of RNase A solution on parafilm stretched over a glass slide. Incubate at 37 °C for 30 min. RNase treatment allows the probe to gain access to the DNA.
8. Dehydrate the coverslips by moving them through a room-temperature ethanol series (85%, 95%, and 100% ethanol) for 2 min each.
9. Remove the coverslips from the well and air-dry at room temperature for 15 min (see Note 29).
10. Denature the samples in a pre-warmed solution of 2× SSC/70% formamide on a glass slide or glass plate stationed on top of a heat block set at 95 °C for 11 min. Denaturation time and temperature are sample dependent and need to be optimized (see Note 30).
11. Immediately dehydrate through a −20 °C ethanol series (70%, 85%, 95%, and 100% ethanol) for 2 min each.
12. Remove the coverslips from the well and air-dry at room temperature for 15 min (see Note 29).
13. Set up the FISH hybridization. Use 8–10 μl of probe per 22 × 22 mm coverslip of cells and 12 μl of probe per coverslip of embryos. Pipette the probe onto a parafilm-wrapped glass plate (see Note 4), invert the coverslip onto the droplet of probe, and tap lightly with forceps to help the probe spread across the coverslip. Be sure to avoid air bubbles, which may result in uneven distribution of the probe.

14. Hybridize overnight at 37 °C in a humid chamber (humidity provided by 2× SSC/50% formamide, see Note 5).
15. Make all wash solutions (2× SSC/50% formamide, 2× SSC), and warm the solutions to 39 °C.
16. Carefully peel the coverslip off of the parafilm, re-invert, and place in a well containing pre-warmed 2× SSC/50% formamide. Be sure the coverslip goes into the well with the sample side up.
17. Wash with 2× SSC/50% formamide at 39 °C twice, 7 min each.
18. Wash with pre-warmed 2× SSC plus a 1:100,000–1:200,000 dilution of DAPI for 7 min at 39 °C (see Note 18).
19. Wash with pre-warmed 2× SSC without DAPI.
20. Use mounting medium to mount the coverslip onto a microscope slide. Invert the slide onto a paper towel and press gently but firmly to remove excess mounting medium from under the coverslip.
21. Seal the coverslip with clear nail polish and let dry thoroughly before viewing under a microscope.
22. Image and store the slide at −20 °C. When combining RNA FISH with DNA FISH, RNA FISH-stained samples must be imaged first and the coordinates of the cells on the coverslip recorded. Following DNA FISH, the same cells should be imaged to allow for analysis of overlaid images.

3.10 Allele-Specific RNA FISH: Sample Hybridization

1. Start with permeabilized and fixed samples plated on coverslips in 70% ethanol (see Subheadings 3.1 or 3.2).
2. Aspirate the ethanol using a vacuum apparatus and dehydrate the cells through an ethanol series (85%, 95%, and 100%) for 2 min each at room temperature.
3. Remove the coverslips from the well and air-dry at room temperature for 15 min (see Note 29).
4. While the coverslips are drying, prepare the allele-specific probe mix. In a 1.5 ml tube, add 1 μl of probe mix for the reference allele, 1 μl of probe mix for the alternate allele, and 1 μl of mask probe mix per 50 μl of guide probe.
5. Set up the FISH hybridization. Pipette the probe onto a parafilm-wrapped glass plate (see Note 4), invert the coverslip onto the droplet of probe, and tap lightly with forceps to help the probe spread across the coverslip. Use 12 μl of probe for coverslips of cells and 25 μl of probe for coverslips with embryos. Use clear nail polish to seal around the edge of the coverslips to prevent drying and loss of embryos (see Note 31).
6. Incubate coverslips overnight at 37 °C in a humid chamber (see Note 5).

7. Gently remove the nail polish using 70% ethanol. Using forceps, transfer coverslips, sample side up, into a 6-well dish containing 2 ml pre-warmed 10% formamide/2× SSC.
8. Wash at 37 °C for 30 min in the dark (see Note 32).
9. Aspirate off the wash buffer and wash in pre-warmed 10% formamide/2× SSC with a 1:100,000–1:200,000 dilution of DAPI at 37 °C for 30 min in the dark (see Notes 18 and 32).
10. Aspirate the wash buffer and wash with pre-warmed 2× SSC for 5 min at room temperature.
11. Use mounting medium to mount the coverslip onto a microscope slide. Invert the slide onto a paper towel and press gently but firmly to remove excess mounting medium from under the coverslip. Seal coverslips using clear nail polish.
12. Allow coverslips to dry in the dark.
13. Store at −20 °C.
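The working concentrations quoted in Subheading 2.10 (8 nM for each allele-specific probe mix and 24 nM for the mask probe mix) correspond to the 1 μl per 50 μl additions in step 4. The short sketch below reproduces that arithmetic; the function and variable names are illustrative, and the "actual" values simply account for the 3 μl that the three additions contribute to the total volume.

```python
def final_nM(stock_uM, added_ul, total_ul):
    """Final concentration (nM) after diluting added_ul of stock into total_ul."""
    return stock_uM * 1000.0 * added_ul / total_ul


GUIDE_UL = 50.0   # guide probe volume (step 4)
ADDED_UL = 1.0    # volume of each oligonucleotide mix added (step 4)
STOCKS_UM = {     # stock concentrations from Subheading 2.10
    "reference allele": 0.4,
    "alternate allele": 0.4,
    "mask": 1.2,
}

# Total volume once all three mixes have been added to the guide probe.
total_ul = GUIDE_UL + ADDED_UL * len(STOCKS_UM)

# Nominal values treat each addition as a 1:50 dilution.
nominal = {n: final_nM(c, ADDED_UL, GUIDE_UL) for n, c in STOCKS_UM.items()}
# Actual values divide by the full 53 ul.
actual = {n: final_nM(c, ADDED_UL, total_ul) for n, c in STOCKS_UM.items()}

if __name__ == "__main__":
    for name in STOCKS_UM:
        print(f"{name}: nominal {nominal[name]:.1f} nM, actual {actual[name]:.1f} nM")
```

The nominal values match the 8 nM and 24 nM stated in the materials list; the volume-corrected values are slightly lower (roughly 7.5 nM and 22.6 nM), while the threefold excess of mask probe over each allele-specific probe is unchanged.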

4 Notes

1. To prepare coverslips for tissue cultured cells, sterilize coverslips one at a time by dipping in 100% ethanol and running through a flame. Once the coverslip is sterilized in this manner, place it in a well of the 6-well dish. In the tissue culture hood, pipette 0.5–1 ml sterile 0.2% gelatin onto the coverslip and spread around. Keep the gelatin on the surface of the coverslip for ~10 min and avoid letting the gelatin infiltrate between the coverslip and the bottom of the well, or the coverslip may stick to the tissue culture surface. Aspirate excess gelatin and let dry.
2. To prepare coverslips for embryos, place a glass coverslip in a well of a 6-well dish. Add 2 ml of 0.01% poly-L-lysine, ensure the coverslip is submerged, and incubate for 15 min while rocking. Wash 3× in DNase/RNase-free water or 1× PBS. Allow coverslips to dry for at least 2 h in a microtube storage box.
3. Use 22 × 22 mm coverslips, which fit well within a single well of a 6-well dish. All the dehydration and washing steps can be performed in the wells of this dish.
4. Short glass plates designed for casting protein gels are a good size for the hybridization. For example, BioRad’s Mini-PROTEAN Short Plates fit well in the humid chamber described in Note 5.
5. For humid chambers, a microscope slide box (e.g., one that holds 100 slides) works well. Place paper towels soaked in 1× PBS/0.2% Tween-20 in the bottom of the box to create a humid chamber for immunofluorescence. For RNA and DNA FISH hybridization procedures, create a humid chamber by placing paper towels soaked in 2× SSC/50% formamide in the bottom of the box.
6. For mounting medium, use Vectashield (Vector Labs) or a similar anti-fade mounting medium and seal coverslips with clear nail polish after mounting on slides.
7. For “double-stranded” probes (see Subheadings 2.4 and 3.4), the individual labeled probe molecules are single-stranded, but they are generated from a double-stranded fosmid or BAC template using random primers. As such, these probes will hybridize to both strands of DNA and to sense and antisense RNAs transcribed from the same genomic sequence. Double-stranded probes can be used for DNA FISH and for RNA FISH for most genes. With strand-specific probes (see Subheadings 2.5 and 3.5), only one strand is labeled, and the probe will hybridize only to the complementary RNA molecule. The strand-specific probes can be used to distinguish sense and antisense transcription from a single locus.
8. This kit was originally designed for the preparation of biotinylated probes. The protocol outlined above modifies the BioPrime labeling procedure to generate fluorescently labeled probes instead. Do not use the supplied 10× dNTP Mixture, which includes biotinylated nucleotides. Prepare the dNTP mixture and add the fluorescently labeled nucleotides separately as described in Subheading 3.4, steps 5 and 6. Alternative random primer solutions have been tested, but do not work as well as the 2.5× Random Primers Solution provided with this kit.
9. While these solutions can be prepared from scratch, purchasing them is preferred to ensure their DNase/RNase-free purity.
10. A set of 10 mM NTPs and the T3 and T7 enzymes are included in the MAXIscript kit. Both the NTPs and enzymes can be replaced with other commercially available substitutes with no apparent loss of enzymatic activity.
11. COT-1 DNA can be added to double-stranded probes to help reduce background by hybridizing to repetitive sequences, if necessary. COT-1 DNA should not be used in strand-specific probes, as these probes are specific to the target sequence. While routine use of mouse COT-1 in the precipitation of mouse probes does not appear to be necessary, COT-1 use may reduce hybridization of FISH probes to repetitive sequences. If COT-1 is omitted, pre-annealing is not required.
12. 50% dextran sulfate is very viscous. Make the hybridization solutions in a tube with graduations. Pipette all the reagents other than the dextran sulfate into the tube. Then, instead of pipetting the dextran sulfate, use a pipette tip to “scoop” it into the tube up to the appropriate graduation mark on the side of the microtube. Vortex thoroughly to mix.

13. Allele-specific FISH relies on hybrid samples with known single nucleotide polymorphisms (SNPs) between the two parental strains. Probes designed to bind variant nucleotides at the SNP sites will only detect a signal from RNA transcripts originating from one or the other parental allele and can be utilized to examine expression of imprinted genes or other genes that are monoallelically expressed. This protocol is specific for Xist RNA but can be modified for use with any long non-coding RNA (lncRNA) with allele-specific expression.
14. See refs. 6, 7 for methodology for the design of allele-specific probes. Briefly, in the example for allele-specific detection of Xist RNA, a panel of short oligonucleotide probes was designed to uniquely detect either the M. musculus or the M. molossinus alleles of Xist RNA [6]. Each oligonucleotide probe overlaps a SNP site that differs between the two strains, with the SNP located at the fifth base pair position from the 5′ end. The five such probes used were mixed together in equal proportions to be used in the FISH hybridization. Probes overlapping both the reference and the alternate nucleotides at each of the five SNP sites are used to distinguish allelic expression. The 3′ end of each oligonucleotide probe is fluorescently tagged using Quasar dyes (Biosearch Technologies).
15. In addition to labeled SNP-overlapping oligonucleotides, a panel of 5 “mask” oligonucleotides was also synthesized. These “mask” probes are complementary to the 3′ end of the labeled allele-specific probes and will hybridize to the allele-specific oligonucleotides, leaving available only 9–10 base pairs of sequence adjacent to the polymorphic site to initially hybridize to the target RNA. Since this region of complementarity is short, the presence of a SNP is sufficient to destabilize the hybridization of the probe to the alternate allele. It is important that the mask probe is used at a higher concentration than the allele-specific probes to ensure that the mask probe is saturating.
16. Washing cells with CSK buffers will extract soluble RNAs and proteins from the cytoplasm. This treatment will reduce background and yield cleaner results in RNA FISH stains when detecting nascent or stabilized RNAs. For staining of cytoplasmic or soluble RNAs and proteins, or for quantification of single-molecule RNA FISH signals, CSK buffer should not be used. Instead, samples can be permeabilized with detergents such as 0.5% Triton X-100 for 5 min on ice.
17. Fixed cells and embryos on coverslips in 70% ethanol can be stored at −20 °C for at least 1 year. Seal with plate sealer tape to minimize evaporation of ethanol and replace the ethanol in the wells occasionally (once every 2 months or so), as it will evaporate over time despite the plate sealer.


18. Dilute DAPI 1:500 in UltraPure DNase/RNase-free water and store in the dark at −20 °C. Add 5 μl of this dilution into each well holding 2 ml of 2× SSC while washing samples.
19. The Klenow Fragment included with the BioPrime kit can be replaced with other commercially available Klenow enzymes with no apparent loss of enzymatic activity.
20. Be sure to pipet the probe into the center of the beads in the column during column purification. The probe must travel through the beads, not down the side.
21. The pellet is usually attached to the bottom of the tube, but be careful when aspirating the supernatant. A 10 μl pipette tip can be attached to the end of the aspiration tube to slow suction.
22. Ethanol stocks of probes can last up to 1 year when stored, sealed, and protected from light.
23. At this stage, multiple probes that were labeled with different fluorophores can be combined. To start, precipitate 100 μl of the ethanol stock of each probe together. If any of the probes are too bright or too faint, the starting volume of the ethanol stock of that probe can be adjusted.
24. The volumes of tRNA, Cot-1 DNA, and salmon sperm DNA (Subheading 3.6, steps 2–4) are determined by the final volume of probe. These stay constant whether one or multiple probes are precipitated together.
25. The volumes of ammonium acetate, H2O, and 100% ethanol (Subheading 3.6, steps 5–7) are determined by the starting volume of ethanol. These will need to be scaled up proportionately if multiple probes are precipitated together.
26. Composition of 2× hybridization buffer: 4× SSC and 20% dextran sulfate. Final composition of the RNA FISH hybridization buffer after mixing the 2× hybridization buffer with an equal volume of the probe in 100% deionized formamide: 50% deionized formamide, 2× SSC, and 10% dextran sulfate.
27. Be sure to place the hinge of the tube facing out so that the location of the pellet is known once spinning is complete.
28. Composition of the allele-specific RNA FISH hybridization buffer: 10% deionized formamide, 2× SSC, and 10% dextran sulfate. Final composition of the allele-specific RNA FISH hybridization buffer after mixing 90 μl of the buffer above with 10 μl of probe in 100% deionized formamide: 19% deionized formamide, 1.8× SSC, and 9% dextran sulfate.
29. Do not leave the coverslip to dry in the well after the 100% ethanol is aspirated. The glass coverslip may stick to the bottom of the well. Find a safe place to set the coverslip to dry. Laying coverslips across the inside of an empty pipette tip box or microtube storage box works well, and closing the lid of the box can prevent dust from settling on the coverslip and preserve fluorescent signal.
30. Denaturation time and temperature are very sample dependent and need to be optimized. Conditions for the cell types we routinely use include 5 min at 90 °C, 11 min at 80 °C, and 11 min at 95 °C.
31. Allele-specific probes diluted in the allele-specific hybridization buffer desiccate more quickly than probes in the non-allele-specific hybridization buffer. Thus, the coverslips should be sealed during allele-specific RNA FISH hybridization to prevent evaporation. Evaporation of the hybridization buffer causes the glass coverslips to stick to the parafilm and may cause loss of cells or embryos on the coverslip. This difference in hybridization conditions is likely due to the reduced concentration of deionized formamide in the allele-specific FISH hybridization buffer.
32. Cover the 6-well dish in aluminum foil.
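The final compositions quoted in Notes 26 and 28 follow from volume-weighted mixing. The sketch below reproduces that arithmetic; the function name `mix`, the reagent keys, and the 50 μl volumes used for the equal-volume case are illustrative assumptions, not part of the protocol.

```python
def mix(components):
    """Return volume-weighted final concentrations.

    components: list of (volume_ul, {reagent: concentration}) tuples.
    Each concentration keeps the unit it was given in (% or x-fold SSC).
    """
    total = sum(vol for vol, _ in components)
    reagents = {name for _, conc in components for name in conc}
    return {
        name: sum(vol * conc.get(name, 0.0) for vol, conc in components) / total
        for name in reagents
    }


# Note 26: 2x hybridization buffer mixed with an equal volume of probe
# in 100% formamide (50 ul each is an assumed, illustrative volume).
standard = mix([
    (50, {"ssc_x": 4.0, "dextran_pct": 20.0}),
    (50, {"formamide_pct": 100.0}),
])

# Note 28: 90 ul allele-specific buffer plus 10 ul probe in 100% formamide.
allele_specific = mix([
    (90, {"formamide_pct": 10.0, "ssc_x": 2.0, "dextran_pct": 10.0}),
    (10, {"formamide_pct": 100.0}),
])

print(standard)
print(allele_specific)
```

The same helper can be used to check the effect of changing the probe volume before committing reagents.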

Acknowledgments

This work was funded by NIH National Research Service Awards 5-T32-GM07544 (University of Michigan Predoctoral Genetics Training Program to M.C. and M.T.); a T32-HD079342 (University of Michigan Predoctoral Career Training in the Reproductive Sciences Program to M.C.); an NIH Director’s New Innovator award (DP2-OD-008646-01), an NIH NIGMS R01 award (R01GM124571), and an NIH NICHD R01 award (R01HD095463) (to S.K.); and the University of Michigan Endowment for Basic Sciences (to S.K.).

References

1. Lee JT, Bartolomei MS (2013) X-inactivation, imprinting, and long noncoding RNAs in health and disease. Cell 152(6):1308–1323
2. Brockdorff N (2013) Noncoding RNA and Polycomb recruitment. RNA 19(4):429–442
3. Brown JD, Mitchell SE, O’Neill RJ (2012) Making a long story short: noncoding RNAs and chromosome change. Heredity 108(1):42–49
4. Maclary E et al (2013) Long noncoding RNAs in the X-inactivation center. Chromosom Res 21(6–7):601–614
5. Maclary E et al (2014) Differentiation-dependent requirement of Tsix long non-coding RNA in imprinted X-chromosome inactivation. Nat Commun 5:4209
6. Harris C et al (2019) Conversion of random X-inactivation to imprinted X-inactivation by maternal PRC2. eLife 8:e44258
7. Levesque MJ et al (2013) Visualizing SNVs to quantify allele-specific expression in single cells. Nat Methods 10(9):865–867

Chapter 13

Detection of Long Non-coding RNA Expression by Non-radioactive Northern Blots

Yueying Wang, Mu Xu, Jiao Yuan, Zhongyi Hu, Youyou Zhang, Lin Zhang, and Xiaowen Hu

Abstract

With advances in sequencing technology and transcriptome analysis, it is estimated that up to 75% of the human genome is transcribed into RNA. This finding prompted intensive investigation of the biological functions of non-coding RNAs and led to the discovery of microRNAs as important players in disease pathogenesis and therapeutic applications. Research on long non-coding RNAs (lncRNAs) is in its infancy, yet a broad spectrum of regulatory functions has already been attributed to lncRNAs. As a novel class of RNA transcripts, lncRNAs vary widely in expression level and splice variants. Northern blot analysis can reveal the identity, size, and abundance of lncRNAs. Here we describe how to use northern blots to determine lncRNA abundance and identify different splice variants of a given lncRNA.

Key words Long non-coding RNA, RNA expression, Northern blots

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_13, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

1 Introduction

lncRNAs are operationally defined as RNA transcripts larger than 200 nt that do not appear to have coding potential [1–5]. Given that up to 75% of the human genome is transcribed to RNA, while only a small portion of the transcripts encodes proteins [6], the number of lncRNA genes can be large. After the initial cloning of functional lncRNAs such as H19 [7, 8] and XIST [9] from cDNA libraries, two independent studies using high-density tiling arrays reported that the number of lncRNA genes is at least comparable to that of protein-coding genes [10, 11]. Recent advances in tiling arrays [10–13], chromatin signatures [14, 15], computational analysis of cDNA libraries [16, 17], and next-generation sequencing (RNA-seq) [18–21] have revealed that thousands of lncRNA genes are abundantly expressed with exquisite cell-type and tissue specificity in humans. In fact, the GENCODE consortium within the framework of the ENCODE project recently reported 14,880 manually annotated and evidence-based lncRNA transcripts originating from 9277 gene loci in humans [6, 21], including 9518 intergenic lncRNAs (also called lincRNAs) and 5362 genic lncRNAs [14, 15, 20]. These studies indicate that (1) lncRNAs are independent transcriptional units; (2) lncRNAs are spliced with fewer exons than protein-coding transcripts and utilize the canonical splice sites; (3) lncRNAs are under weaker selective constraints during evolution and many are primate specific; (4) lncRNA genes are subject to the same typical histone modifications as protein-coding genes; and (5) the expression of lncRNAs is relatively low and strikingly cell-type or tissue-specific.

The discovery of lncRNAs has provided an important new perspective on the centrality of RNA in gene expression regulation. lncRNAs can regulate the transcriptional activity of a chromosomal region or a particular gene by recruiting epigenetic modification complexes in either a cis- or trans-regulatory manner. For example, Xist, a 17 kb X chromosome-specific non-coding transcript, initiates X chromosome inactivation by targeting and tethering Polycomb-repressive complexes (PRC) to the X chromosome in cis [22–24]. HOTAIR regulates the HoxD cluster genes in trans by serving as a scaffold that enables RNA-mediated assembly of PRC2 and LSD1 and coordinates the binding of PRC2 and LSD1 to chromatin [12, 25].

Based on the knowledge obtained from studies on a limited number of lncRNAs, at least two working models have been proposed. First, lncRNAs can function as scaffolds. lncRNAs contain discrete protein-interacting domains that can bring specific protein components into proximity with each other, resulting in the formation of unique functional complexes [25–27]. These RNA-mediated complexes can also extend to RNA–DNA and RNA–RNA interactions. Second, lncRNAs can act as guides to recruit proteins [24, 28, 29], such as chromatin modification complexes, to chromosomes [24, 29]. This may occur through RNA–DNA interactions [29] or through RNA interaction with a DNA-binding protein [24]. In addition, lncRNAs have been proposed to serve as decoys that bind to DNA-binding proteins [30], transcription factors [31], splicing factors [32–34], or miRNAs [35]. Some studies have also identified lncRNAs transcribed from the enhancer regions [36–38] or neighboring loci [18, 39] of certain genes. Given that their expression correlated with the activities of the corresponding enhancers, it was proposed that these RNAs (termed enhancer RNA/eRNA [36–38] or ncRNA-activating/ncRNA-a [18, 39]) may regulate gene transcription.

As a novel class of RNA transcripts, lncRNAs vary widely in expression level and splice variants. Northern blot analysis can reveal the identity, size, and abundance of lncRNAs. Here we describe how to use northern blots to determine lncRNA abundance and identify different splice variants of a given lncRNA.

2 Materials

2.1 Materials for DIG-Labeled RNA Probe Synthesis

1. DNA template (see Note 1).
2. Restriction enzyme.
3. 10× DIG RNA labeling mix (Roche, 11277073910).
4. T7 RNA polymerase (10 U/μL) and 5× transcription buffer (Agilent, 600123).
5. DNase I (NEB, M0303S, 2000 U/mL).
6. Agarose gel electrophoresis supplies for DNA fragment purification.
7. Quick Spin Columns for radiolabeled RNA purification, Sephadex G-50 (Roche, 11274015001).
8. Gel Extraction Kit (Qiagen, 28704).

2.2 Materials for Separating RNA by Electrophoresis

1. Nucleic acid agarose.
2. 55 °C water bath.
3. 10× Denaturing gel buffer (Invitrogen, AM8676).
4. Heat block.
5. Gel electrophoresis apparatus.
6. 3-(N-morpholino)propanesulfonic acid (MOPS).
7. Sodium acetate.
8. 0.5 M EDTA.
9. 10× MOPS buffer: 200 mM MOPS, 50 mM sodium acetate, 20 mM EDTA, adjust pH to 7.0. To make 1× MOPS gel running buffer, mix one part of 10× MOPS buffer with nine parts of RNase-free water.
10. RNA loading buffer (Invitrogen, AM8552).
11. Ethidium bromide (only if RNA visualization is needed).
12. DIG-labeled RNA marker (Roche, 11373099910).
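The amounts needed for the 10× MOPS buffer in item 9 can be worked out as below. This is a minimal sketch: the molecular weights are assumptions taken from standard reagent data (MOPS free acid and anhydrous sodium acetate; a hydrated salt would need a different value), and the function names are hypothetical.

```python
# Assumed molecular weights in g/mol (not given in the protocol).
MW_MOPS = 209.26            # MOPS, free acid
MW_SODIUM_ACETATE = 82.03   # sodium acetate, anhydrous


def ten_x_mops(volume_l=1.0, edta_stock_m=0.5):
    """Amounts for 10x MOPS buffer: 200 mM MOPS, 50 mM NaOAc, 20 mM EDTA."""
    return {
        "mops_g": 0.200 * volume_l * MW_MOPS,
        "sodium_acetate_g": 0.050 * volume_l * MW_SODIUM_ACETATE,
        # Volume of 0.5 M EDTA stock giving 20 mM in the final volume.
        "edta_stock_ml": 0.020 * volume_l / edta_stock_m * 1000.0,
    }


def one_x_running_buffer(final_ml):
    """One part 10x MOPS buffer plus nine parts RNase-free water."""
    return {"mops_10x_ml": final_ml / 10.0, "water_ml": final_ml * 9.0 / 10.0}


stock = ten_x_mops(1.0)
running = one_x_running_buffer(1000.0)
print(stock)
print(running)
```

The pH still has to be adjusted to 7.0 at the bench; the calculation only covers the masses and volumes.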

2.3 Materials for Transferring RNA to the Membrane

1. 20× SSC (Invitrogen, 15557-036).
2. Razor blade.
3. 3 M Filter paper.
4. Positively Charged Nylon Membrane (Roche).
5. Blunt end forceps.
6. Paper towel.
7. RNase-free flat-bottomed container as buffer reservoir.
8. Clean glass Pasteur pipet as roller.
9. Light object (150–200 g) to serve as a weight during transfer.
10. Support for the reservoir (e.g., a stack of books).
11. Stratalinker® UV Crosslinker.

2.4 Materials for Probe-RNA Hybridization

1. 20× SSC.
2. 10% SDS.
3. DIG Easy Hyb Granules (Roche, 11796895001).
4. 68 °C shaking water bath.
5. Heat block.
6. Hybridization oven.
7. Hybridization bags.
8. Low Stringency Buffer: 2× SSC with 0.1% SDS.
9. High Stringency Buffer: 0.1× SSC with 0.1% SDS.
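Both stringency buffers are simple dilutions of the 20× SSC and 10% SDS stocks listed above. The sketch below applies C1V1 = C2V2; the 500 mL batch size and the function name `wash_buffer` are illustrative assumptions.

```python
def wash_buffer(final_ml, ssc_x, sds_pct=0.1, ssc_stock_x=20.0, sds_stock_pct=10.0):
    """Stock volumes (mL) for a wash buffer, using C1*V1 = C2*V2."""
    ssc_ml = final_ml * ssc_x / ssc_stock_x
    sds_ml = final_ml * sds_pct / sds_stock_pct
    return {
        "ssc_20x_ml": ssc_ml,
        "sds_10pct_ml": sds_ml,
        "water_ml": final_ml - ssc_ml - sds_ml,
    }


# Assumed batch size of 500 mL for each buffer.
low_stringency = wash_buffer(500.0, ssc_x=2.0)    # 2x SSC, 0.1% SDS
high_stringency = wash_buffer(500.0, ssc_x=0.1)   # 0.1x SSC, 0.1% SDS
print(low_stringency)
print(high_stringency)
```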

2.5 Materials for Detection of Probe-RNA Hybrids

1. Washing and Blocking buffer set (Roche, 11585762001).
2. Anti-DIG-alkaline phosphatase antibody (Roche, 11093274910).
3. NBT/BCIP Stock Solution (Roche, 11681451001).
4. CDP-Star, Ready-to-Use (Roche, 12041677001).
5. TE buffer: 10 mM Tris–HCl, 1 mM EDTA, adjust pH to approximately 8.

3 Methods (See Note 2)

lncRNA northern blot analysis aims to characterize lncRNA expression. The protocol includes five parts: (1) RNA probe synthesis and labeling; (2) RNA sample electrophoresis; (3) RNA transfer; (4) RNA-probe hybridization; and (5) RNA-probe hybrid detection.

3.1 DIG-Labeled RNA Probe Synthesis by In Vitro Transcription

DIG-labeled RNA probe synthesis is very similar to the biotinylated RNA synthesis described for the lncRNA pulldown assay (see Chapter 1). The differences between the two procedures are:
1. Because the probe must be complementary to the target sequence, the probe RNA is transcribed from the 3′ end of the target sequence. We clone the gene of interest in reverse orientation to make the in vitro transcription template for northern probes.
2. Use DIG labeling mix in place of the biotin labeling mix.


3.2 Separating RNA Samples by Electrophoresis

3.2.1 Gel Setup

1. Wipe the gel rack, tray, and combs with RNaseZap, rinse with water, and let dry.
2. Weigh 1 g agarose in a clean glass flask and mix with 90 mL RNase-free water. Melt the agarose completely by heating in a microwave. Put the flask with melted agarose in a 55 °C water bath.
3. In a fume hood (see Note 3), add 10 mL 10× denaturing gel buffer to the gel mix once it has equilibrated to 55 °C. Mix the gel solution by gentle swirling to avoid generating bubbles. Slowly pour the gel mix into the gel tray, and pop any bubbles or push them to the edges of the gel with a clean pipet tip. The thickness of the gel should be about 6 mm. Slowly place the comb in the gel. Allow the gel to solidify before removing the comb.
4. Right before RNA electrophoresis, place the gel tray in the electrophoresis chamber with the wells near the negative lead and add 1× MOPS gel running buffer to the chamber until it is 0.5–1 cm over the top of the gel (see Note 4).
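The gel recipe can be scaled to the tray so that the poured gel comes out about 6 mm thick. The sketch below is illustrative only: the tray dimensions are hypothetical, and the 1% (w/v) agarose concentration is an assumption, as are the 9:1 water-to-buffer proportions inferred from the 90 mL and 10 mL volumes in steps 2 and 3.

```python
def gel_recipe(length_cm, width_cm, thickness_mm=6.0, agarose_pct=1.0):
    """Scale the denaturing gel to a tray; 1 cm^3 of gel is taken as 1 mL."""
    volume_ml = length_cm * width_cm * (thickness_mm / 10.0)
    return {
        "gel_volume_ml": volume_ml,
        "agarose_g": volume_ml * agarose_pct / 100.0,   # assumed 1% (w/v)
        "water_ml": volume_ml * 0.9,                    # 9 parts RNase-free water
        "denaturing_buffer_10x_ml": volume_ml * 0.1,    # 1 part 10x buffer
    }


# Hypothetical 10 cm x 15 cm tray.
recipe = gel_recipe(10.0, 15.0)
print(recipe)
```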

3.2.2 RNA Electrophoresis

1. Mix no more than 30 μg sample RNA with 3 volumes of RNA loading buffer (see Notes 5 and 6). To disrupt any secondary structure in the RNA, incubate the RNA with loading buffer at 65 °C for 15 min using a heat block (see Note 7). Spin briefly to collect the samples at the bottom of the tube and put the tubes on ice.
2. Carefully draw the RNA into the pipet tip without trapping any bubbles at the end of the tip, place the tip just inside the top of the well, slowly push the sample into the well, and withdraw the tip without disturbing the loaded sample. If markers are needed, load one lane with DIG-labeled RNA marker.
3. Run the gel at 5 V/cm (see Note 8).
4. (Optional) Stain the gel with ethidium bromide and visualize the RNA under UV (see Note 9).
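Two quantities in the steps above are easy to compute ahead of time: the sample volume after adding 3 volumes of loading buffer, and the run voltage at 5 V/cm of electrode separation. The sketch below is one way to check them; the function names, the RNA concentration, the well capacity, and the 20 cm electrode distance are all hypothetical.

```python
def loading_plan(rna_ug, rna_conc_ug_per_ul, well_capacity_ul, max_rna_ug=30.0):
    """Volumes for one lane: RNA plus 3 volumes of RNA loading buffer."""
    if rna_ug > max_rna_ug:
        raise ValueError("load no more than 30 ug RNA per lane")
    rna_ul = rna_ug / rna_conc_ug_per_ul
    buffer_ul = 3.0 * rna_ul
    total_ul = rna_ul + buffer_ul
    return {
        "rna_ul": rna_ul,
        "loading_buffer_ul": buffer_ul,
        "total_ul": total_ul,
        # If True, concentrate the RNA by precipitation before loading.
        "needs_concentrating": total_ul > well_capacity_ul,
    }


def run_voltage(electrode_distance_cm, volts_per_cm=5.0):
    """Voltage is set by the distance between electrodes, not the gel size."""
    return electrode_distance_cm * volts_per_cm


# Hypothetical sample: 20 ug RNA at 2 ug/ul, wells holding 50 ul.
plan = loading_plan(20.0, 2.0, 50.0)
volts = run_voltage(20.0)  # hypothetical 20 cm between electrodes
print(plan, volts)
```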

3.3 Transfer RNA from Agarose Gel to the Membrane

1. Material preparation:
(a) Use a razor blade to trim the gel by cutting through the wells and discard the unused gel above the wells. To mark the orientation, make a notch at a corner.
(b) Cut the membrane to a size slightly larger than the gel. Make a notch at a corner to align the membrane with the gel in the same orientation. Handle the membrane with care, touching only the edges with gloved hands or blunt-tip forceps.
(c) Cut eight pieces of filter paper the same size as the membrane.


(d) Cut a stack of paper towels that is 3 cm in height and 1–2 cm wider than the gel.
(e) Pour 20× SSC into a flat-bottomed container that is larger than the agarose gel. This serves as the buffer reservoir and can also be used to wet the paper and membrane. Put the reservoir on a support (e.g., a stack of books) so that its bottom is higher than the paper towel stack.
(f) Cut three pieces of filter paper that are large enough to cover the gel and long enough to reach over to the reservoir. These papers serve as the bridge that carries buffer from the reservoir to the gel.
2. Transfer setup:
(a) Stack the paper towels on a clean bench and put three pieces of dry filter paper on top.
(b) Wet two more pieces of filter paper and put them on top of the dry filter paper. Gently roll out any bubbles between the filter paper layers.
(c) Carefully put the membrane on top of the wet filter paper. Gently roll out any bubbles between the membrane and the filter papers.
(d) Put the trimmed gel onto the center of the membrane with the bottom of the gel touching the membrane (i.e., the gel plane that faces down during electrophoresis will be in contact with the membrane), and align the notches of the gel and membrane. Roll out bubbles between the membrane and the gel.
(e) Place three more pieces of pre-wet filter paper on top of the gel and roll out bubbles between the filter paper layers.
(f) Wet the three pieces of paper bridge and place them with one end on top of the stack and the other end in the buffer reservoir. Make sure there are no bubbles between any layers of paper (see Note 10).
(g) Place a 150–200 g object of a size similar to the gel on top of the stack.
(h) Transfer for 15–20 min per mm of gel thickness. It usually takes about 2 h (see Note 11).
3. RNA crosslink: disassemble the transfer stack carefully and rinse the membrane with 1× MOPS gel running buffer to remove residual agarose. Blot off excess liquid and immediately crosslink the membrane. Crosslink the RNA to the membrane with a Stratalinker® UV Crosslinker using the autocrosslink setting (see Note 12).


Air-dry the membrane at room temperature. At this point, the membrane can be subjected to hybridization immediately or stored in a sealed bag between two pieces of filter paper at 4 °C for several months before hybridization.

3.4 Hybridization of DIG-Labeled Probes to the Membrane (See Note 13)

3.4.1 Prehybridization

1. Reconstitute the DIG Easy Hyb Granules: add 64 mL RNase-free water to one bottle of DIG Easy Hyb Granules and stir for 5 min at 37 °C to completely dissolve the granules. DIG Easy Hyb buffer will be used in prehybridization and hybridization. The reconstituted DIG Easy Hyb buffer is stable at room temperature for up to 1 month.
2. For every 100 cm² of membrane, 10–15 mL Hyb buffer should be used for prehybridization. Measure the appropriate amount of Hyb buffer for prehybridization, place it in a clean tube, and pre-warm it in a 68 °C water bath (see Note 14).
3. Put the membrane in a hybridization bag, add the pre-warmed Hyb buffer from the previous step, seal the bag properly, and incubate the membrane in Hyb buffer at 68 °C for at least 30 min with gentle agitation (see Note 15). Prehybridization can last up to several hours as long as the membrane remains wet.
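Hyb buffer volumes scale with membrane area, and the probe amount scales with the hybridization volume. The sketch below combines the prehybridization figure above with the 3.5 mL per 100 cm² given in Subheading 3.4.2 and the 100 ng/mL probe concentration from Note 16. The function name and the 10 cm × 15 cm membrane are hypothetical.

```python
def hyb_plan(membrane_cm2,
             prehyb_ml_per_100cm2=(10.0, 15.0),
             hyb_ml_per_100cm2=3.5,
             probe_ng_per_ml=100.0):
    """Buffer volumes and probe mass scaled to the membrane area."""
    scale = membrane_cm2 / 100.0
    hyb_ml = hyb_ml_per_100cm2 * scale
    low, high = prehyb_ml_per_100cm2
    return {
        "prehyb_ml_range": (low * scale, high * scale),
        "hyb_ml": hyb_ml,
        "probe_ng": hyb_ml * probe_ng_per_ml,
    }


# Hypothetical membrane of 10 cm x 15 cm = 150 cm^2.
plan_hyb = hyb_plan(10.0 * 15.0)
print(plan_hyb)
```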

3.4.2 Hybridization

1. For every 100 cm² of membrane, 3.5 mL Hyb buffer is needed for hybridization. Measure the appropriate amount of Hyb buffer for hybridization, place it in a clean tube, and pre-warm it in a 68 °C water bath (see Note 14).
2. Determine the amount of RNA probe needed (see Note 16) and place it into a microcentrifuge tube with 50 μL RNase-free water. Denature the probe by heating the tube at 85 °C for 5 min and chill on ice immediately.
3. Mix the denatured probe with pre-warmed Hyb buffer by inversion.
4. Remove the prehybridization buffer from the membrane and immediately replace it with the pre-warmed hybridization buffer containing the probe.
5. Seal the bag properly and incubate the membrane in probe-containing Hyb buffer at 68 °C overnight with gentle agitation (see Note 15).
6. The next day, pre-warm the High Stringency Buffer to 68 °C and pour Low Stringency Buffer into an RNase-free container at room temperature, making sure there is enough to cover the membrane.
7. Cut open the hybridization bag, remove the Hyb buffer, and immediately submerge the membrane in the Low Stringency Buffer.


8. Wash the membrane twice in Low Stringency Buffer at room temperature for 5 min each time with shaking.
9. Wash the membrane twice in High Stringency Buffer at 68 °C for 5 min each time with shaking (see Note 17).

3.5 Detection of DIG-Probe/Target RNA Hybrids

3.5.1 Localizing the Probe-Target Hybrid with Anti-DIG Antibody

1. Transfer the membrane from the last wash in High stringency buffer to a plastic container with 100 mL Washing buffer. Incubate for 2 min at room temperature and discard the Washing buffer. 2. Add 100 mL Blocking buffer onto the membrane and incubate for more than 30 min (up to 3 h) with shaking at room temperature. 3. Dilute anti-DIG-alkaline phosphatase antibody at the ratio of 1:5000 in Blocking buffer and incubate the membrane in 20 mL diluted antibody for 30 min at room temperature with shaking. 4. Wash membrane twice with 100 mL of Washing buffer for 15 min each time at room temperature.
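The antibody volume for step 3 follows directly from the dilution ratio. The short sketch below treats 1:5000 as one part antibody in 5000 parts final volume, which is an assumption about how the ratio is meant; the function name is hypothetical.

```python
def antibody_volume_ul(final_ml, dilution_factor=5000):
    """Microliters of antibody stock for a 1:dilution_factor working solution."""
    return final_ml * 1000.0 / dilution_factor


# 20 mL of diluted antibody per membrane, as in step 3.
antibody_ul = antibody_volume_ul(20.0)
print(antibody_ul)
```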

3.5.2 Visualizing Probe-Target Hybrids Using Chromogenic or Chemiluminescent Method (See Note 18)

1. Equilibrate the membrane in 20 mL Detection buffer for 3 min at room temperature. If using the chromogenic method, prepare the color substrate solution while equilibrating the membrane. 2. For chromogenic detection: (a) Put the membrane with the RNA side facing up in a container and incubate in 10 mL color substrate solution in the dark without shaking. (b) When the desired intensity for the band is observed, discard the color substrate solution and rinse the membrane in 50 mL of TE buffer for 5 min (see Note 19). (c) Document the result by photographing the membrane (see Note 20). 3. For chemiluminescent detection: (a) Put the membrane with the RNA side facing up on a plastic sheet (i.e., cut out of a hybridization bag) and add 20 drops of CDP-Star, Ready-to-Use reagent. (b) Immediately cover the membrane with another sheet to evenly distribute the reagent without creating any bubbles. (c) Incubate for 5 min at room temperature. (d) Squeeze out excess reagent and seal the bag. (e) Develop the membrane with an X-ray film in a dark room (see Note 20).


4 Notes

1. The desired DNA template should be a plasmid containing a promoter for in vitro transcription (e.g., T7 or T3) and a target sequence whose 5′ end is placed as close as possible to the 3′ end of the promoter. We usually use pBluescript SK(+) and transcribe the target sequence using T7 polymerase. Minimize any unnecessary addition of non-lncRNA sequence into the plasmid to avoid improper RNA folding.
2. Northern blot analysis is a gold standard in RNA detection and analysis. Many protocols have been developed by laboratories specialized in RNA research and by companies. The protocol described here is adapted from the NorthernMax procedure from Invitrogen and the DIG application manual for filter hybridization from Roche. In our hands, this protocol is time efficient and gives satisfactory results without using radioactivity.
3. Always cast the gel in a fume hood, as the denaturing solution contains formaldehyde. Solidified gels can be wrapped and stored at 4 °C overnight.
4. Do not let the gel soak in running buffer for more than 1 h before loading.
5. Load no more than 30 μg total RNA in each lane. As the binding capacity of the membrane is limited, loading more RNA does not guarantee a stronger signal. Overloading can lead to the detection of minor degradation products of the target RNAs.
6. If the total volume of sample and dye exceeds the capacity of the wells, concentrate the RNA by precipitation and resuspend the pellet in a smaller volume of water before adding the loading dye.
7. Use a heat block instead of a water bath to avoid contaminating the samples with water.
8. The voltage is determined by the distance between the two electrodes (not the size of the gel). Usually, the run takes about 2 h. If the run is longer than 3 h, exchange the buffer between the two end chambers to avoid a pH gradient.
9. RNA gels stained with ethidium bromide are not suitable for northern blot analysis. Therefore, if a visual examination or photograph of total RNA samples is needed as a reference for the northern blot, we suggest either running the same set of samples on a separate gel that is stained with ethidium bromide only for visualization, or de-staining the gel before continuing with the northern blot analysis. If a gel will be subjected to northern analysis after UV visualization, avoid prolonged exposure of the gel to UV light.


10. It is essential that the only path for the transfer buffer from the reservoir to the dry paper stack is through the gel. Therefore, extra care is needed to assemble the stack properly and avoid a shortcut. The most common shortcut occurs between the bridge and the paper beneath the gel. Covering the edges of the gel with Parafilm prevents this.
11. Transfer longer than 4 h may cause hydrolysis of small RNAs and reduce yield.
12. The autocrosslink mode of the Stratalinker® UV Crosslinker delivers a preset exposure of 1200 microjoules (×100; i.e., 120 mJ/cm²) to the membrane and takes about 40 s. Other methods of crosslinking RNA to the membrane are available and can be used at this step as well.
13. Once the membrane is wet during prehybridization, it is important to keep it from drying out during the hybridization and detection process. A dried membrane will have high background. The membrane can be dried after the last high stringency wash and stored at 4 °C for future analysis only if it will not be stripped and reprobed.
14. For most northern blot hybridizations using DIG Easy Hyb buffer, 68 °C is appropriate for both prehybridization and hybridization. When more heterologous RNA probes are used, the prehybridization and hybridization temperatures need to be optimized.
15. Prehybridization/hybridization can be performed in containers other than bags, as long as they can be tightly sealed. Sealing the hybridization container prevents the release of NH4, which changes the pH of the buffer, during incubation.
16. For RNA probes synthesized by in vitro transcription, the recommended probe concentration is 100 ng per mL of Hyb buffer.
17. If the probe is less than 80% homologous to the target RNA, the high stringency wash should be performed at a lower temperature, which needs to be determined empirically.
18. The DIG probe-target RNA hybrids can be detected in two ways: by a chemiluminescent method or by a chromogenic method. The chemiluminescent method is sensitive and fast, but it requires film and access to a darkroom. The chromogenic method requires no film or darkroom, and different targets can be detected simultaneously using different colored substrates. However, the chromogenic method may not be sensitive enough for low-abundance targets.
19. At this step, if there are multiple membranes, process one at a time. Depending on the abundance of the target RNAs, the band may appear within a few minutes after adding the chromogenic agents. The reaction can be stopped when the band reaches the desired intensity.
20. If reprobing is needed, photograph the result while the membrane is wet and proceed to stripping and reprobing. If no reprobing is needed, dry the membrane, document the result by photograph, and store the dried membrane in a clean bag at room temperature.
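The transfer time in Subheading 3.3 and the 4 h ceiling in Note 11 can be combined into a quick planning check. This is a minimal sketch; the function name is hypothetical, and treating the 4 h limit as a hard cap is an assumption.

```python
def transfer_window_min(gel_thickness_mm, min_per_mm=(15, 20), max_min=240):
    """Transfer time range in minutes, capped at 4 h (see Note 11)."""
    low, high = (rate * gel_thickness_mm for rate in min_per_mm)
    return min(low, max_min), min(high, max_min)


# A 6 mm gel, as cast in Subheading 3.2.1.
window = transfer_window_min(6)
print(window)
```

For a 6 mm gel this gives 90–120 min, consistent with the "about 2 h" stated in the transfer step.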

Acknowledgments

This work was supported, in whole or in part, by the NIH (R01CA142776, R01CA190415, R01CA225929, R01CA262070), Foundation for Women’s Cancer (XH), and the Ovarian Cancer Research Alliance (XH).

References

1. Rinn JL, Chang HY (2012) Genome regulation by long noncoding RNAs. Annu Rev Biochem 81:145–166
2. Lee JT (2012) Epigenetic regulation by long noncoding RNAs. Science 338:1435–1439
3. Prensner JR, Chinnaiyan AM (2011) The emergence of lncRNAs in cancer biology. Cancer Discov 1:391–407
4. Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs. Nature 482:339–346
5. Spizzo R, Almeida MI, Colombatti A, Calin GA (2012) Long non-coding RNAs and cancer: a new frontier of translational research? Oncogene 31:4577–4587
6. Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A et al (2012) Landscape of transcription in human cells. Nature 489:101–108
7. Pachnis V, Brannan CI, Tilghman SM (1988) The structure and expression of a novel gene activated in early mouse embryogenesis. EMBO J 7:673–681
8. Bartolomei MS, Zemel S, Tilghman SM (1991) Parental imprinting of the mouse H19 gene. Nature 351:153–155
9. Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R et al (1991) A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349:38–44
10. Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP et al (2002) Large-scale transcriptional activity in chromosomes 21 and 22. Science 296:916–919
11. Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S et al (2003) The transcriptional activity of human chromosome 22. Genes Dev 17:529–540
12. Rinn JL, Kertesz M, Wang JK, Squazzo SL, Xu X, Brugmann SA et al (2007) Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129:1311–1323
13. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ et al (2010) Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464:1071–1076
14. Guttman M, Amit I, Garber M, French C, Lin MF, Feldser D et al (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458:223–227
15. Khalil AM, Guttman M, Huarte M, Garber M, Raj A, Rivea Morales D et al (2009) Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci U S A 106:11667–11672
16. Maeda N, Kasukawa T, Oyama R, Gough J, Frith M, Engstrom PG et al (2006) Transcript annotation in FANTOM3: mouse gene catalog based on physical cDNAs. PLoS Genet 2:e62
17. Jia H, Osak M, Bogu GK, Stanton LW, Johnson R, Lipovich L (2010) Genome-wide computational identification and manual annotation of human long noncoding RNA genes. RNA 16:1478–1487
18. Orom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G et al (2010) Long noncoding RNAs with enhancer-like function in human cells. Cell 143:46–58
19. Prensner JR, Iyer MK, Balbin OA, Dhanasekaran SM, Cao Q, Brenner JC et al (2011) Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nat Biotechnol 29:742–749
20. Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A et al (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25:1915–1927
21. Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22:1775–1789
22. Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J et al (1992) The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71:527–542
23. Zhao J, Sun BK, Erwin JA, Song JJ, Lee JT (2008) Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322:750–756
24. Jeon Y, Lee JT (2011) YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146:119–133
25. Tsai MC, Manor O, Wan Y, Mosammaparast N, Wang JK, Lan F et al (2010) Long noncoding RNA as modular scaffold of histone modification complexes. Science 329:689–693
26. Yap KL, Li S, Munoz-Cabello AM, Raguz S, Zeng L, Mujtaba S et al (2010) Molecular interplay of the noncoding RNA ANRIL and methylated histone H3 lysine 27 by polycomb CBX7 in transcriptional silencing of INK4a. Mol Cell 38:662–674
27. Kotake Y, Nakagawa T, Kitagawa K, Suzuki S, Liu N, Kitagawa M et al (2011) Long non-coding RNA ANRIL is required for the PRC2 recruitment to and silencing of p15(INK4B) tumor suppressor gene. Oncogene 30:1956–1962
28. Huarte M, Guttman M, Feldser D, Garber M, Koziol MJ, Kenzelmann-Broz D et al (2010) A large intergenic noncoding RNA induced by p53 mediates global gene repression in the p53 response. Cell 142:409–419
29. Grote P, Wittler L, Hendrix D, Koch F, Wahrisch S, Beisaw A et al (2013) The tissue-specific lncRNA Fendrr is an essential regulator of heart and body wall development in the mouse. Dev Cell 24:206–214
30. Kino T, Hurt DE, Ichijo T, Nader N, Chrousos GP (2010) Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3:ra8
31. Hung T, Wang Y, Lin MF, Koegel AK, Kotake Y, Grant GD et al (2011) Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters. Nat Genet 43:621–629
32. Tripathi V, Ellis JD, Shen Z, Song DY, Pan Q, Watt AT et al (2010) The nuclear-retained noncoding RNA MALAT1 regulates alternative splicing by modulating SR splicing factor phosphorylation. Mol Cell 39:925–938
33. Bernard D, Prasanth KV, Tripathi V, Colasse S, Nakamura T, Xuan Z et al (2010) A long nuclear-retained non-coding RNA regulates synaptogenesis by modulating gene expression. EMBO J 29:3082–3093
34. Tripathi V, Shen Z, Chakraborty A, Giri S, Freier SM, Wu X et al (2013) Long noncoding RNA MALAT1 controls cell cycle progression by regulating the expression of oncogenic transcription factor B-MYB. PLoS Genet 9:e1003368
35. Salmena L, Poliseno L, Tay Y, Kats L, Pandolfi PP (2011) A ceRNA hypothesis: the Rosetta stone of a hidden RNA language? Cell 146:353–358
36. Kim TK, Hemberg M, Gray JM, Costa AM, Bear DM, Wu J et al (2010) Widespread transcription at neuronal activity-regulated enhancers. Nature 465:182–187
37. Wang D, Garcia-Bassets I, Benner C, Li W, Su X, Zhou Y et al (2011) Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474:390–394
38. Melo CA, Drost J, Wijchers PJ, van de Werken H, de Wit E, Oude Vrielink JA et al (2013) eRNAs are required for p53-dependent enhancer activity and gene transcription. Mol Cell 49:524–535
39. 
Lai F, Orom UA, Cesaroni M, Beringer M, Taatjes DJ, Blobel GA et al (2013) Activating RNAs associate with mediator to enhance chromatin architecture and transcription. Nature 494:497–501

Chapter 14

Assessment of In Vivo siRNA Delivery in Cancer Mouse Models

Lingegowda S. Mangala, Cristian Rodriguez-Aguayo, Emine Bayraktar, Nicholas B. Jennings, Gabriel Lopez-Berestein, and Anil K. Sood

Abstract

RNA interference (RNAi) has rapidly become a powerful tool for target discovery and therapeutics. Small interfering RNAs (siRNAs) are highly effective in mediating sequence-specific gene silencing. However, the major obstacle to using siRNAs for cancer therapeutics is their systemic delivery from the administration site to target cells in vivo. This chapter describes approaches to deliver siRNA effectively for cancer treatment and discusses in detail the current methods to assess pharmacokinetics and biodistribution of siRNAs in vivo.

Key words siRNA Delivery, Ovarian Cancer, Chitosan-siRNA Nanoparticles, Cancer Therapy, Stem-loop RT-PCR

1 Introduction

Classical analyses of gene function are performed by generating knockout (KO) mouse models and observing a phenotype [1]. Although gene KO offers a powerful means of discovering disease-related genes such as oncogenes in vivo, clinically relevant therapeutic strategies require the development of drugs such as small-molecule compounds or antibodies. However, these approaches have limitations [2]. Small-molecule inhibitors are frequently associated with undesirable toxicities, and antibodies are useful only for targets that are accessible in the circulation or located on the surface of target cells. Since the discovery of RNA interference (RNAi) [3] and the application of small interfering RNA (siRNA) to silence desired target genes [4], siRNA has become an alternative technology for analyzing gene function and discovering drug targets. Because siRNAs can inhibit the expression of any gene of interest, this technology can be used to target previously undruggable genes. Hence, the use of siRNA is attractive for cancer therapy.

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_14, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Despite their promise, several hurdles must be overcome before siRNA can be used successfully in the clinic. siRNA is easily degraded in the bloodstream by ribonucleases (RNases), is eliminated by renal excretion, and cannot readily cross the cell membrane because of its large molecular weight, high hydrophilicity, and negative charge [5]. Thus, effective siRNA delivery systems are needed for this approach to be successful. Many groups have developed siRNA delivery systems for cancer using a variety of formulations such as liposomes, polymers, or micelles [5–7]. The physical properties of a delivery system, such as size, shape, and surface charge, are critical factors for delivery of nanoparticles to tumors after systemic administration. It is well established that long-circulating nanoparticles with an average diameter of 100 nm accumulate efficiently in tumor tissues via the enhanced permeability and retention (EPR) effect, which arises because tumor vessels are irregularly shaped, defective, leaky, and variable in size compared with normal capillaries [8, 9]. Intratumoral mobility of nanoparticles may be affected by the elevated interstitial fluid pressure and soluble factors in solid tumors, the stromal cell population, and the density of the extracellular matrix. For example, polymeric micelles 30 nm in diameter penetrated stroma-rich pancreatic tumors, whereas 70 nm micelles did not [10]. After being taken up by target cells through endocytosis, siRNAs need to be released from endosomes into the cytosol. These sequential steps, from the administration site to the cytosol of target cells, should be considered when developing siRNA delivery systems for cancer treatment [2, 5, 7]. Importantly, siRNAs must be delivered effectively to tumors to exert a therapeutic effect. Therefore, determining the pharmacokinetic profile of administered siRNA in the body is important for the clinical development of siRNA medicines.

Here, we describe in vivo siRNA delivery in orthotopic ovarian cancer (OvCa) models using chitosan-siRNA nanoparticles [11, 12] and quantification of siRNAs by stem-loop quantitative reverse transcription (qRT)-PCR and fluorescence-based assays [13].

2 Materials

2.1 Commercial Reagents

1. Chitosan (CH), low molecular weight (Sigma-Aldrich).
2. Sodium tripolyphosphate (TPP; Sigma-Aldrich).
3. Glacial acetic acid (Thermo Scientific).
4. Molecular biology grade water (Lonza).
5. A nonsilencing fluorescent siRNA sequence tagged with Cy5.5 (Sigma-Aldrich).


6. Human ovarian cancer cell lines: SKOV3, HeyA8, and A2780 (ATCC).
7. Phosphate-buffered saline (PBS), 1× without calcium and magnesium (Corning).
8. Roswell Park Memorial Institute-1640 (RPMI; Sigma-Aldrich) medium supplemented with 15% fetal bovine serum (FBS; Invitrogen) and 0.1% gentamycin (Corning).
9. Trypsin solution (0.25%) and ethylenediamine tetraacetic acid (EDTA, 1 mM; Sigma-Aldrich).
10. Optimal cutting temperature compound (OCT compound; Miles, Inc., Elkhart, IN).
11. Acetone (Fisher Scientific).
12. Pap Pen (Abcam).
13. Hoechst 33342 trihydrochloride, trihydrate (Thermo Scientific).
14. Fluorescent mounting medium: glycerol-propyl gallate in PBS.
15. Hanks' Balanced Salt Solution (HBSS) (Mediatech, Inc.).
16. TRIzol (Life Technologies).
17. Direct-zol RNA Kit (Zymo Research).
18. Verso cDNA Synthesis Kit (Thermo Scientific).
19. TaqMan miRNA assays (Applied Biosystems).
20. 2× Fast SYBR Green Master Mix (Applied Biosystems).

2.2 Equipment

1. Centrifuges, 5417R and 5810R (Eppendorf).
2. A homogenizer, TISSUE MASTER 125 (OMNI International, Kennesaw, GA, USA), for homogenization of tissue samples.
3. A spectrophotometer, NanoDrop 2000c (Thermo Scientific), for RNA quantification.
4. A thermal cycler, Mastercycler pro (Eppendorf), for RT-PCR.
5. A real-time thermal cycler, 7500 Fast Real-Time PCR System (Applied Biosystems), for real-time PCR.
6. Fluorescent microscope (Nikon NIS Elements).
7. In vivo imaging system, IVIS 200 system (Xenogen), for ex vivo imaging of siRNAs.

3 Methods

3.1 Preparation of siRNA–Chitosan Nanoparticles (siRNA/CH-NPs)

Chitosan (CH) is a linear polysaccharide composed of randomly distributed β-linked D-glucosamine and N-acetyl-D-glucosamine. CH is biodegradable and biocompatible, with low immunogenicity and low toxicity, which makes it an attractive tool for clinical and biological applications [14–16]. Because of its protonated amino groups, CH can be loaded with negatively charged nucleic acids, and siRNA/CH-NPs can interact effectively with the cell membrane. Therefore, we developed CH nanoparticles to deliver siRNA into tumors [11, 12]. siRNA/CH-NPs are prepared by ionic gelation of anionic TPP and siRNA with cationic CH.

1. Prepare 0.25% acetic acid by dissolving 0.25 mL of glacial acetic acid in 99.75 mL of water.
2. Obtain CH solution by dissolving CH (2 mg/mL) in 0.25% acetic acid.
3. Prepare 0.25% TPP by dissolving 0.25 g of TPP in 100 mL of water.
4. Mix siRNA (1 μg/μL) with TPP (0.25% w/v) solution and add it slowly, dropwise, to the CH solution under constant stirring at room temperature for spontaneous generation of siRNA/CH-NPs.
5. After incubating at 4 °C for 40 min, collect siRNA/CH-NPs by centrifugation at 20,817 × g for 50 min at 4 °C, and discard the supernatant.
6. Wash the pellet three times with water by centrifuging at 20,817 × g for 50 min at 4 °C to remove unbound chemicals or siRNA.
7. Dissolve the pellet in water (5 μg/100 μL) and store siRNA/CH-NPs at 4 °C until use (see Note 1).
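The centrifugation steps above are given as relative centrifugal force (× g), whereas many benchtop centrifuges are set in rpm. The following is a minimal Python sketch of the standard conversion, RCF = 1.118 × 10^-5 × r × rpm^2 with r in cm. The rotor radius of 9.5 cm used in the example is an illustrative assumption, not a value stated in this protocol; substitute the radius given for your own rotor.

```python
import math

# Standard relation between relative centrifugal force (x g), rotor radius
# (cm), and rotor speed (rpm): RCF = 1.118e-5 * radius_cm * rpm**2
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Return the relative centrifugal force (x g) reached at a given speed."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Return the rotor speed (rpm) needed to reach a given RCF (x g)."""
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    # ASSUMPTION: 9.5 cm is a hypothetical rotor radius chosen for illustration.
    radius_cm = 9.5
    print(f"20,817 x g corresponds to about {rpm_from_rcf(20817, radius_cm):.0f} rpm")
```

Because RCF scales with the square of the speed, a small error in rpm has a proportionally larger effect on the force applied to the pellet.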

3.2 Delivery of siRNA/CH-NPs into Ovarian Cancer Cells

1. Cell culture.
(a) Maintain ovarian cancer HeyA8 cells in log phase of growth by culturing in RPMI-1640 medium supplemented with 15% FBS and 0.1% gentamycin at 37 °C in 5% CO2/95% air.
(b) Use 60–80% confluent cells for in vivo delivery studies (see Note 2).
2. Preparation of cells for in vitro transfection with siRNA/CH-NPs.
(a) Wash the cells with PBS twice and detach using trypsin-EDTA solution. Neutralize the trypsin by adding FBS-containing RPMI-1640 medium (complete medium). Count the cells using trypan blue and seed in a six-well tissue culture plate. Cells should be plated 24 h prior to transfection.
(b) After 24 h, replace the complete medium with serum-free RPMI-1640 medium, and slowly add Cy5.5-labeled nonsilencing control siRNA/CH-NPs dropwise onto the cells without using transfection reagent (see Note 3). After 6 h, replace the serum-free medium with complete medium. Visualize delivery of siRNA/CH-NPs into cells after 24 h using a fluorescent microscope (Fig. 1).

Fig. 1 Uptake of Cy5.5 siRNA/CH-NPs by HeyA8 cells: fluorescence microscopy image of HeyA8 cells after incubation with 1 μg Cy5.5 siRNA/CH-NPs at 37 °C for 24 h in RPMI-1640 medium. (a) Cells alone, (b) Cy5.5 siRNA alone, and (c) merge of Cy5.5 siRNA and cells. Magnification: 200×

3.3 Development of Orthotopic In Vivo Models of Ovarian Cancer

1. Female athymic nude mice (NCr-nu; 8–12 weeks old) were obtained from Taconic Biosciences. All studies were approved and supervised by the University of Texas M.D. Anderson Cancer Center Institutional Animal Care and Use Committee.
2. Maintain human ovarian cancer cells such as SKOV3ip1, HeyA8, or A2780 in RPMI-1640 complete medium for ten or fewer passages prior to injection into mice (see Note 2).
3. Detach the cells with trypsin and neutralize using complete medium, then centrifuge at 300 × g for 5 min at 4 °C and wash the cell pellet twice with PBS.
4. Resuspend cells in HBSS at a concentration of 1.25 × 10^6 (HeyA8) or 5 × 10^6 (SKOV3ip1 or A2780) cells per mL.
5. Inject cells (0.25 × 10^6 (HeyA8) or 1 × 10^6 (SKOV3ip1 or A2780) per 200 μL) per mouse into the peritoneal cavity by intraperitoneal (i.p.) injection using 30G needles.
6. After tumors have been established (once tumors reach 0.5–1.0 cm^3 by palpation, approximately 17 days after tumor cell injection), inject Cy5.5 siRNA/CH-NPs (5.0 μg/100 μL per mouse) into the tail vein as an intravenous (i.v.) bolus under normal pressure.
7. Harvest tumors and other organs at various time points, place the tissue in OCT, and freeze in liquid nitrogen (see Note 4).
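The cell numbers in steps 4 and 5 follow from one multiplication: cells per mouse equal the suspension density times the 200 μL injection volume. The short Python sketch below reproduces that arithmetic and estimates how much suspension to prepare for a cohort. The function names and the 20% overage for syringe dead volume are illustrative assumptions, not part of the protocol.

```python
# Suspension densities (cells per mL of HBSS) taken from step 4 above.
CELLS_PER_ML = {"HeyA8": 1.25e6, "SKOV3ip1": 5e6, "A2780": 5e6}
INJECTION_VOLUME_ML = 0.2  # 200 uL per mouse, from step 5 above


def cells_per_mouse(cell_line: str) -> float:
    """Number of cells delivered in one 200 uL i.p. injection."""
    return CELLS_PER_ML[cell_line] * INJECTION_VOLUME_ML


def suspension_needed(cell_line: str, n_mice: int, excess: float = 0.2):
    """Return (volume_mL, total_cells) to prepare for n_mice injections.

    ASSUMPTION: `excess` is a hypothetical overage fraction (default 20%)
    to cover dead volume; the protocol does not specify one.
    """
    volume_ml = n_mice * INJECTION_VOLUME_ML * (1 + excess)
    return volume_ml, volume_ml * CELLS_PER_ML[cell_line]


if __name__ == "__main__":
    for line in CELLS_PER_ML:
        volume, cells = suspension_needed(line, n_mice=10)
        print(f"{line}: {cells_per_mouse(line):.2e} cells/mouse, "
              f"prepare {volume:.1f} mL ({cells:.2e} cells) for 10 mice")
```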


3.4 Determination of Uptake of Fluorescent siRNA/CH-NPs in Ovarian Tumors

1. Fix OCT-frozen tumor sections (8 μm thickness) with fresh cold acetone in a Coplin staining jar for 10 min and wash three times with PBS for 5 min each.
2. Align the slides on a slide holder and place in a humidified chamber (for hydration). Draw a circle around the tissue with a Pap pen and add a drop of PBS around the tissue (see Note 5).
3. Stain the nuclei with Hoechst 33342 (diluted 1:10,000 in PBS) for 10 min at RT.
4. Wash tissues three times with PBS, mount using mounting medium (glycerol-propyl gallate), and apply a coverslip.
5. Visualize the uptake of siRNA/CH-NPs with a conventional fluorescence microscope using appropriate filters, such as a blue filter for nuclei and a red filter for fluorescent siRNA (Fig. 2).

Fig. 2 Uptake of siRNA following a single intravenous injection of Cy5.5 siRNA/CH-NPs in orthotopic HeyA8 tumor-bearing nude mice. Tumor tissues were collected after injection of untagged control siRNA/CH or Cy5.5 siRNA/CH-NPs and stained with Hoechst 33342 (blue denotes nuclei and red denotes Cy5.5 siRNA). Magnification: (a) 100×, (b) 200×, and (c) 400×

3.5 RNA Isolation from Blood, Plasma, and Tissue

1. Sample preparation from blood.
(a) At different time points, place mice under anesthesia using isoflurane and collect blood from the inferior vena cava or by cardiac puncture into RNase-free cryotubes using 25-G needles.
(b) Mix blood (typically 200 μL) with three volumes of TRIzol (600 μL) and vortex the mixture (see Note 6).
(c) Centrifuge the mixture at 12,000 × g for 10 min at 4 °C and process the supernatant for total RNA isolation.
2. Sample preparation from plasma.
(a) Store blood in RNase-free tubes, each containing 84.3 μL K2EDTA per mL of blood.
(b) Mix blood thoroughly with the anticoagulant by immediately inverting the tube 10 times (see Note 7).
(c) Centrifuge the mixture at 12,000 × g for 10 min at 4 °C, which results in the formation of three layers (from top to bottom): plasma, leukocytes (buffy coat), and erythrocytes. Carefully aspirate the supernatant (plasma) into a tube (it can be stored at −80 °C until further use).
(d) Mix plasma (typically 200 μL) with three volumes of TRIzol (600 μL).
(e) Centrifuge the mixture at 12,000 × g for 10 min at 4 °C and process the resulting supernatant for total RNA isolation.
3. Sample preparation from tissues.
(a) Collect tissues, such as tumor, brain, lung, heart, liver, spleen, and kidney, in RNase-free cryotubes, and snap-freeze them in liquid nitrogen. Store the tissues at −80 °C until use.
(b) Just prior to use, thaw the tissue on ice. Transfer the weighed tissue (typically 50–100 mg) into 5 mL polystyrene round-bottom tubes with 750 μL of TRIzol (see Note 8).
(c) Homogenize the tissue with a homogenizer (see Note 9).
(d) Centrifuge the tissue homogenate at 12,000 × g for 10 min at 4 °C and process the resulting supernatant for total RNA isolation.
4. Isolate total RNA from the supernatant of blood, plasma, or tissue using the Direct-zol RNA Kit according to the manufacturer's protocol.
5. Quantify RNA using the NanoDrop.

3.6 siRNA Quantification in Blood and Tissue by Stem-Loop qRT-PCR

Stem-loop qRT-PCR can be used to quantify small RNA fragments (e.g., miRNA) [17]. First, a miRNA-specific stem-loop RT primer is hybridized to the miRNA and reverse transcribed to produce cDNA. Next, the cDNA is amplified by regular real-time PCR using a miRNA-specific forward primer and the universal reverse primer. The stem-loop qRT-PCR method can also be applied to siRNA, for which it offers higher sensitivity and selectivity and a wider dynamic range of detection than other methods such as enzyme-linked immunosorbent assay (ELISA) or high-performance liquid chromatography (HPLC) [18]. Therefore, stem-loop qRT-PCR can be used to quantify administered siRNA in blood and tissues. The stem-loop primer was designed individually for each siRNA sequence. The forward primer for PCR amplification was designed based on the siRNA sequence, and a universal reverse primer (5′-GACCTGTCCGATCACGACGAG-3′) was used.

1. Prepare standard siRNA by serial twofold dilutions of siRNA with RNase-free water.


2. Prepare a standard blood sample by adding 5 ng siRNA to 200 μL of naïve blood or plasma obtained from an untreated mouse.
3. Subject the siRNA/blood or siRNA/plasma mixture to RNA isolation as described in Subheading 3.5.
4. Perform stem-loop PCR with TaqMan miRNA assays on RNA isolated from blood and tissues (1–10 μg) and on standard siRNAs (~2.5 pg), according to the manufacturer's protocol. Combine 10 μL of total RNA/siRNA with 5 μL of stem-loop PCR master mix and 2 pmol stem-loop PCR primer, then perform stem-loop PCR using the Mastercycler pro. Conditions for the stem-loop PCR reaction are as follows: 16 °C for 30 min, 42 °C for 30 min, 85 °C for 5 min, then hold at 4 °C.
5. Mix 1.3 μL of cDNA with 10 μL of PCR amplification reaction mix (2× Fast SYBR Green Master Mix) and 20 pmol of the forward and reverse primer set in a final volume of 20 μL.
6. Perform PCR amplification using the 7500 Fast Real-Time PCR System. Conditions for the amplification reaction are as follows: 95 °C for 15 min for enzyme activation, then 40 cycles of 95 °C for 15 s (denaturation) and 60 °C for 1 min (annealing/extension).
7. Calculate the amount of siRNA in the blood sample using the standard curve (Fig. 3a).
8. The siRNA concentration in blood collected at different time points, C(t), can be expressed as the amount of siRNA per mL of blood.
9. Perform pharmacokinetic analysis as follows. C(t) can be fitted with an appropriate equation for a one- or two-compartment model using software such as GraphPad Prism, MULTI, or other appropriate programs (Fig. 3b) (see Note 10).
One compartment: $C(t) = A e^{-\alpha t}$
Two compartment: $C(t) = A e^{-\alpha t} + B e^{-\beta t}$
Calculate the area under the curve (AUC0-t) of blood concentration by integrating C(t) up to a given time point:
$\mathrm{AUC}_{0-t} = \int_{0}^{t} C(t)\,dt$
10. Calculate levels of siRNA in various organs using the standard siRNA curve (Fig. 3c).

Fig. 3 Stem-loop PCR for siRNA quantification. (a) Standard curve showing a plot of log amount of siRNA vs. Ct values quantified using the qRT-PCR method. The equation derived from this plot is used for calculating the absolute amount of siRNA. (b) Time profile of plasma siRNA concentration. Closed squares represent the measured concentration of siRNA in blood; the line represents the concentration curve calculated by fitting the data to a two-compartment model using the MULTI program. (c) siRNA amount in organs, calculated using the siRNA standard curve shown in (a)

3.7 Fluorescence-Based Biodistribution Study

In vivo imaging has become an important tool for the development of drug delivery systems. Near-infrared (NIR) fluorescence allows simultaneous acquisition of full-color white-light images and NIR images, deeper penetration of the NIR signal, and decreased tissue autofluorescence compared with visible light [19]. Therefore, labeling nanoparticles with NIR fluorophores such as Cy5.5 or DiR allows determination of their amount in organs ex vivo and noninvasive evaluation of their biodistribution in vivo after administration.

1. Administer Cy5.5-labeled siRNA-loaded nanoparticles i.v. into tumor-bearing mice at a dose of 2.5 μg siRNA.
2. Excise tumor and organs at different time points (typically 24 or 48 h), wash with cold PBS, and transfer into a 6-well plate.
3. Capture fluorescent images of the excised tumor and organs using the Xenogen IVIS 200 system with Cy5.5 excitation (678 nm) and emission (703 nm) filters.
4. Analyze fluorescent images using Living Image 2.5 software. Regions of interest can be drawn for each organ, and total radiant efficiency (p s⁻¹ μW⁻¹ cm²) is measured (Fig. 4).

Fig. 4 Fluorescence-based images of Cy5.5-labeled siRNA in organs. The levels of Cy5.5 siRNA were measured in various organs at 48 h after administration of nanoparticles in tumor-bearing mice. The scale bar represents the fluorescence intensity in p s⁻¹ mW⁻¹ cm²

3.8 Quantifying Levels of Gene Knockdown Using Quantitative Reverse Transcription PCR (qRT-PCR)

1. Reverse transcribe the isolated RNA (500–1000 ng) from tumor tissue using a Verso cDNA Synthesis Kit as per the manufacturer's instructions using a Mastercycler pro.
2. Subject 2 μL of diluted cDNA (typically 2–10-fold dilution) to PCR amplification as described in Subheading 3.6.
3. Perform PCR using the 7500 Fast Real-Time PCR System. Each cycle consists of 15 s of denaturation at 95 °C and 1 min of annealing and extension at 60 °C (40 cycles).
4. Quantify relative levels of gene expression using the ΔΔCt method.
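The ΔΔCt calculation in step 4 can be sketched in a few lines of Python. This is a minimal illustration of the commonly used 2^(−ΔΔCt) form, which assumes roughly 100% amplification efficiency for both assays. The choice of reference gene, the control group (for example, tumors treated with control siRNA/CH-NPs), and the Ct values in the example are hypothetical and are not taken from this chapter.

```python
def relative_expression(ct_target_treated: float, ct_ref_treated: float,
                        ct_target_control: float, ct_ref_control: float) -> float:
    """Fold change of the target gene (treated vs. control) by 2^-ddCt.

    Each dCt normalizes the target gene to a reference gene in the same
    sample; ddCt compares the treated sample with the control sample.
    """
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)


def percent_knockdown(fold_change: float) -> float:
    """Convert a fold change (1.0 = no change) into percent knockdown."""
    return (1.0 - fold_change) * 100.0


if __name__ == "__main__":
    # Hypothetical mean Ct values, for illustration only.
    fold = relative_expression(ct_target_treated=26.0, ct_ref_treated=18.0,
                               ct_target_control=24.0, ct_ref_control=18.0)
    print(f"fold change = {fold:.2f}, knockdown = {percent_knockdown(fold):.0f}%")
```

In practice the Ct values entered here would be means of technical replicates, and the calculation would be repeated for each animal before averaging across the group.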

4 Notes

1. Once siRNA/CH-NPs are reconstituted in water, do not freeze and thaw the suspension because this process will rupture the particles. siRNA/CH-NPs should be stored at 4 °C until further use.
2. Use cells at 60–80% confluence for both in vitro and in vivo experiments. Injecting over-confluent cells or cells beyond the tenth passage may lead to lower tumor take rates.


3. Do not use transfection reagent to transfect the cells with siRNA/CH-NPs because CH is cationic in nature due to its amino groups, which enable easy and fast complex formation with negatively charged siRNA.
4. Freezing needs to be indirect: surround a heat block with liquid nitrogen and place the OCT/tissue container on the block.
5. During the immunofluorescence staining procedure, keep the tissues hydrated with PBS, since letting the tissue dry may cause high background.
6. Blood samples should be vortexed immediately after mixing with TRIzol; otherwise the blood solidifies in TRIzol. If storage is necessary prior to use, blood needs to be collected in a tube with an anticoagulant and subsequently stored at −80 °C.
7. Process the samples as soon as possible. If storage is necessary prior to use, store the blood at room temperature, shielded from light.
8. Hard tissues should be cut into small pieces with scissors before adding TRIzol, which results in efficient homogenization.
9. Output power of the homogenizer should be adjusted depending on the softness of the organ. Soft organs such as brain and liver are homogenized at low power to avoid bubbles; hard organs such as spleen and heart are homogenized at high power. Homogenization should be done with tubes immersed in ice-cold water to avoid generating heat.
10. Five or more time points are required for curve fitting of the time profile of siRNA concentration in blood or plasma.
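The quantification and pharmacokinetic steps of Subheading 3.6 (steps 7–9), together with Note 10, can be sketched in Python as follows. This is a simplified illustration, not the authors' analysis: it fits the standard curve (Ct versus log10 amount) by ordinary least squares, and it fits only the one-compartment model, using a log-linear regression in place of the nonlinear fitting performed by programs such as GraphPad Prism or MULTI. A two-compartment fit needs a nonlinear least-squares routine and is not shown. All function names are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_standard_curve(amounts, cts):
    """Fit Ct = slope * log10(amount) + intercept to the siRNA standards."""
    return linear_fit([math.log10(a) for a in amounts], cts)


def amount_from_ct(ct, slope, intercept):
    """Invert the standard curve to get the siRNA amount for a sample Ct."""
    return 10 ** ((ct - intercept) / slope)


def fit_one_compartment(times, concs):
    """Fit C(t) = A * exp(-alpha * t) by linear regression on ln C(t).

    Returns (A, alpha). Requires positive concentrations.
    """
    if len(times) < 5:
        raise ValueError("use five or more time points (see Note 10)")
    slope, intercept = linear_fit(times, [math.log(c) for c in concs])
    return math.exp(intercept), -slope


def auc_one_compartment(a, alpha, t_end):
    """Closed-form integral of A * exp(-alpha * t) from 0 to t_end."""
    return a / alpha * (1.0 - math.exp(-alpha * t_end))


def auc_two_compartment(a, alpha, b, beta, t_end):
    """Closed-form integral of A*exp(-alpha*t) + B*exp(-beta*t), 0 to t_end."""
    return auc_one_compartment(a, alpha, t_end) + auc_one_compartment(b, beta, t_end)
```

A typical use would be to convert each sample Ct to an amount with `amount_from_ct`, divide by the blood volume processed to obtain C(t), and then pass the time course to `fit_one_compartment`. Log-linear regression weights low concentrations more heavily than a direct nonlinear fit does, so the two approaches can give somewhat different parameters on noisy data.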

Acknowledgments

L.S.M. was supported by CPRIT (DP 150091 and RP120214). C.R-A. was supported by the NIH through the Ovarian SPORE Career Enhancement Program and NCI grant P50 CA217685. Portions of this work were supported by NIH grants (P50CA217685, P50CA098258, CA016672, CA209904, UO1 CA213759), the American Cancer Society Research Professor Award, and the Frank McGraw Memorial Chair in Cancer Research. This research was also supported, in part, by the Blanton-Davis Ovarian Cancer Research Program.

References

1. Gondo Y (2008) Trends in large-scale mouse mutagenesis: from genetics to functional genomics. Nat Rev Genet 9:803–810
2. Pecot CV, Calin GA, Coleman RL, Lopez-Berestein G, Sood AK (2011) RNA interference in the clinic: challenges and future directions. Nat Rev Cancer 11:59–67
3. Fire A, Xu S, Montgomery MK, Kostas SA, Driver SE et al (1998) Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans. Nature 391:806–811
4. Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Weber K et al (2001) Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells. Nature 411:494–498
5. Hatakeyama H, Akita H, Harashima H (2011) A multifunctional envelope type nano device (MEND) for gene delivery to tumours based on the EPR effect: a strategy for overcoming the PEG dilemma. Adv Drug Deliv Rev 63:152–160
6. Kanasty R, Dorkin JR, Vegas A, Anderson D (2013) Delivery materials for siRNA therapeutics. Nat Mater 12:967–977
7. Wu SY, Lopez-Berestein G, Calin GA, Sood AK (2014) RNAi therapies: drugging the undruggable. Sci Transl Med 6:240ps247
8. Matsumura Y, Maeda H (1986) A new concept for macromolecular therapeutics in cancer chemotherapy: mechanism of tumoritropic accumulation of proteins and the antitumor agent smancs. Cancer Res 46:6387–6392
9. McDonald DM, Choyke PL (2003) Imaging of angiogenesis: from microscope to clinic. Nat Med 9:713–725
10. Cabral H, Matsumoto Y, Mizuno K, Chen Q, Murakami M et al (2011) Accumulation of sub-100 nm polymeric micelles in poorly permeable tumours depends on size. Nat Nanotechnol 6:815–823
11. Lu C, Han HD, Mangala LS, Ali-Fehmi R, Newton CS et al (2010) Regulation of tumor angiogenesis by EZH2. Cancer Cell 18:185–197
12. Noh K, Mangala LS, Han HD, Zhang N, Pradeep S et al (2017) Differential effects of EGFL6 on tumor versus wound angiogenesis. Cell Rep 21:2785–2795
13. Wu SY, Yang X, Gharpure KM, Hatakeyama H, Egli M et al (2014) 2'-OMe-phosphorodithioate-modified siRNAs show increased loading into the RISC complex and enhanced antitumour activity. Nat Commun 5:3459
14. Han HD, Song CK, Park YS, Noh KH, Kim JH et al (2008) A chitosan hydrogel-based cancer drug delivery system exhibits synergistic antitumor effects by combining with a vaccinia viral vaccine. Int J Pharm 350:27–34
15. Katas H, Alpar HO (2006) Development and characterisation of chitosan nanoparticles for siRNA delivery. J Control Release 115:216–225
16. Zhang HM, Chen SR, Cai YQ, Richardson TE, Driver LC et al (2009) Signaling mechanisms mediating muscarinic enhancement of GABAergic synaptic transmission in the spinal cord. Neuroscience 158:1577–1588
17. Chen C, Ridzon DA, Broomer AJ, Zhou Z, Lee DH et al (2005) Real-time quantification of microRNAs by stem-loop RT-PCR. Nucleic Acids Res 33:e179
18. Cheng A, Li M, Liang Y, Wang Y, Wong L et al (2009) Stem-loop RT-PCR quantification of siRNAs in vitro and in vivo. Oligonucleotides 19:203–208
19. Frangioni JV (2003) In vivo near-infrared fluorescence imaging. Curr Opin Chem Biol 7:626–634

Chapter 15

Genetic Editing of Long Noncoding RNA Using CRISPR/Cas9 Technology

Kristina Larter, Bin Yi, and Yaguang Xi

Abstract

Long noncoding RNAs (lncRNAs) are a class of RNA transcripts greater than 200 nucleotides in length and make up a considerable part of the human genome. LncRNAs are well established as crucial players in a myriad of physiological and pathological processes; however, despite their abundance and versatility, the functional characteristics of lncRNAs remain largely unknown, predominantly due to the lack of suitable genetic editing strategies. The complexity of their genetic structure and regulation, combined with their unique functionality, poses several limitations on the application of classic genetic manipulation methods in lncRNA functional studies. Several reports have demonstrated the successful implementation of CRISPR/Cas9 technology in screening and identifying the function of specific lncRNAs. Here, we describe a detailed protocol utilizing CRISPR/Cas9 genetic editing technology for knocking down lncRNAs in vitro.

Key words Long noncoding RNA, CRISPR/Cas9, Knockdown, Genetic editing

1 Introduction

Long noncoding RNAs (lncRNAs) are defined as transcripts greater than 200 nucleotides in length with minimal to no protein-coding potential and are enriched in the human genome [1]. LncRNAs mediate a myriad of physiological and pathological processes via gene regulation or by the mere act of their transcription [2–4]. Despite their abundance and versatile roles, the genomic loci and functionality of lncRNAs have been difficult to determine until recently, largely due to the challenges associated with manipulating their expression [5]. Some success has been seen using genomic editing tools such as zinc-finger nucleases (ZFNs), transcription activator-like effector nucleases (TALENs), and clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 systems [6–8]. CRISPR/Cas9 technology is generally faster, cheaper, and more efficient than most existing genetic editing tools.

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_15, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Fig. 1 The Cas9-gRNA complex. The guide RNA (gRNA), composed of a crisprRNA (crRNA) and a tracrRNA sequence, forms a duplex with the Cas9 endonuclease and guides it to the designated target site. Cas9 then induces a double-stranded break (DSB) 3–4 nucleotides upstream of the protospacer adjacent motif (PAM), NGG

The CRISPR/Cas9 system was initially identified as a means of bacterial adaptive immunity and has since been exploited as a genomic engineering tool [9]. In essence, the system consists of a Cas9 endonuclease guided by a crisprRNA (crRNA) and a tracrRNA that together form the guide RNA (gRNA) (see Fig. 1). The gRNA sequence is designed to be complementary to a given target genomic sequence and directs the Cas9 endonuclease to this specific locus, where it causes a double-stranded break (DSB) 3–4 nucleotides upstream of the protospacer adjacent motif (PAM), NGG. These DSBs are repaired through either the homology-directed repair (HDR) pathway or the nonhomologous end joining (NHEJ) pathway. This allows the Cas9-gRNA complex to target and cleave genomic regions in a sequence-specific manner [9]. The difficulty in using genome editing tools for functional studies of lncRNAs lies in the fact that genomic regions coding for lncRNAs are distributed over the whole genome, including intra- and intergenic regions [5] (see Fig. 2). It is therefore difficult to selectively target a single lncRNA without disrupting neighboring genes. In addition, a single CRISPR/Cas9 cut site may not be enough to disrupt the function of a lncRNA because of its noncoding nature; two cut sites may be required to delete a larger portion and render the lncRNA nonfunctional. Applying CRISPR methods to investigate lncRNAs therefore requires specific knowledge about their genomic locus and their impact on other genes. In this protocol, we describe a method to overcome the complexities of lncRNA genetic editing and thereby achieve selective knockdown of individual lncRNAs using the CRISPR/Cas9 system in vitro.

Fig. 2 The distribution of lncRNA loci. (a) Intergenic lncRNA between two coding genes. (b, c) Intronic lncRNA lying completely within an intron (b) or partially overlapping an exon (c). (d) Antisense lncRNA. Orange dotted boxes indicate sequence that is unique to the lncRNA and is therefore ideal for designing complementary gRNA
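The targeting rule described above (a protospacer lying immediately 5′ of an NGG PAM, with the cut a few nucleotides upstream of the PAM) can be illustrated with a minimal Python sketch. It is not the design tool used in this protocol: the function name, the 20-nt guide length, the choice of 3 nt for the cut offset, and the example sequence are all assumptions made for illustration, and only the given strand is scanned.

```python
import re

def find_spcas9_sites(seq, guide_len=20):
    """Return candidate target sites on the given strand only.

    Each hit is (protospacer, pam, cut_index). cut_index is the 0-based
    index of the first base 3' of the predicted cut, taken here as 3 nt
    upstream of the PAM (an assumption; the text gives 3-4 nt).
    """
    seq = seq.upper()
    hits = []
    # A lookahead is used so that overlapping NGG motifs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough sequence 5' of the PAM for a full guide
        protospacer = seq[pam_start - guide_len:pam_start]
        hits.append((protospacer, match.group(1), pam_start - 3))
    return hits

# Hypothetical 32-nt sequence, used only to exercise the function.
example = "TTGACCATGCATGCATGCATGCATGCAAGGTT"
for protospacer, pam, cut in find_spcas9_sites(example):
    print(protospacer, pam, cut)
```

A real design would also scan the reverse complement and score off-target matches, which the online tools cited in Subheading 3.1.1 do.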

2 Materials

2.1 CRISPR/Cas9 to lncRNA Construction

1. Lenti-CRISPRv2 (Addgene, 52961).
2. FastDigest BsmBI (Fermentas).
3. FastAP (Fermentas).
4. 10× FastDigest Buffer (Fermentas).
5. DTT (100 mM).
6. ddH2O.
7. GeneJET Gel Extraction Kit (Thermo Scientific).
8. Two oligos including lncRNA target sequence.
9. 10× T4 Ligation Buffer (NEB).
10. T4 PNK (Promega).
11. 2× Quick Ligase Buffer (Promega).
12. Quick Ligase (Promega).
13. JM109 bacteria (Promega).
14. LB Agar plates with ampicillin (50 μg/ml).
15. LB liquid medium with ampicillin (50 μg/ml).
16. GeneJET Plasmid Miniprep Kit (Thermo Scientific).
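Item 8 above calls for two oligos carrying the lncRNA target sequence. As a hedged illustration, the sketch below builds such a pair with the BsmBI-compatible overhangs described in the Zhang lab target guide sequence cloning protocol for lentiCRISPRv2 (top oligo 5′-CACCG-guide-3′, bottom oligo 5′-AAAC-reverse complement-C-3′). The function names and the example guide are hypothetical, and the guide is assumed to be 20 nt without the PAM.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def lenticrispr_oligos(guide):
    """Return (top, bottom) oligos, both written 5'->3', for one guide.

    The overhangs follow the published lentiCRISPRv2 cloning scheme;
    the guide must not include the PAM.
    """
    guide = guide.upper()
    if len(guide) != 20 or set(guide) - set("ACGT"):
        raise ValueError("expected a 20-nt guide made of A, C, G, T")
    top = "CACCG" + guide
    bottom = "AAAC" + reverse_complement(guide) + "C"
    return top, bottom

# Hypothetical guide sequence, for illustration only.
top, bottom = lenticrispr_oligos("GATTACAGATTACAGATTAC")
print(top)
print(bottom)
```

Checking the two strings against each other before ordering is a cheap way to catch a guide that was pasted with its PAM or in the wrong orientation.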


2.2 Transient Transfection of CRISPR/Cas9 and Selection

1. Lipofectamine LTX with PLUS (Invitrogen).
2. Puromycin (Gibco).

2.3 T7EN1 Analysis and DNA Sequencing

1. Wizard Genomic DNA Purification Kit (Promega).
2. GoTaq DNA Polymerase (Promega).
3. GeneJET PCR Purification Kit (Thermo Scientific).
4. 1× NEBuffer 2 (NEB).
5. T7EN1 enzymes (NEB).
6. 0.5 M EDTA.
7. Ethidium bromide.
8. 8% polyacrylamide gel.
9. ChemiDoc™ XRS (Bio-Rad).
10. pGEM-T Easy vectors (Promega).

2.4 RNA Isolation and SYBR-Based qRT-PCR

1. TRIzol reagent (Life Technologies).
2. 1-bromo-3-chloropropane (BCP) (Molecular Research Center).
3. Isopropanol (Sigma-Aldrich).
4. Specific forward and reverse transcription primers for lncRNA.
5. High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems).
6. 2× SYBR master mix (Applied Biosystems).

3 Methods

3.1 CRISPR/Cas9 to lncRNA Construction and Transfection

3.1.1 Design for gRNA Sequences of CRISPR/Cas9 Against lncRNAs

1. Design gRNA using PAMs (NGG) within sequences unique to the lncRNA (see Fig. 2) using online tools (https://zlab.bio/guide-design-resources; see Note 1).

3.1.2 Cloning Specific lncRNA Target Sequence into Lenti-CRISPRv2

The following procedure is based on the Target Guide Sequence Cloning Protocol from Dr. Feng Zhang’s lab [6].

1. Digest lenti-CRISPRv2 with restriction enzyme BsmBI for 30 min at 37 °C.
   5 μg lenti-CRISPRv2.
   3 μl FastDigest BsmBI.
   3 μl FastAP.
   6 μl 10× FastDigest Buffer.
   0.6 μl 100 mM DTT.
   X μl ddH2O (up to 60 μl total).
2. Purify the linearized plasmid using the GeneJET Gel Extraction Kit following the manufacturer’s instructions (see Note 2).
3. Phosphorylate and anneal the two oligos containing the lncRNA target sequence.
   1 μl oligo 1 (100 μM).
   1 μl oligo 2 (100 μM).
   1 μl 10× T4 Ligation Buffer.
   6.5 μl ddH2O.
   0.5 μl T4 PNK.
   Incubate the mixture at 37 °C for 30 min and at 95 °C for 5 min, and then ramp down to 25 °C at 5 °C/min.
4. Make a 1:200 dilution of the mixture from step 3 with ddH2O.
5. Ligate linearized lenti-CRISPRv2 to the oligo duplex from step 4.
   50 ng linearized lenti-CRISPRv2.
   1 μl diluted oligo duplex from step 4.
   5 μl 2× Quick Ligase Buffer.
   1 μl Quick Ligase.
   X μl ddH2O (up to 11 μl total).
   Incubate the mixture at room temperature for 1 h or in a 4 °C cold room overnight.
6. Transform the ligation product into JM109 competent cells.
   (a) Thaw 50 μl JM109 competent cells on ice and add 2 μl of ligation mixture. Incubate the mixture on ice for 20 min.
   (b) Heat-shock the cells for 45 s in a 42 °C water bath. Immediately return the tubes to ice and incubate on ice for at least 2 min.
   (c) Add 500 μl room-temperature LB liquid medium to the heat-shocked transformations and incubate them for 1.5 h at 37 °C with shaking (150 rpm).
   (d) Plate 100 μl of each transformation onto LB Agar plates with ampicillin and incubate the plates overnight at 37 °C.
   (e) Select a single colony and extract plasmids using the GeneJET Plasmid Miniprep Kit following the manufacturer’s instructions.


   (f) Verify the lenti-CRISPRv2 carrying the lncRNA-specific target sequences by DNA sequencing.
7. Transient transfection of CRISPR/Cas9 into cancer cells and selection.
   (a) Determine the lowest concentration of puromycin that kills 100% of nontransfected cancer cells; this concentration is used to select cancer cells transfected with the CRISPR/Cas9 vector targeting the lncRNA (see Note 3).
   (b) Transiently transfect lenti-CRISPRv2 carrying the lncRNA-specific target sequences, or the negative control (GFP target sequences), into cancer cells using the Lipofectamine LTX with PLUS Reagent following the manufacturer’s instructions.
   (c) At 24–48 h after transfection, treat the cells with puromycin (concentration from step 7(a)) for 24 h for selection, and then expand the transfected cancer cells in regular culture medium (see Note 4).

3.2 T7EN1 Analysis and DNA Sequencing

1. Extract genomic DNA from the cancer cells and xenograft tissues using the Wizard Genomic DNA Purification Kit.
2. Amplify DNA fragments spanning the lncRNA target sequence of CRISPR/Cas9 with GoTaq DNA Polymerase and purify the PCR products with the GeneJET PCR Purification Kit.
3. Denature and reanneal 200 ng of purified PCR products.
   2 μl NEBuffer 2.
   200 ng purified PCR products.
   X μl ddH2O (up to 20 μl total).
   Incubate the mixture in a thermocycler with the following steps: 95 °C, 5 min; 95 to 85 °C at 2 °C/s; 85 to 25 °C at 0.1 °C/s; hold at 4 °C.
4. Add 10 U of T7EN1 enzyme to the hybridized PCR products and incubate at 37 °C for 15 min. Then add 2 μl 0.5 M EDTA to stop the reaction.
5. Run 8% polyacrylamide gel electrophoresis (PAGE) to separate the products digested by T7EN1, stain the gel with ethidium bromide, and image it with the ChemiDoc™ XRS (see Note 5).
6. Ligate the PCR products subjected to T7EN1 digestion into pGEM-T Easy vectors for DNA sequencing.
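The band pattern from step 5 can be turned into an approximate editing efficiency. The sketch below uses the estimate given by Ran et al. [9], indel (%) = 100 × (1 − √(1 − f_cut)), where f_cut is the fraction of total band intensity found in the cleaved bands. The function name and the intensity values are hypothetical; real values would come from densitometry of the gel image.

```python
from math import sqrt

def indel_percent(uncut_intensity, cleaved_intensities):
    """Estimate indel frequency (%) from T7EN1 band intensities.

    uncut_intensity: integrated intensity of the undigested PCR band.
    cleaved_intensities: intensities of the cleavage product bands.
    """
    cleaved = sum(cleaved_intensities)
    total = uncut_intensity + cleaved
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    f_cut = cleaved / total
    return 100.0 * (1.0 - sqrt(1.0 - f_cut))

# Hypothetical densitometry readings (arbitrary units).
print(round(indel_percent(700, [200, 100]), 1))
```

The square root accounts for the fact that a heteroduplex forms whenever either reannealed strand carries a mutation, so the cleaved fraction overstates the mutant allele frequency.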

3.3 RNA Isolation and SYBR-Based qRT-PCR

3.3.1 RNA Isolation

1. Lyse the cancer cells in 1 ml TRIzol reagent.
2. Add 200 μl 1-bromo-3-chloropropane (BCP), vortex vigorously, and then incubate the mixture for 5 min at room temperature.
3. Centrifuge at 14,000 rpm (18,407 × g) at 4 °C for 10 min and transfer the upper aqueous phase to a new tube.
4. Add an equal volume of isopropanol (about 500 μl) to the upper aqueous phase for precipitation. Incubate the mixture at room temperature for 10 min.
5. Centrifuge at 14,000 rpm (18,407 × g) at 4 °C for 10 min and wash the pellets with 1 ml 75% ethanol for 15 s. Centrifuge at 5000 rpm (2,348 × g) at 4 °C for 5 min.
6. Air-dry the pellets for 5 min and add an appropriate volume of nuclease-free water for elution (see Note 6).
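Because the rpm values in steps 3 and 5 only apply to one rotor, it can help to convert between rpm and relative centrifugal force for the rotor at hand. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in cm. The 8.4 cm radius is an assumption: it is the value that reproduces the rpm/× g pairs quoted above, not a rotor specification given in this protocol.

```python
from math import sqrt

RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in rpm

def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2

def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a given RCF on a rotor of this radius."""
    return sqrt(rcf / (RCF_CONSTANT * radius_cm))

ASSUMED_RADIUS_CM = 8.4  # inferred from the values in the text, not stated there

print(round(rcf_from_rpm(14000, ASSUMED_RADIUS_CM)))
print(round(rcf_from_rpm(5000, ASSUMED_RADIUS_CM)))
```

For a rotor with a different radius, `rpm_from_rcf(18407, radius_cm)` gives the speed that matches the force used in steps 3 and 5.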

3.3.2 SYBR-Based qRT-PCR Analysis

1. Synthesize cDNA with a high-capacity cDNA reverse transcription kit (Life Technologies). Use forward and reverse primers for the lncRNA, with β-actin as the endogenous control. The 20 μl reverse transcription reaction includes:
   2 μg total RNA.
   2 μM reverse transcription primers.
   2 μl 10× reverse transcription buffer.
   0.8 μl 100 mM dNTP.
   X μl nuclease-free water (up to 20 μl total).
   Incubate the mixture in a thermocycler with the following steps: 25 °C, 10 min; 37 °C, 120 min; 85 °C, 5 min; hold at 4 °C.
2. The 20 μl quantitative real-time PCR reaction includes:
   10 μl 2× SYBR master mix.
   1 μl forward primer (7 μM).
   1 μl reverse primer (7 μM).
   1 μl cDNA.
   7 μl nuclease-free water.
   Perform the real-time PCR for 30 cycles on a 7500 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA). Each cycle includes denaturing for 10 s at 94 °C, and annealing and extension for 30 s at 58 °C. Analyze the relative expression of the target lncRNA using the 2^(−ΔΔCT) method [7].
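The 2^(−ΔΔCT) calculation named in step 2 is short enough to write out. In the sketch below the target is the lncRNA and the reference is β-actin, as in step 1; the function name and the CT values are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_control, ct_reference_control):
    """Fold change of the target in the sample relative to the control.

    delta CT   = CT(target) - CT(reference), computed per condition
    delta delta CT = delta CT(sample) - delta CT(control)
    fold change = 2 ** (-delta delta CT)
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)

# Hypothetical CT values: edited cells versus negative-control cells.
fold = relative_expression(ct_target_sample=27.0, ct_reference_sample=17.0,
                           ct_target_control=24.0, ct_reference_control=17.0)
print(fold)
```

The method assumes that target and reference amplify with roughly equal, near-100% efficiency; if the primer pairs differ markedly, an efficiency-corrected model is the safer choice.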

4 Notes

1. These tools aid in selecting gRNA sequences specific for a designated DNA sequence and can also analyze on- and off-target efficacy. Ensure the recommended gRNAs have less than three mismatches to off-target sequences to prevent nonspecific cleavage. In addition, analyze the sequences surrounding your targeted lncRNA. Make sure the gRNA sequence targets an area unique to the lncRNA and not nearby genes. A single cut may not be adequate to render the lncRNA dysfunctional because of its noncoding nature; therefore, two gRNAs (transfected concurrently) may be needed to cleave a larger portion out of the lncRNA locus. To ensure accurate attribution of the resulting phenotype to the targeted lncRNA in CRISPR-mediated editing experiments, it is recommended to monitor the expression of the surrounding genes or to perform an exogenous rescue.
2. If BsmBI digestion is successful, a ~2 kb band can be seen when the gel is exposed to ultraviolet light. Purify only the larger band for the following steps.
3. To determine the minimum concentration of puromycin required to kill nontransfected cancer cells, the puromycin should be titrated. Plate 5 × 10^4 cells in each well of a 24-well plate. Add medium plus increasing concentrations of puromycin (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 μg/ml). Examine the viable cells after 24 h. Choose the minimum concentration at which cell death is >95%.
4. After puromycin selection for 24 h, replace with fresh medium and monitor the percentage of viable cells daily. The transfected cancer cells should be enriched within 1 week.
5. If the lncRNA is successfully cleaved at the target site by CRISPR/Cas9, multiple site-specific bands should be detected.
6. Do not dry the RNA pellet completely because this will reduce its solubility.
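The selection rule in Note 3 (take the lowest puromycin concentration at which cell death exceeds 95%) can be applied to titration results as follows. This is only a sketch: the function name is hypothetical and the death fractions are invented placeholder values, not measurements from this protocol.

```python
def minimum_selecting_dose(kill_curve, min_death_fraction=0.95):
    """Return the lowest dose whose death fraction exceeds the threshold.

    kill_curve maps puromycin concentration (ug/ml) to the fraction of
    dead cells after 24 h. Returns None if no dose passes the threshold.
    """
    for dose in sorted(kill_curve):
        if kill_curve[dose] > min_death_fraction:
            return dose
    return None

# Invented example readings for a nontransfected cell line.
example_curve = {0: 0.00, 1: 0.35, 2: 0.70, 3: 0.92, 4: 0.97, 5: 1.00}
print(minimum_selecting_dose(example_curve))
```

A result of `None` means the titration range was too low for that cell line and should be extended.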

References

1. Geisler S, Coller J (2013) RNA in unexpected places: long non-coding RNA functions in diverse cellular contexts. Nat Rev Mol Cell Biol 14(11):699–712. https://doi.org/10.1038/nrm3679
2. Wang KC, Chang HY (2011) Molecular mechanisms of long noncoding RNAs. Mol Cell 43(6):904–914. https://doi.org/10.1016/j.molcel.2011.08.018
3. Quinn JJ, Chang HY (2016) Unique features of long non-coding RNA biogenesis and function. Nat Rev Genet 17(1):47–62. https://doi.org/10.1038/nrg.2015.10
4. Kornienko AE, Guenzl PM, Barlow DP, Pauler FM (2013) Gene regulation by the act of long non-coding RNA transcription. BMC Biol 11:59. https://doi.org/10.1186/1741-7007-11-59
5. Awwad DA (2019) Beyond classic editing: innovative CRISPR approaches for functional studies of long non-coding RNA. Biol Methods Protoc 4(1):bpz017. https://doi.org/10.1093/biomethods/bpz017
6. Gutschner T (2015) Silencing long noncoding RNAs with genome-editing tools. Methods Mol Biol 1239:241–250. https://doi.org/10.1007/978-1-4939-1862-1_13
7. Hansmeier NR, Widdershooven PJM, Khani S, Kornfeld JW (2019) Rapid generation of long noncoding RNA knockout mice using CRISPR/Cas9 technology. Noncoding RNA 5(1):12. https://doi.org/10.3390/ncrna5010012
8. Zhen S, Hua L, Liu YH, Sun XM, Jiang MM, Chen W, Zhao L, Li X (2017) Inhibition of long non-coding RNA UCA1 by CRISPR/Cas9 attenuated malignant phenotypes of bladder cancer. Oncotarget 8(6):9634–9646. https://doi.org/10.18632/oncotarget.14176
9. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F (2013) Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8(11):2281–2308. https://doi.org/10.1038/nprot.2013.143

Chapter 16

Characterization of Circular RNAs

Yang Zhang, Li Yang, and Ling-Ling Chen

Abstract

Accumulated lines of evidence have revealed that a large number of circular RNAs are produced in transcriptomes from fruit fly to mouse and human. Unlike linear RNAs, which carry a 5′ cap and a 3′ tail, circular RNAs are characterized by covalently closed loop structures without open termini and thus require specific treatments for their identification and validation. Here, we describe a detailed pipeline for the characterization of circular RNAs. It has been successfully applied to the study of circular intronic RNAs (ciRNAs) derived from intron lariats and circular RNAs (circRNAs) produced from back-spliced exons in human.

Key words Circular RNAs, ciRNAs, circRNAs, RNA fractionation, RNase R, Northern blot, PAGE, Divergent PCR

1 Introduction

Single-stranded circular RNA molecules were first observed by electron microscopy in plant viroids [1], yeast mitochondrial RNAs [2], and hepatitis δ virus [3]. Later, a handful of circular RNAs processed from splicing were identified by Northern blots and/or RT-PCRs in higher eukaryotes, including human and mouse [4–6]. Because of their low abundance and the scrambled order of their exon–exon joining, these circular RNAs were considered at that time to be aberrant splicing by-products with little functional potential. Nevertheless, recent studies have revealed important regulatory roles of circular RNAs in both physiological and pathological conditions, including neuronal function [7–9], innate immune response [10–12], and cell proliferation [13–17], and circular RNAs accumulate during aging [18–20] (for reviews, see [21] and [22]).

The application of next generation sequencing in transcriptome analyses (mRNA-seq) has provided unprecedented insights into, and quantitative measurements of, gene expression, alternative splicing, and so on [23]. However, because they form covalently closed structures, circular RNAs lack canonical 3′ polyadenylation and thus fall below the radar of most canonical polyadenylated (linear) transcriptome profiling. Nonpolyadenylated RNAs enriched from total RNAs by depleting polyadenylated RNAs and ribosomal RNAs allowed the identification of new RNA transcripts in specific intron and exon regions via high-throughput sequencing [24]. Of note, many of these previously uncharacterized transcripts were circular RNAs produced from either introns [25] or exons ([26–32], see review [33]).

So far, two types of circular RNAs derived from primary RNA polymerase II transcripts have been reported in higher eukaryotes. One group is produced from intronic regions by escaping debranching after splicing, resulting in circular intronic RNAs (ciRNAs) with a 2′,5′-phosphodiester bond between the 5′ splice donor site and the branchpoint site [25]. The other group is generated from back-spliced exons, where a downstream 5′ splice site is joined to an upstream 3′ splice site to form circular RNAs (circRNAs) with a 3′,5′-phosphodiester bond between back-spliced exons [26–32]. Importantly, these two groups of circular RNAs have been successfully recapitulated with specific expression vectors [11, 12, 25, 27–29, 34]. Currently, ten thousand circular RNAs have been identified in metazoans from fruit fly to mouse and human [18, 27, 30–32, 34–36], greatly expanding the complexity of transcriptomes and the diversity of noncoding RNAs.

It is worth noting that special attention is required for the enrichment and detection of circular RNAs because of their intrinsic circular structures. In this chapter, we describe an integrated pipeline for circular RNA characterization (Fig. 1), from the fractionation of nonpolyadenylated RNAs, to the enrichment of circular RNAs by RNase R digestion, and then to the validation of their existence by Northern blots and divergent PCRs. This pipeline provides a standard lab protocol to characterize circular RNAs and has been successfully applied to the study of ciRNAs [25] and circRNAs [11, 12, 18, 26–32, 34–36] in human.

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_16, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021

2 Materials

All solutions/recipes are prepared from analytical grade chemicals with deionized DEPC-treated water. All reagents are RNase-free. Sterilized reagents are aliquoted and stored at room temperature for immediate use or at −20 °C for long-term storage. 1.5 ml RNase-free microcentrifuge tubes are purchased from Crystalgen (catalog number L-2507). 15 ml RNase-free centrifuge tubes are from Nest (catalog number 601052). 50 ml RNase-free centrifuge tubes are from Corning (catalog number 430829). Equipment required for polyacrylamide gel electrophoresis and electrophoretic transfer is from Bio-Rad (catalog numbers 165-8001 and 170-3930, respectively).


Fig. 1 A diagram of circular RNA fractionation, enrichment, and validation. See text for details

2.1 RNA Fractionations

All reagents and recipes needed in this step have been described previously in great detail [37].

2.2 RNase R Digestion

1. Ribonuclease R (RNase R): Epicentre, catalog number RNR07250, lot number RNR40620.
2. Phenol–chloroform–isoamyl alcohol (25:24:1, v/v): Life Technologies, catalog number 15593-031.
3. 4 M LiCl solution: weigh 3.3912 g LiCl (Sigma-Aldrich, catalog number L9650-500G) and transfer to a 15 ml RNase-free centrifuge tube, add DEPC-treated water to 10 ml. Mix thoroughly, and filter through a 0.22 μm Millex-GP Syringe Filter Unit (Millipore, catalog number SLGP05010). Split into small aliquots and store at −20 °C.
4. Glycogen, RNA grade: Thermo Scientific, catalog number R0551.

2.3 Validation of Circular RNAs

2.3.1 RNA Deep-Sequencing and Genome-Wide Analysis of Circular RNAs

1. 75% ethanol (v/v): transfer 30 ml absolute ethanol to a 50 ml RNase-free centrifuge tube and add 10 ml DEPC-treated water. Mix well and store at −20 °C.


2.3.2 Denaturing Urea–Polyacrylamide Gel Electrophoresis (Urea PAGE)

1. Urea: AMRESCO, catalog number 037-1KG.
2. 30% acrylamide: Sangon Biotech, catalog number SD6017.
3. 10× TBE Electrophoresis Buffer: weigh 108 g Tris base (Sigma-Aldrich, catalog number 15456-3) and 55 g boric acid (Sigma-Aldrich, catalog number B6768-500G), transfer to a beaker with 40 ml 0.5 M EDTA (pH 8.0), and add DEPC-treated water to 1 l. Filter through a 0.22 μm Millex-GP Syringe Filter Unit, and store at room temperature. 1× TBE buffer is diluted from 10× TBE buffer with DEPC-treated water.
4. 0.5 M EDTA, pH 8.0: weigh 186.1 g Na2EDTA·2H2O (Sigma-Aldrich, catalog number E5134-250G), transfer to a beaker with 500 ml DEPC-treated water, adjust pH to 8.0 with 10 M NaOH, and add DEPC-treated water to 1 l. Filter through a 0.22 μm Millex-GP Syringe Filter Unit, and store at room temperature.
5. 10% (w/v) ammonium persulfate (APS): dissolve 0.1 g APS (Sigma-Aldrich, catalog number A3678-25G) with DEPC-treated water to 1 ml. Store at 4 °C for up to a few weeks.
6. TEMED (N,N,N′,N′-tetramethylethylenediamine): Bio-Rad, catalog number 161-0801.
7. 2× urea loading buffer: 8 M urea, 90% formamide (Sigma-Aldrich, catalog number F9037-100ML), 20 mM EDTA, 0.1% (w/v) xylene cyanol, bromophenol blue. Store at 4 °C.
8. 50× Transfer Buffer: weigh 30.285 g Tris base, 17.02 g sodium acetate trihydrate (Sigma-Aldrich, catalog number 236500-500G), and 9.306 g EDTA, transfer to a beaker with 10 ml acetate, and add DEPC-treated water to 500 ml. Filter through a 0.22 μm Millex-GP Syringe Filter Unit, and store at room temperature. 1× transfer buffer is diluted from 50× transfer buffer with DEPC-treated water.
9. RiboMAX™ Large Scale RNA Production Systems: Promega, catalog number P1280 (SP6), P1300 (T7), for RNA probe synthesis.
10. Dig RNA Labeling Mix: Roche, catalog number 11277073910.
11. DNA-free™ Kit, DNase Treatment and Removal Reagents: Ambion, catalog number AM1906.
12. Dig Easy Hyb Granules: Roche, catalog number 11796895001.
13. 20× SSC: weigh 175 g NaCl (Sigma-Aldrich, catalog number S9625-1KG) and 88 g sodium citrate dihydrate (Sigma-Aldrich, catalog number W302600), transfer to a beaker with 500 ml DEPC-treated water, adjust pH to 7.0 with 1 M HCl, and add DEPC-treated water to 1 l. Filter through a 0.22 μm Millex-GP Syringe Filter Unit, and store at room temperature. 2× SSC and 0.2× SSC are diluted from 20× SSC with DEPC-treated water.
14. 10% (w/v) SDS: dissolve 100 g SDS (Sigma-Aldrich, catalog number L3771-1KG) with DEPC-treated water to 1 l, filter through a 0.22 μm Millex-GP Syringe Filter Unit, and store at room temperature.
15. DIG Wash and Block Buffer Set: Roche, catalog number 11585762001.
16. Anti-digoxigenin-AP, Fab fragments: Roche, catalog number 11093274910.
17. CDP-Star, ready-to-use solution: Roche, catalog number 12041677001.

2.3.3 Divergent RT-PCR

1. SuperScript™ III Reverse Transcriptase: Invitrogen, catalog number 18080.
2. Random hexamers: TaKaRa, catalog number RR037A.
3. dNTP mixture: TaKaRa, catalog number T4030.
4. Recombinant RNasin® Ribonuclease Inhibitor: Promega, catalog number N2511.
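When the stock solutions in Subheading 2.3.2 are prepared at a different scale, the masses follow from mass = molarity × volume × molecular weight. The sketch below illustrates this for two of the recipes. The function names are hypothetical, and the molecular weights (372.24 g/mol for Na2EDTA·2H2O, 121.14 g/mol for Tris base) are standard values supplied here as assumptions rather than taken from the protocol.

```python
def grams_needed(molarity, volume_l, molecular_weight):
    """Mass (g) of solute for a solution of the given molarity and volume."""
    return molarity * volume_l * molecular_weight

def molarity_from_mass(mass_g, volume_l, molecular_weight):
    """Molarity (mol/l) obtained by dissolving mass_g in volume_l."""
    return mass_g / molecular_weight / volume_l

MW_NA2EDTA_2H2O = 372.24  # g/mol, standard value (assumption)
MW_TRIS_BASE = 121.14     # g/mol, standard value (assumption)

# 0.5 M EDTA, 1 l: compare with the 186.1 g given in the recipe.
print(round(grams_needed(0.5, 1.0, MW_NA2EDTA_2H2O), 1))

# Tris in the 50x transfer buffer: 30.285 g made up to 500 ml.
print(round(molarity_from_mass(30.285, 0.5, MW_TRIS_BASE), 3))
```

Running a recipe through `molarity_from_mass` before weighing is a quick consistency check on any transcribed quantity.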

3 Methods

3.1 Fractionation of Nonpolyadenylated RNAs from Mammalian Cells

Ribominus RNAs (ribo– RNAs) are enriched from DNase I-treated total RNAs by depleting most of the redundant ribosomal RNAs with the RiboMinus™ Human/Mouse Transcriptome Isolation Kit (Invitrogen™, catalog number K1550-01) as previously described [37]. In addition, fractionation of nonpolyadenylated and ribominus RNAs (poly(A)–/ribo– RNAs, poly(A)– RNAs for short) from total RNAs by removal of poly(A)+ RNA transcripts and ribosomal RNAs [37] gives a cleaner background for circular RNA analysis (see Note 1).

3.2 RNase R Digestion (See Note 2)

1. Dissolve fractionated ribo– RNAs (or poly(A)– RNAs) obtained from 20 μg total RNAs in Subheading 3.1 in 52 μl DEPC-treated water. Mix well.
2. Split the RNAs into two aliquots in new 1.5 ml RNase-free microcentrifuge tubes: one for RNase R digestion and the other as a control with digestion buffer only.
3. For RNase R digestion, add 3 μl 10× RNase R Reaction Buffer and 1 μl RNase R (20 U/μl); for the control, add 3 μl 10× RNase R Reaction Buffer and 1 μl DEPC-treated water. Mix well and quick-spin the tubes for a few seconds. Incubate the samples at 37 °C for 1–2 h according to the manufacturer’s instructions (see Note 3).
4. After incubation, add 30 μl of phenol–chloroform–isoamyl alcohol to stop the exonuclease digestion. Vortex the tubes vigorously, and spin the tubes in a microcentrifuge at 13,000 × g at 4 °C for 5 min.
5. Carefully remove the upper aqueous layers and transfer them to new 1.5 ml RNase-free microcentrifuge tubes, then add 6 μl 4 M LiCl, 1 μl glycogen, and 90 μl prechilled absolute ethanol (−20 °C). Mix well by inverting the tubes several times, and incubate the tubes at −80 °C for 1 h to precipitate RNAs. These RNA samples (RNase R treated or untreated) can be stored at −80 °C until use. This is a good stopping point in the process (see Note 4).
6. Pellet the RNAs by spinning at 13,000 × g at 4 °C for 20 min. Remove the supernatants, wash the RNA pellets twice with 700 μl prechilled 75% (v/v) ethanol, and air-dry.
7. Resuspend the RNA pellets in 20 μl DEPC-treated water.

3.3 Validation of Circular RNAs

3.3.1 High-Throughput Sequencing Analysis of Circular RNAs

1. Isolated ribo– RNAs (or poly(A)– RNAs) and/or their RNase R-treated counterparts can be used directly for RNA-seq library preparation according to the manufacturer’s instructions, and then subjected to deep sequencing [37].
2. Specific pipelines can be used for genome-wide identification of ciRNAs [25] and/or circRNAs [38, 39] by retrieving junction reads for circular RNAs.
3. Circular RNAs can be individually visualized in a genome browser with uploaded track files, as indicated in Fig. 2 (Fig. 2(a) ci-ankrd52 and (b) circCAMSAP1).
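The idea of retrieving junction reads in step 2 can be shown with a toy Python example. It is not one of the published pipelines cited above, which align reads to the genome; it only builds the back-splice junction sequence for a known circularized exon and counts reads that contain it. The function names, the flank length, and all sequences are hypothetical.

```python
def backsplice_junction(circ_exon_seq, flank=10):
    """Sequence spanning the back-splice site of a circularized exon.

    The 3' end of the exon (downstream 5' splice site) is joined to its
    own 5' end (upstream 3' splice site).
    """
    seq = circ_exon_seq.upper()
    if len(seq) < flank:
        raise ValueError("exon shorter than the requested flank")
    return seq[-flank:] + seq[:flank]

def count_junction_reads(reads, circ_exon_seq, flank=10):
    """Count reads containing the full junction (flank nt on each side)."""
    junction = backsplice_junction(circ_exon_seq, flank)
    return sum(junction in read.upper() for read in reads)

# Hypothetical 30-nt circularized exon and three toy reads.
exon = "ATGGCCAAGTTTCCCGGGAAATTTCATCGA"
reads = [
    "GGTTCATCGAATGGCCAAGT",  # crosses the back-splice junction
    "ATGGCCAAGTTTCCCGGG",    # colinear with the exon only
    "AAATTTCATCGA",          # reaches the 3' end but does not cross
]
print(count_junction_reads(reads, exon, flank=8))
```

Requiring a minimum flank on both sides of the junction is what separates genuine back-splice support from reads that merely end at an exon boundary.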

3.3.2 Northern Blots with Denaturing Polyacrylamide Gel Electrophoresis (PAGE)

1. Prepare the gel
   (a) Assemble the gel plates according to the manufacturer’s instructions, and fix the gel plates in the gel-casting apparatus.
   (b) Prepare the appropriate denaturing polyacrylamide gel solution by mixing 8 M urea, 10× TBE, and 30% acrylamide with DEPC-treated water (see Note 5). 10 ml of gel solution is enough for a denaturing polyacrylamide gel of 10 cm × 7.5 cm × 1 mm. Dissolve the mixture by rotation and filter the solution through a 0.22 μm Millex-GP Syringe Filter Unit. Sequentially add 100 μl 10% APS and 8 μl TEMED, and mix thoroughly. Immediately pour the gel solution between the gel plates and insert the comb. Wait for about 30 min to let the gel polymerize.


Fig. 2 Visualization of two types of circular RNAs with RNA-seq. (a) An example of ciRNAs (circular intronic RNAs), ci-ankrd52, from the second intron of the ANKRD52 gene. Deep sequencing signals from poly(A)+ (black), poly(A)– (red), and poly(A)–/RNase R (purple) samples are shown. The predicted ci-ankrd52 is indicated in pink. (b) An example of circRNAs from back-spliced exons, circCAMSAP1, from the CAMSAP1 locus. Deep sequencing signals from poly(A)+ (black), poly(A)– (red), and poly(A)–/RNase R (purple) samples are shown. The predicted circCAMSAP1 is indicated in blue

2. Prepare the samples
   (a) Meanwhile, prepare the RNA samples. Total RNAs (Subheading 3.1) are digested with RNase R (Subheading 3.2), cleaned up with phenol, precipitated with ethanol, and resuspended in DEPC-treated water. The final RNA concentration/amount is determined by measuring UV absorption with a spectrophotometer (OD reading at 260 nm). To get better signals for circular RNAs, similar amounts of RNAs with or without RNase R treatment are used for Northern blots (see Note 6).
   (b) Add 2× urea loading buffer to the RNA samples (a total volume of 10 μl is recommended to get sharp bands), for example RNAs with or without RNase R treatment as in Subheading 3.2. Boil the RNA samples at 100 °C for 5 min, and chill on ice immediately.
3. Load the samples and run the gel
   (a) Dismount the gel from the casting chamber and assemble the gel electrophoresis apparatus according to the manufacturer’s instructions.
   (b) Remove the comb and fill the chamber with 1× TBE electrophoresis buffer (see Note 7).
   (c) Load the denatured RNA samples and run the gel at 120 V for 3 h (see Note 8).
4. RNA transfer and fixation
   (a) Disassemble the gel electrophoresis apparatus and assemble the gel transfer “sandwich” in a transfer blot. Place the bottom of the gel face against the nylon membrane and remove any bubbles between the gel and the nylon membrane. Transfer the RNA from the gel to the membrane in 1× transfer buffer at 100 V for 90 min.
   (b) Disassemble the gel transfer apparatus. Immobilize the RNAs by UV cross-linking in an Ultraviolet Crosslinker (UVP, CL-1000, 254-nm wavelength) for the appropriate length of time (usually 180 mJ/cm2 is used) (see Note 9).
5. Prepare RNA probe
   (a) Prepare Dig (Digoxigenin)-labeled RNA probes according to the manufacturer’s instructions (RiboMAX™ Large Scale RNA Production Systems, Promega). Briefly, mix DNA template (with T7 or SP6 promoter), 10× Dig labeling mixture, 5× transcription buffer (T7 or SP6), and enzyme mixture (T7 or SP6) with DEPC-treated water. Mix gently by pipetting and incubate the reaction at 37 °C for 2–3 h.
   (b) Add 1 μl DNase I and mix well. Incubate at 37 °C for 15 min to remove the DNA template.
   (c) To precipitate the RNA probe, sequentially add 4 μl 4 M LiCl, 100 μl DEPC-treated water, and 300 μl prechilled absolute ethanol (−20 °C), mix well by inverting the tubes several times, and incubate the tubes at −20 °C for 1 h.
   (d) Spin the tubes at 13,000 × g at 4 °C for 15 min. Remove the supernatants, wash the RNA pellet twice with 700 μl cold 75% (v/v) ethanol (−20 °C), and air-dry.
   (e) Dissolve the RNA probes by adding 40 μl DEPC-treated water (usually 8–10 μg of RNA probe can be obtained).
6. Hybridization
   (a) Prehybridize the membrane with DIG Easy Hyb (3 ml/100 cm2) for 30 min at 68 °C with gentle rotation.
   (b) Denature the DIG-labeled RNA probe by heating at 100 °C for 5 min, and immediately place the tube on ice.
   (c) Discard the prehybridization buffer, and add fresh prewarmed DIG Easy Hyb containing the denatured DIG-labeled RNA probe (100 ng/ml). Incubate overnight with gentle rotation.
7. Immunological detection
   The following steps follow the protocol for the DIG Wash and Block Buffer Set and are all performed at room temperature (unless indicated otherwise).
   (a) Wash the membrane twice with 2× SSC, 0.1% SDS for 5 min with gentle rotation.
   (b) Wash the membrane twice with 0.2× SSC, 0.1% SDS at 68 °C for 30 min with gentle rotation.


   (c) After the stringency washes, briefly rinse the membrane in 1× Washing buffer.
   (d) Incubate the membrane in 1× Blocking solution for 30 min in an appropriate container with gentle agitation.
   (e) Incubate the membrane in Antibody Solution for 30 min with gentle agitation.
   (f) Wash the membrane three times with 1× Washing buffer for 20 min.
   (g) Incubate in Detection buffer for 5 min.
   (h) Place the membrane (RNA side up) on a development folder, add CDP-Star ready-to-use solution (1 ml per 100 cm2 of membrane), and incubate for 2–5 min.
   (i) Expose to X-ray film for an appropriate time. Multiple exposures can be taken to achieve appropriate signals. Examples of Northern blots showing circular RNA migration in denaturing PAGE are given in Fig. 3 (Fig. 3(a) ci-ankrd52, (b) circCAMSAP1).

3.3.3 RT-PCR with Divergent Primers

1. First strand cDNA synthesis
   (a) Before first strand cDNA synthesis, ribo– RNAs (Subheading 3.1) are digested with RNase R (Subheading 3.2), cleaned up with phenol, precipitated with ethanol, and resuspended in DEPC-treated water. The final RNA concentration/amount is determined by measuring UV absorption with a spectrophotometer (OD reading at 260 nm). To get better signals for circular RNAs, similar amounts of ribo– RNAs with or without RNase R treatment are used for RT-PCR validation (see Note 6). Prepare the first strand cDNA according to the manufacturer’s instructions (SuperScript™ III Reverse Transcriptase, Invitrogen). Briefly, mix RNA samples, random hexamers (100 μM), and dNTP (2.5 mM each) with DEPC-treated water. Heat at 65 °C for 5 min, and immediately chill on ice for at least 1 min.
   (b) Briefly spin the tubes, then add 5× First-Strand Buffer, 0.1 M DTT, RNasin, and SuperScript™ III RT. Pipet gently, and incubate the reaction at 25 °C for 5 min.
   (c) Incubate at 50 °C for 60 min, and inactivate the reaction by heating at 70 °C for 15 min.
2. Circular RNA divergent primer PCR
   (a) Unlike linear RNAs, circular RNAs can be amplified by divergent primers from ribo– RNAs both with and without RNase R treatment. Examples of semiquantitative RT-PCR results for circular RNAs are shown in Fig. 4 (Fig. 4(a) ci-ankrd52, (b) circCAMSAP1).
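The logic behind divergent primers can be checked in silico before ordering them. In the sketch below, a primer pair is located on the sequence of the circularized region: if the forward primer lies downstream of the reverse primer site, the pair faces outward on the linear template and can only yield a product across the back-splice junction. The function names and all sequences are hypothetical, and the sketch assumes each primer matches exactly once and that the two sites do not overlap.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence containing only A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def amplicon_on_circle(circ_seq, forward, reverse):
    """Classify a primer pair and give the expected product length.

    circ_seq is the circularized region written 5'->3' from its first
    to its last nucleotide; both primers are written 5'->3'.
    """
    seq = circ_seq.upper()
    fwd_start = seq.find(forward.upper())
    rev_start = seq.find(reverse_complement(reverse))
    if fwd_start < 0 or rev_start < 0:
        raise ValueError("primer not found in the circular sequence")
    rev_end = rev_start + len(reverse)
    if fwd_start < rev_start:
        # Primers face each other: product also forms on linear RNA.
        return "convergent", rev_end - fwd_start
    # Primers face away from each other: product must cross the junction.
    return "divergent", (len(seq) - fwd_start) + rev_end

# Hypothetical 30-nt circularized exon and primer pairs.
circ = "ATGGCCAAGTTTCCCGGGAAATTTCATCGA"
print(amplicon_on_circle(circ, "ATTTCATC", "AAACTTGG"))
print(amplicon_on_circle(circ, "CCAAGTTT", "GATGAAAT"))
```

The divergent product length is the distance from the forward primer to the 3′ end of the region plus the distance from the 5′ end to the end of the reverse primer site, which is the band size to expect on the gel.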


Fig. 3 Validation of circular RNAs by Northern blots. (a) Northern blot of ci-ankrd52. RNase R treated or untreated total RNAs from H9 cells are loaded on 5% denaturing urea PAGE for Northern blot with a Dig-labeled antisense RNA probe (pink bar) as previously reported [25]. Note that ci-ankrd52 remains stable after RNase R treatment and migrates slowly on the denaturing urea PAGE. (b) Northern blot of circCAMSAP1. RNase R treated or untreated total RNAs from H9 cells are loaded on 5% denaturing urea PAGE for Northern blot with a Dig-labeled antisense RNA probe (blue bar) as previously reported [27]. Note that circCAMSAP1 remains stable after RNase R treatment and migrates slowly on the denaturing urea PAGE. *Asterisk, linear mRNAs
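For the 5% denaturing urea PAGE used in Fig. 3, the component amounts for the gel solution in Subheading 3.3.2 can be worked out as follows. This is a sketch under stated assumptions: a final TBE concentration of 1× is assumed (the protocol refers to Note 5 for the recipe), the molecular weight of urea is taken as 60.06 g/mol, APS and TEMED are scaled from the 100 μl and 8 μl per 10 ml given in the text, and the function name is hypothetical.

```python
UREA_MW = 60.06  # g/mol, standard value (assumption)

def urea_page_recipe(total_ml, acrylamide_percent, stock_percent=30.0,
                     urea_molar=8.0, tbe_stock_x=10.0, tbe_final_x=1.0):
    """Component amounts for a denaturing urea-polyacrylamide gel.

    DEPC-treated water is then added to total_ml; its volume is not
    computed because dissolved urea takes up part of the final volume.
    """
    return {
        "urea_g": urea_molar * (total_ml / 1000.0) * UREA_MW,
        "acrylamide_stock_ml": total_ml * acrylamide_percent / stock_percent,
        "tbe_stock_ml": total_ml * tbe_final_x / tbe_stock_x,
        "aps_10pct_ul": 10.0 * total_ml,  # 100 ul per 10 ml, from the text
        "temed_ul": 0.8 * total_ml,       # 8 ul per 10 ml, from the text
    }

for name, amount in urea_page_recipe(10.0, 5.0).items():
    print(name, round(amount, 2))
```

Changing `acrylamide_percent` gives the stock volume for other gel percentages without altering the urea or buffer amounts.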

4

Notes 1. Ribominus RNAs (ribo– RNAs) is enriched from total RNAs by depleting ribosomal RNAs, which contain both poly (A) + and poly(A) - RNAs. However, the fractionation of poly (A)/ribo- RNAs (poly(A)- RNAs for short) removes both poly(A) + RNAs and ribosomal RNAs, which shows a much cleaner background for circular RNA analysis. We thus prefer to use poly(A)- RNAs (with or without further RNase R digestion) for RNA-seq analyses. 2. RNase R is a magnesium-dependent 30 ! 50 exoribonuclease that digests linear RNAs and Y-structure RNAs, while preserving circular RNAs [40, 41]. The digestion of RNase R can be

Characterization of Circular RNAs


Fig. 4 Validation of circular RNAs by divergent PCRs. (a) PCR validation of ci-ankrd52 with a divergent primer set (pink arrows) as previously reported [25]. Primers for linear mRNA are indicated as black arrows. Note that ci-ankrd52 remains stable after RNase R treatment, while the linear mRNA is largely degraded with RNase R treatment. (b) PCR validation of circCAMSAP1 with a divergent primer set (blue arrows) as previously reported [27]. Primers for linear mRNAs are indicated as black arrows. Note that circCAMSAP1 remains stable after RNase R treatment, while the linear mRNA is largely degraded with RNase R treatment

further improved by replacing K+ with Li+, which enables the digestion of highly expressed and structured linear RNAs, allowing better annotation of circular RNAs [41]. Besides RNase R, tobacco acid pyrophosphatase and terminator exonuclease can also efficiently degrade linear RNAs, while leaving circular RNAs largely intact [28].
3. RNase R activity may vary from batch to batch. Thus, checking the amount of enzyme and the incubation time for your particular experiment is highly recommended before carrying out further sequencing. For example, we have observed that circular RNAs can also be degraded during RNase R incubation (Fig. 5). In this case, some RNase R-sensitive circular RNAs could be lost with prolonged RNase R incubation. From the time course of the RNase R treatment (Fig. 5), we consider a 1-h incubation with this batch of RNase R sufficient for circular RNA validation by Northern blots. Meanwhile, it is worth noting that a short RNase R digestion may lead to only partial digestion of linear RNAs, which may in turn result in unwanted linear RNA signals when such samples are applied to RNA-seq analyses (data not shown).


Fig. 5 Time course of the RNase R treatment. Ribo– RNAs from 10 μg total RNAs in either wild-type (a) or transfected (b) cells are incubated with 1 μl RNase R at 37 °C for 0, 5, 15, 30, 60, or 120 min, and then analyzed by Northern blot on either a native agarose gel for endogenous circCAMSAP1 (a) or denaturing PAGE for overexpressed circPOLR2A (b). The probes are the same ones as previously reported [27]. Note that linear RNAs (asterisks) are largely degraded within a short time of RNase R incubation, and circular RNAs are also reduced during the prolonged incubation

4. RNA samples should be used immediately or stored at −80 °C for later use. RNA precipitated in ethanol can be stored at −80 °C for years.
5. Although we have applied denaturing urea PAGE to detect some longer circular RNAs [25], such PAGE is usually used to analyze RNAs shorter than 500 nt. In addition, due to their covalently closed loop structure, circular RNAs migrate much more slowly than linear RNAs of the same molecular weight [25, 27]. In contrast, the migration of circular RNAs on native agarose gels is similar to that of linear RNAs of similar molecular weight [25].
6. During the preparation of RNase R-treated RNA samples, or RNA samples subjected to other treatments, RNAs will be lost during the treatment and the subsequent RNA recovery. Thus, we generally use similar amounts of RNAs prior to any treatment (total RNA) and after the treatment of interest for Northern blots. For instance, about 20 μg total RNAs were first digested with RNase R (about 5–10 μg RNAs could be retrieved) and then applied for Northern blot; correspondingly, about 5–10 μg total RNAs were directly applied for Northern blot without RNase R digestion (Fig. 3). A similar strategy has been applied to RT-PCR analyses (Fig. 4).
7. Before loading the samples, rinse the gel wells several times to remove dissolved urea, and then load the samples immediately.
8. The electrophoresis time depends on the molecular weights of the circular RNAs of interest. Usually, it requires about 3 h at


120 V to sufficiently resolve 400-nt circular RNAs in 5% denaturing PAGE.
9. The cross-linked membrane can be immediately used for prehybridization or stored at −80 °C for weeks after drying.

Acknowledgements
This work was supported by grants 31730111, 91940303, 31925011, and 31725009 from the National Natural Science Foundation of China to L.Y. and L.L.C.

References
1. Sanger HL et al (1976) Viroids are single-stranded covalently closed circular RNA molecules existing as highly base-paired rod-like structures. Proc Natl Acad Sci U S A 73:3852–3856
2. Arnberg AC et al (1980) Some yeast mitochondrial RNAs are circular. Cell 19:313–319
3. Kos A et al (1986) The hepatitis delta (delta) virus possesses a circular RNA. Nature 323:558–560
4. Nigro JM et al (1991) Scrambled exons. Cell 64:607–613
5. Cocquerelle C et al (1992) Splicing with inverted order of exons occurs proximal to large introns. EMBO J 11:1095–1098
6. Capel B et al (1993) Circular transcripts of the testis-determining gene Sry in adult mouse testis. Cell 73:1019–1030
7. Rybak-Wolf A et al (2015) Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol Cell 58:870–885
8. You X et al (2015) Neural circular RNAs are derived from synaptic genes and regulated by development and plasticity. Nat Neurosci 18:603–610
9. Piwecka M et al (2017) Loss of a mammalian circular RNA locus causes miRNA deregulation and affects brain function. Science 357:eaam8526
10. Chen YG et al (2017) Sensing self and foreign circular RNAs by intron identity. Mol Cell 67:228–238.e5
11. Li X et al (2017) Coordinated circRNA biogenesis and function with NF90/NF110 in viral infection. Mol Cell 67:214–227.e7
12. Liu CX et al (2019) Structure and degradation of circular RNAs regulate PKR activation in innate immunity. Cell 177:865–880.e21
13. Conn SJ et al (2015) The RNA binding protein quaking regulates formation of circRNAs. Cell 160:1125–1134
14. Bachmayr-Heyda A et al (2015) Correlation of circular RNA abundance with proliferation-exemplified with colorectal and ovarian cancer, idiopathic lung fibrosis, and normal human tissues. Sci Rep 5:8057
15. Panda AC et al (2017) Identification of senescence-associated circular RNAs (SAC-RNAs) reveals senescence suppressor CircPVT1. Nucleic Acids Res 45:4021–4035
16. Guarnerio J et al (2016) Oncogenic role of fusion-circRNAs derived from cancer-associated chromosomal translocations. Cell 165:289–302
17. Guarnerio J et al (2019) Intragenic antagonistic roles of protein and circRNA in tumorigenesis. Cell Res 29:628–640
18. Westholm JO et al (2014) Genome-wide analysis of drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation. Cell Rep 9:1966–1980
19. Cortes-Lopez M et al (2018) Global accumulation of circRNAs during aging in Caenorhabditis elegans. BMC Genomics 19:8
20. Gruner H, Cortes-Lopez M, Cooper DA, Bauer M, Miura P (2016) CircRNA accumulation in the aging mouse brain. Sci Rep 6:38907
21. Li X, Yang L, Chen LL (2018) The biogenesis, functions, and challenges of circular RNAs. Mol Cell 71:428–442
22. Kristensen LS et al (2019) The biogenesis, biology and characterization of circular RNAs. Nat Rev Genet 20:675–691
23. Graveley BR (2008) Molecular biology: power sequencing. Nature 453:1197–1198
24. Yang L, Duff MO, Graveley BR, Carmichael GG, Chen LL (2011) Genomewide characterization of non-polyadenylated RNAs. Genome Biol 12:R16
25. Zhang Y et al (2013) Circular intronic long noncoding RNAs. Mol Cell 51:792–806
26. Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO (2012) Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One 7:e30733
27. Zhang XO et al (2014) Complementary sequence-mediated exon circularization. Cell 159:134–147
28. Hansen TB et al (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495:384–388
29. Zhang Y et al (2016) The biogenesis of nascent circular RNAs. Cell Rep 15:611–624
30. Jeck WR et al (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19:141–157
31. Memczak S et al (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495:333–338
32. Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO (2013) Cell-type specific features of circular RNA expression. PLoS Genet 9:e1003777
33. Chen LL (2016) The biogenesis and emerging roles of circular RNAs. Nat Rev Mol Cell Biol 17:205–211
34. Zhang XO et al (2016) Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res 26:1277–1287
35. Dong R, Ma XK, Chen LL, Yang L (2017) Increased complexity of circRNA expression during species evolution. RNA Biol 14:1064–1074
36. Ivanov A et al (2015) Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10:170–177
37. Yin QF, Chen LL, Yang L (2015) Fractionation of non-polyadenylated and ribosomal-free RNAs from mammalian cells. Methods Mol Biol 1206:69–80
38. Dong R, Ma XK, Chen LL, Yang L (2019) Genome-wide annotation of circRNAs and their alternative back-splicing/splicing with CIRCexplorer pipeline. Methods Mol Biol 1870:137–149
39. Ma XK et al (2020) CIRCexplorer3: a CLEAR pipeline for direct comparison of circular and linear RNA expression. Genomics Proteomics Bioinformatics 17:511–521
40. Suzuki H et al (2006) Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Res 34:e63
41. Xiao MS, Wilusz JE (2019) An improved method for circular RNA purification using RNase R that efficiently removes linear RNAs containing G-quadruplexes or structured 3′ ends. Nucleic Acids Res 47:8755–8769

Chapter 17
Identification of circRNA-Interacting Proteins by Affinity Pulldown
Jen-Hao Yang, Poonam R. Pandey, and Myriam Gorospe

Abstract
Circular RNAs (circRNAs) comprise a vast class of covalently closed transcripts, generated primarily via backsplicing. Most circRNAs arise from full or partial exons, but they can also arise from introns, and from combinations of introns and exons. While high-throughput RNA-sequencing analysis has identified tens of thousands of circRNAs expressed in different tissues and growth conditions, the function of circRNAs has only been described for a handful of them. As most circRNAs appear not to encode peptides, their function is presumed to be linked to their interaction with a range of molecules, particularly other nucleic acids (notably microRNAs) and proteins. A major impediment to identifying circRNA-associated molecules is a lack of suitable methodologies capable of specifically analyzing circRNAs and not their linear RNA counterparts, with which they share most of their sequence. Here, we describe a flexible and robust method for identifying the proteins that associate with a given circRNA. The affinity pulldown assay is based on the use of a biotinylated antisense oligomer that recognizes the circRNA-specific junction sequence. Following pulldown using streptavidin beads, the proteins are eluted from the circRNP (circribonucleoprotein) complex and identified by mass spectrometry; validation by Western blot analysis and other methods then confirms the identity of the circRNA-associated proteins. We present a detailed step-by-step protocol, tips to optimize the analysis, troubleshooting suggestions, and assistance in interpreting the results. In sum, this protocol enables the discovery of proteins present in circRNPs, a critical effort toward elucidating circRNA function.

Key words circRNAs, Backsplice junction, Ribonucleoprotein complex, circRNA-Protein, Antisense oligomer

1 Introduction
With less than 2% of the transcribed genome serving as a template for protein synthesis, a vast majority of expressed transcripts is thought to participate in other cellular processes [1]. Among this massive class of noncoding RNAs are circular (circ)RNAs, a family of covalently closed transcripts. Generally believed to arise from the splicing machinery through backsplicing, circRNAs are ubiquitous in the cytoplasm and nucleus, are evolutionarily conserved, and

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_17, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


display tissue- and development-specific expression patterns [2–7]. Until recently, circRNAs were considered to be rare byproducts of splicing, far less abundant than the linear parental mRNAs; however, we now know that this expansive class includes tens of thousands of circRNAs, and that thousands of circRNAs may be found in a given cell [8]. A circRNA generally shares its sequence fully with its parent linear RNA; the only segment that is unique to the circRNA is the point of covalent union, the circRNA junction. Accordingly, the methods that have been developed to isolate and characterize the function of circRNAs exploit both their circular nature and their junction sequences [9]. As most circRNAs do not appear to encode proteins, their function has been proposed to be linked to their interaction with other molecules. Most of the interactions studied to date have implicated microRNAs [10] and, in some cases, a function as “microRNA sponges” has been described for abundant circRNAs. However, many proteins also bind circRNAs, and the function of these proteins may be modulated by the interaction with circRNAs, as shown for transcription factors and RNA-binding proteins (RBPs) that influence mRNAs posttranscriptionally [11–15]. Affinity RNA pulldown assays can specifically extract ribonucleoprotein (RNP) complexes from a mixed population of molecules. The general strategy is to use a biotinylated antisense oligomer (ASO) designed to recognize the RNA sequence, incubate the ASO with a mixed lysate containing the RNP, pull it down using streptavidin-coated beads, and purify and identify the associated molecules. In the case of circRNPs, the ASO is designed to hybridize with the unique junction sequence on the circRNA in order to reduce or exclude pulldown of the linear RNA.
The proteins associated with the circRNA can be studied using methods such as mass spectrometry and Western blotting, while the RNAs present in the pulldown material can be studied by methods such as sequencing and RT-qPCR analysis. The basic schematic of this method is illustrated in Fig. 1. For low-abundance circRNAs, moderate overexpression of the circRNA before pulldown may enhance the identification of the interacting partners. This method was successfully used in recent studies to identify proteins bound to circRNAs. ASOs against circFoxo3 uncovered that the circRNA associated with ID1, E2F1, FAK, and HIF-1α [13, 14], ASOs that pulled down circPABPN1 revealed its interaction with HuR [12], and ASOs directed at circSamd4 were used to find the association of the circRNA with transcription factors PURA and PURB [15]. We have formulated a detailed protocol, adjustments to improve detection, possible limitations, and other points for consideration.

Method to Study circular RNA-Protein Interactions


Fig. 1 Schematic of circular RNA-antisense oligo pulldown

2 Materials
It is highly recommended that all steps of this protocol be performed in RNase-free conditions by using RNase-free glassware, plasticware, and solutions.

2.1 General Reagents

1. Cultured cell line; for example, mouse C2C12 myoblasts (ATCC, USA). 2. Culture media; for C2C12 myoblasts, Dulbecco’s modified Eagle medium (DMEM) (Thermo Fisher Scientific).


3. Serum for cell culture; for C2C12 myoblasts, fetal bovine serum (FBS) (Thermo Fisher Scientific) and horse serum (Thermo Fisher Scientific).
4. Dulbecco’s phosphate-buffered saline (DPBS) (Thermo Fisher Scientific).
5. Cell scrapers.
6. DNase/RNase-free 1.5-mL microcentrifuge tubes.
7. Vortexer.
8. Microcentrifuge (Eppendorf).
9. NanoDrop spectrophotometer (Thermo Fisher Scientific).
10. Tube rotator.
11. Magnetic separation stand (New England Biolabs).

2.2 Preparation of Biotinylated Antisense Oligomer (ASO)

Biotinylated ASOs are designed against the circRNA junction to pull down the circular RNA. 3′-biotin-labeled antisense DNA oligomers (ASOs; 20–40 nucleotides in length) are designed against the splice junction and custom-made by Integrated DNA Technologies (IDT). It is highly recommended that the ASO sequence be BLASTed against the NCBI database to exclude the recognition of other possible target RNAs. As a negative control, a 3′-biotin-labeled sense DNA oligomer or a scrambled oligomer (SO; IDT) can be used. If the secondary structure of the target circRNA junction is available, shorter oligomers (e.g., 20–22 nucleotides in length) can be designed to target the single-stranded regions.

2.3 Cell Lysis and Binding

1. Polysome extraction buffer (PEB) (20 mM Tris–HCl pH 7.5, 100 mM KCl, 5 mM MgCl2, 0.5% NP-40). For mitochondrial proteins, replace 0.5% NP-40 with 0.5% Triton X-100 and 1% NP-40.
2. 2× TENT binding buffer (20 mM Tris–HCl pH 8.0, 2 mM EDTA pH 8.0, 500 mM NaCl, 1% v/v Triton X-100).
3. RiboLOCK RNase inhibitor (40 U/μL) (Life Technologies, # EO0384).
4. Halt™ Protease Inhibitor Cocktail (100×) (Life Technologies # 87786).
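The junction-targeting design described in Subheading 2.2 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' design tool: the circRNA is assumed to be supplied as a linear 5′→3′ string whose last base is joined to its first base at the backsplice junction, the ASO is assumed to span the junction symmetrically, and the function names and the toy sequence are hypothetical. Specificity (BLAST) and secondary structure still need to be checked separately.

```python
# Sketch: derive a junction-spanning antisense DNA oligomer (ASO) and the
# matching sense control for a circRNA.
# Assumption: circ_seq is written 5'->3' so that its last base is covalently
# joined to its first base (the backsplice junction).

_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def junction_sequence(circ_seq, flank=15):
    """Return the sequence spanning the junction: `flank` nt on each side."""
    if not 10 <= flank <= 20:
        raise ValueError("a flank of 10-20 nt gives an oligomer of 20-40 nt")
    if len(circ_seq) < 2 * flank:
        raise ValueError("circRNA sequence is shorter than the requested oligomer")
    return circ_seq[-flank:] + circ_seq[:flank]


def junction_aso(circ_seq, flank=15):
    """Antisense DNA oligomer (5'->3') complementary to the junction."""
    return junction_sequence(circ_seq, flank).translate(_COMPLEMENT)[::-1]


def sense_control(circ_seq, flank=15):
    """Sense DNA oligomer (5'->3') identical to the junction, for the control."""
    return junction_sequence(circ_seq, flank).replace("U", "T").replace("u", "t")


if __name__ == "__main__":
    toy_circ = "GGGGGAAAAACCCCCTTTTT"  # hypothetical 20-nt circle, for illustration only
    print("ASO  :", junction_aso(toy_circ, flank=10))
    print("Sense:", sense_control(toy_circ, flank=10))
```

Because half of the oligomer pairs with each side of the junction, the parental linear RNA can only match half of the ASO, which is the basis for discriminating the circular from the linear transcript.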

2.4 Pulldown of circRNAs and Elution of Proteins

1. Streptavidin Dynabeads (Thermo Fisher Scientific).
2. 2× Laemmli buffer (Bio-Rad).

2.5 RNA Isolation, Reverse Transcription, and qPCR

1. Tripure (Thermo Fisher Scientific).
2. Random primers (Thermo Fisher Scientific).
3. dNTP mix (Thermo Fisher Scientific).


4. Ribolock RNase inhibitor (Thermo Fisher Scientific). 5. Maxima Reverse Transcriptase (Thermo Fisher Scientific). 6. MicroAmp® Optical 96-Well Reaction Plate (Thermo Fisher Scientific). 7. MicroAmp® Optical Adhesive Film (Thermo Fisher Scientific). 8. MPS 1000 Mini Plate Spinner. 9. KAPA SYBR® FAST qPCR mix (Thermo Fisher Scientific). 10. circRNA-specific divergent sets of PCR primers. 11. Thermomixer (Eppendorf). 12. Veriti® 96-well Thermal Cycler (Thermo Fisher Scientific). 13. QuantStudio 5 Real-Time PCR System (Thermo Fisher Scientific).

3 Methods

3.1 Preparation of Cell Lysate

1. Seed ~five million C2C12 cells per 150-mm dish, and culture in DMEM supplemented with 10% FBS and antibiotics at 37 °C in a 5% CO2 incubator.
2. Allow the cells to grow for 24–48 h until they reach 70–90% confluency and then replace with differentiation media (DMEM with 2% horse serum). At 72 h after inducing differentiation, remove media, wash cells with cold DPBS, and use a cell scraper to collect them into a 1.7-mL tube. Use at least one 150-mm dish per pulldown or 30–60 million cells, depending upon the abundance of the circRNA of interest (see Note 1).
3. Harvest cell pellets, and lyse cells with 1 mL of ice-cold PEB supplemented with a protease/phosphatase inhibitor cocktail, along with RNase inhibitor (200 U).
4. Mix well by flicking the tubes. Incubate the tubes on ice for 15–20 min, and continue to mix by flicking the tubes intermittently.
5. Centrifuge at 14,000 × g at 4 °C for 15 min.
6. Collect supernatant (whole-cell lysate) into a new tube and discard the cell pellet.

3.2 Binding of circRNPs to Biotinylated SO/ASO (All Steps on Ice)

1. Transfer 700 μL (approximately 0.8–1 mg) of cell lysate into a fresh microfuge tube. At this point remove a small fraction (1–10%) of the lysate to use as input for Western blot analysis.
2. Add 700 μL of 2× TENT Buffer supplemented with 2 μL of Ribolock and protease/phosphatase inhibitor cocktail.


3. Add 1–3 μL of 100 μM 3′-biotin-labeled ASO. Prepare a separate tube for the control oligomer (sense or scrambled oligomer, SO) pulldown side-by-side.
4. Incubate at 4 °C for 2 h with rotation.

3.3 Preparation of Streptavidin Magnetic Beads and circRNP Purification

1. Wash 50 μL streptavidin Dynabeads with 500 μL ice-cold PBS once, then wash with 1× TENT (diluted 1:1 with PEB). After the wash, incubate tubes on the magnetic stand for 1 min. Remove the supernatant by aspiration and repeat this step four additional times, aspirating the remaining buffer after each wash.
2. Add the lysate containing ASOs (from Subheading 3.2) to the prewashed beads and incubate at 4 °C for 1 h with rotation.
3. Once the lysate and beads have rotated for 1 h, place the tubes on the magnetic stand for 1 min. Remove the supernatant by aspiration without touching the beads.
4. Wash the beads with 1× TENT (diluted 1:1 in PEB) four times, incubating on the magnetic stand for 1 min between washes, and removing the wash buffer each time.
5. After the last wash, spin down the beads with any captured molecules, and remove any remaining buffer. The sample can be stored temporarily at −80 °C for further steps.

At this stage, the beads can be subjected to a variety of analyses to identify the molecules bound to the beads. However, first and foremost it must be confirmed that the circRNA of interest is enriched in the beads (Subheading 3.4). Afterward, one can proceed to protein isolation (Subheading 3.5).
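The buffer arithmetic behind Subheadings 3.2 and 3.3 can be checked with a short calculation. The sketch below rests on assumptions that are not spelled out in the text: the concentrations listed in Subheading 2.3 are taken to be those of the 2× TENT stock, the lysate is treated as having the composition of PEB, and the small volumes of inhibitors are ignored. The function and variable names are hypothetical.

```python
# Sketch: final composition after mixing PEB (or PEB lysate) with 2x TENT,
# and the final ASO concentration in the binding reaction.
# Assumptions: values in Subheading 2.3 describe the 2x TENT stock; the lysate
# has the composition of PEB; inhibitor volumes are neglected.

PEB = {"Tris-HCl (mM)": 20, "KCl (mM)": 100, "MgCl2 (mM)": 5, "NP-40 (%)": 0.5}
TENT_2X = {"Tris-HCl (mM)": 20, "EDTA (mM)": 2, "NaCl (mM)": 500, "Triton X-100 (%)": 1.0}


def mix(solutions):
    """solutions: list of (composition dict, volume). Returns the final composition."""
    total = sum(volume for _, volume in solutions)
    final = {}
    for composition, volume in solutions:
        for component, conc in composition.items():
            final[component] = final.get(component, 0.0) + conc * volume / total
    return final


def aso_final_nM(aso_ul, stock_uM=100.0, reaction_ul=1400.0):
    """Final ASO concentration (nM) after adding aso_ul of stock to the reaction."""
    return aso_ul * stock_uM * 1000.0 / (reaction_ul + aso_ul)


if __name__ == "__main__":
    # 700 uL lysate + 700 uL 2x TENT, i.e. the same 1:1 ratio as the 1x TENT wash buffer.
    for component, conc in mix([(PEB, 700), (TENT_2X, 700)]).items():
        print(f"{component}: {conc:g}")
    for ul in (1, 2, 3):
        print(f"{ul} uL ASO -> {aso_final_nM(ul):.0f} nM")
```

Under these assumptions the binding reaction and the 1× TENT wash buffer have the same salt and detergent composition, so the washes do not change the stringency at which the complexes were formed.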

3.4 RNA Isolation, Reverse Transcription and qPCR

It is extremely important to check that the designed ASO specifically pulled down the circRNA of interest (and not other RNAs) (Fig. 2a).
1. Resuspend the beads in 100 μL PEB, and add 1 mL TRIzol directly to both the “Input” material (1–10% of total material) and the 100 μL of PEB containing beads. Mix by pipetting, and incubate at room temperature for 5 min. Add 200 μL chloroform to each tube and vortex the samples vigorously.
2. Centrifuge the tubes at 12,000 × g for 15 min at 4 °C, transfer ~500 μL of supernatant to a new tube, and add an equal volume of isopropanol and 1 μL GlycoBlue to every tube.
3. Centrifuge the tubes at 12,000 × g for 10 min at 4 °C, and discard the supernatant.
4. Add 1 mL ice-cold 75% ethanol, and centrifuge at 7500 × g for 5 min at 4 °C. Discard the supernatant and air-dry the tubes.
5. Resuspend the RNA pellet in 20 μL of RNase-free water.


Fig. 2 Validation of circRNA interacting proteins. (a) Biotinylated antisense oligomers (ASOs) complementary to the junction of circSamd4 and control (Ctrl) were incubated with C2C12 in GM as well as in DM. After affinity pulldown using streptavidin beads, the levels of circSamd4 enrichment in ASO pulldown samples were assessed by RT-qPCR analysis of circSamd4 (relative to the enrichment of GAPDH mRNA, a transcript that does not bind the ASOs and encodes a housekeeping protein) in the pulldown samples. Enrichment in Samd4 mRNA was monitored in parallel. (b) Top 5 proteins shared in the mass spec datasets from mouse and human myoblast pulldowns. (c) Validation in C2C12 of other proteins identified as being selectively enriched in circSamd4 and circSAMD4 ASO pulldowns. Following ASO pulldown, the presence of candidate proteins was assessed by Western blot analysis. Input, 5 μg. (Figure is reproduced from ref. 15)

6. Add 1 μL dNTP mix (10 mM) and 1 μL random hexamers (150 ng/μL) to 12 μL of the RNA resuspended in water.
7. Incubate at 65 °C for 5 min followed by 4 °C for 5 min using a thermal cycler.
8. Add 1 μL Maxima Reverse Transcriptase (200 U/μL), 1 μL RNase inhibitor (40 U/μL), and 4 μL 5× RT reaction buffer provided with the reverse transcriptase kit.
9. Incubate samples at 25 °C for 10 min, followed by 50 °C for 30 min, and lastly 85 °C for 5 min using a thermal cycler to synthesize cDNA.
10. Mix 5 μL of diluted cDNAs, 5 μL of SYBR green master mix, and divergent primers (2.5–10 μM) designed to amplify the circRNA of interest, as well as primers used to amplify (a) housekeeping control RNAs, to ensure even loading of samples; (b) the linear RNA from which the circRNA


originated, to ensure that the ASO did not enrich this RNA; (c) other circRNAs, to ensure that only the circRNA of interest, and not other circRNAs, is enriched.
11. After completion of RT-qPCR, calculate all Ct values, and compare the control sense or scrambled oligomer (SO) with the specific circRNA ASO. Normalize data using housekeeping RNAs (GAPDH mRNA, ACTB mRNA, 18S rRNA, etc.) and assess whether the specific circRNA was enriched in the ASO pulldown but not the SO pulldown, and compare with the values of the other bead-associated RNAs listed in step 10.

3.5 Preparation of Protein for MS Analysis and Immunoblotting Validation

After confirming that the assay optimally pulled down the circRNA of interest (Subheading 3.4), one can proceed further to protein isolation and analysis. Mass spectrometry (MS) has become the method of choice for high-throughput detection, identification, and quantitation of proteins. The workflow of sample preparation, technical processing, and software analysis varies depending on researcher preferences and MS facilities. The quality of protein sample preparation significantly impacts the MS results. We recommend eluting in SDS-Laemmli buffer at 95 °C for 5 min for maximum protein elution and sending the samples directly for analysis by standard liquid chromatography-tandem mass spectrometry (LC-MS/MS), which can be handled by most proteomics facilities.
1. To isolate proteins bound to the biotinylated ASO, add 50 μL 2× Laemmli buffer containing β-mercaptoethanol (Bio-Rad Catalog # 1610737) to the beads, heat at 95 °C for 5 min, and spin down the beads.
2. Place the beads on the magnetic stand and remove as much of the eluate as possible without touching the beads. Load 25 μL on SDS-PAGE gels and keep the remaining 25 μL for future Western blot analysis and validation steps.
3. Stain gels with colloidal blue or silver, cut distinguishable bands from the gel, and process them according to sample preparation instructions for mass spectrometry analysis. A commonly used protocol for MS analysis is the in-gel digestion method.
4. Send the samples collected in Subheading 3.5, step 3 for mass spectrometry analysis.
5. After the mass spec facility returns the data (Fig. 2b), those proteins displaying maximum peptide counts, potentially implicated in cellular processes, and having RNA-binding properties may be selected for further validation by Western blotting analysis using sample groups “input,” “control sense/scrambled oligomer (SO),” and “specific pulldown fractions (ASO).” Other criteria for selection of candidate proteins for validation may also be employed (see Notes 2–4).
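Returning to the enrichment check in Subheading 3.4, step 11, the comparison of ASO and SO pulldowns can be expressed as a relative-quantification calculation. The sketch below uses the common 2^(−ΔΔCt) approach, which is an assumption on our part (the protocol does not prescribe a specific formula) and presumes roughly 100% amplification efficiency for every primer pair. All Ct values and names are hypothetical.

```python
# Sketch: fold enrichment of a transcript in the ASO pulldown relative to the
# SO (control) pulldown, normalized to a housekeeping RNA.
# Assumptions: 2^(-ddCt) relative quantification with ~100% primer efficiency;
# the Ct values below are made up for illustration.


def delta_ct(ct_target, ct_reference):
    """Ct of the target normalized to the housekeeping reference."""
    return ct_target - ct_reference


def fold_enrichment(aso_ct, so_ct, target, reference="GAPDH"):
    """aso_ct / so_ct: dicts mapping transcript name -> mean Ct in that pulldown."""
    ddct = delta_ct(aso_ct[target], aso_ct[reference]) - delta_ct(
        so_ct[target], so_ct[reference]
    )
    return 2.0 ** (-ddct)


EXAMPLE_ASO = {"GAPDH": 25.0, "circRNA": 22.0, "linear_mRNA": 26.0}
EXAMPLE_SO = {"GAPDH": 25.0, "circRNA": 27.0, "linear_mRNA": 26.0}

if __name__ == "__main__":
    for transcript in ("circRNA", "linear_mRNA"):
        fold = fold_enrichment(EXAMPLE_ASO, EXAMPLE_SO, transcript)
        print(f"{transcript}: {fold:.1f}-fold (ASO vs SO)")
```

A pulldown is informative when the circRNA shows a clear fold enrichment while the parental linear RNA and unrelated circRNAs stay close to 1; what threshold counts as "clear" is left to the experimenter.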


4 Notes
1. It is helpful to start with at least 50–80 million cells or 800 μg to 1 mg of protein per pulldown, as the pulldown itself is not an efficient process. The inclusion of protease/phosphatase inhibitor cocktail along with RNase inhibitors is strongly encouraged in all key steps while performing the assay.
2. For validation experiments, it is highly encouraged to include control groups in which the samples are incubated with RNase A and RNase R, so that total RNA and linear RNAs are degraded, respectively. The association of specific proteins should be tested in untreated and RNase R-treated fractions.
3. It is highly recommended that the ASO pulldown experiments be validated using alternative methods such as RNP IP (RIP) or biotin-RNA pulldown analyses.
4. For nuclear circRNAs, or if a stronger lysis buffer is needed to isolate circRNAs, crosslinking between circRNAs and RBPs can be considered, because crosslinking enables the complexes to endure stronger lysis conditions, binding of denatured molecules, and more stringent washes. The ASO/SO design is the same as in Subheading 2.2. The cross-linking and pull-down processes can follow the ChIRP-MS protocol [16, 17].

Acknowledgments
This work was supported in its entirety by the NIA IRP, NIH.

References
1. Atianand MK, Fitzgerald KA (2014) Long non-coding RNAs and control of gene expression in the immune system. Trends Mol Med 20:623–631
2. Hansen TB, Jensen TI, Clausen BH et al (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495:384–388
3. Floris G, Zhang L, Follesa P, Sun T (2017) Regulatory role of circular RNAs and neurological disorders. Mol Neurobiol 54:5156–5165
4. Memczak S, Jens M, Elefsinioti A et al (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495:333–338
5. Salzman J, Chen RE, Olsen MN et al (2013) Cell-type specific features of circular RNA expression. PLoS Genet 9:e1003777
6. Cocquerelle C, Mascrez B, Hetuin D et al (1993) Mis-splicing yields circular RNA molecules. FASEB J 7:155–160
7. Nigro JM, Cho KR, Fearon ER et al (1991) Scrambled exons. Cell 64:607–613
8. Rybak-Wolf A, Stottmeister C, Glazar P et al (2015) Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol Cell 58:870–885
9. Pandey PR, Rout PK, Das A et al (2018) RPAD (RNase R treatment, polyadenylation, and poly(A)+ RNA depletion) method to isolate highly pure circular RNA. Methods 155:41–48
10. Panda AC, Grammatikakis I, Kim KM et al (2017) Identification of senescence-associated circular RNAs (SAC-RNAs) reveals senescence suppressor CircPVT1. Nucleic Acids Res 45:4021–4035
11. Li X, Liu CX, Xue W et al (2017) Coordinated circRNA biogenesis and function with NF90/NF110 in viral infection. Mol Cell 67:214–227
12. Abdelmohsen K, Panda AC, Munk R et al (2017) Identification of HuR target circular RNAs uncovers suppression of PABPN1 translation by CircPABPN1. RNA Biol 14:361–369
13. Du WW, Yang W, Liu E et al (2016) Foxo3 circular RNA retards cell cycle progression via forming ternary complexes with p21 and CDK2. Nucleic Acids Res 44:2846–2858
14. Du WW, Yang W, Chen Y et al (2017) Foxo3 circular RNA promotes cardiac senescence by modulating multiple factors associated with stress and senescence responses. Eur Heart J 38:1402–1412
15. Pandey PR, Yang JH, Tsitsipatis D et al (2020) circSamd4 represses myogenic transcriptional activity of PUR proteins. Nucleic Acids Res 48:3789–3805
16. Chu C, Chang HY (2018) ChIRP-MS: RNA-directed proteomic discovery. Methods Mol Biol 1861:37–45
17. Chen YG, Chen R, Ahmad S et al (2019) N6-Methyladenosine modification controls circular RNA immunity. Mol Cell 76:96–109

Chapter 18
Exploration of Circular RNA Interactomes by RNA Pull-Down Method
Mengshi Wu, Dan Peng, and Xiaomin Zhong

Abstract
Circular RNAs (circRNAs) are endogenous RNA molecules produced by back-splicing during the maturation of mRNA precursors. Recent research has revealed circRNAs to be important regulators in a variety of physiological and pathological processes. However, the biological functions of circRNAs remain largely unknown. The RNA pull-down method can specifically and efficiently enrich circRNAs and their interacting partners. In order to further characterize the functions and working mechanisms of circRNAs, here we provide a detailed RNA pull-down protocol for capturing and identifying the interactomes of circRNAs.

Key words Circular RNA, RNA pull-down, circRNA interactomes

1 Introduction
Circular RNAs (circRNAs) are generally produced by back-splicing of exons or exon–intron intermediate products, or alternatively originate from intron lariats during the conventional splicing of mRNA precursors [1]. Back-splicing results in covalent joining of the ends of a single linear RNA molecule in a head-to-tail manner. CircRNAs were first identified in the early 1990s [2–4]. They were later recognized as an endogenous RNA subset that is widespread in eukaryotes [5–10]. Despite the large number of circRNAs found in metazoans [11, 12], they share some common features. First, circRNAs show expression patterns that are highly specific to cell types or developmental stages [13–16]. Second, circRNAs are more stable than linear mRNAs due to their resistance to degradation by RNA exonucleases [1]. Third, circRNAs show evolutionary conservation among different species [17]. A growing number of studies have found circRNAs to be closely correlated with development and pathogenesis. CircRNAs perform their biological functions through various mechanisms. Some abundant circRNAs were reported to work as microRNA sponges. For example, CDR1as has

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_18, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


multiple binding sites for miR-7. It sequesters miR-7 to block its activity and modulates brain development [7, 18]. Similarly, the mouse circSRY participates in testis development through sponging miR-138 [2]. Some nuclear circRNAs can regulate gene transcription. Ci-ANKRD52 is recruited to transcription sites and binds to the RNA Pol II complex during elongation [19]. A class of exon-intron circRNAs (EIciRNAs) associates with RNA Pol II and U1 snRNP to enhance transcription [20]. CircRNAs also function as protein scaffolds to facilitate protein interactions; for example, circACC1 serves as a component of the AMPK complex to promote the assembly and activity of the enzyme, leading to metabolic adaptation in colorectal cancer cells [21]. Alternatively, circRNAs act as protein decoys to sequester proteins. For example, circMbl contains conserved binding sites for the splicing factor muscleblind (MBL/MBNL1); it sequesters MBL, and biogenesis of the circular form competes with linear splicing [22]. In addition, a subset of circRNAs was found associated with translating ribosomes and able to encode short peptides with biological functions [23–25]. For example, circ-ZNF609, which specifically controls myogenesis, contains an open reading frame that can initiate translation in a splicing-dependent and cap-independent manner [23].

In summary, circRNAs are emerging as critical regulators of various physiological processes. It is increasingly important to profile the interactomes of circRNAs to further elucidate their biological functions. Since most circRNAs have not been fully studied so far, RNA pull-down is needed to identify their associated partners, including proteins and RNA molecules. The RNA pull-down method takes advantage of base pairing between circRNAs and complementary DNA probes, and of affinity purification using the biotin-streptavidin system, which together allow specific and efficient enrichment of circRNAs and their interacting partners. Here we provide a detailed RNA pull-down protocol for identifying the interactomes of circRNAs, which may represent a general method to explore the functions and working mechanisms of circRNAs.

2 Materials

All solutions are prepared with analytical grade reagents in RNase-free water. Always use sterile reagents and supplies in the following steps.

1. Wash Buffer I: 0.1 M NaOH, 50 mM NaCl.
2. NaCl, 0.1 M.
3. PBS, 0.1 M phosphate, 0.15 M sodium chloride (pH 7.2).


4. Protease inhibitor cocktail (Sigma, P8340).
5. RNase inhibitor (Geneseed, R0301).
6. Dynabeads™ MyOne™ Streptavidin C1 Magnetic Beads (Invitrogen, 65001).
7. Biotin Binding Buffer: 20 mM Tris–HCl (pH 7.5), 1 M NaCl, 1 mM EDTA.
8. Tris–HCl (pH 7.5), 20 mM.
9. Lysis Buffer: 25 mM Tris–HCl (pH 7.4), 150 mM NaCl, 1% NP-40, 1 mM EDTA, 5% glycerol. Freshly add protease inhibitor and RNase inhibitor.
10. Protein-RNA Binding Buffer (10× stock solution): 0.2 M Tris–HCl (pH 7.5), 0.5 M NaCl, 20 mM MgCl2, 1% Tween 20. Freshly add protease inhibitor and RNase inhibitor to the 1× working solution.
11. Wash Buffer II: 20 mM Tris–HCl (pH 7.5), 10 mM NaCl, 0.1% Tween-20. Freshly add protease inhibitor and RNase inhibitor.
12. Glycerol, 50%.
13. Pierce™ BCA Protein Assay Kit (ThermoFisher Scientific, 23225).
14. TRIzol RNA extraction Reagent (Molecular Research Center, Inc., TR118).

3 Methods

3.1 Preparation of Biotin-Labeled DNA Probes

Single-stranded DNA probes complementary to the back-splicing junction sequence of the circRNA are designed using Primer3web (http://primer3.ut.ee). Probes with biotin modification on the 5′ termini are synthesized by Sangon Biotech Company. Probes are dissolved in DEPC-H2O to prepare a stock solution of 50 μM.

3.2 Preparation of Streptavidin Magnetic Beads

1. Resuspend the streptavidin magnetic beads in the stock vial by vortexing for more than 30 s. Aliquot the desired amount of streptavidin magnetic beads to a 1.5 ml microcentrifuge tube (see Note 1).
2. Place the tube on a magnetic stand to collect the beads and discard the preservative solution.
3. Remove the tube from the magnetic stand and wash the beads in 500 μl Wash Buffer I. Resuspend by pipetting. Place the tube on a magnetic stand to collect beads and discard the supernatant.
4. Repeat step 3 once.
5. Wash beads in 500 μl of 0.1 M NaCl. Place the tube on a magnetic stand to collect beads and discard the supernatant.


6. Repeat step 5 once.
7. Wash beads in 1 ml Biotin Binding Buffer. Place the tube on a magnetic stand to collect beads and discard the supernatant.
8. Repeat step 7 once.
9. Resuspend beads in 200 μl Biotin Binding Buffer. They are now ready for binding the biotin-labeled DNA probes prepared in Subheading 3.1.

3.3 Binding of Biotin-Labeled DNA Probes to Streptavidin Magnetic Beads

1. Add 50 pmol biotin-labeled probes to the beads (see Note 1). Mix gently by pipetting.
2. Incubate for 15–30 min at room temperature with gentle rotation (see Note 2).
3. Place the tube on a magnetic stand to collect beads and discard the supernatant.
4. Wash beads in 200 μl of 20 mM Tris (pH 7.5). Place the tube on a magnetic stand to collect beads and discard the supernatant.
5. Repeat step 4 once.
6. Resuspend beads in 100 μl of 1× Protein-RNA Binding Buffer and mix well. Keep on ice for later use.
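The probes bound to the beads here are, per Subheading 3.1, complementary to the back-splicing junction, so they pair with the circular form but not with the linear transcript. The sketch below illustrates that idea only; it is not the Primer3web procedure the protocol uses. The function names, the toy sequence, and the `flank` length are assumptions made for illustration, and a real probe would still need to be checked for Tm and specificity.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def junction_probe(circ_exon_seq: str, flank: int = 15) -> str:
    """Antisense DNA probe spanning the back-splice junction.

    circ_exon_seq is the linear 5'->3' sequence (DNA alphabet) of the
    circularized exon(s). In the circle its 3' end is joined to its 5' end,
    so the junction reads: last `flank` nt followed by first `flank` nt.
    """
    seq = circ_exon_seq.upper()
    if len(seq) < 2 * flank:
        raise ValueError("sequence is shorter than the requested probe")
    junction = seq[-flank:] + seq[:flank]
    return reverse_complement(junction)


# Hypothetical 40-nt circularized sequence, for illustration only.
toy = "ATGGCCTTAGCAGTCCATGA" + "CCGTTAGGCATCGATTACGG"
probe = junction_probe(toy, flank=10)
print(probe)  # the 5' biotin is added at synthesis, not represented here
```

Centering the probe on the junction is a design choice of this sketch; shifting the window or lengthening the flanks changes the balance between junction specificity and hybridization strength.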

3.4 Preparation of Cell Lysate

1. Harvest target cells with a cell scraper and collect in a 15 ml centrifuge tube. It is recommended to use 20–200 μg total protein (typical yield from 1 × 10⁵ to 1 × 10⁶ cells) per pull-down assay.
2. Wash cells in cold PBS once. Transfer cells to a new microcentrifuge tube.
3. Centrifuge at 2500 × g for 10 min at 4 °C. Discard the supernatant.
4. Add an appropriate volume of ice-cold Lysis Buffer to the cells (ensure the final protein concentration is > 2 mg/ml) (see Notes 3 and 4). Incubate on ice for 10 min with periodic agitation.
5. Pipette gently several times to break up the remaining cell aggregates. Centrifuge at 14,000 × g for 10 min at 4 °C to pellet the cell debris.
6. Transfer the supernatant to a new microcentrifuge tube and determine the protein concentration using the BCA Protein Assay Kit. The cell lysate is now ready for the RNA pull-down assay.

3.5 RNA Pull-Down for circRNA Interacting Partners

1. Set up a master mix for the RNA pull-down assay as follows:
   10× Protein-RNA Binding Buffer, 10 μl.
   50% glycerol, 30 μl.
   Cell lysate, 1–30 μl (total protein 20–200 μg).
   DEPC-H2O, to 100 μl.
   Add the above components to a 1.5 ml microcentrifuge tube on ice. Mix and centrifuge briefly.
2. Place the tube with probe-bound beads (from Subheading 3.3) on a magnetic stand to collect beads. Remove the supernatant (Protein-RNA Binding Buffer).
3. Add 100 μl master mix (step 1) to the probe-bound beads. Mix by pipetting gently.
4. Incubate at 4 °C for 3–4 h with rotation.
5. Place the tube on a magnetic stand to collect beads against the side of the tube. Remove and discard the supernatant.
6. Wash with 100 μl Wash Buffer II.
7. Repeat steps 5 and 6 twice. Remove as much Wash Buffer II as possible in the last wash.
8. (Optional) To analyze RNA partners of circRNAs, add 1 ml TRIzol RNA extraction reagent to the beads, and purify RNAs according to the manufacturer’s instructions. The pulled-down RNA partners can be analyzed by quantitative real-time PCR, microarray, high-throughput RNA sequencing, etc.
9. (Optional) To identify protein partners of circRNAs, add an appropriate volume of 1× SDS-PAGE loading buffer to the beads, and heat for 10 min at 95 °C. Transfer the supernatant into a new tube for further analysis of proteins, such as Western blotting and mass spectrometry.
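The master mix in step 1 is simple arithmetic once the BCA concentration of the lysate is known. The sketch below works out the four volumes for a 100 μl reaction; the function name and the example concentration are assumptions for illustration, not values from the protocol. It raises an error when the lysate is too dilute to fit, which is the situation Note 4 guards against.

```python
def pulldown_master_mix(lysate_mg_per_ml: float, protein_ug: float,
                        total_ul: float = 100.0) -> dict:
    """Volumes (in µl) for one pull-down reaction, following step 1."""
    buffer_10x = total_ul / 10.0             # 10x stock diluted to 1x
    glycerol_50 = 30.0 * total_ul / 100.0    # 30 µl of 50% glycerol per 100 µl
    lysate = protein_ug / lysate_mg_per_ml   # 1 mg/ml equals 1 µg/µl
    water = total_ul - buffer_10x - glycerol_50 - lysate
    if water < 0:
        raise ValueError("lysate too dilute: its volume does not fit the reaction")
    return {
        "buffer_10x_ul": buffer_10x,
        "glycerol_50_ul": glycerol_50,
        "lysate_ul": lysate,
        "water_ul": water,
    }


# Hypothetical lysate at 5 mg/ml, loading 100 µg of total protein.
mix = pulldown_master_mix(lysate_mg_per_ml=5.0, protein_ug=100.0)
print(mix)
```

With buffer and glycerol fixed at 40 μl, at most 60 μl remains for lysate in this sketch; the protocol itself recommends keeping the lysate volume within 1–30 μl.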

4 Notes

1. Use a range of 25–100 pmol of biotin-labeled DNA probes per 20–50 μl of magnetic beads. The instructions in this protocol use a scale of 50 pmol of DNA probes to 25 μl of beads.
2. The reaction time can be extended according to the size of the biotinylated molecules and the complexity of their spatial structure.
3. Add protease inhibitor cocktail and RNase inhibitor to the lysis buffer to prevent degradation of protein and RNA.
4. Ensure the protein concentration of the cell lysate is greater than 2 mg/ml so that there is significant dilution into the Binding Reaction Buffer.


Acknowledgments

This work was supported by The National Key Research and Development Program of China (2017YFA0105501 to Zhong X), and The Science and Technology Project of Guangdong Province (2015A020212019 to Zhong X).

References

1. Chen LL (2016) The biogenesis and emerging roles of circular RNAs. Nat Rev Mol Cell Biol 17(4):205–211
2. Capel B, Swain A, Nicolis S et al (1993) Circular transcripts of the testis-determining gene Sry in adult mouse testis. Cell 73(5):1019–1030
3. Nigro JM, Cho KR, Fearon ER et al (1991) Scrambled exons. Cell 64(3):607–613
4. Pasman Z, Been MD, Garcia-Blanco MA (1996) Exon circularization in mammalian nuclear extracts. RNA 2(6):603–610
5. Ivanov A, Memczak S, Wyler E et al (2015) Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10(2):170–177
6. Lu T, Cui L, Zhou Y et al (2015) Transcriptome-wide investigation of circular RNAs in rice. RNA 21(12):2076–2087
7. Memczak S, Jens M, Elefsinioti A et al (2013) Circular RNAs are a large class of animal RNAs with regulatory potency. Nature 495(7441):333–338
8. Salzman J, Gawad C, Wang PL et al (2012) Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One 7(2):e30733
9. Shen Y, Guo X, Wang W (2017) Identification and characterization of circular RNAs in zebrafish. FEBS Lett 591(1):213–220
10. Westholm JO, Miura P, Olson S et al (2014) Genome-wide analysis of drosophila circular RNAs reveals their structural and sequence properties and age-dependent neural accumulation. Cell Rep 9(5):1966–1980
11. Dong R, Ma XK, Li GW et al (2018) CIRCpedia v2: an updated database for comprehensive circular RNA annotation and expression comparison. Genomics Proteomics Bioinformatics 16(4):226–233
12. Ji P, Wu W, Chen S et al (2019) Expanded expression landscape and prioritization of circular RNAs in mammals. Cell Rep 26(12):3444–3460.e3445
13. Nicolet BP, Engels S, Aglialoro F et al (2018) Circular RNA expression in human hematopoietic cells is widespread and cell-type specific. Nucleic Acids Res 46(16):8168–8180
14. Rybak-Wolf A, Stottmeister C, Glazar P et al (2015) Circular RNAs in the mammalian brain are highly abundant, conserved, and dynamically expressed. Mol Cell 58(5):870–885
15. Salzman J, Chen RE, Olsen MN et al (2013) Cell-type specific features of circular RNA expression. PLoS Genet 9(9):e1003777
16. Veno MT, Hansen TB, Veno ST et al (2015) Spatio-temporal regulation of circular RNA expression during porcine embryonic brain development. Genome Biol 16:245
17. Jeck WR, Sorrentino JA, Wang K et al (2013) Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA 19(2):141–157
18. Hansen TB, Kjems J, Damgaard CK (2013) Circular RNA and miR-7 in cancer. Cancer Res 73(18):5609–5612
19. Zhang Y, Zhang XO, Chen T et al (2013) Circular intronic long noncoding RNAs. Mol Cell 51(6):792–806
20. Li Z, Huang C, Bao C et al (2015) Exon-intron circular RNAs regulate transcription in the nucleus. Nat Struct Mol Biol 22(3):256–264
21. Li Q, Wang Y, Wu S et al (2019) CircACC1 regulates assembly and activation of AMPK complex under metabolic stress. Cell Metab 30(1):157–173.e7
22. Ashwal-Fluss R, Meyer M, Pamudurti NR et al (2014) circRNA biogenesis competes with pre-mRNA splicing. Mol Cell 56(1):55–66
23. Legnini I, Di Timoteo G, Rossi F et al (2017) Circ-ZNF609 is a circular RNA that can be translated and functions in myogenesis. Mol Cell 66(1):22–37.e9
24. Pamudurti NR, Bartok O, Jens M et al (2017) Translation of CircRNAs. Mol Cell 66(1):9–21.e27
25. Zhang M, Huang N, Yang X et al (2018) A novel protein encoded by the circular form of the SHPRH gene suppresses glioma tumorigenesis. Oncogene 37(13):1805–1814

Chapter 19 Methods for Characterization of Alternative RNA Splicing

Samuel E. Harvey, Jingyi Lyu, and Chonghui Cheng

Abstract Quantification of alternative splicing to detect the abundance of differentially spliced isoforms of a gene in total RNA can be accomplished via RT-PCR using both quantitative real-time and semiquantitative PCR methods. These methods require careful PCR primer design to ensure specific detection of particular splice isoforms. We will also describe analysis of alternative splicing using a splicing “minigene” in mammalian cell tissue culture to facilitate investigation of the regulation of alternative splicing of a particular exon of interest.

Key words Alternative splicing, RNA, RT-PCR, Variable exon, Minigene, Splicing factors, Splicing regulation

1 Introduction

Alternative splicing is a key cellular process whereby particular combinations of exons in a nascently transcribed pre-mRNA are either included or excluded to generate different mature mRNA splice isoform transcripts, resulting in multiple protein isoforms encoded by a single gene. It is estimated that nearly all protein-coding genes in the human genome are alternatively spliced, providing an essential source of protein diversity [1, 2]. Differentially spliced isoforms may exert distinct biological functions; accurate quantification of the relative amounts of splice isoforms of a gene is therefore essential to explore the role of alternative splicing in biological processes as well as disease [3]. The pre-mRNA of an alternatively spliced gene contains both constitutive exons and variable exons interspaced by introns that are ultimately spliced out and excluded from the final mRNA transcript. In this chapter, we will describe the characterization of exon skipping, the most common form of alternative splicing, where one or more variable exons are either included in or skipped from the final transcript to make up the array of spliced isoforms of a gene [4].

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_19, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Reverse Transcription-Polymerase Chain Reaction (RT-PCR) is one of the most convenient methods for studying RNA transcripts and can generally be used with total RNA extracted from any biological source. Quantitative PCR (qPCR) following RT is often the method of choice given its extreme sensitivity of detection and accurate comparison of amplification of PCR products during the exponential phase of the PCR reaction [5]. Semiquantitative methods are less useful for precise quantification of the relative abundance of splice isoforms. Combined with electrophoresis, however, these methods allow for the direct visualization and comparison of splice isoform abundance based on size disparity between differentially spliced transcripts.

Quantification of levels of different splice isoforms is especially useful in studying the regulation of alternative splicing. Regulation of alternative splicing is accomplished by the recognition of cis-acting sequences in the pre-mRNA, usually located in the variable exonic or adjacent intronic sequences, by trans-acting RNA-binding proteins known as splicing factors [6]. In this protocol, we make use of a splicing “minigene” construct to study the regulation of alternative splicing of a variable exon within an alternatively spliced gene [7]. This minigene offers a versatile tool for studying the regulation of splicing of a particular exon, as it can be expressed in cell culture along with potential splicing factors that could influence variable exon inclusion or skipping.

2 Materials

2.1 RNA Source Material

Alternative splicing can be examined using RNA from any source. In this protocol we use RNA extracted from mammalian cells grown in tissue culture.

2.2 RNA Extraction and RT

1. E.Z.N.A.® Total RNA Isolation Kit (Omega Bio-Tek).
2. GoScript™ Reverse Transcriptase Reagents (Promega).

2.3 qPCR

1. GoTaq® Green Master Mix (Promega).
2. Primers for specific splice isoform detection and a housekeeping gene. As an example, primers for the detection of CD44 splice isoforms and housekeeping gene TBP are included in Table 1.

2.4 Semiquantitative PCR

1. HotStarTaq Plus DNA Polymerase Reagents (Qiagen).
2. Primers for detecting minigene splice isoforms. As an example, primers for the detection of a CD44 variable exon 8 (v8) minigene are included in Table 2.
3. 0.5× TBE buffer (45 mM Tris, 45 mM boric acid, 1 mM EDTA) for electrophoresis.


Table 1 qPCR primers for analysis of CD44 alternative splicing

Primer name | Forward (5′-3′) | Reverse (5′-3′)
CD44v with v5/v6 exons | GTAGACAGAAATGGCACCAC | CAGCTGTCCCTGTTGTCGAA
CD44s | TACTGATGATGACGTGAGCA | GAATGTGTCTTGGTCTCTGGT
Human TBP | GGAGAGTTCTGGGATTGTAC | CTTATCCTCATGATTACCGCAG

Table 2 Primers for analysis of CD44 v8 minigene alternative splicing

Primer name | Forward (5′-3′) | Reverse (5′-3′)
qPCR v8 minigene inclusion | CAATGACAACGCTGGCACAA | CCAGCGGATAGAATGGCGCCG
qPCR v8 minigene skipping | GAGGGATCCGGTTCCTGCCCC | CAGTTGTGCCACTTGTGGGT
Semiquantitative PCR v8 | CCAGCGGATAGAATGGCGCCG | GAGGGATCCGGTTCCTGCCCC

4. 1.5% agarose gel prepared with 0.5× Tris–borate–EDTA (TBE) buffer and 0.5 μg/mL ethidium bromide.

2.5 Cotransfection of Splicing Minigene and Splicing Factors

1. Transfectable mammalian cell line, such as HEK293T (ATCC® CRL-3216™).
2. Gibco® Dulbecco’s Modified Eagle Medium High Glucose (plain media as well as media supplemented with L-glutamine and 10% fetal bovine serum).
3. Lipofectamine 2000 Reagents (Invitrogen).
4. Mammalian expression plasmids encoding the splicing minigene and splicing factors of interest. As an example, we use a CD44 v8 minigene plasmid, a splicing factor hnRNPM plasmid, and a pcDNA3 control plasmid for cotransfection.

2.6 Equipment

1. Thermo Fisher Scientific NanoDrop 1000 Spectrophotometer for quantification of RNA.
2. A thermal cycler, such as the Bio-Rad DNA Engine Tetrad® 2, for the reverse transcriptase reaction and semiquantitative PCR.
3. A real-time thermal cycler such as the Roche LightCycler® 480 II System with associated LightCycler software for qPCR.
4. Horizontal electrophoresis apparatus.
5. A UV transilluminator with camera, such as the Bio-Rad Gel Doc XR System with Quantity One 1-D Analysis Software, for visualization and densitometric analysis of ethidium bromide-stained DNA.
6. A tabletop centrifuge such as the Eppendorf Centrifuge 5424.

3 Methods

Prepare all reactions with molecular grade nuclease-free water to prevent RNA degradation.

3.1 RNA Extraction and Purification

1. Total RNA from mammalian cells in tissue culture is extracted using the E.Z.N.A.® Total RNA Isolation Kit (Omega Bio-Tek) by following the product manual (see Note 1). For RNA isolation via this kit, cells can be directly lysed using the provided lysis buffer.
2. As per the manufacturer's protocol, first collect cells in 350 μL RNA lysis buffer.
3. Add 350 μL of 70% ethanol to the lysate and mix thoroughly. Then transfer the sample to the RNA purification column.
4. Centrifuge at 10,000 × g for 1 min and discard the flow-through.
5. Wash the column once with 500 μL of RNA wash buffer I and twice with RNA wash buffer II, centrifuging at 10,000 × g for 1 min between each wash and discarding the flow-through.
6. Remove residual RNA wash buffer II from the column by centrifugation at maximum speed for 2 min.
7. Transfer the column into a new 1.5 mL tube, add 50 μL nuclease-free water at the center of the column matrix, and elute RNA by centrifugation at maximum speed for 1 min.
8. Determine RNA quantity using a UV spectrophotometer (such as NanoDrop). High-quality RNA should have a 260 nm:280 nm absorbance ratio of approximately 2.0. RNA should be stored at −80 °C for long-term storage.
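The quality check in step 8 can be written down as a small calculation. The sketch below assumes the common convention that an A260 of 1.0 (1 cm path length) corresponds to roughly 40 ng/μL of RNA; the acceptance window of 1.8–2.2 around the "approximately 2.0" ratio, the function names, and the example readings are all assumptions made for illustration, and instruments such as the NanoDrop report concentration directly.

```python
def rna_concentration_ng_per_ul(a260: float, dilution: float = 1.0) -> float:
    """Estimate RNA concentration from A260 (1 cm path length).

    Assumes the usual convention: A260 of 1.0 is about 40 ng/µL of RNA.
    """
    return a260 * 40.0 * dilution


def purity_ok(a260: float, a280: float,
              low: float = 1.8, high: float = 2.2) -> bool:
    """True if the A260/A280 ratio falls inside the assumed window around 2.0."""
    return low <= a260 / a280 <= high


def volume_for_input(input_ng: float, conc_ng_per_ul: float) -> float:
    """Volume (µL) of RNA needed to deliver input_ng into a reaction."""
    return input_ng / conc_ng_per_ul


# Hypothetical readings, for illustration only.
conc = rna_concentration_ng_per_ul(a260=6.25)   # 250 ng/µL
print(conc, purity_ok(a260=6.25, a280=3.125), volume_for_input(500.0, conc))
```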

3.2 RT Reaction

1. In a reaction totaling 20 μL, combine 250–1000 ng of total RNA with 0.05 μg random hexamer primers, 50 pmol MgCl2, 10 pmol dNTPs, 2 μL 5× GoScript™ Buffer, and 1 μL of GoScript™ Reverse Transcriptase (see Note 2). Mix by vortexing and centrifuge briefly.
2. Incubate the reaction mixture at 25 °C for 5 min for primer annealing, 42 °C for 60 min for the RT reaction, and 70 °C for 5 min to inactivate the RT enzyme. These incubations are best performed in a programmable thermocycler. Completed reactions may now be frozen at −20 °C for long-term storage.

3.3 Primer Design for qPCR

1. In this protocol we provide an example to detect the most common form of alternative splicing, exon skipping. In a hypothetical four-exon pre-mRNA (Fig. 1a), exon 3 is a variable exon that is either included in the mature mRNA to generate a longer isoform or skipped to generate a shorter isoform. Three primers are designed to differentiate one splice isoform from another. A forward primer lies within the variable exon 3, with a reverse primer in the immediately downstream constitutive exon 4. These primers allow the selective amplification of the longer isoform with variable exon 3 included. To detect the shorter isoform with the variable exon 3 skipped, a forward primer is designed that spans the junction between exons 2 and 4. This junction is disrupted when exon 3 is included. Thus, together with the corresponding reverse primer located in constitutive exon 4, this primer set only amplifies mRNA isoforms where exon 3 is skipped. When multiple consecutive variable exons exist in the gene of interest, primers can be designed in two different variable exons for detection of specific variable exon-containing isoforms (Fig. 1b).
2. It is useful to design qPCR primers that can detect all isoforms of the mRNA of interest. In this way the total expression of the gene as well as the expression of individual isoforms can be compared. In the hypothetical four-exon pre-mRNA (Fig. 1a), exons 1 and 2 are constitutive exons immediately adjacent to one another without any variable exons in between. By designing a forward primer in exon 1 and a reverse primer in exon 2, all mature mRNA isoforms generated from this pre-mRNA can be detected via PCR.
3. Primers should also be designed to amplify a reference “housekeeping” gene that is expected to be uniformly expressed in all samples. In this protocol we use primers amplifying a segment of human TATA-binding protein (TBP) (Table 1).
4. Primers should be designed with an ideal melting temperature (Tm) of approximately 55 °C. The primer length is generally 18–22 nucleotides. The Tm of each primer can be estimated using online tools such as the OligoAnalyzer of Integrated DNA Technologies (http://www.idtdna.com/analyzer/applications/oligoanalyzer). The amplicon for each primer pair should be between 80 and 150 base pairs.
5. As an example, we have depicted primer design to detect alternatively spliced isoforms of the CD44 gene, which encodes a family of CD44 transmembrane proteins (Fig. 1b, Table 1). CD44 contains 9 constitutive exons and 9 variable exons, and in our example, primers have been designed to detect isoforms containing CD44 variable exons 5–6 (CD44v5–6) and isoforms in which all variable exons have been skipped (CD44s).
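The junction-spanning primer from step 1 and the length guideline from step 4 can be sketched in a few lines. The exon sequences below are invented for illustration, the function names are hypothetical, and the Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a rough rule of thumb swapped in for the OligoAnalyzer estimate that the protocol recommends, so its values will not match that tool.

```python
def skipping_junction_primer(upstream_exon: str, downstream_exon: str,
                             left: int = 10, right: int = 10) -> str:
    """Forward primer spanning the exon 2 / exon 4 junction.

    Takes the last `left` nt of the upstream exon and the first `right` nt
    of the downstream exon, so it anneals only when the variable exon
    between them has been skipped.
    """
    return (upstream_exon[-left:] + downstream_exon[:right]).upper()


def wallace_tm(primer: str) -> int:
    """Rough Tm in °C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def length_ok(primer: str, lo: int = 18, hi: int = 22) -> bool:
    """Check the 18-22 nt length guideline from step 4."""
    return lo <= len(primer) <= hi


# Hypothetical exon sequences, for illustration only.
exon2 = "GATTACAGGCTTCAGGATCC"
exon4 = "TTGACCGTAAGCTAGGCTAA"
primer = skipping_junction_primer(exon2, exon4)
print(primer, length_ok(primer), wallace_tm(primer))
```

Splitting the primer evenly across the junction is a choice made for this sketch; in practice the split is adjusted so that neither half alone anneals stably to the exon-included template.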

Fig. 1 Characterization of alternative splicing by qRT-PCR. (a) A schematic illustrating a hypothetical four-exon gene with a variable exon 3 that undergoes exon-inclusion or exon-skipping. Three primer sets for qPCR indicated by the paired arrows are designed to amplify the total mRNA, the variable exon 3 inclusion isoform, or the exon-skipping isoform. (b) A schematic depicting the exon structure of the transmembrane protein CD44, with the 9 constitutive and 9 variable exons labeled. CD44v3-v10 and CD44s are predominant splice

3.4 Quantification of Splicing Via qPCR

1. Prepare qPCR reactions in a total volume of 20 μL by combining 0.1–1.0 μL of cDNA generated from the RT reaction with 10 μL GoTaq® Green Master Mix and 5 pmol each of forward and reverse primers. Be sure to include reactions for the reference gene.
2. Perform triplicate qPCR reactions with 40–50 cycles of the following steps: 95 °C denaturation for 10 s, 58 °C annealing for 10 s, 60 °C extension for 30 s. Include a thermal denaturing step to generate dissociation curves that can be used to verify specific amplification of PCR products (see Note 3).
3. After the PCR is completed, the amplified signal from each reaction is presented as a threshold cycle (Ct) determined using an arbitrary detection threshold, usually set by the qPCR software to reside within the exponential range of PCR amplification. Relative quantification of splice isoforms is accomplished via comparison to expression of a reference gene such as TBP. Relative quantification assumes that the PCR amplification efficiency for each primer pair is perfect, resulting in doubling of PCR amplicons with every cycle. Since the reference gene expression remains constant between experimental samples, the quantity of mRNA for each splice isoform relative to the reference gene is calculated using the following formula: $2^{-\Delta C_t}$, where $\Delta C_t = C_t(\text{splice isoform}) - C_t(\text{reference mRNA})$. As an example, quantification of two splice isoforms of CD44 was conducted via qPCR using RNA extracted from the immortalized human mammary epithelial cell line HMLE (Fig. 1c). The primers used are listed in Table 1.
4. Relative levels of splice isoforms of the same gene can also be expressed as a ratio of one splice isoform to another, for example a variable exon inclusion–variable exon skipping ratio. In such a scenario, a reference gene is not required because splice isoforms of the same gene from the same sample are compared. The ratio of splice isoforms is calculated using the following formula: $2^{-\Delta C_t}$, where $\Delta C_t = C_t(\text{inclusion splice isoform mRNA}) - C_t(\text{skipping splice isoform mRNA})$ in the same sample. Relative comparison of this ratio between different experimental samples is accomplished via the formula: $2^{-\Delta C_t(\text{experiment})} / 2^{-\Delta C_t(\text{control})}$, where $\Delta C_t(\text{experiment}) = C_t(\text{inclusion splice isoform mRNA in experimental sample}) - C_t(\text{skipping splice isoform mRNA in experimental sample})$ and $\Delta C_t(\text{control}) = C_t(\text{inclusion splice isoform mRNA in control sample}) - C_t(\text{skipping splice isoform mRNA in control sample})$. In this way the fold change in inclusion–skipping ratio compared to the control sample is obtained.

Fig. 1 (continued) isoforms and are depicted. Paired arrows indicate primer sets to amplify CD44v v5-v6 containing isoforms and the CD44s isoform which lacks all variable exons. (c) Representative qRT-PCR results using primers in Table 1 to show the relative expression of CD44v5/v6 and CD44s, normalized to TBP, in an immortalized human mammary epithelial cell line (HMLE)

Fig. 2 Semiquantitative PCR for examination of exon skipping. A schematic is shown depicting a hypothetical four-exon gene with variable exon 3 undergoing alternative splicing. The paired arrows indicate primer sets that flank the variable exon 3 and thus amplify both variable exon-inclusion and skipping isoforms from a cDNA template. The different splice isoforms can be distinguished by size using semiquantitative PCR and electrophoresis

3.5 Primer Design for Detection of Splice Isoforms Via Semiquantitative PCR

Through careful primer and PCR design, differences in splice isoform expression in cDNA generated from total RNA can be visualized using agarose gel electrophoresis. Returning to the hypothetical example of a four-exon pre-mRNA, a forward primer in constitutive exon 2 and a reverse primer in constitutive exon 4 will amplify differently sized PCR amplicons depending on whether or not the template includes variable exon 3 (Fig. 2). Using semiquantitative PCR, these primers facilitate the amplification, separation via electrophoresis, and approximate quantification of different splice isoforms. Primers should be designed with an ideal Tm of 55 °C, and ideal amplicons should be no more than 300 base pairs long (see Note 4).

3.6 Performing and Analyzing Semiquantitative PCR

1. In a 20 μL reaction, combine 0.1–1.0 μL cDNA with 2 μL 10× PCR Buffer (Qiagen), 4 μL 5× Q-Solution (Qiagen), 0.5 μL HotStarTaq DNA Polymerase (Qiagen), and 5 pmol each of forward and reverse primers.
2. Complete the PCR reaction using the following “hot start” program: 95 °C denaturation for 5 min, approximately 30 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s, and lastly 72 °C for 10 min. Ensure that PCR is completed in the exponential range (see Note 5).
3. Visualize PCR reactions via horizontal gel electrophoresis using a 1.5% agarose-TBE gel prepared with 0.5 μg/mL ethidium bromide (see Note 6). Detection of two distinct bands differing in size by exactly the length of the variable exon indicates a successful PCR amplification. The approximate quantity of each splice isoform present in the original RNA sample can be inferred using densitometric analysis by comparing the fluorescence intensity of the inclusion and skipping PCR amplicons using a UV transilluminator with camera and image intensity quantitation software (such as the Bio-Rad Gel Doc XR system with Quantity One 1-D Analysis Software). A brighter band indicates higher expression of the corresponding splice isoform in the original RNA.

3.7 Generating a Splicing Minigene to Analyze Regulation of Alternative Splicing of a Variable Exon

Splicing minigenes are useful tools to interrogate the regulation of splicing of a variable exon of interest. A splicing minigene is an expression plasmid consisting of a variable exon of interest flanked by two constitutive exons. It is important to include the immediately upstream and downstream endogenous intronic sequences flanking the variable exon where potential cis-acting sequences important for determining the inclusion or skipping of the exon may be located [7] (see Note 7). As an example, we have included a CD44 v8 minigene with variable exon 8 (v8) of CD44 bounded by the adjacent 250 base pairs of its upstream and downstream introns. This sequence is flanked by two constitutive exons from the human insulin gene with functioning 50 and 30 splice sites. Primers for detection of exon inclusion and exon skipping were designed in accordance with the principles mentioned in Subheadings 3.3 and 3.5 (Fig. 3a, Table 2). Splicing minigenes incorporating fluorescent signals can also be used for splicing analysis, especially in the context of large-scale alternative splicing screens. The RG6 minigene [8] is constructed so that either EGFP or dsRED is expressed by including or excluding an exon with a length of 3n + 1 bp, respectively. Exon skipping results in the production of an in frame dsRED, while exon inclusion creates a +1 bp frameshift, producing EGFP in frame. The mutually exclusive nature of dsRED and EGFP expression provides a sensitive and quantitative readout for alternative splicing, suitable for large-scale screening. As an example shown in Fig. 3b, the splicing minigene contains a 28-bp artificial exon. To determine the splicing effect of an intronic regulatory element I-8, originally located within the downstream intron of CD44 v8 exon, the I-8 sequence element was inserted into the intron downstream of the alternative exon [9, 10]. 
Exon skipping or inclusion leads to mutually exclusive expression of dsRED or EGFP, respectively, which can then be quantified through imaging or flow cytometry.
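The reading-frame logic of the bichromatic reporter can be sketched in a few lines of Python 3. This is an illustration only: the function name `reporter_for` is hypothetical, and the sketch assumes, as described above, that dsRED is read in frame when the exon is skipped and that including an exon of 3n + 1 bp shifts the frame by +1 bp so that EGFP is read instead.

```python
def reporter_for(exon_length, included):
    """Return the fluorescent protein read in frame for a 3n + 1 bp exon.

    Assumption (from the text above): skipping leaves dsRED in frame, and
    including an exon of length 3n + 1 shifts the frame by +1 bp so that
    EGFP is read in frame instead.
    """
    if exon_length % 3 != 1:
        raise ValueError("the reporter design expects an exon of 3n + 1 bp")
    return "EGFP" if included else "dsRED"


# The 28-bp artificial exon mentioned above satisfies 28 = 3 * 9 + 1.
print(reporter_for(28, included=True))
print(reporter_for(28, included=False))
```

The length check makes the design constraint explicit: an exon whose length is not 3n + 1 would not switch cleanly between the two reading frames.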

3.8 Investigating the Regulation of Alternative Splicing Via Cotransfection of a Splicing Minigene and Splicing Factors

1. As the splicing minigene contains native cis-acting splicing regulatory sequences within or near the variable exon, trans-acting RNA binding proteins are capable of modulating inclusion or skipping of the variable exon in the minigene in a transfectable cell model. As an example we will describe cotransfection of the CD44 v8 minigene plasmid and a plasmid expressing splicing factor hnRNPM in the HEK293T mammalian cell line. In this case, hnRNPM promotes exon-skipping of the CD44 v8 minigene [11] (Fig. 3c, d).

2. In a tissue culture hood using aseptic technique, plate 2.25 × 10^5 HEK293T cells in 24-well tissue culture plates using DMEM containing 10% FBS (see Subheading 2).

3. 24 h after seeding cells, at which time the cells are approximately 90% confluent, conduct transfections using Lipofectamine 2000 (Invitrogen) (see Note 8). Per well of a 24-well plate, combine 1.5 μL Lipofectamine 2000 with 50 μL plain DMEM (see Note 9) and incubate 5 min at room temperature.

4. Combine Lipofectamine and media with transfecting plasmids diluted in 50 μL plain DMEM and mix well. As an example, transfect all wells with 100 ng CD44 v8 minigene, one well with a control plasmid such as pcDNA3, and other wells with splicing factors of interest such as hnRNPM. Each well should be transfected with a total of 800 ng of plasmid, so add in the remaining DNA with a control vector such as pcDNA3 (for example, a well could be transfected with 100 ng CD44 v8 minigene, 400 ng hnRNPM, and 300 ng pcDNA3). Incubate mixture for 20 min at room temperature.

5. During the above incubation, replace media on HEK293T cells with fresh DMEM containing 10% FBS.

6. Add Lipofectamine–transfectant mixture dropwise slowly over cells and incubate at 37 °C for 24 h (see Note 10).

7. Collect cells using RNA extraction protocol detailed in Subheading 3.1.

8. As an example, qRT-PCR data (Fig. 3c) and semiquantitative RT-PCR data (Fig. 3d) from the CD44 v8 splicing minigene experiments were collected using primers in Table 2 [11] (see Note 11).

Fig. 3 A splicing minigene assay for characterization of exon skipping. (a) A schematic of the CD44 v8 minigene is shown. The CD44 v8 variable exon flanked by approximately 250 bp of upstream and downstream intronic sequence is cloned between constitutive exons c1 and c2. Exons are shown as boxes and introns as lines. Primer sets in black are used for qRT-PCR reactions (one set amplifying v8 inclusion, one set amplifying v8 skipping). The primer set in white is designed for semiquantitative methods (a common set amplifying both inclusion and skipping isoforms). (b) A schematic of a fluorescent splicing reporter containing an intronic regulatory element I-8. dsRED and EGFP are cloned after constitutive exon c2, where dsRED is in frame with c2, while the EGFP open reading frame has a +1 bp frameshift compared to dsRED. Inclusion of the variable exon produces EGFP while skipping produces dsRED. (c) qRT-PCR data after cotransfection of 100 ng CD44 v8 minigene with 400 ng of splicing factor hnRNPM using v8 inclusion and skipping primers in Table 2. A relative Inclusion/Skipping ratio (Incl/Skip) is plotted for Ctrl and hnRNPM, indicating that the Incl/Skip ratio decreases when hnRNPM is transfected. (d) Semiquantitative PCR image showing decreased percent inclusion (% inclusion; 94 for Ctrl and 17 for hnRNPM) after transfection with 400 ng hnRNPM using semiquantitative v8 primers in Table 2. Both images were from the same gel with uniform exposure time. % inclusion was calculated using densitometric image analysis software to divide the pixel intensity of the inclusion band by the combined pixel intensity of both inclusion and exclusion bands. (Figures 3c and 3d were modified from Fig. 4 in our previous publication under Creative Commons License CC-BY-NC 4.0)
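The percent-inclusion calculation described for Fig. 3d can be written out as a short Python 3 sketch. The function name `percent_inclusion` and the band intensities used below are hypothetical; they are not measurements from the figure.

```python
def percent_inclusion(inclusion_intensity, skipping_intensity):
    """Percent inclusion = inclusion band / (inclusion + skipping bands) * 100."""
    total = inclusion_intensity + skipping_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * inclusion_intensity / total


# Hypothetical pixel intensities for the inclusion and skipping bands.
print(percent_inclusion(300.0, 100.0))
```

Both intensities should come from the same gel image and exposure, as noted in the legend, so that the ratio is comparable between lanes.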

4 Notes

1. Using the E.Z.N.A.® Total RNA Isolation Kit (Omega Bio-Tek), genomic DNA contamination is minimal. In addition, as described in the Methods section, the primers are designed in different exons. This eliminates PCR products amplified from genomic DNA, which contains long intronic sequences. Therefore, the RNA may be used directly for RT-PCR analysis. If genomic DNA contamination is a concern, the manufacturer provides instructions for an on-column DNase treatment protocol. Also, during the elution step, we elute with RNase-free H2O. If DEPC-treated H2O is used as an eluent, it can interfere with quantification of RNA via UV spectrometry [12].

2. Depending on the expression level of the gene of interest and the amount of different splice isoforms in the mRNA, the total input RNA into the Reverse Transcriptase reaction can be adjusted. Usually 250 ng of total RNA is sufficient; however, up to 1 μg of RNA may be added to the reaction to increase the cDNA concentration without saturating the Reverse Transcriptase during cDNA synthesis.

3. The thermal denaturing step in qPCR is critical for confirming specific amplification of the target PCR product. The thermal denaturation curve for each pair of primers per reaction should show a single isolated peak with a uniform melting temperature. If multiple peaks with different melting temperatures are observed, then nonspecific amplification is occurring and the PCR reaction must be optimized before qPCR results can be analyzed [5]. Redesigning primers and adjusting the annealing temperature of the reaction may be necessary. We also recommend electrophoresing the PCR products on an agarose gel and visualizing the product using ethidium bromide under UV. If amplification is specific, a single band of the appropriate size should be observed.

4. Designing primers so that the PCR amplicons for each splice isoform are relatively small (ideally 80–300 base pairs) helps ensure that both the large and small amplicons are amplified with similar efficiency during the PCR reaction. Automated tools to design primers include PrimerSeq (primerseq.sourceforge.net) for semiquantitative PCR design and Primer-BLAST for splice junction design (https://www.ncbi.nlm.nih.gov/tools/primer-blast/) [13, 14].

5. Accuracy of semiquantitative PCR for visualizing the relative amounts of splice isoforms is dependent on the number of PCR cycles. The relative quantities of splice isoforms in a sample of mRNA remain proportional during the exponential phase of a PCR reaction, but this proportional relationship is less accurate in later PCR cycles once the reaction reaches the linear and plateau phases as PCR reagents and the polymerase are exhausted [5]. This is the reason qPCR offers more accurate quantification of expression compared to semiquantitative PCR. It is important to perform a series of PCR cycles to identify the range of cycles where the proportional intensity of the different splice isoform bands remains constant [15]. This range also depends on the expression level of the gene in the original RNA sample. It is also recommended to conduct 32P-labeled PCR by adding 32P-labeled dCTP to the PCR reaction to increase the sensitivity of detecting bands on a gel. In 32P-labeled PCR, the total PCR cycle number can be reduced to approximately 20 cycles to ensure that results are captured during the exponential phase of the reaction.

6. To improve the resolution of PCR product bands, it may be necessary to increase the concentration of the agarose gel to 2%. Polyacrylamide gel electrophoresis (6%) may also be used to resolve even smaller PCR products or resolve splice isoform bands that differ by a small size.

7. Splicing minigenes using constitutive exons not derived from the gene of interest offer a convenient tool for cloning different variable exons of interest into the same minigene construct. However, the actual exons flanking the variable exon may also be included in the minigene to better approximate the genomic cis-acting sequences present for native splicing of the gene. These cis-elements are located in both the variable exon of interest as well as adjacent introns [6].

8. When using the transfection reagent Lipofectamine 2000 (Invitrogen), it is essential that cells be at a high confluency of at least 90%. This is to ensure that cells are still dividing to maximize uptake of transfected DNA while minimizing the impact of cell death that can occur during transfection.

9. Do not use DMEM containing serum when preparing transfection mixtures with the transfection reagent and transfected DNA. Serum proteins can interfere with the formation of the DNA–lipid transfection complexes and reduce transfection efficiency.

10. Although 24 h is usually sufficient to observe efficient expression of transfected DNA, cells can be incubated for up to 48 h to obtain even higher transgene expression.

11. To confirm protein transgene expression, for example transfected splicing factor expression constructs, Western blot analysis using cell lysates and an antibody specific for the protein of interest can be conducted. Efficient expression should result in an intense band on Western blot compared to control.

Acknowledgments

This work was supported by research grants from the US National Institutes of Health (R01 CA182467, R35GM131876) to C.C. C.C. is a Cancer Prevention and Research Institute of Texas Scholar in Cancer Research (RR160009).

References

1. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB (2008) Alternative isoform regulation in human tissue transcriptomes. Nature 456(7221):470–476. https://doi.org/10.1038/nature07509
2. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ (2008) Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 40:1413–1415. https://doi.org/10.1038/ng.259
3. Liu S, Cheng C (2013) Alternative RNA splicing and cancer. Wiley Interdiscip Rev RNA 4:547–566. https://doi.org/10.1002/wrna.1178
4. Keren H, Lev-Maor G, Ast G (2010) Alternative splicing and evolution: diversification, exon definition and function. Nat Rev Genet 11:345–355. https://doi.org/10.1038/nrg2776
5. Freeman WM, Walker SJ, Vrana KE (1999) Quantitative RT-PCR: pitfalls and potential. BioTechniques 26:112–122, 124–125. https://doi.org/10.2144/99261rv01
6. Wang Z, Burge CB (2008) Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. RNA 14:802–813. https://doi.org/10.1261/rna.876308
7. Cooper TA (2005) Use of minigene systems to dissect alternative splicing elements. Methods 37:331–340. https://doi.org/10.1016/j.ymeth.2005.07.015
8. Orengo JP, Bundman D, Cooper TA (2006) A bichromatic fluorescent reporter for cell-based screens of alternative splicing. Nucleic Acids Res 34:e148. https://doi.org/10.1093/nar/gkl967
9. Huang H, Zhang J, Harvey SE, Hu X, Cheng C (2017) RNA G-quadruplex secondary structure promotes alternative splicing via the RNA-binding protein hnRNPF. Genes Dev 31:2296–2309. https://doi.org/10.1101/gad.305862.117
10. Zhang J, Harvey SE, Cheng C (2019) A high-throughput screen identifies small molecule modulators of alternative splicing by targeting RNA G-quadruplexes. Nucleic Acids Res 47:3667–3679. https://doi.org/10.1093/nar/gkz036
11. Xu Y, Gao XD, Lee J-H, Huang H, Tan H, Ahn J, Reinke LM, Peter ME, Feng Y, Gius D, Siziopikou KP, Peng J, Xiao X, Cheng C (2014) Cell type-restricted activity of hnRNPM promotes breast cancer metastasis via regulating alternative splicing. Genes Dev 28:1191–1203. https://doi.org/10.1101/gad.241968.114
12. Okamoto T, Okabe S (2000) Ultraviolet absorbance at 260 and 280 nm in RNA measurement is dependent on measurement solution. Int J Mol Med 5:657–659. https://doi.org/10.3892/ijmm.5.6.657
13. Tokheim C, Park JW, Xing Y (2014) PrimerSeq: design and visualization of RT-PCR primers for alternative splicing using RNA-seq data. Genomics Proteomics Bioinformatics 12:105–109. https://doi.org/10.1016/j.gpb.2014.04.001
14. Ye J, Coulouris G, Zaretskaya I, Cutcutache I, Rozen S, Madden TL (2012) Primer-BLAST: a tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13:134. https://doi.org/10.1186/1471-2105-13-134
15. Horner RM (2006) Relative RT-PCR: determining the linear range of amplification and optimizing the primers:competimers ratio. CSH Protoc 2006. https://doi.org/10.1101/pdb.prot4109

Chapter 20

Computational Analysis of lncRNA from cDNA Sequences

Susan Boerner and Karen M. McGinnis

Abstract

Based on recent findings, long noncoding (lnc) RNAs represent a potential class of functional molecules within the cell. In this chapter we describe a computational scheme to identify and classify lncRNAs within maize from full-length cDNA sequences to designate subsets of lncRNAs for which biogenesis and regulatory mechanisms may be verified at the bench. We make use of the Coding Potential Calculator and specific Python scripts in our approach.

Key words Long noncoding RNA, miRNA, siRNA, NATs, Maize

1 Introduction

Single sequence (alignment-free) de novo identification of long noncoding (lnc)RNAs has traditionally been based on three characteristics: base pair length (>200 bp), open reading frame (ORF) length (variable, but generally [...]

    [...] 200:
            l.append(record)
    print "number of starting cDNA sequences:", len(L)
    print "number of cDNAs over 200bp:", len(l)
    output = open("maize_flcDNA_over_200bp.fasta", "w")
    SeqIO.write(l, output, "fasta")
    output.close()
    input1.close()
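The script tail above keeps cDNAs longer than 200 bp and writes them to "maize_flcDNA_over_200bp.fasta". As a minimal sketch of that length filter, the following Python 3 code uses only the standard library rather than Biopython's SeqIO (which the chapter's scripts rely on), so that it runs without extra dependencies. The helper names `read_fasta` and `filter_by_length` and the two example records are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=200):
    """Return the records whose sequence is strictly longer than min_length."""
    return [(h, s) for h, s in records if len(s) > min_length]


# Hypothetical two-record input used only to exercise the functions.
example = [">short_cDNA", "ACGT" * 10, ">long_cDNA", "ACGT" * 60]
kept = filter_by_length(read_fasta(example))
print("number of cDNAs over 200bp:", len(kept))
```

To run the same filter on a file, an open file handle can be passed to `read_fasta` in place of the example list.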

3.2 Identification of Potential lncRNAs Using the Coding Potential Calculator

1. Running the Coding Potential Calculator

Go to the Coding Potential Calculator website: cpc.cbi.pku.edu.cn. Click on “Run CPC” from either the top or side menu. Under “file upload” click “Browse” and select the FASTA file that you created in Subheading 3.1, step 2 (maize_flcDNA_over_200bp.fasta), and click “run” (see Note 6). Depending on how many cDNA sequences you have, this analysis can take several minutes or several hours to complete. After the run begins, you are given a Task ID. Save this ID so you can retrieve your results at a later time (if needed) by clicking on “Get Results” and entering your Task ID.

2. Extracting noncoding cDNA sequence IDs and creating a FASTA file of potential lncRNAs

If you saved your Task ID, you can return to the CPC website and click on “Get Results” from the side menu, enter your Task ID and click “view.” At the top of the page, click on “Raw Data” (red text), and then click on “CPC Results.” Save the results as a text file (e.g., right-click and select “save as”), and name the file “cpc.txt”. Run the following Python script to extract the necessary data from this file and create a new FASTA file containing the cDNA sequences identified as noncoding by CPC (see Note 7). Again, the input and output files are highlighted.


    #!/usr/bin/env python
    from Bio import SeqIO

    # Write a header-only FASTA file listing the IDs that CPC calls noncoding
    input1 = open("cpc.txt", "rU")
    report = file("noncoding.fasta", 'w')
    for line in input1:
        line = line.rstrip()
        if line[14] == "n":
            if int(line[25]) >= 1:
                print >> report, ">" + line[0:8]
    report.close()

    # Collect the full cDNA records whose IDs appear in that list
    input2 = open("maize_flcDNA_over_200bp.fasta", "rU")
    input3 = open("noncoding.fasta", "rU")
    a = list(SeqIO.parse(input2, "fasta"))
    b = list(SeqIO.parse(input3, "fasta"))
    noncoding = []
    for record in a:
        for rec in b:
            if record.id == rec.id:
                noncoding.append(record)
    print "number of potential lncRNAs identified by CPC:", len(noncoding)

    output = open("lncRNAs.fasta", "w")
    SeqIO.write(noncoding, output, "fasta")
    output.close()
    input1.close()
    input2.close()
    input3.close()
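The script above reads the classification and score from fixed character positions, which only works when the ID is eight characters long and the length column has four digits. A hedged alternative is to split each line on tabs. The Python 3 sketch below assumes that each line of "cpc.txt" holds four tab-separated fields (sequence ID, length, coding/noncoding label, score); that layout is an assumption, not something stated in this chapter, and the function name and example lines are hypothetical.

```python
def select_noncoding_ids(lines, max_score=-1.0):
    """Return IDs labeled noncoding with a score at or below max_score.

    Assumes four tab-separated fields per line: ID, length, label, score.
    """
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        seq_id, _length, label, score = fields[:4]
        if label == "noncoding" and float(score) <= max_score:
            ids.append(seq_id)
    return ids


# Hypothetical result lines, not real CPC output.
example = [
    "BT016154\t1234\tnoncoding\t-1.25",
    "BT016155\t987\tnoncoding\t-0.40",
    "BT016156\t1500\tcoding\t2.10",
]
print(select_noncoding_ids(example))
print(select_noncoding_ids(example, max_score=0.0))
```

Raising `max_score` to 0.0 plays the same role as lowering the highlighted value in the original script: it also admits sequences with weak noncoding scores.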

You now have a new FASTA file of potential lncRNAs, in this case named “lncRNAs.fasta”.

3.3 Filtering Known miRNA Precursors

1. Download known miRNA sequences from Ensembl

Go to http://ensembl.gramene.org/Zea_mays/Info/Index (see Note 8). Under “Gene annotation”, click on the FASTA link next to “Download genes, cDNAs, ncRNA, proteins”. Click on the “ncrna” folder. Click on Zea_mays.AGPv3.23.ncrna.fa.gz to download the file. Unzip the file, which contains known miRNA sequences in Zea mays, and rename it “known_miRNA.fasta”.

2. Run Python script to filter known miRNA genes from lncRNAs

Before running the following Python script, you need to create a BLAST database of the lncRNAs generated in Subheading 3.2 (see Note 9). At the command line, pass the following:

    makeblastdb -in lncRNAs.fasta -dbtype nucl -title lncRNA_db -out lncRNA_db

The script creates a text file that lists the lncRNA ID with its associated known miRNA ID, the hsp score, and hsp identities for their alignment. The script determines lncRNA precursor potential based on a 98% identity match between the entire known miRNA sequence and the lncRNA (without gaps in the alignment). This stringency can of course be adjusted if desired. The percent identity, the name of the report file, input and output files, and the BLAST database name are highlighted in the script.

    #!/usr/bin/env python
    from Bio import SeqIO
    from Bio.Blast.Applications import NcbiblastnCommandline
    from Bio.Blast import NCBIXML

    input1 = open("known_miRNA.fasta", "rU")
    RNAs = list(SeqIO.parse(input1, "fasta"))
    report = file("known_miRNA_in_lncRNAs.txt", 'w')
    lncRNA_list = file("lncRNAs with RNA list.fasta", 'w')

    # BLAST each known miRNA sequence against the lncRNA database
    for record in RNAs:
        sequence = open("fasta1.fasta", "w")
        SeqIO.write(record, sequence, "fasta")
        sequence.close()
        blast_cline = NcbiblastnCommandline(
            db="lncRNA_db", ungapped="yes", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out.xml", max_target_seqs=1)
        blast_cline()
        handle = open("out.xml", "rU")
        blast_record = NCBIXML.read(handle)
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.identities >= (len(record.seq)*0.98):
                    print >> report, "RNA:", record.id
                    print >> report, "length(bp):", len(record.seq)
                    print >> report, "lncRNA:", alignment.title
                    print >> report, "length(bp):", alignment.length
                    print >> report, "hsp score:", hsp.score
                    print >> report, "hsp identities:", hsp.identities
                    print >> report, "---------"
                    print >> lncRNA_list, ">" + alignment.title[alignment.title.find("BT"):(alignment.title.find("BT"))+10]
        handle.close()
    lncRNA_list.close()

    # Split the lncRNAs into those with and without a known miRNA
    input2 = open("lncRNAs.fasta", "rU")
    input3 = open("lncRNAs with RNA list.fasta", "rU")
    lncRNA = list(SeqIO.parse(input2, "fasta"))
    lncRNAlist = list(SeqIO.parse(input3, "fasta"))
    lncRNARNA = []
    for record in lncRNA:
        for rec in lncRNAlist:
            if record.id == rec.id:
                lncRNARNA.append(record)
    lncRNARNA = list(set(lncRNARNA))
    print "number of lncRNAs with miRNA:", len(lncRNARNA)
    for record in lncRNARNA:
        lncRNA.remove(record)
    print "number of lncRNAs without miRNA:", len(lncRNA)

    output = open("lncRNAs_with_known_miRNAs.fasta", "w")
    SeqIO.write(lncRNARNA, output, "fasta")
    output.close()
    output = open("lncRNAs_without_known_miRNAs.fasta", "w")
    SeqIO.write(lncRNA, output, "fasta")
    output.close()
    report.close()
    input1.close()
    input2.close()
    input3.close()
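The acceptance test at the heart of the script, `hsp.identities >= (len(record.seq)*0.98)`, can be isolated as a small function to see how the threshold behaves for queries of different lengths. This Python 3 sketch is illustrative only; the function name `is_precursor_hit` and the example numbers are hypothetical.

```python
def is_precursor_hit(identities, query_length, min_fraction=0.98):
    """True when the identical bases cover at least min_fraction of the query."""
    return identities >= query_length * min_fraction


# Hypothetical alignments: (identical bases, query length in nt).
for identities, query_length in [(21, 21), (20, 21), (98, 100), (97, 100)]:
    print(identities, query_length, is_precursor_hit(identities, query_length))
```

Because the number of identities is an integer, a 98% threshold is met only by a perfect match for any query shorter than 50 nt; a single mismatch is tolerated from 50 nt upward. Setting `min_fraction` to 1 demands an exact match at every length.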

We can now move forward with the new set of lncRNAs without known miRNA genes.

3.4 Small RNA Precursor Prediction Via Mapping of Available Small RNA Sequences

1. Generation of a small RNA FASTA file

There are many possible sources of small RNA sequences (see Note 10). For this analysis, we will use a dataset of maize siRNAs available on Gene Expression Omnibus via NCBI [10]. Visit http://www.ncbi.nlm.nih.gov/geo/ and enter the following accession number: GSE15286. From here, you can download 4 sets of small RNA sequences: shRNA.root.fa.gz, shRNA.shoot.fa.gz, siRNA.root.fa.gz, and siRNA.shoot.fa.gz (see Note 11). These files are quite large; see Note 3 for the fastest way to combine them into one file, if desired.

2. Mapping small RNA sequences to lncRNAs

Before running the Python script, you need to create a BLAST database of the lncRNAs generated in Subheading 3.3. At the command line, pass the following:

    makeblastdb -in lncRNAs_without_known_miRNAs.fasta -dbtype nucl -title lncRNAs_without_known_miRNAs_db -out lncRNAs_without_known_miRNAs_db

To map small RNA sequences to the set of lncRNAs use the Python script from Subheading 3.3, step 2 (see Note 12). For an exact sequence alignment of small RNAs, change the highlighted value “0.98” to “1”. Be sure to change the file names and database names accordingly.

We can now move forward with our two sets of lncRNAs, those with small RNA sequences and those without.

3.5 Localization of lncRNAs Within the Genome, Genic Vs Intergenic

1. Downloading genome and gene model sequences

To download chromosomal sequences (see Note 13) for the maize genome, go to: http://ensembl.gramene.org/Zea_mays/Info/Index. Under “Genome assembly: AGPv3”, click on “Download DNA sequence”, and then click on “Zea_mays.AGPv3.23.dna.chromosome.1.fa.gz” to download the file. Then download the remaining chromosome FASTA files (through 10). Extract the files, and combine them into one FASTA file (see Note 3). Rename the file “maize_v3_chromosomes”.

Next, we will use Biomart to extract gene model sequences. Go to: http://plants.ensembl.org/biomart/martview/7b4abd04a0d0c38a22a7c261cda9abdf. Select “Plant Mart” under the “CHOOSE DATABASE” drop down menu, and then “Zea mays genes (AGPv3 (5b))” under the “CHOOSE DATASET” drop down menu. Click on “Filters” on the left-side menu, and under “GENE” check the box for “Limit to genes” and select “with Affymetrix array Maize ID(s)”. Click on “Attributes” on the left-side menu, then select “Sequences” at the top, and select “Unspliced (Gene)” under “SEQUENCES”. Then check the two boxes corresponding to “Upstream flank” and “Downstream flank” below and enter a value for the number of base pairs of the flanking region to include in the gene model sequence file. For this analysis, we will focus on 1500 base pairs of flanking sequence for both upstream and downstream regions of the gene models. Click on “Results” on the top left menu bar, and for “Export all results to” select “File” and “FASTA” and check the “Unique results only” box. When the FASTA file of gene models has finished downloading, rename it “maize_v3_genemodels_flank1500.fasta”.

2. Localize lncRNAs within genome

To run the following Python script to localize lncRNAs within the genome you will need to create two BLAST databases, one for the chromosome sequences, and one for the gene model sequences (maize_v3_chromosomes_db and maize_v3_genemodels_flank1500_db). The script prints IDs for lncRNAs that were not classified, and the total number of genic and intergenic lncRNAs that were classified.


    #!/usr/bin/env python
    from Bio.Blast.Applications import NcbiblastnCommandline
    from Bio.Blast import NCBIXML
    from Bio import SeqIO

    input1 = open("lncRNAs_without_known_smallRNAs.fasta", "rU")
    L = list(SeqIO.parse(input1, "fasta"))
    intergenic = []
    for record in L:
        sequence = open("fasta1.fasta", "w")
        SeqIO.write(record, sequence, "fasta")
        sequence.close()

        # BLAST the lncRNA against the gene models, then the chromosomes
        blast_cline = NcbiblastnCommandline(
            db="maize_v3_genemodels_flank1500_db", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out1.xml", max_target_seqs=1)
        blast_cline()
        blast_cline = NcbiblastnCommandline(
            db="maize_v3_chromosomes_db", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out2.xml", max_target_seqs=1)
        blast_cline()

        handle1 = open("out1.xml", "rU")
        blast_record1 = NCBIXML.read(handle1)
        handle2 = open("out2.xml", "rU")
        blast_record2 = NCBIXML.read(handle2)

        # x: gene model hsp scores; y: chromosome hsp scores
        x = []
        for alignment1 in blast_record1.alignments:
            for hsp1 in alignment1.hsps:
                x.append(hsp1.score)
        y = []
        for alignment2 in blast_record2.alignments:
            for hsp2 in alignment2.hsps:
                y.append(hsp2.score)

        if len(x) == 0 and len(y) == 0:
            print "lncRNA that doesn't have a hit in sequenced chromosomes", record.id
        elif len(x) == 0 and len(y) != 0:
            intergenic.append(record)
        elif len(x) != 0 and len(y) == 0:
            print "lncRNA that is somehow present in genemodel but not in sequenced chromosomes", record.id
        else:
            if len(x) != 0 and len(y) != 0:
                if y[0] > x[0]:
                    intergenic.append(record)
        handle1.close()
        handle2.close()

    print "intergenic lncRNAs:", len(intergenic)
    for record in intergenic:
        L.remove(record)
    print "genic lncRNAs:", len(L)

    output = open("lncRNA_without_smallRNA_GENIC.fasta", "w")
    SeqIO.write(L, output, "fasta")
    output.close()
    output = open("lncRNA_without_smallRNA_INTERGENIC.fasta", "w")
    SeqIO.write(intergenic, output, "fasta")
    output.close()
    input1.close()
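The decision rule in the script compares the top hsp score against the gene model database with the top hsp score against the chromosome database. As a minimal Python 3 sketch, that rule can be separated from the BLAST calls; the function name `classify_location` and the score lists below are hypothetical, and the labels "no hit" and "gene model only" are introduced here for illustration.

```python
def classify_location(gene_model_scores, chromosome_scores):
    """Classify one lncRNA from its hsp scores against the two databases.

    Each list is ordered as BLAST reports it, so element 0 is the top hit.
    """
    if not gene_model_scores and not chromosome_scores:
        return "no hit"
    if not gene_model_scores:
        return "intergenic"
    if not chromosome_scores:
        return "gene model only"
    if chromosome_scores[0] > gene_model_scores[0]:
        return "intergenic"
    return "genic"


# Hypothetical score lists for four lncRNAs.
print(classify_location([], [850.0]))
print(classify_location([850.0], [850.0]))
print(classify_location([120.0], [850.0]))
print(classify_location([], []))
```

A lncRNA that lies entirely within a gene model plus its flanks aligns equally well to both databases, so a higher chromosome score indicates that only part of it overlaps a gene model. Note that the original script leaves the "no hit" and "gene model only" cases in the genic output file after printing their IDs, whereas this sketch reports them as separate labels.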


For this script, the FASTA file for lncRNAs without small RNA sequences was used. The lncRNAs that contain small RNA sequences can also be mapped within the genome using the same script; just change the file names accordingly.

3.6 Sub-Localization of Genic lncRNAs Within Gene Models

1. Downloading gene model sequences for localization

Sequences within gene models are downloaded using Biomart as described in Subheading 3.5, step 1. For this analysis, we will grab the sequences for: “Flank (Gene)”, “5' UTR”, “3' UTR”, “Unspliced (Transcript)”, and “Coding sequence”. The “Flank (Gene)” option needs to be processed twice. Click on “Flank (Gene)”, and then below check the box for “Upstream flank” and enter 1500, and get results. Do this again for “Downstream flank” (check box and enter 1500). Rename these files: “maize_v3_upflank1500.fasta”, “maize_v3_downflank1500.fasta”, “maize_v3_5UTR.fasta”, “maize_v3_3UTR.fasta”, “maize_v3_unspliced_trans.fasta”, and “maize_v3_coding.fasta”, respectively. The unspliced transcript and coding databases will be used in step 3 below.

2. Sub-localization of genic lncRNAs within flanking regions and UTRs

To run the following Python script to sub-localize lncRNAs within the flanking regions and UTRs, you will need to create BLAST databases of the FASTA files from step 1. As the full-length cDNAs used in this analysis are strand specific, the script was written to classify the genic lncRNAs based on their orientation within their respective gene model (sense or antisense). The script determines lncRNA localization based on an 80% identity match between the lncRNA sequence and the gene model region sequence. Again, this stringency can be adjusted if desired. This script also creates a text report file that lists the lncRNA ID, its orientation, the gene model ID, and the alignment scores. You can rerun the script for each database and for each set of genic lncRNAs separately. The script below uses the set of genic lncRNAs without small RNA sequences, and maps these to the upstream flanking regions of maize gene models (see Note 14).


    #!/usr/bin/env python
    from Bio import SeqIO
    from Bio.Blast.Applications import NcbiblastnCommandline
    from Bio.Blast import NCBIXML

    input1 = open("lncRNA_without_smallRNA_GENIC.fasta", "rU")
    genic = list(SeqIO.parse(input1, "fasta"))
    report = file("genic lncRNAs without small RNA in upstream flanking region.txt", 'w')
    sense = []
    antisense = []
    for record in genic:
        sequence = open("fasta1.fasta", "w")
        SeqIO.write(record, sequence, "fasta")
        sequence.close()
        blast_cline = NcbiblastnCommandline(
            db="maize_v3_upflank1500_db", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out.xml", max_target_seqs=1)
        blast_cline()
        handle = open("out.xml", "rU")
        blast_record = NCBIXML.read(handle)
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                if hsp.identities >= (len(record.seq)*0.8):
                    # Subject coordinates run backwards for a minus-strand hit
                    if hsp.sbjct_start > hsp.sbjct_end:
                        antisense.append(record)
                        print >> report, "antisense"
                    elif hsp.sbjct_start < hsp.sbjct_end:
                        sense.append(record)
                        print >> report, "sense"
                    print >> report, "lncRNA:", record.id
                    print >> report, "length(bp):", len(record.seq)
                    print >> report, "gene model:", alignment.title
                    print >> report, "length(bp):", alignment.length
                    print >> report, "hsp.score:", hsp.score
                    print >> report, "hsp.identities:", hsp.identities
                    print >> report, "---------"
        handle.close()
    report.close()

    print "number of lncRNAs in the upstream region in the sense orientation:", len(sense)
    print "number of lncRNAs in the upstream region in the antisense orientation:", len(antisense)

    output = open("genic_lncRNA_without_smRNA_upflank1500_sense.fasta", "w")
    SeqIO.write(sense, output, "fasta")
    output.close()
    output = open("genic_lncRNA_without_smRNA_upflank1500_antisense.fasta", "w")
    SeqIO.write(antisense, output, "fasta")
    output.close()
    input1.close()
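The sense/antisense call in the script depends only on the order of the subject coordinates of the hsp. A minimal Python 3 sketch of that test follows; the function name `orientation` and the coordinates are hypothetical.

```python
def orientation(sbjct_start, sbjct_end):
    """Return the strand of a hit from its subject coordinates.

    A hit on the minus strand of the subject is reported with a start
    coordinate larger than its end coordinate.
    """
    if sbjct_start > sbjct_end:
        return "antisense"
    if sbjct_start < sbjct_end:
        return "sense"
    return None  # a one-base hit carries no strand information


# Hypothetical subject coordinates within a 1500-bp flanking region.
print(orientation(120, 980))
print(orientation(980, 120))
```

This reading of the coordinates is only meaningful because the full-length cDNAs are strand specific, as stated above; for unstranded sequences the label would describe the alignment rather than the transcript.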

3. Sub-localization of genic lncRNAs, exons vs. introns

As intron sequences are not available for v3 gene model builds for maize at this time, a script similar to the one used for genic vs. intergenic localization will be used to distinguish between intronic and exonic lncRNAs. The script compares the highest alignment scores for the lncRNA to the unspliced transcript sequence and the coding sequence. If the top hit of the lncRNA to the unspliced transcript scores higher than the top hit of the lncRNA to a coding sequence, then at least part of the lncRNA is transcribed from an intronic region. In the following script, lncRNAs were classified as intronic if more than 50% of the sequence is complementary to an intron. This step is highlighted in the script, and can be adjusted if desired.


    #!/usr/bin/env python
    from Bio.Blast.Applications import NcbiblastnCommandline
    from Bio.Blast import NCBIXML
    from Bio import SeqIO

    input1 = open("lncRNA_without_smallRNA_GENIC.fasta", "rU")
    L = list(SeqIO.parse(input1, "fasta"))
    intronic = []
    for record in L:
        sequence = open("fasta1.fasta", "w")
        SeqIO.write(record, sequence, "fasta")
        sequence.close()

        # BLAST against unspliced transcripts, then against coding sequences
        blast_cline = NcbiblastnCommandline(
            db="maize_v3_unspliced_trans_db", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out1.xml", max_target_seqs=1)
        blast_cline()
        blast_cline = NcbiblastnCommandline(
            db="maize_v3_coding_db", num_alignments=1,
            num_threads=4, query="fasta1.fasta", outfmt=5,
            out="out2.xml", max_target_seqs=1)
        blast_cline()

        handle1 = open("out1.xml", "rU")
        blast_record1 = NCBIXML.read(handle1)
        handle2 = open("out2.xml", "rU")
        blast_record2 = NCBIXML.read(handle2)

        # x: unspliced transcript hsp scores; y: coding sequence hsp scores
        x = []
        for alignment1 in blast_record1.alignments:
            for hsp1 in alignment1.hsps:
                x.append(hsp1.score)
        y = []
        for alignment2 in blast_record2.alignments:
            for hsp2 in alignment2.hsps:
                y.append(hsp2.score)

        if len(x) != 0 and len(y) == 0:
            intronic.append(record)
        elif len(x) != 0 and len(y) != 0:
            if x[0] > y[0]:
                if x[0] > (len(record.seq)*0.5):
                    intronic.append(record)
        handle1.close()
        handle2.close()

    print "intronic lncRNAs:", len(intronic)
    for record in intronic:
        L.remove(record)
    print "exonic lncRNAs:", len(L)

    output = open("lncRNA_without_smallRNA_GENIC_exonic.fasta", "w")
    SeqIO.write(L, output, "fasta")
    output.close()
    output = open("lncRNA_without_smallRNA_GENIC_intronic.fasta", "w")
    SeqIO.write(intronic, output, "fasta")
    output.close()
    input1.close()
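The intronic test in the script can likewise be written as a standalone function, which makes the adjustable 50% cutoff explicit. This is a Python 3 sketch under the same logic as the script; the function name `is_intronic` and the example scores are hypothetical.

```python
def is_intronic(transcript_scores, coding_scores, query_length, min_fraction=0.5):
    """Apply the script's rule for calling a genic lncRNA intronic.

    transcript_scores: hsp scores against unspliced transcripts (top hit first)
    coding_scores: hsp scores against coding sequences (top hit first)
    """
    if not transcript_scores:
        return False  # no transcript hit: the script leaves it in the exonic set
    if not coding_scores:
        return True   # hits a transcript but no coding sequence
    top_transcript, top_coding = transcript_scores[0], coding_scores[0]
    return top_transcript > top_coding and top_transcript > query_length * min_fraction


# Hypothetical scores for a 1000-bp lncRNA.
print(is_intronic([900.0], [200.0], 1000))
print(is_intronic([900.0], [900.0], 1000))
print(is_intronic([400.0], [200.0], 1000))
```

As in the script, the cutoff compares the top hsp score directly with half of the lncRNA length, so `min_fraction` is the value to change when adjusting the stringency.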


To further filter the intronic and exonic lncRNAs based on strand specificity, the resulting FASTA files of this script can be run through the script from step 2, using the database “maize_v3_unspliced_trans_db” and the appropriate file names.
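The decision rule applied by the script in step 3 can be isolated as a small function, which makes the 50% threshold easier to adjust and to check on known cases. This is a minimal sketch and not part of the original script: the function name `is_intronic` and the parameter `min_fraction` are introduced here for illustration, and the two score lists correspond to `x` and `y` in the script.

```python
def is_intronic(unspliced_scores, coding_scores, lncrna_length, min_fraction=0.5):
    """Apply the step 3 rule to the HSP scores of the top BLAST hits.

    unspliced_scores: scores against the unspliced transcripts (x in the script)
    coding_scores:    scores against the coding sequences (y in the script)
    Both lists are empty when BLAST reports no hit.
    """
    if not unspliced_scores:
        # No hit to any unspliced transcript: the script does not call it intronic
        return False
    if not coding_scores:
        # Hit to an unspliced transcript but to no coding sequence
        return True
    top_unspliced = unspliced_scores[0]
    top_coding = coding_scores[0]
    # Intronic only if the unspliced hit is better AND exceeds the length threshold
    return top_unspliced > top_coding and top_unspliced > lncrna_length * min_fraction


if __name__ == "__main__":
    print(is_intronic([800.0], [300.0], 1000))
    print(is_intronic([400.0], [300.0], 1000))
```

Raising `min_fraction` makes the intronic call more stringent, which mirrors editing the highlighted value in the script.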

4 Notes

1. These are just a few examples of the computational tools available. We recommend the recent Methods in Molecular Biology book, “RNA Sequence, Structure, and Function: Computational and Bioinformatic Methods”, for an in-depth view of what is currently available [11].

2. The file containing cDNA sequences needs to be in FASTA format. Several online tools exist to convert files to FASTA from other formats. A simple script in Biopython to convert to FASTA from GenBank follows. Replace the input and output file names as needed.

#!/usr/bin/env python
from Bio import SeqIO

input1 = open("cDNAs in genbank format.gb", "r")
cDNAs = SeqIO.parse(input1, "genbank")
output = open("cDNAs in fasta format.fasta", "w")
SeqIO.write(cDNAs, output, "fasta")
output.close()
input1.close()

3. A very efficient way to combine large FASTA files is to use cat at the command line. If your two FASTA files are “shRNAshoot.fasta” and “siRNAshoot.fasta” and you want to combine them into a new file named “smallRNA.fasta”, then run the following.

cat shRNAshoot.fasta siRNAshoot.fasta > smallRNA.fasta

4. To ease the overall understanding of the analysis, sys.argv was not used to create command-line arguments for the Python scripts. Instead, we felt that highlighting the text in the script that can or must be modified would enable the reader to better follow the process (especially considering that the scripts are relatively short).


Susan Boerner and Karen M. McGinnis

5. When copying the script into a text editor to run on your machine, be sure to retain the indentation.

6. There is an “additional options” section on the Run CPC web page that includes a search against the UTRdb and RNAdb, as well as running the antisense strand of your submitted sequence through the Coding Potential Calculator. The database searches greatly increase the time it takes to complete the run, and are therefore not recommended for this analysis. If the cDNA sequences you have are not strand specific, you may want to include the antisense strand analysis, and only select sequences that are predicted to be noncoding in both the sense and antisense directions.

7. CPC has four classifications of coding potential based on the output score: noncoding, noncoding (weak), coding (weak), and coding. Noncoding transcripts have a score < −1.0; the script here selects only those transcripts with this classification. If you wish to lower the stringency, you can change the highlighted value of “1” in the script to “0”. This will include sequences classified as noncoding (weak).

8. This analysis is specific for maize, but other databases of miRNAs exist for many other species, as well as databases for other small ncRNAs (piwi, sno, etc.).

9. Due to the attributes of the XML file generated for each BLAST alignment and the length of the known miRNA sequences (shorter than lncRNAs), the BLAST database must consist of the lncRNA sequences.

10. Several small RNA databases exist, and next-generation sequencing data of small RNAs can be produced or downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/gds/).

11. The classification of sh vs. si RNA in this study was based on the likelihood of the genomic region surrounding the small RNA sequence to form a short hairpin (ref). For simplicity, these were grouped together, as downstream analysis of potential lncRNAs will likely classify small RNA precursors based on this feature.

12. Depending on the power of your processor, this script may take a long time to run. Be sure to change the number of threads used for BLAST within the script to reflect the number of cores your processor has (“num_threads = 4” is highlighted within the script, and assumes you have a quad-core processor to run each thread).

13. The Python script to distinguish between intergenic and genic lncRNAs is based on a comparison of the alignment scores of the lncRNA to complete genome sequences and gene model sequences. Assembled genome sequences are required for an alignment score of the lncRNA to this set to be comparable to the score against the gene model sequences. The chromosomal sequences are assembled. For the rare cases where the lncRNA falls outside of the assembled chromosomal sequences, and therefore the gene model sequences as well, the script will print the corresponding lncRNA ID for further analysis, if desired.

14. Depending on how stringent your alignment is, you may have lncRNAs that localize to more than one region of a gene model.

References

1. Dinger ME, Pang KC, Mercer TR, Mattick JS (2008) Differentiating protein-coding and noncoding RNA: challenges and ambiguities. PLoS Comput Biol 4(11):e1000176
2. Kong L, Zhang Y, Ye ZQ, Liu XQ, Zhao SQ, Wei L, Gao G (2007) CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res 35:W345–W349
3. Wang L, Park HJ, Dasari S, Wang S, Kocher J-P, Li W (2013) CPAT: coding-potential assessment tool using an alignment-free logistic regression model. Nucleic Acids Res 41(6):e74
4. Li A, Zhang J, Zhou Z (2014) PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC Bioinformatics 15(1):311
5. Kung JTY, Colognori D, Lee JT (2013) Long noncoding RNAs: past, present, and future. Genetics 193(3):651–669
6. Soderlund C, Descour A, Kudrna D, Bomhoff M, Boyd L, Currie J, Angelova A, Collura K, Wissotski M, Ashley E, Morrow D, Fernandes J, Walbot V, Yu Y (2009) Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs. PLoS Genet 5(11):e1000740
7. Xue C, Li F, He T, Liu G, Li Y, Zhang X (2005) Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine. BMC Bioinformatics 6:310
8. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL (2008) The Vienna RNA Websuite. Nucleic Acids Res 36(Web Server issue):W70–W74
9. Amaral PP, Clark MB, Gascoigne DK, Dinger ME, Mattick JS (2011) lncRNAdb: a reference database for long noncoding RNAs. Nucleic Acids Res 39:D146–D151
10. Wang X, Elling AA, Li X, Li N, Peng Z, He G, Sun H, Qi Y, Liu XS, Deng XW (2009) Genome-wide and organ-specific landscapes of epigenetic modifications and their relationships to mRNA and small RNA transcriptomes in maize. Plant Cell 21:1053–1069
11. Gorodkin J, Ruzzo WL (2014) RNA sequence, structure, and function: computational and bioinformatic methods. Methods Mol Biol 1097:437–456

Chapter 21

Characterizing miRNA–lncRNA Interplay

Dimitra Karagkouni, Anna Karavangeli, Maria D. Paraskevopoulou, and Artemis G. Hatzigeorgiou

Abstract

Long noncoding RNAs (lncRNAs) are noncoding transcripts, usually longer than 200 nt, that constitute one of the largest and most heterogeneous RNA families. The annotation of lncRNAs and the characterization of their function is a constantly evolving field. LncRNA interplay with microRNAs (miRNAs) is thoroughly studied in several physiological and disease states. miRNAs are small noncoding RNAs (~22 nt) that posttranscriptionally regulate the expression of protein-coding genes through mRNA target cleavage, degradation, or direct translational suppression. miRNAs can affect lncRNA half-life by promoting their degradation, or lncRNAs can act as miRNA “sponges,” reducing the miRNA regulatory effect on target mRNAs. This chapter outlines the miRNA–lncRNA interplay and provides hands-on methodologies for experimentally supported and in silico-guided analyses. The proposed techniques are a valuable asset to further understand lncRNA functions and can be appropriately adapted to become the backbone for further downstream analyses.

Key words microRNA, lncRNA, AGO-CLIP-Seq, Sponge, ceRNA, Experimentally supported, Interaction, Degradation

Abbreviations

ceRNA    Competing endogenous RNA
CLIP     Cross-linking immunoprecipitation
lncRNA   Long noncoding RNA
miRNA    microRNA
MRE      miRNA recognition element

1 Introduction

Noncoding RNAs (ncRNAs) are being vigorously researched for their crucial implication in various physiological and pathological states. Emerging technological advances during the past decade

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_21, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Dimitra Karagkouni et al.

enabled large-scale analyses and turned ncRNAs into a research hotspot [1]. microRNAs are small ncRNAs, approximately 22 nucleotides long, that act as central posttranscriptional gene regulators. They are loaded into the Argonaute (AGO) protein and interact with the RNA-induced silencing complex (RISC) to induce transcript cleavage, degradation, and/or translation suppression in the case of mRNAs [2]. miRNA activity is mainly observed in the cytoplasm, while recent studies indicate their reentrance into the nucleus. miRNA function in the nucleus is still under investigation and has not yet been thoroughly defined [3].

Long noncoding RNAs are the most heterogeneous type of noncoding transcripts, with lengths ranging from 200 nt to 100,000 nt. They constitute a large portion of the mammalian transcriptome; their annotation is constantly enhanced, yet the characterization of their regulatory function remains a challenge [4]. LncRNAs are observed in a wide variety of organelles within the nucleus and/or cytoplasm, and are characterized by high tissue, disease, and developmental stage specificity [5]. Although they constitute a class of noncoding transcripts, specific cytosolic lncRNAs may encode functional small peptides associated with diverse biological processes [6, 7]. Recent studies indicate the crucial role of lncRNAs in numerous biological processes, regulating gene expression at the transcriptional and posttranscriptional levels [8]. Among the numerous reported mechanisms, many lncRNAs have been shown to promote chromatin modifications, interfere with the transcriptional machinery, and be involved in the activity of transcriptional enhancers [9, 10]. At the posttranscriptional level, they control protein synthesis and stability, regulate splicing, and induce mRNA decay [11, 12].

A remarkable interplay occurs between lncRNAs and miRNAs in both the nuclear and cytoplasmic compartments. LncRNAs (such as Meg3, Dleu2, H19, and Ftx) can function as pri-miRNA host genes [13, 14]. Genomic regions where miRNA transcripts and lncRNAs overlap have dual/multiple functionality. Different biological processes can either trigger lncRNA function or promote the activation of the miRNA biogenesis pathway. Several well-known polycistronic miRNA gene clusters, including members of the let-7 family, derive from intergenic regions that also encode lncRNAs. miRNA–lncRNA interplay can also affect the ncRNA-mediated (post)transcriptional gene regulation. Several studies have investigated direct miRNA targeting of lncRNA transcripts in different cellular compartments and disease states. miRNAs can affect lncRNA half-life by promoting their degradation [15, 16], and/or lncRNAs can act as “sponges,” sequestering miRNAs and reducing the miRNA suppressive effect on target mRNAs. The latter


posttranscriptional mechanism, characterized by a regulatory network between miRNAs and their (non)coding targets, has been elegantly described as “competing endogenous RNA” (ceRNA) [17, 18]. Target RNA-directed microRNA degradation (TDMD) is a recently proposed, prominent mechanism by which lncRNAs induce miRNA degradation [19]. Kleaveland et al. report the involvement of lncRNA Cyrano in miR-7 regulation, by triggering miRNA degradation through an extensively paired site. This function activates a hidden network between ncRNAs that controls neuronal activity [20]. Long noncoding RNA transcripts also have the ability to directly bind to complementary sequences on mRNAs and prevent miRNA targeting in this region. Faghihi et al. proposed that the beta-secretase-1 antisense long noncoding transcript (BACE1-AS) competes with a repressive miRNA for the same binding site and indirectly stabilizes the sense BACE1 transcript in Alzheimer’s disease [21].

miRNA–lncRNA interactions have been characterized both in vitro and in vivo. Mouse xenograft models have been utilized in several cases to confirm the regulatory role of this interplay and its association with complex diseases, such as cancer [15, 22]. An overview of miRNA–lncRNA interplay in nucleus and cytoplasm is portrayed in Fig. 1. This chapter focuses on the characterization of cell-type specific/disease-associated miRNA–lncRNA interactions and provides a hands-on guide for experimentally supported and in silico-guided analyses.

2 Materials

To perform the described miRNA–lncRNA-guided analyses, the free-to-access online repository DIANA-LncBase v3 [23] is utilized. Essential prerequisites are a Windows, MacOS, or Unix-like environment and an installed up-to-date web browser.

3 Methods

3.1 Experimentally Verified miRNA–lncRNA Interactions

Direct miRNA targeting of lncRNAs, in both nuclear and cytoplasmic compartments, is reported in several physiological and pathological conditions. Metastasis-associated lung adenocarcinoma transcript 1 (MALAT1) is one of the most abundant long noncoding RNAs and is mostly localized in the nucleus. Its activity and deregulation are associated with several neoplastic diseases [24, 25]. Multiple miRNAs target MALAT1 and act as tumor suppressors by controlling its half-life and suppressing its oncogenic effects. Leucci et al. reported the nuclear miRNA-directed


Fig. 1 Overview of miRNA–lncRNA interplay in nucleus and cytoplasm. (a) lncRNAs can function as miRNA precursors and produce mature miRNAs after Dicer and/or Drosha cleavage. (b, c) lncRNAs bound by miRNAs in nucleus and/or cytoplasm, can act as sponges, thus reducing miRNA suppressive effect on target-mRNAs, be degraded or induce miRNA degradation specifically in the cytoplasm. (d) lncRNAs compete with miRNAs for the mRNA binding sites

regulation of MALAT1 in Hodgkin lymphoma and glioblastoma cell lines, through direct binding of miR-9 to the long noncoding transcript in an AGO2-dependent manner [26]. The most researched mechanism of miRNA–lncRNA interplay, the ceRNA activity, has been delineated in several physiological states and complex diseases, such as cancer. An interesting example is the “sponge” activity of X-inactive specific transcript (XIST) on bound miRNAs that either suppresses or promotes tumor cell proliferation [27, 28]. Liang et al. also propose the long noncoding RNA HLA complex P5 (HCP5) as a potential biomarker for


follicular thyroid carcinoma (FTC) diagnosis and therapy. HCP5 acts as an oncogene in FTC, promoting tumor regulation as a decoy for miR-22, miR-186, and miR-216a and indirectly activating their targets [29]. Table 1 summarizes some of the latest reports regarding the subcellular miRNA–lncRNA interplay in several physiological and pathological conditions.

3.2 Experimental Methodologies

Numerous low-yield and high-throughput experimental techniques have emerged, aiming to define miRNA targets on a specific or wider scale [54]. Low-yield techniques infer either the exact miRNA binding location on lncRNA targets (reporter gene assays), or the lncRNA regulation by miRNAs via quantifying changes in target abundance (quantitative polymerase chain reaction, qPCR; Northern blotting). Variants of conventional experimental techniques are constantly evolving. An alternate biochemical approach, followed by reverse transcription polymerase chain reaction (RT-PCR), delineates direct miRNA targets by applying miRNAs as primers during reverse transcription of total RNA [55]. RNA-Seq and microarray techniques, following a specific miRNA overexpression/knockdown, indirectly provide numerous miRNA–lncRNA pairs by quantifying lncRNA abundance. RIP (RNA Immunoprecipitation) and CLIP (Crosslinking and Immunoprecipitation) technologies against the miRNA-guider protein, AGO, followed by high-throughput sequencing, enable the characterization of cell-type and tissue-specific miRNA–target pairs on a wide scale [56].

CLIP-Seq techniques are considered the gold standard methodologies for the direct detection of numerous miRNA binding sites on (non)coding targets. AGO-HITS-CLIP (high-throughput sequencing of RNA isolated by cross-linking immunoprecipitation) [57], PAR-CLIP (photoactivatable-ribonucleoside-enhanced cross-linking and immunoprecipitation) [58], CLEAR-CLIP (covalent ligation of endogenous Argonaute-bound RNAs), and CLASH (crosslinking, ligation, and sequencing of hybrids) [59] experiments are widely used CLIP-Seq variants, providing miRNA interactions on a transcriptome-wide scale for healthy or diseased cell types. The latter two techniques reveal a fraction of chimeric microRNA–target ligated pairs.

The number of publications that report miRNA–lncRNA regulation is increasing rapidly. This wealth of information is fragmented across manuscripts and raw high-throughput data. The collection of the related literature, as well as the analysis of the data, are demanding and time-consuming practices. This gap is closed via the deployment of repositories dedicated to collecting and cataloguing miRNA–target pairs.

Table 1 A collection of miRNA–lncRNA experimentally verified interactions. Each interaction, encountered in cytoplasm and/or nucleus, is characterized by the lncRNA “sponge” activity or miRNA-mediated lncRNA degradation, as well as its physiological or pathogenic role in each condition

Disease | lncRNA | miRNA | Mechanism
Bladder cancer [30] | UCA1 | miR-1 | lncRNA degradation/miRNA tumor suppressor
Breast cancer [31] | MACC1-AS1 | miR-34c, miR-126, miR-342, miR-145, miR-384 | lncRNA sponge/oncogene
Cervical cancer [16] | lincRNA-p21 | let-7b | lncRNA degradation/miRNA tumor suppressor
Colorectal cancer [32] | SNHG7 | miR-216b | lncRNA sponge/oncogene
Colorectal cancer [33] | SNHG1 | miR-154 | lncRNA sponge/oncogene
Colorectal cancer [34] | SNHG6 | miR-214, miR-26a, miR-26b | lncRNA sponge/oncogene
Esophageal squamous cell carcinoma [15] | MALAT1 | miR-101, miR-217 | lncRNA degradation/miRNA tumor suppressor
Follicular thyroid carcinoma [29] | HCP5 | miR-22, miR-186, miR-216a | lncRNA sponge/oncogene
Gastric cancer [35] | MIAT | miR-141 | lncRNA sponge/oncogene
Glioblastoma [36] | SNHG1 | miR-194 | lncRNA sponge/oncogene
Glioblastoma [37] | XIST | miR-152 | lncRNA degradation/miRNA tumor suppressor
Hepatocellular carcinoma [38] | DSCR8 | miR-485 | lncRNA sponge/oncogene
Hepatocellular carcinoma [39] | HOTTIP | miR-125b | lncRNA degradation/miRNA tumor suppressor
Hepatocellular carcinoma [40] | UCA1 | miR-124 | lncRNA degradation/miRNA tumor suppressor
Hodgkin lymphoma, glioblastoma [26] | MALAT1 | miR-9 | lncRNA degradation/miRNA tumor suppressor
Nasopharyngeal carcinoma [27] | XIST | miR-34a | lncRNA sponge/oncogene
Melanoma [41] | MALAT1 | miR-34a | lncRNA sponge/oncogene
Ovarian cancer [42] | MALAT1 | miR-211 | lncRNA sponge/oncogene
Prostate cancer [28] | XIST | miR-23a | lncRNA sponge/tumor suppressor
Small cell lung cancer [22] | HOTTIP | miR-574 | lncRNA sponge/oncogene
Atherosclerosis [43] | MIAT | miR-149 | lncRNA sponge/promotes atherosclerotic
Cardiac hypertrophy [44] | MEG3 | miR-361 | lncRNA sponge/promotes cardiac hypertrophy
Parkinson [45] | MALAT1 | miR-124 | lncRNA sponge/promotes dopaminergic neurons apoptosis
Parkinson [46] | BDNF-AS | miR-125b | lncRNA sponge/promotes autophagy and apoptosis
Ischemic stroke [47] | MALAT1 | miR-181c | lncRNA sponge/promotes inflammation
Ischemic stroke [48] | SNHG14 | miR-136 | lncRNA sponge/enhances neuronal deficit, inflammation
Neural differentiation [49] | Rik-201, Rik-203 | miR-96, miR-467a | lncRNA sponge/promotes neural differentiation
Neural differentiation [50] | lncRNA1604 | miR-200c | lncRNA sponge/promotes neural differentiation
Neuronal development and differentiation [51] | lncND | miR-143 | lncRNA sponge/represses neural differentiation
Osteoblast differentiation [52] | H19 | miR-141 | lncRNA sponge/promotes osteoblast differentiation of stem cells
Innate immune response [53] | Sros1 | miR-1 | lncRNA degradation/miRNA promotes IFN-γ-activated innate response in macrophages

3.3 DIANA-LncBase

DIANA-LncBase (www.microrna.gr/LncBase) [23] is a reference repository of miRNA–lncRNA interactions. It was initially released in 2013 and has been constantly updated since then. Its third version is the largest repository of experimentally supported miRNA–lncRNA pairs. It provides ~240,000 unique interactions from 14 distinct experimental methodologies, corresponding to 243 different tissues/cell types in human and mouse species. This collection of interactions is derived from the manual curation of 159 publications, retrieved with an in-house developed text-mining pipeline with full-text capacity, and the analysis of >300 high-throughput experiments. The database incorporates interactions of a meticulously collected set of various lncRNA categories from GENCODE 30 [60], RefSeq [61], and the publication of Cabili et al. [62] (see Note 2). The included set corresponds to 53,250 lncRNA and 27,000 pseudogene transcripts. The repository also provides (1) known short variant information from dbSNP build 151 [63], ClinVar [64], and COSMIC v90 [43] on >67,000 miRNA Recognition Elements (MREs) and (2) lncRNA expression profiles in a wide range of cell types/tissues and nuclear/cytoplasmic cellular compartments.

3.3.1 Analysis of AGO-CLIP-Seq Datasets

LncBase v3 incorporates (in)direct miRNA–lncRNA interactions from various low-/high-yield experimental techniques including luciferase reporter assays, qPCR, CLIP-Seq, microarrays, and miRNA–lncRNA chimeric fragments from CLEAR-CLIP and a previous meta-analysis of published AGO-CLIP-Seq datasets. The largest part of LncBase v3 is composed of high-confidence AGO-CLIP-derived miRNA–lncRNA pairs. Specifically, it catalogs ~370,000 miRNA binding events on lncRNAs from human and mouse species and >2000 viral MREs on host lncRNAs. This compilation of interactions is derived from the analysis of 236 AGO-CLIP-Seq libraries (16 virus-infected) with a robust algorithm, the microCLIP framework [65].

3.3.2 Preprocessing of Deep Sequencing Data

AGO-CLIP-Seq libraries were initially quality checked using FASTQC (www.bioinformatics.babraham.ac.uk/projects/fastqc) [66], while adapters/contaminants were removed by utilizing Cutadapt (https://code.google.com/p/cutadapt/) [67]. Alignment of CLIP-Seq reads against human and mouse reference genomes was performed with GMAP/GSNAP [68], accordingly parameterized in order to identify reads in splice junctions (see Note 3).
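The preprocessing order described above (quality check, adapter removal, splice-aware alignment) can be sketched as a function that assembles the three command lines without running them. This is a minimal illustration under stated assumptions, not the parameterization used for LncBase v3: the function name, file names, adapter sequence, and genome database name are hypothetical, and the GSNAP options shown (`-N 1` for novel splicing, `-A sam` for SAM output) are one plausible way to make the aligner report reads in splice junctions.

```python
def build_preprocessing_commands(fastq, adapter, genome_db, threads=4):
    """Return the FastQC, Cutadapt and GSNAP command lines as argument lists.

    Nothing is executed here; each list could be passed to subprocess.run().
    All inputs are placeholders chosen by the caller.
    """
    trimmed = fastq.replace(".fastq", ".trimmed.fastq")
    # 1. Quality check of the raw library
    fastqc_cmd = ["fastqc", fastq]
    # 2. Remove the 3' adapter and write the trimmed reads
    cutadapt_cmd = ["cutadapt", "-a", adapter, "-o", trimmed, fastq]
    # 3. Splice-aware alignment (GSNAP writes SAM to standard output)
    gsnap_cmd = ["gsnap", "-d", genome_db, "-N", "1", "-A", "sam",
                 "-t", str(threads), trimmed]
    return [fastqc_cmd, cutadapt_cmd, gsnap_cmd]


if __name__ == "__main__":
    # Hypothetical inputs for illustration only
    for cmd in build_preprocessing_commands(
            "clip_library.fastq", "TGGAATTCTCGGGTGCCAAGG", "hg38"):
        print(" ".join(cmd))
```

Building the commands as lists keeps the step order explicit and avoids shell quoting problems when they are later run with `subprocess.run`.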

3.3.3 Identification of AGO-CLIP-Guided miRNA Targets with microCLIP

microCLIP is an innovative algorithm able to process all the AGO-enriched clusters for the identification of transcriptome-wide functional miRNA–target pairs. It was initially deployed for the analysis of AGO-PAR-CLIP datasets and was generalized to other CLIP-Seq variants. The model was trained and evaluated on an unprecedented set of miRNA-binding events, supported by specific techniques, numerous miRNA perturbation experiments and structural sequencing data. microCLIP adopts a multilayer


super learner classification scheme that maximizes the contribution of a set of 131 descriptors, associated with AGO-CLIP-related attributes and miRNA/MRE hybrid derived characteristics. This machine learning approach is considered among the best in its field, able to detect 1.6-fold more validated target sites compared to leading implementations. microCLIP can be applied on any AGO-CLIP-Seq library. It requires a SAM/BAM alignment file and a list of miRNAs in FASTA format as minimum input. The output of the algorithm is a BED-like file comprising (non)canonical miRNA interactions. microCLIP is a standalone framework developed in R that can be easily installed. It is available for free download at www.microrna.gr/microCLIP. For the analysis of AGO-CLIP-Seq data included in LncBase v3, highly expressed miRNAs were obtained either from the relevant publications or the analysis of small RNA-Seq libraries, corresponding to similar cell types/tissues with the CLIP-Seq samples.

3.4 Hands-on Guide for LncBase v3 Utilization

The database interface incorporates two modules that are seamlessly interconnected. The primary module displays the experimentally supported miRNA–lncRNA interactions and the secondary module displays the lncRNA expression profiles in different cell types/tissues and cellular compartments. The database allows advanced queries of multiple miRNA and lncRNA terms, specialized genomic locus search of experimentally supported MREs, and different filtering combinations including species, cell types/tissues, methodologies, and transcript category. LncBase v3 is seamlessly interconnected with the content of other DIANA-tools (see Note 1), providing a valuable asset to further explore the functionality of miRNA–lncRNA pairs. The “sponge” activity of lncRNAs can be scrutinized with the use of experimentally supported miRNA–mRNA pairs from TarBase [54] and in silico characterized miRNA targets from microT-CDS [69]. miRNA–lncRNA interactions can be enhanced with predicted pairs from LncBase v2 [70]. The direct interconnection with miRPath v3 [71] facilitates the investigation of functional miRNA activity in physiological/pathological molecular pathways. The database also enables the view of interactions via a dedicated RNAcentral [72] page. RNAcentral is a worldwide research consortium with a focus on ncRNA annotation and function. LncBase has been integrated in the repository since 2015.
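The filtering combinations offered by the interface can be mimicked on a local copy of the interactions once they have been exported. The following is a minimal sketch, assuming each interaction is held as a dictionary; the keys (`mirna`, `lncrna`, `cell_type`, `method`) and the function name are hypothetical and do not reflect the actual column names of a LncBase export, and the records are toy placeholders rather than real database entries.

```python
def filter_interactions(records, mirnas=None, lncrnas=None,
                        cell_types=None, methods=None):
    """Keep the records that satisfy every filter that is set.

    A filter left as None is not applied, which mirrors leaving a
    filtering option unselected in a query form.
    """
    def matches(value, allowed):
        return allowed is None or value in allowed

    return [
        r for r in records
        if matches(r["mirna"], mirnas)
        and matches(r["lncrna"], lncrnas)
        and matches(r["cell_type"], cell_types)
        and matches(r["method"], methods)
    ]


# Toy records for illustration only; they are not real LncBase entries.
records = [
    {"mirna": "miR-A", "lncrna": "LNC1", "cell_type": "HELA", "method": "PAR-CLIP"},
    {"mirna": "miR-A", "lncrna": "LNC2", "cell_type": "MCF7", "method": "qPCR"},
    {"mirna": "miR-B", "lncrna": "LNC1", "cell_type": "PC3", "method": "HITS-CLIP"},
]

if __name__ == "__main__":
    hits = filter_interactions(records, mirnas={"miR-A"},
                               cell_types={"HELA", "MCF7"})
    for hit in hits:
        print(hit["mirna"], hit["lncrna"], hit["cell_type"], hit["method"])
```

Treating every unset filter as "match all" keeps combined queries composable: adding a filter can only narrow the result set.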

3.4.1 LncBase Module of miRNA–lncRNA Interactions

The user can enter a query of miRNA and/or lncRNA names/identifiers to explore experimentally supported interactions. The results of a combined query with multiple miRNAs (hsa-miR-9-5p, hsa-miR-181d-5p, hsa-miR-106a-5p) and lncRNAs (MALAT1, H19, GAS5) are displayed in Fig. 2. Specific filtering options have


Fig. 2 A combined query using the LncBase module of experimentally supported miRNA–lncRNA interactions. Displayed miRNA–lncRNA interactions include pairs of the queried terms and the application of specific filters. Result statistics of the specific query are promptly calculated, while interactions can be sorted in descending/ascending order upon user selection. miRNA–lncRNA pairs are accompanied with additional information including gene/miRNA details and active links to ENSEMBL, RefSeq, miRBase, and RNAcentral via the associated info buttons. High/Low miRNA confidence level information and indication of variant overlap in MRE regions are portrayed in green and red boxes respectively. Interconnection with the lncRNA expression module and other DIANA-Tools is available. Results can be also visualized in the UCSC Genome Browser via the respective link. The query results can be retrieved through a “Download” button, provided in every LncBase module

been applied to retrieve interactions observed in breast, cervical and prostate cancerous cell types (MCF7, HELA, PC3), defined from specific methodologies (PAR-CLIP, HITS-CLIP, qPCR, luciferase reporter assay). Three transcript categories have been also selected (lincRNA, antisense, processed transcript) from a specific lncRNA source (ENSEMBL). Displayed miRNA–lncRNA interactions include pairs of the queried terms. Detailed statistics regarding the number of (1) experiments, (2) cell types/tissues, and (3) publications that confirm the existence of the miRNA–lncRNA interacting pairs are provided. The results can be sorted in ascending or descending order. Figure 3 displays the provided enhanced metadata, including the exact miRNA binding location, the experimental conditions, miRNA confidence level indication (according to the latest version of miRBase [73]) and detailed variant information where applicable. miRNA binding events and MRE-overlapping variant genomic locations can be visualized with UCSC Genome Browser [74], as portrayed in Fig. 4. miRNA–lncRNA pairs can be also displayed without performing any query, via the selection of


Fig. 3 Experimental details, cell type and tissue specific information and the exact miRNA binding location are displayed in the interaction expanded panel. Meta-information for MRE-overlapping variants—external identifier, alleles, the exact variant position, links to the original source—are provided where available upon info button selection

Fig. 4 Visualization of lncRNA–miRNA interactions in UCSC Genome Browser upon user selection. The graphical representation is an active link to UCSC Genome Browser where the user is facilitated with all the available browser options. Upon selection of the “default tracks” in Genome Browser interface, MREs and the respective overlapping variants are displayed along with the annotated (un)spliced lncRNA transcript. Extra information tracks regarding histone marks, DNase-Seq signal, sequence conservation, SNPs, and repeat regions are also provided


Fig. 5 Displayed miRNA–lncRNA interactions include pairs of the MREs residing on a specific applied region. The module also enables the retrieval of interactions for all the long noncoding transcripts overlapping the selected region. Different filtering options are also available to users

more than three filtering options, with priority to species, method, and at least one cell type or tissue. LncBase v3 also enables the search by location in a dedicated module. The user can specify a genomic region (the coordinates of exonic/intronic regions, an entire chromosome, etc.) and retrieve miRNA–lncRNA interactions for (1) overlapping lncRNA transcripts, via “Transcripts” selection, and (2) overlapping MREs, via “Bindings” selection. Figure 5 portrays MREs residing on region “chr11:65499300-65499700”, identified in MCF7, HELA, and PC3 cancerous cell types. Upon selection of specific filters, the retrieved interactions consist of miRNAs with “High” confidence level and lncRNAs belonging to specific categories (lincRNA, processed transcript, antisense), with gene identifiers from ENSEMBL [75].

3.4.2 LncBase Module of lncRNA Expression Profiles

LncRNA expression profiles in different cell types/tissues (“Expression” mode) and cellular compartments (“Localization” mode) can be explored in a separate page. The user can retrieve lncRNA expression either via the interconnected link provided by the miRNA–lncRNA interactions module or by applying a query of lncRNA names and/or identifiers in the dedicated result page. Transcripts Per Million (TPM) values, describing lncRNA abundance, are provided to users. A specific range of TPM values is also provided as a filtering option, described as “Low” (range: 1–10), “Medium” (range: 11–600), and “High” (range: >600). The results of a combined query in “Expression” mode for multiple lncRNAs (GAS5, MALAT1, H19) in cervical, mammary gland, and


Fig. 6 A combined query using the LncBase “Expression” module. lncRNA expression values of the selected transcripts and filters are portrayed. Expression information and experimental details are provided. Additional information, regarding the number of the analyzed replicates, is available. Interconnection with miRNA–lncRNA interactions module is provided. Users can easily change mode to “Localization” to explore lncRNA expression profiles per cellular compartment

embryonic tissues are displayed in Fig. 6. The “Medium” TPM range has also been selected. In “Localization” mode, lncRNA expression (TPM) has been estimated separately in the nucleus and the cytoplasm. Negative and positive values of the relative concentration index (RCI) denote the tendency of lncRNAs to localize toward the nucleus or the cytoplasm, respectively. The compartmentalized expression values of GAS5, MALAT1, and H19 in cervical, mammary gland, and embryonic tissues are portrayed in Fig. 7.

3.4.3 Visualization Plots

LncBase v3 provides advanced interactive visualization plots. The user can explore (1) a possible clustering of AGO-CLIP-Seq derived miRNA–lncRNA pairs, in different cell types/tissues, via a correlation plot and (2) lncRNA expression profiles within the cell and/or comparatively between subcellular compartments via interactive bar-plots. Figure 8 portrays the correlation of miRNA–lncRNA pairs identified in all the cancerous cell types included in the database. Figures 9 and 10 display the expression profiles of GAS5 in different physiological/pathological cell types and tissues.
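To make the idea behind the correlation plot more concrete, the following minimal sketch computes Pearson's r between cell types. The text does not state which vectors LncBase correlates, so the representation used here (a 0/1 presence vector over the union of miRNA–lncRNA pairs) is an assumption, and the cell type and pair names are placeholders rather than database content.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Pearson's r is undefined for a constant vector")
    return cov / (sd_x * sd_y)


# Hypothetical toy input: the set of (miRNA, lncRNA) pairs seen in each cell type.
pairs_by_cell_type = {
    "cell_type_A": {("miR-1", "lnc-1"), ("miR-1", "lnc-2"), ("miR-2", "lnc-1")},
    "cell_type_B": {("miR-1", "lnc-1"), ("miR-1", "lnc-2")},
    "cell_type_C": {("miR-3", "lnc-3")},
}

# Assumed representation: one 0/1 entry per pair in the union of all pairs.
universe = sorted(set().union(*pairs_by_cell_type.values()))
vectors = {
    cell: [1 if pair in pairs else 0 for pair in universe]
    for cell, pairs in pairs_by_cell_type.items()
}

cells = sorted(vectors)
for i, first in enumerate(cells):
    for second in cells[i + 1:]:
        r = pearson_r(vectors[first], vectors[second])
        print(f"{first} vs {second}: r = {r:.3f}")
```

Cell types that share most of their pairs obtain a higher coefficient than cell types with disjoint pairs, which is the kind of grouping a correlation plot is meant to expose.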

3.5 Download of LncBase Content

miRNA–lncRNA interactions, accompanied by lncRNA expression profiles, can be incorporated in miRNA functional analyses, providing a valuable asset for understanding the role of ncRNAs in disease-, developmental stage-, and tissue-specific mechanisms. The database content is freely available for further experimentation with


Dimitra Karagkouni et al.

Fig. 7 A combined query using the LncBase “Localization” module. lncRNA expression values of the selected transcripts and filters, in nucleus/cytoplasm, are portrayed. Expression information and experimental details are provided. Localization information, including lncRNA expression TPM values in both nucleus and cytoplasm, the RCI value, and the lncRNA localization tendency, is provided to users. Interconnection with the miRNA–lncRNA interactions module is available

Fig. 8 LncBase v3 interactive correlation plot between miRNA–lncRNA pairs identified in cancerous cell types. For each pairwise combination, Pearson’s r coefficient values are calculated. Higher correlations are observed among cell types of the same tissue (e.g., prostate and mammary gland)


Fig. 9 LncBase v3 interactive bar-plot depicting the expression profiles of GAS5 transcripts in nine different cell types. TPM values are plotted, while transcript identifiers are displayed in the legend with different colors

in-house developed algorithms and custom pipelines. The results of different queries can be downloaded via the relevant “Download” button, in csv or json format. In the case of miRNA–lncRNA experimentally supported entries, every line is composed of the lncRNA gene name, gene identifier, miRNA name, MIMAT, species, miRNA binding location, exon/intron info, a positive indication of the experimental low/high-throughput method that supports the interaction, the cell type/tissue, as well as relevant variant info where applicable. A sample of the adopted “miRNA–lncRNA experimentally supported interactions” file format:

Gene Name, Gene ID, MiRNA, MIMAT, Species, Cell Type, Tissue, Category, Location, Region, Method, Result, Validation Type, Experimental Condition, Variant IDs, Variant Positions
GAS5, ENSG00000234741, hsa-miR-106a-5p, MIMAT0000103, hg38, MCF7, Mammary Gland, Cancer/Malignant, chr1:173865574-173865597|-, exon, HITS-CLIP, POSITIVE, DIRECT, nested unprotected primer. E2 treatment, COSV62933758, chr1:173865584
H19, ENSG00000130600, hsa-miR-106a-5p, MIMAT0000103, hg38, HELA, Cervix, Cancer/Malignant, NA, NA, qPCR, POSITIVE, INDIRECT, NA, NA, NA
MALAT1, ENSG00000251562, hsa-miR-106a-5p, MIMAT0000103, hg38, MCF7, Mammary Gland, Cancer/Malignant, chr11:65505473-65505497|+, exon, HITS-CLIP, POSITIVE, DIRECT, nested unprotected primer. E2 treatment, COSV73173630|COSV73173312, chr11:65505478|chr11:65505494
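As a hedged illustration of how such a download might feed a custom pipeline, the sketch below reads the sample rows with Python's standard csv module, keeps the entries whose Validation Type is DIRECT, and splits the Location field into coordinates and strand. The column names are taken from the sample header above; the helper names (load_interactions, parse_location) are hypothetical, and a real export may quote or order fields differently.

```python
import csv
import io

# Sample rows copied from the text; the layout of a real export is assumed,
# not confirmed, to match this header.
SAMPLE = """\
Gene Name, Gene ID, MiRNA, MIMAT, Species, Cell Type, Tissue, Category, Location, Region, Method, Result, Validation Type, Experimental Condition, Variant IDs, Variant Positions
GAS5, ENSG00000234741, hsa-miR-106a-5p, MIMAT0000103, hg38, MCF7, Mammary Gland, Cancer/Malignant, chr1:173865574-173865597|-, exon, HITS-CLIP, POSITIVE, DIRECT, nested unprotected primer. E2 treatment, COSV62933758, chr1:173865584
H19, ENSG00000130600, hsa-miR-106a-5p, MIMAT0000103, hg38, HELA, Cervix, Cancer/Malignant, NA, NA, qPCR, POSITIVE, INDIRECT, NA, NA, NA
MALAT1, ENSG00000251562, hsa-miR-106a-5p, MIMAT0000103, hg38, MCF7, Mammary Gland, Cancer/Malignant, chr11:65505473-65505497|+, exon, HITS-CLIP, POSITIVE, DIRECT, nested unprotected primer. E2 treatment, COSV73173630|COSV73173312, chr11:65505478|chr11:65505494
"""


def load_interactions(handle):
    """Read interaction rows into dictionaries keyed by the header names."""
    reader = csv.DictReader(handle, skipinitialspace=True)
    return [
        {key.strip(): (value or "").strip()
         for key, value in row.items() if key is not None}
        for row in reader
    ]


def parse_location(location):
    """Split 'chr1:173865574-173865597|-' into (chrom, start, end, strand).

    Returns None when no binding location is reported ('NA').
    """
    if location == "NA":
        return None
    coords, strand = location.split("|")
    chrom, span = coords.split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end), strand


rows = load_interactions(io.StringIO(SAMPLE))
direct = [row for row in rows if row["Validation Type"] == "DIRECT"]
for row in direct:
    print(row["Gene Name"], row["MiRNA"], parse_location(row["Location"]))
```

Reading by column name rather than by position keeps the sketch usable if columns are reordered; multi-valued fields such as Variant IDs are left as pipe-separated strings here.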


Fig. 10 LncBase v3 interactive bar-plot depicting, comparatively between nucleus and cytoplasm, the abundance of GAS5 transcripts in four different tissues, corresponding to five cell types. RCI values are plotted. Negative and positive RCI values indicate lncRNA tendency to be localized to the nucleus or cytoplasm respectively. Transcript identifiers are displayed in the legend with different colors. Bar-plot is coupled with nuclear/cytoplasmic TPM values through an interconnected interactive table

LncRNA expression values within the cell or in nuclear/cytoplasmic compartments can be downloaded in different files. In the case of the “Expression” module, every line is composed of the long noncoding gene name, gene and transcript identifiers, cell type and tissue information, species, TPM expression values, as well as the number of replicates included in the analysis. A sample of the adopted “lncRNA expression” file format:

Gene Name, Gene ID, Transcript ID, Cell Type, Tissue, Category, TPM, Species, Replicates
MALAT1, ENSG00000251562, ENST00000619449, MCF7, Mammary Gland, Cancer/Malignant, 29.282278, Homo sapiens, 1
MALAT1, ENSG00000251562, ENST00000610851, HELAS3, Cervix, Cancer/Malignant, 370.071313, Homo sapiens, 2
GAS5, ENSG00000234741, ENST00000650796, HELAS3, Cervix, Cancer/Malignant, 223.9944045, Homo sapiens, 2

In the case of the “Localization” module, every line is composed of information similar to that included in the downloaded files of the “Expression” module, accompanied by nuclear/cytoplasmic TPM values, the RCI value, and a report regarding the tendency of lncRNAs to be localized either in the nucleus or in the cytoplasm. A sample of the adopted “nuclear/cytoplasmic lncRNA expression” file format:

Gene Name, Gene ID, Transcript ID, Cell Type, Tissue, Category, Nucleus TPM, Cytoplasm TPM, RCI, Compartment, Replicates
GAS5, ENSG00000234741, ENST00000650796, MCF7, Mammary Gland, Cancer/Malignant, 195.9394, 85.7523, -1.1922, Nucleus, two replicates per compartment
MALAT1, ENSG00000251562, ENST00000616691, HELAS3, Cervix, Cancer/Malignant, 1719.5206, 60.1596, -4.8371, Nucleus, two replicates per compartment
MALAT1, ENSG00000251562, ENST00000620902, ESC, Embryo, Embryonic/Fetal,Stem/Progenitor, 0.3995, 39.3514, 6.622, Cytoplasm, no replicates
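The RCI values in the sample rows are numerically consistent with a base-2 logarithm of the cytoplasmic-to-nuclear TPM ratio. That reading is inferred from the three sample rows only and is not stated as the definition in this section, so the sketch below should be taken as a consistency check under that assumption. The function names are illustrative, and zero TPM values (which would need a pseudocount or exclusion) are not handled.

```python
import math


def rci(nucleus_tpm, cytoplasm_tpm):
    """Assumed form of the relative concentration index: log2(cytoplasm / nucleus)."""
    return math.log2(cytoplasm_tpm / nucleus_tpm)


def compartment(rci_value):
    """Negative RCI points to the nucleus, positive RCI to the cytoplasm."""
    return "Nucleus" if rci_value < 0 else "Cytoplasm"


# (gene, transcript, nucleus TPM, cytoplasm TPM, reported RCI) from the sample rows.
samples = [
    ("GAS5", "ENST00000650796", 195.9394, 85.7523, -1.1922),
    ("MALAT1", "ENST00000616691", 1719.5206, 60.1596, -4.8371),
    ("MALAT1", "ENST00000620902", 0.3995, 39.3514, 6.622),
]

for gene, transcript, nucleus, cytoplasm, reported in samples:
    value = rci(nucleus, cytoplasm)
    print(f"{gene} {transcript}: computed {value:.4f}, reported {reported}, "
          f"{compartment(value)}")
```

Recomputing the index from the two TPM columns is a quick way to confirm that a downloaded file has been parsed with the nuclear and cytoplasmic columns in the intended order.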

4 Notes

1. lncRNA/miRNA identifiers differ between databases, and therefore the adopted nomenclature should be checked thoroughly to avoid inconsistencies in the results. At the moment, LncBase provides lncRNA–miRNA interactions for miRNAs included in miRBase v22 [73], and lncRNAs retrieved from ENSEMBL 96 [75], RefSeq [61], and the publication of Cabili et al. [62].
2. The genome assemblies utilized for the analysis of high-throughput data in LncBase v3 are hg38 and mm10 for human and mouse, respectively.
3. The DIANA tools webserver (www.microrna.gr) provides extensive documentation of the filtering options and functionalities of every tool/database, together with a detailed description of the allowed query terms.

Acknowledgments

We acknowledge support of this work by the project “ELIXIR-GR: The Greek Research Infrastructure for Data Management and Analysis in Life Sciences” (MIS 5002780) which is implemented under the Action “Reinforcement of the Research and Innovation Infrastructure,” funded by the Operational Programme “Competitiveness, Entrepreneurship and Innovation” (NSRF 2014–2020) and cofinanced by Greece and the European Union (European Regional Development Fund).

References

1. Cech TR, Steitz JA (2014) The noncoding RNA revolution-trashing old rules to forge new ones. Cell 157(1):77–94
2. Huntzinger E, Izaurralde E (2011) Gene silencing by microRNAs: contributions of translational repression and mRNA decay. Nat Rev Genet 12(2):99–110
3. Catalanotto C, Cogoni C, Zardo G (2016) MicroRNA in control of gene expression: an overview of nuclear functions. Int J Mol Sci 17(10)
4. Derrien T et al (2012) The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res 22(9):1775–1789


5. Carlevaro-Fita J, Johnson R (2019) Global positioning system: understanding long noncoding RNAs through subcellular localization. Mol Cell 73(5):869–883
6. Matsumoto A et al (2017) mTORC1 and muscle regeneration are regulated by the LINC00961-encoded SPAR polypeptide. Nature 541(7636):228–232
7. Li J, Liu C (2019) Coding or noncoding, the converging concepts of RNAs. Front Genet 10:496
8. Yao RW, Wang Y, Chen LL (2019) Cellular functions of long noncoding RNAs. Nat Cell Biol 21(5):542–551
9. Gil N, Ulitsky I (2020) Regulation of gene expression by cis-acting long non-coding RNAs. Nat Rev Genet 21(2):102–117
10. Vance KW, Ponting CP (2014) Transcriptional regulatory functions of nuclear long noncoding RNAs. Trends Genet 30(8):348–355
11. Marchese FP, Raimondi I, Huarte M (2017) The multidimensional mechanisms of long noncoding RNA function. Genome Biol 18(1):206
12. Riva P, Ratti A, Venturin M (2016) The long non-coding RNAs in neurodegenerative diseases: novel mechanisms of pathogenesis. Curr Alzheimer Res 13(11):1219–1231
13. Cai X, Cullen BR (2007) The imprinted H19 noncoding RNA is a primary microRNA precursor. RNA 13(3):313–316
14. Dhir A et al (2015) Microprocessor mediates transcriptional termination of long noncoding RNA transcripts hosting microRNAs. Nat Struct Mol Biol 22(4):319–327
15. Wang X et al (2015) Silencing of long noncoding RNA MALAT1 by miR-101 and miR-217 inhibits proliferation, migration, and invasion of esophageal squamous cell carcinoma cells. J Biol Chem 290(7):3925–3935
16. Yoon JH et al (2012) LincRNA-p21 suppresses target mRNA translation. Mol Cell 47(4):648–655
17. Cesana M et al (2011) A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147(2):358–369
18. Kallen AN et al (2013) The imprinted H19 lncRNA antagonizes let-7 microRNAs. Mol Cell 52(1):101–112
19. Fuchs Wightman F et al (2018) Target RNAs strike back on MicroRNAs. Front Genet 9:435
20. Kleaveland B et al (2018) A network of noncoding regulatory RNAs acts in the mammalian brain. Cell 174(2):350–362.e17
21. Faghihi MA et al (2010) Evidence for natural antisense transcript-mediated inhibition of microRNA function. Genome Biol 11(5):R56
22. Sun Y et al (2017) A long non-coding RNA HOTTIP expression is associated with disease progression and predicts outcome in small cell lung cancer patients. Mol Cancer 16(1):162
23. Karagkouni D et al (2020) DIANA-LncBase v3: indexing experimentally supported miRNA targets on non-coding transcripts. Nucleic Acids Res 48(D1):D101–D110
24. Huo Y et al (2017) MALAT1 predicts poor survival in osteosarcoma patients and promotes cell metastasis through associating with EZH2. Oncotarget 8(29):46993–47006
25. Li Y et al (2017) Long non-coding RNA MALAT1 promotes gastric cancer tumorigenicity and metastasis by regulating vasculogenic mimicry and angiogenesis. Cancer Lett 395:31–44
26. Leucci E et al (2013) microRNA-9 targets the long non-coding RNA MALAT1 for degradation in the nucleus. Sci Rep 3:2535
27. Song P et al (2016) Long non-coding RNA XIST exerts oncogenic functions in human nasopharyngeal carcinoma by targeting miR-34a-5p. Gene 592(1):8–14
28. Du Y et al (2017) LncRNA XIST acts as a tumor suppressor in prostate cancer through sponging miR-23a to modulate RKIP expression. Oncotarget 8(55):94358–94370
29. Liang L et al (2018) LncRNA HCP5 promotes follicular thyroid carcinoma progression via miRNAs sponge. Cell Death Dis 9(3):372
30. Wang T et al (2014) Hsa-miR-1 downregulates long non-coding RNA urothelial cancer associated 1 in bladder cancer. Tumour Biol 35(10):10075–10084
31. Yizhak K et al (2019) RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364(6444)
32. Shan Y et al (2018) LncRNA SNHG7 sponges miR-216b to promote proliferation and liver metastasis of colorectal cancer through upregulating GALNT1. Cell Death Dis 9(7):722
33. Xu M et al (2018) The long noncoding RNA SNHG1 regulates colorectal cancer cell growth through interactions with EZH2 and miR-154-5p. Mol Cancer 17(1):141
34. Xu M et al (2019) lncRNA SNHG6 regulates EZH2 expression by sponging miR-26a/b and miR-214 in colorectal cancer. J Hematol Oncol 12(1):3
35. Sha M et al (2018) Long non-coding RNA MIAT promotes gastric cancer growth and metastasis through regulation of miR-141/DDX5 pathway. J Exp Clin Cancer Res 37(1):58
36. Liu L et al (2019) The long non-coding RNA SNHG1 promotes glioma progression by competitively binding to miR-194 to regulate PHLDA1 expression. Cell Death Dis 10(6):463
37. Yao Y et al (2015) Knockdown of long non-coding RNA XIST exerts tumor-suppressive functions in human glioblastoma stem cells by up-regulating miR-152. Cancer Lett 359(1):75–86
38. Wang Y et al (2018) Long non-coding RNA DSCR8 acts as a molecular sponge for miR-485-5p to activate Wnt/beta-catenin signal pathway in hepatocellular carcinoma. Cell Death Dis 9(9):851
39. Tsang FH et al (2015) Long non-coding RNA HOTTIP is frequently up-regulated in hepatocellular carcinoma and is targeted by tumour suppressive miR-125b. Liver Int 35(5):1597–1606
40. Zhao B et al (2019) MiRNA-124 inhibits the proliferation, migration and invasion of cancer cell in hepatocellular carcinoma by downregulating lncRNA-UCA1. Onco Targets Ther 12:4509–4516
41. Li F et al (2019) MALAT1 regulates miR-34a expression in melanoma cells. Cell Death Dis 10(6):389
42. Tao F et al (2018) miR-211 sponges lncRNA MALAT1 to suppress tumor growth and progression through inhibiting PHF19 in ovarian carcinoma. FASEB J: fj201800495RR
43. Tate JG et al (2019) COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res 47(D1):D941–D947
44. Zhang J et al (2019) STAT3-induced upregulation of lncRNA MEG3 regulates the growth of cardiac hypertrophy through miR-361-5p/HDAC9 axis. Sci Rep 9(1):460
45. Liu W et al (2017) Long non-coding RNA MALAT1 contributes to cell apoptosis by sponging miR-124 in Parkinson disease. Cell Biosci 7:19
46. Fan Y et al (2020) LncRNA BDNF-AS promotes autophagy and apoptosis in MPTP-induced Parkinson’s disease via ablating microRNA-125b-5p. Brain Res Bull 157:119–127
47. Cao DW et al (2020) The lncRNA Malat1 functions as a ceRNA to contribute to berberine-mediated inhibition of HMGB1 by sponging miR-181c-5p in poststroke inflammation. Acta Pharmacol Sin 41(1):22–33
48. Zhong Y, Yu C, Qin W (2019) LncRNA SNHG14 promotes inflammatory response induced by cerebral ischemia/reperfusion injury through regulating miR-136-5p/ROCK1. Cancer Gene Ther 26(7-8):234–247
49. Zhang L et al (2019) LncRNA Riken-201 and Riken-203 modulates neural development by regulating the Sox6 through sequestering miRNAs. Cell Prolif 52(3):e12573
50. Weng R et al (2018) Long noncoding RNA-1604 orchestrates neural differentiation through the miR-200c/ZEB axis. Stem Cells 36(3):325–336
51. Rani N et al (2016) A primate lncRNA mediates notch signaling during neuronal development by sequestering miRNA. Neuron 90(6):1174–1188
52. Li Z et al (2019) LncRNA H19 promotes the committed differentiation of stem cells from apical papilla via miR-141/SPAG9 pathway. Cell Death Dis 10(2):130
53. Xu H et al (2019) Inducible degradation of lncRNA Sros1 promotes IFN-gamma-mediated activation of innate immune responses by stabilizing Stat1 mRNA. Nat Immunol 20(12):1621–1630
54. Karagkouni D et al (2018) DIANA-TarBase v8: a decade-long collection of experimentally supported miRNA-gene interactions. Nucleic Acids Res 46(D1):D239–D245
55. Andachi Y (2008) A novel biochemical method to identify target genes of individual microRNAs: identification of a new Caenorhabditis elegans let-7 target. RNA 14(11):2440–2451
56. Hansen TB et al (2013) Natural RNA circles function as efficient microRNA sponges. Nature 495(7441):384–388
57. Chi SW et al (2009) Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature 460(7254):479–486
58. Hafner M et al (2010) Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell 141(1):129–141
59. Helwak A et al (2013) Mapping the human miRNA interactome by CLASH reveals frequent noncanonical binding. Cell 153(3):654–665
60. Frankish A et al (2019) GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res 47(D1):D766–D773
61. O’Leary NA et al (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 44(D1):D733–D745
62. Cabili MN et al (2011) Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25(18):1915–1927


63. Sherry ST et al (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29(1):308–311
64. Landrum MJ et al (2018) ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res 46(D1):D1062–D1067
65. Paraskevopoulou MD et al (2018) microCLIP super learning framework uncovers functional transcriptome-wide miRNA interactions. Nat Commun 9(1):3601
66. Andrews S (2010) FastQC: a quality control tool for high throughput sequence data
67. Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120
68. Wu TD, Nacu S (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26(7):873–881
69. Paraskevopoulou MD et al (2013) DIANA-microT web server v5.0: service integration into miRNA functional analysis workflows. Nucleic Acids Res 41(Web Server issue):W169–W173
70. Paraskevopoulou MD et al (2016) DIANA-LncBase v2: indexing microRNA targets on non-coding transcripts. Nucleic Acids Res 44(D1):D231–D238
71. Vlachos IS et al (2015) DIANA-miRPath v3.0: deciphering microRNA function with experimental support. Nucleic Acids Res 43(W1):W460–W466
72. Gorodkin J, Seemann ES (2019) RNAcentral: a hub of information for non-coding RNA sequences. Nucleic Acids Res 47(D1):D1250–D1251
73. Kozomara A, Birgaoanu M, Griffiths-Jones S (2019) miRBase: from microRNA sequences to function. Nucleic Acids Res 47(D1):D155–D162
74. Haeussler M et al (2019) The UCSC Genome Browser database: 2019 update. Nucleic Acids Res 47(D1):D853–D858
75. Cunningham F et al (2019) Ensembl 2019. Nucleic Acids Res 47(D1):D745–D751

Chapter 22

Illuminating lncRNA Function Through Target Prediction

Hua-Sheng Chiu, Sonal Somvanshi, Ting-Wen Chen, and Pavel Sumazin

Abstract

Most of the transcribed human genome codes for noncoding RNAs (ncRNAs), and long noncoding RNAs (lncRNAs) make up the lion’s share of the human ncRNA space. Despite growing interest in lncRNAs, their catalog remains incomplete because there are so many of them and because of their tissue specialization and, often, lower abundance; multiple efforts to improve it are ongoing. Consequently, the number of human lncRNA genes remains uncertain: it may be lower than 10,000 or higher than 200,000. A key open challenge for lncRNA research, now that so many lncRNA species have been identified, is the characterization of lncRNA function and the interpretation of the roles of genetic and epigenetic alterations at their loci. After all, the most important human genes to catalog and study are those that contribute to important cellular functions, that is, genes that affect development or cell differentiation and whose dysregulation may play a role in the genesis and progression of human diseases. Multiple efforts have used screens based on RNA-mediated interference (RNAi), antisense oligonucleotides (ASOs), and CRISPR to identify the consequences of lncRNA dysregulation and predict lncRNA function in select contexts, but these approaches have unresolved scalability and accuracy challenges. Instead, as was the case for better-studied ncRNAs in the past, researchers often focus on characterizing lncRNA interactions and investigating their effects on genes and pathways with known functions. Here, we focus most of our review on computational methods to identify lncRNA interactions and to predict the effects of their alterations and dysregulation on human disease pathways.

Key words lncRNA, Noncoding RNA, Target prediction, Systems biology, LongHorn

1 Long Noncoding RNAs Are Ubiquitous Players in Cellular Processes

1.1 Cataloging lncRNAs

Large-scale transcriptome analyses have identified numerous classes of ncRNA transcripts of a variety of lengths and structures [1], and while the catalog of ncRNAs remains incomplete, they already outnumber protein-coding transcripts by 70% according to GENCODE (v.34). lncRNAs, which are more than 200 nucleotides in length, are the most abundant ncRNA class with over 250,000 proposed species [2]. Many lncRNAs are nonpolyadenylated [3, 4] and most are expressed at lower levels than mRNAs [5]. Some lncRNAs were found to encode short peptides [6–9], but these are thought to be exceptions. When

Lin Zhang and Xiaowen Hu (eds.), Long Non-Coding RNAs: Methods and Protocols, Methods in Molecular Biology, vol. 2372, https://doi.org/10.1007/978-1-0716-1697-0_22, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2021


Hua-Sheng Chiu et al.

they were first identified, lncRNAs were thought to be transcription and splicing by-products [10, 11] or sequencing artifacts [12]. Today, dozens of lncRNAs are known to regulate key pathways in normal and disease cells [13–18].

Large-scale profiling efforts aimed at cataloging lncRNA transcripts and quantifying their expression profiles across contexts include MiTranscriptome [19], FANTOM CAT [20], human large intergenic noncoding RNAs (LincRNAs) [21], and RNA Atlas [22]. These efforts provided compelling evidence for tight regulation of lncRNAs at the transcriptional and posttranscriptional levels, identified key cellular regulators that are coexpressed with lncRNAs, and proposed disease-associated alterations at the loci of lncRNAs. Taken together, these data supported the regulatory potential of lncRNAs during normal development and implicated them as agents of disease genesis and progression. The highly tissue- and cell-specific expression of these lncRNAs also suggested that lncRNAs may account for some of the observed cell-to-cell heterogeneity. These studies assembled the most comprehensive compendium of human lncRNA expression profiles across tumors, normal tissues, and cell lines, enabled the examination of the regulatory roles of lncRNAs, and opened multiple opportunities for targeting lncRNAs for clinical use.

Other ongoing lncRNA annotation efforts include GENCODE [23], NONCODE [24], LNCipedia [25], and RNAcentral [26]. As a part of the ENCODE project, the GENCODE Consortium assembled a large set of high-quality lncRNAs through manual curation or evidence-based approaches. GENCODE-defined lncRNA biotypes are well recognized by lncRNA researchers, and their high-accuracy transcript sequences have been extensively used as the gold-standard dataset for novel lncRNA discovery [27–31]. NONCODE curated nonredundant lncRNA transcripts from several databases and annotated them with additional information including disease associations and cross-species conservation. Similarly, LNCipedia offered high-quality and comprehensive lncRNA annotations including official gene symbols and inferred protein-coding potential. RNAcentral combined data from multiple databases including lncRNA annotations, sequences, putative structures, predicted interaction partners, and inferred function. These databases provide the research community with an accessible means of retrieving basic lncRNA features.

1.2 Estimating lncRNA Expression Profiles in Normal and Disease Cells

The availability of accurate lncRNA annotations allowed the quantification of lncRNA expression in tumors, cell lines, and tissues. The first large-scale analysis profiled lncRNA expression by RNA-Seq in >5K The Cancer Genome Atlas (TCGA) patient samples and >930 Cancer Cell Line Encyclopedia (CCLE) cell lines, covering 13 and 22 adult cancer types, respectively [32, 33]. Another concurrent study, TANRIC, provided an online

Inferring lncRNA Function Through Target Prediction


searchable tool that reports transcriptome-wide lncRNA expression profiles from RNA-Seq datasets in a set of 20 TCGA cancer types and some independent cohorts [34]. Both systematic studies concluded that, compared to coding genes, the expression and dysregulation of lncRNAs are highly cancer type-specific, suggesting their potential as prognostic biomarkers and therapeutic targets. Similar observations were made by studies including Genotype-Tissue Expression (GTEx), which profiled both mRNA and lncRNA expression in normal tissues and healthy individuals [35–38]. These data suggested that lncRNAs have context-specific expression in normal and diseased human cells.

1.3 lncRNAs Regulate Key Genes and Pathways in Normal and Disease Cells

Emerging evidence indicated that lncRNAs play regulatory roles in multiple cellular processes of normal and disease cells, including DNA repair [39], cancer cell proliferation [40], epithelial-mesenchymal transition [41], reprogramming of induced pluripotent stem cells [42], and chromatin modification [43, 44], by targeting key genes and pathways. In addition, multiple lncRNAs have been implicated in epigenetic regulation of gene expression during key developmental processes. These included XIST, TSIX, and JPX, whose knockouts caused aberrant X-inactivation and early embryonic lethality [45–47]. Deletion of the lncRNAs AIR and KCNQ1OT1 led to a loss of imprinting and growth defects [48, 49]. The deletion of the imprinted lncRNA gene GTL2 in mouse revealed clear cis-acting effects on adjoining genes, resulting in perinatal lethality [50]. In addition, FENDRR, which is transcribed divergently from the transcription factor (TF) coding gene FOXF1, was shown to play an essential role in organ development, including heart, gastrointestinal tract, body wall, and lung development [51, 52]. Interestingly, DEANR1, which is involved in endoderm development, is mainly expressed downstream of the endoderm master regulator FOXA2. DEANR1 contributes to endoderm differentiation by positively regulating FOXA2, and overexpression of FOXA2 rescued the endoderm differentiation defects caused by DEANR1 depletion [53]. Lastly, the depletion of DALI, which is transcribed 50 kb downstream of the TF POU3F3, downregulated POU3F3 and disrupted the differentiation of neuroblastoma cells [54].

2 Systems Biology Approaches to Predict lncRNA Function

Thousands of lncRNAs may be expressed in each human cell type, and multiple reports suggest that these RNAs play key roles in healthy cells and in disease genesis and progression. However, the functional roles of the majority of cataloged lncRNAs remain unknown. Indeed, annotating functional lncRNAs and interpreting the effects of alterations at their loci and their dysregulation are the


next frontiers in RNA biology. The key question for the field, applied to each human disease and each cellular and developmental process, is: which lncRNAs play key roles in this context? Subsequently, once a greater proportion of lncRNAs are annotated, we will be better able to determine the fraction of identified lncRNAs that have functional roles across contexts. In the absence of such annotation, the significance of lncRNA biology will remain a matter of academic debate.

Because lncRNAs are numerous, annotation efforts often begin with large-scale screens. These include direct functional screens, where lncRNAs are dysregulated as a pool or individually to identify genes that alter specific phenotypes in specific contexts, and indirect screens, where the effect of lncRNA dysregulation may be predicted based on the genes and pathways that they target. Here, we review both methods but emphasize the second approach because it is more easily scalable and because it not only predicts the effects of lncRNA dysregulation but also informs about the mechanisms and biological processes by which lncRNAs alter cells.

2.1 Systems Biology Approaches to Evaluate the Phenotypic Effects of lncRNAs

Genome-wide loss-of-function screens using pooled RNAi [55–57] and CRISPR/Cas9 libraries [58, 59] in cancer cell lines helped identify protein-coding genes whose products are essential for cancer cell growth and survival (also known as “genetic dependencies”) [60]. Computational methods for the integrative analysis of essential genes, for example, Project Achilles in the Cancer Dependency Map [55], together with other multiomics datasets in the LINCS [61] or CCLE [33, 62–66] further enabled the characterization of lncRNA targets and interaction partners governing key cellular processes and therapeutic responses. These large-scale studies have contributed to our understanding of gene function in general and helped produce candidate therapeutic targets for diseases including cancer [67–71]. To replicate these efforts for lncRNAs, a pioneering study interrogated the functional consequences of knocking down 226 lincRNAs through shRNA-mediated silencing in mouse embryonic stem cells [72]. Its results suggested that the knockdown of nearly every highly expressed lincRNA dysregulated gene expression programs at a large scale, and that the knockdown of dozens of lincRNAs had phenotypic effects similar to the downregulation of well-known embryonic stem cell regulators. Another RNAi-based loss-of-function study used siRNAs to downregulate 20 lncRNAs whose expression was significantly increased during adipogenesis, the process of forming fat cells; half of these assays showed reductions in lipid accumulation [73]. However, both studies also observed that RNAi-based silencing efficiencies can vary dramatically across lncRNAs. The FANTOM6 project recently knocked down 285 lncRNAs by antisense oligonucleotides in human primary dermal fibroblasts and highlighted a


few lncRNAs that affected cell growth and induced morphology changes [74]. In addition, three genome-scale pooled screens based on CRISPR-Cas9-mediated deletion [75], interference [76], and activation [77] were developed to simultaneously target >600, >16K, and >10K lncRNAs in human cancer or induced pluripotent stem cells, respectively. These studies confirmed the functional roles of several lncRNAs in regulating cancer cell growth and, remarkably, showed that these roles are cell line-specific [76], suggesting that lncRNAs have great potential as better druggable targets than protein-coding genes to reduce the toxicities of targeted therapies in cancer patients [78–81]. Toward the development of combinatorial therapies for cancer, CRISPR-Cas9-mediated targeting of thousands of lncRNAs also helped identify lncRNAs that conferred resistance to chemotherapy and radiotherapy treatments [82, 83]. Taken together, these approaches allowed researchers to systematically examine the phenotypic effects [84, 85] and clinical relevance [86] of human lncRNAs.

2.2 Systems Biology Approaches to Identify lncRNA Targets

Knowledge about lncRNA targets and interacting partners can be used to explore the functional consequences and regulatory mechanisms of lncRNAs. Toward these goals, multiple efforts used specifically designed assays and models for lncRNA regulation. Systems biology approaches used high-throughput screens to predict lncRNA interactions and to identify phenotypes that are associated with lncRNA dysregulation. Computational approaches mined large-scale data to model the effects of lncRNAs, predicting transcriptional and posttranscriptional targets, and identifying lncRNAs of interest in specific contexts based on in vitro association screens. As was the case for efforts to investigate other ncRNAs in the past, the identification of driver lncRNAs often focused on predicting their target genes and pathways: lncRNAs that target key genes and pathways in a given context are more likely to play important roles in this context. Efforts to identify genome-wide binding sites and transcriptional targets by capturing lncRNA–DNA interactions include ChIRP [87, 88], CHART [89], ChAR-seq [90], and GRID-seq [91, 92]. These methods created a high-resolution map of lncRNA occupancy on functional cis-regulatory DNA elements, including proximal promoters and enhancers, to help narrow down the target candidates and associated biological processes that are regulated by nuclear lncRNAs. Of note, ChIRP and CHART can pull down DNA and protein simultaneously to identify TFs and chromatin remodelers that interact with lncRNAs to regulate transcription. Cytoplasmic lncRNAs, however, may act as endogenous sponges of RNA-binding proteins (RBPs) or microRNAs (miRNAs) to posttranscriptionally modulate RNA stabilization and degradation without physically binding target RNAs [93–95]. This

form of indirect regulation, sometimes referred to as competitive endogenous RNA (ceRNA) regulation [93, 96, 97], has been extensively studied in both cancer [98–103] and healthy tissues [104–109]. Unfortunately, existing experimental approaches can discover lncRNA interaction (ceRNA) partners, that is, the titrated RBPs or miRNAs, but not their RNA targets. High-throughput methods to probe lncRNA–RBP interactions by cross-linking and immunoprecipitation (CLIP) technology [110, 111] and its variants, including iCLIP [112], eCLIP [113], sCLIP [114], and PAR-CLIP [115], have become standard. These methods have been shown to outperform their counterpart, RIP [116, 117], because the extra cross-linking step helps protect RNA–RBP pull-down complexes from rapid degradation and thus dramatically reduces the false-negative rate. The ENCODE Consortium currently hosts the largest collection of 223 high-confidence eCLIP datasets for 150 RBPs in cancer cells [118–120], serving as a unique resource for the study of lncRNA-interacting RBPs. Capturing two interacting RNAs (e.g., lncRNA and miRNA) without disrupting their native state during the RNA extraction process remains highly challenging, and only a few high-throughput screening methods are well poised for the transcriptome-wide discovery of RNA–RNA interactions; for example, CLASH targets lncRNA–miRNA or mRNA–miRNA duplexes [121], MARIO [122] and SPLASH [123] capture interacting RNAs in vivo, and other methods have also been described [124]. Overall, researchers recognize that the experimental characterization of lncRNA interacting partners still has considerable unrealized potential, and novel technologies and applications are under active development [125]. A multitude of experimental methods for studying lncRNA regulation have been developed, but each suffers from technical limitations that restrict its use in lncRNA research. 
ChIRP and CHART both required lncRNA sequences to be fully identified before the use of oligonucleotide probes designed to pull down complexes through sequence complementarity [126]. The reliability of RBP binding targets identified by CLIP-based approaches is highly dependent on whether the antibody, used to recognize the RBP, is sufficient to produce specific binding in the process [127]. These methods require specialized equipment, expensive reagents, skilled labor, and lengthy preparation processes. Finally, scalability may be the greatest challenge for these methods, as assays are applied to one lncRNA in one cellular context at a time. Put together, these challenges make most biochemical screens for lncRNA targets prohibitively expensive for most laboratories. Indeed, the 866 human datasets from CLIP-based sequencing experiments aggregated over ENCODE [118], POSTAR2 and CLIPdb [128, 129], ENCORI (starBase) [130], and DoRiNA [131]—all major efforts with predicted CLIP peaks—include an

order of magnitude fewer assays than the ENCODE ChIP-seq effort alone [132, 133]. Only a handful of human datasets from lncRNA genome-wide DNA binding assays are available in the Gene Expression Omnibus (GEO) [134] and most of the reported interactions were identified after crossing biochemical screening data with curated data from low-throughput experiments from databases such as LnChrom [135], NPInter [136], and RNAInter [137].

3 Computational Inference to Predict lncRNA Function

Concurrently with biochemical screens, multiple computational methods were developed to systematically infer lncRNA targets and their interaction networks. These methods largely filled the gap between the existing biochemical technologies and our need to perform integrative analyses to gain insights into lncRNA function. Here, we classify these in silico methods into two categories: transcriptional and posttranscriptional interaction prediction methods. The advantages of computational methods are their lower cost and their seamless scalability, but the challenge has been to evaluate and improve their accuracy.

3.1 Predicting Transcriptional Interactions

Most methods in this category recognized that single-stranded lncRNAs bind double-stranded DNA (dsDNA) by forming triple-helical (or triplex) structures [138–141]; examples include NEAT1 [142], MEG3 [143], FENDRR [144], TERC [145], and HOTAIR [146]. Treating lncRNAs like DNA-binding proteins, these prediction methods identified lncRNAs that contain DNA-binding domains and inferred their potential binding sites on both strands of DNA. RNA–dsDNA triplex formations rely on Hoogsteen and reverse Hoogsteen base pairs that allow the binding of lncRNAs in either parallel (both 5′ to 3′) or antiparallel (5′ to 3′ and 3′ to 5′) orientations [140]. To infer lncRNA–DNA interactions, computational methods (summarized in Table 1 and reviewed below) defined a set of triplex binding rules and examined whether lncRNAs form triplexes with target promoters and enhancers. Triplexator [147, 148] defines constraint functions to identify sequence features from each pair of lncRNA and dsDNA that satisfy six canonical binding rules of Hoogsteen and reverse Hoogsteen base pairing and maximize the length of the triplex bond between them. These functions place constraints on the triplex length and the number of mismatches that do not follow the binding rules. Triplex Domain Finder [149] and LongTarget [150] both adopted the same binding rules as Triplexator. Triplex Domain Finder exclusively provides statistical significance and ranks each identified triplex candidate. LongTarget is a web-based tool that is far slower than Triplexator, making genome-wide

Table 1 The list of 12 genome-wide lncRNA transcriptional target prediction methods. Methods are grouped by the model type

Method                             Model type                 References
Triplexator                        Rule-based                 [147, 148]
Triplex Domain Finder              Rule-based                 [149]
LongTarget                         Rule-based                 [150]
Triplex                            Rule-based                 [151, 152]
TTSMI                              Rule-based                 [153, 154]
Super-lncRNAs                      Rule-based & correlation   [155]
LongHorn (transcriptional module)  Rule-based & correlation   [156]
LncMAP                             Correlation                [157]
LncMod                             Correlation                [158]
Liu et al.                         Correlation                [159]
Lu et al.                          Correlation                [160]
Wang et al.                        Deep learning              [161]

analyses prohibitively slow. In addition to canonical rules, Triplex [151, 152] incorporated six less-common binding rules and then applied a BLAST-like dynamic programming algorithm [162] to select the triplexes with the highest alignment scores. TTSMI is a database containing inferred triplex sites in the human genome that allows users to study co-occurrences of triplexes with other types of genomic regulatory elements, for example, TF binding sites, CpG islands, and single-nucleotide polymorphisms [153, 154]. Super-lncRNAs [155] is a regression-based approach to search for lncRNAs that preferentially bind superenhancers (regulatory regions enriched for TF binding sites [163]) through triplex formation. Lastly, Wang et al. developed a deep learning-based classifier, trained on public ChIRP-seq datasets of twelve lncRNAs, and applied it to predict genome-wide lncRNA binding sites [161]. Their results confirmed that these lncRNAs consistently form triplex structures with dsDNA at the genome-wide level. These rule-based predictors are fast and produce sequence evidence for interactions, but they too have known limitations. First, a manually curated lncRNA sequence is a prerequisite for triplex predictions. Given that only 38% of lncRNA transcripts

were validated (48479 in GENCODE v.34 vs. 127802 in LNCipedia v.5.2), sequence uncertainties may be propagated to their triplex predictions. Second, many lncRNAs exhibit high cell type or tissue specificity, and so do their interactions. Inferring interactions without accounting for gene expression variability may lead to high false-positive rates. Third, current predictors often recommend long triplex sites with a low number of mismatches, for example, 19 bp and a 10% error rate in Triplexator [147], to avoid an influx of candidates. Yet lncRNA molecules are intrinsically flexible and adopt various conformations while binding to dsDNAs or proteins [164, 165]. Setting overly stringent cutoffs might not reflect the true biological mechanism and thereby result in high false-negative rates. Lastly, a genome-wide analysis showed that about a third of nuclear lncRNAs do not form triplexes with protein-coding gene promoters [156, 166]. Indeed, some lncRNAs, for example, Dali [54] and Kcnq1ot1 [167], regulate transcription without forming triplexes. Moreover, these predictors are incapable of inferring partnerships with proteins such as TFs. Alternatively, a few prediction efforts have modeled lncRNAs as modulators of TFs [141, 164]. Without requiring sequence-specific binding, these methods examine whether changes in lncRNA expression are significantly predictive of the regulatory strength between TFs and their targets. To accomplish this, a large set of gene expression profiles is required to achieve statistical significance, but such datasets are sometimes either unavailable for certain rare diseases (e.g., childhood cancer) or of poor quality due to technical and biological constraints. Moreover, the running time and space requirements of these methods increase quickly with the number of samples, making their computational costs prohibitively high. 
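The modulator idea can be illustrated with a minimal sketch. It assumes one common design: samples are split at the median expression of a candidate lncRNA, the TF–target correlation is computed separately in the high and low groups, and the difference is assessed by permuting the lncRNA profile. The median split, the use of Pearson correlation, the function names, and the synthetic data are all illustrative assumptions and do not reproduce any published tool.

```python
import random
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation; returns 0.0 for constant or very short profiles."""
    if len(x) < 2:
        return 0.0
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def delta_correlation(lnc, tf, target):
    """corr(TF, target) in high-lncRNA samples minus corr in low-lncRNA samples."""
    cut = median(lnc)  # assumption: split samples at the median lncRNA level
    high = [i for i, v in enumerate(lnc) if v > cut]
    low = [i for i, v in enumerate(lnc) if v <= cut]
    r_high = pearson([tf[i] for i in high], [target[i] for i in high])
    r_low = pearson([tf[i] for i in low], [target[i] for i in low])
    return r_high - r_low


def modulator_pvalue(lnc, tf, target, n_perm=200, seed=0):
    """Permutation p-value: shuffling the lncRNA profile breaks any modulation."""
    rng = random.Random(seed)
    observed = abs(delta_correlation(lnc, tf, target))
    shuffled = list(lnc)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(delta_correlation(shuffled, tf, target)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Synthetic example: the target follows the TF only when the lncRNA is high.
n = 200
lnc = list(range(n))                       # samples 100..199 are "high"
tf = [(7 * i) % 100 for i in range(n)]     # each TF level occurs once per group
target = [float(t) if i >= 100 else (t - 49.5) ** 2 / 64
          for i, t in enumerate(tf)]
print(delta_correlation(lnc, tf, target))
print(modulator_pvalue(lnc, tf, target))
```

Each permutation recomputes two correlations over all samples for a single lncRNA–TF–target triplet, which gives a sense of why the cost grows quickly when many triplets and samples are tested.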
After incorporation of gene expression profiles, however, these methods can yield more accurate predictions and their results can be readily integrated with other sample-matched multiomics data. Based on this concept of lncRNA modulation, a few computational methods (see Table 1) have been developed to harness the power of large-scale gene expression profiles from TCGA and TANRIC. LongHorn [156] and LncMAP [157] sought to perform a pancancer transcriptome analysis, and LncMod [158] was developed for glioblastoma. Findings from the pancancer analyses indicated that several lncRNAs act as ubiquitous modulators of transcription and others as tumor type-specific modulators. These methods evaluated the changes in correlation between TFs and their targets in samples with high versus low candidate lncRNA modulator expression, with significance evaluated using nonparametric methods such as permutation tests. To measure the degree of correlation, LongHorn and LncMAP used distance correlation (dCor) [168] and Spearman’s ρ, respectively, allowing nonlinear relationships to be captured. LncMod, however, used Pearson’s

r to measure the strength of the linear association. In addition to expression-based evidence, LongHorn further required triplex formation to ensure that each inferred lncRNA–target pair was supported by sequence binding. Other efforts simply performed coexpression analysis to identify the targets of lncRNAs [159, 160], which does not resolve interaction directionality and is not sufficient to infer causality.

3.2 Predicting Posttranscriptional Interactions

The multifaceted roles of lncRNAs in posttranscriptional regulation have been extensively characterized [139, 169–171], and ceRNA regulation is the most well-recognized mechanism of posttranscriptional regulation by lncRNAs [93, 96, 172, 173]. The ceRNA hypothesis is that when two RNA transcripts are targeted by the same pool of miRNAs, up- or downregulation of one RNA will titrate their coregulating miRNAs and result in up- or downregulation of the other RNA [174, 175]. Many lncRNAs are enriched for miRNA binding sites [130, 176–178] and their dysregulation alters miRNA activity and, consequently, the key biological processes that they regulate [179–181]. For example, the downregulation of MEG3 releases more oncogenic miR-181a to inhibit BCL2 expression and may lead to gastric carcinogenesis [182]. Hypoxia-induced H19 and HULC control cholangiocarcinoma by soaking up let-7a/b and miR-372/373 that target IL6 and CXCR4, respectively [183]. TUG1 upregulates the expression of ZEB2, a master regulator of epithelial-mesenchymal transition, by competing for miR-145 binding in bladder cancer [184]. The circular RNA CDR1as (or ciRS-7) acts as a miR-7 sponge in the human and mouse brain, suggesting potential roles in neural development [105, 185]. Other lncRNAs with ceRNA activity include OIP5-AS1 [186, 187], LINC01133 [188], LINC00665 [189], UCA1 [190], PVT1 [191], BGLT3 [192], and CCAT1 [193]. In addition to miRNAs, lncRNAs can also compete for RBP binding; for example, OIP5-AS1 modulates the binding activity of ELAVL1 (HuR) to affect cervical cancer cell proliferation [194], and NORAD sequesters PUMILIO proteins from their mRNA targets to maintain genome stability in colorectal carcinoma [195]. RBP- and miRNA-mediated ceRNA interactions can facilitate cross talk between lncRNAs and cancer genes [139, 196]. 
Several determinants of ceRNA activity have been identified, including the number of miRNA species coregulating the targets [101], and the relative abundance and binding affinity of miRNAs and their targets [197, 198]. Despite controversies surrounding the physiological relevance of ceRNA regulation [94, 199–201], studies have demonstrated that ceRNA regulation has system-level regulatory effects when integrated with transcriptomes [202–205] and that it alters the abundance of hundreds of cancer genes in multiple tumor contexts [102] through densely interconnected regulatory networks [206]. In addition, the coupling effects of pancancer

ceRNA interactions are likely to be context-independent and physiologically relevant [101]. Here, we review computational methods for predicting ceRNA interactions between coding and ncRNAs and summarize their similarities and differences in Table 2 according to the strategies they use to identify and account for miRNA targets and to evaluate expression-based evidence for ceRNA interactions. Supported by both experimental and computational evidence, four landmark studies confirmed the importance of ceRNA regulation in prostate and colon carcinomas [100], melanoma [99], glioblastoma [98], and muscle differentiation [104]. Among them, two computational methods, MuTaME [100] and Hermes [98], were proposed for the prediction of ceRNA interactions. MuTaME was originally used to identify ceRNAs for PTEN, but it can be easily applied to other cancer genes of interest. Specifically, MuTaME evaluated ceRNA candidates based on the number of miRNAs shared with PTEN, and the density, diversity, and proximity of miRNA binding sites predicted by RNA22 [223]. MuTaME-predicted PTEN ceRNAs were protein-coding transcripts that scored highly and were coexpressed with PTEN according to Pearson correlation in specific tumor contexts. Hermes infers genome-wide ceRNA interactions for genes with significantly overlapping miRNA regulation programs that have significant evidence of conditional regulation. Hermes relies on Cupid-predicted miRNA targets [98], which in turn rely on binding site scores from TargetScan [224], miRanda [225], and PITA [226]. When predicting interactions, Cupid also integrates cross-species conservation [227] and an array of miRNA binding site properties using ensemble-based support vector machine classifiers with bootstrap aggregating [228, 229]. 
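The notion of "significantly overlapping miRNA regulation programs" can be made concrete with a one-sided Fisher's exact test on the number of shared miRNA species, the quantity listed as pFET in Table 2. The sketch below is a generic hypergeometric-tail calculation under stated assumptions: the function name, the toy miRNA labels, and the size of the miRNA universe are hypothetical, and the code is not taken from MuTaME, Hermes, or any other tool.

```python
from math import comb


def shared_mirna_pvalue(mirnas_a, mirnas_b, n_mirnas):
    """One-sided Fisher's exact test for the overlap of two miRNA programs.

    mirnas_a, mirnas_b: miRNA species predicted to target transcripts A and B.
    n_mirnas: size of the miRNA universe considered (an assumption of the test).
    Returns (number of shared miRNAs, P[overlap >= observed] under random draws).
    """
    a, b = set(mirnas_a), set(mirnas_b)
    if len(a) > n_mirnas or len(b) > n_mirnas:
        raise ValueError("n_mirnas must be at least as large as each program")
    shared = len(a & b)
    total = comb(n_mirnas, len(b))  # ways to draw B's program from the universe
    tail = sum(
        comb(len(a), k) * comb(n_mirnas - len(a), len(b) - k)
        for k in range(shared, min(len(a), len(b)) + 1)
    )
    return shared, tail / total


# Toy example with placeholder labels (not real regulators of any gene).
program_a = {"mirA", "mirB", "mirC", "mirD"}
program_b = {"mirB", "mirC", "mirD", "mirE"}
shared, p_value = shared_mirna_pvalue(program_a, program_b, n_mirnas=50)
print(shared, p_value)
```

A small p-value only indicates that two transcripts share more miRNA species than expected by chance; the methods reviewed here combine such sequence-level evidence with expression-based evidence before calling a ceRNA interaction.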
Hermes uses conditional mutual information (CMI) [230] to determine whether the expression profile of one transcript is significantly associated with the change in correlation between the expression profiles of the shared miRNA and the other transcript. CMI significance was estimated by a nonparametric permutation procedure, and p-values across all shared miRNAs were integrated to obtain the final combined p-value [231]. Recent updates to Cupid simultaneously predict miRNA and ceRNA interactions [103]. While these methods predict protein-coding ceRNAs, they can be modified to predict ceRNA interactions between ncRNAs. For example, LongHorn predicted miRNA and RBP decoys (lncRNAs that modulate the activities of posttranscriptional regulators through ceRNA interactions) and their target genes directly by running Cupid [156]. In addition, multiple review articles have observed large ceRNA prediction discrepancies among different methods and suggested that ensemble-based approaches may alleviate this problem and improve performance substantially [232–234]. To achieve better accuracy, it is also crucial to use accurate miRNA interaction

Predicted

mRNA–mRNA

mRNA–mRNA

mRNA–mRNA

mRNA–mRNA

mRNA–mRNA

mRNA–mRNA

Cupid (MATLAB package; LongHorn’s post-transcriptional module)

ceRDB (CEFINDER) (database)

Chiu et al.

Zhou et al.

Song et al.

ce-Subpathway (R package)

Both

Both

Predicted

Predicted

Predicted

Predicted

mRNA–mRNA

Hermes (MATLAB script available)

Predicted

mRNA–mRNA

Interaction type

MuTaME

Method

miRNA targets Expression

Conditional regulation: CMI; p-value integration over shared miRNA species: Brown’s method

Conditional regulation: CMI; p-value integration over shared miRNA species: Fisher’s method

pFET (miRNA species) and count (miRNA species)

pFET (miRNA species) and count (miRNA species)

pFET (miRNA species) and count (miRNA species)

Same as MuTaME

Coexpression: PCC

Coexpression: PCC

Coexpression: PCC (both ceRNA and miRNA–target interactions)

Coexpression: PCC (significance is estimated by two-sample Kolmogorov–Smirnov test)

Count (MRE based on miRNA seed family)

Not required

Weighted pFET (miRNA species)

pFET (miRNA species)

Composite score and count (miRNA species)

Coexpression: PCC

Overlap (miRNA/MRE)

Evidence type

[211]

[210]

[209]

[208]

[207]

[103]

[98]

[100]

References

Table 2 A list of 20 genome-wide ceRNA prediction tools, including MuTaME and Hermes. Methods are grouped by predicted interaction types. miRNA–target interactions are compiled from computational predictions (Predicted), experimental datasets (Experimental), or both (Both). The classification of miRNAs or MREs (miRNA response elements) is based on their species or seed families. pFET p-value for the Fisher’s exact test. PCC Pearson’s correlation coefficient. CMI conditional mutual information

lncRNA–mRNA

lncRNA–mRNA

lncRNA–mRNA

lncRNA–mRNA

lncRNA–mRNA circRNA–mRNA

Pseudogene–mRNA Pseudogene–lncRNA

Any pair of mRNAs, lncRNAs, circRNAs, and pseudogenes

LncmiRSRN (R package)

miRNACancerMAP (web application)

lnCeDB (database)

Subpathway-LNCE (R package)

HumanViCe (database)

dreamBase (database)

starBase v2.0 (database)

User-specified

lncRNA–mRNA

DMCN

JAMI (Java-based software)

lncRNA–mRNA

sp-lncRNA

pFET (miRNA seed family) and count (miRNA species and MRE based on miRNA seed family)

Same as Cupid

pFET (miRNA seed family)

Both

Same as Cupid (CMI calculation is much faster)

Coexpression: PCC

Coexpression: PCC

Not required

pFET (miRNA species)

Not required Coexpression: PCC

pFET (miRNA species)

User-specified

Coexpression: PCC; network construction: IDA (Intervention-calculus when the DAG is Absent) [214]

Coexpression: PCC

Not required

Coexpression: PCC (in addition to ceRNA interactions, sp-lncRNA also calculates PCC of miRNA–target interactions for p-value integration); p-value integration over shared miRNA species: Fisher’s method

pFET (miRNA species) and Jaccard (miRNA species)

pFET (miRNA species) and ratio (MRE based on miRNA species)

count (miRNA species)

Both

Predicted (viral miRNA–host target)

Both

Both

Both

Experimental pFET (miRNA species) and count (miRNA species)

Experimental Count (miRNA species)

Predicted

(continued)

[220]

[130]

[219]

[218]

[217]

[178]

[216]

[215]

[213]

[212]

User-specified

User-specified

miRspongeR (R package)

Interaction type

CeRNASeek (R package)

Method

Table 2 (continued)

User-specified

User-specified

miRNA targets

Expression

Eight available methods: (1) miRHomology, (2) positive correlation, (3) sensitivity partial Pearson’s correlation, (4) Hermes, (5) partial Pearson’s correlation, (6) MuTaME, (7) Cernia, and (8) integrative method (majority voting)

Six available methods: (1) Ratio, (2) HyperT, (3) HyperC, (4) sensitivity correlation, (5) CMI, and (6) Cernia

Overlap (miRNA/MRE)

Evidence type

[222]

[221]

References

predictions that are supported by multiple lines of evidence and to integrate context-specific expression profiles into predictive models. Cupid considers context-specific coexpression patterns of miRNAs and their candidate targets when inferring miRNA targets; it avoids producing inflated estimates of ceRNA effects by adopting Brown’s method [235] to merge triplet p-values; and it uses evidence for ceRNA coupling to further improve miRNA target predictions. Finally, we note that integrative approaches that leverage both sequence- and expression-based evidence have been shown to perform better [232].

3.3 Integrative Computational Methods

We recently proposed an integrative approach, dubbed LongHorn (long noncoding RNA heterogeneous regulatory network integrator) [156], to predict lncRNA modulation of transcriptional and posttranscriptional interactions between protein-coding targets and canonical regulators (or effectors), including TFs, miRNAs, and RBPs. LongHorn predicts lncRNA targets based on a model for lncRNA modulation of effectors’ targeting of promoters and UTRs. LongHorn models lncRNAs as cofactors, guides, decoys, or switches. Specifically, LongHorn expects that lncRNAs that act as cofactors and guides to regulate transcription form triplex structures with target promoters, while lncRNAs that act as guides or decoys harbor transcription factor binding sites (TFBSs) to physically interact with TFs. TFs and cofactor lncRNAs bind to the same promoter, but the latter can either activate or inhibit the regulatory activities of the former. For targets that lack TFBSs in promoters, guide lncRNAs bind and recruit TFs to the promoters of target genes to initiate and elongate transcription. Transcriptional and posttranscriptional decoy lncRNAs compete with their target genes for TF, miRNA, and RBP binding. Decoy lncRNAs control the number of regulator molecules available for binding to the promoters and UTRs to alter target expression. In contrast, switch lncRNAs can turn a specific effector on and off to regulate its downstream targets. Put together, these lncRNA models provide an integrative framework to predict lncRNA targets, with many targets regulated in multiple modalities by the same lncRNA.

3.4 Total RNA-Seq Helps Improve lncRNA Interaction Prediction

Total RNA-Seq can be used to improve effector–target interaction predictions by providing a trace for transcription regulation. Namely, total RNA-Seq can be used to derive an intermediate gene-expression profile that follows transcriptional regulation and precedes posttranscriptional regulation. As such, these data allow quantification of the abundance of precursor messenger RNAs (pre-mRNAs) and messenger RNAs (mRNAs). This depends on the assumption that mRNAs but not pre-mRNAs undergo posttranscriptional regulation by miRNAs and RBPs and that regulation by miRNAs and RBPs alters the ratio between mRNA and

pre-mRNA expression but has little effect on pre-mRNA expression alone. Analogously, pre-mRNA expression is regulated by TFs, but the ratio between mRNA and pre-mRNA expression profiles (the m/p-ratio) is independent of transcriptional regulators. Finally, mRNA and pre-mRNA expression profiles can be used to assign evidence for transcriptional and posttranscriptional interactions by observing significant deviations of correlations between the expression profiles of the effectors and the target pre-mRNA or m/p-ratio. Note that pre-mRNAs may lack polyadenylated tails and thus may not be detectable by standard RNA-seq library preparation protocols [236]. To improve target predictions using total RNA-Seq data, we first estimated pre-mRNA and mRNA expression profiles and the ratio between them [237–239] using featureCounts [240]. Here, intronic and exonic sequences are used as pre-mRNA and mRNA surrogates, respectively [237, 241–243]. Accurate pre-mRNA estimates required that profiled introns not overlap any exons and that reads mapped across exon–intron junctions be counted as exonic [237]. Next, we used dCor to correlate expression profiles of each effector and the target pre-mRNA or m/p-ratio. We established dCor baselines [244–247] using random TF-, miRNA-, and RBP-target pairs (10K per modality) drawn from all profiled genes without replacement. The averages of these were subtracted from computed dCors to produce adjusted dCor values. The significance of the difference (or correlation deviation) between an effector’s adjusted dCor with its target’s pre-mRNA and m/p-ratio was estimated by a two-sample Student’s t-test. Transcriptional targets were expected to have higher effector to pre-mRNA adjusted dCor values, while posttranscriptional targets were expected to have higher effector to m/p-ratio adjusted dCor values. 
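The correlation step described above can be sketched in a few lines. The code below computes the sample distance correlation directly from its definition (double-centered pairwise distance matrices) and subtracts a baseline obtained from randomly drawn gene pairs. It is a simplified illustration, not the pipeline used in this chapter: the function names, the quadratic-time pure-Python implementation, the seed, and the way random pairs are drawn (pairs may repeat here) are all assumptions.

```python
import random
from math import sqrt


def _centered(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]  # equals column means (symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [dist[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation (dCor) between two expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    a, b = _centered(x), _centered(y)
    n2 = len(x) ** 2
    dcov2 = sum(p * q for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n2
    dvar_x = sum(p * p for ra in a for p in ra) / n2
    dvar_y = sum(q * q for rb in b for q in rb) / n2
    denom = sqrt(dvar_x * dvar_y)
    if denom == 0:
        return 0.0  # a constant profile carries no dependence information
    return sqrt(max(dcov2, 0.0) / denom)


def baseline_dcor(profiles, n_pairs, seed=0):
    """Average dCor over randomly drawn gene pairs (background level)."""
    rng = random.Random(seed)
    genes = sorted(profiles)
    total = 0.0
    for _ in range(n_pairs):
        g1, g2 = rng.sample(genes, 2)  # two distinct genes per pair
        total += distance_correlation(profiles[g1], profiles[g2])
    return total / n_pairs


def adjusted_dcor(effector, target, baseline):
    """dCor between an effector and a target profile, minus the baseline."""
    return distance_correlation(effector, target) - baseline
```

In this scheme, `adjusted_dcor` would be evaluated twice per effector–target pair, once against the target's pre-mRNA profile and once against its m/p-ratio profile, and the two sets of adjusted values would then be compared. Unlike Pearson's r, dCor is nonzero for nonlinear dependencies such as a symmetric quadratic relationship.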
Leveraging total RNA-seq datasets profiled by RNA Atlas, we used this approach to predict targets for 122 lncRNAs in the Cancer LncRNA Census [248], a database of human lncRNAs implicated in tumorigenesis; 28 of them had at least 10 LongHorn-predicted targets with available expression estimates for pre-mRNA, mRNA, and m/p-ratio. These included SAMMSON [249], FENDRR [51], MALAT1 [250], MEG3 [251], FTX [252], PVT1 [253], XIST [254], NEAT1 [255], ZFAS1 [256], and HOTAIR [257]. For these 28 lncRNAs, LongHorn predicted a total of 3245 transcriptional and 740 posttranscriptional targets that were supported by the expected correlations between effector expression, that is, TF or miRNA, and target pre-mRNA or m/p-ratio expression estimates. Specifically, for predicted TF–target interactions modulated by these lncRNAs, we observed that pre-mRNA correlations were consistently higher than m/p-ratio correlations (Fig. 1a). Similarly, miRNA–target interactions showed higher m/p-ratio correlations than pre-mRNA correlations (Fig. 1b). In addition, 42% (1690/3985) of these lncRNA–target

[Figure: panels B and C; panel title "Sequence-based interactions"; axis label "adjusted distance correlation (dCor)"]