English Pages 336 [337] Year 2023
Methods in Molecular Biology 2611
Georgi K. Marinov William J. Greenleaf Editors
Chromatin Accessibility Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK
For further volumes: http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported by a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.
Chromatin Accessibility Methods and Protocols
Edited by
Georgi K. Marinov and William J. Greenleaf Stanford University, Stanford, CA, USA
ISSN 1064-3745
ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-2898-0
ISBN 978-1-0716-2899-7 (eBook)
https://doi.org/10.1007/978-1-0716-2899-7

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023

Chapter 6 is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/). For further details see license information in the chapter.

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

The genomic distribution patterns of chromatin accessibility and its dynamics are key features of the regulation of gene expression and many other aspects of chromatin biology. The genomes of eukaryotes are usually packaged by nucleosomal particles, which have a generally strong inhibitory effect on transcription and on the occupancy of DNA by regulatory proteins. Active cis-regulatory elements (cREs) in the genome are typically characterized by depleted nucleosomal occupancy and increased chromatin accessibility. This property has proven highly useful: because accessible DNA can be preferentially labeled, enzymatically or chemically, in numerous ways, it enables the identification of candidate cREs as well as the tracking of their activity across cell types and conditions.

Technological advances in the labeling and readout of accessible DNA have played a major role in driving forward our understanding of chromatin and regulatory biology over the last few decades. The last 15 years have seen a particularly dramatic expansion in the variety and power of approaches for studying chromatin accessibility, driven by two sequential technological revolutions: first, the development of high-throughput sequencing in the mid-2000s, and then the advent of single-cell genomics in the 2010s.

This book aims to provide a comprehensive resource covering the existing and state-of-the-art tools in the field. We have divided the protocols into several sections, according to the aspects of chromatin accessibility that they measure and the approaches that they take. The first section covers bulk-cell methods for profiling chromatin accessibility and nucleosome positioning that rely on enzymatic cleavage of accessible DNA and produce information about relative accessibility. The second section is dedicated to methods that use single-molecule and enzymatic approaches to solve the problem of mapping absolute occupancy/accessibility. The third section covers the wide array of emerging tools for mapping DNA accessibility and nucleosome positioning in single cells, as well as a number of single-cell multiomics methods that simultaneously measure chromatin accessibility and other features of the cell, such as the transcriptome, the methylome, and protein markers. More recently, imaging-based methods for visualizing accessible chromatin in its nuclear context have emerged; these are included in the fourth section. The final section features computational methods for the processing and analysis of chromatin accessibility datasets.

This book will serve as an extensive and useful reference for researchers studying different facets of chromatin accessibility in a wide variety of biological contexts.

Georgi K. Marinov
William J. Greenleaf
Stanford, CA, USA
Contents

Preface
Contributors

PART I  BULK CLEAVAGE-BASED METHODS

1 Genome-Wide Mapping of Active Regulatory Elements Using ATAC-seq
  Georgi K. Marinov, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf
2 Mapping Nucleosome Location Using FS-Seq
  Barry Milavetz, Brenna Hanson, Kincaid Rowbotham, and Jacob Haugen
3 Universal NicE-Seq: A Simple and Quick Method for Accessible Chromatin Detection in Fixed Cells
  Hang Gyeong Chin, Udayakumar S. Vishnu, Zhiyi Sun, V. K. Chaithanya Ponnaluri, Guoqiang Zhang, Shuang-yong Xu, Touati Benoukraf, Paloma Cejas, George Spracklin, Pierre-Olivier Estève, Henry W. Long, and Sriharsa Pradhan
4 Measuring Inaccessible Chromatin Genome-Wide Using Protect-seq
  George Spracklin, Liyan Yang, Sriharsa Pradhan, and Job Dekker
5 Determination of the Chromatin Openness in Bacterial Genomes
  Mahmoud M. Al-Bassam and Karsten Zengler
6 Profiling Chromatin Accessibility on Replicated DNA with repli-ATAC-Seq
  Kathleen R. Stewart-Morgan and Anja Groth
7 Analysis of Chromatin Interaction and Accessibility by Trac-Looping
  Shuai Liu, Qingsong Tang, and Keji Zhao

PART II  METHODS FOR MEASURING THE ABSOLUTE LEVELS OF OCCUPANCY/ACCESSIBILITY

8 Single-Molecule Mapping of Chromatin Accessibility Using NOMe-seq/dSMF
  Michaela Hinks, Georgi K. Marinov, Anshul Kundaje, Lacramioara Bintu, and William J. Greenleaf
9 ORE-Seq: Genome-Wide Absolute Occupancy Measurement by Restriction Enzyme Accessibilities
  Elisa Oberbeckmann, Michael Roland Wolff, Nils Krietenstein, Mark Heron, Andrea Schmid, Tobias Straub, Ulrich Gerland, and Philipp Korber

PART III  METHODS FOR PROFILING CHROMATIN ACCESSIBILITY AT THE SINGLE-CELL LEVEL

10 Single-Cell Joint Profiling of Open Chromatin and Transcriptome by Paired-Seq
  Chenxu Zhu, Zhaoning Wang, and Bing Ren
11 Simultaneous Single-Cell Profiling of the Transcriptome and Accessible Chromatin Using SHARE-seq
  Samuel H. Kim, Georgi K. Marinov, S. Tansu Bagdatli, Soon Il Higashino, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf
12 Simultaneous Measurement of DNA Methylation and Nucleosome Occupancy in Single Cells Using scNOMe-Seq
  Michael Wasney and Sebastian Pott
13 Massively Parallel Profiling of Accessible Chromatin and Proteins with ASAP-Seq
  Eleni Mimitou, Peter Smibert, and Caleb A. Lareau
14 Concomitant Sequencing of Accessible Chromatin and Mitochondrial Genomes in Single Cells Using mtscATAC-Seq
  Leif S. Ludwig and Caleb A. Lareau

PART IV  IMAGING METHODS FOR VISUALIZATION OF ACCESSIBLE DNA

15 ATAC-See: A Tn5 Transposase-Mediated Assay for Detection of Chromatin Accessibility with Imaging
  Yonglong Dang, Ram Prakash Yadav, and Xingqi Chen
16 NicE-viewSeq: An Integrative Visualization and Genomics Method to Detect Accessible Chromatin in Fixed Cells
  Pierre-Olivier Estève, Udayakumar S. Vishnu, Hang Gyeong Chin, and Sriharsa Pradhan

PART V  COMPUTATIONAL ANALYSIS OF CHROMATIN ACCESSIBILITY DATASETS

17 ATAC-seq Data Processing
  Daniel S. Kim
18 Deep Learning on Chromatin Accessibility
  Daniel S. Kim

Index
Contributors

MAHMOUD M. AL-BASSAM • Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA
S. TANSU BAGDATLI • Department of Genetics, Stanford University, Stanford, CA, USA
TOUATI BENOUKRAF • Faculty of Medicine, Craig L. Dobbin Genetics Research Centre, Memorial University of Newfoundland, St. John’s, NL, Canada
LACRAMIOARA BINTU • Department of Bioengineering, Stanford University, Stanford, CA, USA
PALOMA CEJAS • Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
XINGQI CHEN • Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
HANG GYEONG CHIN • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology Division, New England Biolabs, Inc., Ipswich, MA, USA
YONGLONG DANG • Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
JOB DEKKER • Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA; Howard Hughes Medical Institute, Boston, MA, USA
PIERRE-OLIVIER ESTÈVE • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology Division, New England Biolabs, Inc., Ipswich, MA, USA
ULRICH GERLAND • Department of Physics, Technical University of Munich, Garching, Germany
WILLIAM J. GREENLEAF • Department of Genetics, Stanford University, Stanford, CA, USA; Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA; Department of Applied Physics, Stanford University, Stanford, CA, USA; Chan Zuckerberg Biohub, San Francisco, CA, USA
ANJA GROTH • Novo Nordisk Foundation Center for Protein Research (CPR), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Biotech Research and Innovation Centre (BRIC), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
BRENNA HANSON • Department of Biomedical Sciences, School of Medicine, University of North Dakota, Grand Forks, ND, USA
JACOB HAUGEN • Department of Biomedical Sciences, School of Medicine, University of North Dakota, Grand Forks, ND, USA
MARK HERON • Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany; Gene Center, Faculty of Chemistry and Pharmacy, Ludwig-Maximilians-Universität München, Munich, Germany
SOON IL HIGASHINO • Department of Genetics, Stanford University, Stanford, CA, USA
MICHAELA HINKS • Department of Genetics, Stanford University, Stanford, CA, USA
DANIEL S. KIM • Biomedical Informatics Program, Stanford University School of Medicine, Stanford, CA, USA
SAMUEL H. KIM • Cancer Biology Program, School of Medicine, Stanford University, Stanford, CA, USA; Medical Scientist Training Program, Stanford University, Stanford, CA, USA
PHILIPP KORBER • Biomedical Center (BMC), Division of Molecular Biology, Faculty of Medicine, LMU Munich, Martinsried, Germany
NILS KRIETENSTEIN • Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
ANSHUL KUNDAJE • Department of Genetics, Stanford University, Stanford, CA, USA; Department of Computer Science, Stanford University, Stanford, CA, USA
CALEB A. LAREAU • Departments of Genetics and Pathology, Stanford University, Stanford, CA, USA
SHUAI LIU • Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
HENRY W. LONG • Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, MA, USA
LEIF S. LUDWIG • Berlin Institute of Health at Charité Universitätsmedizin Berlin, Berlin, Germany; Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association, Berlin Institute for Medical Systems Biology, Berlin, Germany
GEORGI K. MARINOV • Department of Genetics, Stanford University, Stanford, CA, USA
BARRY MILAVETZ • Department of Biomedical Sciences, School of Medicine, University of North Dakota, Grand Forks, ND, USA
ELENI MIMITOU • Immunai, New York, NY, USA
ELISA OBERBECKMANN • Department of Molecular Biology, Max Planck Institute for Multidisciplinary Sciences, Göttingen, Germany
V. K. CHAITHANYA PONNALURI • New England Biolabs Inc., Ipswich, MA, USA
SEBASTIAN POTT • University of Chicago, Chicago, IL, USA
SRIHARSA PRADHAN • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology Division, New England Biolabs, Inc., Ipswich, MA, USA
BING REN • Ludwig Institute for Cancer Research, La Jolla, CA, USA; Department of Cellular and Molecular Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA; Center for Epigenomics, Institute of Genomic Medicine, Moores Cancer Center, University of California San Diego, School of Medicine, La Jolla, California, USA
KINCAID ROWBOTHAM • Department of Biomedical Sciences, School of Medicine, University of North Dakota, Grand Forks, ND, USA
ANDREA SCHMID • Biomedical Center (BMC), Division of Molecular Biology, Faculty of Medicine, LMU Munich, Martinsried, Germany
ZOHAR SHIPONY • Department of Genetics, Stanford University, Stanford, CA, USA
PETER SMIBERT • 10x Genomics, Pleasanton, CA, USA
GEORGE SPRACKLIN • Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
KATHLEEN R. STEWART-MORGAN • Novo Nordisk Foundation Center for Protein Research (CPR), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark; Biotech Research and Innovation Centre (BRIC), Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
TOBIAS STRAUB • Core Facility Bioinformatics, Biomedical Center (BMC), Faculty of Medicine, LMU Munich, Martinsried, Germany
ZHIYI SUN • New England Biolabs Inc., Ipswich, MA, USA
QINGSONG TANG • Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
UDAYAKUMAR S. VISHNU • New England Biolabs Inc., Ipswich, MA, USA; Genome Biology Division, New England Biolabs, Inc., Ipswich, MA, USA
ZHAONING WANG • Department of Cellular and Molecular Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA
MICHAEL WASNEY • Genetics and Genomics Program, University of California, Los Angeles, CA, USA
MICHAEL ROLAND WOLFF • Department of Physics, Technical University of Munich, Garching, Germany
SHUANG-YONG XU • New England Biolabs Inc., Ipswich, MA, USA
RAM PRAKASH YADAV • Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
LIYAN YANG • Program in Systems Biology, University of Massachusetts Medical School, Worcester, MA, USA
KARSTEN ZENGLER • Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California, San Diego, La Jolla, CA, USA; Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
GUOQIANG ZHANG • New England Biolabs Inc., Ipswich, MA, USA
KEJI ZHAO • Laboratory of Epigenome Biology, Systems Biology Center, Division of Intramural Research, National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD, USA
CHENXU ZHU • Ludwig Institute for Cancer Research, La Jolla, CA, USA
Part I Bulk Cleavage-Based Methods
Chapter 1

Genome-Wide Mapping of Active Regulatory Elements Using ATAC-seq

Georgi K. Marinov, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf

Abstract

Active cis-regulatory elements (cREs) in eukaryotes are characterized by nucleosomal depletion and, accordingly, higher accessibility. This property has turned out to be immensely useful for identifying cREs genome-wide and tracking their dynamics across different cellular states and is the basis of numerous methods taking advantage of the preferential enzymatic cleavage/labeling of accessible DNA. ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) has emerged as the most versatile and widely adaptable method and has been widely adopted as the standard tool for mapping open chromatin regions. Here, we discuss the current optimal practices and important considerations for carrying out ATAC-seq experiments, primarily in the context of mammalian systems.

Key words: Enhancers, Promoters, Chromatin accessibility, ATAC-seq, High-throughput sequencing
1 Introduction

Eukaryotic chromatin is generally packaged by nucleosomes, octamer particles composed of the four core nucleosomal histones H3, H4, H2A, and H2B [1]. Nucleosomal packaging has an inhibitory effect on transcriptional activity and prevents the binding of most transcription factors and other regulatory proteins. Active promoter and enhancer elements differ from the rest of the genome in that they usually exist in a nucleosome-depleted, open chromatin state. This property is highly useful in practice because, just as regulatory factors can access active cREs, so can various enzymes whose action is otherwise inhibited by nucleosome particles. That enhancers and promoters exhibit this property was already appreciated nearly four decades ago, when their hypersensitivity to cleavage by DNase enzymes was first reported [2–4].
Georgi K. Marinov and Zohar Shipony contributed equally to this work.

Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_1, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
DNase remained the main tool for mapping active cREs into the genomic era, initially coupled to microarray readouts [5–7] and eventually adapted to a high-throughput sequencing format [8–10]. In parallel to these developments, as well as more recently, a wide variety of alternative methods taking advantage of the preferential enzymatic or chemical cleavage or modification of accessible DNA were also developed, employing methyltransferases [11–15], restriction enzymes [16], nicking enzymes [17], small molecules [18], viral integration [19], and others.

ATAC-seq, which is based on the preferential insertion into unprotected DNA by a hyperactive mutant version of the Tn5 transposase [20] (Fig. 1), has emerged as the most convenient, widely adaptable, and straightforward-to-execute method for profiling open chromatin. Treatment of chromatin with Tn5 results in the insertion into accessible DNA of adapters that then enable the direct amplification of open chromatin fragments. This eliminates much of the complex series of enzymatic steps that are unavoidable features of previous methods such as DNase-seq, allows the protocol to be completed in just a few hours, and dramatically lowers the input requirements, down to a few tens of thousands of cells in bulk reactions, while also enabling single-cell (scATAC) assays [21, 22].

In this chapter, we describe the most important considerations for carrying out successful ATAC-seq experiments in the context of the Omni-ATAC protocol, an optimized version of the ATAC-seq assay that produces high-quality ATAC libraries for most mammalian cell lines and cell types, as well as for a number of other eukaryotes.
2 Materials

Prepare a master stock of the ATAC-RSB buffer without detergents in a large volume (e.g., 50 mL) and store it at 4 °C. Prepare master stocks of 2× TD buffer (e.g., in 2-mL tubes) and keep those at -20 °C.

2.1 Transposition Buffers and Reagents

Prepare the ATAC-RSB-Lysis and ATAC-RSB-Wash buffers immediately before use by adding the necessary detergents; keep on ice.

1. IGEPAL CA-630 detergent (Sigma Cat# 11332465001; supplied as a 10% solution)
2. Tween-20 detergent (Sigma Cat# 11332465001; supplied as a 10% solution; store at 4 °C)
3. Digitonin detergent (Promega Cat# G9441; supplied as a 2% solution in DMSO; store at -20 °C)
[Figure 1 schematic, labels: cells; nuclei isolation; Tn5 transposase; tagmentation (transposition, library building, sequencing); purification and amplification; final library, sequencing]
Fig. 1 Outline of the ATAC-seq assay. Nuclei are isolated from cells and chromatin is incubated with an active Tn5 transposase carrying PCR amplification adapter sequences. Tn5 preferentially inserts into accessible chromatin, such as that found at active regulatory elements. After transposition, DNA is purified and PCR amplification is carried out from the primer landing sites deposited by Tn5
4. ATAC-RSB buffer (master stock)
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
5. ATAC-RSB-Lysis buffer
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
   0.1% IGEPAL CA-630
   0.1% Tween-20
   0.01% Digitonin
6. Lysis Wash Buffer (ATAC-RSB-Wash)
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
   0.1% Tween-20
7. 2× TD buffer
   20 mM Tris-HCl pH 7.6
   10 mM MgCl2
   20% Dimethyl Formamide
8. Tn5 transposase (see Note 1)

2.2 Library Building, Sequencing, and Quality Evaluation
1. 200-μL PCR tubes
2. Sequencing primers/adapters (see Note 2)
3. NEBNext High-Fidelity 2× PCR Master Mix (NEB, Cat# M0541S)
4. Qubit fluorometer or equivalent
5. Qubit tubes
6. Qubit dsDNA HS Assay Kit
7. TapeStation (Agilent) or equivalent, e.g., BioAnalyzer (Agilent)
8. TapeStation D1000 tape and reagents (Agilent)
9. 10 mM dNTP mix
10. 25× SYBR Green (Thermo Fisher Cat# S7563; supplied as 10,000×)
11. Phusion High-Fidelity DNA Polymerase (NEB, Cat# M0530L)

2.3 General Materials and Equipment

1. 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding (see Note 4)
2. 2-mL, 15-mL, and 50-mL tubes
3. Incubator (37 °C) or a Thermomixer
4. Tabletop centrifuge
5. Thermal cycler
6. MinElute PCR Purification Kit (Qiagen Cat# 28004/28006), Zymo DNA Clean and Concentrator Kit (Zymo Cat# D4013/D4014), or equivalent (see Note 5)
7. Nuclease-free H2O
8. 1× PBS buffer solution
9. qPCR machine (Step One or equivalent)
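The detergent additions that turn the ATAC-RSB master stock into the working buffers of Subheading 2.1 follow from C1V1 = C2V2. The following is a minimal sketch in Python, assuming the stock concentrations listed in Subheading 2.1 (10% IGEPAL CA-630, 10% Tween-20, 2% digitonin) and ignoring the slight dilution of the salts by the added detergents; the function and variable names are illustrative and not part of the protocol.

```python
# Detergent stock concentrations (percent), as listed in Subheading 2.1
STOCKS = {"IGEPAL CA-630": 10.0, "Tween-20": 10.0, "Digitonin": 2.0}

# Final detergent concentrations (percent) in each working buffer
BUFFERS = {
    "ATAC-RSB-Lysis": {"IGEPAL CA-630": 0.1, "Tween-20": 0.1, "Digitonin": 0.01},
    "ATAC-RSB-Wash": {"Tween-20": 0.1},
}


def stock_volume(final_pct, stock_pct, total_ul):
    """Volume of stock (in µL) giving final_pct in total_ul, from C1*V1 = C2*V2."""
    return final_pct / stock_pct * total_ul


def recipe(buffer_name, total_ul):
    """Return the volume (µL) of each component for total_ul of working buffer."""
    additions = {
        detergent: stock_volume(pct, STOCKS[detergent], total_ul)
        for detergent, pct in BUFFERS[buffer_name].items()
    }
    # The remainder is ATAC-RSB master stock (assumption: salt dilution is ignored)
    additions["ATAC-RSB master stock"] = total_ul - sum(additions.values())
    return additions


if __name__ == "__main__":
    for name in BUFFERS:
        print(name, recipe(name, 1000.0))
```

For 1 mL of lysis buffer this works out to roughly 10 µL of each 10% detergent stock and 5 µL of the 2% digitonin stock, with the remainder made up of master stock.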
3 Methods

The general outline of the ATAC-seq assay is shown in Fig. 1. Nuclei are isolated from cells, a transposition reaction is carried out, the DNA is purified, and sequencing libraries are prepared. Here we discuss the Omni-ATAC protocol [23] in its most widely applicable version, as it yields the best results in terms of reduced mitochondrial contamination (see Note 6) compared to other versions of the assay [20, 24]. The Omni-ATAC protocol works as described for the great majority of mammalian and insect cell lines, as well as for many other eukaryotic cells without cell walls. Different protocols need to be applied for nuclei isolation from other sources, such as tissues (see Note 7), plant cells (see Note 8), various small metazoan animals (see Note 9), yeast (see Note 10), and others. It is also in principle possible to carry out ATAC-seq on crosslinked material, but this generally produces suboptimal results and we advise against it (see Note 11).
3.1 Removal of Nonviable Cells (Optional)

The presence of nonviable cells can negatively affect the quality of final ATAC-seq libraries, as dead cells generate a general background of dechromatinized DNA, decreasing the enrichment for open chromatin regions. Two strategies are usually used to address this problem:

1. If the fraction of dead cells is not too high (i.e., 5–15%), cells are treated with DNase (200 U/mL) in culture media, usually for 30 min at 37 °C. Cells are then washed thoroughly with 1× PBS to remove the DNase.
2. If the fraction of dead cells is higher, live cells can be separated from dead cells using a Ficoll gradient (Sigma Cat# GE171440-02), with the exact conditions varying depending on the cell type.
3.2 Preparation of Nuclei

Once the quality of the input cells has been ensured, the next step is to prepare nuclei and transpose them. The empirically determined optimum input for a species with a mammalian-sized genome is 50,000 diploid cells. Scale the input according to the expected genome size and ploidy, and adjust other parameters, such as centrifugation speeds, if necessary.

1. Centrifuge 50,000 viable cells at 500 × g for 5 min at 4 °C.
2. Carefully aspirate the supernatant, avoiding the pellet.
3. Add 50 μL of cold ATAC-RSB-Lysis Buffer and pipette up and down several times.
4. Incubate on ice for 3 min.
5. Add 1 mL of cold ATAC-RSB-Wash Buffer and invert several times to mix well.
6. Centrifuge at 500 × g for 5 min at 4 °C.
7. Carefully aspirate the supernatant as fully as possible while avoiding the pellet.

3.3 Transposition
Carry out transposition as follows:

1. Immediately resuspend the pellet in the transposase reaction mix:
   25 μL 2× TD buffer
   2.5 μL Tn5
   22.5 μL nuclease-free H2O
2. Incubate at 37 °C for 30 min in a Thermomixer at 1000 RPM.

3.4 DNA Purification
1. Immediately stop the reaction using 250 μL (i.e., 5×) of PB buffer (if using MinElute) or DNA Binding Buffer (if using Zymo; also see Note 12).
2. Purify samples following the kit instructions.
3. Elute with 10 μL of Elution Buffer.
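When several samples are processed in parallel, the transposition mix of Subheading 3.3 is usually prepared as a master mix, and the cell input of Subheading 3.2 may need to be rescaled for other genomes. The sketch below illustrates both calculations under stated assumptions: the 10% pipetting overage, the reference haploid genome size of 3.1 Gb, and the rule of keeping the total amount of DNA per reaction constant are our illustrative choices rather than values given in the protocol, and the optimal input should still be determined empirically.

```python
# Per-reaction volumes (µL) of the 50-µL transposition mix in Subheading 3.3
PER_REACTION_UL = {"2x TD buffer": 25.0, "Tn5": 2.5, "nuclease-free H2O": 22.5}


def master_mix(n_samples, overage=0.1):
    """Scale the transposition mix to n_samples.

    `overage` is an assumed fractional excess to cover pipetting losses.
    """
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


def scaled_cell_input(genome_size_bp, ploidy,
                      ref_cells=50_000, ref_genome_bp=3.1e9, ref_ploidy=2):
    """Number of cells giving the same total DNA as 50,000 diploid mammalian cells.

    Assumption: input scales inversely with DNA content per cell
    (haploid genome size times ploidy); ref_genome_bp is an approximate value.
    """
    ref_dna = ref_cells * ref_genome_bp * ref_ploidy
    return round(ref_dna / (genome_size_bp * ploidy))


if __name__ == "__main__":
    print(master_mix(8))
    print(scaled_cell_input(genome_size_bp=3.1e9, ploidy=4))
```

The same scaling factor can be applied to the stop buffer in Subheading 3.4, which is added at five times the reaction volume.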
3.5 PCR Amplification and Library Generation

Typically, a dual-indexing approach is used when amplifying ATAC-seq libraries. The general structure of an ATAC-seq library as well as the relevant adapter and primer sequences are shown in Fig. 2. See Note 2 for further discussion.

[Figure 2, panel a: schematic of a transposed fragment; labels: A, B, insert, ME, P5, i5, i7, P7, SR1, SR2]
[Figure 2, panel b: adapter and primer sequences]
A:         5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’
ME:        3’-TCTACACATATTCTCTGTC-5’
B:         5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’
ME:        3’-TCTACACATATTCTCTGTC-5’
i7 primer: 5’-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3’
i5 primer: 5’-AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC-3’

Fig. 2 Structure of an ATAC-seq library. (a) After transposition, an original DNA fragment is flanked by two Tn5 molecules with their adapters. Note that all three possible configurations, A-A, B-B, and A-B/B-A, are produced (“A” and “B” indicate the two different adapters carried by the Tn5 molecules used for transposition; these sequences share a common “ME” segment), but only the A-B ones can be subsequently amplified and sequenced under conventional protocols. The A and B adapters are used as landing sites for the PCR primers that add the i5 and i7 barcodes and the P5 and P7 sequences needed for Illumina sequencing. (b) Typical sequences of A and B adapters and of i5 and i7 PCR primers. The [i7] and [i5] sequences are typically 8 bp long and should be chosen so as to maximize the sequence distance between each pair of indexes

1. Set up a PCR reaction as follows:
   10 μL eluted transposition reaction
   10 μL nuclease-free H2O
   2.5 μL of Adapter 1
   2.5 μL of Adapter 2
   25 μL NEBNext High-Fidelity 2× PCR Master Mix (see Note 3)
2. Optimization of PCR conditions, pre-amplification. Amplify DNA for 5 cycles as follows:
   72 °C for 3 min
   98 °C for 30 s
   5 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
3. Determining additional cycles using qPCR: reaction setup. Use 5 μL of the pre-amplified reaction in a total qPCR reaction volume of 15 μL as follows:
   3.76 μL nuclease-free H2O
   0.5 μL of Adapter 1
   0.5 μL of Adapter 2
   0.24 μL 25× SYBR Green (in DMSO)
   5 μL NEBNext High-Fidelity 2× PCR Master Mix
   5 μL pre-amplified sample
4. Determining additional cycles using qPCR: cycling. Run the qPCR reaction with the following settings in a qPCR machine:
   98 °C for 30 s
   20 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
5. Assess the amplification profiles and determine the required number of additional cycles to amplify. Typical results are shown in Fig. 3.
6. Carry out final amplification by placing the remaining 45 μL in a thermocycler and running the following program:
   N_add cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
   where N_add is the number of additional cycles.
   In practice, we have found that 8–10 cycles are usually sufficient to amplify a standard mammalian ATAC library, and, if a very large number of samples is being processed at a time, the single-step reaction in step 7 can be run instead.
7. Single-step PCR.
   72 °C for 3 min
   98 °C for 30 s
   8–10 cycles of:
      98 °C for 10 s
      63 °C for 30 s
      72 °C for 30 s
   Hold at 4 °C
8. Purify the amplified library following the same procedure used to purify the transposition reaction.
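The choice of additional cycles in step 5 follows the rule given in the Fig. 3 legend: find the cycle at which the qPCR trace reaches one third of its maximum relative fluorescence. A minimal sketch of that rule is shown below; it assumes a baseline-subtracted trace with one value per cycle and returns the first cycle at or above the threshold (rounding up rather than down is our assumption, and the function name is illustrative).

```python
def additional_cycles(fluorescence, fraction=1.0 / 3.0):
    """Return the number of extra PCR cycles suggested by a qPCR trace.

    fluorescence[i] is the relative fluorescence after qPCR cycle i + 1
    (assumed to be baseline-subtracted). The result is the first cycle at
    which the signal reaches `fraction` of the maximum of the trace.
    """
    if not fluorescence:
        raise ValueError("fluorescence trace is empty")
    threshold = max(fluorescence) * fraction
    for cycle, value in enumerate(fluorescence, start=1):
        if value >= threshold:
            return cycle
    raise ValueError("trace never reaches the threshold")


if __name__ == "__main__":
    # Synthetic trace for illustration only, not measured data
    trace = [0, 0, 10, 50, 200, 800, 2500, 6000, 12000,
             20000, 26000, 29000, 30000, 30000]
    print(additional_cycles(trace))
```

A more conservative variant would return the last cycle below the threshold to reduce the risk of over-amplification.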
3.6 Library Quantification and Evaluation of Library Quality

[Figure 3, panel a: qPCR amplification curves, relative fluorescence (0 to 35,000) versus cycle (1 to 20), with annotations +6, +8, and +10. Panel b: PhiX standard curve, log concentration versus Ct, for the dilution series 200, 100, 50, 25, 12.5, 6.25, 3.125, and 1.56 pM; linear fit y = -1.0879x + 11.974, R² = 0.9916]

Fig. 3 Determination of additional PCR cycles (after pre-amplification) and library quantification using qPCR. (a) Determination of additional PCR cycles; qPCR is performed to determine the number of extra cycles to perform on the pre-amplified ATAC material without reaching saturation. To determine the number of extra cycles, find the number of cycles needed to reach 1/3 of the maximum relative fluorescence, and then carry out this number of additional PCR cycles. (b) Quantification of libraries; qPCR quantification is performed on diluted ATAC-seq libraries (400×) against a serial dilution of PhiX (200–1.56 pM). A standard curve is generated based on the PhiX dilutions and used to calculate the molarity of the ATAC-seq library
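The standard-curve calculation described in the Fig. 3b legend can be sketched as a least-squares fit of log concentration against Ct, followed by back-calculation of the library molarity. The sketch below makes several assumptions that are not stated in the text: concentrations are log2-transformed (any base works as long as it is used consistently), the library was diluted 400× before qPCR, and the function names are illustrative.

```python
import math


def fit_standard_curve(cts, concs_pm):
    """Least-squares fit of log2(concentration in pM) = slope * Ct + intercept."""
    if len(cts) != len(concs_pm) or len(cts) < 2:
        raise ValueError("need at least two matched (Ct, concentration) points")
    ys = [math.log2(c) for c in concs_pm]
    n = len(cts)
    mean_x = sum(cts) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in cts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def library_concentration_pm(ct, slope, intercept, dilution=400):
    """Concentration (pM) of the undiluted library from the Ct of its dilution."""
    return 2 ** (slope * ct + intercept) * dilution


if __name__ == "__main__":
    # Idealized standards for illustration: each 2x dilution adds one cycle
    standards_pm = [200 / 2 ** i for i in range(8)]
    standard_cts = [5 + i for i in range(8)]
    slope, intercept = fit_standard_curve(standard_cts, standards_pm)
    print(slope, intercept, library_concentration_pm(7.0, slope, intercept))
```

With log2-transformed concentrations, a slope close to -1 corresponds to an approximate doubling of product per cycle, which is a quick check on the quality of the standards.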
Before libraries can be sequenced, they need to be properly quantified and their quality evaluated. There are two components to this process: first, evaluation of the insert size distribution, and second, quantification.
1. Examination of library size distribution. This step can be carried out using a variety of instruments that are now available for this purpose, such as a TapeStation or a BioAnalyzer. In our practice we prefer to use a TapeStation (with the D1000 or HS D1000 kits) due to its ease of use, flexibility, and rapid turnaround time. Typical results are shown in Fig. 4. A successful mammalian ATAC-seq library usually exhibits a clear nucleosomal signature (though the reverse is not always true; see Note 13 for further discussion).
Fig. 4 Evaluation of ATAC-seq library size distribution. Shown is the fragment length distribution as evaluated using a TapeStation instrument and a D1000 TapeStation kit for an ATAC-seq library for the human GM12878 cell line. When a clear nucleosomal signature is observed, as in the example shown here, the library is most likely of high quality. Note that the nucleosomal signature can in some cases be obscured by the presence of high levels of mitochondrial contamination or some other source of highly accessible DNA (see Note 13 for further discussion)
2. Quantification of library concentration. For most high-throughput sequencing applications, this step is routinely carried out using a Qubit fluorometer. This works well for most libraries as they exhibit a unimodal fragment length distribution, and the Qubit generally returns highly accurate and reliable measurements. However, ATAC libraries do not exhibit a unimodal fragment distribution and in fact often contain fragments longer than what can be sequenced on standard Illumina instruments. Thus the effective library concentration often differs from the apparent library concentration measured using Qubit (though Qubit measurements can still be used, with that caveat in mind, if no other information can be obtained).
3. Estimation of effective library concentration using qPCR. A standard curve is generated using Illumina PhiX standard (10 nM) by first making a 50× dilution to 200 pM, from which seven additional serial 2× dilutions are made (to 100 pM, 50 pM, 25 pM, 12.5 pM, 6.25 pM, 3.125 pM, and 1.56 pM). Set up 20 μL qPCR reactions as follows:
7.9 μL nuclease-free H2O
5 μL ATAC-seq 400× diluted library or PhiX standards
4 μL Phusion HF Buffer
1 μL 25 μM i7 primer
1 μL 25 μM i5 primer
0.4 μL 10 mM dNTP mix
0.5 μL 25× SYBR Green (in DMSO)
0.2 μL NEB Phusion HF
Run the qPCR reaction with the following settings in a qPCR machine:
98 °C for 30 s
20 cycles of:
98 °C for 10 s
63 °C for 30 s
72 °C for 30 s
Hold at 4 °C
Create a standard curve based on the PhiX dilutions and use it to estimate the true molarity of the ATAC-seq library, accounting for the 400× dilution. Commercial kits such as the NEBNext Library Quant Kit for Illumina or KAPA Library Quantification Kits can also be used in a similar manner.
3.7 Sequencing
The protocol described here generates libraries designed to be sequenced on Illumina sequencers. A decision usually needs to be made regarding the format to be used when sequencing. We strongly advise against sequencing ATAC-seq libraries in a single-end format, for two reasons. First, analysis of the fragment length distribution is an important part of the quality evaluation of ATAC-seq datasets, and this is only truly possible in paired-end format. Second, many analyses of ATAC-seq data (e.g., transcription factor footprinting) operate at the level of examining insertion points rather than read coverage; paired-end reads produce twice as many such insertion points for the same cost. In practice, we have observed that ATAC-seq insert length distributions peak around 50–60 bp (Fig. 5). Therefore it is most cost effective to sequence ATAC libraries in 2×36 bp or 2×50 bp formats (depending on the exact sequencer and kits available), as sequencing kits with more cycles are usually priced significantly higher. However, for some applications (for example, if aiming to study the effects of sequence variation on chromatin accessibility), longer reads can provide important additional information. Thus the exact sequencing format should be chosen according to the specific needs of the study being carried out.
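Because libraries must be quantified before they are sequenced, it can help to see the qPCR standard-curve calculation from Subheading 3.6 (step 3) written out. The following Python sketch fits log2(concentration) against Ct for the PhiX standards, as plotted in Fig. 3b, and back-calculates the undiluted library molarity. The function names, the use of a base-2 logarithm, the plain least-squares fit, and all Ct values are assumptions made for illustration; they are not taken from the protocol or from any instrument software.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(conc_pm: Sequence[float], ct: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log2(concentration in pM) = slope * Ct + intercept."""
    if len(conc_pm) != len(ct) or len(ct) < 2:
        raise ValueError("need at least two matched standards")
    y = [math.log2(c) for c in conc_pm]
    n = len(ct)
    mean_x = sum(ct) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in ct)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(ct, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def library_molarity_pm(ct_sample: float, slope: float, intercept: float,
                        dilution: float = 400.0) -> float:
    """Molarity (pM) of the undiluted library from the Ct of its diluted aliquot."""
    return dilution * 2 ** (slope * ct_sample + intercept)


# PhiX standards from the text; the Ct values are hypothetical placeholders.
standards_pm = [200, 100, 50, 25, 12.5, 6.25, 3.125, 1.56]
standards_ct = [4.1, 5.0, 5.9, 6.9, 7.8, 8.7, 9.6, 10.5]
slope, intercept = fit_standard_curve(standards_pm, standards_ct)
estimate_pm = library_molarity_pm(7.2, slope, intercept)  # hypothetical sample Ct
```

A slope close to -1 on this scale corresponds to near-doubling of product per cycle; library Ct values that fall outside the range of the standards should be treated with caution.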
[Fig. 5 plots. (A) Fragment length distribution: fraction of fragments versus fragment length (0–500 bp). (B) Average RPM versus position relative to TSS (-2,000 to +2,000 bp). (C) Genome browser view, hg38 chr8:127,700,000–127,850,000 (100 kb scale bar), showing CASC11, MYC, and PVT1.]
Fig. 5 Expected results from a successful ATAC-seq experiment. (a) Shown is the insert length distribution of a typical sequenced mammalian ATAC-seq library, showing a prominent subnucleosomal peak, as well as a mononucleosomal and a less pronounced dinucleosomal peak. (b) Aggregate ATAC-seq signal profile around transcription start sites (TSSs). (c) ATAC-seq profile in a 212-kb neighborhood around the human MYC gene. The ENCODE Consortium [34] keratinocyte dataset with accession ID ENCSR798IJQ was used for this example
4 Expected Results
After sequencing, reads are mapped to the reference genome, and several quality evaluation metrics are considered before proceeding with downstream analysis. A typical ATAC-seq library exhibits a nucleosomal signature as shown in Fig. 5a. Enhancer, promoter, and insulator regions should be strongly enriched relative to the rest of the genome (Fig. 5c shows the accessibility profile in the neighborhood of the MYC gene, which highlights a number of candidate distal regulatory elements). A quick way to evaluate the degree of enrichment is to examine aggregate plots of ATAC-seq signal around annotated TSSs, as shown in Fig. 5b. This can be further formalized as a TSS ratio score, calculated by dividing the average number of fragments within ±100 bp of the TSS by the sum of the average numbers of fragments within the two 100-bp windows located +2 kb and -2 kb away from the TSS. The advantage of this metric is that it is independent of peak calling
or sequencing depth; its disadvantage is that it is annotation-dependent and well calibrated only for the human and mouse genomes. For these two genomes, good ATAC-seq libraries usually exhibit TSS ratios ≥8.
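The TSS ratio score described above can be illustrated with a short Python sketch. It assumes that an aggregate profile has already been computed as the average number of fragments at each base from -2 kb to +2 kb around annotated TSSs. The function name `tss_ratio`, the input layout, and in particular the placement of the two flanking 100-bp windows at the outermost ends of the profile are assumptions made here; the text does not specify exactly how the flanking windows are positioned.

```python
from typing import Sequence


def tss_ratio(profile: Sequence[float], flank: int = 2000, window: int = 100) -> float:
    """TSS ratio from an aggregate per-base profile spanning -flank..+flank.

    profile[flank] is the TSS itself, so len(profile) must be 2 * flank + 1.
    """
    if len(profile) != 2 * flank + 1:
        raise ValueError("profile must cover -flank..+flank at 1-bp resolution")

    def mean(values: Sequence[float]) -> float:
        return sum(values) / len(values)

    # Average signal within +/- `window` bp of the TSS.
    center = mean(profile[flank - window: flank + window + 1])
    # Assumed flank windows: the outermost `window` bp on each side.
    upstream = mean(profile[:window])
    downstream = mean(profile[-window:])
    background = upstream + downstream
    if background == 0:
        raise ValueError("no signal in flanking windows")
    return center / background


# Hypothetical profile: flat background of 1 with a 16-fold peak at the TSS.
example = [1.0] * 4001
for i in range(1900, 2101):
    example[i] = 16.0
score = tss_ratio(example)  # 16 / (1 + 1) = 8.0
```

Because the denominator is the sum of the two flank averages, the score is half of what a simple center-to-single-flank ratio would give, which matters when comparing values across pipelines.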
5 Notes
1. The Tn5 transposase can be obtained as part of the various Nextera DNA Library Prep kits offered by Illumina, and also from several other commercial vendors. It can also be made by individual laboratories following previously published protocols [25]. The latter approach, although laborious, is the most cost effective, especially for large-scale projects. If homemade Tn5 is used, its activity should ideally be well characterized relative to standard enzymatic formulations.
2. PCR and indexing primers/adapters supplied with the Nextera DNA Library Prep kits offered by Illumina can be used. Alternatively, or if a larger number of indexing sequences is needed, custom-designed and synthesized oligos can be used with equivalent success. The structure of an ATAC-seq library with the relevant sequences is shown in Fig. 2.
The i7 primer sequence is: 5′-CAAGCAGAAGACGGCATACGAGAT[i7]GTCTCGTGGGCTCGG-3′
The i5 primer sequence is: 5′-AATGATACGGCGACCACCGAGATCTACAC[i5]TCGTCGGCAGCGTC-3′
Where [i7] and [i5] are the index sequences (typically 8 bp long). Dissolve and dilute to 25 μM.
3. The initial extension is very important when amplifying transposed DNA as it is needed to fill in the gap left from the transposition itself (see Fig. 2) and allow PCR primers to land in subsequent amplification cycles. For this reason, it is not recommended to use hot-start polymerase mixes, in which the polymerase is only activated by exposure to denaturation temperatures.
4. Low-binding tubes are preferable, though not absolutely required, as a low number of cells (only 50,000) is usually used as input to an ATAC reaction.
5. The MinElute kit can be replaced with other DNA purification kits; for example, we have also had equivalent success using the DNA Clean & Concentrator from Zymo. The important
variables regarding the DNA isolation procedure after transposition are the efficiency of recovery and the lower size limit of the recovered fragments. The insert length distribution of most ATAC-seq libraries peaks around 50–60 bp, i.e., even including Tn5 adapter sequences, many of the informative fragments are shorter than 90–100 bp and should ideally be preserved during the DNA purification procedure.
6. Early versions of the ATAC-seq protocol [20] exhibited very high proportions of reads originating from the mitochondrial genome, often exceeding 80% of the total. This is due to the fact that the mitochondrial genome is not packaged by nucleosomes and is therefore highly accessible to transposase insertion. Decreasing the fraction of mitochondrial reads has been a key part of the improvement of the ATAC-seq protocol in its currently used variants relative to the original version, and has been achieved thanks to the addition of the combination of digitonin, Tween-20, and IGEPAL detergents during the cell lysis and nuclei preparation step. As a result, ATAC-seq libraries generated using modern protocols frequently contain ≤5% mitochondrial reads for many cell types. We do note, however, that there are cells that simply contain an extremely high number of mitochondria due to very high levels of metabolic activity (e.g., some cancer cell lines), and even with the optimized Omni-ATAC protocol mitochondrial fractions are still quite high for them. These are special cases though. We also note that high levels of mitochondrial contamination do not necessarily correspond to poor-quality ATAC-seq datasets in terms of signal-to-noise ratios in the nuclear genome. We have found no inverse correlation between the fraction of reads mapping to the mitochondrial genome and the levels of enrichment for open chromatin regions. The key benefit of eliminating mitochondrial reads is to reduce sequencing costs, as fewer overall reads need to be sequenced to achieve the necessary coverage over the nuclear genome.
7. Nuclei isolation from tissues, especially when frozen, is a multistep procedure involving tissue homogenization by douncing followed by density gradient centrifugation. The reader is referred to [23] for more details.
8. Plant cells are a challenging system to isolate nuclei from because of their thick cellulose cell walls. Nuclei are isolated by grinding tissue material in liquid nitrogen [26–28] and then sorting nuclei by sucrose sedimentation or by FACS, if the cell type of interest has been labeled accordingly, e.g., using the INTACT approach [29].
9. It is also often necessary to carry out homogenization followed by nuclei isolation when working with whole animals, e.g., C. elegans [30], with the exact protocol optimized according to the specifics of the organism being studied.
10. Yeast (and fungal cells in general) have thick cell walls composed of polysaccharides, lipids, and chitin in various proportions. These walls present a barrier to the access of Tn5 to the nucleus; thus ATAC-seq protocols tailored to such cells involve treatment with zymolyase or chitinase enzymes [31], with the exact details varying depending on the species studied.
11. It is in principle possible to carry out ATAC-seq experiments and obtain enriched libraries from crosslinked sources. This has in fact been how a number of scATAC-seq studies have been executed in recent years [32, 33]. However, ATAC-seq libraries generated from crosslinked cells are generally suboptimal, with lower signal-to-noise ratio than standard ATAC-seq datasets, and they also tend to display a pronounced loss of subnucleosomal fragments compared to the standard protocol. We thus advise against using fixed material for ATAC-seq except in special circumstances where this is the only available option.
12. This is also a possible stopping point if necessary. DNA can be stored in PB buffer at -20 °C before proceeding with subsequent cleanup steps at a later time.
13. A clear nucleosomal signature in a TapeStation profile in almost all cases indicates a successful ATAC-seq experiment. The inverse is not always true: the absence of a clear signature does not necessarily indicate a failed experiment, as the fragment distribution can be dominated by the presence of large amounts of strongly accessible DNA in the original sample. For example, libraries with high levels of mitochondrial contamination often exhibit an obscured nucleosomal signature, and so do nearly all yeast libraries (in the latter case it is because yeast genomes contain a large number of ribosomal DNA copies, which are nearly nucleosome-free when being actively transcribed, and often comprise half or even more of yeast ATAC-seq libraries [13]). As discussed in Note 6, high levels of mitochondrial contamination are undesirable in terms of the efficient utilization of sequencing resources but do not necessarily result in poor-quality datasets in the nuclear genome.
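When custom indexing oligos are ordered (Note 2), the full primer sequences are obtained by placing each index between the constant flanking sequences given there. The Python sketch below shows one way to assemble and sanity-check such sequences. The constant flanks are copied from Note 2; the helper name `build_primer`, the validation rules, and the example index `ACGTACGT` are hypothetical. The sketch does not address index orientation (whether a given index should be entered as written or reverse-complemented for a particular sequencer or sample sheet), which must be checked separately.

```python
# Constant primer segments as given in Note 2 (5' to 3').
I7_PREFIX = "CAAGCAGAAGACGGCATACGAGAT"
I7_SUFFIX = "GTCTCGTGGGCTCGG"
I5_PREFIX = "AATGATACGGCGACCACCGAGATCTACAC"
I5_SUFFIX = "TCGTCGGCAGCGTC"


def build_primer(prefix: str, index: str, suffix: str, index_length: int = 8) -> str:
    """Insert an index sequence between the constant primer segments."""
    index = index.upper()
    if len(index) != index_length:
        raise ValueError(f"index must be {index_length} nt long")
    if set(index) - set("ACGT"):
        raise ValueError("index may only contain A, C, G, T")
    return prefix + index + suffix


# Hypothetical 8-nt index, used for both primers purely for illustration.
i7_primer = build_primer(I7_PREFIX, "ACGTACGT", I7_SUFFIX)
i5_primer = build_primer(I5_PREFIX, "ACGTACGT", I5_SUFFIX)
```

Generating the sequences programmatically mainly guards against copy-and-paste errors when many index combinations are ordered at once.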
Acknowledgements
The authors thank members of the Greenleaf and Kundaje labs for many helpful discussions. This work was supported by NIH grants UM1HG009436 and P50HG007735 (to W.J.G.). W.J.G. is a Chan Zuckerberg investigator. Z.S. is supported by EMBO Long-Term Fellowship EMBO ALTF 1119-2016 and by Human Frontier Science Program Long-Term Fellowship HFSP LT 000835/2017-L. G.K.M. was supported by the Stanford School of Medicine Dean’s Fellowship.
References
1. Luger K, Mäder AW, Richmond RK et al. (1997) Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251–260
2. Wu C (1980) The 5′ ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286(5776):854–860
3. Keene MA, Corces V, Lowenhaupt K et al. (1981) DNase I hypersensitive sites in Drosophila chromatin occur at the 5′ ends of regions of transcription. Proc Natl Acad Sci U S A 78:143–146
4. McGhee JD, Wood WI, Dolan M et al. (1981) A 200 base pair region at the 5′ end of the chicken adult β-globin gene is accessible to nuclease digestion. Cell 27:45–55
5. Dorschner MO, Hawrylycz M, Humbert R et al. (2004) High-throughput localization of functional elements by quantitative chromatin profiling. Nat Methods 1:219–225
6. Sabo PJ, Humbert R, Hawrylycz M et al. (2004) Genome-wide identification of DNaseI hypersensitive sites using active chromatin sequence libraries. Proc Natl Acad Sci U S A 101:4537–4542
7. Sabo PJ, Kuehn MS, Thurman R et al. (2006) Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. Nat Methods 3:511–518
8. Crawford GE, Holt IE, Whittle J et al. (2006) Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16:123–131
9. Boyle AP, Davis S, Shulha HP et al. (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322
10. Thurman RE, Rynes E, Humbert R et al. (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82
11. Kelly TK, Liu Y, Lay FD et al. (2012) Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22:2497–2506
12. Krebs AR, Imanci D, Hoerner L, Gaidatzis D et al. (2017) Genome-wide single-molecule footprinting reveals high RNA polymerase II turnover at paused promoters. Mol Cell 67:411–422.e4
13. Shipony Z, Marinov GK, Swaffer MP et al. (2018) Long-range single-molecule mapping of chromatin accessibility in eukaryotes. bioRxiv 504662
14. Wang Y, Wang A, Liu Z et al. (2019) Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res 29:1329–1342
15. Aughey GN, Estacio Gomez A, Thomson J et al. (2018) CATaDa reveals global remodelling of chromatin accessibility during stem cell differentiation in vivo. Elife 7:e32341
16. Chereji RV, Eriksson PR, Ocampo J, Clark DJ (2019) DNA accessibility is not the primary determinant of chromatin-mediated gene regulation. bioRxiv 639971
17. Ponnaluri VKC, Zhang G, Estève PO et al. (2017) NicE-seq: high resolution open chromatin profiling. Genome Biol 18(1):122
18. Umeyama T, Ito T (2017) DMS-seq for in vivo genome-wide mapping of protein-DNA interactions and nucleosome centers. Cell Rep 21:289–300
19. Timms RT, Tchasovnikarova IA, Lehner PJ (2019) Differential viral accessibility (DIVA) identifies alterations in chromatin architecture through large-scale mapping of lentiviral integration sites. Nat Protoc 14:153–170
20. Buenrostro JD, Giresi PG, Zaba LC et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
21. Buenrostro JD, Wu B, Litzenburger UM et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490
22. Cusanovich DA, Daza R, Adey A et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348:910–914
23. Corces MR, Trevino AE, Hamilton EG et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
24. Corces MR, Buenrostro JD, Wu B et al. (2016) Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution. Nat Genet 48:1193–1203
25. Picelli S, Björklund AK, Reinius B et al. (2014) Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24:2033–2040
26. Lu Z, Hofmeister BT, Vollmers C et al. (2017) Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res 45:e41
27. Maher KA, Bajic M, Kajala K et al. (2018) Profiling of accessible chromatin regions across multiple plant species and cell types reveals common gene regulatory principles and new control modules. Plant Cell 30:15–36
28. Bajic M, Maher KA, Deal RB (2018) Identification of open chromatin regions in plant genomes using ATAC-seq. Methods Mol Biol 1675:183–201
29. Deal RB, Henikoff S (2010) A simple method for gene expression and chromatin profiling of individual cell types within a tissue. Dev Cell 18:1030–1040
30. Daugherty AC, Yeo RW, Buenrostro JD et al. (2017) Chromatin accessibility dynamics reveal novel functional enhancers in C. elegans. Genome Res 27:2096–2107
31. Schep AN, Buenrostro JD, Denny SK et al. (2015) Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions. Genome Res 25:1757–1770
32. Cusanovich DA, Reddington JP, Garfield DA et al. (2018) The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555:538–542
33. Cao J, Cusanovich DA, Ramani V et al. (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361:1380–1385
34. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
Chapter 2
Mapping Nucleosome Location Using FS-Seq
Barry Milavetz, Brenna Hanson, Kincaid Rowbotham, and Jacob Haugen
Abstract
The organization of nucleosomes in eukaryotic chromatin is thought to play a critical role in the regulation of the biological function of the chromatin. Because of this potential role in regulation, a number of techniques have been developed, which combine chromatin fragmentation around nucleosomes with next-generation sequencing to map the location of nucleosomes in chromatin. In this section, a procedure using a kit from New England Biolabs (NEB NEXT Ultra II FS DNA library prep Kit) to fragment chromatin in preparation for next-generation sequencing is described and compared to other available procedures for mapping nucleosome location.
Key words NGS, Nucleosomes, Chromatin, Sequencing, Phasing
1 Introduction
It has been known for many years that eukaryotic DNA is found within the nucleus of a cell organized with histones to form chromatin. The basic building block of chromatin is the nucleosome, which consists of approximately 145 base pairs of DNA wrapped around a histone octamer core containing two copies each of histone H2A, H2B, H3, and H4. As shown in Fig. 1 for the eukaryotic virus Simian Virus 40 (SV40), the nucleosomes typically appear as “beads on a string” in chromatin. Figure 1 also shows a short region of DNA, indicated by an arrow, that appears to lack at least one nucleosome. This region of “naked” DNA is found in the SV40 regulatory region [1–3] and for obvious reasons it has been referred to as a “nucleosome-free region” (NFR). The presence of a specialized chromatin structure, such as the NFR found in SV40 chromatin, which is characterized by specific nucleosome location and/or histone modifications, appears to be a general characteristic of genes that are poised for transcription or are actively transcribing [4].
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_2, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
Fig. 1 SV40 minichromosome showing the “beads on a string” structure of chromatin. The arrow indicates the location of a nucleosome-free region. SV40 chromosomes were prepared and analyzed by electron microscopy [20]
Initially, the presence of modified chromatin structure in regulatory regions of cellular genes was determined by differences in susceptibility to nucleases such as DNase I [5], since naked DNA would be expected to digest more quickly than DNA present in nucleosomes. While nuclease sensitivity has been a valuable tool to map the location of putative regulatory regions, it is a relatively low-resolution tool. However, with the development of robust next-generation sequencing (NGS) techniques, it is now possible to directly map the location of nucleosomes in chromatin to determine whether there is an NFR generated or whether the initiation of transcription results in other changes in nucleosome location.
The NGS workflow in Fig. 2 consists of four steps: fragmentation of chromatin into nucleosomes, preparation of DNA libraries from the DNA in the nucleosome-sized chromatin fragments, NGS sequencing of the libraries, and bioinformatic analysis of the sequencing data to determine the location of the DNA fragments present in the library. One of the keys to the NGS workflow is the procedure used to fragment the DNA into nucleosomes. Originally, chromatin was fragmented by micrococcal nuclease and the strategy was referred to as MN-Seq [6]. Fragmentation of chromatin by micrococcal nuclease is based on the fact that micrococcal nuclease is a double-strand-specific endonuclease that would be expected to cleave DNA in the linker region of chromatin without digesting the DNA present in a nucleosome [6]. While this is generally true, the procedure has two major disadvantages. First, the specificity of micrococcal nuclease for linker region DNA is relative and because of this, it is necessary to titrate the amount of nuclease, temperature of digestion, and length of digestion to optimize the generation of
[Plot residue: reads (0–3.5) versus nucleotide number (1–5069).]
Fig. 2 Workflow for mapping nucleosomes using chromatin fragmentation and next-generation sequencing. Blue circle = nucleosome; blue rectangle = adapter 1; gold rectangle = adapter 2
nucleosome-sized fragments [6]. Second, for reasons that are not completely understood, some nucleosomal DNA is much more sensitive to digestion than other nucleosomal DNA [6]. A second strategy for mapping nucleosomes is known as “Assay for Transposase Accessible Chromatin” (ATAC)-Seq. ATAC-Seq uses a transposase (Tagment from Illumina) in vitro to fragment the chromatin by targeting open regions of the chromatin [7]. With this procedure the chromatin is fragmented in linker regions and libraries are prepared in one step, since the transposase introduces the linkers needed for library amplification and sequencing. The major disadvantage of the ATAC-Seq procedure is that transposition is very sensitive to higher order chromatin structure and because of this, the nucleosomes generated tend to be located in the regulatory region of active genes [7]. This result has led to ATAC-Seq being used as an assay for open chromatin [7].
Chromatin Immunoprecipitation sequencing (ChIP-Seq) has also been used to map nucleosomes but primarily for those nucleosomes containing a specific form of histone modification [8]. In this procedure, an antibody is used to target an epitope on a histone, the chromatin is fragmented, typically by sonication, and the fragments containing the epitope are separated from other fragments. Libraries are then prepared from the epitope-containing fragments and sequenced. The major disadvantage with this procedure is that it is specific to the antibody being used. Frequently, this results in only a small number of nucleosomes being mapped in a particular target chromatin.
We have recently described a fourth method for mapping nucleosomes in viral chromatin based on a kit (New England Biolabs FS) that utilizes a proprietary procedure for fragmenting chromatin [9]. We have used this procedure to map the location of nucleosomes in SV40 chromatin and found that the procedure (FS-Seq) yields nucleosome maps similar to but not identical to the maps that we obtain using MN-Seq, ATAC-Seq, or ChIP-Seq. In this chapter, we describe the procedures that we have used for mapping nucleosomes by FS-Seq.
1.1 Basic Protocol for Preparing Sequencing Libraries from Chromatin Fragmented Using the FS Kit
This protocol describes procedures for fragmenting SV40 chromatin using the proprietary reagents in the New England Biolabs NEXT Ultra II FS DNA library prep Kit and for preparing DNA sequencing libraries from the fragmented chromatin using the New England Biolabs NEXT Ultra II DNA library prep Kit and E7335S Multiplex oligos for Illumina. The protocol includes a procedure using submerged agarose gel electrophoresis to select and purify the subset of the library members that contain insert fragments of SV40 DNA sized from approximately 60–200 base pairs.
2 Materials
1. NEB library kit NEXT Ultra II FS DNA library prep Kit.
2. NEB library kit NEXT Ultra II DNA library prep Kit and Multiplex oligos for Illumina (e.g., E7335S).
3. AMPure XP (Beckman Coulter, #A63880).
4. Ethanol (Sigma-Aldrich, #459844).
5. Nuclease-free water (Ambion, #AM9937).
6. Certified low-Melt Agarose (Bio-Rad, #1613101).
7. Certified low-Melt Agarose (Bio-Rad, #1613112).
8. Agarose (Sigma-Aldrich A6877).
9. DNA Size Markers.
10. SsoAdvanced Universal SYBR Green Supermix (Bio-Rad, #172-5274).
11. Primers to target genomic DNA.
12. Agarose gel sample buffer (see Subheading 3.4).
13. GelGreen Nucleic Acid Stain (EmbiTec, EC-1995).
14. Zymoclean Gel DNA Recovery Kit (Zymo Research, #D400).
15. Monarch PCR & DNA Cleanup Kit (New England BioLabs #T1030S).
16. MiSeq Reagent Kit v3 (150 cycle).
17. 10 μL Graduated Filter Tips.
18. 2 μL Pipetman.
19. 10 μL Pipetman.
20. 200 μL Graduated Filter Tips.
21. 200 μL Pipetman.
22. 1000 μL Filter Tips.
23. 1000 μL Pipetman.
24. 200 μL thin-wall PCR Tubes (VWR, #20170-012).
25. Eppendorf MasterCycler Personal PCR.
26. BioRad CFX Connect real-time PCR system.
27. Eppendorf snap-cap microcentrifuge flex tubes (Fisher Scientific, #022264111).
28. Minicentrifuge LabDoctor 12 (MidSci).
29. Savant DNA 120 SpeedVac concentrator.
30. Power supply Bio-Rad Power Pac 3000 at 125 constant volts.
31. Submerged Agarose Gel Apparatus.
32. EmbiTec PrepOne Sapphire.
3 Methods
3.1 Fragmentation of SV40 Chromatin Using the New England Biolabs NEXT Ultra II FS DNA Library Prep Kit
1. We have previously described in detail the procedures that we use to prepare various forms of SV40 chromatin including functionally active minichromosomes and the chromatin from virus particles [10]. We have used similar procedures to successfully prepare chromatin for analysis from other small DNA viruses including bovine papillomavirus (BPV) and human papillomavirus (HPV).
2. In preparation for the fragmentation procedure, we first set the block on the Eppendorf MasterCycler Personal PCR to 4 °C and cooled the appropriate number of the thin-walled PCR tubes for 10 min in the block. Typically, three or four samples including controls are fragmented at the same time. When working with a new source of chromatin, we would do three
samples at the same time: the first would be the input chromatin alone, the second input chromatin with the supplied reaction buffer, and the third input chromatin, reaction buffer, and enzyme.
3. We also set up a Monarch column for each sample and labeled the columns and flex tubes to receive the purified DNA following column purification of the fragmented chromatin.
4. Each PCR tube in a set then received 13 μL of SV40 chromatin using the 200 μL Pipetman. The buffer control and enzyme tubes then received 3.5 μL of the reaction buffer using the 10 μL Pipetman, and the enzyme tube received an additional 1 μL of the enzyme mix from the FS kit using the 2 μL Pipetman. Finally, using the 10 μL Pipetman set to 10 μL, the liquid in each tube was mixed by drawing the liquid into the 10 μL tip and then forcing it back into the PCR tube.
5. Fragmentation was carried out for varying times and temperatures to optimize the generation of fragments. Typically, fragmentation was done for 5–15 min at either 4 °C or 37 °C. The lower temperature was tested first since at 4 °C nucleosomes would not be expected to slide appreciably and the results would be expected to most closely resemble the original organization of nucleosomes in the chromatin. However, if only limited fragmentation occurred, we then tested the higher temperature.
6. Following fragmentation, the reaction in each PCR tube was stopped by the addition of 100 μL of binding buffer (Monarch kit) using the 200 μL Pipetman. The binding buffer and sample in each tube were mixed as above using the 200 μL Pipetman, and then added to the Monarch column. The DNA was purified according to the protocol and reagents supplied in the Monarch kit. The purified bound DNA was eluted with 13 μL of nuclease-free water (Ambion) and stored at -20 °C until used in subsequent steps.
7. The extent of fragmentation in each of the samples was determined by qPCR. The assay is based on the idea that PCR will only amplify a particular region of DNA if the DNA is intact. By comparing the amount of DNA amplification product from the untreated sample to the amount of product following addition of reaction buffer alone or of reaction buffer plus enzyme mix, it is possible to determine the extent of fragmentation that occurred in the buffer alone or with the mixture of buffer and enzymes at each of the fragmentation conditions. In order to analyze SV40 chromatin, we have a number of sets of primers, all of which yield amplification products between 200 and 400 base pairs in size. The primer sets recognize various regions of the SV40 genome including the regulatory region,
coding region, and between the coding regions. We will sometimes compare the extent of fragmentation in the regulatory region to other genomic sites to determine if the regulatory region is hypersensitive to fragmentation in certain forms of SV40 chromatin. Routinely, we use the set of primers that are found between the two coding regions, 5′ AAAATGAAGATGGTGGGGAGAAGAA 3′ and 5′ GACTCGAGGTGAAATTTGTGATGCT 3′, which amplify a fragment approximately 250 base pairs in size. We prepare a master PCR mix containing 10 μL of 2X Bio-Rad amplification buffer (SsoAdvanced Universal SYBR Green Supermix), 0.2 μL of each primer (100 μM), and nuclease-free water to 20 μL total volume per sample to be analyzed. A 1 μL aliquot of the Monarch-purified sample DNA is added to a 20 μL volume of the PCR mix and the tubes are placed in the Bio-Rad CFX Connect real-time PCR system. The mixture in each PCR tube is preheated in the PCR machine for 10 min to activate the DNA polymerase and then the DNA is amplified for 35 cycles at 95 °C for 1 min to denature the DNA, 54 °C for 1 min to anneal the primers, and 72 °C for 1 min for DNA extension.
8. Following the completion of the PCR amplification, the cycle threshold for the untreated sample is compared to the corresponding cycle threshold for the sample containing only the reaction buffer and the sample containing the reaction buffer and enzyme mix. If typical fragmentation is occurring with the FS kit, we would expect to observe from no change to a one-cycle increase in threshold for the sample containing only buffer and a two- to four-cycle increase in threshold for the sample containing buffer and enzyme mix. The changes in cycle threshold for the sample containing the buffer and enzyme in the example above would indicate that approximately 75–90% of the input chromatin has been fragmented.
3.2 Preparation of Sequencing Libraries from FS Fragmented DNA Using the New England Biolabs NEXT Ultra II DNA Library Prep Kit
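As a worked illustration of the cycle-threshold comparison in step 8 of Subheading 3.1, the shift in cycle threshold can be converted into an approximate fraction of fragmented template. The sketch below assumes ideal amplification, in which the product doubles every cycle. That efficiency value and the function name `fraction_fragmented` are assumptions introduced here for illustration; they are not part of the FS kit or of the qPCR instrument software.

```python
def fraction_fragmented(ct_untreated, ct_treated, efficiency=2.0):
    """Estimate the fraction of template that no longer amplifies.

    Assumes (hypothetically) that product increases `efficiency`-fold per
    cycle, so a delay of d cycles means 1 / efficiency**d of the intact
    template remains.
    """
    delta_ct = ct_treated - ct_untreated
    if delta_ct <= 0:
        return 0.0  # no measurable loss of intact template
    remaining = efficiency ** (-delta_ct)
    return 1.0 - remaining


# Shifts of two and four cycles, the range quoted for buffer plus enzyme
for shift in (2, 4):
    print(shift, round(fraction_fragmented(20.0, 20.0 + shift), 4))
```

Under this assumption, shifts of two and four cycles correspond to a loss of 75% and about 94% of the intact template, which is in line with the approximate range quoted in step 8. Real primer efficiencies are usually somewhat below perfect doubling, so the values should be read as estimates.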
1. The fragmented DNA obtained using the FS kit was then used for the preparation of sequencing libraries using an NEB NEXT Ultra II DNA Library Prep Kit designed for sequencing on an Illumina sequencing platform. All biochemical manipulations associated with the preparation of libraries with this kit were performed in a BSL-II hood. An Eppendorf MasterCycler Personal PCR located in the hood was set to a block temperature of 4 °C and a lid temperature of 65 °C. With the heated lid up, sterile thin-walled PCR tubes being used for library preparation were placed in the 4 °C block of the cycler and cooled for at least 10 min. For most library preparations, we generated eight libraries at one time. At the same time that the cycler was being set up, we added 11 μL of adapter dilution buffer (NEB) to another PCR tube using the 10 μL Pipetman and placed this tube in a -20 °C freezer for later use.
2. Libraries were prepared according to the protocol supplied with the NEXT Ultra II DNA Library Prep Kit with minor modifications. In the first step of the protocol, the fragmented DNA obtained from chromatin was end-repaired using an enzyme mix supplied with the kit. The purified DNA from the fragmentation step was diluted to 25 μL total volume with nuclease-free water using the 200 μL Pipetman and added to a thin-walled PCR tube precooled in the cycler block. To each PCR tube, we then added 3.5 μL of repair buffer with the 10 μL Pipetman and 1.5 μL of repair enzymes using the 2 μL Pipetman, both of which were supplied in the kit. The temperature of the cycler block was raised to 20 °C and the DNA and reagents were incubated for 30 min. The temperature was then raised to 65 °C for 30 min to inactivate the enzymes, with the lid closed to keep the temperature even throughout the tubes, followed by opening the lid and cooling the block to 4 °C.
3. The cooled PCR tubes were centrifuged in the LabDoctor 12 minicentrifuge in the hood to ensure that all the liquid in each tube was located at the bottom of the tubes to maintain the proper concentration of reactants. The tubes were returned to the block in the cycler at 4 °C. The next step in the protocol from the kit was the ligation of adapters onto the ends of the end-repaired DNA present in each of the tubes.
4. Using the 2 μL Pipetman, 0.5 μL of adapter was added to the thawed 11 μL of adapter dilution buffer placed into the cycler block and thoroughly mixed. Next, using the 10 μL Pipetman, 2.5 μL of adapter was added to each sample tube, followed by 1 μL of enhancer, again using the 2 μL Pipetman. Finally, using the 200 μL Pipetman, 15 μL of ligation mix from the kit was added with thorough mixing to each tube, and tubes were incubated in the block for 15 min at 20 °C.
5. The tubes in the block were cooled to 4 °C for the addition of the USER enzyme.
Using the 2 μL Pipetman, 1.5 μL of USER (from the NEB primer kit) was added to each tube, the tubes mixed using the 10 μL Pipetman set to 10 μL, and incubated at 37 °C for 15 min. At the end of this incubation, the tubes were cooled to 4 °C, centrifuged in the LabDoctor 12 minicentrifuge, and stored in a -20 °C freezer before purification.
6. The libraries were then purified through a series of steps including column purification, submerged agarose gel size selection, and AMPure purification prior to sequencing. The libraries were column purified using the Monarch kits in part to concentrate the libraries. The frozen libraries were thawed to room temperature in a BSL-II hood, 250 μL of binding buffer (Monarch kit) was added using a 1 mL Pipetman with mixing, and the library in binding buffer was added to the column. The
sample was centrifuged for 1 min in the LabDoctor 12 minicentrifuge and then the bound DNA was washed with 194 μL of wash buffer (from the kit). The library DNA was eluted with 25 μL of nuclease-free water (200 μL Pipetman) and collected in a sterile flex tube. The eluted libraries were then dried using a Savant DNA 120 SpeedVac concentrator set to heat off and slow dry. This typically took approximately 40 min.
7. Each library was size-selected on a mixture of low-melting temperature agarose and standard agarose in order to maximize the number of library members that had the correct size inserts. For a 50 mL total gel volume, which was used in our gel apparatus, 0.9 g of low-melt agarose (Certified Low-Melt Agarose, Bio-Rad, #1613112) and 0.1 g of regular agarose (Certified Molecular Biology Agarose, Bio-Rad, #1613101) were added to a 100 mL bottle. For each analysis, 350 mL of running buffer was prepared in a 500 mL bottle by diluting a 50X stock buffer (see Subheading 3.4) kept at 4 °C. A total of 50 mL of the running buffer was added to the agarose in the bottle and the mixture was heated in a microwave until all of the agarose dissolved. When completely dissolved, 2 μL of GelGreen Nucleic Acid Stain was added to the agarose and the gel was then poured into the gel apparatus.
8. When the agarose was solidified (typically approximately 1 h at room temperature), the gel was covered in running buffer and the sample wells were loaded. Typically, a gel contained 10 sample wells and was loaded with size markers in lane 2, lane 6, and lane 10. As size markers, we use PCR amplification products approximately 120 and 300 base pairs in size. The libraries to be size-selected were loaded into lanes 4 and 8 by adding 10 μL of sample buffer to the dried libraries to resuspend the DNA and then placing the liquid in the sample well.
The samples and size markers were electrophoresed for approximately 1 h and 15 min with the voltage set to 125 volts or until the blue dye in the sample buffer was at the end of the gel.
9. The gel was removed from the electrophoresis apparatus and placed onto the EmbiTec PrepOne Sapphire. The Sapphire was turned on to illuminate the gel and the center of each band in the marker lanes was sliced with a blade. The Sapphire was turned off and the gel transferred to paper towels for cutting out of the libraries. Using the slices in the DNA marker lanes as guides, the portion of the gel from the library lanes that contained the same-sized DNA was cut out. The portion of the gel containing the correct-sized DNA was sliced into small pieces and added to 800 μL of binding buffer from the Zymoclean Gel DNA Recovery Kit in a flex tube. The tube was shaken repeatedly until all of the agarose gel had dissolved.
10. The dissolved library was added to the Gel DNA recovery column and purified according to the protocol supplied by the kit. Following the required washes, the library DNA was eluted in 25 μL of nuclease-free water. The eluted library was dried as above in the Savant DNA 120 SpeedVac concentrator.
11. The dried library was resuspended in 5 μL of nuclease-free water in preparation for PCR amplification with appropriate primers. Libraries were amplified in a total volume of 160 μL of amplification buffer. The buffer was prepared by adding 80 μL of 2X SsoAdvanced Universal SYBR Green Supermix using a 200 μL Pipetman, 1.6 μL of universal primer (NEB Multiplex oligos for Illumina) using a 2 μL Pipetman, 1.6 μL of an indexed primer (NEB Multiplex oligos for Illumina) using a 2 μL Pipetman, and 80 μL of nuclease-free water using a 200 μL Pipetman.
12. Following thorough mixing of the amplification buffer, a 10 μL aliquot was transferred to a PCR tube with the 200 μL Pipetman to be used as a non-DNA control. A total of 2.5 μL of the library DNA was added to the remaining 150 μL of amplification buffer using the 10 μL Pipetman and the DNA was thoroughly mixed in the buffer. A 10 μL aliquot was removed with the 200 μL Pipetman and placed into a PCR tube. The remaining 140 μL of amplification buffer was stored in a freezer at -20 °C until needed. The non-DNA control and library DNA PCR tubes were then placed in a Bio-Rad CFX Connect real-time PCR system and amplified using 1 min cycles of 60 °C, 72 °C, and 95 °C. Following amplification, the peak for the amplification containing the library DNA was determined from the cycle threshold data generated, and the remaining 140 μL was divided into four aliquots of approximately 35 μL each using a 200 μL Pipetman and then amplified to the empirically determined cycle threshold.
13. Following amplification, the amplified library DNA was purified by AMPure. All manipulation of the amplified libraries was performed in a BSL-II hood.
The amplification reactions in the four tubes were combined and 95 μL of AMPure was added to the tube using a 200 μL Pipetman, the contents mixed thoroughly and then transferred to a flex tube. The combined contents were incubated at room temperature for 10 min to allow the library DNA to bind to the AMPure beads.
14. Following the incubation, the tube was centrifuged to ensure that the contents were all at the bottom of the tube, and the tube was placed into a magnetic stand to separate the magnetic beads with bound DNA from the DNA-depleted liquid. The stand was placed on its side while the magnetic beads were bound so that the bound beads would be located
approximately halfway up the tube and not at the bottom. This was done so that when the liquid was removed there was less chance that the beads would be dislodged. After a 10-min incubation to allow the beads to be separated from the liquid, the stand was placed upright in order to allow the liquid to collect at the bottom of the tube. The DNA-depleted liquid was removed using a 200 μL Pipetman set to 200 μL, being careful not to dislodge any of the magnetic beads.
15. The beads on the side of the tube were washed twice with 400 μL of a wash solution consisting of 80% ethanol and 20% nuclease-free water, which was prepared right before use using a 1 mL Pipetman. Following the removal of the second wash, the beads were air-dried for 10 min in the hood.
16. The tube containing the magnetic beads was removed from the magnetic rack and 16 μL of nuclease-free water was added using a 10 μL Pipetman set to 8 μL. The beads and water were thoroughly mixed by vortexing and incubated for 10 min at room temperature. Following the incubation, the mixture was centrifuged in a LabDoctor 12 minicentrifuge for 10 s to force the beads and liquid to the bottom of the flex tube. The tube was then placed back into the magnetic stand and incubated for an additional 5 min to allow the beads to bind to the side of the bottom of the tube and separate from the nuclease-free water that contains the eluted library DNA.
17. A 12 μL aliquot of the nuclease-free water containing the library DNA was very carefully removed from the tube using a 10 μL Pipetman set to 6 μL and placed in a new sterile flex tube. This aliquot is stored in the freezer at -20 °C and would be submitted for DNA sequencing if it meets our quality control. The remaining aliquot of the library (4 μL) is also stored in the freezer and eventually analyzed by submerged agarose gel electrophoresis to determine the size and amount of nucleosome-sized DNA in the library.
18. The quality of the amplified libraries was determined using submerged agarose gel electrophoresis. In preparation for analysis of the library DNA, we prepared running buffer and an agarose gel. The running buffer was prepared by adding 7 mL of a 50X stock TAE buffer to a 500 mL bottle and adding 350 mL of distilled purified water. To identify the location of DNA in the gel, 17.5 μL of ethidium bromide was added to the running buffer using a 200 μL Pipetman. In a 100 mL bottle, we added 1.4 g agarose (Sigma-Aldrich) and 50 mL of the running buffer and heated the mixture in the microwave to dissolve the agarose. When the agarose was completely dissolved, we added 2.5 μL of the ethidium bromide solution, swirled the agarose, and poured it into the gel apparatus.
19. When the gel has cooled and hardened, it is covered with running buffer. The library sample is suspended in 10 μL of sample buffer using a 10 μL Pipetman and added to a sample well. In a well adjacent to the library sample, we add a DNA marker and subject the samples to submerged electrophoresis for approximately 1 h and 15 min with the voltage set to 125 volts. The gel is removed and the DNA present in the gel visualized on a LI-COR Odyssey Fc. A high-quality library would be expected to show only a fairly tight band around the size of a nucleosome with added adapters at approximately 250 base pairs in size.
20. Libraries that are of sufficient quality are then used for DNA sequencing.
21. Libraries are sequenced on an Illumina MiSeq using a MiSeq Reagent Kit v3 (150 cycle) in the sequencing core at the University of North Dakota. Typically, 20–25 individual libraries are sequenced at the same time. Because of the small size of the SV40 genome, this usually results in enough reads per library to adequately cover the genome.
3.3 Bioinformatic Analyses
1. Following sequencing of libraries, the data files generated are analyzed using standard bioinformatics software. First, the FASTQ files generated by sequencing are subjected to an initial quality control analysis using FastQC v0.11.2 [11]. Second, the adapters attached to the ends of the insert DNA during the preparation of the libraries were removed using Scythe v0.98 [12]. Third, quality trimming was performed using Sickle v1.33 [13], and reads with a Phred score of less than 30 and reads smaller than 45 base pairs were discarded. Fourth, the reads corresponding to African green monkey (Chlorocebus sabaeus 1.1) and human (hg19) sequences were removed following alignment to their respective genomes. While we continue to do this, we have found that it has little effect on the actual virus reads. Fifth, the reads remaining in the FASTQ files after these treatments were aligned using Bowtie2 v2.2.4 [14] to the SV40 genome (RefSeq ACC: NC_001669.1) cut between nucleotides 2666 and 2667. Cutting the genome was necessary to display the data as a linear map because the SV40 genome is normally found as a circle. Sixth, duplicate reads were removed using the Picard Tools (Broad) MarkDuplicates function. Seventh, BAM files were generated from each biological library replicate using an awk script, with filtering for specific size ranges of the DNA. Nucleosome-sized DNA was identified using filtered reads from 100 to 150 base pairs in size, while potential transcription factor binding sites were identified using reads filtered between 60 and 99 base pairs in size.
2. The BAM files generated are displayed for comparison purposes as merged heatmaps. Typically, a minimum of four libraries generated from different biological replicates are sequenced to generate each heatmap. The individual BAM files are normalized first so that all are weighted equally using Samtools v1.3.1 [15] and then merged using the R programming language. Bedgraphs are normalized to 1X coverage from filtered deduplicated reads using DeepTools v2.5.4 [16]. Finally, heatmaps were generated from the Z-scores of the normalized coverage and displayed using IGV v2.3.52 [17].
3.4 Reagents and Solutions
Ethidium bromide stain (0.5 μg/μL): 50 mg ethidium bromide, add water to 100 mL.
TAE stock (50X): 242 g of Tris base dissolved in 750 mL water. Add 57.1 mL glacial acetic acid and 100 mL EDTA. Adjust final volume to 1 L. Bring the pH to 8.5.
TAE electrophoresis running buffer: to 20 mL TAE (50X) stock buffer, add 980 mL water.
Agarose gel sample buffer: 6.0 mL 10% SDS, 2.0 mL of 0.1 M EDTA, 50 mL glycerol 1% Coomassie blue, and water to 100 mL.
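The dilutions listed above follow the usual relation C1 × V1 = C2 × V2. The short sketch below works through that arithmetic for the 50X TAE stock. The function names `dilution_volumes` and `final_concentration` are hypothetical helpers introduced here for illustration and are not part of any kit or published script.

```python
def dilution_volumes(stock_x, final_x, final_volume_ml):
    """Return (stock_ml, water_ml) needed to dilute a concentrated stock.

    Uses C1 * V1 = C2 * V2, with concentrations expressed as fold ("X").
    """
    if final_x <= 0 or stock_x < final_x:
        raise ValueError("stock must be at least as concentrated as the target")
    stock_ml = final_x * final_volume_ml / stock_x
    return stock_ml, final_volume_ml - stock_ml


def final_concentration(stock_x, stock_ml, water_ml):
    """Return the fold concentration after mixing stock with water."""
    return stock_x * stock_ml / (stock_ml + water_ml)


# 1 L of 1X running buffer from the 50X stock
print(dilution_volumes(50, 1, 1000))
# 7 mL of 50X stock added to 350 mL of water, as in Subheading 3.2
print(round(final_concentration(50, 7, 350), 2))
```

The first call reproduces the 20 mL of stock plus 980 mL of water given above. The second shows that adding 7 mL of stock to 350 mL of water gives a buffer of about 0.98X, which is close to 1X for practical purposes.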
4 Notes
Comparison of FS-Seq to other methods for mapping nucleosomes. A comparison between the mapping results obtained from FS-Seq and other mapping techniques for the SV40 chromatin found in virus particles is shown in Fig. 3.
Fig. 3 Mapping nucleosomes using FS-Seq, ATAC-Seq, MN-Seq, and ChIP-Seq. The chromatin was obtained from SV40 virus particles and analyzed by each of the different procedures as previously described [9]. ChIP-Seq is shown using antibodies targeting H3K9me3 and H4K20me1 in nucleosomes
The figure compares the location of nucleosomes using FS-Seq to ATAC-Seq, MN-Seq, and ChIP-Seq using antibodies that recognize nucleosomes containing H3K9me3 and H4K20me1. It is apparent that many of the brighter yellow bands, which represent favored nucleosome locations in the chromatin, are present in a number of mapping techniques. For example, a very bright band is located in the enhancer using the FS, ATAC, and ChIP-Seq (with antibody to H3K9me3) procedures. Similarly, many of the less bright bands also appear to be present in the maps obtained by more than one technique. We include the results of ChIP-Seq using antibodies to H3K9me3 and H4K20me1 to demonstrate that the pattern of nucleosomes obtained with FS-Seq most likely represents the nucleosome locations in the major form of SV40 chromatin that was present in the virus particles. The FS pattern closely resembles the ChIP-Seq result with H3K9me3 antibody, a modification that is present in a significant fraction of viral chromatin [18], and resembles less closely the result with H4K20me1, which appears to be present in only a small amount of the chromatin. The figure also shows that each mapping technique has preferred sites of action. As is well known, ATAC-Seq targets open chromatin, which typically is found in the regulatory region of active genes [7]. In marked contrast, MN-Seq can under-represent nucleosomes located in regulatory regions if digestion occurs at higher temperatures or for longer periods of time [6]. This has led to the suggestion that nucleosomes in certain regulatory regions are “fragile” due to histone modifications or associated factors that allow them to be targeted by MN-Seq more efficiently than the rest of the nucleosomes in a gene [6]. Of course, ChIP-Seq results will vary with the antibody used, since the nucleosomes containing the antibody target are likely to be present at preferred sites in the genome.
Based on our experience with viral chromatin, FS-Seq appears to generate maps that are most similar to ATAC-Seq, with more nucleosomes present, or ChIP-Seq for target histone modifications that are present more or less throughout the genome. The FS-Seq procedure is relatively rapid and simple. There are a minimal number of manipulations and the overall procedure can be completed over a 30 min to 1 h timeframe depending upon the fragmentation time. The FS kit is designed as a one-step procedure for fragmenting DNA and preparing libraries. The kit accomplishes this by inactivating the fragmenting enzymes during an incubation at 65 °C for 30 min. Following inactivation, adapters are ligated to the fragmented DNA as in a usual NEXT Ultra procedure. We chose not to do this because we were concerned that heating the chromatin–enzyme mixture to 65 °C for 30 min would be likely to result in over-fragmentation of the viral chromatin. We have not used FS-Seq to map the location of nucleosomes in cellular chromatin. However, we believe that a workflow similar
to that used for ATAC-Seq would likely work with FS-Seq for this purpose as well. This workflow would consist of preparing nuclei from cells followed by resuspension of the nuclei in reaction buffer and addition of fragmentation enzymes to allow for fragmentation. Fragmentation would be assayed by qPCR measurement of the amount of a target gene or genes that is found in the buffer following incubation at different temperatures or times. Since FS-Seq is enzyme-based like ATAC-Seq and the two procedures yield similar results with viral chromatin, it seems likely that it would also tend to target open chromatin and might be an alternative strategy for analyzing open chromatin. Specific considerations when preparing fragmented chromatin for FS-Seq. In working with viral chromatin we always determine the relative amount of DNA present in a sample using qPCR. In order to ultimately obtain useful sequencing data following FS-Seq, we have experimentally determined that for a genome that is SV40 in size (5243 base pairs), we need the input chromatin to have a cycle threshold of less than 20 cycles. This is due to the fact that the fragmentation by FS typically results in a shift of the cycle threshold to around 25 cycles. As noted below as long as the amount of DNA in the fragmented samples is in this range, useful libraries can be prepared. With larger genomes it is likely that in order to obtain sufficient coverage of the genome, more input chromatin will probably be needed. With SV40 used this way, we can obtain anywhere from around 500 reads per library sample to 5000 reads. Since there are only about 24 nucleosomes in SV40, this is sufficient coverage. Based on the relatively large numbers of samples analyzed, we have noted that occasionally we will have a sample of disrupted virus that does not appear to fragment very well. 
At this time we do not know why this is the case, because we have successfully fragmented a number of other samples of chromatin from disrupted virus. We believe that this may be due to the presence of inhibitors remaining with the chromatin. For example, we use a high concentration of dithiothreitol to disrupt the virus particles and this may be the reason for the problem. We have not noted this issue with other samples that were not prepared in the presence of dithiothreitol. We are presently investigating whether the dithiothreitol is responsible for this inhibition and, if so, whether there are alternative ways to purify the chromatin from disrupted virus to prevent this inhibition. This observation shows the importance of at least initially using the two controls listed, chromatin alone and chromatin with buffer, since with qPCR of the samples it is possible to quickly determine the extent of chromatin fragmentation. We have also observed, with some samples but not all,
approximately a 1-cycle reduction in the sample that contains only added buffer. We believe that this is most likely due to the activation of endogenous nucleases that are present in the biological preparations. This is somewhat variable and appears to depend on a number of parameters including the time point in SV40 infection and the number of passages that the cells used for SV40 infections have undergone. Typically, in these situations we would still observe a 2- to 3-cycle increase in the cycle threshold number in the sample containing buffer and enzyme compared to buffer alone. For our fragmentation studies, we have usually tried to use the lowest incubation temperature possible to try to limit any natural movement of nucleosomes since nucleosome movement might occur at higher temperatures. However, we have not studied this closely and do not know if it is an important consideration. Specific considerations when preparing libraries following FS fragmented chromatin. We have found that as long as the cycle threshold of the fragmented chromatin is less than or equal to approximately 25 cycles, we can obtain high-quality libraries. We judge this by analyzing the quality of the library as described in the methods section by agarose gel electrophoresis and looking for a single relatively sharp band in the region of the gel where we would expect to find nucleosome-sized DNA fragments with attached adapters. When fragmenting a new form of chromatin (such as a new virus), we generally will amplify the prepared library after column purification and analyze it on a gel similarly. In this case we are looking for a smear of amplification products from the correct size to a size significantly larger but with a clear increase in the number of products at the correct size. If the products are all larger than the correct size, it indicates that fragmentation was not done for long enough or at a high enough temperature.
We would redo the fragmentation taking this into account and adjust the fragmentation conditions accordingly. When the cycle threshold is greater than 25 cycles, we find that the libraries sometimes meet our quality requirements but in some cases do not. We have observed that if a library does not show a single broad band of the correct size, but instead shows multiple bands or a smear of library elements, it is unlikely to yield good sequencing data. For those libraries we have found it better to redo the fragmentation and library preparation with more or a different sample instead of sequencing the available library. We have found that the purification of the proper sized fragments by submerged gel electrophoresis is an important step. Without this step the fraction of DNA fragments of the proper size is very small and most of the DNA sequenced is discarded during the bioinformatics analysis because it is too large. It is
important that the slice of gel used for purification is sufficiently large to include the library elements containing inserts from approximately 60–200 base pairs and not larger elements. Specific considerations when analyzing sequencing data generated by FS-Seq. In our bioinformatic analysis, we include only inserts that are from 100 to 150 base pairs for nucleosomes. It would be possible to use reads that are somewhat larger, but this makes it more difficult to determine exactly where the center of the associated nucleosome is supposed to be. This is because for the larger DNA inserts, it is not possible to know whether the nucleosome is found in the center of the DNA or at one of the ends. We chose to display our data as normalized merged heatmaps. We did this because in our studies on SV40, we noticed that one characteristic of viral regulation in our system appeared to be relatively small changes in nucleosome position [19] and thought that the heatmaps showed this best. However, the data could also be displayed as bedgraphs following normalizing and merging of data. We have chosen to use normalizing and merging as a way to remove some of the biological variability that appears in our system. As indicated above, we also use a minimum of four biological replicates in our studies so that the merged data is from at least four samples. Frequently, we have used as many as ten biological replicates if there appears to be variability. This variability can be seen in the width of the bands in the heatmaps. When a band appears relatively broad, it probably lies at a site in the genome where there is more than one preferred nucleosome location. In our recent publication [9] describing the mapping of nucleosomes on the SV40 genome, we also describe how we used the sequencing data from shorter reads to look for the location of transcription factor binding.
With the shorter read analysis, we found a number of reads that corresponded to the position of SP1 binding at its cognate sequence. Of course, since we are only looking at reads, it is not possible to exclude the possibility that there are other factors also bound at this site. Nevertheless, an analysis of shorter reads using the FS-Seq sequencing data may help to identify sites of interest for binding by factors in chromatin.
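The size-based selection of reads described in Subheading 3.3 and in the note above can be sketched in a few lines. The original analysis used an awk script that is not reproduced in this chapter, so the Python version below is only an illustration. The function names, the use of the SAM TLEN column (column 9) as the insert size, and the example records are all assumptions made here.

```python
NUCLEOSOME_RANGE = (100, 150)   # insert sizes treated as nucleosome-sized
SHORT_RANGE = (60, 99)          # insert sizes examined for factor binding


def classify_insert(insert_size):
    """Return the size class for an insert length in base pairs, or None."""
    size = abs(insert_size)  # TLEN is negative for the rightmost mate
    if NUCLEOSOME_RANGE[0] <= size <= NUCLEOSOME_RANGE[1]:
        return "nucleosome"
    if SHORT_RANGE[0] <= size <= SHORT_RANGE[1]:
        return "short"
    return None


def split_sam_records(lines):
    """Group SAM alignment lines by size class; header lines are skipped."""
    groups = {"nucleosome": [], "short": []}
    for line in lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        label = classify_insert(int(fields[8]))  # column 9 holds TLEN
        if label is not None:
            groups[label].append(line)
    return groups


# Hypothetical records: only the TLEN column matters for this sketch
example = [
    "@SQ\tSN:SV40\tLN:5243",
    "r1\t99\tSV40\t10\t42\t50M\t=\t107\t147\t*\t*",
    "r2\t163\tSV40\t300\t42\t50M\t=\t325\t75\t*\t*",
    "r3\t99\tSV40\t900\t42\t50M\t=\t1100\t250\t*\t*",
]
counts = {name: len(recs) for name, recs in split_sam_records(example).items()}
print(counts)
```

In this made-up example the 147 base pair insert is placed in the nucleosome class, the 75 base pair insert in the short class, and the 250 base pair insert is discarded. Keeping the two size ranges as named constants makes it easy to test wider windows, with the caveat noted above that larger inserts make the nucleosome center harder to assign.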
Acknowledgments
This work was funded by a grant from the National Institutes of Health, AI142011 (to B.M.). The authors thank Ms. Corina Murphy and New England Biolabs for the generous gift of reagents.
References
1. Scott WA, Wigmore DJ (1978) Sites in simian virus 40 chromatin which are preferentially cleaved by endonucleases. Cell 15(4):1511–1518
2. Varshavsky AJ, Sundin OH, Bohn MJ (1978) SV40 viral minichromosome: preferential exposure of the origin of replication as probed by restriction endonucleases. Nucleic Acids Res 5(10):3469–3477. PubMed PMID: 214758; PMCID: 342688
3. Waldeck W, Fohring B, Chowdhury K, Gruss P, Sauer G (1978) Origin of DNA replication in papovavirus chromatin is recognized by endogenous endonuclease. Proc Natl Acad Sci U S A 75(12):5964–5968. PubMed PMID: 216004; PMCID: 393097
4. Parmar JJ, Padinhateeri R (2020) Nucleosome positioning and chromatin organization. Curr Opin Struct Biol 64:111–118. https://doi.org/10.1016/j.sbi.2020.06.021
5. Weintraub H, Groudine M (1976) Chromosomal subunits in active genes have an altered conformation. Science 193(4256):848–856. https://doi.org/10.1126/science.948749
6. Voong LN, Xi L, Wang JP, Wang X (2017) Genome-wide mapping of the nucleosome landscape by micrococcal nuclease and chemical mapping. Trends Genet 33(8):495–507. https://doi.org/10.1016/j.tig.2017.05.007. PubMed PMID: 28693826; PMCID: PMC5536840
7. Sun Y, Miao N, Sun T (2019) Detect accessible chromatin using ATAC-sequencing, from principle to applications. Hereditas 156:29. https://doi.org/10.1186/s41065-019-0105-9. PubMed PMID: 31427911; PMCID: PMC6696680
8. Park PJ (2009) ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet 10(10):669–680. https://doi.org/10.1038/nrg2641. PubMed PMID: 19736561; PMCID: PMC3191340
9. Milavetz B, Haugen J, Rowbotham K (2020) Comparing a new method for mapping nucleosomes in simian virus 40 chromatin to standard procedures. Epigenetics. 1–10. https://doi.org/10.1080/15592294.2020.1814487
10. Balakrishnan L, Milavetz B (2017) Epigenetic analysis of SV40 minichromosomes. Curr Protoc Microbiol 46:14F.3.1–14F.3.26. https://doi.org/10.1002/cpmc.35
11. Andrews S (2010) FastQC: a quality control for high throughput sequence data
12. Buffalo V (2011) Scythe: a Bayesian adapter trimmer
13. Joshi NA, Fass JN (2011) Sickle – a windowed adaptive trimming tool for FASTQ files using quality
14. Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10(3):R25. https://doi.org/10.1186/gb-2009-10-3-r25. PubMed PMID: 19261174; PMCID: 2690996
15. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25(16):2078–2079. https://doi.org/10.1093/bioinformatics/btp352. PubMed PMID: 19505943; PMCID: PMC2723002
16. Ramirez F, Ryan DP, Gruning B, Bhardwaj V, Kilpert F, Richter AS, Heyne S, Dundar F, Manke T (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44(W1):W160–W165. https://doi.org/10.1093/nar/gkw257. PubMed PMID: 27079975; PMCID: PMC4987876
17. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP (2011) Integrative genomics viewer. Nat Biotechnol 29(1):24–26. https://doi.org/10.1038/nbt.1754. PubMed PMID: 21221095; PMCID: PMC3346182
18. Milavetz B, Kallestad L, Gefroh A, Adams N, Woods E, Balakrishnan L (2012) Virion-mediated transfer of SV40 epigenetic information. Epigenetics 7(6):528–534. https://doi.org/10.4161/epi.20057. PubMed PMID: 22507897; PMCID: 3398982
19. Kumar MA, Kasti K, Balakrishnan L, Milavetz B (2018) Directed nucleosome sliding in SV40 minichromosomes during the formation of the virus particle exposes DNA sequences required for early transcription. J Virol. https://doi.org/10.1128/JVI.01678-18
20. Kube D, Milavetz B (1996) Differential regulation by SV40 T-antigen binding at site I defines two distinct classes of nucleosome-free promoter. Anat Rec 244(1):28–32. https://doi.org/10.1002/(SICI)1097-0185(199601)244:13.0.CO;2-B
Chapter 3
Universal NicE-Seq: A Simple and Quick Method for Accessible Chromatin Detection in Fixed Cells
Hang Gyeong Chin, Udayakumar S. Vishnu, Zhiyi Sun, V. K. Chaithanya Ponnaluri, Guoqiang Zhang, Shuang-yong Xu, Touati Benoukraf, Paloma Cejas, George Spracklin, Pierre-Olivier Estève, Henry W. Long, and Sriharsa Pradhan
Abstract
Genome-wide sequencing and identification of accessible chromatin have enabled deciphering the epigenetic information encoded in chromatin, revealing accessible promoters, enhancers, nucleosome positioning, transcription factor occupancy, and other chromosomal protein binding. The starting biological materials are often fixed using formaldehyde crosslinking. Here, we describe accessible chromatin library preparation from low numbers of formaldehyde-crosslinked cells using a modified nick translation method, where a nicking enzyme nicks one strand of DNA and DNA polymerase incorporates biotin-conjugated dATP and dCTP together with methyl-dCTP. Once the DNA is labeled, it can be isolated for NGS library preparation. We term this method universal NicE-seq (nicking enzyme-assisted sequencing). We also demonstrate a single-tube method that enables direct NGS library preparation from low cell numbers without DNA purification. Furthermore, we demonstrate universal NicE-seq on an FFPE tissue section sample.
Key words Nicking enzyme, Methyl-dCTP, Biotin-14-dCTP/dATP, Nucleosome, Open chromatin labeling, Epigenetic profiling, DNA library preparation, Formaldehyde-crosslinked cells
1 Introduction
Nucleosome-depleted regions on chromatin are accessible to transcription factors and other chromatin-associated proteins, whose binding drives cellular processes. Early studies identified these nucleosome-depleted regions as being hypersensitive to DNase I and demonstrated an association of these protein-depleted regions with gene activation and DNA methylation in eukaryotic organisms [1–4]. This subsequently led to genome-wide mapping of DNase hypersensitive sites (DHS), also known as “open chromatin,” by DNase-Chip or massively parallel DNA sequencing [5, 6]. Following DNase-seq, another accessible chromatin determination method,
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_3, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
FAIRE-Seq (Formaldehyde-Assisted Isolation of Regulatory Elements Sequencing), was used for isolation and sequencing of nucleosome-depleted regions of the genome. This method relied on formaldehyde-crosslinked chromatin sheared by sonication and phenol–chloroform extraction of the accessible DNA in the aqueous phase, which could then be sequenced or hybridized to a DNA microarray [7]. However, both methods were cumbersome and needed large numbers of cells. Recent improvements for DHS mapping include the addition of circular carrier DNA to perform single cell DNase I seq (scDNase I-seq), requiring an input of between 1 and 1000 cells. This technology revealed highly expressed genic regions with multiple active histone marks displaying constitutive DNase I hypersensitive sites among different single cell analysis data. However, in scDNase I-seq, mappability to the reference genome was low, ranging from 2% to 40% at the single cell level [8]. Another complementary method, known as MNase-seq, uses the nonspecific endo- and exonuclease activities of micrococcal nuclease to cleave protein-unbound regions of DNA on chromatin. Here, DNA bound to histones or other chromatin-associated proteins remains undigested, and sequencing of these DNA fragments yields genome-wide maps of bound proteins [9–11]. The undigested DNA displays a mirror image of accessible chromatin. In 2013, ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) was introduced to study chromatin accessibility genome-wide. The method was simpler and easier to use compared with MNase-seq (sequencing of micrococcal nuclease sensitive sites), FAIRE-seq, and DNase-seq. The basic principle relied on a prokaryotic Tn5 transposase, which is loaded with sequencing adapters to create an active dimeric transposome complex. The complex simultaneously cuts accessible chromatin and ligates the specific adapter sequences.
In the early days, ATAC-seq also generated nonspecific amplification of nonnuclear DNA, such as the mitochondrial genome, accounting for ~50% of all reads [12]. Subsequent improvements to the method have reduced mitochondrial DNA reads [13]. The application of ATAC-seq to human cell lines and clinical samples has led to many landmark studies [14–19]. Here we report an improved, robust, and sensitive method, nicking enzyme-assisted sequencing (universal NicE-seq), for epigenetic profiling of mammalian chromatin that can provide in-depth open versus closed chromatin sequence information from limited amounts of formaldehyde-fixed cells [20, 21]. This method is suitable for identification of transcription factor occupancy and is complementary to other accessible chromatin methods.
2 Materials
Prepare all solutions using ultrapure water (e.g., Milli-Q water or equivalent, obtained by purifying deionized water to attain a resistivity of 18 MΩ·cm at 25 °C) and analytical grade reagents. Prepare and store all reagents at room temperature or at 4 °C (unless indicated otherwise).
2.1 Harvesting and Crosslinking Cells
1. HCT116 cells are cultured in McCoy's 5A medium (Thermo Fisher Scientific #16600082) supplemented with 10% Fetal Bovine Serum (GemCell #100-500).
2. TrypLE (Thermo Fisher Scientific #12605028, store at R/T before use).
3. 50 mL conical falcon tubes and pipette tips for automatic pipet.
4. Cell culture flasks.
5. Trypan Blue Solution, 0.4% (Thermo Fisher Scientific #15250061).
6. Hemocytometer and inverted microscope.
7. 1.5 mL Eppendorf tube for cell harvest; 1.5 mL DNA LoBind tube (Eppendorf AG #022431021).
8. 16% formaldehyde (Thermo Fisher Scientific #28908).
9. 1X PBS (from 10X PBS, Gibco #70011-044).
10. 2.5 M Glycine (Sigma #G7126).
11. End-over-end bench top rotator (VWR #10136-084).
2.2 Accessible Chromatin Labeling
1. Prepare Cytosolic Buffer: 15 mM Tris–HCl pH 7.5, 5 mM MgCl2, 60 mM KCl, 15 mM NaCl, 1% NP40, and 300 mM sucrose. Add 0.5 mM fresh DTT before use (NEB #B7705S). (Note: Use freshly made Cytosolic buffer; Cytosolic buffer can be stored for up to 2–3 weeks at 4 °C without the addition of DTT. For microbial stability, it may be sterile filtered through a 0.22 μm membrane.)
2. Prepare a 10X dNTP mix: 240 μM dATP, 240 μM mdCTP (NEB #N0356S), 60 μM biotin-14-dCTP (Thermo Fisher Scientific #19518018), 300 μM dGTP, 300 μM dTTP, and 60 μM of biotin-14-dATP (Thermo Fisher Scientific #19524016).
3. Prepare 2X Accessible Chromatin labeling mix (for one labeling reaction): 20 μL of NEB Buffer #2 10X (NEB #B7002S), 1 U of Nt.CviPII (NEB #R0626S), 10 U of DNA Polymerase I (NEB #M0209S), with 20 μL of 10X dNTP mix, and adjust the final volume to 100 μL with water.
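The composition of the 10X dNTP mix can be checked with a few lines of arithmetic. The following Python sketch is only an illustration and not part of the protocol: the dictionary and function names are hypothetical, and the grouping of each component by the base it supplies is our own bookkeeping. It shows that each nucleotide pool sums to 300 μM, that the biotinylated analog makes up 20% of the dATP and dCTP pools, and that adding 20 μL of the mix to a final volume of 100 μL is a fivefold dilution.

```python
# Concentrations of the 10X dNTP mix in micromolar, as listed in item 2 above.
DNTP_MIX_10X_UM = {
    "dATP": 240, "biotin-14-dATP": 60,
    "mdCTP": 240, "biotin-14-dCTP": 60,
    "dGTP": 300, "dTTP": 300,
}

# Hypothetical bookkeeping: which base each component supplies.
BASE_OF = {
    "dATP": "A", "biotin-14-dATP": "A",
    "mdCTP": "C", "biotin-14-dCTP": "C",
    "dGTP": "G", "dTTP": "T",
}


def pool_totals(mix):
    """Sum the concentrations of all components supplying the same base."""
    totals = {}
    for name, conc in mix.items():
        base = BASE_OF[name]
        totals[base] = totals.get(base, 0) + conc
    return totals


def biotin_fraction(mix, base):
    """Fraction of one base pool that carries a biotin label."""
    total = sum(c for n, c in mix.items() if BASE_OF[n] == base)
    biotin = sum(c for n, c in mix.items()
                 if BASE_OF[n] == base and n.startswith("biotin"))
    return biotin / total


def dilute(mix, fold):
    """Concentrations after a `fold`-fold dilution (20 uL in 100 uL is 5-fold)."""
    return {name: conc / fold for name, conc in mix.items()}


if __name__ == "__main__":
    print(pool_totals(DNTP_MIX_10X_UM))
    print(biotin_fraction(DNTP_MIX_10X_UM, "C"))
    print(dilute(DNTP_MIX_10X_UM, 5))
```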
2.3 Other Material and Equipment for Chromatin Labeling
1. 37 °C incubator, 65 °C heat-block, and bench-top centrifuge.
2. 0.5 M EDTA (Invitrogen #15575-038) and RNase A (Invitrogen #12091021).
3. Proteinase K (NEB #P8107S) and 20% SDS (TEKNOVA #S0295).
4. Thermolabile Proteinase K (NEB #P8111S).
5. Phenol:Chloroform:Isoamyl Alcohol (Invitrogen #15593-031).
6. Isopropanol (Pharmco-Aaper #231HPLC99).
7. Glycogen (Sigma-Aldrich #10901393001).
8. 80% ethanol in nuclease-free water (Thermo Fisher Scientific #AM9932).
9. 1X TE buffer.
2.4 Material for Quality Control
1. Heat block at 95 °C.
2. Ice-water bath.
3. Positively charged nylon membranes (Amersham #RPN119B).
4. UV light.
5. Blotting-grade blocker (Bio-Rad #170-6404).
6. 1X PBS with 0.1% Tween 20.
7. Goat anti-biotin-HRP antibody (CST #7075).
8. LumiGLO reagent (CST #7003).
2.5 Material for Library Construction
1. Covaris S2 sonicator and Covaris microtubes (Covaris #500330).
2. NEB Ultra II DNA library prep Kit for Illumina (NEB #E7645).
3. Index primers set (NEB #E7335S or #E7500S).
4. Prepare 2X High Salt Buffer: 10 mM Tris–HCl pH 8.0, 2 M NaCl, 1 mM EDTA, adjust with nuclease-free water.
5. Prepare High Salt Buffer with 0.05% Triton X-100.
6. DNA LoBind Eppendorf tubes (Eppendorf AG #022431021).
7. Streptavidin Magnetic Beads (NEB #S1420S).
8. NEB Next Sample Purification Beads (E7104) or AMPure XP beads (Beckman Coulter #A63881).
9. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific #Q32854).
10. End-over-end rotator (VWR #10136-084).
11. Bioanalyzer (Agilent 2100 Bioanalyzer with Agilent High Sensitivity DNA Kit, #5067-4526).
12. Monarch Genomic DNA Purification Kit (NEB #T3010S).
13. PCR machine.
3 Methods
Carry out all procedures at room temperature unless otherwise specified.
3.1 Harvesting and Crosslinking Cells
1. Take cells from the incubator and visually check their health under the microscope. Remove the old medium from the flask and transfer it to a sterile 50 mL conical centrifuge tube, then add 5–10 mL TrypLE to the adherent cells in the flask and incubate for 5 min at 37 °C.
2. Gently tap to detach the cells and pipet them into the sterile conical centrifuge tube containing the old medium (the old medium/serum will inhibit the activity of the trypsin). Spin down for 5 min at room temperature, 1500 rpm, and remove the supernatant (decant).
3. Wash in 5 mL 1X PBS, spin down (5 min at room temperature, 1500 rpm), remove the supernatant, and resuspend in 5 mL 1X PBS.
4. Count cells: Dilute a small amount of cells (1:1) with Trypan Blue Stain (this gives a dilution factor of 2) (e.g., take 100 μL cells and 100 μL stain). Pipet into the edge of the hemocytometer counting chamber. Calculate the number of cells per mL.
5. Calculate the total number of cells. Take 10^6 cells and transfer them to a 1.5 mL DNA LoBind Eppendorf tube.
6. Add 62.5 μL of 16% formaldehyde and adjust with 1X PBS up to 1 mL (the final concentration is 1% formaldehyde). Incubate the cells for 10 min at room temperature on the end-over-end rotator (see Note 1).
7. Quench the reaction by adding glycine to 125 mM (for a volume of 1 mL, add 52 μL of 2.5 M stock) and incubate for 5 min at room temperature on the end-over-end rotator.
8. Wash the cells twice by resuspending in 1 mL of 1X PBS and spinning down at 1500 rpm for 1 min at 4 °C. Remove the supernatant; the cells may be stored at -80 °C for later use. For immediate use, resuspend the cells in 1X PBS (0.5 mL), count the cells, and make aliquots depending on how many cells will be needed for downstream work (ideally 400 μL for one million cells, which will give 250,000 in 0.1 mL).
Note: Cells may be lost during the centrifugation steps. Therefore, the above counting step is crucial for universal NicE-seq. In our experience, up to ~40% of the cells may be lost during the crosslinking step, relative to the count taken before adding formaldehyde. This is a critical step when cell numbers are limited (see Notes 1 and 2).
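The volumes quoted in steps 4–8 follow from simple dilution arithmetic, which the following Python sketch reproduces. It is an illustration under stated assumptions rather than part of the protocol: the function names are hypothetical, and the chamber factor of 10^4 assumes a standard hemocytometer in which one large counting square holds 0.1 μL.

```python
def stock_volume_for_final(stock_conc, final_conc, final_volume):
    """Stock volume needed so that `final_volume` has `final_conc` (C1*V1 = C2*V2)."""
    return final_conc * final_volume / stock_conc


def spike_volume(stock_conc, final_conc, start_volume):
    """Stock volume to ADD to `start_volume` so the mixture reaches `final_conc`."""
    return final_conc * start_volume / (stock_conc - final_conc)


def cells_per_ml(mean_count_per_square, dilution_factor=2, chamber_factor=1e4):
    """Cell density from a hemocytometer count.

    Assumption: one large square holds 0.1 uL, hence the factor 1e4 per mL.
    The default dilution factor of 2 is the 1:1 Trypan Blue dilution of step 4.
    """
    return mean_count_per_square * dilution_factor * chamber_factor


def cells_per_aliquot(total_cells, total_volume_ul, aliquot_volume_ul):
    """Number of cells in one aliquot of an evenly resuspended sample."""
    return total_cells * aliquot_volume_ul / total_volume_ul


if __name__ == "__main__":
    # Step 6: 16% formaldehyde stock, 1% final, 1000 uL final volume.
    print(stock_volume_for_final(16, 1, 1000))       # uL of formaldehyde stock
    # Step 7: 2.5 M (2500 mM) glycine added to 1000 uL to reach 125 mM.
    print(round(spike_volume(2500, 125, 1000), 1))   # uL of glycine stock
    # Step 8: one million cells in 400 uL, aliquots of 100 uL.
    print(cells_per_aliquot(1_000_000, 400, 100))
```

Note that the glycine volume is computed for addition on top of the 1 mL reaction, which is why it is slightly larger than a plain C1V1 = C2V2 calculation would give.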
3.2 Accessible Chromatin Labeling and Decrosslinking
1. Start with 250–250,000 crosslinked cells suspended in 1X PBS (25 μL).
2. Add 400 μL Cytosolic buffer to the cells and incubate for 10–20 min on ice with occasional tapping for mixing. The nuclei can be visualized under the microscope at this point (circular with smooth edges) (see Note 3).
3. Spin the nuclei down for 10 min at 4 °C, 3000 rpm. Discard the supernatant.
4. Wash the nuclei pellet with 1X PBS for 5 min at 4 °C, 3000 rpm. Discard the supernatant. If cell number is small
’ in folder “restriction_enzyme” for your analysis.
7. Make new folders ‘data/bam’ within ‘’.
8. Put bam and bai files into ‘/data/bam’. Name files according to the following rules.
ORE-Seq: Genome-Wide Absolute Occupancy Measurement by Restriction Enzyme. . .
9. Start R and set working directory to ‘’.
10. ‘source(“../RE_analysis.R”)’ or run ‘RE_analysis.R’ step by step in RStudio from within ‘’.
11. Find desired output files and plots:
a. The script creates a folder structure beginning with the folder ‘analysis_results’, which contains different subfolders for each type of plot and result files.
b. Depending on the parameters chosen in the script, the main results path is ‘analysis_results/window_limit_times_1_max_length_500/close_distances_200_300/background_Michael’ (in the following called ‘MAIN’), with plot folders for certain intermediate results along the way.
c. Genomic mean accessibilities are saved in ‘MAIN/acc_site_means_simple.txt’, with all_mean = cut–all cut results, cut_uncut_all_3 = uncorrected cut–uncut result, cut_uncut_4 = corrected cut–uncut result.
d. Histograms of site accessibilities are saved in ‘MAIN/accessibility_histograms/’ for plus/minus strand and starting/ending fragments as well as combined results (last column) with all_mean = cut–all cut results, cut_uncut_all_3 = uncorrected cut–uncut result, cut_uncut_4 = corrected cut–uncut result.
e. Individual site results are saved in an R dataframe in ‘MAIN/occs_df_list.RData’, with columns chr, enzyme, pos, eff_coverage, eff_cuts, occ_X_1 (= cut–all cut), occ_cut_uncut_2 (= uncorrected cut–uncut), and occ_cut_uncut_4 (= corrected cut–uncut).
3.9.1 Sample Naming Rules
Bam files within ‘data/bam’ need to follow these naming conventions:
1. The script needs bam files for both samples (X% cut and 100% cut) with identical file names except for the ending: Samples with one RE digest end with ‘_X.bam’, while samples with second RE digest end with ‘_1.bam’.
2. If there was no second digest and only cut–uncut analysis is wanted, the ‘_1.bam’ file can be a copy/hard link of the ‘_X.bam’ file and the cut–all cut results should then be ignored.
3. File names must contain the RE name of the enzyme present in the sample after an “_” sign, e.g., ‘_BamHI-HF_ < RE units>_X.bam’, where the information in ‘’ and ‘ < RE units>’ is not used by the script and ‘’ could be omitted.
Elisa Oberbeckmann et al.
4. If the spike-in (if present) used a different RE, then add ‘ -norm’, e.g., ‘_AluI_EcoRInorm__X.bam’.
5. Usable RE names can be checked and added in the ‘RE_info.txt’ file.
6. Multiple REs can be used on the main genome (not the spike-in): ‘_BamHI-HF__KpnI__X.bam’, which will be analyzed accordingly.
7. To only get results of one RE if others were present in the same sample, set parentheses to ignore REs: ‘_BamHI-HF__(KpnI_400)_X.bam’. This will be analyzed similarly to ‘_BamHI-HF__X’, with the following difference:
8. The sites of the ignored RE (and their neighborhoods) will still be excluded when calculating the background.
9. The sites of the ignored RE might exclude sites of the “main” enzyme when they are close to each other.
10. For calibration samples to be used for fitting the uncut correction factor, add the cut percentage with ‘X_pct_cut’ as in this example: ‘_AluI_10_pct_cut.bam’.
3.9.2 How to Use Other REs
Add the required information to ‘RE_info.txt’, see ‘RE_info_README.txt’.
3.9.3 How to Use Other Genomes
Genomes other than S. cerevisiae (and S. pombe for the spike-in) are not supported by default, as there are unfortunately several references to the chromosome names within the code. If these are treated properly, the script should run for other genomes as well.
3.9.4 How to Fit Uncut Correction Factors
Run section ‘3.1.1 Calc and plot deviation from calibration samples’ in the script that is skipped by default.
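Before starting the R script, it can help to check that the bam files follow the endings described in Subheading 3.9.1. The following Python sketch is a hypothetical pre-flight check and not part of the published script: the function names and the example file names are invented for illustration, and it only inspects the endings ‘_X.bam’, ‘_1.bam’, and ‘_pct_cut.bam’. It assumes that calibration files end directly with ‘_pct_cut.bam’, as in the example given above.

```python
from collections import defaultdict


def classify_bam(name):
    """Split a bam file name into (base name, kind, cut percentage).

    kind is "X" (first RE digest only), "1" (with second RE digest),
    "calibration" (cut percentage given as '<pct>_pct_cut'), or "unknown".
    """
    if name.endswith("_pct_cut.bam"):
        stem = name[: -len("_pct_cut.bam")]
        base, _, pct = stem.rpartition("_")
        return base, "calibration", float(pct)
    if name.endswith("_X.bam"):
        return name[: -len("_X.bam")], "X", None
    if name.endswith("_1.bam"):
        return name[: -len("_1.bam")], "1", None
    return name, "unknown", None


def missing_partners(names):
    """Base names for which only one of the '_X.bam' / '_1.bam' files exists."""
    kinds = defaultdict(set)
    for name in names:
        base, kind, _ = classify_bam(name)
        if kind in ("X", "1"):
            kinds[base].add(kind)
    return sorted(base for base, found in kinds.items() if found != {"X", "1"})


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = ["s1_AluI_200_X.bam", "s1_AluI_200_1.bam", "s2_AluI_200_X.bam"]
    print(missing_partners(files))
```

A check of this kind only catches missing sample pairs; whether the RE name in the file name is known to the script still has to be verified against ‘RE_info.txt’.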
4 Notes
1. β-Mercaptoethanol is toxic. Use appropriate personal safety equipment (lab coat, safety glasses, gloves) and handle in a fume hood. β-Mercaptoethanol can probably be replaced by DTT, likely at a lower concentration.
2. The yeast chromatin prepared by such methods is sometimes called “nuclei,” although this protocol will not yield nuclei in the clean sense, for example if checked by electron microscopy, as obtained, for example, for HeLa cell nuclei. For comparison with other protocols for the preparation of yeast chromatin, see for example [10–15].
3. Do not crosslink cells with formaldehyde. This will impair chromatin digestion by REs and lead to occupancy values that are too high [6]. This amount of cells is sufficient for preparing chromatin for digestion with one type of RE. Linear upscaling is readily possible if enough chromatin is to be prepared for analysis with several REs.
4. Cells are still alive until the cell lysis in step 12. So if you need to monitor some special conditions, e.g., high temperature for a temperature-sensitive mutant, you may have to keep up these conditions during chromatin preparation [16]. The temperatures given in the protocol are for our routine chromatin preparations from wild-type or mutant cells, like an isw1 or gcn5 mutant, grown to log phase at 30 °C.
5. This is a somewhat shaky measure as it depends on the residual amount of water in the pellet. Nonetheless, for most purposes it works just fine. If a more exact measure is needed, you could go by cell number via OD600 or cell counting.
6. If lysis is not efficient, as happens for example with stationary cells or very sick strains, either use more zymolyase (more effective, but may introduce more protease contamination along with the zymolyase) or incubate longer (may not help much, but longer incubation usually does no harm as cells are still closed at this point) or go to 35 °C (optimal temperature for zymolyase).
7. From here on, pellets are not so tight and stable anymore. Take care not to lose the pellet! Maybe reduce vortex speed from now on to about half. Resuspending is now more difficult due to clumps. Use less volume first and stir with an inoculation loop for better resuspension efficiency.
8. Usually, there remains some sticky material at the tube walls that cannot be poured off and that is also not part of the pellet. Remove this material, for example, with a cotton swab or tissue paper wrapped around a spatula, without disturbing the pellet. This also removes residual Ficoll solution.
9. Tube labeling may wash off in the dry ice/ethanol bath.
10. S. pombe gDNA is only required for the cut–all cut method.
11. Refer to the manufacturer’s detailed instructions for using this kit. This protocol states the basic steps only.
12. EDTA from the STOP-Buffer chelates all Mg2+ and other divalent cations so that DNases that usually depend on divalent cations are inhibited and the gDNA is stable at 4 °C. As the proteinase K will inactivate the RE and as the S. pombe gDNA spike-in is usually added to crude chromatin, purification of the S. pombe gDNA is not necessary at this point. Omitting the purification and storage at 4 °C instead of -20 °C avoids
unnecessary DNA breaks in the very long gDNA. Nonetheless, the S. pombe gDNA is purified in the context of the calibration curve (see step 1 in Subheading 3.8), as this is done also with purified S. cerevisiae gDNA and not crude chromatin.
13. It is of principal importance to ensure saturated digestion. We routinely use two sufficiently different, e.g., fourfold different, RE concentrations. Only if both of them yield approximately the same accessibility value within the same digestion time, e.g., within 5 percentage points or some other criterion depending on the application and desired accuracy, was RE digestion not limiting. This logic only applies if the dynamics of the chromatin substrate were sufficiently frozen, i.e., DNA accessibility did not change over time. This can only be tested by following digestion time courses. Sufficient chromatin stability is usually the case in the absence of ATP and the presence of >2 mM Mg2+, but may be an issue for S. cerevisiae chromatin at low Mg2+ concentrations [6]. If several different REs are used, the mock digestion sample does have to be prepared for all of them.
14. RE preparations and RE storage buffer usually contain a high glycerol concentration. Upon addition of RE or RE storage buffer, avoid a final concentration of >5% glycerol if the RE is prone to star activity.
15. The incubation time has to be long enough to reach saturation for the chosen RE concentrations. However, this RE digestion occurs in crude chromatin so that overlong incubation increases side effects. For example, chromatin preparations may contain endogenous exo- and endonucleases, which may resect DNA ends introduced by the RE or lead to DNA breaks not caused by the RE, respectively. The former is detected and controlled for in the bioinformatic analysis (Subheading 3.9) and the latter by comparison with the mock digestion. The influence of endogenous nucleases can be dampened by lowering RE digestion time and/or temperature and/or Mg2+ concentration. While all these measures will also compromise the RE digestion, this can be compensated by increasing RE concentration. The RE digestion conditions given in the protocol here were found to work well for wild-type S. cerevisiae chromatin [6], but may have to be modified for other applications.
16. If the 1× RE-buffer contains enough potassium to cause precipitation at low temperature in the presence of SDS from the STOP-Buffer, as is the case for 1× CutSmart buffer (NEB), keep the sample always at 37 °C until phenol/chloroform extraction.
17. Phenol and β-mercaptoethanol are toxic. Use personal safety equipment (lab coat, safety glasses, gloves) and handle in a fume hood.
18. Heat inactivation is not always possible depending on the RE. However, just stopping by EDTA addition is also sufficient as this will reliably inactivate the RE digest and the RE is later on removed after shearing by AMPure bead purification.
19. We noted (see Fig. 2) a scoring bias that may stem from a bias toward RE-generated versus shearing-generated DNA ends and attribute this to incomplete DNA end repair and impaired sequencing adapter ligation of the latter in comparison to the former DNA ends. This may be due to incomplete removal of 3′ phosphate groups or other chemical modifications generated by shearing [7], but not by REs, that are not efficiently processed by end repair enzymes in the library preparation kit. Additional pretreatment with the DNA repair Pre-CR kit (NEB, M0309) may help, but others still observed a similar bias in their calibration curve despite using this kit [8]. To increase the likelihood of sufficient DNA end repair, we recommend increasing the ratio of repair enzymes relative to DNA and therefore using only 100–200 ng, although the kit’s manufacturer recommends using up to 1 μg of DNA.
20. Avoid using too many PCR cycles as this will increase the occurrence of clonal PCR duplicates. A maximum of 8 cycles is recommended; usually 6 cycles are sufficient and preferred.
21. Some REs, especially the HF (high fidelity) versions sold by NEB, may stick to the DNA ends after cutting their site. This may inhibit the adapter ligation reaction during sequencing library preparation. Therefore, it is recommended that the RE is removed by proteinase K digestion and further DNA purification. Treatment of the mock-digested gDNA mix in parallel ensures that the DNA concentration for RE-digested and undigested gDNA stays the same, i.e., generating the defined percentages in the following step 6 can be done via mixing corresponding volumes without measuring DNA concentrations.
22. Your pipetting precision and accuracy will decide how well defined these percentages are. Follow the recommendations of the manufacturer of your pipets. Use the same pipet and setting for each percentage if you prepare calibration curves for several REs. The DNA masses given in Table 1 can be estimated from steps 2 and 3. The exact DNA amount is less important than the exact volume ratio of cut versus uncut gDNA (see Note 21).
4.1 Bioinformatic Notes
The bioinformatics analysis is a modified version based on the analysis in our first application [6] and in MRW’s PhD thesis [17]. Our script deposited at GitHub (https://github.com/ gerland-group/ORE-seq_analysis) is written for use with several
REs (thoroughly tested with AluI, BamHI, and HindIII) and the S. cerevisiae genome and with an S. pombe spike-in for the cut–all cut method. For custom applications of other REs and especially other genomes, the script has to be modified or newly written by a bioinformatics expert. As we cannot foresee future applications by other users, we just state in the following in detail the underlying rationale and mathematical background. In the following, steps marked with * are only needed for the cut–all cut method, which for example needs a normalization between the cut and all cut sample using an S. pombe gDNA spike-in (Fig. 4). Likewise, steps marked with ° are only needed for the cut–uncut method, like the counts of uncut fragments at a given cut site (Fig. 5). Note that this description and our script are an all-in-one solution that calculates the outcome according to both methods at the same time. First follow the steps for mapping/indexing and download the script files as described in Subheading 3.9 and have a look at the readme file of the repository. The script then performs the following actions:
4.1.1 Map/Filter Reads
1. Extract paired-end read information: chromosome, start, end, and strand information, with end positions shifted by +1 bp.
2. Remove fragments that are longer than 500 bp.
3. Remove rDNA fragments by excluding the following loci:
S. cerevisiae chr. 12: 451500–495000
S. pombe chr. 3: 0–30000
S. pombe chr. 3: 2430000–2452883
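The two filters of steps 2 and 3 can be sketched in a few lines. The following Python example is an illustration only and not the published R script: the function name, the chromosome labels ("chr12", "chr3"), and the genome labels are hypothetical, and we assume that a fragment is discarded as soon as it overlaps an excluded locus at all and that the locus boundaries are inclusive. Fragment ends are taken as already shifted by +1 bp (step 1), so that the fragment length is simply end minus start.

```python
MAX_FRAGMENT_LENGTH = 500  # bp, from step 2

# rDNA loci from step 3; the chromosome naming is a placeholder.
RDNA_LOCI = {
    ("S. cerevisiae", "chr12"): [(451500, 495000)],
    ("S. pombe", "chr3"): [(0, 30000), (2430000, 2452883)],
}


def keep_fragment(genome, chrom, start, end):
    """Return True if the fragment passes the length and rDNA filters.

    `end` is the position after the last base pair (end shifted by +1 bp).
    """
    if end - start > MAX_FRAGMENT_LENGTH:
        return False
    for lo, hi in RDNA_LOCI.get((genome, chrom), []):
        # Assumption: any overlap with an excluded locus removes the fragment.
        if start <= hi and end > lo:
            return False
    return True


if __name__ == "__main__":
    fragments = [
        ("S. cerevisiae", "chr4", 1000, 1400),      # kept
        ("S. cerevisiae", "chr4", 1000, 1501),      # too long
        ("S. cerevisiae", "chr12", 460000, 460200),  # rDNA
    ]
    print([f for f in fragments if keep_fragment(*f)])
```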
4.1.2 Count Cut and Uncut° Fragments
4. Count the starting/ending fragments on plus and minus strand $c_\tau(x)$ for each genomic position $x$ with $\tau = 1, 2, 3, 4$, denoting starts on plus, starts on minus, ends on plus, and ends on minus strands, respectively. For starting reads, we count the position of the first base pair, and for ending reads, we count the position after the last base pair (i.e., end positions are shifted by +1 bp). We use the notation $c^1_\tau(x)$ and $c^2_\tau(x)$ for the sample without and with second RE digest, respectively. For later modeling, we assume that one single given fragment with RE-cut or sheared fragment start or end at $x$ will on average yield $p^x_\tau$ counts after PCR and Illumina sequencing.
5. For the cut–uncut method, we need the uncut fragments for fixed genomic positions $x$, i.e., fragments that start before $x - d$ and end after $x + d$ (end positions are shifted by +1 bp) in the sample without second RE digest. The extension by $d$ is needed due to the fact that not all REs cut both strands at the same position, as explained later. We denote this number of uncut fragments with $u^1_\tau(x, d)$, also using the index $\tau$ as in step 4 to
differentiate between plus ($\tau = 1$ or $3$) and minus strand ($\tau = 2$ or $4$). We assume that one such uncut fragment at $x$ will on average result in $q^x_\tau$ counts after PCR and Illumina sequencing.
4.1.3 Determine Cut Site Positions with RE Motif
6. Determine the cut site positions, i.e., the positions of the RE recognition motif, on both* genomes including generation of the actual DNA ends by end polishing in the following way. We define $x^i$ as the position of the first base pair of the recognition motif of cut site $i$ plus half the length of the recognition motif, which usually has an even length. HindIII as an example with ‘|’ denoting the cut in both strands: Position $x^i$ is given by the underlined base:
+ strand: 5′-...A|A G C T T...-3′
- strand: 3′-...T T C G A|A...-5′
In case of a 5′ overhang, the 3′ end is elongated to match the 5′ end during DNA end polishing by a DNA polymerase. Conversely, a 3′ overhang is digested to match the recessed 5′ end during DNA end polishing by a 3′–5′ exonuclease. For such end-polished HindIII ends, we get the following double-stranded fragment ends: Position $x^i$ is given by the underlined base:
+ strand: ending: 5′-...A A G C T-3′ and starting: 5′-A G C T T...-3′
- strand: ending: 3′-...T T C G A-5′ and starting: 3′-T C G A A...-5′
Let Δs be the shift length from the pattern center to the cut position of the + strand in upstream direction, which corresponds to half the length of the 5′ overhang of the cleavage product in bp. For HindIII, Δs = + 2, Δs = 0 for blunt end cutting RE, whereas in case of an RE with 3′ overhangs, Δs is negative.
RE        Recognition motif (vertical line indicates cut position)    Shift length Δs
AluI      AG|CT                                                        0
BamHI     G|GATCC                                                      2
HindIII   A|AGCTT                                                      2
EcoRI     G|AATTC                                                      2
HhaI      GCG|C                                                        -1
KpnI      GGTAC|C                                                      -2
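The position bookkeeping of steps 4–6 can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the published R script: the function names and the toy sequence are hypothetical, coordinates are assumed to be 1-based, and fragments are given as (start, end, strand) tuples with end positions already shifted by +1 bp. The site position is the first base pair of the motif plus half the motif length; cut fragments are expected to start at the site position minus Δs and to end at the site position plus Δs, and uncut fragments are those that start before the former and end after the latter.

```python
import re
from collections import Counter

# Motif (without the cut mark) and shift length, from the table above.
RE_TABLE = {
    "AluI": ("AGCT", 0),
    "BamHI": ("GGATCC", 2),
    "HindIII": ("AAGCTT", 2),
    "EcoRI": ("GAATTC", 2),
    "HhaI": ("GCGC", -1),
    "KpnI": ("GGTACC", -2),
}


def cut_sites(sequence, enzyme):
    """1-based site positions: first bp of the motif plus half the motif length.

    All motifs in RE_TABLE are palindromic, so scanning one strand suffices.
    The lookahead pattern also reports overlapping motif occurrences.
    """
    motif, _ = RE_TABLE[enzyme]
    half = len(motif) // 2
    return [m.start() + 1 + half
            for m in re.finditer(f"(?={motif})", sequence)]


def site_counts(fragments, x, shift):
    """Count fragment starts, ends, and uncut fragments at site x, per strand."""
    start_pos, end_pos = x - shift, x + shift
    counts = {"starts": Counter(), "ends": Counter(), "uncut": Counter()}
    for start, end, strand in fragments:
        if start == start_pos:
            counts["starts"][strand] += 1
        if end == end_pos:
            counts["ends"][strand] += 1
        if start < start_pos and end > end_pos:
            counts["uncut"][strand] += 1
    return counts


if __name__ == "__main__":
    seq = "CCCCAAGCTTGGGG"  # toy sequence with one HindIII motif
    x = cut_sites(seq, "HindIII")[0]
    frags = [(1, 10, "+"), (6, 15, "+"), (1, 15, "-"), (6, 15, "+")]
    print(x, site_counts(frags, x, RE_TABLE["HindIII"][1]))
```

In the toy sequence, the end-polished HindIII fragments start at position 6 and end (shifted by +1 bp) at position 10, which matches the fragment ends drawn for HindIII above. Strand-specific indices and the two samples (without and with second RE digest) are handled separately in the real analysis and are collapsed here for brevity.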
Assuming proper end polishing of cut fragments as described above, we have the following counts for site $i$:
Counts of starting reads on + strand: $c^1_1(x^i - \Delta s)$ and $c^2_1(x^i - \Delta s)$
Counts of starting reads on - strand: $c^1_2(x^i - \Delta s)$ and $c^2_2(x^i - \Delta s)$
Counts of ending reads on + strand: $c^1_3(x^i + \Delta s)$ and $c^2_3(x^i + \Delta s)$
Counts of ending reads on - strand: $c^1_4(x^i + \Delta s)$ and $c^2_4(x^i + \Delta s)$
Uncut reads on + strand°: $u^1_1(x^i, \Delta s)$
Uncut reads on - strand°: $u^1_2(x^i, \Delta s)$
To obtain the number of fragments not cut° by the RE at a given site, we count all fragments that start before $x^i - \Delta s$ and end after $x^i + \Delta s$, yielding $u^1_\tau(x^i, \Delta s)$. For easier notation, we set $x^i_1 = x^i_2 = x^i - \Delta s$ and $x^i_3 = x^i_4 = x^i + \Delta s$, yielding the cuts at site $i$ as $c^1_\tau(x^i_\tau)$, $c^2_\tau(x^i_\tau)$, $\tau = 1, 2, 3, 4$. If the cut–all cut method is used, this step needs to be done on both the S. cerevisiae and S. pombe genomes. In the S. pombe gDNA spike-in, a different RE can be used.
4.1.4 Remove RE Sites with Close Neighbor RE Sites
7. Especially for X% samples derived from RE digestion of chromatin, uncut fragment counts are increased at cut sites with any neighboring cut site within approx. 150 bp. Thus, we ignore RE sites completely if they have a neighbor within 200 bp in either direction. This cutoff may be adjusted depending on given samples. We denote the set of leftover sites with $I$ and $J$, for the S. cerevisiae and S. pombe* genomes, respectively.
8. As shown in Fig. 6, we often saw dependencies between the fragment counts $C^i_\tau$ and $A^i_\tau$ (defined below) and the distance to the next neighboring RE site, ranging up to 300–500 bp, e.g., for starting reads and the downstream distance to the next neighbor RE site (Fig. 6b). Thus, we ignore start or end cut counts of an RE site and near the RE site (see RE site window approach below), if the next RE site downstream or upstream, respectively, is closer than 300 bp. Note that this value can be further tuned to the experimental conditions, although for our calibration samples shown in Fig. 2, there was hardly any difference between this limit set to 300 bp or a more conservative 500 bp. In general, the higher the degree of shearing, i.e., the shorter the average fragment length, the lower this limit can be. See also legend to Fig. 6.
9. In our protocol here, we modified the original protocol [6] such that different REs are used for digesting S. cerevisiae chromatin or S. pombe gDNA spike-in, for example BamHI and EcoRI, respectively. In this case, the EcoRI sites in the S. cerevisiae genome are not considered when determining close RE sites. However, since the second RE digest is applied after including the S. pombe gDNA spike-in, the BamHI sites
Fig. 6 Scoring bias due to close next neighbor RE sites. Exemplary selection from the “cut_counts_vs_nn_distance” plots for AluI 50% cut calibration sample (as in Fig. 2) are shown. Our script automatically generates such plots for the “X%” (a, b) and “100%” (c, d) samples (X or 1 in y-axis label) that show the number of reads
need to be considered when removing close EcoRI sites on the S. pombe genome, i.e., remove all EcoRI sites with a close EcoRI neighbor or a close BamHI neighbor as described above in this subheading.
4.1.5 Collect Cut and Uncut Counts Within Window Near Cut Sites to Correct for Resection
10. Due to endogenous exonucleases that may be present in the chromatin preparations and trim DNA ends after RE cleavage, some fragment ends do not match the RE cut site positions any more, even though they were generated by the RE. Thus we
Fig. 6 (continued) (position of the marker along the y-axis) that map to the indicated combinations (y-axis label) of the plus strand of the chromosome and start (a, b) or end (c, d) (y-axis label) at a given RE site (0 on x-axis) that have a next neighbor (nn) site for the same RE at a given upstream (a, c) or downstream (b, d) distance (x-axis label) in bp (position of the marker along the x-axis). Analogous plots are also generated for minus end reads. We found that the strand identity does not matter, but rather the orientation (upstream versus downstream) relative to whether a read starts or ends at the RE site. (e, f) Plots for sequencing reads analogous to those in panels a and b, but for uncut fragments where the given RE site was not cut. The green lines correspond to the average at a given x-axis position. Our interpretation of the observed curve shapes is as follows. (a) If the sequencing reads stem from fragments starting with the RE site, then next neighbor RE sites upstream are irrelevant for scoring efficiency measured via the obtained read number as they are not contiguous with the sequenced fragment anymore due to the RE cut and therefore do not affect scoring by adapter ligation and sequencing. (d) The same is true for next neighbor RE sites downstream of an RE site where a read ends. In contrast, next neighbor RE sites downstream (b) or upstream (c) of an RE site where a read starts or ends, respectively, may be cut (are indeed cut to 100% in our calibration samples shown here) and therefore generate DNA fragments with two RE cut ends and a consistent length that may be shorter than the average length generated by the combination of one RE and one shearing end. Such fragments with two RE ends are scored more efficiently (= above the averages shown in panels a and d) than fragments with one RE and one sheared end.
The paucity in fragments 500 bp length are excluded from the analysis as Illumina sequencing becomes biased against longer fragments, which explains why the curves level off to the average level (similar to the green line in panels a and d) beyond 500 bp next neighbor distance. If shearing is more extensive, i.e., if the average fragment length is much shorter than 500 bp, then the curve will approach the average level at a next neighbor distance close to the average fragment length, as next neighbor sites beyond the average fragment length will not be contiguous anymore. Note that especially for the “X%” samples generated from chromatin digestion, the x-axis need not reflect the actual fragment length that gave rise to a certain sequencing read, but denotes a property of the genome sequence (distance to the next neighbor RE site). Nonetheless, for the calibration samples shown here, the x-axis does mostly reflect actual fragment length as virtually all RE ends stem from the 100% cut S. cerevisiae gDNA that was mixed with uncut S. cerevisiae gDNA. Fortuitous ends at RE sites due to shearing are negligible (e.g., less than 10 counts on the y-axis here). Finally, the bias due to next neighbor RE sites can also be apparent for uncut fragments where there is a potentially cut RE site within approx. 150 bp of the view point RE site (0 on x-axis) in either the upstream (e) or downstream (f) direction. While this is not very pronounced for the samples shown here, as the uncut fragments stem from mock digests in these calibration samples, it may be considerable in chromatin samples and also calls for excluding such next neighbor sites. The bioinformatics procedure that corrects for the next neighbor site bias by excluding these RE sites is detailed in Subheading 4.1.
need to count the starting and ending fragments not only at the exact cut positions, but also at some distance from them. The amount of strand resection varies between samples, so its correction needs to be tailored to each pair of samples without and with second RE digest. We define count windows for each fragment type: for read starts, $W_1 = W_2 = \{0, 1, 2, \dots, w\}$ to apply a window in the downstream direction, and for read ends, $W_3 = W_4 = \{0, -1, -2, \dots, -w\}$ to apply a window in the upstream direction. The algorithm to find the optimal value for $w$ is described at the end of this step. $C^i_\tau$ denotes the number of cut fragments in the sample without second RE digest (“cut sample”) and $A^i_\tau$ denotes the number of cut fragments in the sample with second RE digest (“all cut sample”):
$$C^i_\tau = \sum_{a \in W_\tau} c^1_\tau\left(x^i_\tau + a\right) \quad \text{and} \quad A^i_\tau = \sum_{a \in W_\tau} c^2_\tau\left(x^i_\tau + a\right)$$
$w$ is determined using the sample without second RE digest and the data of the S. cerevisiae genome, and the same value is then used for the sample with second RE digest and for the S. pombe genome as well. For mock-digested samples, we set $w = 5$ to average over fluctuations in the very low cut counts at a single position. In the case of ignored start counts of step 8, we set $C^i_\tau = \mathrm{NA}$ and $A^i_\tau = \mathrm{NA}$ for $\tau = 1, 2$, and the same for $\tau = 3, 4$ in the case of ignored end counts. For normal samples, we use the following algorithm, which makes sure that increasing $w$ by 1, 2, 3, 4, or 5 bp does not increase the summed counts within $w$ by more than 1%, correcting for cut counts from shearing. Calculate the mean counts (averaged over all cut sites) at each position from -200 bp to 200 bp away from the average cut site for start and end counts and both strands. These cut counts near the average cut site usually show a single peak at 0, but depending on the conditions there is also a decreasing shoulder downstream/upstream for starts/ends, respectively. Averaging the different types and strands (end counts need to be mirrored at 0 first) yields $m(d)$, $d$ being the distance to the average cut site. The cut counts need to be corrected by the average shearing cut counts, which we obtain 100–200 bp away from the cut site: $m_c(d) = m(d) - \langle m(d) \rangle_{d = 100, \dots, 200}$ ($\langle \dots \rangle$ indicating the average). We define the cumulative sum of counts by $S(w) = \sum_{d=0}^{w} m_c(d)$. Finally, we set $w$ equal to the first integer starting from 0 such that for all $n \in \{1, 2, 3, 4, 5\}$, the sum of the counts of the next $n$ positions, $S(w + n) - S(w)$, is less than 1% of $S(w)$. In our samples, typical values for $w$ ranged from 0 to 20, going up to 40 for samples with very strong resection.
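The window search described above can be sketched in a few lines. This is a minimal illustration rather than the authors' script: the function name `choose_window`, its parameters, and the assumption that the profile is already mirrored, averaged, and indexed by d = 0, 1, 2, ... are all hypothetical choices made for the example.

```python
import numpy as np

def choose_window(m, tol=0.01, lookahead=5, bg_range=(100, 200)):
    """Return the smallest w meeting the 1% criterion, or None if none does.

    m[d] is the mean cut count at distance d = 0, 1, 2, ... from the average
    cut site (types and strands already mirrored and averaged; assumption).
    """
    m = np.asarray(m, dtype=float)
    lo, hi = bg_range
    m_c = m - m[lo:hi + 1].mean()      # subtract mean shearing background
    S = np.cumsum(m_c)                 # S[w] = sum of m_c(d) for d = 0..w
    for w in range(len(S) - lookahead):
        # S(w + n) - S(w) for n = 1..lookahead
        gains = S[w + 1:w + lookahead + 1] - S[w]
        if np.all(gains < tol * S[w]):
            return w
    return None

# Synthetic profile (made-up numbers): a peak at 0 with a geometric shoulder
# on top of a constant shearing background of 2 counts.
d = np.arange(201)
profile = 1000.0 * 0.5 ** d + 2.0
print(choose_window(profile))
```

A profile with a sharp peak and no shoulder yields w = 0, whereas a slowly decaying shoulder pushes w upward, in line with the larger values reported for strongly resected samples.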
Uncut fragment counts at any RE cut site are not influenced by endogenous exonucleases as they are still occupied by a nucleosome or other protein that blocked the RE. For easier notation, we define the uncut counts at site $i$ by $U^i_\tau = u^1_\tau\left(x^i, \Delta s\right)$. The mean resection length is defined as $\sum_{d=0}^{w} d\, m_c(d) / S(w)$.
4.1.6 Occupancy Estimation by Cut–All Cut Method with Background Correction and Normalization
We seek to estimate the real accessibility $\alpha^i$ at cut site $i$ using the cut counts of the cut and all cut samples, taking into account a bias toward RE versus sheared fragment ends and effective sequencing probabilities. We begin by viewing $C^i_\tau$ and $A^i_\tau$ as random variables with the expectation values
$$E\left[C^i_\tau\right] = N_C\, \mu^i p^i_\tau \quad \text{with} \quad \mu^i = \alpha^i + \left(1 - \alpha^i\right) s, \qquad E\left[A^i_\tau\right] = N_A\, p^i_\tau,$$
where $N_C$ and $N_A$ are the number of cell cores in the samples without and with second RE digest, respectively, and $p^i_\tau$ is a factor that combines the sequencing probabilities and the PCR multiplication of fragments of type $\tau$ in the window $W_\tau$ at cut site $i$ and is an effective average of the $p^x_\tau$ with $x \in x^i_\tau + W_\tau$ described earlier. The probability that a given (longer) fragment will be cut by shearing within a fixed region of length $w + 1$ within the fragment is denoted by $s$ (not to be confused with the site shift variable $\Delta s$). Since the REs act before the shearing step, only the fraction that has not been cut by the RE can be cut in the shearing step, leading to $\mu^i = \alpha^i + \left(1 - \alpha^i\right) s$. We assume that in the sample with second RE digest, all counts near a cut site came from a cut of the RE and all counts far away from cut sites occurred due to shearing. We use these four estimators for $\mu^i$ and $\alpha^i$:
$$\hat{\mu}^i_\tau := \frac{C^i_\tau}{A^i_\tau} \frac{N_A}{N_C} \quad \text{and} \quad \hat{\alpha}^i_\tau := \frac{\hat{\mu}^i_\tau - s}{1 - s}.$$
The estimators for $\alpha^i$ are approximately unbiased, as
$$E\left[\hat{\alpha}^i_\tau\right] = \frac{E\left[\hat{\mu}^i_\tau\right] - s}{1 - s} = \frac{1}{1 - s} \left( \frac{N_A}{N_C}\, E\left[C^i_\tau\right] E\left[\frac{1}{A^i_\tau}\right] - s \right) \approx \alpha^i,$$
because the $C^i_\tau$ and $A^i_\tau$ are statistically independent, as they originate from different samples, and $E\left[1 / A^i_\tau\right] \approx 1 / E\left[A^i_\tau\right]$. Note that the
two sets $C^i_\tau$ and $A^i_\tau$ within themselves, however, are statistically dependent. We set $\hat{\alpha}^i_\tau = \mathrm{NA}$ if $A^i_\tau = 0$ or $A^i_\tau = \mathrm{NA}$ (due to a close neighbor in direction of $\tau$). To obtain $N_A / N_C$, we use the S. pombe gDNA spike-in cut sites, which are completely cut in both samples:
$$\frac{N_A}{N_C} = \frac{\left\langle A^i_\tau \right\rangle_{i \in J, \tau}}{\left\langle C^i_\tau \right\rangle_{i \in J, \tau}}$$
with $\left\langle X^i_\tau \right\rangle_{i, \tau}$ denoting the mean of $X^i_\tau$ over $i$ and $\tau$, ignoring NA values. Alternatively, the ratio of the number of sequenced S. pombe reads could be used but gave slightly worse results in our calibration runs. To estimate the probability $s$, we look at the set $Z$ of all genomic positions in S. cerevisiae that are further away than 300 bp from any cut site (including the ones with a close neighbor). At these positions, all counted starts and ends originate from shearing. Thus, $\left\langle c^1_\tau(x) \right\rangle_{x \in Z, \tau}$ is an estimator for $N_C\, s_1 \left\langle p^x_\tau \right\rangle_{x \in Z, \tau}$, with $s_1$ being the probability of shearing a long fragment at one fixed position. Since the $p^i_\tau$ are effectively averages of $p^x_\tau$ in the cut site window of site $x^i$, their averages over large regions of the genome are (to a good approximation) the same:
$$\left\langle p^x_\tau \right\rangle_{x \in Z, \tau} \cong \left\langle p^i_\tau \right\rangle_{i \in I, \tau} = \frac{1}{N_A} \left\langle E\left[A^i_\tau\right] \right\rangle_{i \in I, \tau} \cong \frac{1}{N_A} \left\langle A^i_\tau \right\rangle_{i \in I, \tau}$$
The average of $E\left[A^i_\tau\right]$ over $i$ and $\tau$ can be well approximated by the average of $A^i_\tau$, leading to our value for $s_1$:
$$s_1 = \frac{N_A}{N_C} \frac{\left\langle c^1_\tau(z) \right\rangle_{z \in Z, \tau}}{\left\langle A^i_\tau \right\rangle_{i \in I, \tau}}$$
Thus, $s_1$ is the (correctly normalized) ratio of the average fragment number at genomic positions where cuts can happen only by shearing (counts in the sample without second RE cleavage (“X%”) away from cut sites) and the average fragment number at genomic positions where cuts have to happen by the RE (counts in the sample with second RE cleavage (“all cut” or “100%”) at the cut sites). Using $s_1$, we can calculate the probability that a fragment is sheared at least once within a fixed window of length $w + 1$: $s = (w + 1) s_1$.
If a fragment is sheared more than once within a window of length $w + 1$, the new fragments within the window will be too small and filtered out before PCR and sequencing. Due to the stochasticity in the values for $C^i_\tau$ and $A^i_\tau$ for fixed $i$ and $\tau$, the estimators $\hat{\alpha}^i_\tau$ can be smaller than 0 or larger than 1, even though the values they estimate, i.e., $\alpha^i$, are between 0 and 1.
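As a hedged illustration of the two normalization quantities above, the following sketch computes $N_A / N_C$ from spike-in counts and $s_1$ and $s$ from background counts. The function names and array layouts (sites in rows, the four fragment types in columns, NaN standing in for NA) are assumptions made for this example, and the numbers are invented.

```python
import numpy as np

def spikein_ratio(A_spike, C_spike):
    """N_A / N_C as the ratio of mean all-cut to mean cut counts over the
    S. pombe spike-in sites (set J) and fragment types, ignoring NaN."""
    return np.nanmean(A_spike) / np.nanmean(C_spike)

def shearing_probability(c1_background, A_sites, n_ratio, w):
    """Return (s1, s): per-position shearing probability and the probability
    of shearing within a window of length w + 1."""
    s1 = n_ratio * np.mean(c1_background) / np.nanmean(A_sites)
    return s1, (w + 1) * s1

# Invented counts: rows are sites, columns are the four fragment types.
A_spike = np.array([[200.0, 220.0, 180.0, 200.0]])
C_spike = np.array([[100.0, 110.0, 90.0, 100.0]])
ratio = spikein_ratio(A_spike, C_spike)

c1_bg = np.array([0.0, 1.0, 0.0, 1.0])            # cut counts far from RE sites
A_sites = np.array([[400.0, 400.0, np.nan, 400.0]])
s1, s = shearing_probability(c1_bg, A_sites, ratio, w=9)
print(ratio, s1, s)
```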
The lowest possible value of $\hat{\alpha}^i_\tau$ is $-\frac{s}{1 - s}$, with $s \leq 0.15$ in most samples. It is not useful to restrict the estimators $\hat{\alpha}^i_\tau$ to [0; 1] before averaging, because a 100% accessibility test sample has measured accessibilities distributed around 1 in both directions. Capping the estimators $\hat{\alpha}^i_\tau$ at 1 would then give a mean value lower than 1. However, very large outliers influence the mean very strongly, even though the real value cannot be greater than 1; thus we cap the values for $\hat{\alpha}^i_\tau$ at 1.5 when averaging over $\tau$,
$$\hat{\alpha}^i = \left\langle \min\left(\hat{\alpha}^i_\tau, 1.5\right) \right\rangle_\tau,$$
to obtain one accessibility estimate for each cut site $i$. If $\hat{\alpha}^i_\tau = \mathrm{NA}$, it is ignored during the averaging step. To obtain the global accessibility, we average over all sites: $\hat{\alpha} = \left\langle \hat{\alpha}^i \right\rangle_{i \in I}$
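The cut–all cut estimator with capping can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the function name, the array layout (sites in rows, fragment types in columns), the use of NaN for NA, and the example counts are all hypothetical.

```python
import numpy as np

def accessibility_cut_all_cut(C, A, n_ratio, s, cap=1.5):
    """Per-site and global accessibility from cut (C) and all-cut (A) counts.

    C, A: arrays of shape (sites, 4); NaN marks ignored counts.
    n_ratio: N_A / N_C; s: shearing probability within the count window.
    """
    C = np.asarray(C, dtype=float)
    A = np.asarray(A, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        mu = C / A * n_ratio
    mu[(A == 0) | np.isnan(A)] = np.nan           # estimator set to NA
    alpha_tau = (mu - s) / (1.0 - s)
    capped = np.minimum(alpha_tau, cap)           # NaN entries stay NaN
    alpha_site = np.nanmean(capped, axis=1)       # average over types
    return alpha_site, np.nanmean(alpha_site)     # global average over sites

C = [[50, 50, 50, 50], [0, 0, 0, 0], [300, 50, np.nan, 50]]
A = [[100, 100, 100, 100], [100, 100, 100, 0], [100, 100, 100, 100]]
per_site, overall = accessibility_cut_all_cut(C, A, n_ratio=1.0, s=0.1)
print(per_site, overall)
```

The second site illustrates the lower bound $-s/(1-s)$, and the third shows how a single outlier is capped at 1.5 before averaging.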
When comparing the accessibility values of individual sites with the measured values from other assays, it does make sense to restrict the values of $\hat{\alpha}^i$ to [0; 1], since this gives the best estimate for each individual site.
4.1.7 Occupancy Estimation by Cut–Uncut Method with Background Correction
In the following we only use data from the sample without second RE digest to estimate the accessibility and use the ratio of the cut counts $C^i_\tau$ and the counts of uncut fragments $U^i_\tau$. We choose to only consider different PCR biases and sequencing biases between cut and uncut fragments, giving all cut fragments the sequencing probability $p$ and all uncut fragments the sequencing probability $q$. Summing up cut counts and uncut counts, we set $C^i := C^i_1 + C^i_2 + C^i_3 + C^i_4$ and $U^i := 2\left(U^i_1 + U^i_2\right)$ for sites without any neighbor within 300 bp, and $C^i := C^i_1 + C^i_2$ or $C^i := C^i_3 + C^i_4$ and $U^i := U^i_1 + U^i_2$ for sites with one upstream/downstream neighbor within 300 bp, respectively. Then define the ratio of cut and uncut fragments,
$$\hat{\kappa}^i := \frac{C^i}{U^i}$$
If the denominator is 0, we set $\hat{\kappa}^i = \infty$, which will lead to an accessibility of 1. Similar to the previous subheading, we have $E\left[C^i\right] = 4 N_C\, p \left(\alpha^i + \left(1 - \alpha^i\right) s_1 (w + 1)\right)$ with $s_1$ being the shearing probability per base pair, but now calculated only using the cut sample, i.e., the ratio of all cut counts away from sites and the sum of cut and uncut fragment counts away from cut sites. For $U^i$, we assume that the uncut fragment counts are given by fragments that have not been cut by the RE at $x^i_\tau$ and after that also not been cut by shearing at $x^i_\tau$.
The generally very low sequencing probabilities justify the assumption that $C^i_\tau$ and $U^i_\tau$ are “independent enough” to make the following approximation:
$$E\left[\hat{\kappa}^i\right] \approx \frac{E\left[C^i\right]}{E\left[U^i\right]} = \frac{4 N_C\, p \left(\alpha^i + \left(1 - \alpha^i\right) s_1 (w + 1)\right)}{4 N_C\, q \left(1 - \left(\alpha^i + \left(1 - \alpha^i\right) s_1\right)\right)}$$
The ratio of sequencing probabilities of cut and uncut fragments, the “uncut correction factor” $\gamma := p / q$, is fitted to the calibration samples as described in the subheading below. We then obtain the following estimator for $\alpha^i$:
$$\hat{\alpha}^i := 1 - \frac{1 + \sigma}{\hat{\kappa}^i \frac{1}{\gamma} + 1 - \sigma w},$$
thus
$$\hat{\alpha}^i = \frac{C^i - \sigma (w + 1) U^i \gamma}{C^i - \sigma (w + 1) U^i \gamma + (1 + \sigma) U^i \gamma} = \frac{C^i_{\mathrm{eff}}}{C^i_{\mathrm{eff}} + U^i_{\mathrm{eff}}}$$
with
$$\sigma := \frac{s_1}{1 - s_1} = \frac{1}{\gamma} \frac{\left\langle c^1_\tau(z) \right\rangle_{z \in Z, \tau}}{\left\langle u^1_\tau(z) \right\rangle_{z \in Z, \tau}}$$
being the corrected ratio of all cut counts away from all cut sites and all uncut fragment counts away from all cut sites. $C^i_{\mathrm{eff}} = C^i - \sigma (w + 1) U^i \gamma$ and $U^i_{\mathrm{eff}} = (1 + \sigma) U^i \gamma$ are the effective counts of cut and uncut fragments, respectively, both corrected for cuts in the shearing step and different sequencing probabilities of cut and uncut fragments. $C^i_{\mathrm{eff}} + U^i_{\mathrm{eff}}$ gives an “effective coverage” of cut and uncut fragments at the site $i$, and we ignore sites with an effective coverage below 40. This limit may be adapted for different applications. Finally, the genome-wide average accessibility is given by $\hat{\alpha} = \left\langle \hat{\alpha}^i \right\rangle_{i \in I}$.
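The cut–uncut estimator in its effective-count form can be sketched as below. The function names, the NaN convention for ignored sites, and the example numbers are assumptions for illustration only.

```python
import numpy as np

def sigma_from_background(c1_background, u1_background, gamma):
    """sigma = (1 / gamma) * <cut counts> / <uncut counts> away from cut sites."""
    return np.mean(c1_background) / np.mean(u1_background) / gamma

def accessibility_cut_uncut(C, U, sigma, gamma, w, min_coverage=40.0):
    """Per-site accessibility C_eff / (C_eff + U_eff); sites with an effective
    coverage below min_coverage are returned as NaN (ignored)."""
    C = np.asarray(C, dtype=float)
    U = np.asarray(U, dtype=float)
    C_eff = C - sigma * (w + 1) * U * gamma   # remove cuts caused by shearing
    U_eff = (1.0 + sigma) * U * gamma         # add back sheared uncut fragments
    coverage = C_eff + U_eff
    alpha = np.full(C.shape, np.nan)
    ok = coverage >= min_coverage
    alpha[ok] = C_eff[ok] / coverage[ok]
    return alpha

# Invented summed counts C^i and U^i for four sites.
alpha = accessibility_cut_uncut(C=[100, 200, 10, 80], U=[100, 0, 5, 20],
                                sigma=0.01, gamma=1.3, w=9)
print(alpha, np.nanmean(alpha))
```

A site with no uncut fragments (second entry) evaluates to 1, matching the convention $\hat{\kappa}^i = \infty$ in the text, and the low-coverage third site is dropped.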
Fit of γ using prepared calibration samples for RE digests: For each RE (AluI, BamHI, and HindIII) and each calibration sample $s$ with 0%, 10%, 30%, 50%, 70%, 90%, and 100% prepared fraction of uncut DNA molecules, i.e., prepared occupancy $\omega_s = 1 - \alpha_s$, we calculate the measured genome-wide average occupancy $\hat{\omega}_s(\gamma) = 1 - \hat{\alpha}_s(\gamma)$ for varying $\gamma$. We then choose $\gamma$ for each RE such that $\left\langle \left(\omega_s - \hat{\omega}_s(\gamma)\right)^2 \right\rangle_s$ is minimized. Additionally, we did a combined fit over all calibration samples of the three REs, to be used for REs for which no specific calibration samples were measured. The following table shows the best values for $\gamma$:
RE        AluI     BamHI    HindIII   Combined
γ min     1.282    1.279    1.333     1.300
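One simple way to carry out such a fit is a grid search over $\gamma$, sketched here on noise-free synthetic data. Everything in this block is an assumption for illustration: the helper names, the use of a single summed site per sample, the simulated counts, and the grid; the source does not state which optimizer was used.

```python
import numpy as np

def measured_occupancy(C, U, r_bg, gamma, w):
    """1 - mean accessibility for one sample at a trial gamma.
    r_bg is the raw cut/uncut count ratio away from cut sites."""
    sigma = r_bg / gamma
    C_eff = C - sigma * (w + 1) * U * gamma
    U_eff = (1.0 + sigma) * U * gamma
    return 1.0 - np.mean(C_eff / (C_eff + U_eff))

def fit_gamma(samples, grid):
    """Return the grid value minimizing the mean squared deviation between
    prepared and measured occupancy over all samples."""
    loss = [np.mean([(omega - measured_occupancy(C, U, r_bg, g, w)) ** 2
                     for omega, C, U, r_bg, w in samples]) for g in grid]
    return float(grid[int(np.argmin(loss))])

def simulate_sample(omega, gamma_true, s1=0.002, w=9, depth=1e4):
    """Noise-free expected counts for a prepared occupancy omega (toy model)."""
    alpha = 1.0 - omega
    C = np.array([depth * gamma_true * (alpha + (1 - alpha) * s1 * (w + 1))])
    U = np.array([depth * (1 - alpha) * (1 - s1)])
    r_bg = gamma_true * s1 / (1 - s1)
    return omega, C, U, r_bg, w

prepared = (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0)
samples = [simulate_sample(o, gamma_true=1.3) for o in prepared]
print(fit_gamma(samples, np.linspace(1.0, 1.6, 601)))
```

On real calibration data the counts are noisy, so the minimum is broader than in this toy case.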
References
1. Wal M, Pugh BF (2012) Genome-wide mapping of nucleosome positions in yeast using high-resolution MNase ChIP-Seq. Methods Enzymol 513:233–250. https://doi.org/10.1016/b978-0-12-391938-0.00010-0
2. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ (2015) ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol 109:21.29.21–21.29.29. https://doi.org/10.1002/0471142727.mb2129s109
3. Buenrostro JD, Wu B, Litzenburger UM, Ruff D, Gonzales ML, Snyder MP, Chang HY, Greenleaf WJ (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523(7561):486–490. https://doi.org/10.1038/nature14590
4. Kim YC, Grable JC, Love R, Greene PJ, Rosenberg JM (1990) Refinement of Eco RI endonuclease crystal structure: a revised protein chain tracing. Science (New York, NY) 249(4974):1307–1309. https://doi.org/10.1126/science.2399465
5. Gregory PD, Barbaric S, Horz W (1999) Restriction nucleases as probes for chromatin structure. Methods Mol Biol (Clifton, NJ) 119:417–425. https://doi.org/10.1385/1-59259-681-9:417
6. Oberbeckmann E, Wolff M, Krietenstein N, Heron M, Ellins JL, Schmid A, Krebs S, Blum H, Gerland U, Korber P (2019) Absolute nucleosome occupancy map for the Saccharomyces cerevisiae genome. Genome Res 29(12):1996–2009. https://doi.org/10.1101/gr.253419.119
7. Ohtsubo Y, Sakai K, Nagata Y, Tsuda M (2019) Properties and efficient scrap-and-build repairing of mechanically sheared 3’ DNA ends. Commun Biol 2:409. https://doi.org/10.1038/s42003-019-0660-7
8. Chereji RV, Eriksson PR, Ocampo J, Prajapati HK, Clark DJ (2019) Accessibility of promoter DNA is not the primary determinant of chromatin-mediated gene regulation. Genome Res 29(12):1985–1995. https://doi.org/10.1101/gr.249326.119
9. Biernacka A, Skrzypczak M, Zhu Y, Pasero P, Rowicka M, Ginalski K (2021) High-resolution, ultrasensitive and quantitative DNA double-strand break labeling in eukaryotic cells using i-BLESS. Nat Protoc 16(2):1034–1061. https://doi.org/10.1038/s41596-020-00448-3
10. Martinez-Campa C, Kent NA, Mellor J (1997) Rapid isolation of yeast plasmids as native chromatin. Nucleic Acids Res 25(9):1872–1873
11. Aris JP, Blobel G (1991) Isolation of yeast nuclei. Methods Enzymol 194:735–749. https://doi.org/10.1016/0076-6879(91)94056-i
12. Kizer KO, Xiao T, Strahl BD (2006) Accelerated nuclei preparation and methods for analysis of histone modifications in yeast. Methods (San Diego, Calif) 40(4):296–302. https://doi.org/10.1016/j.ymeth.2006.06.022
13. Reese JC, Zhang H, Zhang Z (2008) Isolation of highly purified yeast nuclei for nuclease mapping of chromatin structure. Methods Mol Biol (Clifton, NJ) 463:43–53. https://doi.org/10.1007/978-1-59745-406-3_3
14. Zhang Z, Reese JC (2006) Isolation of yeast nuclei and micrococcal nuclease mapping of nucleosome positioning. Methods Mol Biol (Clifton, NJ) 313:245–255. https://doi.org/10.1385/1-59259-958-3:245
15. Kiseleva E, Allen TD, Rutherford SA, Murray S, Morozova K, Gardiner F, Goldberg MW, Drummond SP (2007) A protocol for isolation and visualization of yeast nuclei by scanning electron microscopy (SEM). Nat Protoc 2(8):1943–1953. https://doi.org/10.1038/nprot.2007.251
16. Schmid A, Fascher KD, Horz W (1992) Nucleosome disruption at the yeast PHO5 promoter upon PHO5 induction occurs in the absence of DNA replication. Cell 71(5):853–864
17. Wolff MR (2020) Nucleosome occupancy and dynamics in yeast: genome-wide and promoter-level analyses and modeling. PhD, LMU München, München
18. Chereji RV, Ramachandran S, Bryson TD, Henikoff S (2018) Precise genome-wide mapping of single nucleosomes and linkers in vivo. Genome Biol 19(1):19. https://doi.org/10.1186/s13059-018-1398-0
Part III Methods for Profiling Chromatin Accessibility at the Single-Cell Level
Chapter 10
Single-Cell Joint Profiling of Open Chromatin and Transcriptome by Paired-Seq
Chenxu Zhu, Zhaoning Wang, and Bing Ren
Abstract
Simultaneous detection of chromatin accessibility and transcription from the same cells promises to greatly facilitate the dissection of cell-type-specific gene regulatory programs in complex tissues. Paired-seq enables joint analysis of open chromatin and nuclear transcriptome from up to a million cells in parallel. It achieves ultra-high-throughput single-cell multiomics with the use of a combinatorial barcoding strategy involving sequential ligation of multiplexed DNA barcodes to chromatin DNA fragments and reverse transcription products, followed by high-throughput DNA sequencing of the resulting DNA libraries and deconvolution of single-cell multiomic maps based on cell-specific barcodes.
Key words Paired-seq, Single-cell multiomics, Chromatin accessibility, Gene expression, Epigenome
1 Introduction
Cis-regulatory elements (CREs) play a fundamental role in gene regulation. In eukaryotic cells, binding of transcription regulators to CREs leads to depletion of nucleosomes and hypersensitivity to nucleases (such as DNase I or micrococcal nuclease) and Tn5 transposases [1–3]. Methods exploiting the hypersensitivity of active CREs have been developed to map these sequences in the genome, including DNase I hypersensitive sites sequencing (DNase-seq) [4], micrococcal nuclease digestion with deep sequencing (MNase-seq) [5], formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-seq) [6], and assay for transposase-accessible chromatin using sequencing (ATAC-seq) [7, 8]. The advancement of single-cell chromatin accessibility assays using droplet-based or combinatorial barcoding strategies [9–14] has enabled deconvolution of cell-type-specific transcriptional programs from mixed cell populations and primary tissues [15]. However, measuring individual molecular modalities one at a time in single cells does not permit a full view of the gene regulatory
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_10, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
process in complex tissues and pathogenesis [16, 17]. Co-assay of gene expression together with DNA methylation [18], histone modification [19], chromatin accessibility [20], or high-order chromatin conformation [21] can lead to a better understanding of cell-type-specific gene regulatory programs and enable a better assessment of the role of the epigenome in transcriptional regulation of each gene. Several methods have now been reported to enable joint analysis of nuclear transcriptome and accessible chromatin in individually isolated cells [22], or in thousands of single cells with plate-based combinatorial indexing [20] and droplet-based barcoding [23]. Paired-seq is a scalable single-cell technology that can assay gene expression and chromatin accessibility for up to a million single cells in parallel [24] with the use of a ligation-based combinatorial indexing strategy [25]. It begins with fragmentation of open chromatin by the Tn5 transposases followed by reverse transcription of nuclear mRNA by reverse transcriptase. DNA barcodes are then ligated in situ to the chromatin fragments and reverse transcription products (cDNA) in each nucleus through a split-and-pooling scheme in 96-well plates. Following nuclei lysis, the chromatin DNA and cDNA are amplified, and then split into two separate libraries corresponding to each molecular modality for next-generation DNA sequencing (Fig. 1). The entire Paired-seq procedure, not including DNA sequencing, spans 2 days (see Note 1). With a reasonable sequencing depth (number of sequenced reads per nucleus: 25,000 for DNA and 50,000 for RNA), Paired-seq can generate single-cell multiomics profiles with ~5000 unique tagmentation loci and ~10,000 unique transcripts per nucleus.
[Fig. 1 graphic. Panel a: two-day timeline with the steps nuclei preparation, tagmentation, reverse transcription, nuclei barcoding, and pre-amplification, followed by separate DNA and RNA library steps, with a pause point marked. Panel b: library construction schematic showing cellular barcode ligation, TdT tailing, pre-amplification, the FokI, NotI, and SbfI sites, and the resulting DNA and RNA libraries.]
Fig. 1 Overview of Paired-seq protocol. (a) Paired-seq protocol can be finished in 2 days, pause points are indicated. (b) Schematics for library preparation strategy of Paired-seq. Both DNA fragments from Tn5 tagmentation and cDNA were pre-amplified with a TdT-based strategy and then split into two portions. For DNA library, the 2nd adaptor was added by ligation; for RNA library, the 2nd adaptor was added by Tn5 tagmentation
2 Materials
2.1 Reagents Preparation
1. Tn5 protein was purified according to ref. [26] and Paired-seq primers (Table 1 and see Note 2).
2. RT primers (Table 2).
3. Tn5 barcodes (Table 2).
4. Barcode oligos (Tables 3 and 4).
5. Tris–HCl, pH 7.5 (Invitrogen, Cat# 15567027).
6. NaCl (Sigma, Cat# S7653).
7. Glycerol (Sigma, Cat# G5516).
8. DTT (Sigma, Cat# D9779).
9. 200 μL thin-wall PCR tubes (USA Scientific, Cat# 14023900).
10. 1.5 mL low-bind tubes (Eppendorf, Cat# 022431021).
11. 15 mL tubes (Corning Costar, Cat# 430790).
12. 96-well low-bind PCR plate (Eppendorf, Cat# 0030129512).
13. Sterile Reagent reservoir (Corning Costar, Cat# 07200127).
14. Thermocycler (Bio-Rad, T100).
2.2 Nuclei Isolation
1. Douncing buffer (DB) (1.5 mL per sample).
Reagents                                       Stock concentration   Volume     Final concentration
Sucrose (Sigma, Cat# S7903)                    1 M                   0.375 mL   250 mM
KCl (Sigma, Cat# P9333)                        2 M                   18.8 μL    25 mM
MgCl2 (Sigma, Cat# 63069)                      1 M                   7.5 μL     5 mM
Tris–HCl, pH 7.5 (Invitrogen, Cat# 15567027)   1 M                   15 μL      10 mM
DTT                                            1 M                   1.5 μL     1 mM
Protease Inhibitor (Sigma, Cat# 04693159001)   50X                   30 μL      1X
SUPERase IN (Invitrogen, Cat# AM2696)          20 U/μL               37.5 μL    0.5 U/μL
RNase OUT (Invitrogen, Cat# 10777019)          40 U/μL               18.8 μL    0.5 U/μL
H2O                                            NA                    996 μL     NA
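The volumes in the douncing buffer table follow from the dilution relation (stock concentration × stock volume = final concentration × total volume). The short sketch below recomputes them for a 1.5 mL batch; the helper names are hypothetical, and the stock and final concentrations are taken from the table, expressed in the same unit for each reagent.

```python
def stock_volume_ul(stock_conc, final_conc, total_ul):
    """Volume of stock needed to reach final_conc in a total of total_ul."""
    return total_ul * final_conc / stock_conc

# name: (stock, final), same unit within each pair (mM, X, or U/uL)
DOUNCING_BUFFER = {
    "Sucrose": (1000, 250),
    "KCl": (2000, 25),
    "MgCl2": (1000, 5),
    "Tris-HCl pH 7.5": (1000, 10),
    "DTT": (1000, 1),
    "Protease Inhibitor": (50, 1),
    "SUPERase IN": (20, 0.5),
    "RNase OUT": (40, 0.5),
}

def buffer_volumes(recipe, total_ul):
    """Per-reagent volumes in uL, with water filling up to total_ul."""
    vols = {name: stock_volume_ul(stock, final, total_ul)
            for name, (stock, final) in recipe.items()}
    vols["H2O"] = total_ul - sum(vols.values())
    return vols

for name, vol in buffer_volumes(DOUNCING_BUFFER, 1500).items():
    print(f"{name}: {vol:.2f} uL")
```

Changing `total_ul` scales the recipe for several samples; the table's 18.8 μL entries correspond to 18.75 μL before rounding.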
Table 1 Primer sequences
Name              Sequence (5′-3′)
pMENTs            5Phos/CTGTCTCTTATACACATCTddC
AdaptorA          TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
Linker-R02        CGAATGCTCTGGCCTCTCAAGCACGTGGAT
Blocker-R02       ATCCACGTGCTTGAGAGGCCAGAGCATTCG
Linker-R03        GGTCTGAGTTCGCACCGAAACATCGGCCAC
Quencher-R03      GTGGCCGATGTTTCGGTGCGAACTCAGACC
Anchor-FokI-GH    AAGCAGTGGTATCAACGCAGAGTGAAGGATGTGGGGGGGGG*H
P5-FokI           ACACTCTTTCCCTACACGACGCTCTTCCGATCT
P5c-NNDC-FokI     5Phos/NNDCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG
P5H-FokI          ACACTCTTTCCCTACACGACGCTCTTCCGATCTH
P5Hc-NNDC-FokI    5Phos/NNDCDAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTG
PA-F              CAGACGTGTGCTCTTCCGATCT
PA-R              AAGCAGTGGTATCAACGCAGAGT
N5XX              AATGATACGGCGACCACCGAGATCTACACXXXXXXXXTCGTCGGCAGCGTC
P7XX              CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTCAGACGTGTGCTCTTCCGATC
P5 Universal      AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATC*T
* denotes phosphorothioate bond modification; N denotes random bases; X denotes Illumina index sequences
Name
DNA_#01_RE
DNA_#02_RE
DNA_#03_RE
DNA_#04_RE
DNA_#05_RE
DNA_#06_RE
DNA_#07_RE
DNA_#08_RE
DNA_#09_RE
DNA_#10_RE
DNA_#11_RE
DNA_#12_RE
RNA_#01_RE
RNA_#02_RE
RNA_#03_RE
RNA_#04_RE
RNA_#05_RE
RNA_#06_RE
RNA_#07_RE
RNA_#08_RE
Well position
A1
A2
A3
A4
A5
A6
A7
A8
A9
A10
A11
A12
B1
B2
B3
B4
B5
B6
B7
B8
Table 2 Barcode plate #01
(continued)
/5Phos/AGGCCAGAGCATTCGTGTTGCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTTTACCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTTACGCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTGAATCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTACAGCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTAGCTCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTATGACCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTCATCCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGAGGGCGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGATCTAGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGACGAAGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGACCGTGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGAGTTGGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGATTACGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGATACGGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGAGAATGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGAACAGGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGAAGCTGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGAATGAGCGGCCGCAGATGTGTATAAGAGACAG
/5Phos/AGGCCAGAGCATTCGACATCGCGGCCGCAGATGTGTATAAGAGACAG
Sequence
Single-Cell Co-Assay of Open Chromatin and RNA 159
Name
RNA_#09_RE
RNA_#10_RE
RNA_#11_RE
RNA_#12_RE
RNA_#01_NRE
RNA_#02_NRE
RNA_#03_NRE
RNA_#04_NRE
RNA_#05_NRE
RNA_#06_NRE
RNA_#07_NRE
RNA_#08_NRE
RNA_#09_NRE
RNA_#10_NRE
RNA_#11_NRE
RNA_#12_NRE
Well position
B9
B10
B11
B12
C1
C2
C3
C4
C5
C6
C7
C8
C9
C10
C11
C12
Table 2 (continued)
/5Phos/AGGCCAGAGCATTCGTGGGCCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTTCTACCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTCGAACCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTCCGTCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTGTTGCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTTTACCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTTACGCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTGAATCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTACAGCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTAGCTCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTATGACCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTCATCCCTGCAGGNNNNNN
/5Phos/AGGCCAGAGCATTCGTGGGCCCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTTCTACCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTCGAACCTGCAGGTTTTTTTTTTTTTTTTVN
/5Phos/AGGCCAGAGCATTCGTCCGTCCTGCAGGTTTTTTTTTTTTTTTTVN
Sequence
160 Chenxu Zhu et al.
Single-Cell Co-Assay of Open Chromatin and RNA
161
Table 3 Barcode plate #02 Well position
Name
Sequence
A1
R02_#01
/5Phos/GTGCGAACTCAGACCAAACCGGATCCACGTGCTTGAG
A2
R02_#02
/5Phos/GTGCGAACTCAGACCAAACGTCATCCACGTGCTTGAG
A3
R02_#03
/5Phos/GTGCGAACTCAGACCAAAGATGATCCACGTGCTTGAG
A4
R02_#04
/5Phos/GTGCGAACTCAGACCAAATCCAATCCACGTGCTTGAG
A5
R02_#05
/5Phos/GTGCGAACTCAGACCAAATGAGATCCACGTGCTTGAG
A6
R02_#06
/5Phos/GTGCGAACTCAGACCAACACTGATCCACGTGCTTGAG
A7
R02_#07
/5Phos/GTGCGAACTCAGACCAACGTTTATCCACGTGCTTGAG
A8
R02_#08
/5Phos/GTGCGAACTCAGACCAAGAAGCATCCACGTGCTTGAG
A9
R02_#09
/5Phos/GTGCGAACTCAGACCAAGCCCTATCCACGTGCTTGAG
A10
R02_#10
/5Phos/GTGCGAACTCAGACCAAGCTACATCCACGTGCTTGAG
A11
R02_#11
/5Phos/GTGCGAACTCAGACCAATCTTGATCCACGTGCTTGAG
A12
R02_#12
/5Phos/GTGCGAACTCAGACCACAACACATCCACGTGCTTGAG
B1
R02_#13
/5Phos/GTGCGAACTCAGACCACAGTATATCCACGTGCTTGAG
B2
R02_#14
/5Phos/GTGCGAACTCAGACCACCAAGTATCCACGTGCTTGAG
B3
R02_#15
/5Phos/GTGCGAACTCAGACCACCCTAAATCCACGTGCTTGAG
B4
R02_#16
/5Phos/GTGCGAACTCAGACCACCCTTTATCCACGTGCTTGAG
B5
R02_#17
/5Phos/GTGCGAACTCAGACCACCTCTCATCCACGTGCTTGAG
B6
R02_#18
/5Phos/GTGCGAACTCAGACCACGATTGATCCACGTGCTTGAG
B7
R02_#19
/5Phos/GTGCGAACTCAGACCACGCAGAATCCACGTGCTTGAG
B8
R02_#20
/5Phos/GTGCGAACTCAGACCACGTAAAATCCACGTGCTTGAG
B9
R02_#21
/5Phos/GTGCGAACTCAGACCACTACCTATCCACGTGCTTGAG
B10
R02_#22
/5Phos/GTGCGAACTCAGACCACTCGGTATCCACGTGCTTGAG
B11
R02_#23
/5Phos/GTGCGAACTCAGACCACTGTCGATCCACGTGCTTGAG
B12
R02_#24
/5Phos/GTGCGAACTCAGACCACTTATGATCCACGTGCTTGAG
C1
R02_#25
/5Phos/GTGCGAACTCAGACCAGAAAGGATCCACGTGCTTGAG
C2
R02_#26
/5Phos/GTGCGAACTCAGACCAGAATCTATCCACGTGCTTGAG
C3
R02_#27
/5Phos/GTGCGAACTCAGACCAGACATAATCCACGTGCTTGAG
C4
R02_#28
/5Phos/GTGCGAACTCAGACCAGAGACCATCCACGTGCTTGAG
C5
R02_#29
/5Phos/GTGCGAACTCAGACCAGCCCAAATCCACGTGCTTGAG
C6
R02_#30
/5Phos/GTGCGAACTCAGACCAGCTATTATCCACGTGCTTGAG
C7
R02_#31
/5Phos/GTGCGAACTCAGACCAGGAGGTATCCACGTGCTTGAG
C8
R02_#32
/5Phos/GTGCGAACTCAGACCAGGGCTTATCCACGTGCTTGAG (continued)
162
Chenxu Zhu et al.
Table 3 (continued) Well position
Name
Sequence
C9
R02_#33
/5Phos/GTGCGAACTCAGACCAGGTGTAATCCACGTGCTTGAG
C10
R02_#34
/5Phos/GTGCGAACTCAGACCAGTGCTCATCCACGTGCTTGAG
C11
R02_#35
/5Phos/GTGCGAACTCAGACCAGTGGGAATCCACGTGCTTGAG
C12
R02_#36
/5Phos/GTGCGAACTCAGACCAGTTACGATCCACGTGCTTGAG
D1
R02_#37
/5Phos/GTGCGAACTCAGACCATAAGGGATCCACGTGCTTGAG
D2
R02_#38
/5Phos/GTGCGAACTCAGACCATCATTCATCCACGTGCTTGAG
D3
R02_#39
/5Phos/GTGCGAACTCAGACCATGGAACATCCACGTGCTTGAG
D4
R02_#40
/5Phos/GTGCGAACTCAGACCATGTGCCATCCACGTGCTTGAG
D5
R02_#41
/5Phos/GTGCGAACTCAGACCATTCACCATCCACGTGCTTGAG
D6
R02_#42
/5Phos/GTGCGAACTCAGACCATTCGAGATCCACGTGCTTGAG
D7
R02_#43
/5Phos/GTGCGAACTCAGACCCAAGCCTATCCACGTGCTTGAG
D8
R02_#44
/5Phos/GTGCGAACTCAGACCCACAAGGATCCACGTGCTTGAG
D9
R02_#45
/5Phos/GTGCGAACTCAGACCCACCTTAATCCACGTGCTTGAG
D10
R02_#46
/5Phos/GTGCGAACTCAGACCCAGAGTGATCCACGTGCTTGAG
D11
R02_#47
/5Phos/GTGCGAACTCAGACCCAGCGAAATCCACGTGCTTGAG
D12
R02_#48
/5Phos/GTGCGAACTCAGACCCAGGTCAATCCACGTGCTTGAG
E1
R02_#49
/5Phos/GTGCGAACTCAGACCCATAACTATCCACGTGCTTGAG
E2
R02_#50
/5Phos/GTGCGAACTCAGACCCATATCGATCCACGTGCTTGAG
E3
R02_#51
/5Phos/GTGCGAACTCAGACCCATCGATATCCACGTGCTTGAG
E4
R02_#52
/5Phos/GTGCGAACTCAGACCCATTACAATCCACGTGCTTGAG
E5
R02_#53
/5Phos/GTGCGAACTCAGACCCATTTCCATCCACGTGCTTGAG
E6
R02_#54
/5Phos/GTGCGAACTCAGACCCCAAATGATCCACGTGCTTGAG
E7
R02_#55
/5Phos/GTGCGAACTCAGACCCCACTTGATCCACGTGCTTGAG
E8
R02_#56
/5Phos/GTGCGAACTCAGACCCCGGATAATCCACGTGCTTGAG
E9
R02_#57
/5Phos/GTGCGAACTCAGACCCCGGTTTATCCACGTGCTTGAG
E10
R02_#58
/5Phos/GTGCGAACTCAGACCCCTAAGAATCCACGTGCTTGAG
E11
R02_#59
/5Phos/GTGCGAACTCAGACCCCTAGTCATCCACGTGCTTGAG
E12
R02_#60
/5Phos/GTGCGAACTCAGACCCCTGCAAATCCACGTGCTTGAG
F1
R02_#61
/5Phos/GTGCGAACTCAGACCCGACGTTATCCACGTGCTTGAG
F2
R02_#62
/5Phos/GTGCGAACTCAGACCCGAGTAAATCCACGTGCTTGAG
F3
R02_#63
/5Phos/GTGCGAACTCAGACCCGATTATATCCACGTGCTTGAG
F4
R02_#64
/5Phos/GTGCGAACTCAGACCCGTAGCAATCCACGTGCTTGAG
F5
R02_#65
/5Phos/GTGCGAACTCAGACCCGTCTGAATCCACGTGCTTGAG
F6
R02_#66
/5Phos/GTGCGAACTCAGACCCTACAGCATCCACGTGCTTGAG
F7
R02_#67
/5Phos/GTGCGAACTCAGACCCTCAATAATCCACGTGCTTGAG
F8
R02_#68
/5Phos/GTGCGAACTCAGACCCTCGTTGATCCACGTGCTTGAG
F9
R02_#69
/5Phos/GTGCGAACTCAGACCCTCTACGATCCACGTGCTTGAG
F10
R02_#70
/5Phos/GTGCGAACTCAGACCCTTGGGTATCCACGTGCTTGAG
F11
R02_#71
/5Phos/GTGCGAACTCAGACCGAAACTCATCCACGTGCTTGAG
F12
R02_#72
/5Phos/GTGCGAACTCAGACCGACTGTCATCCACGTGCTTGAG
G1
R02_#73
/5Phos/GTGCGAACTCAGACCGATACAGATCCACGTGCTTGAG
G2
R02_#74
/5Phos/GTGCGAACTCAGACCGCGATCAATCCACGTGCTTGAG
G3
R02_#75
/5Phos/GTGCGAACTCAGACCGCGTACTATCCACGTGCTTGAG
G4
R02_#76
/5Phos/GTGCGAACTCAGACCGCTCGAAATCCACGTGCTTGAG
G5
R02_#77
/5Phos/GTGCGAACTCAGACCGGAAGAAATCCACGTGCTTGAG
G6
R02_#78
/5Phos/GTGCGAACTCAGACCGGAGATTATCCACGTGCTTGAG
G7
R02_#79
/5Phos/GTGCGAACTCAGACCGGGCTAAATCCACGTGCTTGAG
G8
R02_#80
/5Phos/GTGCGAACTCAGACCGGGTATGATCCACGTGCTTGAG
G9
R02_#81
/5Phos/GTGCGAACTCAGACCGGTAACCATCCACGTGCTTGAG
G10
R02_#82
/5Phos/GTGCGAACTCAGACCGGTAGTGATCCACGTGCTTGAG
G11
R02_#83
/5Phos/GTGCGAACTCAGACCGGTGAAAATCCACGTGCTTGAG
G12
R02_#84
/5Phos/GTGCGAACTCAGACCGTAATCGATCCACGTGCTTGAG
H1
R02_#85
/5Phos/GTGCGAACTCAGACCGTATAAGATCCACGTGCTTGAG
H2
R02_#86
/5Phos/GTGCGAACTCAGACCGTCAGACATCCACGTGCTTGAG
H3
R02_#87
/5Phos/GTGCGAACTCAGACCGTCCCTTATCCACGTGCTTGAG
H4
R02_#88
/5Phos/GTGCGAACTCAGACCGTGCCATATCCACGTGCTTGAG
H5
R02_#89
/5Phos/GTGCGAACTCAGACCGTGGTCTATCCACGTGCTTGAG
H6
R02_#90
/5Phos/GTGCGAACTCAGACCGTTCTCCATCCACGTGCTTGAG
H7
R02_#91
/5Phos/GTGCGAACTCAGACCGTTGCTTATCCACGTGCTTGAG
H8
R02_#92
/5Phos/GTGCGAACTCAGACCTACCCGAATCCACGTGCTTGAG
H9
R02_#93
/5Phos/GTGCGAACTCAGACCTAGACGAATCCACGTGCTTGAG
H10
R02_#94
/5Phos/GTGCGAACTCAGACCTAGTCACATCCACGTGCTTGAG
H11
R02_#95
/5Phos/GTGCGAACTCAGACCTCACATCATCCACGTGCTTGAG
H12
R02_#96
/5Phos/GTGCGAACTCAGACCTCAGCTGATCCACGTGCTTGAG
2. Nuclei isolation buffer (NIB) (1 mL per sample).

| Reagents | Stock concentration | Volume | Final concentration |
| --- | --- | --- | --- |
| IGEPAL CA-630 (Sigma, Cat# I8896) | 10% | 20 μL | 0.2% |
| BSA in DPBS (Sigma, Cat# A1595) | 10% | 0.5 mL | 5% |
| Protease Inhibitor | 50X | 20 μL | 1X |
| SUPERase IN | 20 U/μL | 25 μL | 0.5 U/μL |
| RNase OUT | 40 U/μL | 12.5 μL | 0.5 U/μL |
| DPBS (Gibco, Cat# 14190136) | 1X | 422.5 μL | NA |

3. 5% Triton X-100 (diluted from Sigma, Cat# T9284).
4. Dounce tissue grinder set (1.0 mL) (KIMBLE, Cat# DWK885300-0001).
5. Celltrics filters (30 μm) (Sysmex, Cat# 04-0042-2316).
6. Axygen Maximum Recovery tube (Corning, Cat# MCT-150L-C).
7. TC20 Cell Counter (Bio-Rad).
8. 1.5 mL low-bind tubes (Eppendorf, Cat# 022431021).

2.3 Chromatin Tagmentation
1. 10 mM PitStop2 (Millipore, Cat# SML1169).
2. 2X Tagmentation Buffer (10 mL, store at 4 °C).

| Reagents | Stock concentration | Volume | Final concentration |
| --- | --- | --- | --- |
| Tris–Ac, pH 7.5 (Sigma, Cat# 93337) | 1 M | 660 μL | 66 mM |
| KAc (Sigma, Cat# P5708) | 3 M | 440 μL | 132 mM |
| MgAc2 (Sigma, Cat# M2545) | 1 M | 200 μL | 20 mM |
| DMF (Millipore, Cat# DX1730) | NA | 3200 μL | 32% |
| Ultrapure H2O | 1X | 5500 μL | NA |

3. Tagmentation Mix.

| Reagents | Stock concentration | Volume |
| --- | --- | --- |
| 2X Tagmentation Buffer | 2X | 66 μL |
| RNase OUT | 40 U/μL | 3.3 μL |
| SUPERase IN | 20 U/μL | 6.6 μL |
| Proteinase Inhibitor cocktail | 50X | 2.7 μL |
| PitStop2 (Sigma, Cat# SML1169) | 10 mM | 1 μL |
| Ultrapure H2O | NA | 36.6 μL |

4. 40 mM EDTA (diluted from Invitrogen, Cat# AM9261).
5. Loaded Tn5 (see step 3 of Subheading 3.1).
6. ThermoMixer (Eppendorf ThermoMixer R).

2.4 Reverse Transcription
1. NEBuffer 3.1 (NEB, Cat# B7203S).
2. Maxima H minus reverse transcriptase (Invitrogen, Cat# EP0751).
3. 5% Triton X-100 (diluted from Sigma, Cat# T9284).
4. RT Mix.

| Reagents | Stock concentration | Volume |
| --- | --- | --- |
| 5X RT Buffer (with Maxima H minus reverse transcriptase) | 5X | 52.8 μL |
| PBS | 1X | 52.8 μL |
| dNTP | 10 mM | 13.2 μL |
| RNase OUT | 40 U/μL | 1.65 μL |
| SUPERase IN | 20 U/μL | 3.3 μL |
| Ultrapure H2O | NA | 61 μL |

5. Thermocycler (Bio-Rad, Cat# T100).

2.5 Adding DNA Barcodes
1. Ligation Mix.

| Reagents | Stock concentration | Volume |
| --- | --- | --- |
| T4 DNA Ligase Buffer (NEB, Cat# B0202S) | 10X | 500 μL |
| BSA (NEB, Cat# B9000S) | 20 mg/mL | 50 μL |
| NEBuffer 3.1 (NEB, Cat# B7203S) | 10X | 100 μL |
| Ultrapure H2O | NA | 2250 μL |

2. R02 Blocking Solution (see step 6 of Subheading 3.1).
3. R03 Termination Solution (see step 6 of Subheading 3.1).
4. T4 DNA Ligase (NEB, Cat# M0202L).
5. R02 Barcoding Working Plate (see step 4 of Subheading 3.1).
6. R03 Barcoding Working Plate (see step 4 of Subheading 3.1).
7. Proteinase K (NEB, Cat# P8107S).
8. SPRI beads (Beckman Coulter, Cat# B23319).
9. 80% EtOH.
10. 200 μL thin-wall PCR tubes or 96-well PCR plate.
11. Eppendorf ThermoMixer.
12. PCR plate film (Bio-Rad, Microseal B, Cat# MSB1001).

2.6 Library Preamplification
1. Terminal Transferase (NEB, Cat# M0315S).
2. 1 mM dCTP (NEB, Cat# N0446S).
3. Anchor Mix (15 μL per sample).

| Reagents | Stock concentration | Volume |
| --- | --- | --- |
| 5X KAPA reaction buffer | 5X | 6 μL |
| dNTP | 10 mM | 0.6 μL |
| Anchor-FokI-GH (Table 1) | 10 μM | 0.6 μL |
| Ultrapure H2O | NA | 7.2 μL |
| KAPA HiFi HS (KAPA, Cat# KK2502) | NA | 0.6 μL |

4. Preamp Mix (20 μL per sample).

| Reagents | Stock concentration | Volume |
| --- | --- | --- |
| 5X KAPA reaction buffer | 5X | 4 μL |
| dNTP | 10 mM | 0.5 μL |
| PA-F (Table 1) | 10 μM | 2 μL |
| PA-R (Table 1) | 10 μM | 2 μL |
| Ultrapure H2O | NA | 11 μL |
| KAPA HiFi HS (KAPA, Cat# KK2502) | NA | 0.5 μL |

5. Qubit dsDNA HS Assay Kit (Invitrogen, Cat# Q32854).
6. SPRI beads (Beckman).
7. Qubit (ThermoFisher Scientific, Cat# Q33239).
8. 200 μL thin-wall PCR tubes.
9. Thermocycler.

2.7 Library Splitting
1. FokI (NEB, Cat# R0109S).
2. NotI-HF (NEB, Cat# R3189).
3. SbfI-HF (NEB, Cat# R3642).
4. Adaptor Mix (see step 5 of Subheading 3.1).
5. Nextera XT DNA library preparation kit (Illumina, Cat# FC-131-1024).
6. SPRI beads (Beckman).
7. 200 μL thin-wall PCR tubes.
8. Thermocycler.
9. Magnetic separation rack (Bel-Art, Cat# F19900-0003).
2.8 Library Amplification
1. Illumina TruSeq i7 index primers (NEB, Cat# E7600S).
2. Illumina TruSeq i5 index primers (NEB, Cat# E7600S).
3. Illumina Nextera i5 index primers (Illumina, Cat# FC-131-2001).
4. NEBNext 2X HiFi PCR master mix (NEB, Cat# M0541S).
5. KAPA qPCR quantification kit for Illumina (KAPA, Cat# KK4923/4933/4943/4953/4973).
6. SPRI beads.
7. 200 μL thin-wall PCR tubes.
8. Thermocycler.
9. Magnetic separation rack.
10. Agilent Tapestation (Agilent 4200).

2.9 Sequencing and Data Preprocessing
1. Illumina sequencer: HiSeq 2500/4000, NextSeq 550/2000, and NovaSeq 6000 have been tested and are compatible with Paired-seq libraries.
Table 4 Barcode plate #02

| Well position | Name | Sequence |
| --- | --- | --- |
| A1 | R04_#01 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAACCGGGTGGCCGATGTTTCG |
| A2 | R04_#02 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAACGTCGTGGCCGATGTTTCG |
| A3 | R04_#03 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAAGATGGTGGCCGATGTTTCG |
| A4 | R04_#04 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAATCCANGTGGCCGATGTTTCG |
| A5 | R04_#05 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAATGAGNGTGGCCGATGTTTCG |
| A6 | R04_#06 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAACACTGNGTGGCCGATGTTTCG |
| A7 | R04_#07 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAACGTTTNNGTGGCCGATGTTTCG |
| A8 | R04_#08 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGAAGCNNGTGGCCGATGTTTCG |
| A9 | R04_#09 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGCCCTNNGTGGCCGATGTTTCG |
| A10 | R04_#10 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAAGCTACNNNGTGGCCGATGTTTCG |
| A11 | R04_#11 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAATCTTGNNNGTGGCCGATGTTTCG |
| A12 | R04_#12 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACAACACNNNGTGGCCGATGTTTCG |
| B1 | R04_#13 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACAGTATGTGGCCGATGTTTCG |
| B2 | R04_#14 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCAAGTGTGGCCGATGTTTCG |
| B3 | R04_#15 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCCTAAGTGGCCGATGTTTCG |
| B4 | R04_#16 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCCTTTNGTGGCCGATGTTTCG |
| B5 | R04_#17 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACCTCTCNGTGGCCGATGTTTCG |
| B6 | R04_#18 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGATTGNGTGGCCGATGTTTCG |
| B7 | R04_#19 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGCAGANNGTGGCCGATGTTTCG |
| B8 | R04_#20 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACGTAAANNGTGGCCGATGTTTCG |
| B9 | R04_#21 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTACCTNNGTGGCCGATGTTTCG |
| B10 | R04_#22 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTCGGTNNNGTGGCCGATGTTTCG |
| B11 | R04_#23 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTGTCGNNNGTGGCCGATGTTTCG |
| B12 | R04_#24 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNACTTATGNNNGTGGCCGATGTTTCG |
| C1 | R04_#25 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAAAGGGTGGCCGATGTTTCG |
| C2 | R04_#26 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAATCTGTGGCCGATGTTTCG |
| C3 | R04_#27 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGACATAGTGGCCGATGTTTCG |
| C4 | R04_#28 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGAGACCNGTGGCCGATGTTTCG |
| C5 | R04_#29 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGCCCAANGTGGCCGATGTTTCG |
| C6 | R04_#30 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGCTATTNGTGGCCGATGTTTCG |
| C7 | R04_#31 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGAGGTNNGTGGCCGATGTTTCG |
| C8 | R04_#32 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGGCTTNNGTGGCCGATGTTTCG |
| C9 | R04_#33 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGGTGTANNGTGGCCGATGTTTCG |
| C10 | R04_#34 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTGCTCNNNGTGGCCGATGTTTCG |
| C11 | R04_#35 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTGGGANNNGTGGCCGATGTTTCG |
| C12 | R04_#36 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNAGTTACGNNNGTGGCCGATGTTTCG |
| D1 | R04_#37 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATAAGGGGTGGCCGATGTTTCG |
| D2 | R04_#38 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATCATTCGTGGCCGATGTTTCG |
| D3 | R04_#39 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATGGAACGTGGCCGATGTTTCG |
| D4 | R04_#40 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATGTGCCNGTGGCCGATGTTTCG |
| D5 | R04_#41 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATTCACCNGTGGCCGATGTTTCG |
| D6 | R04_#42 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNATTCGAGNGTGGCCGATGTTTCG |
| D7 | R04_#43 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAAGCCTNNGTGGCCGATGTTTCG |
| D8 | R04_#44 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCACAAGGNNGTGGCCGATGTTTCG |
| D9 | R04_#45 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCACCTTANNGTGGCCGATGTTTCG |
| D10 | R04_#46 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGAGTGNNNGTGGCCGATGTTTCG |
| D11 | R04_#47 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGCGAANNNGTGGCCGATGTTTCG |
| D12 | R04_#48 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCAGGTCANNNGTGGCCGATGTTTCG |
| E1 | R04_#49 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATAACTGTGGCCGATGTTTCG |
| E2 | R04_#50 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATATCGGTGGCCGATGTTTCG |
| E3 | R04_#51 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATCGATGTGGCCGATGTTTCG |
| E4 | R04_#52 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATTACANGTGGCCGATGTTTCG |
| E5 | R04_#53 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCATTTCCNGTGGCCGATGTTTCG |
| E6 | R04_#54 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCAAATGNGTGGCCGATGTTTCG |
| E7 | R04_#55 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCACTTGNNGTGGCCGATGTTTCG |
| E8 | R04_#56 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCGGATANNGTGGCCGATGTTTCG |
| E9 | R04_#57 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCGGTTTNNGTGGCCGATGTTTCG |
| E10 | R04_#58 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTAAGANNNGTGGCCGATGTTTCG |
| E11 | R04_#59 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTAGTCNNNGTGGCCGATGTTTCG |
| E12 | R04_#60 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCCTGCAANNNGTGGCCGATGTTTCG |
| F1 | R04_#61 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGACGTTGTGGCCGATGTTTCG |
| F2 | R04_#62 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGAGTAAGTGGCCGATGTTTCG |
| F3 | R04_#63 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGATTATGTGGCCGATGTTTCG |
| F4 | R04_#64 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGTAGCANGTGGCCGATGTTTCG |
| F5 | R04_#65 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCGTCTGANGTGGCCGATGTTTCG |
| F6 | R04_#66 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTACAGCNGTGGCCGATGTTTCG |
| F7 | R04_#67 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCAATANNGTGGCCGATGTTTCG |
| F8 | R04_#68 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCGTTGNNGTGGCCGATGTTTCG |
| F9 | R04_#69 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTCTACGNNGTGGCCGATGTTTCG |
| F10 | R04_#70 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNCTTGGGTNNNGTGGCCGATGTTTCG |
| F11 | R04_#71 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGAAACTCNNNGTGGCCGATGTTTCG |
| F12 | R04_#72 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGACTGTCNNNGTGGCCGATGTTTCG |
| G1 | R04_#73 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGATACAGGTGGCCGATGTTTCG |
| G2 | R04_#74 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCGATCAGTGGCCGATGTTTCG |
| G3 | R04_#75 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCGTACTGTGGCCGATGTTTCG |
| G4 | R04_#76 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGCTCGAANGTGGCCGATGTTTCG |
| G5 | R04_#77 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGAAGAANGTGGCCGATGTTTCG |
| G6 | R04_#78 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGAGATTNGTGGCCGATGTTTCG |
| G7 | R04_#79 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGGCTAANNGTGGCCGATGTTTCG |
| G8 | R04_#80 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGGTATGNNGTGGCCGATGTTTCG |
| G9 | R04_#81 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTAACCNNGTGGCCGATGTTTCG |
| G10 | R04_#82 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTAGTGNNNGTGGCCGATGTTTCG |
| G11 | R04_#83 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGGTGAAANNNGTGGCCGATGTTTCG |
| G12 | R04_#84 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTAATCGNNNGTGGCCGATGTTTCG |
| H1 | R04_#85 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTATAAGGTGGCCGATGTTTCG |
| H2 | R04_#86 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTCAGACGTGGCCGATGTTTCG |
| H3 | R04_#87 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTCCCTTGTGGCCGATGTTTCG |
| H4 | R04_#88 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTGCCATNGTGGCCGATGTTTCG |
| H5 | R04_#89 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTGGTCTNGTGGCCGATGTTTCG |
| H6 | R04_#90 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTTCTCCNGTGGCCGATGTTTCG |
| H7 | R04_#91 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNGTTGCTTNNGTGGCCGATGTTTCG |
| H8 | R04_#92 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTACCCGANNGTGGCCGATGTTTCG |
| H9 | R04_#93 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTAGACGANNGTGGCCGATGTTTCG |
| H10 | R04_#94 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTAGTCACNNNGTGGCCGATGTTTCG |
| H11 | R04_#95 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTCACATCNNNGTGGCCGATGTTTCG |
| H12 | R04_#96 | CAGACGTGTGCTCTTCCGATCTNNNNNNNNNNTCAGCTGNNNGTGGCCGATGTTTCG |
2. Computation resources: A server with at least 16 cores and 128 GB RAM is recommended. Storage requirements depend on the number of cells analyzed; typically, 1 TB of storage is needed to analyze 100 k cells.
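Several mixes in Subheading 2 are specified per sample (for example, the Anchor Mix in Subheading 2.6 totals 15 μL per sample). When many sub-libraries are processed in parallel, a master mix can be scaled as in the minimal Python sketch below. The sketch is not part of the published protocol: the function name `scale_mix` and the 10% pipetting overage are assumptions chosen for illustration.

```python
# Per-sample Anchor Mix volumes in microliters, copied from Subheading 2.6.
ANCHOR_MIX_UL = {
    "5X KAPA reaction buffer": 6.0,
    "dNTP (10 mM)": 0.6,
    "Anchor-FokI-GH (10 uM)": 0.6,
    "Ultrapure H2O": 7.2,
    "KAPA HiFi HS": 0.6,
}


def scale_mix(per_sample_ul, n_samples, overage=0.10):
    """Scale a per-sample recipe to n_samples plus a pipetting overage.

    The 10% default overage is an assumed convenience value,
    not a requirement of the protocol.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    factor = n_samples * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_sample_ul.items()}


# Example: master mix for 8 sub-libraries.
batch = scale_mix(ANCHOR_MIX_UL, n_samples=8)
for name, vol in batch.items():
    print(f"{name:28s}{vol:8.2f} uL")
print(f"{'Total':28s}{sum(batch.values()):8.2f} uL")
```

The same function can be applied to the Preamp Mix by supplying its per-sample volumes.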
3 Methods
3.1 Reagents Preparation
1. All oligo DNA sequences in this subheading are listed in Tables 1 and 2. To prepare the RT primer mix, mix 12.5 μL of barcoded T15VN primer (RNA_#XX_RE, 100 μM), 12.5 μL barcoded N6 primer (RNA_#XX_NRE, 100 μM), and 75 μL ultrapure nuclease-free water in PCR tubes. Vortex to mix and store at -20 °C.
2. To prepare barcoded Tn5 adaptors, mix 10 μL barcoded Tn5 adaptor (DNA_#XX_RE, 100 μM) and 10 μL pMENTs (100 μM) in PCR tubes. Using a thermocycler, heat the mix at 95 °C for 5 min and slowly cool down to 20 °C (0.1 °C/s). Store the annealed adaptors at -20 °C or immediately use for step 3.
3. To prepare barcoded Tn5 complex, add 5 μL of barcoded annealed Tn5 adaptors (from step 2) to 1.5 mL low-bind tubes. Add 35 μL 0.5 mg/mL unloaded Tn5 protein to each tube and pipette to mix 5 times. Then vortex to mix for 3–5 s and spin down quickly. Incubate at room temperature for 30 min, then transfer to ice and sit for 5 min. Store at -20 °C.
4. To prepare R02 and R03 barcode plates, add 6 μL of R02 or R03 barcoded oligo (BC Plate#02 or BC Plate#03, 100 μM), 5.5 μL of Linker-R02 or Linker-R03 oligo (100 μM), and 38.5 μL ultrapure nuclease-free water to each well of a low-bind 96-well PCR plate and seal the plate (annealing plate). Heat at 95 °C for 5 min and slowly cool down to 20 °C (0.1 °C/s). Aliquot 10 μL of annealed barcoded oligos from each well of annealing plate to four low-bind 96-well PCR plates (working plates). Store the working plates at -20 °C.
5. To prepare Adaptor Mix: (a) prepare P5-complex (25 μL 100 μM P5-FokI and 25 μL 100 μM P5c-NNDC-FokI) and P5H-complex (25 μL 100 μM P5H-FokI and 25 μL 100 μM P5Hc-NNDC-FokI) in two different tubes; (b) in a thermocycler, heat the mixtures for 5 min at 95 °C and slowly cool down to 20 °C (-0.1 °C/s); (c) mix 15 μL of P5-complex with 45 μL of P5H-complex on ice and pipette to mix, then add 240 μL cold ultrapure water (to dilute from 50 to 10 μM) and store at -20 °C.
6. To prepare R02 Blocking Solution, add 264 μL 100 μM Blocker-R02, 250 μL 10X T4 DNA Ligase Buffer, and 486 μL ultrapure water to a 1.5 mL tube and mix. To prepare R03 Termination Solution, add 264 μL 100 μM QuencherR02, 500 μL 0.5 M EDTA, and 236 μL ultrapure water to a 1.5 mL tube and mix. Both R02 Blocking Solution and R03 Termination Solution should be kept on ice for later use.

3.2 Nuclei Isolation
1. Nuclei isolation starts from a single-cell suspension, and the preferred preparation protocol varies with the sample type [27]. Here we take nuclei preparation from frozen mouse brain as an example (see Note 3).
2. For each sample, freshly prepare 1.5 mL douncing buffer (DB) and 1 mL nuclei isolation buffer (NIB) before each experiment. Prechill all tubes and tools. Set the centrifuge to 4 °C (see Note 4).
3. Wash the douncer with 1 mL of ultrapure water. Prechill the douncer and pestles (1 mL) on ice (avoid contamination by placing them on parafilm or in a tube).
4. Add 0.5 mL of DB into the douncer, and then add 10 μL 5% Triton X-100.
5. Transfer ~20–50 mg dissected frozen mouse brain tissue directly to the douncer with DB.
6. Apply the loose pestle gently 5–10 times on ice, and avoid introducing bubbles.
7. Apply the tight pestle gently 15–30 times on ice, and avoid introducing bubbles.
8. Filter the single-cell suspension with a 30 μm Celltrics filter into a 1.5 mL Axygen Maximum Recovery tube. Spin down at 1000 × g for 10 min at 4 °C and carefully discard the supernatant.
9. Gently resuspend the pellet in 0.5 mL of NIB. Spin down again at 1000 × g for 10 min at 4 °C, and discard the supernatant.
10. Gently resuspend the pellet in 0.5 mL of NIB and incubate on ice for 5–10 min. Take out 10 μL to measure the nuclei concentration with the cell counter.
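Once the nuclei concentration has been measured in step 10, the volume of suspension needed per tagmentation tube follows directly from the target of 100–200 k nuclei per tube across 12 tubes (Subheading 3.3). The Python sketch below illustrates that arithmetic. The function name, the nuclei/μL unit for the counter reading, and the 490 μL of usable suspension (0.5 mL minus the 10 μL counting aliquot) are assumptions for illustration.

```python
def aliquot_volume_ul(conc_per_ul, nuclei_per_tube=100_000, n_tubes=12,
                      available_ul=490.0):
    """Return the suspension volume (uL) to pipette into each tube.

    conc_per_ul:  nuclei per microliter, converted from the counter reading.
    available_ul: assumed usable volume (0.5 mL NIB minus 10 uL for counting).
    Raises ValueError when the suspension cannot fill all tubes.
    """
    if conc_per_ul <= 0:
        raise ValueError("concentration must be positive")
    per_tube = nuclei_per_tube / conc_per_ul
    if per_tube * n_tubes > available_ul:
        raise ValueError(
            f"need {per_tube * n_tubes:.0f} uL but only {available_ul:.0f} uL "
            "is available; use fewer nuclei per tube or fewer tubes"
        )
    return per_tube


# Example: a reading of 3000 nuclei/uL with the default 100 k nuclei per tube.
print(f"{aliquot_volume_ul(3000):.1f} uL per tube")
```

If the check fails, Note 5 describes the expected drop in recovery when fewer nuclei are used.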
3.3 Chromatin Tagmentation
1. Freshly prepare the Tagmentation Mix and keep on ice.
2. Label 12 tubes for tagmentation. Aliquot a total of 1200–2400 k nuclei into 12 tubes on ice, each tube with 100–200 k nuclei. Different samples or replicates can be multiplexed here and are distinguished by their first-round barcode (sample barcode) (see Note 5).
3. Spin down the 12 tubes at 1000 × g for 10 min at 4 °C, and carefully discard the supernatant. Samples should be kept on ice.
4. For each tube, resuspend the nuclei pellet in 9 μL Tagmentation Mix. Add 1 μL of barcoded Tn5 into the corresponding tube.
5. Incubate in a ThermoMixer set at 37 °C, 550 rpm for 30 min.
6. Immediately add 5 μL of 40 mM EDTA and gently pipette to mix. Spin down at 1000 × g for 10 min at 4 °C, and carefully remove all the supernatant. Keep the nuclei on ice and proceed to Subheading 3.4 immediately.

3.4 Reverse Transcription
1. Freshly prepare the RT mix and keep on ice.
2. Add 4 μL of each barcoded RT primer mix to the corresponding one of 12 200 μL PCR tubes.
3. Resuspend each of the 12 nuclei pellets in 14 μL RT mix and transfer it to the corresponding 200 μL PCR tube containing barcoded RT primers from the previous step.
4. Add 2 μL Maxima H minus reverse transcriptase to each tube. Tap to mix and briefly spin down.
5. Run the reverse transcription program in a thermocycler as follows:

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 50 | 10 min |
| 2 | 8 | 12 s |
|  | 15 | 45 s |
|  | 20 | 45 s |
|  | 30 | 30 s |
|  | 42 | 2 min |
|  | 50 | 5 min; repeat step 2 for additional 2 cycles |
| 3 | 50 | 10 min |
| 4 | 12 | Hold |
6. Transfer the 12 tubes to ice. Keep on ice and pool all nuclei into a 1.5 mL Axygen Maximum Recovery tube, add 4.8 μL 5% Triton X-100, tap to mix, and quickly spin down.
7. Centrifuge to pellet the nuclei at 1000 × g for 10 min at 4 °C, and carefully discard the supernatant.
8. Resuspend the nuclei in 1 mL 1X NEBuffer 3.1 and proceed to Subheading 3.5 immediately.
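Because Note 4 requires nuclei preparation through barcoding to fit into a single day, it can help to estimate how long a thermocycler program occupies the instrument. The Python sketch below encodes the reverse transcription program from step 5 of this subheading and sums the programmed hold times. The data layout and function name are illustrative assumptions; ramp times and the final 12 °C hold are not included, so the real run is somewhat longer.

```python
# Each hold is (temperature in deg C, time in seconds); "cycles" is the total
# number of passes through the step ("additional 2 cycles" means 3 passes).
RT_PROGRAM = [
    {"holds": [(50, 600)], "cycles": 1},
    {"holds": [(8, 12), (15, 45), (20, 45), (30, 30), (42, 120), (50, 300)],
     "cycles": 3},
    {"holds": [(50, 600)], "cycles": 1},
]


def total_hold_minutes(program):
    """Sum programmed hold times in minutes, ignoring ramping."""
    seconds = sum(
        sum(t for _, t in step["holds"]) * step["cycles"] for step in program
    )
    return seconds / 60


print(f"RT program hold time: {total_hold_minutes(RT_PROGRAM):.1f} min")
```

The same structure can describe the other thermocycler programs in this chapter by changing the holds and cycle counts.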
3.5 Adding DNA Barcodes
1. Prepare R02 Blocking Solution, R03 Termination Solution, and two tubes of Ligation Mix freshly before the experiment.
2. Prewash two 15 mL Corning tubes by rinsing each tube with 0.5 mL 0.1% BSA in PBS, and discard the liquid (see Note 6).
3. Add the nuclei suspension to the 1st Ligation Mix, add 100 μL T4 DNA Ligase, and gently mix by pipetting up and down.
4. Transfer the nuclei-Ligation Mix to a reagent reservoir, and distribute 40 μL of the mixture to each of the 96 wells of the R02 Barcoding Plate with a multichannel pipette. Seal the plate with film.
5. Incubate the nuclei–barcode ligation mixture in a ThermoMixer set to 37 °C, 300 rpm for 30 min.
6. Open the seal, add 10 μL of R02 Blocking Solution into each of the 96 wells with a multichannel pipette, and reseal the plate.
7. Continue incubating the nuclei–barcode ligation mixture in a ThermoMixer set to 37 °C, 300 rpm for another 30 min.
8. Pool all nuclei in a reagent reservoir, and transfer the mixture containing the nuclei from the reagent reservoir to a 15 mL tube (prewashed with 0.1% BSA in PBS in step 2).
9. Wash the reagent reservoir with 1 mL of PBS and combine the wash with the nuclei mixture.
10. Spin down the nuclei with a swing bucket centrifuge at 1000 × g for 10 min at 4 °C, and carefully discard the supernatant (see Note 7).
11. Resuspend the nuclei in 1 mL 1X NEBuffer 3.1.
12. Transfer the nuclei suspension to the 2nd Ligation Mix, add 100 μL T4 DNA Ligase, and gently mix by pipetting up and down.
13. Transfer the nuclei-Ligation Mix to a reagent reservoir, and distribute 40 μL of the mixture to each of the 96 wells of the R03 Barcoding Plate with a multichannel pipette. Seal the plate.
14. Incubate the nuclei–barcode ligation mixture in a ThermoMixer set to 37 °C, 300 rpm for 30 min.
15. Open the seal and add 10 μL of R03 Termination Solution into each of the 96 wells with a multichannel pipette.
16. Immediately pool all nuclei in a reagent reservoir, and transfer the mixture containing the nuclei from the reagent reservoir to a 15 mL tube (prewashed with 0.1% BSA in PBS in step 2).
17. Wash the reagent reservoir with 1 mL of PBS and combine the wash with the nuclei mixture.
18. Spin down the nuclei with a swing bucket centrifuge at 1000 × g for 10 min at 4 °C, and carefully discard the supernatant (see Note 7).
19. Resuspend the nuclei in 50 μL PBS (nuclei stock suspension). Dilute 1 μL of nuclei with 9 μL of PBS and count the concentration of nuclei.
20. Dilute the nuclei stock suspension to 1 k/μL. Aliquot 3 μL of nuclei (total 3 k) into 200 μL PCR tubes or 96-well low-bind PCR plates (as sub-libraries) (see Note 9).
21. Prepare the lysis mix as follows: (a) calculate the number of sub-libraries that need to be lysed; (b) for each sub-library, the lysis mix contains 18 μL PBS, 3 μL 4 M NaCl, 3 μL 10% SDS, and 3 μL 20 mg/mL Proteinase K; (c) add and mix the reagents in the order of PBS, NaCl, SDS, and Proteinase K.
22. Add 27 μL of lysis mix to each sub-library. Incubate in a ThermoMixer set to 55 °C, 550 rpm for 2 h.
23. Cool the lysis mixture to room temperature. Add 30 μL of SPRI beads (1X) into each well and mix. Incubate at room temperature for 5 min. Prepare 80% EtOH (see Note 8).
24. Place the tubes or plate on a magnetic stand, let them sit for 5 min until the liquid becomes clear, and carefully discard the supernatant.
25. Add 150 μL of 80% EtOH into each tube/well, let sit for 30 s, and discard the supernatant.
26. Repeat step 25 for a total of two washes.
27. Elute the DNA/cDNA with 12.5 μL ultrapure H2O. The purified DNA/cDNA can be stored at -20 °C or can be directly used in Subheading 3.6.

3.6 Library Preamplification
1. Add 1.5 μL of 10X Terminal Transferase Buffer and 0.5 μL of 1 mM dCTP into each sub-library. Close the lid, tap to mix, and briefly spin down.
2. Incubate at 95 °C for 5 min, then immediately chill on ice for another 5 min.
3. Add 0.5 μL of Terminal Transferase into each tube. Close the lid, tap to mix, and briefly spin down.
4. Incubate at 37 °C for 30 min, followed by heat inactivating the reaction at 65 °C for 10 min.
5. Prepare the Anchor Mix freshly. Add 15 μL Anchor Mix into each tube. Close the lid, tap to mix, and briefly spin down.
6. Carry out the reaction in a thermocycler with the program below:

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 95 | 3 min |
| 2 | 95 | 15 s |
|  | 47 | 1 min |
|  | 68 | 2 min |
|  | 47 | 1 min |
|  | 68 | 2 min; repeat step 2 for additional 15 cycles |
| 3 | 72 | 10 min |
| 4 | 12 | Hold |

7. Prepare the Preamp Mix freshly. Add 20 μL Preamp Mix to each tube and gently mix by pipetting up and down.
8. Carry out the reaction in a thermocycler with the program below:

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 98 | 3 min |
| 2 | 98 | 20 s |
|  | 65 | 20 s |
|  | 72 | 2.5 min; repeat step 2 for additional 10 cycles |
| 3 | 72 | 2 min |
| 4 | 12 | Hold |

9. Add 10 μL of SPRI beads (0.2X) into each tube and mix. Incubate at room temperature for 5 min. Prepare 80% EtOH.
10. Place the tubes on a magnetic stand, and let them sit for 5 min until the liquid becomes clear.
11. Transfer the supernatant into new tubes, add 32.5 μL SPRI beads (0.65X + 0.2X = 0.85X) to each tube, and mix. Incubate at room temperature for 5 min.
12. Place the tubes on a magnetic stand, and let them sit for 5 min until the liquid becomes clear. Carefully discard the supernatant.
13. Add 150 μL of 80% EtOH into each tube/well, let sit for 30 s, and discard the supernatant.
14. Repeat step 13 for a total of two washes.
15. Elute the amplification product with 40 μL ultrapure H2O. The purified product can be stored at -20 °C.
16. Quantify the concentration of the amplification product with Qubit.

3.7 Library Splitting
1. Divide the purified amplification product into two tubes: 20.5 μL for Tn5 tagmentation-derived DNA library preparation and 17 μL for RNA-derived library preparation.
2. Steps 2–11 are for DNA library preparation. Add 2.5 μL 10X CutSmart Buffer, 1 μL SbfI-HF, and 1 μL FokI to each 20.5 μL aliquot of the purified amplification product. Incubate at 37 °C for 1 h.
3. Add 31.3 μL SPRI beads (1.25X) to each tube and mix. Incubate at room temperature for 5 min.
4. Place the tubes on a magnetic stand, and let them sit for 5 min until the liquid becomes clear. Carefully discard the supernatant.
5. Add 150 μL 80% EtOH into each tube/well, let sit for 30 s, and discard the supernatant.
6. Repeat step 5 for a total of two washes.
7. Elute the amplification product with 15 μL ultrapure H2O. The purified product can be stored at -20 °C.
8. Add 2 μL 10X T4 DNA Ligase Buffer, 1.5 μL Adaptor Mix, and 1.5 μL T4 DNA Ligase to each tube from the previous step. Close the lid, tap to mix, and briefly spin down.
9. Carry out the ligation reaction in a thermocycler with the program given below. Place the tubes in the thermocycler as soon as the block reaches 4 °C:

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 4 | 10 min |
| 2 | 10 | 5 min |
| 3 | 16 | 15 min |
| 4 | 25 | 45 min |
| 5 | 12 | Hold |

10. Add 25 μL SPRI beads (1.25X) directly to the reaction mixture and mix. Incubate at room temperature for 5 min and repeat the wash steps as described in steps 4–6.
11. Elute the adaptor-ligated DNA with 21 μL ultrapure H2O. The purified product can be stored at -20 °C.
12. Steps 12–18 are for RNA library preparation. Add 2 μL 10X CutSmart Buffer and 1 μL NotI-HF into the 17 μL amplification product. Incubate at 37 °C for 1 h.
13. Add 25 μL SPRI beads (1.25X) to each tube and mix. Incubate at room temperature for 5 min and repeat the wash steps as described in steps 4–6.
14. Elute with 10 μL ultrapure H2O. The purified product can be stored at -20 °C.
15. Use 5 μL of the purified product for tagmentation with Illumina Nextera XT. Add 10 μL Buffer TD and pipette up and down to mix.
16. Add 5 μL of Amplicon Tagmentation Mix (ATM) to each tube, pipette 10 times to mix, close the lid, and quickly spin down.
17. Incubate the mixture in a thermocycler at 55 °C for 5 min, cool down to 10 °C, and immediately place the tubes on ice.
18. Add 5 μL Neutralize Tagment Buffer (NT) to each well, pipette 10 times to mix, close the lid, and incubate at room temperature for 5 min. Proceed to step 8 of Subheading 3.8.

3.8 Library Amplification
1. Steps 1–7 are for DNA library amplification. Add 2 μL of Illumina TruSeq i7 index primers, 2 μL Illumina TruSeq i5 index primers, and 25 μL NEBNext 2X HiFi PCR mix. Pipette to mix, close the lid, and quickly spin down.
2. Carry out the PCR reaction in a thermocycler with the program as follows (see Note 10):

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 98 | 3 min |
| 2 | 98 | 10 s |
|  | 63 | 30 s |
|  | 72 | 1 min; repeat step 2 for additional 11–13 cycles |
| 3 | 72 | 5 min |
| 4 | 12 | Hold |

3. Add 42.5 μL SPRI beads (0.85X) to each tube and mix. Incubate at room temperature for 5 min.
4. Place the tubes on a magnetic stand, and let them sit for 5 min until the liquid becomes clear. Carefully discard the supernatant.
5. Add 150 μL 80% EtOH into each tube/well, let sit for 30 s, and discard the supernatant.
6. Repeat step 5 for a total of two washes.
7. Elute the DNA library with 25 μL ultrapure H2O. The purified library can be stored at -20 °C.
8. Steps 8–11 are for RNA library amplification. Add 6 μL ultrapure H2O, 2 μL of Illumina TruSeq i7 index primers, 2 μL of Illumina Nextera i5 index primers, and 15 μL Nextera PCR Mix (NPM) to each tube. Pipette to mix, close the lid, and briefly spin down.
9. Carry out the PCR reaction in a thermocycler with the program as follows (see Note 10):

| Step no. | Temperature (°C) | Time |
| --- | --- | --- |
| 1 | 72 | 3 min |
| 2 | 95 | 30 s |
| 3 | 95 | 10 s |
|  | 55 | 30 s |
|  | 72 | 1 min; repeat step 3 for additional 11–13 cycles |
| 4 | 72 | 5 min |
| 5 | 12 | Hold |

10. Purify the RNA library as described in steps 3–6.
11. Elute the RNA library with 25 μL ultrapure H2O. The purified library can be stored at -20 °C.
12. Quantify the concentration of the libraries with the KAPA qPCR quantification kit for Illumina. Check the size distribution of the libraries with Tapestation (see Notes 11–13).

3.9 Sequencing and Data Preprocessing
1. DNA and RNA libraries with different combinations of indices can be multiplexed for sequencing.
2. Paired-seq requires at least 50 cycles for Read1 (genomic or cDNA insert sequences), 8 cycles for Index Read1, 8 cycles for Index Read2, and 100 cycles for Read2 (cellular barcodes) (50 + 8 + 8 + 100).
3. Paired-seq data preprocessing includes: (a) extracting the three-round barcode sequences from Read2, (b) assigning barcode sequences to individual tubes/wells, and (c) mapping reads to the reference genome and generating cell-count matrices.
4. All scripts required for Paired-seq data preprocessing and the analysis steps are available from GitHub (https://github.com/cxzhu/Paired-Tag).
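To illustrate steps 3a and 3b, the Python sketch below parses one barcode from a Read2 sequence and assigns it to a well. It is not the authors' pipeline (see the GitHub repository for that). The read layout is an assumption inferred from the Table 4 oligo design: a 10-nt random sequence, a 7-bp barcode, a 0–3 nt spacer, and then the constant sequence GTGGCCGATGTTTCG. The function names, the four-entry whitelist, and the one-mismatch tolerance are likewise illustrative choices.

```python
LINKER = "GTGGCCGATGTTTCG"  # constant region after the barcode in Table 4
UMI_LEN, BC_LEN, MAX_SPACER = 10, 7, 3  # assumed layout, see lead-in

# Small illustrative whitelist (four wells from Table 4), not the full plate.
WHITELIST = {"AAACCGG": "A1", "AAACGTC": "A2", "AAAGATG": "A3", "AAATCCA": "A4"}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def parse_read2(seq):
    """Return (random_10mer, barcode), or None if the linker is not found.

    The linker is searched at the four offsets allowed by a 0-3 nt spacer.
    """
    start = UMI_LEN + BC_LEN
    for spacer in range(MAX_SPACER + 1):
        pos = start + spacer
        if seq[pos:pos + len(LINKER)] == LINKER:
            return seq[:UMI_LEN], seq[UMI_LEN:start]
    return None


def assign_well(barcode, whitelist=WHITELIST, max_mismatch=1):
    """Map a barcode to a well; None if zero or several wells are in range."""
    hits = [w for bc, w in whitelist.items() if hamming(barcode, bc) <= max_mismatch]
    return hits[0] if len(hits) == 1 else None


read2 = "ACGTACGTAC" + "AAATCCA" + "T" + LINKER + "TTTT"
parsed = parse_read2(read2)
if parsed is not None:
    umi, barcode = parsed
    print(umi, barcode, assign_well(barcode))
```

Anchoring on the constant linker rather than a fixed offset is what lets the parser tolerate the variable-length spacer. A production pipeline would also handle the remaining barcode rounds and sequencing errors within the linker.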
4 Notes
1. All the safe pause points in the protocol are indicated in Fig. 1a.
2. For 96-well barcode plates, standard desalting purification can be used. For Index PCR primers, HPLC purification is required.
3. Native nuclei isolated from snap-frozen or fresh tissue are preferred. Crosslinking reduces the complexity of Paired-seq DNA libraries.
4. Nuclei preparation, tagmentation, reverse transcription, and combinatorial DNA barcoding must be carried out in a single day, which will take ~8 h.
5. The optimal input is 1.2 million nuclei (100,000 × 12 tubes). Fewer nuclei are acceptable but will result in a lower recovery rate. We can typically recover 200,000–300,000 nuclei from 1.2 million input nuclei (17–25%) and 30,000–50,000 from 500,000 input nuclei (41,700 × 12 tubes; 6–10%).
6. Prewashing the 15 mL tubes with 0.1% BSA in PBS reduces sticking of nuclei to the tube and increases the nuclei recovery rate.
7. When removing supernatants after spin-down steps, remove as much liquid as possible. Downstream reactions may be inhibited by residual buffers, salts, or oligos from the previous step (e.g., EDTA after tagmentation in step 6 of Subheading 3.3, and adaptor oligos after nuclei barcoding in Subheading 3.5).
8. During purification of nucleic acids from the lysis mixture, make sure to wash out the SDS, as residual SDS may inhibit subsequent reactions.
9. The optimal number of nuclei in each sub-library is ~3500, which gives a 6–10% potential barcode collision rate. A higher number of nuclei in each sub-library will result in a higher barcode collision rate. Using nuclei sorting instead of dilution to aliquot sub-libraries can reduce the potential barcode collision rate, but will also reduce the recovery rate.
Fig. 2 Representative fragment analysis results of Paired-seq libraries. (a) TapeStation analysis of a representative Paired-seq library. EL, electronic ladder. (b, c) Fragment size distributions of representative Paired-seq DNA (b) and RNA (c) libraries, plotted as normalized intensity versus fragment size
10. To determine the optimal number of PCR cycles for step 9 of Subheading 3.8, the PCR reaction can be carried out in two steps: (a) perform PCR amplification for 10 cycles and put the reaction on ice, take out 0.5 μL of the PCR mixture and dilute it 1000×, quantify with qPCR, and calculate the additional cycles needed to reach a 10 nM concentration; (b) perform PCR amplification with the needed additional cycles, purify the amplified products, and store at -20 °C.
11. After preamplification, typically ~40–1200 ng of amplified products can be recovered (~1–30 ng/μL as measured by Qubit). The yields of sub-libraries processed in parallel should be comparable with each other.
12. Paired-seq libraries should have a fragment size distribution of 300–1000 bp (Fig. 2). If fragments of ~245 bp (adaptors) appear as a significant fraction, try to remove them with an additional round of 0.75× SPRI bead size selection.
13. Quantification of libraries must be carried out by qPCR. TapeStation (or Fragment Analyzer) and Qubit analysis tend to overestimate Paired-seq library concentrations.
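The cycle calculation in Note 10 can be sketched as follows. This is a minimal illustration, not part of the published protocol: the function name `additional_cycles` and its parameters are hypothetical, and it assumes that the product roughly doubles every cycle (`efficiency=1.0`), which real reactions only approximate.

```python
import math


def additional_cycles(diluted_reading_nM, dilution_factor=1000.0,
                      target_nM=10.0, efficiency=1.0):
    """Estimate the extra PCR cycles needed to reach target_nM.

    diluted_reading_nM: qPCR-derived concentration of the diluted aliquot.
    dilution_factor:    fold dilution applied before qPCR (1000 in Note 10).
    target_nM:          desired library concentration (10 nM in Note 10).
    efficiency:         ASSUMED per-cycle efficiency; 1.0 means the product
                        doubles every cycle, which is an idealization.
    """
    if diluted_reading_nM <= 0 or dilution_factor <= 0:
        raise ValueError("concentration and dilution factor must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")

    # Back-calculate the concentration of the undiluted PCR mixture.
    current_nM = diluted_reading_nM * dilution_factor
    if current_nM >= target_nM:
        return 0

    # Each cycle multiplies the product by (1 + efficiency); round up.
    fold_needed = target_nM / current_nM
    return math.ceil(math.log(fold_needed) / math.log(1.0 + efficiency))


if __name__ == "__main__":
    # A diluted aliquot reading of 0.0003 nM corresponds to 0.3 nM undiluted.
    print(additional_cycles(0.0003))
```

Under these assumptions, a diluted reading of 0.0003 nM (0.3 nM in the undiluted mixture) calls for six additional cycles. Lowering `efficiency` gives a more conservative estimate.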
Acknowledgments
We thank QB3 MacroLab for the Tn5 enzyme. This study was funded by grant nos. 1 U19 MH114831-02, U01MH121282, and R01AG066018 and the Ludwig Institute for Cancer Research (to B.R.) and grant no. 1K99HG011483-01 (to C.Z.).
References
1. Lee CK, Shibata Y, Rao B et al (2004) Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36(8):900–905. https://doi.org/10.1038/ng1400
2. Thurman RE, Rynes E, Humbert R et al (2012) The accessible chromatin landscape of the human genome. Nature 489(7414):75–82. https://doi.org/10.1038/nature11232
3. Yue F, Cheng Y, Breschi A et al (2014) A comparative encyclopedia of DNA elements in the mouse genome. Nature 515(7527):355–364. https://doi.org/10.1038/nature13992
4. Boyle AP, Davis S, Shulha HP et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132(2):311–322. https://doi.org/10.1016/j.cell.2007.12.014
5. Schones DE, Cui K, Cuddapah S et al (2008) Dynamic regulation of nucleosome positioning in the human genome. Cell 132(5):887–898. https://doi.org/10.1016/j.cell.2008.02.022
6. Giresi PG, Kim J, McDaniell RM et al (2007) FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res 17(6):877–885. https://doi.org/10.1101/gr.5533506
7. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10(12):1213–1218. https://doi.org/10.1038/nmeth.2688
8. Minnoye L, Marinov GK, Krausgruber T et al (2021) Chromatin accessibility profiling methods. Nat Rev Methods Prim 1(1):10. https://doi.org/10.1038/s43586-020-00008-9
9. Jin W, Tang Q, Wan M et al (2015) Genome-wide detection of DNase I hypersensitive sites in single cells and FFPE tissue samples. Nature 528(7580):142–146. https://doi.org/10.1038/nature15740
10. Lai B, Gao W, Cui K et al (2018) Principles of nucleosome organization revealed by single-cell micrococcal nuclease sequencing. Nature 562(7726):281–285. https://doi.org/10.1038/s41586-018-0567-3
11. Cusanovich DA, Daza R, Adey A et al (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348(6237):910–914. https://doi.org/10.1126/science.aab1601
12. Buenrostro JD, Wu B, Litzenburger UM et al (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523(7561):486–490. https://doi.org/10.1038/nature14590
13. Lareau CA, Duarte FM, Chew JG et al (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 37(8):916–924. https://doi.org/10.1038/s41587-019-0147-6
14. Preissl S, Fang R, Huang H et al (2018) Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat Neurosci 21(3):432–439. https://doi.org/10.1038/s41593-018-0079-3
15. Kelsey G, Stegle O, Reik W (2017) Single-cell epigenomics: recording the past and predicting the future. Science 358(6359):69–75. https://doi.org/10.1126/science.aan6826
16. Stuart T, Satija R (2019) Integrative single-cell analysis. Nat Rev Genet 20(5):257–272. https://doi.org/10.1038/s41576-019-0093-7
17. Zhu C, Preissl S, Ren B (2020) Single-cell multimodal omics: the power of many. Nat Methods 17(1):11–14. https://doi.org/10.1038/s41592-019-0691-5
18. Angermueller C, Clark SJ, Lee HJ et al (2016) Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods 13(3):229–232. https://doi.org/10.1038/nmeth.3728
19. Zhu C, Zhang Y, Li YE et al (2021) Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat Methods 18(3):283–292. https://doi.org/10.1038/s41592-021-01060-3
20. Cao J, Cusanovich DA, Ramani V et al (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361(6409):1380–1385. https://doi.org/10.1126/science.aau0730
21. Wei X, Xiang Y, Peters D et al (2022) HiCAR is a robust and sensitive method to analyze open-chromatin-associated genome organization. Mol Cell 82(6):1225–1238.e6. https://doi.org/10.1016/j.molcel.2022.01.023
22. Liu L, Liu C, Quintero A et al (2019) Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat Commun 10(1):470. https://doi.org/10.1038/s41467-018-08205-7
23. Chen S, Lake BB, Zhang K (2019) High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37:1452. https://doi.org/10.1038/s41587-019-0290-0
24. Zhu C, Yu M, Huang H et al (2019) An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol 26(11):1063–1070. https://doi.org/10.1038/s41594-019-0323-x
25. Rosenberg AB, Roco CM, Muscat RA et al (2018) Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360(6385):176–182. https://doi.org/10.1126/science.aam8999
26. Adey A, Morrison HG, Asan et al (2010) Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 11(12):R119. https://doi.org/10.1186/gb-2010-11-12-r119
27. Zhang K, Hocker JD, Miller M et al (2021) A single-cell atlas of chromatin accessibility in the human genome. Cell 184(24):5985–6001.e19. https://doi.org/10.1016/j.cell.2021.10.024
Chapter 11
Simultaneous Single-Cell Profiling of the Transcriptome and Accessible Chromatin Using SHARE-seq
Samuel H. Kim, Georgi K. Marinov, S. Tansu Bagdatli, Soon Il Higashino, Zohar Shipony, Anshul Kundaje, and William J. Greenleaf
Abstract
The ability to analyze the transcriptomic and epigenomic states of individual single cells has in recent years transformed our ability to measure and understand biological processes. Recent advancements have focused on increasing sensitivity and throughput to provide richer and deeper biological insights at the cellular level. The next frontier is the development of multiomic methods capable of analyzing multiple features from the same cell, such as the simultaneous measurement of the transcriptome and the chromatin accessibility of candidate regulatory elements. In this chapter, we discuss and describe SHARE-seq (Simultaneous high-throughput ATAC and RNA expression with sequencing) for carrying out simultaneous chromatin accessibility and transcriptome measurements in single cells, together with the experimental and analytical considerations for achieving optimal results.
Key words scRNA-seq, scATAC-seq, Multiomics, Chromatin accessibility, Transcriptomics, Split–pool
1
Introduction
The basic unit of biological organization is the individual cell. In combination with the surrounding cellular microenvironments within the context of a multicellular organism, each cell integrates internal and external stimuli to maintain or alter its state for biological function. Understanding the cellular state at single-cell resolution, therefore, is critical to defining the regulatory processes driving health and disease. A key advancement toward understanding cellular states has been the development of transcriptomic methods. With the advent of high-throughput sequencing methods in the late 2000s, RNA-seq was developed to profile transcriptomes at base-pair resolution [1–4]. Subsequently, the molecular biology approaches that enabled ever-improved RNA-seq sensitivity have led to the development of
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_11, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
single-cell RNA-seq (scRNA-seq) to measure transcriptomes at the single-cell level. The first scRNA-seq methods [5–8] were very low throughput, only able to measure a few cells at a time. Further technical advancements utilized microfluidics- and plate-based approaches to increase throughput to the 10^2–10^3 range [9, 10], while droplet- and bead-based methods later boosted it to the 10^4–10^5 range [11–14]. However, the approach that holds the most promise for ultra-high-throughput single-cell measurements is combinatorial indexing. The core concept of these approaches is to dynamically assign barcodes through multiple rounds of splitting and pooling cells to create a combinatorial set of barcodes that can be used to uniquely identify each cell. Specifically, a set of cells can be split into 96- or 384-well plates, with each well given a specific barcode, and then pooled back together to be randomly split into another set of plates. By iteratively performing these split–pool rounds with an optimal number of input cells, barcodes, and rounds of barcoding, one can create sufficient barcode diversity to uniquely assign each cell to a combination of barcodes. In comparison to the physical isolation of each cell in a droplet or a well, combinatorial indexing provides a scalable platform for single-cell measurements. This is the basis of all "sci" (single-cell combinatorial indexing) methods, such as sci-RNA-seq [15] and SPLiT-seq [16].
While scRNA-seq measures the current abundance of transcripts in a given cell, it does not provide insight into how that transcriptional state is achieved and maintained through regulation. Mapping active cis-regulatory elements (cREs) provides key insight to address this need. A common property of active cREs, originally recognized more than four decades ago [17–19], is that they are depleted of nucleosomes and exhibit an open, "accessible" conformation.
This property has been the basis for numerous methods that have been developed over the years to profile these elements [20], which rely on the preferential enzymatic cleavage or labeling of open chromatin regions. ATAC-seq [21, 22] (Assay for Transposase-Accessible Chromatin using sequencing) has emerged as the most versatile instance of such assays. ATAC-seq takes advantage of the preferential insertion of a hyperactive Tn5 transposase [23], preloaded with sequencing adapters, into open chromatin. Tn5 had been previously adapted and successfully used for the generation of high-throughput sequencing libraries from low-input DNA samples [24]. The realization that it can also be used to tag open chromatin regions with ready-for-amplification sequencing adapters in a single reaction allowed chromatin accessibility profiling to be carried out in bulk on very low-input samples (typically 50,000 cells, but also down to just a few thousand [21]), and eventually in single cells, in the form of scATAC-seq, in the mid-2010s [25]. As with scRNA-seq, the throughput of scATAC-seq has also been dramatically increased over the years, using combinatorial indexing (sciATAC-seq [26–28]), microwell plates (μATAC-seq [29]), droplet-based methods [30], and combinations of combinatorial indexing and droplets (dsciATAC-seq [31]).
Techniques such as scRNA-seq and scATAC-seq have provided unprecedented insights into the diversity of cell types, their developmental dynamics, and cellular responses to external stimuli in a wide variety of contexts. However, the ideal measurements would provide information about all relevant aspects of the state from the same cell. To this end, a variety of single-cell multiomic methods, measuring multiple such modalities in the same individual cells, have been under active development in recent years. These include methods for sequencing the genomes and transcriptomes of single cells (G&T-seq [32], PRDD-seq [33], DNTR-seq [34], sci-L3-RNA/DNA [35], TARGET-seq [36], and others), for sequencing methylomes and transcriptomes (scTrio-seq [37], scMT-seq [38], and scM&T-seq [39]), for mapping accessible chromatin and methylomes (e.g., scNOMe-seq [40]), for measuring proteins and transcripts (REAP-seq [41], CITE-seq [42], QBC [43], inCITE-seq [44], iNS-seq [45]), for using methylation-based labeling of open chromatin to map accessible DNA and transcripts (COOL-seq [46], scNMT-seq [47], scNOMeRe-seq [48], snmC2T-seq [49]), for mapping protein occupancy and transcriptomes (CoTECH [50], Paired-Tag [51], scDam&T-seq [52]), for quantifying protein levels and mapping open chromatin (PHAGE-ATAC [53], ASAP-seq [54]), for quantifying protein and transcriptome levels and mapping open chromatin (DOGMA-seq [54], TEA-seq [55]), and others [56]. As regulatory elements and RNA levels are perhaps the two most informative modalities, joint scATAC-seq + scRNA-seq methods are the most sought-after multiomic assays.
A number of these have been developed in recent years: sci-CAR-seq [57], Paired-seq [58], ASTAR-seq [59], SNARE-seq [60], SHARE-seq [61], and others. Ideally, such an assay should capture as many of the transcripts present in each cell as possible, and also as many of the open chromatin regions in the nucleus, with high specificity and little noise. The SHARE-seq assay, which is based on the combinatorial indexing described above, provides high-quality and high-throughput transcriptome and accessible chromatin measurements in the same single cells. In this chapter, we describe in detail the SHARE-seq procedure and discuss the key optimization points and considerations for the generation of high-quality scATAC+scRNA-seq datasets.
2
Materials
2.1 DNA Oligos and Primers
All oligonucleotides can be obtained through IDT. The exact scale and purification methods are listed below:
1. Round 1 linker (1 μmol scale, standard desalting): CCGAGCCCACGAGACTCGGACGATCATGGG
2. Round 2 linker (1 μmol scale, standard desalting): CAAGTATGCAGCGCGCTCAAGCACGTGGAT
3. Round 3 linker (1 μmol scale, standard desalting): AGTCGTACGCCGATGCGAAACATCGGCCAC
4. Round 1 blocking (1 μmol scale, standard desalting): CCCATGATCGTCCGAGTCTCGTGGGCTCGG
5. Round 2 blocking (1 μmol scale, standard desalting): ATCCACGTGCTTGAGCGCGCTGCATACTTG
6. Round 3 blocking (1 μmol scale, standard desalting): GTGGCCGATGTTTCGCATCGGCGTACGACT
7. Read 1 (100 nmol scale, HPLC purified): TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
8. Template Switching Oligo (TSO) (100 nmol scale, HPLC purified): AAGCAGTGGTATCAACGCAGAGTGAATrGrG+G
9. RNA PCR primer (100 nmol scale, standard desalting): AAGCAGTGGTATCAACGCAGAGT
10. P7 primer (100 nmol scale, standard desalting): CAAGCAGAAGACGGCATACGAGAT
11. Phosphorylated Read2 (100 nmol scale, HPLC purified): /5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
12. Reverse transcription primer (RT primer) (100 nmol scale, HPLC purified): /5Phos/GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNNNNNNNNN/iBiodT/TTTTTTTTTTTTTTVN
13. Blocked_ME_Comp (100 nmol scale, HPLC purified): /5Phos/CTG TCT CTT ATA CA/3ddC/
14. Pool–split ligation Plate R1 (see Note 4): /5Phos/CGCGCTGCATACTTG[8-bp-barcode]CCCATGATCGTCCGA
15. Pool–split ligation Plate R2 (see Note 4): /5Phos/CATCGGCGTACGACT[8-bp-barcode]ATCCACGTGCTTGAG
16. Pool–split ligation Plate R3 (see Note 4): CAAGCAGAAGACGGCATACGAGAT[8-bp-barcode]GTGGCCGATGTTTCG
17. PCR Library indexing primers plate: AATGATACGGCGACCACCGAGATCTACAC[8-bp-index]TCGTCGGCAGCGTCAGATGTGTAT
An example set of 96 barcodes is listed below: AACGTGAT AAGGTACA CACTTCGA GATAGACA TGGAACAA ATCATTCC AAACATCG ACACAGAA CAGCGTTA GCCACATA TGGCTTCA ATTGGCTC ATGCCTAA ACAGCAGA CATACCAA GCGAGTAA TGGTGGTA CAAGGAGC AGTGGTCA ACCTCCAA CCAGTTCA GCTAACGA TTCACGCA CACCTTAC ACCACTGT ACGCTCGA CCGAAGTA GCTCGGTA AACTCACC CCATCCTC ACATTGGC ACGTATCA CCGTGAGA GGAGAACA AAGAGATC CCGACAAC CAGATCTG ACTATGCA CCTCCTGA GGTGCGAA AAGGACAC CCTAATCC CATCAAGT AGAGTCAA CGAACTTA GTACGCAA AATCCGTC CCTCTATC CGCTGATC AGATCGCA CGACTGGA GTCGTAGA AATGTTGC CGACACAC ACAAGCTA AGCAGGAA CGCATACA GTCTGTCA ACACGACC CGGATTGC CTGTAGCC AGTCACTA CTCAATGA GTGTTCTA ACAGATTC CTAAGGTC AGTACAAG ATCCTGTA CTGAGCCA TAGGATGA AGATGTAC GAACAGGC AACAACCA ATTGAGGA CTGGCATA TATCAGCA AGCACCTC GACAGTGC AACCGAGA CAACCACA GAATCTGA TCCGTCTA AGCCATGC GAGTTAGC AACGCTTA GACTAGTA CAAGACTA TCTTCACA AGGCTAAC GATGAATC AAGACGGA CAATGGAA GAGCTGAA TGAAGAGA ATAGCGAC GCCAAGAC
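Before ordering oligos, it can be useful to sanity-check the sequences in this subsection. The sketch below is an illustration only; the helper names (`reverse_complement`, `hamming`, `summarize_barcodes`) are hypothetical and not part of the SHARE-seq scripts. It checks that each blocking oligo as listed is the reverse complement of the corresponding linker, and it reports basic properties of a barcode set (length, uniqueness, minimum pairwise Hamming distance). Only the first six example barcodes are included for brevity.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def summarize_barcodes(barcodes, length=8):
    """Report malformed entries, uniqueness, and minimum pairwise distance."""
    malformed = [b for b in barcodes
                 if len(b) != length or set(b) - set("ACGT")]
    distances = [hamming(a, b) for a, b in combinations(barcodes, 2)]
    return {
        "malformed": malformed,
        "unique": len(set(barcodes)) == len(barcodes),
        "min_hamming": min(distances) if distances else None,
    }


# Linker and blocking oligos copied from the list above (rounds 1-3).
LINKERS = {
    1: "CCGAGCCCACGAGACTCGGACGATCATGGG",
    2: "CAAGTATGCAGCGCGCTCAAGCACGTGGAT",
    3: "AGTCGTACGCCGATGCGAAACATCGGCCAC",
}
BLOCKERS = {
    1: "CCCATGATCGTCCGAGTCTCGTGGGCTCGG",
    2: "ATCCACGTGCTTGAGCGCGCTGCATACTTG",
    3: "GTGGCCGATGTTTCGCATCGGCGTACGACT",
}

# First six barcodes of the example set; extend with the full list as needed.
EXAMPLE_BARCODES = ["AACGTGAT", "AAGGTACA", "CACTTCGA",
                    "GATAGACA", "TGGAACAA", "ATCATTCC"]

if __name__ == "__main__":
    for rnd, linker in LINKERS.items():
        ok = reverse_complement(linker) == BLOCKERS[rnd]
        print(f"Round {rnd} blocking oligo matches linker: {ok}")
    print(summarize_barcodes(EXAMPLE_BARCODES))
```

The same `summarize_barcodes` call can be run on a custom barcode set before plates are ordered; a larger minimum Hamming distance leaves more room for error-tolerant barcode matching during demultiplexing.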
2.2 General Equipment
1. Eppendorf ThermoMixer C (96-well plate adapter) 2. Tabletop centrifuge 3. Swing bucket centrifuge with temperature control 4. Thermal cycler 5. Cold room 6. qPCR machine (QuantStudio 3) 7. Qubit fluorometer or equivalent 8. E-gel electrophoresis system (Thermo Fisher Scientific) 9. TapeStation (Agilent) or equivalent, e.g., BioAnalyzer (Agilent) 10. Multichannel pipettes or liquid handling instruments 11. gentleMACS Dissociator (Miltenyi Biotec) 12. Automated cell counter, e.g., Countess 3 (Thermo Fisher Scientific) or equivalent
2.3 General Reagents
1. 1× PBS buffer solution (Thermo Fisher Scientific, Cat #10010049) 2. Bovine Albumin Fraction V (7.5% solution) (Thermo Fisher Scientific, Cat #15260037) 3. Trypan Blue Stain (0.4%) (Thermo Fisher Scientific, Cat #T10282)
4. Enzymatic RI (Qiagen, Cat #Y9240L) 5. SUPERase RI (Thermo Fisher Scientific, Cat #AM2696) 6. Lucigen RI (Lucigen Cat # 30281-2) 7. Protector RI (Sigma Aldrich Cat # 3335399001) 8. 16% FA (Thermo Fisher Scientific, Cat # 28906) 9. Glycine (Sigma Aldrich, Cat #50049) 10. 1 M Tris HCl pH 7.5 (Thermo Fisher Scientific, Cat #15567027) 11. 1 M Tris HCl pH 8.0 (Thermo Fisher Scientific, Cat #15568025) 12. 5 M NaCl (Thermo Fisher Scientific, Cat #AM9760G) 13. 1 M MgCl2 (Sigma Aldrich, Cat #63069) 14. 1 M CaCl2 (Sigma Aldrich, Cat #21115-100ML) 15. DMF (Dimethyl Formamide) (Sigma, Cat #227056) 16. 0.2 M Tris-acetate pH 7.8 (Bioworld, Cat #40120265-2) 17. 5 M Potassium acetate (Sigma Aldrich, Cat #95843-100ML-F) 18. 1 M Magnesium acetate (Sigma Aldrich, Cat #63052-100ML) 19. 10% NP-40 (Thermo Fisher Scientific, Cat #28324) 20. Buffer EB (Qiagen, Cat #19086) 21. PEG 6000 (Sigma Aldrich, Cat #528877) 22. Maxima H Minus Reverse Transcriptase with buffer (Thermo Fisher Scientific, Cat #EP075) 23. 10 mM dNTPs (NEB, Cat #N0447L) 24. T4 DNA Ligase (NEB, Cat #M0202L) 25. Additional 10× T4 Ligase buffer (NEB, Cat #B0202S) 26. Proteinase K (20 mg/mL) (NEB, Cat #P8107S) 27. 20% SDS (VWR, Cat #97062+440) 28. 100 mM PMSF/IPA (Sigma Aldrich, Cat # P7626) 29. cOmplete Protease Inhibitor Cocktail (Sigma Aldrich, Cat # 11697498001) 30. 0.5 M EDTA (Sigma Aldrich, Cat #AM9260G) 31. Tween-20 (Sigma Aldrich, Cat #P9416-100ML) 32. Digitonin (Promega, Cat #G9441) 33. MyOne C1 Dynabeads (Thermo Fisher Scientific, Cat #65001) 34. Ficoll PM-400 (20%) (Sigma Aldrich, Cat #F5415-25ML) 35. KAPA HiFi 2× mix (Fisher Scientific, Cat #NC0295239) 36. SPRIselect beads (Beckman Coulter, Cat #B23318)
37. 100% EtOH 38. 100 mM DTT (Thermo Fisher Scientific, Cat #707265ML) 39. NEBNext 2× Mix (NEB, Cat #M0541L) 40. Glycerol (Thermo Fisher Scientific, Cat #15514011) 41. TD buffer from Nextera kit 42. SYBR Green I Nucleic Acid Gel Stain (Thermo Fisher Scientific, Cat #S7563) 43. EVAGreen Dye, 20× in water (Biotium, Cat #31000) 44. Nuclease-free H2O 45. 96-well plates (Eppendorf, Cat #0030129300) (preferably low protein and DNA binding; see Note 5) 46. 1.5-mL microcentrifuge tubes, preferably low protein and DNA binding (see Note 5) 47. 2-mL, 15-mL, and 50-mL tubes 48. gentleMACS M-Tubes (Miltenyi Biotec, Cat #130-093-236) 49. 30 μm Sterile single-pack CellTrics filters (Sysmex, Cat #04004-2326) 50. 200-μL PCR tubes 51. Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, Cat #Q32851) 52. TapeStation D1000 and D5000 tape and reagents (Agilent) 53. Tn5 transposase (see Note 1) 54. MinElute PCR Purification Kit (Qiagen, Cat #28004/28006), Zymo DNA Clean and Concentrator Kit (Zymo, Cat #D4013/D4014), or equivalent
2.4 Buffers and Reagents
Make all buffers using ultrapure molecular biology-grade ddH2O:
1. 2.5 M Glycine (50 mL)
   9.375 g Glycine (powder)
   1× PBS up to 50 mL
   Filter through a 0.22 μm filter. Store at room temperature.
2. Tissue Dissociation (MACS) buffer
   10 mM Tris-HCl pH 8.0
   5 mM CaCl2
   5 mM EDTA
   3 mM MgAc
   0.6 mM DTT
   cOmplete Protease Inhibitor
   Make fresh every time.
3. Nuclei Isolation Buffer (NIB)
   10 mM Tris-HCl pH 7.4
   10 mM NaCl
   3 mM MgCl2
   0.1% IGEPAL CA-630
   Store at 4 °C.
4. 2× TD buffer
   20 mM Tris-HCl pH 7.6
   10 mM MgCl2
   20% Dimethyl Formamide
   Store at -20 °C.
5. PEG 6000 50%
   Mix equal masses of PEG 6000 and H2O, heat to 65 °C for 4 min, and then cool down to room temperature.
6. 2× RCB buffer
   100 mM Tris pH 8.0
   100 mM NaCl
   0.40% SDS
   Store at room temperature.
7. 2× BW buffer
   10 mM Tris pH 8.0
   2 M NaCl
   1 mM EDTA
   Store at 4 °C.
8. 1× B&W-T buffer
   5 mM Tris pH 8.0
   1 M NaCl
   0.5 mM EDTA
   0.05% Tween-20
   Store at 4 °C.
9. Oligo resuspension buffer (IDTE)
   10 mM Tris pH 8.0
   0.1 mM EDTA
   Store at room temperature.
10. Oligo annealing buffer (STE)
   10 mM Tris pH 8.0
   50 mM NaCl
   1 mM EDTA
   Store at room temperature.
11. Dilution buffer
   50% glycerol
   50 mM Tris pH 7.5
   100 mM NaCl
   0.1 mM EDTA
   0.1% NP-40
   Store at -20 °C.
2.5 Software Packages
1. Bowtie [62]: http://bowtie-bio.sourceforge.net/index.shtml
2. SAMtools [63]: http://www.htslib.org/
3. PicardTools: https://broadinstitute.github.io/picard/
4. UCSC Genome Browser [64, 65] utilities: http://hgdownload.cse.ucsc.edu/admin/exe/
5. STAR [66]: https://github.com/alexdobin/STAR
6. R: https://www.r-project.org/
7. Python (version 2.7 or higher): https://www.python.org/
8. ArchR [67]: https://www.archrproject.com/
9. Seurat [68]: https://satijalab.org/seurat/
10. Additional scripts: https://github.com/georgimarinov/GeorgiScripts. Contains Python scripts used in the examples shown below; some of the scripts depend on having pysam (https://pysam.readthedocs.io/en/latest/index.html) and pyBigWig (https://github.com/deeptools/pyBigWig) installed.
3
Methods
The general outline of the SHARE-seq assay is shown in Fig. 1. The first of the two basic ideas behind SHARE-seq and other pool–split-based assays is to label molecules originating from each cell with a unique combination of barcodes. The barcodes are added serially by pooling cells and then randomly redistributing them across subsequent sets of barcodes, which ensures that, statistically, each cell can be identified through a unique combination of barcodes. The second is the separation of chromatin and transcriptome molecules through the use of a biotinylated reverse transcription (RT) primer, which can then be used for a streptavidin pulldown of the transcriptome.
In brief, before the beginning of a SHARE-seq experiment, the needed barcode plates and transposases are prepared and stored. The experiment itself begins with the isolation of nuclei from cells in culture or from tissues (see Note 2). Nuclei are then crosslinked, usually lightly (see Note 3). Transposition is then carried out, followed by reverse transcription using a biotinylated RT primer containing a random unique molecular identifier (UMI). Three rounds of pool–split hybridization and blocking are then carried out, after which the hybridized oligos are ligated to each other and to the transposed chromatin fragments and reverse-transcribed mRNA, forming single molecules. Crosslinks are then reversed, and streptavidin pulldown is used to separate the chromatin from the transcriptome. ATAC libraries are directly amplified from the supernatant. The transcriptome is first amplified on-beads into cDNAs, which are then tagmented into sequenceable fragments and PCR-amplified into final libraries.
The resulting library structures for ATAC and RNA are shown in Fig. 2. ATAC libraries contain three barcodes, while RNA libraries also include the UMI. Note that with many Illumina-based sequencing readouts, the first barcode to be read is actually the third one added during the pool–split procedure.
Fig. 1 Outline of the SHARE-seq assay. Nuclei are isolated from cells or tissues and crosslinked. Transposition is then carried out on chromatin, followed by reverse transcription with a biotinylated RT primer. Three pool–split rounds of hybridization of barcode oligos are then performed. Hybridized barcodes are then ligated, and crosslinks are reversed. The ATAC and RNA portions are separated by streptavidin pulldown. The ATAC is directly amplified, and the RNA is subjected to cDNA amplification, tagmentation, and final library amplification
3.1 Determining the Optimal Cell Number
It is important to carefully track the number of cells going into the SHARE-seq assay and being retained at each key step of the procedure. Pool–split assays rely on the statistical uniqueness of the barcode combinations through which cells pass, which in turn means that having too many cells entering the pool–split procedure will lead to an unacceptably high rate of doublets (two or more cells with the same barcode). At the same time, some of the reactions have an efficiency-imposed limit on the number of cells that can enter them, and cells need to be distributed into parallel reactions for optimal results. This applies to the initial transposition and reverse transcription reactions, as well as to the final amplification, where the existing protocol is optimized for libraries of 20,000 cells; after the final pooling, cells are therefore split into separate subpools of that size and processed into individual sublibraries. Figure 3 shows the theoretical number of detected cells and doublet rate for different pool–split setups with three rounds, accounting for a certain level of cell loss during repeated handling. Based on these calculations and empirical experience, we usually start the pool–split rounds with 5 × 10^5 cells for a 96 × 96 × 96 pool–split experiment.
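The trade-off between cell number and doublet rate can be approximated with a short calculation. The sketch below is a simplified model and not the simulation used for Fig. 3. It assumes that every detected cell receives a barcode combination uniformly at random, that a fixed fraction of cells is lost in each round, and that a "doublet" is any cell sharing its combination with at least one other detected cell. The function names are hypothetical.

```python
import math


def expected_detected(n_input, loss_per_round=0.5, rounds=3):
    """Cells expected to survive all pool-split rounds (assumed fixed loss)."""
    return n_input * (1.0 - loss_per_round) ** rounds


def doublet_fraction(n_cells, wells_per_round=(96, 96, 96)):
    """Fraction of cells expected to share a barcode combination.

    Assumes uniform random assignment: the chance that none of the other
    (n_cells - 1) cells picks the same combination is (1 - 1/B)^(n_cells - 1),
    where B is the number of possible barcode combinations.
    """
    n_combinations = math.prod(wells_per_round)
    if n_cells <= 1:
        return 0.0
    log_no_clash = (n_cells - 1) * math.log1p(-1.0 / n_combinations)
    return -math.expm1(log_no_clash)


if __name__ == "__main__":
    detected = expected_detected(5e5)  # 5 x 10^5 input cells
    for setup in [(96, 96, 96), (384, 96, 96), (384, 384, 384)]:
        frac = doublet_fraction(detected, setup)
        print(setup, f"{detected:.0f} cells, doublet fraction {frac:.4f}")
```

With these default assumptions, 5 × 10^5 input cells leave about 62,500 cells after three rounds, and roughly 7% of those would share a barcode combination in a 96 × 96 × 96 experiment. Because the model ignores uneven well loading and any additional sublibrary indexing, its output should be treated as an order-of-magnitude guide only.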
Fig. 2 Structure of final SHARE-seq libraries. ATAC (top) and RNA (bottom). Dots represent the actual library insert. Labeled elements in both libraries: P5, Read 1, Read 2, the round 1–3 barcodes (R1 BC, R2 BC, R3 BC), the round 1–3 linkers, and P7; the RNA library additionally carries the UMI
Fig. 3 Combinatorial indexing and SHARE-seq's throughput. Shown is the number of cells that can be detected (x-axis, 10^4–10^8) at a given doublet fraction (y-axis, 0.00–0.20) for 96 × 96 × 96, 384 × 96 × 96, and 384 × 384 × 384 pool–split setups; the pool–split process was simulated as random Poisson loading with a 50% loss of cells during each pool–split round
3.2 Annealing of Oligo Plates
In this step, barcode-containing oligonucleotides for each round of split–pool are annealed and distributed into 96-well plates prior to the actual assay. These plates can be stored at -20 °C indefinitely. To save time, it is advisable to prepare enough plates in advance to support multiple experiments. It is critical to thaw these plates to room temperature prior to use. See Note 4.
1. Dilute Round 1 linker oligos (120 μL at 1 mM concentration) with 11,880 μL STE buffer.
2. Mix 90 μL diluted Round 1 linker oligo with 10 μL Round 1 oligo (at 100 μM) in the wells of a multiwell plate.
3. Dilute Round 2 linker oligos (120 μL at 1 mM concentration) with 9480 μL STE buffer.
4. Mix 88 μL diluted Round 2 linker oligo with 12 μL Round 2 oligo (at 100 μM) in the wells of a multiwell plate.
5. Dilute Round 3 linker oligos (144 μL at 1 mM concentration) with 9360 μL STE buffer.
6. Mix 86 μL diluted Round 3 linker oligo with 14 μL Round 3 oligo (at 100 μM) in the wells of a multiwell plate.
7. Anneal the Round 1, Round 2, and Round 3 plates as follows in a thermocycler:
   2 min at 95 °C
   Slow ramp at -1 °C per minute to 20 °C
   2 min at 20 °C
   Indefinitely at 4 °C
8. Check whether there has been significant water evaporation from the wells situated at the corners. If so, add water to equalize volumes.
9. Aliquot 10 μL of the annealed oligos to new plates. This should be enough for 9 experiments. Store these plates at -20 °C.
3.3 Anneal Adapter Oligos
In this step, Tn5 adapters are prepared for both transposition of chromatin and tagmentation during cDNA library preparation:
1. Dilute the Phosphorylated Read2, Read1, and Blocked ME Comp oligos to a 100 μM concentration with the IDTE buffer.
2. Prepare the transposition adapter mix in a PCR tube as follows:
   6.5 μL 100 μM Phosphorylated Read2 oligo
   6.5 μL 100 μM Read1 oligo
   13 μL 100 μM Blocked ME Comp oligo
   0.26 μL 1 M Tris pH 8.0
   0.26 μL 5 M NaCl
3. Prepare the tagmentation adapter mix in a PCR tube as follows:
   13 μL 100 μM Read1 oligo
   13 μL 100 μM Blocked ME Comp oligo
   0.26 μL 1 M Tris pH 8.0
   0.26 μL 5 M NaCl
4. Anneal oligos as follows in a thermocycler:
   2 min at 85 °C
   Slow ramp at -1 °C per minute to 20 °C
   2 min at 20 °C
   Indefinitely at 4 °C
5. Heat glycerol to 65 °C, and then equilibrate to room temperature.
6. Mix 25 μL glycerol with 25 μL of annealed oligo. The annealed adapters can be immediately used or stored at -20 °C.
3.4 Transposome Assembly
In this step, Tn5 transposomes are assembled together with the annealed adapter oligos:
1. Assemble Tn5 transposomes by mixing the following components:
   0.625 × N μL 1× home-made Tn5
   0.625 × N μL dilution buffer
   1.25 × N μL annealed transposition adapter with glycerol
   Total volume: 2.5 × N μL
2. Incubate at room temperature for 30 min. The assembled transposome can be stored at -20 °C for up to 2 weeks.
3.5 Tissue Dissociation
Here, we describe an example tissue dissociation protocol that has worked successfully in our hands for several human embryonic tissues. However, users should be aware that each tissue generally requires separate optimization of dissociation conditions, and it is likely that a different protocol will have to be adapted in most situations.
1. Set swing bucket centrifuge to 4 °C Fast Temp and thaw 1 M DTT.
2. Transfer tissue samples onto dry ice.
3. Prepare MACS buffer (2 mL for each sample) as described above. Make sure the buffer is cold on ice.
4. Add 10 μL Protector RNase Inhibitor for each 1 mL in GentleMACS M-tubes. Add 1 mL of MACS buffer to each GentleMACS M-tube and chill on ice.
5. Transfer 30–50 mg of tissue into each GentleMACS M-tube containing 1 mL MACS buffer.
6. Allow the tissue to thaw in buffer. Transition to a cold room.
7. Homogenize using a Protein_01_01 dissociation protocol on a GentleMACS Tissue Dissociator instrument.
8. Filter the homogenate through a 30 μm CellTrics filter into a 2 mL DNA LoBind tube by pipetting directly onto the top of the filter and gently tapping to allow flow.
9. Wash the GentleMACS M-tube with 1 mL MACS buffer and filter the wash again through the 30 μm CellTrics filter.
10. Spin down the homogenate in a swing bucket centrifuge at 500 × g for 5 min at 4 °C (ramp up and down both at 3/9).
11. Remove and discard supernatant.
12. Resuspend in 1 mL PBS-2RI.
13. Count cells/nuclei and proceed with a desired number of cells/nuclei.
3.6 Fixation of Cells in Culture and of Dissociated Nuclei from Tissue
The next step, if starting with a dissociated tissue, is to fix the nuclei. This is also the first step if starting with cells in culture. The procedure used is generally the same, with the difference that with nuclei the first step is directly the fixation:
1. Prepare PBS-2RI Buffer (4 mL) by mixing the following:
   4 mL 1× PBS
   21.4 μL 7.5% BSA
   10 μL Enzymatics RI
   5 μL SUPERase RI
   Keep on ice.
2. Prepare NIB-RI Buffer (8 mL) by mixing the following:
   8 mL NIB
   20 μL Enzymatics RI
   20 μL SUPERase RI
   Keep on ice.
3. Spin down cells at 500 × g.
4. Wash cells with 0.5 mL PBS-2RI.
5. Count cells with Trypan blue.
6. Resuspend cells with cold PBS-2RI at a concentration of 1 × 10^6 cells/mL.
7. For each 1 mL of cells in PBS-2RI, add 66.7 μL of 1.6% FA (final concentration 0.1% FA) for cells or 66.7 μL of 3.2% FA (final concentration 0.2% FA) for tissues. Mix and incubate at room temperature for 5 min.
8. Quench the reaction by adding to each 1 mL of cells in PBS-2RI the following:
   56.1 μL 2.5 M Glycine
   50 μL 1 M Tris pH 8.0
   13.3 μL 7.5% BSA
   Mix well and incubate on ice for 5 min.
9. Spin down at 500 × g. Remove supernatant, and add 0.5 mL PBS-2RI without disturbing the cell pellet.
10. Prepare RSB-RI by mixing the following:
   2.5 μL 1 M Tris-HCl pH 7.5
   0.5 μL 5 M NaCl
   0.75 μL 1 M MgCl2
   2.5 μL 10% Tween-20
   2.5 μL 10% NP-40
   2.5 μL 1% Digitonin
   33.3 μL 7.5% BSA
   0.25 μL 1 M DTT
   204 μL Ultrapure water
   1.25 μL Enzymatics RI
11. Spin down again at 500 × g. Remove supernatant, resuspend cells in 100 μL RSB-RI, and incubate on ice for 3 min.
12. Prepare RSB-T by mixing the following:
   25 μL 1 M Tris-HCl pH 7.5
   5 μL 5 M NaCl
   7.5 μL 1 M MgCl2
   25 μL 10% Tween-20
   333.3 μL 7.5% BSA
   2.5 μL 1 M DTT
   2089.5 μL Ultrapure water
   12.5 μL Enzymatics RI
13. Pipette 1 mL of RSB-T to cells and mix. Spin down at 500 × g for 5 min.
3.7 ATAC Reaction
In this step, transposition of the entire sample is performed by splitting it into 50-μL reactions of 10,000–20,000 cells each in a 96-well plate. The smaller volume and the smaller number of cells per reaction improve the quality of transposition. The cell lysis conditions described here are adapted from the omniATAC bulk ATAC protocol [22] (see Note 7):
1. Prepare PBS-RI by mixing the following:
   800 μL PBS
   2 μL Enzymatics RI
2. After the last centrifugation, remove supernatant and resuspend the cells with PBS-RI to 2 × 10^6 cells/mL.
3. Prepare 2× TB buffer (sufficient for 96 reactions) by mixing the following:
   874.5 μL 0.2 M Tris-acetate
   70 μL 5 M Potassium acetate
   53 μL 1 M Magnesium acetate
   53 μL 10% Tween-20
   53 μL 1% Digitonin
   848 μL 100% DMF
   698.5 μL H2O
4. Prepare 1× TB buffer according to the number of reactions N to be carried out. N = 1 corresponds to 10^4 input cells.
   25 × N μL 2× TB
   16.45 × N μL H2O
   0.2 × N μL PIC
   0.85 × N μL Enzymatics RI
   Total volume: 42.5 × N μL.
5. Aliquot 5 × N μL of the diluted cells to a new tube, e.g., for 1 × 10^5 cells, N = 10, so aliquot 50 μL cells to a new tube.
6. Add 42.5 × N μL 1× TB to sample.
7. Add 2.5 × N μL of assembled Tn5 to sample. Mix well.
8. Aliquot 50 μL of sample in the wells of a 96- or 384-well plate.
9. Seal the plate and incubate with shaking at 500 rpm for 30 min at 37 °C.
10. Pool the reactions and spin down at 500 × g.
11. Add 0.5 mL NIB-RI without disturbing the pellet and spin down again at 500 × g.
12. Resuspend the cells in 60 μL EB.
3.8 Reverse Transcription
In this step, reverse transcription is performed in situ. The conditions are optimized for 1 × 10^5 cells entering each 50-μL reaction:
1. Prepare the reverse transcription (RT) mix (sufficient for 6 reactions) as follows:
   70 μL 5× RT buffer
   2.19 μL Enzymatics RNase Inhibitor
   4.38 μL SUPERase RI
   17.5 μL dNTPs
   35 μL RT Primer
   10.94 μL H2O
   105 μL 50% PEG
   35 μL Maxima H Minus Reverse Transcriptase (add right before RT reaction)
   Total volume: 280 μL.
2. Add 240 μL RT mix to 60 μL cells in EB.
3. Aliquot 50 μL to 6 PCR wells.
4. Start thawing the oligo plates while the RT is ongoing.
5. Run the reverse transcription reaction in a thermocycler as follows:
   50 °C for 10 min
   3 cycles of:
      8 °C for 12 s
      15 °C for 45 s
      20 °C for 45 s
      30 °C for 30 s
      42 °C for 2 min
      50 °C for 3 min
   50 °C for 5 min
6. Pool samples and mix with 500 μL NIB-RI.
7. Spin down at 500 × g.
8. Wash with 1000 μL NIB.
9. Spin down at 500 × g.
10. Resuspend with 1152 μL NIB-RI.
3.9 Hybridization–Ligation and Pool–Split
In this step, cells/nuclei are iteratively split into individual wells to dynamically create a combinatorial index statistically unique to each cell. All handling is performed at room temperature, so make absolutely sure that oligo plates have been fully thawed before proceeding. If different samples are multiplexed in a single run, they can be individually identified based on the first-round barcodes. If such a strategy is deployed, each sample needs to be processed through transposition and reverse transcription separately and then loaded into specified positions in the first-round plate(s).
1. Prepare 3456 μL hybridization buffer as follows:
   2761.9 μL H2O
   576 μL 10× T4 ligase buffer
   14.4 μL SUPERase RI 20 U/μL
   46.08 μL Enzymatics RI 40 U/μL
   57.60 μL 10% NP-40
2. Mix 1152 μL of sample with 3456 μL hybridization buffer. Keep the sample at RT.
3. Aliquot 40 μL of mixture to a Round 1 plate.
4. Mix and shake at 300 rpm for 30 min at RT.
5. Prepare 1152 μL Blocking Oligo 1 mix as follows:
   253.4 μL 100 μM Round 1 blocking oligo
   211.2 μL 10× T4 DNA Ligase buffer
   687.4 μL H2O
6. Add 10 μL Blocking Oligo 1 mix to each well.
7. Mix and shake at 300 rpm for 30 min at RT.
8. Pool samples from all wells.
9. Aliquot 50 μL of mixture to a Round 2 plate.
10. Mix and shake at 300 rpm for 30 min at RT.
11. Prepare 1152 μL Blocking Oligo 2 mix as follows:
   304.1 μL 100 μM Round 2 blocking oligo
   211.2 μL 10× T4 DNA Ligase buffer
   636.7 μL H2O
12. Add 10 μL Blocking Oligo 2 mix to each well.
13. Mix and shake at 300 rpm for 30 min at RT.
14. Pool samples from all wells.
15. Aliquot 60 μL of mixture to a Round 3 plate.
16. Mix and shake at 300 rpm for 30 min at RT.
17. Prepare 1152 μL Blocking Oligo 3 mix as follows:
   265.0 μL 100 μM Round 3 blocking oligo
   11.5 μL 10% NP-40
   875.5 μL H2O
18. Add 10 μL Blocking Oligo 3 mix to each well.
19. Mix and shake at 300 rpm for 30 min at RT.
20. Pool samples from all wells.
21. Spin down at 500 × g for 5 min.
22. Wash with 1 mL NIB-RI.
23. Spin down at 500 × g for 5 min.
24. Resuspend in 80 μL NIB-RI.
25. Prepare 320 μL Ligation mix as follows:
   3.2 μL Enzymatics RI
   1.00 μL SUPERase RI
   40 μL 10× T4 DNA Ligase buffer
   20 μL T4 DNA Ligase 400 U/μL
   251.8 μL H2O
   4 μL 10% NP-40
26. Mix sample with the 320 μL Ligation mix.
27. Aliquot 8 × 50 μL in PCR tubes.
28. Shake at 300 rpm for 30 min at RT.
29. Pool samples from all tubes.
30. Spin down at 500 × g for 5 min.
31. Wash with 1 mL NIB-RI.
32. Spin down at 500 × g for 5 min.
33. Resuspend in 400 μL NIB-RI.
34. Count the number of nuclei.
Note: If fewer cells are preferred per sub-library, take the desired number of cells and add NIB to bring the volume up to 50 μL per sub-library.
3.10 Reverse Crosslinking
In this step, cells are reverse crosslinked to release DNA from the bound proteins so that the ATAC libraries can be amplified. As the crosslinking is relatively gentle (at 0.1 or 0.2%), a milder reverse crosslinking condition of 1 h incubation at 55 °C is generally sufficient. Further reverse crosslinking optimization might be needed if the crosslinking protocol has been modified:
1. For each N of 50-μL sub-library, add the following:
   50 μL 2× RCB
   2 μL Proteinase K
   1 μL SUPERase RI
2. Incubate at 55 °C for 1 h.
3. Add 5 μL 100 mM PMSF/IPA.
4. Incubate at room temperature for 10 min.
Note: this is an optional stopping point. The reverse crosslinked product can be stored at -80 °C for a few days.
3.11 Pulldown
In this step, the cDNA is separated from the transposition products by pulling down on the biotin that is part of the reverse transcription primer. The supernatant constitutes the transposition products and is processed separately from the cDNA:
1. Prepare 1× B & W-T/RI buffer by mixing the following:
   400 × (N + 1) μL 1× B & W-T buffer
   4 × (N + 1) μL SUPERase RI
2. Prepare 1× B & W/RI buffer by mixing the following:
   100 × (N + 1) μL 1× B & W buffer
   2 × (N + 1) μL SUPERase RI
3. Prepare 1× STE/RI buffer by mixing the following:
   200 × (N + 1) μL 1× STE buffer
   (N + 1) μL SUPERase RI
4. In a fresh tube, mix 10 × N μL MyOne C1 Dynabeads with 100 × N μL 1× B & W-T.
5. Separate on a magnetic rack and remove supernatant.
6. Wash twice with 100 × N μL B & W-T without RI.
7. Wash once with 100 × N μL B & W-T/RI.
8. Resuspend beads in 100 × N μL 2× B & W/RI.
9. Add 100 μL beads to each sample.
10. Incubate at room temperature on a rotator for 60 min.
11. Place the tube on a magnetic rack.
12. Transfer the supernatant (which contains chromatin fragments) to a new tube for ATAC library preparation. The ATAC fragments are stable for a few hours at room temperature and can be processed concurrently or after cDNA library construction is complete.
13. Wash cDNA/RNA-bound beads three times with 100 μL 1× B & W-T/RI.
14. Wash with 100 μL 1× STE/RI without resuspending beads.
3.12 ATAC Library Preparation
In this step, ATAC fragments are purified and amplified into a final library ready for sequencing:
1. Clean up the ATAC part of the sample using Zymo DNA Clean and Concentrate. Elute in 11 μL EB buffer, and then elute again with an additional 11 μL EB buffer (a total of 22 μL EB buffer).
2. Prepare ATAC PCR Master Mix by mixing the following:
   225 μL 2× NEBNext Master Mix
   9 μL P7 primer 25 μM
   27 μL H2O
3. Mix the following:
   20 μL sample
   29 μL ATAC PCR Master Mix
   1 μL of 25 μM Adapter 1 Primer (from the PCR Library indexing primers plate)
4. Run PCR for 5 cycles as follows:
   72 °C for 5 min
   98 °C for 30 s
   5 cycles of:
      98 °C for 10 s
      65 °C for 30 s
      72 °C for 30 s
5. Determine additional cycles using qPCR. Add 5 μL of the pre-amplified reaction to 10 μL qPCR Master Mix for a total qPCR reaction of 15 μL as follows:
   5 μL NEBNext Master Mix
   0.2 μL 25 μM Adapter 1.1
   0.2 μL 25 μM P7
   0.9 μL 10× SYBR Green
   3.7 μL H2O
6. Assess the amplification profiles and determine the required number of additional cycles to amplify. Please refer to Figure 2 in Buenrostro et al. [25].
7. Carry out final amplification by placing the remaining 45 μL in a thermocycler and running the following program:
   Nadd cycles of:
      98 °C for 10 s
      65 °C for 30 s
      72 °C for 30 s
   where Nadd is the number of additional cycles.
8. Clean up the final library using Zymo DNA Clean & Concentrate, eluting in 15 μL.
3.13 RNA Library Preparation Step 1. Template Switching
In this step, RNA library generation is initiated by carrying out template switching on the pulled-down cDNA:
1. Prepare the Template switch mix by mixing the following:
   11.25 μL H2O
   125 μL 50% PEG 6000
   90 μL 5× Maxima RT buffer
   90 μL Ficoll PM-400 (20%)
   45 μL 10 mM dNTPs
   45 μL RNase inhibitor (Lucigen)
   11.25 μL 100 μM TSO oligo
   22.5 μL Maxima RT RNase H Minus (add last, right before reaction)
2. Remove all supernatant. Be careful to avoid drying the beads.
3. Resuspend beads in 50 μL Template switch mix.
4. Incubate samples for 30 min at room temperature with rotation.
5. Incubate samples for 90 min at 42 °C at 300 rpm. Resuspend every 30 min by pipetting up and down.
3.14 RNA Library Preparation Step 2. Amplification of cDNA
The next step is to amplify the individual cDNA molecules.
1. Prepare cDNA PCR Mix by mixing the following:
   247.5 μL KAPA HiFi 2× mix
   7.92 μL 25 μM RNA PCR primer
   7.92 μL 25 μM P7 primer
   231.7 μL H2O
2. Mix samples with 100 μL H2O.
3. Separate beads on magnet. Wash with 200 μL STE without resuspending the beads.
4. Mix beads with 55 μL cDNA PCR Mix and transfer to PCR tubes/plates.
5. Run PCR as follows:
   95 °C for 3 min
   5 cycles of:
      98 °C for 20 s
      65 °C for 45 s
      72 °C for 3 min
6. Determine additional cycles using qPCR. Add 2.5 μL of the pre-amplified reaction to 7.5 μL qPCR Master Mix in a total qPCR reaction of 10 μL as follows:
   3.75 μL KAPA HiFi 2× mix
   0.12 μL 25 μM RNA PCR primer
   0.12 μL 25 μM P7 primer
   0.5 μL 20× EvaGreen
   3.01 μL H2O
7. Determine additional cycles as described above for ATAC libraries.
   5 cycles of:
      98 °C for 20 s
      65 °C for 45 s
      72 °C for 3 min
8. Purify using SPRI beads. Mix the reaction with 0.8× volume of SPRI beads and incubate at room temperature for 10 min. Separate the beads on magnet and wash twice with 200 μL freshly prepared 70% EtOH. Make sure to remove all liquid, and elute in 20 μL.
9. Optional: check size of the cDNA using the D5000 TapeStation.
3.15 RNA Library Preparation Step 3. Tagmentation
The next step is to tagment the amplified cDNA, which will prepare it for the final library amplification step:
1. Quantify cDNA concentration using Qubit.
2. Dilute cDNA to a concentration of 5 ng/μL for tagmentation.
   Note: Expect more than 50 ng cDNA. If the cDNA amount is low, it is possible to tagment as little as 20 ng cDNA; in this case, adjust the volume of H2O and cDNA accordingly.
3. Prepare tagmentation transposome by mixing the following:
   11.25 μL 1× Tn5
   11.25 μL Dilution Buffer
   22.5 μL annealed tagmentation adapter with glycerol
4. Mix the following:
   10 μL 5 ng/μL cDNA
   10 μL H2O
   25 μL 2× TD buffer
   5 μL assembled Tn5
5. Incubate for 5 min at 55 °C.
6. Purify tagmented library using the Zymo kit (use 250 μL binding buffer). Elute twice with 11 μL EB (a total of 22 μL).
3.16 RNA Library Preparation Step 4. Final Amplification
Final libraries are generated by PCR.
1. Prepare post-tagmentation PCR mix by mixing the following:
   20 μL sample
   25 μL 2× NEBNext Master Mix
   1 μL 25 μM P7 primer
   1 μL 25 μM Adapter 1 Primer (from the PCR Library indexing primers plate)
   3 μL H2O
2. Run PCR as follows:
   72 °C for 5 min
   9 cycles of:
      98 °C for 10 s
      65 °C for 30 s
      72 °C for 60 s
3.17 Library Quantification and Evaluation of Library Quality
Before libraries can be sequenced, they need to be properly quantified and subjected to quality evaluation. This is done by first evaluating the insert size distribution and then quantifying the library:
1. Examination of library size distribution. This step can be carried out using several different instruments, such as a TapeStation or a BioAnalyzer. We prefer to use a TapeStation (with the D1000 or HS D1000 kits) due to flexibility, ease of use, and rapid turnaround time.
2. Quantification of library concentration. For most high-throughput sequencing applications, this step is standardly carried out using a Qubit fluorometer. While this works well for libraries with a unimodal fragment-length distribution, ATAC libraries typically exhibit a multimodal fragment distribution and also often contain fragments longer than what can be sequenced on standard Illumina instruments. As a result, effective library concentrations often differ from apparent library concentrations measured using Qubit, and the optimal way of estimating effective library concentration is qPCR.
3. Estimation of effective library concentration using qPCR. Standard Illumina library quantification kits can be used to quantify the concentration of the library that will be able to be sequenced. Products from NEB or KAPA are appropriate for this use.
3.18 Sequencing
The protocol described here generates libraries designed to be sequenced on Illumina sequencers, the most widely available of which is the NextSeq. On a NextSeq, SHARE-seq libraries are sequenced as follows using a 150-cycle kit:
For the RNA libraries, use a 50 bp × 10 bp × 99 bp × 8 bp configuration (Read 1 × Read 2 × Index 1 × Index 2, respectively).
For the ATAC libraries, use a 30 bp × 30 bp × 99 bp × 8 bp configuration (Read 1 × Read 2 × Index 1 × Index 2, respectively).
For RNA, the 10 bp of Read 2 captures the UMI, and the 50 bp of Read 1 captures the actual RNA sequence. For ATAC, fragments are sequenced in a 2 × 30 bp format. The 8 bp of Index 2 captures the library barcode (if more than one library is sequenced in a single run). The 99 bp of Index 1 captures the pool–split barcodes.
For other Illumina instruments, different configurations can be used. For example, using a 200-cycle kit on NovaSeq, run ATAC libraries in a 55 bp × 55 bp × 99 bp × 8 bp configuration and RNA libraries in a 100 bp × 10 bp × 99 bp × 8 bp configuration.
An important consideration before sequencing is that the standard Illumina run recipes do not allow for the 99-bp index read configuration that is necessary for SHARE-seq libraries. This necessitates the creation of custom recipes in which the limits on the length of the index reads are increased accordingly. However, different methods for creating these custom recipes are necessary depending on the Illumina instrument used and the version of the control software that the machine is equipped with; resolving these issues may on occasion require seeking help from Illumina's customer support service.
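When planning a run or writing a custom recipe, it can help to add up the cycles implied by each read configuration. The following is a minimal sketch that only sums the read lengths quoted above; the dictionary keys and the function name are hypothetical, and how many cycles a particular kit actually supports is not stated here and should be checked against the kit documentation.

```python
# Read lengths quoted in the text, in the order
# (Read 1, Read 2, Index 1, Index 2).
CONFIGS = {
    "NextSeq RNA": (50, 10, 99, 8),
    "NextSeq ATAC": (30, 30, 99, 8),
    "NovaSeq RNA": (100, 10, 99, 8),
    "NovaSeq ATAC": (55, 55, 99, 8),
}


def total_cycles(config):
    """Return the sum of all read and index read lengths for one run."""
    return sum(config)


for name, config in CONFIGS.items():
    print(f"{name}: {total_cycles(config)} cycles")
```

With the configurations above, the RNA and ATAC libraries require the same total number of cycles on a given instrument, which follows from the fact that both share the same 99-bp and 8-bp index reads.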
4 Computational Processing
At present there is no standard tool for analyzing pool–split-based multiomics datasets. The pipeline presented here is the one we have been using in our practice. Its objective is to take the raw SHARE-seq reads and to produce objects that can be used for further analysis with established tools for scRNA-seq/scATAC-seq processing such as Seurat and ArchR (e.g., sparse matrices and BAM files). The outline of the processing is shown in Fig. 4. For both ATAC and RNA, reads are first assigned their cellular barcodes. RNA reads are additionally annotated with the sequenced UMIs. RNA reads are aligned against the genome, a quantification is carried out for each gene in each cell, and a final sparse matrix is created. For ATAC, reads are mapped against the genome, then filtered and deduplicated within each cell, and a final BAM file with cellular barcodes appended to each alignment is created.
Fig. 4 Outline of the SHARE-seq computational processing procedures. As a first step, cell barcodes are annotated for all reads in both ATAC and RNA FASTQ files. Subsequently, UMIs are consolidated and assigned to reads in the RNA set. RNA reads are then aligned against the genome, and gene expression is quantified in single cells, resulting in a final data matrix that can be analyzed in Seurat (or other scRNA-seq) tools. ATAC reads are aligned against the genome, filtered (removing mitochondria-mapping reads), and deduplicated within each barcode. Alignments are then annotated with their cell barcodes and can be used as input for further analysis in ArchR. Further joint analysis of the ATAC and RNA can be carried out downstream
4.1 RNA
1. As a first step in the RNA processing, annotate barcodes for each read pair using the SHARE-seq-barcode-annotate.py script.
   python SHARE-seq-barcode-annotate.py BC1file fieldID pos1 lenBC1 BC2file fieldID2 pos2 lenBC2 BC3file fieldID3 pos3 lenBC3 [-BCedit N] [-revcompBC]
The script is flexible and can be used to assign barcodes to almost any kind of pool–split experiment in which the indexes are in Index Read 1. It takes as input files containing the barcodes for each round of pool–split and the column positions of the barcode sequences in each file (0-based), their position in Index Read 1 (0-based), their length, and their orientation (use the [-revcompBC] option if the sequences are reverse complemented, depending on the exact format of the sequencing). Use the [-BCedit] option to increase/decrease the stringency of matching barcode sequences to the master list (the default value is 1). In this case, the barcode files are in the following format:
   #WellPosition   Name        Sequence
   A1              Round1_01   AACGTGAT
   B1              Round1_02   AAACATCG
   C1              Round1_03   ATGCCTAA
   [...]
And barcodes are assigned in a single step as follows:
   python PEFastqToTabDelimited.py RNA.end1.fastq.gz RNA.end2.fastq.gz | python SHARE-seq-barcode-annotate.py Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 Plate_R3.tsv 2 91 8 -revcompBC | python PEFastqToTabDelimited-reverse.py RNA.barcodes_annotated
This will produce FASTQ files with headers looking as follows:
   @[readID]:::[GTTAGCCT+TAGTCTTG+TACCGAGC] 1:N:0:
   TGGGGNCACAGAGCCAAACCATATCAGCTG
   +
   AAAAA#EEEEEAEEEEEEEEEEEEEEEEEE
Here the barcode combinations have been appended to the read headers, with nan if no matching barcode was found due to sequencing errors or other issues, e.g.:
   @[readID]:::[GACGGATT+GATAGAGG+nan] 1:N:0:
   ACCAANCTGTGCACAAGCGTGAATCAACCT
   +
   6AAAA#E/EEEEEEEEEAEEEEEEEEEEEE
Note that it is considerably faster to split the FASTQ files into smaller pieces and process them in parallel.
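To make the barcode assignment logic concrete, the following is a minimal sketch of how three barcodes can be extracted from fixed positions of Index Read 1 and matched against a master list. It is not the SHARE-seq-barcode-annotate.py implementation: the function names are hypothetical, mismatches are counted with a Hamming distance as a simple stand-in for the edit distance controlled by [-BCedit], and it is an assumption that the observed sequence (rather than the master list) is reverse complemented.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def revcomp(seq):
    """Reverse complement of a DNA sequence (N is kept as N)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def match_barcode(observed, whitelist, max_mismatches=1):
    """Return the unique whitelist barcode within max_mismatches, else None."""
    if observed in whitelist:
        return observed
    hits = [bc for bc in whitelist if hamming(observed, bc) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def annotate(index_read, rounds, reverse_complement=False):
    """Build the BC1+BC2+BC3 label; unmatched rounds are reported as 'nan'."""
    labels = []
    for whitelist, pos, length in rounds:
        observed = index_read[pos:pos + length]
        if reverse_complement:
            observed = revcomp(observed)
        hit = match_barcode(observed, whitelist)
        labels.append(hit if hit is not None else "nan")
    return "+".join(labels)


# Toy example: the three barcodes listed above, placed at the 0-based
# positions 15, 53, and 91 used in the command line (8 bp each).
WHITELIST = {"AACGTGAT", "AAACATCG", "ATGCCTAA"}
ROUNDS = [(WHITELIST, 15, 8), (WHITELIST, 53, 8), (WHITELIST, 91, 8)]
index_read = "N" * 15 + "AACGTGAA" + "N" * 30 + "AAACATCG" + "N" * 30 + "GGGGGGGG"
print(annotate(index_read, ROUNDS))
```

In this toy read, the first barcode carries one mismatch and is still recovered, while the third has no close match and is reported as nan, mirroring the header format shown above.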
2. Compress the output files:
   gzip RNA.barcodes_annotated.end1.fastq
   gzip RNA.barcodes_annotated.end2.fastq
3. Annotate UMIs using the SHARE-seq-RNA-UMI-Add.py script, which is also flexible and can read UMIs of different lengths in each read in the pair:
   python SHARE-seq-RNA-UMI-Add.py UMIlen read1|read2
As follows:
   python PEFastqToTabDelimited.py RNA.barcodes_annotated.end1.fastq.gz RNA.barcodes_annotated.end2.fastq.gz | python SHARE-seq-RNA-UMI-Add.py 10 read2 | python PEFastqToTabDelimited-reverse.py RNA.barcodes_annotated.RNA_UMI
This step will append the UMI sequence to the cell barcodes in the read ID:
   @[readID]:::[TGACCACT+GGTCGTGT+TGCTGATA+TTTATGATAG]
   CCTCTNGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGC
   +
   AAAAA#EEEEEEEEEEEEEEEEEEEEEEEEEEEEEE/EEEEEEEEEEEEE
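The header manipulation in this step can be sketched as follows. This is a minimal illustration rather than the SHARE-seq-RNA-UMI-Add.py code: the function name is hypothetical, and it assumes that the UMI occupies the first UMIlen bases of the chosen read and that the barcode block in the header is closed by the last "]" character.

```python
def add_umi(header, read1_seq, read2_seq, umi_len=10, umi_read="read2"):
    """Append the UMI to the bracketed barcode block of a read header.

    header is expected to look like '@readID:::[BC1+BC2+BC3]', optionally
    followed by further text such as ' 1:N:0:'.
    """
    source = read2_seq if umi_read == "read2" else read1_seq
    umi = source[:umi_len]  # assumption: UMI is at the start of the read
    prefix, bracket, rest = header.rpartition("]")
    if not bracket:
        raise ValueError("header has no barcode block: " + header)
    return f"{prefix}+{umi}]{rest}"


new_header = add_umi(
    "@read1:::[TGACCACT+GGTCGTGT+TGCTGATA]",
    "CCTCTNGCTCAGCCTATATACCGCCATCTTCAGCAAACCCTGATGAAGGC",
    "TTTATGATAG",
)
print(new_header)
```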
4. Compress the output files:
   gzip RNA.barcodes_annotated.RNA_UMI.end1.fastq
   gzip RNA.barcodes_annotated.RNA_UMI.end2.fastq
5. If the FASTQ files were split and processed in parallel, merge the individual compressed files:
   cat RNA_*.barcodes_annotated.RNA_UMI.end1.fastq.gz > RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz
   cat RNA_*.barcodes_annotated.RNA_UMI.end2.fastq.gz > RNA.barcodes_annotated.RNA_UMI.end2.fastq.gz
6. Align the Read 1 FASTQ file against the genome using STAR as follows (the commands given here use the standard ENCODE Project Consortium [69] STAR settings):
   STAR --limitSjdbInsertNsj 10000000 --genomeDir genome/STAR --outFileNamePrefix RNA.end1.STAR/ --readFilesIn RNA.barcodes_annotated.RNA_UMI.end1.fastq.gz --runThreadN 20 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 50 --outSAMstrandField intronMotif --outFilterMismatchNmax 999 --outFilterMismatchNoverReadLmax 0.04 --alignIntronMin 10 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --outWigStrand Stranded --twopassMode Basic --twopass1readsN -1 --limitBAMsortRAM 500000000000
7. Index the output BAM file:
   samtools index RNA.end1.STAR/Aligned.sortedByCoord.out.bam
8. Calculate global mapping statistics:
   python SAMstats.py RNA.end1.STAR/Aligned.sortedByCoord.out.bam SAMstats-RNA.end1.STAR.hg38 -bam genome.chrom.sizes samtools
This script will output the number of mapped reads in various categories (uniquely mapping, spliced, etc.) as well as the molecular complexity of the alignment.
9. Calculate read distribution relative to the genome annotation:
   python SAM_reads_in_genes3_BAM.py annotation.gtf RNA.end1.STAR/Aligned.sortedByCoord.out.bam genome.chrom.sizes sam_reads_genes-RNA.end1.STAR -nomulti
This script will output the fraction of exonic, intronic, and intergenic reads. For single-cell assays, this information is important for evaluating to what extent the cytoplasm (which is enriched for exonic reads relative to the nucleus) is captured in the final libraries.
10. Make an RPM-normalized (Reads Per Million mapped reads) global coverage track:
   python makewigglefromBAM-NH.py title RNA.end1.STAR/Aligned.sortedByCoord.out.bam genome.chrom.sizes RNA.end1.STAR/Aligned.sortedByCoord.out.wig -RPM
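For reference, RPM normalization amounts to scaling the raw per-position coverage by one million divided by the number of mapped reads. The sketch below shows only this arithmetic; the function names are hypothetical, and how makewigglefromBAM-NH.py counts mapped reads (for example, how multimapping reads are weighted) is not described here and is not reproduced.

```python
def rpm_scale_factor(mapped_reads):
    """Scaling factor that converts raw read depth to reads per million."""
    if mapped_reads <= 0:
        raise ValueError("mapped_reads must be positive")
    return 1_000_000 / mapped_reads


def rpm_normalize(raw_coverage, mapped_reads):
    """Scale a list of raw per-position depths to RPM."""
    factor = rpm_scale_factor(mapped_reads)
    return [depth * factor for depth in raw_coverage]


# Hypothetical example: three positions in a library of 2 million mapped reads.
print(rpm_normalize([0, 5, 10], 2_000_000))
```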
11. Evaluate read coverage along transcripts:
   python gene_coverage_wig_gtf.py annotation.gtf RNA.end1.STAR/Aligned.sortedByCoord.out.wig 1000 coverage-RNA -normalize -singlemodelgenes
Run with these settings, this script will output the average read profile over all genes that have only a single annotated transcript (in order to avoid confounding by the presence of multiple isoforms) and are ≥1000 bp in length. Use a simple annotation with few isoforms, such as RefSeq, to get as many genes meeting these requirements as possible.
12. Calculate UMI counts per gene and per cell barcode combination using the SHARE-seq_RNA_counts.py script. For faster processing, run this on each chromosome in parallel, as follows (shown is chr1):
   python SHARE-seq_RNA_counts.py RNA.end1.STAR/Aligned.sortedByCoord.out.bam annotation.gtf.chr1 genome.chrom.sizes RNA.SHARE-seq_RNA_counts.chr1 -UMIedit 1
The [-UMIedit] option can be used to tweak the level of UMI collapsing (in this case UMIs within an edit distance of 1 from each other will be collapsed into a single UMI).
13. Calculate per-cell statistics by merging the individual outputs using the SHARE-seq-RNA-BC-sum-across-files.py script as follows:
   python SHARE-seq-RNA-BC-sum-across-files.py list_of_per_chromosome_outputs RNA.SHARE-seq_RNA_counts.UMIs_per_cell
This will output a file in the following format:
   #BC1+BC2+BC3                 rank3   UMIs3   Aligned Positions   genes
   GCCAATGT+CAGATCTG+TAACGCTG   1       64660   171969              8369
   GTTGTCGG+TAAGCGTT+GATCAGCG   2       47079   123008              7864
   TGACCACT+GGTCGTGT+TGCTGATA   3       45034   109960              7652
which shows the number of UMIs and the number of detected genes for each cell barcode combination.
14. Extract cell barcode combinations above a desired threshold, e.g., ≥500 UMIs, into a separate file.
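No specific command is given for this filtering step, so the following is a minimal sketch of one way to do it. It assumes the per-cell file is whitespace-delimited with the UMI count in the third column (0-based index 2), as in the example rows above; the function name, the column index, and the last example row are hypothetical and should be adapted to the actual output.

```python
def filter_barcodes(lines, min_umis=500, umi_column=2):
    """Keep rows whose UMI count is at least min_umis; skip header lines."""
    kept = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split()
        if int(fields[umi_column]) >= min_umis:
            kept.append(line.rstrip("\n"))
    return kept


example = [
    "#BC1+BC2+BC3\trank\tUMIs\t...\n",
    "GCCAATGT+CAGATCTG+TAACGCTG\t1\t64660\t171969\t8369\n",
    "GTTGTCGG+TAAGCGTT+GATCAGCG\t2\t47079\t123008\t7864\n",
    "AAAAAAAA+CCCCCCCC+GGGGGGGG\t4\t120\t300\t95\n",  # made-up low-count row
]
for row in filter_barcodes(example):
    print(row)
```

To apply this to a real file, the same function can be called on an open file handle and the returned rows written to a new file, e.g., the RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500 file used in the next step.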
15. Create final sparse matrix format files that can be used as input to Seurat for further analysis with the SHARE-seq-RNA-UMIs-sum-across-files.py script:
   python SHARE-seq-RNA-UMIs-sum-across-files.py list_of_per_chromosome_outputs RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500 0 RNA.SHARE-seq_RNA_counts.UMIs_per_cell.min500.sparse -sparse
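As an illustration of the UMI collapsing controlled by the [-UMIedit] option in step 12, the following sketch merges UMIs observed for one gene in one cell when they differ by at most one base. It is a simple greedy scheme written for this example, not the algorithm used by SHARE-seq_RNA_counts.py; it uses a Hamming distance in place of a general edit distance, and the function names are hypothetical.

```python
def hamming(a, b):
    """Mismatch count; sequences of different length never match."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def collapse_umis(umi_counts, max_dist=1):
    """Greedily merge each UMI into the most abundant UMI within max_dist.

    umi_counts maps UMI sequence -> number of reads. The returned dict maps
    each representative UMI to the summed read count of its group.
    """
    kept = {}
    ordered = sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for umi, count in ordered:
        for rep in kept:
            if hamming(umi, rep) <= max_dist:
                kept[rep] += count
                break
        else:
            kept[umi] = count
    return kept


# Hypothetical read counts for three UMIs seen for one gene in one cell.
observed = {"TTTATGATAG": 5, "TTTATGATAC": 1, "GGGCCCAAAT": 2}
collapsed = collapse_umis(observed)
print(collapsed, "->", len(collapsed), "molecules")
```

In this toy case, the singleton UMI that differs by one base from the most abundant UMI is treated as a sequencing error of it, so two molecules are counted instead of three.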
4.2 ATAC
The first steps of the ATAC processing are analogous to those of the RNA pipeline:
1. First, annotate cellular barcodes:
   python PEFastqToTabDelimited.py ATAC.end1.fastq.gz ATAC.end2.fastq.gz | python SHARE-seq-barcode-annotate.py Plate_R1.tsv 2 15 8 Plate_R2.tsv 2 53 8 Plate_R3.tsv 2 91 8 -revcompBC | python PEFastqToTabDelimited-reverse.py ATAC.barcodes_annotated
Note as before that it is considerably faster to split the FASTQ files into smaller pieces and process them in parallel.
2. Compress the output files:
   gzip ATAC.barcodes_annotated.end1.fastq
   gzip ATAC.barcodes_annotated.end2.fastq
3. If the FASTQ files were split and processed in parallel, merge the individual compressed files:
   cat ATAC_*.barcodes_annotated.end1.fastq.gz > ATAC.barcodes_annotated.end1.fastq.gz
   cat ATAC_*.barcodes_annotated.end2.fastq.gz > ATAC.barcodes_annotated.end2.fastq.gz
4. Align reads against the mitochondrial genome with Bowtie as follows:
   python PEFastqToTabDelimited.py ATAC.barcodes_annotated.end1.fastq.gz ATAC.barcodes_annotated.end2.fastq.gz -trim 30 30 | bowtie bowtie-indexes/chrM -p 20 -v 2 -a -t --best --strata -q -X 1000 --sam --12 - | samtools view -F4 -bT genome.fa - | samtools sort - ATAC.2x30mers.chrM
The purpose of this step is to evaluate the extent of mitochondrial contamination in the overall library.
5. Align reads against the full genome with Bowtie and filter out mitochondrial reads as follows:
   python PEFastqToTabDelimited.py ATAC.barcodes_annotated.end1.fastq.gz ATAC.barcodes_annotated.end2.fastq.gz -trim 30 30 | bowtie bowtie-indexes/genome -p 20 -v 2 -k 2 -m 1 -t --best --strata -q -X 1000 --sam --12 - | egrep -v chrM | samtools view -F4 -bT genome.fa - | samtools sort - ATAC.2x30mers.unique.nochrM
Adjust accordingly if working with a genome in which the mitochondrial chromosome/contigs are named differently or there are multiple contigs to be filtered out (e.g., in plants, where there is also a plastid in addition to the mitochondrion).
6. Index the resulting BAM files:
   samtools index ATAC.2x30mers.unique.nochrM.bam
   samtools index ATAC.2x30mers.chrM.bam
7. Calculate mapping statistics for the two sets of alignments:
   python SAMstats.py ATAC.2x30mers.chrM.bam SAMstats-ATAC.2x30mers.chrM -bam genome.chrom.sizes samtools -paired -noNHinfo
   python SAMstats.py ATAC.2x30mers.unique.nochrM.bam SAMstats-ATAC.2x30mers.unique.nochrM -bam genome.chrom.sizes samtools -paired -uniqueBAM
8. Calculate the mitochondrial read fraction MRF as follows:
$$\mathrm{MRF} = \frac{|R_M|}{|R_M| + |R_N|} \qquad (1)$$
where $R_M$ is the total number of reads that map to the mitochondrial genome and $R_N$ is the number of reads that map to the nuclear genome after filtering out mito-mapping reads.
9. Evaluate the fragment size distribution over the nuclear genome:
   python PEInsertDistFromBAM.py ATAC.2x30mers.unique.nochrM.bam genome.chrom.sizes ATAC.2x30mers.unique.nochrM.InsLen -uniqueBAM -normalize
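The mitochondrial read fraction of Eq. 1 (step 8) is a simple ratio of the two read counts reported by the mapping statistics. The sketch below shows the arithmetic only; the function name and the example counts are hypothetical, and the two counts need to be taken from the SAMstats outputs of step 7.

```python
def mitochondrial_read_fraction(mito_reads, nuclear_reads):
    """MRF = |R_M| / (|R_M| + |R_N|), following Eq. 1."""
    total = mito_reads + nuclear_reads
    if total == 0:
        raise ValueError("no mapped reads")
    return mito_reads / total


# Made-up counts: 250,000 mitochondrial and 750,000 nuclear reads.
print(mitochondrial_read_fraction(250_000, 750_000))
```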
10. Create a normalized genome coverage track:
   python makewigglefromBAM-NH.py title ATAC.2x30mers.unique.nochrM.bam genome.chrom.sizes ATAC.2x30mers.unique.nochrM.wig -notitle -RPM -uniqueBAM
11. Create a BigWig file using the wigToBigWig program from the UCSC Genome Browser utilities suite:
   wigToBigWig ATAC.2x30mers.unique.nochrM.wig genome.chrom.sizes ATAC.2x30mers.unique.nochrM.bigWig
12. Calculate the global TSS enrichment. The TSS enrichment (TSSE) is the most informative ATAC-seq quality metric. It is based on generating an average read distribution profile around annotated transcription start sites for protein-coding genes and then calculating the ratio between the number of reads in the immediate neighborhood of the TSS and the number of reads falling in the regions on the flanks of the TSS peak. The advantage of the TSSE metric is that it is internal to the dataset and independent of peak calling. We use a TSS window of ±100 bp and a TSS flank distance of 2000 bp, i.e., TSSE is calculated as follows:
$$\mathrm{TSSE} = \frac{|R \in [\mathrm{TSS} \pm 100]|}{|R \in [\mathrm{TSS} - 2050, \mathrm{TSS} - 1950]| + |R \in [\mathrm{TSS} + 1950, \mathrm{TSS} + 2050]|} \qquad (2)$$
First, generate the TSS metaprofile: python signalAroundCoordinate-BW.py annotation.TSS-0bp.bed 0 1 3 4000 ATAC.2x30mers.unique.nochrM.bigWig ATAC.2x30mers.unique.nochrM.TSS_profile -normalize
Note that you need a BED file containing the start positions and the strands of annotated TSSs in the genome, e.g.,
#chr   TSS    TSS    strand   geneName
chr1   1000   1000   +        GENE1
Second, calculate the TSS score: python ATACTSSscore.py ATAC.2x30mers.unique.nochrM.TSS_profile 100 2000 >> ATACTSSscore.txt
13. Deduplicate the BAM file. Note that this step is different from the typical deduplication carried out in most high-throughput sequencing pipelines, which is based on tools such as MarkDuplicates in Picard. Here, we perform deduplication of fragments only within the same cell barcode, i.e., for two fragments to be collapsed, they need to have the same coordinates, orientation, and cell barcode. python SHARE-seq_ATAC_dedup.py ATAC.2x30mers.unique.nochrM.bam genome.chrom.sizes ATAC.2x30mers.unique.nochrM.BC_dedup.bam -addBC
Use the [-addBC] option to append the cell barcodes to each alignment as a BC tag, making these final files ready to use with ArchR. 14. Index the deduplicated BAM file: samtools index ATAC.2x30mers.unique.nochrM.BC_dedup.bam
15. Calculate alignment stats for the deduplicated BAM file: python SAMstats.py ATAC.2x30mers.unique.nochrM.BC_dedup.bam SAMstats-ATAC.2x30mers.unique.nochrM.BC_dedup -bam genome.chrom.sizes samtools -paired -uniqueBAM
16. Calculate fragment count and TSS enrichment statistics for each cell barcode. python SHARE-seq_ATAC_stats_per_cell.py ATAC.2x30mers.unique.nochrM.BC_dedup.bam genome.chrom.sizes annotation.TSS-0bp.bed 0 1 2000 200 ATAC.2x30mers.unique.nochrM.BC_dedup.per_cell_stats
This script will output a file containing the number of fragments and the TSS enrichment for each barcode, which can be used to filter barcodes for downstream analysis. More sophisticated filtering beyond these simple metrics, e.g., removal of doublet cells, can be performed in ArchR [67].
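The quantities defined in Eqs. 1 and 2 and the barcode-aware deduplication rule of step 13 can be illustrated with a minimal Python sketch. This is not the code of SAMstats.py, ATACTSSscore.py, or SHARE-seq_ATAC_dedup.py; the function names, the `Fragment` record, and the representation of the TSS metaprofile as a dictionary keyed by position relative to the TSS are assumptions made for illustration only.

```python
from collections import namedtuple

# Hypothetical record for one aligned fragment (not a format used by the scripts above).
Fragment = namedtuple("Fragment", "chrom start end strand barcode")


def mitochondrial_read_fraction(n_mito_reads, n_nuclear_reads):
    """Eq. 1: MRF = |R_M| / (|R_M| + |R_N|)."""
    total = n_mito_reads + n_nuclear_reads
    if total == 0:
        raise ValueError("no mapped reads")
    return n_mito_reads / total


def tss_enrichment(profile, window=100, flank=2000, flank_half_width=50):
    """Eq. 2 applied to a metaprofile given as {position relative to TSS: signal}.

    The numerator sums the signal within +/- window of the TSS; the denominator
    sums the signal in the two flanking intervals centered at -flank and +flank.
    """
    center = sum(v for p, v in profile.items() if abs(p) <= window)
    flanks = sum(
        v for p, v in profile.items() if abs(abs(p) - flank) <= flank_half_width
    )
    if flanks == 0:
        raise ValueError("no signal in the flanking regions")
    return center / flanks


def dedup_within_barcode(fragments):
    """Collapse fragments sharing coordinates, orientation, and cell barcode.

    Identical fragments carrying different barcodes are all kept, which is the
    difference from conventional bulk deduplication.
    """
    seen = set()
    kept = []
    for frag in fragments:
        if frag not in seen:  # namedtuples compare and hash by value
            seen.add(frag)
            kept.append(frag)
    return kept
```

With the default arguments, `tss_enrichment` uses the ±100 bp window and the 2000 bp flank distance given in step 12; a completely flat profile yields a value close to 1, so values well above 1 indicate enrichment at TSSs.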
5 Expected Results
5.1 Sequencing Libraries
Figure 5 shows the typical fragment profiles for ATAC and RNA SHARE-seq libraries. ATAC libraries are expected to show a nucleosomal signature, with prominent subnucleosomal and mononucleosomal peaks and perhaps a dinucleosomal peak, shifted to the right by the length of the adapters and barcodes added to the original fragments. In contrast, RNA libraries are primarily unimodal in length.
5.2 Species Mixing Experiments
A customary experiment to be carried out when testing, adopting, or developing any new single-cell protocol is the species mixing experiment, in which cells from two different species, usually mouse and human, are mixed together, and the extent of crosstalk/contamination of individual barcodes or of doublet formation (in which two cells are processed together with the same barcode)
Fig. 5 Typical fragment-length profiles of SHARE-seq libraries. (a) BioAnalyzer profile of a SHARE-seq ATAC library. (b) BioAnalyzer profile of a SHARE-seq RNA library
[Fig. 6 panels: (A) ATAC, mm10 fragments per barcode vs. hg38 fragments per barcode; (B) RNA, mm10 UMIs per barcode vs. hg38 UMIs per barcode]
Fig. 6 Typical results from a species mixing SHARE-seq experiment. Human HEK293 and mouse embryonic fibroblast (MEF) cells were mixed in equal proportions and carried through the SHARE-seq workflow. (a) ATAC fragments per cell. (b) RNA UMIs per cell
is assessed based on how many reads in each barcode map to each species. Ideally, all barcodes should feature reads coming from only one of the two species. Doublets arise from the loading of multiple cells into the same droplets/wells (depending on the method used) or from physical clumping of cells early in the protocol, after which the clumped cells are processed together throughout the rest of the procedure. Figure 6 shows typical species mixing results for a SHARE-seq experiment. We note that in our hands ATAC experiments usually show virtually no crosstalk between barcodes and very few doublets. On the other hand, pool–split RNA experiments in general often exhibit a small fraction of reads resulting from “leakage,” likely because some cells open up during cell handling and release their content into the general reaction pool. This issue does not significantly affect most analyses, but it should be kept in mind in the cases in which it could be a confounding factor.
5.3 ATAC Postsequencing Quality Evaluation
Figure 7 shows the key ATAC-seq bulk-level metrics. The fragment-length distribution (Fig. 7a) usually shows strong subnucleosomal and nucleosomal peaks as well as a weaker dinucleosomal one. High TSS enrichment is desirable; in this case (Fig. 7b), it is very high (TSSE ≥25). See Note 8 for more details. Figure 7c shows the fraction of mitochondrial reads in the human and mouse cells in the species mixing experiment. Note that the fraction can vary greatly depending on the properties of the cell type (cancer cell lines and highly metabolically active cells tend to have more mitochondria [70]) and not just on experimental variation (which in this case is minimized, as the cells were processed together).
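The per-barcode species assignment that underlies a species mixing analysis (Sect. 5.2, and the per-species split in Fig. 7c) can be sketched as follows. This is an illustrative sketch only: the function name, the 90% purity threshold, and the minimum count of 100 are hypothetical choices, not values taken from this protocol.

```python
def classify_barcode(n_human, n_mouse, purity=0.9, min_total=100):
    """Assign a barcode to a species from its per-species fragment (or UMI) counts.

    purity and min_total are hypothetical thresholds chosen for illustration.
    """
    total = n_human + n_mouse
    if total < min_total:
        return "low_count"  # too few reads to make a call
    if n_human / total >= purity:
        return "human"
    if n_mouse / total >= purity:
        return "mouse"
    return "mixed"  # candidate doublet or heavily contaminated barcode


def summarize(counts_by_barcode):
    """Count barcodes per class; input maps barcode -> (n_human, n_mouse)."""
    summary = {"human": 0, "mouse": 0, "mixed": 0, "low_count": 0}
    for n_human, n_mouse in counts_by_barcode.values():
        summary[classify_barcode(n_human, n_mouse)] += 1
    return summary
```

The fraction of "mixed" barcodes among those that pass the count threshold gives a rough estimate of the observable doublet and crosstalk rate; same-species doublets are not detectable this way.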
[Fig. 7 panels: (A) Number fragments vs. Fragment length; (B) Average RPM vs. Position relative to TSS, TSSE = 27.51; (C) Fraction of reads (nuclear, chrM) for HEK293 (hg38) and MEF (mm10)]
Fig. 7 Basic evaluation of bulk-level ATAC quality and enrichment. (a) Fragment-length distribution. (b) TSS enrichment. Shown are the same experiments as those featured in Fig. 6. (c) Mitochondrial read fraction for each species in this experiment
[Fig. 8 panel (A) axes: TSS ratio vs. Number fragments]
Fig. 8 Basic evaluation of scATAC-seq-level quality and enrichment. (a) Number fragments per cell barcode vs. TSS enrichment. (b) Cell barcode rank (by fragment counts) vs. fragment counts per cell barcode
Figure 8 shows the key scATAC metrics. One such metric is the relationship between the number of fragments per cell barcode and the TSS enrichment within each cell barcode (Fig. 8a). Another is the curve of the number of fragments per cell barcode plotted against the rank (by the number of fragments per cell barcode) of the cell barcodes (Fig. 8b). Ideally, there should be a clear inflection point between the cell barcodes with high fragment counts and the cell barcodes with low fragment counts, indicating that a set of high-quality cells have been captured and preserved intact through the full pool–split procedure. A flatter, diagonal-like shape of that curve can be indicative of loss of cell integrity during handling and is potentially concerning regarding the biological interpretability of the experiment if the lack of inflection is too extreme.
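One simple way to locate the inflection point on the barcode rank curve is sketched below. It is a heuristic chosen for illustration, not a method prescribed by this protocol: on the log-log rank vs. count curve, it picks the point that lies farthest above the straight line connecting the first and last points. The function name is an assumption.

```python
import math


def knee_rank(counts):
    """Return (rank, count) at the knee of the barcode rank curve.

    counts: iterable of per-barcode fragment (or UMI) counts.
    Heuristic: the point farthest above the chord joining the first and last
    points of the curve in log10-log10 space.
    """
    ordered = sorted((c for c in counts if c > 0), reverse=True)
    if len(ordered) < 3:
        raise ValueError("need at least three barcodes with nonzero counts")
    xs = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log10(c) for c in ordered]
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]

    def height_above_chord(i):
        # Cross product of the chord with the vector to point i; it is
        # proportional to the signed distance from the chord.
        return dx * (ys[i] - ys[0]) - dy * (xs[i] - xs[0])

    best = max(range(len(ordered)), key=height_above_chord)
    return best + 1, ordered[best]
```

Barcodes ranked at or above the returned rank are candidates for real cells. A flat, diagonal-like curve has no point far from the chord, and in that case the returned knee should not be trusted.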
[Fig. 9 panels: (A) Coverage (arbitrary units) vs. Position along mRNA 5' -> 3' (for mRNAs >= 1,000 bp); (B) Fraction of reads (exonic, intronic, intergenic) for HEK293 (hg38) and MEF (mm10)]
Fig. 9 Basic evaluation of the bulk-level RNA-seq properties. (a) Read distribution along transcript lengths. (b) Read distribution relative to the exonic, intronic, and intergenic genomic spaces
5.4 RNA Postsequencing Quality Evaluation
Figure 9 shows the typical parameters to be evaluated for a bulk-level RNA-seq dataset. One is the distribution of reads along transcripts (Fig. 9a). SHARE-seq attaches UMIs to the 3’ end of transcripts, but it is not a 3’-tagging experiment in the way some scRNA-seq approaches are: cDNAs are tagmented at random after cDNA amplification, and thus the first reads of the RNA part of a SHARE-seq dataset can be some distance away from the 3’ end. Another is the distribution of reads relative to the annotation (Fig. 9b). As is often observed in scRNA-seq datasets, SHARE-seq RNA libraries contain a significant portion of reads originating from introns, presumably from unspliced transcripts present in the nucleus. This is likely due to the fact that the ATAC reaction has to happen first in the workflow, and thus a substantial portion of the cytoplasm is lost and the final libraries are enriched for nuclear material. Figure 10 shows the key metric for evaluating the success of the RNA portion of a SHARE-seq experiment. As with ATAC above, the curve of the number of UMIs per cell barcode plotted against the rank (by the number of UMIs per cell barcode) of the cell barcodes should ideally feature a clear inflection point between the cell barcodes with high UMI counts and the cell barcodes with low UMI counts (Fig. 10a). There should also be concordance between the cell barcodes with high ATAC fragment counts and those with high UMI counts, i.e., the same cells are of high quality in both modalities and are thus usable for joint analysis (Fig. 10b).
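The concordance between modalities can be checked with a short sketch such as the one below, which selects the barcodes that pass a fragment threshold in ATAC and a UMI threshold in RNA and reports their overlap. The function name and both thresholds are hypothetical and should be replaced with values chosen from the rank curves of the experiment at hand.

```python
def joint_high_quality_barcodes(atac_fragments, rna_umis,
                                min_fragments=1000, min_umis=500):
    """Return (barcodes passing both modalities, Jaccard index of the two sets).

    atac_fragments, rna_umis: dicts mapping cell barcode -> count.
    min_fragments, min_umis: hypothetical thresholds used for illustration.
    """
    atac_ok = {bc for bc, n in atac_fragments.items() if n >= min_fragments}
    rna_ok = {bc for bc, n in rna_umis.items() if n >= min_umis}
    both = atac_ok & rna_ok
    union = atac_ok | rna_ok
    jaccard = len(both) / len(union) if union else 0.0
    return both, jaccard
```

A Jaccard index close to 1 indicates that largely the same barcodes are of high quality in both modalities; a low value suggests that one of the two libraries failed for a large share of the cells.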
[Fig. 10 panels: (A) Number UMIs vs. cell barcode rank; (B) Number_fragments + 1 vs. UMIs + 1, colored by count]
Fig. 10 Basic evaluation of SHARE-seq RNA single-cell-level quality and enrichment. (a) Cell barcode rank (by UMI counts) vs. UMI counts per cell barcode. (b) UMI counts per barcode vs. ATAC fragment counts per barcode.
[Fig. 11 panels: (A) ATAC, UMAP_1 vs. UMAP_2; (B) RNA, UMAP_1 vs. UMAP_2]
Fig. 11 Example SHARE-seq output on human embryonic lung samples. (a) ArchR iterative LSI UMAP on the ATAC-seq dataset. (b) Seurat UMAP on the RNA dataset. Individual ArchR- and Seurat-defined clusters are colored separately
5.5 Dimensionality Reduction and Cell Type/Cluster Identification
Following initial data processing, clusters and cell types can be identified using standard tools for that purpose, such as Seurat [68] and/or ArchR [67]. Figure 11 shows typical output of this kind in UMAP space for both the ATAC and RNA sides of a SHARE-seq experiment from a human embryonic lung tissue sample.
6 Notes
1. The details of the production of hyperactive transposase are beyond the scope of this chapter. However, detailed instructions for how to carry it out can be found in Picelli et al. 2014 [71].
2. In this chapter, we presented one of many available protocols for tissue dissociation and nuclei isolation, which has worked in our hands in some contexts. However, the variety of tissues and their properties that can be encountered in different organisms is vast, making it practically impossible to have one common protocol for all situations. Thus novel optimal procedures for tissue dissociation often have to be empirically devised or adapted.
3. The protocol we described here used light 0.1% FA crosslinking. This does not mean that optimal results will be obtained in all contexts with the same conditions, and crosslinking may have to be optimized depending on the specifics of the experimental system being studied.
4. The protocol described here is for 96 × 96 × 96 indexing. However, it can be expanded to more cycles and/or more barcodes, e.g., to a 3-round 384 × 384 × 384 indexing, or 4-round or 5-round 96/384 × 96/384 × 96/384. Pick the optimal design based on the availability of robotic liquid handlers (it is generally not practical to carry out pipetting of 384-well plates by hand), the desired throughput, and other considerations. Note that additional barcodes and linkers would have to be designed so that they are compatible with each other and with further rounds of barcoding. Aim for as much distance in sequence space as possible between the 8-bp barcodes (or increase their length, if the sequencing format allows for it). The set of 8-bp barcodes can be identical throughout all rounds of indexing.
5. Low-binding tubes are preferable for all reactions in order to ensure maximum yields.
6. It is optimal in terms of effort to anneal a sufficient amount of oligos for multiple experiments on many separate plates. These can then be used immediately when cells/tissues become available, saving a considerable amount of experimental time.
7. The TB buffer described here is modified from the original omniATAC protocol with the addition of acetate. In our experience, this provides superior results compared to the traditional buffer formulation.
8. In our experience (and that of others), experiments in cell lines always produce much higher quality ATAC datasets than those obtained from tissues, especially frozen tissues. This is not limited to SHARE-seq; it has been observed in numerous previous studies mapping chromatin accessibility in tissue samples in contexts such as cancer, development, and adult tissues [27, 28, 72, 73]. This is likely due to the extensive handling and the freezing and thawing of tissues, which lead to the breaking up of nuclei and the release of unprotected free DNA that is tagmented by Tn5, increasing the background fragments and decreasing the signal-to-noise ratio. Whether future protocol optimizations can resolve these issues or they are fundamentally insurmountable is not known at present.
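The barcode design considerations in Note 4 can be made concrete with a small sketch that computes the size of the combinatorial barcode space and the minimum pairwise Hamming distance of a candidate barcode set. The function names are assumptions made for illustration, and the three barcodes in the usage example are toy sequences, not barcodes from this protocol.

```python
from itertools import combinations


def barcode_space(barcodes_per_round, rounds):
    """Number of distinct barcode combinations, e.g., 96 x 96 x 96."""
    return barcodes_per_round ** rounds


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_hamming(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


if __name__ == "__main__":
    print(barcode_space(96, 3))    # three rounds of 96 barcodes
    print(barcode_space(384, 3))   # three rounds of 384 barcodes
    print(min_pairwise_hamming(["AAAAAAAA", "AAAATTTT", "TTTTTTTT"]))
```

A larger minimum distance leaves more room for correcting sequencing errors when matching observed barcodes to the expected set; the number of cells loaded should remain small relative to the size of the barcode space to keep barcode collisions rare.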
Acknowledgements
The authors thank Sai Ma and Jason Buenrostro for helpful discussion regarding the SHARE-seq protocol. This work was supported by NIH grants (P50HG007735, RO1 HG008140, U19AI057266 and UM1HG009442 to W.J.G., 1UM1HG009436 to W.J.G. and A.K., 1DP2OD022870-01 and 1U01HG009431 to A.K., and HG006827 to C.H.), the Rita Allen Foundation (to W.J.G.), the Baxter Foundation Faculty Scholar Grant, and the Human Frontiers Science Program grant RGY006S (to W.J.G.). W.J.G. is a Chan Zuckerberg Biohub investigator and acknowledges grants 2017174468 and 2018-182817 from the Chan Zuckerberg Initiative. S.K. is supported by MSTP training grant T32GM007365 and the Paul and Daisy Soros Fellowship. Fellowship support was also provided by the Stanford School of Medicine Dean’s Fellowship (G.K.M.), by the EMBO Long-Term Fellowship EMBO ALTF 1119-2016, and by the Human Frontier Science Program Long-Term Fellowship HFSP LT 000835/2017-L (Z.S.).
References
1. Mortazavi A, Williams BA, McCue K et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 5(7):621–628
2. Nagalakshmi U, Wang Z, Waern K et al. (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320(5881):1344–1349
3. Sultan M, Schulz MH, Richard H et al. (2008) A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321(5891):956–960
4. Wilhelm BT, Marguerat S, Watt S et al. (2008) Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 453(7199):1239–1243
5. Tang F, Barbacioru C, Wang Y et al. (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods 6(5):377–382
6. Islam S, Kjällquist U, Moliner A et al. (2011) Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res 21(7):1160–1167
7. Ramsköld D, Luo S, Wang YC et al. (2012) Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol 30(8):777–782
8. Hashimshony T, Wagner F, Sher N, Yanai I (2012) CEL-seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep 2(3):666–673
9. Shalek AK, Satija R, Adiconis X, Gertner RS, Gaublomme JT, Raychowdhury R, Schwartz S, Yosef N, Malboeuf C, Lu D, Trombetta JJ, Gennert D, Gnirke A, Goren A, Hacohen N, Levin JZ, Park H, Regev A (2013) Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature 498(7453):236–240
10. Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, Mildner A, Cohen N, Jung S, Tanay A, Amit I (2014) Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science 343(6172):776–779
11. Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW (2015) Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161(5):1187–1201
12. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA (2015) Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161(5):1202–1214
13. Zheng GX, Terry JM, Belgrader P et al. (2017) Massively parallel digital transcriptional profiling of single cells. Nat Commun 8:14049
14. Han X, Wang R, Zhou Y et al. (2018) Mapping the Mouse Cell Atlas by Microwell-Seq. Cell 172(5):1091–1107.e17
15. Cao J, Packer JS, Ramani V et al. (2017) Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357:661–667
16. Rosenberg AB, Roco CM, Muscat RA et al. (2018) Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360:176–182
17. McGhee JD, Wood WI, Dolan M et al. (1981) A 200 base pair region at the 5′ end of the chicken adult β-globin gene is accessible to nuclease digestion. Cell 27:45–55
18. Keene MA, Corces V, Lowenhaupt K et al. (1981) DNase I hypersensitive sites in Drosophila chromatin occur at the 5′ ends of regions of transcription. Proc Natl Acad Sci U S A 78:143–146
19. Wu C (1980) The 5′ ends of Drosophila heat shock genes in chromatin are hypersensitive to DNase I. Nature 286(5776):854–860
20. Minnoye L, Marinov GK, Krausgruber T et al. (2021) Chromatin accessibility profiling methods. Nat Rev Meth Primers 1:10
21. Buenrostro JD, Giresi PG, Zaba LC et al. (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218
22. Corces MR, Trevino AE, Hamilton EG et al. (2017) An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods 14:959–962
23. Reznikoff WS (2008) Transposon Tn5. Annu Rev Genet 42:269–286
24. Adey A, Morrison HG, Asan et al. (2010) Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 11(12):R119
25. Buenrostro JD, Wu B, Litzenburger UM et al. (2015) Single-cell chromatin accessibility reveals principles of regulatory variation. Nature 523:486–490
26. Cusanovich DA, Daza R, Adey A et al. (2015) Multiplex single cell profiling of chromatin accessibility by combinatorial cellular indexing. Science 348:910–914
27. Cusanovich DA, Reddington JP, Garfield DA et al. (2018) The cis-regulatory dynamics of embryonic development at single-cell resolution. Nature 555:538–542
28. Preissl S, Fang R, Huang H et al. (2018) Single-nucleus analysis of accessible chromatin in developing mouse forebrain reveals cell-type-specific transcriptional regulation. Nat Neurosci 21(3):432–439
29. Mezger A, Klemm S, Mann I et al. (2018) High-throughput chromatin accessibility profiling at single-cell resolution. Nat Commun 9(1):3647
30. Satpathy AT, Granja JM, Yost KE et al. (2019) Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat Biotechnol 37:925–936
31. Lareau CA, Duarte FM, Chew JG et al. (2019) Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility. Nat Biotechnol 37:916–924
32. Macaulay IC, Haerty W, Kumar P et al. (2015) G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods 12(6):519–522
33. Huang AY, Li P, Rodin RE et al. (2020) Parallel RNA and DNA analysis after deep sequencing (PRDD-seq) reveals cell type-specific lineage patterns in human brain. Proc Natl Acad Sci U S A 117(25):13886–13895
34. Zachariadis V, Cheng H, Andrews N, Enge M (2020) A highly scalable method for joint whole-genome sequencing and gene-expression profiling of single cells. Mol Cell 80(3):541–553.e5
35. Yin Y, Jiang Y, Lam KG et al. (2019) High-throughput single-cell sequencing with linear amplification. Mol Cell 76(4):676–690.e10
36. Rodriguez-Meira A, Buck G, Clark SA et al. (2019) Unravelling intratumoral heterogeneity through high-sensitivity single-cell mutational analysis and parallel RNA sequencing. Mol Cell 73(6):1292–1305.e8
37. Hou Y, Guo H, Cao C et al. (2016) Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Res 26(3):304–319
38. Hu Y, Huang K, An Q, Du G, Hu G, Xue J, Zhu X, Wang CY, Xue Z, Fan G (2016) Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biol 17:88
39. Angermueller C, Clark SJ, Lee HJ et al. (2016) Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods 13(3):229–232
40. Pott S (2017) Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. eLife 6:e23203
41. Peterson VM, Zhang KX, Kumar N et al. (2017) Multiplexed quantification of proteins and transcripts in single cells. Nat Biotechnol 35(10):936–939
42. Stoeckius M, Hafemeister C, Stephenson W et al. (2017) Simultaneous epitope and transcriptome measurement in single cells. Nat Methods 14(9):865–868
43. O’Huallachain M, Bava FA et al. (2020) Ultrahigh throughput single-cell analysis of proteins and RNAs by split-pool synthesis. Commun Biol 3(1):213
44. Chung H, Parkhurst CN, Magee EM et al. (2021) Simultaneous single cell measurements of intranuclear proteins and gene expression. https://doi.org/10.1101/2021.01.18.427139
45. Katzenelenbogen Y, Sheban F, Yalin A et al. (2020) Coupled scRNA-seq and intracellular protein activity reveal an immunosuppressive role of TREM2 in cancer. Cell 182(4):872–885.e19
46. Guo F, Li L, Li J et al. (2017) Single-cell multiomics sequencing of mouse early embryos and embryonic stem cells. Cell Res 27(8):967–988
47. Clark SJ, Argelaguet R, Kapourani CA et al. (2018) scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9(1):781
48. Wang Y, Yuan P, Yan Z et al. (2021) Single-cell multiomics sequencing reveals the functional regulatory landscape of early embryos. Nat Commun 12(1):1247
49. Luo C, Liu H, Xie F et al. (2019) Single nucleus multi-omics links human cortical cell regulatory genome diversity to disease risk variants. bioRxiv 2019.12.11.873398
50. Xiong H, Luo Y, Wang Q et al. (2021) Single-cell joint detection of chromatin occupancy and transcriptome enables higher-dimensional epigenomic reconstructions. Nat Methods 18(6):652–660
51. Zhu C, Zhang Y, Li YE et al. (2021) Joint profiling of histone modifications and transcriptome in single cells from mouse brain. Nat Methods 18(3):283–292
52. Markodimitraki CM, Rang FJ, Rooijers K et al. (2020) Simultaneous quantification of protein-DNA interactions and transcriptomes in single cells with scDam & T-seq. Nat Protoc 15(6):1922–1953
53. Fiskin E, Lareau CA, Eraslan G et al. (2020) Single-cell multimodal profiling of proteins and chromatin accessibility using PHAGE-ATAC. bioRxiv 2020.10.01.322420
54. Mimitou EP, Lareau CA, Chen KY et al. (2021) Scalable, multimodal profiling of chromatin accessibility, gene expression and protein levels in single cells. Nat Biotechnol. https://doi.org/10.1038/s41587-021-00927-2
55. Swanson E, Lord C, Reading J et al. (2021) Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. eLife 10:e63632
56. Kearney CJ, Vervoort SJ, Ramsbottom KM et al. (2021) SUGAR-seq enables simultaneous detection of glycans, epitopes, and the transcriptome in single cells. Sci Adv 7(8):eabe3610
57. Cao J, Cusanovich DA, Ramani V et al. (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361:1380–1385
58. Zhu C, Yu M, Huang H et al. (2019) An ultra high-throughput method for single-cell joint analysis of open chromatin and transcriptome. Nat Struct Mol Biol 26:1063–1070
59. Xing QR, Farran CAE, Zeng YY et al. (2020) Parallel bimodal single-cell sequencing of transcriptome and chromatin accessibility. Genome Res 30(7):1027–1039
60. Chen S, Lake BB, Zhang K (2019) High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37(12):1452–1457
61. Ma S, Zhang B, LaFave LM et al. (2020) Chromatin potential identified by shared single-cell profiling of RNA and chromatin. Cell 183:1103–1116.e20
62. Langmead B, Trapnell C, Pop M et al. (2009) Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10:R25
63. Li H, Handsaker B, Wysoker A et al. (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
64. Kuhn RM, Haussler D, Kent WJ (2013) The UCSC Genome Browser and associated tools. Brief Bioinform 14:144–161
65. Kent WJ, Zweig AS, Barber G et al. (2010) BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26:2204–2207
66. Dobin A, Davis CA, Schlesinger F et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15–21
67. Granja JM, Corces MR, Pierce SE et al. (2021) ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet 53(3):403–411
68. Hao Y, Hao S, Andersen-Nissen E et al. (2021) Integrated analysis of multimodal single-cell data. Cell 184(13):3573–3587.e29
69. ENCODE Project Consortium (2012) An integrated encyclopedia of DNA elements in the human genome. Nature 489:57–74
70. Marinov GK, Wang YE, Chan DC, Wold BJ (2014) Evidence for site-specific occupancy of the mitochondrial genome by nuclear transcription factors. PLoS ONE 9(1):e84713
71. Picelli S, Björklund AK, Reinius B et al. (2014) Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome Res 24:2033–2040
72. Domcke S, Hill AJ, Daza RM et al. (2020) A human cell atlas of fetal chromatin accessibility. Science 370(6518):eaba7612
73. Corces MR, Granja JM, Shams S et al. (2018) The chromatin accessibility landscape of primary human cancers. Science 362(6413):eaav1898
Chapter 12
Simultaneous Measurement of DNA Methylation and Nucleosome Occupancy in Single Cells Using scNOMe-Seq
Michael Wasney and Sebastian Pott
Abstract
Single-cell Nucleosome Occupancy and Methylome sequencing (scNOMe-seq) is a multimodal assay that simultaneously measures endogenous DNA methylation and nucleosome occupancy (i.e., chromatin accessibility) in single cells. scNOMe-seq combines the activity of a GpC Methyltransferase, an enzyme which methylates cytosines in GpC dinucleotides, with bisulfite conversion, whereby unmethylated cytosines are converted into uracils, which are read as thymines after amplification and sequencing. Because GpC Methyltransferase acts only on cytosines present in non-nucleosomal regions of the genome, the subsequent bisulfite conversion step not only detects the endogenous DNA methylation, but also reveals the genome-wide pattern of chromatin accessibility. Implementing this technology at the single-cell level helps to capture how the dynamics governing methylation and accessibility vary across individual cells and cell types. Here, we provide a scalable plate-based protocol for preparing scNOMe-seq libraries from single nucleus suspensions.
Key words scNOMe-seq, Single cell, DNA methylation, Nucleosome occupancy, Chromatin accessibility, GpC Methyltransferase, Bisulfite sequencing, Epigenetic modification, Fluorescence-activated cell sorting
1 Introduction
Multimodal technologies allow for the simultaneous characterization of multiple biological features in the same samples, making it possible to directly observe relationships between these features [1–8]. Increasingly, these assays are being implemented at the level of single cells. Single-cell data reveal cell-type-specific features and regulatory dynamics that are not apparent in data from bulk assays. Nucleosome Occupancy and Methylome sequencing (NOMe-seq) is a multimodal assay that quantifies both endogenous DNA methylation and nucleosome occupancy [9–11] – two genomic features associated with modulation of transcription – and has
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_12, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
[Fig. 1 axes: average methylation (%) vs. distance from DHSs center in bp; series: CpG, GpC]
Fig. 1 scNOMe-seq data measure chromatin accessibility and DNA methylation. Aggregated CpG (yellow) and GpC (blue) methylation data from multiple single cells at DNase Hypersensitive sites in GM12878 cells. Figure reproduced from Pott, 2017 [1], published under a Creative Commons Attribution license
been adapted for application to single cells (scNOMe-seq) [1–3]. DNA methylation, which in mammalian cells almost always occurs in CpG dinucleotides, is strongly associated with transcriptional repression [12]. Nucleosomes block most DNA-dependent processes by hindering access of transcription factors and other cellular machinery to the underlying genetic sequence [13]. Conversely, nucleosome-depleted regions (NDRs) identify gene regulatory elements, such as promoters and enhancers. In surveying both features at the single-cell level, scNOMe-seq can help to elucidate the functional relationship between DNA methylation and nucleosome occupancy within cell-type-specific contexts [14] (Fig. 1).
scNOMe-seq utilizes the M.CviPI GpC Methyltransferase to methylate all accessible cytosine residues in GpC dinucleotides. Importantly, these residues are normally unmethylated in mammalian cells [15]. By following GpC Methyltransferase treatment with bisulfite conversion, all unmodified cytosine residues are converted, facilitating the detection of both accessible GpC residues and endogenous DNA methylation at CpG residues [1, 11].
To detect cytosine methylation, this protocol adapts a plate-based library strategy developed for single-cell bisulfite sequencing [16, 17]. Our original scNOMe-seq publication used the Pico Methyl-seq library prep kit (Zymo Research D5456) to generate single-cell libraries [1]. The Ecker lab established a multiplexed single-nucleus methylcytosine sequencing protocol [16, 17], which significantly improved sample throughput and genome coverage. We have used this protocol for scNOMe-seq since then. Luo et al. [16] was published under a Creative Commons Attribution 4.0 International License, and we reproduced the protocol with small adaptations as part of our workflow outlined
scNOMe-Seq
[Figure 2 panels: (A) t-SNE plot (axes TSNE 1 and TSNE 2) with clusters labeled Cardiomyocytes (CM), Fibroblasts, Endothelial cells, Pericytes/Sm. mus., and Hemat.; (B) aggregate GmC and mCG tracks for each cluster across the MYH7 locus (scale bar 100 kb).]
Fig. 2 Multimodal profiling of the heart captures cell-type-specific epigenetic configurations in the MYH7 locus. scNOMe-seq data from an adult human heart sample comprising 1229 cells. (a) TSNE plot with clusters corresponding to major cell types (left). (b) Pseudobulk data tracks for the corresponding clusters for both data modalities capturing chromatin accessibility (GmC, green) and DNA methylation (mCG, blue), respectively
below. Sequenced scNOMe-seq libraries are aligned to a human genome, allowing for the methylation statuses of all captured cytosines in CpG and GpC contexts to be retrieved. Because CpG and GpC dinucleotides occur relatively frequently, a single read can simultaneously capture the accessibility and endogenous methylation of the genomic locus to which the read maps. The protocol below starts from single cell suspensions and combines initial preparation and GpC treatment of nuclei [18] with preparation of single-cell bisulfite libraries [16]. However, in our experience this protocol can be performed following most nuclei isolations. For example, we successfully used nuclei isolated from frozen human heart samples (Fig. 2).
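The read-out logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, only the top strand is handled, and treating cytosines in a GCG context as ambiguous is a common NOMe-seq convention that is assumed here rather than stated in this chapter.

```python
# Illustrative sketch only; names and the handling of GCG sites are assumptions.

def cytosine_context(ref: str, i: int) -> str:
    """Classify the cytosine at ref[i] by its flanking bases (top strand only)."""
    if ref[i] != "C":
        raise ValueError("position is not a cytosine")
    prev_base = ref[i - 1] if i > 0 else "N"
    next_base = ref[i + 1] if i + 1 < len(ref) else "N"
    in_gpc = prev_base == "G"  # GpC: substrate of the GpC methyltransferase (accessibility)
    in_cpg = next_base == "G"  # CpG: endogenous DNA methylation
    if in_gpc and in_cpg:
        return "GCG"           # cannot be assigned to a single modality
    if in_gpc:
        return "GpC"
    if in_cpg:
        return "CpG"
    return "other"


def call_cytosines(ref: str, read: str):
    """Compare a bisulfite-converted read with the reference it aligns to.

    A reference C still read as C was protected by methylation; a reference C
    read as T was unmethylated and converted.
    Returns a list of (position, context, methylated) tuples.
    """
    calls = []
    for i, (ref_base, read_base) in enumerate(zip(ref, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls.append((i, cytosine_context(ref, i), read_base == "C"))
    return calls


# Toy example: one GpC, one CpG, and one GCG cytosine on the same read.
ref = "TGCATACGTGCGA"
read = "TGCATATGTGCGA"
calls = call_cytosines(ref, read)
for pos, context, methylated in calls:
    print(pos, context, "methylated" if methylated else "unmethylated")
```

In this toy read the GpC site is methylated (accessible) while the CpG site is unmethylated, which shows how a single read can report both modalities.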
2 Materials
While performing this protocol, use ultrapure water and analytical grade reagents to prepare solutions and follow institutional and material-related safety guidelines when disposing of waste.
2.1 Reagents/Consumables
2.1.1 Nuclei Isolation and GpC Methyltransferase Treatment
1. Digestion mix: Prepare 1.9% Proteinase K solution by adding 1040 μL of Proteinase K Storage buffer to one tube with 20 mg of Proteinase K (Zymo D3001-2-20) and allow Proteinase K to dissolve completely. Store Proteinase K solution at -20 °C between uses. To isolate nuclei, mix 883 μL of M-Digestion Buffer (Zymo D5020-9), 88 μL Proteinase K solution, and 795 μL water. Digestion mix can be prepared the day before an experiment and stored at 4 °C (see Note 2).
Michael Wasney and Sebastian Pott
2. 1X RSB Buffer [18]: Prepare a stock of 10X RSB with 100 mM Tris–HCl, pH 7.4, 100 mM NaCl, and 30 mM MgCl2. Make at least 3 mL of 1X RSB (1:10 dilution of 10X RSB) per sample being processed.
3. 1X PBS.
4. 1% NP-40: Make 100 μL of 1% NP-40 from 10% NP-40 by mixing 10 μL of 10% NP-40 with 90 μL of water.
5. GpC Methylase reaction mix: Mix 7.5 μL 10X GpC Methyltransferase buffer, 1.5 μL 32 mM SAM, and 50 μL 4 U/μL GpC Methyltransferase (NEB M0277L). Nuclei will be added directly to this mix for the methylase reaction. Reserve 25 μL 4 U/μL GpC Methyltransferase and 0.75 μL 32 mM SAM to boost the reaction.
2.1.2 Fluorescence-Activated Cell Sorting
1. NucBlue™ Live Cell Stain ReadyProbes (Invitrogen R37605).
2.1.3 Bisulfite Conversion
1. CT Conversion reagent: Add 7.9 mL M-Solubilization Buffer and 3 mL M-Dilution Buffer to one bottle of CT Conversion reagent. Vortex vigorously for at least 10 min to fully dissolve the reagent, and then add 1.6 mL M-Reaction buffer (Zymo D5022). Prepared CT Conversion reagent can be stored overnight at room temperature, for a week at 4 °C, or for up to 1 month at -20 °C. If stored at 4 °C or -20 °C, warm the solution at 37 °C prior to use.
2. EZ-96 DNA Methylation-Direct Kit (shallow-well) (Zymo D5022).
• M-Binding buffer.
• M-Wash buffer: To prepare one bottle of M-Wash buffer, add 144 mL of 100% ethanol to the 36 mL of concentrate in the bottle provided by the kit.
• M-Desulphonation buffer.
• M-Elution buffer.
3. Random Primer Solution: Prepare Random Primer Solution prior to the purification by adding 64 μL 100 μM random primer stock to 728 μL M-Elution buffer. Keep on ice.
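The dilutions in this list can be sanity-checked with the usual C1·V1 = C2·V2 relation. The sketch below is only a convenience calculation; the helper name is hypothetical, the ethanol figure assumes the M-Wash concentrate itself contains no ethanol, and the per-primer well count assumes two 384-well plates shared among eight primers at 7 μL per well, as described in Subheading 3.1.3.

```python
def dilute(stock_conc: float, stock_vol: float, diluent_vol: float) -> float:
    """Final concentration after mixing a stock with diluent (C1*V1 = C2*V2)."""
    return stock_conc * stock_vol / (stock_vol + diluent_vol)


# Random Primer Solution: 64 uL of 100 uM primer + 728 uL M-Elution buffer.
primer_um = dilute(100.0, 64.0, 728.0)

# M-Wash buffer: 144 mL of 100% ethanol + 36 mL concentrate
# (assumption: the concentrate contributes no ethanol).
ethanol_pct = dilute(100.0, 144.0, 36.0)

# Volume check for one primer: 768 wells / 8 primers, 7 uL per well.
wells_per_primer = 2 * 384 // 8
needed_ul = wells_per_primer * 7.0
prepared_ul = 64.0 + 728.0

print(f"random primer: {primer_um:.2f} uM")
print(f"M-Wash ethanol: {ethanol_pct:.0f}%")
print(f"primer solution: {prepared_ul:.0f} uL prepared for {needed_ul:.0f} uL needed")
```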
2.1.4 Random-Primed DNA Synthesis
1. Random Priming Master Mix: Prior to denaturing the samples as part of the Random-primed DNA synthesis step, mix 922 μL 10X Blue Buffer, 231 μL of 50 U/μL Klenow fragment (Qiagen Beverly P7010HCL), 461 μL of dNTP solution with each nucleotide at a concentration of 10 mM (Thermo Fisher R0191), and 2995 μL of water. Keep on ice.
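As a rough check of scale, the master mix volumes can be related to the 5 μL added per well in Subheading 3.1.4. The sketch below assumes two 384-well plates (768 wells); the per-well component volumes it derives are back-calculated here and are not values stated by the authors.

```python
# Volumes (uL) as listed for the Random Priming Master Mix.
components_ul = {
    "10X Blue Buffer": 922.0,
    "Klenow fragment (50 U/uL)": 231.0,
    "dNTP solution (10 mM each)": 461.0,
    "water": 2995.0,
}

PER_WELL_UL = 5.0   # volume added to each well (Subheading 3.1.4)
WELLS = 2 * 384     # assumption: two 384-well plates

total_ul = sum(components_ul.values())
reactions = total_ul / PER_WELL_UL
overage = reactions / WELLS - 1.0

# Back-calculated amount of each component in a single 5 uL addition.
per_well = {name: vol / reactions for name, vol in components_ul.items()}

print(f"total volume: {total_ul:.0f} uL, enough for {reactions:.0f} reactions")
print(f"overage relative to {WELLS} wells: {overage:.1%}")
for name, vol in per_well.items():
    print(f"  {name}: {vol:.2f} uL per well")
```

Under these assumptions the recipe corresponds to roughly 20% excess, which leaves room for dead volume when dispensing with a multichannel pipette.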
2.1.5 Inactivation of Free Primers and dNTPs
1. Exo/rSAP Master Mix: Prior to beginning inactivation step, mix 922 μL of 20 U/μL Exonuclease I and 461 μL of 1 U/μL rSAP (Qiagen Beverly X8010L). Keep on ice.
2.1.6 Sample Clean-Up
1. SPRI Beads: Apportion 280 μL of Sera-Mag SpeedBeads into an Eppendorf tube and place on a magnetic stand. Allow the solution to clear of beads before carefully removing the supernatant. Wash the beads twice with 1 mL TE. Between washes, remove the tube from the magnet and mix by inversion before replacing the tube on the stand and allowing the beads to clear. After the second wash, resuspend beads in 280 μL of TE. Meanwhile, transfer 2.52 g PEG 8000 to a 50 mL conical tube. Add 2.8 mL of 5 M NaCl, 140 μL 1 M Tris–HCl pH 8, and 28 μL of 0.5 M EDTA pH 8. Add 7 to 8 mL of water and vortex the solution until the PEG 8000 has dissolved. Add the washed Sera-Mag SpeedBeads and bring the solution up to 14 mL with water. Store at 4 °C (see Notes 3 and 12).
2. 80% Ethanol: To make 50 mL of 80% ethanol, mix 40 mL 200 proof ethanol and 10 mL water. Vortex before use.
2.1.7 Adaptase Reaction
1. Adaptase Master Mix: Mix 450.5 μL of Elution Buffer (Qiagen 19086), 212 μL Buffer G1, 212 μL Reagent G2, 132 μL Reagent G3, 53 μL Enzyme G4, and 53 μL Enzyme G5. Pipette to mix and keep on ice.
2.1.8 Library Amplification
1. P5L PCR Primer Mix: 1.2 μM P5L primer (working concentration of 600 nM when combined with P7L primer). Mix 1.2 μL of 100 μM P5L stock with 98.8 μL water. Keep on ice before use.
2. P7L PCR Primer Mix: 2 μM P7L primer (working concentration of 1 μM after being combined with P5L primer). Mix 2 μL of 100 μM P7L primer with 98 μL water. Keep on ice before use.
3. 2X Kapa Hifi Mix (Roche 07958935001).
2.1.9 Library Clean-Up
1. SPRI Beads.
2. 80% Ethanol.
3. Elution Buffer (Qiagen 19086).
4. Qubit 4 Fluorometer (Invitrogen) or Qubit Flex Fluorometer (Invitrogen Q33327).
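The SPRI bead suspension and the 80% ethanol used for both clean-up steps are prepared as described in Subheading 2.1.6. The short calculation below simply converts the listed amounts into final concentrations for a 14 mL batch; the variable names are illustrative.

```python
FINAL_VOLUME_ML = 14.0

# Amounts added per 14 mL batch of SPRI bead suspension (Subheading 2.1.6).
peg_g = 2.52                    # PEG 8000
nacl_ml, nacl_stock_m = 2.8, 5.0
tris_ml, tris_stock_m = 0.140, 1.0
edta_ml, edta_stock_m = 0.028, 0.5

peg_pct_wv = peg_g / FINAL_VOLUME_ML * 100                   # g per 100 mL
nacl_m = nacl_stock_m * nacl_ml / FINAL_VOLUME_ML
tris_mm = tris_stock_m * tris_ml / FINAL_VOLUME_ML * 1000
edta_mm = edta_stock_m * edta_ml / FINAL_VOLUME_ML * 1000

# 80% ethanol: 40 mL ethanol + 10 mL water.
ethanol_pct = 40.0 / (40.0 + 10.0) * 100

print(f"PEG 8000: {peg_pct_wv:.1f}% (w/v)")
print(f"NaCl: {nacl_m:.2f} M, Tris-HCl: {tris_mm:.1f} mM, EDTA: {edta_mm:.1f} mM")
print(f"ethanol: {ethanol_pct:.0f}%")
```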
2.1.10 Primers and Barcodes
1. HPLC purified random primers (added after bisulfite conversion): H: A, C, or T.
Barcode 1: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTATCACG(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 2: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTCGATGT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 3: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTTGACCA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 4: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTGCCAAT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 5: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTCAGATC(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 6: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTACTTGA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 7: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTTAGCTT(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
Barcode 8: /5SpC3/TTCCCTACACGACGCTCTTCCGATCTCTTGTA(H1:33340033)(H1)(H1)(H1)(H1)(H1)(H1)(H1)(H1).
These random priming oligos differ from the ones provided by Swift Biosciences, and sequences were provided in Luo et al. [17].
2. P5 primers (added during library amplification).
P501: AATGATACGGCGACCACCGAGATCTACACACGATCAGACACTCTTTCCCTACACGACGCTCT.
P502: AATGATACGGCGACCACCGAGATCTACACTCGAGAGTACACTCTTTCCCTACACGACGCTCT.
P503: AATGATACGGCGACCACCGAGATCTACACCTAGCTCAACACTCTTTCCCTACACGACGCTCT.
P504: AATGATACGGCGACCACCGAGATCTACACATCGTCTCACACTCTTTCCCTACACGACGCTCT.
P505: AATGATACGGCGACCACCGAGATCTACACTCGACAAGACACTCTTTCCCTACACGACGCTCT.
P506: AATGATACGGCGACCACCGAGATCTACACCCTTGGAAACACTCTTTCCCTACACGACGCTCT.
P507: AATGATACGGCGACCACCGAGATCTACACATCATGCGACACTCTTTCCCTACACGACGCTCT.
P508: AATGATACGGCGACCACCGAGATCTACACTGTTCCGTACACTCTTTCCCTACACGACGCTCT.
P509: AATGATACGGCGACCACCGAGATCTACACATTAGCCGACACTCTTTCCCTACACGACGCTCT.
P510: AATGATACGGCGACCACCGAGATCTACACCGATCGATACACTCTTTCCCTACACGACGCTCT.
P511: AATGATACGGCGACCACCGAGATCTACACGATCTTGCACACTCTTTCCCTACACGACGCTCT.
P512: AATGATACGGCGACCACCGAGATCTACACAGGATAGCACACTCTTTCCCTACACGACGCTCT.
3. P7 primers (added during library amplification).
P701: CAAGCAGAAGACGGCATACGAGATAGGCAATGGTGACTGGAGTTCAGACGTGTGCTCTT.
P702: CAAGCAGAAGACGGCATACGAGATTCACCTAGGTGACTGGAGTTCAGACGTGTGCTCTT.
P703: CAAGCAGAAGACGGCATACGAGATCATACGGAGTGACTGGAGTTCAGACGTGTGCTCTT.
P704: CAAGCAGAAGACGGCATACGAGATGTCATCGTGTGACTGGAGTTCAGACGTGTGCTCTT.
P705: CAAGCAGAAGACGGCATACGAGATTTACCGACGTGACTGGAGTTCAGACGTGTGCTCTT.
P706: CAAGCAGAAGACGGCATACGAGATACCTTCGAGTGACTGGAGTTCAGACGTGTGCTCTT.
P707: CAAGCAGAAGACGGCATACGAGATACGCTTCTGTGACTGGAGTTCAGACGTGTGCTCTT.
P708: CAAGCAGAAGACGGCATACGAGATGAGTAGAGGTGACTGGAGTTCAGACGTGTGCTCTT.
2.2 Equipment
2.2.1 Assay
1. KIMBLE® Dounce tissue grinder set (DWK 885300-0001).
2. 384-well clear reaction plates (Applied Biosystems 4483285).
3. Adhesive PCR sealing foil sheets (Thermo Fisher 00139148).
4. Zymo-Spin 384-Well DNA Binding Plates (Zymo C2012).
5. Thermo Scientific™ Nunc™ 96-Well Polypropylene DeepWell™ Storage Plates (Thermo Fisher Scientific 95040462) (see Note 5).
6. DynaMag™-96 Side Skirted Magnet (Thermo Fisher Scientific 12027).
7. 96-well PCR plate (Genesee Scientific 24-302).
8. DynaMag™-2 Magnet (Thermo Fisher Scientific 12321D).
9. 1.7 mL microtube, clear (Genesee Scientific 24-282LR).
10. 0.2 mL 8-well PCR strip tubes (Genesee Scientific 21-125).
11. Flow cytometer (e.g., BD FACSAria™ Fusion).
12. Thermocycler(s) with 96- and 384-well blocks.
13. Centrifuge outfitted with a swinging bucket rotor capable of spinning at 5000 × g, maintaining a temperature of 4 °C, and accommodating both plates and microtubes.
14. 12-channel pipette set capable of handling volumes ranging from 0.5 to 300 μL.
15. Standard wet laboratory equipment (e.g., pipette set, serological pipette, 4 °C refrigerator, and -20 °C freezer).
2.2.2 Sequencing
1. Access to Next Generation Sequencing platform: Illumina-based sequencing platform (e.g., NovaSeq 6000).
3 Methods
3.1 Assay
3.1.1 Nuclei Isolation and GpC Methyltransferase Treatment
1. Prepare digestion mix on ice. Deliver 2 μL of mix to every well of two 384-well plates. Plates with digestion mix can be prepared the day before the experiment and stored at 4 °C (see Notes 1 and 2).
2. Obtain a suspension of single cells. This protocol was optimized for use with a total of 5–10 million cells. Centrifuge the single-cell suspension at 500 × g for 5 min at 4 °C, remove the supernatant, suspend in 1 mL ice-cold PBS, and centrifuge the sample again at the same settings. Discard the supernatant and suspend in 1 mL 1X RSB buffer. Incubate for 10 min at room temperature.
3. Add 15 μL of 1% NP-40 to the cell suspension (NP-40 concentration may need to be adjusted depending on the cell type). Transfer the cell suspension to a 2 mL dounce tissue grinder and add 1 mL of 1X RSB. Homogenize the cell suspension using 15 strokes each of pestles A and B (number of strokes may be adjusted to accommodate the particular cell or tissue type being handled). Transfer lysed cells to a new 1.5 mL Eppendorf tube and centrifuge at 800 × g for 5 min at 4 °C. Discard the supernatant and wash with 1 mL 1X RSB. Incubate for 30 s to 1 min at room temperature. Centrifuge at 800 × g for 5 min at 4 °C.
4. Resuspend the nuclei in 1X GpC Methyltransferase buffer such that there are one million nuclei per 75 μL of buffer. If there are fewer than one million nuclei, suspend in 75 μL of buffer. Meanwhile, prepare GpC Methylase Reaction Mix. Add the 75 μL of nuclei to the reaction mix and incubate at 37 °C for 7.5 min. Add a boost of 25 μL GpC Methyltransferase and 0.75 μL 32 mM SAM and incubate for another 7.5 min at 37 °C (see Note 4).
5. Quench the reaction by adding 500 μL 1X PBS and spin at 800 × g for 5 min at 4 °C. Resuspend in 0.5–1 mL of 1X PBS and add 2 drops of Hoechst per mL of sample (1 drop for 0.5 mL, 2 drops for 1 mL). Keep the sample on ice for 15 min before commencing with fluorescence-activated cell sorting (FACS).
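The volumes in step 4 follow simple rules that are easy to get wrong at the bench. The helper below is a hypothetical convenience, assuming exactly the rule stated above (75 μL per million nuclei, with a minimum of 75 μL) and the 4 U/μL enzyme concentration listed in Subheading 2.1.1.

```python
def resuspension_volume_ul(n_nuclei: int) -> float:
    """Buffer volume giving one million nuclei per 75 uL, never below 75 uL."""
    return max(75.0, 75.0 * n_nuclei / 1_000_000)


ENZYME_U_PER_UL = 4.0

# GpC methyltransferase units in one reaction: initial mix plus the boost.
initial_units = 50.0 * ENZYME_U_PER_UL
boost_units = 25.0 * ENZYME_U_PER_UL
total_units = initial_units + boost_units

# Starting reaction volume: 10X buffer + SAM + enzyme + nuclei suspension.
reaction_volume_ul = 7.5 + 1.5 + 50.0 + 75.0

print(resuspension_volume_ul(400_000), "uL for 0.4 million nuclei")
print(resuspension_volume_ul(2_000_000), "uL for 2 million nuclei")
print(f"{total_units:.0f} U enzyme in total, starting volume {reaction_volume_ul} uL")
```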
[Figure 3 panels: three FACS scatter plots (axes shown: FSC-A, FSC-H, SSC-A, and BV421-A) with Gate 1: 44.2%; Gate 2: 84% (37.1%); and Gate 3, single nuclei: 18.7% (6.9%).]
Fig. 3 Example of a gating strategy during FACS sorting. Individual nuclei were selected based on size and DNA content. Percentages provide the proportion of events within a particular gate for each scatter plot; the proportion of total events is indicated in parentheses
3.1.2 Fluorescence-Activated Cell Sorting
1. We use the BD FACSAria™ Fusion system to sort individual nuclei into 384-well plates. However, any system with this capability should suffice.
2. Our gating strategy is focused on recovering intact single nuclei and excluding cellular debris (Fig. 3). This needs to be adjusted based on the input material.
3. Sort a single nucleus into each well of the 384-well plates being processed. Place the plates on ice when sorting is complete.
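The percentages reported in Fig. 3 can be related to each other with a short calculation. This sketch assumes the three gates are nested (each applied to the events passing the previous gate), which is consistent with the values in parentheses in the figure.

```python
# Fraction of parent events retained by each gate, as reported in Fig. 3.
gates = [
    ("Gate 1", 0.442),
    ("Gate 2", 0.84),
    ("Gate 3 (single nuclei)", 0.187),
]

cumulative = []
fraction_of_total = 1.0
for name, fraction_of_parent in gates:
    fraction_of_total *= fraction_of_parent
    cumulative.append((name, fraction_of_total))

for name, fraction in cumulative:
    print(f"{name}: {fraction * 100:.1f}% of all events")
```

With the example gating, only a small fraction of all recorded events ends up in the final gate, which is worth keeping in mind when estimating how much input material is needed to fill the plates.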
3.1.3 Bisulfite Conversion
1. Prepare CT Conversion reagent and add 15 μL of it to each well of two 384-well reaction plates. Pipette up and down 8 times to mix the sample. Seal the plates and quick spin at 2000 × g for 10 s at room temperature. Place plates into a thermocycler able to accommodate 384-well plates, and run the following program:
(a) 98 °C for 8 min
(b) 64 °C for 3.5 h
(c) Hold at 4 °C (see Notes 1, 5, and 11).
2. Prior to purifying bisulfite-converted samples, prepare Random Primer Solutions (eight separate solutions for primers 1–8). Keep primer solutions on ice until use.
3. Place two Zymo-Spin 384-Well DNA Binding Plates on two 96-Well Polypropylene DeepWell™ Storage Plates and add 80 μL of M-Binding buffer to each well. Transfer bisulfite-converted samples to the 384-Well DNA Binding Plates and pipette up and down 8 times to mix the samples. Centrifuge at 5000 × g for 5 min at room temperature (see Notes 1, 6, and 10).
[Figure 4 schematic: Plate 1 receives Barcodes 1–4 and Plate 2 receives Barcodes 5–8, arranged in a 2 × 2 block of wells (rows A–B, columns 1–2) that repeats across each plate.]
Fig. 4 Loading schema for primers in the two 384-well plates used for random priming step. Pattern shown for Wells A 1–2 and B 1–2 in plates 1 and 2, respectively, is repeated across the entire plate
4. Discard the flow-through in the 96-Well Storage Plates and add 100 μL M-Wash buffer to each well of the 384-Well DNA Binding Plates. Centrifuge at 5000 × g for 5 min at room temperature (see Note 1).
5. Discard the flow-through in the 96-Well Storage Plates and add 50 μL M-Desulphonation buffer to each well of the 384-Well DNA Binding Plates. Incubate at room temperature for 15 min and then centrifuge at 5000 × g for 5 min at room temperature (see Note 1).
6. Discard the flow-through in the 96-Well Storage Plates. Add 100 μL M-Wash buffer and centrifuge at 5000 × g for 5 min at room temperature. Repeat this wash step once more (see Note 1).
7. Place the 384-Well DNA Binding Plates on 384-Well Reaction Plates (the two 96-Well Storage Plates can be disposed of). Add 7 μL of one of the eight random primer solutions to each well, delivering four of the primers to one plate and the other four to the second plate, such that every other well along both the rows and the columns of a plate contains the same primer (Fig. 4). Once primers have been added to every well, incubate the plates for 5 min and then centrifuge at 5000 × g for 5 min at room temperature. Seal the 384-Well Reaction Plates and store at -20 °C for up to 1 week (see Notes 1 and 10).
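The primer layout in step 7 can be written down as a small lookup, which is convenient for generating a sample sheet. The sketch below assumes that Plate 1 carries Barcodes 1–4 and Plate 2 carries Barcodes 5–8, and that within each repeating 2 × 2 block the barcodes run left to right, then top to bottom; the exact position of each barcode inside the block is an assumption, so adjust the mapping to match the plates as actually loaded.

```python
import string
from collections import Counter

ROWS = string.ascii_uppercase[:16]  # rows A-P of a 384-well plate
COLUMNS = range(1, 25)              # columns 1-24


def barcode_for_well(plate: int, well: str) -> int:
    """Return the random-primer barcode (1-8) for a well such as 'A1' or 'P24'.

    Assumed layout: a 2 x 2 block repeated across the plate, with
    barcodes 1-4 on plate 1 and barcodes 5-8 on plate 2.
    """
    row = ROWS.index(well[0])
    col = int(well[1:]) - 1
    position_in_block = 2 * (row % 2) + (col % 2)  # 0, 1, 2 or 3
    return 4 * (plate - 1) + position_in_block + 1


# Each barcode should be used in a quarter of the wells of its plate.
counts = Counter(barcode_for_well(1, f"{r}{c}") for r in ROWS for c in COLUMNS)
print(dict(counts))
```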
3.1.4 Random-Primed DNA Synthesis
1. Prepare Random Priming Master Mix prior to denaturing samples and keep the mix on ice (see Note 7).
2. Denature the samples by placing the 384-well plates in a thermocycler and running the following program:
(a) 95 °C for 3 min.
3. Place the plates on ice for 2 min.
4. Add 5 μL Random Priming Master Mix to each well of the 384-well reaction plates. Vortex to mix and quick spin at 2000 × g for 10 s at room temperature (see Notes 1 and 10).
5. Place the plates into a thermocycler and run the following program:
(a) 4 °C for 5 min
(b) 25 °C for 5 min
(c) 37 °C for 60 min
(d) Hold at 4 °C (see Note 11).
3.1.5 Inactivation of Free Primers and dNTPs
1. Prepare Exo/rSAP Master Mix and keep on ice. Add 1.5 μL to each well of the 384-well reaction plates. Vortex to mix and quick spin at 2000 × g for 10 s at room temperature (see Notes 1, 8, and 10).
2. Place the plates into a thermocycler and run the following program:
(a) 37 °C for 30 min
(b) Hold at 4 °C (see Note 11).
3.1.6 Sample Clean-Up
1. Prepare 14 mL of SPRI beads (see Note 12).
2. Add 73.6 μL (0.8×) of SPRI beads to each well of a clean 96-well plate. Pool the samples from the two 384-well plates in the wells of the 96-well plate such that each well of the 96-well plate holds a pool of eight samples, each with a distinct random barcode. Vortex the plate briefly and incubate for 5 min at room temperature (see Notes 1 and 10).
3. Quick spin at 2000 × g for 10 s at room temperature and then place the 96-well plate on a DynaMag™-96 Side Skirted Magnet and let stand until the solution is clear of beads.
4. Remove the supernatant and wash beads 3 times with 150 μL fresh 80% ethanol. After the third wash, remove the ethanol and allow the beads to dry at room temperature. Take care to not overdry beads (see Notes 1 and 10).
5. Remove the plate from the magnet, add 10 μL of Elution Buffer (Qiagen), and suspend beads by pipetting. Vortex the plate briefly and incubate for 5 min at room temperature (see Notes 1 and 10).
6. Quick spin at 2000 × g for 10 s at room temperature and then place the 96-well plate on a DynaMag™-96 Side Skirted Magnet and let stand until the solution is clear of beads.
7. Transfer 10 μL of the supernatant from each well to a clean 96-well plate. Store at -20 °C or move on to the next step of the protocol (see Notes 1 and 10).
3.1.7 Adaptase Reaction
1. Prepare Adaptase Master Mix prior to denaturing the sample and keep the mix on ice.
2. Denature the samples by placing the 96-well plate in a thermocycler and running the following program:
(a) 95 °C for 3 min.
3. Place the plate on ice for 2 min.
4. Add 10.5 μL Adaptase Master Mix to each well of the 96-well plate. Vortex to mix and quick spin at 2000 × g for 10 s at room temperature (see Notes 1 and 10).
5. Place the plate in the thermocycler and run the following program:
(a) 37 °C for 30 min
(b) 95 °C for 2 min
(c) Hold at 4 °C (see Note 11).
3.1.8 Library Amplification
1. Prepare P5L and P7L primer mixes. Add 5 μL of the appropriate primers to each well of a clean 96-well plate such that each well has a unique P5L–P7L combination (keep note of each combination’s location in the plate). Transfer 5 μL of each P5L–P7L combination to the corresponding well in the 96-well plate containing the pooled samples (see Notes 1 and 10).
2. Add 25 μL 2X Kapa Hifi Mix to each well of the 96-well plate containing the samples. Vortex to mix and quick spin at 2000 × g for 10 s at room temperature (see Notes 1 and 10).
3. Place the plate in a thermocycler and run the following program:
(a) 95 °C for 2 min
(b) 98 °C for 30 s
(c) 98 °C for 15 s
(d) 64 °C for 30 s
(e) 72 °C for 2 min
Return to step (c) 14 times for a total of 15 cycles.
(f) 72 °C for 5 min
(g) Hold at 4 °C (see Notes 11 and 13).
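Twelve P5 primers and eight P7 primers give exactly 96 unique combinations, one per well of the 96-well plate. The sketch below generates one possible plate map for record keeping; assigning P5 primers to columns and P7 primers to rows is an assumption made for illustration, not a layout prescribed by the protocol.

```python
P5_PRIMERS = [f"P5{i:02d}" for i in range(1, 13)]  # P501 ... P512
P7_PRIMERS = [f"P7{i:02d}" for i in range(1, 9)]   # P701 ... P708
ROWS = "ABCDEFGH"

# Assumed layout: one P5 primer per column, one P7 primer per row.
plate_map = {
    f"{ROWS[r]}{c + 1}": (P5_PRIMERS[c], P7_PRIMERS[r])
    for r in range(8)
    for c in range(12)
}

# Every well must carry a distinct P5-P7 pair.
unique_pairs = set(plate_map.values())
print(len(plate_map), "wells,", len(unique_pairs), "unique primer pairs")
print("A1:", plate_map["A1"], " H12:", plate_map["H12"])
```

Whatever layout is chosen, recording it in this form makes it straightforward to translate index reads back to wells after sequencing.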
3.1.9 Library Clean-Up
1. Add 40 μL SPRI beads to each well of the 96-well plate. Vortex the plate briefly and incubate for 5 min at room temperature and then quick spin at 2000 × g for 10 s at room temperature. Place the plate on a DynaMag™-96 Side Skirted Magnet and allow the solution to clear of beads (see Notes 1, 10, and 12).
2. Remove the supernatant and wash beads twice with 150 μL of freshly made 80% ethanol. After the final wash, remove the plate from the magnet and allow beads to dry at room temperature. Take care to not overdry beads (see Note 1).
3. Add 25 μL Elution Buffer (Qiagen) and suspend beads by pipetting. Place the plate back on the DynaMag™-96 Side Skirted Magnet and allow the solution to clear of beads. Combine the supernatants from each column of the 96-well plate into one Eppendorf tube, giving 12 tubes in total. Add 160 μL (0.8×) SPRI beads to each of the 12 Eppendorf tubes. Pipette to mix and incubate for 5 min at room temperature (see Notes 1 and 10).
4. Place the Eppendorf tubes on a DynaMag™-2 Magnet and allow the solution to clear of beads. Discard the supernatant and wash the beads 2 times with 500 μL of 80% ethanol. After the second wash, remove all ethanol and allow the beads to dry at room temperature. Take care to not overdry beads (see Note 10).
5. Add 40 μL Elution Buffer (Qiagen) and suspend beads by pipetting. Incubate for 5 min at room temperature. After incubation, transfer 40 μL of the supernatant to 12 new Eppendorf tubes (see Note 10).
6. Measure the concentration of the libraries using a Qubit Fluorometer and assess the fragment size distribution with an Agilent 2100 Bioanalyzer. Fragment sizes should fall between 300 and 1500 bp (Fig. 5). On the fluorometer, libraries with a concentration of 2–15 ng/μL are to be expected. If concentration and size distributions are as expected, proceed with sequencing of the libraries.
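The bead volumes quoted as 0.8× follow directly from the sample volume being cleaned. The helper below is a hypothetical convenience; the 200 μL column pool is derived from the eight 25 μL eluates combined in step 3, and the 92 μL figure is simply the sample volume implied by 73.6 μL of beads at 0.8× in Subheading 3.1.6.

```python
import math


def bead_volume_ul(sample_ul: float, ratio: float = 0.8) -> float:
    """Volume of SPRI beads to add for a given bead-to-sample ratio."""
    return ratio * sample_ul


# Step 3: eight wells of one column, each eluted in 25 uL, are pooled.
column_pool_ul = 8 * 25.0
beads_for_column_pool = bead_volume_ul(column_pool_ul)

# Subheading 3.1.6: 73.6 uL of beads at 0.8x implies this pooled sample volume.
implied_sample_ul = 73.6 / 0.8

print(f"column pool: {column_pool_ul:.0f} uL -> {beads_for_column_pool:.0f} uL beads")
print(f"73.6 uL beads at 0.8x corresponds to {implied_sample_ul:.0f} uL sample")
```

If the pooled volume differs from these values, for example because fewer wells are combined, scaling the bead volume with the same ratio keeps the size selection comparable.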
3.2 Sequencing
1. Using an Illumina-based next generation sequencing platform (e.g., NovaSeq 6000), sequence the libraries using paired-end sequencing with a 200 bp cassette. We generally aim to obtain 500,000 to one million reads per cell (see Note 9).
3.3 Analysis
A full description of the analysis is outside the scope of this protocol, which describes the steps to generate scNOMe-seq libraries for sequencing. We provide an example of a processing pipeline for raw scNOMe-seq data at [https://github.com/sebpott/scNOMe_smk].
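One early step in processing such data is to split each sequenced pool back into its eight cells using the barcode carried by the random primer (Subheading 2.1.10). The sketch below is not taken from the pipeline linked above; it assumes that the 6-base barcode occupies the first six bases of read 1 and tolerates a single mismatch, and both of these choices should be checked against the actual read structure.

```python
from itertools import combinations

# 6-base barcodes of the eight random primers (Subheading 2.1.10).
BARCODES = {
    "ATCACG": 1, "CGATGT": 2, "TGACCA": 3, "GCCAAT": 4,
    "CAGATC": 5, "ACTTGA": 6, "TAGCTT": 7, "CTTGTA": 8,
}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read: str, max_mismatches: int = 1):
    """Return the barcode number for a read, or None if no barcode is close enough.

    Assumption: the barcode is the first six bases of read 1.
    """
    prefix = read[:6]
    if len(prefix) < 6:
        return None
    distance, index = min((hamming(prefix, bc), idx) for bc, idx in BARCODES.items())
    return index if distance <= max_mismatches else None


# Smallest pairwise distance in the set; one mismatch can only be corrected
# unambiguously if this value is at least 3.
min_pairwise = min(hamming(a, b) for a, b in combinations(BARCODES, 2))

print("minimum pairwise distance:", min_pairwise)
print(assign_barcode("ATCACGTTTAGGATTA"))  # exact match to Barcode 1
print(assign_barcode("ATCACCTTTAGGATTA"))  # one mismatch, still Barcode 1
print(assign_barcode("GGGGGGTTTAGGATTA"))  # no barcode within one mismatch
```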
Fig. 5 Expected size distribution of scNOMe-seq library pools. Bioanalyzer profile shows size distribution of a representative pool of 64 individual scNOMe-seq libraries after final amplification
4 Notes
1. All steps involving 96- and 384-well plates should be performed with a 12-tip multichannel pipette. Reagents can be split between 12-tube rows of strip tubes using a single channel and then transferred to their final destinations in the 96- and 384-well plates using a multichannel pipette.
2. M-Digestion buffer can form a white precipitate, which can be dissolved by keeping the buffer at 37 °C for 30 min prior to use.
3. Make SPRI fresh for each experiment. Before each use, allow beads to warm to room temperature for 30 min and vortex vigorously.
4. In practice we have observed relatively small changes in global GpC levels between a single 7.5 min incubation period and double that time. However, in order to reach saturation, we continue to use ~15 min total incubation time.
5. It takes roughly 10 min of vigorous vortexing for the powdered CT Conversion reagent to go into the solution, and per the manufacturer’s instruction manual, it is normal to see trace amounts of undissolved reagent even after extensive mixing.
6. It is best to use the two 384-well plates as balances for each other for all centrifugations during the sample purification portion of the bisulfite conversion workflow. During steps that require a bench rest, begin the timer only after the reagent has been delivered to the final row of the second plate.
7. Prepare all master mixes in advance of their respective steps to avoid prolonged waiting times of the sample on ice or in the thermocycler.
8. The Exo/rSAP Master Mix is very viscous, which can make it difficult to aspirate equal amounts of fluid in all channels of a multichannel pipette. During this step, it is crucial to visually check the amount of fluid in each channel prior to delivering the solution to each well of the 384-well plates.
9. Sequencing parameters may change depending on the characteristics of your libraries and how many final pools you choose to submit. For example, a cassette with more base pairs may be desirable to obtain additional sequencing from each fragment for libraries with longer average fragment length. Choice of flow cell should be determined by the number of pools that are being multiplexed to achieve the optimal number of reads per cell.
10. Pipette tips should be replaced whenever they come in contact with the plates or with fluid in the plates’ wells (this includes most, but not all, multichannel pipetting steps). When pooling barcoded samples (i.e., step 2 of Subheading 3.1.6 and step 3 of Subheading 3.1.9), the same pipette tips can be used for samples that end up in the same pooled well.
11. If working with a thermocycler that can only handle a single reaction plate at once, process the plates sequentially; that is, perform a step for the first of two plates and store that plate at 4 °C while the same step is performed on the second plate. After both plates are complete, move on to the next step.
12. SPRI beads should be prepared fresh for each experiment. Allow the beads to warm to room temperature and mix vigorously before use.
13. If the concentration of your libraries is lower than expected, consider the following:
(a) Increase the cycle number (e.g., 16 or 17) to achieve more highly concentrated libraries. We have not observed a large increase in redundant reads when increasing the amplification cycle number by one or two.
(b) Ensure good pipetting technique, particularly because multichannel pipettes can be more difficult to use than single-channel ones. If not used properly, not all channels will aspirate the same volume of liquid. Review the directions provided by your multichannel pipette’s manufacturer prior to use.
14. Manual processing of four 384-well plates works well for us. Scaling the assay up beyond that might be difficult without the right reagents and equipment, however. We routinely use eight barcodes, which allows for two 384-well plates to be pooled together (768 cells in total) as presented above. We successfully used 16 barcodes as well, which allows for four 384-well plates to be pooled together (1536 cells in total). Because each 384-well plate takes a significant amount of time and labor to process, some level of automation (e.g., a liquid handler and a thermocycler capable of accommodating multiple 384-well plates at once) would likely be advisable for scaling the assay beyond four plates.
References
1. Pott S (2017) Simultaneous measurement of chromatin accessibility, DNA methylation, and nucleosome phasing in single cells. eLife 6:1127. https://doi.org/10.7554/elife.23203
2. Clark SJ, Argelaguet R, Kapourani C-A et al (2018) scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun 9(1):9. https://doi.org/10.1038/s41467-018-03149-4
3. Li L, Guo F, Gao Y et al (2018) Single-cell multi-omics sequencing of human early embryos. Nat Cell Biol 15(1):18. https://doi.org/10.1038/s41556-018-0123-2
4. Kaya-Okur HS, Wu SJ, Codomo CA et al (2019) CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun 10:1930. https://doi.org/10.1038/s41467-019-09982-5
5. Cao J, Cusanovich DA, Ramani V et al (2018) Joint profiling of chromatin accessibility and gene expression in thousands of single cells. Science 361:1380–1385. https://doi.org/10.1126/science.aau0730
6. Chen S, Lake BB, Zhang K (2019) High-throughput sequencing of the transcriptome and chromatin accessibility in the same cell. Nat Biotechnol 37:1–6. https://doi.org/10.1038/s41587-019-0290-0
7. Chen AF, Parks B, Kathiria AS et al (2021) NEAT-seq: simultaneous profiling of intranuclear proteins, chromatin accessibility, and gene expression in single cells. bioRxiv 2021.07.29.454078. https://doi.org/10.1101/2021.07.29.454078
8. Wang Y, Yuan P, Yan Z et al (2021) Single-cell multiomics sequencing reveals the functional regulatory landscape of early embryos. Nat Commun 12:1247. https://doi.org/10.1038/s41467-021-21409-8
9. Nabilsi NH, Deleyrolle LP, Darst RP et al (2013) Multiplex mapping of chromatin accessibility and DNA methylation within targeted single molecules identifies epigenetic heterogeneity in neural stem cells and glioblastoma. Genome Res 24. https://doi.org/10.1101/gr.161737.113
10. Kilgore JA, Hoose SA, Gustafson TL et al (2007) Single-molecule and population probing of chromatin structure using DNA methyltransferases. Methods 41:320–332. https://doi.org/10.1016/j.ymeth.2006.08.008
11. Kelly TK, Liu Y, Lay FD et al (2012) Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22:2497–2506. https://doi.org/10.1101/gr.143008.112
12. Schübeler D (2015) Function and information content of DNA methylation. Nature 517:321–326. https://doi.org/10.1038/nature14192
13. Lai WKM, Pugh BF (2017) Understanding nucleosome dynamics and their links to gene expression and DNA replication. Nat Rev Mol Cell Biol 18:548–562. https://doi.org/10.1038/nrm.2017.47
14. Argelaguet R, Clark SJ, Mohammed H et al (2019) Multi-omics profiling of mouse gastrulation at single-cell resolution. Nature 1–5. https://doi.org/10.1038/s41586-019-1825-8
15. Li E, Zhang Y (2014) DNA methylation in mammals. CSH Perspect Biol 6:a019133. https://doi.org/10.1101/cshperspect.a019133
16. Luo C, Rivkin A, Zhou J et al (2018) Robust single-cell DNA methylome profiling with snmC-seq2. Nat Commun 9:3824. https://doi.org/10.1038/s41467-018-06355-2
17. Luo C, Keown CL, Kurihara L et al (2017) Single-cell methylomes identify neuronal subtypes and regulatory elements in mammalian cortex. Science 357:600–604. https://doi.org/10.1126/science.aan3351
18. Miranda TB, Kelly TK, Bouazoune K, Jones PA (2010) Methylation-sensitive single-molecule analysis of chromatin structure. Curr Protoc Mol Biol Chapter 21:Unit 21.17.1–16. https://doi.org/10.1002/0471142727.mb2117s89
Chapter 13
Massively Parallel Profiling of Accessible Chromatin and Proteins with ASAP-Seq
Eleni P. Mimitou, Peter Smibert, and Caleb A. Lareau
Abstract
While methods such as the Assay for Transposase Accessible Chromatin by sequencing (ATAC-seq) enable a comprehensive characterization of regulatory DNA, additional measurements are required to characterize the multifaceted nature of eukaryotic cells. Here, we delineate the ATAC with Select Antigen Profiling by sequencing (ASAP-seq) protocol, a scalable approach to quantifying proteins via oligo-tagged antibodies alongside accessible DNA in thousands of single cells. Critically, our method utilizes a custom bridge oligo that is compatible with a variety of oligo-conjugated antibodies, enabling the utilization and repurposing of other commercial products. The ASAP-seq method can be completed with straightforward experimental and computational modifications to existing single-cell ATAC-seq workflows but yields distinct modalities underlying complex cellular states, including estimation of protein abundance on the cell surface as well as intracellular and intranuclear factors.
Key words Multimodal, Single-cell, Protein, Accessible chromatin, ATAC, Intracellular, Gene regulation
1 Introduction
The massively parallel measurement of chromatin accessibility and transcriptomes within single cells has catalyzed a rapidly increasing number of studies that characterize cellular heterogeneity in biological systems. Specifically, droplet-based single-cell ATAC-seq (scATAC-seq) and scRNA-seq enable the comprehensive characterization of genome-wide chromatin accessibility and cellular polyadenylated transcripts in thousands of individual cells within a single experiment. Though these methods have enabled many fundamental insights into the biology underlying complex tissues, both scATAC-seq and scRNA-seq suffer from data sparsity, wherein most accessible loci and most genes are not measured in most cells. This sparse sampling of the underlying cellular features can complicate downstream analyses and inferences, particularly in identifying cell state features that delineate closely related cell
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_13, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
types. Hence, the development and application of single-cell multi-omic technologies that measure multiple modalities from individual cells have become increasingly important. To this end, several recent methods have combined targeted detection of select protein markers with scRNA-seq [1–4]. Conceptually, these methodologies synthesize decades of knowledge of specific cell types and states obtained by cytometry-based approaches with the mostly unbiased readout from the transcriptome. Furthermore, pairing a sparse genome-wide RNA measurement with a sensitive protein-based quantification for a smaller number of targets enables systematic discovery of genes associated with cellular phenotypes while retaining high-confidence inference for a selected subset of proteins. Importantly, recent work has established sophisticated computational algorithms that demonstrate that combining multiplexed protein detection with scRNA-seq resolves cell types better than either modality alone [5], reinforcing the utility of this single-cell multi-omic approach.
Chromatin accessibility is an additional modality that is now routinely used to characterize single cells at high throughput by a variety of single-cell Assay for Transposase Accessible Chromatin (scATAC-seq) approaches [6–8]. In many circumstances, such as development, scATAC-seq may provide a more sensitive measure of the continuum of cell states, particularly in differentiation settings where epigenetic reprogramming may be the first mover [9]. However, several complications arise from the accessible chromatin measurements derived from scATAC-seq. First, per-cell sparsity tends to be more extreme, due in part to ~5–10× more features (accessible chromatin peaks) than in scRNA-seq (genes). Additionally, inferences of gene activity scores rely on the (weighted) summation of accessibility fragments overlapping or near gene bodies.
Because many regulatory elements overlap gene bodies but control the expression of other loci, this method provides an imperfect determination of the genes that are actively transcribed, much less translated, in any given cell. Thus, fine-grained cell-type identification from scATAC-seq data alone is challenging, often requiring complementary scRNA-seq data for high-quality annotations.
To remedy these issues with scATAC-seq data, we recently introduced ATAC with Select Antigen Profiling by sequencing (ASAP-seq) to combine robust detection of proteins with chromatin accessibility [10]. In practice, the ASAP-seq workflow builds on the mitochondrial scATAC-seq (mtscATAC-seq; see other chapters) method that enables scATAC-seq to be performed on whole cells [11] (Fig. 1). As a consequence of the modified protocol, the cell remains intact for high-quality estimation of protein abundances and accessible chromatin with minimal modifications to commercial products. Notably, ASAP-seq enables a number of tunable options, including (1) the use of “hashing” to multiplex
Fig. 1 Schematic of the experimental assay. ASAP-seq allows whole cell input into the scATAC-seq workflow, maintaining the connection between nuclear content and cell surface marker information. Cells are stained with oligo-conjugated antibodies followed by fixation, permeabilization, and Tn5 transposition. Bridge oligos are spiked in the barcoding mix prior to droplet formation to allow simultaneous barcoding of ATAC fragments and antibody-derived oligos
samples; (2) the use of different types of antibody reagents for protein detection; (3) the detection of intracellular proteins through minor modifications; and (4) the ability to either enrich or deplete reads derived from mitochondrial DNA, which can be used for inferring clonal relationships between cells in a sample [11].
We note one key feature that is conceptually distinct for the proteo-genomic capture in ASAP-seq. Specifically, in contrast to other methods that use exogenous oligonucleotides to either report on protein abundance or enable sample multiplexing [1–4, 12, 13], the oligonucleotide sequences that read out protein levels in ASAP-seq do not directly interact with the barcoding reagents from the parent ATAC-seq assay. Instead, ASAP-seq employs a bridging oligo to convert existing labeling oligonucleotides into a format that is compatible with the ATAC-seq kit, providing enhanced flexibility for reagent use in ASAP-seq (Fig. 2). This part of the protocol has important implications for the accessibility and usability of the assay, as the bridging oligo enables the immediate use of a large catalog of existing reagents as well as combinations of reagents with different specifications.
In this method description, we outline the foundational steps required for single-cell multi-omic profiling with ASAP-seq technology [10]. We describe an integrated experimental and computational workflow that provides flexibility to quantify proteins for downstream integrative analyses, and we identify critical steps associated with quality control of libraries. Taken together, ASAP-seq enables the high-confidence quantification of selected intracellular and surface antigens while retaining the comprehensive discovery of accessible chromatin loci and clonality underlying cells.
Fig. 2 Barcoding scheme of the protein tags using the bridge oligo strategy. Bridge oligo A (BOA) and bridge oligo B (BOB) function as templates to extend the protein-derived oligos in droplets. While TSB tags (right) contain UMIs, UBIs (N9V) are introduced to TSA tags via the bridge oligo (left) to allow molecule counting
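The bridging strategy described in Fig. 2 can be illustrated in silico. The following Python sketch is a simplified model rather than part of the published workflow: it assumes a TSA tag laid out as 5′ handle, 15-nt antibody barcode, one non-A base, poly(A) 3′ (inferred from the library structure in Box 1), treats annealing as ideal poly(A)/poly(T) pairing, and uses a made-up antibody barcode. All function names and the poly(T)/poly(A) lengths are hypothetical parameters of the sketch.

```python
import random
from typing import Tuple

# Constant sequences as printed in Table 1 of this chapter.
READ1N = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"  # 5' portion shared by BOA and BOB
TSA_HANDLE = "CCTTGGCACCCGAGAATTCCA"          # 3' end of RPxx; assumed 5' handle of a TSA tag

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def make_boa(rng: random.Random, poly_t: int = 30) -> Tuple[str, str]:
    """Build one bridge oligo A molecule with a random 10-nt UBI (N9 + V)."""
    ubi = "".join(rng.choice("ACGT") for _ in range(9)) + rng.choice("ACG")
    return READ1N + ubi + "T" * poly_t, ubi


def make_tsa_tag(barcode: str, poly_a: int = 30) -> str:
    """Assumed TSA tag layout: handle + antibody barcode + non-A base + poly(A)."""
    return TSA_HANDLE + barcode + "C" + "A" * poly_a


def extend_tag(tag: str, boa: str) -> str:
    """Model extension of the tag 3' end using BOA as the template.

    The tag poly(A) pairs with the BOA poly(T); the polymerase then copies
    the BOA region upstream of the poly(T) (UBI and Read 1N handle).
    """
    template = boa.rstrip("T")  # the UBI ends in a non-T base, so only the poly(T) is removed
    return tag + revcomp(template)


rng = random.Random(0)
boa, ubi = make_boa(rng)
tag = make_tsa_tag("ACGTACGTACGTACG")  # hypothetical 15-nt antibody barcode
top_strand = revcomp(extend_tag(tag, boa))
print(top_strand)
```

Read 5′ to 3′, the printed strand starts with the Read 1N handle followed by the UBI and the poly(T) stretch, which corresponds to the order of elements downstream of the cell barcode in the TotalSeq™-A library diagram of Box 1.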
2
Materials
2.1 Cell Processing, Staining, Fixation, and Lysis
1. Phosphate buffered saline (PBS) (any provider).
2. CITE-seq staining buffer: 2% BSA, 0.01% Tween in PBS.
3. Human TruStain FcX™ (BioLegend 422301).
4. TotalSeq™-A or TotalSeq™-B oligo-labeled antibody reagents (individually or as panels) (BioLegend; see Note 1).
5. FACS buffer: PBS with 1% FBS. Filtered at 0.45 μm, store at 4 °C.
6. DAPI (any provider, for example BioLegend 422801).
7. Formaldehyde, 16% (any provider, for example Thermo Fisher 28906).
8. Glycine solution, 2.5 M (any provider, for example Ricca Chemical RMB19103-50C2).
9. Tris–HCl pH 7.5, 1 M (any provider, for example Sigma-Aldrich T2194).
10. NaCl, 5 M (any provider, for example Sigma-Aldrich 59222C).
11. MgCl2, 1 M (any provider, for example Sigma-Aldrich M1028).
12. NP40, 10% (Sigma-Aldrich, 74385).
13. Tween 20, 10% (Bio-Rad, 1662404).
14. Digitonin 5% (Thermo Fisher, BN2006).
15. BSA, 10% (any provider, for example Miltenyi Biotec 130-091-376).
Table 1 Oligo sequences
Oligo | Sequence (shown 5′ > 3′) | Notes
BOA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNNNNNNNNVTTTTTTTTTTTTTTTTTTTTTTTTTTTT/3InvdT/ | Used to bridge TSA tags. 3′ modification to block extension. Brings a 10-nt UBI ending in V (non-T)
BOB | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGTTGCTAGGACCGGCCTTAAAGC/3InvdT/ | Used to bridge TSB tags. 3′ modification to block extension
P5 | AATGATACGGCGACCACCGA | Forward primer to amplify TSA and TSB tags
P7 | CAAGCAGAAGACGGCATACGAGAT | Reverse primer to re-amplify already indexed tag libraries (optional)
RPxx | CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCCTTGGCACCCGAGAATTCCA | TruSeq Small RNA indexing primer, used to index TSA tags
D7xx | CAAGCAGAAGACGGCATACGAGATxxxxxxxxGTGACTGGAGTTCAGACGTGTGC | TruSeq DNA indexing primer, used to index TSB tags or TSA hashtags
16. Intracellular staining buffer (BioLegend, custom part number 900002577). Supplement with fresh DTT before use to 1 mM final concentration.
17. True-stain monocyte blocker (BioLegend, 426101).
18. DTT, 1 M (any provider, for example Sigma-Aldrich 646563).
19. Flowmi Cell Strainer 40 μm (Bel-Art, H13680-0040).
20. Bridge oligo A (BOA) or bridge oligo B (BOB) (IDT, or other provider, see Table 1, Note 2).
21. Indexing primers (IDT or other provider, see Table 1).
2.2 ASAP-Seq Library Preparation
1. 10x Genomics Chromium Next GEM Single Cell ATAC Library & Gel Bead Kit, 16 or 4 rxns.
2. 10x Genomics Chromium Next GEM Chip H Single Cell Kit, 48 or 16 rxns.
3. 10x Genomics Single Index Kit N, Set A, 96 rxns.
4. 2x Kapa Hifi PCR mastermix.
5. SPRI beads (AMPure XP beads or KAPA Pure beads).
6. Custom oligonucleotides for library prep (see Table 1).
2.3 Quality Control and Sequencing
1. Qubit dsDNA HS Assay Kit (Thermo Fisher Q32851 or Q33230).
2. Agilent Bioanalyzer High Sensitivity DNA Analysis Kit (or Tapestation or similar).
3. KAPA Library Quantification Kit for Illumina® Platforms (KAPA biosystems KK4835).
4. Illumina NovaSeq or NextSeq reagent kits.
2.4 Software and References Needed for Computational Analysis
1. Download cellranger-atac and relevant reference files (https://support.10xgenomics.com/single-cell-atac/software/pipelines/latest/what-is-cell-ranger-atac). The most up-to-date reference files and versions of the software are available online. This software will be used to demultiplex sequencing libraries from an Illumina sequencing run and can be executed to process them (see Note 3).
2. Install an up-to-date version of the Python 3 library either for the system, the user, or through a conda environment (see Note 4).
3. Download the kite antibody tag preprocessing toolkit. The most up-to-date version of the software is available online at https://github.com/pachterlab/kite. This software is used to build a reference map of the oligonucleotide barcodes to the respective antibody clones.
4. Download the kallisto and bustools software binaries. Current versions of these tools are available at https://github.com/pachterlab/kallisto and https://github.com/BUStools/bustools, respectively. These utilities are used to efficiently count reads assigned to each antibody barcode for every cell while correcting for sequencing errors.
5. Download the ASAP to kite script toolkit available online: https://github.com/caleblareau/asap_to_kite. This code is required to convert the ASAP-seq sequencing data into a format that is compatible with the existing kite | kallisto | bustools workflows (see Note 5).
6. mgatk package and dependencies (https://github.com/caleblareau/mgatk).
7. 10x scATAC barcode whitelist:
$ wget https://teichlab.github.io/scg_lib_structs/data/737K-cratac-v1.txt
This file is available in the distribution of CellRanger-ATAC but is more accessible from the indicated GitHub link.
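As an illustration of how the barcode whitelist from item 7 might be used, the sketch below checks observed cell barcodes against a whitelist. It is a minimal example under stated assumptions: the whitelist is taken to be a plain text file with one barcode per line, the two barcodes in the usage lines are toy values, and trying the reverse complement is an assumption that reflects the fact that index reads can be reported in either orientation depending on the sequencing platform. The function names are hypothetical and are not part of kite, kallisto, bustools, or asap_to_kite.

```python
from typing import Iterable, Optional, Set

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (N is kept as N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def load_whitelist(lines: Iterable[str]) -> Set[str]:
    """Collect non-empty lines into a set of allowed barcodes."""
    return {line.strip() for line in lines if line.strip()}


def match_barcode(observed: str, whitelist: Set[str]) -> Optional[str]:
    """Return the whitelisted barcode matching the read, or None.

    The read is tried as-is first and then as its reverse complement
    (an assumption; the required orientation depends on the sequencer).
    """
    if observed in whitelist:
        return observed
    flipped = revcomp(observed)
    if flipped in whitelist:
        return flipped
    return None


# Toy whitelist; with the real file one would instead use:
#   with open("737K-cratac-v1.txt") as handle:
#       whitelist = load_whitelist(handle)
whitelist = load_whitelist(["AAACGAAAGAAACGCC\n", "AAACGAAAGAAAGCAG\n"])
print(match_barcode("AAACGAAAGAAACGCC", whitelist))
print(match_barcode(revcomp("AAACGAAAGAAAGCAG"), whitelist))
```

Exact matching is used here for clarity; the kallisto | bustools workflow listed above additionally corrects sequencing errors, which this sketch does not attempt.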
3
Methods
3.1 Cell Preparation, Fixation, and Permeabilization
This section outlines the steps required to stain the cells with the conjugated antibodies, followed by fixation and permeabilization. The fixation steps are based on the mtscATAC-seq workflow (see a separate chapter describing mtscATAC-seq in the same issue). Permeabilization can be performed using two alternative lysis buffers: LLL (low loss lysis) and OMNI (based on the OMNI-ATAC protocol [14]), which is the default lysis buffer in the 10x Genomics scATAC kit. LLL is the lysis buffer described in the mtscATAC-seq protocol, which, due to the lack of Tween 20 in its formulation, retains mtDNA fragments in the ATAC library that can be used for mtDNA variant tracing. In benchmarking experiments, LLL and OMNI buffers yielded comparable ATAC and protein data and can be used interchangeably if mtDNA retention is not desired.
3.1.1 Cell Staining
1. Obtain single cell suspensions (filter if needed) and measure viability and density. If viability is lower than 80%, proceed with live cell enrichment and/or use best judgement depending on sample source/importance/cell numbers.
2. Resuspend 1–2 million cells in 100 μL CITE staining buffer.
3. Add 10 μL Fc Blocking reagent.
4. Incubate for 10 min at 4 °C.
5. While cells are incubated in Fc Block, prepare the antibody pool (panel or titrated amounts).
6. Add antibody-oligo pool to cells.
7. Incubate for 30 min at 4 °C.
8. Wash cells 3 times with 1 mL CITE staining buffer, spin at 300 × g for 5 min at 4 °C for every wash to harvest cells.
9. Resuspend cells in 450 μL room temperature PBS.
3.1.2 Cell Fixation and Permeabilization
1. Use about 0.5–1 million cells in 450 μL PBS for the fixation reaction.
2. Add 30 μL 16% formaldehyde (1% final concentration), mix by pipetting, and incubate at room temperature for 10 min with occasional inversion.
3. Quench by adding glycine to final concentration 0.125 M.
4. Wash with 1× ice-cold PBS by filling up the tube, invert 5 times.
5. Spin at 400 × g for 5 min at 4 °C.
6. Discard supernatant and repeat wash with 1 mL 1× ice-cold PBS.
7. Spin at 400 × g for 5 min at 4 °C, discard the supernatant.
Table 2 Permeabilization buffers. Prepare fresh, keep on ice until use (see Note 3)
Materials | LLL lysis buffer | OMNI lysis buffer | Wash buffer
1 M Tris–HCl pH 7.5 | 10 mM | 10 mM | 10 mM
5 M NaCl | 10 mM | 10 mM | 10 mM
1 M MgCl2 | 3 mM | 3 mM | 3 mM
10% NP40 | 0.1% | 0.1% | –
10% Tween 20 | – | 0.1% | –
5% Digitonin | – | 0.01% | –
8. Resuspend the cell pellet in 100 μL chilled lysis buffer (LLL or OMNI buffer, Table 2, see Note 6), mix by pipetting.
9. Incubate on ice, 3 min for primary cells, 5 min for cell lines.
10. Add 1 mL chilled wash buffer to the lysed cells, mix by pipetting.
11. Spin at 500 × g for 5 min at 4 °C. If intracellular staining is desired, go to Subheading 3.1.3.
12. Remove the supernatant, resuspend in 150 μL 1× nuclei buffer (10x Genomics).
13. Filter through 40 μm strainers. If the cell number is low, skip this step.
14. Count cells and adjust density according to 10x loading instructions.
15. Proceed to Subheading 3.2.
3.1.3 Intracellular Staining
1. Resuspend cell pellet from step 11 of Section 3.1.2 in 40 μL intracellular wash buffer.
2. Add 5 μL of FcX and 5 μL of monocyte block solution.
3. Incubate on ice for 15 min.
4. Add 50 μL of intracellular wash buffer, containing titrated amounts of conjugated intracellular markers (see Note 7), incubate on ice for 30 min.
5. Wash 3 times with the intracellular wash buffer, spin at 500 × g for 5 min at 4 °C.
6. Remove the supernatant, resuspend in 150 μL 1× nuclei buffer (10x Genomics).
7. Filter through 40 μm strainers. If the cell number is low, skip this step.
8. Count cells and adjust the density according to 10x loading instructions.
9. Proceed to Subheading 3.2.
3.2 Transposition and Barcoding
For this step, proceed according to the 10x Genomics Single Cell ATAC protocol (CG000168 Rev. D for v1 and CG000209 Rev. D for v1.1; hereafter, ‘10x Protocol’) with the modifications below:
1. During the barcoding reaction (see step 2.1 of the 10x Protocol), spike in 0.5 μL of 1 μM bridge oligo. There is no dead volume in the reaction, so the final volume will be 65.5 μL for v1 and 60.5 μL for v1.1.
2. During GEM incubation (see step 2.5 of the 10x Protocol), add a 5 min incubation at 40 °C at the beginning of the protocol (see Note 8). Incubation protocol: 40 °C 5 min; 72 °C 5 min; 98 °C 30 s; 12 cycles of 98 °C 10 s, 59 °C 30 s, 72 °C 1 min; hold at 15 °C.
3. During silane bead elution (see step 3.1o of the 10x Protocol), add 43.5 μL of Elution Solution I and subsequently recover 43 μL. Keep 3 μL aside to use as input (see Note 9) in the tag library PCR, and with the remaining 40 μL, proceed to SPRI cleanup as per the 10x Protocol.
4. During SPRI cleanup (see step 3.2d of the 10x Protocol), save the supernatant. For the bead-bound fraction, proceed as per the 10x Protocol. For the supernatant fraction, add 32 μL SPRI and let bind for 5 min. Collect beads on a magnet, wash twice with 80% EtOH, remove the remaining ethanol, and elute beads in 42 μL EB (see Note 9). This can be combined with the 3 μL left aside after the silane purification, as input in the TSA/TSB indexing reaction:
50 μL 2x KAPA mix
2.5 μL primer P5, 10 μM
2.5 μL indexing primer, 10 μM (RPxx or D7xx, see Table 1)
3–45 μL input fragments (see Note 9)
100 μL total.
Incubation protocol: 95 °C 3 min; 14–16 cycles of 95 °C 20 s, 60 °C 30 s, 72 °C 20 s; 72 °C 5 min; hold at 4 °C.
5. Proceed with indexing the ATAC library as described in Subheading 4.2 of the 10x Protocol. Usually 10 cycles provide sufficient material to perform library QC and sequencing.
If native nuclei are run in parallel, a noticeable reduction in PCR yield can be observed with the fixed sample compared to native nuclei (presumably due to fixation).
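The volumes in the tag-indexing PCR and the bridge oligo spike-in above involve simple arithmetic that is easy to get wrong at the bench. The sketch below works through it in Python. It is illustrative only: topping the reaction up to 100 μL with nuclease-free water is an assumption (the protocol states only the 100 μL total), and the function names are hypothetical.

```python
TOTAL_UL = 100.0

# Fixed components of the TSA/TSB indexing reaction, as listed in step 4.
FIXED_COMPONENTS_UL = {
    "2x KAPA mix": 50.0,
    "primer P5 (10 uM)": 2.5,
    "indexing primer (10 uM)": 2.5,
}


def tag_pcr_mix(input_ul: float) -> dict:
    """Return per-component volumes for one 100 uL tag-indexing reaction.

    Water is an assumed filler so that the total reaches 100 uL.
    """
    if not 3.0 <= input_ul <= 45.0:
        raise ValueError("input volume should be between 3 and 45 uL")
    mix = dict(FIXED_COMPONENTS_UL)
    mix["input fragments"] = input_ul
    mix["water (assumed filler)"] = TOTAL_UL - sum(mix.values())
    return mix


def bridge_oligo_final_nm(spike_ul: float = 0.5,
                          stock_um: float = 1.0,
                          final_volume_ul: float = 65.5) -> float:
    """Final bridge oligo concentration (nM) in the barcoding reaction."""
    return stock_um * spike_ul / final_volume_ul * 1000.0


print(tag_pcr_mix(3.0))    # only the 3 uL set aside after silane purification
print(tag_pcr_mix(45.0))   # 3 uL plus the 42 uL SPRI eluate
print(round(bridge_oligo_final_nm(), 2))                       # v1 chemistry
print(round(bridge_oligo_final_nm(final_volume_ul=60.5), 2))   # v1.1 chemistry
```

With the full 45 μL of input no filler is needed, which is why the combined 3 μL and 42 μL fractions fit the reaction exactly.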
Fig. 3 Representative fragment analyzer traces of the sequencing libraries. ATAC (top) and protein tag (bottom) libraries of fixed human PBMCs permeabilized with OMNI lysis buffer (a) or LLL lysis buffer (b). Note the increased abundance of the nucleosome-free region (size 25 k reads for panels >200 antibodies.
ATAC and TSA/TSB libraries can be sequenced on the same flow cell. We recommend that protein tag libraries should not occupy more than ~50% of the flow cell because beyond the first 10–15 nt, both Rd1 and Rd2 enter a low-diversity region (polyT/A or 10× capture sequence), resulting in a decreased data quality that can negatively impact ATAC fragment mapping. However, we note that we have not systematically evaluated relative loading abundances for the ATAC and protein tag library. We have used the Illumina NextSeq and NovaSeq reagents kits and respective sequencing platforms. A minimum of 75-cycle kit with recipe [34, 8, 16, 34] is sufficient if you are not intending to retain mtDNA reads. For experiments that plan to retain mtDNA for genotyping, we recommend using longer reads to obtain high coverage of the mitochondrial genome for variant calling. In this setting, we typically utilize a 150-cycle kit with a [72, 8, 16, 72] recipe. We note that the full length expected molecules per modality are shown in Box 1.
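To make the two sequencing recipes above easier to compare, the following sketch labels each element of a recipe and sums the cycles. The ordering [Read 1, i7 index, i5 index, Read 2] is an assumption based on the usual convention for this library type (the 16-cycle index read matching the 16-nt cell barcode shown in Box 1); the names are hypothetical.

```python
from typing import Dict, Sequence

# Assumed order of the four numbers in a recipe such as [34, 8, 16, 34].
READ_NAMES = ("read1", "i7_index", "i5_index", "read2")


def describe_recipe(recipe: Sequence[int]) -> Dict[str, int]:
    """Label the cycles of each read and add the total cycle count."""
    if len(recipe) != len(READ_NAMES):
        raise ValueError("a recipe needs exactly four cycle counts")
    layout = dict(zip(READ_NAMES, recipe))
    layout["total_cycles"] = sum(recipe)
    return layout


short_recipe = describe_recipe([34, 8, 16, 34])  # no mtDNA genotyping
long_recipe = describe_recipe([72, 8, 16, 72])   # mtDNA genotyping
print(short_recipe)
print(long_recipe)
```

The totals are the numbers to compare against the cycle capacity of the chosen reagent kit.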
Box 1: ASAP-Seq Tag Libraries Structure and Sequencing Schemes
Final library structures, top strand shown 5′ > 3′, with spaces separating the functional blocks. N: cell barcode, UBI, or UMI bases; x: antibody barcode or sample index bases. The original diagram also marks the UBI, Read 1, the i7 index read, and the i5 read.
ASAP-seq ADT in TotalSeq™-A format:
AATGATACGGCGACCACCGAGATCTACAC NNNNNNNNNNNNNNNN TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG NNNNNNNNNV TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT V xxxxxxxxxxxxxxx TGGAATTCTCGGGTGCCAAGGAACTCCAGTCAC xxxxxxxx ATCTCGTATGCCGTCTTCTGCTTG
Tag library with poly(T) bridge (BOA) and TruSeq DNA (D7xx) index:
AATGATACGGCGACCACCGAGATCTACAC NNNNNNNNNNNNNNNN TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG NNNNNNNNNV TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT V xxxxxxxxxxxxxxx AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC xxxxxxxx ATCTCGTATGCCGTCTTCTGCTTG
Tag library bridged by BOB (TotalSeq™-B format):
AATGATACGGCGACCACCGAGATCTACAC NNNNNNNNNNNNNNNN TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG TTGCTAGGACCGGCCTTAAAGC NNNNNNNNN xxxxxxxxxxxxxxx NNNNNNNNNN AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC xxxxxxxx ATCTCGTATGCCGTCTTCTGCTTG
=’"$IDR_$1,$2,$3,$4,$5,
$6,$7,$8,$9,$10}’ $IDR_OUTPUT | sort | uniq | sort -k7n,7n | gzip -nc > $IDR_PEAKS
2. Filter for ENCODE blacklisted regions if needed (see Subheading 3.3, step 2).
3.5 Generating Signal Tracks
1. Generate fold-change bigWig file with MACS2 (see Note 23).
• Inputs: pileup bedGraph files, generated from MACS2 callpeak ($TREAT_PILEUP, $CONTROL_PILEUP), chromosome sizes file ($gensz). Note that this produces an intermediate bedGraph file ($fc_bedgraph, $fc_bedgraph_srt).
• Outputs: fold-change bigWig ($FC_BIGWIG).
• Commands:
(1) macs2 bdgcmp -t $TREAT_PILEUP -c $CONTROL_PILEUP --o-prefix $prefix -m FE
(2) slopBed -i ${prefix}_FE.bdg -g $gensz -b 0 | bedClip stdin $gensz $fc_bedgraph
(3) sort -k1,1 -k2,2n $fc_bedgraph > $fc_bedgraph_srt
(4) bedGraphToBigWig $fc_bedgraph_srt $gensz $FC_BIGWIG
2. Generate p-value bigWig files with MACS2.
• Inputs: tagAlign file ($TAG), pileup bedGraph files, generated from MACS2 callpeak ($TREAT_PILEUP, $CONTROL_PILEUP), chromosome sizes file ($gensz). Note that this produces an intermediate bedGraph file ($pval_bedgraph, $pval_bedgraph_srt).
• Outputs: p-value bigWig ($PVAL_BIGWIG).
• Commands:
(1) sval=$(wc -l
[...] $pval_bedgraph_srt
(5) bedGraphToBigWig $pval_bedgraph_srt $gensz $PVAL_BIGWIG
3. Optionally, generate count-signal tracks (see Note 24).
• Inputs: tagAlign files ($TA_FILE), chromosome sizes file ($GENSZ).
• Outputs: strand separated count-signal tracks ($POS_COUNT_BIGWIG, $NEG_COUNT_BIGWIG).
• Commands:
(1) zcat -f $TA_FILE | sort -k1,1 -k2,2n | bedtools genomecov -5 -bg -strand + -g $GENSZ -i stdin > TMP.POS.BED
(2) bedGraphToBigWig TMP.POS.BED $GENSZ $POS_COUNT_BIGWIG
(3) zcat -f $TA_FILE | sort -k1,1 -k2,2n | bedtools genomecov -5 -bg -strand - -g $GENSZ -i stdin > TMP.NEG.BED
(4) bedGraphToBigWig TMP.NEG.BED $GENSZ $NEG_COUNT_BIGWIG
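Conceptually, the strand-separated count-signal tracks above tally the 5′ ends of alignments on each strand. The Python sketch below illustrates that idea on a few toy records. It is not a replacement for bedtools: it assumes BED-style coordinates (0-based start, end-exclusive) as used in tagAlign files, holds everything in memory, and uses hypothetical names and made-up records.

```python
from collections import Counter
from typing import Iterable, Tuple

Record = Tuple[str, int, int, str]  # (chromosome, start, end, strand)


def five_prime_counts(records: Iterable[Record]) -> Tuple[Counter, Counter]:
    """Count alignment 5' ends per position, separately for each strand.

    With 0-based, end-exclusive coordinates, the 5' end of a plus-strand
    alignment is its start and the 5' end of a minus-strand alignment
    is end - 1.
    """
    plus, minus = Counter(), Counter()
    for chrom, start, end, strand in records:
        if strand == "+":
            plus[(chrom, start)] += 1
        elif strand == "-":
            minus[(chrom, end - 1)] += 1
        else:
            raise ValueError(f"unexpected strand: {strand!r}")
    return plus, minus


toy_records = [
    ("chr1", 100, 150, "+"),
    ("chr1", 100, 160, "+"),
    ("chr1", 120, 170, "-"),
    ("chr2", 5, 55, "-"),
]
plus_counts, minus_counts = five_prime_counts(toy_records)
print(dict(plus_counts))
print(dict(minus_counts))
```

Each counter corresponds to one of the two bedGraph intermediates: positions with a nonzero number of 5′ ends and the associated count.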
3.6 ATAC-seq Quality Control Evaluation
1. Calculate mitochondrial fraction (see Note 25).
• Inputs: unfiltered alignments file in BAM format ($RAW_BAM), number of CPU threads ($nth). Note that these commands will produce intermediate files: (1) an alignments file without chrM alignments ($NON_MITO_BAM) and (2) an alignments file with only chrM alignments ($MITO_BAM).
• Outputs: fraction of reads mapped to chrM.
• Commands:
(1) samtools idxstats $RAW_BAM | cut -f 1 | grep -v -P "^chrM$" | xargs samtools view $RAW_BAM -@ $nth -b > $NON_MITO_BAM
(2) samtools view -b $RAW_BAM -@ $nth chrM > $MITO_BAM
(3) samtools sort -n --threads 10 $NON_MITO_BAM -O SAM | SAMstats.sort.stat.filter.py --sorted_sam_file - --outf $non_mito_samstat_qc
(4) samtools sort -n --threads 10 $MITO_BAM -O SAM | python SAMstats.sort.stat.filter.py --sorted_sam_file - --outf ${mito_samstat_qc}
(5) If Rn = number of mapped reads in $non_mito_samstat_qc and Rm = number of mapped reads in $mito_samstat_qc, then the fraction of mito reads is Rm / (Rm + Rn).
Daniel S. Kim
2. Calculate read counts at each stage of filtering. This can be done with any alignment file (BAM format) to determine why reads are lost in processing and to guide future library generation as needed (see Note 26).
• Input: alignments in BAM format ($BAM).
• Output: mapped statistics ($MAPSTATS).
• Command:
samtools sort -n --threads 10 $BAM -O SAM | SAMstats --sorted_sam_file - --outf $MAPSTATS
3. Estimate library complexity (see Notes 27 and 28).
• Inputs: final alignments file ($BAM), prefix for a temporary read name sorted BAM file ($OFPREFIX).
• Outputs: PCR bottlenecking coefficient 1 (PBC1), PCR bottlenecking coefficient 2 (PBC2), Non-Redundant Fraction (NRF).
• Commands:
(1) samtools sort -n $BAM -o ${OFPREFIX}.srt.tmp.bam
(2) bedtools bamtobed -bedpe -i ${OFPREFIX}.srt.tmp.bam | awk 'BEGIN{OFS="\t"}{print $1,$2,$4,$6,$9,$10}' | grep -v 'chrM' | sort | uniq -c | awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}' > ${PBC_FILE_QC}
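The awk program in the library complexity command above counts total read pairs (mt), distinct positions (m0), positions seen exactly once (m1), and positions seen exactly twice (m2), then prints NRF = m0/mt, PBC1 = m1/m0, and PBC2 = m1/m2. The Python sketch below mirrors that arithmetic on toy data so the definitions are easier to follow. It is an illustration with hypothetical names; the fragment keys are made-up stand-ins for the six BEDPE columns used by the pipeline, and returning None for PBC2 when m2 is zero is a choice made here, not something the original command does.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def library_complexity(fragments: Iterable[Hashable]) -> Dict[str, Optional[float]]:
    """Compute NRF, PBC1, and PBC2 from a list of fragment position keys."""
    counts = Counter(fragments)
    mt = sum(counts.values())                        # total read pairs
    m0 = len(counts)                                 # distinct positions
    m1 = sum(1 for c in counts.values() if c == 1)   # positions seen once
    m2 = sum(1 for c in counts.values() if c == 2)   # positions seen twice
    if mt == 0:
        raise ValueError("no fragments supplied")
    return {
        "total": mt,
        "distinct": m0,
        "one_read": m1,
        "two_reads": m2,
        "NRF": m0 / mt,
        "PBC1": m1 / m0,
        "PBC2": (m1 / m2) if m2 else None,
    }


# Toy keys: (chrom1, start1, chrom2, end2, strand1, strand2)
toy_fragments = [
    ("chr1", 100, "chr1", 300, "+", "-"),
    ("chr1", 100, "chr1", 300, "+", "-"),  # duplicate of the first
    ("chr1", 500, "chr1", 650, "+", "-"),
    ("chr2", 40, "chr2", 210, "+", "-"),
    ("chr3", 10, "chr3", 90, "+", "-"),
    ("chr3", 10, "chr3", 90, "+", "-"),    # duplicate of the previous
]
metrics = library_complexity(toy_fragments)
print(metrics)
```

For the toy input there are six read pairs at four distinct positions, two of which are seen once and two twice.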
ATAC-seq Data Processing
4. Calculate cross-correlation metrics (see Notes 29 and 30).
• Inputs: BEDPE files ($FINAL_BEDPE_FILE), number of reads to subsample ($NREADS), tagAlign file ($FINAL_TA_FILE) used for standardized randomization, number of compute threads ($NTHREADS). Note that an intermediate file ($SUBSAMPLED_TA_FILE) will be generated which can be deleted after this analysis.
• Outputs: cross-correlation scores ($CC_SCORES_FILE) and plots ($CC_PLOT_FILE).
• Commands:
First, subsample the BEDPE or tagAlign file (default: 25M reads):
zcat $FINAL_BEDPE_FILE | grep -v "chrM" | shuf -n $NREADS --random-source=[...] $SUBSAMPLED_TA_FILE
Then use the following commands to run cross-correlation:
(1) Rscript $(which run_spp.R) -c=$SUBSAMPLED_TA_FILE -p=$NTHREADS -filtchr=chrM -savp=$CC_PLOT_FILE -out=$CC_SCORES_FILE
(2) sed -r 's/,[^\t]+//g' $CC_SCORES_FILE > temp
(3) mv temp $CC_SCORES_FILE
5. Calculate the Jensen-Shannon distance (JSD) metric (see Note 31).
• Inputs: aligned reads in BAM format ($BAM), MAPQ threshold ($MAPQ_THRESH, default 30), number of processors ($NTH).
• Outputs: fingerprint plots showing JSD ($JSD_PLOT) and log ($JSD_LOG).
• Command:
plotFingerprint -b $BAM --labels rep1 --outQualityMetrics $JSD_LOG --minMappingQuality $MAPQ_THRESH -T "Fingerprints of different samples" --numberOfProcessors $NTH --plotFile $JSD_PLOT
6. Estimate GC bias (see Note 32).
• Inputs: filtered alignments file ($BAM), reference genome ($REF_FA).
• Outputs: GC bias plot ($GC_BIAS_PLOT) and log ($GC_BIAS_LOG) of results. The log can be used to replot as desired.
• Command:
java -Xmx6G -XX:ParallelGCThreads=1 -jar picard.jar CollectGcBiasMetrics R=$REF_FA I=$BAM O=$GC_BIAS_LOG USE_JDK_INFLATER=TRUE USE_JDK_DEFLATER=TRUE VERBOSITY=ERROR QUIET=TRUE ASSUME_SORTED=FALSE CHART=$GC_BIAS_PLOT S=summary.txt
7. Fragment length statistics. This is for paired end only (see Note 33).
• Inputs: final BAM file ($BAM).
• Outputs: data file with fragment length distribution ($INSERT_DATA), distribution plot ($INSERT_PLOT).
• Command:
java -Xmx6G -XX:ParallelGCThreads=1 -jar picard.jar CollectInsertSizeMetrics INPUT=$BAM OUTPUT=$INSERT_DATA H=$INSERT_PLOT VERBOSITY=ERROR QUIET=TRUE USE_JDK_DEFLATER=TRUE USE_JDK_INFLATER=TRUE W=1000 STOP_AFTER=5000000
8. Analyze TSS enrichment (see Note 34).
• Inputs: filtered final BAM file ($BAM), chromosome sizes file ($GENSZ), read length estimated from FASTQ ($READ_LEN), TSS BED file ($TSS_BED).
• Output: TSS plot, TSS enrichment value at peak within a desired output directory ($OUT_DIR).
• Command:
encode_task_tss_enrich.py --read_len $READ_LEN --nodup-bam $BAM --chrsz $GENSZ --tss $TSS_BED --out-dir $OUT_DIR
9. Calculate the fraction of reads in peaks (FRiP) (see Notes 35 and 36).
• Inputs: final filtered peak file ($PEAK), final filtered tagAlign file ($TA).
• Output: text file with FRiP score ($FRIP).
• Commands:
(1) val1=$(bedtools
intersect -a $SUBSAMPLED_FASTQ, or if gzipped then zcat $FASTQ | head -n 1000000 > $SUBSAMPLED_FASTQ) to confirm the pipeline works properly before running the pipeline on potentially very deep sequencing libraries.
2. The adapter is the sequencing primer sequence used in transposition. Commonly the sequence is AGATCGGAAGAGC (Illumina) but confirm that your library generation method uses this primer. This trimming step is important as fragments generated by transposase cuts may be shorter than your read length. As an example, consider an open chromatin site which is 70 bp in length. It is possible for two transposases to bind in that open chromatin site, generating a fragment that is less than 70 bp in length. If the read length is 100 bp, then the sequencing will read through the adapter on the 3′ end of the fragment, leading to non-genomic adapter sequence in the read itself. This non-genomic sequence needs to be trimmed off for proper alignment of the reads.
3. Note that we run Bowtie2 with the -k parameter set to k + 1. This is by design, as it allows us to distinguish between reads that map to only k total positions and those that map to more than k positions in downstream processing.
4. For running alignment for single-end reads, replace the Bowtie2 command with:
bowtie2 -k $((multimapping+1)) --mm -x $bwt2_idx --threads $nth_bwt2 -U [...] $log | samtools view -Su /dev/stdin | samtools sort - $prefix
Note that the key difference is in using parameter -U instead of -1 and -2.
5. For running alignment with uniquely mapped reads only, replace the Bowtie2 command with:
bowtie2 -X2000 --mm --threads $nth_bwt2 -x $bwt2_idx -1 $fastq1 -2 $fastq2 2>$log | samtools view -Su /dev/stdin | samtools sort - $prefix
Note that the key difference is removal of the parameter -k.
6. Note the use of a custom script assign_multimappers.py. This script can be found in the ENCODE ATAC-seq pipeline GitHub repository. Briefly, this script looks at reads with multiple alignments and only keeps reads that mapped to no more than the desired number of locations. For example, if the multimapping threshold is 4 but the read is found to map to 5 locations, the read is discarded (a read is only allowed to map to a maximum of 4 locations). Downstream filtering by samtools chooses one of the read alignments as primary and discards the rest. Note that the MAPQ threshold is NOT used when processing multimappers, as all multimappers fall below the usual MAPQ threshold.
7. For filtering reads for single-ended read alignments (that are multimapped) with samtools flags, use these read filtering commands instead:
(1) samtools sort -n ${RAW_BAM} -o ${QNAME_SORT_BAM}
(2) samtools view -h ${QNAME_SORT_BAM} | $(which assign_multimappers.py) -k $multimapping | samtools view -F 1804 -Su /dev/stdin | samtools sort /dev/stdin -o ${FLAG_FILT_BAM}
Note that the key differences are sorting by read name order first, no fixing read mates, and no filtering for read mates.
8. For filtering reads with uniquely mapped read alignments only with samtools flags, use these read filtering commands instead:
(1) samtools view -F 1804 -f 2 -q ${MAPQ_THRESH} -u ${RAW_BAM} | samtools sort -n /dev/stdin -o ${TMP_FILT_BAM}
(2) samtools fixmate -r ${TMP_FILT_BAM} ${TMP_FILT_FIXMATE_BAM}
(3) samtools view -F 1804 -f 2 -u ${TMP_FILT_FIXMATE_BAM} | samtools sort /dev/stdin -o ${FLAG_FILT_BAM}
Note the key differences are no filtering for multimappers, immediate filtering with -F 1804, and use of MAPQ thresholds. Note that MAPQ threshold default is 30. This threshold is aligner dependent, so if using a different aligner then remember to adjust the MAPQ.
9. For filtering reads for single-ended read alignments that are uniquely mapped, use this read filtering command instead:
(1) samtools view -F 1804 -q ${MAPQ_THRESH} -u ${RAW_BAM} | samtools sort /dev/stdin -o ${FILT_BAM} -T ${FILT_BAM_FILE_PREFIX}
10. Duplicates are filtered as a conservative measure to avoid read biases that may occur by using PCR duplicates. There is a tradeoff to be considered in only using unique reads vs using multimappers. While uniquely mapped reads are able to definitively avoid reads that are PCR duplicates generated by library amplification, there are a number of reads that are not PCR duplicates but will be seen as multimapped due to multiple distinct genomic locations that are closely or completely the same (as in the case of evolutionarily recent region duplications, such as around hemoglobin genes HBA1 and HBA2). To capture these reads, the ENCODE consortium utilizes some multimappers (reads that do not map to more than four unique locations) to help capture these evolutionarily recently duplicated genomic regions.
11. To filter duplicates in a single-end alignments file, adjust Command (3) to the following:
samtools view -F 1804 -b ${FILT_BAM_FILE} > ${FINAL_BAM_FILE}
12. The tagAlign format is an ENCODE format in which alignments are kept in a BED format, such that each line in the BED file is an alignment. This format can be useful for quick compatibility with Bedtools. To generate a tagAlign of alignments from a single-end library, change the command to the following:
bedtools bamtobed -i ${FINAL_BAM} | awk 'BEGIN{OFS="\t"}{$4="N";$5="1000";print $0}' | gzip -nc > ${FINAL_TA}
Note that there is no BEDPE in a single-end library as there are no paired reads.
13. Transposase activity leads to an offset cut that is 9 bp in length. To approximate the center of the transposase binding site on the sequence and achieve base-pair resolution information on the genome, read starts (transposase cut sites) are adjusted to get positions that are closer to the transposase center instead of the cut site by adding 4 to the positive strand reads and subtracting 5 from the negative strand reads.
14. Peaks are called with a loose p-value threshold to capture a large set of possible peaks. Having more peaks aids the IDR framework in determining the threshold for reproducible peaks. Stricter thresholding is used in calculating reproducible peaks in the next steps. If the IDR framework will not be used for filtering peaks (as may be the case when only a single replicate is present), we recommend setting a stricter p-value threshold for peak calling (e.g., 0.01). Adjust the p-value as needed based on your data quality and downstream IDR results.
15. It is helpful to clean up peak names after peak calling by replacing the peak names with a peak ID, where the ID number is the peak rank:
sort -k 8gr,8gr "$prefix"_peaks.narrowPeak | awk 'BEGIN{OFS="\t"}{$4="Peak_"NR ; print $0}' | head -n ${NPEAKS} | gzip -nc > $peakfile
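The offset adjustment described in Note 13 can be sketched in Python as follows. This is a minimal sketch, not the pipeline's own code: it assumes six-column tagAlign records (chrom, start, end, name, score, strand) in BED-style coordinates, and the function name `shift_tn5` is hypothetical. The exact arithmetic on the negative strand depends on the coordinate convention of your files, so check it against your own data.

```python
def shift_tn5(record, plus_shift=4, minus_shift=5):
    """Move the 5' end (cut site) of one tagAlign record toward the Tn5 center.

    record: (chrom, start, end, name, score, strand) in BED-style coordinates.
    Only the 5' end of the read is moved; the other end is left unchanged.
    """
    chrom, start, end, name, score, strand = record
    if strand == "+":
        # The 5' end of a plus-strand read is its start coordinate.
        start += plus_shift
    elif strand == "-":
        # The 5' end of a minus-strand read is its end coordinate.
        end -= minus_shift
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    if start >= end:
        # Assumption: reads are longer than the shift; anything else is suspicious.
        raise ValueError("read is too short to shift")
    return (chrom, start, end, name, score, strand)


plus_read = shift_tn5(("chr1", 100, 150, "N", 1000, "+"))
minus_read = shift_tn5(("chr1", 100, 150, "N", 1000, "-"))
```

Here `plus_read` starts at 104 and `minus_read` ends at 145, so both cut sites move toward the center of the 9 bp offset.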
16. To utilize MACS2 for ATAC-seq, we adjust the --extsize and --shift parameters to fit ATAC-seq specifications. In contrast to ChIP-seq, the ends of the reads (Tn5 transposase cut sites) matter for ATAC-seq instead of the midpoints of the reads (approximate midpoint of DNA-protein binding sites). This requires adjusting read shifting to ensure that peak calling is performed with read ends rather than read midpoints.
17. Blacklist regions are genomic regions known to produce significant signal enrichment that can bias downstream analyses due to significant amplification of noise and artifact. Blacklist files can be found at: doi: 10.5281/zenodo.1491732 [14].
18. To run IDR, also run peak calling on a pooled set of reads. To do so, simply concatenate the replicate tagAlign files and call peaks on the pooled set of alignments. For example:
zcat ${REP1_TA_FILE} ${REP2_TA_FILE} | gzip -nc > ${POOLED_TA_FILE}
19. When running IDR, we also recommend capping the input peak files to a peak list of the top 300K peaks. Observationally, across ENCODE accessibility datasets, we find that most cell types have about 200K accessible regions. As such, we suggest only using at most 300K peaks in IDR analysis as the top 300K likely captures all the real accessibility regions in addition to some regions that represent noise.
20. When only a single replicate is present, IDR can be run on self-pseudoreplicates. To generate pseudoreplicates, take the single replicate, shuffle reads, and split the reads into two equal parts. Use the original replicate peak file as the new “pooled replicate” peak file when running IDR.
• Commands for paired-ended datasets:
(1) nlines=$( zcat ${joined} | wc -l )
(2) nlines=$(( (nlines + 1) / 2 ))
(3) zcat -f ${joined} | shuf --random-source= ${PR1_TA_FILE}
(5) awk 'BEGIN{OFS="\t"}{printf "%s\t%s\t%s\t%s\t%s\t%s\n%s\t%s\t%s\t%s\t%s\t%s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' "${PR_PREFIX}01" | gzip -nc > ${PR2_TA_FILE}
• Commands for single-ended datasets:
(1) nlines=$( zcat ${FINAL_TA_FILE} | wc -l )
(2) zcat ${FINAL_TA_FILE} | shuf --random-source= ${PR2_TA_FILE}
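The shuffle-and-split idea in Note 20 can be illustrated with a short Python sketch for a single-end library. It assumes the tagAlign lines fit in memory as a list of strings; the function name `split_pseudoreplicates` and the `seed` argument are hypothetical, and for full-size files a streaming approach such as the shell commands above is likely more practical.

```python
import random


def split_pseudoreplicates(lines, seed=0):
    """Shuffle alignment lines and split them into two near-equal halves."""
    shuffled = list(lines)        # copy so the caller's list is left untouched
    rng = random.Random(seed)     # a fixed seed keeps the split reproducible
    rng.shuffle(shuffled)
    # Round up, mirroring nlines=$(( (nlines + 1) / 2 )) in the paired-end commands.
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


# Toy input: eleven single-end tagAlign lines.
reads = [f"chr1\t{i}\t{i + 50}\tN\t1000\t+" for i in range(11)]
pr1, pr2 = split_pseudoreplicates(reads, seed=1)
```

Every read ends up in exactly one pseudoreplicate, and the first pseudoreplicate receives the extra read when the total is odd. For paired-end data the same split would be applied to one-line-per-pair records so that mates stay together.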
21. When two replicates are heavily imbalanced in read counts, consider running IDR on pseudo-pseudoreplicates. To do so, first use the steps in Note 20 to generate two pseudoreplicates for each replicate. Then merge one pseudoreplicate from each replicate with the pseudoreplicate from the other replicate. For example, if replicate r1 produces pseudoreplicates r1pr1 and r1pr2 and replicate r2 produces pseudoreplicates r2pr1 and r2pr2, then merge r1pr1 and r2pr1 to get pseudo-pseudoreplicate ppr1, and merge r1pr2 and r2pr2 to get ppr2. Use the pooled peak file as the “pooled peak file” in IDR.
22. In the ENCODE pipeline, IDR is run on multiple versions of peak sets, including true replicates, pseudoreplicates, and pooled pseudoreplicates. These peak files are all compared to determine an optimal peak set (the largest number of peaks) and a conservative peak set (the fewest peaks). Please see the ENCODE pipeline website for further discussion of how these comparisons with IDR can aid in creating robust and reproducible region sets.
23. We recommend producing both fold-change enrichment signal tracks and p-value signal tracks. The p-value signal tracks can often be more helpful in visualization, while downstream analyses are best done with the fold-change enrichment signal values.
24. The count tracks can be utilized in downstream base-pair resolution analyses, such as in deep learning with BPNet [15].
25. The mitochondrial genome is very accessible as it has no nucleosomes and is a known source of poor library generation. It is important to check the fraction of mitochondrial mapped reads before a deep sequencing run if possible to determine how much sequencing is necessary to get the desired read depth on the non-mitochondrial genome.
26. Please note that samtools flagstat metrics track the number of alignments in the file, not the read count. We provide the SAMstats package as a way to calculate read counts in alignment files, to accurately capture read counts at each stage of data processing.
27. Library complexity measures are PBC1, PBC2, and NRF. PBC1 should be > 0.9, PBC2 > 10, and NRF > 0.9. The file produced has this information in the following columns:
TotalReadPairs [tab] DistinctReadPairs [tab] OneReadPair [tab] TwoReadPairs [tab] NRF=Distinct/Total [tab] PBC1=OnePair/Distinct [tab] PBC2=OnePair/TwoPair
28. To run library complexity measures for a single-end library, change the command to the following:
bedtools bamtobed -i ${FILT_BAM_FILE} | awk 'BEGIN{OFS="\t"}{print $1,$2,$3,$6}' | grep -v 'chrM' | sort | uniq -c | awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1/m2}' > ${PBC_FILE_QC}
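The arithmetic behind the library complexity columns in Notes 27 and 28 can be sketched in Python as follows. This is an illustrative sketch that assumes each read has already been reduced to a hashable position key such as (chrom, start, end, strand); the function name `library_complexity` is hypothetical, and the variable names follow the mt, m0, m1, m2 counters of the awk command.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from one position key per read."""
    counts = Counter(positions)
    mt = sum(counts.values())                        # total reads
    m0 = len(counts)                                 # distinct positions
    m1 = sum(1 for c in counts.values() if c == 1)   # positions seen exactly once
    m2 = sum(1 for c in counts.values() if c == 2)   # positions seen exactly twice
    return {
        "total": mt,
        "distinct": m0,
        "one_read": m1,
        "two_reads": m2,
        "NRF": m0 / mt if mt else float("nan"),
        "PBC1": m1 / m0 if m0 else float("nan"),
        # Assumption: report NaN rather than fail when no position is seen twice.
        "PBC2": m1 / m2 if m2 else float("nan"),
    }


# Toy input: five reads over four distinct positions, one of them duplicated.
toy_reads = [
    ("chr1", 10, 60, "+"),
    ("chr1", 20, 70, "+"),
    ("chr1", 20, 70, "+"),
    ("chr2", 5, 55, "-"),
    ("chr2", 9, 59, "-"),
]
metrics = library_complexity(toy_reads)
```

For the toy input, NRF is 4/5, PBC1 is 3/4, and PBC2 is 3/1, which makes it easy to check the column definitions by hand.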
29. When cross-correlating forward strand alignment start positions to reverse strand alignment start positions, you can generate two peaks of cross-correlation, one that corresponds to the read length and the other to the average fragment length. This is more useful in ChIP-seq experiments, as there is a characteristic fragment length (the length of DNA covered by the DNA-binding protein), but can also be a useful metric in ATAC-seq to confirm that the library is enriched for genomic DNA fragments. The score file for cross-correlation has the following columns:
# Filename numReads estFragLen corr_estFragLen PhantomPeak corr_phantomPeak argmin_corr min_corr phantomPeakCoef relPhantomPeakCoef QualityTag
The following columns are most important:
• Normalized strand cross-correlation coefficient (NSC) = col9 in outFile
• Relative strand cross-correlation coefficient (RSC) = col10 in outFile
• Estimated fragment length = col3 in outFile, take the top value.
30. For subsampling a single-end library:
zcat ${FINAL_TA_FILE} | grep -v "chrM" | shuf -n ${NREADS} --random-source= ${SUBSAMPLED_TA_FILE}
31. We recommend using deepTools to calculate a Jensen-Shannon distance, which provides a measure of signal-to-noise ratio in the sequencing library. Please see deepTools references for more details. We filter out blacklist aligned reads before running JSD.
32. GC content sequence bias is a known phenomenon in next-generation sequencing methods for chromatin [16]. This analysis can be used to determine how much GC bias is in your experiment, and whether GC bias correction may be necessary. In practice, GC bias is ubiquitous and should always be taken into account in downstream analyses.
33. In a paired-end experiment, the fragment lengths generated from transposition can be determined after alignment. As the transposition sites (the ends of the fragments) are at accessible DNA locations, fragments can be generated within nucleosome-free regions (NFRs), or can span one or more nucleosomes. NFR fragments are the most common, while mono-nucleosomal fragments (fragments spanning one nucleosome) and di-nucleosomal fragments are rarer but often present in a good library. These fragment patterns at genomic loci can provide additional information about chromatin structure and nucleosome positioning at loci of interest, such as with V-plots [17]. To determine if such analyses may be possible, a fragment length distribution plot can be useful to determine if mono-nucleosomal and di-nucleosomal fragments are present. Observationally, >40% of reads fall in NFR regions (fragment length 0–150), and mono-nucleosomal reads may be approximately 40% of the NFR total.
34. The TSS enrichment is an important measure of signal-to-noise ratio within an ATAC-seq dataset. To calculate the TSS enrichment, use the following procedure (full code for this procedure can be found at https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/src/encode_task_tss_enrich.py). For the TSS file, take a standard genomic annotation file (such as a GTF file), select only the protein coding genes, and use the start positions of the genes. Using these TSSs, generate the read pileups around each TSS, from 2000 bp downstream to 2000 bp upstream.
Combine all these read pileups to get the aggregate read profile around TSSs. Calculate the background read pileup as the average read pileup in the 100 bp on either edge. Then normalize the aggregate profile by dividing it by the background read pileup to get a fold change signal. Please note that TSS enrichment values are dependent on the reference used. For hg19 refSeq, a TSS enrichment >10 is ideal, though 6–10 is acceptable by ENCODE standards. For GRCh38 refSeq, >7 is ideal, though 5–7 is acceptable. For mm9 GENCODE, >7 is ideal, though 5–7 is acceptable. For mm10 refSeq, >15 is ideal, though 10–15 is acceptable. Please see the ENCODE data quality standards (https://www.encodeproject.org/atac-seq/#standards) for the latest updates to TSS enrichment thresholds.
35. The Fraction of Reads in Peaks (FRiP) score is a measure of signal-to-noise. To calculate the FRiP, take your finalized peak set and determine the fraction of reads that fall into these peak regions. The higher the FRiP, the better the signal-to-noise of the dataset. A strong FRiP score, particularly important in footprinting analyses, is 0.4 or higher.
36. The FRiP calculation can also be used with any region set desired. It may be of interest to calculate the fraction of reads within all known open chromatin regions, or in blacklist regions.
37. Nt and Np should be within a factor of 2 of each other. If they differ by more than a factor of 2, this suggests that the replicates are very different in quality. N1 and N2 should be within a factor of 2 of each other. If they differ by more than a factor of 2, this also suggests that the replicates are very different in quality. Note that these metrics are simply based on how many peaks were discovered in IDR analysis.
References
1. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
2. Galas DJ, Schmitz A (1978) DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res 5:3157–3170. https://doi.org/10.1093/nar/5.9.3157
3. Hesselberth JR, Chen X, Zhang Z et al (2009) Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods 6:283–289. https://doi.org/10.1038/nmeth.1313
4. Li Z, Schulz MH, Look T et al (2019) Identification of transcription factor binding sites using ATAC-seq. Genome Biol 20:45. https://doi.org/10.1186/s13059-019-1642-2
5. ENCODE Project Consortium, Moore JE, Purcaro MJ et al (2020) Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583:699–710. https://doi.org/10.1038/s41586-020-2493-4
6. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J 17:10. https://doi.org/10.14806/ej.17.1.200
7. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. https://doi.org/10.1038/nmeth.1923
8. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079. https://doi.org/10.1093/bioinformatics/btp352
9. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842. https://doi.org/10.1093/bioinformatics/btq033
10. (2020) Picard Toolkit. Broad Institute
11. Feng J, Liu T, Qin B et al (2012) Identifying ChIP-seq enrichment using MACS. Nat Protoc 7:1728–1740. https://doi.org/10.1038/nprot.2012.101
12. Kharchenko PV, Tolstorukov MY, Park PJ (2008) Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26:1351–1359. https://doi.org/10.1038/nbt.1508
13. Ramírez F, Ryan DP, Grüning B et al (2016) deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res 44:W160–W165. https://doi.org/10.1093/nar/gkw257
14. Amemiya HM, Kundaje A, Boyle AP (2019) The ENCODE blacklist: identification of problematic regions of the genome. Sci Rep 9:9354. https://doi.org/10.1038/s41598-019-45839-z
15. Avsec Ž, Weilert M, Shrikumar A et al (2021) Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53:354–366. https://doi.org/10.1038/s41588-021-00782-6
16. Meyer CA, Liu XS (2014) Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat Rev Genet 15:709–721. https://doi.org/10.1038/nrg3788
17. Henikoff JG, Belsky JA, Krassovsky K et al (2011) Epigenome characterization at single base-pair resolution. PNAS 108:18318–18323. https://doi.org/10.1073/pnas.1110731108
Chapter 18
Deep Learning on Chromatin Accessibility
Daniel S. Kim
Abstract
DNA accessibility has been a powerful tool in locating active regulatory elements in a cell type, but dissecting the combinatorial logic within these regulatory elements has been a continued challenge in the field. Deep learning models have been shown to be highly predictive models of regulatory DNA and have led to new biological insights on regulatory syntax and logic. Here, we provide a framework for deep learning in genomics that implements best practices and focuses on ease of use, versatility, and compatibility with existing tools for inference on DNA sequence.
Key words DNA accessibility, ATAC-seq, DNase-seq, Deep learning, Machine learning
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7_18, © The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
1 Introduction
DNA accessibility assays continue to be powerful tools in locating active regulatory elements in a cell type, as these assays work genome-wide and mark regions where DNA binding proteins are interacting with the genome to produce cell-type-specific gene regulation [1–5]. However, further analysis on accessible regions is necessary to dissect regulatory logic encoded in these regulatory regions. Deep learning models have emerged as highly predictive models of regulatory DNA [6]. These models learn nonlinear predictive functions that map DNA sequence to genome-wide profiles of regulatory activity by learning predictive sequence features and their higher order combinations. These models produce highly robust and accurate mappings from DNA sequence to molecular phenotypes like accessibility, suggesting that these models capture the higher order regulatory logic that produces accessibility and regulatory potential from a DNA sequence [7–9]. While deep learning models have been criticized for their opacity, we and others have developed powerful interpretation methods to extract rules of cis-regulatory logic from these black-box models [10–14]. These models in conjunction with validated interpretation tools have provided new insights into DNA sequence syntax and logic, and have become established tools in computational biology.
Here, we provide a framework and best practices for building deep learning models for genomics as well as guides for applying interpretation tools to these models. Deep learning continues to be a rapidly evolving field, so here we provide a high-level perspective. We note that there are many reasonable ways to implement a deep learning pipeline, including decisions around the programming language and the deep learning framework. Here we have chosen to share best practices that attempt to balance compatibility (both backward and toward the future), ease of use, and versatility. To this end, we provide a framework that is best implemented in Python 3, with TensorFlow 2.0 and TF Keras. More specifically, this protocol is for building a classification model from DNA sequence input to predict binarized accessibility (e.g., is an accessible peak present/not present at this genomic region) in a single cell type. Useful interpretation tools for analyzing such models are also recommended.
While we focus on a specific machine learning problem in this protocol (mapping from DNA sequence to accessibility), we think that the best practices and ideas here are widely applicable to a variety of machine learning problems in genomics. As such we hope that this protocol will be an effective starting point for many possible types of deep learning in genomics.
2 Materials
This protocol assumes working knowledge of Python, bioinformatics, and machine learning. As noted above, there are many ways to do deep learning, and frameworks and tools are rapidly evolving. At present, for deep learning frameworks we recommend using either TensorFlow (Keras has now become a part of TensorFlow 2.0) or PyTorch. Both are best used in Python 3 for compatibility with existing inference tools. There are a variety of relevant resources for deep learning in genomics. For plug-and-play models, the Kipoi model zoo has a variety of models that may be of interest. For inference, the most common tools used include DeepLIFT, SHAP, and TF-MoDISco. Please note that you can run a deep learning pipeline without a graphics processing unit (GPU), but it will be substantially slower. We recommend running training, evaluation, and inference with a GPU.
3 Methods
3.1 Data Processing and Data Loading
1. Start with a set of genomic intervals, such as a set of accessible regions for a cell type (see Note 1). This will be your set of genomic intervals that are labeled as positives.
2. Collect an informative set of negative genomic intervals (see Notes 2 and 3). This includes flanking intervals (the genomic intervals adjacent to the positive intervals on either side; our default is to collect three extra bins on each side), random intervals (intervals anywhere else in the genome that are not positives), as well as known accessible intervals that are not accessible in your set of genomic intervals (see Note 4).
3. For the positive intervals and negative intervals selected, bin these intervals into equal-size bins (see Note 5), using a stride length to generate fixed-length examples across the selected intervals (see Note 6). These bins are your genomic examples. Default bins are 200 bp in length (e.g., an example is 200 base pairs of genomic sequence), and our default stride length is 50 bp.
4. Set up labels for your examples. Positives should be labeled with 1 and negatives are labeled with 0 (see Note 7).
5. Extend each example interval to your final interval length (see Note 8). This step now adds the flanking sequences of each bin to give more sequence context during training. The default final length is 1000 bp. At this stage, you should have a set of genomic intervals that are all 1000 bp in length and are each associated with a label (1 or 0).
6. Optionally, pre-generate one-hot encodings for your regions (see Notes 9–11). If you intend to use a standard data loader for your desired deep learning framework, this will be necessary to have appropriate inputs for training.
7. Build a data generator appropriate for your desired deep learning framework. Many frameworks now provide the option to create your own data loader if needed. If performing a one-hot encoding on the fly, write a one-hot encoder in your data generator to ensure the deep learning framework receives a proper input with the label.
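Steps 3 to 6 above (binning with a stride, labeling, extension, and one-hot encoding) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the protocol's reference code: all function names are hypothetical, positive intervals are assumed to be non-overlapping and on the same chromosome as the bin, intervals shorter than one bin simply yield no bins, extended intervals that cross a chromosome boundary are dropped, and non-ACGT bases are encoded as all zeros.

```python
def make_bins(start, end, bin_size=200, stride=50):
    """Tile the interval [start, end) with fixed-size bins placed every `stride` bp."""
    bins = []
    pos = start
    while pos + bin_size <= end:
        bins.append((pos, pos + bin_size))
        pos += stride
    return bins


def label_bin(bin_start, bin_end, positives, min_frac=0.5):
    """Return 1 if more than `min_frac` of the bin overlaps the positive intervals."""
    covered = 0
    for p_start, p_end in positives:
        covered += max(0, min(bin_end, p_end) - max(bin_start, p_start))
    return 1 if covered > min_frac * (bin_end - bin_start) else 0


def extend_bin(bin_start, bin_end, chrom_len, final_len=1000):
    """Extend a bin symmetrically to `final_len`; return None if it leaves the chromosome."""
    flank = (final_len - (bin_end - bin_start)) // 2
    new_start, new_end = bin_start - flank, bin_end + flank
    if new_start < 0 or new_end > chrom_len:
        return None
    return (new_start, new_end)


# Alphabetical channel order: A, C, G, T.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows (unknown bases become zeros)."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


# A 400 bp positive interval gives five 200 bp bins at a 50 bp stride.
examples = [
    (extend_bin(s, e, chrom_len=10_000), label_bin(s, e, [(1000, 1400)]))
    for s, e in make_bins(1000, 1400)
]
```

Each entry of `examples` pairs a 1000 bp interval with its label, which is the state described at the end of step 5. Labeling uses only the central 200 bp bin, in line with Note 6.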
3.2 Train a Model
1. Before training, determine your evaluation setup. We use a cross-validation strategy based on splitting by chromosome (see Note 12). For a tenfold cross-validation strategy, split your chromosomes by size as equally as possible across ten folds, then use eight folds for training, one fold for validation, and one fold for testing (see Note 13).
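The greedy fold-balancing strategy that Note 13 describes for this step can be sketched as follows. This is an illustrative sketch: the function name `assign_folds` is hypothetical, and the sizes passed in may be chromosome lengths or per-chromosome example counts, depending on which quantity you want to balance.

```python
def assign_folds(chrom_sizes, n_folds=10):
    """Greedily distribute chromosomes into `n_folds` buckets of similar total size.

    chrom_sizes: dict mapping chromosome name to a positive size
    (base pairs or number of examples).
    """
    if n_folds > len(chrom_sizes):
        raise ValueError("more folds than chromosomes")
    # Largest chromosomes are placed first.
    ordered = sorted(chrom_sizes, key=chrom_sizes.get, reverse=True)
    folds = [[] for _ in range(n_folds)]
    totals = [0] * n_folds
    for chrom in ordered:
        # The currently smallest bucket receives the largest remaining chromosome.
        # While buckets are still empty, this seeds each one with a chromosome.
        smallest = totals.index(min(totals))
        folds[smallest].append(chrom)
        totals[smallest] += chrom_sizes[chrom]
    return folds


# Toy sizes, two folds.
toy_sizes = {"chr1": 100, "chr2": 90, "chr3": 50, "chr4": 40, "chr5": 10}
toy_folds = assign_folds(toy_sizes, n_folds=2)
```

With the toy sizes, the two folds end up with totals of 150 and 140. One fold can then be held out for validation and another for testing, with the rest used for training.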
2. Choose a model architecture and implement it in your desired deep learning framework (see Notes 14 and 15).
3. Train the model using your desired deep learning framework. Training will require your dataset, a model, a loss function, and an optimizer. Use a Binary Cross Entropy loss (see Note 16) with Adam optimizer (see Note 17). Default parameters for Adam optimizer are: learning rate = 0.001, beta_1 = 0.9, beta_2 = 0.999, epsilon = 1e-08. Set up a training regimen: number of epochs to run the training data (default is 20) as well as the metric to optimize (default is the loss) (see Note 18). Adjust the parameters as desired to optimize training.
3.3 Evaluation
1. Evaluate your model using only the held-out test data (see Notes 19 and 20). Unlike training, where only an informative set of negative regions is used, please use the entirety of the validation chromosomes during evaluation. Useful measures during evaluation of a classification model include the loss, area under the precision-recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC) (see Note 21).
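The two ranking metrics named in this step can be illustrated with a small, dependency-free Python sketch. It is not a substitute for a tested library implementation (for example, scikit-learn provides `average_precision_score` and `roc_auc_score`): here AUPRC is estimated as average precision, ties in the scores are broken by input order, and the AUROC function compares every positive with every negative, which is only practical for small inputs. Both function names are hypothetical.

```python
def average_precision(labels, scores):
    """Estimate AUPRC as the mean precision at the rank of each positive example."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive example is required")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank   # precision when this positive is recovered
    return total / n_pos


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions: two positives among five examples.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_auprc = average_precision(toy_labels, toy_scores)
toy_auroc = auroc(toy_labels, toy_scores)
```

On real genome-wide test chromosomes the negatives vastly outnumber the positives, which is the setting in which Note 21 recommends relying on AUPRC.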
3.4 Inference
1. As a starting point for downstream inference methods, generate base-pair level contribution scores on genomic intervals of interest (see Notes 22–24). To determine statistical significance of these scores, create dinucleotide shuffled versions of the sequences and generate base-pair level contribution scores on those sequences to get an empirical null distribution of contribution scores (see Note 25). We recommend tools such as DeepLIFT, SHAP, or backpropagated gradients (see Note 26), some of which will also handle null sequence generation and significance scoring for you.
2. For motif scanning, you can utilize your deep learning framework to quickly scan using your database of interest. The usual position-weight matrix (PWM) scan is a convolutional operation, so you can utilize the deep learning framework by initializing a convolutional layer with the weights set as the PWM weights for each motif in your database (see Note 27).
3. For de novo motif discovery, use TF-MoDISco to take contribution scores and find enriched patterns.
4. Combinatorial analyses can be performed on enriched motifs of interest (see Note 28). You can utilize the model predictions on the original sequence compared to combinatorial scrambling of identified motif sites in the sequence. Motif scrambling involves taking the underlying motif match sequence and shuffling that sequence in place to generate a sequence that does not contain that motif at that location anymore. Additionally, you can use the Deep Feature Importance Map (DFIM) method by scrambling motif sites and determining how the contribution scores change compared to the original sequence. Both of these analyses can give you further insight and hypotheses of combinatorial logic in DNA sequence.
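The statement in step 2 that a PWM scan is a convolutional operation can be made concrete with a plain-Python sketch: sliding the PWM along a one-hot encoded sequence and summing the element-wise products at each offset is the operation a convolutional layer performs with the PWM as its filter. The function names and the toy PWM below are hypothetical; a real scan would load log-odds matrices from a motif database and would normally run inside the deep learning framework.

```python
# Alphabetical channel order: A, C, G, T.
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "T": [0, 0, 0, 1]}


def one_hot(seq):
    """Encode a DNA string as rows of [A, C, G, T] indicators (unknown bases are zeros)."""
    return [ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]


def pwm_scan(seq, pwm):
    """Return one PWM score per offset, computed as a sliding dot product.

    pwm: list of rows, one per motif position, each holding [A, C, G, T] scores.
    """
    encoded = one_hot(seq)
    width = len(pwm)
    scores = []
    for start in range(len(encoded) - width + 1):
        score = 0.0
        for j in range(width):
            # Dot product between the one-hot row and the PWM row at position j.
            score += sum(x * w for x, w in zip(encoded[start + j], pwm[j]))
        scores.append(score)
    return scores


# Toy PWM for the motif "ACG": 2 for the preferred base, -1 otherwise.
toy_pwm = [[2, -1, -1, -1], [-1, 2, -1, -1], [-1, -1, 2, -1]]
toy_scan = pwm_scan("TACGT", toy_pwm)
```

The scan peaks at the offset where "ACG" occurs. Replacing the one-hot rows with contribution scores gives the weighted scan that Note 27 cautions should be checked against a scan of the original sequence.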
4 Notes
1. Please see the previous chapter in this book for a processing pipeline to generate peaks from an accessibility assay (such as ATAC-seq).
2. It is often easier in downstream processing to set up different files for positives and negatives. This can allow you greater control over the ratio of positives to negatives by adjusting how many examples come from each file in training.
3. We generally train with an equivalent number of positives and negatives, so we select the negative set of examples to approximately equal the number of positive examples. This can be adjusted to better optimize training and ensure the model has seen an appropriate diversity of negative examples.
4. Note that training uses an informative set of negatives, but for evaluation of the model it is best to evaluate the model against genome-wide data. It is important to keep this consideration in mind early as you set up your dataset, so that it is easy to switch to a genome-wide dataset for downstream evaluation.
5. Deep learning models require a fixed size input. As such, region sets must be binned to generate these fixed length inputs. To adequately cover the regions of interest without generating too many similar examples, a stride length of 50 bp is used between bins. Bin size and stride length can be adjusted as desired.
6. The 200 bp bins are the “active” DNA sequence of interest in the example. We consider the bin positive if more than 50% of the bin overlaps a positive interval. Later, the region is extended with additional flanking sequence to provide more context around the “active” sequence to the model. It is important to note that labeling is only done using the “active” sequence, i.e., the 200 bp bin, which acts as a way to focus the model on learning important features in the middle of the sequence, utilizing surrounding contextual DNA sequence.
7. Note that in the case of a single positives region set (single task model), this is trivial: simply label your positive bins with 1 and your negative bins with 0. In the case of multitask models, however, this will be an important step to generate an appropriate label set for each task (each region set), most commonly in an array of dimensions (n, n_task) where n is the number of examples and n_task is the number of tasks.
8. Please remember to check that extending the region intervals does not cause the interval to exceed chromosome boundaries.
9. By convention, the one-hot encoding order is alphabetical (i.e., A is [1, 0, 0, 0]; C is [0, 1, 0, 0]; G is [0, 0, 1, 0]; and T is [0, 0, 0, 1]).
10. We recommend not generating the one-hot encodings in advance. Encoding on the fly is quickly done with an interval lookup for the alphabetical sequence and then a lookup dictionary to convert to a one-hot encoding. This can be very helpful in reducing file sizes and decreasing I/O time. This may require a custom data loader for your model, though many deep learning frameworks do have one-hot encoding data layers available.
11. There are a variety of options for storing your dataset (text files, Python numpy arrays, etc.). For the purposes of this protocol we recommend HDF5 files, as HDF5 is a standard format well suited for machine learning datasets and widely used.
12. Think early about how you will manage chromosome splits (our recommended setup for cross-validation of models). An easy way to do so is to generate separate dataset files for each chromosome, so that you only load specific chromosomes in your data generator for various stages (training, validation, testing). We recommend creating train/validation/test splits by chromosome so as to prevent train/test contamination from overlapping examples on the same chromosome.
13. A strategy for generating n equally sized folds is to do the following. Order your chromosomes by size (largest first). Set up n “buckets” for chromosomes. Place the first n chromosomes into the n buckets. Then, do the following iterative process until there are no more chromosomes: (1) find the bucket that is smallest in terms of examples and (2) add the largest remaining chromosome to that bucket.
14. Convolutional neural networks (CNNs) have been very effective for DNA sequence to accessibility models. We recommend starting with a CNN architecture and adjusting from there as desired. A useful tuned architecture is Basset [9].
15. If running a regression model, simply remove the final activation layer (often a softmax or sigmoid layer). This exposes the logits as the final layer, which can take any float value (vs. probabilities).
16. Loss functions are an active area of research in deep learning. For classification, Binary Cross Entropy is an effective loss function. Check whether the loss function is designed to operate after activation (after the sigmoid/softmax layer) or before (on logits). For regression, mean-squared error loss is a reasonable starting point.
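The distinction in Note 16 between a loss applied after the activation and a loss applied to logits can be illustrated numerically. The sketch below uses only the standard library and hypothetical function names: one function takes a probability, the other takes the logit directly and uses a common numerically stable rearrangement of the same formula.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y, p, eps=1e-7):
    """Binary cross entropy for a label y in {0, 1} and a predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def bce_from_logit(y, z):
    """The same loss computed directly from the logit z in a stable form."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


# Both routes agree when the probability is the sigmoid of the logit.
loss_after_activation = bce_from_probability(1, sigmoid(2.0))
loss_on_logit = bce_from_logit(1, 2.0)
```

Feeding probabilities to a loss that expects logits (or the reverse) silently computes a different quantity, which is why the note suggests checking which form your framework's loss expects. The logit form also stays finite for extreme values where the probability form would need clipping.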
17. Optimizers are an active area of research in deep learning. Effective optimizers used in deep learning on sequence include Adam optimizer [15] and RMSprop [16]. Optimizer parameters for Adam are as given in the Methods; default parameters for RMSprop are: learning rate = 0.002, decay = 0.98, momentum = 0.0.
18. Many deep learning frameworks come with a training function that encapsulates the training process. This can simplify model training but also make the process opaque. A fuller description of deep learning requires much more explanation, but a brief high-level explanation is provided here. Within this training routine, the training data is fed to the model in batches. One full iteration through a training dataset is an epoch. The data is pushed forward through the model (interacting with model weights) to generate predictions, which are compared to the labels. The difference between the labels and predictions is then pushed backward through the model (backpropagation) based on the loss function and optimizer to adjust the model weights. The next batch is then pushed forward through the model, interacting with the updated weights, and so on. This continues until the training data is all used, at which point the routine evaluates the performance of the model using the validation data. If the performance of the model is still improving, the training routine will run another epoch and evaluate again with the validation data. This continues until the routine has hit the maximum number of epochs or is no longer improving in performance on the validation data (most commonly based on early stopping criteria).
19. Note that in evaluation, there is no loss function or optimizer as these are only used in training.
20. We believe it is very important to determine performance of the model in a genome-wide setting. This provides an accurate view on how the model would perform in the true setting of genome-wide prediction. Genome-wide evaluation also carries the additional benefit of being a more comparable metric across studies. As different studies will select their positives or negatives in different ways, metrics calculated on subsets of the genome can be biased or an inaccurate measure of true performance on the genome.
21. AUROC is known to be a very inflated metric in genome-wide evaluation, as the number of negatives vastly outweighs the number of positives. We recommend AUPRC as the more accurate and meaningful metric to determine performance. High performing accessibility models have AUPRCs of 0.6 or more [17].
22. Sequences of interest can be dynamically accessible regions, accessible regions around a locus of interest, accessible regions that are known to be bound by a DNA binding protein, or any other subset of interest. To generate contribution scores, you can start with the gradients propagated back onto the input. Given the labeling strategy above, it is important to only perform interpretation on the actual example sequence (e.g., if the original bin was 200 base pairs, then interpretation should only be performed on those 200 base pairs). 23. Various studies in the field have looked at convolutional filters to interpret what the model has learned. While convolutional filters have shown pattern weights that look like motifs, we do not recommend analysis on weights in the model, as the model learns a representation of the input DNA sequence that is distributed across the entire layer and can be hard to interpret in an isolated convolutional filter. As such, we recommend backpropagation-based methods that reaggregate contribution information back onto the DNA sequence itself, which gives a more comprehensive view of the sequence features in relation to each other. 24. Base pair contribution scores can also be used to dissect genetic variation by taking known single nucleotide polymorphisms (SNPs) and adjusting the SNP to its allelic form. Of note, variant analyses in deep learning can be unstable, as the model was trained to predict accessibility and not variant effects. Interpret results carefully and utilize multiple inference methods when analyzing variants. 25. For inference methods, it is often very helpful to have empirical reference distributions to determine if your contribution scores or downstream results are significant. We recommend using dinucleotide shuffled sequences as the empirical null sequences, which aims to maintain the distribution of dinucleotides in the sequence rather than just the sequence content. 
This is a more constrained but more biologically accurate null sequence.
26. We do not recommend using the Integrated Gradients method, as we have found that it tends to obscure cell-type-specific features (which are important for highlighting differences across cell types) and appears to highlight general features in DNA sequences.
27. When scanning for motifs on contribution scores with a motif database, it is important to also scan the original sequence. PWMs are designed to give log-odds scores on a one-hot encoded sequence without weights on the base pairs, and can give high scores to poor PWM matches on contribution scores. As such, it is important to check that the sequence is
also an appropriate match by PWM score on the original sequence.
28. One of the greatest strengths of deep learning in genomics is that it can build a high-performing mapping from DNA sequence to molecular phenotypes like accessibility without needing initial featurization. This suggests that this modeling framework is able to capture DNA syntax effectively, including parameters like variable spacing, motif density, and motif counts. As such, syntax analysis will be an important area of research with deep learning in genomics.

References
1. Boyle AP, Davis S, Shulha HP et al (2008) High-resolution mapping and characterization of open chromatin across the genome. Cell 132:311–322. https://doi.org/10.1016/j.cell.2007.12.014
2. Song L, Crawford GE (2010) DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harb Protoc 2010:pdb.prot5384. https://doi.org/10.1101/pdb.prot5384
3. Thurman RE, Rynes E, Humbert R et al (2012) The accessible chromatin landscape of the human genome. Nature 489:75–82. https://doi.org/10.1038/nature11232
4. Roadmap Epigenomics Consortium, Kundaje A, Meuleman W et al (2015) Integrative analysis of 111 reference human epigenomes. Nature 518:317–330. https://doi.org/10.1038/nature14248
5. Buenrostro JD, Giresi PG, Zaba LC et al (2013) Transposition of native chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10:1213–1218. https://doi.org/10.1038/nmeth.2688
6. Eraslan G, Avsec Ž, Gagneur J, Theis FJ (2019) Deep learning: new computational modelling techniques for genomics. Nat Rev Genet 20:389–403. https://doi.org/10.1038/s41576-019-0122-6
7. Alipanahi B, Delong A, Weirauch MT, Frey BJ (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 33:831–838. https://doi.org/10.1038/nbt.3300
8. Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning-based sequence model. Nat Methods 12:931–934. https://doi.org/10.1038/nmeth.3547
9. Kelley DR, Snoek J, Rinn J (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res gr.200535.115. https://doi.org/10.1101/gr.200535.115
10. Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. arXiv:1704.02685 [cs]
11. Lundberg SM, Lee S-I (2017) A unified approach to interpreting model predictions. In: Advances in neural information processing systems. Curran Associates, Inc.
12. Greenside P, Shimko T, Fordyce P, Kundaje A (2018) Discovering epistatic feature interactions from neural network models of regulatory DNA sequences. Bioinformatics 34:i629–i637. https://doi.org/10.1093/bioinformatics/bty575
13. Avsec Ž, Weilert M, Shrikumar A et al (2021) Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat Genet 53:354–366. https://doi.org/10.1038/s41588-021-00782-6
14. Shrikumar A, Tian K, Avsec Ž et al (2020) Technical note on transcription factor Motif discovery from importance scores (TF-MoDISco) version 0.5.6.5. arXiv:1811.00416 [cs, q-bio, stat]
15. Kingma DP, Ba J (2017) Adam: a method for stochastic optimization. arXiv:1412.6980 [cs]
16. Hinton G (2012) Neural networks for machine learning, Lecture 6
17. Kim DS, Risca V, Reynolds D et al (2020) The dynamic, combinatorial cis-regulatory lexicon of epidermal differentiation. bioRxiv 2020.10.16.342857. https://doi.org/10.1101/2020.10.16.342857
INDEX A Absolute occupancy ............................................. 121–151 Adaptase................................................................ 235, 242 Adapters .................................................. 4–6, 8, 9, 15, 16, 23, 27, 28, 32, 34, 36, 40, 54, 58, 64, 90–93, 104, 109, 111, 123, 125, 141, 146, 188, 191, 199, 200, 207, 210, 221, 294, 306, 307, 315 Adapter trimming ......................................................... 110 Agarose ....................................................... 24, 25, 28, 29, 31, 33, 36, 86, 89, 91, 128 Alignment ................................................... 32, 64, 66, 82, 110, 111, 115, 135, 212, 215, 218, 220, 261, 306–309, 311–313, 315–321 Allele ..................................................................... 276, 281 AluI .............................................127, 142, 143, 145, 151 Amplicon ................................65, 69, 106, 109–111, 180 AMPure XP ..........................................24, 42, 56, 64, 66, 74, 88, 93, 95, 96, 103, 128, 133, 134, 253, 297 Antigen ................................................................. 250, 251 ArchR ........................ 195, 212, 220, 225, 264, 276, 281 Area under the precision-recall curve (AUPRC) .................................................. 328, 331 Area under the receiver-operator curve (AUROC) ................................................. 328, 331 ASAP-seq .............................................................. 249–266 Assay for transposase-accessible chromatin using sequencing (ATAC-seq)....................3–17, 23, 24, 33–35, 40, 53, 63, 71–84, 101, 109, 122, 155, 188, 219, 222, 225, 250, 251, 270, 274, 286, 290, 294, 305–322, 329 ASTAR-seq .................................................................... 189 ATAC-RSB buffer ................................................ 
4, 5, 102 ATAC-see......................................................285–290, 294 ATP ...................................................................86, 89, 140
B Bacteria ........................................................ 63, 88, 90, 91 Bactopeptone .................................................................. 74 Binary alignment map (BAM)........ 32, 33, 82, 112, 114, 135–137, 212, 215, 218, 220, 307, 308, 311–314 BamHI ........................................................ 127, 131, 133, 137, 142–144, 146, 151
Barcode ............................................ 9, 59, 156, 157, 159, 161, 165, 168, 173, 174, 176–177, 181, 182, 188, 191, 195–198, 204, 211–213, 216, 217, 220–226, 235, 236, 241, 246, 254, 260–263, 280 Barcode collision ........................................................... 182 Basset ............................................................................. 330 BedGraph ....................................... 33, 37, 113, 309, 310 Bedtools............................. 306, 308, 309, 312, 317, 320 Beta-mercaptoethanol......................................... 125, 126, 128–130, 138, 140 BigWig ...................................................64, 113, 219, 310 BioAnalyzer ............................................6, 11, 42, 47, 54, 59, 73, 81, 104, 109, 128, 134, 191, 210, 221, 243, 244, 254, 271, 273, 274, 279, 297, 300 Biotin .....................................................45, 206, 290, 301 Biotin-14-dCTP ..................................................... 41, 296 Biotinylation 73 ........................................................78, 83 Bisulfite .............................. 108–110, 232–235, 239, 244 Blacklist................................ 82, 280, 309, 318, 321, 322 Blacklisted.................................................... 275, 309, 310 Blocking...................................................... 166, 173, 174, 176, 190, 197, 204, 205, 255 Bovine papillomavirus (BPV) ......................................... 25 Bovine serum albumin (BSA)............................... 60, 128, 164, 166, 176, 182, 201, 202, 252, 271, 290 Bowtie.......................................................... 195, 217, 218 Bowtie2............................32, 64, 67, 135, 306, 307, 315 Bradford................................................................ 286, 288 Bustools ....................................................... 254, 262, 263 bwa-meth....................................................................... 
104 B & W buffer.................................... 79, 89, 93, 194, 206 B/W/T buffer ...............................................89, 194, 206
C CaCl2 ................................................ 56, 57, 60, 192, 193 cDNA................156, 177, 196, 197, 199, 206–210, 224 C. elegans ......................................................................... 17 Cell lysis ........................................ 77, 105, 118, 139, 202 Cellranger ................. 254, 261, 263, 275, 276, 278–280 Cell wall ............................................................16, 17, 106 Centered-log-ratio (CLR) ............................................ 264 Chromatin .................................3, 21, 39, 53, 63, 71, 85, 101, 121, 155, 188, 232, 249, 270, 285, 293, 305
Georgi K. Marinov and William J. Greenleaf (eds.), Chromatin Accessibility: Methods and Protocols, Methods in Molecular Biology, vol. 2611, https://doi.org/10.1007/978-1-0716-2899-7, © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature 2023
Chromatin accessibility ................ v, 13, 71–96, 101–118, 155, 156, 187–227, 232, 233, 249–266, 269–281, 285–290, 293–301, 325–333 Chromatin immunoprecipitation sequencing (ChIP-Seq) ............ 24, 33, 34, 60, 101, 317, 320 Chromatin interactions.............................................85–96 Chromatin looping ......................................................... 86 Chromium ....................................................253, 270–272 Chromosome sizes ...................................... 310, 311, 314 Circularization................................................................. 94 cis-regulatory elements (cREs) ........................... 3, 4, 101, 108, 155, 188 CITE-seq .............................................189, 252, 264, 265 Click-IT .............................................................. 74, 78, 83 CO2 ............................................................................56, 75 Combinatorial indexing ......................156, 188, 189, 198 Confocal .....................................288, 289, 296, 298, 301 Convolutional neural networks (CNNs) ..................... 330 Coomassie.......................................................33, 286, 288 CoTECH ....................................................................... 189 Covaris .............................................................42, 45, 104, 108, 109, 128, 133, 297, 298 CpG ............................................................ 103, 105–107, 112, 113, 115, 116, 232, 233 Cross-correlation......................................... 306, 312, 320 Crosslinked ...........................................17, 44, 48, 60, 65, 105, 182, 195, 196, 206, 300 Cross-linking .............................................................92–93 Cross-validation.................................................... 327, 330 CTCF .................................................................... 
115–117 CuSO4 ............................................................................. 78 Cutadapt ......................................... 64, 66, 104, 306, 307 Cut sites ............................. 144, 146–151, 309, 317, 318 CutSmart .................... 93, 128, 131, 132, 140, 179, 180 Cytosolic buffer................... 41, 44, 48, 50, 51, 296, 298
D dATP ...................................................... 41, 294–296, 298 dCTP .....................................................41, 166, 177, 296 Deep learning ...............................................319, 325–333 DeepTools ........................................ 33, 59, 64, 306, 320 Demultiplex ................................ 254, 260–262, 275, 278 Desalting............................................................... 182, 190 Desulphonation.................................................... 234, 240 3D genome.................................................................... 285 dGTP ...................................................................... 41, 296 4′,6-diamidino-2-phenylindole (DAPI) .................54–59, 252, 287, 289, 294–296, 298 Digitonin ............4, 6, 16, 102, 192, 201, 202, 252, 256 Dimensionality reduction .................................... 225, 265 Dimethyl Formamide (DMF) ................................ 6, 164, 192, 194, 202, 287 Dinucleosomal...................................................... 221, 222
Diploid .......... 7, 116 Disuccinimidyl glutarate (DSG) .......... 60 Dithiothreitol (DTT) .......... 35, 41, 89, 91, 103, 138, 157, 193, 200–202, 253, 286–289, 296 DNA fragmentation .......... 45 DNA LoBind tube .......... 47, 56, 200, 300 DNA methylation .......... 39, 102, 106, 156, 231–246 DNA Polymerase I .......... 41, 296, 301 DNA replication .......... 72, 73 DNase .......... 3, 4, 7, 22, 39, 54, 128, 139, 155 DNase-seq .......... 4, 39, 40, 53, 54, 63, 71, 101, 155, 294 DNase hypersensitivity (DHS) .......... 39, 40 dNTP .......... 6, 13, 41, 88, 93, 165, 166, 192, 203, 208, 234, 235, 241, 296, 301 DOGMA-seq .......... 189 Doublets .......... 197, 198, 220–222 Dounce .......... 164, 174, 237, 238 Droplet .......... 155, 156, 188, 189, 222, 249, 251, 252 Drosophila .......... 72, 76, 105, 116 Drosophila melanogaster (D. melanogaster) .......... 72–74, 76, 82 dsciATAC-seq .......... 189 dSMF .......... 101–118 dTTP .......... 41, 296 Dynabeads .......... 88, 103, 192, 206
E EB buffer ..........................................................84, 88, 207 EDTA....................................................33, 42, 44, 48, 50, 55, 65, 75, 86, 89, 92, 93, 103, 125, 127, 128, 133, 139, 141, 165, 174, 175, 182, 192–195, 235, 287–289, 297, 298 EGTA...................................................................... 55, 127 EM-seq .........................................................103, 108–114 ENCODE............ 14, 54, 214, 305–307, 309, 310, 315, 317–319, 321 End repair ................... 28, 46, 49, 51, 58, 109, 141, 299 Enhancers ...............................................3, 14, 28, 34, 46, 53, 58, 71, 85, 86, 108, 129, 134, 232, 285, 299 Epigenome .................................................................... 156 Escherichia coli (E. coli) ........................... 65, 68, 286, 288 Ethidium bromide ....................................................31, 33 Ethylene glycol bis(succinimidyl succinate) (EGS)....... 60 5-ethynyl-2′-deoxyuridine (EdU)...............72–76, 82, 83 Euchromatin......................................................... 285, 293 Exonuclease ......................... 40, 123, 143, 146, 148, 235 Exo/rSAP .................................................... 235, 241, 245
F FASTA ............................................................67, 262, 280 FASTQ...........................32, 66, 212–214, 217, 307, 314 FASTQC..............................................32, 64, 66, 82, 261
Fetal bovine serum (FBS) .......... 56, 88, 91, 252, 271, 272 Fetal Calf Serum .......... 74 FFPE .......... 49–51 Ficoll gradient .......... 7 FITC .......... 294 Fixation .......... 48, 68, 91–92, 95, 200–202, 251–253, 255–257, 271–273, 279, 289, 296 Flowmi .......... 253, 271, 273 Fluorescein .......... 294, 296, 298 Fluorescence-activated cell sorting (FACS) .......... 16, 83, 234, 238, 239, 252, 271–273 Fluorophores .......... 285, 294 FokI .......... 158, 167, 173, 179 Footprints .......... 13, 101, 108, 109, 111, 114–117, 305 Formaldehyde .......... 40, 41, 43, 48, 51, 55, 57, 60, 64, 65, 68, 86, 91, 139, 155, 252, 255, 271, 272, 287, 289, 295–297, 300 Formaldehyde-assisted isolation of regulatory elements sequencing (FAIRE-Seq) .......... 40, 63, 71, 155, 294 Fraction of reads in peaks (FRiP) .......... 277, 314, 322 Fragment length .......... 12, 59, 81, 82, 108, 144, 146, 210, 221, 223, 245, 314, 320, 321 FS-seq .......... 21–37
G GC bias .......... 313, 321 Gelatin .......... 75 Gene expression .......... v, 63, 85, 156, 212, 270, 285, 293 GentleMACS .......... 191, 193, 200 Glycerol .......... 33, 75, 86, 91, 92, 116, 140, 157, 193, 195, 199, 200, 210, 287–289 Glycine .......... 41, 43, 48, 55, 57, 60, 65, 89, 92, 192, 193, 201, 252, 255, 271, 272, 289, 296, 297 Glycogen .......... 42, 44, 88, 93, 94 GpC .......... 103, 105–107, 110, 112–116, 232–234, 238, 244 Graphical processing unit (GPU) .......... 326
H H3 .......... 3, 21, 163, 172 H4 .......... 3, 21, 163, 172 H2A .......... 3, 21 Haploid .......... 116 H2B .......... 3, 21 HCT116 .......... 41, 48, 54–56, 59, 60, 296 HDAC .......... 294, 295 HDF5 .......... 330 HEK293 .......... 222 Hemocytometer .......... 41 HEPES .......... 75, 287–289 Heterochromatin .......... 59, 60, 285, 293, 294 Heteroplasmic .......... 276, 281 High Salt Buffer .......... 42, 45, 46, 49, 50, 297, 299 HindIII .......... 127, 142, 143, 151 HiSeq .......... 95, 167 Histone .......... 3, 21, 24, 34, 40, 63, 72, 121, 122, 156, 293 Histone modifications .......... 21, 34 H3K9me3 .......... 33, 34, 60 H3K27me3 .......... 55, 60 Hoechst .......... 238, 294, 296 Homoplasmic .......... 276, 281 HP1 .......... 55 HPLC .......... 182, 190, 235, 290 HT1080 .......... 286, 288, 289 HU .......... 64 Human papillomavirus (HPV) .......... 25 Hybridization .......... 40, 103, 105, 106, 108–109, 196, 197, 204–206
I i5 .......... 9, 13, 15, 65, 90, 134, 167, 180, 181, 260 i7 .......... 9, 13, 15, 65, 90, 167, 180, 181, 260 IDR .......... 309, 310, 314, 317–319, 322 IGEPAL .......... 4, 5, 16, 55, 75, 102, 164, 194, 287, 289, 290 Illumina .......... 9, 12, 13, 15, 23, 24, 27, 29, 30, 32, 42, 46, 47, 56, 59, 64, 74, 77, 82, 83, 90, 95, 102, 109, 110, 116, 123, 124, 128, 129, 133, 134, 142, 143, 146, 158, 167, 180, 181, 197, 211, 238, 244, 254, 259, 260, 271, 274, 278, 279, 297, 299, 300 Imidazole .......... 86, 88, 89 Immuno-staining .......... 287, 289 Inaccessible .......... 53–60, 108 Insulators .......... 14 Integrated Genome Browser (IGB) .......... 64, 67, 68 Isopropanol .......... 42, 44, 86, 94, 128, 130, 132 Isopropyl-β-d-1-thiogalactopyranoside (IPTG) .......... 90, 286, 288
J Jensen-Shannon distance (JSD) .......... 313, 321
K KAc ................................................................................ 164 Kallisto .................................................254, 261–263, 266 KAPA .................................... 13, 19, 166, 167, 181, 192, 208, 209, 211, 235, 242, 253, 254, 257, 258, 266 Keras .............................................................................. 326 Kipoi .............................................................................. 326 Kite.......................................................254, 261–264, 266 KOAc .................................................................... 128, 132 KOH .............................................................127, 287–289
L
Lambda ................................................................. 108, 112 Lamin B1 ....................................................................... 287 Ligation ................................................28, 40, 46, 49, 51, 58, 60, 94, 123, 125, 129, 133, 134, 141, 146, 156, 165, 176, 179, 190, 204–206, 294, 299 Lineage tracing.............................................................. 263 Linker.................................................... 22, 23, 86, 91–93, 96, 158, 173, 190, 198, 226 Low loss lysis (LLL)............................255, 256, 258, 266
M MACS2 ......................................... 59, 306, 309, 310, 317 MAPQ ..................................................... 59, 82, 313, 316 MarkDuplicates ............................................................. 308 Matplotlib............................................................. 104, 115 Maxima H............................................165, 175, 192, 203 M.CviPI ...............................................103, 105–107, 232 mdCTP ............................................................................ 41 mESCs .......................................................................75, 76 Metaprofile .................................................. 114, 116, 219 MethylDackel ...............................................105, 112–114 Methylome ...............................................v, 101, 189, 231 Methyltransferase ...................................4, 102, 105, 106, 116, 232–234, 238 MgAc2 ............................................................................ 164 mgatk ................................. 254, 263, 272, 276, 280, 281 MgCl2 ....................................................... 5, 6, 41, 56, 57, 60, 75, 89, 102, 103, 127, 157, 192, 194, 201, 202, 234, 252, 256, 271, 287, 289, 296 Micrococcal nuclease (MNase).................. 22, 40, 54, 55, 57, 72, 122, 155 Microscope ................................................. 41, 43, 44, 48, 56, 75, 76, 296, 298, 301 Milli-Q ............................................................41, 286, 295 MinElute PCR Purification Kit ..........6, 74, 78, 104, 193 MiSeq.........................................................................25, 32 Mitochondria..........................................7, 12, 16, 17, 40, 212, 218, 222, 223, 250, 251, 270, 271, 274, 277, 278, 287, 311, 319 Mitochondrial DNA (mtDNA)................. 255, 258–260, 263, 264, 266, 270, 271, 274–281 Mitochondrial fraction.................................................. 
311 Mitochondrial genome .......... 16, 40, 82, 217, 218, 259, 263, 269–281, 319 MluCI .......... 88, 93 MNase-seq .......... 40, 53, 54, 63, 71, 72, 101, 122, 155, 294 Monarch PCR & DNA Cleanup Kit .......... 25 Mononucleosomal .......... 14 Mouse embryonic fibroblast (MEF) .......... 222 M.SssI .......... 103, 105–107 mtscATAC .......... 255
Multimodal................................................. 210, 231, 233, 260, 264–265, 270, 281 Multiomics................................................v, 156, 189, 211
N NaClO4 ................................................................. 128, 132 NEBNext High-Fidelity ..................................6, 8, 9, 104 NEB NEXT Ultra II FS DNA library prep Kit .......24–27 NEB Ultra II DNA library prep Kit ................ 24, 27–32, 42, 56, 297 Nextera ..................... 15, 64, 65, 68, 167, 180, 181, 193 NextGEM ...................................................................... 271 Next generation sequencing (NGS) ...................... 22, 23, 45, 47, 51, 54, 57–59, 72, 73, 86, 238, 244, 294, 295, 299–300 NextSeq ...........................................................47, 82, 167, 211, 254, 259, 271, 274, 278 NicE-viewSeq ....................................................... 293–301 Nicking enzyme assisted sequencing (NicE-seq) ............................................ 39–51, 294 Nicks ..................................................40, 47–50, 294, 300 Ni-NTA............................................................... 86, 88, 91 NlaIII .........................................................................88, 93 NotI-HF ............................................................... 167, 180 NovaSeq........................................................ 95, 167, 211, 238, 244, 254, 259, 271, 274, 278 NP40..............................41, 55, 204, 205, 252, 256, 271 Nt.CviPII....................................41, 47, 49, 50, 295, 296 Nuclear periphery......................................................55, 59 Nuclei isolation .................. 7, 16, 17, 73, 106, 157–164, 174, 194, 196, 226, 233–234, 238–239, 278 Nuclei isolation buffer (NIB) .................... 164, 174, 194, 201, 203–206 Nucleoid-associated proteins (NAPs) ......................63, 64 Nucleosome............................... 
3, 17, 21–37, 39, 40, 53, 63, 71, 72, 101, 106, 117, 122, 127, 148, 155, 188, 231–246, 258, 274, 279, 281, 305, 319, 321 Nucleosome depleted regions (NDRs)...........39, 40, 232 Nucleosome-free region (NFR) ............. 21, 22, 258, 321 Nucleosome Occupancy and Methylome sequencing (NOMe-seq) ............................................. 101–118 Nucleosome positioning................................72, 305, 321 NUMT.................................................................. 275, 280
O OD600 ........................................... 65, 90, 129, 139, 288 Oligos ................................................... 15, 24, 30, 89–90, 129, 157, 173, 182, 190–191, 193–199, 226, 236, 251, 252, 286, 288, 290, 297 OmniATAC .......................................................... 202, 226 Open chromatin ....................................... 4, 7, 16, 23, 35, 39, 53, 63, 64, 72, 101, 116, 155–183, 188, 189, 285, 295, 296, 298, 300, 301, 305, 315, 333 ORE-seq ............................................................... 121–151
P P5 .......... 9, 59, 158, 173, 197, 236, 253, 257, 258 P7 .......... 9, 59, 190, 197, 207–210, 237, 253, 258 Paired-end .......... 13, 59, 82, 95, 110, 134, 142, 260, 306, 318, 321 Paired-seq .......... 155–183, 189 Paired-Tag .......... 189 Paraffin .......... 49 Paraformaldehyde .......... 290 PB buffer .......... 17 PBST .......... 45, 50 PCR .......... 5, 25, 42, 56, 64, 74, 86, 104, 134, 157, 190, 235, 253, 273, 288, 297, 312 PCR duplicates .......... 82, 112, 141, 280, 316 Peak calling .......... 14, 82, 219, 309, 317, 318 PEG .......... 89, 192, 194, 203, 208, 235 Penicillin/streptomycin .......... 74 Permeabilization .......... 251, 255–257, 279 PHAGE-ATAC .......... 189 Phenol .......... 40, 42, 44–45, 86, 93, 94, 118, 128, 132, 140, 297, 298 Phenol:Chloroform:Isoamyl Alcohol .......... 42, 297 PhiX .......... 11–13 Phosphate-buffered saline (PBS) .......... 7, 41–45, 48, 50, 56, 57, 60, 74–76, 82, 83, 86, 92, 104, 107, 165, 176, 177, 182, 191, 193, 200–202, 234, 238, 252, 255, 271, 272, 287, 289, 290, 296–298, 301 Phusion High-Fidelity DNA Polymerase .......... 9 Picard .......... 32, 220, 306, 308 Picolyl-Azide-PEG4-Biotin .......... 74, 78 Pipeline .......... 124, 211, 217, 220, 244, 254, 262, 263, 305–307, 315, 319, 326, 329 PitStop2 .......... 164, 165 PMSF .......... 91, 192, 206 Position-weight matrix (PWM) .......... 328, 332, 333 Primary antibodies .......... 287, 289 Primers .......... 6, 9, 15, 25, 27, 30, 42, 59, 91, 104, 109, 157, 167, 175, 180–182, 190–191, 207, 210, 235–237, 239–242, 253 Prokaryotic chromatin Openness Profiling sequencing (POP-seq) .......... 64–68 Promoters .......... 3, 53, 71, 85, 86, 108, 114, 232, 285 Protease inhibitor .......... 57, 75, 157, 164, 192, 193 Protect-seq .......... 53–60 Proteinase K .......... 42, 44, 48, 50, 58, 107, 128, 130, 131, 139, 141, 166, 177, 192, 206, 233, 297, 298 Pseudoreplicates .......... 314, 318, 319 pTXB1 .......... 286, 288 pUC19 .......... 108, 112 pyBigWig .......... 105, 195 Python .......... 104, 105, 195, 254, 262, 263, 266, 326, 330
Q Q5 .......... 46, 59, 129, 134, 299 qPCR .......... 7, 9–13, 26, 35, 56, 59, 65, 66, 104, 109, 167, 181, 183, 191, 207, 209, 211, 258, 266, 279 Quantification .......... 11–13, 59, 109, 167, 183, 210–212, 250, 251, 254, 258, 261, 278, 279, 295 Qubit .......... 6, 12, 42, 44, 47, 56, 58, 59, 64–66, 68, 73, 74, 81, 88, 89, 95, 104, 107, 109, 128, 131, 133, 167, 179, 183, 191, 193, 209–211, 244, 254, 258, 271, 273, 274, 297, 300
R R................................................33, 59, 89, 135, 137, 165 Read counts .......................................................... 312, 319 Read mapping ...................................................... 111–112 repli-ATAC ................................................................71–84 Rescue ratio ................................................................... 314 Resection ......................................................123, 146–148 Restriction enzyme....................... 4, 86, 93–94, 121–151 Reverse cross-linking.................................................92–93 Reverse transcriptase ..................156, 165, 175, 192, 203 Reverse transcription (RT) primer ..................... 157, 173, 175, 190, 195–197, 203 Rhodamine .................................................................... 294 RNase A ...................................................... 42, 44, 48, 50, 88, 93, 107, 128, 132, 296, 298 RNase OUT ................................................ 157, 164, 165 RNA-seq ...................................................... 187, 188, 224 Rolling circle amplification (RCA).................... 86, 94–96
S Saccharomyces cerevisiae.............................. 124–130, 135, 138, 140, 142, 144, 146, 147 S-adenosylmethionine (SAM) ............................ 103, 107, 215, 234, 238, 311, 312 SAMstats ................... 215, 218, 220, 306, 311, 312, 319 Samtools .......................33, 64, 105, 111, 112, 135, 195, 215, 217, 218, 220, 306–308, 312, 315–317, 319 SbfI-HF ................................................................ 167, 179 scATAC ..................................................... 4, 17, 188, 189, 212, 223, 249–251, 254, 255, 260–262, 264, 265, 270, 271, 273–275, 278, 280 scDNase ........................................................................... 40 Schizosaccharomyces pombe................. 124, 125, 127, 128, 130–132, 135, 138–140, 142, 144, 146, 147, 149 sciATAC-seq .................................................................. 189 sci-CAR-seq ................................................................... 189 SciPy............................................................................... 104 sci-RNA-seq................................................................... 188 scNMT-seq .................................................................... 189 scNOMe-seq.................................................189, 231–246
Scythe............................................................................... 32 Secondary antibodies ........................................... 287, 289 Seurat ..........................................195, 212, 217, 225, 264 SHARE-seq .......................................................... 187–227 Shearing ...................................................... 104, 106, 108, 123–125, 128, 133, 141, 144, 146–151 Signac........................................................... 264, 276, 281 Simian Virus 40 (SV40)............................. 21, 22, 24–27, 32, 33, 35–37 Single-cell ..................................................... 4, 40, 55, 63, 155, 187, 231, 249, 269, 326 Single-cell RNA-seq (scRNA-seq) ..................... 188, 189, 212, 224, 249, 250, 262, 265 Single-end...............................13, 82, 110, 315–318, 320 Single molecule footprinting (SMF).............101–103, 105, 107–110, 115, 116 Single nucleotide polymorphism (SNP) ...................... 332 SMC ................................................................................. 64 SnapATAC ..................................................................... 276 SNARE-seq ................................................................... 189 Sodium acetate ................................................... 88, 93, 94 Sodium chloride (NaCl) .................................5, 6, 41, 42, 55, 65, 75, 86, 88, 89, 91, 94, 102, 103, 157, 177, 192, 194, 195, 199, 201, 202, 234, 252, 256, 271, 286–289, 296–298 Sodium dodecyl sulfate (SDS)..........................33, 42, 44, 48, 57, 58, 75, 88, 91, 92, 94, 128, 140, 177, 182, 192, 194, 287, 289, 290, 298 Sodium hydroxide (NaOH) ................................ 103, 286 Somatic mutation.......................................................... 271 Sonication ................................................... 24, 40, 45–48, 58, 91, 123, 124, 288
Sonicator........................42, 89, 109, 128, 133, 288, 297 Sorbitol ................................................................. 126, 129 S phase ............................................................................. 72 Split-pool .............................................................. 188, 198 SPRI beads .........................................166, 167, 177–180, 183, 209, 235, 241, 243, 245, 253 STAR.............................................................195, 214–216 Streptavidin....................... 42, 45, 49–51, 73, 74, 79, 84, 88, 93, 94, 103, 109, 195, 196, 297, 299, 300 Sub-library ...........................................177, 182, 183, 206 Subnucleosomal .........................14, 17, 72, 82, 221, 222 Subsampling ........................................312, 313, 315, 320 Sucrose......................................16, 41, 75, 103, 157, 296 SUPERase IN.............................................. 157, 164, 165 SYBR Green ...................... 6, 9, 13, 24, 27, 30, 193, 208
T TAE buffer....................................................................... 31 tagAlign ...............................................308–314, 317, 318 Tagmentation ........................................... 65, 73, 75, 156, 164, 165, 174, 175, 179, 180, 182, 196, 199, 209, 210, 273, 287, 289
TapeStation.......................................................... 6, 11, 12, 17, 66, 67, 104, 109, 128, 167, 183, 191, 193, 209, 210, 254, 258 TD buffer ................................................. 4, 6, 8, 77, 193, 194, 210, 287, 289 TDE1 ............................................................................... 77 T4 DNA ligase ......... 165, 173, 176, 179, 192, 204, 205 T7 DNA ligase ................................................... 88, 89, 94 TEA-seq......................................................................... 189 TE buffer .............................................. 42, 44–47, 49, 51, 56, 74, 103, 128, 130–135, 198, 199, 206, 252, 271, 287, 288, 297–300 Template switching oligo (TSO) ........................ 190, 208 Tensorflow ..................................................................... 326 Terminal transferase ............................................. 166, 177 Texas Red ............................................................. 294–296 TF-MODisco........................................................ 326, 328 Thermocycler ............................10, 73, 77, 94, 129, 133, 134, 157, 165, 167, 173, 175, 177–181, 198, 199, 203, 208, 237, 239, 241, 242, 245, 246, 288 Thermomixer...................................................6, 8, 56, 58, 73, 82, 104, 107, 165, 166, 175–177, 191 THPTA ......................................................................74, 78 Tissue dissociation ...................................... 193, 200, 226 Tn5 ......................................4–6, 8, 9, 15–17, 40, 63, 65, 68, 73, 86, 89–92, 96, 155–157, 165, 173, 175, 179, 184, 188, 193, 196, 199, 203, 210, 227, 251, 270, 280, 285–290, 294, 305, 317 Topologically associating domains (TADs) ................... 85 TotalSeq............................. 252, 259, 260, 262, 265, 266 Trac-looping ..............................................................85–96 Transcription factor...................................... 3, 32, 37, 71, 72, 106, 116, 117, 232, 270
Transcription factor binding sites (TFBS) ...............32, 37 Transcription start sites (TSSs)................................ 14, 114–116, 281, 321 Transcriptome ............................ 155–184, 187–227, 249 Transposase........................................... 4–6, 8, 15, 16, 23, 40, 63, 71, 73, 77, 83, 86, 92, 96, 156, 193, 195, 196, 250, 270, 285–290, 305, 309, 315, 317 Transposome ........................40, 199–200, 210, 288–289 TrimGalore ............................................................. 82, 104 Trimmomatic........................................................ 104, 110 Tris-Ac (Tris-acetate) ................... 89, 128, 164, 192, 202 Tris-HCl ................................................... 5, 6, 41, 42, 55, 65, 74, 75, 80, 86, 88, 89, 91, 102, 103, 128, 134, 157, 193, 194, 201, 202, 234, 235, 252, 256, 271, 287, 289, 296, 297 Triton-X100 .........................................42, 45, 46, 49, 50, 57, 88, 89, 164, 165, 175, 287, 289, 297, 299 TruSeq .................................................167, 180, 181, 253 Trypan Blue ...........................................41, 191, 201, 273 TrypLE ......................................................................41, 43 Trypsin ................................................... 43, 56, 74–76, 82
TSS enrichment.......................................... 219, 220, 222, 223, 314, 321, 322 TSS score ....................................................................... 220 Tween-20 ................................................. 4, 6, 16, 26, 42, 74, 75, 88, 102, 192, 194, 201, 202, 252, 255, 256, 278, 287, 301
U UCSC Genome Browser ............................ 104, 195, 219 UCSC tools ................................................................... 306 Unique molecular identifier (UMI)................... 197, 211, 212, 214, 216, 224, 225, 260, 261 Universal nicking enzyme-assisted sequencing (UniNicE-seq) ......................................49, 51, 294 Uracil ........................................................... 105, 108, 127 USER enzyme ................................ 28, 46, 129, 134, 299
V Visualization ............................... 112, 293–301, 306, 319
X 10x Genomics ............................................ 253, 255–257, 265, 271–273, 278, 279
Y Yeast ................................................... 7, 17, 74, 105, 106, 116, 127, 130, 134, 138, 286 Yeast extract............................................................ 74, 127
Z Zymo Clean & Concentrate.................................. 15, 208 Zymolyase..................................... 17, 127, 129, 130, 139