Plant Epigenetics and Epigenomics: Methods and Protocols (Methods in Molecular Biology, 2093) 1071601784, 9781071601785

This second edition volume expands on the previous edition with a look at the latest techniques in plant epigenetics and


English Pages 283 [269] Year 2020


Table of contents :
Preface
Contents
Contributors
Part I: Detection and Analysis of Epigenetic Marks in Plant Genomes
Chapter 1: An Overview of Current Research in Plant Epigenetic and Epigenomic Phenomena
1 Introduction
2 The Evolving Definition of Epigenetics: What Exactly Do We Mean?
3 Epigenetic Phenomena in the Context of Plant Science
4 Techniques for Addressing Challenges in Plant Epigenetic and Epigenomic Research
References
Chapter 2: Approaches to Whole-Genome Methylome Analysis in Plants
1 Introduction
2 Limitations of Current Analysis Methods for Methylome Studies
3 Gene Body Methylation
4 Newly Emerging Methylation Analysis Programs
5 Strategies for Methylome Dataset Validation
6 Methylation vs. Gene Expression Data
7 Concluding Comments
References
Chapter 3: Understanding DNA Methylation Patterns in Wheat
1 Introduction
1.1 The Use of Epigenetics in Plant Breeding
1.2 Challenges of Epigenomic Analysis in Wheat
2 Methods
2.1 Sequence Capture for Epityping and Genotyping
2.2 Bioinformatic Analysis for Non-bisulfite-Treated Sequence Data
2.3 Bioinformatic Analysis after Bisulfite Sequencing
2.4 Bioinformatic Analyses Specific to an Allopolyploid
3 Implications
4 Notes
References
Chapter 4: MCSeEd (Methylation Context Sensitive Enzyme ddRAD): A New Method to Analyze DNA Methylation
1 Introduction
2 Materials
2.1 DNA Extraction
2.2 Primer and Adapter Preparation
2.3 Restriction/Ligation reaction
2.4 Purification with PEG8000
2.5 Purification with AMpure Beads
2.6 Size Selection and Gel Extraction
2.7 PCR Enrichment
3 Methods
3.1 DNA Extraction
3.2 Adapter Preparation
3.3 Double Digestion and Adapter Ligation
3.4 Purification with PEG8000
3.5 Purification with AMpure Beads (1.1x) to Remove Fragments Shorter Than 250 bp
3.6 Size Selection and Gel Purification
3.7 Purification with 0.8x AMpure Beads
3.8 Qubit Quantification
3.9 Enrichment PCR
3.10 Purification with AMpure Beads (1x)
3.11 Sequencing
3.12 Bioinformatic Analysis
4 Notes
References
Chapter 5: Plant-RRBS: DNA Methylome Profiling Adjusted to Plant Genomes, Utilizing Efficient Endonuclease Combinations, for M...
1 Introduction
2 Materials
2.1 Plant Material
2.2 Enzymes
2.3 Solutions
2.4 Buffers
2.5 Kits
2.6 Products
2.7 Equipment
2.8 Data Sets
2.9 Software
3 Methods
3.1 Genomic DNA Isolation and Digestion
3.1.1 DNA Isolation, Quality, and Quantity Evaluation
3.1.2 In Silico Digestion
3.1.3 Digestion with Restriction Enzymes
3.2 Library Preparation for Illumina Sequencing and Quality Control
3.3 Bioinformatics Pipeline
3.3.1 Reads Analysis
3.3.2 Methylation Detection
4 Notes
References
Chapter 6: Rice Histone Propionylation and Generation of Chemically Derivatized Synthetic H3 and H4 Peptides for Identificatio...
1 Introduction
2 Materials
2.1 Rice Histone Extraction
2.2 Propionylation of Synthetic and Biological Peptides
2.3 Fmoc-Based Solid-Phase Synthesis of Peptides
2.4 Workup of Synthesized Peptides
2.5 LC-MS/MS
3 Methods
3.1 Extraction of Rice Histones
3.2 Double Propionylation of Biological Peptides
3.3 Synthesis of Rice Histone H3 and H4 Synthetic Peptides
3.4 Double Propionylation of Recovered Synthetic Peptides
3.5 Characterization of Doubly Propionylated Peptides Using LC-MS/MS
3.6 Identification and Quantification of Extracted Rice Histone Peptides Using Synthetic Peptide Information
4 Notes
References
Part II: Epigenetics and Plant Chromatin Structure
Chapter 7: Preparing Chromatin and RNA from Rare Cell Types with Fluorescence-Activated Nuclear Sorting (FANS)
1 Introduction
2 Materials
2.1 Nuclei Isolation
2.2 Fluorescence-Activated Nuclei Sorting (FANS)
2.3 RNA Extraction and Quality Control
2.4 DNA Extraction and Quantification
3 Methods
3.1 Nuclei Isolation
3.2 Fluorescence-Activated Nuclei Sorting (FANS)
3.3 RNA Extraction and Quantification
3.4 DNA Extraction and Quantification
4 Notes
References
Chapter 8: Measurement of Arabidopsis thaliana Nuclear Size and Shape
1 Introduction
2 Materials
2.1 Isolation of Nuclei (Adapted from)
2.2 Image Acquisition and Analysis
3 Methods
3.1 Isolation of Nuclei
3.2 Spreading and Staining with DAPI
3.3 Image Acquisition
3.4 Semiautomated Image Analysis
4 Notes
References
Chapter 9: Study of Cell-Type-Specific Chromatin Organization: In Situ Hi-C Library Preparation for Low-Input Plant Materials
1 Introduction
2 Materials
2.1 Tissue Fixation
2.2 Nuclei Isolation and Flow Cytometry
2.3 Chromatin Digestion, Ligation, and DNA Purification
2.4 DNA Manipulation and Library Amplification
3 Methods
3.1 Tissue Fixation
3.2 Nuclei Isolation and Flow Cytometry
3.3 Chromatin Digestion, Ligation, and DNA Purification
3.4 DNA Manipulation and Library Amplification
4 Notes
References
Chapter 10: Chromatin Analysis of Metabolic Gene Clusters in Plants
1 Introduction
2 Materials
2.1 Growth Media
2.2 Measurement of mRNA Levels
2.2.1 RNA Isolation
2.2.2 Removal of Genomic DNA
2.2.3 cDNA Preparation
2.2.4 qPCR
2.3 Chromatin Immunoprecipitation
2.3.1 Stock Solutions and Reagents
2.3.2 Buffers
2.4 Metabolite Extraction and Analysis
2.4.1 Instruments and Equipment
2.4.2 Solvents and Chemicals
3 Methods
3.1 Measurement of mRNA Levels
3.1.1 RNA Isolation
3.1.2 Removal of Genomic DNA
3.1.3 cDNA Preparation
3.1.4 Quantitative PCR
3.2 Chromatin Immunoprecipitation
3.2.1 Chromatin Extraction
3.2.2 Immunoprecipitation and DNA Recovery
3.2.3 Quantification
3.3 Metabolite Analysis
3.3.1 Sample Preparation for GC-MS Analysis
3.3.2 GC-MS Analysis Method (See Note 9)
3.3.3 Sample Preparation for LC-MS Analysis
3.3.4 Method for LC-MS Analysis (See Note 11)
3.3.5 Metabolomics Analysis
3.3.6 Quantitative Analysis (See Note 12)
4 Notes
References
Chapter 11: Characterization of Plant 3D Chromatin Architecture, In Situ Hi-C Library Preparation, and Data Analysis
1 Introduction
2 Materials
2.1 Formaldehyde Fixation
2.2 Nuclei Isolation
2.3 In Situ Digestion, Biotin End-Repair, and Ligation
2.4 On-Bead Illumina TruSeq Library Preparation
2.5 Hi-C Data Analysis
3 Methods
3.1 Formaldehyde Fixation
3.2 Nuclei Preparation
3.3 In Situ Restriction Enzyme Digestion
3.4 On-Bead Illumina TruSeq Library Preparation
3.5 Data Analysis
3.5.1 Map Hi-C Data to a Reference Genome
3.5.2 Chromosome-Wide A/B Compartment Calling
3.5.3 Local A/B Compartment Calling
3.5.4 Domain and Loop Calling
References
Part III: Applications and Novel Insights into Epigenetics and Epigenomics in Plants
Chapter 12: The Gene Balance Hypothesis: Epigenetics and Dosage Effects in Plants
1 Gene Dosage Effects
1.1 Introduction
1.2 Dosage Effects of Gene Expression
1.3 Dosage Involvement in Quantitative Traits
1.4 Evolutionary Genomics
1.5 Global Dosage Effects in Plants
2 Generating Ratio Distributions and Scatter Plots to Analyze Dosage Effects
2.1 Ratio Distribution Plots
2.2 Scatter Plots
3 Conclusions
Software Implementation
References
Chapter 13: Identification and Comparison of Imprinted Genes Across Plant Species
1 Introduction
1.1 Assessing and Comparing Imprinted Expression Across Species: A Brief Overview
1.2 Overview of the Imprinting Analysis Pipeline
1.3 Obtaining Data
1.4 Imprinting Criteria
1.5 Comparing Imprinting Between Species
2 Challenges and Considerations
2.1 Identifying SNPs
2.2 Distinguishing Imprinting and Strain Bias
2.3 Minimizing Mapping Bias
2.4 Contamination from Other Tissues
2.5 Leveraging Data from Replicates
3 Methods
3.1 Installation Instructions
3.2 Required Input Files
3.3 Genome Preparation
3.4 Initial Read Quality Filtering and Alignment
3.5 Identifying Imprinted Genes from Alignment Data
3.6 Comparing Imprinting Between Species
4 Illustrative Example Using Real Data
4.1 Identifying Imprinted Genes in an Arabidopsis thaliana Dataset Starting from Raw Sequencing Reads
4.2 Identifying Imprinted Genes in a Zea mays Dataset Starting from Count Data
4.3 Comparing Imprinting Between A. thaliana and Z. mays
5 Conclusion
References
Chapter 14: Epigenetic Approaches in Non-Model Plants
1 Introduction
2 Materials
2.1 Equipment
2.2 Reagents
3 Methods
3.1 Genomic DNA Isolation, Quality Check, and Quantification
3.2 Digestion of Genomic DNA
3.3 Ligation of Barcoded Adapters
3.4 Per Species Samples Pooling, Pool Clean-Up, and Concentration
3.5 Fragment Size Selection of Digested-Ligated DNA
3.6 Nick Translation
3.6.1 Optional Test GBS PCR
3.7 Bisulfite Conversion
3.8 Library Amplification (Final epiGBS PCR)
3.9 Quantification and Assessment of the Quality of the epiGBS Library
4 Notes
References
Chapter 15: Techniques for Small Non-Coding RNA Analysis in Seeds of Forest Tree Species
1 Introduction
2 Materials
2.1 Plant Material
2.2 RNA Extraction
2.3 Quantitative Real-Time RT-PCR
3 Methods
3.1 RNA Extraction
3.2 RNA Concentration Measurement and Quality Check
3.2.1 RNA Quantity Measurements
3.2.2 Checking RNA Quality by Electrophoresis
3.3 sRNA Library Construction and Sequencing
3.4 Small RNA Data Analysis
3.5 Validating sRNA-mRNA Pairs via Quantitative Real-Time RT-PCR
4 Notes
References
Chapter 16: Epigenetic Barcodes for Detection of Adulterated Plants and Plant-Derived Products
1 Introduction
2 Materials
2.1 DNA Extraction
2.2 Agarose Gel Electrophoresis
2.3 MS-AFLP Analysis of the Saffron Flower Parts
2.4 Fluorescent Analysis of MS-AFLP Fragments
3 Methods
3.1 Sample Set
3.2 DNA Extraction and Agarose Gel Electrophoresis
3.3 DNA Amplificability Via PCR Analysis
3.4 MS-AFLP Analysis of the Saffron Flower Parts
4 Notes
References
Chapter 17: Plant Epigenetic Stress Memory Induced by Drought: A Physiological and Molecular Perspective
1 Introduction
2 Mechanisms of Drought Stress Memory
2.1 Plant Physiological Perspective
2.2 Cross-Tolerance Between Different Stresses
2.3 Reprogramming of Transcriptional Memory Induced by Drought Stress
2.4 Histone Modifications Associated with Drought Stress Memory Genes
3 Possible Interactions Between Drought Stress Signals and Chromatin-Mediated Stress Memory
4 Conclusions and Future Perspectives
References
Chapter 18: A Critical Guide for Studies on Epigenetic Inheritance in Plants
1 Introduction
2 Materials
2.1 Biological Material
2.2 Computational Tools
3 Methods
3.1 Study Design
3.1.1 Stress Treatment
3.1.2 Control Treatments
3.1.3 Replication and Sampling
3.1.4 Treatments with Chemical Compounds Acting on Epigenetic Mechanisms
3.1.5 Resolution of Epigenetic Changes
3.2 Data Analysis
3.2.1 Differential DNA Methylation
3.2.2 Differential Analysis of Histone Modifications
3.3 Data Interpretation
References
Index

Methods in Molecular Biology 2093

Charles Spillane Peter McKeown Editors

Plant Epigenetics and Epigenomics Methods and Protocols Second Edition

METHODS IN MOLECULAR BIOLOGY

Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK

For further volumes: http://www.springer.com/series/7651

For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure that is supported with a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.

Plant Epigenetics and Epigenomics Methods and Protocols Second Edition

Edited by

Charles Spillane and Peter McKeown Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway (NUI Galway), Galway, Ireland

Editors Charles Spillane Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway (NUI Galway) Galway, Ireland

Peter McKeown Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway (NUI Galway) Galway, Ireland

ISSN 1064-3745
ISSN 1940-6029 (electronic)
Methods in Molecular Biology
ISBN 978-1-0716-0178-5
ISBN 978-1-0716-0179-2 (eBook)
https://doi.org/10.1007/978-1-0716-0179-2

© Springer Science+Business Media, LLC, part of Springer Nature 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.

Preface

This second edition volume of Plant Epigenetics and Epigenomics: Methods and Protocols continues the aims of the first: to gather comprehensive descriptions of contemporary techniques in plant epigenetic and epigenomic research. This topic continues to be timely: the number of sequenced plant genomes continues to rise, and the plant science community has greater access than ever to “-omics” data for individual organs, tissues, cell types, varieties, and natural accessions, under a range of environmental conditions. The mapping of epigenetic marks on chromatin is therefore necessary to complement this huge body of data and ensure the emergence of comprehensive systems-based descriptions of plant molecular responses. However, all “big data” approaches to biology risk being overwhelmed by information, which makes careful experimental design and the use of appropriate analytical approaches particularly critical for drawing meaningful conclusions. The protocols and reviews presented here will (we hope) help researchers achieve these aims.

The increased representation of techniques from different organisms, including crops and non-model systems, will also permit a greater focus on applied research and the development of tools for crop breeders, as well as making comparative and evolutionary analyses possible. At the same time, the strength of Arabidopsis as a model system continues to be a powerful tool for advancing basic understanding and pioneering new techniques.

We finish this preface by thanking all of the researchers who have contributed their time and expertise to providing the content of this volume of Methods in Molecular Biology, and by reiterating our hope from the first volume that the methods and techniques they have described will advance the study of plant epigenetics and epigenomics and pave the way for integrating epigenetic mechanisms into models of plant function during development and evolution.

Galway, Ireland

Charles Spillane
Peter McKeown

Contents

Preface
Contributors

PART I DETECTION AND ANALYSIS OF EPIGENETIC MARKS IN PLANT GENOMES

1 An Overview of Current Research in Plant Epigenetic and Epigenomic Phenomena
Peter McKeown and Charles Spillane
2 Approaches to Whole-Genome Methylome Analysis in Plants
Xiaodong Yang and Sally A. Mackenzie
3 Understanding DNA Methylation Patterns in Wheat
Laura-Jayne Gardiner
4 MCSeEd (Methylation Context Sensitive Enzyme ddRAD): A New Method to Analyze DNA Methylation
Marco Di Marsico, Elisa Cerruti, Cinzia Comino, Andrea Porceddu, Alberto Acquadro, Stefano Capomaccio, Gianpiero Marconi, and Emidio Albertini
5 Plant-RRBS: DNA Methylome Profiling Adjusted to Plant Genomes, Utilizing Efficient Endonuclease Combinations, for Multi-Sample Studies
Martin Schmidt, Magdalena Woloszynska, Michiel Van Bel, Frederik Coppens, and Mieke Van Lijsebettens
6 Rice Histone Propionylation and Generation of Chemically Derivatized Synthetic H3 and H4 Peptides for Identification of Acetylation Sites and Quantification
Nino A. Espinas, Alejandro Villar-Briones, Michael C. Roy, and Hidetoshi Saze

PART II EPIGENETICS AND PLANT CHROMATIN STRUCTURE

7 Preparing Chromatin and RNA from Rare Cell Types with Fluorescence-Activated Nuclear Sorting (FANS)
Ruben Gutzat and Ortrun Mittelsten Scheid
8 Measurement of Arabidopsis thaliana Nuclear Size and Shape
Kalyanikrishna, Pawel Mikulski, and Daniel Schubert
9 Study of Cell-Type-Specific Chromatin Organization: In Situ Hi-C Library Preparation for Low-Input Plant Materials
Nan Wang and Chang Liu
10 Chromatin Analysis of Metabolic Gene Clusters in Plants
Ancheng C. Huang and Hans-Wilhelm Nützmann
11 Characterization of Plant 3D Chromatin Architecture, In Situ Hi-C Library Preparation, and Data Analysis
Pengfei Dong and Silin Zhong

PART III APPLICATIONS AND NOVEL INSIGHTS INTO EPIGENETICS AND EPIGENOMICS IN PLANTS

12 The Gene Balance Hypothesis: Epigenetics and Dosage Effects in Plants
Xiaowen Shi, Chen Chen, Hua Yang, Jie Hou, Tieming Ji, Jianlin Cheng, Reiner A. Veitia, and James A. Birchler
13 Identification and Comparison of Imprinted Genes Across Plant Species
Colette L. Picard and Mary Gehring
14 Epigenetic Approaches in Non-Model Plants
M. Teresa Boquete, Niels C. A. M. Wagemaker, Philippine Vergeer, Jeannie Mounger, and Christina L. Richards
15 Techniques for Small Non-Coding RNA Analysis in Seeds of Forest Tree Species
Yang Liu and Yousry A. El-Kassaby
16 Epigenetic Barcodes for Detection of Adulterated Plants and Plant-Derived Products
Matteo Busconi, Giovanna Soffritti, Marcelino De Los Mozos Pascual, and José Antonio Fernandez
17 Plant Epigenetic Stress Memory Induced by Drought: A Physiological and Molecular Perspective
James Godwin and Sara Farrona
18 A Critical Guide for Studies on Epigenetic Inheritance in Plants
Daniela Ramos Cruz and Claude Becker

Index

Contributors

ALBERTO ACQUADRO • Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
EMIDIO ALBERTINI • Department of Agricultural, Food and Environmental Sciences, University of Perugia, Perugia, Italy
CLAUDE BECKER • Gregor Mendel Institute of Molecular Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
JAMES A. BIRCHLER • Division of Biological Sciences, University of Missouri, Columbia, MO, USA
M. TERESA BOQUETE • Estación Biológica de Doñana, Consejo Superior de Investigaciones Científicas (CSIC), Sevilla, Spain; Department of Integrative Biology, University of South Florida, Tampa, FL, USA
MATTEO BUSCONI • Faculty of Agriculture, Food and Environmental Sciences, Research Center BioDNA, Università Cattolica del Sacro Cuore, Piacenza, Italy
STEFANO CAPOMACCIO • Department of Veterinary Medicine, University of Perugia, Perugia, Italy
ELISA CERRUTI • Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
CHEN CHEN • Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
JIANLIN CHENG • Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
CINZIA COMINO • Department of Agricultural, Forest and Food Sciences, Plant Genetics and Breeding, University of Torino, Grugliasco, Italy
FREDERIK COPPENS • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium
MARCELINO DE LOS MOZOS PASCUAL • Centro de Investigación Agroforestal de Albaladejito, Instituto Regional de Investigación y Desarrollo Agroalimentario y Forestal, Cuenca, Spain
MARCO DI MARSICO • Department of Agricultural, Food and Environmental Sciences, University of Perugia, Perugia, Italy
PENGFEI DONG • State Key Laboratory of Agrobiotechnology, School of Life Sciences, The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
YOUSRY A. EL-KASSABY • Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
NINO A. ESPINAS • Plant Epigenetics Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa, Japan; Plant Immunity Research Group, RIKEN Center for Sustainable Resource Science (CSRS), Yokohama City, Kanagawa, Japan
SARA FARRONA • Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway, Galway, Ireland
JOSÉ ANTONIO FERNANDEZ • IDR-Biotechnology and Natural Resources, Universidad de Castilla-La Mancha, Albacete, Spain
LAURA-JAYNE GARDINER • Earlham Institute, Norwich, UK
MARY GEHRING • Computational and Systems Biology Graduate Program, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Whitehead Institute for Biomedical Research, Cambridge, MA, USA
RUBEN GUTZAT • Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
JIE HOU • Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
ANCHENG C. HUANG • Institute of Plant and Food Science, Department of Biology, Southern University of Science and Technology, Shenzhen, China
GODWIN JAMES • Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway, Galway, Ireland
TIEMING JI • Department of Statistics, University of Missouri, Columbia, MO, USA
KALYANIKRISHNA • Institute of Biology, Freie Universität Berlin, Berlin, Germany
CHANG LIU • Center for Plant Molecular Biology (ZMBP), University of Tübingen, Tübingen, Germany
YANG LIU • Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC, Canada
SALLY A. MACKENZIE • Departments of Biology and Plant Science, The Pennsylvania State University, University Park, PA, USA
GIANPIERO MARCONI • Department of Agricultural, Food and Environmental Sciences, University of Perugia, Perugia, Italy
PETER MCKEOWN • Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway (NUI Galway), Galway, Ireland
PAWEL MIKULSKI • Institute of Biology, Freie Universität Berlin, Berlin, Germany; John Innes Centre, Norwich, UK
ORTRUN MITTELSTEN SCHEID • Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
JEANNIE MOUNGER • Department of Integrative Biology, University of South Florida, Tampa, FL, USA
HANS-WILHELM NÜTZMANN • The Milner Centre for Evolution, University of Bath, Bath, UK
COLETTE L. PICARD • Computational and Systems Biology Graduate Program, Massachusetts Institute of Technology, Cambridge, MA, USA; Whitehead Institute for Biomedical Research, Cambridge, MA, USA
ANDREA PORCEDDU • Department of Agriculture, University of Sassari, Sassari, Italy
DANIELA RAMOS CRUZ • Gregor Mendel Institute of Molecular Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
CHRISTINA L. RICHARDS • Department of Integrative Biology, University of South Florida, Tampa, FL, USA
MICHAEL C. ROY • Instrumental Analysis Section, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa, Japan
HIDETOSHI SAZE • Plant Epigenetics Unit, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa, Japan
MARTIN SCHMIDT • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium
DANIEL SCHUBERT • Institute of Biology, Freie Universität Berlin, Berlin, Germany
XIAOWEN SHI • Division of Biological Sciences, University of Missouri, Columbia, MO, USA
GIOVANNA SOFFRITTI • Faculty of Agriculture, Food and Environmental Sciences, Research Center BioDNA, Università Cattolica del Sacro Cuore, Piacenza, Italy
CHARLES SPILLANE • Plant and AgriBiosciences Research Centre, Ryan Institute, National University of Ireland Galway (NUI Galway), Galway, Ireland
MICHIEL VAN BEL • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium
MIEKE VAN LIJSEBETTENS • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium
REINER A. VEITIA • Institut Jacques Monod, Paris, France; Université Paris-Diderot, Paris, France
PHILIPPINE VERGEER • Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands; Plant Ecology and Nature Conservation Group, Wageningen University and Research, Wageningen, The Netherlands
ALEJANDRO VILLAR-BRIONES • Instrumental Analysis Section, Okinawa Institute of Science and Technology Graduate University (OIST), Onna-son, Okinawa, Japan
NIELS C. A. M. WAGEMAKER • Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
NAN WANG • Center for Plant Molecular Biology (ZMBP), University of Tübingen, Tübingen, Germany
MAGDALENA WOLOSZYNSKA • Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium; Center for Plant Systems Biology, VIB, Ghent, Belgium; Department of Genetics, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
HUA YANG • Division of Biological Sciences, University of Missouri, Columbia, MO, USA
XIAODONG YANG • Departments of Biology and Plant Science, The Pennsylvania State University, University Park, PA, USA
SILIN ZHONG • State Key Laboratory of Agrobiotechnology, School of Life Sciences, The Chinese University of Hong Kong, Sha Tin, Hong Kong, China

Part I Detection and Analysis of Epigenetic Marks in Plant Genomes

Chapter 1

An Overview of Current Research in Plant Epigenetic and Epigenomic Phenomena

Peter McKeown and Charles Spillane

Abstract

Biological phenomena defined as having an “epigenetic” component (according to various definitions) have been extensively studied in plant systems and have illuminated many mechanisms by which gene expression is regulated and patterns of expression are inherited through cell divisions. This second volume of Plant Epigenetics and Epigenomics: Methods in Molecular Biology builds on the work of its predecessor to describe cutting-edge tools for plant epigenetic and epigenomic research, and embraces crop and forestry species as well as natural populations and further insights from model species. In this chapter, the historical background to plant epigenetic and epigenomic research is summarized, and key considerations for the interpretation of current data are outlined.

Key words Epigenetic, Epigenomic, Parent-of-origin, DNA methylation, Historical perspective

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_1, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

This second volume of Methods in Molecular Biology: Plant Epigenetics and Epigenomics builds on the work of the first volume, published in 2014 [1], to present a further body of cutting-edge accounts of the laboratory and bioinformatic techniques required for the investigation of epigenetic phenomena in plants. These include genome-wide analyses of chromatin modification, dosage effects, and the roles of small RNA molecules, histones, and gene clusters in organismal function. Understanding such mechanisms is necessary for assessing the potential impacts of epigenetics on plant growth, development, and reproduction, and ultimately for the response of these factors to evolutionary pressures and crop breeding programs. A particular focus of this new edition is a greater emphasis on applications in crop and forestry species and other non-model organisms.

To place these techniques in context, we again begin with a short summary of what we consider epigenetics to be, the background and context of modern epigenetic research, and what developments the future might hold. We previously considered the history of the term “epigenetics” and how it has been understood, especially in the context of plant science research [2]. In particular, we concentrate on the importance of studying the nature of transcriptional regulation in plant cells for elucidating the mechanisms behind what are broadly known as epigenetic effects. We note the increasing range of applications of these insights, and our emerging understanding of how these regulatory systems can be inherited through cellular divisions. Finally, we consider whether, and to what extent, there is convincing evidence for inheritance of epigenetic cellular states across plant generations, and if so how its effects on plant fitness might be assessed. We conclude by assessing the opportunities and challenges for epigenetic research in model and non-model plants to further illuminate the regulation of genome function by epigenetic effects, and the roles that the techniques that follow may play in these advances.

2 The Evolving Definition of Epigenetics: What Exactly Do We Mean?

In the past 40 years, the application of molecular biological techniques has transformed the field of plant biology, and the applied, ecological, and environmental sciences allied to it. This transformation has included a full deployment of genomic and other “-omics” technologies, strengthened by more recent attempts to integrate these into systems biology models. One important area upon which these molecular advances have focussed is that of gene regulation. Many of the molecular effects involved in the regulation of gene expression are commonly referred to as “epigenetic,” and “plant epigenetics” is itself considered a major topic of scientific interest. For example, PubMed lists 1063 papers found by the search string “plant epigenetics” as having been published since the first edition of this volume in 2014.

“Epigenetics” is commonly understood to refer to the study of the presence of “marks” within chromosomes which alter the expression of genes and other chromosomal properties, usually in a stable manner. Similarly, “epigenomics” refers to the high-throughput identification of these marks at, for example, a genome-wide or even population-wide level. However, it is important to understand that the term “epigenetics” has come a long way from its original formulation, which may obscure the fact that so-called epigenetic effects must be seen within a broader context of gene regulation and cellular and organismal development. In the worst case, confusion between different operational definitions of epigenetics can lead to implicit or explicit misunderstanding over its biological significance. We therefore agree with other reviewers that researchers making use of the methods and protocols presented in this volume should be careful to consider what they mean when using terms such as “epigenetic” and “epigenomic,” and what assumptions are inherent within this usage.

The contorted history of the term epigenetics has been put into context in recent reviews and opinion pieces by Greally and Lappalainen [3, 4], who discuss the way in which the term has changed, and why. “Epigenetics” was originally coined to refer to a model of canalized organismal development by C. H. Waddington [5, 6]. It is often argued that the term arose from his attempts to formulate a model of developmental biology that avoided the reductionism which he considered inherent in the work of the quantitative geneticists that led to the Modern Synthesis [7, 8]. In 1939, Waddington defined the epigenotype as “the set of organisers and organising relations to which a certain piece of tissue will be subject during development” [9]. In a modern sense, the first field which could have been described as epigenetic therefore involved the study of the mechanisms by which the genotype brings about the phenotype [7, 10], although the term does not seem to have been widely known.

The modern understanding of epigenetics arose instead from advances in molecular biology during the 1980s and came to be concerned with the control of gene expression by modification of the chromosome at the DNA and/or histone level and subsequent chromatin organization [7]. As Lappalainen and Greally have indicated, this has the curious effect of encompassing all transcriptional regulation except transcription factors. As we previously noted [2], such gene-modifying mechanisms can represent an element of Waddington’s study of developmental pathways, but only in part and with a very different focus.
Indeed, to emphasize reversible covalent modifications without direct consideration of the transcription factors which are critical for differential gene expression during development means that the control of cell fate can never be accounted for fully by what is commonly understood as “epigenetics.”

There is a further important consequence of the conflation of Waddingtonian cellular differentiation with molecular phenomena such as the occurrence of DNA or histone methylation. The current definition of epigenetics as agreed by most molecular biologists roughly agrees with that proposed by Arthur Riggs and colleagues, and cited by Adrian Bird among others. In this definition, epigenetics is “the study of mitotically and/or meiotically heritable changes in gene function that cannot be explained by changes in DNA sequence” [11, 12]. This has been characterized as the “epi + genetics” definition [3, 4], back-derived to imply the study of that which is “beyond” that encoded in the genome and which is commonly assumed to augment its information content. These changes are most commonly associated with changes to DNA and histones: DNA methylation in different contexts, covalent modifications of histones or histone variants, and the organization of chromatin at a locus. These patterns are also associated with the action of proteins and protein complexes which bind to different features of chromatin, and/or reversibly alter them, leading to changes in gene expression. In some cases these are now also known to involve the regulatory activity of small RNA molecules of different classes. We previously highlighted that the perceived importance of the possible information content of these reversible chromatin-related changes means that many working definitions of epigenetics extend beyond the study of chromatin marks themselves into considerations of phenotype, at which level such putative information would be expressed.

In fact, epigenetics is also commonly used in a third way, again related to but distinct from the others, as a term to denote “soft inheritance” in the sense used by Ernst Mayr [13]. In this definition it is used to refer to heritable biological information which is not encoded in the genome itself, conjuring images of stable information content being transmitted through lineages and generations without any change to the sequence of genes or other parts of the genome. This usage may be due to the conflation of molecular modifications to chromosomes with the cellular memory inherent in Waddington’s original conception of epigenesis [3, 4, 7] and further defined by Nanney [7, 14]. This is an intriguing concept, and indeed DNA methylation states at least are typically stably inherited in a Mendelian manner [15]. However, considerable care needs to be taken when making a claim that this leads to transmission of transcriptional states or other phenotypes. In particular, a clear distinction needs to be made between persistence of chromatin-mediated transcriptional states through mitotic divisions within a cell or tissue lineage, as opposed to inheritance through generations, which normally requires passage through meiosis and gametogenesis.
Only the latter exposes the “epigenetic” change to natural selection [16–19]. (Lineages which commonly reproduce via vegetative or apomictic means, as many plants do, may possibly be an exception.) It is generally agreed that the suggestions sometimes made that epigenetics represents a major challenge to the “modern synthesis,” or even to the foundations of Darwinian natural selection, are either not supported by the available data or indeed contradicted by it.

Nevertheless, the previous edition of this volume [1] placed a particular focus on cases where epigenetic modifications have been reliably linked with roles in inheritance that is not solely genetic in basis, often identified through the non-Mendelian patterns of inheritance which emerge. For various reasons, plant systems have played key roles in determining the mechanisms of these, which we briefly revisit below. Commentators such as Richards (2011) have discussed the particular difficulties in distinguishing truly “epigenetic” inheritance from other effects, for example phenotypic plasticity [20]. That review also drew attention to studies in multiple species which seemed to represent possible cases where DNA methylation might be involved with natural variation for plant phenotypes, and which could therefore reward further study [20]; some of these cases have been updated subsequently [16]. Importantly, in some cases these changes have also been linked to selective pressures [21].

Forms of “epigenetics” of this kind also illustrate an important observation made by Bird that “processes less irrevocable than mutation fall under the umbrella term “epigenetic” mechanisms” [22]. Hence, any change to cellular function which is heritable but which does not involve permanent changes to the DNA sequence can be classed as “epigenetic,” and not just the sorts of covalent modifications to DNA and histones that are often thought of: a sort of epigenetics sensu lato. In other organisms, this definition should also logically embrace effects such as the inheritance of cellular organization (as occurs in Paramecium), maternal effects, cytoplasmic inheritance, prions, and the like. It remains critical that the transgenerational heritability of any change of this kind be firmly demonstrated, as a key feature of any strict definition of epigenetics [23]; this requirement has been comprehensively reviewed by Quadrana and Colot [17].

To summarize, a molecular epigenetic system can be defined by three components: (1) a signal from the environment that leads to (2) a responding signal in the cell that specifies an affected chromosomal location and (3) a sustaining signal that perpetuates the change [24]. We reiterate our previous clarification that interactions between genomes during reproduction, hybridization or symbiosis, or interaction with pathogens or viruses should also be considered as part of the biotic environment, and that particular care is needed when claiming that this signal can be sustained through meiosis rather than mitosis, or across generations of an organism’s life cycle.

3 Epigenetic Phenomena in the Context of Plant Science

The discoveries of DNA cytosine methylation, histone modifications, and the many roles for chromatin-modifying complexes were for the most part made in prokaryotes, especially E. coli, in unicellular eukaryotes such as the yeast S. cerevisiae, and in animal systems, including human biomedical studies [25–29]. Plant epigenetic systems do however possess unique features [30–32] and provide powerful tools such as viable epigenetic mutants [33]. These effects have been illustrated by the study of phenomena observed in plant biology which fall into the class that Goldberg and colleagues spoke of as “biological phenomena, some considered bizarre and inexplicable,” and which were “lumped” as epigenetic as they cannot obviously be explained by genetic mechanisms alone [34]. These phenomena are often characterized by non-Mendelian inheritance of traits and may be considered epigenetic sensu lato as defined above (although they may also be caused by the inheritance patterns of genes encoded on the organellar genomes) [35].

Some areas in which work in plant systems has provided valuable past insights include important examples of the persistence of cellular memory within a single generation, i.e., of the persistence of epigenetic states through multiple rounds of mitotic division [19]. These are typically important for aligning the plant’s responses with environmental cues. For example, a complex network of histone marks, DNA methylation patterns and non-coding RNAs is involved in ensuring the cellular memory of winter during vernalization [36]. At least in Arabidopsis, it is also possible for vernalization pathways to influence traits of the progeny, such as seed germination, although this is due to pleiotropy and/or maternal effects rather than inheritance of transcriptional states [37]. Similarly, exposure to a stress can lead to altered response to future stress events through cellular memory, an effect which can be mimicked by seed priming and can lead to pleiotropic effects on germination [38].

In rare cases, altered transcriptional states are also inherited through meiosis and fertilization in the absence of DNA sequence changes, and hence across generations, leading to so-called epimutations. Plant systems have provided examples such as the peloric epimutation and many others [39] and have been widely used to study the mechanisms which underlie these epimutations [17]. This category also includes the epigenetically heritable state of paramutation, which was first described at the maize b1 locus [40] and continues to be studied in maize (reviewed in [41]), as well as in Arabidopsis, petunia [42] and others.
Maize was also the system in which genomic imprinting was first discovered, a non-Mendelian phenomenon subsequently discovered to occur throughout the flowering plants and also in mammals, but not in any other known biological group [43]. Genomic imprinting involves non-Mendelian inheritance of a trait due to differential expression of alleles inherited from the maternal and paternal genome [43, 44]. Imprinting is associated with regions of differential methylation [45] and may be driven by kin conflict over the allocation of resources to offspring [46–48]. Technically, genomic imprinting is also a transgenerational effect as it principally occurs in the endosperm, which is a post-fertilization tissue. However, the endosperm is a transient structure which degenerates during seed development, so its influence on the subsequent generation must generally be indirect. Recent research has also suggested important roles for imprinting as a speciation mechanism [49]. Rare examples of imprinting occurring in the embryo have however been reported (e.g., [50]), and in these cases the transgenerational effect could presumably affect subsequent plant development directly.


In addition, plants famously played leading roles in the discovery of transposable elements and their regulation by DNA methylation [51]. Transposable elements are also likely to be key drivers of the evolution of small RNA pathways and may be responsible for the differential distributions of epigenetic marks that cause genomic imprinting. The value of plant systems for demonstrating the importance of small RNAs as mechanisms of gene silencing, first of transgenes then of endogenous genes, has been well reviewed [52–56].

Key questions remain over the role of epigenetic marks in the regulation of plant traits, including in crops and in natural populations. The inheritance of epigenetic states can be statistically associated with phenotypes through the use of Epigenome-wide Association Studies (epi-GWAS or EWAS), based on the analysis of heritable variation in genetically identical but epigenetically divergent Recombinant Inbred Lines (epi-RILs) in systems such as Arabidopsis [57, 58]. These lines are typically generated by crossing wild-type plants to DNA methylation mutants such as met1-3 and ddm1 [51]. One application of epi-RILs is as a tool to study heterosis. Heterosis is the phenomenon in which crossing inbred lines generates hybrid progeny with higher fitness than the parents; it is a key element of crop breeding programs [59, 60] and of current research into future biotechnological applications [61]. We and others have shown that heterosis can be partially due to non-genetic changes such as those found in the methylation patterns in epi-RILs, although heterosis refers to a wide range of different phenomena and its mechanisms generally remain to be determined [62–67]. Hybrid effects may be under epigenetic control more generally, as has been shown in the case of nucleolar dominance, in which rDNA from one parental genotype is repressed in favor of that from the other [68].
Similar effects are likely to occur in other hybrid systems, including interspecific hybridization and in the generation of allopolyploid plants in which control of the transcriptional states of homeologous genes may be under epigenetic control [69]. By definition, these systems may involve significant genetic divergence so again care must be taken in identifying non-genetic sources of heritable variation.

In a similar way, it has been noted in the course of crop breeding programs that conditions experienced by the parents can influence the traits of the offspring in ways which appear to be independent of the genotype of the offspring [70, 71]. These are known as maternal or paternal effects, with maternal effects much more commonly encountered. The fact that plants are likely to share the environment of their maternal parent could provide a rationale for these effects to have adaptive significance; however, this remains largely untested. Breeders have also documented occasional paternal or “pollen” effects, also termed “xenia” effects (see [72] for a rare example where the molecular basis of such an effect has been elucidated). Such parental effects are sensu lato epigenetic as they do not depend upon the genotype of an organism but upon that of one or other of its parents. In some cases these effects may be due to spatial cues, nutrition provided via the endosperm during offspring development, the direct effects of RNA, organelle or protein deposition from the gametes, or the influence of plant growth regulators (PGRs) on embryogenesis: we have discussed some of these possibilities in [73]. For reproductive characteristics, it is also necessary to understand whether these effects are due to the parental, endospermal or embryonic genomes, or some combination of these [71]. As a related question, the epigenetic control of sex chromosomes in plants which possess them remains to be fully explored, so any plant equivalents of the non-Mendelian process of X chromosome inactivation are yet to be elucidated [74].

4 Techniques for Addressing Challenges in Plant Epigenetic and Epigenomic Research

From the preceding discussion, the roles of plant biology research in advancing our understanding of epigenetics are clear, and provide the background to the techniques involved in generating current advances in this field. In the remainder of this volume, protocols for applying these techniques are described. Section 1 is entitled “Detection and analysis of epigenetic marks in plant genomes.” Here, recent advances in understanding the distribution of epigenetic marks (especially DNA methylation in different contexts) are described. Section 2 turns to consideration of “Epigenetics and plant chromatin structure,” including the development of important techniques for analyzing chromatin at the single-cell level. Section 3 presents some “Applications and novel insights into epigenetics and epigenomics in plants,” and is especially concerned with phenomena such as genome dosage, genomic imprinting, and epigenetic memories of stress. The volume concludes with an important chapter by Ramos Cruz and Becker which critically considers how claims for an epigenetic basis to an observed phenomenon should be evaluated.

In this volume we have principally drawn upon techniques which have been developed to take advantage of the power of Arabidopsis and other model organisms. However, many of the techniques described are suitable for use in a wide range of systems including other species of plants and algae as well as other organisms. Because so many of the techniques which our contributing authors describe have been honed with the help of Arabidopsis thaliana as a model, it would be particularly gratifying to learn of them being adapted for species used as models for addressing more specialized biological questions. The rapid recent growth in the
availability of sequenced plant genomes has made many more resources available for the plant science community, including for multiple accessions of A. thaliana and of its relatives. It is important that tools for understanding the epigenetic regulation of these newly sequenced genomes are also developed. It is our hope that the cutting-edge techniques and methodologies presented in this second volume will advance our understanding of the significance of plant epigenetic mechanisms, clarify their involvement in stress response, heterosis and parental effects, and robustly determine their roles in the heritability and adaptive potential of traits in plant populations.

References

1. Spillane C, McKeown PC (eds) (2014) Plant epigenetics and epigenomics: methods and protocols, Methods in molecular biology, vol 1112. Springer Science+Business Media, New York
2. McKeown PC, Spillane C (2014) Landscaping plant epigenetics. In: Plant epigenetics and epigenomics. Springer, New York, pp 1–24
3. Greally JM (2018) A user’s guide to the ambiguous word ‘epigenetics’. Nat Rev Mol Cell Biol 19(4):207–208
4. Lappalainen T, Greally JM (2017) Associating cellular epigenetic models with human phenotypes. Nat Rev Genet 18:441. https://doi.org/10.1038/nrg.2017.32
5. Gayon J (2016) From Mendel to epigenetics: history of genetics. C R Biol 339(7–8):225–230
6. Felsenfeld G (2014) A brief history of epigenetics. Cold Spring Harb Perspect Biol 6(1):a018200
7. Haig D (2004) The (dual) origin of epigenetics. Cold Spring Harb Symp Quant Biol 69:67–70. https://doi.org/10.1101/sqb.2004.69.67
8. Huang S (2012) The molecular and mathematical basis of Waddington’s epigenetic landscape: a framework for post-Darwinian biology? BioEssays 34(2):149–157. https://doi.org/10.1002/bies.201100031
9. Waddington CH (1939) An introduction to modern genetics. Allen and Unwin, London
10. Waddington CH (1942) Canalization of development and the inheritance of acquired characters. Nature 150(3811):563–565
11. Bird A (2007) Perceptions of epigenetics. Nature 447(7143):396–398
12. Russo VEA, Martienssen RA, Riggs AD (eds) (1996) Epigenetic mechanisms of gene regulation. Cold Spring Harbor Laboratory Press, Woodbury
13. Richards EJ (2006) Inherited epigenetic variation—revisiting soft inheritance. Nat Rev Genet 7(5):395–401. https://doi.org/10.1038/nrg1834
14. Nanney DL (1958) Epigenetic control systems. Proc Natl Acad Sci U S A 44:712–717
15. Vongs A, Kakutani T, Martienssen RA, Richards EJ (1993) Arabidopsis thaliana DNA methylation mutants. Science 260:1926–1928
16. Verhoeven KJ, Vonholdt BM, Sork VL (2016) Epigenetics in ecology and evolution: what we know and what we need to know. Mol Ecol 25(8):1631–1638
17. Quadrana L, Colot V (2016) Plant transgenerational epigenetics. Annu Rev Genet 50:467–491
18. Henikoff S, Greally JM (2016) Epigenetics, cellular memory and gene regulation. Curr Biol 26(14):R644–R648
19. Crisp PA, Ganguly D, Eichten SR, Borevitz JO, Pogson BJ (2016) Reconsidering plant memory: intersections between stress recovery, RNA turnover, and epigenetics. Sci Adv 2(2):e1501340
20. Richards EJ (2011) Natural epigenetic variation in plant species: a view from the field. Curr Opin Plant Biol 14(2):204–209
21. He L, Wu W, Zinta G, Yang L, Wang D, Liu R, Zhang H, Zheng Z, Huang H, Zhang Q (2018) A naturally occurring epiallele associates with leaf senescence and local climate adaptation in Arabidopsis accessions. Nat Commun 9(1):460
22. Bird A (2002) DNA methylation patterns and epigenetic memory. Genes Dev 16(1):6–21. https://doi.org/10.1101/gad.947102
23. Jaenisch R, Bird A (2003) Epigenetic regulation of gene expression: how the genome integrates intrinsic and environmental signals. Nat Genet 33:S245–S254
24. Berger SL, Kouzarides T, Shiekhattar R, Shilatifard A (2009) An operational definition of epigenetics. Genes Dev 23(7):781–783. https://doi.org/10.1101/gad.1787609
25. Bird AP (1986) CpG-rich islands and the function of DNA methylation. Nature 321(6067):209–213
26. Doerfler W (1983) DNA methylation and gene activity. Annu Rev Biochem 52:93–124
27. Wigler MH (1981) The inheritance of methylation patterns in vertebrates. Cell 24(2):285–286
28. Bestor TH, Verdine GL (1994) DNA methyltransferases. Curr Opin Cell Biol 6(3):380–389
29. Youngson NA, Chong S, Whitelaw E (2011) Gene silencing is an ancient means of producing multiple phenotypes from the same genotype. BioEssays 33(2):95–99. https://doi.org/10.1002/bies.201000122
30. Gruenbaum Y, Navehmany T, Cedar H, Razin A (1981) Sequence specificity of methylation in higher plant DNA. Nature 292(5826):860–862. https://doi.org/10.1038/292860a0
31. Lahmy S, Bies-Etheve N, Lagrange T (2010) Plant-specific multisubunit RNA polymerase in gene silencing. Epigenetics 5(1):4–8
32. Waterborg JH (1990) Sequence analysis of acetylation and methylation in two histone H3 variants of alfalfa. J Biol Chem 265(28):17157–17161
33. Finnegan EJ, Peacock WJ, Dennis ES (1996) Reduced DNA methylation in Arabidopsis thaliana results in abnormal plant development. Proc Natl Acad Sci U S A 93(16):8449–8454
34. Goldberg AD, Allis CD, Bernstein E (2007) Epigenetics: a landscape takes shape. Cell 128(4):635–638
35. Chase CD (2007) Cytoplasmic male sterility: a window to the world of plant mitochondrial–nuclear interactions. Trends Genet 23(2):81–90
36. Kim D-H, Sung SB (2014) Genetic and epigenetic mechanisms underlying vernalization. Arabidopsis Book 12:e0171
37. Auge G, Blair L, Neville H, Donohue K (2017) Maternal vernalization and vernalization-pathway genes influence progeny seed germination. New Phytol 216:388–400
38. Lämke J, Bäurle I (2017) Epigenetic and chromatin-based mechanisms in environmental stress adaptation and stress memory in plants. Genome Biol 18(1):124
39. Johannes F, Schmitz RJ (2019) Spontaneous epimutations in plants. New Phytol 221(3):1253–1259
40. Coe EH (1966) Properties origin and mechanism of conversion-type inheritance at b locus in maize. Genetics 53(6):1035
41. Chandler VL, Stam M (2004) Chromatin conversations: mechanisms and implications of paramutation. Nat Rev Genet 5(7):532–544. https://doi.org/10.1038/nrg1378
42. Meyer P, Heidmann I, Niedenhof I (1993) Differences in DNA methylation are associated with a paramutation phenomenon in transgenic Petunia. Plant J 4(1):89–100. https://doi.org/10.1046/j.1365-313X.1993.04010089.x
43. Köhler C, Wolff P, Spillane C (2012) Epigenetic mechanisms underlying genomic imprinting in plants. Annu Rev Plant Biol 63(1):331. https://doi.org/10.1146/annurev-arplant-042811-105514
44. McKeown PC, Fort A, Spillane C (2013) Genomic imprinting: parental control of gene expression in higher plants. In: Polyploid hybrid genomics. Wiley, New York, p 257
45. Gehring M, Bubb KL, Henikoff S (2009) Extensive demethylation of repetitive elements during seed development underlies gene imprinting. Science 324(5933):1447–1451. https://doi.org/10.1126/science.1171609
46. Tuteja R, McKeown PC, Ryan P, Morgan CC, Donoghue MTA, Downing T, O’Connell MJ, Spillane C (2019) Paternally expressed imprinted genes under positive Darwinian selection in Arabidopsis thaliana. Mol Biol Evol 36(6):1239–1253. https://doi.org/10.1093/molbev/msz063
47. Costa LM, Yuan J, Rouster J, Paul W, Dickinson H, Gutierrez-Marcos JF (2012) Maternal control of nutrient allocation in plant seeds by genomic imprinting. Curr Biol 22(2):160–165
48. Haig D (2000) The kinship theory of genomic imprinting. Annu Rev Ecol Syst 31(1):9–32
49. Kradolfer D, Wolff P, Hua J, Siretskiy A, Köhler C (2013) An imprinted gene underlies postzygotic reproductive isolation in Arabidopsis thaliana. Dev Cell 26:525–535
50. Dickinson H, Scholten S (2013) And baby makes three: genomic imprinting in plant embryos. PLoS Genet 9(12):e1003981
51. Reinders J, Wulff BBH, Mirouze M, Marí-Ordóñez A, Dapp M, Rozhon W, Bucher E, Theiler G, Paszkowski J (2009) Compromised stability of DNA methylation and transposon immobilization in mosaic Arabidopsis epigenomes. Genes Dev 23(8):939–950. https://doi.org/10.1101/gad.524609
52. Lindbo JA, Dougherty WG (2005) Plant pathology and RNAi: a brief history. Annu Rev Phytopathol 43:191–204. https://doi.org/10.1146/annurev.phyto.43.040204.140228
53. Sanford JC, Johnston SA (1985) The concept of parasite-derived resistance—deriving resistance genes from the parasites own genome. J Theor Biol 113(2):395–405. https://doi.org/10.1016/s0022-5193(85)80234-4
54. Dougherty WG, Parks TD (1995) Transgenes and gene suppression—telling us something new. Curr Opin Cell Biol 7(3):399–405. https://doi.org/10.1016/0955-0674(95)80096-4
55. Sen GL, Blau HM (2006) A brief history of RNAi: the silence of the genes. FASEB J 20(9):1293–1299. https://doi.org/10.1096/fj.06-6014rev
56. Mirouze M, Reinders J, Bucher E, Nishimura T, Schneeberger K, Ossowski S, Cao J, Weigel D, Paszkowski J, Mathieu O (2009) Selective epigenetic control of retrotransposition in Arabidopsis. Nature 461(7262):427
57. Zhang Y-Y, Latzel V, Fischer M, Bossdorf O (2018) Understanding the evolutionary potential of epigenetic variation: a comparison of heritable phenotypic variation in epiRILs, RILs, and natural ecotypes of Arabidopsis thaliana. Heredity 121(3):257–265
58. Catoni M, Cortijo S (2018) EpiRILs: lessons from Arabidopsis. In: Plant epigenetics coming of age for breeding applications, vol 88, pp 87–116
59. Springer NM, Stupar RM (2007) Allelic variation and heterosis in maize: how do two halves make more than a whole? Genome Res 17(3):264–275. https://doi.org/10.1101/gr.5347007
60. Crow JF (1999) Anecdotal, historical and critical commentaries on genetics. Genetics 152(3):821–825
61. McKeown PC, Fort A, Duszynska D, Sulpice R, Spillane C (2013) Emerging molecular mechanisms for biotechnological harnessing of heterosis in crops. Trends Biotechnol 31(10):549–551
62. Fort A, Ryder P, McKeown PC, Wijnen C, Aarts MG, Sulpice R, Spillane C (2016) Disaggregating polyploidy, parental genome dosage and hybridity contributions to heterosis in Arabidopsis thaliana. New Phytol 209(2):590–599
63. Groszmann M, Greaves IK, Fujimoto R, Peacock WJ, Dennis ES (2013) The role of epigenetics in hybrid vigour. Trends Genet 29(12):684–690
64. Lauss K, Wardenaar R, Oka R, van Hulten MH, Guryev V, Keurentjes JJ, Stam M, Johannes F (2018) Parental DNA methylation states are associated with heterosis in epigenetic hybrids. Plant Physiol 176(2):1627–1645
65. Dapp M, Reinders J, Bediee A, Balsera C, Bucher E, Theiler G, Granier C, Paszkowski J (2015) Heterosis and inbreeding depression of epigenetic Arabidopsis hybrids. Nat Plants 1(7):15092
66. Chen ZJ (2013) Genomic and epigenetic insights into the molecular bases of heterosis. Nat Rev Genet 14(7):471
67. Ryder P, McKeown PC, Fort A, Spillane C (2019) Epigenetics and heterosis in crop plants. In: Epigenetics in plants of agronomic importance: fundamentals and applications. Springer, Cham, pp 129–147
68. Chen ZJ, Pikaard CS (1997) Epigenetic silencing of RNA polymerase I transcription: a role for DNA methylation and histone modification in nucleolar dominance. Genes Dev 11(16):2124–2136. https://doi.org/10.1101/gad.11.16.2124
69. Huang HR, Liu JJ, Xu Y, Lascoux M, Ge XJ, Wright SI (2018) Homeologue-specific expression divergence in the recently formed tetraploid Capsella bursa-pastoris (Brassicaceae). New Phytol 220(2):624–635
70. Wolf JB, Wade MJ (2009) What are maternal effects (and what are they not)? Philos Trans R Soc B 364(1520):1107–1115. https://doi.org/10.1098/rstb.2008.0238
71. Donohue K (2009) Completing the cycle: maternal effects as the missing link in plant life histories. Philos Trans R Soc B 364(1520):1059–1074. https://doi.org/10.1098/rstb.2008.0291
72. Forsthoefel NR, Vernon DM (2011) Effect of sporophytic PIRL9 genotype on post-meiotic expression of the Arabidopsis pirl1;pirl9 mutant pollen phenotype. Planta 233(2):423–431. https://doi.org/10.1007/s00425-010-1324-5
73. Duszynska D, McKeown PC, Juenger TE, Pietraszewska-Bogiel A, Geelen D, Spillane C (2013) Gamete fertility and ovule number variation in selfed reciprocal F1 hybrid triploid plants are heritable and display epigenetic parent-of-origin effects. New Phytol 198(1):71–81
74. Charlesworth D (2012) Plant sex chromosome evolution. J Exp Bot 64(2):405–420

Chapter 2
Approaches to Whole-Genome Methylome Analysis in Plants
Xiaodong Yang and Sally A. Mackenzie

Abstract
Cytosine methylation as a reversible chromatin mark has been investigated extensively for its influence on gene silencing and the regulation of its dynamic association–disassociation at specific sites within a eukaryotic genome. With the remarkable reductions in cost and time associated with whole-genome DNA sequence analysis, coupled with the high fidelity of bisulfite-treated DNA sequencing, single nucleotide resolution of cytosine methylation repatterning within even very large genomes is increasingly achievable. What remains a challenge is the analysis of genome-wide methylome datasets and, consequently, a clear understanding of the overall influence of methylation repatterning on gene expression or vice versa. Reported data have sometimes been subject to stringent data filtering methods that can skew downstream biological interpretation. These complications derive from methylome analysis procedures that vary widely in method and parameter setting. DNA methylation as a chromatin feature that influences DNA stability can be dynamic and rapidly responsive to environmental change. Consequently, methods to discriminate background “noise” of the system from biological signal in response to a specific perturbation are essential in some types of experiments. We describe numerous aspects of whole-genome bisulfite sequence data that must be considered, as well as the various steps of methylome data analysis that affect the biological interpretation of the final output.

Key words DNA methylation, Methylome, Bisulfite-seq, DMR

1 Introduction
Improved technologies and dramatic cost reductions for whole-genome sodium bisulfite sequencing of DNA have opened the door to high-resolution methylome analysis as a valuable methodology for epigenomic investigations. However, much of what has shaped current thinking about DNA methylation functions emerged prior to the development of advanced computational methods for data analysis. The earliest DNA methylation patterning studies were conducted in mammalian systems [1, 2], where it was deduced that CpG sequences could be methylated de novo, methylation could silence genes, and methylation repatterning could be inherited somatically. Elucidating these properties on a

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_2, © Springer Science+Business Media, LLC, part of Springer Nature 2020


genome-wide scale requires specialized computational tools. The earliest studies of plant DNA methylation exploited whole-genome bisulfite sequencing, which offered single nucleotide resolution of methylome architecture and revealed prominent association of dense methylation intervals with transposable elements and heterochromatin [3–5]. Whereas mammalian systems showed patterns of DNA methylation concentrated within gene promoter-associated “CpG islands” [6], plant methylome studies revealed DNA methylation in three cytosine contexts, CG, CHG, and CHH, where H is A, C, or T, and where CG methylation is predominantly located within gene regions [7]. These early observations were accompanied by a growing body of literature that unraveled the machinery controlling DNA methylation/demethylation [8] as well as its repatterning behavior in response to perturbations [9–11]. Numerous excellent reviews describe the DNA methylation process and machinery in plants and animals [7, 8, 12, 13]; here we focus on the emerging methods for interpreting whole-genome DNA methylation data.

There are more than 20 different techniques (or their variants) for genome-wide DNA methylation profiling [14]. Despite the large number of available techniques, the principles behind all of them are the same: treatment of genomic DNA to differentiate methylated and unmethylated cytosine sites, followed by methods to track these differences on a genome-wide scale (generally by next-generation sequencing). Three main approaches are used to differentiate methylated and unmethylated cytosine sites:

(1) Restriction endonuclease treatment with methylation-sensitive enzymes, with differentially sized fragments subjected to sequencing [15]. A chief limitation of this method lies in the number of available enzymes that have proved sufficiently reliable.
(2) Affinity enrichment, or Methylated DNA Immunoprecipitation (MeDIP), which resembles ChIP methodology. Affinity enrichment uses an antibody that binds methylated cytosine to enrich for methylated DNA regions of the genome [16].
(3) Whole-genome bisulfite sequencing (WGBS), considered the gold standard for its single nucleotide resolution.

As the cost of next-generation sequencing has dropped over time, and assembled genomes have become available for an increasing number of species, WGBS has generally become the preferred method. The procedure involves bisulfite treatment of genomic DNA followed by whole-genome deep sequencing, usually with average genome coverage of 5–40×. With the growing amount of available WGBS data, their analysis has become the bottleneck for this approach. As illustrated in Fig. 1, a typical pipeline for WGBS data analysis, where data derive from Illumina platforms, generally includes the following steps:


Fig. 1 A typical workflow for whole-genome bisulfite sequence data analysis. Ovals represent input or output data, gray rectangles are processing procedures, and dashed line rectangles represent software or programs

1. Quality control, where bisulfite sequence reads are fed into FastQC to perform an overall quality check, and sequencing adaptors are removed by a trimmer (such as Cutadapt [17] or Trim Galore! [18]).
2. Alignment, in which clean, high-quality reads are mapped to a reference genome. Unlike a standard aligner, the aligner used for bisulfite sequencing must be able to map Ts in the reads to either Cs or Ts in the reference. Generally, C to T conversion of the reference genome is required before alignment; aligners that accomplish this (in conjunction with bowtie [19]) include Bismark [20], BS-Seeker3 [21], and MethylCoder [22].
3. Identification of differential methylation sites. The strategies and software used to identify differential methylation patterning are highly varied. The procedure generally starts with defining differentially methylated cytosines (DMCs), with Fisher’s exact test or logistic regression being applied to each single cytosine. DMCs are then defined based on the calculated p-value (or q-value after multiple test correction). In most cases, DMCs are scattered across the entire genome and subject to sampling and sequencing bias. Therefore, the concept of differentially methylated regions (DMRs) is generally employed. A DMR can be defined in many different ways depending on experimental design and program used. The


common feature of DMRs is that they are clusters of cytosines within genomic intervals (defined by a sliding or fixed window, usually 100 bp) that show a significant methylation level difference within the group or by pairwise comparison. The cutoff for “significant methylation level difference” is subjective and highly variable within the literature. Several additional criteria can be applied to DMR identification to improve data robustness, including smoothing (averaging local methylation level) and setting cutoffs for minimum coverage and for the number of DMCs per DMR. Numerous tools are available for the identification of DMCs and DMRs, including DSS [23], methylpy [24], BSmooth [25], methylKit [26], RADMeth [27], and a binomial mixed model approach [28].
4. The final, and most controversial, step is to associate differential methylation with relevant genomic features of interest. In the end, confidence in assignment of identified DMCs or DMRs to genes, transposons, intergenic regions, or heterochromatin depends on the experimental design, genome annotation availability, and resolution of data analysis procedures.

DMR-based methylation analysis has proven to be extremely useful for genome-wide characterization of methylome architecture, defining heterochromatic regions [29, 30], epigenomic repatterning that accompanies mutations in chromatin-modifying machinery [31], and association of methylation with transposable element sequences and their associated genes [32–34]. Gene silencing events can also be revealed by high-density local methylation changes. Numerous studies have shown genome-wide methylation responses to abiotic [11] and biotic [9, 35] stress conditions, as well as long-term methylome adaptation to local environments [36–38]. An important feature of DMR-based methylation analysis is its design for stringent and objective genome-wide survey of DNA methylation patterning without regard to biological genome features. The analysis approach identifies DNA methylation changes that occur in sufficient density and magnitude for detection within given parameters, irrespective of their genomic context. The resulting output is then assessed for its relationship to transposable element, genic, or heterochromatic features of the genome.
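Step 3 of the pipeline above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any of the cited tools: the function names, the toy read counts, the fixed (non-sliding) 100-bp window, the q-value cutoff of 0.05, and the minimum of three DMCs per DMR are all hypothetical choices made for the example. Fisher's exact test is applied to each cytosine, p-values are adjusted with the Benjamini-Hochberg procedure, and windows holding enough DMCs are reported as DMRs.

```python
from collections import defaultdict
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are samples (control, treatment); columns are methylated and
    unmethylated read counts at one cytosine.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x methylated reads in the control row.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p = sum(pr for pr in map(prob, range(lo, hi + 1)) if pr <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def call_dmrs(sites, window=100, q_cutoff=0.05, min_dmcs=3):
    """sites: (position, meth_ctrl, unmeth_ctrl, meth_trt, unmeth_trt) tuples."""
    pvals = [fisher_exact_two_sided(mc, uc, mt, ut) for _, mc, uc, mt, ut in sites]
    qvals = benjamini_hochberg(pvals)
    dmcs_per_window = defaultdict(list)
    for (pos, *_), q in zip(sites, qvals):
        if q <= q_cutoff:  # this cytosine is called a DMC
            dmcs_per_window[pos // window].append(pos)
    return [(w * window, (w + 1) * window, dmcs)
            for w, dmcs in sorted(dmcs_per_window.items())
            if len(dmcs) >= min_dmcs]


# Hypothetical read counts, for illustration only.
toy_sites = [
    (105, 1, 19, 19, 1), (130, 2, 18, 18, 2), (170, 0, 20, 17, 3),
    (250, 10, 10, 11, 9), (420, 1, 19, 18, 2), (610, 5, 15, 6, 14),
]
dmrs = call_dmrs(toy_sites)
print(dmrs)  # [(100, 200, [105, 130, 170])]
```

Note that the strongly differential cytosine at position 420 is a DMC but does not yield a DMR, because it is alone in its window. Changing `window`, `q_cutoff`, or `min_dmcs` changes the output, which is the parameter sensitivity discussed in the next section.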

2 Limitations of Current Analysis Methods for Methylome Studies
There are at least three major challenges encountered by the current methodologies:


1. The need to distinguish stochastic spontaneous variation within the methylome from treatment-associated signal.
2. The arbitrary nature of data filtering or subjective user settings, given the difficulty of optimizing parameters for particular experiments: settings that are too stringent will cause loss of information, while overly relaxed settings will cause unacceptable levels of false positives.
3. The nature of methylation heterogeneity: DNA methylation levels vary considerably in samples experiencing changes during development or in response to environmental change, and these effects can show measurable spatio-temporal differences.

The majority of genome-wide cytosine methylation studies to date rely on large magnitude signal found in high-density methylation changes, with DMRs as the unit of analysis. These methylation data analysis procedures often employ stringent filters; for example, for each control to treatment group comparison, regions are considered DMRs only when the average methylation level difference is higher than 40%, 20%, and 20% for CG, CHG, and CHH, respectively, in a recent study of methylation variation under abiotic stress [39], versus 40%, 20%, and 10% in a recent seed development and germination study [40]. Significantly, this literature often lacks evidence of clear association between methylation repatterning and gene expression changes. Rather, this type of analysis has shown compellingly that high-density cytosine methylation serves to silence transposable elements and local TE-neighboring genes [33, 41, 42], and to stabilize heterochromatic regions of the genome [43], functions essential to chromosome steady state. However, the relevance of gene-associated methylation to gene expression and/or phenotype remains ambiguous [44].

Current methods for DMR-based methylation analysis tend to exclude low-density methylation signal as well as methylation changes occurring within a limited proportion of harvested cells. If a small number of strategically positioned methylation changes were sufficient to influence transcription initiation and/or splicing within a gene, such changes would be ignored by current analysis procedures. Likewise, one could argue that if methylation and gene expression changes occur coordinately in a subset of cell types, these likewise would be missed by current methods due to signal dilution by tissue pooling. Yet, low-density, gene-associated differential methylation has been reported in studies that bypass conventional DMR analysis [45–47].

Biophysical studies of DNA methylation indicate that altering methylation positions on a DNA double strand influences local physical properties of strand separation, flexibility, and protein accessibility [48–51]. In these studies, cytosine methylation is speculated to have evolved as a general stabilizing component of


the DNA molecule, with TE silencing presumably a derivative function. If true, it is reasonable to assume that gene regions displaying changes in DNA methylation should coincide with pathways that have undergone disruption or sporadic changes in gene expression, serving to reestablish equilibrium. In this scenario, gene-associated methylation changes would coincide with altered gene networks relevant to phenotype. Detecting such gene-associated methylation variation requires higher resolution than is currently achieved with conventional procedures. While some types of methylation changes may be prevalent in all or most cells, including those with large differential signal that influence TE and gene silencing, epigenomic changes also occur in tissue- or cell-type-specific patterns [38, 52], requiring greater resolution to confront methylation signal dilution. DMR-based analysis may be ineffective for detecting tissue-specific or subtle intragenic methylation changes on a genome-wide scale. This disparity in sensitivity could then lead to overly narrow biological interpretations of methylation repatterning.
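The signal dilution argument in this section can be made concrete with a short calculation. The sketch below is illustrative only: the per-context cutoffs are the 40%, 20%, and 20% values quoted above for [39], while the function names and the example methylation levels (5% in unchanged cells, 95% in changed cells) are assumptions chosen for the example. It computes the methylation level that would be measured in a pooled tissue sample when only a fraction of cells changes state, and asks whether the resulting difference would survive the filter.

```python
# Minimum methylation-level differences quoted in the text for reference [39].
CUTOFFS = {"CG": 0.40, "CHG": 0.20, "CHH": 0.20}


def pooled_level(fraction_changed, level_changed, level_unchanged):
    """Methylation level measured in a pool where only some cells changed."""
    return (fraction_changed * level_changed
            + (1.0 - fraction_changed) * level_unchanged)


def passes_filter(context, control_level, treatment_level, cutoffs=CUTOFFS):
    """True when the level difference is higher than the context cutoff."""
    return abs(treatment_level - control_level) > cutoffs[context]


def min_detectable_fraction(context, level_changed, level_unchanged, cutoffs=CUTOFFS):
    """Fraction of cells that must change before the pooled difference reaches the cutoff."""
    return cutoffs[context] / abs(level_changed - level_unchanged)


# Hypothetical site: 5% methylated in unchanged cells, 95% in changed cells.
control = 0.05
for fraction in (0.10, 0.25, 0.50):
    treated = pooled_level(fraction, 0.95, 0.05)
    print(f"{fraction:.0%} of cells changed: pooled level {treated:.3f}, "
          f"CG filter passed: {passes_filter('CG', control, treated)}")

print(f"{min_detectable_fraction('CG', 0.95, 0.05):.1%} of cells must change "
      "for a CG site to reach the 40% cutoff")
```

Under these assumptions, a near-complete methylation switch confined to a quarter of the harvested cells produces a pooled difference of about 22 percentage points, which is discarded in the CG context but would be retained in the CHH context.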

3 Gene Body Methylation
Gene-associated methylation has generally been studied in Arabidopsis thaliana as CG intragenic methylation patterns observed in wild-type individuals grown under standard conditions. With stringent filters used to define gene body methylation (GbM), early reports indicated its presence in 4361 genes, accounting for 16% of Arabidopsis genes [53]. GbM in these studies was defined as genes having at least 20 CG sites within the gene at an average CG methylation level for each CG site of greater than 95%, an extremely high cutoff. Plant species that appear to have no gene body methylation have been reported, and Arabidopsis recombinant inbred lines (RILs) mutated in CMT3 to eliminate gene body methylation produce no obvious aberrant effects on transcription when grown under controlled conditions [54]. However, plant tissues may differ in CG methylation [52], and plants displaying variation in gene body methylation within duplicate genes appear to differ in paralog gene expression [55]. The function of gene body methylation, therefore, remains unclear. Its presence in a range of organisms, including plants, animals, and other eukaryotes, confirms its evolutionary conservation, and it has been postulated to aid in gene homeostasis [56]. It is reasonable to assume that early GbM studies likely excluded significant intragenic methylation that is tissue- or cell-type-specific due to low and/or variable differential methylation signal strength. For instance, studies investigating plant reproduction in Arabidopsis [45] or seed development in rice [57] have


reported intragenic methylation variation in greater proportions of the genome, exceeding earlier GbM estimates. Altered plant developmental phenotypes have been associated with intragenic differential methylation targeted to specific gene networks, including tomato fruit ripening [58, 59], maize leaf development [60], and photoperiod sensitivity [61]. Dubin et al. (2015) showed that Arabidopsis CpG gene body methylation (GbM) was not affected by growth temperature, but was instead correlated with the accession’s latitude of origin. Accessions from colder regions had higher levels of GbM for a significant fraction of the genome, and this was associated with increased transcription of the genes affected [37]. The implied role of GbM in adaptation is supported by other recent research. For example, a simulated selection experiment showed that epigenetic variation within gene body regions is subject to selection and, therefore, participates in adaptation [46]. Walker et al. (2018) identified specific DNA methylation signatures mediated by the RdDM pathway in the Arabidopsis male sexual lineage. This study further confirmed that a large proportion of those signatures, denoted as Sexual-Lineage-Hypermethylated (SLH) loci, were located in gene regions, suppressing gene transcription and promoting the splicing of a gene essential for meiosis [47].

The de novo repatterning of DNA methylation in the genome is thought to rely on the RNA-directed DNA methylation (RdDM) pathway [62, 63]. Yet, evidence suggests that canonical RdDM components alone are insufficient to account for the variation in tissue-specific methylation changes [64–66], methylation reinforcement that shows differential stability for transgenerational inheritance [67], and the propensity of DNA methylation to be environmentally responsive [11, 35]. Refinements are emerging to the mechanism by which de novo methylation operates, suggesting that the process may be more nuanced than the original model was able to predict. 
For example, dicer-independent DNA methylation comprises more than 80% of the RdDM target loci [68], implying that more than one RNA form may be important to the targeting process. Components like MOM1 [69] or the MBD7 complex [70] may function downstream of methylation in the control of transcriptional activity, implying that “silencing” may represent a modulated process. The demethylating activity of ROS1 functions within a feedback regulatory process to provide a methylation “rheostat,” presumably to fine-tune the influence of methylation on gene expression [71]. Whereas general dogma holds that methylation reprogramming occurs predominantly during gametogenesis, examples exist of methylation variation arising within vegetative tissues and displaying limited transgenerational heritability [72]. Transcription factor- [73, 74] and spliceosome-related [75] influences on methylation machinery have been speculated based on observed interactions. Differential gene body methylation can also be influenced by


histone modifications [13, 76] and particular histone classes, with H3.3 showing association with classes of stress response genes, for example [77]. Leaf development and circadian clock regulation involve GbM changes that depend on the histone deacetylase HDA6 [78–80], a direct interaction partner of the methyltransferase MET1 [81]. A specific subset of RdDM targets requires HDA6 acting upstream of PolIV and siRNA biogenesis, components of a two-step process for conditioning epigenetic memory [81]. These observations indicate that histone modifications can directly influence intragenic methylation behavior. The plant-specific MSH1 system produces an epigenetic memory response that involves reprogrammed gene expression and phenotype changes that are strikingly similar across plant species [82–84]. The msh1 memory condition is stably inherited through multiple generations. These observations imply that particular gene networks are responsive to MSH1 disruption across plant species, perhaps serving as “hubs” for stress-responsive epigenetic changes. It appears unlikely, however, that these pathways are similarly regulated by neighboring TEs in each species.
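The stringent GbM definition quoted at the start of this section (at least 20 CG sites per gene, with CG methylation above 95%) can be written as a simple filter to show how strongly the gene count depends on the two thresholds. This is a sketch under assumptions: the wording of the definition is read here as a cutoff on the mean level across a gene's CG sites, and the function names, gene names, and methylation values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def is_gene_body_methylated(cg_levels, min_sites=20, min_mean_level=0.95):
    """cg_levels: methylation level (0 to 1) of every CG site inside one gene.

    Assumed reading of the definition: enough CG sites AND a mean CG
    methylation level above the cutoff.
    """
    return len(cg_levels) >= min_sites and mean(cg_levels) > min_mean_level


def count_gbm_genes(genes, **thresholds):
    return sum(is_gene_body_methylated(levels, **thresholds)
               for levels in genes.values())


# Hypothetical genes, for illustration only.
toy_genes = {
    "gene_A": [0.98] * 25,  # many CG sites, nearly fully methylated
    "gene_B": [0.98] * 15,  # highly methylated but too few CG sites
    "gene_C": [0.80] * 30,  # many CG sites, moderately methylated
}

strict = count_gbm_genes(toy_genes)
relaxed = count_gbm_genes(toy_genes, min_sites=10, min_mean_level=0.60)
print(strict, relaxed)  # 1 3
```

With the strict settings only one of the three toy genes is called body-methylated; relaxing both thresholds recovers all three, which mirrors the point that early GbM estimates likely excluded moderate or variable intragenic methylation.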

4 Newly Emerging Methylation Analysis Programs
Recently, novel approaches to methylome data analysis have emerged. For example, Jenkinson and colleagues demonstrated the information-theoretic nature of the epigenome and developed an approach, named informME, to define sample-specific energy landscapes [85, 86]. Another group has used a signal decoding procedure (including a forward–backward algorithm and Viterbi algorithm) in their epiallele inheritance analysis [87]. METHimpute, a Hidden Markov Model-based imputation algorithm, provides high accuracy for differential methylation calling in low coverage data (as low as 6X) [88]. Tran and colleagues have attempted to identify differentially methylated sites with weak methylation effects by using a wavelet-based functional mixed model [89]. The program HOME, developed based on machine learning approaches, has achieved a higher accuracy in DMR calling [90]. All of these varied approaches have introduced additional sophistication to the methodology in an attempt to address the issues of sensitivity and heterogeneity that can confound methylome datasets.

Methyl-IT, developed by our group, uses a signal detection approach that takes methylation heterogeneity, sequencing depth, and sampling variation into account, to increase sensitivity and permit resolution of DNA methylation changes that are associated with phenotype [91]. The method defines phenotype-associated methylation changes with the following stated premises:


1. DNA methylation patterning is likely non-uniform across various cell types: this detectable heterogeneity contains important information.
2. Site-directed de novo methylation repatterning is thought to occur via RNA-directed DNA methylation. However, this may not be the only means of de novo methylation, given the striking heterogeneity that exists cell to cell. It is possible that complementary mechanisms recruit methylation machinery to sites within the genome, including transcription factors and local chromatin (histone modification) activity that arise during changes in gene transcription.
3. DNA methylation is dynamic, even within wild-type samples grown under normal environmental conditions [92, 93]. Thus, many of the methylation changes observed in an experimental sample also occur in controls. To discriminate variation within control samples, methylome analysis must take into account control inter-sample methylation variation in parallel to variation between control and experimental individuals. Resolving the changes that distinguish experimental samples from controls requires a signal detection step.
4. DNA methylation and demethylation changes can affect the structural fluctuations of up to six neighboring base pairs [51], so that it is reasonable to assume that a variety of local methylation changes can have synonymous effects on the properties of a given interval. Similarly, if one assumes that DNA methylation repatterning occurs in response to local histone modifications, the chromatin changes that serve to recruit methylation changes are likely to recruit a variety of functionally synonymous methylation pattern changes. For this reason, filtering methylation data to include only changes occurring at a single cytosine or in a single direction can serve to dampen biologically relevant signal.

Methyl-IT implements an information-based approach to identify differentially informative methylated positions (DIMPs), and treats the observed distribution of methylation as a Weibull distribution [94, 95]. Control and treatment sample distributions are discriminated by a Hellinger divergence setting that encompasses signal detection principles for discrimination of signal from background variation. This analysis identifies a set of differentially methylated genes (DMGs) between control and treatment samples for subsequent validation. Initial Methyl-IT output can result in unexpectedly large numbers of DMGs under circumstances like stress, raising questions about sufficient discrimination of true biological signal from background variation. Increasing filter stringency of the derived DMG output, based on intragenic DIMP number, inter-sample standardization, or other criteria, can


Fig. 2 Methylation changes in response to drought stress in Arabidopsis thaliana at two sample loci. Methylome data from three drought-stressed plants (water withheld for 2 weeks) and three control plants were analyzed using the Methyl-IT pipeline. The methylation level of each cytosine site is shown as the ratio of methylated reads to all reads, with orange tracks representing the three drought-stressed individuals and green tracks representing the non-stressed control plants. (a) AT1G10586 is used as an example of a strong GbM signal associated with a TE. (b) AT3G20440 is an example of a weak but likely meaningful methylation signal that could potentially alter the splicing pattern. Differentially methylated cytosines identified by Methyl-IT are shown. Drought stress methylome data were downloaded from the GEO database under accession number GSE94075, generated by [39]. IGB browser (ver 9.0.2) was used to generate the graph

produce a more targeted DMG number. However, network-based enrichment analysis of the original and filtered DMG sets produces very similar results regardless of filter stringency and DMG number, implying that trends observed within the dataset are consistent, and overly stringent filtering may actually serve to restrict resolution. Figure 2 shows examples of strong and weak methylation changes detected in plants under drought stress. The gene in Fig. 2a overlaps with a transposable element and undergoes high-density/high-magnitude methylation changes, while the gene in Fig. 2b shows a relatively weak but precise methylation signal, with the potential to influence alternative splicing. Both scenarios are common within the drought stress methylome dataset. Although biological relevance in both cases remains to be confirmed, these two examples demonstrate the importance of analysis parameter setting, where a stringent filter setting would result in loss of potentially useful information like that shown in Fig. 2b.
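The signal detection idea described in this section, in which control-versus-control variation sets the background against which treatment changes are judged, can be sketched as follows. This is not the Methyl-IT implementation. As loudly stated assumptions: the divergence is the squared Hellinger distance between two Bernoulli methylation levels, a pseudo-count of 1 is added to the read counts, and an empirical 95% quantile of the control-versus-control divergences stands in for the Weibull-based model that the text attributes to Methyl-IT. All function names and read counts are hypothetical.

```python
from math import sqrt


def methylation_level(meth, unmeth, pseudo=1.0):
    """Smoothed methylation level; the pseudo-count keeps levels away from 0 and 1."""
    return (meth + pseudo) / (meth + unmeth + 2.0 * pseudo)


def hellinger_divergence(p, q):
    """Squared Hellinger distance between Bernoulli(p) and Bernoulli(q)."""
    return max(0.0, 1.0 - (sqrt(p * q) + sqrt((1.0 - p) * (1.0 - q))))


def site_divergences(sample_a, sample_b):
    """Divergence at every cytosine covered in both samples ({pos: (meth, unmeth)})."""
    return {
        pos: hellinger_divergence(methylation_level(*sample_a[pos]),
                                  methylation_level(*sample_b[pos]))
        for pos in sample_a.keys() & sample_b.keys()
    }


def empirical_quantile(values, prob):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(prob * len(ordered)))]


def informative_positions(control_a, control_b, treated, prob=0.95):
    """Positions whose control-vs-treatment divergence exceeds the background."""
    background = site_divergences(control_a, control_b).values()
    threshold = empirical_quantile(background, prob)  # simplification, see lead-in
    signal = site_divergences(control_a, treated)
    return sorted(pos for pos, d in signal.items() if d > threshold)


# Hypothetical (methylated, unmethylated) read counts per cytosine position.
control_1 = {101: (2, 18), 150: (10, 10), 230: (18, 2), 310: (1, 19)}
control_2 = {101: (3, 17), 150: (9, 11), 230: (17, 3), 310: (2, 18)}
treated = {101: (15, 5), 150: (10, 10), 230: (18, 2), 310: (1, 19)}

print(informative_positions(control_1, control_2, treated))  # [101]
```

The design choice worth noting is that the threshold is derived from the data (the spread among controls) rather than fixed in advance, so a noisy system raises its own detection bar. A real analysis would use many more sites and replicates than this toy example.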

5 Strategies for Methylome Dataset Validation
The challenge with all genome-wide methylome datasets, regardless of analysis procedure, is validating biological relevance of the methylome repatterning that is detected. In epigenetic memory


studies, validation can involve co-association testing of methylation signals with phenotype as well as generational heritability. Two of the systems that offer experimental feasibility to address these questions are the MSH1 system and epigenetic recombinant inbred lines (epiRILs). In both systems, epigenetic changes are not limited to gene silencing but include more natural quantitative modulation of phenotype, and they each involve de novo emergence of novel phenotypes.

In the case of the MSH1 system, where a heritable epigenetic memory phenotype follows RNAi suppression of the MSH1 gene in plants [82, 84, 96, 97], it is feasible to derive both memory and “non-memory” types as transgene-null segregants from a single MSH1-RNAi line. Thus, full-sibs with and without the memory phenotype emerge from a single parent, which provides the necessary rigor for testing phenotype co-association with identified methylation signal.

Similar testing of phenotype-associated epigenomic changes is feasible in epiRILs derived from a cross of wild type with a methylation mutant [67, 98]. Derived RILs can be used for testing the association of emergent phenotypes with specific changes in features of the methylome. In this case, numerous individual lineages over multiple generations are necessary to produce the RILs, with the potential to introduce unrelated methylome background variation to the study. This background variation could help strengthen the rigor of the test but may also limit resolution to those associations sustainable through the manipulations.

Generational stability available in both the MSH1 and epiRIL systems increases the rigor of tests for association of epigenetic signal with phenotype. At least partial methylome resetting occurs within a lineage each generation following gametophytic reprogramming [10, 99, 100]. Consequently, transgenerational inheritance behavior can reveal associations of methylation patterning with emergent phenotype. 
In the MSH1 and epiRIL systems, both involving heritable epigenetic phenotypes, multi-generational co-association of phenotype variants with specific methylation changes offers a test for potential causal relationships. Still, since many of these epigenomic changes may occur in cell-type- and tissue-specific patterns, high-resolution analysis is necessary to discriminate this variation.

6 Methylation vs. Gene Expression Data
A common feature of all current methylome analysis procedures is poor association of differentially methylated positions (DMPs) with gene expression changes. Unusually weak overlap is generally detected between genomic regions of


methylation repatterning and genes altered in expression, with the exception of TE or gene silencing events [101]. This poor correspondence is consistent with the growing assumption that intragenic methylation has little influence on gene expression. Yet, the most commonly used gene expression assays, which rely on RNAseq data, are inadequate for this type of comparative analysis. Although intragenic methylation changes can occur in a tissue- or cell-type-specific manner and can influence alternative splicing, RNAseq analyses are generally conducted with pooled tissue samples, diluting cell-specific signal and providing poor resolution of alternative splicing. Studies that use dissected tissues, qRT-PCR assays, and detailed spatio-temporal expression analyses would be predicted to increase the association but are generally not implemented. Thus, the numerous examples linking DNA methylation with changes in gene expression often involve single gene studies. One means of addressing this limitation of RNAseq data is to carry out gene network-based analysis on each of the two datasets, methylome and gene expression, prior to cross-comparison. While individual gene overlap between the two datasets may be sparse due to resolution limitations, network-based enrichment analysis serves to identify important methylome-gene expression network intersections. Network identification can then be used to reveal individual genes by more sensitive methodologies that can capture spatio-temporal information.
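The network-based cross-comparison proposed above can be illustrated with a small enrichment calculation. The sketch below is a hedged example, not a published workflow: it uses a one-sided hypergeometric test for over-representation of each gene network in the differentially methylated gene set and in the differentially expressed gene set separately, then intersects the enriched networks. The network names, gene identifiers, universe size, and the uncorrected 0.05 cutoff are all hypothetical, and multiple-testing correction is omitted for brevity.

```python
from math import comb


def enrichment_p(hits, set_size, n_selected, universe):
    """Hypergeometric P(X >= hits): chance of seeing at least `hits` network
    members when `n_selected` genes are drawn from `universe` genes."""
    total = comb(universe, n_selected)
    upper = min(set_size, n_selected)
    tail = sum(comb(set_size, k) * comb(universe - set_size, n_selected - k)
               for k in range(hits, upper + 1))
    return tail / total


def enriched_networks(networks, selected, universe, alpha=0.05):
    """Names of networks over-represented among the selected genes."""
    selected = set(selected)
    enriched = set()
    for name, members in networks.items():
        hits = len(members & selected)
        if hits and enrichment_p(hits, len(members), len(selected), universe) <= alpha:
            enriched.add(name)
    return enriched


def genes(start, stop):
    """Hypothetical gene identifiers g<start> .. g<stop - 1>."""
    return {f"g{i}" for i in range(start, stop)}


UNIVERSE = 1000  # assumed number of annotated genes
networks = {
    "splicing": genes(0, 20),
    "photosynthesis": genes(20, 50),
    "ribosome": genes(50, 90),
}
# Differentially methylated genes and differentially expressed genes (toy data).
dmgs = genes(0, 8) | genes(20, 22) | genes(500, 540)
degs = genes(10, 16) | genes(50, 57) | genes(600, 627)

shared_genes = dmgs & degs
shared_networks = (enriched_networks(networks, dmgs, UNIVERSE)
                   & enriched_networks(networks, degs, UNIVERSE))
print(len(shared_genes), shared_networks)  # 0 {'splicing'}
```

In this toy dataset no single gene appears in both lists, yet both lists are enriched for the same network, which is the situation the text describes: sparse gene-level overlap with a meaningful network-level intersection.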

7 Concluding Comments
Plant methylome research shows that a primary role of cytosine methylation repatterning in plants has been to anchor and inactivate transposable elements as they have emerged within plant genomes. Similarly, methylation plays an important role in heterochromatization at particular genomic intervals during development. However, what appears to be lacking from current plant literature is an accounting of intragenic methylation: the extent to which methylation participates in promoter accessibility and in alternative splicing activity, for example, particularly in a cell- or tissue-specific manner. Early genome-wide methylome studies were sufficiently stringent to filter out gene-associated methylation signals by virtue of their tissue specificity and moderate to low density. In this case, it is possible that significant, albeit subtle, biological signal has gone undetected within whole-genome methylome datasets. The development of more sensitive assays for intragenic methylation behavior, in combination with spatio-temporal expression assays to enhance resolution of gene expression variation during plant development and under conditions of stress, may dramatically alter current thinking about RdDM components in epigenomic


reprogramming, and their association with methylation/demethylation components and histone modifiers to establish short- and long-term (transgenerational) stress memory states and developmental transitions. As current investigations of large magnitude epigenomic signals have provided meaningful insight into gene and chromatin silencing/activating mechanisms, a more nuanced approach to these investigations is likely to shine light on components of phenotypic GxE as well as quantitative variation for trait expression.

Chapter 3

Understanding DNA Methylation Patterns in Wheat

Laura-Jayne Gardiner

Abstract

The bread wheat genome is large (17 Gb), allohexaploid, and highly repetitive (80–90% of the genome), which makes genomic and epigenomic analyses expensive to conduct and a challenge to analyze. Here we provide an overview of recent bioinformatic and experimental methods that have been developed to understand DNA methylation patterns in the complex polyploid genome of wheat.

Key words Wheat, Polyploid, DNA methylation, Genomics, Epigenomics

1 Introduction

1.1 The Use of Epigenetics in Plant Breeding

Bread wheat is the dominant cereal crop grown in temperate countries and is one of the most important crops for human food and livestock feed [1]. It is predicted that by 2050 food production will need to increase by 50% to meet demand, despite competition for high-quality agricultural land, resource limitations, and adverse environmental conditions. Increasing wheat yields is therefore a top priority, and wheat research is a field of increasing importance [2].

In plant breeding, researchers strive to identify alleles linked to favorable traits that can be bred into elite germplasm. Traits of interest range from abiotic or biotic stress resistance, such as disease resistance, to agronomic traits such as height. It is likely that, alongside genetic variation, all forms of genomic change, including epigenetic variation, can contribute to performance variation in crops such as wheat. However, the role of epigenetic variation in wheat improvement is poorly defined despite it being widespread in the wheat genome [3]. We now know that epigenetic variation can be stably inherited and that spontaneous epialleles are rare [4, 5]. Therefore, the contributions of epigenetic variants should be assessed alongside classic genetic variation, and epigenetic variants that are defined and linked to traits could potentially be used as new sources of variation in breeding programs.

In the plant community, an epigenetic trait is defined as a stably heritable phenotype resulting from changes in gene expression without DNA sequence alteration [6]. Epigenetic variation is common between tissues and developmental stages and can be influenced by factors such as environment and disease [7]. Mechanisms for epigenetic control include DNA methylation, histone modification, and non-coding RNA gene silencing.

Cytosine DNA methylation acts as a mechanism of gene expression control. In plants, it typically occurs at CpG sites but can also occur at CHG and CHH sites (non-CpG sites, where H represents adenine, cytosine, or thymine) [8]. In Arabidopsis thaliana and Chinese Spring wheat, most gene body methylation occurs at CpG sites, while methylation elsewhere and in repetitive regions occurs at both CpG and non-CpG sites [9–11]. Methylation in gene promoter regions is thought to inhibit regulatory protein binding and repress transcription; it can also silence transposable elements (TEs), which are typically highly methylated. Conversely, methylation in introns/exons correlates with highly expressed genes [9].

In various crop species, epigenetic variation has been shown to have a major effect on key traits. In cotton, which is also polyploid, methylation differences were observed between homeologous genes. For example, the gene COL2D is activated by loss of methylation in allotetraploid cotton, influencing flowering time in domesticated lines [12]. In maize, the causal gene of a major QTL enhancing resistance to stalk rot, ZmCCT, can have two epigenetic states, one with a TE upstream of the ZmCCT promoter and the other without. This TE is enriched for CG methylation, which suppresses expression and increases disease susceptibility [13]. Tissue culture-induced demethylation of a retrotransposon in the intron of an oil palm gene, DEFICIENS, affects its splicing and causes premature termination [14]. This demethylation contributes to the mantled phenotype that limits clonal propagation of this key global crop.

Similar mechanisms of epigenetic change in gene expression, mediated by retrotransposons adjacent to promoters, have also been noted in wheat, but these mechanisms are as yet poorly defined [15]. Furthermore, differential methylation of homeologous genes in wheat could control gene dosage between the sub-genomes. In previous surveys of methylation in Chinese Spring bread wheat, the majority of methylation was conserved across the sub-genomes, but differential methylation was observed between them, and in promoter regions this was correlated with decreased expression of the loci on the methylated sub-genome [11].

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_3, © Springer Science+Business Media, LLC, part of Springer Nature 2020
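The CpG (CG), CHG, and CHH sequence contexts described in Section 1.1 can be made concrete with a small sketch. The Python below is an illustration only, not part of the chapter's methods: the function names (`context_on_strand`, `classify_cytosines`) are hypothetical, and it assumes an uppercase-convertible sequence containing only A, C, G, and T. It classifies every cytosine on both strands by the two bases that follow it in the 5' to 3' direction.

```python
from typing import List, Optional, Tuple

# Translation table for complementing an A/C/G/T sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def context_on_strand(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of seq, read 5'->3'.

    Returns "CG", "CHG", "CHH", or None if seq[i] is not a cytosine or
    the sequence ends before the context can be decided.
    """
    if seq[i] != "C":
        return None
    downstream = seq[i + 1 : i + 3]
    if downstream[:1] == "G":
        return "CG"
    if len(downstream) < 2:
        return None  # cannot tell CHG from CHH at the sequence end
    # The first downstream base is A, C, or T (that is, H).
    return "CHG" if downstream[1] == "G" else "CHH"


def classify_cytosines(seq: str) -> List[Tuple[int, str, str]]:
    """List (forward-strand position, strand, context) for all cytosines."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    n = len(seq)
    calls = []
    for i in range(n):
        ctx = context_on_strand(seq, i)
        if ctx:
            calls.append((i, "+", ctx))
        # rc[n - 1 - i] is the complement of seq[i] on the reverse strand.
        ctx = context_on_strand(rc, n - 1 - i)
        if ctx:
            calls.append((i, "-", ctx))
    return calls


if __name__ == "__main__":
    for position, strand, context in classify_cytosines("ACGTCAGCTT"):
        print(position, strand, context)
```

Note that CG and CHG sites are symmetric (a cytosine in the same context sits on the opposite strand), whereas CHH sites are not, which is why the sketch reports the two strands separately.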

1.2 Challenges of Epigenomic Analysis in Wheat

The bread wheat genome is one of the largest higher plant genomes at 17 Gb in size. Wheat is allohexaploid, composed of three independently maintained A, B, and D sub-genomes that are functionally diploid and derived from three diploid progenitors (AABBDD): the A sub-genome from Triticum urartu, the B sub-genome from an unknown species related to Aegilops speltoides, and the D sub-genome from Aegilops tauschii. AABB tetraploids (Triticum turgidum) appeared less than 0.5 million years ago, and bread wheat arose from hybridization with the D genome 10,000 years ago [16]. The coding regions of the three homeologous diploid wheat genomes are highly conserved, sharing over 90% homology [17]. The large size of the wheat genome and its high repeat content, alongside its polyploid nature, make genomic analyses expensive to conduct and a challenge to analyze [18].

Analyses of wheat are assisted by recent developments in its reference genome sequence and genomic resources; there is now a near-complete reference sequence for bread wheat based on the reference variety Chinese Spring [18–22]. It is thought that the gene space of wheat is fully represented within this reference, and it contains 21 chromosome-like sequence assemblies that are representative of the 21 chromosomes of wheat. The wheat reference represents hexaploid wheat as three diploid sub-genomes, which simplifies downstream analyses and opens up analysis of the wheat genome to more cost-effective methods of re-sequencing.

While it is possible to re-sequence a single accession of wheat to perform genotyping, this analysis becomes prohibitively expensive when it is scaled up to a population or collection of accessions. Furthermore, profiling DNA methylation at the base pair level across an accession is even more expensive because of the high depth of sequencing required. For methylation profiling, it is common to use bisulfite treatment of DNA prior to sequencing. This treatment deaminates unmethylated cytosines but not methylated cytosines, specifically converting the unmethylated cytosines to uracil residues. When combined with sequencing, bisulfite treatment allows the discrimination of methylated and unmethylated residues (methyl-seq). However, due to the presence of partially or hemi-methylated sites, where we can sometimes observe methylation in […]

[…] 90% conversion denoted as well converted and >98% conversion denoted as near full conversion [48]. In hexaploid wheat, Gardiner et al. [29] reported average bisulfite conversion rates of 98.7% across 104 landraces from the Watkins diversity collection.
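The bisulfite logic described in this section can be sketched in silico. The Python below is a simplified illustration under stated assumptions, not the chapter's pipeline: all function names are hypothetical, reads are assumed to be full-length, gap-free, and aligned to the forward strand of the reference, and the conversion rate is estimated from a control sequence assumed to be completely unmethylated. Converted cytosines are read as T after amplification, so a site's methylation level is the fraction of reads that retain C, which naturally accommodates partially methylated sites.

```python
from typing import Dict, Iterable, List, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Return the read expected after bisulfite treatment and PCR.

    Unmethylated cytosines are deaminated to uracil and read as T;
    cytosines at the positions in `methylated` are protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylation_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Per-cytosine fraction of reads that retain C (forward strand only)."""
    levels = {}
    for i, ref_base in enumerate(reference):
        if ref_base != "C":
            continue
        c_count = sum(1 for read in reads if read[i] == "C")
        t_count = sum(1 for read in reads if read[i] == "T")
        if c_count + t_count:
            levels[i] = c_count / (c_count + t_count)
    return levels


def conversion_rate(control_reference: str, reads: Iterable[str]) -> float:
    """Fraction of control cytosines read as T.

    Assumes the control sequence carries no methylation, so every
    retained C reflects failed conversion.
    """
    converted = total = 0
    for read in reads:
        for ref_base, read_base in zip(control_reference, read):
            if ref_base == "C" and read_base in ("C", "T"):
                total += 1
                converted += read_base == "T"
    return converted / total if total else float("nan")


if __name__ == "__main__":
    reference = "ACGTCCGA"
    # Three molecules methylated at position 1, one fully unmethylated.
    reads = [bisulfite_convert(reference, {1})] * 3
    reads.append(bisulfite_convert(reference, set()))
    print(methylation_levels(reference, reads))

    control = "CCAC"
    control_reads = ["TTAT", "TTAT", "TTAT", "TCAT"]
    print(f"{conversion_rate(control, control_reads):.3f}")
```

In practice a dedicated bisulfite aligner and methylation caller handles both strands, mismatches, and sequencing errors; the sketch only shows why incomplete conversion inflates apparent methylation and therefore needs to be measured.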

Acknowledgments

This chapter was planned and written by L. J. G. with editorial assistance from P. M. and C. S. The research presented to develop this protocol was supported by the BBSRC via an ERA-CAPS grant (BB/N005104/1) (L.G.) and a BBSRC grant BB/L011786/1 (L.O.).

References

1. Shewry PR (2009) Wheat. J Exp Bot 60:1537–1553
2. Allen AM, Barker GL, Berry ST et al (2011) Transcript-specific, single-nucleotide polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnol J 9:1086–1099
3. Springer NM, Schmitz RJ (2017) Exploiting induced and natural epigenetic variation for crop improvement. Nat Rev Genet 18:563–575
4. Johannes F, Porcher E, Teixeira FK et al (2009) Assessing the impact of transgenerational epigenetic variation on complex traits. PLoS Genet 5:e1000530
5. Hofmeister BT, Lee K, Rohr NA, Hall DW, Schmitz RJ (2017) Stable inheritance of DNA methylation allows creation of epigenotype maps and the study of epiallele inheritance patterns in the absence of genetic variation. Genome Biol 18:155
6. Wolffe AP, Matzke MA (1999) Epigenetics: regulation through repression. Science 286:481–486
7. Finnegan EJ (2002) Epialleles - a source of random variation in times of stress. Curr Opin Plant Biol 5:101–106
8. Finnegan EJ, Genger RK, Peacock WJ, Dennis ES (1998) DNA methylation in plants. Ann Rev Plant Physiol Plant Mol Biol 49:223–247
9. Zhang X, Yazaki J, Sundaresan A et al (2006) Genome-wide high-resolution mapping and functional analysis of DNA methylation in Arabidopsis. Cell 126:1189–1201
10. Cokus SJ, Feng S, Zhang X et al (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452:215–219
11. Gardiner L, Quinton-Tulloch M, Olohan L et al (2015) A genome-wide survey of DNA methylation in hexaploid wheat. Genome Biol 16:273
12. Song Q, Zhang T, Stelly DM, Chen ZJ (2017) Epigenomic and functional analyses reveal roles of epialleles in the loss of photoperiod sensitivity during domestication of allotetraploid cottons. Genome Biol 18:99
13. Wang C, Yang Q, Wang W et al (2017) A transposon-directed epigenetic change in ZmCCT underlies quantitative resistance to Gibberella stalk rot in maize. New Phytol 215:1503–1515
14. Ong-Abdullah M, Ordway JM, Jiang N et al (2015) Loss of Karma transposon methylation underlies the mantled somaclonal variant of oil palm. Nature 525:533–537
15. Kashkush K, Feldman M, Levy A (2002) Transcriptional activation of retrotransposons alters the expression of adjacent genes in wheat. Nat Genet 33:102–106
16. Marcussen T, Sandve SR, Heier L et al (2014) Ancient hybridizations among the ancestral genomes of bread wheat. Science 345:1250092
17. Kawaura K, Mochida K, Enju A et al (2009) Assessment of adaptive evolution between wheat and rice as deduced from full-length common wheat cDNA sequence data and expression patterns. BMC Genomics 10:271
18. Brenchley R, Spannagl M, Pfeifer M et al (2012) Analysis of the bread wheat genome using whole-genome shotgun sequencing. Nature 491:705–710
19. IWGSC (2018) Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361(6403):eaar7191
20. Chapman JA, Mascher M, Buluç A et al (2015) A whole-genome shotgun approach for assembling and anchoring the hexaploid bread wheat genome. Genome Biol 16:26
21. Clavijo BJ, Venturini L, Schudoma C et al (2017) An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. Genome Res 27:885–896
22. Zimin AV, Puiu D, Hall R, Kingan S, Clavijo BJ, Salzberg SL (2017) The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. Gigascience 6(11):1–7
23. Olohan L, Gardiner LJ, Lucaci A et al (2018) A modified sequence capture approach allowing standard and methylation analyses of the same enriched genomic DNA sample. BMC Genomics 19:250
24. Winfield MO, Wilkinson PA, Allen AM et al (2012) Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol J 10:733–742
25. Krasileva KV, Vasquez-Gross HA, Howell T et al (2017) Uncovering hidden variation in polyploid wheat. PNAS 114(6):E913
26. Jordan KW, Wang S, Lun Y et al (2015) A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome Biol 16:48
27. Gardiner LJ, Bansept-Basler P, Olohan L et al (2016) Mapping-by-sequencing in complex polyploid genomes using genic sequence capture: a case study to map yellow rust resistance in hexaploid wheat. Plant J 87:403–419
28. Grewal S, Gardiner L, Ndreca B, Knight E, Moore G, King I, King J (2017) Comparative mapping and targeted-capture sequencing of the gametocidal loci in Aegilops sharonensis. Plant Genome 10. https://doi.org/10.3835/plantgenome2016.09.0090
29. Gardiner LJ, Joynson R, Omony J et al (2018) Hidden variation in polyploid wheat drives local adaptation. Genome Res 28:1319–1332
30. Steuernagel B, Periyannan SK, Hernández-Pinzón I et al (2016) Rapid cloning of disease-resistance genes in plants using mutagenesis and sequence capture. Nat Biotechnol 34:652–655
31. Gardiner LJ, Brabbs T, Akhunov A et al (2018) Integrating genomic resources to present full gene and promoter capture probe sets for bread wheat. Gigascience 8. https://doi.org/10.1093/gigascience/giz018
32. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1756
33. Langmead B, Salzberg S (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
34. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map (SAM) format and SAMtools. Bioinformatics 25:2078–2079
35. McKenna A, Hanna M, Banks E et al (2010) The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20:1297–1303
36. Krueger F, Andrews SR (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27:1571–1572
37. Chen P, Cokus S, Pellegrini M (2010) BS Seeker: precise mapping for bisulfite sequencing. BMC Bioinformatics 11:203
38. Xi Y, Li W (2009) BSMAP: whole genome bisulfite sequence MAPping program. BMC Bioinformatics 10:232
39. Guo W, Zhu P, Pellegrini M, Zhang MQ, Wang X, Ni Z (2018) CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data. Bioinformatics 34:381–387
40. Song Q, Chen ZJ (2015) Epigenetic and developmental regulation in plant polyploids. Curr Opin Plant Biol 24:101–109
41. Ramírez-González RH, Borrill P, Lang D et al (2018) The transcriptional landscape of polyploid wheat. Science 361:eaar6089
42. Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A, Mason CE (2012) methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13:R87
43. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL (2004) Versatile and open software for comparing large genomes. Genome Biol 5:R12
44. Gao S, Zou D, Mao L et al (2015) BS-SNPer: SNP calling in bisulfite-seq data. Bioinformatics 31:4006–4008
45. Kim D, Langmead B, Salzberg SL (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12:357–360
46. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536
47. Fojtová M, Kovařík A, Matyášek R (2001) Cytosine methylation of plastid genome in higher plants. Fact or artefact? Plant Sci 160:585–593
48. Genereux DP, Johnson WC, Burden AF, Stöger R, Laird CD (2008) Errors in the bisulfite conversion of DNA: modulating inappropriate- and failed-conversion frequencies. Nucleic Acids Res 36:e150

Chapter 4
MCSeEd (Methylation Context Sensitive Enzyme ddRAD): A New Method to Analyze DNA Methylation
Marco Di Marsico, Elisa Cerruti, Cinzia Comino, Andrea Porceddu, Alberto Acquadro, Stefano Capomaccio, Gianpiero Marconi, and Emidio Albertini

Abstract
Methylation context sensitive enzyme ddRAD (MCSeEd) is an NGS-based method for genome-wide investigation of DNA methylation in different sequence contexts that requires only low to moderate sequencing depth. It is particularly useful for identifying methylation changes in experimental systems challenged by biotic or abiotic stresses or sampled at different developmental stages.

Key words DNA methylation, Methylation context sensitive enzyme ddRAD, MCSeEd, Sequencing, Regulation of gene expression, Development, Zea mays

1 Introduction
In plants and mammals, DNA methylation is highly conserved and contributes to the regulation of nuclear gene expression and to genome stability. For example, specific patterns of genomic methylation are fundamental for organism development and for a rapid response to changing environments and to biotic and abiotic stresses [1]. DNA methylation usually occurs at the 5′ position of cytosine [2], and in plants it occurs in the CG, CHG, and CHH contexts, where H represents A, T, or C [3, 4]. DNA methylation in promoter regions has important roles in regulating gene expression, leading to inhibition or activation of gene transcription [5, 6]. In particular, CG methylation is usually associated with gene regulation [7], whereas CHG and CHH methylation marks are mainly involved in the regulation of transposable elements and, in some cases, of

Marco Di Marsico and Elisa Cerruti should both be considered first authors.
Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_4, © Springer Science+Business Media, LLC, part of Springer Nature 2020


genes adjacent to these [3, 8]. DNA methylation may also control growth and development throughout the life cycle of plants [9, 10]. Abnormal loss of DNA methylation, especially in repetitive elements, may lead to their mobilization and, eventually, to genome destabilization [11]. Furthermore, DNA methylation can be altered at individual loci in response to environmental stress, and the "memory" of the stress can, in some cases, be transmitted to the progeny of stressed plants [12].

One of the most widely used methods for DNA methylation analysis is whole-genome bisulfite sequencing (WGBS), which produces high-resolution genome-wide DNA methylome maps [13]. The main drawbacks of WGBS are that it requires very high sequencing coverage (at least 30×) and a reference genome sequence, and that it is computationally demanding [14]. Methods requiring lower sequencing depth have been developed recently. One example is MREseq (methylation-sensitive restriction enzyme sequencing), which detects methylation by digesting genomic DNA with at least three methylation-sensitive restriction endonucleases. The enzymes are chosen for their ability to cut only when the CG in their recognition site is unmethylated; their activity is blocked when the same site is methylated [11].

With the aim of extending the MREseq method to detect methylation in all cytosine contexts (CG, CHG, CHH), we developed MCSeEd (methylation context sensitive enzyme ddRAD) [15], a very simple, highly scalable, and cost-effective extension of the original ddRAD protocol [16] that allows the detection of methylation changes in the CG, CHG, CHH, and 6mA contexts. Briefly, the MCSeEd method combines a double digestion with a methylation-sensitive restriction enzyme (AciI, PstI, or EcoT22I for the CG, CHG, or CHH context, respectively) and a methylation-insensitive restriction enzyme (MseI) (Table 1). Since the methylation-sensitive enzymes cannot digest methylated sites, the read counts at restriction sites are expected to anticorrelate with genomic methylation levels. MCSeEd allows both DNA methylation changes and SNPs to be called after demultiplexing highly multiplexed libraries. In addition, since MCSeEd does not modify the DNA sequence, it does not require a reference genomic sequence to infer the methylated positions.

We tested the MCSeEd technique by detecting genome-wide changes in DNA methylation occurring during maize development, using shoot (S) and root (R) samples of the B73 line collected 5 days after germination. A total of 27,525 positions that showed a change in DNA methylation between roots and shoots were referred to as differentially methylated positions (DMPs). Next, we analyzed the genomic distribution of DMPs. A total of 216, 90, and 16 differentially methylated regions (DMRs) - defined as genomic regions with


Table 1 Characteristics of restriction enzymes used in the MCSeEd method

Enzyme    Recognition site   Methyl-sensitive?   Cleavage blocked by       Methyl-context
AciI      CCGC/GCGG          Yes                 C(5mC)GC / G(5mC)GG       CG
PstI      CTGCAG             Yes                 CTG(5mC)AG                CHG
EcoT22I   ATGCAT             Yes                 ATG(5mC)AT                CHH
MseI      TTAA               No                  -                         -

co-regulated methylation changes upon drought stress as identified by an adjacent window approach that targeted adjacent DMPs with concordant methylation changes - were scored for CG, CHG, and CHH contexts, respectively. For these analyses we developed a dynamic approach based on Perl scripts produced in house.

To confirm the accuracy of the MCSeEd technique, the inferred methylation patterns were compared with detailed methylation maps previously constructed by WGBS. Normalized numbers of reads interrogating cytosines in the CG, CHG, and CHH contexts from the MCSeEd shoot samples were compared with BS-seq scores (numbers of reads for mC in the CG, CHG, and CHH contexts) from two public shoot WGBS datasets used as benchmark data [17], according to Maunakea et al. [18]. We found a negative correlation for the CG (Spearman correlation −0.506, P-value = 2.2e−16) and CHG contexts (Spearman correlation −0.517, P-value = 2.2e−16, Fig. 4), whereas for CHH no correlation was observed (Spearman correlation 0.012, P-value = 1.1e−09, Fig. 4), probably due to the lower number of read counts. Moreover, both DMPs and DMRs were used to run a principal component analysis and were able to discriminate shoot from root samples for all evaluated contexts (Fig. 1).

DMRs in the symmetric contexts (CG, CHG) were found to be enriched in gene bodies and in the regions 0–2 kb upstream of the TSS (transcription start site) and downstream of the TTS (transcription termination site), whereas for the asymmetric context (CHH) no significant enrichment was found (data not shown). Using the genomic annotation, we identified the genes that overlapped with DMPs and/or DMRs, which we refer to as differentially methylated genes (DMGs). Gene ontology analysis showed that most DMGs were related to development and/or regulation of biological processes, underlining the association of DNA methylation with developmental differences between roots and shoots. By way of example, some DMGs are reported in Table 2.
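The adjacent window idea used to define DMRs can be illustrated with a minimal sketch. This is not the authors' in-house Perl implementation: the `DMP` record, the `call_dmrs` function, and the `max_gap` and `min_sites` values are hypothetical choices made only to show how neighbouring DMPs with concordant changes might be merged into regions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMP:
    chrom: str
    pos: int
    delta: float  # methylation change between conditions; the sign gives the direction


def call_dmrs(dmps, max_gap=1000, min_sites=2):
    """Merge neighbouring DMPs that change in the same direction into regions.

    max_gap and min_sites are illustrative values, not published settings.
    Returns a list of (chrom, start, end, n_sites) tuples.
    """
    regions = []
    current = []

    def flush():
        # Keep the open window only if it holds enough concordant sites.
        if len(current) >= min_sites:
            regions.append((current[0].chrom, current[0].pos, current[-1].pos, len(current)))

    for dmp in sorted(dmps, key=lambda d: (d.chrom, d.pos)):
        if current:
            last = current[-1]
            adjacent = dmp.chrom == last.chrom and dmp.pos - last.pos <= max_gap
            concordant = (dmp.delta > 0) == (last.delta > 0)
            if adjacent and concordant:
                current.append(dmp)
                continue
            flush()
        current = [dmp]
    flush()
    return regions


example = [
    DMP("chr1", 100, 0.4), DMP("chr1", 400, 0.3), DMP("chr1", 900, 0.5),
    DMP("chr1", 1500, -0.2), DMP("chr1", 1800, -0.6),
    DMP("chr1", 9000, 0.7), DMP("chr2", 50, 0.1),
]
dmrs = call_dmrs(example)
```

In this toy input, the three hypermethylated sites at the start of chr1 form one region and the two hypomethylated sites form a second one, while isolated DMPs are left as single positions.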


Fig. 1 PCA analysis performed with DMPs

Table 2 Genes differentially methylated in shoots vs. roots

DMG              DMR    Function                                                                         Δ exp (a)
Zm00001d036982   Down   Role in senescence and seed development                                          2.41
Zm00001d048527   Down   Regulates meristem initiation at lateral positions                               1.27
Zm00001d018260   Down   Involved in leaf development and expressed in root, shoot, and flower            2.73
Zm00001d044806   Gene   Plays a role in xylem differentiation downstream of auxin                        2.96
Zm00001d002485   Down   Encodes a microtubule-associated protein. Putative role in flower development    1.27
Zm00001d045359   Up     Involved in xylem and phloem pattern formation                                   1.23

Up: 0–2 kb upstream of the TSS; Gene: gene body; Down: 0–2 kb downstream of the TTS
(a) Differential expression (fold change)

2 Materials

2.1 DNA Extraction

1. Water bath or heating block set to 65 °C.
2. Genomic DNA extraction kit (we use the DNeasy Plant Mini Kit (Qiagen), with Buffer AP1 and Buffer AW1 warmed to 65 °C for 5 min to dissolve any precipitates that have formed during storage; ethanol is added to Buffers AW and AP3/E, and RNase A to Buffer AP1, immediately before use).
3. Ultra-pure water warmed to 65 °C.
4. Liquid nitrogen.
5. Spectrophotometer (e.g., NanoDrop, ThermoFisher).


6. Fluorometer for DNA quantification (e.g., Qubit, ThermoFisher), with appropriate assay tubes and assay kit (e.g., Qubit dsDNA HS Assay kit).
7. Agarose.
8. Ethidium bromide (or other DNA stain).
9. 50× TAE buffer: 2 M Tris base (242.2 g/L), 50 mM EDTA (100 mL of 0.5 M stock solution/L), 1 M acetic acid (57.1 mL of 100% acetic acid/L); dilute to 1× for use (see Note 1).
10. DNA ladder (e.g., GeneRuler, ThermoFisher).
11. Gel electrophoresis system.
12. Gel imaging system.

2.2 Primer and Adapter Preparation

1. Barcoded adapters (Table 3: from P1 to P12) obtained by annealing single-stranded oligos (TOP and BOT).
2. Common MseI adapter obtained by annealing single-stranded oligos (TOP and BOT).
3. Ultra-pure water.

2.3 Restriction/Ligation Reaction

1. Genomic template DNA: 100–200 ng.
2. EcoT22I restriction enzyme 10 U/μL (Takara).
3. MseI restriction enzyme 10 U/μL (e.g., New England BioLabs).
4. AciI restriction enzyme 10 U/μL.
5. PstI restriction enzyme 20 U/μL.
6. Unique barcoded adapter for each sample.
7. A common adapter.
8. T4 DNA ligase 5 U/μL.
9. 10 mM ATP.
10. 5× RL buffer (CutSmart): 5× CutSmart Buffer, 25 mM DTT (for the AciI, EcoT22I, and MseI enzymes).
11. 5× RL buffer (3.1): 5× 3.1 buffer, 25 mM DTT (for the PstI enzyme).
12. Thermal cycler for incubation.

2.4 Purification with PEG8000

1. 1 M MgCl2.
2. 3 g PEG8000 (Merck).
3. 0.2 μm filters (e.g., Corning).

2.5 Purification with AMpure Beads

1. Freshly prepared 70% ethanol.
2. Ultra-pure water for DNA elution.


Table 3 List of single-stranded oligos used to obtain double-stranded adapters and to perform enrichment PCRs

Barcoded adapters
P1       PstI_P1_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTGCTACAATGCA
         PstI_P1_BOT     [Phos]TTGTAGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P2       PstI_P2_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTAGCATCATGCA
         PstI_P2_BOT     [Phos]TGATGCTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P3       PstI_P3_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCATCCATGCA
         PstI_P3_BOT     [Phos]TGGATGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P7       AciI_P7_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCACCGT
         AciI_P7_BOT     [Phos]CGACGGTGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P8       AciI_P8_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTCGTGCGA
         AciI_P8_BOT     [Phos]CGTCGCACGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P9       AciI_P9_TOP     ACACTCTTTCCCTACACGACGCTCTTCCGATCTGAACAAT
         AciI_P9_BOT     [Phos]CGATTGTTCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P10      EcoT_P10_TOP    ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCGATATTGCA
         EcoT_P10_BOT    [Phos]ATATCGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P11      EcoT_P11_TOP    ACACTCTTTCCCTACACGACGCTCTTCCGATCTGACACGTTGCA
         EcoT_P11_BOT    [Phos]ACGTGTCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT
P12      EcoT_P12_TOP    ACACTCTTTCCCTACACGACGCTCTTCCGATCTTCGGTTCTGCA
         EcoT_P12_BOT    [Phos]GAACCGAAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT

Common adapter
MseI     MseI_TOP        GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTG
         MseI_BOT        [Phos]TACAGATCGGAAGAGCGAGAACAA

PCR primers
Common   PCR1_MCSeEd     AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACG
Index 2  PCR2_Index_2    CAAGCAGAAGACGGCATACGAGATACATCGGTGACTGGAGTTCAGACGTGTGC
Index 4  PCR2_Index_4    CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGC

3. Agencourt AMPure beads (Beckman).
4. Agencourt SPRIPlate Magnetic Plate (Beckman).

2.6 Size Selection and Gel Extraction

1. 1.2% agarose gel.
2. 20 μL of RL and 4 μL of xylene cyanol as loading dye.
3. DNA ladder.
4. Gel electrophoresis system.


5. Gel imaging system.
6. Gel extraction kit (e.g., QIAquick, Qiagen, with Elution Buffer preheated to 50 °C).
7. Qubit dsDNA HS Assay kit (range 0.1–100 ng).

2.7 PCR Enrichment

1. High-fidelity DNA polymerase kit (we use the Phusion system (New England BioLabs)); 1 U of Phusion Taq polymerase.
2. 5 mM dNTP.
3. 10 μM PCR1_MCSeEd primer (see Table 3).
4. 10 μM PCR2_Index_2/4 primer (see Table 3).

3 Methods

3.1 DNA Extraction

High-quality DNA without signs of degradation or denaturation is required: this is an important criterion for the success of the library preparation.

1. Collect shoots and roots from maize seedlings, freeze in liquid nitrogen, and store at −80 °C.
2. Grind each sample under liquid nitrogen using a mortar and pestle.

Fig. 2 Visualization of genomic DNA. S shoot, M marker, R root


Table 4 Adapter annealing temperature profile

Step           Setting
Denaturation   95 °C, 3 min
Ramp           −0.5 °C every 10 s
Annealing      70 °C, 10 min
Ramp           −0.5 °C every 10 s
Annealing      20 °C, 20 min
Hold           10 °C
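As a rough planning aid, the duration of the annealing program can be estimated from the profile in Table 4. The sketch below assumes that both ramps step down by 0.5 °C every 10 s between the listed plateaus and ignores the final hold; `ramp_seconds` is a hypothetical helper name.

```python
def ramp_seconds(start_c, end_c, step_c=0.5, step_s=10):
    """Time needed to move between two temperatures in fixed steps."""
    n_steps = abs(start_c - end_c) / step_c
    return n_steps * step_s


# Plateaus and ramps in the order listed in Table 4 (all values in seconds).
profile_s = [
    3 * 60,                # denaturation at 95 C
    ramp_seconds(95, 70),  # first ramp, 95 C down to 70 C
    10 * 60,               # annealing at 70 C
    ramp_seconds(70, 20),  # second ramp, 70 C down to 20 C
    20 * 60,               # annealing at 20 C
]
total_min = sum(profile_s) / 60
```

Under these assumptions the program takes just under one hour before the 10 °C hold.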

Fig. 3 Lanes (A–D): single-stranded oligos; lanes (E–H ): annealed adapters; M marker

3. Use the resulting powder to extract DNA with the kit, following the manufacturer's instructions, and elute the DNA in 150 μL of ultra-pure water (see Note 2).
4. Quantify the DNA with a NanoDrop, using 2 μL of DNA solution.
5. To check DNA quality, load 5 μL of each sample on a 1.2% agarose gel, including a molecular weight marker in at least one lane (Fig. 2, see Note 3).

3.2 Adapter Preparation

Libraries are prepared by ligating DNA fragments to double-stranded barcoded adapters, prepared as follows:

1. Dilute the lyophilized single-stranded oligos to 100 μM in ultra-pure water.
2. Mix each single-stranded oligo (TOP) with its complementary oligo (BOT) in a 1:1 ratio (i.e., 25 μL of each oligo, giving a final concentration of 50 μM; Table 3).
3. Anneal the TOP and BOT oligos in the thermocycler following the temperature profile reported in Table 4.
4. Run the annealed adapters on a 4% agarose gel to verify the efficacy of the procedure (Fig. 3).
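Before ordering or annealing oligos, it can be useful to confirm in silico that each BOT oligo is complementary to its TOP partner and to see which single-stranded overhang the annealed adapter exposes. The sketch below uses the P1 and P7 sequences from Table 3 (the 5′ phosphate annotation is omitted); `revcomp` and `adapter_overhang` are hypothetical helper names, and the check only covers the two overhang geometries found in that table.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def adapter_overhang(top, bot):
    """Return (description, overhang) for an annealed TOP/BOT pair, or None."""
    # Case 1: TOP is longer, so its 3' end stays single-stranded.
    k = len(top) - len(bot)
    if k > 0 and bot == revcomp(top[:-k]):
        return ("3' overhang on TOP", top[-k:])
    # Case 2: BOT is longer, so its 5' end stays single-stranded.
    k = len(bot) - len(top)
    if k > 0 and bot[k:] == revcomp(top):
        return ("5' overhang on BOT", bot[:k])
    return None


pst_p1 = adapter_overhang(
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCTGCTACAATGCA",
    "TTGTAGCAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
)
aci_p7 = adapter_overhang(
    "ACACTCTTTCCCTACACGACGCTCTTCCGATCTCCACCGT",
    "CGACGGTGGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
)
```

For these two pairs the check reports a TGCA overhang on the PstI adapter and a CG overhang on the AciI adapter. A result of `None` would indicate a pair that does not anneal in either of the two expected ways and is worth rechecking.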

3.3 Double Digestion and Adapter Ligation

MCSeEd is a reduced representation library method. Therefore, genomic DNA is digested with a methylation-sensitive and a methylation-insensitive restriction enzyme; fragments are then ligated to a barcoded adapter at the methylation-sensitive restriction site and a common adapter at the methylation-insensitive site.
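The fragment selection described above can be explored in silico before going to the bench. The following is a simplified sketch, not part of the published pipeline: it locates the recognition sites listed in Table 1 in a sequence and reports intervals bounded by one methylation-sensitive site and one MseI site. It works with site start coordinates rather than exact cut positions and ignores methylation status; `find_sites`, `mixed_fragments`, and the toy sequence are hypothetical.

```python
import re

# Recognition sites from Table 1; AciI is non-palindromic, so both strands are listed.
SITES = {
    "AciI": ("CCGC", "GCGG"),
    "PstI": ("CTGCAG",),
    "EcoT22I": ("ATGCAT",),
    "MseI": ("TTAA",),
}


def find_sites(seq, enzymes):
    """Return sorted (position, enzyme) pairs for every recognition site found."""
    hits = []
    for enzyme in enzymes:
        for motif in SITES[enzyme]:
            # A lookahead pattern also reports overlapping occurrences.
            for match in re.finditer(f"(?={motif})", seq):
                hits.append((match.start(), enzyme))
    return sorted(hits)


def mixed_fragments(seq, sensitive, insensitive="MseI"):
    """Intervals between neighbouring sites that belong to different enzymes."""
    hits = find_sites(seq, (sensitive, insensitive))
    return [
        (p1, p2, p2 - p1)
        for (p1, e1), (p2, e2) in zip(hits, hits[1:])
        if e1 != e2
    ]


toy = "GG" + "CTGCAG" + "A" * 10 + "TTAA" + "C" * 5 + "TTAA" + "G" * 8 + "CTGCAG"
fragments = mixed_fragments(toy, "PstI")
```

Only intervals with two different ends are kept because only those fragments can carry both a barcoded adapter and the common adapter.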


Table 5 Experimental design for barcoded adapters

Aci_root_P7     Pst_root_P1     Eco_root_P10
Aci_root_P8     Pst_root_P2     Eco_root_P11
Aci_root_P9     Pst_root_P3     Eco_root_P12
Aci_shoot_P7    Pst_shoot_P1    Eco_shoot_P10
Aci_shoot_P8    Pst_shoot_P2    Eco_shoot_P11
Aci_shoot_P9    Pst_shoot_P3    Eco_shoot_P12

Table 6 Restriction-ligation reaction mix for AciI and EcoT22I nucleases

Reagent               Working dilution   Stock volume (μL)   Final concentration
RL buffer CutSmart    5×                 10                  1×
AciI or EcoT22I       10 U/μL            0.5                 5 U
MseI                  10 U/μL            0.5                 5 U
MseI adapter          50 μM              2                   2 μM
ATP                   10 mM              1                   0.2 mM
T4 DNA ligase         5 U/μL             0.2                 1 U
Barcoded adapter      50 μM              2                   2 μM
Ultra-pure water      -                  6.8                 -
Genomic DNA           -                  27                  100–200 ng
Final volume                             50
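The volumes in Table 6 can be checked and scaled with a few lines of arithmetic. The sketch below assumes that the stock volumes are in microlitres, treats the barcoded adapter and the genomic DNA as per-sample additions that stay out of the master mix, and uses a 10% pipetting overage; the overage and the helper names `final_concentration` and `master_mix` are illustrative choices, not part of the protocol.

```python
import math

# Per-reaction volumes (microlitres) transcribed from Table 6.
TABLE6_UL = {
    "5x RL buffer (CutSmart)": 10.0,
    "AciI or EcoT22I (10 U/uL)": 0.5,
    "MseI (10 U/uL)": 0.5,
    "MseI adapter (50 uM)": 2.0,
    "ATP (10 mM)": 1.0,
    "T4 DNA ligase (5 U/uL)": 0.2,
    "barcoded adapter (50 uM)": 2.0,
    "ultra-pure water": 6.8,
    "genomic DNA": 27.0,
}
# Components that differ between samples and are added to each tube separately.
PER_SAMPLE = {"barcoded adapter (50 uM)", "genomic DNA"}


def final_concentration(stock, volume_ul, total_ul):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock * volume_ul / total_ul


def master_mix(mix, n_samples, overage=0.10):
    """Scale the shared components for n samples plus a pipetting overage."""
    factor = n_samples * (1 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in mix.items()
        if name not in PER_SAMPLE
    }


total_ul = sum(TABLE6_UL.values())
adapter_um = final_concentration(50, 2.0, total_ul)  # barcoded adapter, in uM
atp_mm = final_concentration(10, 1.0, total_ul)      # ATP, in mM
mix_for_six = master_mix(TABLE6_UL, n_samples=6)
assert math.isclose(total_ul, 50.0)
```

The same functions apply to the PstI mix if the enzyme volumes are replaced with the values used for that reaction.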

1. Prepare a 5× RL buffer (see Note 4) appropriate for both the sensitive and the insensitive restriction enzymes: for AciI, MseI, and EcoT22I use CutSmart Buffer, while for PstI use the 3.1 buffer; both buffers are provided with the respective enzymes (see Note 5).
2. Add 500 μL of 10× buffer (to a final concentration of 5×), 25 μL of 1 M DTT (to a final concentration of 25 mM), and 475 μL of water.
3. DNA libraries are separately double-digested using a methylation-sensitive enzyme (either AciI, PstI, or EcoT22I) in combination with MseI as the methylation-insensitive enzyme (see Note 6). According to the experimental design, DNA double digestion and adapter ligation are performed in a one-step


Table 7 Restriction-ligation reaction mix for PstI nuclease

Reagent               Working dilution   Stock volume (μL)   Final concentration
RL buffer 3.1         5×                 10                  1×
PstI                  20 U/μL            0.3                 6 U
MseI                  10 U/μL            0.7                 7 U
MseI adapter          50 μM              2                   2 μM
ATP                   10 mM              1                   0.2 mM
T4 DNA ligase         5 U/μL             0.2                 1 U
Barcoded adapter      50 μM              2                   2 μM
Ultra-pure water      -                  6.8                 -
Genomic DNA           -                  27                  100–200 ng
Final volume                             50

Table 8 Individuals grouped by index

Index     Pooled libraries
Index 2   Aci_root_P7, Aci_root_P8, Aci_root_P9; Pst_root_P1, Pst_root_P2, Pst_root_P3; Eco_root_P10, Eco_root_P11, Eco_root_P12
Index 4   Aci_shoot_P7, Aci_shoot_P8, Aci_shoot_P9; Pst_shoot_P1, Pst_shoot_P2, Pst_shoot_P3; Eco_shoot_P10, Eco_shoot_P11, Eco_shoot_P12

reaction. Therefore, first prepare the experimental design for barcoded adapters as reported in Table 5. Prepare the restriction-ligation mixes (see Notes 2 and 3) as reported in Tables 6 and 7.
4. Incubate samples in a thermal cycler for 4 h at 37 °C to perform the restriction-ligation reactions; the products contain fragments of different sizes.

3.4 Purification with PEG8000

1. Combine the ligated DNA (50 μL, now barcoded as in Subheading 3.3) of several samples to create a pool of individuals to be amplified with a common index (see example in Table 8).
2. Prepare fresh 30% PEG8000/30 mM MgCl2: add 300 μL of 1 M MgCl2 to 6 mL of ultra-pure water in a Falcon tube (a 50-mL tube is preferable because PEG is very difficult to resuspend). Add 3 g of PEG8000. Bring the volume to 10 mL with sterile ultra-pure water. When the solution is clear, filter it through a 0.2-μm filter. The PEG solution must be freshly prepared.


3. Dispense 0.5× volume of PEG solution into each sample.
4. Pipette up and down at least 20 times.
5. Centrifuge at 10,000 × g for 20 min at room temperature.
6. Remove the liquid phase quickly.
7. Resuspend the DNA pellet in 30 μL of ultra-pure water (see Note 7).

3.5 Purification with AMpure Beads (1.1×) to Remove Fragments Shorter Than 250 bp

1. Vortex the stock AMpure solution to resuspend the beads.
2. Add 33 μL of AMpure solution to 30 μL of sample.
3. Mix by pipetting at least 10 times.
4. Incubate samples for 5 min at room temperature.
5. Place samples on the magnetic plate for at least 5 min to separate the beads from the solution.
6. Discard the cleared supernatant (perform this step on the magnetic plate).
7. Dispense 100 μL of 70% ethanol into each sample and incubate for 30 s at room temperature (perform this step on the magnetic plate).
8. Repeat the washing step twice.
9. After the final wash, leave the bead rings on the magnetic plate until the ethanol has completely evaporated; do not overdry the bead rings. This step requires 8–10 min (see Note 8).
10. Remove samples from the magnetic plate.
11. Add 20 μL of ultra-pure water.
12. Mix by pipetting at least 25 times.
13. Incubate for 5 min at room temperature.
14. Place samples on the magnetic plate, take the supernatant, and transfer it to a new tube.
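The bead volume in step 2 follows directly from the beads-to-sample ratio in the subheading. A minimal sketch of that arithmetic is given below, with hypothetical helper names, for cases in which the recovered sample volume differs from 30 μL.

```python
def bead_volume_ul(sample_ul, ratio):
    """Volume of bead suspension for a given beads-to-sample ratio."""
    return round(sample_ul * ratio, 1)


def bead_ratio(bead_ul, sample_ul):
    """Ratio implied by a given pair of bead and sample volumes."""
    return round(bead_ul / sample_ul, 2)


beads_for_30 = bead_volume_ul(30, 1.1)  # the case described in step 2
ratio_check = bead_ratio(33, 30)
```

Keeping the ratio constant, rather than the absolute volume, is what preserves the intended size cutoff when the sample volume changes.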

3.6 Size Selection and Gel Purification

Illumina sequencing requires DNA fragments between 250 and 600 bp to work properly (Fig. 4).

1. Prepare a 1.2% agarose gel (50 mL volume) containing 5 μL of ethidium bromide.
2. Add 4 μL of xylene cyanol to the AMpure bead-purified samples and load them on the gel.
3. Load a molecular weight marker.
4. Run the gel for 30 min at 50 V, checking the gel under a UV lamp to verify the presence of the RL smear.
5. Add additional ethidium bromide directly to the TAE buffer if required.
6. Run the gel for an additional 15 min.


Fig. 4 Size selection step showing the excised part of the gel corresponding to fragments with sizes between 250 and 600 bp

7. Excise DNA fragments within the range 250–600 bp.
8. Perform gel purification with the QIAquick Gel Extraction Kit following the manufacturer's instructions with one modification: use elution buffer heated to 50 °C instead of room-temperature elution buffer. Pipette the elution buffer onto the center of the membrane, incubate for 1 min at room temperature, and then centrifuge at 16,000 × g for 1 min.

3.7 Purification with 0.8× AMpure Beads

1. Vortex the stock AMpure solution to resuspend the beads.
2. Add 38.4 μL of AMpure solution to 50 μL of sample.
3. Mix by pipetting at least 10 times.
4. Incubate samples for 5 min at room temperature.
5. Place samples on the magnetic plate for at least 5 min to separate the beads from the solution.
6. Discard the cleared supernatant (perform this step on the magnetic plate).
7. Wash the samples by adding 100 μL of 70% ethanol to each sample and incubating for 30 s at room temperature (perform this step on the magnetic plate).
8. Repeat the washing step (Subheading 3.7, step 7) two further times.
9. After the final wash, leave the bead rings on the magnetic plate until the ethanol has completely evaporated; do not overdry the bead rings. This step requires 8–10 min (see Note 8).


10. Remove samples from the magnetic plate.
11. Add 30 μL of ultra-pure water.
12. Mix by pipetting at least 25 times.
13. Incubate for 5 min at room temperature.
14. Place samples on the magnetic plate, draw off the supernatant, and transfer it to a new tube.

3.8 Qubit Quantification

1. For each sample, load 199 μL of HS buffer into a tube (see Note 9).
2. Add 1 μL of HS reagent per sample and vortex the resulting working solution.
3. Prepare at least two further tubes for the standards; for instance, if working with six samples, eight tubes should be prepared.
4. Load 190 μL of working solution into each of two Qubit tubes and add 10 μL of standard 1 (0 ng/μL) and standard 2 (10 ng/μL), respectively.
5. Load 198 μL of working solution into a Qubit tube and add 2 μL of purified DNA for a total volume of 200 μL (this should be done for each DNA sample).
6. Vortex each tube for 2–3 s and incubate the samples at room temperature for 2 min.
7. Perform the Qubit calibration using the two standards.
8. Place one sample in the device and read the concentration; repeat for each sample.
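The working-solution volumes scale with the number of tubes, as in the six-sample example of step 3. The sketch below reproduces that arithmetic and adds the generic dilution factor for a 2 μL sample in a 200 μL assay; the function names are hypothetical, and whether a back-calculation is needed depends on how the instrument reports the reading, which should be checked in its manual.

```python
def qubit_plan(n_samples, n_standards=2, buffer_ul=199, reagent_ul=1):
    """Total buffer and reagent needed for all sample and standard tubes."""
    n_tubes = n_samples + n_standards
    return {
        "tubes": n_tubes,
        "buffer_ul": n_tubes * buffer_ul,
        "reagent_ul": n_tubes * reagent_ul,
    }


def stock_concentration(tube_ng_per_ul, sample_ul=2, total_ul=200):
    """Back-calculate the sample concentration from the in-tube concentration."""
    return tube_ng_per_ul * total_ul / sample_ul


plan = qubit_plan(6)
stock = stock_concentration(0.05)  # a hypothetical reading of 0.05 ng/uL in the tube
```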

3.9 Enrichment PCR

This step enriches the sample for fragments that are ligated to both the common MseI adapter and a barcoded adapter, and it also adds the index sequence at the common adapter site. The enrichment PCR produces the enriched library only if both adapters are bound to a fragment.

Table 9 enrPCR mix

Component                          Volume/amount
DNA template                       15 ng
5× Phusion HF buffer               10 μL
5 mM dNTPs                         2 μL
10 μM PCR1_MCSeEd                  1 μL
10 μM PCR2_Index_2/4               1 μL
Phusion Taq DNA polymerase         1 U
Nuclease-free water                Up to 50 μL
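Because the template is specified as a mass and water makes up the remainder, the volumes for each library depend on its measured concentration. The sketch below works through that calculation using the fixed volumes from Table 9. The polymerase volume of 0.5 μL is an assumption (1 U from a 2 U/μL stock) and should be adjusted to the stock actually used; the function name is hypothetical.

```python
# Fixed component volumes (microlitres) from Table 9.
FIXED_UL = {
    "5x Phusion HF buffer": 10.0,
    "dNTPs": 2.0,
    "PCR1 primer": 1.0,
    "PCR2 index primer": 1.0,
}


def enr_pcr_volumes(library_ng_per_ul, target_ng=15.0, polymerase_ul=0.5, total_ul=50.0):
    """Template and water volumes for one 50 uL enrichment PCR."""
    template_ul = target_ng / library_ng_per_ul
    water_ul = total_ul - sum(FIXED_UL.values()) - polymerase_ul - template_ul
    if water_ul < 0:
        # The library is too dilute for the target mass to fit in the reaction.
        raise ValueError("template volume exceeds the space left in the reaction")
    return {"template_ul": round(template_ul, 2), "water_ul": round(water_ul, 2)}


volumes = enr_pcr_volumes(3.0)  # a hypothetical library at 3 ng/uL
```

With these assumptions a library at 3 ng/μL needs 5 μL of template, and very dilute libraries trigger the error rather than silently producing a negative water volume.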


Table 10 Thermal profile for enrPCR

Step                   Temperature   Time       Cycles
Initial denaturation   98 °C         2 min      1
Denaturation           98 °C         20 s       12
Annealing              56 °C         30 s       12
Elongation             72 °C         20 s       12
Final elongation       72 °C         5 min      1
End                    10 °C         ∞ (hold)   -

Fig. 5 Enriched libraries before purification. M marker, S shoot, R root

1. Perform PCR according to the Phusion Taq DNA polymerase protocol (or otherwise according to the manufacturer's instructions).
2. Add 15 ng of DNA to the mix as reported in Table 9.
3. Perform 12 cycles of PCR following the profile reported in Table 10 (see Note 10).
4. Run 5 μL of each enriched library on a 1.5% agarose gel to evaluate the enrichment step (Fig. 5).

3.10 Purification with AMpure Beads (1×)

1. Vortex the stock AMpure solution to resuspend the beads.
2. Add 45 μL of AMpure solution to 45 μL of sample.
3. Mix by pipetting at least 10 times.
4. Incubate samples for 5 min at room temperature.
5. Place samples on the magnetic plate for at least 5 min to separate the beads from the solution.


Fig. 6 Bioanalyzer analysis on libraries prepared from shoots (top) and roots (bottom)

6. Discard the cleared supernatant (perform this step on the magnetic plate).
7. Wash the sample by adding 100 μL of 70% ethanol to each sample and incubating for 30 s at room temperature (perform this step on the magnetic plate).
8. Repeat the washing step two further times.
9. After the final wash, leave the bead rings on the magnetic plate until the ethanol has completely evaporated; do not overdry the bead rings. This step requires 8–10 min (see Note 8).
10. Remove samples from the magnetic plate.
11. Add 30 μL of ultra-pure water.
12. Mix by pipetting at least 25 times.
13. Incubate for 5 min at room temperature.
14. Place samples on the magnetic plate, draw off the supernatant, and transfer it to a new tube.
15. Perform a Qubit quantification as in Subheading 3.8 above.

3.11 Sequencing

Libraries are sequenced using the Illumina HiSeq2500 platform. Before sequencing, it is recommended to run a Bioanalyzer analysis to verify the quality of the libraries (Fig. 6). If the quality is not good enough (showing, for example, strong peaks of undesired size, i.e., smaller than 200 bp), proceeding with sequencing is not recommended. The Bioanalyzer profiles in Fig. 6 show good-quality libraries.


3.12 Bioinformatic Analysis

Check the repository at https://bitbucket.org/capemaster/mcseed/src/master/ for information about the pipeline and data analysis; this is further described in the accompanying paper by Marconi et al. (2019) [15]. These steps require two open-source programs to be downloaded from the web and in-house Perl scripts from Bitbucket.

1. Demultiplex the raw sequencing reads using the process_radtags tool [19], which identifies reads and assigns them to each individual on the basis of the barcode sequences.
2. Map the sequences from each individual to the reference genome (AGPv4, https://www.maizegdb.org) with the bwa mem algorithm; sorted and indexed BAM files of uniquely mapped reads are produced with Samtools [20].
3. Create a merged BAM file that consists of a count matrix in which columns are the libraries and rows are the genomic locations: this is important for storing the genomic positions sequenced and for recording the number of uniquely mapped reads at each genomic location.
4. Use the bedtools suite, maintaining strandedness, to collapse and sort redundant coordinates and to merge overlapping intervals [21].
5. Convert this information to a GFF file that can be used as the input file for featureCounts [22] along with the previously described BAM files.
6. Normalize the libraries and balance them in an RPM (reads per million) fashion.
7. Select loci with a coverage of at least 10 reads for further analysis.
8. Estimate the relative methylation level per locus.
9. Parse the filtered data using the methylKit R package [23]; differentially methylated positions are called following the methylKit manual.

For reference-free analysis, raw reads are collapsed using Rainbow 2.0.4 [24] and CD-HIT [25] to create a pseudoreference genome consisting of a multi-FASTA file containing read contigs. After mapping the reads with bwa, a result matrix is created for each sample by invoking Samtools to count how many sequences are mapped per contig. This matrix is then used following the locus counting approach described above.
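Steps 6 and 7 can be illustrated on a toy count matrix. This is a minimal sketch, not the authors' Perl pipeline: the locus names and counts are invented, the helper functions are hypothetical, and the filter assumes that the 10-read threshold must be met in every library, which the text does not specify.

```python
def library_totals(counts):
    """Total mapped reads per library (columns of the count matrix)."""
    totals = {}
    for per_library in counts.values():
        for library, n in per_library.items():
            totals[library] = totals.get(library, 0) + n
    return totals


def to_rpm(counts):
    """Scale raw counts to reads per million mapped reads in each library."""
    totals = library_totals(counts)
    return {
        locus: {lib: n * 1e6 / totals[lib] for lib, n in per_library.items()}
        for locus, per_library in counts.items()
    }


def filter_loci(counts, min_reads=10):
    """Keep loci covered by at least min_reads raw reads in every library."""
    return {
        locus: per_library
        for locus, per_library in counts.items()
        if all(n >= min_reads for n in per_library.values())
    }


# Toy matrix: rows are loci, columns are libraries.
counts = {
    "chr1:100": {"shoot": 30, "root": 10},
    "chr1:500": {"shoot": 5, "root": 40},
    "chr2:42": {"shoot": 65, "root": 50},
}
rpm = to_rpm(counts)
kept = filter_loci(counts)
rpm_kept = {locus: rpm[locus] for locus in kept}
```

Normalizing before filtering keeps the library totals independent of which loci pass the coverage threshold.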

4 Notes

1. Usually each component is prepared as a stock solution.
2. Pipette gently because the DNA can easily be sheared by vigorous mixing or pipetting.
3. DNA preparations need to be of reasonable quality to ensure complete digestion by the restriction enzymes.
4. 5× RL buffer can be stored at −20 °C for an extended period of time.
5. Unless another temperature is specified, keep samples and reagents on ice.
6. Add the enzymes only at the end of mix preparation; do not remove them from the freezer before then.
7. Critical step. This is a very tricky part of the procedure. Pipette gently and avoid bubbles. When the tube is clear, the DNA is resuspended.
8. To speed up this step, the magnetic plate with the samples can be placed in a chemical hood for 8 min: the ethanol will evaporate quickly.
9. The Qubit fluorometer and dsDNA HS assay kit can be used to obtain accurate quantification of DNA concentration. The Qubit reagent must be at room temperature.
10. During the enrichment PCR, the number of cycles should not normally be increased beyond what is suggested (i.e., it should not exceed 12), to avoid producing dimers that will alter the sequencing analysis. If starting with 8–10 ng of material, however, 14 cycles can be performed.

References

1. Law JA, Jacobsen SE (2010) Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat Rev Genet 11:204–220
2. Chen T, Li E (2004) Structure and function of eukaryotic DNA methyltransferases. Curr Top Dev Biol 60:55–89
3. Zemach A, McDaniel IE, Silva P et al (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328:916–919
4. Chen M, Lv S, Meng Y (2010) Epigenetic performers in plants. Develop Growth Differ 52:555–566
5. Gehring M, Henikoff S (2007) DNA methylation dynamics in plant genomes. Biochim Biophys Acta Gene Struct Expr 1769:276–286
6. Zilberman D, Gehring M, Tran RK et al (2007) Genome-wide analysis of Arabidopsis thaliana DNA methylation uncovers an interdependence between methylation and transcription. Nat Genet 39:61–69
7. Lister R, O'Malley RC, Tonti-Filippini J et al (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536
8. Zhang M, Kimatu JN, Xu K et al (2010) DNA cytosine methylation in plant development. J Genet Genomics 37:1–12
9. Zhang H, Lang Z, Zhu J-K (2018) Dynamics and function of DNA methylation in plants. Nat Rev Mol Cell Biol 19:489–506
10. Candaele J, Demuynck K, Mosoti D et al (2014) Differential methylation during maize leaf growth targets developmentally regulated genes. Plant Physiol 164:1350–1364
11. Stevens M, Cheng JB, Li D et al (2013) Estimating absolute methylation levels at single-CpG resolution from methylation enrichment and restriction enzyme sequencing methods. Genome Res 23:1541–1553
12. Wibowo A, Becker C, Marconi G et al (2016) Hyperosmotic stress memory in Arabidopsis is mediated by distinct epigenetically labile sites in the genome and is restricted in the male germline by DNA glycosylase activity. eLife 5
13. Yong W-S, Hsu F-M, Chen P-Y (2016) Profiling genome-wide DNA methylation. Epigenetics Chromatin 9:26
14. Gu H, Smith ZD, Bock C et al (2011) Preparation of reduced representation bisulfite sequencing libraries for genome-scale DNA methylation profiling. Nat Protoc 6:468–481
15. Marconi G, Capomaccio S, Comino C et al (2019) Methylation content sensitive enzyme ddRAD (MCSeEd): a reference-free, whole genome profiling system to address cytosine/adenine methylation changes. Sci Rep. https://doi.org/10.1038/s41598-019-51423-2
16. Peterson BK, Weber JN, Kay EH et al (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS One 7
17. Regulski M, Lu Z, Kendall J et al (2013) The maize methylome influences mRNA splice sites and reveals widespread paramutation-like switches guided by small RNA. Genome Res 23:1651–1662
18. Maunakea AK, Nagarajan RP, Bilenky M et al (2010) Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466:253–257
19. Catchen JM, Amores A, Hohenlohe P et al (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3 Genes Genomes Genet 1:171–182
20. Li H, Handsaker B, Wysoker A et al (2009) The sequence alignment/map format and SAMtools. Bioinformatics 25:2078–2079
21. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842
22. Liao Y, Smyth GK, Shi W (2014) FeatureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30:923–930
23. Akalin A, Kormaksson M, Li S et al (2012) MethylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13:R87
24. Chong Z, Ruan J, Wu C-I (2012) Rainbow: an integrated tool for efficient clustering and assembling RAD-seq reads. Bioinformatics 28:2732–2737
25. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658–1659

Chapter 5

Plant-RRBS: DNA Methylome Profiling Adjusted to Plant Genomes, Utilizing Efficient Endonuclease Combinations, for Multi-Sample Studies

Martin Schmidt, Magdalena Woloszynska, Michiel Van Bel, Frederik Coppens, and Mieke Van Lijsebettens

Abstract

In plants, methylation at cytosines often leads to changes in gene expression and inactivation of transposable elements. Changes in cytosine methylation (epimutations) might produce epialleles with distinct phenotypes. We present a genome-wide cytosine methylation profiling method based on bisulfite conversion and next-generation sequencing, which is applicable to plant species with available reference genomes. This so-called plant-RRBS profiling method reproducibly covers specific genomic regions and enriches for coverage of cytosine positions that are suitable for comparative analyses in multi-sample studies in basic biology and breeding. The plant-RRBS workflow consists of genomic DNA digestion with coverage-efficient restriction endonuclease combinations, followed by efficient library generation, next-generation sequencing, and a straightforward, publicly available methylation data processing pipeline. Plant-RRBS has a twofold higher ratio of cytosine coverage per covered genome as compared to whole-genome bisulfite sequencing, covering tens of millions of cytosine positions, and allows detection of differential cytosine methylation, which was evaluated using rice epilines.

Key words Cytosine methylation, DpnII, MspI and ApeKI endonucleases, Reduced representation bisulfite sequencing (RRBS), Epiline, Breeding, Oryza sativa

1 Introduction

In plants, methylation occurs at cytosines in the CG, CHG, and CHH (H = A, T, or C) sequence contexts. DNA may be methylated de novo, or methylation may be maintained by methyltransferases playing diverse, yet partially overlapping roles. Demethylation occurs via dilution during nuclear division or is driven by DNA glycosylases with DNA demethylation activity. In Arabidopsis thaliana, methylation in promoter regions of genes represses their transcription or inactivates transposable elements; in contrast, cytosine demethylation typically activates transcription

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_5, © Springer Science+Business Media, LLC, part of Springer Nature 2020


or retrotransposition. The role of gene body methylation in transcriptional regulation is unclear (for reviews, see [1, 2]). Abiotic and biotic stresses affect DNA methylation, resulting in altered expression of stress- or defense-related genes and plant adaptation to environmental stress [1]. Changes in DNA methylation (epimutations) produce epialleles, which vary in expression level, are heritable, and may contribute to phenotypic variation [3, 4]. Natural epialleles can be related to agronomically important phenotypes: the Colorless non-ripening (Cnr) variant in tomato, which produces colorless fruits [5], the rice epialleles correlated with alterations in morphology or number of grains per panicle [6, 7], flowering time and plant height phenotypes in Arabidopsis [8], or improved energy use efficiency in canola and rice epilines [9–11].

The status of cytosine methylation can be detected via sodium hydrogen sulfite (bisulfite) conversion of non-methylated cytosines to uracil residues, deduced by PCR followed by DNA sequencing (bisulfite sequencing, BS). Integration of bisulfite conversion with next-generation DNA sequencing resulted in the whole-genome bisulfite sequencing (WGBS) protocol, successfully applied in plants [10, 12–14]. A profiling method that focuses on genomic regions that are crucial for gene expression would be a better solution for breeding studies than whole-genome approaches, which yield very large data sets. Reduced representation bisulfite sequencing (RRBS) was invented to allow low-cost DNA methylation profiling in mammals [15] and involved DNA digestion with methylation-insensitive restriction enzyme(s) recognizing sequences containing cytosine(s), followed by the selection of fragments for the BS step. The enzymes were chosen based on in silico digestion, and MspI (C↓CGG) was most commonly used, either alone or in combination with another endonuclease selected to target the relevant genome regions, such as CG islands and gene promoters. Although plants differ in methylation patterns and do not show CG islands [16], their genome-wide abundant and often regional CG methylation [12, 17] was successfully detected by protocols based on mammalian RRBS. RRBS has, for example, addressed the roles played by DNA methylation in Brassica rapa during polyploid genome evolution [18], in Quercus lobata during local adaptation [19] or associated with climate gradient [20], in Zea mays during the transition from the vegetative to the reproductive phase [21], and in Nicotiana tabacum during the regulation of virus infection responses [22]. The early RRBS protocols had the drawbacks of low cytosine coverage, uneven distribution of the sequences captured (caused by the limited choice of restriction enzymes), and low multiplexing levels as a result of laborious gel-based size selection of enzymatic restriction products.

Here, we present the detailed protocol of a next-generation sequencing (NGS)-based plant-RRBS method with very high coverage of cytosine positions (up to 25%) and high reproducibility of these covered positions between the biological replicates using rice (epi)lines [23]. Our protocol can be applied to any plant species for which a reference genome is available. Optimal combinations of two restriction enzymes were identified in silico to ensure broad, genome-dispersed analysis of cytosine methylation, predominantly in putative promoters and non-annotated regions. The combinations of MspI with DpnII (↓GATC) or with ApeKI (G↓CWGC) represent an innovative double restriction approach in plants, effective for genome coverage by NGS [23]. The different steps in the plant-RRBS procedure are represented in Fig. 1 and include a wet lab component comprising DNA isolation and digestion (A) and library preparation (B), and an Illumina sequencing and bioinformatics analysis component (C). The DNA fragments obtained after double digestion of the genomic DNA were end-repaired, ligated to adaptors, and bisulfite converted, yielding the library, which was PCR enriched and purified using solid-phase reversible immobilization (SPRI) to select for small DNA fragments and to avoid gel-based size selection. Following a quality control step, libraries were sequenced using the Illumina platform, producing reads that were analyzed in the designed bioinformatics pipeline, in which the read mapping and cytosine methylation detection occurred (Fig. 1).

Plant-RRBS offers a twofold higher ratio of cytosine coverage per covered genome as compared to WGBS and shows very high potential to detect differential cytosine methylation. In addition, plant-RRBS is cheaper and much more effective than WGBS and can therefore be applied to large-scale plant breeding programs. Moreover, plant-RRBS is highly reproducible and accurate, which is important in prospective studies or breeding programs where numerous individuals or consecutive generations are analyzed over a long period of time and cannot be directly compared within the same assay.

Fig. 1 Workflow for plant-RRBS. Main steps are DNA digestion (A), library construction (B), and Illumina sequencing and bioinformatics (C)

2 Materials

2.1 Plant Material

We used expanding leaves of fresh young plants as experimental material to yield good quality DNA. Rice plants (Oryza sativa ssp. indica) were grown in soil for 24 days to vegetative stage 2, characterized by the emerging fifth leaf at the first tiller, in a growth chamber at 26 °C, 16-h day/21 °C, 8-h night regime with a light intensity of 300 μmol m⁻² s⁻¹ and a relative humidity of 71% [23]. The fourth leaves of three to five individual plants (biological replicates) per epiline in vegetative stage 2 were harvested for DNA preparation (see Note 1).

2.2 Enzymes

1. Restriction enzymes: MspI (20,000 U/mL), ApeKI (5000 U/mL), and DpnII (10,000 U/mL) (e.g., New England Biolabs).
2. DNA ligase.
3. Ex Taq DNA polymerase.
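The stock concentrations listed above determine how many enzyme units a pipetted volume delivers. As a minimal sketch (the function and dictionary names are illustrative assumptions, not part of the protocol), the following Python snippet converts the 3 μL of each enzyme used in Subheading 3.1.3 into units:

```python
# Stock concentrations from Subheading 2.2, in units per mL.
# The dictionary and function names are hypothetical, chosen for illustration.
STOCK_U_PER_ML = {"MspI": 20_000, "ApeKI": 5_000, "DpnII": 10_000}


def units_added(enzyme: str, volume_ul: float) -> float:
    """Return the enzyme units contained in volume_ul microliters of stock."""
    return STOCK_U_PER_ML[enzyme] * volume_ul / 1000.0


# 3 uL of each enzyme is the total volume per enzyme given in Subheading 3.1.3.
for name in STOCK_U_PER_ML:
    print(f"{name}: {units_added(name, 3):.0f} U in 3 uL")
```

The values reproduce the 60 U of MspI, 15 U of ApeKI, and 30 U of DpnII stated in the digestion step; stocks from other suppliers may have different concentrations.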

2.3 Solutions

1. Ethanol.
2. Deionized/PCR grade water.
3. Isopropanol.
4. 70% ethanol.
5. 0.5 M EDTA, pH 8.0.

2.4 Buffers

1. DNA ligase buffer with ATP.
2. Ex Taq reaction buffer.
3. 50× TAE: 2 M Tris base (242.2 g/L), 50 mM EDTA (100 mL of 0.5 M stock solution/L), 1 M acetic acid (57.1 mL 100% acetic acid/L).
4. Restriction enzyme buffer suitable for double digestion (e.g., NEBuffer 3.1).
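The amounts in the 50× TAE recipe follow from the target molarities. The short Python check below is a sketch under stated assumptions: the molar masses of Tris base (121.14 g/mol) and acetic acid (60.05 g/mol) and the density of glacial acetic acid (about 1.049 g/mL) are standard textbook values that are not given in the protocol, and the function names are illustrative.

```python
# Assumed physical constants (not stated in the protocol).
TRIS_MW = 121.14          # g/mol, Tris base
ACETIC_ACID_MW = 60.05    # g/mol
ACETIC_ACID_DENSITY = 1.049  # g/mL, glacial (about 100%) acetic acid


def grams_per_litre(molarity: float, molar_mass: float) -> float:
    """Mass of solid needed for 1 L at the given molarity."""
    return molarity * molar_mass


def ml_stock_per_litre(target_molar: float, stock_molar: float) -> float:
    """Volume of a concentrated stock needed for 1 L at target_molar."""
    return 1000.0 * target_molar / stock_molar


def ml_glacial_acetic_acid_per_litre(molarity: float) -> float:
    """Volume of glacial acetic acid needed for 1 L at the given molarity."""
    return molarity * ACETIC_ACID_MW / ACETIC_ACID_DENSITY


tris_g = grams_per_litre(2.0, TRIS_MW)            # about 242.3 g
edta_ml = ml_stock_per_litre(0.050, 0.5)          # 100 mL of 0.5 M EDTA
acetic_ml = ml_glacial_acetic_acid_per_litre(1.0)  # about 57.2 mL
print(f"Tris {tris_g:.1f} g, EDTA stock {edta_ml:.0f} mL, acetic acid {acetic_ml:.1f} mL")
```

The small difference from the 57.1 mL in the recipe depends on the density and purity assumed for the acetic acid.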

2.5 Kits

1. Wizard Genomic DNA Purification Kit (Promega).
2. EZ DNA Methylation-Lightning Kit (e.g., Zymo Research Corporation).
3. Cloning kit (e.g., Thermo Fisher Scientific).
4. Illumina TruSeq Kit (Illumina).

2.6 Products

1. Liquid nitrogen.
2. Agarose.
3. 1 kb Plus DNA Ladder.
4. SYBR Safe DNA gel stain or another product to stain DNA in gel, for example, ethidium bromide.
5. Agencourt AMPure XP beads (Beckman Coulter).
6. Custom-synthesized methylated multiplex adapters (e.g., Eurofins MWG Operon).
7. dNTPs.
8. PCR index primers.

2.7 Equipment

1. Mortar and pestle.
2. Microcentrifuge reaction tubes.
3. Microcentrifuge.
4. Vortex.
5. Water bath or thermoblock.
6. Spectrophotometer (e.g., NanoDrop, Thermo Fisher Scientific) or fluorimeter (e.g., Qubit, Thermo Fisher Scientific).
7. Apparatus for agarose gel electrophoresis.
8. Power supply.
9. Transilluminator.
10. System for agarose gel documentation.
11. Magnetic stand for 1.5-mL tubes (e.g., DynaMag-2 magnet, Life Technologies, Invitrogen).
12. Centrifuge.
13. PCR thermocycler.
14. Cloning equipment.
15. Sanger sequencer.
16. Quantitative PCR (qPCR) machine.
17. 2100 Bioanalyzer (Agilent Technologies).
18. Illumina HiSeq 2500 machine (Illumina).

2.8 Data Sets

1. Reference genome (fasta format) (Oryza sativa ssp. indica: version 9311_BGF_2005 [24], ftp://ftp.psb.ugent.be/pub/plaza/plaza_public_02_5/Genomes/osaindica.con.gz) (for other genomes, see Note 6).
2. Reference genomic feature annotation (gff format) (Oryza sativa ssp. indica: ASM465v1.27).

2.9 Software

1. Biopieces framework (v0.48).
2. FastQC (v0.11.2).
3. FastX-Toolkit (v0.0.13).
4. Trim Galore (v0.3.3).
5. BS-Seeker (v2.0.5) [25].
6. Bowtie (v2.1.0) [26].
7. Qualimap (v2.1) [27].
8. Picard (v1.129).
9. BamUtil (v20130118).
10. BEDTools package (v2.22.0) [28].
11. MethylKit (v0.5.5) [29].

3 Methods

3.1 Genomic DNA Isolation and Digestion

3.1.1 DNA Isolation, Quality, and Quantity Evaluation

1. Harvest leaves and grind them with mortar and pestle in liquid nitrogen into a fine powder, transfer the powder to microcentrifuge reaction tubes, and store at −80 °C. Take 60- to 80-mg samples of the powdered tissue (see Note 2) for DNA isolation with the Wizard Genomic DNA Purification Kit, used according to the manufacturer’s instructions. Perform the centrifugations for 2 min instead of 1 min to ensure sedimentation of DNA in the washing step. Suspend the final DNA pellets in 50 μL of the Rehydration Solution of the DNA Purification kit.
2. Measure the DNA concentration and purity of the samples (A260/A280 and A260/A230 ratios) with the NanoDrop (see Notes 3 and 4). Load aliquots of samples onto a 1% agarose gel to examine molecular weight, purity, and integrity of the DNA isolates. Incubate samples at 37 °C for several hours before loading on the agarose gel to test for nuclease contamination (see Note 5).

3.1.2 In Silico Digestion

1. In silico digestion of the reference genome is used to predict the per-base genome coverage reached by digestion with two restriction endonucleases.
2. Candidate restriction endonucleases should be chosen based on the presence of cytosine in their cutting sites in order to enrich for fragments from CG-rich regions. In addition, it is convenient to choose endonucleases exhibiting compatible in vitro incubation conditions.
3. We recommend starting the in silico analysis with a combination of the MspI and DpnII enzymes because they meet the criteria explained in step 2. In addition, the MspI and ApeKI enzyme combination is worth testing, although the two enzymes have different reaction temperatures.
4. For the in silico digestion, we use the biopieces v0.48 framework with the tool digest_seq.
5. The in silico digestion is performed using the nuclear reference genome of Oryza sativa ssp. indica and the MspI (C↓CGG) with DpnII (↓GATC) double digest combination as input in the following code:

read_fasta -i osi_nuclrefgenome.fasta | digest_seq -p CCGG -c 1 | digest_seq -p GATC -c 0

6. Filter the resulting fragments for the insert size range of 150–420 bp, which is specific to the plant-RRBS setup applied in the rice study of Schmidt et al. [23]. The number of bases in the retained fragments is then used to calculate the per-base genome coverage relative to the size of the rice nuclear reference genome. For MspI-DpnII, the per-base genome coverage is 159,765,366/427,026,737 × 100% ≈ 37%. A similar coverage was detected for other plant species: Arabidopsis thaliana, Beta vulgaris, Brassica rapa, O. sativa ssp. japonica, and Zea mays (see Note 6). The length distributions of double-digested fragments for O. sativa ssp. indica are visualized in Fig. 2. Single digestion results in lower per-base genome coverage for MspI with 15%, DpnII with 29%, and ApeKI with 14%, along with less frequent fragment lengths (Fig. 3).

Fig. 2 Frequency (count, y-axis) of MspI-DpnII (A) and MspI-ApeKI (B) fragment lengths in bins (nt, x-axis) from Oryza sativa ssp. indica

Fig. 3 Frequency (count, y-axis) of MspI (A), DpnII (B), and ApeKI (C) fragment lengths in bins (nt, x-axis) from Oryza sativa ssp. indica

3.1.3 Digestion with Restriction Enzymes

Double digest the DNA samples with either the MspI and ApeKI or the MspI and DpnII enzymes. Always perform the restriction first with MspI for 20 h, followed by the digestion with the second enzyme, also for 20 h. Set up the digestion reactions with 2–2.5 μg of the DNA sample in a total volume of 50 μL, 6 μL of NEBuffer 3.1 (see Note 7), and 3 μL (see Note 8) of each enzyme, corresponding to 60 U of MspI, 15 U of ApeKI, or 30 U of DpnII. Wrap the cap area of the reaction tube with parafilm to avoid evaporation. Add each enzyme in two portions (see Note 9): the first (1.5 μL) at the beginning of the reaction and the second (1.5 μL) 4–5 h later. Conduct the reactions at 37 °C (MspI and DpnII) or 75 °C (ApeKI). The efficiency of the digestion can be ascertained by gel electrophoresis (see Note 10).

3.2 Library Preparation for Illumina Sequencing and Quality Control

1. Plant-RRBS libraries for Illumina sequencing are generated by Alpha Biolaboratory according to Hsieh [30] and Pignatta et al. [31] with modifications as described in Schmidt et al. [23], providing a gel-free approach with less laborious size selection.
2. DNA purification of 300 ng of double-digested genomic DNA should be carried out using the solid-phase reversible immobilization (SPRI) method. For this, a 1.8× ratio of Agencourt AMPure XP beads to DNA (v/v) is recommended, as outlined in the manufacturer’s instructions, in cleanup steps during the library construction procedure. This ratio ensures effective recovery even of short plant-RRBS library inserts. Bead-bound DNA is separated from the supernatant by a magnetic stand, washed with 80% ethanol, and eluted from the beads using the manufacturer’s kit components.
3. The end repair of the size-selected purified DNA (see step 2) is performed using the Illumina TruSeq kit as outlined in the manufacturer’s instructions. DNA is then purified using Agencourt AMPure XP beads according to the manufacturer’s instructions (see step 2).

4. Adapter ligation is performed with custom-synthesized methylated multiplex adapters and DNA ligase according to the manufacturer’s instructions. DNA is then purified using Agencourt AMPure XP beads according to the manufacturer’s instructions (see step 2).
5. Bisulfite treat the library using the EZ DNA Methylation-Lightning Kit as outlined in the manufacturer’s instructions. Place the samples in a thermal cycler and perform the denaturation and conversion program.
6. PCR amplification of 5–10 ng of the libraries is performed using 2.5 U of Ex Taq DNA polymerase, 5 μL of 10× Ex Taq reaction buffer, 25 mM dNTPs, and 1 μL of index primers (10 μM) in a 50-μL reaction. The thermocycling program is 95 °C for 3 min, followed by 12 cycles of 95 °C for 30 s, 65 °C for 30 s, and 72 °C for 60 s.
7. Purifications of the library are performed twice with 0.8× (v/v) Agencourt AMPure XP beads, according to the manufacturer’s instructions, primarily to remove adapter dimers (see step 2).
8. Quality controls are performed by random subcloning and Sanger sequencing of 20–30 colonies to ensure correct library construction, bisulfite conversion, and the presence of correct indices. Bisulfite conversion efficiency rates can be assessed by calculating cytosine methylation levels in the non-methylated chloroplast genome and should be 98% or higher for a limited influence on data analysis. The final libraries are checked for size distribution by qPCR and for residual adapter dimers, and quantified with a 2100 Bioanalyzer. The detected library size should range from approximately 270 to 540 bp, consisting of a 150- to 420-bp insert size and a 119-bp forward or reverse primer.
9. The plant-RRBS libraries are paired-end sequenced using an Illumina HiSeq 2500; we used the instrument at the Vincent J. Coates Genomics Sequencing Laboratory at UC Berkeley.

3.3 Bioinformatics Pipeline

The scripts are available via DOI (https://doi.org/10.5281/zenodo.168034) and on GitHub.
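Independently of the published scripts, the bisulfite conversion check from step 8 of Subheading 3.2 can be illustrated with a small calculation. The sketch below assumes that, at cytosine positions of the non-methylated chloroplast genome, reads showing C are unconverted and reads showing T are converted; the function names and the example counts are hypothetical and do not come from the rice data set.

```python
def conversion_rate(unconverted_c: int, converted_t: int) -> float:
    """Fraction of chloroplast cytosine observations read as T after bisulfite treatment."""
    total = unconverted_c + converted_t
    if total == 0:
        raise ValueError("no informative reads at chloroplast cytosine positions")
    return converted_t / total


def passes_conversion_qc(rate: float, threshold: float = 0.98) -> bool:
    """Apply the 98% minimum conversion efficiency recommended in the protocol."""
    return rate >= threshold


# Hypothetical counts pooled over all chloroplast cytosine positions.
rate = conversion_rate(unconverted_c=1_200, converted_t=98_800)
print(f"conversion rate: {rate:.3%}, QC passed: {passes_conversion_qc(rate)}")
```

Equivalently, the apparent methylation level of the chloroplast genome is one minus this rate and should stay at or below 2%.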

3.3.1 Reads Analysis

1. The quality of the reads is determined using FastQC (v0.11.2) for all libraries (Fig. 4).
2. The reads are adjusted to a uniform length of 50 nt by trimming the 3′ end of the reads using the FastX-Toolkit (v0.0.13); this prevents the read length from becoming a confounding variable in the downstream analyses.

3. Trim Galore (v0.3.3) is used to remove sequencing adapters from the reads. Reads shorter than 20 nt after trimming are removed from the library by the software.
4. The resulting reads are mapped to the Oryza sativa ssp. indica reference genome using BS-Seeker (v2.0.5) [25] and Bowtie (v2.1.0) [26]. Different indices are used in order to correctly map reads to the defined cutting sites, with indexing tools provided by the indicated software packages.
5. The mapping quality can be evaluated using Qualimap (v2.1) [27].
6. The mapped reads (in BAM format) are sorted using Picard (v1.129).
7. Overlapping read pairs are clipped using bamUtil (v20130118).

Fig. 4 Bioinformatics pipeline. Main steps are read mapping (A) and methylation detection (B)

3.3.2 Methylation Detection

1. Determine the number of nucleotides (in the reference genome) covered by the mapped reads using the genomecov tool from the BEDTools package (v2.22.0) [28]; this provides the per-base genome coverage for the libraries (Fig. 4).
2. Detect cytosine methylation using BS-Seeker [25], with the output converted to BED format. A minimum threshold of ten informative nucleotides per cytosine position should be used in order to remove spurious hits and false positives.
3. Genomic features of certain cytosine positions are determined based on the Oryza sativa ssp. indica annotation ASM465v1.27, with no transposable elements being retained.
4. Comparison of the methylated regions and genomic regions, as well as the differential methylation analysis, is carried out using the R package MethylKit (v0.5.5) [29].
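The coverage threshold in step 2 can be illustrated with a short Python sketch. It is not the BS-Seeker output parser or the published pipeline: the record layout (`Call`), the function names, and the example counts are assumptions made for illustration; only the minimum of ten informative nucleotides per cytosine position is taken from the protocol.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical per-cytosine record: position, context, and read counts."""
    chrom: str
    pos: int
    context: str   # "CG", "CHG", or "CHH"
    n_meth: int    # reads supporting a methylated cytosine
    n_total: int   # informative reads covering the position


MIN_COVERAGE = 10  # threshold recommended in step 2


def filter_calls(calls, min_coverage=MIN_COVERAGE):
    """Keep only positions covered by at least min_coverage informative reads."""
    return [c for c in calls if c.n_total >= min_coverage]


def methylation_level(call: Call) -> float:
    """Fraction of informative reads that support methylation."""
    return call.n_meth / call.n_total


def mean_level_by_context(calls):
    """Average the per-position methylation levels within each sequence context."""
    levels = defaultdict(list)
    for c in calls:
        levels[c.context].append(methylation_level(c))
    return {ctx: sum(v) / len(v) for ctx, v in levels.items()}


# Invented example positions; the CHH site has only 9 reads and is discarded.
calls = [
    Call("chr01", 101, "CG", 18, 20),
    Call("chr01", 150, "CHG", 3, 12),
    Call("chr01", 207, "CHH", 1, 9),
    Call("chr01", 311, "CG", 7, 10),
]
kept = filter_calls(calls)
means = mean_level_by_context(kept)
print(len(kept), "positions retained;", means)
```

Only the retained positions would then be passed on to the differential methylation analysis in step 4.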

4 Notes

1. Each genotype was represented by five biological replicates in our study, but we recommend three replicates to obtain reproducible genome coverage between the samples and consistency of methylation levels detected between replicates.
2. There is an optimal amount of tissue per DNA extraction using the Wizard Genomic DNA Purification Kit (Promega) following the instructions of the manufacturer, i.e., 60–80 mg of powder obtained after freezing fresh tissue in liquid nitrogen and grinding it to a fine powder, in order to obtain high efficiency and quality of DNA isolates (Fig. 5, lane 1). Using less than 60 mg as a starting amount of powdered tissue results in a drastically decreased DNA concentration in the final samples (Fig. 5, lane 2). The DNA samples obtained from tissue amounts higher than 80 mg had a high concentration, but the quality of the obtained DNA was low, as demonstrated by the smear of low-molecular-weight DNA molecules below the main DNA band after electrophoresis in the 1% agarose gel (Fig. 5, lane 3).
3. Typically, DNA concentrations of 100–200 ng/μL were measured using NanoDrop, corresponding to a total yield of 5–10 μg DNA per sample. DNA concentrations evaluated using the Qubit apparatus were usually lower by 15–50%.
4. The A260/A280 ratios were in the range of 1.8–2.1, indicating high purity with respect to protein contamination. The A260/A230 ratios were in the range of 1.8–2.7 for the majority of samples, showing that the samples were free of denaturing components of buffers commonly included in kits for nucleic acid purification, which have maximal absorption around 230 nm and may affect downstream analysis of nucleic acids. Samples with ratios lower than 1.8, even as low as 1.4, indicating contamination with chaotropic salts, were occasionally obtained but performed well in the plant-RRBS protocol.
5. It is essential to obtain high-quality samples showing a single, high-molecular-weight DNA band without smear. A smear corresponds to DNA molecules of variable sizes produced either by mechanical shearing during isolation or by the activity of unspecific nucleases during incubation with restriction enzymes. These undesired, randomly produced short DNA molecules would be incorporated into the RRBS library together with the restriction digestion products, resulting in severe problems with data reproducibility.
6. The plant-RRBS setup was evaluated by in silico digestion for a spectrum of plants with nuclear reference genomes: A. thaliana (TAIR 10), B. vulgaris ssp. vulgaris, B. rapa, O. sativa ssp. indica version 9311_BGF_2005, O. sativa ssp. japonica, and Z. mays (B73) [24, 32–36]. In silico digestion resulted in, respectively, 36, 19, 33, 37, 38, and 40% per-base genome coverage using MspI-DpnII and 15, 10, 15, 25, 26, and 27% using MspI-ApeKI [23].
7. NEBuffer 3.1 was chosen for both the MspI-ApeKI and MspI-DpnII double digestions because both ApeKI and DpnII have 100% activity in this buffer, while the activity of MspI is 50%. We did not use the CutSmart or NEB2.1 buffers, which ensure 100% activity of MspI, because the activities of ApeKI or DpnII in these buffers are below 50%. To compensate for the lower activity of MspI, a high concentration of the enzyme (60 U per reaction) was used.
8. To provide optimal storage conditions, restriction enzymes contain 50% glycerol, which may inhibit digestion of DNA at concentrations higher than 5%. Therefore, the volume of restriction enzyme(s) should not exceed 10% of the final reaction mixture volume.
9. In the case of high DNA concentrations, complex templates (genomic DNA), and long reaction times, the most efficient digestion occurs when the enzyme is added in two or more portions, ensuring that the reaction mixture contains active enzyme molecules for a longer time. Gel electrophoresis is used to check that genomic DNA was completely digested under the applied conditions.
10. Successful digestion is indicated by a smear of fragments and no evidence of an undigested, high-molecular-weight band after gel electrophoresis. Undigested DNA control samples can be carried through the incubation procedure and are expected to show a discrete high-molecular-weight band, indicating persistent DNA quality.

Fig. 5 The effect of the amount of plant tissue on the quantity and quality of resulting DNA samples. DNA isolated from 70 mg (lane 1), 45 mg (lane 2), and 110 mg (lane 3) of powdered tissue. The horizontal, parallel lines are marker lines from the gel carriage
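The in silico double digest of Subheading 3.1.2, which underlies the coverage values in Note 6, can be sketched in Python as follows. This is a minimal illustration and not the biopieces implementation: the function names, the toy sequence, and the narrow size window used for the toy example are assumptions made for demonstration. Only the recognition sites, the cut offsets, the 150–420 bp default window, and the MspI-DpnII base counts for rice are taken from the protocol.

```python
import re

# IUPAC codes needed for the three recognition sites (W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}

# Recognition site and cut offset within the site (C^CGG, ^GATC, G^CWGC).
ENZYMES = {"MspI": ("CCGG", 1), "DpnII": ("GATC", 0), "ApeKI": ("GCWGC", 1)}


def cut_positions(seq: str, site: str, offset: int) -> list[int]:
    """Return cut coordinates for every (possibly overlapping) site match."""
    pattern = "(?=" + "".join(IUPAC[base] for base in site) + ")"
    return [m.start() + offset for m in re.finditer(pattern, seq)]


def double_digest(seq: str, enzymes: list[str]) -> list[str]:
    """Cut one sequence with all listed enzymes and return the fragments in order."""
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(c for c in cuts if 0 < c < len(seq)) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def per_base_coverage(fragments, genome_size, min_len=150, max_len=420) -> float:
    """Percentage of the genome contained in fragments within the size window."""
    covered = sum(len(f) for f in fragments if min_len <= len(f) <= max_len)
    return 100.0 * covered / genome_size


# Toy sequence with one MspI and one DpnII site (not a real genome).
toy = "AAACCGGTTTGATCAAAC"
fragments = double_digest(toy, ["MspI", "DpnII"])
toy_coverage = per_base_coverage(fragments, len(toy), min_len=5, max_len=7)
print(fragments, f"{toy_coverage:.1f}%")

# Base counts reported for MspI-DpnII in O. sativa ssp. indica (Subheading 3.1.2).
rice_coverage = 100.0 * 159_765_366 / 427_026_737
print(f"rice MspI-DpnII coverage: {rice_coverage:.1f}%")
```

For a real genome, each chromosome or scaffold would be digested separately and the fragment lengths pooled before the coverage is computed; the sketch scans only the top strand, which is sufficient here because all three recognition sites are palindromic.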


Acknowledgments

We thank Annick Bleys and Martine De Cock for precious help in preparing the manuscript. This research was funded by the European Union Seventh Framework Programme through the Marie Curie Intra-European program “Lighter” to M.W. and the Research Training Network “Chromatin in Plants–European Training and Mobility” to M.V.L. and fellow M.S. (CHIP-ET, FP7-PEOPLE-2013-ITN607880).

References

1. Bartels A, Han Q, Nair P, Stacey L, Gaynier H, Mosley M et al (2018) Dynamic DNA methylation in plant growth and development. Int J Mol Sci 19:2144
2. Galindo-González L, Sarmiento F, Quimbaya MA (2018) Shaping plant adaptability, genome structure and gene expression through transposable element epigenetic control: focus on methylation. Agronomy 8:180
3. Becker C, Hagmann J, Müller J, Koenig D, Stegle O, Borgwardt K et al (2011) Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480:245–249
4. Schmitz RJ, Schultz MD, Lewsey MG, O’Malley RC, Urich MA, Libiger O et al (2011) Transgenerational epigenetic instability is a source of novel methylation variants. Science 334:369–373
5. Manning K, Tör M, Poole M, Hong Y, Thompson AJ, King GJ et al (2006) A naturally occurring epigenetic mutation in a gene encoding an SBP-box transcription factor inhibits tomato fruit ripening. Nat Genet 38:948–952
6. Miura K, Agetsuma M, Kitano H, Yoshimura A, Matsuoka M, Jacobsen SE et al (2009) A metastable DWARF1 epigenetic mutant affecting plant stature in rice. Proc Natl Acad Sci U S A 106:11218–11223
7. Miura K, Ikeda M, Matsubara A, Song X-J, Ito M, Asano K et al (2010) OsSPL14 promotes panicle branching and higher grain productivity in rice. Nat Genet 42:545–549
8. Johannes F, Porcher E, Teixeira FK, Saliba-Colombani V, Simon M, Agier N et al (2009) Assessing the impact of transgenerational epigenetic variation on complex traits. PLoS Genet 5:e1000530
9. Hauben M, Haesendonckx B, Standaert E, Van Der Kelen K, Azmi A, Akpo H et al (2009) Energy use efficiency is characterized by an epigenetic component that can be directed through artificial selection to increase yield. Proc Natl Acad Sci U S A 106:20109–20114
10. Schmidt M, Byzova M, Martens C, Peeters M, Raj Y, Shukla S et al (2018) Methylome and epialleles in rice epilines selected for energy use efficiency. Agronomy 8:163
11. Verkest A, Byzova M, Martens C, Willems P, Verwulgen T, Slabbinck B et al (2015) Selection for improved energy use efficiency and drought tolerance in canola results in distinct transcriptome and epigenome changes. Plant Physiol 168:1338–1350
12. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD et al (2008) Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 452:215–219
13. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH et al (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133:523–536
14. Zakrzewski F, Schmidt M, Van Lijsebettens M, Schmidt T (2017) DNA methylation of retrotransposons, DNA transposons and genes in sugar beet (Beta vulgaris L.). Plant J 90:1156–1175
15. Meissner A, Gnirke A, Bell GW, Ramsahoye B, Lander ES, Jaenisch R (2005) Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res 33:5868–5877
16. Feng S, Cokus SJ, Zhang X, Chen P-Y, Bostick M, Goll MG et al (2010) Conservation and divergence of methylation patterning in plants and animals. Proc Natl Acad Sci U S A 107:8689–8694
17. Zemach A, McDaniel IE, Silva P, Zilberman D (2010) Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328:916–919
18. Chen X, Ge X, Wang J, Tan C, King GJ, Liu K (2015) Genome-wide DNA methylation profiling by modified reduced representation bisulfite sequencing in Brassica rapa suggests that epigenetic modifications play a key role in polyploid genome evolution. Front Plant Sci 6:836
19. Platt A, Gugger PF, Pellegrini M, Sork VL (2015) Genome-wide signature of local adaptation linked to variable CpG methylation in oak populations. Mol Ecol 24:3823–3830
20. Gugger PF, Fitz-Gibbon S, Pellegrini M, Sork VL (2016) Species-wide patterns of DNA methylation variation in Quercus lobata and their association with climate gradients. Mol Ecol 25:1665–1680
21. Hsu F-M, Yen M-R, Wang C-T, Lin C-Y, Wang C-JR, Chen P-Y (2017) Optimized reduced representation bisulfite sequencing reveals tissue-specific mCHH islands in maize. Epigenet Chromatin 10:42
22. Wang C, Wang C, Xu W, Zou J, Qiu Y, Kong J et al (2018) Epigenetic changes in the regulation of Nicotiana tabacum response to Cucumber mosaic virus infection and symptom recovery through single-base resolution methylomes. Viruses 10:402
23. Schmidt M, Van Bel M, Woloszynska M, Slabbinck B, Martens C, De Block M et al (2017) Plant-RRBS, a bisulfite and next-generation sequencing-based methylome profiling method enriching for coverage of cytosine positions. BMC Plant Biol 17:115
24. Yu J, Hu S, Wang J, Wong GK-S, Li S, Liu B et al (2002) A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science 296:79–92
25. Guo W, Fiziev P, Yan W, Cokus S, Sun X, Zhang MQ et al (2013) BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data. BMC Genomics 14:774
26. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359
27. Okonechnikov K, Conesa A, García-Alcalde F (2016) Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics 32:292–294
28. Quinlan AR, Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26:841–842
29. Akalin A, Kormaksson M, Li S, Garrett-Bakelman FE, Figueroa ME, Melnick A et al (2012) MethylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13:R87
30. Hsieh T-F (2015) Whole-genome DNA methylation profiling with nucleotide resolution. Methods Mol Biol 1284:27–40
31. Pignatta D, Bell GW, Gehring M (2015) Whole genome bisulfite sequencing and DNA methylation analysis from plant tissue. Bio-protocol 5:e1407
32. Dohm JC, Minoche AE, Holtgräwe D, Capella-Gutiérrez S, Zakrzewski F, Tafer H et al (2014) The genome of the recently domesticated crop plant sugar beet (Beta vulgaris). Nature 505:546–549
33. International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436:793–800
34. Schnable PS, Ware D, Fulton RS, Stein JC, Wei F, Pasternak S et al (2009) The B73 maize genome: complexity, diversity, and dynamics. Science 326:1112–1115
35. The Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408:796–815
36. The Brassica rapa Genome Sequencing Project Consortium, Wang X, Wang H, Wang J, Sun R, Wu J et al (2011) The genome of the mesopolyploid crop species Brassica rapa. Nat Genet 43:1035–1039

Chapter 6 Rice Histone Propionylation and Generation of Chemically Derivatized Synthetic H3 and H4 Peptides for Identification of Acetylation Sites and Quantification

Nino A. Espinas, Alejandro Villar-Briones, Michael C. Roy, and Hidetoshi Saze

Abstract

Histone proteins are crucial in the study of chromatin dynamics owing to their wide-ranging implications in the regulation of gene expression. Modifications of histones are integral to these regulatory processes in concert with associated proteins, such as transcription factors and coactivators. One of the biochemical techniques available to enhance analysis of histone proteins is chemical derivatization using propionic anhydride. In this protocol, we describe the use of propionylation to efficiently derivatize acid-extracted histones from rice. We also synthesize H3 and H4 tryptic peptides, thus mimicking the nature of derivatized extracted peptides to aid in identification and quantification using targeted mass spectrometry. Here we make available the masses of the precursor ions and the retention times (RT) of each synthesized peptide. These provide useful information to facilitate histone data analysis. Lastly, we note that we will distribute these synthetic peptides in nanomolar (nM) concentrations to those who wish to utilize them for assays and further experimental studies.

Key words Chemical derivatization, Propionylation, Acetylation, Histone, Rice, H3, H4, Synthetic peptides, Mass spectrometry

1 Introduction

Studies using Edman degradation protein sequencing of purified histones from peas (Pisum sativum) marked the start of plant histone acetylation research [1, 2]. In plants, histone H3 was found to be highly acetylated [3]. Previous work identified lysines 4, 9, 14, 18, 23, and 27 of histone H3 as acetylation targets, while H4 has five acetylation sites at K5, K8, K12, K16, and K20 (H4K20 is also typically methylated in yeasts and animals) [4–7]. Thus,

Electronic supplementary material: The online version of this chapter (https://doi.org/10.1007/978-1-0716-0179-2_6) contains supplementary material, which is available to authorized users. Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_6, © Springer Science+Business Media, LLC, part of Springer Nature 2020


many biochemical techniques have been developed to study histones and chromatin [8]. Mass spectrometry has also become commonplace in histone studies, although limitations abound with regard to histone analysis. Therefore, chemical derivatization via propionylation was introduced as an economically viable alternative to effectively identify and quantify modified sites in histones [8–11]. Propionylation of histones and their peptides confers two benefits: first, enzymatic digestion produces relatively longer peptides and, second, the peptides become more hydrophobic. These changes in the chemical properties of histone peptides enhance discoverability during mass spectrometry.

Here we demonstrate the use of propionylation to derivatize rice histones for mass spectrometric analysis. We also share mass spectrometric information sourced from non-acetylated and acetylated synthetic peptides of rice H3 and H4 that we have generated to facilitate histone acetylation analysis in rice epigenetic studies. Lastly, we provide non-acetylated and acetylated synthetic peptides to epigenetics researchers for use in their own assays and analyses.

We first characterized the acid-extracted proteome (i.e., histones and other acid-soluble proteins) in rice using the Nipponbare cultivar (see Subheading 3.1). In total, 2373 unique proteins were identified from the pool of wild-type and transgenic line proteomes. Rice transgenic lines used for this protocol are RNAi lines with an observed reduction in global acetylation (Espinas et al., unpublished data). Acid-extracted proteins were chemically derivatized via propionylation of N-termini and empty lysine (K) sites before and after trypsin/Lys-C digestion (see Subheading 3.2). Peptides were then resuspended using 0.1% formic acid in Milli-Q water and were analyzed with a 70 min gradient of liquid chromatography-tandem mass spectrometry (LC-MS/MS) on a hybrid Quadrupole-Orbitrap Q-Exactive Plus mass spectrometer.
All spectra were collected at high resolution using higher-energy collisional dissociation (HCD) technology, and peptide sequences were identified using a combination of SEQUEST and Mascot search algorithms, with preference for peptides having variable modifications, including propionyl (N-terminal), propionyl (K), acetyl (K), acetyl (N-terminal), and methylation. A total of 31,954 unique peptide spectrum matches were collected using a false discovery rate (FDR) of [...] 98% efficient. We then characterized 10,048 histone peptides from the acid-extracted proteome (Fig. 2c). These peptides constitute about 31% of the total acid-extracted proteome, and from these peptides, we utilized only N-terminally propionylated histones for downstream analysis. Our search identified 28 canonical and noncanonical histone proteins, with histones


Fig. 1 Efficiency of double propionyl labeling of peptides. Comparison of the amount of propionylation at peptide N-termini. The pie chart shows N-terminal propionylation in terms of the number of peptides (left) and intensity (right). Peptides were generated from doubly propionylated species of wild-type and transgenic line samples. (+) indicates the presence of propionylation and (−) indicates its absence

H3.2 and H3.3 having the most abundant peptide matches (Fig. 2a and b). To perform targeted identification and quantification of histone sites whether acetylated or not, we synthesized histone peptides mimicking the propionylated forms, as observed among extracted histone H3 and H4 peptides (see Subheading 3.3). As with extracted peptides, we doubly propionylated (see Subheading 3.4) and characterized (see Subheading 3.5) the synthetic peptides to determine their mass-to-charge ratios (m/z) and retention times (RT) necessary for identifying extracted peptides. We then performed parallel reaction-monitoring (PRM) mass spectrometry using these data to specifically quantify propionylated sites, H3K9 and H3K14, and to investigate their relative abundance and proportion in plant samples (Fig. 3) (Tables 1 and S1) (see Subheading 3.6) (see Note 7). PRM mass spectrometry results show that we can accurately identify H3K9 and H3K14. We can also estimate their proportional abundance relative to propionylated or empty peptides.
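The proportional abundance mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming that the area-under-the-curve (AUC) of each peptide form is expressed as a share of the summed AUC within a sample and then compared with the wild-type share (see Note 7). The function names and all AUC values are hypothetical and serve only as an illustration; they are not measurements from this protocol.

```python
def auc_proportions(auc_by_form):
    """Return each peptide form's share of the summed AUC for one sample."""
    total = sum(auc_by_form.values())
    if total <= 0:
        raise ValueError("summed AUC must be positive")
    return {form: auc / total for form, auc in auc_by_form.items()}


def fold_vs_wild_type(sample, wild_type):
    """Express each form's proportion relative to the wild-type proportion."""
    return {form: sample[form] / wild_type[form] for form in wild_type}


# Hypothetical AUC values (arbitrary units), for illustration only.
wild_type_auc = {"unmodified": 6.0e6, "K9ac": 1.0e6, "K14ac": 3.0e6}
line1_auc = {"unmodified": 8.0e6, "K9ac": 0.5e6, "K14ac": 1.5e6}

wt = auc_proportions(wild_type_auc)
l1 = auc_proportions(line1_auc)
fold = fold_vs_wild_type(l1, wt)

for form in wt:
    print(f"{form}: WT {wt[form]:.2f}, line 1 {l1[form]:.2f}, ratio {fold[form]:.2f}")
```

Normalizing to wild type in this way only shows that relative quantification is possible; targeting unmodified peptides or using isotope labeling, as discussed in Note 7, is the more reliable route.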

2 Materials

2.1 Rice Histone Extraction

1. Rice seedlings (30–40 days old).
2. EpiQuik Total Histone Extraction Kit (EpiGentek).
3. Trichostatin A (TSA).
4. Nicotinamide (NAM).
5. 50 mL conical centrifuge tubes.


Fig. 2 Characterization of chemically derivatized rice histone peptides using data-dependent acquisition (DDA) mass spectrometry. (a) An MS/MS spectrum of [M + 2H]2+ precursor ions from an H3 peptide digested with trypsin/Lys-C and doubly propionylated. The lower panel shows the generated b+ and y+ ions corresponding to the amino acid sequence. (b) Identified histone proteins in DDA mode with their accession codes and peptide spectrum match scores (PSMS). (c) Efficiency of double propionylation in all histone peptides shown in terms of number (left) and intensity (right). Note: Masses in the sample spectrum and table agree within a 0.01 Da deviation tolerance for fragment ions

6. Mortar and pestle.
7. Liquid nitrogen.
8. Stainless steel spatula.
9. Nuclei isolation buffer (50 mL) (derived from [12]): To 40 mL Milli-Q water, add 0.226 g (15 mM) PIPES (see Note 6b) and adjust the pH to 6.8 using NaOH. Add 4.2 g sucrose (0.25 M), 0.024 g MgCl2 (5 mM), 0.224 g KCl (60 mM), 0.044 g NaCl (15 mM), 0.008 g CaCl2 (1 mM), 450 μL Triton X-100 (0.9%), and 500 μL protease inhibitor cocktail (1/100). Adjust the volume to 50 mL with Milli-Q water. This solution should always be freshly prepared and the protease inhibitor cocktail added immediately before use.
10. EASY™ strainer cell sieves 100 μm filter (Greiner Bio-One).
11. Amicon Ultra-0.5 mL centrifugal filter Ultracel 3 K (Merck).
12. Sonicator.
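The amounts listed for the nuclei isolation buffer follow from mass = molar mass × molarity × volume. The sketch below recomputes them for a 50 mL batch so the recipe can be rescaled. The molar masses are standard anhydrous values that we assume here (they are not stated in the protocol), and CaCl2 is left out because the hydrate form used is not specified.

```python
# Assumed molar masses (g/mol) for the anhydrous compounds; not given in the protocol.
MOLAR_MASS = {
    "PIPES": 302.37,
    "sucrose": 342.30,
    "MgCl2": 95.21,
    "KCl": 74.55,
    "NaCl": 58.44,
}

# Final concentrations (mol/L) as listed for the nuclei isolation buffer.
TARGET_MOLARITY = {
    "PIPES": 0.015,
    "sucrose": 0.25,
    "MgCl2": 0.005,
    "KCl": 0.060,
    "NaCl": 0.015,
}


def grams_needed(molar_mass, molarity, volume_ml):
    """Mass in grams required to reach `molarity` in `volume_ml` of solution."""
    return molar_mass * molarity * volume_ml / 1000.0


def recipe(volume_ml):
    """Return the mass of each solute (g) for a batch of the given volume."""
    return {
        name: grams_needed(MOLAR_MASS[name], TARGET_MOLARITY[name], volume_ml)
        for name in TARGET_MOLARITY
    }


for name, grams in recipe(50).items():
    print(f"{name}: {grams:.3f} g")
```

For 50 mL the computed masses are close to the listed amounts, with small differences attributable to rounding. Calling `recipe(100)` doubles each value, which may be convenient when more tissue is processed.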


Fig. 3 Targeted mass spectrometry using parallel reaction-monitoring (PRM). PRM relative quantification for H3K9 and H3K14 acetylation using mass spectrometric analysis. AUC refers to the area-under-the-curve, which was used to quantify the intensity of each modification on a specific precursor ion and retention time (RT). The AUC proportion represents the amount measured for each modification in wild-type and transgenic lines 1 and 2

Table 1 Histone peptide masses of H3K9 and H3K14 sites (see Note 8)

Sequence                     M+H [z = 1]   M+H+Prop [z = 1]   M+2H+Prop [z = 2]   M+3H+Prop [z = 3]   Retention time (min)
K-S-T-G-G-K-A-P-R            901.52140     1069.60000         535.30335           357.20467           18.00
K(Ac)-S-T-G-G-K-A-P-R        943.53200     1055.58440         528.29555           352.53280           16.00
K-S-T-G-G-K(Ac)-A-P-R        943.53200     1055.58440         528.29555           352.53280           16.00
K(Ac)-S-T-G-G-K(Ac)-A-P-R    985.54250     1041.56870         521.28770           347.86090           15.00
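The masses in Table 1 can be checked from monoisotopic residue masses. The sketch below is an illustration under stated assumptions: standard monoisotopic values, a free C-terminal acid, one propionyl group (+56.02621 Da) on the peptide N-terminus and on every lysine that does not carry an acetyl group (+42.01057 Da), as described in Note 8. The function names are our own. Under these assumptions the results agree with the tabulated values to within 0.01 Da.

```python
# Standard monoisotopic residue masses (Da) for the residues in K-S-T-G-G-K-A-P-R.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "T": 101.04768, "K": 128.09496, "R": 156.10111,
}
WATER = 18.01056      # H2O added for the free N- and C-termini
PROTON = 1.00728
ACETYL = 42.01057     # C2H2O
PROPIONYL = 56.02621  # C3H4O


def mh_plus(sequence, n_acetyl=0):
    """[M+H]+ of the peptide carrying n_acetyl acetylated lysines."""
    return sum(RESIDUE[aa] for aa in sequence) + WATER + PROTON + n_acetyl * ACETYL


def propionylated_mz(sequence, n_acetyl=0, charge=1):
    """m/z after propionylation of the N-terminus and all non-acetylated lysines."""
    free_lysines = sequence.count("K") - n_acetyl
    if free_lysines < 0:
        raise ValueError("more acetyl groups than lysines")
    n_propionyl = 1 + free_lysines
    neutral = mh_plus(sequence, n_acetyl) - PROTON + n_propionyl * PROPIONYL
    return (neutral + charge * PROTON) / charge


peptide = "KSTGGKAPR"
for n_ac in (0, 1, 2):
    values = [propionylated_mz(peptide, n_ac, z) for z in (1, 2, 3)]
    print(n_ac, f"{mh_plus(peptide, n_ac):.4f}", [f"{v:.4f}" for v in values])
```

Because K9ac and K14ac peptides share the same composition, they give identical precursor masses here, consistent with Table 1; they are distinguished only at the MS2 level (see Note 8).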


2.2 Propionylation of Synthetic and Biological Peptides

1. Purified histone extracts.
2. Synthetic histone peptides.
3. 100 mM tetraethylammonium bromide (TEAB) in Elix® water.
4. 25% ammonium hydroxide (NH4OH).
5. Propionic anhydride/isopropanol (1:3) solution.
6. Trypsin/Lys-C mix protease (e.g., Promega).
7. Vacuum dryer.
8. Incubator.
9. pH paper.

2.3 Fmoc-Based Solid-Phase Synthesis of Peptides

We used an Intavis ResPep SL automated peptide synthesizer for all syntheses of non-acetylated and acetylated histone H3 and H4 peptides. The synthesis scale was set at 2 μmol. If not specified otherwise, solvent concentrations are assumed to be 100%.

1. Fmoc-protected amino acids (Watanabe Chemical Ind., Ltd.): Fmoc-Ala-OH, Fmoc-Arg(Pbf)-OH, Fmoc-Asn(Trt)-OH, Fmoc-Asp(OtBu)-OH, Fmoc-Gln(Trt)-OH, Fmoc-Glu(OtBu)-OH, Fmoc-Gly-OH, Fmoc-His(Trt)-OH, Fmoc-Ile-OH, Fmoc-Leu-OH, Fmoc-Lys(Boc)-OH, Fmoc-Met-OH, Fmoc-Phe-OH, Fmoc-Pro-OH, Fmoc-Ser(tBu)-OH, Fmoc-Thr(tBu)-OH, Fmoc-Tyr(tBu)-OH, Fmoc-Val-OH, and Fmoc-Lys(Ac)-OH. All should be dissolved at 0.5 M, using N-methyl-2-pyrrolidone (NMP) as the solvent.
2. Reagents: Prepare the following for each zone: 0.5 M HBTU (see Note 6a) in dimethylformamide (DMF) as a coupling reagent, 45% (v/v) N-methylmorpholine (NMM) in DMF as base, 12 mL (actual) NMP in the NMP zone, 48 mL (actual) dichloromethane (DCM) in the DCM zone, 5% (v/v) acetic anhydride in DMF as cap mixture, 20% (v/v) piperidine in DMF in the piperidine zone for amino acid deprotection, and 466 mL (actual) DMF as solvent 1. In the pump zone, use 1564 mL (actual) DMF as solvent 1 and 46 mL (actual) ethanol as solvent 2.
3. Resin preparation: Compute the amount of Fmoc-Rink-Amide (aminomethyl)-Resin (Intavis) for the number of peptides to be synthesized. For example, for a resin loading value of 0.48 mmol/g (as shown on the resin bottle), for synthesis of 2 μmol, we get 2 mg/peptide multiplied by the number of peptides for synthesis. Prepare the resin in a separate conical centrifuge tube.

2.4 Workup of Synthesized Peptides

1. Cleaving reagent: 100% TFA/TIPS/H2O (90:5:5).
2. Precipitation reagent: tert-butyl methyl ether.

2.5 LC-MS/MS

1. Liquid chromatography: Dionex UltiMate 3000 RSLCnano.
2. Mass spectrometer: We used a hybrid Quadrupole-Orbitrap (Q-Exactive Plus, Thermo Scientific).
3. Column: Zorbax 300SB-C18, 0.3 × 150 mm, 3.5 μm, Agilent Technologies.
4. 0.1% Formic acid (FA) in LCMS-grade water.

3 Methods

3.1 Extraction of Rice Histones

This is a hybrid protocol using our in-house procedures (i.e., with contributions from collaborating laboratories) combined with the protocol from the EpiQuik total histone extraction kit. The starting amount of sample tissue can vary from 3 to 5 g, with a yield ranging from 0.2 to 1.1 μg/μL of purified histones. Late vegetative phase (30–40 days old) plants were hyperacetylated by treating them with a final concentration of 1 μM trichostatin A (TSA) and 100 μM nicotinamide (NAM) for 48 h before collection (see Note 1). Roots were discarded during sampling.

1. Homogenize the plant sample completely in liquid nitrogen using a mortar and pestle. Avoid thawing the samples.
2. Transfer the powdered sample to 50 mL conical centrifuge tubes using a frozen spatula. Keep on ice for 5 min.
3. Suspend the sample in 5 mL cold nuclei isolation buffer with 1/100 protease inhibitor cocktail and histone deacetylase inhibitor cocktail (HDACi) (2.5 μL 4 mM TSA and 25 μL 1 M NAM).
4. Vortex for 1 min to mix and keep on ice for 30 min.
5. Filter the homogenized sample into a 50 mL conical centrifuge tube using an EASY strainer 100 μm filter. Centrifuge for 20 min at 11,000 × g, 4 °C. You will be able to see a white pellet (nuclei) with a layer of chlorophyll on the surface.
6. Discard the supernatant.
7. From the EpiQuik total histone extraction kit, dilute the 10× pre-lysis buffer to 1× with Milli-Q water. Add HDACi (0.5 μL 4 mM TSA and 5 μL of 100 mM NAM) to 1 mL 1× pre-lysis buffer.
8. Add the 1 mL of 1× pre-lysis buffer to the homogenized sample. Mix by pipetting.
9. Centrifuge for 20 min at 11,000 × g, 4 °C.
10. Remove the supernatant and discard. Keep the pellet on ice (see Note 2).


11. Resuspend the nuclear pellet in 500 μL of lysis buffer with HDACi (0.5 μL 4 mM TSA and 5 μL of 100 mM NAM/1 mL lysis buffer).
12. Sonicate: 30 s, 20% amplitude, and cool on ice for 1 min. Repeat four times.
13. Incubate for 30 min on ice.
14. Centrifuge for 5 min at 13,000 × g, 4 °C. Transfer the supernatant fraction containing the acid-soluble proteins into a new microcentrifuge tube.
15. Prepare the balance-DTT buffer by adding DTT solution to balance buffer at a 0.5:250 μL ratio.
16. Add 0.3 volume of the balance-DTT buffer to the supernatant immediately.
17. Further purify histone samples using an Amicon Ultra-0.5 mL centrifugal filter Ultracel 3 K by centrifugation for 30 min at 14,000 × g, 22 °C.
18. Replace the lost balance-DTT buffer with 300 μL Milli-Q water, and centrifuge again for 30 min at 14,000 × g, 22 °C. Repeat once (1×), starting from the replacement with Milli-Q water.
19. Check the amount of liquid left and adjust the volume to approximately 100 μL using Milli-Q water.
20. Collect the histone proteins into a new microcentrifuge tube by invert-centrifugation.
21. Measure the histone protein concentration (see Note 3).

3.2 Double Propionylation of Biological Peptides

All procedures should be performed in low-binding microcentrifuge tubes.

1. Prepare at least 10 μg of purified histones. Vacuum dry the sample.
2. Add 10 μL of 100 mM TEAB.
3. Add 0.5 μL 25% NH4OH.
4. Subsequently, add propionic anhydride/isopropyl alcohol (1:3), for example, 1.25 μL 99% propionic anhydride and 3.75 μL of 100% isopropyl alcohol.
5. Incubate for 15 min at 37 °C while adding 3 μL 25% NH4OH every 5 min. Mix continuously.
6. Vacuum dry the sample.
7. Resuspend the histones in 10 μL of 100 mM TEAB.
8. Double digest the histones with Trypsin/Lys-C enzyme at 37 °C overnight by adding 1 μg/μL Trypsin/Lys-C to 10 μg/μL histones (1:10) (see Note 4).
9. The following day, vacuum dry the solution again.
10. Repeat steps 2–6 (see Subheading 3.2). Prepare the sample for LC-MS/MS (see Subheading 3.5).


3.3 Synthesis of Rice Histone H3 and H4 Synthetic Peptides


Automated synthesis of peptides using the Intavis ResPep SL will not be detailed in this protocol, but we refer readers to the manufacturer’s protocol in the product manual. It should also be noted that we have synthesized peptides with C-terminal amide (see Note 9).

1. Prepare Fmoc-protected amino acids according to standard automated Fmoc protocols at 0.5 M of each amino acid in NMP and 0.5 M of HBTU coupling reagent in DMF. Deprotect amino acids using 20% piperidine in DMF, and synthesize the peptides on solid phase using Fmoc-Rink Amide resin.
2. Cleave synthesized peptides from the resin by adding 0.2 mL (2×) of 100% TFA/TIPS/H2O (90:5:5).
3. Precipitate peptides from the solution by adding 0.5 mL of −30 °C cold tert-butyl methyl ether and keep at −30 °C overnight.
4. The next day, discard the ether solution, wash the precipitated peptides (5×, 0.5 mL) with the same cold ether, and vacuum dry.

3.4 Double Propionylation of Recovered Synthetic Peptides

All procedures can be performed in low-binding microcentrifuge tubes.

1. Prepare 2 μg of synthetic peptides and add 10 μL of 100 mM TEAB.
2. Add 0.5 μL 25% NH4OH.
3. Subsequently, add propionic anhydride/isopropyl alcohol (1:3), for example, 1.25 μL 99% propionic anhydride and 3.75 μL of 100% isopropyl alcohol.
4. Incubate for 15 min at 37 °C while adding 3 μL 25% NH4OH every 5 min. Mix continuously.
5. Vacuum dry the sample.
6. Repeat steps 1–5 (Subheading 3.4) by adding another 10 μL of 100 mM TEAB. Prepare the sample for LC-MS/MS (see Subheading 3.5).

3.5 Characterization of Doubly Propionylated Peptides Using LC-MS/MS

Operation of the LC-MS/MS will not be detailed in this protocol, but we refer readers to their institutional mass spectrometry service. We used a Dionex UltiMate 3000 RSLCnano system operated at microflow rate, combined with an electrospray ion source for mass spectrometric characterization of synthetic peptides and for targeted analysis of extracted peptides.

1. Add 30 μL of 0.1% FA in LCMS-grade water to vacuum dried peptides. Mix well (see Note 5).
2. Separate peptides using a C18 column with HPLC solvent A (0.1% v/v formic acid, 1% v/v acetonitrile in LCMS-grade water) and solvent B (0.1% v/v formic acid, 98% v/v acetonitrile in LCMS-grade water).
3. In a hybrid Quadrupole-Orbitrap, use data-dependent acquisition (DDA) mode for data collection.
4. Separate peptides in a C18 column at 3 μL per minute by a gradient: 1–5% solvent B for 2 min, followed by a ramp of 5–35% B in 50 min, then 35–45% B in 2 min, a wash at 75% B in 5 min, and re-equilibration at 1% B for a total run of 70 min.
5. Collect full-scan mass spectra at 70,000 resolution with a mass range of 250–1500 m/z and a target value of 1e6. Record MS/MS spectra at 17,500 resolution (higher-energy collisional dissociation or HCD).

3.6 Identification and Quantification of Extracted Rice Histone Peptides Using Synthetic Peptide Information

For parallel reaction-monitoring (PRM) mass spectrometry, it is necessary to input a mass inclusion list for histone H3 and H4 peptides (see Table S1). In that file, we have provided mass spectrometric information for all peptides we synthesized for targeted identification and quantification of extracted histones. Conditions for LC-MS/MS in PRM were similar to DDA, except with a target value of 2e5 and an isolation window of 2.0 m/z. Various software packages (e.g., Proteome Discoverer and Pinpoint by Thermo Scientific or Skyline by MacCoss laboratory) may be used for quantification of modifications and other analyses.
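To make the inclusion list described above concrete, the following minimal sketch builds entries from the doubly charged precursor masses and retention times in Table 1 and derives the isolation bounds for a 2.0 m/z window. The data structure, the labels, and the ±1 min retention-time tolerance are our own assumptions for illustration; they do not represent an instrument file format or a setting from this protocol.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Target:
    label: str
    mz: float      # precursor m/z, [M+2H+Prop] column of Table 1
    charge: int
    rt_min: float  # retention time from Table 1


TARGETS = [
    Target("H3 K9/K14 unmodified", 535.30335, 2, 18.00),
    Target("H3 K9ac or K14ac", 528.29555, 2, 16.00),
    Target("H3 K9ac K14ac", 521.28770, 2, 15.00),
]


def isolation_bounds(target, window=2.0):
    """Lower and upper m/z limits of an isolation window centred on the target."""
    half = window / 2.0
    return target.mz - half, target.mz + half


def rt_window(target, tolerance_min=1.0):
    """Scheduling window around the retention time (tolerance is hypothetical)."""
    return target.rt_min - tolerance_min, target.rt_min + tolerance_min


def windows_overlap(a, b, window=2.0):
    """True if the isolation windows of two targets share any m/z range."""
    lo_a, hi_a = isolation_bounds(a, window)
    lo_b, hi_b = isolation_bounds(b, window)
    return lo_a < hi_b and lo_b < hi_a


for t in TARGETS:
    lo, hi = isolation_bounds(t)
    start, end = rt_window(t)
    print(f"{t.label}: m/z {lo:.4f}-{hi:.4f}, RT {start:.1f}-{end:.1f} min")
```

A pairwise check with `windows_overlap` shows that the three precursors do not share isolation windows at 2.0 m/z, whereas the two singly acetylated forms occupy a single entry and must be resolved by their fragment ions (see Note 8).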

4 Notes

1. Histone deacetylase inhibitor mix (HDACi) uses trichostatin A (TSA) and nicotinamide (NAM), which broadly inhibit class I and III deacetylases. Note that the mix does not inhibit class IV deacetylases, although class I and III types comprise the majority of known deacetylases [13]. For acetylation analysis, it is very important to hyperacetylate samples before extraction to prevent the dynamic removal of the acetyl moiety, which might make it difficult to measure acetylation levels.
2. This step can be utilized as a pause point.
3. Quantify histones accurately with fluorescence (e.g., Qubit Protein Assay Kit) or with infrared spectrometry (e.g., Direct Detect). Do not use the Bradford colorimetric method as core histones, including H1, react poorly if at all with Coomassie dye. Samples in this step can be used for assays that require pure histone samples. Samples must be diluted and optimized for specific biochemical assays. For Western blotting and downstream assays that use antibodies, check the pH of the purified histone samples and adjust to pH 6.5–8.4 to prevent inhibition of binding.
4. Digestion conditions should be basic to optimize protease activity. Check the pH of the reaction solution if necessary.
5. The solution must be acidic (pH 2–4) to fully solubilize the peptides in the solution. Check the pH of the solution if necessary.
6. (a) HBTU, hexafluorophosphate benzotriazole tetramethyl uronium; (b) PIPES, piperazine-N,N′-bis(2-ethanesulfonic acid).
7. Data presented for relative quantification were only normalized based on the value of wild-type AUC to show that identification and quantification are possible; however, a more appropriate way to normalize data for reliable quantification is to target the peptide or peptides that are unmodified and/or to use isotope labeling [9, 10].
8. Lysine sites are available for propionylation if they are unmodified or monomethylated. Masses (colored) are utilized as an inclusion list for identification and quantification of specific in vivo histone lysine site modifications. For quantification of acetylation at K9 or K14, as shown in this protocol, we utilized Pinpoint software (Skyline is also highly recommended) to select the unique MS2 ion.
9. We could not synthesize some very long peptides, and no information is given for them in Table S1 (see supplementary information). We synthesized peptides with C-terminal amide to save cost. However, we confirmed that propionylation occurs at the N-terminus and that fragmentation was not affected. As propionylation enhances y-ions, we have also confirmed that having amides on the C-terminus did not affect y-ion fragmentation.
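The HDACi additions in Subheading 3.1 are given as stock volumes, so the resulting final concentrations follow from C1V1 = C2V2. The sketch below is a minimal illustration that, as a simplifying assumption, neglects the small volume contributed by the stock itself; the function name and the choice of μM as the common unit are ours.

```python
def final_concentration_um(stock_um, stock_volume_ul, buffer_volume_ul):
    """Final concentration (uM) after adding a stock to a buffer (C1*V1 = C2*V2).

    The added stock volume is neglected relative to the buffer volume.
    """
    if buffer_volume_ul <= 0:
        raise ValueError("buffer volume must be positive")
    return stock_um * stock_volume_ul / buffer_volume_ul


# Step 3: 2.5 uL of 4 mM TSA and 25 uL of 1 M NAM in 5 mL nuclei isolation buffer.
tsa_step3 = final_concentration_um(4_000, 2.5, 5_000)
nam_step3 = final_concentration_um(1_000_000, 25, 5_000)

# Steps 7 and 11: 0.5 uL of 4 mM TSA and 5 uL of 100 mM NAM per 1 mL buffer.
tsa_step7 = final_concentration_um(4_000, 0.5, 1_000)
nam_step7 = final_concentration_um(100_000, 5, 1_000)

print(f"Step 3: TSA {tsa_step3:.1f} uM, NAM {nam_step3:.0f} uM")
print(f"Steps 7 and 11: TSA {tsa_step7:.1f} uM, NAM {nam_step7:.0f} uM")
```

With the volumes as written, TSA comes to about 2 μM in every buffer, while NAM comes to about 5 mM in the nuclei isolation buffer and about 0.5 mM in the pre-lysis and lysis buffers.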

Acknowledgments

The Plant Epigenetics Unit and Instrumental Analysis Section are funded in part by the Okinawa Institute of Science and Technology Graduate University (OIST). N.A.E. was also funded by Grant-in-Aid for JSPS Fellows (Grant No: 18F18085). We thank Steven D. Aird (OIST) for editing the manuscript.


References

1. Bonner J, Chalkley GR, Dahmus M, Fambrough D, Fujimura F, Huang RC, Huberman J, Jensen R, Marushige K, Ohlenbusch H, Olivera B, Widholm J (1968) Isolation and characterization of chromosomal nucleoproteins. Methods Enzymol 12:3–65
2. Fambrough DM, Fujimura F, Bonner J (1968) Quantitative distribution of histone components in the pea plant. Biochemistry 7(2):575–585
3. Waterborg JH (1990) Sequence analysis of acetylation and methylation in two histone H3 variants of alfalfa. J Biol Chem 265(28):17157–17161
4. Earley KW, Shook MS, Brower-Toland B, Hicks L, Pikaard CS (2007) In vitro specificities of Arabidopsis co-activator histone acetyltransferases: implications for histone hyperacetylation in gene activation. Plant J 52(4):615–626. https://doi.org/10.1111/j.1365-313X.2007.03264.x
5. Matthews HR, Waterborg JH (1985) Reversible modifications of nuclear proteins and their significance. The enzymology of posttranslational modification of proteins 2:125–185
6. Waterborg JH (1992) Identification of five sites of acetylation in alfalfa histone H4. Biochemistry 31(27):6211–6219
7. Zhang K, Sridhar VV, Zhu J, Kapoor A, Zhu JK (2007) Distinctive core histone posttranslational modification patterns in Arabidopsis thaliana. PLoS One 2(11):e1210. https://doi.org/10.1371/journal.pone.0001210
8. Shechter D, Dormann HL, Allis CD, Hake SB (2007) Extraction, purification and analysis of histones. Nat Protoc 2(6):1445–1457. https://doi.org/10.1038/nprot.2007.202
9. Garcia BA, Mollah S, Ueberheide BM, Busby SA, Muratore TL, Shabanowitz J, Hunt DF (2007) Chemical derivatization of histones for facilitated analysis by mass spectrometry. Nat Protoc 2(4):933–938. https://doi.org/10.1038/nprot.2007.106
10. Maile TM, Izrael-Tomasevic A, Cheung T, Guler GD, Tindell C, Masselot A, Liang J, Zhao F, Trojer P, Classon M, Arnott D (2015) Mass spectrometric quantification of histone post-translational modifications by a hybrid chemical labeling method. Mol Cell Proteomics 14:1148–1158. https://doi.org/10.1074/mcp.O114.046573
11. Meert P, Govaert E, Scheerlinck E, Dhaenens M, Deforce D (2015) Pitfalls in histone propionylation during bottom-up mass spectrometry analysis. Proteomics 15:2966–2971
12. Saleh A, Alvarez-Venegas R, Avramova Z (2008) An efficient chromatin immunoprecipitation (ChIP) protocol for studying histone modifications in Arabidopsis plants. Nat Protoc 3(6):1018–1025. https://doi.org/10.1038/nprot.2008.66
13. Scholz C, Weinert BT, Wagner SA, Beli P, Miyake Y, Qi J, Jensen LJ, Streicher W, McCarthy AR, Westwood NJ, Lain S, Cox J, Matthias P, Mann M, Bradner JE, Choudhary C (2015) Acetylation site specificities of lysine deacetylase inhibitors in human cells. Nat Biotechnol 33(4):415–423. https://doi.org/10.1038/nbt.3130

Part II Epigenetics and Plant Chromatin Structure

Chapter 7 Preparing Chromatin and RNA from Rare Cell Types with Fluorescence-Activated Nuclear Sorting (FANS)

Ruben Gutzat and Ortrun Mittelsten Scheid

Abstract

The application of fluorescent tags to generate cell type-specific translational and transcriptional reporter lines is routine in plants, but separation of different cell types for downstream analyses is hampered by the presence of cell walls and tight connections between cells. Enzymatic removal of cell walls induces a wound response, dedifferentiation, or reprogramming of the resulting protoplasts. Their osmotic and mechanical instability and their large size range are challenging for FACS, a flow-sorting procedure based on differential expression of fluorescent tags. In contrast, plant nuclei are relatively robust and easy to isolate. Here, we describe a protocol for fluorescence-activated nuclear sorting (FANS) that allows efficient purification of very few fluorescence-tagged nuclei from a large background of non-labeled tissue. Purified nuclei are suitable for genome, epigenome, transcriptome, or proteome analyses. We describe in detail how to analyze nuclear RNA and DNA methylation from sorted nuclei representing the limited number of stem cells in the shoot apical meristem of Arabidopsis.

Key words FACS, FANS, RNA-seq, DNA methylation, Nucleus, Epigenetics

1 Introduction

The transcriptome and chromatin structure of plant cells are highly dynamic and can change rapidly during development and in response to changing environments (reviewed in [1, 2]). For example, the genomic landscape of histone modification changes dramatically in vegetative and sperm cell nuclei during pollen development of Arabidopsis [3], and heat stress reduces nucleosome occupancy and induces heterochromatin decompaction and reactivation of otherwise silenced regions [4]. Even the relatively static pattern of DNA methylation can be highly dynamic in stem, germline, and somatic cell types during development [5–7] and at a population level [8]. These examples likely represent just the tip of an iceberg of epigenetic and transcriptional heterogeneity between tissues, cell types, and individual cells.

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_7, © Springer Science+Business Media, LLC, part of Springer Nature 2020


Cell type-specific expression has been imaged with remarkable resolution ever since fluorescent protein (FP) tags became available to construct translational or transcriptional reporter lines (reviewed in [9, 10]). However, the use of the reporters to isolate individual cells or cell types for molecular analysis is limited by the need for (1) disintegration of cell walls and (2) a certain amount and (3) a certain degree of purity of material. Although the fluorescence reporters substantially support improvement in the latter case, most studies aiming at genome-wide characterization of specific transcriptomes or epigenomes in plants have so far been conducted with mixed tissues or abundant cell types. Protoplast preparation in combination with fluorescent markers has made plant cells accessible to cell sorting procedures like those well established in animal cell cultures (reviewed in [11]), but this approach is hampered by unintended physiological responses in the cells, dedifferentiation, osmotic instability, or interfering compounds like starch or secondary metabolites. The same factors often impede RNA and chromatin analysis. Circumventing some of the problems, alternative approaches via isolation of nuclei have been established. Nuclei preparation can be done quickly on ice or from frozen material to prevent unwanted chromatin changes. Nuclei contain all the chromatin, and the nuclear RNA pool can be used as a proxy for cellular mRNA levels [6, 12]. Cell type-specific biotin labeling of the nuclear envelope followed by affinity purification of nuclei with the INTACT method (isolation of nuclei tagged in specific cell types) [12–14] has been used successfully to determine mRNA content, genomewide histone modifications, and chromatin accessibility in several cell types but can fail if the ratio of labeled to unlabeled nuclei is very low. 
Alternatively, fluorescence-labeled nuclei are separated from others by flow sorting, in a process termed FANS (fluorescence-activated nuclei sorting). This principle has been successfully applied to different plant cell types and downstream analyses [15–18]. Here, we provide a protocol by which we successfully used the high sensitivity and specificity of FANS to obtain pure fractions of stem cells of Arabidopsis shoot apical meristems (SAM). We further describe in detail how the small numbers of sorted nuclei can be used for subsequent transcriptome and methylome analysis by preparing libraries for RNA- and bisulfite sequencing. Furthermore, FANS could be used to isolate different cell types and transition stages simultaneously and has the potential to be applied for single-cell (nuclei) sequencing techniques.

2 Materials

2.1 Nuclei Isolation

Plant material: lines transgenic for a suitable fluorescent protein (FP) reporter construct labeling nuclei in the cell type of interest with high intensity and stability (see Note 1).


Disposables: one-way plastic petri dishes (diameter 5 cm or bigger); disposable blades for the TissueRuptor (Qiagen) or fresh razor blades (e.g., we use Wilkinson Sword Classic, double-sided, see Note 2); Eppendorf tubes of 1.5, 2.0, and 5.0 mL volume; 10 mL Falcon tubes; 1 mL pipette tips (cut tip wider with scissors); ice for cooling.

Equipment: CellTrics 30 μm nylon filters (Sysmex), microcentrifuge with cooling function, pipettes, forceps, TissueRuptor (Qiagen) (optional).

Nuclei isolation buffer (NIB [19]) (see Note 3): 500 mM sucrose, 100 mM KCl, 10 mM Tris–HCl pH 9.5, 10 mM EDTA, 4 mM spermidine, 1 mM spermine. Filter-sterilize and keep at 4 °C. Alternatively, NIB can be stored at −20 °C for several months. Add 2-mercaptoethanol (1% v/v) just before use (see Note 4).

Galbraith buffer (GB [20]) (see Note 3): 45 mM MgCl2·6H2O, 30 mM sodium citrate (trisodium), 20 mM MOPS (3-(N-morpholino)propanesulfonic acid), pH 7.0. Autoclave or filter-sterilize. Add 0.1–1% Triton X-100 (v/v, see Note 3), and store at 4 °C if need be. Add 5 μL 2-mercaptoethanol (2-ME) per mL GB and optional inhibitors (see Note 4) just before use.

Staining buffer (SB): GB plus 5 μg/mL DAPI (4′,6-diamidino-2-phenylindole). DAPI stock solution 5 mg/mL.

2.2 Fluorescence-Activated Nuclei Sorting (FANS)

Disposables: polystyrene tubes (e.g., 55 × 12 mm, 3.5 mL, Sarstedt), DNA/RNA low-bind tubes (1.5 mL LoBind Tubes, Eppendorf).

Equipment: cell sorter, FACS Aria III (BD Biosciences) or equivalent.

2.3 RNA Extraction and Quality Control

Disposables: DNA/RNA low-bind tubes (as above), filter tips for pipettes, kit for Bioanalyzer quantification (e.g., the Agilent RNA 6000 Pico Kit), DNAse, cDNA Synthesis Kit (Biorad iScript or equivalent), qPCR primers and probe for gene of interest (designed, e.g., with https://lifescience.roche.com/en_at/brands/universal-probe-library.html#assay-design-center), qPCR primers for control gene (we use in most instances F: ggattttcagctactcttcaagcta and R: ctgccttgactaagttgacacg, amplifying AT2G28390.1 and usable with Roche Probe 157), qPCR master mix, SMART-Seq V4 kit (Takara), or alternatively, use the protocol from [21].

Equipment: RNAse-free workspace, vortex, −20 °C and −80 °C freezers, microcentrifuge with cooling function, heat block, Bioanalyzer (including priming station, IKA vortex mixer, and 2100 Expert Software/Agilent Technologies), LightCycler (e.g., Roche96), PCR machine (e.g., Eppendorf).


Ruben Gutzat and Ortrun Mittelsten Scheid

TRIzol LS (Ambion/now Thermo Fisher).

Chloroform (molecular grade).

Isopropanol (molecular grade).

Glycogen (RNAse/DNAse-free, e.g., glycogen RNA grade, Thermoscientific). Do NOT use GlycoBlue, as this will interfere with downstream applications.

Ethanol (75% or 85%, molecular grade).

Nuclease-free water.

Liquid nitrogen.

2.4 DNA Extraction and Quantification

Disposables: DNA/RNA low-bind tubes. Equipment: Fluorescence NanoDrop (e.g., NanoDrop 3300 Fluorospectrometer/Thermoscientific). Quick-gDNA MicroPrep Kit (Zymo Research) or equivalent. Nuclease-free 1X-TE buffer (either self-made or, e.g., IDT, IDTE). PicoGreen dsDNA Reagent Kit (Thermoscientific). Pico Methyl-Seq Library Prep Kit (Zymo Research).

3 Methods

3.1 Nuclei Isolation

1. Depending on the cell type, it might be necessary to enrich for the tissue containing the desired cells. For stem cell nuclei, we collect shoot apices mechanically with forceps from seedlings or inflorescences into 1 mL NIB placed in a small petri dish on ice. Clean material from soil particles or seed coats. Take off most of the buffer with a pipette before transferring material to ~2 mL GB in 5 mL Eppendorf tubes on ice (see Note 5).
2. Material can be processed either with the TissueRuptor or manually. Use disposable blades for the TissueRuptor in at least 1.8 mL GB in 5 mL Eppendorf tubes for 1 min at the lowest possible speed, so as not to destroy nuclear integrity. If no TissueRuptor is available, samples can be processed manually by chopping vigorously (at least 5–10 min/sample) in GB. We recommend using the lid of a petri dish tilted at an angle of ca. 10° (so that excess liquid can collect at the bottom) on ice and chopping moistened but not overly wet tissue with a new razorblade (see Note 2). In general, the quality of nuclei is better after hand chopping, but the yield of rare nuclei is worse.
3. Filter homogenate through a 30 μm filter into Eppendorf tubes. For hand chopping, wash the petri dish with a small volume of GB and combine with the sample.
4. Centrifuge 15 min at 1000 × g in a precooled microcentrifuge.


5. Remove most of the supernatant and resuspend the nuclei carefully (with cutoff pipette tips or a soft brush) in 1–2 mL SB. If the ratio of plant material to buffer is unfavorable, another washing step in GB is recommended; otherwise it is not necessary.
6. Incubate for 15 min.
7. Filter homogenate once more (green Partec filter 30 μm) into a flow sorting tube.

3.2 Fluorescence-Activated Nuclei Sorting (FANS)

1. For sorting the isolated nuclei, a BD Biosciences FACS Aria II, III, or similar instrument can be used. For DAPI excitation, a 375 nm laser is ideal, but a 405 nm laser works, too (with appropriate filters, e.g., 442/46 nm or 450/40 nm). For GFP, a 488 nm laser (and 530/30 nm detection) is required, and for mCherry, a 561 nm laser with 610/20 nm detection is required.
2. If nuclei are sorted into lysis buffer (or TRIzol), a nozzle size of 70 μm and default sheath pressure of 70 PSI can be used. If intact nuclei are required for downstream analysis, we recommend using a larger nozzle (e.g., 100 μm with 20 PSI) (see also Note 6).
3. Collection tube holders and the loading port should be cooled to 4 °C. Shake the sample tube well before installing it on the loading port to avoid losing nuclei while adjusting gates. Start with a wild-type sample that does not contain fluorescent protein-labeled nuclei.
4. Adjust flow rate and forward (FSC-A) and side scatter (SSC-A) areas. Establish DAPI populations using “count over DAPI-A” as well as “DAPI-A over DAPI-W (width)” (see Note 7).
5. Establish gating for the peak area of the desired fluorophore (e.g., mCherry or eGFP) on PE-A. The procedure for mCherry-labeled material is described here as an example. Ideally, events should be distributed along the diagonal. Record a certain number of DAPI events (e.g., 25 k, 50 k, or 100 k) and outline a gate with a maximum size shifted towards mCherry-A that does not contain events. Then, with a sample containing labeled nuclei, record the same number of DAPI events and look for a population of events shifted towards mCherry. Adjust the mCherry gating so that less than 1% of mCherry events occur in the FP-free wild-type sample, compared to the sample with labeled nuclei, and start sorting. A typical result is shown in Fig. 1.
6. Save recorded data as PDFs and FCS files for more in-depth analysis or documentation (processing figures with FlowJo, https://www.flowjo.com).
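Two planning calculations for a sort can be sketched as follows. The first checks the gating criterion of step 5 under one possible reading (events falling in the fluorophore gate for the wild type, divided by events in the same gate for the labeled sample, with both recorded to the same number of DAPI events). The second estimates the sorted volume from the per-1000-nuclei figures given in Note 6. Function names and the example counts are assumptions for illustration only.

```python
# Approximate sorted volume in uL per 1000 nuclei, keyed by
# (nozzle size in um, sheath pressure in PSI); values from Note 6.
UL_PER_1000_NUCLEI = {(70, 70): 1.0, (100, 20): 3.5}


def gate_background_ratio(wt_events_in_gate, labeled_events_in_gate):
    """Gate events in the FP-free wild type relative to the labeled sample.

    Assumes both samples were recorded to the same number of DAPI events.
    """
    if labeled_events_in_gate <= 0:
        raise ValueError("no events in the gate for the labeled sample")
    return wt_events_in_gate / labeled_events_in_gate


def gate_is_acceptable(wt_events_in_gate, labeled_events_in_gate,
                       max_ratio=0.01):
    """True if wild-type background stays below the 1% criterion."""
    ratio = gate_background_ratio(wt_events_in_gate, labeled_events_in_gate)
    return ratio < max_ratio


def sorted_volume_ul(n_nuclei, nozzle_um=70, sheath_psi=70):
    """Estimated volume of n sorted nuclei for a listed nozzle/pressure pair."""
    return n_nuclei / 1000 * UL_PER_1000_NUCLEI[(nozzle_um, sheath_psi)]


# Hypothetical counts: 1 wild-type event vs. 250 labeled events in the gate.
print(gate_is_acceptable(1, 250))
# Hypothetical sort of 20,000 nuclei with either configuration.
print(sorted_volume_ul(20000), sorted_volume_ul(20000, 100, 20))
```

The volume estimate is useful for deciding in advance whether a sort will stay within the capacity of the prepared collection tube.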


Fig. 1 Typical FANS result separating mCherry-labeled nuclei (left) from stem cells of the Arabidopsis shoot apical meristem and the corresponding wild type as negative control (right). (a) DAPI profile revealing different ploidy levels; (b) distribution of events; (c) statistical data (derived from Gutzat et al. [6])

3.3 RNA Extraction and Quantification

1. Prepare collection tubes: for 1–125 μL of sorted nuclei, provide 375 μL of TRIzol LS in 1.5 mL DNA/RNA low-binding tubes.
2. Sort nuclei (up to 125 μL; if more, double TRIzol volume) directly into the prepared collection tubes. If possible, carry out the following steps in a PCR workstation.
3. Follow the suppliers’ manual for TRIzol LS with volumes halved, in short: incubate 5 min at room temperature.
4. Add 100 μL chloroform and shake/vortex vigorously for 15 s.
5. Incubate 10 min at room temperature.
6. Centrifuge 15 min at 4 °C at full speed.
7. Remove aqueous phase and transfer into new DNA low-binding 1.5 mL tubes (see Note 8).
8. Add 250 μL isopropanol and 1.5 μL RNA-grade glycogen.


9. Mix and leave overnight at −20 °C.
10. Centrifuge 20 min at 4 °C.
11. Wash pellet with 750 μL 75% ethanol (for small RNAs, use 80–85% EtOH).
12. Centrifuge 5 min at 4 °C.
13. Repeat washing steps 11 and 12.
14. Carefully remove all ethanol and dry pellets, within PCR workstation or flow hood.
15. Resuspend in 8.2 μL nuclease-free water by pipetting up and down (solubilization might be improved by vortexing and heating samples to 60 °C for 2 × 5 min, interrupted by snap-freezing in liquid nitrogen between heat applications).
16. To avoid multiple freeze-thaw cycles, take an aliquot of 1.2 μL of each sample for RNA quantification before storing the RNA at −80 °C.
17. Quantify RNA concentration with Bioanalyzer pico-chip (according to manufacturers’ protocol). In short:
(a) Prepare the RNA ladder.
(b) Set up chip priming station.
(c) Prepare gel and gel-dye mix.
(d) Load gel-dye mix, conditioning solution and marker.
(e) Load diluted ladder and samples.
(f) Insert chip in Agilent 2100 Bioanalyzer, choose appropriate program, enter sample information, and start run.
(g) Typical nuclear RNA profiles are shown in Fig. 2 (see Note 9).

Fig. 2 Typical profiles of RNA on a Bioanalyzer pico-chip. Left, total RNA; right, nuclear RNA


18. An additional informative step now is a test for enrichment of the desired cell type by quantitative RT-PCR with primers for a specificity marker gene and normalization with a housekeeping gene. Due to the low amount of input, we recommend using the iScript cDNA Synthesis Kit (Biorad) and probe system from Roche (see equipment) according to the suppliers’ protocols. Sequencing libraries can be generated either with Smart-seqV4 (Takara) or (much more economically) according to the protocol from [21].

3.4 DNA Extraction and Quantification

1. Prepare collection tubes: provide 200 μL of genomic lysis buffer (from the Zymo Quick-gDNA MicroPrep Kit) in 1.5 mL DNA/RNA low-binding tubes.
2. Sort into collection tubes as described above. As the lysis buffer contains SDS, precipitation can occur upon cooling but will dissolve later.
3. Follow the suppliers’ protocol “for whole blood, serum, and plasma samples,” with the following details. If possible, process the samples in a PCR workstation or at a flow hood.
4. Mix well by flipping the sample tubes (depending on application, you might want to avoid the vortexing recommended in the suppliers’ protocol) and let stand for 5–10 min at room temperature.
5. Transfer to a Zymo-Spin IC column in a collection tube and spin at 10,000 × g for 1 min. Discard collection tube.
6. Transfer column to a new collection tube. Add 200 μL of DNA prewash buffer to the spin column and spin at 10,000 × g for 1 min.
7. Add 500 μL of g-DNA wash buffer to the column and spin at 10,000 × g for 1 min.
8. Transfer column to a clean DNA low-bind collection tube. Add 16.5 μL DNA elution buffer or DNAse-free water to column. Incubate for 3 min at room temperature and spin at 16,000 × g for 30 s.
9. To avoid multiple freeze–thaw cycles, take an aliquot of 1.5 μL from each sample for DNA quantification and freeze all samples at −80 °C for future use.
10. Quantify DNA concentration with PicoGreen on a fluorescent NanoDrop as recommended in the suppliers’ protocol. In short:
(a) Generate a standard curve.
(b) Prepare 1× TE buffer with nuclease-free water.


(c) Dilute 5 μL of dye stock in 995 μL of 1× TE buffer (2× working solution).
(d) Dilute 1.5 μL of aliquoted sample in 1.5 μL of 1× TE buffer and 3 μL of 2× working solution.
(e) Measure each sample 3× using 1.5 μL and calculate the average (consider the 4× dilution!).
(f) For DNA methylation analysis, generate sequencing libraries with the Pico Methyl-Seq Kit (Zymo) with as little as 200 pg input DNA according to suppliers’ protocol.
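The quantification in step 10 amounts to fitting a standard curve, averaging the three readings per sample, and correcting for the 4× dilution of step 10(d) (1.5 μL sample in a 6 μL mix). The sketch below illustrates that calculation with a plain least-squares line; the standard concentrations, fluorescence values, and function names are hypothetical, and it assumes the standards are expressed as the concentration in the measured mix and that the response is linear over the range used.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def sample_concentration(rfu_replicates, slope, intercept, dilution=4.0):
    """Average replicate readings, read off the curve, undo the dilution."""
    conc_in_mix = (mean(rfu_replicates) - intercept) / slope
    return conc_in_mix * dilution


# Hypothetical standard curve: concentration (pg/uL) vs. fluorescence (RFU).
std_conc = [0, 25, 50, 100, 200]
std_rfu = [10, 60, 110, 210, 410]
slope, intercept = fit_line(std_conc, std_rfu)

# Hypothetical triplicate reading of one sample.
conc = sample_concentration([48, 50, 52], slope, intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, sample={conc:.1f} pg/uL")
```

Multiplying the resulting concentration by the remaining eluate volume gives the total DNA available, which can be compared with the 200 pg minimum input mentioned in step 10(f).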

4 Notes

1. For the plant material, the nuclei of interest need to be distinct from any other type in a parameter that is recognized by the available flow sorting equipment. Staining (e.g., by the most common DNA dye, DAPI) can separate nuclei according to DNA content (cell cycle stage, ploidy) [22], but for high specificity and resolution, nuclei can be labeled by expressing fluorescent proteins (e.g., GFP) under control of cell type-specific promoters. A valuable resource on dyes and fluorophores can be found at https://www.biolegend.com/spectraanalyzer. Construction of the reporter lines should consider that the label should stay firmly attached to the nuclei during the whole sorting procedure. Proteins with just a nuclear localization signal (NLS) are often not efficient, as they can diffuse out of the nucleus, especially during the long sorting times necessary for rare cell types. For the Arabidopsis experiments, we have had good experiences using tags fused to the C-terminus of histone H2B (HTB2, At5g22880), and the ubiquitously expressed HTA13 (At3g20670) should also be a suitable candidate as fusion partner and should work in other species. Histones are abundant nuclear proteins, and tagged versions are incorporated without obvious phenotypic abnormalities. We did not try fusing the fluorophore to nuclear envelope proteins, as the signal intensity is likely much lower and would require compensation by very bright reporter genes (e.g., Clover, mNeonGreen). As a rough orientation: 15 young leaves from 14-day-old seedlings should yield >200,000 2C or 4C nuclei; the aboveground tissue of one whole 14-day-old seedling should yield about 20 SAM nuclei expressing the CLV3-fusion protein that indicates the stem cell feature, and >200,000 total events.
2. Manual processing by chopping vigorously (at least 5–10 min/sample) should be done using a new razorblade for every


sample. The quality is important: Wilkinson double-sided blades worked best for us and are available in most supermarkets.
3. Triton concentration must be optimized; we recommend starting with 0.1% and, if necessary, increasing up to 1%.
4. An alternative is Honda buffer [16]. Depending on the purpose of the experiment, the nuclei isolation buffer can be complemented with suitable inhibitors just prior to use. For subsequent RNA extraction, add RiboLock (e.g., from Fermentas 40 U/μL) to NIB, GB, and SB to a final concentration of 1 U/μL. For protein work with sorted nuclei (e.g., for ChIP), add one Complete™ ULTRA tablet (Roche) or equivalent per 25 mL NIB and GB.
5. If harvest can be done fast, plant material can be collected directly in ~2 mL of GB.
6. The volume of 1000 sorted nuclei with a 70 μm nozzle and 70 PSI will be approximately 1 μL, with 100 μm nozzle and 20 PSI 3.5 μL. For very rare cell types, higher sheath pressure might be required for higher event throughput and shorter sorting times. To have an estimate of the required sorting time: with 25 14-day-old seedlings/replica and 70 PSI, we obtained about 100,000–150,000 DAPI-labeled nuclei in ca. 90 min.
7. Depending on tissue type (with high cell division activity) and preparation, DAPI-A peaks may not be well separated. Nevertheless, the nuclei are worth collecting, if the signals from the fluorescent protein allow a clear separation into positive and negative events.
8. All pipetting of RNA samples should be done with filter tips to avoid RNAse contaminations from the pipettes.
9. As RNA amounts are small, it is recommended to use all resulting RNA for subsequent cDNA synthesis. If larger amounts are needed, several RNA preparations from independent experiments can be pooled and ethanol-precipitated, to obtain the RNA in a suitable volume.

References

1. Gutzat R, Mittelsten Scheid O (2012) Epigenetic responses to stress: triple defense? Curr Opin Plant Biol 15(5):568–573. https://doi.org/10.1016/j.pbi.2012.08.007
2. Vriet C, Hennig L, Laloi C (2015) Stress-induced chromatin changes in plants: of memories, metabolites and crop improvement. Cell Mol Life Sci 72(7):1261–1273
3. Borg M, Berger F (2015) Chromatin remodelling during male gametophyte development. Plant J 83(1):177–188. https://doi.org/10.1111/tpj.12856
4. Pecinka A, Dinh HQ, Baubec T, Rosa M, Lettner N, Mittelsten Scheid O (2010) Epigenetic regulation of repetitive elements is attenuated by prolonged heat stress in Arabidopsis. Plant Cell 22(9):3118–3129. https://doi.org/10.1105/tpc.110.078493
5. Feng XQ, Zilberman D, Dickinson H (2013) A conversation across generations: soma-germ cell crosstalk in plants. Dev Cell 24(3):215–225. https://doi.org/10.1016/j.devcel.2013.01.014
6. Gutzat R, Rembart K, Nussbaumer T, Pisupati R, Hofmann F, Bradamant G, Daubel N, Gaidora A, Lettner N, Donà M, Nordborg M, Nodine M, Mittelsten Scheid O (2018) Stage-specific transcriptomes and DNA methylomes indicate an early and transient loss of transposon control in Arabidopsis shoot stem cells. bioRxiv:430447. https://doi.org/10.1101/430447
7. Kawakatsu T, Stuart T, Valdes M, Breakfield N, Schmitz RJ, Nery JR, Urich MA, Han XW, Lister R, Benfey PN, Ecker JR (2016) Unique cell-type-specific patterns of DNA methylation in the root meristem. Nat Plants 2(5):16058. https://doi.org/10.1038/nplants.2016.58
8. Kawakatsu T, Huang SSC, Jupe F, Sasaki E, Schmitz RJ, Urich MA, Castanon R, Nery JR, Barragan C, He YP, Chen HM, Dubin M, Lee CR, Wang CM, Bemm F, Becker C, O’Neil R, O’Malley RC, Quarless DX, Schork NJ, Weigel D, Nordborg M, Ecker JR, Genomes C (2016) Epigenomic diversity in a global collection of Arabidopsis thaliana accessions. Cell 166(2):492–505. https://doi.org/10.1016/j.cell.2016.06.044
9. Dixit R, Cyr R, Gilroy S (2006) Using intrinsically fluorescent proteins for plant cell imaging. Plant J 45(4):599–615. https://doi.org/10.1111/j.1365-313X.2006.02658.x
10. Tanz SK, Castleden I, Small ID, Millar AH (2013) Fluorescent protein tagging as a tool to define the subcellular distribution of proteins in plants. Front Plant Sci 4:214. https://doi.org/10.3389/fpls.2013.00214
11. Carter AD, Bonyadi R, Gifford ML (2013) The use of fluorescence-activated cell sorting in studying plant development and environmental responses. Int J Dev Biol 57(6–8):545–552. https://doi.org/10.1387/ijdb.130195mg
12. Deal RB, Henikoff S (2011) The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nat Protoc 6(1):56–68
13. Sijacic P, Bajic M, McKinney EC, Meagher RB, Deal RB (2018) Changes in chromatin accessibility between Arabidopsis stem cells and mesophyll cells illuminate cell type-specific transcription factor networks. Plant J 94(2):215–231. https://doi.org/10.1111/tpj.13882
14. You Y, Sawikowska A, Neumann M, Pose D, Capovilla G, Langenecker T, Neher RA, Krajewski P, Schmid M (2017) Temporal dynamics of gene expression and histone marks at the Arabidopsis shoot meristem during flowering. Nat Commun 8:15120. https://doi.org/10.1038/ncomms15120
15. Zhang C, Barthelson RA, Lambert GM, Galbraith DW (2008) Global characterization of cell-specific gene expression through fluorescence-activated sorting of nuclei. Plant Physiol 147(1):30–40. https://doi.org/10.1104/pp.107.115246
16. Weinhofer I, Köhler C (2014) Endosperm-specific chromatin profiling by fluorescence-activated nuclei sorting and chip-on-chip. Methods Mol Biol 1112:105–115
17. Lu ZF, Hofmeister BT, Vollmers C, DuBois RM, Schmitz RJ (2017) Combining ATAC-seq with nuclei sorting for discovery of cis-regulatory regions in plant genomes. Nucleic Acids Res 45(6):e41. https://doi.org/10.1093/nar/gkw1179
18. Slane D, Bayer M (2017) Cell type-specific gene expression profiling using fluorescence-activated nuclear sorting. Methods Mol Biol 1629:27–35
19. Pavlova P, Tessadori F, de Jong HJ, Fransz P (2010) Immunocytological analysis of chromatin in isolated nuclei. Methods Mol Biol 655:413–432. https://doi.org/10.1007/978-1-60761-765-5_28
20. Galbraith DW, Harkins KR, Maddox JM, Ayres NM, Sharma DP, Firoozabady E (1983) Rapid flow cytometric analysis of the cell cycle in intact plant tissues. Science 220(4601):1049–1051
21. Picelli S, Faridani OR, Bjorklund AK, Winberg G, Sagasser S, Sandberg R (2014) Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9(1):171–181. https://doi.org/10.1038/nprot.2014.006
22. Dolezel J, Greilhuber J, Suda J (2007) Estimation of nuclear DNA content in plants using flow cytometry. Nat Protoc 2(9):2233–2244. https://doi.org/10.1038/nprot.2007.310

Chapter 8

Measurement of Arabidopsis thaliana Nuclear Size and Shape

Kalyanikrishna, Pawel Mikulski, and Daniel Schubert

Abstract

Gene expression is tightly linked to the position of genes in the nucleus. Genomic regions associated with the nuclear envelope are usually repressed, including the heterochromatin carrying chromocenters. The shape and size of nuclei vary within tissues in plants and depend on proteins associated with the nuclear envelope. Here, we describe a protocol to isolate Arabidopsis thaliana nuclei and measure their size and morphology. Using this method, novel components regulating the nuclear envelope and chromatin association can be identified and analyzed.

Key words Nuclear size, Nuclear shape, Nuclear envelope, Chromatin

1 Introduction

The nucleus is one of the most prominent cell organelles and performs the role of storing and processing the information required for the cell to function. In eukaryotes, the nucleus is delimited from the surrounding cytoplasm by the nuclear envelope (NE), which creates a unique territory within the nucleus. Two important aspects of nuclear structure are its size and shape, which vary in a tissue-specific and developmental manner. In animal model systems, genomic ploidy, nuclear structural components, cytoplasmic factors, nucleocytoplasmic transport, the cytoskeleton, and the extracellular matrix can affect the nuclear size and shape [1]. Altered nuclear size and shape are associated with various disease conditions, yet the underlying control mechanism is still unclear. Elucidating mechanisms of nuclear size regulation and the physiological significance of proper nuclear size control will help to clarify the interplay between altered nuclear size and diseases like cancer [1].

The structural elements of the NE play a crucial part in defining the nuclear size and shape. A typical metazoan NE is composed of the outer nuclear membrane (ONM), the inner

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_8, © Springer Science+Business Media, LLC, part of Springer Nature 2020


Kalyanikrishna et al.

nuclear membrane (INM), the nuclear lamina (NL), and nuclear pore complexes (NPC). Enclosed within, the nucleus harbors organized chromatin territories and subnuclear compartments [2]. In the eukaryotic nucleus, the genome is highly compacted and each chromosome often occupies a specific position which may vary in a tissue-specific manner. Chromatin domains near the interior of the nucleus tend to be actively transcribed while genes at the periphery are often poorly expressed and associated with the nuclear lamina [3]. The role of nuclear size and morphology in chromosomal positioning and gene expression is an important unanswered question. Moreover, there is a strong correlation between nuclear volume and genome size in animals and plants [4, 5]. The NE in plants is structurally similar to the ones in other kingdoms while absence of many homologous proteins and presence of plant specific candidates make it unique. For instance, plants lack sequence homologues of the lamins, and instead of centrosomes, the entire nuclear envelope surface acts as a microtubule-organizing center [6]. In plants, variations in nuclear morphology appear in multiple tissues such as the epidermis, trichomes, and root hairs during development. Embryonic and meristem nuclei in flowering plants are nearly spherical while elongated nuclei typically correspond to differentiated cells [7]. Even though nuclear size correlates with the cell size or genome size, some studies have also emphasized that nuclear size and volume can be modulated independent of ploidy and karyoplasmic ratio [8, 9]. For instance, disrupting the NE or nuclear lamina-like components results in altered nuclear morphology in Arabidopsis thaliana. With recent advances in new technologies for 3D visualization, live cell imaging, and sequencing-based structural imaging, we are now better able to appreciate the connection between nuclear structure and function. 
Only relatively recently, the nuclear matrix constituent proteins (NMCPs) were identified as the functional homologues of lamins in plants. In Arabidopsis, NMCPs are encoded by the CROWDED NUCLEI 1–4 (CRWN1–4) genes and play an integral role in maintaining nuclear size and shape. Disruption of CRWN genes results in altered nuclear morphology (20–40% reduced nuclear area compared to wild type, and largely spherical nuclei instead of varied nuclear shapes). In addition, the integrity of the chromocenters is also reduced, especially in crwn4 mutants [10]. Although several components that regulate nuclear morphology in plants have since been identified, it is likely that more NE-regulating components remain to be discovered. In addition, it is as yet unresolved whether mutants that show altered chromatin organization also show modified nuclear morphology. Thus, measurement of nuclear size and shape is a potential phenotypic parameter to examine the control of nuclear morphology. Here, we describe a simple method to measure the nuclear size and shape from isolated nuclei of the model

Nuclear Size and Shape Measurements


plant Arabidopsis, including image acquisition and analysis. In order to obtain appropriate results, a high-quality 2D image and Z-stack settings are crucial in the imaging process. Statistical analysis of measurements from nuclei of sufficient number (>50) is essential to determine significant changes in nuclear size and shape.

2 Materials

2.1 Isolation of Nuclei (Adapted from [11])

1. Plant material: fresh tissue (2-week-old seedlings or healthy young leaves from adult plants), 0.5–0.8 g.
2. 1 M phosphate buffer (PB) stock: Mix 57.7 mL 1 M disodium hydrogen phosphate (Na2HPO4) and 42.3 mL 1 M sodium dihydrogen phosphate (NaH2PO4). Filter-sterilize and store at room temperature.
3. 4% formaldehyde in 0.1 M PB (see Note 1).
4. Nuclei isolation buffer (NIB): 500 mM sucrose, 100 mM KCl, 10 mM Tris–HCl pH 9.5, 10 mM EDTA, 4 mM spermidine, 1 mM spermine, and 0.1% v/v 2-mercaptoethanol. Add 2-mercaptoethanol just before use. NIB can be stored at −20 °C for several months.
5. Sharp razor blades.
6. Nylon mesh or cell strainer (50–100 μm).
7. Petri dishes.
8. Vectashield with 4′,6-diamidino-2-phenylindole (DAPI) (1 μg/mL) mounting medium.
9. Microscopic slides.
10. Coverslip, e.g., Menzel-Gläser # 1.5 (0.16–0.19 mm thickness).

2.2 Image Acquisition and Analysis

1. Confocal microscope (we used a Leica SP8 SN: 8100000227 inverted microscope supported by LasX software).
2. Suitable immersion liquid (see Note 2).
3. ImageJ software (downloadable from https://imagej.nih.gov/ij/. We based our analysis on ImageJ v1.8).

3 Methods

3.1 Isolation of Nuclei

1. Harvest 0.5–0.8 g of fresh plant tissue and add 20 mL of 4% PFA in PB to cover the tissue in a 50 mL Falcon tube.
2. Vacuum-infiltrate the tissue on ice for 20 min: apply vacuum for 10 min, release the pressure, and apply vacuum again for 10 min. Be sure that the tissue is covered in the PFA solution.


3. Wash the fixed sample in ice-cold PB buffer three times.
4. Transfer the sample to the glass petri dish and remove excess buffer by pipetting.
5. Add 200 μL of nuclei isolation buffer to the sample and finely chop using the razor blade. Do not tear the tissue.
6. Gently tilt the petri dish so that you can pipet the solution without the chopped sample; filter it through the nylon mesh to a new 1.5 mL tube.
7. Repeat the above steps by adding 300 μL of NIB.
8. Centrifuge in a precooled tabletop centrifuge at approximately 500 × g for 3–5 min.
9. Remove the supernatant and add 20 μL of NIB. Resuspend the pellet very gently.

3.2 Spreading and Staining with DAPI

1. Pipet 2 μL of isolated nuclei onto a clean microscopic slide, spread into a thin layer, and allow to dry.
2. Add 3 μL of DAPI and carefully place the coverslip on top. Avoid formation of air bubbles.
3. Seal the sides with nail polish or a suitable adhesive. Slides can be stored at 4 °C for several months.

3.3 Image Acquisition

1. Start up the microscope and PC along with the scanner power, laser power, and fluorescent lamp.
2. If you are using the Leica SP8 system, launch the LasX software on the PC and select configuration “machine.”
3. Select the suitable objective in the software to set it in the microscope (manual selection is possible if this option is available).
4. Clean the objective, add a small drop of immersion liquid, and carefully place the slide. Turn on the fluorescent lamp and focus the sample using the epifluorescence mode.
5. In the LasX software, go to Configuration and select Laser Config. Turn on the suitable laser. Photomultiplier (PMT) and hybrid detectors are available in this system. A PMT is generally sufficient to image the DAPI-stained nuclei.
6. Go to Acquisition and set the parameters for imaging. The laser line setting enables you to adjust the laser intensity. Care should be taken not to overexpose the nuclei. Using PMT settings, set dye color (for instance, DAPI to blue) and modify the range of the emission spectrum (410–550 nm for DAPI).
7. In the Scan settings, resolution and scan speed can be set. For Arabidopsis nuclei, a resolution of 512 × 512 with a scan speed of 200 is suitable. Averaging sums up the pixel values from the scans and uses the arithmetic mean as the final value which


enables the preservation of persistent pixel values and avoids background noise. Both line averaging and frame averaging are possible; line average can be set to two or three here.
8. To collect Z-stacks, go to “Z-stack Configuration” and first define the volume by setting the Z-values from top and bottom of the area to be scanned using the Z-position knob in the control panel. Next, define the resolution by entering the number of slices or the thickness of the Z-stack steps. Otherwise choose the system-optimized setting provided by the software. Press start and wait for the series to be scanned. Proper Z-stack dimensions are essential to have the right planes for measurement. We recommend a Z-step size between 0.2 and 0.3 μm.
9. Scan and collect Z-stacks for all genotypes/samples to be analyzed. Save your project with the “.lif” extension to keep the raw data. This file will be used to analyze the images later in ImageJ software.

3.4 Semiautomated Image Analysis

1. Install Fiji/ImageJ from: https://imagej.net/Fiji/Downloads. Follow the download instructions on the webpage.
2. Open the Z-stack DAPI image in Fiji. Drag and drop the image file from the folder, press Ctrl+O, or select File → Open in the program menu.
3. Calibrate image distances. Go to Analyze → Set Scale in the program menu. Modify distance values to readjust pixels/μm (or other unit of length). If opened Z-stacks contain metadata from microscopy software, the scale is adjusted automatically and does not require any further modifications.
4. Duplicate the image for a backup file. Right-click and select duplicate. In the newly opened window, tick duplicate hyperstack to copy all slices from the Z-stack. If the image has multiple channels, only the DAPI channel is necessary for the next steps.
5. Manually select the Z-stack slice where the nuclei are in the middle cross section and show the highest 2D area within the Z-stack. If some nuclei are out of plane, the measurements for them should be repeated on their respective middle cross-sectional plane. Use the sliding bar below the image window to navigate between slices.
6. Binarize and threshold the image. Select threshold by pressing Ctrl+Shift+T or go to Image → Adjust → Threshold. In the newly opened window, select “Default” as thresholding method and “B&W” for black and white visualization. Depending on user preference, the background can be kept black or white using the “Dark background” option. Thresholding can be manually readjusted by moving


sliding bars in the “Threshold” window. Changes will be applied to all the slices in the Z-stack. As a result, images will be transformed black and white binary projection. 7. Set parameters for size and shape measurements. Go to Analyze!Set Measurements. In the newly opened window, tick “Area,” “Shape descriptors,” and “Stack position.” “Stack position” is optional, but it provides information helpful to navigate between image and final calculations. Selecting “None” option next to “Redirect to” line allows to apply parameters globally, without specifying particular image file. 8. Start ROI manager. Go to Analyze!Tools!ROI Manager. ROI Manager helps to organize regions of interest (ROI) and save their positions on the threshold image. In the newly opened window, tick “Labels” and “Show all” to track nuclei selection process. 9. Select Wand Tool. Right-click on wand icon in the program menu bar to select the tool. 10. Select binarized nuclei and add as ROI for measurements. Using Wand Tool, right-click on binarized nuclei spots and press “t” on the keyboard or navigate to ROI Manager window and press “Add.” Added ROI will appear in ROI Manager window. Continue until all nuclei spots are selected. 11. Save ROI positions. After selecting all ROI, navigate to ROI Manager, go to More!Save, and select destination to save ROI positions. 12. Measure ROI. Navigate to ROI Manager and press “Measure.” Newly appeared Results window will contain shape and size measurements. Select the table (Ctrl+A or go to Edit!Select All in Results window) and copy (Ctrl+C or go to Edit!Copy in Results window). 13. ROI statistics. Open calculation program (e.g., Excel) and paste Results table. Repeat above steps for all relevant conditions and genotypes. Run statistical tests (e.g., F-test and Student’s T-test) to assess significance of any differences in nuclei area (“Area” column in Results table) and shape between conditions or genotypes. 
We recommend using the circularity index as the nuclear shape indicator. The circularity index is defined as 4π × [Area]/[Perimeter]², with a value of 1.0 corresponding to a perfect circle and values approaching 0.0 corresponding to increasingly elongated shapes. Reliable results should contain measurements of at least 50–100 nuclei per genotype or condition. If the number of nuclei in a particular Z-stack is low, continue the measurements on other Z-stacks of the same genotype/condition until a reliable number of nuclei is reached.
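As a quick check on the circularity definition above, the following minimal Python sketch evaluates 4π × [Area]/[Perimeter]² for two idealized outlines. The function name and the example shapes are illustrative assumptions and are not part of the Fiji workflow, which reports circularity directly when “Shape descriptors” is ticked.

```python
import math


def circularity(area: float, perimeter: float) -> float:
    """Circularity index as defined in the text: 4*pi*area / perimeter**2."""
    if area <= 0 or perimeter <= 0:
        raise ValueError("area and perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2


# Hypothetical outlines (arbitrary length units), for illustration only.
radius = 5.0
circle_value = circularity(math.pi * radius ** 2, 2 * math.pi * radius)

side_a, side_b = 10.0, 1.0  # an elongated rectangle
rectangle_value = circularity(side_a * side_b, 2 * (side_a + side_b))

print(f"circle:    {circle_value:.3f}")
print(f"rectangle: {rectangle_value:.3f}")
```

The circle evaluates to 1.0 and the elongated rectangle to a much lower value, which matches the interpretation given above.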

4 Notes

1. For PFA concentrations above 2%, phosphate buffer (PB) is more suitable than PBS to prevent acidification during fixation. To speed up the dissolving of PFA, a few drops of 1 M NaOH can be added.
2. Choosing the proper immersion fluid is very important. The refractive indices of the mounting medium and the immersion fluid should be in the same range to obtain good image quality.


Chapter 9

Study of Cell-Type-Specific Chromatin Organization: In Situ Hi-C Library Preparation for Low-Input Plant Materials

Nan Wang and Chang Liu

Abstract

The three-dimensional folding of chromatin contributes to the control of genome functions in eukaryotes, including transcription, replication, chromosome segregation, and DNA repair. In recent decades, many cytological and molecular methods have provided profound structural insights into the hierarchical organization of plant chromatin. With the Hi-C (high-throughput chromosome conformation capture) technique, analyses of global chromatin organization in plants indicate considerable differences across species. However, our knowledge of how chromatin organization at a local level is connected to tissue-specific gene expression is rather limited. This problem can be tackled by performing fluorescence-activated sorting of fixed nuclei followed by Hi-C, which is tailored for a limited number of input nuclei. Here, we describe an approach for isolating Arabidopsis thaliana nuclei with a defined endopolyploidy level and subsequent in situ Hi-C library preparation for low-input plant materials. In principle, this method can be applied to any type of fluorescence-labeled nuclei, offering researchers a useful tool to unveil temporal and spatial chromatin dynamics in 3D in a tissue-specific context.

Key words In situ Hi-C, FACS, Chromatin organization, Low-input

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_9, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

It is recognized that diverse genome functions are linked not only to the linear genomic DNA sequence but also to the three-dimensional (3D) genome organization. Thanks to the invention and development of many cytological and molecular tools (e.g., fluorescent in situ hybridization (FISH) and chromosome conformation capture (3C)), our understanding of genome organization and function in the nuclear space has been revolutionized recently [1–4]. The Hi-C (high-throughput chromosome conformation capture) method was developed from 3C. As it incorporates next-generation sequencing, the Hi-C technique permits high-throughput, high-resolution analysis of the chromatin interaction network in a genome-wide manner [5]. Briefly, the preparation of Hi-C libraries includes chromatin fixation, digestion, labeling, ligation, affinity purification of the ligation products, and conversion of DNA to sequencing libraries [6].

In plant science, researchers have applied the Hi-C method to the model species Arabidopsis thaliana and many crops to improve plant genome assembly [7–10], as well as to study structural features of chromatin organization of individual species [11–16] and how genomes interact with each other upon hybridization [17, 18]. In general, plant chromatin organization is found to be highly correlated with the epigenomic landscape of the genome at a chromosomal scale, as it generally is in animals [19]. On the other hand, studies focusing on how 3D plant chromatin organization modulates individual gene expression, particularly during plant development and responses to the ever-changing environment, are limited. As the regular plant Hi-C protocol uses a mixture of different cell types as starting material (typically seedlings), the output Hi-C data reflect only average chromatin interactions, which can potentially mask signals exclusively associated with a minority of cells. In response to the increasing demand in the plant science community for specifically and precisely harvesting nuclei of interest, methods permitting tissue-specific nuclei collection, such as those using the FACS (fluorescence-activated cell sorting) and INTACT (isolation of nuclei tagged in specific cell types) techniques, have been developed [20–23].

Here we describe a protocol of two modules: (1) isolating plant nuclei carrying fluorescent markers with FACS and (2) in situ Hi-C library preparation for low-input materials. The combination of these two modules allows us to address questions such as how tissue-specific chromatin structures are associated with transcriptional regulation and how differently a genomic region is organized in different types of cells.
Furthermore, each of these two modules can be easily integrated into other protocols. For example, FACS-collected nuclei can be directly used for chromatin immunoprecipitation or FISH experiments. In addition to isolating cell-type-specific nuclei, the FACS approach allows one to differentiate nuclei with different ploidy levels caused by endoreduplication, which occurs widely in plants [24]. On the other hand, our in situ Hi-C library preparation protocol for low-input nuclei has been successful with 20,000 2C Arabidopsis thaliana nuclei, which is drastically fewer than in our regular plant Hi-C protocol, which requires millions of input nuclei [25]. Compared to our regular plant Hi-C protocol [25], this modified version has simplified steps for recovering Hi-C ligation products, and it is better suited for scaling up to process multiple samples in parallel.

2 Materials

Prepare all solutions with double-distilled water. It is recommended to use filter tips.

2.1 Tissue Fixation

1. MC buffer: 10 mM potassium phosphate, pH 7.0, 50 mM sodium chloride (NaCl), 0.1 M sucrose (see Note 1).
2. Formaldehyde (e.g., 37% stock).
3. MC buffer with glycine: 0.15 M glycine, 10 mM potassium phosphate, pH 7.0, 50 mM NaCl, 0.1 M sucrose (see Note 2).

2.2 Nuclei Isolation and Flow Cytometry

1. Nuclei isolation buffer: 20 mM HEPES, pH 8.0, 250 mM sucrose, 1 mM magnesium chloride (MgCl2), 5 mM potassium chloride (KCl), 40% (v/v) glycerol, 0.25% Triton X-100, 0.1 mM phenylmethylsulfonyl fluoride (PMSF), 0.1% (v/v) 2-mercaptoethanol (see Note 3).
2. Homemade filter tips (Fig. 1). See Note 4.
3. DAPI solution: 1 μg/mL 4′,6-diamidino-2-phenylindole (DAPI) dissolved in PBS (phosphate-buffered saline).

2.3 Chromatin Digestion, Ligation, and DNA Purification

1. RE buffer (10×): 1 M NaCl, 500 mM Tris–HCl, pH 7.9, 100 mM MgCl2, 10 mM 1,4-dithiothreitol (DTT) (see Note 5).
2. 10% (w/v) SDS solution: dissolved in water.
3. 10% (v/v) Triton X-100: diluted with water.
4. DpnII.
5. Regular deoxynucleotide triphosphates (in separate tubes): 10 mM dATP, 10 mM dGTP, 10 mM dTTP.
6. Biotin-labeled dCTP: 0.4 mM biotin-14-dCTP (Thermo Fisher Scientific).
7. Klenow fragment.
8. Blunt-end ligation buffer (10×): 300 mM Tris–HCl, pH 7.8, 100 mM MgCl2, 100 mM DTT, and 1 mM ATP (see Note 6).
9. T4 DNA ligase.
10. SDS lysis buffer: 50 mM Tris–HCl, pH 8.0, 1% SDS, 10 mM EDTA.
11. 18 mg/mL proteinase K.
12. 5 M NaCl solution: dissolved in water.
13. 100% isopropanol.
14. Glycogen (20 mg/mL).
15. 80% ethanol.
16. TE buffer: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA.
17. 3 M sodium acetate buffer, pH 5.2. Adjust the pH to 5.2 with acetic acid.

Fig. 1 Collecting nuclei from crude extract with a homemade filter tip. (a–f) Making a homemade filter tip. (a–c) Cut a 1 mL pipette tip and a piece of Miracloth or nylon membrane (pore size: from 20 to 50 μm). (d) Bring the cut end of the pipette tip close to a flame for a few seconds until it is about to melt. (e) Quickly press the partially melted pipette tip against the membrane, which is placed on a glass slide on ice. (f) A ready-to-use filter tip. (g) Load crude extract into a homemade filter tip. The flow-through is directed to a tube placed on ice

2.4 DNA Manipulation and Library Amplification

1. T4 DNA polymerase.
2. EDTA solution: 0.5 M EDTA, pH 8.0. Adjust the pH to 8.0 with NaOH.
3. Regular deoxynucleotide triphosphates (in separate tubes): 10 mM dATP, 10 mM dTTP.
4. AMPure® XP beads (Beckman Coulter) (critical step: see Note 7).
5. Magnetic tube rack.
6. Ethanol (80%).
7. Dynabeads® MyOne™ Streptavidin C1 beads (Invitrogen).
8. TWB buffer: 5 mM Tris–HCl, pH 8.0, 0.5 mM EDTA, 1 M NaCl, 0.05% (v/v) Tween-20.
9. BB buffer (2×): 10 mM Tris–HCl, pH 8.0, 1 mM EDTA, 2 M NaCl.
10. NEBNext® Ultra™ II DNA Library Prep Kit.
11. Tris elution buffer: 10 mM Tris–HCl, pH 8.0.

3 Methods

3.1 Tissue Fixation

1. Immerse samples in 100 mL of MC buffer, add 2.78 mL of formaldehyde to achieve a final concentration of 1% (v/v), apply vacuum infiltration for 15 min at room temperature, and then release the vacuum slowly (see Note 8).
2. Apply a second round of vacuum infiltration as described above.
3. Discard the fixative and immerse samples in 100 mL of MC buffer with glycine, apply vacuum infiltration for 10 min at room temperature, and release the vacuum slowly.
4. Discard the solution, and rinse the fixed tissues with water briefly (see Note 9).
5. Blot the samples with tissue towels, and press gently to absorb more water.
6. If needed, proceed to manual dissection to collect desired tissues (see Note 10).
7. Proceed to nuclei extraction. Alternatively, the fixed samples can be kept in a −80 °C freezer for long-term storage.
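The formaldehyde volume in step 1 follows from a simple dilution calculation. The sketch below reproduces it in Python, assuming the 37% stock listed under Materials. The function names are illustrative, and the calculation is only a convenience for scaling the fixative to other buffer volumes or stock concentrations.

```python
def final_percent(stock_percent: float, stock_ml: float, buffer_ml: float) -> float:
    """Final concentration (%) after adding stock_ml of stock to buffer_ml of buffer."""
    return stock_percent * stock_ml / (stock_ml + buffer_ml)


def stock_volume_needed(stock_percent: float, target_percent: float, buffer_ml: float) -> float:
    """Stock volume (mL) to add to buffer_ml of buffer to reach target_percent.

    Solves target = stock * v / (v + buffer) for v.
    """
    if not 0 < target_percent < stock_percent:
        raise ValueError("target must be positive and below the stock concentration")
    return target_percent * buffer_ml / (stock_percent - target_percent)


# Values from step 1, assuming a 37% formaldehyde stock.
achieved = final_percent(37.0, 2.78, 100.0)
needed_ml = stock_volume_needed(37.0, 1.0, 100.0)

print(f"final concentration: {achieved:.3f} %")
print(f"stock volume for 1 %: {needed_ml:.2f} mL")
```

With these inputs the calculation returns approximately 1% and 2.78 mL, in agreement with the volumes given in step 1.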

3.2 Nuclei Isolation and Flow Cytometry

1. Mix the fixed materials with 1 mL of nuclei isolation buffer (see Note 11).
2. Chop the tissues with a razor blade into a slurry (see Note 12).
3. Filter the slurry with a homemade filter tip, and collect the flow-through in a microcentrifuge tube (Fig. 1). Add 1 μL of DAPI solution to 1 mL of the nuclei flow-through, and load the sample directly onto a FACS machine (see Note 13).
4. Analyze the DAPI signal distribution, and apply gate settings accordingly to specifically collect 100,000 2C nuclei (Fig. 2). See Note 14.
5. Centrifuge the collected nuclei at 3,000 rcf for 5 min at 4 °C, and carefully remove the supernatant (see Note 15).

Fig. 2 Collecting endopolyploidy level-specific nuclei with FACS. (a) Crude nuclei extract from fixed Arabidopsis leaves is analyzed. Nuclei with different endopolyploidy levels (2C, 4C, 8C, and 16C) are inferred according to their DAPI signals. (b) Collection of 2C nuclei

3.3 Chromatin Digestion, Ligation, and DNA Purification

1. Gently resuspend the pellet (usually invisible) in 12.5 μL of 0.5% SDS, and incubate the mixture at 62 °C for 5 min.
2. Add 36.25 μL of water and 6.25 μL of 10% Triton X-100, mix gently by inverting the tube, and incubate at 37 °C for 15 min.
3. Add 6.25 μL of RE buffer (10×) and 10 U of DpnII, mix gently by tapping the tube, and incubate overnight at 37 °C.
4. Incubate at 62 °C for 20 min, and then cool down to room temperature (see Note 16).
5. Add 0.25 μL of 10 mM dTTP, 0.25 μL of 10 mM dATP, 0.25 μL of 10 mM dGTP, 6.25 μL of 0.4 mM biotin-14-dCTP, 3.5 μL of water, and 1 μL of Klenow fragment (10 U), mix gently by tapping the tube, and incubate at 37 °C for 2 h.
6. Add 166 μL of water, 30 μL of blunt-end ligation buffer (10×), 25 μL of 10% Triton X-100, and 5 U of T4 DNA ligase, mix gently, and incubate at room temperature for 4 h.
7. Centrifuge at 1,000 rcf for 3 min at room temperature, discard the supernatant, and resuspend the pellet in 150 μL of SDS lysis buffer.
8. Add 1 μL of proteinase K (18 mg/mL) and incubate at 55 °C for 30 min.
9. Add 8 μL of 5 M NaCl, and incubate at 65 °C for 6 h to overnight.
10. Add 0.4 μL of glycogen, 16 μL of sodium acetate buffer, and 160 μL of isopropanol. Mix by vortexing.
11. Centrifuge at 13,000 rcf for 20 min at 4 °C, and wash the pellet with 80% ethanol. Air-dry the pellet, and dissolve it in 100 μL of TE buffer.

3.4 DNA Manipulation and Library Amplification

1. Top up the DNA solution volume to 130 μL with TE (see Note 17).
2. Transfer the DNA solution into a Covaris® microTUBE; sonicate to produce 250–500 bp fragments (see Note 18).
3. Transfer the sonicated DNA solution (130 μL) into a PCR tube, add 71.5 μL of AMPure® XP beads, mix, and incubate at room temperature for 10 min. Place the tube on a magnetic tube rack for 1 min, and transfer all the supernatant into a new PCR tube (see Note 19).
4. Add 32.5 μL of AMPure® XP beads, mix, and incubate at room temperature for 10 min. Place the tube on a magnetic tube rack for 1 min and discard the supernatant (see Note 20).
5. While keeping the tube on the magnetic tube rack, rinse the beads with 200 μL of 80% ethanol without disturbing them.
6. Air-dry the beads and resuspend them in 20 μL of Tris elution buffer.
7. To the 20 μL of eluted DNA, add 10 μL of the supplied buffer (5×) for T4 DNA polymerase, 0.5 μL of 10 mM dTTP, 0.5 μL of 10 mM dATP, 5 U of T4 DNA polymerase, and water to top up the volume to 50 μL. Incubate at 20 °C for 30 min (see Note 21).
8. Stop the reaction by adding 1 μL of EDTA solution.
9. Add 50 μL of AMPure® XP beads, mix, and incubate at room temperature for 10 min. Place the tube on a magnetic tube rack for 1 min and discard the supernatant.
10. While keeping the tube on the magnetic tube rack, rinse the beads with 200 μL of 80% ethanol without disturbing them.
11. Air-dry the beads and resuspend them in 20 μL of Tris elution buffer.
12. Mix 20 μL of the eluted DNA with 1.2 μL of Ultra II End Prep Enzyme Mix and 2.8 μL of Ultra II End Prep Reaction Buffer; incubate at 20 °C for 30 min followed by incubation at 65 °C for 30 min (see Note 22).
13. Sequentially add 12 μL of Ultra II Ligation Master Mix, 0.4 μL of ligation enhancer, and 1 μL of diluted adaptor for Illumina (diluted 1:10 with water). Mix, and incubate at 20 °C for 15 min.
14. Add 1.2 μL of USER Enzyme, mix, and incubate at 37 °C for 15 min.
15. Mix the DNA with 40 μL of AMPure® XP beads, recover the DNA as described above in steps 9–11, and elute the DNA with 50 μL of Tris elution buffer.
16. During the DNA purification process, mix 10 μL of Dynabeads® MyOne™ Streptavidin C1 beads with 300 μL of TWB buffer. Place the tube on a magnetic tube rack for 2 min, discard the supernatant, and resuspend the beads in 50 μL of BB buffer (2×).
17. Mix the purified DNA (50 μL) with the equilibrated Streptavidin C1 beads. Incubate at room temperature for 15 min. Tap the tube briefly every 5 min (see Note 23).
18. Place the tube on a magnetic tube rack for 2 min, and discard the supernatant. Resuspend the beads in 500 μL of TWB buffer.
19. Repeat the last step.
20. Wash the beads with 500 μL of Tris elution buffer.
21. Resuspend the beads in 20 μL of Tris elution buffer (see Note 24).
22. To amplify library molecules, set up a PCR reaction as follows: 10 μL of beads, 25 μL of Ultra II Q5 Master Mix, 5 μL of universal primer, 5 μL of selected index primer, and 5 μL of water. Run PCR with the following program: 98 °C for 30 s, then 16 cycles of amplification (in each cycle, 98 °C for 10 s, 65 °C for 30 s), and finally 65 °C for 2 min (see Note 25).
23. Perform agarose gel electrophoresis with 2 μL of PCR products. Typically, the amplified library appears as a homogeneous smear between 400 and 700 bp (Fig. 3).
24. Purify the PCR products with any regular PCR product purification kit.
25. Quantify the Hi-C library with a Bioanalyzer.

Fig. 3 Amplified in situ Hi-C libraries. In this gel photo, all three samples show smears distributed between 500 bp and 700 bp
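The bead volumes in steps 3 and 4 of this section can be checked, and rescaled for a different sample volume, with a short calculation. The Python sketch below is illustrative only: the function name and returned keys are assumptions, and the cumulative ratio is simply derived arithmetic rather than a value stated in the protocol. The bead-to-DNA ratios themselves should still be calibrated as recommended in Note 7.

```python
def double_sided_selection(sample_ul: float, first_beads_ul: float, second_beads_ul: float) -> dict:
    """Bead-to-liquid ratios for a two-step (double-sided) bead size selection."""
    supernatant_ul = sample_ul + first_beads_ul  # volume carried over to the second tube
    return {
        "first_ratio": first_beads_ul / sample_ul,
        "second_ratio_vs_supernatant": second_beads_ul / supernatant_ul,
        "cumulative_ratio_vs_sample": (first_beads_ul + second_beads_ul) / sample_ul,
    }


def scale_beads(new_sample_ul: float, sample_ul: float = 130.0,
                first_beads_ul: float = 71.5, second_beads_ul: float = 32.5) -> tuple:
    """Scale both bead additions proportionally to a different sample volume."""
    factor = new_sample_ul / sample_ul
    return first_beads_ul * factor, second_beads_ul * factor


# Volumes from steps 3 and 4 (130 uL sample, 71.5 uL and 32.5 uL of beads).
ratios = double_sided_selection(130.0, 71.5, 32.5)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")

first_ul, second_ul = scale_beads(100.0)  # hypothetical 100 uL sample
print(f"for 100 uL: {first_ul:.1f} uL, then {second_ul:.1f} uL of beads")
```

The first two ratios come out at about 0.55 and 0.161, the values quoted in Notes 19 and 20. For a hypothetical 100 μL sample, the proportional volumes are 55 μL and 25 μL of beads.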

4 Notes

1. Prepare it freshly prior to use.
2. Dissolve glycine powder prior to use. This can be done during the fixation step.
3. PMSF stock solution (0.1 M) can be made by dissolving PMSF in 100% isopropanol and stored at −20 °C. Nuclei isolation buffer without PMSF and 2-mercaptoethanol can be stored at 4 °C for months; add these two components freshly prior to use.
4. We recommend using a homemade filter to handle a small volume of homogenized sample (e.g., 1 mL).
5. The recipe of the RE buffer depends entirely on the choice of restriction enzyme. If an enzyme other than DpnII is used, it is necessary to check its compatibility with the RE buffer described in this protocol.
6. It has been reported that blunt-end ligation is suppressed by a high concentration of ATP [26]; here the ATP concentration is lower than what is routinely used for sticky-end ligation.
7. Size selection of Hi-C DNA molecules is a critical step in this protocol. The Hi-C library is always a mixture of intact genomic DNA and informative chimeric DNA. A recent study shows that Hi-C libraries with larger insert sizes tend to contain more informative Hi-C reads [27]. On the other hand, newly developed Illumina sequencers (e.g., HiSeq 3000/HiSeq 4000) have restrictions on library insert sizes, in which libraries with average insert sizes over 500 bp are not recommended. To ensure selection of DNA within the desired size range, it is therefore highly recommended to calibrate the AMPure® XP beads by mixing them with DNA ladders at various ratios.
8. Efficient fixation relies on good penetration of formaldehyde into tissues. If samples are thick or have considerable cuticles on their surface, it is necessary to cut the samples into small pieces manually and add 10 μL of 10% Triton X-100 to 100 mL of fixative.
9. If needed, the fixed materials can be manually dissected to remove unwanted parts. For example, one can remove flower buds of late developmental stages from inflorescences, if the nuclei of interest are located in floral meristems. Doing so can exclude a large number of unrelated nuclei from the crude nuclei extract, significantly reducing the time spent with the FACS machine.
10. In some cases, manual dissection remarkably improves the efficiency of nuclei collection when running flow cytometry at a later stage, as a large fraction of unwanted tissues/organs can be simply removed here. Because plant materials are fixed, even if the entire dissection procedure takes hours, we consider potential changes in chromatin structures negligible.
11. Do not use excessive buffer, as it determines the input volume (hence the duration) of the flow cytometry step.
12. Another common method to disrupt plant materials is to grind them with a mortar and pestle in liquid nitrogen. However, we find that this method produces many more pieces of tiny debris than does the chopping method, making FACS signals noisier.
13. We do not recommend spinning down the nuclei. In our experience, nuclei extracted from fixed plant samples are prone to form clumps after centrifugation, which are difficult to disperse.
14. Parameter settings for harvesting the desired nuclei depend on the nature of the fluorescent labels with which the nuclei are labeled and on the FACS machine model.
15. In our experience, the nuclei pellet is usually invisible. Thus, it is critical to mark the tube before taking it out of the rotor so that the pellet will not get scratched accidentally during pipetting.
16. This step is not required if a 6-cutter enzyme is used for chromatin digestion.
17. This volume is designed for performing DNA shearing with a microTUBE (Covaris®). If a different type of sonicator is used, adjustment of the top-up volume might be necessary.
18. Parameters for Covaris® S220: duty cycle 10, intensity 5, cycles per burst 200, time 50 s. Parameters for Covaris® E220: duty cycle 10, intensity 170, cycles per burst 200, time 60 s.
19. The aim of this step is to attach DNA molecules with sizes above 500 bp to the beads and to get rid of them (1 volume of DNA mixed with 0.55 volume of beads). Make sure that the AMPure® XP beads have been calibrated. If the sonicated DNA solution has a different volume, adjust the amount of AMPure® XP beads proportionally.
20. Adding more AMPure® XP beads at this step (1 volume of DNA mixed with 0.161 volume of beads) results in the binding of beads to DNA with sizes above 300 bp.
21. This step aims to get rid of biotin-labeled cytosine residues from unligated DNA, which are present at the ends of these DNA molecules. By supplying dTTP and dATP, the 3′-to-5′ exonuclease activity of T4 DNA polymerase is inhibited selectively so that bases 5′ upstream of the terminal biotin-labeled cytosine are protected from removal. Depending on the choice of restriction enzyme for chromatin digestion, different deoxynucleotide triphosphates might be used. For example, if the restriction enzyme is HindIII, dGTP and/or dATP but not dTTP should be supplied.
22. Steps 12 to 14 in this section incorporate the NEBNext® Ultra™ II Library Prep system, in which DNA molecules are end-repaired and ligated to adaptors. Those who are not familiar with this system should read the detailed protocol from NEB.
23. The tube can also be placed on a shaker set at 1,000 rpm.
24. A trial PCR can be performed by taking 0.5 μL of beads as template, which is helpful in determining the optimal cycle number.
25. This PCR program follows the NEBNext® Ultra™ II Library Prep system, which uses Q5 High-Fidelity DNA Polymerase. If another type of polymerase is used, the PCR program should be revised according to the manufacturer’s instructions.
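The reasoning in Note 21 can be expressed as a small rule of thumb. The Python sketch below rests on a simplifying assumption that is not spelled out in the protocol: a supplied dNTP protects a position only if that base lies 5′ of the biotin-labeled cytosine in the filled-in overhang and does not also have to be removed to reach the label. The function name and the hard-coded fill-in sequences (derived here from the DpnII and HindIII recognition sites) are illustrative. Bases further upstream in the duplex are ignored, so the output is a candidate list and not a validated recommendation.

```python
def candidate_dntps(fill_in: str, labeled: str = "C") -> list:
    """Candidate dNTPs for trimming the labeled base with T4 DNA polymerase.

    fill_in: nucleotides added by Klenow to the recessed 3' end, written 5'->3'.
    labeled: the biotin-labeled base (C in this protocol).
    """
    fill_in = fill_in.upper()
    first_label = fill_in.index(labeled)
    upstream = set(fill_in[:first_label])      # bases 5' of the label: may be protected
    must_remove = set(fill_in[first_label:])   # the label and everything 3' of it
    return sorted(upstream - must_remove)


# Fill-in sequences assumed from the recognition sites:
# DpnII cuts ^GATC (fill-in GATC); HindIII cuts A^AGCTT (fill-in AGCT).
for enzyme, fill_in in {"DpnII": "GATC", "HindIII": "AGCT"}.items():
    bases = candidate_dntps(fill_in)
    print(enzyme, ["d" + base + "TP" for base in bases])
```

For HindIII the sketch returns dATP and dGTP and excludes dTTP, as stated in Note 21. For DpnII it returns dATP, dGTP, and dTTP, of which the protocol supplies dTTP and dATP.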

Acknowledgment

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement No. 757600).

References

1. Bonev B, Cavalli G (2016) Organization and function of the 3D genome. Nat Rev Genet 17(11):661–678. https://doi.org/10.1038/nrg.2016.112
2. Dekker J, Belmont AS, Guttman M, Leshyk VO, Lis JT, Lomvardas S, Mirny LA, O’Shea CC, Park PJ, Ren B, Politz JCR, Shendure J, Zhong S, Network DN (2017) The 4D nucleome project. Nature 549(7671):219–226. https://doi.org/10.1038/nature23884
3. Gibcus JH, Dekker J (2013) The hierarchy of the 3D genome. Mol Cell 49(5):773–782. https://doi.org/10.1016/j.molcel.2013.02.011
4. Spielmann M, Lupianez DG, Mundlos S (2018) Structural variation in the 3D genome. Nat Rev Genet 19(7):453–467. https://doi.org/10.1038/s41576-018-0007-0
5. Lieberman-Aiden E, Dekker J (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326(5950):289
6. Belton JM, Mccord RP, Gibcus J, Naumova N, Ye Z, Dekker J (2012) Hi-C: a comprehensive technique to capture the conformation of genomes. Methods 58(3):268–276
7. Jibran R, Dzierzon H, Bassil N, Bushakra JM, Edger PP, Sullivan S, Finn CE, Dossett M, Vining KJ, VanBuren R, Mockler TC, Liachko I, Davies KM, Foster TM, Chagne D (2018) Chromosome-scale scaffolding of the black raspberry (Rubus occidentalis L.) genome based on chromatin interaction data. Hortic Res 5:8. https://doi.org/10.1038/s41438-017-0013-y
8. Lightfoot DJ, Jarvis DE, Ramaraj T, Lee R, Jellen EN, Maughan PJ (2017) Single-molecule sequencing and Hi-C-based proximity-guided assembly of amaranth (Amaranthus hypochondriacus) chromosomes provide insights into genome evolution. BMC Biol 15(1):74. https://doi.org/10.1186/s12915-017-0412-4
9. Mascher M, Gundlach H, Himmelbach A, Beier S, Twardziok SO, Wicker T, Radchuk V, Dockter C, Hedley PE, Russell J, Bayer M, Ramsay L, Liu H, Haberer G, Zhang XQ, Zhang Q, Barrero RA, Li L, Taudien S, Groth M, Felder M, Hastie A, Simkova H, Stankova H, Vrana J, Chan S, Munoz-Amatriain M, Ounit R, Wanamaker S, Bolser D, Colmsee C, Schmutzer T, Aliyeva-Schnorr L, Grasso S, Tanskanen J, Chailyan A, Sampath D, Heavens D, Clissold L, Cao S, Chapman B, Dai F, Han Y, Li H, Li X, Lin C, McCooke JK, Tan C, Wang P, Wang S, Yin S, Zhou G, Poland JA, Bellgard MI, Borisjuk L, Houben A, Dolezel J, Ayling S, Lonardi S, Kersey P, Langridge P, Muehlbauer GJ, Clark MD, Caccamo M, Schulman AH, Mayer KFX, Platzer M, Close TJ, Scholz U, Hansson M, Zhang G, Braumann I, Spannagl M, Li C, Waugh R, Stein N (2017) A chromosome conformation capture ordered sequence of the barley genome. Nature 544(7651):427–433. https://doi.org/10.1038/nature22043
10. Raymond O, Gouzy J, Just J, Badouin H, Verdenaud M, Lemainque A, Vergne P, Moja S, Choisne N, Pont C, Carrere S, Caissard JC, Couloux A, Cottret L, Aury JM, Szecsi J, Latrasse D, Madoui MA, Francois L, Fu X, Yang SH, Dubois A, Piola F, Larrieu A, Perez M, Labadie K, Perrier L, Govetto B, Labrousse Y, Villand P, Bardoux C, Boltz V, Lopez-Roques C, Heitzler P, Vernoux T, Vandenbussche M, Quesneville H, Boualem A, Bendahmane A, Liu C, Le Bris M, Salse J, Baudino S, Benhamed M, Wincker P, Bendahmane M (2018) The Rosa genome provides new insights into the domestication of modern roses. Nat Genet 50(6):772–777. https://doi.org/10.1038/s41588-018-0110-3
11. Grob S, Schmid MW, Grossniklaus U (2014) Hi-C analysis in Arabidopsis identifies the KNOT, a structure with similarities to the flamenco locus of Drosophila. Mol Cell 55(5):678–693. https://doi.org/10.1016/j.molcel.2014.07.009
12. Dong P, Tu X, Chu PY, Lu P, Zhu N, Grierson D, Du B, Li P, Zhong S (2017) 3D chromatin architecture of large plant genomes determined by local A/B compartments. Mol Plant 10(12):1497–1509. https://doi.org/10.1016/j.molp.2017.11.005
13. Dong Q, Li N, Li X, Yuan Z, Xie D, Wang X, Li J, Yu Y, Wang J, Ding B, Zhang Z, Li C, Bian Y, Zhang A, Wu Y, Liu B, Gong L (2018) Genome-wide Hi-C analysis reveals extensive hierarchical chromatin interactions in rice. Plant J 94(6):1141–1156. https://doi.org/10.1111/tpj.13925
14. Feng S, Cokus SJ, Schubert V, Zhai J, Pellegrini M, Jacobsen SE (2014) Genome-wide Hi-C analyses in wild-type and mutants reveal high-resolution chromatin interactions in Arabidopsis. Mol Cell 55(5):694–707. https://doi.org/10.1016/j.molcel.2014.07.008
15. Liu C, Cheng YJ, Wang JW, Weigel D (2017) Prominent topologically associated domains differentiate global chromatin packing in rice from Arabidopsis. Nat Plants 3(9):742–748. https://doi.org/10.1038/s41477-017-0005-9
16. Wang C, Liu C, Roqueiro D, Grimm D, Schwab R, Becker C, Lanz C, Weigel D (2015) Genome-wide analysis of local chromatin packing in Arabidopsis thaliana. Genome Res 25(2):246–256. https://doi.org/10.1101/gr.170332.113
17. Wang M, Wang P, Lin M, Ye Z, Li G, Tu L, Shen C, Li J, Yang Q, Zhang X (2018) Evolutionary dynamics of 3D genome architecture following polyploidization in cotton. Nat Plants 4(2):90–97. https://doi.org/10.1038/s41477-017-0096-3
18. Zhu W, Hu B, Becker C, Dogan ES, Berendzen KW, Weigel D, Liu C (2017) Altered chromatin compaction and histone methylation drive non-additive gene expression in an interspecific Arabidopsis hybrid. Genome Biol 18(1):157. https://doi.org/10.1186/s13059-017-1281-4
19. Sotelo-Silveira M, Chavez Montes RA, Sotelo-Silveira JR, Marsch-Martinez N, de Folter S (2018) Entering the next dimension: plant genomes in 3D. Trends Plant Sci 23(7):598–612. https://doi.org/10.1016/j.tplants.2018.03.014
20. Borges F, Gardner R, Lopes T, Calarco JP, Boavida LC, Slotkin RK, Martienssen RA, Becker JD (2012) FACS-based purification of Arabidopsis microspores, sperm cells and vegetative nuclei. Plant Methods 8(1):44. https://doi.org/10.1186/1746-4811-8-44
21. Deal RB, Henikoff S (2011) The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nat Protoc 6(1):56–68. https://doi.org/10.1038/nprot.2010.175
22. Moreno-Romero J, Santos-Gonzalez J, Hennig L, Kohler C (2017) Applying the INTACT method to purify endosperm nuclei and to generate parental-specific epigenome profiles. Nat Protoc 12(2):238–254. https://doi.org/10.1038/nprot.2016.167
23. Weinhofer I, Kohler C (2014) Endosperm-specific chromatin profiling by fluorescence-activated nuclei sorting and ChIP-on-chip. Methods Mol Biol 1112:105–115. https://doi.org/10.1007/978-1-62703-773-0_7
24. Barow M (2006) Endopolyploidy in seed plants. BioEssays 28(3):271–281. https://doi.org/10.1002/bies.20371
25. Liu C (2017) In situ Hi-C library preparation for plants to study their three-dimensional chromatin interactions on a genome-wide scale. Methods Mol Biol 1629:155–166. https://doi.org/10.1007/978-1-4939-7125-1_11
26. Ferretti L, Sgaramella V (1981) Specific and reversible inhibition of the blunt end joining activity of the T4 DNA ligase. Nucleic Acids Res 9(15):3695–3705
27. Sexton T, Yaffe E, Kenigsberg E, Bantignies F, Leblanc B, Hoichman M, Parrinello H, Tanay A, Cavalli G (2012) Three-dimensional folding and functional organization principles of the Drosophila genome. Cell 148(3):458–472. https://doi.org/10.1016/j.cell.2012.01.010

Chapter 10

Chromatin Analysis of Metabolic Gene Clusters in Plants

Ancheng C. Huang and Hans-Wilhelm Nützmann

Abstract

Plant metabolic gene clusters consist of neighboring genes that are involved in the biosynthesis of secondary or specialized metabolites. The genes within clusters are typically co-regulated, share a common set of chromatin marks, and code for the biosynthesis enzymes of a single metabolic pathway. Here, we describe three essential protocols for the basic analysis of metabolic gene clusters at the transcription, histone modification, and metabolite levels. The protocols are specified for clusters in the Arabidopsis thaliana genome and are transferable to other plant species.

Key words Gene cluster, Metabolism, Chromatin modifications, Triterpene

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_10, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

Gene clusters, groups of functionally related neighboring genes, are a common feature of genomes across the eukaryotes. Many clusters encode key development-, immunity-, and metabolism-related processes. These clusters have long been model systems for studying genetic and epigenetic regulatory mechanisms and have more recently become a focal point in the quest for novel chemistry [1, 2]. Plant metabolic gene clusters (MGCs) comprise the biosynthesis genes for complex secondary or specialized metabolites. These metabolites are typically of low molecular weight, are bioactive, and play important roles in plant development and environmental interactions. MGCs range in size from 30 to several hundred kilobases and cover three or more biosynthesis genes. Cluster regions are often enriched in transposable and repetitive elements. Characteristically, cluster genes share highly similar expression patterns, and individual clusters are transcribed under specific developmental and environmental conditions [2, 3]. It was shown that the co-expression pattern of MGCs is accompanied by distinct signatures of chromatin marks. These marks are associated with clustered genes and in some cases the intergenic cluster region. Labeling of MGCs with chromatin modifications is a dynamic process that is linked to the transcriptional activity of the cluster [4, 5]. In this chapter, we assemble methods for the analysis and relative quantification of transcription, chromatin modifications, and metabolite production of MGCs. Each section serves the characterization and identification of MGCs and underpins their exploitation and manipulation. Taken in combination, these protocols facilitate interdisciplinary investigations into MGCs (Fig. 1).

Fig. 1 Section overview. Plant metabolic gene clusters consist of adjacent biosynthesis genes. MGCs are marked by conserved histone marks and show patterns of co-expression. Diverse specialized metabolites are produced by cluster-encoded biosynthesis enzymes. Red arrows, cluster genes; gray arrows, genes outside cluster; yellow, histone octamer; green, histone modification; red waves, cluster mRNAs; gray waves, mRNAs of genes outside cluster

2 Materials

Prepare and store all stock solutions at room temperature. Prepare chromatin immunoprecipitation (ChIP) buffers freshly and keep them at 4 °C during the experiment. Add β-mercaptoethanol and protease inhibitors to ChIP buffers immediately prior to use. Prepare and use toxic chemicals under the fume hood.

2.1 Growth Media

1. MS salt medium: 0.43% (w:v) Murashige & Skoog Medium without vitamins, 0.5% (w:v) sucrose, 1% (w:v) phytagel (adjust pH to 5.8 with 1 M NaOH before addition of phytagel; pour media directly after autoclaving).
2. ¼ MS medium: 0.11% (w:v) Murashige & Skoog Medium including vitamins, 0.77% (w:v) sucrose, 1% agar (adjust pH to 5.8 with 1 M NaOH before addition of agar).
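The percentages above translate directly into weighed amounts, since 1% (w:v) corresponds to 1 g per 100 mL. The following is a minimal sketch of that arithmetic; the function names and the 500 mL batch size are illustrative assumptions and not part of the protocol.

```python
# Sketch: convert % (w:v) recipe entries into grams for a chosen batch volume.
# Component percentages are taken from the MS salt medium recipe above;
# the batch volume and all names are hypothetical.

MS_SALT_MEDIUM = {
    "Murashige & Skoog Medium without vitamins": 0.43,  # % (w:v)
    "sucrose": 0.5,                                      # % (w:v)
    "phytagel": 1.0,                                     # % (w:v)
}


def grams_for_percent_wv(percent_wv: float, volume_ml: float) -> float:
    """Return grams needed, using 1% (w:v) = 1 g per 100 mL."""
    return percent_wv / 100.0 * volume_ml


def recipe(components: dict, volume_ml: float) -> dict:
    """Return a mapping of component name to grams for the given volume."""
    return {
        name: round(grams_for_percent_wv(pct, volume_ml), 3)
        for name, pct in components.items()
    }


if __name__ == "__main__":
    for name, grams in recipe(MS_SALT_MEDIUM, 500).items():
        print(f"{name}: {grams} g")
```

The same helper can be applied to the ¼ MS medium by swapping in its percentages.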

2.2 Measurement of mRNA Levels

2.2.1 RNA Isolation

1. TRIzol (TOXIC).
2. Chloroform (TOXIC).
3. 70% ethanol.
4. Wash buffer I and wash buffer II (PureLink RNA Mini Kit, Thermo Fisher Scientific).
5. RNase-free water.

2.2.2 Removal of Genomic DNA

1. DNase (Ambion Turbo DNase, Thermo Fisher Scientific).
2. DNase reaction buffer.
3. DNase removal agent.

2.2.3 cDNA Preparation

1. Reverse transcriptase (RevertAid First Strand cDNA Synthesis Kit, Thermo Fisher Scientific).
2. Reverse transcriptase buffer.
3. RNase out.
4. dNTPs.
5. oligo(dT)18 primer.

2.2.4 qPCR

1. MyTaq HS Mix (Bioline).
2. EvaGreen dye (Biotium).
3. Nuclease-free water.

2.3 Chromatin Immunoprecipitation

2.3.1 Stock Solutions and Reagents

1. 0.5 M EDTA, pH 8.
2. 4 M LiCl.
3. 1 M MgCl2.
4. 5 M NaCl.
5. 10% (w:v) NP-40.
6. 10× PBS buffer: 1.3 M NaCl, 30 mM Na2HPO4, NaH2PO4, pH 7.4.
7. 10% (w:v) SDS.
8. 2 M sucrose.
9. 1 M Tris–HCl, pH 8.
10. 10% (w:v) Triton X-100.
11. Dynabeads magnetic beads (Thermo Fisher Scientific).
12. 10% (w:v) Chelex resin (Bio-Rad).
13. StrataClean resin (Agilent).
14. Proteinase K solution (20 mg/mL) (Thermo Fisher Scientific).
15. Antibodies against histone modification of choice.

2.3.2 Buffers

1. Formaldehyde buffer: 1% (v:v) formaldehyde (TOXIC) in 1× PBS.
2. Glycine buffer: 2 M glycine in ddH2O.
3. Buffer A: 0.4 M sucrose, 10 mM Tris–HCl, pH 8.0, 5 mM β-mercaptoethanol (TOXIC), protease inhibitor cocktail (cOmplete, Roche).
4. Buffer B: 0.25 M sucrose, 10 mM Tris–HCl, pH 8.0, 10 mM MgCl2, 1% (v:v) Triton X-100, 5 mM β-mercaptoethanol, protease inhibitor cocktail (cOmplete, Roche).
5. Buffer C: 1.7 M sucrose, 10 mM Tris–HCl, pH 8.0, 0.15% (v:v) Triton X-100, 2 mM MgCl2, 5 mM β-mercaptoethanol, protease inhibitor cocktail (cOmplete, Roche).
6. Buffer D: 50 mM Tris–HCl, pH 8.0, 10 mM EDTA, 1% (w:v) SDS, protease inhibitor cocktail (cOmplete, Roche).
7. ChIP buffer: 1.1% (v:v) Triton X-100, 1.2 mM EDTA, 16.7 mM Tris–HCl, pH 8.0, 167 mM NaCl, protease inhibitor cocktail (cOmplete, Roche).
8. Wash buffer A: 150 mM NaCl, 0.1% (w:v) SDS, 1% (v:v) Triton X-100, 2 mM EDTA, 20 mM Tris–HCl, pH 8.0.
9. Wash buffer B: 500 mM NaCl, 0.1% (w:v) SDS, 1% (v:v) Triton X-100, 2 mM EDTA, 20 mM Tris–HCl, pH 8.0.
10. Wash buffer C: 0.25 M LiCl, 1% (v:v) NP-40, 1% (w:v) sodium deoxycholate, 1 mM EDTA, 10 mM Tris–HCl, pH 8.0.
11. TE buffer: 10 mM Tris–HCl, pH 8.0, 1 mM EDTA.
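Because the buffers are prepared freshly from the stock solutions in Subheading 2.3.1, the stock volumes follow from the dilution relation C1 × V1 = C2 × V2. The sketch below works this out for the sucrose and Tris–HCl components of buffer A at a 30 mL batch size; the function names and the batch size are assumptions made for illustration, and β-mercaptoethanol and protease inhibitors are left out because they are added immediately before use.

```python
# Sketch: stock volumes for a buffer from C1 * V1 = C2 * V2.
# Concentrations are in mol/L; names and the batch volume are hypothetical.

BUFFER_A = {
    # component: (stock concentration in M, final concentration in M)
    "2 M sucrose": (2.0, 0.4),
    "1 M Tris-HCl, pH 8": (1.0, 0.010),
}


def stock_volume_ml(stock_m: float, final_m: float, final_volume_ml: float) -> float:
    """Volume of stock needed so that final_m is reached in final_volume_ml."""
    if final_m > stock_m:
        raise ValueError("final concentration exceeds stock concentration")
    return final_m * final_volume_ml / stock_m


def pipetting_scheme(buffer: dict, final_volume_ml: float) -> dict:
    """Return stock volumes in mL plus the water needed to reach the final volume."""
    scheme = {
        name: stock_volume_ml(stock, final, final_volume_ml)
        for name, (stock, final) in buffer.items()
    }
    scheme["ddH2O"] = final_volume_ml - sum(scheme.values())
    return scheme


if __name__ == "__main__":
    for name, volume in pipetting_scheme(BUFFER_A, 30.0).items():
        print(f"{name}: {volume:.2f} mL")
```

The water volume here ignores the small volumes of additives supplied just before use, so it should be read as an upper bound.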

2.4 Metabolite Extraction and Analysis

2.4.1 Instruments and Equipment

1. Analytical instruments: gas chromatography-mass spectrometry (GC-MS, e.g., Agilent 7890B GC-5977A MS), high-performance liquid chromatography-mass spectrometry (LC-MS, e.g., Agilent Q-TOF, Shimadzu IT-TOF).
2. Equipment: freeze-drier, lab mill/grinder, centrifuge, PTFE filters, tungsten beads (3 mm diameter), Genevac evaporator (optional).

2.4.2 Solvents and Chemicals

1. Mobile phase solvent A: Milli-Q water containing 0.1% formic acid.
2. Mobile phase solvent B: LC-MS grade acetonitrile.
3. Hexanes.
4. Ethyl acetate.
5. Methanol.
6. Acetonitrile (LC-MS grade).
7. Milli-Q water.
8. Internal standards: coprosterol or deuterium-labeled target compounds or simply alkanes (e.g., hexatriacontane-d74 for triterpenes) (see Note 1).
9. Derivatization reagents [e.g., trimethylchlorosilane (TMCS) in pyridine, N,O-bis(trimethylsilyl) trifluoroacetamide (BSTFA)] (TOXIC).
10. Formic acid (LC-MS grade).

3 Methods

3.1 Measurement of mRNA Levels

3.1.1 RNA Isolation

1. Grow A. thaliana vertically on MS salt media for 6–7 days under long-day conditions (16 h light/8 h dark).
2. Cut off roots and cotyledons from seedlings with razor blade.
3. Harvest separated material with forceps and dry with Miracloth and paper.
4. Transfer dried material to RNase-free 1.5 mL tubes and place into liquid nitrogen.
5. Transfer 50–100 mg of material into a screw-lid tube with 3–4 zirconia beads (2 mm diameter).
6. Add 600 μL of TRIzol and incubate at RT for up to 10 min.
7. Disrupt tissues with bead mill (Precellys Evolution or similar) for 2 × 1 min at 6000 rpm with a 30 s break in between.
8. Add 120 μL chloroform, vortex for 10–20 s, and incubate for 2–3 min.
9. Centrifuge at 15,000 × g at 4 °C for 15 min.

10. Transfer aqueous phase into new reaction tube and discard leftover.
11. Add one volume of 70% ethanol and vortex vigorously.
12. Transfer sample to assembled spin tube and centrifuge at >10,000 × g for 15 s. Remove flow-through.
13. Add 700 μL wash buffer I and centrifuge at >10,000 × g for 15 s. Remove flow-through.
14. Wash with 500 μL wash buffer II and centrifuge at >10,000 × g for 15 s. Remove flow-through. Repeat.
15. Centrifuge at >10,000 × g for 90 s and transfer spin cartridge into new reaction tube.
16. Elute sample with >30 μL RNase-free water at >10,000 × g for 30 s.
17. RNA can be stored at −80 °C.

3.1.2 Removal of Genomic DNA

1. Add 1/10 volume of reaction buffer to sample.
2. Add 1 μL of DNase to sample, mix carefully by pipetting up and down, and incubate at 37 °C for 30 min.
3. Add 1/10 volume of DNase inactivation agent to sample, vortex, incubate for 90 s, vortex, and incubate for another 90 s.
4. Centrifuge at >12,000 × g for 2 min.
5. Transfer supernatant to fresh tube.
6. RNA can be stored at −80 °C.

3.1.3 cDNA Preparation

1. Measure RNA content in samples.
2. Transfer 50–5000 ng of RNA to PCR tube. Maximum volume 11 μL. Use similar RNA amounts for all samples.
3. Add 1 μL of oligo(dT)18 primer.
4. Fill to 12 μL with nuclease-free water.
5. Incubate at 65 °C for 5 min in PCR cycler.
6. Reduce temperature to 42 °C and add 4 μL reaction buffer, 1 μL RiboLock RNase inhibitor, 2 μL 10 mM dNTP Mix, and 1 μL RevertAid M-MuLV Reverse Transcriptase. Carefully mix by pipetting and incubate for 90 min at 42 °C.
7. cDNA can be stored at −20 °C for short-term and −80 °C for long-term periods.
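Steps 2–4 imply a small calculation: the sample with the lowest RNA concentration limits how much RNA fits into 11 μL, and all other samples are adjusted to that amount. A minimal sketch of this bookkeeping follows, assuming concentrations measured in ng/μL; the function names and the example concentrations are hypothetical.

```python
# Sketch: choose one RNA input amount for all samples and compute volumes
# for the 12 uL annealing mix (RNA + 1 uL oligo(dT)18 + water).
# Limits come from steps 2-4 above; names and example values are hypothetical.

MAX_RNA_VOLUME_UL = 11.0
PRIMER_VOLUME_UL = 1.0
ANNEALING_VOLUME_UL = 12.0
MIN_RNA_NG, MAX_RNA_NG = 50.0, 5000.0


def common_rna_input_ng(conc_ng_per_ul: dict) -> float:
    """Largest RNA amount that every sample can deliver within 11 uL."""
    limit = min(c * MAX_RNA_VOLUME_UL for c in conc_ng_per_ul.values())
    amount = min(limit, MAX_RNA_NG)
    if amount < MIN_RNA_NG:
        raise ValueError("at least one sample is too dilute for 50 ng in 11 uL")
    return amount


def pipetting_plan(conc_ng_per_ul: dict) -> dict:
    """Return {sample: (RNA volume in uL, water volume in uL)}."""
    amount = common_rna_input_ng(conc_ng_per_ul)
    plan = {}
    for sample, conc in conc_ng_per_ul.items():
        rna_ul = amount / conc
        water_ul = ANNEALING_VOLUME_UL - PRIMER_VOLUME_UL - rna_ul
        plan[sample] = (rna_ul, water_ul)
    return plan


if __name__ == "__main__":
    example = {"leaf": 250.0, "root": 500.0}  # ng/uL, hypothetical readings
    for sample, (rna, water) in pipetting_plan(example).items():
        print(f"{sample}: {rna:.2f} uL RNA + {water:.2f} uL water")
```

Using one shared amount rather than the maximum per sample keeps cDNA yields comparable, which simplifies the later dilution for qPCR.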

3.1.4 Quantitative PCR

1. Design primers for all genes within cluster of interest, one upstream and one downstream gene adjacent to the cluster, and a housekeeping reference gene (e.g., PP2AA3 (At1g13320) [6]). For primer design, use Primer3Plus or similar program.

[Fig. 2 plot: relative normalised expression (log scale, 1 to 10,000) of thalianol cluster genes and flanking genes, shown in separate leaf and root panels]

Fig. 2 Gene cluster co-expression. Exemplar pattern of expression of the A. thaliana thalianol gene cluster in leaves and roots of young seedlings [4]. Cluster genes and their relative mRNA levels are shown in red; flanking genes and their mRNA levels are shown in gray. Relative mRNA levels were determined by qRT-PCR, and the leaf transcript level was set to 1. At1g13320 was used as an internal control [6]

2. Dilute cDNA sample with nuclease-free water to appropriate concentration for qPCR. For cDNA reactions with 2000 ng RNA starting material, typical dilutions are 1:6–1:7. Aim for cDNA concentrations that result in Ct-values between 20 and 30 cycles for reference gene.
3. Pipette 1.5 μL of diluted cDNA sample into individual well of qPCR plate. Aim for at least three technical replicates for each sample. Include water and DNase-free controls.
4. Prepare qPCR master mix. Use 10 μL MyTaq HS Mix, 1 μL EvaGreen dye, 0.4 μL of each primer, and 6.7 μL nuclease-free water per reaction.
5. Apply qPCR master mix to template.
6. Run qPCR reaction on qPCR machine (e.g., CFX96 Real-Time PCR Detection System (Bio-Rad) or AriaMx Real-Time PCR System (Agilent)).
7. Analyze raw data by using the ΔΔCt method. Data are represented as fold change to control condition and are normalized to reference gene (Fig. 2).

3.2 Chromatin Immunoprecipitation

A chromatin immunoprecipitation protocol for locus-specific detection of histone modifications that includes cross-linking and sonication (X-ChIP) is presented [7].

3.2.1 Chromatin Extraction

1. Grow A. thaliana vertically on MS salt media for 6–7 days under long-day conditions (16 h light/8 h dark).
2. Separate roots and cotyledons from seedlings with razor blade.

3. Harvest material with forceps, cover in Miracloth, and dry with paper towel.
4. Place material in formaldehyde buffer in 50 mL falcon tube.
5. Apply vacuum 3× for 5 min each and release vacuum between steps. Establish a vacuum that causes light air bubbles to form. Too strong a vacuum will lead to spills.
6. Add glycine buffer to a final concentration of 0.125 M and apply vacuum for 5 min.
7. Wash material thoroughly 3× with ddH2O.
8. Dry material carefully in Miracloth and paper towels. Material can be stored at −80 °C.
9. Use precooled mortar and pestle to grind material to fine powder in liquid nitrogen.
10. Transfer powder into 30 mL buffer A in 50 mL falcon tube, vortex immediately, and incubate for 5 min on ice.
11. Filter suspension twice through double layer of Miracloth into new cooled 50 mL falcon.
12. Centrifuge at 2800 × g at 4 °C for 20 min.
13. Carefully discard supernatant.
14. Resuspend chromatin pellet in 1 mL of buffer B and transfer to 1.5 mL reaction tube.
15. Centrifuge at 12,000 × g at 4 °C for 10 min.
16. Carefully discard supernatant.
17. Resuspend chromatin pellet in 300 μL of buffer C and transfer onto 900 μL of buffer C prefilled in new 1.5 mL reaction tube.
18. Centrifuge at 14,000 × g at 4 °C for 1 h.
19. Carefully discard supernatant.
20. Carefully take up pellet in 300 μL buffer D.
21. Shear chromatin solution by sonication to obtain DNA fragments of 100–1000 bp in size. For example, use BioRuptor (Diagenode) 4 × 5 min (30 s on/off) at “low” setting (see Note 2).
22. Centrifuge at 14,000 × g at 4 °C for 5 min.
23. Transfer supernatant to new reaction tube. Store 30 μL of sample as input control and continue with remaining chromatin solution in Subheading 3.2.2, step 7.

3.2.2 Immunoprecipitation and DNA Recovery

1. Transfer 15 μL of Dynabeads magnetic beads to one 2.0 mL reaction tube for each IP. Three IPs are prepared for each individual chromatin sample (target histone modification, core histone, and mock control; see Subheading 3.2.2, step 4).

2. Resuspend magnetic beads in 1 mL wash buffer A and incubate for 5 min on tube rotator.
3. Place reaction tube in magnetic rack, let beads attach to magnet, and remove wash buffer. Repeat steps 2 and 3 twice.
4. Resuspend magnetic beads in 50 μL ChIP buffer and add antibody of choice according to manufacturer’s instruction. Include antibodies against target histone modification and core histone and a non-specific control antibody (see Note 3).
5. Incubate at 4 °C for 1 h.
6. Repeat wash steps 2 and 3 three times.
7. Transfer chromatin solution to 15 mL falcon tube and dilute 10× with ChIP buffer.
8. Transfer 900 μL of diluted samples to prepared magnetic beads and incubate on tube rotator at lowest setting overnight.
9. Resuspend magnetic beads in 1 mL wash buffer A and incubate for 5 min on tube rotator.
10. Place reaction tube in magnetic rack, let beads attach to magnet, and remove buffer. Repeat steps 9 and 10 once.
11. Resuspend magnetic beads in 1 mL wash buffer B and incubate for 5 min on tube rotator.
12. Place reaction tube in magnetic rack, let beads attach to magnet, and remove buffer.
13. Resuspend magnetic beads in 1 mL wash buffer C and incubate for 5 min on tube rotator.
14. Place reaction tube in magnetic rack, let beads attach to magnet, and remove buffer.
15. Resuspend magnetic beads in 1 mL TE buffer and incubate for 5 min on tube rotator. Repeat once.
16. Place reaction tube in magnetic rack, let beads attach to magnet, and remove buffer. Repeat steps 15 and 16 once. Transfer sample to new 1.5 mL reaction tube before last wash.
17. Carefully resuspend magnetic beads in 100 μL 10% Chelex resin, and incubate at 1300 rpm at 95 °C for 10 min. Perform the same step with input samples and include these in following protocol.
18. Centrifuge briefly, place tubes on ice, and apply 2 μL Proteinase K. Incubate at 50 °C for 30 min.
19. Incubate reaction tubes at 95 °C for 10 min.
20. Cool on ice and add 10 μL StrataClean resin. Vortex mix and incubate for 10 min at room temperature (see Note 4).
21. Centrifuge at 12,000 × g at room temperature for 2 min.

22. Take up supernatant with clean DNA and store in fresh tube. Discard pellet.
23. Chromatin solution can be stored at −20 °C.

3.2.3 Quantification

Quantification is performed by qPCR. Follow protocol as described in Subheading 3.1.4. Primers are designed to cover exemplar regions within clusters. It is advised to cover both genic, including transcriptional start site, gene body, and transcriptional termination region, and intergenic regions to determine the cluster-wide histone modification pattern. Results are calculated by enrichment per nucleosome or input DNA. To normalize for different conditions and tissues, results may be calculated in relation to internal control. Loci with stable chromatin marks across different conditions can be used as internal control (Fig. 3).
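Both the expression analysis in Subheading 3.1.4, step 7, and the ChIP quantification described here reduce to arithmetic on Ct values. The sketch below shows one way to write the ΔΔCt fold change and the H3-normalized enrichment used in Fig. 3, (ChIP mark/ChIP H3) × 100. It assumes an amplification efficiency of 2 (perfect doubling per cycle) and equal template volumes for the compared reactions; all function names and the Ct values in the example are hypothetical.

```python
# Sketch: Ct arithmetic for relative expression and ChIP enrichment.
# Assumes amplification efficiency 2.0 unless stated otherwise; each Ct
# argument is a list of technical-replicate values. Names are hypothetical.
import math
from statistics import mean


def ddct_fold_change(ct_target_sample, ct_ref_sample,
                     ct_target_control, ct_ref_control, efficiency=2.0):
    """Fold change of target in sample vs. control, normalized to reference gene."""
    delta_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    delta_control = mean(ct_target_control) - mean(ct_ref_control)
    return efficiency ** -(delta_sample - delta_control)


def percent_of_h3(ct_mark_ip, ct_h3_ip, efficiency=2.0):
    """Enrichment as (ChIP mark / ChIP H3) * 100 at one locus."""
    return 100.0 * efficiency ** (mean(ct_h3_ip) - mean(ct_mark_ip))


def percent_input(ct_ip, ct_input, input_fraction, efficiency=2.0):
    """Enrichment relative to input; input_fraction is the ratio of chromatin
    in the input aliquot to chromatin used for the IP."""
    adjusted_input = mean(ct_input) - math.log(1.0 / input_fraction, efficiency)
    return 100.0 * efficiency ** (adjusted_input - mean(ct_ip))


if __name__ == "__main__":
    # Hypothetical example: a cluster gene in root (sample) vs. leaf (control).
    print(ddct_fold_change([22.0], [20.0], [27.0], [20.0]))  # 2 ** 5
    print(percent_of_h3([26.0], [24.0]))
```

If primer efficiencies differ noticeably from 2, the `efficiency` argument should be replaced by a measured value for each primer pair, which this simplified sketch does not handle separately for target and reference.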

3.3 Metabolite Analysis

A metabolite analysis protocol is described that covers sample preparation as well as GC- and LC-MS and metabolomics analysis.

3.3.1 Sample Preparation for GC-MS Analysis

1. Grow A. thaliana on ¼ MS medium vertically for 10 days at 22 °C under long-day condition (16 h light/8 h dark) (see Note 5).
2. Separate roots from hypocotyl and other tissues using a razor blade (see Note 6).
3. Harvest roots into a 2 mL Eppendorf tube using forceps.
4. Immediately flash-freeze root materials with liquid nitrogen.
5. Lyophilize the frozen A. thaliana root materials using a freeze-dryer until samples are dry (usually overnight is sufficient) (see Note 7).
6. Weigh 5–10 mg dry root materials into a new 1.5 mL tube, place a tungsten bead, and homogenize the root materials with a lab mill/grinder (e.g., Geno/Grinder (SPEX SamplePrep) at 1500 rpm for 1 min).
7. Quickly spin down the root powder.
8. Add ethyl acetate (200 μL, containing 10 μg/mL internal standard (e.g., coprosterol)); close the cap.
9. Extract root material by sonication for 30 min using a sonication bath.
10. Centrifuge the extract suspension at maximum speed (>12,000 × g) for 10 min.
11. Carefully transfer the supernatants (170 μL) to a GC-MS sample vial insert.
12. Analyze the sample directly by GC-MS. Alternatively, dry down the root extract under a gentle stream of nitrogen gas or using a Genevac with preload method for low-boiling-point solvents, add formulated derivatization agents (e.g., TMCS in pyridine, 50 μL), and incubate the sample at 70 °C for 20 min before analysis by GC-MS (see Note 8).

[Fig. 3 plot: H3K27me3 enrichment (y-axis, 0 to 18) at Actin2, FLC, and THAH in leaf and root samples, two replicates each]

Fig. 3 Variable H3K27me3 enrichment at cluster gene. H3K27me3 enrichment at the Actin2 and FLC genes and thalianol cluster gene THAH (in red) in leaves and roots of 7-day-old seedlings. Two biological replicates are shown. Enrichment is presented as (ChIP H3K27me3/ChIP H3) × 100. Actin2, a highly expressed housekeeping gene, is used as negative control. FLC, a regulator of plant vernalization with strong H3K27me3 markings, is used as positive control [15]. Note the relatively higher H3K27me3 markings of THAH in leaves compared to roots, which correlates negatively with the high expression of THAH in roots and low expression in leaves

3.3.2 GC-MS Analysis Method (See Note 9)

1. Set up a GC-MS method.
• Flow rate: 1.5 mL/min.
• Injection volume: 2 μL.
• GC inlet: pulsed split mode; split ratio, 5:1.
• Inlet temperature: 280 °C.
• MS transfer line temperature: 280 °C.
• MS source temperature: 150 °C.
• Quadrupole: 230 °C.
• MS scan range: 50–750 Da.
• Oven temperature gradient (Table 1).
2. Edit and start a GC-MS sequence (see Note 10).

3.3.3 Sample Preparation for LC-MS Analysis

1. Prepare material as described in Subheading 3.3.1, steps 1–7, and extract plant material with methanol (200 μL, containing 10 μg/mL digitoxin as internal standard) and sonicate for 30 min.
2. Filter the supernatants through a 0.45 μm PTFE filter.
3. Analyze filtrates by LC-MS.

Table 1 Oven temperature gradients

Step                  Ramp (°C/min)   Oven temperature (°C)   Hold time (min)
Initial temperature   -               60                      1
Ramp1                 20              270                     0
Ramp2                 4               300                     0
Ramp3                 40              340 (a)                 6

(a) Make sure not to exceed maximum column temperature
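The oven program in Table 1 determines the length of each GC run: every ramp contributes (target temperature minus previous temperature) divided by the ramp rate, plus its hold time. The following sketch computes the total run time and checks the program against a column temperature limit; the function names are hypothetical, and the column limit passed in the example is a placeholder that must be taken from the column's own specification.

```python
# Sketch: total run time and temperature check for the oven program in Table 1.
# Each step is (ramp rate in degC/min or None for the initial step,
#               target temperature in degC, hold time in min).
# Names and the example column limit are hypothetical.

OVEN_PROGRAM = [
    (None, 60, 1),   # initial temperature
    (20, 270, 0),    # Ramp1
    (4, 300, 0),     # Ramp2
    (40, 340, 6),    # Ramp3
]


def run_time_min(program) -> float:
    """Sum ramp durations and hold times over all steps."""
    total = 0.0
    current_temp = None
    for rate, target_temp, hold in program:
        if rate is not None:
            total += (target_temp - current_temp) / rate
        total += hold
        current_temp = target_temp
    return total


def exceeds_column_limit(program, column_max_temp_c: float) -> bool:
    """True if any step goes above the column's maximum temperature."""
    return any(target > column_max_temp_c for _, target, _ in program)


if __name__ == "__main__":
    print(f"Run time: {run_time_min(OVEN_PROGRAM):.1f} min")
    print("Exceeds limit:", exceeds_column_limit(OVEN_PROGRAM, 350))
```

Knowing the run time per injection helps when planning the sequence in Subheading 3.3.2, step 2, since blanks and standards add to the total instrument time.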

3.3.4 Method for LC-MS Analysis (See Note 11)

1. Set up LC-MS method (MS parameters depicted here are based on a Shimadzu LC-MS-IT-TOF system and can vary depending on the LC-MS system in use).
• Solvent gradient (Table 2).
• Flow rate: 0.5 mL/min.
• Injection volume: 2 μL.
• MS curved desolvation line (CDL) temperature: 250 °C.
• Nebulizing gas flow: 1.5 L/min.
• Heat block: 300 °C.
2. Calibrate the mass spectrometer using calibrants.
3. Create and run a batch or sequence.

3.3.5 Metabolomics Analysis

Upon acquisition of the metabolite profiles of the wild-type plants, mutants, or overexpression lines using GC-MS and LC-MS, targeted or untargeted metabolomics analysis can be performed.

1. Load data files onto the instrument-associated analysis software (e.g., Agilent MassHunter).
2. Overlay the chromatograms of the extracts of wild-type plants, mutants, and overexpression lines, and manually inspect the presence and absence of peaks using the instrument-associated analysis software (Fig. 4).
3. Tentatively assign the identities of the significantly changed peaks across the different datasets by comparing the MS spectra to publicly available databases such as NIST and METLIN. One can further predict the structure based on the MS fragmentation patterns (GC-MS) or MS/MS data (LC-MS).
4. Alternatively, perform comparative untargeted metabolomics analysis using the XCMS Online program to identify features with significant changes, followed by comparison of the extracted ion chromatograms (Table 3 and Fig. 4) [8]. The XCMS Online program is easy to use and does not require R scripts. The website provides specific instructions on how to use XCMS Online. The majority of acquired data are in profile mode and need to be converted to centroid mode in mzXML, NetCDF, or mzData format for XCMS analysis. The analysis can also be performed by applying the XCMS package in R [9].

Table 2 Solvent gradient

Time (min)   B.Conc (%)
0.01         2
0.50         2
5.00         10
17.00        30
33.00        90
35.80        100
43.00        100
45.00        2

3.3.6 Quantitative Analysis (See Note 12)

1. Integrate total ion peaks that are derived from products synthesized by the cluster-encoded enzymes as identified in the metabolomics analysis.
2. Integrate the peak area of the internal standard.
3. Calculate the estimated concentration of target compounds using the equation:

c(target) = [A(target)/A(ITSD)] × c(ITSD),

where c stands for concentration, A for peak area, and ITSD for internal standard.
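The estimate in step 3 is a single ratio, which can be scripted when many peaks are processed. The sketch below applies the equation and, as an optional extra, converts the result to an amount per mg of dry tissue using the 200 μL extraction volume from Subheading 3.3.1; the function names, peak areas, and sample mass in the example are hypothetical.

```python
# Sketch: internal-standard-based estimate, c(target) = A(target)/A(ITSD) * c(ITSD).
# Peak areas and the dry weight in the example are hypothetical values.

def estimate_concentration(area_target: float, area_itsd: float,
                           conc_itsd: float) -> float:
    """Estimated target concentration in the same unit as conc_itsd."""
    if area_itsd <= 0:
        raise ValueError("internal standard peak area must be positive")
    return area_target / area_itsd * conc_itsd


def amount_per_mg_dry_weight(conc_ug_per_ml: float, extract_volume_ml: float,
                             dry_weight_mg: float) -> float:
    """Convert an extract concentration (ug/mL) to ug per mg dry tissue."""
    return conc_ug_per_ml * extract_volume_ml / dry_weight_mg


if __name__ == "__main__":
    # 10 ug/mL internal standard in 200 uL solvent, 8 mg dry root (hypothetical).
    conc = estimate_concentration(50_000, 100_000, 10.0)
    print(f"{conc:.2f} ug/mL in extract")
    print(f"{amount_per_mg_dry_weight(conc, 0.2, 8.0):.3f} ug per mg dry weight")
```

This treats the detector response of target and internal standard as equal, which is why the result is only an estimate (see Note 12).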

4 Notes

1. The internal standard is chosen based on the similarity of its physical and chemical properties to those of the target compounds in the analysis. The optimal internal standard is a structural analogue of the target compounds of interest.
2. DNA shearing. If the protocol needs to be optimized for a new tissue, growth condition, or plant species, it is advised to monitor DNA quality and shearing before and after sonication (Subheading 3.2.1, step 21). A standard DNA agarose gel is suitable. For high DNA concentrations, samples can be taken during the full experiment. For experiments with low DNA concentrations, separate optimization experiments should be performed or more sensitive DNA measurement tools applied (e.g., Bioanalyzer).

Fig. 4 Comparative GC-MS profiles of root extracts of A. thaliana wild type, thalianol synthase (THAS) knockout line (thas-ko), and THAS overexpression line (thas-oe). (a) GC-MS total ion chromatogram (TIC) of wild-type root extracts; (b) GC-MS TIC of thas-ko root extracts; (c) GC-MS TIC of thas-oe root extracts; (d) GC-MS extracted ion chromatogram (EIC) of m/z (247) from a; (e) GC-MS EIC (247) from b; (f) GC-MS EIC (247) from c

Table 3 Features from a pair-wise untargeted metabolomics analysis of wild-type vs. thas-ko ranked by fold change

Rank   Fold change   Up/down   m/z      Retention time   Maximum intensity   Peak group   Name        User notes
1      318           DOWN      261.2    16.83            27327.0             62           M261T17
2      268           DOWN      247.2    13.87            32962.8             56           M247T14     Thalianol
3      253           DOWN      302.3    16.83            19564.7             62           M302T17
4      118           DOWN      260.2    16.83            4415.8              62           M260T17
5      116           DOWN      355.1    10.06            276824.3            18           M355T10
6      114           DOWN      243.2    16.95            88558.0             35           M243T17
7      102           DOWN      344.3    17.08            42142.7             49           M344T17
8      77            DOWN      261.2    15.05            8277.6              89           M261T15_1
9      76            DOWN      230.2    13.87            7907.5              56           M230T14     Thalianol
10     68            DOWN      229.2    13.87            35743.7             56           M229T14     Thalianol
11     64            DOWN      244.2    17.08            27052.4             49           M244T17
12     50            DOWN      262.2    15.67            6163.8              60           M262T16
13     45            DOWN      261.2    15.67            29104.6             60           M261T16
14     28            DOWN      262.2    16.83            9594.9              62           M262T17
15     23            DOWN      259.2    14.81            11404.2             80           M259T15
16     16            DOWN      369.2    12.28            29041.5             17           M369T12
17     15            DOWN      259.2    16.58            9528.5              50           M259T17
18     14            DOWN      221.1    10.70            207291.8            23           M221T11
19     13            DOWN      282.15   14.70            19422.5             71           M282T15
20     12            DOWN      222.1    13.68            43781.0             27           M222T14

Features belonging to the product of thalianol synthase (thalianol) are in bold with m/z in dark yellow

3. Antibodies and Dynabeads. The precipitation strategy should be chosen based on the target chromatin mark and the quality of available antibodies. Protein A Dynabeads are advised for rabbit-, pig-, dog-, and cat-derived antibodies and Protein G Dynabeads for mouse- and human-derived antibodies. Aim to use antibodies that have been validated on plant chromatin. If this is not possible, additional validation steps have to be applied such as Western blot, dot blot, and protein mass spectrometry.
4. DNA precipitation. As an alternative to StrataClean resin or in addition to resin precipitation, standard ethanol precipitation can be applied. In brief, add ddH2O to samples to increase volume to at least 400 μL, add 1/10 vol NaOAc (pH 5.6) and 2 vol 100% EtOH, mix well by inverting, store at −20 °C (or −80 °C) for at least 90 min, centrifuge at >12,000 × g at 4 °C for 30 min, discard supernatant, wash with 400 μL 70% EtOH, centrifuge at >12,000 × g at 4 °C for 10 min, discard supernatant, and air-dry DNA pellet. Resuspend in suitable amount of ddH2O.
5. Mutant lines for MGC genes are essential to investigate the functions of an MGC in planta (e.g., gene disruption, knockout, knockdown, silencing, or overexpression). Furthermore, biochemical characterization in heterologous expression systems such as Nicotiana benthamiana, Saccharomyces cerevisiae, and Escherichia coli as well as in vitro enzyme assays using purified recombinant proteins are important for validation of MGC function. Sample preparation for heterologously expressing MGC genes is not described here as it varies depending on the system used. However, the GC-MS and LC-MS methods described here for metabolite analysis of plant materials can be modified for analysis of metabolite profiles of heterologous hosts expressing MGC genes.
6. Throughout the protocol, root material of fresh seedlings is used. This tissue is a source of several MGC-derived metabolites in A. thaliana [10–12].
7. The lyophilization process will result in loss of volatile to semi-volatile compounds (e.g., sesquiterpenoids). Therefore, fresh root materials should be used when predicted MGC-coded metabolites are potentially volatile.
8. Derivatization of samples with derivatizing reagents [e.g., trimethylchlorosilane (TMCS) or N,O-bis(trimethylsilyl) trifluoroacetamide (BSTFA)] can aid in detection of very polar molecules such as carboxylic acids.
9. GC-MS analysis provides the desired sensitivity, resolution, reliability, and ease for initial profiling and identification. To obtain a relatively complete metabolic profile using GC-MS, it is recommended to (a) set the inlet temperature at 280 °C (make sure not to exceed GC column maximum temperature); (b) set the initial oven temperature at 60 °C and hold for 60 s, which will concentrate the metabolites and improve resolution; (c) set a relatively high oven temperature at the end of the run and hold for a few minutes to allow for detection of compounds with high boiling temperature and to avoid carryover by removal of remaining compounds in the column; (d) use split mode to avoid thermal degradation in the inlet if a sample is concentrated enough; (e) use a nonpolar column to start with.

10. To avoid contamination of the MS by salts, it is recommended to divert the first 1–2 min of the eluents to waste.
11. LC-MS is the most widely used instrument for metabolite profiling, although the ionization efficiency is low for some nonpolar compounds such as unmodified terpenes. The resolution of an LC chromatogram is largely determined by the column in use. The C18 column is the most commonly used reverse phase column for LC-MS analysis, although C4 and C8 columns are also widely used. The length of the column and the size of the sorbent within can vary, and selection is based on the pressure that the LC-MS system in use can tolerate and on the physicochemical properties of the target compounds (e.g., polarity).
12. The quantitative analysis described herein is a rough estimation of the actual concentrations of the target compounds. For accurate and more reliable quantification, pure authentic compounds are required as standards. If available, generate calibration curves and calculate concentrations of target compounds [13, 14].
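As an illustration of the calibration approach mentioned in Note 12, the sketch below fits a straight line through a dilution series of an authentic standard (peak area ratio to internal standard versus known concentration) by ordinary least squares and then inverts it for an unknown sample. This is a simplified assumption of a linear response over the measured range; the function names and all numeric values are hypothetical.

```python
# Sketch: linear calibration curve from an authentic standard dilution series.
# x = known standard concentrations, y = A(standard) / A(ITSD) peak area ratios.
# All numbers are hypothetical; a linear detector response is assumed.

def fit_line(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two matched calibration points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_ratio(area_ratio, slope, intercept):
    """Invert the calibration line for a measured peak area ratio."""
    return (area_ratio - intercept) / slope


if __name__ == "__main__":
    standards_ug_per_ml = [1.0, 2.0, 5.0, 10.0, 20.0]
    area_ratios = [0.1, 0.2, 0.5, 1.0, 2.0]
    slope, intercept = fit_line(standards_ug_per_ml, area_ratios)
    print(f"slope={slope:.3f}, intercept={intercept:.3f}")
    print(f"{concentration_from_ratio(0.75, slope, intercept):.2f} ug/mL")
```

Unknown samples whose ratios fall outside the range covered by the standards should be diluted or re-measured rather than extrapolated.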

Acknowledgments

This work was supported by startup funding from the Southern University of Science and Technology and Shenzhen municipal government (ACH) and the Royal Society University Research Fellowship (UF160138) (HWN).

References

1. Hurst LD, Pál C, Lercher MJ (2004) The evolutionary dynamics of eukaryotic gene order. Nat Rev Genet 5:299–310
2. Nützmann HW, Huang A, Osbourn A (2016) Plant metabolic clusters – from genetics to genomics. New Phytol 211:771–789
3. Boycheva S, Daviet L, Wolfender JL, Fitzpatrick TB (2014) The rise of operon-like gene clusters in plants. Trends Plant Sci 19:447–459
4. Nützmann HW, Osbourn A (2015) Regulation of metabolic gene clusters in Arabidopsis thaliana. New Phytol 205:503–510
5. Yu N, Nützmann HW, MacDonald JT, Moore B, Field B, Berriri S et al (2016) Delineation of metabolic gene clusters in plant genomes by chromatin signatures. Nucleic Acids Res 44:2255–2265
6. Hong SM, Bahn SC, Lyu A, Jung HS, Ahn JH (2010) Identification and testing of superior reference genes for a starting pool of transcript normalization in Arabidopsis. Plant Cell Physiol 51:1694–1706
7. Song J, Rutjens B, Dean C (2014) Detecting histone modifications in plants. Methods Mol Biol 1112:165–175
8. Gowda H, Ivanisevic J, Johnson CH, Kurczy ME, Benton HP, Rinehart D et al (2014) Interactive XCMS online: simplifying advanced metabolomic data processing and subsequent statistical analyses. Anal Chem 86:6931–6939
9. Rajniak J, Barco B, Clay NK, Sattely ES (2015) A new cyanogenic metabolite in Arabidopsis required for inducible pathogen defence. Nature 525:376–379
10. Field B, Fiston-Lavier AS, Kemen A, Geisler K, Quesneville H, Osbourn AE (2011) Formation of plant metabolic gene clusters within dynamic chromosomal regions. Proc Natl Acad Sci U S A 108:16116–16121
11. Field B, Osbourn AE (2008) Metabolic diversification – independent assembly of operon-like gene clusters in different plants. Science 320:543–547
12. Sohrabi R, Huh JH, Badieyan S, Rakotondraibe LH, Kliebenstein DJ, Sobrado P, Tholl D (2015) In planta variation of volatile biosynthesis: an alternative biosynthetic route to the formation of the pathogen-induced volatile homoterpene DMNT via triterpene degradation in Arabidopsis roots. Plant Cell 27:874–890
13. Huang AC, Kautsar SA, Hong YJ, Medema MH, Bond AD, Tantillo DJ, Osbourn A (2017) Unearthing a sesterterpene biosynthetic repertoire in the Brassicaceae through genome mining reveals convergent evolution. Proc Natl Acad Sci U S A 114:E6005–E6014
14. Huang AC, Jiang T, Liu YX, Bai YC, Reed J, Qu B, Goossens A, Nützmann HW, Bai Y, Osbourn A (2019) A specialized metabolic network selectively modulates Arabidopsis root microbiota. Science 364: pii: eaau6389
15. De Lucia F, Crevillen P, Jones AM, Greb T, Dean C (2008) A PHD-polycomb repressive complex 2 triggers the epigenetic silencing of FLC during vernalization. Proc Natl Acad Sci U S A 105:16831–16836

Chapter 11 Characterization of Plant 3D Chromatin Architecture, In Situ Hi-C Library Preparation, and Data Analysis Pengfei Dong and Silin Zhong Abstract In the mammalian nucleus, 3D organization of the chromatin plays an important role in the regulation of gene expression. Similar chromatin structures such as A/B compartments, domains, and loops have now been found in plants, yet their biological function remained to be further elucidated. In this chapter, we present a detailed protocol for examining genome-wide chromatin interaction using in situ Hi-C optimized for plant tissues. We also provide a step-by-step bioinformatic workflow for Hi-C sequencing data alignment, and subsequent compartment, domain, and chromatin loop identification. Key words Plant chromatin 3D architecture, A/B compartment, Local compartment, Domain, Loop, In situ Hi-C

1 Introduction

In eukaryotic cells, DNA is wrapped around the histone octamer, forming the 11-nanometer chromatin fiber that is often referred to as beads-on-a-string. How the chromatin fiber is packaged into a chromosome at the micrometer scale has fascinated biologists for a long time. Recent advances in chromosome conformation capture technologies such as in situ Hi-C have enabled easy genome-wide profiling of chromatin interactions [1, 2]. The initial steps of in situ Hi-C library preparation are identical to those of a ChIP-Seq experiment: the plant tissues are cross-linked with formaldehyde, and the nuclei are isolated. The genomic DNA inside the nuclei is then digested by restriction enzymes and end-labeled with biotinylated nucleotides. These biotin-labeled ends are then ligated to each other to form chimeric DNA, and the non-ligated ends are digested to suppress the background. All these steps are performed inside the isolated nuclei, hence the name "in situ" Hi-C. The cross-linking is then reversed, as in a ChIP-Seq protocol, and the biotinylated DNA is enriched for Illumina sequencing.

In addition, we present a simplified bioinformatic workflow for wet lab biologists to process and analyze the Hi-C data. The raw data are first mapped to the reference genome using the HiC-Pro package [3]. A/B compartment, domain, and loop calling are performed using the Juicer package [4]. Local A/B compartment calling is performed using CscoreTool [5]. This in situ Hi-C library preparation and data analysis protocol has been used successfully for several plant species, including tomato, maize, sorghum, and rice [6].

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_11, © Springer Science+Business Media, LLC, part of Springer Nature 2020

2 Materials

2.1 Formaldehyde Fixation

1. 37% formaldehyde.
2. 10× PBS.
3. Glycine.

2.2 Nuclei Isolation

1. TE buffer: 10 mM Tris–HCl (pH 7.5), 1 mM EDTA.
2. 10% (v/v) Triton X-100: Dilute in distilled water. Can be stored in the fridge for up to 1 month.
3. 10% (w/v) SDS: Dissolve 1 g of SDS powder in 10 mL distilled water. Store at room temperature.
4. Nylon filter (pore size 70 μm).
5. Protease inhibitor cocktail.
6. 2-mercaptoethanol.
7. Nuclei isolation buffer: 0.4 M sucrose in 1× TE buffer. It can be stored in the fridge for 1 month. Prior to use, freshly add a protease inhibitor tablet, 10% Triton X-100 to a final concentration of 1% (v/v), and 2-mercaptoethanol to 0.1% (v/v).
8. Percoll.
9. Percoll gradient buffer: 0.25 M sucrose in 95% Percoll. To prepare 50 mL of Percoll gradient buffer, weigh 4.2 g of sucrose, add 50 μL 1 M Tris–HCl (pH 7.5), and top up the volume to 50 mL with Percoll.
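The buffer recipes above rest on two simple relations: mass = molarity × volume × molar mass, and C1V1 = C2V2 for dilutions from a stock. The following Python sketch is only an illustration of that arithmetic; the function names are hypothetical, and the molar mass of sucrose (342.30 g/mol) is a standard value that is not stated in the protocol.

```python
# Illustrative buffer arithmetic; helper names are assumptions, not part of the protocol.
SUCROSE_G_PER_MOL = 342.30  # standard molar mass of sucrose (assumed here)


def grams_for_molarity(molarity_m, volume_ml, molar_mass):
    """Mass (g) of solute needed for a given molarity (mol/L) in volume_ml."""
    return molarity_m * (volume_ml / 1000.0) * molar_mass


def stock_volume_ul(stock_conc, final_conc, final_volume_ml):
    """Volume (uL) of stock needed so that C1 * V1 = C2 * V2.

    stock_conc and final_conc must share the same unit (M, %, ...).
    """
    return final_conc * final_volume_ml * 1000.0 / stock_conc


# 0.25 M sucrose in 50 mL Percoll gradient buffer: about 4.28 g,
# close to the 4.2 g given in the recipe.
percoll_sucrose_g = grams_for_molarity(0.25, 50, SUCROSE_G_PER_MOL)

# 10% Triton X-100 stock diluted to 1% in 50 mL of nuclei isolation buffer.
triton_ul = stock_volume_ul(10, 1, 50)

print(f"sucrose: {percoll_sucrose_g:.2f} g, Triton X-100 stock: {triton_ul:.0f} uL")
```

The same two helpers cover the other concentrations quoted in this section, for example the 0.4 M sucrose of the nuclei isolation buffer.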

2.3 In Situ Digestion, Biotin End-Repair, and Ligation

1. Restriction enzymes and their respective buffers (e.g., DpnII, MboI, or HindIII).
2. Klenow fragment.
3. T4 DNA ligase.
4. 10 mM dNTP minus dCTP.
5. Biotin dNTP.


6. Proteinase K.
7. RNase A.
8. Reverse cross-linking buffer: 0.3 M NaCl, 10 mM Tris–HCl (pH 7.5), 1 mM EDTA, 0.1% (w/v) SDS, and 0.1% (v/v) 2-mercaptoethanol.
9. 3 M sodium acetate pH 5.2.
10. Glycogen.
11. Ethanol and isopropanol.
12. Chloroform.
13. Exonuclease III.
14. T4 DNA polymerase.
15. 10 mM dNTP minus dCTP and dGTP.
16. 0.5 M EDTA pH 8.0.

2.4 On-Bead Illumina TruSeq Library Preparation

1. Covaris M220 or suitable sonicator.
2. Covaris microTUBE AFA Fiber.
3. Streptavidin magnetic beads and suitable magnetic rack.
4. 1× binding buffer: 1 M NaCl, 10 mM Tris–HCl (pH 8.0), 1 mM EDTA. Add Tween-20 to 0.5% (v/v) prior to use.
5. End-repair enzyme and buffer with dNTP.
6. dA-tailing enzyme and buffer with dATP.
7. Illumina TruSeq Y-shape adapter and index PCR primers.
8. T4 DNA ligase and buffer.
9. Proofreading PCR enzymes.
10. Real-time PCR dye.
11. Real-time PCR machine.
12. PCR machine.

2.5 Hi-C Data Analysis

1. Multicore server with Linux-based OS.
2. Server GPU is required for loop calling.

3 Methods

3.1 Formaldehyde Fixation

1. Fully submerge the plant tissue (e.g., 10–20 g of tomato leaf or fruit slices) in 100 mL of 1× PBS buffer supplemented with 1% (v/v) formaldehyde. Vacuum infiltrate the sample for 10 min, then release the vacuum slowly. Repeat the vacuum infiltration two more times.


2. Add glycine to 0.125 M to stop the fixation. Mix well and vacuum infiltrate the sample for 5 min. Discard the fixation solution and quickly rinse the samples with chilled water.
3. Blot the samples on dry tissue towels to remove the water and immediately freeze the samples in liquid nitrogen. The samples can be stored in a −80 °C freezer indefinitely.

3.2 Nuclei Preparation

1. Grind the fixed tissues in liquid nitrogen, and resuspend the powder in at least 5 volumes of chilled nuclei isolation buffer.
2. Filter the solution through the nylon mesh, and collect the flow-through in one or multiple 50 mL centrifuge tubes on ice. Spin down the nuclei at 3000 × g for 10 min at 4 °C and discard the supernatant.
3. Resuspend the nuclei pellet in chilled nuclei isolation buffer, using at least 5 volumes of the pellet size. Spin down the nuclei at 9000 × g for 5 min at 4 °C and discard the supernatant. This can be repeated multiple times until the chloroplasts are completely lysed and the pellet turns white or gray. For tomato fruits, the pellet will be light red or orange.
4. Resuspend the nuclei pellet in at least 10 volumes of chilled Percoll gradient buffer, and centrifuge the solution at 12,000 × g for 15 min at 4 °C. The starch particles should partition to the pellet, while the nuclei and cell wall debris should float on top of the Percoll solution. Use a spatula to scoop out the top layer, and transfer it to a clean 2 mL tube with chilled TE buffer. The remaining nuclei solution can be carefully recovered by pipette.
5. Spin down the nuclei in TE at 9000 × g for 5 min at 4 °C and discard the supernatant. Repeat the wash one more time.

3.3 In Situ Restriction Enzyme Digestion

1. For DpnII digestion, resuspend the nuclei in at least 2.5 volumes of NEBuffer 3 (supplemented with 1× protease inhibitor). We have also used NEBuffer 2 to resuspend the nuclei when other restriction enzymes, such as MboI or HindIII, are used.
2. Add 10% (w/v) SDS to a final concentration of 0.2% (w/v). Mix the nuclei by gentle pipetting. Incubate the nuclei at 62 °C for 5 min in a water bath. This denaturation step partially unwinds the DNA from the nucleosomes and makes it accessible to the restriction enzyme.
3. Add at least an equal volume of NEBuffer 3 to bring the final SDS concentration down to no more than 0.1% (w/v). Add 10% Triton X-100 to a final concentration of 1% (v/v). Incubate at 37 °C for 15 min in a rotating incubator to quench the SDS.


4. Gently spin down the nuclei at 1000 × g for 5 min at 4 °C, resuspend in 100 μL of NEBuffer 3.1 (or an appropriate volume, depending on the size of the nuclei pellet), and add 5 μL of DpnII (10 U/μL). Incubate the tube for at least 4 h at 37 °C in a rotating incubator. We often perform the digestion overnight.
5. Inactivate the DpnII by incubating the tubes at 62 °C for 20 min in a water bath.
6. Add 1 μL of 10 mM dNTP minus dCTP, 25 μL of 0.4 mM biotin-14-dCTP, 3 μL of 10× NEBuffer 3.1, and 4 μL of Klenow fragment (10 U/μL). Incubate at room temperature for 4 h on a rotator. This step can also be performed overnight.
7. Spin down the nuclei at 1000 × g for 5 min at 4 °C. Carefully remove the supernatant without disturbing the pellet. Add 500 μL of 1× NEB T4 DNA ligase buffer supplemented with 1× BSA, 1× protease inhibitor, and 0.1% Triton X-100. Add 3 μL of T4 DNA ligase and incubate at room temperature for 4 h on a rotator. For a large nuclei pellet, use at least five pellet volumes of buffer.
8. Spin down the nuclei at 1000 × g for 5 min at 4 °C, discard the supernatant, and resuspend in 200 μL of NEBuffer 1.1. Add 2 μL of Exonuclease III (100 U/μL) to chew back the exposed DpnII overhangs, and incubate the reaction at 37 °C for 10 min.
9. Spin down the nuclei at 1000 × g for 5 min at 4 °C, discard the supernatant, and resuspend in 500 μL of reverse cross-linking buffer. Add 5 μL of proteinase K (800 U/mL) and incubate at 55 °C for 60 min. Transfer the tube to 65 °C and incubate overnight.
10. Add 1 μL of RNase A (10 mg/mL) and incubate at 37 °C for 30 min to remove the RNA.
11. To purify the DNA, add an equal volume of chloroform, mix by vortexing, and centrifuge at 10,000 × g at 4 °C for 5 min. Transfer the upper aqueous phase to a new tube, and add 1/10 volume of 3 M sodium acetate, 1 μL of glycogen (10 mg/mL), and an equal volume of isopropanol. Mix by inversion and incubate at −20 °C for 2 h. Spin down the DNA at maximal speed (>20,000 × g) for 20 min at 4 °C, and wash the pellet with 80% ethanol. Air dry the pellet, and dissolve it in 100 μL of water.
12. Check the DNA concentration. We normally obtain 1 μg of DNA from 10^6 tomato or maize nuclei.

3.4 On-Bead Illumina TruSeq Library Preparation

1. Start with 1–2 μg of DNA in 100 μL. Add 10 μL of 10× NEBuffer 2, 1 μL of 10 mM dNTP minus dCTP and dGTP, and 2 μL of T4 DNA polymerase. Incubate the reaction at 20 °C for 2 h. Terminate the reaction by adding 5 μL of EDTA. T4 DNA polymerase in this step will further chew back the unligated DpnII overhangs.


2. Transfer all 120 μL of DNA to a Covaris microTUBE. Sonicate the DNA to 300–400 bp using the Covaris M220. Other suitable sonicators such as the Bioruptor can also be used.
3. Purify the DNA fragments using a Zymo DNA purification column, and elute with 44 μL of water. Add 5 μL of 10× end-repair buffer and 1 μL of NEB end-repair master mix, and incubate at room temperature for 30 min.
4. Purify the 50 μL reaction mix using 40 μL of AMPure XP beads. Elute the DNA with 50 μL of TE and transfer to a new PCR tube.
5. Wash 10 μL of Dynabeads MyOne Streptavidin C1 three times with 150 μL of 1× binding buffer, and resuspend the beads in 50 μL of 2× high salt binding buffer.
6. Add the beads in 50 μL of 2× high salt binding buffer to the eluted DNA. Bind at room temperature on a rotator for 1 h.
7. Place the tube on a magnetic tube rack for 1–2 min, and discard the supernatant. Wash the beads with 150 μL of 1× binding buffer. Repeat the wash three times.
8. Wash the beads with TE (supplemented with 0.1% Tween 20) and transfer them to a new PCR tube.
9. To perform the on-bead dA-tailing reaction, recover the beads on a magnetic rack, resuspend them in 50 μL of 1× dA-tailing buffer, and add 1 μL of dA-tailing enzyme. Incubate the tube at 37 °C for 30 min in a rotating incubator.
10. Recover the beads on a magnetic rack. Resuspend the beads in 50 μL of 1× rapid T4 ligase buffer with 1 μL of 15 μM Illumina TruSeq adapter. Add 1 μL of T4 DNA ligase. Incubate the ligation reaction at room temperature for 30 min on a rotator. Alternatively, the ligation can be performed overnight at 4 °C.
11. Place the tube on a magnetic tube rack for 2 min, and discard the supernatant. Wash the beads with 150 μL of 1× binding buffer. Repeat the wash three times.
12. Resuspend the beads in 150 μL of TE supplemented with 0.1% Tween 20. Transfer the beads to a new tube. Repeat the wash three times.
13. The library on the beads can be split into two aliquots. Use half of them (5 μL of initial C1 beads) in a 60 μL PCR reaction: 36 μL of water, 2.5 μL of forward primer, 2.5 μL of reverse primer, 1 μL of 10 mM dNTP, 12 μL of Q5 buffer (5×), and 0.5 μL of Q5 DNA polymerase.
14. Run the PCR with the following program: denaturation at 98 °C for 30 s; 4 cycles of 98 °C for 10 s, 65 °C for 20 s, and 72 °C for 20 s; final extension at 72 °C for 1 min; hold at 4 °C.


15. Recover the supernatant on the magnetic rack and discard the beads.
16. Transfer 10 μL of the PCR reaction to a new qPCR-compatible tube containing 10 μL of 1× Q5 PCR buffer supplemented with 0.5 μL of EvaGreen qPCR dye. Perform real-time PCR to determine the optimal number of PCR cycles for the remaining library. The remaining 50 μL of PCR reaction can be kept on ice for 1 h.
17. PCR-amplify the remaining library and purify the PCR product using an equal volume of AMPure XP beads. Check the library size and concentration. Perform agarose gel electrophoresis and recover the 400–500 bp DNA if needed. We often mix a small amount of the indexed Hi-C library with other libraries for a pilot sequencing run to check the library quality.

3.5 Data Analysis

The following dependencies are required for data analysis.

1. HiC-Pro.
2. R (>3.1) with the data.table and rioja packages.
3. Juicer.
4. R script for local compartment identification: https://github.com/Pengfei-Dong/local-compartment/blob/master/call_local_compartment.R
5. CscoreTool.
6. CUDA (7.0 for Juicer HICCUPS) and the respective JCUDA, or Fit-Hi-C.

3.5.1 Map Hi-C Data to a Reference Genome

In this section, we use one tomato Hi-C dataset [6] (NCBI SRA: SRR5748732) as an example to demonstrate the general workflow for Hi-C data analysis. We assume that the raw data have been downloaded from SRA and stored in /raw/tomato3/; the reference genome directory is /path/to/ref/, which contains the genome file genome.fa; the HiC-Pro installation path is /path/to/HiCpro/; the working directory is /path/to/work/; and the path to juicer_tools.jar is /path/to/juicer_tools.jar.

To perform in silico RE digestion:

/path/to/HiCpro/bin/utils/digest_genome.py -r mboi -o MboI_resfrag.bed /path/to/ref/genome.fa
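To make the in silico digestion step more concrete, the following Python sketch shows the underlying idea: locate every recognition site in a sequence and report the resulting fragments as BED-like intervals. It is not the HiC-Pro implementation; the function names, the toy sequence, and the fragment naming scheme are assumptions made for illustration only.

```python
import re


def digest(sequence, site="GATC", cut_offset=0):
    """Return 0-based, half-open (start, end) fragments after cutting at every site.

    cut_offset is the cut position within the recognition site
    (0 for ^GATC, as produced by MboI and DpnII).
    """
    seq = sequence.upper()
    # A lookahead pattern also reports overlapping occurrences of the site.
    pattern = "(?=" + re.escape(site) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, seq)]
    # Cuts at the very start or end of the sequence do not create a fragment.
    bounds = [0] + [c for c in cuts if 0 < c < len(seq)] + [len(seq)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def to_bed(chrom, fragments):
    """Format fragments as tab-separated BED-like lines (names are illustrative)."""
    return [
        "\t".join([chrom, str(start), str(end), f"frag_{i}", "0", "+"])
        for i, (start, end) in enumerate(fragments, start=1)
    ]


if __name__ == "__main__":
    toy = "AAGATCTTGATCCC"  # toy sequence, not real genome data
    for line in to_bed("chr_toy", digest(toy)):
        print(line)
```

For a real genome, the digest_genome.py utility shipped with HiC-Pro should be used as shown above; the sketch only illustrates what a restriction fragment file contains.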

To obtain a chromosome size file:

samtools faidx genome.fa
awk '($1 ~ /^SL3.0/){print $1"\t"$2}' genome.fa.fai > chro.sizes
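The awk filter above keeps the first two columns of the FASTA index (sequence name and length) for sequences whose names begin with SL3.0. As a hedged illustration of the same logic, a minimal Python equivalent might look as follows; the function name and the example index lines are hypothetical, and a literal prefix match is used, which is slightly stricter than the awk regular expression (where "." matches any character).

```python
def chrom_sizes_from_fai(fai_lines, prefix="SL3.0"):
    """Return (name, length) pairs for index lines whose name starts with prefix.

    fai_lines: iterable of lines from a samtools .fai index
    (tab-separated; column 1 is the sequence name, column 2 its length).
    """
    sizes = []
    for line in fai_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0].startswith(prefix):
            sizes.append((fields[0], int(fields[1])))
    return sizes


if __name__ == "__main__":
    # Made-up index lines for illustration; real values come from genome.fa.fai.
    example = [
        "SL3.0ch01\t1000\t11\t60\t61\n",
        "scaffold_7\t500\t1040\t60\t61\n",
    ]
    for name, length in chrom_sizes_from_fai(example):
        print(f"{name}\t{length}")
```

Restricting the file to assembled chromosomes keeps unplaced scaffolds out of the contact matrices.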


Modify the HiC-Pro configuration file "config-hicpro.txt" as follows:

PAIR1_EXT = _1 #suffix for the read 1 FASTQ file
PAIR2_EXT = _2 #suffix for the read 2 FASTQ file
BOWTIE2_IDX_PATH = /path/to/ref/
REFERENCE_GENOME = genome
GENOME_SIZE = /path/to/ref/chro.sizes
GENOME_FRAGMENT = /path/to/ref/MboI_resfrag.bed
LIGATION_SITE = GATCGATC
BIN_SIZE = 10000 20000 40000 100000 500000
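The LIGATION_SITE value follows from the wet lab steps: the 5' overhang left by the restriction enzyme is filled in with biotinylated nucleotides, and the resulting blunt ends are ligated, which duplicates the overhang at the junction. The Python sketch below derives the junction sequence under the assumption of a palindromic recognition site cut to leave a 5' overhang; the function name is hypothetical.

```python
def ligation_junction(site, cut_offset):
    """Junction sequence expected after fill-in and blunt-end ligation.

    site: palindromic recognition sequence (top strand, 5'->3').
    cut_offset: cut position on the top strand (0 for ^GATC, 1 for A^AGCTT).
    Assumes a 5' overhang, i.e. cut_offset <= len(site) / 2.
    """
    if not 0 <= cut_offset <= len(site) // 2:
        raise ValueError("this sketch only handles 5'-overhang (or blunt) cutters")
    # Filled-in left end carries the site up to the bottom-strand cut;
    # the right end starts at the top-strand cut.
    return site[: len(site) - cut_offset] + site[cut_offset:]


if __name__ == "__main__":
    print(ligation_junction("GATC", 0))    # MboI / DpnII
    print(ligation_junction("AAGCTT", 1))  # HindIII
```

For MboI or DpnII this gives GATCGATC, the value used in the configuration above; if HindIII is used instead, the same reasoning gives AAGCTAGCTT.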

To run HiC-Pro:

nohup HiC-Pro -i /raw/ -o ./tomato_hic -c config-hicpro.txt &

The processed Hi-C data, the interaction matrix, and the statistics will be saved in the following folders:

tomato_hic/hic_results/data
tomato_hic/hic_results/matrix
tomato_hic/hic_results/pic

3.5.2 Chromosome-Wide A/B Compartment Calling

The following command uses Juicer to call the A/B compartments of tomato chromosome 1. It generates a file named "tomato3_ch01" containing the eigenvector of chromosome 1 at 500 kb bin resolution. Bins with positive or negative values are assigned to the A or B compartment, respectively. Note that the sign of the eigenvector needs to be manually reversed if it anticorrelates with active chromatin signals such as DHS, H3K4me3, or gene density.

Create a shell script.

$ cat >run_juicer_compartment.sh
awk '{$4=$4!="+"; $7=$7!="+"} $2 [...]

[...] run_juicer.sh
java -jar /path/to/juicer_tools.jar arrowhead -r 10000 -k KR --ignore_sparsity tomato3_allValidPairs.hic tomato3_domain_list

Press Ctrl-D (end of input) to save and exit, and then execute the script:

$ nohup bash run_juicer.sh &
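For the compartment assignment described at the start of this subsection, the sign check on the eigenvector can be automated. The Python sketch below is one possible way to do this and is not part of Juicer: it correlates the eigenvector with a per-bin gene density track, flips the sign when the correlation is negative, and labels bins as A or B. The function names and the toy values are assumptions; bins without a value (NaN) are ignored when computing the correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists, skipping NaN pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)


def orient_eigenvector(eigen, gene_density):
    """Flip the eigenvector so that positive values track gene-rich bins."""
    if pearson(eigen, gene_density) < 0:
        return [-v for v in eigen]
    return list(eigen)


def label_compartments(eigen):
    """Assign A to positive bins, B to negative bins, NA otherwise."""
    labels = []
    for v in eigen:
        if math.isnan(v) or v == 0:
            labels.append("NA")
        else:
            labels.append("A" if v > 0 else "B")
    return labels


if __name__ == "__main__":
    eigen = [-0.5, -0.2, 0.3, 0.4]   # toy eigenvector, 500 kb bins
    density = [10, 8, 2, 1]          # toy gene counts per bin
    print(label_compartments(orient_eigenvector(eigen, density)))
```

Any active chromatin signal binned at the same resolution (for example DHS or H3K4me3 coverage) could be used in place of gene density.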


To run HICCUPS:

$ java -Djava.library.path=/path/to/JCUDA -jar /path/to/juicer_tools.jar hiccups \
-m 2048 -r 10000 -f 0.1 -p 1 -i 3 --ignore_sparsity \
tomato3_allValidPairs.hic \
tomato3_10000_hiccups

The HICCUPS loop file is stored in the "tomato3_10000_hiccups" folder. Juicer HICCUPS needs a large amount of computing resources and requires a GPU. Fit-Hi-C [7] is an alternative tool for loop calling that does not depend on a GPU. Chromatin loops can be identified with Fit-Hi-C using the following script.

Create a shell script.

$ cat >run_fithic.sh
#!/bin/bash
/path/to/HiCpro/bin/utils/hicpro2fithic.py -i \
tomato_hic/hic_results/matrix/tomato3/raw/10000/tomato3_10000.matrix -b \
tomato_hic/hic_results/matrix/tomato3/raw/10000/tomato3_10000_abs.bed -s \
tomato_hic/hic_results/matrix/tomato3/iced/10000/tomato3_10000_iced.matrix.biases -r 10000
fithic -f fithic.fragmentMappability.gz -i fithic.interactionCounts.gz -t fithic.biases.gz -o tomato3_fithic -l tomato3_fithic &

Press Ctrl-D (end of input) to save and exit, and then execute the script:

$ nohup bash run_fithic.sh &

The Fit-Hi-C loop file is stored in the "tomato3_fithic" folder.

References

1. Lieberman-Aiden E, van Berkum NL, Williams L et al (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326:289–293
2. Rao SSP, Huntley MH, Durand NC et al (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159:1665–1680. https://doi.org/10.1016/j.cell.2014.11.021
3. Servant N, Varoquaux N, Lajoie BR et al (2015) HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol 16:259. https://doi.org/10.1186/s13059-015-0831-x
4. Durand NC, Shamim MS, Machol I et al (2016) Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst 3:95–98
5. Zheng X, Zheng Y (2018) CscoreTool: fast Hi-C compartment analysis at high resolution. Bioinformatics 34:1568–1570. https://doi.org/10.1093/bioinformatics/btx802
6. Dong P, Tu X, Chu P-Y et al (2017) 3D chromatin architecture of large plant genomes determined by local A/B compartments. Mol Plant 10:1497–1509. https://doi.org/10.1016/j.molp.2017.11.005
7. Ay F, Bailey TL, Noble WS (2014) Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts. Genome Res 24:999–1011. https://doi.org/10.1101/gr.160374.113

Part III Applications and Novel Insights into Epigenetics and Epigenomics in Plants

Chapter 12

The Gene Balance Hypothesis: Epigenetics and Dosage Effects in Plants

Xiaowen Shi, Chen Chen, Hua Yang, Jie Hou, Tieming Ji, Jianlin Cheng, Reiner A. Veitia, and James A. Birchler

Abstract

Dosage effects in plants are caused by changes in the copy number of chromosomes, segments of chromosomes, or multiples of individual genes. Genes often exhibit a dosage effect in which the amount of product is closely correlated with the number of copies present. However, when larger segments of chromosomes are varied, there are trans-acting effects across the genome that are unleashed that modulate gene expression in cascading effects. These appear to be mediated by the stoichiometric relationship of gene regulatory machineries. There are both positive and negative modulations of target gene expression, but the latter is the plurality effect. When this inverse effect is combined with a dosage effect, compensation for a gene can occur in which its expression is similar to the normal diploid regardless of the change in chromosomal dosage. In contrast, changing the whole genome in a polyploidy series has fewer relative effects as the stoichiometric relationship is not disrupted. Together, these observations suggest that the stoichiometry of gene regulation is important as a reflection of the mode of assembly of the individual subunits involved in the effective regulatory macromolecular complexes. This principle has implications for gene expression mechanisms, quantitative trait genetics, and the evolution of genes depending on the mode of duplication, either segmentally or via whole-genome duplication.

Key words Aneuploidy, Ploidy, Copy number variants, Quantitative traits, Gene expression, Dosage compensation, Gene balance hypothesis

1 Gene Dosage Effects

1.1 Introduction

Dosage effects can be defined as changes in expression of genes with altered copy number in small segments as well as from changing the whole set of chromosomes in a genome. The former is referred to as aneuploidy and the latter as polyploidy. There is a distinct difference in the effect on plant phenotypes depending upon whether there is a partial or whole-genome change in chromosome number. Unbalanced genomes have a detrimental effect on the phenotype (see Fig. 1). Early in the development of the field of genetics, it was recognized that defects in aneuploidy were much

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_12, © Springer Science+Business Media, LLC, part of Springer Nature 2020


Fig. 1 Haploid maize plants that are balanced compared to one that is disomic for the short arm of chromosome 1. The four plants to the left are normal haploids with ten chromosomes. The haploid plant at the right is the same age but has an additional short arm of chromosome 1. The imbalance of the genome has a detrimental effect on the plant stature

more severe than those in polyploidy [1–4]. This realization came initially from studies of trisomics and polyploids in Datura, but the principles learned in that species are now known to apply basically across all eukaryotes [5]. This comparison was recognized long before the genetic material was understood, but as molecular genetics emerged, this concept of genetic "balance" became interpreted as the result of a dosage effect of the encoded genes on the segment of the genome that was varied. Recent evidence, however, indicates that the situation is much more complicated and involves a stoichiometric relationship of regulatory factors determining aspects of gene expression [6–8]. Data from gene expression studies of aneuploids and polyploids, evolutionary genomics, and quantitative genetics as well as theoretical aspects of assembly of macromolecular complexes [9, 10] all point to a role of stoichiometry of gene regulation as the underlying basis. This principle forms the basis of the gene balance hypothesis, which states that varying the stoichiometry of members of macromolecular complexes will alter the assembly and function of the whole as the central basis of these dosage phenomena [5]. Here we review this evidence and provide methodology for analysis of dosage effects in plants.

1.2 Dosage Effects of Gene Expression

In terms of gene expression, early studies of isozyme and protein expression in aneuploids and ploidy series illustrated that some cases of gene dosage effects could be found [11, 12]. However,


other instances revealed that there was no relative change of expression in an aneuploid series resulting in dosage compensation [12, 13]. Also, in those early studies, dosage effects operating in trans on the unvaried part of the genome were found. While these effects could be both positive and negative, an inverse correlation of the varied chromosome and the amount of expression of genes encoded elsewhere in the genome was more prevalent [12]. The basis of the dosage compensation in several examples was shown to result from the combination of a gene dosage effect and an inverse effect operating on the same target gene in a varied segment of the genome [12, 13]. In contrast, with a ploidy comparison, many fewer relative changes were found to occur [14–16]. Later, the same types of effects were found on the mRNA level in a survey of several model genes in maize for many regions of the genome varied in one, two, and three doses in embryo tissue and accompanied by examination of two, three, and four doses in the otherwise triploid endosperm [17]. Again, the more predominant effect in trans was an inverse effect. The impact of aneuploidy was greater in the diploid embryo than in the triploid endosperm— consistent with the degree of stoichiometric change being critical. Subsequently, it was found that the trans-acting positive and negative dosage effects and the coincident dosage compensation could be reduced to the action of single genes [18, 19]. Such studies were conducted in Drosophila due to the ease with which subtle positive and negative modulations could be identified. It was found that the reporter phenotype (the modulation of the leaky white-apricot eye color) could be affected in a dosage-sensitive manner by many genes. Those that have been identified at the molecular level are mainly transcription factors, signal transduction components, and chromatin proteins [19]. 
Furthermore, studies in yeast and humans have indicated that dosage-sensitive haplo-insufficient genes are preferentially those that are involved in macromolecular complexes [20–22].

1.3 Dosage Involvement in Quantitative Traits

Indeed, in plants, individual genes responsible for quantitative traits were found to be dosage-sensitive and are largely transcription factors and signaling components as well [23–28]. The parallels between aneuploid effects and quantitative trait genetics have been pointed out: namely, any one trait can be affected by multiple aneuploidies on the one hand and by multiple dosage-sensitive quantitative trait loci on the other [5, 17]. Given the parallels, it seems likely that they have a common basis.

1.4 Evolutionary Genomics

Evolutionary genomics has also provided data in support of the gene balance hypothesis. In most evolutionary lineages, and especially in the plant kingdom, there have been many cycles of whole-genome duplication followed by gene losses back to a near-diploid level [29–31]. Certain classes of genes are retained as duplicates for


longer periods of time than others [29, 32–38], typically those involved with macromolecular complexes such as transcription factors and signal transduction components. It seems apparent that deletion of one member of a complex has a detrimental effect on fitness and is selected against. Eventually, however, drift and selection might evolve new balances that are not detrimental with further erosion of the duplicates to singleton status. Yet, duplicates maintained for long evolutionary periods might evolve new (neofunctionalization) or partitioned functions (subfunctionalization) that hold them in duplicate in perpetuity [33]. Indeed, the retention of regulatory functions in duplicate due to balance considerations might foster the divergence of functions. A generally complementary pattern is found for small-scale duplications. In this case, there is an underrepresentation of genes involved with macromolecular complexes [36, 38]. A duplication of a member of a macromolecular complex would have a negative fitness and be selected against, and a study across the plant kingdom showed this relationship is a generalized occurrence [38]. The different fate of duplicate genes based on the mode of duplication indicates the involvement of genomic balance in their evolutionary fate.

1.5 Global Dosage Effects in Plants

A comprehensive study of global gene expression across the genome utilizing all five trisomies and three ploidy levels in Arabidopsis thaliana illustrated the parameters of genomic balance on gene expression [39, 40]. The approach of the study was to perform gene expression analysis using RNA-seq with biological replicates for each trisomy and for diploids, triploids, and tetraploids. The biological replicates were averaged and ratios were generated for each expressed gene between the experimental genotype and the normal diploid. These were plotted in ratio distributions with a ratio of 1.00 representing no change between the experimental and the control diploid. In the ploidy series, expression was relatively similar with some outliers from 2x to 4x. It is known that the transcriptome and cell size in Arabidopsis increases with ploidy [16, 41–43] but the relative expression among genes remains somewhat similar, albeit with some differences also observed. In contrast, the distributions of gene expression ratios between trisomies and diploids exhibit greater variance: both increases and decreases were found to occur, but, as with individual gene determinations mentioned above, the overall effect is reduced expression in the range of the inverse level or below [40]. With the trisomies, the genes were partitioned into those located on the trisomic chromosome, which were referred to as being in cis, and those encoded elsewhere on the genome, which were referred to as being in trans. Genes in cis could theoretically exhibit a gene dosage effect approaching a 1.5 ratio, which would


indicate a proportional gene dosage effect. However, many genes were not expressed at this level and were in fact closer to the diploid level. These were concluded to exhibit dosage compensation. When more genes showed an inverse effect in trans, more exhibited dosage compensation in cis, suggesting a relationship between the two behaviors as found at the individual gene level described above. DNA methylation was examined in each genotype. The greatest effect involved only one trisomy, chromosome 4, which had an increase in CHH methylation in gene bodies. There was not a great shift in methylation in the various other trisomies or the ploidy series indicating that DNA methylation is not mechanistically involved with the modulations found in aneuploidy but rather that aneuploidy could trigger some differences in methylation. When different functional groups of genes were examined, there were distinct patterns of response in the trisomies [40]. Genes displaying gene body methylation, which are typically thought to perform housekeeping functions, were less modulated than all genes. On the other extreme, genes encoding transcription factors, organelle-targeted proteins, and other categories were more strongly modulated with inverse effects routinely occurring. On the other hand, genes encoding the structural proteins of the ribosome had a predominantly positive modulation by trisomy. When transcription factors and their predicted targets were compared to their matched behavior in trisomies, there was considerable discordance, although there was concordance in about half of the comparisons. In other words, the expression between a transcription factor and its predicted targets is often opposite in the aneuploidy data to that found in co-expression network comparisons. Aneuploidy acts like a dominant negative on the processes of gene expression to a certain degree. Very few network connections showed differences in the ploidy comparisons. 
These comparisons illustrate in a different manner how genomic imbalance can alter global gene expression. These effects are likely a contributor to phenotypic effects [16, 41, 44–46].
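The ratios discussed in this section follow from simple copy-number arithmetic. As a hedged illustration (the function name and dictionary keys are hypothetical), the Python sketch below computes the idealized experimental-to-diploid ratios for a chromosome present in a given number of copies: a proportional dosage effect, an inverse effect, and the compensated level that results when both act on the same gene in cis.

```python
from fractions import Fraction


def expected_ratios(varied_copies, normal_copies=2):
    """Idealized expression ratios relative to the normal diploid.

    dosage_effect: expression proportional to copy number (genes in cis).
    inverse_effect: expression inversely proportional to the dosage of the
        varied chromosome (typically observed on genes in trans).
    compensation: dosage effect multiplied by inverse effect on the same gene.
    """
    dosage = Fraction(varied_copies, normal_copies)
    inverse = 1 / dosage
    return {
        "dosage_effect": float(dosage),
        "inverse_effect": float(inverse),
        "compensation": float(dosage * inverse),
    }


if __name__ == "__main__":
    for name, value in expected_ratios(3).items():  # trisomy: 3 copies vs 2
        print(f"{name}: {value:.2f}")
```

For a trisomy this gives 1.50, 0.67, and 1.00, respectively. Observed distributions are expected to be spread around such values rather than to match them exactly.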

2 Generating Ratio Distributions and Scatter Plots to Analyze Dosage Effects

Because of the potential for global modulation in one direction from the control in studies of dosage effects, the ratio distribution approach has some advantages over differential expression studies, which could be skewed by global changes. Differential expression studies rely on significant differences in expression between the experimental and control samples. For many applications, this is robust but when global effects occur that are close to the control values, differential expression determinations are limited by the number of biological replicates that are feasible, experimentally and cost-wise. On the other hand, a ratio distribution analysis


reveals the whole landscape especially for those modulations that are subtle and close in value to the control. All genes are used as data points in the distributions when comparing experimental and controls, thus allowing statistical analysis of subtle effects by using distributions. Kolmogorov–Smirnov and Bartlett's statistical tests can compare distributions for differences and the spread of variation in the distributions, respectively [40]. Scatter plots can be used to determine significance of differential gene expression using Empirical Analysis of Digital Gene Expression Data in R (edgeR) as a complement to the ratio distributions [47, 48]. Please see software implementation for source code of ratio distribution and scatter plots.

2.1 Ratio Distribution Plots

RNA-seq experiments are usually limited by material and cost. Thus, trends of expression, which involve subtle changes between experimental material and the control, can be difficult to detect. A ratio distribution analysis allows these effects to be visualized with the added advantage that the entire landscape of effects can be revealed in one type of analysis; this type of analysis is particularly valuable for dosage studies. Here, a protocol for producing ratio distributions is described. 1. Normalize read counts. For instance, RPKM (Reads Per Kilobase Million) or FPKM (Fragments Per Kilobase Million) normalization could be performed depending on whether the sequencing is single-end or paired-end. If External RNA Controls Consortium (ERCC) spike-in mix is used in the RNA-seq, data could be normalized based on the ERCC read counts. 2. Remove lowly expressed genes (recommended). Lowly expressed genes would generate ratios of extreme values, forming peaks of outliers. Also, they might represent random alignments rather than being a reflection of real expression. If lowly expressed genes are not filtered from the dataset, removal of non-expressed genes (0 read counts when combining experimental and control) is required. 3. Compute the mean of normalized counts of all biological replicates within each experimental group and the mean of replicates in the control group. Replace the mean value of the control to 10E-6 if it equals zero. Calculate the ratio of each gene by dividing the mean of experimental counts by the mean of control counts. 4. Generate a histogram using ggplot2 package in the R program [49]. Ratios of experimental values to control values are plotted on the X-axis with a bin width of 0.05. Ratios greater than or equal to 6 are merged into one bin. Y-axis denotes the number of genes per bin. An example of a ratio distribution plot of a random RNA-Seq dataset is given below (Fig. 2).

Plant Dosage Effects


Fig. 2 Ratio distribution of gene expression of a random RNA-Seq dataset. For each expressed gene, a ratio of the averaged read counts in the respective experimental genotype was made over the read counts in the control. The X-axis notes the value for each bin of 0.05, and the Y-axis notes the number of genes per bin. Auxiliary lines at 0.67 (green), 1.00 (black), and 1.50 (pink) serve as an example of ratios representing different types of dosage effect. A ratio of 1.00 represents no change (dosage compensation) in the experimental genotype versus the control. Ratios of 0.67 and 1.50 represent an inverse effect and a dosage effect, respectively, in a hypothetical trisomic scenario.
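Steps 1–4 of the ratio distribution protocol can be sketched in a few lines of Python. This is only an illustration and not the authors' R implementation: the function names, the dictionary input layout, and the `min_mean` expression threshold are assumptions made here, whereas the zero-control replacement value, the 0.05 bin width, and the merged bin for ratios of 6 or more follow the protocol text.

```python
from statistics import mean


def ratio_table(experimental, control, min_mean=1.0, zero_floor=10e-6):
    """Return gene -> ratio of mean experimental to mean control counts.

    `experimental` and `control` map gene IDs to lists of normalized counts,
    one value per biological replicate (assumed input layout).
    """
    ratios = {}
    for gene, exp_values in experimental.items():
        exp_mean = mean(exp_values)
        ctl_mean = mean(control[gene])
        # Step 2: drop lowly expressed genes (threshold is an assumed placeholder).
        if exp_mean + ctl_mean < min_mean:
            continue
        # Step 3: replace a zero control mean to avoid division by zero.
        if ctl_mean == 0:
            ctl_mean = zero_floor
        ratios[gene] = exp_mean / ctl_mean
    return ratios


def bin_ratios(ratios, bin_width=0.05, cap=6.0):
    """Step 4: count genes per bin; all ratios >= cap share the last bin."""
    n_bins = round(cap / bin_width) + 1
    counts = [0] * n_bins
    for value in ratios.values():
        # The small epsilon guards against floating-point error at bin edges.
        index = min(int(value / bin_width + 1e-9), n_bins - 1)
        counts[index] += 1
    return counts
```

With these settings, a ratio of 1.50 falls in bin 30 and a ratio of 0.67 in bin 13, which correspond to the auxiliary lines drawn in Fig. 2.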

Fig. 3 Ratio distribution of gene expression of a random RNA-Seq dataset partitioned into cis and trans. Analysis was conducted as described in Fig. 2


5. Under certain circumstances (e.g., aneuploidy), it may be necessary to plot cis genes (those on the varied chromosome) and trans genes (the remainder of the genome) separately. In this scenario, a cis gene list is required as input. The figure will be subdivided vertically into two histograms. Figure 3 illustrates an example of a dataset with genes showing dosage compensation and dosage effect in cis with an inverse effect in trans.

2.2 Scatter Plots

1. Perform differential gene expression analysis using edgeR [47, 48]. With a raw read count table as input, edgeR generates a table composed of the log2 fold change between the experimental group and the control group (logFC), the average log2 counts per million (logCPM), the p-value, the likelihood ratio statistic (LR), and the adjusted p-value (FDR).
2. Compute the mean of normalized counts. The normalized count table generated in Subheading 2.1, step 2 is required for calculating the mean of normalized counts of replicates of the experimental and control groups.
3. Generate a scatter plot using the ggplot2 package in the R program [49]. The logFC is plotted on the X-axis, while the sum of normalized counts is plotted on the Y-axis. Data points with an adjusted p-value 0 were depicted in red, while points with an adjusted p-value [...]

[...] C
Chr1 964 965 C>T
Chr1 1112 1113 A>G
Chr1 1428 1429 C>A

• A reference genome for alignment (FASTA format)

>Chr1
AGATTTTCACAACCACCACACAATTTATAACATTTAACAACTCATCATTTCAAGATAACAAGGAATTTAAA
TATGAGAAATAACTGAAAAATATTTGTGGTCATCATAAATGAAATTTGTACATTTTAGCTCATTTAAGTTGT


Colette L. Picard and Mary Gehring

• A set of gene annotations (GTF format)

Chr1 SRC 5UTR 959 1201 . + . transcript_id "AT1G06530.1"; gene_id "AT1G06530"
Chr1 SRC exon 959 2379 . + . transcript_id "AT1G06530.1"; gene_id "AT1G06530"
Chr1 SRC start_codon 1202 1204 . + . transcript_id "AT1G06530.1"; gene_id "AT1G06530"

If starting from allele-specific count data, four files are required: A-derived read counts in AxB, B-derived read counts in AxB, A-derived read counts in BxA, and B-derived read counts in BxA. Each should be in the following format, which is the standard output format for htseq-count [45]:

AT1G04483 0
AT1G06530 24
AT1G06540 11
AT1G06550 4
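To make the role of the four count files concrete, the following Python sketch reads htseq-count style lines and lists the genes with enough allele-specific reads in both directions of the cross. It is an illustration only, not part of the pipeline: the function names are hypothetical, and the default of 10 counts simply mirrors the minimum used in the worked examples of this chapter.

```python
def read_counts(lines):
    """Parse 'gene<whitespace>count' lines into a dict.

    Rows starting with '__' (htseq-count's summary counters such as
    __no_feature) are skipped.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("__"):
            continue
        gene, value = line.split()
        counts[gene] = int(value)
    return counts


def evaluable_genes(axb_a, axb_b, bxa_a, bxa_b, min_counts=10):
    """Genes whose total allele-specific counts reach min_counts in both crosses."""
    genes = set(axb_a) & set(axb_b) & set(bxa_a) & set(bxa_b)
    return sorted(
        gene
        for gene in genes
        if axb_a[gene] + axb_b[gene] >= min_counts
        and bxa_a[gene] + bxa_b[gene] >= min_counts
    )
```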

3.3 Genome Preparation

To minimize mapping bias in favor of A or B (see Subheading 2.3), we suggest mapping to a “metagenome” of A and B instead of the reference genome, although this is optional. We provide a script, make_metagenome.py, to easily build a metagenome from a reference genome, a SNP file, and (optionally) a GTF file. An example command is:

make_metagenome.py snps.bed genome.fa metagenome --GTF annotations.gtf

This will create three output files: the metagenome itself in FASTA format, a meta-chromosome file that maps the original chromosome names to the new ones in the metagenome, and (if a GTF file is provided) a meta-GTF file that is compatible with the metagenome. This pipeline uses STAR [46] to align RNA-seq reads to the provided genome. In order to be used with STAR, the (meta)genome must also be indexed by running STAR in genomeGenerate mode. This example command creates a new folder STAR_index to store the STAR-indexed genome files and builds the index using the metagenome FASTA and GTF files:

mkdir "STAR_index"
STAR --runMode genomeGenerate --outFileNamePrefix "STAR_index" --genomeDir "STAR_index" --genomeFastaFiles "metagenome.fa" --sjdbGTFfile "metagenome_metagtf.gtf" --sjdbOverhang 30
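The idea behind a metagenome can be illustrated with a short Python sketch: each reference chromosome is kept as the A version, and a second copy carrying the B alleles at every SNP is added. This is a conceptual sketch only and should not be read as the internals of make_metagenome.py; the `_A`/`_B` naming scheme and the function names are assumptions, and SNP coordinates are assumed to be 0-based BED starts.

```python
def parse_snp(line):
    """Split a line such as 'Chr1 964 965 C>T' into its parts."""
    chrom, start, _end, change = line.split()
    ref_base, alt_base = change.split(">")
    return chrom, int(start), ref_base, alt_base


def build_metagenome(reference, snp_lines, suffix_a="_A", suffix_b="_B"):
    """Return a dict holding an A copy and a SNP-substituted B copy per chromosome.

    `reference` maps chromosome name -> sequence string.
    """
    alt = {chrom: list(seq) for chrom, seq in reference.items()}
    for line in snp_lines:
        chrom, pos, ref_base, alt_base = parse_snp(line)
        # Guard against coordinate mix-ups (e.g. 1-based positions).
        if alt[chrom][pos] != ref_base:
            raise ValueError(f"reference base mismatch at {chrom}:{pos}")
        alt[chrom][pos] = alt_base
    meta = {}
    for chrom, seq in reference.items():
        meta[chrom + suffix_a] = seq
        meta[chrom + suffix_b] = "".join(alt[chrom])
    return meta
```

Because reads carrying either allele then have a perfectly matching target, neither parent is penalized by SNP mismatches during alignment.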

Detection and Conservation of Imprinted Expression

3.4 Initial Read Quality Filtering and Alignment


The initial quality filtering, adapter trimming, and alignment of reads to the genome are performed using rna_seq_map.sh. This script evaluates read quality using FastQC [47], trims adapter sequences from reads and filters low-quality reads using trim_galore [48], and aligns reads to a provided genome using STAR [46]. This script is intended to be broadly useful in RNA-seq data analysis and has a number of tunable parameters and options. The reads from AxB and BxA must be mapped separately to the STAR-indexed genome. An example pair of commands is:

rna_seq_map.sh -1 AxB.fq -g STAR_index -o AxB_mapped -n AxB
rna_seq_map.sh -1 BxA.fq -g STAR_index -o BxA_mapped -n BxA

The results are output to new directories AxB_mapped and BxA_mapped. Alignments in BAM format are output to AxB_mapped/STAR/AxB_unique_alignments.bam and BxA_mapped/STAR/BxA_unique_alignments.bam. Log files AxB_mapped/AxB_log.txt and BxA_mapped/BxA_log.txt contain alignment statistics and other useful information. Additional information about rna_seq_map.sh, including usage information and a list of tunable parameters and options, can be obtained by running the script without any inputs:

rna_seq_map.sh

In order to use rna_seq_map.sh with the metagenome, the STAR-indexed metagenome must be provided with the -g option, and the meta-chromosome file generated by make_metagenome.py must be provided with the -C option. Users may also want to modify the default maximum number of mismatches (provided as a fraction of read length, default 0.05) and minimum and maximum intron lengths (default 70 bp–5000 bp) using the options -N, -i, and -I, respectively. The adapter sequence used to trim adapters from reads is AGATCGGAAGAG by default, which is suitable for many Illumina applications. The adapter sequence can be changed using the -a option.

3.5 Identifying Imprinted Genes from Alignment Data

The script call_imprinting.sh performs the majority of the analysis. Starting from either RNA-seq alignments (obtained from rna_seq_map.sh) or allele-specific counts, imprinting is assessed using a streamlined pipeline with a number of tunable parameters to allow modification of the various cutoffs. If starting from RNA-seq alignments, the alignments are first compared to the provided SNPs and separated into A-derived, B-derived, or neither. The A-derived and B-derived alignments are used to get allele-specific per-gene counts using htseq-count [45]. These counts are then used to assess imprinting, using the criteria discussed in Subheading 1.4. To use alignments obtained from rna_seq_map.sh with call_imprinting.sh, an example command is:

AxB_bam="AxB_mapped/STAR/AxB_unique_alignments.bam"
BxA_bam="BxA_mapped/STAR/BxA_unique_alignments.bam"
call_imprinting.sh -o imprinting_analysis -1 $AxB_bam -2 $BxA_bam -S snps.bed -G annotations.gtf
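The separation of alignments into A-derived, B-derived, or neither can be pictured with the following Python sketch. It is a simplified illustration rather than the logic used inside call_imprinting.sh: reads are assumed to be ungapped, positions are assumed to be 0-based, and the rule that conflicting or uninformative reads are assigned to "neither" is an assumption made here.

```python
def classify_read(chrom, start, sequence, snps):
    """Assign an ungapped read to parent 'A', parent 'B', or 'neither'.

    `snps` maps (chromosome, 0-based position) -> (A allele, B allele).
    """
    a_hits = 0
    b_hits = 0
    for offset, base in enumerate(sequence):
        alleles = snps.get((chrom, start + offset))
        if alleles is None:
            continue  # position is not a known SNP
        if base == alleles[0]:
            a_hits += 1
        elif base == alleles[1]:
            b_hits += 1
    if a_hits and not b_hits:
        return "A"
    if b_hits and not a_hits:
        return "B"
    # No overlapping SNP, or conflicting evidence within one read.
    return "neither"
```

Only reads overlapping at least one SNP are informative, which is why allele-specific counts are typically much lower than total read counts for the same gene.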

Alternatively, if using pre-calculated allele-specific count data, use:

call_imprinting.sh -o imprinting_analysis -x AxB_A_counts.txt -y AxB_B_counts.txt -X BxA_A_counts.txt -Y BxA_B_counts.txt

These example commands will produce a new directory called imprinting_analysis. In the imprinting_analysis/imprinting subdirectory, MEGs and PEGs are listed in files ending with *_MEGs.txt and *_PEGs.txt, respectively. Results of the imprinting analysis for all genes, including the “imprinting status” (Fig. 5), can be found in *_imprinting_filtered_all.txt. Finally, scatter plots of the log2(m:p ratio) in AxB vs. BxA are provided, with MEGs highlighted in red, PEGs in blue, and non-imprinted genes in gray. Additional information about call_imprinting.sh, including usage information and a list of tunable parameters and options, can be obtained by running the script without any inputs:

call_imprinting.sh

In particular, the p-value, IF, and CEF thresholds can be adjusted using options -p, -I, and -C, respectively. The minimum percent of maternal reads required for a gene to be called a MEG can be adjusted with -M, and the maximum percent of maternal reads allowed for a gene to be called a PEG can be adjusted with -P. The minimum number of total allele-specific counts required in both directions of the cross to evaluate imprinting can also be adjusted with -c. We generally recommend the following parameters:

If expected maternal:paternal ratio = 1:
-R 1 -c 10 -p 0.01 -I 2 -C 10 -M 70 -P 30

If expected maternal:paternal ratio = 2 (e.g., most endosperm):
-R 2 -c 10 -p 0.01 -I 2 -C 10 -M 85 -P 50

All filters can be disabled by setting the value of the filter such that it will always be true (e.g., setting the minimum percent maternal reads required for a MEG to 0). Note that if data are from stranded RNA-seq libraries, the strandedness of both AxB and BxA must be indicated using the appropriate combination of -w, -W, -v, and -V. By default, both are assumed to be unstranded. Strandedness is taken into account when counting reads with htseq-count.

3.6 Comparing Imprinting Between Species

To compare imprinting between two species, referred to here as species X and species Y, the script get_homologs.py can first be used to obtain all known X to Y homolog pairs from Phytozome [31]. An example command is:

get_homologs.py "species X" "species Y" X_genelist.txt X_to_Y_homologs.txt

This will output the file X_to_Y_homologs.txt, which contains the X genes in the first column and their Y homologs in the second column. The following is an example output file from A. thaliana and Z. mays:

A. thaliana    Z. mays
AT5G23110    GRMZM2G084819
AT4G35800    GRMZM2G044306
AT5G65930    GRMZM2G070273
AT5G65930    GRMZM2G423861
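Because one gene can have several homologs (AT5G65930 appears twice in the example), the homolog table is naturally a one-to-many mapping. A minimal Python sketch for loading it is shown here; the function name is hypothetical, and the file is assumed to be tab-delimited with a single header row, which the example suggests but does not state.

```python
from collections import defaultdict


def read_homologs(lines, has_header=True):
    """Build a dict of X gene -> list of Y homologs from tab-delimited rows."""
    homologs = defaultdict(list)
    rows = iter(lines)
    if has_header:
        next(rows, None)  # discard the species-name header
    for line in rows:
        line = line.rstrip("\n")
        if not line:
            continue
        x_gene, y_gene = line.split("\t")[:2]
        homologs[x_gene].append(y_gene)
    return dict(homologs)
```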

Using this file, the imprinting status of the homologs of the MEGs and PEGs identified in each species can be summarized using comp_imprinting.py, assuming data from both X and Y have been analyzed using call_imprinting.sh. The following is an example command where the imprinting status of the Y homologs of X imprinted genes is assessed:

comp_imprinting.py X_status_all.txt Y_status_all.txt X_to_Y_homologs.txt outprefix

This analysis is directional: the imprinting status of imprinted genes in X can be queried in Y (as above), or the status of imprinted genes in Y can be queried in X. Additionally, genes in the same pathway, complex, or other interaction group can be provided with the --pathway option. The script will then perform an additional analysis to identify gene groups where members are imprinted or parentally biased in both X and Y, which may suggest group- or pathway-level conservation of imprinted expression.
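The directional nature of the comparison can be sketched in Python as follows. This is not the code of comp_imprinting.py: the function names, the status labels "MEG" and "PEG", and the dictionary inputs are assumptions chosen for illustration. Querying in the opposite direction only requires inverting the homolog map and swapping the two status tables.

```python
def invert(homologs):
    """Turn an X -> [Y, ...] map into a Y -> [X, ...] map for the reverse query."""
    reverse = {}
    for x_gene, y_genes in homologs.items():
        for y_gene in y_genes:
            reverse.setdefault(y_gene, []).append(x_gene)
    return reverse


def status_of_homologs(query_status, target_status, homologs,
                       imprinted=("MEG", "PEG")):
    """For each imprinted query gene, list (homolog, status in target species).

    An empty list means no homolog was found; a status of None means the
    homolog exists but has no data in the target species.
    """
    report = {}
    for gene, status in query_status.items():
        if status not in imprinted:
            continue
        report[gene] = [
            (homolog, target_status.get(homolog))
            for homolog in homologs.get(gene, [])
        ]
    return report
```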

4 Illustrative Example Using Real Data

4.1 Identifying Imprinted Genes in an Arabidopsis thaliana Dataset Starting from Raw Sequencing Reads

Data from replicate 1 of Col x Cvi and Cvi x Col endosperm from Pignatta et al. 2014 [30] are available on the SRA [29] with run IDs SRR1039917 and SRR1039921. SNPs were obtained from Pignatta et al., supplementary file 2 [30]. The genome used is TAIR10 [49] and the genome annotations are Araport11 [50]. All commands below are run in a Linux environment.

# make new directory to store all data
mkdir workdir
cd workdir

# make Col + Cvi metagenome (Col is reference strain)
make_metagenome.py Col_Cvi_SNPs.bed TAIR10.fa Col_Cvi_meta --GTF araport11.gtf

# STAR-index metagenome
mkdir Col_Cvi_meta_STAR
STAR --runMode genomeGenerate --outFileNamePrefix "Col_Cvi_meta_STAR/log" --genomeDir "Col_Cvi_meta_STAR" --genomeFastaFiles Col_Cvi_meta.fa --sjdbGTFfile Col_Cvi_meta_metagtf.gtf --sjdbOverhang 30

# download data from SRA
mkdir rawdata
fastq-dump -F --split-files --outdir rawdata SRR1039917
fastq-dump -F --split-files --outdir rawdata SRR1039921

# map ColxCvi and CvixCol RNA-seq reads to metagenome (note: this will take a while)
rna_seq_map.sh -1 rawdata/SRR1039917_1.fastq -g Col_Cvi_meta_STAR -C Col_Cvi_meta_metachrom.txt -o ColxCvi -A Col -B Cvi -n ColxCvi -a GATCGGAAGAGCGGTTCAG -3
rna_seq_map.sh -1 rawdata/SRR1039921_1.fastq -g Col_Cvi_meta_STAR -C Col_Cvi_meta_metachrom.txt -o CvixCol -A Col -B Cvi -n CvixCol -a GATCGGAAGAGCGGTTCAG -3

# call imprinting based on alignments obtained from rna_seq_map
AxB_bam="ColxCvi/STAR/ColxCvi_unique_alignments.bam"
BxA_bam="CvixCol/STAR/CvixCol_unique_alignments.bam"
call_imprinting.sh -o Athaliana_imprinting -1 $AxB_bam -2 $BxA_bam -S Col_Cvi_SNPs.bed -G araport11.gtf -A Col -B Cvi -n Athaliana -R 2 -I 2 -C 10 -M 85 -P 50 -c 10


Summary of initial imprinting results from log file:

Summary of results:
- 13205 genes had allelic counts >= 10 in both crosses and could be evaluated
- Of these 13205:
  - 8494 (64.3%) failed the p-value cutoff (adjusted imprinting p-value >= 0.01)
  - 3335 (25.3%) passed the p-value cutoff but failed the IF cutoff (IF < 2)
  - 67 (0.5%) passed the p-value and IF cutoffs but failed the CEF cutoff (CEF >= 10)
  - 835 (6.3%) passed the p-value and IF cutoffs but failed the pmat cutoff (pmat < 85 for MEGs, or pmat > 50 for PEGs)
  - 474 (3.6%) passed all filters for imprinting
----------------------
Number of MEGs passing all filters: 415
Number of PEGs passing all filters: 59
----------------------
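The categories in this log reflect a cascade of filters applied in a fixed order. The Python sketch below mimics that order for a single gene. It is a hypothetical illustration, not the code of call_imprinting.sh: the adjusted p-value, IF, and CEF are taken as precomputed inputs (their definitions are given in Subheading 1.4), the returned labels are invented here, and the rule that a gene is treated as a candidate MEG when its percent maternal reads (pmat) exceeds the expected value is an assumption.

```python
def expected_pmat(mat_to_pat_ratio):
    """Expected percent maternal reads for a given maternal:paternal ratio (-R)."""
    return 100.0 * mat_to_pat_ratio / (mat_to_pat_ratio + 1.0)


def imprinting_status(pvalue, imprinting_factor, cef, pmat, *, ratio=2.0,
                      p_cut=0.01, if_cut=2.0, cef_cut=10.0,
                      meg_min=85.0, peg_max=50.0):
    """Apply the p-value, IF, CEF, and pmat cutoffs in the order of the log."""
    if pvalue >= p_cut:
        return "fail_pvalue"
    if imprinting_factor < if_cut:
        return "fail_IF"
    if cef >= cef_cut:
        return "fail_CEF"
    # Assumed rule: direction of bias is judged against the expected pmat.
    if pmat > expected_pmat(ratio):
        return "MEG" if pmat >= meg_min else "fail_pmat"
    return "PEG" if pmat <= peg_max else "fail_pmat"
```

With the endosperm defaults (ratio 2), the expected pmat is about 66.7, so a gene at 75% maternal reads is maternally biased but does not reach the 85% needed to be called a MEG.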

The results indicate a much larger number of MEGs than PEGs. As noted in Subheading 2.4, contamination from the seed coat, which is a maternal tissue, is common in species with small seeds and can cause false-positive MEGs. Therefore, Pignatta et al. (2014) used data from Belmonte et al. 2013 [51], which profiled gene expression in the different seed compartments of the accession Ws-0, to identify genes highly expressed in seed coat relative to endosperm. To reduce false-positive MEGs, 934 genes with expression that was more than twice as high in seed coat relative to endosperm were censored from the analysis (available as part of Fig. 1, source data 2 [30]). The call_imprinting.sh script provides the option to censor any genes in a provided list from the analysis. Since the previous run of call_imprinting.sh generated allele-specific count files, which were saved to the subfolder counts_per_gene in the output directory, these can be used as input instead of the BAM files to greatly speed up the analysis.

# call imprinting again, using -f to censor highly expressed seed coat genes
AxB_A_counts="Athaliana_imprinting/counts_per_gene/Athaliana_ColxCvi_Col_counts.txt"
AxB_B_counts="Athaliana_imprinting/counts_per_gene/Athaliana_ColxCvi_Cvi_counts.txt"
BxA_A_counts="Athaliana_imprinting/counts_per_gene/Athaliana_CvixCol_Col_counts.txt"
BxA_B_counts="Athaliana_imprinting/counts_per_gene/Athaliana_CvixCol_Cvi_counts.txt"
call_imprinting.sh -o Athaliana_imprinting_2 -x $AxB_A_counts -y $AxB_B_counts -X $BxA_A_counts -Y $BxA_B_counts -A Col -B Cvi -n Athaliana -R 2 -I 2 -C 10 -M 85 -P 50 -c 10 -f thaliana_seedcoat_filter.txt

Summary of final results from log file:

Summary of results:
- 13205 genes had allelic counts >= 10 in both crosses and could be evaluated
- Of these, 1241 (9.4%) were in the --filter file and censored
- Of the remaining 11964 loci evaluated:
  - 8064 (67.4%) failed the p-value cutoff (adjusted imprinting p-value >= 0.01)
  - 3022 (25.3%) passed the p-value cutoff but failed the IF cutoff (IF < 2)
  - 52 (0.4%) passed the p-value and IF cutoffs but failed the CEF cutoff (CEF >= 10)
  - 636 (5.3%) passed the p-value and IF cutoffs but failed the pmat cutoff (pmat < 85 for MEGs, or pmat > 50 for PEGs)
  - 190 (1.6%) passed all filters for imprinting
----------------------
Number of MEGs passing all filters: 131
Number of PEGs passing all filters: 59
----------------------

As expected, censoring genes more highly expressed in the seed coat than the endosperm strongly reduced the number of MEGs, suggesting that many of the originally identified MEGs were false positives.

4.2 Identifying Imprinted Genes in a Zea mays Dataset Starting from Count Data

Count data were downloaded from the Plant Imprinting Database (plantimprinting.wi.mit.edu) for all maize endosperm data from Waters et al. 2011 [17]. The file was saved to workdir/B73_Mo17.csv. Inbred lines B73 (“B”) and Mo17 (“M”) were compared.

# convert csv file to 4 separate counts files
cut -f1,8 -d',' B73_Mo17.csv | tr ',' '\t' | tail -n+2 > BxM_B_counts.txt
cut -f1,9 -d',' B73_Mo17.csv | tr ',' '\t' | tail -n+2 > BxM_M_counts.txt
cut -f1,10 -d',' B73_Mo17.csv | tr ',' '\t' | tail -n+2 > MxB_B_counts.txt
cut -f1,11 -d',' B73_Mo17.csv | tr ',' '\t' | tail -n+2 > MxB_M_counts.txt

# run call_imprinting.sh
call_imprinting.sh -o Zmays_imprinting -x BxM_B_counts.txt -y BxM_M_counts.txt -X MxB_B_counts.txt -Y MxB_M_counts.txt -A B73 -B Mo17 -n Zmays -R 2 -I 2 -C 10 -M 85 -P 50 -c 10
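For readers less familiar with cut, tr, and tail, the conversion of the CSV file into four count files can also be expressed in Python. The sketch below is an equivalent under stated assumptions, not part of the pipeline: the function name is hypothetical, and the column numbers 8 to 11 are taken from the cut commands above rather than from any documentation of the database export.

```python
import csv
import io

# 1-based column numbers, matching the `cut -f1,N` commands.
COLUMNS = {
    "BxM_B_counts.txt": 8,
    "BxM_M_counts.txt": 9,
    "MxB_B_counts.txt": 10,
    "MxB_M_counts.txt": 11,
}


def split_counts(csv_text, columns=COLUMNS):
    """Return {output name: ['gene<TAB>count', ...]} for each requested column."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row, like `tail -n+2`
    rows = [row for row in reader if row]
    return {
        name: [f"{row[0]}\t{row[col - 1]}" for row in rows]
        for name, col in columns.items()
    }
```

Each list can then be written to the file named by its key before running call_imprinting.sh.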

Summary of results from log file:

Summary of results:
- 12477 had allelic counts >= 10 in both crosses and could be evaluated
- Of these 12477:
  - 10397 (83.3%) failed the p-value cutoff (adjusted imprinting p-value >= 0.01)
  - 1481 (11.9%) passed the p-value cutoff but failed the IF cutoff (IF < 2)
  - 50 (0.4%) passed the p-value and IF cutoffs but failed the CEF cutoff (CEF >= 10)
  - 237 (1.9%) passed the p-value and IF cutoffs but failed the pmat cutoff (pmat < 85 for MEGs, or pmat > 50 for PEGs)
  - 312 (2.5%) passed all filters for imprinting
----------------------
Number of MEGs passing all filters: 85
Number of PEGs passing all filters: 227
----------------------

Whereas seed coat contamination likely artificially inflated the number of MEGs initially identified in the A. thaliana example above, this type of contamination is less of a concern in maize, which has much larger seeds that are easier to dissect.

4.3 Comparing Imprinting Between A. thaliana and Z. mays

We can now compare the imprinting results obtained from A. thaliana and Z. mays in Subheadings 4.1 and 4.2. The results of the imprinting analysis for A. thaliana (after censoring seed coat genes) were saved to Athaliana_imprinting_2/imprinting/Athaliana_imprinting_filtered_all.txt and those for Z. mays to Zmays_imprinting/imprinting/Zmays_imprinting_filtered_all.txt. These paths are stored in the variables Athal_data and Zmays_data for the following analysis:

Athal_data="Athaliana_imprinting_2/imprinting/Athaliana_imprinting_filtered_all.txt"
Zmays_data="Zmays_imprinting/imprinting/Zmays_imprinting_filtered_all.txt"


First, homologs between A. thaliana and Z. mays must be obtained. The list of genes assayed in A. thaliana can be obtained from the output file above:

cut -f1 "$Athal_data" | tail -n+2 > Athaliana_genes.txt

The script get_homologs.py can then be used to get a list of A. thaliana to Z. mays homologs, based on the list of A. thaliana genes:

get_homologs.py "A. thaliana" "Z. mays" Athaliana_genes.txt Athal_Zmays_homologs.txt

First, we demonstrate the naïve approach, taking the A. thaliana PEGs and directly assessing the overlap between them and the set of Z. mays imprinted genes:

Athal_PEGs="Athaliana_imprinting_2/imprinting/Athaliana_imprinting_filtered_PEGs.txt"
Zmays_PEGs="Zmays_imprinting/imprinting/Zmays_imprinting_filtered_PEGs.txt"
# get Z. mays homologs of A. thaliana PEGs
join -j 1 -o 1.1,2.2 -t$'\t' [...] Athal_status.txt
cut -f1,7,25 $Zmays_data | tail -n+2 > Zmays_status.txt
# get status of A. thaliana imprinted genes in Z. mays
comp_imprinting.py Athal_status.txt Zmays_status.txt Athal_Zmays_homologs.txt Athal_vs_Zmays --species1 "A. thaliana" --species2 "Z. mays"

The script outputs a summary (results shown below for A. thaliana PEGs), as well as a text file of the status of all the A. thaliana imprinted genes in Z. mays and (if matplotlib is installed) pie charts summarizing the results (Fig. 7b):

Summary of results for MEGs and PEGs in A. thaliana:
----------------------
A total of 59 PEGs were detected in A. thaliana
- 30 (50.8%) PEGs had no homolog in Z. mays
- 9 (15.3%) PEGs had homolog in Z. mays, but no data was available for any homolog
- 6 (10.2%) PEGs had homolog in Z. mays, but too few allele-specific reads were available
- 0 (0.0%) PEGs had homolog in Z. mays, but all homolog(s) with data were censored
The remaining 14 PEGs (22.6%) had at least one homolog that could be evaluated in Z. mays
Of these:
- 1 (7.1%) were also PEGs in Z. mays
- 0 (0.0%) were substantially paternally biased in Z. mays but failed the % mat. cutoff
- 0 (0.0%) were substantially paternally biased in Z. mays but also exhibited strain bias
- 1 (7.1%) were significantly but not substantially paternally biased in Z. mays
- 12 (85.7%) were not significantly parentally biased in Z. mays
- 0 (0.0%) were significantly but not substantially maternally biased in Z. mays
- 0 (0.0%) were substantially maternally biased in Z. mays but also exhibited strain bias
- 0 (0.0%) were substantially maternally biased in Z. mays but failed the % mat. cutoff
- 0 (0.0%) were MEGs in Z. mays

Based on this analysis, the majority of the A. thaliana PEGs had no homolog in Z. mays, and others could not be evaluated in Z. mays because of lack of data (Fig. 7b). The naïve comparison suggested that only 1/62 (1.6%) of A. thaliana PEGs had conserved imprinting in Z. mays. This second analysis, however, suggests that a more accurate assessment is that 1/14 (7.1%) of A. thaliana PEGs had conserved imprinting in Z. mays and that 2/14 (14.3%) A. thaliana PEGs had at least some paternal bias in Z. mays (Fig. 7c). As previously reported, conservation of imprinting between A. thaliana and Z. mays is low [18]. However, not all data from Pignatta et al. were used in this example (only the first replicate of Col-Cvi reciprocal crosses). Additionally, the set of annotations used by Waters et al. (Z. mays B73 AGPv2) is now out of date (current version is AGPv4), which likely contributes to the large number of genes in A. thaliana with no Z. mays homolog according to Phytozome. This, in addition to the improvements in mapping software that have been made since 2011, strongly suggests that remapping of the Waters et al. data using the full pipeline and updated reference annotations would provide an even more accurate comparison.

We also briefly demonstrate how to use the --pathway option in comp_imprinting.py to assess imprinting conservation at the level of gene groups instead of individual genes. To perform this analysis, information on which genes belong to groups or pathways of interest must be supplied by the user. We illustrate the method using genes in two groups: RdDM pathway components, which include PEGs in both A. thaliana and A. lyrata [12], and components of the PRC2 complex, several of which are MEGs in A. thaliana [14]. Gene IDs for several members of these two groups were curated by hand and saved to a file named pathway_info.txt along with the group to which each gene belongs:

AT1G15215    RdDM
AT1G63020    RdDM
AT2G40030    RdDM
AT4G11130    RdDM
AT3G43920    RdDM
AT2G27040    RdDM
AT2G33830    RdDM
AT4G15950    RdDM
AT3G20740    PRC2
AT2G35670    PRC2
AT1G02580    PRC2
GRMZM2G157820    PRC2
GRMZM5G875502    PRC2
GRMZM2G043484    PRC2

Note that pathway_info.txt can contain both A. thaliana and Z. mays IDs; any homologs of these genes in the other species will also be considered members of the same group for the purposes of the analysis. While this simplifies the analysis, not all homologs share a function, so this is a potential source of error. Using the pathway data, the analysis was repeated:

comp_imprinting.py Athal_status.txt Zmays_status.txt Athal_Zmays_homologs.txt Athal_vs_Zmays --species1 "A. thaliana" --species2 "Z. mays" --pathway pathway_info.txt

The analysis with the --pathway option also outputs the following pathway-level summary:

pathway_ID    species_1_bias    species_1_most_biased    species_2_bias    species_2_most_biased
PRC2    MEG    AT2G35670    both_MEGs_and_PEGs    GRMZM2G118205
RdDM    patbias_fail_pmat_cutoff    AT1G63020    PEG    GRMZM2G007681

The most imprinted gene of each group is reported in the summary above, along with the degree to which it was imprinted. If two or more genes were equally highly imprinted in a group, one is chosen for the summary above at random. The full result of the analysis is also saved to a separate file. While imprinting at the gene level is not well conserved between A. thaliana and Z. mays (Fig. 7), our example suggests that both the RdDM pathway and the PRC2 complex have imprinted members in both species. Different components of the RdDM pathway are also paternally expressed in A. lyrata [12], suggesting that paternal control of this pathway may be highly conserved. Interestingly, while several PRC2 components are known to be MEGs in A. thaliana, the Z. mays homologs of these genes exhibit both maternal and paternal expression bias.
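The group-level logic can be sketched in Python as follows. This is a simplified illustration rather than the algorithm of comp_imprinting.py: the function names and the status labels "MEG" and "PEG" are assumptions, parental bias short of imprinting is ignored, and homologs are added to a group exactly as described for pathway_info.txt, with the same caveat that homologs do not always share a function.

```python
IMPRINTED = {"MEG", "PEG"}


def expand_groups(groups, homologs):
    """Add homologs of listed genes to each group.

    `groups` maps gene -> group name (IDs from either species);
    `homologs` maps species X gene -> list of species Y genes.
    """
    members = {}
    for gene, group in groups.items():
        members.setdefault(group, set()).add(gene)
        members[group].update(homologs.get(gene, []))  # X gene -> Y homologs
    for x_gene, y_genes in homologs.items():
        for y_gene in y_genes:
            if y_gene in groups:  # Y gene -> X homologs
                members[groups[y_gene]].add(x_gene)
    return members


def groups_imprinted_in_both(members, x_status, y_status):
    """Return the groups with at least one imprinted member in each species."""
    shared = []
    for group, genes in sorted(members.items()):
        x_hit = any(x_status.get(gene) in IMPRINTED for gene in genes)
        y_hit = any(y_status.get(gene) in IMPRINTED for gene in genes)
        if x_hit and y_hit:
            shared.append(group)
    return shared
```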

5 Conclusion

Here we have presented a pipeline for easy, consistent analysis of imprinting from RNA-seq or count data. We have also provided tools to compare imprinting across different species and introduced the Plant Imprinting Database as a resource for researchers studying plant gene imprinting.

Acknowledgments

We would like to thank Andy Nutter-Upham and Scott McCallum for their tireless work in building the imprinting database and Daniela Pignatta for initial contributions to database design.


References

1. Pires ND, Grossniklaus U (2014) Different yet similar: evolution of imprinting in flowering plants and mammals. F1000Prime Rep 6:63
2. Gehring M, Satyaki PR (2017) Endosperm and imprinting, inextricably linked. Plant Physiol 173:143–154
3. Plasschaert RN, Bartolomei MS (2014) Genomic imprinting in development, growth, behavior and stem cells. Development 141:1805–1813
4. John RM (2017) Imprinted genes and the regulation of placental endocrine function: pregnancy and beyond. Placenta 56:86–90
5. Pignatta D, Novitzky K, Satyaki PRV, Gehring M (2018) A variably imprinted epiallele impacts seed development. PLoS Genet 14:e1007469
6. Bai F, Settles AM (2014) Imprinting in plants as a mechanism to generate seed phenotypic diversity. Front Plant Sci 5:780
7. Kappil MA, Green BB, Armstrong DA, Sharp AJ, Lambertini L, Marsit CJ, Chen J (2015) Placental expression profile of imprinted genes impacts birth weight. Epigenetics 10:842–849
8. Kinoshita T, Yadegari R, Harada JJ, Goldberg RB, Fischer RL (1999) Imprinting of the MEDEA polycomb gene in the Arabidopsis endosperm. Plant Cell 11:1945–1952
9. Cassidy SB, Dykens E, Williams CA (2000) Prader-Willi and Angelman syndromes: sister imprinted disorders. Am J Med Genet 97:136–146
10. Gehring M, Bubb KL, Henikoff S (2009) Extensive demethylation of repetitive elements during seed development underlies gene imprinting. Science 324:1447–1451
11. Satyaki PRV, Gehring M (2017) DNA methylation and imprinting in plants: machinery and mechanisms. Crit Rev Biochem Mol Biol 52:163–175
12. Klosinska M, Picard CL, Gehring M (2016) Conserved imprinting associated with unique epigenetic signatures in the Arabidopsis genus. Nat Plants 2:16145
13. Barlow DP, Bartolomei MS (2014) Genomic imprinting in mammals. CSH Perspect Biol 6:a018382
14. Jullien PE, Berger F (2009) Gamete-specific epigenetic mechanisms shape genomic imprinting. Curr Opin Plant Biol 12:637–642
15. Luo M, Taylor JM, Spriggs A, Zhang H, Wu X, Russell S, Singh M, Koltunow A (2011) A genome-wide survey of imprinted genes in rice seeds reveals imprinting primarily occurs in the endosperm. PLoS Genet 7:e1002125
16. Chen C, Li T, Zhu S, Liu Z, Shi Z, Zheng X, Chen R, Huang J, Shen Y, Luo S, Wang L, Liu Q (2018) Characterization of imprinted genes in rice reveals conservation of regulation and imprinting with other plant species. Plant Physiol 177:1754–1771
17. Waters AJ, Makarevitch I, Eichten SR, Swanson-Wagner RA, Yeh C-T, Xu W, Schnable PS, Vaughn MW, Gehring M, Springer NM (2011) Parent-of-origin effects on gene expression and DNA methylation in the maize endosperm. Plant Cell 23:4221–4233
18. Zhang M, Zhao H, Xie S, Chen J, Xu Y, Wang K, Zhao H, Guan H, Hu X, Jiao Y, Song W, Lai J (2011) Extensive, clustered parental imprinting of protein-coding and noncoding RNAs in developing maize endosperm. Proc Natl Acad Sci U S A 108:20042–20047
19. Waters AJ, Bilinski P, Eichten SR, Vaughn MW, Ross-Ibarra J, Gehring M, Springer NM (2013) Comprehensive analysis of imprinted genes in maize reveals allelic variation for imprinting and limited conservation with other species. Proc Natl Acad Sci U S A 110:19639–19644
20. Xin M, Yang R, Li G, Chen H, Laurie J, Ma C, Wang D, Yao Y, Larkins BA, Sun Q, Yadegari R, Wang X, Ni Z (2013) Dynamic expression of imprinted genes associates with maternally controlled nutrient allocation during maize endosperm development. Plant Cell 25:3212–3227
21. Yang G, Liu Z, Gao L, Yu K, Feng M, Yao Y, Peng H, Hu Z, Sun Q, Ni Z, Xin M (2018) Genomic imprinting was evolutionarily conserved during wheat polyploidization. Plant Cell 30:37–47
22. Florez-Rueda AM, Paris M, Schmidt A, Widmer A, Grossniklaus U, Städler T (2016) Genomic imprinting in the endosperm is systematically perturbed in abortive hybrid tomato seeds. Mol Biol Evol 33:2935–2946
23. Roth M, Florez-Rueda AM, Paris M, Städler T (2018) Wild tomato endosperm transcriptomes reveal common roles of genomic imprinting in both nuclear and cellular endosperm. Plant J 95:1084–1101
24. Liu J, Li J, Liu HF, Fan SH, Singh S, Zhou XR, Hu ZY, Wang HZ, Hua W (2018) Genome-wide screening and analysis of imprinted genes in rapeseed (Brassica napus L.) endosperm. DNA Res 25:629–640
25. Xu W, Dai M, Li F, Liu A (2014) Genomic imprinting, methylation and parent-of-origin effects in reciprocal hybrid endosperm of castor bean. Nucleic Acids Res 42:6987–6998
26. Hatorangan MR, Laenen B, Steige KA, Slotte T, Köhler C (2016) Rapid evolution of genomic imprinting in two species of the Brassicaceae. Plant Cell 28:1815–1827
27. Zhang M, Li N, He W, Zhang H, Yang W, Liu B (2016) Genome-wide screen of genes imprinted in sorghum endosperm, and the roles of allelic differential cytosine methylation. Plant J 85:424–436
28. Gehring M, Missirian V, Henikoff S (2011) Genomic analysis of parent-of-origin allelic expression in Arabidopsis thaliana seeds. PLoS One 6:e23687
29. Leinonen R, Sugawara H, Shumway M, International Nucleotide Sequence Database Collaboration (2010) The sequence read archive. Nucleic Acids Res 39:D19–D21
30. Pignatta D, Erdmann RM, Scheer E, Picard CL, Bell GW, Gehring M (2014) Natural epigenetic polymorphisms lead to intraspecific variation in Arabidopsis gene imprinting. eLife 3:e03198
31. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2011) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40:D1178–D1186
32. 1001 Genomes Consortium (2016) 1,135 genomes reveal the global pattern of polymorphism in Arabidopsis thaliana. Cell 166:481–491
33. Hansen NF (2016) Variant calling from next generation sequence data. In: Mathé E, Davis S (eds) Statistical genomics, Meth Mol Biol, vol 1418. Humana, New York
34. Kobayashi M, Ohyanagi H, Takanashi H, Asano A, Kudo T, Kajiya-Kanegae H, Nagano AJ, Tainaka H, Tokunaga T, Sazuka T, Iwata H, Tsutsumi N, Yano K (2017) Heap: a highly sensitive and accurate SNP detection tool for low-coverage high-throughput sequencing data. DNA Res 24:397–405
35. Christensen KA, Brunelli JP, Lambert MJ, DeKoning J, Phillips RB, Thorgaard GH (2013) Identification of single nucleotide polymorphisms from the transcriptome of an organism with a whole genome duplication. BMC Bioinformatics 14:325
36. Castel SE, Levy-Moonshine A, Mohammadi P, Banks E, Lappalainen T (2015) Tools and best practices for data processing in allelic expression analysis. Genome Biol 16:195

37. Li A, Liu D, Wu J, Zhao X, Hao M, Geng S, Yan J, Jiang X, Zhang L, Wu J, Yin L, Zhang R, Wu L, Zheng Y, Mao L (2014) mRNA and small RNA transcriptomes reveal insights into dynamic homoeolog regulation of allopolyploid heterosis in nascent hexaploid wheat. Plant Cell 26:1878–1900 38. Zhuo Z, Lamont SJ, Abasht B (2017) RNA-seq analyses identify frequent allele specific expression and no evidence of genomic imprinting in specific embryonic tissues of chicken. Sci Rep 7:11944 39. Moreno-Romero J, Jiang H, SantosGonza´lez J, Ko¨hler C (2016) Parental epigenetic asymmetry of PRC2-mediated histone modifications in the Arabidopsis endosperm. EMBO J 35:1298–1311 40. Brekke TD, Henry LA, Good JM (2016) Genomic imprinting, disrupted placental expression, and speciation. Evolution 70:2690–2703 41. Schon MA, Nodine MD (2017) Widespread contamination of Arabidopsis embryo and endosperm transcriptome data sets. Plant Cell 29:608–617 42. Wang Z, Clark AG (2014) Using nextgeneration RNA sequencing to identify imprinted genes. Heredity 113:156–166 43. Hsieh TF, Shin J, Uzawa R, Silva P, Cohen S, Bauer MJ, Hashimoto M, Kirkbride RC, Harada JJ, Zilberman D, Fischer RL (2011) Regulation of imprinted gene expression in Arabidopsis endosperm. Proc Natl Acad Sci U S A 108:1755–1762 44. Wolff P, Weinhofer I, Seguin J, Roszak P, Beisel C, Donoghue MTA, Spillane C, Nordborg M, Rehmsmeier M, Ko¨hler C (2011) High-resolution analysis of parent-oforigin allelic expression in the Arabidopsis endosperm. PLoS Genet 7:e1002126 45. Anders S, Pyl PT, Huber W (2014) HTSeq—a Python framework to work with highthroughput sequencing data. Bioinformatics 31:166–169 46. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR (2012) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29:15–21 47. Andrews S (2010) FastQC: a quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/ projects/fastqc 48. 
Krueger F (2012) Trim Galore. http://www. bioinformatics.babraham.ac.uk/projects/ trim_galore 49. Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K,

Detection and Conservation of Imprinted Expression Alexander DL, Garcia-Hernandez M, Karthikeyan AS, Lee CH, Nelson WD, Ploetz L, Singh S, Wensel A, Huala E (2011) The Arabidopsis information resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res 40:D1202–D1210 50. Cheng C, Krishnakumar V, Chan AP, ThibaudNissen F, Schobel S, Town CD (2017) Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J 89:789–804

201

51. Belmonte MF, Kirkbride RC, Stone SL, Pelletier JM, Bui AQ, Yeung EC, Hashimoto M, Fei J, Harada CM, Munoz MD, Le BH, Drews GN, Brady SM, Goldberg RB, Harada JJ (2013) Comprehensive developmental profiles of gene activity in regions and subregions of the Arabidopsis seed. Proc Natl Acad Sci U S A 110:E435–E444 52. Gordon A (2010) Fastx toolkit. https:// github.com/agordon/fastx_toolkit

Chapter 14

Epigenetic Approaches in Non-Model Plants

M. Teresa Boquete, Niels C. A. M. Wagemaker, Philippine Vergeer, Jeannie Mounger, and Christina L. Richards

Abstract

Reduced representation bisulfite sequencing is an emerging methodology for evolutionary and ecological genomics and epigenomics research because it provides a cost-effective, high-resolution tool for exploration and comparative analysis of DNA methylation and genetic variation. Here we describe how digestion of genomic plant DNA with restriction enzymes, subsequent bisulfite conversion of unmethylated cytosines, and final DNA sequencing allow for the examination of genome-wide genetic and epigenetic variation in plants without the need for a reference genome. We explain how the use of several combinations of barcoded adapters for the creation of highly multiplexed libraries allows the inclusion of up to 144 different samples/individuals in only one sequencing lane.

Key words Epigenetics, DNA methylation, Bisulfite treatment, epiGBS, Next-generation sequencing, Restriction enzymes, Multiplexing, Reduced representation sequencing

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_14, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

While understanding the ecological and evolutionary consequences of epigenetic variation has inspired several lines of inquiry, the lack of high-resolution genomic tools has limited our ability to address these questions in non-model organisms [1–3]. Methylation-sensitive amplified fragment length polymorphism (MS-AFLP/MSAP) has been a popular molecular marker approach that has been applied to several non-model organisms [2, 4]. However, this tool produces a limited number of anonymous markers (typically 100s to 1000s), which may prevent detection of DNA methylation structure (e.g., [5]). Development of new reduced representation bisulfite sequencing techniques, such as epigenotyping by sequencing (epiGBS; [6]), provides the potential to elucidate the contribution of DNA methylation to plant phenotypes, responses to environmental conditions, or adaptation, by increasing the number of loci by several orders of magnitude and potentially providing insight into function [1–3]. The epiGBS technique uses size selection of fragments created by digestion with restriction enzymes to select a portion of the genome (usually less than ~1%, although this proportion depends on the species). Depending on the choice of enzyme, fragments can be targeted toward portions of the genome that are GC rich or that are not highly methylated (e.g., repeat regions). Additionally, epiGBS allows for the assessment of DNA methylation states in three different genomic contexts (CG, CHG, CHH, where H can be any base but G), which are potentially functionally relevant for plants [7–10]. This method, however, has several limitations [3, 5]. Methylation polymorphisms can be confounded with single nucleotide polymorphisms (SNPs) at loci where a C or G has mutated, because bisulfite treatment deaminates unmethylated cytosines to uracils, which are then read as thymines after PCR amplification. The method relies on de novo creation of the unconverted reference for calling polymorphisms in the fragments, which could be particularly sensitive to this confounding. Additionally, epiGBS is unable to distinguish between loci originating from different chromosomes. As a result, detected methylation levels represent a mean of methylation over all chromosomes (per locus). This could be particularly problematic for understanding the importance of methylation in polyploid species, where multiple copies of genes can take on new functions and some copies may be silenced by methylation [11]. To some extent this problem can be addressed with increased sequencing depth, and this limitation also applies to most other commonly used reduced representation techniques.
Despite the recent development and the shortcomings of epiGBS, the technique has already been successfully used to demonstrate the rapid emergence of divergent phenotypes in monoculture and mixture due to selection of genetic and possibly epigenetic variants in three perennial grassland species in a biodiversity experiment [12], as well as a genome-wide signature of genetic and epigenetic differentiation among Spartina alterniflora populations correlated to oil pollution [5].
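The three methylation contexts described above (CG, CHG, and CHH, where H is A, C, or T) can be illustrated with a short Python sketch. This is not part of the epiGBS pipeline: the function names `cytosine_context` and `count_contexts` are hypothetical, and the sketch assumes a single forward-strand sequence and simply returns `None` when the context cannot be determined.

```python
from collections import Counter
from typing import Optional


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i of a forward-strand sequence.

    Returns 'CG', 'CHG' or 'CHH', or None if the base is not a C or the
    downstream bases are missing or ambiguous.
    """
    seq = seq.upper()
    if seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]   # base directly after the cytosine
    second = seq[i + 2:i + 3]  # base two positions after the cytosine
    if first == "G":
        return "CG"
    # H means any base except G, so the next base must be A, C or T
    if first not in ("A", "C", "T") or second not in ("A", "C", "G", "T"):
        return None
    return "CHG" if second == "G" else "CHH"


def count_contexts(seq: str) -> Counter:
    """Count classifiable cytosines per context along the forward strand."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


example = "ACGTCAGCTTC"
counts = count_contexts(example)
```

In the example sequence, the cytosine followed by G is CG, the one followed by AG is CHG, and the one followed by TT is CHH; the final cytosine has no downstream bases and is left unclassified.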

2 Materials

2.1 Equipment

1. 2100 Bioanalyzer system (Agilent).
2. Qubit™ 3.0 Fluorometer (Q33216, Invitrogen™).
3. Thermal cycler (e.g., Techne TC-4000).
4. qPCR cycler (optional).
5. Blue Pippin (optional).

2.2 Reagents

1. Molecular grade water.
2. DNA isolation kit: e.g., Macherey-Nagel Nucleospin 8/96 Plant II.
3. Qubit dsDNA HS Assay Kit (Q32851, Life Technologies).
4. AseI (R0526L, New England BioLabs).
5. NsiI (R0127L, New England BioLabs).
6. FastDigest Csp6I (FD0214, Thermo Fisher Scientific).
7. 10× FastDigest buffer (B64, Thermo Fisher Scientific).
8. T4 DNA ligase.
9. Partly methylated, non-phosphorylated, barcoded adapters (e.g., Alpha DNA).
10. NucleoSpin™ Gel and PCR Clean-up Kit, e.g., Macherey-Nagel™.
11. Agencourt AMPure XP (A63880, Beckman Coulter).
12. 5-Methylcytosine dNTP Mix.
13. DNA polymerase I.
14. EZ DNA Methylation-Lightning™ Kit (Zymo Research).
15. Illumina PE PCR Primers.
16. KAPA HIFI (Uracil+) HotStart ReadyMix (KK2800, Roche Diagnostics).
17. KAPA Library Quantification Kit (for Bio-Rad) (KK4844, Roche Diagnostics) (optional).
18. High-sensitivity DNA kit (Agilent).

3 Methods

The following instructions allow preparation of an epiGBS library composed of 96 multiplexed samples. The values can be scaled down if working with fewer samples. Additionally, if working with more than one species per sequencing lane, samples for each species need to be handled separately and only pooled at the very end of the library preparation.

3.1 Genomic DNA Isolation, Quality Check, and Quantification

1. Grind up to 100 mg of frozen plant tissue in a 2 mL Eppendorf tube using stainless steel beads. Perform 30 s grinding rounds for a total of 2–4 min and submerge the tubes in liquid nitrogen between rounds (see Note 1).
2. Isolate genomic DNA (gDNA) using a commercial kit such as the Nucleospin 8 Plant II Kit from Macherey-Nagel. Follow the manufacturer’s instructions and elute gDNA in 50 μL of molecular grade water (see Note 2).
3. When working with new species or tissue types, determine the quality of gDNA by running a gel after the digestion step in order to confirm that restriction enzymes are working properly (see Note 3).
4. Quantify the concentration of gDNA in each sample with a reliable DNA quantification method, usually a fluorescence-based method such as the Qubit Fluorometric dsDNA assay.

3.2 Digestion of Genomic DNA

1. Digest 400 ng of gDNA per sample (see Note 4) with a combination of two restriction enzymes (see Note 5, Fig. 1). Digestion runs overnight (17 h) at 37 °C in a volume of 40 μL containing: 1 μL (10 units) of AseI, 1 μL (10 units) of NsiI, 4 μL of 10× FastDigest buffer, 4 μL of molecular grade water, and 30 μL containing 400 ng of gDNA plus water. First, pipette enough sample to get 400 ng of gDNA into each well on a 96-well plate and fill up to 30 μL with molecular grade water. Then, prepare a master mix in a 1.5 mL Eppendorf tube by adding 105× the volume required of each reagent (when working with a high number of samples, we recommend preparing enough mix for the total number of samples plus an extra 10%, e.g., for 96 samples, we would prepare enough mix for 105), and pipette 10 μL of this mix into each well. Put the plate in the thermal cycler with the lid temperature set to 30 °C.

Fig. 1 Expected minimum read coverage for Phragmites australis consisting of a standardized subset of one million reads sequenced per individual in epiGBS libraries prepared with two different enzyme combinations: Csp6I∗NsiI (dark blue line) and AseI∗NsiI (orange line). The graph shows a decrease in the read coverage with the increase in the number of contigs covered for both enzyme combinations, but coverage is always lower for Csp6I∗NsiI than for AseI∗NsiI. The cumulative coverage of both enzyme combinations shows that Csp6I∗NsiI (yellow line) yields a greater number of contigs with lower expected coverage than AseI∗NsiI (gray line), i.e., nearly 50% of the reads mapped (secondary y-axis) correspond to 8000 contigs with a minimum expected coverage of 7 for AseI∗NsiI (green line), while for Csp6I∗NsiI (yellow line), nearly 32% of the sequencing power corresponds to 20,000 contigs with a minimum expected coverage of 2. Note that doubling the number of reads obtained per individual for the Csp6I∗NsiI combination (light blue line) considerably increases the coverage

3.3 Ligation of Barcoded Adapters

The latest version of the barcoded adapters (Fig. 2) contains three random nucleotides (NNN wobble in Fig. 2) to allow filtering of PCR duplicates and an unmethylated control cytosine to facilitate annotation of the Watson and Crick strands and calculation of bisulfite (BS) conversion rates. The amplified epiGBS fragments are derived from fixed loci without variation in their start or end points, making it impossible to filter for PCR duplicates by read position, as is done in genome sequencing projects. Therefore, in epiGBS, we use adapters with three random nucleotides incorporated directly after the inline barcode nucleotide. The three random nucleotides of the “upper” barcode strands are “complemented” in the lower strands during nick translation, when the complete “lower” barcode strands are replaced. Hence the Watson and Crick strands of a single original molecule have two 3 bp complementary random nucleotide tags. The number of possible combinations of the six random nucleotides is 4⁶ = 4096, which is sufficient to identify PCR duplicates. Tagging of duplicate reads is later performed in the epiGBS pipeline (https://github.com/thomasvangurp/epiGBS) by the Python script mark_PCR_duplicates.py after creation of the BAM files during mapping. The unmethylated control cytosine comes after the three random nucleotides in the “upper” strand of the adapter. During BS conversion this nucleotide is converted to uracil and will subsequently be sequenced as a thymine in the Crick strand. The Watson strand will contain a guanine at that position. With this information combined, the Watson and Crick strands as well as the BS conversion rate can later be identified at the sample level using the script demultiplex.py in the epiGBS pipeline.

Fig. 2 Structure of the forward (Read 1, BA) and reverse (Read 2, CO) adapters used during the ligation step showing the following: (1) in red, the three random nucleotides allowing the identification and filtering of PCR duplicates (NNN wobble); (2) in blue, the barcode sequence (in this case, AACT) that identifies the reads pertaining to each sample; and (3) in green, the unmethylated control cytosine that enables annotation of Watson and Crick strands and calculation of bisulfite conversion rates

1. Recover the plate containing the gDNA digest from the thermal cycler to perform the ligation reaction. This reaction runs overnight (at 22 °C for 3 h, followed by 4 °C overnight) in a volume of 60 μL containing: 40 μL of gDNA digest, 6 μL (1×) of 10× T4 DNA ligase buffer, 0.5 μL (1000 units) of T4 DNA ligase, 8 μL (1200 pg) of barcoded adapter mix, and 5.5 μL of molecular grade water. First, prepare the master mix in a 1.5 mL Eppendorf tube by adding 105× the volume of each reagent except the adapter mix, and add 12 μL of this mix to each well. The adapters have been previously combined in a 96-well plate to a concentration of 75 pg/μL (see Note 6). Add 8 μL of the appropriate barcoded adapter mix to each corresponding well.

3.4 Per Species Samples Pooling, Pool Clean-Up, and Concentration

1. Pool the digested-ligated gDNA (dig-lig gDNA) by combining all 60 μL from each sample into one single tube (60 μL/sample × 96 samples = 5760 μL) (see Note 7). Separate one third of the total volume of dig-lig gDNA and keep in the freezer as a backup.
2. Clean and concentrate the remaining two-thirds of the total library using the NucleoSpin™ Gel and PCR Clean-up Kit following the manufacturer’s instructions. During the first step of the clean-up, split the total volume (dig-lig gDNA plus binding buffer) into eight different columns to avoid exceeding the column binding capacities. Concentrate the library by eluting the gDNA in 40 μL of elution buffer (EB) into a 1.5 mL Eppendorf tube. The rest of the protocol will be performed in these eight pools.

3.5 Fragment Size Selection of Digested-Ligated DNA

1. Size select the libraries using a 0.8× Agencourt AMPure XP purification selecting for gDNA fragments greater than 200 bp (see Note 8, Fig. 3). Elute gDNA in a total volume of 24 μL of EB. Usually, you will recover 22–23 μL of size selected gDNA into a new PCR tube (see Note 9).
2. Check the concentration of DNA in each pool at this point using the Qubit assay using only 1 μL of sample.
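Because the bead volume has to follow the actual sample volume recovered (see Note 9), the arithmetic can be sketched in Python as below. The function name `bead_volume_ul` and its default ratio argument are illustrative assumptions; the 0.8 ratio and the two worked volumes are the ones given in this protocol.

```python
def bead_volume_ul(sample_volume_ul: float, ratio: float = 0.8) -> float:
    """Volume of SPRI beads (in microliters) for a given sample volume.

    ratio is the beads-to-sample ratio; 0.8 corresponds to the 1:0.8
    sample-to-beads ratio recommended in this protocol.
    """
    if sample_volume_ul <= 0 or ratio <= 0:
        raise ValueError("sample volume and ratio must be positive")
    # Round to 0.01 uL to hide floating-point noise in the product
    return round(sample_volume_ul * ratio, 2)


# Worked values from Note 9
beads_for_40 = bead_volume_ul(40)  # 32.0
beads_for_39 = bead_volume_ul(39)  # 31.2
```

Measuring the recovered volume first and then computing the bead volume keeps the ratio, and therefore the size cutoff, constant across pools.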


Fig. 3 Selection of DNA fragment sizes according to the solid phase reversible immobilization (SPRI) beads to DNA ratio used. During the two size selection steps of the epiGBS protocol, we recommend using a ratio of 0.8 in order to enrich the library in fragments between 200 and 600 bp long (red rectangle)

3.6 Nick Translation

Because non-phosphorylated adapters are used, epiGBS libraries contain nicks between the 3′ fragment overhang and the 5′ non-phosphorylated adapter nucleotide. To prevent the loss of ssDNA adapter strands at the nicked position during bisulfite treatment, this nick is translated to the end of the adapter to remove it completely.

1. Run nick translation in the thermal cycler for 1 h at 15 °C, with the lid temperature set to 30 °C, in a volume of 25 μL containing: 19.25 μL of purified dig-lig gDNA, 2.5 μL of 5-methylcytosine dNTP mix (10 mM), 2.5 μL (1×) of 10× NEB buffer 2, and 0.75 μL (7.5 units) of DNA polymerase I. Prepare the nick translation master mix as indicated for previous reactions and add 5.75 μL of the mix to each pool.
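The master-mix arithmetic used for the digestion, ligation, and nick translation reactions can be sketched as follows. This is only an illustration: the function name `scale_master_mix` and the reagent labels are hypothetical, and the multiplier is passed in explicitly (e.g., 105 for 96 samples, as in Subheading 3.2) rather than derived from a rounding rule.

```python
from typing import Dict


def scale_master_mix(per_reaction_ul: Dict[str, float], n_reactions: int) -> Dict[str, float]:
    """Multiply each per-reaction volume (in microliters) by n_reactions."""
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    return {reagent: round(volume * n_reactions, 2)
            for reagent, volume in per_reaction_ul.items()}


# Digestion mix from Subheading 3.2 (per sample, without the 30 uL of gDNA plus water)
digestion = {"AseI": 1.0, "NsiI": 1.0, "10x FastDigest buffer": 4.0, "water": 4.0}
# 96 samples plus roughly 10% extra, i.e. the 105x used in the protocol
digestion_mix = scale_master_mix(digestion, 105)
per_well_ul = sum(digestion.values())  # mix volume pipetted into each well

# Nick translation mix from Subheading 3.6 (per pool, without the 19.25 uL of DNA)
nick = {"5-methylcytosine dNTP mix": 2.5, "10x NEB buffer 2": 2.5, "DNA polymerase I": 0.75}
per_pool_ul = sum(nick.values())  # mix volume added to each pool
```

Summing the per-reaction volumes is a quick check that the mix volume pipetted per well or pool (10 μL and 5.75 μL here) matches the recipe.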

3.6.1 Optional Test GBS PCR

1. Use 1 μL of nick-translated DNA to perform a test GBS PCR in a volume of 10 μL containing: 1 μL of nick-translated DNA, 5 μL of KAPA HiFi HotStart Uracil+ ReadyMix, 0.3 μL (3 pmol) of each Illumina PE PCR Primer, and 3.4 μL of molecular grade water. Prepare the master mix and add 9 μL of the mix to each tube. Put the tubes in the thermal cycler with the following settings:

Heat lid              105 °C
Preheat lid           Off
Pause                 Off
In denature           95 °C, 3 min
Hot start             Off
Stage 1
  Number of cycles    15
  Step                98 °C, 0 min 10 s max
  Step                65 °C, 0 min 15 s max
  Step                72 °C, 0 min 15 s max
Fin extension         72 °C, 5 min
Fin hold              10 °C

We run the test GBS PCR to determine the number of cycles needed in the final epiGBS PCR to obtain enough epiGBS library without over-amplifying it. The quality of the library can also be tested by running 2 ng/μL of the GBS PCR product on the bioanalyzer. The number of cycles should be adjusted as follows: if the concentration of the test GBS PCR is 8 ng/μL, use 14 cycles.

3.7 Bisulfite Conversion

1. Convert all nick-translated DNA left (23–24 μL) using the EZ DNA Methylation-Lightning™ kit following the manufacturer’s instructions (see Note 10). At the end of the conversion protocol, elute the DNA twice from the column into the same 1.5 mL Eppendorf tube. For the first elution use 12 μL of EB buffer and for the second one use 12 μL of molecular grade water (see Note 11).
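The unmethylated control cytosine in the adapters (Subheading 3.3) allows the bisulfite conversion rate to be estimated per sample: converted control positions are read as T, unconverted ones as C. The sketch below illustrates that calculation only. It is not the implementation used by demultiplex.py in the epiGBS pipeline, the function name `conversion_rate` is hypothetical, and it assumes the base calls at the control position have already been extracted from reads of the strand in which the control cytosine is read as C or T.

```python
from collections import Counter
from typing import Iterable


def conversion_rate(control_bases: Iterable[str]) -> float:
    """Fraction of control-position base calls that were converted (C read as T).

    Base calls other than C or T (e.g., sequencing errors or N) are ignored.
    """
    counts = Counter(base.upper() for base in control_bases)
    converted = counts["T"]
    unconverted = counts["C"]
    total = converted + unconverted
    if total == 0:
        raise ValueError("no C or T calls at the control position")
    return converted / total


# Hypothetical pool: 98 converted and 2 unconverted control cytosines
rate = conversion_rate("T" * 98 + "C" * 2)
```

Comparing this rate between pools is one way to spot the pool-to-pool variation in conversion mentioned in Note 10.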

3.8 Library Amplification (Final epiGBS PCR)

1. Use 22 μL bisulfite-converted DNA to perform the epiGBS PCR in a volume of 50 μL containing: 22 μL of bisulfite-converted DNA, 25 μL of KAPA HiFi HotStart Uracil+ ReadyMix, and 1.5 μL (15 pmol) of each Illumina PE PCR Primer. We recommend preparing a master mix by adding 9× the volume required of each reagent and pipetting 28 μL of this mix into each pool. Split the 50 μL of each pool in five aliquots of 10 μL each into PCR tubes and put them in the thermal cycler with the following settings:

Heat lid              105 °C
Preheat lid           Off
Pause                 Off
In denature           95 °C, 3 min
Hot start             Off
Stage 1
  Number of cycles    14/15/16–18 (a)
  Step                98 °C, 0 min 10 s max
  Step                65 °C, 0 min 15 s max
  Step                72 °C, 0 min 15 s max
Fin extension         72 °C, 5 min
Fin hold              10 °C

(a) Depending on the concentration of the library obtained in the test GBS PCR

2. Once the epiGBS PCR is finished, combine all five aliquots from each pool into a single one (note that if working with more than one species, we recommend pooling them separately for each species) (Fig. 4b).
3. Combine all pools in one single 1.5 mL Eppendorf tube and concentrate the library (Fig. 4c) using the NucleoSpin™ Gel and PCR Clean-up Kit following the manufacturer’s instructions (see Note 12). Elute in 30 μL of EB.
4. Perform one last size selection step using the Agencourt AMPure XP purification beads (see Subheading 3.5, step 1; adjust the volume of beads to maintain the 1:0.8 ratio) in order to completely remove small fragments such as adapter dimers (Fig. 4d). Optionally, you can use Blue Pippin instead of the Agencourt AMPure XP purification beads to achieve a more precise size range in the epiGBS library, which yields fewer loci for sequencing but with somewhat better coverage.

3.9 Quantification and Assessment of the Quality of the epiGBS Library

1. Quantify the amount of DNA in the library using the Qubit assay (use only 1 μL of sample). For best sequencing results we recommend quantifying the library by qPCR via the KAPA Library Quantification Kit (KAPA HIFI (Uracil+) is not necessary here since it is more expensive) following the manufacturer’s instructions.
2. Assess the quality of the libraries (Fig. 4d) by analyzing 1 μL on a high-sensitivity DNA chip on the bioanalyzer.
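When only a Qubit concentration and a bioanalyzer average fragment size are available, a rough molarity estimate can be computed as sketched below. This is an approximation, not a replacement for the recommended qPCR quantification: it assumes an average mass of about 660 g/mol per base pair of double-stranded DNA, and the function name `library_molarity_nm` is hypothetical.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float,
                        g_per_mol_per_bp: float = 660.0) -> float:
    """Approximate library molarity in nM from mass concentration and mean size.

    1 ng/uL equals 1e-3 g/L; dividing by the molar mass (g/mol) gives mol/L,
    and multiplying by 1e9 converts to nM, hence the overall factor of 1e6.
    """
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


# Hypothetical library: 5 ng/uL with the 389 bp average size shown in Fig. 4d
molarity = library_molarity_nm(5.0, 389)
```

The estimate is sensitive to the average fragment size, so adapter dimers or a broad size distribution will bias it.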

Fig. 4 Examples of bioanalyzer output graphs in samples of the terrestrial moss Scopelophila cataractae run after several crucial steps of the epiGBS library preparation protocol: (a) profile of the digested and ligated DNA (average size: 5760 bp), (b) epiGBS library of bisulfite-converted DNA amplified for 20 cycles in the thermal cycler (average size: 375 bp), (c) epiGBS library concentrated using the NucleoSpin™ Gel and PCR Clean-up Kit (average size: 370 bp), and (d) epiGBS library after final SPRI beads size selection (average size: 389 bp). Vertical bars represent the size range of the fragments in the epiGBS library. During digestion of gDNA, we generate a wide range of fragments with different sizes that are then size selected in order to achieve a final epiGBS library with fragments ranging between 200 and 600 bp and with an average size between 350 and 450 bp. Libraries presented in (b) and (c) still show some contamination with adapter dimers, as observed in the small size peaks; however, these small fragments are completely removed during the last size selection step (d)

4 Notes

1. Tissue melting as well as overgrinding will lead to DNA degradation and therefore poor-quality gDNA, which would not serve for library preparation. Grinding time depends on the plant species; however, in our experience, 2–4 min of grinding is usually enough. We advise visually inspecting the samples every two rounds of grinding to be sure to avoid overgrinding.
2. Commercial kits for isolating high-quality gDNA do not work equally well for all plant species due to differences in the composition of their cell walls or production of different secondary compounds that may interfere with the extraction. For difficult species, we recommend using the CTAB version from the Macherey-Nagel NucleoSpin 96 Plant II Kit, or try the DNA extraction protocol from recalcitrant plant tissues (https://www.protocols.io/view/high-quality-dna-extractionprotocol-from-recalcit-i8jchun), a modified cetyltrimethylammonium bromide (CTAB) protocol that proved to be effective for isolating high-quality gDNA from terrestrial mosses and algae. For specific samples that do not meet DNA quality requirements, we recommend performing a 1.5× Agencourt AMPure XP purification and final elution of gDNA in 50 μL of molecular grade water (Catarina L. Medeiros, personal communication). It is crucial to avoid any ethanol carryover from the extraction by completely drying (but not overdrying) the gDNA because ethanol will inhibit the restriction enzymes used to digest the gDNA. Extracted DNA should be dried for approximately 3 min at 55 °C before adding the elution buffer.
3. You need high-quality genomic DNA (gDNA) as starting material to prepare the epiGBS libraries because restriction enzymes are very sensitive to protein and other contaminating compounds. Running a gel right after the digestion step should result in a wide range of fragments of different sizes if the gDNA was good enough. It is important to prevent over-amplification as it causes high PCR duplicate rates.
4. The amount of gDNA needed is species specific due to differences in genome size, so we recommend testing the amount of gDNA for each species that yields a proper epiGBS library (see Subheading 3.9, step 2 and Fig. 4d) with a few samples before scaling up the protocol. In our experience, between 50 and 400 ng of DNA is enough. Also, it is important to consider that the lower the amount of starting gDNA, the higher the number of PCR cycles needed to obtain good libraries. A higher number of PCR cycles will result in increased PCR duplicates during sequencing.
5. The best restriction enzyme combination for the digestion step will depend on the desired number of loci, their coverage, and the regions and the portion of the genome that will be targeted. This varies among species, but also within species the best enzyme combination may differ depending on the research question being addressed. For example, the combination AseI∗NsiI yields fewer loci with good coverage, while the combination Csp6I∗NsiI yields many loci although some of them may have very poor coverage in Phragmites australis (Fig. 1). Therefore, we recommend testing different enzyme combinations for each species and choosing the one that yields the best per-locus coverage.
6. It is crucial to know exactly which adapter combination is added to each sample in order to be able to identify fragments corresponding to each individual after sequencing.

7. Because each species yields a different amount of library, pooling the dig-lig gDNA is done per species level (i.e., if the sequencing lane is shared by more than one species, samples from each species need to be combined with each other and the protocol will be run separately up until the final step). Moreover, if working with only one species that includes samples with low, medium, and high inputs of gDNA, we recommend pooling the samples per input level.
8. Solid phase reversible immobilization (SPRI) beads are paramagnetic beads coated with carboxyl molecules that reversibly bind the DNA in the presence of polyethylene glycol (PEG) and salt (NaCl). The concentration of PEG in the solution determines the size of the fragments of DNA selected, and therefore, the SPRI beads:DNA ratio is critical to perform the right fragment size selection (Fig. 3). In general, the lower the ratio SPRI:DNA, the larger the fragments selected and vice versa.
9. Equilibrate the Agencourt AMPure XP purification beads at room temperature for at least 30 min and vortex them vigorously before use. It is very important to maintain the 1:0.8 gDNA to beads ratio during size selection. For this reason, we recommend measuring the actual volume of sample in each tube after elution in the previous step (some volume might have been retained in the column during DNA elution) and adding the corresponding volume of purification beads (e.g., for 40 μL of sample, add 32 μL of beads, and for 39 μL of sample, add 31.2 μL of beads). Finally, the ethanol used during the cleaning step in the size selection needs to be fresh (prepared within the same day).
10. Bisulfite conversion rates may vary among the different pools, so we recommend using new bisulfite conversion kits, or at least not combining kits that have been opened on different days or stored differently. This is particularly important when studying species with low levels of methylation (e.g.,

60% transposable elements). In this chapter, we use seeds of the conifer Picea glauca as a study system to describe the methods and protocols we used or have recently updated, from high-quality RNA isolation to sRNA identification, sequence conservation, abundance comparison, and functional analysis.

Key words microRNA, siRNA, sRNA-seq, GRNs, Adaptation, Seeds, Conifers

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_15, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

Small RNAs (sRNAs) have been shown to have key regulatory functions in development, response to biotic and abiotic stressors, maintenance of genome stability, and control of transposable elements (TEs). Based on our updated knowledge of sRNA categorization and conifer genome architecture (Fig. 1; [1]), we conceived of a single sequencing experiment followed by bioinformatic analysis to capture sRNA features from millions of sRNA candidates generated in diverse pathways in the conifer seed. Using computational approaches, we sorted sRNAs according to their structural and functional properties including sRNA precursors, mature sRNA molecules, and their potential target loci. In this case study, we use a conifer species, white spruce (Picea glauca (Moench) Voss), as a study system [2, 3]. This chapter provides our most up-to-date protocols for this work, with modifications based mainly on the most recent research in the system.

Fig. 1 Key elements that motivate sRNA study in conifer trees

2 Materials

2.1 Plant Material

We chose three white spruce populations that differ in the timing of fertilization and in seed set duration; 20 developing cones per population were collected at early, middle, and late seed set for a total of four time points in the Kalamalka seed orchard (50°–51°37′ N, 119°16′ to 120°29′ W), British Columbia, Canada. After dissection from the cones, developing seeds were immediately frozen in liquid N2 and stored at −80 °C until further use.

2.2 RNA Extraction

Prepare all solutions using DEPC-treated ddH2O and RNase-free glassware. To render glassware RNase-free, incubate it in DEPC water and then sterilize it by autoclaving at 121 °C (about 250 °F) for 15–20 min. Repeat the autoclaving step 2–3 times to ensure that no DEPC residue remains. Prepare and store all reagents at 4 °C.

1. DEPC-treated ddH2O: double deionized H2O incubated with 0.1% diethylpyrocarbonate (DEPC) and then autoclaved to remove the DEPC.

Small RNA Analysis in Trees

2. Extraction buffer: 200 mmol/L Tris–HCl (pH 9.0), 200 mmol/L LiCl, 5 mmol/L EDTA, 1% SDS, 1/1000 (v/v) β-mercaptoethanol.
3. TE buffer: 10 mmol/L Tris–HCl, 1 mmol/L EDTA, pH 8.0.

2.3 Quantitative Real-Time RT-PCR

1. Reaction mixture:

   Volume        Component
   1.0 μL        Fivefold diluted cDNA
   7.5 μL        Supermix (SYBR™ Green)
   1.0 μL × 2    Each primer (10 μmol/L)
   4.5 μL        ddH2O
   15 μL         Total

2. We chose three housekeeping genes as internal controls, with amplification primer pairs (GenBank accession numbers in parentheses) given below:
   (1) Peroxisomal targeting signal receptor (CO220221): 5′-ATGCCTATCTGAAATGGACAC; 3′-ACTGTCTATGTTTGGCAGCAC
   (2) Hypothetical protein (CO206996): 5′-GTCGTGTGGATTGTCTCTGC; 3′-ATGTATTCGAAGAGGAGGAATG
   (3) Ubiquitin conjugating enzyme 1 (AY639585): 5′-GGAACAGTGGAGTCCTGCTT; 3′-CCTTGCGGTGGACTCATATT
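When many wells are run, the 15 μL reaction mixture listed above is usually prepared as a master mix. The sketch below scales the template-free components for a given number of reactions. The per-reaction volumes come from the reaction mixture table; the 10% pipetting overage and all names are assumptions made for illustration only.

```python
# Per-reaction volumes (uL) of the template-free components, taken from
# the reaction mixture table. The cDNA is added to each well separately.
PER_REACTION_UL = {
    "Supermix (SYBR Green)": 7.5,
    "Forward primer (10 umol/L)": 1.0,
    "Reverse primer (10 umol/L)": 1.0,
    "ddH2O": 4.5,
}
CDNA_UL = 1.0  # fivefold diluted cDNA per well


def master_mix(n_reactions: int, overage: float = 0.10) -> dict:
    """Scale the master mix; `overage` is an assumed pipetting excess."""
    if n_reactions < 1:
        raise ValueError("at least one reaction is required")
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


# Example: three biological replicates of four samples (12 wells).
for component, volume in master_mix(12).items():
    print(f"{component}: {volume} uL")
print(f"Dispense {sum(PER_REACTION_UL.values())} uL of mix + {CDNA_UL} uL cDNA per well")
```

Keeping the cDNA out of the master mix means the same mix can be shared by all samples amplified with one primer pair.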

3 Methods

3.1 RNA Extraction

Our previous study verified that total RNA of good quality can be isolated from developing seed samples using the PureLink Plant RNA Reagent (Ambion) according to the manufacturer’s instructions. Here we provide our in-house protocol tailored to total RNA isolation from seeds.

1. Grind developing seeds in a mortar and pestle with liquid N2 and then transfer the powdered seeds (~300 mL) to Eppendorf tubes with 800 μL of ice-cold extraction buffer.
2. Vortex and centrifuge the sample at 16,000 × g for 5 min and collect the supernatant.
3. Add an equal volume of water/phenol to the supernatant, vortex, and centrifuge at 16,000 × g for 10 min at 4 °C.
4. Transfer the aqueous phase into new tubes and add an equal volume of chloroform/isoamyl alcohol (24:1).

5. Centrifuge the mixture at 16,000 × g for 5 min, collect the new aqueous phase into a new tube, and add a 1/300 volume of acetic acid and an equal volume of isopropanol.
6. After storage at −20 °C for 30 min, centrifuge the tubes at 16,000 × g for 5 min, and then discard the supernatant.
7. Dissolve the precipitate in 30 μL of TE buffer and add a 1/5 volume of 10 mol/L LiCl.
8. After storage overnight at 4 °C, centrifuge the samples at 16,000 × g for 10 min, and then dissolve the precipitate in 50 μL of TE buffer with 125 μL of ethanol added.
9. Rinse the centrifuged precipitate with 70% ethanol and then dry it slightly in a vacuum concentrator.
10. Dissolve the precipitate in TE buffer and treat it with DNase-I using (for example) the TURBO DNA-free kit (Ambion) to eliminate any genomic DNA contamination.

3.2 RNA Concentration Measurement and Quality Check

3.2.1 RNA Quantity Measurements

All the following analyses are carried out in accordance with the manufacturer’s instructions. 1. Determine RNA purity and concentration spectrophotometrically by measuring the absorbance at 230, 260, and 280 nm using a Nanodrop ND-1000 spectrophotometer (Thermo Fisher Scientific) or similar (see Note 1). 2. Quantify the RNAs using a Qubit™ 4.0 Fluorometer (Thermo Fisher Scientific) or similar. 3. Assess the integrity of the RNAs by an Agilent 2100 BioAnalyser (Agilent Technologies) or similar.

3.2.2 Checking RNA Quality by Electrophoresis

1. Prepare a 1.2% (wt/vol) molecular biology grade agarose gel in 0.5× TE buffer.
2. Boil the contents until the agarose dissolves completely in the buffer.
3. Add ethidium bromide (final concentration 0.5 μg/mL).
4. Pour and run the gel as described in Sambrook and Russell [4] (see Note 2).
5. Check results (Fig. 2).

3.3 sRNA Library Construction and Sequencing

We construct sRNA-seq libraries using a strand-specific, plate-based protocol.

1. Perform polyA selection for total RNA samples using the Miltenyi MultiMACS mRNA isolation kit (cat. 130-092-519) following the manufacturer’s protocol to enrich sRNAs.
2. Use the flowthrough (i.e., containing sRNA species without mRNA) for plate-based sRNA construction.

Fig. 2 Results of electrophoresis of high-quality RNAs. Arrows show the 28S, 18S, and 5.8S ribosomal bands. The gel image was taken with a Gel Doc XR system

3. Selectively ligate a 3′ adapter (an adenylated single-stranded DNA) to the sRNA template using a truncated T4 RNA ligase 2 (NEB).
4. Add a 5′ adapter using T4 RNA ligase (Ambion) and ATP.
5. After ligation, synthesize first-strand cDNA using Superscript II Reverse Transcriptase (Invitrogen, cat. 18064-014) and one RT primer (see Note 3).
6. Pool constructed libraries in one 31 base SET lane.
7. Conduct sequencing on an Illumina HiSeq™ 2500 (or similar) using one short SET indexed lane in pool (in our study we used the BC Cancer Agency Genome Sciences Centre, Vancouver, Canada).

3.4 Small RNA Data Analysis

1. Partition the raw RNA-seq data into individual libraries based on the index read sequences.
2. Perform an initial QC assessment for the raw reads (see Note 4).
3. Trim adapters and barcode sequences using an internal matching algorithm (BC Cancer Agency).
4. Prune the low-quality reads (reads with ≥10% ambiguous bases “N” and reads in which more than 20% of bases have a quality score < 30).
5. Map the reads to GenBank (http://www.ncbi.nlm.nih.gov/), the Rfam (v.14.0) database (http://xfam.org/), tRNAdb [5], SILVA rRNA [6], and Repbase (http://www.girinst.org/).
6. Subtract r/t/sn/snoRNAs (i.e., ribosomal RNA, transfer RNA, small nuclear RNA, and small nucleolar RNA) from the libraries using bowtie2 software with perfect matches (0 mismatches) [7].
7. Parse the clean raw sequencing data (bam format) into fastq and fasta formats under Linux in a command-line environment for subsequent use.

8. Use the sRNA toolbox [8] to profile sRNAs and their size distribution.
9. Computationally predict highly enriched miRNAs in the sRNA sequencing libraries (≥1,000 copies in at least one phase) against the P. glauca genome assemblies (PG29 v.3 [9], 20 Gb divided into 30 Mb per file) using miRPlant [10] (see Note 5).
10. Conduct abundance normalization (see Note 6).
11. Predict their mRNA targets using transcripts without miRNA genes on psRNATarget [11] with default options for search parameters.
12. Use the predicted MIR loci of hairpin structure (~160 nt) for highly abundant miRNAs and target genes to identify retroelements with autonomous LTRs, including Ty1/Copia, Ty3/Gypsy, Bel/Pao, Retroviridae, and caulimoviruses [12] (see Note 7).
13. Retrieve homologs for miRNA-targeted genes in P. glauca via a BLASTN search against the Arabidopsis genome on EnsemblPlants (http://plants.ensembl.org).
14. Align the top predicted target gene for each highly abundant miRNA against the Gene Ontology (GO; http://www.geneontology.org/) protein database for GO term classification and KEGG pathway enrichment [13, 14] to annotate target mRNA functions.

3.5 Validating sRNA-mRNA Pairs via Quantitative Real-Time RT-PCR

1. Reverse-transcribe 2 μg of RNA into cDNA using the EasyScript Plus™ kit (abmGood) with oligo-dT primers.
2. Dilute the first-strand cDNA synthesis products fivefold (Fig. 3).
3. Run qRT-PCR analyses with three biological replicates per sample in 15 μL reaction volumes in an ABI7900HT machine (Applied Biosystems) using the PerfeCTa® SYBR® Green SuperMix with ROX (Quanta Biosciences) (see Note 8).
4. Generate dissociation curves at the end of each qRT-PCR to validate the amplification of only one product.
5. Perform efficiency calculation and normalization using Real-time PCR Miner (www.miner.ewindup.info/).
6. Confirm data quality through internal controls and no-template controls and by comparing the repeatability across replicates (see Note 9).

Fig. 3 Amplification plot (top left panel) showing the expression levels of serial dilutions of the cDNAs: 1:1 to 1:50 with increment by 1 using three replicates each [C(t) values at overlapped points with the threshold line]; PCR standard curves (top right panel, amplification efficiency circled in red); melt curves (bottom left panel); and unique melt peaks (bottom right panel) indicating specific products amplified

4 Notes

1. The purity of the RNA can be estimated by calculating the A260/A280 ratio. Pure RNA has an A260/A280 ratio between 1.8 and 2.2. Calculate the A260/A230 ratio to determine whether organic impurities, such as polysaccharides and polyphenolics, are present.
2. Loading and running buffer must be RNase-free. The gel box, comb, and tray must be RNase decontaminated.
3. This is the template for the final library PCR, into which a 6-nt index is introduced to identify libraries (i.e., to demultiplex them) from a sequenced pool.
4. Quality control includes a report of the distribution of read lengths, which is assessed manually to ensure that size selection was successful. A narrow size distribution centered at 22 bp is typical of a good-quality library, whereas a skewed or wide distribution indicates a poor-quality library or problems with size selection during library construction. In addition, a summary report is generated to give quality metrics including the total number of reads aligned to miRNAs, the number of miRNA species covered by 1 and 10 reads, and a count of the total reads classified as each annotation type.

5. As the size of our P. glauca sequence libraries exceeds the maximal single load on miRPlant, we divide each library into several sub-libraries with a maximum size of 120 Mb under Linux. After combining the output files from the same library (e.g., summing the raw abundance of the same unique reads from different sub-files), we remove duplicated reads and retain, for each library, only the copy of each unique read with the highest prediction score using R 3.2.2 (The R Project for Statistical Computing). A visual inspection then confirms that the R script has attained our objectives.
6. Abundances are expressed in reads per million (RPM), calculated using the following equation: $\mathrm{RPM} = \dfrac{\text{number of mapped sRNA reads}}{\text{number of clean sample reads}} \times 10^{6}$.
7. This step is included because conifer genomes massively accumulate TEs (e.g., 43.4% of the loblolly pine genome is composed of long terminal repeats (LTRs); [1]), and TEs are potential sources of, and targets for, sRNAs.
8. The reaction procedure is 5 min at 95 °C, followed by 45 cycles of 15 s at 95 °C and 60 s at 59 °C.
9. An average expression value for each gene at each time point is generated from the normalized data.
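The RPM normalization in Note 6 can be sketched as follows. This is a minimal illustration, not the script used in our analysis; the read identifiers and counts are hypothetical placeholders.

```python
def reads_per_million(mapped_reads: int, clean_reads: int) -> float:
    """RPM = mapped sRNA reads / clean sample reads x 10^6 (Note 6)."""
    if clean_reads <= 0:
        raise ValueError("the library must contain at least one clean read")
    # Multiply first so that integer inputs stay exact until the division.
    return mapped_reads * 1_000_000 / clean_reads


# Hypothetical raw counts for two sRNAs in one library of 12 million clean reads.
raw_counts = {"sRNA_a": 1500, "sRNA_b": 30}
library_size = 12_000_000

rpm = {name: reads_per_million(count, library_size) for name, count in raw_counts.items()}
for name, value in rpm.items():
    print(f"{name}: {value:.2f} RPM")
```

Because each library is divided by its own number of clean reads, abundances become comparable between libraries of different sequencing depth.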

Acknowledgment

This work was supported by the Johnson’s Family Forest Biotechnology Endowment and the Natural Sciences and Engineering Research Council (NSERC) of Canada Discovery and Industrial Research Chair to Y.A.E.

References

1. Liu Y, El-Kassaby YA (2019) Novel insights into plant genome evolution and adaptation as revealed through transposable elements and non-coding RNAs in conifers. Genes (Basel) 10(3):228. https://doi.org/10.3390/Genes10030228
2. Liu Y, El-Kassaby YA (2017) Landscape of fluid sets of hairpin-derived 21-/24-nt-long small RNAs at seed set uncovers special epigenetic features in Picea glauca. Genome Biol Evol 9(1):82–92. https://doi.org/10.1093/gbe/evw283
3. Liu Y, El-Kassaby YA (2017) Global analysis of small RNA dynamics during seed development of Picea glauca and Arabidopsis thaliana populations reveals insights on their evolutionary trajectories. Front Plant Sci 8:1719. https://doi.org/10.3389/Fpls.2017.01719
4. Sambrook J, Russell DW (2001) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor
5. Jühling F, Mörl M, Hartmann RK, Sprinzl M, Stadler PF, Pütz J (2009) tRNAdb 2009: compilation of tRNA sequences and tRNA genes. Nucleic Acids Res 37:159–162. https://doi.org/10.1093/nar/gkn772
6. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, Glöckner FO (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35(21):7188–7196. https://doi.org/10.1093/nar/gkm864
7. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359. https://doi.org/10.1038/Nmeth.1923
8. Rueda A, Barturen G, Lebrón R, Gómez-Martín C, Alganza A, Oliver JL, Hackenberg M (2015) sRNAtoolbox: an integrated collection of small RNA research tools. Nucleic Acids Res 43:467–473. https://doi.org/10.1093/nar/gkv555
9. Warren RL, Keeling CI, Yuen MM, Raymond A, Taylor GA, Vandervalk BP, Mohamadi H, Paulino D, Chiu R, Jackman SD, Robertson G, Yang C, Boyle B, Hoffmann M, Weigel D, Nelson DR, Ritland C, Isabel N, Jaquish B, Yanchuk A, Bousquet J, Jones SJ, MacKay J, Birol I, Bohlmann J (2015) Improved white spruce (Picea glauca) genome assemblies and annotation of large gene families of conifer terpenoid and phenolic defense metabolism. Plant J 83(2):189–212. https://doi.org/10.1111/tpj.12886
10. An J, Lai J, Sajjanhar A, Lehman ML, Nelson CC (2014) miRPlant: an integrated tool for identification of plant miRNA from RNA sequencing data. BMC Bioinformatics 15:275. https://doi.org/10.1186/1471-2105-15-275
11. Dai X, Zhao PX (2011) psRNATarget: a plant small RNA target analysis server. Nucleic Acids Res 39:W155–W159. https://doi.org/10.1093/nar/gkr319
12. Llorens C, Futami R, Covelli L, Dominguez-Escriba L, Viu JM, Tamarit D, Aguilar-Rodriguez J, Vicente-Ripolles M, Fuster G, Bernet GP, Maumus F, Munoz-Pomer A, Sempere JM, Latorre A, Moya A (2011) The Gypsy Database (GyDB) of mobile genetic elements: release 2.0. Nucleic Acids Res 39:70–74. https://doi.org/10.1093/nar/gkq1061
13. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25(1):25–29. https://doi.org/10.1038/75556
14. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30. https://doi.org/10.1093/Nar/28.1.27

Chapter 16

Epigenetic Barcodes for Detection of Adulterated Plants and Plant-Derived Products

Matteo Busconi, Giovanna Soffritti, Marcelino De Los Mozos Pascual, and José Antonio Fernandez

Abstract

In this chapter, we report a possible alternative use of epigenetics by applying methylation-sensitive amplified fragment length polymorphisms (MS-AFLP) to saffron traceability. Saffron is the most expensive plant-derived product in the world and one of the most frequently adulterated. One of the most frequent adulterations is the addition of different parts of the saffron flower itself to the saffron stigmas to increase volume. While the DNA is the same in all parts of the plant, the epigenetic state can vary according to the organ and/or tissue of origin, making it possible to discriminate the stigmas from the other parts of the saffron flower. In the method that follows, we describe the protocol for carrying out an MS-AFLP analysis of saffron DNA methylation patterns.

Key words MS-AFLP, Epigenetics, Saffron, Traceability

1 Introduction

Saffron is the spice produced from the dehydrated stigmas of Crocus sativus L. [1]. Saffron is a high-value agricultural product, the most expensive plant-derived product in the world, reaching 15,000–20,000 €/kg, and, because of this, it is frequently adulterated. In past years, adulterations have been detected, on both ground and whole stigmas, involving the addition of different plant species, plant- and animal-derived substances, synthetic dyes and chalk, among others [2, 3]. Concerning plant species, the saffron crocus is itself one of the most frequent adulterants of saffron, and cases of self-adulteration by adding cut and/or dyed C. sativus stamens and tepals are among the most common forms of adulteration. While adulteration with different plant species, such as safflower (Carthamus tinctorius L.), calendula (Calendula officinalis L.), and curcuma (Curcuma longa L.), can be easily detected, among others, by DNA analyses [4–6], the situation is more

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_16, © Springer Science+Business Media, LLC, part of Springer Nature 2020

Matteo Busconi et al.

challenging if the adulterating material comes from the same biological species. In this case, DNA analysis is ineffective, as the different parts of the same plant possess the same DNA. Furthermore, saffron is a sterile plant, propagated year by year through vegetative reproduction with the production of new corms. Because of this, only little genetic variability is present [7] and no distinct, characterized cultivars are recognized, as is the case in most other crops. Under these conditions, detecting adulteration of stigmas with other parts of the flower by using molecular approaches is not simple to achieve. In previous studies [7], we were able to detect a high epigenetic variability among saffron accessions with similar or different geographic origin. Epigenetics refers to a series of chemical modifications of DNA, especially cytosine methylation, and covalent modification of histones. The most frequent mark of DNA methylation is represented by 5-methylcytosine (5-mC) at different sites within the CG, CHG, and CHH contexts (where H represents any base but G). Unlike the DNA sequence itself, methylation patterns can vary widely within the same plant according to the cell, tissue, and organ in the vegetative and reproductive phases of a plant’s life cycle [8]. In addition, in a vegetatively propagated species like saffron, the epigenetic constitution is very stable in consecutive years with only small inter-annual variability [9]. Considering these factors together, we reasoned that analysis of epigenetic marks could be a possible way to detect the adulteration of saffron with different parts of the saffron flower itself, and this possibility was investigated in a previous study.
Pools of tepals, stamens, and stigmas, received from the World Saffron and Crocus Collection located at the Bank of Plant Germplasm of Cuenca (Cuenca, Spain), were analyzed at the epigenetic level by applying the methylation-sensitive amplified fragment length polymorphisms technique (MS-AFLP) [6]. Comparison of the MS-AFLP profiles of the three different parts of the saffron flower revealed different epigenetic profiles. In particular, while the epigenetic profiles of stigmas and tepals were similar with just small differences, stamens (filament + anthers) had completely different profiles. MS-AFLP (also called MSAP) [10, 11] is, basically, a variant of the classical AFLP method [12]. The main difference is that, rather than using MseI as the frequent cutter enzyme, methylation-sensitive enzymes (such as the isoschizomers MspI and HpaII) are used. Methylation-sensitive enzymes cut DNA according to the methylation state of their restriction sites. More specifically, MspI and HpaII recognize and cut the same site (5′-CCGG-3′) in the absence of methylation. In the presence of methylated cytosines, the situation is different: HpaII is basically blocked, while MspI can also cut some methylated sites [13]. In the present chapter, the whole protocol for the MS-AFLP analysis, applied to the specific case of saffron traceability, is provided, from DNA extraction to MS-AFLP evaluation at the end of the analysis.
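The different sensitivity of the two isoschizomers is what makes a pair of digests informative. The sketch below illustrates how the presence or absence of the same fragment in the EcoRI/HpaII and EcoRI/MspI profiles of one sample could be interpreted. It is not part of the protocol: the four categories follow a scoring convention commonly used in MS-AFLP/MSAP studies, which we assume here, and the function name and labels are hypothetical.

```python
# Keys: (band present in EcoRI/HpaII profile, band present in EcoRI/MspI profile).
# The interpretations are an assumed, commonly used MSAP convention.
PATTERNS = {
    (True, True): "unmethylated CCGG site",
    (True, False): "hemimethylated site (external cytosine)",
    (False, True): "methylated internal cytosine",
    (False, False): "uninformative (heavily methylated or site absent)",
}


def classify_fragment(hpaii_band: bool, mspi_band: bool) -> str:
    """Interpret one fragment scored in both digests of the same sample."""
    return PATTERNS[(bool(hpaii_band), bool(mspi_band))]


# Example: a fragment amplified only from the EcoRI/MspI digest.
print(classify_fragment(hpaii_band=False, mspi_band=True))
```

The last category cannot distinguish a heavily methylated site from a missing restriction site, which is one reason why profiles are compared between samples rather than interpreted fragment by fragment.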

Epigenetic Barcodes for Detection of Adulterated Plant-Derived Products

2 Materials

All solutions must be prepared using ultrapure water and analytical grade reagents. All reagents can be prepared and stored at room temperature (unless indicated otherwise). Disposal of waste materials must be carried out following existing regulations.

2.1 DNA Extraction

Perform DNA extraction using a commercial kit, e.g., the “GenElute Plant Genomic DNA Miniprep Kit” with the addition of PVP (polyvinylpyrrolidone) (see Note 1).

1. PCR amplificability of extracted DNA.
2. Universal primers used for plant DNA barcoding as reported in Table 1 [14].
3. PVP (polyvinylpyrrolidone): 10% solution in water (see Note 2).
4. Reagents for PCR analysis, for each single sample: PCR buffer, Mg2+, dNTPs (deoxyribonucleotide triphosphates), primers, thermostable DNA polymerase, and ultrapure water.
5. PCR thermal cycler(s) for DNA amplification.

2.2 Agarose Gel Electrophoresis

1. 0.5 M EDTA, pH 8.0 [15]. Add 186.1 g of disodium ethylenediaminetetraacetate·2H2O to 800 mL of H2O. Mix vigorously on a magnetic stirrer. Adjust the pH to 8.0 with NaOH (approximately 20 g of NaOH pellets) (see Note 3).
2. Electrophoresis buffer stock: 5× Tris-borate (TBE) [15]. Add about 350 mL of water to a 1 L graduated glass beaker (see Note 4). Weigh and transfer 54 g of Tris base to the beaker and wait for complete dissolution of the salts. Weigh and add 27.5 g of boric acid to the solution and wait for complete dissolution. Add 20 mL of 0.5 M EDTA (pH 8.0). Add water to a volume of 1000 mL and wait for the complete homogenization of the solution. Transfer the final solution into a glass bottle for storage at room temperature (see Note 5).
3. Electrophoresis buffer working solution: 0.5× Tris-borate (TBE) [15]. Add 100 mL of 5× TBE to a 1 L bottle. Add water to a volume of 1000 mL and mix completely by inversion.
4. Agarose gel: Weigh agarose powder into a flask (see Note 6), add 0.5× TBE to the desired final volume (the amounts of agarose and TBE depend on the size of the gel to be prepared), and heat in the microwave (see Note 7). Add a DNA gel stain, e.g., “Midori Green,” 1 μL for each 100 mL of gel, mix the solution, and pour it into the mold. Wait until the gel has solidified completely (see Note 8).
5. UV transilluminator, wavelength, 254–312 nm.
6. Loading dye.
7. Size standard, 100 bp or 200 bp DNA ladder.

Table 1 Universal primers for plant barcoding studies used to test amplificability of DNA via PCR

Primer name (a)   Sequence                      Annealing temperature (°C)   Amplicon size (bp) (b)
matK-KIM1R        ACCCAGTCCATCTGGAAATCTTGGTTC   58                           Around 900
matK-KIM3F        CGTACAGTACTTTTGTGTTTACGAG
rbcL-F            ACCACAAACAGAGACTAAAGC         52                           Around 600
rbcL-R            GTAAAATCAAGTCCACCRCG
ITS-S2F           ATGCGATACTTGGTGTGAAT          58                           Around 400
ITS4              TCCTCCGCTTATTGATATGC

(a) The label of the primers as reported in Fazekas et al. [14]
(b) The size of the expected amplified fragments for saffron is reported in base pairs

2.3 MS-AFLP Analysis of the Saffron Flower Parts

1. Adapters and primers used for the restriction-ligation step and for preselective and selective PCR amplifications for methylation-sensitive analysis are shown in Table 2. The selective primers specific for the EcoRI adapters need to be labeled with fluorescent dyes (see Note 9), but the choice of the preselective and selective primers to be used depends on researcher preference.
2. Preparation of EcoRI and HM adapters: (1) Mix in a single tube 200 pmol EcoRI-Fw and EcoRI-Rev adapters, and add water to a final volume of 100 μL and a final concentration of 2 μM; (2) Mix in a single tube 2000 pmol HM-Fw and HM-Rev adapters, and add water to a final volume of 100 μL and a final concentration of 20 μM. Incubate the adapter solution at 65 °C for 10 min, followed by 10 min at 37 °C and finally 10 min at 25 °C (see Note 10). The adapter solution must be stored at −20 °C until needed.
3. Molecular biology enzymes (e.g., ThermoFisher Scientific): T4 ligase, EcoRI restriction enzyme, methylation-sensitive restriction enzymes MspI and HpaII. All the enzymes are provided with the corresponding buffers for subsequent analyses.
4. PCR reagents, the same as reported subsequently (see Subheading 3.3, DNA amplificability via PCR analysis, step 1).
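The adapter concentrations in item 2 follow directly from the amount and the final volume, since 1 pmol/μL equals 1 μM. The sketch below checks the two stated values and shows how the volume of an oligonucleotide stock could be derived. The 100 μM stock concentration and the function names are assumptions for illustration, not values from this protocol.

```python
def concentration_um(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in umol/L (uM); 1 pmol/uL is equal to 1 uM."""
    if volume_ul <= 0:
        raise ValueError("volume must be positive")
    return amount_pmol / volume_ul


def stock_volume_ul(amount_pmol: float, stock_um: float) -> float:
    """Volume of stock (uL) that contains `amount_pmol` of oligonucleotide."""
    if stock_um <= 0:
        raise ValueError("stock concentration must be positive")
    return amount_pmol / stock_um


# Values stated in item 2: amount of adapter brought to 100 uL.
print(concentration_um(200, 100))   # EcoRI adapter, uM
print(concentration_um(2000, 100))  # HM adapter, uM

# Assumed 100 uM oligonucleotide stocks (hypothetical value).
print(stock_volume_ul(200, 100), "uL of stock for the EcoRI adapter")
print(stock_volume_ul(2000, 100), "uL of stock for the HM adapter")
```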

2.4 Fluorescent Analysis of MS-AFLP Fragments

1. Genetic analyzer ABI Prism 3100 or ABI Prism 3130 (Applied Biosystems), automatic DNA sequencer with 16 capillaries.

Table 2 List of adapters and primers used for MS-AFLP marker analysis

Adapters     Analysis   Typology           Adapter sequence 5′-3′
EcoRI-Fw     MS-AFLP    Adapter ligation   CTCGTAGACTGCGTACC
EcoRI-Rev    MS-AFLP    Adapter ligation   AATTGGTACGCAGTCTAC
HM-Fw        MS-AFLP    Adapter ligation   GATCATGAGTCCTGCT
HM-Rev       MS-AFLP    Adapter ligation   CGAGCAGGACTCATGA

Primer       Analysis   Typology           Primer sequence 5′-3′
E01 (a)      MS-AFLP    Preselective       GACTGCGTACCAATTCA (c)
HM0 (b)      MS-AFLP    Preselective       ATCATGAGTCCTGCTCGGT
E32          MS-AFLP    Selective          TET (d)-GACTGCGTACCAATTCAAC
E38          MS-AFLP    Selective          6FAM-GACTGCGTACCAATTCACT
E40          MS-AFLP    Selective          HEX-GACTGCGTACCAATTCAGC
HM1          MS-AFLP    Selective          ATCATGAGTCCTGCTCGGTAA
HM2          MS-AFLP    Selective          ATCATGAGTCCTGCTCGGTCC
HM3          MS-AFLP    Selective          ATCATGAGTCCTGCTCGGTTC

(a) International codification for AFLP and MS-AFLP primers
(b) Custom codes for MS-AFLP preselective and selective primers
(c) The selective nucleotides of the primers are written in bold
(d) Fluorescent dyes, at the 5′ end of EcoRI selective primers, are underlined

2. 3130xl/3100 genetic analyzer 16-capillary array (Applied Biosystems) for capillary electrophoresis of fluorescent-labeled MS-AFLP fragments.
3. A polymer, e.g., NimaPOP 4 (ABI 3100) or NimaPOP 7 (ABI 3130) (Nimagen), for separation of MS-AFLP fragments during capillary electrophoresis.
4. Hi-Di™ Formamide (Applied Biosystems).
5. GeneScan™ 500 LIZ™ dye Size Standard (Applied Biosystems). The size standard working solution must be prepared, under a chemical hood, as follows: 20 μL of size standard stock diluted in 980 μL of Hi-Di Formamide.
6. GeneScan (or GeneMapper) software (Applied Biosystems) for analyzing and visualizing MS-AFLP profiles.

3 Methods

All procedures must be performed at room temperature unless otherwise specified.

3.1 Sample Set

Pools of tepals, stamens, and stigmas are recovered during saffron flowering to have fresh material for subsequent analyses (see Note 11).

3.2 DNA Extraction and Agarose Gel Electrophoresis

Before the extraction, samples must be ground to a fine powder using a pestle and mortar in the presence of liquid nitrogen.

1. Perform DNA extraction following a customized version of the manufacturer’s instructions (see Note 12).
2. Mix 5 μL of extracted DNA with 3 μL of loading dye and water to a final volume of 20 μL.
3. Run samples on a 1% agarose gel for electrophoresis (100 V for 20 min).
4. After stopping the electrophoretic run, visualize the DNA using Midori Green staining and the UV transilluminator (see Note 13).

3.3 DNA Amplificability Via PCR Analysis

1. PCRs are assembled in 250 μL tubes, in a final volume of 25 μL containing 10 ng of extracted DNA template, 1× PCR buffer, 1.5 mM Mg2+, 0.15 mM dNTPs, 10 pmol of each primer, 5 μL of 10% PVP solution, 1 unit (U) of thermostable DNA polymerase, and ultrapure water to the final volume (see Note 14). Once fully assembled, tubes are inserted into a thermal cycler for PCR amplification.
2. Amplification cycle parameters are: 95 °C for 5 min; 35 cycles of 30 s at 95 °C, 40 s at the corresponding annealing temperature (Table 1), and 1.5 min at 72 °C; and a final extension of 10 min at 72 °C.
3. The products of the PCR reactions are mixed with 3 μL of loading dye and analyzed by agarose gel electrophoresis on a 1.5% agarose gel (100 V for 20 min). Along with the samples, a DNA ladder is loaded to check for the proper size of the amplicons.
4. Visualization is carried out as previously reported (see Subheading 2.2, step 4) (see Note 15).
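The cycling conditions in step 2 can be written down as a small data structure, which makes it easy to swap in the annealing temperature of each primer pair and to estimate the run time. This is a minimal sketch: the class and function names are hypothetical, and the estimate counts only the programmed hold times, ignoring the ramping time of the thermal cycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: float
    seconds: int


def amplificability_program(annealing_c: float):
    """Cycling conditions from step 2; only the annealing temperature varies."""
    initial = Step(95, 5 * 60)
    cycle = (Step(95, 30), Step(annealing_c, 40), Step(72, 90))
    final = Step(72, 10 * 60)
    return initial, cycle, 35, final


def hold_time_seconds(annealing_c: float) -> int:
    """Sum of programmed hold times (ramping between steps is not included)."""
    initial, cycle, n_cycles, final = amplificability_program(annealing_c)
    return initial.seconds + n_cycles * sum(s.seconds for s in cycle) + final.seconds


# Annealing temperatures of the three primer pairs listed in Table 1.
for pair, annealing in {"matK": 58, "rbcL": 52, "ITS": 58}.items():
    minutes = hold_time_seconds(annealing) / 60
    print(f"{pair}: annealing at {annealing} C, about {minutes:.0f} min of hold time")
```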

3.4 MS-AFLP Analysis of the Saffron Flower Parts

1. 200 ng of high-quality DNA are needed for the analysis. The first step of the MS-AFLP analysis is the digestion of the DNA with the restriction enzymes (see Note 16). The whole procedure, referred to the digestion with EcoRI and HpaII, is presented in Fig. 1.
2. EcoRI/HpaII digestions are carried out in two steps. The first is digestion with HpaII: digestions are assembled in 250 μL tubes, in a final volume of 22 μL containing 200 ng of genomic DNA, 1× buffer TANGO (ThermoFisher Scientific), 5 U of HpaII, and ultrapure water to the final volume. Digestions are incubated at 37 °C for 1 h 30 min and, then, after the addition

Fig. 1 The whole procedure for a MS-AFLP analysis is reported starting from the digestion until the visualization after the analysis with automatic DNA genetic analyzers. In this figure, the enzymes used for the digestion are EcoRI and HpaII, shown with their corresponding restriction sites. For HpaII, some unmethylated and methylated sites are represented showing that in the absence of DNA methylation the enzyme is able to cut while in the presence of methylation, its activity is blocked. The same procedure, with the only differences as a consequence of the different influence of methylation states on the enzyme activity, is valid using MspI instead of HpaII

to each sample of 2.75 μL of 10× buffer TANGO and 5 U of EcoRI, are further incubated at 37 °C for an additional 2 h (see Note 16).
3. EcoRI/MspI digestions are assembled in 250 μL tubes in a final reaction volume of 25 μL containing 200 ng of genomic DNA, 2× buffer TANGO (ThermoFisher Scientific), 10 U of MspI, 5 U of EcoRI, and ultrapure water to reach the final volume. Digestions are incubated at 37 °C for 2 h 30 min (see Note 16).
4. Ligation of the adapters (see Note 17). 25 μL of ligation master mix (prepared as reported in Subheading 2.3, item 3) and 3 Weiss units of T4 ligase are added to 25 μL of each of the DNA digests generated in Subheading 3.4, steps 2 and 3; the final volume for the ligation should be 50 μL, made up with ultrapure water. The reactions are incubated in a thermal cycler for 2 h 30 min at 37 °C. After the incubation, the restriction-ligation reaction (50 μL) is diluted by adding 950 μL of ultrapure water.
5. Preselective amplification (see Note 18). PCRs are assembled in 250 μL tubes, in a final volume of 25 μL containing 5 μL of diluted restriction-ligation reaction, 1× PCR buffer, 1.5 mM Mg2+, 0.25 mM dNTPs, 10 pmol of E01 and 10 pmol of HM0 preselective primers (Table 2), 1 unit (U) of thermostable DNA polymerase, and ultrapure water to the final volume. Once fully assembled, tubes are inserted into a thermal cycler for PCR amplification. The amplifications are carried out as follows: 94 °C for 5 min; 35 cycles at 94 °C for 30 s, 56 °C for 1 min, and 72 °C for 1 min; and final extension at 72 °C for 10 min.
6. Evaluation and dilution of preselective PCRs. 10 μL of preselective PCRs are analyzed and visualized by means of agarose gel electrophoresis. The concentration of the gels must be at least 1.5% (see Note 19). In the presence of positive amplifications, the remaining 15 μL of preselective PCR must be diluted by adding 210 μL of water.
7. Selective amplifications (see Note 20). PCRs are assembled in 250 μL tubes, in a final volume of 20 μL containing 3 μL of diluted restriction-ligation, 1× PCR buffer, 1.5 mM Mg2+, 0.4 mM dNTPs, 3 pmol of EcoRI fluorescent selective primers and 5 pmol of HM0 selective primers (Table 2), 1 unit (U) of thermostable DNA polymerase, and ultrapure water to the final volume. The amplifications are carried out as follows: 94 °C for 5 min; 10 cycles at 94 °C for 1 min, 65 °C for 1 min (reducing the annealing temperature by 1 °C/cycle), and 72 °C for 1.5 min; 25 cycles at 94 °C for 1 min, 56 °C for 1 min, and 72 °C for 1.5 min; and final extension at 72 °C for 10 min.
8. Analysis with ABI 3100 genetic analyzer. For each sample under analysis, 1 μL of selective PCR product is mixed with 9 μL of size standard working solution prepared as reported in

Epigenetic Barcodes for Detection of Adulterated Plant-Derived Products

Subheading 2.4, item 5. Samples are denatured at 95 °C for 5 min and then cooled and kept at 4 °C until analysis with the ABI Prism 3100 genetic analyzer (see Note 21). Samples are loaded and run on the ABI Prism 3100 genetic analyzer, where they are subjected to capillary electrophoresis following the manufacturer’s instructions. Fluorescence emitted by the dyes upon laser excitation is collected by a CCD camera and sent to the computer for analysis.
9. Analysis and visualization of MS-AFLP profiles. Raw fluorescent signals collected by the CCD camera are analyzed and visualized using the software provided with the platform (e.g., GeneScan for the ABI Prism 3100 or GeneMapper for the ABI Prism 3130). MS-AFLP profiles are visualized as a series of fluorescence peaks of different heights and colors, which can be modified according to the needs of the analysis (see Note 22). An example of MS-AFLP profiles is reported in Fig. 2.
10. Evaluation of polymorphisms. Polymorphisms are evaluated by comparing the profiles of interest and scoring fluorescence signals for the presence/absence of the peaks (Fig. 2) (see Note 23). If analyses beyond a simple visual inspection are required, polymorphic signals can be scored as 1 for presence and 0 for absence to obtain a binary matrix. This matrix can subsequently be analyzed with dedicated software for the downstream analysis required.
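The presence/absence scoring described in step 10 can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis pipeline: the sample names, the peak sizes (in bp), and the 0.5 bp tolerance used to decide that two peaks represent the same fragment are hypothetical values chosen for the example.

```python
# Hypothetical sized peaks (bp) per sample; in practice these would be
# exported from the fragment-analysis software.
peaks = {
    "stamens": [102.1, 150.3, 233.8],
    "tepals": [102.0, 233.9, 310.4],
    "stigmas": [150.2, 310.5],
}
TOLERANCE_BP = 0.5  # assumed window for treating two peaks as one fragment


def score_binary(peaks_by_sample, tol=TOLERANCE_BP):
    """Return (locus_labels, matrix) with 1 = peak present, 0 = absent."""
    all_sizes = sorted({s for sizes in peaks_by_sample.values() for s in sizes})
    bins = []
    for size in all_sizes:
        if bins and size - bins[-1][-1] <= tol:
            bins[-1].append(size)   # close to the previous peak: same locus
        else:
            bins.append([size])     # otherwise start a new locus
    # Label each locus by the mean size of the peaks assigned to it.
    labels = [sum(b) / len(b) for b in bins]
    matrix = {
        sample: [int(any(s in b for s in sizes)) for b in bins]
        for sample, sizes in peaks_by_sample.items()
    }
    return labels, matrix


labels, matrix = score_binary(peaks)
for sample, row in matrix.items():
    print(sample, row)
```

Each row of the resulting matrix is one sample and each column one fragment, which is the layout most downstream tools for dominant markers expect.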

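The touchdown program used for the selective amplification (step 7) can be checked by listing the annealing temperature of every cycle. The sketch below assumes that the first touchdown cycle anneals at 65 °C and that each following touchdown cycle is 1 °C lower; the function name and parameter names are illustrative.

```python
def touchdown_profile(start_c=65, step_c=1, touchdown_cycles=10,
                      plateau_c=56, plateau_cycles=25):
    """Return the annealing temperature (°C) used in each PCR cycle."""
    # Touchdown phase: the annealing temperature drops by step_c every cycle.
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    # Plateau phase: constant annealing temperature.
    plateau = [plateau_c] * plateau_cycles
    return touchdown + plateau


profile = touchdown_profile()
print(len(profile))   # total number of cycles: 35
print(profile[:10])   # [65, 64, 63, 62, 61, 60, 59, 58, 57, 56]
print(profile[10])    # 56
```

Under this reading, the last touchdown cycle already anneals at 56 °C, the temperature kept for the remaining 25 cycles.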
4 Notes

1. This kit was selected as the most appropriate for our purposes based on the results of a comparison among different commercial kits developed for recovering DNA from plant tissues and plant-derived products [6].
2. Store the solution at 4 °C to extend its stability and to reduce the risk of microbial and fungal proliferation.
3. The disodium salt of EDTA will not go into solution until the pH of the solution is adjusted to approximately 8.0 by the addition of NaOH.
4. A small amount of water already in the beaker makes the salts easier to dissolve.
5. A precipitate frequently forms if TBE is stored for prolonged periods. To avoid problems, discard any batches that develop a precipitate.
6. Agarose gels are prepared on a w/v basis. The gel concentration depends on the size of the DNA fragments to be analyzed. For genomic DNA, a 0.8–1.0% agarose gel is recommended, while for

Matteo Busconi et al.

Fig. 2 MS-AFLP profiles generated at the end of the analysis. (a) Profiles generated by digestion with EcoRI and HpaII using the selective primer combination E38/HM2. (b) Profiles generated by digestion with EcoRI and MspI using the selective primer combination E38/HM2. In both cases, the first panel shows the profile for stamens, the second for stamens without anthers (i.e., filaments), the third for tepals, and the last for stigmas. In both figures, polymorphic signals are apparent, indicating differences in the DNA methylation profiles of the floral parts

DNA amplificability PCR and MS-AFLP preselective PCR, as the amplicons are smaller than genomic DNA, gels with a minimum agarose concentration of 1.5% are recommended.
7. Heat the solution until boiling. Boil for a few seconds to obtain a clear solution. Weigh the solution and compensate for any loss of volume by adding 0.5× TBE to restore the volume to 100 mL.
8. It is better to let the gel solidify under a chemical hood so that molecules released from the hot solution are aspirated.
9. In fluorescent MS-AFLP analysis, primers designed on the EcoRI adapters are labeled with fluorescent molecules so that the products can be run on automatic capillary DNA genetic analyzers. The use of fluorescent dyes, instead of radioactivity, improves operator safety.
10. Adapters are designed to bind to the sites generated by restriction digestion, making it possible to design primers for preselective and selective PCR amplifications. More specifically, EcoRI adapters bind to the sites generated by EcoRI digestion, while HM adapters bind to the sites generated by digestion with either HpaII or MspI (isoschizomers).
11. It is recommended to start with fresh material to reduce the risk of DNA degradation caused by prolonged storage of saffron parts. After collection, the different parts of the saffron flower must be processed as soon as possible. If this is not feasible, samples must be lyophilized and stored at room temperature or below until the analysis.
12. The starting material for DNA extraction can vary according to the plant material available: extraction can be carried out from up to 100 mg of fresh tissues or up to 10 mg of freeze-dried tissues (although in our experience 20 mg is preferable). In both cases, before the first addition of buffers, 4% PVP powder was added directly to the ground plant material. The percentage of PVP powder is calculated based on the amount of plant material. Plants are naturally rich in secondary metabolites such as phenolic compounds: these molecules can hinder the recovery of DNA and/or the subsequent enzymatic reactions, and the addition of PVP during DNA extraction can facilitate the removal of phenolic compounds, increasing the purity of the recovered DNA. Further modifications to the original protocol are: (1) all centrifugation steps are carried out at 4 °C; (2) the incubation time, to break cells and release DNA, is increased from 10 to 20 min; (3) the centrifugation step to precipitate debris is increased from 5 to 10 min; (4) the number of column washes is increased to three, instead of the two

proposed in the original protocol; (5) final elution is carried out in 50 μL, instead of 100 μL, of elution buffer.
13. Agarose gel electrophoresis makes it possible to quickly gain information on both the quantity and the quality of the extracted DNA. Concerning quantity, the stronger the signal, the higher the DNA concentration in the purified sample. Concerning quality, a well-defined band at the top of the gel is indicative of high-quality DNA, while a smear is indicative of DNA degradation. The stronger the smear, the greater the likely degradation. For an analysis such as MS-AFLP, starting with high-quality DNA is fundamental to increase the reliability and reproducibility of the analysis itself.
14. DNA is added directly to each tube, while all the other reagents are used to prepare a PCR master mix, at a volume determined by the number of samples to be amplified, which is subsequently aliquoted into the individual tubes. The same approach can be used for all the PCRs and the other reactions in the MS-AFLP procedure. In addition to its use during DNA extraction, PVP is also used during PCR amplification to improve the amplificability of the recovered DNA.
15. The presence of fragments of the desired size, according to the different primer pairs, is indicative of pure DNA that can be amplified via PCR and used for the subsequent MS-AFLP analysis. On the other hand, failure of these PCRs typically indicates a high concentration of polymerase inhibitors, which requires further purification or repetition of the DNA extraction. A positive amplification with the ITS marker (using ITS-S2F and ITS4 primers) indicates the presence of amplifiable nuclear DNA. A positive amplification with the RUBISCO (rbcL-F and rbcL-R primers) and maturase K markers (matK-KIM1R and matK-KIM3F primers) indicates the presence of amplifiable chloroplast DNA.
16. In a classical MS-AFLP analysis, two independent analyses are considered: (1) using EcoRI and HpaII and (2) using EcoRI and MspI. The influence of DNA methylation on EcoRI activity is usually considered negligible, while the ability of the isoschizomer pair HpaII and MspI to cleave the corresponding site is heavily affected by the methylation state of cytosines. Changing the enzymes used in the digestion completely changes the final profiles of fragments obtained at the end of the analysis. EcoRI recognizes a 6-base site (with sequence 5′-GAATTC-3′), while the isoschizomer pair HpaII/MspI recognizes a 4-base site (5′-CCGG-3′). For each digestion, it is possible to add the enzymes at the same time or at different times according to their efficiency with the restriction

buffer provided and used (this can vary according to the supplier). Some buffers are specific for single enzymes, while others are designed to allow simultaneous digestion with more than one enzyme. In this second case, the efficiency of the enzymes may be lower than 100%. If this happens, efficiency can be increased in different ways: (1) by increasing the amount of buffer (above the classical 1× final concentration); (2) by increasing the amount (the units) of the less efficient enzyme to compensate for its lower activity; (3) by starting the digestion with a single enzyme and then, after the incubation step, adjusting the restriction conditions for the second enzyme; (4) by changing the duration of the incubation step. Online tools, such as Double Digest Calculator (ThermoFisher Scientific) or NEBcloner (New England Biolabs), can help in finding the most appropriate conditions for simultaneous digestions with multiple enzymes. In the present protocol, we used enzymes and the corresponding buffers provided by ThermoFisher Scientific and optimized the various steps. Changing supplier may require minor adjustments of the restriction conditions. In our experiment, for the EcoRI/HpaII digestion, HpaII was added at the beginning and EcoRI afterwards, while for the EcoRI/MspI digestion both enzymes were added simultaneously at the beginning, with an increased amount of MspI. All the subsequent steps, ligation of the adapters and preselective and selective PCRs, were identical for the two analyses.
17. The two steps, restriction and ligation, can be performed separately or simultaneously depending on the protocol used. In our case, we decided to carry out the restrictions with the selected enzymes first and then the ligation of the adapters to the fragments resulting from the two double restrictions.
18. Preselective PCR is the first of the two PCRs required in the MS-AFLP procedure; it amplifies a discrete set of DNA fragments from the bulk of fragments generated by the enzymatic digestion. Primers are designed based on the sequence of the adapters and of the restriction sites, with one extra (preselective) nucleotide. Preselective nucleotides are important to select a specific subset of fragments, from the digestion bulk, to be amplified and analyzed. Changing the preselective nucleotide changes the subset of fragments that will be amplified.
19. In the presence of a positive preselective amplification, a more or less intense smear, corresponding to each sample, should be clearly visible. The smear is made up of a multitude of DNA

fragments that have been amplified. Within the smear, stronger bands indicate the presence of more abundant fragments.
20. Selective amplification is the last PCR in the procedure. The result of this analysis is a set of discrete fragments that can be easily visualized and analyzed. Selective primers designed on the EcoRI adapters, assuming the analysis is to be carried out with automatic capillary sequencers, are labeled with fluorescent dyes to allow visualization on these platforms. Selective primers are identical to preselective primers; apart from the fluorescent labeling, the only difference is the addition of further selective nucleotides at the 3′ end of the primer sequence. The number of selective nucleotides is based on the size of the genome to be analyzed. For small genomes, a single selective nucleotide, in addition to the preselective one, can be enough. For big genomes, the addition of more selective nucleotides, two or three in addition to the preselective one, is recommended. As the saffron genome is very big, with an estimated size in excess of 10 Gb, selective primers had two bases added in addition to the one on the preselective primers. Changing the selective nucleotides, or the primer combinations, results in the amplification of different subsets of DNA fragments from the preselective PCR bulk. In the present protocol, the following primer combinations were used: E38/HM2, E40/HM1, and E32/HM3 (see Table 2 for primer information).
21. Before running on the automatic sequencer, a mixture of size standard and formamide is added to each sample. Before loading, samples must be denatured by heating at 95 °C and then cooled on ice (or in the thermal cycler). Denaturation and subsequent cooling in the presence of formamide are necessary to separate the two strands of the DNA fragments and to keep them separated, preventing the restoration of the double helix and the possible formation of secondary structures within single strands or other artifacts. This is extremely important to ensure correct migration of the fragments during capillary electrophoresis. The size standard is a DNA ladder consisting of a set of fragments labeled with a different fluorescent dye. A standard is necessary for the subsequent analysis in order to correctly size the MS-AFLP fragments and to precisely align and compare MS-AFLP profiles from different samples (making it easier to visualize any differences).
22. The height of the peaks is partially correlated with the abundance of the fragment in the selective amplification. The size standard is used to determine the correct size of the sample fragments. In contrast to standard gel electrophoresis, where the size standard is loaded only once, in capillary electrophoresis each sample

is mixed with size standard and sized independently of the other samples. The color of the profiles is based on the emitted fluorescence, but it can be modified by the operator as needed, making it possible to superimpose different profiles in a single panel for a better evaluation of polymorphisms.
23. As with classical AFLP, MS-AFLP fragments are in effect dominant multilocus molecular markers: multilocus because several DNA regions can be analyzed simultaneously, dominant because the polymorphism is a presence/absence signal. Hence, it is not possible to distinguish a homozygous from a heterozygous sample. While this can be a limitation if specific analyses are required, it is not a major issue in the case of traceability, where it is sufficient to score the presence of differences from an expected profile. Two important aspects must be considered when carrying out an MS-AFLP analysis: (1) the more intact the DNA, the better, as multilocus markers are highly susceptible to DNA degradation, and the more degraded the DNA, the less reproducible the profile across replicates; (2) it is recommended to replicate the same sample at least two or three times to obtain a more reliable profile (Fig. 2). Peaks that are consistently present can be considered true fragments, while peaks present in just one of the replicates are very likely to be the consequence of unspecific amplification during the selective PCR.
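The replicate check recommended in Note 23 can be sketched as a simple comparison of binary scores across replicates. The example below is a minimal illustration under stated assumptions: the replicate scores are hypothetical, and the labels ("consistent", "suspect", "ambiguous", "absent") are names introduced here, not categories defined by the protocol.

```python
# Hypothetical binary scores (1 = peak present) for three replicates of one
# sample, one entry per locus in the same order.
replicates = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
]


def classify_loci(reps):
    """Label each locus by how consistently its peak appears across replicates."""
    n = len(reps)
    labels = []
    for scores in zip(*reps):
        hits = sum(scores)
        if hits == n:
            labels.append("consistent")   # present in every replicate
        elif hits == 0:
            labels.append("absent")       # never observed
        elif hits == 1:
            labels.append("suspect")      # present in a single replicate only
        else:
            labels.append("ambiguous")    # present in some, but not all, replicates
    return labels


locus_labels = classify_loci(replicates)
print(locus_labels)
```

Only the loci labeled "consistent" would then be carried forward as true fragments; how to treat "ambiguous" loci is a judgment call that the protocol does not specify.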

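The dilutions in steps 4 and 6 and the master-mix approach of Note 14 reduce to simple arithmetic, sketched below. The function names are illustrative, and the single extra reaction added as a pipetting allowance is an assumption, not a value from the protocol.

```python
def dilution_factor(sample_ul, diluent_ul):
    """Fold dilution obtained by adding diluent to a sample."""
    return (sample_ul + diluent_ul) / sample_ul


def master_mix_volume(n_samples, final_ul, template_ul, extra_reactions=1):
    """Total master-mix volume (µL) when the template is added tube by tube.

    extra_reactions is an assumed pipetting allowance, not a protocol value.
    """
    per_reaction_ul = final_ul - template_ul
    return (n_samples + extra_reactions) * per_reaction_ul


print(dilution_factor(50, 950))   # restriction-ligation: 50 µL + 950 µL water
print(dilution_factor(15, 210))   # preselective PCR: 15 µL + 210 µL water
print(master_mix_volume(8, final_ul=25, template_ul=5))  # eight preselective PCRs
```

With the volumes given in the protocol, the restriction-ligation reaction is diluted 20-fold and the preselective PCR 15-fold; eight preselective PCRs plus one spare reaction need 180 µL of master mix.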
Acknowledgments

This work has been partially supported by the European Union COST FA1101 Action “Omics Technologies for Crop Improvement, Traceability, Determination of Authenticity, Adulteration and Origin in Saffron” and the Spanish project INIA RFP201400012 “Conservation and management of germplasm collections of the Spanish Network in Bank of Plant Germplasm of Cuenca.”

References

1. Fernández JA (2004) Biology, biotechnology and biomedicine of saffron. Recent Res Dev Plant Sci 2:127–159
2. Food Fraud Database, Decernis Washington. https://www.foodfraud.org. Accessed on 24 September, 2015
3. European Saffron White Book. Available online: www.europeansaffron.eu/archivos/White%20book%20english.pdf
4. Babaei S, Talebi M, Bahar M (2014) Developing an SCAR and ITS reliable multiplex PCR-based assay for safflower adulterant detection in saffron samples. Food Control 35:323–328
5. Marieschi M, Torelli A, Bruni R (2012) Quality control of saffron (Crocus sativus L.): development of SCAR markers for the detection of plant adulterants used as bulking agents. J Agric Food Chem 60:10998–11004
6. Soffritti G, Busconi M, Sánchez RA, Thiercelin JM, Polissiou M, Roldán M, Fernández JA (2016) Genetic and epigenetic approaches for the possible detection of adulteration and autoadulteration in saffron (Crocus sativus L.) spice. Molecules 21:343
7. Busconi M, Colli L, Sánchez RA, Santaella M, De-Los-Mozos Pascual M, Santana O, Roldán M, Fernández JA (2015) AFLP and MS-AFLP analysis of the variation within Saffron Crocus (Crocus sativus L.) Germplasm. PLoS One 10:e0123434
8. Widman N, Feng S, Jacobsen SE, Pellegrini M (2014) Epigenetic differences between shoots and roots in Arabidopsis reveals tissue-specific regulation. Epigenetics 9:236–242
9. Busconi M, Soffritti G, Stagnati L, Marocco A, Marcos Martínez J, De Los Mozos Pascual M, Fernandez JA (2018) Epigenetic stability in Saffron (Crocus sativus L.) accessions during four consecutive years of cultivation and vegetative propagation under open field conditions. Plant Sci 277:1–10
10. Lu G, Wu X, Chen B, Gao G, Xu K (2007) Evaluation of genetic and epigenetic modification in rapeseed (Brassica napus) induced by salt stress. J Integr Plant Biol 49:1599–1607
11. Marconi G, Pace R, Traini A, Raggi L, Lutts S, Chiusano M et al (2013) Use of MSAP markers to analyse the effects of salt stress on DNA methylation in rapeseed (Brassica napus var. oleifera). PLoS One 8:e75597
12. Vos P, Hogers R, Bleeker M, Reijans M, van de Lee T, Hornes M, Frijters A, Pot J, Peleman J, Kuiper M, Zabeau M (1995) AFLP: a new technique for DNA fingerprinting. Nucleic Acids Res 23:4407–4414
13. Fulneček J, Kovařík A (2014) How to interpret methylation sensitive amplified polymorphism (MSAP) profiles? BMC Genet 15:2
14. Fazekas AJ, Kuzmina ML, Newmaster SG, Hollingsworth PM (2012) DNA barcoding methods for land plants. In: Kress WJ, Erickson DL (eds) DNA barcodes: methods and protocols, methods in molecular biology, 1st edn. Wiley, Weinheim
15. Sambrook J, Fritsch EF, Maniatis T (1989) Molecular cloning: a laboratory manual, 2nd edn. Cold Spring Harbor Laboratory Press, New York

Chapter 17

Plant Epigenetic Stress Memory Induced by Drought: A Physiological and Molecular Perspective

James Godwin and Sara Farrona

Abstract

Drought stress is one of the most common stresses encountered by crops and other plants and leads to significant productivity losses. It commonly happens that drought stress occurs more than once during the plant’s life cycle. Plants suffering from drought stress can adapt their life strategies to acclimate and survive in many different ways. Interestingly, some plants have evolved a stress response strategy referred to as stress memory which leads to an enhanced response the next time the stress is encountered. The acquisition of stress memory leads to a reprogrammed transcriptional response during subsequent stress and subsequent changes both at the physiological and molecular level. Recent advances in understanding chromatin dynamics have demonstrated the involvement of chromatin modifications, especially histone marks, associated with drought stress-responsive memory genes and subsequent enhanced transcriptional responses to repeated drought stress. In this chapter, we describe recent progress in this area and summarize techniques for the study of plant epigenetic responses to stress, including the roles of ABA and transcription factors in superinduced transcriptional activation during recurrent drought stress. We also review the possible use of seed priming to induce stress memory later in the plant life cycle. Finally, we discuss the potential implications of understanding the epigenetic mechanisms involved in plant stress memory for future applications in crop improvement and drought resistance.

Key words Drought, Abscisic acid (ABA), Priming, Stress memory, Transcriptional reprogramming, Histone marks

1 Introduction

Plants are sessile organisms. They experience constant environmental challenges throughout their life cycles and this problem is likely to be exacerbated in the future by climate change. Drought stress is one of the most common environmental stresses in plants and leads to negative impacts on plant growth and development which ultimately affects crop productivity [1]; hence, plant responses to dehydration stress have been studied at different levels (physiological, cellular, and genome levels). It is predicted that climate change will increase the frequency of drought events leading to higher crop

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_17, © Springer Science+Business Media, LLC, part of Springer Nature 2020


yield losses [2, 3]. For instance, a recent meta-analysis of data collected from 1980 to 2015 revealed that 21% and 40% yield reductions in this period were due to drought in wheat (Triticum aestivum L.) and maize (Zea mays L.), respectively [4]. Acute responses to drought stresses are comparatively well studied, but drought stresses are often recurring and responses to repetitive stress are much less well understood. Exposure to repeated stresses may exert an evolutionary pressure, which in turn leads to the development of sophisticated mechanisms to enable the plant to survive and cope with the stress. Plants exposed to recurrent periods of droughts and recovery have shown higher leaf water retention, less wilting, and improved tolerance to subsequent drought stress than plants encountering the stress for the first time [5–10]. These altered responses to periodical abiotic stress (drought) are referred to as hardening, acclimation, or priming and have led to the concept of “stress memory” that primes the plant for a faster and stronger response upon subsequent stress exposure events. In addition to altered physiological parameters, some drought stress-responsive genes display transcriptional responses that under repeated exposures are significantly different from the responses during their first contact with the stress [7]. Indeed, there is growing evidence that one of the most likely elements of stress memory is a modified transcriptional response ([11]; Fig. 1). From an evolutionary perspective, stress memory may be an effective strategy to prepare the plant for subsequent stress. In addition, stress memory improves the potential for local acclimation to changing environments. Nevertheless, stress memory may be associated with negative effects such as delayed growth and development, reduced yield, and possible risk of maladaptive memory [12, 13]. 
Hence, stress memory mechanisms might have evolved as an adaptive strategy to increase resistance against stress factors but may also compromise overall plant performance, including with respect to human use, leading to trade-offs between stress survival and yield. Priming plants against future abiotic stress, including drought, can also be elicited by exogenous application of chemical treatments (e.g., salicylic acid or β-aminobutyric acid) [5, 14, 15]. Generally, priming provokes a response pattern by deploying physiological, biochemical, and molecular mechanisms without the costs associated with constitutive expression of stress genes for enhanced protection [16] (Fig. 1). Even though it is widely accepted that plants can acquire stress memory, in many instances these memories can also be counterbalanced by resetting (or “forgetting”) during recovery and subsequent growth [12]. In most cases, memories formed as a result of stress are therefore relatively short term and persist for only a fraction of the life span of the organism, which is termed somatic memory [17, 18]: in

Fig. 1 Chromatin changes associated with drought stress memory and their transcriptional responses. When a naïve plant is exposed to a first drought stress (S1), the plant acquires a stress memory in the form of the histone mark H3K4me3, which keeps the chromatin in a permissive state. However, S1 is followed by a period of recovery (R) during which the plant either retains the memory and remains in a primed state or resets [12]. Higher H3K4me3 enrichment levels at drought stress-responsive memory genes (e.g., RD29B) are maintained even after recovery from S1. When a second drought stress (S2) occurs, abscisic acid-responsive factors (ABFs) bind to ABA response elements (ABREs) lying adjacent to each other in the promoter region, which leads to transcriptional activation of RD29B [7]. Transcriptional stress memory in response to drought stress was also associated with Ser5P RNA polymerase II (RNAP II) marks (Ding et al. 2012). Higher accumulation of Ser5P marks stalls RNAP II during the recovery phase. Subsequently, higher transcript levels are produced in S2 compared to S1 from the “superinduced” memory genes (represented by the fractured pink peaks), and memory genes also show a more rapid response in S2 than in S1. Non-memory genes show similar transcript levels during S1 and S2 (represented by the non-fractured blue peaks) and a slower response

other words, a somatic memory whose duration is limited to one generation and may be mitotically, but not meiotically, heritable [19]. Stress memory that can be passed on to the next generation or even further (transgenerational stress memory) has been recently discussed [19–22] but remains controversial. Despite some published evidence for stress memory being passed from parents to their progeny and beyond, it is hard to discount the possibility of other mechanisms such as maternal effects, and, importantly, unambiguous evidence for stable transmission of altered chromatin states controlled by stress effects to progeny has not been documented [23–26]. Even in the case of somatic memory retained

within a single generation, the underlying mechanisms are still not fully understood, although it has been discussed that it is likely to require signaling pathways, altered accumulation of transcription factors, and epigenetic modifications [27, 28]. Theoretically, epigenetic changes induced during priming could involve DNA methylation at different contexts, histone modification, or chromatin remodeling, leading to changes in gene expression [18, 20, 29]. Chromatin is the three-dimensional packaging of DNA and associated proteins compacted within the nucleus. The structure and compaction of chromatin plays an important role in maintaining genome integrity and the accessibility of the transcriptional machinery to genes [30]. The basic unit of chromatin organization is the nucleosome, in which approximately 147 base pairs of DNA wrap around histone octamers with two molecules of each histone, H2A, H2B, H3, and H4 [31]. In addition, the linker histones H1 bind the stretches of DNA between two nucleosomes [32]. During stress exposure in plants, various modifications take place at N-terminal tails of histones, including methylation, acetylation, ubiquitination, and phosphorylation [33, 34]. Some of the best known of these modifications include histone H3 lysine 4 trimethylation (H3K4me3) and histone H3 K9 acetylation, which typically correspond with areas of active gene transcription, whereas histone deacetylation and trimethylation of the lysine 27 of the H3 (H3K27me3) are often associated with gene repression [35]. This has contributed to our emerging understanding of the role of chromatin modifications in determining gene activity under drought stress [36]. Rapid modification of histone marks and subsequent inactivation of drought-inducible genes happen during the recovery period after drought stress [37]. 
Recent studies have examined the connections between somatic stress memories and histone marks associated with the chromatin at drought stress-responsive genes [18, 19, 38]. These studies suggest that dehydration stress-induced somatic memory depends on many factors, including the magnitude and duration of the stress, the genotype of the plant in question, and the growth phase in which the stress memory is stored [18]. In the following sections, we will describe our current knowledge of chromatin modifications associated with drought-induced somatic transcriptional memory and with physiological stress memory developed in response to recurring drought stress.
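The distinction between memory and non-memory genes introduced above (a response to a repeated stress that differs from the response to the first stress) can be illustrated with a toy classification. This is a hedged sketch only: the gene names, transcript levels, and the twofold cut-off are hypothetical, and a real analysis would rely on replicated measurements and statistical testing.

```python
# Hypothetical transcript levels (arbitrary units) after a first drought
# stress (S1) and after a second drought stress (S2).
expression = {
    "geneA": {"S1": 8.0, "S2": 24.0},   # stronger response in S2
    "geneB": {"S1": 8.0, "S2": 8.5},    # similar response in S1 and S2
}
MIN_FOLD = 2.0  # assumed cut-off for calling the S2 response different from S1


def classify(levels, min_fold=MIN_FOLD):
    """Call a gene 'memory' if its S2 level differs from S1 by at least min_fold."""
    ratio = levels["S2"] / levels["S1"]
    if ratio >= min_fold or ratio <= 1 / min_fold:
        return "memory"
    return "non-memory"


calls = {gene: classify(levels) for gene, levels in expression.items()}
print(calls)
```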

2 Mechanisms of Drought Stress Memory

2.1 Plant Physiological Perspective

Drought stress negatively affects plants at different levels: developmental (e.g., reduced germination, plant height, plant biomass), physiological (e.g., reduced water content, photosynthetic activity, pigment content, membrane integrity), biochemical (e.g.,

accumulation of osmoprotectants like proline, sugars, and antioxidants), and molecular (e.g., altered expression of stress-related genes) [36]. Plants have evolved a plethora of mechanisms to withstand recurring drought stress, for instance, by establishing drought stress memory through the photoinhibition of photosynthesis preceded by an increase in abscisic acid (ABA) levels [9, 10]. ABA plays a crucial role in stomatal closure and in maintaining the water balance of the plant during drought stress [39]. ABA is a key signaling molecule that initiates and regulates transcription of many drought stress-responsive genes, especially those encoding dehydrin-like proteins; it also promotes the production of protective osmolytes and helps maintain membrane structures [40]. Some reports have shown that ABA may play a role in drought stress memory in the short term, i.e., for days or weeks [7, 41]. In Arabidopsis thaliana (Arabidopsis), transcription of several ABA-induced genes was found to be increased in response to repeated dehydration, sustaining the leaf water content through lower transpiration rates and, thus, maintaining a physiological memory [7, 42, 43]. In agreement with this, ABA levels were higher in twice-stressed Aptenia cordifolia and rice plants than in once-stressed plants, and hence it was concluded that drought stress memory occurs after repetitive stress in these plants [41, 44]. Moreover, it was shown that guard cell-specific stomatal memory mediated by ABA maintains a partially closed stomatal aperture during the recovery period in Arabidopsis plants [43]. These studies highlight the involvement of ABA in establishing a short-term physiological memory under repeated drought conditions [42, 43]. Many studies have shown that drought stress priming at early developmental stages effectively alleviates drought stress during the later growth stages in various crops [45–47]. 
In wheat, a second drought stress exposure resulted in improved H2O2 scavenging and control of reactive oxygen species (ROS) levels by enhanced functionality of glutathione reductase, dehydroascorbate reductase, and ascorbate peroxidase [48]. Pre-stressed rice seedlings developed an efficient antioxidative machinery (involving peroxidase and superoxide dismutase activity) to deter the harmful effects caused by oxidative damage in response to subsequent drought stress [49]. In addition to decreased accumulation of ROS, the drought-acclimated wheat seedlings also showed reduced membrane damage and higher water retention compared to the non-acclimated controls [45, 46, 50]. In agreement with the physiological responses, accumulation of proteins including Rubisco small subunit, Rubisco activase, and ascorbate peroxidase was observed in the primed plants [47]. Hence, the increased resistance in primed plants against drought stress reduces their short-term productivity through the reduction of photosynthesis but, in this way, leads to greater productivity in the longer term [27].


James Godwin and Sara Farrona

2.2 Cross-Tolerance Between Different Stresses

Drought stress priming can also induce cross-tolerance to other stresses [50, 51]. For example, drought stress applied at the vegetative stage (i.e., stem elongation) in spring wheat was shown to stimulate cross-tolerance to heat and drought stress during the grain filling stage, resulting in reduced grain yield loss compared to the non-primed plants [51]. The drought-primed plants showed higher photosynthetic rates under heat stress than non-primed plants because of a higher carboxylation rate and lower energy dissipation [51]. Furthermore, drought priming by exposure of plants to mild drought stress has been reported to improve heat tolerance in various plant species [27, 52, 53]. In the grass tall fescue (Festuca arundinacea), enhanced heat tolerance in drought-primed plants was attributed to the accumulation of phospholipids and glycolipids involved in membrane stabilization and stress signaling [52]. Chen and Arora (2013) extensively reviewed the mechanisms behind stress tolerance acquired by seed priming and proposed that improved stress tolerance in primed germinating seeds may be partially due to enhanced germination potential and acquisition of a “stress memory” as a result of abiotic stresses experienced during seed priming [54]. Several studies have shown that seeds lose their desiccation tolerance (DT) upon germination [55–57]. However, reinduction of DT in germinated seeds by priming with polyethylene glycol (PEG) was first reported in 1995 [58]. In fact, Arabidopsis seeds treated with PEG reinduced DT mechanisms after germination. Strikingly, improved survival in PEG-treated plants was still present for at least 5–10 days after rehydration [57]. Re-establishment of DT in germinated desiccation-sensitive seeds has been shown in several other plants [58–60].
Despite our current knowledge of the impact of priming on the physiology of the seed, there have been far fewer attempts to understand the molecular memory mechanisms of DT induced by seed priming [61, 62]. The amino acid proline seems to be a critical component of drought tolerance in plants [63], and higher concentrations of proline accumulated after recurring drought stress in rice [44]. In contrast, lower proline content was observed during the second and third stresses compared to the first stress in sugar beet. Moreover, biochemical parameters like chlorophyll fluorescence and osmotic potential do not correlate with the physiological memory responses [64]. Taken together, these results point to high species specificity of these responses. Current advances in “omics” techniques will enable us to further understand the specific consequences of plant/seed priming, the interactions among genes, proteins, and metabolites in response to it, and the increased stress tolerance that arises through the generation of a stress memory in different species.

Plant Epigenetic Stress Memory Induced by Drought


2.3 Reprogramming of Transcriptional Memory Induced by Drought Stress

Transcriptional reprogramming is a common feature of the primed state. Whole genome transcriptomic analysis of drought-primed plants revealed the presence of diverse transcriptional response patterns. In Arabidopsis, around 2000 genes showed memory responses with respect to drought [42]. Four distinct memory responses were recognized, based on the level of transcripts produced in subsequent stresses compared to the first stress, suggesting that the transcriptional response to priming is complex [42]. Our focus in this review is specifically on transcriptional memory responses in which increased transcript levels are produced from the memory genes upon repeated drought stress (Fig. 1). Other transcriptional memory responses include enhanced repression, loss of induction, and loss of repression of genes during subsequent stress. This study also revealed some conserved stress acclimation features between Arabidopsis and maize, but also differential regulation of orthologous genes depending on the species, indicating both evolutionary conservation and divergence in drought stress response and memory [42]. In particular, memory genes with functions in hormone-regulated pathways (abscisic acid (ABA) and jasmonic acid (JA)) were identified via transcriptomic analysis after repeated drought stress, implicating a conserved role of phytohormones in short-term drought stress memory [7, 8, 42, 43]. Furthermore, memory genes such as RESPONSIVE TO DESICCATION 29B (RD29B) and RESPONSIVE TO ABSCISIC ACID 18 (RAB18) displayed altered transcript levels in response to repeated drought stresses, whereas related non-memory genes (e.g., RD29A) responded in a similar way at each stress exposure [8].
In rice, the expression of Δ1-PYRROLINE-5-CARBOXYLATE SYNTHETASE 1 (P5CS1), a gene involved in proline biosynthesis and with a potential role in drought tolerance, exhibited a transcriptional memory after repeated drought stress [44]; this gene has also been implicated in transcriptional memory response to recurring salt stress [65].
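The four memory response types described in this section can be made concrete with a small sketch. The Python function below is an illustration under stated assumptions only: the function name, the twofold threshold, and the example transcript values are hypothetical and are not taken from the cited studies. It first compares the transcript level in the first stress with the watered control, and then the level in a later stress with the first stress.

```python
from typing import Optional


def classify_memory_response(watered: float, first_stress: float,
                             later_stress: float,
                             fold: float = 2.0) -> Optional[str]:
    """Assign a gene to one of four memory response types.

    All inputs are relative transcript levels (arbitrary, positive units).
    The `fold` threshold is a hypothetical cutoff chosen for illustration.
    Returns None when the gene does not fit any memory pattern.
    """
    if min(watered, first_stress, later_stress) <= 0:
        raise ValueError("transcript levels must be positive")

    # Response to the first stress relative to the watered control.
    induced = first_stress >= watered * fold
    repressed = first_stress <= watered / fold

    # Response to a later stress relative to the first stress.
    higher_later = later_stress >= first_stress * fold
    lower_later = later_stress <= first_stress / fold

    if induced and higher_later:
        return "superinduction"
    if repressed and lower_later:
        return "enhanced repression"
    if induced and lower_later:
        return "loss of induction"
    if repressed and higher_later:
        return "loss of repression"
    return None  # similar response at each exposure, or not stress-responsive


if __name__ == "__main__":
    # Hypothetical values: a memory-type profile and a non-memory-type profile.
    print(classify_memory_response(1.0, 10.0, 40.0))
    print(classify_memory_response(1.0, 10.0, 11.0))
```

In this sketch, a gene that is induced to a similar level at each exposure (as described for RD29A) returns None, whereas a gene whose transcript level rises further in the later stress (as described for RD29B and RAB18) falls into the first category. Real analyses would rely on replicated measurements and statistical testing rather than a fixed cutoff.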

2.4 Histone Modifications Associated with Drought Stress Memory Genes

In order to understand the regulation of memory genes, it will be important to relate these transcriptional changes to their associated histone marks. For example, persistence of histone marks through plant development could enable the altered regulation seen in response to repeated drought exposure and so explain the memory responses seen at memory genes like RD29B. A chromatin modification widely associated with somatic drought stress memory is the H3K4me3 mark [38]. It was found that upon repeated drought stress exposure in Arabidopsis, both RD29B and RAB18 maintained higher transcript levels than during the previous drought stress treatment and were associated with increased H3K4me3 levels and high phosphorylation of serine 5 (Ser5P) in the C-terminal domain of associated RNA polymerase II, indicating the presence of stalled RNA polymerase II during the recovery phase. RNA polymerase II stalling could help to provide an active chromatin state and prepare stimulus-responsive genes for expression at the right time. On the other hand, transcript levels of the non-memory genes RD29A and COLD REGULATED 15A (COR15A) remained similar after each drought treatment [7]. Therefore, the persistence of H3K4me3 marks at the drought stress-responsive genes after the initial stress exposure, together with paused RNA polymerase II complexes, may suggest a potential function as components of stress memory (Fig. 1). This is the first evidence that stalled RNA polymerase II in plants can be a potential factor involved in the transcriptional memory mechanism. H3K4me3 enrichment at memory genes has additionally been shown for other stresses, for example at HSP22.0 for heat stress [66] and at P5CS1 for salt stress [65]. Enriched acetylation of lysine 9 of histone H3 (H3K9ac) is also correlated with the active state of gene expression during drought stress; however, it falls later during the recovery period. Therefore, specific chromatin marks, especially H3K4me3, but not others such as H3K9ac, at drought stress-responsive genes may play a key role in somatic stress memory [7]. In agreement with these reports, Kim et al. (2012) also reported that H3K4me3 was maintained at some level on the drought-inducible genes RD20, RD29A, and GALACTINOL SYNTHASE 2 (AtGOLS2) after rehydration [37]. These studies highlighted that H3K4me3 may act as a fingerprint for stress memory by keeping a basal transcriptional state during the recovery phase that helps genes to promptly respond to a second stress. Table 1 highlights the studies in Arabidopsis showing histone marks related to drought stress-responsive memory and non-memory genes that have been tested to date.

Table 1 Histone marks that have been related to the regulation of drought stress-responsive memory genes and non-memory genes tested in Arabidopsis

Species | Chromatin marks tested | Genes tested | Reference
Arabidopsis | H3K4me3 | RD29A, RD29B, COR15A, RAB18 | [7]
Arabidopsis | H3K4me3, H3K9ac | RD29A, RD20 and AtGOLS2 | [37]
Arabidopsis | H3K4me3, H3K9ac, H3K14ac, H3K23ac and H3K27ac | RD29A, RD29B, RD20 and RAP2.4 | [82]
Arabidopsis | H3K4me1, H3K4me2 and H3K4me3 | RD29A, RD29B, AT5G52280, AT5G52290 and AT5G52320 | [92]
Arabidopsis | H3K4me3, H3K27me3 | LTP3, LTP4, HIPP2.2, RD29A, RD29B and RAB18 | [70]

H3 lysine 4 (H3K4) methylation patterns are dynamically modulated by several specific H3K4 methyltransferases and
demethylases [67], which may therefore be regulators of stress memory. However, the evidence for this remains unclear. The H3K4 methyltransferase ARABIDOPSIS TRITHORAX 1 (ATX1) is required to bind and activate the NINE-CIS-EPOXYCAROTENOID DIOXYGENASE 3 (NCED3) locus, which encodes a key enzyme in the ABA biosynthesis pathway and plays a key role in dehydration-induced stress signaling. In atx1 mutant plants, NCED3 has been shown to have significantly lower enrichment of H3K4me3 than in wild-type plants under drought stress conditions, which is correlated with a stronger decrease in NCED3 transcript levels and RNA polymerase II levels [68]. Moreover, the transcript levels of drought stress-responsive genes such as ALCOHOL DEHYDROGENASE 1 (ADH1), RD29A, COR15A, and RD29B were decreased in the drought-exposed atx1 mutant compared to drought-treated wild-type plants. However, decreased transcript levels were correlated with lower H3K4me3 levels at both the memory gene (RD29B) and the non-memory genes (RD29A, COR15A, and ADH1) in the atx1 mutant after the first drought stress [68]. In Arabidopsis, ATX1 therefore does not seem to be the sole factor involved in drought stress memory, despite playing a vital role in regulation of the drought stress-responsive gene network via NCED3. Thus, it is possible that a different H3K4 methyltransferase may be required for transcriptional memory in plants. The polycomb group (PcG) methyltransferase CURLY LEAF (CLF), the activity of which has been extensively studied in relation to the establishment of the H3K27me3 mark at developmental genes [69], was also found to be involved in the dehydration stress response in a gene-specific manner [70, 71]. In animals and plants, H3K27me3 is considered a repressive mark counteracting the activating functions of H3K4me3 [72].
Accordingly, the transcript levels of memory genes such as LIPID TRANSFER PROTEIN 3 (LTP3), LTP4, and HEAVY METAL ASSOCIATED ISOPRENYLATED PLANT PROTEIN 2.2 (HIPP2.2) are strongly increased in the clf mutant, in agreement with the repressive function of CLF [70]. Curiously, lack of CLF instead leads to a reduction of the transcript levels of the memory genes RD29B and RAB18 (slightly in the first stress (S1) and more strongly during the second stress (S2)), despite H3K27me3 levels at those genes being similar to those in the wild type. Moreover, the presence of high levels of pre-existing H3K27me3 marks did not prevent the strong transcription of the dehydration stress-responsive genes RD29B and RAB18 in wild-type plants during S2 [70, 71]. These studies could indicate that H3K4me3 and H3K27me3 marks work independently and are not mutually exclusive at dehydration stress-responding memory genes [71], which is in contrast to the general assumption [73, 74]. However, it is notable that these results could still be explained by either different chromatin states in different cell types or H3K4me3 and
H3K27me3 marks being present on the tails of different histone molecules. Schmitges et al. (2011) demonstrated that the presence of H3K4me3 inhibits the H3K27-methylating activity of polycomb repressive complex 2 (PRC2), but only if the target lysine is on the same histone tail (in cis). Furthermore, bivalent chromatin states, i.e., large regions of H3K27me3 harboring smaller regions of H3K4me3, have been described in developmental genes of both plants [75] and animals [76], but it has not yet been studied in relation to stress-responsive genes. Therefore, further studies might elucidate the mechanism of strong transcription rate of drought stress-responsive memory genes.
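The caveat raised above, that apparent co-occurrence of H3K4me3 and H3K27me3 could reflect a mixture of chromatin states rather than both marks on the same nucleosomes, can be illustrated with a toy calculation. The following Python sketch is purely illustrative: the function name and the two hypothetical populations are assumptions, and it is not a model of any ChIP protocol. It shows that two populations can give identical bulk signals for each mark while differing completely in how often the two marks coincide.

```python
from typing import List, Tuple

# Each entry is one nucleosome (or cell) at a locus:
# (carries H3K4me3, carries H3K27me3)
State = Tuple[bool, bool]


def bulk_mark_fractions(population: List[State]) -> Tuple[float, float, float]:
    """Return the fraction carrying H3K4me3, H3K27me3, and both marks."""
    n = len(population)
    if n == 0:
        raise ValueError("population must not be empty")
    k4 = sum(1 for has_k4, _ in population if has_k4) / n
    k27 = sum(1 for _, has_k27 in population if has_k27) / n
    both = sum(1 for has_k4, has_k27 in population if has_k4 and has_k27) / n
    return k4, k27, both


# Hypothetical population A: the marks never coincide (e.g., two cell types).
mixed = [(True, False)] * 50 + [(False, True)] * 50

# Hypothetical population B: the marks always coincide where present.
coincident = [(True, True)] * 50 + [(False, False)] * 50

if __name__ == "__main__":
    print(bulk_mark_fractions(mixed))
    print(bulk_mark_fractions(coincident))
```

In both hypothetical populations, half of the entries carry each mark, so a measurement that reports only the per-mark averages cannot tell them apart; only the third value, the fraction carrying both marks, differs.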

3 Possible Interactions Between Drought Stress Signals and Chromatin-Mediated Stress Memory

In response to dehydration signals, plants activate stress response cascades via the expression of many drought stress-responsive genes that encode functional and regulatory proteins [77, 78]. However, chromatin marks themselves do not sense the stimuli; instead they are established at drought stress-responsive loci through the recruitment of chromatin modifiers by DNA-binding proteins such as transcription factors (TFs) [47]. TFs of the ABA response element (ABRE) binding factors (ABF) family are key regulators in dehydration-induced stress signaling [79, 80]. Two ABREs lying adjacent to each other in the promoter region were found to be required for transcriptional memory of RD29B during recurring drought stress [71]. ABREs are bound by ABFs to induce the drought stress response. In Arabidopsis, three genes encoding ABFs (ABF2, ABF3, and ABF4) were mutated to examine whether any changes occur to the transcriptional memory of drought stress-associated memory genes. Interestingly, the induction of memory genes was compromised, but this did not result in the loss of transcriptional memory, apparently excluding the possibility that accumulated ABFs are responsible for superinduced transcription of memory genes. These findings also suggest that different trans-factors may be necessary for transcriptional memory [7, 81]. The association of trans-factors, if any, with dehydration-specific memory responses still needs to be determined. Furthermore, it has been confirmed that transcriptional memory behavior induced by dehydration stress operates through both ABA-dependent [43] and ABA-independent pathways [7, 70]. Virlouvet and Fromm (2015) examined cell-specific transcriptional stress memory between guard cells and leaf tissues (predominantly comprising mesophyll cells) during repeated dehydration stress [43].
The drought stress-responsive genes RAB18, LTP4, and RD29B displayed significant transcriptional memory with no difference in memory patterns between the two tissues [43], in agreement with the importance of transcriptional memory of drought stress-responsive genes in leaves [7]. However, ABA-responsive genes such as NCED3 and ALDEHYDE OXIDASE 3 (AAO3) displayed a transcriptional memory in guard cells but not in leaf cells [43], in line with the roles of ABA in the control of stomatal aperture. These differences are presumably underpinned by cell type-specific chromatin patterns at these loci, but this remains to be determined. Taken together, much evidence points towards a prominent role of chromatin-based mechanisms in transcriptional memory responses associated with drought stress. There is no doubt that progress will be made in understanding the molecular basis of drought stress memory and its regulation in the coming years.

4 Conclusions and Future Perspectives

The significance of drought stress memory has been demonstrated in many studies from a physiological perspective [43, 45, 46, 49, 51], but understanding of the molecular mechanisms behind the development of stress memory is still in its infancy. The contribution of chromatin dynamics to transcriptional memory has been shown to be an emerging mechanism in the response to repeated drought [18, 19]. Accumulated evidence suggests that memorization of certain past drought experiences occurs through chromatin modifications at drought stress-responsive genes and causes subsequent enhanced responses [7, 18, 37, 43, 70, 71, 81, 82]. Deeper insights into the epigenetic mechanisms involved in drought stress memory will shed light on how plants tailor the transcription of drought stress-responsive genes under repeated drought stress, including in different tissues and cell types. Despite the studies of H3K4me3 enrichment at memory genes, there is still only limited knowledge of transcriptional drought stress memory responses associated with silencing modifications such as H3K27me3 and H3K9me3/me2. It will be fascinating to study the interplay between different histone modifications in future investigations. An open question still surrounds the persistence and stability of chromatin marks during mitosis for long-term memory and thus the maintenance of inherited transcriptional states. Indeed, the mitotic transmission of histone modifications is not well understood in general. In Arabidopsis, CG and non-CG methylation and H3K9 methylation present at heterochromatic loci are maintained by DECREASE IN DNA METHYLATION 1 (DDM1) through cell division [83–85]. Saze et al. (2008) suggested that the stable mitotic transmission of DNA methylation could be due to direct H3K9 methylation and could lead to the persistence of a repressive chromatin state [86]. In addition, DNA methylation in other organisms, such
as the fungus Neurospora crassa, appears to be controlled only by H3K9 methylation because lack of the Dim-5 H3K9 methyltransferase led to complete loss of DNA methylation genome-wide [86–89]. Possibly, DNA methylation might play a role in the inheritance of histone modifications or vice versa. In addition, specific mechanisms such as semi-conservative assembly of histone octamers during DNA replication and the deposition of the histone variant H3.3, replacing H3 at active loci in a replication-independent manner throughout the cell cycle, may explain the mitotic transmission of histone marks [86]. Further studies are needed in plants to explain the mechanism of transfer of histone modifications during the cell cycle. Several open questions concerning molecular, physiological, and ecological aspects of stress priming and memory also need to be addressed. For instance, it is essential to understand clearly whether drought priming is a constant feature throughout the lifetime of a plant or whether it depends on the developmental stage of a plant or is connected to specific tissues or organs. Furthermore, drought priming-induced cross-tolerance has been shown in plants [44, 50, 51, 53], but the underlying mechanisms are unknown. Moreover, it will be interesting to consider whether somatic memories developed by different stresses are maintained by the same epigenetic mechanisms: examining this across combinations of abiotic stresses with different causes, severity, and duration could enable the identification of universal stress memory regulators of value in crop breeding and biotechnology. This integration of different environmental stress events at the level of transcriptional memory to generate an orchestrated outcome to modulate plant growth and responses to a complex environment will have particularly important potential applications for resilience of crop production under climate change scenarios.
Another area of interest will be to examine the level of evolutionary conservation of stress memory and the molecular mechanisms involved in establishing it. The possible common mechanisms of memories of biotic and abiotic stress will also need to be elucidated. It has also been noted that the duration of drought stress memory is typically limited, so that if the stress is not encountered again within a certain period of time, the memory seems to be erased and resetting occurs [17]. This may be because maintaining the primed state requires the allocation of resources by the plant. Therefore, it will be important to understand what determines the duration of the stress memory and, at the same time, how somatic stress memory activation can be detected at the field level under combined stresses, in order to establish whether it provides a fitness advantage to the plant.


As most of the studies were carried out using the model plant Arabidopsis, further studies with crop plants will undoubtedly provide deeper knowledge of the epigenetic stress memory created after drought priming and novel insights into adaptation in plants. The advent of CRISPR/Cas9-mediated epigenome editing technology [90, 91] will enable researchers to selectively modify epigenetic marks at loci associated with drought memory and to explore how targeted epigenetic modifications affect transcriptional regulation and subsequent changes at the phenotypic level under repeated drought stress exposure. In addition, targeted activation of stress memory pathways using the epigenetic toolbox will enable us to enhance drought tolerance in crop species. However, the main initial challenge is to elucidate the plant memory for a specific drought stress event in a dynamically changing environment.

References

1. Ciais P, Reichstein M, Viovy N et al (2005) Europe-wide reduction in primary productivity caused by the heat and drought in 2003. Nature 437:529–533. https://doi.org/10.1038/nature03972
2. Lobell DB, Schlenker W, Costa-Roberts J (2011) Climate trends and global crop production since 1980. Science 333:616–620. https://doi.org/10.1126/science.1204531
3. Tack J, Barkley A, Nalley LL (2015) Effect of warming temperatures on US wheat yields. Proc Natl Acad Sci U S A 112:6931–6936. https://doi.org/10.1073/pnas.1415181112
4. Daryanto S, Wang L, Jacinthe P-A (2016) Global synthesis of drought effects on maize and wheat production. PLoS One 11:e0156362. https://doi.org/10.1371/journal.pone.0156362
5. Jakab G, Ton J, Flors V et al (2005) Enhancing Arabidopsis salt and drought stress tolerance by chemical priming for its abscisic acid responses. Plant Physiol 139:267–274. https://doi.org/10.1104/pp.105.065698
6. Maseda PH, Fernández RJ (2006) Stay wet or else: three ways in which plants can adjust hydraulically to their environment. J Exp Bot 57:3963–3977. https://doi.org/10.1093/jxb/erl127
7. Ding Y, Fromm M, Avramova Z (2012) Multiple exposures to drought “train” transcriptional responses in Arabidopsis. Nat Commun 3:740–749. https://doi.org/10.1038/ncomms1732
8. Ding Y, Virlouvet L, Liu N et al (2014) Dehydration stress memory genes of Zea mays; comparison with Arabidopsis thaliana. BMC Plant Biol 14:141. https://doi.org/10.1186/1471-2229-14-141
9. Ramírez DA, Rolando JL, Yactayo W et al (2015) Improving potato drought tolerance through the induction of long-term water stress memory. Plant Sci 238:26–32. https://doi.org/10.1016/j.plantsci.2015.05.016
10. Walter J, Nagy L, Hein R et al (2011) Do plants remember drought? Hints towards a drought-memory in grasses. Environ Exp Bot 71:34–40. https://doi.org/10.1016/j.envexpbot.2010.10.020
11. D’Urso A, Brickner JH (2014) Mechanisms of epigenetic memory. Trends Genet 30:230–236. https://doi.org/10.1016/j.tig.2014.04.004
12. Crisp PA, Ganguly D, Eichten SR et al (2016) Reconsidering plant memory: intersections between stress recovery, RNA turnover, and epigenetics. Sci Adv 2:e1501340. https://doi.org/10.1126/sciadv.1501340
13. Skirycz A, Inzé D (2010) More from less: plant growth under limited water. Curr Opin Biotechnol 21:197–203. https://doi.org/10.1016/j.copbio.2010.03.002
14. Ton J, Jakab G, Toquin V et al (2005) Dissecting the β-aminobutyric acid–induced priming phenomenon in Arabidopsis. Plant Cell 17:987–999. https://doi.org/10.1105/tpc.104.029728
15. Zimmerli L, Hou B-H, Tsai C-H et al (2008) The xenobiotic β-aminobutyric acid enhances Arabidopsis thermotolerance. Plant J 53:144–156. https://doi.org/10.1111/j.1365-313X.2007.03343.x


16. van Hulten M, Pelser M, van Loon LC et al (2006) Costs and benefits of priming for defense in Arabidopsis. Proc Natl Acad Sci U S A 103:5602–5607. https://doi.org/10.1073/pnas.0510213103
17. Kinoshita T, Seki M (2014) Epigenetic memory for stress response and adaptation in plants. Plant Cell Physiol 55:1859–1863. https://doi.org/10.1093/pcp/pcu125
18. Avramova Z (2015) Transcriptional “memory” of a stress: transient chromatin and memory (epigenetic) marks at stress-response genes. Plant J 83:149–159. https://doi.org/10.1111/tpj.12832
19. Lämke J, Bäurle I (2017) Epigenetic and chromatin-based mechanisms in environmental stress adaptation and stress memory in plants. Genome Biol 18:1–11. https://doi.org/10.1186/s13059-017-1263-6
20. Hilker M, Schwachtje J, Baier M et al (2016) Priming and memory of stress responses in organisms lacking a nervous system. Biol Rev 91:1118–1133. https://doi.org/10.1111/brv.12215
21. Weinhold A (2018) Transgenerational stress-adaption: an opportunity for ecological epigenetics. Plant Cell Rep 37:3–9. https://doi.org/10.1007/s00299-017-2216-y
22. Iglesias FM, Cerdán PD (2016) Maintaining epigenetic inheritance during DNA replication in plants. Front Plant Sci 7:38
23. Hauser M-T, Aufsatz W, Jonak C, Luschnig C (2011) Transgenerational epigenetic inheritance in plants. Biochim Biophys Acta 1809:459–468. https://doi.org/10.1016/j.bbagrm.2011.03.007
24. Amtmann A, Sani E, Herzyk P et al (2013) Hyperosmotic priming of Arabidopsis seedlings establishes a long-term somatic memory accompanied by specific changes of the epigenome. Genome Biol 14:R59
25. Wibowo A, Becker C, Marconi G et al (2016) Hyperosmotic stress memory in Arabidopsis is mediated by distinct epigenetically labile sites in the genome and is restricted in the male germline by DNA glycosylase activity. Elife 5:e13546. https://doi.org/10.7554/eLife.13546
26. Pecinka A, Mittelsten Scheid O (2012) Stress-induced chromatin changes: a critical view on their heritability. Plant Cell Physiol 53:801–808. https://doi.org/10.1093/pcp/pcs044
27. Bruce TJA, Matthes MC, Napier JA, Pickett JA (2007) Stressful “memories” of plants: evidence and possible mechanisms. Plant Sci 173:603–608. https://doi.org/10.1016/j.plantsci.2007.09.002
28. Conrath U, Beckers GJM, Langenbach CJG, Jaskiewicz MR (2015) Priming for enhanced defense. Annu Rev Phytopathol 53:97–119. https://doi.org/10.1146/annurev-phyto-080614-120132
29. Kim J-M, Sasaki T, Ueda M et al (2015) Chromatin changes in response to drought, salinity, heat, and cold stresses in plants. Front Plant Sci 6:1–12. https://doi.org/10.3389/fpls.2015.00114
30. Zentner GE, Henikoff S (2013) Regulation of nucleosome dynamics by histone modifications. Nat Struct Mol Biol 20:259
31. Luger K, Mäder AW, Richmond RK et al (1997) Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251–260. https://doi.org/10.1038/38444
32. Hergeth SP, Schneider R (2015) The H1 linker histones: multifunctional proteins beyond the nucleosomal core particle. EMBO Rep 16:1439–1453. https://doi.org/10.15252/embr.201540749
33. Yuan L, Liu X, Luo M et al (2013) Involvement of histone modifications in plant abiotic stress responses. J Integr Plant Biol 55:892–901. https://doi.org/10.1111/jipb.12060
34. Campos EI, Reinberg D (2009) Histones: annotating chromatin. Annu Rev Genet 43:559–599. https://doi.org/10.1146/annurev.genet.032608.103928
35. Bártová E, Krejcí J, Harnicarová A et al (2008) Histone modifications and nuclear architecture: a review. J Histochem Cytochem 56:711–721. https://doi.org/10.1369/jhc.2008.951251
36. Farooq M, Hussain M, Wahid A, Siddique KHM (2012) Drought stress in plants: an overview. In: Aroca R (ed) Plant responses to drought stress: from morphological to molecular features. Springer, Berlin, Heidelberg, pp 1–33
37. Kim JM, To TK, Ishida J et al (2012) Transition of chromatin status during the process of recovery from drought stress in Arabidopsis thaliana. Plant Cell Physiol 53:847–856. https://doi.org/10.1093/pcp/pcs053
38. Bäurle I (2018) Can’t remember to forget you: chromatin-based priming of somatic stress responses. Semin Cell Dev Biol 83:133–139. https://doi.org/10.1016/j.semcdb.2017.09.032
39. Tuteja N (2007) Abscisic acid and abiotic stress signaling. Plant Signal Behav 2:135–138

40. Verslues PE, Agarwal M, Katiyar-Agarwal S et al (2006) Methods and concepts in quantifying resistance to drought, salt and freezing, abiotic stresses that affect plant water status. Plant J 45:523–539. https://doi.org/10.1111/j.1365-313X.2005.02593.x
41. Fleta-Soriano E, Pintó-Marijuan M, Munné-Bosch S (2015) Evidence of drought stress memory in the facultative CAM, Aptenia cordifolia: possible role of phytohormones. PLoS One 10:1–12. https://doi.org/10.1371/journal.pone.0135391
42. Ding Y, Liu N, Virlouvet L et al (2013) Four distinct types of dehydration stress memory genes in Arabidopsis thaliana. BMC Plant Biol 13:229. https://doi.org/10.1186/1471-2229-13-229
43. Virlouvet L, Fromm M (2015) Physiological and transcriptional memory in guard cells during repetitive dehydration stress. New Phytol 205:596–607. https://doi.org/10.1111/nph.13080
44. Li P, Yang H, Wang L et al (2019) Physiological and transcriptome analyses reveal short-term responses and formation of memory under drought stress in rice. Front Genet 10:1–16. https://doi.org/10.3389/fgene.2019.00055
45. Selote DS, Khanna-Chopra R (2010) Antioxidant response of wheat roots to drought acclimation. Protoplasma 245:153–163. https://doi.org/10.1007/s00709-010-0169-x
46. Selote DS, Khanna-Chopra R (2006) Drought acclimation confers oxidative stress tolerance by inducing co-ordinated antioxidant defense at cellular and subcellular level in leaves of wheat seedlings. Physiol Plant 127:494–506. https://doi.org/10.1111/j.1399-3054.2006.00678.x
47. Wang X, Vignjevic M, Jiang D et al (2014) Improved tolerance to drought stress after anthesis due to priming before anthesis in wheat (Triticum aestivum L.) var. Vinjett. J Exp Bot 65:6441–6456. https://doi.org/10.1093/jxb/eru362
48. Sgherri CLM, Navari-Izzo F, Menconi M, Pinzino C (1995) Activated oxygen production and detoxification in wheat plants subjected to a water deficit programme. J Exp Bot 46:1123–1130. https://doi.org/10.1093/jxb/46.9.1123
49. Li X, Zhang L, Li Y (2011) Preconditioning alters antioxidative enzyme responses in rice seedlings to water stress. Procedia Environ Sci 11:1346–1351. https://doi.org/10.1016/j.proenv.2011.12.202


50. Wang X, Liu F-L, Jiang D (2017) Priming: a promising strategy for crop production in response to future climate. J Integr Agric 16(12):2709–2716. https://doi.org/10.1016/S2095-3119(17)61786-6
51. Wang X, Vignjevic M, Liu F et al (2015) Drought priming at vegetative growth stages improves tolerance to drought and heat stresses occurring during grain filling in spring wheat. Plant Growth Regul 75:677–687. https://doi.org/10.1007/s10725-014-9969-x
52. Zhang X, Xu Y, Huang B (2019) Lipidomic reprogramming associated with drought stress priming-enhanced heat tolerance in tall fescue (Festuca arundinacea). Plant Cell Environ 42:947–958. https://doi.org/10.1111/pce.13405
53. Ashoub A, Baeumlisberger M, Neupaertl M et al (2015) Characterization of common and distinctive adjustments of wild barley leaf proteome under drought acclimation, heat stress and their combination. Plant Mol Biol 87:459–471. https://doi.org/10.1007/s11103-015-0291-4
54. Chen K, Arora R (2013) Priming memory invokes seed stress-tolerance. Environ Exp Bot 94:33–45. https://doi.org/10.1016/j.envexpbot.2012.03.005
55. Buitink J, Leger JJ, Guisle I et al (2006) Transcriptome profiling uncovers metabolic and regulatory processes occurring during the transition from desiccation-sensitive to desiccation-tolerant stages in Medicago truncatula seeds. Plant J 47:735–750
56. Daws MI, Bolton S, Burslem DFRP et al (2007) Loss of desiccation tolerance during germination in neo-tropical pioneer seeds: implications for seed mortality and germination characteristics. Seed Sci Res 17:273–281. https://doi.org/10.1017/S0960258507837755
57. Maia J, Dekkers BJW, Provart NJ et al (2011) The re-establishment of desiccation tolerance in germinated Arabidopsis thaliana seeds and its associated transcriptome. PLoS One 6:e29123. https://doi.org/10.1371/journal.pone.0029123
58. Bruggink T, Van P (1995) Induction of desiccation tolerance in germinated seeds. Seed Sci Res 5:1–4. https://doi.org/10.1017/S096025850000252X
59. Buitink J, Ly Vu B, Satour P, Leprince O (2003) The re-establishment of desiccation tolerance in germinated radicles of Medicago truncatula Gaertn. seeds. Seed Sci Res 13:273–286. https://doi.org/10.1079/SSR2003145

258

James Godwin and Sara Farrona

60. Vieira CV, Amaral da Silva EA, de Alvarenga AA et al (2010) Stress-associated factors increase after desiccation of germinated seeds of Tabebuia impetiginosa Mart. Plant Growth Regul 62:257–263. https://doi.org/10. 1007/s10725-010-9496-3 61. Gallardo K (2001) Proteomic analysis of Arabidopsis seed germination and priming. Plant Physiol 126:835–848. https://doi.org/10. 1104/pp.126.2.835 62. Catusse J, Meinhard J, Job C et al (2011) Proteomics reveals potential biomarkers of seed vigor in sugarbeet. Proteomics 11:1569–1580. https://doi.org/10.1002/ pmic.201000586 63. Hayat S, Hayat Q, Alyemeni MN et al (2012) Role of proline under changing environments: a review. Plant Signal Behav 7:1456–1466. https://doi.org/10.4161/psb.21949 64. Leufen G, Noga G, Hunsche M (2016) Drought stress memory in sugar beet: mismatch between biochemical and physiological parameters. J Plant Growth Regul 35:680–689. https://doi.org/10.1007/ s00344-016-9571-8 65. Feng XJ, Li JR, Qi SL et al (2016) Light affects salt stress-induced transcriptional memory of P5CS1 in Arabidopsis. Proc Natl Acad Sci U S A 113:E8335–E8343. https://doi.org/10. 1073/pnas.1610670114 66. L€amke J, Brzezinka K, Altmann S, B€aurle I (2016) A hit-and-run heat shock factor governs sustained histone methylation and transcriptional stress memory. EMBO J 35:162 LP–162175. https://doi.org/10.15252/ embj.201592593 67. Chen L-Q, Luo J-H, Cui Z-H et al (2017) Encode putative H3K4 methyltransferases and are critical for plant development. Plant Physiol 174:1795–1806. https://doi.org/10.1104/ pp.16.01944 68. Ding Y, Avramova Z, Fromm M (2011) The Arabidopsis trithorax-like factor ATX1 functions in dehydration stress responses via ABA-dependent and ABA-independent pathways. Plant J 66:735–744. https://doi.org/ 10.1111/j.1365-313X.2011.04534.x 69. Schubert D, Primavesi L, Bishopp A et al (2006) Silencing by plant Polycomb-group genes requires dispersed trimethylation of histone H3 at lysine 27. EMBO J 25:4638–4649. 
https://doi.org/10.1038/sj.emboj.7601311 70. Liu N, Fromm M, Avramova Z (2014) H3K27me3 and H3K4me3 chromatin environment at super-induced dehydration stress memory genes of arabidopsis thaliana. Mol

Plant 7:502–513. https://doi.org/10.1093/ mp/ssu001 71. Liu N, Ding Y, Fromm M, Avramova Z (2014) Different gene-specific mechanisms determine the “revised-response” memory transcription patterns of a subset of A. thaliana dehydration stress responding genes. Nucleic Acids Res 42:5556–5566. https://doi.org/10.1093/ nar/gku220 72. Ko¨hler C, Villar CBR (2008) Programming of gene expression by Polycomb group proteins. Trends Cell Biol 18:236–243. https://doi. org/10.1016/J.TCB.2008.02.005 73. Roudier F, Ahmed I, Be´rard C et al (2011) Integrative epigenomic mapping defines four main chromatin states in Arabidopsis. EMBO J 30:1928–1938. https://doi.org/10.1038/ emboj.2011.103 74. Schmitges FW, Prusty AB, Faty M et al (2011) Histone methylation by PRC2 Is inhibited by active chromatin marks. Mol Cell 42:330–341. https://doi.org/10.1016/j.molcel.2011.03. 025 75. Saleh A, Al-Abdallat A, Ndamukong I et al (2007) The Arabidopsis homologs of trithorax (ATX1) and enhancer of zeste (CLF) establish “bivalent chromatin marks” at the silent AGAMOUS locus. Nucleic Acids Res 35:6290–6296. https://doi.org/10.1093/ nar/gkm464 76. Bernstein BE, Mikkelsen TS, Xie X et al (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125:315–326. https://doi.org/10.1016/ j.cell.2006.02.041 77. Fujita Y, Fujita M, Shinozaki K, YamaguchiShinozaki K (2011) ABA-mediated transcriptional regulation in response to osmotic stress in plants. J Plant Res 124:509–525. https:// doi.org/10.1007/s10265-011-0412-3 78. Yamaguchi-Shinozaki K, Shinozaki K (2005) Organization of cis-acting regulatory elements in osmotic- and cold-stress-responsive promoters. Trends Plant Sci 10:88–94. https://doi. org/10.1016/j.tplants.2004.12.012 79. Klingler JP, Batelli G, Zhu J-K (2010) ABA receptors: the START of a new paradigm in phytohormone signalling. J Exp Bot 61:3199–3210. https://doi.org/10.1093/ jxb/erq151 80. 
Nakashima K, Yamaguchi-Shinozaki K (2013) ABA signaling in stress-response and seed development. Plant Cell Rep 32:959–970. https://doi.org/10.1007/s00299-013-14181 81. Virlouvet L, Ding Y, Fujii H et al (2014) ABA signaling is necessary but not sufficient for

Plant Epigenetic Stress Memory Induced by Drought RD29B transcriptional memory during successive dehydration stresses in Arabidopsis thaliana. Plant J 79:150–161. https://doi.org/10. 1111/tpj.12548 82. Kim JM, To TK, Ishida J et al (2008) Alterations of lysine modifications on the histone H3 N-tail under drought stress conditions in Arabidopsis thaliana. Plant Cell Physiol 49:1580–1588. https://doi.org/10.1093/ pcp/pcn133 83. Vongs A, Kakutani T, Martienssen RA, Richards EJ (1993) Arabidopsis thaliana DNA methylation mutants. Science 260:1926–1928 84. Jeddeloh JA, Stokes TL, Richards EJ (1999) Maintenance of genomic methylation requires a SWI2/SNF2-like protein. Nat Genet 22:94 85. Lippman Z, Gendrel A-V, Black M et al (2004) Role of transposable elements in heterochromatin and epigenetic control. Nature 430:471 86. Saze H (2008) Epigenetic memory transmission through mitosis and meiosis in plants. Semin Cell Dev Biol 19:527–536. https:// doi.org/10.1016/j.semcdb.2008.07.017 87. Simpson VJ, Johnson TE, Hammen RF (1986) Caenorhabditis elegans DNA does not contain 5-methylcytosine at any time during

259

development or aging. Nucleic Acids Res 14:6711–6719 88. Patel CV, Gopinathan KP (1987) Determination of trace amounts of 5-methylcytosine in DNA by reverse-phase high-performance liquid chromatography. Anal Biochem 164:164–169 89. Tamaru H, Selker EU (2001) A histone H3 methyltransferase controls DNA methylation in Neurospora crassa. Nature 414:277 90. Papikian A, Liu W, Gallego-Bartolome´ J, Jacobsen SE (2019) Site-specific manipulation of Arabidopsis loci using CRISPR-Cas9 SunTag systems. Nat Commun 10:729. https:// doi.org/10.1038/s41467-019-08736-7 91. Gallego-Bartolome´ J, Gardiner J, Liu W et al (2018) Targeted DNA demethylation of the Arabidopsis genome using the human TET1 catalytic domain. Proc Natl Acad Sci U S A 115:E2125–E2134. https://doi.org/10. 1073/pnas.1716945115 92. van Dijk K, Ding Y, Malkaram S et al (2010) Dynamic changes in genome-wide histone H3 lysine 4 methylation patterns in response to dehydration stress in Arabidopsis thaliana. BMC Plant Biol 10:238. https://doi.org/10. 1186/1471-2229-10-238

Chapter 18

A Critical Guide for Studies on Epigenetic Inheritance in Plants

Daniela Ramos Cruz and Claude Becker

Abstract

Studies on “epigenetic inheritance” or “transgenerational epigenetic inheritance” have emerged in ever-increasing numbers in recent years, in plant as well as animal systems and in diverse contexts ranging from stress adaptation to behavioral studies. Despite the very different organisms and biological processes investigated, the overarching question has been if and how an organism’s epigenome registers and records external cues and relays this information to its progeny. Very often, these experiments are based on the hypothesis that epigenetic memorization of events or conditions could be the basis of an altered response of the progeny upon encountering the same or a similar condition. If they confirm the hypothesis, such studies challenge our fundamental understanding of evolution by natural selection; they therefore require particular rigor in design and great care in data analysis. Here, we provide general guidelines on how to design studies on epigenetic inheritance in plants and point out critical considerations for data analysis and interpretation. While we cannot provide a step-by-step guide that fits all experimental setups and questions addressed, we explain frequent misconceptions and often overlooked pitfalls. Our aim is to provide researchers with conceptual tools to sensibly design their studies and to interpret their results in the admissible framework.

Key words Epigenetic inheritance, Transgenerational, Parental effects, DNA methylation, Posttranslational histone modifications, Small RNAs, Somatic memory, Priming

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2_18, © Springer Science+Business Media, LLC, part of Springer Nature 2020

1 Introduction

The term “epigenetics” is used in different contexts, and its definition has been overhauled several times since it was coined by Conrad Waddington in 1942 [1]. Here, we use it in the stricter sense, in which it refers to molecular traits that regulate gene expression without altering the DNA sequence and that are heritable across mitotic or meiotic cell divisions. Epigenetic mechanisms have been postulated to be involved in stress adaptation by priming or stress memory within and/or across generations. While “priming” refers to the physiological defense state that is triggered in plants in response to a stress stimulus and that prepares the plant for future stress exposure, “memory” refers to increased resistance resulting from the priming stimulus. When these effects are observed within the same generation, they are referred to as somatic priming and somatic memory, respectively. If they persist in one or in more than one subsequent progeny generation, they are classified as intergenerational and transgenerational priming/memory, respectively [2, 3]. Unfortunately, many studies ignore the drastic effects of stress on progeny-bearing parents, which often interfere with the development of embryos and extraembryonic tissues such as endosperm, yolk, or placenta. Modified features of the resulting progeny are parental effects and do not constitute evidence for transgenerational inheritance. Only effects that persist in at least the second-generation progeny can be considered transgenerational.

Studies on transgenerational epigenetic inheritance often suffer from shortcomings in the initial experimental design or in the extent and/or quality of the data analysis, placing the resulting conclusions on uncertain terrain. Because the claim for epigenetic inheritance of acquired traits challenges the principles of natural selection acting on randomly occurring genetic mutations, experiments supporting this claim must be designed with the utmost stringency and care. In the following, we summarize the most relevant points to consider in such studies and highlight limitations when drawing conclusions. As the variability in experimental organisms and conditions does not allow a universally valid step-by-step guide, we focus instead on general guidelines regarding study design and data analysis. Most frequently, epigenetic inheritance studies involve stress treatments; we therefore focus on such a scenario, although most recommendations apply equally to other experimental setups.
Major hallmarks of epigenetic gene regulation are covalent modifications of DNA (cytosine methylation) and of DNA-associated proteins (histone methylation, acetylation, phosphorylation, and several others). Despite recent advances in profiling posttranslational histone modifications, the epigenetic mark most accessible to genome-wide analysis with high resolution and reproducibility is DNA methylation. In addition, it has been intuitively postulated as the most efficient carrier of epigenetic information across generations because of the semiconservative way in which DNA, including its modifications, can be propagated across cell divisions. However, comprehensive analyses of epigenetic inheritance should not be limited to DNA methylation but should also investigate posttranslational histone modifications as well as small and long non-protein-coding RNAs, which are important messengers for the deposition of epigenetic marks. Moreover, to be relevant for plant physiology, metabolism, vigor, and performance, epigenetic configurations must be linked to transcriptional regulation of the corresponding genes. Therefore, all studies on epigenetic inheritance must include analyses at the transcriptional level, be it via whole-genome transcriptional profiling or via targeted expression analysis of epigenetically modified loci.

How to Study Epigenetic Inheritance

2 Materials

Although precise materials will depend on the nature of the experiment and species concerned, some general considerations should be made prior to the experiment regarding the plant material to be analyzed and the computational tools to be used in an epigenetic inheritance study.

2.1 Biological Material

The first question that needs to be addressed is whether the experiments should focus on epigenetic modifications in somatic or reproductive tissue. Somatic tissue is generally easier to obtain in larger quantities and can reveal mitotic inheritance. However, to postulate propagation of somatic epigenetic changes, a sufficient number of cells in the tissue must undergo cell cycling. To pass epigenetic information from one generation to the next, this information needs to be encoded in the germline. Analysis of epigenetic features along the plant germline is hampered by the fact that flowers contain both somatic and reproductive cells, and seeds typically contain maternal tissue, polyploid endosperm, and embryonic tissue. Therefore, we recommend taking the nature of the tissue analyzed for epigenetic configuration into account when interpreting the data. Another important aspect that should be considered is potential genetic diversity of the material. Small genetic differences between samples can lead to substantial epigenetic differences if they affect genes involved in epigenetic pathways or via cis- or trans-acting control of gene regulation by repetitive sequences (e.g., [4, 5]). When interpreting epigenetic data, it is therefore important to know the outcrossing frequency of the species, the heterozygosity of the genome, ploidy, etc. Special consideration should be given to the cautious use of transgenic reporter lines in studies of epigenetic inheritance in response to stress. While they have been extremely useful to identify and characterize numerous epigenetic components, their suitability as proxy for changes at endogenous loci should be carefully evaluated, especially if the constructs contain sequences originating from pathogens, with heterogeneous regulatory elements or unusual base composition.

2.2 Computational Tools

In recent years, a multitude of computational tools have been developed to detect epigenetic differences between samples. Plant scientists should note, however, that most of them were designed to understand mammalian epigenetic patterns and therefore might have limitations when applied to plant data. This is particularly relevant in the case of DNA methylation. In mammals, cytosine methylation occurs almost exclusively in the CG context. In contrast, plant cytosine methylation exists in all three possible contexts, CG, CHG, and CHH (where H can be any base but G). Because CHG and CHH methylation frequencies and distributions are very different from those at CG sites, and because the genomic distribution of DNA methylation in plants is often different from that in mammals, care needs to be taken when applying computational software to plant data. Tools designed for the analysis of histone modifications are less critical, but underlying assumptions should always be verified when applying them to organisms different from those for which they were originally developed.
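The three sequence contexts can be made concrete with a short Python sketch. It is only an illustration and not part of any published pipeline: the function names `cytosine_context` and `count_contexts` are hypothetical, and the code looks at the forward strand only (cytosines on the reverse strand would be handled by running it on the reverse complement).

```python
from collections import Counter
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at 0-based index pos.

    Returns None if the base is not C, if the downstream bases needed to
    decide are missing (sequence end), or if they are ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CG"
    if len(downstream) == 2 and downstream[0] in "ACT":
        if downstream[1] == "G":
            return "CHG"
        if downstream[1] in "ACT":
            return "CHH"
    return None


def count_contexts(seq: str) -> Counter:
    """Count forward-strand cytosines per context in seq."""
    contexts = (cytosine_context(seq, i) for i in range(len(seq)))
    return Counter(c for c in contexts if c is not None)


if __name__ == "__main__":
    print(count_contexts("CGACAGCTTCC"))
```

A tool that silently restricts itself to the first branch (CG) would discard the other two classes, which is the kind of underlying assumption worth checking before applying mammalian-oriented software to plant data.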

3 Methods

3.1 Study Design

3.1.1 Stress Treatment

Several parameters need to be defined for stress treatments: timing, intensity, duration, incidence, and frequency.

• Timing: Sensitivity to stress can vary considerably during a plant’s lifetime, so the developmental stage at which the stress is applied must be chosen carefully. When investigating intergenerational epigenetic changes, it is important to apply stress treatments pre-meiotically and to end them before the vegetative-to-reproductive transition or, in the case of perennial and woody species, before the onset of flowering. Otherwise, stress experienced after development of gametes confounds direct effects and those transmitted to the progeny.

• Intensity: Stress treatments should be strong enough to be effective but should not severely inhibit plant development, as this might affect the quality of the seeds produced, and effects observed in the following generation might be related to reduced seed fitness rather than to stress memory. Additionally, stress type and dose should be physiologically relevant, i.e., reflect conditions that plants are likely to encounter in a real-world scenario. We recommend testing different doses and following their effectiveness with careful phenotyping and, where possible, expression assays for suitable marker genes.

• Duration: Duration of stress can influence the probability that the stress will be memorized. No specific recommendation can be given here, as the duration of stress provoking a memory effect is different for each stress and each plant material, and long exposure to a low dose might have quite different consequences than short, intense treatments. However, the same comments as for stress intensity also apply to the choice of duration.

• Incidence: Stress can be applied once or repeatedly to the same plants. This is an important factor to consider, as previous studies have shown that stress memory in plants depends on whether the stress was perceived once, twice, or multiple times [6, 7]. As stress in natural habitats often originates from a combination of adverse conditions, simultaneous application of different stress types, in different doses and durations, increases the number of possible experimental setups by orders of magnitude; the combination should therefore be chosen purposefully.

• Frequency: If repeated stress treatments are applied, the interval between two subsequent exposures can influence the outcome. Stress-free periods that are too short will result in deterioration of the plants, while recovery periods that are too long might hinder the establishment of a memory or even cause its loss. Somatic memory in Arabidopsis has been noted to last over several days to several weeks [2], but this may also vary with species and conditions.
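How quickly the number of candidate setups grows when these parameters are combined can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the class `StressTreatment`, the function `enumerate_designs`, and the example stress types, doses and intervals are hypothetical and not recommendations.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence


@dataclass(frozen=True)
class StressTreatment:
    stress: str        # stress type, e.g. "drought"
    stage: str         # timing: developmental stage at first exposure
    intensity: str     # dose label
    duration_h: int    # duration of one exposure in hours
    incidence: int     # number of exposures
    interval_d: int    # frequency: stress-free days between exposures


def enumerate_designs(stresses: Sequence[str], stages: Sequence[str],
                      intensities: Sequence[str], durations_h: Sequence[int],
                      incidences: Sequence[int],
                      intervals_d: Sequence[int]) -> List[StressTreatment]:
    """List all parameter combinations, skipping meaningless ones.

    A single exposure has no interval (interval_d must be 0), and
    repeated exposures need a non-zero interval.
    """
    designs = []
    for combo in product(stresses, stages, intensities,
                         durations_h, incidences, intervals_d):
        incidence, interval_d = combo[4], combo[5]
        if (incidence == 1) != (interval_d == 0):
            continue
        designs.append(StressTreatment(*combo))
    return designs


if __name__ == "__main__":
    designs = enumerate_designs(
        stresses=["drought", "heat"], stages=["seedling"],
        intensities=["mild", "moderate"], durations_h=[24, 72],
        incidences=[1, 2], intervals_d=[0, 3, 7])
    print(len(designs), "candidate treatments, each needing matched controls")
```

Even this deliberately small grid yields 24 treatments before replication and controls are counted, which is why the parameter choice needs to be deliberate rather than exhaustive.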

3.1.2 Control Treatments

Suitable and equivalent control experiments for each condition are as important as the actual experiments. Because epigenetic patterns can spontaneously change over time or across generations [4, 8], or because they might be affected by a systematic but unidentified factor in the study, control plants need to be grown and propagated under stress-free conditions along with the treated ones, under the same growth conditions, with the same handling, and for the same number of generations. Whenever the experimental setup allows for it, we recommend growing control and stress-treated plants in a randomized arrangement, to avoid batch and position effects.
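A randomized arrangement can be generated programmatically so that it is documented and reproducible. The following Python sketch assumes a single rectangular tray; the function name `randomized_layout` and the labels are hypothetical, and it applies only where control and treated plants can share the same growth space.

```python
import random
from typing import List


def randomized_layout(n_control: int, n_stress: int,
                      n_rows: int, n_cols: int,
                      seed: int = 0) -> List[List[str]]:
    """Assign control and stress plants to random tray positions.

    Unused positions are labelled 'empty'. Fixing the seed makes the
    layout reproducible so that it can be recorded with the experiment.
    """
    n_positions = n_rows * n_cols
    if n_control + n_stress > n_positions:
        raise ValueError("more plants than tray positions")
    labels = ["control"] * n_control + ["stress"] * n_stress
    labels += ["empty"] * (n_positions - len(labels))
    rng = random.Random(seed)
    rng.shuffle(labels)
    # Cut the shuffled list into rows of n_cols positions each.
    return [labels[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


if __name__ == "__main__":
    for row in randomized_layout(6, 6, n_rows=3, n_cols=5, seed=1):
        print(" ".join(f"{label:8s}" for label in row))
```

Using a new seed for each generation avoids carrying the same position effects through the whole propagation scheme.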

3.1.3 Replication and Sampling

• Replication: To gain information about generality and reproducibility of results, all experiments should be repeated as completely independent studies. The degree of necessary replication depends on the epigenetic mark(s) of interest and the experimental condition. As a rule, transcriptome and histone profiling need higher replication than DNA methylation analysis. We recommend a minimum of three biological replicates for transcriptome and histone profiling and a minimum of two biological replicates for DNA methylation analysis.

• Sampling: Experimental conditions permitting, it is generally advisable to collect tissue from several individuals, to exclude subtle intraindividual epigenetic variation from the analysis, and to increase the statistical power to detect condition-specific patterns. Care must be taken to choose material in similar developmental stages between individuals and between experiments. A factor often overlooked is the control over the circadian timing of sample collection (e.g., considering the light/dark regime of the culture conditions), as some epigenetic factors are connected to the internal clock (e.g., [9]).


3.1.4 Treatments with Chemical Compounds Acting on Epigenetic Mechanisms

A common practice in transgenerational studies is the treatment with epigenetically active chemical agents. Most common are DNA demethylating chemicals (e.g., 5-aza-cytidine or zebularine), histone deacetylase inhibitors (e.g., trichostatin-A), or general methylation inhibitors (e.g., dihydroxypropyladenine) (reviewed in [10]). Application of these substances is often combined with a stress treatment, in order to amplify stress-induced epigenetic changes and to strengthen their memorization in the plant. However, all of these drugs affect the methylation or acetylation status of many targets and act on multiple pathways, and observations following their application cannot be attributed with confidence to solely one mechanism or target. It is also important to consider that these chemicals are often toxic, difficult to apply uniformly to all tissues and developmental stages, and unstable under plant culture conditions, leading to uncontrolled variation between samples.

3.1.5 Resolution of Epigenetic Changes

One important parameter when interpreting stress-induced epigenetic changes is the degree of genomic resolution, which differs substantially depending on the method and has considerable implications for interpretation. This is most obvious in the case of DNA methylation analysis. Techniques often applied so far are methylation-sensitive amplification polymorphism (MSAP) and similar approaches, in which DNA is digested with methylation-sensitive and methylation-insensitive restriction enzymes, respectively, prior to PCR amplification of random or selected fragments. Consequently, MSAP delivers information on only very few genomic loci, does not allow conclusions on the genome-wide DNA methylation configuration, and is not suitable to link DNA methylation changes to treatment-induced heritable patterns. Reduced-representation techniques such as epiGBS or RAD-BS increase the number of investigated genomic loci considerably, but they too do not allow pinpointing specific genomic loci and functional links. Chemical conversion of unmethylated cytosine by bisulfite treatment and subsequent genome sequencing is currently the only technique that allows single-nucleotide resolution and quantitative DNA methylation analysis. Despite the limitation that a reference genome is required for the analysis, whole-genome bisulfite sequencing (WGBS) should be the method of choice for studying the role of DNA methylation in transgenerational epigenetic inheritance. Emerging techniques for recognizing modified bases during long-read direct sequencing will certainly move the field in this direction and expand the range towards analysis of repetitive sequences and single cells.
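Why bisulfite sequencing is quantitative at single-cytosine resolution can be shown with a minimal Python sketch. It assumes that read counts per cytosine have already been extracted by an alignment tool: a read showing C at the position is counted as methylated, a read showing T as unmethylated (converted). The function names are hypothetical, and the use of a sequence assumed to be unmethylated, such as plastid DNA, as a conversion control is a simplification.

```python
from typing import Iterable, Optional, Tuple


def methylation_level(c_reads: int, t_reads: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    Returns None for positions without coverage.
    """
    total = c_reads + t_reads
    if total == 0:
        return None
    return c_reads / total


def conversion_rate(control_sites: Iterable[Tuple[int, int]]) -> float:
    """Estimate bisulfite conversion from an unmethylated control.

    control_sites holds (c_reads, t_reads) pairs for cytosines in a
    sequence assumed to be unmethylated; every C read there is treated
    as a conversion failure.
    """
    c_total = 0
    t_total = 0
    for c_reads, t_reads in control_sites:
        c_total += c_reads
        t_total += t_reads
    if c_total + t_total == 0:
        raise ValueError("no reads on control sites")
    return t_total / (c_total + t_total)


if __name__ == "__main__":
    print(methylation_level(8, 2))
    print(conversion_rate([(1, 199), (0, 300), (2, 498)]))
```

In this toy input, 3 of 1,000 control reads remain unconverted, which corresponds to a conversion rate of 99.7%. Incomplete conversion inflates every methylation level computed with the first function, so the control estimate should be inspected before any differential analysis.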

3.2 Data Analysis

3.2.1 Differential DNA Methylation

The most common analysis of differential epigenetic patterns is that of differential cytosine methylation. One generally distinguishes between single differentially methylated positions (DMPs) and differentially methylated regions (DMRs). For a meaningful WGBS data analysis, it needs to be verified that the conversion rate of unmethylated cytosine during bisulfite treatment is above 99%. This can be easily scored based on reads aligning to the generally unmethylated plastid DNA that “contaminates” nearly all genomic DNA preparations.

• For DMPs, standard pipelines are available. As explained above, many tools were designed to analyze mammalian DNA methylation and are not directly applicable to plants. We recommend strategies that do not exclude CHG and CHH as potential methylation sites (e.g., [4, 8]). Because DMP analysis involves testing many single sites across at least two samples, stringent multiple testing correction needs to be applied to filter for statistically relevant changes. It should also be noted that DMP analysis is generally biased towards highly covered sites and towards the CG context (for a detailed explanation, see [4, 11]), and sequencing error rates should be considered.

• Multiple tools have been published for DMR analysis, again with the abovementioned limitation for applicability in plants. In general, sliding-window-based approaches are very lenient and lead to many false positives, while tools that use DMP clustering information suffer from the same biases as the DMP calling itself, i.e., a bias to sparsely distributed methylated CG sites. Several recently released tools take a different approach, either segmenting the genome into high and low states of methylation [11], training on a background set of DMRs [12], or segmenting based on methylation differences between groups [13]. As DMR calling usually involves a large number of independent statistical tests, users need to ensure that multiple testing correction is applied, if it is not integral to the tool used, to reduce the number of false positives.

3.2.2 Differential Analysis of Histone Modifications

It is not possible to provide recommendations that are generally applicable to the analysis of all posttranslational histone modifications, as they differ widely in frequency, intensity, stability, and detectability. However, the human Epigenome Roadmap project formulated essential general requirements regarding the statistical analysis that are also applicable to plants [14]. In brief, researchers need to ensure sufficient statistical power by replicating their experiments (see above), and they need to check the stability of differential loci by comparing those identified using true replicates with those identified using pseudo-replicates and with subsamples of the data [14].
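The pseudo-replicate check can be sketched in Python as follows. This is a simplified illustration, not the procedure of any specific pipeline: pseudo-replicates are formed by pooling the reads of all true replicates and splitting the pool at random into two halves, and loci are compared by identifier, whereas real peak sets would be compared by interval overlap. The function names are hypothetical.

```python
import random
from typing import Hashable, Iterable, List, Tuple


def make_pseudoreplicates(pooled_reads: Iterable[Hashable],
                          seed: int = 0) -> Tuple[List, List]:
    """Randomly split pooled reads into two pseudo-replicates."""
    reads = list(pooled_reads)
    rng = random.Random(seed)
    rng.shuffle(reads)
    half = len(reads) // 2
    return reads[:half], reads[half:]


def stable_fraction(true_loci: Iterable[Hashable],
                    pseudo_loci: Iterable[Hashable]) -> float:
    """Fraction of loci called from true replicates that are also
    called from pseudo-replicates (exact identifier match)."""
    true_set = set(true_loci)
    if not true_set:
        return 0.0
    return len(true_set & set(pseudo_loci)) / len(true_set)


if __name__ == "__main__":
    rep_a, rep_b = make_pseudoreplicates(range(10), seed=42)
    print(len(rep_a), len(rep_b))
    print(stable_fraction({"a", "b", "c", "d"}, {"b", "c", "d", "e"}))
```

If many differential loci called from the true replicates disappear when the same caller is run on pseudo-replicates or on subsamples, the calls depend on replicate-specific noise and should not be interpreted as condition-specific.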


3.3 Data Interpretation

Despite all pitfalls in the experimental design listed above, the most challenging part of studies investigating transgenerational epigenetic inheritance is likely the correct interpretation and contextualization of observations. Instead of strict guidelines, we provide a list of questions that researchers should address when drawing conclusions from their data. For each question, we provide background on the underlying rationale.

• Has the null hypothesis been correctly formulated? Studies addressing the question whether specific conditions lead to epigenetic inheritance of memory information must start from the null hypothesis, namely that it does not occur. Consequently, all statistical analyses must be designed to test the null hypothesis. Only when the null hypothesis has been rejected can claims be made about the potential alternative scenario, namely that there could be epigenetically inherited traits.

• Is the important difference between correlation and causation respected in the interpretation? Co-occurrence of transcriptional and epigenetic changes is often interpreted as a causal relationship in which the epigenetic change is the regulatory unit and the transcriptional change the consequence thereof. However, such conclusions cannot be drawn without a high temporal resolution of the analysis; in its absence, the relationship should be described as correlative.

• Are the observed epigenetic changes directed by the stress, or are they the result of accelerated, nondirected variation? In the literature, there are examples of stress-induced epigenetic changes that are different in each sample or replicate but are more frequent in stress-treated samples than in control samples [15]. In that case, the stress affects the epigenetic configuration, but the induced variation is not typical for the stress. Instead, the stress accelerated the rate at which epigenetic changes occur.

• Does the progeny of stressed plants display phenotypic differences compared to that of control plants? Plants might inherit epigenetic variants that have no phenotypic consequence. For stress-induced epigenetic variants, one would expect an impact on the progeny’s response to either the same or another stress. If no such phenotypic effect can be detected, there is no evidence for an adaptive advantage of the inherited epigenetic patterns.

• Does the progeny of stressed plants display transcriptional differences compared to that of control plants, either under control or under stress conditions? Epigenetic variants can have a phenotypic effect only if they cause a change in the transcriptional landscape. Such a change can be in cis or in trans to the epigenetically modified locus. If no transcriptional differences or no differential processing of the transcripts are detectable in the progeny, then the inherited epigenetic variant is without consequence and unrelated to a trait.

• Does the observed phenotypic/transcriptional/epigenetic effect last into the second- and third-generation progeny? Effects in the first-generation progeny can be related to the fitness of the parents, especially that of the mother plant, affecting seed filling, seed maturation, parental imprinting, etc. If these effects are transient and cannot be detected after one, or preferably two, stress-free generations, they cannot be considered transgenerationally inherited epigenetic traits.

• Are the effects in the progeny specific in response to the stress initially applied, or are they of a more general nature? Priming can be specific for the initially applied stress (in the case of transgenerational studies, for the stress experienced in the parental generation), or it can result in a nonspecific stress priming. Cross-priming in plants has been observed for several stresses [16]. Although nonspecificity is not an exclusion criterion for transgenerational inheritance, including cross-specificity tests in transgenerational studies improves the interpretation of the data substantially.

• When are observations meaningful compared to the expected frequency? Frequently, studies state that many genes identified as epigenetically affected in condition X are classified (for example, using a gene ontology database) as being involved in the plant’s response to X, and take this as confirmation of the initial hypothesis. Surprisingly often, statements of this kind lack a “base rate,” i.e., a comparison to the expected number of genes classified as being involved in the response to X. What is required in this case is a statistical test comparing the observed number of genes to the expected number of genes, considering the number of all epigenetically modified genes and the number of all genes classified as being involved in the response to X. Similar principles apply when comparing epigenetically and transcriptionally responsive loci.
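The base-rate comparison in the last question can be sketched as a one-sided hypergeometric test, here written in plain Python. The hypergeometric model is one common choice and is an assumption of this sketch rather than a method prescribed by the chapter; the function names and all gene counts are hypothetical.

```python
from math import comb


def expected_overlap(n_genome: int, n_annotated: int,
                     n_modified: int) -> float:
    """Number of annotated genes expected among the modified genes
    if modification were independent of annotation."""
    return n_modified * n_annotated / n_genome


def enrichment_p_value(n_genome: int, n_annotated: int,
                       n_modified: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for X ~ Hypergeometric.

    n_genome:    all genes considered (the background)
    n_annotated: genes classified as involved in the response to X
    n_modified:  genes identified as epigenetically modified
    n_overlap:   modified genes that also carry the annotation
    """
    upper = min(n_annotated, n_modified)
    total = comb(n_genome, n_modified)
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_modified - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 25,000 genes, 500 annotated, 200 modified.
    print(expected_overlap(25000, 500, 200))
    print(enrichment_p_value(25000, 500, 200, 12))
```

With these hypothetical numbers, 4 annotated genes are expected among the 200 modified genes by chance alone, so finding a handful of stress-annotated genes is not informative by itself. When many annotation categories are tested, the resulting p-values additionally require multiple testing correction.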

Acknowledgments

We would like to thank Ortrun Mittelsten Scheid for critical reading of the manuscript and for conceptual advice. This work was supported by the European Research Council (ERC) via the Marie Skłodowska Curie Innovative Training Network “Epidiverse” (grant ID: 764965) and the ERC Starting Grant “FEARSAP” (grant ID: 716823) and by the Austrian Academy of Sciences (ÖAW).


References

1. Deans C, Maggert KA (2015) What do you mean, “epigenetic”? Genetics 199:887–896
2. Lämke J, Bäurle I (2017) Epigenetic and chromatin-based mechanisms in environmental stress adaptation and stress memory in plants. Genome Biol 18:124
3. Crisp PA et al (2016) Reconsidering plant memory: intersections between stress recovery, RNA turnover, and epigenetics. Sci Adv 2:e1501340
4. Becker C et al (2011) Spontaneous epigenetic variation in the Arabidopsis thaliana methylome. Nature 480:245–249
5. Dubin MJ et al (2015) DNA methylation in Arabidopsis has a genetic basis and shows evidence of local adaptation. Elife 4:e05255
6. Ding Y, Fromm M, Avramova Z (2012) Multiple exposures to drought ‘train’ transcriptional responses in Arabidopsis. Nat Commun 3:740
7. Sani E et al (2013) Hyperosmotic priming of Arabidopsis seedlings establishes a long-term somatic memory accompanied by specific changes of the epigenome. Genome Biol 14:R59
8. Schmitz RJ et al (2011) Transgenerational epigenetic instability is a source of novel methylation variants. Science 334:369–373
9. Baerenfaller K et al (2016) Diurnal changes in the histone H3 signature H3K9ac|H3K27ac|H3S28p are associated with diurnal gene expression in Arabidopsis. Plant Cell Environ 39:2557–2569
10. Zhang H et al (2013) Chemical probes in plant epigenetics studies. Plant Signal Behav 8(9):e25364
11. Hagmann J, Becker C (2017) Assessing distribution and variation of genome-wide DNA methylation using short-read sequencing. Methods Mol Biol 1610:61–72
12. Srivastava A et al (2018) HOME: a histogram based machine learning approach for effective identification of differentially methylated regions. BMC Bioinformatics 20:253. https://doi.org/10.1101/228221
13. Jühling F et al (2016) Metilene: fast and sensitive calling of differentially methylated regions from bisulfite sequencing data. Genome Res 26:256–262
14. Bailey T et al (2013) Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol 9:e1003326
15. Jiang C et al (2014) Environmentally responsive genome-wide accumulation of de novo Arabidopsis thaliana mutations and epimutations. Genome Res 24:1821–1829
16. Rejeb IB, Pastor V, Mauch-Mani B (2014) Plant responses to simultaneous biotic and abiotic stress: molecular mechanisms. Plants 3:458–475

INDEX

A
Abscisic acid (ABA) .... 247, 249, 251, 253
Alternative splicing .... 24, 26
Antibodies .... 16, 90, 132, 137, 143
Arabidopsis
  genome of .... 21, 116, 164, 222
  genomic imprinting in .... 9, 10, 174, 190

B
Barcodes, epigenetic .... 227–231
Bisulfite sequencing (BS) .... 15–17, 38, 40, 43, 66, 96
  See also Reduced representation bisulfite sequencing (RRBS)
Brassica rapa .... 66, 73

C
ChIP-Seq .... 147, 148
Chromatin
  domains .... 108
  fibers .... 147
  immunoprecipitation (see ChIP-seq)
  loops .... 156
  of gene clusters .... 3
Chromocentres .... 108
Conifers .... 217, 218, 224
CpG islands .... 16

D
DAPI staining .... 103, 110
ddRAD .... 47–63
Differentially methylated regions (DMRs) .... 17–20, 22, 48–50, 267
Differentially methylated genes (DMGs) .... 23, 24, 49, 50
DNA methylation
  CG .... 16, 19, 20, 34, 47–49, 65, 66, 204, 228, 253, 263, 264, 267
  CHG .... 16, 19, 34, 40, 47–49, 65, 204, 228, 264, 267
  non-CG .... 253

E
Endosperm .... 8, 10, 163, 173, 175, 176, 183, 184, 188, 190–192, 262, 263
Epigenetic genome-by-sequencing (epiGBS) .... 203–214, 266
Epigenetic inheritance .... 6, 261–269
Epigenetic recombinant inbred lines (epi-RILs) .... 9, 25
Epimutations .... 8, 66

F
Fluorescence activated nuclei sorting (FANS) .... 95–104

G
Gene balance hypothesis .... 161–169
Genome dosage .... 3, 10, 162
Genomic imprinting, see Imprinting

H
High-throughput chromosome conformation capture (Hi-C) .... 115–125, 147–156
Histones
  acetylation of .... 246, 250, 262, 266
  chemical derivatization using propionic anhydride .... 82
  H3.3 .... 22, 83, 254
  methylation of .... 5, 8, 22, 23, 27, 34, 228, 246, 254, 262, 266
HOME .... 22

I
Imprinting .... 8, 174–176, 178, 179, 181, 182, 184, 185, 188, 190–195, 269
Isolation of nuclei tagged in specific cell types (INTACT) .... 96, 116

J
Juicer .... 148, 152, 154–156

Charles Spillane and Peter McKeown (eds.), Plant Epigenetics and Epigenomics: Methods and Protocols, Methods in Molecular Biology, vol. 2093, https://doi.org/10.1007/978-1-0716-0179-2, © Springer Science+Business Media, LLC, part of Springer Nature 2020

L
LasX .... 109, 110

M
MASCOT .... 82
Mass spectrometry (MS) .... 82, 84, 87–89, 131, 133, 135, 137, 139, 140, 143
Metabolic gene clusters (MGCs) .... 129–145
METHimpute .... 22
Methylation sensitive amplified fragment length polymorphisms (MS-AFLP/MSAP) .... 203, 228, 230–233, 236–241, 266
Methyl-IT .... 22–24
MSH1 .... 22, 25

N
Next generation sequencing (NGS), see RNA-seq
Non-Mendelian inheritance .... 6–8
Nuclear envelope (NE) .... 96, 103, 107, 108
Nuclear matrix constituent proteins (NMCPs) .... 108
Nuclei isolation buffer (NIB) .... 84, 87, 97, 98, 104, 109, 110, 117, 119, 122, 148, 150

O
Oryza sativa, see Rice

P
Parallel reaction-monitoring (PRM)-mass spectrometry .... 83, 85, 90
Paramutation .... 8
Plant breeding .... 33
Plant Imprinting Database .... 174, 176, 178, 191, 194
Polyploidy (in plants) .... 161, 162
Priming (of seeds) .... 8, 97, 101, 244, 246–249, 254, 255, 261, 262, 269

R
Recombinant inbred lines (RILs) .... 20
  See also epi-RILs
Reduced representation bisulfite sequencing (RRBS) .... 65–78, 203, 204
Rice .... 20, 66–68, 71, 81–91, 148, 174, 247–249
RNA
  interference (RNAi) .... 25, 82
  isolation .... 131, 133, 219
  sequencing (RNA-seq) .... 43, 164, 166, 167, 174, 175, 181, 183–187, 189, 190, 221
RNA-dependent DNA methylation (RdDM) .... 21, 22, 26, 197, 198

S
Saffron .... 227, 228, 230, 232, 237, 240
SEQUEST .... 82
Shoot apical meristem (SAM) .... 96, 100, 103
Short Read Archive (SRA) .... 152, 176, 190
Small RNAs (sRNAs) .... 3, 6, 9, 101, 217, 218, 220–222, 224
Somatic memories .... 244–246, 254, 262, 265
Stamens .... 227, 228, 232, 236
Stress, memory in plants .... 27, 48, 243–245, 261, 264

T
Transgenerational inheritance .... 21, 25, 262, 269
Transposable elements (TEs) .... 9, 16, 18–20, 22, 24, 26, 34, 47, 101, 103, 117, 121, 132, 137, 148, 150, 152, 217, 219, 220, 224
Triticum aestivum L., see Wheat

W
Wheat genomes .... 33, 34, 42, 43, 174
Whole genome methylation sequencing .... 16

Z
Zea mays L. .... 66, 73, 191, 244