English Pages 343 [330] Year 2021
Methods in Molecular Biology 2370
Gavin P. Davey Editor
Glycosylation Methods and Protocols
METHODS IN MOLECULAR BIOLOGY
Series Editor John M. Walker School of Life and Medical Sciences University of Hertfordshire Hatfield, Hertfordshire, UK
For further volumes: http://www.springer.com/series/7651
For over 35 years, biological scientists have come to rely on the research protocols and methodologies in the critically acclaimed Methods in Molecular Biology series. The series was the first to introduce the step-by-step protocols approach that has become the standard in all biomedical protocol publishing. Each protocol is provided in readily reproducible step-by-step fashion, opening with an introductory overview and a list of the materials and reagents needed to complete the experiment, followed by a detailed procedure supported by a helpful notes section offering tips and tricks of the trade as well as troubleshooting advice. These hallmark features were introduced by series editor Dr. John Walker and constitute the key ingredient in each and every volume of the Methods in Molecular Biology series. Tested and trusted, comprehensive and reliable, all protocols from the series are indexed in PubMed.
Glycosylation Methods and Protocols
Edited by
Gavin P. Davey School of Biochemistry and Immunology, Trinity College Dublin, Dublin, Ireland
ISSN 1064-3745 ISSN 1940-6029 (electronic) Methods in Molecular Biology ISBN 978-1-0716-1684-0 ISBN 978-1-0716-1685-7 (eBook) https://doi.org/10.1007/978-1-0716-1685-7 © Springer Science+Business Media, LLC, part of Springer Nature 2022 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Humana imprint is published by the registered company Springer Science+Business Media, LLC, part of Springer Nature. The registered company address is: 1 New York Plaza, New York, NY 10004, U.S.A.
Preface

The field of glycosylation research has expanded in recent years, and discovery in this area is moving at an extraordinary rate, with major impact on the fields of immunology, cancer, virology, metabolic disorders, bioprocessing, biotherapeutics, bioinformatics, and neuroscience. Undoubtedly, advances in analytical technologies, light microscopy, click chemistry, gene editing, glycoengineering, glycoinformatics, and computational biology have enabled this scale-up in discovery. Because glycosylation research is multidisciplinary, the latest discoveries are reported across a wide range of specialist journals. This book brings together leading glycoscientists with multidisciplinary expertise to give the reader comprehensive insight into the latest methods and technologies for glycosylation research. The first part deals with the latest analytical and bioinformatics technologies that enable glycosylation complexity to be characterized. The second part details the importance of synthetic chemistry and glycoengineering in bioprocessing and biotherapeutics development. The third part focuses on the systems biology and computational technologies that scientists use to analyze enzymatically controlled glycosylation events in the cell. The fourth part concentrates on how cellular glycosylation biomarkers can be identified and used to stratify human clinical datasets.

With the large amount of mass spectral data now available on glycan structures, properly curated and managed databases are required. Chapter 1 provides information on bioinformatics tools and standards used to characterize glycans by chromatographic and mass spectrometric analysis. It also describes new informatics protocols and software tools for automating glycan annotation. Chapter 2 describes a new database, GlycoStore, a platform for liquid chromatography and capillary electrophoresis glycan structure data.
Chapter 3 describes bioinformatics advances in glycosylation and highlights converging efforts toward building a consistent picture of protein glycosylation. Chapter 4 describes HPLC methods for characterizing milk saccharides and glycosylated molecules in food sources. Chapter 5 presents cGlyco, an open-source program for comparative glycomics data analysis, and its application to the study of glycan structures in human blood and immune cells.

Efficient control of the glycosylation of recombinantly produced monoclonal antibodies (mAbs) is a priority in the biopharmaceutical and bioprocessing industries. Chapter 6 describes a new method for the chemoenzymatic remodeling of antibody glycosylation during the mAb isolation process. Chapter 7 details a new method for rapid antibody glycoengineering in Chinese hamster ovary (CHO) cells using RNAi and CGE-LIF N-glycomics technologies. The influence of glycosylation on protein folding and aggregation is not only important in cell biology but also has major implications for unwanted aggregation of biotherapeutics during bioprocessing. Chapter 8 describes a new in silico method for identifying aggregation-prone regions in therapeutic antibodies and the specific glycan structures likely to influence aggregation. Yeast strains are widely used as recombinant protein expression systems, and Chapter 9 provides a detailed review of yeast glycoengineering systems, including yeast surface display of glycoproteins.

The enzymes that carry out glycosylation reactions in the Golgi produce a complex network of glycans consisting of thousands of possible structures. In Chapter 10, a stochastic simulation modeling approach is used to generate glycan networks and predict glycosyltransferase locations in the Golgi compartments. Chapter 11 reports on a new formal-language-based generator of O-glycosylation networks and how it may be used to interrogate the networks that synthesize O-glycans and gangliosides. The large number of nuclear-encoded glycosyltransferases and glycosidases form the complex enzyme networks that drive glycosylation inside cells. Chapter 12 describes synthetic strategies for a series of FRET-enabled probes that can be used to assay the activities of carbohydrate-active enzymes.

Specific glycan structures have been identified in a number of human diseases and are now established biomarkers. Chapter 13 details a robust method for analyzing blood plasma glycans that enables glycomarkers to be identified and associated with clinical datasets. Lectin biology has provided unique insight into glycobiology and glycosylation networks. Chapter 14 describes lectin histochemistry and dual lectin and antibody co-localization techniques for identifying glycan patterns on tissue samples and on neuronal and astrocytic cell lines. Chapter 15 describes the use of lectin affinity chromatography to screen for differences in sialic acid linkage in prostate-specific antigen (PSA) from prostate cancer serum samples at different stages of aggressiveness. Metabolic labeling of sugar nucleotides used in glycosylation reactions has provided insight into glycan structures on cells. Chapter 16 describes how different azide-sugar molecules can be incorporated into the glycocalyx of neurons and imaged using STED microscopy. Chapter 17 describes flow cytometry and confocal microscopy techniques for visualizing the cell surface sialome on human immune cells.

Dublin, Ireland
Gavin P. Davey
Contents
Preface . . . v
Contributors . . . ix

PART I ANALYTICAL AND BIOINFORMATICS METHODS FOR STUDYING GLYCOSYLATION

1 Glycoinformatics Tools for Comprehensive Characterization of Glycans Enzymatically Released from Proteins . . . 3
Ian Walsh, Sophie Zhao, Katherine Wongtrakul-Kish, Matthew Choo, Shi Jie Tay, Christopher H. Taron, Pauline M. Rudd, and Terry Nguyen-Khuong
2 GlycoStore: A Platform for H/UPLC and Capillary Electrophoresis Glycan Data . . . 25
Matthew P. Campbell, Sophie Zhao, Jodie L. Abrahams, Terry Nguyen-Khuong, and Pauline M. Rudd
3 An Interactive View of Glycosylation . . . 41
Julien Mariethoz, Davide Alocci, Niclas G. Karlsson, Nicolle H. Packer, and Frédérique Lisacek
4 Characterization and Analysis of Food-Sourced Carbohydrates . . . 67
Leonie J. Kiely and Rita M. Hickey
5 Comparative Glycomics Analysis of Mass Spectrometry Data . . . 97
Yusen Zhou and Sriram Neelamegham

PART II SYNTHETIC BIOLOGY AND GLYCOENGINEERING METHODS FOR CONTROLLING GLYCOSYLATION

6 Cell Free Remodeling of Glycosylation of Antibodies . . . 117
Letícia Martins Mota, Venkata S. Tayi, and Michael Butler
7 Rapid Antibody Glycoengineering in CHO Cells Via RNA Interference and CGE-LIF N-Glycomics . . . 147
Pavlos Kotidis, Masue Marbiah, Roberto Donini, Itzcóatl A. Gómez, Ioscani Jimenez del Val, Stuart M. Haslam, Karen M. Polizzi, and Cleo Kontoravdi
8 In Silico Analysis of Therapeutic Antibody Aggregation and the Influence of Glycosylation . . . 169
Hyesoo Jeon, Jerrard M. Hayes, and K. H. Mok
9 Recent Advances Toward Engineering Glycoproteins Using Modified Yeast Display Platforms . . . 185
Anjali Shenoy and Adam W. Barb

PART III SYSTEMS BIOLOGY AND COMPUTATIONAL METHODS FOR DECIPHERING GLYCOSYLATION NETWORKS

10 Computational Modeling of Glycan Processing in the Golgi for Investigating Changes in the Arrangements of Biosynthetic Enzymes . . . 209
Ben West, A. Jamie Wood, and Daniel Ungar
11 O-Glycologue: A Formal-Language-Based Generator of O-Glycosylation Networks . . . 223
Andrew G. McDonald and Gavin P. Davey
12 Synthetic Strategies for FRET-Enabled Carbohydrate Active Enzyme Probes . . . 237
Meenakshi Singh, Michael Watkinson, Eoin M. Scanlan, and Gavin J. Miller

PART IV GLYCOSYLATION BIOMARKERS AND CLINICAL DATASETS

13 N-glycan Characterization by Liquid Chromatography Coupled with Fluorimetry and Mass Spectrometry . . . 267
Richard A. Gardner, Paulina A. Urbanowicz, and Daniel I. R. Spencer
14 Lectin Histochemistry for Tissues and Cells, and Dual Lectin and Antibody Co-localization . . . 281
Siobhán S. McMahon and Michelle Kilcoyne
15 Lectin Affinity Chromatography for the Discovery of Novel Cancer Glycobiomarkers: A Case Study with PSA Glycoforms and Prostate Cancer . . . 301
Esther Llop and Rosa Peracaula
16 Metabolic Labeling of Primary Neurons Using Carbohydrate Click Chemistry . . . 315
Jerrard M. Hayes, Darren M. O’Hara, and Gavin P. Davey
17 Monitoring the Sialome on Human Immune Cells . . . 323
Laura K. O’Farrell, Alexander D. Fraser, and Gavin P. Davey

Index . . . 331
Contributors
JODIE L. ABRAHAMS • Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
DAVIDE ALOCCI • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
ADAM W. BARB • Biochemistry and Molecular Biology Department, University of Georgia, Athens, GA, USA
MICHAEL BUTLER • National Institute for Bioprocessing, Research and Training (NIBRT), Dublin, Ireland
MATTHEW P. CAMPBELL • Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
MATTHEW CHOO • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
GAVIN P. DAVEY • School of Biochemistry and Immunology, Trinity College Dublin, Dublin, Ireland
IOSCANI JIMENEZ DEL VAL • School of Chemical & Bioprocess Engineering, University College Dublin, Dublin, Ireland
ROBERTO DONINI • Department of Chemical Engineering, Imperial College London, London, UK; Department of Life Sciences, Imperial College London, London, UK
ALEXANDER FRASER • Limerick General Hospitals, Limerick, Ireland; School of Medicine, University of Limerick, Limerick, Ireland
RICHARD A. GARDNER • Culham Science Centre, Research and Development, Ludger Ltd, Abingdon, Oxfordshire, UK
ITZCÓATL A. GÓMEZ • School of Chemical & Bioprocess Engineering, University College Dublin, Dublin, Ireland
STUART M. HASLAM • Department of Life Sciences, Imperial College London, London, UK
JERRARD M. HAYES • Trinity Biomedical Sciences Institute (TBSI), School of Biochemistry & Immunology, Trinity College Dublin, The University of Dublin, Dublin 2, Ireland
RITA M. HICKEY • Teagasc Food Research Centre, Moorepark, Fermoy, Co. Cork, Ireland
HYESOO JEON • Trinity Biomedical Sciences Institute (TBSI), School of Biochemistry & Immunology, Trinity College Dublin, The University of Dublin, Dublin 2, Ireland
NICLAS G. KARLSSON • Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
LEONIE J. KIELY • Teagasc Food Research Centre, Moorepark, Fermoy, Co. Cork, Ireland
MICHELLE KILCOYNE • Carbohydrate Signalling Group, Discipline of Microbiology, School of Natural Sciences, National University of Ireland Galway, Galway, Ireland
CLEO KONTORAVDI • Department of Chemical Engineering, Imperial College London, London, UK
PAVLOS KOTIDIS • Department of Chemical Engineering, Imperial College London, London, UK
FRÉDÉRIQUE LISACEK • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, University of Geneva, Geneva, Switzerland
ESTHER LLOP • Department of Biology, Biochemistry and Molecular Biology Unit, University of Girona, Girona, Spain
MASUE MARBIAH • Department of Chemical Engineering, Imperial College London, London, UK; Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
JULIEN MARIETHOZ • Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, University of Geneva, Geneva, Switzerland
ANDREW G. MCDONALD • School of Biochemistry and Immunology, Trinity College Dublin, Dublin, Ireland
SIOBHÁN S. MCMAHON • Discipline of Anatomy and Galway Neuroscience Centre, School of Medicine Nursing and Health Sciences, National University of Ireland Galway, Galway, Ireland
GAVIN J. MILLER • Lennard-Jones Laboratories, School of Chemical and Physical Sciences, Keele University, Keele, Staffordshire, UK
K. H. MOK • Trinity Biomedical Sciences Institute (TBSI), School of Biochemistry & Immunology, Trinity College Dublin, The University of Dublin, Dublin 2, Ireland; Centre for Research on Adaptive Nanostructures and Nanodevices (CRANN), Trinity College Dublin, The University of Dublin, Dublin 2, Ireland
LETÍCIA MARTINS MOTA • Cell Technology Group, National Institute for Bioprocessing, Research and Training (NIBRT), Dublin, Ireland
SRIRAM NEELAMEGHAM • Chemical and Biological Engineering, Biomedical Engineering and Medicine, State University of New York, Buffalo, NY, USA
TERRY NGUYEN-KHUONG • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
LAURA K. O’FARRELL • School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
DARREN M. O’HARA • School of Biochemistry and Immunology, Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin, Ireland
NICOLLE H. PACKER • Department of Molecular Sciences and ARC Centre of Excellence for Nanoscale Biophotonics, Macquarie University, Sydney, NSW, Australia; Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
ROSA PERACAULA • Department of Biology, Biochemistry and Molecular Biology Unit, University of Girona, Girona, Spain
KAREN M. POLIZZI • Department of Chemical Engineering, Imperial College London, London, UK; Imperial College Centre for Synthetic Biology, Imperial College London, London, UK
PAULINE M. RUDD • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
EOIN M. SCANLAN • School of Chemistry and Trinity Biomedical Sciences Institute, Trinity College Dublin, Dublin 2, Ireland
ANJALI SHENOY • Biochemistry and Molecular Biology Department, University of Georgia, Athens, GA, USA
MEENAKSHI SINGH • Lennard-Jones Laboratories, School of Chemical and Physical Sciences, Keele University, Keele, Staffordshire, UK
DANIEL I. R. SPENCER • Culham Science Centre, Research and Development, Ludger Ltd, Abingdon, Oxfordshire, UK
CHRISTOPHER H. TARON • New England Biolabs, Ipswich, MA, USA
SHI JIE TAY • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
VENKATA S. TAYI • Department of Microbiology, University of Manitoba, Winnipeg, MB, Canada
DANIEL UNGAR • Department of Biology, University of York, York, UK
PAULINA A. URBANOWICZ • Culham Science Centre, Research and Development, Ludger Ltd, Abingdon, Oxfordshire, UK
IAN WALSH • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
MICHAEL WATKINSON • Lennard-Jones Laboratories, School of Chemical and Physical Sciences, Keele University, Keele, Staffordshire, UK
BEN WEST • Department of Biology, University of York, York, UK
KATHERINE WONGTRAKUL-KISH • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
A. JAMIE WOOD • Departments of Biology and Mathematics, University of York, York, UK
SOPHIE ZHAO • Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
YUSEN ZHOU • Chemical and Biological Engineering, Biomedical Engineering and Medicine, University at Buffalo, State University of New York, Buffalo, NY, USA
Part I Analytical and Bioinformatics Methods for Studying Glycosylation
Chapter 1
Glycoinformatics Tools for Comprehensive Characterization of Glycans Enzymatically Released from Proteins
Ian Walsh, Sophie Zhao, Katherine Wongtrakul-Kish, Matthew Choo, Shi Jie Tay, Christopher H. Taron, Pauline M. Rudd, and Terry Nguyen-Khuong

Abstract
Glycosylation is important in biology, contributing to both protein conformation and function. Structurally, glycosylation is complex and diverse; this complexity is reflected in the topology, composition, monosaccharide linkages, and isomerism of each oligosaccharide. Glycoanalytics is the discipline concerned with understanding and characterizing this complexity and correlating it with biology. It includes analytical steps such as sample preparation, instrument measurement, and data analysis. Of these, data analysis has emerged as a critical bottleneck: as data collection has become increasingly high-throughput, workflows have become data-rich but lack rapid, automated data analytics. To address this issue, the field has been developing software for interpreting quantitative glycomics studies. Here, we describe a protocol that uses available informatics tools to analyze data from released glycans characterized by high/ultra-performance liquid chromatography (H/UPLC) coupled with mass spectrometry (MS).

Key words: Glycoanalytics, Glycosylation, Mass spectrometry, Bioinformatics
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_1, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1 Introduction

The field of glycoanalytics seeks to enable the analysis of glycan structural complexity and to understand its contribution to protein function. Rapid advances in analytical technologies such as capillary electrophoresis (CE) [1], mass spectrometry (MS) [2–4], liquid chromatography (LC) [5–8], and ion mobility (IM) [9–11] have dramatically enhanced both the sensitivity and the throughput of analyses of complex biological glycomes. As these approaches have become more data-intensive, the need for better data analysis software has grown, and the field has started to respond with software for automated interpretation of glycomics data [12]. In the coming years, further enhancement of the field’s informatics capabilities will be imperative to advance the role that glycoanalytics plays in basic science, biopharmaceutical development, and clinical diagnostics.

The level of structural assignment required by an application is an important consideration for any software that annotates released glycans. In many glycoanalytical laboratories, glycans are analyzed by LC-MS. Typically, released and fluorescently tagged glycans are separated by hydrophilic interaction liquid chromatography on an ultra-performance liquid chromatography system coupled with electrospray ionization mass spectrometry (HILIC-UPLC-ESI/MS). In the UPLC component of this workflow, the retention (elution) time of a given glycan may vary between experiments. To correct for this, a reproducible and predictable glucose unit (GU) value can be calculated by comparison with a dextran ladder standard. In the ESI-MS component of the workflow, the mass-to-charge ratio of each glycan species is measured, from which its mass is derived. Both attributes (GU and mass) are therefore commonly used to annotate LC-MS peaks. Structure assignment relies on comparing these data with curated databases that hold GU and mass values for hundreds of known glycan species. LC-MS databases such as GlycoStore (in press) and GlycoBase [13] store information that helps software such as UNIFI (developed by Waters Corp.) make glycan structural assignments based on GU and mass. Where more in-depth analysis is required to refine structural interpretations, exoglycosidase arrays can help confirm the assignments. Interpreting such data is complicated and requires substantial manual effort. This chapter introduces the reader to the bioinformatics tools and standards used to characterize glycans by HILIC-UPLC-ESI/MS analysis. We describe established standards for data representation used by glycoanalytical scientists.
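The GU normalization described above can be sketched as a calibration against the dextran ladder. This minimal example assumes the ladder peaks have already been picked and assigned their GU values; it interpolates linearly between the two flanking ladder peaks, whereas software packages often fit a smooth curve (for example, a polynomial or cubic spline) through the ladder instead. The function name `retention_to_gu` and the ladder retention times are hypothetical.

```python
from bisect import bisect_left


def retention_to_gu(rt, ladder):
    """Convert a retention time (min) to a glucose unit (GU) value.

    `ladder` is a list of (retention_time, gu) pairs from a dextran ladder run,
    sorted by retention time. Values are linearly interpolated between the two
    flanking ladder peaks; points outside the ladder raise ValueError rather
    than being extrapolated.
    """
    times = [t for t, _ in ladder]
    if not times[0] <= rt <= times[-1]:
        raise ValueError(f"retention time {rt} lies outside the ladder range")
    i = bisect_left(times, rt)
    if times[i] == rt:  # exact hit on a ladder peak
        return float(ladder[i][1])
    (t0, g0), (t1, g1) = ladder[i - 1], ladder[i]
    return g0 + (g1 - g0) * (rt - t0) / (t1 - t0)


# Hypothetical ladder: retention times (min) of dextran oligomers with 2-8 glucose units.
dextran_ladder = [(5.0, 2), (8.0, 3), (11.5, 4), (15.0, 5), (18.2, 6), (21.0, 7), (23.5, 8)]

print(retention_to_gu(13.25, dextran_ladder))  # midway between GU 4 and GU 5 -> 4.5
```

Refusing to extrapolate is a deliberate choice here: a peak eluting outside the ladder has no bracketing reference, so its GU value would be unreliable.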
Additionally, we present an informatics protocol for the characterization and quantitation of N-glycans enzymatically released from a single glycoprotein (e.g., immunoglobulin G) or from a mixture of glycoproteins (e.g., total serum). Two software tools that automate glycan annotation are described. The first, UNIFI, assigns glycans by matching GU and mass values to a reference database. The second, GlycanAnalyzer, is a new utility that uses exoglycosidase array digestion data to support structural interpretation and can reveal coeluting isomers.
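The idea of assigning a peak by matching both GU and mass against a reference database can be pictured with the following sketch. It is not UNIFI’s algorithm. It assumes each peak has already been reduced to a GU value and to the neutral monoisotopic mass of the unlabeled glycan (charge state deconvoluted and fluorescent label mass subtracted), and it accepts a reference entry only when both values fall inside fixed tolerances. The residue masses are standard monoisotopic values; the reference entries, their GU values, the tolerances, and all names are hypothetical placeholders.

```python
from dataclasses import dataclass

# Monoisotopic residue masses (Da) of common monosaccharide classes.
RESIDUE_MASS = {
    "Hex": 162.052825,     # e.g., Man, Gal, Glc
    "HexNAc": 203.079374,  # e.g., GlcNAc, GalNAc
    "dHex": 146.057910,    # e.g., Fuc
    "NeuAc": 291.095419,
}
WATER = 18.010565  # added once for the free reducing end


def glycan_mass(composition):
    """Monoisotopic neutral mass of an unlabeled, free reducing-end glycan."""
    return sum(RESIDUE_MASS[r] * n for r, n in composition.items()) + WATER


@dataclass
class Reference:
    name: str
    gu: float
    composition: dict

    @property
    def mass(self):
        return glycan_mass(self.composition)


def match_peak(gu, mass, references, gu_tol=0.2, ppm_tol=10.0):
    """Return names of references whose GU and mass both fall within tolerance."""
    hits = []
    for ref in references:
        ppm_error = abs(mass - ref.mass) / ref.mass * 1e6
        if abs(gu - ref.gu) <= gu_tol and ppm_error <= ppm_tol:
            hits.append(ref.name)
    return hits


# Hypothetical reference entries: the GU values are placeholders, not database values.
references = [
    Reference("ref-001", 5.9, {"Hex": 3, "HexNAc": 4}),
    Reference("ref-002", 6.4, {"Hex": 3, "HexNAc": 4, "dHex": 1}),
]

print(round(glycan_mass({"Hex": 3, "HexNAc": 4}), 4))  # 1316.4865
print(match_peak(6.35, 1462.545, references))           # ['ref-002']
```

Requiring both criteria means mass can separate compositions that happen to coelute, while GU can help separate isomers that share a mass. Peaks that still match several entries are the cases where exoglycosidase digestion data become useful.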
2 Data Collection and Presentation
2.1 Glycan Structure Representations and Visualization
Glycan structures are nonlinear sequences with complex branching and diverse glycosidic linkages. The many possible substituents at the reducing end (fluorescent labels, derivatization, etc.) make glycan structure representation even more challenging. To avoid redundant depictions of these complicated biomolecules across the community, glycan representations have evolved, both for visual presentation and as machine-readable file/text formats for database entry and bioinformatics. Initially, file formats were developed independently by individual databases for both visual representation and data interpretation, resulting in an accumulation of formats without consensus [14]. For example, the IUPAC carbohydrate nomenclature [15] was the first text representation of glycan sequences; it could describe complex features of glycan structures, but in many different ways. Linear notations such as LINUCS [16] and LinearCode [17], developed in 2001/2002, provided simpler approaches by representing each glycan with a unique sequence. The KCF format [18], developed by the Japanese-funded KEGG project, and GlycoCT [19] both use connection tables to cover all possible monosaccharides and are represented by linear notations. The most recently developed text representation, WURCS [20], is a linear notation that can uniquely represent any glycan structure, including those with unknown linkages.

Symbol nomenclature is also commonly used for visual representation of glycan structures. It greatly simplifies recognition of complicated structures and has been widely adopted in scientific communications and regulatory reports. The two most commonly used symbolic representations are the Consortium for Functional Glycomics (CFG) notation and the University of Oxford (UOXF) notation. CFG notation originates from Kornfeld [21] and was standardized in the first edition of the textbook Essentials of Glycobiology. Recently, the glycomics community has come together to extend CFG into a new notation, the Symbol Nomenclature for Glycans (SNFG; Fig. 1) [41], which covers nonvertebrate glycans and will be extensively documented in the third edition of Essentials of Glycobiology.
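To illustrate how a linear text notation encodes a branched structure, the sketch below parses a simplified subset of IUPAC condensed notation (residue names, linkages such as `(b1-4)`, and square-bracketed branches) into a list of residues and parent-child edges. It is a minimal teaching example, not a validator: the function name `parse_iupac_condensed` is hypothetical, unrecognized characters are silently skipped, and substituents, repeating units, and uncertain attachment points are not handled. Dedicated glycoinformatics converters cover the full notation.

```python
import re

# A residue name with an optional linkage such as "(b1-4)", or a branch bracket.
TOKEN = re.compile(r"\[|\]|([A-Za-z0-9]+)(?:\(([ab?])(\d|\?)-(\d|\?)\))?")


def parse_iupac_condensed(seq):
    """Parse a simplified IUPAC condensed string into residues and linkage edges.

    The reducing end is the rightmost residue, so tokens are read right to left.
    Residues are numbered in that order (reducing end = 0), and each edge is
    (child_index, linkage, parent_index). A bracketed branch attaches to the
    residue written immediately to its right.
    """
    residues, edges = [], []
    stack = []      # attachment points of the branches currently open
    parent = None   # residue that the next token (leftwards) links to
    for match in reversed(list(TOKEN.finditer(seq))):
        token = match.group(0)
        if token == "]":        # a branch starts (reading leftwards)
            stack.append(parent)
        elif token == "[":      # the branch ends; resume at its attachment point
            parent = stack.pop()
        else:
            name, anomer, pos_child, pos_parent = match.groups()
            index = len(residues)
            residues.append(name)
            if parent is not None:
                linkage = f"{anomer}{pos_child}-{pos_parent}" if anomer else "unknown"
                edges.append((index, linkage, parent))
            parent = index
    return residues, edges


# Man3GlcNAc2, the common N-glycan core.
core = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"
residues, edges = parse_iupac_condensed(core)
for child, linkage, parent in edges:
    print(f"{residues[child]}{child} -{linkage}-> {residues[parent]}{parent}")
# GlcNAc1 -b1-4-> GlcNAc0
# Man2 -b1-4-> GlcNAc1
# Man3 -a1-6-> Man2
# Man4 -a1-3-> Man2
```

The resulting edge list is essentially a small connection table, which is the idea that formats such as GlycoCT formalize with far more detail.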
Fig. 1 GlycoStore is central to LC-MS released glycan analytics. GlycoStore accepts and stores experimental data (standardized LC-MS values) via the GlycoCT file standard and permits retrieval of mass and GU values in this dataflow. Its collections are used to characterize glycans from LC-MS experiments worldwide, in particular experiments at the Bioprocessing Technology Institute (Singapore) and the National Institute for Bioprocessing Research and Training (Ireland). It uses GlycoRDF, which permits linkage to and registration with GlyTouCan, and from there to a plethora of other databases detailed in Table 1

Oxford notation [42] uniquely shows each sugar’s anomeric configuration and linkage diagrammatically (e.g., alpha linkages are drawn as dashed lines), making it a very informative visual glycan representation. A variety of tools have been developed to draw glycan structures in symbolic form. GlycanBuilder [43] is a versatile program that supports both CFG and Oxford nomenclatures. Users can either draw glycan structures de novo or import structure information from text in GlycoCT format [44]. Structures drawn on its canvas can be exported as a GlycoCT string or saved in a standard graphical format (e.g., PNG images). Because GlycoCT is widely used among databases, GlycanBuilder is frequently used as a glycan sequence converter. It is available as a standalone Java application but is commonly embedded in other applications (e.g., GlycoWorkBench [45]) to assist with glycan annotation of MS data. It can also be incorporated into web pages as a graphical editor that supports structure searches. The new version of GlycanBuilder [46] has adopted the SNFG symbol standard and supports WURCS as an input and output format. DrawRINGS [47] is another popular applet for drawing glycan structures; it uses KCF as its main input and output format. DrawGlycan-SNFG [48] is a new tool that converts IUPAC condensed format into a graphical representation following the latest SNFG nomenclature, and it can additionally display glycan and peptide bond fragmentation. An increasing number of laboratories depict glycans as SNFG diagrams, likely because the notation is completely specified in the literature. Moving forward, SNFG will be the central format for the standardization of graphical and structural glycan representation across the glycobiology community [49].
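The SNFG conventions discussed above can be viewed as a lookup table that a drawing tool consults for each residue. The sketch below covers only a handful of common monosaccharides, not the full specification; the names `SNFG_SYMBOLS` and `snfg_symbols` are hypothetical, and unlisted residues are flagged rather than guessed.

```python
# SNFG (shape, color) for a small subset of common monosaccharides.
SNFG_SYMBOLS = {
    "Glc": ("circle", "blue"),
    "Man": ("circle", "green"),
    "Gal": ("circle", "yellow"),
    "GlcNAc": ("square", "blue"),
    "GalNAc": ("square", "yellow"),
    "Fuc": ("triangle", "red"),
    "Xyl": ("star", "orange"),
    "Neu5Ac": ("diamond", "purple"),
    "Neu5Gc": ("diamond", "light blue"),
}


def snfg_symbols(residues):
    """Map residue names to 'color shape' labels; unlisted residues are flagged."""
    labels = []
    for name in residues:
        if name in SNFG_SYMBOLS:
            shape, color = SNFG_SYMBOLS[name]
            labels.append(f"{color} {shape}")
        else:
            labels.append(f"unlisted ({name})")
    return labels


print(snfg_symbols(["GlcNAc", "Man", "Fuc", "Neu5Ac"]))
# ['blue square', 'green circle', 'red triangle', 'purple diamond']
```

The table reflects a design feature of SNFG: shape encodes the monosaccharide class (hexoses are circles, N-acetylhexosamines are squares), while color encodes stereochemistry, so Glc and GlcNAc share blue and Gal and GalNAc share yellow.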
3 Glycan LC-MS Databases

3.1 GlycoStore
GlycoStore (in press) is the latest database of over 800 N-, O-glycans, glycosphingolipids and free oligosaccharides associated with a range of glycoproteins, glycolipids and biotherapeutics. GlycoStore provides access to approximately 8500 retention positions determined from hydrophilic interaction chromatography (HILIC)
Glycoinformatics Tools for Comprehensive Characterization of Glycans. . .
7
ultra-high performance liquid chromatography (U/HPLC), reversed phase (RP)-U/HPLC, porous graphitized carbon (PGC) chromatography in combination with ESI-MS/MS detection, and capillary electrophoresis with laser induced fluorescence detection (CE-LIF). It also adopts the GlycoRDF platform and thus the potential for future expansion of data collections and new attributes. Figure 1 summarizes the central relationship of GlycoStore to other tools and file formats. Data stored in GlycoStore will adopt the GlycoCT file format. 3.2
4
GlyTouCan
The first Glycan Structure Repository named GlyTouCan [28] was developed by an international community of glycoscientists to provide unique accession numbers for every identified glycan structure present in various databases. Users can register their glycan structure and the generated unique accession number serves as a universal reference. Formats supported by GlyTouCan include GlycoCT [19], LinearCode [17] and KCF [50] allowing users to register glycans via entered text or file upload. GlyTouCan also allows users to search for structures by mass, motif or from graphic/text input. By linking to external databases, users can examine a glycan for more detailed data like function, interaction, and KEGG pathway information. Being part of the GlycoRDF initiative, also allows interrogation of various other “omics” databases [51]. GlyTouCan supports standard SPARQL queries through a linked data endpoint at http://ts.glytoucan.org/ sparql. Figure 1, illustrates all the type of databases accessible from GlyTouCan and Table 1 provides the appropriate literature reference for each database. This is a very interesting feature in GlyTouCan as it allows users to find additional information for their glycan of interest (e.g., where a glycan is bonded in a protein 3D structure). GlyTouCan implements the GlycoCT format.
4 Released Glycan Analytical Workflow
Glycomics involves the global analysis of glycans released from glycoproteins by chemical or enzymatic reactions. A typical glycomics analytical workflow is depicted in Fig. 2. Released, fluorescently labeled glycans are typically analyzed by HILIC-UPLC-ESI/MS, and programs such as UNIFI help automate analysis of the resulting glycan profile. In some instances, coelution of glycans can yield an incomplete glycan profile. Characterizing the isobaric or coeluting structures, and their behavior after treatment with a defined combination of specific exoglycosidases, permits further deconvolution of each glycan's structure. For these types of experiments, we developed GlycanAnalyzer in collaboration with New England Biolabs. GlycanAnalyzer automatically annotates the multiple LC traces generated from a glycome. In the following sections, we describe protocols for processing and analyzing glycomic datasets using UNIFI and GlycanAnalyzer. For readers specifically interested in glycoproteomics, we suggest references [12, 52] and the Byonic software package [53].

Table 1 Glycomics databases linked from GlyTouCan that are useful for finding further information about a particular glycan
Database | Data type | Uses GlycoRDF | URL | Provider and References
CSDB | Structural, taxonomic, bibliographic, and NMR spectroscopic data of carbohydrates from bacteria, plants, and fungi | Yes | http://csdb.glycoscience.ru/ | [22]
MonosaccharideDB | Properties of monosaccharides and residue names in different notations; provides means for translating residue names | Yes | http://www.monosaccharidedb.org/ | Glycosciences.de [23]
GlycoEpitope | Carbohydrate antigens and their recognizing antibodies | Yes | http://www.glycoepitope.jp/ | [24]
UniCarbKB | Glycan structures and the attached protein sites, biological source, experimental methods, and supporting references | Yes | http://www.unicarbkb.org/ | [25]
GlycomeDB | Structural and taxonomical data from major public carbohydrate databases (now part of GlyTouCan) | Yes | http://www.glycomedb.org/ | [26]
GlycoGene database (GGDB) | Glycogenes related to glycan synthesis in human and mouse | Yes | http://acgg.asia/ggdb2/ | [27]
GlyTouCan | Glycan structures with unique accession numbers | Yes | https://glytoucan.org/ | [28]
GlycoStore | Elution positions of glycans using LC/MS, CE, and PGC | Yes | https://www.glycostore.org | [13]
GlycoProtDB | N-glycosylated proteins and corresponding glycosylated sites | Yes | https://jcggdb.jp/rcmg/gpdb/index | [29]
GlycoBase | Chromatographic O- and N-glycan database | No | Offline | [13]
CFG | Glycan arrays, MALDI-MS, gene microarrays, and mouse phenotyping data | No | http://www.functionalglycomics.org/fg/ | [30]
CarbBank (CCSD) | Citation information of published structures of carbohydrates | No | Now part of GlyTouCan (discontinued; adopted by GlycomeDB, which was in turn adopted by GlyTouCan) | [31]
GlycoChemExplorer | Search system used by JCGGDB | No | NA | This is a tool, not a database; an example can be found in JCGGDB
Glycoscience.de | NMR spectra, mass spectra, and 3D models of carbohydrate structures | No | http://www.glycosciences.de/database/index.php | [32]
JCGGDB | Meta-database of information on glycosylation sites and genes involved in glycan synthesis | No | http://jcggdb.jp/index_en.html | [33]
KEGG glycan | A database that links glycan structures, genes, pathways, and disease | No | http://www.genome.jp/kegg/glycan/ | [34]
RCSB PDB | 3D structures of macromolecules | No | https://www.rcsb.org/ | [35]
PDBj | 3D structures of macromolecules | No | https://pdbj.org/ | [36]
PDBe | 3D structures of macromolecules | No | https://www.ebi.ac.uk/pdbe/ | [37]
PubChem CID | Unique chemical structures for compounds | No | https://www.ncbi.nlm.nih.gov/pccompound | [38]
PubChem SID | Depositor-provided substance descriptions | No | https://www.ncbi.nlm.nih.gov/pcsubstance | [38]
SugarBindDB | Publication information on pathogen and biotoxin lectins and their carbohydrate ligands | No | http://sugarbind.expasy.org/ | [39]
UniCarb-DB | Glycan structures and fragmentation data derived by LC-MS/MS | No | https://unicarb-db.expasy.org/ | [40]

Fig. 2 A typical released glycan identification and quantification workflow. The workflow involves enzymatic release of glycans, followed by fluorescent labeling and analysis on an LC-MS instrument (Waters H-Class Acquity-Xevo G2-XS QToF). The data are collected and processed in Waters UNIFI. Glycans are identified by interrogation of a glycomic database such as GlycoStore. For structural confirmation of ambiguous glycans, LC-MS data generated after exoglycosidase array treatment are interpreted using GlycanAnalyzer (NEB) in conjunction with GlycoStore.
4.1 Glycan Characterization and Structural Elucidation Using Waters UNIFI
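The UNIFI steps below rest on converting retention times to glucose units (GU) with a standard curve fitted to a dextran ladder, then searching a GU library within a tolerance. UNIFI's internal implementation is not described in this chapter, so the following is only a minimal Python sketch under stated assumptions: a natural cubic spline (second derivative zero at both ends of the ladder) mapping retention time to GU, the 0.25 GU search tolerance from step 2, and the 0.06 min offset from step 1 being added to FLR retention times. `natural_cubic_spline`, `flr_to_ms_time`, and `library_hits` are hypothetical helper names, and the ladder times and library entries are placeholders, not measured data.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return f(x), the natural cubic spline through the points (xs[i], ys[i])."""
    n = len(xs)
    if n < 3 or any(x1 <= x0 for x0, x1 in zip(xs, xs[1:])):
        raise ValueError("need at least 3 strictly increasing x values")
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m[i]; natural ends fix m[0] = m[-1] = 0.
    sub, diag, sup, rhs = [0.0] * n, [1.0] * n, [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    for i in range(1, n):  # Thomas algorithm: forward elimination ...
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    for i in range(n - 1, -1, -1):  # ... then back substitution
        nxt = m[i + 1] if i < n - 1 else 0.0
        m[i] = (rhs[i] - sup[i] * nxt) / diag[i]

    def f(x):
        if not xs[0] <= x <= xs[-1]:
            raise ValueError("x lies outside the calibration range")
        i = min(bisect_right(xs, x) - 1, n - 2)  # segment with xs[i] <= x <= xs[i + 1]
        a, b, hs = xs[i + 1] - x, x - xs[i], h[i]
        return ((m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * hs)
                + (ys[i] / hs - m[i] * hs / 6.0) * a
                + (ys[i + 1] / hs - m[i + 1] * hs / 6.0) * b)

    return f


FLR_TO_MS_OFFSET_MIN = 0.06  # offset from step 1; adding it to FLR times is an assumption


def flr_to_ms_time(rt_flr, offset=FLR_TO_MS_OFFSET_MIN):
    """Time at which a glycan seen by the FLR detector is expected to reach the MS."""
    return rt_flr + offset


def library_hits(observed_gu, library, tol=0.25):
    """Library entries within +/- tol GU of the observation, closest first, as (name, dGU)."""
    hits = [(name, gu - observed_gu) for name, gu in library.items()
            if abs(gu - observed_gu) <= tol]
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Hypothetical dextran ladder: retention times (min) of GU 2-8 standards, NOT real data.
ladder_rt = [2.1, 3.0, 4.2, 5.6, 7.0, 8.3, 9.5]
ladder_gu = [2, 3, 4, 5, 6, 7, 8]
rt_to_gu = natural_cubic_spline(ladder_rt, ladder_gu)

# Hypothetical GU library (placeholder values, not taken from any vendor library).
library = {"glycan_A": 5.40, "glycan_B": 5.55, "glycan_C": 6.10}
peak_gu = rt_to_gu(6.4)
print(round(peak_gu, 3), library_hits(peak_gu, library))
```

A natural spline is only one reasonable choice of cubic spline, and UNIFI may use different end conditions. The sketch refuses to extrapolate beyond the ladder because a curve fitted to the ladder says little about retention times outside it.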
In our laboratory we perform automated glycan assignment using the Waters UNIFI Scientific Information System. Data are collected and processed using the combined Acquire and Process method "Glycan Assay (FLR with MS Confirmation)". This method performs glycan quantitation, glycan assignment by GU matching, and confirmation of assignments with MS data. The following example describes our data processing steps for 2-AB labeled human serum IgG glycans after LC-MS data acquisition.
1. Retention time calibration: The dextran homopolymer ladder analyzed concurrently with the IgG sample is used to align retention times in the fluorescence (FLR) and MS Base Peak chromatograms by introducing a time offset (0.06 min) into the FLR chromatogram (Fig. 3a). This offset accounts for the time a glycan takes to reach the MS after exiting the FLR detector, and allows the monosaccharide composition of a glycan assignment to be confirmed with m/z signals at the same time point (through UNIFI's Mass Confirmed function).
Fig. 3 Retention time calibration. (a) Alignment of retention times in fluorescence (FLR) and MS Base Peak chromatograms. (b) The alignment of GU and RT is adjusted based on the realigned dextran LC-MS data. (c) An appropriate standard curve for retention time is fitted against the experimentally derived ladder.
2. GU library interrogation: The recorded elution times of the dextran ladder peaks (GU standards) are entered into the processing method (Fig. 3b), which allows UNIFI to generate a GU standard curve by cubic spline fitting of retention time against expected GU value (Fig. 3c). The retention times of all detected chromatographic peaks are then standardized by conversion to GU using this curve. Next, the UNIFI GU library search settings are selected. For IgG glycans labeled with 2-AB, the Waters Glycan GU Library is chosen with a search tolerance of 0.25 GU, and structures containing bisecting GlcNAc or alpha-gal residues are omitted from the library (Fig. 4). All other default settings are retained.
Fig. 4 Library settings in Waters UNIFI. The library is defined and certain filters can be applied if appropriate (e.g., removal of α-galactose-based glycans from the output). These filters increase the accuracy of the search by limiting the search space.
3. Editing results: Following data processing, the results are summarized in a Component Summary window (Fig. 5a) that lists all identified glycans, their peak areas (Response), relative abundance (%), expected and observed GU values, and mass (Da). For example, Peak 2 in the integrated FLR chromatogram (Fig. 5b) contains an F(6)A2 glycan comprising 21.946% of the sample, with a GU of 5.8727 (Fig. 5a). The corresponding neutral mass of this glycan (1582.6083 Da) was also detected by the MS. A Library Search Results window (Fig. 5c) displays all other possible library matches for Peak 2 with their ΔGU (the difference between a possible matching glycan's GU and the observed Peak 2 GU) and Δm/z (the difference between a possible matching glycan's m/z and the closest m/z detected in the MS). This window allows the user to curate the glycan assignments manually as desired. Lastly, the MS profile at the corresponding time point (7.5 min; Fig. 5d) is checked manually both for the mass of F(6)A2 and for the absence of any other prominent m/z signals that might indicate a coeluting glycan.
Fig. 5 Summary of results in UNIFI. (a and b) A summary of annotated chromatography peaks is shown. (c and d) Manual confirmation of the annotation is required to ensure that the automated assignments generated by UNIFI are correct.
In some cases, matching by GU and mass can lead to ambiguous structure assignments. This often occurs with coeluting glycans of the same mass (isomers). Coeluting isomers are common in complex samples such as serum and in intricate glycoproteins such as erythropoietin. For this reason we added another layer of software to our workflow, GlycanAnalyzer (see Fig. 2 for its position in the workflow).
4.2 Interpretation of Exoglycosidase Array Data Using GlycanAnalyzer
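Before turning to exoglycosidase arrays, note that both the mass confirmation in step 3 of Subheading 4.1 and the mass shifts exploited below come down to adding and subtracting monosaccharide residue masses. The Python sketch below assumes standard monoisotopic residue masses, one water for the free reducing end, and a net gain of C7H8N2 (120.0687 Da) when the 2-AB label is attached by reductive amination; `neutral_mass`, `ppm_error`, and `digest` are hypothetical helper names. With the composition of F(6)A2 (Hex3HexNAc4dHex1) it reproduces the observed neutral mass quoted in step 3 (1582.6083 Da) to within about 3 ppm, and it shows the shift expected when an enzyme removes the core fucose.

```python
# Monoisotopic masses (Da) of monosaccharide residues within a glycan chain.
RESIDUE_MASS = {
    "Hex": 162.052824,     # hexose, e.g. mannose or galactose
    "HexNAc": 203.079373,  # N-acetylhexosamine, e.g. GlcNAc
    "dHex": 146.057909,    # deoxyhexose, e.g. fucose
    "NeuAc": 291.095417,   # N-acetylneuraminic acid (sialic acid)
}
WATER = 18.010565       # H2O completing the free reducing end
LABEL_2AB = 120.068748  # net gain (C7H8N2) when 2-AB is attached by reductive amination


def neutral_mass(composition, label=LABEL_2AB):
    """Neutral monoisotopic mass of a labeled glycan with the given residue counts."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + WATER + label


def ppm_error(observed, theoretical):
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def digest(composition, removed):
    """Residue counts left after an exoglycosidase removes the listed residues."""
    remaining = dict(composition)
    for res, n in removed.items():
        if remaining.get(res, 0) < n:
            raise ValueError(f"cannot remove {n} x {res} from {composition}")
        remaining[res] -= n
    return remaining


f6a2 = {"Hex": 3, "HexNAc": 4, "dHex": 1}  # F(6)A2: core-fucosylated, biantennary, agalactosylated
theoretical = neutral_mass(f6a2)
print(f"F(6)A2-2AB: {theoretical:.4f} Da, "
      f"error {ppm_error(1582.6083, theoretical):.1f} ppm vs. the observed mass")

# Removing the core fucose should shift the peak by exactly one dHex residue.
a2 = digest(f6a2, {"dHex": 1})
print(f"shift after defucosylation: {neutral_mass(a2) - theoretical:.4f} Da")
```

The same bookkeeping applies to any enzyme in an array: the expected mass after each digestion is the neutral mass of the remaining composition, so a predicted shift can be compared directly with the observed one.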
Exoglycosidases catalyze the removal of terminal monosaccharides from the nonreducing end of glycans. Each enzyme is specific for a particular monosaccharide, its anomeric configuration (α/β), and its linkage to the adjacent sugar in an oligosaccharide (Fig. 6b). Historically, the field's preferred analytical exoglycosidases were purified from native sources (e.g., seed meals, pathogenic organisms) and were often contaminated with unwanted side activities (e.g., proteases, other glycosidases). A recent advance has been the marked improvement in glycosidase quality through recombinant production and stringent quality testing (New England Biolabs). This has enabled the assembly of arrays of recombinant exoglycosidases, each with a highly defined specificity, that are used to confirm glycan structures (Fig. 6). Because the specificity of each exoglycosidase in the array is known, sequential treatment produces predictable mass and GU shifts as monosaccharides are removed, and these shifts can be used to confirm the exact structure of a glycan (Fig. 6b). However, manual interpretation of exoglycosidase array data can be cumbersome, especially when the sample is complex (e.g., >100 peaks in serum). To address this problem, we developed GlycanAnalyzer in collaboration with New England Biolabs (U.S.A.). GlycanAnalyzer interprets data produced by LC-MS combined with an optimized recombinant exoglycosidase array. Input LC-MS data can be uploaded through a web application (Fig. 7; https://glycananalyzer.neb.com), and GlycanAnalyzer interprets the GU and mass changes of each peak in response to exoglycosidase array treatment. The output is a ranked list of glycans (Fig. 11) in which each structure's score depends on how well the observed GU and mass shifts match predicted values derived from computationally modeled digestions of relevant structures in GlycoStore. It can annotate glycans with a very precise level of detail, down to position and linkage isomers. Glycans in LC-MS data from an entire monoclonal antibody can be assigned automatically in 20–30 min, considerably faster than manual annotation, which can take days or weeks.
Fig. 6 Using exoglycosidase arrays to decipher ambiguous glycans. (a) Exoglycosidases remove terminal monosaccharides with specific linkages and can be used sequentially to determine the detailed structure of a glycan (b). The profiles obtained after sequential treatment with exoglycosidase arrays are examined to ensure that the shifts in GU and MS are consistent with the predicted glycan structures in the original glycomic profile.
Fig. 7 The GlycanAnalyzer website is used to deconvolute, decipher, and annotate ambiguous glycan structures in LC-MS data.
In the following steps we briefly describe how to use GlycanAnalyzer.
1. Visit GlycanAnalyzer: Once glycan-exoglycosidase array data have been generated, unassigned or ambiguous peaks can be assigned using GlycanAnalyzer. The website is hosted by New England Biolabs at http://glycananalyzer.neb.com. GlycanAnalyzer was developed and tested on the Chrome and Firefox browsers, which are recommended. Program steps are organized sequentially on the website and involve uploading appropriate data, selecting the peak(s) to be annotated and, optionally, selecting the glycoprotein from which the glycans derive to increase interpretation accuracy (Fig. 7).
2. Data preprocessing: Input data need to be parsed appropriately from the raw LC-MS data collected after exoglycosidase array treatment, which usually requires input from the user. The most important raw data are the "GU," "amount (%)," "observed mass (m/z)," and "observed charge." Data from UNIFI can generally be exported from the "Component Summary" and edited in Excel spreadsheets. To preprocess the data, double-click the link "supplying peaks and mass" (Fig. 8a). Data from the arrays can be uploaded in the appropriate tabs (Fig. 8b). The final datasets should be arranged as in the example shown in Fig. 8c. Clicking the "Generate Input file" button formats the data into a file that can be uploaded to GlycanAnalyzer.
Fig. 8 Data preprocessing. (a) Data need to be parsed into categories corresponding to a defined exoglycosidase array. (b) Insert the UPLC-MS profiles. Ensure the data are formatted to include GU, abundance (%), mass, and charge in tab-delimited form. (c) Final data preprocessing prior to submission. Each box contains data from a profile generated using the defined array. After submission, the resulting file can be uploaded to GlycanAnalyzer (Step 3).
3. Upload files: Once the data are parsed appropriately, they can be uploaded to GlycanAnalyzer using the upload tab (Fig. 9).
4. Select the peak that needs a structural assignment: The peak to be annotated can be selected on the main page (Fig. 10a, b).
Fig. 9 Upload the preprocessed file
Fig. 10 Selection of peaks for analysis. (a) Ambiguous peaks to be annotated can be selected using the prompt on the main page. (b) Peaks to be assigned can be chosen from a list generated after the uploaded input data file is processed.
GlycanAnalyzer determines the peaks from the submitted data and generates a list of peaks that can be selected for annotation. The user can select an individual peak to annotate or choose a category of peaks, such as the "top 5" (most abundant) or all peaks. This step is computationally demanding and may take 10–20 min to return structural assignments.
5. Interpretation of results: After data processing, the results are summarized in a table similar to that shown in Fig. 11. This output lists the predicted structures according to their assigned score; the closer the score is to 0, the better the predicted match. The user can either keep the suggested match or reject it. To help the user decide, each match is accompanied by supporting information such as the databases used for analytical comparisons, the GlyTouCan links to other databases, and a graphical representation of the analyses performed. Further information can be obtained by clicking on the links in the graphical analysis panel. The final summary output page for a sample is returned when all peaks have been accepted and computed (Fig. 12). The bar chart shows each peak's relative abundance, where a tick indicates that the user has accepted the peak. The pie charts give the distribution of sialic acids, GlcNAc antennae, galactose, and other monosaccharides. The tables present the N-glycan structures from the accepted peaks. Peak 3 has two pieces of evidence: an observed mass and a GU similar to the average reported in public databases. Peak 4 has three pieces of evidence: mass, shifting peaks, and a GU similar to the average reported in public databases. Peak 4's N-glycan assignment can therefore be considered strongly supported, while peak 3's has medium support. The weakest level of supporting evidence is GU similarity alone. The Mass, Shifts, and Glucose unit buttons can be clicked to visualize the evidence.
Fig. 11 Ranked list of possible N-glycan structures. This list assists the user in selecting the correct assignment. Each glycan is scored based on its GU and m/z; the closer the score is to zero, the more accurate the match. Graphical analysis data provide the supporting evidence for each result.
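The ranking and evidence tiers in step 5 can be made concrete with a short Python sketch. GlycanAnalyzer's actual scoring function is not given in this chapter, so `score` below is a hypothetical stand-in: it averages, over all array profiles, the absolute GU and mass deviations between observed and predicted values, each divided by an assumed tolerance (0.2 GU, 0.02 Da), so that 0 is a perfect match and candidates are ranked in ascending order. `support_level` encodes the tiers described above (mass, shifts, and GU together give strong support; mass plus GU gives medium support; GU alone is weakest); treating any two lines of evidence as medium and any single line as weak is an assumption. All names and numbers are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple

Profile = Tuple[float, float]  # (GU, neutral mass in Da) for one array profile


@dataclass
class Candidate:
    name: str
    predicted: List[Profile]  # one (GU, mass) pair per profile, undigested first


def score(observed: List[Profile], cand: Candidate,
          gu_tol: float = 0.2, mass_tol: float = 0.02) -> float:
    """Hypothetical score: mean tolerance-scaled GU + mass deviation; 0 is a perfect match."""
    if len(observed) != len(cand.predicted):
        raise ValueError("need one predicted (GU, mass) pair per observed profile")
    total = sum(abs(gu_o - gu_p) / gu_tol + abs(m_o - m_p) / mass_tol
                for (gu_o, m_o), (gu_p, m_p) in zip(observed, cand.predicted))
    return total / len(observed)


def rank(observed: List[Profile], candidates: List[Candidate]) -> List[Tuple[float, str]]:
    """(score, name) pairs sorted so the best match (closest to 0) comes first."""
    return sorted((score(observed, c), c.name) for c in candidates)


def support_level(mass: bool, shifts: bool, gu: bool) -> str:
    """Evidence tiers: all three = strong, any two = medium, one = weak (partly assumed)."""
    return {3: "strong", 2: "medium", 1: "weak"}.get(int(mass) + int(shifts) + int(gu), "none")


# Hypothetical observations for one peak: undigested, then after core-fucose removal.
observed = [(5.87, 1582.61), (5.35, 1436.55)]
candidates = [
    Candidate("candidate_1", [(5.87, 1582.61), (5.35, 1436.55)]),
    Candidate("candidate_2", [(5.90, 1582.61), (5.90, 1582.61)]),  # predicts no shift
]
for s, name in rank(observed, candidates):
    print(f"{name}: score {s:.2f}")
print(support_level(mass=True, shifts=True, gu=True))   # prints strong, as for peak 4
print(support_level(mass=True, shifts=False, gu=True))  # prints medium, as for peak 3
```

Scaling each deviation by a tolerance puts GU and mass errors on a common footing before they are summed; the real tool may weight the two kinds of evidence quite differently.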
Fig. 12 Final Summary Page. After peak selection is finalized, the overall structural attributes, such as the degree of sialylation and fucosylation and the number of antennae, are visually summarized.
5 Conclusion
The past decade has seen tremendous advances in the analysis of protein glycosylation. These technical advances have made it easier for researchers and analysts to generate glycan structural data from individual glycoproteins and complex mixtures of glycoproteins with increasing sensitivity and speed. However, this has created a demand for better organized and more advanced bioinformatics resources to meet the challenge of analyzing the larger data sets that are now routinely generated. We have described how efforts to standardize file formats, coordinate the international use of a common glycan nomenclature, and develop comprehensive reference databases and repositories have helped to organize the field's efforts and provide a needed foundation upon which glycoinformatics will continue to grow. Already, software platforms like UNIFI and new utilities like GlycanAnalyzer are incorporating these resources to make analysis of glycan structure simpler, faster, higher throughput, and more accurate. In the coming years, we expect this trend to continue, with an emphasis on high-throughput analyses, full automation, and seamless integration with the various analytical platforms. It is clear that bioinformatics will play an increasingly critical role in advancing the field of glycoanalytics.
Acknowledgments
This work was supported by funding from New England Biolabs and by the National Medical Research Council through the Open Fund-Individual Research Grant titled "Role of Glycosylation in Dengue virus fitness and virulence" (OFIRG18MAY-0070). We would like to thank all members of the analytics group at the Bioprocessing Technology Institute Singapore. The authors thank Alicia Bielik for comments and suggestions.
References
1. Mittermayr S, Bones J, Guttman A (2013) Unraveling the glyco-puzzle: glycan structure identification by capillary electrophoresis. Anal Chem 85(9):4228–4238
2. Harvey DJ, Royle L, Radcliffe CM, Rudd PM, Dwek RA (2008) Structural and quantitative analysis of N-linked glycans by matrix-assisted laser desorption ionization and negative ion nanospray mass spectrometry. Anal Biochem 376(1):44–60
3. Kolarich D, Jensen PH, Altmann F, Packer NH (2012) Determination of site-specific glycan heterogeneity on glycoproteins. Nat Protoc 7(7):1285
4. Jensen PH, Karlsson NG, Kolarich D, Packer NH (2012) Structural analysis of N- and O-glycans released from glycoproteins. Nat Protoc 7(7):1299
5. Royle L, Roos A, Harvey DJ, Wormald MR, Van Gijlswijk-Janssen D, Redwan E-RM, Wilson IA, Daha MR, Dwek RA, Rudd PM (2003) Secretory IgA N- and O-glycans provide a link between the innate and adaptive immune systems. J Biol Chem 278(22):20140–20153
6. Guile GR, Rudd PM, Wing DR, Prime SB, Dwek RA (1996) A rapid high-resolution high-performance liquid chromatographic method for separating glycan mixtures and analyzing oligosaccharide profiles. Anal Biochem 240(2):210–226
7. Stöckmann H, Adamczyk B, Hayes J, Rudd PM (2013) Automated, high-throughput IgG-antibody glycoprofiling platform. Anal Chem 85(18):8841–8849
8. Stöckmann H, O'Flaherty R, Adamczyk B, Saldova R, Rudd P (2015) Automated, high-throughput serum glycoprofiling platform. Integr Biol 7(9):1026–1032
9. Hofmann J, Struwe WB, Scarff CA, Scrivens JH, Harvey DJ, Pagel K (2014) Estimating collision cross sections of negatively charged N-glycans using traveling wave ion mobility-mass spectrometry. Anal Chem 86(21):10789–10795
10. Gray CJ, Schindler B, Migas LG, Pičmanová M, Allouche AR, Green AP, Mandal S, Motawia MS, Sánchez-Pérez R, Bjarnholt N (2017) Bottom-up elucidation of glycosidic bond stereochemistry. Anal Chem 89(8):4540–4549
11. Manz C, Pagel K (2018) Glycan analysis by ion mobility-mass spectrometry and gas-phase spectroscopy. Curr Opin Chem Biol 42:16–24
12. Walsh I, Zhao S, Campbell M, Taron CH, Rudd PM (2016) Quantitative profiling of glycans and glycopeptides: an informatics' perspective. Curr Opin Struct Biol 40:70–80
13. Campbell MP, Royle L, Radcliffe CM, Dwek RA, Rudd PM (2008) GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics 24(9):1214–1216
14. Packer NH, von der Lieth CW, Aoki-Kinoshita KF, Lebrilla CB, Paulson JC, Raman R, Rudd P, Sasisekharan R, Taniguchi N, York WS (2008) Frontiers in glycomics: bioinformatics and biomarkers in disease. An NIH white paper prepared from discussions by the focus groups at a workshop on the NIH campus, Bethesda MD (September 11–13, 2006). Proteomics 8(1):8–20
15. McNaught AD (1997) International Union of Pure and Applied Chemistry and International Union of Biochemistry and Molecular Biology. Joint commission on biochemical nomenclature. Nomenclature of carbohydrates. Carbohydr Res 297(1):1–92
16. Bohne-Lang A, Lang E, Forster T, von der Lieth CW (2001) LINUCS: linear notation for unique description of carbohydrate sequences. Carbohydr Res 336(1):1–11
17. Banin E, Neuberger Y, Altshuler Y, Halevi A, Inbar O, Nir D, Dukler A (2002) A novel linear code nomenclature for complex carbohydrates. Trends Glycosci Glycotechnol 14(77):127–137. https://doi.org/10.4052/tigg.14.127
18. Aoki KF, Yamaguchi A, Ueda N, Akutsu T, Mamitsuka H, Goto S, Kanehisa M (2004) KCaM (KEGG carbohydrate matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Res 32(Web Server issue):W267–W272. https://doi.org/10.1093/nar/gkh473
19. Herget S, Ranzinger R, Maass K, Lieth CW (2008) GlycoCT-a unifying sequence format for carbohydrates. Carbohydr Res 343(12):2162–2171. https://doi.org/10.1016/j.carres.2008.03.011
20. Tanaka K, Aoki-Kinoshita KF, Kotera M, Sawaki H, Tsuchiya S, Fujita N, Shikanai T, Kato M, Kawano S, Yamada I, Narimatsu H (2014) WURCS: the Web3 unique representation of carbohydrate structures. J Chem Inf Model 54(6):1558–1566. https://doi.org/10.1021/ci400571e
21. Kornfeld S, Li E, Tabas I (1978) The synthesis of complex-type oligosaccharides. II. Characterization of the processing intermediates in the synthesis of the complex oligosaccharide units of the vesicular stomatitis virus G protein. J Biol Chem 253(21):7771–7778
22. Toukach PV, Egorova KS (2016) Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts. Nucleic Acids Res 44(Database issue):D1229–D1236. https://doi.org/10.1093/nar/gkv840
23. Comprehensive Monosaccharide Database (MonosaccharideDB). Available via Glycosciences.de. http://www.monosaccharidedb.org/. Accessed 18 Feb 2018
24. Okuda S, Nakao H, Kawasaki T (2015) GlycoEpitope: database for carbohydrate antigen and antibody. In: Taniguchi N, Endo T, Hart GW, Seeberger PH, Wong C-H (eds) Glycoscience: biology and medicine. Springer Japan, Tokyo, pp 267–273. https://doi.org/10.1007/978-4-431-54841-6_27
25. Campbell MP, Peterson R, Mariethoz J, Gasteiger E, Akune Y, Aoki-Kinoshita KF, Lisacek F, Packer NH (2014) UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res 42(Database issue):D215–D221. https://doi.org/10.1093/nar/gkt1128
26. Ranzinger R, Herget S, von der Lieth C-W, Frank M (2011) GlycomeDB—a unified database for carbohydrate structures. Nucleic Acids Res 39(Database issue):D373–D376. https://doi.org/10.1093/nar/gkq1014
27. Narimatsu H, Suzuki Y, Aoki-Kinoshita KF, Fujita N, Sawaki H, Shikanai T, Sato T, Togayachi A, Yoko-o T, Angata K, Kubota T, Noro E (2017) GlycoGene database (GGDB) on the semantic web. In: Aoki-Kinoshita KF (ed) A practical guide to using glycomics databases. Springer Japan, Tokyo, pp 163–175. https://doi.org/10.1007/978-4-431-56454-6_8
28. Aoki-Kinoshita K, Agravat S, Aoki NP, Arpinar S, Cummings RD, Fujita A, Fujita N, Hart GM, Haslam SM, Kawasaki T, Matsubara M, Moreman KW, Okuda S, Pierce M, Ranzinger R, Shikanai T, Shinmachi D, Solovieva E, Suzuki Y, Tsuchiya S, Yamada I, York WS, Zaia J, Narimatsu H (2016) GlyTouCan 1.0 - the international glycan structure repository. Nucleic Acids Res 44(D1):D1237–D1242. https://doi.org/10.1093/nar/gkv1041
29. Kaji H, Shikanai T, Sasaki-Sawa A, Wen H, Fujita M, Suzuki Y, Sugahara D, Sawaki H, Yamauchi Y, Shinkawa T, Taoka M, Takahashi N, Isobe T, Narimatsu H (2012) Large-scale identification of N-glycosylated proteins of mouse tissues and construction of a glycoprotein database, GlycoProtDB. J Proteome Res 11(9):4553–4566. https://doi.org/10.1021/pr300346c
30. Raman R, Venkataraman M, Ramakrishnan S, Lang W, Raguram S, Sasisekharan R (2006) Advancing glycomics: implementation strategies at the consortium for functional glycomics. Glycobiology 16(5):82R–90R. https://doi.org/10.1093/glycob/cwj080
31. Doubet S, Albersheim P (1992) CarbBank. Glycobiology 2(6):505
32. Lütteke T, Bohne-Lang A, Loss A, Goetz T, Frank M, von der Lieth CW (2006) GLYCOSCIENCES.de: an internet portal to support glycomics and glycobiology research. Glycobiology 16(5):71R–81R. https://doi.org/10.1093/glycob/cwj049
33. Maeda M, Fujita N, Suzuki Y, Sawaki H, Shikanai T, Narimatsu H (2015) JCGGDB: Japan consortium for Glycobiology And Glycotechnology Database. Methods Mol Biol 1273:161–179. https://doi.org/10.1007/978-1-4939-2343-4_12
34. Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M (2006) KEGG as a glycome informatics resource. Glycobiology 16(5):63R–70R. https://doi.org/10.1093/glycob/cwj010
35. Berman H, Henrick K, Nakamura H (2003) Announcing the worldwide Protein Data Bank. Nat Struct Biol 10:980. https://doi.org/10.1038/nsb1203-980
36. Kinjo AR, Yamashita R, Nakamura H (2010) PDBj Mine: design and implementation of relational database interface for Protein Data Bank Japan. Database 2010:baq021. https://doi.org/10.1093/database/baq021
37. Mir S, Alhroub Y, Anyango S, Armstrong DR, Berrisford JM, Clark AR, Conroy MJ, Dana JM, Deshpande M, Gupta D, Gutmanas A, Haslam P, Mak L, Mukhopadhyay A, Nadzirin N, Paysan-Lafosse T, Sehnal D, Sen S, Smart OS, Varadi M, Kleywegt GJ, Velankar S (2018) PDBe: towards reusable data delivery infrastructure at protein data bank in Europe. Nucleic Acids Res 46(D1):D486–D492. https://doi.org/10.1093/nar/gkx1070
38. Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA, Wang J, Yu B, Zhang J, Bryant SH (2016) PubChem substance and compound databases. Nucleic Acids Res 44(D1):D1202–D1213. https://doi.org/10.1093/nar/gkv951
39. Mariethoz J, Khatib K, Alocci D, Campbell MP, Karlsson NG, Packer NH, Mullen EH, Lisacek F (2016) SugarBindDB, a resource of glycan-mediated host–pathogen interactions. Nucleic Acids Res 44(Database issue):D1243–D1250. https://doi.org/10.1093/nar/gkv1247
40. Hayes CA, Karlsson NG, Struwe WB, Lisacek F, Rudd PM, Packer NH, Campbell MP (2011) UniCarb-DB: a database resource for glycomic discovery. Bioinformatics 27(9):1343–1344. https://doi.org/10.1093/bioinformatics/btr137
41. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, Stanley P, Hart G, Darvill A, Kinoshita T, Prestegard JJ, Schnaar RL, Freeze HH, Marth JD, Bertozzi CR, Etzler ME, Frank M, Vliegenthart JFG, Lütteke T, Perez S, Bolton E, Rudd P, Paulson J, Kanehisa M, Toukach P, Aoki-Kinoshita KF, Dell A, Narimatsu H, York W, Taniguchi N, Kornfeld S (2015) Symbol nomenclature for graphical representations of glycans. Glycobiology 25(12):1323–1324. https://doi.org/10.1093/glycob/cwv091
42. Harvey DJ, Merry AH, Royle L, Campbell MP, Dwek RA, Rudd PM (2009) Proposal for a standard system for drawing structural diagrams of N- and O-linked carbohydrates and related compounds. Proteomics 9(15):3796–3801. https://doi.org/10.1002/pmic.200900096
43. Ceroni A, Dell A, Haslam SM (2007) The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source Code Biol Med 2(1):3. https://doi.org/10.1186/1751-0473-2-3
44. von der Lieth C-W, Freire AA, Blank D, Campbell MP, Ceroni A, Damerell DR, Dell A, Dwek RA, Ernst B, Fogh R (2010) EUROCarbDB: an open-access platform for glycoinformatics. Glycobiology 21(4):493–502
45. Ceroni A, Maass K, Geyer H, Geyer R, Dell A, Haslam SM (2008) GlycoWorkbench: a tool for the computer-assisted annotation of mass spectra of glycans. J Proteome Res 7(4):1650–1659
46. Tsuchiya S, Aoki NP, Shinmachi D, Matsubara M, Yamada I, Aoki-Kinoshita KF, Narimatsu H (2017) Implementation of GlycanBuilder to draw a wide variety of ambiguous glycans. Carbohydr Res 445:104–116. https://doi.org/10.1016/j.carres.2017.04.015
47. Akune Y, Hosoda M, Kaiya S, Shinmachi D, Aoki-Kinoshita KF (2010) The RINGS resource for glycome informatics analysis and data mining on the web. Omics 14(4):475–486. https://doi.org/10.1089/omi.2009.0129
48. Cheng K, Zhou Y, Neelamegham S (2017) DrawGlycan-SNFG: a robust tool to render glycans and glycopeptides with fragmentation information. Glycobiology 27(3):200–205. https://doi.org/10.1093/glycob/cww115
49. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, Stanley P, Hart G, Darvill A, Kinoshita T (2015) Symbol nomenclature for graphical representations of glycans. Glycobiology 25(12):1323–1324
50. Hattori M, Okuno Y, Goto S, Kanehisa M (2003) Development of a chemical structure comparison method for integrated analysis of chemical and genomic information in the metabolic pathways. J Am Chem Soc 125(39):11853–11865. https://doi.org/10.1021/ja036030u
51. Jupp S, Malone J, Bolleman J, Brandizi M, Davies M, Garcia L, Gaulton A, Gehant S, Laibe C, Redaschi N, Wimalaratne SM, Martin M, Le Novere N, Parkinson H, Birney E, Jenkinson AM (2014) The EBI RDF platform: linked open data for the life sciences. Bioinformatics 30(9):1338–1339. https://doi.org/10.1093/bioinformatics/btt765
52. Thaysen-Andersen M, Packer NH (2014) Advances in LC–MS/MS-based glycoproteomics: getting closer to system-wide site-specific mapping of the N- and O-glycoproteome. Biochim Biophys Acta 1844(9):1437–1452
53. Bern M, Kil YJ, Becker C (2012) Byonic: advanced peptide and protein identification software. Curr Protoc Bioinformatics Chapter 13:Unit 13.20
Chapter 2
GlycoStore: A Platform for H/UPLC and Capillary Electrophoresis Glycan Data
Matthew P. Campbell, Sophie Zhao, Jodie L. Abrahams, Terry Nguyen-Khuong, and Pauline M. Rudd
Abstract
GlycoStore (http://www.glycostore.org) is an open access chromatographic and electrophoretic retention database of glycans characterized from glycoproteins, glycolipids, and biotherapeutics. It is a continuation of the GlycoBase project (Oxford Glycobiology Institute and National Institute for Bioprocessing Research and Training, Ireland) but addresses many of the technological limitations that impacted the growth of GlycoBase, in particular through improvements to the bioinformatics architecture, enhanced data annotations and coverage, and improved connectivity with external resources. The first release of GlycoStore (October 2017) contains over 850 glycan entries accompanied by 8500+ retention positions, including data from: (1) fluorescently labelled released glycans determined using hydrophilic interaction chromatography (HILIC) ultrahigh-performance liquid chromatography (U/HPLC) and reversed phase (RP)-U/HPLC; (2) porous graphitized carbon chromatography (PGC) interfaced with ESI-MS/MS; and (3) capillary electrophoresis with laser induced fluorescence detection (CE-LIF). In this chapter, we outline the objectives of GlycoStore and describe a selection of step-by-step workflows for navigating and browsing the information available. We also provide a short description of informatics tools available to query the database using semantic technologies. The information presented in this chapter supplements our documentation knowledge base, which describes interface improvements, new features and tools, and content updates (https://unicarbkb.freshdesk.com/).
Key words Glycomics, Glycobiology, Glycoinformatics, Database
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_2, © Springer Science+Business Media, LLC, part of Springer Nature 2022

1

Introduction

Glycosylation is the most common and structurally diverse posttranslational modification and has an impact on a wide range of biological functions [1]. It is widely accepted that carbohydrates play important roles in protein folding, secretion and degradation, cell signaling, and the immune response, among others, and that alterations in glycosylation play a major role in many diseases, such as cancer [2, 3], autoimmunity [4, 5], and viral/bacterial infection [6, 7]. One of the key challenges is to understand in detail which glycans contribute to specific biological functions. However, the inherent complexity of glycan structures and their microheterogeneity make the analysis of glycoconjugates very challenging. To fully solve glycan structures, a variety of orthogonal tools are needed, commonly involving liquid chromatography, mass spectrometry, capillary electrophoresis, nuclear magnetic resonance (NMR), and lectin and glycan arrays. A number of reviews discuss the advantages and disadvantages of each technique [8–16].

The growing interest in glycomics has provided an impetus for the development of high-throughput workflows for sample preparation and analysis. One of the most widely reported and established methods for relative glycan quantification is chromatographic separation with fluorescence detection, which separates structural isomers effectively and with high reproducibility. Over the last few years, semiautomated robotic high-throughput analytical technologies have improved our capability to generate high-quality, large-scale profiling data for thousands of fluorescently labelled glycans using ultrahigh-performance liquid chromatography (UPLC) coupled with mass spectrometry; however, data analysis and storage remain a bottleneck.

1.1 Glycoinformatic Resources for Separation and Electrophoretic Data
Only a handful of databases and tools have been developed to support separation and electrophoretic glycomics data, notably GlycoBase [17], GUcal [18], glyXalign [19], and the pyridylaminated (PA) elution map database [20]. In the late 2000s, EUROCarbDB [21] laid the foundations for the creation of a structural database and associated informatics tools to support the deposition and curation of experimental data. A major outcome of EUROCarbDB was the development of GlycoBase and autoGU, which supported HILIC analysis of 2-aminobenzamide (2-AB)-labelled released glycans. Initially, GlycoBase contained normalized HPLC elution positions for approximately 350 fluorescently labelled N- and O-linked glycan structures, expressed as glucose unit values and obtained from a variety of biological sources and materials. Importantly, all structures were characterized by a combination of HILIC with exoglycosidase sequencing and mass spectrometry (MALDI-MS, ESI-MS, ESI-MS/MS, LC-MS, LC-ESI-MS/MS). Beyond EUROCarbDB, GlycoBase continued with three major releases, accumulating data from CE-LIF, UPLC, and RP-U/HPLC techniques. In addition, the expansion of GlycoBase led to the development of supporting data analysis tools (e.g., GlycoDigest [22] and GlycoMarker); however, despite its popularity and growth, maintaining and developing GlycoBase using the original software tools has become increasingly difficult. To preserve the information and continue the efforts of GlycoBase, a new international initiative was launched in 2017, which led to the release of GlycoStore [23]. This database has four levels of content. The first brings together annotated glycomics data sourced from the analytical platforms described above. The second level provides access to a growing, curated database of published literature, with a focus on data that has become available over the past 5 years, filling an information gap between GlycoBase and GlycoStore. The third is a structure search tool with improved functionality for filtering entries based on annotated features (e.g., epitopes and mass), by category type, and by glycoprotein. The final level is the availability of Semantic resources providing a unified and stable platform for developers.

In this chapter, we describe a selection of step-by-step protocols for searching and navigating GlycoStore. These protocols are representative of typical queries performed by end users to find information from published and in-house research collections.
2
Accessing and Navigating GlycoStore

To improve the user experience, we recommend using the latest version of any popular Internet browser (Internet Explorer, Firefox, Google Chrome, Safari, or Opera). The user interface is simple and intuitive and is based on a common template for displaying information, designed following user feedback and analysis of user activity. The layout includes a contents/results section, a navigation panel, search forms, nomenclature notation formats, and links to biological associations and external databases. GlycoStore’s home page, accessible at http://www.glycostore.org, is the starting point for browsing the database. To access the database, either click the “Explore” button or make a selection from the navigation bar (Subheading 2.1).
2.1 GlycoStore Navigation Bar
The navigation bar, positioned at the top of the interface, is the most important component in GlycoStore. Not only does it guide users to pages beyond the homepage, it also provides access to all search tools, data collections, and curated publications. Figure 1 shows the four drop-down menus: “Show All,” “Search,” “Compare,” and “About.” The “Show All” menu provides links to the data collection and publication pages, discussed in Subheading 3; the “Search” menu lists “Retention Times/Units,” “GlycoCT” [24], and “Structure Motif and Mass”; the “Compare” feature can be used to show glycan structures common to different taxonomies and glycoproteins; finally, the “About” menu provides links to the project documentation pages and general information. The search form embedded in the navigation bar (orange box, Fig. 1) can be used either to (1) find entries that match a queried Oxford shorthand structure name or (2) retrieve N-linked, O-linked, GSL, or free glycans whose reported elution time/unit value matches a specified value for a selected separation or electrophoretic method.
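The two behaviours of this quick search form can be illustrated with a short sketch. It is not GlycoStore's implementation: the `Entry` record, the sample values, and the `quick_search` function are hypothetical, while the default tolerances (0.5 units for GU/AU values, 2 min for time-based methods) and the exact-match rule for names follow Subheadings 5.2 and 5.1.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Entry:
    """Hypothetical database record; field names are illustrative only."""
    name: str     # Oxford shorthand name, e.g. "F(6)A2"
    method: str   # analytical method, e.g. "UPLC 2-AB"
    value: float  # reported elution value
    unit: str     # "GU", "AU" or "min"

# Default quick-search tolerances described in Subheading 5.2.
TOLERANCE = {"GU": 0.5, "AU": 0.5, "min": 2.0}

def quick_search(entries: List[Entry], query: str, method: str) -> List[Entry]:
    """Numeric queries search an elution window for the selected method;
    any other query is treated as an Oxford name and must match exactly."""
    try:
        target = float(query)
    except ValueError:
        # Assumption: name matches are not restricted to the selected method.
        name = query.strip()
        return [e for e in entries if e.name == name]
    return [
        e for e in entries
        if e.method == method and abs(e.value - target) <= TOLERANCE[e.unit]
    ]

# Placeholder values for illustration, not real GlycoStore data.
sample = [
    Entry("A2", "UPLC 2-AB", 6.2, "GU"),
    Entry("F(6)A2", "UPLC 2-AB", 6.9, "GU"),
    Entry("A2", "PGC", 30.0, "min"),
]
by_value = quick_search(sample, "6.5", "UPLC 2-AB")  # both UPLC 2-AB entries
by_name = quick_search(sample, "A2", "UPLC 2-AB")    # both entries named "A2"
```

Deciding the query type by attempting a numeric parse keeps a single input box, which mirrors the single search form described above.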
Fig. 1 GlycoStore contains over 850 structures sourced from published and in-house data collections. This screenshot shows all curated references with manuscript title, authors, year of publication, and the journal. To access structural and experimental data reported by a manuscript simply follow the title link. The blue box encapsulates the main search features that are described in Subheading 2.1. The search form embedded in the bar (orange box) provides a quick search option and supports structure name or elution position queries
3
Data Collections and Publications

To improve the organization and management of GlycoStore, we have structured the data model into collections and references. Collections group data from multiple sources; for example, the Porous Graphitized Carbon (PGC) collection stores retention properties derived from PGC LC-MS/MS workflows, and the BTI N-glycans collection provides access to published and unpublished data generated by A*STAR Singapore. In comparison, references describe curated structural, experimental, and associated metadata extracted from over 30 publications.

1. To display the published material stored in GlycoStore, select “References” from the “Show All” menu positioned in the top navigation bar. To view glycan structures, corresponding elution positions, experimental details, and other biological source information (e.g., taxonomy), click the reference title. An example screenshot is shown in Fig. 2.
2. Data collections can be accessed from the “Collections” link under the “Show All” menu. “Show Collections” will display a summary page similar to Table 1. To view all structural and experimental content, click the collection name. An example screenshot is shown in Fig. 3.

Fig. 2 Screenshot of the references page (http://www.glycostore.org/references) showing all curated published material

Table 1 Summary of the twelve data collections available in GlycoStore (version 1.0) including a brief description of the analytical technique and sample types (http://www.glycostore.org/collections)

| Collection name | Description | Samples | Technique |
|---|---|---|---|
| BTI GSL standards | Glycosphingolipid standards labeled with 2-AB and procainamide | Ganglio-, lacto-, neolacto-, globo-, and iso-globo | HILIC-UPLC |
| CE database | N-glycans analyzed by capillary electrophoresis | Haptoglobin, IgG, purified standards, RNase B, transferrin | CE-LIF |
| Human serum N-glycans NIBRT | Glycans characterized in-house | Human serum | HILIC-UPLC |
| Milk oligosaccharides | Milk oligosaccharides analyzed from domestic animals | Cow, dromedary camel, goat, horse, pig, sheep | HILIC-UPLC |
| MQ porous graphitic carbon (PGC) | Reduced N-glycan structures released from standard glycoproteins | Commercially available glycoprotein standards | PGC-LC-ESI-MS/MS |
| NIBRT GSL | Glycosphingolipids from human serum and mammalian cell surfaces | Mammalian cell surfaces, blood serum | UPLC-HILIC-FLD |
| O-Glycans | O-glycan library | Various O-glycans | HILIC-HPLC |
| Original release of GlycoBase | Original collection of N-glycans analyzed at the Oxford Glycobiology Institute | Human serum, human immunoglobulins, monoclonal antibodies, follicle stimulating hormone, and cell surface glycoproteins including human CDs | HILIC-HPLC |
| Royle (2008) Paper | HPLC-based analysis of serum N-glycans on a 96-well plate platform with dedicated database software | Human serum | HILIC-HPLC |
| RP IgG glycans | N-glycans characterized by reversed-phase UPLC | Human IgG | RP-UPLC |
| UPLC BTI | N-glycans analyzed from a variety of biological samples | Human IgG, alpha-1 antitrypsin (AAT), human serum, mouse serum | HILIC-UPLC |
| UPLC Ludger | N- and O-glycan structures determined from Ludger standards and a number of biological samples | Ludger commercial glycoprotein and glycan standards, saliva, and ocular mucin | HILIC-HPLC, HILIC-UPLC |

Fig. 3 Publication page for an article by Abrahams et al. [25] reporting a porous graphitized carbon retention library. The top content panel includes the title of the publication and journal details. The main body shows a list of experimentally validated structures released from a set of glycoprotein standards (summarized in the right panel, blue box), which are linked to the glycan structure summary page

3.1 Individual Publications and Collections

As shown in Fig. 3, the publication and data collection pages are divided into three sections:

1. A header displaying bibliographic information.
2. A results section displaying a table of glycan structures along with the fluorescent label used (or whether the glycans are reduced) and the reported retention value. Each glycan image is linked to a “Glycan Structure” page.
3. A sidebar panel providing contextual information, for example, “Biological Associations.”
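The three-part page layout can be modelled as a small data structure. The sketch below is a hypothetical illustration (the class and field names are not taken from GlycoStore's code): a header with bibliographic details, result rows recording the label or reduced state and the reported retention value, and a sidebar of biological associations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class ResultRow:
    glycan: str            # structure name; links to a "Glycan Structure" page
    label: Optional[str]   # fluorescent label such as "2-AB", or None
    reduced: bool          # True for reduced (unlabelled) glycans, e.g. PGC data
    retention: float       # reported retention value
    unit: str              # "GU", "AU" or "min"

@dataclass
class PublicationPage:
    title: str             # 1. header: bibliographic information
    journal: str
    year: int
    rows: List[ResultRow] = field(default_factory=list)               # 2. results
    associations: Dict[str, List[str]] = field(default_factory=dict)  # 3. sidebar

    def results_table(self) -> List[Tuple[str, str, str]]:
        """Rows ordered by retention value (assumes one unit per page),
        with the label or reduced state spelled out."""
        table = []
        for row in sorted(self.rows, key=lambda r: r.retention):
            state = row.label or ("reduced" if row.reduced else "native")
            table.append((row.glycan, state, f"{row.retention:g} {row.unit}"))
        return table

# Placeholder content for illustration only.
page = PublicationPage(
    title="Example retention library", journal="Example J", year=2017,
    rows=[ResultRow("A2", None, True, 31.5, "min"),
          ResultRow("M5", None, True, 25.0, "min")],
    associations={"Glycoprotein": ["RNase B"]},
)
table = page.results_table()
```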
4

Curated Glycan Information

In Subheading 3, we described the concept of collections and references to improve the organization of structural, experimental, and associated metadata. As shown in Fig. 3, each collection or reference displays a set of glycan structures with supporting contextual information. As part of our biocuration activities, each structure deposited in the database is verified by at least two team members to ensure consistency with the published material, and when required, authors are contacted to clarify any uncertainty. To view detailed experimental information for any glycan (e.g., exoglycosidase digestions and reported elution positions), click its image or name to load a summary page. An example entry page for the A2B (biantennary, bisected N-glycan) structure is shown in Fig. 4. Each page shows a pictorial representation of the structure depicting monosaccharide sequence, anomericity, and linkage; elution positions with standard deviations (calculated from all listed data); related reference information; and supporting exoglycosidase evidence.

4.1 Biological Associations

The taxonomic, glycoprotein, and tissue sources (individual or multiple) of a set of glycan structures can be found under “Biological Associations.” To find related biological content, use the embedded links in the quick navigation lists to access individual taxonomy, glycoprotein, or tissue source pages.

5
Search Options

GlycoStore offers a variety of search methods, including (a) by elution values such as GU, AU, or time (min); (b) by monosaccharide composition; and (c) by metadata labels such as taxonomy, sample name, or structure name (Oxford linear notation). All available options are listed under the “Search” menu. “Show Structures” will direct users to a new single web application that lists all stored entries, which can be filtered by defined properties, for example, by mass or motif (see Notes 1–3).

Fig. 4 The glycan structure overview page displays all data specific to the glycan and any related information, and comprises four parts: (1) a graphical glycan depiction (in a defined nomenclature format) with linkage and monosaccharide anomericity; (2) below the image, sections describing references, exoglycosidase treatments, reported retention positions, GlycoCT encoding, and, when available, MS2 spectra, displayed in a tab-enabled table; (3) the “Biological Associations” subsection describing any associated taxonomies, tissues, and/or glycoproteins; and (4) when available, links to external databases including GlyTouCan

5.1 Search by Structure Name

Glycans are inherently complex biomolecules, involving a breadth of monosaccharide diversity and varying degrees of branching. Similar to GlycoBase, GlycoStore has adopted the textual Oxford naming convention to describe N-glycan structures. This format was published by Harvey et al. and has recently been extended to include definitions for Neu5Ac and Neu5Gc residues and O-acetylation. An explanation of the format with examples can be downloaded from the “About Us” menu under “Oxford Notation Guide.” In brief, the notation assumes that all N-glycans have two core GlcNAc residues from the common pentasaccharide core, and defines the minimal requirements to annotate a structure unambiguously. For example, A2 describes a biantennary structure, and by extension F(6)A2 is a core-fucosylated (alpha 1–6 linked) biantennary structure with both antennary GlcNAcs beta 1–2 linked. F at the start of the abbreviation indicates a core fucose; Mx, the number (x) of mannose residues attached to the pentasaccharide core GlcNAcs; Ax, the number (x) of antennae (GlcNAc) on the trimannosyl core; Gx, the number (x) of linked galactose residues on the antennae; [3]G1 and [6]G1 indicate that the galactose is on the antenna of the alpha 1–3 or alpha 1–6 mannose, respectively; Sx, the number (x) of sialic acids linked to galactose, where S can be replaced with Sg or Sa for Neu5Gc and Neu5Ac, respectively. Numbers are also used to indicate linkages where known (e.g., F(6)A2G(4)2S(6)2 is a biantennary glycan with a core fucose in an alpha 1–6 linkage, two beta 1–4 linked galactoses, and alpha 2–6 linked sialic acids).

1. To find entries by structure name, select the name option from the drop-down list and enter a structure description based on the Oxford format rules.
2. The database will only return results that exactly match the search term, that is, with identical topology, linkage, and anomeric configuration as defined by the linear notation. The results page will list all structures characterized by the specified analytical platform and the elution range (see Subheading 5.2).
3. To view the glycan structure summary page, click the structure name or image.

5.2 Search by Separation Technology
GlycoStore provides access to the elution properties of over 850 N-glycans, O-glycans, and glycosphingolipid (GSL) glycans, including standardized retention times expressed as glucose units (GU) and arabinose units (AU) for 2-AB- and procainamide-labelled glycans determined by HPLC, UPLC, or RP-UPLC techniques. It also provides time-based data for reduced glycans analyzed with porous graphitized carbon (PGC) workflows, along with a growing CE dataset of APTS-labelled glycans, as shown in Fig. 5.
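Glucose unit values place each peak on a scale defined by a dextran ladder (glucose oligomers) run under the same conditions, so that a peak co-eluting with the ladder oligomer of n glucose residues has a value of n GU. A minimal sketch of that conversion follows. It uses piecewise-linear interpolation between neighbouring ladder peaks as a simplifying assumption (calibration software often fits a smooth curve, such as a polynomial or cubic spline, instead), the function name is hypothetical, and the ladder times are invented placeholders.

```python
from bisect import bisect_left
from typing import Callable, Sequence

def make_gu_converter(ladder_times: Sequence[float],
                      first_gu: int = 1) -> Callable[[float], float]:
    """Return a function that converts a retention time (min) into glucose units.

    ladder_times[i] is the retention time of the ladder peak whose GU value is
    first_gu + i. Times between two ladder peaks are interpolated linearly.
    """
    if len(ladder_times) < 2:
        raise ValueError("at least two ladder peaks are required")
    if any(b <= a for a, b in zip(ladder_times, ladder_times[1:])):
        raise ValueError("ladder times must be strictly increasing")

    def to_gu(t: float) -> float:
        if not ladder_times[0] <= t <= ladder_times[-1]:
            raise ValueError("retention time lies outside the calibrated range")
        i = bisect_left(ladder_times, t)  # index of first peak with time >= t
        if ladder_times[i] == t:
            return float(first_gu + i)
        # t lies between peak i-1 (first_gu + i - 1 GU) and peak i (one GU higher).
        t0, t1 = ladder_times[i - 1], ladder_times[i]
        return first_gu + (i - 1) + (t - t0) / (t1 - t0)

    return to_gu

# Invented ladder times (min) for the peaks at 2, 3, 4 and 5 GU.
to_gu = make_gu_converter([10.0, 14.0, 17.0, 19.5], first_gu=2)
peak_gu = to_gu(15.5)
```

With these placeholder values, a peak at 15.5 min falls halfway between the 3 GU and 4 GU ladder peaks and is reported as 3.5 GU; such a value could then be compared with stored entries within the default quick-search tolerance.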
Fig. 5 The GlycoStore search page is divided into six tab-enabled sections. As shown in the screenshot, the “Technology Platform” tab displays all supported analytical platforms along with a simple elution range search form. The “Sample Name” and “Report Name” sections include drop-down menus listing unpublished data collections. The Tissue, Glycoprotein, and Taxonomy sections have drop-down lists to quickly retrieve associated glycan structures
1. To find glycan structures characterized by a specific analytical method, go to “Search -> Retention Times/Units” from the main navigation bar.
2. Select the “Technology Platform” tab, followed by an option from the drop-down list labeled “Experiment Type.”
3. Enter a value range in the minimum and maximum boxes as indicated in Fig. 5 and click Go.

Alternatively, the quick search form positioned in the navigation bar can be used (Subheading 5.1). As shown in Fig. 6, to perform a quick search, carry out the following steps:

1. Select an analytical method from the drop-down list (e.g., UPLC 2-AB).
2. Enter an elution value and click the magnifying glass symbol. This option will retrieve structure entries matching the submitted value within a default tolerance of 0.5 units or 2 min.

In both cases, the results page will list all structures characterized by the specified analytical platform and the elution range.

5.3 Search Filter Selection
Figure 7 shows an example 6–7 GU search against the UPLC 2-AB database, which returns over 40 structures. To refine these results, users can select a set of features listed in the right navigation panel (blue box). Here, N-linked glycans can be filtered by branching type and terminal epitopes (e.g., sialylation, core fucosylation, and the presence of outer arm fucose residues). The O-linked filters reflect the database coverage; for example, only core 1 and core 2 structures are deposited in GlycoStore (version 1.0), which limits the core filters to these descriptors.

Fig. 6 The search feature embedded in the top navigation bar can be used to (1) find structures matching an Oxford linear notation or (2) perform a simple retention value search against a selected analytical method with a tolerance of 0.5

Fig. 7 GlycoStore search result for 2-AB labelled glycan structures eluting within a 6–7 glucose unit (GU) range. The N- and O-glycan filters displayed in the blue box (right panel) can be used to refine the structure results by feature selection

5.4 Using the Glycoprotein, Tissue, and Taxonomy Search Page
As shown in Fig. 5, the main search page features tabs named Glycoprotein, Tissue, and Taxonomy. These are simple navigation aids that allow users to find information on the distribution of glycan structures in different taxonomies and tissues; they also provide details and links to the glycan profiles of a limited set of glycoproteins (see Note 3). To access this information:

1. Point your browser at the GlycoStore search page http://www.glycostore.org/search or select “Retention Times/Units” from the “Search” menu.
2. Click the glycoprotein, tissue, or taxonomy tab, select a term from the drop-down list, and press the Ok button.
3. The results table will list all structures associated with the selected search term.
4. Click the image or name to load the glycan structure summary page.

It is also possible to compare multiple glycoproteins or taxonomies using the “Compare” feature:

1. Select Taxonomy or Glycoprotein from the “Compare” menu.
2. Make two or three checkbox selections.
3. Click Go to show common structures along with elution/separation values.
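Conceptually, the Compare feature reports the intersection of the structure sets linked to each selected term. The sketch below shows that logic with hypothetical in-memory data; the function name, the mappings, and the GU values are placeholders rather than GlycoStore data or code.

```python
from typing import Dict, Iterable, Optional, Set

def common_structures(
    associations: Dict[str, Set[str]],
    selected: Iterable[str],
    elution: Dict[str, float],
) -> Dict[str, Optional[float]]:
    """Return the structures shared by every selected term, with elution values.

    associations maps a taxonomy or glycoprotein to the structure names linked
    to it; elution maps a structure name to a reported value (None if absent).
    """
    terms = list(selected)
    if not 2 <= len(terms) <= 3:
        raise ValueError("select two or three terms, as in the Compare form")
    shared = set.intersection(*(set(associations[t]) for t in terms))
    return {name: elution.get(name) for name in sorted(shared)}

# Placeholder associations and GU values, for illustration only.
assoc = {
    "Homo sapiens": {"A2", "F(6)A2", "M5"},
    "Mus musculus": {"F(6)A2", "M5", "A2G2"},
    "Bos taurus": {"M5"},
}
gu = {"M5": 6.1, "F(6)A2": 6.4}
two_way = common_structures(assoc, ["Homo sapiens", "Mus musculus"], gu)
three_way = common_structures(assoc, assoc.keys(), gu)
```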
6
Browse a Comprehensive List of Glycan Structures
6.1 Using the Structure Search Tool
GlycoStore features a new glycan structure database built for the web using the React JavaScript library and Elasticsearch. The tool provides components for efficiently browsing and filtering curated glycan structures deposited in GlycoStore by mass, composition, and motifs. Figure 8 shows the web application layout, with the structure filters grouped in the left navigation panel (orange box).

1. The web application can be accessed from the main navigation bar (“Search -> Structure Motif and Mass”). By default, GlycoStore represents structures using the hybrid Essentials/Oxford graphical notation. To change the graphical notation preference, select a format from the list at the top right of the results panel.
2. The left navigation panel (orange box) lists the available filters, including sliders to set mass and monosaccharide composition ranges, and a selection of structural motifs. The ranges of these filters and motifs are based on the deposited glycan structures.
Fig. 8 The GlycoStore structure browsing tool can filter entries based on multiple selections of monosaccharide composition, mass range, and motifs. The structure listing provides links to the GlycoStore glycan structure summary pages and automatically updates with filter adjustments
3. The glycan mass histogram displays the population of entries within the specified range. Adjusting the slider will automatically update the structure listings.
4. Similarly, the motif menu counts the number of structures containing a particular motif, given the other search criteria. Clicking on a motif in the list displays glycans containing the selected motif.
5. The blue box displays the current search criteria, which can be deleted or reset.
6. The main results panel dynamically lists structures matching the search criteria and includes a summary of key attributes (e.g., mass, motif, and links to the main GlycoStore website).

The glycan structure listings update automatically based on the selections and ranges set. In some cases, a large number of results can be shown, but the multiple filtering options allow users to easily find their structure of interest.

6.2 Data Availability and Implementation

The web application is developed in Java and Scala using the Play Framework. Data are stored in an Apache Jena TDB triple store in Resource Description Framework (RDF) format, as defined by the GlycoRDF ontology. A Spring API has been developed to model SPARQL operations as Java objects, and the library is available from the project's Bitbucket repository (https://bitbucket.org/glycostore/glycostore). The wiki provides documentation for the glycoinformatics community, including how to perform remote queries using the SPARQL endpoint. Users are encouraged to review the source code and data licenses.
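Under the W3C SPARQL 1.1 Protocol, a query can be sent to an endpoint as the `query` parameter of an HTTP GET request. The sketch below builds (but does not send) such a request using only the Python standard library. The endpoint address is a placeholder to be replaced with the one given in the GlycoStore wiki, and the `glycan:` prefix and property names are assumed from the GlycoRDF vocabulary; check them against the ontology GlycoStore actually uses before relying on the query.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder: substitute the endpoint address documented in the project wiki.
ENDPOINT = "https://example.org/sparql"

# Assumed GlycoRDF terms; verify them against the GlycoStore documentation.
QUERY = """
PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
SELECT ?saccharide ?sequence
WHERE {
  ?saccharide a glycan:saccharide ;
              glycan:has_glycosequence ?gseq .
  ?gseq glycan:has_sequence ?sequence ;
        glycan:in_carbohydrate_format glycan:carbohydrate_format_glycoct .
}
LIMIT 10
"""

def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = build_request(ENDPOINT, QUERY)
# To run it against a real endpoint: urllib.request.urlopen(req), then json.load.
```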
Notes

1. The results pages are displayed in a context-dependent manner that may include descriptive text or direct links to glycan structures or biological associations (e.g., taxonomy, tissue, or glycoprotein entries).
2. The structure filter options depend on the results. For example, if no O-linked structures are returned, the O-link menu will not appear.
3. The content displayed in the right-hand panel depends on the data shown and the section of the database accessed.
8
Summary

Continuing advances in analytical technologies and new data types are unraveling the complexity of the glycome and glycoproteome. It is increasingly important to ensure that data deposited in databases are annotated and structured. GlycoStore is an important step in this direction, with an expanding collection of glycan structures, annotated glycoproteins, and biological contextual information. In this chapter, we have described the key functionality of GlycoStore and our efforts to bring together glycan structure information, chromatographic separation data, and electrophoretic data. It contains the largest collection of curated and in-house LC and CE experimental data on glycan structures with associated research literature. We will continue to adapt its data gathering, processing, and user interfaces to support ongoing developments in separation-MS-based analytical workflows. The database is intended to be updated every 3 months with newly curated material, edits to existing entries, and feature updates driven by improving technologies and user feedback.
9
Submission of Updates, New Data, and Troubleshooting

GlycoStore is a community endeavor, and the team encourages end-user feedback. To submit suggestions or improvements, or to report errors, please contact us at [email protected]. The project encourages the submission of new data, and we have put together a set of Excel template files that can be uploaded to GlycoStore; for more information, go to https://www.glycostore.org/contribute. For further information and guides, please refer to our documentation site https://unicarbkb.freshdesk.com.
Acknowledgments

The authors acknowledge Ian Walsh (BTI, A*STAR Singapore), Louise Royle (Ludger Ltd., UK), and Ciara McManus and Mark Hilliard (NIBRT) for supporting the migration of GlycoBase to GlycoStore, validating and checking data collections, and providing user interface feedback.

Funding: GlycoStore is supported by the Institute for Glycomics (Griffith University, Australia), the Macquarie University-Ludger Ltd. Pilot Scheme, A*STAR’s Joint Council Visiting Investigator Programme (HighGlycoART), and the Biomedical Research Council Strategic Positioning Fund (GlycoSing).

References

1. Varki A (2017) Biological roles of glycans. Glycobiology 27(1):3–49. https://doi.org/10.1093/glycob/cww086
2. Christiansen MN, Chik J, Lee L, Anugraham M, Abrahams JL, Packer NH (2014) Cell surface protein glycosylation in cancer. Proteomics 14(4–5):525–546. https://doi.org/10.1002/pmic.201300387
3. Taniguchi N, Kizuka Y (2015) Glycans and cancer: role of N-glycans in cancer biomarker, progression and metastasis, and therapeutics. Adv Cancer Res 126:11–51. https://doi.org/10.1016/bs.acr.2014.11.001
4. Maverakis E, Kim K, Shimoda M, Gershwin ME, Patel F, Wilken R, Raychaudhuri S, Ruhaak LR, Lebrilla CB (2015) Glycans in the immune system and the altered glycan theory of autoimmunity: a critical review. J Autoimmun 57:1–13. https://doi.org/10.1016/j.jaut.2014.12.002
5. Parekh RB, Dwek RA, Sutton BJ, Fernandes DL, Leung A, Stanworth D, Rademacher TW, Mizuochi T, Taniguchi T, Matsuta K et al (1985) Association of rheumatoid arthritis and primary osteoarthritis with changes in the glycosylation pattern of total serum IgG. Nature 316(6027):452–457
6. Poole J, Day CJ, von Itzstein M, Paton JC, Jennings MP (2018) Glycointeractions in bacterial pathogenesis. Nat Rev Microbiol 16(7):440–452. https://doi.org/10.1038/s41579-018-0007-2
7. Scanlan CN, Offer J, Zitzmann N, Dwek RA (2007) Exploiting the defensive sugars of HIV-1 for drug and vaccine design. Nature 446(7139):1038–1045. https://doi.org/10.1038/nature05818
8. Kailemia MJ, Ruhaak LR, Lebrilla CB, Amster IJ (2014) Oligosaccharide analysis by mass spectrometry: a review of recent developments. Anal Chem 86(1):196–212. https://doi.org/10.1021/ac403969n
9. Hofmann J, Pagel K (2017) Glycan analysis by ion mobility-mass spectrometry. Angew Chem Int Ed Engl 56(29):8342–8349. https://doi.org/10.1002/anie.201701309
10. Harvey DJ, Abrahams JL (2016) Fragmentation and ion mobility properties of negative ions from N-linked carbohydrates: part 7. Reduced glycans. Rapid Commun Mass Spectrom 30(5):627–634. https://doi.org/10.1002/rcm.7467
11. Ruhaak LR, Xu G, Li Q, Goonatilleke E, Lebrilla CB (2018) Mass spectrometry approaches to glycomic and glycoproteomic analyses. Chem Rev 118(17):7886–7930. https://doi.org/10.1021/acs.chemrev.7b00732
12. Purohit S, Li T, Guan W, Song X, Song J, Tian Y, Li L, Sharma A, Dun B, Mysona D, Ghamande S, Rungruang B, Cummings RD, Wang PG, She JX (2018) Multiplex glycan bead array for high throughput and high content analyses of glycan binding proteins. Nat Commun 9(1):258. https://doi.org/10.1038/s41467-017-02747-y
13. Mulloy B, Dell A, Stanley P, Prestegard JH (2015) Structural analysis of glycans. In: Varki A, Cummings RD et al (eds) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp 639–652. https://doi.org/10.1101/glycobiology.3e.050
14. Battistel MD, Azurmendi HF, Yu B, Freedberg DI (2014) NMR of glycans: shedding new light on old problems. Prog Nucl Magn Reson Spectrosc 79:48–68. https://doi.org/10.1016/j.pnmrs.2014.01.001
15. Stockmann H, O’Flaherty R, Adamczyk B, Saldova R, Rudd PM (2015) Automated, high-throughput serum glycoprofiling platform. Integr Biol (Camb) 7(9):1026–1032. https://doi.org/10.1039/c5ib00130g
16. Mantovani V, Galeotti F, Maccari F, Volpi N (2018) Recent advances in capillary electrophoresis separation of monosaccharides, oligosaccharides, and polysaccharides. Electrophoresis 39(1):179–189. https://doi.org/10.1002/elps.201700290
17. Campbell MP, Royle L, Radcliffe CM, Dwek RA, Rudd PM (2008) GlycoBase and autoGU: tools for HPLC-based glycan analysis. Bioinformatics 24(9):1214–1216. https://doi.org/10.1093/bioinformatics/btn090
18. Jarvas G, Szigeti M, Guttman A (2018) Structural identification of N-linked carbohydrates using the GUcal application: a tutorial. J Proteome 171:107–115. https://doi.org/10.1016/j.jprot.2017.08.017
19. Behne A, Muth T, Borowiak M, Reichl U, Rapp E (2013) glyXalign: high-throughput migration time alignment preprocessing of electrophoretic data retrieved via multiplexed capillary gel electrophoresis with laser-induced fluorescence detection-based glycoprofiling. Electrophoresis 34(16):2311–2315. https://doi.org/10.1002/elps.201200696
20. Takahashi N, Nakagawa H, Fujikawa K, Kawamura Y, Tomiya N (1995) Three-dimensional elution mapping of pyridylaminated N-linked neutral and sialyl oligosaccharides. Anal Biochem 226(1):139–146. https://doi.org/10.1006/abio.1995.1201
21. von der Lieth CW, Freire AA, Blank D, Campbell MP, Ceroni A, Damerell DR, Dell A, Dwek RA, Ernst B, Fogh R, Frank M, Geyer H, Geyer R, Harrison MJ, Henrick K, Herget S, Hull WE, Ionides J, Joshi HJ, Kamerling JP, Leeflang BR, Lutteke T, Lundborg M, Maass K, Merry A, Ranzinger R, Rosen J, Royle L, Rudd PM, Schloissnig S, Stenutz R, Vranken WF, Widmalm G, Haslam SM (2011) EUROCarbDB: an open-access platform for glycoinformatics. Glycobiology 21(4):493–502. https://doi.org/10.1093/glycob/cwq188
22. Gotz L, Abrahams JL, Mariethoz J, Rudd PM, Karlsson NG, Packer NH, Campbell MP, Lisacek F (2014) GlycoDigest: a tool for the targeted use of exoglycosidase digestions in glycan structure determination. Bioinformatics 30(21):3131–3133. https://doi.org/10.1093/bioinformatics/btu425
23. Zhao S, Walsh I, Abrahams JL, Royle L, Nguyen-Khuong T, Spencer D, Fernandes DL, Packer NH, Rudd PM, Campbell MP (2018) GlycoStore: a database of retention properties for glycan analysis. Bioinformatics 1:2
24. Herget S, Ranzinger R, Maass K, Lieth CW (2008) GlycoCT-a unifying sequence format for carbohydrates. Carbohydr Res 343(12):2162–2171. https://doi.org/10.1016/j.carres.2008.03.011
25. Abrahams JL, Campbell MP, Packer NH (2017) Building a PGC-LC-MS N-glycan retention library and elution mapping resource. Glycoconj J 35(1):15–29. https://doi.org/10.1007/s10719-017-9793-4
Chapter 3
An Interactive View of Glycosylation
Julien Mariethoz, Davide Alocci, Niclas G. Karlsson, Nicolle H. Packer, and Frédérique Lisacek

Abstract
The present chapter focuses on the interactive and explorative aspects of bioinformatics resources that have recently been released in glycobiology. The comparative analysis of data in a field where knowledge is scattered, incomplete, and disconnected from mainstream biology requires efficient visualization, integration, and interactive tools that are currently only partially implemented. This overview highlights converging efforts toward building a consistent picture of protein glycosylation.

Key words Glycoinformatics, Data integration and visualization, Glycan structure, Glycoprotein, Glycan expression profile
1 Introduction
Data visualization aims at capturing information efficiently and usually relies on simplified graphics. Graphic representations are commonly used in biology not only to communicate experimental results but also to depict biological entities and their interactions. Clearly, these depictions are more convincing when they use standard representations of entities. This issue was raised in the context of molecular biology with great humor in [1] and found an interesting answer in the Systems Biology Graphical Notation (SBGN) [2], which is slowly settling in, yet is not widely adopted beyond bioinformatics circles. Nonetheless, a general agreement has been reached regarding the representation of some biomolecules. For instance, one-letter coded nucleotide and amino acid sequences are now universally understood. In biochemistry, IUPAC¹ recommendations have enforced many standards, and in chemistry, linear codes for small molecules such as SMILES² and InChI³ are now also widespread. In all cases, the introduction of computers and automated processing in the life sciences has significantly spread the use of standard representations, and a logical consequence has been the design of software interfaces that rely on these shared representations.

An obvious benefit of adopting common entity encoding and description is to facilitate comparison. Since comparison is the most basic operation performed in biological studies, an essential goal of visualization is to make variations across changing conditions easier to capture. Primarily following the exponential release of genetic sequences, dedicated visualization tools have been developed over the years, and their use has become a matter of course. Popular application domains include phylogenetic trees reflecting DNA or protein sequence similarities [3] and gene expression data clustering that reveals differential profiles displayed in dendrograms [4]. In recent years, new challenges have arisen with next-generation sequencing (NGS) techniques, and the sheer size of genome data has prompted the need for custom visualization of sequence alignments and of expression patterns spanning entire genomes [5]. Within a short period of time, visualization was promoted in bioinformatics as key to interpreting the large volumes of data generated in -omics applications.

Visualization is increasingly crucial to the integration of multiple data sources, as exemplified by the general-purpose tools available to overlay different datasets. One of the most widespread open-source platforms is Cytoscape [6], extensively used for visualizing interaction networks and biological pathways. Cytoscape can combine gene expression profiles, sequence annotation, and other quantitative and qualitative data into information-rich networks. The definite added value of the tools mentioned so far lies in ever-increasing levels of interactivity that favor exploration and sometimes even discovery.

It is now almost inconceivable not to click on links (whether in a text or underlying an object) that populate web pages or software windows and yield refined information. Another typical example of interactivity that has improved over the years, somewhat ahead of many other domains, is the three-dimensional representation of macromolecules and their interactions with other large or small molecules. Rotating or flipping protein 3D structures represented as ribbons or sticks has long been possible with software⁴ on a PC, from the early PyMol⁵ (2000) to the recent LiteMol [7]. The wide range of options for manipulating a molecule in 3D enables the careful study of interactions.

1 International Union of Pure and Applied Chemistry.
2 Simplified Molecular Input Line Entry Specification.
3 International Chemical Identifier.
4 https://proteopedia.org/wiki/index.php/Introduction_to_molecular_visualization.
5 https://pymol.org.

Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_3, © Springer Science+Business Media, LLC, part of Springer Nature 2022
The present chapter focuses on the interactive and explorative aspects of bioinformatics resources that have recently been released in the field of glycobiology. Following the thread of the above description of the main benefits of visualization in molecular biology, we focus on (1) the notion of simplified graphics, (2) the comparative approaches, and (3) the integrative approaches that are implemented in glycotools and glycodatabases. In the vast majority of cases, these interactive tools are intended not only to help glycoscientists in their research but also to make glycobiology accessible to non-expert users.
2 Interactive and Graphic
Showing and naming all atoms of a complex carbohydrate generates a very busy picture that is hardly intelligible. Unsurprisingly, glycoscientists have for many years proposed ways of simplifying glycan depiction (see [8] for details of the genesis of simplified representations). In a nutshell, glycans were first simplified as regular expressions⁶ based on the short names of monosaccharides (e.g., Glc for glucose), the anomeric configuration written a or b (sometimes α or β), followed by digits describing the linkage (e.g., b1–4), and paired brackets delineating branches. This is known as IUPAC linear. The limited ability of this linear notation to distinguish the various categories of isomers was slightly improved with the IUPAC condensed notation, which displays structures in 2D, but finally a standard notation was agreed upon as the most appropriate solution for a simplified representation of glycans [9]. This notation simply maps monosaccharides to a series of colored symbols; for instance, glucose is systematically represented as a blue circle. The notation gained popularity in glycobiology and, with wide consensus, was later named SNFG for “Symbol Nomenclature for Glycans” [10]. SNFG is regularly updated [11] and is now increasingly cited as the preferred code for representing glycans. It is the notation adopted in GlyTouCan [12], the international repository of glycan structures. Ultimately, uniform usage of the nomenclature in the literature is expected. The SNFG notation does not resolve all isomer ambiguities, but it is a definite step toward making glycoscience easily manageable for experts and accessible to nonexperts.
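To make the two notations concrete, here is a minimal Python sketch, under our own assumptions, of how a linear string in the style described above can be read residue by residue and each residue mapped to its SNFG shape and color. The names `SNFG`, `tokenize`, and `snfg_summary` are hypothetical; the symbol table covers only a few common monosaccharides, and the regular expression ignores substituents and repeats and discards the branch topology carried by the square brackets.

```python
import re
from collections import Counter

# SNFG (shape, colour) for a few common monosaccharides; a subset, not the full nomenclature.
SNFG = {
    "Glc": ("circle", "blue"),
    "Man": ("circle", "green"),
    "Gal": ("circle", "yellow"),
    "GlcNAc": ("square", "blue"),
    "GalNAc": ("square", "yellow"),
    "Fuc": ("triangle", "red"),
    "Xyl": ("star", "orange"),
    "Neu5Ac": ("diamond", "purple"),
    "Neu5Gc": ("diamond", "light blue"),
}

# A residue name, optionally followed by a linkage such as (b1-4), (a2-6) or (a1-?).
# Both "-" and the en dash used in print ("b1–4") are accepted.
TOKEN = re.compile(r"([A-Z][A-Za-z0-9]*)(?:\(([ab?])(\d|\?)[-–](\d|\?)\))?")


def tokenize(linear: str):
    """Return (residue, anomer, from_position, to_position) tuples, read left to right.

    Square brackets delimiting branches are skipped, so the branch topology is lost;
    a residue without a linkage (the reducing end) gets None for the linkage fields.
    """
    return [m.groups() for m in TOKEN.finditer(linear)]


def snfg_summary(linear: str) -> Counter:
    """Count residues by SNFG (shape, colour); names missing from the table are kept as-is."""
    return Counter(SNFG.get(name, name) for name, _anomer, _c1, _c2 in tokenize(linear))


if __name__ == "__main__":
    core = "Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc"  # N-glycan core
    for token in tokenize(core):
        print(token)
    print(snfg_summary(core))
```

For the N-glycan core used in the example, the summary counts three green circles (Man) and two blue squares (GlcNAc). A real application would rely on an established encoding and parser rather than on a single regular expression.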
2.1 What Can Be Drawn in 2D?
As just mentioned, drawing the intricate set of atoms and bonds of a glycan structure is notoriously cumbersome, so most drawing interfaces resort to either naming monosaccharide constituents or using symbolic representations. These interfaces differ not only in the naming or symbol conventions they rely on but also in their drawing options, for example, drag and drop vs clicking. They also produce different renderings in a range of output formats. The distinctive capabilities of the most popular interfaces reviewed in this chapter are summarized in Table 1.

Two pioneering interfaces were developed prior to the release of a consensus simplified representation of glycans. First, software integrated in GlycoSuiteDB [17] enabled graphical queries in the IUPAC condensed format used in that database. Second, a Java application called KegDraw was implemented to perform similarity searches in the KEGG databases [18]. It produced low-resolution images in which text labels of monosaccharides were connected by lines. The next-generation interfaces introduced a cartoon notation first proposed within the Consortium for Functional Glycomics (CFG) [19]. This notation preceded SNFG and used similar symbols. The implementation of 2D structure drawing then shifted to enabling the assembly of a glycan symbol by symbol. This was reflected in GlycanBuilder [20] and GlycoViewer [21]. GlycanBuilder, a Java applet, was developed mainly to be embedded as a graphic tool in larger platforms managing the analysis of glycans [22]. GlycanBuilder was later upgraded to work in a web environment. As such, it needs to be installed and connected to a server, which hampers its use owing to recent security upgrades in major browsers. This causes fairly long time lags during processing despite a recent update [23]. GlycoViewer was pioneering in terms of design, usability, and speed. A potential drawback of this tool is its drag-and-drop implementation, which is designed for a mouse and may become tiresome when many large structures are drawn on a touch-based device such as a tablet or a smartphone. GlycoViewer, like GlycanBuilder, is composed of a client interface and a server written in Ruby on Rails.

6 https://en.wikipedia.org/wiki/Regular_expression.
These time lags prompted the design of JavaScript-based, browser-independent web interfaces, represented by Glycano⁷ and SugarSketcher [13]. The Glycano interface uses SNFG symbols, though not in the proper color scheme. It is designed for trained chemists and glycobiologists, which somewhat restricts access for nonexperts. Note that the drag-and-drop feature is also used in Glycano. As in GlycanBuilder, the overall drawing process of SugarSketcher is a succession of feature-selection steps that ends with the display of an SNFG-compliant 2D structure. This process is repeated as many times as the target glycan contains building blocks (monosaccharides), with a minimal number of clicks. The tool can be run in “normal mode” for glycoscientists or “quick mode” for users with limited glycoknowledge. It also implements symbol positioning that follows the rules for depicting monosaccharide linkages, with embedded linkage type and anomericity, detailed in [24].

Note that two drawing interfaces integrated in 3D modeling platforms serve as input forms for building 3D models: Polys [25] in the Glyco3D portal [26] and the (unpublished) carbohydrate builder⁸ of the GLYCAM-Web portal. Finally, DrawGlycan [27] deserves mention, as it produces high-quality, SNFG-compliant depictions of glycan structures, although data input is limited to IUPAC linear encoding.

7 http://glycano.cs.uct.ac.za.

Table 1 Characteristic features of the best-known 2D glycan structure drawing interfaces

Tool | Clicks (C) vs drag and drop (DD) | Import | Export | Implementation
Carbohydrate builder (GLYCAM) | C | ✗ | PDB | ?
CSDB wizard [13] | C | GlycoCT, … | GlycoCT, WURCS, SMILES, GLYDE-II, GLYCAM, LINUCS, MOL-file | PHP, Java
DrawRINGS | C | KCF | KCF, IUPAC | Java, HTML5
DrawGlycan | ✗ | IUPAC | ✗ | MATLAB
Glycano | DD | Unspecified | PNG, SVG | JavaScript
GlycoViewer [14] | DD | ✗ | ✗ | Ruby, JavaScript
GlycanBuilder in GlyTouCan [15] | C | GlycoCT, library, CarbBank, Linucs, IUPAC, WURCS | GlycoCT, Glyde, Linucs, WURCS | Java
POLYS builder [16] | C | INP | INP, PDB | PHP, C
SugarSketcher | C | GlycoCT, library | GlycoCT, SMILES, InChI, InChIKey, SVG | JavaScript

2.2 What Can Be Uploaded for Automatic Drawing in 2D?
Cheminformatics has produced powerful tools for drawing molecules, thereby spreading the use of standards over the years. With a software package like CDK, the Chemistry Development Kit [28], users can upload a SMILES or an InChI (see Introduction) file and obtain a picture of the corresponding molecule featuring atoms and bonds. Although format sharing is usual in chemical biology, it is less of a habit in carbohydrate chemistry and glycobiology. Several machine-readable formats have been proposed to account for the branched structures of glycans in a variety of ways, and the resulting multiplicity is such that toolboxes of converters have been developed over the years [29, 30]. GlycoCT has, however, emerged as a structure encoding format [31] shared by a wide range of glyco-databases and compatible with a broad collection of tools. Users can upload a GlycoCT file and obtain the corresponding SNFG cartoon. Structures are encoded with high precision in most cases, yet encoding can be challenged by unusual monosaccharides that are not included in the associated library, as happens, for example, in bacteria. This shortcoming triggered the development of WURCS⁹, the most recent of the published formats, a linear notation for representing carbohydrates for the Semantic Web [32]. WURCS is not as widespread as GlycoCT. GlycanBuilder and SugarSketcher are embedded in several databases as graphic query tools and accept at least GlycoCT files as input for generating SNFG-compliant structures. Note that DrawRINGS¹⁰ takes KEGG Chemical Function (KCF) and IUPAC formats to produce low-resolution images resembling, but not quite matching, SNFG cartoons. Exporting drawn structures in GlycoCT is also a common option.
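To illustrate what a shared, machine-readable encoding offers, the sketch below reads the residue (RES) and linkage (LIN) sections of a small GlycoCT condensed record into Python objects. It assumes only the simple layout used in the example record: numbered basetype lines such as `1b:b-dglc-HEX-1:5`, substituent lines such as `2s:n-acetyl`, and linkage lines such as `2:1o(4+1)3d`. Repeat units, underdetermined structures, and other GlycoCT sections are not handled. The class and function names are hypothetical, and the GlcNAc(b1-4)GlcNAc record was written for this example following our understanding of GlycoCT condensed, so it should be checked against a validated converter before reuse.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    kind: str  # "b" = basetype (monosaccharide), "s" = substituent
    name: str  # e.g. "b-dglc-HEX-1:5" or "n-acetyl"


@dataclass(frozen=True)
class Link:
    parent: int
    child: int
    parent_pos: str  # kept as text: may be "-1" (unknown) or "2|4" (alternatives)
    child_pos: str


RES_LINE = re.compile(r"^(\d+)([bs]):(\S+)$")
# e.g. "2:1o(4+1)3d": link id, parent id + type, (parent position + child position), child id + type
LIN_LINE = re.compile(r"^\d+:(\d+)[a-z]\(([-\d|]+)\+([-\d|]+)\)(\d+)[a-z]$")


def parse_glycoct(text: str):
    """Read the RES and LIN sections of a simple GlycoCT condensed record."""
    residues, links, section = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("RES", "LIN"):
            section = line
            continue
        if section == "RES" and (m := RES_LINE.match(line)):
            residues[int(m.group(1))] = Residue(m.group(2), m.group(3))
        elif section == "LIN" and (m := LIN_LINE.match(line)):
            links.append(Link(int(m.group(1)), int(m.group(4)), m.group(2), m.group(3)))
        else:
            raise ValueError(f"unsupported line in section {section}: {line!r}")
    return residues, links


# GlcNAc(b1-4)GlcNAc (chitobiose), written for this example.
CHITOBIOSE = """RES
1b:b-dglc-HEX-1:5
2s:n-acetyl
3b:b-dglc-HEX-1:5
4s:n-acetyl
LIN
1:1d(2+1)2n
2:1o(4+1)3d
3:3d(2+1)4n"""

if __name__ == "__main__":
    residues, links = parse_glycoct(CHITOBIOSE)
    for link in links:
        print(residues[link.parent].name, "->", residues[link.child].name, link)
```

Keeping linkage positions as text rather than integers is a deliberate choice: it preserves unknown (`-1`) and alternative (`2|4`) positions instead of silently discarding them.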
To conclude this section on 2D drawing, despite a growing consensus within each community (encompassing organic and carbohydrate chemistry, biochemistry, and glycobiology) on relying on shared standards, cross-talk between all of these disciplines is not yet fully established, although progress is being made.
8 http://glycam.org/tools/molecular-dynamics/oligosaccharide-builder/build-glycan?id=1.
9 Web3 Unique Representation of Carbohydrate Structures.
10 http://rings.t.soka.ac.jp/drawRINGS-js/index.html.
2.3 What Can Be Viewed in 3D?
For decades, the Protein Data Bank (PDB) [33] has been the reference resource for exploring the three-dimensional structures of proteins. The database also records interactions, for example by including details of bound molecules, whether a single metal ion or a glycan ligand. However, carbohydrate chemists identified approximate or even erroneous information regarding carbohydrates [34], which called for the implementation of correction and visualization tools [35]. The poorer quality of carbohydrate moieties compared with protein parts in PDB records was mainly due to an imbalance between protein and carbohydrate 3D structure validation tools. The GLYCOSCIENCES.de collection [36] has attempted over the years to restore the balance by providing a set of tools dedicated to producing a consistent 3D representation of protein-glycan interactions [37]. Independently of carbohydrates, 3D visualization tools have evolved significantly in the past decades, providing users with an increasing array of functionalities, as exemplified by the development of PyMol.¹¹ Until recently, one of the limiting factors was display speed, especially when zooming in or out or manipulating a large protein in a web interface. This issue was tackled in LiteMol [7].

Molecular dynamics is used to model glycan structures and simulate their spatial behavior. Very few tools are available for that purpose. GLYCAM-Web¹² and the tool suite hosted on the site are becoming increasingly popular, especially for having implemented a 3D version of the SNFG representation, as shown in the first frame of Fig. 1. In parallel, a software package using video game technology [38] is another simulator that has adopted a similar approach by drawing carbohydrates with an SNFG-compatible color scheme, as shown in the second frame of Fig. 1. Finally, Glycan Reader [39] also performs modeling from PDB data but does not particularly invest in designing 3D glycan images.
The latest development useful for visualizing a glycoconjugate is a LiteMol plugin that relies on the GLYCAM-Web 3D view of glycans [40] to decorate a PDB entry with carbohydrates (as included in the data on bound molecules). An example is shown in the third frame of Fig. 1 and is directly visible in the entry of human alpha-1-antitrypsin (P01009) in PDBe,¹³ where LiteMol is integrated.
11 https://pymol.org.
12 http://glycam.org.
13 https://www.ebi.ac.uk/pdbe/.
Fig. 1 Three examples of 3D images of carbohydrates in an SNFG-compatible color scheme generated with three different software tools named under each image
3 Interactive and Comparative
3.1 What Can Be Quantified?
Quantitative data are still relatively scarce in glycobiology, and they are of roughly two types that can be processed and visualized as profiles for comparison purposes. So far, the main effort has concerned glycan-binding data, following the development of array technology [41]. The second type is the measurement of glycan abundance in glycomes, but far fewer such data are produced.
3.2 Glycan-Binding Data
GlycomeAtlas [42] can be considered the first visualization tool for quantitative glycan-binding data. This software was specifically designed for exploring glycan profiles produced by the Consortium for Functional Glycomics (CFG) [19]. It displays information as clickable anatomic maps linking a location on the map with the corresponding set of expressed glycans. Although GlycomeAtlas can be used to navigate the CFG data easily, it can be neither extended nor run with other datasets. GlycoPattern [43], together with GlycanMotifMiner [44], represents another effort to visualize and mine CFG data. The implementation of dendrograms and heatmaps supports the detection of similar motifs in glycan-binding proteins and the comparison of several glycan array experiments. Both tools were designed and developed within the CFG to define a common way of storing, visualizing, and analyzing data for the consortium. A more generic version applicable to non-CFG arrays was proposed in MotifFinder and recently updated [45].

3.3 Glycan Abundance
At present, glycan expression in a tissue, in a cell, or on a glycoprotein (thereby characterized as a glycome) is revealed by chromatographic or mass spectrometric profiles. In this context, glycan compositions (i.e., unordered monosaccharide counts) are measured. Importantly, there is no consensus on the symbolic representation of compositions; for instance, the fucosylated N-core can be written as Hex:3HexNAc:2dHex:1, HexNAc(2)Hex(3)dHex(1), Hex3HexNAc2dHex1, H3N2F1, and so on. The human eye easily deals with these variations, but bioinformaticians keep discovering alternative notations in published papers that require software updates. In this chapter, for the sake of simplicity, we adopt the short notation: H for hexose, N for N-acetylhexosamine, F for fucose, S for N-acetylneuraminic acid, G for N-glycolylneuraminic acid, and s for sulfate.

In general, published articles contain descriptive tables and/or bar plots, but this information remains static and poorly comparable from one publication to another and even within a single article. Glycan expression studies are no exception, and this situation prompted the development of Glynsight [16], designed to visualize profiles from tables of quantified glycan compositions. Glynsight uses an innovative visualization to bring out patterns, within or across profiles, that are hidden in the many supplementary tables associated with publications. This web-based tool provides an upload function for spreadsheets containing expression data and allows the user to export each profile as an image or the whole collection as a JSON file. When published, a glycan profile consists of a list of independent compositions associated with corresponding intensities. These profiles are the input of Glynsight. The first step implemented in Glynsight is to bring out relationships between input compositions.
To that end, compositions are sorted by increasing size and arranged in columns, each listing all compositions with the same total number of monosaccharides. For instance, if the profile is an N-glycome, the first column contains the N-core H3N2 (5 monosaccharides in total), and the next contains H3N3, H3N2F1, H4N2, or H2N3F1 (6 monosaccharides in total), and so on. In this way, the implicit inclusion of H3N2 in H3N3 or H3N2F1 is made explicit. More generally, this presentation of the profile highlights the monosaccharides shared between compositions, so that Glynsight shows the relationships between compositions interactively. This allows the user to follow an increment in galactosylation, fucosylation, and so on.

The following example illustrates the time and effort saved when using Glynsight to capture data content at a glance. We selected [46] from a list of articles retrieved from PubMed when searching for quantitative studies in glycomics. In this article, Table I and Supplementary Table I show the averaged relative abundances of individual N-glycan structures in several cell lines. They contain over seventy rows and as many columns as there are cell lines. As indicated by the table titles, the rows characterize the quantified glycans, and the columns describe glycan composition (monosaccharide counts) and type (hybrid, complex, etc.), followed by the average relative abundance in each cell line. For instance, in row 51 of Supplementary Table I, columns 3 and 4 show the abundances of NeuAc2NeuGc1Hex6HexNAc5 (H6N5S2G1) as measured in the Y101 and Y202 cell lines, displayed as: Col3: 0.045 0.027. Col4: 0.095 0.036. Spotting differences among more than seventy values in a column requires some attention and may take the human eye a while. In particular, variations within and between these columns cannot be caught at a glance. In both cases, the added value of Glynsight is that this information becomes instantly discernible.

Glynsight can be used in two modes. In the Individual Display mode, Glynsight represents the content of one column of the tables mentioned above, that is, the percentages of the average relative glycan abundance in a single cell line. As shown in Fig. 2, the horizontal bars reflect these numbers.
As an example, the fucose tracker is switched on in Fig. 2 and shows the effect of an additional fucose on glycan expression. Differential Display is an adapted version of the Individual Display for comparing two glycan profiles. In this mode, the list of experiments in the left-side menu is duplicated so that the user can select one experiment in each of the two lists. Glynsight subtracts the intensities of each pair of identical compositions. The viewer shows red and blue bars whose height for each composition represents the result of the subtraction. For any given composition, if its intensity in the first experiment is higher than in the second, the bar is blue; in the opposite situation, the bar is red. We also introduced a distinction to spot compositions unique to an experiment. Brighter blue (resp. red) highlights differences arising from compositions unique to the first (resp. second) experiment, while toned-down blue (resp. red) shows differences arising from compositions expressed in both experiments but higher in the first (resp. second) one. Note that white and grey bars are still present with the same meaning as previously defined.

Fig. 2 Individual profile of osteoblast glycan expression as reported in [46] and visualized with Glynsight. Horizontal bars reflect expression values of each glycan composition: green (respectively, yellow) is above (respectively, below) threshold. Gray bars correspond to glycan compositions not seen in osteoblasts but possibly in other cell lines tested in the study. Compositions are grouped and ordered as a function of monosaccharide counts. Each vertical group contains the exact same number of monosaccharides. Each vertical group to the right (respectively, left) contains exactly one more (respectively, one fewer) monosaccharide. Individual monosaccharide changes can be tracked with buttons on the upper left. As an example, the tracker for fucose is on to follow fucosylation in glycan expression

Figure 3 illustrates the instant capture of similar or distinct cell line profiles. In the first frame, the variations are quite obvious, and in the second, the flat aspect of the chart indicates the close similarity of glycosylation in the two compared cell lines. This information was present in the published table [46] mentioned above, but not visible at a glance.

To link compositions with potential glycan structures, Glynsight searches the glycan structure database of the GlyConnect platform [15] for all reported glycans that match a particular composition. This process is transparent to the user and is initiated by clicking on a composition of interest or its corresponding bar in the main viewer. Potential structures are then visualized in the structure tab under the viewer (not shown here). Results can be filtered by type of glycan attachment to protein, species, and tissue. Each selection can be removed by clicking on the red cross in the corner of each image.
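The composition handling described in this section can be sketched in a few lines of Python. This is not Glynsight's code, whose internals are not described here; it is a hypothetical illustration (all names are ours) that (1) normalizes several of the composition notations quoted above into the chapter's short H/N/F/S/G/s notation, (2) groups compositions into columns of equal monosaccharide count and tests whether one composition extends another by exactly one monosaccharide, and (3) computes a Differential Display-style difference with the blue/red and bright/pale conventions. Reading dHex as fucose and not counting sulfate as a monosaccharide are our own simplifying assumptions.

```python
import re
from collections import defaultdict

ORDER = "HNFSGs"  # short notation of this chapter; "s" (sulfate) is not a monosaccharide
LONG = {"HexNAc": "N", "dHex": "F", "Hex": "H", "NeuAc": "S", "NeuGc": "G"}
# Alternation order matters: "HexNAc" and "dHex" must be tried before "Hex".
LONG_RE = re.compile(r"(HexNAc|dHex|Hex|NeuAc|NeuGc)[:(]?(\d+)\)?")
SHORT_RE = re.compile(r"([HNFSGs])(\d+)")


def normalize(text):
    """Rewrite e.g. 'HexNAc(2)Hex(3)dHex(1)' or 'Hex:3HexNAc:2dHex:1' as 'H3N2F1'."""
    compact = text.replace(" ", "")
    matches, names = list(LONG_RE.finditer(compact)), LONG
    if not matches:
        matches, names = list(SHORT_RE.finditer(compact)), {k: k for k in ORDER}
    # Every character must belong to a recognized token, otherwise reject the input.
    if not matches or sum(len(m.group(0)) for m in matches) != len(compact):
        raise ValueError(f"unrecognized composition: {text!r}")
    counts = defaultdict(int)
    for m in matches:
        counts[names[m.group(1)]] += int(m.group(2))
    return "".join(f"{k}{counts[k]}" for k in ORDER if counts[k])


def counts(short):
    """'H3N2F1' -> {'H': 3, 'N': 2, 'F': 1}."""
    return {k: int(v) for k, v in SHORT_RE.findall(short)}


def size(short):
    """Total number of monosaccharides (sulfate excluded)."""
    return sum(v for k, v in counts(short).items() if k != "s")


def columns(compositions):
    """Group compositions into columns of equal size, smallest first."""
    cols = defaultdict(list)
    for c in compositions:
        cols[size(c)].append(c)
    return {n: sorted(cols[n]) for n in sorted(cols)}


def adds_one(a, b):
    """True if b equals a plus exactly one monosaccharide (e.g. H3N2 -> H3N2F1)."""
    ca, cb = counts(a), counts(b)
    extra = {k: cb.get(k, 0) - ca.get(k, 0) for k in ORDER}
    return min(extra.values()) >= 0 and extra["s"] == 0 and sum(extra.values()) == 1


def differential(first, second):
    """Map each composition to (difference, colour, shade) between two profiles.

    Blue: higher in the first profile; red: higher in the second. Bright: the
    composition occurs in one profile only; pale: it occurs in both.
    """
    result = {}
    for comp in sorted(set(first) | set(second)):
        diff = first.get(comp, 0.0) - second.get(comp, 0.0)
        colour = "blue" if diff > 0 else "red" if diff < 0 else None
        shade = "pale" if comp in first and comp in second else "bright"
        result[comp] = (diff, colour, shade)
    return result
```

For example, `normalize("NeuAc2NeuGc1Hex6HexNAc5")` gives `H6N5S2G1`, the composition discussed above, and `adds_one("H3N2", "H3N2F1")` is true, which is the kind of one-step inclusion that the column layout makes visible.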
Fig. 3 Differential display of glycan expression profiles of cell lines as reported in [46] and visualized with Glynsight. In both frames, the height of a bar represents the difference between the expression of a specific composition in the first cell line and that of the same composition in the second cell line. Brighter blue (respectively, red) bars highlight differences arising from compositions unique to the first (respectively, second) cell line, while pale blue (respectively, red) bars show differences arising from compositions expressed in both cell lines but higher (respectively, lower) in the first cell line

3.4 What Can Be Modeled?
Quantifying glycan abundance in the glycome of a tissue, a cell, or a single glycoprotein logically leads to attempts to decipher the observed output through modeling and possibly simulation. A wealth of enzyme data is available in the CAZy database [47], which serves as a basis for reconstructing biosynthetic pathways. N-glycosylation models have been proposed [48] and have recently been updated in [49]. Other chapters in this volume focus on tools that model and simulate glycan biosynthesis, so this topic is only briefly mentioned here. Suffice it to say that two major web-based tools exist: the Pathway predictor [14], which specifically addresses the synthesis of eukaryotic N-glycans, and the O-glycologue [50], which addresses that of O-glycans. To our knowledge, no such tool exists to simulate glycan degradation.
4 Interactive and Integrated
One of the challenges of glycoinformatics is to bring together disconnected information. For example, in many cases, the structure of glycans has been solved independently of the protein(s) to which they were attached. Conversely, protein glycosites are mapped after removal of the glycans. Restoring these lost links requires integrating multiple sources of data.

In recent years, several approaches have been used to extensively map glycosites on glycoproteins. Some of the most systematic studies have generated large datasets covering N-sites [51] and O-sites [52]. The latter, and subsequent work from the same group, led to the GlycoDomainViewer [53], which integrates experimentally mapped O-sites and predicted N-sites with other amino acid sequence features such as domains or variants. However, most of this site information lacks knowledge of the attached full glycan structures. GlyConnect [15] fills this gap and integrates site information with details of glycan structures. In fact, the two resources complement each other. Both mainly cover human and mouse site mapping, reflecting the current bias in published experimental data: large-scale studies tend to be limited to mammalian species.

Figure 4 is a composite of GlycoDomainViewer and GlyConnect screenshots highlighting the summary information of GlycoDomainViewer (black frame) in contrast with the detailed coverage of GlyConnect (brown frame), when both show their respective content associated with human latent-transforming growth factor beta-binding protein 1 (Q14766). GlycoDomainViewer is centered on the amino acid sequence and maps positional information of sites and domains (upper part of figure); it is exhaustive on that particular protein. GlyConnect displays less information about Q14766 but offers comparative options by providing links to structures and checking whether each structure is also known to be linked to other proteins (lower part of figure). In other words, GlycoDomainViewer enables drilling down and investigating extended site-specific properties of a given protein (search in depth), while GlyConnect displays contextual information characterizing the protein and its glycosylation profile, which potentially overlaps with those of other proteins (search in breadth). The content and usage of GlyConnect are now illustrated with a list of questions that can be answered using dedicated interfaces.
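The “search in depth” and “search in breadth” views described above can be reduced to two simple operations on linked data: a join of glycosite records with glycan observations on the shared key (protein accession, residue position), and an inverse lookup from a glycan to every protein carrying it. The sketch below assumes a hypothetical record layout and uses placeholder accessions (PROT1, PROT2) and compositions; it does not query GlycoDomainViewer or GlyConnect, whose data models are not described in this chapter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    protein: str   # UniProt-style accession (placeholder values in the example)
    position: int  # residue number in the protein sequence
    kind: str      # "N" or "O"
    evidence: str  # e.g. "experimental" or "predicted"


def attach_glycans(sites, observations):
    """Search in depth: attach glycans to glycosites sharing (protein, position).

    `observations` holds (protein, position, glycan) tuples from a structure-centric
    source. Sites with no reported glycan map to an empty list.
    """
    by_site = defaultdict(list)
    for protein, position, glycan in observations:
        by_site[(protein, position)].append(glycan)
    return {s: by_site.get((s.protein, s.position), []) for s in sites}


def proteins_with(observations, glycan):
    """Search in breadth: every protein on which the given glycan was reported."""
    return sorted({protein for protein, _pos, g in observations if g == glycan})


if __name__ == "__main__":
    sites = [Site("PROT1", 35, "N", "predicted"), Site("PROT1", 210, "O", "experimental")]
    observations = [("PROT1", 35, "H5N4S2"), ("PROT1", 35, "H5N4F1S2"), ("PROT2", 80, "H5N4S2")]
    for site, glycans in attach_glycans(sites, observations).items():
        print(site, glycans or "no glycan reported")
    print(proteins_with(observations, "H5N4S2"))
```

Sites that end up with an empty list are exactly the gap described above: mapped glycosites whose attached glycans are unknown.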
Fig. 4 Summary information of human latent transforming growth factor beta binding protein 1 (Q14766) in GlycoDomainViewer (black frame) and GlyConnect (brown frame). GlycoDomainViewer is centered on the amino acid sequence and maps positional information of sites and domains. GlyConnect displays less information and offers comparative options through links to structures and checking whether each structure is known to be linked to other proteins

4.1 Which Proteins Carry N-Linked, Fucosylated, Disialylated, and Triantennary Glycans?
Entering the “Search” section of GlyConnect displays the options for querying the database. They appear as a series of blue buttons on the left side of the screen. To answer the question, the button of interest is labeled “N-linked.” This selection restricts the search to N-linked glycans, which can be further characterized via a list of possible features to refine the query. GlyConnect divides features into three categories: cores, structural properties, and determinants (or glyco-epitopes). This example targets structural properties, so the “Structural Properties” tab in the upper toolbar should be selected. The features corresponding to the question can then be selected one by one by clicking on the colored boxes labeled “Di-sialylated,” “Fucosylated (any),” and “Tri-antennary.” Each click on a property automatically moves the corresponding box to the frame below, which thus keeps a trace of the query (matching the question). Finally, the “Search” button launches the query.
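Conceptually, this query filters glycan records by attachment type and by a set of required structural properties, and the resulting octopus graph (Fig. 5) groups the hits by composition, with glycoproteins on one side and structures on the other. The minimal Python sketch below reproduces that logic on a hypothetical in-memory record layout: the `Record` fields, structure identifiers, and placeholder accessions are our assumptions, the property labels are simply the button names quoted above, and GlyConnect itself is not called.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    protein: str           # glycoprotein accession (placeholders in the example)
    composition: str       # short notation, e.g. "H6N5F1S2"
    structure: str         # identifier of a glycan structure record
    linkage: str           # "N" or "O"
    properties: frozenset  # structural properties, e.g. {"Di-sialylated", ...}


def octopus(records, linkage, wanted):
    """Filter records, then link each matching composition to proteins and structures.

    Returns {composition: (proteins, structures)}: proteins form the left side of
    the octopus and structures the right side.
    """
    proteins, structures = defaultdict(set), defaultdict(set)
    for r in records:
        if r.linkage == linkage and wanted <= r.properties:  # all wanted properties present
            proteins[r.composition].add(r.protein)
            structures[r.composition].add(r.structure)
    return {c: (sorted(proteins[c]), sorted(structures[c])) for c in sorted(proteins)}


if __name__ == "__main__":
    query = {"Di-sialylated", "Fucosylated (any)", "Tri-antennary"}
    records = [
        Record("PROT1", "H6N5F1S2", "s1", "N", frozenset(query)),
        Record("PROT2", "H6N5F1S2", "s2", "N", frozenset(query)),
        Record("PROT3", "H6N6F1S2", "s3", "N", frozenset(query)),
        Record("PROT4", "H5N4S2", "s4", "N", frozenset({"Di-sialylated", "Bi-antennary"})),
    ]
    for comp, (left, right) in octopus(records, "N", query).items():
        print(comp, left, right)
```

Using a subset test (`wanted <= r.properties`) means every selected property must be present, which mirrors how the query accumulates boxes in the frame.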
Fig. 5 Octopus representation of GlyConnect associating glycoproteins with their glycomes through glycan compositions. Parts A and B highlight the differences between two compositions. H6N5F1S2 is frequently recorded as attached to proteins in GlyConnect with a wide variety of glycans, in contrast with H6N6F1S2, which connects to only one record. Each right-side link of the octopus points to a record of a specific structure. Clicking opens a new window showing the corresponding structure and contextual information. In the case of the highly connected H6N5F1S2, one of the options is shown. It is a “classical” tri-antennary structure, in contrast to the unusual structure shown on the right-hand side
The results are displayed as shown in Fig. 5, in a graph connecting glycan structures to glycoproteins via a set of monosaccharide compositions.¹⁴ This graph was called the “octopus” in [15] for obvious reasons, and the name is used here as well. All of the listed compositions match the selected properties of the query: “Di-sialylated,” “Fucosylated (any),” and “Tri-antennary.” Each composition is then linked (a) on the right, to the corresponding glycan structures stored in the database, and (b) on the left, to the glycoproteins recorded in the database as carrying these glycans. Clicking on a specific composition colors its links in the octopus red. Figure 5 highlights the differences between two compositions (parts A and B). H6N5F1S2 is frequently recorded in GlyConnect as attached to proteins, with a wide variety of glycans, whereas H6N6F1S2 connects to only one record. Each right-side link of the octopus points to the record of a specific structure, and the effect of clicking on such a link is partly represented in Fig. 5: a new window opens, showing the corresponding structure and contextual information. For the highly connected H6N5F1S2, one of the options is shown. It is a “classical” triantennary structure, in contrast to the unusual structure shown on the right-hand side.

¹⁴ H stands for hexose, N for N-acetylhexosamine, F for fucose, S for sialic acid.

An Interactive View of Glycosylation

4.2 In which Tissue(s) Has a Structure Been Reported?
In the browser section of GlyConnect, clicking on “Structures” in the upper grey horizontal bar opens the complete list of glycan structures stored in GlyConnect. For each structure displayed, a series of clickable colored buttons is available on the right-hand side of the page. Clicking on the green “Tissue” button shows the list of tissues in which the glycan of interest was identified.
4.3 What Are the Known Structures Associated with a Disease?
In much the same way as the database can be explored by selecting “Structures” and checking the “Tissue” information, it is possible to click on “Diseases” in the upper grey bar to show the list of stored diseases. This list can be filtered or scrolled down for a term of choice. If, for example, “Leukemia, Acute myelogenous” is selected, the corresponding disease entry opens and shows the seven associated glycans. The complete list of glycans is available by clicking on the blue “Structures” button on the right-hand side of the page. Note that diseases are cross-linked to the Disease Ontology, where synonyms can be found.
4.4 How Abundant Is H5N4F2S1 in Human Plasma?
A large static table spread over several pages and containing integrated quantitative data on human plasma proteins and glycans was published in [54]. However, the underlying expression profiles are not easy to visualize or compare in this printed form. The Glynsight visualizer and selected protein and glycan entries of GlyConnect were therefore combined in a dedicated interface for exploring protein and glycan expression in human plasma, based on the table in [54]. This interface is composed of four synchronized panels. The first two display protein abundance and glycan abundance per protein, as shown in Fig. 6. The proteins listed reflect the selection made in the publication and are grouped in broad functional classes. Protein expression is represented as bubbles sized according to abundance. As expected, the largest bubble is labeled “Immunoglobulin G” (IgG). The glycan profile of each protein is visualized with Glynsight in the second panel (“Glycan abundance per protein”). Clicking on a bubble immediately updates this second panel to show the glycan profile of the chosen protein. In Fig. 6, two compositions are boxed (H5N4F1 and H5N4S2).

Fig. 6 First part of the plasma N-glycome visualization interface composed of four synchronized panels. This figure shows the first two panels, which display protein abundance and glycan abundance per protein. The proteins are grouped in broad functional classes and their expression is represented as bubbles sized according to their abundance. The glycan profile of each protein is visualized with Glynsight, as shown in the second panel (“Glycan abundance per protein”). Clicking on a bubble immediately updates that second panel to show the glycan profile of the chosen protein (here, Immunoglobulin G (IgG)). Two compositions are boxed (H5N4F1 and H5N4S2) to connect to the next panels in Fig. 7

As mentioned earlier, any of the green bars associated with a composition can be clicked to show all, or a filtered subset, of the GlyConnect glycan structures matching that composition. This functionality is illustrated in Fig. 7, where the other panels of the human plasma glycome interface are detailed. Single glycan abundance is shown in relation to all proteins listed in the first panel. The data for the two boxed compositions are shown in section A of Fig. 7. A clear difference emerges between H5N4F1 and H5N4S2: the former appears to be unique to IgG, while the latter is present on most human plasma glycoproteins. In section B of Fig. 7, data for another composition, H5N5F1S2, are plotted and reveal that it is unique to Immunoglobulin A in plasma. The profile for this protein is not shown, but clicking on the green bar representing H5N5F1S2 displays structural information in the fourth panel (“Glycan structures”).
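The “Single Glycan Abundance” panel essentially pivots the per-protein profiles: for one composition, it reads that composition's value across every protein, which is how a composition restricted to one protein stands apart from a widespread one. The sketch below shows that pivot on a small dictionary. The data layout, protein labels, and numbers are invented placeholders, not the data from [54] or the Glynsight implementation.

```python
def single_glycan_profile(table, composition):
    """Abundance of one composition on every protein (0.0 where it is not reported).

    `table` maps protein -> {composition: relative abundance}.
    """
    return {protein: profile.get(composition, 0.0) for protein, profile in table.items()}


def carriers(table, composition, min_abundance=0.0):
    """Proteins whose abundance for `composition` exceeds `min_abundance`, sorted by name."""
    profile = single_glycan_profile(table, composition)
    return sorted(protein for protein, value in profile.items() if value > min_abundance)


# Made-up relative abundances (percent of each protein's glycan profile), for illustration only.
table = {
    "protein_A": {"H5N4F1": 20.0, "H5N4S2": 30.0},
    "protein_B": {"H5N4S2": 55.0},
    "protein_C": {"H5N4S2": 40.0, "H5N5F1S2": 5.0},
}

for composition in ("H5N4F1", "H5N4S2", "H5N5F1S2"):
    found = carriers(table, composition)
    label = "restricted to one protein" if len(found) == 1 else f"found on {len(found)} proteins"
    print(composition, found, label)
```

The `min_abundance` threshold is a design choice of this sketch; raising it hides trace-level observations when deciding whether a composition is protein-specific.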
Fig. 7 Second part of the plasma N-glycome visualization interface, where single glycan abundance is shown in relation to all proteins listed in the first panel (see first part in Fig. 6). The data for the two boxed compositions are shown in section A. H5N4F1 (upper left panel) appears to be unique to IgG, while H5N4S2 (lower left panel) is present on most human plasma glycoproteins. In section B, data for another composition, H5N5F1S2, are plotted and reveal that it is unique to Immunoglobulin A in plasma (upper right panel). H5N5F1S2 relates to three human glycan structures stored in GlyConnect (lower right panel)
H5N5F1S2 relates to three human glycan structures stored in GlyConnect. The corresponding structure pages can be opened by clicking on any of the cartoons. In summary, clicking on a bubble updates the “Glycan abundance per protein” panel, where a profile is shown; clicking on a green bar in that profile then triggers two further updates for the corresponding composition, in the “Single Glycan Abundance” and “Glycan Structures” panels.

4.5 Which Site-Specific Compositions Match an Unassigned Released Structure?
Experimental set-ups for investigating glycosylation have evolved significantly over the past decades. Very roughly put, many glycan structures were solved in the twentieth century with variable precision but with poor information on their attachment sites on glycoproteins, an approach now defined as glycomics. In contrast, large-scale glycoproteomics studies of the twenty-first century provide site-specific information but poor resolution of glycan structures (usually only the glycan composition). It is therefore useful to reconcile glycomics and glycoproteomics data, especially for abundant glycoproteins of commonly studied tissues such as human plasma.
Fig. 8 Colored buttons with corresponding data types and numbers summarize information for Human Beta-2-glycoprotein 1 (aka Apolipoprotein H or APOH) in GlyConnect. In the 2019 release of the database, seven defined structures and 35 compositions are associated with five glycosites (including “undefined” when the structure is not assigned to a site). One such structure is shown. Clicking on the “Reported glycosite” button reveals Asn-253, while clicking on the “Suggested glycosite” button lists two extra sites on the basis of the corresponding composition (H5N4S1) shown in the other frame of the figure. H5N4S1 is reported on three sites and matches the shown structure. Consequently, two alternative sites (Asn-107 and Asn-162) may also carry the same structure. Asn-107 only occurs in amino acid sequences that carry the S→N variant recorded as rs1801692 in dbSNP
Human Beta-2-glycoprotein 1 (aka Apolipoprotein H or APOH) is one such glycoprotein, identified and characterized in many studies. GlyConnect stores nine references in which APOH and its glycans are identified, and this information is shown on the corresponding protein page. As illustrated in Fig. 8, colored buttons on that page summarize the information relative to Human Beta-2-glycoprotein 1. In particular, seven defined structures and 35 compositions are associated with five glycosites (including “undefined” when the structure is not assigned to a site). One such structure (only partially solved) is shown. When expanded, the “Reported glycosite” button shows that this structure is attached to Asn-253. The “Suggested glycosite” button, also expanded upon clicking, suggests two more possible sites on the basis of the corresponding composition shown in the other frame of the figure. In high-throughput glycoproteomics experiments, H5N4S1 is reported on three sites. Since this composition matches the shown structure, the two alternative sites (Asn-107 and Asn-162) may also carry the same structure. Asn-107 is an interesting case, since it only occurs in amino acid sequences that carry the S→N variant recorded as rs1801692 in dbSNP.¹⁵ No structure other than the one shown in the figure is actually found in association with Human Beta-2-glycoprotein 1 in the GlyConnect database. GlyConnect therefore helps bridge structural and site-specific information via compositions.
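The reasoning behind the “Suggested glycosite” button can be expressed as a set operation: take the sites where the structure's composition has been reported in glycoproteomics data, then remove the sites already linked to the structure. The following is a minimal sketch of that idea under assumed inputs. The function name and data layout are hypothetical, each site's composition set is reduced to what the example needs, and the site `site_X` is a made-up placeholder; GlyConnect's actual implementation may differ.

```python
def suggest_sites(structure_composition, site_compositions, reported_sites):
    """Sites where the structure's composition is reported but the structure itself is not.

    site_compositions: site -> set of compositions observed at that site.
    reported_sites: sites already linked to the structure.
    """
    candidates = {
        site for site, compositions in site_compositions.items()
        if structure_composition in compositions
    }
    return sorted(candidates - set(reported_sites))


# The three H5N4S1 sites follow the APOH example in the text; "site_X" is hypothetical.
site_compositions = {
    "Asn-107": {"H5N4S1"},
    "Asn-162": {"H5N4S1"},
    "Asn-253": {"H5N4S1"},
    "site_X": {"H5N4S2"},
}
print(suggest_sites("H5N4S1", site_compositions, {"Asn-253"}))  # ['Asn-107', 'Asn-162']
```

A composition match only makes a site a candidate: several distinct structures can share one composition, so the suggestions remain hypotheses to be confirmed experimentally.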
5 Conclusion

Resources reviewed in this chapter emphasize past and current efforts in designing interfaces that are useful for understanding and exploring glycodata. Access to structural and functional information characterizing glycans still needs to widen to guarantee the inclusion of these molecules in a larger picture of biology, and this inclusion is unfortunately still missing. User-friendly, interactive, and instructive visualization tools are a necessary but not sufficient condition for reaching this goal. The above also underlines the importance of consensus on common encodings that facilitate cross-discipline communication. As glycodata accumulate, this is a prerequisite for bridging well-established bioinformatics and still-scattered glycoinformatics. It seems vital to build strong links with key resources that have evolved over decades, such as UniProt, PDB, or CAZy. As a last comment, D3.js, the JavaScript library used extensively in our resources as in many other bioinformatics resources, should be acknowledged as key to making visualization tools aesthetic, practical, and pleasantly interactive.
Acknowledgments

This work was initially financed by the European Union FP7 GastricGlycoExplorer ITN (No 316929) and the Swiss Federal Government through the State Secretariat for Education, Research and Innovation SERI. It is currently supported by the Swiss National Science Foundation (SNSF) (grant 31003A_179249). ExPASy is maintained by the web team of the SIB Swiss Institute of Bioinformatics and hosted at the Vital-IT Competency Center.
¹⁵ https://www.ncbi.nlm.nih.gov/snp/
References

1. Lazebnik Y (2002) Can a biologist fix a radio?-or, what I learned while studying apoptosis. Cancer Cell 2:179–182
2. Novère NL, Hucka M, Mi H, Moodie S, Schreiber F, Sorokin A, Demir E, Wegner K, Aladjem MI, Wimalaratne SM, Bergman FT, Gauges R, Ghazal P, Kawaji H, Li L, Matsuoka Y, Villéger A, Boyd SE, Calzone L, Courtot M, Dogrusoz U, Freeman TC, Funahashi A, Ghosh S, Jouraku A, Kim S, Kolpakov F, Luna A, Sahle S, Schmidt E, Watterson S, Wu G, Goryanin I, Kell DB, Sander C, Sauro H, Snoep JL, Kohn K, Kitano H (2009) The systems biology graphical notation. Nat Biotechnol 27:735–741. https://doi.org/10.1038/nbt.1558
3. Allende C, Sohn E, Little C (2015) Treelink: data integration, clustering and visualization of phylogenetic trees. BMC Bioinformatics 16:414. https://doi.org/10.1186/s12859-015-0860-1
4. Eisen MB, Spellman PT, Brown PO, Botstein D (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci 95:14863–14868. https://doi.org/10.1073/pnas.95.25.14863
5. Down TA, Piipari M, Hubbard TJP (2011) Dalliance: interactive genome viewing on the web. Bioinformatics 27:889–890. https://doi.org/10.1093/bioinformatics/btr020
6. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13(11):2498–2504. https://doi.org/10.1101/gr.1239303
7. Sehnal D, Deshpande M, Vařeková RS, Mir S, Berka K, Midlik A, Pravda L, Velankar S, Koča J (2017) LiteMol suite: interactive web-based visualization of large-scale macromolecular structure data. Nat Methods 14:1121–1122. https://doi.org/10.1038/nmeth.4499
8. Varki A, Cummings RD, Esko JD, Stanley P, Hart GW, Aebi M, Darvill AG, Kinoshita T, Packer NH, Prestegard JH, Schnaar RL, Seeberger PH (2015) Essentials of glycobiology, 3rd edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY
9. Varki A, Cummings RD, Esko JD, Freeze HH, Stanley P, Marth JD, Bertozzi CR, Hart GW, Etzler ME (2009) Symbol nomenclature for glycan representation. Proteomics 9(24):5398–5399. https://doi.org/10.1002/pmic.200900708
10. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, Stanley P, Hart G, Darvill A, Kinoshita T, Prestegard JJ, Schnaar RL, Freeze HH, Marth JD, Bertozzi CR, Etzler ME, Frank M, Vliegenthart JF, Lütteke T, Perez S, Bolton E, Rudd P, Paulson J, Kanehisa M, Toukach P, Aoki-Kinoshita KF, Dell A, Narimatsu H, York W, Taniguchi N, Kornfeld S (2015) Symbol nomenclature for graphical representations of glycans. Glycobiology 25:1323–1324. https://doi.org/10.1093/glycob/cwv091
11. Neelamegham S, Aoki-Kinoshita K, Bolton E, Frank M, Lisacek F, Lütteke T, O’Boyle N, Packer NH, Stanley P, Toukach P, Varki A, Woods RJ (2019) SNFG discussion group. Updates to the symbol nomenclature for glycans guidelines. Glycobiology 29(9):620–624. https://doi.org/10.1093/glycob/cwz045
12. Tiemeyer M, Aoki K, Paulson J, Cummings RD, York WS, Karlsson NG, Lisacek F, Packer NH, Campbell MP, Aoki NP, Fujita A, Matsubara M, Shinmachi D, Tsuchiya S, Yamada I, Pierce M, Ranzinger R, Narimatsu H, Aoki-Kinoshita KF (2017) GlyTouCan: an accessible glycan structure repository. Glycobiology 27:915–919. https://doi.org/10.1093/glycob/cwx066
13. Alocci D, Suchánková P, Costa R, Hory N, Mariethoz J, Vařeková R, Toukach P, Lisacek F (2018) SugarSketcher: quick and intuitive online glycan drawing. Molecules 23:3206. https://doi.org/10.3390/molecules23123206
14. Aoki-Kinoshita KF (2015) Analyzing glycan structure synthesis with the glycan pathway predictor (GPP) tool. Methods Mol Biol 1273:139–147. https://doi.org/10.1007/978-1-4939-2343-4_10
15. Alocci D, Mariethoz J, Gastaldello A, Gasteiger E, Karlsson NG, Kolarich D, Packer NH, Lisacek F (2019) GlyConnect: glycoproteomics goes visual, interactive, and analytical. J Proteome Res 18:664–677. https://doi.org/10.1021/acs.jproteome.8b00766
16. Alocci D, Ghraichy M, Barletta E, Gastaldello A, Mariethoz J, Lisacek F (2018) Understanding the glycome: an interactive view of glycosylation from glycocompositions to glycoepitopes. Glycobiology 28:349–362. https://doi.org/10.1093/glycob/cwy019
17. Cooper CA, Joshi HJ, Harrison MJ, Wilkins MR, Packer NH (2003) GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. 2003 update. Nucleic Acids Res 31:511–513
18. Hashimoto K, Goto S, Kawano S, Aoki-Kinoshita KF, Ueda N, Hamajima M, Kawasaki T, Kanehisa M (2006) KEGG as a glycome informatics resource. Glycobiology 16:63R–70R. https://doi.org/10.1093/glycob/cwj010
19. Raman R, Venkataraman M, Ramakrishnan S, Lang W, Raguram S, Sasisekharan R (2006) Advancing glycomics: implementation strategies at the consortium for functional Glycomics. Glycobiology 16:82R–90R. https://doi.org/10.1093/glycob/cwj080
20. Ceroni A, Dell A, Haslam SM (2007) The GlycanBuilder: a fast, intuitive and flexible software tool for building and displaying glycan structures. Source Code Biol Med 2:3. https://doi.org/10.1186/1751-0473-2-3
21. Joshi HJ, von der Lieth C-W, Packer NH, Wilkins MR (2010) GlycoViewer: a tool for visual summary and comparative analysis of the glycome. Nucleic Acids Res 38:W667–W670. https://doi.org/10.1093/nar/gkq446
22. von der Lieth C-W, Freire AA, Blank D, Campbell MP, Ceroni A, Damerell DR, Dell A, Dwek RA, Ernst B, Fogh R, Frank M, Geyer H, Geyer R, Harrison MJ, Henrick K, Herget S, Hull WE, Ionides J, Joshi HJ, Kamerling JP, Leeflang BR, Lütteke T, Lundborg M, Maass K, Merry A, Ranzinger R, Rosen J, Royle L, Rudd PM, Schloissnig S, Stenutz R, Vranken WF, Widmalm G, Haslam SM (2011) EUROCarbDB: an open-access platform for glycoinformatics. Glycobiology 21:493–502. https://doi.org/10.1093/glycob/cwq188
23. Tsuchiya S, Aoki NP, Shinmachi D, Matsubara M, Yamada I, Aoki-Kinoshita KF, Narimatsu H (2017) Implementation of GlycanBuilder to draw a wide variety of ambiguous glycans. Carbohydr Res 445:104–116. https://doi.org/10.1016/j.carres.2017.04.015
24. Harvey DJ, Merry AH, Royle L, Campbell MP, Dwek RA, Rudd PM (2009) Proposal for a standard system for drawing structural diagrams of N- and O-linked carbohydrates and related compounds. Proteomics 9:3796–3801. https://doi.org/10.1002/pmic.200900096
25. Engelsen SB, Hansen PI, Pérez S (2014) POLYS 2.0: an open source software package for building three-dimensional structures of polysaccharides: the POLYSaccharide builder 2.0. Biopolymers 101:733–743. https://doi.org/10.1002/bip.22449
26. Pérez S, Sarkar A, Rivet A, Breton C, Imberty A (2015) Glyco3D: a portal for structural glycosciences. In: Lütteke T, Frank M (eds) Glycoinformatics. Springer New York, New York, NY, pp 241–258
27. Cheng K, Zhou Y, Neelamegham S (2016) DrawGlycan-SNFG: a robust tool to render glycans and glycopeptides with fragmentation information. Glycobiology 27:200. https://doi.org/10.1093/glycob/cww115
28. Steinbeck C, Han Y, Kuhn S, Horlacher O, Luttmann E, Willighagen E (2003) The chemistry development kit (CDK): an open-source Java library for chemo- and bioinformatics. J Chem Inf Comput Sci 43:493–500. https://doi.org/10.1021/ci025584y
29. Campbell MP, Ranzinger R, Lütteke T, Mariethoz J, Hayes CA, Zhang J, Akune Y, Aoki-Kinoshita KF, Damerell D, Carta G, York WS, Haslam SM, Narimatsu H, Rudd PM, Karlsson NG, Packer NH, Lisacek F (2014) Toolboxes for a standardised and systematic study of glycans. BMC Bioinformatics 15:S9. https://doi.org/10.1186/1471-2105-15-S1-S9
30. Tsuchiya S, Yamada I, Aoki-Kinoshita KF (2018) GlycanFormatConverter: a conversion tool for translating the complexities of glycans. Bioinformatics 35:2434. https://doi.org/10.1093/bioinformatics/bty990
31. Herget S, Ranzinger R, Maass K, Lieth C-WVD (2008) GlycoCT-a unifying sequence format for carbohydrates. Carbohydr Res 343:2162–2171. https://doi.org/10.1016/j.carres.2008.03.011
32. Tanaka K, Aoki-Kinoshita KF, Kotera M, Sawaki H, Tsuchiya S, Fujita N, Shikanai T, Kato M, Kawano S, Yamada I, Narimatsu H (2014) WURCS: the Web3 unique representation of carbohydrate structures. J Chem Inf Model 54:1558–1566. https://doi.org/10.1021/ci400571e
33. wwPDB consortium, Burley SK, Berman HM, Bhikadiya C, Bi C, Chen L, Costanzo LD, Christie C, Duarte JM, Dutta S, Feng Z, Ghosh S, Goodsell DS, Green RK, Guranovic V, Guzenko D, Hudson BP, Liang Y, Lowe R, Peisach E, Periskova I, Randle C, Rose A, Sekharan M, Shao C, Tao Y-P, Valasatava Y, Voigt M, Westbrook J, Young J, Zardecki C, Zhuravleva M, Kurisu G, Nakamura H, Kengaku Y, Cho H, Sato J, Kim JY, Ikegawa Y, Nakagawa A, Yamashita R, Kudou T, Bekker G-J, Suzuki H, Iwata T, Yokochi M, Kobayashi N, Fujiwara T, Velankar S, Kleywegt GJ, Anyango S, Armstrong DR, Berrisford JM, Conroy MJ, Dana JM, Deshpande M, Gane P, Gáborová R, Gupta D, Gutmanas A, Koča J, Mak L, Mir S, Mukhopadhyay A, Nadzirin N, Nair S, Patwardhan A, Paysan-Lafosse T, Pravda L, Salih O, Sehnal D, Varadi M, Vařeková R, Markley JL, Hoch JC, Romero PR, Baskaran K, Maziuk D, Ulrich EL, Wedell JR, Yao H, Livny M, Ioannidis YE (2019) Protein data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res 47:D520–D528. https://doi.org/10.1093/nar/gky949
34. Lütteke T (2009) Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr D Biol Crystallogr 65:156–168. https://doi.org/10.1107/S0907444909001905
35. Emsley P, Brunger AT, Lütteke T (2015) Tools to assist determination and validation of carbohydrate 3D structure data. In: Lütteke T, Frank M (eds) Glycoinformatics. Springer New York, New York, NY, pp 229–240
36. Lütteke T (2006) GLYCOSCIENCES.de: an internet portal to support glycomics and glycobiology research. Glycobiology 16:71R–81R. https://doi.org/10.1093/glycob/cwj049
37. Böhm M, Bohne-Lang A, Frank M, Loss A, Rojas-Macias MA, Lütteke T (2019) Glycosciences.DB: an annotated data collection linking glycomics and proteomics data (2018 update). Nucleic Acids Res 47:D1195–D1201. https://doi.org/10.1093/nar/gky994
38. Perez S, Tubiana T, Imberty A, Baaden M (2015) Three-dimensional representations of complex carbohydrates and polysaccharides-SweetUnityMol: a video game-based computer graphic software. Glycobiology 25:483–491. https://doi.org/10.1093/glycob/cwu133
39. Park S-J, Lee J, Patel DS, Ma H, Lee HS, Jo S, Im W (2017) Glycan reader is improved to recognize most sugar types and chemical modifications in the protein data Bank. Bioinformatics 33:3051–3057. https://doi.org/10.1093/bioinformatics/btx358
40. Sehnal D, Grant OC (2019) Rapidly display glycan symbols in 3D structures: 3D-SNFG in LiteMol. J Proteome Res 18:770–774. https://doi.org/10.1021/acs.jproteome.8b00473
41. Liu Y, Palma AS, Feizi T (2009) Carbohydrate microarrays: key developments in glycobiology. Biol Chem 390:647. https://doi.org/10.1515/BC.2009.071
42. Konishi Y, Aoki-Kinoshita KF (2012) The GlycomeAtlas tool for visualizing and querying glycome data. Bioinformatics 28:2849–2850. https://doi.org/10.1093/bioinformatics/bts516
43. Agravat SB, Saltz JH, Cummings RD, Smith DF (2014) GlycoPattern: a web platform for glycan array mining. Bioinformatics 30(23):3417–3418. https://doi.org/10.1093/bioinformatics/btu559
44. Cholleti SR, Agravat S, Morris T, Saltz JH, Song X, Cummings RD, Smith (2012) Automated motif discovery from glycan array data. OMICS 16(10):497–512. https://doi.org/10.1089/omi.2012.0013
45. Klamer Z, Staal B, Prudden AR, Liu L, Smith DF, Boons G-J, Haab B (2017) Mining high-complexity motifs in glycans: a new language to uncover the fine specificities of lectins and glycosidases. Anal Chem 89:12342–12350. https://doi.org/10.1021/acs.analchem.7b04293
46. Wilson KM, Thomas-Oates JE, Genever PG, Ungar D (2016) Glycan profiling shows unvaried N-Glycomes in MSC clones with distinct differentiation potentials. Front Cell Dev Biol 4:52. https://doi.org/10.3389/fcell.2016.00052
47. Lombard V, Golaconda Ramulu H, Drula E, Coutinho PM, Henrissat B (2014) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495. https://doi.org/10.1093/nar/gkt1178
48. Krambeck FJ, Betenbaugh MJ (2005) A mathematical model of N-linked glycosylation. Biotechnol Bioeng 92:711–728. https://doi.org/10.1002/bit.20645
49. Krambeck FJ, Bennun SV, Andersen MR, Betenbaugh MJ (2017) Model-based analysis of N-glycosylation in Chinese hamster ovary cells. PLoS One 12:e0175376. https://doi.org/10.1371/journal.pone.0175376
50. McDonald AG, Tipton KF, Davey GP (2016) A knowledge-based system for display and prediction of O-glycosylation network behaviour in response to enzyme knockouts. PLoS Comput Biol 12:e1004844. https://doi.org/10.1371/journal.pcbi.1004844
51. Wollscheid B, Bausch-Fluck D, Henderson C, O’Brien R, Bibel M, Schiess R, Aebersold R, Watts JD (2009) Mass-spectrometric identification and relative quantification of N-linked cell surface glycoproteins. Nat Biotechnol 27:378–386. https://doi.org/10.1038/nbt.1532
52. Steentoft C, Vakhrushev SY, Joshi HJ, Kong Y, Vester-Christensen MB, Schjoldager KT-BG, Lavrsen K, Dabelsteen S, Pedersen NB, Marcos-Silva L, Gupta R, Paul Bennett E, Mandel U, Brunak S, Wandall HH, Levery SB, Clausen H (2013) Precision mapping of the human O-GalNAc glycoproteome through SimpleCell technology. EMBO J 32:1478–1488. https://doi.org/10.1038/emboj.2013.79
53. Joshi HJ, Jørgensen A, Schjoldager KT, Halim A, Dworkin LA, Steentoft C, Wandall HH, Clausen H, Vakhrushev SY (2018) GlycoDomainViewer: a bioinformatics tool for contextual exploration of glycoproteomes. Glycobiology 28:131–136. https://doi.org/10.1093/glycob/cwx104
54. Clerc F, Reiding KR, Jansen BC, Kammeijer GSM, Bondt A, Wuhrer M (2016) Human plasma protein N-glycosylation. Glycoconj J 33:309–343. https://doi.org/10.1007/s10719-015-9626-2
Chapter 4

Characterization and Analysis of Food-Sourced Carbohydrates

Leonie J. Kiely and Rita M. Hickey

Abstract

Food carbohydrates are macronutrients found in fruits, grains, vegetables, and milk products. These organic compounds, composed of carbon, hydrogen, and oxygen, are present in foods in the form of sugars, starches, and fibers. These wide-ranging molecules can be classified according to their chemical structure into three major groups: low molecular weight mono- and disaccharides, intermediate molecular weight oligosaccharides, and high molecular weight polysaccharides. Notably, the digestibility of specific carbohydrate components differs, and nondigestible carbohydrates can reach the large intestine intact, where they act as food sources for beneficial bacteria. In this review, we give an overview of advances made in food carbohydrate analysis. Overall, this review indicates the importance of carbohydrate analytical techniques in the quest to identify and isolate health-promoting carbohydrates for use as additives in the functional foods industry.

Key words Oligosaccharides, Chromatography, Hydrolysis, Derivatization, NMR, Mass spectrometry

1 Introduction

Carbohydrates are among the most important ingredients in foods and raw materials and are composed of carbon, hydrogen, and oxygen. These molecules constitute the main source of energy in the human diet, accounting for 40–80% of total energy intake [1]. Carbohydrates also impart crucial textural properties to foods and have many physiologically beneficial effects. Based on molecular structure, carbohydrates are classified according to their degree of polymerization into monosaccharides, disaccharides, oligosaccharides, and polysaccharides [2]. Their digestibility differs: monosaccharides, disaccharides, starch, and glycogen are digestible to varying degrees in the mouth and small intestine, whereas oligosaccharides, resistant starch, and nonstarch polysaccharides resist digestion [1]. Nondigestible carbohydrates can reach the large intestine relatively intact, where they are fermented to varying degrees by gut microbes.

Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_4, © Springer Science+Business Media, LLC, part of Springer Nature 2022

Monosaccharides are the simplest form of carbohydrates, consisting of a single sugar unit or saccharide [3]. Importantly, monosaccharides are the only carbohydrates that can be absorbed from the small intestine. Glucose and fructose are two of the most common monosaccharides; they are released by hydrolysis of the disaccharide sucrose and occur naturally in honey and fruit. Glucose and fructose have the same molecular weight, but their structural differences mean that they are absorbed and metabolized differently in the body, conferring different functions [4]. In the fed state, fructose is mainly used in de novo lipogenesis (the synthesis of lipids by the liver from dietary sources, usually carbohydrates), whereas glucose is directed toward glycogenesis, or glycogen synthesis [3]. Other minor monosaccharides found in foods include galactose, mannose, xylose, and arabinose. Chemically, monosaccharides are chiral polyhydroxylated aldehydes or ketones containing two or more hydroxyl groups, and they can be classified according to three characteristics: the location of the carbonyl group, the number of carbon atoms, and their chirality [5]. A monosaccharide is known as an aldose if its carbonyl group is an aldehyde and as a ketose if it is a ketone. Monosaccharides are also named according to the number of carbons present; for example, the smallest monosaccharides, which contain three carbons, are known as trioses, while monosaccharides containing four carbon atoms are referred to as tetroses [1]. Disaccharides are simple structures composed of two monosaccharides linked via a covalent glycosidic bond; the most common disaccharides include sucrose, maltose, and lactose, which occur naturally in foods such as honey, maple sugar, fruits, berries, malt, and milk [5].
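The two naming rules above (carbonyl type, then number of carbons) combine into compound class names such as aldohexose or ketohexose. The sketch below encodes them for three to seven carbons; the function and its lookup table are illustrative, and the third criterion, chirality (D/L configuration), is not modeled.

```python
# Carbon-count prefixes for common monosaccharide sizes.
CARBON_PREFIXES = {3: "tri", 4: "tetr", 5: "pent", 6: "hex", 7: "hept"}


def monosaccharide_class(carbonyl, n_carbons):
    """Combine carbonyl type and carbon count into a class name such as 'aldohexose'."""
    if carbonyl == "aldehyde":
        stem = "aldo"
    elif carbonyl == "ketone":
        stem = "keto"
    else:
        raise ValueError("carbonyl must be 'aldehyde' or 'ketone'")
    if n_carbons not in CARBON_PREFIXES:
        raise ValueError(f"no prefix defined for {n_carbons} carbons")
    return f"{stem}{CARBON_PREFIXES[n_carbons]}ose"


# Glucose carries an aldehyde and six carbons; fructose carries a ketone and six carbons.
print(monosaccharide_class("aldehyde", 6))  # aldohexose
print(monosaccharide_class("ketone", 6))    # ketohexose
print(monosaccharide_class("aldehyde", 3))  # aldotriose
```

By the same rules, xylose and arabinose (five carbons, aldehyde group) are aldopentoses.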
The nonreducing sugar sucrose, composed of glucose and fructose, is the most common sugar in the human diet and is responsible for up to 14% of total energy intake [6]. The disaccharide maltose is formed from two glucose units linked via an α(1→4) glycosidic bond, which can be broken down by the enzyme maltase to release the glucose units [7]. The reducing sugar lactose, present in milk and milk products, serves as an energy source for infants and can be hydrolyzed by the enzyme β-galactosidase (also known as lactase) into its monosaccharide components, galactose and glucose.

Oligosaccharides are carbohydrates containing three to ten monosaccharide residues, covalently linked by glycosidic bonds formed by glycosyltransferase enzymes between the anomeric carbon of one monosaccharide and a hydroxyl group of another [8]. Because they resist digestion, the majority of these oligosaccharides remain intact until they reach the large intestine, where they can be fermented by colonic microorganisms, selectively stimulating the growth and activity of beneficial bacteria in the colon [6]. Common oligosaccharides include fructooligosaccharides, galactooligosaccharides, xylooligosaccharides, and milk oligosaccharides, many of which are associated with health benefits such as prebiotic activity [9], reduction of serum cholesterol levels, reduced risk of developing colon cancer [10], enhanced mineral absorption, and improved immune function [11].

Polysaccharides are chemically diverse, abundant natural polymers found throughout nature in plants, animals, and microorganisms. Glycan is a general term for polysaccharides in which large numbers of monosaccharides are naturally joined by O-glycosidic linkages. These high molecular weight structures can be linear or branched [12] and can generally be represented by the formula $\mathrm{C}_x(\mathrm{H_2O})_y$, with x being a large number between 200 and 2500 [13]. Polysaccharides are renowned for their nutritive value as well as associated health benefits, such as the promotion of both immune and digestive function and a role in detoxification [14]. Polysaccharides constitute a major energy source and are also involved in the storage of this energy. The most abundant polysaccharides are starch and cellulose, which are both long-chain homopolymers of (1→4)-linked D-glucose units, with α- and β-glycosidic linkages, respectively. Cellulose is an insoluble structural polymer found in plant cell walls and in materials such as wood and cotton, and it is indigestible by human enzymes [15]. Starch, on the other hand, is an easily digestible material with no significant structural use [8]. Other complex polysaccharides, or heteroglycans, also exist and can be composed of sugars other than glucose, such as galactose, mannose, and xylose [16]. These complex structures can be composed of branched chains, often containing substituents and functional groups [17].
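The general formula $\mathrm{C}_x(\mathrm{H_2O})_y$ can be made concrete for homopolymers of hexoses such as glucans: each hexose is $\mathrm{C}_6(\mathrm{H_2O})_6$, and each of the $n-1$ glycosidic bonds in a chain of $n$ residues releases one water, so $x = 6n$ and $y = 5n + 1$. The sketch below derives x and y from the degree of polymerization (DP), computes an approximate molar mass from average atomic masses, and assigns the size class using the three-to-ten-residue oligosaccharide range stated above. It assumes unsubstituted, non-cyclic hexose chains; substituents, cyclic oligosaccharides, or non-hexose residues would change the arithmetic, and the function names are illustrative.

```python
# Average atomic masses in g/mol.
CARBON = 12.011
WATER = 2 * 1.008 + 15.999  # about 18.015 g/mol


def hexose_polymer_formula(dp):
    """Return (x, y) in Cx(H2O)y for a non-cyclic hexose chain of dp residues.

    Each hexose is C6(H2O)6 and each of the dp - 1 glycosidic bonds releases
    one water, so x = 6 * dp and y = 6 * dp - (dp - 1) = 5 * dp + 1.
    """
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    return 6 * dp, 5 * dp + 1


def molar_mass(x, y):
    """Approximate molar mass (g/mol) of a compound with formula Cx(H2O)y."""
    return x * CARBON + y * WATER


def class_by_dp(dp):
    """Size class by degree of polymerization (oligosaccharides: 3-10 residues)."""
    if dp < 1:
        raise ValueError("degree of polymerization must be at least 1")
    if dp == 1:
        return "monosaccharide"
    if dp == 2:
        return "disaccharide"
    if dp <= 10:
        return "oligosaccharide"
    return "polysaccharide"


for dp in (1, 2, 10, 100):
    x, y = hexose_polymer_formula(dp)
    print(dp, class_by_dp(dp), f"C{x}(H2O){y}", round(molar_mass(x, y), 2))
```

Because hydrolysis adds back one water per cleaved bond, `molar_mass(12, 11) + WATER` equals twice the molar mass of a hexose (about 2 × 180.16 g/mol), which is the mass balance behind the hydrolysis of sucrose into glucose and fructose, or of lactose into galactose and glucose.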
Other well-known nonstarch polysaccharides used in the food industry include pectin, inulin, and chitin. The classification and measurement of dietary carbohydrates requires a systematic approach that describes both the chemical and functional properties of carbohydrates in foods. The detection of carbohydrates in foods is of particular importance in areas such as food nutritional labeling, elucidation of new structures and in facilitating the analytical and quantitative testing of new product compositions in order to evaluate their nutritional composition. This chapter provides a summary of the advances made in food carbohydrate analysis that make the detection and isolation of such components achievable. An overview of the common methods used for food carbohydrate analysis is illustrated in Fig. 1.
Leonie J. Kiely and Rita M. Hickey
[Fig. 1 flowchart: monosaccharides and oligo-/polysaccharides; labelled (derivatized) and unlabelled (underivatized) routes via RP-HPLC, HPAEC-PAD, CE, HPLC (RP, HILIC, AE), and PGC, used for quantitation of individual monosaccharides, differentiation of sialylation, isomers, and glycoforms, and structural characterization by comparison with known standards]
Fig. 1 Overview of common methods for food carbohydrate analysis
2 Sample Preparation and Purification
Sample preparation is critical for reliable and precise carbohydrate analysis. Samples in aqueous solution, such as fruit juices, milk, honey, and syrups, generally require very little preparation prior to analysis. In solid food matrices such as nuts, cereals, fruit, breads, and vegetables, carbohydrates are usually physically entrapped in or chemically bound to other components, and they require extraction and purification [18]. The methods used for sample preparation depend on the type of analysis, the food matrix, and the carbohydrate characteristics. For chromatographic analysis, carbohydrates often require one or more preparatory steps, including filtration, fractionation, extraction, chemical hydrolysis, or derivatization.
2.1 Filtration and Fractionation
Filtration and dilution aim to prevent insoluble material in the sample from entering and blocking the chromatographic column. Filtration and sample clean-up approaches include filter paper and other membrane filtration or fractionation techniques such as ultrafiltration, nanofiltration, and reverse osmosis [19]. These methods can be used, for example, to separate oligosaccharides from polysaccharides, as well as to purify and concentrate carbohydrates prior to analysis, and they can be performed before or after extraction [20]. Ultrafiltration and nanofiltration are commonly used, and the main consideration when selecting ultrafiltration or nanofiltration membranes is the molecular weight cut-off (MWCO), that is, the molecular weight of the smallest compound retained [21]. As an example, Mehra et al. (2014) used membrane filtration to produce powders enriched in bovine milk oligosaccharides (BMOs) from mother liquor (the liquid remaining after lactose crystals are separated from whey UF permeate) as the starting raw material. The microfiltrate of the mother liquor from the microfiltration step was used as the feed for ultrafiltration (spirally wound membranes with a 1 kDa MWCO) to fractionate and enrich milk oligosaccharides away from lactose and mineral salts. Carbon fractionation can also be used: carbohydrates adsorb to an activated charcoal column and are then eluted with different concentrations of ethanol. This technique fractionates carbohydrates according to their degree of polymerization and has been shown to be particularly effective in separating monosaccharides from complex food samples [22].

2.2 Extraction
Although the preparation of carbohydrates is complex, there are some commonly used methods (reviewed by Brummer and Cui 2005). In general, samples are first dried and milled and then defatted. Drying can be done under vacuum, at atmospheric pressure, or in a freeze dryer if the samples are heat sensitive. Lipids and oils are usually removed with nonpolar solvents such as hexane or chloroform before extraction, to increase the yield and decrease contaminants. Low-molecular-weight carbohydrates can be extracted using hot 80% ethanol. The ethanol extract will contain contaminants such as amino acids, organic acids, vitamins, pigments, and mineral salts (Brummer and Cui, 2005). These contaminants are usually removed by treating the solution with clarifying agents such as metal salts or by passing it through one or more ion-exchange resins [18]. After 80% ethanol treatment, the residue mainly contains proteins and high-molecular-weight carbohydrates (polysaccharides). Water-soluble polysaccharides can be extracted using hot water and separated from insoluble material by centrifugation or filtration. Part of the insoluble polysaccharide fraction can be obtained by mild alkaline extraction (KOH or NaOH at concentrations below 0.5 M). Residual starch can be removed by digesting the samples with α-amylase or amyloglucosidase, followed by centrifugation (Brummer and Cui 2005). Proteins may be removed from samples enzymatically with proteases [23]. Overall, care must be taken during extraction to avoid changing the form or quantity of the sugars present, and factors such as extraction temperature and time should be optimized so that the extracted sugars are stable and the extraction yield is as high as possible [24].
An alternative to harsh chemical solvents is extraction assisted by ultrasound or microwaves. Ultrasonic extraction relies on acoustic cavitation to break cell walls and accelerate the dissolution of intracellular organic compounds, thereby improving the yield of polysaccharides [25]. Zhu et al. [26] used ultrasound to extract polysaccharides from Polygonum multiflorum, achieving a maximum extraction rate of 5.49%. Recent research suggests that ultrasonic extraction can significantly increase the rate of polysaccharide dissolution, whereas prolonged exposure to ultrasound may alter the higher-order structure of polysaccharides and affect their biological activity [27]. In microwave-assisted extraction, cells absorb microwave energy, which raises the intracellular pressure, ruptures the cells, and releases the active components into the solvent [28]. Microwave-assisted extraction can significantly improve the yield of polysaccharides while saving energy and time. For instance, the yield of polysaccharides from Cyphomandra betacea was 36.52% under optimal conditions [29]. However, a rapid temperature spike may change the molecular mass distribution and the structures of thermally unstable polysaccharides. These technologies may therefore be associated with adverse effects and increased operational costs (Nobre Gonçalves et al., 2015). Other methods used to extract carbohydrates include pressurized liquid extraction (PLE), supercritical fluid extraction (SFE), and solid-phase extraction (SPE). PLE uses solvents at high temperatures (50–200 °C) and pressures (1450–2175 psi) to achieve rapid extraction of compounds [30]. The high temperature increases solubility and the rate of solute diffusion in the solvent, while the high pressure keeps the solvent liquid above its normal boiling point, allowing good penetration of the solvent into the sample [31]. PLE has been used to obtain fractions enriched in di- and trisaccharides from honey [32]. SFE separates the analyte from the matrix using a supercritical fluid as the extraction solvent [33]. Carbon dioxide, probably the most widely used supercritical fluid, is nontoxic and nonflammable, can be used at low temperatures, and is relatively inexpensive.
However, carbohydrates are poorly soluble in supercritical CO2 (Soria et al., 2012). The technique has nevertheless been useful for separating lactulose and tagatose from their isomeric aldoses (lactose and galactose, respectively) [34] and, using CO2 with ethanol/water as cosolvent, for separating galactooligosaccharides (GOS) from monosaccharides in a commercial sample [35]. For SPE, selecting an appropriate sorbent depends on understanding the mechanism of interaction between the sorbent and the analyte of interest [36]. Reversed-phase cartridges are commonly used to purify carbohydrates; octyl (C8) and octadecyl (C18) silica phases are the most common for carbohydrate cleanup. These sorbents have high affinity for hydrophobic compounds but lower affinity for hydrophilic solutes such as oligosaccharides [37]. C18 cartridges in particular are useful for fractionating (1→4)-α-glucans according to their degree of polymerization [33]. Graphitized carbon SPE has also been widely used to separate oligosaccharides from lactose in bovine milk, with washing and elution steps using acetonitrile, trifluoroacetic acid, or water to maximize yield [38].

2.3 Chromatography
Various high-performance liquid chromatography (HPLC) methods have also been used for carbohydrate purification, including hydrophilic interaction liquid chromatography (HILIC), high-performance anion-exchange chromatography (HPAEC), and reversed-phase chromatography [39, 40]; these are discussed in more detail in Subheading 3.9 in relation to their use in the identification of carbohydrates. Most HPLC separation methods are designed for analytical glycomics on small quantities of starting material and therefore do not yield enough carbohydrate for examining functionality. In a limited number of examples, larger amounts of starting material have been used to produce sufficient carbohydrate for nuclear magnetic resonance (NMR) analysis. Isolation of high-mannose N-glycans from soy proteins and egg yolks by a preparative-scale multidimensional HPLC method has been reported [41, 42]. However, even after multidimensional HPLC on analytical columns, some fractions remain mixtures of isomers that are very difficult to separate. Recycling HPLC offers a possible solution [43, 44]: it returns part or all of the sample to the column, increasing separation efficiency while keeping peak dispersion to a minimum. Recently, a simple and affordable closed-loop recycling HPLC method for the preparative-scale separation of complex carbohydrates was developed and successfully applied to reversed-phase chromatography and HILIC [45]. Chromatographic techniques based on gel-filtration/permeation mechanisms are also commonly used to fractionate carbohydrates. For example, gel-filtration chromatography on Sephadex G-25 has often been used to separate human milk oligosaccharides (HMOs) from lactose and salts [33].
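Why recycling helps can be made concrete with the standard chromatographic relation that resolution (Rs) grows with the square root of the plate number N, while N grows roughly in proportion to the length of column a band has travelled. The Python sketch below assumes ideal behaviour: each pass adds one column's worth of plates and band broadening in the recycle loop is negligible. Real gains are smaller, and the number of useful cycles is limited because the widening band group must still fit within the loop. The function names and the Rs = 1.5 baseline-resolution target are illustrative assumptions, not values from the cited studies.

```python
import math


def recycled_resolution(rs_one_pass, n_passes):
    """Ideal estimate: Rs is proportional to sqrt(N), and N scales with the
    number of passes through the column, so Rs_n = Rs_1 * sqrt(n)."""
    return rs_one_pass * math.sqrt(n_passes)


def passes_needed(rs_one_pass, target_rs=1.5):
    """Smallest whole number of passes whose ideal Rs reaches target_rs."""
    return math.ceil((target_rs / rs_one_pass) ** 2)


# Two isomers that are only partly resolved after a single pass (Rs = 0.75).
n = passes_needed(0.75)
print(n, recycled_resolution(0.75, n))
```

Under these assumptions, doubling the resolution of a poorly separated isomer pair costs four passes, which is why recycling is attractive for the hard-to-separate isomer fractions described above.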
2.4 Hydrolysis
It is often necessary to hydrolyze large carbohydrates such as polysaccharides prior to analysis in order to obtain low-molecular-weight monosaccharides or oligosaccharides, which can be analyzed more easily [46]. Acid hydrolysis cleaves the N- and O-glycosidic linkages between carbohydrates, releasing the constituent monosaccharides and oligosaccharides. The most commonly used acids include sulfuric acid (H2SO4), hydrochloric acid (HCl), and trifluoroacetic acid (TFA), with TFA often preferred because its high volatility makes it easy to remove before chromatographic analysis [23, 47]. Hydrolysis may also occasionally be performed using harsh alkaline agents such as NaOH and Ca(OH)2 for the pretreatment of certain hemicelluloses; however, this option often results in the loss of valuable functional groups from samples [48]. The efficacy of hydrolysis depends on several factors, including acid concentration, hydrolysis temperature, and time, all of which have been shown to significantly influence the outcome, with acid concentration having the strongest effect [49]. While conditions vary considerably, common conditions for neutral-sugar analysis are 1–2 M TFA at 100 °C for 4 h; for more complex aminoglycans, the concentration of TFA or HCl is increased to 4 M, the temperature remains at 100 °C, and the hydrolysis time is increased to 6 h [50]. Modified hydrolysis protocols have also been developed, such as the National Renewable Energy Laboratory (NREL) method, a two-stage acid hydrolysis that begins with a high acid concentration (between 3 M and 12 M H2SO4) at a low temperature (37 °C) for 1 h. This first step fragments the polysaccharide complex, while the second step uses a more dilute acid (e.g., 1 M H2SO4) at a higher temperature (100 °C) for 2 h to fully hydrolyze oligosaccharides into their constituent monosaccharides, which can then be analyzed by HPLC [51]. Other methods add an enzymatic hydrolysis step to chemical hydrolysis. These often liberate monosaccharides or oligosaccharides more completely without degrading the sugars, but they require a much longer procedure and so may not always be practical [52]. While acid hydrolysis is often necessary in carbohydrate analysis, it has several drawbacks, including difficulty in optimizing conditions as well as undesirable side reactions and degradation products [53].

2.5 Derivatization
Carbohydrates can be detected in their natural state without any further derivatization by their UV absorbance at low wavelengths (5% of the total intensity of all peaks in the full theoretical isotopic distribution, and (2) the mth peak should have an intensity >15% of the most abundant peak. The m/z of this mth peak of the jth glycan is then $M_{j,m}$. 3. Define the experimental "assignment window" for a single glycan: this is the m/z window in which the experimental spectrum overlaps with the isotopic peaks of each of the N candidate glycans. It is based on the theoretical mass distribution of the m peaks of the candidate glycan and the allowed tolerance. Thus, for the jth glycan, starting from the first (monoisotopic) theoretical peak, the window spans $M_{j,1} \pm MA'$, $M_{j,2} \pm MA'$, …, $M_{j,m} \pm MA'$, where $MA' = MA$ if the tolerance MA is defined in Da, and $MA' = M_{j,1} \times MA/10^{6}$ if MA is defined in ppm.
Yusen Zhou and Sriram Neelamegham
4. Determine whether assignment windows overlap: two cases may exist. In the first, only one glycan lies in a given assignment window, and the workflow in step 5 below is followed. In the second, the assignment windows of multiple glycans overlap, and the calculations in steps 6–7 apply. 5. "Pair" a single experimental peak with the monoisotopic peak of the glycan in the assignment window: once the experimental assignment window and theoretical distribution are established, the monoisotopic peak of the candidate glycan must be paired with a single experimental peak. If only one experimental peak lies within the tolerance window around the monoisotopic peak, that is, only one peak satisfies the criterion $\left|\text{Experimental peak} - M_{j,1}\right| < 0.9\,MA'$, that peak is paired. If, however, more than one experimental peak lies in the tolerance window of the monoisotopic peak, the fit of each candidate peak is evaluated by quantifying the difference between the experimental and theoretical isotopic distributions:

$$\text{Isotopic deviation, } \mathrm{ID}_{j,(k)} = \sqrt{\sum_{i=1}^{m-1}\left[\log_{10}\frac{\left(I_{j,i}/I_{j,i+1}\right)_{\text{expt}}}{\left(I_{j,i}/I_{j,i+1}\right)_{\text{theor}}}\right]^{2}} \tag{2}$$

Here, k indexes the possible experimental monoisotopic peaks for the jth glycan. The ratio $I_{j,i}/I_{j,i+1}$ is the intensity ratio of two consecutive peaks, calculated both for the experimental data (numerator) and for the candidate glycan (denominator). This ratio is calculated for each pair of consecutive peaks among the m theoretical peaks of the candidate glycan.
If the match is perfect, each ratio of ratios equals 1 and the isotopic deviation (ID) is zero; larger values indicate greater deviation. In cGlyco, $\mathrm{ID}_{j,(k)}$ is calculated for each of the k possible experimental monoisotopic peaks. The candidate with the smallest $\mathrm{ID}_{j,(k)}$ is used to calculate abundance, provided that $\mathrm{ID}_{j,(k)} < \sqrt{(m-1)\left[\log_{10}(1.5)\right]^{2}}$. If this criterion is not met, the candidate glycan is removed from glycanDB. The abundance of the glycan, $A_j$, is then set equal to the summed area of all experimental peaks within the assignment window, that is, $A_j = \sum_{i=1}^{m} (\mathrm{FWHM}\cdot I)_i$.
Comparative Glycomics Analysis of Mass Spectrometry Data
6. Combine overlapping "assignment windows": if the m/z ranges of two or more assignment windows overlap, they are combined into a single, larger assignment window (Fig. 5). m again denotes the last peak in the final assignment window; that is, $M_{j,m}$ is the last peak, located $m-1$ Da away from $M_{j,1}$ (the first peak). 7. Calculating $A_j$ when multiple assignment windows overlap: here, best-fit calculations are performed for each of the potential experimental peaks that lie in the tolerance window of the monoisotopic peak of the first glycan, i.e., the one with the smallest molecular mass. An initial glycan abundance value, $A_{j,\text{init}}$, is first estimated for the selected experimental peak using the expression $A_{j,\text{init}} = (\mathrm{FWHM}\cdot I)_{\text{paired expt. peak}} \big/ \left[I_{j,1}/\sum_i I_{j,i}\right]_{\text{theor}}$, where $\left[I_{j,1}/\sum_i I_{j,i}\right]_{\text{theor}}$ quantifies the relative abundance of the monoisotopic peak of the candidate glycan with respect to its full theoretical distribution, and FWHM data are available from MSData. Once $A_{j,\text{init}}$ is determined, the theoretical area due to this glycan is subtracted from the experimental data. The abundance of the next candidate glycan is then estimated from the modified experimental data, and so on until $A_{j,\text{init}}$ is defined for all glycans in the window. Once initial values are available for all glycans in the assignment window, the fmincon function of MATLAB is applied to the minimization problem in Eq. 3:

$$\min_{A_j > 0}\ \sqrt{\sum_{i=1}^{m}\left[(\mathrm{FWHM}\cdot I)_i - \sum_{j=1}^{n} A_j\left(\frac{I_{j,i}}{\sum_{m} I_{j,m}}\right)_{\text{theor}}\right]^{2}} \tag{3}$$

In this expression, n is the total number of glycans in the merged assignment window and $(\mathrm{FWHM}\cdot I)_i$ is the area of the ith experimental peak in the assignment window; the inner ∑-term provides the corresponding theoretical area. This fit yields a refined estimate of $A_j$. The calculation is repeated for all k possible experimental monoisotopic peaks, and the best fit is taken as the final solution. 8. For each candidate glycan, the assigned monoisotopic mass, $A_j$, and $\mathrm{ID}_j$ are stored in glycanDB. cGlyco also calculates the residue value for the entire spectrum, which quantifies the difference between the experimental measurements and the fit, and these data are stored in MSdata.
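Steps 2 to 8 can be sketched in a few plain-Python functions. This is a hypothetical re-implementation for illustration, not cGlyco's MATLAB code: the function names are invented, the 5%/15% peak-selection rule is our reading of step 2 (part of its first criterion is lost in the text above), the per-peak residue sign (fitted minus observed) is assumed, and the non-negative fit of Eq. 3 is solved by simple cyclic coordinate descent as a stand-in for MATLAB's fmincon (it permits $A_j = 0$ rather than strictly $A_j > 0$). Peak areas are taken as FWHM × intensity, as in the text, and the example spectrum is synthetic.

```python
import math


def n_isotopic_peaks(theor, frac_total=0.05, frac_max=0.15):
    """Step 2 (assumed reading): count leading theoretical peaks that exceed
    5% of the summed distribution and 15% of the most abundant peak -> m."""
    total, top = sum(theor), max(theor)
    m = 0
    for intensity in theor:
        if intensity > frac_total * total and intensity > frac_max * top:
            m += 1
        else:
            break
    return m


def window_tolerance(mono_mz, ma, unit="Da"):
    """Step 3: MA' = MA for a tolerance in Da, or M_j,1 * MA / 1e6 for ppm."""
    return ma if unit == "Da" else mono_mz * ma / 1e6


def isotopic_deviation(expt, theor):
    """Eq. 2: sqrt of summed squared log10 ratios of consecutive-peak ratios."""
    s = 0.0
    for i in range(len(theor) - 1):
        ratio_of_ratios = (expt[i] / expt[i + 1]) / (theor[i] / theor[i + 1])
        s += math.log10(ratio_of_ratios) ** 2
    return math.sqrt(s)


def id_threshold(m):
    """Acceptance cut-off sqrt((m - 1) * log10(1.5)**2)."""
    return math.sqrt((m - 1) * math.log10(1.5) ** 2)


def pick_monoisotopic(candidates, theor):
    """Step 5: index k of the candidate peak series with the smallest ID,
    or None when even the best candidate fails the cut-off."""
    ids = [isotopic_deviation(c, theor) for c in candidates]
    k = min(range(len(ids)), key=ids.__getitem__)
    return k if ids[k] < id_threshold(len(theor)) else None


def window_area(fwhm, height):
    """A_j for a single window: sum of (FWHM * I) over its peaks."""
    return sum(w * h for w, h in zip(fwhm, height))


def normalize(profiles):
    """Scale each theoretical distribution to unit total (I_j,i / sum I_j,m)."""
    return [[p / sum(prof) for p in prof] for prof in profiles]


def initial_estimates(areas, cols, mono_idx):
    """Step 7: A_init = paired monoisotopic area / theoretical monoisotopic
    fraction; that glycan's theoretical areas are subtracted before the next."""
    remaining, a = list(areas), []
    for col, i0 in zip(cols, mono_idx):
        est = max(0.0, remaining[i0] / col[i0])
        a.append(est)
        remaining = [r - est * c for r, c in zip(remaining, col)]
    return a


def fit_overlapping(areas, cols, a_init, max_iter=10000, tol=1e-12):
    """Eq. 3 as non-negative least squares via cyclic coordinate descent."""
    a = list(a_init)
    for _ in range(max_iter):
        max_step = 0.0
        for j, col in enumerate(cols):
            resid = [y - sum(ak * c[i] for ak, c in zip(a, cols))
                     for i, y in enumerate(areas)]
            step = sum(c * r for c, r in zip(col, resid)) / sum(c * c for c in col)
            new = max(0.0, a[j] + step)
            max_step = max(max_step, abs(new - a[j]))
            a[j] = new
        if max_step < tol:
            break
    return a


def residue_percent(fitted, observed):
    """Per-peak residue, (fitted - observed) / observed * 100 (sign assumed)."""
    return (fitted - observed) / observed * 100.0


# Synthetic example: two glycans whose monoisotopic peaks are 2 Da apart.
cols = normalize([[5, 3, 2, 0, 0], [0, 0, 5, 3, 2]])
areas = [5.0, 3.0, 4.0, 1.2, 0.8]
a0 = initial_estimates(areas, cols, mono_idx=[0, 2])
print(fit_overlapping(areas, cols, a0))
```

With exact synthetic data the peeled initial estimates already match the abundances used to build the spectrum (10 and 4); starting the fit from zero converges to the same values, a useful check that the least-squares step, not the initialization, determines $A_j$.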
Fig. 5 Resolving the relative abundance of two overlapping glycans in the same assignment window. (a, b) Theoretical isotopic distributions of the two glycans, whose monoisotopic masses differ by ~2 Da. Purple and red lines show the window of the theoretical distribution considered when fitting data. (c) Experimental raw data fit to
$$\mathrm{Res}_{m/z}\,(\%) = \frac{A_j\, I_{j,m}^{\text{theo}} \big/ \sum_{m} I_{j,m}^{\text{theo}} - (\mathrm{FWHM}\cdot I)_{j,m}^{\text{Expt}}}{(\mathrm{FWHM}\cdot I)_{j,m}^{\text{Expt}}} \times 100\% \tag{4}$$

Case study data fitting: Figure 5 presents an example with two overlapping candidate glycans. The initial glycans have an assignment window that is 5 Da wide. Upon combining them, the final experimental assignment window is 7 Da wide. The theoretical abundance of these two candidate glycans is then estimated by minimizing the residue value between experimental and theoretical data, as explained above. The final isotopic deviation in this case was small and accepted ($\sim 2 \times 10^{-16}$), suggesting good estimates for $A_j$.

3.4 Monosaccharide and Substructure Analysis
Summary: Once the AUC of all candidate glycans ($A_j$) has been determined as above, cGlyco calculates the relative abundance of specific glycans, monosaccharides, and glycan substructures. This requires several inputs: (1) the name of the MS raw data file (MSfilename); (2) the processed "glycanDB" and its directory location (glycanDBdir) from step 3; (3) the list of substructures (listOfStruct) to be classified (for substructure analysis only); and (4) the output directory location (outputdir), where the final "glycanDB", including monosaccharide and substructure data, is stored. The equations for these estimates follow. 1. Glycan relative abundance:

$$\text{Relative abundance of the } j\text{th glycan, } RA_j\,(\%) = \frac{A_j}{\sum_{j=1}^{N} A_j} \times 100\%$$
Here, if more than one glycan corresponds to a given m/z value, all such isomeric glycans are assumed to be present in equal concentration, and $RA_j$ at that m/z is divided equally among them. 2. Monosaccharide abundance: this quantifies the weighted average of monosaccharides in the experimental spectra:

$$\text{Monosaccharide abundance } (M_m) = \sum_j RA_j\,\mathrm{mono}_{j,m}$$
Fig. 5 (continued) theoretical peaks over the combined mass range (i.e., "assignment window") shown in panels (a, b). The assignment window has seven peaks (i.e., m = 7) and two glycans (i.e., n = 2). Purple/red arrows and boxes specify the contribution of each glycan to the experimental data. Final fitted glycan abundances ($A_j$) are also provided
Here $\mathrm{mono}_{j,m}$ is the number of residues of the mth monosaccharide in the jth glycan. 3. Substructure abundance: this quantifies the weighted average of a given substructure across the identified glycans:

$$\text{Substructure abundance } (SS_s) = \sum_j RA_j\,\mathrm{sub}_{j,s}$$
where $\mathrm{sub}_{j,s}$ is the number of copies of substructure s in the jth glycan. 4. Data presentation in heat maps: results are presented as heat maps that compare the relative abundance of specific glycans, monosaccharides, and substructures across cells or tissues. All heat maps use a log scale, with log10(Y/X) representing the relative abundance of the measure on the Y-axis with respect to that on the X-axis. Case study monosaccharide and substructure analysis: cGlyco compares the relative abundance of various monosaccharides (Man, Gal, GlcNAc, NeuAc, Fuc) in four different human leukocytes (Fig. 6a). B-cells have a markedly lower abundance of Gal, GlcNAc, Fuc, and Neu5Ac, reflecting the paucity of tri- and tetraantennary complex glycans in the B-cell pool. The monosaccharide composition of the other cell types was comparable. Due to the reduced presence of complex glycans, B-cells also display lower amounts of LeX, sLeX, LacNAc, and bisecting N-glycan structures (Fig. 6b). In contrast to B-cells, all other blood cells displayed similar monosaccharide and glycan substructure compositions, except that monocytes contained a greater number of sialyl LacNAc epitopes. Figure 7 summarizes the specific carbohydrates that are differentially regulated in B-cells vs. monocytes.
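The Subheading 3.4 calculations (glycan relative abundance with equal splitting among isomers, composition-weighted monosaccharide and substructure abundances, and the log10(Y/X) values used in the heat maps) reduce to a few sums. The Python sketch below is a hypothetical illustration, not cGlyco's code: the input layout (a dictionary of assigned peaks and per-glycan residue counts), the glycan labels, and the abundance values are invented. Because $RA_j$ is expressed in percent, $M_m/100$ is the average number of residues of monosaccharide m per glycan.

```python
import math
from collections import defaultdict


def relative_abundance(peaks):
    """peaks: {peak_label: (A_j, [candidate glycan names])}.
    Returns {glycan: RA_j in %}; isomers sharing one m/z split its RA equally."""
    total = sum(area for area, _ in peaks.values())
    ra = defaultdict(float)
    for area, names in peaks.values():
        share = 100.0 * area / total / len(names)
        for name in names:
            ra[name] += share
    return dict(ra)


def weighted_abundance(ra, counts):
    """sum_j RA_j * count_{j,x}: gives M_m when counts are monosaccharide
    residues per glycan, or SS_s when counts are substructure copies."""
    out = defaultdict(float)
    for glycan, pct in ra.items():
        for feature, n in counts.get(glycan, {}).items():
            out[feature] += pct * n
    return dict(out)


def log10_fold(y, x):
    """Heat-map value log10(Y/X); 1.0 means Y is tenfold higher than X."""
    return math.log10(y / x)


# Hypothetical input: one high-mannose glycan and two isomers at a second m/z.
peaks = {"peak_1": (60.0, ["glycan_A"]), "peak_2": (40.0, ["glycan_B", "glycan_C"])}
composition = {
    "glycan_A": {"Man": 5, "GlcNAc": 2},
    "glycan_B": {"Man": 3, "GlcNAc": 4, "Gal": 2, "NeuAc": 2},
    "glycan_C": {"Man": 3, "GlcNAc": 4, "Gal": 2, "NeuAc": 2},
}
ra = relative_abundance(peaks)
mono = weighted_abundance(ra, composition)
print(ra)
print({k: v / 100 for k, v in mono.items()})  # average residues per glycan
```

The same `weighted_abundance` helper serves both the monosaccharide and the substructure equations, since they differ only in what is counted per glycan.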
4 Notes
1. cGlyco analysis relies heavily on the quality of the experimental data. Thus, the conclusions of the work depend on the quantitative nature of the experiment.
2. To support this program, the open-source cGlyco program contains a tutorial file (cGlycoTest.m; see the GitHub repository) that can be run in a stepwise manner.
3. While the above examples are provided for MALDI-TOF data only, the analysis framework may also be extended to analyze ESI-based experiments.
Fig. 6 Monosaccharide and glycan epitope abundance: (a, b) Monosaccharide (panel a) and substructure (panel b) analysis for different cell types. The heat maps are built from values calculated as log10(Y/X), where Y and X indicate the abundance value of the cell type specified along the y- and x-axis, respectively. Thus, a log10 fold change of 1.0 indicates a tenfold change in monosaccharide/substructure abundance
Fig. 7 MS composition comparison. Figure compares glycan abundance data for two different human leukocyte types. The heat map is built based on the value calculated by the equation log10(Y/X), as explained in the previous figure
Acknowledgments
This work was supported by US National Institutes of Health grants HL103411, GM133195, and GM126537.
Part II Synthetic Biology and Glycoengineering Methods for Controlling Glycosylation
Chapter 6
Cell Free Remodeling of Glycosylation of Antibodies
Letícia Martins Mota, Venkata S. Tayi, and Michael Butler
Abstract
The N-glycosylation profile of a monoclonal antibody (mAb) is a critical quality attribute in relation to its therapeutic application. Controlling this profile during biomanufacture is difficult because of the multiple parameters that affect glycosylation metabolism within the cell and the environment in which the cell is grown. One approach to producing a preferred glycan profile or a single glycoform is chemoenzymatic remodeling during the isolation of a mAb. Here we describe protocols that can be used to produce preferred glycoforms, including galactosylated, agalactosylated, or sialylated glycoforms, following isolation of a mAb. Methods for the analysis and structural assignment of the samples following glycoengineering are also described. Chemoenzymatic remodeling of mAb glycans has the potential to be scaled up and introduced into the biomanufacturing of mAbs with higher specific therapeutic activities.
Key words Glycoengineering, Glycosylation, Galactosylation, Sialylation, Glycan, CHO cell, 2-Aminobenzamide (2-AB), HILIC-HPLC, Monoclonal antibody (mAb), N-glycans, PNGase F, Protein-A
1 Introduction
1.1 Monoclonal Antibodies
Monoclonal antibodies (mAbs) are an important class of therapeutic glycoproteins that represent over 50% of the global sales of all biologics [1, 2]. More than 570 antibodies are at some stage of clinical trials for therapeutic applications ranging from cancer to the treatment of human immunodeficiency virus (HIV) infection [3]. As global demand for these molecules increases, so does the need for more efficient technologies that produce high yields while maintaining or improving the mAb quality attributes that contribute to their efficacy as therapeutics. The glycan profile of mAbs has been recognized as a critical quality attribute for maintaining the efficacy of therapeutic antibodies [13]. Control of the glycan profile during bioprocessing is therefore essential and is the topic of this chapter.
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_6, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1.2 N-glycan
The immunoglobulin structure of antibodies comprises two heavy and two light polypeptide chains, with a variable [4] and a constant (Fc) domain. Human antibodies have a molecular mass of approximately 146 kDa. Disulfide bonds link the light chains to the heavy chains. The Fab region is responsible for antigen recognition. In the Fc region, between the CH2 and CH3 subdomains, a conserved N-glycan is attached to the asparagine 297 (N297) residue [5]. The Fc region binds to receptors of the immune system, triggering effector responses that are essential for therapeutic efficacy. Mammalian cell expression systems have been developed to achieve high productivity for numerous proteins and glycoproteins [6, 18]. In this process, a cell is transfected with a plasmid carrying the target gene, and the cell expresses that gene as a sequence of amino acids [7]. After translation, the protein molecules undergo post-translational modifications (PTMs) according to the function they perform. Glycosylation is one of the most important PTMs for therapeutic glycoproteins. The frequency and complexity of PTMs can differ substantially between prokaryotes and nucleated eukaryotes [8]. The protein backbone of a mAb is templated by the nucleotide sequence of the plasmid transfected into the producer cells. The glycosylation profile, by contrast, is a product of the host cell's array of intracellular glycosylation enzymes and its metabolic pool of nucleotide sugars. In vivo studies indicate that cell type, species, age, sex, ethnicity, and physiological and pathological conditions affect the quality and quantity of the Fc glycan composition [9–12]. During biomanufacture of therapeutic antibodies, the glycosylation profile depends on a number of factors. These include cell line, media composition, culture time, supplements, pH, oxygen, and ammonia levels [13, 14].
Mammalian cells have human-like glycosylation machinery, which is absent in prokaryotic cells, and so they are the preferred platform for producing therapeutic recombinant proteins [15]. Commercial production of therapeutic mAbs takes place mainly in Chinese hamster ovary (CHO) and murine NS0 cell lines because of their favorable growth, transfection, and productivity characteristics. These cells produce human-like glycan profiles. However, unlike human cell lines, NS0 cells may synthesize N-glycolylneuraminic acid (Neu5Gc) and galactose-α1,3-galactose glycans, which are potentially immunogenic in humans [10, 16–19]. The glycans produced by any specific bioprocess are a mixture of heterogeneous glycoforms. The resulting profile depends on the prevailing growth conditions in the bioreactor and on the enzyme profile and substrate availability in the host cell line [20, 21]. It is widely documented that mAb glycoforms influence protein function in several ways, including folding, aggregation, stability, transport, and immune response [22, 23]. The Fc glycan profile also plays a key role in therapeutic function. The glycan present determines the affinity of the antibody for effector cells through selective binding to cell surface receptors, and this binding defines the immune function of the IgG [9]. Studies have associated terminal galactose on IgG (e.g., A2G2) with higher affinity for the Fcγ receptor IIIa (FcγRIIIa). This receptor is usually present on the surface of macrophages and natural killer (NK) cells, and the higher affinity leads to greater antibody-dependent cellular cytotoxicity (ADCC) [9, 24]. In addition, IgGs with terminal galactose and lacking fucose (e.g., A2G1 or A2G2) bind the C1q protein of the complement system more strongly, resulting in a stronger complement-dependent cytotoxicity (CDC) response [25]. Sialylated Fc N-glycans have been shown to upregulate FcγRIIb on macrophages, which ultimately reduces inflammation [9, 26]. Reducing glycan heterogeneity in mAbs during a bioprocess offers several advantages. First, it may reduce batch-to-batch variability between manufacturing lots. Second, a particular glycan structure may be needed for a specific therapeutic activity. In that case, a mixture of mAb glycoforms may not achieve optimal therapeutic function because of variable binding to immune system receptors. For example, producing homogeneous sialylated immunoglobulins resulted in a drug candidate with tenfold higher anti-inflammatory activity [27]. This enhanced efficacy of a single-glycoform antibody increases the specific activity of the product and allows a lower dose to be given to the patient. A homogeneous glycan profile may also be desired to elucidate IgG pharmacodynamics and pharmacokinetics [22, 28] and to define how a particular N-glycan structure modulates the effector response of the antibody [25]. This information can drive the production of single-glycoform mAbs or homogeneous antibody-drug conjugates (ADCs) [29–32].
Fine control of the glycosylation profile may also be required when producing a biosimilar, where a mixture of homogeneous Fc glycoforms may be used to match the profile of an approved originator molecule. In addition, controlled ratios of different glycoforms in a formulation may help trigger different immunologic pathways.
1.3 Methods of Controlling Fc N-glycosylation During a Bioprocess
There are three stages of bioprocessing at which one can intervene to control the resulting glycosylation profile (Table 1). At the initial, upstream stage, cell engineering can be applied at the cellular level to intervene in glycan synthesis. This involves knocking in or knocking out genes that encode enzymes of the glycosylation pathway. For example, knockout of the fucosyltransferase gene FUT8 in CHO cells has proved efficient for producing afucosylated antibodies, ultimately improving the ADCC activity of the IgG [33]. Highly sialylated IgG has also been obtained by coexpressing the enzymes ST6Gal1 and β4GalT1 through gene manipulation [17, 18, 34]. Sequencing of the genomes of host cells such as CHO has made cell engineering much simpler, but it is still a time-consuming process. Moreover, each successfully engineered cell line synthesizes only one type of glycan variant, which limits the therapeutic function of the respective IgG.
Table 1 Points of control of glycosylation during a bioprocess
Point of control | Method of control | Level of control | Advantages | Disadvantages
1 Cell line | Gene engineering manipulation | Specific | Stable expression in a single cell line | Time-consuming and limited to one type of change
2 Bioreactor | Culture environment by feeding | Enrichment | Ease of manipulation | Limited control
3 Downstream processing | Chemoenzymatic | Specific | Applicable to all harvested products | Expensive enzymes/substrates
During the bioproduction phase, process control and media formulation in the bioreactor can significantly affect glycosylation. Supplementing media with glucose, glutamine, uridine, and manganese may increase the levels of galactosylated species [14, 35]. High ammonia levels in the cell culture have the opposite effect. Shifting the pH to acidic conditions increases sialidase activity and therefore reduces the levels of sialylated species [36]. On the other hand, sialylation levels can be increased by supplementing media with a combination of dexamethasone, manganese, uridine, and galactose [35]. Process and media control is an important strategy, but it can improve glycosylation homogeneity only to a limited extent. In addition, each targeted glycan profile requires its own production batch. The third stage outlined in Table 1 is manipulation during downstream processing, which is the topic of this chapter. Each stage at which glycosylation control can be attempted has advantages and disadvantages, as outlined above and in Table 1. Cell engineering is time-consuming and yields a cell with specific characteristics for producing a certain glycoform. Supplementing media with compounds that promote or prevent particular glycoforms is a simple strategy but allows only limited control. Chemoenzymatic modification during downstream processing is probably the most effective way to produce single glycoforms.
The disadvantage may be the high cost of the enzymes and nucleotide sugars needed for these changes. This cost may be offset by the higher specific activity of the resulting single glycoforms. The cost of the required enzymes and substrates could also fall with scale-up as this strategy becomes accepted as a viable option.
1.4 Glycan Remodeling
At the downstream processing stage, the N-glycan structure of the glycoprotein can be rationally remodeled through in vitro enzymatic glycoengineering (IVGE). One approach is to incubate an intact IgG with endoglycosidases to remove the heterogeneous N-glycans, leaving only the core GlcNAc residue [37]. An engineered glycosynthase can then transfer a presynthesized glycan en bloc from an oxazoline substrate onto this GlcNAc [5]. However, these oxazolines and enzymes are not commonly available, which restricts the use of this technique. An alternative approach is to modify the IgG by removing terminal monosaccharides with glycosidases, or by elongating the glycan with specific sugars that glycosyltransferases transfer from nucleotide-sugar substrates. Recent advances in this technique have produced a series of highly homogeneous glycan-remodeled glycoproteins. Starting from IgG1, Thomann et al. (2016) applied this approach and obtained three products [38]: approximately 85% hypogalactosylated IgG after incubation with galactosidase; 84% hypergalactosylated IgG after treatment with galactosyltransferase and UDP-galactose; and 48% monosialylated plus 30% disialylated IgG after incubation of hypergalactosylated IgG with sialyltransferase and CMP-NANA. Kurogochi et al. used a similar technique to synthesize homogeneous M3 (trimannose), A2 (agalactosylated), A2G2 (digalactosylated), and A2G2S2 (disialylated) glycoforms [39]. Other groups have also used this technique to obtain highly fucosylated [40], galactosylated, and/or sialylated glycoproteins [5, 24, 26, 27, 41–43]. In-solution IVGE has proved effective for producing IgGs with homogeneous Fc glycans. However, when multiple sequential modifications are needed to reach the desired glycoform, it may require several processing steps and intermediate purification of the antibodies from enzymes and buffers. Each of these steps can result in product loss. Even so, IVGE is an advantageous method for designing mAb glycan variants, especially as it is independent of the cell line and production process [38].
1.5 In-Vitro Glycan Engineering (IVGE) in Solid Phase
Solid-phase transformations are often more efficient than liquid-phase processes. Reactants and enzymes can reach higher concentrations, and end products can be separated while reagents are reused [4]. This avoids the product loss seen across multiple purification steps. IVGE in solid phase can be performed while the antibody is immobilized on a solid support such as protein A. This method has the same advantages as IVGE in solution, with the added benefit of being more cost-effective for producing IgG with a single glycoform or a predefined glycan profile. Protein A chromatography is the platform technology most widely used in industry for mAb purification. In this capture step, the antibody binds to the chromatography medium through its Fc domain, leaving the glycan site accessible to enzymes [44]. IVGE can therefore be integrated into downstream processing. IgG with heterogeneous glycans is immobilized on a protein A column, and substrates, buffers, and the appropriate enzymes are then added. A sequential series of enzymatic reactions can be performed this way. At the end of each step, the enzymes are removed while the IgG stays on the column, and a new enzyme can then be applied. Running the reactions stepwise has a further advantage: inhibitory coproducts can be washed out and the collected enzymes reused. This method has produced IgGs enriched in single glycoforms: up to 94% FA2 (agalactosylated), 96% galactosylated, or sialylated with 76% S2 and 16% S1 [44]. Liquid-phase IVGE can use similar steps of enzymatic modification. Unlike solid-phase modification, however, it does not allow the enzymes to be recovered or inhibitory coproducts to be removed easily once they share a solution with the target glycoprotein. The IgG would also need additional purification steps between enzymatic modifications. The availability of new enzymes with improved efficiency has helped the development of IVGE. Huang, Wang et al. generated mutants of the glycosynthase Endo-S that add predefined N-glycans to Fc-deglycosylated IgG for gain of function [5]. The same group recently reported α1,6-fucosidase mutants capable of adding core fucose to intact N-glycoproteins [40]. A series of glucosaminidases for generating mAbs with homogeneous glycans has also been developed [39]. The high cost of enzymes and substrates is a challenge for this technique, but scale-up and bulk production may ease this limitation. In addition, reactions performed on a solid support allow reactants to be recycled, which makes the process more economical [37].
Moreover, because homogeneous IgG glycoforms are more effective therapeutically, patients would need lower doses, which would in turn reduce production demand. Finally, in vitro glycoengineered mAbs made in solution have already been produced as clinical material, which confirms that the IVGE approach is feasible [27, 45]. The rest of this chapter outlines methods for liquid-phase or solid-phase in vitro glycoengineering that generate various single-glycoform mAbs, using a variety of glycosylation enzymes and, where needed, substrates. The resulting antibody glycans are then analyzed by fluorescent labeling and chromatographic profiling after treatment with exoglycosidase enzyme arrays. These methods can generate sufficient quantities of homogeneous antibodies from the heterogeneous product of a cell culture process. Researchers can then study how these antibodies interact with cell surface receptors. The methods also have high potential for industrial use in developing next-generation mAbs or biosimilars with predefined glycan profiles.
2 Materials
All buffers and solutions are prepared in water with a resistivity of 18 MΩ·cm and a total organic carbon content below 5 parts per billion. Unless otherwise indicated, buffers and solutions should be prepared fresh and passed through a sterile 0.22 μm filter. High-purity reagents are used to avoid interference from impurities. The enzymes and buffers listed here may be obtained from different suppliers. Purity, activity, and incubation parameters can differ between enzyme sources, so reaction conditions may need to be adapted to the supplier's recommendations. Enzyme activity should be confirmed against a labeled N-glycan standard.
2.1 Sample Working Solution Preparation
1. 30 k MWCO spin filter, 2–20 mL capacity. 2. Metabolite analyzer (e.g., Cedex). 3. Centrifuge.
2.2 Glycoengineering
2.2.1 General Materials
1. PBS buffer: 1.14 g of Na2HPO4 and 0.24 g of NaH2PO4 in 450 mL of H2O adjusted to pH 7.4. 2. Protein A HP SpinTrap™ (Cytiva, formerly GE Healthcare Life Sciences). 3. 200 μL microcentrifuge tube. 4. Rotary shaker heat block.
2.2.2 Agalactosylation
1. α2–3,6,8 neuraminidase (New England Biolabs) 50 U/ μL. 2. β1–4-galactosidase S (New England Biolabs) 8 U/ μL. 3. Degalactosylation reaction buffer (B1 buffer): 25 mM TRIS, 50 mM NaCl, pH 6.6 (0.3028 g of tris base, 0.292 g of NaCl in 100 mL of H2O adjusted to pH 6.6).
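The masses in the B1 buffer recipe follow from mass = concentration × volume × molar mass. The short Python sketch below reproduces that arithmetic so the recipe can be rescaled to other volumes. The molar masses (Tris base 121.14 g/mol, NaCl 58.44 g/mol) are standard values added here as assumptions and are not given in the chapter. The sketch ignores the small volume of acid or base used for pH adjustment.

```python
# Molar masses in g/mol (standard values; assumed, not taken from the chapter).
MOLAR_MASS_G_PER_MOL = {
    "Tris base": 121.14,
    "NaCl": 58.44,
}


def grams_needed(conc_mM: float, volume_mL: float, molar_mass: float) -> float:
    """Mass (g) of solute for a final concentration (mM) in a given volume (mL)."""
    moles = (conc_mM / 1000.0) * (volume_mL / 1000.0)
    return moles * molar_mass


def buffer_recipe(components: dict[str, float], volume_mL: float) -> dict[str, float]:
    """Grams of each component; `components` maps component name -> concentration (mM)."""
    return {
        name: grams_needed(conc, volume_mL, MOLAR_MASS_G_PER_MOL[name])
        for name, conc in components.items()
    }


if __name__ == "__main__":
    # B1 buffer: 25 mM Tris, 50 mM NaCl in 100 mL (before pH adjustment to 6.6).
    b1 = buffer_recipe({"Tris base": 25.0, "NaCl": 50.0}, volume_mL=100.0)
    for name, grams in b1.items():
        print(f"{name}: {grams:.4f} g")
```

Changing `volume_mL` rescales the recipe. With glycine at an assumed 75.07 g/mol, the same `grams_needed` call gives about 0.75 g for 100 mL of the 100 mM elution buffer in Subheading 2.4.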
2.2.3 Trimannose Modification
1. Materials 1–3 from Subheading 2.2.2. 2. β-N-acetylglucosaminidase S (New England Biolabs) 4 U/μL.
2.2.4 Galactosylation
1. α2–3,6,8 neuraminidase (New England Biolabs) 50 U/μL.
2. β(1,4) galactosyltransferase working solution: β(1,4) galactosyltransferase (Sigma-Aldrich) 0.025 U/μL (dilute 5 U of the enzyme in 200 μL); make aliquots and store at −20 °C (see Note 1).
3. UDP-galactose working solution: UDP-galactose 57 mM (dissolve 5 mg of UDP-galactose (Sigma-Aldrich) in 145.6 μL of water; make aliquots and store at −20 °C). 4. Galactosylation reaction buffer (B2 buffer): 25 mM Tris-HCl, 50 mM NaCl, pH 7.5 (0.30 g of Tris base and 0.29 g of NaCl in 100 mL of water adjusted to pH 7.5). 5. 0.15 M MnCl2: 150 μL of 1 M MnCl2 stock solution in 850 μL of water.
2.2.5 α2–6 Sialylation
1. Materials 2–5 from Subheading 2.2.4. 2. α2–3 neuraminidase (New England Biolabs) 8 U/μL. 3. α(2–6) Sialyltransferase (hST6Gal) (Roche Custom Biotech) working solution: hST6Gal 0.5 μg/μL (dilute the enzyme in water to a final concentration of 0.5 μg/μL; dispense the enzyme solution in aliquots and store at −20 °C) (see Note 1). 4. CMP-NANA working solution: CMP-NANA 15 mM (1 mg of CMP-NANA (Sigma-Aldrich) in 89 μL of water); dispense in aliquots and store the aliquots at −20 °C.
2.3 Galactosylation Kinetics
1. β(1,4) galactosyltransferase working solution: 0.3 μg/μL β(1,4) galactosyltransferase (Agilent) (dilute the enzyme in water to a final concentration of 0.3 μg/μL); make aliquots and store them at −20 °C (see Note 1). 2. 5× UDP-galactose reaction buffer (Agilent): 50 mM MnCl2, 500 mM MES, pH 6.5. 3. 1× UDP-galactose reaction buffer: 10 mM MnCl2, 100 mM MES, pH 6.5 (400 μL of water and 100 μL of 5× UDP-galactose reaction buffer). 4. UDP-galactose working solution: dissolve 10 mg of UDP-galactose (Agilent) in 330 μL of 1× reaction buffer; make aliquots and store them at −20 °C. 5. Vacuum concentrator.
2.4 N-glycan Release in Solid Phase and Labeling
1. Binding buffer: PBS buffer (see Material 1 from Subheading 2.2.1). 2. Elution buffer: 100 mM glycine hydrochloride pH 2.7 (0.75 g of glycine in 100 mL of H2O adjusted to pH 2.7). 3. Neutralizing buffer: 1 M Tris-HCl pH 9.0 (12.12 g of Tris in 100 mL of H2O adjusted to pH 9.0 with 1 M HCl). 4. 2-AB labeling solution: to make 2 mL, mix 0.6 mL of glacial acetic acid and 1.4 mL of dimethyl sulfoxide. Split this solvent into two 1 mL aliquots. Dissolve 100 mg of 2-aminobenzamide (2-AB) in one aliquot and 125 mg of NaBH3CN in the other. Combine both solutions. Store at −20 °C protected from light. 5. PNGase F recombinant glycerol-free 500,000 units/mL (New England Biolabs). 6. Acetonitrile HPLC grade. 7. Protein A HP SpinTrap™ (Cytiva, formerly GE Healthcare Life Sciences). 8. 2 mL microcentrifuge tube. 9. 1.5 mL microcentrifuge tube. 10. Nanosep® 10 K Omega filter (Pall Corporation). 11. HyperSep Diol™ 50 mg/1 mL 96 Removable Well Plate (ThermoFisher Scientific). 12. Deep-well collector plate. 13. Storage sealing foil. 14. Waste tray. 15. Deep-well collar. 16. Vacuum manifold. 17. Mini centrifuge. 18. Rotary shaker heat block. 19. Vacuum concentrator.
2.5 N-glycan Release in Liquid Phase and Labeling
1. 50 mM NH4HCO3 buffer pH 8.0: 0.40 g of NH4HCO3 in 100 mL of H2O adjusted to pH 8.0 with formic acid or NH4OH. 2. N-Glycanase working solution: for each sample to be analyzed, mix 1.2 μL of N-Glycanase (Agilent)* with 1.2 μL of Gly-X digestion buffer* in a 200 μL microcentrifuge tube. Prepare fresh and store at 4 °C. 3. 2-AB working solution: mix 2 μL of 2-AB solution*, 2 μL of 2-AB reductant*, 4 μL of catalyst*, and 88 μL of acetonitrile. Prepare fresh and store at 4 °C protected from light. 4. Finishing reagent.* 5. Acetonitrile HPLC grade. 6. Multichannel pipette. 7. 1.5 mL microcentrifuge tube. 8. 200 μL microcentrifuge tube. 9. Gly-X Deglycosylation plate.* 10. Storage plate.* 11. Gly-X Cleanup plate.* 12. Deep-well plate.*
13. Collection plate.* 14. Gly-X Vacuum manifold spacer.* 15. Storage sealing foil.* 16. Two rotary shaker heat blocks. 17. Heat block lid.* 18. Waste tray.* 19. Vacuum manifold collar. 20. Vacuum manifold. 21. Vacuum concentrator.*
* Materials supplied with AdvanceBio Gly-X N-Glycan Prep and 2-AB Express Kit.
2.6 Exoglycosidase Digestions
1. 10× sodium acetate buffer: 500 mM sodium acetate, 50 mM CaCl2, pH 5.5 (add 3.47 g of sodium acetate, 0.46 g of glacial acetic acid, and 0.56 g of CaCl2 to 100 mL of H2O adjusted to pH 5.5). 2. 5× JBM zinc buffer (Agilent): 500 mM sodium acetate, 10 mM zinc chloride, pH 5.0 (supplied with the enzyme). 3. Streptococcus pneumoniae α-2-3 neuraminidase (sialidase S) expressed in E. coli (NanB, Agilent): 3.8 mU/μL (dilute 3 U of the enzyme in 790 μL of water and mix well; keep the solubilized enzyme at 4 °C). 4. Arthrobacter ureafaciens α-2-3,6,8,9 neuraminidase (sialidase A) expressed in E. coli (ABS, Agilent): 25 mU/μL. 5. Streptococcus pneumoniae β1–4 galactosidase (SPG, Agilent): 20 mU/μL. 6. Bovine kidney α-1-2,3,4,6 fucosidase (BKF, Agilent): 5 mU/μL (dilute 500 mU of the enzyme in 100 μL of water and mix well; keep the solubilized enzyme at 4 °C). 7. Streptococcus pneumoniae β-N-acetylglucosaminidase expressed in E. coli (GUH, Agilent): 40 mU/μL. 8. Jack bean α1–2,3,6 mannosidase (JBM, Agilent): 150 mU/μL. 9. 200 μL microcentrifuge tubes.
A panel of enzymes used for exoglycosidase digestions is shown in Table 2.
2.7 Glycan Analysis
1. 50 mM ammonium formate buffer, pH 4.4: Add 3.8 mL of formic acid into a beaker with 1.9 L of H2O, add ammonium hydroxide until pH 4.4 is reached. Make the volume up to 2 L with H2O, and filter the buffer through a sterile 0.22 μm filter.
Table 2 Exoglycosidase enzymes for glycan characterization
Enzyme | Source | Abbreviation | Working solution concentration (mU/μL)
α-2,3 sialidase (sialidase S) | Streptococcus pneumoniae, expressed in E. coli | NanB | 3.80
α-2,3,6,8,9 sialidase (sialidase A) | Arthrobacter ureafaciens, expressed in E. coli | ABS | 25
β-1,4 galactosidase | Streptococcus pneumoniae | SPG | 20
α-1,2,3,4,6 fucosidase | Bovine kidney | BKF | 5
β-N-acetylhexosaminidase | Streptococcus pneumoniae, expressed in E. coli | GUH | 40
α-1,2,3,6 mannosidase | Jack bean | JBM | 150
2. Dextran calibration ladder standard (Waters): dissolve 200 μg of the standard in 200 μL of water and keep at 4 °C. For HPLC injection, dilute 5 μL of this solution in 15 μL of acetonitrile and inject 5 μL. 3. Acetonitrile HPLC grade. 4. Chromatography vials and lids. 5. HPLC system (e.g., Agilent). 6. Fluorescence detector (e.g., Agilent 1260 Infinity II FLD Spectra). 7. Column heater. 8. AdvanceBio Glycan Mapping column, 2.1 × 150 mm, 2.7 μm (Agilent). 9. OpenLAB CDS ChemStation Software (Agilent).
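The dextran calibration ladder listed above converts glycan retention times into glucose units (GU). The Python sketch below shows that conversion. It assumes simple piecewise-linear interpolation between adjacent ladder peaks, whereas chromatography software often fits a polynomial or cubic spline instead. The function name `retention_to_gu` and the ladder retention times in the example are hypothetical placeholders, not data from this chapter.

```python
from bisect import bisect_right


def retention_to_gu(rt: float, ladder: list[tuple[float, float]]) -> float:
    """Convert a retention time (min) to glucose units (GU).

    `ladder` holds (retention_time_min, gu) pairs for the dextran ladder peaks.
    The result is linearly interpolated between the two neighbouring ladder
    peaks; outside the ladder range, the nearest segment is extrapolated.
    """
    pts = sorted(ladder)
    if len(pts) < 2:
        raise ValueError("need at least two ladder peaks")
    times = [t for t, _ in pts]
    # Pick the segment [i, i + 1] that brackets rt, clamped to the valid range.
    i = min(max(bisect_right(times, rt) - 1, 0), len(pts) - 2)
    (t0, g0), (t1, g1) = pts[i], pts[i + 1]
    return g0 + (rt - t0) * (g1 - g0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical ladder: retention times (min) of the GU 4-9 dextran peaks.
    ladder = [(10.0, 4.0), (13.0, 5.0), (15.5, 6.0), (17.7, 7.0), (19.6, 8.0), (21.3, 9.0)]
    for rt in (14.2, 18.65):
        print(f"RT {rt} min -> {retention_to_gu(rt, ladder):.2f} GU")
```

The GU values obtained this way are what are matched against a glycan database during peak assignment. Denser ladders or a smoother fit reduce interpolation error between widely spaced peaks.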
3 Methods
The methods described here outline the steps needed (a) to modify the structure of IgG N-glycans chemoenzymatically and (b) to analyze these structures. Analysis is needed before and after glycoengineering to monitor the modifications made. Two glycoengineering protocols are described for producing IgG with defined homogeneous glycans (Table 3). In the first protocol, the IgG is immobilized on a solid support and its glycans are then modified enzymatically. In the second, the glycans are modified while the IgG is in solution. The galactosylation index (GI) or sialylation index (SI) (Eq. 1) is used to compare the extent of modification achieved by each method.
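The Python sketch below is a worked version of Eq. 1. It computes the galactosylation or sialylation index from the relative amounts X0, X1, and X2, for example integrated HILIC peak areas summed by glycan class. As an illustration it uses the sialylated EG2-hFc percentages reported in the Fig. 4 legend. It assumes, although the legend does not state this, that all remaining glycans are asialylated. The function names are hypothetical.

```python
def glyco_index(x0: float, x1: float, x2: float) -> float:
    """Galactosylation (GI) or sialylation (SI) index as defined in Eq. 1.

    x0: agalactosylated (or asialylated) species
    x1: monogalactosylated (or monosialylated) species
    x2: digalactosylated (or disialylated) species
    Inputs may be peak areas or percentages; only their ratios matter.
    A mono-substituted biantennary glycan counts as half a fully substituted one.
    """
    total = x0 + x1 + x2
    if total <= 0:
        raise ValueError("total amount must be positive")
    return (0.5 * x1 + x2) / total


def si_from_percent(mono: float, di: float) -> float:
    """SI when only mono/di percentages are known; assumes the remainder is asialylated."""
    return glyco_index(100.0 - mono - di, mono, di)


if __name__ == "__main__":
    # Monosialylated / disialylated percentages from the Fig. 4 legend (EG2-hFc).
    samples = {
        "unmodified control": (11.5, 5.5),
        "2 cycles x 48 h, 5 μg enzyme": (58.0, 17.0),
        "4 cycles x 24 h, 10 μg enzyme": (16.0, 76.0),
    }
    for label, (mono, di) in samples.items():
        print(f"{label}: SI = {si_from_percent(mono, di):.2f}")
```

Under that assumption, the index rises from about 0.11 for the control to 0.46 and 0.84 for the two sialylation regimes. Taken this way, the four-cycle protocol converts most antennae to sialylated forms.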
Table 3 Targeted glycan modifications and respective enzyme reactions
Targeted glycan modification | Enzyme reaction stage 1 | Enzyme reaction stage 2
Trimannose | α2–3,6,8 neuraminidase (1.33 U/μL), β1–4-galactosidase S (0.67 U/μL), B1 buffer | β-N-acetylglucosaminidase S (0.13 U/μL), B1 buffer
Agalactosylated | α2–3,6,8 neuraminidase (1.33 U/μL), β1–4-galactosidase S (0.67 U/μL), B1 buffer | None
Galactosylated | α2–3,6,8 neuraminidase (1.33 U/μL), β(1,4) galactosyltransferase (0.67 mU/μL), UDP-galactose (1.5 mM), MnCl2 (10 mM), B2 buffer | None
α2–6 sialylated | α2–3 neuraminidase (1.33 U/μL), β(1,4) galactosyltransferase (0.67 mU/μL), UDP-galactose (1.5 mM), MnCl2 (10 mM), B2 buffer | 2 rounds of 48 h each: α(2–6) sialyltransferase (5 μg), CMP-NANA (1 mM), B2 buffer; or 4 rounds of 24 h each: α(2–6) sialyltransferase (10 μg), CMP-NANA (1 mM), B2 buffer
$$\mathrm{GI} = \frac{0.5\,X_1 + X_2}{X_0 + X_1 + X_2} \qquad (1)$$
where X0, X1, and X2 represent the amounts of agalactosylated (or asialylated), monogalactosylated (or monosialylated), and digalactosylated (or disialylated) species, respectively. The glycan modification methods presented are:
- agalactosylation (Fig. 1) and galactosylation (Fig. 2) of rituximab in solid and liquid phase (Subheadings 3.2.2 and 3.2.4, respectively);
- trimannose modification (Fig. 3) and sialylation (Fig. 4) of EG2-hFc in solid phase (Subheadings 3.2.3 and 3.2.5, respectively).
The sialylation method yields α2–6-linked sialylated species instead of the α2–3-linked glycans produced by CHO cells. A galactosylation kinetics method (Subheading 3.3) is presented to determine which incubation time gives the highest galactosylation of rituximab (Fig. 5) under the conditions used. The glycans are characterized before and after modification. Characterization starts with a deglycosylation step that releases the N-glycans covalently attached to the antibody. Two methods of glycan release are described. In the first (Subheading 3.4.1), the glycans are released while the antibody is held on a matrix. The second (Subheading 3.4.2) is a rapid method for analyzing antibodies in solution. Figure 6 shows that both methods give equivalent results. After release, the glycans are labeled and subjected to exoglycosidase digestions (Subheading 3.5). Glycan assignment by exoglycosidase sequencing (Fig. 7) is not a routine test; it is usually performed at the initial stage of glycan characterization. Once a reliable fingerprint of the undigested glycan profile has been established, it can be compared with the glycoengineered profile. Exoglycosidase digestions are especially useful for confirming monosaccharide linkages and identifying coeluting glycans. The N-glycans are incubated with exoglycosidase enzymes after the labeling step. The glycans are then separated by hydrophilic interaction liquid chromatography (HILIC) (Subheading 3.6). The retention times of the glycan peaks are converted into glucose units (GU) by calibration against a dextran ladder. The peak structures are then assigned by matching their GU values to a glycan database.
Fig. 1 Agalactosylation of rituximab. HILIC-HPLC profiles of N-glycans of rituximab unmodified control (GI 0.27) (a), agalactosylated in solid phase for 24 h (GI 0.10) (b), agalactosylated in liquid phase for 24 h (GI 0.03) (c), agalactosylated in solid phase for 48 h (GI 0.06) (d), and agalactosylated in liquid phase for 48 h (GI 0.03) (e)
3.1 Working Solution Sample Preparation
1. Using a metabolite analyzer, measure the IgG concentration of the original sample. 2. Rinse a 30 k MWCO spin filter by adding about 10 mL of water and centrifuging for 5–10 min at 4000 × g. Discard the filtered water. Once the device is rinsed, do not allow its membrane to dry out. 3. Load the sample onto the rinsed 30 k MWCO centrifugal filter. 4. Add up to 15 mL of water to the sample. 5. Centrifuge the tube at 4000 × g for 15–60 min, until the retentate volume is approximately 1 mL. 6. Repeat steps 4 and 5 two more times. 7. Collect the sample from the filter with a pipette. 8. Measure the new IgG concentration.
Fig. 2 Galactosylation of rituximab. HILIC-HPLC profiles of N-glycans of rituximab unmodified control (GI 0.27) (a), galactosylated in solid phase for 24 h (GI 0.88) (b), galactosylated in liquid phase for 24 h (GI 0.93) (c), galactosylated in solid phase for 48 h (GI 0.92) (d) and galactosylated in liquid phase for 48 h (GI 0.98) (e)
Fig. 3 FM3 glycoform modification of EG2-hFc. HILIC-HPLC profiles of N-glycans of EG2-hFc unmodified control (a) and modified to FM3 glycoform (80%) in solid phase for 24 h (b)
Fig. 4 Sialylation of EG2-hFc. HILIC-HPLC profiles of N-glycans of EG2-hFc unmodified control (11.5% monosialylated, 5.5% disialylated) (a), sialylated in solid phase using 5 μg of enzyme in two reaction cycles of 48 h each (58% monosialylated, 17% disialylated) (b), sialylated in solid phase using 10 μg of enzyme in four reaction cycles of 24 h each recycling the enzyme from the previous cycle along with 2 μg of fresh enzyme (16% monosialylated, 76% disialylated) (c)
Fig. 5 Galactosylation kinetics of rituximab. HILIC-HPLC profiles of rituximab N-glycans unmodified control (a), galactosylated in liquid phase for 1 h (b), 2 h (c), 4 h (d), 8 h (e), and 24 h (f)
9. Dilute the IgG in water to a final concentration of 4 μg/μL. This is the IgG stock solution; it can be stored at 4 °C and used within the same week. Alternatively, aliquot the sample and store it at −20 °C for several weeks (see Notes 2 and 3).
3.2 Glycoengineering
Two methods of glycoengineering are described for producing IgGs with homogeneous glycoforms, as summarized in Table 3. In the first method, the glycan structures are modified while the antibody is immobilized on a matrix (solid phase). In the second, the modification takes place while the IgG is in solution (liquid phase). The glycan species in the chromatograms are represented as shown in Fig. 8.
Fig. 6 Glycan profile of 40 μg of rituximab deglycosylated and labeled by liquid and solid phase methods
Fig. 7 Exoglycosidase sequential digestions of N-glycans of rituximab for glycan characterization. HILIC-HPLC profiles of rituximab N-glycans unmodified control (a); digested with NanB (b); ABS (c); ABS and SPG (d); ABS, SPG, and BKF (e); ABS, SPG, BKF, and GUH (f); ABS, SPG, BKF, GUH, and JBM (g)
Fig. 8 Glycan structures and respective names
3.2.1 Antibody Immobilization for Glycoengineering in Solid Phase
1. Remove the bottom cap of the spin column and loosen the lid. Place the column in a 2 mL microcentrifuge tube and centrifuge the Protein A HP SpinTrap™ for 1 min at 200 × g to remove the storage solution. 2. Place the column back in the 2 mL tube, add 600 μL of PBS buffer, and centrifuge for 1 min at 200 × g. Discard the flow-through. 3. Plug the bottom of the column and add 15 μL of IgG stock solution (60 μg of glycoprotein). Close the lid and incubate the column in the rotary shaker for 15 min at room temperature. 4. Remove the bottom plug of the column and place the column in a 2 mL tube. Centrifuge to remove the solution. 5. Wash the column with 600 μL of PBS buffer. 6. Replace the bottom plug and add the reaction buffer and enzymes for the targeted modification, as described in the following sections.
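The volumes in step 9 of Subheading 3.1 and step 3 above follow from C1V1 = C2V2 and from volume = mass / concentration. The Python sketch below reproduces that arithmetic. The helper names and the 10 μg/μL measured concentration in the example are hypothetical and are not values from this chapter.

```python
def dilution_volumes(c_stock: float, c_target: float, v_final: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2 (consistent units)."""
    if c_stock <= 0 or c_target <= 0:
        raise ValueError("concentrations must be positive")
    if c_target > c_stock:
        raise ValueError("target concentration exceeds stock concentration")
    v_stock = c_target * v_final / c_stock
    return v_stock, v_final - v_stock


def volume_for_mass(mass_ug: float, conc_ug_per_uL: float) -> float:
    """Volume (μL) of IgG stock that contains the requested mass (μg)."""
    if conc_ug_per_uL <= 0:
        raise ValueError("concentration must be positive")
    return mass_ug / conc_ug_per_uL


if __name__ == "__main__":
    # Hypothetical IgG concentration measured after buffer exchange (μg/μL).
    measured = 10.0
    v_igg, v_water = dilution_volumes(measured, 4.0, 100.0)
    print(f"{v_igg:.1f} μL IgG + {v_water:.1f} μL water -> 100 μL at 4 μg/μL")
    print(f"Solid-phase load of 60 μg: {volume_for_mass(60.0, 4.0):.1f} μL")
    print(f"Liquid-phase galactosylation, 50 μg: {volume_for_mass(50.0, 4.0):.1f} μL")
```

Fixing the stock at 4 μg/μL means that every later step can use round pipetting volumes (15 μL for 60 μg, 12.5 μL for 50 μg), whatever concentration the original sample had.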
3.2.2 Agalactosylation
1. For agalactosylation in solid phase, follow steps 1–6 of Subheading 3.2.1. For liquid-phase modification, add 15 μL (60 μg) of IgG stock solution to a 200 μL microcentrifuge tube. 2. Add 200 U (4 μL) of α2–3,6,8 neuraminidase to the reaction solution. 3. Add 100 U (12.5 μL) of β1–4-galactosidase S to the reaction solution. 4. Make the volume of the reaction solution up to 150 μL with B1 buffer. 5. Incubate the protein A column or microcentrifuge tube in the heat block at 37 °C, 300 rpm, for either 24 or 48 h.
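The agalactosylation set-up above can be checked with a small bookkeeping sketch. From each component's stock concentration (Subheadings 2.2.2 and 3.1), it derives the pipetting volume, the B1 buffer top-up to 150 μL, and the final enzyme activity per μL, which should match Table 3. The `Addition` class and helper functions are hypothetical illustrations. The IgG stock volume applies to the liquid-phase variant only; in solid phase the IgG is already bound to the column.

```python
from dataclasses import dataclass


@dataclass
class Addition:
    name: str
    amount: float      # total amount added (U of enzyme, or μg of IgG)
    stock_conc: float  # stock concentration in the same units per μL


def assemble(additions: list[Addition], total_uL: float) -> dict[str, float]:
    """Pipetting volume (μL) for each addition plus the buffer top-up to total_uL."""
    volumes = {a.name: a.amount / a.stock_conc for a in additions}
    used = sum(volumes.values())
    if used > total_uL:
        raise ValueError(f"additions ({used} μL) exceed the reaction volume")
    volumes["reaction buffer"] = total_uL - used
    return volumes


def final_conc(addition: Addition, total_uL: float) -> float:
    """Final concentration in the reaction (amount units per μL)."""
    return addition.amount / total_uL


if __name__ == "__main__":
    recipe = [
        Addition("IgG stock", 60.0, 4.0),          # liquid phase only
        Addition("neuraminidase", 200.0, 50.0),    # α2-3,6,8 neuraminidase, 50 U/μL
        Addition("galactosidase S", 100.0, 8.0),   # β1-4-galactosidase S, 8 U/μL
    ]
    for name, vol in assemble(recipe, total_uL=150.0).items():
        print(f"{name}: {vol:.1f} μL")
    for a in recipe[1:]:
        print(f"{a.name}: {final_conc(a, 150.0):.2f} U/μL")
```

The final activities come out at about 1.33 U/μL for the neuraminidase and 0.67 U/μL for galactosidase S, as listed in Table 3. The B1 top-up for the liquid-phase reaction is 118.5 μL.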
Cell Free Remodeling of Glycosylation of Antibodies

3.2.3 Trimannose Modification
1. Follow Subheading 3.2.2 using the solid phase to obtain agalactosylated IgG. Use an incubation time of 48 h.
2. Wash out the first-stage reaction buffer as described in Subheading 3.2.1, steps 1 and 2.
3. Plug the bottom of the column and add 20 U (5 μL) of β-N-acetylglucosaminidase S.
4. Bring the reaction volume up to 150 μL with B1 buffer.
5. Incubate the protein A column in a heat block at 37 °C, 300 rpm, for 72 h.
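The two-stage design of the trimannose modification follows from exoglycosidase specificity: each enzyme removes its target residue only when that residue is terminal, so sialic acid and galactose must be stripped before the antennary GlcNAc becomes accessible to β-N-acetylglucosaminidase S. The Python toy model below illustrates that ordering. It assumes a core-fucosylated biantennary glycan and complete digestion, and the `Glycan` class and its composition labels are illustrative only; they are not the glycan names used in Fig. 8.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple

# Residues on one antenna, listed from the core mannose outward (toy model).
Arm = Tuple[str, ...]


@dataclass(frozen=True)
class Glycan:
    """Simplified biantennary N-glycan on a Man3GlcNAc2 core, optionally core-fucosylated."""
    arm_a: Arm
    arm_b: Arm
    core_fuc: bool = True

    def label(self) -> str:
        """Compact composition label: core, then antennary residues, then fucose."""
        antennae = self.arm_a + self.arm_b
        parts = ["Man3GlcNAc2"]
        for residue in ("GlcNAc", "Gal", "NeuAc"):
            n = antennae.count(residue)
            if n:
                parts.append(f"{residue}{n}")
        if self.core_fuc:
            parts.append("Fuc1")
        return "-".join(parts)


def exoglycosidase(target: str) -> Callable[[Glycan], Glycan]:
    """Build a complete digestion that removes `target` only where it is terminal."""
    def digest(g: Glycan) -> Glycan:
        def trim(arm: Arm) -> Arm:
            return arm[:-1] if arm and arm[-1] == target else arm
        return replace(g, arm_a=trim(g.arm_a), arm_b=trim(g.arm_b))
    return digest


sialidase = exoglycosidase("NeuAc")     # stands in for the a2-3,6,8 neuraminidase
galactosidase = exoglycosidase("Gal")   # stands in for b1-4-galactosidase S
glcnacase = exoglycosidase("GlcNAc")    # stands in for b-N-acetylglucosaminidase S

if __name__ == "__main__":
    start = Glycan(("GlcNAc", "Gal", "NeuAc"), ("GlcNAc", "Gal"))  # monosialylated example
    stage1 = galactosidase(sialidase(start))  # agalactosylation (Subheading 3.2.2)
    stage2 = glcnacase(stage1)                # trimannose modification (Subheading 3.2.3)
    for name, g in (("start", start), ("stage 1", stage1), ("stage 2", stage2)):
        print(f"{name:>7}: {g.label()}")
```

Applying `galactosidase` to the starting glycan before `sialidase` leaves the sialylated antenna untouched, the same dependency that underlies the sequential exoglycosidase digestions in Fig. 7.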
3.2.4 Galactosylation
1. For galactosylation in solid phase, follow Subheading 3.2.1, steps 1–6. For liquid-phase modification, add 12.5 μL (50 μg) of IgG stock solution to a 200 μL microcentrifuge tube.
2. Add 200 U (4 μL) of α2–3,6,8 neuraminidase to the reaction solution.
3. Add 100 mU (4 μL) of β(1–4) galactosyltransferase to the reaction solution.
4. Add 4 μL of UDP-galactose working solution for a final concentration of 10 mM.
5. Add 10 μL of 0.15 M MnCl2 solution for a final concentration of 1.5 mM.
6. Bring the reaction volume up to 150 μL with B2 buffer.
7. Incubate the protein A column or microcentrifuge tube in a heat block at 37 °C, 300 rpm, for either 24 or 48 h.
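Final concentrations in the 150 μL galactosylation reaction follow from C1V1 = C2V2. A minimal Python sketch using the volumes and stock concentration as written in the steps above; the function names are hypothetical. Worked through this way, 10 μL of 0.15 M MnCl2 in 150 μL gives 10 mM rather than the 1.5 mM stated in step 5, so the stock concentration, added volume, or target concentration for that step should be confirmed before use. The UDP-galactose working concentration is not given in this section; the second calculation only shows what a 10 mM final concentration from 4 μL would require.

```python
def final_conc(stock_conc: float, added_ul: float, total_ul: float) -> float:
    """C2 = C1 * V1 / V2, returned in the same units as stock_conc."""
    return stock_conc * added_ul / total_ul


def stock_needed(final_c: float, added_ul: float, total_ul: float) -> float:
    """C1 = C2 * V2 / V1: stock concentration that yields final_c from added_ul in total_ul."""
    return final_c * total_ul / added_ul


if __name__ == "__main__":
    TOTAL_UL = 150.0  # reaction volume in Subheading 3.2.4
    # 0.15 M MnCl2 stock = 150 mM; 10 uL added (step 5)
    print(f"MnCl2 final: {final_conc(150.0, 10.0, TOTAL_UL):.1f} mM")
    # 4 uL UDP-galactose for a stated 10 mM final (step 4)
    print(f"UDP-galactose working solution implied: {stock_needed(10.0, 4.0, TOTAL_UL):.0f} mM")
```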
3.2.5 α2–6 Sialylation
1. Follow Subheading 3.2.4 using the solid phase to obtain galactosylated IgG, using α2–3,6,8 neuraminidase or α2–3 neuraminidase (200 U) to remove any original α2–3-linked sialic acid. Use an incubation time of 48 h.
2. Wash out the first-stage reaction buffer as described in Subheading 3.2.1, steps 1 and 2. Plug the bottom of the column. Proceed either to steps 3–7 below for IgG sialylation or to steps 8–15 for further sialylation.
3. Add 2.5 μg of α(2–6) sialyltransferase.
4. Add 10 μL of CMP-NANA working solution for a final concentration of 1 mM.
5. Bring the volume up to 150 μL with B2 buffer.
6. Incubate the protein A column in a heat block at 37 °C, 300 rpm, for 48 h.
7. Using the same column, repeat steps 3, 4, and 6 of this section.
8. Add 2.5 μg of α(2–6) sialyltransferase.
9. Add 10 μL of CMP-NANA working solution for a final concentration of 1 mM.
10. Bring the volume up to 150 μL with B2 buffer.
11. Incubate the protein A column in a heat block at 37 °C, 300 rpm, for 24 h.
12. Centrifuge the protein A column to collect the reaction mixture.
13. Filter the reaction mixture through a 10 kDa MWCO filter to recover the enzyme and remove the inhibitory CMP by-product.
14. Add 2 μg of fresh enzyme and the recycled enzyme to the protein A column.
15. Using the same column, repeat steps 9–14 of this section three more times.

3.3 Galactosylation Kinetics
A method to determine the kinetics of galactosylation of rituximab is described, to identify the incubation time of the IgG with enzyme and substrate that gives the highest yield of galactosylated IgG.
1. In a vacuum concentrator, dry down 12.5 μL of IgG working solution (4 μg/μL), equivalent to 50 μg of IgG. Prepare one sample for each incubation time.
2. Set a heat block to 37 °C.
3. Remove the dried sample from the vacuum concentrator and add to the tube 2 μL of UDP-galactose working solution (~60.6 μg), 5 μL of galactosyltransferase working solution (1.5 μg), and 13 μL of 1× galactosylation buffer, and mix well. The final sample volume is 20 μL.
4. Incubate the vials in the heat block at 37 °C for 1, 2, 4, 8, and 24 h. The incubation time needed for maximum galactosylation varies with the glycoprotein and incubation conditions, so test different times as needed. For rituximab, the galactosylation index increased from 0.27 (unmodified control) to 0.92 after 1 h, 0.96 after 2 h, and 0.98 from 4 h of incubation onward. When the reaction is complete, the sample can be used to analyze the glycan profile. The procedure involves deglycosylation, glycan labeling, and analysis according to the preferred method.
(a) For deglycosylation in liquid phase, after completing the galactosylation reaction, filter the sample through a 10 kDa filter to remove the UDP-galactose and buffer before glycan release.
(b) If the glycan analysis is performed in solid phase, the galactosyltransferase can be collected after the galactosylated IgG has bound to the protein A, buffer-exchanged, and reused.

3.4 Deglycosylation and Labeling
Two methods of glycan removal are described: the first releases the glycans while the antibody is held on a column (Subheading 3.4.1), and the second (Subheading 3.4.2) is a rapid method for analyzing antibodies in solution.
3.4.1 Deglycosylation in Solid Phase and Labeling
The method described here is based on [46], with the modification that 2-AB cleanup is performed in a HyperSep Diol™ well plate.
Antibody immobilization and deglycosylation
1. Remove the bottom cap of the spin trap column, loosen the lid, insert the column in a 2 mL centrifuge tube, and centrifuge the Protein A HP SpinTrap™ for 1 min at 200 × g to remove the storage solution.
2. Place the column back in the 2 mL tube, add 600 μL of PBS buffer to the column, and centrifuge for 1 min at 200 × g. Discard the PBS.
3. Plug the bottom of the column and add 25 μL of IgG stock solution, equivalent to 100 μg of glycoprotein. Close the lid and incubate the column on the rotary shaker for 15 min at room temperature.
4. Remove the bottom plug of the column and place the column in a 2 mL tube. Centrifuge to remove the solution.
5. Wash the column with 600 μL of PBS buffer.
6. Replace the bottom plug, add 150 μL of PBS buffer and 1.5 μL of PNGase F, and incubate overnight (18–24 h) on a rotary shaker at 37 °C.
Glycan collection
7. Remove the bottom cap and immediately place the protein A column in a fresh 2 mL microcentrifuge tube. Loosen the lid and centrifuge for 1 min at 800 × g to collect the glycans.
8. Replace the bottom cap, add 250 μL of water, close the lid, and shake gently to rinse the column.
9. Remove the bottom cap and immediately place the protein A column in the same collection tube used in step 7. Loosen the lid and centrifuge for 1 min at 800 × g. The final volume of the collected sample is approximately 400 μL.
10. Replace the bottom cap, add 600 μL of PBS buffer to the column, and keep it at 4 °C for the optional protein analysis step described in step 23.
11. Add 500 μL of water to the 10 kDa MWCO microcentrifuge filter and centrifuge for 5 min at 13,000 × g. Discard the filtered water.
12. Transfer the glycans collected in step 9 to the rinsed microcentrifuge filter and centrifuge for 10 min at 13,000 × g.
13. Dry down the sample in a vacuum concentrator (see Note 4).
2-AB glycan labeling and cleanup
14. Add 5 μL of 2-AB solution to the dried glycans, vortex, and spin down the tube. Incubate for 2 h at 65 °C.
15. Install the HyperSep Diol™ well plate and a waste tray on the vacuum manifold, add 1 mL of water to the well and apply

99%). The occupancy increased to >99% when the LmStt3-D paralog was expressed in Pichia pastoris, comparable to results obtained with antibodies expressed in CHO cells, and the resulting glycoforms were not significantly affected [49].
Recent Advances Toward Engineering Glycoproteins Using Modified Yeast. . .
[Fig. 5 panels: (a) wild type, yeast Stt3; (b) Δalg3Δalg11, yeast Stt3, inefficient transfer; (c) Δalg3Δalg11, Leishmania Stt3, with Kre-GnTI and Mnn2-GnTII]
Fig. 5 An approach to engineer N-glycosylation in yeast by targeting enzymes involved in lipid-linked oligosaccharide (LLO) biosynthesis. (a) The preferred substrate for the yeast oligosaccharyltransferase (OST) is Glc3Man9GlcNAc2. This N-glycan is efficiently transferred from the dolichol diphosphate lipid donor to the N-X-(T/S) sequon of a nascent polypeptide chain. (b) In the Δalg3Δalg11 knockout strain, the Man3GlcNAc2 glycoform accumulates and is inefficiently transferred to the nascent polypeptide. (c) Replacing the yeast OST complex with the Leishmania major Stt3 subunit, which prefers the accumulated glycoform, overcomes this limitation. These truncated N-glycans are then available for modification with mammalian enzymes.

2.2 Postpurification Methods to Add Mammalian Glycans to Yeast-Expressed Proteins
Previous sections covered the use of glycoengineered yeast strains to produce proteins with desired glycoforms. However, these glycoforms often lack a high degree of homogeneity, and some groups have therefore used purified glycan-modifying enzymes to engineer N-glycans. Two approaches can generate highly homogeneous preparations after purification of yeast-expressed glycoproteins: in vitro treatment with mammalian glycosyltransferases, or treatment with prokaryotic glycan hydrolases. The former is covered in detail elsewhere [50–54]. Glycan hydrolase treatment is one way to eliminate glycan heterogeneity in therapeutics when extended N-glycans are not required. Endoglycosidase (Endo) H (isolated from Streptomyces plicatus) efficiently trims yeast-produced oligomannose N-glycans to a single N-acetylglucosamine residue following purification (Fig. 6) [55]. Alternatively, coculturing two Pichia pastoris strains, one expressing the target glycoprotein and the other expressing Endo H, generated antibody comparable to antibody treated with Endo H after purification [56]. Many prokaryotic glycan hydrolases can also transfer N-glycans from a donor (a peptide or protein with an attached N-glycan, or a sugar oxazoline) to an acceptor protein through a transglycosylation reaction. Synthetic sugar oxazolines are transition-state analogues that promote efficient reactions with minimal product degradation [57].

Anjali Shenoy and Adam W. Barb

[Fig. 6 panels: Endo H; Endo M Y217F, Endo A N171A, Endo F3 D165A, Endo D N322Q, Endo S D233A/D233Q, and Endo S2 D184M, each with an oxazoline donor]

Fig. 6 Enzymes used for postpurification modification of yeast-produced N-glycans on glycoproteins. Endo H trims oligomannose glycoforms to a single N-acetylglucosamine residue (GlcNAc) that serves as a substrate for transglycosylase reactions. Endo M, A, D, and F3 recognize a broad range of protein substrates. Endo S and Endo S2 are largely specific for immunoglobulins, with Endo S2 capable of adding both complex and hybrid-type N-glycans to IgG1.
Endo M (GH85 family) possesses little native transglycosylation ability. Umekawa and coworkers identified the Endo M Y217F variant, which showed improved transglycosylation activity with reduced product hydrolysis (Fig. 6) [58]. Other enzymes that catalyze a variety of transglycosylation reactions include Endo A N171A, which transfers an oligomannose N-glycan [4]; Endo F3 D165A, which transfers triantennary N-glycans [59]; and Endo D, which transfers Man3GlcNAc1 to a fucosylated acceptor (Fig. 6) [3]. Protein-substrate-specific glycosidases have likewise provided a scaffold for protein-specific transglycosylases. Endo S (isolated from Streptococcus pyogenes) specifically hydrolyzes complex-type N-glycans from immunoglobulin G [60, 61]. This enzyme is an endo-β-1,4-N-acetylglucosaminidase that cleaves the chitobiose core of a complex-type N-glycan, generating IgG with a single N-acetylglucosamine attached to N297 plus the hydrolyzed N-glycan product [60]. A crystal structure of inactive Endo S (D233A/E235L) revealed that the glycosidase domain forms a (β/α)8 barrel with a large central channel; the loops surrounding this domain contact the N-glycan and thus determine N-glycan specificity [62]. Sequence alignment with a related enzyme, Endo F3, identified a residue that stabilizes the oxazolinium ion during catalysis (D233 in Endo S) [63]. Site-directed mutagenesis of this residue to A or Q introduced efficient transglycosylase activity, including the ability to transfer a complex-type glycan from an activated sugar-oxazoline donor to IgG bearing a single N-acetylglucosamine within an hour [64]. This mutant enzyme likewise proved efficient with sialylated oxazoline donors, providing >90% conversion of an antibody. Comparable variants of a related enzyme, EndoS2, also display transglycosylation activity, though with a broader substrate profile.
EndoS2 (also isolated from Streptococcus pyogenes) is likewise an endo-β-N-acetylglucosaminidase; it can hydrolyze biantennary sialylated complex-type N-glycans from both IgG and the acute-phase protein α1-acid glycoprotein (AGP) [65]. EndoS2 and EndoS share 37% amino acid identity, including conservation of the active site with the GH18 amino acid motif (DXXDXXE) and conserved tryptophan residues [65]. However, EndoS2 has broader N-glycan specificity than EndoS and can digest complex-, hybrid-, and oligomannose-type glycoforms [66]. This broad specificity was later explained by crystal structures showing that the catalytic glycoside hydrolase (GH) and carbohydrate-binding (CB) domains work in concert to accommodate a larger number of glycoforms within the active site [67]. Mutation of EndoS2 D184, the residue that stabilizes the oxazolinium ion during catalysis, introduced transglycosylase activity. The EndoS2 D184M variant efficiently transferred a complex-type N-glycan from the oxazoline substrate to IgG
that had previously been trimmed to display a single N-acetylglucosamine residue, at a rate at least six times greater than that of the best EndoS variant. In one example, this enzyme converted >95% of the substrate antibody using sialylated N-glycans. EndoS2 D184M also transferred oligomannose and hybrid-type N-glycans, though with lower efficiency [68]. The discovery of a highly efficient transglycosylating EndoS2 variant allowed one group to use a combined approach: antibodies were expressed in a Pichia pastoris platform lacking mannosyltransferases and then treated enzymatically to generate a homogeneously glycosylated product. After purification, the antibody N-glycans were trimmed to a single N-acetylglucosamine residue using Endo E. This product was then modified to an 80% level by EndoS2 D184M, generating antibodies with a Gal2GlcNAc2Man3GlcNAc2 glycoform [69].
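Conversion figures such as >90%, >95%, or 80% can refer to individual glycosylation sites or to whole antibodies, and the difference matters for homogeneity because each IgG carries one Fc N-glycan on each heavy chain. A Python sketch of the resulting glycoform mixture, assuming the quoted figure is a per-site conversion and that the two sites react independently (a binomial model); the cited studies may define conversion differently, and `glycoform_distribution` is a hypothetical helper.

```python
from math import comb
from typing import Dict


def glycoform_distribution(p_site: float, n_sites: int = 2) -> Dict[int, float]:
    """Probability that exactly k of n_sites N-glycan sites are remodeled.

    Binomial model: each site is converted independently with probability p_site.
    An IgG carries one Fc N-glycan per heavy chain, hence n_sites = 2 by default.
    """
    if not 0.0 <= p_site <= 1.0:
        raise ValueError("p_site must lie between 0 and 1")
    return {
        k: comb(n_sites, k) * p_site**k * (1.0 - p_site) ** (n_sites - k)
        for k in range(n_sites + 1)
    }


if __name__ == "__main__":
    for p in (0.80, 0.95):
        d = glycoform_distribution(p)
        print(f"per-site {p:.0%}: both chains {d[2]:.1%}, one chain {d[1]:.1%}, none {d[0]:.1%}")
```

Under these assumptions, 80% per-site conversion leaves only 64% of antibodies remodeled on both heavy chains, with 32% remodeled on one chain.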
3 Yeast Surface Display (YSD) Using Glycoengineered Yeast Strains

This review has focused on glycoengineering yeast to produce mammalian glycoforms, but another advantage of a yeast expression platform is the ability to create and screen yeast display libraries. Because of its sophisticated protein quality control machinery and its ability to carry out eukaryotic posttranslational modifications, yeast is an excellent platform for diversifying glycoproteins. This display technology involves screening diverse protein libraries for a desired phenotype using techniques such as fluorescence-activated cell sorting (FACS) and magnetic-activated cell sorting (MACS) [70]. This technique has identified several high-affinity binders, including a single-chain variable fragment (scFv) with an affinity as high as 48 fM [71]. Other proteins diversified by this method include antigen-binding fragment (Fab) domains that recognize a variety of antigens. Examples with low-nM to μM affinity include anti-HIV antibodies (10–43 nM) [72] and anti-TNF-α antibodies [73], as well as the major histocompatibility complex HLA-DR1 [74] and the T cell coreceptor CD4 (5.3–8.8 μM) [75]. YSD has also yielded an antibody with 2.1 pM affinity against H1N1 [76]. Fab domains do not contain a conserved N-glycosylation site, unlike the crystallizable fragment (Fc) of IgG1, which also requires an N-glycan for receptor binding. Although Fab display on yeast is straightforward, it is much more challenging to display functional forms of Fc and other glycoproteins because of improper modifications, folding defects, and hypermannosylation. One notable example is the epidermal growth factor receptor (EGFR), with extensive posttranslational modifications including 25 disulfide
bonds and 12 potential N-glycosylation sites. This protein is partially misfolded when displayed on the yeast surface, as determined by flow cytometry with epitope-specific monoclonal antibodies, so it was necessary to screen for mutations that allow better folding on the surface [77]. Similar problems were observed with T cell receptors (TCR) displayed on the yeast surface [78]. Hypermannosylation caused problems when the HIV gp120 and gp140 proteins were displayed on the yeast surface: the large N-glycans blocked antibody binding to specific epitopes, which made protein expression levels difficult to detect [79]. N-glycan processing in Pichia pastoris is less extensive than in Saccharomyces cerevisiae, and several glycoengineered Pichia pastoris strains capable of producing complex-type glycoforms have been created. As a result, Pichia Surface Display (PSD) was developed to display proteins carrying complex-type glycoforms on the yeast surface. PSD uses the same agglutinin-based anchoring system as Saccharomyces cerevisiae YSD. A GS115 (wild-type) strain and a Man5-glycoform-producing Pichia pastoris strain were used to validate this technique [80]. To create display libraries using PSD, cells are transformed with linearized plasmid that integrates into the genome, whereas YSD uses episomal plasmid DNA. As a result, a higher percentage of cells test positive for surface protein in PSD than in YSD [81]. Antibodies expressed in CHO cells and in a glycoengineered Pichia pastoris strain were comparable in their biochemical characteristics [82], suggesting that affinities measured through PSD would be comparable to those measured with protein isolated from mammalian systems.
4 Other Notable Applications of Pichia and Saccharomyces Surface Display

YSD has also been leveraged for vaccine development. In this application, the antigen of interest is anchored to the yeast surface, and the cells are injected into mice to induce an immune response. The antibodies produced in response to the antigen are then isolated from the mice and studied further. β-Glucan from Saccharomyces cerevisiae has been shown to act as an adjuvant [83]. Antigens used in this application include portions of the hepatitis B surface antigen (HBsAg) [84], the VP7 capsid protein of grass carp reovirus (GCRV) [85], the Eno1 protein of Candida albicans [86], and the H5 antigen from highly pathogenic avian influenza (HPAI). These proteins were expressed on the surface of either wild-type or glycoengineered deletion strains [84, 85]. The rationale behind expressing antigens in deletion strains was to eliminate hypermannosylation and allow for epitopes
that would otherwise be shielded by large N-glycans to be exposed and elicit a better immune response.
PSD was used to create libraries of camelid heavy-chain antibodies using a strain that generates Man5 glycoforms. The library size is estimated at 10⁷, considerably smaller than YSD libraries (10⁸ to 10⁹). After two rounds of sorting, antibodies with KD values of 1.6 and 3.2 nM were isolated. These KD values were determined by flow cytometry in a titration experiment with antigen-GFP [87].
Other aspects of this surface display technology have been modified for further improvement. One group identified the anchor protein ScSed1p to anchor the protein of interest (a Fab domain) on the yeast surface, as an alternative to the GPI anchor typically used to display proteins on yeast cells [88]. Another group later used this anchor protein to anchor Fc on the surface and create PSD libraries of antibodies against the antigen PCSK9; this expression platform could simultaneously select for and secrete a displayed protein of interest [89]. Another, indirect approach for displaying proteins through PSD used the E. coli-derived proteins Im7 and CL7 (KD 10⁻¹⁴ to 10⁻¹⁷ M). Induced yeast cells expressing ScSed1p-Im7 were stained with purified CL7-GFP, resulting in more than 99% of the induced cells displaying the Sed1p-Im7/CL7-GFP complex on the yeast surface [90].
PSD has also been used to immobilize enzymes on the yeast surface for catalysis, and these methods can likewise be used to engineer ER, Golgi, and secreted enzymes that often undergo extensive processing and modification. Immobilizing enzymes on yeast surfaces is considered a milder alternative to chemical methods [91]. Examples include lipase B52 [92], phytase [93], and trehalose synthase [94].
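The KD values above came from titrating displayed binders with antigen-GFP and reading fluorescence by flow cytometry. One common way to analyze such data is to fit a one-site binding model, MFI = background + amplitude × [L] / (KD + [L]), assuming ligand depletion is negligible. The Python sketch below does this with a grid search over KD, solving background and amplitude by linear least squares at each trial KD; the data are synthetic and generated with KD = 1.6 nM purely for illustration, the helper names are hypothetical, and this is not necessarily the fitting procedure used in [87].

```python
import math
from typing import List, Sequence, Tuple


def one_site(conc: float, kd: float) -> float:
    """Fractional occupancy of the displayed binder at free ligand conc (same units as kd)."""
    return conc / (kd + conc)


def fit_kd(conc: Sequence[float], mfi: Sequence[float],
           kd_grid: Sequence[float]) -> Tuple[float, float, float]:
    """Fit MFI = bg + amp * conc / (KD + conc) by grid search over KD.

    For each trial KD the model is linear in bg and amp, so those two are
    solved by ordinary least squares; the KD with the smallest residual wins.
    """
    best = None
    for kd in kd_grid:
        x = [one_site(c, kd) for c in conc]
        n = len(x)
        mx, my = sum(x) / n, sum(mfi) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        if sxx == 0:
            continue
        amp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, mfi)) / sxx
        bg = my - amp * mx
        sse = sum((yi - bg - amp * xi) ** 2 for xi, yi in zip(x, mfi))
        if best is None or sse < best[0]:
            best = (sse, kd, bg, amp)
    if best is None:
        raise ValueError("no usable KD in grid")
    _, kd, bg, amp = best
    return kd, bg, amp


def log_grid(lo: float, hi: float, n: int) -> List[float]:
    """n points evenly spaced in log10 between lo and hi."""
    a, b = math.log10(lo), math.log10(hi)
    return [10 ** (a + (b - a) * i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    # Synthetic, noise-free titration (nM) generated with KD = 1.6 nM for illustration.
    antigen_nM = [0.05, 0.1, 0.3, 1, 3, 10, 30, 100]
    mfi = [200 + 5000 * one_site(c, 1.6) for c in antigen_nM]
    kd, bg, amp = fit_kd(antigen_nM, mfi, log_grid(0.01, 1000, 2001))
    print(f"KD ~ {kd:.2f} nM (background {bg:.0f}, amplitude {amp:.0f})")
```

A grid search keeps the sketch dependency-free; with real, noisy data a nonlinear least-squares routine and replicate titrations would normally be used, and the titration should span concentrations well below and above the expected KD.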
In some cases, the displayed enzyme was more stable on Pichia pastoris than on Saccharomyces cerevisiae; however, Pichia pastoris required a longer induction time to express the protein [92]. Recently, groups have focused on increasing display efficiency and developing novel approaches to display proteins on the yeast surface. One group displayed a protein of interest (a nanobody library) by fusing its C-terminus to the N-terminus of Aga2p, which was in turn fused to acyl carrier protein (ACP). The ACP fusion allows orthogonal labeling (CoA-647) or immobilization on a chip (CoA-biotin) for affinity measurements, which would make validating hits obtained from sorting libraries less cumbersome [95]. To increase surface expression levels, one group created a bicistronic system that expresses both the heavy chain and the light chain of the antibody Fab domain from a single GAL1 promoter through ribosome skipping [96]. Another
group focused on increasing the amount of functional surface-displayed protein by coexpressing protein disulfide isomerase (PDI) and the ER-resident chaperone Kar2p while displaying Fab domains (against TNF-α) on the surface. This approach lowered heavy- and light-chain expression on the cell surface but greatly increased the fraction of displayed Fab that could bind the antigen TNF-α [97].
5 Discussion

The availability of several glycoengineered Pichia pastoris and Saccharomyces cerevisiae strains, combined with the ability to create diverse protein display libraries, provides unique opportunities to engineer glycoproteins that carry appropriate modifications. The PSD platform provides proof of concept that glycoproteins can be displayed on the surface of glycoengineered yeast strains. However, creating protein libraries with this technology (integration of linearized plasmid) is more cumbersome than with YSD (in vivo library assembly through homologous recombination). The PSD transformation protocol may yield a higher percentage of cells expressing surface-displayed protein, but the resulting libraries are smaller (10⁶ to 10⁷ vs. 10⁸ to 10⁹ for Saccharomyces cerevisiae). One major advantage of PSD is the availability of several glycoengineered strains; using these strains for protein engineering would allow the displayed protein to carry the desired glycoform. In Saccharomyces cerevisiae, by contrast, it has been difficult to create strains that produce homogeneous glycoforms. This is perhaps due to the extensive hypermannosylation of proteins expressed in this yeast, which blocks epitopes from detection by large reagents such as antibodies and thus interferes with detecting surface expression. Furthermore, eliminating these reactions impairs strain viability through off-target effects, which limits the ability to sort and screen efficiently. Another possible approach to nonhomogeneous glycan profiles in these strains would be to remodel glycoproteins with transglycosylase variants of endoglycosidases. This approach requires multiple enzymatic steps and sugar oxazolines; remodeling glycoproteins in this manner is feasible in a laboratory setting but may face challenges in scale-up for therapeutic biologic production.
Recent developments in this field have focused on more efficient methods to stain cells for FACS and on increasing the amount of functional protein displayed on the yeast surface. So far, however, no platform has been developed that allows the creation of diverse libraries of glycoproteins with homogeneous
glycoforms. Here we have outlined progress toward such a goal, and we note that efficient glycoprotein display and engineering is likely close at hand. Such a platform will allow effective screening of vast and diverse glycoprotein libraries, including T cell receptors, major histocompatibility complexes, and Fc domains, which is not currently possible with existing display technologies.

References
1. Sinclair AM, Elliott S (2005) Glycoengineering: the effect of glycosylation on the properties of therapeutic proteins. J Pharm Sci 94:1626–1635. https://doi.org/10.1002/jps.20319
2. Egrie JC, Dwyer E, Browne JK, Hitz A, Lykos MA (2003) Darbepoetin alfa has a longer circulating half-life and greater in vivo potency than recombinant human erythropoietin. Exp Hematol 31:290–299. https://doi.org/10.1016/S0301-472X(03)00006-7
3. Fan SQ, Huang W, Wang LX (2012) Remarkable transglycosylation activity of glycosynthase mutants of endo-D, an endo-β-N-acetylglucosaminidase from Streptococcus pneumoniae. J Biol Chem 287:11272–11281. https://doi.org/10.1074/jbc.M112.340497
4. Huang W, Li C, Li B, Umekawa M, Yamamoto K, Zhang X, Wang LX (2009) Glycosynthases enable a highly efficient chemoenzymatic synthesis of N-glycoproteins carrying intact natural N-glycans. J Am Chem Soc 131:2214–2223. https://doi.org/10.1021/ja8074677
5. Wildt S, Gerngross TU (2005) The humanization of N-glycosylation pathways in yeast. Nat Rev Microbiol 3(2):119–128. https://doi.org/10.1038/nrmicro1087
6. Chen R (2015) The sweet branch of metabolic engineering: cherry-picking the low-hanging sugary fruits. Microb Cell Factories 14:1–10. https://doi.org/10.1186/s12934-015-0389-z
7. Piirainen MA, De Ruijter JC, Koskela EV, Frey AD (2014) Glycoengineering of yeasts from the perspective of glycosylation efficiency. New Biotechnol 31:532–537. https://doi.org/10.1016/j.nbt.2014.03.001
8. De Pourcq K, De Schutter K, Callewaert N (2010) Engineering of glycosylation in yeast and other fungi: current state and perspectives. Appl Microbiol Biotechnol 87:1617–1631. https://doi.org/10.1007/s00253-010-2721-1
9. Khan AH, Bayat H, Rajabibazl M, Sabri S, Rahimpour A (2017) Humanizing glycosylation pathways in eukaryotic expression systems. World J Microbiol Biotechnol 33:4. https://doi.org/10.1007/s11274-016-2172-7
10. Boder E, Wittrup K (1997) Yeast surface display for screening combinatorial polypeptide libraries. Nat Biotechnol 15:553–557. https://doi.org/10.1038/nbt0697-553
11. Ellgaard L, Helenius A (2003) Quality control in the endoplasmic reticulum. Nat Rev Mol Cell Biol 4:181–191. https://doi.org/10.1038/nrm1052
12. Bowley DR, Labrijn AF, Zwick MB, Burton DR (2007) Antigen selection from an HIV-1 immune antibody library displayed on yeast yields many novel antibodies compared to selection from the same library displayed on phage. Protein Eng Des Sel 20:81–90. https://doi.org/10.1093/protein/gzl057
13. Schreuder MP, Mooren ATA, Toschka HY, Verrips CT, Klis FM (1996) Immobilizing proteins on the surface of yeast cells. Trends Biotechnol 14:115–120
14. Kukuruzinska MA, MLE B, Jackson BJ (1987) Protein glycosylation in yeast. Annu Rev Biochem 56:915–944
15. Cereghino JL, Cregg JM (2000) Heterologous protein expression in the methylotrophic yeast Pichia pastoris. FEMS Microbiol Rev 24:45–66. https://doi.org/10.1016/S0168-6445(99)00029-7
16. Goetze AM, Liu YD, Zhang Z, Shah B, Lee E, Bondarenko PV, Flynn GC (2011) High-mannose glycans on the Fc region of therapeutic IgG antibodies increase serum clearance in humans. Glycobiology 21:949–959. https://doi.org/10.1093/glycob/cwr027
17. Baudin A, Ozier-Kalogeropoulos O, Denouel A, Lacroute F, Cullin C (1993) A simple and efficient method for direct gene deletion in Saccharomyces cerevisiae. Nucleic Acids Res 21:3329–3330. https://doi.org/10.1093/nar/21.14.3329
18. Nakayama K, Nagasu T, Shimma Y, Kuromitsu J, Jigami Y (1992) OCH1 encodes a novel membrane bound mannosyltransferase: outer chain elongation of asparagine-linked oligosaccharides. EMBO J 11:2511–2519. https://doi.org/10.1002/j.1460-2075.1992.tb05316.x
19. Nakayama KI, Nakanishi-Shindo Y, Tanaka A, Haga-Toda Y, Jigami Y (1997) Substrate specificity of α-1,6-mannosyltransferase that initiates N-linked mannose outer chain elongation in Saccharomyces cerevisiae. FEBS Lett 412:547–550. https://doi.org/10.1016/S0014-5793(97)00634-0
20. Nakanishi-Shindo Y, Nakayama KI, Tanaka A, Toda Y, Jigami Y (1993) Structure of the N-linked oligosaccharides that show the complete loss of α-1,6-polymannose outer chain from och1, och1 mnn1, and och1 mnn1 alg3 mutants of Saccharomyces cerevisiae. J Biol Chem 268:26338–26345
21. Nagasu T, Shimma Y-I, Nakanishi Y, Kuromitsu J, Iwama K, Nakayama K-I, Suzuki K, Jigami Y (1992) Isolation of new temperature-sensitive mutants of Saccharomyces cerevisiae deficient in mannose outer chain elongation. Yeast 8:535–547. https://doi.org/10.1002/yea.320080705
22. Zhou J, Zhang H, Liu X, Wang PG, Qi Q (2007) Influence of N-glycosylation on Saccharomyces cerevisiae morphology: a Golgi glycosylation mutant shows cell division defects. Curr Microbiol 55:198–204. https://doi.org/10.1007/s00284-006-0585-5
23. Abe H, Takaoka Y, Chiba Y, Sato N, Ohgiya S, Itadani A, Hirashima M, Shimoda C, Jigami Y, Nakayama KI (2009) Development of valuable yeast strains using a novel mutagenesis technique for the effective production of therapeutic glycoproteins. Glycobiology 19:428–436. https://doi.org/10.1093/glycob/cwn157
24. Martinet W, Maras M, Saelens X, Jou WM, Contreras R (1998) Modification of the protein glycosylation pathway in the methylotrophic yeast Pichia pastoris. Biotechnol Lett 20:1171–1177. https://doi.org/10.1023/A:1005340806821
25. Chiba Y, Suzuki M, Yoshida S, Yoshida A, Ikenaga H, Takeuchi M, Jigami Y, Ichishima E (1998) Production of human compatible high mannose-type (Man5GlcNAc2) sugar chains in Saccharomyces cerevisiae. J Biol Chem 273:26298–26304. https://doi.org/10.1074/jbc.273.41.26298
26. Callewaert N, Y, W. L., Cadirgi H, Geysens S, Saelens X, Min W, Y R. C (2001) Use of HDEL-tagged Trichoderma reesei mannosyl oligosaccharide 1,2-α-D-mannosidase for N-glycan engineering in Pichia pastoris. FEBS Letters 503. https://doi.org/10.1016/s0014-5793(01)02676-x
27. Choi BK, Bobrowicz P, Davidson RC, Hamilton SR, Kung DH, Li H, Miele RG, Nett JH, Wildt S, Gerngross TU (2003) Use of combinatorial genetic libraries to humanize N-linked glycosylation in the yeast Pichia pastoris. Proc Natl Acad Sci U S A 100:5022–5027. https://doi.org/10.1073/pnas.0931263100
28. Muraoka M, Miki T, Ishida N, Hara T, Kawakita M (2007) Variety of nucleotide sugar transporters with respect to the interaction with nucleoside mono- and diphosphates. J Biol Chem 282:24615–24622. https://doi.org/10.1074/jbc.M611358200
29. Hamilton SR, Davidson RC, Sethuraman N, Nett JH, Jiang Y, Rios S, Bobrowicz P, Stadheim TA, Li H, Choi B, Hopkins D, Wischnewski H, Roser J, Mitchell T, Strawbridge RR, Hoopes J, Wildt S, Gerngross TU (2006) Humanization of yeast to. Science 313:1441–1443. https://doi.org/10.1126/science.1130256
30. Vervecken W, Kaigorodov V, Callewaert N, Geysens S, De Vusser K, Contreras R (2004) In vivo synthesis of mammalian-like, hybrid-type N-glycans in Pichia pastoris. Appl Environ Microbiol 70:2639–2646. https://doi.org/10.1128/AEM.70.5.2639-2646.2004
31. Bobrowicz P, Davidson RC, Li H, Potgieter TI, Nett JH, Hamilton SR, Stadheim TA, Miele RG, Bobrowicz B, Mitchell T, Rausch S, Renfer E, Wildt S (2004) Engineering of an artificial glycosylation pathway blocked in core oligosaccharide assembly in the yeast Pichia pastoris: production of complex humanized glycoproteins with terminal galactose. Glycobiology 14:757–766. https://doi.org/10.1093/glycob/cwh104
32. Cao J, Perez-Pinera P, Lowenhaupt K, Wu MR, Purcell O, De La Fuente-Nunez C, Lu TK (2018) Versatile and on-demand biologics co-production in yeast. Nat Commun 9:77. https://doi.org/10.1038/s41467-017-02587-w
33. Li H, Sethuraman N, Stadheim TA, Zha D, Prinz B, Ballew N, Bobrowicz P, Choi B, Cook WJ, Cukan M, Houston-Cummings NR, Davidson R, Gong B, Hamilton SR, Hoopes JP, Jiang Y, Kim N, Mansfield R, Nett JH, Rios S, Strawbridge R, Wildt S, Gerngross TU (2006) Optimization of humanized IgGs in glycoengineered Pichia pastoris. Nat Biotechnol 24:210–215. https://doi.org/10.1038/nbt1178
34. Zhang G-W, Shen L, Zhong W, Xiong Y, Zhang LI, Tao HW (2016) Myocardial extraction from newborn rats. Physiol Behav 176:139–148. https://doi.org/10.1016/j.physbeh.2017.03.040
35. Jiang Y, Li F, Button M, Cukan M, Moore R, Sharkey N, Li H (2010) A high-throughput purification of monoclonal antibodies from glycoengineered Pichia pastoris. Protein Expr Purif 74:9–15. https://doi.org/10.1016/j.pep.2010.04.016
36. Potgieter TI, Kersey SD, Mallem MR, Nylen AC, D'Anjou M (2010) Antibody expression kinetics in glycoengineered Pichia pastoris. Biotechnol Bioeng 106:918–927. https://doi.org/10.1002/bit.22756
37. Knauer R, Lehle L (1999) The oligosaccharyltransferase complex from yeast. Biochim Biophys Acta Gen Subj 1426:259–273. https://doi.org/10.1016/S0304-4165(98)00128-7
38. Cain JA, Dale AL, Niewold P, Klare WP, Man L, White MY, Scott NE, Cordwell SJ (2019) Proteomics reveals multiple phenotypes associated with N-linked glycosylation in Campylobacter jejuni. Mol Cell Proteomics 18(4):715–734. https://doi.org/10.1074/mcp.RA118.001199
39. Shams-Eldin H, Blaschke T, Anhlan D, Niehus S, Müller J, Azzouz N, Schwarz RT (2005) High-level expression of the Toxoplasma gondii STT3 gene is required for suppression of the yeast STT3 gene mutation. Mol Biochem Parasitol 143:6–11. https://doi.org/10.1016/j.molbiopara.2005.04.008
40. Proteins Y (1981) N-glycosylation of yeast proteins. Control 108:101–108
41. Huffaker TC, Robbins PW (1983) Yeast mutants deficient in protein glycosylation. Proc Natl Acad Sci U S A 80:7466–7470. https://doi.org/10.1073/pnas.80.24.7466
42. Castro O, Movsichoff F, Parodi AJ (2006) Preferential transfer of the complete glycan is determined by the oligosaccharyltransferase complex and not by the catalytic subunit. Proc Natl Acad Sci U S A 103:14756–14760. https://doi.org/10.1073/pnas.0607086103
43. Mokas S, Mills JR, Garreau C, Fournier M-J, Robert F, Arya P, Kaufman RJ, Pelletier J, Mazroui R (2009) Uncoupling stress granule assembly and translation initiation inhibition. Mol Biol Cell 20:2673–2683.
https://doi. org/10.1091/mbc.E08 44. Hese K, Otto C, Routier H (2009) The yeast oligosaccharyltransferase complex can be replaced by STT3 from Leishmania major. Glycobiology 19:160–171. https://doi.org/10. 1093/glycob/cwn118 45. Davidson RC, Nett JH, Renfer E, Li H, Stadheim TA, Miller BJ, Miele RG, Hamilton SR,
Choi BK, Mitchell TI, Wildt S (2004) Functional analysis of the ALG3 gene encoding the Dol-P-man: Man5GlcNAc2-PP-Dol mannosyltransferase enzyme of P. pastoris. Glycobiology 14:399–407. https://doi.org/10.1093/ glycob/cwh023 46. Cipollo JF, Trimble RB, Chi JH, Yan Q, Dean N (2001) The yeast ALG11 gene specifies addition of the terminal α1,2-man to the Man5GlcNAc2-PP-dolichol N-glycosylation intermediate formed on the cytosolic side of the endoplasmic reticulum. J Biol Chem 276:21828–21840. https://doi.org/10. 1074/jbc.M010896200 47. Nasab FP, Aebi M, Bernhard G, Frey AD (2013) A combined system for engineering glycosylation efficiency and glycan structure in Saccharomyces cerevisiae. Appl Environ Microbiol 79:997–1007. https://doi.org/10.1128/ AEM.02817-12 48. Helenius J, Ng DTW, Marolda CL, Walter P, Valvano MA, Aebi M (2002) Translocation of lipid-linked oligosaccharides across the ER membrane requires Rft1 protein. Nature 1436:447–450 49. Choi B, Warburton S, Lin H (2012) Improvement of N-glycan site occupancy of therapeutic glycoproteins produced in Pichia pastoris. Appl Microbiol Biotechnol 95:671–682. https:// doi.org/10.1007/s00253-012-4067-3 50. Li C, Wang LX (2018) Chemoenzymatic methods for the synthesis of glycoproteins. Chem Rev 118:8359–8413. https://doi.org/ 10.1021/acs.chemrev.8b00238 51. Wang LX, Amin MN (2014) Chemical and chemoenzymatic synthesis of glycoproteins for deciphering functions. Chem Biol 21:51–66. https://doi.org/10.1016/j. chembiol.2014.01.001 52. Giddens JP, Lomino JV, DiLillo DJ, Ravetch JV, Wang LX (2018) Site-selective chemoenzymatic glycoengineering of fab and fc glycans of a therapeutic antibody. Proc Natl Acad Sci U S A 115:12023–12027. https://doi.org/10. 1073/pnas.1812833115 53. Li T, DiLillo DJ, Bournazos S, Giddens JP, Ravetch JV, Wang LX (2017) Modulating IgG effector function by fc glycan engineering. Proc Natl Acad Sci U S A 114:3485–3490. https://doi.org/10.1073/pnas.1702173114 54. 
Steinkellner H, Castilho A (2015) N-GlycoEngineering in Plants: Update on Strategies and Major Achievements. Methods Mol Bio 1321:195–212. https://doi.org/10.1007/ 978-1-4939-2760-9_14. PMID: 26082224 55. Robbins PW, Wirth DF, Hering C (1981) Expression of the Streptomyces enzyme
Recent Advances Toward Engineering Glycoproteins Using Modified Yeast. . . endoglycosidase H in Escherichia coli. J Biol Chem 256:10640–10644 56. Wang F, Wang X, Yu X, Fu L, Liu Y, Ma L, Zhai C (2015) High-level expression of endo-β-Nacetylglucosaminidase H from Streptomyces plicatus in Pichia pastoris and its application for the deglycosylation of glycoproteins. PLoS One 10:e0120458. https://doi.org/10. 1371/journal.pone.0120458 57. Fujita M, Ichiro SS, Haneda K, Inazu T, Takegawa K, Yamamoto K (2001) A novel disaccharide substrate having 1,2-oxazoline moiety for detection of transglycosylating activity of endoglycosidases. Biochim Biophys Acta Gen Subj 1528:9–14. https://doi.org/ 10.1016/S0304-4165(01)00164-7 58. Umekawa M, Huang W, Li B, Fujita K, Ashida H, Wang LX, Yamamoto K (2008) Mutants of Mucor hiemalis endo-β-N-acetylglucosaminidase show enhanced transglycosylation and glycosynthase-like activities. J Biol Chem 283:4469–4479. https://doi.org/10. 1074/jbc.M707137200 59. Giddens JP, Lomino JV, Amin MN, Wang LX (2016) Endo-F3 glycosynthase mutants enable chemoenzymatic synthesis of core-fucosylated triantennary complex type glycopeptides and glycoproteins. J Biol Chem 291:9356–9370. https://doi.org/10.1074/jbc.M116.721597 60. Collin M, Olse´n A (2001) EndoS, a novel secreted protein from Streptococcus pyogenes with endoglycosidase activity on human IgG. EMBO J 20:3046–3055. https://doi.org/10. 1093/emboj/20.12.3046 61. Allhorn M, Olin AI, Nimmerjahn F, Colin M (2008) Human IgC/FcγR interactions are modulated by streptococcal IgG glycan hydrolysis. PLoS One 3:e1413. https://doi.org/10. 1371/journal.pone.0001413 62. Trastoy B, Klontz E, Orwenyo J, Marina A, Wang LX, Sundberg EJ, Guerin ME (2018) Structural basis for the recognition of complex-type N-glycans by endoglycosidase S. Nat Commun 9:1–11. https://doi.org/10. 1038/s41467-018-04300-x 63. 
Allhorn M, Olse´n A, Collin M (2008) EndoS from Streptococcus pyogenes is hydrolyzed by the cysteine proteinase SpeB and requires glutamic acid 235 and tryptophans for IgG glycanhydrolyzing activity. BMC Microbiol 8:1–10. https://doi.org/10.1186/1471-2180-8-3 64. Manuscript A (2013) Gain of Functions 134:12308–12318. https://doi.org/10. 1021/ja3051266.Chemoenzymatic 65. Sjo¨gren J, Struwe WB, Cosgrave EFJ, Rudd PM, Stervander M, Allhorn M, Hollands A, Nizet V, Collin M (2013) EndoS2 is a unique
203
and conserved enzyme of serotype M49 group a streptococcus that hydrolyses N-linked glycans on IgG and α1-acid glycoprotein. Biochem J 455:107–118. https://doi.org/10. 1042/BJ20130126 66. Sjo¨gren J, Cosgrave EFJ, Allhorn M, Nordgren M, Bjo¨rk S, Olsson F, Fredriksson S, Collin M (2015) EndoS and EndoS2 hydrolyze fc-glycans on therapeutic antibodies with different glycoform selectivity and can be used for rapid quantification of high-mannose glycans. Glycobiology 25:1053–1063. https://doi.org/10.1093/ glycob/cwv047 67. Klontz EH, Trastoy B, Deredge D, Fields JK, Li C, Orwenyo J, Marina A, Beadenkopf R, Gu¨nther S, Flores J, Wintrode PL, Wang LX, Guerin ME, Sundberg EJ (2019) Molecular basis of broad Spectrum N-glycan specificity and processing of therapeutic IgG monoclonal Antibodies by endoglycosidase S2. ACS Central Sci 5:524–538. https://doi.org/10.1021/ acscentsci.8b00917 68. Li T, Tong X, Yang Q, Giddens JP, Wang LX (2016) Glycosynthase mutants of endoglycosidase S2 show potent transglycosylation activity and remarkably relaxed substrate specificity for antibody glycosylation remodeling. J Biol Chem 291:16508–16518. https://doi.org/ 10.1074/jbc.M116.738765 69. Liu CP, Tsai TI, Cheng T, Shivatare VS, Wu CY, Wu CY, Wong CH (2018) Glycoengineering of antibody (Herceptin) through yeast expression and in vitro enzymatic glycosylation. Proc Natl Acad Sci U S A 115:720–725. https://doi.org/10.1073/pnas.1718172115 70. Chao G, Lau WL, Hackel BJ, Sazinsky SL, Lippow SM, Wittrup KD (2007) Isolating and engineering human antibodies using yeast surface display. Nat Protoc 1:755–769. https://doi.org/10.1038/nprot.2006.94 71. Boder ET, Midelfort KS, Wittrup KD (2000) Directed evolution of antibody fragments with monovalent femtomolar antigen-binding affinity. Proc Natl Acad Sci U S A 97:10701–10705. https://doi.org/10.1073/ pnas.170297297 72. Walker LM, Bowley DR, Burton DR (2009) Efficient recovery of high-affinity Antibodies from a single-chain fab yeast display library. 
J Mol Biol 389:365–375. https://doi.org/10. 1016/j.jmb.2009.04.019 73. Rajpal A, Beyaz N, Haber L, Cappuccilli G, Yee H, Bhatt RR, Takeuchi T, Lerner RA, Crea R (2005) A general method for greatly improving the affinity of antibodies by using combinatorial libraries. Proc Natl Acad Sci U S A 102(24):8466–8471
74. Boder ET, Bill JR, Nields AW, Marrack PC (2005) Yeast surface display of a noncovalent MHC class II heterodimer complexed with antigenic peptide. Biotechnol Bioeng 92(4):485–491. https://doi.org/10.1002/bit.20616
75. Wang XX, Li Y, Yin Y, Mo M, Wang Q, Gao W, Wang L, Mariuzza RA (2011) Affinity maturation of human CD4 by yeast surface display and crystal structure of a CD4-HLA-DR1 complex. Proc Natl Acad Sci U S A 108:15960–15965. https://doi.org/10.1073/pnas.1109438108
76. Shembekar N, Mallajosyula VVA, Mishra A, Yeolekar L, Dhere R, Kapre S, Varadarajan R, Gupta SK (2013) Isolation of a high affinity neutralizing monoclonal antibody against 2009 pandemic H1N1 virus that binds at the “Sa” antigenic site. PLoS One 8:1–10. https://doi.org/10.1371/journal.pone.0055516
77. Kim Y, Bhandari R, Cochran JR, Kuriyan J, Wittrup KD (2006) Directed evolution of the epidermal growth factor receptor extracellular domain for expression in yeast. Proteins 1035:1026–1035. https://doi.org/10.1002/prot.20618
78. Ranz DAMK (1999) Selection of functional T cell receptor mutants from a yeast surface-display library. Proc Natl Acad Sci U S A 96:5651–5656
79. Mathew E, Zhu H, Connelly SM, Sullivan MA, Brewer MG, Piepenbrink MS, Kobie JJ, Dewhurst S, Dumont ME (2018) Display of the HIV envelope protein at the yeast cell surface for immunogen development. PLoS One 13(10):e0205756
80. Jacobs PP, Ryckaert S, Geysens S, De Vusser K, Callewaert N, Contreras R (2008) Pichia surface display: display of proteins on the surface of glycoengineered Pichia pastoris strains. Biotechnol Lett 30:2173–2181. https://doi.org/10.1007/s10529-008-9807-1
81. Antibodies SD (2012) Chapter 8 Pichia surface display: a tool for screening single domain antibodies. Methods Mol Biol 911:125–134. https://doi.org/10.1007/978-1-61779-9686
82. Ha S, Wang Y, Rustandi RR (2011) Biochemical and biophysical characterization of humanized IgG1 produced in Pichia pastoris. mAbs 3:453–460. https://doi.org/10.4161/mabs.3.5.16891
83. Berner VK, Sura ME, Hunter KW Jr (2008) Conjugation of protein antigen to microparticulate β-glucan from Saccharomyces cerevisiae: a new adjuvant for intradermal and oral immunizations. Appl Microbiol Biotechnol 80:1053–1061. https://doi.org/10.1007/s00253-008-1618-8
84. Schreuder MP (1996) Yeast expressing hepatitis B virus surface antigen determinants on its surface: implications for a possible oral vaccine. Vaccine 14:383–388
85. Luo S, Yan L, Zhang X, Yuan L, Fang Q, Zhang YA, Dai H (2015) Yeast surface display of capsid protein VP7 of grass carp reovirus: fundamental investigation for the development of vaccine against hemorrhagic disease. J Microbiol Biotechnol 25:2135–2145. https://doi.org/10.4014/jmb.1505.05041
86. Shibasaki S, Aoki W, Nomura T, Miyoshi A, Tafuku S, Sewaki T, Ueda M (2013) An oral vaccine against candidiasis generated by a yeast molecular display system. Pathog Dis 69:262–268. https://doi.org/10.1111/2049-632X.12068
87. Ryckaert S, Pardon E, Steyaert J, Callewaert N (2010) Isolation of antigen-binding camelid heavy chain antibody fragments (nanobodies) from an immune library displayed on the surface of Pichia pastoris. J Biotechnol 145:93–98. https://doi.org/10.1016/j.jbiotec.2009.10.010
88. Lin S, Houston-Cummings NR, Prinz B, Moore R, Bobrowicz B, Davidson RC, Wildt S, Stadheim TA, Zha D (2012) A novel fragment of antigen binding (Fab) surface display platform using glycoengineered Pichia pastoris. J Immunol Methods 375:159–165. https://doi.org/10.1016/j.jim.2011.10.003
89. Shaheen HH, Prinz B, Chen M, Pavoor T, Lin S, Houston NR, Moore R, Stadheim TA, Zha D (2013) A Dual-Mode Surface Display System for the Maturation and Production of Monoclonal Antibodies in Glyco-Engineered Pichia pastoris. PLoS One 8:1–10. https://doi.org/10.1371/journal.pone.0070190
90. Li S, Qiao J, Lin S, Liu Y, Ma L (2019) A Highly Efficient Indirect P. pastoris Surface Display Method Based on the CL7/Im7 Ultra-High-Affinity System. Molecules 24. https://doi.org/10.3390/molecules24081483
91. Lozančić M, Hossain ASK, Mrša V, Teparić R (2019) Surface display—an alternative to classic enzyme immobilization. Catalysts 9(9):728
92. Jiang Z, Gao B, Ren R, Tao X, Ma Y, Wei D (2008) Efficient display of active lipase LipB52 with a Pichia pastoris cell surface display system and comparison with the LipB52 displayed on Saccharomyces cerevisiae cell surface. BMC Biotechnol 8:1–7. https://doi.org/10.1186/1472-6750-8-4
93. Harnpicharnchai P, Sornlake W, Tang K, Eurwilaichitr L, Tanapongpipat S (2010) Cell-surface phytase on Pichia pastoris cell wall offers great potential as a feed supplement. FEMS Microbiol Lett 302(1):8–14. https://doi.org/10.1111/j.1574-6968.2009.01811.x
94. Yang S, Lv X, Wang X, Wang J, Wang R, Wang T (2017) Cell-surface displayed expression of trehalose synthase from Pseudomonas putida ATCC 47054 in Pichia pastoris using Pir1p as an anchor protein. Front Microbiol 8:1–9. https://doi.org/10.3389/fmicb.2017.02583
95. Uchański T, Zögg T, Yin J, Yuan D, Wohlkönig A, Fischer B, Rosenbaum DM, Kobilka BK, Pardon E, Steyaert J (2019) An improved yeast surface display platform for the screening of nanobody immune libraries. Sci Rep 9:1–12. https://doi.org/10.1038/s41598-018-37212-3
96. Rosowski S, Becker S, Toleikis L, Valldorf B, Grzeschik J, Demir D, Willenbücher I, Gaa R, Kolmar H, Zielonka S, Krah S (2018) A novel one-step approach for the construction of yeast surface display Fab antibody libraries. Microb Cell Factories 17:1–11. https://doi.org/10.1186/s12934-017-0853-z
97. Mei M, Li J, Wang S, Lee KB, Iverson BL, Zhang G, Ge X, Yi L (2019) Prompting Fab yeast surface display efficiency by ER retention and molecular chaperon co-expression. Front Bioeng Biotechnol 7:1–11. https://doi.org/10.3389/fbioe.2019.00362
Part III Systems Biology and Computational Methods for Deciphering Glycosylation Networks
Chapter 10
Computational Modeling of Glycan Processing in the Golgi for Investigating Changes in the Arrangements of Biosynthetic Enzymes
Ben West, A. Jamie Wood, and Daniel Ungar
Abstract
Modeling glycan biosynthesis is becoming increasingly important because of the far-reaching implications of glycosylation, from pathologies to biopharmaceutical manufacturing. Here we describe a stochastic simulation approach that simulates the action of glycan-modifying enzymes to produce a glycan profile, overcoming the deterministic nature of previous models. The simulation is coupled with an approximate Bayesian computation methodology that systematically fits the model to empirical data in order to determine which set of parameters adequately describes the organization of enzymes within the Golgi. The model is described in detail, along with a proof of concept and therapeutic applications.
Key words Glycosylation, Stochastic simulation, Approximate Bayesian computation, Modeling
1 Introduction
Protein glycosylation is a complex and flexible posttranslational modification that has been associated with a diverse set of biological processes and pathologies [1–5]. Its complexity arises from the vast structural variation among the glycans that can be produced. Glycan structures are altered by two enzyme families: glycosidases and glycosyltransferases. Glycosidases hydrolyze glycosidic bonds, cleaving part of a glycan, whereas glycosyltransferases catalyze the formation of glycosidic bonds, thereby initiating, extending, and branching glycans. Glycans are polymers of several different monosaccharide units that can be linked to each other in different orders and at different positions. A large set of competing enzymes from the glycosidase and glycosyltransferase families generates these polymers without a template. In contrast to a polysaccharide such as cellulose, where a more limited number of enzymes act in a concerted manner, the competing reactions involved in making a glycan result in a highly heterogeneous mix of glycans. Yet this heterogeneity is never completely random. For example, different cell types in the human body show distinct and reproducible glycan profiles. These profiles are partly influenced by cell-line-specific expression of the biosynthetic enzymes; however, expression alone is not sufficient to explain the differences between the profiles of different cell types [6]. Hence, the subcompartmentalization of enzymes across Golgi cisternae has been highlighted as a key feature of glycan synthesis [7]. Understanding how nonuniform enzyme distributions across the Golgi [8] are maintained, and how they give rise to reproducible but distinct glycan profiles in different cell types, has become a key area of investigation in glycan biosynthesis.
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_10, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Given the pervasiveness and far-reaching applications of glycosylation, a good understanding of how glycan heterogeneity is controlled is very important. Because this biosynthetic process involves a large number of competing enzymes working in concert, systems biology approaches are needed. Several computational models describing the synthesis of glycans have been produced in an effort to understand the biosynthetic requirements for generating different glycan patterns [9–12]. Of particular interest for our method is the computational representation of how different enzyme arrangements guide glycan processing.
To understand the implications of models assessing Golgi enzyme arrangements, it is first important to understand how this organelle works with regard to the glycosylation of proteins. The cisternae of the Golgi should be thought of as dynamic reaction chambers that contain the glycan-modifying enzymes. The cisternae are arranged from the cis side of the Golgi to the trans side [8], and glycoprotein substrates remain in the same cisterna throughout their residence in the organelle [13].
In order to maintain a sequence of glycan processing reactions, the enzymes must be trafficked retrogradely (i.e., in the direction opposite to secretion) in vesicles, so that glycoproteins encounter an ever-changing subset of the enzymes [14, 15]. This view of Golgi trafficking is called the cisternal maturation model [13, 16] (Fig. 1).
Fig. 1 An overview of the cisternal maturation model. Secreted cargo remains within the cisternae, while resident Golgi proteins are transported in a retrograde fashion to previous cisternae to induce their maturation
N-glycosylation is initiated on the cytoplasmic face of the endoplasmic reticulum (ER) with the creation of a glycan precursor consisting of two N-acetylglucosamine (GlcNAc) residues and five mannose residues (Man5GlcNAc2), which is flipped into the ER lumen. There, four more mannose residues and three glucose residues are added, resulting in Glc3Man9GlcNAc2, which is transferred en bloc to an asparagine residue of a newly synthesized protein. The three glucoses serve as part of a protein quality control step and are removed sequentially, leaving Man9GlcNAc2, which can undergo further trimming before the glycoprotein is trafficked to the Golgi. However, some glycans that retain a single glucose can still enter the Golgi, so a glycan enters the Golgi in one of three states: Man8GlcNAc2, Man9GlcNAc2, or Glc1Man9GlcNAc2.
Within the Golgi, a glycan is altered by the action of glycosidases and glycosyltransferases (Fig. 2) until it leaves the organelle. This processing generates N-glycans belonging to one of three classes: oligomannose, hybrid, or complex. All N-glycans share a common core of two GlcNAc residues extended by three mannose residues (Man3GlcNAc2), onto which further monosaccharides are attached. Oligomannose glycans carry only mannose residues on the core, and these are subject to mannose-trimming enzymes such as mannosidase I and II. Complex glycans contain antennae initiated by GlcNAc residues added to the core through the action of N-acetylglucosaminyltransferases such as MGAT1–MGAT5. Hybrid glycans contain both mannose-initiated and GlcNAc-initiated antennae [17]. Modeling the action, abundance, and localization of these glycan-modifying enzymes will help us to understand how the synthesis of glycans is controlled.
Fig. 2 The route a glycan can take on its biosynthetic journey from the ER to the trans-Golgi. The blue box denotes oligomannose glycans, the green box denotes hybrid glycans, and the red box denotes complex glycans. The first four glycans in the blue box are those that can be subject to the oligomannose quench in the stochastic simulation of glycan processing, which removes them from further processing in the Golgi. Enzyme abbreviations: mannosidase I (MAN1), mannosidase II (MAN2), fucosyltransferase 8 (FUT8), N-acetylglucosaminyltransferases I–V (MGAT1–5), galactosyltransferases (GalT), sialyltransferases (SiaT)
Although it has lagged behind that of other biological systems because of the inherent structural complexity of glycans, the computational modeling of glycosylation has gained traction in recent years. The first model of glycosylation, developed by Umaña and Bailey, simulated 33 N-glycan reactions in silico, up to the point of galactosylation in the N-glycosylation pathway. Each Golgi cisterna was modeled as a reaction chamber following Michaelis–Menten kinetics with literature-derived parameters. Solving the resulting set of ordinary differential equations (ODEs) gave glycan structures that correctly reproduced the experimental glycan profile typical of secreted recombinant proteins produced in Chinese hamster ovary (CHO) cells [9]. With the development of better technology, this work has been greatly expanded, from 33 possible reactions to 22,871 in a model by Krambeck and Betenbaugh that includes not only galactosylation but also fucosylation and sialylation [10]. Another ODE-based model, extending the previous modeling further, used structure-specific turnover rates to provide a kinetic description of N-glycan processing along the entire secretory pathway [18]. Additionally, a study using the models described above found that changes in GalT activity unexpectedly affect branching during N-glycan processing [19].
A different aspect of modeling glycosylation is to model how altered culture conditions, rather than altered enzyme arrangements or activities, change glycan profiles. This has been investigated extensively for the influence of temperature [20], culture feed [21], and sugar nucleotide donor abundance [22]. This work, also based on ODE methods, demonstrates how complex it is to model glycosylation while capturing the many different parameters that may affect it. Although these ODE models have refined our knowledge of glycosylation, they assume that the dynamics of glycosylation are captured fully by a deterministic approach. Furthermore, ODE models cannot readily be fitted to experimental data, because such data are not available at a level of detail sufficient to ensure that the models are constrained to the desired subspace of state space.
Because of the low concentrations of enzymes and the high level of competition in the Golgi apparatus, stochastic models that incorporate biological noise are more appropriate for modeling glycosylation [11, 12]. One such stochastic model, by Spahn et al., does not rely on kinetic information but instead uses methods from Markov chain Monte Carlo (MCMC) theory. In this model, each glycan is regarded as a state within a network that transitions to other states with certain reaction probabilities, independently of its past. This, coupled with flux-based analysis and a genetic algorithm for optimization, was used to model glycosylation. The stochastic model that is the focus of the remainder of this chapter uses MCMC and the Gillespie algorithm to simulate biological noise, in conjunction with an approximate Bayesian computation (ABC) fitting methodology [12]. This method allows us to link the organization of Golgi enzymes to the resulting glycan profiles and thereby provides a tool for problems such as probing glycan engineering strategies, answering cell biological questions on intra-Golgi protein sorting, and pinpointing strategies for alleviating human diseases caused by defective glycan processing.
A genetic algorithm, such as that used by Spahn et al., is generally better at finding an accurate solution quickly, but only if a solution is found at all, because its parameter perturbations are random in order to cope with the size of the biosynthetic flux system, which is too large for a systematic search. Each fit is independent of the last, which leads to a loss of information about the trajectory of the fitting, makes the solutions found less reliable, and prevents direct assessment of relative shifts in parameter space. In contrast, our use of Bayesian computation, enabled by a more streamlined flux map, is a statistical approach that fits the parameters of the biosynthetic machinery to a state that produces the expected glycan pattern. This allows parameter fitting to proceed systematically and delivers high-quality relative information on parameter shifts, thereby providing important cell biological information on the changes to the glycosylation machinery between the assessed cellular states.
The computational model of glycan biosynthesis was created using custom-written Java code. The model simulates the action of glycan-modifying enzymes to produce a glycan profile, which is then compared with an experimental glycan profile to determine which set of parameters adequately describes enzyme organization in the Golgi. The modeling method is divided into two separate bodies of code, stochastic simulation and model fitting, both of which are explored in depth below. Broadly, the stochastic simulation is designed to create a glycan profile based largely on parameters termed the “effective” enzymatic rate (EFER). The EFER combines the enzyme’s amount, its turnover rate, and, where appropriate, the sugar-nucleotide substrate concentration into a single value. Subsuming these parameters under one value reduces the parameter space, making the modeling computationally efficient. The EFER is used as the rate constant of a particular reaction experienced by a focal glycan. In the ABC fitting algorithm, which relies on some (often limited) prior knowledge of the parameters, parameter values drawn at random from a prior distribution are accepted or rejected based on the similarity between simulated and experimentally obtained data. We use data obtained by MALDI mass spectrometry of permethylated glycans, as this has been shown to provide reliable quantitative glycan profiles [23]. The results of the fitting process tell the researcher how a parameter set needs to change from a starting state to generate the altered glycan profile, providing insights into altered enzyme arrangements within the Golgi. Crucially, this means that the key information is not the final parameter values derived, but rather the changes needed to improve the fit.
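To make the accept/reject step concrete, the following is a minimal sketch of plain ABC rejection sampling in Python. It is not the authors' Java implementation, and the fitting scheme they use may be more elaborate: the toy simulator (two hypothetical enzymes competing for each glycan), the uniform prior on one rate, the Euclidean distance between profiles, the tolerance `epsilon`, and the "observed" profile are all illustrative assumptions.

```python
import math
import random


def simulate_profile(rate_a, rate_b, n_glycans, rng):
    """Hypothetical toy simulator: two enzymes compete for each glycan, and
    each wins with probability proportional to its rate. Returns the relative
    abundances of the two resulting products."""
    total = rate_a + rate_b
    count_a = sum(1 for _ in range(n_glycans) if rng.random() * total < rate_a)
    fraction_a = count_a / n_glycans
    return [fraction_a, 1.0 - fraction_a]


def distance(profile_1, profile_2):
    """Euclidean distance between two relative-abundance profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(profile_1, profile_2)))


def abc_rejection(observed, n_samples, epsilon, rng, n_glycans=1000):
    """ABC rejection: draw rate_a from the prior, simulate a profile, and keep
    the draw only if the simulated profile lies within epsilon of the data."""
    accepted = []
    for _ in range(n_samples):
        rate_a = rng.uniform(0.0, 5.0)  # assumed uniform prior; rate_b fixed at 1
        simulated = simulate_profile(rate_a, 1.0, n_glycans, rng)
        if distance(simulated, observed) < epsilon:
            accepted.append(rate_a)
    return accepted


rng = random.Random(42)
observed_profile = [0.75, 0.25]  # illustrative numbers, not experimental data
posterior = abc_rejection(observed_profile, n_samples=2000, epsilon=0.05, rng=rng)
if posterior:
    print(len(posterior), sum(posterior) / len(posterior))
```

Because 3/(3 + 1) = 0.75, the accepted values cluster around rate_a ≈ 3; shrinking `epsilon` narrows the accepted set at the cost of more rejections. In the spirit of the text, the informative result would be how the accepted parameters shift between a reference profile and an altered one, rather than any single fitted value.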
2
Stochastic Simulation Algorithm (SSA) Underpinning the stochasticity of the model is the Gillespie algorithm, which uses the EFER as a reaction probability per unit time [24]. By treating the actions of independent enzymes in each of the cisternae as probabilities, we can generate heterogeneity similar to that seen in experimental glycan profiles. Man8GlcNAc2, Glc1Man9GlcNAc2, and Man9GlcNAc2 are the three possible input glycans used as the starting point of processing. Which of these three glycans gets used is determined by using two input parameters (Table 1 ‘E’), the Man8GlcNAc2 and Glc1Man9GlcNAc2 fractions. Each enzymatic processing step that is chosen via the stochastic process then progressively alters the input glycan as it moves through the Golgi. The substrate and product of these enzymatic steps are both in a linear notation form, as are all glycans within this simulation. The linear notation form used here is just one possible example, but the type used was created to be tailored for the string substitutions used by the SSA while allowing all the necessary information from mass spectrometry to be encapsulated. Linear notation allows for the actions of the enzymes to be implemented using a string substitution method to build new glycans. In essence, the action of each enzyme is simulated by the code searching for the substrate sequence, and if the enzyme is chosen to act this substrate sequence will be substituted with the enzyme’s product sequence. For example, MAN1 will look for the substrate sequence: “1Man2.1Man:” (Table 1 ‘B’)
Computational Modeling of Glycan Processing in the Golgi for Investigating. . .
215
Table 1 A table showing the.xls file used as an input for the stochastic simulation. The table contains information on the Golgi enzymes required for glycosylation. (A) The first column denotes the names of the different enzymes. It is worth noting that the same enzyme is present across multiple lines to account for instances where multiple different glycan structures serve as substrates for the same enzyme. In some cases, these different entries have a different EFER. For example, compare ST3Gal2.1 and ST3Gal2.2. (B) The enzyme substrate is a string sequence that the simulation will search for and this will be replaced with a different string, the product (C). The linkage between residues is denoted by the numbers in a conventional manner. Residues enclosed in brackets represent single branches. The underscore and lowercase letters represent the continuation from the previous residue not enclosed within the brackets. The “:” represents the termination of the branch and the “@” denotes the end of the N-glycan string. (D) The EFER for a particular enzyme across the three different cisternae used in this example. More cisternae can be added, and we have successfully run the model with four. (E) Three extra parameters that are required for the model: Man8GlcNAc2 fraction, Glc1Man9GlcNAc2 fraction, and transit time, respectively
and if the code finds it and the enzyme is chosen to act, it will replace “1Man2.1Man:” with the product sequence “1Man:” (Table 1 ‘C’), simulating the cleavage of a mannose residue. A single enzyme can have different substrates, which is why some enzymes have multiple entries in the table (Table 1 ‘A’). In some cases different substrates are processed at different rates. This is implemented with scale factors that alter the rate of the enzyme for a given substrate. Each scale factor starts at one, meaning the enzyme’s rate is the same for all of its substrates. If, however, a scale factor deviates from one after fitting, then altered substrate
Ben West et al.
specificity of the enzyme is inferred to play a role in glycan synthesis. This type of information is an additional output of the model beyond enzyme organization in the Golgi.
2.1 Glycan Processing
A typical simulation run processes 10,000 input glycans stochastically, one by one, to generate a computed glycan profile. The input glycans are divided into the three types in the proportions set by the Man8GlcNAc2 and Glc1Man9GlcNAc2 fractions. For each glycan, the SSA then identifies all matching substrate strings from the enzyme information (Table 1 ‘B’). The EFERs (Table 1 ‘D’) of all enzymes that can act on this glycan are summed, and this sum becomes the Total Propensity required by the Gillespie algorithm. A pseudorandom number (we used a Mersenne Twister [25]) is multiplied by the Total Propensity and used to choose which enzyme acts, so that each competing enzyme is selected with a probability proportional to its EFER. If the chosen enzyme can act on several sites, a similar process is repeated to determine the site at which it acts. A second pseudorandom number is then used to draw from an exponential distribution with a mean of 1/(Total Propensity), which simulates the time interval within which the reaction occurred. The randomness introduced by these pseudorandom numbers supplies the stochasticity required to mimic the competitive and heterogeneous nature of glycan biosynthesis.
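The selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The tuple layout `(name, substrate, product, efer)`, the function name `gillespie_step`, and the toy glycan strings are hypothetical. Each occurrence of the substrate is assumed to be equally likely to be chosen as the reaction site. Python's `random.Random` is itself a Mersenne Twister generator, so it is the same kind of generator as the one mentioned in the text.

```python
import random

def gillespie_step(glycan, enzymes, rng):
    """One Gillespie step for a single glycan in one cisterna.

    enzymes: list of (name, substrate, product, efer) tuples (hypothetical layout).
    Returns (name, substrate, product, site_index, dt), or None if no enzyme can act.
    """
    # Enzymes whose substrate string occurs in the glycan and whose EFER is positive
    candidates = [e for e in enzymes if e[3] > 0 and e[1] in glycan]
    if not candidates:
        return None
    total = sum(e[3] for e in candidates)  # Total Propensity

    # Scale a uniform number by the Total Propensity and walk through the EFERs
    r = rng.random() * total
    chosen = candidates[-1]  # fallback guards against floating-point round-off
    for e in candidates:
        r -= e[3]
        if r < 0:
            chosen = e
            break

    # Assumption: every (possibly overlapping) occurrence of the substrate is equally likely
    name, substrate, product, _ = chosen
    sites = [i for i in range(len(glycan)) if glycan.startswith(substrate, i)]
    site = rng.randrange(len(sites))

    # Waiting time drawn from an exponential distribution with mean 1/total
    dt = rng.expovariate(total)
    return name, substrate, product, site, dt
```

With two matching enzymes whose EFERs are 3 and 1, the first should be chosen about 75% of the time, and the mean waiting time should be close to 1/4.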
2.2 String Substitution to Modify Glycans In Silico
The string substitution is performed in two parts to preserve the fidelity of the glycan string. First, the substrate sequence is replaced with a proxy string. This proxy is then replaced with the intended product (Table 1 ‘C’). The enzyme “OM Quench” (Table 1) is an artificial enzyme used to quench the processing of oligomannose glycans (Fig. 1). It mimics glycans being transported retrogradely back to the endoplasmic reticulum or being phosphorylated for lysosomal targeting. Both of these events stop further glycan processing. To reproduce this, the string substitution adds a “P” tag to every monosaccharide residue in the chain, which makes the new string unrecognizable to all enzymes. A glycan is modified iteratively until the cumulative time interval used by the enzymatic reactions exceeds the transit time (Table 1 ‘E’). At that point the glycan moves on to the next cisterna, or out of the Golgi if it was in the final cisterna. Once a glycan has left the Golgi, all “P” tags are removed. The simulation thus generates 10,000 stochastically modified glycans. These are used to produce a simulated glycan profile by calculating the relative abundance of each glycan species.
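The following Python sketch illustrates the two-part substitution, the “P”-tag quench, and the transit-time loop described above. It is illustrative only. The proxy character `#`, the residue-name regex, the function names, and the convention that an enzyme named `"OM Quench"` triggers tagging are all assumptions. The sketch also assumes that substrate strings begin with a linkage number, as in the MAN1 example, so that placing the tag in front of each residue name prevents any further match. A reaction whose waiting time would overshoot the transit time is assumed not to fire.

```python
import random
import re

PROXY = "#"  # assumption: a character that never occurs in a glycan string
# Hypothetical residue names; GlcNAc/GalNAc come before Glc/Gal so the longer names match first
RESIDUE = re.compile(r"GlcNAc|GalNAc|NeuAc|Man|Glc|Gal|Fuc")

def substitute_at(glycan, substrate, product, site):
    """Two-part substitution at the site-th occurrence of substrate."""
    starts = [i for i in range(len(glycan)) if glycan.startswith(substrate, i)]
    i = starts[site]
    marked = glycan[:i] + PROXY + glycan[i + len(substrate):]  # part 1: substrate -> proxy
    return marked.replace(PROXY, product)                      # part 2: proxy -> product

def quench(glycan):
    """Prefix every residue with 'P' so that substrate strings starting with a
    linkage number (e.g. '1Man...') can no longer match."""
    return RESIDUE.sub(lambda m: "P" + m.group(0), glycan)

def unquench(glycan):
    """Strip the 'P' tags once the glycan leaves the Golgi."""
    return glycan.replace("P", "")

def process_glycan(glycan, cisternae, transit_time, rng):
    """Run one glycan through a list of cisternae.

    cisternae: one list per cisterna of (name, substrate, product, efer) tuples.
    An enzyme named 'OM Quench' applies quench() instead of a substitution.
    """
    for enzymes in cisternae:
        t = 0.0
        while True:
            cands = [e for e in enzymes if e[3] > 0 and e[1] in glycan]
            if not cands:
                break
            total = sum(e[3] for e in cands)
            dt = rng.expovariate(total)
            if t + dt > transit_time:  # assumption: a reaction past the transit time does not fire
                break
            t += dt
            name, sub, prod, _ = rng.choices(cands, weights=[e[3] for e in cands])[0]
            if name == "OM Quench":
                glycan = quench(glycan)
            else:
                n_sites = sum(glycan.startswith(sub, i) for i in range(len(glycan)))
                glycan = substitute_at(glycan, sub, prod, rng.randrange(n_sites))
    return unquench(glycan)  # tags removed after the final cisterna
```

For example, with only MAN1 present and a long transit time, the toy string `1Man2.1Man2.1Man:@` is trimmed twice to `1Man:@`. If OM Quench acts first, a later MAN1 can no longer recognize the tagged string.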
2.3 Comparison with Empirical Data
The simulated profile is fitted to an empirically determined glycan profile. The empirical data contain a list of glycans in linear notation, the relative abundance of each as determined by mass spectrometry, and the error associated with each measurement. Before abundances can be compared, the empirical results must be aligned with the profile generated from the 10,000 simulated glycans. To do this, the molecular weight of each glycan is calculated from the molecular weights of its monosaccharide building blocks. Computationally generated glycans with identical masses can then be merged into a single virtual glycan species. This merging is necessary because such glycans are indistinguishable by simple MALDI mass spectrometry. The empirical and simulated relative abundances of the glycan species at each mass can then be compared to calculate a penalty score based on their difference. The sum of the individual penalty scores is the overall Score generated by the SSA. Many scoring methods can be employed, each with its own merits. For example, Formula (1) takes the squared difference between the measurement error and the absolute difference between empirical and simulated abundances. This penalty places greater weight on the most abundant glycans in the profile, which favors the best global fit and yields more generalized information from the model.
$$\text{score} = \left(\text{empirical error} - \left|\text{empirical abundance} - \text{simulated abundance}\right|\right)^2 \qquad (1)$$
In contrast, using the coefficient of variation (Formula (2)) as the scoring method puts much more focus on the less abundant glycan species. These can often be of great functional interest, such as some low-abundance sialylated or fucosylated glycans, and may therefore require special attention.
$$\text{score} = \frac{\left|\text{empirical abundance} - \text{simulated abundance}\right|}{\text{empirical abundance}} \qquad (2)$$
These are just two possible scoring methods, and it will be up to the experimenter to consider which penalty score calculation best reflects the needs of the specific project.
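The mass-based merging and the two penalty scores can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. The residue-name regex, the function names, and the dictionary layouts are hypothetical. The masses are standard monoisotopic residue masses (Hex 162.05282, HexNAc 203.07937, dHex 146.05791, NeuAc 291.09542 Da). The reducing-end water is ignored because it shifts every glycan equally. Masses are rounded to two decimals so that isobaric glycans share one key. Empirical species absent from the simulation are assumed to count as zero abundance.

```python
import re
from collections import Counter

# Monoisotopic residue masses (Da); only differences between glycans matter here,
# so the reducing-end water is ignored.
RESIDUE_MASS = {"Man": 162.05282, "Glc": 162.05282, "Gal": 162.05282,
                "GlcNAc": 203.07937, "GalNAc": 203.07937,
                "Fuc": 146.05791, "NeuAc": 291.09542}
RESIDUE = re.compile(r"GlcNAc|GalNAc|NeuAc|Man|Glc|Gal|Fuc")  # longer names first

def glycan_mass(glycan):
    return sum(RESIDUE_MASS[r] for r in RESIDUE.findall(glycan))

def simulated_profile(glycans, ndigits=2):
    """Merge isobaric glycans; return {rounded mass: relative abundance}."""
    counts = Counter(round(glycan_mass(g), ndigits) for g in glycans)
    return {m: c / len(glycans) for m, c in counts.items()}

def score_abs(emp, sim, err):
    # Formula (1): (empirical error - |empirical - simulated|)^2
    return (err - abs(emp - sim)) ** 2

def score_rel(emp, sim):
    # Formula (2): |empirical - simulated| / empirical
    return abs(emp - sim) / emp

def total_score(empirical, simulated, method="abs"):
    """empirical: {mass: (abundance, error)}; simulated: {mass: abundance}.
    Species missing from the simulation count as zero abundance; simulated
    species absent from the empirical list are ignored (an assumption)."""
    total = 0.0
    for mass, (emp, err) in empirical.items():
        sim = simulated.get(mass, 0.0)
        total += score_abs(emp, sim, err) if method == "abs" else score_rel(emp, sim)
    return total
```

In this sketch, `1Gal4.1GlcNAc:@` and `1Glc4.1GlcNAc:@` collapse to one species at 365.13 Da, just as they would be indistinguishable in a MALDI spectrum.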
3
Approximate Bayesian Computation for Fitting the Model to Experimental Data
The second body of code adjusts the parameter set so that the modeled glycan profile fits the empirically determined one. This is accomplished through the application of Bayesian
statistics. The aim of Bayesian methods is to compute a posterior probability P(A|B) for a set of uncertain parameters A, given experimentally observed data B. Bayes’ formula is given below. In it, P(A) is the prior probability, which expresses our beliefs about the system (e.g., abundance and activity of resident Golgi enzymes) before B is observed.
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
In our case, the beliefs about the system are the information gathered from the literature, which is used to calculate the EFERs. This information is highly uncertain in most cases. Rather than treating the EFERs as single values, we therefore define them as probability density functions (PDFs). Two types of PDF describe the parameter distributions: log-normal and exponential decay curves. Both are simple distributions with support only on the positive real half-line. The mean of each curve is set to our best estimate of the corresponding EFER. Sampling from a log-normal curve tends to yield values away from zero, so a log-normal distribution is used for enzymes that we believe are active within a cisterna. In contrast, an exponential decay curve yields values much closer to zero, because smaller values have a greater probability. This type of PDF is used for enzymes that we do not believe act in a particular cisterna. Importantly, an exponential decay PDF still allows the model to engage an enzyme in that cisterna. If our assumption that the enzyme is absent is wrong, the fitting can therefore correct it. Working through Bayes’ formula would yield a posterior probability distribution, and comparing our prior knowledge with this posterior would let us infer changes within the Golgi. However, this requires the likelihood function P(B|A), the probability of observing the data B given the parameter values A. As is often the case, the likelihood function is intractable for the system modeled here. We have therefore adapted an approximate Bayesian computation (ABC) method [26]. Essentially, values for each EFER are sampled from its PDF and fed into the SSA, and the resulting parameter value set is accepted or rejected depending on the calculated Score.
3.1 ABC Prerequisites
The ABC code loads the enzyme rules and the order of the enzymes into its register from the same .xls file that the SSA uses. In addition, each parameter has its own file containing 1000 x and y coordinates that describe its PDF. These PDF files are ordered in the same way as the enzymes in the register. When creating the PDFs, it is important to choose an appropriate x value increment (step size) so that the tail of the distribution reaches
a y value below machine error (10⁻⁸, below which values are effectively zero to a machine). This avoids boundary effects. Furthermore, creating a log-normal PDF requires a variance, which in most cases can be set to the square of the mean. However, in some cases our prior beliefs about the system are more uncertain owing to a lack of literature. In these cases the variance can be increased so that values are sampled from a wider distribution.
3.2 Fitting Methodology
The PDFs for each enzyme within each cisterna, together with the PDFs for the starting glycan fractions and the transit time, are loaded from the prior spreadsheet files and stored in an array. A pseudorandom number is generated, and the y values of a PDF are sequentially subtracted from it until it becomes negative. The x value corresponding to the last y value subtracted becomes the EFER for that parameter. Repeating this for all PDFs produces a parameter value set, which is passed to the stochastic simulation. The SSA returns a Score, which is compared with a threshold predetermined by the user. If the Score is below the threshold, it is accepted and the parameter value set used is stored. This process is repeated until n parameter value sets have been accepted. A file containing the stored sets is then generated. This file can be used to produce the posterior PDFs for each parameter. By examining the shifts between prior and posterior PDFs, we can begin to understand how the organization of the Golgi had to change to generate the glycan profile differences between the cellular states used at the start and end of the fitting process.
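The prior construction and the subtraction-based sampling can be sketched in Python as follows. This is a minimal illustration, not the authors' code. The function names, the grid-widening loop, and the choice to scale the random number by the sum of the y values are assumptions. The last of these makes the tabulated grid act as a discrete distribution; normalizing the y values beforehand would be equivalent. The log-normal parameters follow from the standard relations σ² = ln(1 + variance/mean²) and μ = ln(mean) − σ²/2. Variance defaults to mean², as suggested in the text.

```python
import math
import random

def lognormal_pdf(mean, variance=None):
    """Log-normal density with the given mean; variance defaults to mean**2."""
    if variance is None:
        variance = mean ** 2
    s2 = math.log(1.0 + variance / mean ** 2)  # sigma^2 of the underlying normal
    mu = math.log(mean) - s2 / 2.0
    norm = math.sqrt(2.0 * math.pi * s2)
    return lambda x: math.exp(-(math.log(x) - mu) ** 2 / (2.0 * s2)) / (x * norm)

def exponential_pdf(mean):
    """Exponential-decay density with the given mean."""
    return lambda x: math.exp(-x / mean) / mean

def pdf_grid(pdf, start, n=1000, tail=1e-8):
    """Tabulate n (x, y) points, doubling the range until the tail drops below `tail`."""
    x_max = start
    while pdf(x_max) >= tail:
        x_max *= 2.0
    step = x_max / n
    xs = [step * (i + 1) for i in range(n)]  # skip x = 0, where log(x) is undefined
    return xs, [pdf(x) for x in xs]

def sample_by_subtraction(xs, ys, rng):
    """Subtract y values from a scaled random number until it goes negative;
    the matching x value is the sampled parameter (e.g. an EFER)."""
    r = rng.random() * sum(ys)
    for x, y in zip(xs, ys):
        r -= y
        if r < 0:
            return x
    return xs[-1]  # guard against floating-point round-off
```

Repeated draws from a log-normal grid with mean 2 should average close to 2. An exponential grid built the same way concentrates its draws near zero, which suits enzymes believed to be absent from a cisterna.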
3.3 Threshold Variability
Our code includes a subroutine that can help set the Score-acceptance threshold. It has been shown that pseudo-marginal random walk Metropolis algorithms, a class of MCMC methods, are most efficient when the acceptance rate is 7.001% [27]. The code targets this rate by first randomly sampling prior values and accepting or rejecting them against an initial threshold. This initial threshold is the lowest Score that could be achieved in a reasonable amount of computational time (typically 24 h). If fewer than 7% of the prior values are accepted, the threshold is deemed too low, and the code increases it by 10%. Conversely, if more than 7% are accepted, the threshold is deemed too high and is decreased by 10%. In this way, the Score-acceptance threshold is brought close to the value deemed optimally efficient. The threshold keeps changing within a run until a second, more stringent, user-defined threshold is reached. The algorithm then samples in that region of the parameter space until 10,000 parameter values for each variable have been accepted. During a chain of fitting runs, the stringent threshold is generally lowered, while the priors of some parameters are replaced by the posterior PDFs of the previous run. Users experienced in the cell biology of glycan biosynthesis must decide which parameters to shift, to ensure that parameter fitting yields plausible cellular states. This type of user-guided fitting is a crucial aspect of the ABC approach [28].
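The threshold adaptation can be illustrated with a simplified rejection-ABC sketch in Python. This is not the authors' implementation. Their method adapts an MCMC-without-likelihoods scheme [26], which is not reproduced here. The function name, the batch size, and the `max_batches` guard are assumptions, and `sample_prior` and `score` are placeholders for prior sampling and a full SSA run. The threshold is nudged by ±10% per batch toward a 7% acceptance rate until it reaches the stringent user-defined value. Parameter sets beating that value are then collected.

```python
import random

def abc_adaptive(sample_prior, score, init_threshold, stringent_threshold, n_accept,
                 rng, batch=500, target_rate=0.07, max_batches=1000):
    """Simplified rejection ABC with a +/-10 % threshold adaptation.

    sample_prior(rng) -> parameter set; score(params, rng) -> Score (lower is better).
    """
    threshold = init_threshold
    # Phase 1: nudge the threshold toward ~7 % acceptance until it reaches the stringent value
    for _ in range(max_batches):
        if threshold <= stringent_threshold:
            break
        accepted = sum(score(sample_prior(rng), rng) < threshold for _ in range(batch))
        rate = accepted / batch
        if rate < target_rate:
            threshold *= 1.10  # too few accepted: threshold too low, relax it
        elif rate > target_rate:
            threshold *= 0.90  # too many accepted: threshold too high, tighten it
    if threshold > stringent_threshold:
        raise RuntimeError("stringent threshold not reached; relax it or raise max_batches")

    # Phase 2: keep sampling until n_accept parameter sets beat the stringent threshold
    kept = []
    while len(kept) < n_accept:
        params = sample_prior(rng)
        if score(params, rng) < stringent_threshold:
            kept.append(params)
    return kept
```

In a toy case where a single EFER has a uniform prior on [0, 10] and the Score is its distance from 1, every accepted value ends up within the stringent threshold of 1. A histogram of the accepted values is the sketch's analogue of a posterior PDF.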
4
Proof of Concept
We give one example of validation here to help the reader understand how the modeling is used. For more examples, the reader is referred to the papers by Fisher et al. [12, 29]. To explore whether the model can make rational predictions, the computed glycan profile was first fitted to experimental profiles obtained from wild-type HEK293T cells. HEK293T cells were then treated with the MAN2 inhibitor swainsonine [30], and the altered glycan profile was determined. As expected, the resulting glycan profile showed a significant increase in hybrid glycans. We started from the parameter values obtained from the wild-type fit. We reasoned that any parameter changes required to fit the swainsonine-treated glycan profile reflected changes in the glycosylation machinery caused by the treatment. After several rounds of simulation and fitting, the model predicted a large decrease in the EFER of MAN2 (Fig. 3) [12].
5
Further Applications of the Computational Model
A model that can deduce organizational changes in the glycosylation machinery from experimentally determined glycan profiles has far-reaching implications. The ability to determine the changes needed to get from one cell type to another, or, more importantly, from a wild-type to a mutant cell, is highly relevant to disease. For example, many congenital disorders of glycosylation (CDGs) show complex alterations in glycan biosynthesis [31, 32]. Comparing glycan profiles from patients and healthy controls with the model could highlight critical changes in the biosynthetic machinery, which could help diagnose orphan CDGs. In addition, computational modeling could pinpoint changes in the glycosylation machinery that may correct the glycan profile in patients. It could even be used to run in silico drug tests on healthy and patient cells to suggest novel therapeutic directions. Similar approaches could be used for other diseases, such as cancers in which altered glycosylation is known to influence the pathology [5]. As an example of this kind of analysis, the model showed that the decreased fucosylation flux in a Cog4KO cell line compared with wild type resulted from reduced MGAT1 activity, which restricted the amount of
Fig. 3 Proof of concept using drug treatment of HEK293T cells. Total “effective” enzymatic rate changes for seven different enzymes, obtained when fitting a glycan profile of untreated cells to a profile obtained following swainsonine treatment. Error bars are SD for n = 16 individual fitting runs. (Figure adapted from Fisher et al. [12])
complex glycans, the ideal substrates for fucosylation [12]. Another possible application of the model is in the biopharmaceutical industry. As previously mentioned, glycan heterogeneity can have a detrimental effect on drug efficacy. Being able to control glycosylation would therefore be of enormous benefit, both economically and from a health perspective. Once we understand how the organization of enzymes in the Golgi must change to produce a specific glycan profile, it should be possible to plan glycoengineering strategies that control the glycosylation of biopharmaceuticals [29].
References
1. Takahashi M, Tsuda T, Ikeda Y et al (2003) Role of N-glycans in growth factor signaling. Glycoconj J 20:207–212
2. Gu J, Isaji T, Xu Q et al (2012) Potential roles of N-glycosylation in cell adhesion. Glycoconj J 29:599–607
3. Xu C, Ng DTW (2015) Glycosylation-directed quality control of protein folding. Nat Rev Mol Cell Biol 16:742–752
4. Chang IJ, He M, Lam CT (2018) Congenital disorders of glycosylation. Ann Transl Med 6:477
5. Pinho SS, Reis CA (2015) Glycosylation in cancer: mechanisms and clinical implications. Nat Rev Cancer 15:540–555
6. Nairn AV, Aoki K, Dela Rosa M et al (2012) Regulation of glycan structures in murine embryonic stem cells: combined transcript profiling of glycan-related genes and glycan structural analysis. J Biol Chem 287:37835–37856
7. Fisher P, Ungar D (2016) Bridging the gap between glycosylation and vesicle traffic. Front Cell Dev Biol 4:15
8. Rabouille C, Hui N, Hunte F et al (1995) Mapping the distribution of Golgi enzymes involved in the construction of complex oligosaccharides. J Cell Sci 108:1617–1627
9. Umaña P, Bailey JE (1997) A mathematical model of N-linked glycoform biosynthesis. Biotechnol Bioeng 55:890–908
10. Krambeck FJ, Betenbaugh MJ (2005) A mathematical model of N-linked glycosylation. Biotechnol Bioeng 92:711–728
11. Spahn PN, Hansen AH, Hansen HG et al (2016) A Markov chain model for N-linked protein glycosylation towards a low-parameter tool for model-driven glycoengineering. Metab Eng 33:52–66
12. Fisher P, Spencer H, Thomas-Oates J et al (2019) Modeling glycan processing reveals Golgi-enzyme homeostasis upon trafficking defects and cellular differentiation. Cell Rep 27:1231–1243
13. Glick BS, Elston T, Oster G (1997) A cisternal maturation mechanism can explain the asymmetry of the Golgi stack. FEBS Lett 414:177–181
14. Ungar D, Oka T, Brittle EE et al (2002) Characterization of a mammalian Golgi-localized protein complex, COG, that is required for normal Golgi morphology and function. J Cell Biol 157:405–415
15. Harris SL, Waters MG (1996) Localization of a yeast early Golgi mannosyltransferase, Och1p, involves retrograde transport. J Cell Biol 132:985–998
16. Morré DJ, Ovtracht L (1977) Dynamics of the Golgi apparatus: membrane differentiation and membrane flow. Int Rev Cytol Suppl 5:61–188
17. Ungar D (2009) Golgi linked protein glycosylation and associated diseases. Semin Cell Dev Biol 20:762–769
18. Arigoni-Affolter I, Scibona E, Lin C-W et al (2019) Mechanistic reconstruction of glycoprotein secretion through monitoring of intracellular N-glycan processing. Sci Adv 5:eaax8930
19. McDonald AG, Hayes JM, Bezak T et al (2014) Galactosyltransferase 4 is a major control point for glycan branching in N-linked glycosylation. J Cell Sci 127:5014–5026
20. Sou SN, Jedrzejewski PM, Lee K et al (2017) Model-based investigation of intracellular processes determining antibody Fc-glycosylation under mild hypothermia. Biotechnol Bioeng 114:1570–1582
21. Kotidis P, Jedrzejewski P, Sou SN et al (2019) Model-based optimization of antibody galactosylation in CHO cell culture. Biotechnol Bioeng 116:1612–1626
22. Jimenez del Val I, Nagy JM, Kontoravdi C (2011) A dynamic mathematical model for monoclonal antibody N-linked glycosylation and nucleotide sugar donor transport within a maturing Golgi apparatus. Biotechnol Prog 27:1730–1743
23. Wada Y, Azadi P, Costello CE et al (2007) Comparison of the methods for profiling glycoprotein glycans—HUPO human disease glycomics/proteome initiative multi-institutional study. Glycobiology 17:411–422
24. Gillespie DT (1976) A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J Comput Phys 22:403–434
25. Matsumoto M, Nishimura T (1998) Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans Model Comput Simul 8:3–30
26. Marjoram P, Molitor J, Plagnol V, Tavaré S (2003) Markov chain Monte Carlo without likelihoods. Proc Natl Acad Sci U S A 100:15324–15328
27. Sherlock C, Thiery AH, Roberts GO, Rosenthal JS (2015) On the efficiency of pseudo-marginal random walk metropolis algorithms. Ann Stat 43:238–275
28. Toni T, Stumpf MPH (2009) Simulation-based model selection for dynamical systems in systems and population biology. Bioinformatics 26:104–110
29. Fisher P, Thomas-Oates J, Wood AJ, Ungar D (2019) The N-glycosylation processing potential of the mammalian Golgi apparatus. Front Cell Dev Biol 7:157
30. Elbein AD, Solf R, Dorling PR, Vosbeck K (1981) Swainsonine: an inhibitor of glycoprotein processing. Proc Natl Acad Sci U S A 78:7393–7397
31. Zeevaert R, Foulquier F, Jaeken J, Matthijs G (2008) Deficiencies in subunits of the conserved oligomeric Golgi (COG) complex define a novel group of congenital disorders of glycosylation. Mol Genet Metab 93:15–21
32. Ng BG, Freeze HH (2018) Perspectives on glycosylation and its congenital disorders. Trends Genet 34:466–476
Chapter 11
O-Glycologue: A Formal-Language-Based Generator of O-Glycosylation Networks
Andrew G. McDonald and Gavin P. Davey
Abstract
The web application O-Glycologue provides an online simulation of the biosynthetic enzymes of O-linked glycosylation, using a knowledge-based system described previously. Glycans can be imported in GlycoCT condensed format or as IUPAC condensed names. They are then passed as substrates to the enzymes, which are modeled as regular-expression-based substitutions on strings. The resulting reaction networks can be exported as SBML. The effects of knocking out different sets of enzyme activities can be compared. A method is provided for predicting the enzymes required to produce a given glycan, using an O-glycan from human gastric mucin as an example. The system has been adapted to other sets of glycosylation enzymes, and an application to ganglioside oligosaccharide synthesis is demonstrated. O-Glycologue is available at https://glycologue.org/o/.
Key words O-linked glycosylation, Glycosyltransferases, Gangliosides, Human milk oligosaccharides, Computational modeling
1
Introduction
Glycosylation is a posttranslational modification of proteins and lipids that is implicated in a wide variety of biological processes, including cell structure, trafficking, protein folding [1], development, and cell–cell recognition [2]. Glycosylation can be as simple as the addition of a single sugar, as in O-GlcNAcylation, but often involves the sequential addition of sugars to the carbohydrate moiety to form complex glycoconjugates. The sugars can be further modified by sulfation, methylation, acetylation, and phosphorylation. Although the number of activated sugars is quite small, glycosyltransferase enzymes catalyze a large number of reactions, leading to a highly diverse population of glycoforms. Given their ubiquity and diverse functions, it is unsurprising that changes in glycosylation are associated with many pathologies, such as congenital muscular dystrophies [3, 4] and cancer [5–7].
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_11, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Computational models are becoming an increasingly important part of research into the biology and pathology of glycosylation. Changes to glycan epitopes are common diagnostic and prognostic features in cancers [5]. Predicting how changes in enzyme expression alter the glycocalyx could therefore benefit the development of treatments such as cancer vaccines [8, 9]. Modeling of N-linked glycosylation has received considerable attention, and a variety of tools are available, such as GPP (part of RINGS [10]) and the Glycosylation Network Analysis Toolbox [11]. In this work, we describe O-Glycologue. It models the enzymes that synthesize O-linked (GalNAc-linked) glycans, which are common on mucin glycoproteins. O-Glycologue can predict the enzymes required to produce known glycans. It can also predict how knocking out one or more enzyme activities affects the resulting biosynthetic reaction networks. Methods are described for de novo simulation of the enzymes of O-linked glycosylation and for comparative enzyme knockouts. A further method imports individual O-glycans in the commonly used GlycoCT-condensed format, so that they can be presented as substrates to the enzymes of O-linked biosynthesis. Finally, extensions to other classes of glycan are described, with a demonstration of the modeling of ganglioside biosynthesis.
2
Methods
O-Glycologue is a web application based on a set of regular-expression-based transformation rules that act on strings called structure identifiers. The structure identifiers are themselves based on a formal language, and together the rules and the language comprise the deductive apparatus of a formal system [12]. Transferase enzyme activities are modeled as substitutions on structure identifiers, with Boolean conditionals providing additional reaction constraints. The algorithm presents every substrate in turn to every active enzyme, recording any new products formed. Where the enzyme transformation rule can match at more than one point in the string, the string is split on the pattern and each substring is transformed separately, before being recombined to form the product. Any newly discovered products then become substrates at the next iteration. The process repeats until either no new products are generated or the number of iterations exceeds a user-specified maximum. The set of reactions generated by the simulator forms a network. The network is said to be closed when incrementing the iteration number generates no further products. The extension of linear polysaccharides such as poly-N-acetyllactosamine can, in principle, proceed indefinitely. Reaction networks can therefore be closed by placing a limit on the total number of GlcNAc residues incorporated per glycan.
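The iterate-until-closure procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not O-Glycologue's code. The single-letter strings (V, G, N, S, F) and the rules are hypothetical toys and do not follow O-Glycologue's actual single-letter code or enzyme rules. Boolean reaction constraints are omitted. The "split, transform, recombine" step is read here as producing one product per match site. `re.finditer` and `Match.expand` are standard-library calls.

```python
import re

def apply_rule(glycan, pattern, replacement):
    """Return one product per match site of a regex rule (one way to read the
    'split on the pattern, transform, recombine' step)."""
    products = []
    for m in re.finditer(pattern, glycan):
        products.append(glycan[:m.start()] + m.expand(replacement) + glycan[m.end():])
    return products

def generate_network(start, rules, max_iter=10, max_glcnac=2, glcnac="N"):
    """Apply every active rule to every new substrate until closure.

    rules: {enzyme_name: (regex, replacement)}; glcnac: letter counted for the GlcNAc limit.
    Returns (glycans, reactions, closed).
    """
    glycans, frontier = {start}, {start}
    reactions = set()
    closed = False
    for _ in range(max_iter):
        new = set()
        for substrate in frontier:
            for name, (pattern, replacement) in rules.items():
                for product in apply_rule(substrate, pattern, replacement):
                    if product.count(glcnac) > max_glcnac:
                        continue  # GlcNAc cap keeps poly-LacNAc-style extension finite
                    reactions.add((substrate, name, product))
                    if product not in glycans:
                        new.add(product)
        glycans |= new
        frontier = new  # only newly discovered products are substrates next time
        if not new:
            closed = True  # no new products: the network is closed
            break
    return glycans, reactions, closed
```

With four toy rules (add G to bare V, N to terminal G, G to terminal N, S to terminal G) and a limit of one N, the network starting from `V` closes with six glycans and five reactions. If the run is cut off after two iterations, the network is not closed.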
Quick access to the simulator is provided by the Home page.
[Method]
1. Visit https://glycologue.org/o/.
2. Click “Enter a glycan” to choose a starting substrate (optional) in GlycoForm.
3. Choose the enzymes active in the simulator (optional).
4. Click “React” to enter the simulator.
2.1
GlycoForm
GlycoForm is the part of O-Glycologue that draws two-dimensional representations of glycans from their structure identifiers. (The names of the two applications have been inherited from those of the original desktop tools for drawing N-glycans [13].) Glycans are drawn from right to left, starting at the reducing end, with the structure identifier acting as a set of drawing instructions to a Logo-like interpreter. Monosaccharides are drawn using either the Symbol Nomenclature for Glycans (SNFG) [14] or the Oxford black-and-white symbols (UOXF). In GlycoForm, SNFG styling displays the anomeric configuration and the linkage position on the parent residue on each bond (graph edge). In UOXF mode, the angle between two adjacent sugar units denotes the linkage position (2, 3, 4, 6, 8). The edge style encodes the anomeric configuration: a dashed line denotes an α linkage and a solid line a β linkage. The symbol key at the bottom of the page updates according to the chosen style (see Fig. 1). [Note: L-Iduronate (IdoA) is not covered by the UOXF symbol set.] The GlycoForm web page is divided into three sections: the Control panel, the GlycoForm results pane, and Instructions. The Control panel sets the parameters of the enzyme simulation and provides a switch between SNFG and UOXF drawing styles. GlycoForm accepts native O-Glycologue structure identifiers or IUPAC condensed names [15] as input. GlycoCT-condensed [16] import is available through a separate tool (see GlycoCT import). The default structure identifier is VT, which represents α-D-GalNAc O-linked to a protein backbone. The modeling language assumes a default anomeric configuration for each monosaccharide, as specified in the final column of the symbol key in the Instructions. Four actions are available: Draw, Select, React, and Predict. After changing any text in the input box, the user should click the “Draw” button to update the image. Fig. 2 shows an example O-glycan taken from the list of experimentally derived glycans on the O-Glycologue website at https://glycologue.org/o/sample.php. Information on the structure appears in the Properties section, which provides calculated monoisotopic and average mass values, and core and epitope descriptions. Glycans can be downloaded as vector images in either SVG (Scalable Vector Graphics) or PDF (Portable Document Format). They can also be exported as text in IUPAC-condensed name, Linear Code [17], or GlycoCT-condensed formats.
Fig. 1 Single-letter code used in O-Glycologue, with IUPAC equivalents, definitions, and default anomeric linkage types. Symbols are drawn in the SNFG style.
2.2 Enzyme Simulation
In O-Glycologue, the enzyme simulation results page is divided into five sections: Control panel, Structures, Glycan profile, Enzyme profile, and Reaction network. As in the GlycoForm tool, the Control panel allows the user to change the number of iterations of the simulator and the maximum number of GlcNAc residues incorporated. The Structures section beneath gives the total number of distinct glycans produced, along with a list of the enzyme activities omitted, where applicable. The glycans themselves are displayed as an array whose dimensions are set with the “Columns” and “Cell height” parameters of the Control panel. Display of the array is suppressed when the total number of glycans exceeds 24, but it can be toggled in the Control panel at the top of the page. The value of the cell height is given as a
Fig. 2 An example O-linked glycan entered as an O-Glycologue structure identifier. The Properties panel includes calculated monoisotopic and average mass values in g/mol, along with glycan type and epitope information. Text-based export is provided in IUPAC-condensed name, Linear Code, and GlycoCT-condensed formats. The Select button allows the user to choose which enzymes to make inactive. The React and Predict buttons pass the current structure as the initial substrate to the enzyme simulator in the forward (biosynthetic) or reverse direction, respectively.
multiple of 20 px (pixels). Glycans are rendered as PNG image files and stored locally on the server. Structure identifiers are used to label each image, and their display can be toggled using the Control panel. Beneath the Structures section, the Glycan profile section tabulates the number of structures matching particular patterns. The first row of the table lists the glycan core configurations 1–8. The second row tallies common epitopes expressed on O-glycosylated proteins, with their percentages of the total. These epitopes are the Lewis structures (x, y, a, and b) and their sialylated forms, and the A, B, H, and Sda/Cad antigens. The Enzyme profile records the number of enzyme activities disabled and shows a linear array of switches representing the enzyme activities in the simulator. Clicking an element of the array toggles that activity and reruns the simulator with the same options as before. The final section, Reaction network, displays the network of reactions obtained and some simple network properties, such as the degree distribution. Networks are colored according to the type of monosaccharide being transferred, using the SNFG color system. For instance, reactions catalyzed by fucosyltransferases are colored red, while those of galactosyltransferases are colored yellow. A color key is printed at the bottom of the enzyme simulator output and enzyme-selection web pages.
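The SNFG-style coloring of a reaction network described above could be serialized for GraphViz roughly as follows. This Python sketch is illustrative and is not how O-Glycologue produces its diagrams. The function name `to_dot` and the mapping from enzyme names to DOT color names are hypothetical. `rankdir=LR` (left-to-right layout) and `rank=source` (keep a node in the leftmost rank) are standard GraphViz DOT attributes.

```python
def to_dot(reactions, root, colors=None):
    """Build GraphViz DOT text for a reaction network.

    reactions: iterable of (substrate, enzyme, product) tuples.
    colors: optional {enzyme: DOT color name}, e.g. by transferred monosaccharide.
    """
    colors = colors or {}
    lines = ["digraph network {",
             "  rankdir=LR;",                   # left-to-right layout
             f'  {{ rank=source; "{root}"; }}']  # keep the initial substrate leftmost
    for substrate, enzyme, product in sorted(reactions):
        color = colors.get(enzyme, "black")
        lines.append(f'  "{substrate}" -> "{product}" [label="{enzyme}", color={color}];')
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be saved as `network.dot` and rendered with the GraphViz command line, for example `dot -Tsvg network.dot -o network.svg`.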
GraphViz (https://graphviz.org) is used to lay out network diagrams in a horizontal, left-to-right orientation, with the initial substrate as the leftmost (root) node. Networks of 5000 nodes (glycans) or fewer can be downloaded as SVG and PDF vector images. For larger networks, the GraphViz DOT source file is provided for offline layout and rendering. SBML export is provided for use with other applications, such as CellDesigner [18] or Tellurium Notebook [19]. Each glycan in the SBML output is annotated using GlycoCT XML [16], which defines the O-Glycologue structure identifiers in a widely recognised format.
2.3 Enzyme Knockouts
A powerful feature of O-Glycologue is its ability to knock out the activities of one or more enzymes. The effects of one knockout can also be compared with those of another: set the first as a baseline, then select a further subset of the remaining enzyme activities to eliminate.

1. Click the Enzymes menu-bar option, or go to https://glycologue.org/o/enzymes.php.
2. Select the enzyme activities to knock out. The first enzymes to act in mucin-type glycosylation are those of the polypeptide N-acetylgalactosaminyltransferase family (EC 2.4.1.41), which are assumed always to be active.
3. Click the checkbox to enable “Set current K.O. selection as baseline,” which acts as a simulated “wild type” condition.
4. Select a further set of enzyme activities to be knocked out.
5. Choose the number of iterations of the simulator, between 1 and 10. The initial substrate is set to ‘VT’ by default, but can be changed to any valid structure identifier. Select “comparing with the baseline” to do a baseline comparison.
6. Click Run to run the simulation with these options.
7. In the Structures panel, click Show/hide structures to display the glycan structures obtained in the simulation. Click “missing” in the Control-panel section to toggle the display of glycan structures that are absent from the current knockout but present in the baseline (see Fig. 3).
8. When a baseline knockout is set, the additional knockout compares the numbers of different epitopes produced, at the current iteration level and GlcNAc limit. It records the increase (+) or decrease (−) in the numbers of glycans in each category in the Glycan-profile section of the results. The Enzyme profile shows two panels, one above the other, to enable ready comparison of the enzyme activities knocked out in each case.
O-Glycologue: A Formal-Language-Based Generator of O-Glycosylation Networks
Fig. 3 Comparison between two sets of enzyme knockouts in O-Glycologue. Structures shown with a grey background are missing from the second of the two knockouts, where the first knockout is set as the baseline
Note: Moving the mouse over the bar of switches in the Enzyme profile reveals popup text with the short name of the enzyme under the cursor.

2.4 GlycoCT Import
A recent development of O-Glycologue has been the adoption of the widely supported GlycoCT condensed format for glycan input [16].

1. Go to https://glycologue.org/o/import.php.
2. Paste GlycoCT condensed-format text into the input box (see Fig. 4).
3. Click Submit.
4. Click on a structure identifier to open it in the GlycoForm drawing application.

A set of one or more O-Glycologue structure identifiers will be generated from the GlycoCT. When the GlycoCT does not specify the parent linkage position, a number of possible linkage positions are generated based on known combinations for the child monosaccharide unit. For example, in Fig. 4, the GlycoCT translator produces the partial structure identifier [[f2]L?[f?]Y6][S3L3]VT, with question marks representing the unknown linkage positions. Assume that D-Gal (L) can be 3- or 4-linked to the parent sugar, and that L-Fuc (f) can be 2-, 3-, or 4-linked. The Cartesian product then predicts six possible structures. Two of these are immediately eliminated, because two monosaccharides cannot be linked to the same position on the parent sugar: in [[f2]L3[f3]Y6][S3L3]VT and [[f2]L4[f4]Y6][S3L3]VT, both fucose and galactose are linked to the same atom on the N-acetylglucosamine. The remaining four semantically valid structure identifiers are then passed to the reversed-enzyme simulator, which finds two of them to be predictable within the system. Clicking on any structure identifier opens it in GlycoForm, where it can be drawn. From there, it can be passed to the enzyme simulator as a substrate in either the forward (React) or reverse (Predict) direction.

Fig. 4 GlycoCT-condensed import in O-Glycologue. GlycoCT code is translated to one or more O-Glycologue structure identifiers, which are passed to the simulator to determine their predictability by the system. Clicking on the structure identifier link opens it in GlycoForm, from where it can be submitted to the simulator as a substrate

2.5 Prediction
Reversing the activities of the enzymes removes sugars sequentially from the starting glycan. If the reversed-enzyme simulator reaches the terminus (T) through the reversed action of EC 2.4.1.41 (ppGalNAc-T), the starting glycan is considered to be predicted within the system. As a demonstration, the method is used to predict the enzyme activities required to produce a glycan observed in human gastric mucin [20]:

1. Go to the GlyTouCan glycan repository [21] (https://glytoucan.org) and retrieve the record with Accession Number G46941CR [22].
2. From the Computed Descriptors section, select and copy the GlycoCT portion of the record.
3. Go to https://glycologue.org/o/import.php.
4. Paste the GlycoCT text into the input window and press the Submit button. The structure identifier [L?Y3L4Y6][[[f2]L4Y6][[f2]L4Y3]L3]VT is produced. It contains one D-galactose linked to its parent residue, β-D-GlcNAc, at an unknown linkage position. Two possible structures are generated, on the assumption that the linkage can be at carbon 3 or 4 of the parent. Both structure identifiers have already been passed as substrates to the reversed-enzyme simulator, and both are predicted within the system.
5. Click on the first fully specified structure, [L3Y3L4Y6][[[f2]L4Y6][[f2]L4Y3]L3]VT, to open it in GlycoForm. It is a type 1 O-glycan, terminated with β-1,3-linked galactose. It is of core type 2 and expresses the H antigen.
6. Click the Predict button to present it to the enzyme simulator and predict the enzymes used in its biosynthesis.
7. The Structures section of the simulator output shows 50 O-glycans generated by the reversed-enzyme simulator. These use nine enzyme activities: C1Gal-T1, C2Gn-T, Gcnt2, alpha2Fuc-Ts, beta3Gal-T5, beta3Gn-T2/3/4/5/7, beta3Gn-T3, beta4Gal-T4, and ppGalNAc-Ts. The resulting network is shown in Fig. 5a.
8. Under Enzyme profile, click the “Change” link to enter the Enzymes page. The nine predicted enzymes are enabled, while the remaining enzymes are disabled. Change the initial substrate to VT and the number of iterations to 10. Then click Run to run the enzyme simulator in the forward direction using only these enzymes.
9. Increase the iteration number again to 12, leaving the GlcNAc limit at the default value of 4. The biosynthetic network appears as shown in Fig. 5b.

Note: The anomericity of the terminal D-Gal, which was undetermined in the GlycoCT used in steps 2–4, has here been assumed to be β. The α-linked variant structure, [La3Y3L4Y6][[[f2]L4Y6][[f2]L4Y3]L3]VT, is not predicted within the O-Glycologue system.

2.6 Extension to Other Systems
In addition to GalNAc-linked O-glycans, GlycoForm can draw other types of glycan, such as N-linked, xyloglucan, glycosphingolipids (such as gangliosides) and human milk oligosaccharides (see Fig. 6). Support for other glycosyltransferases is under development, in order to be able to generate biosynthetic reaction networks for different classes of glycan. In its more general form, the enzyme simulator is called Glycologue. For example, ganglioside biosynthesis in Glycologue is at present modeled with a set of 11 enzymes (Fig. 7a) that generate the mono-, di-, tri-, and tetrasialylated structures of the GM, GD, GT, and GQ series [24].
Fig. 5 Predicted biosynthetic networks of a human gastric mucin O-glycan (GlyTouCan accession number G46941CR) in (a) the reverse and (b) the forward directions
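The GlycoCT import (Subheading 2.4) and step 4 of the prediction protocol both expand unknown (`?`) linkage positions into candidate structures. That expansion can be sketched as a Cartesian product followed by a filter that rejects two children on the same position of one parent. The sketch makes the following assumptions:

- It uses the candidate positions stated in Subheading 2.4: D-Gal (`L`) is 3- or 4-linked, and L-Fuc (`f`) is 2-, 3-, or 4-linked.
- The function name and the `sites` argument are hypothetical.
- The caller supplies, for each `?`, which parent residue it belongs to.
- Positions already occupied by fully specified linkages on the parent are not checked here.

```python
import itertools

# Candidate linkage positions per residue letter (assumed from Subheading 2.4).
CANDIDATE_POSITIONS = {"L": "34", "f": "234"}


def expand_unknown_linkages(template: str, sites: list) -> list:
    """Fill each '?' in template, left to right, with every allowed position.

    sites: one (child_letter, parent_label) pair per '?', in the same order;
    parent_label is a hypothetical tag that must be unique per parent residue.
    """
    parts = template.split("?")
    if len(parts) - 1 != len(sites):
        raise ValueError("need exactly one site per '?'")
    options = [CANDIDATE_POSITIONS[child] for child, _ in sites]
    results = []
    for combo in itertools.product(*options):
        occupied = [(parent, pos) for (_, parent), pos in zip(sites, combo)]
        if len(set(occupied)) < len(occupied):
            continue  # two children on the same position of one parent
        filled = parts[0] + "".join(pos + rest for pos, rest in zip(combo, parts[1:]))
        results.append(filled)
    return results


if __name__ == "__main__":
    # Fig. 4 example: L? and f? are both children of the same GlcNAc (Y).
    for s in expand_unknown_linkages("[[f2]L?[f?]Y6][S3L3]VT", [("L", "Y"), ("f", "Y")]):
        print(s)
```

For the Fig. 4 identifier, this produces six combinations. It discards the two with fucose and galactose on the same GlcNAc position, leaving four. For the G46941CR identifier, the single unknown galactose linkage yields the two candidates described in step 4.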
Fig. 6 Examples of the various types of oligosaccharide structure that can be drawn with GlycoForm. (a) N-linked glycan, [L4Y2M6][S3L4Y2M3][Yb4]Mb4Y4[f6]YT; (b) O-linked glycan, [L4[f3]Y6][S3L3]VT; (c) human milk oligosaccharide, [[L4[f3]Y6][S6L4Y3]L4G]; (d) a ganglioside (GQ1bα), [S8S3]L3[S6]Vb4[S8S3]L4GT; (e) hyaluronan, [Z3Y4Z3Y4Z3Y4Z3Y4Z3Y]; (f) heparan sulfate, Z3Y4[s2]u3Y4Z3[sn]Y4Z3L3L4XT

1. Go to https://glycologue.org/g/.
2. Click the “Enzymes” menu bar item, or the “Select” button on that page.
3. Change the iteration number to 10, then click “Run.”
4. In the Control panel, increase the iteration number to 11 to close the network (Fig. 7b).
5. Under “Enzyme profile,” click the fourth enzyme in the “Knockout” panel to knock out the activity of ST3Gal-V (EC 2.4.99.9, lactosylceramide α-2,3-sialyltransferase). Loss of this enzyme is known to result in auditory impairment in humans and mice [23]. The resulting reaction network is shown in Fig. 7c.
6. To compare with the full network, click “Change” in the “Enzyme profile” section to open the enzyme-selection tool.
7. Set the number of iterations to 8, then check the “comparing with the baseline” option before clicking “Run.” The Structures section of the results page can then be used to compare the glycans in the knockout with those of the full network with all enzymes active (cf. Subheading 2.3).

As with the O-glycosylation enzyme simulator, the ganglioside variant includes support for three sialic-acid donors,
Fig. 7 Simulation of ganglioside biosynthesis. (a) List of the enzymes modeled. (b) Full network of 41 oligosaccharides, with all 11 enzyme activities enabled. (c) The predicted ganglioside biosynthetic network with the activity of ST3Gal-V knocked out
CMP-Neu5Ac, CMP-Neu5Gc, and CMP-Kdn. Although the reaction network described by the above method is closed after 11 iterations, comprising 39 glycans including the initial substrate, ceramide (T), the number of possible gangliosides increases to 6077 on the inclusion of the three sialic-acid donors. By default, Glycologue does not include the rarer Neu5Gc and Kdn sugars in forward simulations, but will include them automatically during reversed simulations, in order to allow prediction of the enzymes leading to the target glycan, should the glycan contain these residues.
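The ganglioside series letters follow the number of sialic acids: GM, GD, GT, and GQ for one to four. Allowing three sialic-acid donors multiplies the number of variants of each structure. The sketch below makes two assumptions:

- an uppercase `S` marks a sialic-acid residue in a structure identifier;
- every sialic-acid position can independently take any of the three donors.

Under these assumptions, a glycan with k sialic acids has 3^k donor variants. The identifiers used below (`L4GT`, `S3L4GT`, `[S8S3]L4GT`) are illustrative and written in the style of Fig. 6, not checked against Glycologue. The sketch illustrates the combinatorics only and does not attempt to reproduce the simulator's count of 6077.

```python
import itertools

# Series prefixes for one to four sialic-acid residues (Subheading 2.6).
SERIES = {1: "GM", 2: "GD", 3: "GT", 4: "GQ"}
DONORS = ("CMP-Neu5Ac", "CMP-Neu5Gc", "CMP-Kdn")


def sialic_acid_count(identifier: str) -> int:
    # Assumption: each uppercase "S" denotes one sialic-acid residue.
    return identifier.count("S")


def ganglioside_series(identifier: str):
    """Return the series prefix, or None outside the one-to-four range."""
    return SERIES.get(sialic_acid_count(identifier))


def donor_variants(identifier: str) -> list:
    """All donor assignments, one donor per sialic-acid position (3**k tuples)."""
    return list(itertools.product(DONORS, repeat=sialic_acid_count(identifier)))


def variant_total(identifiers) -> int:
    """Upper bound on the network size if every position accepts every donor."""
    return sum(3 ** sialic_acid_count(i) for i in identifiers)


if __name__ == "__main__":
    for ident in ("L4GT", "S3L4GT", "[S8S3]L4GT"):
        print(ident, ganglioside_series(ident), len(donor_variants(ident)))
    print(variant_total(["L4GT", "S3L4GT", "[S8S3]L4GT"]))  # 1 + 3 + 9 = 13
```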
Acknowledgments

This work was supported in part by an EU Initial Training Network, Project No. 608381, Training in Neurodegeneration, Therapeutics Intervention and Neurorepair (TINTIN) (to G.P.D.) and Science Foundation Ireland Grant No. SFI-13/SP SSPC/I2893 (to G.P.D.).
References

1. Xu C, Ng DT (2018) Glycosylation-directed quality control of protein folding. Nat Rev Mol Cell Biol 16(12):742–752
2. Sharon N (1987) Bacterial lectins, cell-cell recognition and infectious disease. FEBS Lett 217(2):145–157
3. Endo T (2015) Glycobiology of α-dystroglycan and muscular dystrophy. J Biochem 157(1):1–12. https://doi.org/10.1093/jb/mvu066
4. Kanagawa M, Toda T (2018) Ribitol-phosphate, a newly identified posttranslational glycosylation unit in mammals: structure, modification enzymes and relationship to human diseases. J Biochem 163(5):359–369. https://doi.org/10.1093/jb/mvy020
5. Pearce OMT (2018) Cancer glycan epitopes: biosynthesis, structure and function. Glycobiology 28(9):670–696. https://doi.org/10.1093/glycob/cwy023
6. Oliveira-Ferrer L, Legler K, Milde-Langosch K (2017) Role of protein glycosylation in cancer metastasis. Semin Cancer Biol 44:141–152. https://doi.org/10.1016/j.semcancer.2017.03.002
7. Varki A, Kannagi R, Toole B, Stanley P (2015) Glycosylation changes in cancer. In: Varki A, Cummings RD et al (eds) Essentials of glycobiology. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp 597–609. https://doi.org/10.1101/glycobiology.3e.047
8. Tarp MA, Clausen H (2008) Mucin-type O-glycosylation and its potential use in drug and vaccine development. Biochim Biophys Acta 1780(3):546–563. https://doi.org/10.1016/j.bbagen.2007.09.010
9. Heimburg-Molinaro J, Lum M, Vijay G, Jain M, Almogren A, Rittenhouse-Olson K (2011) Cancer vaccines and carbohydrate epitopes. Vaccine 29(48):8802–8826. https://doi.org/10.1016/j.vaccine.2011.09.009
10. Akune Y, Hosoda M, Kaiya S, Shinmachi D, Aoki-Kinoshita KF (2010) The RINGS resource for glycome informatics analysis and data mining on the Web. OMICS 14(4):475–486. https://doi.org/10.1089/omi.2009.0129
11. Liu G, Neelamegham S (2014) A computational framework for the automated construction of glycosylation reaction networks. PLoS One 9(6):e100939. https://doi.org/10.1371/journal.pone.0100939
12. McDonald AG, Tipton KF, Davey GP (2016) A knowledge-based system for display and prediction of O-glycosylation network behaviour in response to enzyme knockouts. PLoS Comput Biol 12(4):e1004844. https://doi.org/10.1371/journal.pcbi.1004844
13. McDonald AG, Tipton KF, Stroop CJ, Davey GP (2010) GlycoForm and Glycologue: two software applications for the rapid construction and display of N-glycans from mammalian sources. BMC Res Notes 3:173. https://doi.org/10.1186/1756-0500-3-173
14. Varki A, Cummings RD, Aebi M, Packer NH, Seeberger PH, Esko JD, Stanley P, Hart G, Darvill A, Kinoshita T, Prestegard JJ, Schnaar RL, Freeze HH, Marth JD, Bertozzi CR, Etzler ME, Frank M, Vliegenthart JF, Lutteke T, Perez S, Bolton E, Rudd P, Paulson J, Kanehisa M, Toukach P, Aoki-Kinoshita KF, Dell A, Narimatsu H, York W, Taniguchi N, Kornfeld S (2015) Symbol nomenclature for graphical representations of glycans. Glycobiology 25(12):1323–1324. https://doi.org/10.1093/glycob/cwv091
15. Sharon N (1986) IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN). Nomenclature of glycoproteins, glycopeptides and peptidoglycans. Recommendations 1985. Eur J Biochem 159(1):1–6
16. Herget S, Ranzinger R, Maass K, Lieth CW (2008) GlycoCT: a unifying sequence format for carbohydrates. Carbohydr Res 343(12):2162–2171. https://doi.org/10.1016/j.carres.2008.03.011
17. Banin E, Neuberger Y, Altshuler Y, Halevi A, Inbar O, Nir D, Dukler A (2002) A novel linear code® nomenclature for complex carbohydrates. Trends Glycosci Glycotechnol 14(77):127–137. https://doi.org/10.4052/tigg.14.127
18. Matsuoka Y, Funahashi A, Ghosh S, Kitano H (2014) Modeling and simulation using CellDesigner. Methods Mol Biol 1164:121–145. https://doi.org/10.1007/978-1-4939-0805-9_11
19. Medley JK, Choi K, König M, Smith L, Gu S, Hellerstein J, Sealfon SC, Sauro HM (2018) Tellurium notebooks: an environment for reproducible dynamical modeling in systems biology. PLoS Comput Biol 14(6):e1006220. https://doi.org/10.1371/journal.pcbi.1006220
20. Jin C, Kenny DT, Skoog EC, Padra M, Adamczyk B, Vitizeva V, Thorell A, Venkatakrishnan V, Linden SK, Karlsson NG (2017) Structural diversity of human gastric mucin glycans. Mol Cell Proteomics. https://doi.org/10.1074/mcp.M117.067983
21. Tiemeyer M, Aoki K, Paulson J, Cummings RD, York WS, Karlsson NG, Lisacek F, Packer NH, Campbell MP, Aoki NP, Fujita A, Matsubara M, Shinmachi D, Tsuchiya S, Yamada I, Pierce M, Ranzinger R, Narimatsu H, Aoki-Kinoshita KF (2017) GlyTouCan: an accessible glycan structure repository. Glycobiology 27(10):915–919. https://doi.org/10.1093/glycob/cwx066
22. International Glycan Structure Repository. GlyTouCan; Accession Number = G46941CR. Available via GlyTouCan.org. https://glytoucan.org/Structures/Glycans/G46941CR
23. Inokuchi JI, Go S, Yoshikawa M, Strauss K (2017) Gangliosides and hearing. Biochim Biophys Acta 1861(10):2485–2493. https://doi.org/10.1016/j.bbagen.2017.05.025
24. McDonald AG, Davey GP (2021) Simulating the enzymes of ganglioside biosynthesis with Glycologue. Beilstein J Org Chem 17:739–748
Chapter 12

Synthetic Strategies for FRET-Enabled Carbohydrate Active Enzyme Probes

Meenakshi Singh, Michael Watkinson, Eoin M. Scanlan, and Gavin J. Miller

Abstract

Carbohydrates are an essential class of biomolecule. Carbohydrate active enzymes (CAZys) catalyze their synthesis, refinement, and degradation, thereby contributing an overall regulatory capacity to their underpinning physiological roles. Here we survey recent accomplishments in accessing defined carbohydrate structures suitably equipped with FRET probe capability. We then survey the use of these structures in studying particular classes of CAZy.

Key words Glycosidase, Carbohydrate, FRET, Enzyme probe
1 Introduction

Glycosylation is ubiquitous in nature, and carbohydrates play pivotal roles in a diverse range of complex biological processes, including fertilization, neuronal development, cell–cell interactions, hormone activity, inflammatory responses, and infection [1–3]. Metabolism of glycans is therefore critical to the regulated biological function of an organism. Carbohydrates are composed of individual monosaccharides covalently linked through a diverse array of glycosidic bonds. Degradation of carbohydrates is challenging owing to the relative stability of these glycosidic linkages. Their hydrolysis is, however, catalyzed by enzymes known as glycosidases, which typically increase the rate of hydrolysis by 10¹⁷-fold compared with the spontaneous reaction [4]. This remarkable rate acceleration identifies glycosidases as highly efficient catalysts. Although they catalyze similar reactions, they also exhibit exquisite substrate selectivity [5]. The large number of glycosidases found in nature (e.g., approximately 3% of the human genome is dedicated to encoding glycosidases) has led to their classification into various
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_12, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Meenakshi Singh et al.
sub-categories based on criteria such as catalytic mechanism, substrate specificity, site of glycosidic bond cleavage, and amino acid sequence [6, 7]. exo-Glycosidases act specifically on the termini of polysaccharides, while endo-glycosidases hydrolyze internal glycosidic linkages. Hydrolysis can occur via one of two widely accepted mechanisms, proceeding with either retention or inversion of stereochemistry at the anomeric position, as outlined by Koshland in 1953 [8]. Both mechanisms involve oxocarbenium ion-like transition states and two catalytic amino acid residues (Asp or Glu). Several studies detailing the mechanisms of glycosidase activity on various carbohydrates have been published [5, 9, 10]. More recently, evidence for a novel epoxide intermediate in glycosidase catalysis has also been reported [11]. Glycosidases have been organized into more than 120 families based on amino acid sequence similarity, and this classification led to the creation of the CAZy database (Carbohydrate-Active enZYmes, available online at www.cazy.org) [12–14]. Glycosidases grouped within a particular family share structural similarities, and their hydrolytic mechanisms are often identical. Glycosidases play a prominent role in many biological and industrial processes. This has spurred interest in developing tools to accurately detect and profile these enzymes, so as to improve our understanding of their biological function, particularly in the context of human health and disease [15]. Glycosidase deficiencies are associated with several pathological conditions, including Gaucher disease, Pompe disease, Crohn’s disease, and cancer [16, 17]. Schindler and Fabry diseases arise from incomplete degradation of carbohydrates with terminal α-N-acetylgalactosamine and α-galactose, respectively [18].
Heparanase, a heparan sulfate-degrading endoglycosidase, is overexpressed in almost all cancer types, where it is closely associated with tumorigenesis and metastasis [19]. Glycosidase genes are also widely used as reporter genes; for example, lacZ, the gene encoding β-galactosidase, is used extensively as a reporter in animals and yeast [20]. Glycosidases also play a critical role in industry, with xylanases and cellulases of particular interest for the bulk production of bioethanol [21]. This review surveys the development and application of fluorescent probes that detect this important class of enzyme using appropriately labeled carbohydrate substrates (from 2010 onward). The focus is on the design and synthesis of chemical probes in which a glycosidic linkage is hydrolyzed by the enzyme under study and a reporter molecule is released. This enables ratiometric monitoring of activity using Förster Resonance Energy Transfer (FRET). The review does not cover other families of fluorescent probes or activity-based glycosidase probes, which have been comprehensively reviewed elsewhere [22]. A brief overview of FRET is presented first, followed by consideration of probes for endo- and
Synthetic Strategies for FRET-Enabled Carbohydrate Active Enzyme Probes
exo-acting glycosidases, and ending with recent examples beyond this classification, for example, glycosyltransferases and antibodies with innate glycosidase activity.
2 Förster Resonance Energy Transfer (FRET)

FRET, often also referred to as Fluorescence Resonance Energy Transfer, is the process by which energy is transferred nonradiatively from an excited-state donor chromophore (D) to an acceptor chromophore (A). The energy is not emitted from the donor as a photon, as it would be in a conventional fluorescence response, and the resulting readout is ratiometric in nature. There are a number of requirements for FRET to occur. The underlying theory has been reviewed extensively elsewhere and is therefore not covered here [23, 24]. The phenomenon is highly distance dependent. The transfer efficiency falls off with the sixth power of the distance between donor and acceptor, and FRET is generally only considered effective over distances below 10 nm in biological systems. As this is also the length scale of proteins, FRET has found widespread use in probing protein–protein interactions, modifications, conformational changes, and a plethora of biochemical signaling events and processes [25]. It has often been described as a “molecular ruler.” Close proximity alone is not enough. There must also be adequate overlap between the fluorescence spectrum of D and the absorption spectrum of A. In addition, the quantum yield of D (ΦD) must be sufficiently high and the absorption coefficient of A (εA) must be high (Fig. 1). A further requirement is that the transition dipoles of D and A must be appropriately oriented, or at least one of them must have rapid rotational freedom; otherwise, distance estimations can suffer large errors. This condition is usually met for chromophores attached to biomolecules. FRET is also sensitive to the local environment (e.g., solvent, pH, polarity, and viscosity), and these factors need to be considered when interpreting data.
A wide range of chromophores have been employed in FRET systems and whilst genetically encoded biosensors utilizing fluorescent proteins have found considerable applications in the life sciences [26], a number of other chromophores have also been exploited including organic dyes, quantum dots, and lanthanide complexes [27]. Organic fluorophores are particularly useful as they can readily be attached covalently to a range of substrates such as amino acid residues, nucleic acids, and carbohydrates; large arrays of such labels are synthetically accessible or commercially available with desirable photochemical properties [28]. Quantum dots are also increasingly used and are appealing chromophores due to their highly tunable photochemical
Fig. 1 Illustration of donor and acceptor partners required for FRET, including the spectral overlap region of component fluorophores, adapted with permission from ref. 25
properties, broad absorption, and narrow emission bands. In addition, they generally display excellent chemical and thermal stability, and increasingly reliable and versatile methods are available to link them to the analyte in question [27]. Lanthanide complexes (principally of Tb³⁺ and Eu³⁺) are also increasingly popular. Although their absorption profiles are generally narrower, their long photoluminescence lifetimes make them excellent donors. In this review the focus is on FRET systems that exploit enzymatic cleavage to elicit a change in the fluorescence response. This approach has been widely used in biosensors that detect enzymatic activity, in which the biosensor design includes a specific cleavage sequence for the enzyme of interest [25]. Prior to enzymatic action, the FRET donor and acceptor are close enough for efficient energy transfer to occur (FRET quenching). Upon cleavage of the biomolecule by the enzyme, however, they move apart. This reduces the FRET response and causes a concomitant increase in donor fluorescence, which allows the analyte to be quantified.
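For a cleavage-based probe, the donor signal can be converted into the fraction of substrate hydrolyzed. This is done by interpolating between the reading for the intact probe and the reading for a fully cleaved reference. The sketch below assumes that donor emission rises linearly with the fraction cleaved, with no inner-filter effects and no residual FRET in the products. It uses invented readings and a hypothetical probe concentration. The initial rate is taken as the ordinary least-squares slope over early time points.

```python
def fraction_cleaved(f_t: float, f_0: float, f_max: float) -> float:
    """Fraction hydrolyzed, interpolating between intact (f_0) and fully cleaved (f_max) signals."""
    if f_max == f_0:
        raise ValueError("f_max and f_0 must differ")
    return (f_t - f_0) / (f_max - f_0)


def initial_rate(times, fractions) -> float:
    """Ordinary least-squares slope of fraction cleaved versus time."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two time points")
    t_mean = sum(times) / n
    y_mean = sum(fractions) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, fractions))
    sxx = sum((t - t_mean) ** 2 for t in times)
    return sxy / sxx


if __name__ == "__main__":
    times_min = [0, 1, 2, 3]
    donor_signal = [100.0, 190.0, 280.0, 370.0]  # hypothetical readings
    f0, fmax = 100.0, 1000.0                      # intact probe / fully cleaved
    frac = [fraction_cleaved(f, f0, fmax) for f in donor_signal]
    rate = initial_rate(times_min, frac)          # fraction per minute
    substrate_uM = 10.0                           # hypothetical probe concentration
    print(f"rate = {rate * substrate_uM:.2f} uM/min")
```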
3 FRET Probes for Endo-Acting Glycosyl Hydrolases
3.1 Maltoligosaccharide Probes for α-Glucosidase
In 2012 Matsuoka and coworkers synthesized a series of FRET-oligosaccharide probes for α-amylase [29], an enzyme responsible for the cleavage of α-1,4 glycosidic bonds in large polysaccharide architectures, including starch and glycogen. Links between α-amylase isozyme activity and diseases such as myeloma and
Scheme 1 Reagents and conditions: (i) HBr/AcOH, Ac2O, AcOH, then AcSK, DMF, 72% over two steps; (ii) 5-bromopent-1-ene, DMF, diethylamine, 91%; (iii) NaOMe, MeOH, quant.; (iv) 2-naphthaldehyde di-ibutyl acetal, CSA, then Ac2O-pyr, 45%; (v) BH3·NMe3, AlCl3, MS 4 Å, THF, 74%; (vi) NaOMe, MeOH, quant.; (vii) HS(CH2)2NH2·HCl, MeOH, H2O, UV irradiation, quant.; (viii) dansyl chloride, Et3N, MeOH, quant.
diabetes have been reported, so synthetic probes that can ratiometrically determine α-amylase activity are important [30]. Building on their previous work in this area [30], the group synthesized a small library of maltoligosaccharides with terminal FRET partners: a naphthylmethyl (NAP) donor and a dansyl (DANSYL) acceptor. The length of the repeating α-(1→4)-D-Glc chain was varied from trisaccharide up to heptasaccharide. Peracetylated maltoligosaccharide starting materials were sourced from maltodextrin or through acetolysis of cyclodextrins. Model work using maltose was first undertaken to establish an efficient route for fluorescent label incorporation (Scheme 1). Accordingly, anomeric acetate mixture 1 was converted to an anomeric thioacetate via SN2 reaction of the derived α-bromo glycoside with KSAc. Subsequent chemoselective S-deacetylation using diethylamine was followed by installation of an S-pentenyl glycoside at the reducing end in 72% yield over two steps. Zemplén deacetylation then afforded S-pentenyl maltoside 2 in quantitative yield. A 4,6-naphthylidene acetal was next installed at the nonreducing end, followed by reacetylation of the remaining hydroxyl groups. Regioselective opening of the acetal using BH3·NMe3–AlCl3 (74% yield) then placed the NAP donor at the 6-position of the nonreducing end. A further Zemplén deacetylation gave 6′-O-NAP-S-pentenyl maltoside 3. With the nonreducing component in place, two steps completed the model probe, both reported in quantitative yield. Radical-mediated anti-Markovnikov addition of 2-mercaptoethylamine to the S-pentenyl group was followed by reaction of the free amine with dansyl chloride, affording target model probe 4. Solvent choice was important for the radical functionalization reaction, with MeOH/H2O found to provide optimal solubility of both starting materials.
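For a linear sequence, the overall yield is the product of the step yields. The sketch below estimates the overall yield of the route from acetate mixture 1 to model probe 4. It uses the yields listed in Scheme 1 and treats each "quant." entry as 100%. That is an assumption, since quantitative yields are rarely exactly 100%, so the estimate is an upper bound.

```python
from math import prod

# Step yields as listed in Scheme 1; "quant." is taken as 1.0 (assumption).
SCHEME_1_YIELDS = [
    ("(i) bromination, then thioacetate formation (two steps)", 0.72),
    ("(ii) S-deacetylation and S-pentenylation", 0.91),
    ("(iii) Zemplén deacetylation", 1.0),
    ("(iv) naphthylidene acetal, then acetylation", 0.45),
    ("(v) regioselective acetal opening", 0.74),
    ("(vi) Zemplén deacetylation", 1.0),
    ("(vii) thiol-ene addition of cysteamine", 1.0),
    ("(viii) dansylation", 1.0),
]


def overall_yield(steps) -> float:
    """Overall yield of a linear sequence = product of the individual step yields."""
    return prod(y for _, y in steps)


if __name__ == "__main__":
    print(f"overall yield 1 -> 4: {overall_yield(SCHEME_1_YIELDS):.1%}")
```

The product comes to about 21.8%, and the acetal installation (45%) is the lowest-yielding step in this sequence.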
Fig. 2 (a) Library of nonreducing end NAP and reducing-end DANSYL functionalized maltoligosaccharide probes 5–10, the α-amylase cleavage point is shown in green. (b) Time course for hydrolysis of maltoligosaccharide probes by α-amylase using fluorescence detection at 333 nm, adapted with permission from ref. 29. Orange and blue curves represent hexa- and heptasaccharide substrates with higher fluorescence intensities of NAP emission
This successful synthetic route was then adapted for longer maltoligosaccharide sequences to afford a small matrix of compounds, 5–10, with the internal chain length varying between one and five sugar units (Fig. 2a). When completing this synthesis, the authors made two observations. First, the reaction time for standard Zemplén transesterification had to be increased on longer substrates. Second, the previously successful 4,6-naphthylidene acetal installation at the nonreducing end was not regioselective, requiring careful chromatographic separation of mono-, di-, and tri-O-naphthylmethylidenated materials. The installed NAP and DANSYL fluorogenic partners enabled a continuous assay to be established for each of substrates 5–10 using human salivary α-amylase (Fig. 2b). For sequences longer than pentasaccharide, a significant increase in relative fluorescence intensity at 333 nm (NAP emission) was observed (corrected for the initial FRET dansyl intensity). This indicated that chain length correlated with enzymatic activity and that longer sequences may be better substrates for α-amylase. The results obtained by Matsuoka correlated with those reported previously using a 14C-labeled maltoligosaccharide [31].

3.2 Chitooligosaccharides for an Endo-β-Glycosidase
In 2013, Halila and colleagues developed a chitin-derived FRET probe [32]. Chitin is a linear polysaccharide composed of β-(1,4)-linked N-acetylglucosamine (GlcNAc) units. It is a natural polysaccharide with valuable biological and biomedical properties, and methods to access and functionalize these materials in homogeneous form are an area of active research. Using a reducing-end hemiacetal/aldehyde modification, with aniline-catalyzed amine ligation followed by
Scheme 2 Reagents and conditions: (i) NaBH3CN, AcONH4, aniline, propargylamine, 70%; (ii) N3-EDANS, CuSO4, sodium ascorbate; (iii) DABITC, NaHCO3, 48% over two steps
reductive amination, the group installed a click-ready tether at one terminus (Scheme 2). Starting from tetra-N-acetyl-chitopentaose 11, they developed an optimized ligation method to deliver propargyl-modified oligosaccharide 12. Using an excess of propargylamine ([propargylamine]:[11]:[aniline] stoichiometry = 5:1:0.5) gave 12 in 70% yield. The authors proposed that the reaction proceeded via rapid formation of an intermediate aniline-iminium ion. This intermediate then underwent transimination with propargylamine, followed by reduction with NaBH3CN to give 12. With this oligosaccharide in hand, a FRET pair was installed using a method developed by Cottaz (Scheme 2) [33]: 5-((2-aminoethyl)amino)naphthalene-1-sulfonic acid (EDANS) and dimethylaminophenylazophenyl (DAB). This introduced the fluorescent EDANS component at the opened reducing-end sugar and the DAB acceptor through the amine of the nonreducing GlcNAc residue. Copper-catalyzed azide–alkyne cycloaddition (CuAAC) “click” chemistry was used to incorporate EDANS. The free amine was then reacted with the isothiocyanate form of DAB (4-dimethylaminoazobenzene-4′-isothiocyanate, DABITC), giving the final probe 13 in 48% yield over the two steps. This fluorescence-quenched material was shown to be an active substrate for chitinase A1 from Bacillus circulans. Cleavage of the oligosaccharide by the enzyme separated the FRET-quenched pair, producing a time-dependent increase in fluorescence at 490 nm corresponding to EDANS emission.
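The 5:1:0.5 stoichiometry and the reported yields translate directly into reagent amounts and an overall yield for 11 → 13. In the sketch below, the 10 µmol scale is hypothetical. The molar ratio and the 70% and 48% (two-step) yields are those quoted above.

```python
# Molar ratio [propargylamine]:[11]:[aniline] = 5:1:0.5 (from the text).
RATIO = {"propargylamine": 5.0, "chitopentaose 11": 1.0, "aniline": 0.5}


def reagent_amounts(substrate_umol: float) -> dict:
    """Scale the molar ratio to a given amount of oligosaccharide 11 (in umol)."""
    return {name: eq * substrate_umol for name, eq in RATIO.items()}


def sequence_yield(*step_yields: float) -> float:
    """Multiply successive yields (as fractions) for a linear sequence."""
    result = 1.0
    for y in step_yields:
        result *= y
    return result


if __name__ == "__main__":
    print(reagent_amounts(10.0))  # hypothetical 10 umol scale
    # 11 -> 12 (70%), then 12 -> 13 (48% over the CuAAC and DABITC steps)
    print(f"overall 11 -> 13: {sequence_yield(0.70, 0.48):.1%}")
```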
Meenakshi Singh et al.
3.3 Fluorogenic Probe for Endo-β-N-Acetylglucosaminidases
3.3.1 Core N-glycan Probes
Endo-β-N-acetylglucosaminidase (ENGase) is a hydrolytic enzyme that cleaves within the N,N′-diacetylchitobiose core of asparagine-linked glycans. Recently, cytosolic ENGases have been highlighted as crucial mediators in the development and pathogenesis of NGLY1 deficiency, a rare genetic disorder arising from loss of N-glycanase 1 (NGLY1) activity. ENGase is therefore a promising drug target, and the discovery of inhibitors requires a rapid and accurate screening method. In 2018, Matsuo and coworkers developed a fluorescence-quenched pentasaccharide probe 14 for ENGase activity [34], incorporating a fluorescent N-methylanthraniloyl (NMA) group at the nonreducing terminus and a 2,4-dinitrophenyl (DNP) acceptor at the reducing end (Fig. 3a). The group used their previously reported chemical methodology to access the required pentasaccharide core [35] and attached the FRET components to the respective carbohydrate amines by nucleophilic aromatic substitution with 2,4-dinitrophenylfluoride (for DNP) and by amidation with N-methylanthranilic acid (for NMA). Probe 14 was first shown to be 98% quenched by the intramolecular donor–acceptor pair (Fig. 3b, inset), and its hydrolysis by the commercial ENGase Endo-M from Mucor hiemalis was then confirmed. The reaction was monitored by HPLC, and the hydrolysis products were identified by comparing their retention times with those of authentic standards. Following this initial study, real-time monitoring of the hydrolysis reaction was tested with five ENGases from different species, and activity was confirmed for human and mouse ENGases, Endo-Om and Endo-CC, but no activity was observed with Endo-H. This was rationalized by Endo-H not generally being considered able to cleave core N-glycan regions, instead
Fig. 3 (a) Pentasaccharide ENGase FRET probe 14; green highlights the ENGase cleavage point, blue the NMA fluorescent component and red the DNP acceptor. (b) Fluorescence emission spectrum of 14 (inset, red) alongside the NMA-containing cleavage product (blue), following probe cleavage, with excitation at 340 nm; concentration of 14 = 40 μM. Adapted with permission from ref. 34
preferring high-mannose-type substrates. Finally, the authors adapted their probe to a high-throughput screening (HTS) format. Hydrolysis of 14 could be detected down to 0.5 μM and, importantly for HTS development, the inclusion of up to 20% DMSO in the reaction medium had no effect on probe activity. Using thiazoline and rabeprazole sodium as inhibitors of Endo-M, IC50 values of 626 and 40 μM, respectively, were obtained, indicating that the probe can be applied to screen libraries of potential ENGase inhibitors.
3.3.2 High Mannose Type ENGase Probes (Endo-H)
In 2019, Matsuo's team further extended their ENGase probe capability, preparing the FRET-quenched, high-mannose-type heptasaccharide 15 (Fig. 4) [36]. This was designed specifically to probe Endo-H activity, which could not be detected with the previously described probe of general ENGase activity. Chemical synthesis of the probe started with galactosylchitobiose 16 as an acceptor and mannose thioglycoside 17 as a donor (Scheme 3). The C-3 hydroxyl group of the galactose residue in 16 had been shown to have enhanced nucleophilicity [37], so coupling with 17 was carried out with the galactosyl hydroxyl groups of 16 left unprotected, providing tetrasaccharide 18 in 14% yield. Tetrasaccharide 18 was next reacted with mannotriose donor 21, synthesized by glycosylation of mannosyl acceptor 19 with chloride donor 20 using silver triflate (AgOTf) as a promoter. An N-iodosuccinimide/trifluoromethanesulfonic acid (NIS/TfOH)-promoted coupling of donor 21 with tetrasaccharide 18 provided heptasaccharide 22, linked through galactose C-6, in 25% yield. Triflation of the C-2 and C-4 hydroxyl groups in 22 with trifluoromethanesulfonic anhydride (Tf2O) then gave 23. Successful triflation enabled a regioselective nucleophilic substitution of the C-4 triflate with azide using TBAN3 (Scheme 4). A second nucleophilic substitution at C-2 with CsOAc and 18-crown-6 in toluene under ultrasonication yielded
Fig. 4 Heptasaccharide Endo-H FRET probe 15, green region highlights the Endo-H cleavage point, blue the NMA fluorescent component and red the DNP acceptor
Scheme 3 Reagents and conditions: (i) NIS, TfOH, MS 4 Å, CH2Cl2, −78 °C, 14%; (ii) AgOTf, MS 4 Å, CH2Cl2, 40 °C, 92%; (iii) NIS, TfOH, MS 4 Å, CH2Cl2, −78 °C, 25%; (iv) Tf2O, pyridine, CH2Cl2, 10 °C
Scheme 4 Reagents and conditions: (i) TBAN3, toluene; (ii) CsOAc, 18-crown-6, toluene, ultrasonication, 40 °C, 65% (three steps from 22); (iii) (a) TBAF, THF, 40 °C, (b) ethylenediamine, n-BuOH, 90 °C, (c) Ac2O, pyridine, 40 °C, (d) NaOMe, MeOH, THF, 30 °C, (e) H2, Pd(OH)2, THF, H2O, 40 °C, 57% (five steps); (iv) (a) 2,4-dinitrophenylfluoride, NaHCO3, MeOH, H2O, (b) N-methylanthranilic acid, HATU, DMAP, DMSO, 31% (two steps)
heptasaccharide derivative 24. This material then underwent a series of protecting group manipulations, developed previously by Hindsgaul [38], followed by deprotection. Exploiting the difference in reactivity between the two free amino groups, a DNP group was then selectively introduced at the reducing-end aminopropyl
Fig. 5 Monitoring hydrolysis of probe 15 by Endo-H at different substrate concentrations (green and blue traces) compared to pentasaccharide probe MM3 (orange trace), adapted with permission from ref. 36
group, followed by labeling of the nonreducing end with N-methylanthranilic acid (MANT), to afford the final FRET-enabled heptasaccharide probe 15. Heptasaccharide 15 was confirmed to be 96% quenched by the DNP group and was then incubated with Endo-H, with the hydrolysis reaction monitored by HPLC using UV (360 nm) and fluorescence (440 nm, excitation at 430 nm) detection. Fluorescence monitoring showed formation of hexasaccharide and monosaccharide products, indicative of branching mannose cleavage, alongside an increase in fluorescence with increasing concentrations of Endo-H. In contrast, a pentasaccharide lacking branching mannose residues (MM3D) was not hydrolyzed by Endo-H (Fig. 5). To demonstrate broader application of the probe, the team performed substrate specificity studies with Endo-M, a member of glycoside hydrolase (GH) family 85, comparing the hydrolysis of 15 and MM3D. Both probes were hydrolyzed, but activity was higher for the MM3D ligand, consistent with the previously reported substrate specificity of Endo-M.
3.4 A FRET Assay for Monitoring Golgi Endo-α-Mannosidase Activity
Matsuo et al. also reported the synthesis of a tetrasaccharide probe for quantitative detection of the hydrolytic activity of Golgi endo-α-mannosidase (G-EM) using a fluorescence quenching assay [39]. G-EM catalyzes deglycosylation of N-glycans and plays crucial roles in the post-endoplasmic reticulum quality control pathway [40]. Radiolabeled N-glycans have previously been used to measure G-EM activity; however, these techniques face limitations, such as challenging chromatographic separation of
radioisotopically labeled compounds, poor reproducibility, and lack of real-time monitoring. Furthermore, previously reported fluorogenic probes require an enzyme inhibitor to be included in the assay to prevent removal of the glucose residue by ER glucosidase II [41], which complicates the process. To overcome these limitations, the group synthesized tetrasaccharide probe 25, which enables accurate measurement of G-EM activity based on FRET quenching and is not susceptible to glucosidase activity. The tetrasaccharide probe was labeled with MANT as the donor dye at the nonreducing end and a DNP group as the acceptor at the reducing end (Fig. 6). The synthesis of 25 used mannotriose 26, conveniently prepared from a common thioglycoside precursor and bearing a pivaloyl (Piv) group at the nonreducing terminal C-3 position to provide selectivity in introducing the glucose component (Scheme 5).
Fig. 6 Structure of fluorogenic probe 25 containing a high-mannose type oligosaccharide labeled with MANT and DNP components. The enzyme cleavage position is shown in green
Scheme 5 Reagents and conditions: (i) (a) TBAF, THF, (b) NaOMe, THF, MeOH, 40 °C, 83%, two steps; (ii) (a) trimethyl orthoacetate, CSA, CH3CN, (b) HCl, EtOAc, 61%, two steps; (iii) 29, NIS, AgOTf, MS 4 Å, CH2Cl2, 20 °C, 63%; (iv) (a) TBAF, THF, 40 °C, (b) ethylenediamine, n-BuOH, 90 °C, 60%, two steps; (v) Boc2O, THF; (vi) NaOMe, THF, MeOH, 40 °C, 74%, two steps; (vii) Pd(OH)2/C, t-BuOH, H2O, 40 °C, 97%; (viii) 1-fluoro-2,4-dinitrobenzene, NaHCO3, MeOH, H2O, 61%; (ix) TFA, 0 °C; (x) (a) N-methylanthranilic anhydride, NaHCO3, 1,4-dioxane, (b) NaOMe, MeOH, 61%, three steps
tert-Butyldimethylsilyl (TBS) deprotection and subsequent selective removal of the Piv group afforded 27, which was then acetylated to obtain acceptor 28. Glycosylation of this acceptor with donor 29 (obtained in two steps from a benzylidene glucose derivative), using NIS/AgOTf as the promoter, afforded tetrasaccharide 30 as a mixture of α- and β-anomers, which were separated by silica gel column chromatography. Final deprotection and stepwise introduction of the MANT and DNP labels then gave the doubly labeled α-tetrasaccharide probe 25. The photophysical properties of probe 25 were investigated by comparing the fluorescence intensity and decay of 25 with those of the MANT-labeled disaccharide 31 (the expected product of hydrolysis of 25 by G-EM). The fluorescence intensity of 31 was found to be 53 times higher than that of probe 25, indicating that the MANT group is 95% quenched by the DNP acceptor. To evaluate 25 as a substrate, hydrolysis experiments were performed with recombinant human G-EM [42], and the reaction was monitored by HPLC with UV detection. G-EM cleaved the central glycosidic linkage of 25 to give the Glc–Man disaccharide 31 and the mannobiose derivative 32 (Fig. 7). The group also followed hydrolysis of probe 25 by G-EM in real time using a microplate reader. Different concentrations of G-EM were incubated with 5 μM of 25, and the fluorescence intensity increased rapidly, more so at higher enzyme concentrations. Notably, the fluorescence intensity remained unchanged when the β-Glc-configured stereoisomer of 25 was used, confirming the specificity of G-EM for the α-Glc-(1→3)-Man linkage. Determination of the Michaelis constant (Km) for 25 with G-EM (Km = 19 μM) and for a tetrasaccharide derivative lacking the MANT donor (Km = 15 μM) indicated no significant difference with regard to
Fig. 7 HPLC traces for cleavage of probe 25 by human G-EM. Disaccharide products 32 and 31 were detected at 360 and 440 nm, respectively. Adapted with permission from ref. 39
enzyme recognition. The resistance of fluorogenic probe 25 to glucosidase was also confirmed: no reaction was observed when 25 was incubated with glucosidase II. Importantly, this research described the first example of a FRET probe for detecting G-EM activity without the need to include a glucosidase inhibitor in the assay.
3.5 Fluorescent Heparin to Monitor Endo-β-D-Glucuronidase Activity
Human heparanase (Hpa) is an endo-β-D-glucuronidase that cleaves linkages between glucuronic acid (GlcA) and glucosamine (GlcN) residues in heparin (H) and heparan sulfate (HS) chains (Fig. 8). These polymeric chains are present on cell surfaces and in the extracellular matrix and play a crucial role in tumor growth, angiogenesis, and metastasis [43]. Given the pathophysiological importance of Hpa, several biochemical and biophysical methods have been developed to detect its activity [44–46]. Initially, these methods focused on quantifying H/HS degradation products, for example, GlcA-terminated reducing sugars; more recently, activity-based probes for Hpa have been established [47]. In 2019, Desai and colleagues reported the synthesis of fluorescently labeled H as an efficient FRET probe for quantifying human Hpa activity and screening potential inhibitors [48]. The team used heterogeneous heparin (containing repeating GlcA–GlcNS6S units) conjugated with 4′-dimethylaminoazobenzene-4-carboxylate (DABCYL) and EDANS as the FRET pair. Conjugation was achieved by conventional carbodiimide coupling of uronic acid carboxylates within the polysaccharide to amine-terminated labels. As a result, either label could in principle be introduced anywhere along the chain, and the team prepared four heparin–FRET samples using varying equivalents of EDANS and DABCYL (1:1, 1:2, 2:2, and 2:1) per average chain of unfractionated heparin. 1H NMR spectroscopy was used to determine the stoichiometries of the two FRET labels in the final products. Cleavage of these substrates by Hpa and the consequent FRET signal enhancement were evaluated, and the polymer carrying a single EDANS and a single DABCYL showed the highest increase in fluorescence intensity.
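Comparisons like this rest on two numbers: how strongly the intact substrate is quenched and how much donor signal is recovered on cleavage. A common way to express the first is E = 1 − F_DA/F_D, where F_DA is the donor emission of the intact, doubly labeled material and F_D that of a donor-only reference at the same concentration and instrument settings; if the cleaved product emits like that reference, the largest turn-on complete cleavage can give is 1/(1 − E). The sketch assumes exactly this definition. The cited studies may have used other references (for example fluorescence lifetimes), and the intensities below are hypothetical.

```python
def quenching_efficiency(f_intact: float, f_donor_only: float) -> float:
    """Fraction of donor emission suppressed in the intact probe: E = 1 - F_DA / F_D."""
    if f_donor_only <= 0:
        raise ValueError("donor-only intensity must be positive")
    return 1.0 - f_intact / f_donor_only


def max_turn_on(efficiency: float) -> float:
    """Fold increase expected when every probe molecule is cleaved: 1 / (1 - E)."""
    if not 0.0 <= efficiency < 1.0:
        raise ValueError("efficiency must lie in [0, 1)")
    return 1.0 / (1.0 - efficiency)


if __name__ == "__main__":
    # Hypothetical donor intensities (arbitrary units): (intact probe, donor-only reference).
    samples = {"sample A": (4.0, 100.0), "sample B": (30.0, 100.0)}
    for name, (f_da, f_d) in samples.items():
        e = quenching_efficiency(f_da, f_d)
        print(f"{name}: E = {e:.2f}, max turn-on = {max_turn_on(e):.1f}-fold")
```

Unpaired donor (for example a chain carrying more EDANS than DABCYL) raises the baseline and lowers the attainable turn-on, which is one plausible reason a 1:1 labeled chain can give the largest enhancement; this is an interpretation, not a conclusion stated in ref. 48.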
Fig. 8 Chemical structure of heparin (H)/heparan sulfate (HS), Hpa cleavage site shown in green
Next, the team investigated the kinetic properties of the substrate to establish optimal conditions for an Hpa activity assay. The Km was calculated as 46 ± 12 μM, and inhibition of Hpa by suramin [49], measured with the FRET probe, showed the assay to be sufficiently sensitive; the IC50 for suramin was established as 330 ± 2.5 μM. Furthermore, the group used the FRET assay to monitor the activity of Hpa secreted by cancer cells. The human breast cancer cell line MCF7 was grown under both normoxic and hypoxic conditions with varying amounts of fetal bovine serum (FBS; 0%, 2%, and 10%) to mimic in vivo conditions. Hpa activity was found to be almost 10% higher under hypoxic than under normoxic conditions for cells grown in FBS-free media [50]. Cells grown in 2% FBS showed little difference in activity between hypoxic and normoxic conditions, whereas, interestingly, 10% FBS repressed Hpa activity under both conditions. The establishment of this rapid and simple assay for heparanase activity is important as the field works to identify potential inhibitors and to monitor enzyme activity in cells. Importantly, the authors cautioned that labeled heterogeneous heparin samples must be used with care and that analytical reproducibility will be required for future commercial viability.
3.6 A Multipurpose FRET Probe for Ganglioside-Processing Enzyme Activity
In 2015, Withers et al. reported the synthesis and application of a fluorescent probe to detect and quantify ganglioside-degrading enzymes in cell lysates as well as in living cells [51]. Gangliosides, a group of glycosphingolipids containing one or more terminal sialic acid residues, are found in the cell membrane and play crucial roles in biochemical signaling, pathogen entry, membrane transport, and intracellular protein sorting [52]. The probe carried 7-hydroxycoumarin and BODIPY as FRET donor and acceptor, attached to either end of a ganglioside GM3 derivative (33, Fig. 9). This design was intended to allow recognition and cleavage by different ganglioside-degrading enzymes, namely sphingolipid ceramide N-deacylase (Enz1), endoglycoceramidase (Enz2), and neuraminidase (Enz3). Cleavage of the probe by any of these enzymes would separate donor from acceptor and sharply reduce FRET efficiency, providing a ratiometric fluorescent readout of enzyme activity.
Fig. 9 Chemical structure of ganglioside FRET probe 33 with cleavage points by ganglioside processing enzymes highlighted in green
Scheme 6 Reagents and conditions: (i) EGCase glycosynthase, D-erythro-sphingosine, NaOAc buffer (pH 5.3), 10% dimethoxyethane, 37 °C; (ii) CMP-9-azido-9-deoxy-sialic acid, α-2,3-sialyltransferase, alkaline phosphatase, Tris buffer (pH 7.5), MgCl2, RT; (iii) 37, BTTES buffer, sodium ascorbate, CuSO4, DMF, 35 °C; (iv) 38, SCDase glycosphingolipid N-deacylase, HEPES buffer (pH 7.5), 10% dimethoxyethane, 35 °C
The target probe was synthesized chemoenzymatically, initially using enzymatic methods previously established by the group (Scheme 6) [53, 54]. Glycosylation of D-erythro-sphingosine with lactosyl fluoride 34 using an EGCase glycosynthase gave lactosyl sphingosine 35, and subsequent transfer of 9-azido sialic acid using the sialyltransferase Cst-I from C. jejuni gave trisaccharide 36. Next, CuAAC "click" coupling of coumarin derivative 37 with 36 gave a fluorescent lyso-GM3 derivative, and enzymatic condensation with BODIPY-coupled lauric acid 38 afforded the target FRET-enabled probe 33. Probe 33 formed micelles in aqueous solution, which significantly affected its fluorescence properties, so the team used Triton X-100 as a spacer surfactant to reduce the self-quenching initially observed. On excitation at 360 nm, the 7-hydroxycoumarin emission at 450 nm was barely observed, while a strong BODIPY emission was evident at 518 nm, corresponding to a FRET efficiency of >95%. Individual addition of ganglioside-degrading enzymes (Enz1 from Shewanella algae G8, Enz2 from Rhodococcus strain M-777, or Enz3 from Micromonospora viridifaciens) significantly increased the fluorescence emission at 450 nm, accompanied by a decrease in emission at 518 nm. The products obtained after cleavage of 33 by Enz1, Enz2, and Enz3 were visualized by TLC, and a markedly different pattern was observed in each case, making 33 suitable for simultaneous detection in a multienzyme system (Fig. 10). The emission ratio of the FRET components (λem 450 nm/λem 518 nm) changed by up to 70-fold for Enz2 and Enz3. With Enz1, however, the change was significantly smaller (7.3-fold) because an equilibrium is established between hydrolysis and the reverse (synthetic) reaction.
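The 70-fold and 7.3-fold values are changes in a donor/acceptor emission ratio rather than in a single intensity. Because both channels are read from the same sample, factors that scale them equally (probe concentration, lamp intensity, path length) cancel in the ratio. A minimal sketch of the arithmetic, with hypothetical intensities chosen only for illustration:

```python
from typing import Tuple


def emission_ratio(i_donor: float, i_acceptor: float) -> float:
    """Donor/acceptor ratio, here coumarin (450 nm) over BODIPY (518 nm)."""
    if i_acceptor <= 0:
        raise ValueError("acceptor intensity must be positive")
    return i_donor / i_acceptor


def ratio_fold_change(before: Tuple[float, float], after: Tuple[float, float]) -> float:
    """Fold change of the 450/518 nm ratio between two readings, each given as (I450, I518)."""
    return emission_ratio(*after) / emission_ratio(*before)


if __name__ == "__main__":
    intact = (20.0, 1000.0)    # hypothetical: donor emission almost fully transferred
    digested = (560.0, 400.0)  # hypothetical: donor recovers, acceptor falls
    print(f"{ratio_fold_change(intact, digested):.0f}-fold")
```

Doubling both channels of either reading (for example by using twice as much probe) leaves the fold change unchanged, which is the practical advantage of a ratiometric readout.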
Fig. 10 Hydrolysis of probe 33 by sphingolipid ceramide N-deacylase (Enz1), endoglycoceramidase (Enz2), or neuraminidase (Enz3), coumarin fluorescence (blue) and BODIPY fluorescence (green). (a) Fluorescence spectra before and after 30 min incubation with the different enzymes. (b) Visual detection of the reactions under a UV lamp. (c) Different hydrolysis patterns of probe 33 after hydrolysis by Enz1, Enz2, and Enz3, adapted with permission from ref. 51
The ability of 33 to insert into the membrane of human lymphoma cells was investigated next; rapid hydrolysis, due to neuraminidase activity, was observed upon simple addition of the probe to the cell medium. To diminish this effect, a microinjector was used to deliver a small amount of 33 in the vicinity of the cells during imaging. The BODIPY emission showed that staining occurred primarily at the plasma membrane. The background-corrected FRET signal from cells labeled with 33 decayed at markedly different rates in the presence and absence of 2,3-didehydro-N-acetylneuraminic acid (DANA), a neuraminidase inhibitor, and addition of the inhibitor strongly suppressed the decrease in fluorescence signal in all three channels, suggesting that the decrease was primarily due to neuraminidase activity. The team concluded that 33 inserted into the plasma membrane, where it acts as a substrate for human neuraminidases, although they could not rule out that other enzymes or physical processes contributed to the observed emission changes. No background hydrolysis of the probe was observed in lysates of control E. coli BL21 cells, even after 12 h. However, on addition of lysate from E. coli cells expressing a control
neuraminidase, the changes in fluorescence were dependent on the lysate dose. Probe 33 is therefore well suited to high-throughput assays of ganglioside-degrading enzymes and enables real-time observation of the enzymatic process with high sensitivity.
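When probes of this kind are used to screen inhibitors, each compound's initial rates are usually converted into remaining activity relative to an uninhibited control and summarized as an IC50, as with the Endo-M and heparanase inhibitors discussed earlier in this section. A full analysis would normally fit a four-parameter logistic curve; the sketch below instead uses simpler log-linear interpolation between the two concentrations that bracket 50% activity. The approach, function name and data are assumptions for illustration only.

```python
import math
from typing import Sequence


def ic50_by_interpolation(conc: Sequence[float], activity: Sequence[float]) -> float:
    """Estimate IC50 by interpolating on log10(concentration) around 50% activity.

    conc: inhibitor concentrations (> 0), sorted ascending.
    activity: remaining activity as a fraction of the uninhibited control (1.0 = no inhibition).
    """
    if len(conc) != len(activity) or len(conc) < 2:
        raise ValueError("need at least two paired (concentration, activity) points")
    for i in range(len(conc) - 1):
        a0, a1 = activity[i], activity[i + 1]
        # The 50% level lies between these two points if they straddle it.
        if (a0 - 0.5) * (a1 - 0.5) <= 0 and a0 != a1:
            x0, x1 = math.log10(conc[i]), math.log10(conc[i + 1])
            frac = (a0 - 0.5) / (a0 - a1)
            return 10 ** (x0 + frac * (x1 - x0))
    raise ValueError("50% activity is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical dose-response: concentrations in uM, activity relative to control.
    conc_um = [1.0, 10.0, 100.0, 1000.0]
    remaining = [0.95, 0.80, 0.30, 0.05]
    print(f"IC50 ~ {ic50_by_interpolation(conc_um, remaining):.1f} uM")
```

Interpolating on a logarithmic concentration axis reflects the roughly sigmoidal shape of dose-response curves on that scale; with few, widely spaced points the estimate is coarse, which is why curve fitting is preferred for reported values.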
4 FRET Probes for Exo-Acting Glycosyl Hydrolases
4.1 Bis-Acetal Based Substrates for Probing Exo-Glycosidase Activity
In 2017, Vocadlo and Cecioni developed bis-acetal-based substrates (BABS) as fluorescence-quenched probes to monitor exo-glycosidase activity [55]. Exo-glycosidases cleave terminal glycosidic linkages, which are usually more difficult to target with fluorescently labeled probes because the binding sites are often sterically encumbered and pocket-shaped. This class of carbohydrate-active enzymes (CAZymes) has emerging roles in human health, and systems that use covalent attachment have previously been developed in this area [56]. The team proposed positioning the fluorescent components outside the enzyme–substrate binding region, thereby preserving the essential carbohydrate recognition motif. Accordingly, they envisaged a FRET pair attached to the anomeric position of a glycoside substrate as a bis-acetal: glycosidase action would release the carbohydrate and the FRET-bearing fragment as hemiacetals, and the latter would then break down to separate donor and quencher, generating a fluorogenic readout. The known stability of anomeric acetals (conferred by the endocyclic oxygen) relative to regular acetals was expected to make the probes stable across a physiological pH range. The chemical synthesis of two BABS derivatives started from the conveniently prepared D-GlcNAc derivative 39 (Scheme 7) [57]. First, a regio- and stereoselective bromoalkoxylation of 39 was completed using N-bromosuccinimide (NBS) and 2-(2-azidoethoxy)ethanol, followed by a three-step, one-pot functional group manipulation to deliver carboxylic acid 40. An EDANS donor fluorophore was then installed by amide coupling with carboxylic acid 40, followed by installation of a DABCYL quencher through the pendant azide. Final deprotection furnished the GlcNAcBr-BABS probe 41.
For an alternative GlcNAc-OH-BABS system, the alkene in 39 first underwent stereoselective epoxidation to 42, followed by regioselective acid-mediated ring opening with 2-(2-azidoethoxy)ethanol and functional group manipulation to deliver 43, which was then labeled in a similar manner to afford 44. With fluorescence-quenched probes 41 and 44 in hand, the group first confirmed their stability across a pH range of 2 to 10 and then examined their susceptibility to hydrolysis by the exo-acting human O-GlcNAcase (hOGA), which removes β-O-linked N-acetylglucosamine (O-GlcNAc) from serine and threonine residues of proteins. hOGA
Scheme 7 Reagents and conditions: (i) 2-(2-azidoethoxy)ethanol, NBS, DCM, 57%; (ii) (a) LiOH, H2O, THF; (b) Ac2O, pyr.; (c) THF, H2O, 57–62% three steps; (iii) EDANS-NH2, HBTU, DIPEA, DMF; (iv) DABCYL-alkyne, Cu(MeCN)4PF6, DIPEA, DCM, 64%, two steps; (v) NaOMe, MeOH; (vi) mCPBA, Na2SO4, toluene, 67%; (vii) 2-(2-azidoethoxy)ethanol, CSA, DCM, 66%; (viii) (a) LiOH, H2O, THF; (b) Ac2O, pyr.; (c) THF, H2O, 57–62% three steps; (ix) EDANS-NH2, HBTU, DIPEA, DMF; (x) DABCYL-alkyne, Cu(MeCN)4PF6, DIPEA, DCM, 81%, two steps; (xi) NaOMe, MeOH
Fig. 11 BABS probes 41 and 44 are turned over by hOGA. Evolution of fluorescence (RFU) for different concentrations of 41 and 44 in the presence of hOGA. Dotted lines represent the linear rates reached at steady state, adapted with permission from ref. 55
successfully catalyzed the hydrolysis of 41 and 44, with kcat/Km values (263 and 519 M−1 s−1, respectively) comparable to those of the native substrate (Fig. 11). During these experiments, the authors noticed a short lag phase (approx. 1 min) before linear rates were reached, which they attributed to accumulation of the hydrolyzed hemiacetal intermediate still carrying both FRET components. To explore this effect, enzymatic activity was stopped after 1 min by addition of a tight-binding hOGA inhibitor (Thiamet-G, 100 μM, Ki = 2 nM), which enabled monitoring of the breakdown of the hemiacetal generated before this point. Hemiacetal from 44
broke down 4.5 times faster than that from 41, consistent with the α-bromine substituent stabilizing the hemiacetal and slowing its decomposition. Finally, turnover of 41 was monitored in SK-N-SH cell lysate with increasing concentrations of hOGA, and processing rates similar to those in buffer were observed. Overall, the glyco-BABS probes provided proof of concept for monitoring exo-glycosidase activity in vitro and in cells and could be used alongside inhibitor development programs or to monitor factors regulating enzymatic activity.
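Once the enzyme is stopped with Thiamet-G, further signal growth reflects only hemiacetal breakdown. Treating this as a first-order process, F(t) = F∞ − (F∞ − F0)e^(−kt), a plot of ln(F∞ − F) against time is a straight line of slope −k, and the ratio of k values compares the two hemiacetals (the 4.5-fold difference reported for 44 versus 41). The sketch assumes first-order kinetics and a known plateau F∞; the data are simulated, and ref. 55 may have analyzed its traces differently.

```python
import math
from typing import List, Sequence


def first_order_k(times: Sequence[float], rfu: Sequence[float], plateau: float) -> float:
    """Estimate k for F(t) = F_inf - (F_inf - F0) * exp(-k t).

    Fits ln(F_inf - F) against t by least squares; the slope is -k.
    Points at or above the plateau carry no information and are skipped.
    """
    pts = [(t, math.log(plateau - f)) for t, f in zip(times, rfu) if f < plateau]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points below the plateau")
    mean_t = sum(t for t, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in pts)
    return -sxy / sxx


def simulate(k: float, f0: float, plateau: float, times: Sequence[float]) -> List[float]:
    """Noise-free first-order rise toward the plateau, used to check the fit."""
    return [plateau - (plateau - f0) * math.exp(-k * t) for t in times]


if __name__ == "__main__":
    t = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]  # minutes after adding the inhibitor
    slow = first_order_k(t, simulate(0.4, 200.0, 1000.0, t), 1000.0)  # hypothetical hemiacetal A
    fast = first_order_k(t, simulate(1.8, 200.0, 1000.0, t), 1000.0)  # hypothetical hemiacetal B
    print(f"k_fast / k_slow = {fast / slow:.1f}, t1/2(slow) = {math.log(2) / slow:.2f} min")
```

The linearized fit is convenient but sensitive to the chosen plateau; with noisy data, a direct nonlinear fit of F(t) with F∞ as a free parameter is usually more robust.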
5 FRET Probes for Glycosyltransferases
5.1 Monitoring α-1,3-Fucosyltransferase IX Activity
Fucosyltransferases catalyze the transfer of L-fucose from a donor molecule (a sugar nucleotide, NDP-sugar) to an acceptor. Within this class, human fucosyltransferase IX (hFucT IX) belongs to the α-1,3-family, transferring L-fucose, with inversion of the anomeric configuration, primarily to terminal N-acetyllactosamine units. This process completes the synthesis of Lewis X (Lex), an epitope found on many glycoconjugates. Lex is involved in cell–cell interaction and adhesion processes in both healthy and pathogenic settings. The ability to accurately monitor hFucT IX activity in vitro would provide vital structure–function information and could help uncover, and ultimately exploit, the pathways it regulates, for example through the development of inhibitors. In 2013, Meir and Hahn described the synthesis of a fluorescently labeled GDP-β-L-fucose and used it to form a FRET-enabled Lex conjugate in a reaction catalyzed by hFucT IX, thus establishing the capability to monitor the transformation [58]. The group selected two commercial ATTO dyes as the FRET pair (donor ATTO 550 and acceptor ATTO 647N) and first completed a synthesis of ATTO 550-labeled GDP-β-L-fucose 48 (Scheme 8). The chemical synthesis of NDP-sugars is historically challenging and often associated with low yields and long reaction times [59, 60]. The same group had recently developed cycloSal-nucleotides [61], which serve as activated phosphate ester building blocks for combination with an anomerically pure glycosyl 1-phosphate to reliably access NDP-sugars in high yield. Accordingly, 5-nitro-cycloSal-N2-acetyl-2′,3′-di-O-acetylguanosine monophosphate 45 and 2,3,4-tri-O-acetyl-6-azido-6-deoxy-α-L-galactopyranosyl phosphate 46 were coupled in 41% yield (following deacetylation) to afford GDP-sugar 47.
The C-6 azide of the L-Gal residue in 47 was then reduced to the amine under standard conditions in very good yield (90%), followed by facile coupling with the NHS ester of ATTO 550 to give fluorescent NDP-sugar 48 in 73% yield. The authors observed formation of
Scheme 8 Reagents and conditions: (i) (a) DMF, 4 Å MS; (b) CH3OH–H2O–Et3N (7:3:1), 41% two steps; (ii) H2, Pd/C, CH3OH, 90%; (iii) ATTO 550-NHS, NaHCO3, DMSO, buffer (pH 8.3), 73%
Scheme 9 Reagents and conditions: (i) (a) TMSN3, TMSOTf; (ii) (a) ROH (allyl alcohol or 4-penten-1-ol), Ph3P, CH2Cl2; (b) Dowex OH, EtOH, 65% two steps; (iii) NaOMe, MeOH, 94%; (iv) HSCH2CH2NH2·HCl, hν (254 nm), H2O, 79% and 71%; (v) ATTO 647N-NHS, DMSO
hydrolyzed dye during this final coupling step but were able to separate it from the required compound by silica gel and gel filtration chromatography. HRMS confirmed the molecular mass of 48 (the structure of ATTO 550 has not been disclosed but is related to other rhodamine dyes), and the conjugate exhibited the same absorbance and fluorescence maxima as the NHS ester dye. To access a fluorescently labeled acceptor partner for 48, N-acetyllactosamines 52 and 53 were synthesized from lactal 49 (Scheme 9). Iodoacetoxylation of 49, followed by treatment of the resulting 2-iodo anomeric acetate with TMSN3/TMSOTf, furnished an α-glycosyl azide. Subsequently, an amidoglycosylation approach developed by Lafont was used to prepare β-glycosides of N-acetyllactosamine using either allyl alcohol or 4-penten-1-ol [62]. Under these conditions, the starting α-glycosyl azide forms an α-iminophosphorane through reaction with PPh3. This intermediate underwent intramolecular reaction to eliminate the C-2 iodide and form a bridging C1–C2 aziridinium ion, which was subsequently opened by the incoming alcohol nucleophile. Following
Fig. 12 (a) hFucT IX synthesis of Lex derivative 54 containing FRET enabled ATTO dyes. The α-fucosyltransferase catalyzed bond formed is shown in green
basic workup, lactosaminides 50 were isolated in good yields. Finally, Zemplén deprotection, photochemical addition of cysteamine to the alkene to give 51, and reaction with the NHS ester of ATTO 647N afforded acceptor components 52 and 53, which carry the ATTO 647N acceptor dye at the reducing terminus, separated from the anomeric center by a propyl or pentyl hydrocarbon spacer plus the cysteamine linker. The turnover of fluorescently labeled substrates 48 and 52/53 by hFucT IX was then analyzed by high-performance thin-layer chromatography coupled with mass spectrometry and by dual-color fluorescence cross-correlation spectroscopy. These studies revealed that both the sugar nucleotide donor 48 and the disaccharide acceptors 52/53 were accepted by recombinant hFucT IX, establishing a quick and versatile method for monitoring the progress of this enzymatic process. The team then looked for a FRET effect between the two dyes incorporated within the newly formed trisaccharide 54 (Fig. 12). Following successful glycosylation, a decrease in ATTO 550 fluorescence intensity would be expected (FRET quenching). This was indeed observed; however, the expected increase in ATTO 647N acceptor emission was not, suggesting a distance of >10 nm between the pair in 54. This work provides an exciting precedent for using labeled donor and acceptor substrates to monitor glycosyltransferase activity and also highlights the donor–acceptor distance constraints that must be met to capitalize on FRET effects.
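The missing acceptor signal follows from the steep distance dependence of FRET. In the standard Förster relation, the transfer efficiency is E = 1/(1 + (r/R0)^6), where r is the donor–acceptor distance and R0 the Förster radius at which E = 50%. The sketch below uses R0 = 6 nm purely as a placeholder; the real value for the ATTO 550/ATTO 647N pair depends on spectral overlap, dye orientation and the local environment, and is not given in the text.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    if r_nm <= 0 or r0_nm <= 0:
        raise ValueError("distances must be positive")
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def distance_from_efficiency(efficiency: float, r0_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)**(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


if __name__ == "__main__":
    R0 = 6.0  # nm, placeholder value (assumption, not from the cited work)
    for r in (3.0, 6.0, 9.0, 12.0):
        print(f"r = {r:4.1f} nm -> E = {fret_efficiency(r, R0):.3f}")
```

With this placeholder, E falls from about 0.98 at 3 nm to below 0.02 at 12 nm: doubling the distance from R0 to 2R0 lowers E from 0.5 to 1/65. For common dye pairs, whose Förster radii are typically below 10 nm, sensitized acceptor emission at larger separations is therefore expected to be very weak.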
6 Screening Catalytic Antibody Activity
6.1 FRET Probes for Screening Catalytic Monoclonal Antibody Activity
Recently, Casadevall and Oscarson reported the design and synthesis of oligosaccharide FRET probes to characterize the hydrolytic activity of four catalytic monoclonal antibodies, demonstrating their innate glycosidase activity [63]. This was undertaken as part of a strategy targeting glycoconjugate vaccine candidates against the opportunistic fungus Cryptococcus neoformans, whose capsule is dominated by the polysaccharide glucuronoxylomannan (GXM). Based on their prior work developing synthetic glycan arrays of
Fig. 13 Decasaccharide GXM FRET probes for screening catalytic antibody activity
Scheme 10 Reagents and conditions: (i) DMTST, Et2O, 4 Å MS, 0 °C, 65%; (ii) DDQ, CH2Cl2, PBS (pH 7.5), 0 °C, 70%; (iii) DMTST, Et2O, 4 Å MS, 0 °C, 85%; (iv) ethylenediamine, n-BuOH, 90 °C; (v) Boc2O, THF, H2O, 57%, two steps; (vi) Pd/C (pretreated catalyst), H2, THF, t-BuOH, PBS (pH 5), 66%; (vii) 7-methoxycoumarin-4-acetic acid N-succinimidyl ester, Et3N, DMSO; (viii) HCl, MeOH, H2O; (ix) N-(2,4-dinitrophenyl)glycine N-succinimidyl ester, Et3N, DMSO
GXM structures [64], the group synthesized the GXM decasaccharide targets 55 and 56, equipped for FRET with a coumarin/DNP pair (Fig. 13). The stepwise synthesis and assembly of building blocks followed a method previously developed by the group [65]. Scheme 10 outlines the synthesis, starting from tetrasaccharide 57, which underwent iterative coupling/deprotection sequences with glycosyl donors 58 and 59 to give the protected decasaccharide 60. This material was then globally deprotected using a pretreated palladium catalyst [66] and conjugated consecutively with the FRET donor and acceptor components to give the acetylated probe 56. The nonacetylated probe 55 was obtained by removing the acetyl groups from 56.
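Scheme 10 reports yields for only some steps. As a quick arithmetic check, the product of the reported step yields gives a cumulative yield for that part of the route. This sketch assumes that steps (i) to (vi) form a single linear sequence and that the 57% quoted for (iv)–(v) covers both steps; the step labels are paraphrased from the scheme, and steps (vii) to (ix) are omitted because no yields are given.

```python
from math import prod

# Yields as reported in Scheme 10 (fractions). Assumption: one linear sequence.
step_yields = {
    "(i) DMTST glycosylation": 0.65,
    "(ii) DDQ deprotection": 0.70,
    "(iii) DMTST glycosylation": 0.85,
    "(iv)-(v) ethylene diamine, then Boc2O (two steps)": 0.57,
    "(vi) Pd/C hydrogenolysis": 0.66,
}


def overall_yield(yields) -> float:
    """The overall yield of a linear sequence is the product of its step yields."""
    return prod(yields)


total = overall_yield(step_yields.values())
print(f"Cumulative yield over the reported steps: {total:.1%}")
```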
Meenakshi Singh et al.
Next, the team used the nonacetylated and acetylated FRET probes 55 and 56 to detect the catalytic activity of four murine monoclonal antibodies (mAbs): 18B7 (IgG1), 2H1 (IgG1), and two isotypes of 3E5, an IgG1 and an IgG3. Notably, the IgG3 isotype of 3E5 differed from 3E5 IgG1 in its specificity toward the two FRET probes, despite both mAbs sharing the same variable region sequence. The IgG3 displayed no activity toward 55 but the second-highest activity against the acetylated ligand 56. The Michaelis–Menten kinetics of mAb 2H1 against both FRET probes were also evaluated. Probe 56 had a Km of 2.12 × 10⁻⁴ mM, while 55 had a Km of 1.8 × 10⁻⁴ mM, indicating a marginally higher affinity of 2H1 for 55. However, the kcat values (6.16 × 10⁻³ s⁻¹ for 55 and 1.3 × 10⁻² s⁻¹ for 56) suggested 2H1 to be twice as efficient in hydrolyzing 55. The kcat/Km values showed mAb 2H1 to be three times more efficient than 18B7 at hydrolyzing 55, suggesting that 2H1 is much more efficient at catalyzing the hydrolysis of GXM oligosaccharides. Overall, this study provided important new understanding of the role of oligosaccharide acetylation in antibody recognition and catalysis; the inclusion of appropriate FRET components enabled kinetic parameters to be evaluated and catalytic activity to be observed on the native capsule of heat-killed C. neoformans cells.
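The chapter does not say how Km and kcat were derived from the assay data. One classical approach is the Hanes–Woolf linearization, [S]/v0 = [S]/Vmax + Km/Vmax, followed by kcat = Vmax/[E]0; nonlinear regression of the Michaelis–Menten equation is generally preferred for noisy data. This sketch recovers the parameters from hypothetical, noise-free initial rates; none of the numbers are from the study.

```python
def michaelis_menten_rate(s, vmax, km):
    """Initial rate v0 = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def hanes_woolf_fit(substrate, rates):
    """Estimate (Vmax, Km) by least-squares fitting of [S]/v0 = [S]/Vmax + Km/Vmax."""
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired ([S], v0) points")
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # = 1 / Vmax
    intercept = mean_y - slope * mean_x  # = Km / Vmax
    return 1.0 / slope, intercept / slope


# Hypothetical, noise-free data (assumed values, not from the study).
E0_MM = 1.0e-5       # catalyst (antibody) concentration, mM
TRUE_KCAT = 1.0e-2   # s^-1
TRUE_KM = 2.0e-4     # mM
s_values = [0.5e-4, 1.0e-4, 2.0e-4, 4.0e-4, 8.0e-4]  # mM
v_values = [michaelis_menten_rate(s, TRUE_KCAT * E0_MM, TRUE_KM) for s in s_values]

vmax_fit, km_fit = hanes_woolf_fit(s_values, v_values)
kcat_fit = vmax_fit / E0_MM  # kcat = Vmax / [E]0
print(f"Km = {km_fit:.3e} mM, kcat = {kcat_fit:.3e} s^-1, "
      f"kcat/Km = {kcat_fit / km_fit:.3e} mM^-1 s^-1")
```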
7 Summary of Common FRET Pairings Used in Oligosaccharide Probe Design
In Table 1, we summarize the common FRET pairings used to equip glycans for probing CAZy activity. Conventionally, these organic dyes are attached at the reducing and nonreducing ends of linear oligosaccharide sequences (ranging in length from tetrasaccharides up to polymeric systems) at a late stage of the synthesis. Pairings have also been installed solely at the reducing end of monosaccharides and at branching units in nonlinear systems.
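Table 1 lists donor emission maxima alongside quencher (or acceptor) absorption maxima. A crude first screen of a candidate pair is the offset between these two peaks. A proper assessment requires the spectral overlap integral computed over the full donor emission and acceptor absorption spectra, so this sketch only ranks the tabulated pairs roughly.

```python
# (donor, quencher, donor emission max in nm, quencher absorption max in nm),
# transcribed from Table 1.
PAIRS = [
    ("NAP", "DANSYL", 333, 333),
    ("EDANS", "DAB", 490, 430),
    ("NMA (MANT)", "DNP", 440, 360),
    ("ATTO550", "ATTO647N", 576, 646),
    ("EDANS", "DABCYL", 490, 470),
    ("7-Hydroxycoumarin", "BODIPY", 450, 520),
    ("7-Methoxycoumarin", "DNP", 410, 360),
]


def peak_offset_nm(donor_em: int, quencher_abs: int) -> int:
    """Signed offset of the quencher absorption maximum from the donor emission maximum."""
    return quencher_abs - donor_em


# Smallest absolute offset first: a rough proxy for better peak alignment only.
for donor, quencher, em, ab in sorted(PAIRS, key=lambda p: abs(peak_offset_nm(p[2], p[3]))):
    print(f"{donor:>18} -> {quencher:<9} offset {peak_offset_nm(em, ab):+4d} nm")
```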
Table 1 Common organic label components for FRET-enabled carbohydrate probes
Donor | Quencher | Donor λabs/λem-max (nm) | Quencher λabs-max (nm)
NAP | DANSYL | 290/333 | 333
EDANS | DAB | 340/490 | 430
NMA (MANT) | DNP | 340/440 | 360
ATTO550 | ATTO647N | 554/576 | 646
EDANS | DABCYL | 340/490 | 470
7-Hydroxycoumarin | BODIPY | 325/450 | 520
7-Methoxycoumarin | DNP | 325/410 | 360

8 Conclusion and Outlook
Carbohydrate-active enzymes control the production, processing, and degradation of glycans that underpin fundamental biology. Nondestructive analytical tools for monitoring their function are therefore important and highly desirable, and FRET-enabled carbohydrate probes are rapidly emerging as enabling tools to meet this need. Their design broadly involves synthetically installing the required fluorescent components (donor and acceptor) within a structurally defined carbohydrate enzyme substrate, enabling catalytic function to be monitored as the enzyme acts on the probe. Ratiometric monitoring captures changes in the fluorogenic readout as FRET is disrupted; this provides direct access to kinetic parameters of enzymatic activity and opens a pathway to inhibitor screening. Exciting achievements over the past 10 years have largely focused on endo-acting glycoside hydrolases (enzymes that cleave internal glycosidic linkages in an oligosaccharide chain), and as the field continues to evolve, methodologies that address a broader range of enzyme classes are required. From a synthetic chemistry perspective, functionalizing a substrate with FRET components without compromising target binding will remain a design challenge, and one that must be addressed as more demanding enzyme–substrate binding environments are considered, for example those of glycosyltransferases or exo-acting hydrolases. Additionally, the distance requirements for FRET between fluorophores will become limiting as probes extend into polysaccharide chain lengths; flexibility in the attachment position along the carbohydrate chain and the rigidity of the fluorophore need to be key focus points here. Finally, these probe systems, which to date have largely been used in vitro, need to be transitioned to live biological settings for direct monitoring of specific enzymatic activity. This is particularly important given the established role of upregulated CAZy activity as a marker in disease diagnosis and in addressing clinically relevant questions. Advances here would bring FRET-carbohydrate probes closer to the capabilities this sensing platform has already achieved with other classes of biological macromolecule.
References
1. Bertozzi CR, Kiessling LL (2001) Chemical glycobiology. Science 291:2357–2364
2. Rudd PM, Elliott T, Cresswell P, Wilson IA, Dwek RA (2001) Glycosylation and the immune system. Science 291:2370–2376
3. Varki A (1993) Biological roles of oligosaccharides: all of the theories are correct. Glycobiology 3:97–130
4. Wolfenden R, Lu X, Young G (1998) Spontaneous hydrolysis of glycosides. J Am Chem Soc 120:6814–6815
5. Zechel DL, Withers SG (2000) Glycosidase mechanisms: anatomy of a finely tuned catalyst. Acc Chem Res 33:11–18
6. Davies GJ, Sinnott ML (2008) Sorting the diverse: the sequence-based classifications of carbohydrate-active enzymes. Biochemist 30:26–32
7. Coutinho PM, Deleury E, Henrissat B (2003) The families of carbohydrate-active enzymes in the genomic era. J Appl Glycosci 50:241–244
8. Koshland DE (1953) Stereochemistry and the mechanism of enzymatic reactions. Biol Rev Camb Philos Soc 28:416–436
9. McCarter JD, Withers SG (1994) Mechanisms of enzymatic glycoside hydrolysis. Curr Opin Struct Biol 4:885–892
10. Rye CS, Withers SG (2000) Glycosidase mechanisms. Curr Opin Chem Biol 4:573–580
11. Sobala LF, Speciale G, Zhu S, Raich L, Sannikova N, Thompson AJ, Hakki Z, Lu D, Abadi SSK, Lewis AR, Rojas-Cervellera V, Bernardo-Seisdedos G, Zhang Y, Millet O, Jiménez-Barbero J, Bennet AJ, Sollogoub M, Rovira C, Davies GJ, Williams SJ (2020) An epoxide intermediate in glycosidase catalysis. ACS Cent Sci 6:760–770
12. Lombard V, Ramulu HG, Drula E, Coutinho PM, Henrissat B (2013) The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res 42:D490–D495
13. Stubbs KA (2014) Activity-based proteomics probes for carbohydrate-processing enzymes: current trends and future outlook. Carbohydr Res 390:9–19
14. Henrissat B, Davies G (1997) Structural and sequence-based classification of glycoside hydrolases. Curr Opin Struct Biol 7:637–644
15. Burke HM, Gunnlaugsson T, Scanlan EM (2015) Recent advances in the development of synthetic chemical probes for glycosidase enzymes. Chem Commun 51:10576–10588
16. Patterson MC (2017) Swaiman’s pediatric neurology, 6th edn. Elsevier, Amsterdam
17. Kishnani PS, Chen Y-T (2013) Emery and Rimoin’s principles and practice of medical genetics, 6th edn. Elsevier, Philadelphia
18. Tomasic IB, Metcalf MC, Guce AI, Clark NE, Garman SC (2010) Interconversion of the specificities of human lysosomal enzymes associated with Fabry and Schindler diseases. J Biol Chem 285:21560–21566
19. Vlodavsky I, Singh P, Boyango I, Gutter-Kapon L, Elkin M, Sanderson RD, Ilan N (2016) Heparanase: from basic research to therapeutic applications in cancer and inflammation. Drug Resist Update 29:54–75
20. Plegt L, Bino RJ (1989) β-Glucuronidase activity during development of the male gametophyte from transgenic and non-transgenic plants. Mol Gen Genet 216:321–327
21. Kalyani D, Lee K-M, Kim T-S, Li J, Dhiman SS, Kang YC, Lee J-K (2013) Microbial consortia for saccharification of woody biomass and ethanol fermentation. Fuel 107:815–822
22. Willems LI, Jiang J, Li K-Y, Witte MD, Kallemeijn WW, Beenakker TJN, Schröder SP, Aerts JMFG, van der Marel GA, Codée JDC, Overkleeft HS (2014) From covalent glycosidase inhibitors to activity-based glycosidase probes. Chem Eur J 20:10864–10872
23. Clegg RM (1995) Fluorescence resonance energy transfer. Curr Opin Biotech 6:103–110
24. Bunt G, Wouters FS (2017) FRET from single to multiplexed signaling events. Biophys Rev 9:119–129
25. Broussard JA, Green KJ (2017) Research techniques made simple: methodology and applications of Förster resonance energy transfer (FRET) microscopy. J Invest Dermatol 137:e185–e191
26. Bajar BT, Wang ES, Zhang S, Lin MZ, Chu J (2016) A guide to fluorescent protein FRET pairs. Sensors 16:1488
27. Kaur A, Dhakal S (2020) Recent applications of FRET-based multiplexed techniques. Trends Anal Chem 123:115777
28. Sapsford KE, Berti L, Medintz IL (2006) Materials for fluorescence resonance energy transfer analysis: beyond traditional donor–acceptor combinations. Angew Chem 45:4562–4589
29. Oka H, Koyama T, Hatano K, Matsuoka K (2012) Synthetic studies of bi-fluorescence-labeled maltooligosaccharides as substrates for α-amylase on the basis of fluorescence resonance energy transfer (FRET). Bioorg Med Chem 20:435–445
30. Oka H, Koyama T, Hatano K, Terunuma D, Matsuoka K (2010) Simple and conveniently accessible bi-fluorescence-labeled substrates for amylases. Bioorg Med Chem Lett 20:1969–1971
31. Suganuma T, Matsuno R, Ohnishi M, Hiromi K (1978) A study of the mechanism of action of Taka-amylase A1 on linear oligosaccharides by product analysis and computer simulation. J Biochem 84:293–316
32. Guerry A, Bernard J, Samain E, Fleury E, Cottaz S, Halila S (2013) Aniline-catalyzed reductive amination as a powerful method for the preparation of reducing end-“clickable” chitooligosaccharides. Bioconj Chem 24:544–549
33. Cottaz S, Brasme B, Driguez H (2000) A fluorescence-quenched chitopentaose for the study of endo-chitinases and chitobiosidases. Eur J Biochem 267:5593–5600
34. Ishii N, Sunaga C, Sano K, Huang C, Iino K, Matsuzaki Y, Suzuki T, Matsuo I (2018) A new fluorogenic probe for the detection of endo-β-N-acetylglucosaminidase. ChemBioChem 19:660–663
35. Matsuo I, Isomura M, Ajisaka K (1999) Synthesis of an asparagine-linked core pentasaccharide by means of simultaneous inversion reactions. J Carbohydr Chem 18:841–850
36. Ishii N, Sano K, Matsuo I (2019) Fluorogenic probe for measuring high-mannose type glycan-specific endo-β-N-acetylglucosaminidase H activity. Bioorg Med Chem Lett 29:1643–1646
37. Matsuo I, Isomura M, Miyazaki T, Sakakibara T, Ajisaka K (1997) Chemoenzymatic synthesis of the branched oligosaccharides which correspond to the core structures of N-linked sugar chains. Carbohydr Res 305:401–413
38. Kanie O, Crawley SC, Palcic MM, Hindsgaul O (1993) Acceptor-substrate recognition by N-acetylglucosaminyltransferase-V: critical role of the 4″-hydroxyl group in β-D-GlcpNAc-(1→2)-α-D-Manp-(1→6)-β-D-Glcp-OR. Carbohydr Res 243:139–164
39. Sano K, Kuribara T, Ishii N, Kuroiwa A, Yoshihara T, Tobita S, Totani K, Matsuo I (2019) Fluorescence quenching-based assay for measuring Golgi endo-α-mannosidase. Chem Asian J 14:1965–1969
40. Spiro RG (2004) Role of N-linked polymannose oligosaccharides in targeting glycoproteins for endoplasmic reticulum-associated degradation. Cell Mol Life Sci 61:1025–1041
41. Lubas WA, Spiro RG (1987) Golgi endo-α-D-mannosidase from rat liver, a novel N-linked carbohydrate unit processing enzyme. J Biol Chem 262:3775–3781
42. Iwamoto S, Kasahara Y, Yoshimura Y, Seko A, Takeda Y, Ito Y, Totani K, Matsuo I (2017) Endo-α-mannosidase-catalyzed transglycosylation. ChemBioChem 18:1376–1378
43. Sanderson RD, Elkin M, Rapraeger AC, Ilan N, Vlodavsky I (2016) Heparanase regulation of cancer, autophagy and inflammation: new mechanisms and targets for therapy. FEBS J 284:42–55
44. Tsuchida S, Podyma-Inoue KA, Yanagishita M (2004) Ultrafiltration-based assay for heparanase activity. Anal Biochem 331:147–152
45. Hammond E, Li CP, Ferro V (2009) Development of a colorimetric assay for heparanase activity suitable for kinetic analysis and inhibitor screening. Anal Biochem 396:112–116
46. Melo CM, Tersariol ILS, Nader HB, Pinhal MAS, Lima MA (2015) Development of new methods for determining the heparanase enzymatic activity. Carbohydr Res 412:66–70
47. Wu L, Jiang J, Jin Y, Kallemeijn WW, Kuo C-L, Artola M, Dai W, van Elk C, van Eijk M, van der Marel GA, Codée JDC, Florea BI, Aerts JMFG, Overkleeft HS, Davies GJ (2017) Activity-based probes for functional interrogation of retaining β-glucuronidases. Nat Chem Biol 13:867–873
48. Sistla JC, Morla S, Alabbas A-HB, Kalathur RC, Sharon C, Patel BB, Desai UR (2019) Polymeric fluorescent heparin as one-step FRET substrate of human heparanase. Carbohydr Polym 205:385–391
49. Li H, Li H, Qu H, Zhao M, Yuan B, Cao M, Cui J (2015) Suramin inhibits cell proliferation in ovarian and cervical cancer by downregulating heparanase expression. Cancer Cell Int 15:52
50. Poupard N, Badarou P, Fasani F, Groult H, Bridiau N, Sannier F, Bordenave-Juchereau S, Kieda C, Piot J-M, Grillon C, Fruitier-Arnaudin I, Maugard T (2017) Assessment of heparanase-mediated angiogenesis using microvascular endothelial cells: identification of λ-carrageenan derivative as a potent anti-angiogenic agent. Mar Drugs 15:134
51. Yang G-Y, Li C, Fischer M, Cairo CW, Feng Y, Withers SG (2015) A FRET probe for cell-based imaging of ganglioside-processing enzyme activity and high-throughput screening. Angew Chem 54:5389–5393
52. Sandhoff K, Harzer K (2013) Gangliosides and gangliosidoses: principles of molecular and metabolic pathogenesis. J Neurosci 33:10195–10208
53. Rich JR, Withers SG (2012) A chemoenzymatic total synthesis of the neurogenic starfish ganglioside LLG-3 using an engineered and evolved synthase. Angew Chem 51:8640–8643
54. Hancock SM, Rich JR, Caines MEC, Strynadka NCJ, Withers SG (2009) Designer enzymes for glycosphingolipid synthesis by directed evolution. Nat Chem Biol 5:508–514
55. Cecioni S, Vocadlo DJ (2017) Carbohydrate bis-acetal-based substrates as tunable fluorescence-quenched probes for monitoring exoglycosidase activity. J Am Chem Soc 139:8392–8395
56. Witte MD, Kallemeijn WW, Aten J, Li K-Y, Strijland A, Donker-Koopman WE, van den Nieuwendijk AMCH, Bleijlevens B, Kramer G, Florea BI, Hooibrink B, Hollak CEM, Ottenhoff R, Boot RG, van der Marel GA, Overkleeft HS, Aerts JMFG (2010) Ultrasensitive in situ visualization of active glucocerebrosidase molecules. Nat Chem Biol 6:907–913
57. Larsen DS, Stoodley RJ (1989) Asymmetric Diels–Alder reactions. Part 3. Influence of butadiene structure upon the diastereofacial reactivity of (E)-1-(2′,3′,4′,6′-tetra-O-acetyl-β-D-glucopyranosyloxy)buta-1,3-dienes. J Chem Soc Perkin Trans 1:1841–1852. https://doi.org/10.1039/p19890001841
58. Lunau N, Seelhorst K, Kahl S, Tscherch K, Stacke C, Rohn S, Thiem J, Hahn U, Meier C (2013) Fluorescently labeled substrates for monitoring α1,3-fucosyltransferase IX activity. Chem Eur J 19:17379–17390
59. Ahmadipour S, Miller GJ (2017) Recent advances in the chemical synthesis of sugar-nucleotides. Carbohydr Res 451:95–109
60. Ahmadipour S, Beswick L, Miller GJ (2018) Recent advances in the enzymatic synthesis of sugar-nucleotides using nucleotidylyltransferases and glycosyltransferases. Carbohydr Res 469:38–47
61. Wolf S, Zismann T, Lunau N, Meier C (2009) Reliable synthesis of various nucleoside diphosphate glycopyranoses. Chem Eur J 15:7656–7664
62. Lafont D, Boullanger P, Carvalho F, Vottero P (1997) A convenient access to β-glycosides of N-acetyllactosamine. Carbohydr Res 297:117–126
63. Crawford C, Wear MP, Smith DFQ, d’Errico C, Casadevall A, Oscarson S (2020) Glycan FRET probes for screening catalytic antibodies against Cryptococcus neoformans capsule. ChemRxiv 2020
64. Guazzelli L, Ulc R, Bowen A, Crawford C, McCabe O, Jedlicka A, Wear M, Casadevall A, Oscarson S (2020) A synthetic glycan array containing Cryptococcus neoformans glucuronoxylomannan capsular polysaccharide fragments allows the mapping of protective epitopes. ChemRxiv. https://doi.org/10.26434/chemrxiv.11914905.v1
65. Guazzelli L, Ulc R, Rydner L, Oscarson S (2015) A synthetic strategy to xylose-containing thioglycoside tri- and tetrasaccharide building blocks corresponding to Cryptococcus neoformans capsular polysaccharide structures. Org Biomol Chem 13:6598–6610
66. Crawford C, Oscarson S (2020) Optimized conditions for the palladium-catalyzed hydrogenolysis of benzyl and naphthylmethyl ethers: preventing saturation of aromatic protecting groups. Eur J Org Chem 2020:3332–3337
Part IV Glycosylation Biomarkers and Clinical Datasets
Chapter 13
N-glycan Characterization by Liquid Chromatography Coupled with Fluorimetry and Mass Spectrometry
Richard A. Gardner, Paulina A. Urbanowicz, and Daniel I. R. Spencer
Abstract
Human blood plasma and serum have been a source of biomarkers for the indication and progression of many diseases for several decades. Plasma is also an excellent source material for patients monitoring their own health, with a multitude of biomarkers detectable for the assessment of health status. Blood sampling kits are increasingly available for home use, and no specialist clinical skills are required to obtain good-quality samples for pathology laboratory analysis. Many of the proteins that constitute plasma are glycosylated with both N- and O-type glycans. There is increasing interest in the scientific community in identifying potential glycan biomarkers, or glycan features, that are indicative of disease, particularly at an early stage. The quality and reproducibility of glycan analysis data are key to identifying and using glycan-based blood biomarkers with sufficient specificity and sensitivity; hence, the required analytical tools need to be robust. In this chapter, we describe an analytical method for the UHPLC separation of plasma N-glycans that uses both fluorophore labeling of the glycan reducing terminus, to ensure stoichiometric analysis of relative glycan abundance, and online mass spectrometry for glycan identification. Exoglycosidase digestion is included as an example technique to aid structure identification.
Key words N-glycan, Glycosylation, Reductive amination, Fluorophore, LC-MS, Plasma, Glycoprofiling, Biomarker
1 Introduction
Blood plasma is both a convenient and rich source of biomarkers of acute, chronic, and developing health conditions [1–3]. Its accessibility through home blood sampling kits, coupled with centralized testing laboratories and web-based sample and results management systems, has expanded the potential for direct interactions between clients and testing laboratories, both within and outside the clinical community. With this potential comes the responsibility of providing timely and robust analytical services, and central to the quality of sample processing is the validity of the analytical systems being used.
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_13, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Richard A. Gardner et al.
Many different plasma biomarkers have been explored and used over the years, including DNA, RNA, proteins, and small molecules. A more novel approach investigated in recent years is the use of glycans and/or glycoproteins as biomarkers. Typically, these approaches focus on the analysis of N-glycans as biomarkers or health indicators, with several currently being explored [4, 5]. N-glycans, complex sugars linked to asparagine residues of proteins, feature predominantly within this emerging repertoire of commercially or clinically exploitable assays because they reflect physiological and pathological changes within a patient. Analytical systems and testing pathways for DNA, RNA, protein, and small-molecule biomarkers have been extensively validated within clinical settings, whereas glycan and glycoprotein biomarkers have been underutilized and still operate more in a research and development environment than a clinical one. This is partly due to the nature of the glycans themselves, which are typically branched and lack a natural chromophore. There is also a lack of widely established instrumentation for high-throughput glycan analysis comparable to PCR for DNA samples. Furthermore, the low concentrations of glycans in the bloodstream mean that analytical techniques and instrumentation must be sensitive and reliable. However, much work has been done to improve the robustness of techniques for analyzing glycan biomarkers, in particular in liquid biopsies such as blood plasma [6, 7]. N-glycans have been the focus of many studies because, unlike O-glycans, they can be released from the protein backbone by a single universal enzyme, peptide N-glycosidase F (PNGase F), which makes them the most readily accessible to characterize.
Once cleaved from the protein, the glycans lend themselves to stoichiometric fluorophore labeling, with one fluorophore per N-glycan irrespective of its size or branching. The fluorophore enables detection of the N-glycans when a fluorescence detector is coupled to a liquid chromatography (LC) instrument. The different glycan structures obtained from blood plasma are separated and resolved, typically on a hydrophilic interaction liquid chromatography (HILIC) column, within 30–70 min per sample. This technique has improved greatly since its emergence 30–35 years ago, and has advanced faster over the past 15 years with the introduction of ultra-high-performance liquid chromatography (UHPLC) systems. These systems have enabled the use of columns packed with sub-2 μm particles, improving the resolution of glycan structures from one another. Many fluorophores have been used for N-glycan analysis over the years, the main ones being 2-aminobenzamide, 2-aminobenzoic acid, and 2-aminopyridine. However, newer
N-glycan Characterization by Liquid Chromatography Coupled with. . .
fluorophores for the detection of N-glycans have also been developed to increase assay sensitivity, not only for fluorescence detection but also for mass spectrometric detection. Recent fluorophores developed for coupled techniques such as LC-MS include (1) the InstantAB system developed by ProZyme, (2) the RapiFluor system developed by Waters (both variants of the work by Baginski [8]), (3) AQC (6-aminoquinolyl-N-hydroxysuccinimidyl carbamate) [9], and (4) procainamide [10, 11], first tested by Yoshino and coworkers [12], which is now routinely used owing to its versatility and sample stability. Labeling glycans with these reagents enables the end user to obtain repeatable and robust data on the relative abundances of glycan structures in blood plasma samples. The labels also allow structural identification by coupled LC-ESI-MS, providing good signal sensitivity, mass determination, and fragmentation-based confirmation of multiple structures. The speed of sample preparation (glycan release, cleanup, and fluorophore labeling) has improved tremendously, with several specialist groups routinely using high-throughput automated liquid handling workstations in 96-well formats [11, 13]. However, LC run times still have considerable catching up to do. To date, most of the gains in instrument pressure capacity and in more densely packed, smaller-particle LC columns have been channeled into improving the resolution of glycan structures rather than shortening chromatographic run times. It is hoped that the next generation of LC instruments and columns will improve sample throughput. In the meantime, we demonstrate the validity and flexibility of the procainamide label with N-glycans released from plasma samples.
The procainamide labeling technique is both specific and sensitive for N-glycans, and its robustness has made it amenable to robotic sample preparation, which in turn has given chromatographic repeatability of better than 5% for peak area determination. Changes in plasma N-glycans in conditions such as prodromal inflammation, early-stage cancer, and diabetes are likely to be subtle; variation introduced by sample processing and data acquisition must therefore be minimized so that subtle N-glycan differences between samples can be detected. We also demonstrate exoglycosidase digestion of procainamide-labeled samples. Digestion with these highly specific enzymes, which cleave particular glycosidic linkages, enhances the identification of chromatographic peaks and can be used as part of a combined analytical package to discover new glycan biomarkers in a research and development setting.
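The sub-5% repeatability figure is most naturally read as a coefficient of variation (CV) of peak areas across replicate preparations; the chapter does not name the statistic, so that reading is an assumption. A minimal sketch for checking replicate data against such a threshold, using hypothetical relative peak areas:

```python
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation (sample standard deviation / mean), in percent."""
    if len(values) < 2:
        raise ValueError("need at least two replicates")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; CV undefined")
    return 100.0 * stdev(values) / m


# Hypothetical relative peak areas (%) for one glycan peak across five replicate preparations.
replicate_areas = [12.1, 12.4, 11.9, 12.3, 12.0]
cv = percent_cv(replicate_areas)
print(f"%CV = {cv:.2f}; below 5% threshold: {cv < 5.0}")
```

In practice the check would be run per peak, since low-abundance peaks usually show higher CVs than the major ones.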
2 Materials
Deionized water (18.2 MΩ cm at 25 °C) is used throughout. Human plasma (Sigma) is stored frozen and thawed at room temperature directly before analysis.
2.1 PNGase F Release
1. 96-well PCR plate or low-volume tubes.
2. PNGase F enzyme (here part of the commercial kit LZ-rPNGaseFkit (Ludger Ltd., UK)).
3. Reaction buffer: 500 mM sodium phosphate (pH 7.5 at 1× dilution).
4. Denaturation solution: 5% SDS, 400 mM DTT.
5. 10% NP-40 solution.
6. Sample oven.
7. Rotary vacuum centrifuge.
8. PCR plate sealer (optional; needed when samples are incubated in a 96-well PCR plate).
2.2 Cleanup of Released Glycans
1. 5% formic acid (50 μL of formic acid in 950 μL water).
2. Protein binding membrane (PBM) plate (Ludger Ltd).
3. 96-well sample collection plate.
4. Methanol.
5. Centrifuge equipped with a plate rotor.
2.3 Labeling N-glycans
1. 30% acetic acid in DMSO.
2. 16 mg procainamide dye.
3. 9 mg sodium cyanoborohydride.
4. Sample oven.
2.4 Postlabeling Cleanup
1. HILIC cleanup SPE plate (here LC-PROC-96, Ludger Ltd.).
2. 70% ethanol.
3. Acetonitrile.
4. Vacuum manifold.
2.5 Exoglycosidase Digestion and Cleanup
1. Sialidase capable of removing α2-3- and α2-6-linked sialic acids (E-S001, QABio, USA).
2. 250 mM sodium phosphate, pH 6.
3. 30 kDa molecular weight cutoff spin columns (LC-EXO-A6, Ludger Ltd.).
4. Sample oven.
5. Rotary vacuum concentrator.
2.6 UHPLC-MS Analysis
1. Acetonitrile.
2. 50 mM ammonium formate, pH 4.4.
3. Glucose homopolymer (GHP) procainamide-labeled calibration standard.
4. Procainamide-labeled sialylated glycan standard for system suitability (here procainamide-labeled A2G2S2).
5. ACQUITY BEH Glycan column (1.7 μm, 2.1 × 150 mm) (Waters).
6. UHPLC system with fluorescence detector (here UltiMate 3000 (Dionex)).
7. ESI-MS (here amaZon speed ETD ion trap (Bruker)).
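The glucose homopolymer (GHP) calibration standard listed above is typically used to convert retention times into glucose unit (GU) values, which transfer between runs better than raw retention times. The chapter does not specify the calibration model; many workflows fit a polynomial or cubic spline to the ladder, whereas this minimal sketch uses piecewise-linear interpolation between adjacent ladder peaks. The ladder retention times are hypothetical.

```python
from bisect import bisect_left


def make_gu_converter(ladder_rt, first_gu=1):
    """Return a function mapping retention time (min) to glucose units (GU).

    ladder_rt[i] is the retention time of the ladder oligomer with (first_gu + i)
    glucose residues; values between ladder peaks are linearly interpolated.
    """
    if len(ladder_rt) < 2 or any(b <= a for a, b in zip(ladder_rt, ladder_rt[1:])):
        raise ValueError("need >= 2 strictly increasing ladder retention times")

    def to_gu(rt):
        if not ladder_rt[0] <= rt <= ladder_rt[-1]:
            raise ValueError("retention time outside the calibrated range")
        i = bisect_left(ladder_rt, rt)  # first ladder peak eluting at or after rt
        if ladder_rt[i] == rt:
            return float(first_gu + i)
        lo, hi = ladder_rt[i - 1], ladder_rt[i]
        return first_gu + (i - 1) + (rt - lo) / (hi - lo)

    return to_gu


# Hypothetical ladder retention times (min) for GU 2-7; not data from this chapter.
to_gu = make_gu_converter([9.8, 13.1, 16.2, 19.0, 21.6, 24.0], first_gu=2)
print(to_gu(17.6))
```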
3 Methods
3.1 PNGase F Release
An outline of the experimental procedure is shown in Fig. 1.
3.1.1 Prepare Plasma Samples
Aliquot 5 μL of each plasma sample into a 96-well PCR plate, small centrifuge tubes, or sample vials.
3.1.2 Denature
To each sample add 4 μL of water and 1 μL of Denaturation solution. If using a 96-well PCR plate, seal the plate. Mix all the samples on a vortexer for 1–2 min, then centrifuge briefly to ensure that all the samples are at the bottom of the wells or vials. Incubate the samples for 10 min at 100 °C and allow them to cool to room temperature.
3.1.3 Addition of PNGase F Enzyme
If a 96-well PCR plate is used, carefully remove the seal. To each sample add 6 μL of water, 2 μL of Reaction buffer, 2 μL of NP-40 solution, and 1 μL of PNGase F. If using a 96-well PCR plate, seal the plate. Mix all the samples on a vortexer for 1–2 min, then centrifuge briefly to ensure that all the samples are at the bottom of the wells or vials. Incubate the samples for 16–20 h at 37 °C.
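Following the volumes in steps 3.1.1 to 3.1.3, the final composition of the PNGase F reaction can be checked with C1V1 = C2V2. This sketch assumes that volumes are additive and ignores any components contributed by the plasma or by the enzyme's storage buffer. It gives about 0.24% SDS and about 0.95% NP-40 in 21 μL, roughly a fourfold excess of nonionic detergent; such an excess is commonly included because SDS inhibits PNGase F.

```python
# (component, volume in uL, {species: stock concentration}); stocks from Materials 2.1.
ADDITIONS = [
    ("plasma", 5.0, {}),
    ("water", 4.0, {}),
    ("denaturation solution", 1.0, {"SDS (%)": 5.0, "DTT (mM)": 400.0}),
    ("water", 6.0, {}),
    ("reaction buffer", 2.0, {"sodium phosphate (mM)": 500.0}),
    ("NP-40 solution", 2.0, {"NP-40 (%)": 10.0}),
    ("PNGase F", 1.0, {}),
]


def final_concentrations(additions):
    """Return total volume and final concentration of each species (C2 = sum(C1*V1) / V2)."""
    total = sum(vol for _, vol, _ in additions)
    amounts = {}
    for _, vol, stocks in additions:
        for species, conc in stocks.items():
            amounts[species] = amounts.get(species, 0.0) + conc * vol
    return total, {species: amt / total for species, amt in amounts.items()}


total_ul, final = final_concentrations(ADDITIONS)
print(f"Total volume: {total_ul:.0f} uL")
for species, conc in final.items():
    print(f"  {species}: {conc:.3g}")
```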
3.2 Cleanup of Released N-glycans
Adding formic acid solution to the released N-glycan samples aids hydrolysis of the glycosylamine form in which N-glycans are released by PNGase F. Hydrolyzing the glycosylamine regenerates a free reducing end, which enables the glycans to be fluorescently labeled.
Fig. 1 Outline of the experimental procedure
3.2.1 Prepare a Formic Acid Solution
Prepare a solution of 5% formic acid by adding 50 μL formic acid to 950 μL water.
3.2.2 Conversion of N-glycan to Aldose Form
Add 5 μL of the 5% formic acid solution to each sample, mix all the samples on a plate shaker or vortexer for 1–2 min, and then centrifuge briefly. Incubate at room temperature for 45 min. Immediately after this incubation, clean up the samples to remove contaminating proteins and acid. Do not leave the samples in acid.
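The 5% formic acid stock (50 μL in a 1000 μL total volume) is diluted again when added to the sample. Assuming the sample volume is the approximately 21 μL that results from the additions in 3.1 (an assumption that ignores evaporation and pipetting losses), the final acid concentration can be estimated as a simple sketch:

```python
def diluted_percent(stock_percent, added_ul, sample_ul):
    """Final % (v/v) after adding added_ul of a stock to sample_ul of solution."""
    return stock_percent * added_ul / (sample_ul + added_ul)


# 5% stock, 5 uL added (3.2.2) to an assumed ~21 uL released-glycan sample.
final_fa = diluted_percent(5.0, 5.0, 21.0)
print(f"Final formic acid: {final_fa:.2f}% (v/v)")
```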
3.2.3 Preparation of PBM Plate
Place the PBM plate on top of a 96-well waste collection plate. Pipet 100 μL of methanol into each well of the PBM plate to wet the membrane. Place the lid on the PBM plate and hold the PBM plate and the 96-well collection plate securely together using tape or elastic. Using a benchtop centrifuge or equivalent, centrifuge the PBM plate and collection plate at 800 rpm for 3 min. Remove the tape or elastic and lid and pipet 300 μL of water into each well to wash the membrane. Replace the lid on the PBM plate and secure the plates together using tape or elastic. Repeat the centrifugation step. Remove the PBM plate from the top of the waste collection plate and blot the bottom of the plate on a paper towel to remove excess water.
3.2.4 Sample Cleanup
Pipet the released N-glycan samples in formic acid into the PBM plate wells. Wash out each sample well or vial with 100 μL of water and add this to the PBM plate wells. Place the PBM plate containing the samples directly on top of a clean 96-well collection plate so that well A1 on the PBM plate covers well A1 of the deep well collection plate, well A2 covers well A2, and so on. Place the lid on the PBM plate and hold the PBM plate and the 96-well collection plate securely together using tape or elastic. Using a benchtop centrifuge or equivalent, centrifuge the PBM plate and collection plate at 800 rpm for 3 min to elute the samples through the PBM plate. Remove the tape or elastic and lid from the PBM plate and pipet 100 μL of water into each well. Replace the lid on the PBM plate and hold the PBM plate and the 96-well collection plate securely together using tape or elastic. Centrifuge the PBM plate and collection plate at 800 rpm for 3 min to elute the wash through the PBM plate.
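The centrifugation steps are specified in rpm, but the force applied depends on the rotor radius. The standard conversion RCF = 1.118 × 10⁻⁵ × r × rpm² (r in cm) lets the 800 rpm setting be translated for a given plate rotor; the radii below are hypothetical examples, not values from the protocol.

```python
RCF_CONSTANT = 1.118e-5  # empirical constant for RCF (x g) = k * r(cm) * rpm^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) for a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Speed (rpm) needed to reach a target RCF at a given rotor radius."""
    return (rcf / (RCF_CONSTANT * radius_cm)) ** 0.5


for radius in (8.0, 10.0, 15.0):  # hypothetical rotor radii, cm
    print(f"r = {radius:4.1f} cm: 800 rpm = {rcf_from_rpm(800, radius):.0f} x g")
```

Converting the protocol's 800 rpm to a g-force for the original rotor, then back to rpm for another rotor, keeps the elution conditions comparable between centrifuges.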
3.2.5 Dry Down the Samples
Transfer the eluted N-glycan samples from the 96-well collection plate to a 300 μL 96-well PCR plate or 0.5 mL centrifuge vials. Dry the samples completely using a rotary vacuum concentrator (approximately 6–9 h).
3.3 Labeling N-glycans
3.3.1 Prepare the Labeling Reagent
Add 150 μL of 30% acetic acid in DMSO to a vial of procainamide dye and mix by pipetting until the dye is dissolved. Heating (30–60 °C, 10–20 s) may be required to help dissolve the dye. Add 150 μL of water to the procainamide dye solution and mix by pipetting. Transfer all of the dissolved procainamide dye solution to a vial of sodium cyanoborohydride reductant and mix by pipetting until the reductant is dissolved. Heating (30–60 °C, 10–20 s) may be required to help dissolve the reductant.
3.3.2 Add Labeling Reagent to Samples
Add 20 μL of labeling reagent to each glycan sample. If using a 96-well PCR plate, seal the plate. Mix the samples on a vortexer for 1–2 min followed by briefly centrifuging to ensure all the samples are in the bottom of the wells or vials.
3.3.3 Incubate
Place the reaction plate or vials in a dry oven set at 65 °C and incubate for 1 h.
3.3.4 Centrifuge and Cool
After incubation, remove the samples, centrifuge the plate or vials briefly and allow them to cool to RT.
3.4 Postlabeling Cleanup
3.4.1 Prepare the SPE Cleanup Plate
Assemble a vacuum manifold with a waste collection plate inside. Place the SPE procainamide cleanup plate (LC-PROC-96) on top of the manifold.
3.4.2 Wash and Prime the Plate
Prime the plate by adding 200 μL of 70% ethanol to each well. Apply a vacuum and adjust it to between 0.05 and 0.3 bar to elute the 70% ethanol through the wells of the plate. Wash the plate by adding 200 μL of water. Apply a vacuum and adjust it to between 0.05 and 0.3 bar to elute the water through the wells. Repeat the same process with 200 μL of acetonitrile to prime the plate for the samples.
3.4.3 Prepare the N-glycan Samples
Pipet 230 μL of acetonitrile into each procainamide-labeled sample to bring the volume to 250 μL. Gently mix each sample five times by pipetting.
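Several steps in this protocol bring an aqueous sample to a target acetonitrile fraction: here, 20 μL of labeling reaction plus 230 μL of acetonitrile gives 92% acetonitrile, and in Subheading 3.6.3, 10–15 μL of sample plus 35–40 μL of acetonitrile gives 70–80%. A minimal sketch of that arithmetic follows. It assumes volumes are additive (ignoring the small contraction on mixing acetonitrile and water) and treats the whole sample volume as the non-acetonitrile portion.

```python
def acn_volume_ul(sample_ul: float, target_acn_fraction: float) -> float:
    """Volume of acetonitrile (uL) to add so that acetonitrile makes up
    target_acn_fraction of the final mixture.

    Assumes additive volumes and an acetonitrile-free sample.
    """
    if not 0.0 <= target_acn_fraction < 1.0:
        raise ValueError("target_acn_fraction must be in [0, 1)")
    return sample_ul * target_acn_fraction / (1.0 - target_acn_fraction)


def acn_fraction(sample_ul: float, acn_ul: float) -> float:
    """Acetonitrile fraction of the final mixture (additive volumes assumed)."""
    return acn_ul / (sample_ul + acn_ul)


if __name__ == "__main__":
    print(round(acn_volume_ul(20, 0.92), 1))  # postlabeling cleanup: 230.0
    print(round(acn_fraction(10, 40), 2))     # UHPLC sample prep, high end: 0.8
```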
3.4.4 Apply the Samples to the Plate
Load each sample onto a primed well. Initially, allow the acetonitrile to pass slowly through the cartridges by gravity. After 10 min, apply a vacuum and adjust to between 0.05 and 0.2 bar to elute any remaining acetonitrile slowly through the cartridges (see Note 1).
3.4.5 Wash the Plate
Wash the plate with 200 μL of acetonitrile. After adding the wash, allow the acetonitrile to pass slowly through the plate by gravity. After 5 min, apply a vacuum (0.05 to 0.2 bar) to elute any remaining acetonitrile slowly through the cartridges. Repeat the same process with two additional 200 μL acetonitrile washes. If the acetonitrile does not start to flow through the plate after a wash is added, apply a low vacuum (0.05 bar) for up to 10 s to start the flow. After the last wash elution step, apply a higher vacuum (0.3 to 0.5 bar) to remove as much acetonitrile as possible from the plate. If any acetonitrile remains under the plate, remove the plate from the vacuum manifold and blot the bottom of the plate on a paper towel to remove the excess.
N-glycan Characterization by Liquid Chromatography Coupled with. . .
3.4.6 Sample Elution
Place a 2 mL 96-well collection plate inside the vacuum manifold. Assemble the manifold with the SPE cleanup plate on top, ensuring that the wells of the collection plate are aligned with the wells of the SPE cleanup plate. Keep the gap between the collection plate and the SPE cleanup plate as small as possible. Elute the procainamide-labeled N-glycans by adding 100 μL of water to each well. Use a low vacuum (0.05 bar, up to 10 s) to start the elution through the cartridges, then switch off the vacuum to allow the labeled glycans to elute under gravity. After 15 min, apply a vacuum of 0.05 to 0.1 bar to elute any remaining water slowly through the cartridges. Repeat the same process with two additional 100 μL water elutions. After the water has eluted, apply a higher vacuum (0.5 bar) to elute any remaining water from the cartridges.
3.5 Exoglycosidase Digestions and Cleanup
3.5.1 Preparation of Samples for Digestion
Pipet enough procainamide-labeled glycan to give a readable trace on the UHPLC (usually 5–30 μL) into a 0.2 mL polypropylene vial or similar. If more than 7 μL of sample is required, dry the sample down using a rotary vacuum concentrator: the digestion contains 1 μL of enzyme and 2 μL of buffer in a final volume of 10 μL, which leaves at most 7 μL for the sample.
3.5.2 Digestion
Digest the samples using the commercial kit E-S001 (QABio) (see Note 2). Add 1 μL sialidase, 2 μL of 250 mM sodium phosphate buffer, pH 6.0, and enough water to bring the final volume in the vial to 10 μL. Vortex the samples to mix, centrifuge briefly, and make sure the lids are secure. Incubate in an oven at 37 °C for 16–20 h.
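The volume limits in Subheadings 3.5.1 and 3.5.2 follow from the reaction layout: 1 μL sialidase and 2 μL buffer in a 10 μL final volume leave at most 7 μL for the sample, and the 250 mM phosphate buffer is diluted fivefold to 50 mM. The sketch below encodes that bookkeeping. The function names and return format are illustrative only, and if you add further enzymes (see Note 2), their volumes and buffer requirements are assumptions you will need to take from the kit instructions.

```python
def plan_digestion(sample_ul: float, enzyme_ul: float = 1.0, buffer_ul: float = 2.0,
                   final_ul: float = 10.0) -> dict:
    """Work out how much water to add to a sialidase digestion.

    Default volumes follow the protocol text (1 uL enzyme, 2 uL buffer, 10 uL final).
    If the sample does not fit, it must be dried down first; it then contributes
    no volume and water makes up all of the remaining space.
    """
    capacity = final_ul - enzyme_ul - buffer_ul
    if capacity < 0:
        raise ValueError("enzyme and buffer volumes exceed the final volume")
    if sample_ul > capacity:
        return {"dry_down": True, "water_ul": capacity}
    return {"dry_down": False, "water_ul": capacity - sample_ul}


def final_buffer_mm(stock_mm: float = 250.0, buffer_ul: float = 2.0,
                    final_ul: float = 10.0) -> float:
    """Final buffer concentration (mM) after dilution into the reaction."""
    return stock_mm * buffer_ul / final_ul


if __name__ == "__main__":
    print(plan_digestion(5))    # 5 uL sample fits: add 2 uL water
    print(plan_digestion(20))   # 20 uL exceeds 7 uL: dry down, then add 7 uL water
    print(final_buffer_mm())    # 50.0
```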
3.5.3 Preparation of Exoglycosidase Cleanup Columns
LC-EXO-A6 spin columns are used to separate the glycans from contaminating enzyme. Make sure that the spin column is securely assembled inside its sample collection vial. Pipet 100 μL of pure water onto the spin column membrane. Close the vial and centrifuge for 3 min at 10,000 × g. Remove the spin column from its vial and discard the wash. Place the spin column back into its vial.
3.5.4 Purification of Glycans
Pipet the glycan sample onto the spin column membrane. Rinse each sample vial with 100 μL of water and add the rinse to the spin column membrane. Close the vial and centrifuge for 3 min at 10,000 × g. Pipet 100 μL of water directly onto the spin column membrane to wash through any remaining sample. Close the vial and centrifuge for 3 min at 10,000 × g. Remove the spin column from its vial and discard it; the purified glycans are in the filtrate collected in the vial.
3.5.5 Sample Drying and Preparation for Analysis
Dry the samples using a rotary vacuum concentrator. Reconstitute the samples for analysis in 20 μL of water.
3.6 UHPLC-MS Analysis
3.6.1 UHPLC-FD-MS Methods
UHPLC systems from a range of manufacturers can be used, although their maximum working pressures may differ. We recommend a BEH Glycan column (1.7 μm, 2.1 × 150 mm; Waters, UK) and a fluorescence detector set to λex = 310 nm and λem = 370 nm (see Note 3). Solvent A: 50 mM ammonium formate, pH 4.4; solvent B: acetonitrile.
3.6.2 Prepare the UHPLC-FD-MS
When starting the UHPLC system, we recommend running at least one full method without any sample injected to condition the column. Next, we recommend running at least three glucose homopolymer (GHP) injections until the profiles overlap (see Note 4); the peak shape should be symmetrical and the peak width at half height should be less than 0.22 min for the glucose unit (GU) 10 peak. This should be followed by an injection of procainamide labeled A2 (A2G2S2) standard: you should see a sharp single peak (see Note 5). If required, other procainamide labeled glycan standards (e.g., high-mannose, bi-, tri-, or tetraantennary glycans) can be run for comparison of retention times and GU values to the sample.
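The system suitability criterion above (width at half height below 0.22 min for the GU 10 peak of the GHP ladder) can be checked from an exported fluorescence trace. The sketch below is one simple way to do it, assuming a baseline-corrected trace given as two equal-length lists of retention times (min) and signal values that contain the whole peak. It finds the apex, then linearly interpolates the two half-height crossings. Chromatography data systems usually report this width directly, and their integration settings may give slightly different values.

```python
from typing import Sequence


def width_at_half_height(times: Sequence[float], signal: Sequence[float]) -> float:
    """Peak width at half height (same units as times) for the tallest peak.

    Assumes a baseline-corrected trace in which the whole peak lies inside the window.
    """
    if len(times) != len(signal) or len(times) < 3:
        raise ValueError("times and signal must have equal length (>= 3 points)")
    apex = max(range(len(signal)), key=signal.__getitem__)
    half = signal[apex] / 2.0

    # Walk outward from the apex while the signal stays at or above half height.
    left = apex
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = apex
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    if left == 0 or right == len(signal) - 1:
        raise ValueError("peak does not fall below half height inside the window")

    def crossing(inside: int, outside: int) -> float:
        # Linear interpolation of the time at which the signal equals half height.
        t0, y0 = times[inside], signal[inside]
        t1, y1 = times[outside], signal[outside]
        return t0 + (half - y0) * (t1 - t0) / (y1 - y0)

    return crossing(right, right + 1) - crossing(left, left - 1)


def passes_suitability(width_min: float, limit_min: float = 0.22) -> bool:
    """True if the GU 10 peak width at half height is below the protocol limit."""
    return width_min < limit_min
```

Pass in only the retention-time window around the GU 10 peak, so that the tallest point in the window is the GU 10 apex rather than a neighboring ladder peak.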
3.6.3 Analysis of Samples by UHPLC-FD-MS
Prepare the samples for UHPLC by mixing 10–15 μL of each procainamide-labeled glycan sample with 35–40 μL of acetonitrile in sample vials (see Note 5). Use the same aqueous-to-acetonitrile ratio for all samples.
HILIC-FLD-MSn. Inject 20 μL of each prepared sample onto an ACQUITY BEH Glycan column (1.7 μm, 2.1 × 150 mm; Waters Inc., USA) at 60 °C on a Dionex Ultimate 3000 UHPLC instrument (Thermo, UK) with a fluorescence detector (λex = 310 nm, λem = 370 nm) coupled in-line to an Amazon speed ETD mass spectrometer (Bruker Daltonics, Bremen, Germany). Use the following UHPLC gradient conditions (see Note 6), with solvent A, 50 mM ammonium formate, pH 4.4, and solvent B, acetonitrile:
0–53.5 min, 76–51% B, 0.4 mL/min;
53.5–55.5 min, 51–0% B, flow 0.4 to 0.2 mL/min;
55.5–57.5 min, 0% B, 0.2 mL/min;
57.5–59.5 min, 0–76% B, 0.2 mL/min;
59.5–65.5 min, 76% B, 0.2 mL/min;
65.5–66.5 min, 76% B, flow 0.2 to 0.4 mL/min;
66.5–70.0 min, 76% B, 0.4 mL/min.
Use the following Amazon speed settings (see Note 7): source temperature, 250 °C; gas flow, 10 L/min; capillary voltage, 4500 V; ICC target, 200,000; maximum accumulation time, 50.00 ms; rolling average, 2; number of precursor ions selected, 3; release after 0.2 min; positive ion mode; scan mode, enhanced resolution; mass range scanned, m/z 600–2000; target mass, m/z 900.
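The gradient above is a piecewise-linear program. Encoding its breakpoints as data makes it easy to check %B and flow rate at any time point, or to total the mobile phase delivered per run for planning. This is a sketch that assumes linear ramps between the listed time points (a linear gradient curve on the pump); the breakpoint list is transcribed from the conditions above.

```python
from bisect import bisect_right
from typing import Tuple

# (time in min, % solvent B, flow in mL/min) at each breakpoint of the method above.
GRADIENT = [
    (0.0, 76.0, 0.4),
    (53.5, 51.0, 0.4),
    (55.5, 0.0, 0.2),
    (57.5, 0.0, 0.2),
    (59.5, 76.0, 0.2),
    (65.5, 76.0, 0.2),
    (66.5, 76.0, 0.4),
    (70.0, 76.0, 0.4),
]
_TIMES = [row[0] for row in GRADIENT]


def conditions_at(t_min: float) -> Tuple[float, float]:
    """Return (% B, flow in mL/min) at t_min, assuming linear ramps between breakpoints."""
    if not _TIMES[0] <= t_min <= _TIMES[-1]:
        raise ValueError("time is outside the method")
    i = bisect_right(_TIMES, t_min) - 1
    if i == len(GRADIENT) - 1:  # exactly at the final breakpoint
        return GRADIENT[-1][1], GRADIENT[-1][2]
    t0, b0, f0 = GRADIENT[i]
    t1, b1, f1 = GRADIENT[i + 1]
    frac = (t_min - t0) / (t1 - t0)
    return b0 + frac * (b1 - b0), f0 + frac * (f1 - f0)


def total_volume_ml() -> float:
    """Mobile phase delivered per run (trapezoid rule, exact for linear flow ramps)."""
    total = 0.0
    for (t0, _, f0), (t1, _, f1) in zip(GRADIENT, GRADIENT[1:]):
        total += (t1 - t0) * (f0 + f1) / 2.0
    return total


if __name__ == "__main__":
    for t in (0.0, 26.75, 54.5, 58.5, 70.0):
        b, f = conditions_at(t)
        print(f"{t:5.2f} min: {b:5.1f}% B, {f:.2f} mL/min")
    print(f"about {total_volume_ml():.1f} mL mobile phase per injection")
```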
4
Notes
1. Allow sufficient time for the samples to bind to the SPE membrane. Insufficient binding time can result in partial or selective binding that skews the relative proportions of glycan species.
2. Exoglycosidases other than α2-3,6,8,9-specific sialidase can be used depending on requirements. α1-6 core-specific and α1-3,4 antenna-specific fucosidases are commercially available and are useful tools for structure identification and in biomarker research. Mixtures of exoglycosidases (e.g., α2-3,6,8,9 sialidase + β1-4 galactosidase) can also be used.
3. A variety of HILIC UHPLC resins and columns are available commercially, and many have proven suitable for glycan separation. These include LudgerSep N2, 3 μm particle size (Ludger); HALO 2 Penta-HILIC, 2.0 μm particle size (Advanced Materials Technology); BIOshell Glycan, 2.7 μm particle size (Supelco); GlycanPac™ AXH-1, 1.9 μm particle size (Thermo Scientific); Accucore™ 150 Amide HILIC, 2.6 μm particle size (Thermo Scientific); AdvanceBio Glycan Mapping, 1.8 and 2.7 μm particle size (Agilent); and bioZen Glycan, 2.6 μm particle size (Phenomenex). Each column has its own separation profile. When establishing the column of choice in the laboratory, test at least two columns from separate batches to confirm that the manufacturing process gives stable retention characteristics from batch to batch. As columns age with use, the retention times of sialylated glycans can decrease significantly. Different UHPLC systems also add variation to the glycoprofiles obtained, which will likely differ from published profiles.
4. The GHP can be used as a reference standard to assign glucose unit (GU) values to peaks in the released glycan pool. These GU values are reproducible and predictive because each monosaccharide in a glycan contributes a set increment to the GU value. This allows primary assignment of structure by comparing the GU values of unknown glycans with those of glycan standards held in databases or reported in the literature (https://glycostore.org/displayCollection/Ludger) [14].
Fig. 2 Fluorescence chromatogram of procainamide-labeled blood plasma N-glycans separated on a BEH Glycan column (15 cm × 2.1 mm; Waters). The main structures are annotated
5. Make up the GHP, procainamide-labeled standards, and samples in 20–30% water and 70–80% acetonitrile. The sample-to-acetonitrile ratio is typically matched to the starting conditions of the gradient. The % aqueous used for the standards should be the same as that used for the samples. For sialylated samples, a lower aqueous fraction (20% or 25%) is preferred because it prevents peak splitting. For samples in high-salt buffers (e.g., after exoglycosidase digestion), up to 30% aqueous is recommended.
6. Fig. 2 shows the separation achieved for procainamide-labeled plasma N-glycans on a BEH column. The main glycan structures are annotated using Consortium for Functional Glycomics notation.
7. A wide variety of mass spectrometers are available, and optimal MS conditions vary significantly from instrument to instrument. MS signal generally decreases as the number of sialic acids on a glycan increases. This can cause detection problems, especially because sialylated plasma glycans occur in many different linkage variants, which split the signal into separate peaks on the HILIC column. When coupling a mass spectrometer downstream of a fluorescence detector, take care not to exceed the pressure rating of the fluorescence detector flow cell, which can cause it to leak. Shortening the connecting tubing and increasing its internal diameter can help.
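Note 4 describes assigning GU values from the GHP ladder. A minimal sketch is shown below: it interpolates linearly between the two ladder peaks that bracket a sample peak. This is a simplification; published workflows commonly fit a higher-order polynomial or cubic spline to the ladder instead, and results should be compared with database values (e.g., GlycoStore [14]) rather than taken as assignments on their own. The ladder retention times in the example are hypothetical.

```python
from bisect import bisect_right
from typing import Dict


def gu_from_rt(rt_min: float, ladder: Dict[int, float]) -> float:
    """Convert a retention time (min) to a GU value by linear interpolation
    between the bracketing glucose homopolymer (GHP) ladder peaks.

    ladder maps each GU number to the retention time (min) of that ladder peak.
    """
    points = sorted((t, gu) for gu, t in ladder.items())
    if len(points) < 2:
        raise ValueError("need at least two ladder peaks")
    times = [t for t, _ in points]
    if not times[0] <= rt_min <= times[-1]:
        raise ValueError("retention time lies outside the calibrated ladder range")
    i = bisect_right(times, rt_min) - 1
    if i == len(points) - 1:  # exactly on the last ladder peak
        return float(points[-1][1])
    (t0, g0), (t1, g1) = points[i], points[i + 1]
    return g0 + (rt_min - t0) * (g1 - g0) / (t1 - t0)


if __name__ == "__main__":
    # Hypothetical ladder: GU number -> retention time (min)
    ghp = {3: 10.0, 4: 14.0, 5: 17.5, 6: 20.5}
    print(round(gu_from_rt(19.0, ghp), 2))  # 5.5
```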
Acknowledgments
The authors acknowledge grant funding from the EU Commission FP7 and Horizon 2020 programs and UK academic and industrial collaborative funding (HighGlycan, grant agreement ID: 278535; IBD-BIOM, grant agreement ID: 305479; GlySign, Marie Skłodowska-Curie grant agreement ID: 722095; and the UK BBSRC, EPSRC, and Innovate UK funded Glycoenzymes for Bioindustries grant).
References
1. Singh SS, Naber A, Dotz V, Schoep E, Memarian E, Slieker RC, Elders PJM, Vreeker G, Nicolardi S, Wuhrer M, Sijbrands EJG, Lieverse AG, 't Hart LM, van Hoek M (2020) Metformin and statin use associate with plasma protein N-glycosylation in people with type 2 diabetes. BMJ Open Diabetes Res Care 8(1):e001230. https://doi.org/10.1136/bmjdrc-2020-001230
2. Matsumoto T, Hatakeyama S, Yoneyama T, Tobisawa Y, Ishibashi Y, Yamamoto H, Yoneyama T, Hashimoto Y, Ito H, Nishimura SI, Ohyama C (2019) Serum N-glycan profiling is a potential biomarker for castration-resistant prostate cancer. Sci Rep 9(1):16761. https://doi.org/10.1038/s41598-019-53384-y
3. Reiding KR, Bondt A, Hennig R, Gardner RA, O'Flaherty R, Trbojević-Akmačić I, Shubhakar A, Hazes JMW, Reichl U, Fernandes DL, Pučić-Baković M, Rapp E, Spencer DIR, Dolhain RJEM, Rudd PM, Lauc G, Wuhrer M (2019) High-throughput serum N-glycomics: method comparison and application to study rheumatoid arthritis and pregnancy-associated changes. Mol Cell Proteomics 18(1):3–15. https://doi.org/10.1074/mcp.RA117.000454
4. Thanabalasingham G, Huffman JE, Kattla JJ, Novokmet M, Rudan I, Gloyn AL, Hayward C, Adamczyk B, Reynolds RM, Muzinic A, Hassanali N, Pucic M, Bennett AJ, Essafi A, Polasek O, Mughal SA, Redzic I, Primorac D, Zgaga L, Kolcic I, Hansen T, Gasperikova D, Tjora E, Strachan MW, Nielsen T, Stanik J, Klimes I, Pedersen OB, Njølstad PR, Wild SH, Gyllensten U, Gornik O, Wilson JF, Hastie ND, Campbell H, McCarthy MI, Rudd PM, Owen KR, Lauc G, Wright AF (2013) Mutations in HNF1A result in marked alterations of plasma glycan profile. Diabetes 62(4):1329–1337. https://doi.org/10.2337/db12-0880
5. Scott DA, Drake RR (2019) Glycosylation and its implications in breast cancer. Expert Rev Proteomics 16(8):665–680. https://doi.org/10.1080/14789450.2019.1645604
6. Ruhaak LR, Hennig R, Huhn C, Borowiak M, Dolhain RJ, Deelder AM, Rapp E, Wuhrer M (2010) Optimized workflow for preparation of APTS-labeled N-glycans allowing high-throughput analysis of human plasma glycomes using 48-channel multiplexed CGE-LIF. J Proteome Res 9(12):6655–6664. https://doi.org/10.1021/pr100802f
7. Ventham NT, Gardner RA, Kennedy NA, Shubhakar A, Kalla R, Nimmo ER, IBD-BIOM Consortium, Fernandes DL, Satsangi J, Spencer DI (2015) Changes to serum sample tube and processing methodology does not cause inter-individual variation in automated whole serum N-glycan profiling in health and disease. PLoS One 10(6):e0129335. https://doi.org/10.1371/journal.pone.0129335
8. Baginski T (2009) Compounds and methods for the rapid labelling of N-glycans. Patent WO 2009/100155 A1
9. O'Flaherty R, Muniyappa M, Walsh I, Stöckmann H, Hilliard M, Hutson R, Saldova R, Rudd PM (2019) A robust and versatile automated glycoanalytical technology for serum antibodies and acute phase proteins: ovarian cancer case study. Mol Cell Proteomics 18(11):2191–2206. https://doi.org/10.1074/mcp.RA119.001531
10. Thomson RI, Gardner RA, Strohfeldt K, Fernandes DL, Stafford GP, Spencer DIR, Osborn HMI (2017) Analysis of three epoetin alpha products by LC and LC-MS indicates differences in glycosylation critical quality attributes, including sialic acid content. Anal Chem 89(12):6455–6462. https://doi.org/10.1021/acs.analchem.7b00353
11. Kozak RP, Tortosa CB, Fernandes DL, Spencer DI (2015) Comparison of procainamide and 2-aminobenzamide labeling for profiling and identification of glycans by liquid chromatography with fluorescence detection coupled to electrospray ionization-mass spectrometry. Anal Biochem 486:38–40. https://doi.org/10.1016/j.ab.2015.06.006
12. Yoshino K, Takao T, Murata H, Shimonishi Y (1995) Use of the derivatizing agent 4-aminobenzoic acid 2-(diethylamino)ethyl ester for high-sensitivity detection of oligosaccharides by electrospray ionization mass spectrometry. Anal Chem 67(21):4028–4031. https://doi.org/10.1021/ac00117a034
13. Shubhakar A, Pang PC, Fernandes DL, Dell A, Spencer DIR, Haslam SM (2018) Towards automation of glycomic profiling of complex biological materials. Glycoconj J 35(3):311–321. https://doi.org/10.1007/s10719-018-9825-8
14. Zhao S, Walsh I, Abrahams JL, Royle L, Nguyen-Khuong T, Spencer D, Fernandes DL, Packer NH, Rudd PM, Campbell MP (2018) GlycoStore: a database of retention properties for glycan analysis. Bioinformatics 34(18):3231–3232. https://doi.org/10.1093/bioinformatics/bty319
Chapter 14
Lectin Histochemistry for Tissues and Cells, and Dual Lectin and Antibody Co-localization
Siobhán S. McMahon and Michelle Kilcoyne
Abstract
In this chapter we describe in detail methods for lectin staining of (1) tissues and (2) cells to identify and map endogenous glycosylation. We also describe (3) dual antibody and lectin staining of tissues to associate glycosylation with particular proteins or cells in tissues.
Key words Lectin, Histochemistry, Antibody, Staining, Tissue, Cells, Co-localization, Dual, Glycosylation
1
Introduction
Carbohydrates and their interactions play a crucial role in the development and maintenance of tissues and organs in vivo and are involved in numerous biological processes, ranging from cell-to-cell adhesion, inflammation, cell proliferation, and oncogenic transformation to immune interactions [1]. Cell surface glycosylation changes with cell differentiation, and altered cell and tissue glycosylation has been associated with aging, malignancy, tissue injury, disease states, and regeneration [1–4]. Moreover, specific glycosylation is a typical feature of certain cells and tissues and can be used for cell biomarker identification. For example, the B4 isolectin from Griffonia simplicifolia seeds (GS-I B4) is widely used to visualize microglial cells (but not other glial cells) and the nodes of Ranvier in the rat central nervous system, as it specifically localizes to microglial plasma membranes and intracytoplasmic membranes [5, 6]. This staining indicates that terminal α-linked galactose is present on a cell surface glycoconjugate [5].
Lectins are non-enzymatic carbohydrate-binding proteins of non-immune origin that precipitate glycoproteins or polysaccharides and agglutinate cells [7, 8]. They are found in almost all organisms studied to date, but those from the plant kingdom have been the most widely researched and, consequently, the most widely used in histochemical applications [8, 9]. Lectins bind specifically to distinct carbohydrate moieties, but some lectins contain one or more non-carbohydrate ligand sites [8, 10], so carbohydrate-mediated binding must be validated [11]. Because each lectin binds different carbohydrates in different configurations, glycosylation on a cell or tissue surface can be “mapped” with a library of different lectins [2–4]. This property can then be used to exploit the glycosylation changes associated with cell transformation and differentiation as diagnostics of particular cell states [12, 13]. In this chapter, we describe methods for lectin staining of tissues and cells, and dual antibody and lectin histochemistry to co-localize carbohydrate structures with known proteins or cells.
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_14, © Springer Science+Business Media, LLC, part of Springer Nature 2022
Siobhán S. McMahon and Michelle Kilcoyne
2
Materials
Prepare all solutions in ultrapure water (15.5 MΩ·cm at 25 °C) using analytical grade reagents unless indicated otherwise. Prepare and store all reagents at room temperature unless indicated otherwise. Follow all local waste disposal regulations when disposing of waste materials.
2.1 Periodic Acid-Treated BSA for Blocking
1. Bovine serum albumin (BSA) of 99% purity or greater should be used (see Note 1). The periodic acid treatment of BSA described here was adapted from Glass et al. [14].
2. Prepare a sufficient volume of 10 mM periodic acid in 0.1 M sodium acetate, pH 4.5, for the quantity of BSA to be treated (see Note 2). For 1 L of buffer, first place 8.2 g anhydrous sodium acetate in the bottle (or beaker or flask; we normally use 1 L media bottles) with a magnetic stir bar and add approximately 600 mL ultrapure water. Place on a magnetic stir plate and stir gently to dissolve. Adjust the pH to 4.5, initially with drops of glacial acetic acid (99%) and then with 1 N acetic acid when close to the desired pH. If the pH accidentally drops too low, use 1 M and 0.1 M sodium hydroxide to raise it slightly. Add 2.28 g periodic acid (ortho-periodic acid) and continue to stir gently to dissolve. Periodic acid is a moderately strong oxidizing agent and should be handled with caution. Measure the pH again, readjust to pH 4.5 if necessary, remove the stir bar with a magnet, and make up to 1 L with ultrapure water.
3. Dissolve BSA in the freshly made 10 mM periodic acid in 0.1 M sodium acetate, pH 4.5 buffer to a concentration of approximately 5 g BSA/100 mL buffer with gentle stirring (see Note 3).
Lectin Histochemistry for Tissues and Cells, and Dual Lectin and Antibody. . .
Fig. 1 (a) Fill 50 mL tubes halfway with the dialyzed ptBSA mixture. (b) Freeze the liquid at an angle to increase the surface area by storing for up to 6 h in a −80 °C freezer. (c) Pierce holes in the plastic lids of the 50 mL tubes with an awl or scissors from the inside of the cap (to avoid hanging plastic chads tearing the Kimwipes). (d) Remove the tubes with frozen dialyzed ptBSA and assemble them with Kimwipes and pierced lids. Place the assembled frozen tubes on the lyophilizer until the ptBSA has completely lyophilized to a powder (approximately 2 days). (e) Remove the tubes with dry ptBSA powder, remove the pierced caps and Kimwipes, screw on intact caps, and store at 4 °C until use
4. Continue to stir the mixture gently for 6 h (including the time it takes to dissolve the BSA completely) (see Notes 4 and 5).
5. Dialyze the mixture against water for 2 days at 4 °C (e.g., in a cold room), with four changes of water (morning and evening).
6. Dispense the liquid into an appropriate number of 50 mL tubes (Sarstedt). Only half fill the tubes, cap loosely, and freeze at an angle in a −80 °C freezer for approximately 6 h. The angled surface increases the surface area to speed lyophilization. Upon removal from the freezer, remove the caps, cover the top of each tube with a Kimwipe or another thin, non-shedding tissue (see Note 6), and screw on pierced caps (Fig. 1).
7. Lyophilize the dialyzed periodic acid-treated BSA (ptBSA) to complete dryness (approximately 2 days). The powdered ptBSA should be white and fluffy (see Note 7). Remove the tubes from the lyophilizer, close them with intact (non-pierced) screw caps, and store at 4 °C until use (see Note 8).
2.2 Lectin Histochemistry for Tissues
1. Slide warmer.
2. A low-salt version of Tris-buffered saline supplemented with Ca2+ and Mg2+ ions (TBS) is used for lectin histochemistry (see Note 9). TBS: 20 mM Tris–HCl, 100 mM NaCl, 1 mM CaCl2, 1 mM MgCl2, adjusted to pH 7.2 with concentrated HCl. Usually 1 L of a 10× TBS stock is made and diluted to 1× with ultrapure water just before use.
3. TBS-T: combine 100 mL 10× TBS, 900 mL ultrapure water, and 0.5 mL Triton™ X-100 (0.05%) (see Note 10).
4. Glass Coplin jars.
5. Opaque slide staining chamber (see Note 11). At the start of the staining procedure, add ultrapure water to the bottom of the chamber to a depth of approximately 3 mm and replace the lid to create a humidity chamber.
6. 2% ptBSA in TBS (blocking buffer) (see Note 12): Dissolve 2 g ptBSA in 100 mL TBS with constant stirring (scale to the desired final volume; approximately 1 mL per slide is required). Complete dissolution can take up to 30 min. Blocking buffer can be prepared in advance of the staining procedure and stored at 4 °C (see Note 13). In that case, remove the blocking buffer from the fridge 30 min to 1 h before starting the staining procedure for volumes up to 50 mL (longer for larger volumes) so that it can equilibrate to room temperature before use.
7. Fluorescently labeled lectins (see Table 1 for examples of lectins commonly used for histochemistry). Stock solutions from commercial vendors are typically supplied at 1 or 5 mg/mL.
Table 1 A selection of lectins commonly used for histochemistry, their sources, carbohydrate binding specificities, and haptenic sugars
Lectin | Source | Carbohydrate binding specificity | Haptenic sugar (conc.)
Con A | Canavalia ensiformis | Man in high-mannose type, hybrid type, and biantennary complex type N-glycans | 100 mM Man
GS-I B4 | Griffonia simplicifolia (I B4 isolectin) | Terminal α-linked Gal | 100 mM Gal
GS-II | Griffonia simplicifolia (lectin II) | Terminal GlcNAc | 100 mM GlcNAc
Jacalin | Artocarpus integrifolia | (Neu5Ac)Gal-β-(1→3)-GalNAc-α-O-S/T (T-antigen) | 100 mM Gal
PNA | Arachis hypogaea | Gal-β-(1→3)-GalNAc-α-O-S/T (T-antigen) | 100 mM Gal
WGA | Triticum vulgaris | GlcNAc-β-(1→4)-GlcNAc-β-(1→4)-GlcNAc, sialic acid | 100 mM GlcNAc
VVA | Vicia villosa | GalNAc-α-O-S/T (Tn-antigen) | 50 mM GalNAc
UEA-I | Ulex europaeus I | Fuc-α-(1→2)-Gal-R | 50 mM Fuc
MAA | Maackia amurensis | Neu5Ac/Gc-α-(2→3)-Gal-β-(1→4)-GlcNAc-β-(1→R) | 100 mM lactose
SNA-I | Sambucus nigra | Neu5Ac-α-(2→6)-Gal(NAc)-R | 100 mM lactose
Dilute lectins to the desired concentrations in the required volume of TBS-T 30 min before use and store on ice until use (approximately 1 mL per slide is required). Handle fluorescently labeled lectins in the dark, and prepare and hold dilutions in the dark. Typical lectin concentrations are 1–20 μg/mL, but concentrations should be titrated for specific samples. The choice of fluorescent label depends on the lasers available on the fluorescence microscope; fluorescein isothiocyanate (FITC) and tetramethylrhodamine isothiocyanate (TRITC) are common commercially available label options for lectins (see Note 14).
8. Monosaccharide and disaccharide (haptenic sugar) solutions in TBS-T for lectin binding inhibition. Make sugar solutions of appropriate concentration for each lectin used (see Table 1 for examples) in TBS-T (e.g., 100 mM Man) at the required volumes (typically 10 mL of each sugar solution, depending on planned experiments). Dilute fluorescently labeled lectins to the desired concentration in the required volume of the appropriate haptenic sugar in TBS-T 30 min before use and store on ice until use (e.g., Con A-FITC diluted in 100 mM Man in TBS-T). Handle fluorescently labeled lectins in the dark, and prepare and hold dilutions in the dark.
9. 4′,6-Diamidino-2-phenylindole dihydrochloride (DAPI). Make a 1 mg/mL stock solution in ultrapure water and store aliquots at −20 °C. Remove an aliquot 1 h before use and thaw on ice. Dilute 1 μL of stock per 2.5 mL TBS 20 min before use and keep on ice until use. Protect DAPI solutions from light at all times.
10. ProLong™ Gold Antifade Mountant (Thermo Fisher Scientific) (see Note 15). Remove from the freezer 30 min before starting the staining procedure and place on its side to thaw to room temperature; this discourages the formation of bubbles.
11. Glass coverslips.
12. Fluorescence microscope.
2.3 Lectin Cytohistochemistry
1. Sterile 15 mL polypropylene centrifuge tubes with conical bottoms (e.g., Corning or Sarstedt).
2. Phosphate buffered saline (PBS), pH 7.4.
3. 4% paraformaldehyde (pFA): For 500 mL of 4% pFA, preheat 400 mL PBS, pH 7.4 (see Note 16) in a media bottle to 60 °C on a heat block with stirring using a magnetic stir bar in a fume hood (see Note 17). Weigh 20 g pFA powder (see Note 18) and add it to the heated PBS with constant stirring. Do not allow the temperature to rise above 60 °C. The powder will not immediately go into solution, so slowly add 1 M NaOH dropwise until the solution clears.
When the pFA has dissolved, remove the bottle from the heat, add the final 100 mL PBS, and allow to cool to room temperature. Filter the solution (1 kDa molecular weight cutoff filter) and store 10 mL aliquots of the filtered 4% pFA at −20 °C for up to 3 months. Thaw aliquots just before use (see Note 19) and do not refreeze thawed aliquots.
4. TBS-T1: combine 100 mL 10× TBS, 900 mL ultrapure water, and 0.2 mL Triton™ X-100 (0.02%) (see Note 10).
5. DAPI: Dilute 1 μL of 1 mg/mL stock per 1 mL TBS (i.e., 1 μg/mL final concentration) 20 min before use and keep on ice until use. Protect DAPI solutions from light at all times.
6. Plastic Pasteur pipettes.
7. Glass microscope slides (e.g., SuperFrost™ microscope slides, Thermo Fisher Scientific).
8. All other reagents are the same as in Subheading 2.2.
2.4 Dual Immunohistochemistry and Lectin Histochemistry Staining
1. TBS-T1: combine 100 mL 10× TBS, 900 mL ultrapure water, and 0.2 mL Triton™ X-100 (0.02%) (see Note 10).
2. Primary antibody or antibodies (unlabeled). Antibodies may be monoclonal or polyclonal, but monoclonal antibodies are usually preferable for more specific tissue staining. The primary antibody should be raised in a species different from that of the tissue under investigation to avoid non-specific binding by the secondary antibody. Dilute stock solutions of primary antibody (or antibodies) from commercial vendor(s) to the desired concentrations in the required volume of TBS-T1 (see Note 20) 30 min before use and store on ice until use (approximately 1 mL per slide is required). Dilutions should be titrated for specific samples.
3. Fluorescently labeled secondary antibody or antibodies. Secondary antibodies bind to the heavy chain of primary antibodies so that they do not interfere with primary antibody binding. The choice of secondary antibody depends on the species in which the primary antibody was raised: the secondary antibody should be raised in a different species and directed against antibodies of the primary antibody's host species. For example, a rabbit anti-GFAP antibody (primary antibody) is detected with a TRITC-labeled goat anti-rabbit antibody (secondary antibody). The choice of fluorescent label depends on (1) the lasers available on the fluorescence microscope and (2) the label used for the fluorescently labeled lectins. The secondary antibody label must differ from the lectin label and must not fluoresce at the same wavelength. For example, if FITC has been selected as the lectin label in a dual staining experiment, TRITC should be selected as the secondary antibody label. Dilute stock solutions of fluorescently labeled secondary antibody (or antibodies) from commercial vendor(s) to the desired concentrations in the required volume of TBS-T1 (see Note 20) 30 min before use and store on ice until use (approximately 1 mL per slide is required). Dilutions should be titrated for specific samples. Handle fluorescently labeled secondary antibodies in the dark, and prepare and hold dilutions in the dark.
4. All other reagents are the same as in Subheading 2.2.
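The reagent quantities in Subheadings 2.1–2.4 come from two standard calculations: mass = molarity × volume × molar mass for weighed solids, and C1V1 = C2V2 for dilutions from stock. The sketch below reproduces a few of the stated values (2.28 g periodic acid and 8.2 g anhydrous sodium acetate per liter, and 1 μL of 1 mg/mL DAPI per 2.5 mL). The molar masses are standard values; the lectin working concentration and slide count in the example are hypothetical and should be replaced by your titrated values.

```python
# Molar masses (g/mol) of some solids weighed in Subheading 2.1 and Table 1.
MOLAR_MASS = {
    "periodic acid (H5IO6)": 227.94,
    "sodium acetate (anhydrous)": 82.03,
    "D-mannose": 180.16,
}


def mass_g(molarity_m: float, volume_l: float, molar_mass: float) -> float:
    """Grams of solid needed for a solution of the given molarity and volume."""
    return molarity_m * volume_l * molar_mass


def stock_volume(stock_conc: float, final_conc: float, final_volume: float) -> float:
    """Volume of stock needed (C1*V1 = C2*V2 solved for V1); units must match."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * final_volume / stock_conc


if __name__ == "__main__":
    print(f"{mass_g(0.010, 1.0, MOLAR_MASS['periodic acid (H5IO6)']):.2f} g periodic acid per L")
    print(f"{mass_g(0.1, 1.0, MOLAR_MASS['sodium acetate (anhydrous)']):.1f} g sodium acetate per L")
    print(f"{mass_g(0.1, 0.010, MOLAR_MASS['D-mannose']):.3f} g Man for 10 mL of 100 mM")
    # Hypothetical: 6 slides x 1 mL at 10 ug/mL from a 1 mg/mL (1000 ug/mL) lectin stock.
    print(f"{stock_volume(1000, 10, 6000):.0f} uL lectin stock to make 6 mL in TBS-T")
    # DAPI: 1 mg/mL stock to 0.4 ug/mL (1:2500) in 2.5 mL (2500 uL).
    print(f"{stock_volume(1000, 0.4, 2500):.1f} uL DAPI stock")
```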
3
Methods
3.1 Lectin Histochemistry for Fixed and/or Fresh Frozen Tissue Sections
1. This method has been used for a variety of fixed or fresh (flash-)frozen OCT-embedded tissue sections on glass slides, including sucrose-infiltrated, 4% pFA-fixed rat spinal cord sections and intervertebral discs approximately 20 μm thick [2, 4]. All steps are carried out at room temperature unless otherwise indicated.
2. Remove slides with frozen tissue sections from the freezer and place them on a slide warmer at room temperature for 20 min before beginning the staining procedure.
3. Place slides in a glass Coplin jar and wash three times in TBS-T with gentle shaking on an orbital shaker for 5 min per wash. When pouring off buffer, place a gloved finger over the top edges of the slides to hold them in position. When pouring buffer into the jar, pour it against the side of the jar facing the slide edges, not directly onto the slides or their faces, to minimize the risk of shearing the tissue from the slide. Leave the final wash in the jar with the slides.
4. Remove the slides from the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with 2% ptBSA in TBS (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 1 h.
5. Remove the slide staining chamber lid and remove the solution from the slide surfaces using a plastic Pasteur pipette, keeping the pipette tip away from the tissue sections at all times. Discard the used blocking buffer.
6. Immediately place the slides in glass Coplin jars and wash three times with TBS as in step 3. Do not discard the last wash; leave the slides in the Coplin jar in the last TBS wash.
7. From this point onward, the procedure must be carried out in darkness and slides must be protected from light (see Note 23).
Siobhán S. McMahon and Michelle Kilcoyne
8. Remove slides from the TBS wash in the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with fluorescently labeled lectin solution in TBS-T at the appropriate dilution (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 1 h.
9. In parallel with step 8, incubate separate tissue slides, treated in the same manner up to step 7, with fluorescently labeled lectins prepared and preincubated with the appropriate haptenic sugars in TBS-T (Table 1). These are competitive inhibition controls (see Note 24).
10. After the lectin incubation, remove the slide staining chamber lid and remove the solutions from the slide surfaces as in step 5. Discard the used lectin and lectin inhibitor solutions.
11. Immediately place the slides in glass Coplin jars and wash three times with TBS as in step 3. Do not discard the last wash; leave the slides in the Coplin jar in the last TBS wash.
12. Remove slides from the TBS wash in the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with DAPI diluted 1:2500 in TBS (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 20 min.
13. Immediately place the slides in glass Coplin jars and wash three times with TBS as in step 3. Do not discard the last wash; leave the slides in the Coplin jar in the last TBS wash.
14. Coverslip the stained tissue with ProLong™ Gold Antifade Mountant (see Note 25). Remove the slide from the TBS wash and blot off excess buffer by touching the edges of the slide to a folded paper towel; the stained tissue on the slide should remain hydrated. Apply a drop of antifade mountant to the middle of the coverslip and a thin line along the bottom edge.
Slowly lower the coverslip onto the slide with the mountant side down, allowing the antifade mountant to spread and cover the stained tissue evenly with no air bubbles (Fig. 2).
15. Dry the slide staining chamber and place the coverslipped slides in it. Replace the lid and leave the slides to cure in the dark for 1 day.
16. Paint around the edges of the coverslips with clear nail varnish (see Note 26) and allow to dry (approximately 5 min). Place all slides in an opaque slide box. For best results, image stained tissues with a fluorescence microscope within 2 days after curing (Fig. 3).
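Because each lectin in Subheading 3.1 is applied to test slides and, in parallel, to competitive inhibition control slides, it helps to total the working-solution volumes before starting. The sketch below assumes whole-slide coverage of about 1 mL per slide, or about 100 μL when a wax-pen barrier is used (Note 22), and the 1:2500 DAPI dilution from step 12. The function name `plan_volumes`, the 10% overage, and the slide counts are hypothetical planning choices, not part of the published method.

```python
def plan_volumes(n_test_slides: int, n_control_slides: int,
                 per_slide_ul: float, dapi_dilution: float = 2500.0,
                 overage: float = 0.10) -> dict:
    """Working-solution volumes (uL) for one lectin run in Subheading 3.1.

    overage is a hypothetical safety margin added to every volume.
    """
    factor = 1.0 + overage
    # Test slides receive the labeled lectin (step 8).
    lectin_ul = n_test_slides * per_slide_ul * factor
    # Inhibition-control slides receive lectin preincubated with sugar (step 9).
    lectin_sugar_ul = n_control_slides * per_slide_ul * factor
    # All slides, test and control, receive DAPI at 1:2500 (step 12).
    dapi_working_ul = (n_test_slides + n_control_slides) * per_slide_ul * factor
    return {
        "lectin_ul": lectin_ul,
        "lectin_plus_sugar_ul": lectin_sugar_ul,
        "dapi_working_ul": dapi_working_ul,
        "dapi_stock_ul": dapi_working_ul / dapi_dilution,
    }


# Hypothetical run: 6 test slides and 2 inhibition-control slides.
whole_slide = plan_volumes(6, 2, per_slide_ul=1000.0)  # ~1 mL per slide (Note 22)
wax_pen = plan_volumes(6, 2, per_slide_ul=100.0)       # ~100 uL inside a wax-pen ring

for label, plan in (("whole slide", whole_slide), ("wax pen", wax_pen)):
    print(label, {name: round(volume, 2) for name, volume in plan.items()})
```

Comparing the two plans shows the roughly tenfold saving from the wax-pen approach that Note 22 recommends for reagents in limited supply.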
Fig. 2 (a) Stained tissue on the dried slide. (b) Apply a drop of antifade mountant to the middle of one side of the coverslip and a thin line along the bottom of the same side. (c) Place the coverslip edge-on against the face of the slide bearing the stained tissue, with the edge carrying the thin line of mountant touching the slide and the mountant-coated side facing the slide. (d) Slowly lower the coverslip onto the slide in a levering motion, using the edge with the thin line of mountant as the fulcrum; the mountant should begin to spread evenly across the face of the coverslip. (e) The coverslip should sit flush against the slide, with the mountant spread evenly over the tissue and no air bubbles
3.2 Lectin Cytohistochemistry
All steps are carried out at room temperature unless otherwise indicated.
1. Remove the medium by transferring the cells and culture supernatant into a sterile 15 mL polypropylene centrifuge tube and pelleting the cells by centrifugation at 90 × g (approximately 700 rpm) for 5 min. Carefully remove the tube from the centrifuge rotor and remove the medium with a pipette, taking care not to disturb the cell pellet.
2. Wash the cells four times in PBS, pH 7.4. For each wash, gently add 1 mL PBS to the cell pellet with a pipette and gently resuspend the cells by pipetting. Pellet the cells by centrifugation as in step 1 and remove the PBS supernatant with a pipette, always taking care not to disturb the cell pellet.
Fig. 3 FITC-labeled MAA (green) and DAPI (blue) staining of intervertebral disc nucleus pulposus (NP) and annulus fibrosus (AF) tissue and of articular cartilage from 3-month-old (immature) and 11-month-old (mature) sheep. Scale bar = 100 μm. Images were obtained on an Olympus IX81 inverted epifluorescence microscope. (Figure reproduced from [2] under the terms of the Creative Commons CC BY license. Copyright © 2016, Springer Nature)
3. For the last wash, after resuspending the cells, transfer them to a 2 mL microcentrifuge tube for the subsequent washes and staining steps, and pellet the cells by centrifugation as in step 1.
4. Remove the last PBS wash and add 1 mL 4% pFA. Gently resuspend the cells in the pFA solution by pipetting. Incubate for 10 min at room temperature.
5. Pellet the cells by centrifugation as in step 1, then remove the pFA solution with a pipette and discard it (see Note 19).
6. Wash the cells four times in TBS as in step 2, adding these washes to the pFA waste. If intracellular staining is of interest in addition to cell surface staining, use TBS-T1 washes instead of TBS.
7. After removing the last TBS (or TBS-T1) wash, resuspend the cells in 1 mL of 2% ptBSA in TBS (or TBS-T1 for intracellular staining). Incubate for 30 min at room temperature with gentle rotation.
8. Pellet the cells by centrifugation as in step 1, remove the blocking solution with a pipette without disturbing the cell pellet, and discard it.
9. Wash the cells four times in TBS as in step 2.
10. From this point onward, the procedure must be carried out in darkness and samples must be protected from light (see Note 23).
11. After removing the last TBS wash, add 1 mL fluorescently labeled lectin at the appropriate dilution in TBS (or TBS-T1 for intracellular staining) and gently resuspend the cells by pipetting. Incubate for 1 h at room temperature with gentle rotation.
12. In parallel with step 11, incubate a separate tube of cells, treated in the same manner up to step 10, with fluorescently labeled lectins prepared and preincubated with the appropriate haptenic sugars in TBS (or TBS-T1 for intracellular staining) (Table 1). These are competitive inhibition controls (see Note 24).
13. After the lectin incubation, pellet the cells by centrifugation as in step 1, remove the lectin and lectin inhibitor solutions with a pipette, and discard them.
14. Wash the cells four times in TBS as in step 2.
15. After removing the last TBS wash, resuspend the cells in 1 mL of 1 μg/mL DAPI (i.e., stock diluted 1:1000) in TBS and incubate for 5 min at room temperature with gentle rotation.
16. Remove the DAPI solution by pelleting the cells by centrifugation as in step 1, then wash the cells four times in TBS as in step 2.
17. Leave half of the last TBS wash on the cell pellet. Resuspend the cells in the remaining TBS by gentle pipetting and, using a plastic Pasteur pipette (one per tube of stained cells), place 1–5 drops of cells on a microscope slide.
18. Coverslip the cells with ProLong™ Gold Antifade Mountant (see Note 25). Apply a drop of antifade mountant to the middle of the coverslip and a thin line along the bottom edge. Slowly lower the coverslip onto the slide with the mountant side down, allowing the antifade mountant to spread and cover the cells evenly with no air bubbles (Fig. 2).
19. Place the coverslipped slides in the opaque slide staining chamber. Replace the lid and leave the slides to cure in the dark for 1 day.
20. Paint around the edges of the coverslips with clear nail varnish (see Note 26) and allow to dry (approximately 5 min). Place all slides in an opaque slide box. For best results, image stained cells with a fluorescence microscope within 2 days after curing (Fig. 4).
3.3 Dual Lectin and Antibody Staining for Tissue Co-localization
Antibodies are glycosylated molecules, so lectins can bind to the carbohydrates presented by the antibodies rather than to the tissue. To verify that lectin and antibody truly co-localize in the tissue (where this is observed), rather than the lectin merely binding to antibody already bound to the tissue, several controls must be carried out in parallel (step 13). In addition, we prefer to carry out lectin staining before antibody staining to minimize the risk of this false positive.
1. For lectin and immunohistochemistry double staining, carry out the lectin histochemistry method in Subheading 3.1 up to step 11 and continue with this procedure in the dark (see Note 23) at room temperature.
2. Immediately after the TBS wash, remove the slides from the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with the primary antibody solution in TBS-T1 at the appropriate dilution (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 2 h.
3. Remove the slide staining chamber lid and remove the solution from the slide surfaces using a plastic Pasteur pipette, keeping the pipette tip away from the tissue sections at all times. Discard the used antibody solution.
Fig. 4 Photomicrographs of WFA-FITC (green) and DAPI (blue) stained PC12 cells (a rat neuronal cell line) at 1 (a–c), 4 (d–f), and 8 (g–i) days in vitro (DIV), cocultured with primary astrocytes (“normal”), Neu7 cells (“inhibitory”; a rat astrocytic cell line that secretes the inhibitory molecules NG2, versican, and the CS-56 antigen [15]), and chondroitinase ABC-treated Neu7 cells (“treated”), respectively. Scale bar = 30 μm. Images were obtained on an Olympus IX81 fluorescence microscope using PerkinElmer Volocity image acquisition software. (Reprinted from [3] with permission from Elsevier)
4. Place slides in a glass Coplin jar and wash three times in TBS (see Note 27) with gentle shaking on an orbital shaker for 5 min per wash. When pouring off buffer, place a gloved finger over the top edges of the slides to hold them in position. When pouring buffer into the jar, pour it against the side of the jar facing the slide edges, not directly onto the slides or their faces, to minimize the risk of shearing the tissue from the slide. Leave the final wash in the jar with the slides.
5. Immediately after the final TBS wash, remove the slides from the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with the fluorescently labeled secondary antibody solution in TBS-T1 at the appropriate dilution (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 1 h.
6. After the secondary antibody incubation, remove the slide staining chamber lid and remove the solutions from the slide surfaces as in step 3. Discard the used antibody solutions.
7. Immediately place the slides in glass Coplin jars and wash three times with TBS (see Note 27) as in step 4. Do not discard the last wash; leave the slides in the Coplin jar in the last TBS wash.
8. Remove slides from the TBS wash in the Coplin jar and place them into an opaque slide staining chamber with the water layer already at the bottom of the chamber (see Note 21). Cover the tissue sections with DAPI diluted 1:2500 in TBS (see Note 22). Place the lid on the slide staining chamber and incubate at room temperature for 20 min.
9. Immediately place the slides in glass Coplin jars and wash three times with TBS as in step 4. Do not discard the last wash; leave the slides in the Coplin jar in the last TBS wash.
10. Coverslip the tissue with ProLong™ Gold Antifade Mountant (see Note 25). Remove the slide from the TBS wash and blot off excess buffer by touching the edges of the slide to a folded paper towel; the tissue on the slide should remain hydrated. Apply a drop of antifade mountant to the middle of the coverslip and a thin line along the bottom edge.
Slowly lower the coverslip onto the slide with the mountant side down, allowing the antifade mountant to spread and cover the tissue evenly with no air bubbles (Fig. 2).
11. Dry the slide staining chamber and place the coverslipped slides in it. Replace the lid and leave the slides to cure in the dark for 1 day.
12. Paint around the edges of the coverslips with clear nail varnish (see Note 26) and allow to dry (approximately 5 min). Place all slides in an opaque slide box. For best results, image stained tissues with a fluorescence microscope within 2 days after curing (Fig. 5).
Fig. 5 Dual staining of astrocytes (detected by anti-GFAP antibody) and α-(2,6)-linked sialic acid residues (detected by SNA-I) in injured spinal cord slices. Photomicrographs show rabbit anti-GFAP antibody detected by TRITC-labeled goat anti-rabbit antibody (red), FITC-labeled SNA-I (green), and DAPI (blue) staining at the lesion site of spinal cord slices from injured and Cyclosporin-A-treated animals (a, c). Scale bar = 50 μm. The boxed areas in (a) and (c) are magnified in (b) and (d), respectively. Scale bar = 20 μm. Images were obtained on an Olympus IX81 fluorescence microscope at 20× and 40× magnification. (Figure reproduced from [4] with permission from the American Chemical Society (ACS). Further permissions related to the material excerpted should be directed to the ACS)
13. The controls for dual lectin and immunohistochemistry experiments, carried out in parallel on separate tissue sections, are incubation with (1) lectin alone, (2) lectin preincubated with the appropriate haptenic sugar, (3) primary and secondary antibodies alone, and (4) secondary antibody alone. The staining patterns of the lectin alone and of the primary antibody alone should match their respective patterns in the dual staining experiment.
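The cell-pelleting steps in Subheading 3.2 specify 90 × g and give approximately 700 rpm as the equivalent. That equivalence holds only for a particular rotor radius, so on a different centrifuge the speed should be recalculated from the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimetres. In the sketch below, the 16.4 cm radius is an assumption chosen only because it reproduces the ~700 rpm figure, and the 7.5 cm microcentrifuge radius is likewise hypothetical; substitute the radius of your own rotor.

```python
import math

# Standard conversion constant for RCF = K * r_cm * rpm**2.
K = 1.118e-5


def rpm_for_rcf(rcf_g: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that produces the requested relative centrifugal force."""
    return math.sqrt(rcf_g / (K * radius_cm))


def rcf_for_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) produced at a given rotor speed."""
    return K * radius_cm * rpm ** 2


# Hypothetical rotor radius; replace with the value for your centrifuge.
radius_cm = 16.4
print(f"90 x g at r = {radius_cm} cm needs about {rpm_for_rcf(90, radius_cm):.0f} rpm")

# The same 90 x g on a hypothetical microcentrifuge rotor with r = 7.5 cm.
print(f"90 x g at r = 7.5 cm needs about {rpm_for_rcf(90, 7.5):.0f} rpm")
```

Because RCF scales with the square of the speed but only linearly with the radius, a small-radius rotor needs a noticeably higher rpm to deliver the same gentle 90 × g.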
4 Notes
1. We use the “99%, essentially globulin-free” grade BSA from Sigma-Aldrich Co., catalog number A7638. The aim is to use BSA that is largely free of serum glycoprotein components, because contaminating carbohydrates cause false positives and/or negatives for carbohydrate-binding molecules. Periodic acid (or sodium metaperiodate) treatment renders any contaminating carbohydrates in the BSA unrecognizable to lectins.
2. We find it best to prepare this fresh immediately before each use.
3. Place the dry BSA in a Pyrex media bottle of sufficient size (e.g., a 500 mL bottle for a 200 mL final volume). Add a magnetic stir bar and place the bottle on a magnetic stir plate. Slowly add the appropriate volume of buffer with gentle stirring. It may take up to 30 min for the BSA to dissolve fully.
4. Do not incubate longer than specified, or the protein itself will become oxidized.
5. It is not necessary to protect the oxidation reaction from light, as light exposure matters only for longer reactions (i.e., days) [16].
6. This prevents powder escaping into the lyophilizer in case of vacuum failure or accidental temporary interruption by another equipment user. In the absence of any accident, the ptBSA will remain as a loose powder plug at the bottom of the tubes during lyophilization.
7. If the protein obtained after dialysis and lyophilization is not white and fluffy (e.g., it is orange to brown and lumpy), discard the oxidized BSA, as it will have solubility problems.
8. We have used ptBSA stored properly (dry) for up to 3 years with no adverse effects.
9. We (and others) use a combination of Ca²⁺ and Mg²⁺ for functional lectin applications, as many lectins require these divalent cations to function correctly. Mn²⁺ can also be included in the lectin buffer, but this ion precipitates from TBS within a few hours and can leave debris on tissues that interferes with imaging.
10. Triton™ X-100 is very viscous, so cut the end off a 1 mL pipette tip to aspirate the detergent. Aspirate and dispense very slowly. Invert the TBS-T to mix after adding Triton™ X-100 and allow at least 30 min for the detergent to dissolve before use.
11. We use the StainTray slide staining system with the black lid (catalog number Z670146 from Sigma-Aldrich Co. (now Merck)). It functions as a humidity chamber, has raised polymer runners that hold slides in place and keep them separate from water or excess stain dripping from other slides, and completely blocks light during incubations when the lid is in place.
12. Dry skim milk, animal sera, and similar reagents are not acceptable blocking agents for histochemical procedures involving lectins. Blocking agents must be free of carbohydrates and glycosylated conjugates (e.g., glycoproteins and glycolipids), which cause false positives and/or negatives by interacting with the lectins themselves.
13. We have stored unopened blocking buffer aliquots at 4 °C for up to a month before use. The main risk is contamination of the blocking buffer, so we do not recommend reusing aliquots.
14. Lectins with various fluorescent labels are available from many international reagent companies, including EY Laboratories Inc. (San Mateo, CA, USA), Sigma-Aldrich Co., Vector Laboratories (Burlingame, CA, USA), Elicityl (Crolles, France), and Thermo Fisher Scientific. This list is by no means exhaustive.
15. Other antifade mountants may be used instead. We prefer ProLong™ Gold because, in our experience, it spreads evenly and easily when coverslipping, has a good curing time, and gives no background in fluorescence imaging.
16. pFA will not dissolve at lower temperatures; it is depolymerized to formaldehyde at 60 °C. Do not heat above 60 °C, as the flash point of pFA is 70 °C.
17. We use an oil bath to heat the PBS, both to maintain even heat throughout the column of liquid and to avoid heat spikes when dissolving the pFA.
18. pFA is toxic and carcinogenic and causes respiratory irritation. Wear a mask, nitrile gloves, and goggles when weighing pFA.
To avoid exposing coworkers to pFA, cover the weigh boat containing the weighed pFA with a second weigh boat of similar size when transporting it to the fume hood. Transfer the weighed pFA to the bottle inside the fume hood and mix on a heated magnetic stir plate placed inside the hood.
19. Dispose of used and unused pFA solutions as hazardous waste.
20. Inclusion of 0.1% ptBSA in the TBS-T1 diluent may be necessary for some antibodies with high background.
21. Slides should not be completely wet when placed in the slide staining chamber. After removing each slide from the TBS wash in the Coplin jar, touch its bottom edge against a folded paper towel to remove the bulk of the liquid. The slide should not be completely dry either (i.e., a thin film of liquid should remain to keep the tissue hydrated).
22. The whole surface of the slide can be covered (approximately 1 mL per slide). Take care not to break the surface tension at the edge of the slide when pipetting the solution onto the slide surface. With the lid on the slide staining chamber, there is no significant loss of the liquid layer from the slide surface during the incubation period (1–2 h). If a reagent in limited supply is used (e.g., antibody solutions), it is preferable instead to outline the tissue with a wax pen and layer the liquid inside this barrier. This approach uses significantly less volume than covering the whole slide surface (e.g., approximately 100 μL compared with 1 mL).
23. Fluorescent labels are bleached by exposure to light. In our experience teaching this method to other researchers over the years, exposure to light during reagent preparation, staining, or after slide curing is the most common reason for absent or weak fluorescent signal at imaging.
24. Using the same microscope settings, the signal from the fluorescently labeled lectin coincubated with the haptenic sugar should be significantly reduced compared with the signal from the fluorescently labeled lectin without haptenic sugar. A reduction in signal intensity indicates that lectin binding is carbohydrate-mediated. No change in signal intensity indicates that lectin binding is non-carbohydrate-mediated and that the carbohydrate is not present in the tissue [11].
25. We have also used ProLong™ Gold Antifade Mountant formulated with the blue DNA stain DAPI instead of a separate DAPI staining step.
While nuclear staining is good, we occasionally observe higher slide background in fluorescence microscopy with the DAPI-containing ProLong™ Gold mountant. Thus, for most of our tissue and cell staining we prefer to use mountant without DAPI and include a separate nuclear staining step in the histochemical procedure.
26. We usually use Boots No. 7 clear nail topcoat, but any standard clear nail topcoat from a local pharmacy will work.
27. If the background is somewhat high, it may be necessary to replace the TBS washes after antibody incubations with TBS-T washes.
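Note 24 compares the signal from the lectin alone with the signal from the lectin coincubated with its haptenic sugar, both imaged with identical microscope settings. One way to put a number on that comparison is the percentage reduction in mean intensity, sketched below. The background subtraction, the region-of-interest (ROI) values, and the function name are assumptions for illustration; the protocol itself does not define a numeric threshold for a "significant" reduction.

```python
from statistics import mean


def percent_reduction(lectin_only, lectin_plus_sugar, background=0.0):
    """Percentage drop in mean intensity when the haptenic sugar is present.

    lectin_only, lectin_plus_sugar: intensity values (e.g., per-ROI means)
    acquired with the same microscope settings. background is a hypothetical
    offset subtracted from both means before comparison.
    """
    signal = mean(lectin_only) - background
    inhibited = mean(lectin_plus_sugar) - background
    if signal <= 0:
        raise ValueError("no lectin signal above background")
    return 100.0 * (signal - inhibited) / signal


# Hypothetical ROI intensities (arbitrary units) from paired slides.
uninhibited = [1200, 1150, 1250]
inhibited = [300, 280, 320]
print(f"{percent_reduction(uninhibited, inhibited, background=100):.1f} % reduction")
```

A large reduction is consistent with carbohydrate-mediated binding; a value near zero corresponds to the "no change" case described in Note 24.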
References
1. Reily C, Stewart TJ, Renfrow MB, Novak J (2019) Glycosylation in health and disease. Nat Rev Nephrol 15:346–366
2. Collin EC, Kilcoyne M, White SJ, Grad S, Alini M, Joshi L, Pandit AS (2016) Unique glycosignature for intervertebral disc and articular cartilage cells and tissues in immaturity and maturity. Sci Rep 6:23062
3. Kilcoyne M, Sharma S, McDevitt N, O’Leary C, Joshi L, McMahon SS (2012) Neuronal glycosylation differentials in normal, injured and chondroitinase-treated environments. Biochem Biophys Res Commun 420:616–622
4. Kilcoyne M, Patil V, O’Grady C, Bradley C, McMahon SS (2019) Differential glycosylation expression in injured rat spinal cord treated with immunosuppressive drug Cyclosporin-A. ACS Omega 4:3083–3097
5. Streit WJ, Kreutzberg GW (1987) Lectin binding by resting and reactive microglia. J Neurocytol 16:249–260
6. Streit WJ (1990) An improved staining method for rat microglial cells using the lectin from Griffonia simplicifolia (GSA I-B4). J Histochem Cytochem 38:1683–1686
7. Goldstein IJ, Hughes RC, Monsigny M, Osawa T, Sharon N (1980) What should be called a lectin? Nature 285:66
8. Sharon N, Lis H (2003) Lectins, 2nd edn. Kluwer Academic Publishers, Dordrecht, The Netherlands
9. Naeem A, Saleemuddin M, Khan RH (2007) Glycoprotein targeting and other applications of lectins in biotechnology. Curr Protein Peptide Sci 8:261–271
10. Barondes SH (1988) Bifunctional properties of lectins: lectins redefined. Trends Biochem Sci 13:480–482
11. Gerlach JQ, Kilcoyne M, Eaton S, Bhavanandan V, Joshi L (2011) Noncarbohydrate-mediated interaction of lectins with plant proteins. Adv Exp Med Biol 705:257–269
12. Venable A, Mitalipova M, Lyons I, Jones K, Shin S, Pierce M, Stice S (2005) Lectin binding profiles of SSEA-4 enriched, pluripotent human embryonic stem cell surfaces. BMC Dev Biol 5:15
13. Rao RR, Johnson AV, Stice SL (2007) Cell surface markers in human embryonic stem cells. In: Vemuri MC (ed) Stem cell assays. Humana Press, Totowa, NJ, pp 51–61
14. Glass WF, Briggs RC, Hnilica LS (1981) Use of lectins for detection of electrophoretically separated glycoproteins transferred onto nitrocellulose sheets. Anal Biochem 115:219–224
15. Fidler PS, Schuette K, Asher RA, Dobbertin A, Thornton SR, Calle-Patino Y, Muir E, Levine JM, Geller HM, Rogers JH, Faissner A, Fawcett JW (1999) Comparing astrocytic cell lines that are inhibitory or permissive for axon growth: the major axon-inhibitory proteoglycan is NG2. J Neurosci 19:8778–8788
16. Bobbitt JM (1956) Periodate oxidation of carbohydrates. In: Wolfrom ML, Tipson RS (eds) Advances in carbohydrate chemistry. Academic Press, London, pp 1–41
Chapter 15
Lectin Affinity Chromatography for the Discovery of Novel Cancer Glycobiomarkers: A Case Study with PSA Glycoforms and Prostate Cancer
Esther Llop and Rosa Peracaula
Abstract
Many clinical cancer biomarkers are glycoproteins, but most of them are measured only at the protein level. Indeed, only alpha-fetoprotein (AFP) in hepatocarcinoma and CA15-3 in breast cancer are clinically monitored for their glycoforms. Aberrant glycosylation occurs frequently in glycoproteins synthesized by tumor cells and often produces changes in protein glycoforms that could be exploited as biomarkers for improving diagnosis and prognosis or for monitoring the response to treatment. Ideally, potential biomarkers should be screened in noninvasive samples such as serum or plasma; therefore, glycoproteins carrying tumor-associated glycoforms must be shed from the tumor cell membrane or secreted into the blood to be detectable. Glycosylation changes commonly associated with cancer transformation include altered fucosylation, sialylation, branching, and polylactosaminylation. Lectins are glycan-binding proteins that bind with great specificity to different glycan moieties. Lectin-based strategies to enrich or fractionate glycoproteins are widely used and hold promise for targeted analysis in cancer biomarker discovery. Here we describe the use of lectin chromatography to separate prostate-specific antigen (PSA) glycoforms according to their sialic acid linkage in sera from patients with prostate cancer (with PSA levels in the range of 2–20 ng/mL). Specifically, we used agarose-bound Sambucus nigra agglutinin (SNA), a lectin with affinity for terminal α2,6-linked sialic acids on glycoproteins. The protocol first included an immunoaffinity step to enrich PSA and avoid interference from the most abundant serum glycoproteins.
Then, the immunopurified PSA was loaded onto the SNA column and two fractions were obtained: the unbound fraction, containing PSA glycoforms without α2,6-sialic acid (essentially α2,3-sialylated PSA glycoforms), and the bound fraction, containing the α2,6-sialylated PSA glycoforms. Quantifying the PSA eluted in each fraction gives the relative content of both groups of PSA glycoforms. The percentage of α2,6-sialylated PSA glycoforms is significantly decreased in aggressive prostate cancer compared with indolent prostate cancer and benign prostatic hyperplasia, making it a promising new glycobiomarker for prostate cancer risk stratification.
Key words Affinity chromatography, Cancer biomarker, Glycoprotein, Lectin, Prostate cancer, Prostate-specific antigen (PSA), Sambucus nigra agglutinin (SNA), Sialic acid
Gavin P. Davey (ed.), Glycosylation: Methods and Protocols, Methods in Molecular Biology, vol. 2370, https://doi.org/10.1007/978-1-0716-1685-7_15, © Springer Science+Business Media, LLC, part of Springer Nature 2022
1 Introduction
In recent years, many lectin-based approaches have been developed to identify new tumor protein glycoforms that could help in cancer diagnosis, prognosis, and treatment monitoring [1]. The lack of antibodies specific for particular protein glycoepitopes has been an important obstacle to applying the immunoaffinity techniques commonly used for proteins. Fortunately, the specific affinity of lectins for particular glycan moieties makes them suitable for glycoprotein or glycan enrichment, purification, detection, characterization, and quantification, especially in complex biological samples such as plasma or serum [2]. Glycoproteins with the potential to become specific cancer biomarkers are usually present in body fluids at very low levels relative to the major glycoproteins. Notably, 82 of the 100 most abundant plasma proteins are glycoproteins, the most abundant being immunoglobulins (Ig) and acute-phase proteins. The concentrations of these glycoproteins can exceed those of candidate glycobiomarkers by up to 10–12 orders of magnitude [3]. Therefore, whenever possible, a depletion or immunopurification step is required before lectin affinity approaches are used to analyze the glycosylation of the candidate glycoprotein. This enrichment reduces the number of glycoproteins competing for the lectin, minimizes nonspecific interactions, and improves the lectin binding capacity for the candidate glycoprotein. To fully unravel the glycan structures of a glycoprotein, lectin affinity chromatography can be complemented with mass spectrometry analysis [4] or with UPLC-HILIC glycan profiling [5]. However, these two techniques require large amounts of sample, trained staff, and expensive equipment, and their throughput is limited for clinical applications.
In some situations, full structural characterization of all glycoforms is unnecessary, because separating the glycoprotein glycoforms into two chromatographic fractions (lectin-bound and unbound) provides enough information to distinguish tumor samples from benign or control samples. Moreover, these lectin chromatography protocols hold promise for clinical biomarker development because they are particularly well suited to adaptation into rapid point-of-care tests that could be readily transferred to the clinic [6]. Here, we describe a lectin affinity chromatography protocol to screen for differences in the sialic acid linkage of prostate-specific antigen (PSA) in serum samples from prostate cancer patients with disease of differing aggressiveness. Although the same strategy could be pursued with any lectin, we selected Sambucus nigra agglutinin (SNA), which binds preferentially to sialic acid attached
Lectin Affinity Chromatography for the Discovery of Novel Cancer. . .
303
to terminal galactose in α2,6 linkage, to isolate glycoconjugates containing α2,6-sialic acid from the nonglycosylated, neutral and the ones containing α2,3-sialic acid. This binding is inhibited to some extent by lactose or galactose; hence, these saccharides are commonly used for the glycoprotein elution. Sometimes an acidic lectin buffer is required for increasing the elution yield. SNA lectin may be covalently coupled to CNBr Activated Sepharose but agarose bound lectins are also commercially available. Agarose lectin is usually packed into a column, being recommended the spin columns when low amounts of sample are processed. Using SNA agarose chromatography on PSA previously immunopurified from serum samples it is possible to separate two fractions containing the PSA glycoforms without α2,6-sialic acid (basically α2,3-sialylated glycoforms) and the α2,6-sialylated PSA glycoforms, respectively. The quantification of PSA contained in each fraction using a specific antibody against PSA avoid bias produced by glycoproteins other than PSA and provide a percentage of each type of PSA sialylated glycoforms. After the analyses of multiple serum samples, we could establish a reference cut-off value of the percentage of αPSA 2,3-sialylated glycoforms that discriminated aggressive prostate cancer patients from indolent prostate cancer and benign hyperplasia patients. Thus, we could classify the aggressive prostate cancer patients according to their decrease in the percentage of PSA α2,6sialylated glycoforms with high sensitivity and specificity [7, 8].
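The quantification step described above reduces to a simple ratio. The following Python sketch shows how the percentage of PSA in the SNA-unbound fraction (glycoforms lacking α2,6-sialic acid, mostly α2,3-sialylated) and in the SNA-bound fraction could be computed from the PSA measured in each, how a sample could be flagged against a cut-off, and how sensitivity and specificity follow from such calls. The class and function names, the 30% cut-off, and the cohort values are hypothetical placeholders, not the published cut-off or patient data. It also assumes that a sample is called aggressive when its unbound percentage exceeds the cut-off.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class SnaFractions:
    """PSA measured in the two SNA-agarose fractions of one sample (same units for both)."""
    unbound_psa: float  # flow-through: PSA lacking alpha2,6-sialic acid (mostly alpha2,3-sialylated)
    bound_psa: float    # eluate: alpha2,6-sialylated PSA

    def percent_unbound(self) -> float:
        total = self.unbound_psa + self.bound_psa
        if total <= 0:
            raise ValueError("no PSA recovered in either fraction")
        return 100.0 * self.unbound_psa / total

    def percent_bound(self) -> float:
        return 100.0 - self.percent_unbound()


def classify_aggressive(sample: SnaFractions, cutoff_percent: float) -> bool:
    # Hypothetical rule: a higher share of unbound (non-alpha2,6) PSA, i.e. a lower
    # alpha2,6-sialylated share, is called aggressive.
    return sample.percent_unbound() > cutoff_percent


def sensitivity_specificity(predicted: Sequence[bool], actual: Sequence[bool]) -> Tuple[float, float]:
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    CUTOFF = 30.0  # hypothetical cut-off (%), not the published value
    # Illustrative values only: (unbound PSA, bound PSA, truly aggressive?)
    cohort = [(4.5, 5.5, True), (2.0, 8.0, False), (3.8, 6.2, True), (2.5, 7.5, False)]
    calls = [classify_aggressive(SnaFractions(u, b), CUTOFF) for u, b, _ in cohort]
    sens, spec = sensitivity_specificity(calls, [truth for *_, truth in cohort])
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Expressing each fraction as a share of the total PSA recovered, rather than as an absolute amount, makes samples with very different serum PSA concentrations directly comparable against a single cut-off.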
2 Materials
High-purity chemicals, including deionized water (18 MΩ·cm at 25 °C), are used throughout.
PSA standard (BBI Solutions). It is spiked into serum from women (with undetectable PSA levels) as a control experiment.
Human serum samples from patients with prostatic pathologies (prostate cancer or benign prostatic hyperplasia). Serum samples should be aliquoted and frozen at −80 °C until use. Avoid freeze–thaw cycles. For analysis, 1 mL serum samples are left at 4 °C until completely thawed. An aliquot of 250 μL of each serum sample is stored at 4 °C for subsequent PSA quantification in Subheading 3.5 (step 1).
2.1 Treatment to Release PSA from the PSA-ACT Complex (Optional, See Note 1)
1. Ethanolamine solution: prepare 200 μL of 5 M ethanolamine by mixing 60.4 μL of pure ethanolamine (>99%) with 139.6 μL of deionized water.
2. HCl solution: prepare 5 M HCl by diluting 37% hydrochloric acid in deionized water.
3. pH meter.
4. Thermoblock.
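The volumes in steps 1 and 2 follow from molarity arithmetic. The Python sketch below reproduces the ethanolamine volumes listed above and estimates the volume of 37% HCl needed to make 5 M HCl. The physical constants (molar masses, a density of 1.012 g/mL for ethanolamine and 1.18 g/mL for 37% HCl), the assumption of 100% reagent purity with no volume contraction on mixing, and the 1 mL final HCl volume are illustrative assumptions, not values given in the protocol; check them against the reagent supplier's documentation.

```python
# Assumed physical constants (typical handbook values; confirm with the supplier's data)
ETHANOLAMINE_MW = 61.08        # g/mol
ETHANOLAMINE_DENSITY = 1.012   # g/mL, neat liquid near room temperature
HCL_MW = 36.46                 # g/mol
HCL_37_DENSITY = 1.18          # g/mL, assumed for 37% (w/w) hydrochloric acid


def neat_volume_ul(target_molar: float, final_ul: float, mw: float, density: float) -> float:
    """Volume (uL) of a neat liquid giving `target_molar` in `final_ul`.

    Assumes 100% purity and additive volumes (no contraction on mixing)."""
    moles = target_molar * final_ul * 1e-6   # mol/L x L
    grams = moles * mw
    return grams / density * 1000.0          # mL -> uL


def stock_molarity(mass_fraction: float, density: float, mw: float) -> float:
    """Molarity of a w/w stock: grams of solute per litre divided by molar mass."""
    return 1000.0 * density * mass_fraction / mw


def dilution_volume_ul(stock_molar: float, target_molar: float, final_ul: float) -> float:
    """Solve C1*V1 = C2*V2 for V1 (uL of stock)."""
    if target_molar > stock_molar:
        raise ValueError("target is more concentrated than the stock")
    return target_molar * final_ul / stock_molar


if __name__ == "__main__":
    eth = neat_volume_ul(5.0, 200.0, ETHANOLAMINE_MW, ETHANOLAMINE_DENSITY)
    print(f"ethanolamine {eth:.1f} uL + water {200.0 - eth:.1f} uL")  # ethanolamine 60.4 uL + water 139.6 uL

    hcl_stock = stock_molarity(0.37, HCL_37_DENSITY, HCL_MW)           # roughly 12 M
    hcl = dilution_volume_ul(hcl_stock, 5.0, 1000.0)                   # hypothetical 1 mL final volume
    # Add the acid to the water, not the reverse.
    print(f"37% HCl {hcl:.0f} uL + water {1000.0 - hcl:.0f} uL")
```

Deriving the stock molarity from the w/w concentration and density, rather than assuming a nominal 12 M, lets a lot-specific density be substituted directly.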
Esther Llop and Rosa Peracaula
2.2 Direct Immunoprecipitation (IP) with Magnetic Beads
1. Magnetic particles coated with mouse anti-PSA monoclonal antibody in Tris-buffered saline, bovine serum albumin (BSA),